
Cloudera Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question :
You are running a MapReduce job, and inside the Mapper you want to get the name of the file that is currently being processed.
Which is the correct code snippet to fetch the file name in the Mapper code?

1. String fileName = ((FileStatus) context.getFileStatus()).getPath().getName();
2. String fileName = context.getPath().getName();
3. Access Mostly Uused Products by 50000+ Subscribers
4. All of the above


Correct Answer : Get Latest Questions and Answer :

How can I get the name of the input file within a mapper? I have multiple input files stored in the input directory;
each mapper may read a different file, and I need to know which file the mapper has read.
First you need to get the input split. Using the newer mapreduce API this is done as follows:
context.getInputSplit();
But in order to get the file path and the file name, you first need to cast the result to FileSplit.
So, to get the input file path you can do the following:
Path filePath = ((FileSplit) context.getInputSplit()).getPath();
String filePathString = ((FileSplit) context.getInputSplit()).getPath().toString();
Similarly, to get just the file name, call getName():
String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
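
Putting it together, a minimal Mapper sketch that captures the file name once in setup() and reuses it for every record (the class name and the emitted key/value choices are illustrative assumptions, not taken from the question):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class FileNameMapper extends Mapper<LongWritable, Text, Text, Text> {

    private String fileName;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // With FileInputFormat the input split is a FileSplit, so it can be cast
        // to obtain the path of the file this map task is reading.
        fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit the source file name as the key so every record can be traced back to its file.
        context.write(new Text(fileName), value);
    }
}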





Question : In a MapReduce word count job, you know your file contains at most three different words, and after completion of the job you want one output file to be created for each reducer. Hence, you have written a custom partitioner. Which is the correct code snippet for the above requirement?
1. A
2. B
3. Access Mostly Uused Products by 50000+ Subscribers

Correct Answer : 3

By default, Hadoop uses its own internal partitioning logic (a hash of the key) to decide which reducer each key and its values are sent to. If you want a custom partitioner, you have to override that default behaviour with your own logic. Unless you know exactly how your keys vary, this logic will not be generic; you have to work out the partitioning from the expected variations in the keys.
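
As an illustration of the technique (the three words and the class name below are assumptions for the sketch, not the hidden answer), a custom Partitioner that sends each of three known words to its own reducer could look like this:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class ThreeWordPartitioner extends Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        // Route each of the three expected words to a fixed reducer so that
        // each reducer produces exactly one output file; the modulo keeps the
        // partition index within the configured number of reducers.
        String word = key.toString();
        if (word.equals("hadoop")) {
            return 0 % numReduceTasks;
        } else if (word.equals("mapreduce")) {
            return 1 % numReduceTasks;
        } else {
            return 2 % numReduceTasks;
        }
    }
}

The driver would also need job.setPartitionerClass(ThreeWordPartitioner.class) and job.setNumReduceTasks(3) so that three reduce tasks, and hence three output files, are created.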






Question : The input file size is given in KB, and the block size is given in MB. What is the size of the intermediate data produced?

1. 47KB
2. 83KB
3. Access Mostly Uused Products by 50000+ Subscribers
4. Job Fails

Correct Answer : Get Latest Questions and Answer :
As no other information is given, assuming every word in the file is emitted by the mapper, the intermediate data will be roughly the same size as the input, i.e. 47 KB. Since the input (KB) is far smaller than the block size (MB), it also fits in a single split and is processed by a single map task.





Related Questions


Question : You have created a MapReduce job to process a time-series market data file, with a driver class called
HadoopDriver (in the default package) packaged into a jar called HadoopExam.jar. What is the appropriate way to submit this job to the cluster?
1. hadoop jar HadoopExam.jar HadoopDriver outputdir inputdir
2. hadoop inputdir outputdir jar HadoopExam.jar HadoopDriver
3. Access Mostly Uused Products by 50000+ Subscribers
4. hadoop jar HadoopExam.jar HadoopDriver inputdir outputdir
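
For reference, hadoop jar <jar> <main class> <arguments> hands the trailing arguments to the driver's main() method, and it is the driver code itself that decides which argument is the input path and which is the output path. A minimal driver sketch using the newer API (mapper, reducer and other job details are assumed and omitted):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class HadoopDriver {
    public static void main(String[] args) throws Exception {
        // Submitted as: hadoop jar HadoopExam.jar HadoopDriver <arg0> <arg1>
        Job job = Job.getInstance(new Configuration(), "market data analysis");
        job.setJarByClass(HadoopDriver.class);
        // Mapper, reducer and output key/value classes would be configured here (omitted).
        FileInputFormat.addInputPath(job, new Path(args[0]));   // first argument used as input
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // second argument used as output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}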


Question : To analyze the website clicks of HadoopExam.com you have written a MapReduce job which
will produce a click report for each week, e.g. 53 reports for a whole year. Which of the following Hadoop API classes must you use
so that one output file is generated per week and the output data goes into the corresponding output file?
1. Hive
2. MapReduce Chaining
3. Access Mostly Uused Products by 50000+ Subscribers
4. Partitioner


Question : Reducers are generally used to write the job output data to the desired location or database.
In your ETL MapReduce job you set the number of reducers to zero. Select the correct statement that applies.
1. You cannot configure the number of reducers
2. No reduce tasks execute. The output of each map task is written to a separate file in HDFS
3. Access Mostly Uused Products by 50000+ Subscribers
4. You cannot configure the number of reducers; it is decided by the TaskTracker at runtime
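
As background on the mechanism being asked about, the number of reduce tasks is configured on the Job object in the driver. A minimal sketch of a map-only job (class name and paths are assumptions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyEtlDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only ETL");
        job.setJarByClass(MapOnlyEtlDriver.class);
        // The mapper class and its output types would be configured here (omitted).
        job.setNumReduceTasks(0); // zero reducers: no shuffle, sort or reduce phase runs
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // With zero reducers, each map task writes its output directly to HDFS
        // as its own part-m-NNNNN file under the output directory.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}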


Question : In the QuickTechie website log file named MAIN.PROFILES.log, the keys are strings of the form (ipaddress+location) and the values are numbers of clicks (int).
For each unique key, you want to find the average of all values associated with that key. In writing a MapReduce program to accomplish this, can you take advantage of a combiner?
1. No, the best way to accomplish this is to use Apache Pig
2. No, the best way to accomplish this is to use MapReduce chaining.
3. Access Mostly Uused Products by 50000+ Subscribers
4. Yes
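
As general background on the technique this question probes (not the graded answer): an average is not directly combinable, so the usual pattern is for the mapper and combiner to emit partial sums together with counts, and for the reducer to divide only at the end. A minimal combiner sketch, assuming the mapper emits Text values of the hypothetical form "clicks,1" per record:

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class AverageCombiner extends Reducer<Text, Text, Text, Text> {

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        long count = 0;
        for (Text value : values) {
            // Each value is a partial "sum,count" pair from the mapper or an earlier combine pass.
            String[] parts = value.toString().split(",");
            sum += Long.parseLong(parts[0]);
            count += Long.parseLong(parts[1]);
        }
        // Emit the merged partial aggregate in the same "sum,count" format the reducer expects.
        context.write(key, new Text(sum + "," + count));
    }
}

The final reducer merges the pairs in the same way and emits sum / count; a combiner that simply averaged the values it received would give a wrong result whenever the records for a key are spread unevenly across map tasks.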


Question : On our website www.HadoopExam.com we have millions of profiles, and we have created ETL jobs for processing these files.
You have submitted an ETL MapReduce job that analyzes the HadoopExam.com website log file and also loads the combined profile data into Hadoop,
and you notice in the JobTracker's Web UI that the Mappers are 80% complete
while the Reducers are 20% complete. What is the best explanation for this?
1. The progress attributed to the reducer refers to the transfer of data from completed Mappers.
2. The progress attributed to the reducer refers to the transfer of data from Mappers that is still going on.
3. Access Mostly Uused Products by 50000+ Subscribers
4. The progress attributed to the reducer refers to the transfer of data from Mappers and cannot be predicted.


Question : In your MapReduce job, you have three configuration parameters.
What is the correct or best way to pass these three configuration parameters to a mapper or reducer?
1. As key pairs in the Configuration object.
2. As value pairs in the Configuration object.
3. Access Mostly Uused Products by 50000+ Subscribers
4. Not possible
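
As background on the mechanism this question is about, parameters are normally placed into the job's Configuration object in the driver (for example with conf.set(...)) and read back inside the mapper or reducer through the task context. A minimal mapper-side sketch with hypothetical parameter names:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ConfiguredMapper extends Mapper<LongWritable, Text, Text, Text> {

    private String paramOne;
    private int paramTwo;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Read the parameters the driver set on the job Configuration before submission,
        // e.g. conf.set("hadoopexam.param.one", "...") and conf.setInt("hadoopexam.param.two", 3).
        Configuration conf = context.getConfiguration();
        paramOne = conf.get("hadoopexam.param.one", "default");
        paramTwo = conf.getInt("hadoopexam.param.two", 0);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The configuration values are now available to every map() call in this task.
        if (value.toString().length() > paramTwo) {
            context.write(new Text(paramOne), value);
        }
    }
}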