
Cloudera Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : Which daemons control a Hadoop MapReduce job?
1. TaskTracker
2. NameNode
3. Access Mostly Uused Products by 50000+ Subscribers
4. JobTracker



Correct Answer : Get Latest Questions and Answers :







Question : Arrange the steps in the life cycle of a MapReduce job, based on the options below.
1. Each node runs a software daemon known as the TaskTracker
2. The client submits the MapReduce job to the JobTracker
3. The JobTracker assigns map and reduce tasks to other nodes in the cluster
4. The TaskTracker is responsible for actually instantiating the map and reduce tasks
5. The TaskTracker reports the tasks' progress back to the JobTracker
1. 1,2,3,4,5
2. 2,1,3,4,5
3. Access Mostly Uused Products by 50000+ Subscribers
4. 1,3,2,4,5

Correct Answer : Get Latest Questions and Answers :
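
The steps above describe the MRv1 flow: a client builds and submits a job, the JobTracker schedules its map and reduce tasks, and the TaskTrackers run them and report progress. As a rough illustration of the client-side submission step, here is a minimal driver sketch using the org.apache.hadoop.mapreduce API; WordMapper and SumReducer are hypothetical classes assumed to exist on the job's classpath (a SumReducer sketch appears further below), not code taken from the original questions.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");    // the client defines the job...
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordMapper.class);             // hypothetical Mapper class
        job.setReducerClass(SumReducer.class);            // hypothetical Reducer class
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // ...and submits it; the framework (the JobTracker under MRv1) then assigns
        // map and reduce tasks to worker nodes, which report progress back to it.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}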




Question :

How is a Job defined in Hadoop?

1. The execution of a Mapper or Reducer instance
2. A pair of a Mapper and a Reducer that work on the same file block
3. Access Mostly Uused Products by 50000+ Subscribers
4. None of the above


Correct Answer : Get Latest Questions and Answers :




Related Questions


Question : In the word count MapReduce algorithm, why might using a combiner (a combiner runs after the Mapper and before the Reducer)
reduce the overall job running time?
1. Combiners perform local filtering of repeated words, thereby reducing the number of key-value pairs that need to be shuffled across the network to the reducers.
2. Combiners perform global aggregation of word counts, thereby reducing the number of key-value pairs that need to be shuffled across the network to the reducers.
3. Access Mostly Uused Products by 50000+ Subscribers
4. Combiners perform local aggregation of word counts, thereby reducing the number of key-value pairs that need to be shuffled across the network to the reducers.
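
As background for the combiner question above, here is a sketch of a summing reducer that can also be registered as a combiner, since integer addition is associative and commutative. The class name SumReducer is the hypothetical one assumed in the driver sketch earlier; it is not taken from the original questions.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();
        }
        total.set(sum);
        context.write(word, total);   // ("word", total occurrences seen by this task)
    }
}

// Wiring in the driver: used as a combiner, the same class runs on the map side and
// collapses repeated ("word", 1) pairs into a single ("word", n) pair before the
// shuffle, so fewer key-value pairs cross the network to the reducers.
//   job.setCombinerClass(SumReducer.class);   // local, map-side aggregation
//   job.setReducerClass(SumReducer.class);    // final aggregation on the reduce side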


Question : The logical records that FileInputFormats define do not usually fit neatly into HDFS blocks. For example, a TextInputFormat's
logical records are lines, which will cross HDFS block boundaries more often than not. This has no bearing on the functioning of your
program (lines are not missed or broken, for example), but it is worth knowing about, as it does mean that data-local maps (that is,
maps that are running on the same host as their input data) will perform some remote reads. The slight overhead this causes is not
normally significant. You are using the latest version of Hadoop provided by Cloudera, which also includes MRv2.
You submitted a job to process a single www.HadoopExam.com log file, which is made up of two blocks, named BLOCKX and BLOCKY.
BLOCKX is on nodeA and is being processed by a Mapper running on that node. BLOCKY is on nodeB.
A record spans the two blocks: that is, the first part of the record is in BLOCKX,
but the end of the record is in BLOCKY. What happens as the record is being read by the Mapper on nodeA?
1. The remaining part of the record is streamed across the network from either nodeA or nodeB
2. The remaining part of the record is streamed across the network from nodeA
3. Access Mostly Uused Products by 50000+ Subscribers
4. The remaining part of the record is streamed across the network from nodeB
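
For the block-boundary question above, the following standalone sketch (plain Java, not Hadoop's actual LineRecordReader) illustrates the convention a line-oriented reader follows: a split that does not start at byte 0 skips its first partial line, and every reader finishes the last record it starts, even if that means reading past the end of its own split, i.e. into the next block.

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitLineReaderSketch {
    // Return the lines "owned" by the byte range [splitStart, splitEnd) of the file.
    static List<String> readLinesForSplit(byte[] file, int splitStart, int splitEnd) {
        List<String> lines = new ArrayList<>();
        int pos = splitStart;
        // A reader that does not own the start of the file skips its partial first
        // line; that line belongs to the previous split, whose reader finishes it.
        if (splitStart > 0) {
            while (pos < file.length && file[pos - 1] != '\n') pos++;
        }
        while (pos < splitEnd) {                 // only START new records inside the split
            int lineStart = pos;
            while (pos < file.length && file[pos] != '\n') pos++;   // may run past splitEnd
            lines.add(new String(file, lineStart, pos - lineStart, StandardCharsets.UTF_8));
            pos++;                               // skip the newline
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] file = "rec1\nrec2-spans-the-boundary\nrec3\n".getBytes(StandardCharsets.UTF_8);
        int boundary = 10;   // pretend BLOCKX is bytes [0, 10) and BLOCKY is the rest
        System.out.println("Mapper on BLOCKX reads: " + readLinesForSplit(file, 0, boundary));
        System.out.println("Mapper on BLOCKY reads: " + readLinesForSplit(file, boundary, file.length));
    }
}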


Question : If you run the word count MapReduce program with m map tasks and r reduce tasks,
how many output files will you get at the end of the job, and how many key-value pairs will there be in each file?
Assume k is the number of unique words in the input files. (The word count program reads
text input and produces output that contains every distinct word and the number of times that word occurred anywhere in the text.)
1. There will be r files, each with approximately m/r key-value pairs.
2. There will be m files, each with approximately k/r key-value pairs.
3. Access Mostly Uused Products by 50000+ Subscribers
4. There will be r files, each with approximately k/m key-value pairs.
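
As a small illustration of the relationship between reducer count and output files, the sketch below (hypothetical class name, not from the original question) requests r = 4 reduce tasks; a job configured this way writes its results to part-r-00000 through part-r-00003 in the output directory, with the distinct keys spread across those files by the partitioner.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReducerCountSketch {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "reducer count sketch");
        job.setNumReduceTasks(4);   // r = 4  ->  output files part-r-00000 .. part-r-00003
        System.out.println("Reduce tasks requested: " + job.getNumReduceTasks());
    }
}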


Question : You are processing the MAIN.PROFILE.log file generated by the Apache web server of the QuickTechie.com website using a MapReduce job.
There are 100 nodes in the cluster and 3 reducers defined. Which of the reduce tasks will process a Text key that begins with a match for the regular expression "\w+"?
1. The first reducer will process the key which satisfies the regular expression "\w+"
2. The second reducer will process the key which satisfies the regular expression "\w+"
3. Access Mostly Uused Products by 50000+ Subscribers
4. Not enough data to determine which reduce task will receive which key
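
For the partitioning question above, the default partitioner routes each key by its hash, not by any pattern it happens to match. The sketch below (hypothetical class name) calls Hadoop's HashPartitioner directly for a few sample keys with 3 reducers; it effectively computes (key.hashCode() & Integer.MAX_VALUE) % 3.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class PartitionSketch {
    public static void main(String[] args) {
        HashPartitioner<Text, IntWritable> partitioner = new HashPartitioner<>();
        for (String k : new String[] {"alpha", "beta", "gamma"}) {
            // Which of the 3 reducers receives this key depends only on its hash.
            int reducer = partitioner.getPartition(new Text(k), new IntWritable(1), 3);
            System.out.println("key '" + k + "' -> reducer " + reducer);
        }
    }
}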


Question : To process the www.HadoopExam.com MAIN.PROFILE.log file, you submit a job to a cluster running on MRv.
There are 1000 slave nodes spread across 100 racks, and you have NOT specified a rack topology script. Your job has a single reducer, which runs on node7 of rack7.
The output file it writes is small enough to fit in a single HDFS block. How does Hadoop handle writing the output file?
1. The first replica of the block will be stored in any node out of 1000 nodes.
2. The first replica of the block will be stored on node7 of Rack7 only. The other two replicas will be stored on other nodes in any rack.
3. Access Mostly Uused Products by 50000+ Subscribers
4. The first replica of the block will be stored on node7 in rack7. The other two replicas will be stored on node6 and node8 in rack7


Question :

Let's assume you have the following files in the HDFS directory called merge.
Test1.txt
hadoopexam.com Hadoop Training 1

Test2.txt
www.hadoopexam.com Hadoop YARN Training

Test3.txt
http://hadoopexam.com Amazon WebService Training

Now you run the following command
hadoop fs -getmerge merge/ output1.txt
What is the correct statement?


1. It will create a new file called output1.txt in the local file system, with the merged content from all three files
2. It will create a new file called output1.txt in the HDFS file system, with the merged content from all three files
3. Access Mostly Uused Products by 50000+ Subscribers
4. The command will succeed but will not merge the files, because what to do with the newline character is not defined.
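
As a companion to the -getmerge command above, here is a Java sketch of an equivalent API call, assuming a Hadoop 2.x release that still ships FileUtil.copyMerge (it was removed in later versions): the files under the source HDFS directory are concatenated into a single file on the destination filesystem.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class GetMergeSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);               // source: HDFS
        FileSystem local = FileSystem.getLocal(conf);         // destination: local file system
        FileUtil.copyMerge(hdfs, new Path("merge"),           // HDFS directory to merge
                           local, new Path("output1.txt"),    // single merged local file
                           false,                             // do not delete the source files
                           conf, null);                       // no extra string between files
    }
}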