
MapR (HP) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : Which of the following is correct for the Pseudo-Distributed mode of Hadoop?

1. This is a single machine cluster
2. All daemons run on the same machine
3. Access Mostly Uused Products by 50000+ Subscribers
4. All 1,2 and 3 are correct
5. Only 1 and 2 are correct




Correct Answer : Get Latest Questions and Answers :


Explanation: A developer will configure their machine to run in Pseudo-Distributed mode.

This effectively creates a single machine cluster.
All five Hadoop daemons run on the same machine.
This is very useful for testing code before it is deployed to the real cluster.

Refer to HadoopExam.com Recorded Training Modules 14 and 16
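
As a rough illustration (not part of the original question), the Java sketch below points a client at a pseudo-distributed cluster and lists the HDFS root directory. The hdfs://localhost:9000 address, the property values, and the class name are assumptions about a typical single-machine setup, not fixed requirements; adjust them to match your own configuration.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PseudoDistributedCheck {
        public static void main(String[] args) throws Exception {
            // In pseudo-distributed mode every daemon runs on this one machine,
            // so HDFS is reachable on localhost (port 9000 is assumed here).
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            conf.set("dfs.replication", "1"); // a single DataNode cannot hold more than one replica

            FileSystem fs = FileSystem.get(conf);
            System.out.println("Connected to " + fs.getUri());
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
        }
    }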





Question : Which daemon is responsible for the housekeeping of the NameNode?
1. JobTracker
2. TaskTracker
3. Access Mostly Uused Products by 50000+ Subscribers
4. Secondary NameNode



Correct Answer : Get Latest Questions and Answers :


Explanation: Hadoop comprises five separate daemons:
NameNode : Holds the metadata for HDFS

Secondary NameNode : Performs housekeeping functions for the NameNode
- Is not a backup or hot standby for the NameNode

DataNode : Stores actual HDFS data blocks

JobTracker : Manages MapReduce jobs, distributes individual tasks to machines running TaskTracker

TaskTracker : Instantiates and monitors individual Map and Reduce tasks

Refer to HadoopExam.com Recorded Training Modules 2 and 3
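
To make the NameNode/DataNode split concrete, here is a small, hedged sketch: the block-to-host mapping printed below is metadata served by the NameNode, while the block data itself lives on the DataNodes that are listed. The class name and the file path passed in args[0] are placeholders for this example only.

    import java.util.Arrays;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WhereAreMyBlocks {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path(args[0]); // any HDFS file, e.g. a log file

            // Block locations are metadata kept by the NameNode; the actual
            // bytes are read directly from the DataNodes returned here.
            FileStatus status = fs.getFileStatus(file);
            for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.println("offset=" + block.getOffset()
                        + " length=" + block.getLength()
                        + " hosts=" + Arrays.toString(block.getHosts()));
            }
        }
    }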




Question : Which daemon is responsible for instantiating and monitoring individual Map and Reduce tasks?
1. JobTracker
2. TaskTracker
3. Access Mostly Uused Products by 50000+ Subscribers
4. DataNode


Correct Answer : Get Latest Questions and Answers :


Explanation: Hadoop comprises five separate daemons:

NameNode : Holds the metadata for HDFS

Secondary NameNode : Performs housekeeping functions for the NameNode
- Is not a backup or hot standby for the NameNode
DataNode : Stores actual HDFS data blocks

JobTracker : Manages MapReduce jobs, distributes individual tasks to machines running TaskTracker

TaskTracker : Instantiates and monitors individual Map and Reduce tasks

Refer to HadoopExam.com Recorded Training Modules 2 and 3
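
For context, the sketch below submits a minimal MRv1 (old mapred API) job. The driver only talks to the JobTracker; the JobTracker assigns the individual Map and Reduce tasks to TaskTrackers, which instantiate and monitor them. The class name and the use of the default identity mapper and reducer are illustrative choices, not something the question requires.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class SubmitIdentityJob {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(SubmitIdentityJob.class);
            conf.setJobName("identity copy"); // default IdentityMapper/IdentityReducer are used

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            // runJob() hands the job to the JobTracker; the JobTracker schedules
            // tasks onto TaskTrackers, and each TaskTracker launches and monitors
            // its Map and Reduce tasks in child JVMs.
            JobClient.runJob(conf);
        }
    }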



Related Questions


Question : The logical records that FileInputFormats define do not usually fit neatly into HDFS blocks. For example, a TextInputFormat's
logical records are lines, which will cross HDFS block boundaries more often than not. This has no bearing on the functioning of your
program (lines are not missed or broken, for example), but it is worth knowing about, as it does mean that data-local maps (that is,
maps that are running on the same host as their input data) will perform some remote reads. The slight overhead this causes is not
normally significant. You are using the latest version of Hadoop, which also includes MR2.
You submitted a job to process a single www.HadoopExam.com log file, which is made up of two blocks, named BLOCKX and BLOCKY.
BLOCKX is on nodeA and is being processed by a Mapper running on that node. BLOCKY is on nodeB.
A record spans the two blocks; that is, the first part of the record is in BLOCKX,
but the end of the record is in BLOCKY. What happens as the record is being read by the Mapper on nodeA?
1. The remaining part of the record is streamed across the network from either nodeA or nodeB
2. The remaining part of the record is streamed across the network from nodeA
3. Access Mostly Uused Products by 50000+ Subscribers
4. The remaining part of the record is streamed across the network from nodeB
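
The question above turns on the difference between HDFS blocks and the logical records a FileInputFormat produces. As a hedged sketch (class name and paths are assumptions), the snippet below prints the input splits TextInputFormat would compute for a file: each split normally lines up with a block, but a line that starts near the end of one block finishes in the next, so the record reader simply continues past the split boundary to complete the line.

    import java.util.Arrays;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class ShowSplits {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration());
            FileInputFormat.addInputPath(job, new Path(args[0])); // e.g. the two-block log file

            // Splits follow block boundaries, but a logical record (a line) may
            // start in one block and end in the next; the mapper reading the
            // first block fetches the tail of that line from wherever the next
            // block lives, which can mean a remote read.
            for (InputSplit split : new TextInputFormat().getSplits(job)) {
                FileSplit fileSplit = (FileSplit) split;
                System.out.println(fileSplit.getPath()
                        + " start=" + fileSplit.getStart()
                        + " length=" + fileSplit.getLength()
                        + " hosts=" + Arrays.toString(fileSplit.getLocations()));
            }
        }
    }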


Question : If you run the word count MapReduce program with m map tasks and r reduce tasks,
how many output files will you get at the end of the job, and how many key-value pairs will there be in each file?
Assume k is the number of unique words in the input files. (The word count program reads
text input and produces output that contains every distinct word and the number of times that word occurred anywhere in the text.)
1. There will be r files, each with approximately m/r key-value pairs.
2. There will be m files, each with approximately k/r key-value pairs.
3. Access Mostly Uused Products by 50000+ Subscribers
4. There will be r files, each with approximately k/m key-value pairs.
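
For reference, this is roughly what the word count job in the question looks like with the newer MapReduce API; the class names, the choice of 4 reduce tasks, and the whitespace tokenization are assumptions made for the sketch. Each reduce task writes exactly one output file (part-r-00000, part-r-00001, ...), and the unique words are spread across the reduce tasks by the default hash partitioner.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE); // every mapper emits (word, 1)
                }
            }
        }

        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result); // one output record per unique word
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // r reduce tasks produce r output files (part-r-00000 ...), and the
            // unique words are hash-partitioned across them, so each file holds
            // a roughly even share of the distinct words, regardless of the
            // number of map tasks.
            job.setNumReduceTasks(4);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }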


Question : You are processing the MAIN.PROFILE.log file generated by the Apache web server of the QuickTechie.com website using a MapReduce job.
There are 100 nodes in the cluster and 3 reducers defined. Which of the reduce tasks will process a Text key that begins with the regular expression "\w+"?
1. The first reducer will process the key that satisfies the regular expression "\w+"
2. The second reducer will process the key that satisfies the regular expression "\w+"
3. Access Mostly Uused Products by 50000+ Subscribers
4. Not enough data to determine which reduce task will receive which key
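
A short, hedged sketch of the default partitioning behaviour may help here; the key strings and the class name below are made up. The default HashPartitioner picks a reduce task from the key's hash code and the number of reducers alone; nothing in the framework routes keys to a particular reducer because they match a regular expression such as "\w+".

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

    public class WhichReducer {
        public static void main(String[] args) {
            // The same key always goes to the same one of the 3 reducers, but
            // different keys that all match "\w+" land on different reducers
            // depending purely on their hash codes.
            HashPartitioner<Text, IntWritable> partitioner = new HashPartitioner<>();
            int numReducers = 3;
            for (String key : new String[] {"alpha", "beta", "gamma"}) {
                int reducer = partitioner.getPartition(new Text(key), new IntWritable(1), numReducers);
                System.out.println(key + " -> reducer " + reducer);
            }
        }
    }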


Question : To process the www.HadoopExam.com MAIN.PROFILE.log file, you submit a job to a cluster running MRv.
There are 1000 slave nodes spread across 100 racks, and you have NOT specified a rack topology script. Your job has a single Reducer, which runs on Node7 of Rack7.
The output file it writes is small enough to fit in a single HDFS block. How does Hadoop handle writing the output file?
1. The first replica of the block will be stored on any node out of the 1000 nodes.
2. The first replica of the block will be stored on node7 of Rack7 only. The other two replicas will be stored on other nodes in any rack.
3. Access Mostly Uused Products by 50000+ Subscribers
4. The first replica of the block will be stored on node7 in rack7. The other two replicas will be stored on node6 and node8 in rack7


Question : In which of the following scenarios should we use HBase?
1. If it requires random read, write, or both
2. If it requires many thousands of operations per second on multiple TB of data
3. Access Mostly Uused Products by 50000+ Subscribers
4. All of the above
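
To illustrate the "random read, write or both" point, here is a hedged sketch using the older HTable-based HBase Java client; the table name "profiles", the column family "info", and the row key are invented for the example. Both operations address a single row directly by key, with no scan over the dataset.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RandomReadWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "profiles"); // hypothetical table name

            // Random write: update a single cell by row key, no scan of the dataset.
            Put put = new Put(Bytes.toBytes("user#42"));
            put.add(Bytes.toBytes("info"), Bytes.toBytes("city"), Bytes.toBytes("Pune"));
            table.put(put);

            // Random read: fetch one row directly by key, again without scanning.
            Get get = new Get(Bytes.toBytes("user#42"));
            Result result = table.get(get);
            System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("city"))));

            table.close();
        }
    }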




Question : In which scenario should HBase not be used?

1. You only append to your dataset, and tend to read the whole thing
2. For ad-hoc analytics
3. Access Mostly Uused Products by 50000+ Subscribers
4. All of the above
5. None of the above