
MapR (HP) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : Which of the following can you do with the Job class?

1. Create a Job instance

2. Submit the Job

3. Access Mostly Used Products by 50000+ Subscribers

4. 1,2

5. 1,2,3

Correct Answer : Get Latest Questions and Answers :
Explanation: The Job class is the job submitter's view of the job. It allows the user to configure the job, submit it, control its execution, and query its
state. The set methods only work until the job is submitted; afterwards they throw an IllegalStateException.
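To make that lifecycle concrete, here is a minimal driver sketch. It is only an illustration: the job name, the command-line input/output paths, and the use of the default (identity) Mapper and Reducer are assumptions, not anything taken from the question.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JobLifecycleSketch {
    public static void main(String[] args) throws Exception {
        // 1. Create a Job instance.
        Job job = Job.getInstance(new Configuration(), "job-lifecycle-sketch");
        job.setJarByClass(JobLifecycleSketch.class);

        // Configure it; the set methods only work until the job is submitted.
        // (No Mapper/Reducer is set, so the identity defaults are used.)
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // 2. Submit the job and 3. wait for it to finish / query its state.
        boolean success = job.waitForCompletion(true);  // true = print progress
        System.exit(success ? 0 : 1);
    }
}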




Question : You have submitted a Job and then you call a setXXX() method on that Job instance. What will happen?


1. It will set the new values on the submitted job and apply them at runtime

2. It will set the new values, which will be applied only in Mappers and Reducers that have not yet started

3. Access Mostly Used Products by 50000+ Subscribers

4. It will not throw any error and will silently discard the new value


Correct Answer : Get Latest Questions and Answers :
Explanation: The Job class is the job submitter's view of the job. It allows the user to configure the job, submit it, control its execution, and query its
state. The set methods only work until the job is submitted; afterwards they throw an IllegalStateException.
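A small sketch of that behaviour, assuming hypothetical input/output paths passed on the command line: once submit() has been called, a later setNumReduceTasks() call fails with an IllegalStateException.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SetAfterSubmitSketch {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "set-after-submit");
        job.setJarByClass(SetAfterSubmitSketch.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.submit();                   // non-blocking; the job leaves the DEFINE state

        try {
            job.setNumReduceTasks(5);   // configuring after submission...
        } catch (IllegalStateException e) {
            // ...is rejected: set methods only work until the job is submitted.
            System.err.println("Too late to reconfigure: " + e.getMessage());
        }

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}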




Question : Which of the following is true?

1. Both the submit() and waitForCompletion() methods are blocking calls

2. Both the submit() and waitForCompletion() methods are non-blocking calls

3. Access Mostly Used Products by 50000+ Subscribers

4.

Correct Answer : Get Latest Questions and Answers :
Explanation: If your aim is to run jobs in parallel, there is no risk in using job.submit(), which returns as soon as the job has been handed to the
cluster. The main reason job.waitForCompletion() exists is that it returns only when the job has finished, and it reports the job's success or failure,
which can be used to decide whether further steps should run.

If you submit several jobs but see only the first one executing, that is because Hadoop schedules jobs in FIFO order by default. You can certainly
change this behavior by configuring a different scheduler.
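A hedged sketch of the difference, treating each command-line argument as the input directory of an independent job (the "-out" output paths are made up): submit() hands every job to the cluster without waiting, and the driver then blocks on waitForCompletion() to collect the results. Whether the jobs actually run at the same time still depends on the scheduler, which is FIFO by default.

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ParallelJobsSketch {
    public static void main(String[] args) throws Exception {
        List<Job> jobs = new ArrayList<>();

        // submit() is non-blocking, so all jobs are launched before we wait on any of them.
        for (int i = 0; i < args.length; i++) {
            Job job = Job.getInstance(new Configuration(), "parallel-job-" + i);
            job.setJarByClass(ParallelJobsSketch.class);
            FileInputFormat.addInputPath(job, new Path(args[i]));
            FileOutputFormat.setOutputPath(job, new Path(args[i] + "-out"));
            job.submit();
            jobs.add(job);
        }

        // waitForCompletion() is blocking: it returns only when the job has finished,
        // and its boolean result says whether that job succeeded.
        boolean allSucceeded = true;
        for (Job job : jobs) {
            allSucceeded &= job.waitForCompletion(true);
        }
        System.exit(allSucceeded ? 0 : 1);
    }
}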



Related Questions


Question : The logical records that FileInputFormats define do not usually fit neatly into HDFS blocks. For example, a TextInputFormat's
logical records are lines, which will cross HDFS block boundaries more often than not. This has no bearing on the functioning of your
program (lines are not missed or broken, for example), but it is worth knowing about, as it does mean that data-local maps (that is,
maps that are running on the same host as their input data) will perform some remote reads. The slight overhead this causes is not
normally significant. This applies to the latest versions of Hadoop, which also include MRv2.
You submit a job to process a single www.HadoopExam.com log file, which is made up of two blocks, named BLOCKX and BLOCKY.
BLOCKX is on nodeA and is being processed by a Mapper running on that node. BLOCKY is on nodeB.
A record spans the two blocks; that is, the first part of the record is in BLOCKX,
but the end of the record is in BLOCKY. What happens as the record is being read by the Mapper on nodeA?
1. The remaining part of the record is streamed across the network from either nodeA or nodeB
2. The remaining part of the record is streamed across the network from nodeA
3. Access Mostly Used Products by 50000+ Subscribers
4. The remaining part of the record is streamed across the network from nodeB


Question : If you run the word count MapReduce program with m map tasks and r reduce tasks,
how many output files will you get at the end of the job, and how many key-value pairs will there be in each file?
Assume k is the number of unique words in the input files. (The word count program reads
text input and produces output that contains every distinct word and the number of times that word occurred anywhere in the text.)
A reference driver sketch follows the answer options.
1. There will be r files, each with approximately m/r key-value pairs.
2. There will be m files, each with approximately k/r key-value pairs.
3. Access Mostly Used Products by 50000+ Subscribers
4. There will be r files, each with approximately k/m key-value pairs.
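For reference, a minimal WordCount driver sketch (the class and job names are placeholders, and the choice of 3 reduce tasks is only an example): with r reduce tasks the job writes r output files, named part-r-00000 onwards, and the k distinct words are spread across them by the partitioner.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountSketch {

    // Emits (word, 1) for every token in the input text.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Sums the counts for each distinct word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word-count-sketch");
        job.setJarByClass(WordCountSketch.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // r = 3 reduce tasks => 3 output files: part-r-00000, part-r-00001, part-r-00002.
        job.setNumReduceTasks(3);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}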


Question : You are processing the MAIN.PROFILE.log file generated by the Apache web server of the QuickTechie.com website with a MapReduce job.
There are 100 nodes in the cluster and 3 reducers are defined. Which of the reduce tasks will process a Text key that matches the regular expression "\w+"?
(A sketch of the default partitioning logic follows the options.)
1. The first Reducer will process any key that matches the regular expression "\w+"
2. The second Reducer will process any key that matches the regular expression "\w+"
3. Access Mostly Used Products by 50000+ Subscribers
4. Not enough data to determine which reduce task will receive which key
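For background, the reduce task for a key is chosen by the partitioner, not by any regular expression; the default HashPartitioner picks it from the key's hash code. A minimal sketch of that logic, using made-up log keys:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class PartitionSketch {
    public static void main(String[] args) {
        HashPartitioner<Text, IntWritable> partitioner = new HashPartitioner<>();
        int numReduceTasks = 3;  // the question defines 3 reducers

        // Default rule: partition = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks,
        // so which reducer receives a key depends only on its hash, not on any pattern it matches.
        for (String k : new String[] {"GET", "POST", "/index.html", "404"}) {
            int partition = partitioner.getPartition(new Text(k), new IntWritable(1), numReduceTasks);
            System.out.println(k + " -> reduce task " + partition);
        }
    }
}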


Question : To process the www.HadoopExam.com MAIN.PROFILE.log file, you submit a job to a cluster running on MRv.
There are 1000 slave nodes spread across 100 racks, and you have NOT specified a rack topology script. Your job has a single Reducer, which runs on node7 of rack7.
The output file it writes is small enough to fit in a single HDFS block. How does Hadoop handle writing the output file?
1. The first replica of the block will be stored on any one of the 1000 nodes.
2. The first replica of the block will be stored on node7 of Rack7 only. The other two replicas will be stored on other nodes in any rack.
3. Access Mostly Used Products by 50000+ Subscribers
4. The first replica of the block will be stored on node7 in rack7. The other two replicas will be stored on node6 and node8 in rack7


Question : In which of the following scenarios should HBase be used? (A small client sketch follows the options.)
1. If random reads, writes, or both are required
2. If it is required to do many thousands of operations per second on multiple TB of data
3. Access Mostly Used Products by 50000+ Subscribers
4. All of the above
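The random read/write access pattern named in these options is what the HBase client API is built around. As a rough illustration, a minimal sketch using a hypothetical user_profiles table with an info column family (both names are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseRandomAccessSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("user_profiles"))) {

            // Random write: update a single cell, addressed by row key.
            Put put = new Put(Bytes.toBytes("user-42"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("city"), Bytes.toBytes("Pune"));
            table.put(put);

            // Random read: fetch one row by key, without scanning the table.
            Result result = table.get(new Get(Bytes.toBytes("user-42")));
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("info"), Bytes.toBytes("city"))));
        }
    }
}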




Question : In which scenario should HBase NOT be used?

1. You only append to your dataset, and tend to read the whole thing
2. For ad-hoc analytics
3. Access Mostly Used Products by 50000+ Subscribers
4. All of the above
5. None of the above