
MapR (HPE) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : Which of the following is the correct flow for a MapReduce streaming job?
1. input -> PipeMap -> PipeReduce -> Output

2. input -> PipeMap -> Map -> PipeMap -> Reduce -> Output

3. (option not shown in the source)

4. input -> Map -> PipeMap -> Reduce -> PipeReduce -> Output

Correct Answer : (not shown in the source)




Question : In a streaming MapReduce program, the reducer receives all the key-value pairs at once.
1. True
2. False

Correct Answer : (not shown in the source)
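
For context: a streaming reducer does not receive a key's values as one grouped call the way a Java Reducer does. The framework writes the sorted key-value pairs to the script's standard input one line at a time, and the script has to detect key boundaries itself. Below is a minimal sketch of such a reducer; it is written in Java here, though any executable that reads stdin works, and it assumes hadoop-streaming's default tab separator. The class name and the word-count logic are illustrative.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    // Illustrative hadoop-streaming reducer (launched with something like:
    //   hadoop jar hadoop-streaming.jar -mapper ... -reducer StreamingSumReducer ...).
    // The framework pipes sorted "key<TAB>value" lines to stdin one at a time;
    // the reducer must notice when the key changes and flush the finished group.
    public class StreamingSumReducer {
        public static void main(String[] args) throws Exception {
            BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
            String currentKey = null;
            long sum = 0;
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.split("\t", 2);        // default streaming separator
                String key = parts[0];
                long value = Long.parseLong(parts[1]);
                if (currentKey != null && !currentKey.equals(key)) {
                    System.out.println(currentKey + "\t" + sum); // key changed: flush group
                    sum = 0;
                }
                currentKey = key;
                sum += value;
            }
            if (currentKey != null) {
                System.out.println(currentKey + "\t" + sum);     // flush the last group
            }
        }
    }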




Question : In MapReduce v1, select the correct order of the steps of job submission.

A. Instantiation of JobClient object
B. Submitting job to JobTracker by JobClient
C. JobTracker instantiates a Job object
D. TaskTracker launches a task, which in turn can run a map or reduce task
E. Tasks update the TaskTracker with status and counters
1. B,A,C,E,D
2. A,B,D,E,C
3. (option not shown in the source)
4. A,D,E,C,B
5. A,B,C,D,E

Correct Answer : 5
Explanation: JobClient is the primary interface by which a user job interacts with the JobTracker. JobClient provides facilities to submit jobs, track their progress, access component tasks' reports and logs, get the MapReduce cluster's status information, and so on.

The JobClient submits the job to the JobTracker, which then instantiates a Job object (representing your job and its configuration). The job's tasks are assigned to TaskTrackers, which launch them as map or reduce tasks. While running, each task sends information back to its TaskTracker, such as its current status and counters.
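
As a sketch of that sequence with the classic org.apache.hadoop.mapred API (the input and output paths are illustrative):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class SubmitJob {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(SubmitJob.class);      // the job's configuration
            conf.setJobName("example");
            FileInputFormat.setInputPaths(conf, new Path("/in"));
            FileOutputFormat.setOutputPath(conf, new Path("/out"));

            // A: a JobClient is instantiated, B: it submits the job to the JobTracker.
            // C-E then happen inside the cluster: the JobTracker creates the job
            // object, TaskTrackers launch the map/reduce tasks, and the running
            // tasks report status and counters back to their TaskTracker.
            JobClient.runJob(conf);   // submits the job and waits for completion
        }
    }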



Related Questions


Question : What is the default input format?

1. The default input format is XML. Developers can specify other input formats as appropriate if XML is not the correct input
2. There is no default input format; the input format should always be specified.
3. (option not shown in the source)
4. The default input format is TextInputFormat, with the byte offset as the key and the entire line as the value
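
With the default TextInputFormat, the map function receives a LongWritable byte offset as the key and the line's contents as a Text value. A minimal pass-through mapper showing those types (the class name is illustrative):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // With TextInputFormat, the key is the byte offset of the line within
    // the file and the value is the entire line.
    public class LineMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            context.write(offset, line);   // pass the pair through unchanged
        }
    }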




Question : How can you overwrite the default input format?


1. In order to overwrite the default input format, the Hadoop administrator has to change the default settings in the config file
2. In order to overwrite the default input format, a developer has to set the new input format on the job config before submitting the job to the cluster
3. (option not shown in the source)
4. None of these answers are correct
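
The input format is set per job on the job configuration before submission. A short sketch using the Hadoop 2 API, switching to KeyValueTextInputFormat (the job name is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

    public class InputFormatExample {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "custom-input-format");
            // Replace the default TextInputFormat before the job is submitted.
            job.setInputFormatClass(KeyValueTextInputFormat.class);
            // ... set mapper, reducer, and paths, then job.waitForCompletion(true);
        }
    }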


Question : What are the common problems with map-side join?

1. The most common problem with map-side joins is introducing a high level of code complexity.
This complexity has several downsides: increased risk of bugs and performance degradation.
Developers are cautioned to rarely use map-side joins.
2. The most common problem with map-side joins is a lack of available map slots, since map-side joins require a lot of mappers.
3. (option not shown in the source)
4. The most common problem with map-side joins is not clearly specifying the primary index in the join. This can lead to very slow performance on large datasets.
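
For context, one common map-side join pattern (a replicated join) ships the small table to every mapper, loads it into memory in setup(), and performs the join in map(). A hedged sketch, assuming a tab-separated lookup file named lookup.txt has been shipped via the distributed cache:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Replicated map-side join: every mapper holds the small table in memory,
    // so the join happens without a shuffle or a reduce phase.
    public class MapSideJoinMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Map<String, String> smallTable = new HashMap<>();

        @Override
        protected void setup(Context context) throws IOException {
            // "lookup.txt" is assumed to be a distributed-cache symlink in the
            // task's working directory (see the distributed cache question below).
            try (BufferedReader reader = new BufferedReader(new FileReader("lookup.txt"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split("\t", 2);
                    smallTable.put(parts[0], parts[1]);
                }
            }
        }

        @Override
        protected void map(LongWritable offset, Text value, Context context)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split("\t", 2);
            String match = smallTable.get(parts[0]);
            if (match != null) {                 // inner join on the first column
                context.write(new Text(parts[0]), new Text(parts[1] + "\t" + match));
            }
        }
    }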




Question : Will settings using Java API overwrite values in configuration files?

1. No. The configuration settings in the configuration file take precedence
2. Yes. The configuration settings using Java API take precedence
3. (option not shown in the source)
4. Only global configuration settings are captured in the configuration files on the namenode. Only a very few job parameters can be set using the Java API
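
Settings applied through the Java API override the values loaded from the *-site.xml files for that job (administrators can mark a property final in the site files precisely to stop jobs from overriding it). A short sketch:

    import org.apache.hadoop.conf.Configuration;

    public class ConfigPrecedence {
        public static void main(String[] args) {
            // Loads core-default.xml, core-site.xml, etc. from the classpath.
            Configuration conf = new Configuration();
            // A programmatic set() replaces whatever the files supplied.
            conf.set("mapreduce.job.reduces", "5");
            System.out.println(conf.get("mapreduce.job.reduces"));   // prints 5
        }
    }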




Question : What is the distributed cache?
1. The distributed cache is a special component on the namenode that caches frequently used data for faster client response. It is used during the reduce step
2. The distributed cache is a special component on the datanode that caches frequently used data for faster client response. It is used during the map step
3. (option not shown in the source)
4. The distributed cache is a component that allows developers to deploy jars for MapReduce processing.
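
On the submission side, files and jars are registered on the job object, and every task can then read them from its local working directory. A sketch using the Hadoop 2 API (the HDFS paths are illustrative):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    public class DistributedCacheExample {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "with-cache");
            // Ship an HDFS file to every task; the "#lookup.txt" fragment is
            // the local symlink name the tasks will see.
            job.addCacheFile(new URI("/data/small/lookup.txt#lookup.txt"));
            // Jars can be shipped too; they are added to the task classpath.
            job.addFileToClassPath(new Path("/libs/extra.jar"));
        }
    }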




Question : What is Writable?

1. Writable is a Java interface that needs to be implemented for streaming data to remote servers.
2. Writable is a Java interface that needs to be implemented for HDFS writes.
3. (option not shown in the source)
4. None of these answers are correct
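
For reference, org.apache.hadoop.io.Writable is Hadoop's serialization interface: a type used as a MapReduce value implements it (keys additionally implement WritableComparable) by defining write() and readFields(). A minimal custom implementation; the type and its fields are illustrative:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    // A custom value type: Writable defines how the object is serialized to
    // and deserialized from Hadoop's binary format.
    public class PointWritable implements Writable {
        private int x;
        private int y;

        public PointWritable() { }                 // required no-arg constructor

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(x);                       // write fields in a fixed order
            out.writeInt(y);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            x = in.readInt();                      // read them back in the same order
            y = in.readInt();
        }
    }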