1. The number of values across different keys in the iterator supplied to a single reduce method call.
2. The amount of intermediate data that must be transferred between the mapper and reducer.
3. The number of input files a mapper must process.
4. The number of output files a reducer must produce.
Correct Answer : 2
Explanation: Combiners are used to increase the efficiency of a MapReduce program. They aggregate intermediate map output locally on each mapper's output before it is sent over the network, which reduces the amount of data that must be transferred to the reducers. You can reuse your reducer code as the combiner if the operation it performs is commutative and associative. Execution of the combiner is not guaranteed: Hadoop may or may not run it, and if needed it may run it more than once. Therefore your MapReduce jobs should not depend on the combiner executing.
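As a concrete illustration, here is a minimal word-count style sketch in which the reducer, whose summation is commutative and associative, is also registered as the combiner via job.setCombinerClass. All class names, paths and the key/value types are illustrative assumptions, not part of the question.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);          // emit (word, 1) for each token
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();                  // summing is commutative and associative
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // The same class is used as combiner and reducer; Hadoop may run the
    // combiner zero, one, or several times, so the job must not rely on it.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}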
Question : Which two of the following statements are true about HDFS? Choose two answers.
A. An HDFS file that is larger than dfs.blocksize is split into blocks
B. Blocks are replicated to multiple DataNodes
C. HDFS works best when storing a large number of relatively small files
D. Block sizes for all files must be the same size
1. A,B
2. B,C
3. C,D
4. A,D
Correct Answer : 1
Explanation: A file larger than the configured block size is split into blocks, and each block is replicated to multiple DataNodes (three by default), so statements A and B are true. HDFS is designed for a modest number of large files rather than a large number of small files, so C is false. The block size is controlled by dfs.block.size (renamed to dfs.blocksize in Hadoop 2.x), typically 134217728 bytes (128 MB), and it is a per-file setting; because it can differ from file to file, D is also false.
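For instance, a minimal sketch (the path, block size, and replication values are illustrative assumptions) showing that the block size and replication factor can be chosen per file when it is created, which is why D is false:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    long blockSize = 134217728L;   // 128 MB; a file larger than this is split into blocks
    short replication = 3;         // each block is replicated to this many DataNodes
    int bufferSize = conf.getInt("io.file.buffer.size", 4096);
    FSDataOutputStream out = fs.create(
        new Path("/user/example/large-file.bin"),  // illustrative path
        true,          // overwrite if it exists
        bufferSize,
        replication,
        blockSize);    // per-file block size; need not match other files
    out.close();
  }
}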
Question : You want to populate an associative array in order to perform a map-side join. You've decided to put this information in a text file, place that file into the DistributedCache and read it in your Mapper before any records are processed. Identify which method in the Mapper you should use to implement code for reading the file and populating the associative array?
1. combine
2. map
3. init
4. configure
Correct Answer : 4
Explanation: DistributedCache can be used to distribute simple, read-only data/text files and/or more complex types such as archives, jars etc. Archives (zip, tar and tgz/tar.gz files) are un-archived at the slave nodes. Jars may be optionally added to the classpath of the tasks, a rudimentary software distribution mechanism. Files have execution permissions. Optionally users can also direct it to symlink the distributed cache file(s) into the working directory of the task.
DistributedCache tracks modification timestamps of the cache files. Clearly the cache files should not be modified by the application or externally while the job is executing. In the old (org.apache.hadoop.mapred) API, the Mapper's configure(JobConf) method is called once per task before any records are processed, which makes it the right place to read the cached file and populate the associative array:
public void configure(JobConf job) {
  try {
    // Get the local paths of the cached archives/files
    localArchives = DistributedCache.getLocalCacheArchives(job);
    localFiles = DistributedCache.getLocalCacheFiles(job);
  } catch (IOException e) {
    // configure() cannot declare checked exceptions, so wrap and rethrow
    throw new RuntimeException(e);
  }
}
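A fuller sketch under assumed names (the cached lookup file, the JoinMapper class, and the tab-separated key/value layout are illustrative, not part of the question) of using configure() to populate the associative array for a map-side join with the old API:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class JoinMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private final Map<String, String> lookup = new HashMap<String, String>();

  // Driver side (elsewhere, assumed):
  // DistributedCache.addCacheFile(new URI("/myapp/lookup.dat"), conf);

  @Override
  public void configure(JobConf job) {
    try {
      Path[] localFiles = DistributedCache.getLocalCacheFiles(job);
      if (localFiles != null && localFiles.length > 0) {
        BufferedReader reader =
            new BufferedReader(new FileReader(localFiles[0].toString()));
        String line;
        while ((line = reader.readLine()) != null) {
          // assumed tab-separated "key<TAB>value" lines in the cached file
          String[] parts = line.split("\t", 2);
          if (parts.length == 2) {
            lookup.put(parts[0], parts[1]);
          }
        }
        reader.close();
      }
    } catch (IOException e) {
      throw new RuntimeException("Could not load DistributedCache file", e);
    }
  }

  @Override
  public void map(LongWritable key, Text value,
      OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
    // Map-side join: enrich each input record with the cached lookup value, if any
    String[] fields = value.toString().split("\t", 2);
    String joined = lookup.get(fields[0]);
    if (joined != null) {
      output.collect(new Text(fields[0]), new Text(joined));
    }
  }
}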