Question : In a MapReduce job, the reducer receives all values associated with the same key. Which statement best describes the ordering of these values?
1. The values are in sorted order.
2. The values are arbitrarily ordered, and the ordering may vary from run to run of the same MapReduce job.
3. [option missing in source]
4. Since the values come from mapper outputs, the reducers will receive contiguous sections of sorted values.
Correct Answer : 2
Explanation: Input to the Reducer is the sorted output of the mappers: the framework calls the application's reduce function once for each unique key, with the keys presented in sorted order. The values associated with a given key, however, are not sorted, and their order may differ between runs of the same job; a secondary sort must be configured if a deterministic value order is needed. Example: for the sample WordCount input, the first map emits < Hello, 1> < World, 1> < Bye, 1> < World, 1> and the second map emits < Hello, 1> < Hadoop, 1> < Goodbye, 1> < Hadoop, 1>.
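For illustration, here is a minimal WordCount-style reducer (standard Hadoop mapreduce API; the class and variable names are ours) showing how the framework hands the reducer one key at a time together with an Iterable over that key's values. For the sample input above, reduce() would be invoked with the keys Bye, Goodbye, Hadoop, Hello, World, in that sorted order, while the 1s for each key arrive in no guaranteed order:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Called once per unique key; keys arrive in sorted order, but the
    // Iterable of values for a key carries no ordering guarantee.
    public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();   // value order is arbitrary; a sum is order-independent
            }
            result.set(sum);
            context.write(key, result);
        }
    }

Because the value order is arbitrary, reducers should either compute order-independent aggregates (as the sum above does) or arrange a secondary sort.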
Question : You have just executed a MapReduce job. Where is the intermediate data written after it is emitted from the Mapper's map method?
1. Intermediate data is streamed across the network from the Mapper to the Reducer and is never written to disk.
2. Into in-memory buffers on the TaskTracker node running the Mapper that spill over and are written into HDFS.
3. Into in-memory buffers that spill over to the local file system (outside HDFS) of the TaskTracker node running the Mapper.
4. Into in-memory buffers that spill over to the local file system (outside HDFS) of the TaskTracker node running the Reducer.
5. Into in-memory buffers on the TaskTracker node running the Reducer that spill over and are written into HDFS.
Correct Answer : 3
Explanation: The mapper output (intermediate data) is stored on the local file system (NOT HDFS) of each individual mapper node. The location is typically a temporary directory that the Hadoop administrator can configure. The intermediate data is cleaned up after the Hadoop job completes.
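As a sketch of where these settings live, the snippet below prints the configuration properties that govern the map-side sort buffer and the local spill directories. The property names and defaults shown are Hadoop 2.x conventions (older releases used io.sort.mb and mapred.local.dir), so treat them as assumptions to verify against your cluster:

    import org.apache.hadoop.conf.Configuration;

    // Minimal sketch: inspect the properties that control map-side spills.
    public class SpillConfig {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // In-memory sort buffer for map output, in MB (default 100).
            System.out.println("sort buffer: "
                    + conf.get("mapreduce.task.io.sort.mb", "100") + " MB");
            // Buffer fill fraction that triggers a spill to local disk (default 0.80).
            System.out.println("spill threshold: "
                    + conf.get("mapreduce.map.sort.spill.percent", "0.80"));
            // Local (non-HDFS) directories used for intermediate spill files.
            System.out.println("local dirs: "
                    + conf.get("mapreduce.cluster.local.dir",
                               "${hadoop.tmp.dir}/mapred/local"));
        }
    }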
Question : You are developing a MapReduce job for sales reporting. The mapper will process input keys representing the year (IntWritable) and input values representing product identifiers (Text). Identify what determines the data types used by the Mapper for a given job.
1. The key and value types specified in the JobConf.setMapInputKeyClass and JobConf.setMapInputValuesClass methods
2. The data types specified in the HADOOP_MAP_DATATYPES environment variable
3. [option missing in source]
4. The InputFormat used by the job determines the mapper's input key and value types.
Correct Answer : 4
Explanation: The input types fed to the mapper are controlled by the InputFormat used. The default input format, TextInputFormat, loads data in as (LongWritable, Text) pairs: the LongWritable value is the byte offset of the line in the file, and the Text object holds the string contents of that line. Note: the data types emitted by the reducer are identified by setOutputKeyClass() and setOutputValueClass(). By default, these are assumed to be the output types of the mapper as well. If this is not the case, the setMapOutputKeyClass() and setMapOutputValueClass() methods of the JobConf class override them.
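A minimal driver sketch ties these pieces together. It uses the newer org.apache.hadoop.mapreduce Job API, which mirrors the JobConf methods named above; the class name and paths are ours for illustration. The InputFormat fixes the mapper's input types, while the output-type setters describe what the job emits:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class DriverSketch {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "types-demo");
            job.setJarByClass(DriverSketch.class);

            // The InputFormat fixes the mapper's INPUT types: TextInputFormat
            // feeds (LongWritable byteOffset, Text line) pairs to the mapper.
            job.setInputFormatClass(TextInputFormat.class);

            // Output types of the job, also assumed for the mapper's output
            // unless overridden below.
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // Override only if the mapper emits different types than the reducer:
            // job.setMapOutputKeyClass(Text.class);
            // job.setMapOutputValueClass(IntWritable.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }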
Question : Which of the following statements accurately describe sequence files in the Hadoop framework?
1. Sequence files are a type of file in the Hadoop framework that allow data to be sorted.
2. Sequence files are binary-format files that are compressed and are splittable.
3. [option missing in source]
4. All of the above
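As an illustration of option 2, the sketch below writes a block-compressed SequenceFile using the Hadoop 2.x SequenceFile.createWriter option-style API; the output path and the key/value classes are arbitrary choices for the example:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    // Minimal sketch: write one (year, product) record to a compressed,
    // splittable SequenceFile.
    public class SequenceFileDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path path = new Path("/tmp/demo.seq");  // hypothetical output path

            try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(path),
                    SequenceFile.Writer.keyClass(IntWritable.class),
                    SequenceFile.Writer.valueClass(Text.class),
                    SequenceFile.Writer.compression(SequenceFile.CompressionType.BLOCK))) {
                writer.append(new IntWritable(2023), new Text("product-42"));
            }
        }
    }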