Question : In a MapReduce job, the reducer receives all values associated with the same key. Which statement best describes the ordering of these values?
1. The values are in sorted order.
2. The values are arbitrarily ordered, and the ordering may vary from run to run of the same MapReduce job.
3. [option missing in source]
4. Since the values come from mapper outputs, the reducers will receive contiguous sections of sorted values.
Correct Answer : 2
Explanation: Input to the Reducer is the sorted output of the mappers: the framework calls the application's reduce function once for each unique key, with the keys presented in sorted order. The values associated with a given key, however, are not sorted, and their order may differ between runs of the same job; a secondary sort must be configured if a deterministic value order is needed. Example: for the sample WordCount input, the first map emits < Hello, 1> < World, 1> < Bye, 1> < World, 1> and the second map emits < Hello, 1> < Hadoop, 1> < Goodbye, 1> < Hadoop, 1>.
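For illustration, here is a minimal WordCount-style reducer (standard Hadoop mapreduce API; the class and variable names are ours) showing how the framework hands the reducer one key at a time together with an Iterable over that key's values. For the sample input above, reduce() would be invoked with the keys Bye, Goodbye, Hadoop, Hello, World, in that sorted order, while the 1s for each key arrive in no guaranteed order:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Called once per unique key; keys arrive in sorted order, but the
    // Iterable of values for a key carries no ordering guarantee.
    public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();   // value order is arbitrary; a sum is order-independent
            }
            result.set(sum);
            context.write(key, result);
        }
    }

Because the value order is arbitrary, reducers should either compute order-independent aggregates (as the sum above does) or arrange a secondary sort.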
Question : You have just executed a MapReduce job. Where is the intermediate data written after it is emitted from the Mapper's map method?
1. Intermediate data is streamed across the network from the Mapper to the Reducer and is never written to disk.
2. Into in-memory buffers on the TaskTracker node running the Mapper that spill over and are written into HDFS.
3. Into in-memory buffers that spill over to the local file system (outside HDFS) of the TaskTracker node running the Mapper.
4. Into in-memory buffers that spill over to the local file system (outside HDFS) of the TaskTracker node running the Reducer.
5. Into in-memory buffers on the TaskTracker node running the Reducer that spill over and are written into HDFS.
Correct Answer : 3
Explanation: The mapper output (intermediate data) is stored on the local file system (NOT HDFS) of each individual mapper node. The location is typically a temporary directory that the Hadoop administrator can configure. The intermediate data is cleaned up after the Hadoop job completes.
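As a sketch of where these settings live, the snippet below prints the configuration properties that govern the map-side sort buffer and the local spill directories. The property names and defaults shown are Hadoop 2.x conventions (older releases used io.sort.mb and mapred.local.dir), so treat them as assumptions to verify against your cluster:

    import org.apache.hadoop.conf.Configuration;

    // Minimal sketch: inspect the properties that control map-side spills.
    public class SpillConfig {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // In-memory sort buffer for map output, in MB (default 100).
            System.out.println("sort buffer: "
                    + conf.get("mapreduce.task.io.sort.mb", "100") + " MB");
            // Buffer fill fraction that triggers a spill to local disk (default 0.80).
            System.out.println("spill threshold: "
                    + conf.get("mapreduce.map.sort.spill.percent", "0.80"));
            // Local (non-HDFS) directories used for intermediate spill files.
            System.out.println("local dirs: "
                    + conf.get("mapreduce.cluster.local.dir",
                               "${hadoop.tmp.dir}/mapred/local"));
        }
    }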
Question : You are developing a MapReduce job for sales reporting. The mapper will process input keys representing the year (IntWritable) and input values representing product identifiers (Text). Identify what determines the data types used by the Mapper for a given job.
1. The key and value types specified in the JobConf.setMapInputKeyClass and JobConf.setMapInputValuesClass methods
2. The data types specified in the HADOOP_MAP_DATATYPES environment variable
3. [option missing in source]
4. The InputFormat used by the job determines the mapper's input key and value types.
Correct Answer : 4
Explanation: The input types fed to the mapper are controlled by the InputFormat used. The default input format, TextInputFormat, loads data in as (LongWritable, Text) pairs: the LongWritable value is the byte offset of the line in the file, and the Text object holds the string contents of that line. Note: the data types emitted by the reducer are identified by setOutputKeyClass() and setOutputValueClass(). By default, these are assumed to be the output types of the mapper as well. If this is not the case, the setMapOutputKeyClass() and setMapOutputValueClass() methods of the JobConf class override them.
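A minimal driver sketch ties these pieces together. It uses the newer org.apache.hadoop.mapreduce Job API, which mirrors the JobConf methods named above; the class name and paths are ours for illustration. The InputFormat fixes the mapper's input types, while the output-type setters describe what the job emits:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class DriverSketch {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "types-demo");
            job.setJarByClass(DriverSketch.class);

            // The InputFormat fixes the mapper's INPUT types: TextInputFormat
            // feeds (LongWritable byteOffset, Text line) pairs to the mapper.
            job.setInputFormatClass(TextInputFormat.class);

            // Output types of the job, also assumed for the mapper's output
            // unless overridden below.
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // Override only if the mapper emits different types than the reducer:
            // job.setMapOutputKeyClass(Text.class);
            // job.setMapOutputValueClass(IntWritable.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }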
Question : Which of the following statements accurately describe sequence files in the Hadoop framework?
1. Sequence files are a type of file in the Hadoop framework that allow data to be sorted.
2. Sequence files are binary-format files that are compressed and are splittable.
3. [option missing in source]
4. All of the above
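As an illustration of option 2, the sketch below writes a block-compressed SequenceFile using the Hadoop 2.x SequenceFile.createWriter option-style API; the output path and the key/value classes are arbitrary choices for the example:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    // Minimal sketch: write one (year, product) record to a compressed,
    // splittable SequenceFile.
    public class SequenceFileDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path path = new Path("/tmp/demo.seq");  // hypothetical output path

            try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(path),
                    SequenceFile.Writer.keyClass(IntWritable.class),
                    SequenceFile.Writer.valueClass(Text.class),
                    SequenceFile.Writer.compression(SequenceFile.CompressionType.BLOCK))) {
                writer.append(new IntWritable(2023), new Text("product-42"));
            }
        }
    }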