Question : Which one of the following statements is true regarding a MapReduce job?
1. The job's Partitioner shuffles and sorts all (key, value) pairs and sends the output to all Reducers
2. The default HashPartitioner sends key-value pairs with the same key to the same Reducer
3. (option text unavailable)
4. The Mapper must sort its output of (key, value) pairs in descending order based on value
Question : Which best describes what the map method accepts and emits?
1. It accepts a single key-value pair as input and emits a single key and a list of corresponding values as output.
2. It accepts a single key-value pair as input and can emit only one key-value pair as output.
3. (option text unavailable)
4. It accepts a single key-value pair as input and can emit any number of key-value pairs as output, including zero.
Correct Answer : 4
Explanation: public class Mapper extends Object — maps input key/value pairs to a set of intermediate key/value pairs. Maps are the individual tasks which transform input records into intermediate records. The transformed intermediate records need not be of the same type as the input records. A given input pair may map to zero or many output pairs.
Reference: org.apache.hadoop.mapreduce.Mapper
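The zero-to-many contract above can be illustrated without Hadoop on the classpath. A minimal plain-Java sketch (the `map` helper and the list of String[] pairs are hypothetical stand-ins for Hadoop's `Mapper.map` and `Context.write`):

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for one Hadoop map() call: one input record in,
// zero or more (word, "1") pairs out. No Hadoop dependency.
public class MapContract {
    // Emulates Context.write by collecting emitted pairs in a list.
    static List<String[]> map(long offset, String line) {
        List<String[]> emitted = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                emitted.add(new String[] { token, "1" });
            }
        }
        return emitted; // an empty list (blank input line) is a valid map output
    }

    public static void main(String[] args) {
        System.out.println(map(0L, "to be or not to be").size()); // 6
        System.out.println(map(19L, "   ").size());               // 0
    }
}
```

The blank-line case is the point of answer 4: the framework accepts zero emitted pairs just as readily as many.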
Question : You are developing a combiner that takes Text keys and IntWritable values as input and emits Text keys and IntWritable values. Which interface should your class implement?
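A combiner implements the Reducer interface (in the new API, it extends the Reducer class) with matching input and output types. The local aggregation it performs can be sketched in plain Java, without Hadoop, using ordinary strings and ints as stand-ins for Text and IntWritable:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of combiner-style local aggregation: sums the values for each
// key emitted by a single mapper, before anything crosses the network.
public class CombinerSketch {
    static Map<String, Integer> combine(List<String[]> mapOutput) {
        Map<String, Integer> summed = new LinkedHashMap<>();
        for (String[] pair : mapOutput) {
            summed.merge(pair[0], Integer.parseInt(pair[1]), Integer::sum);
        }
        return summed;
    }

    public static void main(String[] args) {
        List<String[]> pairs = List.of(
            new String[] { "be", "1" }, new String[] { "to", "1" },
            new String[] { "be", "1" });
        System.out.println(combine(pairs)); // {be=2, to=1}
    }
}
```

Because the combiner's output feeds the same Reducer as uncombined map output would, its input and output types must be identical — which is why the question fixes both at Text/IntWritable.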
1. The default input format is XML. Developers can specify other input formats as appropriate if XML is not the correct input
2. There is no default input format; the input format must always be specified.
3. (option text unavailable)
4. The default input format is TextInputFormat, with the byte offset as the key and the entire line as the value
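The record semantics of TextInputFormat (option 4) can be reproduced in plain Java; `splitLines` below is a hypothetical helper, not a Hadoop API, and assumes input with no trailing newline:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Mirrors TextInputFormat's records: the key is the byte offset of each
// line within the file, the value is the line's text.
public class OffsetLines {
    static List<Object[]> splitLines(String file) {
        List<Object[]> records = new ArrayList<>();
        long offset = 0;
        for (String line : file.split("\n")) {
            records.add(new Object[] { offset, line });
            // +1 for the newline byte that terminated this line
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        List<Object[]> rs = splitLines("hello world\nfoo");
        System.out.println(rs.get(1)[0] + ":" + rs.get(1)[1]); // 12:foo
    }
}
```

The offsets are byte positions, not line numbers — "foo" starts at byte 12 because "hello world" occupies bytes 0-10 and the newline byte 11.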
1. To override the default input format, the Hadoop administrator has to change the default settings in the config file
2. To override the default input format, a developer has to set the new input format on the job config before submitting the job to the cluster
3. (option text unavailable)
4. None of these answers is correct
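Option 2 describes the standard approach. In the current API it is one call on the driver-side Job object; this is a configuration fragment only (it needs Hadoop on the classpath and is not runnable standalone):

```java
// Driver-side fragment: replace the default TextInputFormat before submit.
Job job = Job.getInstance(new Configuration(), "custom-input");
job.setInputFormatClass(KeyValueTextInputFormat.class);
```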
1. The most common problem with map-side joins is that they introduce a high level of code complexity. This complexity has several downsides: increased risk of bugs and performance degradation. Developers are cautioned to use map-side joins rarely.
2. The most common problem with map-side joins is a lack of available map slots, since map-side joins require a lot of mappers.
3. (option text unavailable)
4. The most common problem with map-side joins is not clearly specifying the primary index in the join. This can lead to very slow performance on large datasets.
1. No. The configuration settings in the configuration file take precedence
2. Yes. Configuration settings made through the Java API take precedence
3. (option text unavailable)
4. Only global configuration settings are captured in configuration files on the namenode. Only a very few job parameters can be set using the Java API
Question : What is the distributed cache?
1. The distributed cache is a special component on the namenode that caches frequently used data for faster client response. It is used during the reduce step
2. The distributed cache is a special component on the datanode that caches frequently used data for faster client response. It is used during the map step
3. (option text unavailable)
4. The distributed cache is a component that allows developers to deploy jars for Map-Reduce processing.
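In practice, the distributed cache ships read-only side files (or jars) to every node that runs a task for the job. A driver-side configuration fragment, assuming a hypothetical lookup file path (needs Hadoop on the classpath; not runnable standalone):

```java
// Driver-side fragment: ship a read-only side file to every task node.
Job job = Job.getInstance(new Configuration(), "join-with-lookup");
job.addCacheFile(new URI("hdfs:///lookup/countries.txt#countries"));
// In Mapper.setup(), the cached files are visible via
//   URI[] cached = context.getCacheFiles();
// and the "#countries" fragment names a local symlink in the task's
// working directory.
```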
1. Writable is a Java interface that needs to be implemented for streaming data to remote servers.
2. Writable is a Java interface that needs to be implemented for HDFS writes.
3. (option text unavailable)
4. None of these answers is correct
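For context: org.apache.hadoop.io.Writable is Hadoop's serialization interface for MapReduce keys and values, declaring write(DataOutput) and readFields(DataInput). A self-contained imitation of the pattern — the MiniWritable interface and IntPoint type are invented for illustration, not Hadoop APIs:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Minimal imitation of org.apache.hadoop.io.Writable: the same two
// methods, implemented here with no Hadoop dependency.
interface MiniWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical value type used for the round-trip demonstration.
public class IntPoint implements MiniWritable {
    int x, y;

    public void write(DataOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    public void readFields(DataInput in) throws IOException {
        x = in.readInt();
        y = in.readInt();
    }

    public static void main(String[] args) throws IOException {
        IntPoint p = new IntPoint();
        p.x = 3; p.y = 7;
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        p.write(new DataOutputStream(buf));

        IntPoint q = new IntPoint();
        q.readFields(new DataInputStream(
            new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(q.x + "," + q.y); // 3,7
    }
}
```

The round trip through a byte buffer is exactly what the framework does when it moves keys and values between map and reduce tasks.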