4. You can choose any range, and it will be derived based on the relative value.
Correct Answer : Explanation : mapred.reduce.tasks : The default number of reduce tasks per job. Typically set to 99% of the cluster's reduce capacity, so that if a node fails the reduces can still be executed in a single wave. Ignored when mapred.job.tracker is "local".
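For illustration only, the same setting can be overridden per job in the driver code. The sketch below assumes the classic org.apache.hadoop.mapred API; the class name ReduceCountDemo and the reliance on the default identity mapper/reducer are hypothetical choices, not part of the question.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ReduceCountDemo {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(ReduceCountDemo.class);
        conf.setJobName("reduce-count-demo");
        // Per-job equivalent of mapred.reduce.tasks; overrides the
        // cluster-wide default. Ignored when mapred.job.tracker is
        // "local", where a single in-process reducer always runs.
        conf.setNumReduceTasks(20);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf); // identity map/reduce by default
    }
}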
Question : You have a MapReduce job that creates unique data sets and finally inserts each record into a JDBC database table. The reducer is responsible for writing the data to the database. There is a chance that your cluster becomes very heavily loaded, so that a few map tasks and reduce tasks fail in between and are re-launched on different nodes. Which statement is correct for the above scenario?
1. To avoid slowness, we should enable speculative execution
Correct Answer : Explanation : Enabling speculative execution would launch a duplicate copy of a slow reduce task on another node, and that duplicate would also write its records to the JDBC tables. Since the original task writes the same data as well, the table ends up with duplicate rows, which is wrong because each record must be inserted exactly once. For jobs whose reducers have non-idempotent side effects such as database writes, speculative execution should be disabled.
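A minimal sketch of the fix implied by the explanation: turn speculative execution off for this job, so the framework never launches a second, duplicate attempt of a still-running task. It assumes the classic mapred API; JdbcLoadDriver is a hypothetical class name, and the actual JDBC-writing reducer is omitted.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class JdbcLoadDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(JdbcLoadDriver.class);
        conf.setJobName("jdbc-load");
        // The reducer inserts rows into an external database, a side
        // effect that is not idempotent: a speculative duplicate attempt
        // would insert the same rows twice.
        conf.setMapSpeculativeExecution(false);    // mapred.map.tasks.speculative.execution
        conf.setReduceSpeculativeExecution(false); // mapred.reduce.tasks.speculative.execution
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        // A real job would also configure the reducer that performs the
        // JDBC writes (e.g. via DBOutputFormat); omitted to keep this short.
        JobClient.runJob(conf);
    }
}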
1. RecordWriter writes the key-value pairs to the output files
2. The TextOutputFormat.LineRecordWriter implementation requires a java.io.DataOutputStream object to write the key-value pairs to the HDFS/MapR-FS file system (see the sketch after this list)
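To make statements 1 and 2 concrete, below is a simplified RecordWriter modeled on TextOutputFormat.LineRecordWriter: it wraps a DataOutputStream and emits one "key TAB value NEWLINE" line per record. This is an illustrative stand-in, not the actual Hadoop source.

import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Simplified line-oriented RecordWriter: serializes each key-value pair
// to the underlying stream, which the output format would have opened
// on HDFS/MapR-FS.
public class TabSeparatedRecordWriter extends RecordWriter<Text, Text> {
    private final DataOutputStream out;

    public TabSeparatedRecordWriter(DataOutputStream out) {
        this.out = out;
    }

    @Override
    public void write(Text key, Text value) throws IOException {
        out.write(key.getBytes(), 0, key.getLength());
        out.writeByte('\t');
        out.write(value.getBytes(), 0, value.getLength());
        out.writeByte('\n');
    }

    @Override
    public void close(TaskAttemptContext context) throws IOException {
        out.close();
    }
}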
1. Each reducer takes as input one partition, generated and assigned by the Hadoop framework, and processes one key with its iterable list of values at a time.
2. The reducer generates its output as a partitioned file named in the format part-r-0000x, as illustrated in the sketch below.
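As a minimal illustration of both statements, the hypothetical reducer below receives one key together with an Iterable of all the values the framework grouped into its partition for that key; whatever it writes through the Context ends up in that reducer's part-r-0000x output file.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Each call to reduce() handles exactly one key and the iterable list of
// its values; the framework invokes it once per key in this reducer's
// partition. Output goes to part-r-00000, part-r-00001, ... (one file
// per reducer).
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}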