Question : In MapReduce V, select the correct order of the steps of job submission.
A. Instantiation of the JobClient object
B. Submitting the job to the JobTracker by the JobClient
C. JobTracker instantiates a job object
D. TaskTracker launches a task, which in turn can run a map or reduce task
E. Tasks update the TaskTracker with status and counters
1. B,A,C,E,D
2. A,B,D,E,C
4. A,D,E,C,B
5. A,B,C,D,E
Correct Answer : 5 (A,B,C,D,E)
Explanation: JobClient is the primary interface through which a user job interacts with the JobTracker. JobClient provides facilities to submit jobs, track their progress, access component tasks' reports and logs, get the MapReduce cluster's status information, and so on.
The JobClient submits the job to the JobTracker, and the JobTracker then instantiates a Job object (which represents your job and its configuration). This job is handed to a TaskTracker, and the TaskTracker runs tasks such as MapTask and ReduceTask. While running, each task sends information such as its current status and counters back to the TaskTracker.
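The order described in the explanation can be traced with a toy model. This is only a sketch in plain Java: `ToyJobClient`, `ToyJobTracker`, and `ToyTaskTracker` are hypothetical stand-ins, not Hadoop classes, and the event log exists purely to make the sequence A, B, C, D, E visible.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the submission flow; none of these classes are Hadoop's.
final class SubmissionFlowSketch {
    // Shared log so the order of the five steps can be inspected.
    static final List<String> EVENTS = new ArrayList<>();

    static final class ToyTaskTracker {
        void launch(String taskKind) {
            // Step D: the TaskTracker launches a task (map or reduce).
            EVENTS.add("D: TaskTracker launches " + taskKind + " task");
            // Step E: the running task reports status and counters back.
            EVENTS.add("E: " + taskKind + " task reports status and counters");
        }
    }

    static final class ToyJobTracker {
        private final ToyTaskTracker tracker = new ToyTaskTracker();

        void submit(String jobName) {
            // Step C: the JobTracker creates its own job object.
            EVENTS.add("C: JobTracker instantiates job object for " + jobName);
            tracker.launch("map");
        }
    }

    static final class ToyJobClient {
        ToyJobClient() {
            // Step A: the client object is created first.
            EVENTS.add("A: JobClient instantiated");
        }

        void submitJob(ToyJobTracker jobTracker, String jobName) {
            // Step B: the client hands the job to the JobTracker.
            EVENTS.add("B: JobClient submits " + jobName + " to JobTracker");
            jobTracker.submit(jobName);
        }
    }

    // Runs one submission and returns a copy of the recorded events.
    static List<String> run() {
        EVENTS.clear();
        ToyJobTracker jobTracker = new ToyJobTracker();
        ToyJobClient client = new ToyJobClient();
        client.submitJob(jobTracker, "wordcount");
        return new ArrayList<>(EVENTS);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In a real MRv1 driver the first two steps are normally hidden behind a single call on `JobClient`; the toy classes only separate them so each step shows up in the log.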
1. The default input format is XML. The developer can specify other input formats as appropriate if XML is not the correct input
2. There is no default input format. The input format should always be specified.
4. The default input format is TextInputFormat, with the byte offset as the key and the entire line as the value
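The record shape named in option 4 (byte offset as key, whole line as value) can be illustrated without a cluster. The following is a simplified sketch in plain Java, not Hadoop's implementation: `LineRecordSketch` is a hypothetical helper, it splits only on `\n`, and it uses `Long` and `String` where Hadoop uses `LongWritable` and `Text`.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: mimics the (byte offset, line) records of line-oriented text input.
final class LineRecordSketch {
    static Map<Long, String> records(String text) {
        Map<Long, String> out = new LinkedHashMap<>();
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        int start = 0; // byte offset where the current line begins
        for (int i = 0; i <= bytes.length; i++) {
            boolean atEnd = (i == bytes.length);
            if (atEnd || bytes[i] == '\n') {
                // Emit the line unless it is the empty tail after a final newline.
                if (!atEnd || i > start) {
                    String line = new String(bytes, start, i - start, StandardCharsets.UTF_8);
                    out.put((long) start, line);
                }
                start = i + 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        records("hello world\nfoo\n")
            .forEach((offset, line) -> System.out.println(offset + "\t" + line));
    }
}
```

The keys are byte offsets, not line numbers or character positions, so a line that follows multi-byte UTF-8 characters starts at a larger offset than its character position would suggest.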
1. To override the default input format, the Hadoop administrator has to change the default settings in the config file
2. To override the default input format, a developer has to set the new input format on the job config before submitting the job to a cluster
4. None of these answers are correct
1. The most common problem with map-side joins is introducing a high level of code complexity. This complexity has several downsides: increased risk of bugs and performance degradation. Developers are cautioned to rarely use map-side joins.
2. The most common problem with map-side joins is a lack of available map slots, since map-side joins require a lot of mappers.
4. The most common problem with map-side joins is not clearly specifying the primary index in the join. This can lead to very slow performance on large datasets.
1. No. The configuration settings in the configuration file take precedence
2. Yes. The configuration settings set using the Java API take precedence
4. Only global configuration settings are captured in configuration files on the namenode. Only a very few job parameters can be set using the Java API
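The precedence rule in option 2 (a value set in code wins over the value loaded from a file) can be sketched with `java.util.Properties` standing in for Hadoop's configuration object. `ConfigPrecedenceSketch` and `effectiveValue` are hypothetical names, and the property keys are used only as sample strings. One caveat the sketch does not model: Hadoop lets an administrator mark a property as final in a configuration file, and such a property cannot be overridden by a job.

```java
import java.util.Properties;

// Hypothetical helper: file-loaded values act as defaults, code-set values override them.
final class ConfigPrecedenceSketch {
    static String effectiveValue(Properties fromFile, Properties fromApi, String key) {
        // File values become the fallback layer of the effective configuration.
        Properties effective = new Properties(fromFile);
        // Values set in code are stored on top and are found first on lookup.
        effective.putAll(fromApi);
        return effective.getProperty(key);
    }

    public static void main(String[] args) {
        Properties fromFile = new Properties();
        fromFile.setProperty("mapred.reduce.tasks", "1");
        Properties fromApi = new Properties();
        fromApi.setProperty("mapred.reduce.tasks", "4");
        System.out.println(effectiveValue(fromFile, fromApi, "mapred.reduce.tasks"));
    }
}
```

Keys that are set only in the file still resolve, because lookups fall through to the default layer when the code-set layer has no entry.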
Question : What is distributed cache?
1. The distributed cache is a special component on the namenode that caches frequently used data for faster client response. It is used during the reduce step
2. The distributed cache is a special component on the datanode that caches frequently used data for faster client response. It is used during the map step
4. The distributed cache is a component that allows developers to deploy jars for MapReduce processing.
1. Writable is a Java interface that needs to be implemented for streaming data to remote servers.
2. Writable is a Java interface that needs to be implemented for HDFS writes.
4. None of these answers are correct
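For context on the options above: Hadoop's `Writable` interface declares two methods, `write(DataOutput)` and `readFields(DataInput)`, which a type implements so it can be serialized to and from a byte stream. The sketch below mirrors that contract with a local stand-in so it runs without Hadoop on the classpath. `WritableLike` and `WordCountPair` are hypothetical names, and the round trip through a byte array is only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical value type: a word and its count, serialized field by field.
final class WordCountPair implements WritableLike {
    String word = "";
    int count;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(word);
        out.writeInt(count);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields must be read in the same order they were written.
        word = in.readUTF();
        count = in.readInt();
    }

    // Serializes the object to bytes and rebuilds a fresh copy from them.
    static WordCountPair roundTrip(WordCountPair original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            WordCountPair copy = new WordCountPair();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        WordCountPair pair = new WordCountPair();
        pair.word = "hadoop";
        pair.count = 3;
        WordCountPair copy = roundTrip(pair);
        System.out.println(copy.word + " " + copy.count);
    }
}

// Local stand-in with the same two methods as Hadoop's Writable interface.
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}
```

With Hadoop on the classpath, the same class body would implement `org.apache.hadoop.io.Writable` directly in place of the stand-in interface.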