Question : Which of the following are responsibilities of the ApplicationMaster
1. Before starting any task, the job's output directory is created for the job's OutputCommitter.
2. Both map tasks and reduce tasks are created by the Application Master.
3. If the submitted job is small, the Application Master runs the job in the same JVM in which the Application Master itself is running.
4. If the job doesn't qualify as an Uber task, the Application Master requests containers for all map tasks and reduce tasks.
1. 1,2,3 2. 2,3,4 3. 1,3,4 4. 1,2,4 5. 1,2,3,4
Correct Answer : 5
Explanation: Role of an Application Master:
- Before starting any task, the job setup method is called to create the job's output directory for the job's OutputCommitter.
- As noted above, both map tasks and reduce tasks are created by the Application Master.
- If the submitted job is small, the Application Master runs the job in the same JVM in which the Application Master itself is running. This avoids the overhead of creating new containers and running tasks in parallel. Such small jobs are called Uber tasks.
- A job qualifies as an Uber task based on three configuration thresholds: the number of mappers is less than or equal to 10, the number of reducers is less than or equal to 1, and the input file size is less than or equal to one HDFS block. These thresholds can be configured via the mapreduce.job.ubertask.maxmaps, mapreduce.job.ubertask.maxreduces, and mapreduce.job.ubertask.maxbytes properties in mapred-site.xml (see the configuration sketch below).
- If the job doesn't qualify as an Uber task, the Application Master requests containers for all map tasks and reduce tasks.
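For illustration, here is a minimal sketch (assuming the standard Hadoop Java client API) of setting the three uber-task thresholds named above on a job's Configuration. The values shown are only examples, and the mapreduce.job.ubertask.enable switch is assumed to be the toggle that enables uber mode on the cluster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class UberTaskConfigSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Allow the Application Master to consider running small jobs as uber tasks
        // (assumed switch; the thresholds below mirror the properties named in the explanation).
        conf.setBoolean("mapreduce.job.ubertask.enable", true);
        conf.setInt("mapreduce.job.ubertask.maxmaps", 10);           // max mappers for an uber job
        conf.setInt("mapreduce.job.ubertask.maxreduces", 1);         // max reducers for an uber job
        conf.setLong("mapreduce.job.ubertask.maxbytes", 128L * 1024 * 1024); // roughly one HDFS block

        Job job = Job.getInstance(conf, "uber-task-config-sketch");
        // ...set mapper, reducer, input and output paths as usual, then submit the job.
    }
}
```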
Question : Which of the following are the steps followed as part of Task Execution
1. Once containers are assigned to tasks, the Application Master starts the containers by notifying their Node Managers.
2. The Application Master copies job resources (such as the job JAR file) from the HDFS distributed cache and runs the map or reduce tasks.
3. The Node Manager copies job resources (such as the job JAR file) from the HDFS distributed cache and runs the map or reduce tasks.
4. Running tasks keep reporting their progress and status (including counters) to the Application Master; the Application Master collects this progress information from all tasks, and the aggregated values are propagated to the client node or user.
1. 1,2,3 2. 2,3,4 3. 3,4,1 4. 1,3,4 5. 1,2,3,4
Correct Answer : 4
Explanation: Task Execution: Once containers are assigned to tasks, the Application Master starts the containers by notifying their Node Managers. The Node Manager copies job resources (such as the job JAR file) from the HDFS distributed cache and runs the map or reduce tasks. Running tasks keep reporting their progress and status (including counters) to the Application Master; the Application Master collects this progress information from all tasks, and the aggregated values are propagated to the client node or user.
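As a rough client-side illustration of that progress reporting, the sketch below polls the aggregated map and reduce progress through the standard org.apache.hadoop.mapreduce.Job API; the polling loop and interval are illustrative, not part of the framework.

```java
import org.apache.hadoop.mapreduce.Job;

public class ProgressMonitorSketch {
    // Submits the job and prints the aggregated progress that the Application Master
    // reports back to the client until the job finishes.
    static void submitAndMonitor(Job job) throws Exception {
        job.submit();
        while (!job.isComplete()) {
            System.out.printf("map %3.0f%%  reduce %3.0f%%%n",
                    job.mapProgress() * 100, job.reduceProgress() * 100);
            Thread.sleep(5000); // poll every five seconds (arbitrary interval)
        }
        System.out.println("Job " + (job.isSuccessful() ? "succeeded" : "failed"));
    }
}
```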
Question : Which of the following components in MRv2 maintains the history of a job
1. MapReduce Server 2. MapReduce JobHistory Server 3. Application Master 4. 2 and 3 5. 1, 2 and 3
Correct Answer : 2
Explanation: As previously described, in MapReduce 2.0 (Hadoop 0.23) the JobTracker no longer exists, and job life cycle management is now the responsibility of the short-lived Application Masters. For this reason, a new MapReduce JobHistory server was added to MR2, which maintains information about submitted MapReduce jobs after their Application Master terminates. The Resource Manager web UI forwards such requests to the JobHistory server once the Application Master has completed.
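As a hedged illustration, a client could also query the JobHistory server's REST interface directly. The host below is hypothetical; in a real cluster the address comes from the mapreduce.jobhistory.webapp.address property (19888 is the usual default port).

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class JobHistoryQuerySketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical history-server address; substitute the value of
        // mapreduce.jobhistory.webapp.address from your cluster.
        URL jobsUrl = new URL("http://historyserver.example.com:19888/ws/v1/history/mapreduce/jobs");
        HttpURLConnection conn = (HttpURLConnection) jobsUrl.openConnection();
        conn.setRequestProperty("Accept", "application/json");
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // JSON listing of completed MapReduce jobs
            }
        }
    }
}
```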
1. General application information: ApplicationId, the queue to which the application was submitted, the user who submitted the application, and the start time of the application.
2. ApplicationMaster details: the host on which the ApplicationMaster is running, the RPC port (if any) on which it is listening for requests from clients, and a token that the client needs to communicate with the ApplicationMaster.
3. Application tracking information: if the application supports some form of progress tracking, it can set a tracking URL, available via ApplicationReport#getTrackingUrl, that a client can look at to monitor progress.
4. ApplicationStatus: the state of the application as seen by the ResourceManager is available via ApplicationReport#getYarnApplicationState. If the YarnApplicationState is FINISHED, the client should refer to ApplicationReport#getFinalApplicationStatus to check the actual success or failure of the application task itself. In case of failure, ApplicationReport#getDiagnostics may shed more light on the failure (see the sketch after this list).
5. All of the above
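These fields map directly onto the ApplicationReport getters mentioned in option 4. The sketch below shows how a client might read them via YarnClient, assuming a recent Hadoop release where ApplicationId.fromString is available and an application id passed on the command line.

```java
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ApplicationReportSketch {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // e.g. args[0] = "application_1700000000000_0001" (illustrative id)
        ApplicationId appId = ApplicationId.fromString(args[0]);
        ApplicationReport report = yarnClient.getApplicationReport(appId);

        System.out.println("Queue:        " + report.getQueue());
        System.out.println("User:         " + report.getUser());
        System.out.println("Start time:   " + report.getStartTime());
        System.out.println("AM host:      " + report.getHost() + ":" + report.getRpcPort());
        System.out.println("Tracking URL: " + report.getTrackingUrl());
        System.out.println("State:        " + report.getYarnApplicationState());
        if (report.getYarnApplicationState() == YarnApplicationState.FINISHED) {
            // Only the final status distinguishes success from failure of the application itself.
            System.out.println("Final status: " + report.getFinalApplicationStatus());
            System.out.println("Diagnostics:  " + report.getDiagnostics());
        }
        yarnClient.stop();
    }
}
```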
1. The ApplicationMaster is the actual owner of the job. It is launched by the ResourceManager and is provided, via the client, all the necessary information and resources about the job it has been tasked to oversee and complete.
2. Because the ApplicationMaster is launched within a container that may (and likely will) share a physical host with other containers, given the multi-tenancy nature of the cluster, it cannot make assumptions about things like pre-configured ports it can listen on.
3. When the ApplicationMaster starts up, several parameters are made available to it via its environment. These include the ContainerId of the ApplicationMaster container, the application submission time, and details about the NodeManager host running the ApplicationMaster. Refer to ApplicationConstants for the parameter names (see the sketch after this list).
4. 1 and 2
5. 1, 2 and 3
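To illustrate statement 3, the sketch below reads the environment parameters that the NodeManager sets when it launches the ApplicationMaster container, using the constants defined in ApplicationConstants; the printed output is purely illustrative.

```java
import org.apache.hadoop.yarn.api.ApplicationConstants;

public class AmEnvironmentSketch {
    public static void main(String[] args) {
        // Environment variables set for the ApplicationMaster by the NodeManager that launched it.
        String containerId = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name());
        String nmHost = System.getenv(ApplicationConstants.Environment.NM_HOST.name());
        String nmPort = System.getenv(ApplicationConstants.Environment.NM_PORT.name());
        String nmHttpPort = System.getenv(ApplicationConstants.Environment.NM_HTTP_PORT.name());
        String submitTime = System.getenv(ApplicationConstants.APP_SUBMIT_TIME_ENV);

        System.out.println("ContainerId:          " + containerId);
        System.out.println("NodeManager host:     " + nmHost + ":" + nmPort + " (http port " + nmHttpPort + ")");
        System.out.println("Submission time (ms): " + submitTime);
    }
}
```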
Question : Select the option(s) that are correct
1. YARN takes into account all of the available compute resources on each machine in the cluster.
2. Based on the available resources, YARN negotiates resource requests from applications (such as MapReduce) running in the cluster.
3. YARN then provides processing capacity to each application by allocating containers (see the sketch below).
4. 1 and 3
5. 1, 2 and 3
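As a hedged sketch of statements 2 and 3, an ApplicationMaster can negotiate containers from the ResourceManager through the AMRMClient API. This only works when run from inside a launched ApplicationMaster container, and the resource sizes, priority, and registration values below are placeholders.

```java
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ContainerRequestSketch {
    public static void main(String[] args) throws Exception {
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(new YarnConfiguration());
        rmClient.start();

        // Register with the ResourceManager before requesting containers
        // (host, port and tracking URL are placeholders here).
        rmClient.registerApplicationMaster("", 0, "");

        // Ask for one container with illustrative resource requirements.
        Resource capability = Resource.newInstance(1024, 1); // 1024 MB, 1 vcore
        rmClient.addContainerRequest(new ContainerRequest(capability, null, null, Priority.newInstance(0)));

        // The allocate() heartbeat returns whatever containers the ResourceManager has granted so far.
        int granted = rmClient.allocate(0.0f).getAllocatedContainers().size();
        System.out.println("Containers allocated so far: " + granted);

        rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
        rmClient.stop();
    }
}
```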