Question : Which of the following are responsibilities of the ApplicationMaster
1. Before starting any task, the job's output directory is created for the job's OutputCommitter.
2. Both map tasks and reduce tasks are created by the Application Master.
3. If the submitted job is small, the Application Master runs the job in the same JVM in which the Application Master itself is running.
4. If the job doesn't qualify as an Uber task, the Application Master requests containers for all map tasks and reduce tasks.
1. 1,2,3 2. 2,3,4 3. 1,3,4 4. 1,2,4 5. 1,2,3,4
Correct Answer : 5
Explanation: Role of an Application Master:
- Before starting any task, the job setup method is called to create the job's output directory for the job's OutputCommitter.
- As noted above, both map tasks and reduce tasks are created by the Application Master.
- If the submitted job is small, the Application Master runs the job in the same JVM in which the Application Master itself is running. This avoids the overhead of creating new containers and running tasks in parallel. Such small jobs are called Uber tasks.
- A job qualifies as an Uber task based on three configuration thresholds: the number of mappers is less than or equal to 10, the number of reducers is less than or equal to 1, and the input file size is less than or equal to one HDFS block. These thresholds can be configured via the mapreduce.job.ubertask.maxmaps, mapreduce.job.ubertask.maxreduces, and mapreduce.job.ubertask.maxbytes properties in mapred-site.xml (see the configuration sketch below).
- If the job doesn't qualify as an Uber task, the Application Master requests containers for all map tasks and reduce tasks.
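For illustration, here is a minimal sketch (assuming the standard Hadoop Java client API) of setting the three uber-task thresholds named above on a job's Configuration. The values shown are only examples, and the mapreduce.job.ubertask.enable switch is assumed to be the toggle that enables uber mode on the cluster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class UberTaskConfigSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Allow the Application Master to consider running small jobs as uber tasks
        // (assumed switch; the thresholds below mirror the properties named in the explanation).
        conf.setBoolean("mapreduce.job.ubertask.enable", true);
        conf.setInt("mapreduce.job.ubertask.maxmaps", 10);           // max mappers for an uber job
        conf.setInt("mapreduce.job.ubertask.maxreduces", 1);         // max reducers for an uber job
        conf.setLong("mapreduce.job.ubertask.maxbytes", 128L * 1024 * 1024); // roughly one HDFS block

        Job job = Job.getInstance(conf, "uber-task-config-sketch");
        // ...set mapper, reducer, input and output paths as usual, then submit the job.
    }
}
```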
Question : Which of the following are the steps followed as part of Task Execution
1. Once containers are assigned to tasks, the Application Master starts the containers by notifying their Node Managers.
2. The Application Master copies job resources (such as the job JAR file) from the HDFS distributed cache and runs the map or reduce tasks.
3. The Node Manager copies job resources (such as the job JAR file) from the HDFS distributed cache and runs the map or reduce tasks.
4. Running tasks keep reporting their progress and status (including counters) to the Application Master; the Application Master collects this progress information from all tasks, and the aggregated values are propagated to the client node or user.
1. 1,2,3 2. 2,3,4 3. 3,4,1 4. 1,3,4 5. 1,2,3,4
Correct Answer : 4
Explanation: Task Execution: Once containers are assigned to tasks, the Application Master starts the containers by notifying their Node Managers. The Node Manager copies job resources (such as the job JAR file) from the HDFS distributed cache and runs the map or reduce tasks. Running tasks keep reporting their progress and status (including counters) to the Application Master; the Application Master collects this progress information from all tasks, and the aggregated values are propagated to the client node or user.
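As a rough client-side illustration of that progress reporting, the sketch below polls the aggregated map and reduce progress through the standard org.apache.hadoop.mapreduce.Job API; the polling loop and interval are illustrative, not part of the framework.

```java
import org.apache.hadoop.mapreduce.Job;

public class ProgressMonitorSketch {
    // Submits the job and prints the aggregated progress that the Application Master
    // reports back to the client until the job finishes.
    static void submitAndMonitor(Job job) throws Exception {
        job.submit();
        while (!job.isComplete()) {
            System.out.printf("map %3.0f%%  reduce %3.0f%%%n",
                    job.mapProgress() * 100, job.reduceProgress() * 100);
            Thread.sleep(5000); // poll every five seconds (arbitrary interval)
        }
        System.out.println("Job " + (job.isSuccessful() ? "succeeded" : "failed"));
    }
}
```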
Question : Which of the following components in MRv2 maintains the history of a job
1. MapReduce Server 2. MapReduce JobHistory Server 3. Application Master 4. 2 and 3 5. 1, 2 and 3
Correct Answer : 2
Explanation: As previously described, in MapReduce 2.0 (Hadoop 0.23) the JobTracker no longer exists, and job life cycle management is now the responsibility of the short-lived Application Masters. For this reason, a new MapReduce JobHistory server was added to MR2, which maintains information about submitted MapReduce jobs after their Application Master terminates. The Resource Manager web UI forwards such requests to the JobHistory server once the Application Master has completed.
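As a hedged illustration, a client could also query the JobHistory server's REST interface directly. The host below is hypothetical; in a real cluster the address comes from the mapreduce.jobhistory.webapp.address property (19888 is the usual default port).

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class JobHistoryQuerySketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical history-server address; substitute the value of
        // mapreduce.jobhistory.webapp.address from your cluster.
        URL jobsUrl = new URL("http://historyserver.example.com:19888/ws/v1/history/mapreduce/jobs");
        HttpURLConnection conn = (HttpURLConnection) jobsUrl.openConnection();
        conn.setRequestProperty("Accept", "application/json");
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // JSON listing of completed MapReduce jobs
            }
        }
    }
}
```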
1. General application information: ApplicationId, the queue to which the application was submitted, the user who submitted the application, and the start time of the application.
2. ApplicationMaster details: the host on which the ApplicationMaster is running, the RPC port (if any) on which it is listening for requests from clients, and a token that the client needs to communicate with the ApplicationMaster.
3. Application tracking information: if the application supports some form of progress tracking, it can set a tracking URL, available via ApplicationReport#getTrackingUrl, that a client can look at to monitor progress.
4. ApplicationStatus: the state of the application as seen by the ResourceManager is available via ApplicationReport#getYarnApplicationState. If the YarnApplicationState is FINISHED, the client should refer to ApplicationReport#getFinalApplicationStatus to check the actual success or failure of the application task itself. In case of failure, ApplicationReport#getDiagnostics may shed more light on the failure (see the sketch after this list).
5. All of the above
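These fields map directly onto the ApplicationReport getters mentioned in option 4. The sketch below shows how a client might read them via YarnClient, assuming a recent Hadoop release where ApplicationId.fromString is available and an application id passed on the command line.

```java
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ApplicationReportSketch {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // e.g. args[0] = "application_1700000000000_0001" (illustrative id)
        ApplicationId appId = ApplicationId.fromString(args[0]);
        ApplicationReport report = yarnClient.getApplicationReport(appId);

        System.out.println("Queue:        " + report.getQueue());
        System.out.println("User:         " + report.getUser());
        System.out.println("Start time:   " + report.getStartTime());
        System.out.println("AM host:      " + report.getHost() + ":" + report.getRpcPort());
        System.out.println("Tracking URL: " + report.getTrackingUrl());
        System.out.println("State:        " + report.getYarnApplicationState());
        if (report.getYarnApplicationState() == YarnApplicationState.FINISHED) {
            // Only the final status distinguishes success from failure of the application itself.
            System.out.println("Final status: " + report.getFinalApplicationStatus());
            System.out.println("Diagnostics:  " + report.getDiagnostics());
        }
        yarnClient.stop();
    }
}
```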
1. The ApplicationMaster is the actual owner of the job. It is launched by the ResourceManager and is provided, via the client, all the necessary information and resources about the job it has been tasked to oversee and complete.
2. Because the ApplicationMaster is launched within a container that may (and likely will) share a physical host with other containers, given the multi-tenancy nature of the cluster, it cannot make assumptions about things like pre-configured ports it can listen on.
3. When the ApplicationMaster starts up, several parameters are made available to it via its environment. These include the ContainerId of the ApplicationMaster container, the application submission time, and details about the NodeManager host running the ApplicationMaster. Refer to ApplicationConstants for the parameter names (see the sketch after this list).
4. 1 and 2
5. 1, 2 and 3
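To illustrate statement 3, the sketch below reads the environment parameters that the NodeManager sets when it launches the ApplicationMaster container, using the constants defined in ApplicationConstants; the printed output is purely illustrative.

```java
import org.apache.hadoop.yarn.api.ApplicationConstants;

public class AmEnvironmentSketch {
    public static void main(String[] args) {
        // Environment variables set for the ApplicationMaster by the NodeManager that launched it.
        String containerId = System.getenv(ApplicationConstants.Environment.CONTAINER_ID.name());
        String nmHost = System.getenv(ApplicationConstants.Environment.NM_HOST.name());
        String nmPort = System.getenv(ApplicationConstants.Environment.NM_PORT.name());
        String nmHttpPort = System.getenv(ApplicationConstants.Environment.NM_HTTP_PORT.name());
        String submitTime = System.getenv(ApplicationConstants.APP_SUBMIT_TIME_ENV);

        System.out.println("ContainerId:          " + containerId);
        System.out.println("NodeManager host:     " + nmHost + ":" + nmPort + " (http port " + nmHttpPort + ")");
        System.out.println("Submission time (ms): " + submitTime);
    }
}
```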
Question : Select the option(s) that are correct
1. YARN takes into account all of the available compute resources on each machine in the cluster.
2. Based on the available resources, YARN negotiates resource requests from applications (such as MapReduce) running in the cluster.
3. YARN then provides processing capacity to each application by allocating containers (see the sketch below).
4. 1 and 3
5. 1, 2 and 3
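As a hedged sketch of statements 2 and 3, an ApplicationMaster can negotiate containers from the ResourceManager through the AMRMClient API. This only works when run from inside a launched ApplicationMaster container, and the resource sizes, priority, and registration values below are placeholders.

```java
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ContainerRequestSketch {
    public static void main(String[] args) throws Exception {
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(new YarnConfiguration());
        rmClient.start();

        // Register with the ResourceManager before requesting containers
        // (host, port and tracking URL are placeholders here).
        rmClient.registerApplicationMaster("", 0, "");

        // Ask for one container with illustrative resource requirements.
        Resource capability = Resource.newInstance(1024, 1); // 1024 MB, 1 vcore
        rmClient.addContainerRequest(new ContainerRequest(capability, null, null, Priority.newInstance(0)));

        // The allocate() heartbeat returns whatever containers the ResourceManager has granted so far.
        int granted = rmClient.allocate(0.0f).getAllocatedContainers().size();
        System.out.println("Containers allocated so far: " + granted);

        rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
        rmClient.stop();
    }
}
```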