Question : When you submit a MapReduce job on the YARN framework, which of the following components is responsible for monitoring resource usage (e.g. CPU, memory, disk, network) on individual nodes?
1. Resource Manager
2. Application Master
3. Node Manager
4. NameNode
Correct Answer : 3
Explanation: In YARN, the TaskTracker is replaced by the Node Manager, a per-machine framework agent responsible for containers, monitoring their resource usage (CPU, memory, disk, network), and reporting it to the Resource Manager. The Application Master negotiates with the Resource Manager to get resources across the cluster and works with the Node Managers to execute and monitor the tasks.
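To illustrate the Application Master side of this negotiation, here is a minimal Java sketch using the public AMRMClient API (this is not the actual MRAppMaster code; the class name AmNegotiationSketch and the 1024 MB / 1 vcore request are illustrative assumptions). The Node Manager on whichever host satisfies the request is what enforces and monitors the granted limits:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class AmNegotiationSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new YarnConfiguration();

    // The Application Master registers itself with the Resource Manager.
    AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
    rm.init(conf);
    rm.start();
    rm.registerApplicationMaster("", 0, ""); // host, RPC port, tracking URL (empty in this sketch)

    // Negotiate resources: ask the RM for one container with 1024 MB and 1 vcore.
    Resource capability = Resource.newInstance(1024, 1);
    rm.addContainerRequest(new ContainerRequest(capability, null, null, Priority.newInstance(0)));

    // Heartbeat to the RM; allocated containers come back in the response.
    AllocateResponse response = rm.allocate(0.0f);
    for (Container container : response.getAllocatedContainers()) {
      System.out.println("Allocated on node: " + container.getNodeId());
    }
  }
}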
Explanation: Components of the MapReduce job flow: a MapReduce job on YARN involves the following components. A client node, which submits the MapReduce job. The YARN Resource Manager, which allocates cluster resources to jobs. The YARN Node Managers, which launch and monitor the tasks of jobs. The MapReduce Application Master, which coordinates the tasks running in the MapReduce job. The Application Master and the MapReduce tasks run in containers that are scheduled by the Resource Manager and managed by the Node Managers. The HDFS file system is used for sharing job files between the above entities.
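To make the client-node side concrete, here is a minimal, self-contained driver sketch in Java. It uses the identity Mapper and Reducer base classes so it compiles on its own; the class name PassThroughDriver is illustrative. The waitForCompletion() call on the client node is what kicks off the submission to the Resource Manager:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PassThroughDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "pass-through");
    job.setJarByClass(PassThroughDriver.class);  // the job JAR that gets copied to HDFS
    job.setMapperClass(Mapper.class);            // identity mapper, keeps the sketch self-contained
    job.setReducerClass(Reducer.class);          // identity reducer
    job.setOutputKeyClass(LongWritable.class);   // default TextInputFormat keys: byte offsets
    job.setOutputValueClass(Text.class);         // default TextInputFormat values: lines
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // waitForCompletion(true) submits the job to the Resource Manager and polls/logs progress.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}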
Question : A developer has submitted a YARN job by calling the submitApplication() method on the Resource Manager. Select the correct order of the following steps after that:
1. The container is managed by the Node Manager after job submission. 2. The Resource Manager triggers its sub-component, the Scheduler, which allocates containers for MapReduce job execution. 3. The Resource Manager starts the Application Master in the container.
1. 2,3,1 2. 1,2,3 3. 2,1,3 4. 1,3,2
Correct Answer : 1
Explanation: Job start-up: The call to Job.waitForCompletion() in the main driver class is where all the execution starts. The driver is the only piece of code that runs on the local machine, and this call starts the communication with the Resource Manager. The client retrieves a new job ID (application ID) from the Resource Manager. The client node copies the job resources specified via the -files, -archives, and -libjars command-line arguments, as well as the job JAR file, onto HDFS. Finally, the job is submitted by calling the submitApplication() method on the Resource Manager. The Resource Manager triggers its sub-component, the Scheduler, which allocates containers for MapReduce job execution. The Resource Manager then starts the Application Master in the container provided by the Scheduler. This container is managed by the Node Manager from here onwards.
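The same sequence can be seen at the YARN API level. Below is a hedged Java skeleton using the YarnClient API, where createApplication() fetches the new application ID and submitApplication() hands the job to the Resource Manager. The application name, AM container size, and the placeholder "sleep 30" command are illustrative assumptions; a real submission would populate the ContainerLaunchContext with the Application Master's actual resources and classpath:

import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnSubmitSketch {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    // Ask the Resource Manager for a new application; this yields the application ID.
    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
    ApplicationId appId = ctx.getApplicationId();

    // In a real job, the JAR and -files/-archives/-libjars resources are copied to
    // HDFS here and wired into the launch context; this placeholder command stands
    // in for launching the Application Master.
    ctx.setApplicationName("yarn-submit-sketch");
    ctx.setResource(Resource.newInstance(512, 1)); // container size for the AM
    ctx.setAMContainerSpec(ContainerLaunchContext.newInstance(
        null, null, Collections.singletonList("sleep 30"), null, null, null));

    // submitApplication(): the Scheduler allocates a container, the RM starts the
    // Application Master in it, and a Node Manager manages it from then on.
    yarnClient.submitApplication(ctx);
    System.out.println("Submitted application " + appId);
  }
}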
1. Setting -Djava.library.path on the command line while launching a container 2. Use LD_LIBRARY_PATH 3. Setting -Dnative.library.path on the command line while launching a container 4. By adding the JARs to the Hadoop job JAR