Cloudera Hadoop Administrator Certification Certification Questions and Answer (Dumps and Practice Questions)

Question : Which of the following are the new features of MapReduceV (YARN) architecture

1. ResourceManager High Availability: YARN now allows you to use multiple ResourceManagers so that there is no single point of failure. In-flight jobs are recovered without re-running completed tasks.
2. Monitoring and enforcing memory and CPU-based resource utilization using cgroups.
3. Continuous Scheduling: This feature decouples scheduling from the node heartbeats for improved performance in large clusters
4. 1 and 2
5. 1,2 and 3

Correct Answer : 5

Explanation:

MapReduce v2 (YARN)
New Features:
ResourceManager High Availability: YARN now allows you to use multiple ResourceManagers so that there is no single point of failure. In-flight jobs are recovered without re-running completed tasks.
Monitoring and enforcing memory and CPU-based resource utilization using cgroups.
Continuous Scheduling: This feature decouples scheduling from the node heartbeats for improved performance in large clusters.

Changed Feature:
ResourceManager Restart: Persistent implementations of the RMStateStore (filesystem-based and ZooKeeper-based) allow recovery of in-flight jobs.

Question : When you submitted your MapReduce job on YARN framework, Which of the following component is responsible
for monitoring resource usage e.g. CPU,memory, disk,network on individual nodes.

1. Resource Manager
2. Application Master

3. Node Manager

4. NameNode

Correct Answer : 3

Explanation: In YARN, Task Tracker is replaced with Node Manager in YARN which is a per-machine framework agent and it is responsible for containers, monitoring their resource usage (CPU, memory, disk, network) and reporting the same to the Resource Manager. Application Master negotiates with Resource Manager to get the resources across cluster and work with the Node Managers to execute and monitor the tasks.

Question : Map the following in case of YARN

1. YARN Resource Manager
2. YARN Node Managers
3. MapReduce Application Master

a. which launch and monitor the tasks of jobs
b. allocates the cluster resources to jobs
c. which coordinates the tasks running in the MapReduce job

1. 1-a, 2-b,3-c
2. 1-b, 2-a,3-c
3. 1-c, 2-a,3-b
4. 1-a, 2-c,3-b

Correct Answer : 2

Explanation: Components of Mapreduce Job Flow:
Mapreduce job flow on YARN involves below components.
A Client node, which submits the Mapreduce job.
The YARN Resource Manager, which allocates the cluster resources to jobs.
The YARN Node Managers, which launch and monitor the tasks of jobs.
The MapReduce Application Master, which coordinates the tasks running in the MapReduce job. The application master and the MapReduce tasks run in containers that are scheduled by the resource manager, and managed by the node managers.
The HDFS file system is used for sharing job files between the above entities.

Related Questions

Question : My container is being killed by the Node Manager, why ?

1. This is likely due to high memory usage exceeding your requested container memory size.
2. you have exceeded physical memory limits
3. you have exceeded virtual memory
4. 1 and 2
5. 1,2 and 3

Question :
I have written a Hadoop MapReduce job, which uses the native libarary in the Job. So which is the best way to include the native libraries.

1. Setting -Djava.library.path on the command line while launching a container
2. use LD_LIBRARY_PATH
3. Setting -Dnative.library.path on the command line while launching a container
4. By Adding the Jar's in the Hadoop Job Jar

Question : Select the correct flow, for submitting the YARN application

1. ApplicationMaster needs to register itself with the ResourceManager
2. The client communicates with the ResourceManager using the 'ClientRMProtocol' to first acquire a new 'ApplicationId'
3. Client submits an 'Application' to the YARN Resource Manager
4. The YARN ResourceManager will then launch the ApplicationMaster (as specified) on an allocated container
5. ApplicationMaster has to signal the ResourceManager of its completion
6. ApplicationMaster communicates with the NodeManager using ContainerManager
7. ApplicationMaster can then request for and receive container

1. 7,2,1,4,3,6,5
2. 2,3,4,1,7,5,6
3. 3,2,4,1,7,6,5
4. 3,4,2,7,1,6,5

Question : Map the fillowing

1. ClientRMProtocol
2. AMRMProtocol
3. ContainerManager

A. The protocol used by the ApplicationMaster to talk to the NodeManager to start/stop containers and get status updates on the containers if needed.
B. The protocol for a client that wishes to communicate with the ResourceManager to launch a new application (i.e. the ApplicationMaster),
check on the status of the application or kill the application. For example, a job-client (a job launching program from the gateway) would use this protocol.
C. The protocol used by the ApplicationMaster to register/unregister itself to/from the ResourceManager as well as to request for resources from the Scheduler to complete its tasks.

1. 1-C, 2-A, 3-B
2. 1-A, 2-B, 3-C
3. 1-B, 2-C, 3-A
4. 1-A, 2-C, 3-B

Question : The ApplicationReport received from the ResourceManager consists of the :

1. General application information: ApplicationId, queue to which the application was submitted,
user who submitted the application and the start time for the application.
2. ApplicationMaster details: the host on which the ApplicationMaster is running, the rpc port (if any)
on which it is listening for requests from clients and a token that the client needs to communicate with the ApplicationMaster.
3. Application tracking information: If the application supports some form of progress tracking,
it can set a tracking url which is available via ApplicationReport#getTrackingUrl that a client can look at to monitor progress.
4. ApplicationStatus: The state of the application as seen by the ResourceManager is available via ApplicationReport#getYarnApplicationState.
If the YarnApplicationState is set to FINISHED, the client should refer to ApplicationReport#getFinalApplicationStatus to check for the actual
success/failure of the application task itself. In case of failures, ApplicationReport#getDiagnostics may be useful to shed some more light on the the failure.
5. All of the above

Question :

Select the correct statement for the YARN

1. The ApplicationMaster is the actual owner of the job. It will be launched by the ResourceManager
and via the client will be provided all the necessary information and resources about the job that it has been tasked with to oversee and complete.
2. As the ApplicationMaster is launched within a container that may (likely will) be sharing a physical
host with other containers, given the multi-tenancy nature, amongst other issues, it cannot make any assumptions of things like
pre-configured ports that it can listen on.
3. When the ApplicationMaster starts up, several parameters are made available to it via the environment.
These include the ContainerId for the ApplicationMaster container, the application submission time and details about
the NodeManager host running the Application Master. Ref ApplicationConstants for parameter names
4. 1 and 2
5. 1,2 and 3