Cloudera Hadoop Administrator Certification Questions and Answers (Dumps and Practice Questions)



Question : yarn application -kill <application_id> is the correct command to kill any running MapReduce job.
1. True
2. False




Correct Answer : 1
Exp: yarn application -kill <application_id> kills any running YARN application, including MapReduce jobs.
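For reference, a minimal sketch of the workflow (the application ID below is a placeholder; real IDs come from the -list output):

    yarn application -list -appStates RUNNING    # find the ID of the running job
    yarn application -kill application_1425354354354_0001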




Question : Which of the following schedulers ensures that certain users, groups, or production applications always get sufficient resources?
When a queue contains waiting applications, it gets at least its minimum share.
1. Fair Scheduler
2. Capacity Scheduler
3. [option text missing in the source]
4. Both 1 and 2
5. Both 2 and 3


Correct Answer : 1
Exp: In addition to providing fair sharing, the Fair scheduler allows guaranteed minimum shares to be assigned to queues, which is useful for ensuring that certain users, groups, or production applications always get sufficient resources. When a queue contains waiting applications, it gets at least its minimum share; in contrast, when the queue does not need its full guaranteed share, the excess is split between other running applications. To avoid a single user flooding the cluster with hundreds of jobs, the Fair scheduler can limit the number of running applications per user and per queue through the configuration file. Under this limit, a user's applications wait in the queue until previously submitted jobs finish.
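For reference, a minimal sketch of a Fair scheduler allocation file (the queue name and all values are illustrative; the file's location is set by yarn.scheduler.fair.allocation.file):

    <allocations>
      <queue name="production">
        <!-- guaranteed minimum share for this queue -->
        <minResources>10240 mb,10 vcores</minResources>
        <!-- cap on concurrently running applications in this queue -->
        <maxRunningApps>50</maxRunningApps>
      </queue>
      <!-- default per-user limit on running applications -->
      <userMaxAppsDefault>5</userMaxAppsDefault>
    </allocations>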





Question : Select the correct statement for the Fair Scheduler

1. allows containers to request variable amounts of memory and schedules based on those requirements
2. If an application is given a container that it cannot use immediately due to a shortage of memory, it can reserve that container, and no other application can use it until the container is released.
3. [option text missing in the source]
4. 1 and 2
5. 1 and 3

Correct Answer : 4
Exp: The YARN Fair scheduler allows containers to request variable amounts of memory and schedules based on those requirements. Support for other resource specifications, such as type of CPU, is under development. To prevent multiple smaller-memory applications from starving a single large-memory application, a "reserved container" has been introduced. If an application is given a container that it cannot use immediately due to a shortage of memory, it can reserve that container, and no other application can use it until the container is released. The reserved container waits until other local containers are released and then uses this additional capacity (i.e., extra RAM) to complete the job. Only one reserved container is allowed per node. The total reserved memory amount is reported in the ResourceManager UI; a larger number means that it may take longer for new jobs to get space.
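As an illustration of variable memory requests, a job can ask for specific container sizes at submission time; a minimal sketch, assuming the driver uses ToolRunner (the jar name, class, and paths are hypothetical):

    hadoop jar analytics.jar com.example.MyDriver \
        -D mapreduce.map.memory.mb=2048 \
        -D mapreduce.reduce.memory.mb=4096 \
        /input /output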

The Capacity scheduler works best when the workloads are well known, which helps in assigning the minimum capacity. For this scheduler to work most
effectively, each queue should be assigned a minimal capacity that is less than the maximal expected workload. Within each queue, multiple applications
are scheduled using hierarchical FIFO queues similar to the approach used with the stand-alone FIFO scheduler.
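For comparison, a minimal sketch of two fixed-capacity queues in capacity-scheduler.xml (the queue names and percentages are illustrative); queue changes can be applied without restarting the ResourceManager via yarn rmadmin -refreshQueues:

    <property>
      <name>yarn.scheduler.capacity.root.queues</name>
      <value>prod,dev</value>
    </property>
    <property>
      <!-- guaranteed share, as a percent of the parent queue -->
      <name>yarn.scheduler.capacity.root.prod.capacity</name>
      <value>70</value>
    </property>
    <property>
      <name>yarn.scheduler.capacity.root.dev.capacity</name>
      <value>30</value>
    </property>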




Related Questions


Question : What is Pig?
1. Pig is a subset of the Hadoop API for data processing
2. Pig is a part of the Apache Hadoop project that provides a scripting language interface for data processing
3. [option text missing in the source]
4. None of the above
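To illustrate the scripting interface, a minimal Pig Latin sketch (the input path and schema are illustrative) that counts log records per user; Pig compiles this into one or more MapReduce jobs:

    -- load a tab-separated log file
    logs   = LOAD '/log/QT/QT31012015.log'
             USING PigStorage('\t') AS (user:chararray, action:chararray);
    byUser = GROUP logs BY user;
    counts = FOREACH byUser GENERATE group AS user, COUNT(logs) AS n;
    STORE counts INTO '/log/QT/user_counts';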



Question : What is distributed cache?
1. The distributed cache is a special component on the NameNode that caches frequently used data for faster client response. It is used during the reduce step.
2. The distributed cache is a special component on the DataNode that caches frequently used data for faster client response. It is used during the map step.
3. [option text missing in the source]
4. The distributed cache is a component that allows developers to deploy jars for Map-Reduce processing.
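As an illustration of the jar-deployment use, a minimal sketch using the standard -files and -libjars generic options, which ship a side file and an extra jar to every task via the distributed cache (the jar, class, and file names are hypothetical, and the driver must use ToolRunner):

    hadoop jar analytics.jar com.example.MyDriver \
        -files hdfs:///ref/lookup.txt \
        -libjars /local/lib/geoip.jar \
        /input /output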




Question : You already have a cluster on Hadoop MapReduce MRv1, and now you have to upgrade it to MRv2, but your management does not agree to install Pig, and you have to convince them to install Apache Pig on the Hadoop cluster. Which is the correct statement you can use to show the relationship between MapReduce and Apache Pig?
1. Apache Pig relies on MapReduce, which allows it to do special-purpose processing not provided by MapReduce.
2. Apache Pig comes with no additional capabilities to MapReduce. Pig programs are executed as MapReduce jobs via the Pig interpreter.
3. [option text missing in the source]
4. Apache Pig comes with the additional capability of allowing you to control the flow of multiple MapReduce jobs.




Question : Prior to Hadoop 2.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole would be unavailable until the NameNode was either restarted or brought up on a separate machine. Select all the actions you can accomplish once you implement HDFS High Availability on your Hadoop cluster.
1. Automatically replicate data between Active and Passive Hadoop clusters.
2. Manually 'fail over' between Active and Passive NameNodes.
3. Automatically 'fail over' between Active and Passive NameNodes if the Active one goes down.
4. Shut the Active NameNode down for maintenance without disturbing the cluster.
5. Increase the parallelism in the existing cluster.

1. 1,3,4
2. 2,3,4
3. [option text missing in the source]
4. All 1,2,3,4,5
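For reference, manual HA operations are exposed through hdfs haadmin; a minimal sketch (the NameNode IDs nn1 and nn2 are illustrative and are defined by dfs.ha.namenodes.<nameservice>):

    hdfs haadmin -getServiceState nn1    # report active/standby state
    hdfs haadmin -failover nn1 nn2       # graceful manual failover from nn1 to nn2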



Question : You have a website www.QuickTechie.com, where you have one month of user profile update logs. Now, for classification analysis, you want to save all the data in a single file called QT31012015.log, which is approximately 30 GB in size. You push this full file to a path on HDFS called /log/QT/QT31012015.log. Select the correct statement about pushing the file to HDFS.
1. The client queries the NameNode, which returns information on which DataNodes to use, and the client writes to those DataNodes
2. The client writes immediately to DataNodes based on the cluster's rack locality settings
3. [option text missing in the source]
4. The client writes immediately to DataNodes at random
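For reference, the push itself is a single client command; a minimal sketch, assuming the log file sits in the local working directory:

    hdfs dfs -mkdir -p /log/QT
    hdfs dfs -put QT31012015.log /log/QT/QT31012015.log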





Question : Select the appropriate way by which the NameNode gets to know all the available DataNodes in the Hadoop cluster.
1. DataNodes are listed in the dfs.hosts file. The NameNode uses that as the definitive list of available DataNodes.
2. DataNodes heartbeat in to the master on a time-interval basis.
3. [option text missing in the source]
4. The NameNode broadcasts a heartbeat on the network on a regular basis, and DataNodes respond.
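For reference, the set of DataNodes the NameNode currently knows about (learned from their heartbeats) can be listed with:

    hdfs dfsadmin -report    # live/dead DataNodes as seen by the NameNode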