
Cloudera Hadoop Administrator Certification Questions and Answers (Dumps and Practice Questions)



Question : Suppose cluster resources become scarce and the scheduler decides to reclaim some of the resources that were given to a running application. What happens?

1. The application stops working
2. All jobs will hang for some time
3. All jobs will be stopped and need to be restarted
4. The ResourceManager symmetrically requests back resources from the running application

Correct Answer : 4
As previously described, the ResourceManager is the master that arbitrates all the available cluster resources, thereby helping manage the distributed applications running on the YARN system. It works together with the per-node NodeManagers and the per-application ApplicationMasters.

In YARN, the ResourceManager is primarily limited to scheduling: that is, it allocates available resources in the system among the competing applications but does not concern itself with per-application state management. The scheduler handles only an overall resource profile for each application, ignoring local optimizations and internal application flow. In fact, YARN completely departs from the static assignment of map and reduce slots because it treats the cluster as a resource pool. Because of this clear separation of responsibilities, coupled with the modularity described previously, the ResourceManager is able to address the important design requirements of scalability and support for alternative programming paradigms.

In contrast to many other workflow schedulers, the ResourceManager also has the ability to symmetrically request back resources from a running application. This situation typically happens when cluster resources become scarce and the scheduler decides to reclaim some (but not all) of the resources that were given to an application.

In YARN, ResourceRequests can be strict or negotiable. This feature provides ApplicationMasters with a great deal of flexibility in how they fulfill reclamation requests: for example, by picking containers to reclaim that are less crucial for the computation, by checkpointing the state of a task, or by migrating the computation to other running containers. Overall, this scheme allows applications to preserve work, in contrast to platforms that kill containers to satisfy resource constraints. If the application is noncollaborative, the ResourceManager can, after waiting a certain amount of time, obtain the needed resources by instructing the NodeManagers to forcibly terminate containers.
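
The forcible-reclamation path described above is driven by the scheduler's preemption monitor, which is off by default. As a minimal sketch in yarn-site.xml, assuming stock Hadoop 2 property names and the CapacityScheduler's preemption policy:

<property>
  <name>yarn.resourcemanager.scheduler.monitor.enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.monitor.policies</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
</property>

With this enabled, the monitor periodically compares queue usage against guarantees and asks the scheduler to reclaim over-allocated containers, preferring polite reclamation before forcible termination.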

ResourceManager failures remain significant events affecting cluster availability. As of this writing, the ResourceManager will restart running ApplicationMasters as it recovers its state. If the framework supports restart capabilities (and most will, for routine fault tolerance), the platform will automatically restore users' pipelines.

In contrast to the Hadoop 1.0 JobTracker, it is important to mention the tasks for which the ResourceManager is not responsible. It does not track application execution flow or handle task fault tolerance; it does not provide access to the application status (a servlet that is now part of the ApplicationMaster); and it does not track previously executed jobs, a responsibility now delegated to the JobHistoryService (a daemon running on a separate node). This is consistent with the view that the ResourceManager should handle only live resource scheduling, and it helps the central YARN components scale better than the Hadoop 1.0 JobTracker.
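
The state recovery mentioned above is itself configurable. A minimal yarn-site.xml sketch, assuming stock Hadoop 2 property names and a filesystem-backed state store (the HDFS path is a hypothetical example):

<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</value>
</property>
<property>
  <name>yarn.resourcemanager.fs.state-store.uri</name>
  <value>hdfs://namenode:8020/yarn/system/rmstore</value>
</property>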





Question : Suppose you have MapReduce jobs with the following priorities:

JobName : Priority
Job1 : 60
Job2 : 70
Job3 : 20
Job4 : 40
Job5 : 10

A higher value means a higher priority, and the ResourceManager is configured with the FIFO Scheduler. The jobs are submitted in the order Job1 first, then Job2, Job3, Job4, and Job5 last, with their priorities configured. In which order will the jobs be executed?

1. Job2, Job1, Job4, Job3, Job5
2. Job5, Job4, Job3, Job2, Job1
3. Job1, Job2, Job3, Job4, Job5
4. Random order based on the data volume

Correct Answer : 3
YARN has a pluggable scheduling component. Depending on the use case and user needs, administrators may select a simple FIFO (first in, first out) scheduler, the Capacity scheduler, or the Fair scheduler. The default scheduler class is set in yarn-default.xml and can be overridden in yarn-site.xml. Information about the currently running scheduler can be found by opening the ResourceManager web UI and selecting the Scheduler option under the Cluster menu on the left (e.g., http://your_cluster:8088/cluster/scheduler). The various scheduler options are described briefly in this section.
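
For concreteness, a minimal yarn-site.xml sketch selecting the FIFO scheduler explicitly (the property name and class are standard in Hadoop 2; substitute the CapacityScheduler or FairScheduler class to change the policy):

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler</value>
</property>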

FIFO Scheduler
The original scheduling algorithm, integrated within the Hadoop version 1 JobTracker, was called the FIFO scheduler, meaning "first in, first out." The FIFO scheduler is basically a simple "first come, first served" scheduler in which the JobTracker pulls jobs from a work queue, oldest job first. The FIFO scheduler has no sense of job priority or scope. It is practical for small workloads, but it is feature-poor and can cause issues when large shared clusters are used. This is why, in the question above, the jobs run in pure submission order (Job1 through Job5) and the configured priorities are ignored.
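
As a side note, MapReduce job priority is set per job rather than cluster-wide. A hedged sketch using the Hadoop 2 property name mapreduce.job.priority, which takes enum values (the numeric priorities in the question are illustrative); a FIFO-scheduled cluster simply ignores this setting:

<!-- typically passed at submit time, e.g. -Dmapreduce.job.priority=HIGH -->
<property>
  <name>mapreduce.job.priority</name>
  <value>HIGH</value>
</property>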






Question : You have configured Hadoop MapReduce v2 with the Fair Scheduler and two queues, and one of the queues is using less than its fair share. As soon as new resources become available, they will be granted to this queue.

1. True
2. False

Correct Answer : 1

Explanation: The Fair Scheduler tries to give every queue its fair share of cluster resources over time. When a queue is running below its fair share and resources free up, the scheduler preferentially assigns the freed resources to the queue that is furthest below its share, so the statement is true.
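
A minimal sketch of a two-queue Fair Scheduler allocation file (the queue names and weights are hypothetical; the file's location is pointed to by the yarn.scheduler.fair.allocation.file property in yarn-site.xml):

<?xml version="1.0"?>
<allocations>
  <queue name="analytics">
    <weight>1.0</weight>
  </queue>
  <queue name="reporting">
    <weight>1.0</weight>
  </queue>
</allocations>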



Related Questions


Question : You are upgrading your Hadoop cluster from MRv1 to MRv2, but while creating the MRv2 cluster you did not take the disk drive configuration on the slave DataNodes seriously. Now you want the proper hardware configuration for the slave nodes. Select the correct configuration for a DataNode.

1. With a JBOD configuration, you should have a total of 12 slots, each with a 2TB disk drive
2. Only two drives, each with 12TB, is fine
3. (option not available in the source)
4. With a RAID configuration, 24 slots of 500GB disk drives is fine
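
As background for the JBOD recommendation, DataNode disks are mounted individually (not RAIDed) and listed in hdfs-site.xml. A minimal sketch, assuming hypothetical mount points /data/1 through /data/3 (the property name dfs.datanode.data.dir is standard in Hadoop 2):

<property>
  <name>dfs.datanode.data.dir</name>
  <!-- one entry per JBOD disk; the DataNode spreads block writes across them -->
  <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
</property>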





Question : You already have a cluster on Hadoop MapReduce MRv1, but now you have to upgrade it to MRv2. Your management is not agreeing to install Apache Hive, and you have to convince them to install Apache Hive on the Hadoop cluster. Which statement correctly describes the relationship between MapReduce and Apache Hive?

1. Apache Hive comes with additional capabilities that allow certain types of data manipulation not possible with MapReduce.
2. Apache Hive programs rely on MapReduce but are extensible, allowing developers to do special-purpose processing not provided by MapReduce.
3. (option not available in the source)
4. Apache Hive comes with no additional capabilities beyond MapReduce; Hive programs are executed as MapReduce jobs via the Hive interpreter.


Question : What is Hive?

1. Hive is part of the Apache Hadoop project that enables in-memory analysis of real-time streams of data
2. Hive is a way to add data from the local file system to HDFS
3. (option not available in the source)
4. Hive is a part of the Apache Hadoop project that provides a SQL-like interface for data processing


Question : Which statement is true about Apache Hadoop?

1. HDFS performs best with a modest number of large files
2. No random writes are allowed to a file
3. (option not available in the source)
4. All of the above


Question : Which statement is true about storing files in HDFS?

1. Files are split into blocks
2. All the blocks of a file should remain on the same machine
3. (option not available in the source)
4. All of the above
5. 1 and 3 are correct
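
To make the block model concrete, a minimal hdfs-site.xml sketch (standard Hadoop 2 property names; the values shown are the common defaults) controlling how files are split into blocks and how those blocks are replicated across machines:

<property>
  <name>dfs.blocksize</name>
  <value>134217728</value> <!-- 128 MB: each file is split into blocks of this size -->
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value> <!-- each block is stored on three different DataNodes -->
</property>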


Question : Select the correct statement for the NameNode.

1. The NameNode daemon must be running at all times
2. The NameNode holds all its metadata in RAM for fast access
3. (option not available in the source)
4. 1, 2 and 3 are correct
5. 1 and 2 are correct