Question : Suppose cluster resources become scarce and the scheduler decides to reclaim some of them. What will happen?
1. It stops working 2. All jobs will hang for some time 3. All jobs will be stopped and need to be restarted 4. The ResourceManager symmetrically requests back resources from a running application
Correct Answer : 4 As previously described, the ResourceManager is the master that arbitrates all the available cluster resources, thereby helping manage the distributed applications running on the YARN system. It works together with the per-node NodeManagers and the per-application ApplicationMasters. In YARN, the ResourceManager is primarily limited to scheduling: it allocates available resources in the system among the competing applications but does not concern itself with per-application state management. The scheduler handles only an overall resource profile for each application, ignoring local optimizations and internal application flow. In fact, YARN completely departs from the static assignment of map and reduce slots because it treats the cluster as a resource pool. Because of this clear separation of responsibilities, coupled with the modularity described previously, the ResourceManager is able to address the important design requirements of scalability and support for alternative programming paradigms. In contrast to many other workflow schedulers, the ResourceManager also has the ability to symmetrically request back resources from a running application. This situation typically arises when cluster resources become scarce and the scheduler decides to reclaim some (but not all) of the resources that were given to an application.
In YARN, ResourceRequests can be strict or negotiable. This feature gives ApplicationMasters a great deal of flexibility in how they fulfill reclamation requests: for example, by picking containers to reclaim that are less crucial to the computation, by checkpointing the state of a task, or by migrating the computation to other running containers. Overall, this scheme allows applications to preserve work, in contrast to platforms that kill containers to satisfy resource constraints. If the application is noncollaborative, the ResourceManager can, after waiting a certain amount of time, obtain the needed resources by instructing the NodeManagers to forcibly terminate containers.
ResourceManager failures remain significant events affecting cluster availability. As of this writing, the ResourceManager will restart running ApplicationMasters as it recovers its state. If the framework supports restart capabilities (and most will, for routine fault tolerance), the platform will automatically restore users' pipelines. In contrast to the Hadoop 1.0 JobTracker, it is important to mention the tasks for which the ResourceManager is not responsible. Beyond not tracking application execution flow and task fault tolerance, the ResourceManager does not provide access to the application status servlet (now part of the ApplicationMaster) and does not track previously executed jobs, a responsibility that is now delegated to the JobHistoryService (a daemon running on a separate node). This is consistent with the view that the ResourceManager should handle only live resource scheduling, and it helps YARN's central components scale better than the Hadoop 1.0 JobTracker.
Question : Suppose you have MapReduce jobs Job1 through Job5, each with a configured priority (a higher value means a higher priority). The ResourceManager is configured with the FIFO Scheduler. The jobs are submitted in order: Job1 first, then Job2, Job3, Job4, and Job5 last, each with its priority configured. In which order will the jobs be executed?
1. Job2, Job1, Job4, Job3, Job5 2. Job5, Job4, Job3, Job2, Job1 3. Job1, Job2, Job3, Job4, Job5 4. Random order based on the data volume
Correct Answer : 3 YARN has a pluggable scheduling component. Depending on the use case and user needs, administrators may select a simple FIFO (first in, first out), capacity, or fair share scheduler. The default scheduler class is set in yarn-default.xml and can be overridden in yarn-site.xml. Information about the currently running scheduler can be found by opening the ResourceManager web UI and selecting the Scheduler option under the Cluster menu on the left (e.g., http://your_cluster:8088/cluster/scheduler). The various scheduler options are described briefly in this section.
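As a sketch of what that configuration looks like, the scheduler can be selected in yarn-site.xml via the `yarn.resourcemanager.scheduler.class` property; the class names below are the stock schedulers shipped with Hadoop 2.x, but verify the exact names and the default against your distribution's documentation:

```xml
<!-- yarn-site.xml: choose the pluggable scheduler implementation -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <!-- Stock alternatives (package paths under
       org.apache.hadoop.yarn.server.resourcemanager.scheduler):
       fifo.FifoScheduler, fair.FairScheduler, capacity.CapacityScheduler -->
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
```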
FIFO Scheduler The original scheduling algorithm integrated into the Hadoop version 1 JobTracker was called the FIFO scheduler, meaning "first in, first out." The FIFO scheduler is basically a simple "first come, first served" scheduler in which the JobTracker pulls jobs from a work queue, oldest job first. Typically, FIFO schedulers have no sense of job priority or scope. The FIFO scheduler is practical for small workloads, but it is feature-poor and can cause issues when large shared clusters are used.
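The FIFO behavior behind answer 3 can be sketched in a few lines of Python. This is a toy model, not YARN code: the job names match the question, the priority values are made-up examples, and the point is that a pure FIFO scheduler never consults them.

```python
from collections import deque

def fifo_order(jobs):
    """Return the execution order under FIFO scheduling.

    jobs: list of (name, priority) tuples in submission order.
    The priority value is carried along but deliberately ignored,
    which is exactly what a plain FIFO scheduler does.
    """
    queue = deque(jobs)  # oldest submission at the front
    order = []
    while queue:
        name, _priority = queue.popleft()  # priority never consulted
        order.append(name)
    return order

# Job1 submitted first, Job5 last; priorities are arbitrary examples
submitted = [("Job1", 2), ("Job2", 5), ("Job3", 1), ("Job4", 4), ("Job5", 3)]
print(fifo_order(submitted))  # -> ['Job1', 'Job2', 'Job3', 'Job4', 'Job5']
```

Even though Job2 has the highest priority value here, it still runs second, because only submission order matters to a FIFO queue.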
Question : You have configured Hadoop MapReduce v2 with the Fair Scheduler and two queues, but one of the queues is using less than its fair share. As soon as new resources become available, they will be granted to this queue. 1. True 2. False
Correct Answer : 1 The Fair Scheduler preferentially assigns freed resources to queues that are running below their fair share, so the under-served queue receives the next available resources.
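For illustration, a minimal Fair Scheduler allocation file with two queues might look like the following; the queue names and weights are hypothetical, and equal weights mean each queue's fair share is half the cluster:

```xml
<!-- fair-scheduler.xml (illustrative): two queues with equal weight.
     A queue running below its fair share is first in line for
     newly freed resources. -->
<allocations>
  <queue name="alpha">
    <weight>1.0</weight>
  </queue>
  <queue name="beta">
    <weight>1.0</weight>
  </queue>
</allocations>
```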
Question : What is Hive? 1. Hive is part of the Apache Hadoop project that enables in-memory analysis of real-time streams of data 2. Hive is a way to add data from the local file system to HDFS 3. … 4. Hive is a part of the Apache Hadoop project that provides a SQL-like interface for data processing
Correct Answer : 4 Hive provides a SQL-like query language (HiveQL) whose queries are translated into jobs that run on the Hadoop cluster, rather than performing in-memory stream analysis or file loading.