Question: Which YARN daemon or service monitors and controls per-application resource usage (e.g., memory, CPU)?
1. ApplicationMaster
2. NodeManager
3. …
4. ResourceManager
Explanation: An important new concept in YARN is the ApplicationMaster. The ApplicationMaster is, in effect, an instance of a framework-specific library and is responsible for negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the containers and their resource consumption. It has the responsibility of negotiating for appropriate resource containers from the ResourceManager, tracking their status, and monitoring progress.

The ApplicationMaster design enables YARN to offer the following important new features:

Scale: The ApplicationMaster provides much of the job-oriented functionality of the JobTracker so that the entire system can scale more dramatically. Simulations have shown that jobs may scale to 10,000-node clusters composed of modern hardware without significant issue. As a pure scheduler, the ResourceManager does not, for example, have to provide fault tolerance for resources across the cluster. By shifting fault tolerance to the ApplicationMaster instance, control becomes local rather than global. Furthermore, because an instance of an ApplicationMaster is made available per application, the ApplicationMaster itself is rarely a bottleneck in the cluster.

Openness: Moving all application framework-specific code into the ApplicationMaster generalizes the system so that it can now support multiple frameworks such as MapReduce, MPI, and graph processing.

These features were the result of some key YARN design decisions:

Move all complexity (to the extent possible) to the ApplicationMaster, while providing sufficient functionality to allow application framework authors sufficient flexibility and power.

Because it is essentially user code, do not trust the ApplicationMaster(s). In other words, no ApplicationMaster is a privileged service. The YARN system (ResourceManager and NodeManager) has to protect itself from faulty or malicious ApplicationMaster(s) and resources granted to them at all costs.

In reality, every application has its own instance of an ApplicationMaster. However, it is completely feasible to implement an ApplicationMaster to manage a set of applications (e.g., an ApplicationMaster for Pig or Hive to manage a set of MapReduce jobs). Furthermore, this concept has been stretched to manage long-running services, which manage their own applications (e.g., launching HBase in YARN via a special HBaseAppMaster).
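The negotiation-and-monitoring role described above is exposed to framework authors through YARN's AMRMClient library. The following is a minimal, hedged sketch (not a complete ApplicationMaster) of registering with the ResourceManager, requesting a container, and observing container status through the allocate heartbeat. The class and method names are from the Hadoop YARN client API; the host name, container size, and priority are illustrative assumptions.

```java
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.*;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SimpleAppMaster {
  public static void main(String[] args) throws Exception {
    YarnConfiguration conf = new YarnConfiguration();

    // The ApplicationMaster talks to the ResourceManager through AMRMClient.
    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(conf);
    rmClient.start();

    // Register this ApplicationMaster with the ResourceManager
    // (host, RPC port, and tracking URL are illustrative values).
    rmClient.registerApplicationMaster("appmaster-host", 0, "");

    // Ask for one container of 1024 MB and 1 vcore (illustrative sizes).
    Resource capability = Resource.newInstance(1024, 1);
    Priority priority = Priority.newInstance(0);
    rmClient.addContainerRequest(new ContainerRequest(capability, null, null, priority));

    // Heartbeat to the ResourceManager: granted containers come back here,
    // and completed-container statuses let the AM monitor progress.
    AllocateResponse response = rmClient.allocate(0.1f);
    for (Container c : response.getAllocatedContainers()) {
      System.out.println("Granted container " + c.getId() + " on " + c.getNodeId());
    }
    for (ContainerStatus s : response.getCompletedContainersStatuses()) {
      System.out.println("Container finished with exit status " + s.getExitStatus());
    }

    // Tell the ResourceManager the application is done.
    rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
    rmClient.stop();
  }
}
```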
Question: Which YARN process runs as a container of a submitted job and is responsible for resource requests?
1. ApplicationManager
2. JobTracker
3. ApplicationMaster
4. JobHistoryServer
5. ResourceManager
Explanation: A Container is a collection of physical resources on a single node, such as memory (RAM), CPU cores, and disks. There can be multiple Containers on a single Node (or a single large one). Every node in the system is considered to be composed of multiple Containers of minimum memory size (512 MB or 1 GB, for example). The Application Master can request any Container as a multiple of the minimum memory size.
A Container thus represents a resource (memory, CPU) on a single node in a given cluster. A Container is supervised by the Node Manager and scheduled by the Resource Manager.
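As a concrete illustration of "a multiple of the minimum memory size": if the minimum allocation is 1024 MB and an ApplicationMaster asks for 1500 MB, the scheduler rounds the request up to the next multiple. The sketch below uses assumed example values (1024 MB minimum, 1500 MB request); they are not defaults stated in the text.

```java
// Illustrative only: how a memory request is rounded up to a multiple of the
// configured minimum container size (yarn.scheduler.minimum-allocation-mb).
public class ContainerRounding {
  public static void main(String[] args) {
    int minAllocMb = 1024;   // assumed minimum container size
    int requestedMb = 1500;  // assumed ApplicationMaster request

    // Round the request up to the next multiple of the minimum.
    int grantedMb = ((requestedMb + minAllocMb - 1) / minAllocMb) * minAllocMb;

    System.out.println(grantedMb); // prints 2048, i.e. 2 x 1024 MB
  }
}
```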
Each application starts out as an Application Master, which is itself a Container (often referred to as container-0). Once started, the Application Master must negotiate with the Resource Manager for more Containers. Container requests (and releases) can take place in a dynamic manner at run-time. For instance, a MapReduce job may request a certain number of mapper Containers and, as they finish, release them and request that more reducer Containers be started.
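That dynamic request-and-release behavior is driven through the same AMRMClient heartbeat: the ApplicationMaster adds new ContainerRequests as it needs more workers and lets completed containers go. The following is a hedged sketch of such a loop; the method name, container counts, and container sizes are assumptions, and container launch (via NMClient) is omitted.

```java
import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class DynamicContainerRequests {

  // 'rmClient' is assumed to be an already-registered AMRMClient,
  // as in the earlier sketch.
  static void runMapThenReduce(AMRMClient<ContainerRequest> rmClient,
                               int numMappers) throws Exception {
    Resource mapSize = Resource.newInstance(1024, 1);     // assumed mapper size
    Resource reduceSize = Resource.newInstance(2048, 1);  // assumed reducer size
    Priority priority = Priority.newInstance(0);

    // Ask for the mapper containers up front.
    for (int i = 0; i < numMappers; i++) {
      rmClient.addContainerRequest(new ContainerRequest(mapSize, null, null, priority));
    }

    int finishedMappers = 0;
    while (finishedMappers < numMappers) {
      // Periodic heartbeat: completed containers are reported back here.
      AllocateResponse response = rmClient.allocate(0.5f);
      for (ContainerStatus status : response.getCompletedContainersStatuses()) {
        finishedMappers++;
        // A mapper finished, so its resources are free; ask for a reducer
        // container in its place. Launching containers and distinguishing
        // mappers from reducers are omitted from this sketch.
        rmClient.addContainerRequest(new ContainerRequest(reduceSize, null, null, priority));
      }
      Thread.sleep(1000);
    }
  }
}
```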
Question: Which scheduler would you deploy to ensure that your cluster allows short jobs to finish within a reasonable time without starving long-running jobs?
1. Completely Fair Scheduler (CFS)
2. Capacity Scheduler
3. Fair Scheduler
4. FIFO Scheduler
Explanation: The Fair Scheduler organizes apps further into queues and shares resources fairly between these queues. By default, all users share a single queue, named default. If an app specifically lists a queue in a container resource request, the request is submitted to that queue. It is also possible to assign queues based on the user name included with the request through configuration. Within each queue, a scheduling policy is used to share resources between the running apps. The default is memory-based fair sharing, but FIFO and multi-resource with Dominant Resource Fairness can also be configured. Queues can be arranged in a hierarchy to divide resources and configured with weights to share the cluster in specific proportions.

mapred.fairscheduler.sizebasedweight: Take into account job sizes in calculating their weights for fair sharing. By default, weights are based only on job priorities. Setting this flag to true will make them based on the size of the job (number of tasks needed) as well, though not linearly (the weight will be proportional to the log of the number of tasks needed). This lets larger jobs get larger fair shares while still providing enough of a share to small jobs to let them finish fast. Boolean value, default: false.
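The sizebasedweight behavior quoted above (priority-based weights by default, plus a logarithmic size factor when the flag is enabled) can be illustrated with a small sketch. This is not the Fair Scheduler's actual source; the function name and the choice of log base 2 are assumptions made only to match the description.

```java
public class FairShareWeight {

  // Illustrative weight calculation matching the description above:
  // by default the weight comes only from the job's priority factor;
  // with sizebasedweight enabled it also grows with the log of the
  // number of tasks the job needs, so big jobs get bigger shares,
  // but not linearly bigger.
  static double jobWeight(int tasksNeeded, double priorityFactor, boolean sizeBasedWeight) {
    double weight = 1.0;
    if (sizeBasedWeight) {
      weight = Math.log1p(tasksNeeded) / Math.log(2);  // log2(1 + tasks)
    }
    return weight * priorityFactor;
  }

  public static void main(String[] args) {
    // A 10-task job vs. a 10,000-task job at the same priority: the large
    // job gets a larger share, but only about 4x larger, not 1000x.
    System.out.println(jobWeight(10, 1.0, true));     // ~3.46
    System.out.println(jobWeight(10000, 1.0, true));  // ~13.3
  }
}
```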