1. Stored as metadata on the NameNode 2. Stored along with the data in HDFS 3. Stored in the Metadata 4. Stored in ZooKeeper 5. Stored in Hive Metastore
Correct Answer : 5
Explanation: All the metadata for Hive tables and partitions is stored in the Hive Metastore. Metadata is persisted using the JPOX ORM solution, so any store supported by it can be used by Hive. Most of the commercial relational databases and many open-source datastores are supported; any datastore that has a JDBC driver can probably be used.
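To make the JDBC-backed metastore concrete, here is an illustrative hive-site.xml fragment. This is a sketch assuming a hypothetical MySQL-backed metastore; the host name, database name, and credentials are placeholders, not values from this document:

```xml
<!-- Illustrative only: metastore-db.example.com, hiveuser, and hivepass
     are placeholders for a hypothetical MySQL-backed metastore. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://metastore-db.example.com:3306/hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepass</value>
</property>
```

The `javax.jdo.option.*` property names come from the JDO/JPOX persistence layer mentioned above, which is why swapping the URL and driver is enough to point Hive at a different backing database.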
Question : __________ are responsible for local monitoring of resource availability, fault reporting, and container life-cycle management (e.g., starting and killing jobs).
Correct Answer : NodeManagers
Explanation: The central ResourceManager runs as a standalone daemon on a dedicated machine and acts as the central authority for allocating resources to the various competing applications in the cluster. The ResourceManager has a central and global view of all cluster resources and, therefore, can provide fairness, capacity, and locality across all users. Depending on the application demand, scheduling priorities, and resource availability, the ResourceManager dynamically allocates resource containers to applications to run on particular nodes. A container is a logical bundle of resources (e.g., memory, cores) bound to a particular cluster node.

To enforce and track such assignments, the ResourceManager interacts with a special system daemon running on each node called the NodeManager. Communications between the ResourceManager and NodeManagers are heartbeat-based for scalability. NodeManagers are responsible for local monitoring of resource availability, fault reporting, and container life-cycle management (e.g., starting and killing jobs). The ResourceManager depends on the NodeManagers for its "global view" of the cluster.
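The relationship described above can be sketched as a toy model (not Hadoop code): each NodeManager reports its free capacity via heartbeats, and the ResourceManager's "global view" is nothing more than the latest report from each node, which it consults when binding a container to a node. All class and field names here are illustrative:

```python
# Toy sketch (not Hadoop code): the ResourceManager builds its "global view"
# of the cluster purely from NodeManager heartbeat reports.
from dataclasses import dataclass, field

@dataclass
class Container:
    """A logical bundle of resources bound to one cluster node."""
    node: str
    memory_mb: int
    vcores: int

@dataclass
class NodeManager:
    node: str
    memory_mb: int
    vcores: int
    containers: list = field(default_factory=list)

    def heartbeat(self):
        # Local monitoring: report how much of this node is still free.
        used_mem = sum(c.memory_mb for c in self.containers)
        used_cores = sum(c.vcores for c in self.containers)
        return {"node": self.node,
                "free_mb": self.memory_mb - used_mem,
                "free_vcores": self.vcores - used_cores}

class ResourceManager:
    def __init__(self):
        self.cluster_view = {}  # node -> latest heartbeat report

    def receive_heartbeat(self, report):
        self.cluster_view[report["node"]] = report

    def allocate(self, memory_mb, vcores):
        # Pick any node whose last heartbeat shows enough free capacity.
        for node, r in self.cluster_view.items():
            if r["free_mb"] >= memory_mb and r["free_vcores"] >= vcores:
                return Container(node, memory_mb, vcores)
        return None  # no node can satisfy the request right now

rm = ResourceManager()
nm = NodeManager("node-1", memory_mb=8192, vcores=4)
rm.receive_heartbeat(nm.heartbeat())
c = rm.allocate(1024, 1)
print(c)
```

Note that the ResourceManager never inspects nodes directly: if a NodeManager stops heart-beating, its stale report eventually drops out of the global view, which is the fault-reporting path the explanation refers to.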
User applications are submitted to the ResourceManager via a public protocol and go through an admission control phase during which security credentials are validated and various operational and administrative checks are performed. Those applications that are accepted pass to the scheduler and are allowed to run. Once the scheduler has enough resources to satisfy the request, the application is moved from an accepted state to a running state. Aside from internal bookkeeping, this process involves allocating a container for the ApplicationMaster and spawning it on a node in the cluster. Often called 'container 0,' the ApplicationMaster does not get any additional resources at this point and must request and release additional containers.
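The admission flow above can be sketched as a small state machine (again a toy model, not Hadoop code): submitted applications pass admission checks into an accepted state, then move to running once the scheduler can spawn "container 0" for the ApplicationMaster. The state names and the 2048 MB ApplicationMaster size are illustrative assumptions:

```python
# Toy sketch (not Hadoop code) of the application admission flow:
# SUBMITTED -> (admission control) -> ACCEPTED -> (resources free) -> RUNNING.
class Application:
    def __init__(self, user, has_valid_credentials=True):
        self.user = user
        self.has_valid_credentials = has_valid_credentials
        self.state = "SUBMITTED"
        self.am_container = None

def admit(app):
    # Admission control: validate security credentials and other checks.
    if not app.has_valid_credentials:
        app.state = "REJECTED"
    else:
        app.state = "ACCEPTED"
    return app.state

def schedule(app, free_mb, am_mb=2048):
    # Once enough resources exist, allocate "container 0" for the
    # ApplicationMaster and move the application to RUNNING.
    if app.state == "ACCEPTED" and free_mb >= am_mb:
        app.am_container = ("container_0", am_mb)
        app.state = "RUNNING"
    return app.state

app = Application("alice")
admit(app)                    # -> "ACCEPTED"
schedule(app, free_mb=4096)   # -> "RUNNING"
print(app.state, app.am_container)
```

As in the text, "container 0" holds only the ApplicationMaster itself; any further containers must be requested separately after the application is running.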
Question : Typically, an ApplicationMaster will need to harness the processing power of multiple servers to complete a job. In which order can this be accomplished?
1. ApplicationMaster issues resource requests to the ResourceManager 2. ResourceManager generates a lease for the resource, which is acquired by a subsequent ApplicationMaster heartbeat 3. The ResourceManager will attempt to satisfy the resource requests coming from each application according to availability and scheduling policies. 4. A token-based security mechanism guarantees its authenticity when the ApplicationMaster presents the container lease to the NodeManager
1. 1,2,3,4 2. 1,3,4,2 3. 1,3,2,4 4. 1,4,3,2
Correct Answer : 3
Explanation: YARN makes few assumptions about the ApplicationMaster, although in practice it expects most jobs will use a higher-level programming framework. By delegating all these functions to ApplicationMasters, YARN's architecture gains a great deal of scalability, programming-model flexibility, and improved user agility. For example, upgrading and testing a new MapReduce framework can be done independently of other running MapReduce frameworks.

Typically, an ApplicationMaster will need to harness the processing power of multiple servers to complete a job. To achieve this, the ApplicationMaster issues resource requests to the ResourceManager. The form of these requests includes specification of locality preferences (e.g., to accommodate HDFS use) and properties of the containers. The ResourceManager will attempt to satisfy the resource requests coming from each application according to availability and scheduling policies. When a resource is scheduled on behalf of an ApplicationMaster, the ResourceManager generates a lease for the resource, which is acquired by a subsequent ApplicationMaster heartbeat. A token-based security mechanism guarantees its authenticity when the ApplicationMaster presents the container lease to the NodeManager.

In MapReduce, the code running in the container can be a map or a reduce task. Commonly, running containers will communicate with the ApplicationMaster through an application-specific protocol to report status and health information and to receive framework-specific commands. In this way, YARN provides a basic infrastructure for monitoring and life-cycle management of containers, while application-specific semantics are managed independently by each framework. This design is in sharp contrast to the original Hadoop version 1 design, in which scheduling was designed and integrated around managing only MapReduce tasks.
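The request-lease-token flow can be sketched end to end (a toy model, not Hadoop code). Here an HMAC over the lease stands in for YARN's token mechanism, and the shared secret, node names, and application ID are all illustrative assumptions:

```python
# Toy sketch (not Hadoop code) of the flow: AM requests a resource, the RM
# schedules it and issues a signed lease, the AM acquires the lease on its
# next heartbeat and presents it to the NodeManager, which verifies the token
# before starting the container.
import hmac, hashlib, json

SECRET = b"rm-nm-shared-secret"  # hypothetical key shared by RM and NMs

def sign(payload: dict) -> str:
    msg = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

# Step 1: the ApplicationMaster issues a resource request to the RM.
request = {"app": "app_001", "memory_mb": 1024, "vcores": 1}

# Steps 2-3: the RM satisfies the request per availability and policy,
# then generates a lease that the AM acquires on a subsequent heartbeat.
lease = {"node": "node-1", **request}
lease_token = sign(lease)

# Step 4: the AM presents the lease to the NodeManager, which checks
# the token before launching the container.
def node_manager_start_container(lease: dict, token: str) -> bool:
    if not hmac.compare_digest(sign(lease), token):
        return False  # forged or tampered lease is rejected
    return True       # authentic lease: launch the container

print(node_manager_start_container(lease, lease_token))   # True
tampered = {**lease, "memory_mb": 65536}
print(node_manager_start_container(tampered, lease_token))  # False
```

The tampered lease fails verification, which is the property the explanation attributes to the token mechanism: a NodeManager never has to trust the ApplicationMaster's word about what the ResourceManager granted.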
1. A container is a collection of physical resources such as RAM, CPU cores, and disks on a single node. 2. There can be only one container on a single node 3. … 4. 1 and 2 5. 1 and 3
1. On start-up, the NodeManager registers with the ResourceManager 2. Its primary goal is to manage only the containers (on the node) assigned to it by the ResourceManager 3. … 4. 1 and 2 5. 1 and 3
Question : Select the correct statement for HDFS in Hadoop. 1. NameNode federation significantly improves the scalability and performance of HDFS by introducing the ability to deploy multiple NameNodes for a single cluster. 2. HDFS has built-in high availability for the NameNode via a new feature called the Quorum Journal Manager (QJM); QJM-based HA features an active NameNode and a standby NameNode 3. … 4. 1 and 3 5. 1, 2 and 3
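The QJM-based HA setup in option 2 can be illustrated with an hdfs-site.xml fragment. This is a sketch only: the nameservice ID "mycluster", the NameNode IDs, and all host names are placeholders for a hypothetical deployment with an active and a standby NameNode sharing edits through three JournalNodes:

```xml
<!-- Illustrative only: "mycluster", nn1/nn2, and the example.com hosts
     are placeholders for a hypothetical QJM-based HA deployment. -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>namenode2.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
</property>
```

The standby NameNode tails the shared edit log from the JournalNode quorum, which is what lets it take over quickly when the active NameNode fails.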