
IBM Certified Data Architect - Big Data Certification Questions and Answers (Dumps and Practice Questions)



Question : When running with N JournalNodes, the system can tolerate at most _____ failures and continue to function normally.
1. N/2
2. (N - 1) / 2
3. (N + 1) / 2
4. (N - 2) / 2

Correct Answer : 2
Ensure that you prepare the following hardware resources:

NameNode machines: The machines where you run the Active and Standby NameNodes should have exactly equivalent hardware. For recommended hardware for Hadoop, see Hardware recommendations for Apache Hadoop.

JournalNode machines: The machines where you run the JournalNodes. The JournalNode daemon is relatively lightweight, so these daemons may reasonably be co-located on machines with
other Hadoop daemons, for example NameNodes, the JobTracker, or the YARN ResourceManager.

Note
There must be at least three JournalNode daemons, because edit log modifications must be written to a majority of JNs. This lets the system tolerate the failure of a single machine.
You may also run more than three JournalNodes, but to increase the number of failures the system can tolerate, you should run an odd number of JNs (i.e., 3, 5, 7, etc.).

Note that when running with N JournalNodes, the system can tolerate at most (N - 1) / 2 failures and continue to function normally.
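The quorum arithmetic above can be sketched as a short Python check (the function name is illustrative, not part of any Hadoop API):

```python
def max_tolerable_failures(n_journalnodes: int) -> int:
    """With N JournalNodes, each edit must be written to a majority,
    so at most (N - 1) // 2 JournalNodes may fail."""
    if n_journalnodes < 3:
        raise ValueError("HA requires at least three JournalNodes")
    return (n_journalnodes - 1) // 2

# An even count buys nothing: 4 JNs tolerate the same single failure as 3,
# which is why odd JN counts are recommended.
for n in (3, 4, 5, 7):
    print(f"{n} JournalNodes -> {max_tolerable_failures(n)} tolerable failure(s)")
```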

ZooKeeper machines: For automated failover functionality, an existing ZooKeeper cluster must be available. The ZooKeeper service nodes can be co-located with other Hadoop daemons.

In an HA cluster, the Standby NameNode also performs checkpoints of the namespace state; therefore, do not deploy a Secondary NameNode, CheckpointNode, or BackupNode in an HA cluster.





Question : Table schemas in Hive are:
1. Stored as metadata on the NameNode
2. Stored along with the data in HDFS
3. Stored in the Metadata
4. Stored in ZooKeeper
5. Stored in Hive Metastore

Correct Answer : 5
Explanation:

All the metadata for Hive tables and partitions is stored in the Hive Metastore. Metadata is persisted using the JPOX ORM solution, so any store supported by it can be used by Hive. Most commercial relational databases and many open-source datastores are supported; any datastore that has a JDBC driver can probably be used.
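For illustration, a JDBC-backed metastore is typically configured through hive-site.xml; the host, port, and database name below are placeholder values for a hypothetical MySQL-backed metastore, not settings taken from this document:

```xml
<!-- hive-site.xml (sketch): point the Hive Metastore at a JDBC datastore.
     Connection URL values are placeholders. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://metastore-host:3306/hive_metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
```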







Question : __________ are responsible for local monitoring of resource availability, fault reporting, and container life-cycle management (e.g., starting and killing jobs).
1. NodeManagers
2. Application Manager
3. Application Master
4. Resource Manager

Correct Answer : 1

Explanation: The central ResourceManager runs as a standalone daemon on a dedicated machine and acts as the central authority for allocating resources to the
various competing applications in the cluster. The ResourceManager has a central and global view of all cluster resources and, therefore, can provide
fairness, capacity, and locality across all users. Depending on the application demand, scheduling priorities, and resource availability, the
ResourceManager dynamically allocates resource containers to applications to run on particular nodes. A container is a logical bundle of resources (e.g.,
memory, cores) bound to a particular cluster node. To enforce and track such assignments, the ResourceManager interacts with a special system daemon
running on each node called the NodeManager. Communications between the ResourceManager and NodeManagers are heartbeat-based for scalability.
NodeManagers are responsible for local monitoring of resource availability, fault reporting, and container life-cycle management (e.g., starting and killing
jobs). The ResourceManager depends on the NodeManagers for its "global view" of the cluster.

User applications are submitted to the ResourceManager via a public protocol and go through an admission control phase during which security
credentials are validated and various operational and administrative checks are performed. Those applications that are accepted pass to the scheduler and
are allowed to run. Once the scheduler has enough resources to satisfy the request, the application is moved from an accepted state to a running state.
Aside from internal bookkeeping, this process involves allocating a container for the ApplicationMaster and spawning it on a node in the cluster. Often
called 'container 0,' the ApplicationMaster does not get any additional resources at this point and must request and release additional containers.
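The container concept from the explanation above can be sketched in Python; the class and function names here are illustrative, not part of the actual YARN API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Container:
    """A logical bundle of resources (memory, cores) bound to one cluster node."""
    node: str
    memory_mb: int
    vcores: int

# The ResourceManager's bookkeeping: which containers it has granted, and where.
allocations = [
    Container(node="worker-1", memory_mb=2048, vcores=2),  # "container 0" (ApplicationMaster)
    Container(node="worker-2", memory_mb=4096, vcores=4),  # an additional task container
]

def node_usage(containers, node):
    """Resources the NodeManager on `node` must supervise and report via heartbeat."""
    mem = sum(c.memory_mb for c in containers if c.node == node)
    cores = sum(c.vcores for c in containers if c.node == node)
    return mem, cores

print(node_usage(allocations, "worker-1"))  # (2048, 2)
```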



Related Questions


Question : Select the correct statement regarding Capacity Scheduler
1. The Capacity scheduler permits sharing a cluster while giving each user or group certain minimum capacity guarantees.
2. The Capacity scheduler currently supports memory-intensive applications, where an application can optionally specify higher memory resource requirements than the default.
3. The Capacity scheduler works best when the workloads are not known
4. 1 and 3
5. 1 and 2


Question : Select the correct statement which applies to a container
1. A container is a collection of physical resources such as RAM, CPU cores, and disks on a single node.
2. There can be only one container on a single node
3. A container is supervised by the NodeManager and scheduled by the ResourceManager
4. 1 and 2
5. 1 and 3


Question : Select the correct statement which applies to Node Manager
1. On start-up, the NodeManager registers with the ResourceManager
2. Its primary goal is to manage only the containers (On the node) assigned to it by the ResourceManager
3. The NodeManager is YARN's per-node "worker" agent, taking care of the individual compute nodes in a Hadoop cluster.
4. 1 and 2
5. 1 and 3




Question : In the YARN design, Map-Reduce is just one __________
1. Resource Manager
2. Application
3. Container
4. None of the above


Question : Select the correct statement for HDFS in Hadoop.
1. NameNode federation significantly improves the scalability and performance of HDFS by introducing the ability to deploy multiple NameNodes for a single cluster.
2. Built-in high availability for the NameNode is provided via a new feature called the Quorum Journal Manager (QJM). QJM-based HA features an active NameNode and a standby NameNode
3. The standby NameNode can become active either by a manual process or automatically
4. 1 and 3
5. 1,2 and 3


Question : Select the correct statement which applies to the "Fair Scheduler"
1. Fair scheduling is a method of assigning resources to applications such that all apps get, on average, an equal share of resources over time
2. By default, the Fair Scheduler bases scheduling fairness decisions only on CPU
3. It can be configured to schedule with both memory and CPU
4. 1 and 3
5. 1 2 and 3