
IBM Certified Data Architect - Big Data Certification Questions and Answers (Dumps and Practice Questions)



Question : YARN requires a staging directory for temporary files created by running jobs. By default it creates /tmp/hadoop-yarn/staging.
If users cannot run jobs, what could be the reason?

1. Directory path is not correct
2. staging directory is full
3. Directory has restrictive permissions
4. None of the above


Correct Answer : 3

Explanation: YARN requires a staging directory for temporary files created by running jobs. By default it creates /tmp/hadoop-yarn/staging with restrictive permissions that may
prevent your users from running jobs.
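The permission check involved can be sketched in a few lines of Python. This is an illustrative sketch only (the function name and the 1777 example mode are assumptions, not part of YARN itself): a staging directory must typically be world-writable with the sticky bit set, like /tmp, for arbitrary users' jobs to use it.

```python
import stat

def staging_dir_usable(mode: int) -> bool:
    """Illustrative check: can any user write into a directory with
    this mode, as a shared staging directory typically requires
    (e.g. mode 1777, like /tmp)?"""
    return bool(mode & stat.S_IWOTH) and bool(mode & stat.S_IXOTH)

# A directory created as 0o700 (owner-only) blocks other users' jobs,
# while 0o1777 (world-writable, sticky bit) does not.
restrictive = staging_dir_usable(0o700)
permissive = staging_dir_usable(0o1777)
```

In practice an administrator would relax the HDFS directory's permissions (for example with `hdfs dfs -chmod`) rather than check them in code; the sketch only shows why restrictive permissions are the failure mode the question points at.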





Question : In MRv2, Map and Reduce tasks run in containers. Which of the following components is responsible for launching that container?
1. JobHistoryServer
2. NodeManager
3. Application Master
4. Resource Manager

Correct Answer : 2

Explanation: The MapReduce-specific capabilities of the JobTracker have moved into the MapReduce Application Master, one of which is started to manage each MapReduce job and terminated
when the job completes. The JobTracker's function of serving information about completed jobs has been moved to the JobHistoryServer. The TaskTracker has been replaced with the
NodeManager, a YARN service that manages resources and deployment on a node. NodeManager is responsible for launching containers, each of which can house a map or reduce task.
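The daemon mapping described in the explanation can be summarized as a simple lookup. This is purely illustrative (the dictionary and function are not part of any Hadoop API); it just restates which MR2/YARN component took over each JobTracker/TaskTracker duty.

```python
# Illustrative summary of the MR1 -> MR2/YARN component mapping
# described above; not a Hadoop API.
MR1_TO_YARN = {
    "JobTracker (per-job management)": "MapReduce Application Master",
    "JobTracker (completed-job info)": "JobHistoryServer",
    "TaskTracker": "NodeManager",
}

def successor(mr1_role: str) -> str:
    """Return the MR2/YARN component that replaced an MR1 role."""
    return MR1_TO_YARN[mr1_role]
```

The key point for the question: the NodeManager (the TaskTracker's successor) launches containers, each of which can house a map or reduce task.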





Question : In MR1, each node was configured with a fixed number of map slots and a fixed number of reduce slots.
Under YARN, there is no distinction between resources available for maps and resources available for reduces; all resources are available for both.

1. True
2. False


Correct Answer : 1
Explanation: One of the larger changes in MR2 is the way that resources are managed. In MR1, each node was configured with a fixed number of map slots and a fixed number of reduce
slots. Under YARN, there is no distinction between resources available for maps and resources available for reduces; all resources are available for both. Second, the notion of
slots has been discarded, and resources are now configured in terms of amounts of memory (in megabytes) and CPU (in "virtual cores"). Resource configuration is an inherently
difficult topic, and the added flexibility that YARN provides in this regard also comes with added complexity. Cloudera Manager will pick sensible values automatically, but the
details are worth understanding if you are setting up your cluster manually.




Related Questions


Question : In the Hadoop . framework, if HBase is also running on the same node, for which the available RAM is GB, what is the ideal configuration
for "Reserved System Memory"?

1. 1GB
2. 2GB
3. 3GB
4. No need to reserve


Question : MapReduce runs on top of YARN and utilizes YARN containers to schedule and execute its Map and Reduce tasks.
When configuring MapReduce resource utilization on YARN, which of the following aspects should you consider?


1. The physical RAM limit for each Map and Reduce task
2. The JVM heap size limit for each task.
3. The amount of virtual memory each task will receive.
4. 1 and 3
5. All 1,2 and 3
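The three aspects listed above are related, and the relationship can be sketched numerically. Two assumptions in this sketch: the heap fraction of 0.8 is common sizing guidance (leave headroom for non-heap JVM memory), not a fixed rule, and the ratio of 2.1 mirrors the default of YARN's yarn.nodemanager.vmem-pmem-ratio setting.

```python
def task_memory_settings(container_mb: int,
                         heap_fraction: float = 0.8,
                         vmem_pmem_ratio: float = 2.1) -> dict:
    """Illustrative sketch of the three memory aspects for one task:
    1. the physical RAM limit (the container size itself),
    2. the JVM heap limit (commonly ~80% of the container), and
    3. the virtual memory allowance (pmem times the vmem-pmem ratio,
       2.1 being YARN's default yarn.nodemanager.vmem-pmem-ratio).
    The helper name and the 0.8 factor are assumptions, not Hadoop APIs.
    """
    return {
        "physical_mb": container_mb,
        "jvm_heap_mb": int(container_mb * heap_fraction),
        "virtual_mb": int(container_mb * vmem_pmem_ratio),
    }

settings = task_memory_settings(2048)
```

All three limits interact: a heap sized at or above the container limit invites physical-memory kills, and exceeding the virtual allowance gets the container killed too, which is why all of 1, 2, and 3 matter when configuring MapReduce on YARN.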



Question : Assuming you're not running HDFS Federation, what is the maximum number of NameNode daemons you
should run on your cluster in order to avoid a split-brain scenario with your NameNode when running HDFS
High Availability (HA) using Quorum-based storage?


1. Two active NameNodes and two Standby NameNodes
2. One active NameNode and one Standby NameNode
3. Two active NameNodes and one Standby NameNode
4. Unlimited. HDFS High Availability (HA) is designed to overcome limitations on the number of NameNodes you can deploy



Question : When running with N JournalNodes, the system can tolerate at most _____ failures and continue to function normally.
1. N/2
2. (N - 1) / 2
3. (N + 1) / 2
4. (N - 2) / 2


Question : Table schemas in Hive are:
1. Stored as metadata on the NameNode
2. Stored along with the data in HDFS
3. Stored in the Metadata
4. Stored in ZooKeeper
5. Stored in Hive Metastore


Question : __________ are responsible for local monitoring of resource availability, fault reporting, and container life-cycle management (e.g., starting and killing jobs).


1. NodeManagers
2. Application Manager
3. Application Master
4. Resource Manager