Question : Select the correct statement regarding the Capacity scheduler.
1. The Capacity scheduler permits sharing a cluster while giving each user or group certain minimum capacity guarantees.
2. The Capacity scheduler currently supports memory-intensive applications, where an application can optionally specify higher memory resource requirements than the default.
3. The Capacity scheduler works best when the workloads are not known.
4. 1 and 3
5. 1 and 2
Explanation: The Capacity scheduler is another pluggable scheduler for YARN that allows for multiple groups to securely share a large Hadoop cluster. Developed by the original Hadoop team at Yahoo!, the Capacity scheduler has successfully been running many of the largest Hadoop clusters. To use the Capacity scheduler, an administrator configures one or more queues with a predetermined fraction of the total slot (or processor) capacity. This assignment guarantees a minimum amount of resources for each queue. Administrators can configure soft limits and optional hard limits on the capacity allocated to each queue. Each queue has strict ACLs (Access Control Lists) that control which users can submit applications to individual queues. Also, safeguards are in place to ensure that users cannot view or modify applications from other users.

The Capacity scheduler permits sharing a cluster while giving each user or group certain minimum capacity guarantees. These minimums are not given away in the absence of demand. Excess capacity is given to the most starved queues, as assessed by a measure of running or used capacity divided by the queue capacity. Thus, the fullest queues as defined by their initial minimum capacity guarantee get the most needed resources. Idle capacity can be assigned and provides elasticity for the users in a cost-effective manner.

Queue definitions and properties such as capacity and ACLs can be changed, at run time, by administrators in a secure manner to minimize disruption to users. Administrators can add additional queues at run time, but queues cannot be deleted at run time. In addition, administrators can stop queues at run time to ensure that while existing applications run to completion, no new applications can be submitted.

The Capacity scheduler currently supports memory-intensive applications, where an application can optionally specify higher memory resource requirements than the default. Using information from the NodeManagers, the Capacity scheduler can then place containers on the best-suited nodes. The Capacity scheduler works best when the workloads are well known, which helps in assigning the minimum capacity. For this scheduler to work most effectively, each queue should be assigned a minimal capacity that is less than the maximal expected workload. Within each queue, multiple applications are scheduled using hierarchical FIFO queues similar to the approach used with the stand-alone FIFO scheduler.
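To make the queue model concrete: the queues and their minimum capacities are defined by an administrator in capacity-scheduler.xml (for example, yarn.scheduler.capacity.root.queues lists the top-level queues and yarn.scheduler.capacity.<queue-path>.capacity gives each queue's guaranteed percentage). The minimal sketch below shows one way to read those guarantees back through the YARN client API, assuming a running cluster whose configuration is on the classpath; the class name and output format are chosen here only for illustration.

    import java.util.List;

    import org.apache.hadoop.yarn.api.records.QueueInfo;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    // Illustrative sketch: list the scheduler's queues and print each queue's
    // guaranteed, maximum, and currently used capacity.
    public class QueueCapacityReport {
      public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration()); // reads yarn-site.xml from the classpath
        yarnClient.start();
        try {
          // All queues configured under the root queue, e.g. in capacity-scheduler.xml.
          List<QueueInfo> queues = yarnClient.getAllQueues();
          for (QueueInfo queue : queues) {
            // capacity        = minimum guaranteed share of the parent queue
            // maximumCapacity = optional hard limit on elastic growth
            // currentCapacity = share actually in use right now
            System.out.printf("%s: guaranteed=%.2f max=%.2f used=%.2f%n",
                queue.getQueueName(), queue.getCapacity(),
                queue.getMaximumCapacity(), queue.getCurrentCapacity());
          }
        } finally {
          yarnClient.stop();
        }
      }
    }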
Question : Select the correct statement which applies to a container.
1. A container is a collection of physical resources such as RAM, CPU cores, and disks on a single node.
2. There can be only one container on a single node.
3. A container is supervised by the NodeManager and scheduled by the ResourceManager.
4. 1 and 2
5. 1 and 3
Explanation: At the fundamental level, a container is a collection of physical resources such as RAM, CPU cores, and disks on a single node. There can be multiple containers on a single node (or a single large one). Every node in the system is considered to be composed of multiple containers of a minimum size of memory (e.g., 512 MB or 1 GB) and CPU. The ApplicationMaster can request any container so as to occupy a multiple of the minimum size. A container thus represents a resource (memory, CPU) on a single node in a given cluster. A container is supervised by the NodeManager and scheduled by the ResourceManager.

Each application starts out as an ApplicationMaster, which is itself a container (often referred to as container 0). Once started, the ApplicationMaster must negotiate with the ResourceManager for more containers. Container requests (and releases) can take place in a dynamic fashion at run time. For instance, a MapReduce job may request a certain number of mapper containers; as they finish their tasks, it may release them and request more reducer containers to be started.
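As a hedged illustration of this negotiation, the sketch below shows an ApplicationMaster using the AMRMClient API to register with the ResourceManager and request a handful of containers. The class name, the 1 GB / 1 vcore container size, and the count of four containers are assumptions chosen for the example, not values from the explanation above.

    import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
    import org.apache.hadoop.yarn.api.records.Container;
    import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    // Illustrative sketch of an ApplicationMaster negotiating containers with
    // the ResourceManager; sizes and counts are example values.
    public class ContainerNegotiationSketch {
      public static void main(String[] args) throws Exception {
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(new YarnConfiguration());
        rmClient.start();

        // Register this ApplicationMaster (itself running in container 0).
        rmClient.registerApplicationMaster("", 0, "");

        // Ask for 4 containers of 1 GB / 1 vcore each; actual grants are
        // rounded to a multiple of the cluster's minimum allocation.
        Resource capability = Resource.newInstance(1024, 1);
        Priority priority = Priority.newInstance(0);
        for (int i = 0; i < 4; i++) {
          rmClient.addContainerRequest(
              new ContainerRequest(capability, null, null, priority));
        }

        // Heartbeat/allocate loop: the ResourceManager hands containers back
        // as the scheduler grants them; requests and releases stay dynamic.
        int granted = 0;
        while (granted < 4) {
          AllocateResponse response = rmClient.allocate(0.1f);
          for (Container container : response.getAllocatedContainers()) {
            granted++;
            System.out.println("Allocated " + container.getId()
                + " on " + container.getNodeId());
            // ... hand the container to an NMClient to launch a process ...
          }
          Thread.sleep(1000);
        }

        rmClient.unregisterApplicationMaster(
            FinalApplicationStatus.SUCCEEDED, "done", "");
      }
    }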
Question : Select the correct statement which applies to the NodeManager.
1. On start-up, the NodeManager registers with the ResourceManager.
2. Its primary goal is to manage only the containers (on the node) assigned to it by the ResourceManager.
3. The NodeManager is YARN's per-node "worker" agent, taking care of the individual compute nodes in a Hadoop cluster.
4. 1 and 2
5. 1 and 3
Explanation: The NodeManager is YARN's per-node worker agent, taking care of the individual compute nodes in a Hadoop cluster. Its duties include keeping up to date with the ResourceManager, overseeing application containers' life-cycle management, monitoring resource usage (memory, CPU) of individual containers, tracking node health, log management, and auxiliary services that may be exploited by different YARN applications. On start-up, the NodeManager registers with the ResourceManager; it then sends heartbeats with its status and waits for instructions. Its primary goal is to manage application containers assigned to it by the ResourceManager.

YARN containers are described by a container launch context (CLC). This record includes a map of environment variables, dependencies stored in remotely accessible storage, security tokens, payloads for NodeManager services, and the command necessary to create the process. After validating the authenticity of the container lease, the NodeManager configures the environment for the container, including initializing its monitoring subsystem with the resource constraints specified by the application. The NodeManager also kills containers as directed by the ResourceManager.
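To show what handing work to the NodeManager might look like in code, here is a minimal sketch that builds a container launch context (CLC) and submits it through the NMClient API for a container that has already been allocated by the ResourceManager. The class name, the APP_HOME environment variable, the echo command, and the /tmp paths are illustrative assumptions.

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;

    import org.apache.hadoop.yarn.api.records.Container;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.api.records.LocalResource;
    import org.apache.hadoop.yarn.client.api.NMClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    // Illustrative sketch: build a container launch context (CLC) and hand it
    // to the NodeManager; the command and environment are example values.
    public class LaunchContainerSketch {
      public static void launch(Container container) throws Exception {
        NMClient nmClient = NMClient.createNMClient();
        nmClient.init(new YarnConfiguration());
        nmClient.start();

        // The CLC carries local resources (dependencies), environment
        // variables, the launch command, service payloads, and tokens.
        Map<String, LocalResource> localResources = Collections.emptyMap();
        Map<String, String> environment =
            Collections.singletonMap("APP_HOME", "/tmp/app");
        List<String> commands = Collections.singletonList(
            "/bin/sh -c 'echo hello from the container' 1>/tmp/stdout 2>/tmp/stderr");

        ContainerLaunchContext clc = ContainerLaunchContext.newInstance(
            localResources, environment, commands,
            null /* service data */, null /* tokens */, null /* ACLs */);

        // The NodeManager validates the container lease carried in the
        // Container record, sets up the environment, and starts the process.
        nmClient.startContainer(container, clc);
      }
    }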