Explanation: With the preceding settings on our example cluster, each Map task will receive the following memory allocations:
Total physical RAM allocated = 4 GB
Virtual memory upper limit = 4*2.1 = 8.4 GB
With MapReduce on YARN, there are no longer pre-configured static slots for Map and Reduce tasks. The entire cluster is available for dynamic resource allocation of Map and Reduce tasks as needed by each job. In our example cluster, with the above configurations, YARN will be able to allocate up to 10 Mappers (40/4) or 5 Reducers (40/8) on each node (or some other combination of Mappers and Reducers within the 40 GB per node limit).
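The per-node container arithmetic above can be sketched in a few lines. This is an illustrative calculation, not Hadoop code; the 40 GB node limit and 4 GB mapper size come from the example, and the 8 GB reducer size is assumed from the "40/8" figure in the text.

```python
# Example-cluster figures (GB); names mirror the YARN/MapReduce settings
# they correspond to, but this is only a back-of-the-envelope sketch.
NODE_MEMORY_GB = 40     # yarn.nodemanager.resource.memory-mb equivalent
MAP_MEMORY_GB = 4       # mapreduce.map.memory.mb equivalent
REDUCE_MEMORY_GB = 8    # mapreduce.reduce.memory.mb equivalent (assumed)
VMEM_PMEM_RATIO = 2.1   # yarn.nodemanager.vmem-pmem-ratio default

max_mappers_per_node = NODE_MEMORY_GB // MAP_MEMORY_GB      # 40/4 = 10
max_reducers_per_node = NODE_MEMORY_GB // REDUCE_MEMORY_GB  # 40/8 = 5
map_vmem_limit_gb = MAP_MEMORY_GB * VMEM_PMEM_RATIO         # 4*2.1 = 8.4

print(max_mappers_per_node, max_reducers_per_node, map_vmem_limit_gb)
```

In practice any mix of Mappers and Reducers whose memory requests fit under the 40 GB node limit can run concurrently.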
Question : Assuming you're not running HDFS Federation, what is the maximum number of NameNode daemons you should run on your cluster in order to avoid a split-brain scenario with your NameNode when running HDFS High Availability (HA) using Quorum-based storage?
1. Two Active NameNodes and two Standby NameNodes
2. One Active NameNode and one Standby NameNode
3. Two Active NameNodes and one Standby NameNode
4. Unlimited. HDFS High Availability (HA) is designed to overcome limitations on the number of NameNodes you can deploy
Correct Answer : 2
Explanation: In a typical HA cluster, two separate machines are configured as NameNodes. In a working cluster, one of the NameNode machines is in the Active state, and the other is in the Standby state.
The Active NameNode is responsible for all client operations in the cluster, while the Standby acts as a slave. The Standby machine maintains enough state to provide a fast failover if required.
In order for the Standby node to keep its state synchronized with the Active node, both nodes communicate with a group of separate daemons called JournalNodes (JNs). When the Active node performs any namespace modification, it durably logs a record of the modification to a majority of these JNs. The Standby node continuously watches the JNs for changes to the edit log; as it observes new edits, it applies them to its own namespace. When using the Quorum Journal Manager (QJM), the JournalNodes act as the shared edit log storage. In a failover event, the Standby ensures that it has read all of the edits from the JournalNodes before promoting itself to the Active state. (This mechanism ensures that the namespace state is fully synchronized before a failover completes.)
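The majority-write rule above can be modeled as a toy predicate. This is an illustrative sketch, not Hadoop's actual QJM implementation: an edit is considered durable only once a strict majority of JournalNodes have acknowledged it.

```python
# Toy model of the QJM majority-write rule (illustrative only).
def edit_is_durable(acks: int, total_jns: int) -> bool:
    """Return True if `acks` acknowledgements form a strict majority
    of `total_jns` JournalNodes."""
    return acks > total_jns // 2

# With 3 JournalNodes, 2 acknowledgements are a majority:
print(edit_is_durable(2, 3))  # True
print(edit_is_durable(1, 3))  # False
```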
Note: A Secondary NameNode is not required in an HA configuration, because the Standby node also performs the tasks of the Secondary NameNode.
In order to provide a fast failover, it is also necessary that the Standby node have up-to-date information about the location of blocks in the cluster. To achieve this, DataNodes are configured with the locations of both NameNodes and send block location information and heartbeats to both NameNode machines.
It is vital for the correct operation of an HA cluster that only one of the NameNodes be Active at a time. Otherwise, the namespace state would quickly diverge between the two NameNode machines, causing potential data loss. (This situation is called a split-brain scenario.) To prevent split-brain, the JournalNodes allow only one NameNode to be a writer at a time. During a failover, the NameNode that is chosen to become Active takes over the role of writing to the JournalNodes. This prevents the other NameNode from continuing in the Active state and thus lets the new Active node proceed with the failover safely.
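QJM enforces this single-writer rule with epoch numbers: a NameNode becoming Active first claims a higher epoch from the JournalNodes, after which writes tagged with any older epoch are rejected. The class below is a simplified sketch of that idea, not Hadoop's implementation.

```python
# Simplified sketch of epoch-based fencing (illustrative, not Hadoop code).
class JournalNode:
    def __init__(self):
        self.promised_epoch = 0

    def new_epoch(self, epoch: int) -> bool:
        # A NameNode becoming Active must claim a strictly higher epoch.
        if epoch > self.promised_epoch:
            self.promised_epoch = epoch
            return True
        return False

    def write(self, epoch: int, edit: str) -> bool:
        # Writes from a fenced (older-epoch) writer are rejected.
        return epoch >= self.promised_epoch

jn = JournalNode()
jn.new_epoch(1)                   # first Active NameNode
print(jn.write(1, "mkdir /a"))    # True
jn.new_epoch(2)                   # failover: new Active claims epoch 2
print(jn.write(1, "mkdir /b"))    # False: the old Active is fenced
print(jn.write(2, "mkdir /b"))    # True
```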
Question : When running with N JournalNodes, the system can tolerate at most _____ failures and continue to function normally.
1. N/2
2. (N - 1) / 2
3. (N + 1) / 2
4. (N - 2) / 2
Correct Answer : 2
Explanation: Ensure that you prepare the following hardware resources:
NameNode machines: The machines where you run the Active and Standby NameNodes should have exactly the same hardware. For recommended hardware for Hadoop, see Hardware recommendations for Apache Hadoop.
JournalNode machines: The machines where you run the JournalNodes. The JournalNode daemon is relatively lightweight, so these daemons may reasonably be co-located on machines with other Hadoop daemons, for example NameNodes, the JobTracker, or the YARN ResourceManager.
Note: There must be at least three JournalNode daemons, because edit log modifications must be written to a majority of JNs. This lets the system tolerate the failure of a single machine. You may also run more than three JournalNodes, but in order to increase the number of failures the system can tolerate, you must run an odd number of JNs (i.e. 3, 5, 7, etc.).
Note that when running with N JournalNodes, the system can tolerate at most (N - 1) / 2 failures and continue to function normally.
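The tolerance formula quoted above can be checked directly. This is a one-line sketch of the rule, showing why only odd JN counts increase fault tolerance:

```python
# With N JournalNodes, a majority must stay reachable, so at most
# (N - 1) // 2 of them may fail.
def tolerable_jn_failures(n: int) -> int:
    return (n - 1) // 2

for n in (3, 4, 5, 7):
    print(n, tolerable_jn_failures(n))  # 3->1, 4->1, 5->2, 7->3
```

Note that 4 JNs tolerate no more failures than 3, which is why an even count buys nothing.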
ZooKeeper machines: For automated failover functionality, there must be an existing ZooKeeper cluster available. The ZooKeeper service nodes can be co-located with other Hadoop daemons.
In an HA cluster, the Standby NameNode also performs checkpoints of the namespace state; therefore, do not deploy a Secondary NameNode, CheckpointNode, or BackupNode in an HA cluster.
1. Connect to the web UI of the Primary NameNode (http://hadoopexam1:50090/) and look at the "Last Checkpoint" information.
2. Execute hdfs dfsadmin -lastreport on the command line
3. Access Mostly Uused Products by 50000+ Subscribers
4. With the command line option hdfs dfsadmin -Checkpointinformation