Cloudera Hadoop Administrator Certification Questions and Answers (Dumps and Practice Questions)

Question : Each machine in our cluster has 48 GB of RAM. Some of this RAM should be reserved for operating system usage. On each node,
we will assign 40 GB of RAM for YARN to use and keep 8 GB for the operating system. The following property sets the maximum memory YARN can utilize on the node:
yarn.nodemanager.resource.memory-mb --> 40960
YARN takes the resource management capabilities that were in MapReduce and packages them so they can be used by new engines.
This also streamlines MapReduce to do what it does best, process data. With YARN, you can now run multiple applications in Hadoop,
all sharing a common resource management layer.

Consider an example physical cluster of slave nodes, each with 48 GB of RAM, 12 disks, and 2 hex-core CPUs (12 cores in total).
In yarn-site.xml: yarn.nodemanager.resource.memory-mb --> 40960

The next step is to provide YARN guidance on how to break up the total resources available into Containers. You do this by specifying the minimum unit of RAM to allocate for a Container. We want to allow for a maximum of 20 Containers per node, and thus need (40 GB total RAM) / (20 Containers) = 2 GB minimum per Container:

In yarn-site.xml yarn.scheduler.minimum-allocation-mb --> 2048
YARN will allocate Containers with RAM amounts greater than or equal to yarn.scheduler.minimum-allocation-mb.
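
The division above can be checked with a short Python sketch. The function names are hypothetical, and the rounding rule in normalize_request_mb is an assumption (requests rounded up to the next multiple of the minimum allocation); the exact behavior depends on the scheduler in use.

```python
import math

NODE_MEMORY_MB = 40960   # yarn.nodemanager.resource.memory-mb from the example
MAX_CONTAINERS = 20      # desired maximum number of Containers per node


def minimum_allocation_mb(node_memory_mb: int, max_containers: int) -> int:
    """Smallest Container size that caps a node at max_containers Containers."""
    return node_memory_mb // max_containers


def normalize_request_mb(requested_mb: int, min_alloc_mb: int) -> int:
    """Round a memory request up to a multiple of the minimum allocation.

    Assumption: the scheduler rounds up in steps of the minimum allocation.
    """
    steps = max(1, math.ceil(requested_mb / min_alloc_mb))
    return steps * min_alloc_mb


if __name__ == "__main__":
    min_alloc = minimum_allocation_mb(NODE_MEMORY_MB, MAX_CONTAINERS)
    print(min_alloc)                              # 2048
    print(normalize_request_mb(3000, min_alloc))  # 4096
```

Under that assumption, a task asking for 3000 MB would occupy a 4096 MB Container, which is why task memory settings are usually chosen as multiples of the minimum allocation.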

MapReduce 2 runs on top of YARN and utilizes YARN Containers to schedule and execute its map and reduce tasks.
When configuring MapReduce 2 resource utilization on YARN, there are three aspects to consider:
* The physical RAM limit for each Map and Reduce task
* The JVM heap size limit for each task
* The amount of virtual memory each task will get

You can define the maximum memory each Map and Reduce task will take. Since each Map and each Reduce task runs in a separate Container, these maximum memory settings should be equal to or greater than the YARN minimum Container allocation. For our example cluster, the minimum RAM for a Container (yarn.scheduler.minimum-allocation-mb) is 2 GB. We will thus assign 4 GB for Map task Containers and 8 GB for Reduce task Containers.

In mapred-site.xml:
mapreduce.map.memory.mb --> 4096
mapreduce.reduce.memory.mb --> 8192

Each Container will run JVMs for the Map and Reduce tasks. The JVM heap size should be set lower than the Map and Reduce memory defined above, so that it stays within the bounds of the Container memory allocated by YARN.
In mapred-site.xml:
mapreduce.map.java.opts --> -Xmx3072m
mapreduce.reduce.java.opts --> -Xmx6144m
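
As a rough sanity check, the heap settings can be compared against the Container sizes in a few lines of Python. This is only a sketch: the helper names are hypothetical, and the parser handles just the k, m, and g suffixes of -Xmx.

```python
import re


def parse_xmx_mb(java_opts: str) -> int:
    """Extract the -Xmx value from a java.opts string and return it in MB."""
    match = re.search(r"-Xmx(\d+)([kKmMgG])", java_opts)
    if match is None:
        raise ValueError("no -Xmx setting with a k/m/g suffix found")
    value, unit = int(match.group(1)), match.group(2).lower()
    if unit == "g":
        return value * 1024
    if unit == "k":
        return value // 1024
    return value


def heap_fits(container_mb: int, java_opts: str) -> bool:
    """True when the JVM heap is strictly smaller than the Container."""
    return parse_xmx_mb(java_opts) < container_mb


def heap_ratio(container_mb: int, java_opts: str) -> float:
    """Fraction of the Container taken by the JVM heap."""
    return parse_xmx_mb(java_opts) / container_mb


if __name__ == "__main__":
    # Values taken from the example configuration.
    print(heap_fits(4096, "-Xmx3072m"), heap_ratio(4096, "-Xmx3072m"))
    print(heap_fits(8192, "-Xmx6144m"), heap_ratio(8192, "-Xmx6144m"))
```

In both cases the heap takes 75% of the Container, leaving the remainder for non-heap JVM memory.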

In the above scenario:

1. YARN will be able to allocate on each node up to 10 mappers or 3 reducers or a permutation within that.
2. YARN will be able to allocate on each node up to 8 mappers or 5 reducers or a permutation within that.
4. With YARN and MapReduce 2, you will have pre-configured static slots for Map and Reduce tasks


Explanation: With YARN and MapReduce 2, there are no longer pre-configured static slots for Map and Reduce tasks. The entire cluster is available for dynamic resource allocation of Maps and Reduces as needed by the job. In our example cluster, with the above configurations, YARN will be able to allocate on each node up to 10 mappers (40/4) or 5 reducers (40/8) or a permutation within that.
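
The "permutation within that" can be enumerated with a small Python sketch. It considers memory only; CPU cores and the Container used by the ApplicationMaster are deliberately ignored, and the function name is hypothetical.

```python
NODE_MEMORY_MB = 40960   # yarn.nodemanager.resource.memory-mb
MAP_MB = 4096            # mapreduce.map.memory.mb
REDUCE_MB = 8192         # mapreduce.reduce.memory.mb


def task_mixes(node_mb: int, map_mb: int, reduce_mb: int):
    """For each reducer count, the most mappers that still fit in memory."""
    mixes = []
    for reducers in range(node_mb // reduce_mb + 1):
        remaining_mb = node_mb - reducers * reduce_mb
        mixes.append((remaining_mb // map_mb, reducers))
    return mixes


if __name__ == "__main__":
    for mappers, reducers in task_mixes(NODE_MEMORY_MB, MAP_MB, REDUCE_MB):
        print(f"{mappers} mappers + {reducers} reducers")
```

The first and last combinations are the 10 mappers (40/4) and 5 reducers (40/8) from the explanation; the ones in between trade two mappers for each additional reducer.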

Question : Which basic configuration parameters must you set to migrate your cluster from MapReduce 1 (MRv1) to MapReduce V2 (MRv2)?

1. A,B,C
2. B,C,D
4. B,D,E
5. D,E,F


Explanation: Because MR1 functionality has been split into two components in Hadoop 2, MapReduce cluster configuration options have been split into YARN configuration options, which go in yarn-site.xml; and MapReduce configuration options, which go in mapred-site.xml. Many have been given new names to reflect the shift. As JobTrackers and TaskTrackers no longer exist in MR2, all configuration options pertaining to them no longer exist, although many have corresponding options for the ResourceManager, NodeManager, and JobHistoryServer.

A minimal configuration required to run MR2 jobs on YARN is:

In yarn-site.xml:
yarn.resourcemanager.hostname --> your.hostname.com
yarn.nodemanager.aux-services --> mapreduce_shuffle

In mapred-site.xml:
mapreduce.framework.name --> yarn
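
Hadoop site files wrap each setting in a property element with name and value children inside a configuration root. The Python sketch below renders the minimal settings above in that layout; render_site_xml is a hypothetical helper, and your.hostname.com is the placeholder from the text, not a real host.

```python
import xml.etree.ElementTree as ET
from typing import Mapping

YARN_SITE = {
    "yarn.resourcemanager.hostname": "your.hostname.com",  # placeholder host
    "yarn.nodemanager.aux-services": "mapreduce_shuffle",
}
MAPRED_SITE = {
    "mapreduce.framework.name": "yarn",
}


def render_site_xml(properties: Mapping[str, str]) -> str:
    """Render name/value pairs as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing is available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_site_xml(YARN_SITE))
    print(render_site_xml(MAPRED_SITE))
```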

Question :
Which is the default scheduler in YARN?

1. YARN doesn't configure a default scheduler; you must first assign an appropriate scheduler class in yarn-site.xml
2. Capacity Scheduler
4. FIFO Scheduler

Correct Answer : 2

Explanation: The Capacity Scheduler is the default scheduler that ships with Hadoop YARN. The scheduler is selected by the following property, defined in yarn-default.xml:

yarn.resourcemanager.scheduler.class: the class to use as the resource scheduler.

Default value: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
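
The fallback to the default can be illustrated with a short Python sketch that reads a yarn-site.xml document and returns the scheduler class in effect. The function name is hypothetical, and the sketch only mimics the lookup; it is not how Hadoop itself resolves configuration.

```python
import xml.etree.ElementTree as ET

SCHEDULER_KEY = "yarn.resourcemanager.scheduler.class"
# Default value quoted in the explanation above.
DEFAULT_SCHEDULER = (
    "org.apache.hadoop.yarn.server.resourcemanager."
    "scheduler.capacity.CapacityScheduler"
)


def effective_scheduler(yarn_site_xml: str) -> str:
    """Return the scheduler class set in yarn-site.xml, else the default."""
    root = ET.fromstring(yarn_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == SCHEDULER_KEY:
            value = (prop.findtext("value") or "").strip()
            if value:
                return value
    return DEFAULT_SCHEDULER


if __name__ == "__main__":
    # With no override in yarn-site.xml, the default class is reported.
    print(effective_scheduler("<configuration/>"))
```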

Related Questions

Question : You are upgrading your Hadoop cluster from one CDH release to another, and as part of the upgrade you have to
back up the configuration data and stop the services.
Which of the following is the correct command for putting the active NameNode into safe mode?

1. sudo -u hdfs dfsadmin -safemode enter
2. sudo -u hdfs hdfs -safemode enter
3. sudo -u hdfs hdfs dfsadmin
4. sudo -u hdfs hdfs dfsadmin -safemode enter

Question : You are upgrading your Hadoop cluster from one CDH release to another, and as part of the upgrade you have to
back up the configuration data and stop the services.
Which of the following is the correct command for a saveNamespace operation?

1. sudo -u hdfs -saveNamespace
2. sudo -u dfsadmin -saveNamespace
3. sudo -u hdfs hdfs dfsadmin -saveNamespace
4. sudo -u hdfs hdfs dfsadmin Namespace

Question : You are upgrading your Hadoop cluster from one CDH release to another, and as part of the upgrade you have to back
up the configuration data and stop the services.
What happens when you run "sudo -u hdfs hdfs dfsadmin -saveNamespace" to perform a saveNamespace operation?

1. This will result in two new fsimage files being written out with all new edit log entries.
2. This will result in a backup of the last fsimage, and all new operations will be appended to the existing fsimage, which is written out with no edit log entries.
3. This will result in a new fsimage being written out with no edit log entries.
4. This will result in a backup of the last fsimage.

Question : Which of the following is the correct command to stop all Hadoop services across your entire cluster?

1. for x in `cd /etc/init.d ; ls hadoop-*` ; do sudo service $x stop ; done
2. for x in `cd /etc/init.d ; ls mapred-*` ; do sudo service $x stop ; done
3. for x in `cd /etc/init.d ; ls NameNode-*` ; do sudo service $x stop ; done
4. for x in `cd /etc/init.d ; ls hdfs-*` ; do sudo service $x stop ; done

Question : Select the correct statement that applies when backing up HDFS metadata on the NameNode machine.

1. Do this step when you are sure that all Hadoop services have been shut down.
2. If the NameNode configuration sets "dfs.name.dir" to multiple comma-separated paths, then we should back up all of the directories.
3. If you see a file containing the word lock in the configured directory for the NameNode, the NameNode is probably still running.
4. 1 and 3
5. 2 and 3

Question : Please select the correct command to uninstall Hadoop

1. On Red Hat-compatible systems: $ sudo yum remove bigtop-utils bigtop-jsvc bigtop-tomcat sqoop2-client hue-common solr
2. On SLES systems: $ sudo remove bigtop-utils bigtop-jsvc bigtop-tomcat sqoop2-client hue-common solr
3. On Ubuntu systems: sudo remove bigtop-utils bigtop-jsvc bigtop-tomcat sqoop2-client hue-common solr
4. All of the above