Question: While upgrading CDH with YARN, you have to install (or update) and start ZooKeeper. Why is this necessary?
1. For high availability of the NameNode
2. For high availability of the JobTracker
3. For high availability of the ResourceManager
4. 1 and 2 only
5. All of 1, 2, and 3
Correct Answer : 4
Explanation: Install and deploy ZooKeeper. Cloudera recommends that you install (or update) and start a ZooKeeper cluster before proceeding. This is a requirement if you are deploying high availability (HA) for the NameNode or JobTracker.
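The recommendation above can be sketched as follows. This is a minimal sketch for an RPM-based host; the package and service names follow Cloudera's CDH packaging and the `--myid` value is an example, so adjust both for your environment:

```shell
# Install the ZooKeeper server package (CDH packaging; assumes the
# Cloudera repository is already configured on this host).
sudo yum install -y zookeeper-server

# Initialize the data directory. Each server in the ensemble needs a
# unique ID; "1" here is an example value.
sudo service zookeeper-server init --myid=1

# Start the server; repeat the install/init/start sequence on every
# host in the ZooKeeper ensemble before proceeding with the upgrade.
sudo service zookeeper-server start
```

Running an ensemble (typically three or five servers) rather than a single node is what makes ZooKeeper suitable for backing HA failover.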
Question: Which of the following steps must be followed when upgrading the HDFS metadata for an HA deployment?
1. Run the following command on the active NameNode only, making sure the JournalNodes have been upgraded to CDH 5 and are up and running first: $ sudo service hadoop-hdfs-namenode -upgrade
2. If Kerberos is not enabled: $ sudo -u hdfs hdfs namenode -bootstrapStandby followed by $ sudo service hadoop-hdfs-namenode start
3. Start the DataNodes. On each DataNode: $ sudo service hadoop-hdfs-datanode start
4. Wait for the NameNode to exit safe mode, then start the Secondary NameNode. a. To check that the NameNode has exited safe mode, look for messages in the log file or on the NameNode's web interface that say "...no longer in safe mode." b. To start the Secondary NameNode (if used), enter the following command on the Secondary NameNode host: $ sudo service hadoop-hdfs-secondarynamenode start
Select the correct combination:
1. 1, 2, 3
2. 2, 3, 4
3. 1, 3, 4
4. 1, 2, 4
Correct Answer : 1
Explanation: Upgrade the HDFS metadata.
Note: What you do in this step differs depending on whether you are upgrading an HDFS HA deployment using Quorum-based storage or a non-HA deployment using a Secondary NameNode. (If you have an HDFS HA deployment using NFS storage, do not proceed; you cannot upgrade that configuration to CDH 5. Unconfigure your NFS shared storage before you attempt to upgrade.)
o For an HA deployment, do sub-steps 1, 2, and 3.
o For a non-HA deployment, do sub-steps 1, 3, and 4.
1. To upgrade the HDFS metadata, run the following command on the NameNode. If HA is enabled, do this on the active NameNode only, and make sure the JournalNodes have been upgraded to CDH 5 and are up and running before you run the command:
$ sudo service hadoop-hdfs-namenode -upgrade
Important: In an HDFS HA deployment, it is critically important that you do this on only one NameNode.
You can watch the progress of the upgrade by running:
$ sudo tail -f /var/log/hadoop-hdfs/hadoop-hdfs-namenode-<hostname>.log
Look for a line that confirms the upgrade is complete, such as:
/var/lib/hadoop-hdfs/cache/hadoop/dfs/<name> is complete
Note: The NameNode upgrade process can take a while, depending on how many files you have.
2. Do this step only in an HA configuration; otherwise skip to starting the DataNodes. Wait for the NameNode to exit safe mode, then restart the standby NameNode.
o If Kerberos is enabled:
$ kinit -kt /path/to/hdfs.keytab hdfs/<fully.qualified.domain.name@YOUR-REALM.COM> && hdfs namenode -bootstrapStandby
$ sudo service hadoop-hdfs-namenode start
o If Kerberos is not enabled:
$ sudo -u hdfs hdfs namenode -bootstrapStandby
$ sudo service hadoop-hdfs-namenode start
For more information about the haadmin -failover command, see Administering an HDFS High Availability Cluster.
3. Start the DataNodes. On each DataNode:
$ sudo service hadoop-hdfs-datanode start
4. Do this step only in a non-HA configuration; otherwise skip to starting YARN or MRv1. Wait for the NameNode to exit safe mode, then start the Secondary NameNode.
a. To check that the NameNode has exited safe mode, look for messages in the log file or on the NameNode's web interface that say "...no longer in safe mode."
b. To start the Secondary NameNode (if used), enter the following command on the Secondary NameNode host:
$ sudo service hadoop-hdfs-secondarynamenode start
c. To complete the cluster upgrade, follow the remaining steps below.
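The HA sub-steps (1, 2, and 3) above can be sketched end to end as follows. This is a sketch only: hostnames and keytab paths are placeholders, and `hdfs dfsadmin -safemode get` is used here as a scriptable alternative to reading the log for the safe-mode message:

```shell
# --- On the active NameNode only ---
# JournalNodes must already be upgraded to CDH 5 and running.
sudo service hadoop-hdfs-namenode -upgrade

# Watch the NameNode log until the upgrade reports completion.
sudo tail -f /var/log/hadoop-hdfs/hadoop-hdfs-namenode-$(hostname).log

# Confirm the NameNode has left safe mode before touching the standby.
sudo -u hdfs hdfs dfsadmin -safemode get   # expect: "Safe mode is OFF"

# --- On the standby NameNode (Kerberos not enabled) ---
sudo -u hdfs hdfs namenode -bootstrapStandby
sudo service hadoop-hdfs-namenode start

# --- On each DataNode ---
sudo service hadoop-hdfs-datanode start
```

Bootstrapping the standby copies the upgraded metadata from the active NameNode, which is why the upgrade command itself must never be run on both.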
Question: You have correctly configured the YARN cluster; now you have to start the YARN services properly using the following steps.
1. On the ResourceManager system: $ sudo service hadoop-yarn-resourcemanager start
2. On each NodeManager system (typically the same ones where DataNode service runs): $ sudo service hadoop-yarn-nodemanager start
3. On the MapReduce JobHistory Server system: $ sudo service hadoop-mapreduce-historyserver start
Select the correct order of the above steps:
1. 1, 2, 3
2. 2, 3, 1
3. 3, 2, 1
4. The services can be started in any order
Correct Answer : 1
Explanation: To start YARN, start the ResourceManager and NodeManager services. Always start the ResourceManager before starting the NodeManager services.
On the ResourceManager system:
$ sudo service hadoop-yarn-resourcemanager start
On each NodeManager system (typically the same hosts where the DataNode service runs):
$ sudo service hadoop-yarn-nodemanager start
To start the MapReduce JobHistory Server, on the MapReduce JobHistory Server system:
$ sudo service hadoop-mapreduce-historyserver start
For each user who will be submitting MapReduce jobs using MapReduce v2 (YARN), or running Pig, Hive, or Sqoop in a YARN installation, make sure that the HADOOP_MAPRED_HOME environment variable is set correctly:
$ export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
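The startup order above can be sketched as one operational script. This is a sketch, not a definitive procedure: each command must run on the host indicated in the comments, and `yarn node -list` is used as a standard YARN CLI check that NodeManagers have registered:

```shell
# 1. On the ResourceManager host: start the ResourceManager first, so
#    NodeManagers have something to register with.
sudo service hadoop-yarn-resourcemanager start

# 2. On each NodeManager host (typically the DataNode hosts):
sudo service hadoop-yarn-nodemanager start

# 3. On the MapReduce JobHistory Server host:
sudo service hadoop-mapreduce-historyserver start

# Verify that the NodeManagers have registered with the ResourceManager.
sudo -u yarn yarn node -list

# For every user submitting MRv2 jobs (e.g. add to the user's shell
# profile):
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
```

Starting the ResourceManager first is what makes option 1 the correct ordering: a NodeManager started earlier would simply retry its registration, but the cluster is not usable until the ResourceManager is up.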