
Cloudera Hadoop Administrator Certification Questions and Answers (Dumps and Practice Questions)



Question : Select the features of MapReduce
1. Automatic parallelization and distribution
2. Fault tolerance
3. (option text available to subscribers only)
4. All of the above


Correct Answer : (available to subscribers only)





Question : While upgrading your cluster from MRv1 to MRv2, you wish to add DataNodes with larger hard drives and more memory,
but you are worried about whether the new nodes can use this more advanced hardware. Select the correct statement regarding new nodes in an MRv2 cluster.
1. With new slave nodes you can have any amount of hard drive space
2. With new slave nodes you must have at least 12 x 2 TB of hard drive space

3. (option text available to subscribers only)
4. New node hardware must be equal to existing hardware.

Correct Answer : 1

Explanation: Begin by starting one DataNode only to make sure it can properly connect to the NameNode. Use the service command to run the /etc/init.d script.
$ sudo service hadoop-hdfs-datanode start
You'll see some extra information in the logs such as:
10/10/25 17:21:41 INFO security.UserGroupInformation:
Login successful for user hdfs/fully.qualified.domain.name@YOUR-REALM.COM using keytab file /etc/hadoop/conf/hdfs.keytab
If you can get a single DataNode running and see it registering with the NameNode in the logs, start up all the DataNodes; you should then be able to perform all HDFS operations. Different slave nodes in the cluster can have different amounts of disk space available: the capacity of the existing slave nodes has no bearing on the capacity with which new nodes can be configured.
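
Since heterogeneous slave hardware is supported, each DataNode lists its own storage directories in its local hdfs-site.xml. As a minimal sketch (the mount-point paths below are illustrative assumptions, and the property is named dfs.datanode.data.dir in Hadoop 2 / dfs.data.dir in Hadoop 1), a new node with four large drives might be configured as:

<property>
  <name>dfs.datanode.data.dir</name>
  <!-- one entry per local disk; this node simply lists more (or larger) mounts than the older nodes -->
  <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn</value>
</property>

Because every DataNode reads its own copy of this file, nodes with different disk layouts can coexist in one cluster.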






Question : In the QuickTechie Inc Hadoop infrastructure, you have created a rack topology script that identifies each machine
as being in hadooprack1, hadooprack2, or hadooprack3. Now, as a developer, from your desktop, which is outside of your QuickTechie Hadoop cluster
but on the same network, you write 64MB of data. Your Hadoop cluster has the default configuration, and the first replica of the block is written
to a node on hadooprack2. Select the correct statement about the other two replicas.
1. One of the replicas will be written to hadooprack2, and the other to hadooprack3.

2. Either both will be written to nodes on hadooprack1, or both will be written to nodes on hadooprack3.

3. (option text available to subscribers only)

4. One will be written to hadooprack1, and one will be written to hadooprack3.


Correct Answer : 2

Explanation: HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block are the same size. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file. An application can specify the number of replicas of a file. The replication factor can be specified at file creation time and can be changed later. Files in HDFS are write-once and have strictly one writer at any time.

The NameNode makes all decisions regarding replication of blocks. It periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster. Receipt of a Heartbeat implies that the DataNode is functioning properly. A Blockreport contains a list of all blocks on a DataNode.

For the default threefold replication, Hadoop's rack placement policy is to write the first copy of a block on a node in one rack, then the other two copies on two nodes in a different rack. Since the first copy is written to hadooprack2, the other two will either be written to two nodes on hadooprack1, or two nodes on hadooprack3.

Replica Placement: The First Baby Steps

The placement of replicas is critical to HDFS reliability and performance. Optimizing replica placement distinguishes HDFS from most other distributed file systems. This is a feature that needs lots of tuning and experience. The purpose of a rack-aware replica placement policy is to improve data reliability, availability, and network bandwidth utilization. The current implementation for the replica placement policy is a first effort in this direction. The short-term goals of implementing this policy are to validate it on production systems, learn more about its behavior, and build a foundation to test and research more sophisticated policies.

Large HDFS instances run on a cluster of computers that commonly spread across many racks. Communication between two nodes in different racks has to go through switches. In most cases, network bandwidth between machines in the same rack is greater than network bandwidth between machines in different racks.

The NameNode determines the rack id each DataNode belongs to via the process outlined in Hadoop Rack Awareness. A simple but non-optimal policy is to place replicas on unique racks. This prevents losing data when an entire rack fails and allows use of bandwidth from multiple racks when reading data. This policy evenly distributes replicas in the cluster which makes it easy to balance load on component failure. However, this policy increases the cost of writes because a write needs to transfer blocks to multiple racks.

For the common case, when the replication factor is three, HDFS's placement policy is to put one replica on one node in the local rack, another on a node in a different (remote) rack, and the last on a different node in the same remote rack. This policy cuts the inter-rack write traffic which generally improves write performance. The chance of rack failure is far less than that of node failure; this policy does not impact data reliability and availability guarantees. However, it does reduce the aggregate network bandwidth used when reading data since a block is placed in only two unique racks rather than three. With this policy, the replicas of a file do not evenly distribute across the racks. One third of replicas are on one node, two thirds of replicas are on one rack, and the other third are evenly distributed across the remaining racks. This policy improves write performance without compromising data reliability or read performance.

The current, default replica placement policy described here is a work in progress.
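
For reference, rack awareness of the kind used in this question is wired up by pointing Hadoop at a topology script. A minimal sketch for core-site.xml follows; the script path is an assumption, and the property is named net.topology.script.file.name in Hadoop 2 (topology.script.file.name in Hadoop 1). The script receives host names or IP addresses as arguments and prints a rack ID such as /hadooprack2 for each one.

<property>
  <name>net.topology.script.file.name</name>
  <!-- hypothetical path; the script maps each host to /hadooprack1, /hadooprack2, or /hadooprack3 -->
  <value>/etc/hadoop/conf/topology.sh</value>
</property>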



Related Questions


Question : The Fair Scheduler works best when there is:
1. a need for higher memory
2. a lot of variability between queues
3. (option text available to subscribers only)
4. a need for higher CPU
5. a need for all jobs to be processed in submission order


Question : Select the correct statement regarding Capacity Scheduler
1. The Capacity scheduler permits sharing a cluster while giving each user or group certain minimum capacity guarantees.
2. The Capacity scheduler currently supports memory-intensive applications, where an application can optionally specify higher memory resource requirements than the default.
3. (option text available to subscribers only)
4. 1 and 3
5. 1 and 2
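
As a hedged illustration of the minimum-capacity guarantees in statement 1: the Capacity Scheduler reads its queue definitions from capacity-scheduler.xml. The queue names and percentages below are invented for the example:

<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>dev,prod</value>
</property>
<property>
  <!-- the dev queue is guaranteed at least 30% of the cluster's capacity -->
  <name>yarn.scheduler.capacity.root.dev.capacity</name>
  <value>30</value>
</property>
<property>
  <!-- the prod queue is guaranteed the remaining 70% -->
  <name>yarn.scheduler.capacity.root.prod.capacity</name>
  <value>70</value>
</property>

Capacity left idle by one queue can be used by the other, which is how the cluster is shared while each group keeps its guarantee.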


Question : Which of the following properties can exist only in hdfs-site.xml?
1. fs.default.name
2. hadoop.http.staticuser.user
3. (option text available to subscribers only)
4. 1 and 2
5. 1 and 3
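
For context on why the file matters: fs.default.name (the deprecated name for fs.defaultFS) and hadoop.http.staticuser.user both belong in core-site.xml, while HDFS-specific properties such as the replication factor live in hdfs-site.xml. A minimal sketch, with an illustrative NameNode hostname:

core-site.xml:
<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode.example.com:8020</value>
</property>

hdfs-site.xml:
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>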


Question : Which of the following properties can be configured in mapred-site.xml



1. yarn --> mapreduce.framework.name
2. $mr_hist:10020 --> mapreduce.jobhistory.address
3. (option text available to subscribers only)
4. 2 and 3
5. 1,2 and 3
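
Rendered as actual configuration, options 1 and 2 correspond to the following mapred-site.xml fragment (mr_hist is the question's placeholder for the JobHistory Server host; 10020 is the default port):

<property>
  <!-- run MapReduce jobs on YARN rather than the classic MRv1 framework -->
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <!-- RPC address of the MapReduce JobHistory Server -->
  <name>mapreduce.jobhistory.address</name>
  <value>mr_hist:10020</value>
</property>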


Question : Which of the following properties are configured in yarn-site.xml?
1. mapreduce.shuffle --> yarn.nodemanager.aux-services
2. org.apache.hadoop.mapred.ShuffleHandler --> yarn.nodemanager.aux-services.mapreduce.shuffle.class
3. (option text available to subscribers only)
4. $rmgr:8030 --> yarn.resourcemanager.scheduler.address
5. $rmgr:8031 --> yarn.resourcemanager.resource-tracker.address
6. $rmgr:8032 --> yarn.resourcemanager.address
7. $rmgr:8033 --> yarn.resourcemanager.admin.address
8. $rmgr:8088 --> yarn.resourcemanager.webapp.address

1. 1,2,3,6,7,8
2. 2,3,4,5,7,8
3. (option text available to subscribers only)
4. 1,2,3,4,7,8
5. All 1,2,3,4,5,6,7,8
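
As a sketch of how a few of the listed options look in yarn-site.xml (rmgr stands in for the ResourceManager host, as in the question; note that newer Hadoop 2 releases spell the aux-service name mapreduce_shuffle rather than mapreduce.shuffle):

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce.shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <!-- scheduler interface that ApplicationMasters talk to; 8030 is the default port -->
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>rmgr:8030</value>
</property>
<property>
  <!-- ResourceManager web UI; 8088 is the default port -->
  <name>yarn.resourcemanager.webapp.address</name>
  <value>rmgr:8088</value>
</property>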


Question : Select the correct statement which applies to "Fair Scheduler"

1. Fair scheduling is a method of assigning resources to applications such that all apps get, on average, an equal share of resources over time
2. By default, the Fair Scheduler bases scheduling fairness decisions only on CPU
3. (option text available to subscribers only)
4. 1 and 3
5. 1, 2 and 3
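
For reference, switching the ResourceManager to the Fair Scheduler described in statement 1 is a one-property change in yarn-site.xml (a minimal sketch; per-queue weights and minimum shares then go in a separate allocations file):

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>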