Cloudera Hadoop Administrator Certification Certification Questions and Answer (Dumps and Practice Questions)

Question : The Fair scheduler works best when there is a

1. When there is a need of Higher Memory
2. lot of variability between queues
3. Access Mostly Uused Products by 50000+ Subscribers
4. When there is a need of Higher CPU
5. When all the Jobs needs to be processed in submission order

Correct Answer : Get Lastest Questions and Answer :

Explanation: A new feature in the YARN Fair scheduler is support for hierarchical queues. Queues may now be nested inside other queues, with each queue splitting
the resources allotted to it among its subqueues in a fair scheduling fashion. One use of hierarchical queues is to represent organizational boundaries and
hierarchies. For example, Marketing and Engineering departments may now arrange a queue structure to reflect their own organization. A queue can also
be divided into subqueues by job characteristics, such as short, medium, and long run times.
The Fair scheduler works best when there is a lot of variability between queues. Unlike with the Capacity scheduler, all jobs make progress rather than
proceeding in a FIFO fashion in their respective queues.

Question : Select the correct statement regarding Capacity Scheduler

1. The Capacity scheduler permits sharing a cluster while giving each user or group certain minimum capacity guarantees.
2. The Capacity scheduler currently supports memory-intensive applications, where an application can optionally specify higher memory resource requirements than the default.
3. Access Mostly Uused Products by 50000+ Subscribers
4. 1 and 3
5. 1 and 2

Correct Answer : Get Lastest Questions and Answer :

Explanation: The Capacity scheduler is another pluggable scheduler for YARN that allows for multiple groups to securely share a large Hadoop cluster. Developed by
the original Hadoop team at Yahoo!, the Capacity scheduler has successfully been running many of the largest Hadoop clusters.
To use the Capacity scheduler, an administrator configures one or more queues with a predetermined fraction of the total slot (or processor) capacity.
This assignment guarantees a minimum amount of resources for each queue. Administrators can configure soft limits and optional hard limits on the
capacity allocated to each queue. Each queue has strict ACLs (Access Control Lists) that control which users can submit applications to individual queues.
Also, safeguards are in place to ensure that users cannot view or modify applications from other users.
The Capacity scheduler permits sharing a cluster while giving each user or group certain minimum capacity guarantees. These minimums are not given
away in the absence of demand. Excess capacity is given to the most starved queues, as assessed by a measure of running or used capacity divided by
the queue capacity. Thus, the fullest queues as defined by their initial minimum capacity guarantee get the most needed resources. Idle capacity can be
assigned and provides elasticity for the users in a cost-effective manner.
Queue definitions and properties such as capacity and ACLs can be changed, at run time, by administrators in a secure manner to minimize disruption to
users. Administrators can add additional queues at run time, but queues cannot be deleted at run time. In addition, administrators can stop queues at run
time to ensure that while existing applications run to completion, no new applications can be submitted.
The Capacity scheduler currently supports memory-intensive applications, where an application can optionally specify higher memory resource
requirements than the default. Using information from the NodeManagers, the Capacity scheduler can then place containers on the best-suited nodes.
The Capacity scheduler works best when the workloads are well known, which helps in assigning the minimum capacity. For this scheduler to work most
effectively, each queue should be assigned a minimal capacity that is less than the maximal expected workload. Within each queue, multiple applications
are scheduled using hierarchical FIFO queues similar to the approach used with the stand-alone FIFO scheduler.

Question :

Which of the following properties can exist only in the hdfs-site.xml

1. fs.default.name
2. hadoop.http.staticuser.user
3. Access Mostly Uused Products by 50000+ Subscribers
4. 1 and 2
5. 1 and 3

Correct Answer : Get Lastest Questions and Answer :
Exp: core-site.xml
In this file, we define two essential properties for the entire system.
hdfs://$nn:9000 --> fs.default.name
$HTTP_STATIC_USER --> hadoop.http.staticuser.user
First, we define the name of the default file system we wish to use. Because we are using HDFS, we will set this value to hdfs://$nn:9000 ($nn is the
NameNode we specified in the script and 9000 is the standard HDFS port). Next we add the hadoop.http.staticuser.user (hdfs) that we defined in the install script.
This login is used as the default user for the built-in web user interfaces.

hdfs-site.xml
The hdfs-site.xml file holds information about the Hadoop HDFS file system. Most of these values were set at the beginning of the script. They are copied
as follows:
$NN_DATA_DIR --> dfs.namenode.name.dir
$SNN_DATA_DIR --> fs.checkpoint.dir
$SNN_DATA_DIR --> fs.checkpoint.edits.dir
$DN_DATA_DIR --> dfs.datanode.data.dir
The remaining two values are set to the standard default port numbers ($nn is the NameNode and $snn is the SecondaryNameNode we input to the
script):

$nn:50070 --> dfs.namenode.http-address
$snn:50090 --> dfs.namenode.secondary.http-address

Related Questions

Question : You have a website www.QuickTechie.com, where you have one month user profile updates log. Now for the classification analysis you want
to save all the data in a single file called QT31012015.log which is approximately in 30GB in size. Now using the MapReduce ETL job you are
able to push this full file in a directory on HDFS called /log/QT/QT31012015.log. Now the permission for your file and directory as below in the HDFS.

QT31012015.log -> rw-rw-r-x
Which is the correct statement for the file (QT31012015.log)

1. No one can modify the contents of the file.

2. The owner and group can modify the contents of the file. Others cannot.
3. Access Mostly Uused Products by 50000+ Subscribers

4. HDFS runs in userspace which makes all users with access to the namespace able to read, write, and modify all files.
5. The owner and group cannot delete the file, but others can.

Question : You have a website www.QuickTechie.com, where you have one month user profile updates log. Now for the classification analysis you want
to save all the data in a single file called QT31012015.log which is approximately in 30GB in size. Now using the MapReduce ETL job you are able
to push this full file in a directory on HDFS called /log/QT/QT31012015.log. Now the permission for your file and directory as below in the HDFS.
QT31012015.log -> rw-r--r--
/log/QT -> rwxr-xr-x
Which is the correct statement for the file (QT31012015.log)

1. The file cannot be deleted by anyone but the owner
2. The file cannot be deleted by anyone

3. Access Mostly Uused Products by 50000+ Subscribers
4. The file's existing contents can be modified by the owner, but no-one else

Question : You have a website www.QuickTechie.com, where you have one month user profile updates log. Now for the classification analysis you want to
save all the data in a single file called QT31012015.log which is approximately in 30GB in size. Now using the MapReduce ETL job you are able to push this
full file in a directory on HDFS called /log/QT/QT31012015.log. You want that your data in /log/QT/QT31012015.log file will not be compromised,
so what does HDFS help us for this

1. Storing multiple replicas of data blocks on different DataNodes.

2. Reliance on SAN devices as a DataNode interface.
3. Access Mostly Uused Products by 50000+ Subscribers

4. DataNodes make copies of their data blocks, and put them on different local disks.

Question :What is HBASE?

1. Hbase is separate set of the Java API for Hadoop cluster
2. Hbase is a part of the Apache Hadoop project that provides interface for scanning large amount of data using Hadoop infrastructure
3. Access Mostly Uused Products by 50000+ Subscribers
4. HBase is a part of the Apache Hadoop project that provides a SQL like interface for data processing.

Question :What is the role of the namenode?

1. Namenode splits big files into smaller blocks and sends them to different datanodes
2. Namenode is responsible for assigning names to each slave node so that they can be identified by the clients
3. Access Mostly Uused Products by 50000+ Subscribers
4. Both 2 and 3 are valid answers

Question : What happen if a datanode loses network connection for a few minutes?

1. The namenode will detect that a datanode is not responsive and will start replication of the data from remaining replicas. When datanode comes back online, administrator will need to manually delete the extra replicas
2. All data will be lost on that node. The administrator has to make sure the proper data distribution between nodes
3. Access Mostly Uused Products by 50000+ Subscribers
4. The namenode will detect that a datanode is not responsive and will start replication of the data from remaining replicas. When datanode comes back online, the extra replicas will be deleted

Ans : 4
Exp : The replication factor is actively maintained by the namenode. The namenode monitors the status of all datanodes and keeps track which blocks are located on that node. The moment the datanode is not avaialble it will trigger replication of the data from the existing replicas. However, if the datanode comes back up, overreplicated data will be deleted. Note: the data might be deleted from the original datanode.

Question : What happen if one of the datanodes has much slower CPU? How will it effect the performance of the cluster?

1. The task execution will be as fast as the slowest worker.
However, if speculative execution is enabled, the slowest worker will not have such big impact
2. The slowest worker will significantly impact job execution time. It will slow everything down
3. Access Mostly Uused Products by 50000+ Subscribers
4. It depends on the level of priority assigned to the task. All high priority tasks are executed in parallel twice. A slower datanode would therefore be bypassed. If task is not high priority, however, performance will be affected.
Ans : 1
Exp : Hadoop was specifically designed to work with commodity hardware. The speculative execution helps to offset the slow workers. The multiple instances of the same task will be created and job tracker will take the first result into consideration and the second instance of the task will be killed

Question :If you have a file M size and replication factor is set to , how many blocks can you find on the cluster that will correspond to
that file (assuming the default apache and cloudera configuration)?

1. 3
2. 6
3. Access Mostly Uused Products by 50000+ Subscribers
4. 12
Ans : 2
Exp : Based on the configuration settings the file will be divided into multiple blocks according to the default block size of 64M. 128M / 64M = 2 . Each block will be replicated according to replication factor settings (default 3). 2 * 3 = 6 .

Question : What is replication factor?

1. Replication factor controls how many times the namenode replicates its metadata
2. Replication factor creates multiple copies of the same file to be served to clients
3. Access Mostly Uused Products by 50000+ Subscribers
4. None of these answers are correct.
Ans : 3
Exp : Data is replicated in the Hadoop cluster based on the replication factor. The high replication factor guarantees data availability in the event of failure.

Question : How does the Hadoop cluster tolerate datanode failures?

1. Failures are anticipated. When they occur, the jobs are re-executed.
2. Datanodes talk to each other and figure out what need to be re-replicated if one of the nodes goes down
3. Access Mostly Uused Products by 50000+ Subscribers
4. Since Hadoop is design to run on commodity hardware, the datanode failures are expected. Namenode keeps track of all available datanodes and actively maintains replication factor on all data.
Ans : 4
Exp : The namenode actively tracks the status of all datanodes and acts immediately if the datanodes become non-responsive. The namenode is the central "brain" of the HDFS and starts replication of the data the moment a disconnect is detected.

Question :Which of the following tool, defines a SQL like language..

1. Pig
2. Hive
3. Access Mostly Uused Products by 50000+ Subscribers
4. Flume
Ans 2

Question : Hadoop framework provides a mechanism for copying with machine issues such as faulty configuration or impeding hardware failure. MapReduce detects
that one or a number of machines are performing poorly and starts more copies of a map or reduce task. all the task run simulteneously
and the task taht finishes first are used. Which term describe this behaviour..

1. Partitioning
2. Combining
3. Access Mostly Uused Products by 50000+ Subscribers
4. Speculative Execution
Ans : 4

Question :
By using hadoop fs -put command to write a 500 MB file using 64 MB blcok, but while the file is half written, can other user read the already written block

1. It will throw an exception
2. File block would be accessible which are already written
3. Access Mostly Uused Products by 50000+ Subscribers
4. Until the whole file is copied nothing can be accessible.
Ans :4
Exp : While writing the file of 528MB size using following command
hadoop fs -put tragedies_big4 /user/training/shakespeare/
We tried to read the file using following command and output is below.
[hadoopexam@localhost ~]$ hadoop fs -cat /user/training/shakespeare/tragedies_big4 cat: "/user/training/shakespeare/tragedies_big4": No such file or directory [hadoopexam@localhost ~]$ hadoop fs -cat /user/training/shakespeare/tragedies_big4 cat: "/user/training/shakespeare/tragedies_big4": No such file or directory [training@localhost ~]$ hadoop fs -cat /user/training/shakespeare/tragedies_big4 cat: "/user/training/shakespeare/tragedies_big4": No such file or directory [training@localhost ~]$
Once the put command finishes then only we are able to "cat" this file.

Question :

What is a BloomFilter

1. It is a data structure
2. A bloom filter is a compect representation of a set that support only conatin query.
3. Access Mostly Uused Products by 50000+ Subscribers
4. All of the above