
Cloudera Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : Rows from an HBase table can directly be used as input to a MapReduce job.

1. True
2. False


Correct Answer : 1


Explanation: You can run MapReduce jobs over data stored in HBase; each table row becomes an input record for the mapper (see the sketch below).
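The following is an illustrative sketch only (not part of the original question set), assuming the standard TableMapper / TableMapReduceUtil classes from the HBase MapReduce integration; the table name "mytable" is a hypothetical placeholder.

// Minimal sketch: a map-only MapReduce job whose input records are rows of an HBase table.
// The table name "mytable" is a hypothetical placeholder.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class HBaseRowCount {

    // Each call to map() receives one HBase row: its row key and its columns (Result).
    static class RowMapper extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable rowKey, Result columns, Context context)
                throws IOException, InterruptedException {
            context.getCounter("hbase_demo", "rows").increment(1); // count rows via a job counter
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "hbase-row-count");
        job.setJarByClass(HBaseRowCount.class);

        Scan scan = new Scan(); // full-table scan; tune caching and filters for real jobs
        TableMapReduceUtil.initTableMapperJob(
                "mytable", scan, RowMapper.class, NullWritable.class, NullWritable.class, job);

        job.setNumReduceTasks(0);                         // map-only for this sketch
        job.setOutputFormatClass(NullOutputFormat.class); // no file output is produced
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}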

Refer to HadoopExam.com Recorded Training Module 18






Question : In which of the following scenarios should we use HBase?

1. If it requires random reads, writes, or both
2. If it requires many thousands of operations per second on multiple TB of data
3. If the access pattern is well known and simple
4. All of the above



Correct Answer : 4

Apache HBase : Use Apache HBase when you need random, realtime read or write access to your Big Data.
HBase's goal is the hosting of very large tables
- billions of rows X millions of columns
- atop clusters of commodity hardware.
- If you know the access pattern in advance, you can put all the data that is used together into a single column family, which makes access faster (see the sketch below).
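As an illustration only (not part of the original question set), the sketch below uses the HBase 1.x-style client API to create a table whose co-accessed columns share a column family, then performs one random write and one random read by row key. The table name "user", the families "profile" and "activity", and the row key "user#1001" are hypothetical.

// Minimal sketch, HBase 1.x-style client API. All table, family, and row names are
// hypothetical and used only for illustration.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseAccessPatternDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {

            // Columns that are read together live in the "profile" family;
            // rarely-read history is kept apart in the "activity" family.
            HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("user"));
            desc.addFamily(new HColumnDescriptor("profile"));
            desc.addFamily(new HColumnDescriptor("activity"));
            admin.createTable(desc);

            // Random, realtime write and read of a single row by its key.
            try (Table user = conn.getTable(TableName.valueOf("user"))) {
                Put put = new Put(Bytes.toBytes("user#1001"));
                put.addColumn(Bytes.toBytes("profile"), Bytes.toBytes("name"), Bytes.toBytes("alice"));
                user.put(put);

                byte[] name = user.get(new Get(Bytes.toBytes("user#1001")))
                                  .getValue(Bytes.toBytes("profile"), Bytes.toBytes("name"));
                System.out.println(Bytes.toString(name));
            }
        }
    }
}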

Refer to HadoopExam.com Recorded Training Module 18








Question : In which scenario should HBase not be used?

1. You only append to your dataset, and tend to read the whole thing
2. For ad-hoc analytics
3. If the data volume is quite small
4. All of the above
5. None of the above


Correct Answer : 4

When should I use HBase, and when should I not?

First, make sure you have enough data.
If you have hundreds of millions or billions of rows, then HBase is a good candidate.
If you only have a few thousand or a few million rows, a traditional RDBMS might be a
better choice, because all of your data might wind up on a single node (or two) while
the rest of the cluster sits idle.

Second, make sure you can live without all the extra features that an RDBMS provides; in particular, ad-hoc analytical queries tend to run slowly against HBase.






Related Questions


Question : What are the core components of the Hadoop framework?

1. HDFS (Hadoop Distributed File System)
2. MapReduce
3. Access Mostly Uused Products by 50000+ Subscribers
4. Both 1 and 2 are correct


Question : Which project is part of the Hadoop ecosystem?

1. Pig
2. Hive
3. Access Mostly Uused Products by 50000+ Subscribers
4. 1 and 2
5. All of the above


Question : What is the possible data block size in Hadoop?

1. 64 MB
2. 128 MB
3. Access Mostly Uused Products by 50000+ Subscribers
4. Both 1 and 2 are correct


Question : What is the default replication factor in HDFS?

1. 1
2. 2
3. Access Mostly Uused Products by 50000+ Subscribers
4. 4


Question : Which of the following are MapReduce processing phases?

1. Map
2. Reduce
3. Access Mostly Uused Products by 50000+ Subscribers
4. Sort
5. 1 and 2 only


Question : What is true about HDFS?

1. HDFS is based on the Google File System
2. HDFS is written in Java
3. Access Mostly Uused Products by 50000+ Subscribers
4. All of the above are correct