
Cloudera Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : A Combiner reduces the amount of data sent to the Reducer?

1. True
2. False

Correct Answer : 1


Explanation: Often, Mappers produce large amounts of intermediate data
- The data must be passed to the Reducers
- This can result in a lot of network traffic.

You can specify a Combiner, which can be thought of as a mini-reducer
- The Combiner runs locally on a single Mapper's output.
- Output from the Combiner is sent to the Reducers.
- The Combiner's input and output key/value types must be identical, and must match the Mapper's output types (which are also the Reducer's input types).
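
The flow described above can be sketched with a small in-memory simulation. This is only an illustration in plain Python, not Hadoop API code: the function names (`map_words`, `combine`, `reduce_counts`) and the two sample input splits are hypothetical.

```python
from collections import defaultdict

def map_words(split):
    """Mapper: emit one (word, 1) pair per word in the input split."""
    return [(word, 1) for word in split.split()]

def combine(pairs):
    """Combiner: sum the counts per key within a single Mapper's output."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

def reduce_counts(all_pairs):
    """Reducer: sum the counts per key across all Mappers."""
    totals = defaultdict(int)
    for key, value in all_pairs:
        totals[key] += value
    return dict(totals)

# Two hypothetical input splits, each handled by its own Mapper.
splits = ["the cat and the hat", "the dog and the log"]
mapper_outputs = [map_words(s) for s in splits]

# Without a Combiner, every intermediate pair is sent to the Reducers.
without_combiner = [p for out in mapper_outputs for p in out]

# With a Combiner, each Mapper's output is aggregated locally first.
with_combiner = [p for out in mapper_outputs for p in combine(out)]

print(len(without_combiner), "pairs sent without a Combiner")
print(len(with_combiner), "pairs sent with a Combiner")

# The final result is the same either way.
assert reduce_counts(without_combiner) == reduce_counts(with_combiner)
```

The saving grows with the number of repeated keys inside each Mapper's output; with real text, where a few words are very frequent, it is usually much larger than in this toy input.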

A Combiner can be applied only when the operation performed is commutative and associative.
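
To see why this condition matters, the following sketch compares a sum (commutative and associative) with a mean (not associative). It is plain Python rather than Hadoop code, and the helper names and sample values are assumptions made up for the illustration.

```python
def reduce_sum(values):
    """A reduce operation that is commutative and associative."""
    return sum(values)

def reduce_mean(values):
    """A reduce operation that is NOT safe to reuse as a Combiner."""
    return sum(values) / len(values)

# Hypothetical values for one key, produced by two different Mappers.
mapper_a = [2, 4]
mapper_b = [6, 8, 10]

# Sum: combining per Mapper first gives the same answer as reducing everything.
direct_sum = reduce_sum(mapper_a + mapper_b)
combined_sum = reduce_sum([reduce_sum(mapper_a), reduce_sum(mapper_b)])

# Mean: a mean of per-Mapper means is not the overall mean.
direct_mean = reduce_mean(mapper_a + mapper_b)
combined_mean = reduce_mean([reduce_mean(mapper_a), reduce_mean(mapper_b)])

print("sum :", direct_sum, "vs", combined_sum)
print("mean:", direct_mean, "vs", combined_mean)
```

A common workaround for a mean is to have the Combiner emit partial (sum, count) pairs and let the Reducer do the final division.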

Note : The Combiner may run once, or more than once, on the output from any given Mapper.

Do not put code in the Combiner that could influence your results if it runs more than once.
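
As a rough illustration of this note, the sketch below applies two hypothetical combiners to the same Mapper output once and twice. It is plain Python, not Hadoop code; `safe_combiner`, `unsafe_combiner` and `apply_times` are names invented for the example.

```python
from collections import defaultdict

def safe_combiner(pairs):
    """Sums the values per key; re-running it on its own output changes nothing."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

def unsafe_combiner(pairs):
    """Counts records per key instead of summing values (a hypothetical bug)."""
    totals = defaultdict(int)
    for key, _value in pairs:
        totals[key] += 1
    return sorted(totals.items())

def apply_times(combiner, pairs, times):
    """Run the combiner repeatedly, feeding each output back in as input."""
    for _ in range(times):
        pairs = combiner(pairs)
    return pairs

# Hypothetical output of a single word-count Mapper.
mapper_output = [("the", 1), ("cat", 1), ("the", 1), ("the", 1)]

safe_once = apply_times(safe_combiner, mapper_output, 1)
safe_twice = apply_times(safe_combiner, mapper_output, 2)
unsafe_once = apply_times(unsafe_combiner, mapper_output, 1)
unsafe_twice = apply_times(unsafe_combiner, mapper_output, 2)

print("safe  :", safe_once, safe_twice)
print("unsafe:", unsafe_once, unsafe_twice)
```

The unsafe version looks correct after one pass because every input value happens to be 1, but a second pass collapses the count for "the" from 3 to 1.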


Refer to HadoopExam.com Recorded Training Module : 3






Question : A Combiner reduces network traffic but increases the amount of work that the Reducer has to do?

1. True
2. False

Correct Answer : 2


Explanation: A Combiner decreases the amount of network traffic required during the shuffle and sort phase
and often also decreases the amount of work needed to be done by the Reducer.
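
The effect on the Reducer can be sketched by counting how many values it has to iterate over per key. The snippet is a plain Python simulation under assumed inputs, not Hadoop code; `shuffle`, `local_sum` and the three sample Mapper outputs are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def shuffle(mapper_outputs):
    """Group all intermediate values by key, as the shuffle and sort phase does."""
    grouped = defaultdict(list)
    for key, value in chain.from_iterable(mapper_outputs):
        grouped[key].append(value)
    return dict(sorted(grouped.items()))

def local_sum(pairs):
    """Combiner: sum the values per key within one Mapper's output."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

# Hypothetical output of three Mappers counting log levels.
mapper_outputs = [
    [("error", 1)] * 4 + [("warn", 1)] * 2,
    [("error", 1)] * 3,
    [("warn", 1)] * 5 + [("error", 1)],
]

plain = shuffle(mapper_outputs)
combined = shuffle([local_sum(out) for out in mapper_outputs])

# Number of values the Reducer must iterate over in each case.
values_plain = sum(len(v) for v in plain.values())
values_combined = sum(len(v) for v in combined.values())

print("values seen by the Reducer without a Combiner:", values_plain)
print("values seen by the Reducer with a Combiner   :", values_combined)

# The reduced totals are identical in both cases.
assert {k: sum(v) for k, v in plain.items()} == {k: sum(v) for k, v in combined.items()}
```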



Refer to HadoopExam.com Recorded Training Module : 3




Question : Which of the following is correct for the Pseudo-Distributed mode of Hadoop?

1. This is a single-machine cluster
2. All daemons run on the same machine
3. It is not necessary to run all the daemons in this mode
4. 1, 2 and 3 are all correct
5. Only 1 and 2 are correct




Correct Answer : 5


Explanation: A developer will typically configure their machine to run in Pseudo-Distributed mode.

This effectively creates a single-machine cluster.
All five Hadoop daemons (NameNode, Secondary NameNode, DataNode, JobTracker and TaskTracker) run on the same machine.
This is very useful for testing code before it is deployed to the real cluster.

Refer to HadoopExam.com Recorded Training Module : 14 and 16



Related Questions


Question : A TaskTracker cannot start multiple tasks on the same node

1. True
2. False


Question : A TaskTracker runs all the Map tasks in the same JVM if the machine has enough processing power and memory

1. True
2. False


Question : Select the correct statement

1. While the job is running, the intermediate data keeps getting deleted
2. Reducers write their final output to HDFS
3. Intermediate data is never deleted; HDFS stores it for history tracking
4. 1, 2 and 3 are all correct
5. None of the above



Question : The intermediate data is held on the TaskTracker's local disk?
1. True
2. False



Question : Which Hadoop project provides an SQL-like interface for accessing data stored in HDFS?
1. Flume
2. Hive
3. Pig
4. 2 and 3


Question : Which of the following projects provides a dataflow language for transforming large datasets?

1. Hive
2. Pig
3. Flume
4. Both 2 and 3