Question : The intermediate data is held on the TaskTracker's local disk. 1. True 2. False
Correct Answer : 1
Intermediate Data
The intermediate data is held on the TaskTracker's local disk
- As Reducers start up, the intermediate data is distributed across the network to the Reducers
- Reducers write their final output to HDFS
- Once the job has completed, the TaskTracker can delete the intermediate data from its local disk
- Note that the intermediate data is not deleted until the entire job completes
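As a minimal illustration (not from the training material), the MRv1 property mapred.local.dir controls where the TaskTracker keeps this intermediate data on its local disks; the directories in this mapred-site.xml fragment are placeholders only:

  <property>
    <name>mapred.local.dir</name>
    <value>/data/1/mapred/local,/data/2/mapred/local</value>
  </property>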
Refer HadoopExam.com Recorded Training Module : 2,3 and 4
Question : Which Hadoop project gives a SQL-like interface to access data stored in HDFS? 1. Flume 2. Hive 3. Pig 4. 2 and 3
Correct Answer : 2
Apache Hive :
Hive is an abstraction on top of MapReduce
- Allows users to query data in the Hadoop cluster without knowing Java or MapReduce
- Uses the HiveQL language, which is very similar to SQL
- The Hive Interpreter runs on a client machine: it turns HiveQL queries into MapReduce jobs and submits those jobs to the cluster
- Note: this does not turn the cluster into a relational database server! It is still simply running MapReduce jobs, and those jobs are created by the Hive Interpreter
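For illustration only (the table and column names below are hypothetical), a HiveQL query reads almost exactly like SQL; the Hive Interpreter compiles it into one or more MapReduce jobs and submits them to the cluster:

  -- hypothetical table: count employees per department
  SELECT department, COUNT(*) AS num_employees
  FROM employees
  GROUP BY department;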
Refer HadoopExam.com Recorded Training Module : 12 and 13
Question : Which of the following projects provides a dataflow language for transforming large datasets?
1. Hive 2. Pig 3. Flume 4. 2 and 3 both
Correct Answer : 2
Apache Pig :
Pig is an alternative abstraction on top of MapReduce
- Uses a dataflow scripting language called PigLatin
- The Pig interpreter runs on the client machine: it takes the PigLatin script, turns it into a series of MapReduce jobs and submits those jobs to the cluster
- As with Hive, nothing magical happens on the cluster; it is still simply running MapReduce jobs
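A rough PigLatin sketch (the input path and field names are made up for illustration); each statement describes one step in the dataflow, and the Pig interpreter turns the whole script into MapReduce jobs:

  -- hypothetical web-log data: total bytes served per client IP
  logs    = LOAD '/user/hadoop/weblogs' AS (ip:chararray, url:chararray, bytes:int);
  by_ip   = GROUP logs BY ip;
  traffic = FOREACH by_ip GENERATE group AS ip, SUM(logs.bytes) AS total_bytes;
  STORE traffic INTO '/user/hadoop/traffic_by_ip';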
Refer HadoopExam.com Recorded Training Module : 11
1. Hadoop uses a lot of machines in parallel. This optimizes data processing. 2. Hadoop was specifically designed to process large amounts of data by taking advantage of MPP hardware 3. Hadoop ships the code to the data instead of sending the data to the code 4. Hadoop uses sophisticated caching techniques on the namenode to speed up processing of data
1. Sequence files are binary format files that are compressed and are splittable. They are often used in high-performance map-reduce jobs 2. Sequence files are a type of file in the Hadoop framework that allows data to be sorted 3. Sequence files are intermediate files that are created by Hadoop after the map step 4. All of the above
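As a rough sketch of what option 1 describes (using the old Hadoop 1.x API; the output path and records are invented for illustration), writing a SequenceFile of binary key/value pairs looks like this:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.SequenceFile;
  import org.apache.hadoop.io.Text;

  public class SequenceFileWriteExample {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          FileSystem fs = FileSystem.get(conf);
          Path path = new Path("/tmp/example.seq");   // hypothetical output path

          // Append a few binary key/value records; SequenceFiles can also be
          // block-compressed, which keeps them splittable for MapReduce.
          SequenceFile.Writer writer =
                  SequenceFile.createWriter(fs, conf, path, IntWritable.class, Text.class);
          try {
              for (int i = 0; i < 3; i++) {
                  writer.append(new IntWritable(i), new Text("record-" + i));
              }
          } finally {
              writer.close();
          }
      }
  }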
1. Map files are stored on the namenode and capture the metadata for all blocks on a particular rack. This is how Hadoop is "rack aware" 2. Map files are the files that show how the data is distributed in the Hadoop cluster. 3. Map files are generated by Map-Reduce after the reduce step. They show the task distribution during job execution 4. Map files are sorted sequence files that also have an index. The index allows fast data lookup.
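A rough sketch of what option 4 describes (old Hadoop 1.x API; the directory name and records are hypothetical): a MapFile is written as a sorted SequenceFile plus an index, and the index lets a reader look a key up directly:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.MapFile;
  import org.apache.hadoop.io.Text;

  public class MapFileLookupExample {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          FileSystem fs = FileSystem.get(conf);
          String dir = "/tmp/example.map";            // hypothetical MapFile directory

          // A MapFile is a directory holding a sorted "data" SequenceFile plus
          // an "index" file; keys must be appended in sorted order.
          MapFile.Writer writer =
                  new MapFile.Writer(conf, fs, dir, IntWritable.class, Text.class);
          for (int i = 0; i < 100; i++) {
              writer.append(new IntWritable(i), new Text("value-" + i));
          }
          writer.close();

          // The index lets the reader seek close to a key instead of scanning
          // the whole file.
          MapFile.Reader reader = new MapFile.Reader(fs, dir, conf);
          Text value = new Text();
          reader.get(new IntWritable(42), value);
          System.out.println("key 42 -> " + value);
          reader.close();
      }
  }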