1. Map files are stored on the NameNode and capture the metadata for all blocks on a particular rack. This is how Hadoop is "rack aware".
2. Map files are the files that show how the data is distributed in the Hadoop cluster.
3. Map files are generated by MapReduce after the reduce step. They show the task distribution during job execution.
4. Map files are sorted sequence files that also have an index. The index allows fast data lookup.
The Hadoop MapFile is a variation of the SequenceFile, and it is central to the map-side join design pattern.
A MapFile is a sorted SequenceFile with an index to permit lookups by key. A MapFile can be thought of as a persistent form of java.util.Map (although it doesn't implement this interface), one that is able to grow beyond the size of a Map kept in memory.
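The sorted-data-plus-sparse-index idea above can be sketched in plain Python. This is only an in-memory illustration of why the index makes lookups fast, not Hadoop's actual MapFile API (the real MapFile stores its data and index parts as SequenceFiles on HDFS); the function names here are made up for the example.

```python
import bisect

# Conceptual sketch of a MapFile: a sorted list of (key, value) records
# plus a sparse index recording every Nth key and its position.
INDEX_INTERVAL = 128  # Hadoop's io.map.index.interval also defaults to 128

def build_mapfile(records):
    """Records must already be sorted by key, as MapFile.Writer enforces."""
    data = list(records)
    index = [(data[i][0], i) for i in range(0, len(data), INDEX_INTERVAL)]
    return data, index

def lookup(data, index, key):
    """Binary-search the sparse index, then scan at most one interval."""
    keys = [k for k, _ in index]
    i = bisect.bisect_right(keys, key) - 1
    if i < 0:
        return None  # key sorts before the first indexed key
    start = index[i][1]
    for k, v in data[start:start + INDEX_INTERVAL]:
        if k == key:
            return v
        if k > key:
            break  # data is sorted, so the key cannot appear later
    return None

data, index = build_mapfile((f"key{i:05d}", i * i) for i in range(1000))
print(lookup(data, index, "key00500"))  # -> 250000
```

Because only every 128th key is held in the index, the index stays small enough to keep in memory while the data file itself can be arbitrarily large, which is exactly the trade-off the real MapFile makes.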
Refer HadoopExam.com Recorded Training Module : 7
Question :
Which of the following utilities allows you to create and run MapReduce jobs with any executable or script as the mapper and/or the reducer?
The Streaming API allows developers to write Mappers and Reducers in any language they wish, as long as the language can read from standard input and write to standard output.
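The stdin/stdout contract can be sketched as an ordinary word-count mapper and reducer pair. The helper names below are illustrative, not part of any API; in a real job these would be standalone scripts passed to the streaming jar via its -mapper and -reducer options.

```python
from itertools import groupby

# Hadoop Streaming contract: the mapper reads raw lines from stdin and
# emits tab-separated "key\tvalue" lines; the framework sorts by key and
# feeds the sorted stream to the reducer, which reads the same format.

def mapper(lines):
    """Word-count mapper: emit one ("word", 1) pair per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Word-count reducer: sum the counts for each key."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(v) for _, v in group)}"

if __name__ == "__main__":
    # sorted() here simulates the shuffle/sort Hadoop performs in between.
    for out in reducer(sorted(mapper(["the quick fox", "the lazy dog"]))):
        print(out)
```

The key point is that neither function imports anything Hadoop-specific: any program obeying this line protocol, in any language, can serve as the mapper or reducer.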
Question :
You need a distributed, scalable datastore that allows you random, real-time read/write access to hundreds of terabytes of data. Which of the following would you use?
Apache HBase: HBase is the Hadoop database, a NoSQL datastore.
- Can store massive amounts of data: gigabytes, terabytes, and even petabytes of data in a table
- Scales to provide very high write throughput: hundreds of thousands of inserts per second
- Copes well with sparse data: tables can have many thousands of columns, even if most columns are empty for any given row
- Has a very constrained access model: insert a row, retrieve a row, do a full or partial table scan
- Only one column (the row key) is indexed
- Does not support multi-row transactions
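The constrained access model above can be illustrated with a small sketch: rows kept sorted by row key, the row key as the only index, and no operations beyond put, get-by-key, and a (partial) scan. This is a conceptual toy, not the HBase client API; the class and method names are made up for the example.

```python
import bisect

class RowKeyStore:
    """Toy model of HBase's access model: row-key-only indexing."""

    def __init__(self):
        self._keys = []   # row keys kept in sorted order
        self._rows = {}   # row key -> {column: value}; columns are sparse

    def put(self, row_key, column, value):
        """Insert or update one cell of a row."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Retrieve a whole row by its key, or None if absent."""
        return self._rows.get(row_key)

    def scan(self, start=None, stop=None):
        """Partial table scan: all rows with start <= key < stop."""
        lo = 0 if start is None else bisect.bisect_left(self._keys, start)
        hi = len(self._keys) if stop is None else bisect.bisect_left(self._keys, stop)
        for k in self._keys[lo:hi]:
            yield k, self._rows[k]

store = RowKeyStore()
store.put("row2", "cf:a", "x")
store.put("row1", "cf:b", "y")
store.put("row3", "cf:a", "z")
print([k for k, _ in store.scan("row1", "row3")])  # -> ['row1', 'row2']
```

Note what is deliberately missing: there is no way to look a row up by a column value (only the row key is indexed), and each put touches a single row, mirroring the lack of multi-row transactions.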
Refer HadoopExam.com Recorded Training Module : 18