
Cloudera Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question :

You have defined a Flume agent named a1 with the following configuration:

a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/%Y-%m-%d/%H%M/%S
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute

An event with timestamp 11:54:34 AM, June 12, 2012 will cause the HDFS path to become:

  :
1. /flume/events/2012-06-12/1150/00
2. /flume/events/2012-06-12/1200/00
3. (option not shown)
4. /flume/events/2012-06-12/1160/00

Correct Answer : 1




Explanation: The configuration rounds the timestamp down to the last 10-minute boundary, because roundValue is set to 10 and roundUnit is minute. 11:54:34 therefore becomes 11:50:00, and the path resolves to /flume/events/2012-06-12/1150/00.
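The rounding behaviour can be sketched in Python. This is only a simulation of what Flume's hdfs.round / hdfs.roundValue / hdfs.roundUnit settings do to the escape sequences in the sink path, not Flume code itself:

```python
from datetime import datetime

def rounded_hdfs_path(ts, round_value_minutes,
                      pattern="/flume/events/%Y-%m-%d/%H%M/%S"):
    """Round the event timestamp DOWN to the nearest multiple of
    round_value_minutes (as Flume does with roundUnit = minute),
    then expand the escape sequences in the sink path."""
    floored = ts.replace(minute=ts.minute - ts.minute % round_value_minutes,
                         second=0, microsecond=0)
    return floored.strftime(pattern)

# Event at 11:54:34 AM, June 12, 2012 with roundValue=10, roundUnit=minute:
print(rounded_hdfs_path(datetime(2012, 6, 12, 11, 54, 34), 10))
# -> /flume/events/2012-06-12/1150/00
```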




Question :

You have defined a Flume agent named a1 with the following configuration:

a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/%Y-%m-%d/%H%M/%S
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 20
a1.sinks.k1.hdfs.roundUnit = minute

One event arrives with timestamp 11:51:34 AM, June 12, 2012, and another arrives at 11:54:34 AM, June 12, 2012.
In which path will the events be stored?

 :
1. /flume/events/2012-06-12/1140/00
2. /flume/events/2012-06-12/1200/00
3. (option not shown)
4. /flume/events/2012-06-12/1160/00

Correct Answer : 1

Explanation: The configuration rounds the timestamp down to the last 20-minute boundary, because roundValue is set to 20 and roundUnit is minute. Both 11:51:34 and 11:54:34 round down to 11:40:00, so both events are written under /flume/events/2012-06-12/1140/00.
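That both events land in the same directory can be checked with a small sketch (again a simulation of the rounding rule, not Flume code):

```python
from datetime import datetime

def bucket(ts, round_value=20):
    """Floor the timestamp's minutes to a multiple of round_value,
    mimicking hdfs.roundValue=20 / hdfs.roundUnit=minute."""
    return ts.replace(minute=ts.minute - ts.minute % round_value,
                      second=0, microsecond=0)

e1 = datetime(2012, 6, 12, 11, 51, 34)
e2 = datetime(2012, 6, 12, 11, 54, 34)

# Both events floor to 11:40:00, so they share one output directory:
assert bucket(e1) == bucket(e2) == datetime(2012, 6, 12, 11, 40)
print(bucket(e1).strftime("/flume/events/%Y-%m-%d/%H%M/%S"))
# -> /flume/events/2012-06-12/1140/00
```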



 : (question text not shown in the source)
1. 1
2. 2
3. (option not shown)

Correct Answer : 2

Explanation: The reducer groups by key within a partition, so implementing a secondary sort requires a Partitioner, a key comparator, and a grouping comparator. Of the four options, the best fit is the second comparator, which compares the first part of the key (the year), so all records for the same year land in the same reducer group. The second part of the key can then be sorted using the key comparator.

We must now ensure that all the values for the same natural key are passed in one call to the Reducer
Achieved by defining a Grouping Comparator class

Determines which keys and values are passed in a single call to the Reducer
Looks at just the natural key

Grouping comparators can be used in a secondary sort to ensure that only the natural key is used for partitioning and grouping
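A minimal Python simulation of these three roles may make the mechanics clearer. In real Hadoop code the key comparator and grouping comparator would be WritableComparator subclasses registered on the job; this sketch only mirrors the sort-then-group behaviour on (year, temperature) records:

```python
from itertools import groupby

# Records with a composite key: (year, temperature).
records = [(2001, 35), (2000, 10), (2001, 12), (2000, 25), (2001, 8)]

# Key comparator role: sort by the FULL composite key,
# so values arrive at the reducer already ordered by temperature.
records.sort(key=lambda kv: (kv[0], kv[1]))

# Grouping comparator role: group on the NATURAL key (year) only,
# so all values for one year are passed in a single reduce call.
grouped = {year: [temp for _, temp in grp]
           for year, grp in groupby(records, key=lambda kv: kv[0])}

print(grouped)
# -> {2000: [10, 25], 2001: [8, 12, 35]}
```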




Question :

There are two input files, shown below, for a MapReduce join job.

input/A
A.a11 A.a12
A.a21 A.a22
B.a21 A.a32
A.a31 A.a32
B.a31 A.a32

input/B
A.a11 B.a12
A.a11 B.a13
B.a11 B.a12
B.a21 B.a22
A.a31 B.a32
B.a31 B.a32

After running the MapReduce join code snippet (left-hand side), what would be the first line of the output?

 :
1. A.a11 A.a12 B.a12
2. A.a11 A.a12 A.a11 B.a13
3. (option not shown)
4. B.a21 A.a32 B.a21 B.a22

Correct Answer : (not shown)
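The join snippet itself is not reproduced here, so the exact output cannot be stated with certainty. As a rough sketch, assuming the snippet performs a standard reduce-side inner join on the first column and emits keys in sorted order, the logic would look like:

```python
from collections import defaultdict

input_a = """A.a11 A.a12
A.a21 A.a22
B.a21 A.a32
A.a31 A.a32
B.a31 A.a32"""

input_b = """A.a11 B.a12
A.a11 B.a13
B.a11 B.a12
B.a21 B.a22
A.a31 B.a32
B.a31 B.a32"""

def reduce_side_join(left, right):
    """Inner join on the first column, emulating a reduce-side join:
    the map phase tags each record by source, the shuffle groups by key,
    and the reduce phase emits the cross product of both sides."""
    buckets = defaultdict(lambda: ([], []))
    for line in left.splitlines():
        key, value = line.split()
        buckets[key][0].append(value)
    for line in right.splitlines():
        key, value = line.split()
        buckets[key][1].append(value)
    out = []
    for key in sorted(buckets):          # keys arrive sorted at reducers
        left_vals, right_vals = buckets[key]
        for lv in left_vals:
            for rv in right_vals:        # keys missing on either side drop out
                out.append(f"{key} {lv} {rv}")
    return out

print(reduce_side_join(input_a, input_b)[0])
# -> A.a11 A.a12 B.a12
```

Under those assumptions the first output line joins key A.a11 with its first right-side value; a different snippet (e.g. a map-side join or an outer join) could order or format the output differently.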



Related Questions


Question : Select the correct statement ?
 :
1. Block size is usually 64 MB or 128 MB
2. Blocks are replicated across multiple machines
3. (option not shown)
4. All of the above


Question : Which is the master node that tracks file blocks in HDFS ?

 :
1. JOBTracker
2. DataNode
3. (option not shown)
4. DataMasterNode


Question : Select the correct options

 :
1. NameNode stores the metadata for the files
2. DataNode holds the actual blocks
3. (option not shown)
4. All of the above
5. 1 and 2 are correct


Question : Select the correct statement for the NameNode ?

 :
1. The NameNode daemon must be running at all times
2. NameNode holds all its metadata in RAM for fast access.
3. (option not shown)
4. 1,2 and 3 are correct
5. 1 and 2 are correct




Question : If the NameNode stops, the cluster becomes inaccessible ?

 :
1. True
2. False


Question : Is the Secondary NameNode a backup for the NameNode ?


 :
1. True
2. False