
Cloudera Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : In the regular WordCount MapReduce example, you have the following driver code.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new WordCount(), args);
        System.exit(res);
    }

    public int run(String[] args) throws Exception {
        Path inputPath = new Path("shakespeare1");
        Path outputPath = new Path("" + System.currentTimeMillis());
        Configuration conf = getConf();
        Job job = new Job(conf, this.getClass().toString());
        FileInputFormat.setInputPaths(job, inputPath);
        FileOutputFormat.setOutputPath(job, outputPath);
        job.setJobName("WordCount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setNumReduceTasks(2);
        return job.waitForCompletion(true) ? 0 : 1;
    }
}
Now you run the command below on a single-node cluster, where wc.jar is the jar file containing the Driver, Mapper, and Reducer classes.
hadoop jar wc.jar WordCount -D mapred.reduce.tasks=3
Select the correct statement from below.
  :
1. It will run 3 reducers, as the command-line option would be preferred
2. It will run 2 reducers, as the driver code has defined the number of reducers
3. Access Mostly Uused Products by 50000+ Subscribers
4. The number of reducers cannot be determined; the command-line and driver configuration is just a hint

Correct Answer : 2
Explanation: Following are the priorities of the three options for setting the number of reducers:
Option 1: setNumReduceTasks(2) within the application code
Option 2: -D mapreduce.job.reduces=2 as a command-line argument
Option 3: through the $HADOOP_CONF_DIR/mapred-site.xml file:

<property>
  <name>mapreduce.job.reduces</name>
  <value>2</value>
</property>

The above are ranked in priority order: option 1 will override option 2, and option 2 will override option 3. In other words, option 1 (setNumReduceTasks in the driver code) is the one used by your job in this scenario, so the job will run 2 reducers.
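
To see this override order in practice, here is a minimal sketch (not part of the original question; the class name ReducerPriorityDemo and the print statements are illustrative assumptions). It prints the reducer count coming from the command line / configuration files, and then the value after the driver calls setNumReduceTasks(2):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical helper class, not part of the exam question.
public class ReducerPriorityDemo extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        // ToolRunner's GenericOptionsParser strips -D key=value pairs from args
        // and applies them to the Configuration before run() is invoked.
        System.exit(ToolRunner.run(new ReducerPriorityDemo(), args));
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        Job job = Job.getInstance(conf, "reducer-priority-demo");

        // Value contributed by -D, mapred-site.xml, or the framework default.
        System.out.println("reduces from conf       : " + conf.get("mapreduce.job.reduces"));

        // The in-code setting is applied last, so it overrides -D and mapred-site.xml.
        job.setNumReduceTasks(2);
        System.out.println("reduces after driver set: "
                + job.getConfiguration().get("mapreduce.job.reduces"));
        return 0;
    }
}

Running it as, for example, hadoop jar demo.jar ReducerPriorityDemo -D mapreduce.job.reduces=3 should print 3 for the first line and 2 for the second, matching the priority order described above.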





Question :

You are running the regular WordCount example with the Mapper and Reducer defined in separate classes. Now you have 4 files
in a directory from which you want to count the number of words.
Out of these 4 files, 3 files have 1 line each and the 4th file has 0 lines.
Now you run the wordcount job; how many Mappers will be executed (assuming you are running on a single node)?


  :
1. Only 1 Mapper, as it is a single-node cluster
2. 3 Mappers, only for the files which have data
3. Access Mostly Uused Products by 50000+ Subscribers
4. The number of Mappers is non-deterministic

Correct Answer :

Explanation: If a file's size is less than the block size (64 MB), then one Mapper will be executed for each file. It does not matter whether a file's size is zero; each file still gets its own input split and hence its own Mapper.
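
As a quick way to check this locally, here is a minimal sketch (the class name SplitCountDemo and the "input" directory are illustrative assumptions) that asks the default TextInputFormat how many splits, and therefore map tasks, a given input directory would produce:

import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Hypothetical helper class, not part of the exam question.
public class SplitCountDemo {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "split-count-demo");
        // Directory containing the 4 small files from the question (hypothetical path).
        FileInputFormat.setInputPaths(job, new Path("input"));

        // Each split becomes one map task; files smaller than a block each get their own split.
        List<InputSplit> splits = new TextInputFormat().getSplits(job);
        System.out.println("Number of map tasks: " + splits.size());
    }
}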




Question : Please select the correct statement about files in HDFS.
  :
1. Files in HDFS can be concurrently updated and read
2. Files in HDFS can be concurrently updated
3. Access Mostly Uused Products by 50000+ Subscribers
4. Files in HDFS cannot be concurrently read

Correct Answer :

Explanation: An application adds data to HDFS by creating a new file and writing the data to it. After the file is closed, the bytes written cannot be altered or removed except that new data can be added to the file by reopening the file for append. HDFS implements a single-writer, multiple-reader model.
The HDFS client that opens a file for writing is granted a lease for the file; no other client can write to the file. The writing client periodically renews the lease by sending a heartbeat to the NameNode. When the file is closed, the lease is revoked. The lease duration is bound by a soft limit and a hard limit. Until the soft limit expires, the writer is certain of exclusive access to the file. If the soft limit expires and the client fails to close the file or renew the lease, another client can preempt the lease. If the hard limit (one hour) expires and the client has failed to renew the lease, HDFS assumes that the client has quit; it will automatically close the file on behalf of the writer and recover the lease. The writer's lease does not prevent other clients from reading the file; a file may have many concurrent readers.
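
The single-writer, multiple-reader model shows up directly in the FileSystem API. Below is a minimal sketch (the path /tmp/lease-demo.txt and the class name are illustrative assumptions, and append support must be enabled on the cluster) of creating a file, closing it, and then reopening it for append:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical example, not part of the exam question.
public class HdfsAppendDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/tmp/lease-demo.txt");   // hypothetical path

        // Single writer: this client holds the lease while the stream is open.
        FSDataOutputStream out = fs.create(file);
        out.writeBytes("first line\n");
        out.close();                                   // closing the file releases the lease

        // Existing bytes cannot be altered, but new data can be appended.
        FSDataOutputStream appendOut = fs.append(file);
        appendOut.writeBytes("appended line\n");
        appendOut.close();

        // Many concurrent readers are allowed, even while a writer holds the lease.
        System.out.println("length = " + fs.getFileStatus(file).getLen());
        fs.close();
    }
}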



Related Questions


Question :

You have defined a Flume agent a1 with the following configuration

a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute

An event with timestamp 11:54:34 AM, June 12, 2012 will cause the HDFS path to become:

  :
1. /flume/events/2012-06-12/1150/00
2. /flume/events/2012-06-12/1200/00
3. Access Mostly Uused Products by 50000+ Subscribers
4. /flume/events/2012-06-12/1160/00


Question :

You have defined a Flume agent a1 with the following configuration

a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 20
a1.sinks.k1.hdfs.roundUnit = minute

An event arrives with timestamp 11:51:34 AM, June 12, 2012, and another event arrives at 11:54:34 AM, June 12, 2012.
In which path will the file be stored?

 :
1. /flume/events/2012-06-12/1140/00
2. /flume/events/2012-06-12/1200/00
3. Access Mostly Uused Products by 50000+ Subscribers
4. /flume/events/2012-06-12/1160/00


Question :

There are two input files, shown below, for a MapReduce join job.

input/A
A.a11 A.a12
A.a21 A.a22
B.a21 A.a32
A.a31 A.a32
B.a31 A.a32

input/B
A.a11 B.a12
A.a11 B.a13
B.a11 B.a12
B.a21 B.a22
A.a31 B.a32
B.a31 B.a32

After running the MapReduce join code snippet (left-hand side), what would be the first line of the output?

 :
1. A.a11 A.a12 B.a12
2. A.a11 A.a12 A.a11 B.a13
3. Access Mostly Uused Products by 50000+ Subscribers
4. B.a21 A.a32 B.a21 B.a22


Question :

Select the correct code snippet which will produce 12 files, one for each month, considering you have defined 12 reducers for this job.

Sample input data
10.1.255.266,hadoopexam.com,index.html,20/Aug/2013
10.1.255.2,hadoopexam.com,index.html,11/Feb/2013
10.1.255.233,hadoopexam.com,index.html,14/Jan/2013

 :
1. 1
2. 2
3. Access Mostly Uused Products by 50000+ Subscribers
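
For context on what such a snippet has to do, here is a minimal sketch (not one of the question's numbered snippets; the class name MonthPartitioner and the assumption that the map output key is the log date, e.g. "20/Aug/2013", are illustrative) of a custom Partitioner that routes each record to one of 12 reducers by month:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Hypothetical partitioner, not one of the question's numbered snippets.
// Assumes the map output key is the date field from the log line, e.g. "20/Aug/2013".
public class MonthPartitioner extends Partitioner<Text, Text> {

    private static final String[] MONTHS = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };

    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        // Extract the month abbreviation from a date like "20/Aug/2013".
        String month = key.toString().split("/")[1];
        for (int i = 0; i < MONTHS.length; i++) {
            if (MONTHS[i].equalsIgnoreCase(month)) {
                return i % numPartitions;   // 0..11 when 12 reducers are configured
            }
        }
        return 0;   // fallback for unparseable dates
    }
}

In the driver this would be registered with job.setPartitionerClass(MonthPartitioner.class) alongside job.setNumReduceTasks(12), so that each of the 12 reducers receives one month's records and writes one output file.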


Question :

From the code snippets given below, please select the correct one which is able to create a compressed SequenceFile.

 :
1. 1
2. 2
3. Access Mostly Uused Products by 50000+ Subscribers
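
For reference, here is a minimal sketch (not one of the question's numbered snippets; the output path and the key/value types are illustrative assumptions) of writing a block-compressed SequenceFile with the SequenceFile.Writer API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;

// Hypothetical example, not one of the question's numbered snippets.
public class CompressedSeqFileWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path("/tmp/compressed.seq");   // hypothetical output path

        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, out,
                Text.class, IntWritable.class,
                SequenceFile.CompressionType.BLOCK,   // block-compress keys and values
                new GzipCodec());
        try {
            writer.append(new Text("hadoop"), new IntWritable(1));
            writer.append(new Text("exam"), new IntWritable(2));
        } finally {
            writer.close();
        }
    }
}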


Question :

Select the correct code snippet which is able to read the compressed SequenceFile.

 :
1. 1
2. 2
3. Access Mostly Uused Products by 50000+ Subscribers
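
Similarly, a minimal sketch (not one of the question's numbered snippets; the input path is an illustrative assumption) of reading a compressed SequenceFile back. The reader picks up the compression codec from the file header, so no codec needs to be specified:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Hypothetical example, not one of the question's numbered snippets.
public class CompressedSeqFileReader {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path in = new Path("/tmp/compressed.seq");   // hypothetical input path

        SequenceFile.Reader reader = new SequenceFile.Reader(fs, in, conf);
        try {
            Text key = new Text();
            IntWritable value = new IntWritable();
            // next() decompresses records transparently, whatever codec was used to write them.
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
        } finally {
            reader.close();
        }
    }
}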