
Cloudera Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question :

What is the result of the following command (the database username is foo and password is bar)?
$ sqoop list-tables --connect jdbc:mysql://localhost/databasename --table --username foo --password bar
  :
1. sqoop lists only those tables in the specified MySQL database that have not already been imported into HDFS
2. sqoop returns an error
4. sqoop imports all the tables from the MySQL database into HDFS

Correct Answer :
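A quick reference point: in Sqoop's documented syntax, list-tables only needs connection arguments, so a well-formed invocation would look roughly like this (connection details taken from the question):

$ sqoop list-tables --connect jdbc:mysql://localhost/databasename --username foo --password bar

The --table argument belongs to the import and export tools rather than to list-tables, which is worth keeping in mind when choosing an answer.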







Question :

Which best describes the primary function of Flume?
  :
1. Flume is a platform for analyzing large data sets that consists of a high level language for
expressing data analysis programs, coupled with an infrastructure consisting of sources and sinks
for importing and evaluating large data sets
2. Flume acts as a Hadoop filesystem for log files
4. Flume provides a query language for Hadoop similar to SQL
5. Flume is a distributed service for collecting and moving large amounts of data into HDFS as it is
produced from streaming data flows

Correct Answer :
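As context for how Flume is actually used, it runs as one or more agents started from the command line; the agent name and configuration file below are placeholder values, not part of the question:

$ flume-ng agent --name agent1 --conf conf --conf-file conf/flume.conf

Each agent reads events from its configured sources and writes them through channels to sinks such as HDFS.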






Question :

You need to analyze 60,000,000 images stored in JPEG format, each of which is approximately 25
KB. Because your Hadoop cluster isn't optimized for storing and processing many small files, you
decide to do the following:
1. Group the individual images into a set of larger files
2. Use the set of larger files as input for a MapReduce job that processes them directly with
Python using Hadoop streaming
Which data serialization system gives you the flexibility to do this?

A. CSV
B. XML
C. HTML
D. Avro
E. Sequence Files
F. JSON

  :
1. A,B
2. C,D
4. D,E
5. C,E

Correct Answer :
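As a rough sketch of how step 2 could look once the images are packed into a container format, Hadoop Streaming can read a SequenceFile and pass its records to Python scripts; the streaming jar path, directory names, and script names below are assumptions for illustration, not part of the question:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
  -inputformat org.apache.hadoop.mapred.SequenceFileAsTextInputFormat \
  -input image_containers \
  -output image_analysis \
  -mapper analyze_images.py \
  -reducer aggregate.py \
  -file analyze_images.py \
  -file aggregate.py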





Related Questions


Question :

Let's assume you have the following files in the HDFS directory called merge.
Test1.txt
hadoopexam.com Hadoop Training 1
Test2.txt
www.hadoopexam.com Hadoop YARN Training
Test3.txt
http://hadoopexam.com Amazon WebService Training
Now you run the following command
hadoop fs -getmerge merge/ output1.txt
What is the content of the output1.txt file?

 :
1.
hadoopexam.com Hadoop Training 1
www.hadoopexam.com Hadoop YARN Training
http://hadoopexam.com Amazon WebService Training

2.
www.hadoopexam.com Hadoop YARN Training
hadoopexam.com Hadoop Training 1
http://hadoopexam.com Amazon WebService Training

3. It could be any random order
4.
www.hadoopexam.com Hadoop YARN Traininghadoopexam.com Hadoop Training 1http://hadoopexam.com Amazon WebService Training


Question :

Let's assume you have the following files in the HDFS directory called merge.
Test1.txt
hadoopexam.com Hadoop Training 1
Test2.txt
www.hadoopexam.com Hadoop YARN Training
Test3.txt
http://hadoopexam.com Amazon WebService Training
Now you run the following command
hadoop fs -getmerge -nl merge/ output2.txt
What is the content of the output2.txt file?



 :
1.
hadoopexam.com Hadoop Training 1
www.hadoopexam.com Hadoop YARN Training
http://hadoopexam.com Amazon WebService Training

2.

hadoopexam.com Hadoop Training 1

www.hadoopexam.com Hadoop YARN Training

http://hadoopexam.com Amazon WebService Training

4. www.hadoopexam.com Hadoop YARN Traininghadoopexam.com Hadoop Training 1http://hadoopexam.com Amazon WebService Training
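If you have a cluster available, the two getmerge variants are easy to compare directly; output1.txt and output2.txt are local files, as in the questions above:

hadoop fs -ls merge/                          # list the files that will be merged
hadoop fs -getmerge merge/ output1.txt        # concatenate them into one local file
hadoop fs -getmerge -nl merge/ output2.txt    # same, but -nl adds a newline after each source file
diff output1.txt output2.txt                  # see exactly how the two outputs differ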


Question :

In the regular WordCount MapReduce example, you have the following driver code:

public class WordCount extends Configured implements Tool {
public static void main(String args[]) throws Exception {
int res = ToolRunner.run(new WordCount(), args);
System.exit(res);
}
public int run(String[] args) throws Exception {
Path inputPath = new Path("shakespeare1");
Path outputPath = new Path(""+System.currentTimeMillis());
Configuration conf = getConf();
Job job = new Job(conf, this.getClass().toString());
FileInputFormat.setInputPaths(job, inputPath);
FileOutputFormat.setOutputPath(job, outputPath);
job.setJobName("WordCount");
job.setJarByClass(WordCount.class);
job.setJarByClass(WordCount.class);
job.setJobName("Word Count");
job.setMapperClass(WordMapper.class);
job.setReducerClass(SumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setNumReduceTasks(2);
return job.waitForCompletion(true) ? 0 : 1;
}}

Now you run the command below on a single-node cluster, where wc.jar is the jar file containing the Driver, Mapper, and Reducer classes.
hadoop jar wc.jar WordCount

Select the correct statement from below.
  :
1. It will run 2 Mappers and 2 Reducers
2. It will run 2 Reducers, but the number of Mappers is not known.
4. There is not enough information to tell the number of Reducers.


Question : In the regular WordCount MapReduce example, you have the following driver code.
public class WordCount extends Configured implements Tool {
public static void main(String args[]) throws Exception {
int res = ToolRunner.run(new WordCount(), args);
System.exit(res);
}
public int run(String[] args) throws Exception {
Path inputPath = new Path("shakespeare1");
Path outputPath = new Path(""+System.currentTimeMillis());
Configuration conf = getConf();
Job job = new Job(conf, this.getClass().toString());
FileInputFormat.setInputPaths(job, inputPath);
FileOutputFormat.setOutputPath(job, outputPath);
job.setJobName("WordCount"); job.setJarByClass(WordCount.class); job.setJarByClass(WordCount.class);
job.setJobName("Word Count");
job.setMapperClass(WordMapper.class);
job.setReducerClass(SumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setNumReduceTasks(2);
return job.waitForCompletion(true) ? 0 : 1;
} }
Now you run the command below on a single-node cluster, where wc.jar is the jar file containing the Driver, Mapper, and Reducer classes.
hadoop jar wc.jar WordCount -D mapred.reduce.tasks=3
Select the correct statement from below.
  :
1. It will run 3 Reducers, as the command-line option is preferred
2. It will run 2 Reducers, as the driver code has defined the number of Reducers
4. The number of Reducers cannot be determined; the command-line and driver configuration are just hints
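One detail worth experimenting with here: because the driver extends Configured and runs through ToolRunner, generic options such as -D are parsed into the Configuration before run() is called, and they must appear before any application-specific arguments. The second command simply uses the newer spelling of the same property:

hadoop jar wc.jar WordCount -D mapred.reduce.tasks=3
hadoop jar wc.jar WordCount -D mapreduce.job.reduces=3

Any later call to job.setNumReduceTasks(...) inside the driver overwrites whatever value the -D option placed in the Configuration.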


Question :

You are running the regular WordCount example with the Mapper and Reducer defined in separate classes. Now you have 4 files
in a directory from which you want to count the number of words.
Of these 4 files, 3 files have 1 line each and the 4th file has 0 lines.
Now you run the WordCount job; how many Mappers will be executed (assuming you are running on a single node)?


  :
1. Only 1 Mapper, as it is a single-node cluster
2. 3 Mappers, only for the files which have data
4. The number of Mappers is non-deterministic
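To check the answer empirically, run the job and read the counters it prints when it finishes; the counter names below are the standard ones reported under Job Counters, and wc.jar is the jar from the earlier questions:

hadoop jar wc.jar WordCount
# in the console output, look under Job Counters for
# "Launched map tasks" and "Launched reduce tasks"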


Question : Please select the correct features of HDFS
  :
1. Files in HDFS can be concurrently updated and read
2. Files in HDFS can be concurrently updated
4. Files in HDFS cannot be concurrently read