
Cloudera Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question :

In a word count Mapper class you maintain the following map, which holds each word as a key and a list of all occurrences of that word as the value:

Map<String, List<String>> wordCountMap = new HashMap<String, List<String>>();

You are emitting key-value pairs as

Case 1 : context.write(new Text(word), new IntWritable(1))

and

Case 2 : context.write(new Text(word), new IntWritable(wordCountMap.get(word).size()))

Select the correct statement about the above code snippets


  :
1. In both cases the network bandwidth consumption would be the same
2. In Case 1 the network bandwidth consumption would be low
3.
4. Cannot be determined

Correct Answer :

Explanation: In Case 2 you are counting the words locally in each Mapper; hence, the amount of data transferred over the network would be lower.
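
For illustration only (this is not the question's own snippet), a minimal sketch of the in-mapper combining idea the explanation describes, written against the org.apache.hadoop.mapreduce API; the class name is made up:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class InMapperCombiningWordCount extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Per-task word counts, kept in memory and emitted once in cleanup()
    // instead of writing one (word, 1) record for every occurrence.
    private final Map<String, Integer> counts = new HashMap<String, Integer>();

    @Override
    protected void map(LongWritable key, Text value, Context context) {
        for (String word : value.toString().split("\\s+")) {
            if (word.length() == 0) {
                continue;
            }
            Integer current = counts.get(word);
            counts.put(word, current == null ? 1 : current + 1);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Emitting aggregated counts keeps the intermediate (map output) data,
        // and therefore the network traffic during the shuffle, small.
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            context.write(new Text(entry.getKey()), new IntWritable(entry.getValue()));
        }
    }
}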




Question :

Suppose you have the following file in an HDFS directory:
/myapp/map.zip

And you use the following API call to add this file to the DistributedCache:

JobConf job = new JobConf();
DistributedCache.addCacheArchive(new URI("/myapp/map.zip"), job);

Which is the best place to read this file in a MapReduce job?

  :
1. Inside the map() method of the Mapper
2. You can randomly read this file as needed in the Mapper code
3.
4. All of the above statements are correct

Correct Answer :

Explanation: You should read the file in the configure() method so that it is loaded only once per task, rather than every time the map() method is called.
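
As a rough sketch only, assuming the old org.apache.hadoop.mapred API that the question's JobConf code uses (the class name is illustrative), reading the cached archive paths in configure() looks like this:

import java.io.IOException;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class CacheAwareMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private Path[] localArchives;

    @Override
    public void configure(JobConf job) {
        try {
            // Runs once per task, before any map() call: resolve the task-local
            // paths of the cached archives (e.g. the unpacked /myapp/map.zip).
            localArchives = DistributedCache.getLocalCacheArchives(job);
        } catch (IOException e) {
            throw new RuntimeException("Could not read distributed cache paths", e);
        }
    }

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
        // map() can now use the data loaded in configure() without re-reading it
        // for every input record.
    }
}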




Question : You have added the below files to the Distributed Cache

JobConf job = new JobConf();
DistributedCache.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"),
job);
DistributedCache.addCacheArchive(new URI("/myapp/map.zip"), job);
DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytar.tar"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytgz.tgz"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytargz.tar.gz"), job);

Which of the following is the correct way to get all the paths of the Distributed Cache files in an array?


 :
1. Iterate over the DistributedCache instance in the Mapper and add all the cached file paths to an array.
2. There is a direct method available on DistributedCache: getAllFilePath()
3.
4. All of the above


Correct Answer :
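
The text of the hidden option is not reproduced above; for reference only, the DistributedCache API does expose the task-local paths as Path[] arrays, for example (the helper class name is made up):

import java.io.IOException;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CachePaths {

    // Task-local paths of files added with addCacheFile()
    public static Path[] cachedFiles(JobConf job) throws IOException {
        return DistributedCache.getLocalCacheFiles(job);
    }

    // Task-local paths of the unpacked archives added with addCacheArchive()
    public static Path[] cachedArchives(JobConf job) throws IOException {
        return DistributedCache.getLocalCacheArchives(job);
    }
}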



Related Questions


Question :

Select the correct code snippet which implements WritableComparable correctly for a pair of Strings



 :
1. 1
2. 2
3.
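
The referenced snippets are not reproduced here; purely as a sketch of what a correct WritableComparable for a pair of Strings typically looks like (the class name StringPair is made up):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class StringPair implements WritableComparable<StringPair> {

    private String first = "";
    private String second = "";

    public StringPair() { }                      // No-arg constructor required by Hadoop

    public StringPair(String first, String second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(first);                     // Serialize both fields
        out.writeUTF(second);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        first = in.readUTF();                    // Deserialize in the same order
        second = in.readUTF();
    }

    @Override
    public int compareTo(StringPair other) {
        int cmp = first.compareTo(other.first);  // Sort by first, then by second
        return cmp != 0 ? cmp : second.compareTo(other.second);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof StringPair)) return false;
        StringPair p = (StringPair) o;
        return first.equals(p.first) && second.equals(p.second);
    }

    @Override
    public int hashCode() {
        return first.hashCode() * 163 + second.hashCode();
    }
}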


Question :

If you have 100 files of 100 MB each and the block size is 64 MB, how many maps will run?



 :
1. 100
2. 200
3.
4. Between 100 and 200
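
Assuming the default FileInputFormat behaviour of one map task per input split, with splits aligned to HDFS blocks: each 100 MB file spans two 64 MB blocks, so 100 files yield 100 x 2 = 200 splits and therefore 200 map tasks.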


Question :

If you want to use a file from the distributed cache, in which method should you read it?


 :
1. map
2. run
3.
4. setup


Question : On the Acmeshell.com website you have all of your data stored
in an Oracle database table called MAIN.PROFILES. In HDFS you already
have your Apache web server log file stored, called users_activity.log.
Now you want to combine/join the users_activity.log file with the MAIN.PROFILES
table. Initially, you import the table data from the database
into Hive using Sqoop, with ';' as the delimiter and the column order unchanged.
Now select the correct MapReduce code snippet that produces the csv file,
so that the output of the MapReduce job can be loaded into the Hive table created
in the above step, called PROFILE.
 :
1. 1
2. 2
3.
4. 4
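
The referenced snippets 1-4 are not reproduced here. Only as a rough sketch of the idea, and assuming (hypothetically) that the mapper's input fields arrive tab-separated and in the same column order as the Hive table, a mapper that re-emits each record with the matching ';' delimiter might look like this (the class name is made up):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ProfileDelimitedMapper
        extends Mapper<LongWritable, Text, NullWritable, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assumption: input columns are tab-separated and already in the same
        // order as the columns of the Hive PROFILE table.
        String[] fields = value.toString().split("\t");

        // Re-join with ';' so the output matches the delimiter the Hive table
        // was created with, allowing the job's output to be loaded into PROFILE.
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                line.append(';');
            }
            line.append(fields[i]);
        }
        context.write(NullWritable.get(), new Text(line.toString()));
    }
}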


Question : As part of the HadoopExam consultancy team, you have been given a requirement by a hotel to create
a GUI application through which staff will add and edit customer information for all of the hotel's sales and bookings. You don't want to spend
money on an enterprise RDBMS, so you decide to use a simple file as storage and consider a csv file. Is HDFS a good choice for
storing such information in a file?
 :
1. No, because HDFS is optimized for read-once, streaming access for relatively large files.
2. No, because HDFS is optimized for write-once, streaming access for relatively large files.
3.
4. Yes, because HDFS is optimized for write-once, streaming access for relatively large files.


Question : Identify the statement that correctly describes how the NameNode uses its RAM
 :
1. To store filenames and the initial 100 lines of each file stored in HDFS.
2. To store filenames, and to act as a buffer while files are being read.
3.
4. To store filenames and the list of blocks, but no metadata.