
Cloudera Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : The Mapper may use or completely ignore the input key.

1. True
2. False

Correct Answer : Get Latest Questions and Answers






Question : What would be the key when a file is the input to a MapReduce job?

1. The key is the byte offset into the file at which the line starts
2. The key is the line contents itself
3. (option not shown)
4. None of the above



Correct Answer : Get Latest Questions and Answers
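With the default TextInputFormat, the key passed to the Mapper is the byte offset at which each line starts, and the value is the line itself. The plain-Java sketch below (an illustration, not the Hadoop API; the class and method names are invented for this example) computes those starting offsets for a small in-memory "file":

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustration only: TextInputFormat keys each record by the byte offset
// at which the line starts in the input file.
public class LineOffsets {
    // Returns the starting byte offset of each line in the given text.
    static List<Long> offsets(String text) {
        List<Long> result = new ArrayList<>();
        long pos = 0;
        for (String line : text.split("\n", -1)) {
            result.add(pos);
            // advance past the line's bytes plus the '\n' separator
            pos += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        // "foo" starts at byte 0, "bar" at 4, "baz" at 8
        System.out.println(offsets("foo\nbar\nbaz")); // [0, 4, 8]
    }
}
```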






Question : The Mapper's output must be in the form of key-value pairs.
1. True
2. False

Correct Answer : Get Latest Questions and Answers



Related Questions


Question :

Map<String, List<String>> wordCountMap = new HashMap<>(); // It holds each word as a key, and all occurrences of the same word are in the list
In a word count Mapper class, you are emitting key-value pairs as

Case 1 : context.write(new Text(word), new IntWritable(1))

and

Case 2 : context.write(new Text(word), new IntWritable(wordCountMap.get(word).size()))

Select the correct statement about the example code snippets above


1. In both cases the network bandwidth consumption would be the same
2. In Case 1 the network bandwidth consumption would be lower
3. (option not shown)
4. Cannot be determined
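The difference between the two cases is how many key-value pairs are sent through the shuffle: Case 1 emits one pair per word occurrence, while Case 2 (in-mapper aggregation) emits one pair per distinct word. A plain-Java sketch, with invented helper names, that counts the emissions each approach would make:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Illustration only (not Hadoop code): counting the key/value pairs each
// word-count strategy would send to the shuffle for the same input.
public class EmitCount {
    // Case 1: one (word, 1) pair is emitted per occurrence.
    static int perOccurrenceEmits(List<String> words) {
        return words.size();
    }

    // Case 2: one (word, count) pair is emitted per distinct word.
    static int aggregatedEmits(List<String> words) {
        return new HashSet<>(words).size();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "be", "or", "not", "to", "be");
        System.out.println(perOccurrenceEmits(words)); // 6 pairs shuffled
        System.out.println(aggregatedEmits(words));    // 4 pairs shuffled
    }
}
```

Fewer shuffled records generally means less intermediate data moved across the network, which is the point the question is probing.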


Question :

Suppose you have the file in hdfs directory as below
/myapp/map.zip

And you use the following API call to add this file to the DistributedCache:

JobConf job = new JobConf();
DistributedCache.addCacheArchive(new URI("/myapp/map.zip"), job);

Which is the best place to read this file in a MapReduce job?

1. Inside the map() method of the Mapper
2. You can randomly read this file as needed in the Mapper code
3. (option not shown)
4. All of the above statements are correct
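A commonly recommended pattern is to load a cached file once, before any records are processed, and keep the parsed contents in memory for every map() call, rather than re-reading the file per record. The plain-Java sketch below (a hypothetical class, not the Hadoop Mapper API) mimics that setup()-then-map() structure:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: load the lookup data once (as a real Mapper would in
// setup(Context), reading the local copy of the DistributedCache file),
// then consult the in-memory table on every map() call.
public class CachedLookupMapper {
    private Map<String, String> lookup;
    int fileReads = 0; // counts how many times the "file" is loaded

    // Stand-in for setup(Context): parse tab-separated key/value lines.
    void setup(List<String> cachedFileLines) {
        fileReads++;
        lookup = new HashMap<>();
        for (String line : cachedFileLines) {
            String[] kv = line.split("\t");
            lookup.put(kv[0], kv[1]);
        }
    }

    // Stand-in for map(): only consults memory, never re-reads the file.
    String map(String key) {
        return lookup.getOrDefault(key, "UNKNOWN");
    }

    public static void main(String[] args) {
        CachedLookupMapper m = new CachedLookupMapper();
        m.setup(Arrays.asList("IN\tIndia", "US\tUnited States"));
        System.out.println(m.map("IN")); // India
        System.out.println(m.map("US")); // United States
        System.out.println(m.fileReads); // 1: loaded once, used many times
    }
}
```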


Question : You have added the below files to the DistributedCache

JobConf job = new JobConf();
DistributedCache.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"),
job);
DistributedCache.addCacheArchive(new URI("/myapp/map.zip"), job);
DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytar.tar"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytgz.tgz"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytargz.tar.gz"), job);

Which of the following is the correct way to get all the DistributedCache file paths as an array?


1. Iterate over the DistributedCache instance in the Mapper and add all the cached file paths to an array.
2. There is a direct method available, DistributedCache.getAllFilePath()
3. (option not shown)
4. All of the above



Question :

Suppose you want to create the following Hive table, partitioned by the date column. Which is the correct syntax?

id int,
date date,
name varchar

1. create table table_name ( id int, date date, name string ) partitioned by (date string)
2. create table table_name ( id int, date date, name string ) partitioned by (string)
3. (option not shown)
4. Only 2 and 3 are correct


Question :

You have the following DDL to create a Hive table

CREATE TABLE page_view(viewTime INT, userid BIGINT,
page_url STRING, referrer_url STRING,
ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
STORED AS SEQUENCEFILE

Select the correct statements which apply
A. The statement above creates the page_view table with viewTime, userid, page_url, referrer_url, and ip columns (including comments).
B. The table is also partitioned
C. Data is stored in sequence files.
D. The data format in the files is assumed to be field-delimited by ctrl-A and row-delimited by newline.


1. A,B
2. B,C
3. (option not shown)
4. A,B,C,D
5. A,B,C


Question :

Select the correct statement for the below command

CREATE TABLE new_key_value_store
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
STORED AS RCFile
AS
SELECT (key % 1024) new_key, concat(key, value) key_value_pair
FROM key_value_store
SORT BY new_key, key_value_pair


1. The above CTAS statement creates the target table new_key_value_store with the schema (new_key DOUBLE, key_value_pair STRING) derived from the results of the SELECT statement
2. If the SELECT statement does not specify column aliases, the column names will be automatically assigned to _col0, _col1, and _col2
3. (option not shown)
4. Both 1 and 2 are correct
5. All 1,2 and 3 are correct