You have user profile records in an OLTP database that you want to join with web server logs that you have already ingested into HDFS. What is the best way to acquire the user profile data for use in HDFS?
A. Ingest with Hadoop Streaming
B. Ingest with Apache Flume
C. Ingest using Hive's LOAD DATA command
D. Ingest using Sqoop
E. Ingest using Pig's LOAD command
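For reference, Sqoop (option D) is the tool designed for importing relational data into HDFS. Below is a minimal sketch of invoking Sqoop 1 programmatically through its org.apache.sqoop.Sqoop.runTool entry point; the JDBC URL, table name, and target directory are hypothetical, and in practice the equivalent "sqoop import ..." command line is more common.

import org.apache.sqoop.Sqoop;

// Sketch: pull a hypothetical OLTP table into HDFS with Sqoop 1.
public class UserProfileImport {
    public static void main(String[] args) {
        String[] sqoopArgs = {
            "import",
            "--connect", "jdbc:mysql://db.example.com/webapp", // hypothetical OLTP source
            "--table", "user_profiles",                        // hypothetical table
            "--target-dir", "/data/user_profiles",             // HDFS destination
            "--num-mappers", "4"
        };
        int exitCode = Sqoop.runTool(sqoopArgs);               // returns 0 on success
        System.exit(exitCode);
    }
}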
Explanation: Components of the MapReduce job flow: a MapReduce job on YARN involves the following components.
- A client node, which submits the MapReduce job.
- The YARN ResourceManager, which allocates cluster resources to jobs.
- The YARN NodeManagers, which launch and monitor the tasks of jobs.
- The MapReduce Application Master, which coordinates the tasks running in the MapReduce job.
The Application Master and the MapReduce tasks run in containers that are scheduled by the ResourceManager and managed by the NodeManagers. HDFS is used for sharing job files among these components.
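A minimal sketch of the client-side driver described above: only this code runs on the client node, and once waitForCompletion(true) submits the job, the ResourceManager, NodeManagers, and Application Master take over. The identity Mapper and Reducer are placeholders; input and output paths come from the command line.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DriverSketch {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "driver-sketch");
        job.setJarByClass(DriverSketch.class);
        job.setMapperClass(Mapper.class);    // identity mapper placeholder
        job.setReducerClass(Reducer.class);  // identity reducer placeholder
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Blocks until the job completes, printing progress to the console.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}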
Question: A developer has submitted a YARN job by calling the submitApplication() method on the ResourceManager. Select the correct order of the steps that follow.
1. The container will be managed by the NodeManager after job submission.
2. The ResourceManager triggers its Scheduler sub-component, which allocates containers for MapReduce job execution.
3. [option text not available in the source]
Explanation: Job startup: The call to Job.waitForCompletion() in the main driver class is where all the execution starts. The driver is the only piece of code that runs on the local client machine, and this call starts the communication with the ResourceManager. First, the client retrieves a new job ID (application ID) from the ResourceManager. The client node then copies the job resources specified via the -files, -archives, and -libjars command-line arguments, as well as the job JAR file, to HDFS. Finally, the job is submitted by calling the submitApplication() method on the ResourceManager. The ResourceManager triggers its Scheduler sub-component, which allocates a container for MapReduce job execution. The ResourceManager then starts the Application Master in the container provided by the Scheduler; from then onwards, this container is managed by a NodeManager.
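A sketch of how a driver typically picks up the -files, -archives, and -libjars arguments mentioned above: when the driver is written against Hadoop's Tool interface and launched via ToolRunner, the built-in GenericOptionsParser interprets those options before run() is called, so the listed resources are shipped to HDFS as part of job submission. The class name and paths below are illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ToolDriverSketch extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects any generic options parsed by ToolRunner.
        Job job = Job.getInstance(getConf(), "tool-driver-sketch");
        job.setJarByClass(ToolDriverSketch.class);
        job.setMapperClass(Mapper.class);    // identity mapper placeholder
        job.setReducerClass(Reducer.class);  // identity reducer placeholder
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // e.g. hadoop jar app.jar ToolDriverSketch -files lookup.txt in out
        System.exit(ToolRunner.run(new Configuration(), new ToolDriverSketch(), args));
    }
}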
1. Iterate over the DistributedCache instance in the Mapper and add all the cached file paths to an array.
2. There is a direct method available: DistributedCache.getAllFilePath().
3. [option text not available in the source]
4. All of the above
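For context, a sketch of collecting cached file paths in a Mapper using the Hadoop 2 API, where context.getCacheFiles() returns all distributed-cache URIs in one call (the older DistributedCache class is deprecated in Hadoop 2). The file the driver would register via job.addCacheFile(...) is hypothetical.

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheAwareMapper extends Mapper<LongWritable, Text, Text, Text> {
    private URI[] cachedFiles;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Collect all cached file URIs into an array for later lookup.
        cachedFiles = context.getCacheFiles();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... use the cached files, e.g. for a map-side join ...
    }
}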
1. create table table_name ( id int, date date, name string ) partitioned by (date string)
2. create table table_name ( id int, date date, name string ) partitioned by (string)
3. [option text not available in the source]
4. Only 2 and 3 are correct
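For reference, in Hive a partition column is declared only in the PARTITIONED BY clause and must not also appear in the main column list. A minimal sketch of valid partitioned-table DDL issued through the Hive JDBC driver; the connection URL, table, and column names are hypothetical.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreatePartitionedTable {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default"); // hypothetical HiveServer2 URL
             Statement stmt = conn.createStatement()) {
            // event_date is declared only as a partition column, not in the schema.
            stmt.execute(
                "CREATE TABLE user_events (id INT, name STRING) " +
                "PARTITIONED BY (event_date STRING)");
        }
    }
}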
1. The above CTAS statement creates the target table new_key_value_store with the schema (new_key DOUBLE, key_value_pair STRING) derived from the results of the SELECT statement.
2. If the SELECT statement does not specify column aliases, the column names will be automatically assigned to _col0, _col1, and _col2.
3. [option text not available in the source]
4. 1 and 2 are correct
5. All of 1, 2, and 3 are correct
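The CTAS statement the question refers to is not reproduced in the source. Below is an illustrative CTAS of the same shape, consistent with the schema described in option 1, issued through the hypothetical Hive JDBC connection used above; table and column names are assumptions, not the original statement.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CtasExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default"); // hypothetical HiveServer2 URL
             Statement stmt = conn.createStatement()) {
            // The target schema (new_key DOUBLE, key_value_pair STRING) is
            // derived from the SELECT list; CTAS declares no columns itself.
            stmt.execute(
                "CREATE TABLE new_key_value_store AS " +
                "SELECT (key % 1024) new_key, concat(key, value) key_value_pair " +
                "FROM key_value_store");
        }
    }
}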