
Cloudera Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : Which statement is true with respect to MapReduce 2.0 or YARN?
1. It is the newer version of MapReduce; using it, the performance of data processing can be increased.
2. The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker,
resource management and job scheduling or monitoring, into separate daemons.
3. Access Mostly Uused Products by 50000+ Subscribers
4. All of the above
5. Only 2 and 3 are correct
Ans : 5
Exp : MapReduce has undergone a complete overhaul in hadoop-0.23, and we now have what we call MapReduce 2.0 (MRv2) or YARN.
The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker,
resource management and job scheduling or monitoring, into separate daemons. The idea is to have a global ResourceManager (RM)
and per-application ApplicationMaster (AM). An application is either a single job in the classical sense of Map-Reduce jobs or a DAG of jobs.

You can also refer to the Advanced Hadoop YARN Training by HadoopExam.com


Question :

Which statement is true about the ApplicationsManager?

1. It is responsible for accepting job submissions
2. It negotiates the first container for executing the application-specific ApplicationMaster
and provides the service for restarting the ApplicationMaster container on failure
3. Access Mostly Uused Products by 50000+ Subscribers
4. All of the above
5. 1 and 2 are correct
Ans : 5
Exp : The ApplicationsManager is responsible for accepting job submissions,
negotiating the first container for executing the application-specific ApplicationMaster, and providing the
service for restarting the ApplicationMaster container on failure.



Question :

Which tool is used to list all the blocks of a file?

1. hadoop fs
2. hadoop fsck
3. Access Mostly Uused Products by 50000+ Subscribers
4. Not Possible
Ans : 2
Exp : hadoop fsck, run with the -files and -blocks options, lists the blocks that make up each file under the given path.
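As a sketch of the answer (the HDFS path below is a hypothetical example, and an actual listing requires a running cluster), the fsck command is typically invoked like this:

```shell
# Sketch: list the blocks of a file with hadoop fsck.
# The path is a hypothetical example; on a live cluster this prints
# each file's blocks and the DataNodes that hold them.
FSCK_CMD="hadoop fsck /user/data/file.txt -files -blocks -locations"
# On a cluster you would run: $FSCK_CMD
echo "$FSCK_CMD"
```

The -locations flag additionally shows which DataNodes store each block replica.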



Question : Identify the MapReduce v2 (MRv2 / YARN) daemon responsible for launching application containers and
monitoring application resource usage?

1. ResourceManager
2. NodeManager
3. Access Mostly Uused Products by 50000+ Subscribers
4. ApplicationMasterService
5. TaskTracker

Ans : 3
Exp : The fundamental idea of MRv2 (YARN) is to split up the two major functionalities of the JobTracker, resource
management and job scheduling/monitoring, into separate daemons. The idea is to have a global
ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job in the
classical sense of Map-Reduce jobs or a DAG of jobs. The NodeManager is the per-node daemon responsible for
launching application containers and monitoring their resource usage.





Question : Identify the tool best suited to import a portion of a relational database every day as files into HDFS, and
generate Java classes to interact with that imported data?

1. Oozie
2. Flume
3. Access Mostly Uused Products by 50000+ Subscribers
4. Hue
5. Sqoop

Ans : 5

Exp : Sqoop ("SQL-to-Hadoop") is a straightforward command-line tool with the following capabilities:
it imports individual tables or entire databases to files in HDFS, generates Java classes to allow you to interact
with your imported data, and provides the ability to import from SQL databases straight into your Hive data
warehouse.


Data Movement Between Hadoop and Relational Databases
Data can be moved between Hadoop and a relational database as a bulk data transfer, or relational tables can
be accessed from within a MapReduce map function.
Note:

* Cloudera's Distribution for Hadoop provides a bulk data transfer tool (i.e., Sqoop) that imports individual
tables or entire databases into HDFS files. The tool also generates Java classes that support interaction with
the imported data. Sqoop supports all relational databases over JDBC, and Quest Software provides a
connector (i.e., OraOop) that has been optimized for access to data residing in Oracle databases.
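A sketch of such a daily bulk import (the connection string, username, table, and target directory are hypothetical examples):

```shell
# Sketch: daily Sqoop import of one table into a dated HDFS directory.
# All names below are hypothetical; a real run needs a cluster and database.
# Sqoop also generates an EVENT.java class for interacting with the imported data.
SQOOP_IMPORT="sqoop import \
  --connect jdbc:mysql://dbhost/db \
  --username dbuser \
  --table EVENT \
  --target-dir /data/event/$(date +%Y-%m-%d)"
# On a cluster you would run: $SQOOP_IMPORT
echo "$SQOOP_IMPORT"
```

Scheduling this from cron (or Oozie) once a day gives one HDFS directory per import run.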



Question : Given no tables in Hive, which command will import the entire contents of the LOGIN table
from the database into a Hive table called LOGIN that uses commas (,) to separate the fields in the data files?
1. hive import --connect jdbc:mysql://dbhost/db --table LOGIN --terminated-by ',' --hive-import
2. hive import --connect jdbc:mysql://dbhost/db --table LOGIN --fields-terminated-by ',' --hive-import
3. Access Mostly Uused Products by 50000+ Subscribers
4. sqoop import --connect jdbc:mysql://dbhost/db --table LOGIN --fields-terminated-by ',' --hive-import
Ans : 4
Exp : Sqoop import to a Hive table requires the import option followed by the --table option to specify the database table name and the --hive-import option. If --hive-table is not specified, the Hive table will have the same name as the imported database table. If --hive-overwrite is specified, the Hive table will be overwritten if it exists. If the --fields-terminated-by option is set, it controls the character used to separate the fields in the Hive table's data files.

Watch Hadoop Professional training Module : 22 by www.HadoopExam.com
http://hadoopexam.com/index.html/#hadoop-training



Question : Which two daemons typically run on each slave node in a Hadoop cluster running MapReduce v2 (MRv2) on YARN?

1. TaskTracker

2. Secondary NameNode

3. NodeManager

4. DataNode

5. ZooKeeper

6. JobTracker

7. NameNode

8. JournalNode


1. 1,2
2. 2,3
3. Access Mostly Uused Products by 50000+ Subscribers
4. 5,6
5. 7,8

Ans : 3

Explanation: Each slave node in a cluster configured to run MapReduce v2 (MRv2) on YARN typically runs a DataNode daemon (for HDFS functions) and NodeManager daemon (for YARN functions). The NodeManager handles communication with the ResourceManager, oversees application container lifecycles, monitors CPU and memory resource use of the containers, tracks the node health, and handles log management. It also makes available a number of auxiliary services to YARN applications.






Question : How does the Hadoop framework determine the number of Mappers required for a MapReduce job on a cluster running MapReduce v2 (MRv2) on YARN?
1. The number of Mappers is equal to the number of InputSplits calculated by the client submitting the job
2. The ApplicationMaster chooses the number based on the number of available nodes

3. Access Mostly Uused Products by 50000+ Subscribers
4. The number of NodeManagers where the job's HDFS blocks reside
5. The developer specifies the number in the job configuration



Ans : 1
Exp : Each Mapper task processes a single InputSplit. The client calculates the InputSplits before submitting the job to the cluster. The developer may specify how the input split is calculated, with a single HDFS block being the most common split. This is true for both the MapReduce v1 (MRv1) and YARN MapReduce implementations.

With YARN, each mapper will be run in a container which consists of a specific amount of CPU and memory resources. The ApplicationMaster requests a container for each mapper. The ResourceManager schedules the resources and instructs the ApplicationMaster of available NodeManagers where the container may be launched.

With MRv1, each TaskTracker (slave node) is configured to handle a maximum number of concurrent map tasks. The JobTracker (master node) assigns a TaskTracker a specific InputSplit to process as a single map task.
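The split-per-block rule above can be sketched with shell arithmetic (the file and block sizes are hypothetical, and the default FileInputFormat behavior of one split per HDFS block is assumed):

```shell
# Sketch: with the default FileInputFormat, one InputSplit (and hence
# one Mapper) is created per HDFS block of the input file.
FILE_SIZE=$((300 * 1024 * 1024))    # a hypothetical 300 MB input file
BLOCK_SIZE=$((128 * 1024 * 1024))   # a 128 MB HDFS block size
# Ceiling division: number of blocks the file occupies = number of splits.
NUM_MAPPERS=$(( (FILE_SIZE + BLOCK_SIZE - 1) / BLOCK_SIZE ))
echo "$NUM_MAPPERS"   # 3 splits -> 3 map tasks
```

A 300 MB file on 128 MB blocks spans three blocks, so the client computes three InputSplits and the job runs three Mappers.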







Question : In a Sqoop job, assume $PREVIOUSREFRESH contains a date/time string recording the last time the import was run.
Which of the following import command control arguments prevents a repeating Sqoop job from downloading the entire EVENT table every day?
1. --incremental lastmodified --refresh-column lastmodified --last-value "$PREVIOUSREFRESH"
2. --incremental lastmodified --check-column lastmodified --last-time "$PREVIOUSREFRESH"
3. Access Mostly Uused Products by 50000+ Subscribers
4. --incremental lastmodified --check-column lastmodified --last-value "$PREVIOUSREFRESH"


Ans : 4
Exp : Sqoop's --incremental lastmodified mode uses --check-column to name the timestamp column to examine and --last-value to supply the value recorded after the previous run; only rows whose check column is newer than that value are imported.
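A sketch of the repeating incremental import (the connection string and table are hypothetical, and the $PREVIOUSREFRESH timestamp shown is an illustrative placeholder a wrapper script would supply):

```shell
# Sketch: incremental Sqoop import that only fetches rows modified
# since the previous run. All names below are hypothetical examples.
PREVIOUSREFRESH="2015-01-01 00:00:00"   # illustrative last-run timestamp
SQOOP_INCR="sqoop import \
  --connect jdbc:mysql://dbhost/db \
  --table EVENT \
  --incremental lastmodified \
  --check-column lastmodified \
  --last-value \"$PREVIOUSREFRESH\""
# On a cluster you would run: $SQOOP_INCR
echo "$SQOOP_INCR"
```

After each run, Sqoop prints the new high-water mark to record as the next --last-value.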



Related Questions


Question :

What is the result of the following command (the database username is foo and password is bar)?
$ sqoop list-tables --connect jdbc:mysql://localhost/databasename --table --username foo --password bar
1. sqoop lists only those tables in the specified MySQL database that have not already been imported into HDFS
2. sqoop returns an error
3. Access Mostly Uused Products by 50000+ Subscribers
4. sqoop imports all the tables from SQL into HDFS


Question :

Which best describes the primary function of Flume?
1. Flume is a platform for analyzing large data sets that consists of a high level language for
expressing data analysis programs, coupled with an infrastructure consisting of sources and sinks
for importing and evaluating large data sets
2. Flume acts as a Hadoop filesystem for log files
3. Access Mostly Uused Products by 50000+ Subscribers
4. Flume provides a query language for Hadoop similar to SQL
5. Flume is a distributed service for collecting and moving large amounts of data into HDFS as it is
produced from streaming data flows


Question :

You need to analyze 60,000,000 images stored in JPEG format, each of which is approximately 25
KB. Because your Hadoop cluster isn't optimized for storing and processing many small files you
decide to do the following actions:
1. Group the individual images into a set of larger files
2. Use the set of larger files as input for a MapReduce job that processes them directly with
Python using Hadoop streaming
Which data serialization system gives you the flexibility to do this?

A. CSV
B. XML
C. HTML
D. Avro
E. Sequence Files
F. JSON

1. A,B
2. C,D
3. Access Mostly Uused Products by 50000+ Subscribers
4. D,E
5. C,E


Question :

You have user profile records in an OLTP database that you want to join with web server logs
which you have already ingested into HDFS. What is the best way to acquire the user profile for
use in HDFS?
A. Ingest with Hadoop streaming
B. Ingest with Apache Flume
C. Ingest using Hive's LOAD DATA command
D. Ingest using Sqoop
E. Ingest using Pig's LOAD command


1. A,B
2. C,D
3. Access Mostly Uused Products by 50000+ Subscribers
4. D,E
5. A,E


Question : Map the following in case of YARN

1. YARN Resource Manager
2. YARN Node Managers
3. Access Mostly Uused Products by 50000+ Subscribers

a. launch and monitor the tasks of jobs
b. allocates the cluster resources to jobs
c. coordinates the tasks running in the MapReduce job

1. 1-a, 2-b,3-c
2. 1-b, 2-a,3-c
3. Access Mostly Uused Products by 50000+ Subscribers
4. 1-a, 2-c,3-b


Question : A developer has submitted a YARN job by calling the submitApplication() method on the ResourceManager.
Select the correct order of the steps below after that.

1. The container is managed by the NodeManager after job submission
2. The ResourceManager triggers its sub-component, the Scheduler, which allocates containers for MapReduce job execution.
3. Access Mostly Uused Products by 50000+ Subscribers

1. 2,3,1
2. 1,2,3
3. Access Mostly Uused Products by 50000+ Subscribers
4. 1,3,2