
MapR (HP) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : The Secondary NameNode is a backup for the NameNode?


1. True
2. False

Correct Answer : 2

The Secondary NameNode is not a backup of the NameNode; it takes care of some housekeeping
tasks for the NameNode, such as periodically merging the edit log into the fsimage (checkpointing).





Question : You have a MapReduce job which depends on two external JDBC jars called ojdbc6.jar and openJdbc6.jar.
Which of the following commands will correctly include these external jars in the running job's classpath?
1. hadoop jar job.jar HadoopExam -cp ojdbc6.jar,openJdbc6.jar
2. hadoop jar job.jar HadoopExam -libjars ojdbc6.jar,openJdbc6.jar
3. Access Mostly Uused Products by 50000+ Subscribers
4. hadoop jar job.jar HadoopExam -libjars ojdbc6.jar openJdbc6.jar
Ans : 2
Exp : The syntax for executing a job and including extra jars in the job's classpath is: hadoop jar <job jar> <main class> -libjars <jar1>[,<jar2>,...]
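
For reference, below is a minimal driver sketch (the class name HadoopExam and job.jar come from the question; the input/output paths and the Hadoop 2 mapreduce API are assumptions for illustration) showing why option 2 works: -libjars is a generic option, and it is only honored when the driver runs through ToolRunner/GenericOptionsParser.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class HadoopExam extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects the generic options that GenericOptionsParser
        // consumed, including the jars passed with -libjars.
        Job job = Job.getInstance(getConf(), "libjars example");
        job.setJarByClass(HadoopExam.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner invokes GenericOptionsParser, which strips -libjars, -D,
        // -files and -archives before run() sees the remaining arguments.
        System.exit(ToolRunner.run(new Configuration(), new HadoopExam(), args));
    }
}

A driver built this way can then be launched as in option 2, with the generic options placed before the application arguments (paths shown here are illustrative):
hadoop jar job.jar HadoopExam -libjars ojdbc6.jar,openJdbc6.jar /input /output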










Question : You use Sqoop to import the EVENT table from the database,
then write a Hadoop streaming job in Python to scrub the data,
and use Hive to write the new data into the Hive EVENT table.
How would you automate this data pipeline?
1. Run the Sqoop job first, then implement the remaining steps using MapReduce job chaining.
2. Define the Sqoop job, the MapReduce job, and the Hive job as an Oozie workflow job, and define an Oozie coordinator job to run the workflow job daily.
3. Access Mostly Uused Products by 50000+ Subscribers
4. Define the Sqoop job, the MapReduce job, and the Hive job as an Oozie workflow job,
and define a ZooKeeper coordinator job to run the workflow job daily.

Ans : 2
Exp : In Oozie, scheduling is the function of an Oozie coordinator job.
Oozie does not allow you to schedule workflow jobs directly.
Oozie coordinator jobs cannot aggregate tasks or define workflows;
coordinator jobs are simple schedules of previously defined workflows.
You must therefore assemble the various tasks into a single workflow
job and then use a coordinator job to execute the workflow job.
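
As a hedged sketch only (not the exam's reference solution): once the Sqoop, streaming, and Hive actions are assembled in a workflow.xml and wrapped by a coordinator.xml with a daily frequency, the coordinator job can be started through the Oozie Java client. The Oozie URL, HDFS paths, and the nameNode/jobTracker property names below are assumptions for illustration.

import java.util.Properties;

import org.apache.oozie.client.OozieClient;

public class SubmitDailyPipeline {
    public static void main(String[] args) throws Exception {
        // Assumed Oozie server URL.
        OozieClient oozie = new OozieClient("http://oozie-host:11000/oozie");

        Properties conf = oozie.createConfiguration();
        // Points at the directory holding the coordinator.xml that schedules the
        // previously defined workflow; the daily frequency lives inside
        // coordinator.xml, not in this client code. Paths are illustrative.
        conf.setProperty("oozie.coord.application.path",
                "hdfs://namenode:8020/apps/event-pipeline");
        conf.setProperty("nameNode", "hdfs://namenode:8020");
        conf.setProperty("jobTracker", "jobtracker:8021");

        // Submits and starts the coordinator job, which in turn runs the workflow daily.
        String jobId = oozie.run(conf);
        System.out.println("Coordinator job id: " + jobId);
    }
}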





Question : QuickTechie Inc has a log file which is a tab-delimited text file. The file contains two columns, username and loginid.
You want to use an InputFormat that returns the username as the key and the loginid as the value. Which of the following
is the most appropriate InputFormat to use?

1. KeyValueTextInputFormat
2. MultiFileInputFormat
3. Access Mostly Uused Products by 50000+ Subscribers
4. SequenceFileInputFormat
5. TextInputFormat


Correct Answer : 1

Explanation: KeyValueTextInputFormat is an InputFormat for plain text files. Files are broken into lines; either a linefeed or a carriage return signals the end of a line. Each line is divided into key and value
parts by a separator byte. If no such byte exists, the key is the entire line and the value is empty. KeyValueTextInputFormat parses each line of text as a key, a
separator and a value. The default separator is the tab character. In the new API (org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat), a separator other than the
default tab can be specified with the mapreduce.input.keyvaluelinerecordreader.key.value.separator property.

Sample Input :
one,first line
two,second line
Output Required :

Key : one
Value : first line
Key : two
Value : second line
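
The sample above uses a comma, so the separator has to be overridden from the default tab. A minimal driver sketch follows (class name, paths, and the Hadoop 2 mapreduce API are assumptions for illustration); with QuickTechie's tab-delimited log the separator property could simply be left at its default.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class KeyValueDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // New-API separator property; the old API used key.value.separator.in.input.line.
        conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");

        Job job = Job.getInstance(conf, "key value example");
        job.setJarByClass(KeyValueDriver.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        // The mapper receives Text/Text pairs, e.g. key "one", value "first line".
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}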











Question : In the QuickTechie Inc Hadoop cluster you have defined the block size as 64 MB. The input file contains 194 MB of valid input data
and is loaded into HDFS. How many map tasks should run, without considering any failure of a map task, during the execution of this job?


1. 1
2. 2
3. Access Mostly Uused Products by 50000+ Subscribers
4. 4



Correct Answer : 4

Explanation: 194/64 = 3.03, hence a total of 4 map tasks will be executed: 3 for the full 64 MB blocks and one for the remaining data.

Watch the training Module 21 from http://hadoopexam.com/index.html/#hadoop-training





Related Questions


Question : Please map the following
A. Find all the Running Jobs
B. Get the completion status of a Particular job
C. Stop already running Job

1. hadoop job -list
2. hadoop job -kill job_id
3. Access Mostly Uused Products by 50000+ Subscribers

1. A-1, B-2, C-3
2. A-1, B-3, C-2
3. Access Mostly Uused Products by 50000+ Subscribers
4. A-3, B-1, C-2
5. A-3, B-2, C-3


Question : Which is/are the correct ways to set the job priority, from the options below?


1. Configuration conf = new Configuration();
conf.set("mapred.job.priority", "VERY_LOW");


2. Passing as a parameter while submitting job
-D mapred.job.priority=VERY_LOW


3. Access Mostly Uused Products by 50000+ Subscribers
hadoop job -set-priority job_id


4. 1,2

5. 1,2,3



Question : You can use the job priority to prioritize your job over other jobs in other pools or queues
1. True
2. False


Question : In label-based scheduling


1. Users can override the default scheduling algorithm and have more control over where in the cluster the job should run

2. The location of the labels file can be defined using jobtracker.node.labels.file in the mapred-site.xml file

3. Access Mostly Uused Products by 50000+ Subscribers

4. 1,2

5. 1,2,3



Question : Select the correct statement(s) regarding label-based scheduling


1. To list all the available labels in the cluster, you can use hadoop job -showlabels

2. We can use the following command-line option to submit a job with a label: hadoop jar -D mapred.job.label=hadoopexam

3. Access Mostly Uused Products by 50000+ Subscribers

4. 1,2

5. 1,2,3



Question : You have executed the following command

hadoop job -showlabels
Node labels :
CentOS001 : [heavy, high_ram, high_cpu]
CentOS002 : [light, low_ram, low_cpu]
CentOS003 : [medium, m_ram, m_cpu]

And now you submit the job with the command below

hadoop jar -D mapred.job.label=hadoopexam

What would happen?
1. It will submit the entire job on CentOS001

2. It will submit the entire job on CentOS002

3. Access Mostly Uused Products by 50000+ Subscribers

4. It will use default scheduling algorithm

5. Job will hang