MapR (HP) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)

Question : Why was the new Hadoop framework, YARN, developed?

1. Task slot configuration is not dynamic

2. MRv1 only supports MapReduce

4. 2,3

5. 1,2,3

Explanation: YARN is more general than MapReduce (MR), so it can run other computing models, such as BSP, besides MR. Prior to YARN, MR, BSP, and
other models each required a separate cluster. Now they can coexist in a single cluster, which leads to higher cluster utilization. Several applications
have already been ported to YARN.

From a MapReduce perspective, legacy MR has separate slots for Map and Reduce tasks, whereas a YARN container has no fixed purpose. The same
container can be used for a Map task, a Reduce task, a Hama BSP task, or something else. This leads to better utilization.
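
The utilization argument above can be illustrated with a small sketch. It is a toy model, not Hadoop code: the node size (4 map slots plus 2 reduce slots, versus 6 generic containers) and the workload are assumed values chosen only for illustration.

```python
def busy_units_with_slots(map_slots, reduce_slots, pending_maps, pending_reduces):
    """MRv1-style: map tasks may only use map slots, reduce tasks only reduce slots."""
    return min(map_slots, pending_maps) + min(reduce_slots, pending_reduces)


def busy_units_with_containers(containers, pending_maps, pending_reduces):
    """YARN-style: any container can host any kind of task."""
    return min(containers, pending_maps + pending_reduces)


# Hypothetical node: 4 map slots + 2 reduce slots versus 6 generic containers.
# Workload early in a job: many map tasks waiting, no reduce tasks yet.
slot_busy = busy_units_with_slots(4, 2, pending_maps=10, pending_reduces=0)
container_busy = busy_units_with_containers(6, pending_maps=10, pending_reduces=0)

print(f"slots in use: {slot_busy}/6")
print(f"containers in use: {container_busy}/6")
```

In this model the two reduce slots sit idle while map tasks queue, whereas all six containers are busy.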

YARN also makes it possible to run different versions of Hadoop in the same cluster, which is not possible with legacy MR. This simplifies
maintenance.

Question : Which of the following main responsibilities of the JobTracker have been separated in YARN?

1. Resource Management

2. Job Management

4. 1,2

5. 1,2,3

Explanation: In Hadoop 2, MapReduce is split into two components: the cluster resource management capabilities have become YARN, while the MapReduce-specific
capabilities remain MapReduce. In the former MR1 architecture, the cluster was managed by a service called the JobTracker. TaskTracker services lived on each
node and would launch tasks on behalf of jobs. The JobTracker would also serve information about completed jobs. In MR2, the functions of the JobTracker are
divided into three services. The ResourceManager is a persistent YARN service that receives and runs applications (a MapReduce job is an application) on the
cluster. It contains the scheduler, which, as in MR1, is pluggable.

The MapReduce-specific capabilities of the JobTracker have moved into the MapReduce Application Master, one of which is started to manage each MapReduce job
and terminated when the job completes. The JobTracker's function of serving information about completed jobs has been moved to the JobHistoryServer. The
TaskTracker has been replaced with the NodeManager, a YARN service that manages resources and deployment on a node. NodeManager is responsible for launching
containers, each of which can house a map or reduce task.

Because MR1 functionality has been split into two components in Hadoop 2, MapReduce cluster configuration options have been split into YARN configuration
options, which go in yarn-site.xml; and MapReduce configuration options, which go in mapred-site.xml. Many have been given new names to reflect the shift. As
JobTrackers and TaskTrackers no longer exist in MR2, all configuration options pertaining to them no longer exist, although many have corresponding options for
the ResourceManager, NodeManager, and JobHistoryServer. We'll follow up with a full translation table in a future post.
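
As a rough illustration of the configuration split described above, the sketch below sorts a few well-known Hadoop 2 property names into the file where they are normally set. The helper `config_file_for` is hypothetical, and the prefix rule is only a convention assumed here for illustration. One known exception is handled explicitly: properties of the MapReduce Application Master start with `yarn.app.mapreduce.` but belong to the MapReduce configuration.

```python
def config_file_for(property_name):
    """Pick the Hadoop 2 config file by property prefix (a convention, not an API)."""
    # MapReduce Application Master settings carry a yarn.* prefix
    # but are MapReduce options, so check them first.
    if property_name.startswith("yarn.app.mapreduce."):
        return "mapred-site.xml"
    if property_name.startswith("yarn."):
        return "yarn-site.xml"
    if property_name.startswith("mapreduce."):
        return "mapred-site.xml"
    raise ValueError(f"not a YARN or MapReduce property: {property_name}")


examples = [
    "yarn.resourcemanager.hostname",
    "yarn.nodemanager.resource.memory-mb",
    "yarn.app.mapreduce.am.resource.mb",
    "mapreduce.framework.name",
    "mapreduce.jobhistory.address",
]
placement = {name: config_file_for(name) for name in examples}
for name, filename in placement.items():
    print(f"{name} -> {filename}")
```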

Question : Which of the following are responsibilities of the ResourceManager in MRv2?

1. Resource Negotiations

2. Cluster Resources allocations

4. 2,3

5. 1,2,3

Explanation: The ResourceManager is the central authority that manages resources and schedules applications running on top of YARN.
The Scheduler is responsible for allocating resources to the various running applications, subject to familiar constraints such as capacities and queues. The
Scheduler is a pure scheduler in the sense that it performs no monitoring or tracking of application status. It also offers no guarantees about
restarting tasks that fail because of application or hardware failures. The Scheduler performs its scheduling function based on the resource requirements
of the applications; it does so using the abstract notion of a resource Container, which incorporates elements such as memory, CPU, disk, and network.

The Scheduler has a pluggable policy that is responsible for partitioning the cluster resources among the various queues, applications, etc. The current
schedulers, such as the CapacityScheduler and the FairScheduler, are examples of such plug-ins.
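
The idea of a pluggable scheduling policy can be sketched as follows. This is a conceptual toy, not the YARN scheduler API. The functions `fifo_policy` and `fair_policy` are hypothetical and far simpler than the real CapacityScheduler or FairScheduler. Resources are reduced to a plain count of containers.

```python
def fifo_policy(requests, capacity):
    """Grant containers to applications strictly in arrival order."""
    grants = {app: 0 for app, _ in requests}
    for app, wanted in requests:
        given = min(wanted, capacity)
        grants[app] = given
        capacity -= given
    return grants


def fair_policy(requests, capacity):
    """Hand out one container at a time, round-robin, until demand or capacity runs out."""
    grants = {app: 0 for app, _ in requests}
    remaining = dict(requests)
    while capacity > 0 and any(remaining.values()):
        for app in grants:
            if capacity == 0:
                break
            if remaining[app] > 0:
                grants[app] += 1
                remaining[app] -= 1
                capacity -= 1
    return grants


def schedule(requests, capacity, policy):
    """The scheduler only allocates; the policy passed in decides who gets what."""
    return policy(requests, capacity)


# Two hypothetical applications asking for 8 and 4 containers on a 6-container cluster.
requests = [("app-1", 8), ("app-2", 4)]
fifo_grants = schedule(requests, 6, fifo_policy)
fair_grants = schedule(requests, 6, fair_policy)
print(fifo_grants)
print(fair_grants)
```

Swapping the policy changes the allocation without touching `schedule`, which is the point of making the policy pluggable.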

Related Questions

Question : The Secondary NameNode is a backup for the NameNode.

1. True
2. False

Question : You have a MapReduce job that depends on two external JDBC jars, ojdbc6.jar and openJdbc6.jar.
Which of the following commands correctly includes these external jars in the running job's classpath?

1. hadoop jar job.jar HadoopExam -cp ojdbc6.jar,openJdbc6.jar
2. hadoop jar job.jar HadoopExam -libjars ojdbc6.jar,openJdbc6.jar
4. hadoop jar job.jar HadoopExam -libjars ojdbc6.jar openJdbc6.jar
Ans : 2
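Exp : The syntax for executing a job and including archives in the job's classpath is: hadoop jar <job jar> <main class> -libjars <jar1>,<jar2>[,...]
The jar list is comma-separated, with no spaces between the entries.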
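
A minimal sketch of assembling that command line, for example from a launcher script. The helper name `hadoop_jar_command` is hypothetical. The jar and class names are the ones used in the answer options. Note that `-libjars` is a generic option, so the job's driver is assumed to parse generic options (for instance by running through ToolRunner).

```python
def hadoop_jar_command(job_jar, main_class, lib_jars):
    """Build the argument list for `hadoop jar ... -libjars ...`."""
    # -libjars takes ONE argument: the jar paths joined by commas, no spaces.
    return ["hadoop", "jar", job_jar, main_class, "-libjars", ",".join(lib_jars)]


command = hadoop_jar_command("job.jar", "HadoopExam", ["ojdbc6.jar", "openJdbc6.jar"])
print(" ".join(command))
```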

Question : You use Sqoop to import the EVENT table from the database,
then run a Hadoop streaming job written in Python to scrub the data,
and then use Hive to write the new data into the Hive EVENT table.
How would you automate this data pipeline?

1. Run the Sqoop job first, then handle the remaining parts with MapReduce job chaining.
2. Define the Sqoop job, the MapReduce job, and the Hive job as an Oozie workflow job, and define an Oozie coordinator job to run the workflow job daily.
4. Define the Sqoop job, the MapReduce job, and the Hive job as an Oozie workflow job,
and define a ZooKeeper coordinator job to run the workflow job daily.

Ans : 2
Exp : In Oozie, scheduling is the function of an Oozie coordinator job.
Oozie does not allow you to schedule workflow jobs directly.
Oozie coordinator jobs cannot aggregate tasks or define workflows;
coordinator jobs are simple schedules of previously defined workflows.
You must therefore assemble the various tasks into a single workflow
job and then use a coordinator job to execute the workflow job.
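
The division of labor between a workflow and a coordinator can be modeled in a few lines. This is a conceptual sketch only, not Oozie's API or its XML job definitions. The action names and the start date are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical action names for the three steps named in the question.
WORKFLOW = ["sqoop-import-event", "streaming-scrub", "hive-load-event"]


def run_workflow(actions, run_date):
    """Run every action in order; here we only record what would run."""
    return [f"{run_date.isoformat()} {action}" for action in actions]


def coordinator(actions, start, days):
    """Trigger the whole workflow once per day for `days` days."""
    log = []
    for offset in range(days):
        log.extend(run_workflow(actions, start + timedelta(days=offset)))
    return log


runs = coordinator(WORKFLOW, date(2024, 1, 1), days=2)
for entry in runs:
    print(entry)
```

The workflow fixes the order of the three steps, and the coordinator only decides when the whole workflow runs.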

Question : QuickTechie Inc has a log file that is a tab-delimited text file. The file contains two columns, username and loginid.
You want to use an InputFormat that returns the username as the key and the loginid as the value. Which of the following
is the most appropriate InputFormat to use?

1. KeyValueTextInputFormat
2. MultiFileInputFormat
4. SequenceFileInputFormat
5. TextInputFormat
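
For comparison, the record shapes produced by two of the listed formats can be imitated in plain Python. This is a sketch of the behavior, not Hadoop code: KeyValueTextInputFormat splits each line at the first separator (a tab by default), while TextInputFormat yields the byte offset of the line as the key and the whole line as the value. The sample usernames and login ids are made up.

```python
def key_value_record(line, separator="\t"):
    """Split at the FIRST separator; a line without one becomes (line, "")."""
    key, _, value = line.partition(separator)
    return key, value


def text_records(lines):
    """TextInputFormat-style records: (byte offset of line start, line text)."""
    offset = 0
    for line in lines:
        yield offset, line
        offset += len(line.encode("utf-8")) + 1  # +1 for the newline


lines = ["alice\t1001", "bob\t1002"]
kv = [key_value_record(line) for line in lines]
tx = list(text_records(lines))
print(kv)
print(tx)
```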

Question : In the QuickTechie Inc Hadoop cluster you have defined block size as MB. The input file contains MB of valid input data
and is loaded into HDFS. How many map tasks should run without considering any failure of MapTask during the execution of this job?

1. 1
2. 2
4. 4
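
The sizes in the question above are missing, so the numbers below are assumptions chosen only to show the arithmetic. The sketch also assumes a splittable file read through a FileInputFormat-based format with the split size equal to the block size, in which case one map task runs per input split. It mirrors FileInputFormat's rule that the final split may be up to 10% larger than the split size.

```python
SPLIT_SLOP = 1.1  # FileInputFormat lets the last split be up to 10% larger


def count_splits(file_size, split_size):
    """Count input splits (and so map tasks) for one splittable file."""
    remaining = file_size
    splits = 0
    # Carve off full-size splits while clearly more than one split remains.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left (if anything) becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


MB = 1024 * 1024
# Hypothetical sizes: a 128 MB block size with 192 MB, 130 MB, and 300 MB files.
print(count_splits(192 * MB, 128 * MB))
print(count_splits(130 * MB, 128 * MB))
print(count_splits(300 * MB, 128 * MB))
```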

Question : What is data localization ?

1. Before processing the data, bringing them to the local node.
2. Hadoop will start the Map task on the node where data block is kept via HDFS
4. Neither 1 nor 2 is correct
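
The placement idea behind option 2 can be sketched as a simple preference rule. This is a toy model, not the actual Hadoop scheduler: the function `pick_node` and the node names are hypothetical, and rack awareness is left out.

```python
def pick_node(block_replicas, free_slots):
    """Prefer a node that stores the block; otherwise fall back to any free node."""
    for node in block_replicas:
        if free_slots.get(node, 0) > 0:
            return node, "data-local"
    for node, free in free_slots.items():
        if free > 0:
            return node, "remote"
    return None, "wait"


# Hypothetical cluster: the block has three replicas; node-a is fully busy.
local_choice = pick_node(["node-a", "node-b", "node-c"],
                         {"node-a": 0, "node-b": 2, "node-d": 1})
remote_choice = pick_node(["node-a"], {"node-a": 0, "node-d": 1})
print(local_choice)
print(remote_choice)
```

Only when no replica holder has free capacity does the task run elsewhere and read its block over the network.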

Question : All the mappers have to communicate with all the reducers...

1. True
2. False

Question : If the Mapper and Reducer run on the same machine, the output of the Mapper will not be transferred over the network to the Reducer

1. True
2. False