
MapR (HP) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : How does a DataNode send its block report to the NameNode or JobTracker?

1. The DataNode sends heartbeat information only once, when data is stored on HDFS

2. The DataNode sends heartbeat information once a day

3. (option text not available)

4. Both 1 and 3


Correct Answer :
Explanation: The TaskTracker sends a heartbeat (TaskTracker#transmitHeartBeat) to the JobTracker at regular intervals; in that heartbeat it also indicates whether it can accept new tasks for execution. The JobTracker (JobTracker#heartbeat) then consults the scheduler (TaskScheduler#assignTasks) to assign tasks to the TaskTracker and sends the list of tasks back as part of the HeartbeatResponse. In the same way, each DataNode checks in with the NameNode at regular intervals: it sends a frequent heartbeat, and it also sends its block report periodically.
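For intuition, here is a minimal, hypothetical Java sketch of the reporting pattern described above: a frequent heartbeat plus a much rarer full block report. This is not the actual DataNode code; the interval values only mirror the usual Hadoop defaults for dfs.heartbeat.interval (3 seconds) and dfs.blockreport.intervalMsec (6 hours), and the two send methods are stand-ins.

// Sketch only -- not the real DataNode implementation. It illustrates the
// pattern above: piggyback status on a frequent heartbeat and send the
// (more expensive) full block report far less often. Runs until killed.
public class ReportingSketch {
    static final long HEARTBEAT_MS    = 3_000L;        // dfs.heartbeat.interval default (3 s)
    static final long BLOCK_REPORT_MS = 21_600_000L;   // dfs.blockreport.intervalMsec default (6 h)

    public static void main(String[] args) throws InterruptedException {
        long lastBlockReport = 0L;
        while (true) {
            sendHeartbeat();                             // hypothetical stand-in
            long now = System.currentTimeMillis();
            if (now - lastBlockReport >= BLOCK_REPORT_MS) {
                sendBlockReport();                       // hypothetical stand-in
                lastBlockReport = now;
            }
            Thread.sleep(HEARTBEAT_MS);
        }
    }

    static void sendHeartbeat()   { System.out.println("heartbeat sent"); }
    static void sendBlockReport() { System.out.println("block report sent"); }
}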





Question : Which of the following schedulers can you configure in Hadoop?
A. Fair Scheduler
B. Capacity Scheduler
C. Weight Scheduler
D. Timing Scheduler

1. A,B
2. B,C
3. (option text not available)
4. A,D
5. B,D

Correct Answer :
Explanation: Hadoop schedulers
Since the pluggable scheduler was implemented, several scheduler algorithms have been developed for it. The sections that follow explore the various algorithms available and when it makes sense to use them.

FIFO scheduler
The original scheduling algorithm that was integrated within the JobTracker was called FIFO. In FIFO scheduling, the JobTracker pulled jobs from a work queue, oldest job first. This scheduler had no concept of the priority or size of a job, but the approach was simple to implement and efficient.
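As a rough illustration only (not the JobTracker's actual code), FIFO selection amounts to polling the oldest entry from a queue; the job names below are made up.

import java.util.ArrayDeque;
import java.util.Queue;

// Toy model of FIFO job selection: jobs run strictly in arrival order,
// with no notion of priority or job size.
public class FifoSketch {
    public static void main(String[] args) {
        Queue<String> workQueue = new ArrayDeque<>();
        workQueue.add("job-1");   // submitted first
        workQueue.add("job-2");
        workQueue.add("job-3");

        while (!workQueue.isEmpty()) {
            String next = workQueue.poll();   // oldest job first
            System.out.println("scheduling " + next);
        }
    }
}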

Fair scheduler
The core idea behind the fair share scheduler was to assign resources to jobs such that on average over time, each job gets an equal share of the available
resources. The result is that jobs that require less time are able to access the CPU and finish intermixed with the execution of jobs that require more time to
execute. This behavior allows for some interactivity among Hadoop jobs and permits greater responsiveness of the Hadoop cluster to the variety of job types
submitted. The fair scheduler was developed by Facebook.

Capacity scheduler
The capacity scheduler shares some of the principles of the fair scheduler but has distinct differences, too. First, capacity scheduling was defined for large
clusters, which may have multiple, independent consumers and target applications. For this reason, capacity scheduling provides greater control as well as the
ability to provide a minimum capacity guarantee and share excess capacity among users. The capacity scheduler was developed by Yahoo!.
In capacity scheduling, instead of pools, several queues are created, each with a configurable number of map and reduce slots. Each queue is also assigned a
guaranteed capacity (where the overall capacity of the cluster is the sum of each queue's capacity).
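For reference, here is a hedged sketch of how a pluggable scheduler is typically selected on a classic (MRv1) JobTracker: the mapred.jobtracker.taskScheduler property names the scheduler class. The snippet uses the Hadoop Configuration API purely for illustration; in practice the property is normally set in mapred-site.xml, and the exact class names can vary by Hadoop version and distribution.

import org.apache.hadoop.conf.Configuration;

// Illustrative only: choosing a pluggable scheduler for the MRv1 JobTracker.
// In a real cluster this property is set in mapred-site.xml, not in code.
public class SchedulerSelection {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Fair Scheduler (developed at Facebook):
        conf.set("mapred.jobtracker.taskScheduler",
                 "org.apache.hadoop.mapred.FairScheduler");

        // Capacity Scheduler (developed at Yahoo!) would instead be:
        // conf.set("mapred.jobtracker.taskScheduler",
        //          "org.apache.hadoop.mapred.CapacityTaskScheduler");

        System.out.println(conf.get("mapred.jobtracker.taskScheduler"));
    }
}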





Question : Which is true about the Fair Scheduler in Hadoop?

1. It is the default scheduler in Hadoop

2. Each user has their own pool

3. (option text not available)

4. 1,3

5. 1,2,3


Correct Answer :
Explanation: The core idea behind the fair share scheduler was to assign resources to jobs such that on average over time, each job gets an equal
share of the
available resources. The result is that jobs that require less time are able to access the CPU and finish intermixed with the execution of jobs that require
more time to execute. This behavior allows for some interactivity among Hadoop jobs and permits greater responsiveness of the Hadoop cluster to the variety of
job types submitted.

The Hadoop implementation creates a set of pools into which jobs are placed for selection by the scheduler. Each pool can be assigned a set of shares to
balance resources across jobs in pools (more shares equals greater resources from which jobs are executed). By default, all pools have equal shares, but
configuration is possible to provide more or fewer shares depending upon the job type. The number of jobs active at one time can also be constrained, if
desired, to minimize congestion and allow work to finish in a timely manner.

To ensure fairness, each user is assigned to a pool. In this way, if one user submits many jobs, he or she can receive the same share of cluster resources as
all other users (independent of the work they have submitted). Regardless of the shares assigned to pools, if the system is not loaded, jobs receive the shares
that would otherwise go unused (split among the available jobs).

The scheduler implementation keeps track of the compute time for each job in the system. Periodically, the scheduler inspects jobs to compute the difference between the compute time the job has received and the time it should have received under an ideal scheduler. The result determines the deficit for the job; the scheduler then ensures that the job with the highest deficit is scheduled next.
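As a rough, hypothetical illustration of that deficit idea (not the scheduler's real code): track, for each job, the gap between the compute time it should have received under ideal fair sharing and the time it actually received, then pick the job with the largest gap. The job names and times below are invented.

import java.util.HashMap;
import java.util.Map;

// Toy model of deficit-based selection: the job whose actual compute time
// lags furthest behind its ideal fair share is scheduled next.
public class DeficitSketch {
    public static void main(String[] args) {
        // Compute time received so far, in seconds (made-up numbers).
        Map<String, Double> received = new HashMap<>();
        received.put("job-A", 120.0);
        received.put("job-B", 40.0);
        received.put("job-C", 80.0);

        // Under ideal fair sharing, each of the three jobs would have
        // received an equal slice of the total time handed out so far.
        double total = received.values().stream().mapToDouble(Double::doubleValue).sum();
        double idealShare = total / received.size();

        String next = null;
        double maxDeficit = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : received.entrySet()) {
            double deficit = idealShare - e.getValue();   // how far behind its fair share
            System.out.printf("%s deficit = %.1f s%n", e.getKey(), deficit);
            if (deficit > maxDeficit) {
                maxDeficit = deficit;
                next = e.getKey();
            }
        }
        System.out.println("schedule next: " + next);   // job-B in this example
    }
}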



Related Questions


Question : Which of the following are MapReduce processing phases?
1. Map
2. Reduce
3. (option text not available)
4. Sort
5. 1 and 2 only


Question : What is true about HDFS?

1. HDFS is based on the Google File System
2. HDFS is written in Java
3. (option text not available)
4. All above are correct


Question : What are sequence files and why are they important?

1. Sequence files are a type of file in the Hadoop framework that allows data to be sorted
2. Sequence files are binary format files that are compressed and are splittable.
3. (option text not available)
4. All of the above


Question : How can you use binary data in MapReduce?
1. Binary data cannot be used by the Hadoop framework.
2. Binary data can be used directly by a map-reduce job. Often binary data is added to a sequence file.
3. (option text not available)
4. Hadoop can freely use binary files with map-reduce jobs so long as the files have headers


Question : Which is a Hadoop daemon process (MRv1)?

1. JobTracker
2. TaskTracker
3. (option text not available)
4. DataNode
5. All of the above


Question : Which statement is true about Apache Hadoop?


1. HDFS performs best with a modest number of large files
2. No random writes are allowed to a file
3. (option text not available)
4. All of the above