MapR (HP) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)

Question : A _____ is the basic unit of processing capacity in YARN, and is an encapsulation of resource elements (memory, CPU, etc.).

1. Node Manager
2. Container
3. [option text missing in source]
4. DataNode

Correct Answer : 2
A Container is the basic unit of processing capacity in YARN, and is an encapsulation of resource elements (memory, CPU, etc.).


Question : __________ are responsible for local monitoring of resource availability, fault reporting, and container life-cycle management (e.g., starting and killing jobs).

1. NodeManagers
2. Application Manager
3. [option text missing in source]
4. Resource Manager

Correct Answer : 1

Explanation: The central ResourceManager runs as a standalone daemon on a dedicated machine and acts as the central authority for allocating resources to the
various competing applications in the cluster. The ResourceManager has a central and global view of all cluster resources and, therefore, can provide
fairness, capacity, and locality across all users. Depending on the application demand, scheduling priorities, and resource availability, the
ResourceManager dynamically allocates resource containers to applications to run on particular nodes. A container is a logical bundle of resources (e.g.,
memory, cores) bound to a particular cluster node. To enforce and track such assignments, the ResourceManager interacts with a special system daemon
running on each node called the NodeManager. Communications between the ResourceManager and NodeManagers are heartbeat based for scalability.
NodeManagers are responsible for local monitoring of resource availability, fault reporting, and container life-cycle management (e.g., starting and killing
jobs). The ResourceManager depends on the NodeManagers for its "global view" of the cluster.

User applications are submitted to the ResourceManager via a public protocol and go through an admission control phase during which security
credentials are validated and various operational and administrative checks are performed. Those applications that are accepted pass to the scheduler and
are allowed to run. Once the scheduler has enough resources to satisfy the request, the application is moved from an accepted state to a running state.
Aside from internal bookkeeping, this process involves allocating a container for the ApplicationMaster and spawning it on a node in the cluster. Often
called 'container 0,' the ApplicationMaster does not get any additional resources at this point and must request and release additional containers.
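
The allocation and admission flow described above can be sketched with a toy model. This is a minimal illustration, not YARN's API: the class names (Container, Node, ToyResourceManager), the first-fit choice of node, and the two state strings are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """A logical bundle of resources bound to one cluster node."""
    node: str
    memory_mb: int
    vcores: int


@dataclass
class Node:
    """Free capacity that a NodeManager would report for its node."""
    name: str
    free_memory_mb: int
    free_vcores: int


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for the ResourceManager's global view."""
    nodes: list
    app_state: dict = field(default_factory=dict)

    def allocate(self, memory_mb, vcores):
        """Return a Container on the first node that fits, else None."""
        for node in self.nodes:
            if node.free_memory_mb >= memory_mb and node.free_vcores >= vcores:
                node.free_memory_mb -= memory_mb
                node.free_vcores -= vcores
                return Container(node.name, memory_mb, vcores)
        return None

    def submit(self, app_id, am_memory_mb, am_vcores):
        """Accept the application, then try to start its ApplicationMaster."""
        self.app_state[app_id] = "ACCEPTED"
        am_container = self.allocate(am_memory_mb, am_vcores)
        if am_container is not None:
            # The ApplicationMaster's container exists, so the app is running.
            self.app_state[app_id] = "RUNNING"
        return am_container
```

In this sketch an application stays in the accepted state until some node has room for its ApplicationMaster container; any further containers would be requested through additional allocate() calls.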


Question : Your cluster has slave nodes in three different racks, and you have written a rack topology script identifying each machine as being in hadooprack1,
hadooprack2, or hadooprack3. A client machine outside of the cluster writes a small (one-block) file to HDFS. The first replica of the block is written
to a node on hadooprack2. How is block placement determined for the other two replicas?

1. One will be written to another node on hadooprack2, and the other to a node on a different rack.

2. Either both will be written to nodes on hadooprack1, or both will be written to nodes on hadooprack3.

3. [option text missing in source]

4. One will be written to hadooprack1, and one will be written to hadooprack3.

Correct Answer : 2

Explanation: HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block
are the same size. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file. An application can specify the number of
replicas of a file. The replication factor can be specified at file creation time and can be changed later. Files in HDFS are write-once and have strictly one writer at any time.

The NameNode makes all decisions regarding replication of blocks. It periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster. Receipt of a
Heartbeat implies that the DataNode is functioning properly. A Blockreport contains a list of all blocks on a DataNode.

For the default threefold replication, Hadoop's rack placement policy is to write the first copy of a block on a node in one rack, then the other two copies on two nodes in a
different rack. Since the first copy is written to hadooprack2, the other two will either be written to two nodes on hadooprack1, or two nodes on hadooprack3.

Replica Placement: The First Baby Steps

The placement of replicas is critical to HDFS reliability and performance. Optimizing replica placement distinguishes HDFS from most other distributed file systems. This is a feature
that needs lots of tuning and experience. The purpose of a rack-aware replica placement policy is to improve data reliability, availability, and network bandwidth utilization. The
current implementation for the replica placement policy is a first effort in this direction. The short-term goals of implementing this policy are to validate it on production
systems, learn more about its behavior, and build a foundation to test and research more sophisticated policies.

Large HDFS instances run on a cluster of computers that commonly spread across many racks. Communication between two nodes in different racks has to go through switches. In most
cases, network bandwidth between machines in the same rack is greater than network bandwidth between machines in different racks.

The NameNode determines the rack id each DataNode belongs to via the process outlined in Hadoop Rack Awareness. A simple but non-optimal policy is to place replicas on unique racks.
This prevents losing data when an entire rack fails and allows use of bandwidth from multiple racks when reading data. This policy evenly distributes replicas in the cluster which
makes it easy to balance load on component failure. However, this policy increases the cost of writes because a write needs to transfer blocks to multiple racks.
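
A rack topology script of the kind mentioned in the question can be sketched as follows. Hadoop calls the configured script (typically set through the net.topology.script.file.name property) with one or more host addresses as arguments and reads one rack path per address from standard output. The addresses, the RACKS table, and the resolve() helper below are hypothetical; only the rack names come from the question.

```python
#!/usr/bin/env python3
import sys

# Hypothetical address-to-rack table; a real script would load its own mapping.
RACKS = {
    "10.0.1.11": "/hadooprack1",
    "10.0.2.11": "/hadooprack2",
    "10.0.3.11": "/hadooprack3",
}

# Rack reported for any address that is not in the table.
DEFAULT_RACK = "/default-rack"


def resolve(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACKS.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Print one rack path per line for the addresses passed as arguments.
    for rack in resolve(sys.argv[1:]):
        print(rack)
```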

For the common case, when the replication factor is three, HDFS's placement policy is to put one replica on one node in the local rack, another on a node in a different (remote) rack,
and the last on a different node in the same remote rack. This policy cuts the inter-rack write traffic which generally improves write performance. The chance of rack failure is far
less than that of node failure; this policy does not impact data reliability and availability guarantees. However, it does reduce the aggregate network bandwidth used when reading
data since a block is placed in only two unique racks rather than three. With this policy, the replicas of a file do not evenly distribute across the racks. One third of replicas are
on one node, two thirds of replicas are on one rack, and the other third are evenly distributed across the remaining racks. This policy improves write performance without
compromising data reliability or read performance.
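
The three-replica rule above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not HDFS's actual placement code: the topology dictionary, rack_of(), and place_replicas() are hypothetical names, the remote rack is chosen at random, and the fallbacks a real cluster needs (full or failed nodes, racks with a single node) are left out.

```python
import random


def rack_of(topology, node):
    """Return the rack that contains the given node."""
    for rack, nodes in topology.items():
        if node in nodes:
            return rack
    raise KeyError(node)


def place_replicas(topology, first_node, rng=random):
    """Pick nodes for three replicas, given where the first one landed.

    topology maps a rack name to the list of nodes in that rack.
    The second and third replicas go to two distinct nodes that share
    one rack other than the first replica's rack.
    """
    first_rack = rack_of(topology, first_node)
    remote_racks = [
        rack for rack, nodes in topology.items()
        if rack != first_rack and len(nodes) >= 2
    ]
    if not remote_racks:
        raise ValueError("need a remote rack with at least two nodes")
    remote = rng.choice(remote_racks)
    second, third = rng.sample(topology[remote], 2)
    return [first_node, second, third]
```

With the first replica on a hadooprack2 node, the sketch always returns two further nodes that are both on hadooprack1 or both on hadooprack3, which matches the reasoning given for the question above.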

The current, default replica placement policy described here is a work in progress.

Related Questions

Question : Select the correct statements regarding the ExpressLane feature of MapR
A. It allows small jobs to be executed before large jobs
B. The ExpressLane feature reserves one or more map or reduce slots on each TaskTracker
C. If there is no slot available to run a small job, it will be executed on "ephemeral slots"
D. It requires the Fair Scheduler
E. If a small job running on ephemeral slots is found to violate the definition of "small", the job is killed and rescheduled as a normal job

1. A,B,C
2. C,D,E
3. [option text missing in source]
4. A,B,E
5. A,B,C,D,E

Question : A MapR local volume is

1. Replicated across the cluster nodes. Hence, it never fills up.

2. Never replicated across the nodes

3. [option text missing in source]

4. 1,3

5. 2,3

Question : How does MapR ensure that the maximum local volume space is available for direct shuffle?

1. It reserves a few nodes in the cluster and does not store or replicate data on these nodes

2. If more space is required, it deletes the data replicated on this node, and the deleted data is re-replicated on another node in the cluster

3. [option text missing in source]

4. Extra hard disks are attached to a few selected nodes, giving them more storage than the average node in the cluster.

Question : Please arrange the statements below in order of execution

A. Output is generated in SequenceFile format and stored on local disk
B. Input text data is converted into the UTF-8 Writable Text format
C. The map() method processes the key and value
D. Data is shuffled and submitted to the reducer
E. Output is stored in text format on HDFS/MapR-FS

1. B,C,A,D,E
2. A,B,C,D,E
3. [option text missing in source]
4. E,D,A,B,C
5. C,A,B,E,D
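
The stages named in the statements above (map, shuffle, reduce) can be illustrated with a toy, in-memory word count. This is a rough sketch only: the function names are hypothetical, nothing is written to disk or to a file system, and the line index stands in for the byte offset that a text input format would normally supply as the key.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit (word, 1) for every word; the line index is the unused input key."""
    for _offset, line in enumerate(lines):
        for word in line.split():
            yield word, 1


def shuffle(pairs):
    """Group values by key and sort the keys, as the shuffle stage would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Sum the grouped counts for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three stages in order on a list of text lines."""
    return reduce_phase(shuffle(map_phase(lines)))
```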

Question : If your process involves two MapReduce jobs, which is the ideal scenario?

1. Input to Job1 (Text), Output of Job1 (Text), Input to Job2 (Text), Output of Job2 (Text)

2. Input to Job1 (Text), Output of Job1 (SequenceFile), Input to Job2 (SequenceFile), Output of Job2 (SequenceFile)

3. [option text missing in source]

4. All of the above will give the same performance.

Question : Select the correct statements regarding compression in MapR
A. MapR by default supports configurable compression at the volume level
B. If the data added to the volume is of a different type for each directory, you can configure the compression codec at the directory level
C. MapR can detect the compression type of data based on the file extension or on header information in a SequenceFile
D. You can use the LD_LIBRARY_PATH environment variable to point to the directory containing native codecs

1. A,B,C
2. B,C,D
3. [option text missing in source]
4. C,D
5. A,B,C,D