MapR (HP) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)

Question : In Hadoop, which one of the following statements is true about a standby NameNode? The Standby NameNode:

1. Communicates directly with the active NameNode to maintain the state of the active NameNode.

2. Receives the same block reports as the active NameNode.

3. Runs on the same machine and shares the memory of the active NameNode.

4. Processes all client requests and block reports from the appropriate DataNodes.

Correct Answer : 2

Explanation: NameNode HA runs an active NameNode alongside a hot standby NameNode. The two do not communicate with each other directly; they share an edits log, either on an NFS mount or
through the Quorum Journal Manager in HDFS itself. The former gives you an external location for storing your HDFS metadata, while the latter has no dependencies external to Hadoop.
DataNodes are configured with the addresses of both NameNodes and send block reports and heartbeats to both, so the standby keeps up-to-date block locations and can take over quickly.
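
The relationship between the two NameNodes can be sketched with a toy Python model. This is only an illustration, not Hadoop code: the class and function names (SharedEditLog, NameNode, send_block_report) are invented here, and the model assumes what option 2 states, namely that DataNodes report to both NameNodes while only the active one writes to the shared edits log.

```python
from dataclasses import dataclass, field


@dataclass
class SharedEditLog:
    """Stands in for the shared edits storage (NFS mount or journal quorum)."""
    entries: list = field(default_factory=list)


@dataclass
class NameNode:
    name: str
    active: bool
    namespace: set = field(default_factory=set)    # file paths known to this node
    block_map: dict = field(default_factory=dict)  # datanode -> set of block ids
    applied: int = 0                               # edit entries already applied

    def create_file(self, path, log):
        # Only the active NameNode serves client requests and writes edits.
        if not self.active:
            raise RuntimeError(f"{self.name} is in standby and rejects client writes")
        self.namespace.add(path)
        log.entries.append(("create", path))
        self.applied = len(log.entries)

    def tail_edits(self, log):
        # The standby replays the edits it has not applied yet.
        for op, path in log.entries[self.applied:]:
            if op == "create":
                self.namespace.add(path)
        self.applied = len(log.entries)

    def receive_block_report(self, datanode, blocks):
        self.block_map[datanode] = set(blocks)


def send_block_report(datanode, blocks, namenodes):
    # A DataNode reports to every configured NameNode, active and standby alike.
    for nn in namenodes:
        nn.receive_block_report(datanode, blocks)


log = SharedEditLog()
active = NameNode("nn1", active=True)
standby = NameNode("nn2", active=False)

active.create_file("/data/foo.txt", log)   # client request goes to the active node
standby.tail_edits(log)                    # standby catches up through the shared log
send_block_report("dn-A", ["blk_1"], [active, standby])
```

In this model the standby never hears from the active NameNode directly: its namespace comes from the shared log and its block locations come from the DataNodes.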

Question : Identify the MapReduce v2 (MRv2 / YARN) daemon responsible for launching application containers and monitoring application resource usage.

1. ResourceManager

2. NodeManager

3. ApplicationMaster

4. ApplicationMasterService

5. TaskTracker

Correct Answer : 2

Explanation: The NodeManager is the per-node daemon that launches containers and monitors their resource usage (CPU, memory, disk, network), reporting it to the ResourceManager.
The ApplicationMaster is a per-application process rather than a daemon: once it obtains containers from the ResourceManager, it asks the NodeManager to launch them.
Before requesting a launch, the ApplicationMaster constructs a ContainerLaunchContext object according to its needs, which can include the allocated resource capability,
security tokens (if enabled), the command to be executed to start the container, an environment for the process, necessary binaries/jars/shared objects, and more.
It can either launch containers one by one by communicating with a NodeManager, or batch all containers on a single node and launch them in a single call by
providing a list of StartContainerRequests to the NodeManager.
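
The per-node batching described in this explanation can be illustrated with a small Python sketch. It is a toy model only: LaunchContext and StartRequest are hypothetical stand-ins for ContainerLaunchContext and StartContainerRequest, and batch_by_node is an invented helper, not part of the YARN API.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LaunchContext:
    """Toy counterpart of a ContainerLaunchContext: what the process needs to start."""
    command: str
    environment: dict = field(default_factory=dict)
    local_resources: list = field(default_factory=list)  # e.g. jar names


@dataclass
class StartRequest:
    """Toy counterpart of a StartContainerRequest."""
    container_id: str
    node: str
    context: LaunchContext


def batch_by_node(requests):
    """Group start requests so that each node receives a single call."""
    batches = defaultdict(list)
    for request in requests:
        batches[request.node].append(request)
    return dict(batches)


# Hypothetical command, environment and jar name, chosen for illustration.
ctx = LaunchContext(command="java -Xmx512m Worker",
                    environment={"ROLE": "worker"},
                    local_resources=["app.jar"])
requests = [
    StartRequest("container_01", "node-1", ctx),
    StartRequest("container_02", "node-2", ctx),
    StartRequest("container_03", "node-1", ctx),
]
batches = batch_by_node(requests)
for node, batch in sorted(batches.items()):
    print(node, [r.container_id for r in batch])
```

Here node-1 would receive one call carrying two requests, while node-2 receives one call with a single request.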

Question : A client application creates an HDFS file named foo.txt with a replication factor of 3. Which statement best describes the file access rules in HDFS
if the file has a single block that is stored on data nodes A, B and C?

1. The file will be marked as corrupted if data node B fails during the creation of the file.

2. Each data node locks the local file to prohibit concurrent readers and writers of the file.

3. Each data node stores a copy of the file in the local file system with the same name as the HDFS file.

4. The file can be accessed if at least one of the data nodes storing the file is available.

Correct Answer : 4

Explanation: HDFS keeps three copies of a block on three different DataNodes to protect against data loss and corruption, and it tries to spread these replicas across more than one
rack to protect availability. A client can read the block from any DataNode that holds a replica, so the file stays accessible as long as at least one of A, B and C is available.
HDFS also monitors DataNodes and, when it detects a failure, schedules re-replication of the affected blocks as needed, so the loss of a single node does not leave the file corrupted.

Note: HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block
are the same size. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file. An application can specify the number of
replicas of a file. The replication factor can be specified at file creation time and can be changed later. Files in HDFS are write-once and have strictly one writer at any time. The
NameNode makes all decisions regarding replication of blocks. HDFS uses a rack-aware replica placement policy. In the default configuration, HDFS keeps 3 copies of each data block:
2 copies are stored on DataNodes in the same rack and the 3rd copy on a different rack.
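
The placement and access rules in this note can be sketched in a few lines of Python. This is a simplified illustration, not the HDFS implementation: the function names and the rack layout are hypothetical, and the sketch assumes the writer runs on a DataNode and that some other rack has at least two nodes.

```python
def place_replicas(writer_node, racks):
    """Pick three DataNodes: the writer's node, then two nodes on one other rack.

    `racks` maps a rack name to the list of DataNode names in that rack.
    """
    local_rack = next(rack for rack, nodes in racks.items() if writer_node in nodes)
    remote_rack = next(rack for rack, nodes in racks.items()
                       if rack != local_rack and len(nodes) >= 2)
    return [writer_node] + racks[remote_rack][:2]


def is_readable(replica_nodes, live_nodes):
    """A block can be read as long as at least one replica sits on a live node."""
    return any(node in live_nodes for node in replica_nodes)


# Hypothetical cluster layout: A and D share rack1, B and C share rack2.
racks = {"rack1": ["A", "D"], "rack2": ["B", "C"]}
replicas = place_replicas("A", racks)

print(replicas)                                   # the three chosen DataNodes
print(is_readable(replicas, live_nodes={"C"}))    # A and B down, C still up
print(is_readable(replicas, live_nodes=set()))    # every replica holder down
```

With this layout the block for foo.txt lands on A, B and C, two of which share a rack, and the file remains readable until all three nodes are unavailable.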

Related Questions

Question : Which of the following provides the highest compression ratio on MapR-FS?

1. lz4

2. lzf

3. gzip

4. zlib

Question : Your "HadoopExam.log" file is stored in "HadoopExam.zip" (. TB in size), and the zip file is transferred to a MapR-FS directory. You know that MapR-FS compresses files
by default, yet the size remains the same in MapR-FS. Why?

1. Compression codec is not configured properly.

2. File size bigger than 1 TB will not be compressed.

3. By default, MapR does not compress files whose filename extensions indicate they are already compressed.

4. Compression is not set on parent directory level.

Question : Suppose you have the following output after the map phase of a MapReduce job:

Partition P1
(I, 1)
(Learn, 1)
(Hadoop, 1)

Partition P2
(I, 1)
(Learn, 1)
(Spark, 1)

How many times will the MapReduce framework call the reduce method?

1. Twice, once for each partition

2. 4 times, once for each distinct key

3. 6 times, once for each key

4. It is unpredictable

Question : Match each scheduler with its characteristics

1. Capacity Scheduler
2. Fair Scheduler

A. Pool
B. Queue
C. Supports preemption

1. 1-A, 1-B, 2-C

2. 1-B, 2-A, 2-C

3. 1-B, 2-B, 1-C

4. 1-A, 1-B, 1-C

Question : Match each of the following with its role

1. ResourceManager
2. NodeManager
3. ApplicationMaster

A. Creates and deletes the containers
B. Launches the apps
C. Requests containers for the apps

1. 1-A, 2-B, 3-C
2. 1-C, 2-B, 3-A
3. 1-B, 2-C, 3-A
4. 1-A, 2-C, 3-B

Question : You are upgrading a Hadoop installation to use YARN. Which of the following are correct features of YARN?

1. YARN is similar to JobTracker and supports multiple JobTracker instances per cluster.
2. YARN supports both MapReduce and non-MapReduce frameworks
3. You can also configure slots in YARN
4. YARN supports schedulers

1. 1,2,4

2. 1,2,3,4

3. 1,3,4

4. 1,4