Question : In Hadoop ., which one of the following statements is true about a standby NameNode? The Standby NameNode:
1. Communicates directly with the active NameNode to maintain the state of the active NameNode.
2. Receives the same block reports as the active NameNode.
3. Runs on the same machine and shares the memory of the active NameNode.
4. Processes all client requests and block reports from the appropriate DataNodes.
Correct Answer : 2
Explanation: The options for NameNode HA include running a primary NameNode and a hot standby NameNode. They share an edits log, either on a NFS mount, or through quorum journal mode in HDFS itself. The former gives you the benefit of having an external source for storing your HDFS metadata, while the latter gives you the benefit of having no dependencies external to Hadoop.
Question : Identify the MapReduce v (MRv / YARN) daemon responsible for launching application containers and monitoring application resource usage?
1. ResourceManager
2. NodeManager
3. ApplicationMaster
4. ApplicationMasterService
5. TaskTracker
Correct Answer : 3
Explanation: Launching Containers Once the ApplicationMaster obtains containers from the ResourceManager, it can then proceed to actual launch of the containers. Before launching a container, it first has to construct the ContainerLaunchContext object according to its needs, which can include allocated resource capability, security tokens (if enabled), the command to be executed to start the container, an environment for the process, necessary binaries/jar/shared objects, and more. It can either launch containers one by one by communicating to a NodeManager, or it can batch all containers on a single node together and launch them in a single call by providing a list of StartContainerRequests to the NodeManager.
Question : A client application creates an HDFS file named foo.txt with a replication factor of . Identify which best describes the file access rules in HDFS if the file has a single block that is stored on data nodes A, B and C?
1. The file will be marked as corrupted if data node B fails during the creation of the file.
2. Each data node locks the local file to prohibit concurrent readers and writers of the file.
3. Each data node stores a copy of the file in the local file system with the same name as the HDFS file.
4. The file can be accessed if at least one of the data nodes storing the file is available.
Correct Answer : 4
Explanation: HDFS keeps three copies of a block on three different datanodes to protect against true data corruption. HDFS also tries to distribute these three replicas on more than one rack to protect against data availability issues. The fact that HDFS actively monitors any failed datanode(s) and upon failure detection immediately schedules re-replication of blocks (if needed) implies that three copies of data on three different nodes is sufficient to avoid corrupted files.
Note: HDFS is designed to reliably store very large files across machines in a large cluster. It stores each file as a sequence of blocks; all blocks in a file except the last block are the same size. The blocks of a file are replicated for fault tolerance. The block size and replication factor are configurable per file. An application can specify the number of replicas of a file. The replication factor can be specified at file creation time and can be changed later. Files in HDFS are write-once and have strictly one writer at any time. The NameNode makes all decisions regarding replication of blocks. HDFS uses rack-aware replica placement policy. In default configuration there are total 3 copies of a datablock on HDFS, 2 copies are stored on datanodes on same rack and 3rd copy on a different rack.