Question : Prior to Hadoop 2.0.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole would be unavailable until the NameNode was either restarted or brought up on a separate machine. Select all the actions you can accomplish once you implement HDFS High Availability on your Hadoop cluster.
1. Automatically replicate data between Active and Passive Hadoop clusters.
2. Manually fail over between Active and Passive NameNodes.
3. Automatically fail over between Active and Passive NameNodes if the Active one goes down.
4. Shut the Active NameNode down for maintenance without disturbing the cluster.
5. Increase the parallelism in the existing cluster.
Correct Answer : Explanation: Prior to Hadoop 2.0.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole would be unavailable until the NameNode was either restarted or brought up on a separate machine. This impacted the total availability of the HDFS cluster in two major ways: in the case of an unplanned event such as a machine crash, the cluster would be unavailable until an operator restarted the NameNode; and planned maintenance events such as software or hardware upgrades on the NameNode machine would result in windows of cluster downtime.

The HDFS High Availability feature addresses these problems by providing the option of running two redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby. This allows a fast failover to a new NameNode in the case that a machine crashes, or a graceful administrator-initiated failover for the purpose of planned maintenance. NameNode High Availability allows you to configure two NameNodes: an Active NameNode and a Standby NameNode. In this configuration, you can shut one of the two NameNodes down without affecting the operation of the cluster. You can configure HDFS HA such that the NameNodes fail over automatically, and yet you can still fail them over manually.

In a typical HA cluster, two separate machines are configured as NameNodes. At any point in time, exactly one of the NameNodes is in an Active state, and the other is in a Standby state. The Active NameNode is responsible for all client operations in the cluster, while the Standby simply acts as a slave, maintaining enough state to provide a fast failover if necessary.
In order for the Standby node to keep its state synchronized with the Active node, the current implementation requires that the two nodes both have access to a directory on a shared storage device (e.g., an NFS mount from a NAS). This restriction will likely be relaxed in future versions. When any namespace modification is performed by the Active node, it durably logs a record of the modification to an edit log file stored in the shared directory. The Standby node is constantly watching this directory for edits, and as it sees the edits, it applies them to its own namespace. In the event of a failover, the Standby will ensure that it has read all of the edits from the shared storage before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before a failover occurs.

In order to provide a fast failover, it is also necessary that the Standby node have up-to-date information regarding the location of blocks in the cluster. To achieve this, the DataNodes are configured with the location of both NameNodes, and send block location information and heartbeats to both.

It is vital for the correct operation of an HA cluster that only one of the NameNodes be Active at a time. Otherwise, the namespace state would quickly diverge between the two, risking data loss or other incorrect results. To ensure this property and prevent the so-called "split-brain scenario," the administrator must configure at least one fencing method for the shared storage. During a failover, if it cannot be verified that the previous Active node has relinquished its Active state, the fencing process is responsible for cutting off the previous Active's access to the shared edits storage. This prevents it from making any further edits to the namespace, allowing the new Active to safely proceed with failover.
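The failover sequence above (fence the old Active, replay all outstanding edits, then promote) can be sketched as a small simulation. This is purely illustrative, not Hadoop's actual implementation; all class and method names here are made up.

```python
# Toy model of shared-edits HA: the Active NameNode appends namespace edits
# to a shared log, the Standby tails that log, and on failover the Standby
# first fences the old Active, then reads ALL remaining edits before
# promoting itself. Names are illustrative only.

class SharedEditLog:
    """Stands in for the shared directory (e.g. an NFS mount)."""
    def __init__(self):
        self.edits = []
        self.fenced_writers = set()

    def append(self, writer, edit):
        if writer in self.fenced_writers:
            # A fenced writer can no longer modify the namespace.
            raise PermissionError(f"{writer} is fenced off from shared edits")
        self.edits.append(edit)

    def fence(self, writer):
        # Fencing cuts off the previous Active's write access,
        # preventing the split-brain scenario.
        self.fenced_writers.add(writer)

class StandbyNameNode:
    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.namespace = []   # edits applied so far
        self.state = "standby"

    def tail_edits(self):
        # Apply every edit currently in the shared log.
        self.namespace = list(self.log.edits)

    def failover(self, old_active):
        self.log.fence(old_active)   # 1. fence the previous Active
        self.tail_edits()            # 2. read all remaining edits
        self.state = "active"        # 3. only then promote

log = SharedEditLog()
log.append("nn1", "mkdir /log")
log.append("nn1", "create /log/QT")

nn2 = StandbyNameNode("nn2", log)
nn2.failover("nn1")
print(nn2.state)           # -> active
print(len(nn2.namespace))  # -> 2
```

Note the ordering: fencing happens before the final edit-log read, so no edit written by the old Active can be missed by the new one.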
Question : You have a website, www.QuickTechie.com, where you have one month of user profile update logs. For classification analysis you want to save all the data in a single file called QT31012015.log, which is approximately 30 GB in size. You are able to push this full file to a directory on HDFS as /log/QT/QT31012015.log. Select the correct statement about how the file is pushed to HDFS.
1. The client queries the NameNode, which returns information on which DataNodes to use, and the client writes to those DataNodes.
2. The client writes immediately to DataNodes based on the cluster's rack locality settings.
4. The client writes immediately to DataNodes at random.
Correct Answer : Explanation: When the first client contacts the NameNode to open the file for writing, the NameNode grants the client a lease to create this file. When a second client tries to open the same file for writing, the NameNode will see that the lease for the file has already been granted to another client, and will reject the open request from the second client. The NameNode returns a list of DataNodes to which the client should write. The contents of the file are never sent to the NameNode.
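The lease check described above can be sketched as follows. This is a minimal, hypothetical model (the class and method names are invented); real HDFS lease management is considerably more involved.

```python
# Toy model of the NameNode's write lease: the first client to open a path
# for writing is granted the lease; a second open-for-write on the same
# path is rejected while the lease is held.

class LeaseManager:
    def __init__(self):
        self.leases = {}   # path -> client currently holding the write lease

    def open_for_write(self, path, client):
        holder = self.leases.get(path)
        if holder is not None and holder != client:
            raise IOError(f"lease for {path} already held by {holder}")
        self.leases[path] = client
        return f"lease granted to {client}"

    def close(self, path, client):
        # Releasing the lease lets another client write the file later.
        if self.leases.get(path) == client:
            del self.leases[path]

nn = LeaseManager()
print(nn.open_for_write("/log/QT/QT31012015.log", "client-A"))
try:
    nn.open_for_write("/log/QT/QT31012015.log", "client-B")
except IOError as e:
    print("rejected:", e)
```

The same single-writer property is what guarantees that the 30 GB QT31012015.log file in the question has exactly one writer at a time.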
Question : Select the appropriate way by which the NameNode gets to know all the available DataNodes in the Hadoop cluster.
1. DataNodes are listed in the dfs.hosts file. The NameNode uses that as the definitive list of available DataNodes.
2. DataNodes heartbeat in to the master on a time-interval basis.
Explanation: The DataNode is where the actual data resides. Some interesting traits of DataNodes are as follows. All DataNodes send a heartbeat message to the NameNode every 3 seconds to say that they are alive. If the NameNode does not receive a heartbeat from a particular DataNode for 10 minutes, it considers that DataNode to be dead/out of service and initiates replication of the blocks which were hosted on that DataNode onto other DataNodes. DataNodes can talk to each other to rebalance data, move and copy data around, and keep replication high. When a DataNode stores a block of information, it maintains a checksum for it as well. DataNodes update the NameNode with block information periodically, and verify the checksums before updating. If the checksum is incorrect for a particular block, i.e. there is disk-level corruption for that block, the DataNode skips that block while reporting block information to the NameNode. In this way, the NameNode is aware of the disk-level corruption on that DataNode and takes steps accordingly.

DataNodes heartbeat in to the master every three seconds. When a DataNode heartbeats in to the NameNode for the first time, the NameNode marks it as available. DataNodes can be listed in a file pointed to by the dfs.hosts property, but this only lists the names of possible DataNodes. It is not a definitive list of those which are available but, rather, a list of the only machines which may be used as DataNodes if they begin to heartbeat.

NameNode: The NameNode is the node which stores the filesystem metadata, i.e. which file maps to what block locations and which blocks are stored on which DataNode. The NameNode maintains two in-memory tables: one which maps blocks to DataNodes (one block maps to 3 DataNodes for a replication value of 3) and one which maps a DataNode to its blocks. Whenever a DataNode reports disk corruption of a particular block, the first table is updated, and whenever a DataNode is detected to be dead (because of a node/network failure), both tables are updated.

Failover semantics: The secondary NameNode regularly connects to the primary NameNode and keeps snapshotting the filesystem metadata into local/remote storage.
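The heartbeat bookkeeping described above (3-second heartbeats, 10-minute dead-node timeout, re-replication of a dead node's blocks) can be sketched as a toy model. The class and method names are invented for illustration and are not Hadoop APIs.

```python
# Toy model of NameNode liveness tracking: the first heartbeat marks a
# DataNode available; silence for more than 10 minutes marks it dead and
# schedules its blocks for re-replication.

HEARTBEAT_INTERVAL = 3        # seconds between DataNode heartbeats
DEAD_NODE_TIMEOUT = 10 * 60   # 10 minutes, in seconds

class NameNodeMonitor:
    def __init__(self):
        self.last_heartbeat = {}   # datanode -> timestamp of last heartbeat
        self.blocks_on_node = {}   # datanode -> set of block ids it hosts

    def heartbeat(self, datanode, now, blocks=None):
        # The first heartbeat is what marks the DataNode as available.
        self.last_heartbeat[datanode] = now
        if blocks is not None:
            self.blocks_on_node[datanode] = set(blocks)

    def check_liveness(self, now):
        dead, to_rereplicate = [], set()
        for dn, last in self.last_heartbeat.items():
            if now - last > DEAD_NODE_TIMEOUT:
                dead.append(dn)
                # These blocks must be re-replicated onto other nodes.
                to_rereplicate |= self.blocks_on_node.get(dn, set())
        return dead, to_rereplicate

mon = NameNodeMonitor()
mon.heartbeat("dn1", now=0, blocks={"blk_1", "blk_2"})
mon.heartbeat("dn2", now=0, blocks={"blk_2", "blk_3"})
mon.heartbeat("dn2", now=650)   # dn2 is still alive; dn1 has gone silent
dead, blocks = mon.check_liveness(now=650)
print(dead)            # -> ['dn1']
print(sorted(blocks))  # -> ['blk_1', 'blk_2']
```

Note that blk_2 is re-replicated even though dn2 still hosts a copy; in real HDFS the NameNode only schedules enough new copies to restore the target replication factor.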
Question : Select the correct statement regarding containers in YARN.
1. A container is a collection of physical resources such as RAM, CPU cores, and disks on a single node.
2. There can be only one container on a single node.
3. Access Mostly Uused Products by 50000+ Subscribers
4. 1 and 2
5. 1 and 3
Question : Select the correct statement regarding the NodeManager in YARN.
1. On start-up, the NodeManager registers with the ResourceManager.
2. Its primary goal is to manage only the containers (on the node) assigned to it by the ResourceManager.
3. Access Mostly Uused Products by 50000+ Subscribers
4. 1 and 2
5. 1 and 3
Question : Select the correct statement for HDFS in Hadoop.
1. NameNode federation significantly improves the scalability and performance of HDFS by introducing the ability to deploy multiple NameNodes for a single cluster.
2. Built-in high availability for the NameNode is provided via a new feature called the Quorum Journal Manager (QJM). QJM-based HA features an active NameNode and a standby NameNode.
3. Access Mostly Uused Products by 50000+ Subscribers
4. 1 and 3
5. 1, 2 and 3
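As a reference point for the QJM-based HA mentioned above, the core hdfs-site.xml properties look roughly like the sketch below. The nameservice ID "mycluster", the NameNode IDs nn1/nn2, and all hostnames and ports are placeholders; consult the Hadoop documentation for a complete configuration (automatic failover additionally requires a ZooKeeper quorum configured in core-site.xml).

```xml
<!-- Sketch of QJM-based NameNode HA settings in hdfs-site.xml.
     All identifiers, hostnames, and ports below are illustrative. -->
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>nn1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>nn2.example.com:8020</value>
  </property>
  <!-- The standby tails edits from a quorum of JournalNodes -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
  </property>
  <!-- Fencing method used during failover to prevent split-brain -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <!-- Enable automatic failover (requires ZooKeeper) -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>
```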