Question : Which describes how a client reads a file from HDFS?
1. The client queries the NameNode for the block location(s). The NameNode returns the block location(s) to the client. The client reads the data directly off the DataNode(s).
2. The client queries all DataNodes in parallel. The DataNode that contains the requested data responds directly to the client. The client reads the data directly off the DataNode.
3. The client contacts the NameNode for the block location(s). The NameNode then queries the DataNodes for block locations. The DataNodes respond to the NameNode, and the NameNode redirects the client to the DataNode that holds the requested data block(s). The client then reads the data directly off the DataNode.
4. The client contacts the NameNode for the block location(s). The NameNode contacts the DataNode that holds the requested data block. Data is transferred from the DataNode to the NameNode, and then from the NameNode to the client.
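Correct Answer : 1
Explanation: Option 1 describes how a client reads a file from HDFS. The NameNode serves only metadata (the block locations); the file data itself never passes through the NameNode, and the client reads each block directly from a DataNode that holds it.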
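The read path in option 1 can be sketched as a toy simulation. This is not the Hadoop client API: the DataNode and NameNode classes, the read_file function, and the block ids below are hypothetical names made up for illustration, and picking the first listed replica is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Toy DataNode (hypothetical): stores block bytes keyed by block id."""
    name: str
    blocks: dict = field(default_factory=dict)

    def read_block(self, block_id):
        return self.blocks[block_id]


@dataclass
class NameNode:
    """Toy NameNode (hypothetical): holds metadata only, never file contents."""
    # path -> ordered list of (block_id, [names of DataNodes holding a replica])
    block_map: dict = field(default_factory=dict)

    def get_block_locations(self, path):
        return self.block_map[path]


def read_file(path, namenode, datanodes):
    """Client-side read: one metadata lookup, then direct DataNode reads."""
    data = b""
    for block_id, replica_names in namenode.get_block_locations(path):
        # Assumption: take the first listed replica for simplicity.
        datanode = datanodes[replica_names[0]]
        data += datanode.read_block(block_id)
    return data


dn1 = DataNode("dn1", {"blk_1": b"hello ", "blk_2": b"world"})
dn2 = DataNode("dn2", {"blk_2": b"world"})
nn = NameNode({"/demo.txt": [("blk_1", ["dn1"]), ("blk_2", ["dn2", "dn1"])]})

result = read_file("/demo.txt", nn, {"dn1": dn1, "dn2": dn2})
print(result)
```

Note that the toy NameNode never touches block contents, which is the difference between option 1 and option 4.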
Question : Can you use MapReduce to perform a relational join on two large tables sharing a key? Assume that the two tables are formatted as comma-separated files in HDFS.
1. Yes.
2. Yes, but only if one of the tables fits into memory.
3. Yes, so long as both tables fit into memory.
4. No, MapReduce cannot perform relational operations.
5. No, but it can be done with either Pig or Hive.
Correct Answer : 1
Explanation: When processing large data sets, joining data by a common key is often useful, if not essential. Joins let you gain further insight, for example by joining on timestamps to correlate events with a time of day. MapReduce supports three common kinds of join: reduce-side joins, map-side joins, and memory-backed joins. A reduce-side join does not require either table to fit into memory, which is why the answer is an unqualified yes.
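A reduce-side join can be illustrated with a minimal in-process sketch. It only imitates the map, shuffle, and reduce phases in plain Python and does not use Hadoop; the sample tables, the "U"/"O" tags, and all function names are assumptions made for this example.

```python
import csv
from collections import defaultdict
from io import StringIO

# Hypothetical comma-separated tables that share their first column as the key.
USERS_CSV = "1,alice\n2,bob\n3,carol\n"
ORDERS_CSV = "1,book\n1,pen\n3,lamp\n"


def map_phase(text, tag):
    """Emit (join_key, (source_tag, payload)) for every input row."""
    for row in csv.reader(StringIO(text)):
        yield row[0], (tag, row[1])


def shuffle(pairs):
    """Group mapper output by key, as the framework's shuffle would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Pair every user record with every order record for the same key."""
    users = [payload for tag, payload in values if tag == "U"]
    orders = [payload for tag, payload in values if tag == "O"]
    for user in users:
        for order in orders:
            yield key, user, order


def reduce_side_join(users_text, orders_text):
    pairs = list(map_phase(users_text, "U")) + list(map_phase(orders_text, "O"))
    joined = []
    for key, values in sorted(shuffle(pairs).items()):
        joined.extend(reduce_phase(key, values))
    return joined


joined = reduce_side_join(USERS_CSV, ORDERS_CSV)
print(joined)
```

This sketch performs an inner join, so the user with key 2 is dropped because no order shares that key. In a real job the grouping is done by the framework across many reducers, so only the records for one key at a time need to be held by a reducer.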
Question : A NameNode in Hadoop manages ______________.
1. Two namespaces: an active namespace and a backup namespace
2. A single namespace
3. An arbitrary number of namespaces
4. No namespaces
Correct Answer : 2
Explanation: HDFS has two main layers:
Namespace : Consists of directories, files and blocks. It supports all the namespace related file system operations such as create, delete, modify and list files and directories.
Block Storage Service, which has two parts:
Block Management (performed in the NameNode) : Provides DataNode cluster membership by handling registrations and periodic heartbeats. Processes block reports and maintains the location of blocks. Supports block-related operations such as create, delete, modify, and get block location. Manages replica placement, block replication for under-replicated blocks, and deletion of over-replicated blocks.
Storage : Provided by DataNodes, which store blocks on the local file system and allow read/write access.
The prior HDFS architecture allows only a single namespace for the entire cluster. In that configuration, a single NameNode manages the namespace. HDFS Federation addresses this limitation by adding support for multiple NameNodes/namespaces to HDFS.
The NameNode manages the filesystem namespace. It maintains the filesystem tree and the metadata for all the files and directories in the tree. Essentially, a namespace is a container; in this context it means the file name grouping or hierarchy structure. Metadata includes things like file owners, permission bits, block locations, and size.
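As a rough illustration of a single namespace, the sketch below keeps one path-to-metadata table of the kind a lone NameNode would own. The Namespace class, its methods, and the metadata fields are hypothetical and heavily simplified; they are not Hadoop's internal data structures.

```python
class Namespace:
    """Toy single namespace (hypothetical): maps each path to its metadata."""

    def __init__(self):
        self.entries = {"/": {"type": "dir", "owner": "hdfs"}}

    def create(self, path, owner, kind="file"):
        """Add a file or directory; the parent directory must already exist."""
        parent = path.rsplit("/", 1)[0] or "/"
        if self.entries.get(parent, {}).get("type") != "dir":
            raise FileNotFoundError(parent)
        self.entries[path] = {"type": kind, "owner": owner, "blocks": []}

    def list_dir(self, path):
        """Return the direct children of a directory, sorted by path."""
        prefix = path.rstrip("/") + "/"
        return sorted(
            p for p in self.entries
            if p != path and p.startswith(prefix) and "/" not in p[len(prefix):]
        )

    def delete(self, path):
        """Remove a path and everything beneath it."""
        prefix = path.rstrip("/") + "/"
        for p in [p for p in self.entries if p == path or p.startswith(prefix)]:
            del self.entries[p]


ns = Namespace()
ns.create("/data", "alice", kind="dir")
ns.create("/data/a.csv", "alice")
print(ns.list_dir("/"), ns.list_dir("/data"))
```

Every operation goes through the one table, which mirrors the single-namespace answer; under HDFS Federation there would be several such independent tables, one per NameNode.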
1. They would see Hadoop throw a ConcurrentFileAccessException when they try to access this file.
2. They would see the current state of the file, up to the last bit written by the command.
3. They would see the current state of the file through the last completed block.
4. They would see no content until the whole file is written and closed.
Question : Which statement is true?
1. Output of the reducer could be zero
2. Output of the reducer is written to the HDFS
3. In practice, the reducer usually emits a single key-value pair for each input key
4. All of the above