
MapR (HP) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : Which describes how a client reads a file from HDFS?

1. The client queries the NameNode for the block location(s). The NameNode returns the block location(s) to the client. The client reads the data directly off the DataNode(s).

2. The client queries all DataNodes in parallel. The DataNode that contains the requested data responds directly to the client. The client reads the data directly off the DataNode.

3. The client contacts the NameNode for the block location(s). The NameNode then queries the DataNodes for block locations. The DataNodes respond to the NameNode, and the NameNode redirects the client to the DataNode that holds the requested data block(s). The client then reads the data directly off the DataNode.

4. The client contacts the NameNode for the block location(s). The NameNode contacts the DataNode that holds the requested data block. Data is transferred from the DataNode to the NameNode, and then from the NameNode to the client.

Correct Answer : 1
Explanation: Option 1 correctly describes how a client reads a file from HDFS: the client asks the NameNode for the block location(s), then reads the data directly from the DataNode(s). The NameNode never handles the file data itself; it only serves metadata.
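
For illustration, here is a minimal sketch of that read path using the standard Hadoop FileSystem API. The path /data/input.txt is a hypothetical example, and the sketch assumes fs.defaultFS points at the cluster; behind the scenes, open() asks the NameNode for the block locations and the returned stream pulls the bytes directly from the DataNodes.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // open() contacts the NameNode for the block locations;
        // the actual bytes are then streamed directly from the DataNodes.
        try (FSDataInputStream in = fs.open(new Path("/data/input.txt"))) {
            byte[] buffer = new byte[4096];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) > 0) {
                System.out.write(buffer, 0, bytesRead);
            }
        }
    }
}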




Question : Can you use MapReduce to perform a relational join on two large tables sharing a key? Assume that the two tables are formatted as comma-separated files in HDFS.

1. Yes.

2. Yes, but only if one of the tables fits into memory

3. Yes, so long as both tables fit into memory.

4. No, MapReduce cannot perform relational operations.

5. No, but it can be done with either Pig or Hive.

Correct Answer : 1


Explanation: When processing large data sets, joining records by a common key is often essential; for example, joining events with timestamps lets you correlate them with the time of day. MapReduce supports several join strategies, including reduce-side joins, map-side joins, and memory-backed joins. A reduce-side join handles two large tables sharing a key without requiring either table to fit into memory, so the answer is simply yes.
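
To make this concrete, below is a compact sketch of a reduce-side join over two comma-separated files. The file name prefixes ("users", "orders") and the assumption that the join key is the first CSV column are illustrative, not part of the question; note that only the records for a single key are buffered in the reducer, never a whole table, which is why neither table needs to fit into memory.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class ReduceSideJoin {

    // Tag each record with its source table so the reducer can tell
    // the two sides of the join apart. Assumes well-formed CSV lines
    // with the join key in the first column.
    public static class TaggingMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String tag = ((FileSplit) context.getInputSplit())
                    .getPath().getName().startsWith("users") ? "U" : "O";
            String[] fields = line.toString().split(",", 2);
            context.write(new Text(fields[0]), new Text(tag + "," + fields[1]));
        }
    }

    // All records sharing a key arrive at the same reduce call, which
    // pairs every "users" record with every "orders" record (inner join).
    public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            java.util.List<String> users = new java.util.ArrayList<>();
            java.util.List<String> orders = new java.util.ArrayList<>();
            for (Text v : values) {
                String s = v.toString();
                if (s.startsWith("U,")) users.add(s.substring(2));
                else orders.add(s.substring(2));
            }
            for (String u : users)
                for (String o : orders)
                    context.write(key, new Text(u + "," + o));
        }
    }
}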




Question : A NameNode in Hadoop manages ______________.

1. Two namespaces: an active namespace and a backup namespace

2. A single namespace

3. An arbitrary number of namespaces

4. No namespaces

Correct Answer : 2


Explanation: HDFS has two main layers:

Namespace: consists of directories, files and blocks. It supports all the namespace-related file system operations such as create, delete, modify and list files and directories.

Block Storage Service, which has two parts:
- Block Management (performed in the NameNode): provides DataNode cluster membership by handling registrations and periodic heartbeats; processes block reports and maintains the location of blocks; supports block-related operations such as create, delete, modify and get block location; and manages replica placement, replication of under-replicated blocks, and deletion of over-replicated blocks.
- Storage: provided by the DataNodes, which store blocks on the local file system and allow read/write access.

The prior HDFS architecture allows only a single namespace for the entire cluster, managed by a single NameNode. HDFS Federation addresses this limitation by adding support for multiple NameNodes/namespaces, but each NameNode still manages a single namespace.

"The namenode manages the filesystem namespace. It maintains the filesystem tree and the metadata for all the files and directories in the tree." Essentially, a namespace is a container; in this context it means the file-name grouping or hierarchy structure. The metadata covers things like file ownership, permission bits, block locations, size, etc.
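
The namespace metadata the NameNode serves (hierarchy, ownership, permissions, size, block locations) can be inspected through the FileSystem API. Here is a small sketch; the path /data is a hypothetical example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamespaceMetadata {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        for (FileStatus status : fs.listStatus(new Path("/data"))) {
            // Owner, permissions, and size all come from NameNode metadata.
            System.out.printf("%s %s %s %d%n",
                    status.getPermission(), status.getOwner(),
                    status.getPath(), status.getLen());
            if (status.isFile()) {
                // Block locations map each block to the DataNodes holding it.
                for (BlockLocation loc :
                        fs.getFileBlockLocations(status, 0, status.getLen())) {
                    System.out.println("  block hosts: "
                            + String.join(",", loc.getHosts()));
                }
            }
        }
    }
}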


Related Questions


Question : You need to create a job that does frequency analysis on input data. You will do this by writing a Mapper that uses TextInputFormat and splits each value (a line of text from an input file) into individual characters. For each one of these characters, you will emit the character as a key and an IntWritable as the value. As this will produce proportionally more intermediate data than input data, which two resources should you expect to be bottlenecks?

1. Processor and network I/O

2. Disk I/O and network I/O

3. Processor and RAM

4. Processor and disk I/O
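
For reference, here is a sketch of the Mapper this question describes; the class name is illustrative. Every input character becomes a full key-value record, so the intermediate data greatly exceeds the input data and must be spilled to local disk and shuffled across the network to the reducers.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CharFrequencyMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text character = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // With TextInputFormat, each value is one line of the input file.
        for (char c : line.toString().toCharArray()) {
            character.set(String.valueOf(c));
            // One (character, 1) pair per input character, which is what
            // makes the intermediate data larger than the input data.
            context.write(character, ONE);
        }
    }
}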


Question : You use the hadoop fs -put command to write a MB file using an HDFS block size of MB. Just after this command has finished writing MB of this file, what would another user see when trying to access this file?

1. They would see Hadoop throw a ConcurrentFileAccessException when they try to access this file.

2. They would see the current state of the file, up to the last bit written by the command.

3. They would see the current state of the file, up through the last completed block.

4. They would see no content until the whole file is written and closed.


Question : Which statement is true?
1. Output of the reducer could be zero
2. Output of the reducer is written to the HDFS
3. In practice, the reducer usually emits a single key-value pair for each input key
4. All of the above
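
To see these statements in action, here is a small sketch of a reducer that sums counts for each key and filters out low totals; the threshold of 10 is an arbitrary illustrative value. It may emit zero pairs for a key, typically emits at most one aggregated pair per input key, and everything it does emit is written to HDFS by the job's output format.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumFilterReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) sum += v.get();
        // Emit a single aggregated pair per key, or nothing at all
        // when the key is filtered out.
        if (sum >= 10) context.write(key, new IntWritable(sum));
    }
}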




Question : Which of the following is correct with regard to MapReduce performance and chunk size on MapR-FS?

1. Smaller chunk sizes result in lower performance.

2. Smaller chunk sizes result in higher performance.

3. Larger chunk sizes result in lower performance.

4. Larger chunk sizes always result in lower performance.


Question : You have created a directory in MapR-FS with a chunk size of 256 MB and written a file called "HadoopExam.log", which is TB in size, into that directory. While writing the MapReduce job you realize that it is not performing well and wish to change the chunk size from 256 MB to another size. Select the correct option.

1. For better job performance, change the chunk size from 256 MB to 300 MB (the maximum possible chunk size)

2. For better job performance, change the chunk size from 256 MB to 64 MB (the minimum possible chunk size)

3. You cannot change the chunk size once the file is written.

4. Chunk size does not impact the performance of the MapReduce job.


Question : Select the correct statements regarding MapR-FS compression for files.

1. Compression is applied automatically to uncompressed files unless you turn compression off
2. Compressed data uses less bandwidth on the network than uncompressed data.
3. Compressed data uses less disk space.
4. Compressed data uses more metadata.

1. 1,2

2. 1,3,4

3. 1,2,3

4. 1,2,4