Question: Which of the following provides the highest compression ratio on MapR-FS?
1. lz4
2. lzf
3. gZip
4. zlib
Correct Answer: 4
Explanation: MapR supports three different compression algorithms:
1. lz4 (default)
2. lzf
3. zlib
Compression algorithms can be evaluated for compression ratio (higher compression means less disk space used), compression speed, and decompression speed. Of the three supported algorithms, zlib yields the highest compression ratio, while lz4 and lzf trade ratio for speed (based on single-threaded measurements on a Core 2 Duo at 3 GHz).
Note that compression speed depends on various factors, including:
- block size (the smaller the block size, the faster the compression speed)
- single-thread vs. multi-thread system
- single-core vs. multi-core system
- the type of codec used
Compression is set at the directory level. Any file written by a Hadoop application, whether via the file APIs or over NFS, is compressed according to the settings of the directory it is written to. Sub-directories on which compression has not been explicitly set inherit the compression settings of their parent directory.

If you change a directory's compression settings after writing a file, the file keeps its old compression settings: a file written to an uncompressed directory does not automatically become compressed when you later turn compression on, and vice versa. Further writes to the file use the file's existing compression setting.

Note: Only the owner of a directory can change its compression settings or other attributes; write permission is not sufficient.
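As an illustration of managing these directory-level settings, MapR exposes them through the `hadoop mfs` command. A minimal sketch, assuming a running MapR cluster and a hypothetical directory `/user/hadoopexam` (these commands are not runnable outside a MapR environment):

```shell
# List a directory with MapR-specific attributes (including compression).
hadoop mfs -ls /user/hadoopexam

# Enable compression with the zlib codec on the directory; files written
# afterwards pick up this setting, but existing files keep their old one.
hadoop mfs -setcompression zlib /user/hadoopexam

# Explicitly turn compression off on a sub-directory, overriding the
# setting it would otherwise inherit from its parent.
hadoop mfs -setcompression off /user/hadoopexam/raw
```

Remember that only the directory's owner can run `-setcompression` on it; write permission alone is not sufficient.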
Question: The file "HadoopExam.log" is stored inside "HadoopExam.zip" (. TB in size), and this zip file is transferred to a MapR-FS directory. You are aware that MapR-FS compresses files by default; however, the file's size remains the same in MapR-FS. Why?
1. Compression codec is not configured properly.
2. File size bigger than 1 TB will not be compressed.
3. By default, MapR does not compress files whose filename extensions indicate they are already compressed.
4. Compression is not set on parent directory level.
Correct Answer: 3
Explanation: By default, MapR does not compress files whose filename extensions indicate they are already compressed. The default list of filename extensions is as follows: bz2, gz, lzo, snappy, tgz, tbz2, zip, z, Z, mp3, jpg, jpeg, mpg, mpeg, avi, gif, png.
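The skip-list behavior above can be sketched as a small Python function. This is an illustration of the rule, not MapR's actual implementation; the function name `should_compress` is an assumption for the example:

```python
# Default extensions that MapR-FS treats as "already compressed" and skips.
SKIP_EXTENSIONS = {
    "bz2", "gz", "lzo", "snappy", "tgz", "tbz2", "zip",
    "z", "Z", "mp3", "jpg", "jpeg", "mpg", "mpeg", "avi", "gif", "png",
}

def should_compress(filename: str) -> bool:
    """Return True if MapR-FS would compress this file by default (sketch)."""
    # The default list contains both "z" and "Z", so the extension is
    # matched case-sensitively rather than lowercased first.
    ext = filename.rsplit(".", 1)[-1] if "." in filename else ""
    return ext not in SKIP_EXTENSIONS

print(should_compress("HadoopExam.zip"))  # zip is on the skip list -> False
print(should_compress("HadoopExam.log"))  # log is not on the list -> True
```

This is why the zip file in the question keeps its original size: its extension lands on the skip list, so no further compression is attempted.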
Question: Let's say you have the following output after the map phase of a MapReduce job:
Partition P1: (I, 1) (Learn, 1) (Hadoop, 1)
Partition P2: (I, 1) (Learn, 1) (Spark, 1)
How many times will the MapReduce framework call the reduce method?
1. Twice, one for each partition
2. 4 times, one for each distinct key
3. 6 times, one for each key
4. It is unpredictable
Correct Answer: 3
Explanation: The reduce method is called once for each unique key within each partition, receiving all of that key's values grouped together. Each partition is processed by its own reducer, so keys are not merged across partitions. Partition P1 contains three keys (I, Learn, Hadoop) and partition P2 contains three keys (I, Learn, Spark), so the reduce method is called 3 + 3 = 6 times.
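The shuffle-and-reduce behavior above can be simulated with a toy model in Python. This is an illustration of the grouping rule, not Hadoop code; the partition data is taken from the question:

```python
from collections import defaultdict

# Map output from the question, keyed by partition. Each partition is
# handled by its own reducer in this toy model.
partitions = {
    "P1": [("I", 1), ("Learn", 1), ("Hadoop", 1)],
    "P2": [("I", 1), ("Learn", 1), ("Spark", 1)],
}

calls = 0
for name, pairs in partitions.items():
    grouped = defaultdict(list)          # shuffle: group values by key
    for key, value in pairs:
        grouped[key].append(value)
    for key, values in grouped.items():  # one reduce() invocation per key
        calls += 1
        print(name, key, sum(values))    # sum plays the role of reduce()

print("reduce called", calls, "times")   # 3 keys per partition -> 6 calls
```

Because "I" and "Learn" appear in both partitions but the partitions are reduced independently, those keys are counted twice, giving 6 calls rather than 4.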
1. Run all the nodes in your production cluster as virtual machines on your development workstation.
2. Run the hadoop command with the -jt local and the -fs file:/// options.
3. Run the DataNode, TaskTracker, NameNode and JobTracker daemons on a single machine.
4. Run simldooop, the Apache open-source software for simulating Hadoop clusters.
1. The keys given to a reducer aren't in a predictable order, but the values associated with those keys always are.
2. Both the keys and values passed to a reducer always appear in sorted order.
3. Neither keys nor values are in any predictable order.
4. The keys given to a reducer are in sorted order, but the values associated with each key are in no predictable order.