Question: Which of the following provides the highest compression ratio on MapR-FS?
1. lz4
2. lzf
3. gZip
4. zlib
Correct Answer: 4
Explanation: MapR supports three different compression algorithms:
1. lz4 (default)
2. lzf
3. zlib
Compression algorithms can be evaluated for compression ratio (higher compression means less disk space used), compression speed, and decompression speed. Of the three supported algorithms, zlib yields the highest compression ratio, while lz4 and lzf trade ratio for speed (based on single-threaded measurements on a Core 2 Duo at 3 GHz).
Note that compression speed depends on various factors, including:
- block size (the smaller the block size, the faster the compression speed)
- single-thread vs. multi-thread system
- single-core vs. multi-core system
- the type of codec used
Compression is set at the directory level. Any file written by a Hadoop application, whether via the file APIs or over NFS, is compressed according to the settings of the directory it is written to. Sub-directories on which compression has not been explicitly set inherit the compression settings of their parent directory.

If you change a directory's compression settings after writing a file, the file keeps its old compression settings: a file written to an uncompressed directory does not automatically become compressed when you later turn compression on, and vice versa. Further writes to the file use the file's existing compression setting.

Note: Only the owner of a directory can change its compression settings or other attributes; write permission is not sufficient.
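As an illustration of managing these directory-level settings, MapR exposes them through the `hadoop mfs` command. A minimal sketch, assuming a running MapR cluster and a hypothetical directory `/user/hadoopexam` (these commands are not runnable outside a MapR environment):

```shell
# List a directory with MapR-specific attributes (including compression).
hadoop mfs -ls /user/hadoopexam

# Enable compression with the zlib codec on the directory; files written
# afterwards pick up this setting, but existing files keep their old one.
hadoop mfs -setcompression zlib /user/hadoopexam

# Explicitly turn compression off on a sub-directory, overriding the
# setting it would otherwise inherit from its parent.
hadoop mfs -setcompression off /user/hadoopexam/raw
```

Remember that only the directory's owner can run `-setcompression` on it; write permission alone is not sufficient.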
Question: The file "HadoopExam.log" is stored inside "HadoopExam.zip" (. TB in size), and this zip file is transferred to a MapR-FS directory. You are aware that MapR-FS compresses files by default; however, the file's size remains the same in MapR-FS. Why?
1. Compression codec is not configured properly.
2. File size bigger than 1 TB will not be compressed.
3. By default, MapR does not compress files whose filename extensions indicate they are already compressed.
4. Compression is not set on parent directory level.
Correct Answer: 3
Explanation: By default, MapR does not compress files whose filename extensions indicate they are already compressed. The default list of filename extensions is as follows: bz2, gz, lzo, snappy, tgz, tbz2, zip, z, Z, mp3, jpg, jpeg, mpg, mpeg, avi, gif, png.
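The skip-list behavior above can be sketched as a small Python function. This is an illustration of the rule, not MapR's actual implementation; the function name `should_compress` is an assumption for the example:

```python
# Default extensions that MapR-FS treats as "already compressed" and skips.
SKIP_EXTENSIONS = {
    "bz2", "gz", "lzo", "snappy", "tgz", "tbz2", "zip",
    "z", "Z", "mp3", "jpg", "jpeg", "mpg", "mpeg", "avi", "gif", "png",
}

def should_compress(filename: str) -> bool:
    """Return True if MapR-FS would compress this file by default (sketch)."""
    # The default list contains both "z" and "Z", so the extension is
    # matched case-sensitively rather than lowercased first.
    ext = filename.rsplit(".", 1)[-1] if "." in filename else ""
    return ext not in SKIP_EXTENSIONS

print(should_compress("HadoopExam.zip"))  # zip is on the skip list -> False
print(should_compress("HadoopExam.log"))  # log is not on the list -> True
```

This is why the zip file in the question keeps its original size: its extension lands on the skip list, so no further compression is attempted.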
Question: Let's say you have the following output after the map phase of a MapReduce job:
Partition P1: (I, 1) (Learn, 1) (Hadoop, 1)
Partition P2: (I, 1) (Learn, 1) (Spark, 1)
How many times will the MapReduce framework call the reduce method?
1. Twice, one for each partition
2. 4 times, one for each distinct key
3. 6 times, one for each key
4. It is unpredictable
Correct Answer: 3
Explanation: The reduce method is called once for each unique key within each partition, receiving all of that key's values grouped together. Each partition is processed by its own reducer, so keys are not merged across partitions. Partition P1 contains three keys (I, Learn, Hadoop) and partition P2 contains three keys (I, Learn, Spark), so the reduce method is called 3 + 3 = 6 times.
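The shuffle-and-reduce behavior above can be simulated with a toy model in Python. This is an illustration of the grouping rule, not Hadoop code; the partition data is taken from the question:

```python
from collections import defaultdict

# Map output from the question, keyed by partition. Each partition is
# handled by its own reducer in this toy model.
partitions = {
    "P1": [("I", 1), ("Learn", 1), ("Hadoop", 1)],
    "P2": [("I", 1), ("Learn", 1), ("Spark", 1)],
}

calls = 0
for name, pairs in partitions.items():
    grouped = defaultdict(list)          # shuffle: group values by key
    for key, value in pairs:
        grouped[key].append(value)
    for key, values in grouped.items():  # one reduce() invocation per key
        calls += 1
        print(name, key, sum(values))    # sum plays the role of reduce()

print("reduce called", calls, "times")   # 3 keys per partition -> 6 calls
```

Because "I" and "Learn" appear in both partitions but the partitions are reduced independently, those keys are counted twice, giving 6 calls rather than 4.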
1. Run all the nodes in your production cluster as virtual machines on your development workstation.
2. Run the hadoop command with the -jt local and the -fs file:/// options.
3. Run the DataNode, TaskTracker, NameNode and JobTracker daemons on a single machine.
4. Run simldooop, the Apache open-source software for simulating Hadoop clusters.
1. The keys given to a reducer aren't in a predictable order, but the values associated with those keys always are.
2. Both the keys and values passed to a reducer always appear in sorted order.
3. Neither keys nor values are in any predictable order.
4. The keys given to a reducer are in sorted order, but the values associated with each key are in no predictable order.