Question : Which of the below is correct with regard to MapReduce performance and chunk size on MapR-FS?
1. Smaller chunk sizes result in lower performance.
2. Smaller chunk sizes result in higher performance.
3. Larger chunk sizes result in lower performance.
4. Larger chunk sizes always result in lower performance.
Correct Answer : 1 Explanation: Files in MapR-FS are split into chunks (similar to Hadoop blocks) that are 256 MB by default. Any multiple of 65,536 bytes is a valid chunk size, but tuning the size correctly is important. Smaller chunk sizes result in larger numbers of map tasks, which can result in lower performance due to task-scheduling overhead. Larger chunk sizes require more memory to sort the map task output, which can crash the JVM or add significant garbage-collection overhead. MapR can deliver a single stream at upwards of 300 MB per second, making it possible to use larger chunks than in stock Hadoop. Generally, it is wise to set the chunk size between 64 MB and 256 MB.
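The scheduling-overhead point is just arithmetic: for a fixed input size, halving the chunk size doubles the number of map tasks. A quick Python sketch (the 1 TB file size and the `map_task_count` helper are illustrative, not part of MapR):

```python
# Back-of-the-envelope arithmetic: number of map tasks for a fixed input
# size at different chunk sizes. One map task per chunk (ceiling division).
FILE_SIZE = 1 << 40          # 1 TB input file (illustrative)
VALID_MULTIPLE = 65_536      # any multiple of 65,536 bytes is a valid chunk size

def map_task_count(file_size: int, chunk_size: int) -> int:
    if chunk_size % VALID_MULTIPLE != 0:
        raise ValueError("chunk size must be a multiple of 65,536 bytes")
    return -(-file_size // chunk_size)   # ceiling division

for mb in (64, 256):
    chunk = mb * 1024 * 1024
    print(f"{mb} MB chunks -> {map_task_count(FILE_SIZE, chunk)} map tasks")
```

At 64 MB chunks the same file produces four times as many map tasks as at 256 MB, which is exactly the scheduling overhead the explanation warns about.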
Question : You have created a directory in MapR-FS with a chunk size of 256 MB and written a file called "HadoopExam.log" in the directory, which is terabytes in size. While writing a MapReduce job you realize that it is not performing well and wish to change the chunk size from 256 MB to another size. Select the correct option.
1. For better job performance, change the chunk size from 256 MB to 300 MB (maximum possible chunk size).
2. For better job performance, change the chunk size from 256 MB to 64 MB (minimum possible chunk size).
3. You cannot change the chunk size once the file is written.
4. Chunk size does not impact the performance of the MapReduce job.
Correct Answer : 3 Explanation: Chunk size is set at the directory level. Files inherit the chunk size settings of the directory that contains them, as do subdirectories on which chunk size has not been explicitly set. Any files written by a Hadoop application, whether via the file APIs or over NFS, use the chunk size specified by the settings for the directory where the file is written. If you change a directory's chunk size settings after writing a file, the file keeps the old chunk size settings; further writes to the file use the file's existing chunk size.
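The inheritance rule can be sketched as a toy in-memory model. This is not the MapR API; the `Directory` class and its fields are invented purely to illustrate the write-time semantics described above:

```python
# Toy in-memory model (NOT the MapR API) of the inheritance rule: a file
# captures the directory's chunk size at write time, and a later change to
# the directory does not retroactively alter existing files.
class Directory:
    def __init__(self, chunk_size: int):
        self.chunk_size = chunk_size
        self.files = {}

    def write_file(self, name: str) -> None:
        # The file inherits the directory's *current* chunk size.
        self.files[name] = self.chunk_size

logs = Directory(chunk_size=256 * 1024 * 1024)
logs.write_file("HadoopExam.log")      # written with 256 MB chunks
logs.chunk_size = 64 * 1024 * 1024     # change the directory afterwards
logs.write_file("new.log")             # new file picks up 64 MB chunks
print(logs.files["HadoopExam.log"])    # existing file keeps its old chunk size
```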
Question : Select the correct statement regarding MapR-FS compression for files.
1. Compression is applied automatically to uncompressed files unless you turn compression off.
2. Compressed data uses less bandwidth on the network than uncompressed data.
3. Compressed data uses less disk space.
4. Compressed data uses more metadata.
1. 1,2
2. 1,3,4
3. 1,2,3
4. 1,2,4
Correct Answer : 3
Explanation: MapR provides compression for files stored in the cluster. Compression is applied automatically to uncompressed files unless you turn compression off. The advantages of compression are that compressed data uses less bandwidth on the network than uncompressed data, and that compressed data uses less disk space.
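The disk-space advantage is easy to demonstrate with the standard library's generic zlib codec (illustrative only; MapR selects its own on-disk compression and this is not its implementation). Repetitive log-style data compresses especially well:

```python
import zlib

# Compress repetitive log-style data and compare sizes.
raw = b"2024-01-01 INFO request served\n" * 1000
packed = zlib.compress(raw)
ratio = len(packed) / len(raw)
print(f"{len(raw)} bytes -> {len(packed)} bytes ({ratio:.1%} of original)")
```

Less data on disk also means less data moved over the network when the file is read, which is the bandwidth advantage in option 2.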
Question : How does the RecordReader of TextInputFormat handle input file splits and lines that cross split boundaries?
1. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the beginning of the broken line.
2. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReaders of both splits containing the broken line.
3. The input file is split exactly at the line breaks, so each RecordReader will read a series of complete lines.
4. Input file splits may cross line breaks. A line that crosses file splits is ignored.
5. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the end of the broken line.
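How a per-split record reader can handle a line that straddles a split boundary is easy to model in Python. The `read_split` helper below is an invented, simplified sketch of the logic in Hadoop's LineRecordReader, not the real implementation: every split except the first discards the partial line at its front, and every split reads past its end to finish the last line it started.

```python
def read_split(data: bytes, start: int, end: int) -> list:
    """Toy model of one split's record reader over newline-terminated data."""
    pos = start
    if start != 0:
        # Not the first split: discard the partial line at the front; the
        # split containing the *beginning* of that line is responsible for it.
        nl = data.find(b"\n", start)
        pos = len(data) if nl == -1 else nl + 1
    lines = []
    # A line is read if it starts within this split, even if it runs past
    # the boundary into the next split's bytes.
    while pos < len(data) and pos <= end:
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl + 1
        lines.append(data[pos:stop].rstrip(b"\n"))
        pos = stop
    return lines

data = b"aaaa\nbbbb\ncccc\n"
print(read_split(data, 0, 7))    # the line straddling byte 7 goes here
print(read_split(data, 7, 15))   # this split skips its leading partial line
```

Together the two splits yield every line exactly once, even though the boundary at byte 7 falls in the middle of "bbbb".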
Question : For each intermediate key, each reducer task can emit:
1. As many final key-value pairs as desired. There are no restrictions on the types of those key-value pairs (i.e., they can be heterogeneous).
2. As many final key-value pairs as desired, but they must have the same type as the intermediate key-value pairs.
3. As many final key-value pairs as desired, as long as all the keys have the same type and all the values have the same type.
4. One final key-value pair per value associated with the key; no restrictions on the type.
5. One final key-value pair per key; no restrictions on the type.
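For context: in Hadoop's Java API a job declares its output key and value classes up front (`Job.setOutputKeyClass` / `Job.setOutputValueClass`), and a reducer may call `context.write()` any number of times while processing a single intermediate key. A minimal Python analogue of that contract (the function name and data are invented for illustration; this is not the Hadoop API):

```python
# Illustrative analogue of one reduce call: for a single intermediate key,
# the reducer may emit zero or more output pairs, all of the declared
# output types (here: str keys, int values).
def reduce_word_lengths(key: str, values: list):
    yield (key, sum(values))   # one pair: the total
    yield (key, len(values))   # another pair: the count

pairs = list(reduce_word_lengths("hadoop", [2, 5]))
print(pairs)   # two output pairs from a single intermediate key
```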