Question : You want to analyze a large collection of images. You plan to store this data in HDFS and process it with MapReduce, but you also want your data analysts and data scientists to be able to process the data directly from HDFS with an interpreted high-level programming language such as Python. Which format should you use to store this data in HDFS?
Question : Which best describes how TextInputFormat processes input files and line breaks?
1. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the beginning of the broken line.
2. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReaders of both splits containing the broken line.
3. The input file is split exactly at the line breaks, so each RecordReader will read a series of complete lines.
4. Input file splits may cross line breaks. A line that crosses file splits is ignored.
5. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the end of the broken line.
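To reason about the options above, it can help to simulate byte-range splits over a small text buffer. The following is a minimal Python sketch, not Hadoop code: `read_split` and `read_all_splits` are hypothetical names, and the logic is a simplified model of how Hadoop's line-oriented record reader is commonly described (a reader whose split does not start at byte 0 discards everything up to the first newline, and every reader finishes the last line it started even if that line runs past the end of its split).

```python
def read_split(data: bytes, start: int, length: int) -> list[bytes]:
    """Return the lines a single reader would emit for the byte range
    [start, start + length). Simplified model, not the real Hadoop class."""
    end = start + length
    pos = start
    if start != 0:
        # Not the first split: discard up to and including the first newline.
        # That text belongs to a line the previous reader is responsible for.
        newline = data.find(b"\n", pos)
        if newline == -1:
            return []
        pos = newline + 1
    records = []
    # Keep reading while the line *starts* at or before the split end;
    # the line itself may extend into the next split.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        if newline == -1:
            records.append(data[pos:])
            pos = len(data)
        else:
            records.append(data[pos:newline])
            pos = newline + 1
    return records


def read_all_splits(data: bytes, split_size: int) -> list[list[bytes]]:
    """Cut data into fixed-size byte ranges and read each one independently."""
    return [
        read_split(data, start, min(split_size, len(data) - start))
        for start in range(0, len(data), split_size)
    ]


if __name__ == "__main__":
    sample = b"alpha\nbravo\ncharlie\ndelta\n"
    for i, lines in enumerate(read_all_splits(sample, 10)):
        print(i, lines)
```

In this model every line is emitted exactly once regardless of where the split boundaries fall, and a line that straddles a boundary shows up in the output of the reader whose range contains the start of that line.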
Question : For each intermediate key, each reducer task can emit:
1. As many final key-value pairs as desired. There are no restrictions on the types of those key-value pairs (i.e., they can be heterogeneous).
2. As many final key-value pairs as desired, but they must have the same type as the intermediate key-value pairs.
3. As many final key-value pairs as desired, as long as all the keys have the same type and all the values have the same type.
4. One final key-value pair per value associated with the key; no restrictions on the type.
5. One final key-value pair per key; no restrictions on the type.
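As a rough illustration of what a reduce function's signature constrains, here is a small Python sketch. It is only an analogy: in Hadoop's Java API the `Reducer` class is parameterized by four types (`KEYIN`, `VALUEIN`, `KEYOUT`, `VALUEOUT`), whereas Python type hints are not enforced at runtime. The function name `reduce_stats` and the output key suffixes are made up for this example.

```python
from typing import Iterable, Iterator, Tuple


def reduce_stats(key: str, values: Iterable[int]) -> Iterator[Tuple[str, float]]:
    """Hypothetical reducer: intermediate pairs are (str, int),
    final pairs are (str, float). It may yield zero, one, or many pairs."""
    items = list(values)
    if not items:
        # Emitting nothing for a key is allowed.
        return
    total = float(sum(items))
    # Several final pairs for one intermediate key, all sharing one
    # output key type (str) and one output value type (float).
    yield (key + ":total", total)
    yield (key + ":mean", total / len(items))


if __name__ == "__main__":
    print(list(reduce_stats("k", [1, 2, 3])))
    print(list(reduce_stats("k", [])))
```

The sketch shows that the output types declared for the reducer need not match the intermediate types, and that the number of emitted pairs per key is up to the reducer's logic.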