1. Map files are stored on the NameNode and capture the metadata for all blocks on a particular rack. This is how Hadoop is "rack aware".
2. Map files are the files that show how the data is distributed in the Hadoop cluster.
3. Map files are generated by MapReduce after the reduce step. They show the task distribution during job execution.
4. Map files are sorted sequence files that also have an index. The index allows fast data lookup.
The Hadoop MapFile is a variation of the SequenceFile, and it is central to the map-side join design pattern.
A MapFile is a sorted SequenceFile with an index to permit lookups by key. A MapFile can be thought of as a persistent form of java.util.Map (although it doesn't implement this interface), one that is able to grow beyond the size of a Map kept in memory.
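The sorted-data-plus-sparse-index idea above can be sketched in plain Python. This is only an in-memory illustration of why the index makes lookups fast, not Hadoop's actual MapFile API (the real MapFile stores its data and index parts as SequenceFiles on HDFS); the function names here are made up for the example.

```python
import bisect

# Conceptual sketch of a MapFile: a sorted list of (key, value) records
# plus a sparse index recording every Nth key and its position.
INDEX_INTERVAL = 128  # Hadoop's io.map.index.interval also defaults to 128

def build_mapfile(records):
    """Records must already be sorted by key, as MapFile.Writer enforces."""
    data = list(records)
    index = [(data[i][0], i) for i in range(0, len(data), INDEX_INTERVAL)]
    return data, index

def lookup(data, index, key):
    """Binary-search the sparse index, then scan at most one interval."""
    keys = [k for k, _ in index]
    i = bisect.bisect_right(keys, key) - 1
    if i < 0:
        return None  # key sorts before the first indexed key
    start = index[i][1]
    for k, v in data[start:start + INDEX_INTERVAL]:
        if k == key:
            return v
        if k > key:
            break  # data is sorted, so the key cannot appear later
    return None

data, index = build_mapfile((f"key{i:05d}", i * i) for i in range(1000))
print(lookup(data, index, "key00500"))  # -> 250000
```

Because only every 128th key is held in the index, the index stays small enough to keep in memory while the data file itself can be arbitrarily large, which is exactly the trade-off the real MapFile makes.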
Refer HadoopExam.com Recorded Training Module : 7
Question :
Which of the following utilities allows you to create and run MapReduce jobs with any executable or script as the mapper and/or the reducer?
The Streaming API allows developers to write Mappers and Reducers in any language they wish, as long as the language can read from standard input and write to standard output.
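The stdin/stdout contract can be sketched as an ordinary word-count mapper and reducer pair. The helper names below are illustrative, not part of any API; in a real job these would be standalone scripts passed to the streaming jar via its -mapper and -reducer options.

```python
from itertools import groupby

# Hadoop Streaming contract: the mapper reads raw lines from stdin and
# emits tab-separated "key\tvalue" lines; the framework sorts by key and
# feeds the sorted stream to the reducer, which reads the same format.

def mapper(lines):
    """Word-count mapper: emit one ("word", 1) pair per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Word-count reducer: sum the counts for each key."""
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(v) for _, v in group)}"

if __name__ == "__main__":
    # sorted() here simulates the shuffle/sort Hadoop performs in between.
    for out in reducer(sorted(mapper(["the quick fox", "the lazy dog"]))):
        print(out)
```

The key point is that neither function imports anything Hadoop-specific: any program obeying this line protocol, in any language, can serve as the mapper or reducer.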
Question :
You need a distributed, scalable datastore that allows you random, real-time read/write access to hundreds of terabytes of data. Which of the following would you use?
Apache HBase: HBase is the Hadoop database, a NoSQL datastore.
- Can store massive amounts of data: gigabytes, terabytes, and even petabytes of data in a table
- Scales to provide very high write throughput: hundreds of thousands of inserts per second
- Copes well with sparse data: tables can have many thousands of columns, even if most columns are empty for any given row
- Has a very constrained access model: insert a row, retrieve a row, do a full or partial table scan
- Only one column (the row key) is indexed
- Does not support multi-row transactions
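The constrained access model above can be illustrated with a small sketch: rows kept sorted by row key, the row key as the only index, and no operations beyond put, get-by-key, and a (partial) scan. This is a conceptual toy, not the HBase client API; the class and method names are made up for the example.

```python
import bisect

class RowKeyStore:
    """Toy model of HBase's access model: row-key-only indexing."""

    def __init__(self):
        self._keys = []   # row keys kept in sorted order
        self._rows = {}   # row key -> {column: value}; columns are sparse

    def put(self, row_key, column, value):
        """Insert or update one cell of a row."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Retrieve a whole row by its key, or None if absent."""
        return self._rows.get(row_key)

    def scan(self, start=None, stop=None):
        """Partial table scan: all rows with start <= key < stop."""
        lo = 0 if start is None else bisect.bisect_left(self._keys, start)
        hi = len(self._keys) if stop is None else bisect.bisect_left(self._keys, stop)
        for k in self._keys[lo:hi]:
            yield k, self._rows[k]

store = RowKeyStore()
store.put("row2", "cf:a", "x")
store.put("row1", "cf:b", "y")
store.put("row3", "cf:a", "z")
print([k for k, _ in store.scan("row1", "row3")])  # -> ['row1', 'row2']
```

Note what is deliberately missing: there is no way to look a row up by a column value (only the row key is indexed), and each put touches a single row, mirroring the lack of multi-row transactions.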
Refer HadoopExam.com Recorded Training Module : 18