Question : What are map files and why are they important?
1. Map files are stored on the NameNode and capture the metadata for all blocks on a particular rack. This is how Hadoop is "rack aware".
2. Map files are the files that show how the data is distributed in the Hadoop cluster.
4. Map files are sorted sequence files that also have an index. The index allows fast data lookup.
Correct Answer : 4
The Hadoop MapFile is a variation of the SequenceFile. It is important for the map-side join design pattern.
A MapFile is a sorted SequenceFile with an index that permits lookups by key. A MapFile can be thought of as a persistent form of java.util.Map (although it doesn't implement this interface) that is able to grow beyond the size of a Map kept in memory.
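To make the index-based lookup concrete, here is a minimal sketch using the Hadoop 2.x MapFile API; the path demo.map and the key/value contents are illustrative, not part of the question.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class MapFileDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // A MapFile is really a directory holding two SequenceFiles: "data" and "index".
        Path dir = new Path("demo.map");

        // Keys must be appended in sorted order; the writer throws IOException otherwise.
        try (MapFile.Writer writer = new MapFile.Writer(conf, dir,
                MapFile.Writer.keyClass(Text.class),
                MapFile.Writer.valueClass(IntWritable.class))) {
            writer.append(new Text("hadoop"), new IntWritable(1));
            writer.append(new Text("yarn"), new IntWritable(2));
        }

        // The in-memory index lets the reader seek near a key, then scan the data file.
        try (MapFile.Reader reader = new MapFile.Reader(dir, conf)) {
            IntWritable value = new IntWritable();
            reader.get(new Text("yarn"), value);
            System.out.println("yarn -> " + value); // prints: yarn -> 2
        }
    }
}

In a map-side join, the smaller, sorted dataset can be stored as a MapFile so each mapper can look up matching keys directly, without a reduce phase.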
Refer to HadoopExam.com Recorded Training, Module 7.
Question : Assume you have the following files in an HDFS directory called merge:

Test1.txt: hadoopexam.com Hadoop Training 1
Test2.txt: www.hadoopexam.com Hadoop YARN Training
Test3.txt: http://hadoopexam.com Amazon WebService Training

Now you run the following command:

hadoop fs -getmerge -nl merge/ output2.txt

What is the content of the output2.txt file?
1. hadoopexam.com Hadoop Training 1
   www.hadoopexam.com Hadoop YARN Training
   http://hadoopexam.com Amazon WebService Training
Correct Answer : 1
getmerge usage: hadoop fs -getmerge [-nl] <src> <localdst>. Takes a source directory and a destination file as input and concatenates the files in src into the destination local file. The -nl option (addnl in older releases) adds a newline character at the end of each file, so output2.txt contains each file's content on its own line.
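The same merge can also be done programmatically. Below is a minimal sketch assuming the Hadoop 2.x FileUtil.copyMerge helper (this method was removed in Hadoop 3); the paths match the question.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class GetMergeDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);
        FileSystem local = FileSystem.getLocal(conf);
        // Concatenate every file under merge/ into one local file,
        // appending "\n" after each source file (the -nl behavior).
        FileUtil.copyMerge(hdfs, new Path("merge"),
                local, new Path("output2.txt"),
                false /* keep the source files */, conf, "\n");
    }
}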
Question : In the regular WordCount MapReduce example, you have the following driver code.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new WordCount(), args);
        System.exit(res);
    }

    public int run(String[] args) throws Exception {
        Path inputPath = new Path("shakespeare1");
        Path outputPath = new Path("" + System.currentTimeMillis());

        Configuration conf = getConf();
        Job job = new Job(conf, this.getClass().toString());

        FileInputFormat.setInputPaths(job, inputPath);
        FileOutputFormat.setOutputPath(job, outputPath);

        job.setJobName("Word Count");
        job.setJarByClass(WordCount.class);
        // WordMapper and SumReducer are provided elsewhere in wc.jar.
        job.setMapperClass(WordMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setNumReduceTasks(2); // exactly two reduce tasks

        return job.waitForCompletion(true) ? 0 : 1;
    }
}
Now you run the command below on a single-node cluster, where wc.jar is the jar file containing the driver, mapper, and reducer classes.

hadoop jar wc.jar WordCount
Select the correct statement from below.
1. It will run 2 mappers and 2 reducers.
2. It will run 2 reducers, but the number of mappers is not known.
4. There is not enough information to tell the number of reducers.
Correct Answer : 2
As the driver code shows, job.setNumReduceTasks(2) fixes the number of reduce tasks, so exactly two reducers will run. The number of map tasks, by contrast, is not set in the driver: it equals the number of input splits computed from the "shakespeare1" input at submission time, which cannot be determined from the information given.
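The split-driven mapper count can be inspected before submission. Below is a minimal sketch assuming the new-API TextInputFormat (the default input format for a job like this); the class SplitCount and helper report are hypothetical names, not part of the question.

import java.io.IOException;
import java.util.List;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class SplitCount {
    // Hypothetical helper: given a fully configured Job (such as the WordCount
    // driver's job before submission), report how many tasks would launch.
    static void report(Job job) throws IOException {
        // One map task is created per input split.
        List<InputSplit> splits = new TextInputFormat().getSplits(job);
        System.out.println("map tasks = " + splits.size());
        // The reduce task count is whatever the driver set explicitly (2 here).
        System.out.println("reduce tasks = " + job.getNumReduceTasks());
    }
}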