Question : You are running a MapReduce job and, inside the Mapper, you want to get the name of the file that is currently being processed. What is the correct code snippet to fetch the file name in the Mapper code?
How can I get the name of the input file within a mapper? I have multiple input files stored in the input directory, each mapper may read a different file, and I need to know which file the mapper has read.
First, get the input split. With the newer mapreduce API this is done as follows:
context.getInputSplit();
To get the file path and the file name, you first need to cast the result to FileSplit (org.apache.hadoop.mapreduce.lib.input.FileSplit). The cast is valid when the job's input format produces file splits, as TextInputFormat does. The input file path is then:
Path filePath = ((FileSplit) context.getInputSplit()).getPath();
String filePathString = ((FileSplit) context.getInputSplit()).getPath().toString();
Similarly, to get only the file name, call getName() on the path:
String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
Question : In a MapReduce word count job, you know that your file contains at most three different words, and after the job completes you want one output file to have been created by each reducer. You have therefore written a custom partitioner. Which is the correct code snippet for this requirement? 1. A 2. B 3.
Correct Answer : 3
By default, Hadoop applies its own internal logic to the keys (the HashPartitioner, which hashes each key) and uses the result to decide which reducer receives each key. If you want to write a custom partitioner, you have to override that default behaviour with your own logic or algorithm. Unless you know exactly how your keys will vary, this logic won't be generic; you have to design it around the key values you expect.
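To make the difference concrete, here is a minimal sketch in plain Java (no Hadoop dependency) that compares hash-based routing with a hand-written rule for a three-word vocabulary. The class name, the method names, and the three words are hypothetical and chosen only for illustration. In a real job the same rule would go into a subclass of org.apache.hadoop.mapreduce.Partitioner that overrides getPartition(), registered with job.setPartitionerClass(...) together with job.setNumReduceTasks(3).

```java
import java.util.Arrays;
import java.util.List;

class WordPartitionSketch {
    // Hypothetical vocabulary: the three words assumed to occur in the input.
    static final List<String> WORDS = Arrays.asList("hadoop", "hive", "pig");

    // Same formula the default HashPartitioner uses: non-negative hash modulo
    // the number of reducers. Two different words may land on the same reducer.
    static int hashPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Custom rule: each known word gets its own partition number.
    // Unknown words fall back to partition 0 (an assumption of this sketch).
    static int customPartition(String key, int numReduceTasks) {
        int index = WORDS.indexOf(key);
        if (index < 0) {
            index = 0;
        }
        // The modulo keeps the result valid if fewer reducers are configured.
        return index % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 3;
        for (String word : WORDS) {
            System.out.println(word
                    + " hash=" + hashPartition(word, reducers)
                    + " custom=" + customPartition(word, reducers));
        }
    }
}
```

With three reducers, the custom rule guarantees that each of the three words is handled by a different reducer, so each reducer writes its own output file. The hash-based rule gives no such guarantee.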
Question : The input file size is given (in KB) and the block size is given (in MB). What is the size of the intermediate data?
Correct Answer : As no other information is given, assume that all the words in the file will be emitted; the intermediate data size should then be 47KB.