Question : In a MapReduce job, you want each of your input files processed by a single map task. How do you configure the job so that a single map task processes each input file, regardless of how many blocks the input file occupies?
1. Increase the parameter that controls the minimum split size in the job configuration.
2. Write a custom MapRunner that iterates over all key-value pairs in the entire file.
3.
4. Write a custom FileInputFormat and override the isSplitable method to always return false.
Correct Answer : 4. Explanation: When isSplitable returns false, only a single mapper processes the entire file; that mapper can emit any number of key-value pairs. Subclasses of FileInputFormat can override the isSplitable(FileSystem, Path) method to ensure that input files are not split up and are processed as a whole by Mappers. The API documentation describes the method as follows: "Is the given filename splitable? Usually true, but if the file is stream compressed, it will not be. FileInputFormat implementations can override this and return false to ensure that individual input files are never split up, so that Mappers process entire files."
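Below is a minimal sketch of such an override, assuming the newer org.apache.hadoop.mapreduce API (where the signature is isSplitable(JobContext, Path) rather than the older isSplitable(FileSystem, Path) mentioned above). The class name WholeFileTextInputFormat is illustrative, not part of the question.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

public class WholeFileTextInputFormat extends FileInputFormat<LongWritable, Text> {

    // Returning false guarantees a file is never split across map tasks,
    // regardless of how many HDFS blocks it occupies.
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;
    }

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        // Reuse the standard line record reader; the single split now
        // covers the whole file.
        return new LineRecordReader();
    }
}

With this input format set on the job, each input file yields exactly one split and therefore exactly one map task, no matter how many blocks the file spans.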
Question : Given a directory of files with the following structure: line number, tab character, string. Example:
1	AvilakaurKohni
2	Decemberandaugust
3	
You want to send each line as one record to your Mapper. Which InputFormat would you use to complete the line: setInputFormat(________.class);
Correct Answer : KeyValueTextInputFormat. Explanation: TextInputFormat's keys, being simply the offsets within the file, are not normally very useful. It is common for each line in a file to be a key-value pair, separated by a delimiter such as a tab character; for example, this is the output produced by TextOutputFormat, Hadoop's default output format. To interpret such files correctly, KeyValueTextInputFormat is appropriate.
You can specify the separator via the mapreduce.input.keyvaluelinerecordreader.key.value.separator property in the new API (or key.value.separator.in.input.line in the old API); it is a tab character by default. Consider the following input file, where the whitespace shown represents a horizontal tab character:
line1	On the top of the Crumpetty Tree
line2	The Quangle Wangle sat,
line3	But his face you could not see,
line4	On account of his Beaver Hat.
As in the TextInputFormat case, the input is a single split comprising four records, although this time the keys are the Text sequences before the tab in each line:
(line1, On the top of the Crumpetty Tree)
(line2, The Quangle Wangle sat,)
(line3, But his face you could not see,)
(line4, On account of his Beaver Hat.)
SequenceFileInputFormat: To use data from sequence files as the input to MapReduce, you use SequenceFileInputFormat. The keys and values are determined by the sequence file, and you need to make sure that your map input types correspond.
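As a concrete illustration, here is a minimal driver sketch assuming the newer mapreduce API (where the question's setInputFormat(...) call from the old mapred API becomes setInputFormatClass(...)). The class name KeyValueInputDriver and the identity-mapper setup are illustrative, not from the question.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class KeyValueInputDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Tab is already the default separator; set explicitly here only to
        // make the property named in the explanation above concrete.
        conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", "\t");

        Job job = Job.getInstance(conf, "key-value text input example");
        job.setJarByClass(KeyValueInputDriver.class);

        // Each line reaches the Mapper as (Text key before the tab,
        // Text value after the tab).
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // Identity mapper and no reducers: records are written back out
        // unchanged, which is enough to observe the (key, value) pairs.
        job.setMapperClass(Mapper.class);
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}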
Question : What is a SequenceFile?
1. A SequenceFile contains a binary encoding of an arbitrary number of homogeneous Writable objects.
2. A SequenceFile contains a binary encoding of an arbitrary number of heterogeneous Writable objects.
3.
4. A SequenceFile contains a binary encoding of an arbitrary number of key-value pairs. Each key must be the same type. Each value must be the same type.
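Option 4 is the accurate description: a SequenceFile stores binary key-value pairs, and the key class and value class are fixed when the file is created. The sketch below, assuming the option-based SequenceFile.Writer API from Hadoop 2.x, illustrates this; the path /tmp/example.seq and the sample records are illustrative only.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileWriteDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("/tmp/example.seq");

        SequenceFile.Writer writer = null;
        try {
            // The key and value classes are declared once for the whole file:
            // every key must be an IntWritable, every value a Text.
            writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(path),
                    SequenceFile.Writer.keyClass(IntWritable.class),
                    SequenceFile.Writer.valueClass(Text.class));

            // Append binary-encoded key-value pairs.
            writer.append(new IntWritable(1), new Text("AvilakaurKohni"));
            writer.append(new IntWritable(2), new Text("Decemberandaugust"));
        } finally {
            if (writer != null) {
                writer.close();
            }
        }
    }
}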