Correct Answer : Explanation: Adjusting the InputSplit size to be smaller or larger than the block size changes the number of mappers launched for the job, because one mapper is instantiated per input split.
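As a minimal sketch of how this is typically tuned, assuming the standard org.apache.hadoop.mapreduce.lib.input.FileInputFormat API (the job name and the 32 MB / 128 MB figures are illustrative, not from the source):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeDemo {
    public static void main(String[] args) throws Exception {
        // The effective split size is max(minSize, min(maxSize, blockSize)),
        // so moving these bounds changes how many mappers are launched.
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "split-size-demo"); // hypothetical job name

        // More mappers: cap splits at 32 MB, below a 64 MB block size.
        FileInputFormat.setMaxInputSplitSize(job, 32L * 1024 * 1024);

        // Fewer mappers: instead, force splits of at least 128 MB.
        // FileInputFormat.setMinInputSplitSize(job, 128L * 1024 * 1024);
    }
}
```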
Correct Answer : Explanation: A block is the physical representation of the data; a split is the logical representation of the data present in a block.
Both block size and split size can be changed via configuration properties.
A map reads data from a block through splits, i.e. the split acts as a broker between the block and the mapper.
Consider two blocks:
Block 1: aa bb cc dd ee ff gg hh ii jj
Block 2: ww ee yy uu oo ii oo pp kk ll nn
Now the map reads block 1 from aa to jj but does not know how to read block 2; that is, a block does not know how to process a different block of information. Here the split comes in: it forms a logical grouping of block 1 and block 2 as a single unit, then forms the offset (key) and line (value) pairs using the InputFormat and RecordReader, and hands them to the map for further processing.
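A minimal sketch of the mapper side of this, assuming the default TextInputFormat, whose RecordReader produces the byte offset as the key and the line as the value (the class name and output types are illustrative):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// With TextInputFormat, the RecordReader turns each line of the split
// into an (offset, line) pair: the key is the byte offset of the line
// within the file, the value is the line's text.
public class LineMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Emit each line keyed by its text, with its offset as the value
        // (purely illustrative).
        context.write(line, offset);
    }
}
```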
If your resources are limited and you want to limit the number of maps, you can increase the split size. For example: if we have 640 MB in 10 blocks, i.e. each block is 64 MB, and resources are limited, you can set the split size to 128 MB; then a logical grouping of 128 MB is formed and only 5 maps will be executed, each over 128 MB. The last record of an input split may be incomplete, as may be the first record of an input split; processing whole records is the responsibility of the RecordReader.
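To make the 640 MB / 128 MB arithmetic concrete, a hedged fragment using the standard Hadoop 2 property name (the class name is a placeholder):

```java
import org.apache.hadoop.conf.Configuration;

public class FiveMapDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // 10 blocks x 64 MB = 640 MB of input. Raising the minimum split
        // size to 128 MB makes each split logically group two 64 MB blocks:
        //   640 MB / 128 MB = 5 splits => only 5 map tasks run.
        conf.setLong("mapreduce.input.fileinputformat.split.minsize",
                     128L * 1024 * 1024);
    }
}
```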
If we disable splitting (i.e. isSplitable() returns false), the whole file forms one input split and is processed by one map, which takes much more time when the file is big.
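A minimal sketch of disabling splitting by subclassing TextInputFormat and overriding its standard isSplitable() hook (the class name is hypothetical):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// With isSplitable() returning false, each input file becomes a single
// split processed by a single mapper, regardless of how many blocks it
// spans -- convenient for unsplittable data, slow for big files.
public class NonSplittableTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;
    }
}
```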
Question : What is the purpose of "CombineFileSplit"?
Correct Answer : Explanation: A CombineFileSplit is a sub-collection of input files. Unlike FileSplit, the CombineFileSplit class does not represent a split of a single file; it represents a split of the input files into smaller sets. A split may contain blocks from different files, but all the blocks in the same split are probably local to some rack. CombineFileSplit can be used to implement RecordReaders, with one record read per file.
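As a hedged usage sketch: rather than handling CombineFileSplit by hand, Hadoop ships CombineTextInputFormat, a CombineFileInputFormat subclass that packs many small files into few splits (the job name and 128 MB cap are illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;

public class SmallFilesDemo {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "combine-demo"); // hypothetical name

        // CombineTextInputFormat builds CombineFileSplits, each packing
        // blocks from many (small) files, preferring rack-local blocks.
        job.setInputFormatClass(CombineTextInputFormat.class);

        // Cap each combined split at 128 MB (illustrative figure).
        CombineTextInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
    }
}
```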
1. Iterate over the DistributedCache instance in the Mapper and add all the cached file paths to an array.
2. There is a direct method available, DistributedCache.getAllFilePath()
3. …
4. All of the above
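A hedged sketch of the cached-file lookup option 1 describes, using the classic DistributedCache API (deprecated in newer Hadoop releases in favor of Job#addCacheFile and context.getCacheFiles(); the mapper class name is a placeholder):

```java
import java.io.IOException;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheAwareMapper extends Mapper<LongWritable, Text, Text, Text> {
    private Path[] cachedFiles;

    @Override
    protected void setup(Context context) throws IOException {
        // Collect the local paths of all cached files into an array,
        // as option 1 describes.
        cachedFiles = DistributedCache.getLocalCacheFiles(
                context.getConfiguration());
    }
}
```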
Question : Select the correct statement while reading/writing the data in RDBMS using MapReduce
1. In order to use DBInputFormat you need to write a class that deserializes the columns from the database record into individual data fields to work with
2. The DBOutputFormat writes to the database by generating a set of INSERT statements in each reducer
3. …
4. If you want to export a very large volume of data, you may be better off generating the INSERT statements into a text file, and then using a bulk data import tool provided by your database to do the database import
5. All of the above
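A minimal sketch of the deserialization class statement 1 refers to, assuming the standard org.apache.hadoop.mapreduce.lib.db.DBWritable interface and a hypothetical two-column (id, name) table:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

// Maps one database record to individual data fields. DBInputFormat calls
// readFields(ResultSet) to deserialize a row; DBOutputFormat calls
// write(PreparedStatement) when generating its INSERT statements.
public class UserRecord implements Writable, DBWritable {
    private int id;
    private String name;

    @Override
    public void readFields(ResultSet rs) throws SQLException {
        id = rs.getInt(1);       // column 1: id (hypothetical schema)
        name = rs.getString(2);  // column 2: name
    }

    @Override
    public void write(PreparedStatement stmt) throws SQLException {
        stmt.setInt(1, id);
        stmt.setString(2, name);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        id = in.readInt();
        name = in.readUTF();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeUTF(name);
    }
}
```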