1. The default input format is XML. A developer can specify another input format if XML is not the correct one.
2. There is no default input format. The input format must always be specified.
4. The default input format is TextInputFormat, with the byte offset as the key and the entire line as the value.
Explanation: Hadoop supports a wide range of input formats. The default is TextInputFormat, which is the simplest way to read data as lines of text: each record's key is the byte offset of the line in the file and its value is the line itself.
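To make the key/value shape concrete, here is a minimal sketch in plain Java that does not use Hadoop at all. It only mimics how a text file is turned into (byte offset, line) records. The class name TextRecordSketch and the method toRecords are hypothetical names chosen for this illustration, and real Hadoop also deals with input splits and "\r\n" line endings, which this sketch ignores.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: not part of the Hadoop API.
class TextRecordSketch {

    // Splits the input on '\n' and returns byte-offset -> line pairs,
    // in file order. The offset is where the line starts, in bytes.
    static Map<Long, String> toRecords(String data) {
        Map<Long, String> records = new LinkedHashMap<>();
        byte[] bytes = data.getBytes(StandardCharsets.UTF_8);
        int start = 0;
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == '\n') {
                // Value excludes the line terminator.
                records.put((long) start,
                        new String(bytes, start, i - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        // Last line without a trailing newline.
        if (start < bytes.length) {
            records.put((long) start,
                    new String(bytes, start, bytes.length - start, StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) {
        toRecords("alpha\nbeta\ngamma\n")
                .forEach((offset, line) -> System.out.println(offset + "\t" + line));
    }
}
```

A mapper reading this input with the default format would receive one call per line, with the offset as the key and the line as the value.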
Question : How can you override the default input format?
1. To override the default input format, the Hadoop administrator has to change the default settings in a config file.
2. To override the default input format, a developer has to set the new input format on the job config before submitting the job to a cluster.
4. None of these answers is correct.
Explanation: A developer can always set a different input format on the job configuration (e.g., sequence files, binary files, compressed formats), for instance with `job.setInputFormatClass(SequenceFileInputFormat.class)` in the newer MapReduce API.
Question : What are the common problems with map-side join?
1. The most common problem with map-side joins is introducing a high level of code complexity. This complexity has several downsides: increased risk of bugs and performance degradation. Developers are cautioned to rarely use map-side joins.
2. The most common problem with map-side joins is a lack of available map slots, since map-side joins require a lot of mappers.
4. The most common problem with map-side joins is not clearly specifying the primary index in the join. This can lead to very slow performance on large datasets.
Explanation: A map-side join holds the data being joined in memory, indexed by the join key. As a result, the size of that data is limited by the available memory. If it exceeds the available memory, an out-of-memory error will occur.
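The memory constraint is easier to see in code. The following is a minimal sketch in plain Java, not Hadoop code, assuming the common pattern in which the smaller dataset is loaded into a hash map and the larger one is streamed past it record by record. The class name MapSideJoinSketch, the method join, and the sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not part of the Hadoop API.
class MapSideJoinSketch {

    // smallSide: join key -> value, held entirely in memory.
    // largeSide: records of {key, value}, processed one at a time.
    // Returns "key,largeValue,smallValue" for every matching key.
    static List<String> join(Map<String, String> smallSide, List<String[]> largeSide) {
        List<String> joined = new ArrayList<>();
        for (String[] record : largeSide) {
            String match = smallSide.get(record[0]); // in-memory lookup
            if (match != null) {
                joined.add(record[0] + "," + record[1] + "," + match);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // The whole small side must fit in the heap; this is the limit
        // described above.
        Map<String, String> departments = new HashMap<>();
        departments.put("d1", "Sales");
        departments.put("d2", "Support");

        List<String[]> employees = Arrays.asList(
                new String[] {"d1", "Ann"},
                new String[] {"d2", "Bob"},
                new String[] {"d9", "Cy"}); // no matching department

        join(departments, employees).forEach(System.out::println);
    }
}
```

Only the hash map grows with the size of the smaller dataset, so that dataset is what has to fit in each mapper's memory. When it does not, a reduce-side join is the usual fallback.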