Question : What are the common problems with map-side join?
1. The most common problem with map-side joins is that they introduce a high level of code complexity. This complexity has several downsides: increased risk of bugs and performance degradation. Developers are cautioned to use map-side joins rarely.
2. The most common problem with map-side joins is a lack of available map slots, since map-side joins require a lot of mappers.
4. The most common problem with map-side joins is not clearly specifying the primary index in the join. This can lead to very slow performance on large datasets.
Explanation: A map-side join holds the data to be joined in memory, keyed by the join key. As a result, the size of that data is limited by the memory available to each map task. If the data exceeds the available memory, an out-of-memory error occurs.
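The memory limit described above can be seen in a minimal sketch of the idea in plain Java. This is not Hadoop code: the class name `MapSideJoinSketch`, its methods, and the `key,value` record format are assumptions made for illustration. The small dataset is loaded into a `HashMap`, which is the part that must fit in memory, while the large dataset is streamed one record at a time, as a mapper would see it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the map-side join idea; not Hadoop code.
class MapSideJoinSketch {

    // Builds the in-memory lookup table from the small dataset.
    // Assumes every record is well formed: "key,value".
    // This table is what has to fit in the memory of each map task.
    static Map<String, String> loadSmallTable(List<String> smallRecords) {
        Map<String, String> lookup = new HashMap<>();
        for (String record : smallRecords) {
            String[] parts = record.split(",", 2);
            lookup.put(parts[0], parts[1]);
        }
        return lookup;
    }

    // Streams the large dataset record by record and emits a joined
    // record for every key that is present in the lookup table.
    static List<String> join(Map<String, String> lookup, List<String> largeRecords) {
        List<String> joined = new ArrayList<>();
        for (String record : largeRecords) {
            String[] parts = record.split(",", 2);
            String match = lookup.get(parts[0]);
            if (match != null) { // inner join: unmatched records are dropped
                joined.add(parts[0] + "," + parts[1] + "," + match);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, String> lookup = loadSmallTable(List.of("1,Books", "2,Music"));
        List<String> orders = List.of("1,order-a", "3,order-b", "2,order-c");
        System.out.println(join(lookup, orders));
    }
}
```

Only the lookup table grows with the size of the small dataset; the large dataset never needs to be held in memory at once.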
Question : Will settings made through the Java API override values in configuration files?
1. No. The settings in the configuration files take precedence.
2. Yes. The settings made through the Java API take precedence.
4. Only global configuration settings are captured in configuration files on the namenode. Only a very few job parameters can be set using the Java API.
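Explanation: The developer has control over the settings used for a job on a Hadoop cluster. Values set through the Java API override the values loaded from configuration files, unless an administrator has marked a property as final in the cluster configuration.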
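The precedence rule can be modelled in a few lines of plain Java. This is only a sketch of the lookup order, not Hadoop's own `Configuration` class: the class name `ConfigPrecedenceSketch` and the `resolve` method are hypothetical, and `mapreduce.job.reduces` is used purely as an example key.

```java
import java.util.Properties;

// Hypothetical model of "programmatic settings win over file settings";
// it does not use or reproduce Hadoop's Configuration class.
class ConfigPrecedenceSketch {

    // Returns the programmatic value when one was set,
    // otherwise falls back to the value loaded from the file.
    static String resolve(Properties fromFile, Properties fromApi, String key) {
        return fromApi.getProperty(key, fromFile.getProperty(key));
    }

    public static void main(String[] args) {
        Properties fromFile = new Properties();
        fromFile.setProperty("mapreduce.job.reduces", "1"); // value from a config file

        Properties fromApi = new Properties();
        fromApi.setProperty("mapreduce.job.reduces", "4");  // value set in code

        System.out.println(resolve(fromFile, fromApi, "mapreduce.job.reduces"));
    }
}
```

A key that is set only in the file is still returned unchanged, so the programmatic layer needs to contain only the values a job wants to change.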
Question : What is the distributed cache?
1. The distributed cache is a special component on the namenode that caches frequently used data for faster client response. It is used during the reduce step.
2. The distributed cache is a special component on the datanode that caches frequently used data for faster client response. It is used during the map step.
4. The distributed cache is a component that allows developers to deploy jars for Map-Reduce processing.
Explanation: The distributed cache is Hadoop's answer to the problem of deploying third-party libraries. It copies the files a job needs, such as library jars, to the nodes that run the job's tasks, so the libraries are available on all datanodes.