Question : You have the following data in a Hive table. The first field of each row is a tag naming the table the row belongs to, and you want to join the data based on that tag.

colortable,1,pink,300
colortable,2,red,500
colortable,3,yellow,300
flowertable,1,rose
flowertable,2,amaryllis
flowertable,3,sunflower
flowertable,4,lily
flowertable,5,cosmos

The output should match that of the Hive query below.

Select color,flower from colortable join flowertable ON (colortable.id=flowertable.id)

Select the correct MapReduce program which produces the same output as the above query.
Correct Answer :

Explanation: The Mapper produces the output below; the width field is ignored.

color colortable,1,pink
color colortable,2,red
color colortable,3,yellow
color flowertable,1,rose
color flowertable,2,amaryllis
color flowertable,3,sunflower
color flowertable,4,lily
color flowertable,5,cosmos

All of these records reach a single reducer, so we can build two separate hash maps there:

color {(1,pink),(2,red),(3,yellow)}
flower {(1,rose),(2,amaryllis),(3,sunflower),(4,lily),(5,cosmos)}

We then iterate over the color ids and check whether the flower map has a flower with the same id. If it does, the reducer emits:

pink rose
red amaryllis
yellow sunflower
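The reducer logic described above can be sketched in plain Java. This is not the answer option from the question and it does not use the Hadoop API; it only simulates the single reduce call. The class name TagJoinSketch and the method reduceJoin are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of the reduce step (assumption: no Hadoop classes involved).
class TagJoinSketch {

    // Splits the tagged records into two maps keyed by id, then emits
    // "color flower" for every color id that also appears in the flower map.
    static List<String> reduceJoin(List<String> records) {
        Map<String, String> colors = new LinkedHashMap<>();
        Map<String, String> flowers = new LinkedHashMap<>();
        for (String record : records) {
            String[] fields = record.split(",");
            if (fields.length < 3) {
                continue; // skip malformed rows
            }
            if (fields[0].equals("colortable")) {
                colors.put(fields[1], fields[2]);
            } else if (fields[0].equals("flowertable")) {
                flowers.put(fields[1], fields[2]);
            }
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, String> color : colors.entrySet()) {
            String flower = flowers.get(color.getKey());
            if (flower != null) {
                output.add(color.getValue() + " " + flower);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
            "colortable,1,pink", "colortable,2,red", "colortable,3,yellow",
            "flowertable,1,rose", "flowertable,2,amaryllis", "flowertable,3,sunflower",
            "flowertable,4,lily", "flowertable,5,cosmos");
        for (String line : reduceJoin(records)) {
            System.out.println(line);
        }
    }
}
```

Flowers 4 and 5 have no matching color id, so they are dropped, which is the behavior of the inner join in the Hive query.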
Explanation: AgePartitioner is a custom Partitioner that partitions the data according to age. The age is part of the value read from the input file, and the data is partitioned by age range. In this example there are 3 partitions: the first contains the records where the age is less than 20, the second contains the records with an age between 20 and 50, and the third contains the records where the age is greater than 50.
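The range check can be sketched as a small plain-Java function. In Hadoop, a custom partitioner overrides getPartition(key, value, numPartitions); the sketch below only mirrors that decision and does not extend the Hadoop class. The name partitionForAge is hypothetical, the age is assumed to be already parsed out of the value, and ages 20 and 50 are assumed to belong to the middle partition.

```java
// Plain-Java sketch of the age-range decision (assumption: not the original AgePartitioner).
class AgePartitionSketch {

    // Returns the partition index for a record with the given age.
    // The modulo keeps the index valid when fewer than 3 reducers are configured.
    static int partitionForAge(int age, int numReduceTasks) {
        if (numReduceTasks == 0) {
            return 0; // map-only job: no partitioning needed
        }
        if (age < 20) {
            return 0;
        }
        if (age <= 50) {
            return 1 % numReduceTasks;
        }
        return 2 % numReduceTasks;
    }

    public static void main(String[] args) {
        int[] ages = {15, 20, 35, 50, 64};
        for (int age : ages) {
            System.out.println(age + " -> partition " + partitionForAge(age, 3));
        }
    }
}
```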
Explanation: The Reducer groups by key within each partition, so implementing Secondary Sort requires a Partitioner, a Key Comparator and a Group Comparator. Of the four options, the second comparator is the best fit: it compares only the first part of the key (the year), so records with the same year fall into the same group in the reducer. Sorting on the second part of the key can then be done with the Key Comparator.
We must now ensure that all the values for the same natural key are passed in one call to the Reducer. This is achieved by defining a Grouping Comparator class.
The Grouping Comparator determines which keys and values are passed in a single call to the Reducer. It looks at just the natural key.
Grouping comparators can be used in a secondary sort, together with a custom Partitioner, to ensure that only the natural key is used for partitioning and grouping.
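The interplay of the two comparators can be illustrated with a plain-Java simulation. It does not use Hadoop's comparator classes: the sort comparator orders composite keys by year and then by the second part, and the grouping comparator looks only at the year to decide where one reduce call ends and the next begins. The names CompositeKey, reading and groupForReduce are hypothetical, and the sample values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of secondary sort (assumption: no Hadoop classes involved).
class SecondarySortSketch {

    // Composite key: natural key (year) plus the part used only for sorting.
    static final class CompositeKey {
        final int year;
        final int reading;

        CompositeKey(int year, int reading) {
            this.year = year;
            this.reading = reading;
        }
    }

    // Key (sort) comparator: orders by year, then by the second part.
    static final Comparator<CompositeKey> SORT =
        Comparator.comparingInt((CompositeKey k) -> k.year)
                  .thenComparingInt((CompositeKey k) -> k.reading);

    // Grouping comparator: looks at the natural key only.
    static final Comparator<CompositeKey> GROUP =
        Comparator.comparingInt((CompositeKey k) -> k.year);

    // Returns one list per simulated reduce call, holding the sorted second parts.
    static List<List<Integer>> groupForReduce(List<CompositeKey> keys) {
        List<CompositeKey> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<Integer>> calls = new ArrayList<>();
        CompositeKey previous = null;
        for (CompositeKey key : sorted) {
            // A new reduce call starts whenever the grouping comparator sees a new year.
            if (previous == null || GROUP.compare(previous, key) != 0) {
                calls.add(new ArrayList<>());
            }
            calls.get(calls.size() - 1).add(key.reading);
            previous = key;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<CompositeKey> keys = Arrays.asList(
            new CompositeKey(1950, 22), new CompositeKey(1949, 78),
            new CompositeKey(1950, 0), new CompositeKey(1949, 111),
            new CompositeKey(1950, -11));
        System.out.println(groupForReduce(keys)); // prints [[78, 111], [-11, 0, 22]]
    }
}
```

Each inner list corresponds to one reduce call: all values for a year arrive together, already ordered by the second part of the key.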
Question :
The following two input files are given to a MapReduce join job.