
MapR (HP) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : You have the following data in Hive tables; based on the table tag, you want to join the data.
colortable,1,pink,300
colortable,2,red,500
colortable,3,yellow,300
flowertable,1,rose
flowertable,2,amaryllis
flowertable,3,sunflower
flowertable,4,lily
flowertable,5,cosmos
You want to produce the same output as the Hive query below:
SELECT color, flower FROM colortable JOIN flowertable ON (colortable.id = flowertable.id)
Select the correct MapReduce program that produces the same output as this query.

1. 1
2. 2
3. 3

Correct Answer :
Explanation: The Mapper produces the output below; note that the width field from colortable is ignored.
color colortable,1,pink
color colortable,2,red
color colortable,3,yellow
color flowertable,1,rose
color flowertable,2,amaryllis
color flowertable,3,sunflower
color flowertable,4,lily
color flowertable,5,cosmos
Since everything now arrives at a single reducer, we can build two separate hash maps, as below:
color {(1,pink),(2,red),(3,yellow)}
flower {(1,rose),(2,amaryllis), (3, sunflower), (4, lily), (5, cosmos)}
We then iterate over the color ids and look up each id in the flower map. For every id present in both maps, the reducer emits:
pink rose
red amaryllis
yellow sunflower
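
The program snippets referenced by the options are not reproduced here, but the reducer logic described in the explanation can be sketched as follows. This is a minimal sketch, assuming the mapper has already emitted every tagged record (minus the width field) under a single key; class and variable names are illustrative.

import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ColorFlowerJoinReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // All tagged records arrive in this single call; split them into two maps.
        // TreeMap keeps ids sorted so the output order matches the example above.
        Map<String, String> colors = new TreeMap<>();
        Map<String, String> flowers = new TreeMap<>();
        for (Text value : values) {
            // Each value looks like "colortable,1,pink" or "flowertable,1,rose".
            String[] parts = value.toString().split(",");
            if (parts[0].equals("colortable")) {
                colors.put(parts[1], parts[2]);   // id -> color (width already dropped)
            } else {
                flowers.put(parts[1], parts[2]);  // id -> flower
            }
        }
        // Emit a color/flower pair for every id present in both maps.
        for (Map.Entry<String, String> e : colors.entrySet()) {
            String flower = flowers.get(e.getKey());
            if (flower != null) {
                context.write(new Text(e.getValue()), new Text(flower));
            }
        }
    }
}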




Question :

1. 1
2. 2
3. 3

Correct Answer :

Explanation:
AgePartitioner is a custom Partitioner that partitions the data according to age. The age is part of the value read from the input file, and the data is partitioned based on age ranges. In this example there are 3 partitions: the first contains records where the age is less than 20, the second contains records with ages between 20 and 50, and the third contains records where the age is greater than 50.
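
A minimal sketch of such a partitioner follows. The exact record layout is not shown in the question, so the assumption that the age is the first comma-separated field of the value is illustrative only.

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class AgePartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numReduceTasks) {
        if (numReduceTasks == 0) {
            return 0; // no reduce tasks: everything lands in a single partition
        }
        // Assumption: the age is the first comma-separated field of the value.
        int age = Integer.parseInt(value.toString().split(",")[0].trim());
        if (age < 20) {
            return 0;                  // first partition: age < 20
        } else if (age <= 50) {
            return 1 % numReduceTasks; // second partition: age between 20 and 50
        } else {
            return 2 % numReduceTasks; // third partition: age > 50
        }
    }
}

The job would run with three reducers (job.setNumReduceTasks(3)) so that each age range maps to its own output partition.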





Question :
1. 1
2. 2
3. 3
4. 4
Correct Answer :

Explanation: The reducer groups by key within a partition, so implementing a secondary sort requires a Partitioner, a Key Comparator, and a Group Comparator. Of the four options, the best-fitting comparator is the second one: it compares only the first part of the key (the year), so records for the same year fall into the same group in the reducer. Later, the second part of the key can be sorted using the KeyComparator.

We must also ensure that all the values for the same natural key are passed in one call to the reducer. This is achieved by defining a Grouping Comparator class, which determines which keys and values are passed in a single call to the reducer by looking at just the natural key. Grouping comparators are therefore used in a secondary sort to ensure that only the natural key is used for partitioning and grouping.
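
A minimal sketch of a grouping comparator for such a secondary sort follows, assuming a composite key serialized as a tab-separated Text of the form year<TAB>secondPart; the actual key class used by the question's options is not shown.

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class YearGroupingComparator extends WritableComparator {
    protected YearGroupingComparator() {
        super(Text.class, true); // true: create key instances for deserialization
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        // Compare only the natural key (the year) and ignore the rest of the
        // composite key, so all records for one year reach one reduce() call.
        String yearA = a.toString().split("\t")[0];
        String yearB = b.toString().split("\t")[0];
        return yearA.compareTo(yearB);
    }
}

The comparator is registered on the job with job.setGroupingComparatorClass(YearGroupingComparator.class), alongside the custom Partitioner and the sort comparator set via job.setSortComparatorClass(...).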




Question :

There are two input files, shown below, for a MapReduce join job.

input/A
A.a11 A.a12
A.a21 A.a22
B.a21 A.a32
A.a31 A.a32
B.a31 A.a32

input/B
A.a11 B.a12
A.a11 B.a13
B.a11 B.a12
B.a21 B.a22
A.a31 B.a32
B.a31 B.a32

After running the MapReduce join code snippet (shown on the left-hand side),
what would be the first line of the output?

1. A.a11 A.a12 B.a12
2. A.a11 A.a12 A.a11 B.a13
4. B.a21 A.a32 B.a21 B.a22

Correct Answer :
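
The join snippet referenced by the question is not reproduced here. As background, a typical reduce-side join mapper tags each record with the name of its source file so the reducer can tell the two inputs apart; a minimal sketch, assuming the first whitespace-separated field is the join key:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class JoinMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Tag each record with the file it came from (A or B) so the reducer
        // can pair records from the two sides that share the same join key.
        String sourceFile = ((FileSplit) context.getInputSplit()).getPath().getName();
        // Assumption: the first whitespace-separated field is the join key.
        String joinKey = value.toString().split("\\s+")[0];
        context.write(new Text(joinKey), new Text(sourceFile + ":" + value.toString()));
    }
}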



Related Questions


Question : Using the Hadoop MapReduce framework, you have to use the org.apache.hadoop.mapred.lib.IdentityMapper Java class as a mapper and /bin/wc as a reducer.
Select the correct command from the options below.

1. $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input myInputDirs \
-output myOutputDir \
-mapR org.apache.hadoop.mapred.lib.IdentityMapper -reducer /bin/wc


2. $HADOOP_HOME/bin/hadoop \
-input myInputDirs \
-output myOutputDir \
-mapper org.apache.hadoop.mapred.lib.IdentityMapper \
-reducer /bin/wc


3. $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input myInputDirs \
-output myOutputDir \
-map org.apache.hadoop.mapred.lib.IdentityMapper \
-red /bin/wc


4. $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input myInputDirs \
-output myOutputDir \
-mapper org.apache.hadoop.mapred.lib.IdentityMapper \
-reducer /bin/wc


Question : By default, streaming tasks exiting with non-zero status are considered to be _________ tasks.

1. Failure

2. Success




Question : You have written your Python code as a mapper for a MapReduce job in a file called "myPythonScript.py". To run the MapReduce job, you have to transfer this Python file to each node of the cluster before starting the job.
1. True
2. False


Question : You have written your Python code as a mapper for a MapReduce job in a file called "myPythonScript.py", with /bin/wc as the reducer.
Select the correct command that will run the MapReduce job.

1. $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input myInputDirs \
-output myOutputDir \
-mapper myPythonScript.py \
-reducer /bin/wc


2. $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input myInputDirs \
-output myOutputDir \
-mapper myPythonScript.py \
-reducer /bin/wc \
-file myPythonScript.py


3. $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input myInputDirs \
-output myOutputDir \
-mapper myPythonScript.py \
-reducer /bin/wc \
-source myPythonScript.py


4. Any of the above


Question : You have written your Python code as a mapper for a MapReduce job in a file called "myPythonScript.py", with /bin/wc as the reducer.
Your mapper also uses lookup data stored in the file myDictionary.txt.
Select the correct command that will run the MapReduce job.
1. $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input myInputDirs \
-output myOutputDir \
-mapper myPythonScript.py \
-reducer /bin/wc \
-file myDictionary.txt

2. $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input myInputDirs \
-output myOutputDir \
-mapper myPythonScript.py \
-reducer /bin/wc \
-file myPythonScript.py \
-file myDictionary.txt
3. $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input myInputDirs \
-output myOutputDir \
-mapper myPythonScript.py \
-reducer /bin/wc \
-source myPythonScript.py \
-file myDictionary.txt
4. Any of the above


Question : Which of the following is/are valid options for the streaming job submission command?

A. -inputformat JavaClassName
B. -outputformat JavaClassName
C. -partitioner JavaClassName
D. -combiner streamingCommand or JavaClassName
1. A,B,C
2. B,C,D
4. A,B,D
5. A,B,C,D