
MapR (HP) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : In the MRv2 Driver class, a new Job object is created. What else is true of the Driver class?

1. Always use ToolRunner class

2. Always provide the input file

3. It checks the command-line syntax

4. It also sets the driver, mapper, and reducer classes to be used.

Correct Answer : 4
Explanation: The Driver configures and submits the job: besides creating the Job object, it sets the driver class (via setJarByClass) and the mapper and reducer classes, along with input/output formats, paths, and key/value types. Using ToolRunner is a best practice but not a requirement, and input paths are passed in as arguments rather than being fixed, so options 1-3 need not hold.
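A minimal sketch of such a driver using the new-API Job class (identity Mapper/Reducer are used here as stand-ins for your own classes; names are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "driver demo"); // a new Job object is created
        job.setJarByClass(MyDriver.class);              // the driver class itself
        job.setMapperClass(Mapper.class);               // identity mapper as a stand-in for your own
        job.setReducerClass(Reducer.class);             // identity reducer as a stand-in
        job.setOutputKeyClass(LongWritable.class);      // matches the identity map output
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}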




Question : What are the TWO main components of the YARN ResourceManager process? Choose two answers.
A. Job Tracker
B. Task Tracker
C. Scheduler
D. Applications Manager
1. A,B
2. B,C
3. C,D
4. A,D
5. B,D

Correct Answer : 3
Explanation: The ResourceManager has two main components: the Scheduler, which allocates cluster resources among running applications, and the ApplicationsManager, which accepts job submissions and negotiates the first container for each application's ApplicationMaster. JobTracker and TaskTracker are MRv1 daemons, not YARN components.




Question : Given a directory of files with the following structure: line number, tab character, string:
Example:
1	abialkjfjkaoasdfjksdlkjhqweroij
2	kadfjhuwqounahagtnbvaswslmnbfgy
3	kjfteiomndscxeqalkzhtopedkfsikj
You want to send each line as one record to your Mapper. Which InputFormat should you
use to complete the line: conf.setInputFormat(____.class); ?

1. SequenceFileAsTextInputFormat
2. SequenceFileInputFormat
3. KeyValueTextInputFormat
4. BDBInputFormat

Correct Answer : 3
Explanation: KeyValueTextInputFormat
TextInputFormat's keys, being simply the offset within the file, are not normally very useful. It is
common for each line in a file to be a key-value pair, separated by a delimiter such as a tab
character. For example, this is the output produced by TextOutputFormat, Hadoop's default
output format. To interpret such files correctly, KeyValueTextInputFormat is appropriate.
You can specify the separator via the mapreduce.input.keyvaluelinerecordreader.key.value.separator
property (or key.value.separator.in.input.line in the old API); it is a tab character by default.
Consider the following input file, where the space represents a horizontal tab character:
line1 On the top of the Crumpetty Tree
line2 The Quangle Wangle sat,
line3 But his face you could not see,
line4 On account of his Beaver Hat.
As in the TextInputFormat case, the input is in a single split comprising four records, although this
time the keys are the Text sequences before the tab in each line:
(line1, On the top of the Crumpetty Tree)
(line2, The Quangle Wangle sat,)
(line3, But his face you could not see,)
(line4, On account of his Beaver Hat.)
SequenceFileInputFormat
To use data from sequence files as the input to MapReduce, you use SequenceFileInputFormat. The
keys and values are determined by the sequence file, and you need to make sure that your map
input types correspond.
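A minimal sketch of wiring this up with the old API, which the question's conf.setInputFormat line implies (the class name is illustrative; tab is already the default separator and is set here only to show the property):

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;

public class KVDemo {
    public static void main(String[] args) {
        JobConf conf = new JobConf(KVDemo.class);
        // Old-API separator property; "\t" is the default anyway.
        conf.set("key.value.separator.in.input.line", "\t");
        conf.setInputFormat(KeyValueTextInputFormat.class); // completes the line in the question
    }
}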


Related Questions


Question : You have the following key-value pairs as output from your Map task:
(the, 1)
(fox, 1)
(faster, 1)
(than, 1)
(the, 1)
(dog, 1)
How many keys will be passed to the Reducer's reduce method?


1. Six
2. Five
3. [option text not available]
4. Two
5. One
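For context, the shuffle sorts and groups map output by key, so with the default grouping each distinct key produces exactly one reduce() call; the six pairs above contain five distinct keys, with (the, 1) appearing twice and arriving as reduce(the, [1, 1]). A minimal sum-style reducer illustrating this (the class name is illustrative):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// One reduce() call per distinct key; duplicate keys are grouped in the shuffle.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) sum += v.get();  // key "the" sees [1, 1]
        context.write(key, new IntWritable(sum));
    }
}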


Question : You have a directory named jobdata in HDFS that contains four files: _first.txt, second.txt, .third.txt and #data.txt.
How many files will be processed by the FileInputFormat.setInputPaths() command when it's given a Path object representing this
directory?

1. Four, all files will be processed
2. Three, the pound sign is an invalid character for HDFS file names
3. [option text not available]
4. None, the directory cannot be named jobdata
5. One, no special characters can prefix the name of an input file
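For reference, FileInputFormat applies a default filter that skips "hidden" inputs, i.e. file names beginning with an underscore or a dot. A minimal sketch (the path is illustrative):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class InputPathDemo {
    public static void main(String[] args) {
        JobConf conf = new JobConf(InputPathDemo.class);
        // The default filter ignores names starting with "_" or ".":
        // _first.txt and .third.txt are skipped, while second.txt and
        // #data.txt are picked up.
        FileInputFormat.setInputPaths(conf, new Path("/user/hadoop/jobdata"));
    }
}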


Question : On a cluster running MapReduce v1 (MRv1), a TaskTracker heartbeats into the JobTracker
on your cluster and alerts the JobTracker that it has an open map task slot.
What determines how the JobTracker assigns each map task to a TaskTracker?
1. The amount of RAM installed on the TaskTracker node.
2. The amount of free disk space on the TaskTracker node.
3. [option text not available]
4. The average system load on the TaskTracker node over the past fifteen (15) minutes.
5. The location of the InputSplit to be processed in relation to the location of the node.


Question : The Hadoop framework provides a mechanism for coping with machine issues such as
faulty configuration or impending hardware failure. MapReduce detects that one or a
number of machines are performing poorly and starts more copies of a map or reduce task.
All the tasks run simultaneously, and the results of the task that finishes first are used. This is called:
1. Combine
2. IdentityMapper
3. [option text not available]
4. Default Partitioner
5. Speculative Execution
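For reference, speculative execution is enabled by default and can be toggled per job; a minimal sketch using the new-API property names (MRv1 used mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution):

import org.apache.hadoop.conf.Configuration;

public class SpeculativeDemo {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Duplicate attempts of slow tasks are launched; the first attempt
        // to finish wins and the others are killed.
        conf.setBoolean("mapreduce.map.speculative", true);
        conf.setBoolean("mapreduce.reduce.speculative", true);
    }
}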


Question : You've written a MapReduce job that will process million input records and generate
500 million key-value pairs. The data is not uniformly distributed. Your MapReduce job will
create a significant amount of intermediate data that it needs to transfer between mappers
and reducers, which is a potential bottleneck. A custom implementation of which interface is
most likely to reduce the amount of intermediate data transferred across the network?
1. Partitioner
2. OutputFormat
3. [option text not available]
4. Writable
5. Combiner
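For context, a combiner pre-aggregates map output on the map side before the shuffle, so far less intermediate data crosses the network. A minimal sketch (reusing the SumReducer sketched earlier as the combiner, which is valid only because summation is associative and commutative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class CombinerDemo {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "combiner demo");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // The combiner runs on each mapper's output, shrinking the data
        // shuffled to reducers.
        job.setCombinerClass(SumReducer.class); // SumReducer as sketched earlier
    }
}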


Question : You are using the MapR Hadoop framework to analyze financial data with some data-modeling algorithms.
These algorithms are written in Java and packaged as a JAR file of approximately 2 MB in size.
Which is the best way to make this library available to your MapReduce job at runtime?


1. Have your system administrator copy the JAR to all nodes in the cluster and set its
location in the HADOOP_CLASSPATH environment variable before you submit your job.
2. Have your system administrator place the JAR file on a Web server accessible to all
cluster nodes and then set the HTTP_JAR_URL environment variable to its location.
3. [option text not available]
4. Package your code and the Apache Commons Math library into a zip file named JobJar.zip
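For reference, a small auxiliary JAR is commonly shipped with a job via the -libjars generic option, which is honored only when the driver runs through ToolRunner/GenericOptionsParser; a minimal sketch (JAR and class names are illustrative):

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ModelDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // Job setup goes here; getConf() already reflects the parsed generic options.
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // Invocation (illustrative):
        //   hadoop jar myjob.jar ModelDriver -libjars modeling-lib.jar <input> <output>
        System.exit(ToolRunner.run(new ModelDriver(), args));
    }
}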