Question : You've written a MapReduce job based on the HadoopExam website's log file named MAIN.PROFILE.log, resulting in an extremely large amount of output data. Which of the following cluster resources will your job stress?
1. network I/O and disk I/O
2. network I/O and RAM
3. Access Mostly Uused Products by 50000+ Subscribers
4. RAM, network I/O and disk I/O
Correct Answer : 1 (network I/O and disk I/O)

Explanation: MapReduce inputs typically come from input files loaded onto the processing cluster in HDFS. These files are evenly distributed across all nodes. Running a MapReduce program involves running mapping tasks on many or all of the nodes in the cluster. Each of these mapping tasks is equivalent: no mappers have particular "identities" associated with them, so any mapper can process any input file. Each mapper loads the set of files local to that machine and processes them.

When the mapping phase has completed, the intermediate (key, value) pairs must be exchanged between machines to send all values with the same key to a single reducer. The reduce tasks are spread across the same nodes in the cluster as the mappers. This is the only communication step in MapReduce. Individual map tasks do not exchange information with one another, nor are they aware of one another's existence. Similarly, different reduce tasks do not communicate with one another. The user never explicitly marshals information from one machine to another; all data transfer is handled by the Hadoop MapReduce platform itself, guided implicitly by the different keys associated with values. This is a fundamental element of Hadoop MapReduce's reliability. If nodes in the cluster fail, tasks must be able to be restarted. If they have been performing side effects, e.g., communicating with the outside world, then the shared state must be restored in a restarted task. By eliminating communication and side effects, restarts can be handled more gracefully.

When denormalizing a table, the amount of data written by the map phase will far exceed the amount of data read by the map phase. All of the data written during the map phase is first written to local disk and then transferred over the network to the reducers at the beginning of the reduce phase. Writing a very large amount of data in the map phase will therefore create a large amount of local disk I/O on the machines running map tasks, as well as network I/O. Because map output is stored in a fixed-size buffer that is written periodically to disk, this operation will not tax the memory of the machines running the map tasks. Denormalizing a table is not a compute-intensive operation, so it will not tax the processors of the machines running the map tasks either.

Partition & Shuffle: After the first map tasks have completed, the nodes may still be performing several more map tasks each. But they also begin exchanging the intermediate outputs from the map tasks to where they are required by the reducers. This process of moving map outputs to the reducers is known as shuffling. A different subset of the intermediate key space is assigned to each reduce node; these subsets (known as "partitions") are the inputs to the reduce tasks. Each map task may emit (key, value) pairs to any partition; all values for the same key are always reduced together regardless of which mapper produced them. Therefore, the map nodes must all agree on where to send the different pieces of the intermediate data. The Partitioner class determines which partition a given (key, value) pair will go to. The default partitioner computes a hash value for the key and assigns the partition based on this result.

Sort: Each reduce task is responsible for reducing the values associated with several intermediate keys. The set of intermediate keys on a single node is automatically sorted by Hadoop before they are presented to the Reducer.

Reduce: A Reducer instance is created for each reduce task. This is an instance of user-provided code that performs the second important phase of job-specific work. For each key in the partition assigned to a Reducer, the Reducer's reduce() method is called once. This receives a key as well as an iterator over all the values associated with the key. The values associated with a key are returned by the iterator in an undefined order. The Reducer also receives OutputCollector and Reporter objects as parameters; they are used in the same manner as in the map() method.
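To make the partitioning step above concrete, here is a minimal sketch of the default behaviour described in the explanation: hash the key and map it onto one of the reduce partitions. It uses the old org.apache.hadoop.mapred API (matching the OutputCollector/Reporter style used in these answers); the class name HashKeyPartitioner is a hypothetical example, not something referenced by the question.

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// Minimal sketch: route each intermediate (key, value) pair to a reduce
// partition by hashing the key, mirroring what the default partitioner does.
public class HashKeyPartitioner implements Partitioner<Text, Text> {

    public void configure(JobConf job) {
        // No configuration needed for this sketch.
    }

    public int getPartition(Text key, Text value, int numReduceTasks) {
        // Mask off the sign bit so the result is always a valid partition index.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}

Because the partition is derived from the key alone, every value sharing a key lands in the same partition and is therefore handled by the same reduce task, which is exactly the guarantee the explanation relies on.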
Question : You have written a Mapper which invokes the following five calls to the OutputCollector.collect method:
output.collect(new Text("Flag"), new Text("Rahul"));
output.collect(new Text("Shirt"), new Text("Yakul"));
output.collect(new Text("Shoe"), new Text("Rahul"));
output.collect(new Text("Flag"), new Text("Gemini"));
output.collect(new Text("Socks"), new Text("Yakul"));
How many times will the Reducer's reduce() method be invoked?
Explanation: Keys and values: In MapReduce, no value stands on its own. Every value has a key associated with it. Keys identify related values. For example, a log of time-coded speedometer readings from multiple cars could be keyed by license-plate number; it would look like:
AAA-123 65mph, 12:00pm
ZZZ-789 50mph, 12:02pm
AAA-123 40mph, 12:05pm
CCC-456 25mph, 12:15pm
...

The mapping and reducing functions receive not just values, but (key, value) pairs. The output of each of these functions is the same: both a key and a value must be emitted to the next list in the data flow.

MapReduce is also less strict than other languages about how the Mapper and Reducer work. In more formal functional mapping and reducing settings, a mapper must produce exactly one output element for each input element, and a reducer must produce exactly one output element for each input list. In MapReduce, an arbitrary number of values can be output from each phase; a mapper may map one input into zero, one, or one hundred outputs. A reducer may compute over an input list and emit one or a dozen different outputs.

After the shuffle and sort phase there will be four calls to the reduce() method: one passed "Flag" as the key and "Rahul, Gemini" as the values, one passed "Shirt" as the key and "Yakul" as the value, one passed "Shoe" as the key and "Rahul" as the value, and one passed "Socks" as the key and "Yakul" as the value.
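As an illustration of those four reduce() invocations, the following is a minimal, hypothetical reducer (old mapred API) that concatenates the values for each key. Fed the five pairs emitted by the Mapper above, its reduce() method runs once each for "Flag", "Shirt", "Shoe" and "Socks", with "Flag" receiving both "Rahul" and "Gemini". The class name ConcatValuesReducer is an assumption for illustration only.

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Sketch of a reducer: called once per distinct key, so for the five
// collect() calls above it runs four times ("Flag", "Shirt", "Shoe", "Socks"),
// with "Flag" receiving the two values "Rahul" and "Gemini".
public class ConcatValuesReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        StringBuilder joined = new StringBuilder();
        while (values.hasNext()) {
            if (joined.length() > 0) {
                joined.append(", ");
            }
            joined.append(values.next().toString());
        }
        output.collect(key, new Text(joined.toString()));
    }
}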
Question : ___________ is an optimization technique where a computer system performs some task that may not be actually needed. The main idea is to do work before it is known whether that work will be needed at all, so as to prevent a delay that would have to be incurred by doing the work after it is known whether it is needed. If it turns out the work was not needed after all, the results are ignored. The Hadoop framework also provides a mechanism to handle machine issues such as faulty configuration or hardware failure. The JobTracker detects that one or a number of machines are performing poorly and starts more copies of a map or reduce task. This behaviour is known as ________________
Correct Answer : Speculative execution

Explanation: Speculative execution: One problem with the Hadoop system is that by dividing the tasks across many nodes, it is possible for a few slow nodes to rate-limit the rest of the program. For example, if one node has a slow disk controller, it may be reading its input at only 10% of the speed of all the other nodes. So when 99 map tasks are already complete, the system is still waiting for the final map task to check in, which takes much longer than all the others.

By forcing tasks to run in isolation from one another, individual tasks do not know where their inputs come from. Tasks trust the Hadoop platform to simply deliver the appropriate input. Therefore, the same input can be processed multiple times in parallel, to exploit differences in machine capabilities. As most of the tasks in a job are coming to a close, the Hadoop platform will schedule redundant copies of the remaining tasks across several nodes which do not have other work to perform. This process is known as speculative execution. When tasks complete, they announce this fact to the JobTracker. Whichever copy of a task finishes first becomes the definitive copy. If other copies were executing speculatively, Hadoop tells the TaskTrackers to abandon those tasks and discard their outputs. The Reducers then receive their inputs from whichever Mapper completed successfully first.

Speculative execution is enabled by default. You can disable speculative execution for the mappers and reducers by setting the mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution JobConf options to false, respectively. The MapReduce framework monitors the progress of tasks; when one or more tasks are not making sufficiently fast progress, additional copies of those tasks will be speculatively executed. For additional information on speculative execution, see chapter 4 of Hadoop: The Definitive Guide, 3rd Edition, in the Task Execution: Speculative Execution section.
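As a minimal sketch of the configuration options mentioned above (assuming the old mapred API; the driver class name NoSpeculationDriver is hypothetical), speculative execution can be switched off per job like this:

import org.apache.hadoop.mapred.JobConf;

public class NoSpeculationDriver {
    public static void main(String[] args) {
        // Disable speculative execution for map and reduce tasks using the
        // property names mentioned in the explanation above.
        JobConf conf = new JobConf(NoSpeculationDriver.class);
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
        // JobConf also exposes equivalent convenience setters:
        // conf.setMapSpeculativeExecution(false);
        // conf.setReduceSpeculativeExecution(false);
    }
}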
1. Four, all files will be processed
2. Three, the pound sign is an invalid character for HDFS file names
3. Access Mostly Uused Products by 50000+ Subscribers
4. None, the directory cannot be named jobdata
5. One, no special characters can prefix the name of an input file
1. Have your system administrator copy the JAR to all nodes in the cluster and set its location in the HADOOP_CLASSPATH environment variable before you submit your job. What else is the requirement of the class using this libjars?
2. Have your system administrator place the JAR file on a Web server accessible to all cluster nodes and then set the HTTP_JAR_URL environment variable to its location.
3. Access Mostly Uused Products by 50000+ Subscribers
4. Package your code and the Apache Commons Math library into a zip file named JobJar.zip
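For context on option 1's mention of libjars: the -libjars generic option is only honoured when the driver runs through ToolRunner/GenericOptionsParser, so the driver class is normally expected to implement the Tool interface. The sketch below is a hypothetical driver illustrating that pattern; the class name, JAR names and paths are placeholders, not part of the original question.

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver: implementing Tool lets GenericOptionsParser handle
// generic options such as -libjars, which ships extra JARs with the job.
public class MyJobDriver extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getConf(), MyJobDriver.class);
        // ... set mapper, reducer, input and output paths here before running ...
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MyJobDriver(), args));
    }
}

// Example invocation (JAR names and paths are placeholders):
//   hadoop jar myjob.jar MyJobDriver -libjars commons-math.jar input/ output/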