Question : Our website www.HadoopExam.com has millions of profiles, and ETL jobs have been created to process them. You have submitted an ETL MapReduce job to Hadoop that analyzes the HadoopExam.com website's log files and combines them with the profile data, and you notice in the JobTracker's Web UI that the Mappers are 80% complete while the Reducers are 20% complete. What is the best explanation for this? 1. The progress attributed to the reducer refers to the transfer of data from completed Mappers. 2. The progress attributed to the reducer refers to the transfer of data from Mappers that are still running. 3. Access Mostly Uused Products by 50000+ Subscribers 4. The progress attributed to the reducer refers to the transfer of data from Mappers and cannot be predicted.
Explanation: While the reduce() method is not called until all of the mappers have completed, the transfer of data from completed mappers starts before all of the mappers have finished. The mapred.reduce.slowstart.completed.maps property specifies the percentage of mappers that must complete before the reducers can start receiving data from the completed mappers.

A Reducer has three primary phases:

Shuffle: The Reducer copies the sorted output from each Mapper across the network using HTTP.

Sort: The framework merge-sorts the Reducer inputs by key (since different Mappers may have output the same key). The shuffle and sort phases occur simultaneously, i.e. while outputs are being fetched they are merged.

SecondarySort: To achieve a secondary sort on the values returned by the value iterator, the application should extend the key with the secondary key and define a grouping comparator. The keys will be sorted using the entire key, but will be grouped using the grouping comparator to decide which keys and values are sent in the same call to reduce(). The grouping comparator is specified via Job.setGroupingComparatorClass(Class), and the sort order is controlled by Job.setSortComparatorClass(Class). For example, say that you want to find duplicate web pages and tag them all with the URL of the "best" known example. You would set up the job like this: Map Input Key: url; Map Input Value: document; Map Output Key: document checksum, url pagerank; Map Output Value: url; Partitioner: by checksum; OutputKeyComparator: by checksum and then decreasing pagerank; OutputValueGroupingComparator: by checksum.

Reduce: No reduce task's reduce() method is called until all map tasks have completed. Every reduce task's reduce() method expects to receive its data in sorted order. In this phase the reduce(Object, Iterable, Context) method is called for each (key, (collection of values)) pair in the sorted inputs. The output of the reduce task is typically written to a RecordWriter via TaskInputOutputContext.write(Object, Object). The output of the Reducer is not re-sorted. If the reduce() method were called before all of the map tasks had completed, it could receive data out of order. For more information about the shuffle and sort phase, see the training referenced below.
Watch the training from http://hadoopexam.com/index.html/#hadoop-training
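To make the slow-start behaviour concrete, here is a minimal driver sketch. Only the property name mapred.reduce.slowstart.completed.maps comes from the explanation above; the class name, job name, and the 0.80 threshold are placeholders chosen for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class LogAnalysisDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Reducers begin fetching map output once 80% of the mappers have
        // finished; the default is much lower (typically 0.05), which is why
        // reducer "progress" appears while mappers are still running.
        conf.set("mapred.reduce.slowstart.completed.maps", "0.80");

        Job job = Job.getInstance(conf, "HadoopExam log analysis");
        job.setJarByClass(LogAnalysisDriver.class);
        // For the secondary-sort example above, this is also where
        // Job.setSortComparatorClass and Job.setGroupingComparatorClass
        // would be called, along with the mapper, reducer, and I/O paths.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}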
Question : In your MapReduce job, you have three configuration parameters. What is the correct or best way to pass these three configuration parameters to a mapper or reducer? 1. As key pairs in the Configuration object. 2. As value pairs in the Configuration object. 3. Access Mostly Uused Products by 50000+ Subscribers 4. Not possible
Explanation: Unless I'm missing something, if you have a Properties object containing every property you need in your M/R job, you simply need to write the content of the Properties object to the Hadoop Configuration object. For example, something like this:
Configuration conf = new Configuration();
Properties params = getParameters(); // do whatever you need here to create your object
for (Map.Entry<Object, Object> entry : params.entrySet()) {
    String propName = (String) entry.getKey();
    String propValue = (String) entry.getValue();
    conf.set(propName, propValue);
}
Then inside your M/R job, you can use the Context object to get back your Configuration in both the mapper (the map function) and the reducer (the reduce function), like this:
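For instance, in a mapper it could look like the following sketch. The property name my.custom.property and the class name are made up for illustration; they stand in for whichever keys the driver loop above actually set.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ParameterAwareMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private String threshold;

    @Override
    protected void setup(Context context) {
        // Pull the value back out of the job Configuration.
        threshold = context.getConfiguration().get("my.custom.property", "default");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ... use 'threshold' while processing each record ...
    }
}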
Note that the Context (and therefore the Configuration) is also accessible in the setup and cleanup methods, which is useful for doing some initialization if needed.
Also it's worth mentioning you could probably directly call the addResource method from the Configuration object to add your properties directly as an InputStream or a file, but I believe this has to be an XML configuration like the regular Hadoop XML configs, so that might just be overkill.
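If you did go that route, a small sketch might look like the following; the file path /etc/hadoop/conf/custom-site.xml and the property key are invented, and the file would have to follow the standard Hadoop <configuration><property> XML layout.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class AddResourceExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Load an extra XML config file on top of the defaults.
        conf.addResource(new Path("/etc/hadoop/conf/custom-site.xml"));
        System.out.println(conf.get("my.custom.property"));
    }
}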
In the case of non-String objects, I would advise using serialization: you can serialize your objects, convert them to Strings (probably encoding them, for example with Base64, since I'm not sure what would happen if you have unusual characters), and then on the mapper/reducer side de-serialize the objects from the Strings you get from the properties inside the Configuration.
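A hedged sketch of that idea, using plain Java serialization plus java.util.Base64; the helper class and the property key you would pass to it are made up for illustration.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;
import org.apache.hadoop.conf.Configuration;

public class ConfObjectCodec {

    // Driver side: serialize the object and store it as a Base64 string.
    public static void put(Configuration conf, String key, Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        conf.set(key, Base64.getEncoder().encodeToString(bytes.toByteArray()));
    }

    // Mapper/reducer side: decode the string and rebuild the object.
    public static Object get(Configuration conf, String key) throws IOException, ClassNotFoundException {
        byte[] bytes = Base64.getDecoder().decode(conf.get(key));
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }
}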
Another approach would be to use the same serialization technique, but write the serialized objects to HDFS instead and then add those files to the DistributedCache. It sounds a bit like overkill, but it would probably work.
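A rough sketch of the read side of that approach, assuming the newer mapreduce API where the driver registers the file with job.addCacheFile and the task reads it back via context.getCacheFiles(); the HDFS path and class names are illustrative only.

import java.io.IOException;
import java.io.ObjectInputStream;
import java.net.URI;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheBackedMapper extends Mapper<LongWritable, Text, Text, Text> {
    private Object params;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // The driver registered the serialized object earlier with something like
        // job.addCacheFile(new URI("hdfs:///user/hadoopexam/params.ser"));
        URI[] cacheFiles = context.getCacheFiles();
        FileSystem fs = FileSystem.get(context.getConfiguration());
        try (ObjectInputStream in = new ObjectInputStream(fs.open(new Path(cacheFiles[0])))) {
            params = in.readObject();
        } catch (ClassNotFoundException e) {
            throw new IOException(e);
        }
    }
}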
Watch the training from http://hadoopexam.com/index.html/#hadoop-training
Question : In the word count MapReduce algorithm, why might using a Combiner (which runs after the Mapper and before the Reducer) reduce the overall job running time? 1. Combiners perform local filtering of repeated words, thereby reducing the number of key-value pairs that need to be shuffled across the network to the reducers. 2. Combiners perform global aggregation of word counts, thereby reducing the number of key-value pairs that need to be shuffled across the network to the reducers. 3. Access Mostly Uused Products by 50000+ Subscribers 4. Combiners perform local aggregation of word counts, thereby reducing the number of key-value pairs that need to be shuffled across the network to the reducers.
Explanation: Combiner: The pipeline shown earlier omits a processing step which can be used for optimizing bandwidth usage by your MapReduce job. Called the Combiner, this pass runs after the Mapper and before the Reducer. Usage of the Combiner is optional. If this pass is suitable for your job, instances of the Combiner class are run on every node that has run map tasks. The Combiner will receive as input all data emitted by the Mapper instances on a given node. The output from the Combiner is then sent to the Reducers, instead of the output from the Mappers. The Combiner is a "mini-reduce" process which operates only on data generated by one machine.

Word count is a prime example of where a Combiner is useful. The Word Count program in listings 1--3 emits a (word, 1) pair for every instance of every word it sees. So if the same document contains the word "cat" 3 times, the pair ("cat", 1) is emitted three times; all of these are then sent to the Reducer. By using a Combiner, these can be condensed into a single ("cat", 3) pair to be sent to the Reducer. Now each node only sends a single value to the reducer for each word -- drastically reducing the total bandwidth required for the shuffle process, and speeding up the job. The best part of all is that we do not need to write any additional code to take advantage of this! If a reduce function is both commutative and associative, then it can be used as a Combiner as well. You can enable combining in the word count program by adding the following line to the driver:
conf.setCombinerClass(Reduce.class);

The Combiner should be an instance of the Reducer interface. If your Reducer itself cannot be used directly as a Combiner because it is not commutative and associative, you might still be able to write a third class to use as a Combiner for your job. The only effect a combiner has is to reduce the number of records that are passed from the mappers to the reducers in the shuffle and sort phase. For more information on combiners, see chapter 2 of Hadoop: The Definitive Guide, 3rd Edition, in the Scaling Out: Combiner Functions section.
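As a concrete illustration, here is the familiar word-count pattern with the reducer reused as the combiner, written against the newer org.apache.hadoop.mapreduce API as a sketch; the class names are placeholders, and the older JobConf-based call shown above is the equivalent in the old API.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountWithCombiner {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE); // emits (word, 1) for every occurrence
            }
        }
    }

    // Summation is commutative and associative, so this class works as both
    // the Reducer and the Combiner.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count with combiner");
        job.setJarByClass(WordCountWithCombiner.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local aggregation before the shuffle
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}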
Watch the training from http://hadoopexam.com/index.html/#hadoop-training
1. Increase the block size on all current files in HDFS. 2. Increase the block size on your remaining files. 3. Access Mostly Uused Products by 50000+ Subscribers 4. Increase the amount of memory for the NameNode. 5. Increase the number of disks (or size) for the NameNode.
1. It returns a reference to a different Writable object each time. 2. It returns a reference to a Writable object from an object pool. 3. Access Mostly Uused Products by 50000+ Subscribers 4. It returns a reference to a Writable object. The API leaves unspecified whether this is a reused object or a new object. 5. It returns a reference to the same Writable object if the next value is the same as the previous value, or a new Writable object otherwise.
Question : For each input key-value pair, mappers can emit: 1. As many intermediate key-value pairs as designed. There are no restrictions on the types of those key-value pairs (i.e., they can be heterogeneous). 2. As many intermediate key-value pairs as designed, but they cannot be of the same type as the input key-value pair. 3. Access Mostly Uused Products by 50000+ Subscribers 4. One intermediate key-value pair, but of the same type. 5. As many intermediate key-value pairs as designed, as long as all the keys have the same types and all the values have the same type.