MapR (HP) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)

Question : In MapReduce jobs, all values must implement which of the following interfaces?

1. Comparable

2. Writable

4. 1,2
5. 1,2,3

Explanation: WritableComparables can be compared to each other, typically via Comparators. Any type which is to be used as a key in the Hadoop
Map-Reduce framework should implement this interface.

Note that hashCode() is frequently used in Hadoop to partition keys. It's important that your implementation of hashCode() returns the same result across
different instances of the JVM. Note also that the default hashCode() implementation in Object does not satisfy this property.
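The note on hashCode() can be illustrated with a small sketch. UserKey and its fields are hypothetical names invented for this example. The partitionFor helper uses the arithmetic commonly attributed to Hadoop's default hash partitioner (mask off the sign bit, then take the remainder); treat that as an assumption and check it against your Hadoop version. The hash is built only from String.hashCode() and an int, both of which are defined by value, so it does not depend on the JVM instance.

```java
// Hypothetical composite key; not part of Hadoop.
final class UserKey {
    private final String country;
    private final int userId;

    UserKey(String country, int userId) {
        this.country = country;
        this.userId = userId;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof UserKey)) {
            return false;
        }
        UserKey other = (UserKey) o;
        return userId == other.userId && country.equals(other.country);
    }

    // Computed from field values only, so every JVM produces the same result.
    // The default Object.hashCode() gives no such guarantee.
    @Override
    public int hashCode() {
        return 31 * country.hashCode() + userId;
    }

    // Assumed partitioning arithmetic: clear the sign bit, then take the remainder.
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        UserKey a = new UserKey("IN", 7);
        UserKey b = new UserKey("IN", 7); // equal to a, but a distinct instance
        System.out.println(partitionFor(a, 4) == partitionFor(b, 4));
    }
}
```

Equal keys created in different map tasks would then land in the same partition, which is what the reducer side relies on.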

public interface Writable
A serializable object which implements a simple, efficient, serialization protocol, based on DataInput and DataOutput.
Any key or value type in the Hadoop Map-Reduce framework implements this interface.

Implementations typically implement a static read(DataInput) method which constructs a new instance, calls readFields(DataInput) and returns the instance.

Writable is an interface in Hadoop, and key and value types in Hadoop must implement it. Hadoop provides Writable wrappers for almost all Java primitive
types and some other types, but sometimes we need to pass custom objects, and these custom objects should implement Hadoop's Writable interface. Hadoop
MapReduce uses implementations of Writable for interacting with user-provided Mappers and Reducers.
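A custom value type following the pattern described above might look like the sketch below. To keep it runnable without Hadoop on the classpath, it declares a local stand-in interface, WritableLike, with the same two methods as org.apache.hadoop.io.Writable; in a real job the class would implement Hadoop's interface instead. PageHits, its fields, and roundTrip are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in for org.apache.hadoop.io.Writable (assumption: same method shapes).
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical value type: a page URL and its hit count.
final class PageHits implements WritableLike {
    private String url = "";
    private long hits;

    // No-arg constructor so an empty instance can be created before readFields().
    PageHits() { }

    PageHits(String url, long hits) {
        this.url = url;
        this.hits = hits;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeLong(hits);
    }

    // Fields must be read in exactly the order write() emitted them.
    @Override
    public void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        hits = in.readLong();
    }

    // The static read(DataInput) factory mentioned in the explanation.
    static PageHits read(DataInput in) throws IOException {
        PageHits p = new PageHits();
        p.readFields(in);
        return p;
    }

    String url() { return url; }
    long hits() { return hits; }

    // Serializes to a byte array and reads the bytes back into a new instance.
    static PageHits roundTrip(PageHits original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PageHits copy = roundTrip(new PageHits("/index.html", 42L));
        System.out.println(copy.url() + " " + copy.hits());
    }
}
```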

Question : In MapReduce jobs, all keys must implement which of the following interfaces?

1. Comparable

2. Writable

4.

Explanation: Comparable: "This interface imposes a total ordering on the objects of each class that implements it. This ordering is referred to
as the class's natural ordering, and the class's compareTo method is referred to as its natural comparison method."

For example, if one adds two keys a and b such that (!a.equals(b) && a.compareTo(b) == 0) to a sorted set that does not use an explicit comparator,
the second add operation returns false (and the size of the sorted set does not increase) because a and b are equivalent from the sorted set's perspective.

Virtually all Java core classes that implement Comparable have natural orderings that are consistent with equals. One exception is java.math.BigDecimal,
whose natural ordering equates BigDecimal objects with equal values and different precisions (such as 4.0 and 4.00).
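The BigDecimal case can be reproduced with the standard library alone. The sketch below adds 4.0 and 4.00 to a sorted set (which uses compareTo) and to a hash set (which uses equals and hashCode). NaturalOrderingDemo and addBoth are names made up for this example.

```java
import java.math.BigDecimal;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

final class NaturalOrderingDemo {
    // Adds 4.0 and then 4.00; returns whether the second add changed the set.
    static boolean addBoth(Set<BigDecimal> set) {
        set.add(new BigDecimal("4.0"));
        return set.add(new BigDecimal("4.00"));
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("4.0");
        BigDecimal b = new BigDecimal("4.00");

        // equals() compares value and scale; compareTo() compares value only.
        System.out.println("equals: " + a.equals(b));
        System.out.println("compareTo: " + a.compareTo(b));

        // TreeSet treats the two as the same element; HashSet keeps both.
        System.out.println("TreeSet second add: " + addBoth(new TreeSet<BigDecimal>()));
        System.out.println("HashSet second add: " + addBoth(new HashSet<BigDecimal>()));
    }
}
```

The same inconsistency matters for MapReduce keys: if compareTo treats two keys as equal while equals does not, sorting and grouping can disagree about which records belong together.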

Question : Put the following Java classes in the order in which they act when a mapper is executed.
A. InputKey
B. InputSplit
C. RecordReader
D. InputFormat

1. D,B,C,A
2. A,B,C,D
4. A,D,C,B

Explanation: InputFormat : Hadoop relies on the input format of the job to do three things:
1. Validate the input configuration for the job (i.e., checking that the data is there).
2. Split the input blocks and files into logical chunks of type InputSplit, each of which is assigned to a map task for processing.
3. Create the RecordReader implementation that is used to turn the raw InputSplit into key/value pairs.

InputSplit : Represents the data to be processed by an individual Mapper.
Typically, it presents a byte-oriented view of the input, and it is the responsibility of the job's RecordReader to process this and present a
record-oriented view.
RecordReader : A RecordReader uses the data within the boundaries created by the input split to generate key/value pairs. In the context of file-based input,
the "start" is the byte position in the file where the RecordReader should start generating key/value pairs. The "end" is where it should stop reading
records.
These are not hard boundaries as far as the API is concerned: there is nothing stopping a developer from reading the entire file for each map task. While
reading the entire file is not advised, reading outside of the boundaries is often necessary to ensure that a complete record is generated.
Related Questions

Question :

Which of the following are correct ways to disable speculative execution?

A. In Command Line
bin/hadoop jar -Dmapreduce.map.speculative=false \
-D mapreduce.reduce.speculative=false jar>

B. In JobConfiguration:
jobconf.setBoolean("mapreduce.map.speculative", false);
jobconf.setBoolean("mapreduce.reduce.speculative", false);

C. In JobConfiguration:
jobconf.setBoolean("mapreduce.speculative", false);

D. In JobConfiguration:
jobconf.setBoolean("mapreduce.mapred.speculative", false);

1. A,B
2. B,C
4. A,D

Question :

You have written a word count MapReduce program for a big file, almost 5 TB in size. After the job completes,
you want to create a single file from the output of all the reducers. Which is the best option, assuming all the output files of
the job are written to the following output directory?

/data/weblogs/weblogs_md5_groups.bcp

1. hadoop fs -getmerge weblogs_md5_groups.bcp /data/weblogs/weblogs_md5_groups.bcp
2. hadoop fs -getmerge /data/weblogs/weblogs_md5_groups.bcp/*
4. hadoop fs -getmerge /data/weblogs/weblogs_md5_groups.bcp weblogs_md5_groups.bcp

Question : Which of the following data formats can be analyzed by Hadoop?

1. XML
2. CSV
4. Text
5. All of the above

Question : Which programming languages are supported by Hadoop?

1. Java and scripting languages
2. Any programming language
4. C, COBOL, and Java

Question : How does Hadoop process large volumes of data?

1. Hadoop uses a lot of machines in parallel. This optimizes data processing.
2. Hadoop was specifically designed to process large amounts of data by taking advantage of MPP hardware
4. Hadoop uses sophisticated caching techniques on the NameNode to speed up data processing

Question : What are sequence files and why are they important?

1. Sequence files are binary format files that are compressed and splittable.
They are often used in high-performance MapReduce jobs
2. Sequence files are a type of file in the Hadoop framework that allows data to be sorted
4. All of above