Question : Arrange the following in the correct order of execution: A. Call to the main() method B. Instantiation of a new Configuration object C. Calling the ToolRunner.run() static method D. job.waitForCompletion()
Correct Answer : A, B, C, D. Explanation: The Driver class first checks the invocation of the command (checks the count of the command-line arguments provided).
It sets values for the job, including the driver, mapper, and reducer classes used. In the Driver class, we also define the types for the job's output key and value as Text and FloatWritable respectively. If the mapper and reducer classes do NOT use the same output key and value types, we must specify the mapper's types explicitly. In this case, the output value type of the mapper is Text, while the output value type of the reducer is FloatWritable.
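As a sketch of that setup (class names like AverageDriver, AverageMapper, and AverageReducer are illustrative, not from the source), the driver's job configuration might look like:

```java
// Sketch of the driver's job setup for the Text/FloatWritable case
// described above; all class names here are illustrative.
Job job = Job.getInstance(conf, "average");
job.setJarByClass(AverageDriver.class);
job.setMapperClass(AverageMapper.class);
job.setReducerClass(AverageReducer.class);

// Reducer (final job) output types.
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(FloatWritable.class);

// The mapper's output value type (Text) differs from the reducer's
// (FloatWritable), so it must be set explicitly.
job.setMapOutputValueClass(Text.class);
```

Without the setMapOutputValueClass() call, the framework would assume the mapper emits the job's output value type and fail at runtime with a type mismatch.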
There are two ways to launch the job: synchronously and asynchronously. Calling job.waitForCompletion() launches the job synchronously: the driver code blocks at this line, waiting for the job to complete. Passing the true argument tells the framework to write verbose progress output to the controlling terminal.
The main() method is the entry point for the driver. In this method, we instantiate a new Configuration object for the job. We then call the ToolRunner static run() method.
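The entry point described above can be sketched as follows, assuming the driver implements Hadoop's Tool interface (the AverageDriver name is illustrative):

```java
// Sketch of the driver's entry point: instantiate a Configuration,
// then hand control to ToolRunner.run(), which parses generic Hadoop
// options before invoking the driver's run() method.
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    int exitCode = ToolRunner.run(conf, new AverageDriver(), args);
    System.exit(exitCode);
}
```

Using ToolRunner rather than calling run() directly gives the job free support for standard command-line options such as -D property=value.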
You have to compile the three classes and place the compiled classes into a directory called "classes". Use the jar command to put the mapper and reducer classes into a jar file, whose path is included in the classpath when you build the driver. After you build the driver, the driver class is also added to the existing jar file.
Question : Sometimes, before running your MapReduce job, you configure the environment variable LD_LIBRARY_PATH. Why?
1. It defines a list of directories where your executables are located
2. It points to all the jars in the Hadoop distribution required to compile and run your MapReduce programs
Correct Answer : Explanation: HADOOP_HOME allows you to reference the value of the HADOOP_HOME variable when defining other variables. The LD_LIBRARY_PATH environment variable defines the path to the library files used by executables. These libraries are specifically compiled for the MapR distribution. Using Hadoop native libraries improves the performance of your MapReduce jobs by using compiled object code rather than Java bytecode.
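As a minimal illustration, the conventional location for the native libraries is $HADOOP_HOME/lib/native (verify the exact path for your distribution); prepending it to LD_LIBRARY_PATH looks like:

```shell
# Prepend the native-library directory to the search path the dynamic
# linker uses when loading compiled (.so) libraries, keeping any
# existing entries if the variable was already set.
export LD_LIBRARY_PATH="$HADOOP_HOME/lib/native${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$LD_LIBRARY_PATH"
```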
Explanation: The two formats that are best suited to merging small files into larger archives for processing in Hadoop are Avro and SequenceFiles. Avro has Ruby bindings; SequenceFiles are only supported in Java.
JSON, TIFF, and MPEG are not appropriate formats for archives. JSON is also not an appropriate format for image data.
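A common way to perform such a merge (a sketch, assuming Hadoop's SequenceFile API; the paths archive.seq and small-files/ are illustrative) is to write each small file into one SequenceFile, keyed by file name:

```java
// Sketch: pack a directory of small files into one SequenceFile,
// using the file name as the key and the raw bytes as the value.
Configuration conf = new Configuration();
Path out = new Path("archive.seq");
try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(out),
        SequenceFile.Writer.keyClass(Text.class),
        SequenceFile.Writer.valueClass(BytesWritable.class))) {
    for (File f : new File("small-files").listFiles()) {
        byte[] bytes = Files.readAllBytes(f.toPath());
        writer.append(new Text(f.getName()), new BytesWritable(bytes));
    }
}
```

This turns many small files (which each cost a NameNode entry and a map task) into a single splittable, optionally compressed file.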
Question : SequenceFiles are flat files consisting of binary key/value pairs. SequenceFile provides Writer, Reader and SequenceFile.Sorter classes for writing, reading and sorting respectively. There are three SequenceFile Writers based on the SequenceFile.CompressionType used to compress key/value pairs: You have created a SequenceFile (MAIN.PROFILE.log) with custom key and value types. What command displays the contents of a SequenceFile named MAIN.PROFILE.log in your terminal in human-readable format?
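The usual command for this is `hadoop fs -text`, which detects the SequenceFile header and prints the key/value pairs as text; since the file uses custom key and value types, their jar must be visible on the classpath (the jar path below is illustrative):

```shell
# Make the custom key/value classes visible, then dump the
# SequenceFile's key/value pairs in human-readable form.
export HADOOP_CLASSPATH=/path/to/custom-types.jar
hadoop fs -text MAIN.PROFILE.log
```

By contrast, `hadoop fs -cat` would print the raw binary contents, which are not human-readable.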
1. Disable speculative execution for the data insert job
2. Enable speculative execution for the data insert job
3. …
4. Configure only single mapper for the data insert job