MapR (HP) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)

Question : For HadoopExam.com user profiles you need to analyze roughly ,, JPEG files of all the.
Each file is no more than 3 kB. Because your Hadoop cluster isn't optimized for storing and processing many small files,
you decide to group the files into a single archive. The toolkit that will be used to process
the files is written in Ruby and requires that it be run with administrator privileges.
Which of the following file formats should you select to build your archive?

1. TIFF
2. SequenceFiles
3. JSON
4. MPEG
5. Avro

Ans : 5

Exp : The two formats that are best suited to merging small files into larger archives for processing in Hadoop are Avro and SequenceFiles. Avro has Ruby bindings; SequenceFiles are
only supported in Java.

JSON, TIFF, and MPEG are not appropriate formats for archives. JSON is also not an appropriate format for image data.

Question : SequenceFiles are flat files consisting of binary key/value pairs. SequenceFile provides SequenceFile.Writer, SequenceFile.Reader and SequenceFile.Sorter classes for writing,
reading and sorting respectively. There are three SequenceFile Writers, based on the SequenceFile.CompressionType used to compress key/value pairs.
You have created a SequenceFile (MAIN.PROFILE.log) with custom key and value types. What command displays the contents of a
SequenceFile named MAIN.PROFILE.log in your terminal in human-readable format?

1. hadoop fs -decrypt MAIN.PROFILE.log
2. hadoop fs -text MAIN.PROFILE.log
4. hadoop fs -encode MAIN.PROFILE.log

Correct Answer : 2

Explanation: SequenceFiles are flat files consisting of binary key/value pairs. SequenceFile provides SequenceFile.Writer, SequenceFile.Reader and SequenceFile.Sorter classes for writing,
reading and sorting respectively.

There are three SequenceFile Writers based on the SequenceFile.CompressionType used to compress key/value pairs:
Writer : Uncompressed records.
RecordCompressWriter : Record-compressed files, only compress values.
BlockCompressWriter : Block-compressed files, both keys & values are collected in 'blocks' separately and compressed. The size of the 'block' is configurable.

The actual compression algorithm used to compress keys and/or values can be specified by using the appropriate CompressionCodec. The recommended way is to use the static createWriter
methods provided by SequenceFile to choose the preferred format. The SequenceFile.Reader acts as the bridge and can read any of the above SequenceFile formats.

SequenceFile Formats
Essentially there are 3 different formats for SequenceFiles depending on the CompressionType specified. All of them share a common header described below.

SequenceFile Header
version - 3 bytes of magic header SEQ, followed by 1 byte of actual version number (e.g. SEQ4 or SEQ6)
keyClassName - key class
valueClassName - value class
compression - A boolean which specifies if compression is turned on for keys/values in this file.
blockCompression - A boolean which specifies if block-compression is turned on for keys/values in this file.
compression codec - CompressionCodec class which is used for compression of keys and/or values (if compression is enabled).
metadata - SequenceFile.Metadata for this file.
sync - A sync marker to denote end of the header.

Uncompressed SequenceFile Format
Header
Record : Record length, Key length, Key, Value
A sync-marker every few 100 bytes or so.

A SequenceFile contains the names of the classes used for the key and value as part of its header. hadoop fs -text reads the records and
calls the toString() method of the relevant class to display human-readable output on the console. The hadoop fs -cat command would display the raw data from the file, which is not
human-readable. hadoop fs -get retrieves the file from HDFS and places it on the local disk, which is not what was required. The other options are syntactically incorrect.
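
The version field of the header described above can be illustrated with a small plain-Java sketch. This is only an illustration under stated assumptions: the class name SeqHeaderCheck and the method sequenceFileVersion are hypothetical helpers, not part of the Hadoop API, and the code only inspects the first four bytes (the SEQ magic plus the version byte) rather than parsing the whole header.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; it is not part of the Hadoop API.
final class SeqHeaderCheck {

    /**
     * Returns the version byte if the data starts with the 3-byte "SEQ" magic,
     * or -1 if the data is too short or does not start with that magic.
     */
    static int sequenceFileVersion(byte[] firstBytes) {
        if (firstBytes == null || firstBytes.length < 4) {
            return -1;
        }
        if (firstBytes[0] != 'S' || firstBytes[1] != 'E' || firstBytes[2] != 'Q') {
            return -1;
        }
        // The fourth byte is the raw version number, not an ASCII digit.
        return firstBytes[3] & 0xFF;
    }

    public static void main(String[] args) {
        // Made-up sample bytes: the magic, version 6, then two filler bytes.
        byte[] header = {'S', 'E', 'Q', 6, 0, 0};
        System.out.println(sequenceFileVersion(header));

        byte[] text = "plain text".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sequenceFileVersion(text));
    }
}
```

In practice the first few bytes would come from the start of the file; here they are hard-coded so the sketch runs without a cluster.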

Question : Speculative execution is an optimization technique where a computer system performs
some task that may not actually be needed. The main idea is to do work before it is known whether that work will be needed at all,
so as to prevent a delay that would have to be incurred by doing the work after it is known whether it is needed. If it turns out the work was not needed
after all, any changes made by the work are reverted and the results are ignored. In an ETL MapReduce job, Mappers process the data
and the Reducers use DBOutputFormat to push the results directly to an Oracle database. Select the correct statement that applies to
speculative execution.

1. Disable speculative execution for the data insert job
2. Enable speculative execution for the data insert job
4. Configure only single mapper for the data insert job

Correct Answer : 1

Explanation: I usually disable speculative execution for MapReduce tasks when I write to an RDBMS from a Hive user-defined table function.

set mapred.map.tasks.speculative.execution=false;
set mapred.reduce.tasks.speculative.execution=false;
set hive.mapred.reduce.tasks.speculative.execution=false;

If you tune mapred.reduce.tasks, you can also control the number of concurrent RDBMS sessions. It is also good to use batch mode and control the commits. If we do not disable speculative
execution, it is possible that multiple instances of a given Reducer could run, which would result in more data than was intended being inserted into the target RDBMS. None of the
other options presented is required; although you need the database driver on the client machine if you plan to connect to the RDBMS from that client, it does not need to be present.
It is certainly not necessary for yours to be the only job running on the cluster, and the values of dfs.datanode.failed.volumes.tolerated and the block size of the input data are
irrelevant. Finally, the RDBMS does not need to allow passwordless login.
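
The duplicate-insert risk can be sketched with a toy simulation in plain Java. This is a simplified illustration only, assuming each reducer attempt writes its rows straight into the target table with nothing to discard a losing attempt; no Hadoop or JDBC classes are involved, and the class SpeculativeInsertDemo, its methods and the sample rows are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation for illustration; no Hadoop or JDBC classes are used.
final class SpeculativeInsertDemo {

    /** Simulates one attempt of a reduce task inserting its rows directly into the table. */
    static void runAttempt(List<String> table, List<String> rows) {
        table.addAll(rows);
    }

    /** Returns how many rows the table holds after the given number of attempts of one task. */
    static int rowsAfter(int attempts) {
        List<String> table = new ArrayList<>();
        List<String> rows = Arrays.asList("k1=2", "k2=5"); // made-up reducer output
        for (int i = 0; i < attempts; i++) {
            runAttempt(table, rows);
        }
        return table.size();
    }

    public static void main(String[] args) {
        // One attempt: speculative execution disabled.
        System.out.println("single attempt rows: " + rowsAfter(1));
        // Two attempts of the same task: the same rows are inserted twice.
        System.out.println("two attempts rows:   " + rowsAfter(2));
    }
}
```

With one attempt the table holds the two intended rows; with two attempts of the same task it holds four, which is the "more data than was intended" case described above.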

Question : Apache MRUnit is a Java library that helps developers unit test Apache Hadoop MapReduce jobs.
The MRUnit testing framework is based on JUnit and can test MapReduce programs written for the 0.20, 0.23.x, 1.0.x and 2.x versions of Hadoop.
You have a Reducer which simply sums up the values for any given key. You write a unit test in MRUnit to test the Reducer, with this code:
@Test
public void testETLReducer() {
    List<IntWritable> values = new ArrayList<IntWritable>();
    values.add(new IntWritable(1));
    values.add(new IntWritable(1));
    List<IntWritable> values2 = new ArrayList<IntWritable>();
    values2.add(new IntWritable(1));
    values2.add(new IntWritable(1));
    reduceDriver.withInput(new LongWritable("5673"), values);
    reduceDriver.withInput(new LongWritable("109098"), values2);
    reduceDriver.withOutput(new LongWritable("109098"), new IntWritable(2));
    reduceDriver.runTest();
}
What is the result?

1. The test will pass with warning and error
2. The test will pass with no warning and error
4. Code will not compile

Explanation: MRUnit supports two styles of testing. In the first, you tell the framework both the input and the expected output values and let the framework do the assertions; the second
is the more traditional approach, where you do the assertions yourself. Let's write a test using the first approach.

Example :
@Test
public void testMapReduce() {
    mapReduceDriver.withInput(new LongWritable(), new Text(
        "655209;1;796764372490213;804422938115889;6"));
    List<IntWritable> values = new ArrayList<IntWritable>();
    values.add(new IntWritable(1));
    values.add(new IntWritable(1));
    mapReduceDriver.withOutput(new Text("6"), new IntWritable(2));
    mapReduceDriver.runTest();
}

When testing a Reducer using MRUnit, you should only pass the Reducer a single key and list of
values. In this case, we use the withInput() method twice, but only the second call will actually be used; the first will be overridden by the second. If you want to test the
Reducer with two inputs, you would have to write two tests.

Testing a Hadoop job requires a lot of effort not related to the job. You must configure it to run locally, create a
sample input file, run the job on your sample input, and then compare the result to an expected output file. This not only takes time, but also makes your tests run very slowly due to all the file
I/O. MRUnit is a unit test library designed to facilitate easy integration between your MapReduce development process and standard development and testing tools such as JUnit. With
MRUnit, there are no test files to create, no configuration parameters to change, and generally less test code. You can cut the clutter and focus on the meat of your tests.
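
The second, do-the-assertion-yourself style can be sketched without Hadoop or MRUnit on the classpath. The following is a minimal sketch, assuming the reducer logic is factored into a plain method; SumReducerSketch and its reduce method are hypothetical stand-ins, not the MRUnit or Hadoop API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a summing Reducer; no Hadoop or MRUnit types are used.
final class SumReducerSketch {

    /** Sums all values that were grouped under a single key. */
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Traditional style: run the logic, then assert on the result yourself.
        int actual = reduce(Arrays.asList(1, 1));
        if (actual != 2) {
            throw new AssertionError("expected 2 but got " + actual);
        }
        System.out.println("sum = " + actual);
    }
}
```

Keeping the summing logic in a plain method like this is one way to test it quickly; a real Reducer test would still use the Hadoop Writable types and a driver such as the one in the question.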

Watch the training Module 21 from http://hadoopexam.com/index.html/#hadoop-training

Related Questions

Question : You need to create a job that does frequency analysis on input data. You will do this by writing a Mapper that uses TextInputFormat and splits each value (a line of
text from an input file) into individual characters. For each one of these characters, you will emit the character as a key and an IntWritable as the value. As this will
produce proportionally more intermediate data than input data, which two resources should you expect to be bottlenecks?

1. Processor and network I/O

2. Disk I/O and network I/O

3. Processor and RAM

4. Processor and disk I/O

Question : You use the hadoop fs -put command to write a MB file using an HDFS block size of MB. Just after this command has finished writing MB of this file, what
would another user see when trying to access this file?

1. They would see Hadoop throw a ConcurrentFileAccessException when they try to access this file.

2. They would see the current state of the file, up to the last bit written by the command.

3. They would see the current state of the file through the last completed block.

4. They would see no content until the whole file is written and closed.

Question : Which statement is true?

1. Output of the reducer could be zero
2. Output of the reducer is written to the HDFS
3. In practice, the reducer usually emits a single key-value pair for each input key
4. All of the above

Question : Which of the below is correct with regard to MapReduce performance and chunk size on MapR-FS?

1. Smaller chunk sizes result in lower performance.

2. Smaller chunk sizes result in higher performance.

3. Larger chunk sizes result in lower performance.

4. Larger chunk sizes always result in lower performance.

Question : You have created a directory in MapR-FS with a chunk size of 256 MB and written a file called "HadoopExam.log" in the directory, whose size is measured in TB. While writing a
MapReduce job you realized that it is not performing well, and you wish to change the chunk size from 256MB to another size. Select the correct option which applies.

1. For better job performance, change the block size from 256MB to 300MB (maximum possible block size)

2. For better job performance, change the block size from 256MB to 64MB (minimum possible block size)

3. You cannot change the block size once the file is written.

4. Block size does not impact the performance of the MapReduce job.

Question : Select the correct statement, regarding MapR-FS compression for files.

1. Compression is applied automatically to uncompressed files unless you turn compression off
2. Compressed data uses less bandwidth on the network than uncompressed data.
3. Compressed data uses less disk space.
4. Compressed data uses more metadata.

1. 1,2

2. 1,3,4

3. 1,2,3

4. 1,2,4