
MapR (HPE) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : You are running a MapReduce job, and inside the Mapper you want to get the name of the file that is currently being processed. Which is the correct code snippet to fetch the file name in the Mapper code?

1. String fileName = ((FileStatus) context.getFileStatus()).getPath().getName();
2. String fileName = context.getPath().getName();
3. String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
4. All of the above


Correct Answer : 3

How can you get the name of the input file within a mapper? Multiple input files may be stored in the input directory, each mapper may read a different file, and you need to know which file the mapper has read.
First you need to get the input split; using the newer mapreduce API this is done as follows:
context.getInputSplit();
But in order to get the file path and the file name, you first need to cast the result to FileSplit.
So, to get the input file path you can do the following:
Path filePath = ((FileSplit) context.getInputSplit()).getPath();
String filePathString = ((FileSplit) context.getInputSplit()).getPath().toString();
Similarly, to get the file name, you can just call getName(), like this:
String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
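
Putting this together, a minimal Mapper that captures the file name once per split could look like the sketch below (newer mapreduce API; the class name and output types are only illustrative):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Sketch: capture the name of the file backing this mapper's split in setup(),
// then reuse it for every record. Assumes a FileInputFormat-based input format,
// so that getInputSplit() really returns a FileSplit.
public class FileNameMapper extends Mapper<LongWritable, Text, Text, Text> {

    private String fileName;

    @Override
    protected void setup(Context context) {
        fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit the file name alongside each record, just to show it is available here.
        context.write(new Text(fileName), value);
    }
}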





Question : In a MapReduce word count job, you know your input file contains at most three different words, and after the job completes you want one output file to be created for each reducer. Hence, you have written a custom partitioner. Which is the correct code snippet for this requirement?
1. A
2. B
3. C

Correct Answer : 3

By default, Hadoop applies its own internal partitioning logic (the HashPartitioner) to the map output keys to decide which reducer each key is sent to. If you want a custom partitioner, you have to override that default behavior with your own logic/algorithm. Unless you know exactly how your keys will vary, this logic won't be generic; you have to work out the partitioning based on the variations in your keys.
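
Since the actual snippets A, B and C are not reproduced here, the following is only a rough sketch of such a partitioner, assuming the three words are (hypothetically) "apple", "banana" and "cherry" and the job runs with three reducers:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Sketch: send each of the three known words to its own reducer, so that each
// reducer writes exactly one output file containing one word's total count.
public class ThreeWordPartitioner extends Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        if (numReduceTasks == 0) {
            return 0;                          // map-only job, nothing to partition
        }
        String word = key.toString();
        if (word.equals("apple")) {
            return 0 % numReduceTasks;
        }
        if (word.equals("banana")) {
            return 1 % numReduceTasks;
        }
        return 2 % numReduceTasks;             // "cherry" or any unexpected word
    }
}

In the driver you would then call job.setNumReduceTasks(3) and job.setPartitionerClass(ThreeWordPartitioner.class) so that each word ends up in its own output file.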






Question : The input file size is given (in KB) and the block size is given (in MB). What is the size of the intermediate data produced?

1. 47KB
2. 83KB
3.
4. Job Fails

Correct Answer : 1
As no other information is given, assuming all the words in the file are emitted, the intermediate data size should be 47KB.




Related Questions


Question : A combiner reduces:

1. The number of values across different keys in the iterator supplied to a single reduce method call.

2. The amount of intermediate data that must be transferred between the mapper and reducer.

3. The number of input files a mapper must process.

4. The number of output files a reducer must produce.


Question : Which two of the following statements are true about HDFS? (Choose two answers.)

A. An HDFS file that is larger than dfs.blocksize is split into blocks
B. Blocks are replicated to multiple datanodes
C. HDFS works best when storing a large number of relatively small files
D. Block sizes for all files must be the same size
1. A,B
2. B,C
3. C,D
4. A,D


Question : You want to populate an associative array in order to perform a map-side join. You've decided to put this information in a text file, place that file into the DistributedCache, and read it in your Mapper before any records are processed. Identify which method in the Mapper you should use to implement code for reading the file and populating the associative array. (A code sketch of this pattern follows the answer options.)

1. combine

2. map

3. init

4. configure
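
With the old mapred API, the pattern this question alludes to, loading the lookup file before any records are processed, looks roughly like the sketch below (class name, field names and the tab-separated file format are hypothetical):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Sketch: populate the associative array once, before any map() calls, in configure().
public class MapSideJoinMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> lookup = new HashMap<String, String>();

    @Override
    public void configure(JobConf job) {
        try {
            // Local copies of the files that were added to the DistributedCache in the driver.
            Path[] cached = DistributedCache.getLocalCacheFiles(job);
            BufferedReader reader = new BufferedReader(new FileReader(cached[0].toString()));
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\t", 2);   // hypothetical key<TAB>value format
                if (parts.length == 2) {
                    lookup.put(parts[0], parts[1]);
                }
            }
            reader.close();
        } catch (IOException e) {
            throw new RuntimeException("Could not load cached lookup file", e);
        }
    }

    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output,
                    Reporter reporter) throws IOException {
        // Join each input record against the in-memory lookup map (details omitted).
    }
}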


Question : What is the disadvantage of using multiple reducers with the default HashPartitioner and distributing your workload across your cluster?

1. You will not be able to compress the intermediate data.

2. You will no longer be able to take advantage of a Combiner.

3. By using multiple reducers with the default HashPartitioner, output files may not be in globally sorted order.

4. There are no concerns with this approach. It is always advisable to use multiple reducers.


Question : You wrote a map function that throws a runtime exception when it encounters a control character in the input data. The input supplied to your mapper contains twelve such characters in total, spread across five file splits. The first four file splits each have two control characters and the last split has four control characters. Identify the number of failed task attempts you can expect when you run the job with mapred.max.map.attempts set to 4:

1. You will have forty-eight failed task attempts

2. You will have seventeen failed task attempts

3. You will have five failed task attempts

4. You will have twelve failed task attempts

5. You will have twenty failed task attempts


Question : To process input key-value pairs, your mapper needs to load a MB data file into memory. What is the best way to accomplish this? (See the sketch after the answer options.)

1. Serialize the data file, insert it into the JobConf object, and read the data into memory in the setup method of the mapper.

2. Place the data file in the DistributedCache and read the data into memory in the map method of the mapper.

3. Place the data file in the DataCache and read the data into memory in the configure method of the mapper.

4. Place the data file in the DistributedCache and read the data into memory in the setup method of the mapper.
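
With the newer mapreduce API, the corresponding pattern, loading a cached file into memory before any records are processed, looks roughly like the sketch below (class name, field names and the tab-separated file format are hypothetical; the file itself would be added in the driver with job.addCacheFile(...)):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch: load a small cached data file into memory once, in setup(), and use it in map().
public class LookupMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> lookup = new HashMap<String, String>();

    @Override
    protected void setup(Context context) throws IOException {
        URI[] cacheFiles = context.getCacheFiles();   // files added via job.addCacheFile(...)
        if (cacheFiles != null && cacheFiles.length > 0) {
            // Cached files are localized (and, on YARN, symlinked) into the task's working
            // directory, so they can be opened by their base name.
            String localName = new Path(cacheFiles[0].getPath()).getName();
            BufferedReader reader = new BufferedReader(new FileReader(localName));
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\t", 2);   // hypothetical key<TAB>value format
                if (parts.length == 2) {
                    lookup.put(parts[0], parts[1]);
                }
            }
            reader.close();
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Use the in-memory lookup map while processing each record (details omitted).
    }
}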