
MapR (HP) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : When you change the size of the InputSplit via configuration, what is the impact?


1. It will change the number of reducers for a particular job processing this file.

2. It will change the number of mappers for a particular job processing this file.

3. …

4. 1,2

5. 1,2,3


Correct Answer : 2
Explanation: Setting the InputSplit size smaller or larger than the block size changes the number of mappers launched for the job, because one mapper is instantiated per input split.
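
For illustration, here is a minimal driver sketch (the class and job names are hypothetical) that caps the split size using the standard FileInputFormat helpers; the equivalent configuration property is mapreduce.input.fileinputformat.split.maxsize.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "split-size-demo");
        // Cap each input split at 32 MB: a 128 MB block then yields
        // roughly four splits, and therefore four map tasks.
        FileInputFormat.setMaxInputSplitSize(job, 32L * 1024 * 1024);
        // Raising the minimum instead merges data into fewer, larger
        // splits, i.e. fewer mappers:
        // FileInputFormat.setMinInputSplitSize(job, 128L * 1024 * 1024);
    }
}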





Question : Select the correct statement regarding InputSplit


1. The last record of an InputSplit will always be a complete record

2. The last record of an InputSplit may be complete or incomplete

3. …

4. 1,2

5. 1,2,3


Correct Answer : 2
Explanation: A block is the physical representation of data; a split is the logical representation of the data in a block.

Block and split sizes can be changed through configuration properties.

A map task reads data from a block through a split, i.e. the split acts as a broker between the block and the mapper.

Consider two blocks:

Block 1

aa bb cc dd ee ff gg hh ii jj

Block 2

ww ee yy uu oo ii oo pp kk ll nn

A map task reading block 1 covers aa through jj but has no way to continue into block 2, because one block knows nothing about another. The split solves this: it forms a logical grouping of block 1 and block 2 as a single unit, from which the InputFormat and RecordReader produce offset (key) and line (value) pairs and hand them to the mapper for further processing.

If your resources are limited and you want to limit the number of maps, you can increase the split size. For example, if we have 640 MB in 10 blocks of 64 MB each and resources are limited, setting the split size to 128 MB forms logical groupings of 128 MB, and only 5 maps are executed, each over 128 MB. The last record of an input split may be incomplete, as may the first record of an input split; processing whole records is the responsibility of the RecordReader.

If splitting is disabled (for example, the input format's isSplitable() returns false), the whole file forms one input split and is processed by a single map task, which takes much longer when the file is big.
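
To make the non-splittable case concrete, a minimal sketch (the class name is hypothetical) of an input format that disables splitting, so each input file becomes exactly one split handled by a single mapper:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // one split per file, however many blocks it spans
    }
}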





Question : What is the purpose of "CombineFileSplit"?


1. combines multiple files into a single split

2. combines multiple splits into a single split

3. …

4. 1,2

5. 1,2,3


Correct Answer : 1
Explanation: A CombineFileSplit represents a sub-collection of input files. Unlike FileSplit, it does not represent a split of a single file, but a split of the input files into smaller sets. A split may contain blocks from different files, but all the blocks in the same split are typically local to some rack. CombineFileSplit can be used to implement RecordReaders that read one record per file.
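
For context, a small driver sketch (class and job names are hypothetical) using CombineTextInputFormat, which builds CombineFileSplits by packing many small files into each split:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;

public class CombineSmallFilesDemo {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "combine-demo");
        // Many small files are packed into a single split (one mapper),
        // up to the configured cap per split.
        job.setInputFormatClass(CombineTextInputFormat.class);
        CombineTextInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
    }
}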



Related Questions


Question :

Map<String, List<String>> wordCountMap = new HashMap<String, List<String>>(); // holds each word as a key, with all occurrences of that word in the list

In a word count Mapper class, you are emitting key-value pairs as

Case 1 : context.write(new Text(word), new IntWritable(1))

and

Case 2 : context.write(new Text(word), new IntWritable(wordCountMap.get(word).size()))

Select the correct statement about the above code snippets


1. In both cases the network bandwidth consumption would be the same
2. In Case 1 the network bandwidth consumption would be lower
3. …
4. Cannot be determined
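
Case 2 is the in-mapper combining pattern: emitting one aggregated count per distinct word pushes far fewer key-value pairs through the shuffle than emitting (word, 1) per token. A minimal sketch of that pattern (the class name is hypothetical; a plain count map replaces the List-per-word structure):

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class AggregatingWordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final Map<String, Integer> counts = new HashMap<String, Integer>();

    @Override
    protected void map(LongWritable key, Text value, Context context) {
        // Aggregate in memory instead of writing (word, 1) per token.
        for (String word : value.toString().split("\\s+")) {
            if (!word.isEmpty()) {
                Integer c = counts.get(word);
                counts.put(word, c == null ? 1 : c + 1);
            }
        }
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        // Emit one aggregated pair per distinct word seen by this task.
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            context.write(new Text(e.getKey()), new IntWritable(e.getValue()));
        }
    }
}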


Question : Suppose you have the following file in an HDFS directory
/myapp/map.zip

And you use the following API call to add this file to the DistributedCache:

JobConf job = new JobConf();
DistributedCache.addCacheArchive(new URI("/myapp/map.zip"), job);

Which is the best place to read this file in a MapReduce job?

1. Inside the map() method of the Mapper
2. You can read this file at any point in the Mapper code, as needed
3. …
4. All of the above statements are correct
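
A common pattern is to resolve the cached archive once per task in setup() rather than on every map() call; a sketch (the class name is hypothetical) using the classic DistributedCache accessor:

import java.io.IOException;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheAwareMapper extends Mapper<LongWritable, Text, Text, Text> {

    private Path mapArchive; // local, already-unpacked copy of map.zip

    @Override
    protected void setup(Context context) throws IOException {
        // Resolve the local path of the cached archive once per task.
        Path[] archives =
            DistributedCache.getLocalCacheArchives(context.getConfiguration());
        if (archives != null && archives.length > 0) {
            mapArchive = archives[0];
        }
    }

    // map() would then read lookup data loaded from mapArchive.
}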


Question : You have added the below files to the DistributedCache

JobConf job = new JobConf();
DistributedCache.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"),
job);
DistributedCache.addCacheArchive(new URI("/myapp/map.zip"), job);
DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytar.tar"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytgz.tgz"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytargz.tar.gz"), job);

Which of the following is a correct way to get all the paths of the distributed cache files in an array?


1. Iterate over the DistributedCache instance in the Mapper and add all the cached file paths to an array.
2. There is a direct method available: DistributedCache.getAllFilePath()
3. …
4. All of the above
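
For reference, DistributedCache does not expose a single getAllFilePath() method; it has separate accessors for files and archives. A hypothetical helper that gathers both kinds of local paths into one list might look like this:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;

public class CachePaths {
    // Collect local paths of both cached plain files and cached archives.
    public static List<Path> allCachedPaths(Configuration conf)
            throws IOException {
        List<Path> all = new ArrayList<Path>();
        Path[] files = DistributedCache.getLocalCacheFiles(conf);
        Path[] archives = DistributedCache.getLocalCacheArchives(conf);
        if (files != null) { for (Path p : files) { all.add(p); } }
        if (archives != null) { for (Path p : archives) { all.add(p); } }
        return all;
    }
}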



Question :

Which of the given snippets is a correct implementation of the Mapper for the word count example?
1. A
2. B
3. …
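
The snippets A and B are not reproduced in this dump; for reference, a canonical word count Mapper (this sketch is illustrative, not one of the original options) looks like this:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit (token, 1) for every token in the input line.
        StringTokenizer it = new StringTokenizer(value.toString());
        while (it.hasMoreTokens()) {
            word.set(it.nextToken());
            context.write(word, ONE);
        }
    }
}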


Question : Select the correct statement about reading/writing data in an RDBMS using MapReduce
1. In order to use DBInputFormat you need to write a class that deserializes the columns from the database record into individual data fields to work with
2. The DBOutputFormat writes to the database by generating a set of INSERT statements in each reducer
3. …
4. If you want to export a very large volume of data, you may be better off generating the INSERT statements into a text file, and then using a bulk data import tool provided by your database to do the database import.
5. All of the above
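
To ground option 1, a hypothetical DBWritable record class (the class and column names are illustrative) that deserializes a database row into fields and serializes the fields back for the generated INSERT statements:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

public class ColorRecord implements Writable, DBWritable {
    private int id;
    private String color;
    private int width;

    // DBInputFormat side: populate fields from a database row.
    public void readFields(ResultSet rs) throws SQLException {
        id = rs.getInt(1);
        color = rs.getString(2);
        width = rs.getInt(3);
    }

    // DBOutputFormat side: bind fields into the generated INSERT.
    public void write(PreparedStatement stmt) throws SQLException {
        stmt.setInt(1, id);
        stmt.setString(2, color);
        stmt.setInt(3, width);
    }

    // Writable side, used when records move between map and reduce.
    public void readFields(DataInput in) throws IOException {
        id = in.readInt();
        color = in.readUTF();
        width = in.readInt();
    }

    public void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeUTF(color);
        out.writeInt(width);
    }
}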


Question :

You have the following data in a Hive table

ID:INT,COLOR:TEXT,WIDTH:INT
1,green,190
2,blue,300
3,yellow,299
4,blue,199
5,green,199
6,yellow,299
7,green,799
8,red,800

Select the correct MapReduce program that can produce output similar to the Hive query below.

SELECT `(green|blue)?+.+` FROM table;

1. 1
2. 2
3. …
4. 4