
MapR (HP) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question :

Map<String, List<String>> wordCountMap = new HashMap<String, List<String>>(); // holds each word as a key; the list holds every occurrence of that word
In a word count Mapper class, you are emitting key-value pairs as

Case 1 : context.write(new Text(word), new IntWritable(1));

and

Case 2 : context.write(new Text(word), new IntWritable(wordCountMap.get(word).size()));

Select the correct statement about the example code snippets above.


1. In both cases the network bandwidth consumption would be the same
2. In Case 1 the network bandwidth consumption would be lower
3. In Case 2 the network bandwidth consumption would be lower
4. Cannot be determined

Correct Answer : 3

Explanation: In Case 2 you are counting the words locally in each Mapper; hence, less data is transferred over the network.
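
For illustration, the Case 2 idea is usually written as the "in-mapper combining" pattern: accumulate counts in a HashMap during map() and emit each distinct word once in cleanup(). Below is a minimal sketch using the new (mapreduce) API; the class name and tokenization are illustrative, not from the exam code:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// In-mapper combining: counts accumulate locally, and each distinct word is
// emitted once per task, so far fewer key-value pairs cross the network.
public class InMapperWordCount
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final Map<String, Integer> counts = new HashMap<String, Integer>();

    @Override
    protected void map(LongWritable key, Text value, Context context) {
        for (String word : value.toString().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            Integer current = counts.get(word);
            counts.put(word, current == null ? 1 : current + 1);
        }
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        // One write per distinct word with its local count (the Case 2 idea).
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            context.write(new Text(e.getKey()), new IntWritable(e.getValue()));
        }
    }
}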




Question : Suppose you have the following file in an HDFS directory:
/myapp/map.zip

And you use the following API call to add this file to the DistributedCache:

JobConf job = new JobConf();
DistributedCache.addCacheArchive(new URI("/myapp/map.zip"), job);

Which is the best place to read this file in a MapReduce job?

1. Inside the map() method of the Mapper
2. You can randomly read this file as needed in the Mapper code
3. Inside the configure() method of the Mapper
4. All of the above statements are correct

Correct Answer : 3

Explanation: You should read the file in the configure() method so that it is loaded only once per task, rather than each time the map() method is called.
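
A minimal sketch of that pattern using the old (mapred) API follows; the tab-separated layout of the lookup file is an assumption for illustration, not something given in the question:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// The cached file is parsed once per task in configure(); map() then does
// cheap in-memory lookups instead of re-reading the file per record.
public class CacheAwareMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> lookup = new HashMap<String, String>();

    @Override
    public void configure(JobConf job) {
        try {
            Path[] cacheFiles = DistributedCache.getLocalCacheFiles(job);
            if (cacheFiles == null) {
                return;
            }
            for (Path file : cacheFiles) {
                BufferedReader reader =
                        new BufferedReader(new FileReader(file.toString()));
                try {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        // Assumed layout for this sketch: "key<TAB>value"
                        String[] parts = line.split("\t", 2);
                        if (parts.length == 2) {
                            lookup.put(parts[0], parts[1]);
                        }
                    }
                } finally {
                    reader.close();
                }
            }
        } catch (IOException e) {
            throw new RuntimeException("Failed to load cached file", e);
        }
    }

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        String enriched = lookup.get(value.toString());
        if (enriched != null) {
            output.collect(value, new Text(enriched));
        }
    }
}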




Question : You have added the below files to the DistributedCache:

JobConf job = new JobConf();
DistributedCache.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"), job);
DistributedCache.addCacheArchive(new URI("/myapp/map.zip"), job);
DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytar.tar"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytgz.tgz"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytargz.tar.gz"), job);

Which of the following is the correct way to get all the DistributedCache file paths in an array?


1. Iterate over the DistributedCache instance in the Mapper and add all the cached file paths to an array.
2. There is a direct method available: DistributedCache.getAllFilePath()
3. Use DistributedCache.getLocalCacheFiles() and DistributedCache.getLocalCacheArchives(), which return the localized paths as Path arrays.
4. All of the above


Correct Answer : 3
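
For reference, the Hadoop API exposes the localized paths via DistributedCache.getLocalCacheFiles() and DistributedCache.getLocalCacheArchives(), each returning a Path[]. A minimal sketch is below; the standalone main() is purely illustrative, since in a real job these calls belong in the Mapper's configure() or setup() method:

import java.io.IOException;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CachePathsExample {
    public static void main(String[] args) throws IOException {
        JobConf job = new JobConf();

        // Plain cached files (e.g. lookup.dat) come back as one Path array...
        Path[] files = DistributedCache.getLocalCacheFiles(job);

        // ...and cached archives (zip/tar/tgz/tar.gz) as another.
        Path[] archives = DistributedCache.getLocalCacheArchives(job);

        if (files != null) {
            for (Path p : files) {
                System.out.println("cached file: " + p);
            }
        }
        if (archives != null) {
            for (Path p : archives) {
                System.out.println("cached archive: " + p);
            }
        }
    }
}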



Related Questions


Question : In a MapReduce job with 500 map tasks, how many map task attempts will there be?
1. It depends on the number of reducers in the job.
2. Between 500 and 1000.
3. At most 500.
4. At least 500.
5. Exactly 500.


Question : Which HDFS command uploads a local file X into an existing HDFS directory Y?
1. hadoop scp X Y
2. hadoop fs -localPut X Y
3. hadoop fs -put X Y
4. hadoop fs -get X Y


Question : Which one of the following files is required in every Oozie Workflow application?
1. job.properties
2. Config-default.xml
3. Workflow.xml
4. Oozie.xml


Question : You want to perform analysis on a large collection of images. You want to store this data in
HDFS and process it with MapReduce, but you also want to give your data analysts and
data scientists the ability to process the data directly from HDFS with an interpreted high-level
programming language like Python. Which format should you use to store this data in
HDFS?

1. SequenceFiles
2. Avro
3. JSON
4. XML
5. CSV
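
For context, SequenceFiles suit this scenario because they store arbitrary binary blobs as splittable key-value records that both Java MapReduce and interpreted-language tools (e.g. via Hadoop Streaming) can consume. A hedged sketch of packing images into a SequenceFile follows, with file names as keys and raw bytes as values; the paths and layout are illustrative assumptions:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ImageToSequenceFile {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Illustrative output path; adjust to your cluster layout.
        Path out = new Path("/user/analyst/images.seq");

        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, out, Text.class, BytesWritable.class);
        try {
            for (String name : args) {
                byte[] bytes = Files.readAllBytes(Paths.get(name));
                // Key: image file name; value: raw image bytes.
                writer.append(new Text(name), new BytesWritable(bytes));
            }
        } finally {
            writer.close();
        }
    }
}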


Question : Which best describes how TextInputFormat processes input files and line breaks?

1. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the beginning of the broken line.
2. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReaders of both splits containing the broken line.
3. The input file is split exactly at the line breaks, so each RecordReader will read a series of complete lines.
4. Input file splits may cross line breaks. A line that crosses file splits is ignored.
5. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the end of the broken line.


Question : For each intermediate key, each reducer task can emit:
1. As many final key-value pairs as desired. There are no restrictions on the types of those key-value pairs (i.e., they can be heterogeneous).
2. As many final key-value pairs as desired, but they must have the same type as the intermediate key-value pairs.
3. As many final key-value pairs as desired, as long as all the keys have the same type and all the values have the same type.
4. One final key-value pair per value associated with the key; no restrictions on the type.
5. One final key-value pair per key; no restrictions on the type.
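
For reference, the Java API allows a Reducer to call context.write() any number of times per key, but every output pair must use the job's configured output key and value types. A small illustrative sketch (class and key names are hypothetical):

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Any number of writes per key is fine, but all outputs share the fixed
// output types declared for the job (here Text/IntWritable).
public class MultiEmitReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        int count = 0;
        for (IntWritable v : values) {
            sum += v.get();
            count++;
        }
        // Two output pairs for a single intermediate key: a sum and a count.
        context.write(new Text(key.toString() + ":sum"), new IntWritable(sum));
        context.write(new Text(key.toString() + ":count"), new IntWritable(count));
    }
}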