Question : In the reducer, the MapReduce API provides you with an iterator over Writable values. What does calling the next() method return?
1. It returns a reference to a different Writable object each time.
2. It returns a reference to a Writable object from an object pool.
3. It returns a reference to the same Writable object each time, but populated with different data.
4. It returns a reference to a Writable object. The API leaves unspecified whether this is a reused object or a new object.
5. It returns a reference to the same Writable object if the next value is the same as the previous value, or a new Writable object otherwise.
Correct Answer : 3. Explanation: Calling Iterator.next() always returns the same Writable instance (e.g., the same IntWritable object), with the contents of that instance replaced by the next value. The framework reuses the object to avoid allocating a new one per value, so any value that must be retained beyond the current iteration has to be copied.
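To make the reuse behavior concrete, here is a minimal sketch of a reducer; the class name and the idea of buffering values into lists are illustrative assumptions, not from the source. Storing the iterated object directly aliases one reused instance, while copying its contents is the safe pattern:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer showing the object-reuse pitfall: the framework
// hands back the SAME IntWritable instance on every iteration, with its
// contents overwritten by each successive value.
public class CollectValuesReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    List<IntWritable> buggy = new ArrayList<>();
    List<IntWritable> correct = new ArrayList<>();
    for (IntWritable value : values) {
      buggy.add(value);                           // WRONG: every element aliases the one reused object
      correct.add(new IntWritable(value.get()));  // RIGHT: defensive copy of the current contents
    }
    // After the loop, every entry in "buggy" reports the LAST value seen,
    // while "correct" preserves each distinct value.
    context.write(key, new IntWritable(correct.size()));
  }
}
```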
Question : MapReduce v2 (MRv2/YARN) splits which major functions of the JobTracker into separate daemons? Select two.
A. Health status checks (heartbeats)
B. Resource management
C. Job scheduling/monitoring
D. Job coordination between the ResourceManager and NodeManager
E. Launching tasks
F. Managing file system metadata
G. MapReduce metric reporting
H. Managing tasks
1. B, C
2. A, D
3. …
4. C, H
5. B, G
Correct Answer : 1 (B, C). Explanation: The fundamental idea of MRv2 is to split the two major functions of the JobTracker, resource management and job scheduling/monitoring, into separate daemons: a global ResourceManager (RM) and a per-application ApplicationMaster (AM). An application is either a single job in the classical MapReduce sense or a DAG of jobs.
The central goal of YARN is to cleanly separate two concerns that are entangled in the JobTracker:
- Monitoring the status of the cluster with respect to which nodes have which resources available. Under YARN, this is handled globally by the ResourceManager.
- Managing the parallel execution of any specific job. Under YARN, this is handled separately for each job by its ApplicationMaster.
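As a small illustration of the split, the sketch below uses the YarnClient API to ask the global ResourceManager for cluster-wide node reports; per-job scheduling would live in each job's ApplicationMaster instead. The class name and the bare-bones setup are assumptions for the example:

```java
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Hypothetical client: queries the ResourceManager, which under YARN is the
// single daemon that tracks which nodes have which resources available.
public class ClusterNodes {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();
    // The RM owns cluster-wide resource state; scheduling/monitoring of a
    // specific job is delegated to that job's ApplicationMaster.
    for (NodeReport node : yarnClient.getNodeReports()) {
      System.out.println(node.getNodeId() + " capability=" + node.getCapability());
    }
    yarnClient.stop();
  }
}
```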
Question : For each input key-value pair, mappers can emit:
1. As many intermediate key-value pairs as designed. There are no restrictions on the types of those key-value pairs (i.e., they can be heterogeneous).
2. As many intermediate key-value pairs as designed, but they cannot be of the same type as the input key-value pair.
3. …
4. One intermediate key-value pair, but of the same type.
5. As many intermediate key-value pairs as designed, as long as all the keys have the same type and all the values have the same type.
Correct Answer : 5. Explanation: A Mapper maps input key/value pairs to a set of intermediate key/value pairs; maps are the individual tasks that transform input records into intermediate records. The intermediate records need not be of the same type as the input records, but all intermediate keys must share one declared type and all intermediate values another. A given input pair may map to zero or many output pairs.
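A minimal sketch of this contract, using a hypothetical TokenCountMapper: the input pair is (LongWritable, Text), the output pairs are (Text, IntWritable), and a single input line may yield zero, one, or many outputs, all with the declared output types:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// The class signature fixes the intermediate types: every emitted key is a
// Text and every emitted value is an IntWritable, though neither matches
// the input types (LongWritable, Text).
public class TokenCountMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    StringTokenizer tokens = new StringTokenizer(line.toString());
    while (tokens.hasMoreTokens()) {  // an empty line emits zero pairs
      word.set(tokens.nextToken());
      context.write(word, ONE);       // one intermediate pair per token
    }
  }
}
```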
Question : What are map files and why are they important?
1. Map files are stored on the NameNode and capture the metadata for all blocks on a particular rack. This is how Hadoop is "rack aware."
2. Map files are the files that show how the data is distributed in the Hadoop cluster.
3. …
4. Map files are sorted SequenceFiles that also have an index. The index allows fast lookup of data by key.
Correct Answer : 4. Explanation: A MapFile is a sorted SequenceFile accompanied by an index file; the index is held in memory and lets a reader seek close to the requested key instead of scanning the entire file.
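To illustrate, here is a small sketch using Hadoop's MapFile API; the path, key/value choices, and class name are assumptions for the example. The writer requires keys to be appended in sorted order and maintains the index; the reader uses that index for fast keyed lookups:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class MapFileDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path dir = new Path("demo.map"); // hypothetical output directory

    // Keys must be appended in sorted order; the Writer builds the index
    // file alongside the sorted data file.
    try (MapFile.Writer writer = new MapFile.Writer(conf, dir,
        MapFile.Writer.keyClass(IntWritable.class),
        MapFile.Writer.valueClass(Text.class))) {
      for (int i = 0; i < 100; i++) {
        writer.append(new IntWritable(i), new Text("value-" + i));
      }
    }

    // The in-memory index lets get() seek near the key instead of
    // scanning the whole file; get() returns null if the key is absent.
    try (MapFile.Reader reader = new MapFile.Reader(dir, conf)) {
      Text value = new Text();
      reader.get(new IntWritable(42), value);
      System.out.println(value); // prints "value-42"
    }
  }
}
```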