Mapr (HP) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)

Question : Which of the following is/are responsibilities of InputFormat?

1. Validate the input files and directories that exist for the job

2. Split the input files into InputSplits

3. Access Mostly Uused Products by 50000+ Subscribers

4. 1,2

5. 1,2,3

Correct Answer : Get Lastest Questions and Answer :
Explanation: InputFormat : Hadoop relies on the input format of the job to do three things:
1. Validate the input configuration for the job (i.e., checking that the data is there).
2. Split the input blocks and files into logical chunks of type InputSplit, each of which is assigned to a map task for processing.
3. Access Mostly Uused Products by 50000+ Subscribers

Question : Select the correct statement regarding input split and block size

1. input split may be smaller the block size

2. input split may be larger the block size

3. Access Mostly Uused Products by 50000+ Subscribers

4. 1,2
5. 1,2,3

Correct Answer : Get Lastest Questions and Answer :
Explanation: Block is the physical representation of data. Split is the logical representation of data present in Block.

Block and split size can be changed in properties.

Map reads data from Block through splits i.e. split act as a broker between Block and Mapper.

Consider two blocks:

Block 1

aa bb cc dd ee ff gg hh ii jj
Block 2

ww ee yy uu oo ii oo pp kk ll nn
Now map reads block 1 till aa to JJ and doesn't know how to read block 2 i.e. block doesn't know how to process different block of information. Here comes a
Split it will form a Logical grouping of Block 1 and Block 2 as single Block, then it forms offset(key) and line (value) using inputformat and record reader
and send map to process further processing.

If your resource is limited and you want to limit the number of maps you can increase the split size. For example: If we have 640 MB of 10 blocks i.e. each
block of 64 MB and resource is limited then you can mention Split size as 128 MB then then logical grouping of 128 MB is formed and only 5 maps will be
executed with a size of 128 MB.

If we specify split size is false then whole file will form one input split and processed by one map which it takes more time to process when file is big.

Question : Which of the following are the methods available in a InputFormat class and needs to be implemented?

1. getSplits()

2. createRecordReader()

3. Access Mostly Uused Products by 50000+ Subscribers

4. 1,2

5. 1,2,3

Correct Answer : Get Lastest Questions and Answer :
Explanation: InputSplit[] getSplits(JobConf job,
int numSplits)
throws IOException
Logically split the set of input files for the job.
Each InputSplit is then assigned to an individual Mapper for processing.

Note: The split is a logical split of the inputs and the input files are not physically split into chunks. For e.g. a split could be tuple.

RecordReader getRecordReader(InputSplit split,
JobConf job,
Reporter reporter)
throws IOException
Get the RecordReader for the given InputSplit.
It is the responsibility of the RecordReader to respect record boundaries while processing the logical split to present a record-oriented view to the
individual task.

Related Questions

Question : What are map files and why are they important?

1. Map files are stored on the namenode and capture the metadata for all blocks on a particular rack. This is how Hadoop is "rack aware"
2. Map files are the files that show how the data is distributed in the Hadoop cluster.
3. Access Mostly Uused Products by 50000+ Subscribers
4. Map files are sorted sequence files that also have an index. The index allows fast data look up.

Question : Let's assume you have following files in the hdfs directory called merge.
Test1.txt
hadoopexam.com Hadoop Training 1
Test2.txt
www.hadoopexam.com Hadoop YARN Training
Test3.txt
http://hadoopexam.com Amazon WebService Training
Now you run the following command
hadoop fs -getmerge -nl merge/ output2.txt
What is the content in the output2.txt file

1.
hadoopexam.com Hadoop Training 1
www.hadoopexam.com Hadoop YARN Training
http://hadoopexam.com Amazon WebService Training

2.

hadoopexam.com Hadoop Training 1

www.hadoopexam.com Hadoop YARN Training

http://hadoopexam.com Amazon WebService Training

3. Access Mostly Uused Products by 50000+ Subscribers
4. www.hadoopexam.com Hadoop YARN Traininghadoopexam.com Hadoop Training 1http://hadoopexam.com Amazon WebService Training

Question : In the regular WordCount MapReduce example, you have following driver code

public class WordCount extends Configured implements Tool {
public static void main(String args[]) throws Exception {
int res = ToolRunner.run(new WordCount(), args);
System.exit(res);
}
public int run(String[] args) throws Exception {
Path inputPath = new Path("shakespeare1");
Path outputPath = new Path(""+System.currentTimeMillis());
Configuration conf = getConf();
Job job = new Job(conf, this.getClass().toString());
FileInputFormat.setInputPaths(job, inputPath);
FileOutputFormat.setOutputPath(job, outputPath);
job.setJobName("WordCount");
job.setJarByClass(WordCount.class);
job.setJarByClass(WordCount.class);
job.setJobName("Word Count");
job.setMapperClass(WordMapper.class);
job.setReducerClass(SumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setNumReduceTasks(2);
return job.waitForCompletion(true) ? 0 : 1;
}}

Now you run the below command on a single node cluste. Where wc.jar is jar file containing Driver,Mapper and Reducer class.
hadoop jar wc.jar WordCount

Select the correct statement from below.

1. It will run 2 Mapper and 2 Reducer
2. It will run 2 Reducer, but number of Mapper is not known.
3. Access Mostly Uused Products by 50000+ Subscribers
4. There is not enough information to tell number of reducer.

Question : In the regular WordCount MapReduce example, you have following driver code.
public class WordCount extends Configured implements Tool {
public static void main(String args[]) throws Exception {
int res = ToolRunner.run(new WordCount(), args);
System.exit(res);
}
public int run(String[] args) throws Exception {
Path inputPath = new Path("shakespeare1");
Path outputPath = new Path(""+System.currentTimeMillis());
Configuration conf = getConf();
Job job = new Job(conf, this.getClass().toString());
FileInputFormat.setInputPaths(job, inputPath);
FileOutputFormat.setOutputPath(job, outputPath);
job.setJobName("WordCount"); job.setJarByClass(WordCount.class); job.setJarByClass(WordCount.class);
job.setJobName("Word Count");
job.setMapperClass(WordMapper.class); job.setReducerClass(SumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setNumReduceTasks(2);
return job.waitForCompletion(true) ? 0 : 1;
} }
Now you run the below command on a single node cluste. Where wc.jar is jar file containing Driver,Mapper and Reducer class.
hadoop jar wc.jar WordCount -D mapred.reduce.tasks=3
Select the correct statement from below.

1. It will run 3 Reducer as command line option would be preferred
2. It will run 2 reducers as driver code has defined number of reducer
3. Access Mostly Uused Products by 50000+ Subscribers
4. Number of reducer can not be determined command line and driver configuration is just a hint

Question : You are running the regular WordCount example with the Mapper and Reducer defined in a separate class. Now you have files
in a directory from which you want to count number of words.
Out of these 4 files, 3 files has 1 line in each file and 4th file has 0 lines.
Now you run the wordcount job, then how many Mapper will be executed (Assuming you are running on a single node)?

1. Only 1 Mapper as it is a single node cluster
2. 3 Mapper, only for the files which has the data
3. Access Mostly Uused Products by 50000+ Subscribers
4. Number of Mapper is non-deterministic

Question : Please select the correct features for the HDFS

1. Files in HDFS can concurrently updated and read
2. Files in HDFS can concurrently updated
3. Access Mostly Uused Products by 50000+ Subscribers
4. Files in HDFS cannot be concurrently read