Question : In a MapReduce job, you want each of your input files processed by a single map task. How do you configure the job so that a single map task processes each input file, regardless of how many blocks the input file occupies?
1. Increase the parameter that controls the minimum split size in the job configuration.
2. Write a custom MapRunner that iterates over all key-value pairs in the entire file.
3.
4. Write a custom FileInputFormat and override the isSplitable method to always return false.
Correct Answer : 4. Explanation: When isSplitable returns false, only a single mapper processes the entire file; that mapper can emit any number of key-value pairs. Subclasses of FileInputFormat can override the isSplitable(FileSystem, Path) method to ensure that input files are not split up and are processed as a whole by Mappers. The API documentation describes the method as follows: "Is the given filename splitable? Usually true, but if the file is stream compressed, it will not be. FileInputFormat implementations can override this and return false to ensure that individual input files are never split up, so that Mappers process entire files."
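Below is a minimal sketch of such an override, assuming the newer org.apache.hadoop.mapreduce API (where the signature is isSplitable(JobContext, Path) rather than the older isSplitable(FileSystem, Path) mentioned above). The class name WholeFileTextInputFormat is illustrative, not part of the question.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

public class WholeFileTextInputFormat extends FileInputFormat<LongWritable, Text> {

    // Returning false guarantees a file is never split across map tasks,
    // regardless of how many HDFS blocks it occupies.
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;
    }

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        // Reuse the standard line record reader; the single split now
        // covers the whole file.
        return new LineRecordReader();
    }
}

With this input format set on the job, each input file yields exactly one split and therefore exactly one map task, no matter how many blocks the file spans.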
Question : Given a directory of files with the following structure: line number, tab character, string. Example:
1	AvilakaurKohni
2	Decemberandaugust
3	
You want to send each line as one record to your Mapper. Which InputFormat would you use to complete the line: setInputFormat(________.class);
Correct Answer : KeyValueTextInputFormat. Explanation: TextInputFormat's keys, being simply the offsets within the file, are not normally very useful. It is common for each line in a file to be a key-value pair, separated by a delimiter such as a tab character; for example, this is the output produced by TextOutputFormat, Hadoop's default output format. To interpret such files correctly, KeyValueTextInputFormat is appropriate.
You can specify the separator via the mapreduce.input.keyvaluelinerecordreader.key.value.separator property in the new API (or key.value.separator.in.input.line in the old API); it is a tab character by default. Consider the following input file, where the whitespace shown represents a horizontal tab character:
line1	On the top of the Crumpetty Tree
line2	The Quangle Wangle sat,
line3	But his face you could not see,
line4	On account of his Beaver Hat.
As in the TextInputFormat case, the input is a single split comprising four records, although this time the keys are the Text sequences before the tab in each line:
(line1, On the top of the Crumpetty Tree)
(line2, The Quangle Wangle sat,)
(line3, But his face you could not see,)
(line4, On account of his Beaver Hat.)
SequenceFileInputFormat: To use data from sequence files as the input to MapReduce, you use SequenceFileInputFormat. The keys and values are determined by the sequence file, and you need to make sure that your map input types correspond.
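As a concrete illustration, here is a minimal driver sketch assuming the newer mapreduce API (where the question's setInputFormat(...) call from the old mapred API becomes setInputFormatClass(...)). The class name KeyValueInputDriver and the identity-mapper setup are illustrative, not from the question.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class KeyValueInputDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Tab is already the default separator; set explicitly here only to
        // make the property named in the explanation above concrete.
        conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", "\t");

        Job job = Job.getInstance(conf, "key-value text input example");
        job.setJarByClass(KeyValueInputDriver.class);

        // Each line reaches the Mapper as (Text key before the tab,
        // Text value after the tab).
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // Identity mapper and no reducers: records are written back out
        // unchanged, which is enough to observe the (key, value) pairs.
        job.setMapperClass(Mapper.class);
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}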
Question : What is a SequenceFile?
1. A SequenceFile contains a binary encoding of an arbitrary number of homogeneous Writable objects.
2. A SequenceFile contains a binary encoding of an arbitrary number of heterogeneous Writable objects.
3.
4. A SequenceFile contains a binary encoding of an arbitrary number of key-value pairs. Each key must be the same type. Each value must be the same type.
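Option 4 is the accurate description: a SequenceFile stores binary key-value pairs, and the key class and value class are fixed when the file is created. The sketch below, assuming the option-based SequenceFile.Writer API from Hadoop 2.x, illustrates this; the path /tmp/example.seq and the sample records are illustrative only.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileWriteDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("/tmp/example.seq");

        SequenceFile.Writer writer = null;
        try {
            // The key and value classes are declared once for the whole file:
            // every key must be an IntWritable, every value a Text.
            writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(path),
                    SequenceFile.Writer.keyClass(IntWritable.class),
                    SequenceFile.Writer.valueClass(Text.class));

            // Append binary-encoded key-value pairs.
            writer.append(new IntWritable(1), new Text("AvilakaurKohni"));
            writer.append(new IntWritable(2), new Text("Decemberandaugust"));
        } finally {
            if (writer != null) {
                writer.close();
            }
        }
    }
}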