Correct Answer : Explanation: You can use streaming either for rapid prototyping using sed/awk, or for full-blown MapReduce deployments. Note that the streaming feature does not include C++ programs; these are supported through a similar feature called Pipes.
Be aware that streaming may introduce a performance penalty: the framework still creates JVMs for tasks, and scripted programs may run more slowly. Streaming may also improve performance in some cases; for example, code implementing the map and reduce functions may perform better than the equivalent Java.
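As an illustration of the streaming model described above, a minimal word-count mapper might look like the following. This is only a sketch: the script name wordcount_mapper.py and the word-count task itself are assumptions for illustration, not part of the question.

#!/usr/bin/env python
# wordcount_mapper.py - a minimal, hypothetical Hadoop Streaming mapper.
# The PipeMap task feeds each input record to this script on standard input;
# the script emits tab-separated key-value pairs on standard output.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        # Emit "word<TAB>1"; the framework will sort and shuffle these
        # pairs to the reducers.
        print("%s\t%d" % (word, 1))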
Question : Put the following in order of processing (Hadoop Streaming):
A. The PipeMap task processes input from your input files/directories and passes it to your script as standard input.
B. The map function processes key-value pairs one record at a time in an input split (just as in a normal MapReduce job).
C. You write your output to standard output, which is wired into the standard input of the PipeMap task.
D. The PipeMap task then processes intermediate results from your map function, and the Hadoop framework sorts and shuffles the data to the reducers.
E. PipeReduce sends these intermediate results to its standard out, which is wired to the standard input of your reduce script.
F. After your reduce script processes a record from standard input, it may write to its standard output (which is wired to the PipeReduce standard input).
G. The PipeReduce program then collects all the output and writes it to the output directory.
Correct Answer : A, B, C, D, E, F, G. Explanation: The PipeMap task processes input from your input files/directories and passes it to your script as standard input. Your map function processes key-value pairs one record at a time in an input split (just as in a normal MapReduce job). You write your output to standard output, which is wired into the standard input of the PipeMap task. The PipeMap task then processes intermediate results from your map function, and the Hadoop framework sorts and shuffles the data to the reducers.
The same data flow mechanism occurs on the reduce side. PipeReduce sends these intermediate results to its standard out, which is wired to the standard input of your reduce script. After your reduce script processes a record from standard input, it may write to its standard output (which is wired to the PipeReduce standard input). The PipeReduce program will then collect all the output and write it to the output directory.
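To make the reduce-side flow concrete, here is a matching reducer sketch (again a hypothetical Python script, wordcount_reducer.py, paired with the mapper above). Because the framework sorts the intermediate data before it reaches PipeReduce, all records for a given key arrive as consecutive lines on the script's standard input.

#!/usr/bin/env python
# wordcount_reducer.py - a minimal, hypothetical Hadoop Streaming reducer.
# PipeReduce writes the sorted intermediate records to this script's standard
# input; whatever the script prints to standard output is collected by
# PipeReduce and written to the job's output directory.
import sys

current_word = None
current_count = 0

for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word = word
        current_count = int(count)

if current_word is not None:
    print("%s\t%d" % (current_word, current_count))

Such a job is typically submitted with the streaming jar, for example: hadoop jar hadoop-streaming-*.jar -input <input dir> -output <output dir> -mapper wordcount_mapper.py -reducer wordcount_reducer.py -file wordcount_mapper.py -file wordcount_reducer.py (the jar path and available options vary by Hadoop version and distribution).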
1. Pig is more powerful and allows certain types of data manipulation not possible with MapReduce.
2. Pig has the same capabilities as MapReduce. Pig programs are executed as MapReduce jobs via the Pig interpreter.
4. Pig provides the additional capability of letting you control the flow of multiple MapReduce jobs and chain MapReduce jobs, which is not possible with MapReduce alone.
1. Input file splits may cross line breaks. A line that crosses file splits is ignored.
2. The input file is split exactly at the line breaks, so each RecordReader will read a series of complete lines.
4. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the end of the broken line.
5. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the beginning of the broken line.
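For intuition about how the standard TextInputFormat behavior resolves a line that crosses a split boundary, here is a small self-contained simulation (plain Python, not Hadoop code; the sample data and split boundaries are made up). It mimics the LineRecordReader rule: a reader whose split does not start at byte 0 first skips ahead to the next full line boundary, so a line that straddles a split boundary is read exactly once, by the reader of the split that contains the beginning of that line.

data = b"alpha\nbravo charlie\ndelta\n"        # newline-terminated sample records
splits = [(0, 10), (10, 20), (20, len(data))]  # made-up byte ranges

def read_split(data, start, end):
    """Return the lines attributed to the split [start, end)."""
    pos = start
    if start != 0:
        # Like LineRecordReader: back up one byte and discard everything up to
        # the next newline, so a line that merely *begins* at the split start
        # is kept, while a partial line is left to the previous split's reader.
        pos = data.index(b"\n", start - 1) + 1
    lines = []
    while pos < end:
        nl = data.index(b"\n", pos)
        # Read to the end of the line even if it extends past the split end.
        lines.append(data[pos:nl].decode())
        pos = nl + 1
    return lines

for start, end in splits:
    print((start, end), read_split(data, start, end))

# The line "bravo charlie" straddles byte 10 but is read only by the first
# split, whose range contains the start of the line; the second split yields
# nothing and the third reads "delta".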
1. Increase the parameter that controls the minimum split size in the job configuration.
2. Write a custom MapRunner that iterates over all key-value pairs in the entire file.
4. Write a custom FileInputFormat and override the method isSplitable to always return false.
1. A SequenceFile contains a binary encoding of an arbitrary number of homogeneous Writable objects.
2. A SequenceFile contains a binary encoding of an arbitrary number of heterogeneous Writable objects.
4. A SequenceFile contains a binary encoding of an arbitrary number of key-value pairs. Each key must be the same type. Each value must be the same type.