Question : Which of the following information can be captured using framework-level counters? A. CPU statistics, e.g. total time spent executing map and reduce tasks B. Garbage collection counter C. How much RAM was consumed by all tasks D. A,B E. A,B,C
Correct Answer : Explanation: Counters represent global counters, defined either by the MapReduce framework or by applications. Each Counter can be of any Enum type. Counters of a particular Enum are bunched into groups of type Counters.Group.
Applications can define arbitrary Counters (of type Enum) and update them via Reporter.incrCounter(Enum, long) or Reporter.incrCounter(String, String, long) in the map and/or reduce methods. These counters are then globally aggregated by the framework.
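For illustration, here is a minimal sketch of a mapper that defines and updates such application counters through the old org.apache.hadoop.mapred API referred to above. The class name CounterMapper, the RecordQuality enum and the "App"/"NonEmptyLines" group and counter names are invented for this example:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class CounterMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, LongWritable> {

    // Application-defined counters; the enum name and values are illustrative only.
    public enum RecordQuality { GOOD, MALFORMED }

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, LongWritable> output, Reporter reporter)
            throws IOException {
        String line = value.toString();
        if (line.trim().isEmpty()) {
            // Enum-based counter, aggregated globally by the framework.
            reporter.incrCounter(RecordQuality.MALFORMED, 1);
            return;
        }
        reporter.incrCounter(RecordQuality.GOOD, 1);
        // The (group, name) String form is also available.
        reporter.incrCounter("App", "NonEmptyLines", 1);
        output.collect(new Text(line), new LongWritable(1));
    }
}
```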
Hadoop provides an option where a certain set of bad input records can be skipped when processing map inputs. Applications can control this feature through the SkipBadRecords class.
This feature can be used when map tasks crash deterministically on certain input. This usually happens due to bugs in the map function. Usually, the user would have to fix these bugs. This is, however, not possible sometimes. The bug may be in third party libraries, for example, for which the source code is not available. In such cases, the task never completes successfully even after multiple attempts, and the job fails. With this feature, only a small portion of data surrounding the bad records is lost, which may be acceptable for some applications (those performing statistical analysis on very large data, for example).
By default this feature is disabled. For enabling it, refer to SkipBadRecords.setMapperMaxSkipRecords(Configuration, long) and SkipBadRecords.setReducerMaxSkipGroups(Configuration, long).
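As a hedged sketch of how enabling the feature might look in a driver (again with the old mapred API); the numeric limits below are arbitrary example values, not recommendations:

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SkipBadRecords;

public class SkippingJobConfig {

    // Turns on skipping mode for an existing job configuration.
    public static JobConf enableSkipping(JobConf conf) {
        // Enter skipping mode after this many failed attempts of the same task (example value).
        SkipBadRecords.setAttemptsToStartSkipping(conf, 2);
        // Acceptable number of records / key groups to lose around a bad record (example values).
        SkipBadRecords.setMapperMaxSkipRecords(conf, 100L);
        SkipBadRecords.setReducerMaxSkipGroups(conf, 10L);
        return conf;
    }
}
```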
With this feature enabled, the framework gets into 'skipping mode' after a certain number of map failures. For more details, see SkipBadRecords.setAttemptsToStartSkipping(Configuration, int). In 'skipping mode', map tasks maintain the range of records being processed. To do this, the framework relies on the processed record counter. See SkipBadRecords.COUNTER_MAP_PROCESSED_RECORDS and SkipBadRecords.COUNTER_REDUCE_PROCESSED_GROUPS. This counter enables the framework to know how many records have been processed successfully, and hence, what record range caused a task to crash. On further attempts, this range of records is skipped.
The number of records skipped depends on how frequently the processed record counter is incremented by the application. It is recommended that this counter be incremented after every record is processed. This may not be possible in some applications that typically batch their processing. In such cases, the framework may skip additional records surrounding the bad record. Users can control the number of skipped records through SkipBadRecords.setMapperMaxSkipRecords(Configuration, long) and SkipBadRecords.setReducerMaxSkipGroups(Configuration, long). The framework tries to narrow the range of skipped records using a binary search-like approach. The skipped range is divided into two halves and only one half gets executed. On subsequent failures, the framework figures out which half contains bad records. A task will be re-executed until the acceptable skipped value is met or all task attempts are exhausted. To increase the number of task attempts, use JobConf.setMaxMapAttempts(int) and JobConf.setMaxReduceAttempts(int).
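The sketch below shows one way a mapper could report its progress through the processed-record counter, and how a driver might raise the attempt limits so the narrowing has more retries to work with. The class name and the attempt values are illustrative; SkipBadRecords.COUNTER_GROUP and COUNTER_MAP_PROCESSED_RECORDS are the constants referred to above:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SkipBadRecords;

public class SkippingAwareMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, LongWritable> output, Reporter reporter)
            throws IOException {
        output.collect(value, new LongWritable(1));
        // Tell the framework this record was processed successfully, so the range
        // skipped around a future crash stays as narrow as possible.
        reporter.incrCounter(SkipBadRecords.COUNTER_GROUP,
                SkipBadRecords.COUNTER_MAP_PROCESSED_RECORDS, 1);
    }

    // Driver-side helper: give the binary-search-like narrowing more attempts to work with.
    public static void raiseAttempts(JobConf conf) {
        conf.setMaxMapAttempts(8);     // example value
        conf.setMaxReduceAttempts(8);  // example value
    }
}
```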
Question : Select the correct statement(s) regarding Counters A. We can use a custom counter in both the Mapper and the Reducer to count bad records or to perform outlier checks B. Counters can be incremented as well as decremented in the Mapper and the Reducer C. Counters can be checked in the Job History Server D. Counters are stored in the JobTracker memory E. We can create a maximum of 100 counters
Correct Answer : Explanation: Practically, there is no fixed limit on the number of counters. Because counters are stored in JobTracker memory, the number of counters created should be kept within what the JobTracker's memory can accommodate.
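As an illustration of statement A (custom counters usable from both the Mapper and the Reducer), here is a minimal sketch using the newer org.apache.hadoop.mapreduce API; the Quality enum, the assumed comma-separated record layout and the outlier threshold are invented for the example:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class QualityCounters {

    // Shared custom counter enum, updated from both the mapper and the reducer.
    public enum Quality { BAD_RECORDS, OUTLIERS }

    public static class QMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length < 2) {
                // Count malformed input instead of failing the task.
                context.getCounter(Quality.BAD_RECORDS).increment(1);
                return;
            }
            context.write(new Text(fields[0]), new LongWritable(1));
        }
    }

    public static class QReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            if (sum > 1_000_000L) {  // arbitrary outlier threshold for the example
                context.getCounter(Quality.OUTLIERS).increment(1);
            }
            context.write(key, new LongWritable(sum));
        }
    }
}
```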
Question : Which of the following information can be captured using the existing (built-in) counters? A. Total number of bytes read and written B. Total number of tasks launched (Mapper + Reducer) C. CPU consumed D. Memory used E. Number of records ignored during map tasks 1. A,B,C,D 2. A,B,C,E 3. A,C,D,E 4. A,B,D,E 5. B,C,D,E
Correct Answer : Explanation: Bytes read and written, tasks launched, CPU consumed and memory used are all reported by the built-in framework counters. To count anything that is specific to your job's own logic, such as records your map tasks decide to ignore, you have to create a custom counter; it cannot be done with just the built-in counters.
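For reference, a hedged sketch of reading a few built-in counters from a completed job with the org.apache.hadoop.mapreduce API. The JobCounter and TaskCounter enums are the standard Hadoop 2.x ones; the file-system counter group and name strings are assumptions that may vary by Hadoop version and filesystem:

```java
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobCounter;
import org.apache.hadoop.mapreduce.TaskCounter;

public class BuiltInCounterReport {

    // Prints a few framework counters once the (already configured) job has finished.
    public static void report(Job job) throws Exception {
        job.waitForCompletion(true);
        Counters counters = job.getCounters();

        long launchedMaps    = counters.findCounter(JobCounter.TOTAL_LAUNCHED_MAPS).getValue();
        long launchedReduces = counters.findCounter(JobCounter.TOTAL_LAUNCHED_REDUCES).getValue();
        long cpuMillis       = counters.findCounter(TaskCounter.CPU_MILLISECONDS).getValue();
        long physicalMem     = counters.findCounter(TaskCounter.PHYSICAL_MEMORY_BYTES).getValue();

        // File-system byte counters live in a separate group; the group and counter
        // names below are assumptions and may differ across Hadoop versions.
        long hdfsBytesRead = counters.findCounter(
                "org.apache.hadoop.mapreduce.FileSystemCounter", "HDFS_BYTES_READ").getValue();

        System.out.println("Tasks launched  : " + (launchedMaps + launchedReduces));
        System.out.println("CPU (ms)        : " + cpuMillis);
        System.out.println("Memory (bytes)  : " + physicalMem);
        System.out.println("HDFS bytes read : " + hdfsBytesRead);
    }
}
```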