Question : Which of the following information can be captured using framework-level counters? A. CPU statistics, e.g. total time spent executing map and reduce tasks B. Garbage collection counter C. How much RAM was consumed by all tasks D. A,B E. A,B,C
Correct Answer : Explanation: Counters represent global counters, defined either by the MapReduce framework or by applications. Each Counter can be of any Enum type. Counters of a particular Enum are bunched into groups of type Counters.Group.
Applications can define arbitrary Counters (of type Enum) and update them via Reporter.incrCounter(Enum, long) or Reporter.incrCounter(String, String, long) in the map and/or reduce methods. These counters are then globally aggregated by the framework.
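For illustration, here is a minimal sketch of a mapper that defines and updates such application counters through the old org.apache.hadoop.mapred API referred to above. The class name CounterMapper, the RecordQuality enum and the "App"/"NonEmptyLines" group and counter names are invented for this example:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class CounterMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, LongWritable> {

    // Application-defined counters; the enum name and values are illustrative only.
    public enum RecordQuality { GOOD, MALFORMED }

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, LongWritable> output, Reporter reporter)
            throws IOException {
        String line = value.toString();
        if (line.trim().isEmpty()) {
            // Enum-based counter, aggregated globally by the framework.
            reporter.incrCounter(RecordQuality.MALFORMED, 1);
            return;
        }
        reporter.incrCounter(RecordQuality.GOOD, 1);
        // The (group, name) String form is also available.
        reporter.incrCounter("App", "NonEmptyLines", 1);
        output.collect(new Text(line), new LongWritable(1));
    }
}
```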
Hadoop provides an option where a certain set of bad input records can be skipped when processing map inputs. Applications can control this feature through the SkipBadRecords class.
This feature can be used when map tasks crash deterministically on certain input. This usually happens due to bugs in the map function. Usually, the user would have to fix these bugs. This is, however, not possible sometimes. The bug may be in third party libraries, for example, for which the source code is not available. In such cases, the task never completes successfully even after multiple attempts, and the job fails. With this feature, only a small portion of data surrounding the bad records is lost, which may be acceptable for some applications (those performing statistical analysis on very large data, for example).
By default this feature is disabled. For enabling it, refer to SkipBadRecords.setMapperMaxSkipRecords(Configuration, long) and SkipBadRecords.setReducerMaxSkipGroups(Configuration, long).
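As a hedged sketch of how enabling the feature might look in a driver (again with the old mapred API); the numeric limits below are arbitrary example values, not recommendations:

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SkipBadRecords;

public class SkippingJobConfig {

    // Turns on skipping mode for an existing job configuration.
    public static JobConf enableSkipping(JobConf conf) {
        // Enter skipping mode after this many failed attempts of the same task (example value).
        SkipBadRecords.setAttemptsToStartSkipping(conf, 2);
        // Acceptable number of records / key groups to lose around a bad record (example values).
        SkipBadRecords.setMapperMaxSkipRecords(conf, 100L);
        SkipBadRecords.setReducerMaxSkipGroups(conf, 10L);
        return conf;
    }
}
```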
With this feature enabled, the framework gets into 'skipping mode' after a certain number of map failures. For more details, see SkipBadRecords.setAttemptsToStartSkipping(Configuration, int). In 'skipping mode', map tasks maintain the range of records being processed. To do this, the framework relies on the processed record counter. See SkipBadRecords.COUNTER_MAP_PROCESSED_RECORDS and SkipBadRecords.COUNTER_REDUCE_PROCESSED_GROUPS. This counter enables the framework to know how many records have been processed successfully, and hence, what record range caused a task to crash. On further attempts, this range of records is skipped.
The number of records skipped depends on how frequently the processed record counter is incremented by the application. It is recommended that this counter be incremented after every record is processed. This may not be possible in some applications that typically batch their processing. In such cases, the framework may skip additional records surrounding the bad record. Users can control the number of skipped records through SkipBadRecords.setMapperMaxSkipRecords(Configuration, long) and SkipBadRecords.setReducerMaxSkipGroups(Configuration, long). The framework tries to narrow the range of skipped records using a binary search-like approach. The skipped range is divided into two halves and only one half gets executed. On subsequent failures, the framework figures out which half contains bad records. A task will be re-executed until the acceptable skipped value is met or all task attempts are exhausted. To increase the number of task attempts, use JobConf.setMaxMapAttempts(int) and JobConf.setMaxReduceAttempts(int).
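The sketch below shows one way a mapper could report its progress through the processed-record counter, and how a driver might raise the attempt limits so the narrowing has more retries to work with. The class name and the attempt values are illustrative; SkipBadRecords.COUNTER_GROUP and COUNTER_MAP_PROCESSED_RECORDS are the constants referred to above:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SkipBadRecords;

public class SkippingAwareMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, LongWritable> output, Reporter reporter)
            throws IOException {
        output.collect(value, new LongWritable(1));
        // Tell the framework this record was processed successfully, so the range
        // skipped around a future crash stays as narrow as possible.
        reporter.incrCounter(SkipBadRecords.COUNTER_GROUP,
                SkipBadRecords.COUNTER_MAP_PROCESSED_RECORDS, 1);
    }

    // Driver-side helper: give the binary-search-like narrowing more attempts to work with.
    public static void raiseAttempts(JobConf conf) {
        conf.setMaxMapAttempts(8);     // example value
        conf.setMaxReduceAttempts(8);  // example value
    }
}
```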
Question : Select the correct statement(s) regarding Counters A. We can use a custom counter in both the Mapper and the Reducer to count bad records or to perform outlier checks B. Counters can be incremented as well as decremented in the Mapper and the Reducer C. Counters can be checked in the Job History Server D. Counters are stored in the JobTracker memory E. We can create a maximum of 100 counters
Correct Answer : Explanation: Practically, there is no fixed limit on the number of counters. Because counters are stored in JobTracker memory, the number of counters created should be kept within what the JobTracker's memory can accommodate.
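As an illustration of statement A (custom counters usable from both the Mapper and the Reducer), here is a minimal sketch using the newer org.apache.hadoop.mapreduce API; the Quality enum, the assumed comma-separated record layout and the outlier threshold are invented for the example:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class QualityCounters {

    // Shared custom counter enum, updated from both the mapper and the reducer.
    public enum Quality { BAD_RECORDS, OUTLIERS }

    public static class QMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length < 2) {
                // Count malformed input instead of failing the task.
                context.getCounter(Quality.BAD_RECORDS).increment(1);
                return;
            }
            context.write(new Text(fields[0]), new LongWritable(1));
        }
    }

    public static class QReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            if (sum > 1_000_000L) {  // arbitrary outlier threshold for the example
                context.getCounter(Quality.OUTLIERS).increment(1);
            }
            context.write(key, new LongWritable(sum));
        }
    }
}
```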
Question : Which of the following information can be captured using the existing (built-in) counters? A. Total number of bytes read and written B. Total number of tasks launched (Mapper + Reducer) C. CPU consumed D. Memory used E. Number of records ignored during map tasks 1. A,B,C,D 2. A,B,C,E 3. A,C,D,E 4. A,B,D,E 5. B,C,D,E
Correct Answer : Explanation: Bytes read and written, tasks launched, CPU consumed and memory used are all reported by the built-in framework counters. To count anything that is specific to your job's own logic, such as records your map tasks decide to ignore, you have to create a custom counter; it cannot be done with just the built-in counters.
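For reference, a hedged sketch of reading a few built-in counters from a completed job with the org.apache.hadoop.mapreduce API. The JobCounter and TaskCounter enums are the standard Hadoop 2.x ones; the file-system counter group and name strings are assumptions that may vary by Hadoop version and filesystem:

```java
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobCounter;
import org.apache.hadoop.mapreduce.TaskCounter;

public class BuiltInCounterReport {

    // Prints a few framework counters once the (already configured) job has finished.
    public static void report(Job job) throws Exception {
        job.waitForCompletion(true);
        Counters counters = job.getCounters();

        long launchedMaps    = counters.findCounter(JobCounter.TOTAL_LAUNCHED_MAPS).getValue();
        long launchedReduces = counters.findCounter(JobCounter.TOTAL_LAUNCHED_REDUCES).getValue();
        long cpuMillis       = counters.findCounter(TaskCounter.CPU_MILLISECONDS).getValue();
        long physicalMem     = counters.findCounter(TaskCounter.PHYSICAL_MEMORY_BYTES).getValue();

        // File-system byte counters live in a separate group; the group and counter
        // names below are assumptions and may differ across Hadoop versions.
        long hdfsBytesRead = counters.findCounter(
                "org.apache.hadoop.mapreduce.FileSystemCounter", "HDFS_BYTES_READ").getValue();

        System.out.println("Tasks launched  : " + (launchedMaps + launchedReduces));
        System.out.println("CPU (ms)        : " + cpuMillis);
        System.out.println("Memory (bytes)  : " + physicalMem);
        System.out.println("HDFS bytes read : " + hdfsBytesRead);
    }
}
```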