MapR (HP) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)

Question : In general, which of the following will help improve MapReduce job performance with regard to the circular buffer?

1. Increasing the size of the circular buffer

2. Reducing the number of spills from the circular buffer

3. [option not available]

4. 1,2
5. 1,2,3

Explanation: Spilling map output to disk multiple times (before the final spill) adds the overhead of reading the spilled records back and merging them.
From "Hadoop: The Definitive Guide": [Each map task has a circular memory buffer that it writes the output to. The buffer is 100 MB by default, a size that can
be tuned by changing the io.sort.mb property. When the contents of the buffer reaches a certain threshold size (io.sort.spill.percent, which has the default
0.80, or 80%), a background thread will start to spill the contents to disk]

If possible, eliminate all intermediate spills so that only the final output is spilled.
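To make the spill arithmetic concrete, the following Java sketch (plain JDK, not Hadoop code) counts how many spill files one map task would write under a deliberately simple model: map output arrives as a steady stream, a spill happens each time the buffer fills to io.sort.mb times io.sort.spill.percent, and record metadata is ignored. The class and method names and the 500 MB output size are hypothetical, and the merge-pass count uses a simplified grouping rule rather than Hadoop's own merger.

```java
/**
 * Back-of-the-envelope model of map-side spilling (plain JDK, not Hadoop code).
 * Assumptions: map output arrives as one steady stream, a spill is written each
 * time the buffer fills to sortMb * spillPercent, and record metadata is ignored.
 */
final class SpillEstimate {

    /** Number of spill files one map task writes under this model (always at least 1). */
    static int estimateSpills(double mapOutputMb, double sortMb, double spillPercent) {
        double thresholdMb = sortMb * spillPercent;    // io.sort.mb * io.sort.spill.percent
        int spills = (int) Math.ceil(mapOutputMb / thresholdMb);
        return Math.max(1, spills);                    // the model always counts a final spill
    }

    /**
     * Merge passes needed to turn `spills` files into one when at most `factor`
     * files (io.sort.factor) are merged at a time. Simplified grouping: Hadoop's
     * own merger sizes its first pass differently, so real pass counts may differ.
     */
    static int mergePasses(int spills, int factor) {
        if (factor < 2) {
            throw new IllegalArgumentException("factor must be at least 2");
        }
        int passes = 0;
        while (spills > 1) {
            spills = (spills + factor - 1) / factor;   // each group of <= factor files becomes one
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        double mapOutputMb = 500;                      // hypothetical map output per task
        for (double sortMb : new double[] {100, 700}) {
            int spills = estimateSpills(mapOutputMb, sortMb, 0.80);
            System.out.printf("io.sort.mb=%.0f -> %d spill file(s), %d merge pass(es)%n",
                    sortMb, spills, mergePasses(spills, 10));
        }
    }
}
```

Under these assumptions, the default 100 MB buffer yields seven spill files that need one merge pass (with io.sort.factor at 10), whereas a 700 MB buffer, which would also need a correspondingly larger task heap, lets the task write a single final spill with no merge at all.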

Question : Select the correct statements regarding the circular buffer and the spilling of its contents

1. When the circular buffer reaches 80% (or any configured threshold), the buffered data is first sorted by key; if a combiner is configured, it is also executed.

2. By default, up to 10 spill files can be merged at once after spilling

3. [option not available]

4. 1,2

5. 1,2,3

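Explanation: Each map task outputs data in the form of key/value pairs. The property values listed below are example tuning settings; several differ from the Hadoop defaults (for example, mapreduce.task.io.sort.mb defaults to 100 and mapreduce.task.io.sort.factor defaults to 10).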
mapreduce.tasktracker.map.tasks.maximum: 8
The maximum number of map tasks that will be run simultaneously by a task tracker
mapreduce.map.memory.mb: 128
The amount of memory to request from the scheduler for each map task.
The output is stored in a ring buffer rather than being written directly to disk.
When the ring buffer reaches 80% capacity, the content is "spilled" to disk.
This process can create multiple spill files on the local disk of the node running the map task.
mapreduce.map.sort.spill.percent: 0.80
The soft limit in the serialization buffer. Once reached, a thread will begin to spill the contents to disk in the background. Note that collection will not
block if this threshold is exceeded while a spill is already in progress, so spills may be larger than this threshold when it is set to less than .5
Before the map task finishes, Hadoop merges all of that task's spill files into a single output file.
This single file is partitioned by reducer (one partition per reduce task) and sorted by key within each partition.
mapreduce.task.io.sort.mb: 512
The total amount of buffer memory to use while sorting files, in megabytes. By default, gives each merge stream 1MB, which should minimize seeks.
mapreduce.task.io.sort.factor: 64
The number of streams to merge at once while sorting files. This determines the number of open file handles.
mapreduce.reduce.shuffle.input.buffer.percent: 0.70
The percentage of memory to be allocated from the maximum heap size to storing map outputs during the shuffle.
mapreduce.reduce.input.buffer.percent: 0.70
The percentage of memory (relative to the maximum heap size) to retain map outputs during the reduce. When the shuffle is concluded, any remaining map
outputs in memory must consume less than this threshold before the reduce can begin.
mapreduce.reduce.shuffle.parallelcopies: 128
The default number of parallel transfers run by reduce during the copy (shuffle) phase.
mapreduce.reduce.memory.mb: 1024
The amount of memory to request from the scheduler for each reduce task.
mapreduce.reduce.shuffle.merge.percent: 0.66
The usage threshold at which an in-memory merge will be initiated, expressed as a percentage of the total memory allocated to storing in-memory map outputs,
as defined by mapreduce.reduce.shuffle.input.buffer.percent.
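The per-spill steps described above (partition, sort by key, optionally combine) can be modelled in a few lines of Java. This is a simplified sketch, not Hadoop's MapTask: records are String/int pairs, the combiner is assumed to be a sum (as in word count), and a TreeMap stands in for the in-buffer sort. The partition formula is the one used by Hadoop's default HashPartitioner, but here it is applied to java.lang.String hash codes, whereas real jobs hash Writable keys such as Text, so actual partition numbers will differ. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Simplified model of one spill (not Hadoop's MapTask): route each buffered record
 * to a reduce partition, keep keys sorted within each partition, and optionally
 * apply a combiner. Keys are Strings, values are ints, and the combiner is assumed
 * to be a sum.
 */
final class SpillSketch {

    /** Partition formula of Hadoop's default HashPartitioner, applied to a String key. */
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Returns one key-sorted map per partition; each key maps to its (possibly combined) values. */
    static List<TreeMap<String, List<Integer>>> spill(
            List<Map.Entry<String, Integer>> buffer, int numReduceTasks, boolean useCombiner) {
        List<TreeMap<String, List<Integer>>> partitions = new ArrayList<>();
        for (int p = 0; p < numReduceTasks; p++) {
            partitions.add(new TreeMap<>());   // TreeMap keeps keys in sorted order
        }
        for (Map.Entry<String, Integer> record : buffer) {
            int p = partitionFor(record.getKey(), numReduceTasks);
            partitions.get(p)
                    .computeIfAbsent(record.getKey(), k -> new ArrayList<>())
                    .add(record.getValue());
        }
        if (useCombiner) {
            // Sum combiner: collapse all values of a key into a single value.
            for (TreeMap<String, List<Integer>> part : partitions) {
                part.replaceAll((key, values) ->
                        List.of(values.stream().mapToInt(Integer::intValue).sum()));
            }
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> buffer = List.of(
                Map.entry("the", 1), Map.entry("cat", 1), Map.entry("the", 1), Map.entry("hat", 1));
        System.out.println(spill(buffer, 2, true));
    }
}
```

For example, with a single reducer and the combiner enabled, the buffer (the, cat, the) spills as one partition {cat=[1], the=[2]}: keys come out sorted and the two "the" records are collapsed into one, which is why a combiner shrinks both the spill files and the shuffle traffic.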

Circular Buffer

The Ring Buffer (aka Circular Buffer) is a key concept in the MapReduce ecosystem.

We face two major challenges in any MapReduce program, both because we are dealing with a massive amount of data (if that were not the case, we would not need MapReduce):

1. The output of the map tasks cannot be written to disk constantly, because that would be too slow.
2. Nor can it be stored entirely in memory, because most systems do not have enough memory.

We therefore have to combine disk and memory efficiently.

The circular buffer is fast: writing to memory is much faster than performing disk I/O, and data is flushed to disk only when needed.
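A minimal Java sketch of the idea: a fixed-size ring buffer that writes into memory, wraps its write position around the end of the array, and spills its contents once a fill threshold is reached. It is single-threaded and stores ints for clarity; Hadoop's real buffer (MapOutputBuffer inside MapTask) holds serialized key/value bytes plus metadata and spills from a background thread while the map keeps writing. The class name and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal single-threaded ring buffer that "spills" when it reaches a fill
 * threshold. Illustration only; the spill list stands in for spill files on disk.
 */
final class SpillingRingBuffer {
    private final int[] slots;
    private final int spillThreshold;   // number of filled slots that triggers a spill
    private int head = 0;               // index of the oldest unspilled element
    private int size = 0;               // number of unspilled elements
    private final List<List<Integer>> spills = new ArrayList<>();

    SpillingRingBuffer(int capacity, double spillPercent) {
        if (capacity < 1 || spillPercent <= 0 || spillPercent > 1) {
            throw new IllegalArgumentException("need capacity >= 1 and 0 < spillPercent <= 1");
        }
        slots = new int[capacity];
        spillThreshold = Math.max(1, (int) Math.floor(capacity * spillPercent));
    }

    void write(int value) {
        slots[(head + size) % slots.length] = value;   // tail position wraps around the array
        size++;
        if (size >= spillThreshold) {
            spill();
        }
    }

    /** Copies the unspilled elements out in write order and frees their slots. */
    void spill() {
        if (size == 0) {
            return;
        }
        List<Integer> run = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            run.add(slots[(head + i) % slots.length]);
        }
        spills.add(run);
        head = (head + size) % slots.length;           // next write continues after the spilled region
        size = 0;
    }

    List<List<Integer>> spills() {
        return spills;
    }

    public static void main(String[] args) {
        SpillingRingBuffer buf = new SpillingRingBuffer(10, 0.80);
        for (int i = 1; i <= 20; i++) {
            buf.write(i);
        }
        buf.spill();                                   // final flush, like a map task's last spill
        System.out.println(buf.spills());
    }
}
```

With capacity 10 and an 80% threshold, writing 1 to 20 and then flushing produces the spills [1..8], [9..16] and [17..20]; the second run is read across the end of the array and back to the start, which is the wraparound that makes the buffer circular.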

Continuous logging can fill up space on the systems, causing other programs to also run out of space and fail. In such cases, either logs have to be manually removed or a log rotation policy has to be implemented.

Question : Match each property with its description

A. mapred.map.child.java.opts
B. mapred.reduce.child.java.opts
C. mapred.child.java.opts
D. mapred.child.ulimit

1. Maximum size of virtual memory consumed by a task and its children.
2. Applies to Map Tasks
3. Applies to Reduce Tasks
4. Applies to both Map and Reduce tasks

1. A-1, B-2, C-3, D-4
2. A-2, B-3, C-1, D-4
3. [option not available]
4. A-2, B-3, C-4, D-1
5. A-3, B-2, C-1, D-4

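Correct Answer : 4
Explanation: mapred.map.child.java.opts sets the JVM options for map tasks only, and mapred.reduce.child.java.opts sets them for reduce tasks only. mapred.child.java.opts applies to both map and reduce task JVMs. mapred.child.ulimit limits the virtual memory (in KB) of a process launched by the framework, including its children.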
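The three java.opts properties can be read as a lookup order. The Java sketch below resolves a task's JVM options with java.util.Properties standing in for Hadoop's Configuration. It assumes the classic behaviour in which a task-specific key, when set, takes precedence over mapred.child.java.opts, and it falls back to -Xmx200m, the historical default of mapred.child.java.opts. The class and method names are hypothetical.

```java
import java.util.Properties;

/**
 * Hypothetical helper: picks the child-JVM options for a map or reduce task.
 * Assumption: a task-specific key overrides mapred.child.java.opts, which in turn
 * overrides the built-in default (-Xmx200m in classic Hadoop releases).
 */
final class ChildJvmOpts {
    static final String DEFAULT_OPTS = "-Xmx200m";

    static String javaOptsFor(Properties conf, boolean isMapTask) {
        String specificKey = isMapTask
                ? "mapred.map.child.java.opts"      // applies to map tasks only
                : "mapred.reduce.child.java.opts";  // applies to reduce tasks only
        // mapred.child.java.opts applies to both task types when no specific key is set.
        String shared = conf.getProperty("mapred.child.java.opts", DEFAULT_OPTS);
        return conf.getProperty(specificKey, shared);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("mapred.child.java.opts", "-Xmx512m");
        conf.setProperty("mapred.reduce.child.java.opts", "-Xmx1024m");
        System.out.println("map:    " + javaOptsFor(conf, true));
        System.out.println("reduce: " + javaOptsFor(conf, false));
    }
}
```

For example, with mapred.child.java.opts=-Xmx512m and mapred.reduce.child.java.opts=-Xmx1024m, map tasks resolve to -Xmx512m and reduce tasks to -Xmx1024m.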

Related Questions

Question : Which of the following do you do with the Job class?

1. Create a Job instance

2. Submit the job

3. [option not available]

4. 1,2

5. 1,2,3

Question : You have submitted a job and then call a setXXX() method on that Job instance. What will happen?

1. It will set the new values on the submitted job and apply them at runtime

2. It will set the new values, which will be applied only to the Mappers and Reducers that have yet to start

3. [option not available]

4. It will not throw any error and will silently discard the new value

Question : Which of the following is true?

1. Both the submit() and waitForCompletion() methods are blocking calls

2. Both the submit() and waitForCompletion() methods are non-blocking calls

3. [option not available]

4.

Question : Using ToolRunner allows you to make use of the GenericOptionsParser, which helps us

1. to pass Hadoop options

2. to pass command-line arguments

3. [option not available]

4. 1,2

5. 1,2,3

Question : Put the ways in which Hadoop configuration can be overridden in order of priority, from lowest to highest
A. Hadoop framework JAR file (built-in defaults)
B. Global XML file
C. Local XML file
D. Command-line arguments
E. Within the Driver class

1. C,D,E,B,A
2. B,A,D,E,C
3. [option not available]
4. E,C,A,B,D
5. A,B,C,D,E

Question : Which of the following types of counters are available in the MapReduce framework?
A. File-System level
B. Job level
C. Framework level
D. Custom counter

1. A,B,C
2. B,C,D
3. [option not available]
4. A,B,C,D