Question : Select the recommended approach for setting memory parameters
A. The mapred.child.ulimit parameter should be more than twice the heap size.
B. The io.sort.mb parameter must be less than the heap size.
C. It is better to use an environment variable to set the JVM heap size instead of job-specific parameters.
D. A,B
E. A,C
1. The mapred.child.ulimit parameter should be more than twice the heap size.
2. The io.sort.mb parameter must be less than the heap size.
Correct Answer : D
Explanation: 1. mapred.child.java.opts and mapred.job.map.memory.mb control different aspects of memory usage. mapred.child.java.opts only sets the maximum heap size a child JVM can use; mapred.job.map.memory.mb is the maximum virtual memory allowed for a Hadoop task subprocess, and it can be larger than mapred.child.java.opts because it must also cover memory outside the heap (stack, permgen, etc.).
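As a minimal sketch (using the old mapred.* property names quoted above; the values are purely illustrative, not recommendations), the two limits are typically set together so that the JVM heap stays below the task's virtual-memory ceiling:

import org.apache.hadoop.conf.Configuration;

public class HeapVsVirtualMemory {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Maximum heap of each child JVM (the space managed by the garbage collector).
        conf.set("mapred.child.java.opts", "-Xmx512m");

        // Maximum virtual memory of the whole task subprocess, in MB.
        // Kept well above the heap so stack, permgen, etc. also fit.
        conf.setInt("mapred.job.map.memory.mb", 1024);

        System.out.println("heap = " + conf.get("mapred.child.java.opts"));
        System.out.println("vmem = " + conf.getInt("mapred.job.map.memory.mb", -1) + " MB");
    }
}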
2. Sometimes setting mapred.child.java.opts alone is not enough, for two reasons: a. this property only controls the heap size of each JVM, so it is not flexible (e.g. a mapper task may want more memory while the reducer needs only a little); b. this property does not account for new processes spawned by the original task, which are not constrained in their total memory, and such processes can drive up the memory usage of the whole task process tree.
Hadoop gives us two choices for addressing these disadvantages. One is setting mapred.child.ulimit. This property is a strict upper bound that prevents a single JVM process from leaking memory and affecting other running processes. However, it is also inflexible and does not take the spawned processes into account.
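A short sketch of the ulimit option, again with illustrative numbers only (mapred.child.ulimit is expressed in kilobytes), keeping the limit at more than twice the heap as the question recommends:

import org.apache.hadoop.conf.Configuration;

public class UlimitSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // 512 MB heap for each child JVM.
        conf.set("mapred.child.java.opts", "-Xmx512m");

        // Hard virtual-memory cap for the launched child, in KB.
        // 1280 MB here is more than twice the 512 MB heap, leaving room
        // for stack, native buffers, and so on.
        conf.setLong("mapred.child.ulimit", 1310720L); // 1280 MB in KB
    }
}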
The other choice is setting mapred.cluster.map.memory.mb (the size, in terms of virtual memory, of a single map slot in the Map-Reduce framework, used by the scheduler; a job can ask for multiple slots for a single map task via mapred.job.map.memory.mb, up to the limit specified by mapred.cluster.max.map.memory.mb) together with mapred.job.map.memory.mb (the size, in terms of virtual memory, of a single map task for the job; if a map task uses more memory than this value, the task is terminated).
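To illustrate the slot accounting described above (the numbers are assumptions for the example, not defaults), a map task asking for more virtual memory than one slot provides is simply charged enough whole slots to cover the request:

public class SlotMath {
    public static void main(String[] args) {
        int clusterSlotMb = 1024;   // mapred.cluster.map.memory.mb (cluster-level slot size)
        int jobMapMb      = 2560;   // mapred.job.map.memory.mb (per-map-task request)

        // The scheduler charges enough whole slots to cover the request.
        int slotsPerMapTask = (int) Math.ceil((double) jobMapMb / clusterSlotMb);
        System.out.println("Each map task occupies " + slotsPerMapTask + " slot(s)"); // prints 3
    }
}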
-Xmx specifies the maximum heap space of the allocated JVM. This is the space reserved for object allocation that is managed by the garbage collector. On the other hand, mapred.job.map.memory.mb specifies the maximum virtual memory allowed for a Hadoop task subprocess. If you exceed the max heap size, the JVM throws an OutOfMemoryError.
The JVM may use more memory than the max heap size because it also needs space to store class metadata (permgen space) and the stack. If the process uses more virtual memory than mapred.job.map.memory.mb, it is killed by Hadoop.
So one does not take precedence over the other (they measure different aspects of memory usage): -Xmx is a parameter to the JVM, while mapred.job.map.memory.mb is a hard upper bound on the virtual memory a task attempt can use, enforced by Hadoop.
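The difference can be seen with a small stand-alone demo (not Hadoop-specific; the -Xmx value is just an example): run it with a small heap, e.g. java -Xmx64m HeapExhaustionDemo, and the JVM itself fails with OutOfMemoryError, whereas exceeding mapred.job.map.memory.mb would instead get the task attempt killed from outside the JVM.

import java.util.ArrayList;
import java.util.List;

public class HeapExhaustionDemo {
    public static void main(String[] args) {
        List<byte[]> hog = new ArrayList<>();
        while (true) {
            // Allocate 1 MB per iteration until the -Xmx heap limit is hit,
            // at which point the JVM throws java.lang.OutOfMemoryError.
            hog.add(new byte[1024 * 1024]);
        }
    }
}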
Question : Which of the following can help to improve the performance of a MapReduce job?