
MapR (HP) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : Select the correct statements from below


1. The reduce phase does not start until all map tasks complete

2. In general, it is recommended that we should not enable speculative execution

3. …

4. 1,2

5. 1,2,3


Correct Answer : …
Explanation: Speculative execution can be enabled for both the map and reduce phases.
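For context, Hadoop 1.x exposes speculative execution as two per-phase job properties, both of which default to true. A minimal sketch of toggling them on a job Configuration (the class name below is illustrative, not from the exam):

import org.apache.hadoop.conf.Configuration;

public class SpeculativeExecutionConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Enable speculative execution for map tasks only.
        conf.setBoolean("mapred.map.tasks.speculative.execution", true);
        // Disable it for reduce tasks.
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
    }
}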




Question : In which scenario should we enable JVM re-use?


1. When there are long-running TaskTrackers and JobTrackers

2. If we have a small number of map tasks and reduce tasks

3. …

4. 1,2

5. None of 1,2,3


Correct Answer : …
Explanation: We should enable JVM re-use only when a job runs a large number of very small map or reduce tasks. For a small number of tasks, JVM re-use is not useful: it yields no meaningful performance improvement, and heap-memory issues can arise.





Question : Select the correct statement(s) regarding JVM re-use


1. There is a parameter named mapred.job.reuse.jvm.num.tasks to configure JVM re-use

2. If we set mapred.job.reuse.jvm.num.tasks to -1, an unlimited number of tasks can be executed in the same JVM

3. …

4. 1,2

5. 1,2,3


Correct Answer : …
Explanation: If you have many very small tasks that run one after another, it is useful to set this property to -1, meaning a spawned JVM will be reused an unlimited number of times.

The job then spawns only as many JVMs as it has task slots available, instead of one JVM per task, which for short tasks is a huge performance improvement.

For long-running tasks, by contrast, JVM startup is a very small percentage of the total runtime, so re-use does not give a noticeable boost; there it is actually better to recreate the task process, because issues such as heap fragmentation would otherwise degrade performance.

For medium-length tasks, reusing a JVM for just 2-3 tasks is a good trade-off.
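A minimal sketch of setting this property on a job's Configuration (class name illustrative; the default value is 1, i.e. no re-use):

import org.apache.hadoop.conf.Configuration;

public class JvmReuseConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // -1: a spawned JVM is reused by an unlimited number of tasks of this job.
        conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);
        // For medium-length tasks, a small positive value is the trade-off variant:
        // conf.setInt("mapred.job.reuse.jvm.num.tasks", 3);
    }
}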



Related Questions


Question : Which of the following methods of the Mapper class is/are called during a map task?


1. setup()

2. map()

3. …

4. 1,2

5. 1,2,3


Question : Map the following
A. setup()
B. map()
C. cleanup()

1. once for each record
2. once for each Mapper/split
3. …

1. A-1, B-2, C-3
2. A-3, B-1, C-2
3. …
4. A-2, B-3, C-1
5. A-3, B-2, C-1
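For reference, the new-API Mapper lifecycle can be sketched as below (class name and type parameters are illustrative): setup() and cleanup() each run once per Mapper/split, while map() runs once for each input record.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LifecycleMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void setup(Context context) {
        // Called once per Mapper/split, before the first map() call.
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Called once for each input record in the split.
    }

    @Override
    protected void cleanup(Context context) {
        // Called once per Mapper/split, after the last map() call.
    }
}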


Question : You have written a MapReduce job that opens a connection to HBase and reads data from it. Which is the right place to close the HBase connection?


1. In the setup() method of the Mapper class

2. At the end of the map() method of the Mapper class

3. …

4. 2 and 3 both are correct



Question : You have defined a Mapper class as below:
public class HadoopExamMapper extends Mapper {
public void map(XXXXX key, YYYYY value, Context context)
}
What is the correct replacement for XXXXX and YYYYY?



1. LongWritable, Text

2. LongWritable, IntWritable

3. …

4. IntWritable, Text
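One possible completion, assuming the job uses the default TextInputFormat, where the input key is the line's byte offset (LongWritable) and the value is the line itself (Text); the word-count body is illustrative:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class HadoopExamMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit (token, 1) for every whitespace-separated token in the line.
        for (String token : value.toString().split("\\s+")) {
            word.set(token);
            context.write(word, ONE);
        }
    }
}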



Question : Which of the following is a correct statement regarding the input key and value for the Reducer class?


1. Both the input key and value types of the Reducer must match the output key and value types of the defined Mapper class

2. The output key class and output value class in the Reducer must match those defined in the job configuration

3. …

4. 1,3

5. 1,2
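A sketch of the driver-side wiring these rules describe, assuming the HadoopExamMapper and HadoopExamReducer sketches from the neighbouring questions (all names are placeholders): the map-output types must equal the reducer's input types, and the job's declared output types must equal the reducer's output types.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TypeWiringDriver {
    public static void main(String[] args) throws Exception {
        Job job = new Job();                          // Hadoop 1.x style
        job.setJarByClass(TypeWiringDriver.class);
        job.setMapperClass(HadoopExamMapper.class);   // emits (Text, IntWritable)
        job.setReducerClass(HadoopExamReducer.class); // consumes (Text, Iterable<IntWritable>)

        // Mapper output types == Reducer input types.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Job output types == Reducer output types.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}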



Question : You have the following Reducer class defined:
public class HadoopExamReducer extends Reducer {
public void reduce(XXXXX key, YYYYY value, Context context) ...
}
What is the correct replacement for XXXXX and YYYYY?

1. Text, Iterable

2. Text, IntWritable

3. …

4. IntWritable, List
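One possible completion, assuming the mapper emits (Text, IntWritable) pairs: in the new API the reducer receives each key together with an Iterable over all of that key's values. The summing body is illustrative:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class HadoopExamReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();               // aggregate all values for this key
        }
        context.write(key, new IntWritable(sum));
    }
}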