
MapR (HP) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : Which of the following built-in counters will help us find the total number of bytes written to the local file system during job execution?
1. FILE_BYTES_WRITTEN

2. MAPRFS_BYTES_WRITTEN

3. HDFS_BYTES_WRITTEN

4. LOCAL_BYTES_WRITTEN

Correct Answer : 1
Explanation: Filesystem counters describe how many bytes a job read and wrote on each file system. The following are the typical built-in filesystem counters.
Local file system
FILE_BYTES_READ
FILE_BYTES_WRITTEN
HDFS file system
HDFS_BYTES_READ
HDFS_BYTES_WRITTEN
FILE_BYTES_READ is the number of bytes read from the local file system. Assuming all the map input data comes from HDFS, FILE_BYTES_READ should be zero in the map phase. The input to the reducers, on the other hand, is data sitting on the reduce-side local disks after being fetched from the map-side disks, so FILE_BYTES_READ denotes the total bytes read by reducers from local disk.

FILE_BYTES_WRITTEN consists of two parts. The first part comes from the mappers: every mapper spills intermediate output to disk, and all the bytes the mappers write to disk are included in FILE_BYTES_WRITTEN. The second part comes from the reducers: in the shuffle phase, the reducers fetch intermediate data from the mappers, then merge and spill it to reduce-side disks. All the bytes the reducers write to disk are also included in FILE_BYTES_WRITTEN.

HDFS_BYTES_READ denotes the bytes the mappers read from HDFS when the job starts. This includes not only the content of the source files but also metadata about the splits.

HDFS_BYTES_WRITTEN denotes the bytes written to HDFS, i.e., the size of the job's final output.

Note that HDFS and the local file system are different file systems, so the data reported for the two will never overlap.
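
Where a driver program needs these numbers itself, they can be read from the completed job object. Below is a minimal sketch, not taken from the original, assuming the new (org.apache.hadoop.mapreduce) API; the group name "FileSystemCounters" matches older Hadoop releases and may differ on newer ones, so treat it as an assumption.

import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;

public class CounterReport {

    // Print the filesystem counters discussed above for a completed job.
    public static void report(Job job) throws Exception {
        Counters counters = job.getCounters();
        // "FileSystemCounters" is assumed here; newer releases group these
        // under org.apache.hadoop.mapreduce.FileSystemCounter instead.
        Counter fileWritten = counters.findCounter("FileSystemCounters", "FILE_BYTES_WRITTEN");
        Counter fileRead    = counters.findCounter("FileSystemCounters", "FILE_BYTES_READ");
        Counter hdfsRead    = counters.findCounter("FileSystemCounters", "HDFS_BYTES_READ");
        Counter hdfsWritten = counters.findCounter("FileSystemCounters", "HDFS_BYTES_WRITTEN");
        System.out.println("FILE_BYTES_WRITTEN = " + fileWritten.getValue());
        System.out.println("FILE_BYTES_READ    = " + fileRead.getValue());
        System.out.println("HDFS_BYTES_READ    = " + hdfsRead.getValue());
        System.out.println("HDFS_BYTES_WRITTEN = " + hdfsWritten.getValue());
    }
}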




Question : Label-based scheduling helps us override the default scheduling algorithm and run tasks on specific nodes.
1. True
2. False

Correct Answer : 1 (True)
Explanation: Label-based scheduling lets an administrator attach labels to cluster nodes and lets a job request nodes whose labels match, overriding the scheduler's default placement.




Question : While doing an MRUnit test, you provide an input key and value as well as the expected output. What happens if the actual output does not match the expected output?
1. Test case will fail and driver will throw an exception

2. Test case will fail and no exception from driver

3. [option not shown in the source]

4. Any of the above can happen


Correct Answer : [not shown in the source]
Explanation: MRUnit compares the actual output of the mapper or reducer against the expected output declared in the test; on a mismatch, runTest() fails the test.
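
To make the scenario concrete, here is a minimal MRUnit sketch, not taken from the original; WordMapper is a hypothetical Mapper<LongWritable, Text, Text, LongWritable> that emits each input line with a count of 1. If the mapper's actual output differs from what withOutput() declares, runTest() fails the test.

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class WordMapperTest {

    // WordMapper is hypothetical; substitute the mapper under test.
    private MapDriver<LongWritable, Text, Text, LongWritable> mapDriver;

    @Before
    public void setUp() {
        mapDriver = MapDriver.newMapDriver(new WordMapper());
    }

    @Test
    public void testMap() throws Exception {
        mapDriver.withInput(new LongWritable(0), new Text("hadoop"))
                 .withOutput(new Text("hadoop"), new LongWritable(1))
                 .runTest(); // fails the test here if actual != expected
    }
}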


Related Questions


Question : Please map the following
A. Find all the running jobs
B. Get the completion status of a particular job
C. Stop an already running job

1. hadoop job -list
2. hadoop job -kill job_id
3. hadoop job -status job_id

1. A-1, B-2, C-3
2. A-1, B-3, C-2
3. [option not shown in the source]
4. A-3, B-1, C-2
5. A-3, B-2, C-3
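
For reference, the classic job-client invocations being mapped here are typically run as follows (the job ID is illustrative only):

hadoop job -list
hadoop job -status job_201401011200_0001
hadoop job -kill job_201401011200_0001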


Question : Which is/are the correct ways to set the job priority from below?
1. Setting it in the configuration before the job is submitted:
Configuration conf = new Configuration();
conf.set("mapred.job.priority", "VERY_LOW");


2. Passing it as a parameter while submitting the job:
-D mapred.job.priority=VERY_LOW


3. Using the command line after the job has been submitted:
hadoop job -set-priority job_id VERY_LOW


4. 1,2

5. 1,2,3
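
Putting option 1 into compilable form, here is a minimal sketch, not taken from the original; mapred.job.priority is the old-API property name used in the question (newer Hadoop releases call it mapreduce.job.priority), and the job name is illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class LowPriorityJob {

    // Build a job whose priority is set via configuration before submission.
    public static Job buildJob() throws Exception {
        Configuration conf = new Configuration();
        conf.set("mapred.job.priority", "VERY_LOW"); // must be set before submit
        return Job.getInstance(conf, "low-priority-job");
    }
}

After submission, the same effect is available from the shell with hadoop job -set-priority job_id VERY_LOW, as in option 3.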



Question : You can use the job priority to prioritize your job over other jobs in other pools or queues.
1. True
2. False


Question : In label-based scheduling
1. Users can override the default scheduling algorithm and gain more control over where in the cluster the job should run

2. The location of the labels file can be defined using jobtracker.node.labels.file in the mapred-site.xml file

3. [option not shown in the source]

4. 1,2

5. 1,2,3
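
As a sketch of point 2, the property named in the question would be set in mapred-site.xml roughly as follows; the file path is a hypothetical example:

<property>
  <name>jobtracker.node.labels.file</name>
  <value>/opt/mapr/conf/node.labels</value>
</property>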



Question : Select the correct statement(s) regarding label-based scheduling
1. To list all the available labels in the cluster, you can use hadoop job -showlabels

2. We can submit a job with a label using the following command-line option: hadoop jar -D mapred.job.label=hadoopexam

3. [option not shown in the source]

4. 1,2

5. 1,2,3
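
Combining options 1 and 2, a hedged end-to-end example (the jar name, driver class, label, and paths are all hypothetical, and the -D generic option assumes the driver uses ToolRunner):

hadoop job -showlabels
hadoop jar wordcount.jar WordCount -D mapred.job.label=high_cpu /input /output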



Question : You have executed the following command:

hadoop job -showlabels
Node labels :
CentOS001 : [heavy, high_ram, high_cpu]
CentOS002 : [light, low_ram, low_cpu]
CentOS003 : [medium, m_ram, m_cpu]

And now you submit the job with the below command:

hadoop jar -D mapred.job.label=hadoopexam

What would happen?
1. It will submit the entire job on CentOS001

2. It will submit the entire job on CentOS002

3. It will submit the entire job on CentOS003

4. It will use default scheduling algorithm

5. Job will hang
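
For context, a node labels file consistent with the -showlabels output in this question might look like the following; the format shown (one node per line followed by a comma-separated label list) is an assumption, so consult the MapR documentation for the exact syntax:

CentOS001 heavy, high_ram, high_cpu
CentOS002 light, low_ram, low_cpu
CentOS003 medium, m_ram, m_cpu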