
MapR (HP) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : Which statement is true with respect to MapReduce 2.0 (MRv2) or YARN?
1. It is the newer version of MapReduce; using it, the performance of data processing can be increased.
2. The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker,
resource management and job scheduling or monitoring, into separate daemons.
3. Access Mostly Uused Products by 50000+ Subscribers
4. All of the above
5. Only 2 and 3 are correct
Ans : 5
Exp : MapReduce has undergone a complete overhaul in hadoop-0.23, and we now have what we call MapReduce 2.0 (MRv2) or YARN.
The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker,
resource management and job scheduling or monitoring, into separate daemons. The idea is to have a global ResourceManager (RM)
and per-application ApplicationMaster (AM). An application is either a single job in the classical sense of Map-Reduce jobs or a DAG of jobs.



Question : Which statement is true about the ApplicationsManager?

1. It is responsible for accepting job submissions
2. It negotiates the first container for executing the application-specific ApplicationMaster
and provides the service for restarting the ApplicationMaster container on failure.
3. Access Mostly Uused Products by 50000+ Subscribers
4. All of the above
5. 1 and 2 are correct
Ans : 5
Exp : The ApplicationsManager is responsible for accepting job submissions,
negotiating the first container for executing the application-specific ApplicationMaster, and providing the
service for restarting the ApplicationMaster container on failure.



Question : Which tool is used to list all the blocks of a file?

1. hadoop fs
2. hadoop fsck
3. Access Mostly Uused Products by 50000+ Subscribers
4. Not Possible
Ans : 2
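Exp : hadoop fsck reports on the health of HDFS files and can list their blocks. For example, hadoop fsck /path/to/file -files -blocks -locations (the path here is only illustrative) prints each block of the file together with the DataNodes that hold its replicas.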






Question : Which two daemons typically run on each slave node in a Hadoop cluster running MapReduce v2 (MRv2) on YARN?

1. TaskTracker

2. Secondary NameNode

3. NodeManager

4. DataNode

5. ZooKeeper

6. JobTracker

7. NameNode

8. JournalNode


1. 1,2
2. 2,3
3. Access Mostly Uused Products by 50000+ Subscribers
4. 5,6
5. 7,8

Correct Answer : Get Latest Questions and Answer :

Explanation: Each slave node in a cluster configured to run MapReduce v2 (MRv2) on YARN typically runs a DataNode daemon (for HDFS functions) and NodeManager daemon (for YARN functions).
The NodeManager handles communication with the ResourceManager, oversees application container lifecycles, monitors CPU and memory resource use of the containers, tracks the node
health, and handles log management. It also makes available a number of auxiliary services to YARN applications.






Question : How does the Hadoop framework determine the number of Mappers required for a MapReduce job on a cluster running MapReduce v2 (MRv2) on YARN?
1. The number of Mappers is equal to the number of InputSplits calculated by the client submitting the job
2. The ApplicationMaster chooses the number based on the number of available nodes

3. Access Mostly Uused Products by 50000+ Subscribers
4. NodeManager where the job's HDFS blocks reside
5. The developer specifies the number in the job configuration



Correct Answer : 1
Each Mapper task processes a single InputSplit. The client calculates the InputSplits before submitting the job to the cluster. The developer may specify how the input split is
calculated, with a single HDFS block being the most common split. This is true for both MapReduce v1 (MRv1) and YARN MapReduce implementations.

With YARN, each mapper will be run in a container which consists of a specific amount of CPU and memory resources. The ApplicationMaster requests a container for each mapper. The
ResourceManager schedules the resources and informs the ApplicationMaster of the available NodeManagers where the container may be launched.

With MRv1, each TaskTracker (slave node) is configured to handle a maximum number of concurrent map tasks. The JobTracker (master node) assigns a TaskTracker a specific InputSplit to
process as a single map task.
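
As an illustration, the split size, and therefore the number of Mappers, can be bounded from the driver; a minimal sketch assuming the standard org.apache.hadoop.mapreduce FileInputFormat API, where the class name, job name, and split sizes are illustrative and the paths are taken from the command line:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SplitSizeDemo {                                            // illustrative class name
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "split-size-demo");

        FileInputFormat.addInputPath(job, new Path(args[0]));          // input path from the command line
        FileOutputFormat.setOutputPath(job, new Path(args[1]));        // output path from the command line

        // The client computes the InputSplits when the job is submitted; one split per
        // HDFS block is the most common case. Bounding the split size changes how many
        // InputSplits, and therefore how many Mappers, are created.
        FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);   // 64 MB, illustrative
        FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);  // 128 MB, illustrative

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}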






Question :

You need to analyze 60,000,000 images stored in JPEG format, each of which is approximately 25
KB. Because your Hadoop cluster isn't optimized for storing and processing many small files, you
decide to do the following actions:
1. Group the individual images into a set of larger files
2. Use the set of larger files as input for a MapReduce job that processes them directly with
Python using Hadoop streaming
Which data serialization system gives you the flexibility to do this?

A. CSV
B. XML
C. HTML
D. Avro
E. Sequence Files
F. JSON

1. A,B
2. C,D
3. Access Mostly Uused Products by 50000+ Subscribers
4. D,E
5. C,E

Correct Answer : Get Latest Questions and Answer :
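
For background, small files like these images are commonly packed into a SequenceFile with the file name as the key and the raw bytes as the value; a minimal writer sketch assuming the standard SequenceFile API, where the class name and both paths are illustrative:

import java.io.File;
import java.nio.file.Files;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ImagePacker {                                                 // illustrative class name
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(new Path("/data/images.seq")),   // illustrative HDFS path
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            // One record per image: key = file name, value = raw JPEG bytes
            // (assumes the illustrative local directory exists and contains the images).
            for (File jpeg : new File("/local/images").listFiles()) {
                byte[] bytes = Files.readAllBytes(jpeg.toPath());
                writer.append(new Text(jpeg.getName()), new BytesWritable(bytes));
            }
        }
    }
}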






Related Questions


Question : Map the following compression types to their descriptions.
A. NONE
B. RECORD
C. BLOCK

1. Do not compress
2. Compress only values
3. Access Mostly Uused Products by 50000+ Subscribers
4. Compress both value and key

1. A-1, B-4, C-3
2. A-1, B-3, C-4
3. Access Mostly Uused Products by 50000+ Subscribers
4. A-1, B-2, C-3
5. A-3, B-2, C-4


Question : Which of the following information is stored in the header of a SequenceFile?
A. Magic number to identify it as a SequenceFile
B. Type of key
C. Type of Value
D. Compression Codec detail

1. A,D
2. C,D
3. Access Mostly Uused Products by 50000+ Subscribers
4. B,C,D
5. A,B,C,D
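
For reference, these header fields can be inspected through the SequenceFile.Reader API; a minimal sketch assuming an existing file, whose path is illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public class HeaderInfo {                                                  // illustrative class name
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Open an existing SequenceFile and print what its header records.
        try (SequenceFile.Reader reader = new SequenceFile.Reader(conf,
                SequenceFile.Reader.file(new Path("/data/images.seq")))) { // illustrative path
            System.out.println("Key class        : " + reader.getKeyClassName());
            System.out.println("Value class      : " + reader.getValueClassName());
            System.out.println("Compression codec: " + reader.getCompressionCodec());
            System.out.println("Block compressed : " + reader.isBlockCompressed());
        }
    }
}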


Question : In the case of a SequenceFile, all the keys are stored in the header and their respective values are stored as the content, including the key
length and value length.
1. True
2. False


Question : Select the correct statements regarding SequenceFile
A. '\n' is used as a record terminator
B. Sync marker is used as a record terminator
C. The key-value records are bundled into blocks.
D. The block delimiters are called "markers", and the size of a block is tunable
1. A,C,D
2. A,B,D
3. Access Mostly Uused Products by 50000+ Subscribers
4. A,B,C
5. A,B,C,D


Question : You have to write a job which reads a SequenceFile and produces a Gzip-compressed SequenceFile as output. Below is the
code snippet for the Driver class

setOutputFormatClass(1)
setCompressOutput(2)
setOutputCompressorClass(3)
setOutputCompressionType(4)
setInputFormatClass(5)

Map the numbered calls above to the arguments below.

A. SequenceFileOutputFormat.class
B. job, true
C. job, GzipCodec.class
D. job, CompressionType.BLOCK
E. SequenceFileInputFormat.class

1. 1-B, 2-A, 3-D, 4-C, 5-E
2. 1-E, 2-B, 3-C, 4-D, 5-A
3. Access Mostly Uused Products by 50000+ Subscribers
4. 1-A, 2-B, 3-C, 4-D, 5-E
5. 1-D, 2-B, 3-C, 4-A, 5-E
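
For reference, a driver wired up this way would look roughly as follows; a minimal sketch assuming the standard org.apache.hadoop.mapreduce API, where the class name, the Text/BytesWritable key and value types (they must match the input SequenceFile), and the map-only pass-through setup are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class SeqToGzipSeqDriver {                                          // illustrative class name
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "seq-to-gzip-seq");

        // Read a SequenceFile and write a SequenceFile.
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);

        // Compress the output with the Gzip codec, using BLOCK compression.
        SequenceFileOutputFormat.setCompressOutput(job, true);
        SequenceFileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
        SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);

        // Map-only pass-through; the key/value classes must match the input SequenceFile.
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(Text.class);             // illustrative key type
        job.setOutputValueClass(BytesWritable.class);  // illustrative value type

        FileInputFormat.addInputPath(job, new Path(args[0]));     // input path from the command line
        FileOutputFormat.setOutputPath(job, new Path(args[1]));   // output path from the command line

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}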


Question : Using SequenceFile can save disk space as well as time if multiple MapReduce jobs are chained together
1. True
2. False