
MapR (HP) Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : Identify which best defines a SequenceFile.
1. A SequenceFile contains a binary encoding of an arbitrary number of homogeneous Writable objects
2. A SequenceFile contains a binary encoding of an arbitrary number of heterogeneous Writable objects
3. A SequenceFile contains a binary encoding of an arbitrary number of WritableComparable objects, in sorted order
4. A SequenceFile contains a binary encoding of an arbitrary number of key-value pairs. Each key must be the same type. Each value must be the same type.

Correct Answer : 4
Explanation: A SequenceFile is a flat file consisting of binary key/value pairs.
There are three different SequenceFile formats:
- Uncompressed key/value records.
- Record-compressed key/value records - only the values are compressed.
- Block-compressed key/value records - both keys and values are collected in 'blocks' separately and compressed. The size of the 'block' is configurable.
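
To make this concrete, here is a minimal sketch of writing and then reading a SequenceFile through the org.apache.hadoop.io.SequenceFile API. The file name demo.seq and the IntWritable/Text record types are assumptions chosen for the example, not part of the question:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("demo.seq"); // hypothetical output path

        // A SequenceFile holds key-value pairs of one fixed key type and
        // one fixed value type: here every key is an IntWritable and
        // every value a Text.
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(IntWritable.class),
                SequenceFile.Writer.valueClass(Text.class))) {
            for (int i = 0; i < 5; i++) {
                writer.append(new IntWritable(i), new Text("record-" + i));
            }
        }

        // Read the pairs back in the order they were written.
        try (SequenceFile.Reader reader = new SequenceFile.Reader(conf,
                SequenceFile.Reader.file(path))) {
            IntWritable key = new IntWritable();
            Text value = new Text();
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
        }
    }
}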




Question : Your client application submits a MapReduce job to your Hadoop cluster. Identify the
Hadoop daemon on which the Hadoop framework will look for an available slot to schedule a MapReduce operation.

1. TaskTracker
2. NameNode
3. DataNode
4. JobTracker
5. Secondary NameNode

Correct Answer : 4
Explanation: The JobTracker is the daemon service for submitting and tracking MapReduce jobs in Hadoop. Only one JobTracker process runs on any Hadoop cluster, in its own JVM; in a typical production cluster it runs on a separate machine. Each slave node is configured with the JobTracker's location. The JobTracker is a single point of failure for the Hadoop MapReduce service: if it goes down, all running jobs are halted. The JobTracker performs the following actions (from the Hadoop wiki):
1. Client applications submit jobs to the JobTracker.
2. The JobTracker talks to the NameNode to determine the location of the data.
3. The JobTracker locates TaskTracker nodes with available slots at or near the data.
4. The JobTracker submits the work to the chosen TaskTracker nodes.
5. The TaskTracker nodes are monitored. If they do not submit heartbeat signals often enough, they are deemed to have failed and the work is scheduled on a different TaskTracker.
6. A TaskTracker notifies the JobTracker when a task fails. The JobTracker decides what to do then: it may resubmit the job elsewhere, it may mark that specific record as something to avoid, and it may even blacklist the TaskTracker as unreliable.
7. When the work is completed, the JobTracker updates its status.
8. Client applications can poll the JobTracker for information.
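
For context, this is roughly what the client side of that submission looks like with the org.apache.hadoop.mapreduce.Job API. The word-count logic, the TokenMapper/SumReducer class names, and the /user/demo paths are placeholder assumptions for the sketch:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubmitJob {

    // Hypothetical word-count mapper: emits (word, 1) for every token.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Hypothetical reducer: sums the counts for each word.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(SubmitJob.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/user/demo/in"));    // assumed input dir
        FileOutputFormat.setOutputPath(job, new Path("/user/demo/out")); // assumed output dir
        // waitForCompletion() submits the job to the cluster (the JobTracker
        // in MRv1) and polls it for progress until the job finishes.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}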




Question : How are keys and values presented and passed to the reducers during a standard sort and shuffle phase of MapReduce?

1. Keys are presented to reducer in sorted order; values for a given key are not sorted.
2. Keys are presented to reducer in sorted order; values for a given key are sorted in ascending order.
3. Keys are presented to a reducer in random order; values for a given key are not sorted.
4. Keys are presented to a reducer in random order; values for a given key are sorted in ascending order.

Correct Answer : 1
Explanation: The Reducer has 3 primary phases:
1. Shuffle: The Reducer copies the sorted output from each Mapper across the network using HTTP.
2. Sort: The framework merge-sorts Reducer inputs by key (since different Mappers may have output the same key). The shuffle and sort phases occur simultaneously, i.e. outputs are merged while they are being fetched.
SecondarySort: To achieve a secondary sort on the values returned by the value iterator, the application should extend the key with the secondary key and define a grouping comparator. The keys will be sorted using the entire key, but will be grouped using the grouping comparator to decide which keys and values are sent in the same call to reduce.
3. Reduce: In this phase the reduce(Object, Iterable, Context) method is called for each <key, (collection of values)> pair in the sorted inputs. The output of the reduce task is typically written to a RecordWriter via TaskInputOutputContext.write(Object, Object). The output of the Reducer is not re-sorted.
Reference: org.apache.hadoop.mapreduce.Reducer
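
A minimal sketch of a reducer illustrating this contract (the MaxReducer name and max-per-key logic are assumptions for the example): keys arrive in sorted order, while the values behind each key carry no ordering guarantee unless a secondary sort is configured:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// reduce() is invoked once per key, with keys in sorted order.
public class MaxReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int max = Integer.MIN_VALUE;
        for (IntWritable v : values) {   // values are NOT sorted
            max = Math.max(max, v.get());
        }
        context.write(key, new IntWritable(max));
    }
}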


Related Questions


Question : The Hadoop API uses basic Java types such as LongWritable, Text, and IntWritable. They have almost the same features as the default Java classes.
What are these writable data types optimized for?


1. Writable data types are specifically optimized for network transmissions
2. Writable data types are specifically optimized for file system storage
3. Writable data types are specifically optimized for map-reduce processing
4. Writable data types are specifically optimized for data retrieval




Question : Can a custom data type be implemented for Map-Reduce processing?

1. No, Hadoop does not provide techniques for custom datatypes
2. Yes, but only for mappers
3. Yes, custom data types can be implemented as long as they implement the Writable interface
4. Yes, but only for reducers
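
As a sketch of the idea behind this question: any class that implements the org.apache.hadoop.io.Writable interface can serve as a custom map-reduce value type (keys additionally implement WritableComparable so they can be sorted). The Point2D class below is a hypothetical example:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Hypothetical custom value type: implementing Writable is all Hadoop
// needs to serialize it between map and reduce tasks.
public class Point2D implements Writable {
    private double x;
    private double y;

    public Point2D() { }  // no-arg constructor required for deserialization

    public Point2D(double x, double y) { this.x = x; this.y = y; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeDouble(x);
        out.writeDouble(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        x = in.readDouble();
        y = in.readDouble();
    }

    @Override
    public String toString() { return x + "," + y; }
}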


Question : What happens if mapper output does not match reducer input?

1. Hadoop API will convert the data to the type that is needed by the reducer.
2. Data input/output inconsistency cannot occur. A preliminary validation check is executed prior
to the full execution of the job to ensure there is consistency.
3. The Java compiler will report an error during compilation and the job will not run.
4. A runtime exception will be thrown and the map-reduce job will fail.
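
A short sketch of why the mismatch only surfaces at runtime: Java generics are erased when the job runs, so Hadoop checks the classes declared on the Job configuration instead. Assuming a Job instance named job as in the submission sketch above, the mapper's declared output types must match the reducer's input types:

// The mapper's output types are plain configuration, checked while the
// job runs; a mismatch with the reducer's input types raises an
// exception mid-job rather than a compile error.
job.setMapOutputKeyClass(Text.class);          // must equal reducer input key type
job.setMapOutputValueClass(IntWritable.class); // must equal reducer input value type
job.setOutputKeyClass(Text.class);             // reducer output key type
job.setOutputValueClass(IntWritable.class);    // reducer output value type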




Question : Can you provide multiple input paths to a map-reduce job?
1. Yes, but only in Hadoop 0.22+
2. No, Hadoop always operates on one input directory
3. Yes, developers can add any number of input paths
4. Yes, but the limit is currently capped at 10 input paths.
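
A brief sketch, again assuming a Job instance named job: FileInputFormat accepts any number of input paths (the /logs paths below are made up for the example):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// Add paths one at a time...
FileInputFormat.addInputPath(job, new Path("/logs/2013/01"));
FileInputFormat.addInputPath(job, new Path("/logs/2013/02"));
// ...or several at once, comma separated.
FileInputFormat.addInputPaths(job, "/logs/2013/03,/logs/2013/04");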




Question : Can you assign different mappers to different input paths?

1. Yes, but only if data is identical.
2. Yes, different mappers can be assigned to different directories
3. No, all input paths must be processed by the same mapper
4. Yes, but only in Hadoop 0.22+
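
A sketch of the usual mechanism for this, the MultipleInputs helper, assuming a Job named job and two hypothetical mapper classes (SalesMapper, ReturnsMapper):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Each directory gets its own input format and mapper class.
MultipleInputs.addInputPath(job, new Path("/data/sales"),
        TextInputFormat.class, SalesMapper.class);
MultipleInputs.addInputPath(job, new Path("/data/returns"),
        TextInputFormat.class, ReturnsMapper.class);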





Question : Can you suppress reducer output?

1. Yes, there is a special data type that will suppress job output
2. No, a map-reduce job will always generate output.
3. Yes, but only in Hadoop 0.22+
4. Yes, but only during map execution when reducers have been set to zero
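
One way to do this, as a sketch assuming a Job named job, is the stock NullOutputFormat, which discards everything the job emits:

import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

// NullOutputFormat swallows whatever the reducers (or the mappers, in a
// map-only job) write, so no output files are produced.
job.setOutputFormatClass(NullOutputFormat.class);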