Question : Which one of the following is NOT a valid Oozie action?
1. mapreduce
2. pig
3. hive
4. mrunit
Correct Answer : 4
Explanation: With MRUnit, you can craft test input, push it through your mapper and/or reducer, and verify the output, all within a JUnit test. As with other JUnit tests, this lets you debug your code using the JUnit test as a driver. A map/reduce pair can be tested using MRUnit's MapReduceDriver, and a combiner can be tested with MapReduceDriver as well. A PipelineMapReduceDriver lets you test a workflow of map/reduce jobs. Currently, partitioners do not have a test driver in MRUnit. MRUnit allows you to do TDD and write lightweight unit tests that accommodate Hadoop's specific architecture and constructs. MRUnit cannot, however, be used as an action in an Oozie workflow.
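To make the driver pattern concrete, here is a plain-Python analogy of how MRUnit-style tests work (the function and class names are hypothetical, not the real MRUnit Java API): craft input, push it through the mapper or reducer, and assert on the output inside a unit test.

```python
# Illustrative sketch only: a word-count mapper/reducer tested in the same
# style as MRUnit's MapDriver/ReduceDriver, but written with Python's unittest.
import unittest


def wordcount_map(_offset, line):
    """Mapper: emit (word, 1) for each token in the line."""
    return [(word, 1) for word in line.split()]


def wordcount_reduce(word, counts):
    """Reducer: sum the literal 1s received for one key."""
    return [(word, sum(counts))]


class WordCountDriverTest(unittest.TestCase):
    def test_map(self):
        # Analogous to MapDriver.withInput(...).withOutput(...).runTest()
        self.assertEqual(wordcount_map(0, "cat dog cat"),
                         [("cat", 1), ("dog", 1), ("cat", 1)])

    def test_reduce(self):
        # Analogous to ReduceDriver.withInput(...).withOutput(...).runTest()
        self.assertEqual(wordcount_reduce("cat", [1, 1]), [("cat", 2)])
```

Run with `python -m unittest <module>`; the test class, not the cluster, drives the code, which is what makes this style of test lightweight and debuggable.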
Question : You want to count the number of occurrences of each unique word in the supplied input data. You've decided to implement this by having your mapper tokenize each word and emit the literal value 1, and then have your reducer increment a counter for each literal 1 it receives. After successfully implementing this, it occurs to you that you could optimize the job by specifying a combiner. Will you be able to reuse your existing Reducer as your combiner in this case, and why or why not?
1. Yes, because the sum operation is both associative and commutative and the input and output types to the reduce method match.
2. No, because the sum operation in the reducer is incompatible with the operation of a Combiner.
3. No, because the Reducer and Combiner are separate interfaces.
4. No, because the Combiner is incompatible with a mapper which doesn't use the same data type for both the key and value.
5. Yes, because Java is a polymorphic object-oriented language and thus reducer code can be reused as a combiner.
Correct Answer : 1
Explanation: Combiners are used to increase the efficiency of a MapReduce program. They aggregate intermediate map output locally on each mapper node, which reduces the amount of data that needs to be transferred across the network to the reducers. You can reuse your reducer code as a combiner if the operation it performs is commutative and associative. The execution of a combiner is not guaranteed: Hadoop may or may not run it, and if needed it may run it more than once. Therefore your MapReduce job must not depend on the combiner executing.
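The "commutative and associative" condition can be demonstrated in a few lines of plain Python (a sketch, not Hadoop itself): because addition has both properties, pre-summing any subset of the mapper output locally, as a combiner would, cannot change the final per-key totals.

```python
# Sketch: the same summing logic serves as both reducer and combiner, and
# running it zero times or once per partition yields identical results.
from collections import defaultdict


def reduce_counts(pairs):
    """Reducer logic: sum values per key. Reusable as a combiner."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


mapper_output = [("cat", 1), ("dog", 1), ("cat", 1), ("cat", 1)]

# Without a combiner: the reducer sees every literal 1.
no_combiner = reduce_counts(mapper_output)

# With a combiner: each mapper's slice is pre-summed locally first,
# then the reducer sums the partial sums.
partials = reduce_counts(mapper_output[:2]) + reduce_counts(mapper_output[2:])
with_combiner = reduce_counts(partials)

assert no_combiner == with_combiner == [("cat", 3), ("dog", 1)]
```

This is also why the non-guaranteed execution of combiners is safe here: applying the combiner zero, one, or several times leaves the reducer's final output unchanged.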
Question : Workflows expressed in Oozie can contain:
1. Sequences of MapReduce and Pig jobs. These sequences can be combined with other actions including forks, decision points, and path joins.
2. Sequences of MapReduce jobs only; no Pig or Hive tasks or jobs. These MapReduce sequences can be combined with forks and path joins.
3. Sequences of MapReduce and Pig jobs. These are limited to linear sequences of actions with exception handlers but no forks.
4. Iterative repetition of MapReduce jobs until a desired answer or state is reached.
Correct Answer : 1
Explanation: An Oozie workflow is a collection of actions (i.e. Hadoop Map/Reduce jobs, Pig jobs) arranged in a control-dependency DAG (Directed Acyclic Graph), specifying a sequence of action executions. This graph is specified in hPDL (an XML Process Definition Language). hPDL is a fairly compact language, using a limited amount of flow-control and action nodes. Control nodes define the flow of execution and include the beginning and end of a workflow (start, end and fail nodes) and mechanisms to control the workflow execution path (decision, fork and join nodes). Note: Oozie is a Java web application that runs in a Java servlet container (Tomcat) and uses a database to store workflow definitions and currently running workflow instances, including their states and variables.
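A minimal hPDL sketch may help make the control nodes concrete. All names, paths, and the Pig script below are placeholders, and real workflows need a complete action configuration; the point is only the shape of start, fork, join, kill, and end nodes around two parallel actions.

```xml
<!-- Hypothetical example: a start node fans out via a fork to a MapReduce
     action and a Pig action in parallel, which meet again at a join. -->
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="forking"/>
    <fork name="forking">
        <path start="mr-node"/>
        <path start="pig-node"/>
    </fork>
    <action name="mr-node">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
        </map-reduce>
        <ok to="joining"/>
        <error to="fail"/>
    </action>
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>wordcount.pig</script>
        </pig>
        <ok to="joining"/>
        <error to="fail"/>
    </action>
    <join name="joining" to="end"/>
    <kill name="fail">
        <message>Workflow failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Each action's `ok`/`error` transitions are what let Oozie express decision points and error handling declaratively rather than in job code.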
Question : How does a client read a file from HDFS?
1. The client queries the NameNode for the block location(s). The NameNode returns the block location(s) to the client. The client reads the data directly off the DataNode(s).
2. The client queries all DataNodes in parallel. The DataNode that contains the requested data responds directly to the client. The client reads the data directly off the DataNode.
3. The client contacts the NameNode for the block location(s). The NameNode then queries the DataNodes for block locations. The DataNodes respond to the NameNode, and the NameNode redirects the client to the DataNode that holds the requested data block(s). The client then reads the data directly off the DataNode.
4. The client contacts the NameNode for the block location(s). The NameNode contacts the DataNode that holds the requested data block. Data is transferred from the DataNode to the NameNode, and then from the NameNode to the client.
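In HDFS, the NameNode serves only block metadata; block data is streamed directly from the DataNodes and never passes through the NameNode. The toy simulation below (plain Python with made-up paths and node names, not the real HDFS protocol) illustrates that two-step read path.

```python
# Toy simulation of the HDFS read path.
namenode = {            # block metadata held by the NameNode
    "/logs/a.txt": [("blk_1", "datanode-1"), ("blk_2", "datanode-3")],
}
datanodes = {           # block contents held by individual DataNodes
    "datanode-1": {"blk_1": b"hello "},
    "datanode-3": {"blk_2": b"world"},
}


def read_file(path):
    # Step 1: the client queries the NameNode for the block locations only.
    locations = namenode[path]
    # Step 2: the client reads each block directly off the DataNode holding it.
    return b"".join(datanodes[node][block] for block, node in locations)


assert read_file("/logs/a.txt") == b"hello world"
```

Note that the `namenode` lookup returns locations, never data; only the `datanodes` dictionaries are touched for the file contents.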