Question : You need to run the same job many times with minor variations. Rather than hardcoding all job configuration options in your driver code, you've decided to have your Driver subclass org.apache.hadoop.conf.Configured and implement the org.apache.hadoop.util.Tool interface. Identify which invocation correctly passes mapred.job.name with a value of Example to Hadoop.
Correct Answer : The invocation that passes -D mapred.job.name=Example on the command line.
Explanation : Configure the property using the -D key=value notation, for example -D mapred.job.name=Example. The generic options are parsed and applied to the job Configuration before your driver's run() method is invoked. You can also list the available options by calling the streaming jar with just the -info argument.
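As a sketch of how this works (the class name ExampleDriver and the jar name are hypothetical; ToolRunner is the standard Hadoop class that strips generic options such as -D before run() is called):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ExampleDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();                  // already contains -D overrides
        System.out.println(conf.get("mapred.job.name")); // prints "Example"
        // ... build and submit the job here ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new ExampleDriver(), args));
    }
}

Invoked as: hadoop jar myjob.jar ExampleDriver -D mapred.job.name=Example input output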
Question : What types of algorithms are difficult to express in MapReduce v1 (MRv1)?
1. Algorithms that require applying the same mathematical function to large numbers of individual binary records.
2. Relational operations on large amounts of structured and semi-structured data.
3. Algorithms that require global, shared state.
4. Large-scale graph algorithms that require one-step link traversal.
5. Text analysis algorithms on large collections of unstructured text (e.g., Web crawls).
Correct Answer : 3
Explanation : Limitations of MapReduce - where not to use it. While very powerful and applicable to a wide variety of problems, MapReduce is not the answer to every problem. Here are some problem classes for which MapReduce is not well suited:
1. Computation depends on previously computed values : If the computation of a value depends on previously computed values, then MapReduce cannot be used. A good example is the Fibonacci series, where each value is the sum of the previous two, i.e., f(k+2) = f(k+1) + f(k). Also, if the data set is small enough to be computed on a single machine, it is better to do it as a single reduce(map(data)) operation rather than going through the entire MapReduce process.
2. Full-text indexing or ad hoc searching : The index generated in the map step is one-dimensional, and the reduce step must not generate a large amount of data or there will be serious performance degradation. For example, CouchDB's MapReduce may not be a good fit for full-text indexing or ad hoc searching; this is a problem better suited to a tool such as Lucene.
3. Algorithms that depend on shared global state : Such computations cannot be expressed naturally in MapReduce, since map and reduce tasks run independently and in isolation. Many algorithms depend crucially on the existence of shared global state during processing, making them difficult to implement in MapReduce (the single opportunity for global synchronization in MapReduce is the barrier between the map and reduce phases of processing).
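A minimal illustration of the first limitation, in plain Java rather than MapReduce, since the point is that the dependency is inherently sequential: each Fibonacci value consumes the two previous results, so there is no independent per-record work to hand out to mappers.

public class Fibonacci {
    // f(k+2) = f(k+1) + f(k): each step needs the two previous results,
    // so iterations cannot run as independent, isolated map tasks.
    public static long fib(int n) {
        long prev = 0, curr = 1;      // f(0), f(1)
        for (int i = 0; i < n; i++) {
            long next = prev + curr;  // depends on both earlier values
            prev = curr;
            curr = next;
        }
        return prev;                  // f(n)
    }

    public static void main(String[] args) {
        System.out.println(fib(10)); // 55
    }
}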
Question : Which project gives you a distributed, scalable data store that allows random, real-time read/write access to hundreds of terabytes of data?
Correct Answer : Apache HBase
Explanation : Use Apache HBase when you need random, real-time read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows by millions of columns -- atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS. Features :
Linear and modular scalability.
Strictly consistent reads and writes.
Automatic and configurable sharding of tables.
Automatic failover support between RegionServers.
Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
Easy-to-use Java API for client access.
Block cache and Bloom filters for real-time queries.
Query predicate push-down via server-side Filters.
Thrift gateway and a RESTful Web service that supports XML, Protobuf, and binary data encoding options.
Extensible JRuby-based (JIRB) shell.
Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia, or via JMX.
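A brief sketch of the random read/write access pattern using the standard HBase Java client API (the table name "example", column family "cf", and qualifier "col" are assumptions for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseRandomAccess {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("example"))) {
            // Random write: a single row addressed directly by row key
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes("value"));
            table.put(put);

            // Random read: fetch the same row back by key, no scan required
            Get get = new Get(Bytes.toBytes("row1"));
            Result result = table.get(get);
            System.out.println(Bytes.toString(
                result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"))));
        }
    }
}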
Question : Can binary data be used with Hadoop MapReduce jobs?
1. Binary data can be used directly by a MapReduce job. Often binary data is added to a sequence file.
2. Binary data cannot be used by the Hadoop framework. Binary data should be converted to a Hadoop-compatible format prior to loading.
3. [option text missing in source]
4. Hadoop can freely use binary files with MapReduce jobs so long as the files have headers.
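Option 1 reflects standard Hadoop practice: binary payloads are commonly wrapped as BytesWritable values in a SequenceFile so that MapReduce can split and process them. A minimal writer sketch (the output path and record key are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class BinaryToSequenceFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("binary.seq"); // hypothetical output path
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            byte[] payload = {0x01, 0x02, 0x03}; // arbitrary binary record
            writer.append(new Text("record-1"), new BytesWritable(payload));
        }
    }
}

A job can then consume these records via SequenceFileInputFormat.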
Question : How can you disable the reduce step in a Hadoop MapReduce job?
1. The Hadoop administrator has to set the number of reducer slots to zero on all slave nodes. This will disable the reduce step.
2. It is impossible to disable the reduce step, since it is a critical part of the MapReduce abstraction.
3. A developer can set the number of reducers to zero in the job configuration. This completely disables the reduce step.
4. While you cannot completely disable reducers, you can set the output to one; there needs to be at least one reduce step in the MapReduce abstraction.
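Option 3 describes the standard mechanism: a map-only job. A minimal sketch using the new MapReduce API (class name and path arguments are illustrative); with zero reducers, map output is written directly to HDFS and the shuffle, sort, and reduce phases are skipped entirely:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only example");
        job.setJarByClass(MapOnlyJob.class);
        job.setNumReduceTasks(0); // disables the reduce step entirely
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input dir
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}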