Question : You have to run a MapReduce job where the mapper is a Java class and the reducer is the Unix command "/bin/wc". After the job completes, you want exactly two output partitions to be created. Select the option that fulfills this requirement.
1. $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
       -reducer=2 \
       -input myInputDirs \
       -output myOutputDir \
       -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
       -reducer /bin/wc

2. $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
       -D mapred.reduce.tasks=2 \
       -input myInputDirs \
       -output myOutputDir \
       -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
       -reducer /bin/wc

3. $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
       -D mapred.reduce.count=2 \
       -input myInputDirs \
       -output myOutputDir \
       -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
       -reducer /bin/wc

4. As the default output file count is always 2, no specific configuration is required:
   $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
       -input myInputDirs \
       -output myOutputDir \
       -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
       -reducer /bin/wc
Correct Answer : 2

Explanation: Often you may want to process input data using a map function only. To do this, simply set mapred.reduce.tasks to zero. The Map/Reduce framework will not create any reducer tasks; rather, the outputs of the mapper tasks will be the final output of the job.
-D mapred.reduce.tasks=0
To be backward compatible, Hadoop Streaming also supports the "-reduce NONE" option, which is equivalent to "-D mapred.reduce.tasks=0".
Specifying the Number of Reducers

To specify the number of reducers, for example two, use:
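$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -D mapred.reduce.tasks=2 \
    -input myInputDirs \
    -output myOutputDir \
    -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
    -reducer /bin/wc

This is the same command as option 2 in the question: -D mapred.reduce.tasks=2 requests two reduce tasks, so the job writes exactly two output partitions.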
Question : You have the following data in a file called HadoopExam.txt:

Learning.Hadoop.From.HadoopExam.com
Learning.Spark.From.QuickTechie.com
Learning.Cassandra.From.Training4Exam.com
Learning.HBase.From.AWSExam.blogspot.com
Now, while running a Hadoop MapReduce streaming job over this data, you want to create the key set shown below.
[Learning.Hadoop, Learning.Spark, Learning.Cassandra, Learning.HBase]

Which of the following is a correct code snippet?

1. $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
       -D stream.map.output.field.separator=. \
       -D stream.num.map.output.key.fields=15 \
       -input myInputDirs \
       -output myOutputDir \
       -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
       -reducer org.apache.hadoop.mapred.lib.IdentityReducer

2. $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
       -D stream.map.output.field.separator=. \
       -D stream.num.map.output.key.fields=2 \
       -input myInputDirs \
       -output myOutputDir \
       -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
       -reducer org.apache.hadoop.mapred.lib.IdentityReducer

3. $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
       -D stream.map.output.field.separator=. \
       -input myInputDirs \
       -output myOutputDir \
       -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
       -reducer org.apache.hadoop.mapred.lib.IdentityReducer

4. $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
       -D stream.map.output.field.separator=. \
       -D stream.num.map.output.key.counts=2 \
       -input myInputDirs \
       -output myOutputDir \
       -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
       -reducer org.apache.hadoop.mapred.lib.IdentityReducer
Correct Answer : 2

Explanation: You can customize this default behavior. You can specify a field separator other than the tab character (the default), and you can specify the nth (n >= 1) character rather than the first character in a line (the default) as the separator between the key and value.
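For example, the Hadoop streaming documentation illustrates this with a four-field key (the question's correct option uses two fields instead):

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -D stream.map.output.field.separator=. \
    -D stream.num.map.output.key.fields=4 \
    -input myInputDirs \
    -output myOutputDir \
    -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
    -reducer org.apache.hadoop.mapred.lib.IdentityReducer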
In the above example, "-D stream.map.output.field.separator=." specifies "." as the field separator for the map outputs, and the prefix up to the fourth "." in a line will be the key and the rest of the line (excluding the fourth ".") will be the value. If a line has less than four "."s, then the whole line will be the key and the value will be an empty Text object (like the one created by new Text("")).
Similarly, you can use "-D stream.reduce.output.field.separator=SEP" and "-D stream.num.reduce.output.fields=NUM" to specify the nth field separator in a line of the reduce outputs as the separator between the key and the value.
Similarly, you can specify "stream.map.input.field.separator" and "stream.reduce.input.field.separator" as the input separator for Map/Reduce inputs. By default the separator is the tab character.
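As a minimal sketch, assuming your map and reduce input lines use "." as their separator (the property names are those given above; the "." value is illustrative):

-D stream.map.input.field.separator=.
-D stream.reduce.input.field.separator=.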
Question : The ________________ options allow you to make files and archives available to the tasks. The argument is a URI to the file or archive that you have already uploaded to HDFS. These files and archives are cached across jobs. You can retrieve the host and fs_port values from the fs.default.name config variable.
Correct Answer : -files and -archives

Explanation: The -files and -archives options allow you to make files and archives available to the tasks. The argument is a URI to the file or archive that you have already uploaded to HDFS. These files and archives are cached across jobs. You can retrieve the host and fs_port values from the fs.default.name config variable.
Note: The -files and -archives options are generic options. Be sure to place the generic options before the command options, otherwise the command will fail. For an example, see The -archives Option. Also see Other Supported Options.
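As a sketch of that ordering (the file path and /bin/cat mapper here are illustrative, not from the question), generic options such as -files and -D come before the streaming command options:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -files hdfs://host:fs_port/user/testfile.txt \
    -input myInputDirs \
    -output myOutputDir \
    -mapper /bin/cat \
    -reducer /bin/wc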
Making Files Available to Tasks
The -files option creates a symlink in the current working directory of the tasks that points to the local copy of the file. For example:

-files hdfs://host:fs_port/user/testfile.txt

In this example, Hadoop automatically creates a symlink named testfile.txt in the current working directory of the tasks. This symlink points to the local copy of testfile.txt.
Users can specify a different symlink name for -files using #.
-files hdfs://host:fs_port/user/testfile.txt#testfile

Multiple entries can be specified like this:
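A sketch assuming the comma-separated list form accepted by Hadoop's generic -files option (the file names are illustrative):

-files hdfs://host:fs_port/user/testfile1.txt#testfile1,hdfs://host:fs_port/user/testfile2.txt#testfile2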
The -archives option allows you to copy jars locally to the current working directory of tasks and automatically unjar the files.
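A minimal example, assuming the same hdfs://host:fs_port URI pattern used above for -files:

-archives hdfs://host:fs_port/user/testfile.jar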
In this example, Hadoop automatically creates a symlink named testfile.jar in the current working directory of tasks. This symlink points to the directory that stores the unjarred contents of the uploaded jar file.