Question : You have to run a MapReduce job where the mapper is a Java class and the reducer is the Unix command "/bin/wc". After the job completes, you want exactly two output partitions to be created. Select the option that fulfills this requirement.
1. $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
       -reducer=2 \
       -input myInputDirs \
       -output myOutputDir \
       -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
       -reducer /bin/wc

2. $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
       -D mapred.reduce.tasks=2 \
       -input myInputDirs \
       -output myOutputDir \
       -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
       -reducer /bin/wc

3. $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
       -D mapred.reduce.count=2 \
       -input myInputDirs \
       -output myOutputDir \
       -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
       -reducer /bin/wc

4. As the default output file count is always 2, no specific configuration is required:
   $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
       -input myInputDirs \
       -output myOutputDir \
       -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
       -reducer /bin/wc
Correct Answer : 2

Explanation: Often you may want to process input data using a map function only. To do this, simply set mapred.reduce.tasks to zero. The Map/Reduce framework will not create any reducer tasks; rather, the outputs of the mapper tasks will be the final output of the job.
-D mapred.reduce.tasks=0
To be backward compatible, Hadoop Streaming also supports the "-reduce NONE" option, which is equivalent to "-D mapred.reduce.tasks=0".
Specifying the Number of Reducers

To specify the number of reducers, for example two, use:
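$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -D mapred.reduce.tasks=2 \
    -input myInputDirs \
    -output myOutputDir \
    -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
    -reducer /bin/wc

This is the same command as option 2 in the question: -D mapred.reduce.tasks=2 requests two reduce tasks, so the job writes exactly two output partitions.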
Question : You have the following data in a file called HadoopExam.txt:

Learning.Hadoop.From.HadoopExam.com
Learning.Spark.From.QuickTechie.com
Learning.Cassandra.From.Training4Exam.com
Learning.HBase.From.AWSExam.blogspot.com
Now, while running a Hadoop MapReduce streaming job over this data, you want to create the key set shown below.
[Learning.Hadoop, Learning.Spark, Learning.Cassandra, Learning.HBase]

Which of the following is a correct code snippet?

1. $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
       -D stream.map.output.field.separator=. \
       -D stream.num.map.output.key.fields=15 \
       -input myInputDirs \
       -output myOutputDir \
       -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
       -reducer org.apache.hadoop.mapred.lib.IdentityReducer

2. $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
       -D stream.map.output.field.separator=. \
       -D stream.num.map.output.key.fields=2 \
       -input myInputDirs \
       -output myOutputDir \
       -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
       -reducer org.apache.hadoop.mapred.lib.IdentityReducer

3. $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
       -D stream.map.output.field.separator=. \
       -input myInputDirs \
       -output myOutputDir \
       -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
       -reducer org.apache.hadoop.mapred.lib.IdentityReducer

4. $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
       -D stream.map.output.field.separator=. \
       -D stream.num.map.output.key.counts=2 \
       -input myInputDirs \
       -output myOutputDir \
       -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
       -reducer org.apache.hadoop.mapred.lib.IdentityReducer
Correct Answer : 2

Explanation: You can customize this default behavior. You can specify a field separator other than the tab character (the default), and you can specify the nth (n >= 1) character rather than the first character in a line (the default) as the separator between the key and value.
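For example, the Hadoop streaming documentation illustrates this with a four-field key (the question's correct option uses two fields instead):

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -D stream.map.output.field.separator=. \
    -D stream.num.map.output.key.fields=4 \
    -input myInputDirs \
    -output myOutputDir \
    -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
    -reducer org.apache.hadoop.mapred.lib.IdentityReducer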
In the above example, "-D stream.map.output.field.separator=." specifies "." as the field separator for the map outputs, and the prefix up to the fourth "." in a line will be the key and the rest of the line (excluding the fourth ".") will be the value. If a line has less than four "."s, then the whole line will be the key and the value will be an empty Text object (like the one created by new Text("")).
Similarly, you can use "-D stream.reduce.output.field.separator=SEP" and "-D stream.num.reduce.output.fields=NUM" to specify the nth field separator in a line of the reduce outputs as the separator between the key and the value.
Similarly, you can specify "stream.map.input.field.separator" and "stream.reduce.input.field.separator" as the input separator for Map/Reduce inputs. By default the separator is the tab character.
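As a minimal sketch, assuming your map and reduce input lines use "." as their separator (the property names are those given above; the "." value is illustrative):

-D stream.map.input.field.separator=.
-D stream.reduce.input.field.separator=.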
Question : The ________________ options allow you to make files and archives available to the tasks. The argument is a URI to the file or archive that you have already uploaded to HDFS. These files and archives are cached across jobs. You can retrieve the host and fs_port values from the fs.default.name config variable.
Correct Answer : -files and -archives

Explanation: The -files and -archives options allow you to make files and archives available to the tasks. The argument is a URI to the file or archive that you have already uploaded to HDFS. These files and archives are cached across jobs. You can retrieve the host and fs_port values from the fs.default.name config variable.
Note: The -files and -archives options are generic options. Be sure to place the generic options before the command options, otherwise the command will fail. For an example, see The -archives Option. Also see Other Supported Options.
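As a sketch of that ordering (the file path and /bin/cat mapper here are illustrative, not from the question), generic options such as -files and -D come before the streaming command options:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -files hdfs://host:fs_port/user/testfile.txt \
    -input myInputDirs \
    -output myOutputDir \
    -mapper /bin/cat \
    -reducer /bin/wc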
Making Files Available to Tasks
The -files option creates a symlink in the current working directory of the tasks that points to the local copy of the file. For example:

-files hdfs://host:fs_port/user/testfile.txt

In this example, Hadoop automatically creates a symlink named testfile.txt in the current working directory of the tasks. This symlink points to the local copy of testfile.txt.
Users can specify a different symlink name for -files using #.
-files hdfs://host:fs_port/user/testfile.txt#testfile

Multiple entries can be specified like this:
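A sketch assuming the comma-separated list form accepted by Hadoop's generic -files option (the file names are illustrative):

-files hdfs://host:fs_port/user/testfile1.txt#testfile1,hdfs://host:fs_port/user/testfile2.txt#testfile2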
The -archives option allows you to copy jars locally to the current working directory of tasks and automatically unjar the files.
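A minimal example, assuming the same hdfs://host:fs_port URI pattern used above for -files:

-archives hdfs://host:fs_port/user/testfile.jar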
In this example, Hadoop automatically creates a symlink named testfile.jar in the current working directory of tasks. This symlink points to the directory that stores the unjarred contents of the uploaded jar file.