Question : In the regular WordCount MapReduce example, you have the following driver code.

public class WordCount extends Configured implements Tool {

    public static void main(String args[]) throws Exception {
        int res = ToolRunner.run(new WordCount(), args);
        System.exit(res);
    }

    public int run(String[] args) throws Exception {
        Path inputPath = new Path("shakespeare1");
        Path outputPath = new Path("" + System.currentTimeMillis());
        Configuration conf = getConf();
        Job job = new Job(conf, this.getClass().toString());
        FileInputFormat.setInputPaths(job, inputPath);
        FileOutputFormat.setOutputPath(job, outputPath);
        job.setJarByClass(WordCount.class);
        job.setJobName("WordCount");
        job.setMapperClass(WordMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setNumReduceTasks(2);
        return job.waitForCompletion(true) ? 0 : 1;
    }
}

Now you run the command below on a single-node cluster, where wc.jar is the jar file containing the Driver, Mapper and Reducer classes.

hadoop jar wc.jar WordCount -D mapred.reduce.tasks=3

Select the correct statement from below.
1. It will run 3 reducers, as the command-line option would be preferred
2. It will run 2 reducers, as the driver code has defined the number of reducers
3. …
4. The number of reducers cannot be determined; the command line and driver configuration are just hints
Correct Answer : 2

Explanation: Following are the priorities of the three ways to set the number of reducers:

Option 1: setNumReduceTasks(2) within the application code
Option 2: -D mapreduce.job.reduces=2 as a command-line argument
Option 3: through the $HADOOP_CONF_DIR/mapred-site.xml file:

<property>
    <name>mapreduce.job.reduces</name>
    <value>2</value>
</property>

The above are ranked in priority order: Option 1 overrides Option 2, and Option 2 overrides Option 3. In other words, Option 1 (the driver code) is the one used by your job in this scenario, so 2 reducers will run.
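To see why the driver code wins, note that ToolRunner's GenericOptionsParser applies -D options to the Configuration before run() executes, and setNumReduceTasks() then overwrites that same property just before submission. Below is a minimal sketch of that ordering; the class name ReducerCountDemo and the printed messages are illustrative assumptions, not part of the original exam code.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ReducerCountDemo extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        // ToolRunner invokes GenericOptionsParser, which strips options such as
        // -D mapred.reduce.tasks=3 and applies them to the Configuration first.
        System.exit(ToolRunner.run(new ReducerCountDemo(), args));
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        // At this point the command-line value (3) is already present in conf.
        System.out.println("Value from -D option: " + conf.get("mapred.reduce.tasks"));

        Job job = new Job(conf, "reducer-count-demo");
        // The explicit driver call runs last, so it overwrites the -D value;
        // the job would be submitted with 2 reduce tasks.
        job.setNumReduceTasks(2);
        System.out.println("Effective reduce tasks: " + job.getNumReduceTasks());
        return 0;
    }
}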
Question :
You are running the regular WordCount example with the Mapper and Reducer defined in separate classes. You have 4 files in a directory from which you want to count the number of words. Of these 4 files, 3 files have 1 line each and the 4th file has 0 lines. When you run the WordCount job, how many Mappers will be executed (assuming you are running on a single-node cluster)?
Explanation: If a file is smaller than the HDFS block size (64 MB by default), one Mapper is executed per file. This holds even if the file size is zero, so 4 Mappers will be executed for the 4 input files.
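The mapper count comes from the number of input splits, and FileInputFormat creates at least one split per file, including a zero-length split for an empty file. Below is a minimal sketch that counts the splits a job would get for a directory; the directory name "four-small-files" is a hypothetical stand-in for the directory in the question.

import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class SplitCountDemo {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration());
        // Hypothetical directory containing the 3 one-line files and 1 empty file.
        FileInputFormat.setInputPaths(job, new Path("four-small-files"));

        // FileInputFormat generates one split per small file (and an empty
        // split for the zero-length file), so this prints 4 for the scenario
        // described in the question.
        List<InputSplit> splits = new TextInputFormat().getSplits(job);
        System.out.println("Map tasks that would be launched: " + splits.size());
    }
}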
Question : Please select the correct statement about the features of HDFS.
1. Files in HDFS can be concurrently updated and read
2. Files in HDFS can be concurrently updated
3. …
4. Files in HDFS cannot be concurrently read
Explanation: An application adds data to HDFS by creating a new file and writing the data to it. After the file is closed, the bytes written cannot be altered or removed, except that new data can be added by reopening the file for append.

HDFS implements a single-writer, multiple-reader model. The HDFS client that opens a file for writing is granted a lease for the file; no other client can write to the file. The writing client periodically renews the lease by sending a heartbeat to the NameNode, and when the file is closed the lease is revoked.

The lease duration is bound by a soft limit and a hard limit. Until the soft limit expires, the writer is certain of exclusive access to the file. If the soft limit expires and the client fails to close the file or renew the lease, another client can preempt the lease. If the hard limit (one hour) expires and the client has not renewed the lease, HDFS assumes the client has quit, automatically closes the file on behalf of the writer, and recovers the lease. The writer's lease does not prevent other clients from reading the file; a file may have many concurrent readers.
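The single-writer, multiple-reader model is visible directly through the Java FileSystem API. The sketch below assumes a reachable HDFS cluster with append enabled; the path /tmp/lease-demo.txt and the class name are hypothetical, and it only illustrates which calls are allowed concurrently, not the full lease-recovery behavior.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SingleWriterDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/tmp/lease-demo.txt");   // hypothetical path

        // First client: creates the file and holds the write lease.
        FSDataOutputStream writer = fs.create(file);
        writer.writeBytes("first line\n");

        // Any number of clients may open the file for reading at the same
        // time; the writer's lease does not block readers.
        FSDataInputStream reader = fs.open(file);

        // A second attempt to open the same file for writing while the lease
        // is held fails (for example with AlreadyBeingCreatedException),
        // because HDFS allows only one writer at a time.
        // FSDataOutputStream secondWriter = fs.append(file); // would fail here

        reader.close();
        writer.close();   // closing the file releases the lease

        // Once the lease is released, the file can be reopened for append.
        FSDataOutputStream appender = fs.append(file);
        appender.writeBytes("appended line\n");
        appender.close();
    }
}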