Explanation: Components of MapReduce Job Flow: A MapReduce job flow on YARN involves the following components. The client node, which submits the MapReduce job. The YARN ResourceManager, which allocates cluster resources to jobs. The YARN NodeManagers, which launch and monitor the tasks of jobs. The MapReduce application master, which coordinates the tasks running in the MapReduce job. The application master and the MapReduce tasks run in containers that are scheduled by the resource manager and managed by the node managers. HDFS is used for sharing job files between the above entities.
Question : At line number 4 you replace the statement with "this.conf = new Configuration(otherConf)", where otherConf is an object of the Configuration class. 1. A new configuration with the same settings cloned from another. 2. It will give a runtime error 3. Access Mostly Uused Products by 50000+ Subscribers Ans : 1 Exp : A new configuration is created with the same settings cloned from another.
Configuration() - A new configuration.
Configuration(boolean loadDefaults) - A new configuration where the behavior of reading from the default resources can be turned off.
Configuration(Configuration other) - A new configuration with the same settings cloned from another.
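As a sketch, the copy-constructor behavior can be demonstrated in a short fragment (assumes the Hadoop client libraries are on the classpath; the property name is only an illustration):

```java
import org.apache.hadoop.conf.Configuration;

public class ConfCloneDemo {
    public static void main(String[] args) {
        Configuration otherConf = new Configuration();
        otherConf.set("mapreduce.job.reduces", "4");           // some custom setting

        // Copy constructor: a new configuration with the same settings cloned from another
        Configuration conf = new Configuration(otherConf);
        System.out.println(conf.get("mapreduce.job.reduces")); // prints 4

        // Mutating the clone does not change the original
        conf.set("mapreduce.job.reduces", "8");
        System.out.println(otherConf.get("mapreduce.job.reduces")); // still 4
    }
}
```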
Question : Suppose that your job's input is a (huge) set of word tokens and their numbers of occurrences (word counts), and that you want to sort them by number of occurrences. Which one of the following classes will help you get a globally sorted file? 1. Combiner 2. Partitioner 3. Access Mostly Uused Products by 50000+ Subscribers 4. By default all the files are sorted.
Ans : 2 Exp : It is possible to produce a set of sorted files that, if concatenated, would form a globally sorted file. The secret to doing this is to use a partitioner that respects the total order of the output. For example, if we had four partitions, we could put keys for temperatures less than -10°C in the first partition, those between -10°C and 0°C in the second, those between 0°C and 10°C in the third, and those over 10°C in the fourth.
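The range-partitioning idea from this explanation can be sketched in plain Java, independent of the Hadoop Partitioner API (the boundary values -10, 0 and 10 come from the example above):

```java
public class RangePartition {
    // Assign a temperature to one of four ordered partitions, mirroring the
    // explanation: (-inf,-10), [-10,0), [0,10), [10,+inf)
    static int partitionFor(double temperature) {
        if (temperature < -10) return 0;
        if (temperature < 0)   return 1;
        if (temperature < 10)  return 2;
        return 3;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor(-15)); // 0
        System.out.println(partitionFor(-5));  // 1
        System.out.println(partitionFor(5));   // 2
        System.out.println(partitionFor(25));  // 3
    }
}
```

Because the partition index grows with the key, concatenating the sorted output files of partitions 0 through 3 yields one globally sorted file.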
Question : Which of the following could be replaced safely at line number 9? 1. Job job = new Job(); 2. Job job = new Job(conf); 3. Access Mostly Uused Products by 50000+ Subscribers 4. You cannot change this line to either 1 or 2 Ans : 3 Exp : Both 1 and 2 are correct; however, not passing conf ignores the custom configuration, and the second argument supplies a custom job name. If you don't provide one, the default job name defined by the framework is used.
Question : If we are processing input data from a database, then at line 10 which of the following is the correct InputFormat for reading from a DB? 1. DataBaseInputFormat 2. DBMSInputFormat 3. Access Mostly Uused Products by 50000+ Subscribers 4. Not Supported Ans : 3 Exp : DBInputFormat is an InputFormat class that allows you to read data from a database. An InputFormat is Hadoop's formalization of a data source; it can mean files formatted in a particular way, data read from a database, etc. DBInputFormat provides a simple method of scanning entire tables from a database, as well as the means to read from arbitrary SQL queries performed against the database. Most queries are supported, subject to a few limitations.
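A hedged sketch of wiring DBInputFormat into a job follows; the JDBC URL, credentials, table and column names, and the WordRecord class (which would implement DBWritable) are all hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;

Configuration conf = new Configuration();
// JDBC driver, connection URL, user, password (all hypothetical values)
DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
        "jdbc:mysql://dbhost/mydb", "dbuser", "dbpass");

Job job = Job.getInstance(conf, "db-read");
job.setInputFormatClass(DBInputFormat.class);
// Scan the whole "words" table, ordered by the "word" column
DBInputFormat.setInput(job, WordRecord.class,
        "words", /* conditions */ null, /* orderBy */ "word",
        "word", "count");                // field names to read
```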
Question : At line number 13 you change the number of reducers to 1 and set the Reducer class to IdentityReducer. Then which of the following statements is correct? 1. In both cases the behavior is the same 2. With 0 reducers, the reduce step will be skipped and the mapper output will be the final output 3. Access Mostly Uused Products by 50000+ Subscribers 4. 1 and 3 both are correct 5. 2 and 3 both are correct
Ans : 5 Exp : If you do not need sorting of map results, you set 0 reducers and the job is called map-only. If you need to sort the map results but do not need any aggregation, you choose the identity reducer. There is a third case: we do need aggregation, and in that case we need a (custom) reducer.
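The three cases in the explanation correspond to three driver configurations; a sketch (MyAggregatingReducer is a hypothetical class, and in the new API the base Reducer class is itself the identity):

```java
// Case 1: no sorting needed - map-only job, map output is the final output
job.setNumReduceTasks(0);

// Case 2: sorted but unaggregated output - identity reducer
job.setReducerClass(org.apache.hadoop.mapreduce.Reducer.class);

// Case 3: aggregation needed - a custom reducer (hypothetical class)
job.setReducerClass(MyAggregatingReducer.class);
```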
Question : When you are implementing secondary sort (sorting based on values), the following output is produced as the key part of the mapper
Ans : 2 Exp : The map output key is the year and temperature, to achieve sorting. Unless you define a grouping comparator that uses only the year part of the map output key, you cannot make all records of the same year go to the same reduce method call. You're right that by partitioning on the year you'll get all the data for a year in the same reducer, so the comparator will effectively sort the data for each year by the temperature.
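A grouping comparator for this secondary-sort setup might look like the following sketch, where YearTemperaturePair is a hypothetical composite key (a WritableComparable holding year and temperature):

```java
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class YearGroupingComparator extends WritableComparator {
    protected YearGroupingComparator() {
        super(YearTemperaturePair.class, true); // instantiate keys for comparison
    }

    @Override
    @SuppressWarnings("rawtypes")
    public int compare(WritableComparable a, WritableComparable b) {
        // Group by the year part only, ignoring the temperature part,
        // so all records of one year reach the same reduce() call
        YearTemperaturePair p1 = (YearTemperaturePair) a;
        YearTemperaturePair p2 = (YearTemperaturePair) b;
        return Integer.compare(p1.getYear(), p2.getYear());
    }
}
```

In the driver this would be registered with job.setGroupingComparatorClass(YearGroupingComparator.class).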
Question : What is the use of job.setJarByClass(MapReduceJob.class) at line number 16? 1. This method sets the jar file in which each node will look for the Mapper and Reducer classes 2. This is used to define which is the Driver class 3. Access Mostly Uused Products by 50000+ Subscribers 4. 1 and 2 both are correct Ans : 1 Exp : This method sets the jar file in which each node will look for the Mapper and Reducer classes. It does not create a jar from the given class; rather, it identifies the jar containing the given class. And yes, that jar file is "executed" (really, the Mapper and Reducer in that jar file are executed) for the MapReduce job.
Question : At line number 18, if the path "/out" already exists in HDFS, then
1. Hadoop will delete this directory, create a new empty directory, and after processing put all output in this directory 2. It will write new data into the existing directory and won't delete the existing data in this directory 3. Access Mostly Uused Products by 50000+ Subscribers 4. It will overwrite the existing content with new content Ans : 3 Exp : It will throw an exception, because Hadoop checks the input and output specifications before running any new job. This avoids already existing data being overwritten.
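If recreating the output directory is acceptable, the exception can be avoided by deleting a stale "/out" before submission; a sketch (use with care, since it removes previous results):

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

Path out = new Path("/out");
FileSystem fs = FileSystem.get(conf);
if (fs.exists(out)) {
    fs.delete(out, true);   // recursive delete of the old output
}
FileOutputFormat.setOutputPath(job, out);
```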
Question : If you remove both lines 10 and 11 from this code, then what happens? 1. It will throw a compile-time error 2. The program will run successfully but the output file will not be created 3. Access Mostly Uused Products by 50000+ Subscribers Ans : 3 Exp : As both are the default input and output formats, the program will run without any issue.
Question : If you replace line 19, return job.waitForCompletion(true) ? 1 : 0; with job.submit(); then which statement is correct? 1. In both cases MapReduce will run successfully 2. With waitForCompletion, the job is submitted to the cluster and waited on until it finishes 3. Access Mostly Uused Products by 50000+ Subscribers 4. All of the above are correct
Explanation: waitForCompletion submits the job to the cluster and waits for it to finish. submit submits the job to the cluster and returns immediately.
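The two submission styles can be contrasted in a short driver sketch (the two fragments are alternatives, not sequential code):

```java
// Alternative 1 - blocking: submit, print progress (verbose = true),
// and wait for the job to finish before exiting
boolean ok = job.waitForCompletion(true);
System.exit(ok ? 0 : 1);

// Alternative 2 - non-blocking: submit and return immediately, then poll
job.submit();
while (!job.isComplete()) {
    Thread.sleep(5000);     // check status every five seconds
}
System.exit(job.isSuccessful() ? 0 : 1);
```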
Question : In the above method at line 1, if you replace context.write(new Text(testCount.toString()), NullWritable.get()); with context.write(testCount.toString(), NullWritable.get()); what would happen? 1. It would not work, because String is not directly supported 2. It would work, but it will not give good performance 3. Access Mostly Uused Products by 50000+ Subscribers 4. Code will not compile at all after this change Ans : 2 Exp : The Text class stores text using standard UTF-8 encoding. It provides methods to serialize, deserialize, and compare texts at the byte level. The type of the length is integer and is serialized using zero-compressed format. In addition, it provides methods for string traversal without converting the byte array to a string. It also includes utilities for serializing/deserializing a string, encoding/decoding a string, checking whether a byte array contains valid UTF-8 code, and calculating the length of an encoded string.
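A common idiom that reduces the per-record overhead of allocating a new Text on every write is to reuse a single instance; a sketch (testCount and context are assumed from the surrounding reduce method):

```java
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;

// Field on the Reducer class, reused across reduce() calls
private final Text outKey = new Text();

// Inside reduce():
outKey.set(testCount.toString());           // re-encode into the same buffer
context.write(outKey, NullWritable.get());  // Text is Writable; String is not
```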
Question : Which is the correct statement when you define the Partitioner poorly? 1. It has a direct impact on the overall performance of your job and can reduce the performance of the overall job 2. A poorly designed partitioning function will not evenly distribute the values over the reducers 3. Access Mostly Uused Products by 50000+ Subscribers 4. Both 1 and 2 are correct 5. All 1, 2 and 3 are correct
Ans : 4 Exp : First, it has a direct impact on the overall performance of your job: a poorly designed partitioning function will not evenly distribute the load over the reducers, potentially losing all the benefit of the map/reduce distributed infrastructure.
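The assignment arithmetic of Hadoop's default HashPartitioner can be reproduced in plain Java to see how an even (or uneven) spread arises; the key set here is only illustrative:

```java
public class HashPartitionDemo {
    // Same arithmetic as the default HashPartitioner:
    // mask off the sign bit, then take the remainder by the reducer count
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int[] counts = new int[4];
        for (String k : new String[]{"apple", "banana", "cherry", "date", "elder", "fig"}) {
            counts[partitionFor(k, 4)]++;
        }
        // A reasonable hash spreads keys over the partitions; a degenerate
        // partitioner (e.g. one always returning 0) sends every key to one
        // reducer, serializing the reduce phase.
        for (int c : counts) {
            System.out.print(c + " ");
        }
    }
}
```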
Question : In the above code, if we replace LongWritable with Long, what would happen? The input to this job comes from a file.
1. Code will run, but not produce the result as expected 2. Code will not run, as the key has to be WritableComparable 3. Access Mostly Uused Products by 50000+ Subscribers 4. It will throw java.lang.ClassCastException Ans : 4 Exp : The key class of a mapper that reads text files is always LongWritable, because it contains the byte offset of the current line, and this could easily overflow an integer.
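The overflow argument can be checked with plain Java arithmetic: the byte offset of a line late in a 3 GB file fits in a long but not in an int:

```java
public class OffsetOverflowDemo {
    public static void main(String[] args) {
        long fileSize = 3L * 1024 * 1024 * 1024;  // a 3 GB input file
        long byteOffset = fileSize - 100;         // offset of a line near the end
        int truncated = (int) byteOffset;         // what an int-sized key would hold

        System.out.println(byteOffset);           // 3221225372 - fits in a long
        System.out.println(truncated);            // negative - the int overflowed
    }
}
```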
Question : Select the correct statement regarding reducers
1. The number of reducers is defined as part of the job configuration 2. All values of the same key can be processed by multiple reducers. 3. Access Mostly Uused Products by 50000+ Subscribers 4. 1, 2 and 3 are correct 5. 1 and 3 are correct