A serializable object that implements a simple, efficient serialization protocol based on DataInput and DataOutput.
Any key or value type in the Hadoop MapReduce framework implements this interface.
Implementations typically provide a static read(DataInput) method that constructs a new instance, calls readFields(DataInput), and returns the instance.
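The protocol described above can be sketched in plain Java, since DataInput and DataOutput are java.io interfaces. WordCountRecord is a hypothetical record type used only for illustration, not part of Hadoop; a real Hadoop key or value would additionally implement org.apache.hadoop.io.Writable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch of the Writable pattern using only java.io's DataInput/DataOutput.
class WordCountRecord {
    String word;
    int count;

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(word);
        out.writeInt(count);
    }

    // Deserialize the fields in the same order they were written.
    public void readFields(DataInput in) throws IOException {
        word = in.readUTF();
        count = in.readInt();
    }

    // The conventional static factory: construct, populate via readFields, return.
    public static WordCountRecord read(DataInput in) throws IOException {
        WordCountRecord r = new WordCountRecord();
        r.readFields(in);
        return r;
    }
}
```

The fixed field order in write and readFields is what makes the protocol efficient: no field names or type tags are written, only the raw values.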
Correct Answer: Apache Hadoop is a framework for processing huge data volumes; it creates various child tasks to process data in parallel. The creation and destruction of these child tasks is monitored by Hadoop.
Related Questions
Question: Which statement is true with respect to MapReduce 2 (YARN)?
1. It is the newer version of MapReduce; using it, data-processing performance can be increased.
2. The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons.
3. (option text missing)
4. All of the above
5. Only 2 and 3 are correct
Ans: 5
Exp: MapReduce underwent a complete overhaul in hadoop-0.23, and we now have what we call MapReduce 2.0 (MRv2) or YARN. The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. The idea is to have a global ResourceManager (RM) and a per-application ApplicationMaster (AM). An application is either a single job in the classical sense of MapReduce jobs or a DAG of jobs.
You can also Refer/Consider Advance Hadoop YARN Training by HadoopExam.com
Question: Which statement is true about the ApplicationsManager?
1. It is responsible for accepting job submissions.
2. It negotiates the first container for executing the application-specific ApplicationMaster and provides the service for restarting the ApplicationMaster container on failure.
3. (option text missing)
4. All of the above
5. 1 and 2 are correct
Ans: 5
Exp: The ApplicationsManager is responsible for accepting job submissions, negotiating the first container for executing the application-specific ApplicationMaster, and providing the service for restarting the ApplicationMaster container on failure.
Question: Which tool is used to list all the blocks of a file?
Ans: fsck. The command `hdfs fsck <path> -files -blocks` reports the blocks that make up each file.
Configuration constructors:
- Configuration() - a new configuration.
- Configuration(boolean loadDefaults) - a new configuration where the behavior of reading from the default resources can be turned off.
- Configuration(Configuration other) - a new configuration with the same settings cloned from another.
Question: Suppose that your job's input is a (huge) set of word tokens and their number of occurrences (word count), and you want to sort them by number of occurrences. Which one of the following classes will help you get a globally sorted file?
1. Combiner
2. Partitioner
3. (option text missing)
4. By default all the files are sorted.
Ans: 2
Exp: It is possible to produce a set of sorted files that, if concatenated, would form a globally sorted file. The secret to doing this is to use a partitioner that respects the total order of the output. For example, if we had four partitions, we could put keys for temperatures less than -10 C in the first partition, those between -10 C and 0 C in the second, those between 0 C and 10 C in the third, and those over 10 C in the fourth.
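The four-range scheme above can be sketched as a plain-Java partition function; in a real job this logic would live in the getPartition method of a Partitioner subclass. TemperaturePartitioner is a hypothetical name for illustration.

```java
// Plain-Java sketch of a total-order partition function for the
// temperature example: partitions are ordered, so concatenating the
// outputs of partitions 0..3 yields a globally sorted file.
class TemperaturePartitioner {
    // Map a temperature (degrees C) to one of four ordered partitions.
    public static int getPartition(int temperature, int numPartitions) {
        int p;
        if (temperature < -10)     p = 0;
        else if (temperature < 0)  p = 1;
        else if (temperature < 10) p = 2;
        else                       p = 3;
        return p % numPartitions; // guard in case fewer partitions are configured
    }
}
```

Note that the range boundaries must be chosen to match the data distribution, or the partitions (and hence the reducers) will be unevenly loaded.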
Question: Which of the following could safely replace line number 9?
1. Job job = new Job();
2. Job job = new Job(conf);
3. Both 1 and 2 are correct
4. You cannot change this line to either 1 or 2
Ans: 3
Exp: Both 1 and 2 are correct; however, omitting conf means the custom configuration is ignored. The Job constructor also accepts a second argument for a custom job name; if you don't provide it, the default job name defined by the framework is used.
Question: If we are processing input data from a database, then at line 10 which of the following is the correct InputFormat for reading from a DB?
1. DataBaseInputFormat
2. DBMSInputFormat
3. DBInputFormat
4. Not supported
Ans: 3
Exp: DBInputFormat is an InputFormat class that allows you to read data from a database. An InputFormat is Hadoop's formalization of a data source; it can mean files formatted in a particular way, data read from a database, etc. DBInputFormat provides a simple method of scanning entire tables from a database, as well as the means to read from arbitrary SQL queries performed against the database. Most queries are supported, subject to a few limitations.
Question: At line number 13 you set the number of reducers to 0, versus setting the reducer class to IdentityReducer; which of the following statements is correct?
1. In both cases the behavior is the same.
2. With 0 reducers, the reduce step is skipped and the mapper output is the final output.
3. (option text missing)
4. 1 and 3 are both correct
5. 2 and 3 are both correct
Ans: 5
Exp: If you do not need the map results sorted, you set 0 reducers and the job is called map-only. If you need the map results sorted but do not need any aggregation, you choose the identity reducer. There is a third case: we do need aggregation, and in that case we need a real reducer.
Question: When you are implementing secondary sort (sorting based on values), the following output is produced as the key part of the mapper:
Ans: 2
Exp: The map output key is (year, temperature), to achieve the sorting. Unless you define a grouping comparator that uses only the year part of the map output key, you cannot make all records of the same year go to the same reduce method call. By partitioning on the year you will get all the data for a year in the same reducer, so the comparator will effectively sort the data for each year by temperature.
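The two comparators described above can be sketched in plain Java. YearTemp is a hypothetical stand-in for the composite (year, temperature) map output key; in Hadoop these would be WritableComparators registered on a custom WritableComparable key.

```java
import java.util.Arrays;
import java.util.Comparator;

// Plain-Java sketch of secondary-sort comparators on a composite key.
class YearTemp {
    final int year, temp;
    YearTemp(int year, int temp) { this.year = year; this.temp = temp; }

    // Sort comparator: order by year, then by temperature, so values
    // arrive at the reducer already sorted by temperature within a year.
    static final Comparator<YearTemp> SORT =
        Comparator.<YearTemp>comparingInt(k -> k.year).thenComparingInt(k -> k.temp);

    // Grouping comparator: compare only the year part, so all records of
    // the same year land in one reduce() call despite differing temperatures.
    static final Comparator<YearTemp> GROUP =
        Comparator.comparingInt(k -> k.year);
}
```

The design point: sorting uses the whole composite key, while grouping (and partitioning) must use only the natural key, or records for one year would be split across reduce calls.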
Question: What is the use of job.setJarByClass(MapReduceJob.class) at line number 16?
1. This method sets the jar file in which each node will look for the Mapper and Reducer classes.
2. It is used to define which class is the Driver class.
3. (option text missing)
4. 1 and 2 are both correct
Ans: 1
Exp: This method sets the jar file in which each node will look for the Mapper and Reducer classes. It does not create a jar from the given class; rather, it identifies the jar containing the given class. That jar file is then "executed" (really, the Mapper and Reducer in that jar are executed) for the MapReduce job.
Question: At line number 18, if the path "/out" already exists in HDFS, then:
1. Hadoop will delete this directory, create a new empty directory, and after processing put all output in it.
2. It will write new data into the existing directory without deleting the existing data.
3. It will throw an exception.
4. It will overwrite the existing content with new content.
Ans: 3
Exp: It will throw an exception, because Hadoop checks the input and output specifications before running any new job. This avoids already-existing data being overwritten.
Question: If you remove both line 10 and line 11 from this code, what happens?
1. It will throw a compile-time error.
2. The program will run successfully but the output file will not be created.
3. The program will run successfully.
Ans: 3
Exp: As both are the default input and output formats, the program will run without any issue.
Question: If you replace line 19, return job.waitForCompletion(true) ? 1 : 0;, with job.submit();, which is the correct statement?
1. In both cases the MapReduce job will run successfully.
2. With waitForCompletion, you submit the job to the cluster and wait for it to finish.
3. (option text missing)
4. All of the above are correct
Question: Which is the correct statement when you poorly define the Partitioner?
1. It has a direct impact on the overall performance of your job and can reduce the performance of the overall job.
2. A poorly designed partitioning function will not evenly distribute the values over the reducers.
3. (option text missing)
4. Both 1 and 2 are correct
5. All of 1, 2 and 3 are correct
Ans: 4
Exp: It has a direct impact on the overall performance of your job: a poorly designed partitioning function will not evenly distribute the load over the reducers, potentially losing much of the benefit of the map/reduce distributed infrastructure.
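The skew effect described above can be demonstrated in plain Java: a degenerate partition function sends every key to one reducer, while hashing the key (the approach Hadoop's default HashPartitioner takes) spreads keys across reducers. PartitionSkewDemo and its method names are hypothetical, for illustration only.

```java
// Plain-Java sketch of why a poorly designed partition function skews load.
class PartitionSkewDemo {
    // A degenerate partitioner: every key lands on reducer 0.
    static int badPartition(String key, int numReducers) {
        return 0;
    }

    // Hash-based partitioning: mask off the sign bit, then take the modulus.
    static int hashPartition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Count how many of the given keys each reducer would receive.
    static int[] load(String[] keys, int numReducers, boolean useHash) {
        int[] counts = new int[numReducers];
        for (String k : keys) {
            counts[useHash ? hashPartition(k, numReducers)
                           : badPartition(k, numReducers)]++;
        }
        return counts;
    }
}
```

With the bad partitioner, one reducer does all the work while the others sit idle, which is exactly the lost parallelism the explanation warns about.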
Question: In the above code, if we replace LongWritable with Long (the input to this job coming from a file), what happens?
1. The code will run but not produce the expected result.
2. The code will not run, as the key has to be a WritableComparable.
3. (option text missing)
4. It will throw java.lang.ClassCastException.
Ans: 4
Exp: The key class of a mapper that reads text files is always LongWritable, because it contains the byte offset of the current line, and that offset could easily overflow an integer.
Question: Select the correct statement regarding the reducer.
1. The number of reducers is defined as part of the job configuration.
2. All values of the same key can be processed by multiple reducers.
3. (option text missing)
4. 1, 2 and 3 are correct
5. 1 and 3 are correct