Question : Arrange the following in the correct order of execution: A. Call to the main() method B. Instantiation of a new Configuration object C. Calling the ToolRunner.run() static method D. job.waitForCompletion()
Correct Answer : A, B, C, D. Explanation: The Driver class first checks the invocation of the command (checks the count of the command-line arguments provided).
It sets values for the job, including the driver, mapper, and reducer classes used. In the Driver class, we also define the types for the job's output key and value as Text and FloatWritable respectively. If the mapper and reducer classes do NOT use the same output key and value types, we must specify the mapper's types explicitly. In this case, the output value type of the mapper is Text, while the output value type of the reducer is FloatWritable.
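As a sketch of that setup (class names like AverageDriver, AverageMapper, and AverageReducer are illustrative, not from the source), the driver's job configuration might look like:

```java
// Sketch of the driver's job setup for the Text/FloatWritable case
// described above; all class names here are illustrative.
Job job = Job.getInstance(conf, "average");
job.setJarByClass(AverageDriver.class);
job.setMapperClass(AverageMapper.class);
job.setReducerClass(AverageReducer.class);

// Reducer (final job) output types.
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(FloatWritable.class);

// The mapper's output value type (Text) differs from the reducer's
// (FloatWritable), so it must be set explicitly.
job.setMapOutputValueClass(Text.class);
```

Without the setMapOutputValueClass() call, the framework would assume the mapper emits the job's output value type and fail at runtime with a type mismatch.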
There are two ways to launch the job: synchronously and asynchronously. Calling job.waitForCompletion() launches the job synchronously: the driver code blocks at this line, waiting for the job to complete. Passing the true argument tells the framework to write verbose progress output to the controlling terminal.
The main() method is the entry point for the driver. In this method, we instantiate a new Configuration object for the job. We then call the ToolRunner static run() method.
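The entry point described above can be sketched as follows, assuming the driver implements Hadoop's Tool interface (the AverageDriver name is illustrative):

```java
// Sketch of the driver's entry point: instantiate a Configuration,
// then hand control to ToolRunner.run(), which parses generic Hadoop
// options before invoking the driver's run() method.
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    int exitCode = ToolRunner.run(conf, new AverageDriver(), args);
    System.exit(exitCode);
}
```

Using ToolRunner rather than calling run() directly gives the job free support for standard command-line options such as -D property=value.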
You have to compile the three classes and place the compiled classes into a directory called "classes". Use the jar command to put the mapper and reducer classes into a jar file, whose path is included in the classpath when you build the driver. After you build the driver, the driver class is also added to the existing jar file.
Question : Sometimes, before running your MapReduce job, you configure the environment variable LD_LIBRARY_PATH. Why?
1. It defines a list of directories where your executables are located
2. It points to all the jars in the Hadoop distribution required to compile and run your MapReduce programs
Correct Answer : Explanation: HADOOP_HOME allows you to reference the value of the HADOOP_HOME variable when defining other variables. The LD_LIBRARY_PATH environment variable defines the path to the library files used by executables. These libraries are specifically compiled for the MapR distribution. Using Hadoop native libraries improves the performance of your MapReduce jobs by using compiled object code rather than Java bytecode.
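As a minimal illustration, the conventional location for the native libraries is $HADOOP_HOME/lib/native (verify the exact path for your distribution); prepending it to LD_LIBRARY_PATH looks like:

```shell
# Prepend the native-library directory to the search path the dynamic
# linker uses when loading compiled (.so) libraries, keeping any
# existing entries if the variable was already set.
export LD_LIBRARY_PATH="$HADOOP_HOME/lib/native${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "$LD_LIBRARY_PATH"
```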
Explanation: The two formats that are best suited to merging small files into larger archives for processing in Hadoop are Avro and SequenceFiles. Avro has Ruby bindings; SequenceFiles are only supported in Java.
JSON, TIFF, and MPEG are not appropriate formats for archives. JSON is also not an appropriate format for image data.
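A common way to perform such a merge (a sketch, assuming Hadoop's SequenceFile API; the paths archive.seq and small-files/ are illustrative) is to write each small file into one SequenceFile, keyed by file name:

```java
// Sketch: pack a directory of small files into one SequenceFile,
// using the file name as the key and the raw bytes as the value.
Configuration conf = new Configuration();
Path out = new Path("archive.seq");
try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(out),
        SequenceFile.Writer.keyClass(Text.class),
        SequenceFile.Writer.valueClass(BytesWritable.class))) {
    for (File f : new File("small-files").listFiles()) {
        byte[] bytes = Files.readAllBytes(f.toPath());
        writer.append(new Text(f.getName()), new BytesWritable(bytes));
    }
}
```

This turns many small files (which each cost a NameNode entry and a map task) into a single splittable, optionally compressed file.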
Question : SequenceFiles are flat files consisting of binary key/value pairs. SequenceFile provides Writer, Reader and SequenceFile.Sorter classes for writing, reading and sorting respectively. There are three SequenceFile Writers based on the SequenceFile.CompressionType used to compress key/value pairs: You have created a SequenceFile (MAIN.PROFILE.log) with custom key and value types. What command displays the contents of a SequenceFile named MAIN.PROFILE.log in your terminal in human-readable format?
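The usual command for this is `hadoop fs -text`, which detects the SequenceFile header and prints the key/value pairs as text; since the file uses custom key and value types, their jar must be visible on the classpath (the jar path below is illustrative):

```shell
# Make the custom key/value classes visible, then dump the
# SequenceFile's key/value pairs in human-readable form.
export HADOOP_CLASSPATH=/path/to/custom-types.jar
hadoop fs -text MAIN.PROFILE.log
```

By contrast, `hadoop fs -cat` would print the raw binary contents, which are not human-readable.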
1. Disable speculative execution for the data insert job
2. Enable speculative execution for the data insert job
3. …
4. Configure only single mapper for the data insert job