Correct Answer : Get Lastest Questions and Answer : Explanation: WritableComparables can be compared to each other, typically via Comparators. Any type which is to be used as a key in the Hadoop Map-Reduce framework should implement this interface.
Note that hashCode() is frequently used in Hadoop to partition keys. It's important that your implementation of hashCode() returns the same result across different instances of the JVM. Note also that the default hashCode() implementation in Object does not satisfy this property.
public interface Writable A serializable object which implements a simple, efficient, serialization protocol, based on DataInput and DataOutput. Any key or value type in the Hadoop Map-Reduce framework implements this interface.
Implementations typically implement a static read(DataInput) method which constructs a new instance, calls readFields(DataInput) and returns the instance.
Writable in an interface in Hadoop and types in Hadoop must implement this interface. Hadoop provides these writable wrappers for almost all Java primitive types and some other types,but sometimes we need to pass custom objects and these custom objects should implement Hadoop's Writable interface.Hadoop MapReduce uses implementations of Writables for interacting with user-provided Mappers and Reducers.
Question : In MapReduce jobs all the keys must implemented following interfaces
Correct Answer : Get Lastest Questions and Answer : Explanation: Comparable "This interface imposes a total ordering on the objects of each class that implements it. This ordering is referred to as the class's natural ordering, and the class's compareTo method is referred to as its natural comparison method.
For example, if one adds two keys a and b such that (!a.equals(b) && a.compareTo(b) == 0) to a sorted set that does not use an explicit comparator, the second add operation returns false (and the size of the sorted set does not increase) because a and b are equivalent from the sorted set's perspective.
Virtually all Java core classes that implement Comparable have natural orderings that are consistent with equals. One exception is java.math.BigDecimal, whose natural ordering equates BigDecimal objects with equal values and different precisions (such as 4.0 and 4.00).
Question : Put the following java classes in order, as they act when mapper is executed. A. InputKey B. InputSplit C. RecordReader D. InputFormat
Correct Answer : Get Lastest Questions and Answer : Explanation: InputFormat : Hadoop relies on the input format of the job to do three things: 1. Validate the input configuration for the job (i.e., checking that the data is there). 2. Split the input blocks and files into logical chunks of type InputSplit, each of which is assigned to a map task for processing. 3. Access Mostly Uused Products by 50000+ Subscribers
InputSplit : represents the data to be processed by an individual Mapper. Typically, it presents a byte-oriented view on the input and is the responsibility of RecordReader of the job to process this and present a record-oriented view. RecordReader : A RecordReader uses the data within the boundaries created by the input split to generate key/value pairs. In the context of file-based input, the cestart Â? is the byte position in the file where the RecordReader should start generating key/value pairs. The ceend Â? is where it should stop reading records. These are not hard boundaries as far as the API is concerned "there is nothing stopping a developer from reading the entire file for each map task. While reading the entire file is not advised, reading outside of the boundaries it often necessary to ensure that a complete record is generated
1. Hadoop uses a lot of machines in parallel. This optimizes data processing. 2. Hadoop was specifically designed to process large amount of data by taking advantage of MPP hardware 3. Access Mostly Uused Products by 50000+ Subscribers 4. Hadoop uses sophisticated caching techniques on namenode to speed processing of data
1. Sequence files are binary format files that are compressed and are splitable. They are often used in high-performance map-reduce jobs 2. Sequence files are a type of the file in the Hadoop framework that allow data to be sorted 3. Access Mostly Uused Products by 50000+ Subscribers 4. All of above