Question : Identify which best defines a SequenceFile?
1. A SequenceFile contains a binary encoding of an arbitrary number of homogeneous Writable objects
2. A SequenceFile contains a binary encoding of an arbitrary number of heterogeneous Writable objects
3. …
4. A SequenceFile contains a binary encoding of an arbitrary number of key-value pairs. Each key must be the same type. Each value must be the same type.
Correct Answer : 4 : Explanation: SequenceFile is a flat file consisting of binary key/value pairs. There are three different SequenceFile formats: uncompressed key/value records; record-compressed key/value records, where only the values are compressed; and block-compressed key/value records, where both keys and values are collected into blocks and compressed together. The size of the block is configurable.
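As a minimal sketch (not part of the original question), the snippet below writes and then reads a block-compressed SequenceFile of Text keys and IntWritable values using the org.apache.hadoop.io.SequenceFile API; the file name demo.seq and the sample records are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("demo.seq"); // placeholder path

        // Write: every key is a Text, every value is an IntWritable (one type per side).
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(IntWritable.class),
                SequenceFile.Writer.compression(SequenceFile.CompressionType.BLOCK))) {
            writer.append(new Text("apple"), new IntWritable(3));
            writer.append(new Text("banana"), new IntWritable(7));
        }

        // Read the key/value pairs back in the order they were written.
        try (SequenceFile.Reader reader = new SequenceFile.Reader(conf,
                SequenceFile.Reader.file(path))) {
            Text key = new Text();
            IntWritable value = new IntWritable();
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
        }
    }
}

The compression option on the writer is what selects among the three formats described above (NONE, RECORD, or BLOCK).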
Question : Your client application submits a MapReduce job to your Hadoop cluster. Identify the Hadoop daemon on which the Hadoop framework will look for an available slot to schedule a MapReduce operation.
Correct Answer : JobTracker : Explanation: JobTracker is the daemon service for submitting and tracking MapReduce jobs in Hadoop. Only one JobTracker process runs on any Hadoop cluster; it runs in its own JVM process, and in a typical production cluster it runs on a separate machine. Each slave node is configured with the JobTracker node's location. The JobTracker is a single point of failure for the Hadoop MapReduce service: if it goes down, all running jobs are halted. The JobTracker performs the following actions (from the Hadoop wiki):
1. Client applications submit jobs to the JobTracker.
2. The JobTracker talks to the NameNode to determine the location of the data.
3. The JobTracker locates TaskTracker nodes with available slots at or near the data.
4. The JobTracker submits the work to the chosen TaskTracker nodes.
5. The TaskTracker nodes are monitored; if they do not submit heartbeat signals often enough, they are deemed to have failed and the work is scheduled on a different TaskTracker.
6. A TaskTracker notifies the JobTracker when a task fails. The JobTracker decides what to do then: it may resubmit the job elsewhere, it may mark that specific record as something to avoid, and it may even blacklist the TaskTracker as unreliable.
7. When the work is completed, the JobTracker updates its status.
8. Client applications can poll the JobTracker for information.
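The "client applications submit jobs" and "poll for information" steps correspond to a driver program. Below is a minimal, hedged sketch using the org.apache.hadoop.mapreduce.Job API; it wires in the framework's identity Mapper and Reducer as stand-ins for real application classes, and takes input and output paths from the command line.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubmitJobDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "pass-through demo");
        job.setJarByClass(SubmitJobDemo.class);
        // Identity mapper and reducer stand in for real application classes.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        job.setOutputKeyClass(LongWritable.class);  // byte-offset key from the default TextInputFormat
        job.setOutputValueClass(Text.class);        // line value from the default TextInputFormat
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // waitForCompletion() submits the job to the cluster and then polls it
        // for status until the job finishes, mirroring the steps listed above.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}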
Question : How are keys and values presented and passed to the reducers during a standard sort and shuffle phase of MapReduce?
1. Keys are presented to a reducer in sorted order; values for a given key are not sorted.
2. Keys are presented to a reducer in sorted order; values for a given key are sorted in ascending order.
3. …
4. Keys are presented to a reducer in random order; values for a given key are sorted in ascending order.
Correct Answer : 1 : Explanation: A Reducer has three primary phases:
1. Shuffle : The Reducer copies the sorted output from each Mapper across the network using HTTP.
2. Sort : The framework merge-sorts the Reducer inputs by key (since different Mappers may have output the same key). The shuffle and sort phases occur simultaneously, i.e. outputs are merged while they are being fetched. SecondarySort : To achieve a secondary sort on the values returned by the value iterator, the application should extend the key with the secondary key and define a grouping comparator. The keys will be sorted using the entire key, but will be grouped using the grouping comparator to decide which keys and values are sent in the same call to reduce.
3. Reduce : In this phase the reduce method is called for each <key, (collection of values)> in the sorted inputs. The output of the reduce task is typically written to a RecordWriter via TaskInputOutputContext.write(Object, Object). The output of the Reducer is not re-sorted.
Reference: org.apache.hadoop.mapreduce, Class Reducer
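A minimal reducer sketch (illustrative, not from the question) that relies only on the guarantee described above: keys arrive at reduce() in sorted order, while the values grouped under each key arrive in no particular order unless a secondary sort is configured (composite key plus a grouping comparator registered with Job.setGroupingComparatorClass).

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Keys are handed to reduce() in sorted order across successive calls.
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();  // the order of values within a key is not guaranteed
        }
        result.set(sum);
        context.write(key, result);
    }
}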
1. Writable data types are specifically optimized for network transmissions
2. Writable data types are specifically optimized for file system storage
3. …
4. Writable data types are specifically optimized for data retrieval
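These options concern Hadoop's Writable serialization. As an illustrative sketch (the class below is hypothetical, not from the question), a custom Writable serializes its fields as raw bytes with no per-record class metadata, which is what keeps the representation compact enough to ship efficiently across the network during the shuffle.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class PointWritable implements Writable {
    private int x;
    private int y;

    public PointWritable() { }                        // Hadoop requires a no-arg constructor
    public PointWritable(int x, int y) { this.x = x; this.y = y; }

    @Override
    public void write(DataOutput out) throws IOException {
        // Exactly eight bytes per record: two raw ints, no type tags.
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        x = in.readInt();
        y = in.readInt();
    }
}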
1. Hadoop API will convert the data to the type that is needed by the reducer.
2. Data input/output inconsistency cannot occur. A preliminary validation check is executed prior to the full execution of the job to ensure there is consistency.
3. …
4. A real-time exception will be thrown and the map-reduce job will fail
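These options concern a mismatch between the types the mapper emits and the types the reducer expects. The hedged driver sketch below shows where the intermediate (map output) and final (reduce output) types are declared; Hadoop does not statically check these declarations against the code, so a mismatch typically surfaces as a runtime exception that fails the job.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class TypeDeclarationDemo {
    public static Job configure() throws Exception {
        Job job = Job.getInstance(new Configuration(), "type declaration demo");
        // Intermediate (map output) types, declared when they differ from the final types.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        // Final (reduce output) types.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // If the Mapper actually emits other types, the framework raises a
        // type-mismatch/serialization error at runtime rather than at submit time.
        return job;
    }
}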
1. Yes, there is a special data type that will suppress job output
2. No, a MapReduce job will always generate output.
3. …
4. Yes, but only during map execution when reducers have been set to zero
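One concrete way to make a job produce no output (offered as a sketch, not as the question's exact wording) is NullOutputFormat, which discards everything the tasks write; the related NullWritable type is a zero-byte Writable often used when one half of a key-value pair carries no data.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class SuppressOutputDemo {
    public static Job configure() throws Exception {
        Job job = Job.getInstance(new Configuration(), "no output demo");
        // NullOutputFormat silently discards all records, so no output files are written.
        job.setOutputFormatClass(NullOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        // NullWritable serializes to zero bytes; useful when the value carries no data.
        job.setOutputValueClass(NullWritable.class);
        return job;
    }
}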