Question : The intermediate data is held on the TaskTracker's local disk. 1. True 2. False
Correct Answer : 1
Intermediate Data
The intermediate data is held on the TaskTracker's local disk
- As Reducers start up, the intermediate data is distributed across the network to the Reducers
- Reducers write their final output to HDFS
- Once the job has completed, the TaskTracker can delete the intermediate data from its local disk
- Note that the intermediate data is not deleted until the entire job completes
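As a minimal illustration (not from the training material), the MRv1 property mapred.local.dir controls where the TaskTracker keeps this intermediate data on its local disks; the directories in this mapred-site.xml fragment are placeholders only:

  <property>
    <name>mapred.local.dir</name>
    <value>/data/1/mapred/local,/data/2/mapred/local</value>
  </property>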
Refer HadoopExam.com Recorded Training Module : 2,3 and 4
Question : Which Hadoop project gives a SQL-like interface to access data stored in HDFS? 1. Flume 2. Hive 3. Pig 4. 2 and 3
Correct Answer : 2
Apache Hive :
Hive is an abstraction on top of MapReduce
- Allows users to query data in the Hadoop cluster without knowing Java or MapReduce
- Uses the HiveQL language, which is very similar to SQL
- The Hive Interpreter runs on a client machine: it turns HiveQL queries into MapReduce jobs and submits those jobs to the cluster
- Note: this does not turn the cluster into a relational database server! It is still simply running MapReduce jobs, and those jobs are created by the Hive Interpreter
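For illustration only (the table and column names below are hypothetical), a HiveQL query reads almost exactly like SQL; the Hive Interpreter compiles it into one or more MapReduce jobs and submits them to the cluster:

  -- hypothetical table: count employees per department
  SELECT department, COUNT(*) AS num_employees
  FROM employees
  GROUP BY department;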
Refer HadoopExam.com Recorded Training Module : 12 and 13
Question : Which of the following projects provides a dataflow language for transforming large datasets?
1. Hive 2. Pig 3. Flume 4. 2 and 3 both
Correct Answer : 2
Apache Pig :
Pig is an alternative abstraction on top of MapReduce
- Uses a dataflow scripting language called PigLatin
- The Pig interpreter runs on the client machine: it takes the PigLatin script, turns it into a series of MapReduce jobs and submits those jobs to the cluster
- As with Hive, nothing magical happens on the cluster; it is still simply running MapReduce jobs
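A rough PigLatin sketch (the input path and field names are made up for illustration); each statement describes one step in the dataflow, and the Pig interpreter turns the whole script into MapReduce jobs:

  -- hypothetical web-log data: total bytes served per client IP
  logs    = LOAD '/user/hadoop/weblogs' AS (ip:chararray, url:chararray, bytes:int);
  by_ip   = GROUP logs BY ip;
  traffic = FOREACH by_ip GENERATE group AS ip, SUM(logs.bytes) AS total_bytes;
  STORE traffic INTO '/user/hadoop/traffic_by_ip';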
Refer HadoopExam.com Recorded Training Module : 11
1. Hadoop uses a lot of machines in parallel. This optimizes data processing. 2. Hadoop was specifically designed to process large amounts of data by taking advantage of MPP hardware 3. Hadoop ships the code to the data instead of sending the data to the code 4. Hadoop uses sophisticated caching techniques on the namenode to speed up processing of data
1. Sequence files are binary format files that are compressed and are splittable. They are often used in high-performance map-reduce jobs 2. Sequence files are a type of file in the Hadoop framework that allows data to be sorted 3. Sequence files are intermediate files that are created by Hadoop after the map step 4. All of the above
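As a rough sketch of what option 1 describes (using the old Hadoop 1.x API; the output path and records are invented for illustration), writing a SequenceFile of binary key/value pairs looks like this:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.SequenceFile;
  import org.apache.hadoop.io.Text;

  public class SequenceFileWriteExample {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          FileSystem fs = FileSystem.get(conf);
          Path path = new Path("/tmp/example.seq");   // hypothetical output path

          // Append a few binary key/value records; SequenceFiles can also be
          // block-compressed, which keeps them splittable for MapReduce.
          SequenceFile.Writer writer =
                  SequenceFile.createWriter(fs, conf, path, IntWritable.class, Text.class);
          try {
              for (int i = 0; i < 3; i++) {
                  writer.append(new IntWritable(i), new Text("record-" + i));
              }
          } finally {
              writer.close();
          }
      }
  }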
1. Map files are stored on the namenode and capture the metadata for all blocks on a particular rack. This is how Hadoop is "rack aware" 2. Map files are the files that show how the data is distributed in the Hadoop cluster. 3. Map files are generated by Map-Reduce after the reduce step. They show the task distribution during job execution 4. Map files are sorted sequence files that also have an index. The index allows fast data lookup.
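A rough sketch of what option 4 describes (old Hadoop 1.x API; the directory name and records are hypothetical): a MapFile is written as a sorted SequenceFile plus an index, and the index lets a reader look a key up directly:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.MapFile;
  import org.apache.hadoop.io.Text;

  public class MapFileLookupExample {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          FileSystem fs = FileSystem.get(conf);
          String dir = "/tmp/example.map";            // hypothetical MapFile directory

          // A MapFile is a directory holding a sorted "data" SequenceFile plus
          // an "index" file; keys must be appended in sorted order.
          MapFile.Writer writer =
                  new MapFile.Writer(conf, fs, dir, IntWritable.class, Text.class);
          for (int i = 0; i < 100; i++) {
              writer.append(new IntWritable(i), new Text("value-" + i));
          }
          writer.close();

          // The index lets the reader seek close to a key instead of scanning
          // the whole file.
          MapFile.Reader reader = new MapFile.Reader(fs, dir, conf);
          Text value = new Text();
          reader.get(new IntWritable(42), value);
          System.out.println("key 42 -> " + value);
          reader.close();
      }
  }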