1. Iterative repetition of MapReduce jobs until a desired answer or state is reached.
2. Sequences of MapReduce and Pig jobs. These are limited to linear sequences of actions with exception handlers but no forks.
3. Sequences of MapReduce jobs only; no Pig or Hive tasks or jobs. These MapReduce sequences can be combined with forks and path joins.
4. Sequences of MapReduce and Pig jobs. These sequences can be combined with other actions, including forks, decision points, and path joins.
A decision control node allows Oozie to determine the workflow execution path based on some criteria, similar to a switch-case statement. Fork and join control nodes split one execution path into multiple execution paths that run concurrently: fork splits the execution path, and join waits for all concurrent execution paths to complete before proceeding. Fork and join are always used in pairs.
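To make the control-node behavior concrete, here is a minimal sketch using Oozie's Java client API. The server URL, HDFS application path, node names, and the "archive" property are all hypothetical; it assumes a workflow.xml containing a fork/join pair and a decision node has already been deployed to HDFS.

import java.util.Properties;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.OozieClientException;

// The deployed workflow.xml is assumed to contain control nodes like:
//
//   <fork name="parallel-steps">              <!-- splits one path into two -->
//     <path start="mr-step"/>
//     <path start="pig-step"/>
//   </fork>
//   <join name="wait-for-both" to="route"/>   <!-- waits for both paths -->
//   <decision name="route">                   <!-- switch-case style routing -->
//     <switch>
//       <case to="archive-step">${wf:conf('archive') eq 'true'}</case>
//       <default to="end"/>
//     </switch>
//   </decision>
public class RunForkJoinWorkflow {
    public static void main(String[] args) throws OozieClientException {
        OozieClient oozie = new OozieClient("http://oozie-host:11000/oozie");
        Properties conf = oozie.createConfiguration();
        // Hypothetical HDFS directory holding the deployed workflow.xml
        conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode/user/etl/fork-join-wf");
        conf.setProperty("archive", "true");
        String jobId = oozie.run(conf); // submit and start the workflow
        System.out.println("Started workflow " + jobId);
    }
}

The decision node above routes on a workflow configuration property, which is the same switch-case pattern described in the explanation.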
Question :
You have an employee who is a Data Analyst and is very comfortable with SQL. He would like to run ad-hoc analysis on data in your HDFS cluster. Which of the following is a data warehousing software built on top of Apache Hadoop that defines a simple SQL-like query language well-suited for this kind of user?
A. Pig
B. Hue
C. Hive
D. Sqoop
E. Oozie
Hive is an abstraction on top of MapReduce. It allows users to query data in the Hadoop cluster without knowing Java or MapReduce.
- Uses the HiveQL language, which is very similar to SQL
- The Hive interpreter runs on a client machine, turns HiveQL queries into MapReduce jobs, and submits those jobs to the cluster
Note: this does not turn the cluster into a relational database server! It is still simply running MapReduce jobs, which are created by the Hive interpreter.
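As an illustration of how a SQL-literate analyst might use Hive, the sketch below connects to HiveServer2 over JDBC and runs a HiveQL query. The host, credentials, and the employees table are hypothetical, and the Hive JDBC driver is assumed to be on the classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveAdHocQuery {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Hypothetical HiveServer2 endpoint and default database
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://hive-host:10000/default", "analyst", "");
        Statement stmt = conn.createStatement();
        // Plain HiveQL; Hive compiles this into MapReduce jobs on the cluster
        ResultSet rs = stmt.executeQuery(
                "SELECT dept, COUNT(*) AS headcount FROM employees GROUP BY dept");
        while (rs.next()) {
            System.out.println(rs.getString("dept") + "\t" + rs.getLong("headcount"));
        }
        conn.close();
    }
}

Note that the query itself is ordinary SQL-style text; the analyst never writes Java or MapReduce code, which is exactly why Hive (answer C) fits this user.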
Refer to HadoopExam.com Recorded Training Modules 12 and 13
Question :
You need to import a portion of a relational database every day as files to HDFS, and generate Java classes to interact with your imported data. Which of the following tools should you use to accomplish this?
A. Pig
B. Hue
C. Hive
D. Flume
E. Sqoop
F. Oozie
G. fuse-dfs
Sqoop provides a method to import data from tables in a relational database into HDFS.
- Does this very efficiently via a map-only MapReduce job
- Can also go the other way: populate database tables from files in HDFS
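Sqoop is usually driven from the command line, but it also exposes a Java entry point. The sketch below, with a hypothetical MySQL connection string, credentials, table, and target directory, runs the same kind of daily partial import programmatically; as a side effect of the import, Sqoop generates a Java class for the table that can be used to interact with the imported records.

import org.apache.sqoop.Sqoop;

// Daily import of a portion of one table into HDFS. Sqoop executes this
// as a map-only MapReduce job and generates a Java class for the table.
public class DailyImport {
    public static void main(String[] args) {
        String[] sqoopArgs = {
            "import",
            "--connect", "jdbc:mysql://db.example.com/hr",  // hypothetical source DB
            "--username", "etl",
            "--password", "secret",                         // placeholder credentials
            "--table", "employees",
            "--where", "updated_at >= CURDATE()",           // import only today's portion
            "--target-dir", "/data/employees/incoming",
            "--num-mappers", "4"
        };
        int exitCode = Sqoop.runTool(sqoopArgs);
        System.exit(exitCode);
    }
}

In production this would typically be scheduled daily, for example by an Oozie coordinator.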
Refer to HadoopExam.com Recorded Training Modules 12, 13, and 19
1. The number of reducers is defined as part of the job configuration.
2. All values of the same key can be processed by multiple reducers.
3. (option text unavailable in the source)
4. 1, 2 and 3 are correct
5. 1 and 3 are correct
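For option 1, the reducer count really is an ordinary job-configuration setting. A minimal sketch using the new MapReduce API follows; the job name and count are arbitrary.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReducerCountExample {
    public static void main(String[] args) throws IOException {
        // The reducer count is part of the job configuration (option 1):
        Job job = Job.getInstance(new Configuration(), "reducer-count-demo");
        job.setNumReduceTasks(4); // run four reduce tasks
        // Equivalent configuration property: mapreduce.job.reduces=4
    }
}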
1. Pig provides additional capabilities that allow certain types of data manipulation not possible with MapReduce.
2. Pig provides no additional capabilities to MapReduce. Pig programs are executed as MapReduce jobs via the Pig interpreter.
3. (option text unavailable in the source)
4. Pig provides the additional capability of allowing you to control the flow of multiple MapReduce jobs.
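To illustrate option 2's claim that Pig programs execute as MapReduce jobs, here is a small sketch using the PigServer embedding API; the input path, schema, and aliases are hypothetical.

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

// Pig Latin statements are compiled by the Pig interpreter into one or
// more MapReduce jobs; nothing here runs outside of MapReduce.
public class PigAsMapReduce {
    public static void main(String[] args) throws Exception {
        PigServer pig = new PigServer(ExecType.MAPREDUCE);
        pig.registerQuery("logs = LOAD '/data/weblogs' AS (ip:chararray, url:chararray);");
        pig.registerQuery("by_ip = GROUP logs BY ip;");
        pig.registerQuery("hits = FOREACH by_ip GENERATE group, COUNT(logs);");
        pig.store("hits", "/data/hits-per-ip"); // triggers the MapReduce job(s)
    }
}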
1. Input file splits may cross line breaks. A line that crosses file splits is ignored.
2. The input file is split exactly at the line breaks, so each RecordReader will read a series of complete lines.
3. (option text unavailable in the source)
4. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the end of the broken line.
5. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the beginning of the broken line.
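The behavior described in option 5 can be observed directly by pointing a LineRecordReader at a split that begins mid-line: the reader skips forward to the next newline, because the broken line belongs to the reader of the previous split. The file path and split offsets below are hypothetical.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;

public class SplitBoundaryDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path file = new Path("file:///tmp/lines.txt");  // hypothetical local file
        // A split starting at byte 10, presumably in the middle of a line
        FileSplit split = new FileSplit(file, 10, 100, new String[0]);
        TaskAttemptContext ctx =
                new TaskAttemptContextImpl(conf, new TaskAttemptID());
        LineRecordReader reader = new LineRecordReader();
        reader.initialize(split, ctx); // skips the partial first line
        while (reader.nextKeyValue()) {
            // key = byte offset of the line start, value = one complete line
            System.out.println(reader.getCurrentKey() + "\t" + reader.getCurrentValue());
        }
        reader.close();
    }
}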
1. Increase the parameter that controls the minimum split size in the job configuration.
2. Write a custom MapRunner that iterates over all key-value pairs in the entire file.
3. (option text unavailable in the source)
4. Write a custom FileInputFormat and override the method isSplitable to always return false.
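Option 4 in code form: a minimal sketch of a TextInputFormat subclass that refuses to split files, so every file is consumed whole by a single mapper. The class name is arbitrary.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Marking every file non-splittable forces one map task per file,
// so a single mapper sees all of that file's lines.
public class NonSplittableTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // never split, regardless of file size vs. block size
    }
}

A job would use it via job.setInputFormatClass(NonSplittableTextInputFormat.class).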