Oozie is a workflow engine:
- Runs on a server, typically outside the cluster
- Runs workflows of Hadoop jobs, including Pig, Hive, and Sqoop jobs
- Submits those jobs to the cluster based on a workflow definition
- Workflow definitions are submitted via HTTP
- Jobs can be run at specific times (one-time or recurring)
- Jobs can be run when data is present in a directory
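The time-based and data-availability triggers above are handled by Oozie coordinators. A minimal sketch of a coordinator definition follows; the app name, dates, and HDFS paths are hypothetical placeholders:

```xml
<!-- Coordinator: run a workflow daily, but only once the day's input data exists -->
<coordinator-app name="daily-wf" frequency="${coord:days(1)}"
                 start="2024-01-01T00:00Z" end="2024-12-31T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <dataset name="logs" frequency="${coord:days(1)}"
             initial-instance="2024-01-01T00:00Z" timezone="UTC">
      <!-- the coordinator waits until data is present at this path -->
      <uri-template>hdfs:///data/logs/${YEAR}/${MONTH}/${DAY}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="input" dataset="logs">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <!-- path to the workflow definition this coordinator launches -->
      <app-path>hdfs:///apps/my-workflow</app-path>
    </workflow>
  </action>
</coordinator-app>
```

The coordinator materializes one workflow run per day and holds each run until the corresponding dataset instance appears in HDFS.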
Select the correct statement:
1. In an Oozie workflow, all the MapReduce jobs can run in sequence only
2. Jobs can run in parallel as well as in sequence
3. (option text missing in the source)
4. All of the above
5. 2 and 3
Oozie is a system for describing the workflow of a job, where that job may contain a set of MapReduce jobs, Pig scripts, filesystem operations, etc., and it supports forking and joining of the data flow.
It doesn't, however, allow you to stream the output of one MR job directly as the input to another: the map-reduce action in Oozie still requires an output format of some type, typically a file-based one, so the output of job 1 is still serialized via HDFS before being processed by job 2.
Oozie can run jobs sequentially (one after the other) and in parallel (multiple at a time).
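The fork/join behaviour described above can be sketched as a minimal workflow definition. The action names and paths are hypothetical, and lightweight `fs` actions stand in for real Pig/Hive/MapReduce actions:

```xml
<workflow-app name="fork-join-demo" xmlns="uri:oozie:workflow:0.5">
  <start to="split"/>
  <!-- fork: both branches run in parallel -->
  <fork name="split">
    <path start="step-a"/>
    <path start="step-b"/>
  </fork>
  <action name="step-a">
    <fs><mkdir path="${nameNode}/tmp/demo/a"/></fs>
    <ok to="merge"/>
    <error to="fail"/>
  </action>
  <action name="step-b">
    <fs><mkdir path="${nameNode}/tmp/demo/b"/></fs>
    <ok to="merge"/>
    <error to="fail"/>
  </action>
  <!-- join: waits for every forked branch to succeed -->
  <join name="merge" to="final-step"/>
  <!-- this action runs sequentially, after both branches complete -->
  <action name="final-step">
    <fs><mkdir path="${nameNode}/tmp/demo/done"/></fs>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Workflow failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Every path leaving a `fork` must eventually reach the matching `join`; the join node fires only once all branches have completed successfully.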
(Question stem missing in the source.)
1. No reducer can start until the last mapper has finished
2. If a mapper is running slowly, Hadoop will start another instance of that mapper on another machine
3. (option text missing in the source)
4. The result of whichever mapper instance finishes first will be used
5. All of the above
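Options 2 and 4 describe Hadoop's speculative execution: a duplicate attempt of a slow task is launched on another node, and the first attempt to finish wins while the other is killed. A sketch of the configuration properties that control it, assuming Hadoop 2 (MRv2) property names:

```xml
<!-- mapred-site.xml: speculative execution is enabled by default -->
<property>
  <name>mapreduce.map.speculative</name>
  <value>true</value> <!-- allow a second attempt of a straggling map task -->
</property>
<property>
  <name>mapreduce.reduce.speculative</name>
  <value>true</value> <!-- same mechanism for reduce tasks -->
</property>
```

Setting either property to `false` disables speculative attempts for that task type, which can be useful when tasks have side effects that must not run twice.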