Question : Which tool is best suited to import a portion of a relational database every day as files into HDFS, and generate Java classes to interact with that imported data?
Sqoop ("SQL-to-Hadoop") is a straightforward command-line tool with the following capabilities: " Imports individual tables or entire databases to files in HDFS " Generates Java classes to allow you to interact with your imported data " Provides the ability to import from SQL databases straight into your Hive data warehouse After setting up an import job in Sqoop, you can get started working with SQL database-backed data from your Hadoop MapReduce cluster in minutes.
The input to the import process is a database table. Sqoop will read the table row-by-row into HDFS. The output of this import process is a set of files containing a copy of the imported table. The import process is performed in parallel. For this reason, the output will be in multiple files. These files may be delimited text files (for example, with commas or tabs separating each field), or binary Avro or SequenceFiles containing serialized record data.
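For example, a daily import of a single table might be started with a command along the lines of the sketch below. The connection string, credentials, table name, and target directory are all placeholders, not details from the text above:

    # Minimal sketch of a Sqoop import; every value below is a placeholder.
    sqoop import \
        --connect jdbc:mysql://dbhost/corp \
        --username dbuser -P \
        --table EMPLOYEES \
        --target-dir /data/employees/2015-01-01 \
        --num-mappers 4 \
        --fields-terminated-by '\t'

The --num-mappers option controls how many parallel tasks (and therefore how many output files) the import uses; --as-sequencefile or --as-avrodatafile switches the output from delimited text to the corresponding binary format, and --hive-import loads the result straight into a Hive table.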
A by-product of the import process is a generated Java class which can encapsulate one row of the imported table. This class is used during the import process by Sqoop itself. The Java source code for this class is also provided to you, for use in subsequent MapReduce processing of the data. This class can serialize and deserialize data to and from the SequenceFile format. It can also parse the delimited-text form of a record. These abilities allow you to quickly develop MapReduce applications that use the HDFS-stored records in your processing pipeline. You are also free to parse the delimited record data yourself, using any other tools you prefer.
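As a rough sketch, a mapper that consumes records imported as SequenceFiles might look like the following. The class name Employees and the accessor get_name() are hypothetical stand-ins; Sqoop names the generated class after your table and its accessors after your columns:

    // Minimal sketch of a mapper over Sqoop-imported SequenceFiles.
    // "Employees" is a hypothetical Sqoop-generated class (named after the table),
    // and get_name() a hypothetical accessor for one of its columns.
    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class EmployeeNameMapper
        extends Mapper<LongWritable, Employees, Text, NullWritable> {

      @Override
      protected void map(LongWritable key, Employees record, Context context)
          throws IOException, InterruptedException {
        // The generated class has already deserialized the record, so a column
        // value can be read directly and emitted.
        context.write(new Text(record.get_name()), NullWritable.get());
      }
    }

The driver would set SequenceFileInputFormat as the input format and ship the generated class with the job; for delimited-text imports you would instead feed each input line to the generated class's delimited-text parsing support mentioned above.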
Please refer to the Hadoop Professional Recorded Training provided by HadoopExam.com.
Question : Workflows expressed in Oozie can contain:
1. Sequences of MapReduce jobs only; no Pig or Hive tasks or jobs. These MapReduce sequences can be combined with forks and path joins.
2. Iterative repetition of MapReduce jobs until a desired answer or state is reached.
Explanation: Oozie is a workflow system built specifically to work with Hadoop, MapReduce, and Pig jobs. An Oozie workflow is a collection of actions (i.e., Hadoop MapReduce jobs, Pig jobs) arranged in a control dependency DAG (Directed Acyclic Graph). "Control dependency" from one action to another means that the second action can't run until the first action has completed. Oozie workflow definitions are written in hPDL (an XML Process Definition Language similar to JBOSS JBPM jPDL). Users write workflows in an XML language that defines one or more MapReduce jobs, their interdependencies, and what to do in the case of failures. These workflows are uploaded to the Oozie server, where they are scheduled to run or executed immediately. When Oozie executes a MapReduce job as part of a workflow, the job is launched by the Oozie server, which keeps track of job-level failures and status.
Oozie workflow actions start jobs in remote systems (i.e., Hadoop, Pig). Upon action completion, the remote systems call back to Oozie to notify it that the action has completed; at this point Oozie proceeds to the next action in the workflow.
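To make this concrete, a minimal hPDL workflow definition with a single MapReduce action might look like the sketch below. The application name, the mapper and reducer classes, and the ${...} parameters are placeholders, not details from the text above:

    <!-- Minimal sketch of an hPDL workflow; all names, classes, and parameters are placeholders. -->
    <workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.4">
        <start to="mr-node"/>
        <action name="mr-node">
            <map-reduce>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.mapper.class</name>
                        <value>com.example.DemoMapper</value>
                    </property>
                    <property>
                        <name>mapred.reducer.class</name>
                        <value>com.example.DemoReducer</value>
                    </property>
                    <property>
                        <name>mapred.input.dir</name>
                        <value>${inputDir}</value>
                    </property>
                    <property>
                        <name>mapred.output.dir</name>
                        <value>${outputDir}</value>
                    </property>
                </configuration>
            </map-reduce>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>MapReduce action failed</message>
        </kill>
        <end name="end"/>
    </workflow-app>

The <ok> and <error> transitions are the failure handling the text refers to: success continues to the end node, while an error routes to a kill node that aborts the workflow.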
Question : What is Pig?
1. Pig is a subset of the Hadoop API for data processing
2. Pig is a part of the Apache Hadoop project that provides a scripting language interface for data processing
3. Access Mostly Used Products by 50000+ Subscribers
4. None of the above
Pig is a project that was developed at Yahoo! for people with strong skills in scripting languages. Scripts written in its scripting language, Pig Latin, are translated into MapReduce jobs automatically.
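For instance, a short Pig Latin script like the sketch below (the input path and schema are placeholders) is compiled by Pig into one or more MapReduce jobs behind the scenes:

    -- Minimal sketch: load a tab-delimited file (path and schema are placeholders),
    -- keep the rows with a salary above 75000, and print the result.
    emps = LOAD '/user/demo/employees.txt' AS (id:int, name:chararray, salary:double);
    high_paid = FILTER emps BY salary > 75000.0;
    DUMP high_paid;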