Question : Which tool is best suited to import a portion of a relational database every day as files into HDFS, and generate Java classes to interact with that imported data?
Sqoop ("SQL-to-Hadoop") is a straightforward command-line tool with the following capabilities: " Imports individual tables or entire databases to files in HDFS " Generates Java classes to allow you to interact with your imported data " Provides the ability to import from SQL databases straight into your Hive data warehouse After setting up an import job in Sqoop, you can get started working with SQL database-backed data from your Hadoop MapReduce cluster in minutes.
The input to the import process is a database table. Sqoop will read the table row-by-row into HDFS. The output of this import process is a set of files containing a copy of the imported table. The import process is performed in parallel. For this reason, the output will be in multiple files. These files may be delimited text files (for example, with commas or tabs separating each field), or binary Avro or SequenceFiles containing serialized record data.
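For example, a daily import of a single table might be started with a command along the lines of the sketch below. The connection string, credentials, table name, and target directory are all placeholders, not details from the text above:

    # Minimal sketch of a Sqoop import; every value below is a placeholder.
    sqoop import \
        --connect jdbc:mysql://dbhost/corp \
        --username dbuser -P \
        --table EMPLOYEES \
        --target-dir /data/employees/2015-01-01 \
        --num-mappers 4 \
        --fields-terminated-by '\t'

The --num-mappers option controls how many parallel tasks (and therefore how many output files) the import uses; --as-sequencefile or --as-avrodatafile switches the output from delimited text to the corresponding binary format, and --hive-import loads the result straight into a Hive table.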
A by-product of the import process is a generated Java class which can encapsulate one row of the imported table. This class is used during the import process by Sqoop itself. The Java source code for this class is also provided to you, for use in subsequent MapReduce processing of the data. This class can serialize and deserialize data to and from the SequenceFile format. It can also parse the delimited-text form of a record. These abilities allow you to quickly develop MapReduce applications that use the HDFS-stored records in your processing pipeline. You are also free to parse the delimited record data yourself, using any other tools you prefer.
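As a rough sketch, a mapper that consumes records imported as SequenceFiles might look like the following. The class name Employees and the accessor get_name() are hypothetical stand-ins; Sqoop names the generated class after your table and its accessors after your columns:

    // Minimal sketch of a mapper over Sqoop-imported SequenceFiles.
    // "Employees" is a hypothetical Sqoop-generated class (named after the table),
    // and get_name() a hypothetical accessor for one of its columns.
    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class EmployeeNameMapper
        extends Mapper<LongWritable, Employees, Text, NullWritable> {

      @Override
      protected void map(LongWritable key, Employees record, Context context)
          throws IOException, InterruptedException {
        // The generated class has already deserialized the record, so a column
        // value can be read directly and emitted.
        context.write(new Text(record.get_name()), NullWritable.get());
      }
    }

The driver would set SequenceFileInputFormat as the input format and ship the generated class with the job; for delimited-text imports you would instead feed each input line to the generated class's delimited-text parsing support mentioned above.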
Please refer to the Hadoop Professional Recorded Training provided by HadoopExam.com.
Question : Workflows expressed in Oozie can contain:
1. Sequences of MapReduce jobs only; no Pig or Hive tasks or jobs. These MapReduce sequences can be combined with forks and path joins.
2. Iterative repetition of MapReduce jobs until a desired answer or state is reached.
Explanation: Oozie is a workflow system built specifically to work with Hadoop, MapReduce, and Pig jobs. An Oozie workflow is a collection of actions (i.e., Hadoop MapReduce jobs, Pig jobs) arranged in a control dependency DAG (Directed Acyclic Graph). "Control dependency" from one action to another means that the second action can't run until the first action has completed. Oozie workflow definitions are written in hPDL (an XML Process Definition Language similar to JBOSS JBPM jPDL). Users write workflows in an XML language that defines one or more MapReduce jobs, their interdependencies, and what to do in the case of failures. These workflows are uploaded to the Oozie server, where they are scheduled to run or executed immediately. When Oozie executes a MapReduce job as part of a workflow, the job is launched by the Oozie server, which keeps track of job-level failures and status.
Oozie workflow actions start jobs in remote systems (i.e., Hadoop, Pig). Upon action completion, the remote systems call back to Oozie to notify it that the action has completed; at this point Oozie proceeds to the next action in the workflow.
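To make this concrete, a minimal hPDL workflow definition with a single MapReduce action might look like the sketch below. The application name, the mapper and reducer classes, and the ${...} parameters are placeholders, not details from the text above:

    <!-- Minimal sketch of an hPDL workflow; all names, classes, and parameters are placeholders. -->
    <workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.4">
        <start to="mr-node"/>
        <action name="mr-node">
            <map-reduce>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.mapper.class</name>
                        <value>com.example.DemoMapper</value>
                    </property>
                    <property>
                        <name>mapred.reducer.class</name>
                        <value>com.example.DemoReducer</value>
                    </property>
                    <property>
                        <name>mapred.input.dir</name>
                        <value>${inputDir}</value>
                    </property>
                    <property>
                        <name>mapred.output.dir</name>
                        <value>${outputDir}</value>
                    </property>
                </configuration>
            </map-reduce>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>MapReduce action failed</message>
        </kill>
        <end name="end"/>
    </workflow-app>

The <ok> and <error> transitions are the failure handling the text refers to: success continues to the end node, while an error routes to a kill node that aborts the workflow.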
Question : What is Pig?
1. Pig is a subset of the Hadoop API for data processing
2. Pig is a part of the Apache Hadoop project that provides a scripting language interface for data processing
3. Access Mostly Used Products by 50000+ Subscribers
4. None of the above
Pig is a project that was developed at Yahoo! for people with strong skills in scripting languages. Scripts written in its scripting language, Pig Latin, are translated into MapReduce jobs automatically.
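For instance, a short Pig Latin script like the sketch below (the input path and schema are placeholders) is compiled by Pig into one or more MapReduce jobs behind the scenes:

    -- Minimal sketch: load a tab-delimited file (path and schema are placeholders),
    -- keep the rows with a salary above 75000, and print the result.
    emps = LOAD '/user/demo/employees.txt' AS (id:int, name:chararray, salary:double);
    high_paid = FILTER emps BY salary > 75000.0;
    DUMP high_paid;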