
Cloudera Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question :

Workflows expressed in Oozie can contain:


Options :
1. Iterative repetition of MapReduce jobs until a desired answer or state is reached.
2. Sequences of MapReduce and Pig jobs. These are limited to linear sequences of actions with exception handlers but no forks.
3. Sequences of MapReduce jobs only; no Pig or Hive tasks or jobs. These MapReduce sequences can be combined with forks and path joins.
4. Sequences of MapReduce and Pig. These sequences can be combined with other actions including forks, decision points, and path joins.


Correct Answer : 4

Apache Oozie Control Nodes :

- A decision control node allows Oozie to determine the workflow execution path based on some criteria, similar to a switch-case statement
- fork and join control nodes split one execution path into multiple execution paths which run concurrently
- fork splits the execution path
- join waits for all concurrent execution paths to complete before proceeding
- fork and join are used in pairs (a minimal sketch follows below)
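
As an illustration only (the workflow, action, script, and path names below are hypothetical), a minimal Oozie workflow definition combining a decision node with a fork/join pair might look like this:

<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="check-input"/>
    <!-- decision node: picks an execution path, like a switch-case -->
    <decision name="check-input">
        <switch>
            <case to="parallel-steps">${fs:exists('/data/incoming')}</case>
            <default to="end"/>
        </switch>
    </decision>
    <!-- fork: split into two concurrent execution paths -->
    <fork name="parallel-steps">
        <path start="mr-step"/>
        <path start="pig-step"/>
    </fork>
    <action name="mr-step">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- mapper/reducer configuration elided -->
        </map-reduce>
        <ok to="merge"/>
        <error to="fail"/>
    </action>
    <action name="pig-step">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>transform.pig</script>
        </pig>
        <ok to="merge"/>
        <error to="fail"/>
    </action>
    <!-- join: wait for both concurrent paths before proceeding -->
    <join name="merge" to="end"/>
    <kill name="fail">
        <message>A workflow step failed</message>
    </kill>
    <end name="end"/>
</workflow-app>

Note how the fork ("parallel-steps") and join ("merge") come as a pair, and how every action routes to either an ok or an error transition.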





Question :

You have an employee who is a Data Analyst and is very comfortable with SQL.
He would like to run ad-hoc analysis on data in your HDFS cluster.
Which of the following is data warehousing software built on top of
Apache Hadoop that defines a simple SQL-like query language well suited for this kind of user?
A. Pig B. Hue C. Hive D. Sqoop E. Oozie

Options :
1. A
2. B
3. C
4. D
5. E


Correct Answer : 3 (C, Hive)

Apache Hive :

- Hive is an abstraction on top of MapReduce
- Allows users to query data in the Hadoop cluster without knowing Java or MapReduce
- Uses the HiveQL language, which is very similar to SQL
- The Hive interpreter runs on a client machine, turns HiveQL queries into MapReduce jobs, and submits those jobs to the cluster

Note: this does not turn the cluster into a relational database server! It is still simply running MapReduce jobs, and those jobs are created by the Hive interpreter (an example query follows below).
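
For illustration (the employees table and its columns are hypothetical), a HiveQL query like the one below is compiled by the Hive interpreter into one or more MapReduce jobs, with no Java required from the analyst:

SELECT department, COUNT(*) AS num_employees
FROM employees
GROUP BY department;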

Refer to HadoopExam.com Recorded Training Modules 12 and 13





Question :

You need to import a portion of a relational database every day as files to HDFS,
and generate Java classes to interact with your imported data. Which of the following tools should you use to accomplish this?
A. Pig B. Hue C. Hive D. Flume E. Sqoop F. Oozie G. fuse-dfs

Options :
1. A,B
2. B,C
3. C,E
4. F,G


Correct Answer : 3 (C and E, Hive and Sqoop)

Apache Hive (see the notes under the previous question) provides the SQL-like query layer for the imported data.

Apache Sqoop :

- Sqoop provides a method to import data from tables in a relational database into HDFS
- Does this very efficiently via a map-only MapReduce job
- Also generates a Java class per imported table that can be used to interact with the imported records
- Can also go the other way, populating database tables from files in HDFS

A sample invocation follows below.
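
A hedged sketch of such a daily import (the JDBC URL, credentials, table name, filter, and paths are all hypothetical). The --where clause limits the import to a portion of the table, and the --outdir flag controls where Sqoop writes the generated Java class (here, employees.java):

sqoop import \
  --connect jdbc:mysql://dbhost/payroll \
  --username analyst -P \
  --table employees \
  --where "updated_at >= '2014-01-01'" \
  --target-dir /data/employees/2014-01-01 \
  --outdir ./generated-src

Scheduling the run every day is not Sqoop's job; that part is typically handled by Oozie (a coordinator) or cron.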

Refer to HadoopExam.com Recorded Training Modules 12, 13 and 19


Related Questions


Question : Is the number of reducers defined by the user?

Options :
1. True
2. False


Question : Select the correct statement regarding reducers

Options :
1. The number of reducers is defined as part of the job configuration
2. All values of the same key can be processed by multiple reducers.
3. Access Mostly Uused Products by 50000+ Subscribers
4. 1,2 and 3 are correct
5. 1 and 3 are correct
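
Regarding option 1 above: a minimal driver sketch, assuming the new (org.apache.hadoop.mapreduce) API, of how the reducer count is set in the job configuration. The class and job names are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class DriverSketch {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "reducer-count-demo");
        // The developer chooses how many reduce tasks the job runs.
        job.setNumReduceTasks(4);
        // ... mapper, reducer, and input/output paths would be set here ...
    }
}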


Question : Distributing the values associated with each key, in sorted order, to the reducers is defined as?
Options :
1. Map and Reduce
2. Shuffle and Sort
3. Access Mostly Uused Products by 50000+ Subscribers
4. None of the above


Question :

Which of the following statements most accurately describes the relationship between MapReduce and Pig?

Options :
1. Pig provides additional capabilities that allow certain types of data manipulation not possible with MapReduce.
2. Pig provides no additional capabilities to MapReduce. Pig programs are executed as MapReduce jobs via the Pig interpreter.
3. Access Mostly Uused Products by 50000+ Subscribers
4. Pig provides the additional capability of allowing you to control the flow of multiple MapReduce jobs.




Question : Which of the following best describes the workings of TextInputFormat?

Options :
1. Input file splits may cross line breaks. A line that crosses file splits is ignored.
2. The input file is split exactly at the line breaks, so each Record Reader will read a series of complete lines.
3. Access Mostly Uused Products by 50000+ Subscribers
4. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the end of the broken line.
5. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the beginning of the broken line.


Question :

In a MapReduce job, you want each of your input files processed by a single map task.
How do you configure a MapReduce job so that a single map task processes each input
file regardless of how many blocks the input file occupies?


Options :
1. Increase the parameter that controls minimum split size in the job configuration.
2. Write a custom MapRunner that iterates over all key-value pairs in the entire file.
3. Access Mostly Uused Products by 50000+ Subscribers
4. Write a custom FileInputFormat and override the method isSplitable to always return false.
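
A minimal sketch of the approach described in option 4 (the class name is hypothetical; the new MapReduce API is assumed). Overriding isSplitable to return false makes each input file a single split, so one map task reads the whole file no matter how many blocks it spans:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Each input file becomes exactly one split, hence one map task per file.
public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // never split, regardless of how many blocks the file occupies
    }
}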