
Cloudera Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)



Question :

Suppose you want to create the following Hive table, partitioned by the date column. Which is the correct syntax?

id int,
date date,
name varchar

  :
1. create table table_name ( id int, date date, name string ) partitioned by (date string)
2. create table table_name ( id int, date date, name string ) partitioned by (string)
3. Access Mostly Uused Products by 50000+ Subscribers
4. Only 2 and 3 correct

Correct Answer : 3
Partitioned tables can be created using the PARTITIONED BY clause. A table can have one or more partition columns and a separate data directory is created for each distinct value combination in the partition columns. Further, tables or partitions can be bucketed using CLUSTERED BY columns, and data can be sorted within that bucket via SORT BY columns. This can improve performance on certain kinds of queries.
If, when creating a partitioned table, you get this error: "FAILED: Error in semantic analysis: Column repeated in partitioning columns," it means you are trying to include the partition column in the data columns of the table itself. You probably really do have the column defined; however, the partition you create exposes a pseudocolumn on which you can query, so you must rename the conflicting table column to something else (something users should not query on!).
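
Applied to this question's table, a minimal sketch of the correct pattern (column names taken from the question; note that newer Hive versions may require backticks around the reserved word date): the partition column appears only in the PARTITIONED BY clause, never in the column list.

-- Correct: date exists only as a partition pseudocolumn
CREATE TABLE table_name (
    id INT,
    name STRING
)
PARTITIONED BY (date STRING);

-- Incorrect: repeating date in the column list fails with
-- "FAILED: Error in semantic analysis: Column repeated in partitioning columns"
-- CREATE TABLE table_name (id INT, date DATE, name STRING)
-- PARTITIONED BY (date STRING);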





Question :

You have following DDL to create Hive table

CREATE TABLE page_view(viewTime INT, userid BIGINT,
page_url STRING, referrer_url STRING,
ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
STORED AS SEQUENCEFILE

Select the correct statements which apply.
A. The statement above creates the page_view table with viewTime, userid, page_url, referrer_url, and ip columns (including comments).
B. The table is also partitioned
C. Data is stored in sequence files.
D. The data format in the files is assumed to be field-delimited by ctrl-A and row-delimited by newline.


  :
1. A,B
2. B,C
3. Access Mostly Uused Products by 50000+ Subscribers
4. A,B,C,D
5. A,B,C

Correct Answer : 4
CREATE TABLE page_view(viewTime INT, userid BIGINT,
page_url STRING, referrer_url STRING,
ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
STORED AS SEQUENCEFILE;
The statement above creates the page_view table with viewTime, userid, page_url, referrer_url, and ip columns (including comments). The table is also partitioned and data is stored in sequence files. The data format in the files is assumed to be field-delimited by ctrl-A and row-delimited by newline.
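
For reference, those default delimiters can be made explicit with ROW FORMAT DELIMITED. A sketch, assuming a plain-text variant of the same table (the name page_view_text is illustrative):

CREATE TABLE page_view_text(viewTime INT, userid BIGINT,
    page_url STRING, referrer_url STRING,
    ip STRING COMMENT 'IP Address of the User')
COMMENT 'Text-format variant with the default delimiters spelled out'
PARTITIONED BY(dt STRING, country STRING)
ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\001'  -- ctrl-A, Hive's default field delimiter
    LINES TERMINATED BY '\n'     -- newline, the default row delimiter
STORED AS TEXTFILE;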





Question :

Select the correct statement for the command below

CREATE TABLE new_key_value_store
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
STORED AS RCFile
AS
SELECT (key % 1024) new_key, concat(key, value) key_value_pair
FROM key_value_store
SORT BY new_key, key_value_pair


  :
1. The above CTAS statement creates the target table new_key_value_store with the schema (new_key DOUBLE, key_value_pair STRING) derived from the results of the SELECT statement
2. If the SELECT statement does not specify column aliases, the column names will be automatically assigned to _col0, _col1, and _col2
3. Access Mostly Uused Products by 50000+ Subscribers
4. 1 and 2 are correct
5. All of 1, 2 and 3 are correct

Correct Answer : 5

Explanation: The above CTAS statement creates the target table new_key_value_store with the schema (new_key DOUBLE, key_value_pair STRING) derived from the results of the SELECT statement. If the SELECT statement does not specify column aliases, the column names will be automatically assigned to _col0, _col1, and _col2 etc. In addition, the new target table is created using a specific SerDe and a storage format independent of the source tables in the SELECT statement.
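
To see the column-alias rule concretely, here is a sketch with the aliases dropped from the same query (derived_store is a hypothetical table name):

CREATE TABLE derived_store
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
STORED AS RCFile
AS
SELECT (key % 1024), concat(key, value)  -- no column aliases given
FROM key_value_store;

-- Per the rule above, the derived schema is (_col0 DOUBLE, _col1 STRING);
-- run DESCRIBE derived_store; to confirm the generated names.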


Related Questions


Question : A Map or Reduce task can crash because of OutOfMemory errors. How does Hadoop MapReduce v1 (MRv1)
handle JVMs when a new MapReduce job is started on a cluster?
 :
1. The TaskTracker may or may not use the same JVM for each task it manages on that node
2. The TaskTracker reuses the same JVM for each task it manages on that node
3. Access Mostly Uused Products by 50000+ Subscribers
4. The TaskTracker spawns a new JVM for each task it manages on that node


Question : You have configured a Hadoop cluster with MRv1, and you have a directory called HadoopExam in HDFS containing two files: Exam1 and Exam2.
You submit a job to the cluster, using that directory as the input directory.
A few seconds after you have submitted the job, a user starts copying a large file, Exam3,
into the directory. Select the correct statement.
 :
1. All files Exam1, Exam2 and Exam3 will be processed by the job
2. Only files Exam1 and Exam2 will be processed by the job
3. Access Mostly Uused Products by 50000+ Subscribers
4. Only the file Exam3 will be processed by the job


Question :
As you know, a Hadoop cluster is made up of multiple nodes, and each file is divided into multiple blocks stored on different nodes.
For this you need to be able to serialize your data, and you use the Writable interface to do so. Select the correct statement about the
Writable interface.
 :
1. Writable is a class that all keys and values in MapReduce must extend. Classes extending this interface must implement methods for serializing and deserializing themselves.
2. Writable is a class that all keys and values in MapReduce must extend. Classes extending this interface need not implement methods for serializing and deserializing themselves unless they want to customize it.
3. Access Mostly Uused Products by 50000+ Subscribers
4. Writable is an interface that all values in MapReduce must implement. Classes implementing this interface must implement methods for serializing and deserializing themselves.


Question :
You have written a MapReduce job, and in the reducer you want data from multiple reducers to be reconciled before
being written to HDFS. Is it possible for reduce tasks to communicate and talk to each other?
 :
1. Yes, all reduce tasks can share data with the proper configuration
2. Yes, each reduce task runs independently and in isolation, but by creating a shared file reducers can communicate with each other
3. Access Mostly Uused Products by 50000+ Subscribers
4. It all depends on the size of the file created: if it is smaller than the block size, then it is possible


Question : All HadoopExam website subscriber information is stored in a MySQL database.
Which tool is best suited to import a portion of the subscriber information every day as files into HDFS,
and to generate Java classes to interact with that imported data?
 :
1. Hive
2. Pig
3. Access Mostly Uused Products by 50000+ Subscribers
4. Flume


Question : A client application of HadoopExam creates an HDFS file named HadoopExam.txt with a replication factor of 5.
Identify which best describes the file access rules in HDFS if the file has a single block that is stored on data nodes C1, C2, C3, C4 and C5.
 :
1. The file cannot be accessed if at least one of the DataNodes storing the block is unavailable.
2. The file can be accessed if at least one of the DataNodes storing the block is available and the client connects to that node only.
3. Access Mostly Uused Products by 50000+ Subscribers
4. The file can be accessed if at least one of the DataNodes storing the block is available, even if the NameNode has crashed.