Cloudera Hadoop Developer Certification Questions and Answers (Dumps and Practice Questions)

Question : Which statement is true about Apache Flume?

1. Flume is a distributed service
2. It is used for moving large amounts of data as it is produced
3. It is ideal for gathering logs from multiple systems
4. It can be used to insert logs into HDFS
5. All of the above

Correct Answer : 5

Apache Flume :

Flume is a distributed, reliable, and available service for efficiently moving large amounts of data as it is produced
- It is ideally suited to gathering logs from multiple systems and inserting them into HDFS as they are generated

Question : Which statement is wrong about Flume?

1. Flume can continue to deliver events in the face of system component failure
2. Flume scales horizontally
3. Flume provides a central Master controller for manageability
4. 1 and 3
5. None of the above

Correct Answer : 5

Explanation :
Flume is designed to continue delivering events in the face of system component failure
- Flume scales horizontally
- As load increases, more machines can be added to the configuration
- Flume provides a central Master controller for manageability
- Administrators can monitor and reconfigure data flows on the fly
- Flume can be extended by adding connectors to existing storage layers or data platforms
- General sources already provided include data from files, syslog, and standard output (stdout) from a process
- General endpoints already provided include files on the local filesystem or in HDFS
- Other connectors can be added using Flume's API
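
The first point in the explanation above, continued delivery when a component fails, can be pictured with a small Python sketch. This is not Flume code and does not use Flume's API: the `FlakySink` class and the `deliver` function are hypothetical names invented for illustration, and the sketch only assumes that an event is offered to an ordered list of endpoints until one accepts it.

```python
class FlakySink:
    """Hypothetical endpoint that can be switched off to simulate a failed component."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.received = []

    def write(self, event):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.received.append(event)


def deliver(event, sinks):
    """Try each sink in order and return the name of the one that accepted the event."""
    for sink in sinks:
        try:
            sink.write(event)
            return sink.name
        except ConnectionError:
            continue  # this sink failed, so fall through to the next one
    raise RuntimeError("no sink available")


primary = FlakySink("primary")
backup = FlakySink("backup")

first = deliver("event-1", [primary, backup])
primary.alive = False  # simulate a component failure
second = deliver("event-2", [primary, backup])
```

In this toy run the first event goes to the primary endpoint and the second is rerouted to the backup once the primary is marked as down.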

Question : Flume can be extended by adding connectors to existing storage layers

1. True
2. False

Correct Answer : 1

Flume can be extended by adding Sources and Sinks to existing storage layers or data platforms
- General Sources include data from files, syslog, and standard output from a process
- General Sinks include files on the local filesystem or HDFS
- Developers can write their own Sources or Sinks
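
The idea that developers can plug in their own Sources and Sinks can be sketched in a few lines of Python. This is a conceptual illustration only, not Flume's real (Java) API: `ListSource`, `UppercaseSink`, and `run_flow` are hypothetical names, and the in-memory buffer between them is an assumption made to keep the example self-contained.

```python
from collections import deque


class ListSource:
    """Hypothetical source that emits events from an in-memory list of log lines."""

    def __init__(self, lines):
        self.lines = lines

    def events(self):
        for line in self.lines:
            yield line.rstrip("\n")


class UppercaseSink:
    """Hypothetical custom sink; a real one might write to HDFS or a local file."""

    def __init__(self):
        self.stored = []

    def write(self, event):
        self.stored.append(event.upper())


def run_flow(source, sink):
    """Move every event from the source to the sink through a FIFO buffer."""
    channel = deque()
    for event in source.events():
        channel.append(event)
    delivered = 0
    while channel:
        sink.write(channel.popleft())
        delivered += 1
    return delivered


source = ListSource(["login ok\n", "disk full\n"])
sink = UppercaseSink()
count = run_flow(source, sink)
```

Because `run_flow` only relies on an `events()` method and a `write()` method, either side can be swapped for a different implementation without touching the other, which is the kind of extensibility the answer describes.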

Related Questions

Question : What are sequence files and why are they important?

1. Sequence files are a type of file in the Hadoop framework that allows data to be sorted
2. Sequence files are binary-format files that are compressed and splittable
3. [option text not available]
4. All of the above

Question : How can you use binary data in MapReduce?

1. Binary data cannot be used by the Hadoop framework.
2. Binary data can be used directly by a MapReduce job. Often binary data is added to a sequence file.
3. [option text not available]
4. Hadoop can freely use binary files with MapReduce jobs so long as the files have headers

Question : What is Hive?

1. Hive is a part of the Apache Hadoop project that enables in-memory analysis of real-time streams of data
2. Hive is a way to add data from the local file system to HDFS
3. [option text not available]
4. Hive is a part of the Apache Hadoop project that provides an SQL-like interface for data processing

Question : Which of the following is a Hadoop daemon process?

1. JobTracker
2. TaskTracker
3. [option text not available]
4. DataNode
5. All of the above

Question : Which statement is true about Apache Hadoop?

1. HDFS performs best with a modest number of large files
2. Random writes to a file are not allowed
3. [option text not available]
4. All of the above

Question : Which statement is true about storing files in HDFS?

1. Files are split into blocks
2. All the blocks of a file must remain on the same machine
3. [option text not available]
4. All of the above
5. 1 and 3 are correct
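
As a worked illustration of a file being split into blocks, the following Python sketch computes the block sizes for a file of a given length. It is plain arithmetic and not HDFS code: `split_into_blocks` is a hypothetical helper, and the 128 MB block size is an assumed value, since the actual block size is a cluster configuration setting.

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes in bytes of the blocks a file of file_size would occupy."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block only holds the leftover bytes
    return sizes


MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # assumed value; the real block size is a cluster setting

blocks = split_into_blocks(300 * MB, BLOCK_SIZE)
```

Under these assumptions a 300 MB file occupies two full 128 MB blocks plus a final 44 MB block.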