Question : Which statement is true about Apache Flume?
1. Flume is a distributed service 2. It is used for moving large amounts of data as it is produced 3. It is ideal for gathering logs from multiple systems 4. It can be used to insert logs into HDFS 5. All of the above
Correct Answer : 5
Apache Flume :
Flume is a distributed, reliable, available service for efficiently moving large amounts of data as it is produced. It is ideally suited to gathering logs from multiple systems and inserting them into HDFS as they are generated.
Question : Which statement is wrong about Flume?
1. Flume can continue delivering events in the face of system component failure 2. Flume scales horizontally 3. Flume provides a central Master controller for manageability 4. 1 and 3 5. None of the above
Correct Answer : 5
Explanation : Flume is designed to continue delivering events in the face of system component failure - Flume scales horizontally: as load increases, more machines can be added to the configuration - Flume provides a central Master controller for manageability - Administrators can monitor and reconfigure data flows on the fly - Flume can be extended by adding connectors to existing storage layers or data platforms - General sources already provided include data from files, syslog, and standard output (stdout) from a process - General endpoints already provided include files on the local filesystem or in HDFS - Other connectors can be added using Flume's API
Question : Flume can be extended by adding connectors to existing storage layers
1. True 2. False
Correct Answer : 1
Explanation : Flume can be extended by adding Sources and Sinks to existing storage layers or data platforms - General Sources include data from files, syslog, and standard output from a process - General Sinks include files on the local filesystem or HDFS - Developers can write their own Sources or Sinks
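The Source and Sink pairing described above can be sketched as an agent configuration. The following is a minimal illustration only, assuming the property-file format used by Flume 1.x; the function name, the component names (r1, c1, k1), the log path and the HDFS URL are all hypothetical, and the memory channel is an addition here because Flume 1.x connects a Source to a Sink through a Channel.

```python
def render_agent_config(agent: str, log_path: str, hdfs_path: str) -> str:
    """Return Flume 1.x style properties text wiring an exec Source
    (standard output of a process) to an HDFS Sink via a memory channel.

    All names passed in or chosen below are illustrative assumptions.
    """
    src, ch, sink = "r1", "c1", "k1"  # hypothetical component names
    lines = [
        # Declare the components that belong to this agent.
        f"{agent}.sources = {src}",
        f"{agent}.channels = {ch}",
        f"{agent}.sinks = {sink}",
        "",
        # Source: read the stdout of a process that follows a log file.
        f"{agent}.sources.{src}.type = exec",
        f"{agent}.sources.{src}.command = tail -F {log_path}",
        f"{agent}.sources.{src}.channels = {ch}",
        "",
        # Channel: in-memory buffer between Source and Sink.
        f"{agent}.channels.{ch}.type = memory",
        "",
        # Sink: write the collected events into HDFS.
        f"{agent}.sinks.{sink}.type = hdfs",
        f"{agent}.sinks.{sink}.hdfs.path = {hdfs_path}",
        f"{agent}.sinks.{sink}.channel = {ch}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values only; replace with paths from your own cluster.
    print(render_agent_config("a1", "/var/log/app.log",
                              "hdfs://namenode/flume/events"))
```

The printed text could be saved as a properties file and passed to a Flume agent; a production setup would normally also tune channel capacity and HDFS roll settings, which are left out here.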
1. Sequence files are a type of file in the Hadoop framework that allow data to be sorted 2. Sequence files are binary format files that are compressed and splittable 4. All of the above
Question : What is Hive? 1. Hive is a part of the Apache Hadoop project that enables in-memory analysis of real-time streams of data 2. Hive is a way to add data from the local file system to HDFS 4. Hive is a part of the Apache Hadoop project that provides a SQL-like interface for data processing