Question : You have a website www.QuickTechie.com, where you have one month of user profile update logs. For classification analysis you want to save all the data in a single file called QT31012015.log, which is approximately 30GB in size. You are able to push this full file into a directory on HDFS as /log/QT/QT31012015.log. You also learn that you can store the same data in HBase, because it provides ...
Explanation: When Would I Use Apache HBase? Use Apache HBase when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS. Apache HBase provides random, realtime read/write access to your data. HDFS does not allow random writes; HDFS is built for scalability, fault tolerance, and batch processing.
Features
- Linear and modular scalability.
- Strictly consistent reads and writes.
- Automatic and configurable sharding of tables.
- Automatic failover support between RegionServers.
- Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables.
- Easy to use Java API for client access.
- Block cache and Bloom Filters for real-time queries.
- Query predicate push down via server side Filters.
- Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options.
- Extensible jruby-based (JIRB) shell.
- Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX.
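To make the random, realtime read/write point above concrete, here is a minimal sketch using the HBase Java client API. The table name "profiles", the column family "info", and the row key "user123" are illustrative assumptions, not part of the original question.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseRandomAccess {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("profiles"))) {
                // Random write: update a single user's profile cell in place
                Put put = new Put(Bytes.toBytes("user123"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("city"), Bytes.toBytes("Norman"));
                table.put(put);
                // Random read: fetch that single row back by its row key
                Get get = new Get(Bytes.toBytes("user123"));
                Result result = table.get(get);
                System.out.println(Bytes.toString(
                        result.getValue(Bytes.toBytes("info"), Bytes.toBytes("city"))));
            }
        }
    }

This kind of keyed, in-place update is exactly what HDFS alone does not offer, since HDFS files are written once and appended, not randomly modified.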
Question : You have set up a Hadoop cluster in the Norman data center with all settings left at their defaults. How much data will you be able to store on your Hadoop cluster if it has 12 nodes with 4TB of raw disk space per node allocated to HDFS storage? 1. Nearly 3TB 2. Nearly 12TB
Explanation: These instructions for cluster configuration assume that you have already downloaded and unzipped a copy of Hadoop. Module 3 discusses getting started with Hadoop for this tutorial. Module 7 discusses how to set up a larger cluster and provides preliminary setup instructions for Hadoop, including downloading prerequisite software. The HDFS configuration is located in a set of XML files in the Hadoop configuration directory, conf/ under the main Hadoop install directory (where you unzipped Hadoop to). The conf/hadoop-defaults.xml file contains default values for every parameter in Hadoop. This file is considered read-only. You override this configuration by setting new values in conf/hadoop-site.xml. This file should be replicated consistently across all machines in the cluster. (It is also possible, though not advisable, to host it on NFS.) In this cluster we have 12 nodes, each with 4TB of disk space allocated to HDFS, so you have a total of 48TB of raw space available. By default, Hadoop replicates each HDFS block three times, so 48/3 = 16, meaning you can store approximately 16TB of data. Configuration settings are a set of key-value pairs of the format:
<property>
  <name>property-name</name>
  <value>property-value</value>
</property>
Adding the line <final>true</final> inside the property body will prevent a property from being overridden by user applications. This is useful for most system-wide configuration options.
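A minimal sketch of the capacity arithmetic above, assuming the Hadoop client libraries are on the classpath; the node count and per-node disk figures are hard-coded from the question, and the effective replication factor is read from the loaded configuration (falling back to the default of 3).

    import org.apache.hadoop.conf.Configuration;

    public class HdfsCapacityEstimate {
        public static void main(String[] args) {
            // Loads the default configuration plus any site-level overrides on the classpath
            Configuration conf = new Configuration();
            // dfs.replication defaults to 3 when it has not been overridden
            int replication = conf.getInt("dfs.replication", 3);
            double rawTb = 12 * 4.0; // 12 nodes x 4TB of raw HDFS disk per node = 48TB
            // Usable capacity is raw space divided by the replication factor: 48 / 3 = 16TB
            System.out.printf("Approximate usable capacity: %.0f TB%n", rawTb / replication);
        }
    }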
Question : You have a website www.QuickTechie.com, where all user profiles are stored in a MySQL database. You want to fetch the new profiles from this database every day and store them in HDFS as a log file, and you also want POJOs generated to interact with the imported data. Select the tool which best solves the above problem. 1. Oozie 2. Hue 3. ... 4. Sqoop 5. Pig or Hive
Correct Answer : 4 (Sqoop). Explanation: Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. Sqoop ("SQL-to-Hadoop") is a straightforward command-line tool with the following capabilities: it imports individual tables or entire databases to files in HDFS, generates Java classes to allow you to interact with your imported data, and provides the ability to import from SQL databases straight into your Hive data warehouse. After setting up an import job in Sqoop, you can get started working with SQL database-backed data from your Hadoop MapReduce cluster in minutes. The input to the import process is a database table. Sqoop will read the table row-by-row into HDFS. The output of this import process is a set of files containing a copy of the imported table. The import process is performed in parallel, so the output will be in multiple files. These files may be delimited text files (for example, with commas or tabs separating each field), or binary Avro or SequenceFiles containing serialized record data. A by-product of the import process is a generated Java class which can encapsulate one row of the imported table. This class is used during the import process by Sqoop itself. The Java source code for this class is also provided to you, for use in subsequent MapReduce processing of the data. This class can serialize and deserialize data to and from the SequenceFile format. It can also parse the delimited-text form of a record. These abilities allow you to quickly develop MapReduce applications that use the HDFS-stored records in your processing pipeline. You are also free to parse the delimited record data yourself, using any other tools you prefer. Please refer to the Hadoop Professional Recorded Training provided by HadoopExam.com.
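As a rough illustration of how the generated class might be used in a follow-on MapReduce job, here is a hypothetical sketch. Sqoop names the generated class after the imported table, so the class name UserProfiles and the accessors get_username() and get_city() used below are assumptions for illustration only; your actual generated class and getters will match your table and columns.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical mapper over the delimited text files that Sqoop wrote to HDFS.
    public class ProfileMapper extends Mapper<LongWritable, Text, Text, Text> {

        // UserProfiles stands in for the Sqoop-generated POJO for the imported table;
        // the real class name and accessor names depend on your table and columns.
        private final UserProfiles profile = new UserProfiles();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            try {
                // parse() rebuilds the record object from one delimited line of the import
                profile.parse(value);
            } catch (Exception e) {
                return; // skip malformed lines
            }
            // Emit (username -> city) pairs for downstream aggregation
            context.write(new Text(profile.get_username()), new Text(profile.get_city()));
        }
    }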
1. When Job QT2 gets submitted, it will get assigned tasks, while Job QT1 continues to run with fewer tasks 2. When Job QT2 gets submitted, Job QT1 has to finish first before Job QT2 can get scheduled 3. ... 4. When Job QT1 gets submitted, it consumes all the task slots.