Question : The QuickTechie.com website lets any software professional create, update, and delete articles. You decided to use HBase rather than HDFS to store these articles. Why would you prefer HBase over HDFS?
1. Fault tolerance
2. Batch processing
3. Random writes
4. Even distribution of data
Correct Answer : 3
Explanation: Apache HBase provides random, real-time read/write access to your data. HDFS does not allow random writes, so an article stored as a plain HDFS file could not be updated in place. HDFS is built for scalability, fault tolerance, and batch processing. HDFS is a distributed file system that is well suited to storing large files. Its documentation states that it is not, however, a general-purpose file system, and that it does not provide fast individual record lookups in files. HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables. This can sometimes be a point of conceptual confusion. Internally, HBase puts your data in indexed "StoreFiles" that exist on HDFS for high-speed lookups.
Features of HBase:
Strongly consistent reads/writes: HBase is not an "eventually consistent" data store. This makes it very suitable for tasks such as high-speed counter aggregation.
Automatic sharding: HBase tables are distributed across the cluster via regions, and regions are automatically split and redistributed as your data grows.
Automatic RegionServer failover.
Hadoop/HDFS integration: HBase supports HDFS out of the box as its distributed file system.
MapReduce: HBase supports massively parallelized processing via MapReduce, with HBase usable as both source and sink.
Java client API: HBase supports an easy-to-use Java API for programmatic access.
Thrift/REST API: HBase also supports Thrift and REST for non-Java front ends.
Block Cache and Bloom Filters: HBase supports a Block Cache and Bloom Filters for high-volume query optimization.
Operational management: HBase provides built-in web pages for operational insight as well as JMX metrics.
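The point that HBase offers random updates while sitting on top of write-once HDFS files can be pictured with a small sketch. This is only a toy model under stated assumptions, not HBase code: the class ToyStore, the flush_size parameter, and the article row keys are all hypothetical, and real HBase additionally uses a write-ahead log, compactions, and delete markers that are left out here.

```python
from typing import Dict, List, Optional


class ToyStore:
    """Hypothetical model: random writes layered over write-once files."""

    def __init__(self, flush_size: int = 2) -> None:
        self.memstore: Dict[str, bytes] = {}           # mutable, in memory
        self.store_files: List[Dict[str, bytes]] = []  # never modified once flushed
        self.flush_size = flush_size

    def put(self, row: str, value: bytes) -> None:
        # An update never touches an existing file; it only goes to memory.
        self.memstore[row] = value
        if len(self.memstore) >= self.flush_size:
            self.flush()

    def flush(self) -> None:
        # Write the buffered rows out as one new sorted, write-once file.
        if self.memstore:
            self.store_files.append(dict(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, row: str) -> Optional[bytes]:
        # Newest data wins: check memory first, then files from newest to oldest.
        if row in self.memstore:
            return self.memstore[row]
        for store_file in reversed(self.store_files):
            if row in store_file:
                return store_file[row]
        return None


store = ToyStore(flush_size=2)
store.put("article1", b"draft")
store.put("article2", b"hello")      # second row triggers a flush to file 0
store.put("article1", b"published")  # update lands in memory; file 0 is untouched
current = store.get("article1")
```

In this sketch the flushed file still holds the old value of article1, yet a read returns the updated one, which is the sense in which updates can be served without rewriting files in place.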
Question : All software professionals who subscribe to QuickTechie.com create a profile, and as an administrator you also store the joining date of each profile. The full history of all users and their profiles is stored in HBase for further analysis. A data scientist now wants to run an ad-hoc query to fetch the joining date of one offending profile that is publishing adult content on the website. To fetch the data from a single cell (the joining date), which of the following do you need to supply to HBase?
1. A row key, column family and column qualifier
2. A row key, column qualifier and version
3. A column key
4. A column key and column qualifier
Correct Answer : 1
Explanation: An HBase table maintains a map of keys to values (key -> value). Each of these mappings is called a KeyValue or a cell. Each cell is identified by the mapping (rowkey, columnFamily, columnQualifier, timestamp) -> value. When you omit the timestamp and supply only (rowkey, columnFamily, columnQualifier), a Get retrieves only the most recent version of the cell.
"HBase is a key/value store. Specifically, it is a Sparse, Consistent, Distributed, Multidimensional, Sorted map."
Map: HBase maintains maps of keys to values (key -> value). Each of these mappings is called a "KeyValue" or a "Cell". You can find a value by its key, and that is all.
Sorted: These cells are sorted by key. This is a very important property because it allows searching ("give me all values for which the key is between X and Y") rather than just retrieving a value for a known key.
Multidimensional: The key itself has structure. Each key consists of the following parts: rowkey, column family, column, and timestamp. So the mapping is actually (rowkey, column family, column, timestamp) -> value. The rowkey and value are just bytes (the column family needs to be printable), so you can store in a cell anything that you can serialize into a byte[].
Sparse: This follows from the fact that HBase stores key -> value mappings and that a "row" is nothing more than a grouping of these mappings (identified by the rowkey mentioned above). Unlike NULL in most relational databases, absent information needs no storage; there is simply no cell for a column that has no value. It also means that every value carries all its coordinates with it.
Distributed: One key feature of HBase is that the data can be spread over hundreds or thousands of machines and reach billions of cells. HBase manages the load balancing automatically.
Consistent: HBase makes two guarantees. All changes with the same rowkey (see Multidimensional above) are atomic. A reader will always read the last written (and committed) values.
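The cell coordinates and the sorted, sparse map described above can be illustrated with a short sketch. It is an in-memory toy under stated assumptions, not the HBase client API: ToyTable, its methods, and the row, family, and qualifier names (user42, PROFILE, JOINING_DATE, and so on) are hypothetical. Timestamps are stored negated so that the newest version of a cell sorts first, which mirrors how a lookup without a timestamp returns the latest version.

```python
import bisect
from typing import List, Optional, Tuple

# (rowkey, column family, column qualifier, negated timestamp)
Key = Tuple[str, str, str, int]


class ToyTable:
    """Hypothetical in-memory model of the sorted cell map; not the HBase API."""

    def __init__(self) -> None:
        self._keys: List[Key] = []      # always kept in sorted order
        self._values: List[bytes] = []  # parallel to _keys

    def put(self, row: str, family: str, qualifier: str, ts: int, value: bytes) -> None:
        # Storing -ts makes newer versions of a cell sort first.
        key = (row, family, qualifier, -ts)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get(self, row: str, family: str, qualifier: str,
            ts: Optional[int] = None) -> Optional[bytes]:
        if ts is None:
            # No timestamp: seek to the first (newest) version of the cell.
            i = bisect.bisect_left(self._keys, (row, family, qualifier))
            if i < len(self._keys) and self._keys[i][:3] == (row, family, qualifier):
                return self._values[i]
            return None
        key = (row, family, qualifier, -ts)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start_row: str, stop_row: str) -> List[Tuple[str, str, bytes]]:
        # Range query: every stored cell with start_row <= rowkey < stop_row.
        lo = bisect.bisect_left(self._keys, (start_row,))
        hi = bisect.bisect_left(self._keys, (stop_row,))
        return [(k[0], k[2], v) for k, v in zip(self._keys[lo:hi], self._values[lo:hi])]


table = ToyTable()
table.put("user42", "PROFILE", "JOINING_DATE", 100, b"2013-01-05")
table.put("user42", "PROFILE", "JOINING_DATE", 200, b"2013-01-06")
table.put("user07", "PROFILE", "JOINING_DATE", 100, b"2012-11-30")
table.put("user99", "PROFILE", "NICKNAME", 100, b"zed")  # no JOINING_DATE cell exists for user99

latest = table.get("user42", "PROFILE", "JOINING_DATE")
older = table.get("user42", "PROFILE", "JOINING_DATE", ts=100)
absent = table.get("user99", "PROFILE", "JOINING_DATE")
in_range = table.scan("user07", "user43")
```

Because the keys are kept sorted, both the single-cell lookup and the row-range scan reduce to a binary search, and a missing column simply has no entry rather than a stored NULL.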
Question : The QuickTechie.com website lets any software professional create, update, and delete articles. You decided to use HBase rather than HDFS to store these articles. You need to create an ARTICLES table in HBase. The table will consist of one column family called PROFILE_ARTICLES and two column qualifiers, USER and COMMENT. Select the command that will create this table:
1. create 'ARTICLES', {NAME => 'Author', NAME => 'Comment'}
3. create 'ARTICLES', 'PROFILE_ARTICLES' {NAME => 'Author', NAME => 'Comment'}
4. create 'ARTICLES', 'PROFILE_ARTICLES'
Correct Answer : 4
Explanation: When you create an HBase table, you need to specify the table name and the column family name. Column qualifiers are not part of the table schema; they come into existence when data is written, so USER and COMMENT do not appear in the create command. For this example:
Table name: 'ARTICLES'
ColumnFamily: 'PROFILE_ARTICLES'
In the HBase shell, use create to create the table by passing it a name and a column family, and then verify it with the describe command.
hbase> create 'ARTICLES', 'PROFILE_ARTICLES'

Table management commands:

alter
Alter column family schema; pass table name and a dictionary specifying the new column family schema. Dictionaries are described in the main help command output. The dictionary must include the name of the column family to alter. For example, to change or add the 'f1' column family in table 't1' from its current value to keep a maximum of 5 cell VERSIONS, do:
hbase> alter 't1', NAME => 'f1', VERSIONS => 5
You can operate on several column families:
hbase> alter 't1', 'f1', {NAME => 'f2', IN_MEMORY => true}, {NAME => 'f3', VERSIONS => 5}
To delete the 'f1' column family in table 't1', use one of:
hbase> alter 't1', NAME => 'f1', METHOD => 'delete'
hbase> alter 't1', 'delete' => 'f1'
You can also change table-scope attributes like MAX_FILESIZE, READONLY, MEMSTORE_FLUSHSIZE, DEFERRED_LOG_FLUSH, etc. These can be put at the end; for example, to change the max size of a region to 128MB, do:
hbase> alter 't1', MAX_FILESIZE => '134217728'
You can add a table coprocessor by setting a table coprocessor attribute:
hbase> alter 't1', 'coprocessor'=>'hdfs:///foo.jar|com.foo.FooRegionObserver|1001|arg1=1,arg2=2'
Since you can have multiple coprocessors configured for a table, a sequence number will be automatically appended to the attribute name to uniquely identify it. The coprocessor attribute must match the pattern below in order for the framework to understand how to load the coprocessor classes:
[coprocessor jar file location] | class name | [priority] | [arguments]
You can also set configuration settings specific to this table or column family:
hbase> alter 't1', CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}
hbase> alter 't1', {NAME => 'f2', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}
You can also remove a table-scope attribute:
hbase> alter 't1', METHOD => 'table_att_unset', NAME => 'MAX_FILESIZE'
hbase> alter 't1', METHOD => 'table_att_unset', NAME => 'coprocessor$1'
There could be more than one alteration in one command:
hbase> alter 't1', { NAME => 'f1', VERSIONS => 3 }, { MAX_FILESIZE => '134217728' }, { METHOD => 'delete', NAME => 'f2' }, OWNER => 'johndoe', METADATA => { 'mykey' => 'myvalue' }

create
Create table; pass table name, a dictionary of specifications per column family, and optionally a dictionary of table configuration. Examples:
hbase> create 't1', {NAME => 'f1', VERSIONS => 5}
hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
hbase> # The above in shorthand would be the following:
hbase> create 't1', 'f1', 'f2', 'f3'
hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
hbase> create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}
Table configuration options can be put at the end.
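The create command for ARTICLES names only a table and a column family, even though the question also mentions the qualifiers USER and COMMENT. One way to picture why is the toy model below. It is a sketch under stated assumptions, not HBase code: ToySchemaTable and its methods are hypothetical, and the 'family:qualifier' column string follows the form used by the HBase shell put command.

```python
from typing import Dict, Set, Tuple


class ToySchemaTable:
    """Hypothetical model of a table whose schema lists only column families."""

    def __init__(self, name: str, *families: str) -> None:
        self.name = name
        self.families: Set[str] = set(families)             # fixed at creation time
        self.cells: Dict[Tuple[str, str, str], bytes] = {}  # (row, family, qualifier) -> value

    def put(self, row: str, column: str, value: bytes) -> None:
        # Columns are addressed as 'family:qualifier'.
        family, _, qualifier = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Any qualifier is accepted; it was never declared anywhere.
        self.cells[(row, family, qualifier)] = value

    def qualifiers(self, family: str) -> Set[str]:
        return {q for (_, f, q) in self.cells if f == family}


articles = ToySchemaTable("ARTICLES", "PROFILE_ARTICLES")
articles.put("article1", "PROFILE_ARTICLES:USER", b"alice")
articles.put("article1", "PROFILE_ARTICLES:COMMENT", b"nice post")

try:
    articles.put("article1", "AUTHOR:NAME", b"alice")  # family was never created
    rejected = False
except KeyError:
    rejected = True
```

In this model the two qualifiers appear only because data was written to them, while a write to an undeclared column family is refused, which matches the reasoning behind answer 4.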