Question : To avoid losing older versions of articles, you enabled versioning in your HBase database so that every version of an article is preserved. In a table called ARTICLES, the column user_article_title has versions 1 to 444 for an article created by user JOHN with the title "HBase Tutorial". To bring the table into a stable state, you execute a delete statement for version 445. Select the correct statement.
1. The entire row containing the article will be deleted
2. Only cells with the specified version 445 are deleted
3. As version 445 does not exist, nothing in the row will be deleted
Explanation: Regardless of row versions, a Delete object deletes a specific single row completely.

public Delete(byte[] row)

Create a Delete operation for the specified row. If no further operations are performed, this will delete everything associated with the specified row (all versions of all columns in all families).

org.apache.hadoop.hbase.client.Delete is used to perform Delete operations on a single row. To delete an entire row, instantiate a Delete object with the row to delete. To further define the scope of what to delete, call additional methods:
- To delete specific families, execute deleteFamily for each family to delete.
- To delete multiple versions of specific columns, execute deleteColumns for each column to delete.
- To delete specific versions of specific columns, execute deleteColumn for each column version to delete.

When a timestamp is specified, deleteFamily and deleteColumns delete all versions with a timestamp less than or equal to the one passed. If no timestamp is specified, an entry is added with a timestamp of 'now', where 'now' is the server's System.currentTimeMillis(). Specifying a timestamp to the deleteColumn method deletes only the version with exactly that timestamp. If no timestamp is passed to deleteColumn, it internally finds the most recent cell's timestamp and adds a delete at that timestamp; i.e. it deletes the most recently added cell. The timestamp passed to the constructor is used ONLY for deletes of whole rows. For anything narrower -- deleteColumn, deleteColumns or deleteFamily -- you need to use the method overloads that take a timestamp; the constructor timestamp is not referenced.
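The timestamp rules above can be illustrated with a toy in-memory model of a single row. This is only a sketch of the documented semantics (deleteColumns removes all versions up to and including a timestamp; deleteColumn removes exactly one version, defaulting to the newest); it is not the real org.apache.hadoop.hbase.client.Delete API.

```python
# Toy model of HBase delete semantics for one row.
# Cells are keyed by (family, qualifier), each holding {timestamp: value}.

def delete_columns(cells, family, qualifier, ts):
    """Like Delete.deleteColumns: drop ALL versions with timestamp <= ts."""
    versions = cells.get((family, qualifier), {})
    cells[(family, qualifier)] = {t: v for t, v in versions.items() if t > ts}

def delete_column(cells, family, qualifier, ts=None):
    """Like Delete.deleteColumn: drop ONLY the version equal to ts,
    or the most recently added version when no timestamp is given."""
    versions = cells.get((family, qualifier), {})
    if ts is None and versions:
        ts = max(versions)  # most recent cell's timestamp
    versions.pop(ts, None)

# One row with three versions of user_article_title.
row = {("ARTICLES", "user_article_title"): {1: "v1", 2: "v2", 3: "v3"}}

delete_column(row, "ARTICLES", "user_article_title")      # removes version 3 (newest)
delete_columns(row, "ARTICLES", "user_article_title", 1)  # removes versions <= 1
print(row[("ARTICLES", "user_article_title")])            # {2: 'v2'}
```

The key distinction the question tests is separate from this: a Delete built only from a row key, with no column-level methods called, removes the whole row regardless of versions.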
Question : Given the following HBase table schema (for user articles from the QuickTechie.com website): Row Key, ArticleContent:userProfileName, ArticleContent_Altered:address, UserVersion:3, UserVersion:10. A table scan will return the column data in which sorted order?
1. Row Key, ArticleContent_Altered:address, ArticleContent:userProfileName, UserVersion:3, UserVersion:10
2. Row Key, ArticleContent_Altered:address, ArticleContent:userProfileName, UserVersion:10, UserVersion:3
3. Row Key, ArticleContent:userProfileName, ArticleContent_Altered:address, UserVersion:3, UserVersion:10
4. Row Key, ArticleContent:userProfileName, ArticleContent_Altered:address, UserVersion:10, UserVersion:3
Explanation: HBase table contents are sorted by row key, then column family, then column qualifier, and finally timestamp, and all comparisons are lexicographical. In this example there are two comparisons. First, the column family ArticleContent sorts before ArticleContent_Altered. Within a column family, data is sorted by column qualifier, so UserVersion:10 comes before UserVersion:3, because the character '1' in "10" is less than '3' in lexicographical order. All data model operations in HBase return data in sorted order: first by row, then by column family, then by column qualifier, and finally by timestamp (sorted in reverse, so the newest records are returned first). The row key is treated by HBase as an array of bytes, and the row key map keeps entries in lexicographical order. For example, the numbers 1 to 100 are ordered like this: 1, 10, 100, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, ..., 9, 91, 92, 93, 94, 95, 96, 97, 98, 99.

You can perform scans using the HBase Shell, for testing or quick queries. The following examples represent only a subset of the possibilities; issue the scan command with no parameters for more usage information.

# Display usage information
hbase> scan
# Scan all rows of table 't1'
hbase> scan 't1'
# Specify a startrow, limit the result to 10 rows, and only return selected columns
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
# Specify a timerange
hbase> scan 't1', {TIMERANGE => [1303668804, 1303668904]}
# Specify a custom filter
hbase> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
# Disable the block cache for a specific scan (experts only)
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}
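Because HBase compares keys byte by byte, plain string sorting in any language reproduces the same order. The sketch below checks the two comparisons from the explanation (family before family, then qualifier within a family, with "10" sorting before "3"):

```python
# Demonstrate HBase's lexicographic sort for the example schema:
# compare by column family first, then by column qualifier.
columns = [
    ("ArticleContent_Altered", "address"),
    ("UserVersion", "3"),
    ("ArticleContent", "userProfileName"),
    ("UserVersion", "10"),
]
# Python's string comparison is byte-wise, like HBase's key comparison.
print(sorted(columns))
# [('ArticleContent', 'userProfileName'), ('ArticleContent_Altered', 'address'),
#  ('UserVersion', '10'), ('UserVersion', '3')]

# The same rule explains the 1..100 row-key ordering mentioned above:
print(sorted(str(i) for i in range(1, 101))[:5])  # ['1', '10', '100', '11', '12']
```

The sorted output matches option 4 in the question.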
Question : You are going to store the following data in an HBase HFile. Select the order in which the rows will be stored.
Row #  Row Key      ColumnFamily:ColumnQualifier, timestamp   Column Value
A      01012015002  Article:Metadata, timestamp=201           Ankit
B      01012015002  Article:User, timestamp=201               Baba
C      01012015001  BLOG:Title, timestamp=501                 Chitrank
D      01012015001  BLOG:Author, timestamp=501                David
E      01012015002  Article:Number, timestamp=201             Eigen
F      01012015001  BLOG:Text, timestamp=501                  Farukh
1. D, F, C, A, E, B
2. C, D, F, A, B, E
3. A, E, B, D, C, F
4. D, S, B, F, C, E
Correct Answer : 1
Explanation: All data model operations in HBase return data in sorted order: first by row, then by column family, then by column qualifier, and finally by timestamp (sorted in reverse, so the newest records are returned first). Data is stored in an HFile in lexicographical order of each of the following, in turn:
1. row key
2. column family
3. column qualifier
4. timestamp
This keeps the data in a well-defined order.
In this example, the two row keys sort as (01012015001, 01012015002). The cells with row key 01012015001 are all in the same column family, BLOG, so their column qualifiers are stored in the order [Author, Text, Title], which is D, F, C. The cells with row key 01012015002 are all in the same column family, Article, so their column qualifiers are stored in the order [Metadata, Number, User], which is A, E, B. The final order is therefore D, F, C, A, E, B.

There is no store of column metadata outside of the internal KeyValue instances for a column family. Thus, while HBase can support not only a large number of columns per row but also a heterogeneous set of columns between rows, it is your responsibility to keep track of the column names. The only way to get a complete set of columns that exist for a column family is to process all the rows.

For more information about how HBase stores data internally, see the HBase chapter of Tom White's book Hadoop: The Definitive Guide (O'Reilly). It includes an optimization note warning about a phenomenon where an import process walks in lock-step with all clients, in concert pounding one of the table's regions (and thus a single node), then moving on to the next region, and so on. With monotonically increasing row keys (e.g., a timestamp), this will happen. See the comic by IKai Lan on why monotonically increasing row keys are problematic in BigTable-like datastores: monotonically increasing values are bad. The pile-up on a single region caused by monotonically increasing keys can be mitigated by randomizing the input records so they are not in sorted order, but in general it is best to avoid using a timestamp or a sequence (e.g. 1, 2, 3) as the row key.
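The derivation of the final order can be checked mechanically: sort the six cells by (row key, column family, column qualifier) as strings, exactly as HBase does byte-wise.

```python
# Reproduce the HFile ordering for the six example cells.
# Each label maps to (row key, column family, column qualifier).
cells = {
    "A": ("01012015002", "Article", "Metadata"),
    "B": ("01012015002", "Article", "User"),
    "C": ("01012015001", "BLOG", "Title"),
    "D": ("01012015001", "BLOG", "Author"),
    "E": ("01012015002", "Article", "Number"),
    "F": ("01012015001", "BLOG", "Text"),
}

# Sort labels by their (row key, family, qualifier) triple.
order = sorted(cells, key=lambda label: cells[label])
print(order)  # ['D', 'F', 'C', 'A', 'E', 'B']
```

This confirms option 1, D, F, C, A, E, B.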
If you do need to upload time-series data into HBase, you should study OpenTSDB as a successful example. It has a page describing the schema it uses in HBase. The key format in OpenTSDB is effectively [metric_type][event_timestamp], which at first glance would appear to contradict the previous advice about not using a timestamp as the key. However, the difference is that the timestamp is not in the lead position of the key, and the design assumes there are dozens or hundreds (or more) of different metric types. Thus, even with a continual stream of input data containing a mix of metric types, the Puts are distributed across various regions of the table.
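The effect of leading with the metric type can be sketched with made-up metric names (the names and the ":"-joined key layout below are illustrative, not OpenTSDB's actual byte encoding):

```python
# Sketch of an OpenTSDB-style key layout: [metric_type][event_timestamp].
# Metric names here are invented for illustration.
metrics = ["cpu.user", "mem.free", "net.bytes"]
timestamps = [1303668804, 1303668805]

# Build every key and sort them as HBase would store them.
keys = sorted(f"{m}:{t}" for m in metrics for t in timestamps)
print(keys)
# Keys cluster by metric prefix, so a mixed stream of metrics spreads
# writes across the keyspace instead of piling onto the newest region,
# even though the timestamps themselves increase monotonically.
```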