Question : To avoid losing older versions of articles, you enabled versioning in your HBase database so that every version of an article is preserved. In a table called ARTICLES, the column user_article_title has versions 1 to 444 for an article created by user JOHN with the title "HBase Tutorial". To bring the table into a stable state, you execute a delete statement for version 445. Select the correct statement.
1. The entire row containing the article will be deleted
2. Only cells with the specified version 445 are deleted
3. As version 445 does not exist, nothing in the row will be deleted
Explanation: Regardless of row versions, a Delete object deletes a specific single row completely.

public Delete(byte[] row)

Create a Delete operation for the specified row. If no further operations are performed, this will delete everything associated with the specified row (all versions of all columns in all families).

org.apache.hadoop.hbase.client.Delete is used to perform Delete operations on a single row. To delete an entire row, instantiate a Delete object with the row to delete. To further define the scope of what to delete, call additional methods:
- To delete specific families, execute deleteFamily for each family to delete.
- To delete multiple versions of specific columns, execute deleteColumns for each column to delete.
- To delete specific versions of specific columns, execute deleteColumn for each column version to delete.

When a timestamp is specified, deleteFamily and deleteColumns delete all versions with a timestamp less than or equal to the one passed. If no timestamp is specified, an entry is added with a timestamp of 'now', where 'now' is the server's System.currentTimeMillis(). Specifying a timestamp to the deleteColumn method deletes only the version with exactly that timestamp. If no timestamp is passed to deleteColumn, it internally finds the most recent cell's timestamp and adds a delete at that timestamp; i.e. it deletes the most recently added cell. The timestamp passed to the constructor is used ONLY for deletes of whole rows. For anything narrower -- deleteColumn, deleteColumns or deleteFamily -- you need to use the method overloads that take a timestamp; the constructor timestamp is not referenced.
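The timestamp rules above can be illustrated with a toy in-memory model of a single row. This is only a sketch of the documented semantics (deleteColumns removes all versions up to and including a timestamp; deleteColumn removes exactly one version, defaulting to the newest); it is not the real org.apache.hadoop.hbase.client.Delete API.

```python
# Toy model of HBase delete semantics for one row.
# Cells are keyed by (family, qualifier), each holding {timestamp: value}.

def delete_columns(cells, family, qualifier, ts):
    """Like Delete.deleteColumns: drop ALL versions with timestamp <= ts."""
    versions = cells.get((family, qualifier), {})
    cells[(family, qualifier)] = {t: v for t, v in versions.items() if t > ts}

def delete_column(cells, family, qualifier, ts=None):
    """Like Delete.deleteColumn: drop ONLY the version equal to ts,
    or the most recently added version when no timestamp is given."""
    versions = cells.get((family, qualifier), {})
    if ts is None and versions:
        ts = max(versions)  # most recent cell's timestamp
    versions.pop(ts, None)

# One row with three versions of user_article_title.
row = {("ARTICLES", "user_article_title"): {1: "v1", 2: "v2", 3: "v3"}}

delete_column(row, "ARTICLES", "user_article_title")      # removes version 3 (newest)
delete_columns(row, "ARTICLES", "user_article_title", 1)  # removes versions <= 1
print(row[("ARTICLES", "user_article_title")])            # {2: 'v2'}
```

The key distinction the question tests is separate from this: a Delete built only from a row key, with no column-level methods called, removes the whole row regardless of versions.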
Question : Given the following HBase table schema (for user articles from the QuickTechie.com website): Row Key, ArticleContent:userProfileName, ArticleContent_Altered:address, UserVersion:3, UserVersion:10. A table scan will return the column data in which sorted order?
1. Row Key, ArticleContent_Altered:address, ArticleContent:userProfileName, UserVersion:3, UserVersion:10
2. Row Key, ArticleContent_Altered:address, ArticleContent:userProfileName, UserVersion:10, UserVersion:3
3. Row Key, ArticleContent:userProfileName, ArticleContent_Altered:address, UserVersion:3, UserVersion:10
4. Row Key, ArticleContent:userProfileName, ArticleContent_Altered:address, UserVersion:10, UserVersion:3
Explanation: HBase table contents are sorted by row key, then column family, then column qualifier, and finally timestamp, and all comparisons are lexicographical. In this example there are two comparisons. First, the column family ArticleContent sorts before ArticleContent_Altered. Within a column family, data is sorted by column qualifier, so UserVersion:10 comes before UserVersion:3, because the character '1' in "10" is less than '3' in lexicographical order. All data model operations in HBase return data in sorted order: first by row, then by column family, then by column qualifier, and finally by timestamp (sorted in reverse, so the newest records are returned first). The row key is treated by HBase as an array of bytes, and the row key map keeps entries in lexicographical order. For example, the numbers 1 to 100 are ordered like this: 1, 10, 100, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, ..., 9, 91, 92, 93, 94, 95, 96, 97, 98, 99.

You can perform scans using the HBase Shell, for testing or quick queries. The following examples represent only a subset of the possibilities; issue the scan command with no parameters for more usage information.

# Display usage information
hbase> scan
# Scan all rows of table 't1'
hbase> scan 't1'
# Specify a startrow, limit the result to 10 rows, and only return selected columns
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
# Specify a timerange
hbase> scan 't1', {TIMERANGE => [1303668804, 1303668904]}
# Specify a custom filter
hbase> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
# Disable the block cache for a specific scan (experts only)
hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}
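Because HBase compares keys byte by byte, plain string sorting in any language reproduces the same order. The sketch below checks the two comparisons from the explanation (family before family, then qualifier within a family, with "10" sorting before "3"):

```python
# Demonstrate HBase's lexicographic sort for the example schema:
# compare by column family first, then by column qualifier.
columns = [
    ("ArticleContent_Altered", "address"),
    ("UserVersion", "3"),
    ("ArticleContent", "userProfileName"),
    ("UserVersion", "10"),
]
# Python's string comparison is byte-wise, like HBase's key comparison.
print(sorted(columns))
# [('ArticleContent', 'userProfileName'), ('ArticleContent_Altered', 'address'),
#  ('UserVersion', '10'), ('UserVersion', '3')]

# The same rule explains the 1..100 row-key ordering mentioned above:
print(sorted(str(i) for i in range(1, 101))[:5])  # ['1', '10', '100', '11', '12']
```

The sorted output matches option 4 in the question.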
Question : You are going to store the following data in an HBase HFile. Select the order in which the rows will be stored.
Row #  Row Key      ColumnFamily:ColumnQualifier, timestamp   Column Value
A      01012015002  Article:Metadata, timestamp=201           Ankit
B      01012015002  Article:User, timestamp=201               Baba
C      01012015001  BLOG:Title, timestamp=501                 Chitrank
D      01012015001  BLOG:Author, timestamp=501                David
E      01012015002  Article:Number, timestamp=201             Eigen
F      01012015001  BLOG:Text, timestamp=501                  Farukh
1. D, F, C, A, E, B
2. C, D, F, A, B, E
3. A, E, B, D, C, F
4. D, S, B, F, C, E
Correct Answer : 1
Explanation: All data model operations in HBase return data in sorted order: first by row, then by column family, then by column qualifier, and finally by timestamp (sorted in reverse, so the newest records are returned first). Data is stored in an HFile in lexicographical order of each of the following, in turn:
1. row key
2. column family
3. column qualifier
4. timestamp
This keeps the data in a well-defined order.
In this example, the two row keys sort as (01012015001, 01012015002). The cells with row key 01012015001 are all in the same column family, BLOG, so their column qualifiers are stored in the order [Author, Text, Title], which is D, F, C. The cells with row key 01012015002 are all in the same column family, Article, so their column qualifiers are stored in the order [Metadata, Number, User], which is A, E, B. The final order is therefore D, F, C, A, E, B.

There is no store of column metadata outside of the internal KeyValue instances for a column family. Thus, while HBase can support not only a large number of columns per row but also a heterogeneous set of columns between rows, it is your responsibility to keep track of the column names. The only way to get a complete set of columns that exist for a column family is to process all the rows.

For more information about how HBase stores data internally, see the HBase chapter of Tom White's book Hadoop: The Definitive Guide (O'Reilly). It includes an optimization note warning about a phenomenon where an import process walks in lock-step with all clients, in concert pounding one of the table's regions (and thus a single node), then moving on to the next region, and so on. With monotonically increasing row keys (e.g., a timestamp), this will happen. See the comic by IKai Lan on why monotonically increasing row keys are problematic in BigTable-like datastores: monotonically increasing values are bad. The pile-up on a single region caused by monotonically increasing keys can be mitigated by randomizing the input records so they are not in sorted order, but in general it is best to avoid using a timestamp or a sequence (e.g. 1, 2, 3) as the row key.
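The derivation of the final order can be checked mechanically: sort the six cells by (row key, column family, column qualifier) as strings, exactly as HBase does byte-wise.

```python
# Reproduce the HFile ordering for the six example cells.
# Each label maps to (row key, column family, column qualifier).
cells = {
    "A": ("01012015002", "Article", "Metadata"),
    "B": ("01012015002", "Article", "User"),
    "C": ("01012015001", "BLOG", "Title"),
    "D": ("01012015001", "BLOG", "Author"),
    "E": ("01012015002", "Article", "Number"),
    "F": ("01012015001", "BLOG", "Text"),
}

# Sort labels by their (row key, family, qualifier) triple.
order = sorted(cells, key=lambda label: cells[label])
print(order)  # ['D', 'F', 'C', 'A', 'E', 'B']
```

This confirms option 1, D, F, C, A, E, B.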
If you do need to upload time-series data into HBase, you should study OpenTSDB as a successful example. It has a page describing the schema it uses in HBase. The key format in OpenTSDB is effectively [metric_type][event_timestamp], which at first glance would appear to contradict the previous advice about not using a timestamp as the key. However, the difference is that the timestamp is not in the lead position of the key, and the design assumes there are dozens or hundreds (or more) of different metric types. Thus, even with a continual stream of input data containing a mix of metric types, the Puts are distributed across various regions of the table.
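The effect of leading with the metric type can be sketched with made-up metric names (the names and the ":"-joined key layout below are illustrative, not OpenTSDB's actual byte encoding):

```python
# Sketch of an OpenTSDB-style key layout: [metric_type][event_timestamp].
# Metric names here are invented for illustration.
metrics = ["cpu.user", "mem.free", "net.bytes"]
timestamps = [1303668804, 1303668805]

# Build every key and sort them as HBase would store them.
keys = sorted(f"{m}:{t}" for m in metrics for t in timestamps)
print(keys)
# Keys cluster by metric prefix, so a mixed stream of metrics spreads
# writes across the keyspace instead of piling onto the newest region,
# even though the timestamps themselves increase monotonically.
```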