
Cloudera HBase Certification Questions and Answers (Dumps and Practice Questions)



Question : The QuickTechie.com website provides a feature whereby any software professional can create an article as well as update and delete it.
You decided to use HBase rather than HDFS to store these articles. What would be the reason you preferred HBase over HDFS?



1. Fault tolerance
2. Batch processing
3. Random writes
4. Even Distribution of Data.



Correct Answer : 3
Explanation: Apache HBase provides random, realtime read/write access to your data; HDFS does not allow random writes.
HDFS is built for scalability, fault tolerance, and batch processing. It is a distributed file system that is well suited to the storage of large files.
Its documentation states that it is not, however, a general-purpose file system, and that it does not provide fast individual record lookups in files.
HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables.
This can sometimes be a point of conceptual confusion: HBase internally stores your data in indexed "StoreFiles" that live on HDFS for high-speed lookups.
Features of HBase:
Strongly consistent reads/writes: HBase is not an "eventually consistent" data store. This makes it very suitable for tasks such as high-speed counter aggregation.
Automatic sharding: HBase tables are distributed on the cluster via regions, and regions are automatically split and re-distributed as your data grows.
Automatic RegionServer failover.
Hadoop/HDFS Integration: HBase supports HDFS out of the box as its distributed file system.
MapReduce: HBase supports massively parallelized processing via MapReduce for using HBase as both source and sink.
Java Client API: HBase supports an easy to use Java API for programmatic access.
Thrift/REST API: HBase also supports Thrift and REST for non-Java front-ends.
Block Cache and Bloom Filters: HBase supports a Block Cache and Bloom Filters for high volume query optimization.
Operational Management: HBase provides built-in web pages for operational insight as well as JMX metrics.
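
As a rough sketch of what a random write looks like in practice, the snippet below uses the standard HBase Java client API to put a single cell. The table ARTICLES, column family PROFILE_ARTICLES and qualifier Author are borrowed from the create-table question later on this page, while the row key and value are made up for illustration, and the connection is assumed to be configured via hbase-site.xml on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ArticleRandomWrite {
    public static void main(String[] args) throws Exception {
        // Configuration is read from hbase-site.xml on the classpath (assumption).
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("ARTICLES"))) {
            // A single write (or overwrite) lands directly at its row key --
            // the random, in-place write that HDFS alone does not offer.
            Put put = new Put(Bytes.toBytes("article-7654321"));          // row key (illustrative)
            put.addColumn(Bytes.toBytes("PROFILE_ARTICLES"),              // column family
                          Bytes.toBytes("Author"),                        // column qualifier
                          Bytes.toBytes("QuickTechie user"));             // value (illustrative)
            table.put(put);
        }
    }
}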





Question : All the software professionals who are subscribers at QuickTechie.com have created their profiles, and as an administrator you also store the joining date
of each profile. The full history of all users and their profiles is stored in HBase for further analysis. Now one of the data scientists wants to fire an ad-hoc
query to fetch the joining date of a bad profile that is publishing adult content on the website. In order to fetch the data from a cell (Joining Date),
you need to supply HBase with which of the following?
1. A row key, column family and column qualifier
2. A row key, column qualifier and version

3. A column key
4. A column key and column qualifier




Correct Answer : 1


Explanation: An HBase table maintains maps of keys to values (key -> value). Each of these mappings is called a KeyValue or a cell. Each cell is identified by the map
(rowkey, columnFamily, columnQualifier, timestamp) -> value. When you do not specify the timestamp, i.e. (rowkey, columnFamily, columnQualifier) -> value,
a Get will retrieve only the current (latest) version of the cell.

" HBase is a key/value store. Specifically it is a
Sparse, Consistent, Distributed, Multidimensional, Sorted map.
" Map
HBase maintains maps of Keys to Values (key -> value). Each of these mappings is called a "KeyValue" or a "Cell". You can find a value by its key... That's it.
" Sorted
These cells are sorted by the key. This is a very important property as it allows for searching ("give me all values for which the key is between X and Y"), rather than just retrieving a value for a known key.
" Multidimensional
The key itself has structure. Each key consists of the following parts:
row-key, column family, column, and time-stamp.
So the mapping is actually:
(rowkey, column family, column, timestamp) -> value
rowkey and value are just bytes (column family needs to be printable), so you can store anything that you can serialize into a byte[] into a cell.
" Sparse
This follows from the fact the HBase stores key -> value mappings and that a "row" is nothing more than a grouping of these mappings (identified by the rowkey mentioned above).
Unlike NULL in most relational databases, no storage is needed for absent information, there will be just no cell for a column that does not have any value.
It also means that every value carries all its coordinates with it.

Distributed
One key feature of HBase is that the data can be spread over 100s or 1000s of machines and reach billions of cells. HBase manages the load balancing automatically.

Consistent
HBase makes two guarantees:
All changes with the same rowkey (see Multidimensional above) are atomic. A reader will always read the last written (and committed) values.
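
To make the cell coordinates concrete, here is a minimal sketch with the HBase Java client API that fetches a "joining date" cell for one profile. The table name PROFILES, column family INFO, qualifier JOINING_DATE and the row key are hypothetical names chosen for illustration; only the Get pattern (row key + column family + column qualifier) is the point.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class FetchJoiningDate {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("PROFILES"))) {
            // Row key identifies the profile; column family + qualifier identify the cell.
            Get get = new Get(Bytes.toBytes("user-12345"));
            get.addColumn(Bytes.toBytes("INFO"), Bytes.toBytes("JOINING_DATE"));
            Result result = table.get(get);
            // No timestamp supplied, so only the latest version of the cell is returned.
            byte[] joiningDate = result.getValue(Bytes.toBytes("INFO"), Bytes.toBytes("JOINING_DATE"));
            System.out.println(Bytes.toString(joiningDate));
        }
    }
}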





Question : The QuickTechie.com website provides a feature whereby any software professional can create an article as well as update and delete it.
You decided to use HBase rather than HDFS to store these articles. You need to create an ARTICLES table in HBase. The table will consist of one column family called
PROFILE_ARTICLES and two column qualifiers, Author and Comment. Select the correct command which will create this table:
1. create 'ARTICLES', {NAME => 'Author', NAME =>'Comment'}

2. create 'ARTICLES', 'PROFILE_ARTICLES:Author', 'PROFILE_ARTICLES:Comment'

3. create 'ARTICLES', 'PROFILE_ARTICLES' {NAME => 'Author', NAME => 'Comment'}

4. create 'ARTICLES', 'PROFILE_ARTICLES'


Correct Answer : 4
Explanation: When you create an HBase table, you need to specify the table name and at least one column family name. For this example: table name 'ARTICLES', column family 'PROFILE_ARTICLES'. In the HBase shell, use create to create the table by passing it a name and the column family, then verify it with the describe command.
hbase> create 'ARTICLES', 'PROFILE_ARTICLES'
create: Create a table; pass the table name, a dictionary of specifications per column family, and optionally a dictionary of table configuration. Dictionaries are described in the shell help's GENERAL NOTES section. Examples:
hbase> create 't1', {NAME => 'f1', VERSIONS => 5}
hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
hbase> # The above in shorthand would be the following:
hbase> create 't1', 'f1', 'f2', 'f3'
hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
Table management commands:
alter: Alter column family schema; pass the table name and a dictionary specifying the new column family schema. Dictionaries are described in the main help command output. The dictionary must include the name of the column family to alter. For example, to change or add the 'f1' column family in table 't1' to keep a maximum of 5 cell VERSIONS, do:
hbase> alter 't1', NAME => 'f1', VERSIONS => 5
You can operate on several column families:
hbase> alter 't1', 'f1', {NAME => 'f2', IN_MEMORY => true}, {NAME => 'f3', VERSIONS => 5}
To delete the 'f1' column family in table 't1', use one of:
hbase> alter 't1', NAME => 'f1', METHOD => 'delete'
hbase> alter 't1', 'delete' => 'f1'
You can also change table-scope attributes like MAX_FILESIZE, READONLY, MEMSTORE_FLUSHSIZE, DEFERRED_LOG_FLUSH, etc. These can be put at the end; for example, to change the max size of a region to 128MB, do:
hbase> alter 't1', MAX_FILESIZE => '134217728'
You can add a table coprocessor by setting a table coprocessor attribute:
hbase> alter 't1', 'coprocessor'=>'hdfs:///foo.jar|com.foo.FooRegionObserver|1001|arg1=1,arg2=2'
Since you can have multiple coprocessors configured for a table, a sequence number will be automatically appended to the attribute name to uniquely identify it. The coprocessor attribute must match the pattern below in order for the framework to understand how to load the coprocessor classes:
[coprocessor jar file location] | class name | [priority] | [arguments]
You can also set configuration settings specific to this table or column family:
hbase> alter 't1', CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}
hbase> alter 't1', {NAME => 'f2', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}
You can also remove a table-scope attribute:
hbase> alter 't1', METHOD => 'table_att_unset', NAME => 'MAX_FILESIZE'
hbase> alter 't1', METHOD => 'table_att_unset', NAME => 'coprocessor$1'
There can be more than one alteration in one command:
hbase> alter 't1', { NAME => 'f1', VERSIONS => 3 }, { MAX_FILESIZE => '134217728' }, { METHOD => 'delete', NAME => 'f2' }, OWNER => 'johndoe', METADATA => { 'mykey' => 'myvalue' }
create: Create a table; pass the table name, a dictionary of specifications per column family, and optionally a dictionary of table configuration.
hbase> create 't1', {NAME => 'f1', VERSIONS => 5}
hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
hbase> # The above in shorthand would be the following:
hbase> create 't1', 'f1', 'f2', 'f3'
hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
hbase> create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}
Table configuration options can be put at the end.
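
The same table can also be created programmatically. Below is a minimal sketch assuming the HBase 2.x Java Admin API; the class name is illustrative, and the shell one-liner in the correct answer remains the simplest form.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class CreateArticlesTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            // Equivalent of: hbase> create 'ARTICLES', 'PROFILE_ARTICLES'
            // Only the table name and at least one column family are required;
            // column qualifiers are not part of the schema.
            TableDescriptorBuilder table =
                TableDescriptorBuilder.newBuilder(TableName.valueOf("ARTICLES"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("PROFILE_ARTICLES"));
            admin.createTable(table.build());
        }
    }
}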


Related Questions


Question : When the column families are decided during schema design, which statement correctly applies?
1. Flushing and Compaction are done on a per-Region basis
2. The more Column Families, the greater the I/O load
3. Access Mostly Uused Products by 50000+ Subscribers
4. All 1,2 and 3 are correct
5. Only 1 and 2 are correct




Question : The QuickTechie.com website provides a feature whereby any software professional can create an article as well as update and delete it.
You decided to use HBase rather than HDFS to store these articles, and you have created an ARTICLES table in HBase to store all versions of the articles.
After one year you find that there is no use in keeping versions older than 10 (by version number), so you fire a query to delete the older versions.
You immediately ran a scan query on the same table and ran the same scan query again the next day; you found a huge improvement in the scan operation on
the next day. What could be the reason?

1. HBase ran a major compaction

2. HBase rebalanced the indexes on the table
3. Access Mostly Uused Products by 50000+ Subscribers
4. HBase ran a minor compaction





Question : All the articles stored in the ARTICLES table use the article_id as the row key; the following article ids have been
created using a random number generator. Now you want to find out whether article number 7654321 exists or not, and a Bloom filter is enabled to give
better performance on lookups. What would be returned by the Bloom filter?
Row Keys : 5672343 , 0987543 , 2345895 , 1473578, 7654321, 907654


1. The hash value of 7654321
2. It will be confirmed that article_id 7654321 is not available.
3. Access Mostly Uused Products by 50000+ Subscribers
4. None of the above




Question : You have the following data with a row key (10):
10 Engineers:Name, timestamp=1331314762083, value=Stole
10 Engineers:Salary, timestamp=1331314762085, value=8000
10 Skills:Skill_1, timestamp=1331314762088, value=J2ee
10 Skills:Skill_2, timestamp=1331314762091, value=Java

How many regions will be read during a scan of the entire data set?
1. One
2. Two
3. Access Mostly Uused Products by 50000+ Subscribers
4. Four




Question : The QuickTechie.com website provides a feature whereby any software professional can create an article as well as update and delete it.
You decided to use HBase rather than HDFS to store these articles, and you have created an ARTICLES table in HBase to store all versions of the articles.
Select the Column Family attribute settings which will always retain at least one version of an article but expire all other versions older than 1 month (30 days) for a given Column Family.
1. LENGTH = 30, MIN_VERSIONS = 1
2. TTL = 30, VERSIONS = 1

3. Access Mostly Uused Products by 50000+ Subscribers

4. TTL = 2592000 , MIN_VERSIONS = 1





Question : Given that the following is your entire data set:

23 column=Engineers:FirstName, timestamp=1331314762084, value=Shobhit
23 column=Engineers:Payment, timestamp=1331314762086, value=800000
23 column=TechnicalSkills:1_FirstSkill, timestamp=1331314762089, value=J2EE
23 column=TechnicalSkills:2_AnotherSkill, timestamp=1331314762092, value=Java

How many sets of physical files will be read during a scan of the entire data set immediately following a major compaction?
1. One
2. Two
3. Access Mostly Uused Products by 50000+ Subscribers
4. Four