
MapR (HP) HBase Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : You have an AcmeLog table in HBase. The row keys are numbers.
You want to retrieve all entries that have row key 100.
Which shell command should you use?
1. get 'AcmeLog', (FILTER ='100')
2. get 'AcmeLog', '100'

3. Access Mostly Uused Products by 50000+ Subscribers
4. scan 'AcmeLog', '100'



Correct Answer : 2

Explanation: HBase gives you two classes to read data: Get and Scan. The Get class reads data for a single, specified row key, while the Scan class supports range scans. In the HBase shell, a get
operation acts on a single row. To retrieve everything for a row, simply execute a get with the table name and the row key.

Further Reading
The HBase Shell wiki includes a section on get which states:
Get row or cell contents; pass table name, row, and optionally a dictionary of column(s), timestamp and versions. Examples:

hbase> get 't1', 'r1'
hbase> get 't1', 'r1', {COLUMN => 'c1'}
hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
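The get/scan distinction above can be modeled in a few lines of plain Python (a toy sketch only, not the HBase client API; the table contents below are made up):

```python
# Toy model of HBase get vs. scan semantics in plain Python.
# NOTE: illustration only, not the HBase API; rows/values are made up.
from collections import OrderedDict

# HBase keeps rows sorted lexicographically by row key.
table = OrderedDict(sorted({
    '100': {'info:level': 'WARN'},
    '101': {'info:level': 'INFO'},
    '200': {'info:level': 'ERROR'},
}.items()))

def get(table, rowkey):
    """Like the shell's `get 't1', 'r1'`: fetch exactly one row by key."""
    return table.get(rowkey)

def scan(table, startrow=None, stoprow=None):
    """Like the shell's `scan`: return all rows in [startrow, stoprow)."""
    return {k: v for k, v in table.items()
            if (startrow is None or k >= startrow)
            and (stoprow is None or k < stoprow)}
```

As in the question, `get(table, '100')` touches only the single row with key '100', whereas a scan walks a whole key range.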




Question : You have an AcmeUsers table in HBase and you would like to insert a row that consists
of an AcmeID, jayesh2014, and an email address, john@acmeshell.com. The table has a single column family
named Meta, and the row key will be the Acme ID. Which command helps in this case?
1. put 'AcmeUsers', 'jayesh2014', 'john@acmeshell.com'

2. put 'AcmeUsers', 'Meta:AcmeID', 'jayesh2014', 'Email, 'john@acmeshell.com'

3. Access Mostly Uused Products by 50000+ Subscribers

4. put 'AcmeUsers', 'AcmeID:jayesh2014', 'Email:john@acmeshell.com'



Correct Answer : 3

Explanation: In the HBase Shell, you can type put commands to insert a row. put takes 'tableName', 'rowKey', 'columnFamily:columnQualifier', 'value', and an optional timestamp.
Put a cell 'value' at the specified table/row/column and optionally timestamp coordinates. To put a cell value into table 't1' at row 'r1' under column 'c1' marked with the time 'ts1',
do:
hbase> put 't1', 'r1', 'c1', 'value', ts1
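The argument order of put can likewise be sketched as a toy Python model (not the HBase API; the 'Meta:Email' qualifier is an assumption based on the question):

```python
# Toy sketch of the shell `put` argument order (plain Python, not the
# HBase API): put 'tableName', 'rowKey', 'family:qualifier', 'value'.
from collections import defaultdict

store = defaultdict(dict)  # rowkey -> {'family:qualifier': value}

def put(store, rowkey, column, value):
    """Write a single cell; `column` must be 'family:qualifier'."""
    family, sep, qualifier = column.partition(':')
    if not (family and sep and qualifier):
        raise ValueError("column must be of the form 'family:qualifier'")
    store[rowkey][column] = value

# Mirrors: put 'AcmeUsers', 'jayesh2014', 'Meta:Email', 'john@acmeshell.com'
put(store, 'jayesh2014', 'Meta:Email', 'john@acmeshell.com')
```

Note that the column must name both the column family and the qualifier; a bare row key or a bare family is not enough to address a cell.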





Question : You are storing page view data for a large number of Web sites, each of which has
many subdomains (www.acmeshell.com, archive.acmeshell.com, beta.acmeshell.com, etc.). Your reporting tool needs
to retrieve the total number of page views for a given subdomain of a Web site. Which of the following row keys should you use?
1. The domain name followed by the URL

2. The URL followed by the reverse domain name

3. Access Mostly Uused Products by 50000+ Subscribers

4. The URL

5. The URL, including the http:// prefix


Correct Answer : 3
Explanation: HBase will normally split a region in two at its midpoint when it reaches hbase.hregion.max.filesize (depending on the split policy). You can rely on
automatic splitting, but you'll end up with odd, lexically uneven split points because of the nature of your row keys (lots of "com" domains against few "org" domains). It may not be
your exact case, but think of this potential issue:
Starting with an empty table with just one region, you insert 145M domains sequentially, starting from com.. and ending in org..
At the 80-million mark (a fictitious com.nnnn.www), the region automatically splits in two at "com.f*", resulting in two 40M-row regions, and writes continue into region 2.
At the 120-million mark (a fictitious com.yyyy.www), the second region reaches the max file size and splits in two at "com.p*" into two 40M-row regions, and writes continue into region 3.
The job ends at the 145M domains; no more splits are performed.
Given this case, regions 1 and 2 will store 40M rows each, but region 3 will store 65M rows (it would split at 80M, but it may never reach that amount). Also, since
you always write to the last region (even with batching enabled), the job will be a lot slower than issuing batches of writes to multiple regions at the same time. Another
problem: imagine you realize you also need to add .us domains (10M). Given this design they will all go to region 3, increasing its row count to 75M. The common approach
to ensure an even distribution of keys among regions is to prepend to the row key a few characters of the MD5 of the key (in this case the domain name). In HBase, the very first bytes of the
row key determine the region that will host it. Just prepending a few characters of the MD5 is enough to prevent hotspotting (one region getting too many
writes) as much as possible and to get good automatic splits, but it's generally recommended to pre-split tables to ensure even better splitting. If you prepend 2 characters of the MD5 to your row keys, you
can pre-split the table with 15 split points: "10", "20", "30" ... up to "f0". That will create 16 regions, and if any of them needs to be split automatically, it will be done
at its midpoint, i.e. when the region starting at "a0" and ending at "af" reaches hbase.hregion.max.filesize, it will be split at about "a8", and each of the resulting regions will store
half of the "a" bucket.
This is an example of which regions would host each row if you have 16 pre-split regions with 2-character-prefixed row keys:
- Region 1 ---------
0b|com.example4.www
- Region 2 ---------
10|com.example.www
1b|org.example.www
- Region 5 ---------
56|com.example3.www
Given a lot more domains, the distribution would end up much more even, and almost all regions would store the same number of domains. In most cases, having 8-16 pre-split regions will be more
than enough, but if not, you can go for 32 or even 64 pre-split regions, up to a maximum of 256 (that would be having "01", "02", "03" ... "9f", "a0", "a1" ... up to "ff"). The data
access pattern here is to retrieve the total number of page views for a given subdomain of a Web site. It is best to store the subdomain data clustered together, because HBase is really
good at scanning clustered data. If you store the data with the reverse domain name, the same subdomain's data will be clustered together, and you can efficiently calculate the total number of
page views across the subdomains.
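The salting recipe above can be sketched in a few lines of Python (a toy model; the function names and the '|' separator are illustrative, and the region numbering assumes the 16 pre-split regions described in the text):

```python
# Sketch of the salting scheme described above: reverse the domain so
# subdomain data clusters together, then prepend 2 hex chars of the MD5
# of the key so writes spread evenly across pre-split regions.
import hashlib

# 15 split points "10", "20", ..., "f0" -> 16 regions.
SPLIT_POINTS = ['%x0' % i for i in range(1, 16)]

def salted_rowkey(domain):
    """e.g. 'www.example.com' -> '<2 md5 hex chars>|com.example.www'."""
    reversed_domain = '.'.join(reversed(domain.split('.')))
    prefix = hashlib.md5(reversed_domain.encode('utf-8')).hexdigest()[:2]
    return '%s|%s' % (prefix, reversed_domain)

def region_for(rowkey, split_points=SPLIT_POINTS):
    """1-based number of the pre-split region whose key range holds rowkey."""
    return 1 + sum(1 for point in split_points if rowkey >= point)
```

Because the salt is derived from the key itself, reads remain possible: to fetch a known domain you recompute the same 2-character prefix before issuing the get.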


Related Questions


Question : You need to insert a cell with a specific timestamp (version). Which of the following is true?
1. The timestamp for the entire row must be updated to 13353903160532
2. The Put class allows setting a cell specific timestamp
3. Access Mostly Uused Products by 50000+ Subscribers
4. The HTable class allows you to temporarily roll back the newer versions of the cell




Question : Select the correct statements

A. The .META. table holds the list of all user-space regions
B. Entries in the .META. table are keyed by region name
C. Access Mostly Uused Products by 50000+ Subscribers
D. A region name is made up of the first row value of that region

1. A, B, D
2. A, B, C
3. Access Mostly Uused Products by 50000+ Subscribers
4. A, B, C, D





Question : A region name is made up of

1. Table name
2. Region Start row
3. Access Mostly Uused Products by 50000+ Subscribers
4. MD5 hash of table name, start row and creation timestamp
5. ALL of the above




Question : Which HBase client interface can you use so that you have the functionality needed to store and retrieve data from HBase, as well as delete obsolete values?

1. HMasterInterface
2. HTable
3. Access Mostly Uused Products by 50000+ Subscribers
4. HTablePool




Question : Which statement(s) is/are correct?

A. All operations that mutate data are guaranteed to be atomic on a per-row basis
B. A reading client will not be affected by another client updating a particular row
C. Many clients can update the same row at the same time
1. A,C correct
2. A,B correct
3. Access Mostly Uused Products by 50000+ Subscribers
4. All A,B,C are correct





Question : Select the correct statements

A. Create HTable instances only once, usually when your application starts
B. Create a separate HTable instance for every thread you execute, or use HTablePool
C. Updates are atomic on a per-row basis

1. Only A,B correct
2. Only B,C correct
3. Access Mostly Uused Products by 50000+ Subscribers
4. All A,B,C are correct