
Cloudera HBase Certification Questions and Answers (Dumps and Practice Questions)



Question : You want to store clickstream data in HBase. Your data consists of the following:
the source id
the name of the cluster
the url of the click
the datetimestamp for each click
Which rowkey should you use if you want to retrieve the source ids with a scan and sorted with the most recent first?
1. (source_id)(Long.MAX_VALUE - (Long)datetimestamp)
2. ((Long)datetimestamp)(source_id)

4. (source_id)(datetimestamp)(Long.MAX_VALUE)


Correct Answer : 1

Explanation: One design consideration for your rowkey is the table's access pattern. In this scenario, the access pattern is to retrieve the source ids with the most recent first. HBase stores rows in sorted order. With a rowkey that ends in a reverse timestamp, (Long.MAX_VALUE - (long) timestamp), the latest entry for a source id sorts to the top of that source's rows and is therefore scanned first. This avoids scanning all of a source's rows to find the newest one, and it keeps the timestamp portion of the key to a compact long value.

A common problem in database processing is quickly finding the most recent version of a value. Using reverse timestamps as part of the key helps greatly with a special case of this problem. The technique, also found in the HBase chapter of Tom White's book Hadoop: The Definitive Guide (O'Reilly), involves appending (Long.MAX_VALUE - timestamp) to the end of any key, e.g., [key][reverse_timestamp]. The most recent value for [key] in a table can be found by performing a Scan for [key] and obtaining the first record. Since HBase keys are in sorted order, this key sorts before any older row-keys for [key] and thus is first.

If the most important access path is to pull the most recent events, then storing the timestamps as reverse timestamps (e.g., timestamp = Long.MAX_VALUE - timestamp) makes it possible to do a Scan on [hostname][log-event] and quickly obtain the most recently captured events.
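The ordering argument above can be checked with a small sketch that uses only the Java standard library. The class name ReverseTimestampKey, the fixed-length source id "src0001", and the sample timestamps are assumptions made for illustration; the byte layout mirrors a big-endian long, which is how HBase's Bytes.toBytes(long) encodes values.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ReverseTimestampKey {
    // Builds (source_id)(Long.MAX_VALUE - timestamp) as raw bytes.
    // Assumption: source ids have a fixed length, so ids never overlap the timestamp bytes.
    static byte[] rowKey(String sourceId, long timestampMillis) {
        byte[] id = sourceId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(id.length + Long.BYTES)
                .put(id)
                .putLong(Long.MAX_VALUE - timestampMillis) // ByteBuffer is big-endian by default
                .array();
    }

    // Unsigned lexicographic byte comparison, the order in which HBase sorts row keys.
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        byte[] older = rowKey("src0001", 1_000L);
        byte[] newer = rowKey("src0001", 2_000L);
        // The newer click has the smaller reversed value, so it sorts first.
        System.out.println(compare(newer, older) < 0); // prints true
    }
}
```

With a plain (non-reversed) timestamp in the same position, the comparison would come out the other way and the oldest click would be returned first.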




Question : Given the following HBase code:
byte [] rowKey = Bytes.toBytes(65);
Put put = new Put(rowKey);
put.add("info".getBytes(), "FirstName".getBytes(), "Kimberly".getBytes());
put.add("info".getBytes(), "LastName".getBytes(), "Grant".getBytes());
What does "info" represent?

1. Primary key of the row
2. Column family name
4. Column value


Correct Answer : 2

Explanation: "info" is passed as the first argument to Put.add, which is the column family name. The relevant method signatures are:

public Put add(byte[] family, byte[] qualifier, byte[] value)
Add the specified column and value to this Put operation.
Parameters:
family - family name
qualifier - column qualifier
value - column value

public Put add(byte[] family, byte[] qualifier, long ts, byte[] value)
Add the specified column and value, with the specified timestamp as its version, to this Put operation.
Parameters:
family - family name
qualifier - column qualifier
ts - version timestamp
value - column value
Returns:
this
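To make the roles of the three arguments concrete, here is a minimal stand-in for the data model, not the HBase client itself. It treats a table as nested sorted maps (row key, then family, then qualifier, then value). The class PutModel is hypothetical, versions and timestamps are left out, and the row key is simplified to the string "65" rather than the bytes produced by Bytes.toBytes(65).

```java
import java.util.TreeMap;

class PutModel {
    // row key -> column family -> column qualifier -> value
    private final TreeMap<String, TreeMap<String, TreeMap<String, String>>> rows = new TreeMap<>();

    // Same argument order as Put.add(family, qualifier, value), plus the row key.
    void add(String row, String family, String qualifier, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(qualifier, value);
    }

    // Returns the stored value, or null if any coordinate is missing.
    String get(String row, String family, String qualifier) {
        TreeMap<String, TreeMap<String, String>> families = rows.get(row);
        if (families == null) return null;
        TreeMap<String, String> columns = families.get(family);
        return columns == null ? null : columns.get(qualifier);
    }

    public static void main(String[] args) {
        PutModel table = new PutModel();
        table.add("65", "info", "FirstName", "Kimberly");
        table.add("65", "info", "LastName", "Grant");
        System.out.println(table.rows); // prints {65={info={FirstName=Kimberly, LastName=Grant}}}
    }
}
```

In this picture "info" is the family that groups both columns, "FirstName" and "LastName" are qualifiers inside it, and "Kimberly" and "Grant" are the values.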




Question : Given the following HBase code:
byte [] rowKey = Bytes.toBytes(65);
Put put = new Put(rowKey);
put.add("info".getBytes(), "FirstName".getBytes(), "Kimberly".getBytes());
put.add("info".getBytes(), "LastName".getBytes(), "Grant".getBytes());
What does "FirstName" represent?

1. Primary key of the row
2. Column family name
4. Column value



Explanation: "FirstName" is passed as the second argument to Put.add, which is the column qualifier. The relevant method signatures are:

public Put add(byte[] family, byte[] qualifier, byte[] value)
Add the specified column and value to this Put operation.
Parameters:
family - family name
qualifier - column qualifier
value - column value

public Put add(byte[] family, byte[] qualifier, long ts, byte[] value)
Add the specified column and value, with the specified timestamp as its version, to this Put operation.
Parameters:
family - family name
qualifier - column qualifier
ts - version timestamp
value - column value
Returns:
this


Related Questions


Question : You have an AcmeLog table in HBase. The RowKeys are numbers.
You want to retrieve all entries that have row key 100.
Which shell command should you use?
1. get 'AcmeLog', (FILTER ='100')
2. get 'AcmeLog', '100'

4. scan 'AcmeLog', '100'




Question : You have an AcmeUsers table in HBase and you would like to insert a row that consists
of an AcmeID, jayesh2014, and an email address, john@acmeshell.com. The table has a single Column Family
named Meta and the row key will be the AcmeID. Which command helps in this case?
1. put 'AcmeUsers', 'jayesh2014', 'john@acmeshell.com'

2. put 'AcmeUsers', 'Meta:AcmeID', 'jayesh2014', 'Email, 'john@acmeshell.com'


4. put 'AcmeUsers', 'AcmeID:jayesh2014', 'Email:john@acmeshell.com'




Question : You are storing page view data for a large number of Web sites, each of which has
many subdomains (www.acmeshell.com, archive.acmeshell.com, beta.acmeshell.com, etc.). Your reporting tool needs
to retrieve the total number of page views for a given subdomain of a Web site. Which of the following rowkeys should you use?
1. The domain name followed by the URL

2. The URL followed by the reverse domain name


4. The URL




Question : You have network servers producing timeseries data from network traffic logs.
You want to attain high write throughput for storing this data in an HBase table.
Which of these should you choose for a row key to maximize your write throughput?
1. (hashCode(centralServerGeneratedSequenceID))(timestamp)
2. (timestamp)


4. (Long.MAX_VALUE - timestamp)
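Keys that start with a steadily increasing (or steadily decreasing) timestamp all land next to each other in the sorted key space, so consecutive writes pile onto the same part of the table. A hash-derived prefix, as in option 1, scatters consecutive writes instead. The following is only a sketch of that idea: the class SaltedRowKey, the bucket count of 16, and the string key layout are assumptions for illustration and are not taken from the question.

```java
class SaltedRowKey {
    // Hypothetical number of prefix buckets; a real table would tune this.
    static final int BUCKETS = 16;

    // Maps a sequence id to a bucket in [0, BUCKETS).
    static int bucket(long sequenceId) {
        return Math.floorMod(Long.hashCode(sequenceId), BUCKETS);
    }

    // Builds "<bucket>-<timestamp>" with zero padding so string order matches numeric order.
    // Assumption: timestamps are non-negative.
    static String rowKey(long sequenceId, long timestamp) {
        return String.format("%02d-%019d", bucket(sequenceId), timestamp);
    }

    public static void main(String[] args) {
        // Consecutive events get different prefixes even though the timestamps are adjacent.
        for (long seq = 1; seq <= 4; seq++) {
            System.out.println(rowKey(seq, 1_700_000_000_000L + seq));
        }
    }
}
```

The trade-off of any prefix like this is that reading a time range back requires one scan per bucket rather than a single contiguous scan.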




Question : You have several tables in an RDBMS that are frequently joined to fetch data, and you want to migrate these tables to HBase.
Please select the correct statement from below.
1. Create all the tables each with multiple column families in HBase
2. Create a single table with as many column families as tables
3. ... for all the tables
4. Any of the above will be fine




Question : You have a table with the following rowkeys based on the date:
21012010, 22012010, 23012010, 21052010, 28012010, 24012010, 29012010
In which order will these rows be retrieved from a scan?
1. 21012010, 22012010, 23012010, 21052010, 28012010, 24012010, 29012010
2. 21012010, 21052010, 22012010, 23012010, 24012010, 28012010, 29012010
4. It could be in any random order
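Because HBase returns rows from a scan in sorted rowkey order, the result can be reproduced by sorting the keys byte by byte. The sketch below assumes the rowkeys are stored as ASCII strings (the question does not say how they are encoded) and uses a hypothetical class name, ScanOrder. For ASCII digits, Java's natural String ordering matches the byte order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class ScanOrder {
    // Returns the keys in the order a full-table scan would visit them,
    // assuming string-encoded row keys compared lexicographically.
    static List<String> scanOrder(List<String> rowKeys) {
        return new ArrayList<>(new TreeSet<>(rowKeys));
    }

    public static void main(String[] args) {
        List<String> inserted = Arrays.asList(
                "21012010", "22012010", "23012010", "21052010",
                "28012010", "24012010", "29012010");
        System.out.println(scanOrder(inserted));
        // prints [21012010, 21052010, 22012010, 23012010, 24012010, 28012010, 29012010]
    }
}
```

Note that this order is lexicographic, not chronological: 21052010 (21 May) sorts before 22012010 (22 January). A year-first layout such as yyyyMMdd would make the sorted order match the date order.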