Question : You want to store clickstream data in HBase. Your data consists of the following: the source id, the name of the cluster, the URL of the click, and the datetimestamp for each click. Which rowkey should you use if you want to retrieve the source ids with a scan, sorted with the most recent first?
1. (source_id)(Long.MAX_VALUE - (Long)datetimestamp)
2. ((Long)datetimestamp)(source_id)
Explanation: Option 1 is the rowkey to use. One of the design considerations for your rowkey is the access pattern of the table. In this scenario, the access pattern is to retrieve the source ids with the most recent first. HBase stores rows in sorted rowkey order. With a rowkey that ends in a reverse timestamp, (Long.MAX_VALUE - (long) timestamp), the latest click for a source id sorts to the top of that source id's rows and is therefore scanned first. This avoids having to scan the entire table to find the newest entries.
A common problem in database processing is quickly finding the most recent version of a value. Using reverse timestamps as part of the key can help greatly with a special case of this problem. The technique, also found in the HBase chapter of Tom White's book Hadoop: The Definitive Guide (O'Reilly), involves appending (Long.MAX_VALUE - timestamp) to the end of any key, e.g., [key][reverse_timestamp]. The most recent value for [key] in a table can be found by performing a Scan for [key] and obtaining the first record. Since HBase keys are in sorted order, this key sorts before any older rowkeys for [key] and thus is first. If the most important access path is to pull the most recent events, then storing the timestamps as reverse timestamps (e.g., timestamp = Long.MAX_VALUE - timestamp) makes it possible to do a Scan on [hostname][log-event] and quickly obtain the most recently captured events.
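The [key][reverse_timestamp] layout described above can be sketched in plain Java, without the HBase client, so that it runs on its own. The class name ReverseTimestampKey, the helper rowKey, and the sample id "src42" are hypothetical. The sketch assumes non-negative epoch-millisecond timestamps and compares keys as unsigned bytes, which is the lexicographic order HBase uses for rowkeys.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ReverseTimestampKey {
    // Builds [source_id][Long.MAX_VALUE - timestamp] as raw bytes.
    // ByteBuffer writes the long big-endian, so for non-negative values
    // byte order matches numeric order.
    static byte[] rowKey(String sourceId, long timestampMillis) {
        byte[] id = sourceId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(id.length + Long.BYTES)
                .put(id)
                .putLong(Long.MAX_VALUE - timestampMillis)
                .array();
    }

    public static void main(String[] args) {
        byte[] older = rowKey("src42", 1_000L); // hypothetical older click
        byte[] newer = rowKey("src42", 2_000L); // hypothetical newer click
        // A negative result means the newer click sorts first.
        System.out.println(Arrays.compareUnsigned(newer, older) < 0);
    }
}
```

Because the source id is the key prefix, the ordering holds within one source id. A real table would likely need fixed-width or delimited source ids so that one id is never a byte prefix of another.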
Question : Given the following HBase code:
byte[] rowKey = Bytes.toBytes(65);
Put put = new Put(rowKey);
put.add("info".getBytes(), "FirstName".getBytes(), "Kimberly".getBytes());
put.add("info".getBytes(), "LastName".getBytes(), "Grant".getBytes());
What does "info" represent?
Explanation: "info" is the column family. It is passed as the first argument of Put.add, which the API documents as the family name:
public Put add(byte[] family, byte[] qualifier, byte[] value)
Add the specified column and value to this Put operation.
Parameters: family - family name; qualifier - column qualifier; value - column value
public Put add(byte[] family, byte[] qualifier, long ts, byte[] value)
Add the specified column and value, with the specified timestamp as its version, to this Put operation.
Parameters: family - family name; qualifier - column qualifier; ts - version timestamp; value - column value
Returns: this
Question : Given the following HBase code:
byte[] rowKey = Bytes.toBytes(65);
Put put = new Put(rowKey);
put.add("info".getBytes(), "FirstName".getBytes(), "Kimberly".getBytes());
put.add("info".getBytes(), "LastName".getBytes(), "Grant".getBytes());
What does "FirstName" represent?
Explanation: "FirstName" is the column qualifier. It is passed as the second argument of Put.add, which the API documents as the column qualifier:
public Put add(byte[] family, byte[] qualifier, byte[] value)
Add the specified column and value to this Put operation.
Parameters: family - family name; qualifier - column qualifier; value - column value
public Put add(byte[] family, byte[] qualifier, long ts, byte[] value)
Add the specified column and value, with the specified timestamp as its version, to this Put operation.
Parameters: family - family name; qualifier - column qualifier; ts - version timestamp; value - column value
Returns: this
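To make the roles of the Put.add arguments concrete, here is a rough model of a table as nested sorted maps: row key, then column family, then column qualifier, then value. This is only an illustration in plain Java, not the HBase API. The class PutModel and its add and get methods are hypothetical, and the row key is simplified to the string "65" instead of the bytes produced by Bytes.toBytes(65).

```java
import java.util.Map;
import java.util.TreeMap;

class PutModel {
    // row key -> column family -> column qualifier -> value
    private final Map<String, Map<String, Map<String, String>>> rows = new TreeMap<>();

    // Mirrors the argument order of Put.add: family, qualifier, value.
    void add(String row, String family, String qualifier, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(qualifier, value);
    }

    // Returns the stored value, or null if the cell does not exist.
    String get(String row, String family, String qualifier) {
        Map<String, Map<String, String>> families = rows.get(row);
        if (families == null) {
            return null;
        }
        Map<String, String> columns = families.get(family);
        return columns == null ? null : columns.get(qualifier);
    }

    public static void main(String[] args) {
        PutModel table = new PutModel();
        table.add("65", "info", "FirstName", "Kimberly"); // family "info", qualifier "FirstName"
        table.add("65", "info", "LastName", "Grant");     // same family, different qualifier
        System.out.println(table.get("65", "info", "FirstName"));
    }
}
```

Both cells live in the same "info" family of row 65 and differ only in their qualifier, which is the distinction the two questions above are testing.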