
MapR (HP) HBase Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : While scanning the entire QuickTechie.com articles backup table stored in HBase, you found that it is not performing well and showing slowness.
You considered the block size option and increased the block size from 64KB to 512KB, assuming the ARTICLE table size is 1TB. Why does increasing the block size improve scan performance?
1. Increasing the block size reduces disk seeks in HBase, which improves scan performance.
2. Increasing the block size means fewer block index entries need to be read from disk, which improves scan performance.
3. Access Mostly Uused Products by 50000+ Subscribers
4. None of the above



Correct Answer :
Explanation: Do not turn off the block cache (you'd do it by setting hbase.block.cache.size to zero). Currently HBase does not do well if you do this, because the
regionserver will spend all its time loading hfile indices over and over again. If your working set is such that the block cache does you no good, at least size the block
cache so that hfile indices will stay in the cache (you can get a rough idea of the size you need by surveying the regionserver UIs; you'll see the index block size accounted
near the top of the webpage). The blocksize can be configured for each ColumnFamily in a table, and it defaults to 64KB. Larger cell values require larger blocksizes.
There is an inverse relationship between blocksize and the resulting StoreFile indexes (i.e., if the blocksize is doubled then the resulting indexes should be roughly halved).
As HBase reads entire blocks of data for efficient I/O usage, it retains these blocks in an in-memory cache so that subsequent reads do not need any disk operation.
In MapReduce, each block is assigned to a map task to process the contained data. This means larger block sizes equal fewer map tasks to run, as the number of mappers
is driven by the number of blocks that need processing. In HBase, values are always freighted with their coordinates; as a cell value passes through the system, it'll be
accompanied by its row, column name, and timestamp - always. If your rows and column names are large, especially compared to the size of the cell value, then you may run
up against some interesting scenarios. One such is the case described by Marc Limotte at the tail of HBASE-3551 (recommended!). Therein, the indices that are kept on HBase
storefiles (hfiles) to facilitate random access may end up occupying large chunks of the HBase-allotted RAM because the cell value coordinates are large. Marc, in the above
cited comment, suggests upping the block size so entries in the store file index happen at a larger interval, or modifying the table schema so it makes for smaller rows and
column names. Compression will also make for larger indices. See the thread "a question storefileIndexSize" on the user mailing list.
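The inverse relationship between blocksize and index size can be sketched with simple arithmetic (a rough estimate, assuming about one leaf index entry per block and ignoring key size and compression):

```java
public class IndexEstimate {
    // Rough count of store-file index entries: about one per block.
    public static long indexEntries(long tableBytes, long blockBytes) {
        return tableBytes / blockBytes;
    }

    public static void main(String[] args) {
        long oneTB = 1024L * 1024 * 1024 * 1024; // 1TB, as in the question
        // 64KB default blocks vs. the 512KB blocks chosen in the question.
        System.out.println(indexEntries(oneTB, 64 * 1024));  // 16777216 entries
        System.out.println(indexEntries(oneTB, 512 * 1024)); // 2097152 entries
    }
}
```

Going from 64KB to 512KB blocks cuts the number of index entries by 8x, which is why a full scan touches far fewer index blocks.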

Most of the time small inefficiencies don't matter all that much. Unfortunately, this is a case where they do. Whatever patterns are selected for ColumnFamilies, attributes,
and rowkeys, they could be repeated several billion times in your data. An hfile is the file format that HBase uses to store data in HDFS. It contains a multi-layered index
which allows HBase to seek to the data without having to read the whole file. The size of those indexes is a factor of the block size (64KB by default), the size of your keys,
and the amount of data you are storing. For big data sets it's not unusual to see numbers around 1GB per region server, although not all of it will be in cache because the LRU
will evict indexes that aren't used.
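As a sketch of the tuning described above (assuming the classic HBaseAdmin-style client API; the column family name "content" is hypothetical, while "ARTICLE" is the table from the question), the blocksize is set per ColumnFamily:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class BlockSizeExample {
    public static void main(String[] args) throws Exception {
        // Requires a running HBase cluster reachable via the client config.
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("ARTICLE"));
        HColumnDescriptor cf = new HColumnDescriptor("content"); // hypothetical CF
        cf.setBlocksize(512 * 1024); // 512KB blocks -> roughly 8x fewer index entries than 64KB
        desc.addFamily(cf);
        admin.createTable(desc);
        admin.close();
    }
}
```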






Question : You have written a mobile application for an advertising company called Acmeshell.com.
Your mobile application needs to retrieve non-sequential rows, keyed from 01011990 (start date) to 31121990 (end date), from a table with 1TB of mobile ads stored as rows.
What does your application need to implement to retrieve data for known row keys?
1. HTable.get(List<Get> gets)

2. Increase the Block Cache
3. Access Mostly Uused Products by 50000+ Subscribers
4. HTable.get(Get get)




Correct Answer :
Explanation: The Scan class reads entire rows, or reads data by specifying a startRow parameter defining the row key where the scan begins reading from the HBase table.
The optional stopRow parameter can be used to limit the scan to a specific row key where it should stop reading. Scan is best for retrieving a range of data sequentially.
HTable.get(Get get) extracts specific cells from a given row. HTable.get(List<Get> gets) extracts specific cells from the given rows in batch.
HTable.get is best for retrieving non-sequential data with known row keys. In this scenario, you are retrieving 200-300 non-sequential rows, so HTable.get(List<Get> gets)
is the better solution. Class HTable
Used to communicate with a single HBase table. An implementation of HTableInterface. Instances of this class can be constructed directly but it is encouraged that users get
instances via HConnection and HConnectionManager. See the HConnectionManager class comment for an example. This class is not thread safe for reads or writes.

In case of writes (Put, Delete), the underlying write buffer can be corrupted if multiple threads contend over a single HTable instance.
In case of reads, some fields used by a Scan are shared among all threads.
Instances of HTable passed the same Configuration instance will share connections to servers out on the cluster and to the zookeeper ensemble as well as caches of region locations.
This is usually a *good* thing and it is recommended to reuse the same configuration object for all your tables. This happens because they will all share the same underlying
HConnection instance. See HConnectionManager for more on how this mechanism works.
HConnection will read most of the configuration it needs from the passed Configuration on initial construction. Thereafter, for settings such as hbase.client.pause,
hbase.client.retries.number, and hbase.client.rpc.maxattempts updating their values in the passed Configuration subsequent to HConnection construction will go unnoticed.
To run with changed values, make a new HTable passing a new Configuration instance that has the new configuration.
Note that this class implements the Closeable interface. When an HTable instance is no longer required, it *should* be closed in order to ensure that the underlying resources
are promptly released. Please note that the close method can throw java.io.IOException that must be handled.
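The batched get described above can be sketched as follows (assuming the classic HTable client API; the table name "mobile_ads", the column family, and the sample row keys are hypothetical, and a running HBase cluster is required):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchGetExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mobile_ads"); // hypothetical table name
        try {
            // Build one Get per known, non-sequential row key.
            List<Get> gets = new ArrayList<Get>();
            for (String rowKey : new String[] {"01011990", "15061990", "31121990"}) {
                gets.add(new Get(Bytes.toBytes(rowKey)));
            }
            // One batched call instead of many single-row gets.
            Result[] results = table.get(gets);
            for (Result r : results) {
                System.out.println(Bytes.toString(r.getRow()));
            }
        } finally {
            table.close();
        }
    }
}
```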
Class Scan
Used to perform Scan operations. All operations are identical to Get with the exception of instantiation. Rather than specifying a single row, an optional startRow and stopRow
may be defined. If rows are not specified, the Scanner will iterate over all rows. To scan everything for each row, instantiate a Scan object.

To modify scanner caching for just this scan, use setCaching. If caching is NOT set, we will use the caching value of the hosting HTable. See HTable.setScannerCaching(int).
In addition to row caching, it is possible to specify a maximum result size, using setMaxResultSize(long). When both are used, single server requests are limited by either
number of rows or maximum result size, whichever limit comes first. To further define the scope of what to get when scanning, perform additional methods as outlined below.
To get all columns from specific families, execute addFamily for each family to retrieve.

To get specific columns, execute addColumn for each column to retrieve. To only retrieve columns within a specific range of version timestamps, execute setTimeRange.
To only retrieve columns with a specific timestamp, execute setTimestamp. To limit the number of versions of each column to be returned, execute setMaxVersions.
To limit the maximum number of values returned for each call to next(), execute setBatch. To add a filter, execute setFilter. Expert: To explicitly disable server-side
block caching for this scan, execute setCacheBlocks(boolean).
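For contrast, a range scan using the Scan options described above might look like this (a sketch, assuming the classic HTable client API; the table name "mobile_ads" and column family "ad" are hypothetical, and a running cluster is required):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanRangeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mobile_ads"); // hypothetical table name
        Scan scan = new Scan();
        scan.setStartRow(Bytes.toBytes("01011990")); // inclusive start row
        scan.setStopRow(Bytes.toBytes("31121990"));  // exclusive stop row
        scan.addFamily(Bytes.toBytes("ad"));         // hypothetical column family
        scan.setCaching(500);        // rows fetched per RPC round trip
        scan.setCacheBlocks(false);  // avoid polluting the block cache on a large scan
        ResultScanner scanner = table.getScanner(scan);
        try {
            for (Result r : scanner) {
                System.out.println(Bytes.toString(r.getRow()));
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}
```

Because a Scan walks a contiguous key range, it suits sequential access; for the non-sequential known keys in this question, the batched HTable.get remains the better fit.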





Question : You have created an HBase application called Acmeshell, and from within Acmeshell you want to create a new table named AcmeLogs.
In this AcmeLogs table you will be storing 2 billion mobile advertisements and their clickstream information. You start with the following Java code:
You have already created an HBaseAdmin object (named acmeAdmin) using the configuration, as well as an HTableDescriptor with
table name "AcmeLogs". Now you want to finally create the table using the HTableDescriptor; select the correct command.
1. HTable.createTable(acmeTable);
2. HBaseAdmin.createTable(acmeTable);
3. Access Mostly Uused Products by 50000+ Subscribers

4. acmeAdmin.createTable(acmeTable);



Correct Answer :
Explanation: The HBaseAdmin class provides an interface to manage HBase database table metadata and general administrative functions. HBaseAdmin can create,
drop, list, enable and disable tables. It can also be used to add and drop table column families. Once you create a table, the table is automatically enabled,
so you don't need to call enableTable manually. Using the HBaseAdmin class:
Configuration configuration = HBaseConfiguration.create();
HBaseAdmin acmeAdmin = new HBaseAdmin(configuration);
HTableDescriptor descriptor = new HTableDescriptor(Bytes.toBytes("tablename"));
HColumnDescriptor columnDescriptor = new HColumnDescriptor(Bytes.toBytes("columnfamilyname"));
descriptor.addFamily(columnDescriptor);
acmeAdmin.createTable(descriptor);
HBase schemas can be created or updated with shell or by using HBaseAdmin in the Java API.
Tables must be disabled when making ColumnFamily modifications, for example:
Configuration config = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(config);
String table = "myTable";
admin.disableTable(table);
HColumnDescriptor cf1 = ...;
admin.addColumn(table, cf1); // adding new ColumnFamily
HColumnDescriptor cf2 = ...;
admin.modifyColumn(table, cf2); // modifying existing ColumnFamily
admin.enableTable(table);
public static void createSchemaTables(Configuration config) {
  try {
    final HBaseAdmin admin = new HBaseAdmin(config);
    HTableDescriptor table = new HTableDescriptor(TableName.valueOf(TABLE_NAME));
    table.addFamily(new HColumnDescriptor(CF_DEFAULT).setCompressionType(Algorithm.SNAPPY));
    System.out.print("Creating table. ");
    createOrOverwrite(admin, table);
    System.out.println(" Done.");
    admin.close();
  } catch (IOException e) {
    e.printStackTrace();
  }
}



Related Questions


Question : Which of the following are supported features of HBase?

1. HBase is a Column Family oriented
2. HBase support multi row transactions
3. HBase queries data using get or put or scan
4. HBase supports Secondary Indexes

1. All 1,2,3, and 4 are correct
2. Only 1,2,3 are correct
3. Only 2,3, and 4 are correct
4. Only 1,3, and 4 are correct


Question : Which of the following features is provided by HBase?

1. Random reads and writes
2. High Throughput
3. Caching
4. All of the above



Question : In which scenario should HBase be used?

1. When data volume is huge e.g. TB to PB
2. When High throughput is needed e.g. 1000s queries per second
3. When there is a need of Higher Cache
4. When Data is Sparse
5. All of the above



Question : In which case should HBase not be used?

1. When you only append data to your dataset and read the whole data
2. When you need random read
3. When you need random write
4. When access pattern is well known




Question :

A technical architect is designing a solution for storing the huge data volume generated every day by stock markets. Its only purpose
is to store this data and read it once for daily analysis, and he concludes that he will use HBase. Did he make the right decision?


1. Yes
2. No


Question :

Please select the correct statement about HBase.

1. In HBase every row has a Row Key
2. All columns in HBase belong to a particular column family
3. A table can have one or more column families
4. Table cells are versioned
5. All of the above