Question : While analysing the entire QuickTechie.com articles backup table stored in HBase, you found that it is not performing well and scans are slow. You are considering the block caching option. Select the correct statement regarding enabling or disabling block caching. Assume the entire table is 1TB and the available RAM is 128GB.
1. Disabling block caching does not improve scan performance on the ARTICLE table.
2. When you disable block caching, you free up memory for the MemStore, which improves scan performance for the ARTICLE table.
3. Disabling block caching frees up that memory for other operations. However, block caching would not help the scan operation whether you enable or disable it, because the entire table is 1TB and the maximum available memory is 128GB, so the full table will not fit in 128GB.
4. None of the above.
Correct Answer : 3

Explanation: HBase is a distributed database built around the core concepts of an ordered write-ahead log and a log-structured merge tree. As with any database, optimized I/O is a critical concern for HBase. When possible, the priority is to perform no I/O at all, which means that memory utilization and caching structures are of the utmost importance. To this end, HBase maintains two cache structures: the "memory store" and the "block cache". The memory store, implemented as the MemStore, accumulates data edits as they are received, buffering them in memory. The block cache, an implementation of the BlockCache interface, keeps data blocks resident in memory after they are read.

The MemStore is important for accessing recent edits. Without the MemStore, accessing that data as it was written into the write-ahead log would require reading and deserializing entries back out of that file: at least an O(n) operation. Instead, the MemStore maintains a skiplist structure, which enjoys an O(log n) access cost and requires no disk I/O. The MemStore holds only a tiny piece of the data stored in HBase, however.

Servicing reads from the BlockCache is the primary mechanism through which HBase is able to serve random reads with millisecond latency. When a data block is read from HDFS, it is cached in the BlockCache. Subsequent reads of neighboring data - data from the same block - do not suffer the I/O penalty of retrieving that data from disk again. It is the BlockCache that will be the remaining focus of this explanation.

Blocks to cache: Before examining the BlockCache, it helps to understand what exactly an HBase "block" is. In the HBase context, a block is a single unit of I/O. When writing data out to an HFile, a block is the smallest unit of data written. Likewise, a single block is the smallest amount of data HBase can read back out of an HFile.
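The MemStore-versus-log contrast above can be sketched in a few lines. This is a toy model, not HBase code: HBase uses a concurrent skiplist, and the closest stdlib analogue here is a sorted list searched with bisect. The point is the access cost - a log must be scanned linearly, while a sorted structure supports binary search.

```python
import bisect

class ToyWAL:
    """Append-only log: a read must scan every entry - O(n)."""
    def __init__(self):
        self.entries = []          # (row_key, value) in arrival order

    def append(self, key, value):
        self.entries.append((key, value))

    def get(self, key):
        result = None
        for k, v in self.entries:  # newest matching entry wins
            if k == key:
                result = v
        return result

class ToyMemStore:
    """Sorted in-memory structure: reads use binary search - O(log n)."""
    def __init__(self):
        self.keys = []             # sorted row keys
        self.values = []

    def put(self, key, value):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value          # overwrite newer edit
        else:
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def get(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None

wal, mem = ToyWAL(), ToyMemStore()
for n in range(1000):
    wal.append(f"row-{n:04d}", n)
    mem.put(f"row-{n:04d}", n)

assert wal.get("row-0500") == mem.get("row-0500") == 500
```

Both structures return the same answer; they differ only in how much work a read costs, which is exactly why the MemStore exists alongside the write-ahead log.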
Be careful not to confuse an HBase block with an HDFS block, or with the blocks of the underlying file system - these are all different. HBase blocks come in four varieties: DATA, META, INDEX, and BLOOM.

DATA blocks store user data. When a BLOCKSIZE is specified for a column family, it is a hint for this kind of block - and only a hint. While flushing the MemStore, HBase will do its best to honor this guideline: after each Cell is written, the writer checks whether the amount written is >= the target BLOCKSIZE. If so, it closes the current block and starts the next one.

INDEX and BLOOM blocks serve the same goal: both are used to speed up the read path. INDEX blocks provide an index over the Cells contained in the DATA blocks, while BLOOM blocks contain a bloom filter over the same data. The index allows the reader to quickly know where a Cell should be stored; the filter tells the reader when a Cell is definitely absent from the data. Finally, META blocks store information about the HFile itself and other sundry information - metadata, as you might expect. A more comprehensive overview of the HFile format and the roles of the various block types is provided in Apache HBase I/O - HFile.

HBase BlockCache and its implementations: There is a single BlockCache instance in a region server, which means all data from all regions hosted by that server share the same cache pool. The BlockCache is instantiated at region server startup and is retained for the entire lifetime of the process. Traditionally, HBase provided only a single BlockCache implementation: the LruBlockCache. The 0.92 release introduced the first alternative in HBASE-4027: the SlabCache. HBase 0.96 introduced another option via HBASE-7404, called the BucketCache. The key difference between the tried-and-true LruBlockCache and these alternatives is the way they manage memory.
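The BLOCKSIZE-as-a-hint behaviour described above can be sketched as follows. This is a toy illustration, not HBase's HFile writer: because the size check happens after each cell is written, a block can overshoot the target, which is why BLOCKSIZE is a guideline rather than a hard cap.

```python
BLOCKSIZE = 64   # bytes, tiny for illustration (the HBase default is 64 KB)

def write_blocks(cells, blocksize=BLOCKSIZE):
    """Group cells into blocks, closing a block once it reaches blocksize.

    The check runs AFTER each cell is appended, mirroring the flush
    behaviour described above, so blocks may exceed the hint.
    """
    blocks, current, written = [], [], 0
    for cell in cells:
        current.append(cell)
        written += len(cell)
        if written >= blocksize:        # hint reached: close this block
            blocks.append(current)
            current, written = [], 0
    if current:                         # flush any trailing partial block
        blocks.append(current)
    return blocks

cells = [b"x" * 20 for _ in range(10)]  # ten 20-byte cells
blocks = write_blocks(cells)
# 20+20+20 = 60 < 64; the fourth cell pushes the total to 80 >= 64,
# so blocks close after every fourth cell: 4 + 4 + 2 cells.
assert [len(b) for b in blocks] == [4, 4, 2]
```

Note that a single cell larger than BLOCKSIZE still lands in its own (oversized) block - the hint never splits a cell.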
Specifically, the LruBlockCache is a data structure that resides entirely on the JVM heap, while the other two are able to take advantage of memory from outside the JVM heap. This is an important distinction because JVM heap memory is managed by the JVM garbage collector, while the other memory is not. In the cases of SlabCache and BucketCache, the idea is to reduce the GC pressure experienced by the region server process by reducing the number of objects retained on the heap.

Because HBase reads entire blocks of data for efficient I/O, it retains those blocks in an in-memory cache so that subsequent reads need no disk operation. For a full table scan on a large data set, however, you may not be able to fit all of the scanned data into the block cache - here, a 1TB table cannot fit in 128GB of RAM - so for a full table scan you will see better performance if you disable the block cache. Note that disabling block caching does not affect the memory available to the MemStore: the BlockCache is a read cache, and the MemStore is a write buffer.

HBase provides three different BlockCache implementations: the default on-heap LruBlockCache, and the BucketCache and SlabCache, which are both (usually) off-heap. This section discusses the benefits and drawbacks of each implementation, how to choose the appropriate option, and the configuration options for each. There are two reasons to consider enabling one of the alternative BlockCache implementations. The first is simply the amount of RAM you can dedicate to the region server. Community wisdom recognizes the upper limit of the JVM heap, as far as the region server is concerned, to be somewhere between 14GB and 31GB. The precise limit usually depends on a combination of hardware profile, cluster configuration, the shape of the data tables, and application access patterns. You'll know you've entered the danger zone when GC pauses and RegionTooBusyExceptions start flooding your logs. The other time to consider an alternative cache is when response latency really matters.
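The full-table-scan argument above can be demonstrated with a toy LRU cache (not HBase's LruBlockCache). A scan streams every block through the cache exactly once, so each block is evicted before it can ever be re-read and the hit rate is zero - while a small, repeatedly-read hot set is served almost entirely from memory. This is why caching scan blocks only pollutes the cache.

```python
from collections import OrderedDict

class ToyLruBlockCache:
    """Minimal LRU block cache tracking hit/miss counts."""
    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.cache = OrderedDict()
        self.hits = self.misses = 0

    def read_block(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)     # refresh LRU position
            self.hits += 1
        else:
            self.misses += 1                     # simulated disk read
            self.cache[block_id] = True
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)   # evict least recently used

cache = ToyLruBlockCache(capacity_blocks=100)

# Full scan: 1000 distinct blocks, each read exactly once.
for block in range(1000):
    cache.read_block(block)
assert cache.hits == 0          # the cache contributed nothing

# Random-read workload over a 50-block hot set that fits in the cache.
cache.hits = cache.misses = 0
for _ in range(10):
    for block in range(50):
        cache.read_block(block)
assert cache.misses == 50       # only the first pass touches disk
assert cache.hits == 450        # the other nine passes are all hits
```

In real HBase the per-scan behaviour is controlled on the client side (the Scan API lets you turn block caching off for a given scan); the simulation above only illustrates why you would want to.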
Keeping the heap down around 8-12GB allows the CMS collector to run very smoothly, which has a measurable impact on the 99th percentile of response times. Given this restriction, the only choices are to explore an alternative garbage collector or to take one of these off-heap implementations for a spin. The latter is exactly what I've done: in my next post, I'll share some unscientific-but-informative experiment results comparing the response times of the different BlockCache implementations.
Question : To drop a table, must it first be disabled?
1. True
2. False
Question : Select the correct statement(s) regarding TTL (time-to-live) in HBase.
1. ColumnFamilies can set a TTL length in seconds
2. Rows are automatically deleted when the expiration time is reached
3. Access Mostly Uused Products by 50000+ Subscribers
4. Used in conjunction with the minimum versions setting
5. All of the above
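The TTL behaviour behind these options can be sketched with a toy model (not HBase internals): a cell whose timestamp is older than the column family's TTL, expressed in seconds, is treated as expired and filtered out of reads as if deleted. The min_versions parameter below is a simplified stand-in for HBase's MIN_VERSIONS setting, which keeps the newest versions visible even past the TTL.

```python
def live_cells(cells, ttl_seconds, now, min_versions=0):
    """Return the cell versions still visible given a TTL.

    cells:        list of (timestamp, value) pairs, newest first
    min_versions: how many newest versions survive even past the TTL
                  (simplified model of HBase's MIN_VERSIONS)
    """
    visible = []
    for i, (ts, value) in enumerate(cells):
        if now - ts < ttl_seconds or i < min_versions:
            visible.append((ts, value))
    return visible

versions = [(1000, "v3"), (900, "v2"), (800, "v1")]   # newest first

# TTL of 150s evaluated at time 1050: only the newest cell survives.
assert live_cells(versions, ttl_seconds=150, now=1050) == [(1000, "v3")]

# Same TTL, but keeping a minimum of 2 versions regardless of age.
assert live_cells(versions, ttl_seconds=150, now=1050, min_versions=2) == [
    (1000, "v3"), (900, "v2")]
```

The second assertion shows the interaction the minimum-versions option refers to: TTL alone would expire "v2", but the minimum-versions floor keeps it visible.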