HBase NoSQL Interview Preparation (Q&A)



Question: What is the difference between HBase and Hadoop/HDFS?

Answer : HDFS is a distributed file system that is highly fault-tolerant, designed to be deployed on low-cost hardware, and provides high-throughput access to application data, making it suitable for applications with large data sets. It is well suited to the storage of large files, but as its own documentation states, it is not a general-purpose file system and does not provide fast individual record lookups within files. HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables; this is a common point of conceptual confusion. HBase internally stores your data in indexed StoreFiles on HDFS to enable high-speed lookups.

The assumptions and goals of HDFS are:
Hardware Failure
Streaming Data Access
Large Data Sets
Simple Coherency Model
Moving Computation is Cheaper than Moving Data
Portability Across Heterogeneous Hardware and Software Platforms
HDFS has been designed to be easily portable from one platform to another. This facilitates widespread adoption of HDFS as a platform of choice for a large set of applications.
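The contrast above can be seen from the command line: HDFS offers streaming reads of whole files, while HBase serves point lookups by rowkey. A hedged sketch (the file path, table name 't1', and row key 'row1' are hypothetical, and both commands require a running cluster):

```
# HDFS: read an entire file sequentially; there is no record-level index
hdfs dfs -cat /data/events.txt

# HBase shell: random-access read of a single row by its key
hbase> get 't1', 'row1'
```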


Question: What is the maximum recommended cell size?

Answer : A rough rule of thumb, with little empirical validation, is to keep the data in HDFS and store pointers to the data in HBase if you expect the cell size to be consistently above 10 MB. If you do expect large cell values and you still plan to use HBase for the storage of cell contents, you'll want to increase the block size and the maximum region size for the table to keep the index size reasonable and the split frequency acceptable.
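The two tuning knobs mentioned in this answer can be set from the HBase shell. A sketch under stated assumptions: the table name 't1', family name 'cf1', and the sizes are illustrative values, not recommendations from the text:

```
# Raise the block size for the column family (default is 65536 bytes)
hbase> alter 't1', {NAME => 'cf1', BLOCKSIZE => '262144'}

# Raise the maximum region size before a split (here ~10 GB)
hbase> alter 't1', MAX_FILESIZE => '10737418240'
```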


Question: What happens if we change the block size of a column family on an already populated database?

Answer : When we change the block size of a column family, new data is written with the new block size while the old data remains in blocks of the old size. Once compaction occurs, the old data is rewritten at the new block size. As the HBase documentation puts it: “New files, as they are flushed, will have the new block size, whereas existing data will continue to be read correctly. After the next major compaction, all data should be converted to the new block size.”
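The sequence described in this answer can be sketched in the HBase shell (table and family names hypothetical): change the family's block size, then force a major compaction so existing StoreFiles are rewritten at the new size rather than waiting for the next scheduled compaction.

```
# Set a new block size on family cf1; only newly flushed files use it at first
hbase> alter 't1', {NAME => 'cf1', BLOCKSIZE => '131072'}

# Major compaction rewrites the old StoreFiles, converting them to the new size
hbase> major_compact 't1'
```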


Question: What is the difference between HBase and an RDBMS?

Answer : An RDBMS is schema-based and row-oriented, supports SQL with joins and ACID transactions, and typically scales vertically. HBase is schema-less apart from its column families, is column-oriented, offers a get/put/scan API rather than SQL, has no built-in joins or multi-row transactions, and scales horizontally across commodity hardware, making it well suited to sparse, denormalized tables with billions of rows.

Related Questions


Question: Can major compaction be triggered manually?

Question: Can you explain data versioning?

Answer : In addition to being a schema-less database, HBase is also versioned. Every time you perform an operation on a cell, HBase implicitly stores a new version. Creating, modifying, and deleting a cell are all treated identically; they are all new versions. When a cell exceeds the maximum number of versions, the extra records are dropped during the next major compaction. Instead of operating on an entire cell, you can operate on a specific version or versions within it. Versions are identified by their timestamp, a long; when a version isn't specified, the current timestamp is used as the basis for the operation. The number of cell-value versions retained by HBase is configured per column family, and the default is three.

Question: Which process or component is responsible for managing an HBase RegionServer?
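The versioning rules in the answer above can be sketched in plain Python (no HBase required). MAX_VERSIONS and the put/compact helpers are illustrative stand-ins for the behavior described, not HBase APIs:

```python
# Sketch of HBase per-cell versioning: every write is a new version keyed by
# timestamp, and a major compaction drops versions beyond the column family's
# configured maximum (default 3).

MAX_VERSIONS = 3  # default number of versions retained per cell

def put(cell, value, timestamp):
    """Each write implicitly creates a new version; nothing is overwritten."""
    cell[timestamp] = value

def major_compact(cell):
    """Keep only the newest MAX_VERSIONS versions, as a major compaction would."""
    for ts in sorted(cell)[:-MAX_VERSIONS]:
        del cell[ts]

cell = {}
for ts in range(1, 6):       # five writes to the same row/column
    put(cell, f"v{ts}", ts)

major_compact(cell)
print(sorted(cell))          # -> [3, 4, 5]: only the three newest versions survive
print(cell[max(cell)])       # -> v5: a read without a timestamp returns the latest
```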

Question: Which component is responsible for managing and monitoring Regions?

Question: Why Would You Need HBase?

Answer : Use HBase when you need fault-tolerant, random, real-time read/write access to data stored in HDFS, and when you need strong data consistency. HBase provides Bigtable-like capabilities on top of Hadoop; its goal is the hosting of very large tables (billions of rows by millions of columns) atop clusters of commodity hardware. HBase manages structured data on top of HDFS for you, using the underlying replicated storage as a backing store to gain its fault tolerance, data availability, and locality.

Question: When Would You Not Want To Use HBase?

Question: What is the use of "HColumnDescriptor"?

Question: In HBase, what is the problem with time-series data, and can you explain hotspotting?

Question: What is salting, and how does it help with the time-series hotspot problem?
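Salting, named in the question above, can be sketched in plain Python: monotonically increasing time-series rowkeys all sort into one region (a hotspot), so each key is prefixed with a small salt derived from a stable hash, spreading writes across N buckets. The bucket count and key format below are illustrative assumptions, not HBase defaults:

```python
# Illustrative rowkey salting: prefix each key with a deterministic bucket id
# so sequential keys scatter across regions instead of hitting one.

NUM_BUCKETS = 4  # assumption: e.g. one bucket per region server

def salted_key(rowkey: str) -> str:
    # A stable (non-random) hash keeps the salt reproducible for reads.
    bucket = sum(rowkey.encode()) % NUM_BUCKETS
    return f"{bucket:02d}-{rowkey}"

# Sequential timestamps would normally sort into the same region...
for ts in range(1000, 1008):
    print(salted_key(f"event-{ts}"))
# ...but the salt prefix scatters them across buckets 00..03. The trade-off:
# reading a time range now requires one scan per bucket, merged client-side.
```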

Question: What is "Field swap/promotion"?