
HBase NoSQL Interview Preparation (Q&A)



Question: What is an HFile?

Answer: The HFile is the underlying storage file format for HBase. HFiles belong to a column family: a column family can have multiple HFiles, but a single HFile never contains data for more than one column family.


Question: How does HBase handle write failures?

Answer: Failures are common in large distributed systems, and HBase is no exception. Imagine that the server hosting a MemStore that has not yet been flushed crashes: you would lose the data that was in memory but not yet persisted. HBase safeguards against this by writing to the write-ahead log (WAL) before the write completes. Every server that is part of the HBase cluster keeps a WAL to record changes as they happen. The WAL is a file on the underlying file system. A write isn't considered successful until the new WAL entry is successfully written; this guarantee makes HBase as durable as the file system backing it. Most of the time, HBase is backed by the Hadoop Distributed File System (HDFS). If HBase goes down, data that had not yet been flushed from the MemStore to an HFile can be recovered by replaying the WAL.
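That recovery path can be sketched in plain Java. This is only an illustration of the write-ahead-log idea, not HBase's implementation; `WalSketch` and its fields are invented names for the sketch:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Write-ahead-log sketch: append to the log first, then mutate memory.
class WalSketch {
    final List<String[]> wal = new ArrayList<>();         // stands in for the on-disk WAL
    final Map<String, String> memstore = new HashMap<>(); // stands in for the MemStore

    void put(String row, String value) {
        wal.add(new String[] {row, value}); // durable record is written first
        memstore.put(row, value);           // the write "succeeds" only after the WAL append
    }

    // After a crash the MemStore is gone; rebuild its contents by replaying the WAL.
    static Map<String, String> replay(List<String[]> wal) {
        Map<String, String> rebuilt = new HashMap<>();
        for (String[] entry : wal) {
            rebuilt.put(entry[0], entry[1]);
        }
        return rebuilt;
    }
}
```

Because every mutation reaches the log before it is acknowledged, replaying the log always reproduces the lost in-memory state.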


Question: Which API command would you use to read data from HBase?

Answer: Get

Example:
// Look up the row whose rowkey is "John Smith"; usersTable is an open
// org.apache.hadoop.hbase.client.Table instance.
Get g = new Get(Bytes.toBytes("John Smith"));
Result r = usersTable.get(g);


Question: What is the BlockCache?

Answer: Alongside the MemStore, HBase also keeps a cache of the most-used data in the JVM heap: the BlockCache. It is designed to keep frequently accessed data from the HFiles in memory so as to avoid disk reads. Each column family has its own BlockCache. The "block" in BlockCache is the unit of data that HBase reads from disk in a single pass. The HFile is physically laid out as a sequence of blocks plus an index over those blocks, so reading a block from HBase requires only looking up that block's location in the index and retrieving it from disk. The block is the smallest indexed unit of data and the smallest unit of data that can be read from disk.
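The caching principle (keep recently read blocks in memory, evict the least recently used one when full) can be sketched with a small LRU map in plain Java. This illustrates the idea only; HBase's actual BlockCache is more sophisticated, and `BlockCacheSketch` is an invented name:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU cache sketch: most-recently-read blocks stay in memory; the
// least-recently-used block is evicted when capacity is exceeded.
class BlockCacheSketch extends LinkedHashMap<Long, byte[]> {
    private final int maxBlocks;

    BlockCacheSketch(int maxBlocks) {
        super(16, 0.75f, true); // accessOrder=true gives LRU iteration order
        this.maxBlocks = maxBlocks;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
        return size() > maxBlocks; // evict the eldest block when over capacity
    }
}
```

Reading a cached block (a `get`) counts as a use, so hot blocks survive eviction while cold ones are dropped.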

Related Questions


Question: Can major compaction be triggered manually?

Question: Which process or component is responsible for managing an HBase RegionServer?


Question: Can you explain data versioning?

Answer: In addition to being a schema-less database, HBase is also versioned. Every time you perform an operation on a cell, HBase implicitly stores a new version. Creating, modifying, and deleting a cell are all treated identically: they are all new versions. When a cell exceeds the maximum number of versions, the extra records are dropped during the next major compaction. Instead of deleting an entire cell, you can operate on a specific version or versions within that cell. Values within a cell are versioned, and versions are identified by their timestamp, a long. When a version isn't specified, the current timestamp is used as the basis for the operation. The number of cell value versions retained by HBase is configured per column family; the default is three.
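The version-retention rule described in the data versioning answer above (a new version per write, identified by a long timestamp, with the oldest versions dropped once the per-family maximum is exceeded) can be sketched in plain Java. `VersionedCell` is an invented illustrative class, not an HBase API:

```java
import java.util.TreeMap;

// Versioned-cell sketch: each write stores a new (timestamp -> value) version;
// versions beyond the configured maximum are dropped, oldest first.
class VersionedCell {
    private final int maxVersions; // HBase's default per column family is 3
    private final TreeMap<Long, String> versions = new TreeMap<>();

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollFirstEntry(); // mimics a major compaction dropping old versions
        }
    }

    String latest() {
        return versions.lastEntry().getValue();
    }

    String at(long timestamp) {
        return versions.get(timestamp); // null if that version was dropped
    }
}
```

A read with no timestamp returns the newest version; a read at a specific timestamp can still reach any retained older version.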

Question: Which component is responsible for managing and monitoring Regions?

Question: Why Would You Need HBase?

Answer: Use HBase when you need fault-tolerant, random, real-time read/write access to data stored in HDFS, and when you need strong data consistency. HBase provides Bigtable-like capabilities on top of Hadoop. HBase's goal is the hosting of very large tables (billions of rows by millions of columns) atop clusters of commodity hardware. HBase manages structured data on top of HDFS for you, efficiently using the underlying replicated storage as a backing store to gain the benefits of its fault tolerance, data availability, and locality.


Question: When Would You Not Want To Use HBase?

Question: What is the use of "HColumnDescriptor"?

Question: In HBase, what is the problem with "Time Series Data", and can you explain the hotspot problem?

Question: What is salting, and how does it help with the "TimeSeries HotSpot" problem?
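Although the answer is not spelled out here, the general idea of salting can be sketched: prepend a small, deterministic prefix (for example, a hash of the key modulo a bucket count) to an otherwise monotonically increasing rowkey, so sequential timestamps spread across several rowkey ranges (and therefore regions) instead of hammering one hot region. The class below is an invented illustration, not an HBase API:

```java
// Salting sketch: prefix a time-series rowkey with hash(key) % buckets so
// that consecutive timestamps land in different rowkey ranges (regions).
class SaltingSketch {
    static String saltedKey(String rowKey, int buckets) {
        int salt = Math.floorMod(rowKey.hashCode(), buckets); // stable, deterministic bucket
        return salt + "-" + rowKey;                           // e.g. "3-1700000000000"
    }
}
```

Because the salt is derived from the key itself, a reader can recompute it, but range scans across all of a time window must now query every bucket.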

Question: What is "Field swap/promotion"?