
HBase NoSQL Interview Preparation (Q&A)



Question: Once you delete data in HBase, when exactly is it physically removed?

Answer: During major compaction. Because HFiles are immutable, it is not until a major compaction runs that tombstone records are reconciled and the space held by deleted records is truly recovered.
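
As a hedged illustration using the HBase Java client (the table name "users" and row key are assumptions made for the example, not from the original text): a Delete only writes a tombstone, and the space is physically reclaimed only when a major compaction rewrites the HFiles.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class DeleteThenCompact {
    public static void main(String[] args) throws Exception {
        TableName users = TableName.valueOf("users");
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(users);
             Admin admin = conn.getAdmin()) {
            // The Delete only writes a tombstone marker; the old value
            // still sits in its immutable HFile on disk.
            table.delete(new Delete(Bytes.toBytes("row1")));

            // A major compaction rewrites all HFiles of each column
            // family; tombstones and the values they mask are dropped
            // from the merged output, physically freeing the space.
            admin.majorCompact(users);
        }
    }
}
```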


Question: Please describe minor compaction.

Answer: A minor compaction folds multiple smaller HFiles together, creating one larger HFile.
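
HBase schedules minor compactions automatically, but one can also be requested through the Admin API. A minimal sketch (the table name is an assumption); note that HBase itself picks which files to merge and may promote the request to a major compaction if all store files end up selected:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;

public class RequestMinorCompaction {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            // Ask the region servers to compact this table's stores.
            // HBase selects a subset of HFiles to merge (a minor
            // compaction unless all files are selected).
            admin.compact(TableName.valueOf("users"));
        }
    }
}
```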


Question: Please describe major compaction.

Answer: When a compaction operates over all HFiles in a column family in a given region, it is called a major compaction. Upon completion of a major compaction, all HFiles in the column family are merged into a single file.


Question: What is tombstone record ?

Answer: The Delete command does not remove the value immediately. Instead, it marks the record for deletion: a new "tombstone" record is written for that value, marking it as deleted. The tombstone indicates that the deleted value should no longer be included in Get or Scan results.
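
A hedged sketch of the behavior just described (table, column family, and row names are invented for the example): after a Delete, a Get returns nothing, even though the original cell still physically exists in an HFile until the next major compaction.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class TombstoneDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("users"))) {
            byte[] row = Bytes.toBytes("row1");

            Put put = new Put(row);
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("alice"));
            table.put(put);

            // Writes a tombstone; the cell above is only masked, not erased.
            table.delete(new Delete(row));

            // The tombstone hides the value from reads immediately.
            Result result = table.get(new Get(row));
            System.out.println("Row found after delete? " + !result.isEmpty()); // false
        }
    }
}
```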

Related Questions


Question: Explain what WAL and HLog are in HBase?

Question: In HBase, what are column families?

Answer: Column families comprise the basic unit of physical storage in HBase, and features such as compression are applied at the column-family level.


Question: Explain what the row key is.

Answer: The row key is defined by the application. Because the combined key is prefixed by the row key, it enables the application to define the desired sort order. It also allows logical grouping of cells and ensures that all cells with the same row key are co-located on the same server. (A short sketch follows below.)


Question: Explain deletion in HBase. What are the three types of tombstone markers in HBase?
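
Tying the two answered questions above together, a hedged sketch (all names, the SNAPPY choice, and the key layout are invented for illustration): the column family is declared at table creation and carries per-family settings such as compression, while a composite row key like userId + reversed timestamp gives the application control over sort order and co-location.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.util.Bytes;

public class RowKeyDesign {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            // Compression is configured per column family, because the
            // column family is the unit of physical storage.
            ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder
                    .newBuilder(Bytes.toBytes("info"))
                    .setCompressionType(Compression.Algorithm.SNAPPY)
                    .build();
            admin.createTable(TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("events"))
                    .setColumnFamily(cf)
                    .build());

            // A composite row key: all events for one user sort together
            // (co-located in the same region), newest first thanks to a
            // reversed timestamp.
            long reversedTs = Long.MAX_VALUE - System.currentTimeMillis();
            byte[] rowKey = Bytes.add(Bytes.toBytes("user42#"), Bytes.toBytes(reversedTs));
            System.out.println(Bytes.toStringBinary(rowKey));
        }
    }
}
```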

Question: Explain how HBase actually deletes a row?

Question: Explain what happens if you alter the block size of a column family on an already occupied database?

Answer: When you alter the block size of a column family, new data is written with the new block size while the old data remains within the old block size. During compaction, old data takes on the new block size: newly flushed files use the new block size, whereas existing data continues to be read correctly. After the next major compaction, all data will have been rewritten to the new block size. (A sketch follows below.)


Question: What is a Bloom filter and how does it help?
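
A hedged sketch of the block-size alteration described above, using the HBase 2.x Admin API (the table/family names and the 128 KB size are assumptions):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class AlterBlockSize {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            TableName table = TableName.valueOf("users");
            // Change the column family's block size to 128 KB. Newly
            // flushed HFiles use the new size; existing HFiles keep the
            // old size until compaction rewrites them.
            admin.modifyColumnFamily(table, ColumnFamilyDescriptorBuilder
                    .newBuilder(Bytes.toBytes("info"))
                    .setBlocksize(128 * 1024)
                    .build());
            // Forcing a major compaction rewrites all data with the new size.
            admin.majorCompact(table);
        }
    }
}
```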

Question: How does scan caching help in HBase?
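
For context: scan caching controls how many rows a scanner fetches from the region server per RPC, so a larger value trades client memory for far fewer round trips on large scans. A minimal sketch (the table name and the value 500 are illustrative):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;

public class ScanCachingExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("users"))) {
            Scan scan = new Scan();
            // Fetch 500 rows per RPC instead of the default.
            scan.setCaching(500);
            try (ResultScanner scanner = table.getScanner(scan)) {
                long rows = 0;
                for (Result r : scanner) {
                    rows++; // process each row here
                }
                System.out.println("Scanned " + rows + " rows");
            }
        }
    }
}
```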

Question: What is the impact of scan caching in MapReduce jobs?

Question: What is the impact of turning off WAL on Puts?
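
As a hedged illustration (table and column names are invented): durability can be lowered per mutation. Skipping the WAL makes Puts faster because no log record is written, but any edits still only in the MemStore are lost if the region server crashes before a flush.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class SkipWalPut {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("users"))) {
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("alice"));
            // Skip the write-ahead log for this Put: faster writes, but
            // the edit is lost if the region server dies before the
            // MemStore is flushed to an HFile.
            put.setDurability(Durability.SKIP_WAL);
            table.put(put);
        }
    }
}
```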

Question: Why pre-create regions?
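
For context: a new table starts as a single region, so all initial writes hit one region server until HBase splits it; pre-creating (pre-splitting) regions spreads the load from the start. A hedged sketch with invented split points:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            TableDescriptor desc = TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("users"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("info"))
                    .build();
            // Pre-split into four regions so initial writes are spread
            // across region servers instead of hammering a single region.
            byte[][] splitKeys = {
                Bytes.toBytes("g"), Bytes.toBytes("n"), Bytes.toBytes("t")
            };
            admin.createTable(desc, splitKeys);
        }
    }
}
```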