Question-: Arrange the following steps in the order of the Cassandra read path. 1. Check the partition key cache. 2. Check the row cache. 3. Locate the data on disk using the compression offset map. 4. Fetch the data from SSTables on disk. 5. Go directly to the compression offset map if a partition key is found in the partition key cache. 6. Check the memtable. 7. Check the Bloom filter.
A. 1,2,3,4,5,6,7 B. 6,2,7,1,5,3,4 C. 1,4,3,5,2,7,6 D. 1,7,2,4,5,3,6 E. 6,7,2,1,3,5,4
Answer: B Exp: Keep in mind that the read path checks the memtable first and fetches from SSTables on disk last; this alone narrows down the given choices. The order followed when data is read from a Cassandra table is given below.
1. Check the memtable. 2. Check the row cache, if enabled. 3. Check the Bloom filter. 4. Check the partition key cache, if enabled. 5. Go directly to the compression offset map if the partition key is found in the partition key cache. If it is not found, the partition summary is checked and then the partition index is accessed. 6. Locate the data on disk using the compression offset map. 7. Fetch the data from SSTables on disk. The row cache should be used when the database is read-intensive and avoided when it is write-intensive. If a write comes in for a row, the cache entry for that row is invalidated and the row is not cached again until it is read. Hence, the row cache is not write-through.
If a partition is updated, the entire partition is removed from the cache. If the desired partition is not found in the row cache, the Bloom filter is checked next.
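The order above can be sketched as a toy Python model. This is only an illustration under stated assumptions, not Cassandra's actual code: the function names, the dictionary layout of an SSTable, and the use of a plain set as the Bloom filter are all hypothetical, and the sketch returns at the first hit rather than merging results from several sources.

```python
# Toy model of the read order described above; all names are hypothetical.
def read_partition(key, memtable, row_cache, sstables):
    """Return (value, steps), where steps lists the structures consulted."""
    steps = ["memtable"]
    if key in memtable:
        return memtable[key], steps

    steps.append("row_cache")
    if key in row_cache:
        return row_cache[key], steps

    for sstable in sstables:
        steps.append("bloom_filter")
        if key not in sstable["bloom"]:       # definite miss: skip this SSTable
            continue
        steps.append("key_cache")
        offset = sstable["key_cache"].get(key)
        if offset is None:                    # key cache miss
            steps += ["partition_summary", "partition_index"]
            offset = sstable["index"].get(key)
            if offset is None:                # Bloom filter false positive
                continue
        steps += ["compression_offset_map", "sstable_data"]
        return sstable["data"][offset], steps
    return None, steps


def write_partition(key, value, memtable, row_cache):
    """Writes go to the memtable and invalidate the row cache entry."""
    memtable[key] = value
    row_cache.pop(key, None)   # not write-through: the entry is simply dropped
```

Calling read_partition for a key that sits in the key cache skips the partition summary and partition index steps, which is the seek the key cache saves.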
Dev and Admin both
Question-: Which of the following statements are correct for the Bloom filter? A. Each SSTable has an associated Bloom filter. B. It is guaranteed that all SSTables identified by the Bloom filter contain the requested partition data. C. The Bloom filter is stored on heap, so that faster access is possible. D. The Bloom filter can grow to 1-2 GB per billion partitions.
Answer: A, D Exp: In Cassandra, each SSTable has an associated Bloom filter, which can establish that an SSTable does not contain certain partition data. It is also used to estimate the likelihood that partition data is stored in an SSTable by narrowing the pool of keys, which speeds up the partition key lookup.
Using the Bloom filter, the storage engine discovers which SSTables are likely to have the requested data. However, this is never guaranteed. If the Bloom filter does not rule out an SSTable, the Cassandra database checks the partition key cache.
The Bloom filter is stored in off-heap memory and grows to approximately 1-2 GB per billion partitions.
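The "can rule out, but never guarantee" behaviour can be seen in a minimal Bloom filter sketch. The class below is hypothetical and for illustration only: it uses SHA-256 from the Python standard library as a stand-in hash, and the bit count and number of hashes are arbitrary demo values, not Cassandra settings.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True only means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```

A key that was added always yields True from might_contain, while a True result for a key that was never added is a false positive, which is why option B is wrong. As a rough check on the size figure, 1-2 GB per billion partitions works out to about 8-16 bits per partition.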
Admin and Dev both
Question-: Which of the following statements are valid? A. A partition key cache "hit" saves one seek during the write operations. B. If a partition key is found in the partition key cache, the engine goes directly to the compression offset map to find the compressed block on disk that has the data. C. If the partition key is not found in the key cache, the partition summary is checked. D. The partition summary stores a sampling of the partition index. E. If the partition key range is found in the partition summary, the partition index is searched. F. The compression offset map stores pointers to the exact location on disk where the desired partition data will be found.
Answer: B, C, D, E, F Exp: The partition key cache, partition summary, and partition index are all part of the read path, not the write path. Hence, option A can be discarded. These structures are checked one by one to find the data location.
Partition Key cache: It stores a cache of the partition index in off-heap memory. If the partition key is found in this cache (a "hit"), one seek is saved and the engine goes directly to the compression offset map to find the compressed block on disk that has the data. If the partition key is not found in the key cache, the partition summary is checked.
Partition Summary: The partition summary is an off-heap, in-memory structure that stores a sample of the partition index. For example, if the sampling is set to every 100 keys, it stores the 1st, 100th, 200th, 300th, etc. keys and their exact locations in the file. After the range of possible partition key values is found, the partition index is searched.
Partition Index: Resides on disk and maps every partition key to its offset. After the partition summary is checked for a range of partition keys, the search seeks the location of the desired partition key in the partition index.
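The summary-then-index lookup can be sketched as follows. This is a simplified illustration, not Cassandra's on-disk format: the function names and the list of (key, offset) tuples standing in for the partition index are assumptions, and the sketch samples the entries at positions 0, 100, 200, and so on of a key-sorted index.

```python
from bisect import bisect_right


def build_summary(index_entries, sample_every):
    """Keep every sample_every-th index entry as (key, position in the index)."""
    return [(index_entries[i][0], i)
            for i in range(0, len(index_entries), sample_every)]


def lookup(key, summary, index_entries, sample_every):
    """Return the data offset for key, or None if the key is absent."""
    sampled_keys = [k for k, _ in summary]
    slot = bisect_right(sampled_keys, key) - 1   # last sample <= key
    if slot < 0:
        return None                              # key sorts before the first entry
    start = summary[slot][1]
    # Scan only the narrow index range selected by the summary.
    for k, offset in index_entries[start:start + sample_every]:
        if k == key:
            return offset
    return None
```

With 1,000 index entries and a sample taken every 100 keys, the summary holds only 10 entries, and each lookup scans at most 100 index entries instead of all 1,000.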
Compression offset map: The compression offset map stores pointers to the exact location on disk where the desired partition data will be found. The map is stored in off-heap memory and is accessed from either the partition key cache or the partition index. After the compression offset map identifies the disk location, the desired compressed partition data is fetched from the correct SSTables.
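The idea of an offset map over compressed blocks can be illustrated with a small sketch. It assumes data compressed in fixed-size chunks, uses zlib from the Python standard library as a stand-in compressor, and picks a tiny hypothetical chunk length for the demo; the function names are illustrative and do not reflect Cassandra's file format.

```python
import zlib

CHUNK_LENGTH = 16  # hypothetical, tiny value chosen only for the demo


def compress_in_chunks(data, chunk_length=CHUNK_LENGTH):
    """Return (compressed_bytes, offsets); offsets[i] is where chunk i starts."""
    out = bytearray()
    offsets = []
    for start in range(0, len(data), chunk_length):
        offsets.append(len(out))
        out += zlib.compress(data[start:start + chunk_length])
    return bytes(out), offsets


def read_at(compressed, offsets, position, length, chunk_length=CHUNK_LENGTH):
    """Read length uncompressed bytes starting at uncompressed position."""
    result = bytearray()
    chunk = position // chunk_length     # which compressed block holds the data
    skip = position % chunk_length       # where the data starts inside the block
    while len(result) < length and chunk < len(offsets):
        begin = offsets[chunk]
        end = offsets[chunk + 1] if chunk + 1 < len(offsets) else len(compressed)
        block = zlib.decompress(compressed[begin:end])
        result += block[skip:skip + length - len(result)]
        skip = 0
        chunk += 1
    return bytes(result)
```

Only the blocks that cover the requested range are decompressed; the offsets list plays the role of the map that turns an uncompressed position into a location in the compressed file.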