Question: When data is written to a Cassandra cluster, it is written to several places, and a read checks all of these stores to retrieve the latest data. For efficiency, data needs to be kept sorted by clustering column. Which of the following stores hold data sorted by clustering column?
A. MemTable B. SSTable C. Partition Key Cache D. Row Cache E. Commit log
Answer: A, B
Explanation: On a write, Cassandra first writes the data to both the MemTable and the commit log. In the MemTable the data is sorted by clustering column; in the commit log it is written sequentially in the order received (append-only). SSTables are created from MemTables: when a MemTable reaches its configured threshold, or is flushed manually, its contents are written to disk as an SSTable. Once an SSTable is created it cannot be modified. New SSTables can only be created from existing SSTables, which is what compaction does.
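The difference between the append-only commit log and the sorted MemTable can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not Cassandra's actual implementation: the class `ToyTable` and its methods are hypothetical names, partitions are ordered by raw key rather than by token, and the SSTable is modelled as an in-memory tuple instead of a file on disk.

```python
import bisect


class ToyTable:
    """Toy model of one table's write path on a single node (hypothetical)."""

    def __init__(self):
        self.commit_log = []  # append-only, in arrival order
        self.memtable = {}    # partition key -> rows sorted by clustering column
        self.sstables = []    # immutable snapshots produced by flush()

    def write(self, partition_key, clustering, value):
        # 1. Append to the commit log exactly in the order received.
        self.commit_log.append((partition_key, clustering, value))
        # 2. Insert into the memtable, keeping rows sorted by clustering column.
        rows = self.memtable.setdefault(partition_key, [])
        keys = [c for c, _ in rows]
        i = bisect.bisect_left(keys, clustering)
        if i < len(rows) and rows[i][0] == clustering:
            rows[i] = (clustering, value)        # same row: overwrite
        else:
            rows.insert(i, (clustering, value))  # new row: keep sorted order

    def flush(self):
        # Freeze the memtable into an immutable "SSTable" (tuples cannot be edited).
        sstable = {pk: tuple(rows) for pk, rows in sorted(self.memtable.items())}
        self.sstables.append(sstable)
        self.memtable = {}
        return sstable


table = ToyTable()
table.write("user1", 30, "c")
table.write("user1", 10, "a")
table.write("user1", 20, "b")
flushed = table.flush()
```

After these three writes the commit log still holds the rows in arrival order (30, 10, 20), while the flushed SSTable holds them in clustering order (10, 20, 30).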
Admin and Dev both
Question: Match each component (A-F) to its description (1-6).
A. Row Cache B. Bloom Filter C. Partition Key Cache D. Partition Summary E. Partition Index F. Compression offset map
1. A subset of the partition data stored on disk in the SSTables, held in memory 2. Helps find which SSTables may contain the requested data 3. Stores the byte offsets of the most recently accessed records 4. Stores a sampling of the partition index 5. Stores an index of all partition keys mapped to their offsets 6. Stores pointers to the exact location on disk where the desired partition data will be found
Answer: A-1, B-2, C-3, D-4, E-5, F-6
Explanation: The Cassandra read path goes through several stores and indexes to return the correct data, using caches, the summary, and the indexes along the way. Let’s check them one by one in sequence.
1. Memtable: An in-memory data structure. If it holds the requested partition, that data is read from here and merged with the data read from the SSTables.
2. Row Cache: Used only when enabled. It holds in memory a subset of the partition data that is stored on disk in the SSTables. It is off-heap storage, so it should not be counted when sizing the Java heap, and you must specify how much memory to use for it. It is not write-through: if a write arrives for a row, the cached copy is invalidated and the row is not cached again until it is next read. If a partition is updated, the entire partition is evicted from the cache. If the requested partition is not found in the row cache, the Bloom filter is checked next.
3. Bloom Filter: Helps find which SSTables may contain the requested data, so SSTables that cannot contain the partition are skipped.
4. Partition Key Cache: Also has to be enabled and is off-heap. It stores the byte offsets of the most recently accessed records.
5. Partition Summary: Stores a sampling of the partition index. The partition index itself contains all partition keys.
6. Partition Index: Resides on disk and stores an index of all partition keys mapped to their offsets.
7. Compression offset map: Stores pointers to the exact location on disk where the desired partition data will be found.
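The interplay of the row cache and the Bloom filter can be sketched in a few lines. This is a toy illustration under stated assumptions, not Cassandra's code: `BloomFilter`, `ToySSTable`, and `ToyReader` are hypothetical names, the filter size and hash scheme are arbitrary, and the remaining read-path steps (key cache, summary, index, offset map) are collapsed into a single dictionary lookup.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class ToySSTable:
    def __init__(self, partitions):
        self.partitions = dict(partitions)  # stands in for immutable on-disk data
        self.bloom = BloomFilter()
        for key in self.partitions:
            self.bloom.add(key)
        self.disk_reads = 0

    def read(self, key):
        self.disk_reads += 1  # count simulated disk accesses
        return self.partitions.get(key)


class ToyReader:
    def __init__(self, sstables):
        self.sstables = sstables  # ordered oldest to newest
        self.row_cache = {}

    def read(self, key):
        if key in self.row_cache:              # row cache hit: no SSTable access
            return self.row_cache[key]
        merged = {}
        for sstable in self.sstables:
            if not sstable.bloom.might_contain(key):
                continue                       # Bloom filter rules this SSTable out
            rows = sstable.read(key)
            if rows:
                merged.update(rows)            # newer SSTables overwrite older rows
        if merged:
            self.row_cache[key] = merged       # populated on read, not on write
        return merged or None

    def on_write(self, key):
        self.row_cache.pop(key, None)          # not write-through: just invalidate


old = ToySSTable({"p1": {1: "a"}})
new = ToySSTable({"p1": {2: "b"}, "p2": {1: "x"}})
reader = ToyReader([old, new])
first = reader.read("p1")    # touches SSTables, then fills the row cache
second = reader.read("p1")   # served from the row cache
```

Calling `reader.on_write("p1")` afterwards drops the cached entry, so the next read of `p1` goes back to the SSTables.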
Admin only
Question: Match each component (A-D) to its description (1-4).
A. Partition Summary B. Key Cache C. SSTables D. Partition Index
1. Stores byte offsets into the partition index 2. Stores the byte offsets of the most recently accessed records 3. Stores the partition data on disk 4. Stores an index of all partition keys mapped to their offsets
Answer: A-1,B-2,C-3,D-4
Explanation: The read path is the same one walked through in the previous explanation. The pieces relevant to this mapping are:
1. Partition Summary: Stores a sampling of the partition index, as byte offsets into that index.
2. Key Cache: When enabled, stores in off-heap memory the byte offsets of the most recently accessed records.
3. SSTables: Store the partition data on disk.
4. Partition Index: Resides on disk and stores an index of all partition keys mapped to their offsets.
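How the key cache, partition summary, and partition index cooperate can be sketched as a lookup chain. This is a minimal sketch with made-up data, not Cassandra's on-disk format: the keys, offsets, `build_summary`, `lookup`, and the sampling interval of 2 are all assumptions for illustration, and a list position stands in for the byte offset into the index.

```python
import bisect

# Partition index: every partition key, sorted, mapped to the offset of its data.
partition_index = [
    ("apple", 0), ("banana", 120), ("cherry", 260),
    ("date", 410), ("fig", 575), ("grape", 700),
]


def build_summary(index, sample_every=2):
    """Sample every Nth index entry as (key, position of that entry in the index)."""
    return [(index[i][0], i) for i in range(0, len(index), sample_every)]


def lookup(key, key_cache, summary, index, sample_every=2):
    # 1. Key cache: recently accessed keys resolve without touching the index.
    if key in key_cache:
        return key_cache[key]
    # 2. Partition summary: find the last sampled key that is <= the wanted key.
    sampled_keys = [k for k, _ in summary]
    s = bisect.bisect_right(sampled_keys, key) - 1
    if s < 0:
        return None  # key sorts before every indexed key
    start = summary[s][1]
    # 3. Partition index: scan only the short range the summary points at.
    for k, offset in index[start:start + sample_every]:
        if k == key:
            key_cache[key] = offset  # remember for the next read
            return offset
    return None


summary = build_summary(partition_index)
key_cache = {}
offset = lookup("date", key_cache, summary, partition_index)
```

The summary keeps only every second key here, so it is half the size of the index; a lookup reads the summary first and then scans at most two index entries instead of the whole index.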