Question-: You are working with a read-heavy database requirement and decide to use Cassandra's caching mechanism. Which of the following are correct for Cassandra's in-built caching? A. You can only cache the partition key B. You can cache both the partition key and the entire row C. When a read happens, Cassandra first checks for the key in the partition key cache and then in the row cache. D. When a read happens, Cassandra first checks for the key in the row cache and then in the partition key cache.
Answer: B, D Exp: Cassandra provides an in-built caching solution. Two things can be cached in a Cassandra database, as below. Key caching is enabled by default, and a high level of key caching is recommended for most scenarios. The row cache and the key cache can co-exist. - Partition key cache: caches the partition index for a Cassandra table. If you do not enable it, Cassandra reads the partition index directly from disk. - Row cache: you configure the number of rows to cache per partition by setting rows_per_partition.
In Cassandra, when a particular node goes down, the client can read the data from another cached replica. However, there is no separate caching tier in Cassandra, so the cache always remains in sync with what is on disk.
Usually, an administrator should enable either the partition key cache or the row cache for a table. An administrator should consider the row cache only when the number of reads is much larger than the number of writes. Also consider using the operating system page cache instead of the row cache, because a write to a partition invalidates the whole partition in the row cache. You enable caching by configuring the caching table property; if you want to set the property globally, set it in the cassandra.yaml file.
The row_cache_size_in_mb parameter determines how much memory is allocated for the most frequently read partitions.
You can configure the cache by creating a property map of values for the caching property, as shown in the sketch below. - keys : ALL or NONE - rows_per_partition : a number of rows N, ALL, or NONE
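For illustration, here is a minimal CQL sketch of setting this property per table (the demo.users keyspace, table, and column names are assumptions made up for this example, not part of the question):

```
-- Enable key caching for all partition keys and keep up to
-- 120 rows per partition in the row cache.
CREATE TABLE demo.users (
    user_id uuid PRIMARY KEY,
    name    text,
    email   text
) WITH caching = {'keys': 'ALL', 'rows_per_partition': '120'};

-- The same property can be changed later on an existing table:
ALTER TABLE demo.users
    WITH caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'};
```

Note that this map syntax applies to Cassandra 2.1 and later; older releases used a single string value for the caching property.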
Admin Only
Question-: Please map the followings
A. SizeTieredCompactionStrategy B. DateTieredCompactionStrategy C. LeveledCompactionStrategy
1. This triggers a minor compaction when there are a number of similarly sized SSTables on disk. 2. Stores the data written within a certain period of time in the same SSTable. 3. Creates relatively small, fixed-size SSTables that are grouped into non-overlapping levels.
Answer: A-1, B-2, C-3
Explanation: SizeTieredCompactionStrategy (STCS): This is the default compaction strategy. It triggers a minor compaction when a number of similarly sized SSTables exist on disk; however, a minor compaction does not involve all the tables in a keyspace.
DateTieredCompactionStrategy (DTCS): As the name suggests, it stores data written within the same time period in the same SSTable, which makes it a good option for time-series data. For example, you can define a new SSTable for every 4 hours, and compaction is then triggered accordingly.
LeveledCompactionStrategy (LCS): This creates relatively small, fixed-size SSTables (160 MB by default). They are grouped into levels, and within a level the SSTables are guaranteed to be non-overlapping.
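To make the mapping concrete, the following CQL sketch shows how each strategy is selected per table through the compaction property (the demo.* table names and the option values are illustrative assumptions):

```
-- Default: size-tiered compaction (STCS)
ALTER TABLE demo.events
    WITH compaction = {'class': 'SizeTieredCompactionStrategy'};

-- Time-series data: date-tiered compaction; a base_time_seconds of
-- 14400 groups writes from the same 4-hour window into one SSTable.
ALTER TABLE demo.sensor_readings
    WITH compaction = {'class': 'DateTieredCompactionStrategy',
                       'base_time_seconds': '14400'};

-- Read-heavy tables: leveled compaction, with the default
-- 160 MB SSTable target size made explicit.
ALTER TABLE demo.user_profiles
    WITH compaction = {'class': 'LeveledCompactionStrategy',
                       'sstable_size_in_mb': '160'};
```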
Admin and Dev both
Question-: You are working with the Cassandra database and writing data. Your client application verifies that an acknowledgement is received for each write. Just after the write, the particular node that acknowledged the write request goes down. You query another node and do not find the written data. How is that possible, given that Cassandra already acknowledged the write request?
A. The Cassandra cluster is not configured correctly B. There is a bug in the Cassandra storage engine C. Data is only written to the SSTables and memtable of that node D. Data is only written to the memtable and commit log of that node E. Data is only written to the commit log and SSTables
Answer: D Exp: When writing data, Cassandra follows this path: the data is written concurrently to the commit log (on disk) and the memtable (in memory). Once the data is written to both, the node acknowledges the write as successful. In this case, however, the node crashed before the data was replicated to another node; that is why the data is lost even after a successful acknowledgement. In such a scenario, the following can be done: - Reduce the chance of this happening by keeping the commit log on a separate storage layer. - Restart the node and replay everything from the commit log.
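As a hedged illustration of why a single node's acknowledgement can be insufficient, the cqlsh sketch below shows how the write consistency level controls how many replicas must confirm a write before the client sees success (the demo.events table and its columns are hypothetical):

```
-- cqlsh session sketch; demo.events is a made-up table.
CONSISTENCY ONE;
INSERT INTO demo.events (id, payload) VALUES (uuid(), 'sample');
-- Acknowledged as soon as one replica has written to its commit
-- log and memtable; if that node dies before the data reaches
-- other replicas, reads elsewhere will not find the write.

CONSISTENCY QUORUM;
INSERT INTO demo.events (id, payload) VALUES (uuid(), 'sample');
-- Acknowledged only after a majority of replicas confirm, which
-- narrows the window in which one node failure can hide an
-- acknowledged write.
```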