Explanation: gc_grace_seconds defines how long Cassandra keeps tombstones around. Tombstones are special markers that Cassandra writes in place of the actual data whenever data is deleted or its TTL expires. The default value of gc_grace_seconds is 10 days (864000 seconds). On a single-node cluster this value can be set to 0.
Next is max_hint_window_in_ms (default 3 hours). This is the maximum time for which hints are generated for a node that does not respond. After this interval, new hints are no longer generated until the node is back up and responsive. If the node comes up and then goes down again, the window is reset. This setting can prevent a sudden demand for resources when a node is brought back online and the rest of the cluster attempts to replay a large volume of hinted writes.
NodeSync tries to validate all tables within their respective deadlines while respecting the configured rate limit. If a table is 10 GB with deadline_target_sec=10 and rate_in_kb is set to 1 MB/sec, validation cannot finish quickly enough: at 1 MB/sec, 10 GB takes about 10,240 seconds (nearly three hours). Hence, both parameters need to be tuned together.
rate_in_kb sets the per-node rate of the local NodeSync service; it controls the maximum number of bytes per second used to validate data. Each table with NodeSync enabled has deadline_target_sec set, which is the target for the maximum time between two validations of the same data. As long as the deadline is met, every part of the table's data is validated at least that often. Keep in mind that deadline_target_sec should always be less than or equal to the grace period (gc_grace_seconds).
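The arithmetic behind the 10 GB example can be sketched as below. This is only a rough lower bound under stated assumptions: it treats rate_in_kb as 1024-byte kilobytes per second, assumes one node validates the whole table by itself, and ignores replication and any scheduling overhead. The function names are hypothetical helpers, not part of any Cassandra or DSE API.

```python
def min_validation_seconds(table_size_bytes: int, rate_in_kb: int) -> float:
    """Lower bound on the time one node needs to validate a table once.

    Assumption: rate_in_kb is in units of 1024 bytes per second.
    """
    return table_size_bytes / (rate_in_kb * 1024)


def deadline_is_achievable(table_size_bytes: int, rate_in_kb: int,
                           deadline_target_sec: int) -> bool:
    """True if a full validation pass can fit inside the deadline."""
    return min_validation_seconds(table_size_bytes, rate_in_kb) <= deadline_target_sec


TABLE_SIZE = 10 * 1024**3   # 10 GB table from the example
RATE_IN_KB = 1024           # 1 MB/sec

print(min_validation_seconds(TABLE_SIZE, RATE_IN_KB))       # 10240.0
print(deadline_is_achievable(TABLE_SIZE, RATE_IN_KB, 10))   # False
```

With these numbers the deadline of 10 seconds can never be met, so either the rate has to go up or the deadline has to be relaxed (while staying within the grace period).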
Admin only
Question-: To speed up reads, Cassandra uses a data structure called a bloom filter. Which of the following statements about bloom filters are correct? A. It tells you that data definitely does not exist or definitely exists in an SSTable. B. It tells you that data probably does not exist in an SSTable or definitely exists in an SSTable. C. It tells you that data definitely does not exist in an SSTable or probably exists in an SSTable. D. Increasing bloom filter accuracy (by reducing the value of the bloom_filter_fp_chance property) requires more memory. E. For a rarely read database, it is better to set bloom_filter_fp_chance to a much higher number.
Answer: C, D, E Exp: When reading data, the Cassandra storage engine merges whatever is currently available in RAM (the memtable) with the data on disk (the SSTables). However, reading every SSTable just to find out which one holds the data is inefficient. For this reason each SSTable has a data structure known as a bloom filter. A bloom filter is probabilistic: it can tell you that the data you are looking for probably exists in this SSTable, and it can tell you with certainty that the data definitely does not exist in a particular SSTable.
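The "definitely absent or probably present" behaviour can be illustrated with a toy bloom filter. This is a minimal sketch for illustration only, not Cassandra's implementation: the class name, the bit-array size, the number of hashes, and the use of SHA-256 are all assumptions made here.

```python
import hashlib


class TinyBloomFilter:
    """Toy bloom filter: never gives false negatives, may give false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False: the key was definitely never added.
        # True: the key was probably added (could be a false positive).
        return all(self.bits[pos] for pos in self._positions(key))


sstable_filter = TinyBloomFilter()
for key in ("alice", "bob", "carol"):
    sstable_filter.add(key)

print(sstable_filter.might_contain("alice"))  # True: added keys are never reported absent
print(sstable_filter.might_contain("dave"))   # False would mean definitely absent
```

A read path built on such a filter can skip any SSTable whose filter answers False, and only pays for disk I/O when the answer is True.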
Hence, a bloom filter does not guarantee that the data exists in a given SSTable. Bloom filters can be made more accurate by allowing them to consume more RAM. As an administrator or operator you can tune the probability by setting the bloom_filter_fp_chance parameter, which takes a value between 0 and 1. The default depends on the compaction strategy: 0.1 for LeveledCompactionStrategy and 0.01 for the other strategies.
Remember that bloom filters are stored in RAM, but off-heap. Hence, when sizing the heap, you should not count the bloom filters. If the value of bloom_filter_fp_chance decreases, the memory requirement increases.
Hence, bloom_filter_fp_chance sets the false-positive chance of the filter, and the engine decides which SSTables to scan based on the filter's answer. The parameter should therefore be tuned as follows:
1. If you have plenty of RAM but still use slow mechanical disks, use a lower bloom_filter_fp_chance, e.g. 0.01, to avoid excess I/O.
2. If you have little RAM and very fast disks such as SSDs, reduce RAM usage by setting bloom_filter_fp_chance to a higher value.
3. Similarly, for analytics workloads where the data is always read in whole or the entire table is scanned, bloom filters bring little benefit, so the value can again be set higher.
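The memory side of this trade-off can be estimated with the textbook sizing formula for an optimally configured bloom filter, bits per key = -ln(p) / (ln 2)^2, where p is the false-positive chance. The sketch below applies that formula; it is a theoretical estimate, and the memory Cassandra actually allocates may differ. The function names and the 100 million key count are illustrative assumptions.

```python
import math


def bloom_bits_per_key(fp_chance: float) -> float:
    """Theoretical bits per key for a target false-positive chance."""
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be strictly between 0 and 1")
    return -math.log(fp_chance) / (math.log(2) ** 2)


def bloom_size_mib(num_keys: int, fp_chance: float) -> float:
    """Approximate filter size in MiB for num_keys partition keys."""
    return num_keys * bloom_bits_per_key(fp_chance) / 8 / 1024**2


for p in (0.001, 0.01, 0.1):
    print(f"fp_chance={p}: {bloom_bits_per_key(p):.2f} bits/key, "
          f"{bloom_size_mib(100_000_000, p):.1f} MiB per 100M keys")
```

Under this formula, going from 0.1 to 0.01 roughly doubles the bits needed per key (about 4.8 versus about 9.6), which is why a lower bloom_filter_fp_chance costs more RAM.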
Admin Only
Question-: You can change bloom_filter_fp_chance using the ALTER TABLE statement, and the change is applied immediately to existing as well as new SSTables. A. True B. False
Answer: B Exp: You can see the current value of bloom_filter_fp_chance using the DESCRIBE TABLE command, and you change it by running an ALTER TABLE command. When you change the bloom_filter_fp_chance value for a table, the effect is not immediate. The bloom filter is calculated with the new value only when a new SSTable file is written and persisted to disk. Existing SSTables are not modified; they pick up the new value only when they are rewritten during compaction.
If you want to initiate the SSTable rewrite then you can use the below commands