DataStax Cassandra Administrator Certification Questions and Answers (Practice Questions and Dumps)

Question-: Match the following configuration parameters with their usage.

A. gc_grace_seconds
B. max_hint_window_in_ms
C. deadline_target_sec
D. rate_in_kb

1. Defines how long Cassandra keeps tombstones around.
2. Once this window expires, nodes stop saving hints.
3. Target for the maximum time between two validations of the same data.
4. Maximum number of bytes per second used to validate the data.

Answer: A-1, B-2, C-3, D-4

Explanation: gc_grace_seconds defines how long Cassandra keeps tombstones around. Tombstones are special values Cassandra writes instead of the actual data whenever data is deleted or its TTL expires. The default value of gc_grace_seconds is 10 days (864000 seconds). On a single-node cluster we can set this value to 0.

Next is max_hint_window_in_ms (default 3 hours). This is the maximum time for which hints are generated for a node that does not respond. After this interval, new hints are no longer generated until the node is back up and responsive. If a node comes up and then goes down again, the window is reset. This setting can prevent a sudden demand for resources when a node is brought back online and the rest of the cluster attempts to replay a large volume of hinted writes.
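The hint-window rule above can be sketched as a small check. This is a simplified model for illustration, not Cassandra's actual code; the function name and its arguments are hypothetical, and only the 3-hour default comes from the text.

```python
from typing import Optional

# Default max_hint_window_in_ms as stated in the text: 3 hours.
THREE_HOURS_MS = 3 * 60 * 60 * 1000


def should_store_hint(down_since_ms: Optional[int], now_ms: int,
                      max_hint_window_in_ms: int = THREE_HOURS_MS) -> bool:
    """Return True if a hint would still be generated for the replica.

    down_since_ms is the time the replica most recently went down, or None
    if it is currently up. Because it tracks the most recent outage, a node
    that comes back and goes down again starts a fresh window.
    (Hypothetical helper; a simplified model of the rule described above.)
    """
    if down_since_ms is None:
        return False  # replica is up, so no hint is needed
    return (now_ms - down_since_ms) <= max_hint_window_in_ms


one_hour = 60 * 60 * 1000
print(should_store_hint(0, 1 * one_hour))  # inside the window
print(should_store_hint(0, 4 * one_hour))  # window expired
```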

NodeSync tries to validate all tables within their respective deadlines, while respecting the configured rate limit. If a table is 10GB with deadline_target_sec=10 and rate_in_kb is set to 1MB/sec, validation cannot happen quickly enough: at that rate 10GB takes roughly 10,000 seconds to validate. Hence, we need to tune both of these parameters together.

rate_in_kb sets the per-node rate of the local NodeSync service; it controls the maximum number of bytes per second used to validate the data. Each table with NodeSync enabled has deadline_target_sec set, which is the target for the maximum time between 2 validations of the same data. As long as that deadline is met, every part of the table is validated at least that often. Keep in mind that deadline_target_sec should always be less than or equal to the grace period (gc_grace_seconds).
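The arithmetic behind the 10GB example can be checked with a short sketch. It assumes a single table validated at exactly the configured rate and ignores everything else NodeSync does; the function names are hypothetical.

```python
import math


def seconds_to_validate(table_bytes: int, rate_in_kb: int) -> float:
    """Time to validate a table once at a steady rate of rate_in_kb KB/s."""
    return table_bytes / (rate_in_kb * 1024)


def min_rate_in_kb(table_bytes: int, deadline_target_sec: int) -> int:
    """Smallest rate (KB/s) that validates the table within the deadline."""
    return math.ceil(table_bytes / (1024 * deadline_target_sec))


table_bytes = 10 * 1024 ** 3   # 10GB table from the example
rate_in_kb = 1024              # 1MB/sec expressed in KB/s
deadline_target_sec = 10

needed = seconds_to_validate(table_bytes, rate_in_kb)
print(needed)                                   # 10240.0 seconds
print(needed <= deadline_target_sec)            # False: deadline is missed
print(min_rate_in_kb(table_bytes, deadline_target_sec))  # 1048576 KB/s
```

In this model, meeting the deadline means either raising rate_in_kb or, more realistically, relaxing deadline_target_sec.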

Admin only

Question-: Cassandra uses a data structure named bloom filter on the read path. Which of the following statements about the bloom filter are correct?
A. It tells you that data definitely does not exist or definitely exists in an SSTable.
B. It tells you that data probably does not exist in an SSTable or definitely exists in an SSTable.
C. It tells you that data definitely does not exist in an SSTable or probably exists in an SSTable.
D. Increasing the bloom filter size (by reducing the value of the bloom_filter_fp_chance property) will require more memory.
E. In a rarely read database, it is better to set bloom_filter_fp_chance to a much higher number.

Answer: C, D, E
Exp: When reading data, the Cassandra storage engine merges whatever is currently in RAM (the memtable) with the data on disk (the SSTables). However, reading every SSTable to find out which one holds the data is not a good idea. Hence Cassandra uses a data structure known as a Bloom filter. A Bloom filter is probabilistic: it can tell you that the data you are looking for probably exists in a given SSTable, and it can tell you with certainty that the data definitely does not exist in that SSTable.
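The "definitely not / probably yes" behaviour can be seen in a toy Bloom filter. This is a minimal sketch for illustration only: the class, its sizes, and the use of SHA-256 are assumptions, not Cassandra's implementation.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Toy Bloom filter (hypothetical; not Cassandra's implementation)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True only means "probably present".
        return all(self.bits[pos] for pos in self._positions(key))


sstable_filter = BloomFilter(num_bits=1024, num_hashes=3)
sstable_filter.add("row-1")
print(sstable_filter.might_contain("row-1"))  # True: added keys are never missed
```

A key that was added always reports True, so a False answer lets the read path skip that SSTable safely. A True answer for a key that was never added is a false positive and costs one unnecessary SSTable read.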

Hence, a Bloom filter does not guarantee that the data exists in a given SSTable. Bloom filters can be made more accurate by allowing them to consume more RAM. As an administrator or operator, you can tune the probability with the bloom_filter_fp_chance parameter, which should be between 0 and 1. The default is 0.01 for most compaction strategies and 0.1 for LeveledCompactionStrategy.
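The trade-off between accuracy and RAM can be estimated with the textbook Bloom filter sizing formula, bits = -n * ln(p) / (ln 2)^2. This is the generic formula, used here as an assumption; Cassandra's actual sizing may differ, and the function name is hypothetical.

```python
import math


def bloom_filter_bits(num_keys: int, fp_chance: float) -> int:
    """Approximate bits an optimally sized Bloom filter needs (textbook formula)."""
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be strictly between 0 and 1")
    return math.ceil(-num_keys * math.log(fp_chance) / (math.log(2) ** 2))


keys = 1_000_000
for p in (0.1, 0.01):
    bits = bloom_filter_bits(keys, p)
    print(p, round(bits / keys, 2), "bits per key")
```

Under this formula, going from a false-positive chance of 0.1 to 0.01 roughly doubles the memory per key.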

Remember that Bloom filters are kept in RAM, but off-heap. Hence, you do not need to account for Bloom filters when sizing the JVM heap. If the value of bloom_filter_fp_chance decreases, the memory requirement increases.

Hence, the Bloom filter setting controls the false-positive chance, which determines how many SSTables the engine scans unnecessarily. The parameter should be tuned as follows.
1. If you have plenty of RAM but are still using slow mechanical disks, use a lower bloom_filter_fp_chance, e.g. 0.01, to avoid excess I/O.
2. If you have little RAM and very fast disks such as SSDs, limit RAM usage by setting bloom_filter_fp_chance higher.
3. If the data is rarely read, set bloom_filter_fp_chance to a much higher value.
4. Similarly, for analytics workloads where the data is always read in full or scanned entirely, keep the value higher, since the Bloom filter saves little there.

Admin Only

Question-: You can change bloom_filter_fp_chance using the ALTER TABLE statement, and the change is applied immediately to existing as well as new SSTables?
A. True
B. False

Answer: B
Exp: You can see the current Bloom filter setting with the DESCRIBE TABLE command, and you change it with the ALTER TABLE command.
When you change the bloom_filter_fp_chance value for a table, the effect is not immediate. The Bloom filter is calculated with the new value only when a new SSTable is written and persisted to disk. Existing SSTables are not modified until they are rewritten, for example when compaction happens.

If you want to trigger the SSTable rewrite yourself, you can use the commands below:

- nodetool scrub
- nodetool upgradesstables -a
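The two steps, changing the table property and then forcing a rewrite, can be sketched as helpers that only build the statements. The keyspace and table names (shop, orders) and both function names are hypothetical, and nothing here connects to a cluster.

```python
from typing import List


def alter_bloom_filter_fp_chance(keyspace: str, table: str, fp_chance: float) -> str:
    """Build the CQL that changes bloom_filter_fp_chance for one table."""
    if not 0.0 < fp_chance <= 1.0:
        raise ValueError("bloom_filter_fp_chance should be between 0 and 1")
    return f"ALTER TABLE {keyspace}.{table} WITH bloom_filter_fp_chance = {fp_chance};"


def rewrite_sstables_command(keyspace: str, table: str) -> List[str]:
    """Build the nodetool command that rewrites all SSTables of one table."""
    return ["nodetool", "upgradesstables", "-a", keyspace, table]


# Hypothetical keyspace and table names, for illustration only.
print(alter_bloom_filter_fp_chance("shop", "orders", 0.01))
print(" ".join(rewrite_sstables_command("shop", "orders")))
```

The CQL statement would be run once through cqlsh or a driver, while the nodetool command acts on the local node and so would be run on each node in turn.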

Admin only

Related Questions

Question-: You have a Cassandra cluster spanning two data centers. One of the seed nodes in this cluster is down and you have to replace it. Which of the following would you do?
A. Update the cassandra.yaml file on each node and remove the IP of the dead node from the seed list.
B. Update the cassandra.yaml file on each node and add the IP of the new node to the seed list.
C. Perform a rolling restart of all nodes so that they are aware of the changes in the seed list.
D. Update the jvm.options file of the new node and set the replace_address property to the IP address of the dead node.

Question-: What are the functions of the seed nodes?
A. They are contacted during bootstrapping to get the gossip info.
B. Any node can contact a seed node at any time to get the gossip info.
C. Seed nodes are always used when data is read from the cluster.
D. They are also known as coordinator nodes.

Question-: You want to replace a currently running node in the cluster in order to apply a software patch to that node. Which of the following is correct?
A. You would first add a new node and then remove the old node on which the patch should be applied.
B. You would replace the node by using the replace_address property in the jvm.options file.
C. You must make sure that consistency level ONE is used on the old node.
D. You must make sure that consistency level ONE is not used on the old node.

Question-: You have created the Cassandra cluster using the “GossipPropertyFileSnitch”. This is a node cluster. Now you have found that one of the nodes in the cluster is placed in the wrong rack. What would you do to fix that?
A. Decommission the node and re-add it to the correct rack and datacenter.
B. Update the node’s topology and start the node.
C. Update the cassandra.yaml file and restart the node.
D. Bring down the cluster and then place the node in the correct rack.

Question-: Your Cassandra cluster is set up across multiple datacenters, and you want to remove one of the datacenters from the cluster. Which of the following steps do you have to perform, at a minimum, to remove the datacenter from the cluster?
A. No client should write to the nodes in the datacenter that is going to be removed.
B. Run “nodetool repair --full”.
C. Update the keyspaces so that they no longer point to the datacenter that is going to be removed.
D. Shut down all the nodes in the datacenter that is being removed.
E. Run the “nodetool assassinate” command on every node in the datacenter being removed.
F. Restart all the nodes in the remaining two datacenters.

Question-: Which of the following statements are true with regard to the “nodetool drain” command?
A. It flushes all the SSTables to disk.
B. It flushes all the memtables to SSTables on disk.
C. It replays data from the commit log.
D. Cassandra stops listening for connections from clients and other nodes.
E. You should use this command before upgrading a node to a newer version of Cassandra.