DataStax Cassandra Administrator Certification Questions and Answers (Practice Questions and Dumps)

Question-: Which of the following statements are true?
A. A memtable is maintained on a per-table basis.
B. An SSTable is maintained on a per-table basis.
C. The commit log is shared among tables.
D. A memtable is shared among tables.
E. An SSTable is shared among tables.

Answer: A,B,C

Explanation: Memtables and SSTables are maintained on a per-table basis, while the commit log is shared among all tables. SSTables are immutable: once a memtable has been flushed to an SSTable, that SSTable is never written to again. As a result, a single partition can span multiple SSTables.
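To make the per-table versus shared split concrete, here is a toy Python model of one node's write path. It is a sketch under simplifying assumptions, not Cassandra's storage engine: the `ToyNode` class and its methods are hypothetical, SSTables are plain dicts, and the commit log is never truncated.

```python
from collections import defaultdict


class ToyNode:
    """Toy model of one node's write path (hypothetical, not Cassandra's code)."""

    def __init__(self):
        self.commit_log = []                # one append-only log shared by all tables
        self.memtables = defaultdict(dict)  # table -> {partition_key: {column: value}}
        self.sstables = defaultdict(list)   # table -> list of immutable flushed snapshots

    def write(self, table, key, column, value):
        # Every mutation is appended to the shared commit log first...
        self.commit_log.append((table, key, column, value))
        # ...then applied to the memtable that belongs to this table only.
        self.memtables[table].setdefault(key, {})[column] = value

    def flush(self, table):
        # Freeze this table's memtable into a new SSTable; older SSTables are never rewritten.
        frozen = {key: dict(cols) for key, cols in self.memtables[table].items()}
        self.sstables[table].append(frozen)
        self.memtables[table] = {}

    def sstables_holding(self, table, key):
        # A partition written before and after a flush ends up in several SSTables.
        return sum(1 for sstable in self.sstables[table] if key in sstable)


if __name__ == "__main__":
    node = ToyNode()
    node.write("users", "u1", "name", "Ann")
    node.write("orders", "o1", "total", 10)
    node.flush("users")
    node.write("users", "u1", "email", "ann@example.com")
    node.flush("users")
    print(node.sstables_holding("users", "u1"))  # 2: partition u1 spans two SSTables
    print(len(node.commit_log))                  # 3: one log serves both tables
```

Note that flushing `users` leaves the `orders` memtable untouched, which mirrors the per-table memtable behaviour described above.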

Admin and Dev both

Question-: There is a process called compaction for merging SSTables. Which of the following statements are true?
A. During inserts and updates, the Cassandra engine overwrites existing rows with the inserts and updates.
B. The engine does not perform deletes by removing the deleted data. Instead, the database marks deleted data with a tombstone.
C. During compaction there is a temporary spike in disk space usage as well as disk I/O.
D. The database can read data from new SSTables even before the compaction process finishes.
E. During compaction there would be a high rate of cache misses.
F. Out-of-date versions of a row may still exist on other nodes even after compaction has run on one node.

Answer: B,C,D,F

Explanation: Compaction works on a collection of SSTables. It gathers the fragments of each row from across those SSTables and assembles them into one complete row. Because SSTables are sorted by partition key, the merge reads them sequentially, so there is no random I/O and the overall process is efficient. However, disk usage spikes during compaction, because for a while the old and new SSTables co-exist. The Cassandra engine can read data directly from the new SSTable even before it finishes writing.

As the database processes writes and reads, it replaces the old SSTables with the new ones in the page cache. Caching the new SSTable while directing reads away from the old ones is incremental, so it does not cause a dramatic spike in cache misses. This keeps performance predictable even under heavy load.

Cassandra can keep replicas of each row on two or more nodes, and each node performs compaction independently. This means that out-of-date versions of a row may have been dropped on one node but still exist on another.
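A minimal Python sketch of the merge step described above. It assumes each SSTable is a dict mapping a key to a `(timestamp, value)` pair, with `None` standing in for a tombstone; the `compact` function is hypothetical, works at whole-key rather than per-cell granularity, and treats a tombstone's write timestamp as its deletion time. It keeps the newest version of each key and drops tombstones older than `gc_grace_seconds`.

```python
TOMBSTONE = None  # stands in for a deletion marker in this toy model


def compact(sstables, now, gc_grace_seconds):
    """Merge toy SSTables into one, keeping the newest version of each key.

    Each SSTable is a dict: key -> (write_timestamp, value); TOMBSTONE means deleted.
    Assumptions: whole-key granularity, and a tombstone's timestamp is its deletion time.
    """
    merged = {}
    for sstable in sstables:
        for key, (ts, value) in sstable.items():
            # Last write wins: keep the version with the highest timestamp.
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    # Tombstones younger than gc_grace_seconds must survive so that replicas which
    # missed the delete can still learn about it; older ones can be purged.
    return {
        key: (ts, value)
        for key, (ts, value) in merged.items()
        if not (value is TOMBSTONE and now - ts > gc_grace_seconds)
    }


if __name__ == "__main__":
    older = {"a": (100, "x"), "b": (100, "y")}
    newer = {"a": (200, "x2"), "b": (150, TOMBSTONE)}
    print(compact([older, newer], now=1_000, gc_grace_seconds=864_000))
    # {'a': (200, 'x2'), 'b': (150, None)}: tombstone kept inside the grace period
    print(compact([older, newer], now=1_000_000, gc_grace_seconds=864_000))
    # {'a': (200, 'x2')}: tombstone purged once gc_grace_seconds has passed
```

Because the merged output is written as a new SSTable while the input SSTables still exist, disk usage briefly covers both, which is the space spike mentioned above.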

Admin and Dev only

Question-: Consider a Cassandra cluster set up with a replication factor of 3 to prevent data loss. Which of the following statements are true when you delete a row/data?

A. Consider you have a setup of Cassandra cluster with the replication factor as 3 to prevent data loss. Which of the following statement is true when you delete a row/data?
B. If a node has a record marked with a tombstone and another node has more recent changes, then a read would not return the data.
C. If a node has a record with a tombstone marker and another node has an older value for the record, then a read will return the data/record.
D. If a client writes a new update to an existing tombstoned record within the grace period, the existing tombstone is overwritten.
E. The storage engine uses hinted handoffs to replay the database mutations that a node missed while it was down.

Answer: A, D, E
Explanation: As stated in the question, this is a multi-node Cassandra cluster, which stores replicas of the same data on two or more nodes. If a node receives a delete for data it stores locally, it marks the specified record for deletion and tries to pass the tombstone to the other nodes holding replicas of that record. If one replica node is unresponsive at that time, it does not receive the tombstone immediately, so it still contains the pre-delete version of the record. If the tombstone has already been purged from the rest of the cluster before that node recovers, the database treats the record on the recovered node as new data and propagates it to the rest of the cluster. Such deleted-but-persisting records are called zombies.

To prevent zombies from reappearing, the database gives each tombstone a grace period. The purpose of the grace period is to give unresponsive nodes time to recover and process tombstones normally. When a read request gathers answers from multiple replicas and those responses differ, whichever values are most recent take precedence. For example, if one node has a tombstone but another node has a more recent change, the final result includes the more recent change.

If a node has a tombstone and another node has only an older value for the record, the final result contains the tombstone. If a client writes a new update to the tombstoned record during the grace period, the database overwrites the tombstone.
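The reconciliation rules above amount to last-write-wins on timestamps. A minimal sketch, assuming each replica response is a hypothetical `(timestamp, value)` pair where `None` marks a tombstone; real Cassandra reconciles per cell and has tie-breaking rules that this sketch ignores.

```python
from typing import List, Optional, Tuple

# (write timestamp, value); a value of None represents a tombstone.
Version = Tuple[int, Optional[str]]


def reconcile(responses: List[Version]) -> Version:
    # Last write wins: the response with the highest timestamp takes precedence.
    return max(responses, key=lambda version: version[0])


def read_result(responses: List[Version]) -> Optional[str]:
    _, value = reconcile(responses)
    return value  # None means the row reads as deleted


if __name__ == "__main__":
    print(read_result([(200, None), (300, "newer")]))      # newer: later change beats the tombstone
    print(read_result([(200, None), (100, "older")]))      # None: tombstone beats the older value
    print(read_result([(200, None), (250, "rewritten")]))  # rewritten: update overwrites the tombstone
```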

When an unresponsive node recovers, the engine uses hinted handoffs to replay the database mutations that the node missed while it was down. Cassandra does not replay a mutation for a tombstone during its grace period. If the node does not recover until after the grace period ends, the deletion might be missed.
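The zombie scenario can be replayed with a small simulation. The timeline, the replica names A, B and C, and the helper functions are all hypothetical; hinted handoff is ignored (as if the hint window had already expired), and compaction is assumed to have run on A and B just before C comes back.

```python
from typing import List, Optional, Tuple

Version = Tuple[int, Optional[str]]  # (timestamp, value); value None = tombstone


def after_compaction(version: Optional[Version], now: int,
                     gc_grace_seconds: int) -> Optional[Version]:
    # Compaction may purge a tombstone once it is older than gc_grace_seconds.
    if version is not None and version[1] is None and now - version[0] > gc_grace_seconds:
        return None
    return version


def coordinator_read(replicas: List[Optional[Version]]) -> Optional[str]:
    # Last write wins across whatever versions the replicas still hold.
    present = [v for v in replicas if v is not None]
    if not present:
        return None
    return max(present, key=lambda v: v[0])[1]


def read_after_recovery(recover_at: int, gc_grace_seconds: int) -> Optional[str]:
    # Hypothetical timeline: "v1" written at t=100 to replicas A, B and C;
    # C goes down, and the delete at t=200 reaches only A and B.
    a: Optional[Version] = (200, None)
    b: Optional[Version] = (200, None)
    c: Optional[Version] = (100, "v1")  # C never received the tombstone
    a = after_compaction(a, recover_at, gc_grace_seconds)
    b = after_compaction(b, recover_at, gc_grace_seconds)
    return coordinator_read([a, b, c])


if __name__ == "__main__":
    print(read_after_recovery(200 + 3_600, 864_000))    # None: tombstone still wins
    print(read_after_recovery(200 + 900_000, 864_000))  # v1: the deleted value is back (zombie)
```

The only difference between the two runs is whether C returns before or after `gc_grace_seconds` has elapsed since the delete.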

Admin and Dev only

Related Questions

Question-: In recent versions of DataStax Enterprise (DSE), you can repair data using the NodeSync service, which runs in the background. Which of the following statements correctly apply to NodeSync?
A. Using nodetool, you can start, stop, and enable the NodeSync service.
B. NodeSync can only be enabled for all tables or not at all.
C. NodeSync works on segments, which are specific to a table and are created by dividing the token range into equal-sized parts.
D. NodeSync prioritizes segments in order to meet the per-table deadline target.
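The segment and deadline ideas in the options above can be illustrated by splitting a token range into contiguous, near-equal pieces and then picking whichever segment is due first. This is a sketch under assumptions: it uses the Murmur3Partitioner token bounds, `split_range` and `next_segment` are hypothetical helpers, and NodeSync's real segmentation (which works on each node's local ranges) and its scheduler are more involved than an earliest-deadline-first pick.

```python
MIN_TOKEN = -(2 ** 63)   # lowest Murmur3Partitioner token
MAX_TOKEN = 2 ** 63 - 1  # highest Murmur3Partitioner token


def split_range(start, end, segments):
    """Split the inclusive token range [start, end] into contiguous, near-equal segments."""
    total = end - start + 1
    return [
        (start + total * i // segments, start + total * (i + 1) // segments - 1)
        for i in range(segments)
    ]


def next_segment(candidates):
    """Earliest-deadline-first pick (a hypothetical simplification, not NodeSync's scheduler).

    Each candidate is (table, segment_index, last_validated_at, deadline_target_sec).
    """
    return min(candidates, key=lambda c: c[2] + c[3])


if __name__ == "__main__":
    print(split_range(0, 99, 4))  # [(0, 24), (25, 49), (50, 74), (75, 99)]
    full = split_range(MIN_TOKEN, MAX_TOKEN, 8)
    print(full[0][0] == MIN_TOKEN, full[-1][1] == MAX_TOKEN)  # True True
    # The orders segment is due first because 5_000 + 3_600 < 1_000 + 864_000.
    print(next_segment([("users", 0, 1_000, 864_000), ("orders", 3, 5_000, 3_600)]))
```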

Question-: NodeSync repairs data at the table level, and each table is further divided into segments. Which of the following statements are valid in this case?
A. While a particular segment is being repaired, it is marked as locked in the nodesync_status table.
B. NodeSync depends on the read repair path.
C. Even if the WAN (Wide Area Network) link between datacenters is poor, NodeSync performance would not be affected.
D. NodeSync validates data only if the replication factor is 2 or more.

Question-: When NodeSync needs to repair the data in a particular segment, it follows the read repair path. Arrange the following steps in the order of the read repair flow.

A. Read data from all replicas
B. Pick the data with the latest timestamp
C. Repair node with stale data
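The three steps can be sketched in Python for a single cell. The replica names and the `read_repair` function are hypothetical; the sketch assumes each replica returns a `(timestamp, value)` pair and ignores consistency levels and per-cell merging.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, str]  # (write timestamp, value)


def read_repair(replicas: Dict[str, Version]) -> Tuple[Version, List[str]]:
    """Toy read repair over one cell held by several replicas."""
    # 1. Read data from all replicas (here the dict already holds every response).
    responses = dict(replicas)
    # 2. Pick the data with the latest timestamp.
    winner = max(responses.values(), key=lambda v: v[0])
    # 3. Repair every replica holding stale data by writing the winner back to it.
    repaired = []
    for node, version in responses.items():
        if version != winner:
            replicas[node] = winner
            repaired.append(node)
    return winner, repaired


if __name__ == "__main__":
    cluster = {"n1": (100, "a"), "n2": (200, "b"), "n3": (100, "a")}
    print(read_repair(cluster))  # ((200, 'b'), ['n1', 'n3'])
    print(cluster)               # every replica now holds (200, 'b')
```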

Question-: Match the following configuration parameters and their usage.

A. gc_grace_seconds
B. max_hint_window_ms
C. deadline_target_sec
D. rate_in_kb

1. Defines how long Cassandra keeps tombstones around before they can be purged.
2. Once this window expires, nodes stop saving hints for a node that is down.
3. Target maximum time between two NodeSync validations of the same data.
4. Maximum number of kilobytes per second used to validate the data.
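The four parameters interact, and a few back-of-envelope checks show how. The helper names below are hypothetical and the rules are rules of thumb, not limits enforced by Cassandra or DSE: validating a node's data once at `rate_in_kb` takes roughly size / rate seconds, which should fit inside `deadline_target_sec`; validation (like repair) should complete within `gc_grace_seconds`; and a node that is down longer than `max_hint_window_ms` stops receiving hints and needs repair. Apache Cassandra's defaults are 864000 s (10 days) for `gc_grace_seconds` and 10800000 ms (3 hours) for `max_hint_window_ms`.

```python
def validation_seconds(data_kb, rate_in_kb):
    # Seconds needed to read data_kb once when validation is throttled to rate_in_kb KB/s.
    return data_kb / rate_in_kb


def nodesync_warnings(data_kb, rate_in_kb, deadline_target_sec, gc_grace_seconds):
    """Rule-of-thumb checks for a hypothetical table; not limits enforced by DSE."""
    warnings = []
    if validation_seconds(data_kb, rate_in_kb) > deadline_target_sec:
        warnings.append("rate_in_kb is too low to validate the data within deadline_target_sec")
    if deadline_target_sec > gc_grace_seconds:
        warnings.append("deadline_target_sec exceeds gc_grace_seconds")
    return warnings


def outage_action(downtime_sec, max_hint_window_ms, gc_grace_seconds):
    """What a node needs after being down for downtime_sec (rule of thumb)."""
    if downtime_sec > gc_grace_seconds:
        return "zombie risk: tombstones may already be purged on other replicas"
    if downtime_sec * 1000 > max_hint_window_ms:
        return "hints stopped: repair before gc_grace_seconds expires"
    return "hinted handoff replays the missed writes"


if __name__ == "__main__":
    one_gb_kb = 1024 * 1024
    print(validation_seconds(one_gb_kb, 1024))  # 1024.0 seconds for 1 GB at 1024 KB/s
    print(nodesync_warnings(10 * one_gb_kb, 1024, 3_600, 864_000))  # 10240 s > 3600 s: one warning
    print(outage_action(3_600, 10_800_000, 864_000))      # within the 3-hour hint window
    print(outage_action(86_400, 10_800_000, 864_000))     # past the hint window
    print(outage_action(1_000_000, 10_800_000, 864_000))  # past gc_grace_seconds
```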

Question-: To speed up reads, Cassandra uses a data structure called a Bloom filter. Which of the following statements correctly apply to Bloom filters?
A. It helps to determine that data definitely does not exist or exists in an SSTable.
B. It helps to determine that data probably does not exist in an SSTable or definitely exists in an SSTable.
C. It helps to determine that data definitely does not exist in an SSTable or probably exists in an SSTable.
D. Increasing the Bloom filter size (by reducing the value of the bloom_filter_fp_chance property) requires more memory.
E. In a rarely read database, it is better to set bloom_filter_fp_chance to a much higher number.
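A minimal Bloom filter shows the "definitely absent versus probably present" behaviour and why a lower false-positive chance costs memory. This is a generic textbook sketch, not Cassandra's implementation: the `BloomFilter` class is hypothetical, hashes with SHA-256 plus double hashing, and sizes itself with the standard formulas m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, where p plays the role of `bloom_filter_fp_chance`.

```python
import hashlib
import math


class BloomFilter:
    """Generic Bloom filter sketch (hypothetical; not Cassandra's implementation)."""

    def __init__(self, expected_items: int, fp_chance: float):
        # Optimal bit count m = -n * ln(p) / (ln 2)^2 and hash count k = (m / n) * ln 2.
        self.size = max(1, math.ceil(-expected_items * math.log(fp_chance) / math.log(2) ** 2))
        self.hashes = max(1, round(self.size / expected_items * math.log(2)))
        self.bits = bytearray(self.size)  # one byte per bit, for readability

    def _positions(self, key: str):
        # Derive k bit positions from two 64-bit slices of a SHA-256 digest (double hashing).
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.hashes)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False: the key is definitely absent. True: the key is only probably present.
        return all(self.bits[pos] for pos in self._positions(key))


if __name__ == "__main__":
    for p in (0.1, 0.01, 0.001):
        print(p, BloomFilter(1000, p).size)  # smaller fp chance -> more bits per key
```

For 1000 keys, the formula gives about 4.8k bits at 0.1, 9.6k bits at 0.01 and 14.4k bits at 0.001, so each tenfold reduction in the false-positive chance adds roughly the same amount of memory.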

Question-: True or false: you can change bloom_filter_fp_chance with an ALTER TABLE statement, and the change is applied immediately to existing as well as new SSTables.
A. True
B. False