DataStax Cassandra Administrator Certification Questions and Answers (Practice Questions and Dumps)

Question-: Which of the following statements are correct for the underlying storage engine of Cassandra?
A. Cassandra follows a read-before-write strategy.
B. In most cases, the Cassandra storage engine groups inserts and updates in memory and, at intervals, writes the data to disk in append mode.
C. Cassandra database sequentially writes immutable files.
D. A,B
E. B,C

Answer: E
Exp: Cassandra avoids reading before writing. Read-before-write can cause large latencies in read performance and other problems. To avoid it, the storage engine groups inserts and updates in memory and, at intervals, sequentially writes the data to disk in append mode. Once written to disk, the data is immutable and is never overwritten.

Because the Cassandra storage engine writes immutable files sequentially, it avoids write amplification and reduces the risk of disk failure, so the database accommodates inexpensive consumer SSDs extremely well.
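To make the write model above concrete, here is a toy log-structured store in Python. It is a minimal sketch under simplifying assumptions: `ToyStorageEngine`, its flush threshold, and the explicit timestamps are hypothetical and are not Cassandra's classes or settings. It shows that an update never reads existing data from disk and that flushed files are only appended, never overwritten.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStorageEngine:
    """Toy log-structured store: a teaching model, not Cassandra's code."""
    flush_threshold: int = 3
    memtable: dict = field(default_factory=dict)  # key -> (timestamp, value)
    sstables: list = field(default_factory=list)  # immutable sorted tuples, oldest first

    def write(self, key, value, ts):
        # No read-before-write: nothing on disk is consulted. The memtable keeps
        # the newest timestamp if the same key is written twice before a flush.
        current = self.memtable.get(key)
        if current is None or ts >= current[0]:
            self.memtable[key] = (ts, value)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Append a new sorted, immutable "file"; existing files are never rewritten.
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key):
        # Reconcile every version of the key and return the newest (last write wins).
        cells = [self.memtable[key]] if key in self.memtable else []
        for sstable in self.sstables:
            cells.extend(cell for k, cell in sstable if k == key)
        return max(cells)[1] if cells else None


engine = ToyStorageEngine(flush_threshold=2)
engine.write("user:1", "alice", ts=1)
engine.write("user:2", "bob", ts=2)     # memtable reaches 2 keys and is flushed
engine.write("user:1", "alicia", ts=3)  # an update is just a newer cell, no disk read
print(engine.read("user:1"))  # alicia
```

The cost of skipping the read is paid at read time, when versions from the memtable and several files must be reconciled; Cassandra's compaction exists to keep that cost bounded.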

Admin and Dev both

Question-: Arrange the following steps in the correct order in which the Cassandra storage engine writes data.

A. Logging data in the commit log
B. Writing data to memtable
C. Flushing data from the memtable
D. Storing data on disk in SSTables

Answer: A,B,C,D
Exp: When a write happens, it is first logged in the commit log and also written to the memtable. The commit log is stored on disk, so logged writes survive even if power fails on a node. The memtable keeps the written data in sorted order until it reaches a configurable limit, and is then flushed to an SSTable.

When flushing the memtable, the database writes the data to disk and also creates a partition index on disk that maps tokens to locations on disk.

You can also flush data manually using the nodetool flush or nodetool drain command. Before restarting a node, it is recommended to flush the memtables, which reduces commit log replay time.
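The write order (commit log, memtable, flush, SSTable) and the reason for flushing before a restart can be sketched with a toy node. This is an illustration under simplifying assumptions: `ToyNode` is hypothetical, its partition index maps keys to row offsets rather than tokens to file positions, and the whole commit log is cleared on flush instead of recycling segments individually.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Toy write path: commit log -> memtable -> flush -> SSTable with a partition index."""
    memtable_limit: int = 2
    commit_log: list = field(default_factory=list)  # durable: survives a crash
    memtable: dict = field(default_factory=dict)    # in memory: lost on a crash
    sstables: list = field(default_factory=list)    # (sorted rows, partition index)

    def write(self, key, value):
        self.commit_log.append((key, value))  # A. log the write
        self.memtable[key] = value            # B. apply it to the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # C. flush when the limit is reached

    def flush(self):
        if not self.memtable:
            return
        rows = sorted(self.memtable.items())
        # D. store the rows plus an index from key to its position in the "file"
        index = {key: offset for offset, (key, _) in enumerate(rows)}
        self.sstables.append((rows, index))
        self.memtable = {}
        self.commit_log.clear()  # simplification: flushed writes need no replay

    def crash_and_restart(self):
        # The memtable is lost; anything not yet flushed is replayed from the log.
        self.memtable = dict(self.commit_log)
        return len(self.commit_log)


node = ToyNode(memtable_limit=2)
node.write("k1", "v1")
node.write("k2", "v2")           # reaches the limit and is flushed
node.write("k3", "v3")           # only in the commit log and memtable
print(node.crash_and_restart())  # 1
node.flush()                     # similar in spirit to nodetool flush before a restart
print(node.crash_and_restart())  # 0
```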

Admin only

Question-: There are two tables, Table_A and Table_B, with the following throughput:
- Table_A has extremely high throughput
- Table_B has very low throughput
Which of the following statements are correct with regard to memtables and commit log segments?

A. Commit logs are divided into segments.
B. New writes go to a new segment only when the previous segment is full.
C. When the commit log reaches its threshold, it forces the Table_B memtable to be flushed as well.
D. A,B
E. A,B,C

Answer: E (A,B,C)

Exp: Commit logs are made of segments. All writes are recorded in order, and a new segment is created when the existing segment is full. The engine purges a commit log segment only after all the data in that segment has been flushed from the memtables to disk.

All commit log segments contain writes from all tables (in this case from both Table_A and Table_B) as well as from the system tables. Because Table_A has high throughput, its memtable fills faster than Table_B's, so Table_B's memtable is flushed far less often than Table_A's. A segment cannot be purged while it still holds unflushed Table_B data, so old segments accumulate. When the commit log reaches its threshold, it forces the Table_B memtable to flush and then purges the segments.
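The segment behavior in this explanation can be simulated. The sketch below is a toy model, assuming that every segment is shared by all tables, that a segment is purged once no table has unflushed data in it, and that exceeding a maximum segment count stands in for the commit log size threshold. `ToyCommitLog` and all sizes are hypothetical, not Cassandra defaults.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    writes: int = 0
    dirty_tables: set = field(default_factory=set)  # tables with unflushed data here


@dataclass
class ToyCommitLog:
    segment_capacity: int = 4  # writes per segment (hypothetical)
    max_segments: int = 3      # stands in for the commit log size threshold
    memtable_limit: int = 4    # writes a memtable holds before a normal flush
    segments: list = field(default_factory=lambda: [Segment()])
    memtables: dict = field(default_factory=dict)  # table -> unflushed writes
    forced_flushes: list = field(default_factory=list)

    def write(self, table):
        if self.segments[-1].writes == self.segment_capacity:
            self.segments.append(Segment())        # new segment only when full
        self.segments[-1].writes += 1
        self.segments[-1].dirty_tables.add(table)  # segments mix all tables
        self.memtables[table] = self.memtables.get(table, 0) + 1
        if self.memtables[table] >= self.memtable_limit:
            self.flush(table)
        if len(self.segments) > self.max_segments:
            # Threshold reached: force-flush every table pinning the oldest segment.
            for stale in sorted(self.segments[0].dirty_tables):
                self.forced_flushes.append(stale)
                self.flush(stale)

    def flush(self, table):
        self.memtables[table] = 0
        for segment in self.segments:
            segment.dirty_tables.discard(table)
        # Purge old segments that no longer hold unflushed data; keep the active one.
        self.segments = [s for s in self.segments[:-1] if s.dirty_tables] + [self.segments[-1]]


log = ToyCommitLog()
for _ in range(4):
    for table in ("Table_A", "Table_A", "Table_A", "Table_B"):  # A is 3x busier
        log.write(table)
print(log.forced_flushes)  # ['Table_B']
```

Table_A always reaches its memtable limit on its own, while Table_B never does within the run, so Table_B's data pins the oldest segment and only the threshold check flushes it.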

Admin and Dev both

Related Questions

Question-: You have node Apache Cassandra cluster where the consistency level is set to QUORUM and the replication factor is . (CL=QUORUM, RF=). When a write happens and one of the nodes goes down, what happens in this case (other settings are default)?

A. The coordinator node will store the hint.
B. As the write has already succeeded on 2 nodes, it will return a successful write.
C. The coordinator node will return an UnavailableException.
D. When the failed node comes back after 6 hours, the coordinator node will replay the hint so that the 3rd copy of the data is created.

Question-: You have node cluster with CL=ANY and RF=. What happens when all the nodes where the data needs to be written are down (assume other settings are default)?

A. The coordinator node will store the hints.
B. Even though all 3 nodes are down, it will return a successful write.
C. It will wait for one of the 3 nodes to come back; until then, the write will hang.
D. If all the nodes come back after 4 hours, all the replicas will be copied from the coordinator node.
E. None of the above
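The two hinted handoff questions above turn on two rules: how many replica acknowledgements a consistency level needs (QUORUM is floor(RF/2) + 1) and when a coordinator keeps hints for down replicas. The toy coordinator below is a sketch under stated assumptions, not Cassandra's code: `required_acks`, `write_outcome`, and `keeps_hinting` are hypothetical names, hinted handoff is assumed enabled, the 3-hour window assumes the default `max_hint_window_in_ms`, and the RF values in the usage lines are only illustrative.

```python
from datetime import timedelta

# Assumption: mirrors the default max_hint_window_in_ms of 10800000 (3 hours).
HINT_WINDOW = timedelta(hours=3)


def required_acks(consistency_level, replication_factor):
    """Replica acknowledgements needed for a write (single-datacenter levels only)."""
    if consistency_level in ("ANY", "ONE"):
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def write_outcome(consistency_level, replication_factor, live_replicas):
    """Toy coordinator decision: returns (result, hints_stored).

    Assumes hinted handoff is enabled and the down replicas have only just failed.
    """
    down = replication_factor - live_replicas
    if consistency_level != "ANY" and live_replicas < required_acks(
        consistency_level, replication_factor
    ):
        # Too few live replicas: rejected before any replica is written or hinted.
        return "UnavailableException", 0
    # Write to the live replicas and keep one hint per down replica. For ANY,
    # a stored hint alone is enough to report success.
    return "success", down


def keeps_hinting(downtime, window=HINT_WINDOW):
    """New hints stop once a replica has been down longer than the window."""
    return downtime <= window


print(required_acks("QUORUM", 3))         # 2
print(write_outcome("QUORUM", 3, 2))      # ('success', 1)
print(write_outcome("QUORUM", 3, 1))      # ('UnavailableException', 0)
print(write_outcome("ANY", 3, 0))         # ('success', 3)
print(keeps_hinting(timedelta(hours=6)))  # False
```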

Question-: Which of the following would help in keeping the data in sync across the cluster?
A. Hinted Handoff
B. Read Repair
C. Anti-entropy repair

Question-: Your consistency level is set to ONE. Is a read repair operation always blocking?
A. True
B. False

Question-: You notice that your Cassandra database is occasionally out of sync, and you decide to enable the NodeSync utility. Which of the following statements are correct for this utility?
A. You have to schedule the NodeSync activity to run periodically, e.g. every 4 hours.
B. It has a high impact on the Cluster performance.
C. This does not require manual intervention.
D. Each node should run the NodeSync service.
E. NodeSync is enabled on a per-table basis; it validates the local data ranges of NodeSync-enabled tables and repairs any inconsistency it finds.
Answer: C,D,E
Exp: NodeSync is a utility service that repairs data in the Cassandra database. It runs in the background, has low overhead, provides consistent performance, and requires minimal manual effort (essentially just enabling it). It has the following features:
- Continuously validates that data is in sync on all replicas.
- Always running, but with low impact on cluster performance.
- Fully automatic, no manual intervention needed.
- Completely replaces anti-entropy repairs.
Each node runs the NodeSync service, which remains idle if there is nothing to validate. It must be enabled on a per-table basis, and it continuously validates the local data ranges of the tables on which it is enabled.

Remember: when NodeSync is enabled on a table, running the repair command on that table is rejected.
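The per-table behavior described above can be illustrated with a toy model. It is not DataStax Enterprise code or its API: `ToyNodeSync`, `validate_once`, and the per-replica digest input are hypothetical, and the real service validates ranges continuously in the background rather than in a single call.

```python
class ToyNodeSync:
    """Toy per-node NodeSync service (hypothetical, not the DSE implementation)."""

    def __init__(self):
        self.enabled_tables = set()

    def enable(self, table):
        # Stands in for turning NodeSync on for one table.
        self.enabled_tables.add(table)

    def validate_once(self, digests):
        """digests: {table: {local_token_range: [digest from each replica]}}."""
        if not self.enabled_tables:
            return None  # nothing to validate, so the service stays idle
        out_of_sync = []
        for table in sorted(self.enabled_tables):
            for token_range, replica_digests in digests.get(table, {}).items():
                if len(set(replica_digests)) > 1:  # replicas disagree on this range
                    out_of_sync.append((table, token_range))
        return out_of_sync

    def repair(self, table):
        # Mirrors the rule above: manual repair is refused on NodeSync-enabled tables.
        if table in self.enabled_tables:
            raise RuntimeError(f"repair rejected: NodeSync is enabled on {table}")
        return f"repairing {table}"


svc = ToyNodeSync()
print(svc.validate_once({}))  # None
svc.enable("ks.users")
digests = {"ks.users": {(0, 100): ["a", "a", "a"], (100, 200): ["b", "c", "b"]}}
print(svc.validate_once(digests))  # [('ks.users', (100, 200))]
print(svc.repair("ks.orders"))     # repairing ks.orders
```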

Admin only

Question-: Which of the following statements is/are correct?
A. Vnodes help in determining the partition range and in rebalancing the cluster when adding or removing nodes.
B. You must have the same token architecture across the entire cluster, meaning all the nodes must either be vnode-enabled or use the single-token architecture.
C. You can have one of the datacenters as transaction-only within the same cluster.
D. When adding more than one node to the cluster using the allocation algorithm, the nodes should not be added all at once; they should be added one by one.
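Statement A is easiest to see on a token ring. The sketch below assigns random tokens per node and checks which existing nodes hand data to a joining node. It is a simplified model: the helper names are hypothetical, tokens are chosen randomly rather than by the token allocation algorithm (`allocate_tokens_for_local_replication_factor`), and only the Murmur3Partitioner token range is borrowed from Cassandra.

```python
import bisect
import random

TOKEN_MIN, TOKEN_MAX = -2**63, 2**63 - 1  # Murmur3Partitioner token range


def random_tokens(node, num_tokens, rng):
    # Toy: random token choice, not Cassandra's token allocation algorithm.
    return [(rng.randint(TOKEN_MIN, TOKEN_MAX), node) for _ in range(num_tokens)]


def build_ring(nodes, num_tokens, seed=0):
    rng = random.Random(seed)
    ring = []
    for node in nodes:
        ring.extend(random_tokens(node, num_tokens, rng))
    return sorted(ring)


def owner(ring, token):
    """A node owns the range (previous token, its token]; wrap past the last token."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]


def donors_when_adding(nodes, new_node, num_tokens, seed=0):
    """Existing nodes that hand part of their data to new_node when it joins."""
    before = build_ring(nodes, num_tokens, seed)
    new_tokens = random_tokens(new_node, num_tokens, random.Random(seed + 1))
    # Each new token splits exactly one existing range: the one ending at its successor.
    return {owner(before, token) for token, _ in new_tokens}


nodes = ["node1", "node2", "node3"]
print(donors_when_adding(nodes, "node4", num_tokens=1))   # a single donor
print(donors_when_adding(nodes, "node4", num_tokens=16))  # typically all three nodes
```

With a single token, the new node takes its whole range from one neighbor; with many vnodes, its small ranges are spread across the existing nodes, which is why rebalancing is smoother.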

Question-: You have a Cassandra cluster with vnodes enabled. How can you disable them?
A. We need to comment out num_tokens in the cassandra.yaml file.
B. We need to comment out allocate_tokens_for_local_replication_factor in the cassandra.yaml file.
C. Uncomment initial_token and set it to 1.
D. Comment out initial_token.
E. Uncomment num_tokens and set it to 8.