DataStax Cassandra Administrator Certification Questions and Answers (Practice Questions and Dumps)

Question-: In the latest version of Cassandra you can repair data using the NodeSync utility/service, which runs in the background. Which of the below correctly apply to the NodeSync utility?
A. Using nodetool you can start/stop/enable the NodeSync service
B. NodeSync can be enabled either for all tables or for none.
C. NodeSync works on segments that are specific to a table. They are created by dividing the tokens into equal-sized pieces.
D. NodeSync prioritizes segments in order to meet the per-table deadline target.

Answer: A,C,D

Explanation: NodeSync is a service that repairs data in the background.
- Once enabled, it runs whenever there is data to be repaired, and otherwise stays idle.
- Even though it runs continuously, it has very low impact on cluster performance.
- It does not require any manual intervention.
- It replaces manual anti-entropy repairs for the tables on which it is enabled.
- It runs on every node for the enabled tables. You can enable it for a few tables rather than for all of them.
- The NodeSync service validates only local data, divided into segments. These segments act as save points. E.g. 1000 MB of local data can be divided into 5 segments of 200 MB each.
- Each segment is either fully repaired or not repaired at all, which gives segment-level atomicity for repair.
- Segments are prioritized for repair in order to meet the per-table deadline target.
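
The segment and deadline ideas in the bullets above can be illustrated with a small Python sketch. This is not the NodeSync implementation: the names Segment, split_range and next_segment, the integer token range and the deadline dictionary are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    table: str             # table the segment belongs to
    start: int             # first token of the segment (inclusive)
    end: int               # last token of the segment (exclusive)
    last_validated: float  # time of last validation; 0.0 means never


def split_range(table, start, end, count):
    """Divide the token range [start, end) into `count` near-equal segments."""
    width = end - start
    bounds = [start + width * i // count for i in range(count + 1)]
    return [Segment(table, bounds[i], bounds[i + 1], 0.0) for i in range(count)]


def next_segment(segments, deadlines, now):
    """Pick the segment with the least time left before its table's deadline.

    `deadlines` maps a table name to its deadline target in seconds
    (a hypothetical stand-in for the per-table deadline target).
    """
    return min(segments, key=lambda s: s.last_validated + deadlines[s.table] - now)
```

For example, split_range("ks.users", 0, 1000, 5) yields five segments of 200 tokens each, mirroring the 5 x 200 MB example above, and next_segment returns whichever segment is closest to missing (or furthest past) its deadline.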

Question-: The NodeSync utility repairs data at the table level, and each table is further divided into segments. Which of the following are valid statements in this case?
A. While a particular segment is being repaired, it is marked as locked in the nodesync_status table.
B. NodeSync depends on the read repair path.
C. Even if the WAN (Wide Area Network) link across datacenters is poor, NodeSync performance is not affected.
D. NodeSync validates data only if the replication factor is 2 or more.

Answer: A,B,D

Explanation: Yes, while a particular segment is being repaired, its status needs to be maintained in the system table “system_distributed.nodesync_status”. Whichever segment NodeSync starts repairing/validating is marked as locked in this table, so that no other process starts repairing the same segment.

NodeSync repair uses the read repair path. Repair is only required when the replication factor is 2 or more, because a single copy has nothing to be repaired against. However, when the replication factor is 2 or more, replicas may be stored across datacenters, and reaching them depends on the WAN. If the WAN does not provide good speed, the performance of this repair will certainly be affected.
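
The locking behaviour described above can be sketched in Python as follows. This is only an illustration: SegmentStatus is a hypothetical in-memory stand-in, not the schema or API of the system_distributed.nodesync_status table, and the method names are assumptions.

```python
class SegmentStatus:
    """Hypothetical in-memory stand-in for a per-segment status table."""

    def __init__(self):
        # Maps a segment id to the node that is currently validating it.
        self._locked_by = {}

    def try_lock(self, segment_id, node):
        """Mark the segment as locked unless another node already holds it."""
        holder = self._locked_by.setdefault(segment_id, node)
        return holder == node

    def release(self, segment_id, node):
        """Clear the lock once the holder has finished validating the segment."""
        if self._locked_by.get(segment_id) == node:
            del self._locked_by[segment_id]
```

A second node calling try_lock on a segment that is already locked gets False back and can move on to another segment, which is the effect the explanation attributes to the locked status.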

Question-: When the NodeSync utility needs to repair the data in a particular segment, it follows the read path. Arrange the steps below in the order of the read repair flow.

A. Read data from all replicas
B. Pick the data with the latest timestamp
C. Repair node with stale data

Answer: A,B,C

Explanation:
A. Read data from all the replicas across the nodes in the cluster.
B. If there is data inconsistency then pick the data with the latest timestamp.
C. Repair stale nodes in the cluster.
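
The three steps above can be sketched in Python. This is a simplified model, not Cassandra code: each replica is assumed to be a plain dictionary mapping a key to a (value, timestamp) pair, and the function name read_repair is chosen for illustration.

```python
def read_repair(replicas, key):
    """Run the A, B, C flow for one key.

    `replicas` maps a node name to that node's data, where the data is a
    dict of key -> (value, timestamp). Returns the winning value and the
    list of nodes that had to be repaired.
    """
    # A. Read the data from all replicas (None if a replica has no copy).
    responses = {node: data.get(key) for node, data in replicas.items()}
    present = [r for r in responses.values() if r is not None]
    if not present:
        return None, []

    # B. Pick the data with the latest timestamp.
    latest = max(present, key=lambda r: r[1])

    # C. Repair every replica whose copy is missing or stale.
    stale = [node for node, r in responses.items() if r != latest]
    for node in stale:
        replicas[node][key] = latest
    return latest[0], stale
```

With three replicas where one holds an old value, one holds a newer value and one holds nothing, the call returns the newer value and writes it back to the other two replicas.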

Related Questions

Question-: Once you have removed a node from the Cassandra cluster using the “nodetool decommission” command, which of the following is correct?
A. Node will go offline
B. JVM process would be running on this node
C. Data would not be deleted from the decommissioned node.
D. If you want to add this node back to the cluster, then you should not delete this data.

Question-: In a Cassandra cluster, one of the nodes tries to get gossip info from a node it was already gossiping with, but somehow it is not able to get the gossip info. What happens in this case?
A. The node will retry after 15 minutes to get the gossip info from the same node.
B. The node will go offline, as it does not have gossip info.
C. The node will try to get gossip info from the next neighboring node.
D. The node will connect to the seed nodes to get the gossip info.

Question-: You are running a Cassandra cluster, and you find that one of the nodes in the cluster is dead. You want to replace that node with a new node. Which of the following correctly applies in this situation?
A. You have to manually copy the data from the dead node to the new node before adding it to the cluster, so that the replacement is quick.
B. You must have tested this node by adding it to this cluster, which may have created directories for data, saved_caches, commitlog and hints.
C. You cannot replace nodes in a single-token cluster architecture.
D. The node you are adding to the cluster must not have previous data in the data, saved_caches, commitlog and hints directories.

Question-: In your Cassandra cluster, one of the nodes is down, and that node is a seed node. What is the next step (assuming you have nodes designated as seed nodes)?
A. You should quickly add a replacement for this node.
B. You don’t have to worry, as two other nodes are available as seed nodes.
C. You should update the seed node info on each node in the Cassandra cluster.
D. Use the “nodetool removenode” command to remove the dead node, which will take care of updating the seed info on each node.

Question-: You have a Cassandra cluster that is physically separated into two datacenters, with nodes in each datacenter. You have found that one of the nodes in the cluster is down (it is not a seed node). You need to replace this node. Which of the following are true of replacing the node, as opposed to first removing it and then adding a new one?
A. You don’t have to move the data twice.
B. The same tokens will be used by the replacement node.
C. The existing node will work as a backup for the new node.
D. The Cassandra cluster would be offline for a few seconds only.

Question-: While replacing a dead node in the cluster, you have to update the jvm.options file with the “replace_address” property on the new node. What value should it have?
A. IP address of the seed nodes
B. IP address of the new node
C. IP address of the existing dead node
D. List of IP address for all the seed nodes