Question-: In DataStax Enterprise (DSE) 6.0 and later, which is built on Apache Cassandra, you can repair data with the NodeSync service, which runs in the background. Which of the following statements about NodeSync are correct? A. You can start, stop, and enable the NodeSync service using nodetool. B. NodeSync can only be enabled for all tables or for none at all. C. NodeSync works on segments that are specific to a table, created by dividing the table's token range into equal-sized pieces. D. NodeSync prioritizes segments in order to meet the per-table deadline target.
Answer: A,C,D
Explanation: NodeSync is a service that repairs data in the background.
- Once enabled, it runs continuously, but it only performs repairs when there is inconsistent data to fix.
- Although it runs continuously, it has a very low impact on cluster performance.
- It does not require any manual intervention.
- It is designed to replace manual anti-entropy repair (nodetool repair) for the tables on which it is enabled.
- It runs on every node, for the tables on which it is enabled. You can enable it for only some tables, so option B is incorrect.
- The NodeSync service validates only local data, which is divided into segments. Segments act as save points. For example, 1000 MB of local data for a table could be divided into 5 segments of 200 MB each.
- Each segment is either fully repaired or not repaired at all (segment-level atomicity).
- Segments are prioritized for repair in order to meet the per-table deadline target.
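The segmentation and prioritization described above can be sketched in Python. This is a simplified model, not NodeSync's actual implementation: it assumes the Murmur3Partitioner token range, uses a hypothetical "target segment size" to decide how many segments to create, splits the token range into equal slices, and interprets "prioritize to meet the deadline" as "pick the segment with the least time left before its last validation plus the deadline expires". The names `Segment`, `segments_needed`, `split_range`, and `next_segment` are all hypothetical.

```python
from dataclasses import dataclass

# Token bounds of Cassandra's Murmur3Partitioner (assumed partitioner).
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


@dataclass
class Segment:
    """Hypothetical model of one NodeSync segment of a single table."""
    start: int              # first token boundary of the segment
    end: int                # last token boundary of the segment
    last_validated: float   # time (seconds) of the last successful validation


def segments_needed(local_size: int, target_segment_size: int) -> int:
    """Hypothetical sizing rule: enough segments that none exceeds the target size."""
    return max(1, -(-local_size // target_segment_size))  # ceiling division


def split_range(start: int, end: int, count: int) -> list[tuple[int, int]]:
    """Split a token range into `count` contiguous slices of (nearly) equal width."""
    width = (end - start) // count
    bounds = [start + i * width for i in range(count)] + [end]
    return [(bounds[i], bounds[i + 1]) for i in range(count)]


def next_segment(segments: list[Segment], deadline: float, now: float) -> Segment:
    """Choose the segment with the least time left before its deadline (or most overdue)."""
    return min(segments, key=lambda s: s.last_validated + deadline - now)


# Example from the explanation: 1000 MB of local data, 200 MB target per segment.
count = segments_needed(1000, 200)
print(count)  # 5
for first, last in split_range(MIN_TOKEN, MAX_TOKEN, count):
    print(first, last)
```

Picking the minimum "time remaining" means a segment that is already overdue (negative remaining time) always wins, which is one simple way to honor a per-table deadline.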
Admin only
Question-: NodeSync repairs data at the table level, and each table is further divided into segments. Which of the following statements are valid in this case? A. While a segment is being repaired, it is marked as locked in the nodesync_status table. B. NodeSync depends on the read repair path. C. Even if the WAN (wide area network) link between datacenters is poor, NodeSync performance is not affected. D. NodeSync validates data only if the replication factor is 2 or more.
Answer: A,B,D
Explanation: While a particular segment is being repaired, its status is maintained in the system table system_distributed.nodesync_status. When NodeSync starts repairing/validating a segment, it marks that segment as locked in this table, so no other process starts repairing the same segment.
NodeSync repair uses the read repair path. Repair is only needed when the replication factor is 2 or more; with a single copy there is nothing to compare and nothing to repair. However, when the replication factor is 2 or more, replicas may be stored in different datacenters connected over a WAN. If the WAN does not provide good speed, NodeSync repair performance will certainly be affected, so option C is incorrect.
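The segment locking and the replication-factor rule can be modeled in a few lines. This is a hypothetical in-memory stand-in for system_distributed.nodesync_status; it does not reproduce that table's real schema or locking protocol, and `SegmentLockTable`, `try_lock`, `release`, and `should_validate` are illustrative names only.

```python
class SegmentLockTable:
    """Hypothetical in-memory stand-in for system_distributed.nodesync_status."""

    def __init__(self) -> None:
        # (table, segment id) -> node currently validating that segment
        self._owners: dict[tuple[str, int], str] = {}

    def try_lock(self, table: str, segment: int, owner: str) -> bool:
        """Insert-if-absent: succeeds only if no other node holds the segment."""
        current = self._owners.setdefault((table, segment), owner)
        return current == owner

    def release(self, table: str, segment: int, owner: str) -> None:
        """Remove the lock, but only if `owner` is the node that holds it."""
        if self._owners.get((table, segment)) == owner:
            del self._owners[(table, segment)]


def should_validate(replication_factor: int) -> bool:
    """With a single replica there is nothing to compare, so nothing to repair."""
    return replication_factor >= 2


locks = SegmentLockTable()
if should_validate(replication_factor=3) and locks.try_lock("ks.users", 7, "node1"):
    # ... validate and repair segment 7 here ...
    locks.release("ks.users", 7, "node1")
```

The insert-if-absent step behaves like a compare-and-set, which is what keeps two nodes from validating the same segment at the same time in this model.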
Admin only
Question-: When NodeSync needs to repair the data in a particular segment, it follows the read repair path. Arrange the following steps in the order of the read repair flow.
A. Read data from all replicas B. Pick the data with the latest timestamp C. Repair node with stale data
Answer: A,B,C
Explanation: A. Read the data from all replicas across the nodes in the cluster. B. If the data is inconsistent, pick the version with the latest timestamp. C. Repair the nodes that hold stale data.
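The three read repair steps can be sketched as a function over an in-memory model of the replicas. This is a simplification under stated assumptions: each node is a plain dict from key to a hypothetical `Cell`, timestamps are integers, and tombstones, tie-breaking on equal timestamps, and consistency levels are ignored. `Cell` and `read_repair` are illustrative names, not a Cassandra API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # write timestamp; higher means newer


def read_repair(cluster: dict[str, dict[str, Cell]], key: str) -> list[str]:
    """Read `key` from every replica, keep the newest version, and fix stale replicas.

    Returns the names of the nodes that were repaired.
    """
    # A. Read data from all replicas (None means the replica has no copy).
    responses: dict[str, Optional[Cell]] = {node: store.get(key) for node, store in cluster.items()}
    versions = [cell for cell in responses.values() if cell is not None]
    if not versions:
        return []

    # B. Pick the data with the latest timestamp.
    winner = max(versions, key=lambda cell: cell.timestamp)

    # C. Repair every node whose copy is stale or missing.
    stale = [node for node, cell in responses.items() if cell != winner]
    for node in stale:
        cluster[node][key] = winner
    return stale


cluster = {
    "node1": {"user:42": Cell("alice@old.example", 1000)},
    "node2": {"user:42": Cell("alice@new.example", 2000)},
    "node3": {},
}
print(read_repair(cluster, "user:42"))  # ['node1', 'node3']
```

Running it a second time returns an empty list, because after step C every replica holds the newest version.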