DataStax Cassandra Administrator Certification Questions and Answers (Practice Questions and Dumps)

Question-: When you write data to a Cassandra cluster, it can be written to several places, and a read checks all of these storages to retrieve the latest data. For efficiency, however, the data needs to be stored in sorted order by clustering columns. In which of the following storages is data stored sorted by clustering column?

A. MemTable
B. SSTable
C. Partition Key Cache
D. Row Cache
E. Commit log

Answer: A, B
Exp: When writing data, Cassandra first writes it to both the MemTable and the commit log. In the MemTable, data is sorted by clustering column, but in the commit log it is written sequentially in the order received, that is, in append-only mode. SSTables are created from MemTables: when a MemTable reaches its size threshold or is flushed explicitly, its contents are written to disk as an SSTable. Once created, SSTables cannot be modified. Compaction can only create new SSTables from existing SSTables.
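The difference between the two write targets can be sketched in a few lines of Python. This is a toy model, not Cassandra's real data structures: the class name ToyWritePath, the keys, and the values are all made up for illustration, and partitions are ordered by key here, whereas Cassandra orders them by token.

```python
import bisect


class ToyWritePath:
    """Illustrative only: a hypothetical stand-in for commit log + memtable."""

    def __init__(self):
        self.commit_log = []  # append-only, keeps arrival order
        self.memtable = {}    # partition key -> rows sorted by clustering value

    def write(self, partition_key, clustering, value):
        # 1. Append to the commit log exactly in the order received.
        self.commit_log.append((partition_key, clustering, value))
        # 2. Insert into the memtable, keeping rows sorted by clustering column.
        rows = self.memtable.setdefault(partition_key, [])
        keys = [c for c, _ in rows]
        i = bisect.bisect_left(keys, clustering)
        if i < len(rows) and rows[i][0] == clustering:
            rows[i] = (clustering, value)  # overwrite an existing row
        else:
            rows.insert(i, (clustering, value))

    def flush(self):
        # A flush turns the memtable into an immutable snapshot (tuples)
        # and starts a fresh, empty memtable.
        sstable = tuple(
            (pk, tuple(rows)) for pk, rows in sorted(self.memtable.items())
        )
        self.memtable = {}
        return sstable


wp = ToyWritePath()
wp.write("sensor-1", 30, "c")
wp.write("sensor-1", 10, "a")
wp.write("sensor-1", 20, "b")
sstable = wp.flush()
```

After these three writes the commit log still holds the rows in arrival order (30, 10, 20), while the flushed snapshot holds them in clustering order (10, 20, 30).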

Admin and Dev both

Question-: Please map the following

A. Row Cache
B. Bloom Filter
C. Partition Key Cache
D. Partition Summary
E. Partition Index
F. Compression offset map

1. Subset of the partition data stored on disk in the SSTables will be stored in memory
2. Helps in finding which SSTables can have requested data
3. [Description missing in source]
4. Stores the sampling of partition index.
5. Stores an index of all partition keys mapped to their offset.
6. Stores pointers to the exact location on disk where the desired partition data will be found.

Answer: A-1, B-2, C-3, D-4, E-5, F-6
Exp: The Cassandra read path goes through various storages and indexes to return the correct data, using caches, summaries, indexes, and so on. Let’s check them one by one, in sequence.

1. Memtable: This is an in-memory data structure. If the desired partition data is found here, it is read from the memtable and then merged with any matching data from the SSTables.
2. Row Cache: It is used only when enabled. It keeps in memory a subset of the partition data that is stored on disk in the SSTables. It is off-heap storage, so it should not be counted when sizing the Java heap, and you have to specify the amount of memory you want to use for the cache. Remember that it is not write-through: if a write comes in for a row, the cached copy of that row is invalidated and is not cached again until the row is read. If a partition is updated, the entire partition is evicted from the cache. If the desired partition is not found in the cache, the Bloom filter is checked next.
3. Bloom Filter: It helps in finding which SSTables may contain the requested data.
4. Partition Key Cache: Like the row cache, it is used only when enabled and is off-heap. It caches partition index entries for recently accessed partitions. On a key cache miss, the partition summary, which stores a sampling of the partition index, is consulted. A partition index contains all partition keys.
5. Partition Index: It resides on disk and stores an index of all partition keys mapped to their offsets.
6. Compression offset map: It stores pointers to the exact location on disk where the desired partition data will be found.
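The relationship between the key cache, the partition summary, and the partition index can be sketched as below. This is a simplified model under stated assumptions: the keys, offsets, sampling interval, and the function name find_offset are hypothetical, entries are ordered by key rather than by token, and the Bloom filter and compression offset map are left out.

```python
import bisect

# Hypothetical on-disk partition index: every partition key -> byte offset.
partition_index = [("apple", 0), ("banana", 120), ("cherry", 260),
                   ("grape", 410), ("mango", 530), ("peach", 700)]

# Partition summary: an in-memory sample of every 2nd index entry,
# mapping a sampled key to its position in the partition index.
SAMPLE_EVERY = 2
partition_summary = [(key, pos) for pos, (key, _) in enumerate(partition_index)
                     if pos % SAMPLE_EVERY == 0]

key_cache = {}  # partition key -> offset, filled for recently read keys


def find_offset(key):
    """Return (offset, source), or (None, 'miss') if the key is absent."""
    if key in key_cache:
        return key_cache[key], "key cache"
    # Summary lookup: find the last sampled key <= the requested key.
    sampled_keys = [k for k, _ in partition_summary]
    i = bisect.bisect_right(sampled_keys, key) - 1
    if i < 0:
        return None, "miss"
    start = partition_summary[i][1]
    # Scan only the small slice of the index that the summary points at.
    for k, offset in partition_index[start:start + SAMPLE_EVERY]:
        if k == key:
            key_cache[key] = offset  # remember it for the next read
            return offset, "partition index"
    return None, "miss"


first = find_offset("grape")   # goes through summary and index
second = find_offset("grape")  # served from the key cache
absent = find_offset("kiwi")   # not in the index
```

The first lookup of a key walks the summary and a slice of the index; a repeated lookup of the same key skips both because the offset is already cached.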

Admin only

Question-: Please map the following

A. Partition Summary
B. Key Cache
C. SSTables
D. Partition Index

1. Stores the byte offset into the partition index.
2. Stores the byte offset of the most recently accessed records.
3. [Description missing in source]
4. Stores the index of all partition keys mapped to their offset.

Answer: A-1, B-2, C-3, D-4

Explanation: The Cassandra read path goes through various storages and indexes to return the correct data, using caches, summaries, indexes, and so on. Let’s check them one by one, in sequence.

1. Memtable: This is an in-memory data structure. If the desired partition data is found here, it is read from the memtable and then merged with any matching data from the SSTables.
2. Row Cache: It is used only when enabled. It keeps in memory a subset of the partition data that is stored on disk in the SSTables. It is off-heap storage, so it should not be counted when sizing the Java heap, and you have to specify the amount of memory you want to use for the cache. Remember that it is not write-through: if a write comes in for a row, the cached copy of that row is invalidated and is not cached again until the row is read. If a partition is updated, the entire partition is evicted from the cache. If the desired partition is not found in the cache, the Bloom filter is checked next.
3. Bloom Filter: It helps in finding which SSTables may contain the requested data.
4. Partition Key Cache: Like the row cache, it is used only when enabled and is off-heap. It stores the byte offset of the most recently accessed records. On a key cache miss, the partition summary, which stores a sampling of the partition index, is consulted. A partition index contains all partition keys.
5. Partition Index: It resides on disk and stores an index of all partition keys mapped to their offsets.
6. Compression offset map: It stores pointers to the exact location on disk where the desired partition data will be found.
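The "not write-through" behaviour of the row cache described in item 2 can be illustrated with a small sketch. The class ToyRowCache and its fields are hypothetical names, and a plain dict stands in for the memtable and SSTables. It only models the invalidate-on-write rule, not Cassandra's actual cache implementation.

```python
class ToyRowCache:
    """Illustrative only: a read-populated cache that is invalidated on write."""

    def __init__(self, storage):
        self.storage = storage  # stands in for memtable + SSTables
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def read(self, partition_key):
        if partition_key in self.cache:
            self.hits += 1
            return self.cache[partition_key]
        self.misses += 1
        value = self.storage.get(partition_key)
        if value is not None:
            self.cache[partition_key] = value  # populated on read only
        return value

    def write(self, partition_key, value):
        self.storage[partition_key] = value
        # Not write-through: the cached entry is dropped, not updated.
        self.cache.pop(partition_key, None)


rc = ToyRowCache({"user:1": "v1"})
rc.read("user:1")                 # miss, fills the cache
rc.read("user:1")                 # hit
rc.write("user:1", "v2")          # invalidates the cached entry
cached_after_write = "user:1" in rc.cache
value_after_write = rc.read("user:1")  # miss again, re-reads from storage
```

The write does not refresh the cache; the entry only comes back after the next read of that partition.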

Admin only

Related Questions

Question-: You have decided to upgrade your Cassandra cluster from an older version to a newer version. Your old cluster has [number missing] nodes, and the new cluster will have [number missing] nodes. Which of the following should you do while preparing for the migration?
A. Configure the same schema in the new 10-node cluster.
B. Configure that client writes should go in both the cluster.
C. Take the snapshot from the old cluster and copy the data files.
D. Take the snapshot from the old cluster and copy the data using sstableloader.
E. Switch to the new cluster

Question-: You have a [number missing]-node Cassandra cluster with single-token architecture, and you now plan to add two more nodes to the cluster. Which of the following is true in this case?
A. Existing nodes should keep their existing token assignments.
B. New nodes are assigned tokens that bisect the existing token range.
C. You have to re-calculate the tokens for the entire cluster, and assign the new tokens to the existing nodes.
D. It is fine if the new nodes contain old data.
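For background on the token options above, here is a minimal sketch of how evenly spaced tokens are commonly computed for single-token clusters. It assumes the Murmur3Partitioner, whose token range is -2**63 to 2**63 - 1, and a hypothetical cluster that doubles from 2 to 4 nodes. The cluster size in the question itself is not given, and the helper name murmur3_tokens is made up for this example.

```python
def murmur3_tokens(node_count):
    """Evenly spaced tokens over the Murmur3Partitioner range."""
    return [(2**64 // node_count) * i - 2**63 for i in range(node_count)]


old = murmur3_tokens(2)  # tokens of the hypothetical 2-node cluster
new = murmur3_tokens(4)  # tokens after doubling to 4 nodes

# Tokens that only exist in the larger cluster, i.e. the new nodes' tokens.
added = sorted(set(new) - set(old))

# When the node count doubles, every old token is still present in the
# new layout, so the added tokens fall in the middle of the old ranges.
old_tokens_preserved = set(old) <= set(new)
```

This clean bisection only holds when the node count doubles. For other growth factors, the evenly spaced tokens of the larger cluster generally do not include all of the old ones.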

Question-: You are adding a new node to an existing Cassandra cluster that uses single-token architecture. Must you leave the initial_token property blank?
A. True
B. False

Question-: You are planning to set up a new Cassandra cluster with single-token architecture that spans multiple datacenters. Is it ideal to have seed nodes from a single datacenter only?
A. True
B. False

Question-: ___________________ refers to how up-to-date and synchronized a row of data is on all of its replicas.
A. Transaction
B. Compaction
C. Consistency
D. Compression

Question-: Which of the following commands flushes the memtables and stops listening for connections from other nodes?
A. nodetool drain
B. nodetool flush
C. nodetool commit
D. nodetool tpstats