DataStax Cassandra Administrator Certification Questions and Answers (Practice Questions and Dumps)

Question-: Arrange the following steps in the order of the Cassandra read path.
1. Check partition key cache
2. Check Row Cache
3. Locate the data on disk using compression offset map
4. Fetch the data from SSTables on disk
5. Go directly to the compression offset map if a partition key is found in the partition key cache.
6. Check memtable.
7. Check Bloom filter.

A. 1,2,3,4,5,6,7
B. 6,2,7,1,5,3,4
C. 1,4,3,5,2,7,6
D. 1,7,2,4,5,3,6
E. 6,7,2,1,3,5,4

Answer: B
Exp: Keep in mind that the read path always starts with the memtable and ends with the SSTables on disk; this alone narrows down the given choices. The following order is followed when data is read from a Cassandra table.

1. Check memtable.
2. Check Row Cache, if enabled.
3. Check Bloom filter.
4. Check partition key cache, if enabled.
5. Go directly to the compression offset map if the partition key is found in the partition key cache. Otherwise, check the partition summary and then access the partition index.
6. Locate the data on disk using compression offset map.
7. Fetch the data from SSTables on disk.
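The order above can be traced with a small toy model. This is only a sketch under simplifying assumptions: the function name `read_partition`, the dictionary layout of the SSTable, and the use of a plain set in place of a real Bloom filter are all hypothetical, and a real read merges data from the memtable and several SSTables rather than returning on the first hit.

```python
def read_partition(key, memtable, row_cache, sstable):
    """Return (value, steps) for one toy SSTable, following the listed order."""
    steps = ["memtable"]
    if key in memtable:
        return memtable[key], steps

    steps.append("row cache")
    if key in row_cache:
        return row_cache[key], steps

    steps.append("bloom filter")
    if key not in sstable["bloom"]:
        return None, steps  # the filter rules this SSTable out

    steps.append("partition key cache")
    index_entry = sstable["key_cache"].get(key)
    if index_entry is None:
        # Key cache miss: consult the summary, then the full index.
        steps.append("partition summary")
        steps.append("partition index")
        index_entry = sstable["index"].get(key)
        if index_entry is None:
            return None, steps  # would be a Bloom filter false positive

    steps.append("compression offset map")
    chunk = sstable["offset_map"][index_entry]
    steps.append("sstable data")
    return sstable["chunks"][chunk].get(key), steps


# Hypothetical toy SSTable: "k1" is in the key cache, "k2" is not.
toy_sstable = {
    "bloom": {"k1", "k2"},
    "key_cache": {"k1": 0},
    "index": {"k1": 0, "k2": 1},
    "offset_map": {0: "chunk-a", 1: "chunk-b"},
    "chunks": {"chunk-a": {"k1": "v1"}, "chunk-b": {"k2": "v2"}},
}
```

Reading "k1" skips the partition summary and index because of the key cache hit, while reading "k2" passes through both before reaching the compression offset map.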
The row cache should be used when the database is read-intensive and avoided when it is write-intensive. When a write arrives for a row, the cached copy of that row is invalidated, and the row is not cached again until it is next read. Hence, the row cache is not write-through.

If a partition is updated, the entire partition is removed from the cache. If the desired partition is not found in the row cache, the Bloom filter is checked.
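The invalidate-on-write behaviour described above can be illustrated with a minimal sketch. The class `ToyRowCache` and its `storage` dictionary (standing in for the memtable and SSTables) are hypothetical names for illustration only, not Cassandra's implementation.

```python
class ToyRowCache:
    """A read-populated cache that is invalidated, not refreshed, on write."""

    def __init__(self, storage):
        self.storage = storage  # dict standing in for memtable + SSTables
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def write(self, key, value):
        self.storage[key] = value
        # Not write-through: the cached entry is dropped, not updated.
        self.cache.pop(key, None)

    def read(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.storage.get(key)
        if value is not None:
            self.cache[key] = value  # cached again only after a read
        return value
```

In this model a workload that alternates writes and reads on the same partition never gets a cache hit, which is why the row cache suits read-intensive tables.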

Dev and Admin both

Question-: Which of the following statements are correct for the Bloom filter?
A. Each SSTable has an associated Bloom filter.
B. All SSTables identified by the Bloom filter are guaranteed to contain the requested partition data.
C. The Bloom filter is stored on heap so that faster access is possible.
D. The Bloom filter can grow to approximately 1-2 GB per billion partitions.

Answer: A, D
Exp: In Cassandra, each SSTable has an associated Bloom filter, which can establish that an SSTable does not contain certain partition data. It is also used to determine the likelihood that partition data is stored in an SSTable by narrowing the pool of keys, which speeds up the partition key lookup.

Using the Bloom filter, the storage engine discovers which SSTables are likely to have the requested data. However, this is never guaranteed. If the Bloom filter does not rule out an SSTable, the Cassandra database checks the partition key cache.

The Bloom filter is stored in off-heap memory and grows to approximately 1-2 GB per billion partitions.
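The "can rule out, cannot guarantee" property can be seen in a minimal Bloom filter sketch. The class `ToyBloomFilter`, its default sizes, and the use of SHA-256 for hashing are assumptions chosen for convenience here; they are not Cassandra's implementation.

```python
import hashlib


class ToyBloomFilter:
    """Answers 'definitely absent' or 'possibly present' for a key."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means the key was never added; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(key))
```

A key that was added always yields `True`, so an SSTable holding the partition is never skipped; a `True` for a key that was never added is a false positive, which is why the later read-path steps are still required. As a rough size check, 1-2 GB per billion partitions corresponds to about 8.6 to 17.2 bits per partition key, taking 1 GB as 2^30 bytes.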

Admin and Dev both

Question-: Which of the following statements are valid?
A. A partition key cache “hit” saves one seek during write operations.
B. If a partition key is found in the partition key cache, the engine goes directly to the compression offset map to find the compressed block on disk that has the data.
C. If the partition key is not found in the key cache, the partition summary is checked.
D. The partition summary stores a sampling of the partition index.
E. If the partition key range is found in the partition summary, the partition index is searched.
F. The compression offset map stores pointers to the exact location on disk where the desired partition data will be found.

Answer: B, C, D, E, F
Exp: The partition key cache, partition summary, and partition index are all part of the read path, not the write path. Hence, option A can be discarded. They are checked one by one to find the data location.

Partition Key cache: It stores a cache of the partition index in off-heap memory. If the partition key is found in this cache (a “hit”), one seek is saved and the engine goes directly to the compression offset map to find the compressed block on disk that has the data. If the partition key is not found in the key cache, the partition summary is checked.

Partition Summary: The partition summary is an off-heap in-memory structure that stores a sample of the partition index. For example, if every 100th key is sampled, it stores the 1st, 100th, 200th, 300th, etc. keys and their exact locations in the file. After finding the range of possible partition key values, the partition index is searched.
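The sampling idea can be sketched as a two-step lookup: a binary search over the sampled keys narrows the range, and only that slice of the full index is searched. The function names `build_summary` and `lookup` are hypothetical, and the sketch orders keys as plain strings for simplicity, which is an assumption and not how Cassandra orders partitions.

```python
import bisect


def build_summary(index_keys, interval):
    """Sample every `interval`-th entry of the sorted index, from the first."""
    return [(index_keys[i], i) for i in range(0, len(index_keys), interval)]


def lookup(key, index_keys, summary):
    """Return the position of `key` in the index, or None if absent."""
    sampled_keys = [k for k, _ in summary]
    slot = bisect.bisect_right(sampled_keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first sampled key
    # The summary bounds the slice of the index that has to be searched.
    start = summary[slot][1]
    end = summary[slot + 1][1] if slot + 1 < len(summary) else len(index_keys)
    pos = bisect.bisect_left(index_keys, key, start, end)
    if pos < end and index_keys[pos] == key:
        return pos
    return None


index_keys = [f"key{i:04d}" for i in range(1000)]  # toy sorted partition index
summary = build_summary(index_keys, 100)           # 10 sampled entries
```

With 1,000 index entries and an interval of 100, the summary holds only 10 entries, and each lookup searches at most 100 index entries instead of the whole index.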

Partition Index: The partition index resides on disk and maps every partition key to its offset. After the partition summary is checked for a range of partition keys, the search seeks the location of the desired partition key in the partition index.

Compression offset map: The compression offset map stores pointers to the exact location on disk where the desired partition data will be found. The map is stored in off-heap memory and is accessed by either the partition key cache or the partition index. After the compression offset map identifies the disk location, the desired compressed partition data is fetched from the correct SSTables.
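One way to picture the compression offset map is as a list with one entry per compressed chunk, giving the chunk's starting offset in the compressed file. The sketch below assumes fixed-size uncompressed chunks and uses `zlib`; the names `compress_file`, `read_at`, and `CHUNK_LENGTH` are hypothetical, and the on-disk format of a real SSTable is not reproduced here.

```python
import zlib

CHUNK_LENGTH = 64  # uncompressed bytes per chunk; small hypothetical value


def compress_file(data):
    """Compress `data` chunk by chunk; return (compressed bytes, offset map)."""
    out = bytearray()
    offsets = []
    for start in range(0, len(data), CHUNK_LENGTH):
        offsets.append(len(out))  # where this chunk begins in the output
        out += zlib.compress(data[start:start + CHUNK_LENGTH])
    return bytes(out), offsets


def read_at(compressed, offsets, position, length):
    """Read `length` uncompressed bytes starting at uncompressed `position`."""
    result = bytearray()
    chunk = position // CHUNK_LENGTH   # which chunk holds the position
    skip = position % CHUNK_LENGTH     # offset inside that chunk
    while len(result) < length and chunk < len(offsets):
        begin = offsets[chunk]
        end = offsets[chunk + 1] if chunk + 1 < len(offsets) else len(compressed)
        plain = zlib.decompress(compressed[begin:end])
        result += plain[skip:skip + length - len(result)]
        skip = 0
        chunk += 1
    return bytes(result)
```

Given an uncompressed position from the partition index, only the chunks that cover the requested range are decompressed, rather than the whole file.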

Admin only

Related Questions

Question-: You are working as an administrator for a Cassandra database in which the data model is built for time series data, and you have decided to use the “DateTieredCompactionStrategy”. What are the benefits of compaction with this strategy?

A. It helps in compacting SSTables based on time period.
B. It helps in compacting SSTables based on size.
C. With this you can have better disk usage.
D. Your read performance will also increase after compaction.
E. Less RAM is needed.
F. You can have your data in only one data center.

Question-: Whenever compaction happens:
A. It always deletes the tombstone data.
B. It keeps the tombstone data for up to 3 consecutive compactions so that read repair can happen.
C. It deletes the tombstone data if the gc grace period has expired.
D. It deletes the tombstone data if it is older than 1 hour.

Question-: When compaction happens, it picks the partition from both of the old SSTables and merges them. The partition segment in the new SSTable is always bigger than both of the older partition segments.
A. True
B. False

Question-: When compaction is done, in which of the cases below is the new SSTable partition segment smaller than the older ones?

A. When there are a lot of delete operations on both partition segments.
B. When there is a lot of tombstone-marked data in both partition segments.
C. When there are a lot of insert operations on both partition segments.
D. When there are a lot of UPDATE operations.

Question-: You have a big Cassandra table with the overall size around M records. You run the following command.

COPY HE_KEYSPACE.TBL_HADOOPEXAM_COURSES TO 'home/hadoopexam/he_courses_data.csv' WITH HEADER=true AND PAGETIMEOUT=40 AND PAGESIZE=20 AND DELIMITER='~';

However, while running this command, you get the error below.

./dump_cassandra.sh: xmalloc: ../../.././lib/sh/strtrans.c:63: cannot allocate XXXXXXXXXX bytes (YYYYYYYY bytes allocated)", "stdout_lines": ["[Sat Jul 13 11:12:24 UTC 2019] Executing the following query:", "COPY HE_KEYSPACE.TBL_HADOOPEXAM_COURSES TO ‘home/hadoopexam/he_courses_data.csv’ with HEADER=true and PAGETIMEOUT=40 and PAGESIZE=20 AND DELIMITER=’~’;"

What is the cause, and how can you correct it?

A. You have to remove the PAGETIMEOUT parameter.
B. You have to increase the PAGESIZE parameter from 20 to a higher value.
C. You have to add the BEGINTOKEN and ENDTOKEN parameters.
D. You have to add the MAXOUTPUTSIZE parameter.

Question-: Which of the following helps keep all the data together based on the partition key?
A. Row Cache
B. Key Cache
C. Partition
D. Bloom filter
E. Clustering key