Question-: Select the correct statements with regards to tombstone markers.
A. Inserting or updating data with null values can cause tombstone records to be generated.
B. Tombstones go through the read path.
C. Tombstones go through the write path.
D. Having an excessive number of tombstones can improve the overall performance of the database.
Answer: A, C Exp: Tombstones are created when data is deleted, and the following operations cause tombstones to be created:
1. Deleting data with a CQL DELETE statement.
2. A record expiring because of a time-to-live (TTL) setting.
3. Using materialized views.
4. Inserting or updating data with null values.
5. Updating a collection column.
Tombstones go through the write path and are written to SSTables on one or more nodes. A key differentiator of a tombstone is a built-in expiration known as the grace period; at the end of the grace period, the tombstone is removed as part of the compaction process.
If there is an excessive number of tombstones on a table, performance will be negatively impacted. An excessive number of tombstones is usually a sign of a data model that was not designed correctly.
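As a rough illustration of the operations listed above, the following CQL sketch uses a hypothetical keyspace and table (demo.users is an assumption, not part of the original question); the numbers in the comments refer to the numbered list in the explanation.
-- Hypothetical table used only to illustrate tombstone-generating operations.
CREATE TABLE demo.users (id int PRIMARY KEY, name text, tags set<text>);
-- (1) An explicit CQL DELETE writes a tombstone.
DELETE FROM demo.users WHERE id = 1;
-- (2) A record written with a TTL produces a tombstone once the TTL expires.
INSERT INTO demo.users (id, name) VALUES (2, 'abc') USING TTL 86400;
-- (4) Inserting or updating with a null value writes a cell tombstone.
INSERT INTO demo.users (id, name) VALUES (3, null);
-- (5) Overwriting a collection column writes a tombstone covering the old collection.
UPDATE demo.users SET tags = {'new'} WHERE id = 3;
-- The grace period after which compaction may drop tombstones is a table option
-- (the default is 864000 seconds, i.e. 10 days).
ALTER TABLE demo.users WITH gc_grace_seconds = 864000;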
Admin and Dev only
Question-: You have been given the below database design.
CREATE KEYSPACE hadoopexam WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
CREATE TABLE hadoopexam.price_by_year_and_name ( purchase_year int, course_name text, price int, username text, PRIMARY KEY ((purchase_year, course_name), price) ) WITH CLUSTERING ORDER BY (price ASC);
Which of the following delete statements will create a partition-level tombstone?
A. DELETE FROM hadoopexam.price_by_year_and_name WHERE purchase_year = 2019 AND course_name = 'Apache Spark Scala Training' AND price = 2000;
B. DELETE FROM hadoopexam.price_by_year_and_name WHERE purchase_year = 2019 AND course_name = 'Apache Spark Scala Training';
C. DELETE FROM hadoopexam.price_by_year_and_name WHERE purchase_year = 2019 AND course_name = 'Apache Spark Scala Training' AND price > 1999;
D. Partition-level tombstones cannot be created.
Answer: B Exp: A tombstone is a marker for the deletion of one or more records, and it can apply to any part of a partition. The following tombstone types are possible:
- Partition tombstone: when an entire partition is deleted explicitly.
- Row tombstone: when a particular row in a partition is deleted.
- Range tombstone: when more than one row is deleted, for example by using a range condition (<, >) on a clustering column.
- ComplexColumn tombstone: when inserting or updating complex (collection) column data.
- Cell tombstone: when inserting or updating a particular column with null, or when deleting a particular column of a row.
- TTL tombstone: when a TTL expires.
Checking the WHERE clause of each option: in option A a particular row is marked with a tombstone, in option B the entire partition is marked for deletion, and in option C more than one row is marked for deletion (a range tombstone). Hence, option B is correct.
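For reference, here is a sketch that restates the delete statements from the options (plus one extra single-column delete for illustration) against the table above, with a comment noting the tombstone type each one produces:
-- Option A: full primary key specified, so one row is marked with a row tombstone.
DELETE FROM hadoopexam.price_by_year_and_name WHERE purchase_year = 2019 AND course_name = 'Apache Spark Scala Training' AND price = 2000;
-- Option B: only the partition key specified, so a partition-level tombstone is written.
DELETE FROM hadoopexam.price_by_year_and_name WHERE purchase_year = 2019 AND course_name = 'Apache Spark Scala Training';
-- Option C: partition key plus a range on the clustering column, so a range tombstone is written.
DELETE FROM hadoopexam.price_by_year_and_name WHERE purchase_year = 2019 AND course_name = 'Apache Spark Scala Training' AND price > 1999;
-- Deleting a single non-key column of a row writes a cell tombstone (not one of the options).
DELETE username FROM hadoopexam.price_by_year_and_name WHERE purchase_year = 2019 AND course_name = 'Apache Spark Scala Training' AND price = 2000;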
Question-: In the table above, purchase_year and course_name form the partition key, and a secondary index is created on the price column. Which of the following statements are applicable here?
A. Selecting all records having price > 1000 causes a single-partition read.
B. The price column will be used for ordering the data at the storage level.
C. The secondary index on the price column is stored locally on each node.
D. The price column will not be used for ordering the data at the storage level.
Answer: C, D Exp: If we want to use a non-primary-key column in a WHERE clause, we can create a secondary index on it. However, this is not an ideal solution; instead, you should create a materialized view, or an additional table, that is keyed and ordered by the price column (a sketch follows below).
A non-primary-key column does not play any role in ordering the data at the storage layer, so querying for a particular value of a non-primary-key column results in scanning all partitions. Scanning all partitions generally results in prohibitive read latency and is not allowed by default.
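As a rough sketch of the materialized-view alternative mentioned above (the view name price_by_year is an assumption), the data can be reorganized so that price is a clustering column within each purchase_year partition, making it usable for ordering and for single-partition range queries:
-- Hypothetical materialized view: partitioned by purchase_year, with rows stored
-- in price order inside each partition. All primary key columns of the base table
-- must appear in the view's primary key and be restricted with IS NOT NULL.
CREATE MATERIALIZED VIEW hadoopexam.price_by_year AS
    SELECT * FROM hadoopexam.price_by_year_and_name
    WHERE purchase_year IS NOT NULL AND course_name IS NOT NULL AND price IS NOT NULL
    PRIMARY KEY ((purchase_year), price, course_name);
-- Single-partition read, ordered by price at the storage level.
SELECT * FROM hadoopexam.price_by_year WHERE purchase_year = 2019 AND price > 1000;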
Secondary indexes are stored locally on each node. If a query includes both the partition key and the secondary-index column in its WHERE clause, the query will succeed and remain efficient, because only the replicas owning that partition need to consult their local index.
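A minimal sketch of the secondary-index behaviour described above, assuming a hypothetical variant of the table in which price is a regular (non-primary-key) column; the table and index names here are assumptions:
-- Variant table: price is a regular column, username is the clustering column.
CREATE TABLE hadoopexam.price_by_year_and_name_v2 ( purchase_year int, course_name text, price int, username text, PRIMARY KEY ((purchase_year, course_name), username) );
-- The secondary index is stored locally on each node, covering only the data that node owns.
CREATE INDEX price_idx ON hadoopexam.price_by_year_and_name_v2 (price);
-- Partition key plus the indexed column: only the replicas owning that partition consult their local index, so the read is confined to a single partition.
SELECT * FROM hadoopexam.price_by_year_and_name_v2 WHERE purchase_year = 2019 AND course_name = 'Apache Spark Scala Training' AND price = 2000;
-- Indexed column alone: every node must consult its local index, so all partitions across the cluster may be touched.
SELECT * FROM hadoopexam.price_by_year_and_name_v2 WHERE price = 2000;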