Question-: Select the correct statements with regards to tombstone markers.
A. Inserting or updating data with null values can cause tombstone records to be generated.
B. Tombstones go through the read path.
C. Tombstones go through the write path.
D. Having an excessive number of tombstones can improve the overall performance of the database.
Answer: A, C Exp: Tombstones are created when data is deleted, and the following operations cause tombstones to be created:
1. Deleting data with a CQL DELETE statement.
2. A record expiring because of a time-to-live (TTL) setting.
3. Using materialized views.
4. Inserting or updating data with null values.
5. Updating a collection column.
Tombstones go through the write path and are written to SSTables on one or more nodes. A key differentiator of a tombstone is a built-in expiration known as the grace period; at the end of the grace period, the tombstone is removed as part of the compaction process.
If there is an excessive number of tombstones on a table, performance will be negatively impacted. An excessive number of tombstones is usually a sign of a data model that was not designed correctly.
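As a rough illustration of the operations listed above, the following CQL sketch uses a hypothetical keyspace and table (demo.users is an assumption, not part of the original question); the numbers in the comments refer to the numbered list in the explanation.
-- Hypothetical table used only to illustrate tombstone-generating operations.
CREATE TABLE demo.users (id int PRIMARY KEY, name text, tags set<text>);
-- (1) An explicit CQL DELETE writes a tombstone.
DELETE FROM demo.users WHERE id = 1;
-- (2) A record written with a TTL produces a tombstone once the TTL expires.
INSERT INTO demo.users (id, name) VALUES (2, 'abc') USING TTL 86400;
-- (4) Inserting or updating with a null value writes a cell tombstone.
INSERT INTO demo.users (id, name) VALUES (3, null);
-- (5) Overwriting a collection column writes a tombstone covering the old collection.
UPDATE demo.users SET tags = {'new'} WHERE id = 3;
-- The grace period after which compaction may drop tombstones is a table option
-- (the default is 864000 seconds, i.e. 10 days).
ALTER TABLE demo.users WITH gc_grace_seconds = 864000;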
Admin and Dev only
Question-: You have been given the below database design.
CREATE KEYSPACE hadoopexam WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
CREATE TABLE hadoopexam.price_by_year_and_name ( purchase_year int, course_name text, price int, username text, PRIMARY KEY ((purchase_year, course_name), price) ) WITH CLUSTERING ORDER BY (price ASC);
Which of the following delete statements will create a partition-level tombstone?
A. DELETE FROM hadoopexam.price_by_year_and_name WHERE purchase_year = 2019 AND course_name = 'Apache Spark Scala Training' AND price = 2000;
B. DELETE FROM hadoopexam.price_by_year_and_name WHERE purchase_year = 2019 AND course_name = 'Apache Spark Scala Training';
C. DELETE FROM hadoopexam.price_by_year_and_name WHERE purchase_year = 2019 AND course_name = 'Apache Spark Scala Training' AND price > 1999;
D. Partition-level tombstones cannot be created.
Answer: B Exp: A tombstone is a marker for the deletion of one or more records, and it can apply to any part of a partition. The following tombstone types are possible:
- Partition tombstone: when an entire partition is deleted explicitly.
- Row tombstone: when a particular row in a partition is deleted.
- Range tombstone: when more than one row is deleted, for example by using a range condition (<, >) on a clustering column.
- ComplexColumn tombstone: when inserting or updating complex (collection) column data.
- Cell tombstone: when inserting or updating a particular column with null, or when deleting a particular column of a row.
- TTL tombstone: when a TTL expires.
Checking the WHERE clause of each option: in option A a particular row is marked with a tombstone, in option B the entire partition is marked for deletion, and in option C more than one row is marked for deletion (a range tombstone). Hence, option B is correct.
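For reference, here is a sketch that restates the delete statements from the options (plus one extra single-column delete for illustration) against the table above, with a comment noting the tombstone type each one produces:
-- Option A: full primary key specified, so one row is marked with a row tombstone.
DELETE FROM hadoopexam.price_by_year_and_name WHERE purchase_year = 2019 AND course_name = 'Apache Spark Scala Training' AND price = 2000;
-- Option B: only the partition key specified, so a partition-level tombstone is written.
DELETE FROM hadoopexam.price_by_year_and_name WHERE purchase_year = 2019 AND course_name = 'Apache Spark Scala Training';
-- Option C: partition key plus a range on the clustering column, so a range tombstone is written.
DELETE FROM hadoopexam.price_by_year_and_name WHERE purchase_year = 2019 AND course_name = 'Apache Spark Scala Training' AND price > 1999;
-- Deleting a single non-key column of a row writes a cell tombstone (not one of the options).
DELETE username FROM hadoopexam.price_by_year_and_name WHERE purchase_year = 2019 AND course_name = 'Apache Spark Scala Training' AND price = 2000;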
Question-: In the table above, purchase_year and course_name form the partition key, and a secondary index is created on the price column. Which of the following statements are applicable here?
A. Selecting all records having price > 1000 causes a single-partition read.
B. The price column will be used for ordering the data at the storage level.
C. The secondary index on the price column is stored locally on each node.
D. The price column will not be used for ordering the data at the storage level.
Answer: C, D Exp: If we want to use a non-primary-key column in a WHERE clause, we can create a secondary index on it. However, this is not an ideal solution; instead, you should create a materialized view, or an additional table, that is keyed and ordered by the price column (a sketch follows below).
A non-primary-key column does not play any role in ordering the data at the storage layer, so querying for a particular value of a non-primary-key column results in scanning all partitions. Scanning all partitions generally results in prohibitive read latency and is not allowed by default.
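As a rough sketch of the materialized-view alternative mentioned above (the view name price_by_year is an assumption), the data can be reorganized so that price is a clustering column within each purchase_year partition, making it usable for ordering and for single-partition range queries:
-- Hypothetical materialized view: partitioned by purchase_year, with rows stored
-- in price order inside each partition. All primary key columns of the base table
-- must appear in the view's primary key and be restricted with IS NOT NULL.
CREATE MATERIALIZED VIEW hadoopexam.price_by_year AS
    SELECT * FROM hadoopexam.price_by_year_and_name
    WHERE purchase_year IS NOT NULL AND course_name IS NOT NULL AND price IS NOT NULL
    PRIMARY KEY ((purchase_year), price, course_name);
-- Single-partition read, ordered by price at the storage level.
SELECT * FROM hadoopexam.price_by_year WHERE purchase_year = 2019 AND price > 1000;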
Secondary indexes are stored locally on each node. If a query includes both the partition key and the secondary-index column in its WHERE clause, the query will succeed and remain efficient, because only the replicas owning that partition need to consult their local index.
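A minimal sketch of the secondary-index behaviour described above, assuming a hypothetical variant of the table in which price is a regular (non-primary-key) column; the table and index names here are assumptions:
-- Variant table: price is a regular column, username is the clustering column.
CREATE TABLE hadoopexam.price_by_year_and_name_v2 ( purchase_year int, course_name text, price int, username text, PRIMARY KEY ((purchase_year, course_name), username) );
-- The secondary index is stored locally on each node, covering only the data that node owns.
CREATE INDEX price_idx ON hadoopexam.price_by_year_and_name_v2 (price);
-- Partition key plus the indexed column: only the replicas owning that partition consult their local index, so the read is confined to a single partition.
SELECT * FROM hadoopexam.price_by_year_and_name_v2 WHERE purchase_year = 2019 AND course_name = 'Apache Spark Scala Training' AND price = 2000;
-- Indexed column alone: every node must consult its local index, so all partitions across the cluster may be touched.
SELECT * FROM hadoopexam.price_by_year_and_name_v2 WHERE price = 2000;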