DataStax Cassandra Administrator Certification Questions and Answers (Practice Questions and Dumps)

Question-: Which of the following statements are correct with regard to configuration?
A. You should use the connect.yaml file to define and configure client connections and security.
B. You should place the commit log directory on a different disk drive from the data file directories.
C. The cassandra.yaml file should be used to define caching parameters for tables.
D. A,C
E. B,C
F. A,B,C

Answer: E

Explanation: In Cassandra, the main configuration file is cassandra.yaml. It is used to set the initialization properties for a cluster, caching parameters for tables, tuning and resource utilization, timeout settings, client connections, backups, and security. You should also move the commit log directory to a different disk drive from the data file directories.
Based on the above, options B and C are correct.
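
The advice about separate disks can be checked with a short script. This is a minimal sketch, assuming the two directory values have already been read from cassandra.yaml (the commitlog_directory and data_file_directories settings). The helper name shares_device is hypothetical, and comparing st_dev only tells filesystems apart, so two filesystems on one physical disk would still pass.

```python
import os
import tempfile


def shares_device(commitlog_dir, data_dirs):
    """Return True if the commit log sits on the same device as any data directory.

    os.stat().st_dev identifies the filesystem that a path lives on.
    """
    commitlog_dev = os.stat(commitlog_dir).st_dev
    return any(os.stat(d).st_dev == commitlog_dev for d in data_dirs)


# Hypothetical layout: both directories are created under one temporary folder,
# so they are expected to share a device.
with tempfile.TemporaryDirectory() as root:
    commitlog = os.path.join(root, "commitlog")
    data = os.path.join(root, "data")
    os.makedirs(commitlog)
    os.makedirs(data)
    print(shares_device(commitlog, [data]))
```

On a real node you would pass the configured paths instead of the temporary ones and expect the result to be False.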

Admin and Dev both

Question-: Map the following
A. Clustering column
B. Materialized View
C. Partition Key

1. It divides data into logical groups.
2. It helps retrieve data sorted by a date column.
3. It lets you build another table from an existing table.

Answer: A-2, B-3, C-1
Exp: In Cassandra, you store data in tables, similar to an RDBMS, and every table must have a primary key.
Partition key: The partition key determines which node stores the data and divides the data into logical groups. Using the partition key, data is distributed evenly across the nodes in the cluster. Please note: for efficiency and performance, avoid read and write requests that span multiple partitions.
Clustering column: The clustering column determines how data is sorted within a partition. If you want to fetch data sorted by a date column, define that date column as a clustering column.
Materialized view: Similar to an RDBMS, this is a table built from another table. It can have a different primary key and even different table properties. When data changes in the underlying table from which the view was created, the data in the materialized view is updated as well.
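
The three roles can be illustrated with a toy in-memory model. This is only a sketch in plain Python, not how Cassandra stores data: the column names user and course are hypothetical, and build_view produces a one-off snapshot, whereas a real materialized view is kept up to date by the database.

```python
from collections import defaultdict
from datetime import date

# Toy model: partition key -> list of (clustering value, row), kept sorted.
table = defaultdict(list)


def insert(partition_key, clustering_value, row):
    """Store a row in its partition and keep the partition sorted by the clustering column."""
    partition = table[partition_key]
    partition.append((clustering_value, row))
    partition.sort(key=lambda item: item[0])


def read_partition(partition_key):
    """Return the rows of one partition in clustering order."""
    return [row for _, row in table[partition_key]]


def build_view(new_key):
    """Re-group every row under a different key, as a view with another primary key would."""
    view = defaultdict(list)
    for partition in table.values():
        for _, row in partition:
            view[row[new_key]].append(row)
    return view


insert("alice", date(2020, 3, 1), {"user": "alice", "course": "Spark"})
insert("alice", date(2019, 7, 9), {"user": "alice", "course": "Cassandra"})
insert("bob", date(2020, 1, 5), {"user": "bob", "course": "Spark"})

print(read_partition("alice"))       # rows come back ordered by date
print(sorted(build_view("course")))  # the view is keyed by course instead of user
```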

Admin and Dev Both

Question-: Which of the following is applicable when you design your table?
A. One of the nodes in the cluster should have all the data from all the remaining nodes in the cluster.
B. Each node in the cluster should have a roughly equal amount of data.
C. The partition key should come first when defining the primary key.
D. While reading data, you should try to read from as many partitions as possible.

Answer: B, C
Exp: You should aim for each node in the cluster to hold a roughly equal amount of data so that the cluster remains balanced. When defining the primary key, make sure that its first component is the partition key.
Partitions are groups of rows that share the same partition key. When you issue a read query, it should read rows from as few partitions as possible.
Each partition may reside on a different node in the cluster, and the coordinator node generally needs to issue separate commands to separate nodes for each partition you request. This adds overhead and latency. Even on a single-node cluster, reading data across partitions is expensive.
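
The effect of partition placement on reads can be sketched with a small simulation. The assumptions here are mine: a hypothetical three-node cluster, SHA-256 as a stand-in hash, and a simple modulo to pick the owner. Cassandra itself uses a partitioner (Murmur3Partitioner by default) and token ranges, so treat this only as an illustration of the idea.

```python
import hashlib
from collections import Counter

NODES = ["node1", "node2", "node3"]  # hypothetical three-node cluster


def owner(partition_key):
    """Map a partition key to a node by hashing it (stand-in for a real partitioner)."""
    digest = hashlib.sha256(repr(partition_key).encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return NODES[token % len(NODES)]


def nodes_contacted(partition_keys):
    """Return the set of nodes a coordinator would have to query."""
    return {owner(key) for key in partition_keys}


# 3000 hypothetical partition keys; hashing spreads them across the nodes.
keys = [(2019, f"course-{i}") for i in range(3000)]
print(Counter(owner(k) for k in keys))

# A single-partition read touches one node; a multi-partition read may touch several.
print(len(nodes_contacted(keys[:1])))
print(len(nodes_contacted(keys[:50])))
```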

Dev only

Related Questions

Question-: Which of the following statements are true?
A. Memtable is maintained on a per-table basis.
B. SSTable is maintained on a per-table basis.
C. Commit Log is shared among tables.
D. Memtable is shared among the tables.
E. SSTable is shared among the tables.

Question-: Compaction is the process that merges SSTables. Which of the following statements is true?
A. When inserts and updates happen, the Cassandra engine overwrites the existing rows in place.
B. The engine does not perform deletes by removing the deleted data. Instead, the database marks deleted data with a tombstone.
C. During compaction, there is a temporary spike in disk space usage as well as disk I/O.
D. The database can read data from the new SSTables even before the compaction process finishes.
E. During compaction, there is a high cache miss rate.
F. Out-of-date versions of a row may still exist on other nodes even after compaction has run on one node.
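
As background for this question, the merge step of compaction can be sketched as follows. This is a simplified toy model under my own assumptions: each SSTable is a dict of key to (write timestamp, value), a tombstone is represented by None, and a tombstone is purged once it is older than a grace period (gc_grace_seconds in Cassandra). Real compaction applies further conditions before dropping tombstones.

```python
TOMBSTONE = None  # marker used in this toy model for a deleted row


def compact(sstables, now, gc_grace_seconds):
    """Merge SSTables: keep the newest version of each key, then purge expired tombstones.

    Each SSTable is a dict of key -> (write_timestamp, value).
    """
    merged = {}
    for sstable in sstables:
        for key, (timestamp, value) in sstable.items():
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    return {
        key: (timestamp, value)
        for key, (timestamp, value) in merged.items()
        if not (value is TOMBSTONE and now - timestamp > gc_grace_seconds)
    }


old = {"row1": (100, "v1"), "row2": (100, "v1"), "row3": (100, "v1")}
new = {"row1": (200, "v2"), "row2": (150, TOMBSTONE), "row3": (900, TOMBSTONE)}

# row1 keeps its newer value, row2's old tombstone is purged,
# and row3's recent tombstone survives the merge.
print(compact([old, new], now=1000, gc_grace_seconds=500))
```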

Question-: Consider a Cassandra cluster set up with a replication factor of 3 to prevent data loss. Which of the following statements is true when you delete a row/data?

B. If one node has a record marked with a tombstone and another node has a more recent change, a read will not return the data.
C. If one node has a record with a tombstone marker and another node has an older value for that record, a read will return the data/record.
D. If a client writes a new update to an existing tombstoned record within the grace period, the existing tombstone record is overwritten.
E. The storage engine uses hinted handoffs to replay the database mutations that a node missed while it was down.

Question-: Select the correct statements with regard to tombstone markers.
A. Inserting or updating data with null values can generate tombstone records.
B. Tombstones go through the read path.
C. Tombstones go through the write path.
D. An excessive number of tombstones can improve the overall performance of the database.

Question-: You have been given below database design

CREATE KEYSPACE hadoopexam WITH replication =
{'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;

CREATE TABLE hadoopexam.price_by_year_and_name (
purchase_year int,
course_name text,
price int,
username text,
PRIMARY KEY ((purchase_year , course_name), price)
) WITH CLUSTERING ORDER BY (price ASC);

Which of the following DELETE statements will create a partition-level tombstone?

A. DELETE FROM hadoopexam.price_by_year_and_name WHERE purchase_year = 2019 AND course_name = 'Apache Spark Scala Training' AND price = 2000;
B. DELETE FROM hadoopexam.price_by_year_and_name WHERE purchase_year = 2019 AND course_name = 'Apache Spark Scala Training';
C. DELETE FROM hadoopexam.price_by_year_and_name WHERE purchase_year = 2019 AND course_name = 'Apache Spark Scala Training' AND price > 1999;
D. Partition-level tombstones cannot be created.
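
One way to reason about this question is to look at which key columns the WHERE clause restricts. The helper below is a hypothetical sketch, not a Cassandra API: it assumes the table above (partition key purchase_year and course_name, clustering column price) and only handles partition key columns restricted with =.

```python
PARTITION_KEY = ("purchase_year", "course_name")
CLUSTERING_COLUMNS = ("price",)


def tombstone_kind(where):
    """Classify a DELETE by the columns its WHERE clause restricts.

    `where` maps a column name to the operator used on it, such as "=" or ">".
    """
    if any(where.get(col) != "=" for col in PARTITION_KEY):
        raise ValueError("this sketch only handles partition key columns restricted with =")
    clustering_ops = [where[col] for col in CLUSTERING_COLUMNS if col in where]
    if not clustering_ops:
        return "partition"  # whole partition deleted
    if len(clustering_ops) == len(CLUSTERING_COLUMNS) and all(op == "=" for op in clustering_ops):
        return "row"        # full primary key given
    return "range"          # a slice of clustering values


print(tombstone_kind({"purchase_year": "=", "course_name": "=", "price": "="}))
print(tombstone_kind({"purchase_year": "=", "course_name": "="}))
print(tombstone_kind({"purchase_year": "=", "course_name": "=", "price": ">"}))
```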

Question-: You have a table in Cassandra as shown below

hadoopexam.price_by_year_and_name
(
purchase_year int,
course_name text,
price int,
username text
)

Here, purchase_year and course_name form the partition key, and a secondary index is created on price. Which of the following statements is applicable here?

A. Selecting all records with price > 1000 causes a single-partition read.
B. The price column will be used to order the data at the storage level.
C. The secondary index on the price column is stored locally on each node.
D. The price column will not be used to order the data at the storage level.