DataStax Cassandra Administrator Certification Questions and Answers (Practice Questions and Dumps)

Question-: You have been given the sample data below, with millions of rows

The data model must satisfy the following requirements.
- A query must be able to return the n newest users in a group.
- Data should be spread evenly across the nodes in the cluster.
- Each new day should start a new partition.
- The Analytics group has a far larger volume of data than any other group.
- The query should look something like the one below
SELECT * FROM he_group WHERE coursegroup = ? LIMIT ?

A.
CREATE TABLE he_group (
coursegroup text,
subs_timeuuid timeuuid,
subscribed_date text,
username text,
email text,
first text,
last text,
location text,
PRIMARY KEY ((coursegroup, subs_timeuuid), subscribed_date)
) WITH CLUSTERING ORDER BY (subs_timeuuid DESC);

B.
CREATE TABLE he_group (
coursegroup text,
subs_timeuuid timeuuid,
subscribed_date text,
username text,
email text,
first text,
last text,
location text,
PRIMARY KEY ((coursegroup), subscribed_date, subs_timeuuid)
) WITH CLUSTERING ORDER BY (subs_timeuuid DESC);

C.
CREATE TABLE he_group (
coursegroup text,
subs_timeuuid timeuuid,
subscribed_date text,
username text,
email text,
first text,
last text,
location text,
PRIMARY KEY ((coursegroup, subscribed_date), subs_timeuuid)
) WITH CLUSTERING ORDER BY (subs_timeuuid DESC);

D.
CREATE TABLE he_group (
coursegroup text,
subs_timeuuid timeuuid,
subscribed_date text,
username text,
email text,
first text,
last text,
location text,
PRIMARY KEY ((coursegroup, subscribed_date), subs_timeuuid)
) WITH CLUSTERING ORDER BY (subscribed_date DESC);

Answer: C
Exp: The question asks for the n newest users within a group, not across groups. The natural column to order by is therefore the timeuuid, and because the newest users must come first, the clustering order must be DESC on subs_timeuuid. Option D orders by subscribed_date instead, so it can be discarded.

Each new day must start a new partition, so subscribed_date has to be part of the partition key. In options A and B it is only a clustering column, so they can be discarded as well. Data is queried per group and must be sorted within the group, so coursegroup must also be part of the partition key. Bucketing by day also splits the very large Analytics group across many partitions. Hence, option C satisfies the given requirements.
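A minimal sketch of the option C layout in CQL follows. The date format in the comment is an assumption, and the read is written as a prepared statement. Because coursegroup and subscribed_date together form the partition key, the read has to restrict both columns, so the query carries one more condition than the one in the question.

```cql
-- Sketch of option C: one partition per (group, day), newest rows first.
CREATE TABLE he_group (
    coursegroup     text,
    subscribed_date text,      -- day bucket, e.g. '2020-01-15' (format is an assumption)
    subs_timeuuid   timeuuid,
    username        text,
    email           text,
    first           text,
    last            text,
    location        text,
    PRIMARY KEY ((coursegroup, subscribed_date), subs_timeuuid)
) WITH CLUSTERING ORDER BY (subs_timeuuid DESC);

-- Newest n users of one group on one day (prepared statement with bind markers).
-- Both partition key columns must be restricted with equality.
SELECT * FROM he_group
WHERE coursegroup = ? AND subscribed_date = ?
LIMIT ?;
```

Rows inside each partition are stored in descending subs_timeuuid order, so LIMIT returns the newest rows without an extra sort.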

Dev Only

Question-: You are designing a table with the columns (A, B, C, D, E) and you define the key as below
PRIMARY KEY (A, B, C)
Which of the following statements are true?

A. Columns A and B are the partition key
B. Column A is the partition key
C. Columns B and C form a composite clustering key
D. Column C is a clustering key

Answer: B, C
Exp: Let’s review a few basic concepts about primary, partition, and clustering keys.

- A primary key uniquely identifies a row.
- If the primary key is made of more than one column, it is known as a composite key.
- The partition key determines the physical location of the data in the cluster ring.
- The clustering key is the part of the primary key that is not the partition key. It is used to order data within each partition.
Let’s see a few examples of how primary keys are defined.

- Primary Key (A) : Column A is the partition key.
- Primary Key (A, B) : Column A is the partition key and column B is the clustering key.
- Primary Key ((A,B)) : Columns A and B together form a composite partition key.
- Primary Key (A,B,C) : Column A is the partition key, while (B,C) is a composite clustering key.
- Primary Key ((A,B),C) : Columns A and B form a composite partition key, while column C is the clustering key.
- Primary Key ((A,B), C, D) : Columns (A,B) form a composite partition key, while columns (C,D) form a composite clustering key.
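Two of the key layouts listed above can be sketched as CQL tables. The table names t1 and t2 and the int/text column types are assumptions made purely for illustration.

```cql
-- Hypothetical tables; names and column types are illustrative assumptions.

-- PRIMARY KEY (a, b, c): partition key is a; clustering columns are b, c.
-- Rows inside each partition are sorted by b, then by c.
CREATE TABLE t1 (
    a int, b int, c int, d text, e text,
    PRIMARY KEY (a, b, c)
);

-- PRIMARY KEY ((a, b), c): composite partition key (a, b); clustering column c.
CREATE TABLE t2 (
    a int, b int, c int, d text, e text,
    PRIMARY KEY ((a, b), c)
);

-- t1: only a is needed to locate the partition; b narrows the rows within it.
SELECT * FROM t1 WHERE a = 1 AND b = 2;

-- t2: both a and b are needed to locate the partition; c can then be range-filtered.
SELECT * FROM t2 WHERE a = 1 AND b = 2 AND c > 10;
```

The inner parentheses are what distinguish a composite partition key from a partition key followed by clustering columns.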

Admin/Dev both

Question-: Which of the following should be taken care of while designing a data model in Apache Cassandra?

A. Data should be evenly distributed across the nodes in the cluster.
B. While reading data, we have to make sure that the minimum number of partitions is read.
C. While reading data, we have to make sure that as many partitions as possible are read (try to maximize it).
D. Data duplication is encouraged to avoid reading from multiple tables.

Answer: A,B,D
Exp: When designing a data model in Cassandra, take care of the following:

1. Spread data evenly across the cluster. Rows are distributed across the cluster based on the hash of the partition key, so choose a partition key that distributes data evenly.
2. Minimize the number of partitions read by a query, ideally to just one. Different partitions usually reside on different nodes, and the coordinator node has to issue a separate request for each partition it reads.
3. Design one table per query pattern. If several queries need the same data, duplicate it across several tables rather than reading from multiple tables.
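The three points above can be sketched with a pair of hypothetical tables that hold the same user data, one table per query pattern. The table names, columns, and sample values are assumptions for illustration; a logged batch is only one possible way to keep the two copies in step.

```cql
-- Hypothetical tables: the same data stored twice, keyed for two different queries.
CREATE TABLE users_by_username (
    username text,
    email    text,
    location text,
    PRIMARY KEY (username)
);

CREATE TABLE users_by_email (
    email    text,
    username text,
    location text,
    PRIMARY KEY (email)
);

-- Write both copies together.
BEGIN BATCH
    INSERT INTO users_by_username (username, email, location)
        VALUES ('alice', 'alice@example.com', 'Pune');
    INSERT INTO users_by_email (email, username, location)
        VALUES ('alice@example.com', 'alice', 'Pune');
APPLY BATCH;

-- Each read is served from exactly one partition of one table.
SELECT * FROM users_by_username WHERE username = 'alice';
SELECT * FROM users_by_email WHERE email = 'alice@example.com';

-- token() returns the hash of the partition key, which decides where a row is placed.
SELECT token(username), username FROM users_by_username LIMIT 10;
```

Both partition keys here are high-cardinality values, which is what lets the hash spread rows evenly across nodes.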

Dev Only

Related Questions

Question-: Which of the following statements is correct with regard to write consistency?
A. Write to first replica and the replica crashes one second later. The other messages are not delivered. The data is lost.
B. Write to first replica and the operation times out. Future reads can return the old or the new value. You will not know the data is incorrect.
C. Write to first replica and one of the other replicas is down. The node comes back online. The application will get old data from that node until the node gets the correct data or a read repair occurs.
D. Write at QUORUM and then a read at QUORUM. One of the replicas dies. You will always get the correct data.

Question-: Which of the following is correct for transactions in Cassandra?
A. Cassandra offers atomic, isolated and durable transactions with eventual and tunable consistency.
B. Cassandra does not support consistency in the ACID sense.
C. Cassandra supports atomicity and isolation at row level.
D. Inserts or updates of more than two rows in the same partition are treated as one write operation.
E. A delete operation is not atomic at partition level.

Question-: Suppose you have set the consistency level to QUORUM with the replication factor as , which of the following statements are correct?
A. The database replicates the write to all nodes in the cluster and waits for acknowledgement from two nodes.
B. If the write fails on one node and succeeds on another node, Cassandra will report it as a failure.
C. If the write fails on one node and succeeds on another node, then replicated write that succeeds on the other node will be rolled back.
D. If the write fails on one node and succeeds on another node, then replicated write that succeeds on the other node will not be rolled back.

Question-: Which of the following statements are correct with regard to isolation in the Cassandra database?
A. A write to a row within a single partition on a single node is only visible to the client performing the operation.
B. A write to a row within a single partition on a single node is visible to all clients connecting to the database.
C. All updates in a batch operation belonging to a given partition key on a single node are only visible to the client performing the operation.
D. All updates in a batch operation belonging to given partition keys on multiple nodes are not isolated.

Question-: Which of the following is a valid statement with regard to the Gossip protocol in a Cassandra database setup?
A. You should set up every node as a seed node for better performance in the cluster, so that each node is well aware of the others in the ring.
B. You should use the same list of seed nodes for each node in the cluster.
C. The seed node is the only single point of failure in a Cassandra cluster setup. Hence, you should set up more than one node as a seed node.
D. To permanently change a node’s membership in a cluster, you must explicitly add or remove nodes from the cluster.

Question-: What are the benefits of defining v-nodes, or converting a physical node into multiple v-nodes, in a Cassandra cluster?
A. With v-nodes, tokens are automatically calculated for each v-node and assigned accordingly.
B. When nodes are added or removed, the cluster automatically rebalances and the load is evenly distributed across the nodes.
C. A new node added to the cluster can be built faster, because every node shares the load of building it.
D. The proportion of vnodes assigned to each machine in a cluster can be assigned, so smaller and larger nodes can be used in the cluster.