Question-: Which of the following statements about configuration are correct?
A. You should use the connect.yaml file to define and configure client connections and security.
B. You should place the commit log directory on a different disk drive from the data file directories.
C. cassandra.yaml should be used to define the caching parameters of tables.
D. A, C
E. B, C
F. A, B, C
Answer: E
Explanation: In Cassandra, the main configuration file is cassandra.yaml. It is used to set the initialization properties for a cluster, caching parameters for tables, tuning and resource utilization, timeout settings, client connections, backups, and security. Because client connections and security are configured in cassandra.yaml, not in a separate connect.yaml file, option A is incorrect. You should also place the commit log directory on a different disk drive from the data file directories. Based on the above, options B and C are correct.
Admin and Dev both
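The commit log advice above can be checked with a small script. The following is only a minimal sketch: it assumes the settings from cassandra.yaml have already been loaded into a Python dict, and it uses the real setting names commitlog_directory and data_file_directories. The mount points are hypothetical, and the check is a path heuristic only; it flags a commit log that sits inside (or contains) a data directory, but it cannot prove that two paths are on different physical drives.

```python
from pathlib import PurePosixPath


def commitlog_shares_data_path(config: dict) -> bool:
    """Return True if the commit log directory equals, is nested inside,
    or contains any of the data file directories (path heuristic only)."""
    commitlog = PurePosixPath(config["commitlog_directory"])
    for data_dir in config["data_file_directories"]:
        data = PurePosixPath(data_dir)
        if commitlog == data or data in commitlog.parents or commitlog in data.parents:
            return True
    return False


# Hypothetical layout: commit log and data on separately mounted disks.
separated = {
    "commitlog_directory": "/mnt/commitlog_disk/cassandra/commitlog",
    "data_file_directories": ["/mnt/data_disk/cassandra/data"],
}

# Hypothetical layout: commit log nested under the data directory.
nested = {
    "commitlog_directory": "/var/lib/cassandra/data/commitlog",
    "data_file_directories": ["/var/lib/cassandra/data"],
}

print(commitlog_shares_data_path(separated))  # False
print(commitlog_shares_data_path(nested))     # True
```

On a live host, comparing the device IDs reported by the operating system for each directory would be a stronger check than comparing path strings.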
Question-: Map each of the following terms to its description.
A. Clustering column
B. Materialized View
C. Partition Key
1. Using this, data can be divided into logical groups.
2. It helps in retrieving data sorted by a date column.
3. You can build another table from an existing table.
Answer: A-2, B-3, C-1
Exp: In Cassandra, you store data in tables, similar to an RDBMS, and every table must have a primary key.
Partition key: The partition key determines which node stores the data and divides the data into logical groups. A well-chosen partition key distributes data evenly across the nodes in the cluster. Please note: for efficiency and performance, queries and write requests that span multiple partitions should be avoided.
Clustering column: The clustering column determines how data is sorted within a partition. If you want to fetch data sorted by a date column, define the date column as a clustering column.
Materialized view: As in an RDBMS, this is a table created from another table. It can have a different primary key and even different properties. When the data in the underlying table from which the view is created changes, the data in the materialized view is also updated.
Admin and Dev Both
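The three concepts above can be illustrated with a toy in-memory model. This is a sketch, not how Cassandra is implemented: the sensor rows and the build_table helper are hypothetical, a "table" is just a dict of partitions, and the "view" is rebuilt by hand, whereas a real materialized view is kept up to date by the database.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (sensor_id, reading_date, value)
rows = [
    ("sensor-1", date(2024, 3, 2), 21.5),
    ("sensor-2", date(2024, 3, 1), 19.0),
    ("sensor-1", date(2024, 3, 1), 20.1),
]


def build_table(rows, partition_key, clustering_key):
    """Group rows by partition key, then sort each group by clustering key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_key(row)].append(row)
    for partition in partitions.values():
        partition.sort(key=clustering_key)
    return dict(partitions)


# Base table: partitioned by sensor_id, sorted by reading_date in each partition.
base = build_table(rows, partition_key=lambda r: r[0], clustering_key=lambda r: r[1])

# "View" over the same rows with a different primary key:
# partitioned by reading_date, sorted by sensor_id.
by_date = build_table(rows, partition_key=lambda r: r[1], clustering_key=lambda r: r[0])

# Readings for sensor-1 come back already ordered by date.
print([r[1].isoformat() for r in base["sensor-1"]])  # ['2024-03-01', '2024-03-02']
print([r[0] for r in by_date[date(2024, 3, 1)]])     # ['sensor-1', 'sensor-2']
```

The same rows answer two different queries efficiently only because each layout picks its own partition key and clustering column, which is the reason a view with a different primary key is useful.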
Question-: Which of the following apply when you design your table?
A. One of the nodes in the cluster should hold all the data from all the remaining nodes in the cluster.
B. Each node in the cluster should hold a roughly equal amount of data.
C. The partition key should be the first column when defining the primary key.
D. While reading data, you should try to read from as many partitions as possible.
Answer: B, C
Exp: Each node in the cluster should hold a roughly equal amount of data so that the cluster remains balanced. When defining the primary key, make sure that its first column is the partition key. Partitions are groups of rows that share the same partition key. A read query should read rows from as few partitions as possible. Each partition may reside on a different node in the cluster, and the coordinator node generally needs to issue separate commands to separate nodes for each partition you request, which adds overhead and latency. Even on a single-node cluster, reading data across partitions is expensive.
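The cost of reading many partitions can be sketched with a toy placement function. Everything here is an assumption made for illustration: the three node names are hypothetical, an MD5 hash from the Python standard library stands in for Cassandra's partitioner, and simple modulo placement stands in for the token ring and replication. The point is only that one partition key maps to one node, while many keys tend to spread the request over several nodes.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster


def node_for(partition_key: str) -> str:
    """Map a partition key to a node using a deterministic hash (toy placement)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return NODES[token % len(NODES)]


def nodes_contacted(partition_keys) -> set:
    """Return the set of nodes a coordinator would need to contact."""
    return {node_for(key) for key in partition_keys}


# A query restricted to one partition touches exactly one node.
single = nodes_contacted(["user-42"])

# A query spanning many partitions may fan out to several nodes.
many = nodes_contacted([f"user-{i}" for i in range(100)])

print(len(single))
print(len(many))
```

Counting how many keys land on each node with the same node_for function also gives a rough feel for whether a candidate partition key keeps the nodes balanced.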