DataStax Cassandra Administrator Certification Questions and Answers (Practice Questions and Dumps)

Question-: Which of the following are possible node types in a Cassandra datacenter?
A. Transactional
B. Graph
C. Analytics
D. Search
E. SearchAnalytics

Answer: A, B, C, D, E

Explanation: Each datacenter should contain only one node type, and each node type serves a specific purpose:
- Transactional: used mainly for storing transactional data, for example e-commerce user purchases.
- Graph: used for analyzing, searching, and managing highly connected data, such as social network data.
- Analytics: integrated with Apache Spark for analytics workloads.
- Search: integrated with Apache Solr to provide efficient search.
- SearchAnalytics: combines the search engine and analytics on the same nodes.
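
The one-node-type-per-datacenter rule above can be checked mechanically. The following Python is a minimal sketch, not a DataStax tool. The `NODES` inventory and the `mixed_workload_dcs` helper are hypothetical, and SearchAnalytics is treated as one combined type, as in the list above.

```python
from collections import defaultdict

# Hypothetical inventory: node name -> (datacenter, workload type).
NODES = {
    "node1": ("DC1", "Transactional"),
    "node2": ("DC1", "Transactional"),
    "node3": ("DC2", "Analytics"),
    "node4": ("DC2", "Search"),
}

def mixed_workload_dcs(nodes):
    """Return {datacenter: sorted workload types} for datacenters with more than one type."""
    types_per_dc = defaultdict(set)
    for dc, workload in nodes.values():
        types_per_dc[dc].add(workload)
    return {dc: sorted(types) for dc, types in types_per_dc.items() if len(types) > 1}

print(mixed_workload_dcs(NODES))  # {'DC2': ['Analytics', 'Search']}
```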

Question-: Which of the following statements are correct with regard to data storage in Cassandra?
A. All data is first written to SSTables.
B. All data is first written to the commit log.
C. Once data is written to the commit log, it can be archived or deleted.
D. Once data is written to SSTables, the commit log can be archived or deleted.

Answer: B, D

Explanation: First, understand the purpose of the commit log and of SSTables.
Whenever a write happens, the data is first appended to the commit log for durability and is also written to an in-memory memtable. The memtable is later flushed to SSTables on disk. Once the data has been flushed, the corresponding commit log segments can be archived, deleted, or recycled.
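
The write path described above can be sketched as a toy model. The `Node` class below is hypothetical and heavily simplified: it models one table and one memtable, with no real disk I/O.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Toy model of the Cassandra write path (not the real implementation)."""
    commit_log: list = field(default_factory=list)  # append-only durability log
    memtable: dict = field(default_factory=dict)    # in-memory, mutable
    sstables: list = field(default_factory=list)    # immutable, sorted snapshots

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. durability first
        self.memtable[key] = value            # 2. then the memtable

    def flush(self):
        # 3. Write the memtable out as a sorted, immutable SSTable...
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable.clear()
        # 4. ...after which the commit log entries it covers can be
        #    archived, deleted, or recycled (here they are simply discarded).
        self.commit_log.clear()

node = Node()
node.write("user1", "Analytics")
node.write("user2", "BigData")
node.flush()
print(node.sstables)    # [(('user1', 'Analytics'), ('user2', 'BigData'))]
print(node.commit_log)  # []
```

Real Cassandra releases a commit log segment only after every memtable holding data from that segment has been flushed. The sketch collapses this to a single memtable.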

Question-: Which of the following statements about SSTables are correct?
A. An SSTable is a mutable data file.
B. An SSTable is an immutable data file.
C. SSTables are append-only files.
D. SSTables are stored sequentially on disk and maintained separately for each database table.

Answer: B, C, D
Explanation: SSTable stands for Sorted String Table, an immutable data file. Cassandra periodically flushes memtables to SSTables. SSTables are append-only and are stored on disk sequentially, with separate SSTables maintained for each database table.
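
Because SSTables are immutable, an update never rewrites an existing SSTable. The new value lands in a newer SSTable, and a read reconciles the versions by write timestamp (last write wins) until compaction merges the files. The sketch below uses hypothetical `SSTable` and `read` helpers. It reconciles whole rows, whereas Cassandra reconciles individual cells.

```python
from bisect import bisect_left

class SSTable:
    """Toy immutable sorted table: built once, never modified."""
    def __init__(self, rows):
        # rows: iterable of (key, timestamp, value); stored sorted by key
        self._rows = tuple(sorted(rows))
        self._keys = tuple(r[0] for r in self._rows)

    def get(self, key):
        # Binary search works because the rows are sorted by key.
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._rows[i]
        return None

def read(sstables, key):
    """Merge all SSTables on read and keep the version with the newest timestamp."""
    hits = [row for row in (t.get(key) for t in sstables) if row is not None]
    return max(hits, key=lambda r: r[1])[2] if hits else None

old = SSTable([("user1", 1, "a@example.com"), ("user2", 1, "b@example.com")])
new = SSTable([("user1", 2, "a@new.example.com")])  # the update goes to a new SSTable
print(read([old, new], "user1"))  # a@new.example.com
print(read([old, new], "user2"))  # b@example.com
```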

Related Questions

Question-: Which of the following statements are correct with regard to configuration?
A. You should use the connect.yaml file to define and configure client connections and security.
B. You should place the commit log directory on a different disk drive from the data file directories.
C. The cassandra.yaml file should be used to define the caching parameters of tables.
D. A,C
E. B,C
F. A,B,C
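
Option B concerns disk placement. In cassandra.yaml the two locations are set by `commitlog_directory` and `data_file_directories`. You can check whether two configured paths really sit on different devices by comparing `os.stat(...).st_dev`. The helper name `on_separate_devices` is hypothetical, and this is a generic Python check, not a Cassandra utility.

```python
import os
import tempfile

def on_separate_devices(commitlog_dir: str, data_dir: str) -> bool:
    """True if the two directories are on different devices/filesystems."""
    return os.stat(commitlog_dir).st_dev != os.stat(data_dir).st_dev

# Demo: two sibling temp directories share one filesystem, so this prints False.
with tempfile.TemporaryDirectory() as base:
    commitlog = os.path.join(base, "commitlog")
    data = os.path.join(base, "data")
    os.makedirs(commitlog)
    os.makedirs(data)
    print(on_separate_devices(commitlog, data))  # False
```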

Question-: Match the following:
A. Clustering column
B. Materialized View
C. Partition Key

1. Using this, data can be divided into logical groups.
2. It helps in retrieving data sorted by a date column.
3. You can build another table from an existing table.
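
The difference between a partition key and a clustering column can be shown with a small example. The sketch below groups hypothetical enrollment rows by a partition key (`coursegroup`) and keeps each partition sorted by a date clustering column (`enrolled_on`). This is roughly what `PRIMARY KEY ((coursegroup), enrolled_on, username)` would do. The column names and data are made up for illustration, and materialized views are not modeled.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (coursegroup, enrolled_on, username)
rows = [
    ("BigData", date(2021, 3, 1), "carol"),
    ("Analytics", date(2021, 5, 2), "bob"),
    ("BigData", date(2021, 1, 9), "dave"),
    ("Analytics", date(2021, 2, 7), "alice"),
]

def build_partitions(rows):
    """Group by partition key, then sort each partition by the clustering columns."""
    partitions = defaultdict(list)
    for group, enrolled_on, user in rows:
        partitions[group].append((enrolled_on, user))
    return {group: sorted(members) for group, members in partitions.items()}

for group, members in build_partitions(rows).items():
    print(group, [user for _, user in members])
# BigData ['dave', 'carol']
# Analytics ['alice', 'bob']
```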

Question-: Which of the following apply when you design your table?
A. One node in the cluster should hold all the data from all the remaining nodes.
B. Each node in the cluster should hold roughly an equal amount of data.
C. The partition key should be the first column when defining the primary key.
D. While reading data, you should try to read from as many partitions as possible.
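
Options B and D depend on how partition keys map to nodes. The sketch below hashes partition keys onto a simplified token ring and counts how many keys each node owns. It assumes MD5 tokens (similar in spirit to the older RandomPartitioner; the default Murmur3Partitioner uses a different hash), one token per node, and no replication. The helper names are hypothetical.

```python
import bisect
import hashlib
from collections import Counter

# Assumption: MD5-based tokens in [0, 2**127); not Murmur3.
RING_SIZE = 2**127

def token(partition_key: str) -> int:
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE

def build_ring(num_nodes: int) -> list:
    # One evenly spaced token per node (no vnodes, no replication).
    return [i * RING_SIZE // num_nodes for i in range(num_nodes)]

def owner(ring: list, tok: int) -> int:
    # A node owns (previous token, its own token]; tokens past the
    # last node wrap around to the first node.
    return bisect.bisect_left(ring, tok) % len(ring)

def distribution(partition_keys, num_nodes: int) -> Counter:
    ring = build_ring(num_nodes)
    return Counter(owner(ring, token(k)) for k in partition_keys)

# Many distinct partition keys spread out; one hot key lands on a single node.
print(sorted(distribution([f"user{i}" for i in range(10_000)], 5).values()))
print(distribution(["Analytics"] * 10_000, 5))
```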

Question-: Which of the following is the correct way to model your data so that the minimum number of partitions is read per query?
A. Model your data around relationships among the data.
B. Model your data around relationships among the objects.
C. Model your data around the queries you will be using.
D. A,C
E. B,C

Question-: You want to store the subscription details of users who subscribed to courses on HadoopExam.com, and you want to group the users by course type. Sample data is shown below.

The Analytics group has the most users (more than a million), while the BigData group has only a few thousand. Which of the following table designs spreads the data evenly across a 5-node cluster, given that queries always include CourseGroup in the condition and the data should be ordered by username?
Here, hash_prefix holds a prefix of a hash of the username: the first byte of the hash, modulo four.
There are also thousands of CourseGroup values in the data.
A. CREATE TABLE HE_GROUP (
  coursegroup text,
  username text,
  email text,
  first text,
  last text,
  location text,
  hash_prefix int,
  PRIMARY KEY ((coursegroup), username)
);

B. CREATE TABLE HE_GROUP (
  coursegroup text,
  username text,
  email text,
  first text,
  last text,
  location text,
  hash_prefix int,
  PRIMARY KEY ((coursegroup, hash_prefix), username)
);

C. CREATE TABLE HE_GROUP (
  coursegroup text,
  username text,
  email text,
  first text,
  last text,
  location text,
  hash_prefix int,
  PRIMARY KEY ((coursegroup, email), username)
);

D. CREATE TABLE HE_GROUP (
  coursegroup text,
  username text,
  email text,
  first text,
  last text,
  location text,
  hash_prefix int,
  PRIMARY KEY ((username, email), username)
);
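
The hash_prefix column described in this question can be computed as below. The source does not say which hash function is used, so MD5 is an assumption. The sketch shows how a composite partition key such as `(coursegroup, hash_prefix)` splits a single large group into up to four partitions, each kept sorted by username. It also shows how a client could merge the four sorted partitions back into one username-ordered result. The user data is hypothetical.

```python
import hashlib
import heapq
from collections import defaultdict

def hash_prefix(username: str) -> int:
    # First byte of the hash, modulo four (MD5 is an assumed hash choice).
    return hashlib.md5(username.encode("utf-8")).digest()[0] % 4

# Hypothetical: 100,000 users, all in the large "Analytics" group.
users = [f"user{i}" for i in range(100_000)]

# Partition key (coursegroup, hash_prefix) -> usernames (the clustering column).
partitions = defaultdict(list)
for u in users:
    partitions[("Analytics", hash_prefix(u))].append(u)
for members in partitions.values():
    members.sort()  # the clustering column keeps each partition sorted by username

for key in sorted(partitions):
    print(key, len(partitions[key]))  # one line per (coursegroup, hash_prefix) partition

# Reading the whole group means reading every prefix (0-3); merging the
# already-sorted partitions yields one username-ordered stream.
merged = list(heapq.merge(*(partitions[("Analytics", p)] for p in range(4))))
```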

Question-: You have been given the sample data below.

Your data model is as follows. There are thousands of CourseGroup values in the data.

CREATE TABLE he_users (
  id uuid PRIMARY KEY,
  username text,
  email text,
  location text,
  first text,
  last text
);

CREATE TABLE he_groups (
  coursegroup text,
  user_id uuid,
  PRIMARY KEY (coursegroup, user_id)
);

Which of the following statements are correct?

A. This model reduces the duplication of user data across many groups.
B. To get all the user info, we need to read all the partitions. For 1000 groups, 1000 partitions will be read.
C. This is a good model for read-heavy workloads.
D. This is a good model for very frequent updates of user info.
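
The trade-off behind these options can be made concrete with in-memory stand-ins for `he_users` and `he_groups`. The dictionaries, the integer IDs (standing in for the `uuid` values), and the helper functions below are hypothetical. The sketch only counts partitions touched.

```python
# Hypothetical stand-ins for the two tables above.
he_users = {  # id -> user row; each id is its own partition
    1: {"username": "alice"},
    2: {"username": "bob"},
    3: {"username": "carol"},
}
he_groups = {  # coursegroup -> clustered user_ids (one partition per group)
    "Analytics": [1, 2, 3],
    "BigData": [2],
}

def group_members(group):
    """Return (usernames, partitions read) for one group under this normalized model."""
    reads = 1  # one he_groups partition
    names = []
    for user_id in he_groups.get(group, []):
        reads += 1  # one he_users partition per member
        names.append(he_users[user_id]["username"])
    return names, reads

def update_email(user_id, email):
    """Update a user's email; return the number of partitions written."""
    he_users[user_id]["email"] = email
    return 1  # one partition, however many groups the user belongs to

print(group_members("Analytics"))  # (['alice', 'bob', 'carol'], 4)
```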