DataStax Cassandra Administrator Certification Questions and Answers (Practice Questions and Dumps)

Question-: You want to increase the memory allocated to the Java process that runs the Cassandra node. Which of the following is the correct place to increase the memory?
A. You would update the logback.xml file
B. You would update the cassandra.yaml file
C. You would update the jvm.options file
D. You would update the node system properties

Answer: C

Explanation: The Java process requires its own memory. Suppose the machine on which the Cassandra node is set up has 256GB of RAM. You might dedicate 128GB of this to the Java process alone so that the Cassandra node runs smoothly. Other processes also run on the machine, so memory must remain available for them: in this case, 128GB for the Java process and the remaining 128GB for everything else.
The next question is how to allocate memory to the Java process. If you have worked with Java, you may know that you can pass options when starting the process, such as -Xmx (maximum heap size) and -Xms (initial heap size). The memory for the New Generation of the Java process can be set using -Xmn.
So where do you configure these values so that the Cassandra node reads them at startup and sets the options accordingly?
The file is called jvm.options, and you can use it to set all of these values. It is always recommended that you leave 15-20% of RAM for other processes.
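As a minimal sketch of what ends up in jvm.options, the Python helper below renders the two heap flags for a chosen heap size. The function name heap_option_lines is hypothetical and exists only for illustration; the 128 GB value simply mirrors the example in the explanation, and using one value for both flags is an assumption of this sketch.

```python
def heap_option_lines(heap_gb):
    """Return the -Xms/-Xmx lines for a heap of heap_gb gigabytes.

    Hypothetical helper for illustration only: it formats text and
    does not read or modify any Cassandra installation.
    """
    if heap_gb <= 0:
        raise ValueError("heap size must be positive")
    # Assumption of this sketch: initial (-Xms) and maximum (-Xmx)
    # heap are set to the same value.
    return ["-Xms{}G".format(heap_gb), "-Xmx{}G".format(heap_gb)]


if __name__ == "__main__":
    # Print the lines one would place in jvm.options for a 128 GB heap.
    for line in heap_option_lines(128):
        print(line)
```

The printed lines are the kind of entries that go into jvm.options, one flag per line.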

Question-: Which of the following are recommendations for Java GC tuning on a Cassandra cluster node?
A. Heap size should always be between ¼ and ½ of the available memory (RAM) on the node.
B. The node should not use the off-heap cache and file system cache.
C. Enable GC logging while tuning the GC parameters.
D. The GCInspector class logs information about any garbage collection that takes longer than 200ms.

Answer: A,C,D

Explanation: GC settings and tuning are critical for a Cassandra cluster. Setting the Java heap size above 32GB may interfere with the OS page cache. Operating systems that maintain the OS page cache for frequently accessed data are very good at keeping this data in memory. Properly tuning the OS page cache usually results in better performance than increasing the row cache. There are some guidelines you need to follow while tuning the GC:
- Heap size is usually between ¼ and ½ of system memory and should not be larger than 32GB.
- You should always keep enough memory for the off-heap cache and file system cache.
- Enable GC logging when adjusting GC.
- Enable parallel processing for GC.
- The GCInspector class logs information about any garbage collection that takes longer than 200ms. Garbage collections that occur frequently and take a moderate length of time (seconds) to complete indicate excessive garbage collection pressure on the JVM.
To enable GC logging, open the jvm.options file and update the following setting:
-Xloggc:/var/log/cassandra/gc.log
After changing this value, the node must be restarted.
Heap sizes are also configured in this file:
-Xms and -Xmx (These should generally be kept the same, so that all required memory is reserved by the node's JVM process at startup. This can also help avoid GC pauses.)
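The heap sizing guideline above (between ¼ and ½ of system memory, never more than 32GB) can be sketched as a small calculation. The function name recommended_heap_range_gb and its cap_gb parameter are hypothetical names chosen for this illustration; the sketch only restates the rule of thumb and is not a tuning tool.

```python
def recommended_heap_range_gb(ram_gb, cap_gb=32.0):
    """Return (low, high) heap bounds in GB for a node with ram_gb of RAM.

    Applies the guideline from the explanation: heap between 1/4 and
    1/2 of system memory, with both bounds capped at cap_gb (32 GB).
    Hypothetical helper, for illustration only.
    """
    if ram_gb <= 0:
        raise ValueError("RAM size must be positive")
    low = min(ram_gb / 4, cap_gb)   # quarter of RAM, capped
    high = min(ram_gb / 2, cap_gb)  # half of RAM, capped
    return (low, high)


if __name__ == "__main__":
    for ram in (16, 64, 256):
        print(ram, recommended_heap_range_gb(ram))
```

On large machines the cap dominates: with 256GB of RAM both bounds collapse to 32GB, and the rest of the memory stays available for the off-heap cache and the file system cache.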

Question-: You recently faced GC issues on a particular node in the Cassandra cluster. To troubleshoot, you enabled GC logging, identified the issue, and fixed it. You then left GC logging on for this node for the next few days in case the issue recurs and you need to troubleshoot again. Is this fine?
A. Yes
B. No

Answer: B

Explanation: Leaving GC logging on can be detrimental to your ring (the Cassandra cluster), so you should avoid keeping it enabled once troubleshooting is complete.

Related Questions

Question-: Which of the following statements are correct with regard to configuration?
A. You should use the connect.yaml file to define and configure client connections and security.
B. You should place the commit log directory on a different disk drive from the data file directories.
C. cassandra.yaml should be used to define the caching parameters of tables.
D. A,C
E. B,C
F. A,B,C

Question-: Map the following
A. Clustering column
B. Materialized View
C. Partition Key

1. Using this, data can be divided into logical groups.
2. It helps in retrieving data sorted by a date column.
3. You can build another table using the existing table.

Question-: Which of the following is applicable when you design your table?
A. One of the nodes in the cluster should have all the data from all the remaining nodes in the cluster.
B. Each node in the cluster should have a roughly equal amount of data.
C. The partition key should be the first column when defining the primary key.
D. While reading data, you should try to read from as many partitions as possible.

Question-: Which of the following is the correct way to model your data so that the minimum number of partitions is read while querying?
A. Model your data around relationships among the data.
B. Model your data around relationships among the objects.
C. Model your data around the queries you will be using.
D. A,C
E. B,C

Question-: You want to store all the subscription details for the users subscribed to courses on HadoopExam.com. However, we want to group the users by course type. Sample data is shown below.

As you know, the Analytics group has the highest number of users (more than a million), while the BigData group has a few thousand users. Which of the following table designs is suitable so that data is spread evenly across a 5-node cluster, queries always include CourseGroup as part of the condition, and data is ordered by username?
Here, hash_prefix holds a prefix of a hash of the username: the first byte of the hash modulo four.
There are also thousands of CourseGroup values in the data.
A. CREATE TABLE HE_GROUP (
coursegroup text,
username text,
email text,
first text,
last text,
location text,
hash_prefix int,
PRIMARY KEY ((coursegroup), username)
)

B. CREATE TABLE HE_GROUP(
coursegroup text,
username text,
email text,
first text,
last text,
location text,
hash_prefix int,
PRIMARY KEY ((coursegroup, hash_prefix), username)
)

C.
CREATE TABLE HE_GROUP(
coursegroup text,
username text,
email text,
first text,
last text,
location text,
hash_prefix int,
PRIMARY KEY ((coursegroup, email ), username)
)

D.
CREATE TABLE HE_GROUP(
coursegroup text,
username text,
email text,
first text,
last text,
location text,
hash_prefix int,
PRIMARY KEY ((username, email), username)
)
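
The hash_prefix column described in the question above (first byte of a hash of the username, modulo four) could be computed as in the following sketch. The question does not say which hash function is used, so MD5 is an assumption here, and the function names hash_prefix and bucketed_key are hypothetical.

```python
import hashlib


def hash_prefix(username, buckets=4):
    """Return the first byte of a hash of username, modulo buckets.

    Assumption: MD5 is used purely for illustration; the question
    does not name a hash function.
    """
    digest = hashlib.md5(username.encode("utf-8")).digest()
    return digest[0] % buckets


def bucketed_key(coursegroup, username):
    """Pair a course group with the user's hash_prefix bucket.

    Hypothetical helper showing how one large group is split into
    at most four buckets when the prefix is part of the key.
    """
    return (coursegroup, hash_prefix(username))


if __name__ == "__main__":
    for name in ("alice", "bob", "carol"):
        print(name, bucketed_key("Analytics", name))
```

Because the prefix depends only on the username, the same user always lands in the same bucket, and a query for a whole group has to cover all four prefix values.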

Question-: You have been given the sample data below.

Your data model is as below. There are thousands of CourseGroup values in the data.

CREATE TABLE he_users (
id uuid PRIMARY KEY,
username text,
email text,
location text,
first text,
last text
)

CREATE TABLE he_groups (
coursegroup text,
user_id uuid,
PRIMARY KEY (coursegroup, user_id)
)

Which of the following statements are correct?

A. This model reduces the duplication of users across many groups.
B. To get all the user info, we need to read all the partitions. For 1000 groups, 1000 partitions will be read.
C. This is a good model for heavy reads.
D. This is a good model for very frequent updates of user info.
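
To reason about the read cost of this two-table model, the sketch below mimics it with in-memory Python dictionaries and counts how many partitions a group lookup touches. It is an illustration only, not Cassandra code: the dictionaries and the functions add_user and users_in_group are hypothetical stand-ins for the he_users and he_groups tables.

```python
import uuid

# Stand-ins for the two tables: one entry per partition.
he_users = {}   # user id -> user row (partition key: id)
he_groups = {}  # coursegroup -> list of user ids (partition key: coursegroup)


def add_user(username, coursegroup):
    """Store the user once and reference it from the group by id."""
    user_id = uuid.uuid4()
    he_users[user_id] = {"id": user_id, "username": username}
    he_groups.setdefault(coursegroup, []).append(user_id)
    return user_id


def users_in_group(coursegroup):
    """Return (rows, partitions_read) for all users in one group."""
    partitions_read = 1  # the he_groups partition for this group
    rows = []
    for user_id in he_groups.get(coursegroup, []):
        rows.append(he_users[user_id])  # one he_users partition per user
        partitions_read += 1
    return rows, partitions_read


if __name__ == "__main__":
    for name in ("alice", "bob", "carol"):
        add_user(name, "BigData")
    rows, reads = users_in_group("BigData")
    print(len(rows), "users,", reads, "partitions read")
```

In this sketch each user row is stored exactly once, so updating a user touches a single partition, while listing the users of a group costs one read for the group plus one read per member.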