Question-: Suppose there are two database tables in your Cassandra cluster, HE_VISIT and HE_SIGNUP. The HE_VISIT table has high throughput, with frequent write operations. Which of the following statements are correct with regard to the memtable flush for each table? A. Table HE_VISIT's memtable fills up rapidly and gets flushed more frequently than that of table HE_SIGNUP. B. Table HE_SIGNUP's memtable fills up slowly and rarely gets flushed. C. When the commit log reaches its maximum size, it forces HE_SIGNUP's memtable to flush. D. If the commit log space and the memtable space are the same size, then table HE_SIGNUP's memtable would flush every time table HE_VISIT's memtable is flushed.
Answer: A,B,C,D Exp: Suppose you have two tables, table A and table B, where table A receives very heavy writes and table B receives very few. The commit log contains the writes for both tables. The memtable of table A fills very rapidly, so it is flushed frequently, whereas the memtable of table B fills slowly and is flushed very rarely. The role of the commit log is to hold the data written to both tables, and it has its own defined maximum size. When it reaches that maximum, it forces the memtables that still have unflushed data in the oldest part of the log to be flushed to disk as SSTables, because the commit log is shared by all the tables. The commit log itself is divided into segments, so once the data covered by the oldest segment has been flushed, that segment can be deleted to free space.
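The interaction between per-table memtables and the shared commit log can be sketched with a toy simulation. This is only an illustrative model, not Cassandra's implementation: the `Node` class, the `simulate` helper, and both limits are hypothetical, and sizes are counted in abstract write units rather than bytes.

```python
# Toy model: sizes are abstract "write units", not Cassandra's real accounting.
MEMTABLE_LIMIT = 10    # hypothetical per-table memtable capacity
COMMITLOG_LIMIT = 12   # hypothetical capacity of the shared commit log


class Node:
    def __init__(self, tables):
        self.memtables = {t: 0 for t in tables}  # one memtable per table
        self.commitlog = []                      # shared log, oldest entry first
        self.flushes = {t: 0 for t in tables}    # all flushes, per table
        self.forced = {t: 0 for t in tables}     # flushes caused by a full commit log

    def flush(self, table):
        # Writing the SSTable is not modeled; flushed data no longer needs the log.
        self.memtables[table] = 0
        self.flushes[table] += 1
        self.commitlog = [t for t in self.commitlog if t != table]

    def write(self, table):
        self.commitlog.append(table)             # every write goes to the shared log
        self.memtables[table] += 1
        if self.memtables[table] >= MEMTABLE_LIMIT:
            self.flush(table)                    # the table's own memtable is full
        while len(self.commitlog) >= COMMITLOG_LIMIT:
            oldest = self.commitlog[0]           # table owning the oldest log entry
            self.forced[oldest] += 1
            self.flush(oldest)                   # forced flush frees log space


def simulate(visit_writes=300, signup_every=20):
    node = Node(["HE_VISIT", "HE_SIGNUP"])
    for i in range(1, visit_writes + 1):
        node.write("HE_VISIT")
        if i % signup_every == 0:
            node.write("HE_SIGNUP")              # one signup per 20 visits
    return node


node = simulate()
print("flushes:", node.flushes)
print("forced by commit log:", node.forced)
```

With these assumed limits, HE_VISIT flushes far more often than HE_SIGNUP, and HE_SIGNUP's memtable never fills on its own: each of its flushes is forced by the shared commit log running out of space.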
Admin/Dev both
Question-: Which of the following is shared across all the tables? A. Memtable B. SSTables C. Commit logs D. Both memtable and SSTable
Answer: C Exp: Memtables and SSTables are maintained per table, while the commit log is shared across all the tables on the node. Remember that SSTables are immutable: once written they cannot be changed. A bigger SSTable can only be created by merging existing SSTables into a new one (compaction).
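As a rough illustration of immutability and merging, the sketch below models an SSTable as a read-only mapping and builds a new, larger one from two existing ones. It is a simplified assumption-laden model: the `compact` function and the `(timestamp, value)` layout are hypothetical, and real compaction also handles tombstones, TTLs, and on-disk formats.

```python
from types import MappingProxyType


def compact(*sstables):
    """Merge read-only SSTables into a new one; the newest timestamp wins per key."""
    newest = {}
    for table in sstables:
        for key, (timestamp, value) in table.items():
            if key not in newest or timestamp > newest[key][0]:
                newest[key] = (timestamp, value)
    # The inputs are left untouched; the result is a new read-only mapping.
    return MappingProxyType(dict(sorted(newest.items())))


# Two hypothetical SSTables: key -> (write timestamp, value)
older = MappingProxyType({"a": (1, "x"), "b": (1, "y")})
newer = MappingProxyType({"b": (2, "z"), "c": (2, "w")})

merged = compact(older, newer)
print(dict(merged))

try:
    older["a"] = (3, "changed")      # in-place modification is rejected
except TypeError as exc:
    print("immutable:", exc)
```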
Question-: You have been given the detail below from a directory stored on one of the nodes in the Cassandra cluster.
data/hadoopexam/course_fee-a5g22x211gf422l7790c34ad987777d3d/xx-1-bti-Data.db
Can you please map the following?
Answer: A-1, B-2, C-3, D-4 Exp: From the given path you can easily identify the name of the table, the name of the keyspace, the unique identifier of the table, the format the SSTable is using, and so on.
In this case hadoopexam is the name of the keyspace, whose directory is created under the data directory. Hence each keyspace has one directory under the data directory. Next comes the table directory: its name starts with the table name (course_fee), and the long hexadecimal string that follows is the unique identifier for the table. The various .db files are created in this directory. Each file name starts with the SSTable version (xx), followed by the generation number (1), the SSTable format (bti), and the component name (Data).
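The breakdown of the path can be sketched as a small parser. This is a minimal sketch, assuming the layout `data/<keyspace>/<table>-<table id>/<version>-<generation>-<format>-<component>.db`; the function name `parse_sstable_path` is hypothetical, and other Cassandra versions may lay out files differently.

```python
from pathlib import PurePosixPath


def parse_sstable_path(path):
    """Split an SSTable path into its parts (assumed layout, see above)."""
    p = PurePosixPath(path)
    keyspace = p.parts[-3]                               # directory under data/
    table, _, table_id = p.parts[-2].rpartition("-")     # <table>-<unique id>
    # File name without ".db": <version>-<generation>-<format>-<component>
    version, generation, fmt, component = p.stem.split("-", 3)
    return {
        "keyspace": keyspace,
        "table": table,
        "table_id": table_id,
        "version": version,
        "generation": generation,    # kept as a string
        "format": fmt,
        "component": component,
    }


info = parse_sstable_path(
    "data/hadoopexam/course_fee-a5g22x211gf422l7790c34ad987777d3d/xx-1-bti-Data.db"
)
for name, value in info.items():
    print(f"{name}: {value}")
```

Splitting the directory name from the right keeps table names that contain hyphen-free underscores (such as course_fee) intact, since only the last hyphen separates the name from the identifier.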