CREATE TABLE he_food_chain( country_code text, state text, city text, cafe_name text, PRIMARY KEY (country_code, state, city, cafe_name ) );
Which of the following statement is/are correct?
A. All café within a country, state and city reside in the same partition. B. All café within a country reside in the same partition. C. Within a country, state and city you would get result order by café name. D. Within a country you would get result order by state, city and café name.
Answer: B,D
Explanation: As we defined Primary Key as (country_code, state, city, cafe_name) then it means only first column is considered as Partition Key which country_code in this case. Hence, all the data will be partitioned based on the country_code. All the records with the same country will go in the same partition. And remaining three columns state, city and café_name are considered as a clustering key. Hence, they would store data under one partition order by these three columns. If you want to fetch all the café within a city, you have to iterate the result returned based on the country_code. Clustering keys by default consider the ascending order. In this case results are first order by state ascending and then city ascending and finally name of café.
If you want different sorting order than you can design table as below for example.
CREATE TABLE he_food_chain( country_code text, state text, city text, cafe_name text, PRIMARY KEY (country_code, state, city, cafe_name ) ) WITH CLUSTERING ORDER BY (state desc, city asc, café_name asc) ;
Partition key is not part of the ORDER BY statement because its values are hashed and therefore would not be close to each other in the cluster.
When you run the CQL query, you must include all partition key columns, at a minimum. And then you can apply additional filter by adding each clustering key in the order in which the clustering key appear.
Dev Only
Question-: You have defined below table definition
CREATE TABLE he_food_chain( country_code text, state text, city text, cafe_name text, opening_date timestamp, PRIMARY KEY ((country_code, state, city) , opening_date, cafe_name ) ) WITH CLUSTERING ORDER BY (opening_date ASC, cafe_name ASC ) ;
Which of the following is valid query?
A. SELECT * FROM he_food_chain WHERE country_code = 'IND' and state = 'RAJ' and city = 'UDAIPUR' B. SELECT * FROM he_food_chain WHERE country_code = 'IND' and state = 'RAJ' and city = 'UDAIPUR' and opening_date < '2019-01-01 00:00:00+0200' C. SELECT * FROM he_food_chain WHERE country_code = 'IND' and state = 'RAJ' D. SELECT * FROM he_food_chain WHERE country_code = 'IND' and state = 'RAJ' and city = 'UDAIPUR' and cafe_name = 'PRATAP DHABA'
Answer: A,B
Explanation: In Option-3 we are not having city column in where clause which is a partition key. Hence, this is not correct. In option-4 it is using clustering column café_name without considering the preceding clustering key column opening_date. Hence, this is also invalid.
Here, clustering keys column order matters because the clustering keys provide the sort order of the result set. Because of the clustering key’s responsibility for sorting, we know all data matching the first clustering key will be adjacent to all other data matching that clustering key. For example all the café with the same opening date will be grouped together in alphabetical order. Café with different dates will appear in temporal order.
Because it know the order, CQL can truncate section of the partition that don’t match the qury criteria. As clustering key café_name is secondary to clustering ket opening_date, café_name will appear in alphabetical order only for café opened on the same day. Hence, it is not correct to define café_name in CQL without specifying an opening date.
Admin/Dev Both
Question-: Which of the following are correct limitation for the Cassandra database? A. You should set the ordering of data while table creation only. B. All data for a single partition must fit on disk in a single node in the cluster. C. A single column value is limited to 1 MB only. D. No join or subquery support for aggregation.
Answer: A,B,D
Explanation: Cassandra database is good for BigData storage but it comes with its own limitations which you should always keep in mind as below.
- Cassandra database recommend that you keep number of rows within a partition below 100,000 as well as disk size under 100MB (Please check documentation if this has changed) - A single column value is limited to 2 GB (And 1 MB is recommended) - All data for a single partition must fit on disk in a single node in the cluster. - Ordering is set at table creation time on a per-partition basis. It helps in avoiding clients attempt to sort billions of stored rows on a table. - Denormalization is encouraged and join or subquery are not supported for aggregations.