Question-: After compaction completes, in which of the below cases will the new SSTable partition segment be smaller than the older one?
A. When there are a lot of delete operations on both of the partition segments. B. When there is a lot of tombstone-marked data in both the partition segments. C. When there are a lot of insert operations on both the partition segments. D. When there are a lot of UPDATE operations.
Answer: A, B Exp: When there are a lot of deletes, it is likely that there is a lot of tombstone data whose gc_grace_seconds period has expired. Compaction evicts such tombstones along with the data they shadow, which can make the new SSTable partition segment smaller than the older one. In the case of a lot of insert operations, the result would be a bigger partition segment. And in the case of UPDATE operations, there is no such impact on size.
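For illustration, here is a minimal sketch of how a DELETE creates a tombstone that compaction can purge only after gc_grace_seconds has elapsed (the table he_keyspace.tbl_hadoopexam_courses, the course_id key, and the value 101 are assumptions for this example; 864000 seconds, i.e. 10 days, is the default grace period):
-- gc_grace_seconds controls how long tombstones are retained before
-- compaction is allowed to drop them
ALTER TABLE he_keyspace.tbl_hadoopexam_courses WITH gc_grace_seconds = 864000;
-- This DELETE writes a tombstone; once 864000 seconds have passed,
-- compaction can remove both the tombstone and the shadowed data,
-- shrinking the newly written SSTable partition segment
DELETE FROM he_keyspace.tbl_hadoopexam_courses WHERE course_id = 101;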
Admin Only
Question-: You have a big Cassandra table with an overall size of around M records. You run the following command.
COPY HE_KEYSPACE.TBL_HADOOPEXAM_COURSES TO 'home/hadoopexam/he_courses_data.csv' WITH HEADER=true AND PAGETIMEOUT=40 AND PAGESIZE=20 AND DELIMITER='~';
However, while doing this exercise, you get the below error.
./dump_cassandra.sh: xmalloc: ../../.././lib/sh/strtrans.c:63: cannot allocate XXXXXXXXXX bytes (YYYYYYYY bytes allocated)", "stdout_lines": ["[Sat Jul 13 11:12:24 UTC 2019] Executing the following query:", "COPY HE_KEYSPACE.TBL_HADOOPEXAM_COURSES TO 'home/hadoopexam/he_courses_data.csv' with HEADER=true and PAGETIMEOUT=40 and PAGESIZE=20 AND DELIMITER='~';"
What is the cause and how can you correct the same?
A. You have to remove the PAGETIMEOUT parameter B. You have to increase the PAGESIZE parameter from 20 to a higher value C. You have to add the BEGINTOKEN and ENDTOKEN parameters D. You have to add the MAXOUTPUTSIZE parameter
Answer: D
Explanation: Yes, you can use the COPY TO command to copy data from a Cassandra table to a CSV file. However, you also need to know the use of each parameter for almost all the basic commands. The relevant command options are listed below. In the real exam they would not ask about every command, but you should know the frequently used ones; one example is the COPY TO and COPY FROM commands.
PAGESIZE: The page size used while fetching the data. If your PAGESIZE is higher, then PAGETIMEOUT should also be higher.
PAGETIMEOUT: The timeout for fetching each page. If your partition size is large, then you should use a large PAGETIMEOUT. If there is a timeout error, consider increasing this value, which is not the case in the given example.
BEGINTOKEN: The minimum token from which the data export should start.
ENDTOKEN: The maximum token up to which the data should be exported.
MAXREQUESTS: The maximum number of requests processed concurrently.
MAXOUTPUTSIZE: The error shows that the process is not able to allocate enough memory for the data being exported, so you need to tune this parameter. It sets the maximum size of the output file, measured in number of lines; beyond this value, the output file is split into segments, as shown in the corrected command below.
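For example, a corrected version of the command might look as follows (the MAXOUTPUTSIZE value of 1000000 lines is only an illustrative assumption; choose a value that keeps each segment comfortably within available memory):
COPY HE_KEYSPACE.TBL_HADOOPEXAM_COURSES TO 'home/hadoopexam/he_courses_data.csv' WITH HEADER=true AND PAGETIMEOUT=40 AND PAGESIZE=20 AND DELIMITER='~' AND MAXOUTPUTSIZE=1000000;
With this option, cqlsh writes the export as multiple numbered CSV segments instead of one unbounded file, avoiding the memory allocation failure.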
Admin and Dev both
Question-: Which of the following helps keep all the data together based on the partition key? A. Row Cache B. Key Cache C. Partition D. Bloom filter E. Clustering key
Answer: C Exp: Keeping all data with the same partition key together is achieved by the concept of a partition. A partition is created based on the partition key; on a single node, data with the same partition key goes into the same partition. The clustering key only determines how the data within a partition is sorted, based on the clustering columns. The others, Row Cache, Key Cache, and Bloom filter, are in-memory data structures used for locating and retrieving data quickly.
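To make the distinction concrete, here is a minimal hypothetical table definition (the he_keyspace.course_reviews table and its columns are assumptions, not taken from the question):
-- course_id is the partition key: all reviews for one course are stored
-- together in a single partition on a node
-- review_date is the clustering column: rows within that partition are
-- kept sorted by it
CREATE TABLE he_keyspace.course_reviews (
    course_id int,
    review_date timestamp,
    review_text text,
    PRIMARY KEY (course_id, review_date)
);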