Question-: Which of the following statements are correct for the sstableloader command? A. This tool helps in loading external data into an existing cluster. B. This tool helps in loading existing SSTables into a cluster. C. When you load data from SSTables, the previous cluster and the new cluster must have the same number of nodes. D. Loading data with this tool requires that the previous cluster and the new cluster have the same replication strategy or partitioner.
Answer: A,B
Explanation: The sstableloader command is used to load existing SSTables into a cluster. There is no strict requirement that the previous cluster and the new cluster have the same number of nodes or the same replication strategy; these can differ. When sstableloader loads the data into the new cluster, the data is redistributed according to the new cluster's configuration, so the nodes that own each partition and the number of replicas written are determined by the new cluster's token ranges and replication settings.
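To illustrate why the node count and replication factor of the source cluster don't matter, here is a minimal sketch of token-ring placement. It is not sstableloader's code: it assumes one token per node (no vnodes), uses MD5 as a stand-in hash (Cassandra's default partitioner is Murmur3), and walks the ring in a SimpleStrategy-like way. The node names and keys are hypothetical. The point is that replica placement is computed from the target ring alone.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Stand-in hash for illustration only; Cassandra's default partitioner is Murmur3.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def build_ring(nodes):
    # One token per node (no vnodes) to keep the sketch small; sorted by token.
    return sorted((token(n), n) for n in nodes)


def replicas(ring, key, rf):
    # The primary owner is the first node whose token is >= the key's token
    # (wrapping around the ring); further replicas are the next nodes clockwise.
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(min(rf, len(ring)))]


if __name__ == "__main__":
    keys = ["user:1", "user:2", "user:3", "user:4"]
    old_ring = build_ring(["old-a", "old-b", "old-c"])                    # 3 nodes, RF=2
    new_ring = build_ring(["new-a", "new-b", "new-c", "new-d", "new-e"])  # 5 nodes, RF=3
    for k in keys:
        # The same partition key lands on a placement computed only from the new ring.
        print(k, replicas(old_ring, k, 2), "->", replicas(new_ring, k, 3))
```

Because placement depends only on the ring and replication settings of the cluster being written to, the same partitions can be streamed into a cluster of a different size with a different replication factor.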
Question-: You have many CSV files with almost a million records in total, and one column in the files contains dates. Before loading the CSV files, you want to change the format of the date column. Which of the following is the most suitable solution for this requirement? A. Use the CQL COPY command B. Use sstableloader C. Use the DSBulk tool D. Use Spark
Answer: D
Explanation: There are multiple options for loading data into a Cassandra database, and, as mentioned previously, you should choose the tool based on the requirement. In this question, you need to reformat the date before loading the data into Cassandra. When the data has to be modified or pre-processed, Spark is the best fit: a Spark program can convert the date column into the required format and then save the data directly to the Cassandra database. Spark processes the data in parallel, which speeds up the job, and it offers many optimizations and tuning parameters for this kind of workload.
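The core of such a job is a per-row date transformation. The sketch below shows only that transformation in plain Python, not a Spark job; in Spark the same step would typically be expressed with DataFrame functions such as `to_date` and `date_format` and run in parallel before the result is written to Cassandra, for example through the Spark Cassandra Connector. The column name `order_date` and the source format `dd/mm/yyyy` are hypothetical assumptions.

```python
import csv
import io
from datetime import datetime


def reformat_dates(csv_text, column, src_fmt="%d/%m/%Y", dst_fmt="%Y-%m-%d"):
    """Return the CSV text with one date column rewritten from src_fmt to dst_fmt.

    The column name and both formats are illustrative assumptions.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    # reader.fieldnames reads the header row, so the output keeps the same columns.
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Parse the date in its source format and re-emit it in the target format.
        row[column] = datetime.strptime(row[column], src_fmt).strftime(dst_fmt)
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = "id,order_date\n1,31/12/2023\n2,05/01/2024\n"
    print(reformat_dates(sample, "order_date"), end="")
```

Tools such as COPY, sstableloader, or DSBulk move data as-is, which is why a transformation step like this points toward Spark.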
Question-: Which of the following statements are correct for the DSBulk tool? A. It can be used to export data from a Cassandra database. B. It can be used to load JSON files into a database. C. It can be used to load CSV files into a database. D. It can be used to rebalance a Cassandra cluster. E. It can be used to repartition a Cassandra cluster.
Answer: A,B,C
Explanation: DSBulk is one of the best tools available for importing and exporting large volumes of data to and from a Cassandra cluster. It supports both CSV and JSON formats and is used from the command line. It cannot be used to rebalance or repartition a Cassandra cluster.
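As a quick reference for how the load/unload and CSV/JSON choices map onto the command line, here is a small Python helper that only assembles a `dsbulk` command (it does not run it). It uses the `load` and `unload` operations with the `-c` (connector name), `-k` (keyspace), `-t` (table), and `-url` options; the keyspace `shop`, table `orders`, and file names are hypothetical, and the helper deliberately supports only these two operations.

```python
import shlex


def dsbulk_command(operation, keyspace, table, url, connector="csv"):
    """Build (but do not execute) a dsbulk command line as an argument list.

    Keyspace, table, and url values passed in are illustrative placeholders.
    """
    if operation not in ("load", "unload"):
        # This helper covers importing (load) and exporting (unload) only;
        # DSBulk is not a tool for rebalancing or repartitioning a cluster.
        raise ValueError("this helper supports only 'load' and 'unload'")
    if connector not in ("csv", "json"):
        raise ValueError("connector must be 'csv' or 'json'")
    return ["dsbulk", operation, "-c", connector, "-k", keyspace, "-t", table, "-url", url]


if __name__ == "__main__":
    # Load a JSON file into shop.orders, then export the table as CSV.
    print(shlex.join(dsbulk_command("load", "shop", "orders", "orders.json", "json")))
    print(shlex.join(dsbulk_command("unload", "shop", "orders", "export_dir")))
```

Returning an argument list rather than a single string keeps it safe to pass to `subprocess.run` later without shell quoting issues.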