Question : Which of the following are true for semi-structured data? A. These data can be organized into a specialized repository. B. These data can be easily stored in an RDBMS table. C. These data can have associated metadata and keywords. D. JSON and XML data are examples of semi-structured data.
1. A,B 2. B,C 3. C,D 4. A,D 5. B,D
Correct Answer : 3 Explanation: Semi-structured data cannot easily be stored in a specialized repository such as an RDBMS table or a spreadsheet. It is generally stored as individual documents. XML and JSON are examples of semi-structured data. Such data can carry associated metadata, such as a schema in XML, and possibly keywords, so it is wrong to say that semi-structured data cannot have a schema.
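To make the "associated metadata" point concrete, here is a minimal sketch using only the Python standard library. The sample records and field names are invented for illustration and are not part of the question. It shows that both JSON and XML carry their field names inside the document, so the structure can be recovered without an external table definition.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample records, invented for illustration only.
json_doc = '{"id": 7, "name": "Asha", "tags": ["admin", "beta"]}'
xml_doc = '<user id="7"><name>Asha</name><tag>admin</tag><tag>beta</tag></user>'

# JSON: the keys travel with the values, so the document describes itself.
record = json.loads(json_doc)
json_fields = sorted(record.keys())

# XML: element names and attributes play the same role as the JSON keys.
root = ET.fromstring(xml_doc)
xml_record = {
    "id": int(root.attrib["id"]),
    "name": root.findtext("name"),
    "tags": [tag.text for tag in root.findall("tag")],
}

print(json_fields)           # field names recovered from the document itself
print(xml_record == record)  # both documents describe the same record
```

Note that nothing forces the next document to have the same fields, which is why such data does not fit easily into a fixed RDBMS table.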
Question : Which of the following are examples of quasi-structured data? A. XML data B. JSON data C. Clickstream data D. Google search results E. Web page data from any website, obtained by scraping
1. A,B,C 2. B,C,D 3. C,D,E 4. A,D,E 5. A,C,E
Correct Answer : 3 Explanation: It is sometimes difficult to decide whether data is semi-structured or quasi-structured. Remember that data which has associated metadata, and which can be formatted or structured using that metadata, is semi-structured; JSON and XML are examples. Data which is still textual but cannot be easily formatted, and which needs a capable transformation step to bring it into a well-defined format, is quasi-structured. Examples are Google search results, web scraping data, any web page, and web server clickstream data.
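The "transformer" mentioned above can be sketched as follows. This is only an illustration: the log line is invented, it assumes a Common Log Format style layout, and the helper name `parse_click` is hypothetical. The point is that the structure comes from a hand-written pattern, not from metadata inside the data.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Hypothetical web server log line, invented for illustration only.
line = ('203.0.113.9 - - [10/Oct/2023:13:55:36 +0000] '
        '"GET /search?q=big+data&page=2 HTTP/1.1" 200 5120')

# The layout is assumed, not declared by the data: this regex is the "transformer".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_click(log_line):
    """Return a dict of fields, or None if the line does not fit the pattern."""
    match = LOG_PATTERN.match(log_line)
    if match is None:
        return None
    fields = match.groupdict()
    parts = urlsplit(fields["url"])
    fields["path"] = parts.path
    fields["query"] = parse_qs(parts.query)  # e.g. search terms typed by the user
    fields["status"] = int(fields["status"])
    return fields

click = parse_click(line)
print(click["path"], click["query"], click["status"])
```

A line that does not match the assumed layout returns None, which is the typical failure mode with quasi-structured data: the parser has to be adapted whenever the format drifts.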
Question : Which of the following are characteristics of unstructured data? A. It can be easily analyzed. B. It is difficult to query and search these data. C. It is a free-form kind of data. D. Audio and video files are examples of unstructured data. E. These data are scattered and dispersed.
1. A,B,C 2. B,C,D 3. C,D,E 4. B,C,D,E 5. A,C,D,E
Correct Answer : 4 Explanation: Examples of unstructured data are PDFs, documents, and audio and video files. These data cannot be easily queried or searched, and it takes a lot of effort to make sense of them. You will typically need a pre-processing step, for example a MapReduce job, to transform the data into a specialized format such as Avro or Parquet before you can query it.
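The pre-processing idea can be illustrated with a small map-then-reduce word count in plain Python. This is a sketch of the pattern only, not a Hadoop MapReduce job. The sample documents and the function names `map_phase` and `reduce_phase` are invented for illustration, and writing the result to Avro or Parquet is left out because that needs third-party libraries.

```python
import re
from collections import defaultdict

# Hypothetical free-form documents, invented for illustration only.
documents = {
    "doc1": "Big data is big.",
    "doc2": "Data about data is metadata.",
}

def map_phase(text):
    """Map step: emit one (word, 1) pair for every word in the text."""
    for word in re.findall(r"[a-z]+", text.lower()):
        yield word, 1

def reduce_phase(pairs):
    """Reduce step: sum the counts for each distinct word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

pairs = [pair for text in documents.values() for pair in map_phase(text)]
word_counts = reduce_phase(pairs)

# The result is now structured (word, count) rows that can be sorted and queried.
rows = sorted(word_counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(rows)
```

Once the free text has been reduced to rows like these, it can be stored in a columnar or row-based format and queried like any other structured data.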