Dell EMC Data Science and BigData Certification Questions and Answers



Question : Which of the following are true for semi-structured data?
A. These data can be organized into a specialized repository.
B. These data can be easily stored in an RDBMS table.
C. These data can have associated metadata and keywords.
D. JSON and XML data are examples of semi-structured data.

1. A,B
2. B,C
3. C,D
4. A,D
5. B,D

Correct Answer : 3
Explanation: Semi-structured data cannot be stored in a specialized repository such as an RDBMS table or a spreadsheet. It is generally stored in individual documents. XML and JSON are examples of semi-structured data. Such data can have associated metadata, such as a schema in XML, and possibly keywords, so it is wrong to say that semi-structured data cannot have a schema.
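
To illustrate the point about metadata travelling with the data, here is a minimal sketch that parses one record in both JSON and XML using only the Python standard library. The record, its field names, and the keyword values are hypothetical and chosen purely for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record; the field names are assumptions made for this example.
json_text = '{"id": 7, "title": "Intro to R", "tags": ["analytics", "training"]}'
xml_text = (
    '<course id="7"><title>Intro to R</title>'
    '<tag>analytics</tag><tag>training</tag></course>'
)

# JSON: the keys travel with the values, so the record describes itself.
record = json.loads(json_text)
json_fields = sorted(record.keys())

# XML: element names and attributes play the same role as the JSON keys.
root = ET.fromstring(xml_text)
xml_fields = sorted({child.tag for child in root} | set(root.attrib))
xml_tags = [element.text for element in root.findall("tag")]

print(json_fields)
print(xml_fields)
print(xml_tags)
```

In both cases the field names can be recovered from the document itself, which is what separates this kind of data from plain free-form text.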




Question : Which of the following are examples of quasi-structured data?
A. XML Data
B. JSON Data
C. Clickstream data
D. Google Search results
E. Any website's web page data collected by scraping

1. A,B,C
2. B,C,D
3. C,D,E
4. A,D,E
5. A,C,E

Correct Answer : 3
Explanation: It is sometimes difficult to decide whether data is semi-structured or quasi-structured. Remember that data which has associated metadata, and can be formatted or structured using that metadata, is semi-structured. JSON and XML data are examples.
Quasi-structured data is still in text format but cannot be easily formatted, and it requires a good, intelligent transformer to convert it into a well-defined format. Examples are Google search results, web scraping data, any web page, and web server clickstream data.
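
The "transformer" mentioned in the explanation can be sketched as follows. This is only an illustration under stated assumptions: the sample log lines, the regular expression, and the `parse_click` helper are all hypothetical, and real clickstream formats vary by web server and configuration.

```python
import re
from urllib.parse import urlparse, parse_qs

# Hypothetical clickstream lines; the layout loosely follows a common access-log style.
raw_lines = [
    '10.0.0.1 - - [12/Mar/2020:10:15:32 +0000] "GET /search?q=big+data&page=2 HTTP/1.1" 200 5120',
    'garbage line that does not match',
    '10.0.0.2 - - [12/Mar/2020:10:16:01 +0000] "GET /courses/r-basics HTTP/1.1" 404 310',
]

LINE_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+)'
)

def parse_click(line):
    """Return a dict of fields for one log line, or None if it cannot be parsed."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    row = match.groupdict()
    parsed = urlparse(row.pop("url"))
    row["path"] = parsed.path
    row["query"] = parse_qs(parsed.query)  # parameters embedded in the URL
    row["status"] = int(row["status"])
    row["size"] = int(row["size"])
    return row

# Lines that do not fit the pattern are dropped instead of raising an error.
rows = [row for row in map(parse_click, raw_lines) if row is not None]
print(rows)
```

Note that the structure has to be imposed from outside by the pattern, and some lines simply do not fit. That extra effort is the practical difference from JSON or XML, where the structure is part of the data.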





Question : Which of the following are characteristics of unstructured data?
A. It can be easily analyzed.
B. It is difficult to query and search these data.
C. It is a free-form kind of data.
D. Audio and video files are examples of unstructured data.
E. These data are scattered and dispersed.

1. A,B,C
2. B,C,D
3. C,D,E
4. B,C,D,E
5. A,C,D,E

Correct Answer : 4
Explanation: Examples of unstructured data are PDFs, documents, and audio and video files. These data cannot be easily queried or searched, and it takes a lot of effort to make sense of them. You would typically pre-process them, for example with a MapReduce job, to transform them into a specialized format such as Avro or Parquet before you can query them.
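
The pre-processing idea can be sketched in plain Python as a map, shuffle, and reduce over free-form text. This is a toy, single-process illustration and not a Hadoop job: the documents and function names are hypothetical, and writing the resulting rows to Avro or Parquet would need a third-party library that is not shown here.

```python
from collections import defaultdict

# Hypothetical free-form documents standing in for text extracted from PDFs or transcripts.
documents = {
    "doc1": "Big data needs big storage",
    "doc2": "Data science uses data",
}

def map_phase(doc_id, text):
    """Emit one (word, 1) pair per token in the document."""
    for word in text.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Collapse all values for one key into a single structured row."""
    return {"word": key, "count": sum(values)}

pairs = [pair for doc_id, text in documents.items() for pair in map_phase(doc_id, text)]
rows = sorted(
    (reduce_phase(key, values) for key, values in shuffle(pairs).items()),
    key=lambda row: row["word"],
)
print(rows)
```

The output is a list of uniform rows with fixed fields, which is the kind of shape that a columnar or row-based file format expects.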


Related Questions


Question : You are working as a data scientist in a retail chain company. You and your team have been given a project to implement recommendation engines for the products the company sells online, and you have decided to create an analytics sandbox. Which of the following are you trying to achieve?


1. You are creating a Hive table in Hadoop Framework.

2. You are defining the SQL queries for extracting the data.

3. You are estimating the size of the datasets and planning for a total of 5 to 10 times that storage size for the data.

4. You would be transforming your semi-structured data into well-formatted data and saving it to a CSV file.

5. You are selecting the Advanced Analytics model.



Question : You are working with a training company that provides online training for various professions. You have received data for further analysis that is already transformed and structured. You find a high correlation between course category, courses watched, and number of training hours watched. You need a technique to handle these highly correlated variables. Which of the following will you use?


1. You will take the square root of each variable, so that the correlation can be removed.

2. You will discard all three of these variables.

3. You would use a normalizing technique so that the three variables become equal in size.

4. You will create a new variable which is a function of these three correlated variables.



Question : You are doing advanced analytics for a medical application using regression. You have two input variables, weight and height, which are very important and cannot be ignored, and they are also highly correlated. What is the best solution?


1. You will take the cube root of the height.

2. You will take the square root of the weight.

3. You will take the square of the height.

4. You would consider using BMI (Body Mass Index)
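
For reference, the BMI named in option 4 is a single value derived from both inputs: weight in kilograms divided by the square of height in metres. The sketch below only shows how such a derived variable is computed. The `bmi` helper and the patient values are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index: weight in kilograms divided by height in metres squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2

# Hypothetical patients as (weight in kg, height in m) pairs.
patients = [(70.0, 1.75), (85.0, 1.80), (54.0, 1.60)]

# One derived feature per patient, built from the two correlated inputs.
features = [round(bmi(weight, height), 1) for weight, height in patients]
print(features)
```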



Question : You are working as a data science consultant for a gaming company. You have a three-member team, and all other stakeholders, such as the project managers, the project sponsor, and the data team, are from the company itself. During a discussion, the project manager asks when you will be able to say whether the model you are using is robust enough. After which step can you answer this question?

1. Data Preparation

2. Discovery

3. Operationalize

4. Model planning

5. Model building



Question : Which of the following statements are correct with regard to vectors in R programming?
A. A vector always has the character type internally.
B. A vector has one dimension.
C. Vector elements always have the same data type.
D. The vector (1,2,3,"four",TRUE) is stored internally as "1" "2" "3" "four" "TRUE"

1. A,B
2. B,C
3. C,D
4. A,D
5. B,D


Question : Which of the following statements is true with regard to Array and List?


1. Array can have mixed data type values while List cannot.

2. List can have mixed data type values while Array cannot.

3. Both List and Array can have mixed data types.

4. Both List and Array can have only same data types.