
Dell EMC Data Science and BigData Certification Questions and Answers



Question : Which of the following skills are required for a data scientist?
A. Web designing skills to best visualize the results of an algorithm
B. Creativity
C. Good programming skills
D. Strong mathematics and statistics skills
E. Database administration skills

1. A,B,C
2. B,C,D
3. C,D,E
4. A,D,E
5. A,C,E

Correct Answer : 2
Explanation: Yes, a data scientist needs a combination of skills. To solve complex problems he or she must be creative, able to find new solutions, and able to make use of existing data. Good programming skills are also required: SAS, R, Python, Spark, Java, and SPSS are all in common use, and new technologies keep appearing. Applying existing and new algorithms with Machine Learning or AI requires strong mathematics and statistics skills (often a weak area for programmers). Familiarity with visualization tools such as Qlik and Tableau is also valuable.





Question : Which of the following steps will you use in the discovery phase?
A. What are the data sources for the project?
B. Analyze the raw data and its format and structure.
C. What tools are required for the project?
D. What network capacity is required?
E. What Unix server capacity is required?

1. A,B,C
2. B,C,D
3. C,D,E
4. B,C,D,E
5. A,B,C,D,E

Correct Answer : 5
Explanation: During the discovery phase you need to determine, as early as possible, how many resources are required. For this you can involve various stakeholders, such as the software engineering team, DBAs, network engineers, and system administrators, to establish whether these resources are already available or need to be procured. You also need to identify the sources of the data and the tools and software required to execute the project.




Question : Which of the following tools can be used to load and clean a huge volume of data?


1. Spark GraphX

2. Cloudera Knox

3. Apache Hadoop

4. Oracle MySQL

5. Qlik


Correct Answer : 3
Explanation: Hadoop can transform data into the required format through custom MapReduce code. You can also use Sqoop (SQL-to-Hadoop) to load data from an RDBMS into HDFS, and Flume to load file-based data; both use MapReduce under the hood to copy data into HDFS. Transformation logic can be applied during the data load itself, which helps when the data needs to be transformed while it is being loaded.


Related Questions


Question : There are 90 students who subscribed for the training materials from an Educational Portal and then appear for the final exam. The portal provides three means of preparing for the exam, as below:
1. Prepare using Books
2. Prepare using Recorded Video Trainings
3. Prepare using Sample Practice Questions and Study Notes
You divide the 90 students into three groups as below:
Group-1: Uses only Books for exam preparation
Group-2: Uses only Recorded Video Trainings for exam preparation
Group-3: Uses only Practice Questions for exam preparation
Which of the following hypothesis tests can you use in this scenario to compare their exam scores and find which exam preparation technique is more effective?


1. You will be using Student t-test

2. You will be using Welch's t-test

3. You will be using the Wilcoxon rank-sum test

4. You will be using ANOVA

5. You would be applying 3 Student's t-tests, by creating three pairs
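Comparing mean scores across three or more independent groups is the textbook setting for a one-way ANOVA. A minimal sketch using SciPy's `f_oneway`, with made-up exam scores for the three groups of 30 (the score values are illustrative, not from the question):

```python
from scipy.stats import f_oneway

# Hypothetical exam scores for the three preparation groups (30 students each)
books = [72, 75, 70, 68, 74] * 6       # Group-1: books only
videos = [78, 80, 76, 79, 77] * 6      # Group-2: recorded video trainings only
practice = [85, 88, 84, 86, 87] * 6    # Group-3: practice questions only

# One-way ANOVA: H0 = all three group means are equal
f_stat, p_value = f_oneway(books, videos, practice)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: at least one preparation technique differs in mean score")
```

A significant result only says the means are not all equal; a post-hoc test (e.g. Tukey's HSD) would be needed to say which technique is best.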



Question : Which of the following statements are true about clustering?
A. It is supervised learning
B. It is unsupervised learning
C. This technique can be used to find hidden structure within unlabelled data
D. Dividing employees into three groups based on their salary is an example of clustering

1. A,B
2. B,C
3. B,C,D
4. A,B,D
5. A,B,C,D


Question : You are working in a data analytics company as a data scientist. You have been given data on the various types of pizzas available across premium food centers in a country. The data is given as numeric values such as calories, size, and sales per day. You need to group all the pizzas with similar properties. Which of the following techniques would you use for that?


1. Association Rules

2. Naive Bayes Classifier

3. K-means Clustering

4. Linear Regression

5. Grouping
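Grouping objects by numeric features with no predefined labels is the K-means setting. A minimal sketch of Lloyd's algorithm in plain NumPy on made-up pizza features (the feature values and the helper `kmeans` function are illustrative, not from the question):

```python
import numpy as np

# Hypothetical pizza features: [calories, size_cm, sales_per_day]
pizzas = np.array([
    [250.0, 20.0, 110.0], [260.0, 22.0, 105.0], [255.0, 21.0, 115.0],  # lighter pizzas
    [550.0, 35.0, 40.0],  [560.0, 36.0, 35.0],  [545.0, 34.0, 45.0],   # heavier pizzas
])

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to the nearest centroid (Euclidean distance)
        labels = np.argmin(np.linalg.norm(X[:, None] - centroids[None], axis=2), axis=1)
        # Recompute each centroid as the mean of its assigned points
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

labels, centroids = kmeans(pizzas, k=2)
print(labels)  # pizzas with similar properties share a cluster label
```

In practice you would also scale the features first (calories and size are on very different ranges), since K-means is distance-based.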



Question : In which of the following cases can you use K-means clustering?
A. Image Processing
B. Customer Segmentation
C. Classification of plants
D. Reducing the customer churn rate

1. A,B
2. B,C
3. A,B,C
4. B,C,D
5. A,B,C,D


Question : You want to apply K-means clustering to a total of M objects, each with n attributes. You need to create 5 groups or clusters. What matrix dimensions would you use to store all the objects' attributes?


1. MXn

2. MX5

3. nX5

4. 5X5

5. MXM
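The data matrix for K-means stores one row per object and one column per attribute, so M objects with n attributes need an M x n matrix; the number of clusters does not affect its shape. A quick illustration with NumPy, using made-up sizes M = 6 and n = 3:

```python
import numpy as np

M, n = 6, 3  # 6 objects, 3 attributes each (illustrative sizes)

# One row per object, one column per attribute
X = np.arange(M * n, dtype=float).reshape(M, n)

print(X.shape)  # (6, 3): M rows by n columns, independent of the cluster count
```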



Question : You have data on people who make purchases from a specific grocery store, including their income details. You have created clusters using this data, but in one of the clusters you see that only 30 people fall, with values as below:
30, 2400, 2600, 2700, 2270 etc.
What would you do in this case?

1. You will increase the number of clusters.

2. You will decrease the number of clusters.

3. You will remove those 30 people from the dataset.

4. You will multiply the standard deviation by 100.