Dell EMC Data Science and BigData Certification Questions and Answers

Question : Students subscribe to training materials from an educational portal and then appear for the final exam. The portal provides three ways of preparing for the exam, as below
1. Prepare using Books
2. Prepare using Recorded Video Trainings
3. Prepare using Sample Practice Questions and Study Notes
You divide 90 students into three groups as below
Group-1: Is using only Books for exam preparation
Group-2: Is using only recorded video trainings for exam preparation
Group-3: Is using only Practice Questions for exam preparation
Which of the following hypothesis tests can you use in this scenario to compare their exam scores and find out which exam preparation technique is more effective?

1. You will be using Student's t-test

2. You will be using Welch's t-test

3. You will be using Wilcoxon rank-sum test

4. You will be using ANOVA

5. You would be applying three Student's t-tests, by creating three pairs

Correct Answer : 4
Explanation: Student's t-test, Welch's t-test, and the Wilcoxon rank-sum test all compare only two groups, so options 1, 2, and 3 can be discarded. That leaves option 4, ANOVA (Analysis of Variance), which is correct.
You could apply multiple t-tests by creating pairs: Group1 with Group2, Group2 with Group3, and Group1 with Group3, for a total of three t-tests. However, multiple t-tests do not work well across several populations, for two reasons:
1. As the number of groups increases, the number of t-tests also increases.
2. As the number of t-tests increases, the probability of committing a Type I error also increases.
ANOVA (Analysis of Variance) takes care of both issues.
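The second reason can be made concrete with a little arithmetic. The sketch below assumes every pairwise t-test is run at a significance level of 0.05 and that the tests are independent (a simplification, since pairs that share a group are not truly independent). The function name familywise_error_rate is hypothetical and used only for illustration.

```python
from math import comb


def familywise_error_rate(num_groups, alpha=0.05):
    """Return (number of pairwise tests, chance of at least one Type I error).

    Assumes independent tests, each run at significance level alpha.
    """
    num_tests = comb(num_groups, 2)          # every pair of groups gets one t-test
    fwer = 1 - (1 - alpha) ** num_tests      # 1 - P(no false positive in any test)
    return num_tests, fwer


for groups in (3, 5, 10):
    tests, fwer = familywise_error_rate(groups)
    print(f"{groups} groups -> {tests} t-tests, family-wise error rate ~ {fwer:.3f}")
```

With three groups, the three pairwise tests already push the chance of at least one false positive from 5% to roughly 14% under these assumptions.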
ANOVA generalizes the hypothesis test for the difference of two population means. It tests whether any of the population means differs from the others. The null and alternate hypotheses for ANOVA are:
Null Hypothesis: All the population means are equal (μ1 = μ2 = ... = μn)
Alternate Hypothesis: At least one pair of population means is not equal (μi ≠ μj for some i, j)
ANOVA also assumes that each population is normally distributed with the same variance.
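As a rough illustration of what ANOVA computes, the sketch below derives the F statistic (between-group variance divided by within-group variance) in plain Python. The exam scores are made up for illustration and are not data from the question; the function name one_way_anova_f is likewise hypothetical.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F statistic, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical exam scores for the three preparation methods
books = [62, 70, 68, 75, 66]
videos = [71, 78, 74, 80, 77]
practice = [82, 85, 79, 88, 86]

f_stat, df_between, df_within = one_way_anova_f(books, videos, practice)
print(f"F = {f_stat:.2f} with ({df_between}, {df_within}) degrees of freedom")
```

A large F value suggests that the group means differ more than the scatter within the groups would explain. Turning F into a p-value requires the F distribution with the returned degrees of freedom; if SciPy is installed, scipy.stats.f_oneway(books, videos, practice) reports both the statistic and the p-value.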

Question : Which of the following is true about the clustering?
A. It is a supervised learning technique
B. It is an unsupervised learning technique
C. This technique can be used to find hidden structure within unlabelled data
D. Dividing employees into three groups based on their salary is an example of clustering

1. A,B
2. B,C
3. B,C,D
4. A,B,D
5. A,B,C,D

Correct Answer : 3
Explanation: Clustering is an unsupervised machine learning technique that groups data by similarity of characteristics, without any predefined labels. It can help you find hidden structure in unlabeled data. Unsupervised means that you do not apply any labels to the data in advance. For example, in a large company you can create three groups of employees based on their salary. Clustering is an exploratory data analysis technique, and you do not make any predictions with it. Major applications of clustering are in marketing, economics, and various branches of science.

Question : You are working as a data scientist in a data analytics company. You have been given data on the various types of pizzas available across premium food centers in a country. The data consists of numeric values such as calories, size, and sales per day. You need to group all the pizzas with similar properties. Which of the following techniques would you use?

1. Association Rules

2. Naive Bayes Classifier

3. K-means Clustering

4. Linear Regression

5. Grouping

Correct Answer : 3
Explanation: K-means clustering creates groups of objects based on their properties, where K is the number of groups. Each group has a center, and you measure how far each object's characteristics are from each center; an object belongs to the group whose center is nearest. Suppose we have 100 objects and need to determine 4 groups, so K=4. We determine 4 center values and then compute the distance of each object from each center.
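The center-and-distance idea described above can be sketched in plain Python as follows. This is a minimal illustration, not a production implementation: the pizza values (calories, size) are invented, K=2 is used to keep the example short, and the first K points are taken as the initial centers, which is a simplification (libraries normally pick initial centers more carefully and scale the features first).

```python
import math


def nearest_center(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda c: math.dist(point, centers[c]))


def k_means(points, k, max_iterations=100):
    """Group points into k clusters by alternating assignment and update steps.

    Simplification: the first k points serve as the initial centers.
    """
    centers = [tuple(p) for p in points[:k]]
    for _ in range(max_iterations):
        # Assignment step: each point joins the group with the nearest center
        labels = [nearest_center(p, centers) for p in points]
        # Update step: each center moves to the mean of its members
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty group
        if new_centers == centers:  # centers stopped moving: converged
            break
        centers = new_centers
    labels = [nearest_center(p, centers) for p in points]
    return labels, centers


# Hypothetical pizzas as (calories, size) pairs
pizzas = [(250, 10), (400, 14), (260, 11), (410, 15), (245, 10), (395, 14)]
labels, centers = k_means(pizzas, k=2)
print(labels)   # -> [0, 1, 0, 1, 0, 1]
print(centers)
```

The lighter, smaller pizzas end up in one group and the heavier, larger ones in the other. Because calories have a much larger numeric range than size, they dominate the distance here, which is why features are usually standardized before clustering.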

Related Questions

Question : Which word or phrase completes the statement? A data warehouse is to a centralized database
for reporting as an analytic sandbox is to a _______?

1. Collection of data assets for modeling

2. Collection of low-volume databases

4. Collection of data assets for ETL

Question : You do a Student's t-test to compare the average test scores of sample groups from populations A
and B. Group A averaged 10 points higher than group B. You find that this difference is significant,
with a p-value of 0.03. What does that mean?

1. There is a 3% chance that you have identified a difference between the populations when in
reality there is none.
2. The difference in scores between a sample from population A and a sample from population B
will tend to be within 3% of 10 points.
3. [...] sample group from population B.
4. There is a 97% chance that a sample group from population A will score 10 points higher than a
sample group from population B.

Question : What is one modeling or descriptive statistical function in MADlib that is typically not provided in a
standard relational database?

1. Expected value
2. Variance

4. Quantiles

Question : In which phase of the data analytics lifecycle do Data Scientists spend the most time in a project?

1. Discovery
2. Data Preparation
4. Communicate Results

Question : You are testing two new weight-gain formulas for puppies. The test gives the results:
Control group: 1% weight gain
Formula A: 3% weight gain
Formula B: 4% weight gain
A one-way ANOVA returns a p-value = 0.027
What can you conclude?

1. Formula A and Formula B are about equally effective at promoting weight gain.
2. Formula A and Formula B are both effective at promoting weight gain.
4. Either Formula A or Formula B is effective at promoting weight gain.

Question : Data visualization is used in the final presentation of an analytics project. For what else is this
technique commonly used?

1. Data exploration
2. Descriptive statistics
4. Model selection