Dell EMC Data Science and BigData Certification Questions and Answers

Question : You have been given two populations, HEPop and HEPop, and you need to run a hypothesis test to determine whether they are equal. However, you cannot assume that the data is normally distributed. Which
of the following tests would help?

1. Use Welch t-test

2. Use Student t-test

3. Use Teacher t-test

4. Use Wilcoxon rank sum test

Correct Answer : 4
Explanation: Hypothesis tests for comparing two populations can be parametric or nonparametric. A parametric test such as the t-test makes assumptions about the population distributions from
which the samples are drawn. If the populations cannot be assumed to be normal, and cannot be transformed to follow a normal distribution, a nonparametric test can be used.
The Wilcoxon rank-sum test is a nonparametric hypothesis test that checks whether two populations are identically distributed. Because it does not assume that the populations follow a normal distribution, it is generally
considered more robust than the t-test. In other words, there are fewer assumptions to violate.
If you can assume that the data is normally distributed, then you can use the Student or Welch t-test.
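
As a rough illustration of how the rank-sum idea works, the sketch below ranks the pooled observations and compares the rank sum of one sample with its expected value under the null hypothesis. It uses the large-sample normal approximation and assumes there are no tied values. The function name rank_sum_test and both samples are made up for this example; a statistics package would normally be used instead.

```python
from math import sqrt
from statistics import NormalDist


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Assumes no tied values and samples large enough for the approximation.
    Returns (w, z, p_value), where w is the sum of the ranks of x.
    """
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering which group each value came from.
    pooled = sorted(
        (value, group)
        for group, sample in enumerate((x, y))
        for value in sample
    )
    # Rank 1 is the smallest pooled value; add up the ranks belonging to x.
    w = sum(rank for rank, (_, group) in enumerate(pooled, start=1) if group == 0)
    # Mean and standard deviation of w when both populations are identical.
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p_value


# Hypothetical skewed samples (invented numbers, not real data).
sample_a = [1.1, 1.4, 2.0, 2.3, 2.9, 3.5, 4.8, 9.7]
sample_b = [5.2, 6.1, 6.6, 7.4, 8.3, 10.5, 14.9, 31.0]

w, z, p = rank_sum_test(sample_a, sample_b)
print(f"W = {w}, z = {z:.2f}, p = {p:.4f}")
```

Only the ordering of the values enters the calculation, which is why the test does not depend on the populations being normal.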

Question : You are conducting a hypothesis test and the null hypothesis is true, but you have rejected it. What type of error is this?

1. Type-I Error

2. Type-II Error

3. Type-III Error

4. Type-IV Error

5. There is no error

Correct Answer : 1
Explanation: The question states that the null hypothesis is true but was rejected anyway, so an error has occurred. Option 5 is therefore incorrect.
The error types that apply here are:
Type-I Error: You reject the null hypothesis even though it is true. Its probability is denoted by alpha.
Type-II Error: You fail to reject the null hypothesis even though it is false. Its probability is denoted by beta.

Hence, option 1 is correct. In practice you usually calculate the probability of committing a Type-I or Type-II error. If the probability of a Type-I error is 5% (alpha = 0.05), there is a 5% chance
that you will reject the null hypothesis even though it is true.
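
The meaning of alpha can be checked with a small simulation. The sketch below repeatedly draws two samples from the same normal population, so the null hypothesis is true by construction, and counts how often a two-sided z-test at alpha = 0.05 rejects it anyway. The function name, sample size, trial count, and seed are arbitrary choices for illustration, and the population standard deviation is assumed to be known and equal to 1.

```python
import random
from math import sqrt
from statistics import NormalDist


def type_one_error_rate(alpha=0.05, n=30, trials=5000, seed=1):
    """Estimate how often a two-sided z-test rejects a true null hypothesis."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Both samples come from the same N(0, 1) population,
        # so any rejection is a Type-I error.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        # z statistic for the difference in means, with known sigma = 1.
        z = (sum(a) / n - sum(b) / n) / sqrt(2 / n)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


rate = type_one_error_rate()
print(f"Estimated Type-I error rate: {rate:.3f}")
```

The estimated rate should land close to the chosen alpha, with some simulation noise.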

Question : You are conducting a hypothesis test for two populations, HEPop and HEPop. Which of the following statements are correct with regard to power and sample size?
A. The power of a test is the probability of correctly rejecting the null hypothesis.
B. The power of a test is the probability of correctly accepting the null hypothesis.
C. It is represented as (1 - probability of a Type-II error).
D. The power of a test improves when the sample size increases.

1. A,B
2. A,C,D
3. A,B,C
4. B,C,D
5. A,B,C,D

Correct Answer : 2
Explanation: In a hypothesis test, the power of the test is the probability of correctly rejecting the null hypothesis, that is, rejecting it when it is false. It is denoted by 1 - beta, where beta is the probability of a Type-II error. As
the sample size increases, the power also increases, and power calculations can be used to determine the required sample size.
The power of a hypothesis test also depends on the true difference between the population means.
Statement B is incorrect because power concerns rejecting a false null hypothesis, not accepting a true one.
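
The relationship between power, beta, and sample size can be sketched numerically. The example below assumes a two-sided, two-sample z-test with a known common standard deviation and the same number of observations in each group. The function name z_test_power and the values delta = 0.5 and sigma = 1.0 are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of a two-sided, two-sample z-test with n observations per group.

    delta is the true difference between the population means and sigma is
    the (assumed known) common standard deviation.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the z statistic when the true difference is delta.
    shift = delta / (sigma * sqrt(2 / n))
    # Probability that the statistic falls in either rejection region.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


for n in (10, 20, 50, 100):
    power = z_test_power(delta=0.5, sigma=1.0, n=n)
    print(f"n = {n:3d}  power = {power:.3f}  beta = {1 - power:.3f}")
```

Under these assumptions the power rises with n and with the true difference delta. When delta is zero, the probability of rejecting equals alpha.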

Related Questions

Question : Your colleague, who is new to Hadoop, approaches you with a question. They want to know how
best to access their data. This colleague has previously worked extensively with SQL and
databases.
Which query interface would you recommend?

1. Flume
2. Pig
4. HBase

Question : In linear regression, what indicates that an estimated coefficient is significantly different from zero?

1. R-squared near 1
2. R-squared near 0
4. A small p-value

Question : Which graphical representation shows the distribution and multiple summary statistics of a
continuous variable for each value of a corresponding discrete variable?

1. box and whisker plot
2. dotplot
4. binplot

Question : Assume that you have a data frame in R. Which function would you use to display descriptive
statistics about this variable?

1. levels
2. attributes
4. summary

Question : What is the mandatory clause that must be included when using window functions?

1. OVER
2. RANK
4. RANK BY

Question : What is the purpose of the process step "parsing" in text analysis?

1. computes the TF-IDF values for all keywords and indices
2. executes the clustering and classification to organize the contents
4. imposes a structure on the unstructured/semi-structured text for downstream analysis