
Dell EMC Data Science Associate Certification Questions and Answers (Dumps and Practice Questions)



Question : Your organization has a website where visitors randomly receive one of two coupons. It is also
possible that visitors to the website will not receive a coupon. You have been asked to determine if
offering a coupon to visitors to your website has any impact on their purchase decision.
Which analysis method should you use?


1. K-means clustering
2. Association rules
3. Student's t-test
4. One-way ANOVA


Correct Answer : 4

Explanation: In statistics, one-way analysis of variance (one-way ANOVA) is a technique for comparing the means of two or more samples using the F distribution. It can be used only when the response variable is numerical.[1]

The ANOVA tests the null hypothesis that the samples in two or more groups are drawn from populations with the same mean. To do this, it makes two estimates of the population variance, one from the variation among the group means and one from the variation within the samples. These estimates rely on various assumptions. The ANOVA produces an F-statistic, the ratio of the variance calculated among the means to the variance within the samples. If the groups are drawn from populations with the same mean, the variance among the group means should be lower than the variance within the samples, following the central limit theorem. A higher ratio therefore implies that the samples were drawn from populations with different means.[1]

Typically, however, the one-way ANOVA is used to test for differences among at least three groups, since the two-group case can be covered by a t-test (Gosset, 1908). When there are only two means to compare, the t-test and the F-test are equivalent; the relation between ANOVA and t is given by F = t^2. The coupon question involves three groups (coupon 1, coupon 2, and no coupon), so a single t-test does not cover it and one-way ANOVA is the appropriate choice. An extension of one-way ANOVA is two-way analysis of variance, which examines the influence of two different categorical independent variables on one dependent variable.

ANOVA is a generalization of the hypothesis test for the difference of two population means. It tests whether any of the population means differ from the others; the null hypothesis is that all the population means are equal.
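The F-statistic described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the purchase amounts are made-up numbers, and the helper names one_way_anova_f and pooled_t are hypothetical, not part of any library. Turning F into a p-value needs the F distribution, which the standard library does not provide; a statistics package such as SciPy (scipy.stats.f_oneway) would normally handle that step.

```python
from math import sqrt
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def pooled_t(a, b):
    """Student's t statistic for two samples, assuming equal variances."""
    ss = sum((x - mean(a)) ** 2 for x in a) + sum((x - mean(b)) ** 2 for x in b)
    pooled_var = ss / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / len(a) + 1 / len(b)))


# Hypothetical purchase amounts per visitor group (illustrative data only).
no_coupon = [10, 12, 11, 9, 13]
coupon_1 = [14, 15, 13, 16, 12]
coupon_2 = [15, 17, 16, 14, 18]

f_stat, df_b, df_w = one_way_anova_f(no_coupon, coupon_1, coupon_2)
print(f"three groups: F({df_b}, {df_w}) = {f_stat:.2f}")

# With only two groups, the ANOVA F equals the square of the pooled t.
f_two, _, _ = one_way_anova_f(no_coupon, coupon_1)
t_stat = pooled_t(no_coupon, coupon_1)
print(f"two groups: F = {f_two:.2f}, t^2 = {t_stat ** 2:.2f}")
```

With these made-up numbers the three-group F is about 12.67 on (2, 12) degrees of freedom, and the two-group comparison gives F = t^2 = 9, matching the relation stated above.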







Question : Imagine you are trying to hire a Data Scientist for your team. In addition to technical ability and
quantitative background, which additional essential trait would you look for in people applying for
this position?


1. Communication skill
2. Scientific background
3. Domain expertise
4. Well organized



Correct Answer : 1
Explanation: Let's discuss the skills you need to become an effective Data Scientist.

1. Diverse Technologies - A good Data Scientist is handy with a collection of tools and technologies, especially open-source ones such as Hadoop, Java, Python, C++, and ECL. Knowing how to code, and when to use each tool, is a prerequisite. A good understanding of database technologies, including NoSQL databases such as HBase and CouchDB, is a plus.

2. Mathematics - The second skill, as you might expect, is a base in statistics, algorithms, machine learning, and mathematics. A conventional computer science degree alone is no longer enough for a data scientist. The job requires someone who understands large-scale machine learning algorithms and programming and is also a statistician, so the profile often suits experts from other scientific and mathematical disciplines as well as computer science.

3. Business Skills - As data scientists wear multiple hats, they need strong business skills. A data scientist has to work with diverse people across an organization: understanding business and application requirements, and interpreting the patterns and relationships mined from data for marketing groups, product development teams, and corporate executives. All of this requires good business skills to get things done right.

4. Visualization - The fourth set of skills focuses on making products real and making data available to users. In other words, it combines coding skills, an ability to see where data can add value, and collaboration with teams to make these products a reality. You may be able to mine and model data, but can you visualize it? If not, you should learn to work with at least a few data visualization tools, such as Tableau, Flare, D3.js, Processing, Google Visualization API, and Raphael.js.

5. Innovation - It is not enough to simply work with the data at hand; you have to think creatively and innovate. A data scientist should be eager to learn, curious to find new things, and able to think outside the box. They should be focused on making products real and making well-prepared data available to users, and should be able to see where data can add value and how it can bring better results.

6. Problem-Solving Skills - This may seem obvious, because data science is all about solving problems. But a good data scientist must take the time to learn what problem needs to be solved, how the solution will deliver value, and how it will be used and by whom.

7. Communication Skills - Communication is the key to working with cross-functional team members and presenting analytics in a compelling and effective manner to leadership and customers. In other words, you may be brilliant in your rarefied field, but you will not be a really good data scientist if you cannot communicate with non-specialists.






Question : What describes the use of UNION clause in a SQL statement?
1. Operates on queries and potentially decreases the number of rows
2. Operates on queries and potentially increases the number of rows
3. Operates on tables and potentially decreases the number of columns
4. Operates on both tables and queries and potentially increases both the number of rows and columns



Correct Answer : 2
Explanation: The SQL UNION clause/operator is used to combine the results of two or more SELECT statements without returning any duplicate rows.

To use UNION, each SELECT must return the same number of columns, with compatible data types in the same order. The corresponding columns do not have to be the same length.

Syntax:
The basic syntax of UNION is as follows:

SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]

UNION

SELECT column1 [, column2 ]
FROM table1 [, table2 ]
[WHERE condition]
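To see why UNION operates on queries and can increase the number of rows, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table names, column names, and rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database with two small, made-up customer tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE online_customers (name TEXT, city TEXT);
    CREATE TABLE store_customers  (name TEXT, city TEXT);
    INSERT INTO online_customers VALUES ('Ann', 'Austin'), ('Bob', 'Boston');
    INSERT INTO store_customers  VALUES ('Bob', 'Boston'), ('Cid', 'Chicago');
""")

# UNION appends the rows of the second query to the first
# and drops duplicate rows from the combined result.
union_rows = conn.execute("""
    SELECT name, city FROM online_customers
    UNION
    SELECT name, city FROM store_customers
    ORDER BY name
""").fetchall()

# UNION ALL keeps duplicates, so 'Bob' appears twice.
union_all_rows = conn.execute("""
    SELECT name, city FROM online_customers
    UNION ALL
    SELECT name, city FROM store_customers
""").fetchall()
conn.close()

print(len(union_rows), union_rows)
print(len(union_all_rows))
```

Each SELECT on its own returns two rows; the UNION returns three (the shared 'Bob' row appears once) and UNION ALL returns four. The column count stays at two in every case.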



Related Questions


Question : Which word or phrase completes the statement? A data warehouse is to a centralized database
for reporting as an analytic sandbox is to a _______?
1. Collection of data assets for modeling
2. Collection of low-volume databases
3. Centralized database of KPIs
4. Collection of data assets for ETL



Question : You do a Student's t-test to compare the average test scores of sample groups from populations A
and B. Group A averaged 10 points higher than group B. You find that this difference is significant,
with a p-value of 0.03. What does that mean?
1. There is a 3% chance that you have identified a difference between the populations when in
reality there is none.
2. The difference in scores between a sample from population A and a sample from population B
will tend to be within 3% of 10 points.
3. There is a 3% chance that a sample group from population A will score 10 points higher than a
sample group from population B.
4. There is a 97% chance that a sample group from population A will score 10 points higher than a
sample group from population B.


Question : What is one modeling or descriptive statistical function in MADlib that is typically not provided in a
standard relational database?
1. Expected value
2. Variance
3. Linear regression
4. Quantiles



Question : In which phase of the data analytics lifecycle do Data Scientists spend the most time in a project?
1. Discovery
2. Data Preparation
3. Model Building
4. Communicate Results



Question : You are testing two new weight-gain formulas for puppies. The test gives the results:
Control group: 1% weight gain
Formula A: 3% weight gain
Formula B: 4% weight gain
A one-way ANOVA returns a p-value = 0.027
What can you conclude?

1. Formula A and Formula B are about equally effective at promoting weight gain.
2. Formula A and Formula B are both effective at promoting weight gain.
3. Formula B is more effective at promoting weight gain than Formula A.
4. Either Formula A or Formula B is effective at promoting weight gain.



Question : Data visualization is used in the final presentation of an analytics project. For what else is this
technique commonly used?

1. Data exploration
2. Descriptive statistics
3. ETLT
4. Model selection