Dell EMC Data Science Associate Certification Questions and Answers (Dumps and Practice Questions)

Question : In which phase of the data analytics lifecycle do Data Scientists spend the most time in a project?

1. Discovery
2. Data Preparation
3. Model Building
4. Communicate Results

Correct Answer : 2
Explanation: Phase 1-Discovery: In Phase 1, the team learns the business domain, including relevant history such as whether the organization or business unit has attempted similar projects in the past from which they can learn. The team assesses the resources available to support the project in terms of people, technology, time, and data. Important activities in this phase include framing the business problem as an analytics challenge that can be addressed in subsequent phases and formulating initial hypotheses (IHs) to test and begin learning the data.
Phase 2-Data preparation: Phase 2 requires the presence of an analytic sandbox, in which the team can work with data and perform analytics for the duration of the project. The team needs to execute extract, load, and transform (ELT) or extract, transform, and load (ETL) to get data into the sandbox. ELT and ETL are sometimes jointly abbreviated as ETLT. Data should be transformed in the ETLT process so the team can work with it and analyze it. In this phase, the team also needs to familiarize itself with the data thoroughly and take steps to condition the data. Because of all this extraction, transformation, and conditioning work, data preparation is generally regarded as the most labor-intensive phase of the lifecycle, which is why option 2 is correct.
Phase 3-Model planning: Phase 3 is model planning, where the team determines the methods, techniques, and workflow it intends to follow for the subsequent model building phase. The team explores the data to learn about the relationships between variables and subsequently selects key variables and the most suitable models.
Phase 4-Model building: In Phase 4, the team develops datasets for testing, training, and production purposes. In addition, in this phase the team builds and executes models based on the work done in the model planning phase. The team also considers whether its existing tools will suffice for running the models, or if it will need a more robust environment for executing models and workflows (for example, fast hardware and parallel processing, if applicable).
Phase 5-Communicate results: In Phase 5, the team, in collaboration with major stakeholders, determines if the results of the project are a success or a failure based on the criteria developed in Phase 1. The team should identify key findings, quantify the business value, and develop a narrative to summarize and convey findings to stakeholders.
Phase 6-Operationalize: In Phase 6, the team delivers final reports, briefings, code, and technical documents. In addition, the team may run a pilot project to implement the models in a production environment.

Question : You are testing two new weight-gain formulas for puppies. The test gives the results:
Control group: 1% weight gain
Formula A: 3% weight gain
Formula B: 4% weight gain
A one-way ANOVA returns a p-value = 0.027
What can you conclude?

1. Formula A and Formula B are about equally effective at promoting weight gain.
2. Formula A and Formula B are both effective at promoting weight gain.
3. Formula B is more effective at promoting weight gain than Formula A.
4. Either Formula A or Formula B is effective at promoting weight gain.

Correct Answer : 4
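Explanation: A one-way ANOVA (Analysis of Variance) is a statistical technique for testing whether three or more means are equal. It tests whether the value of a single variable differs significantly among three or more levels of a factor. A significant result (here p = 0.027, below the conventional 0.05 threshold) indicates only that at least one group mean differs from the others. It does not identify which group differs, nor does it rank the groups, so the only conclusion supported by the ANOVA alone is option 4.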

We can say we have a framework for one-way ANOVA when we have a single factor with three or more levels and multiple observations at each level.

In this kind of layout, we can calculate the mean of the observations within each level of our factor.

The concepts of factor, levels and multiple observations at each level can be best understood by an example.

Factor and Levels - An Example

Let us suppose that the Human Resources Department of a company desires to know if occupational stress varies according to age.

The variable of interest is therefore occupational stress as measured by a scale.

The factor being studied is age. There is just one factor (age) and hence a situation appropriate for one-way ANOVA.

Further suppose that the employees have been classified into three groups (levels):

less than 40
40 to 55
above 55

These three groups are the levels of the factor age, so there are three levels here. With this design, we shall have multiple observations in the form of scores on occupational stress from a number of employees belonging to the three levels of the factor age. We are interested in knowing whether all the levels (that is, the age groups) have equal stress on average.

Non-significance of the test statistic (F-statistic) associated with this technique would imply that age has no effect on stress experienced by employees in their respective occupations. On the other hand, significance would imply that stress afflicts different age groups differently.
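
The F-statistic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stress scores are invented for the three age levels, and the helper name one_way_anova_f is hypothetical. It computes the F-statistic by hand (between-group mean square divided by within-group mean square); obtaining a p-value would additionally require the F distribution, for example through a statistics library.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of observation lists."""
    k = len(groups)                              # number of levels of the factor
    n_total = sum(len(g) for g in groups)        # total number of observations
    grand_mean = mean([x for g in groups for x in g])

    # Variation of the level means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own level mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical occupational stress scores (made-up numbers, not real data)
stress_by_age = {
    "less than 40": [4, 5, 6],
    "40 to 55": [6, 7, 8],
    "above 55": [8, 9, 10],
}

f_stat, df_b, df_w = one_way_anova_f(list(stress_by_age.values()))
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F value relative to the F distribution with these degrees of freedom would suggest that mean stress is not the same in every age group, without saying which group differs.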

Question : Data visualization is used in the final presentation of an analytics project. For what else is this technique commonly used?

1. Data exploration
2. Descriptive statistics
3. ETLT
4. Model selection

Correct Answer : 1
Explanation: Data exploration is an initial, informative search that data consumers use to understand the information they have gathered before deeper analysis. Data is often collected in large volumes and without a rigid structure. For meaningful analysis, this unorganized bulk needs to be narrowed down, and data exploration is the step in which the data is examined to decide where further analysis should go. Visualizing the data, for example with histograms or scatterplots, is a common way to do this, which is why option 1 is correct.

Data often converges in a central repository called a data warehouse. This data can come from various sources in various formats. Relevant data is needed for tasks such as statistical reporting, trend spotting, and pattern spotting. Data exploration is the process of gathering such relevant data. There are two main methods for retrieving relevant data from large, unorganized pools: manual and automatic. The manual method is another name for data exploration, while the automatic method is also known as data mining.

Some people believe these terms are synonymous, while others see a technical difference between them. Data mining generally refers to gathering relevant data from large databases. Data exploration, on the other hand, generally refers to a data user being able to find his or her way through large amounts of data in order to gather necessary information.

Related Questions

Question : In which lifecycle stage are test and training data sets created?

1. Model planning
2. Discovery
4. Data preparation

Question : When creating a presentation for a technical audience, what is the main objective?

1. Show that you met the project goals
2. Show how you met the project goals
4. Show the technique to be used in the production environment

Question : Your company has different sales teams. Each team's sales manager has developed incentive offers to increase the size of each sales transaction. Any sales manager whose incentive program can be shown to increase the size of the average sales transaction will receive a bonus. Data are available for the number and average sale amount for transactions offering one of the incentives as well as transactions offering no incentive. The VP of Sales has asked you to determine analytically if any of the incentive programs has resulted in a demonstrable increase in the average sale amount. Which analytical technique would be appropriate in this situation?

1. One-way ANOVA
2. Multi-way ANOVA
4. Wilcoxon Rank Sum Test

Question : In data visualization, what is used to focus the audience on a key part of a chart?

1. Detailed text
2. Emphasis colors
4. A data table

Question : Which word or phrase completes the statement? Data-ink ratio is to data visualization as __________ .

1. Confusion matrix is to classifier
2. Data scientist is to big data
4. K-means is to Naive Bayes

Question : Consider a database with transactions:
Transaction 1: {cheese, bread, milk}
Transaction 2: {soda, bread, milk}
Transaction 3: {cheese, bread}
Transaction 4: {cheese, soda, juice}
You decide to run the association rules algorithm where minimum support is 50%. Which rule has a confidence at least 50%?

1. {soda} => {milk}
2. {milk} => {soda}
4. {cheese} => {bread}
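
The support and confidence arithmetic behind this question can be checked with a short Python sketch. It uses the four transactions listed in the question; the function names support and confidence are illustrative choices, and it assumes the usual definitions: support is the fraction of transactions containing an itemset, and confidence of X => Y is support(X and Y together) divided by support(X). Only the rules visible in the option list are evaluated.

```python
# The four transactions from the question
transactions = [
    {"cheese", "bread", "milk"},
    {"soda", "bread", "milk"},
    {"cheese", "bread"},
    {"cheese", "soda", "juice"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """Confidence of the rule lhs => rhs."""
    return support(set(lhs) | set(rhs)) / support(lhs)

MIN_SUPPORT = 0.5  # minimum support stated in the question

rules = [
    ({"soda"}, {"milk"}),
    ({"milk"}, {"soda"}),
    ({"cheese"}, {"bread"}),
]

for lhs, rhs in rules:
    s = support(lhs | rhs)
    c = confidence(lhs, rhs)
    status = "meets" if s >= MIN_SUPPORT else "fails"
    print(f"{sorted(lhs)} => {sorted(rhs)}: support={s:.2f}, "
          f"confidence={c:.2f}, {status} minimum support")
```

Note that a rule is only generated from an itemset that first passes the minimum support threshold, so the support of the combined itemset has to be checked before its confidence is considered.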