Dell EMC Data Science Associate Certification Questions and Answers (Dumps and Practice Questions)

Question : Refer to the exhibit.

You are asked to write a report on how specific variables impact your client's sales using a data
set provided to you by the client. The data includes 15 variables that the client views as directly
related to sales, and you are restricted to these variables only.
After a preliminary analysis of the data, the following findings were made:
1. Multicollinearity is not an issue among the variables
2. Only three variables (A, B, and C) have significant correlation with sales
You build a linear regression model on the dependent variable of sales with the independent
variables of A, B, and C. The results of the regression are seen in the exhibit.
You cannot request additional data. What is a way that you could try to increase the R² of the
model without artificially inflating it?

1. Create clusters based on the data and use them as model inputs
2. Force all 15 variables into the model as independent variables
3. Create interaction variables based only on variables A, B, and C
4. Break variables A, B, and C into their own univariate models

Correct Answer : 1

Explanation: In statistics, linear regression is an approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted X. The case of one explanatory variable is called simple linear regression. For more than one explanatory variable, the process is called multiple linear regression. (This term should be distinguished from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.)
In linear regression, data are modeled using linear predictor functions, and unknown model parameters are estimated from the data. Such models are called linear models. Most commonly, linear regression refers to a model in which the conditional mean of y given the value of X is an affine function of X. Less commonly, linear regression could refer to a model in which the median, or some other quantile of the conditional distribution of y given X, is expressed as a linear function of X. Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of y given X, rather than on the joint probability distribution of y and X, which is the domain of multivariate analysis.
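
As a minimal sketch of the simple (one-variable) case described above, the snippet below fits a least-squares line and computes R², the share of the variance in y that the fitted line explains. The data values and the function name `fit_simple_ols` are made up for illustration; they are not taken from the exhibit or from the client's data set.

```python
def fit_simple_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (intercept, slope, r2).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot


# Illustrative data only (hypothetical values).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
intercept, slope, r2 = fit_simple_ols(xs, ys)
print(f"intercept={intercept:.2f} slope={slope:.2f} R^2={r2:.2f}")
```

Because R² in ordinary least squares never decreases when a predictor is added, a higher value alone does not show that the model has improved, which is why the question asks for a way to raise it without artificial inflation.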

Question : You have two tables of customers in your database. Customers in cust_table_ were sent an email
promotion last year, and customers in cust_table_2 received a newsletter last year.
Each customer can appear only once per table. You want to create a table that includes all
customers, and any of the communications they received last year. Which type of join would you
use for this table?

1. Full outer join
2. Inner join
3. Left outer join
4. Cross join

Correct Answer : 1

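Explanation: The FULL OUTER JOIN keyword returns all rows from the left table (table1) and from the right table (table2), whether or not a row has a match in the other table. Columns from the side without a match are filled with NULL, so customers who appear in only one of the two tables are still included.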

The FULL OUTER JOIN keyword combines the result of both LEFT and RIGHT joins.

SQL FULL OUTER JOIN Syntax
SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2
ON table1.column_name=table2.column_name;
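
The same behavior can be sketched in Python for readers who want to see which rows survive the join. This is an illustration under assumptions, not a database query: the function `full_outer_join`, the dictionaries `email_promo` and `newsletter`, and the customer IDs are all hypothetical, and `None` stands in for SQL NULL.

```python
def full_outer_join(left, right):
    """Join two {customer_id: value} dicts, keeping every key from both sides.

    A side with no matching key contributes None (the analogue of SQL NULL).
    """
    all_ids = sorted(set(left) | set(right))
    return [(cid, left.get(cid), right.get(cid)) for cid in all_ids]


# Hypothetical customers: 101 got only the email, 103 only the newsletter.
email_promo = {101: "email promotion", 102: "email promotion"}
newsletter = {102: "newsletter", 103: "newsletter"}

rows = full_outer_join(email_promo, newsletter)
for row in rows:
    print(row)
```

An inner join would keep only customer 102, and a left outer join would drop customer 103, which is why neither produces a table of all customers.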

Question : In which lifecycle stage are initial hypotheses formed?

1. Model planning
2. Discovery
3. Model building
4. Data preparation

Correct Answer : 2

Explanation: Phase 1 is Discovery. In this phase, the team learns the business domain, including
relevant history such as whether the organization or business unit has attempted
similar projects in the past from which they can learn. The team assesses the
resources available to support the project in terms of people, technology, time, and
data. Important activities in this phase include framing the business problem as an
analytics challenge that can be addressed in subsequent phases and formulating initial
hypotheses (IHs) to test and begin learning the data.

Related Questions

Question : You have been assigned to do a study of the daily revenue effect of a pricing model of online
transactions. When have you completed the analytics lifecycle?

1. You have a completely developed model based on both a sample of the data and the entire set
of data available.
2. You have presented the results of the model to both the internal analytics team and the
business owner of the project.
4. You have written documentation, and the code has been handed off to the Database
Administrator and business operations.

Question : Consider these itemsets:
(hat, scarf, coat)
(hat, scarf, coat, gloves)
(hat, scarf, gloves)
(hat, gloves)
(scarf, coat, gloves)
What is the confidence of the rule (gloves -> hat)?

1. 75%
2. 60%
4. 80%
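
The confidence of a rule X -> Y is the number of itemsets containing both X and Y divided by the number containing X. A small sketch of that arithmetic, using the itemsets listed in the question (the function name `confidence` is an illustrative choice):

```python
def confidence(transactions, antecedent, consequent):
    """Confidence of antecedent -> consequent: count(both) / count(antecedent)."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= t]
    with_both = [t for t in with_antecedent if consequent <= t]
    return len(with_both) / len(with_antecedent)


itemsets = [
    {"hat", "scarf", "coat"},
    {"hat", "scarf", "coat", "gloves"},
    {"hat", "scarf", "gloves"},
    {"hat", "gloves"},
    {"scarf", "coat", "gloves"},
]

# Four itemsets contain gloves; three of those also contain hat.
print(confidence(itemsets, {"gloves"}, {"hat"}))  # 0.75
```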

Question : What is holdout data?

1. a subset of the provided data set selected at random and used to initially construct the model
2. a subset of the provided data set that is removed by the data scientist because it contains data errors
4. a subset of the provided data set selected at random and used to validate the model
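
A random holdout split can be sketched as follows. The function name `holdout_split`, the 20% fraction, and the fixed seed are assumptions made for illustration; the idea is only that the held-out records are set aside at random, kept out of model fitting, and used afterwards to validate the model.

```python
import random


def holdout_split(records, holdout_fraction=0.2, seed=0):
    """Randomly split records into (training, holdout) lists."""
    shuffled = list(records)               # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


records = list(range(10))  # stand-in for rows of the provided data set
training, holdout = holdout_split(records)
print(len(training), len(holdout))  # 8 2
```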

Question : Which characteristic applies mainly to Data Science as opposed to Business Intelligence?

1. Data dashboards
2. Focus on structured data
4. Advanced analytical methods

Question : Which word or phrase completes the statement?
Theater actor is to "Artistic and Expressive" as Data Scientist is to ________________

1. Introverted and Technical
2. Logical and Steadfast
4. Communicative and Collaborative

Question : Which process in text analysis can be used to reduce dimensionality?

1. Parsing
2. Stemming
4. Sorting
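
In a bag-of-words representation each distinct term is one dimension, so mapping inflected forms to a shared stem shrinks the vocabulary. The sketch below uses a deliberately naive suffix stripper to show the effect; `toy_stem` and its suffix list are hypothetical and are not the Porter stemmer or any library's implementation.

```python
SUFFIXES = ("ions", "ion", "ing", "ed", "s")  # checked longest first


def toy_stem(word):
    """Strip one common suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


tokens = ["connect", "connected", "connecting", "connection",
          "connections", "run", "runs"]
vocabulary = set(tokens)
stemmed_vocabulary = {toy_stem(t) for t in tokens}
print(len(vocabulary), "->", len(stemmed_vocabulary))  # 7 -> 2
```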