
Dell EMC Data Science Associate Certification Questions and Answers (Dumps and Practice Questions)



Question : You are using the Apriori algorithm to determine the likelihood that a person who owns a home
has a good credit score. You have determined that the confidence for the rules used in the
algorithm is > 75%. You calculate lift = 1.011 for the rule, "People with good credit are
homeowners". What can you determine from the lift calculation?


1. Support for the association is low
2. Leverage of the rules is low
3. …
4. The rule is true



Correct Answer : … Explanation: Apriori is an algorithm for frequent item set mining and association rule learning over transactional databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent item sets determined by Apriori can be used to derive association rules that highlight general trends in the database; this has applications in domains such as market basket analysis.
The whole point of the algorithm (and of data mining in general) is to extract useful information from large amounts of data. For example, the observation that a customer who purchases a keyboard also tends to buy a mouse at the same time is captured by the association rule {Keyboard} -> {Mouse}, evaluated with the measures below:

Support: The percentage of task-relevant data transactions for which the pattern is true.

Support (Keyboard -> Mouse) = (number of transactions containing both Keyboard and Mouse) / (total number of transactions)

Confidence: The measure of certainty or trustworthiness associated with each discovered pattern.

Confidence (Keyboard -> Mouse) = (number of transactions containing both Keyboard and Mouse) / (number of transactions containing Keyboard)

The algorithm aims to find the rules which satisfy both a minimum support threshold and a minimum confidence threshold (Strong Rules).

Item: article in the basket.
Itemset: a group of items purchased together in a single transaction.
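The support, confidence, and lift definitions above can be checked with a short Python sketch; the basket contents below are invented purely for illustration:

```python
# Toy transaction data -- invented for illustration only.
transactions = [
    {"keyboard", "mouse", "monitor"},
    {"keyboard", "mouse"},
    {"keyboard", "webcam"},
    {"mouse", "monitor"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """Estimated P(rhs in basket | lhs in basket)."""
    return support(lhs | rhs) / support(lhs)

def lift(lhs, rhs):
    """Observed co-occurrence relative to what independence would predict.
    A lift close to 1 (like the 1.011 in the question) means the antecedent
    tells you almost nothing extra about the consequent."""
    return confidence(lhs, rhs) / support(rhs)

print(support({"keyboard", "mouse"}))       # 0.5
print(confidence({"keyboard"}, {"mouse"}))  # ~0.667
print(lift({"keyboard"}, {"mouse"}))        # ~0.889
```

A lift materially above 1 would indicate a positive association; at 1.011 the rule holds barely more often than chance would predict.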







Question : Consider a database with transactions:
Transaction 1: {cheese, bread, milk}
Transaction 2: {soda, bread, milk}
Transaction 3: {cheese, bread}
Transaction 4: {cheese, soda, juice}
The minimum support is 25%. Which rule has a confidence equal to 50%?

1. {bread} => {milk}
2. {bread, milk} => {cheese}
3. …
4. {bread} => {cheese}


Correct Answer : … Explanation: The itemset {bread, milk} appears in two transactions (1 and 2), while {cheese, bread, milk} appears only once (transaction 1). The confidence of {bread, milk} => {cheese} is therefore 1/2 = 50%. By contrast, bread appears in three transactions, so {bread} => {milk} and {bread} => {cheese} each have confidence 2/3.
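The arithmetic can be verified directly in Python using the four transactions given in the question:

```python
# The four transactions from the question.
transactions = [
    {"cheese", "bread", "milk"},   # Transaction 1
    {"soda", "bread", "milk"},     # Transaction 2
    {"cheese", "bread"},           # Transaction 3
    {"cheese", "soda", "juice"},   # Transaction 4
]

def count(itemset):
    """Number of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions)

# Confidence of X => Y is count(X union Y) / count(X).
rules = {
    "{bread} => {milk}":         count({"bread", "milk"}) / count({"bread"}),
    "{bread, milk} => {cheese}": count({"bread", "milk", "cheese"}) / count({"bread", "milk"}),
    "{bread} => {cheese}":       count({"bread", "cheese"}) / count({"bread"}),
}
for rule, conf in rules.items():
    print(f"{rule}: {conf:.0%}")
```

Only {bread, milk} => {cheese} comes out at exactly 50%.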





Question : Under which circumstance do you need to implement N-fold cross-validation after creating a regression model?

1. The data is unformatted.
2. There is not enough data to create a test set.
3. …
4. There are categorical variables in the model.



Correct Answer : … Explanation: N-fold (k-fold) cross-validation is most useful when there is not enough data to set aside a separate test set: the data is divided into N folds, each fold serves once as the validation set while the model is trained on the remaining folds, and the N error estimates are averaged, so every observation is used for both fitting and assessment.

In statistics, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables. More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables, that is, the average value of the dependent variable when the independent variables are fixed. Less commonly, the focus is on a quantile or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution.

Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables. However, this can lead to illusory or false relationships, so caution is advisable; for example, correlation does not imply causation. Many techniques for carrying out regression analysis have been developed.
Familiar methods such as linear regression and ordinary least squares regression are parametric, in that the regression function is defined in terms of a finite number of unknown parameters that are estimated from the data. Nonparametric regression refers to techniques that allow the regression function to lie in a specified set of functions, which may be infinite-dimensional.
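The N-fold cross-validation that the question asks about can be sketched without any library: split the n samples into N disjoint folds and hold each fold out once, so every sample is used both for training and for validation.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists: each of the k folds is held out once.
    Useful when the dataset is too small to spare a separate test set."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

# Every sample appears in exactly one test fold across the k iterations.
folds = list(k_fold_indices(10, 5))
print([test for _, test in folds])  # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
```

In practice the model would be refit on each `train` list and scored on the matching `test` list, and the five scores averaged.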

The performance of regression analysis methods in practice depends on the form of the data-generating process and how it relates to the regression approach being used. Since the true form of the data-generating process is generally not known, regression analysis often depends to some extent on making assumptions about this process. These assumptions are sometimes testable if a sufficient quantity of data is available. Regression models for prediction are often useful even when the assumptions are moderately violated, although they may not perform optimally. However, in many applications, especially with small effects or questions of causality based on observational data, regression methods can give misleading results.

EXAMPLE USES OF REGRESSION MODELS

Selecting Colleges : A high school student discusses plans to attend college with a guidance counselor. The student has a 2.04 grade point average out of a 4.00 maximum and mediocre to poor scores on the ACT. He asks about attending Harvard. The counselor tells him he would probably not do well at that institution, predicting he would have a grade point average of 0.64 at the end of four years at Harvard. The student inquires about the grade point average necessary to graduate and, when told that it is 2.25, decides that maybe another institution might be more appropriate in case he becomes involved in some "heavy duty partying." When asked about the large state university, the counselor predicts that he might succeed, but that his chances for success are not great, with a predicted grade point average of 1.23. A regional institution is then proposed, with a predicted grade point average of 1.54. Deciding that is still not high enough to graduate, the student attends a local community college, graduates with an associate's degree, and makes a fortune selling real estate.

If the counselor were using a regression model to make the predictions, he or she would know that this particular student would not necessarily make a grade point average of 0.64 at Harvard, 1.23 at the state university, or 1.54 at the regional university. These values are just "best guesses." It may be that this particular student was completely bored in high school, didn't take the standardized tests seriously, would become challenged in college, and would succeed at Harvard. The selection committee at Harvard, however, when faced with a choice between a student with a predicted grade point average of 3.24 and one with 0.64, would most likely make the rational decision and choose the more promising student.

Pregnancy : A woman in the first trimester of pregnancy has a great deal of concern about the environmental factors surrounding her pregnancy and asks her doctor what impact they might have on her unborn child. The doctor makes a "point estimate" based on a regression model that the child will have an IQ of 75. It is highly unlikely that her child will have an IQ of exactly 75, as there is always error in the regression procedure. Error may be incorporated into the information given to the woman in the form of an "interval estimate." For example, it would make a great deal of difference if the doctor were to say that the child had a ninety-five percent chance of having an IQ between 70 and 80, in contrast to a ninety-five percent chance of an IQ between 50 and 100. The concept of error in prediction is an important part of the discussion of regression models. It is also worth pointing out that regression models do not make decisions for people; they are a source of information about the world. In order to use them wisely, it is important to understand how they work.


Related Questions


Question : Refer to the Exhibit.
In the Exhibit, the table shows the values for the
input Boolean attributes "A", "B", and "C". It also
shows the values for the output attribute "class".
Which decision tree is valid for the data?

1. Tree A
2. Tree B
3. …
4. Tree D








Question : Refer to the exhibit.
You are assigned to do an end-of-year sales analysis of 1,000 different products,
based on the transaction table. Which column in the end-of-year report requires the
use of a window function?
1. Total Sales to Date
2. Daily Sales
3. …
4. Maximum Price
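A running "Total Sales to Date" is the classic case that needs a window function, because each output row depends on other rows in its partition rather than collapsing them the way GROUP BY does. A sketch using Python's built-in sqlite3 module (SQLite supports window functions from version 3.25; the table and figures below are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, day INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("A", 1, 10.0), ("A", 2, 5.0), ("B", 1, 7.0), ("B", 2, 3.0)],
)

# SUM(...) OVER (...) keeps one row per transaction while accumulating
# a per-product running total -- a plain GROUP BY would collapse the rows.
rows = conn.execute("""
    SELECT product, day, amount,
           SUM(amount) OVER (PARTITION BY product ORDER BY day) AS total_to_date
    FROM sales
    ORDER BY product, day
""").fetchall()

for row in rows:
    print(row)
```

Daily sales and maximum price, by contrast, can be computed with ordinary aggregates over a GROUP BY.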




Question : Refer to the Exhibit.
You are working on creating an OLAP query that outputs summary rows of subtotals and
grand totals in addition to regular rows that may contain NULL, as shown in the
exhibit. Which function can you use in your query to distinguish a subtotal row from a
regular row?

1. GROUPING
2. RANK
3. …
4. ROLLUP
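In SQL, GROUPING(col) returns 1 on the ROLLUP-generated summary rows where col has been aggregated away, which is how you distinguish a genuine NULL in the data from the NULL that marks a subtotal. SQLite lacks ROLLUP, so here is a plain-Python sketch of the same idea (the sales figures are invented):

```python
from collections import defaultdict

# (region, amount) pairs; one region is genuinely NULL (None) in the data.
sales = [("East", 10.0), ("East", 5.0), ("West", 7.0), (None, 2.0)]

per_region = defaultdict(float)
for region, amount in sales:
    per_region[region] += amount

# grouping=1 marks rows where 'region' was aggregated away (the grand total),
# mirroring SQL's GROUPING(); the genuine None region row keeps grouping=0.
report = [(region, total, 0) for region, total in per_region.items()]
report.append((None, sum(per_region.values()), 1))  # grand-total row

for region, total, grouping in report:
    print(region, total, grouping)
```

Both the real NULL region row and the grand-total row show region = None; only the grouping flag tells them apart, which is exactly the problem GROUPING solves.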




Question : Refer to the exhibit.
After analyzing a dataset, you report findings to your team:
1. Variables A and C are significantly and positively impacting the dependent variable.
2. Variable B is significantly and negatively impacting the dependent variable.
3. …
After seeing your findings, the majority of your team agreed that variable B should be positively
impacting the dependent variable.
What is a possible reason the coefficient for variable B was negative and not positive?

1. The information gain from variable B is already provided by another variable
2. Variable B needs a quadratic transformation due to its relationship to the dependent variable
3. …
4. Variable B needs a logarithmic transformation due to its relationship to the dependent variable
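A coefficient can flip sign when a variable's information is already carried by a strongly correlated predictor (multicollinearity). A small deterministic numpy sketch (all numbers invented) in which x2 correlates positively with y on its own, yet receives a negative coefficient once the nearly identical x1 is in the model:

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0])
e = np.array([0.1, -0.1, 0.1, -0.1])
x2 = x1 + e                  # x2 is almost an exact copy of x1
y = x1 - 0.5 * e             # equals 1.5*x1 - 0.5*x2 exactly

# On its own, x2 predicts y with a positive slope...
slope_alone = np.polyfit(x2, y, 1)[0]

# ...but alongside the collinear x1, its coefficient comes out negative.
X = np.column_stack([np.ones_like(x1), x1, x2])  # intercept, x1, x2
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)

print(slope_alone)   # positive
print(coefs[2])      # approximately -0.5
```

The marginal relationship is positive, but the multiple-regression coefficient measures x2's effect after x1 has explained almost everything, so the sign no longer matches intuition.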




Question : Refer to the exhibit.
You have run a linear regression model against your data, and have plotted true outcome versus
predicted outcome. The R-squared of your model is 0.75. What is your assessment of the model?
1. The R-squared may be biased upwards by the extreme-valued outcomes. Remove them and
refit to get a better idea of the model's quality over typical data.
2. The R-squared is good. The model should perform well.
3. … see if the R-squared improves over typical data.
4. The observations seem to come from two different populations, but this model fits them both
equally well.
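The effect option 1 describes can be reproduced: a single extreme point that happens to lie near the fitted line can inflate R-squared even when the fit over typical data is mediocre. A small deterministic numpy sketch (the data is invented):

```python
import numpy as np

def r_squared(x, y):
    """R^2 of a straight-line least-squares fit."""
    slope, intercept = np.polyfit(x, y, 1)
    pred = slope * x + intercept
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Typical observations with only a weak linear trend...
x_typ = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_typ = np.array([1.0, 0.0, 2.0, 1.0, 3.0])

# ...plus one extreme-valued outcome sitting right on the trend line.
x_all = np.append(x_typ, 100.0)
y_all = np.append(y_typ, 100.0)

print(r_squared(x_typ, y_typ))  # mediocre over typical data
print(r_squared(x_all, y_all))  # near 1 once the extreme point dominates
```

Removing the extreme values and refitting, as the option suggests, reveals how well the model actually does over the data it will usually see.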