Dell EMC Data Science Associate Certification Questions and Answers (Dumps and Practice Questions)

Question : What is an appropriate data visualization to use in a presentation for an analyst audience?

1. Pie chart
2. ROC curve
4. Stacked bar chart

Exp: In a ROC curve, the true positive rate (sensitivity) is plotted as a function of the false positive rate (100 - specificity, in percent) for different cut-off points of a parameter. Each point on the ROC curve represents a sensitivity/specificity pair corresponding to a particular decision threshold. The area under the ROC curve (AUC) measures how well a parameter can distinguish between two diagnostic groups (diseased/normal).

Logistic regression is often used as a classifier that assigns class labels to a person, item, or transaction based on the probability predicted by the model. In the Churn example, a customer is labeled Churn if the logistic model predicts a high probability that the customer will churn; otherwise, the customer is labeled Remain. Commonly, 0.5 is used as the default probability threshold to distinguish between the two class labels. However, any threshold value can be used, depending on whether it is more important to avoid false positives (for example, predicting Churn when the customer will actually Remain) or false negatives (for example, predicting Remain when the customer will actually Churn).
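The relationship between the decision threshold and the points on a ROC curve can be sketched in a few lines of Python. This is a minimal illustration, not part of the exam material: the function name `roc_points`, the labels, and the scores are all hypothetical, and a predicted probability at or above the threshold is assumed to mean Churn.

```python
def roc_points(labels, scores, thresholds):
    """Return a (threshold, false positive rate, true positive rate) tuple per threshold.

    labels: 1 = positive class (e.g. Churn), 0 = negative class (e.g. Remain).
    scores: predicted probabilities of the positive class.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        # Assumption: a score at or above the threshold is classified as positive.
        predicted = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
        fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
        points.append((t, fp / negatives, tp / positives))
    return points


# Hypothetical data: four customers who churned, four who remained.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

points = roc_points(labels, scores, [0.25, 0.5, 0.75])
for threshold, fpr, tpr in points:
    print(f"threshold={threshold:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

On this toy data, raising the threshold from 0.25 to 0.75 lowers the false positive rate from 0.50 to 0.00, but the true positive rate also drops from 1.00 to 0.50, which is the trade-off the threshold choice controls.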

Question : When would you use the GROUP BY ROLLUP clause in your OLAP query?

1. where only the subtotals are to be included in the output
2. where only the grand totals are to be included in the output
3. ... in the output
4. where all subtotals and grand totals are to be included in the output

Exp: The ROLLUP, CUBE, and GROUPING SETS operators are extensions of the GROUP BY clause. They can generate the same result set as combining single grouping queries with UNION ALL; however, using one of the GROUP BY operators is usually more efficient.

The GROUPING SETS operator can generate the same result set as a simple GROUP BY, ROLLUP, or CUBE operator. When you do not need all the groupings that a full ROLLUP or CUBE operator generates, you can use GROUPING SETS to specify only the groupings that you want. The GROUPING SETS list can contain duplicate groupings, and when GROUPING SETS is used with ROLLUP and CUBE, it might generate duplicate groupings. Duplicate groupings are retained, as they would be with UNION ALL.

Queries that use the ROLLUP and CUBE operators generate some of the same result sets and perform some of the same calculations as OLAP applications. The CUBE operator generates a result set that can be used for cross-tabulation reports. A ROLLUP operation can calculate the equivalent of an OLAP dimension or hierarchy.

A query with a GROUP BY ROLLUP clause returns the same aggregated data as an equivalent query with a GROUP BY clause. It also returns multiple levels of subtotal rows. In SOQL, for example, you can include up to three fields in a comma-separated list in a GROUP BY ROLLUP clause.

The GROUP BY ROLLUP clause adds subtotals at different levels, aggregating from right to left through the list of grouping columns. The order of rollup fields is important. A query that includes three rollup fields returns the following rows for totals:

First-level subtotals for each combination of fieldName1 and fieldName2. Results are grouped by fieldName3.
Second-level subtotals for each value of fieldName1. Results are grouped by fieldName2 and fieldName3.
One grand total row
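To make the right-to-left aggregation concrete, here is a minimal Python sketch that emulates what a SUM with GROUP BY ROLLUP over two columns returns. It is an illustration under stated assumptions rather than any database's implementation: the `rollup` helper, the `sales` rows, and the use of `None` to mark a rolled-up column are all hypothetical.

```python
from collections import defaultdict


def rollup(rows, group_fields, value_field):
    """Emulate SUM(value_field) with GROUP BY ROLLUP(group_fields).

    Returns a dict mapping a key tuple to its total. None in a key
    position means that column has been rolled up into a subtotal.
    """
    totals = defaultdict(int)
    n = len(group_fields)
    for row in rows:
        key = tuple(row[f] for f in group_fields)
        # Keep the first k columns and roll up the rest, for k = n down to 0.
        # k = n gives the plain GROUP BY rows, k = 0 gives the grand total.
        for k in range(n, -1, -1):
            totals[key[:k] + (None,) * (n - k)] += row[value_field]
    return dict(totals)


# Hypothetical sales rows.
sales = [
    {"region": "East", "product": "A", "amount": 10},
    {"region": "East", "product": "B", "amount": 20},
    {"region": "West", "product": "A", "amount": 5},
]

result = rollup(sales, ["region", "product"], "amount")
for key, total in result.items():
    print(key, total)
```

The result holds the three ordinary GROUP BY rows, one subtotal per region (the rightmost column rolled up first), and a single grand total row. Swapping the order of `group_fields` would produce subtotals per product instead, which is why the order of rollup fields matters.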

Question : Which type of numeric value does a logistic regression model estimate?

1. A p-value
2. Any integer
4. Any real number

Exp: Logistic regression, also called a logit model, is used to model dichotomous outcome variables. In the logit model the log odds of the outcome is modeled as a linear combination of the predictor variables.

Examples

Example 1: Suppose that we are interested in the factors that influence whether a political candidate wins an election. The outcome (response) variable is binary (0/1); win or lose. The predictor variables of interest are the amount of money spent on the campaign, the amount of time spent campaigning negatively, and whether the candidate is an incumbent.

Example 2: A researcher is interested in how variables such as GRE (Graduate Record Exam) scores, GPA (grade point average), and the prestige of the undergraduate institution affect admission into graduate school. The outcome variable, admit/don't admit, is binary.
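The logit model described above can be sketched as follows: the linear combination of predictors gives the log odds, which can be any real number, and the logistic function maps it to a probability strictly between 0 and 1. The function name `predict_probability` and the intercept and coefficients below are made up for illustration; they are not fitted to any real admissions data.

```python
import math


def predict_probability(intercept, coefficients, features):
    """Return P(outcome = 1) under a logit model.

    The log odds are a linear combination of the predictors; the
    logistic (inverse logit) function converts them to a probability.
    """
    log_odds = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients for GRE score and GPA (not estimated from data).
intercept = -4.0
coefficients = [0.002, 0.8]

applicant = [600, 3.5]  # GRE = 600, GPA = 3.5
p_admit = predict_probability(intercept, coefficients, applicant)
print(f"Estimated probability of admission: {p_admit:.3f}")
```

To turn the probability into a class label such as admit or don't admit, it is compared against a chosen threshold, commonly 0.5.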

Related Questions

Question : Refer to the exhibit.
You are using K-means clustering to classify customer behavior for a large retailer. You need to
determine the optimum number of customer groups. You plot the within-sum-of-squares (wss)
data as shown in the exhibit. How many customer groups should you specify?

1. 2
2. 3
4. 8

Question : Refer to the exhibit.
Click on the calculator icon in the upper left corner. You are given a list of pre-defined association
rules:
A) RENTER => BAD CREDIT
B) RENTER => GOOD CREDIT
C) HOME OWNER => BAD CREDIT
D) HOME OWNER => GOOD CREDIT
E) FREE HOUSING => BAD CREDIT
F) FREE HOUSING => GOOD CREDIT
For your next analysis, you must limit your dataset based on rules with confidence greater than
60%.
Which of the rules will be kept in the analysis?

1. Rules B and D
2. Rules A and F
4. Rules D and E

Question : Refer to the exhibit.
You are using k-means clustering to discover groupings within a data set. You plot within-sum-of-squares
(wss) of multiple cluster sizes. Based on the exhibit, how many clusters should you use in
your analysis?

1. 2
2. 8
4. 10

Question : Refer to the exhibit.
Consider the training data set shown in the exhibit. What are the classification (Y = 0 or 1) and the
probability of the classification for the tuple X(0, 0, 1) using a Naive Bayesian classifier?

1. Classification Y = 0, Probability = 1/54
2. Classification Y = 1, Probability = 1/54
4. Classification Y = 0, Probability = 4/54

Question : Refer to the exhibit.
In the exhibit, a correlogram is provided based on an autocorrelation analysis of a sample dataset.
What can you conclude from only this exhibit?

1. There is no structure left to model in the data
2. Lag 7 has a significant negative autocorrelation
4. Differencing is required before proceeding with any analysis

Question : Refer to the exhibit.
Which type of data issue would you suspect based on the exhibit?

1. "Saturated" data, indicating potential issues with data definitions
2. Incomplete data, indicating potential issues with data transmission
4. The exhibit does not raise any obvious concerns with the data.