
Dell EMC Data Science Associate Certification Questions and Answers (Dumps and Practice Questions)



Question : In data visualization, what is used to focus the audience on a key part of a chart?
1. Detailed text
2. Emphasis colors
4. A data table



Correct Answer : 2


Explanation: Our brains are compelled to find meaning, whether it is intended or not. Because the eyes are attracted to bright and high-contrast colors, viewers will derive meaning from something that stands out. When you use color for emphasis, it's like shouting that this object or element has the greatest value. At the Lynda.com site, the bright yellow is used to prominently display their most important message.








Question : Which word or phrase completes the statement? Data-ink ratio is to data visualization as
__________ .


1. Confusion matrix is to classifier
2. Data scientist is to big data
4. K-means is to Naive Bayes


Correct Answer : 1

Explanation:
A confusion matrix (Kohavi and Provost, 1998) contains information about the actual and predicted classifications made by a classification system. The performance of such systems is commonly evaluated using the data in the matrix. The following table shows the confusion matrix for a two-class classifier.

                    Predicted negative   Predicted positive
Actual negative             a                    b
Actual positive             c                    d

The entries in the confusion matrix have the following meaning in the context of our study:

a is the number of correct predictions that an instance is negative,
b is the number of incorrect predictions that an instance is positive,
c is the number of incorrect predictions that an instance is negative, and
d is the number of correct predictions that an instance is positive.




The accuracy (AC) is the proportion of the total number of predictions that were correct. It is determined using the equation:
AC = (a+d)/(a+b+c+d)
The recall or true positive rate (TP) is the proportion of positive cases that were correctly identified, as calculated using the equation:
TP = d/(c+d)
The false positive rate (FP) is the proportion of negative cases that were incorrectly classified as positive, as calculated using the equation:
FP = b/(a+b)
The true negative rate (TN) is the proportion of negative cases that were classified correctly, as calculated using the equation:
TN = a/(a+b)
The false negative rate (FN) is the proportion of positive cases that were incorrectly classified as negative, as calculated using the equation:
FN = c/(c+d)
Finally, precision (P) is the proportion of the predicted positive cases that were correct, as calculated using the equation:
P = d/(b+d)
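
As a rough illustration, the six rates can be computed directly from the four cell counts. This is only a sketch: the function name confusion_metrics and the counts passed to it are made up for the example and do not come from the exam material.

```python
def confusion_metrics(a, b, c, d):
    """Rates for a two-class confusion matrix.

    a = actual negative, predicted negative (correct)
    b = actual negative, predicted positive (incorrect)
    c = actual positive, predicted negative (incorrect)
    d = actual positive, predicted positive (correct)
    """
    return {
        "AC": (a + d) / (a + b + c + d),  # accuracy
        "TP": d / (c + d),                # recall / true positive rate
        "FP": b / (a + b),                # false positive rate
        "TN": a / (a + b),                # true negative rate
        "FN": c / (c + d),                # false negative rate
        "P": d / (b + d),                 # precision
    }


# Hypothetical counts, chosen only for illustration.
metrics = confusion_metrics(a=50, b=10, c=5, d=35)
for name, value in metrics.items():
    print(f"{name} = {value:.3f}")
```

Note that TP + FN = 1 and FP + TN = 1, since each pair splits the same row of the matrix.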




Question : Consider a database with 4 transactions:
Transaction 1: {cheese, bread, milk}
Transaction 2: {soda, bread, milk}
Transaction 3: {cheese, bread}
Transaction 4: {cheese, soda, juice}
You decide to run the association rules algorithm with a minimum support of 50%. Which rule has
a confidence of at least 50%?

1. {soda} => {milk}
2. {milk} => {soda}
4. {cheese} => {bread}


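Correct Answer : 4
Exp: Two of the four transactions contain both cheese and bread, so the itemset {cheese, bread} has a support of 2/4 = 50% and meets the minimum support. Cheese appears in three transactions, so the confidence of {cheese} => {bread} is 2/3, or about 67%. Soda and milk appear together in only one transaction (support 25%), so rules 1 and 2 fail the minimum support even though each has a confidence of 50%.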
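
The support and confidence values for the listed rules can be checked with a few lines of Python. This is a minimal sketch, not a full association rules (Apriori) implementation; the helper names support and confidence are chosen for the example.

```python
# The four transactions from the question.
transactions = [
    {"cheese", "bread", "milk"},
    {"soda", "bread", "milk"},
    {"cheese", "bread"},
    {"cheese", "soda", "juice"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs):
    """support(lhs and rhs together) divided by support(lhs)."""
    return support(lhs | rhs) / support(lhs)


# The three rules that are visible in the answer options.
rules = [
    ({"soda"}, {"milk"}),
    ({"milk"}, {"soda"}),
    ({"cheese"}, {"bread"}),
]

for lhs, rhs in rules:
    s = support(lhs | rhs)
    c = confidence(lhs, rhs)
    status = "meets" if s >= 0.5 else "fails"
    print(f"{sorted(lhs)} => {sorted(rhs)}: support={s:.2f}, "
          f"confidence={c:.2f}, {status} minimum support")
```

Only {cheese} => {bread} passes the 50% minimum support filter, which is applied before confidence is considered.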




Related Questions


Question : Refer to the exhibit.
For effective visualization, what is the chart's primary flaw?


1. The slanting of axis labels.
2. The location of the legend.
4. The order of the columns.




Question : Refer to the exhibit.
You have plotted the distribution of savings account sizes for your bank. How would you proceed,
based on this distribution?

1. The data is extremely skewed. Replot the data on a logarithmic scale to get a better sense of it.
2. The data is extremely skewed, but looks bimodal; replot the data in the range 2,500-10,000 to be sure.
4. The data is extremely skewed. Split your analysis into two cohorts: accounts less than 2,500 and accounts greater than 2,500.




Question : Refer to the exhibit.
In the exhibit, a correlogram is provided based on an autocorrelation analysis of a sample dataset.
What can you conclude based only on this exhibit?
1. There appears to be a seasonal component in the data
2. Lag 1 has a significant autocorrelation
4. There appears to be no structure left to model in the data




Question : Refer to the exhibit.
In the exhibit, the x-axis represents the derived probability of a borrower defaulting on a loan. Also
in the exhibit, the pink represents borrowers that are known to have not defaulted on their loan,
and the blue represents borrowers that are known to have defaulted on their loan.
Which analytical method could produce the probabilities needed to build this exhibit?

1. Linear Regression
2. Logistic Regression
4. Association Rules




Question : Refer to the exhibit.
You have created a density plot of purchase
amounts from a retail website as shown. What should
you do next?
1. Recreate the plot using the barplot() function
2. Use the rug() function to add elements to the plot
4. Reduce the sample size of the purchase amount data used to create the plot




Question : Refer to the exhibit.
You are building a decision tree. In this exhibit, four variables are listed with their respective values
of info-gain.
Based on this information, on which attribute would you expect the next split to be in the decision
tree?


1. Credit Score
2. Age
4. Gender