Dell EMC Data Science Associate Certification Questions and Answers (Dumps and Practice Questions)

Question : Which of the following is not a classification algorithm?

1. Logistic Regression
2. Support Vector Machine
3. Neural Network
4. Hidden Markov Models
5. None of the above

Explanation:

Logistic regression
Logistic regression is a model used to predict the probability that an event occurs. It uses several predictor variables, which may be either numerical or categorical.
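
To make the idea of "predicting a probability from several predictors" concrete, here is a minimal Python sketch. The weights, bias, and feature values are hypothetical and chosen only for illustration; they are not fitted to any data, and a categorical predictor is assumed to be encoded as a 0/1 indicator.

```python
import math

def logistic(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, bias):
    # Linear combination of the predictors, passed through the logistic function.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(z)

# Hypothetical coefficients (assumptions, not estimated from data).
weights = [0.8, -1.2]
bias = 0.1

# First feature is numerical; second is a categorical predictor encoded as 0/1.
p = predict_probability([1.5, 1.0], weights, bias)
print(f"Predicted probability of the event: {p:.3f}")
```

In practice the weights would be estimated from labeled data; the sketch only shows how a fitted model turns predictor values into a probability.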

Support Vector Machines
Like naive Bayes, Support Vector Machines (SVMs) can be used to assign objects to classes, but they solve the task in a completely different way: rather than modeling class probabilities, an SVM looks for the boundary that separates the classes with the largest margin.

Neural Network
Neural Networks are a means for classifying multidimensional objects.

Hidden Markov Models
Hidden Markov Models are used in multiple areas of machine learning, such as speech recognition, handwritten letter recognition, or natural language processing.

Question : Suppose a man told you he had a nice conversation with someone on the train. Not knowing anything
about this conversation, the probability that he was speaking to a woman is 50% (assuming the train had an equal
number of men and women and the speaker was as likely to strike up a conversation with a man as with a woman).
Now suppose he also told you that his conversational partner had long hair. It is now more likely he was speaking
to a woman, since women are more likely to have long hair than men. ____________ can be used to calculate
the probability that the person was a woman.

1. SVM
2. MLE
4. Logistic Regression

Explanation: To see how this is done, let W represent the event that the conversation was held with a woman, and L denote the event that the conversation was held with a long-haired person. It can be assumed that women constitute half the population for this example. So, not knowing anything else, the probability that W occurs is P(W) = 0.5.
Suppose it is also known that 75% of women have long hair, which we denote as P(L | W) = 0.75 (read: the probability of event L given event W is 0.75, meaning that the probability of a person having long hair, given that we already know the person is a woman, is 75%). Likewise, suppose it is known that 15% of men have long hair, or P(L | M) = 0.15, where M is the complementary event of W, i.e., the event that the conversation was held with a man (assuming that every human is either a man or a woman).
Our goal is to calculate the probability that the conversation was held with a woman, given that the person had long hair, or, in our notation, P(W | L). Using the formula for Bayes' theorem:

P(W | L) = P(L | W) P(W) / [P(L | W) P(W) + P(L | M) P(M)]
         = (0.75 × 0.5) / (0.75 × 0.5 + 0.15 × 0.5)
         = 0.375 / 0.45 ≈ 0.83

So, knowing that the conversational partner had long hair raises the probability that it was a woman from 50% to about 83%.
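
The quantities defined in this explanation can be plugged into Bayes' theorem directly. The following Python sketch does so; the function name `posterior` is a hypothetical helper introduced here, and the numbers are the ones stated in the explanation (P(W) = 0.5, P(L | W) = 0.75, P(L | M) = 0.15).

```python
def posterior(prior, likelihood, likelihood_complement):
    """Return P(H | E) for a hypothesis H and its complement, via Bayes' theorem."""
    # Total probability of the evidence E over H and not-H.
    evidence = likelihood * prior + likelihood_complement * (1.0 - prior)
    return likelihood * prior / evidence

p_w = 0.5            # P(W): partner is a woman, before any other information
p_l_given_w = 0.75   # P(L | W): long hair given a woman
p_l_given_m = 0.15   # P(L | M): long hair given a man

p_w_given_l = posterior(p_w, p_l_given_w, p_l_given_m)
print(f"P(W | L) = {p_w_given_l:.4f}")  # 0.375 / 0.45
```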

Question : Bayes' theorem cannot find the actual probability of an event from the results of your tests.

1. True
2. False

Correct Answer : 2

Explanation: Bayes' theorem finds the actual probability of an event from the results of your tests. For example, you can:
Correct for measurement errors. If you know the real probabilities and the chance of a false positive and false negative, you can correct for measurement errors.
Relate the actual probability to the measured test probability. Bayes' theorem lets you relate Pr(A|X), the chance that an event A happened given the indicator X, and Pr(X|A), the chance the indicator X happened given that event A occurred. Given mammogram test results and known error rates, you can predict the actual chance of having cancer.
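
As an illustration of relating Pr(A|X) to Pr(X|A), here is a short Python sketch for a screening test. The rates used are assumptions made up for this example (1% base rate, 80% true positive rate, 9.6% false positive rate); they do not come from the text above, and `corrected_probability` is a hypothetical helper name.

```python
def corrected_probability(base_rate, true_positive_rate, false_positive_rate):
    """Pr(A | X): chance the event A is real given a positive indicator X."""
    true_positives = true_positive_rate * base_rate
    false_positives = false_positive_rate * (1.0 - base_rate)
    # Bayes' theorem: real positives as a share of all positive results.
    return true_positives / (true_positives + false_positives)

# Assumed illustrative rates, not taken from the source.
base_rate = 0.01            # Pr(A): fraction of people who have the condition
true_positive_rate = 0.80   # Pr(X | A): test is positive when the condition is present
false_positive_rate = 0.096 # Pr(X | not A): test is positive when it is absent

p_actual = corrected_probability(base_rate, true_positive_rate, false_positive_rate)
print(f"Pr(A | X) = {p_actual:.4f}")
```

Under these assumed rates, a positive result corresponds to an actual probability of under 8%, because false positives from the large healthy group outnumber the true positives.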

Related Questions

Question : In data visualization, which type of chart is recommended to represent frequency data?

1. Q-Q chart
2. Scatterplot
4. Line chart

Question : Which activity might be performed in the Operationalize phase of the Data Analytics Lifecycle?

1. Try different analytical techniques
2. Try different variables
4. Transform existing variables

Question : Refer to the exhibit.
You are asked to write a report on how specific variables impact your client's sales using a data
set provided to you by the client. The data includes 15 variables that the client views as directly
related to sales, and you are restricted to these variables only.
After a preliminary analysis of the data, the following findings were made:
1. Multicollinearity is not an issue among the variables
2. Only three variables (A, B, and C) have significant correlation with sales
You build a linear regression model on the dependent variable of sales with the independent
variables of A, B, and C. The results of the regression are seen in the exhibit.
Which interpretation is supported by the analysis?

1. Variables A, B, and C are significantly impacting sales and are effectively estimating sales
2. Due to the R² of 0.10, the model is not valid - the linear regression should be re-run with all 15
variables forced into the model to increase the R²
4. Due to the R² of 0.10, the model is not valid - a different analytical model should be attempted

Question : Refer to the exhibit.
For effective visualization, what is the chart's primary flaw?

1. The slanting of axis labels.
2. The location of the legend.
4. The order of the columns.

Question : Refer to the exhibit.
You have plotted the distribution of savings account sizes for your bank. How would you proceed,
based on this distribution?

1. The data is extremely skewed. Replot the data on a logarithmic scale to get a better sense of it.
2. The data is extremely skewed, but looks bimodal; replot the data in the range 2,500-10,000 to be sure.
4. The data is extremely skewed. Split your analysis into two cohorts: accounts less than 2,500, and accounts greater than 2,500.

Question : Refer to the exhibit.
In the exhibit, a correlogram is provided based on an autocorrelation analysis of a sample dataset.
What can you conclude based only on this exhibit?

1. There appears to be a seasonal component in the data
2. Lag 1 has a significant autocorrelation
4. There appears to be no structure left to model in the data