
Dell EMC Data Science and Big Data Certification Questions and Answers



Question : Since R factors are categorical variables, they are most closely related to which data classification level?
1. interval
2. ordinal
3. nominal
4. ratio



Correct Answer : 3

Explanation: Nominal data refers to categorically discrete values such as the name of your school, the type of car you drive, or the title of a book. This one is easy to
remember because nominal sounds like name (the two words share the same Latin root).


Ordinal refers to quantities that have a natural ordering: the ranking of favorite sports, the order of people standing in a line, the order in which runners finish a race,
or, most commonly, the choice on a rating scale from 1 to 5. With ordinal data you cannot state with certainty that the intervals between values are equal. For example,
we often use rating scales (Likert questions); on a 10-point scale, the difference between a 9 and a 10 is not necessarily the same as the difference between a 6 and a 7.
This one is also easy to remember, because ordinal sounds like order.


Interval data is like ordinal data except that the intervals between values are equally spaced. The most common example is temperature in degrees Fahrenheit: the
difference between 29 and 30 degrees is the same magnitude as the difference between 78 and 79 (although I know I prefer the latter). The attitudinal scales and Likert
questions you usually see on a survey are rarely truly interval, although many of the points on such a scale are likely of roughly equal width.

Ratio data is interval data with a natural zero point. For example, time is ratio data because zero time is meaningful. Degrees Kelvin also has a true zero point
(absolute zero), and the steps in both of these scales have the same magnitude.
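These four levels map directly onto categorical data types in code. The short Python sketch below is my own illustration (the question itself concerns R factors;
pandas.Categorical simply plays an analogous role here): the ordered=True flag is what separates an ordinal variable from a nominal one.

import pandas as pd

# Nominal: discrete labels with no natural ordering (like an unordered R factor)
car_type = pd.Categorical(["sedan", "truck", "sedan", "coupe"], ordered=False)

# Ordinal: discrete labels with a natural ordering (like an ordered R factor)
rating = pd.Categorical(["low", "high", "medium", "low"],
                        categories=["low", "medium", "high"],
                        ordered=True)

print(car_type.categories)         # just the set of labels; no order is implied
print(rating.min(), rating.max())  # min/max are meaningful only because the scale is ordered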






Question : In which phase of the analytic lifecycle would you expect to spend most of the project time?


1. Discovery
2. Data preparation
3.
4. Operationalize


Correct Answer : 2
Explanation: Data preparation is typically the most labor-intensive phase of the Data Analytics
Lifecycle and usually consumes the largest share of project time. In this phase the data range and
distribution can be obtained. If the data is skewed, viewing the logarithm of the data (if it is
all positive) can help detect structures that might otherwise be overlooked in a graph with
a regular, nonlogarithmic scale.
When preparing the data, one should look for signs of dirty data, as explained in the
previous section. Examining whether the data is unimodal or multimodal gives an idea of
how many distinct populations with different behavior patterns might be mixed into the
overall population. Many modeling techniques assume that the data follows a normal
distribution. Therefore, it is important to know whether the available dataset can match that
assumption before applying any of those modeling techniques.
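As a hedged illustration of that check (not part of the original explanation), the Python sketch below plots an all-positive, right-skewed variable on both the raw and
the logarithmic scale; the synthetic lognormal data is purely a stand-in for whatever column you are preparing.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x = rng.lognormal(mean=3.0, sigma=1.0, size=5000)  # synthetic, all-positive, right-skewed data

fig, (ax_raw, ax_log) = plt.subplots(1, 2, figsize=(10, 4))
ax_raw.hist(x, bins=50)          # raw scale: mass piled near zero, long right tail
ax_raw.set_title("raw scale")
ax_log.hist(np.log(x), bins=50)  # log scale: structure such as extra modes is easier to see
ax_log.set_title("log scale")
plt.show()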







Question : You are building a logistic regression model to predict whether a tax filer will be audited within the
next two years. Your training set population is 1000 filers. The audit rate in your training data is
4.2%. What is the sum of the probabilities that the model assigns to all the filers in your training set
that have been audited?
1. 42.0
2. 4.2
3.
4. 0.042



Correct Answer : 1

Explanation: Logistic regression is in many ways similar to ordinary regression. It models the relationship between a dependent variable and one or more independent
variables, and it allows us to look at the fit of the model as well as at the significance of the relationships being modelled. However, the underlying principle of
binomial logistic regression, and its statistical calculation, are quite different from those of ordinary linear regression. While ordinary regression uses ordinary
least squares to find a best-fitting line and produces coefficients that predict the change in the dependent variable for a one-unit change in an independent variable,
logistic regression estimates the probability of an event occurring (e.g. the probability of a pupil continuing in education post-16). What we want to predict from the
relevant independent variables is not a precise numerical value of a dependent variable, but rather the probability (p) that it is 1 (the event occurring) rather than 0
(the event not occurring). This means that, whereas linear regression assumes a linear relationship between the dependent and the independent variables, logistic
regression makes no such assumption; the logistic function is used instead.
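For the numbers in the question: 4.2% of 1,000 filers is 42 audited filers, and a maximum-likelihood logistic regression fit with an intercept has the property that its
fitted probabilities, summed over the training set, equal the observed number of events, so the intended answer is 42.0 (option 1). The Python sketch below is only an
illustration of that property on synthetic data; scikit-learn regularizes by default, so a very large C is used to approximate the plain maximum-likelihood fit.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))
# Synthetic labels engineered to give roughly a 4% positive ("audited") rate
logits = -3.5 + X @ np.array([0.8, -0.5, 0.3])
y = rng.binomial(1, 1 / (1 + np.exp(-logits)))

# Very large C means almost no regularization, approximating plain maximum likelihood
model = LogisticRegression(C=1e9, max_iter=1000).fit(X, y)
p = model.predict_proba(X)[:, 1]

print(y.sum())            # observed number of audited filers in the synthetic data
print(round(p.sum(), 2))  # sum of fitted probabilities: essentially the same number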



Related Questions


Question : Refer to the exhibit.
You have scored your naive Bayesian classifier model on held-out test data for cross-validation
and tabulated how the samples scored, as shown in the exhibit.
What are the precision and recall rates of the model?

1. Precision = 262/277
Recall = 262/288
2. Precision =262/288
Recall = 262/277
3.
Recall = 288/262
4. Precision = 288/262
Recall = 277/262
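The exhibit itself is not reproduced here, but the answer options are consistent with a confusion matrix containing 262 true positives, 15 false positives, and 26 false
negatives (262 + 15 = 277 and 262 + 26 = 288). Under that assumption, the short sketch below just applies the standard definitions: precision = TP / (TP + FP) and
recall = TP / (TP + FN).

# Counts inferred from the answer options, not read from the missing exhibit
TP, FP, FN = 262, 15, 26

precision = TP / (TP + FP)  # 262/277: of everything flagged positive, how much was actually positive
recall = TP / (TP + FN)     # 262/288: of all actual positives, how many were found

print(f"precision = {precision:.3f}, recall = {recall:.3f}")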




Question : Which ROC curve represents a perfect model fit?
1. A
2. B
3. C
4. D
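The lettered curves cannot be resolved without the exhibit, but a perfect model's ROC curve rises straight to the upper-left corner (TPR = 1 at FPR = 0) and has an area
under the curve of 1.0. The sketch below, purely illustrative, shows this with scikit-learn on scores that separate the two classes perfectly.

from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 0, 1, 1, 1]
y_scores = [0.10, 0.20, 0.30, 0.80, 0.90, 0.95]  # every positive outscores every negative

fpr, tpr, _ = roc_curve(y_true, y_scores)
print(list(zip(fpr, tpr)))              # the curve passes through the (0.0, 1.0) corner
print(roc_auc_score(y_true, y_scores))  # 1.0 for a perfect ranking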




Question : Refer to the exhibit.
You have scored your naive Bayesian classifier model on held-out test data for cross-validation
and tabulated how the samples scored, as shown in the exhibit.
What are the False Positive Rate (FPR) and the False Negative Rate (FNR) of the model?
1. FPR = 15/262
FNR = 26/288
2. FPR = 26/288
FNR = 15/262
3.
FNR = 288/26
4. FPR = 288/26
FNR = 262/15
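The exhibit is again unavailable, so the specific counts cannot be confirmed here; what the question is really testing is the definitions themselves:
FPR = FP / (FP + TN), the share of actual negatives wrongly flagged as positive, and FNR = FN / (FN + TP), the share of actual positives that were missed.
A minimal sketch with clearly hypothetical counts:

def fpr_fnr(tp, fp, fn, tn):
    """False positive rate and false negative rate from confusion-matrix counts."""
    fpr = fp / (fp + tn)  # fraction of actual negatives classified as positive
    fnr = fn / (fn + tp)  # fraction of actual positives classified as negative
    return fpr, fnr

# Hypothetical counts, used only to show the shape of the calculation
print(fpr_fnr(tp=90, fp=10, fn=5, tn=895))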




Question : Refer to the exhibit.
Click on the calculator icon in the upper left corner. An analyst is searching a corpus of documents
for the topic "solid state disk". In the Exhibit, Table A provides the inverse document frequency for
each term across the corpus. Table B provides each term's frequency in four documents selected
from the corpus. Which of the four documents is most relevant to the analyst's search?
1. A
2. B
3. C
4. D
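Tables A and B cannot be reproduced without the exhibit, but the scoring method the question relies on is standard TF-IDF relevance: for each document, sum term
frequency times inverse document frequency over the query terms ("solid", "state", "disk") and pick the document with the largest total. A minimal sketch with made-up
numbers:

# Hypothetical IDF values (Table A) and per-document term frequencies (Table B);
# the real values are in the missing exhibit.
idf = {"solid": 0.5, "state": 0.3, "disk": 1.2}
tf = {
    "A": {"solid": 2, "state": 1, "disk": 0},
    "B": {"solid": 1, "state": 1, "disk": 3},
    "C": {"solid": 0, "state": 4, "disk": 1},
    "D": {"solid": 1, "state": 0, "disk": 2},
}

scores = {doc: sum(tf[doc][term] * idf[term] for term in idf) for doc in tf}
print(max(scores, key=scores.get), scores)  # most relevant document under this TF-IDF score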




Question : Refer to the exhibit.
Click on the calculator icon in the upper left corner. You are going into a meeting where you know
your manager will have a question on your dataset -- specifically relating to customers that are
classified as renters with good credit status.
In order to prepare for the meeting, you create a rule: RENTER => GOOD CREDIT. What is the
confidence of the rule?
1. 63%
2. 41%
3.
4. 73%
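The customer table behind the exhibit is not available, but the calculation the question asks for is:
confidence(RENTER => GOOD CREDIT) = (number of customers who are both renters and have good credit) / (number of customers who are renters).
A minimal sketch with hypothetical counts (deliberately not matching any of the options):

# Hypothetical counts standing in for the missing exhibit
renters = 150
renters_with_good_credit = 105

# confidence = support(RENTER and GOOD CREDIT) / support(RENTER)
confidence = renters_with_good_credit / renters
print(f"confidence = {confidence:.0%}")  # 70% with these made-up numbers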




Question :

One can work with the naive Bayes model without accepting Bayesian probability

1. True
2. False