Dell EMC Data Science and BigData Certification Questions and Answers

Question : A data scientist plans to classify the sentiment polarity of product reviews collected from
the Internet. Which model is most appropriate, assuming labeled training data is
available?

1. Linear regression

2. Logistic regression

4. Naive Bayesian classifier

Correct Answer : 4

Explanation: Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite
set. It is not a single algorithm for training such classifiers, but a family of algorithms based on a common principle: all naive Bayes classifiers assume that the value of a particular feature is independent of the
value of any other feature, given the class variable. For example, a fruit may be considered to be an apple if it is red, round, and about 3" in diameter. A naive Bayes classifier considers each of these features to
contribute independently to the probability that this fruit is an apple, regardless of any possible correlations between the color, roundness and diameter features.
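The independence assumption can be sketched for the review-sentiment question above. This is a minimal illustration, not a production classifier: the four training reviews are made up, the function names `train_naive_bayes` and `predict` are hypothetical, and it assumes a multinomial word-count model with add-one (Laplace) smoothing.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def predict(model, text):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # Start from the log prior of the class.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Each word contributes independently, given the class.
            # Add-one smoothing avoids a zero probability.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up labeled reviews, for illustration only.
reviews = [
    ("great product works well", "positive"),
    ("love it excellent quality", "positive"),
    ("terrible product broke quickly", "negative"),
    ("poor quality waste of money", "negative"),
]
model = train_naive_bayes(reviews)
print(predict(model, "excellent product love it"))
print(predict(model, "terrible waste"))
```

Summing log probabilities word by word is exactly where the "naive" assumption enters: no term depends on which other words appear in the review.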

For some types of probability models, naive Bayes classifiers can be trained very efficiently in a supervised learning setting. In many practical applications, parameter estimation for naive Bayes models uses the
method of maximum likelihood; in other words, one can work with the naive Bayes model without accepting Bayesian probability or using any Bayesian methods.

Despite their naive design and apparently oversimplified assumptions, naive Bayes classifiers have worked quite well in many complex real-world situations. In 2004, an analysis of the Bayesian classification problem
showed that there are sound theoretical reasons for the apparently implausible efficacy of naive Bayes classifiers.[5] Still, a comprehensive comparison with other classification algorithms in 2006 showed that Bayes
classification is outperformed by other approaches, such as boosted trees or random forests.[6]

An advantage of naive Bayes is that it only requires a small amount of training data to estimate the parameters necessary for classification.

Question : When would you use the GROUP BY ROLLUP clause in your OLAP query?

1. where only the subtotals are to be included in the output
2. where only the grand totals are to be included in the output
3. ... in the output
4. where all subtotals and grand totals are to be included in the output

Correct Answer : 4
Exp: The ROLLUP, CUBE, and GROUPING SETS operators are extensions of the GROUP BY clause. They can generate the same result set as combining single-grouping queries with UNION ALL, but using one of the GROUP BY operators is usually more efficient.

The GROUPING SETS operator can generate the same result set as a simple GROUP BY, ROLLUP, or CUBE operator. When not all the groupings produced by a full ROLLUP or CUBE are required, you can use GROUPING SETS to specify only the groupings that you want. The GROUPING SETS list can contain duplicate groupings, and combining GROUPING SETS with ROLLUP and CUBE might also generate duplicates. Duplicate groupings are retained, as they would be with UNION ALL.

Queries that use the ROLLUP and CUBE operators generate some of the same result sets and perform some of the same calculations as OLAP applications. The CUBE operator generates a result set that can be used for cross-tabulation reports. A ROLLUP operation can calculate the equivalent of an OLAP dimension or hierarchy.

A query with a GROUP BY ROLLUP clause returns the same aggregated data as an equivalent query with a GROUP BY clause, plus multiple levels of subtotal rows. Some dialects limit the number of rollup fields; Salesforce SOQL, for example, allows up to three fields in the comma-separated list.

The GROUP BY ROLLUP clause adds subtotals at different levels, aggregating from right to left through the list of grouping columns. The order of rollup fields is important. A query that includes three rollup fields
returns the following rows for totals:

First-level subtotals for each combination of fieldName1 and fieldName2, aggregated across all values of fieldName3.
Second-level subtotals for each value of fieldName1, aggregated across all values of fieldName2 and fieldName3.
One grand total row.
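The subtotal levels can be sketched outside of SQL. The snippet below is a rough Python emulation of what GROUP BY ROLLUP returns, assuming a hypothetical `rollup` helper, made-up sales rows, and a SUM aggregate; it is not how any database engine implements the operator.

```python
from collections import defaultdict

def rollup(rows, fields, measure):
    """Sum `measure` for every prefix of `fields`, like GROUP BY ROLLUP."""
    results = []
    # Prefix lengths run from all fields (detail groups) down to
    # zero fields (the grand total), dropping columns right to left.
    for depth in range(len(fields), -1, -1):
        totals = defaultdict(int)
        for row in rows:
            key = tuple(row[f] for f in fields[:depth])
            totals[key] += row[measure]
        for key, total in sorted(totals.items()):
            # Rolled-up columns are padded with None, as SQL pads with NULL.
            padding = (None,) * (len(fields) - depth)
            results.append(key + padding + (total,))
    return results

# Made-up rows, for illustration only.
sales = [
    {"region": "East", "product": "A", "amount": 10},
    {"region": "East", "product": "B", "amount": 20},
    {"region": "West", "product": "A", "amount": 5},
]
result = rollup(sales, ["region", "product"], "amount")
for row in result:
    print(row)
```

With two rollup fields the output holds the detail groups, one subtotal row per region, and one grand total row. Reversing the field order would produce subtotals per product instead, which is why the order of rollup fields matters.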

Question : Which type of numeric value does a logistic regression model estimate?

1. A p-value
2. Any integer
4. Any real number

Correct Answer :
Exp: Logistic regression, also called a logit model, is used to model dichotomous outcome variables. In the logit model the log odds of the outcome is modeled as a linear combination of the predictor variables.

Examples

Example 1: Suppose that we are interested in the factors that influence whether a political candidate wins an election. The outcome (response) variable is binary (0/1): win or lose. The predictor variables of
interest are the amount of money spent on the campaign, the amount of time spent campaigning negatively, and whether the candidate is an incumbent.

Example 2: A researcher is interested in how variables such as GRE (Graduate Record Exam scores), GPA (grade point average), and prestige of the undergraduate institution affect admission into graduate school. The
outcome variable, admit/don't admit, is binary.
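The link between the linear log odds and the model's numeric output can be sketched in a few lines. The intercept and coefficients below are invented for illustration, loosely following Example 2, and `predict_probability` is a hypothetical helper rather than part of any library.

```python
import math

def predict_probability(intercept, coefficients, features):
    """Map a linear combination of predictors to a probability."""
    # The log odds are a linear combination of the predictor variables.
    log_odds = intercept + sum(c * x for c, x in zip(coefficients, features))
    # The logistic (sigmoid) function converts log odds to a value in (0, 1).
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients for GRE score and GPA (not fitted to real data).
intercept = -4.0
coefficients = [0.002, 0.8]

p_admit = predict_probability(intercept, coefficients, [600, 3.5])
p_strong = predict_probability(intercept, coefficients, [800, 4.0])
print(p_admit, p_strong)
```

The log odds can be any real number, but the estimate the model reports after the logistic transform always lies strictly between 0 and 1.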

Related Questions

Question : Which of the following statements is true for the Apriori algorithm?

1. Using confidence, you can say that rules are trustworthy and not coincidental.

2. Using confidence, you can say that rules are trustworthy, but not whether they are coincidental.

3. Using lift and leverage, you can identify rules and filter out the coincidental ones.

4. 1,2
5. 2,3
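The measures named in these options can be compared on a toy example. The sketch below assumes five made-up baskets and a hypothetical `rule_metrics` helper; it uses the standard definitions of support, confidence, lift, and leverage for a rule X -> Y, and it assumes both sides of the rule occur at least once so that no division by zero arises.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Compute support, confidence, lift and leverage for X -> Y."""
    n = len(transactions)
    x, y = set(antecedent), set(consequent)
    # Support: fraction of transactions containing the itemset.
    support_x = sum(x <= t for t in transactions) / n
    support_y = sum(y <= t for t in transactions) / n
    support_xy = sum((x | y) <= t for t in transactions) / n
    return {
        "support": support_xy,
        # Confidence: how often Y appears when X appears.
        "confidence": support_xy / support_x,
        # Lift: ratio to what independence of X and Y would predict.
        "lift": support_xy / (support_x * support_y),
        # Leverage: difference from what independence would predict.
        "leverage": support_xy - support_x * support_y,
    }

# Made-up baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
metrics = rule_metrics(baskets, {"bread"}, {"milk"})
print(metrics)
```

In this toy data the rule bread -> milk has a confidence of 0.75, yet its lift is below 1 and its leverage is negative, because milk is in most baskets anyway. Confidence alone cannot separate a meaningful rule from a coincidental one.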

Question : In which of the following cases can you use association rules?
A. You can manage your inventory using this.
B. You can do cross-merchandising, for example with high-margin products
C. You can logically group all the related products on the portal
D. You can physically keep all the related products together
E. You can run the promotions by combining the products

1. A,B,C
2. B,C,D
3. C,D,E
4. B,C,D,E
5. A,B,C,D,E

Question : You have transactions in your dataset. Your marketing team decides that the minimum support level is .. What is the minimum number of transactions in which an item or combination of items must appear to become a
frequent itemset?

1. 30

2. 3000

3. 300

4. 100

5. 3
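The arithmetic behind this kind of question is a single multiplication. The sketch below uses made-up figures, since the transaction count and support level are not given above, and a hypothetical helper `min_support_count`; it uses exact fractions to avoid floating-point rounding at the threshold.

```python
import math
from fractions import Fraction

def min_support_count(num_transactions, min_support):
    """Smallest transaction count that meets the minimum support."""
    # Support = (transactions containing the itemset) / (all transactions),
    # so the required count is support * total, rounded up.
    return math.ceil(Fraction(min_support) * num_transactions)

# Hypothetical values: 500 transactions at 2% minimum support.
needed = min_support_count(500, Fraction(2, 100))
# Hypothetical values: 150 transactions at 1% minimum support.
needed_rounded_up = min_support_count(150, Fraction(1, 100))
print(needed, needed_rounded_up)
```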

Question : Which technique would you use to solve the problem statement below?
"What is the probability that an individual customer will not repay the loan amount?"

1. Classification

2. Clustering

3. Linear Regression

4. Logistic Regression

5. Hypothesis testing

Question : What type of output is generated by linear regression?

1. Continuous variable

2. Discrete variable

3. Either a continuous or a discrete variable

4. Values between 0 and 1

Question : In which of the following scenarios can you use a linear regression model?
A. Predicting home price based on the location and house area
B. Predicting demand for goods and services based on the weather
C. Predicting tumor size reduction based on the number of radiation treatments
D. Predicting sales of a textbook based on the number of students in the state

1. A,B
2. B,C
3. C,D
4. A,B,C
5. A,B,C,D
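A scenario such as predicting home price from house area can be sketched with ordinary least squares on a single predictor. The data points are invented, `fit_line` is a hypothetical helper, and the closed-form slope and intercept formulas assume the predictor values are not all identical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up house areas (square meters) and prices (thousands).
areas = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

slope, intercept = fit_line(areas, prices)
predicted_price = slope * 90 + intercept
print(slope, intercept, predicted_price)
```

The prediction is a continuous value, which is what distinguishes this model from the logistic regression case, where the output is a probability.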