Question A data scientist plans to classify the sentiment polarity of , product reviews collected from the Internet. What is the most appropriate model to use? Suppose labeled training data is available.
Explanation: Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite set. It is not a single algorithm for training such classifiers, but a family of algorithms based on a common principle: all naive Bayes classifiers assume that the value of a particular feature is independent of the value of any other feature, given the class variable. For example, a fruit may be considered to be an apple if it is red, round, and about 3" in diameter. A naive Bayes classifier considers each of these features to contribute independently to the probability that this fruit is an apple, regardless of any possible correlations between the color, roundness and diameter features.
For some types of probability models, naive Bayes classifiers can be trained very efficiently in a supervised learning setting. In many practical applications, parameter estimation for naive Bayes models uses the method of maximum likelihood; in other words, one can work with the naive Bayes model without accepting Bayesian probability or using any Bayesian methods.
Despite their naive design and apparently oversimplified assumptions, naive Bayes classifiers have worked quite well in many complex real-world situations. In 2004, an analysis of the Bayesian classification problem showed that there are sound theoretical reasons for the apparently implausible efficacy of naive Bayes classifiers.[5] Still, a comprehensive comparison with other classification algorithms in 2006 showed that Bayes classification is outperformed by other approaches, such as boosted trees or random forests.[6]
An advantage of naive Bayes is that it only requires a small amount of training data to estimate the parameters necessary for classification
Question : When would you use GROUP BY ROLLUP clause in your OLAP query?
1. where only the subtotals are to be included in the output 2. where only the grand totals are to be included in the output 3. Access Mostly Uused Products by 50000+ Subscribers in the output 4. where all subtotals and grand totals are to be included in the output
Correct Answer : Get Lastest Questions and Answer : Exp: The ROLLUP, CUBE, and GROUPING SETS operators are extensions of the GROUP BY clause. The ROLLUP, CUBE, or GROUPING SETS operators can generate the same result set as when you use UNION ALL to combine single grouping queries; however, using one of the GROUP BY operators is usually more efficient. The GROUPING SETS operator can generate the same result set as that generated by using a simple GROUP BY, ROLLUP, or CUBE operator. When all the groupings that are generated by using a full ROLLUP or CUBE operator are not required, you can use GROUPING SETS to specify only the groupings that you want. The GROUPING SETS list can contain duplicate groupings; and, when GROUPING SETS is used with ROLLUP and CUBE, it might generate duplicate groupings. Duplicate groupings are retained as they would be by using UNION ALL. Queries that use the ROLLUP and CUBE operators generate some of the same result sets and perform some of the same calculations as OLAP applications. The CUBE operator generates a result set that can be used for cross tabulation reports. A ROLLUP operation can calculate the equivalent of an OLAP dimension or hierarchy. A query with a GROUP BY ROLLUP clause returns the same aggregated data as an equivalent query with a GROUP BY clause. It also returns multiple levels of subtotal rows. You can include up to three fields in a comma-separated list in a GROUP BY ROLLUP clause.
The GROUP BY ROLLUP clause adds subtotals at different levels, aggregating from right to left through the list of grouping columns. The order of rollup fields is important. A query that includes three rollup fields returns the following rows for totals:
First-level subtotals for each combination of fieldName1 and fieldName2. Results are grouped by fieldName3. Second-level subtotals for each value of fieldName1. Results are grouped by fieldName2 and fieldName3. One grand total row
Correct Answer : Get Lastest Questions and Answer : Exp: Logistic regression, also called a logit model, is used to model dichotomous outcome variables. In the logit model the log odds of the outcome is modeled as a linear combination of the predictor variables.
Examples
Example 1: Suppose that we are interested in the factors that influence whether a political candidate wins an election. The outcome (response) variable is binary (0/1); win or lose. The predictor variables of interest are the amount of money spent on the campaign, the amount of time spent campaigning negatively, and whether the candidate is an incumbent.
Example 2: A researcher is interested in how variables, such as GRE (Graduate Record Exam scores), GPA (grade point average) and prestige of the undergraduate institution, effect admission into graduate school. The outcome variable, admit/don't admit, is binary.