SAS Certified BI Content Developer for SAS 9 and Business Analytics Questions and Answers (Dumps and Practice Questions)

Question : RMSE is a good measure of accuracy, but only to compare forecasting errors of different models for a ______, as it is scale-dependent.

1. Between Variables
2. Particular Variable
3. Among all the variables
4. All of the above are correct

Correct Answer : 2

Explanation: The RMSE serves to aggregate the magnitudes of the errors in predictions for various times into a single measure of predictive power. RMSE is a good measure of accuracy, but only to compare forecasting errors of different models for a particular variable and not between variables, as it is scale-dependent.
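
The scale dependence can be seen in a minimal Python sketch. The function name `rmse` and the sample figures are made up for illustration: the same hypothetical forecasts are expressed first in metres and then in centimetres.

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical observations and forecasts, in metres.
actual_m = [1.0, 2.0, 3.0, 4.0]
forecast_m = [1.5, 1.5, 3.5, 3.5]

# The very same data expressed in centimetres.
actual_cm = [100 * v for v in actual_m]
forecast_cm = [100 * v for v in forecast_m]

rmse_m = rmse(actual_m, forecast_m)
rmse_cm = rmse(actual_cm, forecast_cm)
print(rmse_m, rmse_cm)
```

The forecasts are equally good in both runs, yet the two RMSE values differ by a factor of 100. That is why RMSE is only comparable across models of the same variable.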

Question : Let's say you have the two cases below for movie ratings:
1. You recommend a movie to a user with four stars, but he really doesn't like it and would rate it two stars.
2. You recommend a movie with three stars, but the user loves it and would rate it five stars.
Which statement correctly applies?

1. In both cases, the contribution to the RMSE is the same
2. In both cases, the contribution to the RMSE is different
3. In both cases, the contribution to the RMSE could vary
4. None of the above

Correct Answer : 1

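Explanation: RMSE squares each error before averaging, so the sign of the error does not matter. In case 1 the error is 4 - 2 = 2 and in case 2 it is 3 - 5 = -2; both contribute a squared error of 4.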
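
The arithmetic for the two cases can be checked with a short Python sketch. The dictionary names are illustrative only.

```python
# Case 1: predicted four stars, the user would rate two stars.
# Case 2: predicted three stars, the user would rate five stars.
cases = {"case 1": (4, 2), "case 2": (3, 5)}

# Each rating adds its squared error to the sum under the square root.
contributions = {name: (predicted - actual) ** 2
                 for name, (predicted, actual) in cases.items()}
print(contributions)
```

Both cases add the same squared error, so an over-prediction and an under-prediction of equal size are penalized equally.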

Question : RMSE is a useful metric for evaluating which types of models?

1. Logistic regression
2. Naive Bayes classifier
3. Linear regression
4. All of the above

Correct Answer : 3

Explanation: Error calculation allows you to see how well a machine learning method is performing.
One way of determining this performance is to calculate a numerical error. This number is sometimes a percentage,
but it can also be a score or a distance. The goal is usually to minimize an error percentage or distance,
although the goal may instead be to minimize or maximize a score. Encog supports the following error calculation methods.

Sum of Squares Error (ESS)
Root Mean Square Error (RMS)
Mean Square Error (MSE) (default)
SOM Error (Euclidean Distance Error)
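
As a rough illustration, the first three measures can be computed in Python as follows. These are the common textbook definitions; Encog's own implementation may differ (for example by a constant factor), and the function names here are hypothetical, not Encog's API.

```python
import math

def sum_of_squares_error(actual, ideal):
    """Sum of squared differences between actual and ideal outputs."""
    return sum((a - i) ** 2 for a, i in zip(actual, ideal))

def mean_square_error(actual, ideal):
    """Sum of squares divided by the number of outputs."""
    return sum_of_squares_error(actual, ideal) / len(actual)

def root_mean_square_error(actual, ideal):
    """Square root of the mean square error, in the units of the outputs."""
    return math.sqrt(mean_square_error(actual, ideal))

# Hypothetical network outputs and their ideal (target) values.
ideal = [0.0, 1.0, 1.0, 0.0]
actual = [0.1, 0.9, 0.8, 0.2]

print(sum_of_squares_error(actual, ideal))
print(mean_square_error(actual, ideal))
print(root_mean_square_error(actual, ideal))
```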

RMSE measures error of a predicted numeric value, and so applies to contexts like regression and some recommender system techniques,
which rely on predicting a numeric value. It is not relevant to classification techniques
like logistic regression and Naive Bayes, which predict categorical values.
It is also not relevant to unsupervised techniques like clustering.

The root-mean-square deviation (RMSD) or root-mean-square error (RMSE) is a frequently used measure of the
differences between values predicted by a model or an estimator and the values actually observed. Basically,
the RMSD represents the sample standard deviation of the differences between predicted values and observed values.
These individual differences are called residuals when the calculations are performed over the data sample that was used for estimation,
and are called prediction errors when computed out-of-sample. The RMSD serves to aggregate the magnitudes
of the errors in predictions for various times into a single measure of predictive power. RMSD is a good measure of accuracy,
but only to compare forecasting errors of different models for a particular variable and not between variables, as it is scale-dependent.
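
The distinction between residuals and prediction errors can be sketched in Python. The example fits a straight line by ordinary least squares on a small made-up training sample, then computes the RMSD once on that sample (residuals) and once on held-out points (prediction errors). All data and function names are assumptions for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rmsd(xs, ys, intercept, slope):
    """Root-mean-square deviation of observed ys from the fitted line."""
    errors = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical estimation sample and held-out sample.
train_x, train_y = [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]
test_x, test_y = [6, 7, 8], [12.3, 13.6, 16.4]

intercept, slope = fit_line(train_x, train_y)
in_sample = rmsd(train_x, train_y, intercept, slope)      # from residuals
out_of_sample = rmsd(test_x, test_y, intercept, slope)    # from prediction errors
print(in_sample, out_of_sample)
```

With these particular numbers the out-of-sample figure is the larger of the two, which is the usual pattern, although it is not guaranteed for every data set.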

Related Questions

Question : Consider the boxplot below.
Which of the following statements are true?
I. The distribution is skewed right.
II. The interquartile range is about 8.
III. The median is about 10.

1. I only
2. II only
3. III only
4. I and III

Question : Assume some output variable "y" is a linear combination of some independent input variables "A" plus some independent noise "e".
The way the independent variables are combined is defined by a parameter vector B:
y = AB + e
where A is an m x n matrix, B is a vector of n unknowns, and y is a vector of m values.
Assuming that m is not equal to n and the columns of A are linearly independent, which expression correctly solves for B?

1. A
2. B
3. C
4. D
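
The answer choices are not reproduced here, so the following is only the standard least-squares reading of the setup, not a statement of which option is correct. Writing the m x n matrix as A, as in the equation y = AB + e, linearly independent columns with m not equal to n imply m > n, and minimizing the squared residual leads to the normal equations:

```latex
\hat{B} = \arg\min_{B} \lVert y - AB \rVert_2^2
\quad\Longrightarrow\quad
A^{\mathsf T} A \,\hat{B} = A^{\mathsf T} y
\quad\Longrightarrow\quad
\hat{B} = \left(A^{\mathsf T} A\right)^{-1} A^{\mathsf T} y
```

The inverse exists because linearly independent columns make the n x n matrix A^T A invertible.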

Question : This question will ask you to provide missing code segments.
A logistic regression model was fit on a data set where 40% of the outcomes
were events (TARGET=1) and 60% were non-events (TARGET=0).
The analyst knows that the population where the model
will be deployed has 5% events and 95% non-events.
The analyst also knows that the company's profit margin for correctly
targeted events is nine times higher than the company's loss for incorrectly targeted non-events.
Given the following SAS program:
What X and Y values should be added to the program to correctly score the data?

1. X=40, Y=10
2. X=.05, Y=10
3. X=.05, Y=.40
4. X=.10, Y=.05
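
Since the SAS program itself is not reproduced here, the two ideas the question relies on can only be sketched, and the sketch below is in Python rather than SAS. It assumes the standard correction of posterior probabilities for oversampling and a cutoff derived from the profit ratio. The function names are hypothetical and this is not the exam's code.

```python
def adjust_for_priors(p_sample, sample_event_rate, population_event_rate):
    """Rescale a probability estimated on oversampled data to the population."""
    event = p_sample * population_event_rate / sample_event_rate
    non_event = ((1 - p_sample) * (1 - population_event_rate)
                 / (1 - sample_event_rate))
    return event / (event + non_event)

def profit_cutoff(profit_ratio):
    """Probability cutoff that maximizes expected profit when a correctly
    targeted event earns `profit_ratio` times what an incorrectly targeted
    non-event loses."""
    return 1 / (1 + profit_ratio)

# A case scored at the sample event rate maps back to the population rate.
p_adjusted = adjust_for_priors(0.40, sample_event_rate=0.40,
                               population_event_rate=0.05)
cutoff = profit_cutoff(9)
target = p_adjusted > cutoff
print(p_adjusted, cutoff, target)
```

The cutoff follows from comparing expected profit and loss: targeting pays off when p times the profit exceeds (1 - p) times the loss.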

Question : An analyst has a sufficient volume of data to perform a 3-way partition of the data into training,
validation, and test sets to perform honest assessment during the model building process. What is the purpose of the test data set?

1. To provide an unbiased measure of assessment for the final model.
2. To compare models and select and fine-tune the final model.
3. To reduce total sample size to make computations more efficient.
4. To build the predictive models.
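
A 3-way partition of this kind might be sketched in Python as follows. The 50/25/25 proportions, the fixed seed, and the function name are arbitrary assumptions, not values taken from the question.

```python
import random

def three_way_partition(rows, train=0.5, valid=0.25, seed=42):
    """Shuffle rows and split them into training, validation, and test lists."""
    if not 0 < train + valid < 1:
        raise ValueError("train + valid must leave room for a test set")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_train = int(len(shuffled) * train)
    n_valid = int(len(shuffled) * valid)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

# Stand-in for 100 observations; every row lands in exactly one partition.
train_set, valid_set, test_set = three_way_partition(range(100))
print(len(train_set), len(valid_set), len(test_set))
```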

Question : Refer to the confusion matrix:
Calculate the sensitivity. (0 - negative outcome, 1 - positive outcome)

1. 25/48
2. 58/102
3. 25/89
4. 58/81
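
The confusion matrix is not reproduced here, so the Python sketch below uses hypothetical counts purely to show the formula: sensitivity is the number of true positives divided by the number of actual positive outcomes.

```python
def sensitivity(true_positives, false_negatives):
    """Share of actual positive outcomes that the model predicted as positive."""
    return true_positives / (true_positives + false_negatives)

# Hypothetical counts, not the ones from the exam's confusion matrix.
tp, fn = 30, 10
print(sensitivity(tp, fn))
```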

Question : The total modeling data has been split into training, validation, and test data. What is the best data to use for model assessment?

1. Training data
2. Total data
3. Test data
4. Validation data