SAS Certified BI Content Developer for SAS 9 and Business Analytics Questions and Answers (Dumps and Practice Questions)

Question : A marketing campaign will send brochures describing an expensive product to a set of customers.
The cost for mailing and production per customer is $50. The company makes $500 revenue for each sale.
What is the profit matrix for a typical person in the population?

1. A
2. B
3. C
4. D

Correct Answer : 3
There are two distinct ways of using decision processing in SAS Enterprise Miner:
- Making firm decisions in the modeling nodes and comparing models on profit and loss summary statistics. For this approach, you include all possible decisions in the decision matrix. This is the traditional approach in statistical decision theory.
- Using a profit chart to set a decision threshold. For this approach, there is an implicit decision (usually a decision to "do nothing") that is not included in the decision matrix. The decisions made in the modeling nodes are tentative. The profit and loss summary statistics from the modeling nodes are not used. Instead, you look at profit charts (similar to lift or gains charts) in the Model Comparison node to decide on a threshold for the do-nothing decision. Then you use a Transform Variables or SAS Code node that sets the decision variable to "do nothing" when the expected profit or loss is not better than the threshold chosen from the profit chart. This approach is popular for business applications such as direct marketing.

For example, in the German credit benchmark data set (SAMPSIO.DMAGECR), the target variable indicates whether the credit risk of each loan applicant is good or bad, and a decision must be made to accept or reject each application. It is customary to use the loss matrix:
Customary Loss Matrix for the German Credit Data
Target Value    Decision
                Accept    Reject
Good            0         1
Bad             5         0
This loss matrix says that accepting a bad credit risk is five times worse than rejecting a good credit risk. But this matrix also says that you cannot make any money no matter what you do. So the results might be difficult to interpret (or perhaps you should just get out of business). In fact, if you accept a good credit risk, you will make money, that is, you will have a negative loss. And if you reject an application (good or bad), there will be no profit or loss aside from the cost of processing the application, which will be ignored. Hence, it would be more realistic to subtract one from the first row of the matrix, which gives the following loss matrix:
Realistic Loss Matrix for the German Credit Data
Target Value    Decision
                Accept    Reject
Good            -1        0
Bad             5         0
This loss matrix will yield the same decisions and the same model selections as the first matrix, but the summary statistics for the second matrix will be easier to interpret.
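The claim that both matrices lead to the same decisions can be checked with a small Python sketch. The posterior probabilities below and the helper names `expected_loss` and `best_decision` are illustrative assumptions; they are not SAS Enterprise Miner output or API.

```python
# Loss matrices indexed as loss[target][decision], copied from the tables above.
CUSTOMARY = {"good": {"accept": 0, "reject": 1},
             "bad":  {"accept": 5, "reject": 0}}
REALISTIC = {"good": {"accept": -1, "reject": 0},
             "bad":  {"accept": 5,  "reject": 0}}

def expected_loss(loss, p_good, decision):
    """Expected loss of a decision given P(target = good)."""
    return p_good * loss["good"][decision] + (1 - p_good) * loss["bad"][decision]

def best_decision(loss, p_good):
    """Pick the decision with the smallest expected loss."""
    return min(("accept", "reject"),
               key=lambda d: expected_loss(loss, p_good, d))

# Hypothetical posterior probabilities of being a good credit risk.
for p_good in (0.5, 0.7, 0.8, 0.9, 0.95):
    d1 = best_decision(CUSTOMARY, p_good)
    d2 = best_decision(REALISTIC, p_good)
    print(p_good, d1, d2)
```

Under both matrices the applicant is accepted only when P(good) exceeds 5/6. The expected losses of the second matrix equal those of the first minus P(good) for either decision, so the ranking of the decisions does not change.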
In this tutorial, we approach the German credit data from a cost/profit perspective. Specifically, we assume that a correct decision by the bank yields a profit of 35% (0.35 units) at the end of a specific period, say 3-5 years. Here a correct decision means that the bank predicts that a customer's credit is in good standing (and hence grants the loan), and the customer indeed has good credit. On the other hand, if the model or the manager wrongly predicts that the customer's credit is in good standing, then the bank incurs a unit loss. This completes the first column (accept) of the profit matrix: 0.35 for a good customer and -1 for a bad customer. In the second column (reject), the bank predicts that the customer's credit is not in good standing and declines the loan, so there is no gain or loss from the decision and both entries are 0.
Note that the data has 70% credit-worthy (good) customers and 30% not-credit-worthy (bad) customers. A manager who, without any model, gives everybody the loan would obtain the following negative profit per customer: (700*0.35 - 300*1.00)/1000 = -55/1000 = -0.055 units, that is, a loss of 0.055 units per customer.
This number may seem small. But if the average loan is $20,000 for this population (n = 1000), then the total profit will be
(-0.055 units per customer)*($20,000 per unit)*(1,000 customers) = -$1,100,000, which would be a whopping loss of one million one hundred thousand dollars.
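The arithmetic above can be reproduced with a short Python sketch. The dictionary layout and variable names are assumptions made for illustration; the numbers come from the text.

```python
# Profit per unit lent, indexed as profit[target][decision] (values from the text).
PROFIT = {"good": {"accept": 0.35, "reject": 0.0},
          "bad":  {"accept": -1.0, "reject": 0.0}}

n_good, n_bad = 700, 300   # 70% good and 30% bad out of 1,000 customers
average_loan = 20_000      # dollars per customer, as assumed in the text

# Policy without a model: accept every applicant.
total_units = n_good * PROFIT["good"]["accept"] + n_bad * PROFIT["bad"]["accept"]
per_customer = total_units / (n_good + n_bad)
total_dollars = per_customer * average_loan * (n_good + n_bad)

print(round(per_customer, 3))  # -0.055
print(round(total_dollars))    # -1100000
```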

Question : Select the correct statements from the list below.
1. The sum of errors will be larger than mean absolute error if errors are positive
2. The mean absolute error will be larger than the sum if errors are negative
3. The mean absolute error will be smaller than the sum if errors are negative
4. RMSE will equal MAE if all errors are equally large
5. RMSE will be smaller if the errors are not all equally large
6. RMSE will be larger if the errors are not all equally large

1. 1,3,4,6
2. 1,2,4,6
3. 2,3,4,6
4. 2,3,5,6

Correct Answer : 2
Mean Square Error: this is the average squared distance between the predicted and actual values.
RMSE: The square root of mean squared error.
MAE: This is a variation on mean squared error and is simply the average of the absolute value of the difference between the predicted and actual values.
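The four correct statements can be checked numerically. The following Python sketch uses made-up error values and hypothetical helper names; it only illustrates the definitions above on a sample with more than one observation.

```python
import math

def sum_of_errors(errors):
    """Plain sum of signed errors."""
    return sum(errors)

def mae(errors):
    """Mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

positive = [2.0, 2.0, 2.0, 2.0]      # all errors positive and equally large
negative = [-1.0, -3.0, -2.0, -6.0]  # all errors negative, unequal sizes

# Statement 1: with positive errors the sum exceeds the MAE.
print(sum_of_errors(positive), mae(positive))  # 8.0 2.0
# Statement 2: with negative errors the MAE exceeds the (negative) sum.
print(sum_of_errors(negative), mae(negative))  # -12.0 3.0
# Statement 4: equally large errors give RMSE == MAE.
print(rmse(positive), mae(positive))           # 2.0 2.0
# Statement 6: unequal errors give RMSE > MAE.
print(rmse(negative) > mae(negative))          # True
```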

Question : You are working in an e-commerce organization, where you are designing and evaluating a recommender system.
Which of the following metrics will always have the largest value?

1. Root Mean Square Error
2. Sum of Errors
3. Mean Absolute Error
4. There is not enough information.

Correct Answer : 4

Explanation:
Mean absolute error (MAE)
The MAE measures the average magnitude of the errors in a set of forecasts, without considering their direction. It measures accuracy for continuous variables. The equation is given in the library references. Expressed in words, the MAE is the average over the verification sample of the absolute values of the differences between forecast and the corresponding observation. The MAE is a linear score which means that all the individual differences are weighted equally in the average.
Root mean squared error (RMSE)
The RMSE is a quadratic scoring rule which measures the average magnitude of the error. The equation for the RMSE is given in both of the references. Expressing the formula in words, the differences between forecast and corresponding observed values are each squared and then averaged over the sample. Finally, the square root of the average is taken. Since the errors are squared before they are averaged, the RMSE gives a relatively high weight to large errors. This means the RMSE is most useful when large errors are particularly undesirable.
The MAE and the RMSE can be used together to diagnose the variation in the errors in a set of forecasts. The RMSE will always be larger than or equal to the MAE; the greater the difference between them, the greater the variance in the individual errors in the sample. If RMSE = MAE, then all the errors are of the same magnitude.
Both the MAE and RMSE can range from 0 to ∞. They are negatively-oriented scores: Lower values are better.
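For reference, the two scores described in words above can be written as formulas. The symbols are chosen here for illustration: $f_i$ is the forecast, $o_i$ the corresponding observation, and $n$ the size of the verification sample.

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert f_i - o_i \rvert,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (f_i - o_i)^2}

\mathrm{RMSE}^2 - \mathrm{MAE}^2
  = \frac{1}{n}\sum_{i=1}^{n} \bigl(\lvert f_i - o_i \rvert - \mathrm{MAE}\bigr)^2
  \;\ge\; 0
```

The second line is the variance of the absolute errors, which is why the RMSE is never smaller than the MAE and why the two are equal only when all errors have the same magnitude.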

1. The sum of errors will be larger than mean absolute error if errors are positive
2. The mean absolute error will be larger than the sum if errors are negative
3. RMSE will equal MAE if all errors are equally large
4. RMSE will be larger if the errors are not all equally large

Related Questions

Question : A predictive model uses a data set that has several variables with missing values.
What two problems can arise with this model?

A. The model will likely be overfit.
B. There will be a high rate of collinearity among input variables.
C. Complete case analysis means that fewer observations will be used in the model building process.
D. New cases with missing values on input variables cannot be scored without extra data processing.

1. A,B
2. B,C
4. A,D
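To make the ideas in options C and D concrete, here is a minimal Python sketch of complete case analysis and of scoring a new case with a missing input. The data rows, the coefficients, and the helper names `complete_cases` and `score` are all made up for illustration.

```python
# Hypothetical training rows; None marks a missing value.
rows = [
    {"age": 34,   "income": 52000, "tenure": 5},
    {"age": None, "income": 61000, "tenure": 2},
    {"age": 45,   "income": None,  "tenure": 9},
    {"age": 29,   "income": 48000, "tenure": None},
    {"age": 51,   "income": 75000, "tenure": 12},
]

def complete_cases(data):
    """Keep only rows with no missing value in any variable."""
    return [r for r in data if all(v is not None for v in r.values())]

usable = complete_cases(rows)
print(len(rows), len(usable))  # 5 2

def score(row):
    """Toy linear score with made-up coefficients; fails on missing inputs."""
    return 0.01 * row["age"] + 0.00001 * row["income"] + 0.02 * row["tenure"]

new_case = {"age": None, "income": 58000, "tenure": 4}
try:
    score(new_case)
except TypeError:
    print("cannot score a case with a missing input without imputation")
```

Only two of the five rows survive complete case analysis even though each incomplete row is missing a single value, and the new case cannot be scored until its missing input is imputed or otherwise handled.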

Question : Spearman statistics in the CORR procedure are useful for screening for irrelevant variables by
investigating the association between which function of the input variables?

1. Concordant and discordant pairs of ranked observations
2. Logit link (log(p/(1-p)))
4. Weighted sum of chi-square statistics for 2x2 tables

Question : A non-contributing predictor variable (Pr > |t| =.) is added to an existing multiple linear regression model. What will be the result?

1. An increase in R-Square
2. A decrease in R-Square
4. No change in R-Square

Question : The standard form of a linear regression model is:
Which statement best summarizes the assumptions placed on the errors?

1. The errors are correlated, normally distributed with constant mean and zero variance.
2. The errors are correlated, normally distributed with zero mean and constant variance.
4. The errors are independent, normally distributed with zero mean and constant variance.

Question : In a regression line, the ________ the standard error of the estimate is, the more accurate the predictions are.

1. larger
2. smaller

Question : Identify the correct SAS program for fitting a multiple linear regression model with
dependent variable (y) and four predictor variables (x1-x4).

1. A
2. B
4. D