SAS Certified BI Content Developer for SAS 9 and Business Analytics Questions and Answer (Dumps and Practice Questions)

Question : The question will ask you to provide a missing statement. With the given program:
Which SAS statement will complete the program to correctly score the data set NEW_DATA?

1. Scoredata data=MYDIR.NEW_DATA out=scores;
2. Scoredata data=MYDIR.NEW_DATA output=scores;
3. Score data=HYDIR.NEU_DATA output=scores;
4. Score data=MYDIR,NEW DATA out=scores;

Correct Answer : 4
Explanation: This example first illustrates the syntax used for scoring data sets, then uses a previously scored data set to score a new data set. A generalized logit model is fit to the remote-sensing data set used in the section Linear Discriminant Analysis of Remote-Sensing Data on Crops of Chapter 31, The DISCRIM Procedure, to illustrate discrimination and classification methods. In the following DATA step, the response variable is Crop and the prognostic factors are x1 through x4.
data Crops;
length Crop $ 10;
infile datalines truncover;
input Crop $ @@;
do i=1 to 3;
input x1-x4 @@;
if (x1 ^= .) then output; end; input;
datalines;
Corn 16 27 31 33 15 23 30 30 16 27 27 26
Corn 18 20 25 23 15 15 31 32 15 32 32 15
Corn 12 15 16 73
Soybeans 20 23 23 25 24 24 25 32 21 25 23 24
Soybeans 27 45 24 12 12 13 15 42 22 32 31 43
Cotton 31 32 33 34 29 24 26 28 34 32 28 45
Cotton 26 25 23 24 53 48 75 26 34 35 25 78
Sugarbeets 22 23 25 42 25 25 24 26 34 25 16 52
Sugarbeets 54 23 21 54 25 43 32 15 26 54 2 54
Clover 12 45 32 54 24 58 25 34 87 54 61 21
Clover 51 31 31 16 96 48 54 62 31 31 11 11
Clover 56 13 13 71 32 13 27 32 36 26 54 32
Clover 53 08 06 54 32 32 62 16
;
In the following statements, you specify a SCORE statement to use the fitted model to score the Crops data. The data together with the predicted values are saved in the data setScore1. The output from the PLOTS option is discussed at the end of this section.
ods graphics on;
proc logistic data=Crops plots(only)=effect(x=x3);
model Crop=x1-x4 / link=glogit;
score out=Score1; run; ods graphics off;
In the following statements, the model is fit again, the data and the predicted values are saved into the data set Score2, and the OUTMODEL= option saves the fitted model information in the permanent SAS data set sasuser.CropModel:
proc logistic data=Crops outmodel=sasuser.CropModel;
model Crop=x1-x4 / link=glogit; score data=Crops out=Score2; run;

Question : Which statistic, calculated from a validation sample, can help decide which model to use for prediction of a binary target variable?

1. Adjusted R Square
2. Mallows Cp
3. Chi Square
4. Average Squared Error

Correct Answer : 4

Explanation: The mean squared error is arguably the most important criterion used to evaluate the performance of a predictor or an estimator. (The subtle distinction between predictors and estimators is that random variables are predicted and constants are estimated.) The mean squared error is also useful to relay the concepts of bias, precision, and accuracy in statistical estimation. In order to examine a mean squared error, you need a target of estimation or prediction, and a predictor or estimator that is a function of the data.

Note : Refer the study notes provided by HadoopExam to understand the concept in detail.

Question : Logistic regression is a model used for prediction of the probability of occurrence of an event.
It makes use of several variables that may be___

1. Numerical
2. Categorical
3. Both 1 and 2 are correct
4. None of the 1 and 2 are correct

Correct Answer : 3

Explanation: Logistic regression is a model used for prediction of the probability of occurrence of an event. It makes use of several predictor variables that may be either numerical or categories.

Related Questions

Question : When mean imputation is performed on data after the data is partitioned for honest assessment,
what is the most appropriate method for handling the mean imputation?

1. The sample means from the validation data set are applied to the training and test data sets.
2. The sample means from the training data set are applied to the validation and test data sets.
3. The sample means from the test data set are applied to the training and validation data sets.
4. The sample means from each partition of the data are applied to their own partition.

Question : An analyst generates a model using the LOGISTIC procedure. They are now interested in getting the
sensitivity and specificity statistics on a validation data set for a variety of cutoff values.
Which statement and option combination will generate these statistics?

1. Scoredata=valid1 out=roc;
2. Scoredata=valid1 outroc=roc;
3. mode1resp(event= '1') = gender region/outroc=roc;
4. mode1resp(event"1") = gender region/ out=roc;
Correct answer: 2
The OUTROC= data set contains data necessary for producing the ROC curve. It names the SAS data set that contains the ROC curve for the DATA= data set. The ROC curve is computed only for binary response data. The SCORE statement creates a data set that contains all the data in the DATA= data set together with posterior probabilities and, optionally, prediction confidence intervals. Fit statistics are displayed on request. If you have binary response data, the SCORE statement can be used to create a data set containing data for the ROC curve. You can specify several SCORE statements.

Question : In partitioning data for model assessment, which sampling methods are acceptable?

A. Simple random sampling without replacement
B. Simple random sampling with replacement
C. Stratified random sampling without replacement
D. Sequential random sampling with replacement

1. A,B
2. B,C
3. A,D
4. A,C
5. A,B,C

Question : RMSE measures error of a predicted

1. Numerical Value
2. Categorical values
3. For booth Numerical and categorical values
4. None of the above

Question : Suppose you have made a model for the rating system, which rates between to stars.
And you calculated that RMSE value is 1.0 then which of the following is correct

1. It means that your predictions are on average one star off of what people really think
2. It means that your predictions are on average two star off of what people really think
3. It means that your predictions are on average three star off of what people really think
4. It means that your predictions are on average four star off of what people really think

Question : You are creating a regression model with the input income, education and current debt of a customer,
what could be the possible output from this model.

1. Customer fit as a good
2. Customer fit as acceptable or average category
3. expressed as a percent, that the customer will default on a loan
4. 1 and 3 are correct
5. 2 and 3 are correct

Question : In which of the scenario you can use the regression to predict the values

1. Samsung can use it for mobile sales forecast
2. Mobile companies can use it to forecast manufacturing defects
3. Probability of the celebrity divorce
4. Only 1 and 2
5. All 1 , 2 and 3