Question: A predictive model uses a data set that has several variables with missing values. What two problems can arise with this model?
A. The model will likely be overfit.
B. There will be a high rate of collinearity among input variables.
C. Complete case analysis means that fewer observations will be used in the model building process.
D. New cases with missing values on input variables cannot be scored without extra data processing.
Explanation: Many of the input variables in the Donor data set that you have been using have missing values. If an observation contains a missing value, then by default that observation is not used for modeling by nodes such as Variable Selection, Neural Network, or Regression. Depending on the type of predictive model that you build, missing values can cause problems. If your model is based on a decision tree, missing values cause no problems because decision trees handle missing values directly. However, in Enterprise Miner, regression and neural network models ignore observations that contain missing values. Substantially reducing the size of the training data set can weaken these predictive models, so it is wise to impute missing values before you fit a regression model or a neural network model. When you replace missing values with imputed ones, the regression and neural network algorithms can use the entire training data set. If you do not impute missing values for these models, the reduced sample might result in an inferior model. Additionally, it is important to impute missing values if you plan to compare a regression or neural network model with a decision tree model, because it is more appropriate to compare models that are built on the same set of observations.
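The effect described above can be sketched outside Enterprise Miner. The snippet below (a minimal illustration with a made-up toy table, not the real Donor data) shows how complete-case analysis discards rows with any missing value, and how a simple mean imputation keeps every observation available for modeling:

```python
# Toy illustration: complete-case analysis vs. mean imputation.
# Column names and values are invented for this example.
import pandas as pd

donors = pd.DataFrame({
    "age":        [34, None, 51, 29, None, 62],
    "income":     [40_000, 55_000, None, 38_000, 47_000, 90_000],
    "gift_count": [2, 5, 3, 1, 4, 7],
})

# Complete-case analysis: any row with a missing value is dropped,
# which is what regression/neural-network nodes do by default.
complete = donors.dropna()
print(len(donors), "rows total,", len(complete), "usable without imputation")

# Mean imputation fills the gaps so the full training set can be used.
imputed = donors.fillna(donors.mean())
print(len(imputed.dropna()), "rows usable after imputation")
```

Here half the rows would be lost to complete-case analysis; on a real training set with many partially-missing variables the loss can be much larger, which is why imputation matters for these model types.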
Question: Spearman statistics in the CORR procedure are useful for screening for irrelevant variables by investigating the association between which function of the input variables?
1. Concordant and discordant pairs of ranked observations
2. Logit link (log(p/(1-p)))
3. Access Mostly Uused Products by 50000+ Subscribers
4. Weighted sum of chi-square statistics for 2x2 tables
Explanation: The SPEARMAN option requests a table of Spearman rank-order correlation coefficients, which are based on the ranks of the variables. The coefficients range from -1 to +1. If you specify a WEIGHT statement, the SPEARMAN option is invalid. PROC CORR computes the Spearman correlation by ranking the data and using the ranks in the Pearson product-moment correlation formula. In the case of ties, averaged ranks are used.
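The rank-then-Pearson computation described above can be verified directly. This sketch uses scipy (not PROC CORR) and made-up values, including a tie to show the averaged-rank handling:

```python
# Sketch: Spearman correlation equals the Pearson correlation of the ranks,
# with tied values receiving averaged ranks. Data are invented.
from scipy.stats import pearsonr, rankdata, spearmanr

x = [10, 20, 20, 40, 55]   # note the tie at 20
y = [1, 3, 2, 5, 4]

rho_direct, _ = spearmanr(x, y)

# Rank each variable (ties get averaged ranks), then apply Pearson's formula.
rho_via_ranks, _ = pearsonr(rankdata(x), rankdata(y))

print(abs(rho_direct - rho_via_ranks) < 1e-12)
```

Because the statistic depends only on ranks, it captures any monotone association between an input and the target, which is what makes it useful for screening out irrelevant inputs.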
Question: A non-contributing predictor variable (Pr > |t| = .) is added to an existing multiple linear regression model. What will be the result?
1. An increase in R-Square
2. A decrease in R-Square
3. Access Mostly Uused Products by 50000+ Subscribers
4. No change in R-Square
R-squared is a statistic that gives some information about the goodness of fit of a model. In regression, the R-squared coefficient of determination is a statistical measure of how well the regression line approximates the real data points. An R-squared of 1.0 indicates that the regression line perfectly fits the data.
In some (but not all) instances where R-squared is used, the predictors are calculated by ordinary least-squares regression: that is, by minimizing SSE. In this case R-squared increases as we increase the number of variables in the model (R-squared will not decrease). This illustrates a drawback to one possible use of R-squared, where one might try to include more variables in the model until "there is no more improvement". This leads to the alternative approach of looking at the adjusted R-squared. The explanation of this statistic is almost the same as for R-squared, but it penalizes the statistic as extra variables are included in the model. For cases other than fitting by ordinary least squares, the R-squared statistic can be calculated as above and may still be a useful measure. However, the conclusion that R-squared increases with extra variables no longer holds, although downward variations are usually small. If fitting is by weighted least squares or generalized least squares, alternative versions of R-squared can be calculated appropriate to those statistical frameworks, while the "raw" R-squared may still be useful if it is more easily interpreted. Values for R-squared can be calculated for any type of predictive model, which need not have a statistical basis.
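The non-decreasing property of R-squared under OLS, and the special case of a perfectly non-contributing (linearly dependent) predictor, can be demonstrated numerically. This sketch uses synthetic data invented for the example, not the Donor set:

```python
# Sketch: under OLS, adding a predictor never lowers R-squared, and a
# perfectly collinear predictor (the "Pr > |t| = ." case) leaves it
# exactly unchanged. Data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
y = 2.0 * x1 + rng.normal(size=n)

def r_squared(cols, y):
    """R-squared of an OLS fit with intercept on the given predictor columns."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sst = (y - y.mean()) @ (y - y.mean())
    return 1.0 - (resid @ resid) / sst

noise = rng.normal(size=n)          # irrelevant extra predictor
x_dup = 2.0 * x1                    # linearly dependent predictor

r2_small = r_squared([x1], y)
r2_big = r_squared([x1, noise], y)  # can only stay equal or rise
r2_dup = r_squared([x1, x_dup], y)  # non-contributing: no change at all

print(r2_big >= r2_small, abs(r2_dup - r2_small) < 1e-9)  # → True True
```

The duplicated column adds nothing to the column space of the design matrix, so the fitted values and hence R-squared are identical, which is why "No change in R-Square" is the expected outcome for a Pr > |t| = . predictor.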
1. More high value customers are found in some regions than others.
2. The difference between average purchases for medium and high value customers depends on the region.
3. Regions with higher average purchases have more high value customers.
4. Regions with higher average purchases have more medium value customers.