SAS Certified BI Content Developer for SAS 9 and Business Analytics Questions and Answers (Dumps and Practice Questions)

Question : What is a drawback to performing data cleansing (imputation, transformations, etc.) on raw data
prior to partitioning the data for honest assessment as opposed to performing the data cleansing
after partitioning the data?

1. It violates assumptions of the model.
2. It requires extra computational effort and time.
3. It omits the training (and test) data sets from the benefits of the cleansing methods.
4. There is no ability to compare the effectiveness of different cleansing methods.

Correct Answer : 4

Question : A company has branch offices in eight regions. Customers within each region are classified as either "High Value"
or "Medium Value" and are coded using the variable name VALUE. In the last year, the total
amount of purchases per customer is used as the response variable. Suppose there is a significant
interaction between REGION and VALUE. What can you conclude?

1. More high value customers are found in some regions than others.
2. The difference between average purchases for medium and high value customers depends on the region.
3. Regions with higher average purchases have more high value customers.
4. Regions with higher average purchases have more medium value customers.

Correct Answer : 2
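
To make the idea of an interaction concrete, here is a minimal sketch in Python (used only for illustration; it is not SAS code). The region names and cell means are invented, and only two of the eight regions are shown. It checks whether the High-minus-Medium gap in average purchases is the same in each region.

```python
# Hypothetical average purchases per customer; all figures are invented
# for illustration and do not come from any real data set.
cell_means = {
    ("North", "High"): 900.0,
    ("North", "Medium"): 500.0,
    ("South", "High"): 650.0,
    ("South", "Medium"): 600.0,
}

def value_gap(region):
    """Return mean(High) - mean(Medium) within one region."""
    return cell_means[(region, "High")] - cell_means[(region, "Medium")]

gaps = {region: value_gap(region) for region in ("North", "South")}

# With no interaction the gaps would be equal and this contrast would be 0.
interaction_contrast = gaps["North"] - gaps["South"]

print(gaps)                  # High-minus-Medium gap per region
print(interaction_contrast)  # nonzero: the gap depends on the region
```

In practice the cell means are estimates, so whether a nonzero contrast is significant is decided by the interaction test in the fitted model, not by inspecting the means alone.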

Question : This question will ask you to provide a missing option.
Complete the following syntax to test the homogeneity of variance assumption in the GLM procedure:
Means Region / (insert option here) =levene;

1. test
2. adjust
3. var
4. hovtest

Correct Answer : 4

Explanation: HOVTEST
HOVTEST=BARTLETT
HOVTEST=BF
HOVTEST=LEVENE <( TYPE= ABS | SQUARE )>
HOVTEST=OBRIEN <( W=number )>
requests a homogeneity of variance test for the groups defined by the MEANS effect. You can optionally specify a particular test; if you do not specify a test, Levene's test (Levene 1960) with TYPE=SQUARE is computed. Note that this option is ignored unless your MODEL statement specifies a simple one-way model.
The HOVTEST=BARTLETT option specifies Bartlett's test (Bartlett 1937), a modification of the normal-theory likelihood ratio test.
The HOVTEST=BF option specifies Brown and Forsythe's variation of Levene's test (Brown and Forsythe 1974).
The HOVTEST=LEVENE option specifies Levene's test (Levene 1960), which is widely considered to be the standard homogeneity of variance test. You can use the TYPE= option in parentheses to specify whether to use the absolute residuals (TYPE=ABS) or the squared residuals (TYPE=SQUARE) in Levene's test. TYPE=SQUARE is the default.
The HOVTEST=OBRIEN option specifies O'Brien's test (O'Brien 1979), which is basically a modification of HOVTEST=LEVENE(TYPE=SQUARE). You can use the W= option in parentheses to tune the variable to match the suspected kurtosis of the underlying distribution. By default, W=0.5, as suggested by O'Brien (1979, 1981).
See the section Homogeneity of Variance in One-Way Models for more details on these methods.

MEANS Statement
MEANS effects </ options> ;
Within each group corresponding to each effect specified in the MEANS statement, PROC GLM computes the arithmetic means and standard deviations of all continuous variables in the model (both dependent and independent). You can specify only classification effects in the MEANS statement; that is, effects that contain only classification variables.
Note that the arithmetic means are not adjusted for other effects in the model; for adjusted means, see the section LSMEANS Statement.

One of the usual assumptions in using the GLM procedure is that the underlying errors are all uncorrelated with homogeneous variances. You can test this assumption in PROC GLM by using the HOVTEST option in the MEANS statement, requesting a homogeneity of variance test. This section discusses the computational details behind these tests. Note that the GLM procedure allows homogeneity of variance testing for simple one-way models only. Homogeneity of variance testing for more complex models is a subject of current research.
Bartlett (1937) proposes a test for equal variances that is a modification of the normal-theory likelihood ratio test (the HOVTEST=BARTLETT option). While Bartlett's test has accurate Type I error rates and optimal power when the underlying distribution of the data is normal, it can be very inaccurate if that distribution is even slightly nonnormal (Box 1953). Therefore, Bartlett's test is not recommended for routine use.
An approach that leads to tests that are much more robust to the underlying distribution is to transform the original values of the dependent variable to derive a dispersion variable and then to perform analysis of variance on this variable. The significance level for the test of homogeneity of variance is the p-value for the ANOVA test on the dispersion variable. All of the homogeneity of variance tests available in PROC GLM except Bartlett's use this approach.
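
The dispersion-variable approach described above can be sketched in a few lines of Python (for illustration only; this is not how PROC GLM is implemented). The function names and the sample data are assumptions. The sketch builds squared or absolute deviations from each group mean and then computes the one-way ANOVA F statistic on that dispersion variable. It stops at the F statistic; the p-value would come from the F distribution with (k - 1, n - k) degrees of freedom, which is not computed here.

```python
from statistics import mean

def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    k = len(groups)          # number of groups
    n = len(all_values)      # total number of observations
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def levene_f(groups, kind="square"):
    """Levene-style statistic: ANOVA F on a dispersion variable.

    kind="square" uses squared deviations from the group mean,
    kind="abs" uses absolute deviations.
    """
    dispersion = []
    for g in groups:
        m = mean(g)
        if kind == "square":
            dispersion.append([(x - m) ** 2 for x in g])
        else:
            dispersion.append([abs(x - m) for x in g])
    return one_way_f(dispersion)

# Hypothetical data: g1 and g2 differ in location only, g3 is more spread out.
g1 = [1.0, 2.0, 3.0, 4.0, 5.0]
g2 = [11.0, 12.0, 13.0, 14.0, 15.0]
g3 = [0.0, 10.0, 20.0, 30.0, 40.0]

print(levene_f([g1, g2]))  # same spread in both groups
print(levene_f([g1, g3]))  # different spread
```

Because the test works on deviations from each group's own mean, a shift in location alone (g1 versus g2) does not affect the statistic; only a difference in spread does.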

Related Questions

Question : Select the prediction target for which regression algorithms are not the best fit

1. The dimensions of an object
2. The weight of a person
3. The temperature of the atmosphere
4. Employee status

Question : Logistic regression does not work well for binary classification

1. True
2. False

Question : Refer to the ROC curve: As you move along the curve, what changes?

1. The priors in the population
2. The true negative rate in the population
3. The proportion of events in the training data
4. The probability cutoff for scoring

Question : When mean imputation is performed on data after the data is partitioned for honest assessment,
what is the most appropriate method for handling the mean imputation?

1. The sample means from the validation data set are applied to the training and test data sets.
2. The sample means from the training data set are applied to the validation and test data sets.
3. The sample means from the test data set are applied to the training and validation data sets.
4. The sample means from each partition of the data are applied to their own partition.
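
As a hedged illustration of imputing after partitioning, the Python sketch below learns the imputation means from the training partition only and reuses them on the other partitions, so that no information from the holdout data leaks into the fitted values. The helper names, the `income` column, and the data are hypothetical.

```python
from statistics import mean

def fit_means(rows, columns):
    """Compute per-column means from the non-missing (non-None) values."""
    return {c: mean(r[c] for r in rows if r[c] is not None) for c in columns}

def impute(rows, means):
    """Return copies of the rows with None replaced by the supplied means."""
    return [
        {c: (means[c] if v is None and c in means else v) for c, v in r.items()}
        for r in rows
    ]

# Hypothetical partitions; None marks a missing value.
train = [{"income": 40.0}, {"income": None}, {"income": 60.0}]
valid = [{"income": None}, {"income": 100.0}]
test = [{"income": None}]

# Means are estimated on the training partition only ...
train_means = fit_means(train, ["income"])

# ... and then applied unchanged to every partition.
train_filled = impute(train, train_means)
valid_filled = impute(valid, train_means)
test_filled = impute(test, train_means)

print(train_means)
print(valid_filled)
print(test_filled)
```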

Question : An analyst generates a model using the LOGISTIC procedure. They are now interested in getting the
sensitivity and specificity statistics on a validation data set for a variety of cutoff values.
Which statement and option combination will generate these statistics?

1. score data=valid1 out=roc;
2. score data=valid1 outroc=roc;
3. model resp(event='1') = gender region / outroc=roc;
4. model resp(event='1') = gender region / out=roc;

Correct Answer : 2

Explanation: In the SCORE statement, the OUTROC= option names the SAS data set that contains the data necessary for producing the ROC curve for the DATA= data set named in that statement. The ROC curve is computed only for binary response data. The SCORE statement creates a data set that contains all the data in the DATA= data set together with posterior probabilities and, optionally, prediction confidence intervals. Fit statistics are displayed on request. If you have binary response data, the SCORE statement can be used to create a data set containing data for the ROC curve. You can specify several SCORE statements. The OUTROC= option in the MODEL statement, by contrast, describes the data used to fit the model, not the validation data.
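
To show what "sensitivity and specificity for a variety of cutoff values" means, here is a minimal Python sketch (for illustration only; it does not reproduce the layout of a SAS output data set). The labels, predicted probabilities, and cutoffs are invented, and the rule that a probability greater than or equal to the cutoff is classified as an event is an assumption.

```python
def roc_points(labels, probs, cutoffs):
    """Return (cutoff, sensitivity, specificity) for each cutoff.

    labels: 1 for an event, 0 for a nonevent.
    probs:  predicted event probabilities for the same rows.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    points = []
    for c in cutoffs:
        # Assumed rule: classify as an event when the probability >= cutoff.
        tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= c)
        tn = sum(1 for y, p in zip(labels, probs) if y == 0 and p < c)
        points.append((c, tp / positives, tn / negatives))
    return points

# Hypothetical validation data.
labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]

points = roc_points(labels, probs, cutoffs=[0.25, 0.5, 0.75])
for cutoff, sensitivity, specificity in points:
    print(cutoff, sensitivity, specificity)
```

Raising the cutoff trades sensitivity for specificity; plotting sensitivity against 1 - specificity over all cutoffs traces the ROC curve.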

Question : In partitioning data for model assessment, which sampling methods are acceptable?

A. Simple random sampling without replacement
B. Simple random sampling with replacement
C. Stratified random sampling without replacement
D. Sequential random sampling with replacement

1. A,B
2. B,C
3. A,D
4. A,C
5. A,B,C
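
For reference, one of the listed methods, stratified random sampling without replacement, can be sketched in Python as follows. The function name, the `target` column, the 50/50 split, and the seed are all assumptions made for the example. Each row is assigned to exactly one partition, and the event proportion is preserved in both.

```python
import random

def stratified_split(rows, key, train_fraction, seed=0):
    """Split rows into two disjoint partitions, stratifying on rows[key]."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    strata = {}
    for r in rows:
        strata.setdefault(r[key], []).append(r)

    train, valid = [], []
    for members in strata.values():
        shuffled = members[:]          # copy so the input is not modified
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train.extend(shuffled[:cut])   # each row goes to exactly one side,
        valid.extend(shuffled[cut:])   # i.e. sampling without replacement
    return train, valid

# Hypothetical data: 4 events and 16 nonevents.
rows = [{"id": i, "target": 1 if i < 4 else 0} for i in range(20)]
train, valid = stratified_split(rows, "target", train_fraction=0.5)

print(len(train), sum(r["target"] for r in train))
print(len(valid), sum(r["target"] for r in valid))
```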

Question : RMSE measures the error of a predicted

1. Numerical Value
2. Categorical values
3. Both numerical and categorical values
4. None of the above
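
As a reminder of how the statistic is defined, RMSE (root mean squared error) is the square root of the average squared difference between actual and predicted values. A minimal Python sketch, with invented numbers and an assumed function name:

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length numeric sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical actual and predicted values.
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]

print(rmse(actual, predicted))  # square root of (1 + 0 + 4) / 3
```

The subtraction in the definition is what ties the statistic to numeric predictions.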