SAS Certified BI Content Developer for SAS 9 and Business Analytics Questions and Answers (Dumps and Practice Questions)

Question : Select the correct statement that applies to logistic regression.

1. Computationally inexpensive, easy to implement, knowledge representation easy to interpret
2. May have low accuracy
3. Works with Numeric values
4. Only 1 and 3 are correct
5. All of 1, 2 and 3 are correct

Correct Answer : 5

Logistic regression
Pros: Computationally inexpensive, easy to implement, knowledge representation easy to interpret
Cons: Prone to underfitting, may have low accuracy
Works with: Numeric values, nominal values
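
To make the "easy to implement" point concrete, here is a minimal sketch of a one-predictor logistic regression fitted by batch gradient descent. The data, the learning rate, and the names fit_logistic, xs and ys are all hypothetical and chosen only for illustration; this is not how SAS fits the model.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every observation under the current weights.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradient of the mean log loss with respect to w and b.
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b

# Hypothetical numeric predictor and 0/1 target (not from the exam question).
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(xs, ys)
p_low = sigmoid(w * 0.5 + b)   # predicted event probability at a small x
p_high = sigmoid(w * 4.5 + b)  # predicted event probability at a large x
```

The fitted weights are easy to interpret: a positive w means the predicted event probability rises with x. A nominal input would first have to be coded as numeric indicator variables.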

Question : Suppose training data are oversampled in the event group to make the number of events and
nonevents roughly equal. A logistic regression is run and the probabilities are output to a data set
NEW and given the variable name PE. A decision rule considered is, "Classify data as an event if probability
is greater than 0.5." Also the data set NEW contains a variable TG that indicates whether there
is an event (1=Event, 0=No event). The following SAS program was used.
What does this program calculate?

1. Depth
2. Sensitivity
3. Specificity
4. Positive predictive value

Correct Answer : 2

Explanation: The sensitivity is the proportion of true positive responders (Response=1) that have a positive test result (Test=1).
The specificity is the proportion of true negative responders (Response=0) that have a negative test result (Test=0).
In this question, Response corresponds to TG and a positive test result corresponds to PE greater than 0.5.
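
The definitions of sensitivity and specificity can be sketched in a few lines. The records below are hypothetical (PE, TG) pairs standing in for data set NEW, and the function name classification_rates is an assumption made for this example; the original SAS program is not reproduced here.

```python
# Hypothetical scored records: (PE, TG) pairs standing in for data set NEW.
new = [
    (0.91, 1), (0.72, 1), (0.35, 1), (0.55, 1),
    (0.64, 0), (0.20, 0), (0.10, 0), (0.48, 0),
]

def classification_rates(records, cutoff=0.5):
    """Return (sensitivity, specificity) for the rule 'event if PE > cutoff'."""
    tp = fn = tn = fp = 0
    for pe, tg in records:
        predicted_event = pe > cutoff
        if tg == 1:
            # Actual event: counted as a true positive or a false negative.
            if predicted_event:
                tp += 1
            else:
                fn += 1
        else:
            # Actual nonevent: counted as a false positive or a true negative.
            if predicted_event:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn)  # share of actual events classified as events
    specificity = tn / (tn + fp)  # share of actual nonevents classified as nonevents
    return sensitivity, specificity

sensitivity, specificity = classification_rates(new)
```

Sensitivity uses only the rows with TG=1, and specificity uses only the rows with TG=0.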

Refer to the study notes as well.

Question : Refer to the exhibit:
The plots represent two models, A and B, being fit to the same two data sets, training and
validation. Model A is 90.5% accurate at distinguishing blue from red on the training data and 75.5% accurate
at doing the same on validation data. Model B is 83% accurate at distinguishing blue from red on
the training data and 78.3% accurate at doing the same on the validation data.
Which of the two models should be selected and why?

1. Model A. It is more complex with a higher accuracy than model B on training data.

2. Model A. It performs better on the boundary for the training data.

3. Model B. It is more complex with a higher accuracy than model A on validation data.

4. Model B. It is simpler with a higher accuracy than model A on validation data.

Correct Answer : 4
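
The comparison behind this answer can be sketched with the accuracies quoted in the question. The dictionary layout and variable names are assumptions for illustration; only the four percentages come from the question.

```python
# Accuracies (percent) quoted in the question for models A and B.
accuracy = {
    "A": {"training": 90.5, "validation": 75.5},
    "B": {"training": 83.0, "validation": 78.3},
}

# Drop in accuracy from training to validation; a large drop suggests overfitting.
gaps = {
    model: round(acc["training"] - acc["validation"], 1)
    for model, acc in accuracy.items()
}

# Choose the model that does best on data it was not trained on.
best = max(accuracy, key=lambda model: accuracy[model]["validation"])
```

Model A loses 15.0 points between training and validation while model B loses 4.7, so B generalizes better even though A looks better on the training data.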

Related Questions

Question : What is a drawback to performing data cleansing (imputation, transformations, etc.) on raw data
prior to partitioning the data for honest assessment as opposed to performing the data cleansing
after partitioning the data?

1. It violates assumptions of the model.
2. It requires extra computational effort and time.
3. It omits the training (and test) data sets from the benefits of the cleansing methods.
4. There is no ability to compare the effectiveness of different cleansing methods.

Question : A company has branch offices in eight regions. Customers within each region are classified as either "High Value"
or "Medium Value" and are coded using the variable name VALUE. In the last year, the total
amount of purchases per customer is used as the response variable. Suppose there is a significant
interaction between REGION and VALUE. What can you conclude?

1. More high value customers are found in some regions than others.
2. The difference between average purchases for medium and high value customers depends on the region.
3. Regions with higher average purchases have more high value customers.
4. Regions with higher average purchases have more medium value customers.
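
As a rough illustration of what an interaction between two factors means, the sketch below uses hypothetical cell means for two made-up regions. The numbers and the names cell_means and value_gap are assumptions; whether such a pattern is statistically significant would still have to come from the model's test.

```python
# Hypothetical average purchases per (REGION, VALUE) cell; not from the question.
cell_means = {
    ("North", "High"): 900.0,
    ("North", "Medium"): 500.0,
    ("South", "High"): 650.0,
    ("South", "Medium"): 600.0,
}

def value_gap(region):
    """Difference in average purchases, High minus Medium, within one region."""
    return cell_means[(region, "High")] - cell_means[(region, "Medium")]

gaps = {region: value_gap(region) for region in ("North", "South")}

# With no interaction the gap would be the same in every region, so this
# difference of differences would be zero.
interaction_contrast = gaps["North"] - gaps["South"]
```

Here the High versus Medium gap is 400 in one region and 50 in the other, which is the kind of pattern an interaction term captures.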

Question : This question will ask you to provide a missing option.
Complete the following syntax to test the homogeneity of variance assumption in the GLM procedure:
Means Region / (insert option here) =levene;

1. test
2. adjust
3. var
4. hovtest
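
For background, Levene's test amounts to a one-way ANOVA on a dispersion score, either the absolute or the squared deviation of each observation from its group mean. The sketch below computes that F statistic by hand; the function name levene_f, the squared flag, and the sample groups are hypothetical, and no p-value is computed.

```python
from statistics import mean

def levene_f(groups, squared=False):
    """F statistic of a one-way ANOVA on deviations from each group mean."""
    # Replace each observation by its absolute (or squared) deviation
    # from the mean of its own group.
    scores = []
    for group in groups:
        m = mean(group)
        scores.append([(x - m) ** 2 if squared else abs(x - m) for x in group])

    k = len(scores)                    # number of groups
    n = sum(len(s) for s in scores)    # total number of observations
    grand = mean(z for s in scores for z in s)

    # Standard one-way ANOVA sums of squares, applied to the scores.
    ss_between = sum(len(s) * (mean(s) - grand) ** 2 for s in scores)
    ss_within = sum((z - mean(s)) ** 2 for s in scores for z in s)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical responses for two regions with the same mean but different spread.
f_unequal = levene_f([[10, 12, 14], [5, 12, 19]])
# Two groups with identical spread give an F statistic of zero.
f_equal = levene_f([[1, 2, 3], [11, 12, 13]])
```

A large F statistic indicates that the typical deviation differs between groups, which is evidence against the homogeneity of variance assumption.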

Question : Refer to the exhibit.
Based on the control plot, which conclusion is justified regarding the means of the response?

1. All groups are significantly different from each other.
2. 2XL is significantly different from all other groups.
3. Only XL and 2XL are not significantly different from each other.
4. No groups are significantly different from each other.

Question : Customers were surveyed to assess their intent to purchase a product. An analyst divided the customers
into groups defined by the company's pre-assigned market segments and tested for difference in the customers'
average intent to purchase. The following is the output from the GLM procedure:
What percentage of the variation in customers' intent to purchase is explained by market segment?

1. less than 0.01%
2. 35%
3. 65%
4. 76%
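
The share of variation explained by a factor is the R-square, the model sum of squares divided by the corrected total sum of squares. Since the GLM output is not reproduced here, the sketch below uses made-up sums of squares; the function name r_square and both input values are assumptions.

```python
def r_square(ss_model, ss_total):
    """Proportion of total variation in the response explained by the model."""
    return ss_model / ss_total

# Hypothetical sums of squares, not taken from the exhibit.
ss_model = 120.0   # variation between market segments
ss_total = 480.0   # corrected total variation in intent to purchase

explained_percent = 100 * r_square(ss_model, ss_total)
```

With these made-up values the model explains 25% of the variation; the answer to the question depends on the values in the actual output.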

Question : Refer to the exhibit:
The box plot was used to analyze daily sales data following three different ad campaigns.
The business analyst concludes that one of the assumptions of ANOVA was violated.
Which assumption has been violated and why?

1. Normality, because Prob > F less than .0001.
2. Normality, because the interquartile ranges are different in different ad campaigns.
3. Constant variance, because Prob > F less than .0001.
4. Constant variance, because the interquartile ranges are different in different ad campaigns.
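
The interquartile range that a box plot displays can be computed per group as a quick check on spread. The daily sales figures and campaign labels below are hypothetical, and the helper name iqr is an assumption; a formal check would use a homogeneity of variance test.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _median, q3 = quantiles(values, n=4)
    return q3 - q1

# Hypothetical daily sales for two ad campaigns; not from the exhibit.
sales = {
    "Campaign 1": [100, 102, 104, 106, 108, 110, 112],
    "Campaign 2": [80, 90, 100, 110, 120, 130, 140],
}

spread = {campaign: iqr(values) for campaign, values in sales.items()}
```

Boxes of clearly different height, as with these two made-up campaigns, point to unequal spread across groups, which is what the constant variance assumption rules out.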