Question : What is a drawback to performing data cleansing (imputation, transformations, etc.) on raw data prior to partitioning the data for honest assessment, as opposed to performing the data cleansing after partitioning the data? 1. It violates assumptions of the model. 2. It requires extra computational effort and time. 3. It omits the training (and test) data sets from the benefits of the cleansing methods. 4. There is no ability to compare the effectiveness of different cleansing methods.
Correct Answer : 4
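A minimal Python sketch of the idea behind this answer, assuming a hypothetical list of incomes with missing values (the data, function names, and the 70/30 split are illustrative, not from the question). The data is partitioned first, each candidate imputation value is computed from the training rows only, and the validation rows are then filled with each candidate so the methods can be compared on data that played no part in fitting them.

```python
import random
import statistics

def partition(rows, train_fraction, seed):
    """Simple random split without replacement into (train, validation)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_imputer(train_values, method):
    """Compute the fill value from the observed training values only."""
    observed = [v for v in train_values if v is not None]
    if method == "mean":
        return statistics.mean(observed)
    return statistics.median(observed)

def apply_imputer(values, fill):
    """Replace missing entries (None) with a previously fitted fill value."""
    return [fill if v is None else v for v in values]

# Hypothetical raw data: None marks a missing value, 300.0 is an outlier.
incomes = [42.0, None, 55.0, 61.0, None, 48.0, 300.0, 52.0, None, 47.0]

# Partition BEFORE cleansing, then fit each method on the training part.
train, valid = partition(incomes, 0.7, seed=1)
fills = {m: fit_imputer(train, m) for m in ("mean", "median")}

# The same validation rows, cleansed by each candidate method.
valid_by_method = {m: apply_imputer(valid, f) for m, f in fills.items()}
```

Each entry of `valid_by_method` could then be scored with the same downstream model to see which cleansing method holds up better. If the imputation had been applied to the raw data before the split, the validation rows would already carry one method's values and no such comparison would be possible.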
Question : A company has branch offices in eight regions. Customers within each region are classified as either "High Value" or "Medium Value" and are coded using the variable name VALUE. In the last year, the total amount of purchases per customer is used as the response variable. Suppose there is a significant interaction between REGION and VALUE. What can you conclude?
1. More high value customers are found in some regions than others. 2. The difference between average purchases for medium and high value customers depends on the region. 3. Regions with higher average purchases have more high value customers. 4. Regions with higher average purchases have more medium value customers.
Correct Answer : 2
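A small Python sketch of what the interaction means, using made-up purchase amounts for two hypothetical regions (the figures and the function name are assumptions for illustration only). It computes the gap between the High Value and Medium Value cell means within each region; an interaction corresponds to that gap changing from region to region.

```python
from statistics import mean

# Hypothetical purchases per customer, keyed by (REGION, VALUE).
purchases = {
    ("North", "High"): [900.0, 1100.0],
    ("North", "Medium"): [400.0, 600.0],
    ("South", "High"): [700.0, 800.0],
    ("South", "Medium"): [650.0, 750.0],
}

def value_gap_by_region(cells):
    """Mean(High) minus Mean(Medium) within each region."""
    regions = sorted({region for region, _ in cells})
    return {
        r: mean(cells[(r, "High")]) - mean(cells[(r, "Medium")])
        for r in regions
    }

gaps = value_gap_by_region(purchases)
print(gaps)  # {'North': 500.0, 'South': 50.0}
```

This is only a descriptive look at the cell means. Whether the differing gaps amount to a significant interaction is decided by the F test for the REGION*VALUE term in the fitted model, not by this calculation.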
Question : This question will ask you to provide a missing option. Complete the following syntax to test the homogeneity of variance assumption in the GLM procedure: Means Region / (insert option here) =levene; 1. test 2. adjust 3. var 4. hovtest
Correct Answer : 4
Explanation: The HOVTEST option takes one of the following forms:
HOVTEST
HOVTEST=BARTLETT
HOVTEST=BF
HOVTEST=LEVENE <(TYPE=ABS | SQUARE)>
HOVTEST=OBRIEN <(W=number)>
The option requests a homogeneity of variance test for the groups defined by the MEANS effect. You can optionally specify a particular test; if you do not specify a test, Levene's test (Levene, 1960) with TYPE=SQUARE is computed. Note that this option is ignored unless your MODEL statement specifies a simple one-way model.
The HOVTEST=BARTLETT option specifies Bartlett's test (Bartlett, 1937), a modification of the normal-theory likelihood ratio test.
The HOVTEST=BF option specifies Brown and Forsythe's variation of Levene's test (Brown and Forsythe, 1974).
The HOVTEST=LEVENE option specifies Levene's test (Levene, 1960), which is widely considered to be the standard homogeneity of variance test. You can use the TYPE= option in parentheses to specify whether to use the absolute residuals (TYPE=ABS) or the squared residuals (TYPE=SQUARE) in Levene's test. TYPE=SQUARE is the default.
The HOVTEST=OBRIEN option specifies O'Brien's test (O'Brien, 1979), which is basically a modification of HOVTEST=LEVENE(TYPE=SQUARE). You can use the W= option in parentheses to tune the variable to match the suspected kurtosis of the underlying distribution. By default, W=0.5, as suggested by O'Brien (1979, 1981).
See the section Homogeneity of Variance in One-Way Models for more details on these methods.
MEANS Statement: MEANS effects </ options> ; Within each group corresponding to each effect specified in the MEANS statement, PROC GLM computes the arithmetic means and standard deviations of all continuous variables in the model (both dependent and independent). You can specify only classification effects in the MEANS statement, that is, effects that contain only classification variables. Note that the arithmetic means are not adjusted for other effects in the model; for adjusted means, see the section LSMEANS Statement.
One of the usual assumptions in using the GLM procedure is that the underlying errors are all uncorrelated with homogeneous variances. You can test this assumption in PROC GLM by using the HOVTEST option in the MEANS statement, requesting a homogeneity of variance test. This section discusses the computational details behind these tests. Note that the GLM procedure allows homogeneity of variance testing for simple one-way models only. Homogeneity of variance testing for more complex models is a subject of current research.
Bartlett (1937) proposes a test for equal variances that is a modification of the normal-theory likelihood ratio test (the HOVTEST=BARTLETT option). While Bartlett's test has accurate Type I error rates and optimal power when the underlying distribution of the data is normal, it can be very inaccurate if that distribution is even slightly nonnormal (Box, 1953). Therefore, Bartlett's test is not recommended for routine use.
An approach that leads to tests that are much more robust to the underlying distribution is to transform the original values of the dependent variable to derive a dispersion variable and then to perform analysis of variance on this variable. The significance level for the test of homogeneity of variance is the p-value for the ANOVA test on the dispersion variable. All of the homogeneity of variance tests available in PROC GLM except Bartlett's use this approach.
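The dispersion-variable approach described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reproduction of PROC GLM output: the function names are hypothetical, the dispersion variable is the squared or absolute deviation from each group's mean (mirroring TYPE=SQUARE and TYPE=ABS), and only the F statistic is computed. Turning it into a p-value would need the F distribution with k - 1 and n - k degrees of freedom, which is left out here.

```python
from statistics import mean

def one_way_f(groups):
    """F statistic of a one-way ANOVA on a list of groups of numbers."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean(v for g in groups for v in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def levene_f(groups, kind="square"):
    """Levene-style statistic: ANOVA on deviations from the group means.

    kind="square" uses squared deviations, kind="abs" uses absolute ones.
    """
    dispersion = []
    for g in groups:
        center = mean(g)
        if kind == "square":
            dispersion.append([(v - center) ** 2 for v in g])
        else:
            dispersion.append([abs(v - center) for v in g])
    return one_way_f(dispersion)

# Hypothetical responses for two groups with different spread.
narrow = [1, 2, 3]
wide = [2, 4, 6]
print(levene_f([narrow, wide], kind="abs"))
```

Two groups that differ only in location, such as `[1, 2, 3]` and `[11, 12, 13]`, have identical deviations from their own means, so the statistic is zero; a larger value indicates that the spread differs between groups.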
Question : In partitioning data for model assessment, which sampling methods are acceptable?
A. Simple random sampling without replacement B. Simple random sampling with replacement C. Stratified random sampling without replacement D. Sequential random sampling with replacement