Question : What is the default method in the LOGISTIC procedure to handle observations with missing data? 1. Missing values are imputed. 2. Parameters are estimated accounting for the missing values. 3. Parameter estimates are made on all available data. 4. Only cases with variables that are fully populated are used.
Correct Answer 4 : Explanation: By default, PROC LOGISTIC performs a complete-case analysis: only observations with nonmissing values for every variable in the model are used, and nothing is imputed. This should not be confused with the logistic regression method in PROC MI, which is an imputation method available for classification variables in a data set with a monotone missing pattern. In that method, a logistic regression model is fitted for a classification variable with a set of covariates constructed from the effects. For a binary classification variable, based on the fitted regression model, a new logistic regression model is simulated from the posterior predictive distribution of the parameters and is used to impute the missing values for each variable.
Any observation with missing values for the response, offset, strata, or explanatory variables is excluded from the analysis; however, missing values are valid for variables specified with the MISSING option in the CLASS or STRATA statement. Observations with a nonpositive or missing weight or with a frequency less than 1 are also excluded. The estimated linear predictor and its standard error estimate, the fitted probabilities and confidence limits, and the regression diagnostic statistics are not computed for any observation with missing offset or explanatory variable values. However, if only the response value is missing, the linear predictor, its standard error, the fitted individual and cumulative probabilities, and confidence limits for the cumulative probabilities can be computed and output to a data set by using the OUTPUT statement.
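The default exclusion rule can be checked on a small example. The following is a minimal sketch, assuming a made-up data set WORK.TOY with hypothetical variables y (response), x (continuous predictor), and g (classification variable); none of these names come from the question.

```sas
/* Hypothetical toy data: a period marks a missing value.          */
/* Rows 3, 4, and 7 each have one missing value (x, g, and y).     */
data work.toy;
   input y x g $;
   datalines;
1 2.1 a
0 1.4 b
1 .   a
0 3.3 .
1 2.8 b
0 0.9 a
. 1.7 b
1 3.9 a
0 1.1 b
0 2.5 a
;
run;

/* Default behavior: observations with a missing response or       */
/* explanatory variable are dropped before the model is fitted.    */
/* Compare "Number of Observations Read" with "Number of           */
/* Observations Used" in the output (10 read, 7 used here).        */
proc logistic data=work.toy;
   class g / param=ref;
   model y(event='1') = x g;
   /* Predicted probabilities are still written for the row whose  */
   /* only missing value is the response.                          */
   output out=work.pred p=phat;
run;
```

Adding the MISSING option to the CLASS statement (class g / param=ref missing;) would treat the missing level of g as a valid category, so the row with a missing g would no longer be excluded for that reason.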
Missing data are a part of almost all research, and we all have to decide how to deal with them from time to time. There are a number of alternative ways of dealing with missing data.
If you have missing values in your survey data for any reason, such as nonresponse, this can compromise the quality of your survey results. If the respondents are different from the nonrespondents with regard to a survey effect or outcome, then survey estimates might be biased and cannot accurately represent the survey population. There are a variety of techniques in sample design and survey operations that can reduce nonresponse. After data collection is complete, you can use imputation to replace missing values with acceptable values, and/or you can use sampling weight adjustments to compensate for nonresponse. You should complete this data preparation and adjustment before you analyze your data with PROC SURVEYLOGISTIC. See Cochran (1977), Kalton and Kasprzyk (1986), and Brick and Kalton (1996) for more information.
If an observation has a missing value or a nonpositive value for the WEIGHT or FREQ variable, then that observation is excluded from the analysis. An observation is also excluded if it has a missing value for any design (STRATA, CLUSTER, or DOMAIN) variable, unless you specify the MISSING option in the PROC SURVEYLOGISTIC statement. If you specify the MISSING option, the procedure treats missing values as a valid (nonmissing) category for all categorical variables.
By default, if an observation contains missing values for the response, offset, or any explanatory variables used in the independent effects, the observation is excluded from the analysis. This treatment is based on the assumption that the missing values are missing completely at random (MCAR). However, this assumption does not always hold. For example, evidence from other surveys might suggest that observations with missing values are systematically different from observations without missing values.
If you believe that missing values are not missing completely at random, then you can specify the NOMCAR option to include these observations with missing values in the dependent variable and the independent variables in the variance estimation. Whether or not the NOMCAR option is used, observations with missing or invalid values for WEIGHT, FREQ, STRATA, CLUSTER, or DOMAIN variables are always excluded, unless the MISSING option is also specified. When you specify the NOMCAR option, the procedure treats observations with and without missing values for variables in the regression model as two different domains, and it performs a domain analysis in the domain of nonmissing observations. If you use a REPWEIGHTS statement, all REPWEIGHTS variables must contain nonmissing values.
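As an illustration of where these two options go, here is a minimal sketch, assuming a hypothetical survey data set WORK.SURVEY with made-up variables Stratum, PSU, SampWt, Response, Age, and Region. Both NOMCAR and MISSING are specified in the PROC SURVEYLOGISTIC statement itself.

```sas
/* Hypothetical data set and variable names, for illustration only. */
proc surveylogistic data=work.survey nomcar missing;
   strata Stratum;          /* design strata                         */
   cluster PSU;             /* primary sampling units                */
   weight SampWt;           /* sampling weights; rows with missing   */
                            /* or nonpositive weights are excluded   */
   class Region;            /* MISSING treats a missing Region as a  */
                            /* valid category                        */
   model Response(event='1') = Age Region;
run;
```

With NOMCAR, rows that have missing values in Response or Age still contribute to variance estimation through the domain analysis described above; without it, they are simply dropped under the MCAR assumption.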
Question : Given the output from the LOGISTIC procedure: Which variables, among those that are statistically significant at an alpha of 0.05, have the greatest and least relative importance on the fitted model?
1. Greatest: MBA Least: DOWN_AMT 2. Greatest: MBA Least: CASH 3. Greatest: DOWN_AMT Least: CASH 4. Greatest: DOWN_AMT Least: HOME
Correct Answer 3 :
Explanation: Chi-Square, DF and Pr > ChiSq - These are the Chi-Square test statistic, Degrees of Freedom (DF) and associated p-value (Pr > ChiSq) corresponding to the specific test that all of the predictors are simultaneously equal to zero. Pr > ChiSq is the probability of observing a Chi-Square statistic as extreme as, or more extreme than, the observed one under the null hypothesis; the null hypothesis is that all of the regression coefficients in the model are equal to zero. The DF defines the distribution of the Chi-Square test statistic and is determined by the number of predictors in the model. Typically, Pr > ChiSq is compared to a specified alpha level, our willingness to accept a type I error, which is often set at 0.05 or 0.01.
Point Estimate - Underneath are the odds ratios corresponding to each Effect. The odds ratio is obtained by exponentiating the Estimate, exp[Estimate]. The difference in the log of two odds is equal to the log of the ratio of these two odds. The log of the ratio of two odds is the log odds ratio. Hence, the interpretation of Estimate (the coefficient was interpreted as the difference in log-odds) could also be done in terms of the log-odds ratio. When the Estimate is exponentiated, the log-odds ratio becomes the odds ratio. We can interpret the odds ratio as follows: for a one unit change in the predictor variable, the odds of a positive outcome are expected to be multiplied by the respective odds ratio, given the other variables in the model are held constant.
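The relationship between the Estimate and the Point Estimate can be written out as a short derivation. The symbols here are introduced only for illustration: \hat\beta stands for the Estimate of one predictor, and odds_x for the odds of a positive outcome when that predictor equals x, with the other predictors held constant.

```latex
\[
\log(\mathrm{odds}_{x+1}) - \log(\mathrm{odds}_{x})
  = \log\!\left(\frac{\mathrm{odds}_{x+1}}{\mathrm{odds}_{x}}\right)
  = \hat\beta
\qquad\Longrightarrow\qquad
\mathrm{OR} = \frac{\mathrm{odds}_{x+1}}{\mathrm{odds}_{x}} = e^{\hat\beta}.
\]
% Worked example with a made-up coefficient:
\[
\hat\beta = 0.5 \;\Rightarrow\; \mathrm{OR} = e^{0.5} \approx 1.65 .
\]
```

In this made-up case, a one unit increase in the predictor multiplies the odds of a positive outcome by about 1.65, that is, it raises them by roughly 65 percent.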
Question : A marketing manager attempts to determine those customers most likely to purchase additional products as the result of a nation-wide marketing campaign. The manager possesses a historical dataset (CAMPAIGN) of a similar campaign from last year. It has the following characteristics:
Target variable: Respond (0,1)
Continuous predictor: Income
Categorical predictor: Homeowner (Y,N)
Which SAS program performs this analysis?
1. A 2. B 3. C 4. D
Correct Answer : 1
Explanation: The CLASS statement names the classification variables to be used in the analysis. The CLASS statement must precede the MODEL statement. Most options can be specified either as individual variable options or as global options. You can specify options for each variable by enclosing the options in parentheses after the variable name. You can also specify global options for the CLASS statement by placing the options after a slash (/). Global options are applied to all the variables specified in the CLASS statement. If you specify more than one CLASS statement, the global options specified in any one CLASS statement apply to all CLASS statements. However, individual CLASS variable options override the global options. The DESCENDING (or DESC) option reverses the sorting order of the classification variable. If both the DESCENDING and ORDER= options are specified, PROC LOGISTIC orders the categories according to the ORDER= option and then reverses that order.
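The answer options themselves are not reproduced here, so the following is only a sketch of a program consistent with the explanation above, not necessarily the text of option A. It assumes the CAMPAIGN data set and the variables Respond, Income, and Homeowner from the question; the EVENT=, REF=, and PARAM= choices are illustrative assumptions.

```sas
proc logistic data=campaign;
   /* CLASS precedes MODEL. REF='N' is a per-variable option in     */
   /* parentheses; PARAM=REF is a global option after the slash.    */
   class Homeowner (ref='N') / param=ref;
   /* Model the probability that Respond = 1. Income enters as a    */
   /* continuous predictor because it is not listed in CLASS.       */
   model Respond(event='1') = Income Homeowner;
run;
```

Listing Homeowner in the CLASS statement is what tells the procedure to build design variables for the Y and N levels; a character variable that appears in MODEL without being declared in CLASS would cause an error.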
PROC LOGISTIC initially parameterizes the CLASS variables by looking at the levels of the variables across the complete data set. If you have an unbalanced replication of levels across variables or BY groups, then the design matrix and the parameter interpretation might be different from what you expect. For instance, suppose you have a model with one CLASS variable A with three levels (1, 2, and 3), and another CLASS variable B with two levels (1 and 2). If the third level of A occurs only with the first level of B, if you use the EFFECT parameterization, and if your model contains the effect A(B) and an intercept, then the design for A within the second level of B is not a differential effect. PROC LOGISTIC detects linear dependency among the last two design variables and sets the parameter for A2(B=2) to zero, resulting in an interpretation of these parameters as if they were reference- or dummy-coded. The REFERENCE or GLM parameterization might be more appropriate for such problems.