Question: Which of the following describes a concordant pair of observations in the LOGISTIC procedure?
1. An observation with the event has an equal predicted probability to another observation with the event.
2. An observation with the event has a lower predicted probability than the observation without the event.
3. (option not shown in the source)
4. An observation with the event has a higher predicted probability than the observation without the event.
Correct Answer: 4
Explanation:
Percent Concordant - A pair of observations with different observed responses is concordant if the observation with the lower ordered response value (honcomp = 0) has a lower predicted mean score than the observation with the higher ordered response value (honcomp = 1).
Percent Discordant - If the observation with the lower ordered response value has a higher predicted mean score than the observation with the higher ordered response value, the pair is discordant.
Percent Tied - If a pair of observations with different responses is neither concordant nor discordant, it is a tie.
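As a minimal sketch of where these statistics appear (the data set and variable names hsb2, honcomp, read, and write are assumptions for illustration), the step below fits a binary logistic model; the "Association of Predicted Probabilities and Observed Responses" table in the default output reports Percent Concordant, Percent Discordant, and Percent Tied.

proc logistic data=hsb2;
   /* honcomp is the binary response; read and write are predictors.
      The association table (Percent Concordant / Discordant / Tied)
      is part of PROC LOGISTIC's default printed output. */
   model honcomp(event='1') = read write;
run;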
Question: Refer to the exhibit. An analyst examined logistic regression models for predicting whether a customer would make a purchase. The ROC curve displayed summarizes the models. Using the selected model and the analyst's decision rule, 25% of the customers who did not make a purchase are incorrectly classified as purchasers. What can be concluded from the graph?
1. About 25% of the customers who did make a purchase are correctly classified as making a purchase.
2. About 50% of the customers who did make a purchase are correctly classified as making a purchase.
3. (option not shown in the source)
4. About 95% of the customers who did make a purchase are correctly classified as making a purchase.
Correct Answer: (not shown in the source)
Explanation: Using predicted probabilities for the ROC analysis gives information about how well the linear combination of indicator variables distinguishes between a case and a non-case. However, as far as I understand, this method will not help much in determining cut-off scores in terms of raw scores. The method I followed is detailed below.

First, I selected the indicators that worked significantly better than the others by examining the significance of the differences in AUC; for this purpose I used SigmaPlot. After selecting the significantly better indicators, I conducted an exploratory factor analysis to see whether the indicators could be pooled into fewer yet meaningful factors. The obtained factor structure was then tested with a confirmatory factor analysis (using AMOS), and if a good fit was noted, two approaches were followed to obtain a single score representing the linear combination of the variables forming a given factor: the first used the latent factor score (the imputed-score option in AMOS) and the second used the aggregate score (i.e., the sum of the variables constituting the given factor). The ROC analysis was conducted with both scores and the AUCs were compared for significance of difference. The AUCs of the composite scores were also compared with the AUCs of the individual indicators to establish that the linear combination does better than any single indicator. If the composite score performed better than any individual indicator, the cut-off score for the composite score was computed using the Youden index as well as the intersection of the sensitivity and specificity curves. For this purpose we used SigmaPlot for the computation and SPSS for plotting the values (we used SPSS because we are well versed with it; otherwise it can be done with any software, including SigmaPlot and even Excel).
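As a minimal sketch of the cut-off computation described above, done here in SAS rather than SigmaPlot/SPSS, the code below writes the ROC coordinates for a single composite predictor with OUTROC= and picks the cut-off that maximizes the Youden index J = sensitivity + specificity - 1. The data set and variable names (scores, case, composite) are assumptions for illustration.

proc logistic data=scores;
   model case(event='1') = composite / outroc=roc;
run;

data roc_j;
   set roc;
   /* _SENSIT_ is sensitivity and _1MSPEC_ is 1 - specificity,
      so J = sensitivity + specificity - 1 = _SENSIT_ - _1MSPEC_ */
   youden = _SENSIT_ - _1MSPEC_;
run;

proc sort data=roc_j;
   by descending youden;
run;

/* the first observation now holds the cut-off with the largest J */
proc print data=roc_j(obs=1);
   var _PROB_ _SENSIT_ _1MSPEC_ youden;
run;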
The aforesaid approach, however, only provides information on whether the combination of the various indicators performs better (or does not perform better) than any single indicator in making a diagnosis. It does not provide information on whether combining the individual cut-off scores of the various indicators makes a better diagnosis than any indicator alone. For this purpose, MedCalc (a software package) can help; we did not use it in our own research because the sample size was not large enough. If you are interested in this, then I think MedCalc may help you get your answer. The general approach (using MedCalc) is to conduct the ROC analysis for a single (best) indicator and determine its cut-off score, then filter the cases using this cut-off score (i.e., select the cases with a score higher than or equal to the cut-off), add the next best indicator, perform the ROC analysis again, and determine the cut-off score for this second indicator. The sensitivity and specificity associated with the second indicator (obtained from the cases having a score higher than or equal to the cut-off on the first indicator) are in fact the sensitivity and specificity for the combination of the cut-off scores of the two indicators.
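A minimal SAS sketch of this sequential-cutoff idea (the text describes it with MedCalc; the cut-off of 10 is taken from the worked example that follows, while the data set dx, the indicators X and Y, and the outcome disease are assumptions for illustration): restrict the data to the cases at or above the first indicator's cut-off, then run the ROC analysis for the second indicator on that subset.

proc logistic data=dx;
   /* keep only the cases at or above the cut-off on the first indicator */
   where X >= 10;
   /* ROC coordinates for the second indicator on the filtered cases */
   model disease(event='1') = Y / outroc=rocY;
run;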
For instance, if the analysis using indicator X yields equal sensitivity and specificity at a cut-off score of, say, 10, then one selects the cases with a score of 10 or higher on X. On this sub-sample of cases one then performs the ROC analysis using another indicator, say Y. If this analysis reveals that a cut-off score of 12 on Y gives sensitivity and specificity higher than X alone (say an equal sensitivity and specificity of 93%), then one may conclude that a score of 10 or higher on X combined with a score of 12 or higher on Y gives better diagnostic accuracy (sensitivity and specificity of 93%) than X or Y alone. To support this conclusion, a comparison of the AUCs of X, Y, and the combination of X and Y is required.

Keep in mind that the ROC curve is a plot whose points are calculated from the counts in the confusion matrix at a given model score cut-off. If you take the output of ctable pprob=0.1 to 1 by 0.1, you have the TN, TP, FN, and FP counts that let you calculate the x and y coordinates of the ROC curve at ten different probability cut-offs. What you then need to understand is the cost matrix associated with TN, TP, FN, and FP, so that you can decide where the optimal cut-off lies for your particular problem. It is also worth being clear about what an ROC curve represents and how to use a risk score generated by a logistic regression: a risk score generated by a model (which does not actually have to be a statistical model) only ranks the observations, and it becomes a classifier only once a cut-off is chosen on that score.
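A minimal sketch of that CTABLE step (the data set purchases, the outcome purchase, and the predictors x1-x3 are assumptions for illustration): each row of the printed classification table gives the correct and incorrect event/non-event counts at one probability cut-off, from which sensitivity and 1 - specificity at that cut-off can be computed.

proc logistic data=purchases;
   /* print a classification table at probability cut-offs 0.1, 0.2, ..., 1.0 */
   model purchase(event='1') = x1 x2 x3 / ctable pprob=(0.1 to 1 by 0.1);
run;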
Question: One common approach for predicting rare events in the LOGISTIC procedure is to build a model that disproportionately over-represents the cases with an event occurring (e.g., a 50-50 event/non-event split). What problem does this present?