Question : Which SAS program will divide the original data set into % training and % validation data sets, stratified by county? 1. A 2. B 3. C 4. D
Correct Answer : 3
Explanation:

SAMPRATE=r | RATE=r : specifies the sampling rate, which is the proportion of units to select for the sample. The sampling rate r must be a positive number. You can specify r as a number between 0 and 1. Or you can specify r in percentage form as a number between 1 and 100, and PROC SURVEYSELECT converts that number to a proportion. The procedure treats the value 1 as 100%, and not the percentage form 1%.

The SAMPRATE= option is available only for equal probability selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, and METHOD=SEQ). For systematic random sampling (METHOD=SYS), PROC SURVEYSELECT uses the inverse of the sampling rate r as the interval. See the section Systematic Random Sampling for details. For other selection methods, PROC SURVEYSELECT converts the sampling rate r to the sample size before selection by multiplying the total number of units in the stratum or frame by the sampling rate and rounding up to the nearest integer.

If you request a stratified sample design with the STRATA statement and specify the SAMPRATE=r option, PROC SURVEYSELECT uses the sampling rate r for each stratum. If you do not want to use the same sampling rate for each stratum, use the SAMPRATE=(values) option or the SAMPRATE=SAS-data-set option to specify a sampling rate for each stratum.

SAMPRATE=(values) | RATE=(values) : specifies stratum sampling rates. You can separate values with blanks or commas. The number of SAMPRATE= values must equal the number of strata in the input data set. List the stratum sampling rate values in the order in which the strata appear in the input data set. When you use the SAMPRATE=(values) option, the input data set must be sorted by the STRATA variables in ascending order. You cannot use the DESCENDING or NOTSORTED option in the STRATA statement.

Each stratum sampling rate value must be a positive number. You can specify a rate value as a number between 0 and 1. Or you can specify a rate value in percentage form as a number between 1 and 100, and PROC SURVEYSELECT converts that number to a proportion. The procedure treats the value 1 as 100%, and not the percentage form 1%.

The SAMPRATE= option is available only for equal probability selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, and METHOD=SEQ). For systematic random sampling (METHOD=SYS), PROC SURVEYSELECT uses the inverse of the stratum sampling rate as the interval for the stratum. See the section Systematic Random Sampling for details about systematic sampling. For other selection methods, PROC SURVEYSELECT converts the stratum sampling rate to a stratum sample size before selection by multiplying the total number of units in the stratum by the sampling rate and rounding up to the nearest integer.

SAMPRATE=SAS-data-set | RATE=SAS-data-set : names a SAS data set that contains stratum sampling rates. The SAMPRATE= data set should have a variable _RATE_ that contains the sampling rate for each stratum. Each sampling rate value must be a positive number. You can specify each value as a number between 0 and 1. Or you can specify a value in percentage form as a number between 1 and 100, and PROC SURVEYSELECT converts that number to a proportion. The procedure treats the value 1 as 100%, and not the percentage form 1%.

The SAMPRATE= input data set should contain all the STRATA variables, with the same type and length as in the DATA= data set. The STRATA groups should appear in the same order in the SAMPRATE= data set as in the DATA= data set.

The SAMPRATE= option is available only for equal probability selection methods (METHOD=SRS, METHOD=URS, METHOD=SYS, and METHOD=SEQ). For systematic random sampling (METHOD=SYS), PROC SURVEYSELECT uses the inverse of the stratum sampling rate as the interval for the stratum. See the section Systematic Random Sampling for details. For other selection methods, PROC SURVEYSELECT converts the stratum sampling rate to the stratum sample size before selection by multiplying the total number of units in the stratum by the sampling rate and rounding up to the nearest integer.

SAMPSIZE=n | N=n : specifies the sample size, which is the number of units to select for the sample. The sample size n must be a positive integer. For selection methods that select without replacement, the sample size n must not exceed the number of units in the input data set.

If you specify the ALLOC= option in the STRATA statement, PROC SURVEYSELECT allocates the total sample size among the strata according to the allocation method you request in the ALLOC= option. In this case, SAMPSIZE=n specifies the total sample size to be allocated among the strata.

Otherwise, if you specify the SAMPSIZE=n option and request a stratified sample design with the STRATA statement, PROC SURVEYSELECT selects n units from each stratum. For methods that select without replacement, the sample size n must not exceed the number of units in any stratum. If you do not want to select the same number of units from each stratum, use the SAMPSIZE=(values) option or the SAMPSIZE=SAS-data-set option to specify a sample size for each stratum.

For without-replacement selection methods, by default, PROC SURVEYSELECT does not allow you to specify a stratum sample size that is greater than the total number of units available in the stratum. If you specify the SELECTALL option, PROC SURVEYSELECT selects all stratum units when the stratum sample size exceeds the number of units in the stratum.

SAMPSIZE=(values) | N=(values) : specifies sample sizes for the strata. You can separate values with blanks or commas. The number of SAMPSIZE= values must equal the number of strata in the input data set. List the stratum sample size values in the order in which the strata appear in the input data set. When you use the SAMPSIZE=(values) option, the input data set must be sorted by the STRATA variables in ascending order. You cannot use the DESCENDING or NOTSORTED option in the STRATA statement.

Each stratum sample size value must be a positive integer. For without-replacement selection methods, by default, PROC SURVEYSELECT does not allow you to specify a stratum sample size that is greater than the total number of units available in the stratum. If you specify the SELECTALL option, PROC SURVEYSELECT selects all stratum units when the stratum sample size exceeds the number of units in the stratum.
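The rate handling described in this explanation can be sketched in Python. This is only an illustration of the stated rules, not PROC SURVEYSELECT itself: the function names normalize_rate, stratum_sample_size, and stratified_split, the "county" key, and the sample rows are all hypothetical. The sketch assumes simple random sampling without replacement within each stratum and does not model systematic sampling (METHOD=SYS).

```python
import math
import random
from collections import defaultdict


def normalize_rate(r):
    """Convert a SAMPRATE-style value to a proportion in (0, 1]."""
    if r <= 0:
        raise ValueError("sampling rate must be positive")
    if r <= 1:
        return float(r)       # already a proportion; the value 1 means 100%
    if r <= 100:
        return r / 100.0      # percentage form, e.g. 75 -> 0.75
    raise ValueError("sampling rate must not exceed 100")


def stratum_sample_size(n_units, r):
    """Number of units in the stratum times the rate, rounded up."""
    return math.ceil(n_units * normalize_rate(r))


def stratified_split(rows, key, r, seed=0):
    """Split rows into (selected, remaining), applying rate r in every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    selected, remaining = [], []
    for _, members in sorted(strata.items()):
        n = stratum_sample_size(len(members), r)
        chosen = set(rng.sample(range(len(members)), n))
        for i, row in enumerate(members):
            (selected if i in chosen else remaining).append(row)
    return selected, remaining


# Hypothetical data: 8 rows in county "A" and 4 rows in county "B".
rows = [{"county": "A", "id": i} for i in range(8)]
rows += [{"county": "B", "id": i} for i in range(8, 12)]

# A rate of 75 is read as 75%, so each county contributes ceil(N * 0.75) rows.
train, valid = stratified_split(rows, "county", 75)
```

Because the same rate is applied inside every stratum, each county keeps roughly the same share of its rows in both output sets, which is the point of stratifying the split.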
Question : Refer to the lift chart: At a depth of 0.1, Lift = 3.14. What does this mean? 1. Selecting the top 10% of the population scored by the model should result in 3.14 times more events than a random draw of 10%. 2. Selecting the observations with a response probability of at least 10% should result in 3.14 times more events than a random draw of 10%. 3. Selecting the top 10% of the population scored by the model should result in 3.14 times greater accuracy than a random draw of 10%. 4. Selecting the observations with a response probability of at least 10% should result in 3.14 times greater accuracy than a random draw of 10%.
Correct Answer : 1
Explanation: Refer to the Lift Chart section in the study notes.
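As a rough illustration of what lift at a given depth measures, the sketch below ranks cases by model score, takes the top fraction, and divides the event rate in that group by the overall event rate. The function name lift_at_depth and the scores and events are hypothetical; they are not taken from the chart in the question.

```python
def lift_at_depth(scores, events, depth):
    """Event rate among the top `depth` fraction by score, over the overall event rate."""
    ranked = sorted(zip(scores, events), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * depth))
    top_rate = sum(event for _, event in ranked[:n_top]) / n_top
    overall_rate = sum(events) / len(events)
    return top_rate / overall_rate


# Hypothetical scored population: 20 cases, 5 of them events (1 = event).
scores = [i / 20 for i in range(20, 0, -1)]
events = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Lift for the top 10% of cases ranked by score.
top_decile_lift = lift_at_depth(scores, events, 0.1)
```

A random draw of 10% would be expected to contain events at the overall rate, so a lift above 1 means the model-ranked group holds proportionally more events than a random group of the same size.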
Question : Refer to the lift chart: What does the reference line at lift = 1 correspond to? 1. The predicted lift for the best 50% of validation data cases 2. The predicted lift if the entire population is scored as event cases 3. The predicted lift if none of the population are scored as event cases 4. The predicted lift if 50% of the population are randomly scored as event cases
Correct Answer : 2
Explanation: Refer to the Lift Chart section in the study notes.
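One way to see why lift = 1 is the baseline: when the whole population is selected, the event rate in the selected group is the overall event rate, so the ratio is 1. The sketch below computes cumulative lift at several depths on hypothetical data; the function name cumulative_lift and the scores and events are assumptions made for illustration.

```python
def cumulative_lift(scores, events, depths):
    """Cumulative lift at each depth, with cases ranked by descending score."""
    ranked = [event for _, event in
              sorted(zip(scores, events), key=lambda pair: pair[0], reverse=True)]
    overall_rate = sum(ranked) / len(ranked)
    curve = []
    for depth in depths:
        n_top = max(1, round(len(ranked) * depth))
        curve.append((sum(ranked[:n_top]) / n_top) / overall_rate)
    return curve


# Hypothetical scored population: 20 cases, 5 of them events (1 = event).
scores = [i / 20 for i in range(20, 0, -1)]
events = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Lift at 10%, 50%, and 100% depth; the last entry is the full population.
curve = cumulative_lift(scores, events, [0.1, 0.5, 1.0])
```

On this data the curve falls as depth increases and reaches 1 at full depth, which is where it meets the reference line.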