Question : Assume that SAS data sets Sasdata.Products and Sasdata.Sales both contain the Prod_ID variable. Which of the following SAS DATA steps returns only exceptions or non-matches?
a. libname sasdata 'SAS-data-library'; data all; merge sasdata.products sasdata.sales; by prod_id; if ins=1 or inp=1; run;
b. libname sasdata 'SAS-data-library'; data all; merge sasdata.products(in=inp) sasdata.sales(in=ins); by prod_id; if ins=1 and inp=1; run;
c. libname sasdata 'SAS-data-library'; data all; merge sasdata.products(in=inp) sasdata.sales(in=ins); by prod_id; if ins=0 and inp=0; run;
d. libname sasdata 'SAS-data-library'; data all; merge sasdata.products(in=inp) sasdata.sales(in=ins); by prod_id; if ins=0 or inp=0; run;
1. a 2. b 3. c 4. d
Explanation: By default, a DATA step match-merge combines all observations from all input data sets. To include only unmatched observations in your output data set, use the IN= data set option to create and name a temporary variable that indicates whether the data set contributed to the current observation. If the value of the IN= variable is 0, the data set did not contribute to the current observation; if the value is 1, the data set did contribute. Use a subsetting IF statement to keep only those observations that have a value of 0 for the IN= variable of either the Sasdata.Products data set or the Sasdata.Sales data set. Because an unmatched observation might come from either input data set, the condition must use OR rather than AND: the two IN= variables are never both 0 for the same observation, because every merged observation comes from at least one input data set.
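The subsetting IF on the IN= variables can be tried on a small case. The following is a minimal sketch, assuming two hypothetical WORK data sets in place of Sasdata.Products and Sasdata.Sales; only the Prod_ID variable and the IN= names come from the question.

```sas
/* Hypothetical sample data; only Prod_ID is taken from the question. */
data products;
   input prod_id $ name $;
   datalines;
P1 Widget
P2 Gadget
P3 Gizmo
;

data sales;
   input prod_id $ units;
   datalines;
P2 10
P3 4
P4 7
;

/* A match-merge requires both inputs to be sorted by the BY variable. */
proc sort data=products; by prod_id; run;
proc sort data=sales;    by prod_id; run;

data nonmatches;
   merge products(in=inp) sales(in=ins);
   by prod_id;
   /* Keep an observation only if one of the inputs did not contribute. */
   if inp=0 or ins=0;
run;

proc print data=nonmatches;
run;
```

With this sample data, only P1 (in products but not sales) and P4 (in sales but not products) should remain in the output.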
Question :
The following SAS program is submitted:
libname sasdata 'SAS-data-library';
libname labdata 'SAS-data-library';
data labdata.boston labdata.dallas(drop=city dest equipment);
   set sasdata.cities(keep=orig dest city price equipment);
   if dest='BOS' then output labdata.boston;
   else if dest='DFW' then output labdata.dallas;
run;
Explanation: In the program above, the KEEP= option specifies that 5 variables (orig, dest, city, price, and equipment) are read from the input data set. All of these variables are output to Labdata.Boston. In the Labdata.Dallas output data set, the DROP= option specifies that city, dest, and equipment are excluded from the data set so that only orig and price are included.
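The effect of KEEP= on the input data set and DROP= on one output data set can be checked with a small stand-in. This is a sketch under assumptions: the WORK data set cities, its values, and the extra variable carrier are hypothetical.

```sas
/* Hypothetical stand-in for Sasdata.Cities, with one extra variable
   (carrier) that the KEEP= option filters out on input. */
data cities;
   input orig $ dest $ city $ price equipment $ carrier $;
   datalines;
RDU BOS Boston 250 B737 AA
RDU DFW Dallas 310 A320 DL
RDU DEN Denver 420 B757 UA
;

data boston dallas(drop=city dest equipment);
   set cities(keep=orig dest city price equipment);
   /* DEST can still be tested here: DROP= on an output data set is
      applied only when the observation is written. */
   if dest='BOS' then output boston;
   else if dest='DFW' then output dallas;
run;

/* Boston should list five variables, Dallas only orig and price. */
proc contents data=boston; run;
proc contents data=dallas; run;
```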
Question : The following SAS program is submitted:
proc contents data=sasuser._all_ nods;run;
Which one of the following is produced as output?
1. the list of all data set names in the Sasuser library only 2. the descriptor portion of the data set named Sasuser._All_ 3. [...] 4. the list of data set names in the Sasuser library plus the descriptor portion of every data set in the Sasuser library
Explanation: You can use the CONTENTS procedure to create SAS output that describes the contents of a library. _ALL_ requests a listing of all files in the library, and NODS suppresses the printing of detailed information about each file in the output.
Without NODS, the output would also include the descriptor portion of each data set in the library.
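The difference NODS makes is easy to see side by side. The sketch below uses the WORK library and two hypothetical one-observation data sets rather than Sasuser, so it can be run without touching permanent data.

```sas
/* Two small, hypothetical data sets so that WORK has members to list. */
data work.one;  x = 1;   run;
data work.two;  y = 'a'; run;

/* With NODS: only the library directory (member names and types). */
proc contents data=work._all_ nods;
run;

/* Without NODS: the directory plus the descriptor portion of every data set. */
proc contents data=work._all_;
run;
```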
Question : After a SAS program is submitted, the following is written to the SAS log:
What changes should be made to the KEEP statement to correct the errors in the log?
1. keep product sales; 2. keep product, sales; 3. [...] 4. keep = (product sales);
Ans : 1
Exp : In a KEEP statement, the variable names are separated by blanks, with no commas, equal sign, or parentheses.
Question : Which SAS statement correctly uses formatted input to read the values in this order: Item (first field), UnitCost (second field), Quantity (third field)?
1. input @1 Item $9. +1 UnitCost comma6. @18 Quantity 3.;
2. input Item $9. @11 UnitCost comma6. @18 Quantity 3.;
3. [...] @18 Quantity 3.;
4. all of the above
Ans : 4
Exp : The default location of the column pointer control is column 1, so a column pointer control is optional for reading the first field. You can use the @n or +n pointer controls to specify the beginning column of the other fields. You can use the $w. informat to read the values for Item, the COMMAw.d informat for UnitCost, and the w.d informat for Quantity.
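The raw data for this question is not shown, so the following sketch assumes a hypothetical layout: Item in columns 1-9, UnitCost in columns 11-16, and Quantity in columns 18-20. It runs options 1 and 2 against the same records and compares the results.

```sas
/* Option 1: explicit @1, then +1 to skip the blank in column 10. */
data inventory_a;
   input @1 Item $9. +1 UnitCost comma6. @18 Quantity 3.;
   datalines;
chair     $45.00 112
desk      $1,250  27
;

/* Option 2: the pointer starts in column 1 by default; @11 jumps to UnitCost. */
data inventory_b;
   input Item $9. @11 UnitCost comma6. @18 Quantity 3.;
   datalines;
chair     $45.00 112
desk      $1,250  27
;

/* The two data sets should compare as equal. */
proc compare base=inventory_a compare=inventory_b;
run;
```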
Question :
Which raw data file requires the PAD option in the INFILE statement in order to correctly read the data using either column input or formatted input?
1. a 2. b 3. c 4. d
Ans : 1
Exp : Use the PAD option in the INFILE statement to read variable-length records that contain fixed-field data. The PAD option pads each record with blanks so that all data lines have the same length.
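The effect of PAD can be reproduced with a file whose records end at different columns. This is a minimal sketch, assuming a temporary file written by the program itself; the fileref raw and the values are hypothetical.

```sas
filename raw temp;   /* hypothetical temporary file */

/* Write variable-length records: the second one ends at column 12. */
data _null_;
   file raw;
   put 'Riley     1132';
   put 'Henderson 98';
   put 'Lee       1015';
run;

/* PAD fills each record with blanks up to the record length, so the
   score field in columns 11-14 can be read even when the line is short. */
data scores;
   infile raw pad;
   input name $ 1-9 score 11-14;
run;

proc print data=scores;
run;
```

Without PAD, SAS would reach the end of the short record while reading columns 11-14 and move to the next line to look for the value.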
Column input is useful for reading standard values only.
Column input enables you to read standard data values that are aligned in columns in the data records. Specify the variable name, followed by a dollar sign ($) if it is a character variable, and specify the columns in which the data values are located in each record:
data scores;
   infile datalines truncover;
   input name $ 1-12 score2 17-20 score1 27-30;
   datalines;
Riley           1132       987
Henderson       1015      1102
;
Note: Use the TRUNCOVER option in the INFILE statement to ensure that SAS handles data values of varying lengths appropriately.
To use column input, data values must be:
- in the same field on all the input lines
- in standard numeric or character form.
Note: You cannot use an informat with column input.
Features of column input include the following:
- Character values can contain embedded blanks.
- Character values can be from 1 to 32,767 characters long.
- Placeholders, such as a single period (.), are not required for missing data.
- Input values can be read in any order, regardless of their position in the record.
- Values or parts of values can be reread.
- Both leading and trailing blanks within the field are ignored.
- Values do not need to be separated by blanks or other delimiters.
Formatted input combines the flexibility of using informats with many of the features of column input. By using formatted input, you can read nonstandard data for which SAS requires additional instructions. Formatted input is typically used with pointer controls that enable you to control the position of the input pointer in the input buffer when you read data. The INPUT statement in the following DATA step uses formatted input and pointer controls. Note that $12. and COMMA5. are informats and +4 and +6 are column pointer controls.
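The DATA step that this paragraph refers to is not reproduced here. A sketch consistent with its description (informats $12. and COMMA5., pointer controls +4 and +6) might look like the following; the data set name, variable names, and values are illustrative assumptions.

```sas
data scores;
   /* name: columns 1-12; +4 moves to column 17; score1: columns 17-21;
      +6 moves to column 28; score2: columns 28-32. */
   input name $12. +4 score1 comma5. +6 score2 comma5.;
   datalines;
Riley           1,132      1,187
Henderson       1,015      1,102
;

proc print data=scores;
run;
```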
Question : The raw data file referenced by the fileref Employee contains data that is
1. arranged in fixed fields 2. free-format 3. [...] 4. arranged in columns
Ans : 2
Exp : The raw data file contains data that is free-format, meaning that the data is not arranged in columns or fixed fields.
Free-Format Data
External files can contain raw data that is free-format; that is, the data is not arranged in fixed fields. The fields can be separated by blanks, or by some other delimiter, such as commas.
Using List Input
Free-format data can easily be read with list input because you do not need to specify column locations of the data. You simply list the variable names in the same order as the corresponding raw data fields. You must distinguish character variables from numeric variables by using the dollar ($) sign.
When characters other than blanks are used to separate the data values, you can specify the field delimiter by using the DLM= option in the INFILE statement.
You can also specify a range of variables in the INPUT statement when the variable values in the raw data file are sequential and are separated by blanks (or by some other delimiter). This is especially useful if your data contains similar variables, such as the answers to a questionnaire.
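As an illustration of DLM= together with a variable range, here is a minimal sketch that assumes hypothetical comma-separated questionnaire data read in-stream.

```sas
data survey;
   /* DLM= names the comma as the field delimiter. */
   infile datalines dlm=',';
   /* Ques1-Ques5 is shorthand for Ques1 Ques2 Ques3 Ques4 Ques5. */
   input Name $ Ques1-Ques5;
   datalines;
Ann,1,3,5,2,4
Raj,2,2,4,5,1
;

proc print data=survey;
run;
```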
In its simplest form, list input places several limitations on the types of data that can be read.
Reading Missing Values
If your data contains missing values at the end of a record, you can use the INFILE statement with the MISSOVER option to prevent SAS from going to the next record to find the missing values.
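A short sketch of MISSOVER with list input, assuming hypothetical records in which trailing values are absent:

```sas
data phones;
   /* MISSOVER sets the remaining variables to missing instead of
      reading them from the next record. */
   infile datalines missover;
   input Name $ Home Work Cell;
   datalines;
Ann 5551 5552 5553
Raj 5554
Lee 5555 5556
;

proc print data=phones;
run;
```

Here Raj should have missing values for Work and Cell, and Lee for Cell, with one observation per input line.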
If your data contains missing values at the beginning or in the middle of a record, you might be able to use the DSD option in the INFILE statement to correctly read the raw data. The DSD option sets the default delimiter to a comma and treats two consecutive delimiters as a missing value.
If the data uses multiple delimiters or a single delimiter other than a comma, you can use both the DSD option and the DLM= option in the INFILE statement.
The DSD option can also be used to read raw data when there is a missing value at the beginning of a record, as long as a delimiter precedes the first value in the record.
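The three DSD cases described above (a missing value in the middle, a missing value at the start, and DSD combined with DLM=) can be sketched as follows. The data set names and values are hypothetical.

```sas
/* DSD alone: comma delimiter, two consecutive commas mean a missing value.
   The second record starts with a delimiter, so Name is missing. */
data survey_dsd;
   infile datalines dsd;
   input Name $ Q1 Q2 Q3;
   datalines;
Ann,1,,3
,2,4,5
;

/* DSD with DLM=: same treatment of consecutive delimiters, but with a colon. */
data survey_colon;
   infile datalines dsd dlm=':';
   input Name $ Q1 Q2 Q3;
   datalines;
Raj:2::5
;

proc print data=survey_dsd;   run;
proc print data=survey_colon; run;
```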
Question : Which input style should be used to read the values in the raw data file that is referenced by the fileref Employee?
1. column 2. formatted 3. list 4. mixed
Ans : 3
Exp : List input should be used to read data that is free-format because you do not need to specify the column locations of the data.
Question : Which SAS program was used to create the raw data file hadoopexam from the SAS data set Work.Scores?
1. data _null_; set work.scores; file 'c:\data\hadoopexam' dlm=','; put name highscore team; run;
2. data _null_; set work.scores; file 'c:\data\hadoopexam' dlm=' '; put name highscore team; run;
3. data _null_; set work.scores; file 'c:\data\hadoopexam' dsd; put name highscore team; run;
4. data _null_; set work.scores; file 'c:\data\hadoopexam'; put name highscore team; run;
Ans : 3
Exp : You can use the DSD option in the FILE statement to specify that data values containing commas should be enclosed in quotation marks. The DSD option uses a comma as the delimiter by default.
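To see what DSD does on output, the sketch below writes a hypothetical Work.Scores to a temporary file instead of the path in the question, then echoes the file to the log. The variable names come from the question; the values and the fileref out are assumptions.

```sas
/* Hypothetical contents for Work.Scores; one name contains a comma. */
data work.scores;
   length name $ 16 team $ 8;
   name = 'Smith, Jo'; highscore = 2310; team = 'red';  output;
   name = 'Lee';       highscore = 1875; team = 'blue'; output;
run;

filename out temp;   /* temporary file in place of 'c:\data\hadoopexam' */

/* DSD on the FILE statement: comma-delimited output, with values that
   contain the delimiter enclosed in quotation marks. */
data _null_;
   set work.scores;
   file out dsd;
   put name highscore team;
run;

/* Echo the file to the log to inspect the result. */
data _null_;
   infile out;
   input;
   put _infile_;
run;
```

The first record should be written with the name in quotation marks, because it contains a comma.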
SAS does not properly recognize empty values for delimited data unless you use the dsd option. You need to use the dsd option on the infile statement if two consecutive delimiters are used to indicate missing values (e.g., two consecutive commas, two consecutive tabs). Below, we read the exact same file again, except that we use the dsd option.
DATA cars2;
   LENGTH make $ 20;
   INFILE 'readdsd.txt' DELIMITER=',' DSD;
   INPUT make mpg weight price;
RUN;
PROC PRINT DATA=cars2;
RUN;
Question :
Which SAS statement reads the raw data values in order and assigns them to the variables shown below? Variables: FirstName (character), LastName (character), Age (numeric), School (character), Class (numeric)
1. input FirstName $ LastName $ Age School $ Class;
2. input FirstName LastName Age School Class;
3. [...] School $ 17-19 Class 21;
4. input FirstName 1-4 LastName 6-12 Age 14-15 School 17-19 Class 21;
Ans : 1
Exp : Because the data is free-format, list input is used to read the values. With list input, you simply name each variable and identify its type.
Question :
Which SAS statement should be used to read the raw data file that is referenced by the fileref Hadoopexamsale?
1. infile hadoopexamsale; 2. infile hadoopexamsale ':'; 3. [...] 4. infile hadoopexamsale dlm=':';
DLM= The dlm= option can be used to specify the delimiter that separates the variables in your raw data file. For example, dlm=',' indicates that a comma is the delimiter (e.g., a comma-separated file, .csv file). Or, dlm='09'x indicates that tabs are used to separate your variables (e.g., a tab-separated file).
DSD The dsd option has 2 functions. First, it recognizes two consecutive delimiters as a missing value. For example, if your file contained the line 20,30,,50, SAS would treat this as 20 30 50 without the dsd option, but with the dsd option SAS treats it as 20 30 . 50, which is probably what you intended. Second, it allows you to include the delimiter within quoted strings. For example, you would want to use the dsd option if you had a comma-separated file and your data included values like "George Bush, Jr.". With the dsd option, SAS will recognize that the comma in "George Bush, Jr." is part of the name, and not a separator indicating a new variable.
FIRSTOBS= This option tells SAS on which line you want it to start reading your raw data file. If the first record(s) contain header information such as variable names, then set firstobs=n, where n is the record number where the data actually begin. For example, if you are reading a comma-separated file or a tab-separated file that has the variable names on the first line, then use firstobs=2 to tell SAS to begin reading at the second line (so it will ignore the first line with the names of the variables).
MISSOVER This option prevents SAS from going to a new input line if it does not find values for all of the variables in the current line of data. For example, you may be reading a space-delimited file that is supposed to have 10 values per line, but one of the lines has only 9 values. Without the missover option, SAS will look for the 10th value on the next line of data. If your data is supposed to have only one observation for each line of raw data, then this could cause errors throughout the rest of your data file. If you have a raw data file that has one record per line, this option is a prudent way to keep such errors from cascading through the rest of your data file.
OBS= Indicates which line in your raw data file should be treated as the last record to be read by SAS. This is a good option to use for testing your program. For example, you might use obs=100 to just read in the first 100 lines of data while you are testing your program. When you want to read the entire file, you can remove the obs= option entirely.
A typical infile statement for reading a comma delimited file that contains the variable names in the first line of data would be:
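The statement itself is not reproduced in the text. A sketch that combines the options described above might look like the following; the fileref csvfile, the variable names, and the sample records are assumptions, and the file is written by the program so the example is self-contained.

```sas
filename csvfile temp;   /* hypothetical fileref; substitute the real path */

/* Write a small comma-delimited file with variable names on line 1. */
data _null_;
   file csvfile;
   put 'make,mpg,weight,price';
   put 'AMC,22,2930,4099';
   put 'Buick,,3350,4816';
   put 'Datsun,35,2020';
run;

data cars;
   /* dlm=','   comma delimiter
      dsd       consecutive commas are a missing value
      missover  short lines give missing values instead of reading ahead
      firstobs=2 skips the header line */
   infile csvfile dlm=',' dsd missover firstobs=2;
   input make $ mpg weight price;
run;

proc print data=cars;
run;
```

While testing against a large file, obs= could be added to the same INFILE statement to stop after a given line.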
Question : Which SAS program correctly reads the data in the raw data file that is referenced by the fileref Hadoopexam?
1. data perm.contest; infile hadoopexam; input FirstName $ LastName $ Age School $ Class; run;
2. data perm.contest; infile hadoopexam; length LastName $ 11; input FirstName $ lastname $ Age School $ Class; run;
3. data perm.contest; infile hadoopexam; input FirstName $ lastname $ Age School $ Class; length LastName $ 11; run;
4. data perm.contest; infile hadoopexam; input FirstName $ LastName $ 11. Age School $ Class; run;
Ans : 2
Exp : The LENGTH statement extends the length of the character variable LastName so that it is large enough to accommodate the data. Variable attributes such as length are defined the first time a variable is named in a DATA step. The LENGTH statement should precede the INPUT statement so that the correct length is defined. In general, the length of a variable depends on whether the variable is numeric or character, how the variable was created, and whether a LENGTH or ATTRIB statement is present. Subject to the rules for assigning lengths, lengths that are assigned with the LENGTH statement can be changed in the ATTRIB statement and vice versa.
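The ordering rule can be tried with in-stream data. This sketch assumes hypothetical records in which one last name is 11 characters long; the variable names follow the question.

```sas
data contest;
   /* LENGTH comes first, so LastName is created with length 11 rather
      than the default of 8 that list input would otherwise assign. */
   length LastName $ 11;
   input FirstName $ LastName $ Age School $ Class;
   datalines;
Tara Fitzpatrick 14 NMS 8
Jon Lee 13 CMS 7
;

proc print data=contest;
run;
```

Because LastName is named first in the step, it also becomes the first variable in the data set. Moving the LENGTH statement after the INPUT statement would leave LastName at 8 characters and truncate the longer value.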
Question :
Which type of input should be used to read the values in the raw data file that is referenced by the fileref University?
Ans : 4 Exp : Notice that the values for School contain embedded blanks, and the values for Enrolled are nonstandard numeric values. Modified list input can be used to read the values that contain embedded blanks and nonstandard values.
Question : Which SAS statement correctly reads the values for Flavor and Quantity? Make sure the length of each variable can accommodate the values shown.
1. input Flavor & $9. Quantity : comma.;
2. input Flavor & $14. Quantity : comma.;
3. [...]
4. input Flavor $14. Quantity : comma.;
Ans : 2
Exp : The INPUT statement uses list input with format modifiers and informats to read the values for each variable. The ampersand (&) modifier enables you to read character values that contain single embedded blanks. The colon (:) modifier enables you to read nonstandard data values and character values that are longer than eight characters, but which contain no embedded blanks.
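The raw data for this question is not shown. The sketch below assumes hypothetical records in which the longest flavor is 14 characters and each flavor is followed by at least two blanks, which the ampersand modifier requires to mark the end of a value.

```sas
data icecream;
   /* & $14. : read up to two consecutive blanks, length 14
      : comma. : read up to the next blank and strip the comma */
   input Flavor & $14. Quantity : comma.;
   datalines;
Chocolate Chip  10,453
Vanilla  12,187
Mint  1,250
;

proc print data=icecream;
run;
```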
Question : Which SAS statement correctly reads the raw data values in order and assigns them to these corresponding variables: Year (numeric), School (character), Enrolled (numeric)? 1. input Year School & $27. Enrolled : comma.;
2. input Year 1-4 School & $27. Enrolled : comma.;
4. all of the above Ans : 4 Exp : The values for Year can be read with column, formatted, or list input. However, the values for School and Enrolled are free-format data that contain embedded blanks or nonstandard values. Therefore, these last two variables must be read with modified list input.
Question : You have written a SAS program that reads a raw data file. After running the DATA step, you know that 5 of the records may contain errors. What will the value of the automatic variable _ERROR_ be once the DATA step has completed?