SAS Certified Specialist: Base Programming using SAS 9.4 (A00-231) Exam
Two types of SAS statements
1. Statements that are used in DATA and PROC steps 2. Statements that are global in scope and can be used anywhere in a SAS program.
You can use the following comparison operators to express a condition in the WHERE statement:
= or eq ^= or ne > or gt < or lt >= or ge <= or le
Windows, UNIX Environment
A group of SAS files that are stored in the same directory. Other files can be stored in the directory, but only the files that have SAS file extensions are recognized as part of the SAS library.
Program Data Vector (PDV)
A logical area in memory where SAS builds a data set, one observation at a time. When a program executes, SAS reads data values or creates them by executing SAS language statements. The data values are assigned to the appropriate variables in the PDV. From here, SAS writes the values to a SAS data set as a single observation.
Sasuser
A permanent library that contains SAS files in the Profile catalog and that stores your personal settings. This is also a convenient place to store your own files.
Sashelp library
A permanent library that contains sample data and other files that control how SAS works at your site. This is a Read-Only library.
Step
A sequence of SAS statements
Specify an engine
A set of internal instructions that SAS uses for writing to and reading from files in a library.
proc sort data=cert.admit out=work.wgtadmit; by descending weight age; run; proc print data=work.wgtadmit; var weight age height fee; where age>30; run;
Adding the DESCENDING option to the BY statement sorts observations in ascending order of age within descending order of weight. Notice that DESCENDING applies only to the variable Weight.
Missing numeric variables
Are represented by periods.
Semantic Error: Invalid Option
An ________ _________ error occurs when you specify an option that is not valid in a particular statement. When a SAS statement that contains an invalid option is submitted, a message appears in the SAS log indicating that the option is not valid or not recognized.
PROC step
Analyzes data, produces output, or manages SAS files.
Permanent SAS libraries
Are available to you during subsequent SAS sessions.
Generally, variable attributes such as length and type
Are determined the first time a variable is encountered.
Extended Attributes
Are user-defined metadata that is defined for a data set or for a variable (column). Are represented as name-value pairs.
Data Set Variables
As the SET statement compiles, a slot is added to the PDV for each variable in the new data set.
View Descriptor Information Using the Varnum Option
By default, PROC CONTENTS lists variables alphabetically. To list variable names in the order of their logical position (or creation order) in the data set, specify the VARNUM option in PROC CONTENTS. proc contents data=cert.amounts varnum; run;
VAR statement
By default, PROC PRINT lists all the variables in a data set. You can select variables and control the order in which they appear by using a _____ statement. VAR variable(s);
Using Permanently Assigned Labels
Can be assigned in the DATA step. These labels are saved with the data set, and they can be reused by procedures that reference the data set.
DROP=
Can be used to exclude variables from a specific output data set.
Use an Assignment Statement to
Clean invalid data when it is identified.
SAS Log
Collects messages about the processing of SAS programs and about any errors that occur. One of these is generated of the processing activities and the results of the processing each time a step is executed.
DATA step
Creates or modifies data. Can be used to do the following: - Put your data into a SAS data set - Compute values - Check for and correct errors in your data - Produce new SAS data sets by subsetting, supersetting, merging, and updating existing data sets.
FREQ
Derives the frequency
_ERROR_
Description: Initialized to 0, set to 1 when an error occurs. Debugging Use: Displays debugging messages when an error occurs.
_N_
Description: The number of times the DATA step iterated. Debugging Use: Displays debugging messages for a specified number of iterations of the DATA step.
= or eq
Equal to where name='Jones, C.';
TITLE1
Equivalent to TITLE
Missing Values
Every variable and observation in a SAS data set must have a value. If a data value is unknown for a particular observation, a _________ value is recorded in the SAS data set.
proc freq data=sashelp.cars; table origin*DriveTrain; run;
Example code for frequency procedure
FILENAME statement syntax
FILENAME fileref 'filename';
FOOTNOTE1
FOOTNOTE
Creating a Basic Report
First reference the library where your SAS data set is stored. You can also set system options to control the appearance of your reports. Then you submit a PROC PRINT step. PROC PRINT DATA=SAS-data-set; RUN;
VALIDVARNAME=V7 statement
Forces SAS to convert spaces to underscores when it converts column names to variable names.
Name (Variable Attribute)
Identifies a variable. Must conform to SAS naming rules. Possible Values: Any valid SAS name. Examples: Policy Total
DESCENDING option
In the BY statement sorts observations in descending order. If you have more that one variable in the BY statement, ___________ applies only to the variable that immediately follows it.
UPCASE
In VALIDVARNAME= V7|UPCASE|ANY Specifies that the variable name follows the same rules as V7, except that the variable name is uppercase, as in earlier versions of SAS.
title1 'Heart Rates for Patients with'; title3 'Participation in Exercise Therapy'; footnote1 'Data from Treadmill Tests'; footnote3 '1st Quarter Admissions'; proc print data=cert.therapy; var swim walkjogrun aerclass; run; title2 'Report for March'; proc print data=cert.therapy; run;
In the example below, defining a title for line 2 in the second report automatically cancels title line 3.
Properties information
Includes the variable's name, type, length, format, informat, and label.
libref
Is 1-8 characters long, begins with a letter or underscore, and contains only letters, numbers, or underscores.
SAS Data Set
Is a data file that is formatted in a way that SAS can understand. A file that consists of two parts: - A descriptor portion - A data portion
'filename'
Is the fully qualified name or location of the file.
Temporary SAS libraries
Last only for the current SAS session.
< or lt
Less than where partno lt "BG05";
<= or le
Less than or equal to where pulse le 85;
<libref.>SAS-data-set
Names one SAS data set to process.
^= or ne
Not equal to where temp ne 212;
Data errors
Occur when data values are not appropriate for the SAS statements that are specified in a program. SAS detects data errors during program execution.
By default, a simple DATA step iterates
Once for each observation that is being created.
OR (|)
Or, either. If either expression is true, then the compound expression is true.
<REPLACE>
Overwrites an existing SAS data set.
REPLACE option
Overwrites an existing SAS data set.
PROC IMPORT Syntax
PROC IMPORT DATAFILE="filename" | TABLE="tablename" OUT=<libref.SAS-data-set><SAS-data-set-options> <DBMS=identifier><REPLACE>; Example: proc import datafile='/home/u61338413/sasuser.v94/cert/car/car.xls' out=outdata dbms=xls replace; sheet="car"; getnames=yes; run;
Filerefs
Perform the same function as librefs: they temporarily point to a storage location for data. However, librefs reference SAS libraries, whereas __________ reference external files.
Default value for a missing Numeric Value
Period( . )
PROC PRINT
Prints observations in a SAS data set using some or all of the variables.
IMPORT procedure
Reads data from an external data source and writes it to a SAS data set. You can import structured and unstructured data using PROC _________. When you run this, it reads the input file and writes the data to the specified SAS data set. By default, the __________ procedure expects the variable names to appear in the first row. The procedure scans the first 20 rows to count the variables, and it attempts to determine the correct informat and format for each variable.
Label
Refers to a descriptive _________ up to 256 characters long. A variable label, which can be printed by some SAS procedures, is useful in report writing.
Length
Refers to the number of bytes used to store each of the variable's values in a SAS data set. Character variables can be up to 32,767 bytes long. All numeric variables have a default length of 8 bytes. Numeric values are stored as floating-point numbers in 8 bytes of storage. Possible Values: 2 to 8 bytes (Numeric), 1 to 32,767 bytes (Character)
PROC SORT
Replaces the original data set with a data set that is sorted by what the sort is specified by.
<libref.>_ALL_
Requests a listing of all files in the library. (Use a period (.) to append _ALL_ to the libref.)
CONTAINS operator
Selects observations that include the specified substring. The symbol for the ____________ operator is ?. You can use either the ___________ keyword or the symbol.
To cancel all previous titles or footnotes
Specify a null TITLE or FOOTNOTE statement. A null TITLE or FOOTNOTE statement does not contain any number or text and cancels all footnotes and titles that are in effect.
FIRSTOBS=
Starts processing at a specific observation.
If you do not specify a title the default title is:
The SAS System
When the compilation phase is complete
The descriptor portion of the new data set is created.
End-Of-File Marker
The execution phase continues in this manner until the end-of-file marker is reached in the input data file. When there are no more records in the input data file to be read, the data portion of the new data set is complete and the DATA step stops.
Temporary Variables
The temporary variables _N_ and _ERROR_ can be helpful when you debug a DATA step.
Input for a PROC (procedure) step
This is usually a SAS data set
Something that might cause unexpected results
Throughout SAS, using the name literal syntax with SAS member names that exceed the 32-byte limit or have excessive embedded quotation marks.
OBS=MAX in the OPTIONS statement.
To reset the number of the last observation to process. Code: options obs=max; This instructs any subsequent SAS programs in the SAS session to process through the last observation in the data set that is being read.
PROC SORT
To sort your data before using the PRINT procedure to create reports from the data.
View The Descriptor Information For Only a Specific Data Set
Use the PROC CONTENTS step. The following example lists the descriptor information for Cert.Amounts including an alphabetic list of the variables in the data set. proc contents data=cert.amounts; run;
CONTENTS procedure
Use this to display the descriptor portion of the Work.BootSales data set.
Specify DBMS=XLSX
When importing an Excel workbook
variable(s)
_____ in ID variable(s); Specifies one or more variables to print whose value is used instead of the observation number at the beginning of each row of the report.
Variables (Columns)
____________(Also called columns) in the data set are collections of values that describe a particular characteristic.
Observations (Rows)
_____________(Also called rows) in the data set are collections of data values that usually relate to a single object.
Name and create a new SAS data set
data work.stress; DATA statement
Reference an Excel workbook file
libname cert xlsx 'C:\Users\Student1\cert\exercise.xlsx'; Statement: SAS/ACCESS LIBNAME statement
Referencing an excel workbook
libname certxl XLSX 'C:\Users\Student1\cert\exercise.xlsx';
Write out the contents of the SAS library
proc contents data=cert._all_; Statement: PROC CONTENTS
Code for CONTENTS Procedure
proc contents data=work.update; run;
Execute the DATA step
run; RUN statement
Execute the PROC CONTENTS statement
run; RUN statement
Execute the PROC PRINT statement
run; RUN statement
Read in an Excel worksheet (as the input data for the new SAS data set)
set cert.ActLevel; SET statement
DATA=
specifies the data set to be read.
Code using the CONTAINS operator
where firstname CONTAINS 'Jon'; where firstname ? 'Jon';
To read the Excel workbook file, SAS must receive the following information in the DATA step:
- A libref to reference the Excel workbook to be read - The name of the Excel worksheet that is to be read
Common DBMS identifiers included with Base SAS:
- CSV - JMP - TAB
Can use PROC steps to do the following
- Create a report that lists the data - Analyze data - Create a summary report - Produce plots and charts
What SAS does when there is an error/are errors
- Displays the word ERROR - Identifies the possible location of the error - Gives an explanation of the error
Creating Excel Worksheets
- If the Excel workbook does not exist, SAS creates it. - If the Excel worksheet within the workbook does not exist, SAS creates it. - If the Excel workbook and the worksheet already exist, then SAS overwrites the existing Excel workbook and worksheet.
Use the IMPORT procedure statements to do the following:
- Indicate how many rows SAS scans for variables to determine the type and length (GUESSINGROWS=) - Modify whether SAS extracts the variable names from the first row of the data set (GETNAMES=) - Indicate at which row SAS begins to read the data (DATAROW=)
Common Syntax Errors
- Missing RUN statement - Missing semicolon - Unbalanced quotation mark
Predefined SAS libraries
- Sashelp - Sasuser - Work
Two Common Types of Errors
- Syntax errors that occur when program statements do not conform to the rules of the SAS language - Semantic errors that occur when you specify a language element that is not valid for a particular usage.
Use the CONTENTS procedure to create SAS output that describes either of the following:
- The contents of a library - The descriptor information for an individual SAS data set.
Rules for SAS Names: Applies to the names of SAS data sets, variables, and libraries:
- They must begin with a letter (A-Z, either uppercase or lowercase) or an underscore (_). - They can continue with any combination of numbers, letters, or underscores. - They can be 1 to 32 characters long. - SAS library names (librefs) can be 1 to 8 characters long Examples: - Payroll - LABDATA2015_2018 - _EstimatedTaxPayments3
TITLE and FOOTNOTE Statements
- To make your report more meaningful and self-explanatory - You can assign up to 10 of each - Global statements
Two important rules for writing SAS programs
1. A SAS statement ends with a semicolon 2. A statement usually begins with a SAS keyword
Reference a Permanent SAS File
1. Assign a name (libref) to the SAS library in which the file is stored. 2. Use the libref as the first part of the two-level name (libref.filename) to reference the file within the library. 25
Correcting the Error: Missing Semicolon
1. Find the statement that lacks a semicolon. You can usually find it by looking at the underscored keywords in the error message and working backward. 2. Add a semicolon in the appropriate location. 3. Resubmit the corrected program. 4. Check the SAS log again to make sure there are no other errors.
Correcting the Error: Invalid Option
1. Remove or replace the invalid option, and check your statement syntax as needed. 2. Resubmit the corrected program. 3. Check the SAS log again to make sure there are no other errors.
title1; footnote1 'Data from Treadmill Tests'; footnote3 '1st Quarter Admissions'; proc print data=cert.stress; var resthr maxhr rechr; where tolerance='I'; run; footnote; proc means data=cert.stress; where tolerance='I'; var resthr maxhr; run;
1. Specifying the TITLE1 statement cancels all previous titles and cancels the default title The SAS System. The PRINT procedure and the MEANS procedure do not contain any titles in the output. 2. Specifying the FOOTNOTE1 and FOOTNOTE3 statements before the PRINT procedure results in footnotes in the PROC PRINT output. 3. Specifying a null FOOTNOTE statement cancels the previously defined footnotes that are in effect.
The flow of action in the execution phase of a simple DATA step is described as follows:
1. The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1. The _N_ automatic variable represents the number of times the DATA step has iterated. 2. SAS sets the newly created program variables to missing in the program data vector (PDV). 3. SAS reads an observation from a SAS data set directly into the PDV. You can use MERGE, SET, MODIFY, or UPDATE statement to read a record. 4. SAS executes any subsequent programming statements sequentially and updates the PDV. 5. When SAS executes the last statement in the DATA step, all values (except temporary variables and those marked to be dropped) are written as a single observation to the data set. Note that variables that you read with a SET, MERGE, MODIFY, or UPDATE statement are not reset to missing here. 6. SAS counts another iteration, reads the next observation, and executes the subsequent programming statements for the current observation. The DATA step terminates when SAS encounters the end-of-file in a SAS data set.
options firstobs=1 obs=10; proc print data=cert.heart; run;
1. The FIRSTOBS=1 option resets the FIRSTOBS= option to the default value. The default value reads the first observation in the data set. When you specify OBS=10 in the OPTIONS statement, SAS reads through the 10th observation. 2. A total of 10 observations are printed using the PROC PRINT step.
proc sort data=cert.admit out=work.activity; by actlevel; run; proc print data=work.activity; var age height weight fee; where age>30; sum fee; by actlevel; id actlevel; run;
1. The PROC SORT step sorts the permanent SAS data set Cert.Admit by the values of the variable ActLevel. The OUT= option creates the temporary SAS data set Activity. 2. The SUM statement produces column totals for the numeric variable Fee. 3. The BY statement specifies ActLevel as the variable that PROC PRINT uses to form BY groups. 4. The ID statement specifies ActLevel as the variable that replaces the Obs column and listed only once for each BY group and once for each sum. The BY lines are suppressed, and the values of the ID statement variable ActLevel identify each BY group.
Labels can be up to:
256 characters long
z/OS Environment
A specially formatted host data set in which only SAS files are stored.
Work
A temporary library for files that do not need to be saved from session to session.
Format
Affects how data values are written. ________s do not change the stored value in any way; they merely control how that value is displayed. SAS offers a variety of character, numeric, and date and time _______s. Possible Values: Any SAS ______ If no ______ is specified, the default ________ is BEST12 for a numeric variable, and $w. for a character variable.
Checking DATA Step Processing
After PROC IMPORT runs the DATA step to read the data, messages in the log verify that the data was read correctly.
Resubmitting a Revised Program
After correcting your program, you can resubmit it.
Sequentially Process Statements
After the SET statement, SAS executes the remaining statements sequentially and updates the values in the PDV. data work.update; set cert.invent; Total=instock+backord; SalePrice=(CostPerUnit*0.65)+CostPerUnit; format CostPerUnit SalePrice dollar6.2; run; 1. SAS processes the first assignment statement to create the new variable, Total. The values of InStock and BackOrd are added together to create a value for Total. See Figure 7.7 below for a visual representation of how the PDV processes the first assignment statement. 2. SAS processes the second assignment statement to create the new variable, SalePrice. The value of CostPerUnit is multiplied by 0.65, and the resulting value is added to the value of CostPerUnit to create a value for SalePrice.
AND (&)
And, both. If both expressions are true, then the compound expression is true.
Missing character values
Are represented by blanks.
Global Statements
Are used anywhere in a SAS program and stay in effect until changed or canceled, or until the SAS session ends. Common ones: TITLE, LIBNAME, OPTIONS, and FOOTNOTE.
PAGEBY statement
As another enhancement to your PROC PRINT report, you can request that each BY group be printed on a separate page by using the _____________. Prints each BY group on a separate page.
Modifying and Canceling Titles and Footnotes
As global statements, the TITLE and FOOTNOTE statements remain in effect until you modify the statements, cancel the statements, or end your SAS session.
End of the DATA step
At the end of the DATA step, several actions occur. First, the values in the PDV are written to the output data set as the first observation. data work.update; set cert.invent; Total=instock+backord; SalePrice=(CostPerUnit*0.65)+CostPerUnit; format CostPerUnit SalePrice dollar6.2; run;
Default value for a missing Character Value
Blank space
Logical Name (libref)
Can be assigned to a SAS library using the LIBNAME statement.
KEEP=
Can be used to specify which variables to write to a specific output data set.
Output from a DATA step
Can include a SAS data set or report.
Output from a PROC step
Can include a report or an updated SAS data set
Input for a DATA
Can include raw data or a SAS data set.
Redefining a title or footnote line
Cancels any higher numbered title or footnote lines, respectively.
data work.loan01; set cert.loan; if code='1' then type='variable'; else if code='2' then type='fixed'; else type='unknown'; put 'MY NOTE: The condition was met.'; run;
Character Strings Example: You can use a PUT statement to specify a character string to identify your message in the log. The character string must be enclosed in quotation marks.
data work.admit2; set cert.admit; where age>39; run; proc print data=work.admit2; run;
Code to print the results of: When the program is processed, it creates a new SAS data set, Work.Admit2, containing only those observations with age values greater than 39.
data work.newcalc; set cert.loan; if rate>0 then Interest=amount*(rate/12); else put 'DATA ERROR: ' rate= _n_ = ; run;
Conditional Processing Example: You can use a PUT statement with conditional processing (that is, with IF-THEN/ELSE statements) to flag program errors or data that is out of range. In the example below, the PUT statement is used to flag any missing or zero values for the variable Rate.
SAS Program
Consists of a sequence of steps. Can be any combination of DATA or PROC steps.
SAS Library
Contains one or more files that are defined, recognized, and accessible by SAS, and that are referenced and stored as a unit. Highest level of organization for information within SAS. Is a collection of one or more SAS files, including SAS data sets, that are referenced and stored as a unit.
data work.loan01; set cert.loan; if code='1' then type='variable'; else if code='2' then type='fixed'; else type='unknown'; put 'MY NOTE: Invalid Value: ' code= type=; run;
Data Set Variables Example: You can use a PUT statement to specify one or more data set variables to be examined for that iteration of the DATA step. Note: When you specify a variable in the PUT statement, only its value is written to the log. To write both the variable name and its value to the log, add an equal sign (=) to the variable name.
NLEVELS
Displays a table that provides the number of distinct values for each variable that is named in the TABLES statement.
SORT procedure
Does the following: • Rearranges the observations in a SAS data set • Creates a new SAS data set that contains the rearranged observations • Replaces the original SAS data set by default • Can sort on multiple variables • Can sort in ascending or descending order • Treats missing values as the smallest possible values
where age<=55 and pulse>75; where area='A' or region='S'; where ID>'1050' and state='NC';
Example of a WHERE statement You can use compound expressions like these in your WHERE statements.
> or gt
Greater than where income>20000;
>= or ge
Greater than or equal to where id>='1543';
Delimited
Having fixed boundaries or limits.
Type (Variable Attribute)
Identifies a variable as numeric or character. Character variables can contain any values. Numeric variables can contain only numeric values (the numerals 0 through 9, +, -, ., and E for scientific notation). Possible Values: Numeric or character Example: Char Num Char
ID statement
Identifies observations using variable values, such as an identification number, instead of observation numbers. ID variable(s);
OUT=
Identifies the output SAS data set
OUT=<libref.>SAS-data-set
Identifies the output SAS data set with either a one or two-level SAS name (library and member name).
COMPATIBLE
In VALIDMEMNAME= COMPATIBLE | EXTEND Specifies that a SAS data set name must follow these rules: - The length of the names can be up to 32 characters long. - Names must begin with a letter of the Latin alphabet (A- Z, a - z) or an underscore. Subsequent characters can be letters of the Latin alphabet, numerals, or underscores. - Names cannot contain blanks or special characters except for an underscore - Names can contain mixed-case letters. SAS internally converts the member name to uppercase. Therefore, you cannot use the same member name with a different combination of uppercase and lowercase letters to represent different variables.
EXTEND
In VALIDMEMNAME= COMPATIBLE | EXTEND specifies that the data set name must follow these rules: - Names can include national characters. - The name can include special characters, except for the / \ * ? " < > |: - characters. - The name must contain at least one character. - The length of the name can be up to 32 bytes. - Null bytes are not allowed. - Names cannot begin with a blank or a '.' ( period). - Leading and trailing blanks are deleted when the member is created. - Names can contain mixed-case letters. SAS internally converts the member name to uppercase. Therefore, you cannot use the same member name with a different combination of uppercase and lowercase letters to represent different variables.
ANY
In VALIDVARNAME= V7|UPCASE|ANY Specifies that SAS variable names must follow these rules: - The name can begin with or contain any characters, including blanks, national characters, special characters, and multi-byte characters. - The name can be up to 32 bytes long. - The name cannot contain any null bytes. - Leading blanks are preserved, but trailing blanks are ignored. - The name must contain at least one character. A name with all blanks is not permitted. - A variable name can contain mixed-case letters. SAS stores and writes the variable name in the same case that is used in the first reference to the variable. However, when SAS processes a variable name, SAS internally converts it to uppercase. Therefore, you cannot use the same variable name with a different combination of uppercase and lowercase letters to represent different variables.
V7
In VALIDVARNAME= V7|UPCASE|ANY Specifies that variable names must follow these rules: - SAS variable names can be up to 32 characters long - The first character must begin with a letter of the Latin alphabet (A-Z, either uppercase or lowercase) or an underscore - Subsequent characters can be letters of the Latin alphabet, numerals, or underscores - Trailing blanks are ignored. The variable name alignment is left-justified - A variable name cannot contain blanks or special characters except for an underscore - A variable name can contain mixed-case letter. SAS stores and writes the variable name in the same case that is used in the first reference of the variable. - Do not assign variables the names of special SAS automatic variables or variable list names to variables.
proc print data=cert.admit; run;
In order for the PRINT procedure to read cert.admit, you specify the two-level name of the file via this code.
proc print data=cert.reps; id idnum lastname; run;
In the following example, the OBS column in the output is replaced with the variable values for IDnum and LastName.
libname excelout xlsx 'C:\Users\Student1\Cert\newExcel.xlsx'; data excelout.HighStress; set cert.stress; run;
In the following example, you use the SAS/ACCESS LIBNAME statement and the DATA step to create an Excel worksheet. The SAS/ACCESS LIBNAME statement specifies the name of the new Excel file as newExcel.xlsx. The DATA step reads in Cert.Stress and then creates the Excel worksheet HighStress in the newExcel.xlsx workbook.
libname cert 'C:\Users\Student1\Cert'; proc print data=cert.therapy; run;
In the program below, the PROC PRINT statement invokes the PRINT procedure and specifies the data set Therapy in the SAS library to which the libref Cert has been assigned. Notice the layout of the resulting report below. These are the default behaviors: • All observations and variables in the data set are printed. • A column for observation numbers appears on the far left. • Variables and observations appear in the order in which they occur in the data set.
DATA statement
Indicates the beginning of the DATA step and names the SAS data set to be created.
Descriptor Portion (Of SAS Data Set)
Information that SAS creates and maintains about each SAS data set, including data set attributes and variable attributes.
fileref
Is a name that you associate with an external file. The name must be one to eight characters long, begin with a letter or underscore, and contain only letters, numbers, or underscores.
SAS name literal
Is a name token that is expressed as a string within quotation marks, followed by the uppercase or lowercase letter n. Tells SAS to allow the special character ($) in the data set name. libname certxl xlsx 'C:\Users\Student1\cert\stock.xlsx'; data work.bstock; set certxl.'boots stock'n; run;
n
Is a positive integer. For FIRSTOBS=, _____ specifies the number of the first observation to process. For OBS=, n specifies the number of the last observation to process.
SAS Index
Is a separate file that you can create for a SAS data file in order to provide direct access to a specific observation. The _______ file has the same name as its data file and a member type of ________. _________ can provide faster access to specific observations, particularly when you have a large data set. The purpose of SAS _________ is to optimize WHERE expressions and to facilitate BY-group processing.
SAS Engine
Is a set of internal instructions that SAS uses for writing to and reading from files in a SAS library or a third-party database.
SAS Statement
Is a type of SAS language element that is used to perform a particular operation in a SAS program or to provide information to a SAS program. - Free Format (Can begin and end anywhere on a line) - One of these can continue over several lines - Several of these can be on the same line. - Blank or special characters separate words in this thing - Can be specified in uppercase or lowercase - In these, text enclosed in quotation marks are case sensitive
Extended attribute descriptor information
Is defined by the user and includes the name of the attribute, the name of the variable, and the value of the attribute. The descriptor information also contains information about extended attributes (if defined in a data set).
RUN statement
Is not required between steps in a SAS program. However, it is best practice to use it because it can make the SAS program easier to read and the SAS log easier to understand when debugging.
Engine
Is the name of a library engine that is supported in your operating environment.
Best practice for title and footnote text
Make sure to match quotation marks that enclose the title or footnote text.
Logic Error
Occurs when the program statements execute, but produce incorrect results.
Enclose the label in:
Quotation Marks
Informat
Reads data values in certain forms into standard SAS values. ________s determine how data values are read into a SAS data set. You must use ________s to read numeric values that contain letters or other special characters. Possible values: Any SAS ________ The default ________ for numeric is w.d and for character is $w.
Syntax errors
Rules or grammar of programming language is not followed so compiler or interpreter doesn't understand. They generally prevent SAS from executing the step in which the error occurred.
When you submit a DATA step for execution
SAS checks the syntax of the SAS statements and compiles them. In this phase, SAS identifies the type and length of each new variable, and determines whether a variable type conversion is necessary for each subsequent reference to a variable.
Referencing an Excel Workbook in a DATA step
SET Statement Use the SET statement to indicate which worksheet in the Excel file you want to read. data work.stress; set certxl.ActivityLevels; run;
DATA and PROC statements
Signal the beginning of a new step. The beginning of a new step also implies the end of the previous step.
During the compilation phase, SAS can interpret:
Some syntax errors (such as the keyword DATA misspelled as DAAT).
Catalog
Special type of file. In libraries, they function much like subfolders for grouping other members.
<SAS-data-set-options>
Specifies SAS data set options.
BY-variable
Specifies a variable that the procedure uses to form BY groups. You can specify more than one variable, separated by blanks.
SAS-file-specification
Specifies an entire library or a specific SAS data set within a library. Can take one of the following forms: <libref.>SAS-data-set <libref.>_ALL_
DESCENDING
Specifies that the data set is sorted in descending order by the variable that immediately follows.
NOTSORTED
Specifies that the observations in the data set that have the same BY values are grouped together, but are not necessarily sorted in alphabetical or numeric order. This option applies to all of the variables in the BY statement. You can specify the __________ option anywhere within the BY statement. The Requirement for ordering or indexing observations according to the values of BY variables is suspended when you use the ___________ option.
SET statement
Specifies the SAS data set that you want to use as input data for your DATA step.
message
Specifies the message that you want to write to the SAS log. It can include character literals, variable names, formats, and pointer controls. PUTLOG 'message';
TABLE="tablename"
Specifies the name of the input DBMS table.
DATAFILE=
Specifies the path for the input file.
VALIDVARNAME= (System option)
Specifies the rules for valid SAS variable names that can be created and processed during a SAS session.
<DBMS=identifier>
Specifies the type of data to import
DBMS=
Specifies the type of data to import.
specification
Specifies what is written, how it is written, and where it is written. Examples: • a character string • one or more data set variables • the automatic variables _N_ and _ERROR_ • the automatic variable _ALL_ PUT specification(s);
View the Contents of an Entire Library
Specify the _ALL_ and NODS options in the PROC CONTENTS step. The _ALL_ option lists all files in the Cert library, and the NODS option suppresses the printing of detailed information about each specific file. proc contents data=cert._all_ nods; run; The following output displays a partial output of the contents of the Cert library. The _ALL_ option lists all files including indexes, views, and catalogs
OBS=
Stops processing at a specific observation.
NODS
Suppresses the printing of detailed information about each file when you specify _ALL_. (You can specify NODS only when you specify _ALL_.)
data cert.paris; set cert.laguardia; where dest='PAR' and (boarded=155 or boarded=146); label date='Departure Date'; run; proc print data=cert.paris label; var date dest boarded; run;
The DATA step creates the data set Cert.Paris and defines the label for the variable Date. Because the LABEL statement is inside the DATA step, the labels are written to the Cert.Paris data set and are available to the subsequent PRINT procedure.
End of the Execution Phase
The SAS log confirms that the input data file was read, and it displays the number of observations and variables in the data set.
proc print data=cert.insure; var name policy balancedue; where pctinsured < 100; sum balancedue; run;
The SUM statement in the following PROC PRINT step requests column totals for the variable BalanceDue.
Missing RUN statement
The _____ statement is necessary at the end of the last step. If the _____ statement is omitted from the last step, the program might not complete processing and might produce unexpected results.
Data Portion
The _______ ________ of a SAS data set is a collection of data values that are arranged in a rectangular table.
Descriptor Portion
The _______ portion of a SAS data set contains information about the data set, including the following: - the name of the data set - the date and time that the data set was created - the number of observations - the number of variables Contains information about the properties of each variable in the data set.
data work.update; set cert.invent; Total=instock+backord; SalePrice=(CostPerUnit*0.65)+CostPerUnit; format CostPerUnit SalePrice dollar6.2; run;
The assignment statement below creates two variables, Total and SalePrice. As the statement is compiled, the variable is added to the PDV. The attributes of the variable are determined by the expression in the statement. Because the expression contains an arithmetic operator and produces a numeric value, Total and SalePrice are defined as numeric variables and are assigned the default length of 8.
proc freq data=cert.pats; tables Gender Age; run;
The data set contains invalid characters for the variables Gender and Age. PROC FREQ displays the distinct values of variables and is therefore useful for finding invalid values in data. You can use PROC FREQ with the TABLES statement to produce a frequency table for specific variables.
data work.grades; set cert.class; Homework=Homework*2; AverageScore=MEAN(Score1 + Score2 + Score3 + Homework); if AverageScore<70; run;
The data set contains three test scores and homework grades for four students. The program below is designed to select students whose average score is below 70. Although the program produces incorrect results, there are no error messages in the log.
proc print data=cert.admit; var age height weight fee; run;
The following VAR statement specifies that only the variables Age, Height, Weight, and Fee be printed, in that order. The procedure output from the PROC PRINT step with the VAR statement lists only the values for those variables.
data work.test; set cert.loan01; if code='1' then type='variable'; else if code='2' then type='fixed'; else type='unknown'; if type='unknown' then put 'MY NOTE: invalid value: ' code=; run;
The following example illustrates how to use the PUT statement to write messages to the SAS log. If the value of the variable Code equals 1, then the program returns the value for Type as variable. If the value equals 2, then the return value for Type is fixed. Otherwise, the value of Type is returned as unknown. If Type contains the value unknown, then the PUT statement writes a message to the log.
proc sort data=cert.admit out=work.activity; by actlevel; run; proc print data=work.activity; var age height weight fee; where age>30; sum fee; by actlevel; id actlevel; pageby actlevel; run;
The following example uses the PAGEBY statement to print the BY groups for the variable ActLevel on separate pages. The BY groups are separated by horizontal lines in the HTML output.
proc sort data=cert.admit out=work.activity; by actlevel; run; proc print data=work.activity; var age height weight fee; where age>30; sum fee; by actlevel; run;
The following example uses the SUM statement and the BY statement to generate subtotals for each BY group and a sum of all of the subtotals of the Fee variable. 1. The PROC SORT step sorts the permanent SAS data set Cert.Admit by the values of the variable ActLevel. The OUT= option creates the temporary SAS data set Activity. 2. The SUM statement produces column totals for the numeric variable Fee. 3. The BY statement specifies ActLevel as the variable that PROC PRINT uses to form BY groups. In the output, the BY variable name and value appear before each BY group. The BY variable name and the subtotal appear at the end of each BY group.
data work.grades; set cert.class; Homework=Homework*2; AverageScore = MEAN(Score1, Score2, Score3, Homework); if AverageScore < 70; run; proc print data=work.grades; run;
The output of this code lists the names of students whose average score is below 70.
data empty full; set cert.usa; output full; run;
The program creates two temporary data sets, Empty and Full. The result of this DATA step is that the data set Empty is created but contains no observations, and the data set Full contains all of the observations from Cert.Usa.
data cert.drug1h(drop=placebo); set cert.cltrials(drop=triglyc uric); if placebo='YES'; run; proc print data=cert.drug1h; run;
This DATA step uses the variable Placebo to select observations. To drop Placebo from the new data set, the DROP= option must appear in the DATA statement. When used in the DATA statement, the DROP= option simply drops the variables from the new data set. However, they are still read from the original data set and are available within the DATA step.
options firstobs=10 obs=15; proc print data=clinic.heart; run;
This program processes only observations 10 through 15, for a total of 6 observations.
where (age<=55 and pulse>75) or area='A'; where age<=55 and (pulse>75 or area='A');
To control how compound expressions are evaluated, you can use parentheses (expressions in parentheses are evaluated first).
CONTENTS procedure usage
To display descriptor information.
Referencing Permanent SAS Data Sets
To do this in your SAS programs, use a two-level name consisting of the library name and the data set name: libref.dataset
Referencing Temporary SAS files
To do this, you can specify the default libref Work, a period, and the data set name. For example, the two-level name, Work.Test, references the SAS data set named Test that is stored in the ____ SAS library work. Alternatively, you can use a one-level name to reference a file in a _____ SAS library.
You can use PROC FREQ
To identify any variables that were not given an expected value.
SUM statement
To produce column totals for numeric variables, you can list the variables to be summed in a SUM statement in your PROC PRINT step. You can use the ________ statement to create a subtotal value for variables in the group.
How Long Librefs Remain in Effect
Until changed or canceled, or until the SAS session ends.
FILENAME statement
Use the __________ statement to point to the location of the external file that contains the data.
PUTLOG statement
Use this in the DATA step to write messages to the SAS log to help identify logic errors. Can be used to write to the SAS log in both batch and interactive modes. PUTLOG 'message';
When to Use VALIDMEMNAME= (System Option)
Use this system option when the characters in a SAS data set name contain one of the following: - international characters - characters supported by third-party databases - characters that are commonly used in a filename
GETNAMES=YES
Use this to generate variable names from the first row of data.
SHEET=
Use this to import specific worksheets from an Excel workbook.
VALIDMEMNAME= (System option)
Use this to specify rules for naming SAS data sets.
proc print data=cert.therapy label; label walkjogrun='Walk/Jog/Run'; run;
Using the LABEL Option in the PROC PRINT Statement. In the PROC PRINT step below, the variable name WalkJogRun is displayed with the label Walk/Jog/Run. Note that the LABEL option is in the PROC PRINT statement. If you omit the LABEL option in the PROC PRINT statement, PROC PRINT uses the name of the column heading, walkjogrun, even though you specified a value for the variable.
SAS/ACCESS software
Via this, you can use the LIBNAME statement to access data that is stored in a database management system (DBMS) file.
where-expression
WHERE where-expression; Specifies a condition for selecting observations. Can be any valid SAS expression.
data work.admit2; set cert.admit; where age>39; run;
When the program is processed, it creates a new SAS data set, Work.Admit2, containing only those observations with age values greater than 39. The DATA step creates a new data set and produces messages in the SAS log, but it does not create a report or other output.
When the IMPORT procedure reads a delimited file
When this happens, it generates a DATA step to import the data.
Unbalanced Quotation Marks
When you have this, SAS is often unable to detect the end of the statement in which it occurs.
where actlevel='LOW' or actlevel='MOD'; where fee=124.80 or fee=178.20;
When you test for multiple values of the same variable, you specify the variable name in each expression.
Define a library
You assign a library name to it and specify the location of the files, such as a directory path. Can define these using programming statements.
Specifying compound WHERE expressions
You can also use WHERE statements to select observations that meet multiple conditions. To link a sequence of expressions into compound expressions, you use logical operators.
WHERE statement
You can control which observations are printed by adding a __________ statement to your PROC PRINT step. There should be only one __________ statement in a step. If multiple ___________ statements are issued, only the last statement is processed. In the ____________ statement you can specify any variable in the SAS data set, not just the variables that are specified in the VAR statement. Works for both character and numeric variables. You can also use __________ statements to select observations that meet multiple conditions.
options firstobs=10 obs=15; proc print data=clinic.heart(firstobs=20 obs=30); run;
You can create the same output by specifying FIRSTOBS= and OBS= as data set options, as follows. The data set options override the system options for this instance only. To specify FIRSTOBS= or OBS= for this program only, you could omit the OPTIONS statement altogether and simply use the data set options.
BY statement
You can group variables by using this.
LIBNAME statement
You can include the ___________ statement with any SAS program so that the SAS library is assigned each time the program is submitted. You can use this to assign a libref to a database.
Iterations of the DATA step
You can see that the DATA step works like a loop, repetitively executing statements to read data values and create observations one by one. At the beginning of the second iteration, the value of _N_ is 2, and _ERROR_ is still 0. Each loop (or cycle of execution) is called an __________.
proc means data=cert.pats; var Age; run;
You can submit PROC MEANS to determine whether the age of all test subjects is within a reasonable range. Notice that the VAR statement is specified with that particular variable (Age) to get the statistical information, or range, of the data values.
Specify the observations to process from SAS data sets
You can use the FIRSTOBS= and OBS= options in an OPTIONS statement.
proc print data=cert.reps; id idnum lastname; var idnum sex jobcode salary; run;
You can use the ID and VAR statement together to control which variables are printed and in which order. If a variable in the ID statement also appears in the VAR statement, the output contains two columns for that variable. 1. The ID statement replaces the OBS column in the output with the IDnum and LastName variable values. 2. The VAR statement selects the variables that appear in the output and determines the order. The variable IDnum appeared in both the ID statement and the VAR statement. Therefore, IDnum appears twice in the output.
where actlevel in ('LOW','MOD'); where fee in (124.80,178.20);
You can use the IN operator as a convenient alternative (In a WHERE statement)
When the source of program is not apparent
You can use the PUT statement to examine variable values and to print your own message in the log.
Missing Semicolon
data work.admitfee; set cert.admit; run; proc print data=work.admitfee var id name actlevel fee; run; When you omit a ______, SAS reads the statement that lacks the semicolon (along with the following statement) as one long statement.
Filter the data to include only those participants whose Activity level is High
data work.stress; set certxl.ActivityLevels; where ActLevel='HIGH'; run;
Using the FIRSTOBS= option
firstobs=10; proc print data=cert.heart; run; 1. Use the OPTIONS statement to specify the FIRSTOBS= option. In this example, the FIRSTOBS=10 option enables SAS to read the 10th observation of the data set first and read through the last observation. 2. A total of 11 observations are printed using the PROC PRINT step.
Code for Creating Footnotes
footnote1 'Data from Treadmill Tests'; footnote3 '1st Quarter Admissions'; proc print data=cert.stress; var resthr maxhr rechr; where tolerance='I'; run;
Code for Reading a SAS Data Set
libname cert 'C:\Users\Student1\cert\'; libname Men50 'C:\Users\Student1\cert\Men50'; data Men50.males; set cert.admit; where sex='M' and age>50; run; The DATA step below reads all observations and variables from the existing data set Cert.Admit into the new data set Males. The DATA statement creates the permanent SAS data set Males, which is stored in the SAS library Men50. The SET statement reads the permanent SAS data set Cert.Admit and subsets the data using a WHERE statement. The new data set, Males, contains all males in Cert.Admit who are older than 50. You can add a PROC PRINT statement to this same example to see the output of Men50.Males. proc print data=Men50.males; title 'Men Over 50'; run;
Printing an Excel Worksheet as a SAS Data Set
libname certxl xlsx 'C:\Users\Student1\cert\stock.xlsx'; data work.bstock; set certxl.'boots stock'n; run; procprintdata=work.bstock; run; After using the DATA step to read in the Excel data and create a SAS data set, you can use PROC PRINT to produce a report that displays the data set values. In the following example, the PROC PRINT statement displays all the data values for the new data set, Work.Bstock.
Processing Middle Observations of a Data Set
options firstobs=10 obs=15; proc print data=cert.heart; run; 1. When you set FIRSTOBS=10 and OBS=15, the program processes only observations 10 through 15. 2. A total of six observations are printed using the PROC PRINT step.
Code that limits the number of rows to five records
options obs=5; proc import datafile="C:\Users\Student1\cert\boot.csv" dbms=csv out=shoes replace; getnames=no; run; (options obs=5 is the line of code of reference)
Code for Reading the Entire External File
options obs=max; proc import datafile="C:\Users\Student1\cert\boot.csv" dbms=csv out=shoes replace; getnames=no; run; To modify the PROC step to read the entire external file, restore the default value to the OBS= system option. To do this, set OBS=MAX and then resubmit the program.
Code for Importing a Space-Delimited File with a TXT Extension
options validvarname=v7; filename stdata 'C:\Users\Student1\cert\state_data.txt' 1recl=100; proc import datafile=stdata dbms=dlm out=states replace; delimiter=' '; getnames=yes; run; proc print data=states; run; Specify the fileref and the location of the file. Specify the LRECL= system option if the file has a fixed-length format. The LRECL= system option specifies the default logical record length to use when reading external files. Specify the input file and specify that it is a delimited file. The DBMS= option specifies the type of data to import. If the delimiter type is a character other than TAB or CSV, then the DBMS= option is DLM. The REPLACE option overwrites an existing SAS data set. The OUT= option identifies the output SAS data set. Specify a blank value for the DELIMITER statement. Set the GETNAMES= statement to YES to generate variable names from the first row of data.
Importing a Comma-Delimited File with a CSV Extension
options validvarname=v7; proc import datafile='C:\Users\Student1\cert\boot.csv' dbms=csv out=shoes replace; getnames=no; run; proc print data=work.shoes; run; Specify the input file. DATAFILE= specifies the input data file, and OUT= specifies the output data set. The DBMS= specifies the type of data to import. If the file type is CSV, then the DBMS= option is CSV. The REPLACE option overwrites an existing SAS data set. Set the GETNAMES= statement to NO to not use the first row of data as variable names.
Code for Importing an Excel File with an XLSX Extension
options validvarname=v7; proc import datafile='C:\Users\Student1\cert\boots.xlsx' dbms=xlsx out=work.bootsales replace; sheet=boot; getnames=yes; run; proc contents data=bootsales; run; proc print data=bootsales; run;
Code for Importing a Delimited File with a TXT extension
options validvarname=v7; proc import datafile='C:\Users\Student1\cert\delimiter.txt' dbms=dlm out=mydata replace; delimiter='&'; getnames=yes; run; proc print data=mydata; run; DATAFILE= Specifies the path for the input file. The DBMS= option specifies the type of data to import. If the delimiter is a character other than TAB or CSV, then the DBMS= option is DLM. The REPLACE option overwrites an existing SAS data set. The OUT= option identifies the output SAS data set. Specify an ampersand (&) for the DELIMITER statement. Set the GETNAMES= statement to YES to generate variable names from the first row of data.
Code for Using the SET statement to Specify Imported Data
proc import datafile="C:\certdata\boot.csv" out=shoes dbms=csv replace; getnames=no; run; data boots; set shoes; where var1='South America' OR var1='Canada'; run;
Importing a Tab-Delimited File
proc import datafile='C:\Users\Student1\cert\class.txt' dbms=tab out=class replace; delimiter='09'x; run; proc print data=class; run; Specify the input file. DATAFILE= specifies the input data file, and OUT= specifies the output data set. DBMS= specifies the type of data to import. If the file type is TXT, then the DBMS= option is TAB. The REPLACE option overwrites an existing SAS data set. GETNAMES= statement defaults to YES. Specify the delimiter. On an ASCII platform, the hexadecimal representation of a tab is '09'x. On an EBCDIC platform, the hexadecimal representation of a tab is a '05'x.
Using a Single LABEL Statement to Assign Multiple Labels
proc print data=cert.admit label; var actlevel height weight; label actlevel='Activity Level' height='Height in Inches' weight='Weight in Pounds'; run; 1. Use the LABEL option with the PROC PRINT statement. 2. A single LABEL statement assigns three labels to three different variables. Note that you do not need a semicolon at the end of your label until you are ready to close your LABEL statement. In this example, the semicolon is at the end of the label for Weight.
Code for Using Multiple LABEL statements
proc print data=cert.admit label; var age height; label age='Age of Patient'; label height='Height in Inches'; run; 1. Use the LABEL option with the PROC PRINT statement. If you omit the LABEL option in the PROC PRINT statement, PROC PRINT uses the variable name. 2. You can assign labels in separate LABEL statements. In this example, label the variable Age as Age of Patients. 3. This is the second LABEL statement in this example. Label the variable Height as Height in Inches.
Code - Using the WHERE Statement in PROC PRINT
proc print data=cert.admit; varageheightweightfee; where age>30; run; 1. The VAR statement selects the variables Age, Height, Weight, and Fee and displays them in the output in that order. 2. The WHERE statement selects only the observations for which the value of Age is greater than 30 and prints them in the output. The following output displays only the observations where the value of Age is greater than 30.
View the contents of a particular data set
proc print data=stress; PROC PRINT
Code to remove the noobs column
proc print data=work.example noobs; var age height weight fee; run; Specify the NOOBS option in the PROC PRINT statement.
Code for Printing the Data Set
proc print data=work.shoes; run; You can submit a PROC PRINT step to view the data. The following PROC PRINT step prints the Work.Shoes data set.
Display the data set with the PRINT procedure.
proc print data=work.update; run;
Code for Creating Titles
title1 'Heart Rates for Patients with:'; title3 'Increased Stress Tolerance Levels'; proc print data=cert.stress; var resthr maxhr rechr; where tolerance='I'; run;
When SAS detects an error in the execution phase, the following can occur, depending on the type of error:
• A note, warning, or error message is displayed in the SAS log. • The values that are stored in the PDV are displayed in the SAS log. • The processing of the step either continues or stops.
Rules to specify a condition based on the value of a character variable:
• Enclose the value in quotation marks. • Write the value with lowercase, uppercase, or mixed case letters exactly as it appears in the data set.
In addition to the PRINT procedure showing missing values, the following procedures can be used to detect invalid data:
• PROC FREQ • PROC MEANS
During the compilation phase, SAS creates the following items
• Program data vector (PDV) • Descriptor information
The results when an ID statement specifies the same variable as the BY statement:
• The Obs column is suppressed. • The ID or BY variable is printed in the left-most column. • Each ID or BY value is printed only at the start of each BY group and on the line that contains that group's subtotal.
To enhance your PROC PRINT by labeling columns:
• Use the LABEL statement to assign a descriptive label to a variable. • Use the LABEL option in the PROC PRINT statement to specify that the labels be displayed.
Errors that are detected during the compilation phase include these:
• misspelled keywords and data set names • unbalanced quotation marks invalid options
If it cannot interpret the error, SAS does the following:
• prints the word ERROR followed by an error message in the SAS log • compiles but does not execute the step where the error occurred, and prints the following message: NOTE: The SAS System stopped processing this step because of errors.
Along with data set variables and computed variables, the PDV contains these automatic variables:
• the _N_ variable, which counts the number of times the DATA step iterates. • the _ERROR_ variable, which signals the occurrence of an error caused by the data during execution. The value of _ERROR_ is 0 when there are no errors. When an error occurs, whether one error or multiple errors, the value is set to 1. The default value is 0.
Examples of Descriptor Portion (Of a SAS data set)
• the name of the data set and its member type • the date and time that the data set was created • the names, data types (character or numeric), and lengths of the variables
When PROC IMPORT reads raw data, SAS sets the value of each variable in the DATA step to missing at the beginning of each cycle of execution, with these exceptions:
• variables that are named in a RETAIN statement • variables that are created in a sum statement • automatic variables
