SAS Programming Topics
Renaming Variables
-DATA step match-merging overwrites values of the like-named variable in the first data set in which it appears with values of the like-named variable in subsequent data set -To prevent overwriting, you can rename variables by using the RENAME= data set option in the MERGE statement
NOFREQ, NOPERCENT, NOROW, NOCOL statements
-NOFREQ suppresses cell frequencies -NOPERCENT suppresses cell percentages -NOROW suppresses row percentages -NOCOL suppresses column percentages
Differences between DROP and KEEP and DROP= and KEEP=
-You cannot use the DROP and KEEP statements in SAS procedure steps. -The DROP and KEEP statements apply to all output data sets that are named in the DATA statement. To exclude variables from some data sets but not from others, use the DROP= and KEEP= data set options in the DATA statement
TABLES statement
-creates output tables for specified variables -To create a two-way table or n-way table, join the variables with an asterisk (*) in the TABLES statement in a PROC FREQ step
PROC SORT Rules
-default is ascending, missing values will appear at the top if ascending -to change to descending "descending var" applies to the variable immediately behind the descending
SAS programing structure
-when a SAS program is submitted for execution, SAS first validates the syntax and then compiles the statements. DATA and PROC statements signal a beginning of a new step -SAS log collects messages about the processing of SAS programs and about any errors that occur
Categorizing Variables
-you can use the IF-THEN statement to assign values to a new variable based off if a specific condition is met -when the DATA step executes, each IF statement is evaluated in order, even if the first condition is true. This wastes system resources and slows the processing of your program -Use the ELSE statement to assign a value of the If-Then is false
Format naming rules
A format name must be a valid SAS name, start with a $ if the format is character, must not end in a number, and have a maximum length of 32. Also, the name of a user-defined format cannot be the same as the name of a format that is supplied by SAS
FMTLIB statement
Adding the keyword FMTLIB to the PROC FORMAT statement displays a list of all the formats in your catalog, along with descriptions of their values
CONTAINS
CONTAINS - use contains to collect observations that contain a key word (like IN for SQL) Symbol: ?
out= option
IMPORT option that identifies the output SAS data set with either a one or tow-level SAS name. If the data set doesn't exist, it creates it
dbms= option
IMPORT option that specifies the type of data to import. Types: CSV, JMP, TAB (tab-delimited files)
Match Merging
Match-merging combines observations from two or more data sets into a single observation in a new data set according to the values of a common variable (Join) Example: data merged; merge a b; by num; run;
ODS output types
Microsoft Excel and Power Point, HTML, PDF, CSV, and RTF
PROC MEANS
PROC MEANS prints the n-count (number of non missing values), the mean, the standard deviation, and the minimum and maximum values of every numeric variable in a data set
How does SAS store dates
SAS stores date and time values as numeric values. You apply SAS formats to the data so that meaningful date and time values are displayed in reports
NOOBS
Suppresses observation numbers in a print statement
4 Common SAS Global Statements
TITLE, LIBNAME, OPTIONS, and FOOTNOTE
TITLE and FOOTNOTE statement
TITLE<n> 'text' and FOOTNOTE<n> 'text' statements specify the title ad footnote of the page
BODY= statement
The BODY= specification is one way to create an HTML file containing procedure output
BY statement
The BY statement specifies variables to use for categorizing observations
The CATX function
The CATX function enables you to concatenate character strings, remove leading and trailing blanks, and insert separators
The CEIL function
The CEIL function returns the smallest integer that is greater than or equal to the argument
CONTENTS= statement
The CONTENTS= specification creates the file toc.html in the C:\Users \Student1\cert\ directory. The table of contents file has links to each procedure output in the body file
DO WHILE and DO UNTIL
The DO WHILE and DO UNTIL statements enable you to execute DO loops based on whether a condition is true or false
The FLOOR
The FLOOR function returns the largest integer that is less than or equal to the argument
FRAME= statement
The FRAME= specification creates the file frame.html in the C:\Users \Student1\cert\ directory. The frame file integrates the table of contents and the body file
The INDEX function
The INDEX function enables you to search a character value for a specified string
INTCK function
The INTCK function returns the number of interval boundaries of a given kind that lie between two dates, times, or datetime values
INTNX function
The INTNX function applies multiples of a given interval to a date, time, or datetime value and returns the resulting value. You can use the INTNX function to identify past or future days, weeks, months, and so on
OUTPUT Statement
The OUTPUT statement writes statistics to a new SAS data set. By default, the default summary statistics are produced for all numeric variables or for the variables specified in the VAR statement
SCAN function
The SCAN function returns the nth word from a character string. The SCAN function enables you to separate a character value into words and to return a specified word
SUBSTR function
The SUBSTR function extracts a substring from an argument, starting at a specific position in the string -Can also be used to replace contents of a character variable
TRANWORD function
The TRANWRD function replaces or removes all occurrences of a word in a character string
The concatenation operator (||)
The concatenation operator concatenates character values. FullName = First || Middle || Last
Basics of the DO Loop
The iterative DO statement executes statements between the DO and END statements repetitively, based on the value of an index variable -index-variable: names a variable whose value governs execution of the DO group
Creating an output for each DO iteration
To create an observation for each iteration of the DO loop, place an OUTPUT statement inside the loop OUTPUT statement inside the loop overrides the automatic output of the dataset
FORMAT statement
To make data values more understandable when they are displayed in your procedure output, you can use the FORMAT statement, which associates formats with variables
CLASS statement
To produce separate analyses of grouped observations, add a CLASS statement to the MEANS procedure
When the SUBSTR function replaces a word
To summarize, when the SUBSTR function is on the right side of an assignment statement, the function extracts a substring. MiddleInitial=substr(middlename,1,1); When the SUBSTR function is on the left side of an assignment statement, the function replaces the contents of a character variable. substr(region,1,3)='NNW'
One-to-One Reading
Use multiple SET statements in a DATA step to combine data sets. One-to-one reading combines rows from two or more data sets by creating rows that contain all of the columns from each contributing data set
INPUT function
Use the INPUT function to convert character data values to numeric values REQUIRES INFORMAT
PUT function
Use the PUT function to explicitly convert numeric data values to character data values
PUTLOG Statement
Use the PUTLOG statement in the DATA step to write messages to the SAS log to help identify logic errors -Ex: PUTLOG 'message';
Concatenating Datasets
When a program concatenates data sets, all of the observations are read from the first data set listed in the SET statement. Then all of the observations are read from the second data set listed, and so on Example: DATA output-SAS-data-set; SET SAS-data-set-1 SAS-data-set-2;
PUT Statement
When the source of the program errors is not apparent, use the PUT statement to examine variable values and print your own message in the log. You can use the IF-THEN/ELSE statements to conditionally check values
Using the VALUE statement in FORMATS
You can create your own formats to format the values. You can also apply a format to the values of a variable -Use the VALUE statement to define a format for displaying one or more values the keyword VALUE and ends with a semicolon after all the labels have been defined
Program Data Vector
a logical area in memory where SAS builds a data set. As a SET statement compiles, a slot is added to the PDV for each variables in the new data set
Debugging in execution phase
a note, warning, or error message is displayed in the SAS log; the values that are stored in the PDV are displayed in the SAS log; the processing of the step either continues or stops
What is the SASuser library
a permanent library that contains SAS files in the Profile catalog and that stores your personal settings
What is the SAShelp library
a permanent library that contains sample data and other files that control how SAS works at your site
What is the Work library
a temporary library for files that do not need to be saved from session to session
Modifying variables with expressions
an expression is a sequence of operands and operators that form a set of instructions (+,-,*, etc.) Example of creating new variable: data work.stresstest; set cert.tests; TotalTime=(timemin*60)+timesec; run; proc print data=work.stresstest; run;
FIRST.variable and LAST.variable
are variables that SAS creates for each BY variable. SAS sets FIRST.variable when it is processing the first observation in a BY group, and sets LAST.variable when it is processing the last observation in the BY group
PROC CONTENTS
creates a SAS output that describes either of the following: contents of a library, the descriptor information for an individual data set Ex: PROC CONTENTS DATA=SAS-file-specification NODS;
DO loop example: the DO group Month is the index variable, 1 is the start-variable, and 12 is the stop variable
data work.earnings (drop=month); set cert.master; Earned=0; do month=1 to 12; earned+(amount+earned)*(rate/12); end; Balance=Amount+Earned; run;
DO UNTIL example
data work.invest; do until(Capital>=50000); capital+2000; capital+capital*.10; Year+1; end; Run; - The DATA step below uses a DO UNTIL statement to perform the calculation until $50,000 is reached. Each iteration of the DO loop represents one year
DO WHILE example
data work.invest; do while(Capital>=50000); capital+2000; capital+capital*.10; Year+1; end;
DAY function
day=day(date); day of month (1-31)
SAS informats
determines how data values are read and stored according to the data type
SAS formats
determines how variable values are printed according to there data type
eof = option
eof means end of file The value of this is set to 1 when the last observation of the dataset isprocessed
The SAS Output Delivery System (ODS)
gives you flexibility in generating, storing, and reproducing SAS procedure and DATA step output along with a wide range of formatting options
ID statement
identifies observations using variable values, instead of observation numbers. It specifies one ore more variables to print at the beginning of the row of the report instead of observation #
PROC IMPORT
import procedure reads data from an external data source and writes it to a SAS data set. Syntax: PROC IMPORT DATAFILE = 'filename' | TABLE = 'tablename' OUT = <libref.SAS-data-set> <DBMS=identifier>
Where can you perform SAS functions
in WHERE expressions, in macro language statements, in the REPORT procedure, and in Structured Query Language (SQL)
BY group
includes all observations with the same BY value. If you use more than 1 variables in the BY statement, a BY group is a group of observations with the same combination of values for these variables
By-group processing
is a method of processing observations from one or more SAS data sets that are grouped or ordered by values of one or more common variables
The date value
is a value that represents the number of days between January 1, 1960, and a specified date. Dates before 1960 are negative, and dates after are positive
Debugging in compilation phase
misspelled keywords and data set names, unbalanced quotations marks, invalid options
BY variables
names a variables by which the data set is sorted
maxdec =
number of decimal places you want in your output data set, 0 would mean it would have no numbers after the decimal
CROSSLIST statement
option displays crosstabulation tables in ODS column format instead of the default crosstabulation cell format
FILENAME statement
points to an external data file, Filerefs perform the same function as librefs: they temporarily point to a storage location of data. However, librefs reference SAS libraries, filerefs are external files
RTF procedure
produces an output for Microsoft word
QTR function
quarter=qtr(date); quarter (1-4)
MDY function
returns SAS date value from month, day, and year values
FIRSTOBS=
starts processing at a specific observation
OBS=
stops processing after a certain number of observations are reached
NODS
suppresses printing the detail information about each file when you specify a libref._all_ (you can specify this ONLY when using ._all_)
PROC FREQ
the FREQ procedure creates a one-way table that contains the frequency, percent, cumulative frequency, and cumulative percent of every value of every variable in the input data set
SET Statement
the SET statement specifies the SAS data set you want to use as input data for the data step Ex: DATA SAS-data-set; SET SAS-data-set Run;
Descriptor Portion of a SAS dataset
the portion of a SAS data set contains information about the data set like: name, date and time that the data set was created, number of observations, number of variables
BY values
the value of the BY variable
Creating Variable
use an assign statement in any SET step in order to modify existing values or create new variables Syntax: variable=expression - variables: names a new or existing variable expression: any valid SAS expression (value)
DROP= and KEEP=
use the DROP= and KEEP= data set options to specify the variables to drop or keep -Example: data work.stresstest (drop=timemin timesec); set cert.tests;
LABEL statement
use the LABEL statement in the PROC PRINT statement to assign a descriptive label to a variable. Ex: LABEL variable='label1' variable2='label2' -Permanent labels need to be assigned in the DATA step
LENGTH statement
use the LENGTH statement to specify a length to avoid truncation of your values
RETAIN statement
use the RETAIN statement to assign an initial value other than 0
IN= data option
use this option to create and name a temporary variable that indicates whether the data set has contributed a variable or not. Value is 1 if the data set contributed data for that specific observation
WEEKDAY function
wkday=weekday(date); day of week (1-7) -1 is Sunday, 7 Saturday
Pageby statement
would put observations grouped by the BY statement on different pages rather than the same page
OPTIONS AND OBS
you can use the OPTIONS statement with OBS=# before the import procedure to limit the number of observations that SAS reads from your external file -Ex: options obs=5; would read the first five records from a data set
SUM with the BY statement
you can use the SUM statement with the BY statement to create subtotals for each aggregation. For example if you do: SUM fee; by actlevel; would create two tables of aggregations divided up by actlevel
How to reference files in SAS
you use a two level name consisting of the library name and the data set name: Libref.dataset
YEAR function
yr=year(date); year (4 digits)