SAS Certification Prep Guide: Base Programming
To specify a condition based on the value of a character variable, follow these rules:
(1) Enclose the value in quotation marks (2) Write the value with lowercase, uppercase, or mixed case letters exactly as it appears in the data set
As a step is compiled, SAS recognizes the end of the current step when it encounters one of the following statements:
(1) a DATA or PROC statement, which indicates the beginning of a new step (2) a RUN or QUIT statement, which indicates the end of the current step
What pieces of information does SAS need in the DATA step in order to read an Excel workbook file?
(1) a libref to reference the Excel workbook to read (2) the name and location (using another libref) of the new SAS data set (3) the name of the Excel worksheet that is to be read
Which set of steps should you perform to revise and resubmit the program?
(1) correct the errors (2) clear the SAS log (3) resubmit the program (4) check the SAS log
Examples of syntax errors
-misspelled keywords -missing RUN statement -unbalanced quotation marks -missing semicolons -invalid options
SORT Procedure
-rearranges the observations in a SAS data set -replaces the original SAS data set by default, or creates a new SAS data set that contains the rearranged observations when the OUT= option is specified -can sort on multiple variables -can sort in ascending or descending order -treats missing values as the smallest possible values
libref
1 to 8 characters long, begins with a letter or underscore, and contains only letters, numbers, or underscores; specified as the first element in the two-level name for a SAS file
RETAIN statement
1. assigns an initial value to a retained variable 2. prevents variables from being initialized each time the DATA step executes
P10
10th percentile
P1
1st percentile
P5
5th percentile
P90
90th percentile
P95
95th percentile
P99
99th percentile
VALIDVARNAME = ANY
ANY specifies that SAS variable names must follow these rules: (1) The name can begin with or contain any characters, including blanks, national characters, special characters, and multi-byte characters (2) The name can be up to 32 bytes long (3) The name cannot contain any null bytes (4) Leading blanks are preserved, but trailing blanks are ignored (5) The name must contain at least one character. A name with all blanks is not permitted (6) A variable name can contain mixed-case letters. SAS stores and writes the variable name in the same case that is used in the first reference to the variable (However when SAS processes a variable name, SAS internally converts it to uppercase. Therefore, you cannot use the same variable name with a different combination of uppercase and lowercase letters to represent different variables)
The SUBSTR function replaces variable values if it is placed on the left side of an assignment statement. When placed on the right side the function extracts a substring.
Because of the growth within the 919 area code, the telephone exchange 555 is being reassigned to the 920 area code. The data set Clients.Piedmont includes the variable Phone, which contains telephone numbers in the form 919-555-1234. Which of the following programs correctly changes the values of Phone?
both character and numeric variables
By default, PROC FREQ creates a table of frequencies and percentages for which data set variables?
VALIDMEMNAME = COMPATIBLE
COMPATIBLE is the default system option; specifies that a SAS data set name must follow these rules: (1) Names can be up to 32 characters long (2) Names must begin with a letter of the Latin alphabet (A-Z, a-z) or an underscore. Subsequent characters can be letters of the Latin alphabet, numerals, or underscores (3) Names cannot contain blanks or special characters except for an underscore (4) Names can contain mixed-case letters. SAS stores and writes the variable name in the same case that is used in the first reference to the variable (However when SAS processes a variable name, SAS internally converts it to uppercase. Therefore, you cannot use the same variable name with a different combination of uppercase and lowercase letters to represent different variables)
VALIDMEMNAME = EXTEND
EXTEND specifies that the data set name must follow these rules: (1) Names can include national characters (2) The name can include special characters, except for the / \ * ? " < > | : characters (3) The name must contain at least one character (4) The length of the name can be up to 32 bytes (5) Null bytes are not allowed (6) Leading and trailing blanks are deleted when the member is created (7) Names can contain mixed-case letters. SAS internally converts the member name to uppercase. Therefore, you cannot use the same member name with a different combination of uppercase and lowercase letters to represent different variables
Syntax, FILENAME statement
FILENAME fileref 'filename'
- if the modifier is a constant, enclose it in quotation marks -specify multiple constants in a single set of quotation marks -modifier values are not case sensitive
Facts about modifiers and constants
categorical values
Frequency distributions work best with variables that contain which types of values?
Specify OBS=MAX in the options statement
How do you reset the number of the last observation to process?
32,767
How many characters can be used in a format label (the label assigned in a PROC FORMAT VALUE statement)?
Syntax, ID statement
ID variable(s);
DATAROW=
IMPORT procedure statement that indicates at which row SAS begins to read the data
GUESSINGROWS=
IMPORT procedure statement that indicates how many rows SAS scans for variables to determine the type and length
GETNAMES=
IMPORT procedure statement that modifies whether SAS extracts the variable names from the first row of the data set
-prints the word ERROR followed by an error message in the SAS log -compiles but does not execute the step where the error occurred and prints the following message: NOTE: The SAS System stopped processing this step because of errors
If SAS cannot interpret a syntax error during the compilation phase, what does it do?
A RUN statement is not required between steps in a SAS program. However it is a best practice to use a RUN statement because it makes the SAS program easier to read and the SAS log easier to understand when debugging. SAS assumes that the beginning of a new step implies the end of the previous step
Is a RUN statement required between steps in a SAS program?
KEEP variable(s);
KEEP statement
Syntax, LABEL statement
LABEL variable1 = 'label1' variable2 = 'label2'; Labels can be up to 256 characters long. Enclose the label in quotation marks.
LIBNAME libref engine 'SAS-data-library';
LIBNAME statement
Syntax, OUTPUT statement
OUTPUT<SAS-data-set(s)>;
Syntax, PAGEBY statement
PAGEBY BY-variable; *the variable specified in the PAGEBY statement must also be specified in the BY statement in the PROC PRINT step
PROC CONTENTS DATA = SAS-file-specification NODS;
PROC CONTENTS
Syntax, FREQ procedure
PROC FREQ DATA = SAS-data-set <NLEVELS>; TABLES variable(s); RUN;
Syntax, PROC IMPORT
PROC IMPORT DATAFILE = "filename" | TABLE = "tablename" OUT=<libref.SAS-data-set><SAS-data-set-options> <DBMS=identifier> <REPLACE>;
Syntax, MEANS procedure
PROC MEANS DATA = SAS-data-set <statistics>; VAR variable(s); RUN;
Syntax, PROC PRINT step
PROC PRINT DATA=SAS-data-set; RUN;
Syntax, PROC SORT
PROC SORT DATA=SAS-data-set <OUT=SAS-data-set>; BY <DESCENDING> BY-variable(s); RUN;
Syntax, PUT statement
PUT specification(s); specification specifies what is written, how it is written, and where it is written
Syntax, PUTLOG statement
PUTLOG 'message';
RTF
Rich Text Format
(1) Must begin with a letter (A-Z, either uppercase or lowercase) or an underscore (_) (2) They can continue with any combination of numbers, letters, or underscores (3) They can be 1 to 32 characters long (4) SAS library names (librefs) can be 1 to 8 characters long
Rules for SAS names
Output
SAS data sets
January 1, 1960
SAS date values are the number of days since which date?
leap years
SAS does not automatically make adjustments for daylight saving time, but it does make adjustments for which?
step boundary
SAS executes any statement that has not been previously executed and ends the step
When a program that contains an error is submitted, what messages appear in the SAS log?
SAS: -displays the word ERROR -identifies the possible location of the error -gives an explanation of the error
Syntax, SUM statement
SUM variable(s);
What should you do after submitting a program with unbalanced quotation marks in the Windows or UNIX operating environment?
Simply adding a quotation mark and resubmitting your program usually does not solve the problem. SAS still considers the quotation marks to be unbalanced. To correct the error, you need to resolve the unbalanced quotation mark before you recall, correct, and resubmit the program
VALIDVARNAME = V7
Specifies that variable names must follow these rules: (1) SAS variable names can be up to 32 characters long (2) The first character must be a letter of the Latin alphabet (A-Z, either uppercase or lowercase) or an underscore (_). Subsequent characters can be letters of the Latin alphabet, numerals, or underscores (3) Trailing blanks are ignored. The variable name alignment is left-justified (4) A variable name can contain mixed-case letters. SAS stores and writes the variable name in the same case that is used in the first reference to the variable (However when SAS processes a variable name, SAS internally converts it to uppercase. Therefore, you cannot use the same variable name with a different combination of uppercase and lowercase letters to represent different variables) (5) Do NOT assign variables the names of special SAS automatic variables (such as _N_ and _ERROR_) or variable list names (such as _NUMERIC_, _CHARACTER_, and _ALL_)
tab-delimited values
Specify DBMS=DLM to import any other delimited file that does not end in .CSV
only for the current SAS session
Suppose you do not specify the LIBRARY= option and your formats are stored in Work.Formats. How long do they exist?
The TRIM function removes trailing blanks from character values. In this case, extra blanks must be removed from the values of FirstName. Although answer c also works, the extra TRIM function for the variable LastName is unnecessary. Because of the LENGTH statement, all values of FullName are padded to 40 characters
Suppose you need to create the variable FullName by concatenating the values of FirstName, which contains first names, and LastName, which contains last names. What is the best way to remove extra blanks between first names and last names?
CATX(separator,string-1<,....string-n>)
Syntax, CATX function
CEIL(argument)
Syntax, CEIL function
COMPBL(source)
Syntax, COMPBL function
COMPRESS(source<,characters><,modifier(s)>) A - adds alphabetic characters to the list of characters C - adds control characters to the list of characters D - adds digits to the list of characters F - adds the underscore character and English letters to the list of characters G - adds graphic characters to the list of characters H - adds a horizontal tab to the list of characters I - ignores the case of the characters to be kept or removed K - keeps the characters in the list instead of removing them L - adds lowercase letters to the list of characters N - adds digits, the underscore character, and English letters to the list of characters O - processes the second and third arguments once rather than every time the COMPRESS function is called P - adds punctuation marks to the list of characters S - adds space characters (blank, horizontal tab, vertical tab, carriage return, line feed, and form feed) to the list of characters T - trims trailing blanks from the first and second arguments U - adds uppercase letters to the list of characters W - adds printable characters to the list of characters X - adds hexadecimal characters to the list of characters
Syntax, COMPRESS function
DATA output-SAS-data-set; SET SAS-data-set-1 SAS-data-set-2; RUN;
Syntax, DATA step for concatenating
DATA output-SAS-data-set; MERGE SAS-data-set-1 SAS-data-set-2; BY <DESCENDING> variable(s); RUN;
Syntax, DATA step for match-merging
DATA output-SAS-data-set; SET SAS-data-set-1; SET SAS-data-set-2; RUN;
Syntax, DATA step for one-to-one reading
DATDIF (start_date, end_date, basis) YRDIF (start_date, end_date, basis)
Syntax, DATDIF and YRDIF functions
DATE ( ) ***The DATE function does not require any arguments, but it must be followed by parentheses
Syntax, DATE function:
DELETE;
Syntax, DELETE statement
DO UNTIL (expression); ....more SAS statements.... END;
Syntax, DO UNTIL statement
DO WHILE (expression); ....more SAS statements... END;
Syntax, DO WHILE statement
DO; SAS statements END;
Syntax, DO group
DO index-variable=specification-1 <,...specification-n>; .....more SAS statements... END;
Syntax, DO statement, iterative
DROP variable(s);
Syntax, DROP statement
(DROP=variable(s))
Syntax, DROP= data set options
ELSE statement;
Syntax, ELSE statement
FIND (string,substring<,modifiers><,startpos>)
Syntax, FIND function
FLOOR(argument)
Syntax, FLOOR function
FORMAT variable(s) format-name;
Syntax, FORMAT statement
PROC FREQ <options>; RUN;
Syntax, FREQ procedure
IF expression THEN statement;
Syntax, IF-THEN statement
(IN = variable)
Syntax, IN = data set option
INDEX (source, excerpt)
Syntax, INDEX function
INPUT (source, informat)
Syntax, INPUT function
INT(argument)
Syntax, INT function
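The CEIL, FLOOR, and INT cards list only syntax. As a rough illustration written in Python (not SAS), the sketch below contrasts the three on a positive and a negative value. The helper name rounding_summary is hypothetical, and Python's math functions only stand in for the SAS ones; they do not reproduce the small tolerance SAS applies to values that are very close to an integer.

```python
import math

def rounding_summary(x):
    """Return (ceil, floor, int) for x, modeling SAS CEIL, FLOOR, and INT."""
    # CEIL rounds up, FLOOR rounds down, INT drops the fractional part.
    return (math.ceil(x), math.floor(x), math.trunc(x))

print(rounding_summary(2.7))   # (3, 2, 2)
print(rounding_summary(-2.7))  # (-2, -3, -2)
```

The negative case is the one worth remembering: INT moves toward zero, so it matches FLOOR for positive values and CEIL for negative values.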
INTCK ('interval', from, to) 'interval' specifies a character constant or a variable. Interval can appear in uppercase or lowercase. NOTE: The type of interval (date, time, or datetime) must match the type of value in from
Syntax, INTCK function
INTNX ('interval', start-from, increment<, 'alignment'>) NOTE: The type of interval (date, time, or datetime) must match the type of value in 'start-from' and 'increment'
Syntax, INTNX function
(KEEP=variable(s))
Syntax, KEEP= data set options
LEFT(argument) RIGHT(argument)
Syntax, LEFT and RIGHT function
LENGTH variable(s) <$> length;
Syntax, LENGTH statement
LOWCASE(argument)
Syntax, LOWCASE function
MDY (month, day, year)
Syntax, MDY function
PROC MEANS <DATA=SAS-data-set> <statistic-keyword(s)><option(s)>; RUN;
Syntax, MEANS procedure
ODS EXCEL <(<ID =>identifier)><action>; ODS EXCEL <(<ID =>identifier)><option(s)>;
Syntax, ODS EXCEL statement
ODS HTML BODY = file-specification; ODS HTML CLOSE;
Syntax, ODS HTML statement
ODS HTML BODY = body-file-specification CONTENTS = contents-file-specification FRAME = frame-file-specification; ODS HTML CLOSE; NOTE: If you specify the FRAME= option, you must also specify the CONTENTS= option
Syntax, ODS HTML statement to create a linked table of contents:
ODS PDF <(<ID => identifier)> <action>;
Syntax, ODS PDF statement
ODS RTF <(<ID =>identifier)><action>;
Syntax, ODS RTF statement
ODS open-destination; ODS close-destination CLOSE;
Syntax, ODS statement to open and close destinations
PATH = file-location-specification <(URL = NONE | "Uniform-Resource-Locator")>
Syntax, PATH = option with the URL = suboption
PROC FORMAT <options>;
Syntax, PROC FORMAT statement
PROPCASE (argument<, delimiter(s)>)
Syntax, PROPCASE function
PUT (source, format)
Syntax, PUT function
(RENAME = (old-variable-name = new-variable-name))
Syntax, RENAME = data set option
RETAIN variable <initial-value>;
Syntax, RETAIN statement
ROUND(argument, round-off-unit)
Syntax, ROUND function
function-name (argument-1, argument-2, argument-n);
Syntax, SAS function
SCAN (argument, n<,<delimiters>>)
Syntax, SCAN function
STYLE = style-name;
Syntax, STYLE = option
SUBSTR(argument, position<,n>)
Syntax, SUBSTR function
TODAY ( ) ***The TODAY function does not require any arguments, but it must be followed by parentheses
Syntax, TODAY function:
TRANWRD (source, target, replacement)
Syntax, TRANWRD function
TRIM (argument)
Syntax, TRIM function
UPCASE (argument)
Syntax, UPCASE function
(URL = "Uniform-Resource-Locator")
Syntax, URL = suboption in a file specification
VALUE format-name range1='label1' range2='label2' ...more format-names...;
Syntax, VALUE statement
The WEEKDATEw. format writes date values in a format that displays the day of the week, month, day, and year; in the form day-of-week, month-name dd, yy (or yyyy)
Syntax, WEEKDATEw. format
WEEKDAY(date) date is a SAS date value that is specified either as a variable or as a SAS date constant
Syntax, WEEKDAY function:
The WORDDATEw. format writes date values in the form month-name dd, yyyy.
Syntax, WORDDATEw. format
YEAR(date) QTR(date) MONTH(date) DAY(date) date is a SAS date value that is specified either as a variable or as a SAS date constant
Syntax, YEAR, QTR, MONTH, and DAY functions:
variable = expression;
Syntax, assignment statement
'ddmmmyy'd OR "ddmmmyy"d
Syntax, date constant
IF expression;
Syntax, subsetting IF statement
variable + expression;
Syntax, sum statement
The DATETIMEw. informat reads expressions that consist of two parts, a date value and a time value, in the form: ddmmmyy hh:mm:ss.ss
Syntax, values read with DATETIMEw. informat
The DATEw. informat reads date values in the form ddmmmyy or ddmmmyyyy
Syntax, values read with DATEw. informat
reads date values in the form MMDDYY or MMDDYYYY
Syntax, values read with MMDDYYw. informat
The TIMEw. informat reads values in the form hh:mm:ss.ss 5 is the minimum acceptable field width for the TIMEw. informat
Syntax, values read with TIMEw. informat
Syntax, TITLE and FOOTNOTE statements
TITLE<n>'text'; FOOTNOTE<n>'text'; n- a number 1 to 10
standard deviation
The default statistics produced by the MEANS procedure are n-count, mean, minimum, maximum, and which one of the following statistics?
QRANGE
The interquartile range, calculated as the difference between the upper and lower quartiles, Q3 - Q1
The SCAN function is used to extract words from a character value when you know the order of the words, when their position varies, and when the words are marked by some delimiter
The variable Address2 contains values such as Piscataway, NJ. How do you assign the two-letter state abbreviations to a new variable named State?
The SUBSTR function is best used when you know the exact position of the substring to extract from the character value. You specify the position to start from and the number of characters to extract
The variable IDCode contains values such as 123FA and 321MB. The fourth character identifies sex. How do you assign these character codes to a new variable named Sex?
VALIDVARNAME = UPCASE
UPCASE specifies that the variable name follows the same rules as V7, except that the variable name is uppercase, as in earlier versions of SAS
JMP files
Use DBMS=JMP to specify importing JMP files. JMP variable names can be up to 255 characters long. SAS supports importing JMP files that have more than 32,767 variables
VALIDMEMNAME = EXTEND system option
Use when a SAS data set name contains any of the following: (1) international characters (2) characters supported by third-party databases (3) characters that are commonly used in a filename
as many as you want
Using ODS statements, how many types of output can you generate at once?
Syntax, VAR statement
VAR variables;
Syntax, WHERE statement
WHERE where-expression;
PROC FREQ PROC MEANS
What SAS procedures can be used to detect invalid data?
TITLE, LIBNAME, OPTIONS, and FOOTNOTE
What are some common global statements?
Sashelp, Sasuser, Work
What are the predefined SAS libraries?
- the new data set contains all the variables from all the input data sets. If the data sets contain variables that have the same names, the values that are read from the last data set overwrite the values that were read from earlier data sets -the number of observations in the new data set is the number of observations in the smallest original data set. Observations are combined based on their relative position in each data set.
What are the products of one-to-one reading?
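The two products of one-to-one reading described above can be modeled with a small Python sketch (not SAS). The function one_to_one_read and the sample rows are hypothetical; each data set is represented as a list of dictionaries, one dictionary per observation.

```python
def one_to_one_read(first, second):
    """Pair rows by relative position; stop at the end of the shorter input."""
    combined = []
    for row_a, row_b in zip(first, second):
        merged = dict(row_a)
        # Same-named variables take the value read from the later data set.
        merged.update(row_b)
        combined.append(merged)
    return combined

patients = [{"id": 1, "age": 40}, {"id": 2, "age": 55}, {"id": 3, "age": 61}]
visits = [{"id": 9, "visit": "a"}, {"id": 8, "visit": "b"}]

result = one_to_one_read(patients, visits)
print(len(result))  # 2
print(result[0])    # {'id': 9, 'age': 40, 'visit': 'a'}
```

The output has two rows because the smaller input has two, and id comes from the second input because it was read last.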
(1) statements that are used in DATA and PROC steps (2) statements that are global in scope and can be used anywhere in a SAS program
What are two types of SAS statements?
- the length of the variable's first reference in the DATA step - the assignment statement - the LENGTH statement
What determines the length of a new variable?
-misspelled keywords and data set names -unbalanced quotation marks -invalid options
What errors are detected during the compilation phase?
(1) program data vector (PDV) (2) descriptor information
What is created during the compilation phase?
Define the libraries. To reference a permanent SAS file: (1) assign a name (libref) to the SAS library in which the file is stored (2) use the libref as the first part of the two-level name (libref.filename) to reference the file within the library
What is often the first step in setting up your SAS session?
They can be used in calculations like other numeric values
What is the advantage of storing dates and times as SAS numeric date and time values?
1) Unlike CLASS processing, BY-group processing requires that your data already be indexed or sorted in the order of the BY variables. You might need to run the SORT procedure before using the PROC MEANS with a BY group 2) A CLASS statement produces a single large table, whereas BY-group processing creates a series of small tables. The order of the variables in the CLASS statement determines their order in the output table
What is the difference between CLASS processing and BY-group processing?
5
What is the minimum width of the TIMEw. informat?
- the data set descriptor -the program data vector -the _N_ and _ERROR_ automatic variables
What is written to the output during the compilation phase?
- they can specify a single value, such as 24 or 'S' -a range of numeric values, such as 0-1500 -a range of character values, such as 'A'-'M'
What ranges can a VALUE statement specify?
(1) compilation phase (2) execution phase
In what two phases is a SAS DATA step processed?
Frequency distributions work best with variables whose values are categorical, and whose values are better summarized by counts rather than by averages
What variables do frequency distributions work best with?
-A note, warning, or error message is displayed in the SAS log -The values that are stored in the PDV are displayed in the SAS log -The processing of the step either continues or stops
When SAS detects an error during the execution phase, what happens (depending on the type of error)?
-a character value is assigned to a previously defined numeric variable -a character value is used in an arithmetic operation -a character value is compared to a numeric value, using a comparison operator -a character value is specified in a function that requires numeric arguments
When does automatic character-to-numeric conversion occur?
numeric data values are converted to character values whenever they are used in a character context
When does automatic numeric-to-character conversion occur?
-you know the order of the words in the character value -the starting position of the words varies -the words are marked by some delimiter
When should you use the SCAN function?
-extracts a portion of a value by starting at a specified location -is best used when you know the exact position of the string you want to extract from the character value
When should you use the SUBSTR function?
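To make the SCAN versus SUBSTR choice concrete, here is a simplified Python model (not SAS) applied to the Address2 and IDCode values mentioned earlier in this set. The functions scan and substr are hypothetical stand-ins, and the default delimiter list (blank and comma) is an assumption; the real SAS SCAN function has a longer default list.

```python
def scan(value, n, delimiters=" ,"):
    """Return the n-th word of value (1-based); a negative n counts from the right."""
    words, current = [], ""
    for ch in value:
        if ch in delimiters:
            # Consecutive delimiters are treated as one separator.
            if current:
                words.append(current)
            current = ""
        else:
            current += ch
    if current:
        words.append(current)
    if n == 0 or abs(n) > len(words):
        return ""
    return words[n - 1] if n > 0 else words[n]

def substr(value, position, n=None):
    """Return n characters starting at a 1-based position (to the end if n is omitted)."""
    start = position - 1
    return value[start:] if n is None else value[start:start + n]

print(scan("Piscataway, NJ", 2))  # NJ
print(substr("123FA", 4, 1))      # F
```

SCAN fits Address2 because the state is always the second word even though its starting column varies; SUBSTR fits IDCode because the code always sits in the fourth position.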
D:\Output\body.html
When the following code runs, what file is loaded by the links in D:\Output\contents.html
How to reference a fully qualified single external file?
When you associate a fileref with an individual external file, you specify the fileref in subsequent SAS statements and commands
How can you tell whether you have specified an invalid option in a SAS program?
When you submit a SAS statement that contains an invalid option, a log message notifies you that the option is not valid or not recognized. You should recall the program, remove or replace the invalid option, check your statement syntax as needed, and resubmit the corrected program
OTHER
Which keyword can be used to label missing numeric values as well as any values that are not specified in a range?
FMTLIB
Which keyword, when added to the PROC FORMAT statement, displays all the formats in your catalog?
VAR statement VAR boarded transfer deplane;
Which statement limits a PROC MEANS analysis to the variables Boarded, Transfer, and Deplane?
Use the INDEX function in a subsetting IF statement, enclosing the character string in quotation marks. Only those observations in which the function locates the string and returns a value greater than 0 are written to the data set
Within the data set Cert.Bookcase, the variable Finish contains values such as ash, cherry, teak, matte-black. Which of the following creates a subset of the data in which the values of Finish contain the string walnut? Make the search for the string case-insensitive.
Suppose you submit a short, simple DATA step. If the active window displays the message "DATA step running" for a long time, what probably happened?
Without a RUN statement (or a following DATA or PROC step), the DATA step does not execute, so it continues to run. Unbalanced quotation marks can also cause the DATA step running message if relatively little code follows the unbalanced quotation mark.
You permanently associate the formats with variables
You can place the FORMAT statement in either a DATA step or a PROC step. What happens when you place it in a DATA step?
-ascending or descending character order -ascending or descending numeric order -the data must be grouped in some way
Your data does not require any preprocessing if the observations in all of the data sets occur in which of the following patterns?
SAS data set
a data file that is formatted in a way that SAS can understand; consists of two parts - a descriptor portion and a data portion
Windows, UNIX
a group of SAS files that are stored in the same directory. Other files can be stored in the directory, but only the files that have a SAS file extension are recognized as part of the SAS library
document
a hierarchy of output objects that enables you to render multiple ODS output without rerunning procedures
program data vector (PDV)
a logical area in memory where SAS builds a data set, one observation at a time
BY-group processing
a method of processing observations from one or more SAS data sets that are grouped or ordered by values of one or more common variables
SAS name literal
a name token that is expressed as a string within quotation marks, followed by the uppercase or lowercase letter n; tells SAS to allow the special character ($) in the data set name
Sasuser
a permanent library that contains SAS files in the Profile catalog and that stores your personal settings; also a convenient place to store your own files
Sashelp
a permanent library that contains sample data and other files that control how SAS works at your site; this is a Read-Only library
PROC FREQ
a procedure that is used to give descriptive statistics about a SAS data set; creates one-way, two-way, and n-way frequency tables; also describes data by reporting the distribution of variable values
index
a separate file that you can create for a SAS data file in order to provide direct access to a specific observation; purpose is to optimize WHERE expressions and facilitate BY-group processing
step
a sequence of SAS statements
expression
a sequence of operands and operators that form a set of instructions
SAS engine
a set of internal instructions that SAS uses for writing to and reading from files in a SAS library or a third-party database
Engine
a set of internal instructions that SAS uses for writing to and reading from files in a library; each one allows you to read a different file format, including file formats from other software vendors
z/OS
a specially formatted host data set in which only SAS files can be stored
work
a temporary library for files that do not need to be saved from session to session
SAS statement
a type of SAS language element that is used to perform a particular operation in a SAS program or to provide information to a SAS program; free format; ends with a semicolon; begins with a SAS keyword
SAS datetime value
a value representing the number of seconds between January 1, 1960, and an hour/minute/second within a specified date; makes adjustments for leap years but ignores leap seconds; does not make adjustments for daylight saving time
SAS time value
a value representing the number of seconds since midnight of the current day; values between 0 and 86400
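The time and datetime definitions above reduce to simple second counts. The Python sketch below (not SAS) shows the arithmetic; sas_time_value, sas_datetime_value, and SAS_EPOCH are hypothetical names, and naive Python datetimes are used so that, as in the definitions, no daylight saving or leap-second adjustment is applied.

```python
from datetime import datetime, time

# Reference point for SAS datetime values: midnight, January 1, 1960.
SAS_EPOCH = datetime(1960, 1, 1)

def sas_time_value(t):
    """Seconds since midnight for a time of day."""
    return t.hour * 3600 + t.minute * 60 + t.second

def sas_datetime_value(dt):
    """Seconds between the 1960 reference point and dt."""
    return int((dt - SAS_EPOCH).total_seconds())

print(sas_time_value(time(1, 0, 0)))             # 3600
print(sas_datetime_value(datetime(1960, 1, 2)))  # 86400
```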
SAS date value
a value that represents the number of days between January 1, 1960 and a specified date; SAS can perform calculations on dates ranging from 1582 C.E. to 19,900 C.E.; dates before January 1, 1960 are negative numbers and dates after it are positive numbers
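As a quick check on the day-count definition, this Python sketch (not SAS) computes the same number for a calendar date. The name sas_date_value is hypothetical, and Python's date type covers only years 1 through 9999, so it cannot represent the full range that SAS handles.

```python
from datetime import date

# Reference date for SAS date values.
SAS_EPOCH = date(1960, 1, 1)

def sas_date_value(d):
    """Number of days from January 1, 1960 to d (negative for earlier dates)."""
    return (d - SAS_EPOCH).days

print(sas_date_value(date(1960, 1, 1)))    # 0
print(sas_date_value(date(1959, 12, 31)))  # -1
print(sas_date_value(date(2000, 1, 1)))    # 14610
```

Because the result is an ordinary number, subtracting two such values gives the number of days between the dates, which is the advantage of numeric date storage noted elsewhere in this set.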
format
affects how data values are written; does NOT change the stored value in any way; merely controls how the data is displayed
PROC (procedure) step
analyzes data, produces output, or manages SAS files; output can be of several types, such as a report or an updated SAS data set
concatenating
appends the observations from one data set to another data set; requires only a list of data set names in the SET statement (no BY statement is needed); the new data set contains ALL the variables from all the input data sets, as well as the total number of records from all input data sets
Defining a library
assign a library name to it and specify the location of the files, such as a directory path
COMPRESS (PROC FREQ option)
begins the display of the next one-way frequency table on the same page as the preceding one-way table if there is enough space to begin the table NOTE: is not valid with the PAGE option
RANGE
calculated as the difference between the maximum value and the minimum value
modifier i
causes the FIND function to ignore character case during the search; if this modifier is not specified, FIND searches for character substrings with the same case as the characters in substring
delimiters
characters that are specified as word separators
observations (also called rows)
collections of data values that usually relate to a single object
variables (also called columns)
collections of values that describe a particular characteristic
SAS log
collects messages about the processing of SAS programs and about any errors that occur
match-merging
combines observations from two or more data sets into a single observation in a new data set according to the values of a common variable; use a MERGE statement rather than the SET statement to combine data sets
one-to-one reading
combines rows from two or more data sets by creating rows that contain all the columns from each contributing data set; rows are combined based on their relative position in each data set
comma-separated values (CSV)
comma-separated file with a .CSV extension, DBMS= is optional
SAS program
consists of a sequence of steps; can be any combination of DATA or PROC steps
PROC CONTENTS
creates SAS output that describes either of the following: 1. the contents of a library 2. the descriptor information for an individual SAS data set
explicit OUTPUT statement
creates an observation for each iteration; overrides automatic output, causing SAS to add an observation to the data set only when the statement is executed
DATA step
creates or modifies data; output can be of several types, such as a SAS data set or a report
IN = data set option
data set option to create and name a temporary variable that indicates whether the data set contributed data to the current observation; it is not included in the output SAS data set
FORMCHAR (1,2,7) = 'formchar-string'
defines the characters to be used for constructing the outlines and dividers for the cells of crosstabulation table displays. The characters are used to draw the vertical separators (position 1), the horizontal separators (position 2) and the vertical-horizontal intersections (position 7)
SAS informat
determines how data values are read and stored according to the data type: numeric, character, date, time, or timestamp
SAS format
determines how variable values are printed according to the data type: numeric, character, date, time, or timestamp
FMTLIB
displays a list of all of the formats in your catalog, along with descriptions of their values
PAGE
displays only one table per page
NLEVELS
displays the "number of variable levels" table, which provides the number of levels for each variable named in the TABLES statement
iteration
each loop (or cycle or execution)
WEEKDAY function
enables you to extract the day of the week from a SAS date value; returns a numeric value from 1 to 7, where 1 represents Sunday and 7 represents Saturday
data errors
errors that occur when data values are not appropriate for the SAS statements that are specified in a program
syntax errors
errors that occur when program statements do not conform to the rules of the SAS language
semantic errors
errors that occur when you specify a language element that is not valid for a particular usage
IF-THEN statement
executes a SAS statement when the condition in the IF clause is true
iterative DO statement
executes statements between the DO and END statements repetitively, based on the value of an index variable
DO WHILE expression
expression is evaluated before each execution of the loop, so that the statements inside the group are executed repetitively while the expression is true
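The iterative DO statement and DO WHILE can be sketched as follows. The data set name work.invest, the variables, and the 5% rate are assumptions made up for this example.

```sas
data work.invest;
   /* Iterative DO: the index variable year takes the values 1 through 5 */
   balance = 1000;
   do year = 1 to 5;
      balance = balance * 1.05;
   end;
   /* After the loop ends, year is 6 (the first value that failed the test) */

   /* DO WHILE: the condition is checked before each pass through the loop */
   count = 0;
   do while (count < 3);
      count = count + 1;
   end;
run;
```

The step writes one observation, because automatic output happens once at the end of the single DATA step iteration.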
SUBSTR function
extracts a substring from an argument, starting at a specific position in the string
DAY function form: DAY(date)
extracts the day value from a SAS date value
MONTH function form: MONTH(date)
extracts the month value from a SAS date value
QTR function form: QTR(date)
extracts the quarter value from a SAS date value
YEAR function form: YEAR(date)
extracts the year value from a SAS date value
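As a short illustration of the date-part functions, the following sketch applies them to one date constant. The variable names are arbitrary.

```sas
data _null_;
   d = '16OCT2017'd;       /* SAS date constant */
   day_val   = day(d);     /* 16 */
   month_val = month(d);   /* 10 */
   qtr_val   = qtr(d);     /* 4 */
   year_val  = year(d);    /* 2017 */
   wkday_val = weekday(d); /* 2, because 16 October 2017 was a Monday */
   putlog day_val= month_val= qtr_val= year_val= wkday_val=;
run;
```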
INTNX function
function applies multiples of a given interval to a date, time, or datetime value and returns the resulting value
LOWCASE function
function converts all letters in a character expression to lowercase
UPCASE function
function converts all letters in a character expression to uppercase
PROPCASE function
function converts all words in an argument to proper case (so that the first letter in each word is capitalized)
INPUT function
function converts character data values to numeric values; requires an informat
PUT function
function converts numeric data values to character values; requires a format
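A minimal sketch of explicit conversion with INPUT and PUT. The variable names and values are invented for illustration; COMMA8. and Z5. are standard SAS informat and format names.

```sas
data _null_;
   char_amt = '1,234.50';
   num_amt  = input(char_amt, comma8.);  /* character to numeric: 1234.5 */

   num_id   = 42;
   char_id  = put(num_id, z5.);          /* numeric to character: '00042' */

   putlog num_amt= char_id=;
run;
```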
CATX function
function enables you to concatenate character strings, remove leading and trailing blanks, and insert separators; the results are usually equivalent to those that are produced by a combination of the concatenation operator and the TRIM and LEFT functions
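The equivalence described above can be sketched by building the same value both ways. The variable names and values are hypothetical.

```sas
data _null_;
   length first last $ 10;
   first = '  Ann';    /* leading blanks */
   last  = 'Lee  ';    /* trailing blanks */

   /* CATX strips leading and trailing blanks, then inserts the separator */
   full1 = catx(' ', first, last);

   /* Same result using the concatenation operator with TRIM and LEFT */
   full2 = trim(left(first)) || ' ' || trim(left(last));

   putlog full1= full2=;   /* both show Ann Lee */
run;
```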
INDEX function
function enables you to search a character value for a specified string; searches values from left to right, looking for the first occurrence of the string NOTE: The function is case sensitive
FIND function
function enables you to search for a specific substring of characters within a specified character string -searches the string, from left to right, for the first occurrence of the substring, and returns the position in the string of the substring's first character -if the substring is not found in the string, the function returns a value of 0 -if there are multiple occurrences of the substring, the function returns only the position of the first occurrence
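A brief sketch of FIND, with and without the i modifier, alongside INDEX. The sample string is made up for illustration.

```sas
data _null_;
   text = 'She sells shells';
   pos_cs   = find(text, 's');       /* 5: case-sensitive, so the capital S is skipped */
   pos_ci   = find(text, 's', 'i');  /* 1: the i modifier ignores case */
   pos_none = find(text, 'xyz');     /* 0: substring not found */
   pos_idx  = index(text, 'sells');  /* 5: INDEX is also case-sensitive */
   putlog pos_cs= pos_ci= pos_none= pos_idx=;
run;
```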
LEFT function
function left-aligns a character expression; returns an argument with leading blanks moved to the end of the value
catalogs
function like subfolders for grouping other members in SAS libraries
COMPBL function
function removes multiple blanks from a character string by translating each occurrence of two or more consecutive blanks into a single blank
TRIM function
function removes trailing blanks from character expressions and returns one blank if the expression is missing; useful for concatenating because the concatenation operator does not remove trailing blanks
TRANWRD function
function replaces or removes all occurrences of a word in a character string; translated characters can be located anywhere in the string
MDY function
function returns a SAS date value from month, day, and year values; can add the same SAS date to every observation
COMPRESS function
function returns a character string with specified characters removed from the original string; null arguments are allowed and treated as a string with a length of zero
TODAY function
function returns the current date as a numeric SAS date value, which is the number of days since January 1, 1960 NOTE: If the value of the TIMEZONE= system option is set to a time zone name or time zone ID, the return values for date and time are determined by the time zone
DATE function
function returns the current date as a numeric SAS date value, which is the number of days since January 1, 1960 NOTE: If the value of the TIMEZONE= system option is set to a time zone name or time zone ID, the return values for date and time are determined by the time zone
INT function
function returns the integer portion of a numeric value (any decimal portion of the function argument is discarded)
FLOOR function
function returns the largest integer that is less than or equal to the argument
INTCK function
function returns the number of interval boundaries of a given kind that lie between two dates, times, or datetime values; counts intervals from fixed interval beginnings, not in multiples of an interval unit from the "from" value
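The boundary-counting behavior of INTCK, and the way INTNX advances a date, can be sketched as follows. The dates are arbitrary examples.

```sas
data _null_;
   /* One month boundary (1 February) lies between these dates, although they are one day apart */
   months1 = intck('month', '31JAN2017'd, '01FEB2017'd);  /* 1 */

   /* No month boundary is crossed, although the dates are 30 days apart */
   months2 = intck('month', '01JAN2017'd, '31JAN2017'd);  /* 0 */

   /* INTNX: advance two month intervals; the default alignment is the start of the interval */
   next = intnx('month', '16OCT2017'd, 2);                /* 01DEC2017 */

   putlog months1= months2= next= date9.;
run;
```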
CEIL function
function returns the smallest integer that is greater than or equal to the argument
RIGHT function
function right-aligns a character expression; returns an argument with trailing blanks moved to the start of the value
ROUND function
function rounds values to the nearest specified unit
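A small sketch contrasting INT, FLOOR, CEIL, and ROUND. The input values are chosen only to show where the functions differ, which is most visible for negative numbers.

```sas
data _null_;
   x = -2.7;
   int_x   = int(x);     /* -2: the decimal portion is discarded */
   floor_x = floor(x);   /* -3: largest integer less than or equal to x */
   ceil_x  = ceil(x);    /* -2: smallest integer greater than or equal to x */
   round_x = round(x);   /* -3: nearest integer when no unit is given */

   y = 1234.567;
   round_tens = round(y, 10);    /* 1230 */
   round_hund = round(y, .01);   /* 1234.57 */

   putlog int_x= floor_x= ceil_x= round_x= round_tens= round_hund=;
run;
```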
DATDIF and YRDIF
functions calculate the difference in days and years between two SAS dates
SAS Output Delivery System (ODS)
gives you flexibility in generating, storing, and reproducing SAS procedure and DATA step output along with a wide range of formatting options
SAS library
highest level of organization for information within SAS; a collection of one or more SAS files, including SAS data sets, that are referenced and stored as a unit; in a directory-based operating environment, it's a group of SAS files that are stored in the same directory; in z/OS, it's a group of SAS files that are stored in an operating system file
name
identifies a variable (any valid SAS name)
type
identifies a variable as numeric or character
ID statement
identifies observations using variable values, such as identification number, instead of observation numbers
OUT= <libref.> SAS-data-set
identifies the output SAS data set with either a one- or two-level SAS name (library and member name); if the specified SAS data set does not exist, the IMPORT procedure creates it
DROP= option
if you never reference certain variables and you do not want them to appear in the new data set, use this option in the SET statement; when this option is used in the DATA statement, it drops the variables from
BY group
includes all observations with the same BY value. If you use more than one variable in a BY statement, a BY group is a group of observations with the same combination of values for these variables; each BY group has a unique combination of values for the variables
descriptor portion
information that SAS creates and maintains about each SAS data set, including data set attributes and variable attributes
_ERROR_
initialized to 0, set to 1 when an error occurs; displays debugging messages when an error occurs
Example of semantic error
invalid option
Uniform-Resource-Locator
is the name of an HTML file or the full URL of an HTML file. ODS uses this URL instead of the file specification in all the links and references that it creates that point to the file
STDDEV | STD
is the standard deviation s and is computed as the square root of the variance
permanent SAS libraries
libraries that are available to you during subsequent SAS sessions; referenced libref.dataset
temporary SAS libraries
libraries that last only for the current SAS session; referenced libref.filename (ex: work.test) OR the data set name only (one-level name) (ex: Test)
logical operators
links a sequence of expressions into compound expressions - AND (&); OR (|)
What types of errors can the PUTLOG statement help you resolve?
logic errors
Which type of delimited file does PROC IMPORT read by default?
varying record-length files
KURTOSIS | KURT
measures the heaviness of tails
SKEWNESS | SKEW
measures the tendency of the deviations to be larger in one direction than in the other
BY variable
names a variable or variables by which the data set is sorted. All data sets must be ordered by the values of the BY variable
DATA = SAS-data-set
names the data set to be analyzed by PROC FREQ
logic error
occurs when the program statements follow the rules and execute, but produce incorrect results; difficult to detect because no notes are written to the log
concatenation operator
operator concatenates character values; can be expressed as || (two vertical bars), ¦¦ (two broken vertical bars), or !! (two exclamation points)
OUT= option
option identifies the output SAS data set
SHEET option
option to import specific worksheets from an Excel workbook
VARNUM
option used in the PROC CONTENTS statement to list variable names in the order of their logical position (or creation order) in the data set
NOOBS option
option used to suppress observation numbers
Printer Family (PDF, and so on)
output that is formatted for a high-resolution printer such as PostScript(PS), Portable Document Format (PDF), or Printer Control Language (PCL) files
HTML
output that is formatted in Hypertext Markup Language (HTML). You do not have to specify the ODS HTML statement to produce basic HTML output
Markup Languages Family
output that is formatted using markup languages such as Extensible Markup Language (XML)
<REPLACE> REPLACE= used to replace a permanent SAS data set
overwrites an existing SAS data set; if option is omitted, the IMPORT procedure does not overwrite an existing data set
data portion
portion of a SAS data set is a collection of data values that are arranged in a rectangular table
descriptor portion
portion of the data set contains information about the properties of each variable in the data set and about the data set, including the following: (1) the name of the data set (2) the date and time that the data set was created (3) the number of observations (4) the number of variables (5) variable's name, type, length, format, informat, and label
SAS functions
pre-written routines that perform computations or system manipulations on arguments and return a value; can return either character or numeric result
PROC MEANS
procedure provides data summarization tools to compute descriptive statistics for variables across all observations and within groups of observations -calculates descriptive statistics based on moments -estimates quantiles, which includes the median -calculates confidence limits for the mean -identifies extreme values -performs a t test
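A minimal PROC MEANS sketch that requests a few of these statistics. It assumes the SASHELP.CLASS sample data set is available in your installation; the choice of statistics and variables is illustrative.

```sas
/* Assumes the SASHELP.CLASS sample data set is available */
proc means data=sashelp.class n mean median std min max maxdec=1;
   class sex;           /* separate analysis for each value of sex */
   var height weight;   /* analysis variables */
run;
```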
PROC IMPORT
procedure reads structured and unstructured data from an external data source and writes it to a SAS data set
nesting
putting a DO loop within a DO loop
SAS date and time informats
read date and time expressions and convert them to SAS date and time values
informat
reads data values in certain forms into standard SAS values; determine how data values are read into a SAS data set
SAS/ACCESS LIBNAME statement
references an Excel workbook file
label
refers to a descriptive label of up to 256 characters long
length
refers to the number of bytes used to store each of the variable's values in a SAS data set
one-to-one matching
requires multiple SET statements; where the same-named variables occur, values that are read from the second data set replace those that are read from the first data set; also, the number of observations in the new data set is the number of observations in the smallest original data set
SCAN function
returns the nth word from a character string; enables you to separate a character value into words and to return a specified word
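A short sketch of SCAN, shown next to SUBSTR for comparison. The sample values and the delimiter list (comma and blank) are chosen for illustration.

```sas
data _null_;
   name = 'Smith, John A.';
   second = scan(name, 2, ', ');    /* John: comma and blank are the delimiters */
   last   = scan(name, -1, ', ');   /* A.: a negative count scans from the right */

   datestr = '16OCT2017';
   mon = substr(datestr, 3, 3);     /* OCT: three characters starting at position 3 */

   putlog second= last= mon=;
run;
```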
CONTAINS (?) operator
selects observations that include the specified substring; symbol is the ?
operators
special-character operators, grouping parentheses, or functions
<SAS-data-set-options>
specifies SAS data set options; cannot specify data set options when importing delimited, comma-separated, or tab-delimited external files; for example, ALTER=, PW=, READ=, or WRITE=
'30/360'
specifies a 30-day month and a 360-day year; valid for the DATDIF and YRDIF functions
WHERE expression
specifies a condition for selecting observations; should be only one in a step; if multiple statements are issued, only the last statement is processed
SAS-file-specification
specifies an entire library or a specific SAS data set within a library; can take one of the following forms: <libref.>SAS-data-set names one SAS data set to process; <libref.>_ALL_ requests a listing of all files in the library (use a period (.) to append _ALL_ to the libref)
DESCENDING option
specifies that the data set is sorted in descending order by the variable that immediately follows
NOTSORTED option
specifies that the observations in the data set that have the same BY values are grouped together, but are not necessarily sorted in alphabetical or numeric order
SET statement
specifies the SAS data set that you want to use as input data for your DATA step
DATAFILE = "filename" | "fileref"
specifies the complete path and filename or fileref for the input PC file, spreadsheet, or delimited external file (omit the quotation marks if the fileref, complete path, or filename does not include special characters)
LIBRARY=libref
specifies the libref for a SAS library to store a permanent catalog of user-defined formats
TABLE= "tablename"
specifies the name of the input DBMS table; if the name does not include special characters (such as question marks), lowercase characters, or spaces, you can omit the quotation marks; NOTE that the DBMS table name might be case sensitive
FIRSTOBS=
specifies the number of the first observation to process
OBS=
specifies the number of the last observation to process
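FIRSTOBS= and OBS= used together can be sketched as follows, assuming the SASHELP.CLASS sample data set is available. Note that OBS= names the last observation to process, not a count.

```sas
/* Processes observations 5 through 10, which is six observations in total */
proc print data=sashelp.class(firstobs=5 obs=10);
run;
```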
ORDER= DATA | FORMATTED | FREQ | INTERNAL
specifies the order of the variable levels in the frequency and crosstabulation tables, which you request in the TABLES statement
VALIDVARNAME = system option
specifies the rules for valid SAS variable names that can be created and processed during a SAS session; set these rules using the VALIDVARNAME= system option
<DBMS=identifier>
specifies the type of data to import
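The PROC IMPORT options described in this guide (DATAFILE=, OUT=, DBMS=, REPLACE) fit together roughly as in this sketch. The file path and the output data set name are hypothetical placeholders.

```sas
/* The path below is a made-up placeholder; substitute your own file location */
proc import datafile="/folders/myfolders/sales.csv"
            out=work.sales     /* output SAS data set */
            dbms=csv           /* type of data to import */
            replace;           /* overwrite work.sales if it already exists */
   getnames=yes;               /* take variable names from the first row */
run;
```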
$w.
specifies values as character values in w spaces
MMDDYYw.
specifies values as date values of the form 09/12/17 (MMDDYY8.) or 09/12/2017 (MMDDYY10.)
DATEw.
specifies values as date values of the form 16OCT17 (DATE7.) or 16OCT2017 (DATE9.)
w.d
specifies values that are rounded to d decimal places in w spaces
w.
specifies values that are rounded to the nearest integer in w spaces
COMMAw.d
specifies values that contain commas and decimal places
DOLLARw.d
specifies values that contain dollar signs, commas, and decimal places
subsetting IF statement
statement causes the DATA step to continue processing only those observations that meet the condition of the expression specified in the IF statement
DELETE statement
statement determines which observations to omit as you read data
DO UNTIL expression
statement executes a DO loop until the expression becomes true; expression is evaluated after each execution of the loop, so that the statements inside the group are executed repetitively until the expression is true (always executes at least once)
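The "always executes at least once" behavior can be seen by comparing DO UNTIL with DO WHILE on a starting value that already satisfies the exit condition. The variable name and message text are arbitrary.

```sas
data _null_;
   n = 10;

   /* The condition is already true, but DO UNTIL checks it only after the body runs */
   do until (n >= 5);
      n = n + 1;
      putlog 'DO UNTIL body ran: ' n=;
   end;

   /* DO WHILE checks first; n < 5 is false on entry, so the body never runs */
   do while (n < 5);
      putlog 'DO WHILE body ran: ' n=;
   end;
run;
```

The log shows a single line from the DO UNTIL loop (with n=11) and nothing from the DO WHILE loop.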
DATA statement
statement indicates the beginning of the DATA step and names the SAS data set to be created
ELSE statement
statement must immediately follow the IF-THEN statement in your program; executes only if the condition in the preceding IF-THEN statement is false
FILENAME statement
statement used to point to the location of the external file that contains the data
global statements
statements that are used anywhere in a SAS program and stay in effect until changed or canceled, or until the SAS session ends; do not require a RUN statement
PROC FORMAT
stores user-defined formats and informats as entries in a SAS catalog
SUM
sum
NOPRINT
suppresses the display of all output
NODS
suppresses the printing of detailed information about each file when you specify _ALL_; can only be specified together with _ALL_
LRECL= system option
system option specifies the default logical record length to use when reading external files
T
the Student's t statistic to test the null hypothesis that the population mean is equal to μ0
MEAN
the arithmetic mean or average of all the values
Webwork
the default output library in interactive mode when using SAS Studio
filename
the fully qualified name or location of the file
Q1 | P25
the lower quartile or 25th percentile
MAX
the maximum value
MEDIAN | P50
the middle value or the 50th percentile
MIN
the minimum value
SAS-data-library
the name of a SAS library in which SAS data files are stored; specification of the physical name of the library differs by operating environment
engine
the name of a library engine that is supported in your operating environment; when the default is used it does NOT have to be specified in the LIBNAME statement
Fileref
the name that you associate with an external file; the name must be one to eight characters long, begin with a letter or underscore, and contain only letters, numbers, or underscores; when associated with an individual external file, it can be specified in subsequent SAS statements and commands
NMISS
the number of observations with missing values
N
the number of observations with nonmissing values
_N_
the number of times a data step iterated; displays debugging messages for a specified number of iterations of the DATA step
UCLM
the one-sided confidence limit above the mean
LCLM
the one-sided confidence limit below the mean
CV
the percent coefficient of variation
STDERR | STDMEAN
the standard error of the mean
CSS
the sum of squares corrected for the mean
SUMWGT
the sum of weights
CLM
the two-sided confidence limit for the mean
PROBT | PRT
the two-tailed p-value for Student's t statistic, T, with n-1 degrees of freedom. This value is the probability under the null hypothesis of obtaining a more extreme value of T than is observed in this sample
Q3 | P75
the upper quartile or 75th percentile
BY value
the value of the BY variable
USS
the value of the uncorrected sum of squares
MODE
the value that occurs most frequently
target variable
the variable to which the result of a function is assigned Ex: AvgScore = mean (exam1, exam2, exam3);
modifier t
trims trailing blanks from string and substring
PUTLOG statement
use this statement to ensure that debugging messages are written to the SAS log and not to the external file; can be used to write to the SAS log in both batch and interactive modes
assignment statement
used in any DATA step in order to modify existing values or create new variables; -transform variables -create new variables -conditionally process variables -calculate new values -assign new values
MAXDEC = option
used in the PROC MEANS statement to limit the number of decimal places displayed
VALUE statement
used to define a format for displaying one or more values
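A minimal sketch of defining a format with the VALUE statement and applying it. The format name agegrp and its ranges are invented for this example, and SASHELP.CLASS is assumed to be available.

```sas
/* agegrp is a hypothetical user-defined format; without LIBRARY= it is stored in the WORK library */
proc format;
   value agegrp low -< 13 = 'Child'
                13  -< 20 = 'Teen'
                20 - high = 'Adult';
run;

proc print data=sashelp.class noobs;
   var name age;
   format age agegrp.;   /* display age using the labels defined above */
run;
```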
CLASS statement
used to produce separate analyses of grouped observations
VALIDMEMNAME= system option
used to specify rules for naming SAS data sets
PUT statement
used when the source of the program error is not apparent; statement is used to examine variable values and to print your own message in the log
extended attributes
user-defined metadata that is defined for a data set or for a variable (column); represented as name-value pairs
'ACT/360'
uses the actual number of days between dates in calculating the number of years (calculated by the number of days divided by 360); valid for the YRDIF function
'ACT/365'
uses the actual number of days between dates in calculating the number of years (calculated by the number of days divided by 365)
'ACT/ACT'
uses the actual number of days or years between dates; valid for the DATDIF and YRDIF functions
OUTPUT <SAS-data-set(s)>
using an output statement without a following data set name causes the current observation to be written to all data sets that are specified in the DATA statement
operands
variable names or constants; can be numeric, character, or both
FIRST.variable and LAST.variable
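variables that SAS creates for each BY variable; FIRST.variable is set to 1 when SAS processes the first observation in a BY group, and LAST.variable is set to 1 when SAS processes the last observation in a BY group (each is 0 otherwise); these assignments enable you to take different actions, based on whether processing is starting for a new BY group or ending for a BY group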
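A common use of FIRST.variable and LAST.variable is accumulating a total per BY group. The sketch below assumes a hypothetical data set work.sales with the variables region and amount.

```sas
/* Hypothetical input data; names and values are illustrative only */
data work.sales;
   input region $ amount;
   datalines;
East 100
West 250
East 50
West 75
;
run;

proc sort data=work.sales;
   by region;                          /* BY-group processing needs sorted (or grouped) data */
run;

data work.totals;
   set work.sales;
   by region;
   if first.region then total = 0;     /* reset at the start of each BY group */
   total + amount;                     /* sum statement: total is retained across iterations */
   if last.region then output;         /* write one observation per BY group */
   keep region total;
run;
```

With these values, work.totals holds one row per region: East with a total of 150 and West with a total of 325.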
character variables
variables that can contain any values; can be up to 32,767 bytes long; a blank is the default missing value; default informat is $w.
numeric variables
variables that can contain only numeric values (the numerals 0 through 9, +, -, and E for scientific notation); has a default length of 8 bytes (stored as floating point numbers in 8 bytes of storage); a period (.) is the default missing value; default informat is w.d
VAR
variance
EXCEL
writes Excel spreadsheet files that are compatible with Microsoft Office 2010 and later versions