SAS Certification Prep Guide: Base Programming

To specify a condition based on the value of a character variable, follow these rules:

(1) Enclose the value in quotation marks (2) Write the value with lowercase, uppercase, or mixed case letters exactly as it appears in the data set
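
A minimal sketch of both rules follows. The data set Work.Staff and the variable JobTitle are hypothetical names chosen only for this illustration.

```sas
/* Hypothetical data: Work.Staff with a character variable JobTitle */
data work.staff;
   input Name $ JobTitle $;
   datalines;
Ana Manager
Ben manager
Cy Analyst
;
run;

/* The value is quoted and must match the stored case exactly, */
/* so 'Manager' does not match the stored value 'manager'.     */
proc print data=work.staff;
   where JobTitle = 'Manager';
run;
```

Under these assumptions only the observation for Ana is printed, because the comparison is case sensitive.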

As a step is compiled, SAS recognizes the end of the current step when it encounters one of the following statements:

(1) a DATA or PROC statement, which indicates the beginning of a new step (2) a RUN or QUIT statement, which indicates the end of the current step

What pieces of information does SAS need in the DATA step in order to read an Excel workbook file?

(1) a libref to reference the Excel workbook to read (2) the name and location (using another libref) of the new SAS data set (3) the name of the Excel worksheet that is to be read

Which set of steps should you perform to revise and resubmit the program?

(1) correct the errors (2) clear the SAS log (3) resubmit the program (4) check the SAS log

Examples of syntax errors

-misspelled keywords -missing RUN statement -unbalanced quotation marks -missing semicolons -invalid options

SORT Procedure

-rearranges the observations in a SAS data set -creates a new SAS data set that contains the rearranged observations -replaces the original SAS data set by default -can sort on multiple variables -can sort in ascending or descending order -treats missing values as the smallest possible values

libref

1 to 8 characters long, begins with a letter or underscore, and contains only letters, numbers, or underscores; specified as the first element in the two-level name for a SAS file

RETAIN statement

1. assigns an initial value to a retained variable 2. prevents variables from being initialized each time the DATA step executes
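
As a rough illustration of both points, the sketch below keeps a running total. The data set name Work.Running, the variables Amount and Total, and the starting value of 100 are assumptions made for this example.

```sas
data work.running;
   input Amount;
   /* Total starts at 100 and is not reset to missing on each iteration */
   retain Total 100;
   Total = Total + Amount;
   datalines;
10
20
30
;
run;

proc print data=work.running;
run;
```

With these three input values, Total is 110, 130, and 160 in the three output observations.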

P10

10th percentile

P1

1st percentile

P5

5th percentile

P90

90th percentile

P95

95th percentile

P99

99th percentile

VALIDVARNAME = ANY

ANY specifies that SAS variable names must follow these rules: (1) The name can begin with or contain any characters, including blanks, national characters, special characters, and multi-byte characters (2) The name can be up to 32 bytes long (3) The name cannot contain any null bytes (4) Leading blanks are preserved, but trailing blanks are ignored (5) The name must contain at least one character. A name with all blanks is not permitted (6) A variable name can contain mixed-case letters. SAS stores and writes the variable name in the same case that is used in the first reference to the variable (However when SAS processes a variable name, SAS internally converts it to uppercase. Therefore, you cannot use the same variable name with a different combination of uppercase and lowercase letters to represent different variables)

The SUBSTR function replaces variable values if it is placed on the left side of an assignment statement. When placed on the right side the function extracts a substring.

Because of the growth within the 919 area code, the telephone exchange 555 is being reassigned to the 920 area code. The data set Clients.Piedmont includes the variable Phone, which contains telephone numbers in the form 919-555-1234. Which of the following programs correctly changes the values of Phone?
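
One possible way to write such a program is sketched below. It uses a small stand-in data set, Work.Piedmont, in place of Clients.Piedmont, and the variable Exchange is a helper introduced only for this example.

```sas
/* Stand-in for Clients.Piedmont with two sample phone numbers */
data work.piedmont;
   input Phone $12.;
   datalines;
919-555-1234
919-444-9876
;
run;

data work.piedmont_new;
   set work.piedmont;
   /* Right side of the assignment: extract the 3-character exchange */
   Exchange = substr(Phone, 5, 3);
   /* Left side of the assignment: replace the first 3 characters */
   if Exchange = '555' then substr(Phone, 1, 3) = '920';
run;
```

In this sketch only the first phone number changes, to 920-555-1234; the second is left as is because its exchange is 444.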

both character and numeric variables

By default, PROC FREQ creates a table of frequencies and percentages for which data set variables?

VALIDMEMNAME = COMPATIBLE

COMPATIBLE is the default system option; specifies that a SAS data set name must follow these rules: (1) The length of the names can be up to 32 characters long (2) Names must begin with a letter of the Latin alphabet (A-Z, a-z) or an underscore. Subsequent characters can be letters of the Latin alphabet, numerals, or underscores (3) Names cannot contain blanks or special characters except for an underscore (4) Names can contain mixed-case letters. SAS stores and writes the variable name in the same case that is used in the first reference to the variable (However when SAS processes a variable name, SAS internally converts it to uppercase. Therefore, you cannot use the same variable name with a different combination of uppercase and lowercase letters to represent different variables)

VALIDMEMNAME = EXTEND

EXTEND specifies that the data set name must follow these rules: (1) Names can include national characters (2) The name can include special characters, except for the / \ * ? " < > | : characters (3) The name must contain at least one character (4) The length of the name can be up to 32 bytes (5) Null bytes are not allowed (6) Leading and trailing blanks are deleted when the member is created (7) Names can contain mixed-case letters. SAS internally converts the member name to uppercase. Therefore, you cannot use the same member name with a different combination of uppercase and lowercase letters to represent different members

Syntax, FILENAME statement

FILENAME fileref 'filename';

- if the modifier is a constant, enclose it in quotation marks -specify multiple constants in a single set of quotation marks -modifier values are not case sensitive

Facts about modifiers and constants

categorical values

Frequency distributions work best with variables that contain which types of values?

Specify OBS=MAX in the options statement

How do you reset the number of the last observation to process?

32,767

How many characters can be used in a label in the VALUE statement of PROC FORMAT?

Syntax, ID statement

ID variable(s);

DATAROW=

IMPORT procedure statement that indicates at which row SAS begins to read the data

GUESSINGROWS=

IMPORT procedure statement that indicates how many rows SAS scans for variables to determine the type and length

GETNAMES=

IMPORT procedure statement that modifies whether SAS extracts the variable names from the first row of the data set

-prints the word ERROR followed by an error message in the SAS log -compiles but does not execute the step where the error occurred and prints the following message: NOTE: The SAS System stopped processing this step because of errors

If SAS cannot interpret a syntax error during the compilation phase, what does it do?

A RUN statement is not required between steps in a SAS program. However it is a best practice to use a RUN statement because it makes the SAS program easier to read and the SAS log easier to understand when debugging. SAS assumes that the beginning of a new step implies the end of the previous step

Is a RUN statement required between steps in a SAS program?

KEEP variable(s);

KEEP statement

Syntax, LABEL statement

LABEL variable1 = 'label1' variable2 = 'label2'; Labels can be up to 256 characters long. Enclose the label in quotation marks.

LIBNAME libref engine 'SAS-data-library';

LIBNAME statement

Syntax, OUTPUT statement

OUTPUT <SAS-data-set(s)>;

Syntax, PAGEBY statement

PAGEBY BY-variable; *the variable specified in the PAGEBY statement must also be specified in the BY statement in the PROC PRINT step

PROC CONTENTS DATA = SAS-file-specification NODS;

PROC CONTENTS

Syntax, FREQ procedure

PROC FREQ DATA = SAS-data-set <NLEVELS>; TABLES variable(s); RUN;

Syntax, PROC IMPORT

PROC IMPORT DATAFILE = "filename" | TABLE = "tablename" OUT=<libref.SAS-data-set><SAS-data-set-options> <DBMS=identifier> <REPLACE>;

Syntax, MEANS procedure

PROC MEANS DATA = SAS-data-set <statistics>; VAR variable(s); RUN;

Syntax, PROC PRINT step

PROC PRINT DATA=SAS-data-set; RUN;

Syntax, PROC SORT

PROC SORT DATA=SAS-data-set <OUT=SAS-data-set>; BY <DESCENDING> BY-variable(s); RUN;

Syntax, PUT statement

PUT specification(s); specification specifies what is written, how it is written, and where it is written

Syntax, PUTLOG statement

PUTLOG 'message';

RTF

Rich Text Format

(1) Must begin with a letter (A-Z, either uppercase or lowercase) or an underscore (_) (2) They can continue with any combination of numbers, letters, or underscores (3) They can be 1 to 32 characters long (4) SAS library names (librefs) can be 1 to 8 characters long

Rules for SAS names

Output

SAS data sets

January 1, 1960

SAS date values are the number of days since which date?
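
The following sketch writes a few date constants to the log as raw numbers to show where the count starts. The variable names d0, d1, and dneg are arbitrary.

```sas
data _null_;
   d0   = '01JAN1960'd;   /* the reference date */
   d1   = '02JAN1960'd;   /* one day later */
   dneg = '31DEC1959'd;   /* one day earlier */
   /* Without a format, the stored numeric values are written */
   put d0= d1= dneg=;
   /* With a format, the same stored value is shown as a date */
   put d1= date9.;
run;
```

The first PUT statement writes d0=0 d1=1 dneg=-1, and the second writes d1=02JAN1960.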

leap years

SAS does not automatically make adjustments for daylight saving time, but it does make adjustments for which?

step boundary

SAS executes any statement that has not been previously executed and ends the step

When a program that contains an error is submitted, what messages appear in the SAS log?

SAS: -displays the word ERROR -identifies the possible location of the error -gives an explanation of the error

Syntax, SUM statement

SUM variable(s);

What should you do after submitting a program with unbalanced quotation marks in the Windows or UNIX operating environment?

Simply adding a quotation mark and resubmitting your program usually does not solve the problem. SAS still considers the quotation marks to be unbalanced. To correct the error, you need to resolve the unbalanced quotation mark before you recall, correct, and resubmit the program

VALIDVARNAME = V7

Specifies that variable names must follow these rules: (1) SAS variable names can be up to 32 characters long (2) The first character must be a letter of the Latin alphabet (A-Z, either uppercase or lowercase) or an underscore (_). Subsequent characters can be letters of the Latin alphabet, numerals, or underscores (3) Trailing blanks are ignored. The variable name alignment is left-justified (4) A variable name can contain mixed-case letters. SAS stores and writes the variable name in the same case that is used in the first reference to the variable (However when SAS processes a variable name, SAS internally converts it to uppercase. Therefore, you cannot use the same variable name with a different combination of uppercase and lowercase letters to represent different variables) (5) Do NOT assign variables the names of special SAS automatic variables (such as _N_ and _ERROR_) or variable list names (such as _NUMERIC_, _CHARACTER_, and _ALL_)

tab-delimited values

Specify DBMS=DLM to import any other delimited file that does not end in .CSV

only for the current SAS session

Suppose you do not specify the LIBRARY= option and your formats are stored in Work.Formats. How long do they exist?

The TRIM function removes trailing blanks from character values. In this case, extra blanks must be removed from the values of FirstName. Although answer c also works, the extra TRIM function for the variable LastName is unnecessary. Because of the LENGTH statement, all values of FullName are padded to 40 characters

Suppose you need to create the variable FullName by concatenating the values of FirstName, which contains first names, and LastName, which contains last names. What is the best way to remove extra blanks between first names and last names?
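
A minimal sketch of the approach described in this card is shown below. The sample values and the lengths of 20 and 40 are assumptions; FullName2 is an extra variable added here only to show the CATX function (listed later in this set) as an alternative.

```sas
data work.names;
   length FirstName LastName $ 20 FullName FullName2 $ 40;
   FirstName = 'Maria';
   LastName  = 'Lopez';
   /* TRIM removes the trailing blanks that pad FirstName to 20 characters */
   FullName  = trim(FirstName) || ' ' || LastName;
   /* CATX strips leading and trailing blanks and inserts the separator */
   FullName2 = catx(' ', FirstName, LastName);
   put FullName= FullName2=;
run;
```

Trimming LastName as well would not change the stored result, because FullName is padded to its declared length in any case.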

CATX(separator, string-1<, ...string-n>)

Syntax, CATX function

CEIL(argument)

Syntax, CEIL function

COMPBL(source)

Syntax, COMPBL function

COMPRESS(source<,characters><,modifier(s)>)
Modifiers:
A - adds alphabetic characters to the list of characters
C - adds control characters to the list of characters
D - adds digits to the list of characters
F - adds the underscore character and English letters to the list of characters
G - adds graphic characters to the list of characters
H - adds a horizontal tab to the list of characters
I - ignores the case of the characters to be kept or removed
K - keeps the characters in the list instead of removing them
L - adds lowercase letters to the list of characters
N - adds digits, the underscore character, and English letters to the list of characters
O - processes the second and third arguments once rather than every time the COMPRESS function is called
P - adds punctuation marks to the list of characters
S - adds space characters (blank, horizontal tab, vertical tab, carriage return, line feed, and form feed) to the list of characters
T - trims trailing blanks from the first and second arguments
U - adds uppercase letters to the list of characters
W - adds printable characters to the list of characters
X - adds hexadecimal characters to the list of characters

Syntax, COMPRESS function

DATA output-SAS-data-set; SET SAS-data-set-1 SAS-data-set-2; RUN;

Syntax, DATA step for concatenating

DATA output-SAS-data-set; MERGE SAS-data-set-1 SAS-data-set-2; BY <DESCENDING> variable(s); RUN;

Syntax, DATA step for match-merging

DATA output-SAS-data-set; SET SAS-data-set-1; SET SAS-data-set-2; RUN;

Syntax, DATA step for one-to-one reading

DATDIF (start_date, end_date, basis) YRDIF (start_date, end_date, basis)

Syntax, DATDIF, and YRDIF functions

DATE ( ) ***The DATE function does not require any arguments, but it must be followed by parentheses

Syntax, DATE function:

DELETE;

Syntax, DELETE statement

DO UNTIL (expression); ....more SAS statements.... END;

Syntax, DO UNTIL statement

DO WHILE (expression); ....more SAS statements... END;

Syntax, DO WHILE statement
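
To contrast the two loop forms above, here is a small sketch. The variable n and the limit of 5 are arbitrary choices for the illustration.

```sas
data _null_;
   n = 5;

   /* DO WHILE tests the condition at the top of the loop, */
   /* so the body is skipped when the condition is false   */
   do while (n < 5);
      n + 1;
   end;
   put 'After DO WHILE: ' n=;

   /* DO UNTIL tests the condition at the bottom of the loop, */
   /* so the body always executes at least once               */
   do until (n >= 5);
      n + 1;
   end;
   put 'After DO UNTIL: ' n=;
run;
```

In this sketch n is still 5 after the DO WHILE loop and 6 after the DO UNTIL loop.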

DO; SAS statements END;

Syntax, DO group

DO index-variable=specification-1 <,...specification-n>; .....more SAS statements... END;

Syntax, DO statement, iterative

DROP variable(s);

Syntax, DROP statement

(DROP=variable(s))

Syntax, DROP= data set options

ELSE statement;

Syntax, ELSE statement

FIND (string,substring<,modifiers><,startpos>)

Syntax, FIND function

FLOOR(argument)

Syntax, FLOOR function

FORMAT variable(s) format-name;

Syntax, FORMAT statement

PROC FREQ <options>; RUN;

Syntax, FREQ procedure

IF expression THEN statement;

Syntax, IF-THEN statement

(IN = variable)

Syntax, IN = data set option

INDEX (source, excerpt)

Syntax, INDEX function

INPUT (source, informat)

Syntax, INPUT function

INT(argument)

Syntax, INT function

INTCK ('interval', from, to) 'interval' specifies a character constant or a variable. Interval can appear in uppercase or lowercase. NOTE: The type of interval (date, time, or datetime) must match the type of value in from

Syntax, INTCK function

INTNX ('interval', start-from, increment <,'alignment'>) NOTE: The type of interval (date, time, or datetime) must match the type of value in 'start-from' and 'increment'

Syntax, INTNX function
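
The sketch below applies INTCK and INTNX to a pair of arbitrary dates. The variable names and the dates themselves are assumptions made for the example.

```sas
data _null_;
   start  = '15JAN2024'd;
   finish = '02MAR2024'd;
   /* INTCK counts the month boundaries crossed between the two dates */
   months = intck('month', start, finish);
   /* INTNX advances by one month; by default the result is aligned */
   /* to the beginning of the interval                              */
   next   = intnx('month', start, 1);
   put months= next= date9.;
run;
```

Two month boundaries (February 1 and March 1) lie between the dates, so months is 2, and next is written as 01FEB2024.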

(KEEP=variable(s))

Syntax, KEEP= data set options

LEFT(argument) RIGHT(argument)

Syntax, LEFT and RIGHT function

LENGTH variable(s) <$> length;

Syntax, LENGTH statement

LOWCASE(argument)

Syntax, LOWCASE function

MDY (month, day, year)

Syntax, MDY function

PROC MEANS <DATA=SAS-data-set> <statistic-keyword(s)><option(s)>; RUN;

Syntax, MEANS procedure

ODS EXCEL <(<ID =>identifier)><action>; ODS EXCEL <(<ID =>identifier)><option(s)>;

Syntax, ODS EXCEL statement

ODS HTML BODY = file-specification; ODS HTML CLOSE;

Syntax, ODS HTML statement

ODS HTML BODY = body-file-specification CONTENTS = contents-file-specification FRAME = frame-file-specification; ODS HTML CLOSE; NOTE: If you specify the FRAME= option, you must also specify the CONTENTS= option

Syntax, ODS HTML statement to create a linked table of contents:

ODS PDF <(<ID => identifier)> <action>;

Syntax, ODS PDF statement

ODS RTF <(<ID =>identifier)><action>;

Syntax, ODS RTF statement

ODS open-destination; ODS close-destination CLOSE;

Syntax, ODS statement to open and close destinations

PATH = file-location-specification <(URL = NONE | "Uniform-Resource-Locator")>

Syntax, PATH = option with the URL = suboption

PROC FORMAT <options>;

Syntax, PROC FORMAT statement

PROPCASE (argument<, delimiter(s)>)

Syntax, PROPCASE function

PUT (source, format)

Syntax, PUT function

(RENAME = (old-variable-name = new-variable-name))

Syntax, RENAME = data set option

RETAIN variable <initial-value>;

Syntax, RETAIN statement

ROUND(argument, round-off-unit)

Syntax, ROUND function

function-name (argument-1 <, argument-n>);

Syntax, SAS function

SCAN (argument, n<,<delimiters>>)

Syntax, SCAN function

STYLE = style-name;

Syntax, STYLE = option

SUBSTR(argument, position<,n>)

Syntax, SUBSTR function

TODAY ( ) ***The TODAY function does not require any arguments, but it must be followed by parentheses

Syntax, TODAY function:

TRANWRD (source, target, replacement)

Syntax, TRANWRD function

TRIM (argument)

Syntax, TRIM function

UPCASE (argument)

Syntax, UPCASE function

(URL = "Uniform-Resource-Locator")

Syntax, URL = suboption in a file specification

VALUE format-name range1='label1' range2='label2' ...more format-names...;

Syntax, VALUE statement
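
As an illustration of the VALUE statement, the sketch below defines and applies a numeric format. The format name AGEGRP, the ranges, the labels, and the data set Work.People are all hypothetical.

```sas
/* Define a numeric format with ranges and a catch-all label */
proc format;
   value agegrp
      low - 12  = 'Child'
      13 - 19   = 'Teen'
      20 - high = 'Adult'
      other     = 'Unknown';
run;

data work.people;
   input Age;
   datalines;
8
15
42
.
;
run;

/* The format changes only how Age is displayed, not the stored value */
proc print data=work.people;
   format Age agegrp.;
run;
```

The missing value falls under OTHER here, because LOW does not include missing numeric values.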

The WEEKDATEw. format writes date values in a format that displays the day of the week, month, day, and year; in the form day-of-week, month-name dd, yy (or yyyy)

Syntax, WEEKDATEw. format

WEEKDAY(date) date is a SAS date value that is specified either as a variable or as a SAS date constant

Syntax, WEEKDAY function:

The WORDDATEw. format writes date values in the form month-name dd, yyyy.

Syntax, WORDDATEw. format

YEAR(date) QTR(date) MONTH(date) DAY(date) date is a SAS date value that is specified either as a variable or as a SAS date constant

Syntax, YEAR, QTR, MONTH, and DAY functions:

variable = expression

Syntax, assignment statement

'ddmmmyy'd OR "ddmmmyy"d

Syntax, date constant

IF expression;

Syntax, subsetting IF statement

variable + expression;

Syntax, sum statement

The DATETIMEw. informat reads expressions that consist of two parts, a date value and a time value, in the form: ddmmmyy hh:mm:ss.ss

Syntax, values read with DATETIMEw. informat

The DATEw. informat reads date values in the form ddmmmyy or ddmmmyyyy

Syntax, values read with DATEw. informat

The MMDDYYw. informat reads date values in the form MMDDYY or MMDDYYYY

Syntax, values read with MMDDYYw. informat

The TIMEw. informat reads values in the form hh:mm:ss.ss. The minimum acceptable field width for the TIMEw. informat is 5

Syntax, values read with TIMEw. informat

Syntax, TITLE and FOOTNOTE statements

TITLE<n>'text'; FOOTNOTE<n>'text'; n- a number 1 to 10

standard deviation

The default statistics produced by the MEANS procedure are n-count, mean, minimum, maximum, and which one of the following statistics:

QRANGE

The interquartile range, calculated as the difference between the upper and lower quartiles, Q3-Q1

The SCAN function is used to extract words from a character value when you know the order of the words, when their position varies, and when the words are marked by some delimiter

The variable Address2 contains values such as Piscataway, NJ. How do you assign the two-letter state abbreviations to a new variable named State?

The SUBSTR function is best used when you know the exact position of the substring to extract from the character value. You specify the position to start from and the number of characters to extract

The variable IDCode contains values such as 123FA and 321MB. The fourth character identifies sex. How do you assign these character codes to a new variable named Sex?
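
The two cards above can be sketched in a single DATA step. The sample values come from the cards; the variable lengths are assumptions.

```sas
data _null_;
   length Address2 $ 20 IDCode $ 5;
   Address2 = 'Piscataway, NJ';
   IDCode   = '123FA';
   /* SCAN: second word; the default delimiters include blank and comma */
   State = scan(Address2, 2);
   /* SUBSTR: one character starting at position 4 */
   Sex   = substr(IDCode, 4, 1);
   put State= Sex=;
run;
```

The log shows State=NJ Sex=F. SCAN fits the first case because the position of the state varies with the length of the city name, while SUBSTR fits the second because the code always sits in the fourth position.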

VALIDVARNAME = UPCASE

UPCASE specifies that the variable name follows the same rules as V7, except that the variable name is uppercase, as in earlier versions of SAS

JMP files

Use DBMS=JMP to specify importing JMP files. JMP variable names can be up to 255 characters long. SAS supports importing JMP files that have more than 32,767 variables

VALIDMEMNAME = EXTEND system option

Use when the characters in a SAS data set name contain one of the following: (1) international characters (2) characters supported by third-party databases (3) characters that are commonly used in a filename

as many as you want

Using ODS statements, how many types of output can you generate at once?

Syntax, VAR statement

VAR variables;

Syntax, WHERE statement

WHERE where-expression;

PROC FREQ PROC MEANS

What SAS procedures can be used to detect invalid data?

TITLE, LIBNAME, OPTIONS, and FOOTNOTE

What are some common global statements?

Sashelp, Sasuser, Work

What are the predefined SAS libraries?

- the new data set contains all the variables from all the input data sets. If the data sets contain variables that have the same names, the values that are read from the last data set overwrite the values that were read from earlier data sets -the number of observations in the new data set is the number of observations in the smallest original data set. Observations are combined based on their relative position in each data set.

What are the products of one-to-one reading?
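
Both products can be seen in a small sketch. The data sets Work.A and Work.B and their variables are hypothetical.

```sas
data work.a;
   input Id X;
   datalines;
1 10
2 20
3 30
;
run;

data work.b;
   input Id Y;
   datalines;
7 100
8 200
;
run;

/* One-to-one reading: two SET statements, no BY statement */
data work.one2one;
   set work.a;
   set work.b;   /* Id from Work.B overwrites Id from Work.A */
run;

proc print data=work.one2one;
run;
```

Work.One2one has two observations, matching the smaller input data set. Id holds 7 and 8 (the values from Work.B), alongside X values 10 and 20 and Y values 100 and 200.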

(1) statements that are used in DATA and PROC steps (2) statements that are global in scope and can be used anywhere in a SAS program

What are two types of SAS statements?

- the length of the variable's first reference in the DATA step - the assignment statement - the LENGTH statement

What determines the length of a new variable?

-misspelled keywords and data set names -unbalanced quotation marks -invalid options

What errors are detected during the compilation phase?

(1) program data vector (PDV) (2) descriptor information

What is created during the compilation phase?

Define the libraries. To reference a permanent SAS file: (1) assign a name (libref) to the SAS library in which the file is stored (2) use the libref as the first part of the two-level name (libref.filename) to reference the file within the library

What is often the first step in setting up your SAS session?

They can be used in calculations like other numeric values

What is the advantage of storing dates and times as SAS numeric date and time values?

1) Unlike CLASS processing, BY-group processing requires that your data already be indexed or sorted in the order of the BY variables. You might need to run the SORT procedure before using the PROC MEANS with a BY group 2) A CLASS statement produces a single large table, whereas BY-group processing creates a series of small tables. The order of the variables in the CLASS statement determines their order in the output table

What is the difference between CLASS processing and BY-group processing?
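
The difference can be sketched with Sashelp.Class, assuming that sample data set is available at your site. Work.Class_sorted is a name chosen for this example.

```sas
/* CLASS processing: no sorting needed, one table with a row per Sex value */
proc means data=sashelp.class;
   class Sex;
   var Height;
run;

/* BY-group processing: the data must first be sorted (or indexed) by Sex */
proc sort data=sashelp.class out=work.class_sorted;
   by Sex;
run;

/* Produces a separate small table for each value of Sex */
proc means data=work.class_sorted;
   by Sex;
   var Height;
run;
```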

5

What is the minimum width of the TIMEw. informat?

- the data set descriptor -the program data vector -the _N_ and _ERROR_ automatic variables

What is written to the output during the compilation phase?

- they can specify a single value, such as 24 or 'S' -a range of numeric values, such as 0-1500 -a range of character values, such as 'A'-'M'

What ranges can a VALUE statement specify?

(1) compilation phase (2) execution phase

In what two phases is a SAS DATA step processed?

Frequency distributions work best with variables whose values are categorical, and whose values are better summarized by counts rather than by averages

What variables do frequency distributions work best with?

-A note, warning, or error message is displayed in the SAS log -The values that are stored in the PDV are displayed in the SAS log -The processing of the step either continues or stops

When SAS detects an error during the execution phase, what happens (depending on the type of error)?

-a character value is assigned to a previously defined numeric variable -a character value is used in an arithmetic operation -a character value is compared to a numeric value, using a comparison operator -a character value is specified in a function that requires numeric arguments

When does automatic character-to-numeric conversion occur?

numeric data values are converted to character values whenever they are used in a character context

When does automatic numeric-to-character conversion occur?

-you know the order of the words in the character value -the starting position of the words varies -the words are marked by some delimiter

When should you use the SCAN function?

-extracts a portion of a value by starting at a specified location -is best used when you know the exact position of the string you want to extract from the character value

When should you use the SUBSTR function?

D:\Output\body.html

When the following code runs, what file is loaded by the links in D:\Output\contents.html

How to reference a fully qualified single external file?

When you associate a fileref with an individual external file, you specify the fileref in subsequent SAS statements and commands

How can you tell whether you have specified an invalid option in a SAS program?

When you submit a SAS statement that contains an invalid option, a log message notifies you that the option is not valid or not recognized. You should recall the program, remove or replace the invalid option, check your statement syntax as needed, and resubmit the corrected program

OTHER

Which keyword can be used to label missing numeric values as well as any values that are not specified in a range?

FMTLIB

Which keyword, when added to the PROC FORMAT statement, displays all the formats in your catalog?

VAR statement VAR boarded transfer deplane;

Which statement limits a PROC MEANS analysis to the variables Boarded, Transfer, and Deplane?

Use the INDEX function in a subsetting IF statement, enclosing the character string in quotation marks. Only those observations in which the function locates the string and returns a value greater than 0 are written to the data set

Within the data set Cert.Bookcase, the variable Finish contains values such as ash, cherry, teak, matte-black. Which of the following creates a subset of the data in which the values of Finish contain the string walnut? Make the search for the string case-insensitive.
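
One way to write such a subset is sketched below, using a small stand-in data set, Work.Bookcase, in place of Cert.Bookcase. The sample values are invented for the illustration, and LOWCASE is used here to make the search case-insensitive.

```sas
/* Stand-in for Cert.Bookcase with a few sample finishes */
data work.bookcase;
   input Finish $20.;
   datalines;
Walnut
dark walnut
cherry
;
run;

data work.walnut;
   set work.bookcase;
   /* LOWCASE makes the comparison case-insensitive;            */
   /* INDEX returns 0 when the quoted string is not found, so   */
   /* only observations with a positive result are written out  */
   if index(lowcase(Finish), 'walnut') > 0;
run;
```

With this sample data, Work.Walnut keeps the first two observations and drops cherry.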

Suppose you submit a short, simple DATA step. If the active window displays the message "DATA step running" for a long time, what probably happened?

Without a RUN statement (or a following DATA or PROC step), the DATA step does not execute, so it continues to run. Unbalanced quotation marks can also cause the DATA step running message if relatively little code follows the unbalanced quotation mark.

You permanently associate the formats with variables

You can place the FORMAT statement in either a DATA step or a PROC step. What happens when you place it in a DATA step?

-ascending or descending character order -ascending or descending numeric order -the data must be grouped in some way

Your data does not require any preprocessing if the observations in all of the data sets occur in which of the following patterns?

SAS data set

a data file that is formatted in a way that SAS can understand; consists of two parts - a descriptor portion and a data portion

Windows, UNIX

a group of SAS files that are stored in the same directory. Other files can be stored in the directory, but only the files that have SAS file extensions are recognized as part of the SAS library

document

a hierarchy of output objects that enables you to render multiple ODS output without rerunning procedures

program data vector (PDV)

a logical area in memory where SAS builds a data set, one observation at a time

BY-group processing

a method of processing observations from one or more SAS data sets that are grouped or ordered by values of one or more common variables

SAS name literal

a name token that is expressed as a string within quotation marks, followed by the uppercase or lowercase letter n; tells SAS to allow the special character ($) in the data set name

Sasuser

a permanent library that contains SAS files in the Profile catalog and that stores your personal settings; also a convenient place to store your own files

Sashelp

a permanent library that contains sample data and other files that control how SAS works at your site; this is a Read-Only library

PROC FREQ

a procedure that is used to give descriptive statistics about a SAS data set; creates one-way, two-way, and n-way frequency tables; also describes data by reporting the distribution of variable values

index

a separate file that you can create for a SAS data file in order to provide direct access to a specific observation; purpose is to optimize WHERE expressions and facilitate BY-group processing

step

a sequence of SAS statements

expression

a sequence of operands and operators that form a set of instructions

SAS engine

a set of internal instructions that SAS uses for writing to and reading from files in a SAS library or a third-party database

Engine

a set of internal instructions that SAS uses for writing to and reading from files in a library; each one allows you to read different file format, including file formats from other software vendors

z/OS

a specially formatted host data set in which only SAS files can be stored

work

a temporary library for files that do not need to be saved from session to session

SAS statement

a type of SAS language element that is used to perform a particular operation in a SAS program or to provide information to a SAS program; free format; ends with a semicolon; begins with a SAS keyword

SAS datetime value

a value representing the number of seconds between January 1, 1960, and an hour/minute/second within a specified date; makes adjustments for leap years but ignores leap seconds; does not make adjustments for daylight saving time

SAS time value

a value representing the number of seconds since midnight of the current day; values between 0 and 86400

SAS date value

a value that represents the number of days between January 1, 1960 and a specified date; SAS can perform calculations on dates ranging from 1582 C.E. to 19,900 C.E.; dates before January 1, 1960 are negative numbers and dates after are positive numbers

format

affects how data values are written; does NOT change the stored value in any way; merely controls how the data is displayed

PROC (procedure) step

analyzes data, produces output, or manages SAS files; output can be of several types, such as a report or an updated SAS data set

concatenating

appends the observations from one data set to another data set; requires only a list of data set names in the SET statement (no BY statement is used; adding one interleaves the data sets instead); the new data set contains ALL the variables from all the input data sets, as well as the total number of records from all input data sets

Defining a library

assign a library name to it and specify the location of the files, such as a directory path

COMPRESS (PROC FREQ statement option)

begins the display of the next one-way frequency table on the same page as the preceding one-way table if there is enough space to begin the table NOTE: is not valid with the PAGE option

RANGE

calculated as the difference between the maximum value and the minimum value

modifier i

causes the FIND function to ignore character case during the search; if this modifier is not specified, FIND searches for character substrings with the same case as the character in substring

delimiters

characters that are specified as word separators

observations (also called rows)

collections of data values that usually relate to a single object

variables (also called columns)

collections of values that describe a particular characteristic

SAS log

collects messages about the processing of SAS programs and about any errors that occur

match-merging

combines observations from two or more data sets into a single observation in a new data set according to the values of a common variable; use a MERGE statement rather than the SET statement to combine data sets

one-to-one reading

combines rows from two or more data sets by creating rows that contain all the columns from each contributing data set; rows are combined based on their relative position in each data set

comma-separated values (CSV)

comma-separated file with a .CSV extension, DBMS= is optional

SAS program

consists of a sequence of steps; can be any combination of DATA or PROC steps

PROC CONTENTS

creates SAS output that describes either of the following: 1. the contents of a library 2. the descriptor information for an individual SAS data set

explicit OUTPUT statement

creates an observation for each iteration; overrides automatic output, causing SAS to add an observation to the data set only when the statement is executed
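
A short sketch of an explicit OUTPUT statement inside an iterative DO loop follows. The data set Work.Invest, the variables Capital and Year, and the amounts are hypothetical.

```sas
data work.invest;
   Capital = 0;
   do Year = 1 to 3;
      Capital + 1000;   /* sum statement accumulates the balance */
      output;           /* writes one observation per iteration */
   end;
run;

proc print data=work.invest;
run;
```

Work.Invest contains three observations, with Capital equal to 1000, 2000, and 3000. Without the OUTPUT statement, the step would write a single observation when it reaches the end of the DATA step.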

DATA step

creates or modifies data; output can be of several types, such as a SAS data set or a report

IN = data set option

data set option to create and name a temporary variable that indicates whether the data set contributed data to the current observation; it is not included in the output SAS data set
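
The sketch below uses IN= variables to keep only observations found in both inputs of a match-merge. The data sets Work.Demog and Work.Visits and the variable names inDemog and inVisit are assumptions made for the example.

```sas
data work.demog;
   input Id Age;
   datalines;
1 30
2 41
3 25
;
run;

data work.visits;
   input Id Visit $;
   datalines;
1 A
3 B
4 C
;
run;

/* Both inputs are already in ascending Id order, as BY requires */
data work.both;
   merge work.demog(in=inDemog) work.visits(in=inVisit);
   by Id;
   /* Each IN= variable is 1 when its data set contributed to the */
   /* current observation and 0 otherwise                         */
   if inDemog and inVisit;
run;
```

Work.Both keeps Id 1 and Id 3 only, and the temporary variables inDemog and inVisit are not written to it.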

FORMCHAR (1,2,7) = 'formchar-string'

defines the characters to be used for constructing the outlines and dividers for the cells of crosstabulation table displays. The characters are used to draw the vertical separators (position 1), the horizontal separators (position 2) and the vertical-horizontal intersections (position 7)

SAS informat

determines how data values are read and stored according to the data type: numeric, character, date, time, or timestamp

SAS format

determines how variable values are printed according to the data type: numeric, character, date, time, or timestamp

FMTLIB

displays a list of all of the formats in your catalog, along with descriptions of their values

PAGE

displays only one table per page

NLEVELS

displays the "number of variable levels" table, which provides the number of levels for each variable named in the TABLES statement

iteration

each loop (or cycle or execution)

WEEKDAY function

enables you to extract the day of the week from a SAS date value; returns a numeric value from 1 to 7, where 1 represents Sunday and 7 represents Saturday

data errors

errors that occur when data values are not appropriate for the SAS statements that are specified in a program

syntax errors

errors that occur when program statements do not conform to the rules of the SAS language

semantic errors

errors that occur when you specify a language element that is not valid for a particular usage

IF-THEN statement

executes a SAS statement when the condition in the IF clause is true

iterative DO statement

executes statements between the DO and END statements repetitively, based on the value of an index variable

DO WHILE expression

expression is evaluated before each execution of the loop, so that the statements inside the group are executed repetitively while the expression is true

SUBSTR function

extracts a substring from an argument, starting at a specific position in the string

DAY function form: DAY(date)

extracts the day value from a SAS date value

MONTH function form: MONTH(date)

extracts the month value from a SAS date value

QTR function form: QTR(date)

extracts the quarter value from a SAS date value

YEAR function form: YEAR(date)

extracts the year value from a SAS date value

INTNX function

function applies multiples of a given interval to a date, time, or datetime value and returns the resulting value

LOWCASE function

function converts all letters in a character expression to lowercase

UPCASE function

function converts all letters in a character expression to uppercase

PROPCASE function

function converts all words in an argument to proper case (so that the first letter in each word is capitalized)

INPUT function

function converts character data values to numeric values; requires an informat

PUT function

function converts numeric data values to character values; requires a format

CATX function

function enables you to concatenate character strings, remove leading and trailing blanks, and insert operators; the results are usually equivalent to those that are produced by a combination of the concatenation operator and the TRIM and LEFT functions

INDEX function

function enables you to search a character value for a specified string; searches values from left to right, looking for the first occurrence of the string NOTE: The function is case sensitive

FIND function

function enables you to search for a specific substring of characters within a specified character string -searches the string, from left to right, for the first occurrence of the substring, and returns the position in the string of the substring's first character -if the substring is not found in the string, the function returns a value of 0 -if there are multiple occurrences of the substring, the function returns only the position of the first occurrence
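
The return-value rules above can be modeled in a few lines. This is a minimal Python sketch, not SAS code: `sas_find` is a hypothetical helper name, it assumes only the `i` and `t` modifiers, and it does not model the optional start-position argument of the real FIND function.

```python
def sas_find(string, substring, modifiers=""):
    """Model of FIND: 1-based position of the first match, or 0 if absent."""
    mods = modifiers.lower()
    if "t" in mods:
        # modifier t: trim trailing blanks from string and substring
        string, substring = string.rstrip(" "), substring.rstrip(" ")
    if "i" in mods:
        # modifier i: ignore character case during the search
        string, substring = string.lower(), substring.lower()
    # str.find returns -1 when nothing matches, so adding 1 yields 0
    return string.find(substring) + 1


print(sas_find("She sells seashells", "sells"))       # 5
print(sas_find("She sells seashells", "SELLS"))       # 0 (case sensitive)
print(sas_find("She sells seashells", "SELLS", "i"))  # 5
```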

LEFT function

function left-aligns a character expression; returns an argument with leading blanks moved to the end of the value

catalogs

function like subfolders for grouping other members in SAS libraries

COMPBL function

function removes multiple blanks from a character string by translating each occurrence of two or more consecutive blanks into a single blank
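
As an illustration of the blank-compression rule, here is a small Python sketch (not SAS code). The helper name `compbl` is borrowed from the SAS function for readability; the regular expression is an assumption about how to model it and only handles the space character.

```python
import re


def compbl(text):
    """Replace each run of two or more blanks with a single blank."""
    return re.sub(r" {2,}", " ", text)


print(compbl("January   1,    1960"))  # January 1, 1960
```

Single blanks are left alone, so leading or trailing blanks are reduced to one blank rather than removed.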

TRIM Function

function removes trailing blanks from character expressions and returns one blank if the expression is missing; useful for concatenating because the concatenation operator does not remove trailing blanks

TRANWRD function

function replaces or removes all occurrences of a word in a character string; translated characters can be located anywhere in the string

MDY function

function returns a SAS date value from month, day, and year values; can add the same SAS date to every observation

COMPRESS function

function returns a character string with specified characters removed from the original string; null arguments are allowed and treated as a string with a length of zero

TODAY function

function returns the current date as a numeric SAS date value, which is the number of days since January 1, 1960 NOTE: If the value of the TIMEZONE= system option is set to a time zone name or time zone ID, the return values for date and time are determined by the time zone

DATE function

function returns the current date as a numeric SAS date value, which is the number of days since January 1, 1960 NOTE: If the value of the TIMEZONE= system option is set to a time zone name or time zone ID, the return values for date and time are determined by the time zone
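
Because a SAS date value is a count of days since January 1, 1960, the arithmetic is easy to check by hand. The Python sketch below (not SAS code) models that count and the 1-to-7 numbering that the WEEKDAY function uses, with 1 for Sunday; the helper names are hypothetical.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)  # SAS date value 0


def to_sas_date(d):
    """Number of days from January 1, 1960 to d."""
    return (d - SAS_EPOCH).days


def from_sas_date(n):
    """Calendar date for SAS date value n."""
    return SAS_EPOCH + timedelta(days=n)


def sas_weekday(n):
    """Day of week numbered 1 (Sunday) through 7 (Saturday)."""
    # isoweekday() numbers Monday as 1 and Sunday as 7
    return from_sas_date(n).isoweekday() % 7 + 1


print(to_sas_date(date(1960, 1, 1)))  # 0
print(to_sas_date(date(1961, 1, 1)))  # 366, because 1960 was a leap year
print(sas_weekday(0))                 # 6, January 1, 1960 was a Friday
```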

INT function

function returns the integer portion of a numeric value (any decimal portion of the function argument is discarded)

FLOOR function

function returns the largest integer that is less than or equal to the argument

INTCK function

function returns the number of interval boundaries of a given kind that lie between two dates, times, or datetime values; counts intervals from fixed interval beginnings, not in multiples of an interval unit from the "from" value
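
Counting boundaries instead of elapsed time can give surprising results, so a worked example helps. This Python sketch (not SAS code) models only the MONTH and YEAR intervals with the default boundary-counting behavior; the function names are hypothetical.

```python
from datetime import date


def intck_month(start, end):
    """Number of first-of-month boundaries crossed between start and end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def intck_year(start, end):
    """Number of January 1 boundaries crossed between start and end."""
    return end.year - start.year


# One day apart, but a month boundary lies between the two dates.
print(intck_month(date(2017, 1, 31), date(2017, 2, 1)))  # 1
# Thirty days apart, but no month boundary is crossed.
print(intck_month(date(2017, 1, 1), date(2017, 1, 31)))  # 0
# One day apart across New Year counts as one year boundary.
print(intck_year(date(2016, 12, 31), date(2017, 1, 1)))  # 1
```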

CEIL function

function returns the smallest integer that is greater than or equal to the argument
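
INT, FLOOR, and CEIL agree for whole numbers but differ for negative values with a decimal portion. The comparison below is a Python sketch that uses `math.trunc`, `math.floor`, and `math.ceil` as stand-ins for the three SAS functions; any special handling that SAS applies to values extremely close to an integer is not modeled.

```python
import math

for value in (2.7, -2.7):
    print(
        value,
        math.trunc(value),  # like INT: discard the decimal portion
        math.floor(value),  # like FLOOR: largest integer <= value
        math.ceil(value),   # like CEIL: smallest integer >= value
    )
# 2.7 2 2 3
# -2.7 -2 -3 -2
```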

RIGHT function

function right-aligns a character expression; returns an argument with trailing blanks moved to the start of the value

ROUND function

function rounds values to the nearest specified unit

DATDIF and YRDIF

functions calculate the difference in days and years between two SAS dates

SAS Output Delivery System (ODS)

gives you flexibility in generating, storing, and reproducing SAS procedure and DATA step output along with a wide range of formatting options

SAS library

highest level of organization for information within SAS; a collection of one or more SAS files, including SAS data sets, that are referenced and stored as a unit; in a directory-based operating environment, it's a group of SAS files that are stored in the same directory; in z/OS, it's a group of SAS files that are stored in an operating environment file

name

identifies a variable (any valid SAS name)

type

identifies a variable as numeric or character

ID statement

identifies observations using variable values, such as identification number, instead of observation numbers

OUT= <libref.> SAS-data-set

identifies the output SAS data set with either a one or two-level SAS name (library and member name); if the specified SAS data set does not exist, the IMPORT procedure creates it

DROP= option

if you never reference certain variables and you do not want them to appear in the new data set, use this option in the SET statement; when this option is used in the DATA statement, it drops the variables from the output data set, but they remain available for processing in the DATA step

BY group

includes all observations with the same BY value. If you use more than one variable in a BY statement, a BY group is a group of observations with the same combination of values for these variables; each BY group has a unique combination of values for the variables

descriptor portion

information that SAS creates and maintains about each SAS data set, including data set attributes and variable attributes

_ERROR_

initialized to 0, set to 1 when an error occurs; displays debugging messages when an error occurs

Example of semantic error

invalid option

Uniform-Resource-Locator

is the name of an HTML file or the full URL of an HTML file. ODS uses this URL instead of the file specification in all the links and references that it creates that point to the file

STDDEV | STD

is the standard deviation s and is computed as the square root of the variance

permanent SAS libraries

libraries that are available to you during subsequent SAS sessions; referenced libref.dataset

temporary SAS libraries

libraries that last only for the current SAS session; referenced libref.filename (ex: work.test) OR the data set name only (one-level name) (ex: Test)

logical operators

links a sequence of expressions into compound expressions - AND (&); OR (|)

What types of errors can the PUTLOG statement help you resolve?

logic errors

Which type of delimited file does PROC IMPORT read by default?

varying record-length files

KURTOSIS | KURT

measures the heaviness of tails

SKEWNESS | SKEW

measures the tendency of the deviations to be larger in one direction than in the other

BY variable

names a variable or variables by which the data set is sorted. All data sets must be ordered by the values of the BY variable

DATA = SAS-data-set

names the data set to be analyzed by PROC FREQ

logic error

occurs when the program statements follow the rules and execute, but produce incorrect results; difficult to detect because no notes are written to the log

concatenation operator

operator concatenates character values; can be expressed as || (two vertical bars), ¦¦ (two broken vertical bars), or !! (two exclamation points)

OUT= option

option identifies the output SAS data set

SHEET option

option to import specific worksheets from an EXCEL workbook

varnum

option used in the PROC CONTENTS statement to list variable names in the order of their logical position (or creation order) in the data set

NOOBS option

option used to suppress observation numbers

Printer Family (PDF, and so on)

output that is formatted for a high-resolution printer such as PostScript(PS), Portable Document Format (PDF), or Printer Control Language (PCL) files

HTML

output that is formatted in Hypertext Markup Language (HTML). You do not have to specify the ODS HTML statement to produce basic HTML output

Markup Languages Family

output that is formatted using markup languages such as Extensible Markup Language (XML)

<REPLACE> REPLACE= used to replace a permanent SAS data set

overwrites an existing SAS data set; if option is omitted, the IMPORT procedure does not overwrite an existing data set

data portion

portion of a SAS data set is a collection of data values that are arranged in a rectangular table

descriptor portion

portion of the data set contains information about the properties of each variable in the data set and about the data set, including the following: (1) the name of the data set (2) the date and time that the data set was created (3) the number of observations (4) the number of variables (5) variable's name, type, length, format, informat, and label

SAS functions

pre-written routines that perform computations or system manipulations on arguments and return a value; can return either character or numeric result

PROC MEANS

procedure provides data summarization tools to compute descriptive statistics for variables across all observations and within groups of observations -calculates descriptive statistics based on moments -estimates quantiles, which includes the median -calculates confidence limits for the mean -identifies extreme values -performs a t test

PROC IMPORT

procedure reads structured and unstructured data from an external data source and writes it to a SAS data set

nesting

putting a DO loop within a DO loop

SAS date and time informats

read date and time expressions and convert them to SAS date and time values

informat

reads data values in certain forms into standard SAS values; determine how data values are read into a SAS data set

SAS/ACCESS LIBNAME statement

references an excel workbook file

label

refers to a descriptive label of up to 256 characters long

length

refers to the number of bytes used to store each of the variable's values in a SAS data set

one-to-one matching

requires multiple SET statements; where the same-named variables occur, values that are read from the second data set replace those that are read from the first data set; also, the number of observations in the new data set is the number of observations in the smallest original data set
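
The two rules above (later values win, and the shortest input sets the row count) can be modeled as follows. This is a Python sketch, not SAS code: each data set is assumed to be a list of dictionaries, and `one_to_one_read` and the sample variable names are hypothetical.

```python
def one_to_one_read(first, second):
    """Pair rows by position; stop at the end of the shorter input."""
    # {**a, **b} keeps every column and lets values from b replace
    # same-named values from a
    return [{**a, **b} for a, b in zip(first, second)]


patients = [
    {"id": 1, "age": 40},
    {"id": 2, "age": 51},
    {"id": 3, "age": 29},
]
visits = [
    {"id": 7, "weight": 150},
    {"id": 8, "weight": 172},
]

for row in one_to_one_read(patients, visits):
    print(row)
# {'id': 7, 'age': 40, 'weight': 150}
# {'id': 8, 'age': 51, 'weight': 172}
```

The third row of the longer input is never read, which is the main difference from match-merging on a common variable.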

SCAN function

returns the nth word from a character string; enables you to separate a character value into words and to return a specified word

CONTAINS (?) operator

selects observations that include the specified substring; symbol is the ?

operators

special-character operators, grouping parentheses, or functions

<SAS-data-set-options>

specifies SAS data set options; cannot specify data set options when importing delimited, comma-separated, or tab-delimited external files; for example, ALTER=, PW=, READ=, or WRITE=

'30/360'

specifies a 30-day month and a 360-day year; a valid basis for the DATDIF and YRDIF functions

WHERE expression

specifies a condition for selecting observations; should be only one in a step; if multiple statements are issued, only the last statement is processed

SAS-file-specification

specifies an entire library or a specific SAS data set within a library; can take one of the following forms: <libref.>SAS-data-set names one SAS data set to process; <libref.>_ALL_ requests a listing of all files in the library (use a period (.) to append _ALL_ to the libref)

DESCENDING option

specifies that the data set is sorted in descending order by the variable that immediately follows

NOTSORTED option

specifies that the observations in the data set that have the same BY values are grouped together, but are not necessarily sorted in alphabetical or numeric order

SET statement

specifies the SAS data set that you want to use as input data for your DATA step

DATAFILE = "filename" | "fileref"

specifies the complete path and filename or fileref for the input PC file, spreadsheet, or delimited external file (omit the quotation marks if the fileref, complete path, or filename does not include special characters)

LIBRARY=libref

specifies the libref for a SAS library to store a permanent catalog of user-defined formats

TABLE= "tablename"

specifies the name of the input DBMS table; if the name does not include special characters (such as question marks), lowercase characters, or spaces, you can omit the quotation marks; NOTE that the DBMS table name might be case sensitive

FIRSTOBS=

specifies the number of the first observation to process

OBS=

specifies the number of the last observation to process
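
A common mistake is to read OBS= as a count of observations instead of the number of the last observation. The Python sketch below (not SAS code) models the selection with list slicing; `select_obs` is a hypothetical helper, and the observations are assumed to be a simple list numbered from 1.

```python
def select_obs(rows, firstobs=1, obs=None):
    """Return observations numbered firstobs through obs (1-based, inclusive)."""
    last = len(rows) if obs is None else obs  # no OBS= means read to the end
    return rows[firstobs - 1:last]


rows = list(range(1, 11))  # observations 1 through 10

print(select_obs(rows, firstobs=3, obs=5))  # [3, 4, 5]
print(select_obs(rows, obs=4))              # [1, 2, 3, 4]
print(select_obs(rows, firstobs=9))         # [9, 10]
```

With FIRSTOBS=3 and OBS=5, three observations are processed, not five.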

<ORDER= DATA | FORMATTED | FREQ | INTERNAL>

specifies the order of the variable levels in the frequency and crosstabulation tables, which you request in the TABLES statement

VALIDVARNAME = system option

specifies the rules for valid SAS variable names that can be created and processed during a SAS session; set these rules using the VALIDVARNAME= system option

<DBMS=identifier>

specifies the type of data to import

$w.

specifies values as character values in w spaces

MMDDYYw.

specifies values as date values of the form 09/12/17 (MMDDYY8.) or 09/12/2017 (MMDDYY10.)

DATEw.

specifies values as date values of the form 16OCT17 (DATE7.) or 16OCT2017 (DATE9.)

w.d

specifies values that are rounded to d decimal places in w spaces

w.

specifies values that are rounded to the nearest integer in w spaces

COMMAw.d

specifies values that contain commas and decimal places

DOLLARw.d

specifies values that contain dollar signs, commas, and decimal places

subsetting IF statement

statement causes the DATA step to continue processing only those observations that meet the condition of the expression specified in the IF statement

DELETE statement

statement determines which observations to omit as you read data

DO UNTIL expression

statement executes a DO loop until the expression becomes true; expression is evaluated after each execution of the loop, so that the statements inside the group are executed repetitively until the expression is true (always executes at least once)
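
The practical difference between DO WHILE and DO UNTIL is where the condition is checked. The Python sketch below (not SAS code) models both forms with hypothetical helper functions and returns how many times the loop body ran.

```python
def do_while(count, limit):
    """Condition is tested at the top, so the body may never run."""
    iterations = 0
    while count < limit:
        count += 1
        iterations += 1
    return iterations


def do_until(count, limit):
    """Condition is tested at the bottom, so the body runs at least once."""
    iterations = 0
    while True:
        count += 1
        iterations += 1
        if count >= limit:
            break
    return iterations


print(do_while(0, 3), do_until(0, 3))    # 3 3
print(do_while(10, 5), do_until(10, 5))  # 0 1
```

When the starting value already satisfies the stopping condition, only the DO UNTIL form executes its body.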

DATA statement

statement indicates the beginning of the DATA step and names the SAS data set to be created

ELSE statement

statement must immediately follow the IF-THEN statement in your program; executes only if the condition in the previous IF-THEN statement is false

FILENAME statement

statement used to point to the location of the external file that contains the data

global statements

statements that can be used anywhere in a SAS program and stay in effect until changed or canceled, or until the SAS session ends; do not require a RUN statement

PROC FORMAT

stores user-defined formats and informats as entries in a SAS catalog

SUM

sum

NOPRINT

suppresses the display of all output

NODS

suppresses the printing of detailed information about each file when you specify _ALL_; can only be specified together with _ALL_

LRECL= system option

system option specifies the default logical record length to use when reading external files

T

the Student's t statistic to test the null hypothesis that the population mean is equal to mu (u0)

MEAN

the arithmetic mean or average of all the values

Webwork

the default output library in interactive mode when using SAS Studio

filename

the fully qualified name or location of the file

Q1 | P25

the lower quartile or 25th percentile

MAX

the maximum value

MEDIAN | P50

the middle value or the 50th percentile

MIN

the minimum value

SAS-data-library

the name of a SAS library in which SAS data files are stored; specification of the physical name of the library differs by operating environment

engine

the name of a library engine that is supported in your operating environment; when the default is used it does NOT have to be specified in the LIBNAME statement

Fileref

the name that you associate with an external file; the name must be one to eight characters long, begin with a letter or underscore, and contain only letters, numbers, or underscores; when associated with an individual external file, it can be specified in subsequent SAS statements and commands

NMISS

the number of observations with missing values

N

the number of observations with nonmissing values

_N_

the number of times a data step iterated; displays debugging messages for a specified number of iterations of the DATA step

UCLM

the one-sided confidence limit above the mean

LCLM

the one-sided confidence limit below the mean

CV

the percent coefficient of variation

STDERR | STDMEAN

the standard error of the mean

CSS

the sum of squares corrected for the mean

SUMWGT

the sum of weights

CLM

the two-sided confidence limit for the mean

PROBT | PRT

the two-tailed p-value for Student's t statistic, T, with n-1 degrees of freedom. This value is the probability under the null hypothesis of obtaining a more extreme value of T than is observed in this sample

Q3 | P75

the upper quartile or 75th percentile

BY value

the value of the BY variable

USS

the value of the uncorrected sum of squares

MODE

the value that occurs most frequently

target variable

the variable to which the result of a function is assigned Ex: AvgScore = mean (exam1, exam2, exam3);

modifier t

trims trailing blanks from string and substring

PUTLOG statement

use this statement to ensure that debugging messages are written to the SAS log and not to the external file; can be used to write to the SAS log in both batch and interactive modes

assignment statement

used in any DATA step in order to modify existing values or create new variables; -transform variables -create new variables -conditionally process variables -calculate new values -assign new values

MAXDEC = option

used in the PROC MEANS statement to limit the number of decimal places preferred

VALUE statement

used to define a format for displaying one or more values

CLASS statement

used to produce separate analyses of grouped observations;

VALIDMEMNAME= system option

used to specify rules for naming SAS data sets

PUT statement

used when the source of the program error is not apparent; statement is used to examine variable values and to print your own message in the log

extended attributes

user-defined metadata that is defined for a data set or for a variable (column); represented as name-value pairs

'ACT/360'

uses the actual number of days between dates in calculating the number of years (calculated by the number of days divided by 360); a valid basis for the YRDIF function

'ACT/365'

uses the actual number of days between dates in calculating the number of years (calculated by the number of days divided by 365)

'ACT/ACT'

uses the actual number of days or years between dates; a valid basis for the DATDIF and YRDIF functions

OUTPUT <SAS-data-set(s)>

using an output statement without a following data set name causes the current observation to be written to all data sets that are specified in the DATA statement

operands

variable names or constants; can be numeric, character, or both

FIRST.variable and LAST.variable

temporary variables that SAS creates for each BY variable; FIRST.variable is set to 1 for the first observation in a BY group and LAST.variable is set to 1 for the last observation in a BY group (otherwise each is 0); these assignments enable you to take different actions, based on whether processing is starting for a new BY group or ending for a BY group
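
To make the flag values concrete, this Python sketch (not SAS code) computes the equivalent of FIRST.variable and LAST.variable for rows that are already grouped by one BY variable. The function name, the `first` and `last` keys, and the sample data are all hypothetical.

```python
def by_group_flags(rows, key):
    """Add 1/0 flags marking the first and last row of each BY group."""
    flagged = []
    for i, row in enumerate(rows):
        # first row overall, or the BY value changed from the previous row
        first = i == 0 or rows[i - 1][key] != row[key]
        # last row overall, or the BY value changes on the next row
        last = i == len(rows) - 1 or rows[i + 1][key] != row[key]
        flagged.append({**row, "first": int(first), "last": int(last)})
    return flagged


rows = [
    {"dept": "A", "pay": 10},
    {"dept": "A", "pay": 20},
    {"dept": "B", "pay": 30},
]

for row in by_group_flags(rows, "dept"):
    print(row)
# {'dept': 'A', 'pay': 10, 'first': 1, 'last': 0}
# {'dept': 'A', 'pay': 20, 'first': 0, 'last': 1}
# {'dept': 'B', 'pay': 30, 'first': 1, 'last': 1}
```

A group with a single row has both flags set to 1, as the last line shows.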

character variables

variables that can contain any values; can be up to 32,767 bytes long; a blank is the default missing value; default informat is $w.

numeric variables

variables that can contain only numeric values (the numerals 0 through 9, +, -, and E for scientific notation); has a default length of 8 bytes (stored as floating point numbers in 8 bytes of storage); a period (.) is the default missing value; default informat is w.d

VAR

variance

EXCEL

writes EXCEL spreadsheet files that are compatible with Microsoft Office 2010 and later versions

