SAS Programming Topics

Ace your homework & exams now with Quizwiz!

Renaming Variables

-DATA step match-merging overwrites values of the like-named variable in the first data set in which it appears with values of the like-named variable in subsequent data set -To prevent overwriting, you can rename variables by using the RENAME= data set option in the MERGE statement

NOFREQ, NOPERCENT, NOROW, NOCOL statements

-NOFREQ suppresses cell frequencies -NOPERCENT suppresses cell percentages -NOROW suppresses row percentages -NOCOL suppresses column percentages

Differences between DROP and KEEP and DROP= and KEEP=

-You cannot use the DROP and KEEP statements in SAS procedure steps. -The DROP and KEEP statements apply to all output data sets that are named in the DATA statement. To exclude variables from some data sets but not from others, use the DROP= and KEEP= data set options in the DATA statement

TABLES statement

-creates output tables for specified variables -To create a two-way table or n-way table, join the variables with an asterisk (*) in the TABLES statement in a PROC FREQ step

PROC SORT Rules

-default is ascending, missing values will appear at the top if ascending -to change to descending "descending var" applies to the variable immediately behind the descending

SAS programing structure

-when a SAS program is submitted for execution, SAS first validates the syntax and then compiles the statements. DATA and PROC statements signal a beginning of a new step -SAS log collects messages about the processing of SAS programs and about any errors that occur

Categorizing Variables

-you can use the IF-THEN statement to assign values to a new variable based off if a specific condition is met -when the DATA step executes, each IF statement is evaluated in order, even if the first condition is true. This wastes system resources and slows the processing of your program -Use the ELSE statement to assign a value of the If-Then is false

Format naming rules

A format name must be a valid SAS name, start with a $ if the format is character, must not end in a number, and have a maximum length of 32. Also, the name of a user-defined format cannot be the same as the name of a format that is supplied by SAS

FMTLIB statement

Adding the keyword FMTLIB to the PROC FORMAT statement displays a list of all the formats in your catalog, along with descriptions of their values

CONTAINS

CONTAINS - use contains to collect observations that contain a key word (like IN for SQL) Symbol: ?

out= option

IMPORT option that identifies the output SAS data set with either a one or tow-level SAS name. If the data set doesn't exist, it creates it

dbms= option

IMPORT option that specifies the type of data to import. Types: CSV, JMP, TAB (tab-delimited files)

Match Merging

Match-merging combines observations from two or more data sets into a single observation in a new data set according to the values of a common variable (Join) Example: data merged; merge a b; by num; run;

ODS output types

Microsoft Excel and Power Point, HTML, PDF, CSV, and RTF

PROC MEANS

PROC MEANS prints the n-count (number of non missing values), the mean, the standard deviation, and the minimum and maximum values of every numeric variable in a data set

How does SAS store dates

SAS stores date and time values as numeric values. You apply SAS formats to the data so that meaningful date and time values are displayed in reports

NOOBS

Suppresses observation numbers in a print statement

4 Common SAS Global Statements

TITLE, LIBNAME, OPTIONS, and FOOTNOTE

TITLE and FOOTNOTE statement

TITLE<n> 'text' and FOOTNOTE<n> 'text' statements specify the title ad footnote of the page

BODY= statement

The BODY= specification is one way to create an HTML file containing procedure output

BY statement

The BY statement specifies variables to use for categorizing observations

The CATX function

The CATX function enables you to concatenate character strings, remove leading and trailing blanks, and insert separators

The CEIL function

The CEIL function returns the smallest integer that is greater than or equal to the argument

CONTENTS= statement

The CONTENTS= specification creates the file toc.html in the C:\Users \Student1\cert\ directory. The table of contents file has links to each procedure output in the body file

DO WHILE and DO UNTIL

The DO WHILE and DO UNTIL statements enable you to execute DO loops based on whether a condition is true or false

The FLOOR

The FLOOR function returns the largest integer that is less than or equal to the argument

FRAME= statement

The FRAME= specification creates the file frame.html in the C:\Users \Student1\cert\ directory. The frame file integrates the table of contents and the body file

The INDEX function

The INDEX function enables you to search a character value for a specified string

INTCK function

The INTCK function returns the number of interval boundaries of a given kind that lie between two dates, times, or datetime values

INTNX function

The INTNX function applies multiples of a given interval to a date, time, or datetime value and returns the resulting value. You can use the INTNX function to identify past or future days, weeks, months, and so on

OUTPUT Statement

The OUTPUT statement writes statistics to a new SAS data set. By default, the default summary statistics are produced for all numeric variables or for the variables specified in the VAR statement

SCAN function

The SCAN function returns the nth word from a character string. The SCAN function enables you to separate a character value into words and to return a specified word

SUBSTR function

The SUBSTR function extracts a substring from an argument, starting at a specific position in the string -Can also be used to replace contents of a character variable

TRANWORD function

The TRANWRD function replaces or removes all occurrences of a word in a character string

The concatenation operator (||)

The concatenation operator concatenates character values. FullName = First || Middle || Last

Basics of the DO Loop

The iterative DO statement executes statements between the DO and END statements repetitively, based on the value of an index variable -index-variable: names a variable whose value governs execution of the DO group

Creating an output for each DO iteration

To create an observation for each iteration of the DO loop, place an OUTPUT statement inside the loop OUTPUT statement inside the loop overrides the automatic output of the dataset

FORMAT statement

To make data values more understandable when they are displayed in your procedure output, you can use the FORMAT statement, which associates formats with variables

CLASS statement

To produce separate analyses of grouped observations, add a CLASS statement to the MEANS procedure

When the SUBSTR function replaces a word

To summarize, when the SUBSTR function is on the right side of an assignment statement, the function extracts a substring. MiddleInitial=substr(middlename,1,1); When the SUBSTR function is on the left side of an assignment statement, the function replaces the contents of a character variable. substr(region,1,3)='NNW'

One-to-One Reading

Use multiple SET statements in a DATA step to combine data sets. One-to-one reading combines rows from two or more data sets by creating rows that contain all of the columns from each contributing data set

INPUT function

Use the INPUT function to convert character data values to numeric values REQUIRES INFORMAT

PUT function

Use the PUT function to explicitly convert numeric data values to character data values

PUTLOG Statement

Use the PUTLOG statement in the DATA step to write messages to the SAS log to help identify logic errors -Ex: PUTLOG 'message';

Concatenating Datasets

When a program concatenates data sets, all of the observations are read from the first data set listed in the SET statement. Then all of the observations are read from the second data set listed, and so on Example: DATA output-SAS-data-set; SET SAS-data-set-1 SAS-data-set-2;

PUT Statement

When the source of the program errors is not apparent, use the PUT statement to examine variable values and print your own message in the log. You can use the IF-THEN/ELSE statements to conditionally check values

Using the VALUE statement in FORMATS

You can create your own formats to format the values. You can also apply a format to the values of a variable -Use the VALUE statement to define a format for displaying one or more values the keyword VALUE and ends with a semicolon after all the labels have been defined

Program Data Vector

a logical area in memory where SAS builds a data set. As a SET statement compiles, a slot is added to the PDV for each variables in the new data set

Debugging in execution phase

a note, warning, or error message is displayed in the SAS log; the values that are stored in the PDV are displayed in the SAS log; the processing of the step either continues or stops

What is the SASuser library

a permanent library that contains SAS files in the Profile catalog and that stores your personal settings

What is the SAShelp library

a permanent library that contains sample data and other files that control how SAS works at your site

What is the Work library

a temporary library for files that do not need to be saved from session to session

Modifying variables with expressions

an expression is a sequence of operands and operators that form a set of instructions (+,-,*, etc.) Example of creating new variable: data work.stresstest; set cert.tests; TotalTime=(timemin*60)+timesec; run; proc print data=work.stresstest; run;

FIRST.variable and LAST.variable

are variables that SAS creates for each BY variable. SAS sets FIRST.variable when it is processing the first observation in a BY group, and sets LAST.variable when it is processing the last observation in the BY group

PROC CONTENTS

creates a SAS output that describes either of the following: contents of a library, the descriptor information for an individual data set Ex: PROC CONTENTS DATA=SAS-file-specification NODS;

DO loop example: the DO group Month is the index variable, 1 is the start-variable, and 12 is the stop variable

data work.earnings (drop=month); set cert.master; Earned=0; do month=1 to 12; earned+(amount+earned)*(rate/12); end; Balance=Amount+Earned; run;

DO UNTIL example

data work.invest; do until(Capital>=50000); capital+2000; capital+capital*.10; Year+1; end; Run; - The DATA step below uses a DO UNTIL statement to perform the calculation until $50,000 is reached. Each iteration of the DO loop represents one year

DO WHILE example

data work.invest; do while(Capital>=50000); capital+2000; capital+capital*.10; Year+1; end;

DAY function

day=day(date); day of month (1-31)

SAS informats

determines how data values are read and stored according to the data type

SAS formats

determines how variable values are printed according to there data type

eof = option

eof means end of file The value of this is set to 1 when the last observation of the dataset isprocessed

The SAS Output Delivery System (ODS)

gives you flexibility in generating, storing, and reproducing SAS procedure and DATA step output along with a wide range of formatting options

ID statement

identifies observations using variable values, instead of observation numbers. It specifies one ore more variables to print at the beginning of the row of the report instead of observation #

PROC IMPORT

import procedure reads data from an external data source and writes it to a SAS data set. Syntax: PROC IMPORT DATAFILE = 'filename' | TABLE = 'tablename' OUT = <libref.SAS-data-set> <DBMS=identifier>

Where can you perform SAS functions

in WHERE expressions, in macro language statements, in the REPORT procedure, and in Structured Query Language (SQL)

BY group

includes all observations with the same BY value. If you use more than 1 variables in the BY statement, a BY group is a group of observations with the same combination of values for these variables

By-group processing

is a method of processing observations from one or more SAS data sets that are grouped or ordered by values of one or more common variables

The date value

is a value that represents the number of days between January 1, 1960, and a specified date. Dates before 1960 are negative, and dates after are positive

Debugging in compilation phase

misspelled keywords and data set names, unbalanced quotations marks, invalid options

BY variables

names a variables by which the data set is sorted

maxdec =

number of decimal places you want in your output data set, 0 would mean it would have no numbers after the decimal

CROSSLIST statement

option displays crosstabulation tables in ODS column format instead of the default crosstabulation cell format

FILENAME statement

points to an external data file, Filerefs perform the same function as librefs: they temporarily point to a storage location of data. However, librefs reference SAS libraries, filerefs are external files

RTF procedure

produces an output for Microsoft word

QTR function

quarter=qtr(date); quarter (1-4)

MDY function

returns SAS date value from month, day, and year values

FIRSTOBS=

starts processing at a specific observation

OBS=

stops processing after a certain number of observations are reached

NODS

suppresses printing the detail information about each file when you specify a libref._all_ (you can specify this ONLY when using ._all_)

PROC FREQ

the FREQ procedure creates a one-way table that contains the frequency, percent, cumulative frequency, and cumulative percent of every value of every variable in the input data set

SET Statement

the SET statement specifies the SAS data set you want to use as input data for the data step Ex: DATA SAS-data-set; SET SAS-data-set Run;

Descriptor Portion of a SAS dataset

the portion of a SAS data set contains information about the data set like: name, date and time that the data set was created, number of observations, number of variables

BY values

the value of the BY variable

Creating Variable

use an assign statement in any SET step in order to modify existing values or create new variables Syntax: variable=expression - variables: names a new or existing variable expression: any valid SAS expression (value)

DROP= and KEEP=

use the DROP= and KEEP= data set options to specify the variables to drop or keep -Example: data work.stresstest (drop=timemin timesec); set cert.tests;

LABEL statement

use the LABEL statement in the PROC PRINT statement to assign a descriptive label to a variable. Ex: LABEL variable='label1' variable2='label2' -Permanent labels need to be assigned in the DATA step

LENGTH statement

use the LENGTH statement to specify a length to avoid truncation of your values

RETAIN statement

use the RETAIN statement to assign an initial value other than 0

IN= data option

use this option to create and name a temporary variable that indicates whether the data set has contributed a variable or not. Value is 1 if the data set contributed data for that specific observation

WEEKDAY function

wkday=weekday(date); day of week (1-7) -1 is Sunday, 7 Saturday

Pageby statement

would put observations grouped by the BY statement on different pages rather than the same page

OPTIONS AND OBS

you can use the OPTIONS statement with OBS=# before the import procedure to limit the number of observations that SAS reads from your external file -Ex: options obs=5; would read the first five records from a data set

SUM with the BY statement

you can use the SUM statement with the BY statement to create subtotals for each aggregation. For example if you do: SUM fee; by actlevel; would create two tables of aggregations divided up by actlevel

How to reference files in SAS

you use a two level name consisting of the library name and the data set name: Libref.dataset

YEAR function

yr=year(date); year (4 digits)


Related study sets

Chapter 7: Positive Organizational behavior

View Set

Practice quiz: branching and merging

View Set

Anatomy and Physiology Chapter 1 Atlas A

View Set

AWS Cloud Practitioner Quiz - Udemy course credit to...

View Set