SAS Base Exam Concepts

Lakukan tugas rumah & ujian kamu dengan baik sekarang menggunakan Quizwiz!

Types of SAS Output

-SAS dataset (inc. values + descriptive info) -SAS log -Report or simple listing (proc print, tables, etc.) -Other SAS files (e.g. catalogs, settings) -External files or entries directly into other databases (e.g. proc export, direct entry into Oracle, etc.)

SAS Language Elements

-Statements -Expressions -Functions & CALL routines -Options -Formats (for formatting SAS data in output) -Informats (for reading formatted raw data into usable SAS data)

SAS Error Types

-Syntax errors -Semantic errors -Execution-time errors -Data errors -Macro-related errors (newer in 9.4 documentation)

SAS QC Checklist for Preventing Errors

1. Check Syntax (esp. semicolons, quotes, keywords & spelling, ending every DO or SELECT with END;) 2. Check order 3. Check INPUT & data

2 Phases of reading in a datset

1. Compilation Phase 2. Execution Phase

Methods to Combine Datasets

1. Concatenate 2. Interleaving 3. Merging (1-to-1, match merge) 4. Update 5. Modify

Base SAS Components

1. Data management facility (ability to organize data in tabular form) 2. Programming language 3. Data analysis & reporting utilities (PROCS)

What does the placement of DROP= in the code below do? data new (drop=ID); merge data1 data2 (drop = var2) data3; by ID; run;

1. Drop= in the first DATA line (drop=ID) causes ID to be dropped in the final output dataset, but KEEPS it for the initial merge 2. Drop= in the MERGE line drops it from the initial merge so it never gets merged in the first place Option 1 might be good e.g. if you need to merge then deidentify. Option 2 is more efficient if there are variables you just don't need at all.

Duplicate BY Values in MODIFY - What happens?

1. If duplicates are in MASTER dataset, the transaction DS modifies FIRST occurrence in master (leaves others unchanged) 2. If duplicates are in TRANSACTIONAL dataset, then observations overwrite each other (final value = last obs with that BY value) 3. If duplicates are in BOTH master & transactional, SAS still applies each transactional value to the FIRST occurrence in the master dataset (first occurrence will overwrite itself a bunch of times, subsequent occurrences nothing will happen).

The 6 SAS Variable Attributes

1. Name 2. Type 3. Length 4. Informat 5. Format 6. Label

Standard Numeric Formats

1234 (integer) 123.456 (decimal) -123E4 (scientific notation)

LIBNAME

Assigns new permanent library (folder) for storing data, output, etc. Statement: LIBNAME libref 'filepath'; libref = short name for library Must use 2-level name to call this in future (libref.data)

_IORC_

Automatic variable generated during a MODIFY statement. Values: _SOK = observation from transaction dataset was found in master. _DSENMR and _DSEMTR: Observations from transactional dataset were not found in master. If consecutive obs in the same BY group were not found, the first occurrence will say _IORC_ = _DSENMR and subsequent occurrences will be _DSEMTR USES: E.g. select if _iorc_ = _sok

like operator, _ and %

LIKE searches for a matching pattern and is case sensitive. _ substitutes for one character (insert anything), and % is a wildcard (anything can go there). Example: proc print data = emp; where Name like '_R%'; run; Searches Name variable for the pattern '<1 character>capital R<anything>'

LOWCASE / UPCASE/PROPCASE

Converts case of a character variable to lower case (LOWCASE), upper case (UPCASE), or proper case (PROPCASE) e.g. if x="my Pinneapple", UPCASE(x) = PINNEAPPLE, LOWCASE(x) = my pinneapple, PROPCASE(x) = My Pinneapple

Macro Variables (Static/Simple)

Create a macro variable with a static value that can be called again & again in a program. When you change the assigned value, it will change throughout the program. Assign it - %LET: %LET mymacvar = OK; Call it - &: Title "Is &mymacvar"; *Note: Don't use quotes when ASSIGNING macvar (even if character) - use DOUBLE QUOTES when calling it.

Basic Syntax to Create & Call a [Local] Macro Routine

Create a macro: %MACRO macname; ... <insert regular SAS code for macro>; %MEND macname; Call a macro - &macname: DATA mydata; SET myset; &macname; run;

Automatic Variables

Created in the PDV during a DATA step: _N_: counts the # of times a DATA step iterates _ERROR_: signals an error caused by execution (if _ERROR_ = 1, there's an error such as a character in a numeric variable)

proc sgpanel

Creates a panel/grid of charts based on one "panelby" variable (e.g. instead of multiple boxplots in one graph, boxplot for each group is in a different panel/box). Syntax: proc sgpanel data=mydata; panelby groupvar / layout=lattice; hbox numvar; run;

IN= option

Creates a temporary variable indicating whether an observation came from the specified dataset. Used mainly when concatenating datasets, e.g.: data combo; set data1 (IN=yep) data2; by ID; if yep then source = 'first dataset'; else source = 'second dataset'; run; *First creates binary, temporary 'yep' variable where yep=1 if observation is in data1 or 0 if not. (Could also create a different 1/0 variable for data2 called something different, but would be redundant for just 2 datasets). Since 'yep' is only in PDV/temp, also creates a variable called source which shows the output.

SAS Data Types

Numeric or character

Default & Max Storage Space for Vars

Numeric: 8 bytes Character: Length of first occurrence/data value (e.g. "blue" -> 4 chars), max 32,767 characters

Missing Data Notation for SAS Datasets (numeric vs character)

Numeric: Missing data stored as period (".") Character: Missing stored as blank, EXCEPT when entering list data (DATALINES) - must use period to represent missing data.

proc print obs= vs (obs=)

OBS= Used to label the auto-generated observation number column, e.g. proc print data=mydata obs='My Obs'; *noobs suppresses auto-obs column (OBS=) in parentheses tells SAS the maximum obs # to print e.g. proc print data=mydata (obs=10); *firstobs specifies 1st obs to print

END= option

Creates a temporary variable which flags the last observation in a dataset (1=last obs, 0=not "). Written to PDV only, so have to write new var or print to see. data combo; set data1 data2 END=flag; by ID; if FLAG then output; run; (*use proc print to see*)

Invalue

Creates an informat for character values, which allows you to tell SAS how to read in special character values. Used in PROC FORMAT. E.g. if have a numeric var where missing was marked as "M": proc format; invalue varname 'M' = . other = _SAME_; run;

options DATESTYLE=

DATESTYLE=mdy, dmy, ymd Used to clarify date formats when could be ambiguous (e.g. European vs American dates)

PROC DATASETS - Delete Files

DELETE SELECTED FILES: proc datasets library=mylib; delete mydata; run; DELETE ALL BUT SELECTED: proc datasets lib=mylib; save mydata; run; DELETE ALL FILES IN SAS LIB: proc datasets library=mylib kill; run; quit;

DO WHILE & DO UNTIL

DO WHILE: Processes a do loop as long as a condition is met (doesn't initiate if the condition is never met) DO UNTIL: Starts do loop & continues until a condition is met

SAS Dataset Components

Data Values & Descriptor Portion (metadata)

Add formats & styles in PROC TABULATE

Formats: In TABLE statement, add asterisk + "f=<format.>" E.g. TABLE groupvar, numvar*mean*f=comma7.; Style: Add in class, var, or table statements. E.g. class var1 / style=[font_style=italic]; table var1*[style=[background=cyan font_weight=bold];

COMPRESS to join variables

Join variables & remove spaces (e.g. to create a unique ID) newvar=compress(oldvar1||oldvar2||oldvar3)

Merge: What happens if multiple obs with the same BY value?

Keeps all obs from both datasets

Using the : Colon

-Denote a series of integers (1:10 = 1,2,3...10) -Variable wildcard (e.g. "drop pre:;" = drop all variables with prefix 'pre') -In combo with comparison operators (e.g. =: means "starts with", e.g. if lastname =: "Ha", then order = 1; else order=2;) -Denote array dimensions (e.g. array(1:10) var1-var10;)

Pointer Controls

var n1-n2: Column, specifies col widths @n var: Absolute, specifies start of var is @n var1 +n var2: Relative, tells SAS to move pointer n places to right of the preceding variable var1 / var2: Jump to next line for var2 #n var1: Go to nth line of input buffer to find var1

Sum Statement

variable + expression (e.g. used to create running totals - variable = cumulative total var you are calculating, expression = var you want to create a running total OF) e.g. cumulativecases + cases

COMPRESS modifiers

Specific chars let you remove groups of variables. SYNTAX: compress(string,char,mod) compress(string, ,'a'); -> removes all letters compress(string, ,'kd'); -> keeps digits only compress(string,'czy', 'i'); -> removes czy, ignores case compress(string, , 'pd'); -> removes punctuation & digits)

Infile

Specifies an external file to read with an INPUT statement, valid in a DATA step

firstobs & obs options

Specifies line # of first (FIRSTOBS=) and/or last (OBS=) observation to read in. Example: data new; set old (firstobs=5 obs=10); run; /*reads in observations 5-10*/

proc import - GUESSINGROWS option

Specifies the number of rows to scan to determine the appropriate data type and length for a variable. EXAMPLE:

PUT statement

Specifies to write something to the log. E.g. data concat; set data1 data2; by ID; if type <=: 'm' then put id; run; (Will output ID for any obs where type starts with a-m in the log)

SAS DateTime Value

Stored as number of SECONDS between Jan 1, 1960 and a specified date time (to the nearest second)

SAS Time Value

Stored as numeric number of seconds since midnight (00:00:00am) the current day. Time values range from 0 to 86400.

Options dms/nodms

Turns on/off windows (switches to line mode?)

Assign Data Variable to a Macro Variable

Use CALL SYMPUT: E.g. this code assigns the variable "species" to a macro variable 'fishtype': DATA fish; SET sashelp.fish; CALL SYMPUT('fishtype',species); RUN; *NOTE: Must use SINGLE quotes to assign macro var in call symput.

proc chart - report mean or sum instead of default (count)

Use SUMVAR & TYPE= options. E.g. proc chart data=mydata; vbar charvar / sumvar=numvar type=means; run; *Will show bar chart of mean numvar by charvar groups. **Can also do TYPE=SUM.

Conditional formatting for cell colors based on value

Use proc format + call in proc freq, tabulate, etc. EXAMPLE: proc format; value flag low-49 = 'red' 50-99 = 'yellow'; run; proc tabulate data=mydata; class female; table (female ALL)*(n*f=5. pctn*f=pctfmt.*[style=[background=flag.]]); run;

Digit selectors

Used in proc format/PICTURE. In the desired numeric "picture" (e.g. '999-99-9999'), the 9s are called DIGIT SELECTORS. Can be either 0 (will not print leading zeros) or nonzero 1-9 (will print leading zerios; most coders use 9).

Create Running Total (Whole Dataset)

data newdata; set olddata; totalvar + origvar; run; Totalvar = A NEW running total variable (will retain its value for each new obs then add the next one) Origvar = Variable you want to create a running total of (e.g. daily cases -> cumulative cases)

INDEX

index(charvar, 'target') Searches & returns the position (index) of a specified character or substring. Can also be used to check if a character or substring is present in a string by testing if index > 0. e.g. if index(charvar, 'ok') > 0, then flag=1; else flag=0 -> if 'ok' is found in a string then flag=1

Basic ODS Syntax

ods <outputtype> <file>; proc... (whatever you want to run); ods <outputtype> close; E.g. ods html "my_file.htm"; proc means data=mydata; ... run; ... ods html close; ***Output only shows AFTER YOU CLOSE.

PROC DATASETS - Change Dataset Names

proc datasets library=mylib; CHANGE olddata1=newdata1 olddata2=new2...; run;

PROC OPTIONS;

proc options; run; *Returns current values for all system options in SAS

Sorting Datasets (1 var vs Multiple)

proc sort data=mydata; BY sortvar1 sortvar2 sortvar3; run; *Sorts by var3 w/in var2 w/in var1 **Default is ascending order ***To sort all descending, must place "descending" before EACH sortvar, i.e. "BY descending sortvar1 descending sortvar2 descending sortvar3"

Rounding

round(var,n) where n = the place you want to round to. e.g. newvar = round(oldvar,10) creates newvar that rounds oldvar to the nearest 10. int(var) rounds to nearest whole integer

ODS Trace

"Trace" all output objects from an ODS statement, including Name, Label, Template, and Path. ODS trace ON; (turn on trace) ODS trace OFF; (turn off trace)

A Few Common SAS Informats

$ = Character variable comma5. = Read 5 in characters w/ commas & convert to SAS numeric (e.g. 1,234 -> 1234) comma10. = Read up to 10 chars w/ commas & convert to SAS numeric (1,234,567 -> 1234567) percent3. = Reads in % as numeric (98% -> 0.98) dollar4. = Reads in 4 characters w/ dollar sign ($125 -> 125) mmddyy10. = Converts mm/dd/yyyy to SAS date mmddyy8. = Converts mm/dd/yy to SAS date

Comparing CharVars/ If variable starts with...

'=:' -> 'starts with' (e.g. if var1 =: 'L', then...) NOTE1: Letters have numeric value "under the hood" where " "<a<b<...z<A<B<C....". NOTE2: Case sensitive! Upper case > lower Letters have >Therefore, "if x >=: 'm'" ~ "if x starts with m-z"

Create a Date Constant

(Create variable with a constant date, e.g. local index case date for COVID) Use 'INFORMAT'd; E.g. data covid; set orig; input cases date mmddyy10.; indexdate = '01MAR2020'd; run;

Calculate Duration/Date Difference

***PREVIOUS: Use date1-date2 (formats in days). BETTER: DATDIF = Calculate date diff in days: datdif(startdatevar, enddatevar, 'actual') YRDIF = Calculate date diff in years yrdif(startdate, enddate, 'actual') *'actual' is the basis - could also set to assume 360 day years/30 day months, etc. **to set a constant start/end date, use "'01Jan2017'd" or similar ***To call current date, use TODAY(), e.g. yrdif(startdate, TODAY(), 'actual')

Positional Macro

*ORDER MATTERS* when listing macro parameters (macro vars) Syntax: %macro mymacro (macvar1, macvar2, macvar3); ***sas code blabla***; %mend mymacro; Then when you call it, you list what each of the macvars are, IN SAME ORDER AS WHEN CREATING MACRO. I.e. if you want macvar1=4 macvar2='OK' & macvar3 = 46: %mymacro (4, 'OK', 46) *NOTE: Macro variables written this way are LOCAL macvars - they work only for this macro.

Running Total by Group

*SORT FIRST BY GROUP* proc sort data=data out=sorted; BY group; run; data total; set sorted; by group; IF first.group then runningtotal=0; runnintotal + origvar; run; HOW IT WORKS: Resets the running total to 0 at the beginning of each group

Numeric Operators

+, -, *, /, ** : add, subtract, multiply, divide, exponent >=/ge or <=/le : Greater/less than or equal to ~=, -=, ^= or ne: Not equal to Combine conditions using AND (&) or OR (|). If combining many conditions, SAS by default evaluates "and"s first, then "ors". Use parentheses to group as needed.

Arrays

-A temporary variable grouping (exists only for that data step) for acting on multiple vars at once -Each variable in the array is assigned a # position based on the order you list them ARRAY myarray[n] var3 var1 var4 var2... myarray = name of array n = # of elements (variables) in the array Positions are 1 (var3), 2 (var1), 3 (var4) and 4 (var2)

Common Time & DateTime Informats/Formats

ANYDTDTEw. = Extracts many diff date/time formats DATETIMEw. = Reads ddmmmyyyy hh:mm:ss $N8601Bw.d = Reads ISO 8601 date formats (e.g. 1994-11-05T08:15:30-05:00) HHMMSSw. = Reads hh:mm:ss form B8601TMw.d = Read ISO 8601 times (hhmmss<ffffff>)

Create Separate Histograms by Group

Add CLASS statement. E.g.: proc univariate data=mydata; histogram numvar; CLASS charvar; run;

Permanently assign an output format in the DATA step

Add FORMAT line into data/set step: data new; set old; format date mmddyy10.; run;

Label variables in PROC TABULATE

Add labels in TABLE statement using var1='Label'. E.g. TABLE groupvar1="My Group", numvar1="Age"*mean numvar2="Height"*median;

Order Rows in PROC REPORT

Add to DEFINE statement: proc report data=... ; column var1 var2... ; DEFINE var1 /order order=internal; run; ORDER OPTIONS: Data = Same order as data Freq = Descending by frequency Internal = Like proc sort (???) raw values Formatted = Sort by formatted values

Order Class Vars in PROC TABULATE

Add to PROC TABULATE statement. Options: ORDER=DATA -> Follows dataset order ORDER=FREQ -> Sorts by descending freq ORDER=FORMATTED -> by formatted values ORDER=UNFORMATTED -> by raw values (ascending) e.g. proc tabulate data=mydata order=freq;

Add Descriptive Stats in PROC TABULATE

Add to each analysis variable using asterisks in TABLE statement. Examples: TABLE groupvar, numvar*mean; TABLE groupvar, numvar1*min numvar1*max; TABLE groupvar, numvar1(min max mean std)*f=dollar9.2; Common options: mean,median,std,min,max

Formatted Input

Allows SAS user to read in nonstandard data formats using informats (e.g. dates, numbers with commas or dollar signs) & pointer controls (to specify variable positions). SAS reads until it reaches the last column specified, not the last space (like col input). LIMITATIONS: -Must specify where the pointer should look for a variable using column widths/pointer controls (e.g. var1 1-6 var2 $ 8-12), absolute pointer controls (e.g. @1 var1 @8 var2), or relative pointer controls (e.g. var1 +2 var2)

Graph Template Language (GTL)

An extension of ODS for creating "sophisticated graphics". Graphics are generated using templates created in PROC TEMPLATE, then graphs are rendered using SGRENDER. proc template; define statgraph histogram; begingraph; layout overlay; histogram weight; endlayout; endgraph; end; run; ods graphics / width=450px; proc sgrender data=sashelp.class template=histogram; run;

FIRST.var and LAST.var

Automatically-created variables during PROC SORT; BY;. FIRST.var = 1 if it's the first observation with a given value of the BY variable; LAST.var = 1 if it's the last observation with the BY variable, else both=0. Use to unduplicate in PROC SORT: proc sort data=mydata out=clean NODUPRECS; BY sortvar; run; Write to Dataset (after proc sort out=sorted): data newdata; set sorted; by sortvar; first = first.sortvar; last = last.sortvar; run;

Group Observations

BY <groupvar>; *Only for SAS datasets - obs must be sorted

BY variable vs BY value vs BY group

BY variable: The variable data is being sorted/ grouped by (e.g. ID) BY value: For one obs, the value of its by var (e.g. ID=1001) BY group: The group of all obs with the same BY value (e.g. if there are 5 obs with ID=1001)

ODS HTML - Files produced

Body File: Main table or graph Contents: Contains links to body files for each HTML table/graphic, with anchor tags based on NAME attribute (anchor: <A href='myfile.htm#TABLEID>) Page: Links to body file for each pg Frame: Provides simultaneous view of body + contents and/or page.

%eval

Built-in macro to evaluate arithmetic & logical operations (use SAS like calculator). Ex. %let d=%eval(10+20); %let k = 1; %let tot = %eval(&k + 1); %put &tot; (writes answer to log)

Temporary vs Permanent Datasets

By default, all data stored as temporary in the WORK library. Must specify 2-level name (library.dataset) to store permanently

UPDATEMODE=NOMISSINGCHECK (MODIFY statement)

By default, in MODIFY statements if a value is missing in the transactional dataset, SAS does not replace the master value. You can override this with UPDATEMODE = NOMISSINGCHECK so transactional values override master, even if missing.

Informats & Input Style

COLUMN input: Add formats in input statement (e.g. var1 mmddyy8. var2 $ 10-13 var3...) LIST input: Add separate INFORMAT statement OR colon. 1. INFORMAT stmt data... informat datevar mmddyy8.; input datevar; 2. COLON input datevar : mmddyy8.;

COMPBL & COMPRESS

COMPBL collapses 2+ spaces to 1. COMPRESS removes spaces (default) or other character. e.g. if x = "can't / touch this": COMPBL(x) = "can't/touch this" COMPRESS(x) = "can't/touchthis" COMPRESS(x,'/') = "can't touch this"

PROC DATASETS - Copy/Move Datasets

COPY ALL DATASETS: proc datasets; copy out=outlib in=inlib; run; MOVE ALL DATASETS (deletes from inlib): copy out=outlib in=inlib move; MOVE/COPY ONLY SELECTED DATASETS: copy in=inlib out=outlib; select data1; MOVE/COPY ALL BUT EXCLUDED DS: copy in=inlib out=outlib; exclude data1;

WEEKDAY(date)

Calculates day of week for a given date (though SAS recommends testing)

Notepad window

Can also be used to edit/develop code (honestly don't see the point on this and internet is not helpful)

RETAIN

Causes an observation to retain its value from one data step iteration to the next (instead of being reset to missing each time). (don't fully understand use cases though... possibly means subsequent manipulations in the dataset will not apply because the original values are retained?)

VERIFY function

Checks for invalid characters, returns index position of first invalid character if present, otherwise 0. SYNTAX: verify(charvar, 'validchars'); e.g. if x values are ABCCE, BEAXI, EBBB, verify(x,'ABCDE') -> 0, 4, 0

Interleaving Datasets

Combines SORTED datasets vertically, & sorts them in the new dataset on a common variable. Syntax: [1st proc sort data1 and data2] data combined; set data1 data2; BY sortvar; run;

Merging

Combines datasets horizontally. ***DATASETS MUST BE SORTED ON IDVAR FIRST.** 1-to-1 Merge: When datasets have the exact same observations in the same order (exact match) - you can basically just glue datasets together side-by-side. (data combo; merge data1 data2; run; - no BY statement). Mainly use if order of merge doesn't matter Match Merge: Merging on a common identifier/BY variable (data combo; merge data1 data2; BY byvar; run;)

SPEDIS

Computes likelihood of 2 words matching, where 0=perfect match, higher score = worse match. SYNTAX: spedis(charvar, "match") E.g. Data match; Input word $12.; Match = spedis(word, 'Epidemiology'); Datalines; Epidemiology Encyclopedia Epidemic Epistemology ; Run; Yields Match=0,87,37,20

Concatenate Datasets (2 Methods)

Concatenate datasets = Stack vertically. Method 1: DATA/SET (any # of datasets) data new; set old1 old2; run; Method 2: PROC APPEND (2 datasets max) proc append base=basedata data=newdata; run; (optional FORCE option)

TODAY()

Sets date program is run as today's date. Can also assign variable using this statement. E.g. data new; set old; today = TODAY(); days = deptdate - today; run;

proc print advance options

Default: All vars & obs print ID <var>; Emphasizes 1 variable & makes it first column for all "pages"/tables WHERE: Used to subset data FORMAT: Add format st (proc print... ; var... ; format var1 comma7. var2 mmddyy10. etc. SUM: Specify vars to sum (will add that column to the output if not in VARS) BY: Group output using BY <groupvar> - must be sorted 1st! -> can also break out by page using PAGEBY

ODS Table Definitions

Define table formatting - not required. Can run ODS without them. May include column order, data formats, headers, fonts, etc.

Semantic Errors

Definition: Correct language elements but incorrect usage, e.g. wrong # of arguments, using a numeric variable where only character is vaild. When Detected: Usually compilation, sometimes execution Consequences: SAS enters syntax check mode (only ltd processing but SAS runs through the whole program to validate it.)

Execution-Time Errors

Definition: SAS attempts to execute program but fails. E.g. invalid arguments to functions, invalid math operations, obs in wrong order for BY group, out of storage space. When Detected: Execution Consequences: Usually runs & produces warnings or notes, but stops processing for serious errors.

Data Errors

Definition: Some data values are not appropriate (e.g. some character values in a numeric var) When Detected: Execution phase Consequences: SAS generates _ERROR_=1 for that obs but keeps running, writes invalid values to log.

Syntax Errors

Definition: Statements don't conform to SAS rules (e.g. missing semicolon/ quote, misspelled or incorrect keyword). When detected: Compilation phase Consequence: SAS first attempts to try and autocorrect/interpret. If can't prints error message & stops running.

Macro-related errors

Definition: When a macro facility is used incorrectly When detected: Macro compile time or execution Consequences: Depends

Reading in Data Files: Direct vs Indirect Method

Direct: Include filepath in DATA step. E.g.: data temp; infile '/folders/myfolders/temp.dat'; input... Indirect: Set up fileref at the beginning for easy reference later. E.g.: filename temp '/folders/myfolders/temp.dat; ... data temp; infile temp; input..."

Double-click & Right-click if you have no mouse

Double-Click: Type x or x next to the item, then enter Right-Click: Type ? next to the item, then enter

ENDSAS; or BYE;

Ends SAS session (I would imagine best for batch processing/large jobs running in the background)

More SAS Number Formats

Ew. = Scientific format (2.35E-3) Fract. = Fraction (1/4) Percentw.d = Percent (75.25%)

Combining DO Loops + Array

Executes multiple actions on groups of variables - use i= statement to specify which positions in the array to act on. EXAMPLE: data fish; set sasuser.fish; ARRAY fisharray[3] height weight length; DO i = 1 to 3; IF fisharray[i] = 0 then fisharray[i] = .; IF fisharray[i] > 1000 then fisharray[i] = .; END; RUN; *Recall must add END statement to complete DO

How to Handle Missing or Varying Data (e.g. supposed to be 5 characters, it's really 3)

FLOWOVER (default) - looks to next record STOPOVER (only for supposedly perfect data) MISSOVER (converts all to missing) TRUNCOVER (allows shorter data values) *All used after INFILE e.g. infile filename 'filepath' truncover; infile datalines truncover;

Ways to Run SAS

FOREGROUND (mostly interactive, can't do anything else in SAS while running) & BACKGROUND processing. *All are foreground except batch. 1. Windowing enviro (usual, most flexible) 2. Noninteractive (prep a file, run it then it goes - can't use your work station or modify while running) 3. Batch mode (runs larger operation in "batches" in the background of your computer; you can do other tasks at work station simultaneously) 4. Interactive line mode: Enter one line of a program at a time, SAS executes automatically when end of line is reached

Compilation Phase (3 things produced)

First stage in reading in a dataset. Checks for errors & compiles statements. The compilation stage produces: - Input buffer - Program Data Vector (PDV) - Descriptor Information

File Paths for Data or Libs (Windows)

For SAS myfolders: 'folders/myfolders/libref'; 'folders/myfolders/data.dat'; For other location on DPH computer (example): 'C:\Users\employeeid\Documents\test.csv'

proc sgplot

For creating more advanced graphics. Basic syntax: proc sgplot data=mydata; <charttype> var1 <options> / <response= var2>; run; Chart types include histogram, vbar, hbar, dot, series (lines), density, hbox, vbox

Z formats

Formats standard numeric variables **with leading zeros** (otherwise leading zeros will be removed). e.g. if x values are 00552, 1234, 0873 input x Z.; (will keep leading zeros)

proc import vs infile/input

From SAS communities: -INFILE + INPUT is more flexible, allows more control in how data is read in -IMPORT proc tries to "guess" data type, width, format, etc. https://communities.sas.com/t5/General-SAS-Programming/Difference-between-INFILE-AND-IMPORT/td-p/102115#:~:text=INFILE%20together%20with%20INPUT%20in,column%20type%2C%20width%20and%20informats.

proc univariate/histogram options

GRID: Add reference grid to chart (cgrid=grid color, lgrid=line type. E.G. histogram numvar / grid cgrid=blue;) VSCALE: Sets units for y axis (options are COUNT, PERCENT, or PROPORTION). VAXIS: Sets increments for y axis (e.g. histogram numvar / vscale=count vaxis=0 to 10 by 2) MIDPOINTS: Set bin width (e.g. histogram var1 / midpoints = 0 to 500 by 100;) VAXISLABEL: Label vertical axis (e.g. histogram var1 / vaxislabel='My Label' INSET: Adds mini tabel within graph with key stats. (e.g. proc univariate data=mydata; histogram var1; inset n mean min max; run;)

NOSOURCE, NONOTES and ERRORS=

Global options statements which suppress log output. NOSOURCE = Suppress SAS statements in log NONOTES = Suppress notes in log ERRORS = #: Tells SS the max # of observations for which to print error messages Enter as standalone (global) options line: e.g. options nosource nonotes errors=5;

OPTIONS missing=

Global statement, sets missing value for output e.g. OPTIONS missing='missing';

PRINTTO=

Global statement, specifies where to print log or output: OUTPUT: proc printto print='output-file' new;run; (reset to default: proc printto print=print; run;) LOG: proc printto log='alt-log-file'; run; (reset to default: proc printto log=log; run)

Missing Data in PROC TABULATE

If class var has missing data, use MISSING option. If analysis var ", will exclude from analysis. Can label missing values using MISSTEXT= option. Ex. proc tabulate data=mydata missing; class x y z; table x all y all / misstext = 'No Data'; run;

WHERE/IN vs IF/THEN

In a DATA step, WHERE/IN statements are applied before the input buffer, vs IF/THEN conditions are applied between the PDV & final output dataset. As a consequence: -IF is better when accessing raw data, for using new vars or automatic vars (_N_, _ERROR_), for using dataset options (e.g. OBS, FIRSTOBS), or to conditionally execute a statement. -WHERE is better for special operators (LIKE, CONTAINS), subsetting as a dataset option, w/ PROC SQL, to improve efficiency. NOTE: When merging datasets, WHERE/IN is better applied BEFORE merge, vs IF/THEN is better AFTER.

LIST statement

In a DATA/INPUT statement, tells SAS to list observations in the log as it's reading them in. E.g.: data vehicles; input id type $ wheels; if wheels > 4 then list; datalines; 1001 car 4 ... etc.

Rules for SAS Names

Includes datasets, varnames, librefs, etc.: -Can contain 1-32 characters -First character must be a letter or underscore -Subsequent chars may be letter, number or underscore -No blanks or other special characters -[Varnames only] Not case sensitive when calling/ using, but SAS will always output using the original varname case entered.

Input (function) vs Input (keyword)

Input (function): Used to convert character values to numeric OR other character e.g. phone = input(phonenumber,5.) Input (keyword: Instructs SAS how to write the values of SAS variables e.g. data new; input var1 var2 $ var3...

Informat

Instructs SAS how to read data, like date values, into SAS variables

Format

Instructs SAS how to write/format the values of SAS variables (in output)

PROC DATASETS basic functions & syntax

Interactive proc used to bulk-manage, modify, & view SAS libraries & datasets. Runs until QUIT; used. Basic Syntax: proc datasets library=mylib; <many diff sub-commands>; Run; Quit; ***If no library specified, assumes default/work KEY FUNCTIONS include copying/moving files across libraries, listing library contents or dataset attributes, renaming files, repairing files, deleting files, appending datasets, modifying datasets (e.g. rename vars, change/add/remove formats & labels), manage passwors

proc plot options

Label Axes: Add LABEL statement (proc plot...; plot y*x... ; LABEL yvar = 'Y Label' xvar = 'X Label'; run;) Set Axis Values: Add HAXIS or VAXIS statement (plot yvar*xvar / haxis = 0 to 200 by 5 vaxis = 0 to 3000 by 100;) Change Dots: Add ='plotchar', e.g. plot yvar*xvar='+'; (will plot dots as + signs) Suppress legend: NOLEGEND option (proc plot data=... nolegend;)

KEYLABEL statement (e.g. proc tabulate)

Lets you assign labels to keywords/auto-generated titles, e.g. proc tabulate data=mydata; class female ses; table female ALL, ses ALL; keylabel ALL = "Total" N = " "; /*2 spaces between... not sure why this works but it does*/ mean = "Mean" lclm = "95% Lower CL" uclm = "95% Upper CL"; run; ^Labels ALL total, removes N row from proc tabulate output.

Variable Assignment

Lets you create new variables or replace/modify old during the DATA step - e.g.: data new; set old; newvar1 = oldvar1*10; if oldvar2 = '1' then newvar2 = 'yellow'; if oldvar2 = '2' then newvar2 = 'blue'; OR (replace) if oldvar2='y' then oldvar2 = 'yes';

PROC TEMPLATE

Lets you create template, e.g. for ODS output. Basic syntax: PROC TEMPLATE; DEFINE <table defs>; HEADER... ; COLUMN ... ; END; RUN; *See pg 586-7 for example code

ODS Select/ODS Exclude

Lets you select specific objects to output (must know name first htough). E.g. ods select BasicMeasures Quantiles; proc univariate... ^Selects only 2 tables to output from proc univariate.

Modified List Input

Lets you use informats for list-format (delimited) data. RULES: -Must use colon between varname & informat (e.g. data test; input var1 : $10. var2 : mmddyy10.) -Can allow embedded nulls IF there are at least 2 spaces or other dlm between values using "&" (e.g. data test; input var1 & $18. var2 $;)

Data Input Types

List input Column input Formatted input (Also, modified list & mixed)

PROC DATASETS - Get library listing

Lists files ("members") in a SAS data library: proc datasets library=mylib; run; OPTIONS: MEMTYPE= Specifies file type (e.g. data, catalog) e.g. proc datasets library=mylib memtype=data.

Log Definition & Components

Log provides record of everything you do in a SAS session. Key components: -SAS statements -Error messages -Notes & warnings -Notes w/ # of obs and variables for each dataset created

Program Data Vector (PDV)

Logical memory area where SAS builds a dataset, one observation at a time. The PDV reads or creates data values (e.g. if missing #, -> ".") _N_ and _ERROR_ are also created in the PDV

Input Buffer

Logical memory area where SAS reads each record from a raw data file when the program executes (if reading another SAS data file, skip this step)

PROC TRANSPOSE: Long to Wide

Long to Wide - Basic Syntax: proc transpose data=flippy out=wide prefix=weight; BY id; ID visitnum; VAR weight; run; Here start with a long dataset with 3 columns: ID, visit number, and weight -> Transform to a dataset with 1 row per ID & 3 columns for weight at visits 1-3. *BY is the ID variable you want to be the unique row identifier in the wide DS (must sort data on BY var first). **ID tells SAS which variables or values you want transposed to new columns. Here, it's visit #. ***VAR is the VALUE you want to keep & transpose by ID (i.e. here, we want weight by visit number). Usually only specify one var for wide to long.

proc chart options

MIDPOINTS: Set axis intervals for continuous numeric variables (vbar var1 / midpoints=0 to 100 by 5); LEVELS: Sets # of cut points for bar charts of numeric variables (vbar var1 / levels=5); DISCRETE: Override SAS auto-grouping numeric variables & plot all discrete values. (e.g. vbar numvar1/ discrete) SUBGROUP: Set subgroups within main groups (e.g. vbar var1 / sugroup = var2;)

PROC REPORT missing data

MISSING option tells SAS to include missing values as a distinct value. No MISSING option will lead SAS to ignore or drop these values. E.g. proc report data=grocmiss missing; column sector manager N sales; define sector / group format=$sctrfmt.; define manager / group format=$mgrfmt.; define sales / format=dollar9.2; rbreak after / dol summarize; run;

Common Date Informats/Formats

MMDDYY8. -> mm/dd/yy MMDDYY10. -> mm/dd/yyyy DATE7. -> ddMMMyy (e.g. 25JUL07) DATE9. -> ddMMMyyyy WORDDATE18. -> January 25, 2020 WEEKDATE29. -> Saturday, March 8, 2020 ANYDTDTE10. -> Used to read in mixed/any date format (though not always perfect) *Note: Best practice to use 4-digit year

Descriptor Information

Metadata info stored in the descriptor portion of a SAS dataset. Includes variable and dataset attributes. Can be viewed using PROC CONTENTS

Missing Data (Numeric)

Missing numeric entered as period (".") -"." sorts low in PROC SORT or comparison (e.g. n>.) -Some procs auto-exclude missing, others do not. Be careful!

Proc Plot - Multiple Scatter Plots

Multiple Plots, Different Pages: Multiple y*x's e.g. proc plot data=mydata; plot yvar1*xvar1 yvar2*xvar2... ; run; Multiple Plots, Same Page: Use VPCT/HPCT, where VPCT = % of vertical space given to each graph, HPCT = % of horizontal space. e.g. proc plot data=mydata hpct=33; ...plot... run; Multiple Variables Plotted on Same Axis: Use OVERLAY option e.g. proc plot... ; plot yvar1*xvar1 yvar2*xvar1 / overlay; run;

Message characters

Non-numeric characters used in proc format/PICTURE that will be printed in numeric output. (E.g. hyphens for phone numbers or SSN)

Call routines

Similar to functions, but different in that you cannot use them in assignment statements or expressions

ODS

Output Delivery System. How SAS produces output in different formats, such as HTML, SAS, RTF (word), a non-SAS dataset. How it Works: 1. Data and Tabular definitions (with formatting instructions for data) are combined into the OUTPUT OBJECT. 2. Output object has a name, label, path, and usually formats. 3. Output object is directed to an ODS destination, which specifies the type of output & location 4. SAS creates final formatted output.

proc import basic syntax & options

PROC IMPORT datafile='C:\filepath" dbms=dlm out=mydata replace; delimiter='&'; getnames=yes; run; *DBMS specifies data type. Options include csv, xlsx, dlm (space- or other delimited file), tab (tab-delimited), access (MS access database), dbf (dBASE file) **OUT= specifies name (& lib) of output SAS dataset ***REPLACE: Overwrites existing SAS dataset ****GETNAMES= Pull variable names from 1st row of data

proc import vs infile/input - reading in csv

PROC IMPORT datafile='filepath' dbms=csv out=mydata replace; getnames=yes; run; DATA mydata2; infile 'filepath' dlm=',' firstobs=2; length var2 $12; input var1 mmddyy10. holiday $; run;

PROC REPORT basic structure

PROC REPORT data=mydata <options>; COLUMN var1 var2...; DEFINE... (basically used to define everything from formats to output statistics to which vars to group by. Options include GROUP, ANALYSIS, and more); RUN; Simplest PROC REPORT looks like PROC PRINT (default = all vars, all obs). Can also do summary stats like PROC TABULATE. Supposedly the most flexible.

PROC TABULATE structure

PROC TABULATE data=mydata <options>; CLASS var1 var2...; /*usually charvars*/ VAR numvar; /*analysis vars - numeric*/; TABLE var1*var2, numvar*mean*f=comma7.; run; TABLE statement tells SAS how to structure the output. Must specify column var at minimum - if 1 comma, goes TABLE row, col. If 2 commas, goes page,row,col. EVERY VARIABLE YOU WANT TO USE MUST BE SPECIFIED IN A CLASS OR VAR STATEMENT

PROC DATASETS - List dataset content

Proc datasets library=mylib memtype=data; CONTENTS data=mydata; run; Outputs header, name, engine, date created & last modified, var names & attributes, more. *NOTE: Can only specify 1 dataset per contents stmt MODIFIERS: VARNUM: Lists vars in order of their logical position in dataset (e.g. contents data=mydata varnum;) SHORT: Lists short 1-line list of vars (" short;) NODS: Provides listing only (" nods;)

Concatenation using SET vs PROC APPEND

Processing time: Proc append is faster (doesn't process 1st dataset) # of datasets it concatenates: 2 only for proc append, any number for set Handling Different Variables (e.g. dataset 1 has a variable dataset 2 doesn't): SET keeps all vars from all datasets, sets to missing as needed. APPEND uses all vars from BASE, sets to missing if not included in new data. Drops any new data vars not in BASE dataset. Handling Different Varnames or Types: SET doesn't run. BASE will treat diff names like diff variables, only run for type if you use FORCE option Handling Diff. Lengths: - SET: Uses length from 1st dataset in SET statement - PROC APPEND: If BASE length >= DATA, keeps BASE length. If DATA > BASE, doesn't run unless using FORCE option, in which case it truncates DATA. Handling Different Formats, Informats, & Labels: - SET: Explicitly defined overrides default, 1st dataset overrides if all are defined - PROC APPEND: BASE overrides

proc univariate histograms - purpose + basic syntax

Produces a histogram showing continuous NUMERIC variable distribution. Basic syntax: proc univariate data=mydata noprint; histogram numvar; run; *NOPRINT suppresses a bunch of tables auto-generated by proc univariate. **Auto-scales vertical axis & horizontal bins

proc plot purpose & basic syntax

Produces simple scatterplot of two (numeric) variables. Basic syntax: PROC PLOT data=mydata; PLOT yvar*xvar; Run; Default output includes x/y labels, legend, dots are alphabet where A=1 obs, B=2 obs, ... Z=26+ obs

proc chart function & basic syntax

Produces simple vertical & horizontal bar charts, block charts, pie charts, and star charts. Defaults to showing FREQUENCY (n) of each group. Basic syntax: PROC CHART data=mydata; <charttype> var1; run; *<charttype> can be VBAR, HBAR, BLOCK, PIE, or STAR **For numeric variables, SAS will auto-set cutpoints.

3 SAS Programming Windows

Program Editor, Log, and Output (Results Viewer is now default output destination for 9.3+) *5 most common windows also include Results & Explorer.

proc contents

Provides descriptor portion of dataset, or lets you view contents of an entire library. Simplest use: proc contents data=mydata; run;

Put (function) vs Put (keyword)

Put (function): . Returns a value using a spec. format, used to convert numeric to character e.g. id2=put(id, 8.) Put (keyword) Writes lines to the SAS log, output window, or to an external location specified in the most recent FILE stmt, valid in a DATA step e.g. data concat; set data1 data2; if type = 2 then put id; run;

Macro variable lists

Quick way to reference a list of variables. %LET varlist = var1 var2 var3 var4; proc freq data=lib.mydata; table &varlist; run;

PROC DATASETS Modify Variable Attributes

RENAME VARIABLES: proc datasets library=mylib; MODIFY mydata; RENAME oldvar1=newvar1 old2=new2... ; run; CHANGE/ASSIGN FORMATS: proc datasets... ; modify mydata; FORMAT datevar1-datevar3 mmddyy9. var4 $20...; run; CHANGE/ASSIGN LABELS: proc datasets...; modify mydata; LABEL var1='Label 1' var2='Label 2'... ; run; REMOVE FORMATS/LABELS: proc datasets...; modify mydata; ATTRIB _all_ label=''; ATTRIB _all_ format=; run;

ODS Output Keywords/Types

RTF = MS Word Output = SAS dataset Listing = SAS text output (classic) HTML = HTML, web (9.4 default) Printer = High-resolution output, specify printer path PDF = PDF

Column Input

Reads in data structured in fixed columns. ADVANTAGES: -Character variables CAN contain spaces or embedded blanks -Can create variables >8 bytes without using LENGTH statement -Lets you read columns in any order or skip some -Char vars can be up to 32,767 bytes LIMITATIONS: -Only standard numeric or character data allowed -Must specify column widths (e.g. var1 1-6 var2 $ 8-12).

List Input

Reads in delimited data where 1 row = 1 obs. Default delimiter is a space, for other delims specify "dlm =" (e.g. for csv do dlm = ','). RULES: -Only for data in standard numeric or char formats -No embedded blanks allowed -Must specify missing values using a non-space character such as '.' or '-' -Default length of char vars is 8 bytes

Conditional DELETE (Data step)

Remove variables that meet a condition. E.g.: data subset; set fulldata; if project = 'done' then delete; run;

UPDATE datasets

Replaces and/or appends data in a master dataset based on a new, transactional dataset. (If some idvars are missing from new dataset, does not delete them - only adds/modifies). data master; update master transaction; by ID; run; UPDATEMODE= option -> specifies what to do with missing data in new transactional data. UPDATEMODE=MISSINGCHECK (default) leaves master data values in place if they're missing in new dataset. UDPATEMODE=NOMISSINGCHECK replaces master data with missing if missing in new dataset. ***Also can create a new dataset

MODIFY

Replaces, deletes or appends existing dataset (similar to update, but cannot create NEW datasets, only replace). Can modify a dataset by itself, or modify a master using a transactional. data mydata; modify mydata; var1 = var1*.25; run; data master; modify master transaction; by id; run; *NOTE: Data do not need to be pre-sorted

INTCK

Returns # of specified intervals between any 2 dates. Format: intck(interval, startdate, enddate, <method(optional)>) Interval options include 'DAYS', 'MONTH', 'qtr', 'YEAR', 'WEEK', custom

INTNX

Returns date given an interval, start date, and number of intervals elapsed. Basic format: intnx(interval, startdate, increments) Intervals similar to INTCK - 'DAY', 'week', 'MONTH','YEAR','HOUR' (not case sens.)

COMPRESS

Returns string variable with specified characters removed (defaults to remove blanks if no character specified). E.g. compress(var,'1234567890') -> removes digits compress(var, "+-", modifier) Modifiers add a group to the list of modified characters

data _null_;

Runs a DATA step without generating a dataset. Often used with PUT (i.e. write to log instead). E.g. data _null_; input var1 var2 var3 var4; put var1 var2 var3 var4; /*write to log*/ datalines...

PICTURE + prefix option

SAS can't put character as the first digit automatically. Add prefix: proc format; picture ph low-high = '0999) 999-9999' (prefix = '(' ) ; run;

How SAS Codes Dates

SAS codes datases as the # of days before or after January 1, 1960. (1/1/1960 = 0, 1/2/1960 = 1, etc.) For data in mmddyy8. format, e.g. 01/07/70, SAS uses YEARCUTOFF to infer the first 2 digits of year. For v9.4, YEARCUTOFF defaults to 1940: if YY<40, SAS assumes it's in the 2000s (2000-2040). If YY>40, assumes it's in the 1900s (1941-1999).

SCAN function

Scans for the nth "word" separated by a specified delimiter: scan(source,n,delim) E.g. if x="my dog, she's the best, I swear!" scan(x,1,',') = "my dog" ("my dog" is the first "word" or chunk of variable before the first delimiter. scan(x,-1,',')=" I swear!" (n=-1 returns LAST segment)

INDEXC

Searches character for 1+ characters. Syntax: INDEXC (character-value/var, 'char1', 'char2', 'char3', 'charn') -> returns position (index #) of the FIRST occurrence of char1, char2, char3, charn; if none found, returns 0. Can also search for multiple chars using INDEXC(string, 'char1char2char3char4'); e.g. if string="ABCDEFG" indexc(string,'F','C','G') -> 3 (position of C) indexc(string,'FCG') -> 3 (position of C) indexc('x','y','z') -> 0 (not found) **Can use this to read in mixed date formats (see Fun w/ Functions exercise)

FIND

Searches for substring within stringvar & returns position # of first letter in substring. (Like index but for substrings?) *note: 'i' modifier = ignore case. e.g. find_silver=FIND(rich, 'silver','i');

Execution Phase

Second phase of a SAS DATA step, where SAS executes the data step statements ONE OBSERVATION AT A TIME, looping until it reaches the end of the dataset. Steps: 1. Begin data statements 2. Sets variable values to missing in PDV 3. Checks: is there a new record (obs) to read? If yes, continue to 4, if no, end DATA step & proceed to rest of program 4. Read 1 input record (obs) 5. Execute any additional data steps (e.g. formats, labels) for this 1 record 6. Write this 1 record to the SAS dataset 7. Loop back to step 1.

SAS 9.4 Universal Printer

Set printerpath, then just use ods printer. E.g.: options printerpath=pdf; ods printer; proc means data=mydata; ... ; run; ods printer close;

Subset Datasets

Subset OBSERVATIONS: 1. Delete: IF <condition> THEN DELETE; 2. Accept/Keep: IF <cond.>; *How it works: SAS does not write obs to output dataset; jumps to next obs. Generally use whichever requires least arguments (SAS prefs #2) **Can also use WHERE/IN - best for PROCs Subset VARIABLES: Use drop/keep

SUBSTR(left) and SUBSTR(right) vs SUBSTRN

Substr(left) and substr(right) refer to their position in an assignment statement. Right (usually to EXTRACT): newvar = substr(oldvar,1,5) Left (usually to REPLACE): substr(oldvar,1,3) = '123' (replaces first 3 characters of oldvar w/ 123) substrn like substr but a few extra features See https://support.sas.com/resources/papers/proceedings/proceedings/sugi25/25/cc/25p088.pdf

SUM(x)

Sums values, excluding missing E.g. totalcost = sum(cost,tax);

ALL option for PROC TABULATE

System option, adds total columns/rows for specified variables. (KEYLABEL lets you label sas keywords). E.g. proc tabulate data=mydata; class female ses; table female ALL, ses ALL; keylabel ALL = "Total"; run; ^Produces table with total row for gender, total col for SES

&SYSDATE

System-generated macro variable that calls the date the program is run, with default format 06NOV20. E.g. data _null_; CALL SYMPUT ('mydate', trim(put("&sysdate"d,mmddyy10.))); run; %put &mydate; Other automatic macro vars include &SYSDATE9, &SYSDAY, and more. See https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=mcrolref&docsetTarget=n0t7cm070dig9hn1kmz3892ebtmw.htm&locale=en.

Cross Elements in PROC TABULATE

TABLE var1*var2 (groups rows/columns first by var1, then var2 within var1)

DO group

Tells SAS to DO several actions based on a common condition, until an END statement ends that condition. E.g. data new; set old; IF city = 'amsterdam' then do; country = 'Netherlands'; museums = 15 walkscore = 82; end; ELSE IF city = 'roma' then do; ...

Formatting Proc Print Output

Title statement (TITLE 'My Title';) Labels: (LABEL var1 = 'mylabel' var2 = 'label2';) Double spacing: Add "double" option (proc print data=mydata double;) Footnotes: Like title statement (FOOTNOTE 'here is my footnote';) Uniform Column Widths: Add width=uniform (proc print data=mydata width=uniform;)

TRANSLATE & TRANWRD

Translate: Searches for & replaces a specific character in a character variable -> translate(variable, tochar, fromchar) Tranwrd: Searches & replaces a substring -> tranwrd(variable, target, replacement) E.g. if x = "My cat, she is fluffy, orange, and fat" translate(x,'!',',') = "My cat! she is fluffy! orange! and fat" tranwrd(x,'cat','enchilada') = "My enchilada, she is fluffy, orange, and fat"

LAG, LAG2 or LAGn

Used to compare/compute differences between subsequent OBSERVATIONS (vertically). LAG(x) by default returns the value for the observation immediately prior. LAG2(x) returns value for 2 obs prior. LAGn(x) returns value for n obs prior. E.g. Data lastime; Input day covtest; Prev_day=lag(covtest); Two_Days_Ago=lag2(covtest); Four_Days=lag4(covtest); Datalines; 1 40 2 165 3 256 4 145 ; Run; LAG values: - , 40, 165, 256 LAG2 values: -, -, 40, 165... etc. **Could use this to compute 3-day averages, etc.

INPUT + "trailing @" (input... @;)

Used to read in only observations that meet a specified condition: data fun; input conditionvar 3-5 @; IF conditionvar = 3; input var1 var2 var3 conditionvar var5; datalines; ...

keep vs drop

Used to select variables to keep or drop when reading in a dataset. Can apply either to the original set (KEEP= / DROP=) or new dataset (KEEP/DROP). Example 1 (only reads in kept variables to PDV): data new; set old (keep = var1 var2 var3); run; Example 2 (reads in all vars to PDV, only writes kept vars to new dataset): data new; set old; keep var1 var2 var3; *no equals sign* run;

LENGTH statement

Used to specify length of variables in the DATA step. MUST COME BEFORE input statements/new variable creation. E.g.: data new; set old; length newvar $10; if oldvar = 'purple' then newvar = 'cool'; else if oldvar = 'yellow' then newvar = 'totallyrad'; run; (Prevents SAS setting default length=4 and cutting off 'totallyrad' at just 'tota') NOTE1: Moves variables to the left of dataset NOTE2: Can be used to save storage for numeric vars (mainly integers), but more common for character

OUTPUT

Used to specify one or more output datasets. Note1: This command suppresses auto-output, so if using must write output statement for all obs. Note2: Order matters! Always put at the END of a data step (after variable assignments) E.g. (writes 4 output datasets): data italy other summer restofyear; set olddata; if country = 'Italy' then output italy; else output other; if season = 'Summer' then output summer; else output restofyear; run; **MAKE SURE TO PUT ALL OUTPUT DATASETS in first DATA line!!!!

INPUT + double trailing @ (input... @@;)

Used to split one record into multiple observations by telling SAS to loop the same variables until it finishes reading the whole record: data test; input gender $ age @@; datalines; m 13 f 22 m 21 f 65 m 18 f 34 ; run; *will run gender & age twice for each line to split it into one long dataset w/ age & gender.

PICTURE

Valid in proc format, lets user create custom numeric formats by showing SAS a 'picture' of what the output should look like. EXAMPLE: proc format; picture ssn low-high = '999-99-9999"; run; [call using format] NOTE1: Picture cannot start with symbol - use PREFIX option to start w/ symbol NOTE2: *ssn = name of format **low-high = tells SAS all values should be printed with that format

Read Multiple Records into 1 Obs

When data for one record is split on multiple lines: e.g. 1023 David blue 189 1033 Angela ... Can use multiple input statements, "/", or "#n": Option 1 (multiple inputs): data ppl; input id 1-4 name $ 6-23; input eyes $ 1-6; input weight 1-3; datalines... Option 2: "/" data ppl; input id 1-4 name $ 6-23 / eyes $ 1-6 / weight 1-3; datalines... Option 3 "#n" data ppl; input #1 id 1-4 name $6-23 #2 eyes 1-6 #3 weight 1-3 (or whatever order you want)

PROC TRANPOSE: Wide to Long

Wide to Long - basic syntax: proc transpose data=wide out=long; VAR weight1-weight3; BY id; run; Creates table with an id column, _NAME_ (weight1, weight2, or weight3), and values column (actual values for weight1-3). *BY is the unique identifier you want to repeat across rows *VAR specifies variables that contain similar data over different columns, e.g. for repeat measurements over time.

PROC TRANSPOSE - basic, no options

With no options specified, proc transpose will exactly transpose the data (flip rows & columns, putting all varnames in a column called _NAME_, and naming other cols COL1, COL2...COLN. E.g.: proc transpose data=flippy out=testy; run;

FILE PRINT <options>;

Writes PUT output directly to output window instead of log. E.g. data _null_; infile...; input var1 var2 var3... ; FILE PRINT notitles; put @1 var1 @6 var2 @15 var2... ; run;

%PUT

Writes specified text to the log. Examples: Code: %put My first macrovar is: &varlist9; Output: My first macrovar is: read9 math9 write9 Code: %put _all_; Output: Writes all macro vars to log & states if they're automatic, global (user-created)

ZIPNAME, ZIPCITY, ZIPSTATE

ZIPNAME(5digitzip) returns uppercase state names in which zip code is located ZIPCITY(zip) returns city ZIPSTATE(zip) returns state abbreviation ZIPFIPS(zip) returns state FIPS code

Concatenate Character Vars

x||y or cat(x,y): Concatenates exactly as they are catt(x,y): Also removes trailing spaces cats(x,y): Removes trailing & leading spaces catx(sep,x,y): Concatenates, removes spaces, and adds a separator in between. e.g. x="the dog ", y="& cat " x||y = cat(x,y) = "the dog & cat " catt(x,y) = "the dog & cat" cats(x,y) = "the dog& cat" catx(' ',x,y) = "the dog & cat" *NOTE: Watch for truncation - may need LENGTH


Set pelajaran terkait

ch 11 Wounds, Pressure Ulcers (Fund ch 48)

View Set

Alterations in the Musculoskeletal system quiz

View Set

Brunner and Suddharth Med Surg Chapter 39 Study Guide Questions Part 2

View Set

PHYSICS 171 Final Exam study guide

View Set

Unit 2: Material Facts Related to Property Condition and Location

View Set

Mental Health Powerpoint Questions

View Set

APUSH Chapter 12: Give Me Liberty

View Set

Applied Clinical Anatomy Section 1

View Set

Unit 1: Literature with a Purpose Study Guide

View Set

Biancas inscribed Angles - Circles Unit

View Set