Doing more with the SAS-Coursera

Pataasin ang iyong marka sa homework at exams ngayon gamit ang Quizwiz!

Storing Custom Formats

By default, custom formats are stored in the temporary Work library in a catalog named Formats. A SAS catalog is a special SAS file that stores different types of information in smaller units called catalog entries. If you want to create permanent custom formats, you must specify where they are stored. You can use the LIBRARY=, or LIB=, option to specify a library and a catalog name, such as pg2.myfmts. If you specify just the library, SAS uses the default catalog named Formats. options fmtsearch=(pg2.myfmts sashelp); This OPTIONS statement directs SAS to search in the Myfmts catalog of the PG2 library and the Formats catalog of the Sashelp library after it searches the default locations Work.Formats and Library.Formats. EX: proc format library=pg2; */ OR proc format library=pg2.formats; value $reg 'C' = 'Complete' 'I' = 'Incomplete' other = 'Miscoded'; value hght low-<58 = 'Below Average'58-60 = 'Average'60<-high = 'Above Average';run; options fmtsearch=(pg2); */ OR options fmtsearch=(pg2.formats); proc print data=pg2.class_birthdate noobs;where Age=12;var Name Registration Height;format Registration $grg. Height hght.;run;

Building Character Strings

CAT (string1, ... stringn): Concatenates strings together, does not remove leading or trailing blanks. CATS (string1, ... stringn): Concatenates strings together, removes leading or trailing blanks from each string. CATX ('delimiter', string1, ... stringn):Concatenates strings together, removes leading or trailing blanks from each string, and inserts the delimiter between each string. Name=ALFRED Season= 2017 Day=46 StormID1=cat(Name, Season, Day); StormID2=cats(Name, Season, Day); StormID3=CATX('-', Name, Season, Day); {CATX function to concatenate Name, Season, and Day with a hyphen inserted between each value.} StormID1=ALFRED 201746 StormID2 =ALFRED201746 StormID3=ALFRED-2017-46 To Modify the StormID2 assignment statement to insert a hyphen only between Name and Season. StormID2=cats(Name, '-', Season, Day); gives ALFRED-201746

Character Functions

COMPBL (string):Returns a character string with all multiple blanks in the source string converted to single blanks. NewLocation=compbl(Location); TOYAMA, Toyama, JA to TOYAMA, Toyama, JA COMPRESS (string <, characters>):Returns a character string with specified characters removed from the source string. NewStation=compress(Station); Without additional arguments, the COMPRESS function removes all blanks from a string. JA-000047612 to JA-000047612 JA 0000 47909 to JA000047909 NewStation=compress(Station,"- "); You need to specify both the hyphen and the space. You list them in any order. JA-000047612 to JA000047612 JA 0000 47909 to JA000047909 STRIP (string):Returns a character string with leading and trailing blanks removed.

Question 3 (True or false) In the FORMAT procedure, you specify the name of the format and the name of the column that will use the custom format.

False-The format name is specified in the VALUE statement of PROC FORMAT. However, PROC FORMAT has no reference to the table or column to be formatted. Those items appear in other steps, such as a FORMAT statement within PROC PRINT

What are the values for First.City and Last.City for the third row of the input table given the following information? State City Population NC Carty 2000 NC Dur 4000 NC Green 40001 SC Green 5000 data StatePopulation; set Population; by State City; run;

First.City=1 and Last.City=1 The values of First.City and Last.City are dependent on the values of State. For the third row, First.City is equal to 1 because that is the first time that Greenville appears within NC. Last.City is equal to 1 because that is the last time Greenville appears within NC.

Checking the Condition

For a DO UNTIL loop, the condition is always checked at the bottom of the DO loop.For a DO WHILE loop, the condition is always checked at the top of the DO loop. do until (savings>3000)-Always executes once ; do while (savings<=3000);Excutes only if condition is true Notice that Linda has already reached the desired savings of 3,000. Even though her savings has exceeded the target, the DO UNTIL loop will execute once because the condition is not checked until the bottom of the DO loop. The DO WHILE will not execute at all because the condition is checked at the top of the DO loop

How many rows are in the bikeinfo2 output table given the following input table and code? data bikeinfo2; set bikeinfo; (has 2 columns and 2 rows) do month=1 to 3; do week=1 to 4; bike=bike+2; end; output; end; run;

For each row read in, 3 rows are created (1 for each of three months). So, 2 rows read * 3 months = 6 rows.

difference between using a function and a format

Formatting doesn't change values, only functions do. By using Functions, the real value will be changed. functions will round off the numbers after digit and will change the value. formatting will only delete the numbers after the digit. * using the function *; 65 WindAvg1=round(mean(of Wind1-Wind4), .1); * using the format *;65.0 format WindAvg2 5.1;

Input function

INPUT(source, informat) The INPUT function converts a character value to a numeric value by using an informat to indicate how the character string should be read. Date2=input(Date, date9.); The informat tells SAS how the data in the input table is displayed so it knows how to read it in ANYDTDTEw. informat can be used if dates are written in various data formats. However if data is amiguous such as 06/01/2018(interpreted as june 1st or jan 6th), sas uses datestyle =system option (LOCALE, if LOCALE is set to English then default is MDY, June 1). We can change default by using OPTIONS DATESTYLE=DMY (JAn 6)

Outputting to multiple tables

data monument(drop=ParkType) park(drop=ParkType) other;set pg2.np_yearlytraffic;if ParkType='National Monument' then output monument;else if ParkType='National Park' then output park;else output other;drop Region;run;

INTCK function (to count the number of some time interval, such as weeks, weekdays or months, that have occurred between a start and end date)

INTCK( custom-interval, start-date, end-date, <'method'>) Examples of intervals are year, month, week, or weekday. The second argument is the start date and the third argument is the end date. The optional fourth argument defines the calculation method Default method is for week starts with sunday and ends with Saturday, it is called discrete YearsMarried=intck('YEAR', WeddingDay, today(), 'C'); continuous method ( to consider the number of weeks between 2 dates is to count weeks based on a continuous count from the start date.) Both storms were two days, but why are the values assigned to Weeks different? MAARUTHA spans a Saturday/Sunday boundary, and ARLENE was in the middle of a week.

PUTLOG

If we don't have access to SAS enterprise to see the debugger, we can use this option to see execution phase step by step * Syntax *; * PUTLOG _ALL_; *; * PUTLOG column=; *; * PUTLOG "message"; The _ALL_ keyword writes all columns in the PDV and their values to the log and column= enable you to write specific columns and their values to the log. You can also specify a message to write to the log Data st; set pg2.storm(obs=2); putlog "PDV after set statement"; - - putlog _all_; Log results: PDV after set statement Name=Agarth Basin=EP Maxwind=15 ocean= stormlength=. _ERROR_= 0 _N_=1 PDV after set statement Name=Albine Basin=SI Maxwind=20 ocean= stormlength=. _ERROR_= 0 _N_=2 Then PUTLOG _ALL_ writes all the columns and values in the PDV to the log.At the end of the two iterations of the DATA step,you can examine the log to see the PDV valuesfor each of the two rows. data np_parks; set pg2.np_final (obs=5); putlog "NOTE: START DATA STEP ITERATION"; keep ParkName AvgMonthlyVisitors Acres Size; length Size $ 6; where Type="PARK"; format AvgMonthlyVisitors Acres comma10.; Type=propcase(Type); putlog Type=; AvgMonthlyVisitors=sum(DayVisits,Campers,OtherLodging)/12; if Acres<1000 then Size="Small"; else if Acres<100000 then Size="Medium"; else Size="Large"; putlog _all_; run;

compilation phase and execution phase

In the compilation phase, SAS basically prepares the code and establishes data attributes and the rules for execution. In the execution phase, SAS follows those rules to read, manipulate, and write data.

Steps in Execution phase

In this phase, SAS 1. reads data, 2. processes it in the PDV, and 3. outputs it to a new table. .

Identifying Character Positions

LENGTH (string):Returns the length of a non-blank character string, excluding trailing blanks; returns 1 for a completely blank string. ANYDIGIT (string):Returns the first position at which a digit is found in the string. ANYALPHA (string):Returns the first position at which an alpha character is found in the string. ANYPUNCT (string):Returns the first position at which punctuation character is found in the string.

Merging Tables with Nonmatching Rows

Let's say SAS has just finished reading the matching rows where Name is Barbara and output the contents of the PDV to a row in the new table. The next BY values are examined, and because neither value matches Barbara, the entire PDV is set to missing values. SAS then compares BY values in the two tables and finds that they don't match, so it reads the row from the table with the BY-value that comes first in sorted sequence (Carol before David) and writes Name, Grade, and Teacher to the PDV. The rest of the columns remain as missing when the row is written to the output table.

intck month

Months2Pay=intck('month', ServiceDate, PayDate); 2 Months2Pay=intck('month', ServiceDate, PayDate, 'c'); 1 10JUL2018 (Service date) 05SEP2018(Paydate) Using the default discrete method, Months2Pay is 2. Two end-of-month boundaries were crossed (the end of July and the end of August). Using the continuous method, Months2Pay is 1. One month boundary was crossed at August 10. The next boundary does not occur until September 10.

PUT function

PUT(source, format) The PUT function can be used to convert a numeric column to a character column. The PUT function has two arguments, The first argument is the source, which is the column that that you want to convert from numeric to character. The second argument is the format. The format tells SAS how to display the new character value. downame format Writes date values as the name of the day of the week a=put(18702,downame.); gives Wednesday year4. gives 2018 zipcodelast2=substr(put(zipcode, z5.), 4, 2); The Z format will write a number as a character string and insert leading zeros, if necessary, to fill 5 total positions. So with our first argument in substring now a character value

round

ROUND(number <, rounding-unit>) Top3avg=round(mean(Q1,Q2,Q3), .1) to the nearest 10th 8.3 CEIL (number) Returns the smallest integer that is greater than or equal to the argument. FLOOR (number) Returns the largest integer that is less than or equal to the argument. INT (number) Returns the integer value.

Automatic Converting Column Type

Range=High-Low; High is char type DailyVolumne=Volume/30; Volume is char type Sometimes the automatic conversion is successful, as you saw with the High column.Because High only includes standard numeric values (just digits and decimal points), SAS can interpret the character strings as numbers. But sometimes the automatic conversion doesn't work. The values in the Volume column include commas, so they cannot be interpreted as standard numeric values. Instead, they are written as missing values in the automatic conversion process. Similarly, if you use a numeric column in a character expression, SAS attempts to convert the values from numeric to character. Sometimes it might work, and other times you might get unexpected results.

One-to-One Merge

SAS simply compares rows sequentially as it reads from the multiple tables, matching rows based on the value of the common column. In the compilation phase, all of the columns from the first table listed on the MERGE statement and their attributes are added to the PDV. SAS then examines the second table on the MERGE statement. Any additional columns and their attributes that are not already in the PDV are added. If there are any other statements in the DATA step that create new columns, they are also added to the PDV. Finally, any other compile-time statements are processed.In the execution phase, SAS begins by examining the BY column value for the first row in each table.If they match, then both rows are read into the PDV,additional statements are executed,and at the end of the DATA step, the row is written to the output table. SAS returns to the top of the DATA step for the next iteration, and advances to row 2 in both tables. Again, the values in Name match, so both rows are read into the PDV, and so on. data class2; merge teachers_sort test2_sort; by Name; run;

Extracting Words from a String (SCAN function)

SCAN(string, n, <delimiter); >Location='OWASE, MIE, JA'; City=scan(Location, 1); gives OWASE (1st word of location)ut additional arguments, the COMPRESS functio Default delimiters : blank ! $ % & ( ) * + , - . ? ; < ^ | Location=Miyake-jima, Tokyo, JA City=scan(Location, 1) gives Miyake prefecture=scan(Location, 2) give jima That's a problem. So how do we get the entire string to be the City, and Tokyo to be the Prefecture? The third argument allows me to specify or limit the symbols that are delimiters. City=scan(Location, 1, ',') gives Miyake-jima prefecture=scan(Location, 2,',') give Tokyo Country=scan(Location, -1) gives JA by reading from backwards City=propcase(scan(Location, 1, ',')) gives Miyake-Jima Proper casing for this City name should include a lower case j. I can limit the delimiters used by the PROPCASE function. I'll type quote space quote to indicate that only the space should be treated as a delimiter between words City=propcase(scan(Location, 1, ','), " ") gives Miyake-jima

intnx function-Shifting Date Values

Suppose you have a table that includes transaction dates and you want to create a column for billing that shifts the dates to the first day of the following month. INTNX('interval',start,increment <,'alignment'>) You specify an interval (such as week, month, year, or many others) as the first argument, and the name of a SAS date column as the second argument. The third argument is the increment number, which represents the number of intervals to shift the value of the start date. The optional fourth argument controls the position of SAS dates within the interval. Increment can be zero, positive or negative alignment-'beginning', 'middle', 'end', 'sameday'. Default value - 'beginning'. mydate = '02JAN2017'd;day=intnx('day', mydate , 7); 09JAN2017 (day = mydate + 7); mydate = '02JAN2017'd;nextsunday=intnx('week', mydate , 1);nextsunday = 08JAN2017 nextsunday=intnx('week', mydate , 1, 'sameday'); returns 09JAN2017 firstday=intnx('month', date , 0); 04jan88----01jan88 firstday=intnx('month', date , 2); 04jan88----01MAR88 firstday=intnx('month', date , -1, 'end'); 04MAR88---31FEB88

Replacing Character Strings-TRANWRD

TRANWRD(source, target, replace) The TRANWRD function basically does find and replace for you. The first argument is generally a character column, the second argument is the target, or the string you want to find. The third argument is the string that replaces the target. summary2=tranwrd(summary, 'hurricane', 'storm'); This assignment statement uses the TRANWRD function to replace all instances of lowercase hurricane with storm in the Summary column(summary is a variable name.

Subsetting Rows in the Execution Phase

The WHERE statement is a compile-time statement that establishes rules about which rows are read INTO the PDV. Therefore, the WHERE expression must be based on columns that exist in the input table referenced in the SET statement. The FIRST./LAST. variables are not in the input table - they're assigned after a row is read into the PDV. Therefore, you can't use the WHERE statement to subset rows based on values of the FIRST./LAST. variables. Instead, you must subset the data during the execution phase based on values that may be assigned or changed after a row is read into the PDV. To do this, you can use the subsetting IF statement. The syntax is simply the keyword IF, followed by an expression. The subsetting IF statement is an executable statement, so it processes during the execution phase in the order it occurs in the DATA step code. The IF expression can be based on any values in the PDV. data new;set storm;by basin; if last.basin=1; sl= ed-sd;

Processing Data in Groups

The execution phase of the DATA step includes some very useful features that makes it easy to determine when each group of values in a table begins or ends. First you create an output table that is sorted by a column that has groups you want to analyze. Then you use the same BY statement in the DATA step to tell SAS that you want to process the data in groups. When a DATA step includes a BY statement followed by a column name, two special columns, FIRST.by-column and LAST.by-column, are added to the PDV. In this example code, the column names are first.basin and last.basin. During the execution phase, the FIRST. and LAST. variables are assigned a value of 0 or 1. The FIRST. variable is 1 for the first row within a group, and 0 for all other rows. Similarly, the LAST. variable is 1 for the last row within a group, and 0 for all other rows. These temporary variables contain important information that you can use before they are dropped when a row is written to the output table. pros sort data=storm;by basin; data new; set storm; by basin;

DATA step execution acts like an automatic loop

The first time through the DATA step, the SET statement 1. reads the first row from the input table, 2. processes any other statements in sequence, 3.manipulating the values in the PDV. 4. When SAS reaches the RUN statement or the end of the DATA step, there is an implied OUPUT action so that the contents of the PDV, minus any columns flagged for dropping, are written as the first row in the output table The DATA step then automatically loops back to the top, and resets the values of computed columns in the PDV to missing(coumns read through set statement are not set to missing, but 1st/previous columns will be retained and replaced by 2nd columns ), and executes the statements in order again, this time reading, manipulating, and outputting the next row. Compile-time statements such as DROP, LENGTH, and WHERE, are not executed for each row, however because of the rules they establish in the compilation phase, they impact the output table. KEEP, WHERE, and FORMAT are compile-time statements. They set rules for execution.

Tens=10 20 30 40;; data newnums; set nums; retain Count 100; Count+Tens; run; What is the value of Count at the end of the third DATA step iteration?

The initial value of Count is 100. The end of the first iteration is 100+10=110. The end of the second iteration is 110+20=130. The end of the third iteration is 130+30=60.

Length of Format names

The length of the format matches the length of the longest label.

What is the result of running the following DATA step? data work.boots; set sashelp.shoes(keep=Product Subsidiary); where Product='Boot'; NewSales=Sales*1.25; run;

The step produces work.boots with four columns

Sum statement

The sum statement ignores missing values, initially sets the accumulator column to 0, adds a numeric value to an accumulator column, and automatically retains the value of the accumulator column.

Conditional DO Loops

There are times when you don't know how many times the DO loop needs to iterate. For example, suppose you want the DO loop to stop when each person has saved more than 3000 dollars? You can use a conditional DO loop to iterate as many times as necessary to meet the condition. There are two variations of the conditional DO loop - DO UNTIL and DO WHILE. DO UNTIL executes until a condition is true. DO WHILE executes while a condition is true. For both methods, the expression must be enclosed in parentheses. do until(savings>3000); month+1; savings+amount; savings+(savings*0.02/12); end; In this example, the DO UNTIL loop executes until savings is greater than 3,000. do while (savings<=3000); The DO WHILE loop uses the opposite expression. The DO WHILE loop executes while savings is less than or equal to 3,000. Let's see if these two examples give the same result.

Creating custom format from the table

We need FmtName-The column named FmtName contains the name of the format you are creating. Don't forget - if you're creating a format to apply to character values, the name must begin with a dollar sign. Start-The column named Start contains the values to the left of the equal sign in the VALUE statement (or the values you want to format). Label-The column named Label contains the values to the right of the equal sign in the VALUE statement (the labels you want to apply to the values). End-You might need additional columns, for example, if you're specifying ranges you need an End column in addition to a Start column. Our input table has two columns: Sub_Basin and SubBasin_Name. data work.sbdata; retain FmtName '$sbfmt'; set data (rename=(Sub_Basin=Start SubBasin_Name=Label)); keep start label FmtName; run; proc format CNTLIN=work.sbdata;run; For the new table, we need to rename the Sub_Basin column to Start and the SubBasin_Name column to Label. We also need to add a column for the format name. We can use a RETAIN statement to create the FmtName column and retain the value $sbfmt for each row. In the end, the new table work.sbdata has three columns (FmtName, Start, and Label) with a row for each SubBasin value. After the table has the correct layout, it's very simple to use it to create a custom format. Instead of a VALUE statement, you use the CNTLIN= option on the PROC FORMAT statement. /*Based on ranges than single value*/ data catdata; retain FmtName "catfmt"; set pg2.storm_categories(rename=(Low=Start High=End Category=Label)); keep FmtName Start End Label;run; /*So after the DATA step, my PROC FORMAT will import that temporary table catdata.*/ proc format cntlin=catdata; run; proc freq;tables;format sub_basin $sbfmt. wind catfmt.;run;

Handling Column Attributes

When multiple tables are listed in the SET statement, columns from the first table are added to the PDV with their corresponding attributes. When SAS reads the second table in the SET statement, the attributes of any columns that are already in the PDV cannot be changed. For the Name column, the length is already set and cannot be modified. warning will be given if types are diffrent WARNING: Multiple lengths were specified for the variable Name by input data set(s). This can cause truncation of data You can solve this problem by using the LENGTH statement. The LENGTH statement must come before the SET statement so that the attributes of Name are established in the PDV before Name is read from sashelp.class. data class_current; length Name $ 9; set sashelp.class pg2.class_new2(rename=(Student=Name)); run;

concatenating tables

When the columns have the same names, lengths, and types, then concatenating tables is simple. You use the DATA step to create a new table, and simply list the tables that you want to combine in the SET statement. SAS reads all of the rows from the first table listed in the SET statement and writes them to the new table, then it reads and writes the rows from the second table, and so on.

Columns in the PDV using drop/keep= option

When you use a DROP= or KEEP= data set option on a table in the SET statement, the excluded columns are not read into the PDV, so they are not available for processing. When you use a DROP or KEEP statement or a DROP= or KEEP= data set option in the DATA statement, the columns are included in the PDV and CAN be used for processing. They are flagged to be dropped when an implicit or explicit OUPUT is reached. data sale_high(drop=) sales_low(drop=);run;

Retaining Values in the PDV

YTDRain is a computed column, and by default all computed columns are reset to missing when the PDV is reinitialized. In order to create an accumulating column, you must override the default behavior of the PDV. First, you need the initial value of YTDRain to equal zero, rather than missing. Second, when the PDV is reinitialized at the beginning of each iteration of the DATA step, you need SAS to retain the value of YTDRain in the PDV, rather than set it to missing. The RETAIN statement is the solution! RETAIN is a compile-time statement that sets a rule for one or more columns to keep their value each time the PDV is reinitialized, rather than being reset to missing. It also provides the option of establishing an initial value in the PDV before the first iteration of the DATA step. By adding a RETAIN statement to your program, you can change the default behavior of the PDV in this scenario to create an accumulating column. Retain YTDRain 0; YTDRain=YTDRain+DailyRain

Creating accumulating columns

YTDRain=YTDRain+DailyRain...This will not work to create running total of DailyRain as newly computed values (YTDRain) are set to missing in the PDV and each time it runs new rows it reinitializes to missing again.

Explicit output

You can use an explicit OUTPUT statement in the DATA step to force SAS to write the contents of the PDV to the output table at specific points in the program. If you use an explicit OUTPUT statement anywhere in a DATA step, you have taken control of the output and there is no implicit OUTPUT at the conclusion of the DATA step data forecast; set sashelp.shoes; keep Region Product Subsidiary Year ProjectedSales; format ProjectedSales dollar10.; Year=1; ProjectedSales=Sales*1.05; OUTPUT ; Year=2; ProjectedSales=ProjectedSales*1.05; output; Year=3; ProjectedSales=ProjectedSales*1.05; *output; run; if we dont have output statements at each of those assignment statments only year 3 was outputted as that was the last value appeared in the PDV.

Identifying Matches and Nonmatches

You can use the IN= data set option to create temporary variables in the PDV that you can use to flag matching or non-matching values.The IN= variables are included in the PDV during execution, but are not written to the output table. During execution, the IN= variables are assigned a value of 0 or 1. Zero means that table does not include the by-column value for that row, and 1 means it does include the by-column value. Notice when Name was read from both input tables, inUpdate and inTeachers are both 1. Carol is only in the class_teachers table, so inUpdate is 0 and inTeachers is 1, and the opposite is true for David. data class2; merge pg2.class_update(in=inUpdate) pg2.class_teachers(in=inTeachers); by name; if inUpdate=1 and inTeachers=1; run;

What is the correct formatted output given the following PROC FORMAT step and the input table? proc format; value $answer '1'='Yes' '2'='No' 'other'='Not Answered'; run;

Ys No 3 The word other is in quotation marks. Therefore, it is not seen as a keyword. A value of 3 is not referenced by any of the values in the VALUE statement. The value 3 is displayed as 3.

_ERROR_, _N_

_ERROR_, _N_ are included in PDV during execution. _ERROR_ set to 0 if there are no errors and 1 if any errors occur. _N_ intially set to 1 and each time a datastep loops past the data statement variable increments by 1.

PDV

includes each column that you've referenced in the DATA step and its attributes, including the column name, type, and length.

Largest, smalles, min, max functions

maxbal=max(of bal1-bal6); minbal=min(of bal1-bal6); quiz1st=largest(1,of quiz1-quiz6); The column quiz1st is using the largest function to get the largest or maximum score from the columns Quiz 1 through Quiz 5. By using the keyword of, we can reference the range of columns Quiz 1 dash Quiz 5 quiz2nd=largest(2,of quiz1-quiz6); 2nd largest balls top2avg=round(mean (quiz1st,quiz2nd), .1); small1=smallest(1,of bal1-bal6); smallest small2=smallest(2,of bal1-bal6); second smallest small3=smallest(3,of bal1-bal6); third smallest

To check formats under a library

proc format fmtlib library=work; select $sbfmt catfmt;/*limit to only few formats*/ run;

PROC FORMAT

proc format; *character format;value $regfmt 'C'='Complete' 'I'='Incomplete' other='Miscoded'; *Numeric format; value hrange 50-57='Below Average' 58-60='Average' 61-70='Above Average'; run; format Registration $regfmt. Height HRANGE.; This doesnt include values like 57.2 ect. Better way to perform the same *Numeric format;value hrange 50-<58='Below Average' 58-60='Average' 60<-70='Above Average'; run; when you specify the range the dash incudes the numbers.ex 58-60 includes both 58 and 60. lessthan symbol before ending value excludes the ending value, ex 50-<58 excludes 58. Lessthan symbol after starting value excludes the starting value, ex 60<-70. keywords:low, high, other,

Proc Format example

proc format; value $region 'NA'='Atlantic' 'WP','EP','SP'='Pacific' 'NI','SI'='Indian' ' '='Missing' other='Unknown';run; data storm_summary; set pg2.storm_summary; Basin=upcase(Basin); BasinGroup=put(Basin, $region.); run; proc freq data=pg2.np_summary order=freq; tables Reg; label Reg='Region'; format Reg $highreg.; run; proc format; value psize low-<10000='Small' 10000-<500000='Average' 500000-high='Large'; run; proc format; value sdate low-'31DEC1999'd='1999 and before' .='Not supplied' run; value salrange low-<50000="Under $50K" 50000-100000="50K-100K" 100000<-high="Over 100K";

RAND function

studentid=RAND('integer', 1000, 9999); generate random interger from th given intervals

Steps in Compilation phase

1. SAS runs through your program to check for syntax errors. 2. If there are no errors, SAS builds a critical area of memory called the Program Data Vector, or PDV for short. 3. SAS establishes rules for the PDV based on your code, such as which columns will be dropped, or which rows from the input table will be read into the PDV. 4. Finally, SAS creates the descriptor portion or the table metadata

What is the value of x at the completion of the DATA step? data test; x=15; do until(x>12); x+1; end; run;

16 A DO UNTIL is evaluated at the bottom. The initial value of x is 15. The DO loop occurs one time, even though 15 is greater than 12, because the condition is not checked until the bottom of the loop. Therefore, 15 becomes 16 before the condition is checked.

The sashelp.cars table contains 428 rows: 123 rows with Origin equal to Europe and 305 rows with Origin equal to other values. How many rows will be in the Other table? data Europe Other; set sashelp.cars; if Origin='Europe' then output Europe; output Other; run;

428 rows The OUTPUT statement outputs every row to the Other table.

How many rows and columns are in the output table ShippingZones given the following information? The input table Shipping contains 5 rows and 3 columns (Product, BoxSize, and Rate). data ShippingZones; set Shipping; Zone=1; output; Zone=2; Rate=(Rate*1.5); run;

5 rows and 4 coulmns The explicit OUTPUT statement is sending the ZONE=1 rows to the output table. There is no explicit OUTPUT statement after ZONE=2, so those rows are not making it to the output table. An implicit OUTPUT is not at the bottom of the DATA step due to the explicit OUTPUT. The four columns are Product, BoxSize, Rate, and Zone.

Which of the following does not have proper syntax for specifying a range in the VALUE statement? 500>-700 'Horse' - 'Mouse' 500-<700 'A' - 'C'

500>-700: The greater than symbol is not valid for a range. The less than symbol can be used after the starting value or before the ending value.

FIND function

==FIND(string,substring, <'modifiers">) FIND function is an effective way to search for a particular substring within character values. The first argument typically is the character column and the second argument specifies the substring to find. The optional third argument can be used to provide modifiers. "I" makes the search case insensitive and "T" trims leading and trailing blanks. The FIND function returns a number that indicates the start position of the substring within the string. If the substring is not found, the FIND function returns a zero Airportloc=find(Station, 'Airport'); The FIND function does a case-sensitive search in the values of Station for the substring Airport station=Airport Road gives 1 and station=Raleigh Durham Airport give 16 and Station=Cary Parkway gives 0 To make it case insensitive search use i data storm_damage2; set pg2.storm_damage; drop Date Cost; CategoryLoc=find(Summary, 'category', 'I'); if CategoryLoc > 0 then Category=substr(Summary,CategoryLoc, 10); run; Summary= Category 3 hurricane initially impacts the U.S. as a Category 1 near Miami, FL, then as a strong Category 3 along the eastern LA-western MS coastlines, resulting in severe storm surge damage (maximum surge probably exceeded 30 feet) along the LA-MS-AL coasts, wind damage, and the failure of parts of the levee system in New Orleans. Inland effects included high winds and some flooding in the states of AL, MS, FL, TN, KY, IN, OH, and GA. CategoryLoc=1 Category=Category 3

SAS date time

A datetime value in SAS is stored as the number of seconds from midnight on January 1, 1960. Datepart(date-time vaue) Timepart(date-time value)10dec1999:10:00:00:00

Output Inside and Outside the DO Loop

Data aaa; do month=1 to 12; output; end; An OUTPUT statement between the DO and END statements writes a row for each iteration of the DO loop.In this example, there are 12 iterations; therefore, the output happens 12 times. The value of the index-column is incremented at the bottom of the DO loop, after each row is written to the output table. After 12 rows are written to the output table, month is incremented to 13, and because 13 exceeds the stop value, the DO loop does not execute again. If we run the same code without the explicit OUTPUT statement, implicit output at the end of the DATA step is active. Since output occurs after the 12 iterations of the DO loop, the value of Month is 13 when the row is written to the output table. with output we get 12 rows with last row=12 without output we get only one row with year value of 13

example compilation

Data storm; set stormsum; length Ocean $8; drop enddate; where name is not missing; basin=upcase(basin) stormleng=enddate-startdate; if substr(basin2,1)="I" then ocean="Indian"; else if substr(basin2,1)="A" then ocean="Atlantic"; else Ocen="pacific"; run; To build the PDV, SAS passes through the DATA stepsequentially, adding columns and their attributes. The SET statement in this program is listed first, so all of the columns from the storm_summary_small table are added to the PDV along with the required column attributes name, type, and length. If there are any other statements that define new columns, those columns are also added to the PDV. In this code, the LENGTH statement defines the character column Ocean with a length of 8. StormLength is the last new column, and based on the arithmetic expression, it's defined as a numeric column with a length of 8. As you can see, Ocean occurs in assignment statements later in the step. However, after a column and its attributes are established in the PDV, they cannot be changed. That's why the LENGTH statement must occur before the IF-THEN statements, otherwise SAS would have used the assignment statement OCEAN="Indian" to define Ocean with a length of 6. Remember, SAS is not processing data at this point, so the IF expression is not evaluated, SAS simply adds columns to the PDV in the order they occur in the code. There are certain statements that are specific to the compilation phaseand establish the behavior of the PDV. The DROP statement does not remove a column from the PDV. Instead, SAS marks the column with a drop flag so that it's dropped later in execution. In this program, EndDate will be dropped from the output data,but it is still available to use in the PDV for calculating the column StormLength. The WHERE statement defines which rows will be read from the input table into the PDV during execution. Finally, the descriptor portion of the output table is complete. Notice that the EndDate column is not included in the descriptor portion of the output table.

Iterative DO Loops

Do index column=start to stop <by increment> DO year=1 to 3 by 1; Start is a number or numeric expression that specifies the initial value of the index-column. Stop is a number or numeric expression which specifies the value that the index-column must exceed to stop execution of the DO loop.The DO loop continues to execute as long as the value of the index-column is within the start and stop range.Each time the DO loop executes, the value of the increment is added to the value of the index-column and this continues until the value of the index-column exceeds the stop value. In our example until the year is 4.

Call routine

call missing(prod, invty); Like functions, CALL routines perform a computation or a system manipulation based on the input you provide in arguments. However, a CALL routine does not return a value. Instead, it alters column values or performs other system functions.For a CALL routine to modify the value of an argument, those arguments MUST be supplied as column names -- constants and expressions are not valid Suppose we have students who have taken 5 quizzes and we want to drop each student's lowest 2 quiz cores and base their grade on the average of the top 3 cores. call sortn (of q1-q5); (takes the columns provided as arguments, and reorders the numeric values for each student from low to high) qavg=mean (of q3-q5) Notice we are not assigning these values to new columns, but instead the data values are reordered in the existing columns. As the teacher, I could easily drop the lowest 2 scores for each student and calculate a mean score based on the values of Quiz3 through Quiz5.

Using the Sum Statement

coulumn(nam of new accumulating column) + Expression (column to add) You've seen that one way to create an accumulating column is to use the RETAIN statement and an assignment statement that uses the SUM function to ignore missing values. Because creating an accumulating column in a table is a common need, the DATA step offers a shortcut to using these multiple statements. You can use the SUM statement as a simple way to create an accumulating column. The name of the new accumulating column is on the left, and the column or constant to add for each row is on the right of the expression. The SUM statement does the following four things automatically: Creates the accumulating column to the left of the plus sign and sets the initial value to 0. Automatically retains the value of the accumulating column in the PDV. Adds the value of the column or constand on the right of the plus sign to the accumulating column for each row, and ignores missing values.

Example Find and substr together

data storm_damage2; set pg2.storm_damage; drop Date Cost; CategoryLoc=find(Summary, 'category', 'i'); if CategoryLoc > 0 then Category=substr(Summary, CategoryLoc, 10); run; Here The SUBSTR function starts at the number stored in CategoryLoc and reads 10 characters, and returns the string to Category. Summary=Massive category 4 hurricane made landfall near Rockport, Texas causing widespread damage. Harvey's devastation was most pronounced due to the large region of extreme rainfall producing historic flooding across Houston and surrounding areas. More than 30 inches of rainfall fell on 6.9 million people, while 1.25 million experienced over 45 inches and 11,000 had over 50 inches, based on 7-day rainfall totals ending August 31. This historic U.S. rainfall caused massive flooding that displaced over 30,000 people and damaged or destroyed over 200,000 homes and businesses. CategoryLoc=9 Category=category 4 Create a new column named Park that uses the SUBSTR function read the value in ParkName and exclude the NP code at the end of the string. Hint: Use the FIND function to identify the position number of the NP string. That value can be used as the third argument of the SUBSTR function to specify how many characters to read. Park=substr(ParkName, 1, find(ParkName,'NP')-2); parkname =Great Smoky Mountains NP nonp=23 park=Great Smoky Mountains parkname =Acadia NP nonp=8 park=Acadia

Which output table does the following step produce?

data test; bike=10; do day=1 to 7 while (bike lt 13); bike=bike+2; end; run; Bike day 14 3

Scan Example

data weather_japan_clean; set pg2.weather_japan; Location=compbl(Location); City=propcase(scan(Location, 1, ','), ' '); Prefecture=scan(Location, 2, ','); *putlog Prefecture $quote20.; " Tokyo" if Prefecture="Tokyo"; run; This gives 0 results and need to add STRIP function to give results. When the SCAN function extracts Prefecture from Location, only a comma is specified as a delimiter. The leading space is included in the returned value. Adding a space as a delimiter works if there are no spaces embedded in City or Prefecture. Prefecture=scan(Location, 2,', '); The STRIP function removes leading and training blanks. Prefecture=strip(scan(Location, 2, ','));

Converting the Type of an Existing Column

data x;set pg2.stocks; date2=Volume=input(date,date9.);volumn=input(Volume, date9.); In this example, we have code to change the type of Volume from character to numeric. However, when the SET statement reads the pg2.stocks data, the PDV is established, and Volume is a character column. The assignment statement cannot change the character column Volume to a numeric column because it's already established. This is similar to how a LENGTH statement cannot change the length of a column after it is set in the PDV. If you want to change Volume to a numeric column, there are three steps you must follow to change the column type. First, use the RENAME= data set option in the SET statement to rename the input column you want to change. The RENAME= data set option follows the table name that has the column you want to rename. You list the current column name on the left side of the equal sign and the new column name on the right set dataname(rename=(volume=charvolume)); volume=input(CharVolume, comma12.); drop CharVolume;

Which of the following statements contains valid syntax? do Year=2018 to 2018 or until (Earnings<=100000); do Age=10 to 14 and while (Weight<150); do Increase=5 to 10 while (temperature Lt 102); do week=1 to 52 do until (Mileage ge 2750);

do Increase=5 to 10 while (temperature Lt 102); When combining an iterative with a conditional, you cannot use extra words such as AND, DO, or OR. The conditional immediately follows the iterative.

Combining Iterative and Conditional DO Loops

do month=1 to 12 until (savings>5000); savings+amount; savings+(savings*0.02/12);end; The DO loop stops executing when the stop value is exceeded or the condition is met, whichever is first. Suppose we need to reach a savings value of 5,000 (conditional) but we don't want to go beyond 12 months (iterative).

Creating an Accumulating Column within Groups

we want to create an accumulating column named MTD, or Month-to-Date Rain, but reset it each time a new month begins. Here we'll use the First dot variable to reset an accumulating column for rain totals at the beginning of each month. A critical part to make this work is that we reset MTD Rain to 0 each time that SAS reaches the first row within a new month group. So after the BY statement, I'll add an IF statement. If First.Month equal 1, then MTD Rain equals 0. data houston_monthly; set pg2.weather_houston; keep Date Month DailyRain MTDRain; by Month; if first.Month=1 then MTDRain=0; MTDRain+DailyRain; if last.Month=1; /*subsetting IF statement to output only the final day of each month*/ run;

Specifying Columns Lists

you can use a double-dash to specify a physical range of columns from left to right in a table. The columns can have any names format quiz1--avzquiz 3.1; Notice that you don't need to use the OF keyword in the FORMAT statement.The OF keyword is required when you use column lists as arguments in a function or call routine avgquiz=mean (of Q:);


Kaugnay na mga set ng pag-aaral

ADP 3-0 / ADRP 3-0 - Unified Land Operations

View Set

Examfx Life and Health Insurance

View Set

Ch. 1: The Sciences of Anatomy and Physiology (Learnsmart Quiz)

View Set

Math Chapter 6 "Relations and functions"

View Set

Davinci Resolve 17 Color Certification

View Set

exploring the entrepreneurial perspective

View Set

ATI fundamentals practice quiz questions pt. 1

View Set

Chapter 2: Collecting Subjective Data: The Interview and Health History

View Set