PSTAT 130 Midterm

Character missing value

" " (blank)

AND

(&) if both expressions are true

OR

(|) if either expression is true

Sorting a SAS data set (PROC SORT)

- Without OUT=, the sorted data overwrite the original data set.
- When the BY statement lists more than one variable, SAS sorts by the first variable listed, then by the second variable within the values of the first, and so on.
- By default, SAS sorts in ascending order.
- The DESCENDING keyword goes before the variable in question and applies only to that variable.
- Use the NODUPKEY option to delete records with duplicate BY values.
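
As a minimal sketch of these rules, the step below sorts a small made-up data set (work.scores and its values are hypothetical) once with DESCENDING and once with NODUPKEY:

```sas
/* Hypothetical data set with made-up values */
data work.scores;
    input section $ grade;
    datalines;
B 90
A 85
A 95
B 90
;
run;

/* OUT= leaves work.scores unchanged; DESCENDING applies only to grade */
proc sort data=work.scores out=work.scores_sorted;
    by section descending grade;
run;

/* NODUPKEY keeps one observation per combination of BY values */
proc sort data=work.scores out=work.scores_nodup nodupkey;
    by section grade;
run;
```

Here work.scores_nodup ends up with three observations, because the two "B 90" records share the same BY values.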

proc export

- allows you to create data files with any delimiter you choose - the resulting text file will have one delimiter between each data value, and each observation will occupy one line of text - SAS uses the last part of the file name, the file extension, to decide what type of file to create

proc means

- calculates common summary statistics
- summarizes numeric values
- BY and CLASS statements can be used to create summaries for subgroups
- can create an output data set of summary statistics
By default:
- analyzes every numeric variable in the SAS data set
- prints the statistics N, MEAN, STD, MIN, and MAX
- excludes missing values before calculating statistics
Statements and options:
- var <var list> : selects the variables to be summarized
- by <var list> : creates separate summaries for each BY group; the data MUST be sorted on the BY variables first
- class <var list> : creates separate summaries for each class group; the data do not need to be sorted on the class variable(s) first
- output out = newDS : saves the results of the MEANS procedure to a SAS data set of summary statistics
- statistical keywords : options on the PROC MEANS statement
- maxdec= : option on the PROC MEANS statement that limits the number of decimal places in the summary statistics

Proc Import details

- can scan your Excel sheet and automatically determine the variable types
- will assign lengths to the character variables
- can recognize most date formats
- by default, takes the variable names from the first row of the spreadsheet
- if a column contains both numeric and character values, then by default the numbers will be converted to missing values
Optional statements:
- sheet = 'sheet-name'; use if you have more than one sheet in your file
- range = 'sheet-name$ul:lr'; use if you want to read only specific cells in the sheet; specify the upper-left (ul) and lower-right (lr) cells
- getnames = no; tells SAS not to take the variable names from the first row (SAS assigns generic names instead)
- mixed = yes; use if you have a column of characters and numbers and you don't want the numbers to be converted to missing values

put function

- use it to create a new variable from a user-defined format, e.g. for grouping - the original variable may be numeric or character, but the resulting variable is always character newVar = put(oldVar, user_defined_format.);

What is contained in the Sas log?

- notes about the version of sas and your sas site number - it contains the original program statements with line numbers added on the left - data step is followed by a note containing the name of the sas dataset created and the number of observations and variables - both data and proc steps produce a note about the computer resources used - error messages

what does @n do?

- tells sas to move to the nth column when reading in the data - can skip forward or backward within a line of data - skip over unneeded data - read a variable twice using different informats
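
A small sketch of the @n pointer, using made-up records; the variable names are hypothetical. It jumps forward to column 10 and then back to column 1 to read the same field a second time with a different informat:

```sas
data work.pointer_demo;
    /* @1 reads columns 1-3 as character, @10 jumps ahead to the amount, */
    /* then @1 moves back and re-reads columns 1-3 as a number           */
    input @1 id $3. @10 amount 5. @1 id_num 3.;
    datalines;
101      12345
102      67890
;
run;
```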

@'character' column pointer

- use when you don't know the starting column of the data, but you know that it always comes after a particular character or word

Syntax rules for creating format names

- cannot be more than 8 characters
- for character values: $ must be the first character, a letter or underscore the second character, followed by no more than 6 additional letters, numbers, or underscores
- for numeric values: a letter or underscore must be the first character, followed by no more than 7 additional letters, numbers, or underscores
- cannot end in a number
- cannot be the same as the name of a SAS format
- does not end with a period

reading excel spreadsheets

-create sas dataset from an excel spreadsheet using the import wizard - create a sas data set from an excel spreadsheet using proc import

output statement

-every sas data step has an implied output statement at the end, which tells sas to write the current observation to the output data set before returning to the beginning of the data step to process the next observation - you can override the implicit output statement with your own output statement - once an output statement is in your data step, it is no longer implied, and sas writes an observation only when it encounters an output statement - if you want to write several observations for each pass through the data step, you can put an output statement in a DO loop - gives you control over when an observation is written to a sas data set
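
As an illustration of an explicit OUTPUT statement inside a DO loop, here is a minimal sketch; the data set name and the starting value are made up:

```sas
data work.growth;
    balance = 1000;                  /* hypothetical starting value */
    do year = 1 to 3;
        balance = balance * 1.05;
        output;                      /* writes one observation per iteration */
    end;
run;
```

Because the step contains its own OUTPUT statement, the implied output at the end of the step is switched off, so work.growth has three observations (one per loop iteration) rather than one.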

using sas function

-perform arithmetic operations - compute sample statistics - manipulate sas dates and process character values - perform many other tasks * sample statistics functions ignore missing values*

programming Errors -Tips

- use the Enhanced Editor: it color-codes keywords and highlights errors in red - write your program in small parts and test each part - clear the log and output before running your program - review the log, looking for red and green text - confirm the number of records and variables in each data set using the log - keep all variables in your interim data sets - inspect the data sets you create in the Table Editor or using PROC PRINT

Numeric missing value

. (period)

4 main file types in SAS

.sas (sas program) .log (contents of log) .lst (contents of output window) .sas7bdat (SAS dataset)

grouping data with user-defined formats

1) use the FORMAT procedure to define a format that assigns all the values that you want to group together to a text string 2) apply the new format to the variable you want to group in a PUT function in a DATA step or a FORMAT statement in a procedure
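
The two steps can be sketched as follows. The format name gradefmt, the cutoff of 70, and the scores are all made up for illustration:

```sas
/* Step 1: define a format that maps ranges of values to a text string */
proc format;
    value gradefmt                /* hypothetical format name */
        low -< 70 = 'Fail'
        70 - high = 'Pass';
run;

/* Step 2: apply it with the PUT function in a DATA step */
data work.graded;
    input score;
    result = put(score, gradefmt.);   /* result is a character variable */
    datalines;
65
88
;
run;
```

The same grouping could instead be applied at report time with a FORMAT statement in a procedure, for example `format score gradefmt.;` inside PROC FREQ.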

When was SAS released?

1971

How long can labels be?

256 characters

Library reference must...

be 8 characters or fewer and start with a letter or underscore

The value $10,580 is under the format A. DOLLAR7. B. DOLLAR5. C. DOLLAR6. D. DOLLAR5.3 E. None of the above

A. DOLLAR7. When using the DOLLAR format, make sure the width counts the dollar sign and the comma as well as the digits. *It is always okay to overestimate the width of the whole number*

Which is not an advantage of column input? a. Standard as well as nonstandard data values can be read. b. It can be used to read character variables that contain embedded blanks. c. Fields do not have to be separated by blanks or other delimiters. d. No placeholder is required for missing data.

A. Standard as well as nonstandard data values can be read. Column input is useful for reading standard values only.

Which of the following statements is true when SAS encounters a syntax error in the DATA step? A. the SAS log contains an explanation of the error B. the DATA step continues to execute and the resulting data set is complete C. the DATA step stops executing at the point of the error and the resulting data set contains observations up to that point D. a note appears in the SAS log indicating that the incorrect statement was saved to a SAS data set for further examination

A. the SAS log will contain an explanation of the error

Explorer Window

Allows you to navigate to libraries, datasets, and other SAS objects

Given the SAS data set work.products with variables ProdID, Price, ProductType, Sales, and Returns, the following SAS program is submitted: data work.revenue(drop = sales returns price); set work.products(keep = prodID Price Sales Returns); revenue = price*(sales-returns); run; How many variables does the work.revenue data set contain? A. 1 B. 2 C. 3 D. 4

B. 2 (prodID and revenue)

The following SAS program is submitted at the start of a new SAS session: libname sasdata 'directory'; data sasdata.sales; set sasdata.salesdata; profit = expenses-revenues; run; proc print data = sales; run; The SAS data set sasdata.salesdata has 10 observations. Which of the following explains why a report fails to generate? A. the DATA step fails execution B. the SAS data set sales does not exist C. the SAS data set sales has no observations D. the PRINT procedure contains a syntax error

B. the SAS data set sales does not exist (we did not create a temporary data set called sales, i.e. work.sales, only sasdata.sales)

general form for proc means

Basic:
  proc means data = sas-data-set; run;
Specifying which variables to include:
  proc means data = useData; var v1 v2 v3; run;
Specifying summary statistics:
  proc means data = useData n mean std; var v1 v2 v3; run;
Analyzing subgroups (BY variable):
  proc means data = useData n mean std; var v1 v2 v3; by var4; run;
Analyzing subgroups (CLASS variable):
  proc means data = useData n mean std; var v1 v2 v3; class v4; run;
OUTPUT statement:
  proc means data = useData n mean std; var v1 v2 v3; by var4; output out = meansout; run;
MAXDEC= option:
  proc means data = useData n mean std maxdec = 2; var v1 v2 v3; by var4; output out = meansout; run;
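
For a version that can be run as written, the sketch below uses sashelp.class, a sample data set that ships with SAS; the output data set name work.meansout is an arbitrary choice:

```sas
/* CLASS is used here, so the data do not need to be sorted first */
proc means data=sashelp.class n mean std maxdec=2;
    class sex;
    var height weight;
    output out=work.meansout;   /* saves the summary statistics to a data set */
run;
```

Swapping `class sex;` for `by sex;` would require running PROC SORT on sex beforehand.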

The contents of the raw data file TYPESIZE are listed below: ----|----10---|----20---|----30 cokelarge The following SAS program is submitted: data soda; infile'typesize'; input type $ 1-4 +1 size $; run; Which one of the following represents the values of the variables TYPE and SIZE? Select one: a. type size cokelarge (missing character value) b. type size coke large c. type size coke arge d. No values are stored as the program fails to execute due to syntax errors.

C. type size coke arge (type is read from columns 1-4, which leaves the pointer at column 5; the +1 then skips column 5, the "l" in large, so size is read starting at column 6)

after a sas program is submitted, the following is written to the sas log data work.sales; set work.sales_old(keep = product month num_sold cost); ... keep = Product Sales; ... What changes should be made to the KEEP statement to correct the errors in the log? A. keep=(product sales) B. keep = Product Sales; C. keep product sales; D. Keep product , sales

C. (keep = only in the data/proc/set statements)

The following SAS program is submitted: data work.accounting; length jobcode $ 12; set work.department; run; The work.department SAS data set contains a character variable named JobCode with a length of 5. Which of the following is the length of the variable JobCode in the output data set? A. 5 B. 8 C. 12 D. the value cannot be determined because the program fails to execute due to syntax errors

C. 12 (since length is placed before the set statement)

Which format produces dates of the form 31DEC2017? a. DDMMYY9. b. DDMMYYYY9. c. DATE9. d. MMDDYY9.

C. Date9.

Which SAS statement correctly uses column input to read values in the raw data file below in this order: Address (4th field), SquareFeet (second field), Style (first field), Bedrooms (third field)?
Raw data file:
1---+----10---+----20---+----30
2STORY 1800 4 SHEPPARD AVENUE
CONDO  1200 2 RAND STREET
RANCH  1550 3 MARKET STREET
a. input Address 15-29 SquareFeet 8-11 Style 1-6 Bedrooms 13;
b. input Address 15-29 $ SquareFeet 8-11 Style 1-6 $ Bedrooms 13;
c. input Address $ 15-29 SquareFeet 8-11 Style $ 1-6 Bedrooms 13;
d. input $ 15-19 Address 8-11 SquareFeet $ 1-6 Style 13 Bedrooms;

C. input Address $ 15-29 SquareFeet 8-11 Style $ 1-6 Bedrooms 13; Column input specifies the variable's name, followed by the dollar sign ($) if the values are character values and the beginning and ending column locations of the raw data values.

The following SAS program is submitted: data work.accounting; set work.department; label jobCode= 'Job Description'; run; Which statement is true about the output data set? A. the label of the variable JobCode is Job (only the first word) B. the label of the variable JobCode is Job Desc (only the first 8 characters) C. the label of the variable JobCode is Job Description D. the program fails to execute due to errors. Labels must be defined in the PROC step

C. the label is Job Description; labels can be up to 256 characters long (see the label-length card above)

Which is a valid LIBNAME statement? A. libname 'directory'; B. libname sasdata sas 'directory'; C. libname sasdata 'directory'; D. sasdata libname 'directory';

C. libname sasdata 'directory';

Which procedure is used to permanently modify a label? a. PROC DATA b. PROC DATASETS c. PROC CONTENTS d. PROC MODIFY

B. PROC DATASETS

Which of the following are valid pointer controls? a. -n b. +n c. @n d. +n and @n

D. +n and @n

How many steps does the following SAS program contain? PROC SORT DATA=movies OUT=movies_By_imdb_score; BY imdb_score; RUN; PROC PRINT; RUN; DATA USmovies UKmovies; SET movies; IF country="USA" THEN OUTPUT USmovies; ELSE IF country="UK" THEN OUTPUT UKMovies; RUN; a.4 b.2 c.1 d.3

D. 3 (one DATA step and two PROC steps)

The data set wine_origin is of the form
Bottle Origin
1 France
2 California
and the data set wine_type is of the form
Bottle Type
1 Red
2 White
3 White
Assuming that the variable Bottle is used as the linkage variable, what is the relationship between the two data sets? a. Many to one b. Many to many c. One to one d. One to missing

D. One to Missing

By default, how does SAS write to the output data set? a. SAS first processes all input records, then writes all output records b. None of the above c. SAS can write multiple output records when it reaches the RUN statement d. SAS writes at most one output record when it reaches the RUN statement

D. SAS writes at most one output when it reaches the RUN statement

A raw data file is listed below. 1---+----10---+----20---+--- Jose,47,210 Sue,,108 The following SAS program is submitted using the raw data file above as input: data employeestats; <insert INFILE statement here> input name $ age weight; run; The following output is desired: name age weight Jose 47 210 Sue . 108 Which of the following INFILE statements completes the program and accesses the data correctly? a. infile 'file-specification' dlm=','; b. infile 'file-specification' missover; c. infile 'file-specification' pad; d. infile 'file-specification' dsd;

D. infile 'file-specification' dsd; The PAD option specifies that SAS pad variable length records with blanks. The MISSOVER option prevents SAS from reading past the end of the line when reading free formatted data. The DLM= option specifies the comma as the delimiter; however, consecutive delimiters are treated as one by default. The DSD option correctly reads the data with commas as delimiters and two consecutive commas indicating a missing value like those in this raw data file.

The following SAS program is submitted: data ONE TWO SASUSER.TWO set SASUSER.ONE; Run; Assuming that sasuser.one exists, how many temporary and permanent sas datasets are created? A. 2 temporary and 1 permanent B. 3 temporary and 2 permanent C. 2 temporary and 2 permanent D. there is an error and no new dataset is created

D. no data set created due to an error (no semi colon ending the statement)

The following SAS program is submitted: data WORK.DATE_INFO; X='04jul2005'd; DayOfMonth=day(x); MonthOfYear=month(x); Year=year(x); run; What types of variables are DayOfMonth, MonthOfYear, and Year? a. DayOfMonth and Year are numeric. MonthOfYear is character. b. DayOfMonth, Year, and MonthOfYear are character. c. DayOfMonth, Year, and MonthOfYear are date values. d. DayOfMonth, Year, and MonthOfYear are numeric.

D. they're all numeric

When running the following code, PROC SORT DATA=movies OUT=movies_By_imdb_score; BY imdb_score; RUN; PROC PRINT; RUN; DATA USmovies UKmovies; SET movies; IF country="USA" THEN OUTPUT USmovies; ELSE IF country="UK" THEN OUTPUT UKMovies; RUN; How many datasets are created? a.4 b.1 c.2 d.3

D. 3 (one from the SORT procedure and 2 from the DATA step)

DSD = option

DSD = delimiter-sensitive data - ignores delimiters in data values enclosed in quotation marks - does not read quotation marks as part of the data value - treats 2 delimiters in a row as a missing value - assumes the delimiter is a comma - CSV files are commonly read with the DSD option - use the MISSOVER option if there is any chance there might be missing data in the data set
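
A minimal sketch of DSD in action, reading comma-delimited records from in-stream data. The first two records come from the practice question on this sheet; the third is made up to show a quoted value that contains a comma:

```sas
data work.employeestats;
    infile datalines dsd;          /* comma is the assumed delimiter */
    input name $ age weight;
    datalines;
Jose,47,210
Sue,,108
"Lee, Ann",30,150
;
run;
```

The two consecutive commas give Sue a missing age, and the quoted value is read as a single name without the quotation marks.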

Main Components of a SAS program

Data Step and proc Step

Each Dataset has a...

Descriptor and Data Portion

What are the 5 main windows of SAS?

Editor Log Output Results Explorer

general form of the format statement

FORMAT variable(s) format;

KEEP vs KEEP= and DROP vs. DROP= : DATA step

- KEEP=/DROP= data set options can apply to BOTH input and output data sets
- KEEP and DROP statements apply ONLY to output data sets
- when you create multiple output data sets, use the KEEP=/DROP= options to write different variables to different data sets
- KEEP and DROP statements apply to all output data sets

General form for libref
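
The sketch below shows KEEP= used on both the input and the output side. The data set work.products follows the practice question earlier on this sheet, but its values here are made up:

```sas
/* Made-up input data */
data work.products;
    input prodID price productType $ sales returns;
    datalines;
1 10 A 100 5
2 20 B 50 2
;
run;

/* KEEP= on SET limits what is read; KEEP= on each output data set */
/* writes different variables to different data sets               */
data work.ids(keep=prodID) work.revenue(keep=prodID revenue);
    set work.products(keep=prodID price sales returns);
    revenue = price * (sales - returns);
run;
```

A KEEP statement inside the step would instead apply the same variable list to both work.ids and work.revenue.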

LIBNAME libref 'SAS-data-library' <options>;

Can the format statement be used in the format procedure?

NO

Is "Run" always required?

NO

Suppressing the Obs column

NOOBS gets rid of the row numbers on the left side of the report; place it after the other arguments in the PROC PRINT statement

How to display the descriptor portion?

Proc Contents Data = <Sas-Data-Set>; run;

How to display the data portion?

Proc Print Data = <Sas-data-set>; run;

Procedures that use a BY statement assume your data are already sorted, except which one?

Proc Sort

How to change between two different character encodings / collating sequences (i.e. ASCII or EBCDIC)

Proc sort sortseq = EBCDIC <or> ASCII;

SAS Syntax Rules Review

- SAS statements usually begin with a keyword and always end with a semicolon
- SAS statements are free-format (there are no fixed layout rules)
- one or more blank spaces can be used to separate words
- statements can begin and end in any column
- a single statement can span multiple lines
- several statements can be on the same line

Sas program basics

- SAS programs consist of a series of statements
- each statement starts with a keyword and ends with a semicolon
- statements may include zero or more options
- single-line comments start with an asterisk and end with a semicolon
- multiple-line comments start with /* and end with */
- SAS is not case sensitive

Appending datasets

Set statement with 2 data sets with same variables and types - data set A has m records and k variables - data set B has n records and k variables combined data set has (m+n) records and k variables - the order of observations is determined by the order of the list of old data sets * if one of the datasets has a variable not contained in the other dataset, then the observations from the other data sets will have missing values for that variable * * you may want to create a variable that identifies the source of each set of data * - in a separate data step prior to appending the data steps

What does a +1 in the input statement mean?

The +1 moves the pointer forward one column, so SAS skips one column of the raw data while reading the variables

Defining titles and footnotes

Title statement: Titlen 'text'; footnote statement: footnoten 'text';

How to interleave two datasets into one by a certain variable?

Use the By statement in conjunction with the set statement data newDATA; set ds1 ds2; by var1; run;

PROC STEP: subsetting variables

VAR, KEEP=/DROP=

PROC STEP: subsetting records

WHERE

SUBSET rows in dataset

WHERE statement DELETE statement IF Statement * where is the same in both data and proc step*

general form of where statement

WHERE where-expression; where-expression is a sequence of operands and operators operands = variables or constants operators = comparison, special, or logical operators, or functions

DATA STEP: subsetting records

WHERE/IF Statement (variable exists or not)

when is it necessary/required to sort the dataset?

When you need to use the by statement in either data or proc step ex: merge by

Can the format statement be used in the datasets procedure?

YES

NE

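^= ~= (<> also means "not equal" in a WHERE expression, but elsewhere in the DATA step <> is the MAX operator)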

SAS libraries

a reference to the location of SAS data sets; the programmer assigns most library names, except: SASUSER: permanent library WORK: temporary library, whose files are deleted when the session ends. SAS data sets have a two-level name: libref.filename

Sas Libraries

a reference to the locations of SAS Dataset points to a storage location on a disk drive

sas functions

a routine that returns a value that is determined from specified arguments

The SAS data set STUDENTS contains a numeric variable named Grade and a character variable named Student: STUDENTS Grade Student --------- -------------- 100 Robert 100 Sally 99 William 50 Mary The following SAS program is submitted: proc print data=STUDENTS NOOBS; where Grade='100'; run; What is the output? Select one: a. No Output b. Grade Student --------- -------------- 99 William 50 Mary c. Grade Student --------- -------------- 100 Robert 100 Sally 99 William 50 Mary d. Grade Student --------- -------------- 100 Robert 100 Sally

a. No Output (because Grade is a numeric variable not character)

What happens if you merge the following data sets by the variable SSN? * the first data set has 2 columns, SSN and Age, while the second data set has SSN, Age, and Date. The SSN values are the same in both tables, but the corresponding ages are different * Select one: a. The values of Age in the 2nd data set overwrite the values of Age in the 1st data set. b. The values of Age in the 1st data set overwrite the values of Age in the 2nd data set. c. The DATA step fails because the two data sets contain same-named variables that have different values. d. The values of Age in the 2nd data set are set to missing

a. The values of Age in the 2nd data set overwrite the values of Age in the 1st data set.

When creating a format with the VALUE statement, the new format's name - cannot end with a number - cannot end with a period - cannot be the name of a SAS format, and ... a. must begin with a dollar sign ($) if used with a character variable. b. cannot be the name of a data set variable. c. must be at least eight characters long. d. must be at least two characters long.

a. must begin with a dollar sign ($) if used with a character variable.

colon modifier

aka "short" variables; reads each value only as far as the next delimiter - allows you to use informats with list input, so you can handle non-standard data values (i.e. values that need an informat to be read) Character example: - the default character variable length is 8 - use the colon modifier to read character variables longer than 8 characters - tells SAS to read that variable until it reaches a space
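
As a sketch of the colon modifier, the step below reads a long character value and a number with embedded commas using list input. The variable names and all values are made up for illustration:

```sas
data work.cities;
    /* ": $15." reads up to the next blank, allowing names longer than 8 */
    /* ": comma10." reads a number that contains commas                  */
    input city : $15. amount : comma10.;
    datalines;
Sacramento 1,000
Bakersfield 25,000
;
run;
```

Without the colon, `$15.` would act as formatted input and read exactly 15 columns, running into the next field.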

Which step sorts the observations of a permanent SAS data set by two variables and stores the sorted observations in a temporary SAS data set? Select one: a. proc sort data=SASUSER.EMPLOYEES out=SASUSER.EMPSORT;by Lname and Fname;run; b. proc sort data=SASUSER.EMPLOYEES out=EMPSORT; by Lname Fname;run; c. proc sort out=SASUSER.EMPLOYEES data=WORK.EMPSORT;by Lname Fname;run; d. proc sort out=EMPLOYEES data=EMPSORT;by Lname and Fname;run;

b. proc sort data=SASUSER.EMPLOYEES out=EMPSORT; by Lname Fname; run;

How many characters can be used in a label? a. 40 b. 256 c. 96 d. 200

b. 256 (refer up)

EBCDIC sorting from lowest to highest

blank lowercase letters uppercase letters numerals

ASCII sorting from lowest to highest

blank numerals uppercase letters lowercase letters

proc freq

By default:
- analyzes every variable in the SAS data set
- gives information for both numeric and character variables
- displays each distinct data value
- calculates the number of observations in which each data value appears (and the corresponding relative and cumulative percentages)
- indicates for each variable how many observations have missing values
TABLES statement:
- used to select variables and to specify the type of frequency report
FORMAT statement:
- analyzes the frequency of observations within user-defined categories
Cross-tabular frequency reports:
- two-way tables categorize observations on the combination of two sets of categories

Which of the following FORMAT procedures is written correctly? a. proc format value colorfmt; 1='Red' 2='Green' 3='Blue' run; b. proc format; value colorfmt; 1='Red' 2='Green' 3='Blue' c. proc format; value colorfmt 1='Red' 2='Green' 3='Blue'; d. proc format; value colorfmt 1='Red'; 2='Green'; 3='Blue'; run;

c. proc format; value colorfmt 1='Red' 2='Green' 3='Blue'; A semicolon is needed after the PROC FORMAT statement. The VALUE statement begins with the keyword VALUE and ends with a semicolon after all the labels have been defined.

Which of the following is a SAS syntax requirement? a. Put only one statement on each line. b. Begin each statement in column one. c. End each statement with a semicolon. d. Put a RUN statement after every DATA or PROC step.

c. End each statement with a semicolon.

Rules for SAS dataset name

can be 1 to 32 characters long must begin with a letter or an underscore can continue with any combination of numbers, letters, or underscores

Syntax rules for creating format ranges

can be single values or ranges of values

Syntax rules for creating format labels

can be up to 256 characters in length are typically enclosed in quotes, although it is not required

DO statement

causes all sas statements coming after it to be treated as a unit until a matching END statement appears DO and END statements and all statements in between are called a DO group

Character Data Type

contains any letters, numbers, special characters, and blanks; stored with a length of 1-32,767 bytes; 1 byte equals one character; default size: 8 bytes

why would you want to use proc freq?

create tables showing the distribution of categorical data values can also reveal irregularities in the data

In=

creates variables identifying which data sets contributed the observations - variable names must start with a letter or underscore, be 32 characters or fewer in length, and contain only letters, numerals, or underscores
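
A minimal sketch of IN= while appending two data sets; the data sets work.east and work.west, the variable names, and the values are all hypothetical:

```sas
data work.east;
    input rep $ amount;
    datalines;
Ann 100
Bob 200
;
run;

data work.west;
    input rep $ amount;
    datalines;
Cal 150
;
run;

data work.all_sales;
    set work.east(in=from_east) work.west(in=from_west);
    length source $ 4;
    /* IN= variables are temporary 0/1 flags and are not written out, */
    /* so copy the information into a regular variable                */
    if from_east then source = 'East';
    else if from_west then source = 'West';
run;
```

This records the source of each observation in the same step that appends the data sets.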

Which step displays a listing of all the data sets in the WORK library? Select one: a. proc contents lib=WORK.all;run; b. proc contents all_data; run; c. proc contents lib=WORK run; d. proc contents data=WORK._all_; run;

d. proc contents data=WORK._all_; run;

How to get data into SAS

data entry using SAS table editor include raw data within SAS Program ("datalines" or "cards") read in raw data from an external file ("infile") import data from another software product (e.g. EXCEL) read in a pre-existing SAS dataset (permanent or created within the program)

general form for merging datasets

data sas-dataset; merge sasds1 sasds2; by var1 <var2...>; run;

general form for concatenation datasets

data sas-dataset; set sasds1 sasds2; run;

creating a dataset refers to

data step

What is the data step's built in loop?

data steps execute line by line and observation by observation

pros and cons of datalines and infile

datalines - see data more directly - good for small amounts of data - often used to create test data sets infile - more common - necessary if data comes from outside source - preferred for larger sets - allows to easily rerun programs on updated data

How to read data from source: raw data in a sas program

datalines / cards data newSet; input var1 $ var2; datalines <cards>; A 1 B 2 ; run;

When to use which method for reading raw data?

datalines: good for small amounts of data; infile: good for reading data from an external file

formatting data values

- display formatted values using SAS formats in a list report - create user-defined formats using the FORMAT procedure - apply user-defined formats to variables in a list report

log window

displays status messages created when SAS executes program

Output Window

displays the results of the sas procedures

many to many

duplicate matching by values are in both data sets

list input

each data value is separated by a space (the "delimiter") - all values must be in standard format - values cannot contain spaces - character values cannot be longer than 8 characters - numeric values cannot contain commas or dollar signs - dates will be read as characters rather than date values

column input

each data value is in a fixed location character variables can: - be longer than 8 characters - contain spaces you can skip some data fields, if desired the data must be in "standard" format - numbers may not contain commas or dollar signs - dates will be read as character, instead of numeric, variables

Where statement

enables you to select observations that meet a certain condition can be used with most sas procedures

Why use formats?

enhances the readability of reports by formatting the data values

how to tell sas a value is a date?

write it as a date constant: a quoted date followed by the letter d, e.g. evaluate = '14feb2009'd;

weekday(sas-date)

extracts the day of the week from a sas date and returns a number from 1-7 where 1 represents Sunday and so on (numeric)

Month(sas-date)

extracts the month from a SAS date and returns a number from 1-12 (numeric)

QTR(sas-date)

extracts the quarter from a SAS date and returns a number from 1 to 4 (numeric)

Year(Sas-date)

extracts the year from a SAS date and returns a four digit value for year (numeric)
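
The date functions on these cards can be tried together in one step. This sketch reuses the date constant from the practice question above; the new variable names are arbitrary:

```sas
data work.date_info;
    x = '04jul2005'd;            /* date constant: quoted date followed by d */
    day_of_month  = day(x);      /* 4 */
    month_of_year = month(x);    /* 7 */
    yr            = year(x);     /* 2005 */
    quarter       = qtr(x);      /* 3 */
    day_of_week   = weekday(x);  /* 1 = Sunday, ..., 7 = Saturday */
    format x date9.;             /* displays x as 04JUL2005 */
run;
```

All of the extracted variables are numeric.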

Defining Titles and footnotes

- footnotes appear at the bottom of each page of output
- there is no default footnote
- you can have more than one footnote
- the value n can be from 1 to 10 and refers to the footnote line
- an unnumbered FOOTNOTE is equivalent to FOOTNOTE1
- footnotes remain in effect until they are changed or canceled
- the null FOOTNOTE statement, footnote; cancels all footnotes

difference between format and informat

format = controls the way data looks when you print them informat = controls the way SAS reads data in

format

how the values are displayed

Where vs. IF

if the variable already exists in the input data set, you can use a WHERE statement if you are evaluating a calculated variable, use IF
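
A short sketch of the difference, using a made-up work.products data set; the cutoff of 955 is arbitrary:

```sas
/* Made-up input data */
data work.products;
    input prodID price sales returns;
    datalines;
1 10 100 5
2 20 50 2
3 0 10 0
;
run;

data work.high_revenue;
    set work.products;
    where price > 0;                      /* price already exists in the input data set */
    revenue = price * (sales - returns);
    if revenue > 955;                     /* revenue is calculated in this step, so use IF */
run;
```

Putting revenue in the WHERE statement would fail, because WHERE can only see variables that exist in the input data set.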

How can you subset using the if statement?

if var = "val"; or if var = "val" then delete;

why use tables option in proc freq?

if you do not add the tables statement, you will get a one-way frequency table for every variable

How to read data from source: data created in another software program (i.e. excel)

import (interactively with the Import Wizard, or with PROC IMPORT); you need to tell SAS: - the format of the data - where it is located - the delimiter; then save the SAS data set in an existing library or create a new library

Missover

option in the INFILE statement: tells SAS not to go to the next line of data when it runs out of values; the remaining variables are set to missing

How to read data from source: raw data in an external text file

infile data newSet; infile 'directory'; input var1 $ var2; run;

Descriptor portion

information about the overall data set, plus variable attributes = information about each variable (name, type, length, position, label, informat, format)

Selecting variables

KEEP statement: identifies the variables that will remain in the data set DROP statement: identifies the variables to be eliminated from the data set DO NOT USE BOTH THE KEEP AND DROP STATEMENTS WITHIN THE SAME DATA STEP

DATA STEP: subsetting variables/columns

keep/keep= or drop/drop=

In the DATASETS procedure which statement is used to assign a new label to a variable?

label

types of raw data input

list input column input formatted input

different uses of pointers in inputs

list input: sas automatically scans to the next non-blank field and starts reading column input: sas starts reading in the exact column you specify formatted input: sas just starts reading--wherever the pointer is, that is where SAS reads

Which Key word is used to merge/link datasets?

merge and by statements

match merging / one to one

MERGE and BY statements - the records from each data set with the same value of the (unique) BY variable are linked and output as one record - if you omit the BY statement, the first records from each data set are output together as one record, the second records together, and so on, without being linked by a common variable
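
A match-merge can be sketched with the two wine data sets from the practice question on this sheet. Both inputs are sorted on the BY variable first, since the merge relies on that order:

```sas
data work.wine_origin;
    input bottle origin : $10.;
    datalines;
1 France
2 California
;
run;

data work.wine_type;
    input bottle type $;
    datalines;
1 Red
2 White
3 White
;
run;

proc sort data=work.wine_origin; by bottle; run;
proc sort data=work.wine_type;   by bottle; run;

data work.wine;
    merge work.wine_origin work.wine_type;
    by bottle;
run;
```

Bottle 3 has no match in work.wine_origin, so its origin is missing in work.wine.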

options within the tables statement include:

missing = includes missing values in frequencies and in percentages
missprint = includes missing values in frequencies but not in percentages
nocum = suppresses cumulative frequencies in one-way frequencies
nopercent = suppresses printing of the percentages
noprint = suppresses printing of frequency tables
out = dataset = writes a dataset containing frequencies
crosslist = displays cross tabulations in list format with totals
list = displays cross tabulations in list format without totals
nocol = suppresses column percentages in cross tabulations
norow = suppresses row percentages in cross tabulations

today()

obtains the date value from the system clock

label column at data step

permanently associates labels with variables * label option required in the proc step for it to be applied*

general format of proc export

proc export DATA = data-set OUTFILE = 'filename' REPLACE;
- data-set = the sas dataset you want to export
- filename = the name you make up for the output data file
- REPLACE = tells SAS to replace the file if it already exists
optional:
- DBMS = used to specify the file type: comma-delimited > CSV, tab-delimited > TAB, space-delimited > DLM
- DELIMITER = creates a file with a delimiter other than comma, tab or space

general form for creating a format

proc format; value formatName range1 = 'label' ... ; run;

general form of a frequency report

proc freq data = sas-data-set; run;
with tables statement: proc freq data = sas-data-set; tables <variable list>; run;
with format statement: proc freq data = sas-data-set; format variable formatName.; tables <variable list>; run;
cross-tabulation: proc freq data = sas-data-set; tables var1*var2; run;

general format import procedure

proc import out = newFile dataFile = 'external-file-name' dbms = file-type replace; /* replace newFile if it already exists */
getnames = YES; /* if you want to keep colnames */
run;
- external-file-name = the file you want to read
- newFile = the sas dataset you want to create
- REPLACE = tells sas to replace the sas dataset named in the out= option if it already exists
- DBMS = tells sas the type of Excel file to read and may or may not be necessary (usually set to equal EXCEL)

procedures for summarizing data

proc means = calculate and display simple summary statistics proc freq = calculate and display frequency counts

creating a report refers to...

proc print

PROC PRINT for SORTED DATA

proc print; by var1 var2; sum var1 var2; run;

general sort format

proc sort data = oldFILE <out = newFILE>; by <descending> var1 var2; run;

Proc Step

produce plots and charts perform "utility" operations on a data set (like print or sort) pre-written routines, analyze data, produce descriptive statistics output results or reports can list, sort and summarize data

results window

provides bookmarks to each section of SAS output

Sorting a SAS DATA Set IDEA

rearranges the observations in a SAS data set; can create a new SAS data set containing the rearranged observations; can sort on multiple variables; can sort in ascending (default) or descending order; does not generate printed output; treats missing values as the smallest possible value

SAS date value

represents the number of days since Jan 1, 1960 evaluate = '14feb2009'd; returns 17942
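
A quick way to check this yourself is to write a date constant to the log, once unformatted and once with a date format. A minimal sketch, reusing the variable name from the card:

```sas
data _null_;
    evaluate = '14feb2009'd;   /* date constant: quoted date followed by d */
    put evaluate=;             /* writes evaluate=17942 to the log */
    put evaluate= date9.;      /* same value, displayed as 14FEB2009 */
run;
```

The stored value is the same in both PUT statements; only the display changes.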

datepart value

returns just the date portion of a datetime value, as the number of days since 01/01/1960
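
A minimal sketch of datepart(), assuming a datetime constant chosen for illustration (3:23:02 AM on Jan 1, 1960, which is 12182 seconds after midnight):

```sas
data _null_;
    dt = '01JAN60:03:23:02'dt;   /* datetime constant: seconds since 01JAN1960 00:00 */
    d  = datepart(dt);           /* date portion only: days since 01JAN1960 */
    put dt= d=;                  /* writes dt=12182 d=0 to the log */
run;
```

The datetime is counted in seconds and the date in days, so the two numbers are on different scales even when they describe the same day.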

Process for formatting data values

sas dataset > format > report

Dupout = option

sas will put the deleted repeated observations into a specified data set proc sort data = oldData out = newData nodupkey dupout = erasedRecords;

Which keyword is used to append/concatenate datasets?

set

How to read data from source: data in a permanent sas dataset or created by another sas program

set statement data newSet; set oldSet; run;

dlm = option

since sas expects spaces between the values, this option allows you to use other delimiter - tab as delimiter: dlm= '09'X - sas interprets 2+ delimiters in a row as a single delimiter
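
A short sketch of the dlm= option with a comma delimiter, using made-up names and values. DATALINES stands in for an external file here:

```sas
data work.commas;
    infile datalines dlm = ',';   /* values are separated by commas, not spaces */
    input name $ age;
    datalines;
Ann,30
Bob,41
;
run;

proc print data = work.commas;
run;
```

For a tab-delimited file the same step would use dlm = '09'X, as noted above.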

What determines the methods we use to read data into sas?

source, structure and format

step boundaries

steps begin with a - data statement - proc statement steps end with a - run statement - quit statement - the beginning of another step

Numeric Data Type

stored as a floating point number in 8 bytes of storage by default 8 bytes of floating point storage provide space for 16-17 significant digits. not restricted to 8 digits

How to read data from source: data entry

table editor (interactive data entry)

cross tabulations or contingency tables

tables combining two or more variables

ampersand (&) modifier

tells SAS to read your data value until it reaches 2 or more spaces in a row inserted after the variable with the embedded spaces
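
A minimal sketch of the & modifier, assuming made-up city names and population figures. Note the two spaces between the city and the number in each data line; a single space is treated as part of the value:

```sas
data work.cities;
    length city $ 20;                /* room for names longer than 8 characters */
    input city $ & population;       /* & goes after the variable with embedded spaces */
    datalines;
Santa Barbara  88000
Goleta  32000
;
run;

proc print data = work.cities;
run;
```

Without the &, SAS would stop reading city at the first blank and then try to read "Barbara" as population.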

rename = (oldVar = newVar)

tells SAS to rename certain variables

firstObs = n

tells SAS to start reading at observation n

obs = n

tells SAS to stop reading at observation n

noprint option in proc means

tells sas there is no need to produce any printed results since we are saving the results in a new sas dataset - using proc summary is the same as using proc means with the noprint option

how many variables are contained in temp1 and temp2 after running this program? data temp1 temp2(keep = firstName lastName base); set class.empdata(keep = firstName lastName base address); bonus = base*0.3; drop base; run;

temp1 has 4 (firstName lastName address and bonus) temp2 has 2 (firstName and lastName)

label at proc step

temporary * always required to use label in the proc step in order for it to be applied in the output*

datetime value

the number of seconds between midnight January 1, 1960, and a specific date and time. EX: 12/01/2000 9:15AM is stored as 1,291,281,300 seconds since 01/01/1960 12:00AM datetime(MMDDYYYTTTT)

Data Portion

the values of the data

SAS datasets

they can have more than one index, which helps locate records in the data set more efficiently; they can be read by SAS but not by most other programs

Merge datasets

two data sets with at least one common variable and other unique variables - data set A has m records and k UNIQUE variables - data set B has n records and j UNIQUE variables - combined data set (typically) has max(n, m) records and k+j+1 variables

Data Step

typically creates or modifies SAS data sets. can also be used to produce custom-designed reports - read data - assign variable names - keep or delete specific observations - transform observations - produce new SAS data sets by subsetting, merging, and updating existing data sets

one to many

unique by values are in one data set and duplicate matching by values are in the other data set

Why is subsetting in the proc step more efficient than in the data step?

unlike subsetting in the data step, a where statement in the proc step does not create a new data set

selecting and deleting rows

use an if statement to include only those rows that meet the criteria

creating user defined formats

use format procedure to create the format apply the format to specific variables by using FORMAT statement
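
The two steps can be sketched as follows, assuming a made-up score variable and a hypothetical format name gradefmt. Note the period after the format name when it is applied:

```sas
/* Hypothetical example data, for illustration only */
data work.scores;
    input score;
    datalines;
45
72
88
59
;
run;

/* Step 1: create the format */
proc format;
    value gradefmt low -< 60 = 'Fail'
                   60 - high = 'Pass';
run;

/* Step 2: apply it with a FORMAT statement */
proc freq data = work.scores;
    format score gradefmt.;
    tables score;
run;
```

Because PROC FREQ groups by formatted value, the table shows the Fail and Pass categories rather than one row per score.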

how to group variables when creating contingency tables?

use parentheses tables yearsEducation * (sex age); # produces 2 two-way contingency tables one for education * sex and another for education * age
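
A small runnable sketch of the grouped request, assuming a made-up survey data set with the three variables named on the card:

```sas
/* Hypothetical example data, for illustration only */
data work.survey;
    input yearsEducation sex $ age;
    datalines;
12 F 30
16 M 45
12 M 30
16 F 45
;
run;

/* Same as requesting yearsEducation*sex and yearsEducation*age separately */
proc freq data = work.survey;
    tables yearsEducation * (sex age);
run;
```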

How do you move the pointer explicitly?

use the column pointer @n where n is the number of the column sas should move to
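
A minimal sketch of @n with formatted input, assuming a made-up fixed-column layout: id in columns 1-3, name starting in column 5, and an amount starting in column 11:

```sas
data work.fixed;
    /* @n moves the pointer to column n before each value is read */
    input @1 id 3. @5 name $5. @11 amount comma7.;
    datalines;
101 Alice 1,250
102 Bob     980
;
run;

proc print data = work.fixed;
run;
```

The comma7. informat is used for amount so that the comma in 1,250 is read as part of a number.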

KEEP vs KEEP= and DROP vs. DROP= : General Case

use the keep or drop statement (within the data step) to eliminate variables from the OUTPUT data set - these variables CAN still be used in SAS expressions. use the keep= or drop= option in a SET statement to eliminate variables from the INPUT data set - the eliminated variables CANNOT be used in expressions
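
The difference can be sketched with two data steps, assuming a made-up employee data set (the variable names echo the quiz question elsewhere in this set, but the values are invented):

```sas
/* Hypothetical example data, for illustration only */
data work.emp;
    input firstName $ base;
    datalines;
Ann 50000
Bob 60000
;
run;

/* DROP statement: base is read, used in the expression, then left out of the output */
data work.out1;
    set work.emp;
    bonus = base * 0.3;
    drop base;
run;

/* DROP= on SET: base is never read, so the expression sees a missing value */
data work.out2;
    set work.emp(drop = base);
    bonus = base * 0.3;   /* bonus is missing for every observation */
run;
```

In the second step SAS treats base as a new, uninitialized variable, so the log should show a note about missing values rather than an error.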

proc print default output

use variable names as column headings displays all variables contained in data set display all observations contained in dataset display variable values in their "native" format

Creating new variables

use assignment statements in the DATA step to create new variables. An assignment statement: - evaluates an expression - assigns the resulting value to a variable

Editor Window

used to edit, execute, and save sas programs

MDY(month, day, year)

uses numeric month, day and year values to return the corresponding SAS date value (a number, not a string)

formatted input

uses sas informats. data can be in "non-standard" format - numbers can contain commas and dollar signs - dates can be read as numeric values. data can be free-form or fixed text files

structure

what variables are present in the data

rename = data set option

when appending data sets > to create common variable names when merging data sets > create unique variables names

when would you use a length statement?

when creating a character variable using an if/else statement, you may need to include a LENGTH statement in your program to define the length of the variable you are creating. without a LENGTH statement, a character variable's length is determined by the first occurrence of the new variable name
EX: if temperature > 100 then Status = 'Hot'; else Status = 'Cold';
# above would truncate "Cold" to "Col" since "Hot" is only 3 characters long
Length Status $4; if temperature > 100 then Status = 'Hot'; else Status = 'Cold';
# works better now that we used the length statement

what is the universal way SAS reads lines of raw data?

when sas reads a line of raw data, it uses a pointer to mark its place, but each style of input uses the pointer a little differently

source

where the data are stored

MMDDYYw.

writes sas date values in the form mm/dd/yy or mm/dd/yyyy
width range: 2-10, default width: 8
EX: use = 8966
Format use MMDDYY8.; > 07/19/84
Format use MMDDYY6.; > 071984

DATEw.

writes sas date values in the form ddmmmyy or ddmmmyyyy
width range: 5-11, default width: 7 *change w for the width*
EX: use = 8966
Format use DATE7.; > 19JUL84
Format use DATE9.; > 19JUL1984
for comparison purposes, you can enter the value as a date constant, i.e. evaldate = '19JUL84'd; # need the letter d at the end
it will be stored as 8966 > the number of days since Jan 1, 1960.

DATETIMEw.d

writes sas datetime values in the form ddmmmyy:hh:mm:ss.ss
width range: 7-40, default width: 16
EX: use = 12182
Format use DATETIME13.; > 01JAN60:03:23
Format use DATETIME18.1; > 01JAN60:03:23:02.0
evaldate = '01JAN60:03:23:02'dt; # need the letters d and t
it will be stored as 12182 > the number of seconds since midnight Jan 1, 1960

Can you define multiple formats within the same format procedure?

yes

Is the by statement required in proc sort?

yes

can you mix input styles?

yes

KEEP vs KEEP= and DROP vs. DROP= : PROC step

you can use only keep = and drop = option NO KEEP or DROP statement

Writing Summary Statistics to a SAS Data set

you can use the output statement with the out = option to specify a new dataset, along with statistic keywords that name the new variables to store in it
EX: proc means data = zoo noprint; var lions tigers bears; output out = zoosum Mean(lions bears) = lionWeight bearWeight; run;

