Reading in Excel/Raw Data Files, Session 4
export wizard steps
1. select the data set from which you want to export data 2. select teh type of data source to which you want to export files. 3. assign the output file. 4. assign the table name. 5. save the generated proc export code. (optional)
Input statement example for dates
Birth_Date :date.; Hire_Date :mmddyy.;
delimiter
a space (blank) is the default delimiter the dlm=option can be added to the infile statement to specify an alternate delimiter
nonstandard data
any data that sas cannot read without a special instruction examples: 5,823 (23) $67.23 01/12/1999 12may2006
copy procedure
as an alternative to the data step, this can be used to create an excel worksheet
libname statement
assigns a library reference name (libref) to a sas data library example libname orion 's:\workshop'; orion=libref 's:\workshop';=physical location of sas data library
SAS name literals
by default special characters such as $ are not allowed in data set names sas name literals enable special characters to be included in data set names a sas name literal is a name token that is expressed as a string within quotation marks, followed by the letter n. example: orion.'Australia$'n 'Australia$'n=sas name literal
sas/access libname statement with the data step
can be used to create an excel worksheet
dsd delimiter
changes how sas treats delimiters when you use LIST input and sets teh default delimiter to a comma. when you specify this, sas treats two consecutive delimiters as a missing value and removes quotation marks from character values. this option specifies that when data values are enclosed in quotation marks, delimiters within the value be treated as character data.
procedures
code based
input statement specifications
column input-enables you to read standard data values that are aligned in columns in the raw data file formatted input-combines the flexibility of using informats with many of the features of column input. by using formatted input, you can read nonstandard data for which sas requires additional instructions. list input-uses a scanning method for locating data values. data values are not required to be aligned in columns, but must be separated by at least one blank or other defined delimiter.
data step processing steps
compile program initialize variables to missing (PDV) execute read statement execute other statements output to sas data set
import procedure/ mixed=yes or not
converts numeric data values for a column that contains mixed data types. the default is NO, which means that numeric data will be imported as missing values in a character column. if mixed=yes, then the engine will assign a sas character type for the column and convert all numeric data values to character data values example: mixed=no;
COMMA. DOLLAR.
converts raw data ($12,345) to sas data value of 12345
COMMAX. DOLLARX.
converts raw data value of ($12.345) to sas data value of 12345
EUROX.
converts raw data value of (E12.345) to sas data value of 12345
MMDDYY.
converts raw data value of 01/01/60 to a sas data value of 0
DDMMYY
converts raw data value of 31/12/60 to a sas data value of 365
DATE.
converts raw data value of 31DEC59 to a sas data value of -1
standard data
data that sas can read without any special instructions examples: 58 -23 67.23 00.99 5.6E5 1.2E-2
reading in a delimited raw data file
data___________ infile__________ input_____________ ... run;
length statement
defines the length of a variable explicitly example: length first_name $12 last_name $12
input statement
describes the arrangement of values in the raw data file and assigns input values to the corresponding sas variables
tab delimiter
dlm='09'x.
SAS explorer window
each worksheet in the excel workbook is treated as though it is a sas data set. worksheet names appear with a dollar sign at the end of the name
import/export wizards and procedures
enable you to read and write data between sas data sets and external pc files
import wizard
enables you to read data from an external data source and write to a sas data set steps 1. select the type of file you are importing 2. locate the input file 3. select the table range or worksheet from which to import data 4. select a location to store the imported file 5. save the generated PROC IMPORT code (optional)
sas/access libname statement
extends the libname statement to support assigning a library reference name (libref) to microsoft excel workbooks this enables you to reference worksheets directly in a DATA step or SAS procedure, and to read from and write to a microsoft excel worksheet as though it were a sas data set. mixed=yes/no specifies whether to convert numeric data values into character data values for a column with mixed data types example: libname orion 's:\workshop\sales.xls'; orion=libref 's:\workshop\sales.xls';=physical file name of excel workbook including path, filename, and extension
export procedure/ outfile=
filename/specifies teh complete path and filename or a fileref for the output PC file, spreadsheet or delimited external file example: outfile="s:\workshop\qtr2007c.xls"
import procedure/ datafile=
filename/specifies the complete path and filename or a fileref for the input pc file, spreadsheet or delimited external file. example: datafile="s:\workshop\sales.xls"
import procedure/ getnames=yes or no
for spreadsheets and delimited external files, determines whether to generate sas variable names from the column names in the input files first row of data. note that if a column name contains special characters that are not valid in a sas name, such as a blank, sas converts the character to an underscore. example: getnames=yes;
data step processing
has two phases: compilation execution
import procedure/ dbms=
identifier/specifies the type of data to import. to import a dbms table, you must specify dbms=using a valid database identifier. for example, dbms=excel specifies to import a microsoft excel worksheet example: dbms=excel;
infile statement
identifies the physical name of the raw data file to read with an input statement the physical name is the name that the operating environment uses to access the file example: infile 's:\workshop\sales.csv';
disassociating a libref
if sas has a libref assigned to an excel workbook, the workbook cannot be opened in excel. to disassociate a libref, use a libname statement and specify the libref and the CLEAR option. example: libname orion clear;
$
indicates a character informat
dsd example
infile 'phone2.csv' dsd;
missover example
infile 'phone2.csv' missover;
.
is a required delimiter
reading in a sas data set
libname___________ data__________________ set___________ ... run;
reading in an excel worksheet
libname_____________ data____________ set___________ ... run;
export procedure/ data=
libref/identifies the input sas data set example: data=orion.qtr1_2007
import procedure/ out=
libref/identifies the output sas data set example: out=work.subset2a
informat
names the sas informat or user defined informat
DSD
option for the infile statement sets the default delimiter to a comma treats consecutive delimiters as missing values enables sas to read values with embedded delimiters if the value is surrounded by quotation marks
import procedure/ replace
overwrites an existing sas data set. if you do not specify replace, proc import does not overwrite an existing data set. example: dbms=excel replace;
wizards
point and click interfaces to invoke from the sas windowing environment, select file and import data or export data
missover option
prevents sas from lading a new record when the end of the current record is reached. if sas reaches the end of the row without finding values for all fields, variables without values are set to missing
data statement
provides the name of the sas data set being created the data statement can create temporary or permanent data sets
export wizard
reads data from a sas data set and writes it to an external file source
EUROXw.d
reads nonstandard numeric data and removes embedded characters in European currency
COMMAw.d DOLLARw.d
reads nonstandard numeric data and removes embedded commas, blanks, dollar signs, percent signs, and dashes
COMMAXw.d DOLLARXw.d
reads nonstandard numeric data and removes embedded periods, blanks, dollar signs, percent signs, and dashes
$w.
reads standard character data
w.d
reads standard numeric data
compilation
sas checks the syntax of the data step sas creates an input buffer to hold the current raw data file record that is being processed sas creates a program data vector (PDV) to hold the current sas observation sas creates the descriptor portion of the output data set
missing values at end of a record
sas loads teh enext record to finisht the observation writes a note to the log
import procedure/ scantime=yes or no
scans all row values for a datetime data type field and automatically determines the time data type if only time values (that is, no date or datetime values) exist in the column example: scantime=yes;
import procedure/ scantext=yes or no
scans the length of text data for a data source column and uses the longest string data that is found as the sas column width. example: scantext=yes;
d
specifies an optional decimal scaling factor in the numeric informats
w
specifies the number of columns to read in the input data
export procedure/ dbms=
specifies the type of data to export. to export a dbms table, you must specify dbms= by using a valid database identifier. for example, dbms=excel specifies to export a table into an excel worksheet. example: dbms=excel;
import procedure/ usedate=yes or no
specifies which format to use. if usedate=yes, then the date. format is used for date/time columns in the data source table while importing data from excel workbook. if usedate=no, then a datetime. format is used for date/time. example: usedate=yes;
import procedure/ range=
subsets a spreadsheet by identifying the rectangular set of cells to import from the specified spreadsheet. example: range="australia$";
list input for nonstandard data
the : format modifier enables you to use an informat to read nonstandard delimited data an informat is an instruction that sas uses to read data values into a variable the width of the informat can be eliminated for character variables, the width of the informat determines the variable length, if it has not been previously defined
list input
to read, must be separated with a delimiter and acan be in standard or nonstandard form
where statement
used to obtain a subset of observations from an input data set. cannot be used to select records from a raw data file (use IF as a substitute)
list input for standard data
variables must be specified in the order that they appear $ indicate to store a variable value as a character value rather than as a numeric value the default length for character and numeric variables is eight bytes example: input Employee_ID First_Name $ Last_Name $;