Reading in Excel/Raw Data Files, Session 4

Pataasin ang iyong marka sa homework at exams ngayon gamit ang Quizwiz!

export wizard steps

1. select the data set from which you want to export data 2. select teh type of data source to which you want to export files. 3. assign the output file. 4. assign the table name. 5. save the generated proc export code. (optional)

Input statement example for dates

Birth_Date :date.; Hire_Date :mmddyy.;

delimiter

a space (blank) is the default delimiter the dlm=option can be added to the infile statement to specify an alternate delimiter

nonstandard data

any data that sas cannot read without a special instruction examples: 5,823 (23) $67.23 01/12/1999 12may2006

copy procedure

as an alternative to the data step, this can be used to create an excel worksheet

libname statement

assigns a library reference name (libref) to a sas data library example libname orion 's:\workshop'; orion=libref 's:\workshop';=physical location of sas data library

SAS name literals

by default special characters such as $ are not allowed in data set names sas name literals enable special characters to be included in data set names a sas name literal is a name token that is expressed as a string within quotation marks, followed by the letter n. example: orion.'Australia$'n 'Australia$'n=sas name literal

sas/access libname statement with the data step

can be used to create an excel worksheet

dsd delimiter

changes how sas treats delimiters when you use LIST input and sets teh default delimiter to a comma. when you specify this, sas treats two consecutive delimiters as a missing value and removes quotation marks from character values. this option specifies that when data values are enclosed in quotation marks, delimiters within the value be treated as character data.

procedures

code based

input statement specifications

column input-enables you to read standard data values that are aligned in columns in the raw data file formatted input-combines the flexibility of using informats with many of the features of column input. by using formatted input, you can read nonstandard data for which sas requires additional instructions. list input-uses a scanning method for locating data values. data values are not required to be aligned in columns, but must be separated by at least one blank or other defined delimiter.

data step processing steps

compile program initialize variables to missing (PDV) execute read statement execute other statements output to sas data set

import procedure/ mixed=yes or not

converts numeric data values for a column that contains mixed data types. the default is NO, which means that numeric data will be imported as missing values in a character column. if mixed=yes, then the engine will assign a sas character type for the column and convert all numeric data values to character data values example: mixed=no;

COMMA. DOLLAR.

converts raw data ($12,345) to sas data value of 12345

COMMAX. DOLLARX.

converts raw data value of ($12.345) to sas data value of 12345

EUROX.

converts raw data value of (E12.345) to sas data value of 12345

MMDDYY.

converts raw data value of 01/01/60 to a sas data value of 0

DDMMYY

converts raw data value of 31/12/60 to a sas data value of 365

DATE.

converts raw data value of 31DEC59 to a sas data value of -1

standard data

data that sas can read without any special instructions examples: 58 -23 67.23 00.99 5.6E5 1.2E-2

reading in a delimited raw data file

data___________ infile__________ input_____________ ... run;

length statement

defines the length of a variable explicitly example: length first_name $12 last_name $12

input statement

describes the arrangement of values in the raw data file and assigns input values to the corresponding sas variables

tab delimiter

dlm='09'x.

SAS explorer window

each worksheet in the excel workbook is treated as though it is a sas data set. worksheet names appear with a dollar sign at the end of the name

import/export wizards and procedures

enable you to read and write data between sas data sets and external pc files

import wizard

enables you to read data from an external data source and write to a sas data set steps 1. select the type of file you are importing 2. locate the input file 3. select the table range or worksheet from which to import data 4. select a location to store the imported file 5. save the generated PROC IMPORT code (optional)

sas/access libname statement

extends the libname statement to support assigning a library reference name (libref) to microsoft excel workbooks this enables you to reference worksheets directly in a DATA step or SAS procedure, and to read from and write to a microsoft excel worksheet as though it were a sas data set. mixed=yes/no specifies whether to convert numeric data values into character data values for a column with mixed data types example: libname orion 's:\workshop\sales.xls'; orion=libref 's:\workshop\sales.xls';=physical file name of excel workbook including path, filename, and extension

export procedure/ outfile=

filename/specifies teh complete path and filename or a fileref for the output PC file, spreadsheet or delimited external file example: outfile="s:\workshop\qtr2007c.xls"

import procedure/ datafile=

filename/specifies the complete path and filename or a fileref for the input pc file, spreadsheet or delimited external file. example: datafile="s:\workshop\sales.xls"

import procedure/ getnames=yes or no

for spreadsheets and delimited external files, determines whether to generate sas variable names from the column names in the input files first row of data. note that if a column name contains special characters that are not valid in a sas name, such as a blank, sas converts the character to an underscore. example: getnames=yes;

data step processing

has two phases: compilation execution

import procedure/ dbms=

identifier/specifies the type of data to import. to import a dbms table, you must specify dbms=using a valid database identifier. for example, dbms=excel specifies to import a microsoft excel worksheet example: dbms=excel;

infile statement

identifies the physical name of the raw data file to read with an input statement the physical name is the name that the operating environment uses to access the file example: infile 's:\workshop\sales.csv';

disassociating a libref

if sas has a libref assigned to an excel workbook, the workbook cannot be opened in excel. to disassociate a libref, use a libname statement and specify the libref and the CLEAR option. example: libname orion clear;

$

indicates a character informat

dsd example

infile 'phone2.csv' dsd;

missover example

infile 'phone2.csv' missover;

.

is a required delimiter

reading in a sas data set

libname___________ data__________________ set___________ ... run;

reading in an excel worksheet

libname_____________ data____________ set___________ ... run;

export procedure/ data=

libref/identifies the input sas data set example: data=orion.qtr1_2007

import procedure/ out=

libref/identifies the output sas data set example: out=work.subset2a

informat

names the sas informat or user defined informat

DSD

option for the infile statement sets the default delimiter to a comma treats consecutive delimiters as missing values enables sas to read values with embedded delimiters if the value is surrounded by quotation marks

import procedure/ replace

overwrites an existing sas data set. if you do not specify replace, proc import does not overwrite an existing data set. example: dbms=excel replace;

wizards

point and click interfaces to invoke from the sas windowing environment, select file and import data or export data

missover option

prevents sas from lading a new record when the end of the current record is reached. if sas reaches the end of the row without finding values for all fields, variables without values are set to missing

data statement

provides the name of the sas data set being created the data statement can create temporary or permanent data sets

export wizard

reads data from a sas data set and writes it to an external file source

EUROXw.d

reads nonstandard numeric data and removes embedded characters in European currency

COMMAw.d DOLLARw.d

reads nonstandard numeric data and removes embedded commas, blanks, dollar signs, percent signs, and dashes

COMMAXw.d DOLLARXw.d

reads nonstandard numeric data and removes embedded periods, blanks, dollar signs, percent signs, and dashes

$w.

reads standard character data

w.d

reads standard numeric data

compilation

sas checks the syntax of the data step sas creates an input buffer to hold the current raw data file record that is being processed sas creates a program data vector (PDV) to hold the current sas observation sas creates the descriptor portion of the output data set

missing values at end of a record

sas loads teh enext record to finisht the observation writes a note to the log

import procedure/ scantime=yes or no

scans all row values for a datetime data type field and automatically determines the time data type if only time values (that is, no date or datetime values) exist in the column example: scantime=yes;

import procedure/ scantext=yes or no

scans the length of text data for a data source column and uses the longest string data that is found as the sas column width. example: scantext=yes;

d

specifies an optional decimal scaling factor in the numeric informats

w

specifies the number of columns to read in the input data

export procedure/ dbms=

specifies the type of data to export. to export a dbms table, you must specify dbms= by using a valid database identifier. for example, dbms=excel specifies to export a table into an excel worksheet. example: dbms=excel;

import procedure/ usedate=yes or no

specifies which format to use. if usedate=yes, then the date. format is used for date/time columns in the data source table while importing data from excel workbook. if usedate=no, then a datetime. format is used for date/time. example: usedate=yes;

import procedure/ range=

subsets a spreadsheet by identifying the rectangular set of cells to import from the specified spreadsheet. example: range="australia$";

list input for nonstandard data

the : format modifier enables you to use an informat to read nonstandard delimited data an informat is an instruction that sas uses to read data values into a variable the width of the informat can be eliminated for character variables, the width of the informat determines the variable length, if it has not been previously defined

list input

to read, must be separated with a delimiter and acan be in standard or nonstandard form

where statement

used to obtain a subset of observations from an input data set. cannot be used to select records from a raw data file (use IF as a substitute)

list input for standard data

variables must be specified in the order that they appear $ indicate to store a variable value as a character value rather than as a numeric value the default length for character and numeric variables is eight bytes example: input Employee_ID First_Name $ Last_Name $;


Kaugnay na mga set ng pag-aaral

Classical Civilisation - Greek and Roman priests

View Set

Econ Chapter 4 Test Review Demand

View Set

Marketing Management: Chapter 10

View Set

Chapter 22 - Physiologic Adaptations of the Newborn

View Set