ST 307 : Statistical Programming

"Dataskin" / "Stat"

"Dataskin" changes the look of the bars in a graph. "Stat" sets the statistic for the y-axis. (examples are Freq, Mean, Median, Percent, Sum). Ex: Vbar Type / Dataskin = matte Stat = percent; Run;

"Limits" / "Limitstat"

"Limits" are where you want error bars to be drawn. (relates to alpha and confidence interval for data). "Limitstat" changes the limit type (Stderr, Stdev, CLM...)

"Markerattrs" (Markerfillattrs, Markeroutlineattrs...)

"Marker Attributes." -Affects the color, size, symbol, outline etc. of the marker, or points on the graph. Ex: PROC SGplot Data = something; Scatter X = xvariable Y = yvariable; Markerattrs=(Color = blue Symbol=diamondfilled); Run;

IF/Then/Else Statement in SAS Example

... IF (Energy > 5) AND (Start = 1) THEN Quality= "Good"; ELSE IF (Start = 1) THEN Quality = "Ok"; ELSE Quality = "Other"; Run; -the if/then/else statement helps create NEW variables
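The fragment above omits the surrounding DATA step. A complete version might look like the sketch below, which assumes the sashelp.cars sample data set and a made-up variable name (Economy). The Length statement is an addition: without it, the first assignment would set the character length.

```sas
data work.cars_rated;
    set sashelp.cars;
    length Economy $ 8;   /* hypothetical new variable; set the length before assigning */
    if (MPG_City > 25) and (Cylinders = 4) then Economy = "Good";
    else if (Cylinders = 4) then Economy = "Ok";
    else Economy = "Other";
run;

/* Count how many cars landed in each category */
proc freq data=work.cars_rated;
    tables Economy;
run;
```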

"SET"

A DATA step statement that creates a new data set from an existing SAS data set. Ex: Data mydata.cars; SET sashelp.cars; Run;

"Datalines"

A datalines statement allows you to input the actual data yourself as you would want it. Ex: Datalines; Fedora Blue 10 TopHat Red 15 ; Run;
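Written out in full, each record goes on its own line and a lone semicolon ends the data. The data set and variable names below are assumptions added for illustration:

```sas
data work.hats;
    input Style $ Color $ Price;   /* $ marks a character variable */
    datalines;
Fedora Blue 10
TopHat Red 15
;
run;

proc print data=work.hats;
run;
```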

Density Plot

A density plot is basically a smooth histogram. -default normal distribution -if you want to overlay graphs, you put them in the statement in order you want them to be Ex. Proc SGplot data = sashelp.cars; Histogram msrp / Dataskin = sheen; Density msrp; Density msrp / type = kernel; Run;

Correlation

A unitless measure of the strength and direction of the linear relationship between 2 variables. -between -1 (exact negative relationship) and 1 (exact positive relationship)

"Datalabel"

Adds labels. -default is FREQ

Missing Y Trick

Allows us to get the confidence / prediction interval for a value of a variable NOT in the data set. -2 statements Ex. Data temp; Input syrup rep $ l a b; Datalines; 49 1 . . . ; Proc Datasets; Append Base=mydata.cheese data=temp; Run;
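The same idea can be sketched with the sashelp.class sample data set, since the cheese data is not available here. The new height of 64 is an arbitrary illustrative value. The row with a missing response should be left out of the fit but still receive a predicted value and interval:

```sas
/* One extra row: the predictor value of interest, with the response missing */
data work.newobs;
    Height = 64;
    Weight = .;
run;

/* Stack the extra row under the original data */
data work.class_plus;
    set sashelp.class work.newobs;
run;

/* CLI requests prediction intervals for every row, including the new one */
proc glm data=work.class_plus;
    model Weight = Height / cli;
run;
quit;
```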

"Proc Datasets"

Allows us to view the Descriptor portion of a data set. -copy, rename, delete sas files -list all files in a library -edit some variable attributes (name, labels...) Ex: Proc Datasets library = sashelp; Contents Data = heart <options>; QUIT;

Hypothesis Tests

Answer the question of whether a particular value is reasonable for "u" (the population mean) or whether the data contradict that theory.

"DSD"

Changes the default delimiter to a comma. -treats 2 commas in a row as a missing value -allows a comma inside a quoted value and strips the quotation marks
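A small sketch of DSD at work, reading inline records instead of an external file. The data set, variables, and values are made up for illustration:

```sas
data work.dsd_demo;
    infile datalines dsd;          /* comma-delimited, DSD rules apply */
    length Name $ 12 Note $ 20;
    input Name $ Score Note $;
    datalines;
Ann,90,"Good, steady"
Bob,,Absent
;
run;

/* Bob's Score is missing; Ann's Note keeps its embedded comma */
proc print data=work.dsd_demo;
run;
```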

"CLPARM" / "CLPARM" / "CLI"

CLPARM = gives confidence intervals for the Betas (model parameters) CLM = gives confidence intervals for the MEAN RESPONSE at each set of predictor values in the data set CLI = gives prediction intervals for a NEW RESPONSE at each set of predictor values in the data set -future observation (lots of variation) -CLM and CLI CAN'T BE in the same statement
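These are options on the Model statement in Proc GLM. A hedged sketch using the sashelp.class sample data set (not the course data), with the two interval types requested in separate steps:

```sas
/* Confidence intervals for the betas and for the mean response */
proc glm data=sashelp.class;
    model Weight = Height / clparm clm;
run;
quit;

/* Prediction intervals for a new response, requested separately from CLM */
proc glm data=sashelp.class;
    model Weight = Height / cli;
run;
quit;
```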

"Proc Freq"

Calculates summary statistics for categorical variables. PROC FREQ: -A single variable uses a 1-way contingency (frequency) table -Multiple variables use 2-way contingency tables
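For instance, assuming the sashelp.cars sample data set, a one-way and a two-way table could be requested like this:

```sas
proc freq data=sashelp.cars;
    tables Type;           /* 1-way frequency table */
    tables Type*Origin;    /* 2-way contingency table: rows = Type, columns = Origin */
run;
```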

Delimiter

Character that separates data values. -"3,Hello,ST" (comma) -"3 Hello ST" (space) -a tab is written as "09"x

"CLPARM"

Adds confidence intervals for the parameter estimates to your output tables.

"Trim"

Cuts out outer data. -determine how much to cut off Ex. Trim = 0.05

Confidence Interval Options in SAS

Data = input data set Alpha = (1 - Confidence Level) H0 = "H naught," the null hypothesis value Sides = 1 or 2-sided test (U=upper 1-sided, L=lower 1-sided, 2=2-sided) CI = confidence interval for st. deviation -The defaults are alpha = 0.05 (95% confidence), a 2-sided test, and a null hypothesis value of 0
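Put together on a Proc TTEST statement, the options might look like the sketch below. The data set (sashelp.class), the variable, and the null value of 60 are assumptions chosen for illustration:

```sas
/* 90% confidence level, 2-sided test of H0: mean Height = 60 */
proc ttest data=sashelp.class alpha=0.10 h0=60 sides=2 ci=equal;
    var Height;
run;
```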

Keeping / Dropping Data from a Set Example

Data mydata.chis; SET mydata.chis; Where BMI > 20; Run; -this keeps observations ONLY where BMI > 20 -Can also add a "DROP" statement Data mydata.chis; SET mydata.chis; Where Asian EQ "0"; Run; -this only keeps observations in the data set where the Asian variable is EQUAL to 0

Example of a FULL Infile Step

Data mydata.student; Infile "C:\Users\student.txt" Firstobs=2 DLM=","; Length Name $ 12; Input s_perc Percent6. gpa stat : Comma10.; Format s_perc Percent8.2 Graddate ddmmyy10.; Label s_perc="Percent Stat Completed"; Run;

How to Store Data in SAS

Data sets can be temporary or permanent. -stored in a "library," which is a collection of SAS files that are stored in the same directory

Population / Sample

Population = the entire group of units you are studying. Sample = a subset of the population.

Subset Variables

Ex. DROP/KEEP statement, drop/keep options

Subset Observations

Ex. WHERE statement, IF statement
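One DATA step can subset both variables and observations. The sketch below assumes the sashelp.cars sample data set; the cutoff and the choice of variables are arbitrary:

```sas
data work.cars_small;
    set sashelp.cars (keep=Make Model Type MSRP);   /* KEEP= data set option */
    where MSRP > 30000;       /* WHERE statement: filters rows as they are read */
    if Type ne "Hybrid";      /* subsetting IF: keeps rows that pass the test */
    drop Type;                /* DROP statement: Type is left out of the output */
run;
```

Type can still be used in the IF statement because the DROP statement only takes effect when the output data set is written.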

Multiple Linear Regression

Fits a best "plane" through 3D data. -model with 2 predictors -fits more flexible surface through data

"Proc Univariate"

Produces summary statistics, histograms, and tables for a quantitative variable, and can write the results to output data sets.

P-Value

Helps you determine the significance of your results. It is between 0 and 1. -If the p-value is less than alpha, you reject the null (original) hypothesis. If it is greater, you "fail to reject" the null hypothesis (this does not prove the null is true; the alternative could still be true).

Colon

Colon modifier: indicates that the value is to be read starting at the next nonblank column and ending at the next delimiter or the end of the dataline, using the informat that follows the colon.

Simple Linear Regression

Asks whether there is a linear relationship between x and y (the same question as correlation). -Between the RESPONSE (L) and the COVARIATE (syrup) Ex. Proc GLM data=mydata.cheese PLOTS = All; Model L = Syrup; Run; QUIT;

Contingency Tables

Main way to numerically summarize categorical data. -bar plot -comparative bar plot (multiple variables) -Proc SGplot

One-level vs. Two-level names

One-Level Names assume the data set is in the work library. Ex: Data = housedata Two-Level Names specify the library and the data set names. Ex: Data = sashelp.cars -in sashelp library and we're using the "cars" data set within it
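A short sketch of the two naming styles, assuming default settings where one-level names resolve to the Work library:

```sas
/* One-level name: the new data set is created as work.housedata */
data housedata;
    set sashelp.cars;     /* two-level name: library sashelp, data set cars */
run;

/* The same data set, referred to by its full two-level name */
proc print data=work.housedata (obs=5);
run;
```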

Operators

Operators specify arithmetic operations or when to keep or drop data. Ex: NE (not equal to), GT (greater than), LT (less than), GE (greater than or equal to), IN (in a list)...

Confidence Intervals

Provide a range of values that we are "confident" contains the true mean.

Reading External Data

Read in external files with data steps or proc steps. -different methods for TXT/DAT/CSV (comma separated values), XLS... -use an INFILE statement for EXTERNAL DATA ONLY

Inference

Relating a sample to a population.

Fisher

Requests confidence intervals for correlation and p-values under a specified null hypothesis. -includes Pearson / Spearman correlations
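Fisher is an option on the Proc CORR statement. A minimal sketch, assuming the sashelp.cars sample data set and an arbitrary pair of variables:

```sas
/* Pearson and Spearman correlations, each with Fisher confidence limits and p-values */
proc corr data=sashelp.cars pearson spearman fisher;
    var MSRP Horsepower;
run;
```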

"RespAsc" / "RespDesc"

RespAsc orders the bars by their response values in ascending order. RespDesc does descending order.

Content & Descriptor Portion of Data Sets

SAS data sets are composed of 2 pieces. Content Portion = a collection of variables on each record. -variables are stored as columns, records (observations) are stored as the rows Descriptor Portion = information about the data set. -number of observations -type of each variable -name / length of variables -format, informat, label...

"Nendpoints"

Specifies how many bin endpoints you want on a histogram.

Permanent Data Set

Stored in a SAS library created and named by you. -usable in current / future sas sessions -saved with a .sas7bdat extension -the library is assigned with a "Libname" statement Ex: Libname Mydata "C:\Users\Desktop";

Temporary Data Set

Stored in the SAS folder "Work Library." -usable only if current sas session is open -it is lost when the sas session closes

Formats

Tell SAS how to DISPLAY a variable's values in output (tables, charts, graphs). -assigned with a "Format" statement -a "Label" statement changes the displayed variable name instead

Class Statement

Tells SAS which variable in the data set specifies the 2 different populations. -the variable can only take on 2 values

F-Value

Tells if a GROUP of variables is jointly significant. -T-test is for ONE variable only.

"Firstobs"

Tells SAS which row the "First Observation" or record is on. -commonly firstobs=2 (to skip the header row)

Proc TTEST / Proc GLM

Test a mean from a normal population. -Proc GLM lets you specify any degree of interaction and nested effects -options = Class, Paired, By, Var, Freq, Weight...

Categorical Data

The values represent a category (M/F, Yes/No...) -attributes or labels -CHAR = character data (no numbers) -mathematical operations are not meaningful here

Informats

These tell SAS how to READ IN a variable. -character informats start with a $ sign ($13.) Ex: 123,456 = "comma7." (to say there is a comma and 7 characters, including the comma as one) 123,456.00 = "comma10.2" (10 characters total, with 2 numbers after the decimal) 01/04/98 = "ddmmyy8." (8 characters, can also have 10)
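A hedged sketch showing these informats in an Input statement, paired with formats for display. The data set, variable names, and values are made up:

```sas
data work.informat_demo;
    /* The colon lets each informat read up to the next blank */
    input Amount : comma10. Share : percent6. Visit : ddmmyy10.;
    format Amount comma10. Share percent8.1 Visit date9.;
    datalines;
123,456 45% 01/04/98
7,890 12.5% 15/12/2001
;
run;

proc print data=work.informat_demo;
run;
```

The informats control how the raw text is read; the Format statement only changes how the stored numbers are displayed.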

Quantitative Data

Uses numbers. Measures of Center: Mean, Median. Measures of Spread: Variance, St. Deviation, Inter-Quartile Range... Proc Univariate is for a single variable. Proc Corr (correlation) is for multiple variables. Ex: Proc Univariate data = sashelp.heart; Class sex; Var height weight; Histogram height; Run;

Spearman's Correlation Coefficient

Uses ranks of data points instead of the actual points. -sample mean is NOT robust to outliers -uses the ranks to determine correlation -values are LESS AFFECTED by outliers -Ex. VAR rankx ranky (this tells us it's Spearman w/o specifying it elsewhere in the statement)
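The link between ranks and Spearman's coefficient can be checked with a sketch like the one below, assuming the sashelp.cars sample data set; the rank variable names are hypothetical. The Pearson correlation of the ranks should agree with the Spearman correlation of the original values:

```sas
/* Replace each value with its rank (ties get the average rank by default) */
proc rank data=sashelp.cars out=work.cars_ranked;
    var MSRP Horsepower;
    ranks rank_msrp rank_hp;
run;

/* Pearson correlation computed on the ranks */
proc corr data=work.cars_ranked pearson;
    var rank_msrp rank_hp;
run;

/* Spearman correlation requested directly on the original values */
proc corr data=sashelp.cars spearman;
    var MSRP Horsepower;
run;
```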

