4087 Chapter 10

Réussis tes devoirs et examens dès maintenant avec Quizwiz!

Nonmatches

At least one single observation in one data set is unrelated to any observation from another data set based on the values of one or more selected variables.

Methods of Merging (Combining)

1) One-to-one reading a) Creates observations that contain all of the variables from each contributing dataset. b) Combines observations based on their relative position in each data set. c) Statement: Set 2) Concatenating a) Appends the observations from one data set to another. b) creates a new data set c) Statement: SET 3) Interleaving a) Interleaving intersperses observations from two or more data sets, base on one or more common variables. b) Statement: SET, BY 4) Appending a) Appending the data adds the observations in the second data set directly to the end of the original dataset b) Procedure: APPEND 5) Match-merging a) Matches observations from two or more datasets into a single observation in a new data set according to the values of a common variable. b) Statements: Merge, BY

One-to-Many or Many-to-One

A single observation in one data set is related to more than one observation from another data set based on the values of one or more selected variables and vise versa.

One to one

A single observation in one data set is related to one and only one observation form another data set based on the values of one or more selected variables.

Concatenating (explained)

All the observations are read from the first data set listed in the SET statement. then all of the observations are read from the second data set listed and so on. The new data set contains all of the variables and observations from all of the input data sets. (Creates new data set)

Merge Procedure Steps (continued)

Before reading additional observations during a match-merge, SAS first determines whether there are observation remaining for the current By group

Concatenating (continued)

By default, a compile-time error occurs if the same variable is not the same type in all SAS data sets in the SET statement.

The IN= Data Set Option

Creates a variable that indicates whether the data set contributed data to the current observation. The variable created with the IN= data set option is temporary (in the PDV). Therefore, the variable is only available during the execution phase and is not written to the SAS data set. Ex: Merge EmpsAU (inEmps) PhoneC (in=Cell); 0 indicates that the data set did not contribute to the current observation 1 otherwise. Only available during execution and are not written to the SAS data set.

Outputting to Multiple Data Sets

Data EmpsAUC EmpsOnly PhoneOnly; merge EmpsAU(in=Emps) PhoneC(in=Cell); by EmpID; if Emps=1 and Cell=1 then output EmpsAUC; else if Emps=1 and Cell=0 then output EmpsOnly; else if.... run;

One to One Reading (Code Example)

Data clinic.one2one; set clinic patients; if age<60; set clinic measure; run; (Subsetting observations from the first data set and combining them with the observations from the second data set. Outputs the number of rows that the smaller data set of the two contains.

Concatenating (code example)

Data concat; set a c; run; both data sets contain two variables and three rows. Both contain the variable num. If there is a common variable between the two data sets, SAS assigns that variable the attributes of the first dataset. In this case it would be dataset 'a'

Appending (explained)

Does not create a new data set. Takes data from one table and 'appends' it to the bottom of a 'master' data set.

Quiz 10.11

How many observations in the final data set EmpsAUC are considered nonmatches? - 2 (Rows with missing spaces)

Quiz 10.2

How many observations will be in Emps after appending the three data sets? (Base table has 3, and the three tables that are being appended have 2) - 3+2+2+2=9

Quiz 10.4

How many observations will be in Emps if the program is submitted a second time? 9+2+2+2 - 15

Quiz 10.05

How many variables will be in EmpsAll2 after concatenating EmpsCN and EmpsJP? (Both data sets have 3 but two don't match) - 4 variables

Merge Procedure Steps (continued)

If there are observations remaining for the current BY group, they are read into the PDV, processed, and written to the output data set.

Interleaving (Go to page 10-35)

Interleaving intersperses observations from two or more data sets, base on one or more common variables. b) Statement: SET, BY

Quiz 10.3

It ask how many variables will be in Emps after appending the three data sets - The base data set has three so three

Match-Merging (explained)

Match merging combines observations from two ore more SAS data sets into a single observation in a new data set based on the values of one or more common variables.

IN= (Code example)

Matches Only data EmpsAUC; merge EmpsAU (in=Emps) PhoneC (in=Cell); by EmpID; if Emps=1 and Cell=1; run; 'if' sentence can also be written as 'if Emps and Cell;'

IN= (Code example)

Non Matches Only data EmpsAUC; merge EmpsAU (in=Emps) PhoneC (in=Cell); by EmpID; if Emps=0 and Cell=0; run; 'if' sentence can also be as 'if not Emps and not Cell;'

Append (requirements)

Only two data steps can be uses at a time in one step. The observations in the base data set are not read. The variable information in the descriptor portion of the base data set cannot change.

Merging Data Sets One-to-One (explained)

Prepare datasets for merging using the SORT procedure Merge SAS data sets one-to-one based on a common variable by using the MERGE and BY statements in a DATA step Observations can be merged based on their positions in the original datasets or merged by one or more common variables.

Sort before merge (code)

Proc sort data=work.EmpsAU; by descending First descending Last; run; The variables in the by statement must be common to all data sets.

Merge Procedure Steps

SAS re initializes variables in the PDV at the start of every DATA step iteration. Variables created by an assignment statement are reset to missing, but variables that are read with a MERGE statement are not reset to missing.

Concatenating (continues)

SAS re initializes variables in the PDV at the start of every DATA step iteration. Variables created by assignment statements are reset to missing, but variables that are read with a SET statement are not reset to missing until the input SAS data set changes

Eliminating Duplicates with Sort Procedure

The NODUPKEY option deletes observations with duplicate BY values The EQUALS option maintains the relative order of the observations within the input data set in the output data set for observations with identical BY values Ex: Proc sort data=EmpsDUP out=EmpsDUP1 nodupkey equals; by EmpID; run;

Append vs. Set Statement

The data set that results from concatenating two data sets with the SET statement is the same data set that results from concatenating them with the append procedure if the two data sets contain the same variables. The APPEND procedure concatenates much faster than the SET statement because the APPEND procedure does not process the observations from the BASE=data set. The two methods are significantly different when the variables differ between data sets. SET uses all variables and assigns missing values where appropriate.

Rename Statement

Use this when concatenating and variables names do not match up data empsALL; set empsCN (rename = (country=region)) empsJP; run;

Quiz 10.10

What are the modified, completed statements? - Data work.payadd; merge work.payroll work.addresses; by Employee_ID; run;

Quiz 10.12

What are the values of Emps and Cell? Data EmpsAUC; merge EmpsAU (in=Emps) PhoneC (in=Cell) ; run; -The PDV shows a value that is only in Phone C so Emps = 0 and Cell = 1

Quiz 10.1

Which method (appending, concatenating, or merging) should be used for the given business scenario? The JanSales, FebSales, and MarSales data sets need to be combined to create the Qtr1Sales data set. - Concatenating (New data set) The Sales data set needs to be combined with the Target data set by month to compare the sales data to the target data - Merging (join on month) The OctSales data set needs to be added to the YTD data set - Append (adding one to another)

Quiz 10.07

Which method would you use if you wanted to create a new variable at the time of concatenation? -SET Statement Data EmpsBonus; Set empsDK empsFR; if Country='Denmark' then Bonus=300; else Bonus=500; run;

Quiz 10.06

Which statement has correct syntax? -Set EmpsCN(rename=(Country=Location)) EmpsJP(rename=(Region=Location);

Quiz 10.08

Which step is sorting the observations in a SAS data set and overwriting the same SAS data set? - Proc sort data=work.EmpsAU by First; run;

Quiz 10.13

Which subsetting IF statement can be added to the Data step to only output the matches? -If Emps=1 and Cell=1;

Appending (code example)

proc append base=A data=B; run; BASE = names the data set to which observations are added. DATA = names the data set containing observations that are added to the base data set.

Appending (use FORCE options)

when appending data sets and there is a variable in the data that is not in the base, you have to use the force option. Example: Proc append base = emps data = emps2010 force; emps2010 had a column that emps did not, but that extra column from emps2010 is not added, just the similar columns.


Ensembles d'études connexes

Abnormal Psych: Schizophrenia and other psychotic disorders

View Set

Completa este párrafo con las preposiciones Por o Para (Pg. 384)

View Set