Lesson 9: Joining Tables in R

Ace your homework & exams now with Quizwiz!

Join on Specific Columns I

Generally, the orders data frame would not have the column order_id and the customers data frame would not have the column customer_id. One way that we could address this problem is to use the dplyr function rename() to rename the columns for our joins. In the example below, we will rename the column id in the customers data frame to customer_id, so that orders and customers now have a common column to join on. customers <- customers %>% rename(customer_id = id) inner_join(orders, customers)

Joining Tables

Matching a piece of data between two data sets.

Inner Join I

The inner_join() method in dplyr looks for columns that are common between two data frames and then looks for rows where those columns' values are the same. It then combines the matching rows into a single row in a new table. We can call the inner_join() method with two data frames like this: joined_df <- df_1 %>% inner_join(df_2) This will match up all of the customer information to the orders that each customer made.

Join on Specific Columns II

We can add the by argument when calling inner_join() to specify which columns we want to join on. In the example below, the "left" table is the one that comes first (orders), and the "right" table is the one that comes second (customers). This syntax says that we should match the customer_id from orders to the id in customers. This syntax says that we should match the customer_id from orders to the id in customers. orders %>% inner_join(customers, by = c('customer_id' = 'id')) We could use the following code to make the suffixes reflect the table names: orders %>% inner_join(customers, by = c('customer_id' = 'id'),

Inner Join II

We can use the pipe %>% to join multiple data frames together at once. The following command would join df_1 with df_2, and then join the resulting data frame with df_3: big_df <- df_1 %>% inner_join(df_2) %>% inner_join(df_3)

Full Join

When we join two data frames whose rows don't match perfectly, we lose the unmatched rows. If we wanted to combine the data from both companies without losing the customers who are missing from one of the tables, we could use a Full Join. A Full Join would include all rows from both tables, even if they don't match. Any missing values are filled in with NA. full_joined_dfs <- company_a %>% full_join(company_b)

Concatenate Data Frames

When we need to reconstruct a single data frame from multiple smaller data frames, we can use the dplyr bind_rows() method. This method only works if all of the columns are the same in all of the data frames. If we want to combine two data frames, we can use the following command: concatenated_dfs <- df1 %>% bind_rows(df_2)

Left Join

A Left Join includes all rows from the first (left) table, but only rows from the second (right) table that match the first table. For this command, the order of the arguments matters. If the first data frame is company_a and we do a left join, we'll only end up with rows that appear in company_a. By listing company_a first, we get all customers from Company A, and only customers from Company B who are also customers of Company A. left_joined_df <- company_a %>% left_join(company_b)

Right Join

A Right Join is the exact opposite of left join. Here, the joined table will include all rows from the second (right) table, but only rows from the first (left) table that match the second table. By listing company_a first and company_b second, we get all customers from Company B, and only customers from Company A who are also customers of Company B. right_joined_df <- company_a %>% right_join(company_b)


Related study sets

FHSU 2021 Graduate Seminar Study Guide

View Set

Financial Management Exam 1 chp 1

View Set

Carmen Homework 11 - Sampling Distributions

View Set