LSUS BADM790 Module 4

group_by()

Add a grouping structure to the rows of a data frame. Note that this does not change the values in the data frame.

arrange()

Arrange the rows of a data frame by a variable, in ascending (default) or descending order

mutate()

Create new variables by mutating existing ones

4. Is a dataset transformed when only using the group_by() function?

It is important to note that the group_by() function doesn't change data frames by itself. Rather it changes the meta-data, or data about the data, specifically the grouping structure. It is only after we apply the summarize() function that the data frame changes.
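The point above can be checked with a minimal sketch. The toy data frame and its values are hypothetical (they are not from the course materials); only the dplyr functions are real.

```r
library(dplyr)

# Hypothetical toy data, made up for illustration only
toy <- tibble(
  origin    = c("EWR", "EWR", "JFK", "LGA"),
  dep_delay = c(10, 20, 5, 7)
)

grouped <- toy %>% group_by(origin)

# The values are untouched by group_by() ...
identical(grouped$dep_delay, toy$dep_delay)

# ... only the grouping metadata differs
group_vars(toy)      # no grouping variables
group_vars(grouped)  # grouped by origin

# The data frame itself only changes once summarize() is applied
by_origin <- grouped %>%
  summarize(mean_delay = mean(dep_delay))
by_origin
```

Here by_origin has one row per origin airport, while grouped still has the same four rows as toy.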

filter()

Pick out a subset of rows

summarize()

Summarize many values to one using a summary statistic function like mean(), median(), etc.

dplyr arrange

arrange() its rows. For example, sort the rows of weather in ascending or descending order of temp.

freq_dest %>%
  arrange(num_flights)

arrange() returns rows sorted in ascending order by default. To switch to descending order instead, wrap the variable in the desc() function:

freq_dest %>%
  arrange(desc(num_flights))

1. Know when to use the different dplyr verbs (filter, arrange, group_by, mutate).

filter(): Pick out a subset of rows.
summarize(): Summarize many values to one using a summary statistic function like mean(), median(), etc.
group_by(): Add a grouping structure to the rows of a data frame. Note that this does not change the values in the data frame.
mutate(): Create new variables by mutating existing ones.
arrange(): Arrange the rows of a data frame by a variable, in ascending (default) or descending order.
inner_join(): Join/merge two data frames, matching rows by a key variable.

In more detail, given a data frame you can:

filter() its existing rows to pick out only a subset of them. For example, the alaska_flights data frame is a filtered subset of flights.

summarize() one of its columns/variables with a summary statistic. Examples of summary statistics include the median and interquartile range of temperatures, as we saw in Section 2.7 on boxplots. Note there is a subtle but important difference between sum() and n(): sum() returns the sum of a numerical variable, while n() returns a count of the number of rows/observations.

group_by() its rows. In other words, assign different rows to be part of the same group. We can then combine group_by() with summarize() to report summary statistics for each group separately. For example, say you don't want a single overall average departure delay dep_delay for all three origin airports combined, but rather three separate average departure delays, one for each of the three origin airports.

mutate() its existing columns/variables to create new ones. For example, convert hourly temperature recordings from degrees Fahrenheit to degrees Celsius.

arrange() its rows. For example, sort the rows of weather in ascending or descending order of temp.

join() it with another data frame by matching along a "key" variable. In other words, merge these two data frames together.
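The difference between sum() and n() is easiest to see side by side. The sketch below uses a hypothetical toy data frame; the column names seats and origin and all values are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical toy data, made up for illustration only
toy <- tibble(
  origin = c("EWR", "EWR", "JFK"),
  seats  = c(100, 150, 200)
)

counts <- toy %>%
  group_by(origin) %>%
  summarize(
    total_seats = sum(seats),  # adds up the values of a numerical variable
    num_flights = n()          # counts the rows in each group
  )
counts
```

For EWR, total_seats adds 100 and 150, while num_flights simply counts the two EWR rows.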

dplyr filter

filter() a data frame's existing rows to only pick out a subset of them. For example, the alaska_flights data frame.

portland_flights <- flights %>%
  filter(dest == "PDX")
View(portland_flights)

btv_sea_flights_fall <- flights %>%
  filter(origin == "JFK" & (dest == "BTV" | dest == "SEA") & month >= 10)
View(btv_sea_flights_fall)

dplyr group_by

group_by() its rows. In other words, assign different rows to be part of the same group. We can then combine group_by() with summarize() to report summary statistics for each group separately. For example, say you don't want a single overall average departure delay dep_delay for all three origin airports combined, but rather three separate average departure delays, one for each of the three origin airports.

summary_monthly_temp <- weather %>%
  group_by(month) %>%
  summarize(mean = mean(temp, na.rm = TRUE),
            std_dev = sd(temp, na.rm = TRUE))
summary_monthly_temp

To remove the grouping structure again, use ungroup():

diamonds %>%
  group_by(cut) %>%
  ungroup()

If you want to group_by() two or more variables, include all of them in the same group_by() call, separated by commas.
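A minimal sketch of grouping by two variables at once follows. The toy data frame is hypothetical, and the .groups = "drop" argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data, made up for illustration only
toy <- tibble(
  origin = c("EWR", "EWR", "EWR", "JFK"),
  month  = c(1, 1, 2, 1)
)

# Both variables go in the same group_by() call, separated by a comma
by_origin_monthly <- toy %>%
  group_by(origin, month) %>%
  summarize(count = n(), .groups = "drop")  # .groups assumes dplyr >= 1.0.0
by_origin_monthly
```

The result has one row per (origin, month) combination that appears in the data. Chaining two separate group_by() calls would not give this result, because by default the second call replaces the first grouping.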

dplyr mutate

mutate() its existing columns/variables to create new ones. For example, convert hourly temperature recordings from degrees Fahrenheit to degrees Celsius.

weather <- weather %>%
  mutate(temp_in_C = (temp - 32) / 1.8)

flights <- flights %>%
  mutate(gain = dep_delay - arr_delay)

flights <- flights %>%
  mutate(
    gain = dep_delay - arr_delay,
    hours = air_time / 60,
    gain_per_hour = gain / hours
  )

