ST308 Quiz 3
geom_smooth()
Adds a trend line to a plot
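A minimal sketch, assuming ggplot2 is installed; the data frame `df` is simulated for illustration. `method = "lm"` asks for a straight-line fit. By default, geom_smooth() picks a smoother (loess or gam) based on the number of points.

```r
library(ggplot2)

# Simulated data for illustration: y is roughly linear in x
set.seed(308)
df <- data.frame(x = 1:20)
df$y <- 2 * df$x + rnorm(20)

# Scatter plot with a linear trend line and no confidence band
p <- ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
```

Printing `p` draws the plot.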
longer (more rows and fewer columns)
Analysis methods often prefer (longer/wider)?
.cols =
Argument in across() that specifies the columns you want to apply the function(s) to
.fns =
Argument in across() that specifies the function(s) you want to apply
use = "complete.obs"
Argument in cov() and cor() that drops any observation with an NA value before calculating
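A minimal base R sketch of `use = "complete.obs"`. The vectors `x` and `y` are hypothetical. Without the argument, one missing value makes the whole result NA.

```r
# Hypothetical data with one missing value in y
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, NA, 8, 10)

r_all  <- cor(x, y)                        # NA, because y has a missing value
r_comp <- cor(x, y, use = "complete.obs")  # drops observation 3, then computes
c_comp <- cov(x, y, use = "complete.obs")  # same idea for covariance
```

After dropping the incomplete observation, y is exactly 2 * x here, so the correlation is 1.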
sep
Argument in separate() and unite() that specifies the character(s) used to split a column or join columns
aes(x = ...)
Aesthetic mapping used to place our categories along the x-axis
values_to
Argument within pivot_longer() that gives the name of the new column holding the data values
names_to
Argument within pivot_longer() that gives the name(s) of the new column(s) created from the old column names
cols
Argument within pivot_longer() that specifies the columns to pivot into longer format
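A minimal sketch of the three pivot_longer() arguments, assuming tidyr is installed. The data frame `scores_wide` and its columns are hypothetical.

```r
library(tidyr)

# Hypothetical wide data: one row per student, one column per quiz
scores_wide <- data.frame(
  student = c("A", "B"),
  quiz1 = c(8, 9),
  quiz2 = c(7, 10)
)

scores_long <- pivot_longer(
  scores_wide,
  cols = c(quiz1, quiz2),  # columns to pivot
  names_to = "quiz",       # new column holding the old column names
  values_to = "score"      # new column holding the cell values
)
```

The 2 x 3 table becomes 4 rows (one per student-quiz pair) with columns `student`, `quiz`, and `score`.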
values_from
Argument within pivot_wider() that specifies the column(s) to get the cell values from
names_from
Argument within pivot_wider() that specifies the column(s) to get the output column names from
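A minimal sketch of pivot_wider(), assuming tidyr is installed. It widens hypothetical long data back to one row per student.

```r
library(tidyr)

# Hypothetical long data: one row per student-quiz combination
scores_long <- data.frame(
  student = c("A", "A", "B", "B"),
  quiz = c("quiz1", "quiz2", "quiz1", "quiz2"),
  score = c(8, 7, 9, 10)
)

scores_wide <- pivot_wider(
  scores_long,
  names_from = quiz,   # values of quiz become the new column names
  values_from = score  # cells are filled from score
)
```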
.fns = , .cols =
Arguments of the across() function
linear relationship
Covariance, correlation, etc. are measures of a
dplyr::across()
Function that makes it easy to apply the same summary (or other function) to multiple columns
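A minimal sketch, assuming dplyr 1.0 or later (which provides across() and where()). The data frame `df` is hypothetical. `.cols` picks the numeric columns and `.fns` applies a mean that ignores NA values.

```r
library(dplyr)

# Hypothetical data: one character column and two numeric columns
df <- data.frame(
  group = c("a", "a", "b"),
  x = c(1, 2, NA),
  y = c(4, 5, 6)
)

res <- df %>%
  summarise(across(.cols = where(is.numeric),          # only the numeric columns
                   .fns = ~ mean(.x, na.rm = TRUE)))   # mean of each, ignoring NA
```

With a single function, the output columns keep their original names (`x` and `y`).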
geom_jitter()
Function used to add jittered points (points with a little random noise to reduce overplotting), often on top of a box plot
geom_point()
Function used to create a scatter plot
geom_violin()
Function used to create a violin plot (similar to a box plot, but showing the shape of the distribution)
levels(x) <- c(...)
Function used to set (relabel) the levels of a factor (variable)
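A minimal base R sketch. The vector `surv` is hypothetical. The new labels are assigned in the order of the existing levels.

```r
# Hypothetical survival indicator stored as 0/1
surv <- c(0, 1, 1, 0, 1)

surv_f <- as.factor(surv)                 # factor with levels "0" and "1"
levels(surv_f) <- c("Died", "Survived")   # "0" -> "Died", "1" -> "Survived"
counts <- table(surv_f)                   # one-way table of the relabeled factor
```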
geom_boxplot()
Function from ggplot2 used to create a boxplot
geom_density()
Function from ggplot2 used to create a kernel smoother (smoothed version of a histogram)
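A minimal sketch combining a box plot with jittered points and a violin alternative, assuming ggplot2 is installed. The fares are hypothetical. `height = 0` keeps the jitter horizontal so the fare values themselves are not distorted.

```r
library(ggplot2)

# Hypothetical fares for three passenger classes
df <- data.frame(
  class = rep(c("1st", "2nd", "3rd"), each = 4),
  fare = c(80, 95, 110, 60, 20, 25, 30, 15, 8, 7, 9, 10)
)

p <- ggplot(df, aes(x = class, y = fare)) +
  geom_boxplot() +                                   # one box per class
  geom_jitter(width = 0.1, height = 0, alpha = 0.5)  # raw points, nudged sideways, semi-transparent

p_violin <- ggplot(df, aes(x = class, y = fare)) +
  geom_violin()                                      # density-shaped alternative to the box plot
```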
pivot_longer()
Function that lengthens data by increasing the number of rows and decreasing the number of columns
cor()
Function that returns the correlation value(s)
cov()
Function that returns the covariance value(s)
pivot_wider()
Function that widens data by increasing the number of columns and decreasing the number of rows
as.factor()
Function used to create a new factor version of a variable
table()
Function used to create a contingency table
geom_text()
Function used to add text labels to a plot
shape
Histograms, density plots, etc. describe the ______ of (numeric) data
Describe the relative frequency (or count) for each category
How do we describe the distribution of a categorical variable?
shape, measures of center, and measures of spread
How do we describe the distribution of a numeric variable?
center
Mean, median, etc. are measures of ______
position = "..."
Syntax for the position argument (e.g., in geom_bar())
contingency tables
Table that describes the relative frequency (or count) for each category (cat variables)
fill
Value of the position argument that stacks bars and standardizes each stack to the same height (showing proportions)
stack
Value of the position argument that stacks bars on top of each other (the default in geom_bar())
jitter
Value of the position argument that adds small random noise to points; useful for data with many points at the same values
dodge
Value of the position argument that places bars side by side (side-by-side bar plot)
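A minimal sketch of the bar positions, assuming ggplot2 is installed. The data frame `df` is hypothetical. Printing any of the three objects draws the plot.

```r
library(ggplot2)

# Hypothetical data: passenger class and survival status
df <- data.frame(
  class = c("1st", "1st", "2nd", "2nd", "3rd", "3rd"),
  survived = c("yes", "no", "yes", "no", "no", "no")
)

base <- ggplot(df, aes(x = class, fill = survived))

p_stack <- base + geom_bar(position = "stack")   # bars stacked on top of each other
p_dodge <- base + geom_bar(position = "dodge")   # side-by-side bars
p_fill  <- base + geom_bar(position = "fill") +  # each stack scaled to height 1
  labs(y = "Proportion", fill = "Survived")      # fill = "..." labels the legend
```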
Categorical (Qualitative) variable
Variable where entries are a label or attribute
Numeric (Quantitative) variable
Variable whose entries are numerical values on which math can be performed
spread
Variance, standard deviation, quartiles, IQR, etc. are measures of
cols, names_to, and values_to
What are the arguments of pivot_longer()?
names_from and values_from
What are the arguments of pivot_wider()?
sep
What argument is important to write out in the separate() and unite() functions?
fill = "..."
What do we write in labs() to label the legend for stacked bar plots (when fill is mapped to a variable)?
Q3 - Q1
What is IQR?
Shape and measures of linear relationship
What is used to describe the distribution of two numeric variables?
summarise(avg = mean(fare, na.rm = TRUE), med = median(fare, na.rm = TRUE), var = var(fare, na.rm = TRUE))
What do we write (after group_by()) to create variables from summaries of subgroups of the data: an avg variable with the mean for each subgroup, a med variable with the median for each subgroup, and a var variable with the variance for each subgroup?
[ , , ]
What indexing do we use to get conditional bivariate info (a two-way table at one level of the third variable) from a three-way contingency table?
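A minimal base R sketch. The data frame `df` is hypothetical. Leaving an index position blank keeps that whole dimension, so fixing one position gives a two-way table.

```r
# Hypothetical categorical data
df <- data.frame(
  class = c("1st", "1st", "2nd", "2nd", "3rd", "3rd"),
  sex = c("female", "male", "female", "male", "female", "male"),
  survived = c("yes", "no", "yes", "no", "no", "no")
)

tab3 <- table(df$class, df$sex, df$survived)  # three-way contingency table
survivors <- tab3[, , "yes"]   # class-by-sex table among those who survived
first <- tab3["1st", , ]       # sex-by-survived table for 1st class only
```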
group_by(var1, var2) %>% summarise(...)
What do we write to find summary values for subgroups based on two variables?
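A minimal sketch, assuming dplyr is installed. The data frame `tickets` is a small hypothetical stand-in for a data set with a `fare` column. `.groups = "drop"` returns an ungrouped result and silences dplyr's regrouping message.

```r
library(dplyr)

# Hypothetical ticket data
tickets <- data.frame(
  class = c("1st", "1st", "2nd", "2nd", "2nd"),
  sex = c("female", "male", "female", "female", "male"),
  fare = c(100, 80, 20, NA, 15)
)

# Summaries for each class (one grouping variable)
fare_summary <- tickets %>%
  group_by(class) %>%
  summarise(avg = mean(fare, na.rm = TRUE),
            med = median(fare, na.rm = TRUE),
            var = var(fare, na.rm = TRUE))

# Summaries for each class-sex combination (two grouping variables)
by_two <- tickets %>%
  group_by(class, sex) %>%
  summarise(avg = mean(fare, na.rm = TRUE), .groups = "drop")
```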
In the ggplot() function (applies to all layers) or inside an individual geom_*() layer
Where does the aes(x = ...) go?
na.rm = TRUE
Which argument is used in numeric summary functions (mean(), median(), var(), etc.) to remove NA values from the calculation?
aes()
Which function maps variables in the data frame to plot elements (aesthetics)?
if_else()
Which function is used to create a variable conditionally (a vectorized if/else)?
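A minimal sketch, assuming dplyr is installed. The `fare` vector and the cutoff of 25 are hypothetical. dplyr's if_else() requires both branches to be the same type, and `missing` sets the value used where the condition is NA.

```r
library(dplyr)

# Hypothetical fares, one of them missing
fare <- c(5, 30, NA, 80)

fare_level <- if_else(fare > 25, "high", "low", missing = "unknown")
```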
group_by()
Which function is used when creating summaries for groups?
summary()
Which function returns the Min, 1st Qu, Median, Mean, 3rd Qu, Max, and NA's?
Scatter plot
Which kind of plot is used to describe the shape of the distribution of two numeric variables?
tidyr
Which package is used to reshape data?
levels
______ define all possible values for the factor (variable)
probs =
Argument in quantile() that specifies which probabilities (between 0 and 1) to return quantiles for
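A minimal base R sketch connecting `probs =`, the IQR (Q3 - Q1), and summary(). The sample `x` is hypothetical. All three use R's default quantile type (type 7).

```r
# Hypothetical numeric sample (sorted for readability)
x <- c(2, 4, 4, 5, 7, 9, 10, 12)

quantile(x, probs = c(0.25, 0.5, 0.75))   # Q1, median, Q3
q <- quantile(x, probs = c(0.25, 0.75))
iqr_manual <- unname(q[2] - q[1])          # Q3 - Q1, which is 5.25 here
IQR(x)                                     # built-in helper, same value
summary(x)                                 # Min, 1st Qu., Median, Mean, 3rd Qu., Max
```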
alpha =
Argument used to specify transparency (0 = fully transparent, 1 = fully opaque)
labs()
Function used to label parts of a plot (title, axes, legends)
aes()
Function that defines the visual properties (aesthetics) of objects in the plot by mapping variables to them
coord_flip()
Function that flips the x and y axes of a plot (e.g., to make a horizontal bar plot)
label = paste()
Used (e.g., inside aes() for geom_text()) to build the label when adding text to a plot
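A minimal sketch, assuming ggplot2 is installed. The counts are hypothetical. paste() builds one label per row, and `vjust = -0.5` nudges the text just above each bar.

```r
library(ggplot2)

# Hypothetical pre-summarized counts
counts <- data.frame(class = c("1st", "2nd", "3rd"), n = c(3, 2, 5))

p <- ggplot(counts, aes(x = class, y = n)) +
  geom_col() +                                          # bars with heights taken from n
  geom_text(aes(label = paste("n =", n)), vjust = -0.5) # text label above each bar
```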
ggplot() + geom_bar() and ggplot() + stat_count()
Functions used to create bar plots
quantile()
Function that returns the quantile(s) for specified probabilities
where()
Function used inside the .cols argument of across() to select columns by a characteristic (e.g., where(is.numeric))
mutate()
Function used to add newly created column(s) to a data frame (returns a new data frame; the original isn't overwritten unless you reassign it)
unite()
Function used to combine two (or more) columns into one
tidyr::drop_na(var_name)
Function used to remove the rows that have NA values for a variable
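A minimal sketch, assuming tidyr is installed. The data frame `df` is hypothetical. Only rows with a missing `fare` are dropped.

```r
library(tidyr)

# Hypothetical data with one missing fare
df <- data.frame(name = c("a", "b", "c"), fare = c(10, NA, 30))

no_missing <- drop_na(df, fare)  # keeps only rows where fare is not NA
```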
separate()
Function used to separate one column into multiple columns
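A minimal sketch of separate() and unite() with `sep`, assuming tidyr is installed. The `date` column is hypothetical. Newer tidyr versions mark separate() as superseded, but it still works.

```r
library(tidyr)

# Hypothetical data with year and month stored in one string
df <- data.frame(id = 1:2, date = c("2023-01", "2024-11"))

# Split "date" at "-" into two character columns
split_df <- separate(df, col = date, into = c("year", "month"), sep = "-")

# Join them back into one column, this time with "/" between the parts
joined_df <- unite(split_df, col = "date", year, month, sep = "/")
```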
stat = "identity"
If you have summary data and don't want geom_bar()'s default stat = "count", specify y and use ______
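A minimal sketch, assuming ggplot2 is installed. The counts are hypothetical. geom_col() is ggplot2's shorthand for the same thing.

```r
library(ggplot2)

# Hypothetical pre-summarized counts
counts <- data.frame(class = c("1st", "2nd", "3rd"), n = c(3, 2, 5))

p <- ggplot(counts, aes(x = class, y = n)) +
  geom_bar(stat = "identity")  # bar heights come from n instead of counting rows
```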
ggplot2
Package used to create plots
distribution
Pattern and frequency with which you observe the values of a variable
factor
special class of vector with a levels attribute