stats reading quizzes
Which of the following is the correct syntax for "not equal to"?
!=
What is the correct syntax for the "pipe" operator?
%>%
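A minimal sketch of what the pipe does, assuming the dplyr package is installed (it re-exports `%>%` from magrittr). The vector `x` is made up for illustration.

```r
library(dplyr)  # makes %>% available

x <- c(1, 4, 9)

# The pipe passes the left-hand side as the first argument of the
# function on the right, so these two lines compute the same value.
nested <- sum(sqrt(x))
piped  <- x %>% sqrt() %>% sum()
```

Reading left to right, the piped version says: take `x`, then take square roots, then sum.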
What operator allows you to keep adding layers to a ggplot() object?
+
box
1st quartile, median, 3rd quartile (i.e. the middle 50% of the data)
What is the default number of bins ggplot2's geom_histogram() uses to build a histogram?
30
What is the default value for the na.rm argument in R?
FALSE
Select the statements that are TRUE about the "+" sign when using ggplot().
Not using the "+" sign to add a geometric object will result in an empty plot. The "+" sign adds a layer to the plot.
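A short sketch of both statements, assuming the ggplot2 package is installed and using the built-in `mtcars` data as a stand-in dataset. The object names `empty` and `with_points` are hypothetical.

```r
library(ggplot2)

# ggplot() alone sets up data and axes but draws no geometric object
empty <- ggplot(mtcars, aes(x = wt, y = mpg))

# "+" adds a layer; alpha = 0.5 makes the points semi-transparent
with_points <- empty + geom_point(alpha = 0.5)

length(empty$layers)        # number of layers before adding a geom
length(with_points$layers)  # number of layers after adding a geom
```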
What is the name of the phenomenon where relationships that exist in aggregate disappear or reverse when the data are broken into groups?
Simpson's Paradox
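A toy illustration of the reversal, using made-up numbers rather than real data: within each group `y` falls as `x` rises, yet the pooled data show a positive relationship.

```r
# Hypothetical data: two groups, each with a perfectly negative trend
toy <- data.frame(
  group = rep(c("A", "B"), each = 3),
  x = c(1, 2, 3, 6, 7, 8),
  y = c(3, 2, 1, 8, 7, 6)
)

# Correlation ignoring groups (positive)
overall <- cor(toy$x, toy$y)

# Correlation computed separately within each group (negative in both)
by_group <- sapply(split(toy, toy$group), function(d) cor(d$x, d$y))
```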
What is the name of the aesthetic argument that allows you to change the transparency of a geometric object in ggplot()?
alpha
Which of the following are TRUE of a linear regression model?
an explanatory variable can be numerical or categorical; it can include more than one explanatory variable, x; it can include more than one outcome variable, y
What is the name of the dplyr function that allows you to sort the rows of a data frame by the alphanumeric order of a variable/column?
arrange()
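A small sketch of sorting rows, assuming dplyr is installed. The `scores` data frame is invented for illustration.

```r
library(dplyr)

scores <- data.frame(
  name  = c("cat", "ant", "bee"),
  score = c(2, 9, 5)
)

# Ascending alphabetical order by name
by_name <- scores %>% arrange(name)

# Descending numeric order by score, using desc()
by_score_desc <- scores %>% arrange(desc(score))
```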
Which of the following graphs can be used to visualize a single variable?
boxplot, barplot, histogram
Which line of code will successfully create a new data frame flights_500mi containing only flights that travel at least 500 miles?
flights_500mi <- flights %>% filter(distance >= 500)
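To try this line without the course dataset, here is a sketch that runs it on a tiny made-up `flights` data frame. The column `flight` and all values are hypothetical; only `distance` matters for the filter.

```r
library(dplyr)

# Stand-in for the real flights data (values invented)
flights <- data.frame(
  flight   = c(101, 202, 303),
  distance = c(200, 500, 1400)
)

# ">=" keeps flights of exactly 500 miles as well as longer ones
flights_500mi <- flights %>% filter(distance >= 500)
```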
Suppose we have a function (function_name) that takes two arguments (i.e. two inputs) named argument1 and argument2. What is the correct syntax to get this function to run in your console?
function_name(argument1, argument2)
What is the name of the tidyr function that allows you to transform an un-tidy data frame in wide format into a tidy data frame in long format?
gather()
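Note that gather() is provided by the tidyr package, and tidyr now marks it as superseded by pivot_longer(). A sketch of both on a made-up wide table (the column names `y2019` and `y2020` are hypothetical):

```r
library(tidyr)

wide <- data.frame(
  country = c("A", "B"),
  y2019   = c(10, 20),
  y2020   = c(11, 21)
)

# Older interface: every column except country is stacked
long_gather <- gather(wide, key = "year", value = "value", -country)

# Newer interface with the same result, up to row order
long_pivot <- pivot_longer(wide, cols = -country,
                           names_to = "year", values_to = "value")
```

Both results have one row per country-year pair, which is the long (tidy) layout.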
What does the term "argument" refer to in R?
input to a function
length
interquartile range (i.e. a measure of the spread of the data)
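The box edges and the box length can be checked numerically in base R. This sketch uses a made-up vector and R's default quantile method.

```r
x <- 1:9

# Box edges and middle line: 1st quartile, median, 3rd quartile
q <- quantile(x, probs = c(0.25, 0.5, 0.75))

# Box length: 3rd quartile minus 1st quartile
box_length <- IQR(x)
```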
Which of the following are typically used to visualize the relationship between two numeric variables?
linegraph, scatterplot
What is the name of the commonly-used modeling technique that is the focus of Chapter 5? (Hint: it's two words)
linear regression
Which function allows you to create a new variable?
mutate()
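A minimal sketch of mutate(), assuming dplyr is installed. The `trips` data frame and its column names are invented for illustration.

```r
library(dplyr)

trips <- data.frame(
  distance_mi = c(100, 250),
  hours       = c(2, 5)
)

# Add a new column computed from existing ones; other columns are kept
trips <- trips %>% mutate(speed_mph = distance_mi / hours)
```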
Which of the following arguments successfully removes missing values before computing a numerical summary (e.g. a mean, standard deviation, etc.)?
na.rm = TRUE
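A quick base R check of the difference, using a made-up vector with one missing value:

```r
x <- c(2, 4, NA)

# Default (na.rm = FALSE): any missing value makes the result NA
with_default <- mean(x)

# na.rm = TRUE drops the NA first, so the mean is taken over 2 and 4
with_na_rm <- mean(x, na.rm = TRUE)
```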
Rows correspond to
observations
Which function allows you to import a .csv file into R?
read_csv()
Which type of plot would be best to visualize differences in the distribution of life expectancy by continent?
side-by-side boxplots
Which of the following are examples of aesthetic attributes of geometric objects?
size, color, position (e.g. x and/or y coordinates), shape
What is the term used to describe data that is in the format required for analysis with the ggplot2 and dplyr packages?
tidy
Columns correspond to
variables