stat 202 final

Which of the following is the correct syntax for "not equal to"?

!=

Fill in the blanks: library([blank]) credit_ch6 [blank] Credit [blank] as_tibble() %>% select(debt = Balance, [blank] = Limit, [blank] = Income, credit_rating = [blank], age = Age)

ISLR <- %>% credit_limit income Rating

Which of the following are correct interpretations of the confidence interval for the Lego sets minifigures example?

We are 95% confident that β1 falls between the values 8.38 and 12.70. We are 95% confident that the price increase associated with adding one additional minifigure to a Harry Potter set is between $8.38 and $12.70.

Which of the following summary statistics are included in the five-number summary and are used to construct a boxplot when there are no "outliers" in the data?

first quartile (Q1, 25th percentile), third quartile (Q3, 75th percentile), median, minimum, maximum
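
As a minimal sketch, base R's fivenum() returns these five values in the order minimum, Q1, median, Q3, maximum. The vector x below is hypothetical data made up for illustration, and fivenum() uses hinges, which can differ slightly from quantile() on some data.

```r
# Hypothetical data, chosen only for illustration
x <- c(2, 4, 4, 5, 7, 9, 10)

# fivenum() returns: minimum, lower hinge (Q1), median, upper hinge (Q3), maximum
five <- fivenum(x)
names(five) <- c("min", "Q1", "median", "Q3", "max")
five

# boxplot(x) would draw its box and whiskers from the same five values
```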

lm()

fit a linear regression model

Fill in the blanks: flights_with_airport_names <- [blank] %>% inner_[blank](airports, [blank] = c("dest" = "faa"))

flights join by
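
A small sketch of the same join pattern, assuming dplyr is installed. The two tiny tables (flights_small, airports_small) are hypothetical stand-ins for the real flights and airports data frames.

```r
library(dplyr)

# Hypothetical miniature versions of the flights and airports tables
flights_small <- tibble(
  flight = c(1, 2, 3),
  dest   = c("ORD", "LAX", "XXX")   # "XXX" has no match in airports_small
)
airports_small <- tibble(
  faa  = c("ORD", "LAX"),
  name = c("Chicago O'Hare", "Los Angeles Intl")
)

# Match flights_small$dest against airports_small$faa;
# inner_join() keeps only rows with a match in both tables
flights_named <- flights_small %>%
  inner_join(airports_small, by = c("dest" = "faa"))

flights_named
```

The unmatched "XXX" row is dropped, which is the defining behavior of an inner join.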

rbernoulli()

generates results of random coin flips

barplots

geom_bar or geom_col

boxplots

geom_boxplot()

When x refers to the country Germany, what is the value of the indicator function 1Amer(x)?

0

What is the standard deviation of the variable temp in the weather dataset?

17.8

How many columns and rows are in debt_model_data?

6 and 400

Which of the following help ensure research is reproducible?

Literate programming (i.e. code that is readable); well-documented data cleaning and analyses; using a workflow tool such as RMarkdown

What is the name of the dplyr function that allows you to sort the rows of a data frame by the alphanumeric order of a variable/column?

arrange()

Fill in the blanks of the code that creates a scatterplot of teaching score by beauty score and plots the regression line. ggplot(evals_ch5, aes(x = [blank], y = [blank])) + geom_point() + labs([blank] = "Beauty Score", [blank] = "Teaching Score", title = "Relationship of teaching and beauty scores") + geom_[blank](method = [blank])

bty_avg score x y smooth "lm"

geom_col()

categorical data that is pre-counted

geom_bar()

categorical data that is NOT pre-counted

each variable forms a

column

skimr

computing summary stats

drinks_smaller_tidy <- drinks_smaller %>% gather(key = type, value = servings, -country) Match each piece of the code to its role: gather, drinks_smaller, type, country, drinks_smaller_tidy

gather: function to wrangle the data; drinks_smaller: name of the un-tidy data frame; type: name of the variable in the new tidy data frame that contains the column names of the original un-tidy data frame; country: name of the column you don't want to tidy; drinks_smaller_tidy: name of the tidy data frame

histograms

geom_histogram

Which function allows you to create a new variable?

mutate()

each observation forms a

row

select()

selects columns of a data set

filter()

selects rows of a data frame

Which type of plot would be best to visualize differences in the distribution of life expectancy by continent?

side-by-side boxplots

Which of the following are visualization techniques that allow comparison of distributions of a numerical variable split or grouped by another variable? Select all that apply.

side-by-side boxplots, faceted histograms

What is the name of the phenomenon where relationships that exist in aggregate disappear or reverse when the data are broken into groups?

Simpson's paradox

arrange()

sort by a variable

What is the term used to describe data that is in the format required for analysis with the ggplot2 and dplyr packages?

tidy

What is the default number of bins R uses to build a histogram?

30

What is the preferred plot for visualizing a categorical variable?

barplot

select()

only keep desired columns

Confirmatory research typically starts with a

question

What is the gold standard method for understanding causality?

randomized experiment

sample_n()

randomly selects rows from a data set

Which function allows you to import a .csv file into R?

read_csv()

Which of the following are typically used to visualize the relationship between two numeric variables?

scatterplot linegraph

skim()

view summary information for each variable

explanatory variable

x

What is the correct syntax for the "pipe" operator?

%>%

What operator allows you to keep adding layers to a ggplot() object?

+ (the plus sign)

Match the following symbols with their appropriate value from the GSS proportions example: π, π̂, α, n, p-value

0.29 0.32 0.1 100 0.6

ggplot2

data visualization

select()

identify columns of a dataframe to keep

filter()

identify rows of a dataframe to keep

readr

import data files such as .csv

gapminder

includes datasets for analysis

nycflights13

includes datasets for analysis

View()

look at raw data

Which of the following arguments successfully removes missing values before computing a numerical summary (e.g. a mean, standard deviation, etc)?

na.rm = TRUE
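
A minimal base R illustration, using a hypothetical vector temps with one missing value:

```r
# Hypothetical temperatures with one missing entry
temps <- c(40, NA, 50)

# Without na.rm, the missing value propagates and the result is NA
mean_with_na <- mean(temps)

# With na.rm = TRUE, the NA is dropped before the mean is computed
mean_without_na <- mean(temps, na.rm = TRUE)

mean_with_na
mean_without_na
```

The same argument works in sd(), sum(), median(), and inside summarize() calls that use these functions.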

filter()

only keep rows that meet a criteria

each type of observational unit forms a

table

What is the name of the "umbrella" package that includes ggplot2, dplyr, readr, etc.?

tidyverse

Fill in the blanks: debt_model_data <- credit_ch6 [blank] select(debt, credit_limit, income) %>% [blank](debt_hat = [blank], residual = [blank]) %>% rownames_to_column("ID")

%>% mutate fitted(debt_model) residuals(debt_model)

Fill in the blanks of the code that creates a new data frame subsetted to include only hourly temperatures at Newark airport for January 1-15. early_january_weather <- weather [blank] filter([blank] == "EWR" & month == [blank] & [blank] <= 15)

%>% origin 1 day

What degrees of freedom are used to compute the p-value in the Tennessee STAR hypothesis test?

135

Which of the following graphs can be used to visualize a single variable? Select all that apply.

barplot boxplot histogram

linegraphs

geom_line()

What's the name of the online forum the authors mention with an extensive community of R users that you can consult for code troubleshooting?

Stack Overflow

independent variable

x

Exploratory research typically starts with

data

predictor variable

x

Which of the following lines of code will store the value 30 in the object x?

x <- 15 + 15 x <- 6*5 x = 30
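
A quick check that all three forms store the same value. The names a, b and c are used here only so the three results can be compared side by side.

```r
a <- 15 + 15   # assignment with <- of an arithmetic result
b <- 6 * 5     # another expression that evaluates to 30
c <- 30        # direct assignment
d = 30         # = also assigns at the top level, though <- is the usual R style

c(a, b, c, d)
```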

dependent variable

y

outcome variable

y

What is the default value for the na.rm argument in R?

FALSE or F

Which of the following can you conclude from the fact that the correlation coefficient between score and bty_avg is 0.187? Select all that apply.

The slope of the best fit line for score vs. bty_avg is positive. As bty_avg decreases, score also tends to decrease.
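
The link between the sign of the correlation and the sign of the slope can be checked on any data set. The sketch below uses R's built-in mtcars data (an assumption for illustration, not the evals data from the question) and the simple-regression identity slope = r * sd(y) / sd(x).

```r
# Built-in data used only for illustration: horsepower (hp) vs. weight (wt)
x <- mtcars$wt
y <- mtcars$hp

r     <- cor(x, y)                                  # correlation coefficient
slope <- coef(lm(hp ~ wt, data = mtcars))[["wt"]]   # fitted slope of hp on wt

# In simple linear regression the slope equals r * sd(y) / sd(x),
# so the slope always has the same sign as the correlation
slope_from_r <- r * sd(y) / sd(x)

c(r = r, slope = slope, slope_from_r = slope_from_r)
```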

summarize()

compute the mean of a variable

tidyr

converting data to tidy format

mutate()

create a new variable

What does any ONE row in this flights dataset refer to?

data on a flight

dplyr

data wrangling

What is the name of the primary package we will use for data wrangling?

dplyr

scatterplots

geom_point()

Which of the following are TRUE of a linear regression model?

it can include more than one explanatory variable, x an explanatory variable can be numerical or categorical

What is the name of the commonly-used modeling technique that is the focus of Chapter 5?

linear regression

Which of the following does ggplot() expect to be listed as the second argument by default?

mapping

According to the Tennessee STAR experiment, if small class size has no effect on student math scores, what is the probability that you would observe a difference in test scores greater than ±19.46 (i.e. ±2.49 standard errors)? Round to 3 decimal places, and give your answer as a probability between 0 and 1, not as a percentage.

0.014
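
A sketch of how this probability could be reproduced in base R, assuming a two-sided t-test and using the estimate (19.46), standard error (7.82) and degrees of freedom (135) quoted elsewhere in this set:

```r
estimate  <- 19.46   # estimated difference in math scores
std_error <- 7.82    # standard error of that estimate
df        <- 135     # degrees of freedom for the test

# Standardized distance of the estimate from the null value of 0
t_stat <- (estimate - 0) / std_error

# Two-sided p-value: probability of a result at least this extreme in either tail
p_value <- 2 * pt(-abs(t_stat), df = df)

round(t_stat, 2)
round(p_value, 3)
```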

Match the following values from the Tennessee STAR experiment to their appropriate interpretation.

19.46: the average difference in math scores between those in small vs. regular classes; 7.82: a measure of precision of the estimated treatment effect of class size on math scores; 2.49: how far away b1 is from the null value, in standardized (i.e. standard error) units; 1.4: the percentage of samples in which we would expect to see a difference in scores as large as we did, if there is no true effect of class size on math scores; 0.01: the false positive rate we are willing to tolerate; 0: the hypothesized value for the treatment effect

What is the margin of error for the confidence interval computed for the Lego sets example? Round to two decimal places. Hint: recall from Section 10.3 that the general formula for a confidence interval is Estimate ± Margin of Error. In other words, a confidence interval is computed by [Estimate - Margin of Error, Estimate + Margin of Error].

2.16
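
Because the interval is symmetric around the estimate, the margin of error is half the interval's width. A minimal sketch using the endpoints 8.38 and 12.70 from the Lego confidence interval quoted earlier in this set:

```r
lower <- 8.38    # lower endpoint of the 95% confidence interval
upper <- 12.70   # upper endpoint of the 95% confidence interval

estimate        <- (lower + upper) / 2   # midpoint of the interval
margin_of_error <- (upper - lower) / 2   # half the interval width

round(estimate, 2)
round(margin_of_error, 2)
```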

debt_model <- lm(debt ~ credit_limit + income, data = credit_ch6) How many quantities does this model estimate? That is, how many values will there be in the Estimate column of the regression table for this model?

3
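
The count comes from one intercept plus one slope per explanatory variable. The sketch below shows the same pattern on R's built-in mtcars data, which stands in for credit_ch6 here as an assumption for illustration.

```r
# Two explanatory variables, as in debt ~ credit_limit + income
example_model <- lm(mpg ~ wt + hp, data = mtcars)

# coef() returns the intercept plus one slope per explanatory variable
estimates <- coef(example_model)

names(estimates)
length(estimates)
```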

summary_temp <- weather %>% summarize(mean = mean(temp), std_dev = sd(temp)) Match each description to a piece of the code: assignment operator; name of the new summary data frame; pipe operator; name of a new summary variable; name of data frame to be summarized

<-, summary_temp, %>%, std_dev, weather

Which of the following arguments in geom_histogram() will make the inside (as opposed to the outline) of the bars blue?

fill = "blue"

Which line of code will successfully create a new data frame flights_500mi containing only flights that travel at least 500 miles?

flights_500mi <- flights %>% filter(distance >= 500)
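
The same filter() pattern on a tiny table, assuming dplyr is installed. The data frame flights_small is hypothetical and only mimics the distance column of flights.

```r
library(dplyr)

# Hypothetical miniature flights table
flights_small <- tibble(
  flight   = c(1, 2, 3, 4),
  distance = c(200, 500, 750, 499)
)

# Keep only rows where distance is at least 500 miles (>= includes exactly 500)
flights_small_500mi <- flights_small %>%
  filter(distance >= 500)

flights_small_500mi
```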

What is the name of the tidyr function that allows you to transform an un-tidy data frame in wide format into a tidy data frame in long format?

gather() or pivot_longer()
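
A short sketch of the wide-to-long conversion with pivot_longer(), assuming tidyr and dplyr are installed. The table drinks_wide and its values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one column per drink type
drinks_wide <- tibble(
  country = c("A", "B"),
  beer    = c(10, 20),
  wine    = c(1, 2)
)

# Stack every column except country into type/servings pairs
drinks_long <- drinks_wide %>%
  pivot_longer(cols = -country, names_to = "type", values_to = "servings")

drinks_long
```

Each country now appears once per drink type, so the long table has one row per country-type combination.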

Fill in the blanks of the code that created this plot. ggplot(data = weather, mapping = aes(x = temp)) + geom_[blank](color = "white") + [blank](~ month)

histogram facet_wrap

What is the name of the geometric layer that you would add to create a histogram when using ggplot2?

geom_histogram()

Fill in the blanks: ggplot(credit_ch6, aes(x = [blank], y = [blank])) + geom_[blank]() + [blank](x = [blank], y = [blank], [blank] = "Debt and income") + [blank](method = "lm", [blank])

income debt point labs "Income (in $1000)" "Credit card debt (in $)" title geom_smooth se = FALSE

Match the features of a boxplot with the information they display about the data: dots, whiskers, box, length of the box

dots: outliers; whiskers: lines extending from the box to points less than the 25th percentile or greater than the 75th percentile; box: 1st quartile, median, 3rd quartile (i.e. the middle 50% of the data); length of the box: interquartile range (i.e. a measure of the spread of the data)

What is a statistical graphic?

A statistical graphic is a mapping of data variables to aesthetic attributes of geometric objects.

summarize()

computes summary statistics

What is the name of the theoretical framework for data visualization that is the foundation of the ggplot2 package?

the grammar of graphics

