Chap 5
5. How would you create a plot to look at the distribution of Distance?
gf_histogram(~ Distance, data = BikeCommute)
9. If you create an empty model of TopSpeed, what would it mean to have an "empty model"?
The model would include only mean TopSpeed.
5. What would be true about the empty model for Fat?
The model would make the same prediction (the mean of Fat) for every person regardless of their values on other variables.
13. If the mean of TopSpeed is 33.6 and a given observation has a TopSpeed of 23.6, what is the residual?
-10
If you print the residuals for StudentSurvey.modelGPA, what will you see?
-For each participant in the study, the difference between his/her GPA and the mean GPA
What is the difference between a model of data and a model of the DGP?
-Our certainty about each model's accuracy is different. We can be certain about our sample model statistics, but never truly know the parameters in the model of the DGP. -What we want is a good model of the DGP. Because we can't develop one directly, we must instead develop a model based on sample data.
Why is the mean unbiased? (Check all that apply.)
-The errors in both directions from the mean are equal. -The sum of the differences on the left of the mean is always the same as that on the right of the mean. -The mean is not more influenced by bigger values than by smaller values.
How is the mean the "middle" of the distribution? (Check all that apply.)
-The mean is a balancing point for the error above and below the mean. -The deviations above the mean are equal to the absolute deviations below the mean.
How is the median the "middle of the distribution? (Check all that apply.)
-There are an equal number of data points above and below the median. -The number of values above the median are equal to the number of values below the median.
18. What notation can be used to represent the mean of the population?
-β0 -𝜇
17. What notation cannot be used to represent the mean of the sample?
-𝑌𝑖 -𝛽0 -𝜇
7. The mean of Alcohol is 3.279 per week. A particular patient consumes 2 drinks per week. Which of the following represents the residual for this patient under the empty model?
-𝑌𝑖 −𝑏0 -2-3.279 -𝑒𝑖
12. Consider the idea that DATA = MODEL + ERROR. If the mean of TopSpeed is 33.6 and a given observation has a TopSpeed of 23.6, what is the data?
23.6
11. If the mean for TopSpeed is 33.6, what will the empty model predict for each observation's TopSpeed?
33.6
8. Which of the following cannot be calculated from the NutritionStudy data set?
A parameter
When we use the mean as a model, why do we call it a "parameter estimate"?
Because we can't calculate the mean of the DGP, we must estimate it.
4. Which of the following word equations represents the hypothesis that smoking explains variation in fat consumption.
Fat consumption = Smoking + other stuff
3. The average Distance of this person's bike commute is just over 27 miles. Imagine that you've discovered that one of your observations has been recorded incorrectly. Instead of a distance of around 27 miles, the distance for one of the commute has been entered as 54 miles! You make the correction to your data frame. How will the correction affect the mean?
The mean will be lower because of the correction.
4. How might you expect this correction (changing 54 back to 27) to affect the mean and the median?
The median will be affected less than will the mean.
2. Let's say we wanted to write a word equation to explain the variation in the time it takes to bike to work. We think that the Distance of a person's commute is an important explanatory variable. What would the word equation look like?
Time = Distance + other stuff
14. Imagine you make three histograms: one for TopSpeed, one for the predicted values based on the empty model for TopSpeed, and one for the residuals. Which two distributions will have a similar shape?
TopSpeed and residuals
We run a study and calculate a parameter estimate, 𝑏0b0 = 25. We decide to run the study again, and again find a parameter estimate of 𝑏0b0 = 25. Based on these two studies, what do we know about the population parameter 𝛽0β0?
We know that 25 is our best estimate of 𝛽0β0, but we can never be sure of the true population parameter.
10. What R code would you use to fit the empty model for TopSpeed?
lm(TopSpeed ~ NULL, data = BikeCommute)
Describes a population
parameter
Describes a sample
statistic
Mean of a population
μ
Mean of a sample
𝑌⎯⎯⎯⎯
6. The mean of Alcohol is 3.279 drinks per week. A particular patient consumes 2 drinks per day. Which of the following symbols would be used to represent the value 2 in the notation of the General Linear Modal?
𝑌𝑖
Error around a sample model
𝑒𝑖
Error around a population model
𝜖𝑖