Chapter 5


What is the difference between a model of data and a model of the DGP?

Both of the above are differences.

Mean of a sample

Ȳ

Error around a sample model

eᵢ

Mean of a population

𝜇

Error around a population model

ϵᵢ

When we use the mean as a model, why do we call it a "parameter estimate"?

Because we can't calculate the mean of the DGP, we must estimate it.

# get the favstats for Thumb
# get the favstats for Residuals
# if you decide to save them, make sure to print them out

favstats(~Thumb, data = TinyFingers)
favstats(~Residual, data = TinyFingers)

When we ran this R code to fit the empty model to our data for Thumb length from the full Fingers data set, what was the number 60.1? (Check all that apply.)

A statistic
A parameter estimate

How is the mean the "middle" of the distribution? (Check all that apply.)

D) The mean balances the amount of error below and above it.
E) The deviations below and above the mean always sum to 0.
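The balancing property can be checked directly in R. The following is a minimal sketch in base R, using the five numbers from the outcome exercise in this set (5, 5, 5, 10, 20); the names deviations, below, and above are chosen here for illustration.

```r
# Five example scores (the same numbers used in the outcome exercise)
outcome <- c(5, 5, 5, 10, 20)

# Deviation of each score from the mean
deviations <- outcome - mean(outcome)

# Total deviation below the mean and total deviation above the mean
below <- sum(deviations[deviations < 0])
above <- sum(deviations[deviations > 0])

print(deviations)
print(c(below = below, above = above))

# The negative and positive deviations cancel out
print(sum(deviations))
```

The totals below and above the mean have the same size and opposite signs, which is why all the deviations together sum to 0.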

# modify this to fit the empty model of Thumb
Empty.model <-
# this prints the best-fitting number
Empty.model
# save the favstats for Thumb (this is helpful for drawing a line)
Thumb.stats <-
# make a histogram of Thumb and draw the line for the mean
gf_histogram() %>%
  gf_vline(xintercept = )

Empty.model <- lm(Thumb ~ NULL, data = Fingers)
Empty.model
Thumb.stats <- favstats(~Thumb, data = Fingers)
gf_histogram(~Thumb, data = Fingers) %>%
  gf_vline(xintercept = ~mean, data = Thumb.stats)

# this code from before fits the empty model for Fingers
Empty.model <- lm(Thumb ~ NULL, data = Fingers)
# generate predictions from the Empty.model
Fingers$Predicted <-
# generate residuals from the Empty.model
Fingers$Residual <-
# this prints out 10 lines of Fingers
head(select(Fingers, Thumb, Predicted, Residual), 10)

Empty.model <- lm(Thumb ~ NULL, data = Fingers)
Fingers$Predicted <- predict(Empty.model)
Fingers$Residual <- resid(Empty.model)
head(select(Fingers, Thumb, Predicted, Residual), 10)
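As a rough check on what predict() and resid() return for the empty model, the sketch below uses a small made-up data frame and base R only. ToyFingers, Toy.model, and the six Thumb values are hypothetical and are not the course data.

```r
# Hypothetical data: six made-up thumb lengths in mm (not the course data)
ToyFingers <- data.frame(Thumb = c(56, 60, 61, 63, 64, 68))

# Fit the empty model: a single parameter estimate, the mean
Toy.model <- lm(Thumb ~ NULL, data = ToyFingers)

# Every prediction from the empty model is the mean of Thumb
ToyFingers$Predicted <- predict(Toy.model)

# Residual = observed value minus predicted value
ToyFingers$Residual <- resid(Toy.model)

print(ToyFingers)

# resid() agrees with subtracting by hand (up to floating-point rounding)
by.hand <- ToyFingers$Thumb - ToyFingers$Predicted
print(all.equal(as.numeric(ToyFingers$Residual), as.numeric(by.hand)))
```

Every row gets the same predicted value because the empty model has no explanatory variable; only the residuals differ from person to person.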

How is the median the "middle" of the distribution? (Check all that apply.)

If all data points in the distribution are arranged in order, there are an equal number of data points below and above the median.

Look at the printout of thumb lengths and predicted thumb lengths for each student. Which students' actual scores were closest to those predicted by the model? (Check all that apply.)

Students 3 and 4

Examine the equation above. Which verbal statement best describes the meaning of the equation?

The sum of the deviations of each person i's score, from 1 to n, is equal to 0.
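The equation this card refers to is not reproduced in the set. Judging from the answer, it is presumably the sum-of-deviations identity. Assuming the usual notation (Y_i for person i's score, Ȳ for the sample mean, n for the number of people), it would read:

```latex
\sum_{i=1}^{n} \left( Y_i - \bar{Y} \right) = 0
```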

# modify this to save the predictions from the TinyEmpty.model
TinyFingers$Predicted <-
# this prints TinyFingers
TinyFingers

TinyFingers$Predicted <- predict(TinyEmpty.model)
TinyFingers

# modify this to save the residuals from the TinyEmpty.model
TinyFingers$Residual <-
# this prints TinyFingers
TinyFingers

TinyFingers$Residual <- TinyFingers$Thumb - TinyFingers$Predicted
TinyFingers

# modify this to save the residuals from the TinyEmpty.model (calculate them the easy way)
TinyFingers$easyResidual <-
# this prints TinyFingers
TinyFingers

TinyFingers$easyResidual <- resid(TinyEmpty.model)
TinyFingers

The whole data set is just six observations. Make a histogram of the distribution of six thumb lengths (Thumb). Add in a blue line to show where the mean is.
# modify this to save favstats for Thumb length
TinyThumb.stats <-
# modify this to draw a vline representing the mean in "blue"
gf_histogram(~Thumb, data = TinyFingers) %>%
  gf_vline()

TinyThumb.stats <- favstats(~Thumb, data = TinyFingers)
gf_histogram(~Thumb, data = TinyFingers) %>%
  gf_vline(xintercept = ~mean, color = "blue", data = TinyThumb.stats)

We run a study and calculate a parameter estimate, b0 = 25. We decide to run the study again, and again find a parameter estimate of b0 = 25. Based on these two studies, what do we know about the population parameter β0?

We know that 25 is our best estimate of β0, but we can never be sure of the true population parameter.
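A small simulation can illustrate why repeated estimates never pin down β0 exactly. In the sketch below, the population mean of 60, the standard deviation of 9, and the sample size of 100 are invented for illustration (in a real study β0 is unknown), and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed so the run is repeatable

beta0 <- 60  # assumed "true" population mean (unknown in practice)
sigma <- 9   # assumed population standard deviation

# Draw two independent samples of 100 from the same simulated DGP
sample1 <- data.frame(Thumb = rnorm(100, mean = beta0, sd = sigma))
sample2 <- data.frame(Thumb = rnorm(100, mean = beta0, sd = sigma))

# b0 is the estimate from each sample's empty model
b0.first  <- unname(coef(lm(Thumb ~ NULL, data = sample1)))
b0.second <- unname(coef(lm(Thumb ~ NULL, data = sample2)))

print(c(b0.first, b0.second))
```

Each b0 should land near 60, but neither is expected to equal it exactly, and the two estimates will generally differ from each other. Even when two studies happen to agree, the estimate is still only an estimate of β0.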

Should 60.1 be represented in the GLM notation as b0 or β0?

b0

# modify this code to make a histogram of Age in the MindsetMatters data frame
gf_histogram(~ , data = MindsetMatters)
# save the favstats for Age to Age.stats
# print out the contents of Age.stats

gf_histogram(~Age, data = MindsetMatters)
Age.stats <- favstats(~Age, data = MindsetMatters)
Age.stats

For each of these variables, make histograms and get the favstats(). For each distribution, which do you think is the better one-number model: the median or the mean?
# modify this code to make a histogram of GradePredict
gf_histogram(~ , data = Fingers)
# save the favstats for GradePredict
GradePredict.stats <-
# this code will print out the favstats
GradePredict.stats

gf_histogram(~GradePredict, data = Fingers)
GradePredict.stats <- favstats(~GradePredict, data = Fingers)
GradePredict.stats

# modify this code to make a histogram of Thumb
gf_histogram(~ , data = Fingers)
# save the favstats for Thumb
Thumb.stats <-
# write code to print out the favstats

gf_histogram(~Thumb, data = Fingers)
Thumb.stats <- favstats(~Thumb, data = Fingers)
Thumb.stats

# make a histogram for Thumb
# make a histogram for Predicted
# make a histogram for Residual

gf_histogram(~Thumb, data = Fingers)
gf_histogram(~Predicted, data = Fingers)
gf_histogram(~Residual, data = Fingers)

Try modifying this code to draw a green line for the median using the favstats you've saved in outcome.stats.
# We saved the favstats for outcome to outcome.stats
# Modify this to draw a vline representing the median in "green"
gf_histogram(~outcome, data = tinydata) %>%
  gf_vline(xintercept = ~mean, color = "blue", data = outcome.stats)

gf_histogram(~outcome, data = tinydata) %>%
  gf_vline(xintercept = ~median, color = "green", data = outcome.stats)

In R, these and other statistics are very easy to find with the function favstats(). Create a variable called outcome and put in these numbers: 5, 5, 5, 10, 20. Then put that variable into a data frame called tinydata. Finally, run the favstats() function on the variable outcome.
# Modify this line to save the numbers to outcome
outcome <- c()
# Put outcome into the tinydata data frame
tinydata <- data.frame()
# This will give you the favstats for outcome
favstats(~outcome, data = tinydata)

outcome <- c(5, 5, 5, 10, 20)
tinydata <- data.frame(outcome)
favstats(~outcome, data = tinydata)

Once we have outcome.stats saved, which of these lines of code would return the median?

outcome.stats$median
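If the mosaic package (which provides favstats()) is not installed, a few of the same summary values can be computed with base R and pulled out with $ in the same way. This is a sketch of an alternative, not the course's method; base.stats is a hypothetical name, and it covers only some of the columns favstats() reports.

```r
outcome <- c(5, 5, 5, 10, 20)
tinydata <- data.frame(outcome)

# Base R stand-ins for some of the values favstats() reports
base.stats <- data.frame(
  min    = min(tinydata$outcome),
  median = median(tinydata$outcome),
  max    = max(tinydata$outcome),
  mean   = mean(tinydata$outcome),
  sd     = sd(tinydata$outcome),
  n      = length(tinydata$outcome)
)

print(base.stats)

# Pull out a single statistic by column name
print(base.stats$median)
```

For these five numbers the median and the mean are different values, because the single large score of 20 pulls the mean upward while leaving the median alone.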

Describes a population

parameter

Describes a sample

statistic
