Chapter 7

in the empty model

b0 is the Grand Mean, which is also the predicted score Y^ for every person

second interpretation of 𝑏0+𝑏1𝑋𝑖

b0 represents the mean for females

How would Xi be coded in order for this model to make sense?

0 for short and 1 for tall

Why do all three females have the same score (59) for Sex.predicted?

59 is Sex.model's prediction for any female's thumb length

Why does everyone have the same score for Empty.pred?

62 is TinyEmpty.model's prediction for every person's thumb length.

What is the F ratio?

A sample statistic

Why don't all three females have the same residual for Sex.resid?

All the females in the data frame had different thumb lengths, so the model is wrong for each of these females in a different way.

Reducing Error

Although we can improve model fit by adding parameters to a model, there is always a trade-off involved between reducing error (by adding more parameters to a model), on one hand, and increasing the intelligibility, simplicity, and elegance of a model, on the other.

Xi

Because X represents the variable Sex and different people have different values of Sex. Because "sub i" is for variables (like Thumb and Sex and error) that have different values for different individuals. Because the parameters are the same for every single person.

Above, in the histogram of the residuals (in sky blue), why are the means of Sex.resid for the two groups equal to 0?

Because even the mean of a bunch of residuals is still a mean. Means always balance the residuals. Adding them up will equal 0.

Why would we be unlikely to get the same F ratio?

Because the F ratio is a statistic, just like the mean or PRE. A different sample will likely result in a different statistic.

Why do you think the sum of squares for the empty model is greater than that for the Sex model?

Because the Sex model explains more variation, there is less leftover variation. Because the empty model explains less variation, there is more leftover variation.

In this supernova table, why is the df for the row that says "Total" equal to 156?

Because the empty model uses 1 degree of freedom and we started with 157 data points.

Why is SS total the same for both models?

Because the outcome variable is Thumb and the empty model of Thumb (the Grand Mean) is the same, so the total error from the empty model is the same. Because we are using the same data set (Fingers) and any models built from any of the variables in Fingers will have the same SS total.

Above, in the histogram of the residuals (in sky blue), why are the means of Sex.resid for the two groups not different any more?

Because the residuals are what is left over after subtracting the means of the two Sex groups from the original thumb lengths. Because the residuals are what is left over after taking out the effects of Sex. Because the residuals are what is left over after accounting for variation in Sex. Because the residuals are what is left over after modeling the data with Sex.model.

Why is Sex.resid equal to -3?

Because this is the value of 56 - 59. Because this is the value of actual thumb length minus predicted thumb length for this person.

Why do we leave out the error term when trying to predict?

Because we do not know how far off the new observation is from our model. We can only find out error in hindsight. Error is whatever is left over from the model. When we use our model to predict, we are focused on using the model. Because every person will deviate a bit from the model in different ways.

Compare the ANOVA tables for the Height3Group and Height2Group models. Why is the total SS the same for both models?

Both have the same outcome variable.

What does the resid() function in R do?

Computes the residuals for all the observations in the data frame. Takes the model and computes all the residuals, or leftover variation, from that model.
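
The card above can be illustrated with a minimal R sketch. The six-row data frame is a hypothetical stand-in for TinyFingers, with values assumed so that they reproduce the numbers quoted in these cards (59 for females, 62 for the empty model, and a residual of -3); the column names follow the ones the cards mention.

```r
# Hypothetical six-row stand-in for TinyFingers (values are an assumption,
# chosen to match the 59, 62, and -3 quoted in these cards).
TinyFingers <- data.frame(
  Sex   = factor(c("female", "female", "female", "male", "male", "male")),
  Thumb = c(56, 60, 61, 63, 64, 68)
)

# Empty model: one parameter (the Grand Mean).
TinyEmpty.model <- lm(Thumb ~ 1, data = TinyFingers)
# Sex model: one mean per group.
TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)

# predict() gives each person's predicted score under a model;
# resid() gives observed minus predicted for each person.
TinyFingers$Empty.pred    <- as.numeric(predict(TinyEmpty.model))
TinyFingers$Empty.resid   <- as.numeric(resid(TinyEmpty.model))
TinyFingers$Sex.predicted <- as.numeric(predict(TinySex.model))
TinyFingers$Sex.resid     <- as.numeric(resid(TinySex.model))

print(TinyFingers)

# Within each group the residuals balance around the group mean,
# so their mean is 0 (rounded to hide floating-point noise).
print(round(tapply(TinyFingers$Sex.resid, TinyFingers$Sex, mean), 10))
```

Every female gets the same Sex.predicted value because the model predicts the group mean, while the Sex.resid values differ because each observed thumb length differs.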

ei

Each person's residual (difference between their thumb length and the mean)

Yi

Each person's thumb length

TRUE or FALSE? PRE is a better measure of effect size than is the difference in means between two groups because it is not subject to sampling variation.

False

How do we quantify the residuals from the Sex.model?

For each person, Yi - Y^i. For each person, observed minus predicted score.

How do we quantify the residuals from the empty model? (Check all that apply.)

For each person, Yi - Y^i. For each person, observed minus predicted score.

Is the deviation of Y1 greater from the empty model or from the Sex model?

From the empty model, because the distance from the Grand Mean is greater than the distance from the mean for females.

What do you notice about these pairs of histograms?

In the Thumb histograms, it seems like the males are shifted higher than the females, but that is not true of the Sex.resid histograms.

If we took another sample of 157 students and asked them for measurements of thumb and height, would we get the same F ratio?

No, this is unlikely.

Look at the ANOVA table for the Height3Group model. What proportion of variation in thumb length is explained by this model?

PRE (.1423)

What is the worst thing about PRE?

PRE does not tell us how much error has been reduced relative to how much complexity has been added to the model.

Which is true about PRE?

PRE ranges from 0 to 1. A proportion of the total cannot go beyond the total (i.e., over 1). PRE can be thought of as the percent of explained variation.

What is the difference between a parameter and a variable?

Parameters are fit from the data. The values of the variables are part of the data. Each person's thumb length is composed of the same parameters, and different values for the variables. The variables have a sub-i to indicate that this value varies for each person. Parameters do not have a sub-i because they are the same for each person.

What does this number (e.g., PRE = .66) mean? (Check all that apply.)

Proportion of error reduced from the empty model, compared to the Sex model. Proportion of error explained by the Sex model, compared to the empty model.

PRE

Proportional Reduction in Error: the proportion of total variation in the outcome variable that is explained by the explanatory variable. PRE = SS Model / SS Total. SS Model, the numerator, represents the reduction in error when going from the empty model to the more complex model, which includes an explanatory variable.

Let's think about the term "mean square." What does that mean in this picture?

Roughly, the average area of a blue square

SS Model

SS(Sex.model to Empty.model). It measures how much error has been reduced by the Sex model in comparison to the empty model. This reduced error is represented by the distance of each person's predicted score under Sex.model to their predicted score under the empty model.

SS Total

SS(Thumb to Empty.model)

SS Error

SS(Thumb to Sex.model)
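
The three sums of squares above, and the PRE built from them, can be sketched in base R. The thumb lengths are hypothetical values (an assumption, not the course data), chosen so the group means are 59 and 65 and the Grand Mean is 62.

```r
# Hypothetical thumb lengths in mm (an assumption, not the course data).
Thumb <- c(56, 60, 61, 63, 64, 68)
Sex   <- factor(rep(c("female", "male"), each = 3))

grand_mean <- mean(Thumb)      # prediction of the empty model
group_mean <- ave(Thumb, Sex)  # each person's group mean (Sex model prediction)

# SS Total: Thumb to the empty model.
SS_total <- sum((Thumb - grand_mean)^2)
# SS Error: Thumb to the Sex model.
SS_error <- sum((Thumb - group_mean)^2)
# SS Model: Sex model predictions to empty model predictions.
SS_model <- sum((group_mean - grand_mean)^2)

# The error reduced by the model plus the error left over equals the total.
print(c(SS_total = SS_total, SS_model = SS_model, SS_error = SS_error))

# PRE: proportion of total error reduced by the Sex model.
PRE <- SS_model / SS_total
print(round(PRE, 2))
```

With these assumed values SS Total is 82, SS Model is 54, and SS Error is 28, so PRE rounds to .66.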

Which model explains more variation in thumb length?

Sex.model

Why are these parameter estimates different from the ones we found for the TinyFingers data set?

Sex.model was estimated from a different set of data (Fingers) than TinySex.model.

What is the difference between Sex.resid and Empty.resid?

Sex.resid is the leftover variation from the Sex model, but the Empty.resid is leftover variation from the null (or empty) model. Sex.resid represents the residuals left after subtracting out Sex model, but Empty.resid is the residuals left after subtracting out the null (or empty) model.

Which of these ideas are true? Bigger area of the squares indicates a better model. Smaller area of the squares indicates a better model. Smaller area of the squares indicates better predictions. Bigger area of the squares indicates more error in our predictions. Smaller area of the squares indicates more error in our predictions.

Smaller area of the squares indicates a better model. Smaller area of the squares indicates better predictions. Bigger area of the squares indicates more error in our predictions.

Based on Sex.resid and Empty.resid, which model has more error around it?

The TinyEmpty.model, because the residuals are greater in extent from that model.

Which of the following comparisons of TinySex.model and Sex.model are true?

The TinySex.model explained a greater proportion of variation of Thumb in TinyFingers than Sex.model explained of Thumb in Fingers. Sex.model is a more reliable model because it is based on a lot more data. In general, we can create a better model of the DGP when we have more data than when we have less data.

b1 in interpretation 2

The amount that must be added to the mean for females to get the mean for males

How can you tell which model is a better fit to the data? Why do you think this model is better?

The better model has a larger PRE. The better model has a larger SS model. The better model has a smaller SS error (leftover error). This model produced more accurate predictions of Thumb, thus reducing the residuals.

How can you tell which is a better model?

The better model has a larger PRE. The better model has a larger SS model. The better model has a smaller SS error (leftover error)

The goal of making a statistical model isn't just to reduce error. We want to reduce error, sure, but what are other goals to keep in mind when building a model?

The model helps us understand something about the DGP. The model helps us make good enough predictions. The model balances simplicity and accuracy.

What does PRE mean?

The proportion of variation explained by the Height2Group.model.

In the output above, which row is the error left over after fitting the Sex model?

The row that says "Residuals"

Why is this PRE different from the results when using the tiny data set?

Sex.model is different because it was fit from a different set of data. The data that residual variation was calculated from were different in TinyFingers versus Fingers.

If you have two models for the same outcome variable, one more complex than the other, which will have the larger Sum of Squares Error?

The simple model

The red squares depict which of the following concepts?

The squared residuals from the Sex model. The squared deviations from the means of each group.

What is the pattern of means from favstats() across the three groups of Height3Group?

The taller groups tend to have longer thumbs.

What is SS total?

The total squared error of Thumb lengths from the grand mean. The remaining SS from the Empty.model of Thumb length. All of the SS left over from the Empty.model of Thumb length. The SS Error from the Empty.model of Thumb length.

The F ratio from our Height2Group model is 11.66. Which is the correct interpretation?

The variance explained by the model is 11.66 times larger compared to the leftover variance unexplained by the model.

The red boxes depict the SS Error. Which of these accurately describes the visualization for the MS Error?

This is the average square drawn by squaring the deviations of the data from the group means.

By process of elimination, what is the appropriate visualization for the MS Model?

This is the average square drawn by squaring the deviations of the group means from the Grand Mean.

Why does it say "Height2Grouptall" in the output (rather than just "Height2Group")?

This is the increment you add on for the thumb length of someone in the tall group.

What is the mean square equivalent to?

Variance

Y^ = b0 + b1Xi What input does R want from you in order to compute this function? What is the output (or result) of this function?

Input: Xi. Output: Y^.

Examine the output above. Which variable is Y^i and which is Yi?

Y^i = Sex.predicted; Yi = Thumb

𝑌𝑖=𝑏0+𝑏1𝑋𝑖+𝑒𝑖

Yi = thumb length of each person; 𝑏0+𝑏1𝑋𝑖 = model's prediction of thumb length; ei = error of each person from the prediction

Yi, X1i, X2i

Yi = the individual's thumb length; X1i = whether the individual is in the medium group; X2i = whether the individual is in the tall group

PRE units

a percentage or proportion

explanatory variable

a variable that we think explains or causes changes in the response variable

b1

deviation of the group mean from the grand mean

Yi=𝑏0+𝑏1𝑋𝑖+𝑒𝑖 What part(s) of this equation is/are going to be different for each person? What part(s) of this equation is/are going to be the same for each person?

Different: Yi, Xi, ei. Same: b0, b1.

Null model

each person's score would be modeled with Grand Mean

Data

Each data point, represented as Yi, can be thought of as having two components: the mean thumb length for everyone (the Grand Mean) plus each person's residual from the model, or error, represented by ei.

b0

grand mean

Cohen's d

Indicates the size of a group difference in standard deviation units: $d = \frac{\bar{Y}_1 - \bar{Y}_2}{s}$
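
As a rough worked example of Cohen's d, the sketch below divides a difference in group means by a standard deviation. Two things are assumptions here: the thumb lengths are hypothetical, and s is taken to be the pooled within-group standard deviation, since the card does not say which standard deviation is meant.

```r
# Hypothetical thumb lengths in mm for two groups (an assumption).
female <- c(56, 60, 61)
male   <- c(63, 64, 68)

n1 <- length(male)
n2 <- length(female)

# Assumed choice of s: the pooled within-group standard deviation.
s_pooled <- sqrt(((n1 - 1) * var(male) + (n2 - 1) * var(female)) / (n1 + n2 - 2))

# Difference in means, in original units (mm).
mean_diff <- mean(male) - mean(female)

# Cohen's d: the same difference expressed in standard deviation units.
d <- mean_diff / s_pooled

print(c(mean_diff = mean_diff, s_pooled = s_pooled, d = d))
```

The mean difference stays in millimeters, while d is unitless apart from being counted in standard deviations.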

Error with explanatory variable

is calculated from each person's group mean (male or female) instead of from the Grand Mean

b0

mean of everyone's thumb length

Quantitative Modeling

means that numbers are used to describe or analyze a situation

Grand Mean

The mean for everyone in the sample, represented as b0 in the empty model. We use the term to make clear when we are referring to the mean for everyone in the sample.

Mean difference units

original units

mean

The point in the distribution that reduces the sum of squares to its lowest value. The mean is a model that balances the deviations from the model and minimizes the sum of squared residuals.
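
The claim that the mean minimizes the sum of squares can be checked numerically. This is a small sketch on hypothetical thumb lengths (an assumption); the helper name ss_around is made up for illustration.

```r
# Hypothetical thumb lengths in mm (an assumption).
Thumb <- c(56, 60, 61, 63, 64, 68)

# Hypothetical helper: sum of squared deviations of Thumb around a candidate value m.
ss_around <- function(m) sum((Thumb - m)^2)

# Try a grid of candidate "models" (single numbers) around the data.
candidates <- seq(58, 66, by = 0.5)
ss_values  <- sapply(candidates, ss_around)
print(data.frame(candidates, ss_values))

# The candidate with the smallest sum of squares is the mean.
best <- candidates[which.min(ss_values)]
print(best)
print(mean(Thumb))

# The deviations from the mean also balance: they add up to 0.
print(sum(Thumb - mean(Thumb)))
```

Any candidate other than the mean gives a larger sum of squares, which is why the mean serves as the empty model.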

What is the difference between the function that we made (TinySex.fun) and the predict() function?

Called with only a model, predict() generates predicted scores for the data used to fit that model. The input to predict() is a model, and the input to TinySex.fun is a value of the explanatory variable.
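
The contrast between the two kinds of input can be sketched as follows. The data are hypothetical, and the body of TinySex.fun here is a guess at what such a function might look like (the cards do not show it): it simply evaluates b0 + b1 * X with X coded 0 for female and 1 for male.

```r
# Hypothetical stand-in for TinyFingers (values are an assumption).
TinyFingers <- data.frame(
  Sex   = factor(c("female", "female", "female", "male", "male", "male")),
  Thumb = c(56, 60, 61, 63, 64, 68)
)
TinySex.model <- lm(Thumb ~ Sex, data = TinyFingers)

# Parameter estimates: b0 is the female mean, b1 is the increment for males.
b0 <- unname(coef(TinySex.model)[1])
b1 <- unname(coef(TinySex.model)[2])

# Hypothetical reconstruction of TinySex.fun: its input is a value of X.
TinySex.fun <- function(X) {
  b0 + b1 * X
}

print(TinySex.fun(0))  # prediction for a female (X = 0)
print(TinySex.fun(1))  # prediction for a male (X = 1)

# predict() takes the model itself and returns one prediction
# for each row of the data used to fit it.
print(predict(TinySex.model))
```

Both routes give the same numbers; they differ only in what you hand them.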

orange circle

reduction in error that has been achieved by the Sex model in comparison to the empty model.

Sex, 𝑏1𝑋𝑖

represent the deviation of the group mean from the Grand Mean.

sex.resid

residual for each person by this subtraction: their observed score minus their score predicted by the model

error in interp 2

residual from predicted score under the model

Thumb Length = Sex + Other stuff

sex explains some of the variation in thumb length, but other things also affect thumb length

Cohen's d units

standard deviations

What would be another way to calculate this sum of squares?

sum(TinyFingers$Empty.residual^2)

F ratio

Takes into account the number of parameters it takes to realize the gains in PRE.

In the sex model

The predicted score is the mean of the female group or the mean of the male group.

𝑀𝑆𝑇𝑜𝑡𝑎𝑙

Indicates the MS from the empty model, shown in the row labeled Total.

What is the formula for the F ratio?

$F = \frac{MS_{Model}}{MS_{Error}}$

Reason by analogy to figure out how to calculate 𝑀𝑆𝐸𝑟𝑟𝑜𝑟 using the second row of the supernova table labeled "Error (from model)."

𝑀𝑆𝐸𝑟𝑟𝑜𝑟 =𝑆𝑆𝐸𝑟𝑟𝑜𝑟 /𝑑𝑓𝐸𝑟𝑟𝑜𝑟

Which of the following can help us calculate 𝑀𝑆𝑀𝑜𝑑𝑒𝑙?

𝑀𝑆𝑀𝑜𝑑𝑒𝑙 =𝑆𝑆𝑀𝑜𝑑𝑒𝑙 /𝑑𝑓𝑀𝑜𝑑𝑒𝑙
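
The mean squares and the F ratio can be computed by hand and compared with R's own ANOVA table. This is a sketch on hypothetical data (an assumption, not the Fingers data); it uses base R's anova() rather than the supernova table the cards refer to.

```r
# Hypothetical stand-in data (an assumption, not the Fingers data).
Thumb <- c(56, 60, 61, 63, 64, 68)
Sex   <- factor(rep(c("female", "male"), each = 3))
n     <- length(Thumb)

grand_mean <- mean(Thumb)
group_mean <- ave(Thumb, Sex)

SS_model <- sum((group_mean - grand_mean)^2)
SS_error <- sum((Thumb - group_mean)^2)

# Degrees of freedom: the empty model uses 1, the Sex model uses 1 more.
df_total <- n - 1
df_model <- 1
df_error <- df_total - df_model

# Mean squares: each SS divided by its degrees of freedom.
MS_model <- SS_model / df_model
MS_error <- SS_error / df_error

# F ratio: variance explained by the model relative to leftover variance.
F_ratio <- MS_model / MS_error
print(c(MS_model = MS_model, MS_error = MS_error, F_ratio = F_ratio))

# Cross-check against the F value in base R's ANOVA table.
F_from_anova <- anova(lm(Thumb ~ Sex))[["F value"]][1]
print(F_from_anova)
```

Because the F ratio is a sample statistic, a different sample would generally give a different value.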

General Linear Model notation:

𝑌𝑖=𝑏0+𝑏1𝑋1𝑖+𝑏2𝑋2𝑖+𝑒𝑖

Empty Model

𝑌𝑖=𝑏0+𝑒𝑖

What unit is the SS measured in?

mm² (squared millimeters)

