Final Review

What is the word equation that represents the hypothesis that smoking explains variation in fat consumption?

Fat consumption = Smoking + other stuff

If the confidence interval for β1 is .9547 m/sec plus or minus 0.118 m/sec, how big is the standard error of the sampling distribution of b1? A Roughly .118 divided by 2 B Roughly .118 divided by square root of 12 C Roughly .9547 divided by 2 D Roughly .9547 divided by square root of 12

A Roughly .118 divided by 2
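
A quick arithmetic check of this answer, sketched in Python rather than the R used in the course. It assumes the interval is a 95% interval whose margin of error is roughly 2 standard errors; the variable names are illustrative only.

```python
# Margin of error reported in the question (m/sec).
margin_of_error = 0.118

# Assumption: for a 95% interval the critical t value is roughly 2,
# so margin of error is about 2 * standard error.
approx_t_critical = 2

standard_error = margin_of_error / approx_t_critical
print(round(standard_error, 3))  # 0.059
```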

When you add an explanatory variable to your model, what should be the effect on the Sum of Squares from the empty model? A It should remain unchanged. B It should go up. C It should go down. D It depends on how much variation is accounted for by the explanatory variable.

A It should remain unchanged

What will the following code do? xqt(.025, df = 999) A Return t critical for a sample size of 1000 B Return a square root C Return the confidence interval D Return the percentage of data points that fall below .025, given df = 999

A Return t critical for a sample size of 1000

If we add more mice to the study, which of these would not be affected? A) β0 B) b0 C) Y D) n

A) β0

If you created a bootstrapped sampling distribution of 10,000 means from your sample of SpeedUp, what qualities would you expect it to have? A) A roughly normal shape, and a standard deviation similar to the standard deviation of the sample B) A roughly normal shape, and a standard deviation smaller than the standard deviation of the sample C) A mean similar to the sample mean, and a standard deviation similar to the standard deviation of the sample D) A shape similar to that of the sample, and a standard deviation smaller than the standard deviation of the sample

B A roughly normal shape, and a standard deviation smaller than the standard deviation of the sample
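
The claim about the spread can be checked with a small simulation. This is a Python sketch of the idea, not the course's R code: the 12 values are made-up stand-ins for SpeedUp, and the local resample() helper only imitates sampling with replacement.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical sample of 12 values standing in for SpeedUp.
sample = [random.gauss(0.1, 0.05) for _ in range(12)]

def resample(data):
    """Draw len(data) observations from data, with replacement."""
    return random.choices(data, k=len(data))

# Bootstrapped sampling distribution of 10,000 means.
boot_means = [statistics.fmean(resample(sample)) for _ in range(10_000)]

sample_sd = statistics.stdev(sample)
boot_sd = statistics.stdev(boot_means)  # an estimate of the standard error

print(f"sample mean                = {statistics.fmean(sample):.4f}")
print(f"mean of bootstrapped means = {statistics.fmean(boot_means):.4f}")
print(f"sample SD                  = {sample_sd:.4f}")
print(f"SD of bootstrapped means   = {boot_sd:.4f}")
```

The standard deviation of the bootstrapped means should come out well below the sample standard deviation, and the center should sit close to the sample mean.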

What will the following code do? resample(Wetsuits, 12) A) Take a new sample from the population of Wetsuits B) Create a new sample from the observations in Wetsuits C) Create a new sample from Wetsuits that is the same as the original sample D) Select a random observation from the 12 observations in Wetsuits

B Create a new sample from the observations in Wetsuits

Which of the following is the correct interpretation of PRE (0.98) in the supernova table above? A 98% of the Wetsuit velocities in the data frame can be predicted with the corresponding NoWetsuit velocity. B 98% of the SS from the empty model can be explained by adding NoWetsuit to the complex model. C 98% of the NoWetsuit model can be proportionally reduced by the empty model. D The NoWetsuit model's SS Total will be 98% of the SS Total from the empty model.

B 98% of the SS from the empty model can be explained by adding NoWetsuit to the complex model.

What's the value in using the t distribution? A) It's less variable than the normal distribution. B) It works well as a model of the sampling distribution if the sample size is small, or standard deviation of the population is unknown. C) It helps us determine the degrees of freedom from our data. D) It works well as a model of the population if the sample size is small, or standard deviation of the population is unknown.

B) It works well as a model of the sampling distribution if the sample size is small, or standard deviation of the population is unknown.

If you've calculated the variance for WgtGain4, what have you found? A) Roughly the total squared residual from the empty model, in squared grams B) Roughly the average squared residual from the empty model, in squared grams C) Roughly the average residual from the empty model, in grams D) The sum of the residuals from the mean

B) Roughly the average squared residual from the empty model, in squared grams
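
A minimal Python sketch of why variance is "roughly the average squared residual". The weights are hypothetical, not the course's WgtGain4 data, and the divisor n - 1 is the usual sample-variance convention.

```python
import statistics

# Hypothetical weight gains in grams (not the course data).
wgt_gain = [5.0, 7.0, 9.0, 11.0, 13.0]

mean = statistics.mean(wgt_gain)                   # empty-model prediction
residuals = [y - mean for y in wgt_gain]           # data minus model
sum_of_squares = sum(e ** 2 for e in residuals)    # total squared residual
variance = sum_of_squares / (len(wgt_gain) - 1)    # SS / degrees of freedom

print(sum_of_squares, variance)  # 40.0 10.0
```

Dividing by n - 1 instead of n is why the card says "roughly" the average; the units are squared grams because each residual is squared.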

When Pulse3Group is included in our model to explain variation in Exercise, how is error from this more complex model calculated? A) The deviation of each person's Exercise from the Grand Mean for Exercise B) The deviation of each person's Exercise from the mean Exercise of their Pulse3Group C) The deviation of each Pulse3Group's mean to the Grand Mean for Exercise D) None of the above

B) The deviation of each person's Exercise from the mean Exercise of their Pulse3Group

We can calculate the residuals from both the empty model and the complex model. What is similar about these two sets of residuals? A The values of the residuals from the empty model will be the same as the values of the residuals from the complex model. B The residuals represent the difference between the data and the model's prediction. C The residuals represent the difference between the data and the Grand Mean. D In both cases, the residuals can be reduced to near 0 simply by being careful with measurement and data entry.

B) The residuals represent the difference between the data and the model's prediction.

Wetsuiti = b0 + b1NoWetsuiti + ei If the confidence interval for β1 is .9547 m/sec plus or minus 0.118 m/sec, which of the following is NOT a correct interpretation? A We are 95% confident that the true slope of the DGP will be in this range. B There is a 95% chance that if you repeated this experiment with a different set of swimmers, the slope of the regression line will fall within this confidence interval. C 95% of all Wetsuit velocities have this relationship with the NoWetsuit velocity. D The true parameter (β1) will very likely fall inside this interval.

C

Which of the following is the correct interpretation of MS Total (297,230) in the supernova table above? A) This is, roughly, the total number of points in the data frame. B) This is, roughly, the total number of squared means based on the empty model. C) This is, roughly, the average squared residual from the mean. D) This is, roughly, the standard deviation from the mean.

C

What would the sampling distribution of means look like for samples of n=1? A It would be normal, regardless of the shape of the population distribution. B It would have the same shape as the population distribution, but a smaller standard deviation. C It would have the same shape and standard deviation as the population distribution. D It's not possible to tell based on the information given.

C It would have the same shape and standard deviation as the population distribution.

What would the interpretation of MS Total be? A) It is, roughly, the total number of ratings in the data frame. B) It is, roughly, the total number of squared means based on the empty model. C) It is, roughly, the average squared residual from the Grand Mean. D) It is, roughly, the standard deviation around the Grand Mean.

C It is, roughly, the average squared residual from the Grand Mean.

LikeMi = b0 + b1AttractiveMi+ei Which of the following is an INCORRECT interpretation of the confidence interval for β1 in this model? A We are 95% confident that the true slope of the DGP will be in this range. B There is a 5% chance that this interval does not contain the true slope of the DGP. C 95% of all LikeM ratings have this relationship with AttractiveM ratings. D The true parameter (β1) will very likely fall inside this interval.

C 95% of all LikeM ratings have this relationship with AttractiveM ratings.

You're interested in females' ratings of males' intelligence. You simulate 500 samples of 276 ratings, calculate the mean of each sample, and plot the resulting distribution of means in a histogram. What will be the mean of this sampling distribution? A) The mean of the original sample B) The mean of the population or DGP C) Whatever mean you set when you ran the simulation D) Can't tell from the information given.

C Whatever mean you set when you ran the simulation

If you want to know if a regression model is better than a simple model in terms of making a prediction, what parameter should you make a sampling distribution of? A The mean B The standard deviation C The confidence interval D The slope

D

Imagine that you have both the empty model for Exercise and the complex model for Exercise (i.e., the model that includes Pulse3Group). What would you do if you wanted to compare how well they predict Exercise? A Compare the SS from each model B Look at the reduction in error in the Pulse3Group model C Examine the PRE D Any of the above

D

The F value for this model in the table above is .02. What does this F ratio tell us? A We should reject the empty model because this value is lower than .05. B 2% of the SS total is explained by the AgeM model of FunM. C There is an 88% chance that the slope of the DGP is equal to 0. D None of the above.

D None of the above. *An F of .02 means the variance explained by the model (per degree of freedom) is only .02 times as large as the variance left unexplained (per degree of freedom).

The sum of squares gets larger as: A: The variation increases B: The sample size increases C: The spread of the distribution increases D: All of the above

D: All of the above

Imagine we drew two random samples from a population, and measured each case sampled on the same outcome variable. One sample had an n = 30, the other an n = 60. Which of the following statements is true? A) The mean of the larger sample would be greater than the mean of the smaller sample. B) The standard error of the larger sample would be greater than the standard error of the smaller sample. C) The means of these two samples could be thought of as coming from the same sampling distribution. D) The sum of squares of the larger sample would almost certainly be greater than the sum of squares of the smaller sample.

D The sum of squares of the larger sample would almost certainly be greater than the sum of squares of the smaller sample. *The mean of the larger sample would not be greater, the standard error would not be greater, and the means did not come from the same sampling distribution.

The mean _______ the sum of squares

minimizes
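
One way to see this is to compare the sum of squares around the mean with the sum of squares around nearby candidate values. The data below are hypothetical and the helper name is illustrative; this is a Python sketch, not course code.

```python
import statistics

def sum_of_squares(data, prediction):
    """Sum of squared residuals when every case gets the same prediction."""
    return sum((y - prediction) ** 2 for y in data)

data = [2.0, 4.0, 4.0, 5.0, 10.0]   # hypothetical values
mean = statistics.mean(data)         # 5.0

# Try the mean and a few values on either side of it.
candidates = [mean - 1, mean - 0.1, mean, mean + 0.1, mean + 1]
ss_by_candidate = {c: sum_of_squares(data, c) for c in candidates}

best = min(ss_by_candidate, key=ss_by_candidate.get)
for c, ss in ss_by_candidate.items():
    print(f"prediction = {c:.1f}  SS = {ss:.2f}")
print("smallest SS at:", best)
```

Any single-number prediction other than the mean adds n times the squared distance from the mean to the sum of squares, so the mean always wins.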

The mean is a model that spends only ______ degree of freedom and minimizes squared error.

ONE

If the mean of TopSpeed is 33.6 and a given observation has a TopSpeed of 23.6, what is the residual?

-10

Two types of variables

1) Quantitative (numbers) 2) Qualitative

SDoM <- do(10000) If you were to stack up all the bars, what would the total be?

10,000

If the mean of TopSpeed is 33.6 and a given observation has a TopSpeed of 23.6, what is the data?

23.6

What is standard error? A) The standard deviation of the sampling distribution of an estimate B) The average residual of a score from its model prediction C) The square root of the variance D) None of the above

A

If we fit an empty model to this data, how would we depict it on this scatterplot? A) A horizontal line that shows the mean for minutes played. B) A vertical line that shows the mean free throw percentage. C) A diagonal line that bisects the cloud of points. D) You would not be able to represent the empty model visually because it is a single number.

A

If we repeated this study but our sample size was larger and thus our standard error was smaller, what would be different about the confidence interval? A It's likely that the 95% CI of b1 would be smaller. B It's likely that the 95% CI of b1 would be larger. C There is no way to tell because standard error is not related to confidence intervals. D The confidence interval would stay the same as long as the confidence level is the same.

A

If you increase your sample size in a study, how does it affect the 95% confidence interval around a parameter estimate? A It would make the confidence interval narrower. B It would make the confidence interval wider. C It would increase your level of confidence. D It would not have any of these effects.

A

If you use shuffle() to create a randomized sampling distribution of b1 (a group difference) based on a sample of data, what will be the mean of the resulting sampling distribution? A 0 B The mean of your sample C The true mean of the population D Whatever you decide for it to be

A
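
The reason the shuffled distribution centers on 0 can be illustrated with a small permutation simulation. This Python sketch uses made-up data and local b1() and shuffled() helpers that only imitate the idea behind the course's R functions.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is repeatable

# Hypothetical data: outcome values and a two-group label.
outcome = [9.0, 11.0, 12.0, 14.0, 6.0, 7.0, 8.0, 10.0]
group = ["A", "A", "A", "A", "B", "B", "B", "B"]

def b1(outcome, group):
    """Group difference: mean of group A minus mean of group B."""
    a = [y for y, g in zip(outcome, group) if g == "A"]
    b = [y for y, g in zip(outcome, group) if g == "B"]
    return statistics.fmean(a) - statistics.fmean(b)

def shuffled(values):
    """Return a randomly reordered copy, breaking any link to group."""
    copy = list(values)
    random.shuffle(copy)
    return copy

sample_b1 = b1(outcome, group)
shuffled_b1s = [b1(shuffled(outcome), group) for _ in range(5000)]

print("sample b1:", sample_b1)
print("mean of shuffled b1s:", round(statistics.fmean(shuffled_b1s), 3))
```

Shuffling simulates a world in which group membership has no relationship to the outcome, so the group differences it produces scatter around 0 regardless of the b1 in the original sample.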

In Yi = 10.38 - .85X1i - 3.14X2i + ei what does X1i stand for? A) Whether someone is in the medium pulse group or not B) The number of members in the medium pulse group C) The intercept for Pulse3Groupmed D) Whether someone is in the low or medium or high group

A

Which of the following would have the same exact value? A) Population mean and sampling distribution mean B) Standard deviation of the sample distribution and standard deviation of the sampling distribution C) Sample mean and population mean D) Sum of squares of the sample and standard error of the mean

A Population mean and sampling distribution mean

Imagine that you've calculated SS for both the empty model and the complex model for Exercise. What will be true about these SS? A SS leftover from the empty model will be greater than the SS leftover from the complex model. B SS leftover from the empty model will be smaller than the SS leftover from the complex model. C SS leftover from the empty model will be equal to the SS leftover from the complex model. D In both cases the SS will be 0 because the residuals are balanced by the mean.

A SS leftover from the empty model will be greater than the SS leftover from the complex model.

Which distribution would you use to create a confidence interval around a parameter estimate? A A sampling distribution B A sample distribution C A population distribution D None of these

A A sampling distribution

What kind of distribution would this code create? do(10000) * b1(Wetsuit ~ NoWetsuit, data = resample(Wetsuits, 12)) A) A sampling distribution of bootstrapped slopes B) A sampling distribution of means C) A sampling distribution of the mean difference between Wetsuit and NoWetsuit D) The population distribution that our sample could have come from

A A sampling distribution of bootstrapped slopes

The mean of Alcohol is 3.279 drinks per day. A particular patient consumes 2 drinks per day. Which of the following represents the residual for this patient under the empty model? A: Yi - b0 B: 2 - 3.279 C: ei D: All of the above

D: All of the above

What notation can be used to represent the mean of the population?

β0 or μ

If you bootstrap a sampling distribution based on your sample of data, what will be the mean of the bootstrapped distribution? A 0 B The mean of your sample C The true mean of the population D Whatever you decide for it to be

B

If you fit a model that predicts Mins by including FTMade as an explanatory variable, how many parameters would the model have? A) 2: Mins and FTMade B) 2: the y-intercept and the slope of the regression line C) 2: the mean of Mins and the increment added for each free throw made that exceeds the mean number of free throws made D) Yi, b0, b1, Xi

B

PRE= .0536 A There is a .05 chance that we have made a truly explanatory model. B .05 of the total variation in exercise hours is explained by the pulse groups. C .05 of the sample has a relationship between exercise hours and pulse groups. D .05 of the complex model's sum of squares can be explained by Pulse3Group.

B

What does the PRE of .60 mean? A) There is a .60 chance that this explanatory variable helps us make better predictions of the outcome variable. B) .60 of the sum of squares from the empty model is explained by the Light groups. C) .60 of the sample has a relationship between WgtGain4 and Light groups. D) .60 of the sum of squares from the Light.model is explained by the Light groups.

B

Why does the table show a smaller sum of squares error (73.78) than sum of squares total (186.28)? A) SS total should actually be smaller than the SS error. This must be an error in the code. B) SS total is based on residuals from the Grand Mean. SS error is based on residuals left over after some of the total variation is explained by the difference in group means. C) SS total depends on variation in the outcome variable (how much weight was gained). SS error depends on variation in the explanatory variable (whether the mouse is in the LL or LD group). D) SS total is larger because the Light model it is calculated from is more complex. SS error is smaller because it is calculated from the simpler empty model.

B
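
The two sums of squares in this card are enough to work out SS model and PRE by hand. The Python sketch below uses only the numbers quoted in the question; treating them as coming from the same Light model as the .60 PRE card is an assumption.

```python
# Values quoted in the question.
ss_total = 186.28   # squared residuals from the Grand Mean (empty model)
ss_error = 73.78    # squared residuals left over under the group model

# The part of the total that the group model accounts for.
ss_model = ss_total - ss_error

# PRE: proportional reduction in error relative to the empty model.
pre = ss_model / ss_total

print(round(ss_model, 2), round(pre, 2))  # 112.5 0.6
```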

You fit a regression model, then construct a 95% confidence interval for the estimate of β1. If the confidence interval includes 0, what does this mean? A It suggests we should reject the empty model and stay with the complex model. B It suggests we should retain the empty model. C It means that the true value of β1 is 95% likely to be 0. D It means that β0 could be 0.

B

Which of the following is the correct interpretation of PRE (0.65) in the supernova table above? A) 65% of the players' minutes in the data frame can be predicted with their Points. B) 65% of the SS from the empty model can be explained by adding Points to the complex model. C) 65% of the Points model can be proportionally reduced by the empty model. D) The Points model's SS total will be 65% of the SS total from the empty model.

B) 65% of the SS from the empty model can be explained by adding Points to the complex model.

What does the mean do?

BALANCES THE DEVIATIONS ABOVE AND BELOW THE MEAN

What can you now say? WgtGain>10 True= 0.222222 False= .777778 A) Approximately 22% of mice gained more than 10 grams of weight. B) If another mouse were randomly selected and added to this data set, the likelihood that it would gain more than 10 grams would be 22%. C) Both of the above D) None of the above

C

Above we have included the ANOVA tables for two models: wt = age + other stuff and wt = smoke.factor + other stuff. Why do these two models have the same value for SS total (25356)? A) Because both SS totals are based on the residuals from the empty model. B) Because both SS totals are based on the same outcome variable. C) Because both SS totals are based on the values from the same data set. D) All of the above reasons together explain why the SS totals are the same.

D

If the sampling distribution of means is normal, the underlying population distribution is: A Probably normal B Positively skewed C Negatively skewed D Impossible to tell

D

If you use lm() to fit the empty model for LikeM, and then use confint() to find the confidence interval, what does the confidence interval tell you? A It gives you a range of possible β1s that could have generated your sample. B It gives you a range of possible β0s that could have generated your sample. C It gives you a range of possible μs that could have generated your sample. D Both B and C are correct.

D

The P value refers to the sampling distribution of ______ based on the empty model

F

If the z score for your friend's car's highway miles per gallon is found to be .6, what does that mean?

The car's highway miles per gallon is .6 standard deviations larger than the mean for hwy.
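
A z score is the distance from the mean measured in standard deviations. The Python sketch below uses hypothetical values for the mean and standard deviation of hwy, chosen only so that the result is .6.

```python
def z_score(value, mean, sd):
    """How many standard deviations value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Hypothetical numbers, not taken from the course data.
hwy_mean = 23.0     # mean highway miles per gallon
hwy_sd = 5.0        # standard deviation of hwy
friend_hwy = 26.0   # the friend's car

z = z_score(friend_hwy, hwy_mean, hwy_sd)
print(z)  # 0.6
```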

What are variables?

In the columns

What are values?

In the rows

Which would change more if we excluded the maximum value? A: Mean B: Median C: All would change a lot

Mean

What is the IQR?

Middle 50% of data Q3-Q1

Residual is the Data - __________

Model

Can you tell the likelihood of a single baby having a particular birth weight from a sampling distribution of means?

No, you cannot tell the likelihood of a single baby having a particular birth weight from a sampling distribution of means.

Which distance would be used to calculate the sum of squares error? The data point to the _________.

PREDICTION

Residual definition

Residual is the difference between the Data Point and the Predicted Score

SS Model will be _______ than SS Total because SS Model is always ________ than SS Total, whether a model explains a lot of variation or not.

SMALLER

What is the variance?

Sum of Squares / Degrees of Freedom (also known as mean square)

What's true of the distribution of any variable, if your model is the mean of that variable?

The distribution of the variable is the same shape as the distribution of its residual.

Imagine you make three histograms: one for TopSpeed, one for the predicted values based on the empty model for TopSpeed, and one for the residuals. Which two distributions will have a similar shape?

Topspeed and residuals

The F ratio shows that the model explains ________ per degree of freedom

VARIATION

What is the unit of observation?

What we are describing

The mean of Alcohol is 3.279 drinks per day. A particular patient consumes 2 drinks per day. Which of the following symbols would be used to represent the value 2 in the notation of the General Linear Model?

Yi

The mean of TopSpeed is 33.6 and a given observation has a TopSpeed of 23.6. What part of this GLM notation represents 23.6?

Yi

In GLM notation, which of the following represents the model (or prediction)?

b0

If we express our model as Yi = b0 + b1X1i + b2X2i + ei which part represents the model's prediction for Exercise?

b0 + b1X1i + b2X2i

For the empty model, would the model make the same prediction (the mean of Fat) for every person regardless of their values on other variables?

Yes. The model would make the same prediction (the mean of Fat) for every person regardless of their values on other variables.

