STAT 326 Final Study


interpretation for R-squared in SLR

___ % of the variation in (y) is explained by the linear relationship between (x) and (y)

interpretation for R-squared in MR

___% of the variability in (y) is explained by the multiple regression model, including (list all explanatory variables)

List these from smallest margin of error to largest margin of error a) A 99% confidence interval with a sample size of 50. b) A 99% confidence interval with a sample size of 25. c) A 90 % confidence interval with a sample size of 50. d) A 95% confidence interval with a sample size of 50.

C, D, A, B
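
A quick numerical check of this ordering (a sketch in Python; it assumes the same population standard deviation in every scenario, so only z*/sqrt(n) matters):

```python
from scipy.stats import norm

# Margin of error is proportional to z* / sqrt(n) when sigma is held fixed,
# so the four scenarios can be compared on that scale alone.
scenarios = {"a": (0.99, 50), "b": (0.99, 25), "c": (0.90, 50), "d": (0.95, 50)}
for label, (conf, n) in scenarios.items():
    z_star = norm.ppf(1 - (1 - conf) / 2)        # critical value z*
    print(label, round(z_star / n ** 0.5, 3))
# c: 0.233 < d: 0.277 < a: 0.364 < b: 0.515  ->  C, D, A, B
```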

what is a parameter

a numerical description of a population

what is a statistic

a numerical description of a sample

correlation coefficient ("r")

a numerical measure of the direction and strength of a linear relationship between 2 quantitative variables

How does the standard error for an individual observation compare to the standard error for a mean response?

it will always be larger

What does overfitting do to a model? What does it do to the PI? What is the rule of thumb?

It reduces the model's predictive power and widens the prediction intervals (PI). Rule of thumb: there should be at least 10 observations per explanatory variable.

How is the "independence" assumption checked?

look at the prompt and check whether the sampling method was a simple random sample

extrapolation

making predictions that are outside our range of x-values used to fit the LS model

as sample size increases

margin of error decreases

coefficient of determination (R-squared) in MR

measures how well our linear model fits our data

Multiple Regression (MR)

more than one explanatory variable & a numerical response variable

How do you calculate DF in SLR?

n-2

How do you calculate DF in MR?

n-p-1 (p= number of slope coefficients)
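
A tiny worked example of both degrees-of-freedom formulas (the sample size and number of slopes below are made up):

```python
n = 30               # hypothetical sample size
p = 3                # hypothetical number of slope coefficients in the MR model

df_slr = n - 2       # SLR: 30 - 2 = 28
df_mr = n - p - 1    # MR:  30 - 3 - 1 = 26
print(df_slr, df_mr)
```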

How can we determine the value of the correlation coefficient (r) by only referring to the info from JMP?

Determine the direction of the relationship (from the sign of the estimated slope or from the scatterplot), then take the square root of the R-squared value and attach that sign: r = ±√(R-squared)

What does the null hypothesis for the F-test imply?

none of the explanatory variables contribute to helping predict the response (model is NOT useful)

As you decrease the p-to-leave, what happens to the number of variables?

The number of variables in the final model decreases (a smaller p-to-leave is a stricter cutoff, so more variables get removed)

True or false? If the sample mean of a random sample from an x distribution is relatively small, when the confidence level c is reduced, the confidence interval for μ becomes shorter.

True. As the level of confidence decreases, the maximal error of estimate decreases.

True or false? If the original x distribution has a relatively small standard deviation, the confidence interval for μ will be relatively short.

True. As σ decreases, E decreases, resulting in a shorter confidence interval.

true or false The value zc is a value from the standard normal distribution such that P(-zc < z < zc) = c.

True. By definition, critical values zc are such that 100c% of the area under the standard normal curve falls between -zc and zc.

True or false? Consider a random sample of size n from an x distribution. For such a sample, the margin of error for estimating μ is the magnitude of the difference between x̄ and μ.

True. By definition, the margin of error is the magnitude of the difference between x̄ and μ.

True or false? The point estimate for the population mean μ of an x distribution is x̄, computed from a random sample of the x distribution.

True. The mean of the x̄ distribution equals the mean of the x distribution, and the standard error of the x̄ distribution decreases as n increases.

calculate Margin of Error

(Upper bound - Lower bound) / 2
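
A worked example of this formula, using the 95% interval 10.1 < μ < 12.2 from Sam's card later in this set:

```python
lower, upper = 10.1, 12.2                 # CI bounds
margin_of_error = (upper - lower) / 2     # 1.05
point_estimate = (upper + lower) / 2      # 11.15 (the sample mean sits at the center)
print(margin_of_error, point_estimate)
```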

population regression model

Uy = Bo + B1*x (in MR: Uy = Bo + B1*x1 + ... + Bp*xp)

What are some indicators of multicollinearity?

VIF >= 10; strong correlation between x's; slopes with the opposite sign from what we would expect (the flipped sign is only a concern if the original correlation between that x and y was moderate or strong)

interpretation of a CI for the mean response (Uy)

We are 95% confident that the MEAN (Y) will be between (lower bound) and (upper bound), for a given value of x

interpretation of a PI for a single response (y)

We are 95% confident that the actual response will be between (lower bound) and (upper bound), for a given value of x

interpretation of CI for a regression slope(B1) example

We are 95% confident that for every additional unit increase in x, the MEAN response will (increase/decrease) by between (lower bound) and (upper bound)

interpretation for RMSE

We expect approximately 95% of the actual observations to be within (2*RMSE) of their corresponding predicted values

Conclusion for F-test

We have/lack statistically significant evidence that this Multiple Regression model is useful in predicting the response (Y), at the alpha = 0.05 significance level

interpretation of bo example

When the explanatory variable is equal to 0, the PREDICTED response is (bo) units

Sam computed a 90% confidence interval for μ from a specific random sample of size n. He claims that at the 90% confidence level, his confidence interval contains μ. Is this claim correct? Explain.

Yes. The proportion of all confidence intervals based on random samples of size n that contain μ is 0.90.

How does mixed (stepwise) selection work?

Begin with no variables selected (as in forward selection) and set a p-to-enter and a p-to-leave (often the same value, to avoid infinite loops). At each step, add the candidate variable that meets the p-to-enter and remove any variable in the model whose p-value exceeds the p-to-leave; stop when no variable can be added or removed.

What are the ways to detect multicollinearity?

Use a scatterplot matrix: look for pairs of explanatory variables with a strong correlation (a tight, elongated cloud of points is the visual cue); if you see one, be concerned. You can also check the pairwise correlations in the output to see how close they are to 1 (or -1), or compute the VIFs.

what does it mean if vif is above 10?

You have multicollinearity

What happens to the width of a confidence interval as we increase the level of confidence ("C")?

it gets wider

How does R-squared adjusted compare to R-squared?

it is always smaller

How does the width of a CI (for the mean response Uy) compare to the width of a PI (for a single future observation y)?

it is narrower

interpretation of bo in MR

when all explanatory variables are equal to 0, the PREDICTED response (y hat) is ___

formula for PI for a single observation (y)

y hat +/- t* * SE (JMP refers to this SE as the standard error of individual)

formula for CI for an average/mean response (Uy)

y hat +/- t* * SE(u hat) (JMP refers to this SE as the standard error of predicted)
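
A sketch of both interval formulas side by side (the predicted value, standard errors, and df below are hypothetical stand-ins for the numbers you would read off JMP output):

```python
from scipy.stats import t

y_hat = 42.0      # hypothetical predicted value at a given x
se_mean = 1.3     # hypothetical SE of the predicted mean response
se_indiv = 4.1    # hypothetical SE for an individual response
df = 28           # n - 2 in SLR (or n - p - 1 in MR)

t_star = t.ppf(0.975, df)                                     # 95% critical value
ci = (y_hat - t_star * se_mean, y_hat + t_star * se_mean)     # CI for the mean response
pi = (y_hat - t_star * se_indiv, y_hat + t_star * se_indiv)   # PI for a single observation
print(ci, pi)     # the PI is always wider than the CI
```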

estimated/predicted model

y hat= bo + b1*x

Does multicollinearity lead to misleading slopes and redundant information?

yes

how is the SLR line formed?

-determined by the combination of intercept and slope that minimizes the sum of squared vertical distances between the observations & the regression line (by *minimizing the sum of squared errors*)
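
A minimal sketch of those least-squares computations with made-up data (the x and y values are purely illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical explanatory values
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # hypothetical responses

# Least-squares estimates: the (b0, b1) pair that minimizes the sum of squared residuals
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

residuals = y - (b0 + b1 * x)              # e = y - y hat
print(b0, b1, np.sum(residuals ** 2))      # intercept, slope, SSE
```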

what happens to R-squared when explanatory variables are ADDED to the model? (may not be meaningful)

-increases

How is the "normality of residuals/the error" assumption checked?

-normal quantile plot of residuals *want residual points to follow straight line and stay within the bands*

How is the "form" assumption checked?

-with residual plot (x-axis: predicted values, y-axis: residuals of response) *want to see both positive and negative residuals*

How is the "constant variance" assumption checked?

-with residual plot (x-axis: predicted values, y-axis: residuals of response) *want to see similar spread of residuals w/ no certain pattern*

What assumptions must hold in MR to make valid inferences?

1) form 2) constant variance 3) normality of residuals 4) independence

What assumptions must hold in SLR to make valid inferences?

1) form is linear 2) constant variance of residuals 3) normality of residuals 4) independence

How do you do forward selection?

1) Fit an SLR for each explanatory variable. 2) For each one, do a hypothesis test for a linear association with the response. 3) Find the smallest p-value and make sure it is smaller than your p-to-enter. 4) Add that variable to the model, then fit MR models pairing it with each of the remaining variables. 5) Repeat until the smallest candidate p-value is larger than the p-to-enter.
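
A minimal Python sketch of that loop (using statsmodels rather than JMP; X is assumed to be a pandas DataFrame of candidate explanatory variables, y the numeric response, and the 0.25 p-to-enter is just an illustrative default):

```python
import statsmodels.api as sm

def forward_selection(X, y, p_to_enter=0.25):
    """Greedy forward selection: at each step add the candidate with the
    smallest p-value, as long as that p-value is below p_to_enter."""
    selected, remaining = [], list(X.columns)
    while remaining:
        pvals = {}
        for var in remaining:
            model = sm.OLS(y, sm.add_constant(X[selected + [var]])).fit()
            pvals[var] = model.pvalues[var]       # p-value for the candidate's slope
        best = min(pvals, key=pvals.get)
        if pvals[best] >= p_to_enter:             # no candidate clears the threshold
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```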

How do you calculate VIF?

VIF_i = 1/(1 - R^2_i), where R^2_i is the coefficient of determination from regressing x_i on the other explanatory variables
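
A sketch of that calculation for one explanatory variable (again statsmodels, with X assumed to be a pandas DataFrame of the x's; the column name in the usage comment is hypothetical):

```python
import statsmodels.api as sm

def vif(X, col):
    """VIF_i = 1 / (1 - R^2_i), where R^2_i comes from regressing
    x_i on all of the other explanatory variables."""
    others = X.drop(columns=[col])
    r2_i = sm.OLS(X[col], sm.add_constant(others)).fit().rsquared
    return 1.0 / (1.0 - r2_i)

# vif(X, "x1") >= 10 would be a multicollinearity warning sign
```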

What is the minimum number of observations per variable?

10

Sam computed a 95% confidence interval for μ from a specific random sample. His confidence interval was 10.1 < μ < 12.2. He claims that the probability that μ is in this interval is 0.95. What is wrong with his claim?

Either μ is in the interval or it is not. Therefore, the probability that μ is in this interval is 0 or 1.

True or false? A larger sample size produces a longer confidence interval for μ.

False. As the sample size increases, the maximal error decreases, resulting in a shorter confidence interval.

True or false? Every random sample of the same size from a given population will produce exactly the same confidence interval for μ.

False. Different random samples may produce different values, resulting in different confidence intervals.

True or false? If the sample mean of a random sample from an x distribution is relatively small, then the confidence interval for μ will be relatively short.

False. The maximal error of estimate controls the length of the confidence interval regardless of the value of x̄.

interpretation of b1 example

For every unit increase in the explanatory variable, the PREDICTED response will (increase or decrease) by ___.

What are the effects of multicollinearity?

Inflated (higher) standard errors and potentially misleading slopes (e.g., you can believe x has a negative effect on y when in reality it does not; it only appears that way after accounting for the other explanatory variables)

Suppose your variables all show the same p-value in JMP. What do you do?

Look at the t-ratios: the largest |t-ratio| corresponds to the smallest underlying p-value

How is the F-ratio calculated?

MSR/MSE, or (SSR/p) / (SSE/(n-p-1)) *obtain from JMP*
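
A quick arithmetic sketch of the F-ratio with hypothetical sums of squares:

```python
SSR, SSE = 480.0, 120.0    # hypothetical model and error sums of squares
n, p = 30, 3               # hypothetical sample size and number of slopes

MSR = SSR / p              # 160.0
MSE = SSE / (n - p - 1)    # 120 / 26, about 4.62
F = MSR / MSE              # about 34.7
print(F)
```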

In forward selection, can variables be removed after they have entered the model?

No, once a variable is added it stays in the model

Do any of the three selection procedures remove multicollinearity?

No, but they help reduce it

sample standard deviation (a statistic)

s

total sum of squares (SST)

SSR+SSE
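
Continuing the same hypothetical sums of squares from the F-ratio sketch above, SST ties the pieces together and gives R-squared:

```python
SSR, SSE = 480.0, 120.0    # hypothetical values
SST = SSR + SSE            # 600.0
r_squared = SSR / SST      # 0.80: 80% of the variability in y is explained
print(SST, r_squared)
```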

How do you do backward elimination? What is a disadvantage?

Start with the full model, remove the variable with the largest p-value (if it exceeds the p-to-leave), refit, and keep going until all remaining p-values are below the p-to-leave. Disadvantage: you need a lot of data to fit the full model.
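
A matching sketch for backward elimination (same assumptions as the forward-selection sketch above: statsmodels, a pandas DataFrame X, a numeric y, and an illustrative p-to-leave):

```python
import statsmodels.api as sm

def backward_elimination(X, y, p_to_leave=0.10):
    """Start from the full model and drop the worst predictor until
    every remaining p-value is below p_to_leave."""
    selected = list(X.columns)
    while selected:
        model = sm.OLS(y, sm.add_constant(X[selected])).fit()
        pvals = model.pvalues.drop("const")       # ignore the intercept's p-value
        worst = pvals.idxmax()
        if pvals[worst] < p_to_leave:             # everything left is significant enough
            break
        selected.remove(worst)
    return selected
```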

What does the alternative hypothesis for the F-test imply?

at least 1 explanatory variable is helpful in predicting the response, however we do not know which or how many variables (model is useful)

how to calculate a CI for a regression slope(B1)

b1 +/- t* se(b1)
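
A sketch of this slope CI with hypothetical JMP output values:

```python
from scipy.stats import t

b1, se_b1 = 1.96, 0.22     # hypothetical slope estimate and its standard error
df = 28                    # n - 2 in SLR (n - p - 1 in MR)

t_star = t.ppf(0.975, df)  # 95% critical value, about 2.05
ci = (b1 - t_star * se_b1, b1 + t_star * se_b1)
print(ci)                  # roughly (1.51, 2.41)
```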

How does the interpretation of the population slope (Bi) change in comparison to the interpretation of the sample slope(bi)?

change the predicted response to say MEAN (or average) response

How do you determine which variable to remove?

It depends on things like the interpretability of the x variable, its measurability (e.g., how much it would cost to collect), and its correlation with the other x variables in the model

What is the Global (F-test) used for?

determine if a model is useful

calculate a residual

residual = actual - predicted; e = y - y hat

formula for t-statistic

estimate/ SE(estimate)
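
Using the same hypothetical slope output as the CI sketch above:

```python
b1, se_b1 = 1.96, 0.22     # hypothetical estimate and its standard error
t_stat = b1 / se_b1        # about 8.9; compare to a t distribution with the model's df
print(t_stat)
```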

interpretation of bi in MR

for every unit increase in xi (one explanatory variable), the PREDICTED response (y hat) will increase/decrease by (bi), assuming all other explanatory variables are held constant

4 features to look for in a scatterplot

form, direction, strength, outliers

How is the F-test conducted?

hypothesis test that encompasses all B's by testing Ho: B1 = B2 = ... = Bp = 0 vs. Ha: at least one Bi ≠ 0

Simple Linear Regression (SLR)

one numerical explanatory variable & one numerical response variable

How does the p-value change for a one-sided hypothesis test?

p-value/2

RMSE

the estimated standard deviation of the errors for the regression model (residual standard error); s = square root of MSE

as the confidence level (C) increases

the margin of error (E) increases

as the standard deviation increases

the margin of error (E) increases

as the standard deviation decreases

the margin of error decreases

as sample size decreases

the margin of error increases

what does "Uy" represent?

the mean response (Y), for a given value of x

What does a larger F-ratio value mean?

the model does a better/good job at predicting our response

In forward selection, what happens as the p-to-enter gets larger?

More variables end up in your final model

what is a sampling distribution

the probability distribution of a sample statistic

At what value of x do we get the most precise (narrowest) intervals?

the sample mean of x

model/regression sum of squares (SSR)

variation that can be explained through the multiple regression model

error sum of squares (SSE)

variation that remains unexplained

What is multicollinearity?

when 2 x's are highly correlated and so provide redundant info about the y

