STT Exam 2
leverage
data pts w/ x-values far from the mean of x exert leverage on a linear model
What does a regression equation tell?
- predict y value at particular x -estimate slope between y and x -estimate if linear association is positive/negative
What are the 4 points to remember about outliers & correlation?
-can distort correlation dramatically - can make weak correlation look big/hide strong correlation -can give a positive association a negative correlation -report correlation w/ & w/o outlier
What can a scatter plot & regression line be used for?
-determine if any (x,y) pairs are outliers -predict y at a specific x -estimate average y at a specific x
a correlation of .9 means
.9^2=.81 so 81% of the variation of the y-values can be explained by the explanatory variable, x
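The arithmetic on this card can be checked with a few lines of Python. This is only a sketch; `percent_explained` is a hypothetical helper name, not something from the course.

```python
# Hypothetical helper: turn a correlation r into "percent of variation explained".
def percent_explained(r: float) -> float:
    """Return r^2 expressed as a percentage."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    return (r ** 2) * 100


print(round(percent_explained(0.9), 2))   # 81.0
print(round(percent_explained(-0.5), 2))  # 25.0
```

Squaring removes the sign, so a negative r still gives a positive percentage.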
what are the 4 properties of coefficient of determination?
-0 <= r^2 <= 1 -equals 1 only if all points lie exactly on a line -doesn't change if units change -measures strength of the linear relationship between y & x
For which one of these relationships could we use a regression analysis? Only one choice is correct. a. Relationship between weight and height. b. Relationship between political party membership and opinion about abortion. c. Relationship between gender and whether person has a tattoo. d. Relationship between eye color (blue, brown, etc.) and hair color (blond, etc.).
A
Which of the following correlation values indicates the strongest linear relationship between two quantitative variables? a. r = −0.65 b. r = −0.30 c. r = 0.00 d. r = 0.50
A
A correlation of zero between two quantitative variables means that A. we have done something wrong in our calculation of r. B. there is no association between the two variables. C. there is no linear association between the two variables. D. re-expressing the data will guarantee a linear association between the two variables. E. None of the above.
C
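A small made-up data set shows why the answer is "no linear association" rather than "no association". Here y is exactly x squared, yet the numerator of r works out to zero. The data are invented for illustration only.

```python
from statistics import mean

# Made-up data: y is exactly x squared, a perfect but curved association.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

x_bar, y_bar = mean(xs), mean(ys)

# Numerator of r: the sum of products of deviations from the means.
# If this sum is 0, then r is 0 whatever the standard deviations are.
cross_sum = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

print(cross_sum)  # 0
```

The positive and negative products cancel, so r = 0 even though x determines y completely.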
The value of a correlation is reported by a researcher to be r = −0.5. Which of the following statements is correct? A. The x-variable explains 50% of the variability in the y-variable. B. The x-variable explains −50% of the variability in the y-variable. C. The x-variable explains 25% of the variability in the y-variable. D. The x-variable explains −25% of the variability in the y-variable.
C
Which of the following sets of variables is most likely to have a negative association? A. the number of bedrooms and the number of bathrooms in a house B. the number of rooms in a house and the time it takes to vacuum the house C. the age of a house and the cleanliness of the carpets inside D. the size of a house and its selling price
C
Which of the following sets of variables is most likely to have a negative association? A. the height of the son and the height of the father B. the age of the wife and the age of the husband C. the age of the mother and the number of children in the family D. the age of the mother and the ability to have children
D
What is the best way to start observing the relationship between two quantitative variables?
Scatterplots
what is the correlation coefficient?
a numerical measure of the direction & strength of a linear association
what is a residual plot?
a scatterplot of the residuals against the explanatory variable; a good one: -stretches horizontally -has the same amt of scatter throughout -has no bends -has no outliers
Correlation
a statistic that measures the strength and direction of a linear relationship between two quantitative variables.
in ^y= a +bx what does a and b represent?
a- y-intercept b- slope
what does analyzing a residual do?
assess the adequacy of a model & identify outliers
Which of the following is a deterministic relationship? a. The relationship between hair color and eye color. b. The relationship between father's height and son's height. c. The relationship between height in inches and height in centimeters. d. The relationship between height as determined with a ruler and height as determined by a tape measure.
c
what is the straight enough condition?
correlation measures the strength of an association only if the form is straight enough to be treated as a linear relationship
Regression equation
describes the average relationship between a quantitative response and explanatory variable.
what are the 4 things to look for in a scatterplot?
direction, form, strength, unusual features
what is the symbol for residual?
e
what does b0 represent?
estimated intercept
what does b1 represent?
estimated slope
Which variable goes on the horizontal axis?
explanatory
what does x represent?
explanatory variable
What does it mean if the correlation coefficient between 2 quantitative variables is positive?
high values on one variable are associated w/ high values on the other
what to look for to determine strength of scatterplot?
how closely the points fit the trend & outliers
influential point
if omitted from the data, results in a very different regression model
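One way to see an influential point is to fit the line with and without it. The sketch below uses made-up data and a hypothetical helper `fit_line` built on the usual least-squares formulas, which the cards themselves do not state.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares intercept and slope for y-hat = b0 + b1*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Made-up data: five points exactly on y = 2x ...
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
# ... plus one point whose x is far from the mean of x (high leverage).
xs_all = xs + [20]
ys_all = ys + [5]

_, slope_without = fit_line(xs, ys)
_, slope_with = fit_line(xs_all, ys_all)

print(f"slope without the point: {slope_without:.3f}")
print(f"slope with the point:    {slope_with:.3f}")
```

With these numbers the slope falls from 2 to nearly 0 once the extra point is included, which is what "results in a very different regression model" means.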
what does slope tell?
indicates how much of a change there is for the predicted value of y when x increases 1 unit
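The slope interpretation can be checked numerically. The sketch below assumes the standard least-squares formulas b1 = Σ(x - mean of x)(y - mean of y) / Σ(x - mean of x)^2 and b0 = mean of y - b1 * mean of x, which are not written out in these cards. The data and the names `fit_line` and `predict` are made up for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

b0, b1 = fit_line(hours, scores)


def predict(x):
    """Predicted response y-hat at x."""
    return b0 + b1 * x


print(b0, b1)                    # 46.0 6.0
print(predict(4) - predict(3))   # 6.0
```

Raising x by 1 unit changes the predicted y by exactly b1, here 6 points per extra hour.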
4 types of scatterplot trends
linear, curved, clusters, no pattern
what does it mean if a scatterplot has an exotic form?
nonlinear with sharp points
What are 2 types of unusual features of a scatterplot?
outliers & subgroups
what would the graph of the equation: ^y=b0+b1x look like?
a straight line: positive (rising) if b1 > 0, negative (falling) if b1 < 0
What are the 3 types of relationships that can be determined from a scatterplot?
positive, negative, curvilinear
3 types of scatterplot directions
positive, negative, no direction
high leverage points
pull the line close to them -large effect -may determine slope & y intercept
what is the equation for correlation coefficient?
r = Σ (xi - mean of x)(yi - mean of y) / ((n-1) Sx Sy)
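The formula translates almost term by term into Python. This is a minimal sketch using only the standard library; `pearson_r` is a hypothetical name and the data are made up.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Sum of (xi - mean of x)(yi - mean of y), divided by (n-1)*Sx*Sy."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n-1 divisor)
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return total / ((n - 1) * s_x * s_y)


# Points exactly on a rising line give r = 1.
r_line = pearson_r([1, 2, 3], [2, 4, 6])

# Made-up data where y tends to fall as x rises gives a negative r.
r_neg = pearson_r([1, 2, 3, 4, 5], [9, 7, 6, 4, 1])

print(round(r_line, 3), round(r_neg, 3))
```

`stdev` uses the n-1 divisor, which matches the (n-1) in the denominator of the formula.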
what is the symbol for the coefficient of determination?
r^2
what are the 3 ways to study the relationship between 2 quantitative variables?
scatterplot, correlation, regression
what do Sx and Sy stand for?
standard deviation of X and Y
What does a correlation of r=0.0 between 2 variables mean?
the best straight line through the data is horizontal
residual
the difference between an OBSERVED value and the PREDICTED value (observed - predicted): e = y - ^y
what does ^y represent?
the predicted response
negative residual
the predicted values overestimate the actual data
positive residual
the predicted values underestimate the actual data value
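The sign rules for residuals can be seen on a small example. The sketch below assumes a fitted line ^y = 46 + 6x and made-up data. Both are illustrative, not from the course.

```python
# Assumed fitted line (illustrative): y-hat = 46 + 6x
b0, b1 = 46.0, 6.0

# Made-up data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

# Residual e = observed y minus predicted y-hat.
residuals = [y - (b0 + b1 * x) for x, y in zip(hours, scores)]

for x, e in zip(hours, residuals):
    if e > 0:
        note = "positive: prediction underestimates"
    elif e < 0:
        note = "negative: prediction overestimates"
    else:
        note = "zero: prediction is exact"
    print(x, e, note)

print(residuals)  # [0.0, 2.0, -3.0, 0.0, 1.0]
```

At x = 3 the line predicts 64 but the observed score is 61, so the residual is -3 and the prediction overestimates.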
what does the coefficient of determination measure?
the proportion of variation in the response (y) that is explained by the explanatory (independent) variable x
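"Proportion of variation explained" can be computed directly as 1 - (leftover variation around the line) / (total variation in y) and compared with r squared. The sketch below uses made-up data and the standard least-squares formulas, which are an assumption beyond what the cards state.

```python
from statistics import mean

# Made-up data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

x_bar, y_bar = mean(hours), mean(scores)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
sxx = sum((x - x_bar) ** 2 for x in hours)
syy = sum((y - y_bar) ** 2 for y in scores)  # total variation in y

# Least-squares slope and intercept.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Variation left over after using the line: sum of squared residuals.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hours, scores))

explained = 1 - sse / syy
r = sxy / (sxx * syy) ** 0.5

print(round(explained, 4), round(r ** 2, 4))  # 0.9626 0.9626
```

The two numbers agree, so about 96% of the variation in these scores is explained by hours studied.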
extrapolation
the use of a regression line for prediction outside the range of observed x-values; can't be trusted
interpolation
the use of a regression line for prediction within the range of observed x-values
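A quick check of whether a prediction is interpolation or extrapolation is to compare the new x with the smallest and largest observed x. `prediction_kind` is a hypothetical helper and the data are made up.

```python
def prediction_kind(x_new, observed_xs):
    """Classify a prediction by whether x_new lies inside the observed x range."""
    if min(observed_xs) <= x_new <= max(observed_xs):
        return "interpolation"
    return "extrapolation"


# Made-up observed x-values (hours studied).
hours = [1, 2, 3, 4, 5]

print(prediction_kind(3.5, hours))  # interpolation
print(prediction_kind(12, hours))   # extrapolation
```

Predicting a score for 12 hours of study would use the line far beyond the data it was fit to, so that prediction can't be trusted.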
what does the regression line predict?
the value for the response variable (y) as a straight-line function of the value x of the explanatory variable
Two variables have a positive relationship when
the values of one increase as the other increases
What is a scatterplot?
two-dimensional graph of data values
lurking variable
usually unobserved, influences the association between the variables of primary interest
confounding
when 2 explanatory variables are both associated w/ a response variable & each other
what is the general equation of a regression line?
y= a + bx + error