chapter 4: stats 10

Always use a phrase like ____ ____ when describing an association because the trend you are describing has variability - the association you are describing may not be true for all individuals.

"tends to"

Correlation says

"there seems to be a linear association between these two variables" but it doesn't tell us what that association is

The response variable

(dependent variable) is the variable of interest.
• It is the outcome variable.
• It goes on the y-axis.

The explanatory variable

(independent variable) is the variable that serves as the predictor.
• It goes on the x-axis.

Correlation is always between _____ and _____

-1 and +1.

We have no correlation when r is equal to

0 (r = 0).

Trend: the general tendency of the scatterplot as you read from left to right. Typical trends:

1. Increasing (uphill), called a positive association
2. Decreasing (downhill), called a negative association
3. No trend, if there is neither an uphill nor a downhill tendency

Examining Scatterplots: note three features

1. Trend (like center)
2. Strength (like spread)
3. Shape

The correlation between weight and gas mileage of cars is -0.96. Which of the following is the correct value and interpretation of r-squared of the linear model for predicting gas mileage from weight?

92% of the variation in the gas mileage is explained by the weight of the cars.
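
The arithmetic behind this answer can be checked with a minimal sketch. It only squares the correlation given in the question; the variable names are chosen here for illustration.

```python
# r is the correlation between weight and gas mileage from the question.
r = -0.96

# Squaring r removes the sign and gives the share of variation explained.
r_squared = r ** 2

print(f"r^2 = {r_squared:.4f} ({r_squared:.0%} of the variation explained)")
# prints: r^2 = 0.9216 (92% of the variation explained)
```

The negative sign of r says that heavier cars tend to get lower mileage; it disappears when r is squared.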

Understanding the Correlation Coefficient

• Changing the order of the variables does not change the correlation coefficient (r).
• Adding a constant or multiplying by a positive constant does not affect r.
• The correlation coefficient is unitless.
• The data must have a linear trend.
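
These properties can be checked numerically. The sketch below is only an illustration: the data are made up, and the helper `corr` is a hypothetical name that computes r from sums of deviations, which is algebraically the same as the z-score formula.

```python
from math import sqrt
from statistics import mean

def corr(xs, ys):
    """Pearson correlation computed from sums of deviations from the means."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data for illustration (not from the chapter).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_original = corr(x, y)
r_swapped = corr(y, x)                        # order of the variables swapped
r_shifted = corr([v + 10 for v in x], y)      # constant added to x
r_rescaled = corr([v * 2.54 for v in x], y)   # x multiplied by a positive constant

print(round(r_original, 3), round(r_swapped, 3), round(r_shifted, 3), round(r_rescaled, 3))
```

All four values agree up to floating-point rounding, which is why a change of units (for example inches to centimeters) leaves r untouched.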

Extrapolation

Do not extrapolate beyond the data, i.e., do not make a prediction for an x value outside the range of the data. The linear model may no longer hold outside that range.
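
One way to enforce this rule in practice is to refuse to predict outside the observed x range. The sketch below is an assumption-laden illustration: the function name, the line (intercept 2.2, slope 0.6), and the range 1 to 5 are all hypothetical.

```python
def predict_within_range(x, intercept, slope, x_min, x_max):
    """Return the predicted y, or None when x lies outside the observed x range."""
    if not (x_min <= x <= x_max):
        return None  # refuse to extrapolate
    return intercept + slope * x

# Hypothetical line and observed x range, for illustration only.
inside = predict_within_range(3, intercept=2.2, slope=0.6, x_min=1, x_max=5)
outside = predict_within_range(12, intercept=2.2, slope=0.6, x_min=1, x_max=5)

print(inside, outside)
```

The first call returns a prediction because x = 3 lies inside the data range; the second returns None because x = 12 does not.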

Interpretation of the slope:

For each unit increase in x, we expect y to increase/decrease on average by the value of the slope.
• A positive slope implies that as x increases, y increases.
• A negative slope implies that as x increases, y decreases.
• When the correlation (r) is positive, the slope (b) will be positive, and when the correlation (r) is negative, the slope (b) will be negative.

Assessing Model Fit

In order to assess if the linear model is a good fit, we look to see how much of the variation in the response variable is accounted for by the model, i.e. explained by the explanatory variable.

Use Scatterplots to Investigate Associations Between ________ Variables

Numerical

Finding the Regression Line

The regression line is the line for which the sum of the squares of the vertical distances between the observed y-values (y) and the values predicted by the line (ŷ) is the smallest. In other words, it minimizes Σ(y − ŷ)².
• Also called the least squares line.
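
The "smallest sum of squares" idea can be seen by comparing the least squares line with a few nudged alternatives. This is a minimal sketch on made-up data; the helper `sse` is a hypothetical name, and the slope is computed from sums of deviations, which gives the same value as r × (s_y / s_x).

```python
from statistics import mean

def sse(xs, ys, intercept, slope):
    """Sum of squared vertical distances between observed y and predicted y."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data for illustration (not from the chapter).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = mean(x), mean(y)
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx

best = sse(x, y, intercept, slope)

# Nudge the line in a few directions; every alternative should fit worse.
alternatives = [
    sse(x, y, intercept + 0.5, slope),
    sse(x, y, intercept - 0.5, slope),
    sse(x, y, intercept, slope + 0.2),
    sse(x, y, intercept, slope - 0.2),
]

print(all(alt > best for alt in alternatives))  # prints True
```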

Intercept

Once we have the slope, we can calculate the intercept: a = ȳ − b·x̄, where a is the intercept, b is the slope, and x̄ and ȳ are the means of x and y.
• We need to find the means of variables x and y.
• Interpretation of the intercept: when x equals 0, we expect y to equal the intercept.
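
A short sketch of the intercept calculation, intercept = ȳ − slope × x̄. The means and slope below are hypothetical values picked for illustration, not figures from the chapter.

```python
# Hypothetical summary values, for illustration only.
x_bar = 3.0   # mean of the explanatory variable
y_bar = 4.0   # mean of the response variable
slope = 0.6   # slope, assumed to be already computed

# Intercept of the regression line: mean of y minus slope times mean of x.
intercept = y_bar - slope * x_bar

# The regression line always passes through the point of means (x_bar, y_bar).
y_at_x_bar = intercept + slope * x_bar

print(f"intercept = {intercept:.1f}, prediction at x_bar = {y_at_x_bar:.1f}")
# prints: intercept = 2.2, prediction at x_bar = 4.0
```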

Coefficient of Determination

The coefficient of determination, or the correlation coefficient squared (r-squared), gives the percentage of the variation in y that is explained by x.
• When interpreting a regression model, we need to explain in context what r-squared means.
• Burger King model: 69% of the variation in the fat content (response or y-variable) is explained by the protein content (explanatory or x-variable) of the burger.

correlation coefficient (r)

The correlation coefficient (r) gives us a numerical measurement of the strength of the linear relationship between two numerical variables.
• Also called the Pearson correlation coefficient.
• Correlation only makes sense if the trend is linear.
• Correlation has no units.

Linear Model

The linear model is just an equation of a straight line through the data. • The points in the scatterplot don't all line up, but a straight line can summarize the general pattern. • The linear model can help us understand how the explanatory (independent) and response (dependent) variables are associated.

Slope

The slope can be calculated using the correlation coefficient, r, and the standard deviations of the explanatory variable (s_x) and the response variable (s_y): slope = r × (s_y / s_x)
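
As a minimal sketch of the formula, the function below multiplies r by the ratio of the standard deviations. The function name and the summary statistics are hypothetical, chosen only to show that the slope takes the sign of r.

```python
def slope_from_summary(r, s_x, s_y):
    """Slope of the regression line: r times (s_y / s_x)."""
    return r * (s_y / s_x)

# Hypothetical summary statistics, for illustration only.
uphill = slope_from_summary(r=0.5, s_x=2.0, s_y=3.0)
downhill = slope_from_summary(r=-0.5, s_x=2.0, s_y=3.0)

print(uphill, downhill)  # prints 0.75 -0.75
```

Because standard deviations are never negative, the sign of the slope always matches the sign of r.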

Negative correlation

We have negative correlation when r is less than 0 (r < 0).
• The closer the correlation is to -1, the stronger the association between the two variables.

Positive Correlation

We have positive correlation when r is greater than 0 (r > 0).
• The closer the correlation is to 1, the stronger the association between the two variables.

Correlation Does Not Mean ________

causation • Do not conclude that a cause-and-effect relationship between two variables exists just because there is a strong correlation.

The sign of a correlation coefficient gives the _____ ___

direction of the association: + means increasing, − means decreasing

regression line

is a tool for making predictions about future observations. • It is a useful method for summarizing a linear relationship.

residual

is the difference between the observed value and its associated predicted value. Residual = observed value − predicted value = y − ŷ
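
A small worked sketch of a residual, using the chapter's Burger King model (Predicted Fat = 6.8 + 0.97 × Protein). The observed burger, with 30 g of protein and 40 g of fat, is a made-up observation used only for illustration.

```python
def predicted_fat(protein_g):
    """Burger King model from the chapter: Predicted Fat = 6.8 + 0.97 * Protein."""
    return 6.8 + 0.97 * protein_g

# Hypothetical burger: 30 g of protein with an observed 40 g of fat.
observed_fat = 40.0

# Residual = observed value - predicted value.
residual = observed_fat - predicted_fat(30)

print(f"residual = {residual:.1f} g")  # prints: residual = 4.1 g
```

A positive residual means the point sits above the line (the model predicted too little); a negative residual means the point sits below it.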

Each point in the scatterplot represents ____ _________

one observation

Regression Line ____ matters

order

Correlation is sensitive to ____

outliers

Finding the Correlation Coefficient

r = ∑(zx × zy) / (n − 1), where zx is the z-score for the x-variable, zy is the z-score for the y-variable, and n is the sample size.
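
The formula can be followed step by step in a short sketch: convert each variable to z-scores, multiply them pairwise, add the products, and divide by n − 1. The data and the function names below are made up for illustration.

```python
from statistics import mean, stdev

def z_scores(values):
    """Convert values to z-scores using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def correlation(xs, ys):
    """r = sum of (zx * zy) divided by (n - 1)."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Made-up data for illustration (not from the chapter).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
print(round(r, 3))  # prints 0.775
```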

Coefficient of Determination

r^2 = 0 means that none of the variability in y is explained by x.
r^2 = 1 means that all of the variability in y is explained by x.
• While the correlation coefficient is between -1 and 1, r^2 is between 0 and 1: 0 ≤ r^2 ≤ 1.
• We would like r^2 to be as close to 100% as possible.

Prediction

Simply plug the value for x into the equation for the regression line and calculate the predicted y.
Predicted Fat = 6.8 + 0.97 × Protein
Predicted Fat = 6.8 + 0.97 × 30 = 35.9 g
ŷ = 35.9 g

Scatterplots with small amounts of scatter or little vertical variation indicate a ____ association.

strong

Scatterplots with large amounts of scatter or vertical variation indicate a _____ association.

weak

predicted value

ŷ represents the predicted value or the estimate made from a model.

equation for a straight line.

y = mx + b, where m is the slope and b is the intercept, or y = a + bx, where a is the intercept and b is the slope. In both forms, y is the y-variable and x is the x-variable.

A correlation near ____ corresponds to a weak linear association.

zero

Pitfalls to Avoid

• Don't fit linear models to nonlinear associations.
• Correlation is not causation.
• Beware of outliers! Try the regression and correlation with and without influential points to see the differences.
• Be careful of regressions of aggregate data (data of means rather than individuals).
• Don't extrapolate.

