AP Stats Chapter 3

To achieve linearity when working with a power model of the form y = ax^p, you can transform the data by...

1. Raising the values of the explanatory variable x to the p power and plotting the points (x^p, y). 2. Taking the pth root of the values of the response variable y and plotting the points (x, y^(1/p)).
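
Both transformations can be checked on a small example. The sketch below is only an illustration: the data are made up to follow y = 4x^2 exactly, so p = 2 is known in advance, and all variable names are hypothetical.

```python
# Made-up data that follow the power model y = a * x**p with a = 4 and p = 2.
xs = [1, 2, 3, 4, 5]
ys = [4 * x ** 2 for x in xs]
p = 2

def successive_slopes(us, vs):
    """Slope between each pair of neighboring points (u, v)."""
    return [(vs[i + 1] - vs[i]) / (us[i + 1] - us[i]) for i in range(len(us) - 1)]

# Option 1: raise x to the p power and look at the points (x^p, y).
x_pow = [x ** p for x in xs]
slopes_option_1 = successive_slopes(x_pow, ys)

# Option 2: take the pth root of y and look at the points (x, y^(1/p)).
y_root = [y ** (1 / p) for y in ys]
slopes_option_2 = successive_slopes(xs, y_root)

# A constant slope between neighboring points means the points fall on a line.
print(slopes_option_1)
print(slopes_option_2)
```

With real data the slopes would only be roughly constant, and a scatterplot of the transformed points is the usual way to judge linearity.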

Cautions about Correlation

1. Correlation doesn't imply causation. 2. Correlation does not measure form. 3. Correlation should only be used to describe linear relationships. 4. Correlation is not a resistant measure of strength.

Facts About Correlation

1. Correlation requires that both variables be quantitative. 2. Correlation makes no distinction between explanatory and response variables. 3. r does not change when we change the units of measurement of x, y, or both. 4. The correlation r has no unit of measurement. It's just a number.
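
Facts 2 and 3 can be verified numerically. This is a minimal sketch, assuming the usual definition of r as the sum of products of z-scores divided by n - 1. The height and weight values are invented for illustration, and `correlation` is a hypothetical helper, not a library function.

```python
import math

def correlation(xs, ys):
    """Correlation r: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

# Made-up data: heights in inches and weights in pounds.
heights_in = [60, 62, 65, 68, 70, 73]
weights_lb = [115, 120, 140, 155, 160, 185]

# The same measurements expressed in centimeters and kilograms.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.45359237 for w in weights_lb]

r_original = correlation(heights_in, weights_lb)
r_converted = correlation(heights_cm, weights_kg)   # units changed
r_swapped = correlation(weights_lb, heights_in)     # roles of x and y swapped

print(r_original, r_converted, r_swapped)
```

All three values agree up to floating-point rounding, because z-scores have no units and the formula treats x and y symmetrically.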

bivariate data

A data set that describes the relationship between two variables

univariate data

A one-variable data set

regression line

A regression line is a line that describes how a response variable y changes as an explanatory variable x changes. Regression lines are expressed in the form ŷ = a + bx, where ŷ (pronounced "y-hat") is the predicted value of y for a given value of x.

Interpreting a Regression Line

A regression line is a model for the data, much like the density curves of Chapter 2. The y intercept and slope of the regression line describe what this model tells us about the relationship between the response variable y and the explanatory variable x. CAUTION: When asked to interpret the slope or y intercept, it is very important to include the word predicted (or equivalent) in your response. Otherwise, it might appear that you believe the regression equation provides actual values of y.

residual

A residual is the difference between the actual value of y and the value of y predicted by the regression line. residual = actual y − predicted y = y − ŷ
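
As a small illustration of the definition, the sketch below computes residuals for a hypothetical regression line ŷ = 2 + 0.5x. The line and the observed points are invented for the example.

```python
def predict(x, a, b):
    """Predicted value y-hat = a + b*x for a regression line."""
    return a + b * x

def residual(x, y, a, b):
    """Residual = actual y - predicted y."""
    return y - predict(x, a, b)

# Hypothetical line y-hat = 2 + 0.5x.
a, b = 2.0, 0.5

# Observed point (10, 8): the line predicts 7, so the residual is 8 - 7 = 1.
res_above = residual(10, 8, a, b)

# Observed point (10, 6): the residual is 6 - 7 = -1.
res_below = residual(10, 6, a, b)

print(res_above, res_below)
```

A positive residual means the point lies above the line (the line underpredicts), and a negative residual means it lies below the line.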

residual plot

A residual plot is a scatterplot that displays the residuals on the vertical axis and the explanatory variable on the horizontal axis.

Form

A scatterplot can show a linear form or a nonlinear form. The form is linear if the overall pattern follows a straight line. Otherwise, the form is nonlinear.

Direction

A scatterplot can show a positive association, negative association, or no association.

Strength

A scatterplot can show a weak, moderate, or strong association. An association is strong if the points don't deviate much from the form identified. An association is weak if the points deviate quite a bit from the form identified.

Measuring Linear Association: Correlation

A scatterplot displays the direction, form, and strength of a relationship between two quantitative variables. When the association between two quantitative variables is linear, we can use the correlation r to help describe the strength and direction of the association. CAUTION: It is only appropriate to use the correlation to describe strength and direction for a linear relationship.

What if x and y have a relationship that can be described by an exponential model of the form y = ab^x?

A scatterplot of the logarithm of y against x should produce a roughly linear association. WHY? Take the logarithm of both sides of y = ab^x: log(y) = log(ab^x) = log(a) + log(b^x) = log(a) + x∙log(b). This is a linear equation in x with slope log(b) and y intercept log(a).
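
The algebra can be checked on numbers. This sketch assumes made-up data generated exactly from y = 3∙2^x, so a = 3 and b = 2 are known in advance; real data would only follow the pattern approximately.

```python
import math

# Made-up data generated exactly from y = a * b**x with a = 3 and b = 2.
xs = [0, 1, 2, 3, 4, 5]
ys = [3 * 2 ** x for x in xs]

# Transform the response variable only.
log_ys = [math.log10(y) for y in ys]

# x goes up by 1 each time, so constant steps in log(y) mean the
# points (x, log y) lie on a straight line.
steps = [log_ys[i + 1] - log_ys[i] for i in range(len(log_ys) - 1)]

slope = steps[0]        # equals log10(b)
intercept = log_ys[0]   # equals log10(a), because x = 0 at the first point

# Undo the logarithm to recover the parameters of the exponential model.
b_recovered = 10 ** slope
a_recovered = 10 ** intercept

print(steps)
print(a_recovered, b_recovered)
```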

When transforming data that follow a power model, what if you have no idea what value of p to use?

A scatterplot of the logarithms of both variables should produce a linear pattern. WHY? Take the logarithm of both sides of y = ax^p: log(y) = log(ax^p) = log(a) + log(x^p) = log(a) + p∙log(x). This is a linear equation in log(x) with slope p and y intercept log(a).
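
One consequence is that the slope of a line fitted to (log x, log y) estimates the unknown power p. The sketch below assumes made-up data generated from y = 0.5x^3. The helper `least_squares` is a hypothetical name, and it uses the slope formula Σ(x − x̄)(y − ȳ) / Σ(x − x̄)^2, which is algebraically the same as r∙s_y/s_x.

```python
import math

def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Made-up data generated exactly from y = 0.5 * x**3 (so p = 3).
xs = [1, 2, 3, 4, 5, 6]
ys = [0.5 * x ** 3 for x in xs]

# Transform BOTH variables.
log_xs = [math.log10(x) for x in xs]
log_ys = [math.log10(y) for y in ys]

intercept, slope = least_squares(log_xs, log_ys)

p_estimate = slope            # slope of the log-log line estimates p
a_estimate = 10 ** intercept  # intercept is log10(a)

print(p_estimate, a_estimate)
```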

Scatterplot

A scatterplot shows the relationship (association) between two quantitative variables measured on the same individuals. The values of one variable appear on the horizontal axis, and the values of the other variable appear on the vertical axis. Each individual in the data set appears as a point in the graph.

influential point

An influential point in regression is any point that, if removed, substantially changes the slope, y intercept, correlation, coefficient of determination, or standard deviation of the residuals.
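
A quick way to see influence is to fit the line with and without the suspect point. The sketch below uses invented data and a hypothetical `least_squares` helper. The added point (20, 1) has a much larger x value than the rest, so it also has high leverage.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Made-up data with a positive association.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# The same data plus one point far to the right that breaks the pattern.
xs_with = xs + [20]
ys_with = ys + [1]

intercept_without, slope_without = least_squares(xs, ys)
intercept_with, slope_with = least_squares(xs_with, ys_with)

print(slope_without, slope_with)
```

For these numbers the slope is 0.6 without the extra point and turns negative once it is included, so a single observation reverses the direction of the fitted line.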

outlier

An outlier in regression is a point that does not follow the pattern of the data and has a large residual.

Transforming with Powers and Roots

Applying a function such as the logarithm or square root to a quantitative variable is called transforming the data. Transforming data amounts to changing the scale of measurement that was used when the data were collected. Consider a power model of the form y = ax^p. If we transform the values of the explanatory variable x by raising them to the p power and graph the points (x^p, y), the scatterplot should have a linear form.

Correlation and Regression Wisdom

Correlation and regression are powerful tools for describing the relationship between two variables. When you use these tools, you should be aware of their limitations. •Correlation and regression lines describe only linear relationships. •Correlation and least-squares regression lines are not resistant. •Association does not imply causation.

Extrapolation

Extrapolation is the use of a regression line for prediction far outside the interval of x values used to obtain the line. Such predictions are often not accurate. CAUTION: Don't make predictions using values of x that are much larger or much smaller than those that actually appear in your data.

correlation r

For a linear association between two quantitative variables, the correlation r measures the direction and strength of the association.

Regression to the Mean

For an increase of 1 standard deviation in the value of the explanatory variable x, the least-squares regression line predicts an increase of r standard deviations in the response variable y. This is called regression to the mean, because the values of y "regress" to their mean.
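
The statement follows from the summary-statistics formulas for the least-squares line, slope b = r∙s_y/s_x and y intercept a = ȳ − b∙x̄. A short derivation, written in standard notation as a sketch:

```latex
\begin{aligned}
\hat{y} &= a + bx = (\bar{y} - b\bar{x}) + bx = \bar{y} + b\,(x - \bar{x}) \\
\hat{y} - \bar{y} &= r\,\frac{s_y}{s_x}\,(x - \bar{x}) \\
\frac{\hat{y} - \bar{y}}{s_y} &= r \cdot \frac{x - \bar{x}}{s_x}
\end{aligned}
```

In standard units, the predicted y is r times the standardized x. Because −1 ≤ r ≤ 1, the prediction is never farther from ȳ (in standard deviations) than x is from x̄.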

coefficient of determination r^2

The coefficient of determination r^2 measures the percent reduction in the sum of squared residuals when using the least-squares regression line to make predictions, rather than the mean value of y. In other words, r^2 measures the percent of the variability in the response variable that is accounted for by the least-squares regression line. r^2 tells us how much better the LSRL does at predicting values of y than simply guessing the mean y for each value in the dataset.
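
The "percent reduction" reading can be computed directly by comparing two sums of squared residuals. This is a minimal sketch on invented data, and the helper and variable names are hypothetical.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Made-up data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)
y_bar = sum(ys) / len(ys)

# Sum of squared residuals when every prediction is the mean of y.
ss_mean = sum((y - y_bar) ** 2 for y in ys)

# Sum of squared residuals when predictions come from the regression line.
ss_line = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Proportion by which the line reduces the sum of squared residuals.
r_squared = (ss_mean - ss_line) / ss_mean

print(r_squared)
```

For these numbers the sums are 6 and 2.4, so r^2 is 0.6: the line accounts for about 60% of the variability in y.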

explanatory variable

may help predict or explain changes in a response variable.

response variable

measures an outcome of a study

How to Make a Scatterplot

•Label the axes: The explanatory variable goes on the horizontal axis (x-axis). The response variable goes on the vertical axis (y-axis). If there is no explanatory variable, either variable can go on the horizontal axis. •Scale the axes. •Plot individual data values.

Analysis of relationships between two variables builds on the same tools we used to analyze one variable:

•Plot the data, then look for overall patterns and departures from those patterns. •Add numerical summaries. •When there's a regular overall pattern, use a simplified model to describe it.

Some Important Properties of the Correlation r

•The correlation r is always a number between -1 and 1 (-1 ≤ r ≤ 1). •The correlation r indicates the direction of a linear relationship by its sign: r > 0 for a positive association and r < 0 for a negative association. •The extreme values r = -1 and r = 1 occur only in the case of a perfect linear relationship, when the points lie exactly along a straight line. •If the linear relationship is strong, the correlation r will be close to 1 or -1. •If the linear relationship is weak, the correlation r will be close to 0.

A residual plot magnifies the deviations of the points from the line, making it easier to see unusual observations and patterns. If a regression model is appropriate:

•The residual plot should show no obvious patterns. •The residuals should be relatively small in size.

In the regression equation ŷ = a + bx:

•a is the y intercept, the predicted value of y when x = 0 •b is the slope, the amount by which the predicted value of y changes when x increases by 1 unit

A number of statistical software packages produce similar regression output. Be sure you can locate

•the slope b •the y intercept a •the value of s •the value of r^2

negative association

Two variables have a negative association when above-average values of one variable tend to accompany below-average values of the other variable.

positive association

Two variables have a positive association when above-average values of one variable tend to accompany above-average values of the other variable and when below-average values also tend to occur together.

Calculating the Regression Equation from Summary Statistics

Using technology is often the most convenient way to find the equation of a least-squares regression line. It is also possible to calculate the equation of the least-squares regression line using only the means and standard deviations of the two variables and their correlation.

Consider a power model of the form y = ax^p

Graphing the points (x^p, y) should produce a scatterplot with a linear form. OR Graphing the points (x, y^(1/p)), where y^(1/p) is the pth root of y, should produce a scatterplot with a linear form.

Residuals

In most cases, no line will pass exactly through all the points in a scatterplot. Because we use the line to predict y from x, the prediction errors we make are errors in y, the vertical direction in the scatterplot. These vertical distances are called residuals (the "leftover" variation in the response variable).

regression lines

Linear (straight-line) relationships between two quantitative variables are common. A regression line summarizes the relationship between two variables, but only in a specific setting: when one variable helps explain the other.

Unusual features

Look for individual points that fall outside the overall pattern and distinct clusters of points.

Explanatory and Response Variables

Most statistical studies examine data on more than one variable. Analysis of relationships between two variables builds on the same tools we used to analyze one variable. Note: In many studies, the goal is to show that changes in one or more explanatory variables actually cause changes in a response variable. However, other explanatory-response relationships don't involve direct causation.

Determining if a Linear Model Is Appropriate: Residual Plots

One of the first principles of data analysis is to look for an overall pattern and for striking departures from the pattern. A regression line describes the overall pattern of a linear relationship between an explanatory variable and a response variable. We see departures from this pattern by looking at a residual plot.

high leverage

Points with high leverage in regression have much larger or much smaller x values than the other points in the data set.

How to Calculate the Correlation r

Suppose that we have data on variables x and y for n individuals. The values for the first individual are x_1 and y_1, the values for the second individual are x_2 and y_2, and so on. The means and standard deviations of the two variables are x̄ and s_x for the x-values and ȳ and s_y for the y-values.
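
With that notation, the correlation is usually written as the sum of the products of the standardized values, divided by n − 1. The formula below is the standard textbook definition, supplied here as an addition since it is not stated on this card:

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```

Each factor is a z-score, which is why r has no units and does not change when the units of x or y change.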

How to Interpret a Residual Plot

To determine whether the regression model is appropriate, look at the residual plot. •If there is no leftover curved pattern in the residual plot, the regression model is appropriate. •If there is a leftover curved pattern in the residual plot, consider using a regression model with a different form.
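
A leftover curved pattern is easy to produce on purpose. The sketch below fits a line to made-up data that actually follow y = x^2 and lists the residuals. The `least_squares` helper is hypothetical, and no plotting library is used, so the "plot" is just the list of (x, residual) pairs.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Made-up data with a curved (quadratic) form.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

a, b = least_squares(xs, ys)

# Residual = actual y - predicted y, paired with x for the residual plot.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
residual_plot_points = list(zip(xs, residuals))

print(residual_plot_points)
```

The residuals come out positive, negative, negative, negative, positive as x increases. That U shape is the leftover curved pattern that signals a linear model is not appropriate for these data.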

least-squares regression line

The least-squares regression line is the line that makes the sum of the squared residuals as small as possible.

standard deviation of the residuals s

The standard deviation of the residuals s measures the size of a typical residual. That is, s measures the typical distance between the actual y values and the predicted y values.
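
The card does not give a formula for s. A commonly used one, assumed here, is s = sqrt(sum of squared residuals / (n − 2)). The sketch below applies it to invented data whose least-squares line works out to ŷ = 2.2 + 0.6x.

```python
import math

# Made-up data and the least-squares line fitted to it, y-hat = 2.2 + 0.6x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

# Residual = actual y - predicted y for each observation.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Assumed formula: divide the sum of squared residuals by n - 2.
n = len(xs)
sum_sq = sum(res ** 2 for res in residuals)
s = math.sqrt(sum_sq / (n - 2))

print(s)
```

Here s is about 0.89, so the actual y values typically differ from the predicted values by roughly 0.89 units of y.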

The Least-Squares Regression Line

There are many different lines we could use to model the association in a particular scatterplot. A good regression line makes the residuals as small as possible. The regression line we prefer is the one that minimizes the sum of the squared residuals.

no association

There is no association between two variables if knowing the value of one variable does not help us predict the value of the other variable.

How Well the Line Fits the Data: The Role of s and r^2 in Regression

To assess how well the line fits all the data, we need to consider the residuals for each observation, not just one. Using these residuals, we can estimate the "typical" prediction error when using the least-squares regression line. The standard deviation of the residuals s gives us a numerical estimate of the average size of our prediction errors. There is another numerical quantity that tells us how well the least-squares regression line predicts values of the response y: the coefficient of determination r^2.

How to Describe a Scatterplot

To describe a scatterplot, make sure to address the following four characteristics in the context of the data: direction, form, strength, and unusual features. CAUTION: When describing the association shown in a scatterplot, write in the context of the problem. This means that you need to use both variable names in your description.

How to Calculate the Least-squares Regression Line Using Summary Statistics

We have data on an explanatory variable x and a response variable y for n individuals. From the data, calculate the means x̄ and ȳ and the standard deviations s_x and s_y of the two variables and their correlation r. The least-squares regression line is the line ŷ = a + bx with slope b = r∙(s_y/s_x) and y intercept a = ȳ − b∙x̄.
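
The two formulas translate directly into code. In this sketch the function name `lsrl_from_summary` and the summary statistics are hypothetical, chosen so the arithmetic is easy to follow.

```python
def lsrl_from_summary(x_bar, y_bar, s_x, s_y, r):
    """Return (a, b) for y-hat = a + b*x from summary statistics."""
    b = r * s_y / s_x        # slope
    a = y_bar - b * x_bar    # y intercept
    return a, b

# Hypothetical summary statistics.
x_bar, y_bar = 10.0, 50.0
s_x, s_y = 2.0, 8.0
r = 0.5

a, b = lsrl_from_summary(x_bar, y_bar, s_x, s_y, r)
# b = 0.5 * 8 / 2 = 2.0 and a = 50 - 2 * 10 = 30.0

# The line always passes through the point (x-bar, y-bar).
predicted_at_mean = a + b * x_bar

print(a, b, predicted_at_mean)
```

The fitted line here is ŷ = 30 + 2x, and plugging in x = x̄ returns ȳ = 50, as it does for any least-squares line.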

ASSOCIATION DOES NOT IMPLY CAUSATION

When we study the relationship between two variables, we often hope to show that changes in the explanatory variable cause changes in the response variable. CAUTION: A strong association between two variables is not enough to draw conclusions about cause and effect.

