STATISTICS Module 10 Notes


The value of the correlation coefficient (r) is certainly important, but the real importance comes from using r to obtain the coefficient of determination

which is equal to r squared.

Residuals

Differences between the predicted y-value and the observed y-value for a given x-value.

Calculating the correlation (r) of two variables requires paired interval or ratio data. To properly use a correlation coefficient requires that the two variables make sense to pair together.

For example, it would make sense to determine the relationship between the two variables height and weight, but, although both variables are on a ratio scale, it probably does not make sense to investigate the relationship between height and the average amount of money spent monthly on utilities.

A correlation is a relationship between two variables. You would look for correlations only between variables that have paired interval or ratio data.

For example, you might look for a correlation between the price of amusement park tickets and the number of tickets sold. It would not make sense to look for a correlation between the number of tickets sold and how many tickets were printed.

If we know that two variables have a strong positive correlation: We use the correlation coefficient to build a regression equation that we can use to make predictions about one of our variables from what we know about the other variable.

High scores on one variable tend to go with high scores on the second variable. Low scores on one variable tend to go with low scores on the other.

-.81 is a stronger correlation because it is closer to -1 than .65 is to 1.

The correlation coefficient is limited to the range of -1 to 1.

The coefficient of determination is a measure of the amount of variance that exists in our data that can be explained. In other words, r^2 is an indication of the amount of variance in one variable that is explained by the variance in the other variable.

The data we collect is variable

To determine if the correlation is statistically significant, we need to compare our obtained value for r to the critical value for r.

The first step in this process is to calculate our degrees of freedom, n - 2. Using the r-Table, we scroll down the left column to the degrees of freedom value, and scroll over to the column under the .05 level of significance.

Pearson's Product Moment Correlation is used to describe linear relationships between two quantitative variables.

The symbol used for Pearson's Product Moment Correlation is r.

r^2 = .80^2 = .64. The coefficient of determination (r^2) is .64. Since r ranges from -1 to 1, r^2 must fall between 0 and 1 ((-1)^2 = 1; r^2 cannot be a negative number).

To simplify the interpretation of r^2, we convert the value of r^2 to a percentage by multiplying it by 100: .64 × 100 = 64%.
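The conversion from r to a percentage of explained variance can be sketched in a few lines of Python. This is only an illustration; the function name `coefficient_of_determination` is a made-up helper, and the value r = .80 is the one used in the example above.

```python
def coefficient_of_determination(r):
    """Return r squared for a correlation coefficient r between -1 and 1."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    return r * r


r = 0.80                                   # correlation from the example above
r_squared = coefficient_of_determination(r)
percent_explained = r_squared * 100        # convert to a percentage

print(f"r^2 = {r_squared:.2f}, or {percent_explained:.0f}% of the variance")
```

Because r is squared, a negative r such as -.80 gives the same r^2 of .64.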

Pearson's r gives us an indication of the strength and the direction of the relationship.

Values closer to 1 or -1 indicate very strong relationships or correlations. Values closer to 0 indicate weaker relationships or correlations.

Correlation is important for determining if a relationship exists between two variables, but it also gives us information that will allow us to use one variable to predict the value of the second variable.

We are often interested in using the information we know about one variable to make a prediction for what the value of the second variable will be. For example, colleges use high school grade point averages or standardized test scores to make predictions about how well a prospective student will perform in college.

The coefficient of determination should be considered symmetrical

that is, r^2 indicates the amount of variation in variable Y that can be attributed to the variance in variable X AND the amount of variability in variable X that can be attributed to the variation in variable Y.

A correlation is a relationship that exists between two variables. You can determine whether a correlation is strong or weak or whether one exists at all. You can also use correlations to predict a value given another value.

A positive correlation might exist between the number of cars on a highway at a particular time and the number of traffic jams. As the number of cars increases, the number of traffic jams increases. On the other hand, a negative correlation might exist between the number of law enforcement officers patrolling a stretch of highway and the number of people who are speeding. As the number of officers increases, the number of people speeding decreases.

Positively correlated variables (a positive r) change in the same direction. As one variable increases, so does the other variable.

An example of a positive correlation might be height and weight in children. Typically, as children grow, they begin to weigh more. So we might say that height and weight are positively correlated: as height increases, weight increases.

A strong correlation between two variables just indicates that a relationship exists; when one variable changes, we tend to see change in the other variable.

However, this does not show that Variable X causes Variable Y to change!

Pearson's r indicates the strength and direction of the correlation between two variables. You would then use hypothesis testing to determine if the correlation exists not just in the samples but also in the population.

Hypothesis testing helps you determine whether to reject the null hypothesis or fail to reject the null hypothesis. The null hypothesis for correlation is: H0: r = 0. Use the r-Table and the degrees of freedom to determine the critical value for r. Then compare your obtained value for r to the critical value for r. Remember, always compare the absolute value of r to the critical value for r.

Our predictions are not always right on; in fact, there will always be error in our estimate, regardless of what we are predicting. To account for that error, we calculate the standard error of the estimate using the standard deviation of the predicted variable and the correlation of both variables. The formula for the standard error of the estimate is:

Se = Sy × √(1 - r^2)
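As a rough sketch, the standard error of the estimate can be computed directly from that formula. The function name and the numbers below (a standard deviation of 10 for Y and r = .80) are made up for illustration only.

```python
import math


def standard_error_of_estimate(s_y, r):
    """Standard error of the estimate: Sy times the square root of (1 - r^2)."""
    return s_y * math.sqrt(1 - r ** 2)


# Hypothetical values: Sy = 10, r = .80
se = standard_error_of_estimate(s_y=10.0, r=0.80)
print(round(se, 2))
```

Notice that a perfect correlation (r = 1 or r = -1) gives a standard error of 0, while r = 0 leaves the standard error equal to Sy.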

To determine if the correlation between two variables is significant, we compare the r we calculated to the critical value for r from the table (rcrit).

We ask the question, is robt > rcrit? If we can answer yes to this question, then the correlation is statistically significant, indicating that we would expect to see the same relationship in the population from which the sample came. If we cannot, then our correlation is not significant. We are comparing the absolute value of r to the critical value for r. In other words, if r is negative, don't consider the negative sign when asking if robt > rcrit.
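The comparison can be sketched in Python as follows. The helper name `is_significant` and the sample size are hypothetical, and the critical value of .576 is the commonly tabled two-tailed .05 value for 10 degrees of freedom; check it against the r-Table used in your course.

```python
def is_significant(r_obt, r_crit):
    """Return True when the absolute value of the obtained r exceeds r critical."""
    return abs(r_obt) > r_crit


n = 12                # hypothetical number of paired observations
df = n - 2            # degrees of freedom for a correlation
r_crit = 0.576        # assumed table value for df = 10 at the .05 level

print(df)
print(is_significant(-0.81, r_crit))   # the negative sign is ignored
print(is_significant(0.40, r_crit))
```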

We can calculate Pearson's r (robt) for our data in order to determine the strength and direction of the relationship between two variables. (It is impossible to take the square root of a negative number, and it's impossible for the square root of a number to be negative.)

The formula for Pearson's r is long. The degrees of freedom are df = n - 2.
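The notes do not reproduce the formula itself, so the sketch below uses one standard definitional form of Pearson's r (the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). Your course may present a computational version that looks different but is algebraically equivalent. The function name and the height and weight numbers are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r for paired data, using deviations from the means."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Made-up paired data: heights (inches) and weights (pounds)
heights = [60, 62, 65, 68, 70]
weights = [100, 115, 130, 150, 160]

r_obt = pearson_r(heights, weights)
df = len(heights) - 2
print(f"r = {r_obt:.3f}, df = {df}")
```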

If r = 1, it is a perfect positive correlation. If r = -1, it is a perfect negative correlation. The closer r is to 0, the weaker the correlation. If r = 0, there is no correlation.

Correlation does not mean causation. For example, just because the number of text messages sent during the day increases as the day progresses does not mean that time causes the number of text messages to increase.

The sign of Pearson's r (that is, whether it is positive or negative)

indicates the direction of the correlation.

A correlation

may exist between two variables that are paired in some way.

The formula for the regression equation is: Y' = a + bX
where Y' (Y-prime) is the value we predict Y to be
b = r(Sy/Sx), where Sy and Sx are the standard deviations for group Y and group X
a = Y_ - bX_, where Y_ and X_ are the means for groups Y and X

Y' is Y-prime: the value we predict Y to be.
a is the Y-intercept, that is, the point at which the regression line crosses the Y-axis.
b is the slope of the regression line. The slope is an indication of the amount of change in Y that is associated with a unit change in X.
X is the value that we are using to predict Y'. We are given X.
r is the correlation coefficient, and represents the strength and direction of the relationship between variables X and Y.

Finding the Regression Equation

1. Calculate b: b = r(Sy/Sx)
2. Calculate a: a = Y_ - bX_
3. Write the regression equation: Y' = a + bX
Once you have the regression equation, you can put any value of X into it to get the predicted value Y'.
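The three steps above can be sketched in Python. The function names are hypothetical, and every number (r = .80, Sx = 2, Sy = 5, a mean of 10 for X and 50 for Y) is made up for illustration.

```python
def regression_coefficients(r, s_x, s_y, mean_x, mean_y):
    """Return (a, b) for the regression equation Y' = a + bX."""
    b = r * (s_y / s_x)          # step 1: slope
    a = mean_y - b * mean_x      # step 2: Y-intercept
    return a, b


def predict(a, b, x):
    """Step 3: use the regression equation to predict Y' for a given X."""
    return a + b * x


# Made-up summary statistics
a, b = regression_coefficients(r=0.80, s_x=2.0, s_y=5.0, mean_x=10.0, mean_y=50.0)
y_prime = predict(a, b, x=12.0)
print(f"Y' = {a:.1f} + {b:.1f}X, so X = 12 gives Y' = {y_prime:.1f}")
```

With these numbers the slope works out to 2 and the intercept to 30, so plugging the mean of X into the equation returns the mean of Y, as it should for any regression line.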

Negative correlations (a negative r) indicate that the variables change in opposite directions, that is, as one variable increases, the other variable decreases.

An example of this might be the air temperature outside and the amount of money spent on gas to heat your home. As the air temperature increases (that is, it gets warmer outside), the amount of money spent on gas to heat your home decreases. Likewise, when the air temperature outside decreases (that is, gets colder), the amount of money spent on gas to heat your home increases. This is an example of two variables that are negatively correlated, thus, r would be a negative number.

Scatter plots are one way to explore correlations.

A positive correlation would show points moving from the lower left to the upper right corner of the graph. The graph shows that as values for one variable increase, the values for the other variable increase. A negative correlation would show points moving from the upper left down to the lower right corner of the graph. As the values for one variable increase, the values for the other variable decrease. When plotted points seem randomly scattered, no correlation is apparent.

(In algebra, we learned that for any two points in the coordinate plane, we can graph the line containing the points and also develop an equation for that line, y = mx + b) Graphing lines and finding y-values are the concepts behind a regression line.

A regression line is the straight line that best fits the data. The regression line minimizes the sum of the squares of all the residuals. A residual is the vertical difference between the point and the regression line.
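To make the "least squares" idea concrete, here is a small sketch that computes residuals for a made-up data set and compares the sum of squared residuals for the best-fitting line against a slightly different line. The function names and data are hypothetical, and residuals are taken as observed minus predicted, which is an assumed sign convention.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y' = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp / ss_x
    a = mean_y - b * mean_x
    return a, b


def residuals(xs, ys, a, b):
    """Vertical differences between each observed y and the line (observed - predicted)."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sum_squared_residuals(xs, ys, a, b):
    return sum(e ** 2 for e in residuals(xs, ys, a, b))


# Made-up paired data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)
other = sum_squared_residuals(xs, ys, a, b + 0.1)   # a line with a different slope

print(f"regression line: {best:.2f}, other line: {other:.2f}")
```

Any other straight line drawn through the same points gives a larger sum of squared residuals than the regression line.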

A scatter plot will give us an indication of the positive or negative relationship that exists between two variables. It is not a precise measure; we still need to calculate Pearson's r to have a mathematical measure of the strength and direction of the linear relationship between the two variables.

But, a scatter plot will at least visually represent any relationship between the two variables.

When two variables show a correlation, you can use that relationship to make predictions about other values. Both positive and negative correlations show a general linear relationship. A regression line is the straight line that the data points cluster around.

By calculating the regression line, you can more accurately predict other values. You can account for the error of a prediction by finding the standard error of the estimate and determining an interval within which the prediction should fall.

Hypothesis Testing: We use hypothesis testing to determine if the correlation we see in our sample reflects what is occurring in the population.

Hypothesis testing is a test of significance that provides information about the conclusions we can draw about the null hypothesis. With correlation, the null hypothesis is: H0: r=0

The range of Pearson's r is -1 to 1, and its exact value is referred to as the correlation coefficient. Correlation Coefficient: A measure of the strength and direction of a linear relationship between two variables.

If r (the correlation coefficient) is -1 or 1, we have a perfect correlation. If r is equal to 0, our two variables are not correlated at all, meaning they have no linear relationship.

Negative Correlation: In general, the points move from the upper left corner to the lower right corner of the graph. A scatter plot like this indicates that the two variables graphed are negatively correlated: as one variable increases, the other variable decreases.

If r = -1, we have a perfect negative correlation. Perfect negative correlations are rare, but can happen.

Positive Correlation: In general, the points move from the lower left corner to the upper right corner of the graph.

If r = 1, we have a perfect positive correlation

The higher the value of r2, the more variance we've explained in one variable with the variance of the other variable. This is a good thing!

If we can explain a large amount of the variance in one variable by the variance in the other variable, then there's not much room for something else to have much of an impact on the overall variance.

H0: r=0 this null hypothesis indicates that there is no relationship between the two variables. A test of the null hypothesis gives us the information we need in order to determine if we can reject the null hypothesis or if we fail to reject the null hypothesis

Rejecting the null hypothesis indicates that there is a relationship between the two variables and it is statistically significant. Failing to reject the null hypothesis indicates that there is no significant relationship between the two variables.

To determine the strength and direction of the relationship between two variables, you can calculate Pearson's r, the correlation coefficient.

Remember, the value of r can range from -1 to 1. The closer r is to 1 or -1, the stronger the correlation. However, hypothesis testing is needed to determine if the correlation is statistically significant.

No Correlation: Some variables have little or no relationship.

The points are all over the place, and no single line, positive or negative, can be drawn through the middle.

Since we are working with paired data, we can plot each point (the person's height and the person's weight, for example) to form a scatter plot.

The scatter plot will visually indicate if we have a relationship, and what type of relationship it is. There are three types of relationships or correlations we can have.

After you solve for r, is the relationship statistically significant?

We can have two variables that are strongly correlated, while the correlation is not statistically significant, and we can have variables that are moderately correlated but the correlation is statistically significant.

A Correlation Does Not Imply Causation: One very important point to remember is that correlation does not imply causation!

What is meant by this statement is that even if a strong relationship is found between two variables, we cannot assume that a change in one variable causes a change in the other variable. We are investigating the relationship between two variables, but we're not trying to establish a cause-and-effect relationship between them. Correlation studies cannot establish cause and effect!

If robt is greater than rcrit, the correlation is significant. If robt is less than rcrit, the correlation is not significant.

You can also use r to find the coefficient of determination (r2). The range of r2 is from 0 to 1. The closer r2 is to 1, the more the variance in one variable can be explained by the variance in the other variable and the stronger the correlation.

By plotting data points showing the paired values of two variables, you can determine whether the correlation is positive, negative, or if there is no correlation.

You can also use the Pearson's Product Moment Correlation (r) to describe linear relationships between two variables. Pearson's r indicates the strength and direction of the relationship. The exact value of r is referred to as the correlation coefficient.

