Stats Exam 4

regression line (line of best fit)

A line, segment, or ray drawn on a scatter plot to estimate the relationship between two sets of data. A regression line with a slope of 0 implies that the response variable does not change as the explanatory variable changes, so the explanatory variable does not help to predict the response variable.

scatter diagram

A scatter diagram is a graph that shows the relationship between two quantitative variables. Each pair in the data set is represented by a point in the scatter diagram. The explanatory variable is plotted on the horizontal​ axis, and the response variable is plotted on the vertical axis.

testing for linear association

If the absolute value of the correlation coefficient is greater than the critical value, we say that a linear relation exists between the two variables. Otherwise, no linear relation exists. The correlation coefficient is a measure that gives the direction and strength of the linear relationship between two quantitative variables. It should not be used when there is a​ non-linear relationship between two variables or when one or both variables are categorical.

If the linear correlation between two variables is​ negative, the slope of the regression line will also be negative.

If the linear correlation between two variables is​ negative, what can be said about the slope of the regression​ line?

The line of best fit is sometimes called the​ least-squares regression line because it minimizes the sum of the squared residuals. As​ such, the​ residuals, which can be positive or​ negative, sum to 0. Also, points on the regression line will have a residual of 0.​ Therefore, if an observation has a residual of​ 0, its predicted value equals its observed value.

In​ regression, what can be said about the sum of the residuals of all the​ observations?
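A minimal sketch of the claim that least-squares residuals sum to 0. The data values and the helper name `least_squares` are made up for illustration and do not come from the exam material.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = least_squares(xs, ys)

# Residual = observed y minus predicted y (y-hat).
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
residual_sum = sum(residuals)

print(slope, intercept)
print(residuals)
print(residual_sum)  # zero up to floating-point rounding
```

Individual residuals are positive or negative, but they cancel out in the total.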

When analyzing two quantitative​ variables, what is the first thing that should be​ done?

It is important to determine what kind of relationship exists between the two variables. To do​ so, make a scatterplot.

z statistic

Measures the number of standard deviations a data value is from the mean (either above or below). This score is obtained by subtracting the mean from a given data value and dividing the result by the standard deviation. (When comparing two sample proportions, a z statistic of 0 means the difference between the sample proportions is 0.)
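The calculation described above can be sketched as follows. The function name `z_score` and the numbers are illustrative assumptions, not values from the exam.

```python
def z_score(value, mean, sd):
    """Number of standard deviations `value` lies from `mean`."""
    return (value - mean) / sd


# Hypothetical example: a value of 85 when the mean is 70 and the SD is 10.
z_above = z_score(85, 70, 10)   # above the mean, so positive
z_below = z_score(60, 70, 10)   # below the mean, so negative
z_at_mean = z_score(70, 70, 10) # exactly at the mean

print(z_above, z_below, z_at_mean)
```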

negative linear correlation

The closer r is to -1, the stronger the negative linear correlation; if r is exactly -1, the data lie on a straight line with negative slope. Likewise, if r is exactly +1, the data lie on a straight line with positive slope.

residual

The difference between an observed value of the response variable and its predicted value is called a ______. An observed value of the response variable greater than its predicted value will have a positive _____. An observed value less than its predicted value will have a negative _______. To find it, plug the x-value of the data point in question into the regression equation to get y-hat, then subtract y-hat from the observed y (y - y-hat).
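A short sketch of that procedure for single data points. The regression line y-hat = 2 + 3x and the data points are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def predict(x, intercept=2.0, slope=3.0):
    """Predicted value y-hat on a hypothetical line y-hat = 2 + 3x."""
    return intercept + slope * x


def residual(x, y_observed):
    """Observed y minus predicted y-hat at the same x."""
    return y_observed - predict(x)


# Observed value above the line: positive residual.
res_above = residual(4, 15)   # y-hat = 14, so 15 - 14
# Observed value below the line: negative residual.
res_below = residual(4, 12)   # y-hat = 14, so 12 - 14
# Observed value exactly on the line: residual of 0.
res_on_line = residual(4, 14)

print(res_above, res_below, res_on_line)
```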

points on least squares regression line

The notation y-hat is used in the least-squares regression line to indicate a predicted value of y for a given value of x. Thus, the y-value of each point on the least-squares regression line represents the expected y-value at the corresponding value of x, given the data provided. These expected values (the y-coordinates on the line) represent the mean value of y at a given x-value. The response variable y is normally distributed with constant standard deviation.

Properties of the Linear Correlation Coefficient

[1] The linear correlation coefficient is always between −1 and 1, inclusive. That is, −1≤r≤1. [2] If r=+1, then a perfect positive linear relation exists between the two variables. [3] If r=−1, then a perfect negative linear relation exists between the two variables. [4] The closer r is to +1, the stronger is the evidence of positive association between the two variables. [5] The closer r is to −1, the stronger is the evidence of negative association between the two variables. [6] If r is close to 0, then little or no evidence exists of a linear relation between the two variables. So a value of r close to 0 does not imply no relation, just no linear relation. [7] The linear correlation coefficient is a unitless measure of association. So the unit of measure for x and y plays no role in the interpretation of r. [8] The correlation coefficient is not resistant. Therefore, an observation that does not follow the overall pattern of the data can affect the value of the linear correlation coefficient.
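Properties [1], [7], and [8] can be checked numerically. The sketch below computes r from its definition; the data, the rescaling factor, and the outlier point are all made-up values for illustration.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample linear correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data with a positive association.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)

# Property [7]: changing the unit of x (here, multiplying by 100) leaves r unchanged.
r_rescaled = correlation([100 * x for x in xs], ys)

# Property [8]: one observation far from the overall pattern changes r a lot.
r_with_outlier = correlation(xs + [20], ys + [0])

print(r, r_rescaled, r_with_outlier)
```

In this made-up data set, a single outlying point is enough to turn a positive r into a negative one.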

Prediction intervals for an individual response

________________________ are intervals constructed about the predicted value of y that are used to measure the accuracy of a single individual's predicted value. If we use the least-squares regression equation to predict the total cholesterol for one 42-year-old female, we construct a _______________.

Confidence intervals for a mean response

_________________________ are intervals constructed about the predicted value of y, at a given level of x, that are used to measure the accuracy of the mean response of all the individuals in the population. If we use the least-squares regression equation to predict the mean total cholesterol for all 42-year-old females, we construct a _________________.
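For comparison, the two kinds of interval are usually written as below in simple linear regression. The notation is an assumption, since the cards do not define it: x* is the given value of x, s_e is the standard error of the estimate, n is the sample size, and t_{α/2} is the critical value with n − 2 degrees of freedom.

```latex
% Confidence interval for the mean response at x = x^*
\hat{y} \pm t_{\alpha/2}\, s_e \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}

% Prediction interval for an individual response at x = x^*
\hat{y} \pm t_{\alpha/2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}
```

The extra 1 under the square root makes the prediction interval wider than the confidence interval at the same x*, because a single individual varies more than a mean does.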

scatter diagram (scatterplot)

a graph that shows the relationship between two quantitative variables measured on the same individual. Each individual in the data set is represented by a point in the scatter diagram. The explanatory variable is plotted on the horizontal axis, and the response variable is plotted on the vertical axis

linear correlation coefficient

a measure of the strength and direction of the linear relation between two quantitative variables

negative association

as one variable (x) increases, the other (y) decreases

positive association

as one variable (x) increases, the other variable (y) increases

extrapolation

estimation by projecting beyond the data given; although it is possible that the pattern may​ continue, there is no guarantee.​ Therefore, _______ could result in bad predictions and is not recommended.

coefficient of determination (R^2)

measures the proportion of total variation in the response variable that is explained by the least-squares regression line. It is a number between 0 and 1, inclusive. If R^2 = 0, the least-squares regression line has no explanatory value; if R^2 = 1, the least-squares regression line explains 100% of the variation in the response variable.
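A minimal sketch of how R^2 can be computed as 1 minus the ratio of unexplained variation to total variation. The data are hypothetical, and the variable names are chosen for illustration.

```python
# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Total variation in y, and the variation left unexplained by the line.
total_variation = sum((y - y_bar) ** 2 for y in ys)
unexplained = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - unexplained / total_variation

print(slope, intercept, r_squared)
```

For these made-up numbers the line explains about 60% of the variation in y.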

Greek letter ρ (rho)

represents the population correlation coefficient

r

represents the sample correlation coefficient

quadratic relation

a relation whose scatter plot has a U shape (the linear correlation coefficient does not apply here; it applies only to linear relations)

What does the slope represent?

the change in y for a​ one-unit increase in x

response (dependent) variable

the variable whose value can be explained by the value of the explanatory (or predictor or independent) variable

y-intercept

the​ ________ is the value of y when x=0.

R-square

​R-square is the percent of variation in the response variable that is explained by the explanatory variable.​ If R-square is 26%, that means 26% of the variation in the amount of damage to a house is explained by a simple linear regression with the distance of the burning house from the nearest fire station as the explanatory variable.

