AP Statistics Chapter 5: Summarizing Bivariate Data

Predicted (fitted) values ŷ1, ŷ2,... ŷn

Obtained by substituting each x value in the data set into the equation of the least-squares line: ŷ1 = a + bx1, ..., ŷn = a + bxn

residuals

Obtained by subtracting each predicted value from the corresponding observed y value. These are the vertical deviations from the least-squares line.
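The two cards above can be sketched in a few lines. A minimal example with a made-up data set, assuming the least-squares line ŷ = 0 + 1.5x has already been computed:

```python
# Sketch: fitted values and residuals for a small made-up data set,
# using the (assumed already computed) least-squares line y-hat = 0 + 1.5x.
xs = [1, 2, 3]
ys = [2, 2, 5]
a, b = 0.0, 1.5  # intercept and slope of the least-squares line

fitted = [a + b * x for x in xs]                  # y-hat values
residuals = [y - f for y, f in zip(ys, fitted)]   # observed minus predicted

print(fitted)     # [1.5, 3.0, 4.5]
print(residuals)  # [0.5, -1.0, 0.5]
```

Note that the residuals are the vertical (y-direction) gaps between each point and the line, and here they sum to zero, as residuals from a least-squares line always do.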

Influential Observation

An observation is potentially influential if it has an x value that is far away from the rest of the data (separated from the rest of the data in the x direction). To determine whether the observation is in fact influential, we assess whether removing it has a large impact on the value of the slope or intercept of the least-squares line.

Outlier

An observation that has a large residual. Outlier observations fall far away from the least-squares line in the y direction.

Independent (predictor or explanatory) variable

In a bivariate data set, the variable that will be used to make a prediction of the dependent variable. This is denoted by x. Also sometimes called the predictor variable or the explanatory variable.

Dependent (Response) Variable

In a bivariate data set, the variable whose value we would like to predict. This is denoted by y. Also sometimes called the response variable.

Notation of population correlation coefficient

ρ (the Greek letter rho)

Residual

The difference between an observed y value and the corresponding predicted y value: residual = observed value − predicted value = y − ŷ. The residuals from the least-squares line are the n quantities y1 − ŷ1, ..., yn − ŷn.

Total sum of squares

The sum of squared deviations of the observed y values from the sample mean ȳ; a measure of the total variation in the observed y values.

Residual (error) sum of squares

The sum of the squared residuals is a measure of y variation that cannot be attributed to an approximate linear relationship (unexplained variation).
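Both sums of squares from the two cards above can be computed directly. A sketch with the same made-up data set and the assumed least-squares line ŷ = 0 + 1.5x:

```python
# Sketch: total and residual sums of squares for a small made-up data set,
# with the least-squares line y-hat = 0 + 1.5x assumed already fitted.
xs = [1, 2, 3]
ys = [2, 2, 5]
a, b = 0.0, 1.5

y_bar = sum(ys) / len(ys)                     # sample mean of y
ss_total = sum((y - y_bar) ** 2 for y in ys)  # SSTo: total variation in y
ss_resid = sum((y - (a + b * x)) ** 2         # SSResid: unexplained variation
               for x, y in zip(xs, ys))

print(ss_total)  # 6.0
print(ss_resid)  # 1.5
```

SSResid can never exceed SSTo; the gap between them is the variation explained by the line.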

Logistic regression function

A linear function of the predictor x is passed through the logistic function to produce a predicted probability, which is used to predict a categorical (binary) dependent variable.
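The idea can be sketched with made-up coefficients (the intercept and slope here are hypothetical, not fitted to any data):

```python
import math

# Sketch: a linear function of x fed through the logistic function gives a
# predicted probability between 0 and 1. Coefficients a and b are made up.
def logistic(z):
    return 1 / (1 + math.exp(-z))

a, b = -4.0, 2.0  # hypothetical intercept and slope

def predicted_prob(x):
    return logistic(a + b * x)

print(predicted_prob(2.0))  # 0.5, since the linear part a + b*x is 0 here
```

Whatever the x value, the output stays strictly between 0 and 1, which is what makes the logistic function suitable for modeling probabilities.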

Power transformation

A transformation in which a power/exponent is chosen, and then each original value is raised to that power to obtain the corresponding transformed value. Do NOT pick 0 as the exponent as that would make every value 1, and an exponent of 1 is NOT a transformation either.
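A quick sketch with exponent 0.5 (the square-root transformation) applied to made-up y values:

```python
# Sketch: a power transformation with exponent 0.5 (square root) applied
# to a made-up list of y values, e.g. to straighten a curved relationship.
power = 0.5
ys = [1, 4, 9, 16]
transformed = [y ** power for y in ys]
print(transformed)  # [1.0, 2.0, 3.0, 4.0]
```

Here the original values grow quadratically, and the transformation turns them into an exactly linear sequence.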

population correlation coefficient

ρ is the correlation computed by using all possible pairs of data values (x, y) taken from a population

Pearson's sample correlation coefficient

A measure of the strength and direction of a linear relationship between two numerical variables. Denoted by r. Although there are several different correlation coefficients, Pearson's correlation coefficient is by far the most commonly used, and so the name "Pearson's" is often omitted and it is referred to as simply the correlation coefficient.
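One common way to compute r is as the average product of z-scores, r = (1/(n−1)) Σ zx·zy. A sketch with a small made-up data set:

```python
import math

# Sketch: Pearson's sample correlation coefficient r computed from z-score
# products, r = (1/(n-1)) * sum(z_x * z_y), for a small made-up data set.
xs = [1, 2, 3]
ys = [2, 2, 5]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sx = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))  # sample std devs
sy = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
r = sum(((x - x_bar) / sx) * ((y - y_bar) / sy)
        for x, y in zip(xs, ys)) / (n - 1)
print(round(r, 4))  # 0.866
```

The positive value close to 1 reflects a fairly strong positive linear relationship in this tiny data set.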

Transformation

The replacement of a variable by a simple function of that variable, which is then used in a regression: for example, replacing a variable x by the square root of x or by the logarithm of x. In a stronger sense, a transformation is a replacement that changes the shape of a distribution or relationship.

Predicted (fitted) value

Result from substituting each sample x value into the equation for the least-squares line.

Residual plot

A scatterplot of the (x, residual) pairs: the residuals on the vertical axis and the independent variable on the horizontal axis. Isolated points or a pattern of points in a residual plot indicate potential problems. If the points are randomly dispersed around the horizontal axis, a linear regression model is appropriate for the data; otherwise, a nonlinear model may be more appropriate.

a = ȳ − bx̄

The intercept of the least-squares line.
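The slope and intercept can be computed directly from the data. A sketch using the usual formulas b = Sxy/Sxx and a = ȳ − bx̄ on a made-up data set:

```python
# Sketch: slope and intercept of the least-squares line for made-up data,
# using b = S_xy / S_xx and a = y_bar - b * x_bar.
xs = [1, 2, 3]
ys = [2, 2, 5]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
b = s_xy / s_xx        # slope of the least-squares line
a = y_bar - b * x_bar  # intercept of the least-squares line
print(b, a)  # 1.5 0.0
```

Because a = ȳ − bx̄, the least-squares line always passes through the point of averages (x̄, ȳ).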

Least-squares line

The line that minimizes the sum of squared deviations. The least-squares line is also called the sample regression line.

Sum of squared deviations

The most widely used measure of the goodness of fit of a line y = a + bx to bivariate data (x1, y1), ..., (xn, yn)

Coefficient of determination

The proportion of variation in the observed y values that can be attributed to an approximate linear relationship between x and y. The coefficient of determination is denoted by r squared. The value of r squared is often converted to a percentage (by multiplying by 100) and interpreted as the percentage of variation in y that can be explained by the approximate linear relationship between x and y.
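In terms of the sums of squares defined earlier, r squared = 1 − SSResid/SSTo. A sketch with the same made-up data set and line:

```python
# Sketch: coefficient of determination r^2 = 1 - SSResid/SSTo for a small
# made-up data set with least-squares line y-hat = 0 + 1.5x.
xs = [1, 2, 3]
ys = [2, 2, 5]
a, b = 0.0, 1.5
y_bar = sum(ys) / len(ys)
ss_total = sum((y - y_bar) ** 2 for y in ys)
ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
r_squared = 1 - ss_resid / ss_total
print(r_squared)  # 0.75, i.e. 75% of the variation in y is explained
```

This agrees with squaring the sample correlation coefficient for the same data (r ≈ 0.866, so r² ≈ 0.75).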

Standard deviation about the least-squares line

The size of a "typical" deviation from the least-squares line. Roughly speaking, se is the typical amount by which an observation deviates from the least-squares line. It is computed as se = √(SSResid/(n − 2)).

Principle of least squares

A method for finding the line of best fit for a data set: among all candidate lines, choose the one that minimizes the sum of the squares of the errors, i.e., the squared residuals resulting from the differences between the observed values and the values predicted by the line. For the least-squares line ŷ = a + bx, the slope is b = Sxy/Sxx and the intercept is a = ȳ − bx̄.
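The minimizing property can be checked numerically: any competing line has a larger sum of squared deviations than the least-squares line. A sketch with made-up data:

```python
# Sketch: the principle of least squares. Among candidate lines, the
# least-squares line minimizes the sum of squared deviations (made-up data).
xs = [1, 2, 3]
ys = [2, 2, 5]

def sum_sq_dev(a, b):
    """Sum of squared vertical deviations from the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

ss_lsq = sum_sq_dev(0.0, 1.5)    # the least-squares line for these data
ss_other = sum_sq_dev(0.0, 2.0)  # a competing line through the same data
print(ss_lsq, ss_other)  # 1.5 5.0 -> the least-squares line does better
```

Trying any other intercept/slope pair in `sum_sq_dev` will likewise give a value of at least 1.5 for these data.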

Scatterplot

a graphical depiction of the relationship between two variables

Notation for Residual

e

Sample correlation coefficient

measures both the strength and direction of the linear relationship between two variables

Notation for sample correlation coefficient

r

regression analysis

used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships

Equation of a line

y = a + bx, where a is the intercept and b is the slope

