Chapter 7 Linear Regression


Extrapolation

Prediction of the mean value of the dependent variable $y$ for values of the independent variables $x_1, x_2, \ldots, x_q$ that are outside the experimental region.

Parameter

A measurable factor that defines a characteristic of a population, process, or system.

Coefficient of determination

A measure of the goodness of fit of the estimated regression equation. It can be interpreted as the proportion of the variability in the dependent variable y that is explained by the estimated regression equation.
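
As a minimal sketch of this definition, the snippet below fits a simple linear regression by least squares and computes the coefficient of determination as 1 - SSE/SST. The data values and the names `sse`, `sst`, and `r_squared` are assumptions made for illustration; they do not come from the chapter.

```python
# Hypothetical sample data (illustration only, not from the chapter).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares estimates for a simple linear regression.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Predicted values and residuals (observed minus predicted).
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yhi for yi, yhi in zip(y, y_hat)]

sse = sum(e ** 2 for e in residuals)      # variation left unexplained
sst = sum((yi - y_bar) ** 2 for yi in y)  # total variation in y
r_squared = 1 - sse / sst

print(round(b0, 3), round(b1, 3), round(r_squared, 3))
```

For these made-up numbers the fitted line explains about 60% of the variability in y.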

Random variable

A quantity whose values are not known with certainty.

Point estimator

A single value used as an estimate of the corresponding population parameter.

Regression analysis

A statistical procedure used to develop an equation showing how the variables are related.

Best subsets

A variable selection procedure that constructs and compares all possible models with up to a specified number of independent variables.

Dummy variable

A variable used to model the effect of categorical independent variables in a regression model; generally takes only the value zero or one.
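
One common way to build dummy variables, sketched here under the usual convention that a categorical variable with k levels gets k - 1 dummies and one baseline level coded as all zeros. The function name `dummy_encode` and the `region` data are hypothetical.

```python
def dummy_encode(values, baseline):
    """Return one 0/1 column per category other than the baseline."""
    categories = sorted(set(values) - {baseline})
    return {
        category: [1 if v == category else 0 for v in values]
        for category in categories
    }

# Hypothetical categorical variable with three levels.
region = ["north", "south", "west", "south", "north"]

# "north" is the baseline, so only "south" and "west" get a column.
dummies = dummy_encode(region, baseline="north")
print(dummies)
```

Each observation in the baseline category has a 0 in every dummy column, so the coefficients on the dummies are read as differences from the baseline.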

Confidence interval

An estimate of a population parameter that provides an interval believed to contain the value of the parameter at some level of confidence.

Confidence level

An indication of how frequently interval estimates based on samples of the same size taken from the same population using identical sampling techniques will contain the true value of the parameter we are estimating.

Prediction interval

An interval estimate of the prediction of an individual y value given values of the independent variables.

Stepwise selection

An iterative variable selection procedure that considers adding an independent variable and removing an independent variable at each step.

Backward elimination

An iterative variable selection procedure that starts with a model with all independent variables and considers removing an independent variable at each step.

Forward selection

An iterative variable selection procedure that starts with a model with no variables and considers adding an independent variable at each step.

Cross-validation

Assessment of the performance of a model on data other than the data that were used to generate the model.

Overfitting

Fitting a model too closely to sample data, resulting in a model that does not accurately reflect the population.

Leave-one-out cross-validation

Method of cross-validation in which candidate models are repeatedly fit using n - 1 observations and evaluated with the remaining observation.
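
The procedure can be sketched for a single candidate model as follows. This assumes a simple linear regression and mean squared prediction error as the evaluation measure; the helper names `fit_line` and `loocv_mse` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope for a simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def loocv_mse(xs, ys):
    """Average squared prediction error over n leave-one-out fits."""
    errors = []
    for i in range(len(xs)):
        # Fit on the n - 1 observations other than observation i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        b0, b1 = fit_line(train_x, train_y)
        # Evaluate on the single held-out observation.
        errors.append((ys[i] - (b0 + b1 * xs[i])) ** 2)
    return sum(errors) / len(errors)

# Hypothetical data lying exactly on y = 1 + 2x, so each held-out
# error is zero up to floating-point rounding.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
mse = loocv_mse(xs, ys)
print(mse)
```

To compare candidate models, the same loop would be run once per model and the model with the smallest error preferred.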

Holdout method

Method of cross-validation in which sample data are randomly divided into mutually exclusive and collectively exhaustive sets, then one set is used to build the candidate models and the other set is used to compare model performances and ultimately select a model.
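
A minimal sketch of the random split, working on observation indices rather than on the data itself. The 70/30 proportion, the seed, and the function name `holdout_split` are assumptions for illustration; the definition above does not fix a split ratio.

```python
import random

def holdout_split(n_obs, train_fraction=0.7, seed=0):
    """Randomly partition indices 0..n_obs-1 into two disjoint sets."""
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * n_obs))
    # First part builds the models, second part compares them.
    return indices[:cut], indices[cut:]

train_idx, valid_idx = holdout_split(10)
print(len(train_idx), len(valid_idx))
```

Because every index lands in exactly one of the two lists, the sets are mutually exclusive and collectively exhaustive.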

k-fold cross-validation

Method of cross-validation in which the sample data are randomly divided into k equal-sized, mutually exclusive and collectively exhaustive subsets. In each of k iterations, one of the k subsets is used to evaluate a candidate model that was constructed on the data from the other k - 1 subsets.
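
The index bookkeeping behind this method might look like the sketch below. The function name `k_fold_indices` and the seed are hypothetical; the subsets are exactly equal in size only when k divides the number of observations, otherwise they differ by at most one.

```python
import random

def k_fold_indices(n_obs, k, seed=0):
    """Yield (train, validation) index lists, one pair per fold."""
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k subsets.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

splits = list(k_fold_indices(n_obs=10, k=5))
for train, validation in splits:
    print(sorted(validation), len(train))
```

Each observation is used for evaluation exactly once and for model building in the other k - 1 iterations.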

Linear regression

Regression analysis in which relationships between the independent variables and the dependent variable are approximated by a straight line.

Multiple linear regression

Regression analysis involving one dependent variable and more than one independent variable.

Simple linear regression

Regression analysis involving one dependent variable and one independent variable.

Quadratic regression model

Regression model in which a nonlinear relationship between the independent and dependent variables is fit by including the independent variable and the square of the independent variable in the model: $\hat{y} = b_0 + b_1 x_1 + b_2 x_1^2$; also referred to as a second-order polynomial model.

Piecewise linear regression model

Regression model in which one linear relationship between the independent and dependent variables is fit for values of the independent variable below a prespecified value of the independent variable, a different linear relationship between the independent and dependent variables is fit for values of the independent variable above the prespecified value of the independent variable, and the two regressions have the same estimated value of the dependent variable (i.e., are joined) at the prespecified value of the independent variable.
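
One common parameterization of such a model, sketched under the assumption that the change in slope is captured by a single extra term that is zero below the prespecified value. The coefficient values, the prespecified value `knot`, and the function name `piecewise_predict` are hypothetical.

```python
def piecewise_predict(x, b0, b1, b2, knot):
    """Two joined line segments: slope b1 below the knot, b1 + b2 above."""
    return b0 + b1 * x + b2 * max(0.0, x - knot)

# Hypothetical coefficients and knot, chosen only for illustration.
b0, b1, b2, knot = 10.0, 2.0, -1.5, 4.0

below = piecewise_predict(3.0, b0, b1, b2, knot)    # 10 + 2*3
at_knot = piecewise_predict(4.0, b0, b1, b2, knot)  # 10 + 2*4
above = piecewise_predict(6.0, b0, b1, b2, knot)    # 10 + 2*6 - 1.5*2

print(below, at_knot, above)
```

Because the extra term equals zero at the prespecified value, both segments give the same estimate there, which is what joins the two lines.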

Interaction

Regression modeling technique used when the relationship between the dependent variable and one independent variable is different at different values of a second independent variable.
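
With two independent variables, an interaction is commonly modeled by adding their product as an extra term. The form below is a sketch under that assumption; the coefficient $b_3$ on the product term is notation introduced here for illustration.

```latex
\[
\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2
\]
\[
\frac{\partial \hat{y}}{\partial x_1} = b_1 + b_3 x_2
\]
```

The second line shows why this captures the definition: the estimated change in the dependent variable per unit change in $x_1$ depends on the value of $x_2$ whenever $b_3 \neq 0$.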

t test

Statistical test based on the Student's t probability distribution that can be used to test the hypothesis that a regression parameter $\beta_j$ is zero; if this hypothesis is rejected, we conclude that there is a regression relationship between the jth independent variable and the dependent variable.
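
For reference, the usual form of the test statistic is sketched below. The notation is an assumption made here: $b_j$ is the least squares estimate of the jth parameter, $s_{b_j}$ is its estimated standard error, $n$ is the number of observations, and $q$ is the number of independent variables in a model that includes an intercept.

```latex
\[
t = \frac{b_j}{s_{b_j}}, \qquad \text{degrees of freedom} = n - q - 1
\]
```

A value of $t$ far from zero, relative to the t distribution with these degrees of freedom, is evidence against the hypothesis that the parameter is zero.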

Training set

The data set used to build the candidate models.

Validation set

The data set used to compare model forecasts and ultimately pick a model for predicting values of the dependent variable.

Multicollinearity

The degree of correlation among independent variables in a regression model.

Residual

The difference between the observed value of the dependent variable and the value predicted using the estimated regression equation; for the ith observation, the ith residual is $y_i - \hat{y}_i$.

Regression model

The equation that describes how the dependent variable $y$ is related to the independent variables $x_1, x_2, \ldots, x_q$ and an error term; the multiple linear regression model is $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_q x_q + \varepsilon$.

Estimated regression equation

The estimate of the regression equation developed from sample data by using the least squares method. The estimated multiple linear regression equation is $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_q x_q$.
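
The least squares method mentioned here chooses the coefficient estimates that minimize the sum of squared differences between observed and predicted values. Written out for $n$ observations (the index $i$ and the count $n$ are notation assumed for this sketch):

```latex
\[
\min_{b_0, b_1, \ldots, b_q} \; \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
\]
```

Here $\hat{y}_i$ is the value of the dependent variable predicted for the ith observation from its values of the independent variables.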

Knot

The prespecified value of the independent variable at which its relationship with the dependent variable changes in a piecewise linear regression model; also called the breakpoint or the joint.

p value

The probability that a random sample of the same size collected from the same population using the same procedure will yield evidence against a hypothesis that is at least as strong as the evidence in the sample data, given that the hypothesis is actually true.

Hypothesis testing

The process of making a conjecture about the value of a population parameter, collecting sample data that can be used to assess this conjecture, measuring the strength of the evidence against the conjecture that is provided by the sample, and using these results to draw a conclusion about the conjecture.

Statistical inference

The process of making estimates and drawing conclusions about one or more characteristics of a population (the value of one or more parameters) through analysis of sample data drawn from the population.

Experimental region

The range of values for the independent variables $x_1, x_2, \ldots, x_q$ for the data that are used to estimate the regression model.

Interval estimation

The use of sample data to calculate a range of values that is believed to include the unknown value of a population parameter.

Dependent variable

The variable that is being predicted or explained. It is denoted by y and is often referred to as the response.

Independent variable(s)

The variable(s) used for predicting or explaining values of the dependent variable. They are denoted by x and are often referred to as predictor variables.

