BNAD 277 Chapter 14

T/F: By restricting the number of explanatory variables to one in the simple linear regression model, we reduce potential usefulness of the model

True

T/F: The scatterplot is a graphical tool where each point in the plot represents a pair of observed values of the two variables

True

Limitations of Correlation Coefficient Analysis

- Captures only a linear relationship (the variables could have a nonlinear relationship)
- May not be a reliable measure when outliers are present in one or both of the variables
- Correlation does not imply causation (if two variables are highly correlated, one does not necessarily cause the other)
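The first limitation can be illustrated with a small sketch. The data and the helper name `sample_correlation` are made up for illustration: y is determined exactly by x, yet the sample correlation coefficient is zero because the relationship is not linear.

```python
from math import sqrt

def sample_correlation(x, y):
    """Sample correlation coefficient: covariance divided by both standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)

# Hypothetical data: y = x^2, a perfect but nonlinear relationship
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]

r = sample_correlation(x, y)
print(r)  # zero: the coefficient misses the nonlinear pattern entirely
```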

What can correlation tell you?

- Strength of the linear relationship
- Everything covariance can tell you (whether a linear relationship exists and its direction)

What is true concerning correlation analysis?

- The correlation coefficient captures only a linear relationship
- The correlation coefficient may not be a reliable measure when outliers are present in one or both of the variables
- Correlation does not imply causation
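A rough sketch of the outlier point, using invented data and a hypothetical helper `sample_correlation`: five observations lie on a perfectly negative line, and adding a single extreme observation is enough to flip the sign of the coefficient.

```python
from math import sqrt

def sample_correlation(x, y):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)

# Hypothetical data with a perfect negative linear relationship
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
r_clean = sample_correlation(x, y)

# Same data plus one extreme observation at (20, 20)
r_outlier = sample_correlation(x + [20], y + [20])

print(r_clean, r_outlier)  # strongly negative without the outlier, positive with it
```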

Which of the following is true of the standard error of the estimate?

- Theoretically, its value has no predefined upper limit
- It is a measure of the accuracy of the regression model
- It is based on the squared deviations between the actual and predicted values of the response variable

When to use Standard Error of the Estimate

- When we are comparing various models
- The model with the smaller Se is the better fit

Regression Analysis

- Captures the causal relationship between variables
- Captures the effect of explanatory variables (one or more) on the response variable

Scatterplot

- Helps determine whether or not two variables are related in some systematic way
- Each point in the diagram represents a pair of observed values of the two variables

Goodness-of-fit measures:

- The standard error of the estimate
- The coefficient of determination
- The adjusted coefficient of determination

Adjusted R^2

- Used to compare competing linear regression models with different numbers of explanatory variables
- The higher the value, the better the model

Correlation relationships

0.7 to 1: strong
0.3 to 0.7: moderate
Below 0.3: weak
(The same cutoffs apply to negative values)

What values can the standard error of the estimate, s, assume?

0 ≤ s < ∞

What does the value of R^2 fall between?

0 and 1
Closer to 1 = stronger fit
Closer to 0 = weaker fit
(The same holds for Multiple R)

Se can assume what values

0 to infinity; the closer to 0, the better the model fits

Interpret an R^2 of .72

72% of the sample variation in the response variable is explained by the sample regression equation (a stronger fit). Other factors not included in the model account for the remaining 28% of the sample variation.
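As a minimal sketch of where such a percentage comes from, the snippet below fits a line to invented data and computes R^2 as explained variation (SSR) divided by total variation (SST). The data are hypothetical, and SST is standard notation for the total variation rather than a term defined in this set.

```python
# Hypothetical sample data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least squares estimates of the slope and intercept
b1 = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum((a - mean_x) ** 2 for a in x)
b0 = mean_y - b1 * mean_x
y_hat = [b0 + b1 * a for a in x]

sst = sum((b - mean_y) ** 2 for b in y)             # total variation in y
ssr = sum((p - mean_y) ** 2 for p in y_hat)         # explained variation
sse = sum((b - p) ** 2 for b, p in zip(y, y_hat))   # unexplained variation

r_squared = ssr / sst
print(round(r_squared, 2))  # 0.6: 60% of the sample variation in y is explained
```

Here the remaining 40% of the sample variation (SSE / SST) is left unexplained by the fitted line.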

In simple linear regression, a downward sloping trend line suggests which of the following?

A negative linear relationship between x and y

The standard error of the estimate can assume what value?

Between zero and infinity

The coefficient of determination can assume what value?

Between zero and one

Spurious correlation

Can make two variables appear closely related when no causal relation exists

Which sample correlation coefficient would show the strongest association between X and Y?

Closest to -1 or 1

What is the notation for the random error term?

ε

Inexact relationship

If the value of the response variable is not uniquely determined by the explanatory variables

If the sample regression equation is y(hat) = 15 - 5x, which is the correct interpretation of the estimated slope coefficient?

For every unit increase in x, y will decrease, on average, by 5 units

Hypothesis for Test Statistic for ρxy

H0: ρxy = 0; HA: ρxy ≠ 0

Two-tailed test of whether the population correlation coefficient differs from zero:

H0: ρxy = 0; HA: ρxy ≠ 0
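A sketch of how this two-tailed test could be carried out, assuming the usual t statistic t = r·sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom (the formula is not stated in this set). The function name, the sample values, and the critical value read from a t table (3.182 for α = 0.05 and 3 degrees of freedom) are illustrative.

```python
from math import sqrt

def correlation_t_statistic(r, n):
    """t statistic for H0: rho_xy = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Hypothetical sample: r = sqrt(0.6) (about 0.775) from n = 5 observations
r = sqrt(0.6)
n = 5
t_stat = correlation_t_statistic(r, n)

# Two-tailed critical value for alpha = 0.05 and df = 3, taken from a t table
t_critical = 3.182
reject_h0 = abs(t_stat) > t_critical

print(round(t_stat, 3), reject_h0)
```

With so few observations, even a fairly large sample correlation does not fall in the rejection region, so the apparent relationship could be due to chance.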

Which of the following is NOT true of the standard error of the estimate?

It can take on negative values

What type of relationship exists between two variables if as one increases, the other decreases?

Negative

What kind of relationship: As one variable increases, the other increases

Positive

What type of relationship exists between two variables if as one increases, the other increases?

Positive

Which is easier to interpret and why? R^2 or Se?

R^2 because it has both lower and upper bounds that make its interpretation more intuitive

Multiple R

R^2 is the square of Multiple R; equivalently, Multiple R is the square root of R^2

The residual e represents

The difference between an observed and predicted value of the response variable at a given value of the explanatory variable

What can covariance tell you?

The direction of the linear relationship between two variables (+ or -). It CANNOT tell you the strength of the relationship.
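One way to see why covariance cannot convey strength is that its size depends on the units of measurement. In this sketch (invented data, hypothetical helper names), re-expressing y in cents instead of dollars multiplies the covariance by 100 while the correlation coefficient stays the same.

```python
from math import sqrt

def sample_covariance(x, y):
    """Sample covariance with the n - 1 divisor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def sample_correlation(x, y):
    """Covariance scaled by both sample standard deviations."""
    return sample_covariance(x, y) / sqrt(sample_covariance(x, x) * sample_covariance(y, y))

# Hypothetical data: y measured in dollars, then the same values in cents
x = [1, 2, 3, 4, 5]
y_dollars = [2, 4, 5, 4, 5]
y_cents = [100 * v for v in y_dollars]

cov_dollars = sample_covariance(x, y_dollars)
cov_cents = sample_covariance(x, y_cents)
r_dollars = sample_correlation(x, y_dollars)
r_cents = sample_correlation(x, y_cents)

print(cov_dollars, cov_cents)                  # covariance grows 100-fold
print(round(r_dollars, 4), round(r_cents, 4))  # correlation is unchanged
```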

The standard error of the estimate, Se, is what?

The positive square root of Se^2
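A minimal sketch of the calculation, assuming the usual definition Se^2 = SSE / (n - k - 1), where k is the number of explanatory variables. The data are hypothetical.

```python
from math import sqrt

# Hypothetical sample data for a simple linear regression (k = 1)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
k = 1

mean_x = sum(x) / n
mean_y = sum(y) / n
b1 = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum((a - mean_x) ** 2 for a in x)
b0 = mean_y - b1 * mean_x

# Sum of squared residuals, then the sample variance of the residual
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
se_squared = sse / (n - k - 1)

# Standard error of the estimate: the positive square root of Se^2
se = sqrt(se_squared)
print(round(se, 4))
```

Because Se is built from squared deviations and a positive square root, it can never be negative.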

For which of the following situations is the multiple regression model appropriate?

The response variable is influenced by two or more explanatory variables

When would you use a simple linear regression model

The response variable y is influenced by one explanatory variable

What best defines the rejection region of a test?

The set of values of a test statistic for which the null hypothesis is rejected

y(hat) = 20 - 3x. Interpret the estimated slope coefficient:

The slope is negative, indicating a negative linear relationship: for every unit increase in x, y(hat) decreases by 3 units
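The slope interpretation can be checked numerically with a tiny sketch. The function name `predict` and the chosen x values are illustrative.

```python
def predict(x):
    """Sample regression equation y(hat) = 20 - 3x."""
    return 20 - 3 * x

# Change in the predicted value for a one-unit increase in x
change = predict(5) - predict(4)
print(change)  # -3
```

The same change of -3 results at any starting value of x, which is what a constant slope means.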

Which of the following is a possible advantage of using multiple tools to judge the validity of a regression model?

To avoid the risk of using the wrong model

T/F: In multiple linear regression for the sample regression equation, bi measures the change in the predicted value of the response variable y(hat) given a unit increase in the associated explanatory variable xi, holding all other explanatory variables constant

True; bi represents the partial influence of xi on y(hat)
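A short sketch of "holding all other explanatory variables constant", using made-up coefficients b0 = 10, b1 = 2.5, and b2 = -1.0 for a model with two explanatory variables.

```python
# Hypothetical estimated coefficients for y(hat) = b0 + b1*x1 + b2*x2
b0, b1, b2 = 10.0, 2.5, -1.0

def predict(x1, x2):
    """Predicted response for given values of the two explanatory variables."""
    return b0 + b1 * x1 + b2 * x2

# Raise x1 by one unit while x2 stays fixed at 7
partial_effect_x1 = predict(4, 7) - predict(3, 7)
print(partial_effect_x1)  # 2.5, the value of b1
```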

Why do we use the multiple regression model instead of the simple regression model?

We add explanatory variables to increase model's usefulness

When is there no linear relationship?

When covariance is 0

deterministic relationship between variables

When the value of the response variable is uniquely determined by the values of the explanatory variables

which best describes outliers?

a few extreme high or low values in the data set

What best defines a test statistic in a hypothesis test?

a variable upon which the decision in hypothesis testing is based

In practice, we use a stochastic model over a deterministic model because

certain variables that impact the response variable are not included in the model

What is a strong correlation relationship?

close to -1 or 1

The goodness-of-fit measure that quantifies the proportion of the variation in the response variable that is explained by the sample regression equation is the

coefficient of determination

In a regression model, the Multiple R is the

correlation between the response variable and its predicted value
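As an illustrative check with hypothetical data, the sketch below computes Multiple R directly as the correlation between y and y(hat) and compares it with the square root of R^2. The helper name `sample_correlation` is an assumption for this example.

```python
from math import sqrt

def sample_correlation(u, v):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / sqrt(s_uu * s_vv)

# Hypothetical sample data and its least squares line
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
b1 = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum((a - mean_x) ** 2 for a in x)
b0 = mean_y - b1 * mean_x
y_hat = [b0 + b1 * a for a in x]

# Multiple R as the correlation between observed and predicted values
multiple_r = sample_correlation(y, y_hat)

# R^2 as explained variation over total variation
r_squared = sum((p - mean_y) ** 2 for p in y_hat) / sum((b - mean_y) ** 2 for b in y)

print(round(multiple_r, 4), round(sqrt(r_squared), 4))  # the two values agree
```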

What is the difference between correlation and causation?

correlation means that two variables are related, but causation means one causes the other to happen

in regression analysis, the response variable is also called the

dependent variable

Other names for response variable

dependent variable, the explained variable, the predicted variable, or the regressand

When the response variable is uniquely determined by the explanatory variable, the relationship is ______

deterministic

unlike R(squared), adjusted R(squared) can be used to compare regression models with

different numbers of explanatory variables

In order to select the preferred model (multiple or singular) we need to examine what?

goodness-of-fit measures

Since the standard error of the estimate....

has no predefined upper limit, it is hard to interpret in isolation

Multiple linear regression model allows us to study what?

how the response variable is influenced by two or more explanatory variables

In regression analysis, the explanatory variable is also called the

independent variable

Other names for explanatory variable

independent variables, predictor variables, control variables, or regressors

One limitation of correlation analysis is that

it only captures a linear relationship between two variables

In hypothesis tests about the population correlation coefficient, the alternative hypothesis of not equal to zero is used when testing whether two variables are ____________

linearly related

When two regression models applied on the same data set have the same response variable but a different number of explanatory variables, the model that would provide the better fit is the one with the

lower s and higher adjusted R(squared)

If the correlation between the response variable and the explanatory variables is sufficiently low, then adjusted R(squared)

may be negative

Sample covariance

measure of the linear relationship between two variables X and Y

The common approach to fitting a line to sample data in a scatterplot is to

minimize the value of the sum of the squared residuals

In E(y) = β0 + β1x, when β1 < 0 what is the relationship?

negative linear relationship

In E(y) = β0 + β1x, when β1 = 0 what is the relationship?

no linear relationship

How many explanatory variables does a simple linear regression model have?

one

In E(y) = β0 + β1x, when β1 > 0 what is the relationship?

positive linear relationship

OLS method (ordinary least squares)

produces the straight line that is "closest" to the data by finding where the SSE is minimized. SSE is the sum of the squared differences between the observed values y and their predicted values y(hat) OR the sum of the squared distances from the regression equation
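A minimal sketch of OLS for one explanatory variable, assuming the standard closed-form estimates b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)^2 and b0 = ȳ - b1·x̄ (these formulas are not spelled out in this set). The data and function names are hypothetical.

```python
def ols_fit(x, y):
    """Return (b0, b1), the intercept and slope that minimize SSE."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b1 = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum((a - mean_x) ** 2 for a in x)
    b0 = mean_y - b1 * mean_x
    return b0, b1

def sse(x, y, b0, b1):
    """Sum of squared differences between observed y and predicted y(hat)."""
    return sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))

# Hypothetical sample data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = ols_fit(x, y)
sse_ols = sse(x, y, b0, b1)

# Any other line, for example one with a slightly steeper slope, has a larger SSE
sse_other = sse(x, y, b0, b1 + 0.1)

print(round(b0, 2), round(b1, 2), round(sse_ols, 2), round(sse_other, 2))
```

Comparing `sse_ols` with `sse_other` shows the sense in which the OLS line is "closest" to the data.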

Which of the following measures are used to judge the goodness-of-fit of a regression model?

s, R(squared), and adjusted R(squared)

The sample variance of the residual s(squared), is defined as

the average of the squared differences between y and y(hat), computed as SSE / (n - k - 1), where k is the number of explanatory variables

residual e

the difference between the observed and the predicted values of y, that is, y − y(hat)

Simple linear regression model assumption

the expected value of y lies on a straight line, denoted by β0 + β1x, where β0 and β1 are the unknown intercept and slope parameters

SSR represents

the explained variation in the response variable

To estimate the parameters β0 and β1 we use what?

the method of least squares (ordinary least squares (OLS))

In the sample regression equation: y(hat)= b0 + b1x, y(hat) is

the predicted value of the response variable given a specified value of the explanatory variable x

unlike R(squared), adjusted R(squared) explicitly accounts for

the sample size and the number of explanatory variables
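A sketch of the adjustment, assuming the standard formula adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - k - 1), which is not written out in this set. The R^2 values, sample sizes, and function name are invented for illustration.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for a model with k explanatory variables and n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Hypothetical comparison on the same 30 observations:
# Model A uses 1 explanatory variable, Model B uses 4
adj_a = adjusted_r_squared(0.720, n=30, k=1)
adj_b = adjusted_r_squared(0.725, n=30, k=4)
print(round(adj_a, 3), round(adj_b, 3))  # B has the higher R^2 but the lower adjusted R^2

# With a weak fit and several explanatory variables, the value can drop below zero
adj_weak = adjusted_r_squared(0.05, n=10, k=3)
print(round(adj_weak, 3))
```

The penalty for extra explanatory variables is why adjusted R^2, rather than R^2, is used when the competing models differ in the number of explanatory variables.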

How to determine the better fit to a model

the smaller Se implies a better fit to the model

Why do we conduct a hypothesis test for the correlation coefficient?

to determine whether the apparent relationship between the two variables, implied by the sample correlation coefficient, is real or due to chance

What does a negative value of the sample covariance imply?

x and y have a negative linear relationship: on average, when x is above its mean, y is below its mean

What does a positive value of the sample covariance imply?

x and y have a positive linear relationship: on average, when x is above its mean, y is also above its mean

On a scatterplot for simple linear regression model, where do x and y go?

y on vertical axis, x on horizontal axis implying that x influences the variation in y

deterministic component of the simple linear regression model

β0 + β1x (aka the expected value for y for a given value of x)

