BNAD 277 Chapter 14
T/F: By restricting the number of explanatory variables to one in the simple linear regression model, we reduce the potential usefulness of the model
True
T/F: The scatterplot is a graphical tool where each point in the plot represents a pair of observed values of the two variables
True
Limitations of Correlation Coefficient Analysis
-Captures only a linear relationship (the variables could have a nonlinear relationship) -May not be a reliable measure when outliers are present in one or both of the variables -Correlation does not imply causation (if two variables are highly correlated, one does not necessarily cause the other)
What can correlation tell you?
-Strength of the linear relationship -Everything covariance can tell you (whether there is a linear relationship and its direction)
What is true concerning correlation analysis?
-The correlation coefficient captures only a linear relationship -the correlation coefficient may not be a reliable measure when outliers are present in one or both of the variables -correlation does not imply causation
Which of the following is true of the standard error of the estimate?
-Theoretically, its value has no predefined upper limit -it is a measure of the accuracy of the regression model -It is based on the squared deviations between the actual and predicted values of the response variable
When to use Standard Error of the Estimate
-When we are comparing various models -The model with the smaller Se is the better fit
Regression Analysis
-captures the causal relationship between variables -captures the effect of explanatory variables (one or more) on the response variable
Scatterplot
-helps determine whether or not two variables are related in some systematic way -Each point in diagram represents a pair of observed values of the two variables
Goodness-of-fit measures:
-the standard error of the estimate -the coefficient of determination -the adjusted coefficient of determination
Adjusted R^2
-used to compare competing linear regression models with different numbers of explanatory variables -the higher the value, the better the model
Correlation relationships
.7-1: strong .3-.7: moderate below .3: weak (same for negatives)
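A minimal sketch of the bands above. The function name `correlation_strength` is hypothetical, and treating exactly .7 as strong and exactly .3 as moderate is an assumption, since the card does not say which band the cutoffs belong to.

```python
def correlation_strength(r: float) -> str:
    """Label a correlation coefficient using the rough bands on the card."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    size = abs(r)  # the sign gives direction only, so negatives use the same bands
    if size >= 0.7:   # assumption: the cutoff itself counts as strong
        return "strong"
    if size >= 0.3:   # assumption: the cutoff itself counts as moderate
        return "moderate"
    return "weak"


for r in (-0.85, 0.45, 0.10):  # illustrative values, not from the source
    print(r, correlation_strength(r))
```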
What values can the standard error of the estimate, s, assume?
0 ≤ s < ∞
What does the value of R^2 fall between?
0 and 1. Closer to 1 = stronger fit; closer to 0 = weaker fit (same with R)
Se can assume what values
0 to infinity but the closer to 0, better the model fits
Interpret an R^2 of .72
72% of the sample variation in the response variable is explained by the sample regression equation (stronger fit). Other factors not included in the model account for the remaining 28% of the sample variation
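A rough sketch of where such a percentage comes from, assuming R^2 is computed as 1 - SSE/SST. The data are hypothetical (they give an R^2 of .60, not the .72 on the card), and the predictions are assumed to come from a least-squares line fitted to them.

```python
y = [2, 4, 5, 4, 5]                # hypothetical observed values
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]  # hypothetical least-squares predictions

y_bar = sum(y) / len(y)
sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation in y
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
r_squared = 1 - sse / sst

print(f"R^2 = {r_squared:.2f}")
print(f"{r_squared:.0%} explained by the sample regression equation, "
      f"{1 - r_squared:.0%} left to other factors")
```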
In simple linear regression, a downward sloping trend line suggests which of the following?
A negative linear relationship between x and y
The standard error of the estimate can assume what value?
Between zero and infinity
The coefficient of determination can assume what value?
Between zero and one
Spurious correlation
Can make two variables appear closely related when no causal relation exists
What sample correlation coefficient would show the strongest association between X and Y?
Closest to -1 or 1
What is the notation for the random error term?
ε (epsilon)
Inexact relationship
If the value of the response variable is not uniquely determined by the explanatory variables
If the sample regression equation is y(hat) = 15 - 5x, which is the correct interpretation of the estimated slope coefficient?
For every unit increase in x, y will decrease, on average, by 5 units
Hypothesis for Test Statistic for ρxy
H0: ρxy = 0 HA: ρxy ≠ 0
Two-tailed test of whether the population correlation coefficient differs from zero:
H0: ρxy = 0 HA: ρxy ≠ 0
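A sketch of the test statistic usually paired with these hypotheses, t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The function name and the sample values (r = .60, n = 27) are hypothetical, and the critical value is assumed to be read from a t table rather than computed.

```python
import math


def correlation_t_stat(r: float, n: int) -> float:
    """t statistic for H0: rho_xy = 0, using n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than two observations")
    if abs(r) >= 1:
        raise ValueError("the statistic is undefined when |r| = 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


t = correlation_t_stat(r=0.60, n=27)  # hypothetical sample values
critical = 2.060                      # t table value for df = 25, alpha = 0.05, two-tailed

print(f"t = {t:.2f}")
print("reject H0" if abs(t) > critical else "do not reject H0")
```

With these made-up numbers the statistic falls in the rejection region, so the sample would support a linear relationship between the two variables.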
Which of the following is NOT true of the standard error of the estimate?
It can take on negative values
What type of relationship exists between two variables if as one increases, the other decreases?
Negative
What kind of relationship: As one variable increases, the other increases
Positive
What type of relationship exists between two variables if as one increases, the other increases?
Positive
Which is easier to interpret and why? R^2 or Se?
R^2 because it has both lower and upper bounds that make its interpretation more intuitive
Multiple R
R^2 is the square of multiple R Multiple R is the square root of R^2
The residual e represents
The difference between an observed and predicted value of the response variable at a given value of the explanatory variable
What can covariance tell you?
The direction of the linear relationship between two variables (+ or -) CANNOT say strength of the relationship
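A small sketch of why covariance shows direction but not strength, using hypothetical data and hypothetical helper names. Rescaling x changes the covariance a hundredfold while the correlation coefficient stays the same.

```python
import math


def sample_cov(a, b):
    """Sample covariance with the n - 1 divisor."""
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)


def sample_corr(a, b):
    """Correlation = covariance divided by the two sample standard deviations."""
    return sample_cov(a, b) / math.sqrt(sample_cov(a, a) * sample_cov(b, b))


x = [1, 2, 3, 4, 5]               # hypothetical data
y = [2, 4, 5, 4, 5]
x_scaled = [100 * xi for xi in x]  # same x, measured in smaller units

print(sample_cov(x, y), sample_cov(x_scaled, y))    # both positive, very different sizes
print(sample_corr(x, y), sample_corr(x_scaled, y))  # unchanged by the rescaling
```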
The standard error of the estimate, Se, is what?
The positive square root of Se^2
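A sketch of the calculation, assuming the usual definition Se^2 = SSE / (n - k - 1), where k is the number of explanatory variables. The observed and predicted values are hypothetical.

```python
import math

y = [2, 4, 5, 4, 5]                # hypothetical observed values
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]  # hypothetical predictions from a simple regression
k = 1                              # one explanatory variable

n = len(y)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
se_squared = sse / (n - k - 1)     # sample variance of the residuals
se = math.sqrt(se_squared)         # positive square root, so Se is never negative

print(f"Se^2 = {se_squared:.2f}, Se = {se:.4f}")
```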
For which of the following situations is the multiple regression model appropriate?
The response variable is influenced by two or more explanatory variables
When would you use a simple linear regression model?
The response variable y is influenced by one explanatory variable
What best defines the rejection region of a test?
The set of values of a test statistic for which the null hypothesis is rejected
y(hat) = 20 - 3x. Interpret the estimated slope coefficient:
The slope is negative, indicating a negative linear relationship: for every unit increase in x, y(hat) decreases by 3 units
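A quick check of that reading, using the equation from the card. The x values are arbitrary illustrative choices; the change in the prediction is the same for any one-unit step.

```python
def y_hat(x: float) -> float:
    """Sample regression equation from the card: y(hat) = 20 - 3x."""
    return 20 - 3 * x


for x in (2, 3, 10):  # arbitrary x values
    change = y_hat(x + 1) - y_hat(x)
    print(f"x: {x} -> {x + 1}, y(hat): {y_hat(x)} -> {y_hat(x + 1)}, change = {change}")
```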
Which of the following is a possible advantage of using multiple tools to judge the validity of a regression model?
To avoid the risk of using the wrong model
T/F: In multiple linear regression for the sample regression equation, bi measures the change in the predicted value of the response variable y(hat) given a unit increase in the associated explanatory variable xi, holding all other explanatory variables constant
True; bi represents the partial influence of xi on y(hat)
Why do we use the multiple regression model instead of the simple regression model?
We add explanatory variables to increase model's usefulness
When is there no linear relationship?
When covariance is 0
deterministic relationship between variables
When the value of the response variable is uniquely determined by the values of the explanatory variables
which best describes outliers?
a few extreme high or low values in the data set
What best defines a test statistic in a hypothesis test?
a variable upon which the decision in hypothesis testing is based
In practice, we use a stochastic model over a deterministic model because
certain variables that impact the response variable are not included in the model
What is a strong correlation relationship?
close to -1 or 1
The goodness-of-fit measure that quantifies the proportion of the variation in the response variable that is explained by the sample regression equation is the
coefficient of determination
In a regression model, the Multiple R is the
correlation between the response variable and its predicted value
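A sketch that checks this numerically on hypothetical data: the correlation between y and y(hat) should match the square root of R^2 when the predictions come from a least-squares fit. The helper name `corr` is an assumption for illustration.

```python
import math

y = [2, 4, 5, 4, 5]                # hypothetical observed values
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]  # hypothetical least-squares predictions


def corr(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    a_bar, b_bar = sum(a) / len(a), sum(b) / len(b)
    s_ab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    s_aa = sum((ai - a_bar) ** 2 for ai in a)
    s_bb = sum((bi - b_bar) ** 2 for bi in b)
    return s_ab / math.sqrt(s_aa * s_bb)


multiple_r = corr(y, y_hat)

y_bar = sum(y) / len(y)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - sse / sst

print(f"Multiple R = {multiple_r:.4f}, sqrt(R^2) = {math.sqrt(r_squared):.4f}")
```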
What is the difference between correlation and causation?
correlation means that two variables are related, but causation means one causes the other to happen
in regression analysis, the response variable is also called the
dependent variable
Other names for response variable
dependent variable, the explained variable, the predicted variable, or the regressand
When the response variable is uniquely determined by the explanatory variable, the relationship is ______
deterministic
unlike R(squared), adjusted R(squared) can be used to compare regression models with
different numbers of explanatory variables
In order to select the preferred model (multiple or singular) we need to examine what?
goodness-of-fit measures
Since the standard error of the estimate....
has no predefined upper limit, it is hard to interpret in isolation
Multiple linear regression model allows us to study what?
how the response variable is influenced by two or more explanatory variables
In regression analysis, the explanatory variable is also called the
independent variable
Other names for explanatory variable
independent variables, predictor variables, control variables, or regressors
One limitation of correlation analysis is that
it only captures a linear relationship between two variables
In hypothesis tests about the population correlation coefficient, the alternative hypothesis of not equal to zero is used when testing whether two variables are ____________
linearly related
When two regression models applied on the same data set have the same response variable but a different number of explanatory variables, the model that would provide the better fit is the one with the
lower s and higher adjusted R(squared)
If the correlation between the response variable and the explanatory variables is sufficiently low, then adjusted R(squared)
may be negative
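A sketch of how that can happen, assuming the standard formula adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1). The function name and the input values are hypothetical.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k explanatory variables."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


decent_fit = adjusted_r_squared(0.60, n=5, k=1)   # hypothetical values
weak_fit = adjusted_r_squared(0.05, n=12, k=3)    # low R^2, several variables

print(f"{decent_fit:.4f}")  # below the unadjusted 0.60
print(f"{weak_fit:.4f}")    # negative: the penalty outweighs the small R^2
```

Because the penalty grows with k, adding an explanatory variable raises adjusted R^2 only if it improves the fit enough to offset the lost degree of freedom.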
Sample covariance
measure of the linear relationship between two variables X and Y
The common approach to fitting a line to sample data in a scatterplot is to
minimize the value of the sum of the squared residuals
In E(y) = β0 + β1x, when β1 < 0 what is the relationship?
negative linear relationship
In E(y) = β0 + β1x, when β1 = 0 what is the relationship?
no linear relationship
How many explanatory variables does a simple linear regression model have?
one
In E(y) = β0 + β1x, when β1 > 0 what is the relationship?
positive linear relationship
OLS method (ordinary least squares)
produces the straight line that is "closest" to the data by finding where the SSE is minimized. SSE is the sum of the squared differences between the observed values y and their predicted values y(hat) OR the sum of the squared distances from the regression equation
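A minimal sketch of OLS for one explanatory variable, using the closed-form estimates b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2) and b0 = y_bar - b1 * x_bar. The data are hypothetical; the last line checks that nudging either estimate makes the SSE larger.

```python
x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Closed-form least-squares estimates of the slope and intercept.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar


def sse(intercept: float, slope: float) -> float:
    """Sum of squared differences between y and y(hat) for a candidate line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


print(f"y(hat) = {b0:.2f} + {b1:.2f}x, SSE = {sse(b0, b1):.2f}")
print(sse(b0 + 0.5, b1) > sse(b0, b1), sse(b0, b1 - 0.2) > sse(b0, b1))
```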
Which of the following measures are used to judge the goodness-of-fit of a regression model?
s, R(squared), and adjusted R(squared)
The sample variance of the residual, s(squared), is defined as
the average of the squared differences between y and y(hat), that is, SSE divided by n − k − 1
residual e
the difference between the observed and the predicted values of y, that is, y − y(hat)
Simple linear regression model assumption
the expected value of y lies on a straight line, denoted by β0 + β1x, where β0 and β1 are the unknown intercept and slope parameters
SSR represents
the explained variation in the response variable
To estimate the parameters β0 and β1 we use what?
the method of least squares (ordinary least squares (OLS))
In the sample regression equation: y(hat)= b0 + b1x, y(hat) is
the predicted value of the response variable given a specified value of the explanatory variable x
unlike R(squared), adjusted R(squared) explicitly accounts for
the sample size and the number of explanatory variables
How to determine the better fit to a model
the smaller Se implies a better fit to the model
Why do we conduct a hypothesis test for the correlation coefficient?
to determine whether the apparent relationship between the two variables, implied by the sample correlation coefficient, is real or due to chance
What does a negative value of the sample covariance imply?
on average, when x is above its mean, y is below its mean; x and y have a negative linear relationship
What does a positive value of the sample covariance imply?
on average, when x is above its mean, y is also above its mean; x and y have a positive linear relationship
On a scatterplot for simple linear regression model, where do x and y go?
y on vertical axis, x on horizontal axis implying that x influences the variation in y
deterministic component of the simple linear regression model
β0 + β1x (aka the expected value for y for a given value of x)