Data Analysis: Chapter 12: Simple Regression
True or false: If a relationship exists between a response variable Y and a predictor variable X, it is appropriate to say that X causes variation in Y.
false
standard error
measure of overall fit - denoted as se - the square root of SSE / (n - 2), where SSE is the sum of the squared residuals
simple regression line
simple linear model: Y = slope × X + y-intercept
What does SSE represent in regression analysis?
the amount of variation in Y that is left unexplained
response variable
y variable (the dependent variable) - treated as a random variable
Which of the following is a use of the standard error of estimate in regression analysis?
As a goodness of fit measure
When using a simple regression equation for predicting a response variable, _______, or predicting outside the range of observed x values, should be approached with caution.
Extrapolation
The ______ statistic is defined as the ratio MSR/MSE.
F
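A minimal sketch of the ratio on this card, assuming made-up sums of squares (SSR = 3.6, SSE = 2.4, n = 5 are illustrative values, not from the chapter). In simple regression MSR = SSR / 1 and MSE = SSE / (n - 2).

```python
# Assumed values for illustration only (not taken from the chapter)
ssr = 3.6   # variation explained by the regression
sse = 2.4   # unexplained error variation
n = 5       # sample size

msr = ssr / 1        # one predictor, so 1 degree of freedom
mse = sse / (n - 2)  # n - 2 degrees of freedom for error
f_stat = msr / mse

print(round(f_stat, 2))
```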
The R2 value falls in the range ____ to ____.
0 to 1
regression assumptions
1. The errors are normally distributed 2. The errors have constant variance, σ^2 3. The errors are independent of each other
Which of the following choices describe the information we look for on a scatterplot?
1. pattern of relationship 2. presence of outliers 3. direction of relationship
What type of linear relationship appears to exist between two variables if Rxy = 0.83?
1. positive 2. strong
For which of the following scenarios would a simple regression model be appropriate?
1. predicting height from age 2. predicting salary from years of education 3. predicting units sold from advertising spending
Given the following output from a regression analysis: y(hat) = -54 + 2.3x, R2 = .75, p-value = .003 for the zero slope test (α = .05), one would conclude:
1. the slope is significantly different from zero 2. for each unit increase in X, Y increases by 2.3 units, on average. 3. Approximately 75% of the variation in Y is explained by X.
Given that SSxy = 492.45, SSxx = 234.5, b1 = ____________ (round to 1 decimal).
2.1
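The arithmetic behind this answer can be checked with a short sketch. It assumes the first quantity on the card is the cross-product sum SSxy, so that the slope estimate is b1 = SSxy / SSxx; the variable names are illustrative.

```python
# Values from the card; ss_xy assumes the first quantity is the cross-product sum SSxy
ss_xy = 492.45
ss_xx = 234.5

b1 = ss_xy / ss_xx           # slope estimate b1 = SSxy / SSxx
b1_rounded = round(b1, 1)    # round to 1 decimal, as the card asks

print(b1_rounded)
```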
In a hypothesis test about the population correlation coefficient, ρ, the null hypothesis for a two-tailed test is ________
H0: ρ = 0
decomposition of variance
SST = SSE+SSR total variation around the mean = unexplained error variation + variation explained by the regression.
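The identity can be verified numerically. The sketch below fits a least-squares line to a small hypothetical sample (the data are made up for illustration) and compares SST with SSR + SSE.

```python
# Hypothetical sample, used only to illustrate the identity SST = SSR + SSE
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least-squares slope and intercept
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation around the mean
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # variation explained by the regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained error variation

print(sst, ssr + sse)  # the two values agree up to floating-point rounding
```

The identity holds exactly only for a least-squares fit that includes an intercept; for an arbitrary line the cross term does not vanish.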
The statistic that quantifies the strength of the linear relationship between two quantitative variables, X and Y, is called the ________
correlation coefficient
Which of the following statements best explains the term "goodness of fit"?
how well the regression model fits the sample data
How does the coefficient of determination help as a goodness of fit tool in regression analysis?
it gives the percentage of the variation in Y explained by the sample regression equation
sample correlation coefficient
measures the degree of linearity in the relationship between two random variables X and Y and is denoted as r. - its value will fall in the interval [-1,1].
standardized residual
obtained by dividing each residual by its standard error - offer the advantage of a predictable scale
In statistics, a straight-line model of the relationship between two variables, X and Y, is called a ___________
simple regression equation
With the population linear model: Y=B0 + B1X + E, B1 represents the ____
slope population parameter
Simple regression describes the relationship between two variables, X and Y, using the ______ and _____ form of a linear equation.
slope; intercept
The difference between the quick rule formulas for a confidence interval for Y and a prediction interval for Y is ________
the confidence interval includes the square root of n in the denominator of the margin of error whereas the prediction interval does not.
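A rough sketch of the two quick-rule margins, taking the rules to be y(hat) ± t × se / sqrt(n) for the confidence interval and y(hat) ± t × se for the prediction interval. All numbers are assumed for illustration; 2.069 is the two-tailed 5% t critical value for 23 degrees of freedom.

```python
import math

# Assumed inputs for illustration only (not from the chapter)
y_hat = 50.0   # predicted value of Y at the chosen x
se = 4.0       # standard error of the regression
n = 25         # sample size
t_crit = 2.069 # t critical value for alpha = .05, n - 2 = 23 degrees of freedom

ci_margin = t_crit * se / math.sqrt(n)  # confidence interval for the mean of Y
pi_margin = t_crit * se                 # prediction interval for an individual Y

ci = (y_hat - ci_margin, y_hat + ci_margin)
pi = (y_hat - pi_margin, y_hat + pi_margin)
print(ci, pi)
```

The prediction interval is always the wider of the two, by a factor of sqrt(n) under these quick rules.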
The use of the standard error of regression, se, as a measure of goodness of fit of a model is best expressed by which of the following statements?
the smaller the value, the better the fit
The test for zero slope will give the same result as the test for zero correlation because _____.
the tcalc value will be the same in both tests
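The equality of the two test statistics can be checked on a small hypothetical sample. This sketch assumes the usual formulas: t = b1 / (se / sqrt(SSxx)) for the slope and t = r × sqrt((n - 2) / (1 - r^2)) for the correlation. The data are made up.

```python
import math

# Hypothetical sample for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Test for zero slope: t = b1 / s_b1, with s_b1 = se / sqrt(SSxx)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se = math.sqrt(sse / (n - 2))
t_slope = b1 / (se / math.sqrt(ss_xx))

# Test for zero correlation: t = r * sqrt((n - 2) / (1 - r^2))
r = ss_xy / math.sqrt(ss_xx * ss_yy)
t_corr = r * math.sqrt((n - 2) / (1 - r ** 2))

print(t_slope, t_corr)  # equal up to floating-point rounding
```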
method of maximum likelihood
this method chooses values of the regression parameters that will maximize the probability of obtaining the observed sample data
True or false: If we leave off the error term from the population linear model, the left hand side of the equation is now the expected value of Y for a given x value.
true
predictor variable
x variable (the independent variable)
In order to fit a regression line on an Excel scatterplot:
1. from the ribbon: click on chart tools > layout > trend line 2. highlight the data points on the scatterplot. Right click and choose Add trend line
Fill in the missing symbols between the sums of squares to express the relationship: SST _______ SSR__________SSE
SST = SSR + SSE
If the slope parameter, B1 =3.52, what does this suggest about the nature of the relationship between X and Y?
positive relationship
spurious correlation
two variables appear related because of the way they are defined.
Which of the following sample correlation coefficients show the strongest association between X and Y?
-0.95
In the simple linear regression model, which of the following is another name for the predictor variable?
1. X variable 2. independent variable
Business applications of correlation analysis
1. financial planners study correlations between asset classes over time, in order to help their clients diversify their portfolios. 2. Marketing analysts study correlations between customer online purchases in order to develop new web advertising strategies 3. Human Resources experts study correlations between measures of employee performance in order to devise new job-training programs.
If the trend line equation is Y(hat) = 15 + 5x, which of the following is the correct interpretation of 5?
For every unit increase in X, Y on the average will increase by 5 units
Both the standard error of the slope estimate and the standard error of the intercept estimate are calculated using:
Se
Which of the following statements is true about the test of H0: ρ = 0?
The test statistic is assumed to follow the t-distribution with n-2 degrees of freedom
autocorrelation
a pattern of nonindependent errors, mainly found in time-series data.
what is a residual?
calculated as the observed value of y minus the estimated value of y - used to estimate the standard deviation of the errors
A _____ interval for Y, the response variable, predicts the mean of Y whereas a ____ interval for Y predicts the individual value for Y.
confidence; prediction
The Excel regression analysis is found in _____
data > data analysis > regression
The goal in regression analysis is to ____
explain variation in the response variable about its mean
high leverage
indicates that the observation is far from the mean of X - such observations have great influence on the regression estimates because they are at the "end of the lever"
The method used to find the line of best fit (minimizing the sum of the squared residuals) is called the ____________ ___________ ________ or OLS.
ordinary least squares
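A minimal sketch of an OLS fit for one predictor, using the closed-form estimates b1 = SSxy / SSxx and b0 = y-bar - b1 × x-bar. The function name fit_ols and the sample data are hypothetical, chosen for illustration.

```python
def fit_ols(x, y):
    """Return (b0, b1) for the line that minimizes the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx       # slope
    b0 = y_bar - b1 * x_bar  # intercept: the fitted line passes through (x_bar, y_bar)
    return b0, b1


# Hypothetical sample for illustration only
b0, b1 = fit_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b0, 2), round(b1, 2))
```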
Given the population linear model y=B0 + B1x + E, the term E represents _____
random error due to predictor variables other than X that affect Y
Se is an estimate for the standard deviation of the ____
regression errors
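As a sketch of how this estimate is formed, the residuals from a fitted line are squared, summed, divided by n - 2, and square-rooted. The residuals below are assumed values for illustration.

```python
import math

# Assumed residuals (observed y minus fitted y) from a hypothetical fit
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
n = len(residuals)

sse = sum(e ** 2 for e in residuals)  # sum of squared residuals
se = math.sqrt(sse / (n - 2))         # n - 2 because two parameters were estimated

print(round(se, 3))
```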
The graph used to show the direction of the relationship between two plotted variables is the ______ plot
scatter
ordinary least squares method
used to estimate a regression so as to ensure the best fit
The residual, ei, in a simple linear regression model represents which of the following?
yi-y(hat)i