stats final comprehensive questions
Why should you always carry out a residual analysis as part of a regression model?
residual analysis evaluates the assumptions of regression to help determine if the regression model that is selected is appropriate
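A minimal sketch of where residuals come from, using made-up data and hypothetical names (fit_line, residuals). It fits a least-squares line and computes each residual as observed Y minus predicted Y; in practice you would plot these against X to look for patterns.

```python
def fit_line(x, y):
    """Return (b0, b1) for the least-squares line Yhat = b0 + b1 * X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ssxy / ssx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [5, 7, 10, 11, 14]

b0, b1 = fit_line(x, y)

# Residual e_i = observed Y_i minus predicted Y_i.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

for xi, ei in zip(x, residuals):
    print(f"X = {xi}: residual = {ei:+.2f}")
```

If the residuals fan out, curve, or trend when plotted against X, one of the regression assumptions is likely violated.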
What are the assumptions of Regression?
Linearity, Independence of errors, Normality of errors, Equal variance (mnemonic: LINE)
How does the interpretation of the regression coefficients differ in multiple regression and simple linear regression?
SIMPLE LINEAR = the slope (B1) represents the change in the mean of Y per unit change in X; no other variables are taken into account. MULTIPLE = the slope (B1) represents the change in the mean of Y per unit change in X1, holding constant the effects of the other X variables (for example, X2 in a model with two independent variables).
What is the interpretation of the Y intercept and the slope in the simple linear regression equation?
Y intercept (B0) = the mean value of Y when X equals 0; Slope (B1) = the expected change in the mean of Y per unit change in X
What is the difference between r^2 and adjusted r^2?
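r^2 is the proportion of the variation in Y explained by the model, and it never decreases when another independent variable is added; adjusted r^2 is used in multiple regression because it takes into account both the number of independent variables and the sample size, so it gives a more appropriate basis for comparing models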
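A small sketch of the two measures, assuming n observations and k independent variables. The function names and the example numbers are hypothetical; the formulas are r^2 = 1 - SSE/SST and adjusted r^2 = 1 - (1 - r^2)(n - 1)/(n - k - 1).

```python
def r_squared(y, y_hat):
    """r^2 = 1 - SSE / SST, from observed and predicted Y values."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst

def adjusted_r_squared(r2, n, k):
    """Adjust r^2 for sample size n and number of independent variables k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same hypothetical r^2 and sample size, different numbers of X variables.
adj_two_vars = adjusted_r_squared(0.80, n=30, k=2)
adj_five_vars = adjusted_r_squared(0.80, n=30, k=5)

print(round(adj_two_vars, 4), round(adj_five_vars, 4))
```

With r^2 held fixed, the model with more independent variables gets the lower adjusted r^2, which is the penalty for added complexity.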
When do you use logistic regression?
when the dependent variable is categorical; logistic regression enables you to use a regression model to predict the probability of a particular categorical response for a given set of independent variables
What is the difference between moving averages and exponential smoothing?
a moving average gives equal weight to every value inside its averaging period, while exponential smoothing uses all past values with unequal weights, giving the highest weight to the most recent values
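A rough sketch of both smoothers on a made-up series. The function names, the period length of 3, and the smoothing coefficient W = 0.5 are assumptions for illustration. The exponential smoother starts at the first value and then uses E_i = W * Y_i + (1 - W) * E_(i-1).

```python
def moving_average(series, length=3):
    """Centered moving average; length is assumed to be odd."""
    half = length // 2
    return [sum(series[i - half:i + half + 1]) / length
            for i in range(half, len(series) - half)]

def exponential_smoothing(series, w):
    """E_1 = Y_1, then E_i = w * Y_i + (1 - w) * E_(i-1)."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(w * value + (1 - w) * smoothed[-1])
    return smoothed

# Made-up series for illustration only.
series = [10, 12, 14, 13, 15]

ma3 = moving_average(series, length=3)        # [12.0, 13.0, 14.0]
es = exponential_smoothing(series, w=0.5)     # [10, 11.0, 12.5, 12.75, 13.875]
```

The moving average loses the first and last periods, while exponential smoothing returns one smoothed value for every period.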
How does the least-squares linear trend forecasting model developed in this chapter differ from the least-squares linear regression model considered in Chapter 13?
for trend forecasting, you can simplify the interpretation of the coefficients by assigning coded values to the X (time) variable
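A sketch of the coded-time idea, assuming consecutive annual data coded as X = 0, 1, 2, and so on. The years, the values, and the name fit_trend are hypothetical. With this coding, b0 is the fitted value for the first period and b1 is the estimated change per period.

```python
def fit_trend(values):
    """Least-squares line with coded time X = 0, 1, 2, ... for consecutive periods."""
    n = len(values)
    coded_x = list(range(n))
    x_bar = sum(coded_x) / n
    y_bar = sum(values) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(coded_x, values))
          / sum((x - x_bar) ** 2 for x in coded_x))
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical annual values for 2019 through 2023 (coded X = 0 through 4).
sales = [5, 7, 10, 11, 14]
b0, b1 = fit_trend(sales)

# Forecast for 2024, which has coded X = 5.
forecast_2024 = b0 + b1 * 5

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, 2024 forecast = {forecast_2024:.2f}")
```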
principle of parsimony
guides you to pick the least complex regression model (the one with the fewest independent variables) that still adequately explains the dependent variable
Interpret the meaning of a slope coefficient equal to 2.2 in logistic regression.
holding the other variables constant, the estimated natural log of the odds of the event of interest increases by 2.2 for each unit increase in that independent variable; equivalently, the estimated odds are multiplied by e^2.2, about 9.03
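A quick numeric check of that interpretation. The slope of 2.2 comes from the question; the intercept of -3.0 and the function names are made up for illustration.

```python
import math

B0 = -3.0   # hypothetical intercept, for illustration only
B1 = 2.2    # slope from the question

def log_odds(x):
    """Estimated natural log of the odds at a given X."""
    return B0 + B1 * x

def odds(x):
    return math.exp(log_odds(x))

def probability(x):
    """Convert odds to a probability: odds / (1 + odds)."""
    return odds(x) / (1 + odds(x))

# A one-unit increase in X adds 2.2 to the log odds ...
log_odds_change = log_odds(2) - log_odds(1)

# ... which multiplies the odds by e^2.2.
odds_multiplier = odds(2) / odds(1)

print(f"change in log odds: {log_odds_change:.1f}")
print(f"odds multiplier: {odds_multiplier:.3f}")
print(f"P at X=1: {probability(1):.3f}, P at X=2: {probability(2):.3f}")
```

The change in probability is not constant per unit of X; only the change in the log of the odds is.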
What is the interpretation of the coefficient of determination?
measures the proportion of the variation in Y that is explained by the variation in X in the regression model
What is the issue that arises when categorical variables are present?
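when the dependent variable is categorical, least-squares regression is not appropriate: it violates the normality assumption of the least-squares method and can result in predicted Y values that are impossible (for example, a predicted probability below 0 or above 1)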
How does autoregressive modeling differ from the other approaches to forecasting?
an autoregressive model uses past values of the series itself as the independent variables; it is used to forecast time series that display autocorrelation (when values are highly correlated with the values that precede and succeed them)
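A minimal sketch of a first-order autoregressive model, Y_i = A0 + A1 * Y_(i-1), fitted by least squares on the lagged series. The series is made up (it was generated to follow 10 + 0.5 * previous value exactly) and fit_ar1 is a hypothetical name.

```python
def fit_ar1(series):
    """Regress each value on the value one period earlier; return (a0, a1)."""
    lagged = series[:-1]    # Y_(i-1): the independent variable
    current = series[1:]    # Y_i: the dependent variable
    n = len(lagged)
    x_bar = sum(lagged) / n
    y_bar = sum(current) / n
    a1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(lagged, current))
          / sum((x - x_bar) ** 2 for x in lagged))
    a0 = y_bar - a1 * x_bar
    return a0, a1

# Made-up series for illustration only.
series = [4, 12, 16, 18, 19, 19.5]
a0, a1 = fit_ar1(series)

# One-period-ahead forecast uses the most recent observed value.
forecast = a0 + a1 * series[-1]

print(f"a0 = {a0:.2f}, a1 = {a1:.2f}, next forecast = {forecast:.2f}")
```

Higher-order models add more lagged values (Y_(i-2), Y_(i-3), and so on) as further independent variables.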
Under what circumstances do you include an interaction term in a regression model?
when the effect of an independent (X1) variable on the dependent variable (Y) changes according to the value of a second independent variable (X2), you would use an interaction term to model an interaction effect in a regression model
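A sketch of what an interaction term does, assuming the model Y = b0 + b1*X1 + b2*X2 + b3*X1*X2. All coefficient values and function names are hypothetical. The point is that the effect of X1 on Y is b1 + b3*X2, so it changes with X2.

```python
# Hypothetical coefficients, for illustration only.
B0, B1, B2, B3 = 10.0, 2.0, 5.0, 1.5

def predict(x1, x2):
    """Predicted Y from a model with an X1 * X2 interaction term."""
    return B0 + B1 * x1 + B2 * x2 + B3 * x1 * x2

def effect_of_x1(x2):
    """Change in predicted Y per unit change in X1, at a given X2."""
    return B1 + B3 * x2

# The same one-unit increase in X1 has a different effect at each X2.
effect_when_x2_is_0 = predict(1, 0) - predict(0, 0)   # 2.0
effect_when_x2_is_1 = predict(1, 1) - predict(0, 1)   # 3.5
```

Without the interaction term (b3 = 0), the effect of X1 would be the same at every value of X2.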
Why and how do you use dummy variables?
Why? to examine the effect of a categorical independent variable. How? recode the categorical variable as dummy variables with numerical values of 0 and 1 so that it can be correctly analyzed in the regression analysis
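A small sketch of the recoding step. The function name dummy_code, the variable names, and the category labels are made up. A variable with two categories needs one dummy variable; a variable with more categories needs one dummy per category other than the baseline.

```python
def dummy_code(values, baseline):
    """Return one 0/1 column per category other than the baseline."""
    levels = sorted(set(values) - {baseline})
    return {level: [1 if v == level else 0 for v in values]
            for level in levels}

# Two categories: a single dummy variable (1 = yes, 0 = no).
fireplace = ["yes", "no", "no", "yes"]
fireplace_dummies = dummy_code(fireplace, baseline="no")
# {'yes': [1, 0, 0, 1]}

# Three categories: two dummy variables, with "A" as the baseline.
region = ["A", "B", "C", "A"]
region_dummies = dummy_code(region, baseline="A")
# {'B': [0, 1, 0, 0], 'C': [0, 0, 1, 0]}
```

The baseline category is the one coded 0 on every dummy, so each dummy's coefficient is read as a difference from that baseline.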
How does testing the significance of the entire multiple regression model differ from testing the contribution of each independent variable?
the overall F test determines whether there is a significant relationship between the dependent variable and the entire set of independent variables; the t test measures the contribution of one variable (X1) while the remaining variables (X2) are included in the model
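A sketch of the two test statistics, using hypothetical sums of squares, coefficient, and standard error. It assumes n observations and k independent variables, with F = MSR / MSE and t = b_j / S_bj. Comparing each statistic with its critical value or p-value is left out.

```python
def f_statistic(ssr, sse, n, k):
    """Overall F = MSR / MSE, with k and n - k - 1 degrees of freedom."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse

def t_statistic(b_j, se_b_j):
    """t for one slope: the coefficient divided by its standard error."""
    return b_j / se_b_j

# Hypothetical values for illustration only.
f_stat = f_statistic(ssr=120.0, sse=60.0, n=23, k=2)   # 20.0
t_stat = t_statistic(b_j=1.5, se_b_j=0.5)              # 3.0
```

One F statistic is computed for the whole model, while a separate t statistic is computed for each independent variable.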
What is a time series?
set of numerical data collected over time