GB 307 Exam 2
Indications of strong collinearity
-Best Subset Regression -Large change in the value of a previous coefficient when a new variable is added to the model -A previously significant variable becomes non-significant when a new independent variable is added
Time series forecasting methods
-Moving average: simple average of demand from some number of past periods -Weighted moving average: assigns different weights to each period's demand based on importance -Exponential Smoothing: moving average that puts less weight on older data -smoothing coefficient: weight given to most recent demand
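A minimal Python sketch of the moving-average and weighted-moving-average forecasts described above; the demand values and weights are hypothetical, chosen only for illustration.

```python
# Hypothetical past-period demand (most recent value last)
demand = [42, 40, 43, 41, 45, 47, 44]

def moving_average_forecast(series, k):
    """Forecast next period as the simple average of the last k values."""
    return sum(series[-k:]) / k

def weighted_moving_average_forecast(series, weights):
    """Forecast next period with per-period weights (most recent weight last);
    weights are assumed to sum to 1."""
    recent = series[-len(weights):]
    return sum(w * y for w, y in zip(weights, recent))

print(moving_average_forecast(demand, 3))                    # (45+47+44)/3 = 45.33
print(weighted_moving_average_forecast(demand, [0.2, 0.3, 0.5]))  # 45.1
```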
What would be the possible consequences of including highly correlated variables in multiple regressions?
-it will inflate the standard errors of the estimated coefficients -it will result in a model with high value of R2 but insignificant t-values for all parameter estimates -the signs of coefficients do not make sense based on logic
Cyclical Component
-repeating up and down movements -affected by business cycle, political, and economic factors -multiple years duration -often causal or associative relationships
square-root transformation
-use when data occur in the form of counts (integer values), such as a Poisson random variable -helps improve normality of the distribution and equality of variance
adjusted R^2 formula
1 - ( (1-r^2) * (n-1)/(n-k-1) )
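A one-function sketch of the formula above; the r^2, n, and k values in the example call are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """1 - (1 - r^2) * (n - 1) / (n - k - 1); n = observations, k = predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(adjusted_r2(r2=0.876, n=25, k=2))  # ~0.8647
```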
VIF
VIF = 1/(1-R^2), where R^2 comes from regressing that independent variable on the other independent variables. If VIF > 5 (equivalently R^2 > .80), high collinearity.
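A sketch of the VIF computation using NumPy least squares; the data are randomly generated, with the third predictor deliberately made collinear with the first so its VIF exceeds 5.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[:, 2] = 0.9 * X[:, 0] + 0.1 * rng.normal(size=50)  # make X3 collinear with X1

def vif(X, j):
    """VIF_j = 1 / (1 - R_j^2), regressing predictor j on the others."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])   # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    return 1 / (1 - r2)

for j in range(X.shape[1]):
    print(f"VIF for X{j+1}: {vif(X, j):.2f}")  # X1 and X3 should exceed 5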
Consider a 9-year moving average used to smooth a time series that was first recorded in 1984. 1. Which year serves as the 1st centered value in the smoothed series? 2. How many years of values in the series are lost when computing all 9-year moving averages?
1988 & 8
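A short sketch verifying the worked answer: with L = 9, the first centered value falls (L-1)/2 = 4 years in, and (L-1) = 8 years are lost overall. The 20-year span is hypothetical.

```python
years = list(range(1984, 2004))           # hypothetical 20-year series
L = 9
half = (L - 1) // 2                       # (L-1)/2 = 4 years lost at each end
centered_years = years[half:len(years) - half]
print(centered_years[0])                  # 1988, the first centered value
print(len(years) - len(centered_years))   # 8 years lost in total
```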
How many observations will be lost in creating 1. a five-period Moving Average model? 2. a 2nd-order Autoregressive model?
4 and 2
quadratic models
Models based on quadratic functions; the squared predictor term allows the fitted relationship to curve.
While determining the relationship between a dependent variable and 3 independent variables, we computed the Variance Inflationary Factor for each variable to measure collinearity among them. Which of the independent variables below is highly correlated with the other independent variables? VIF = 4.2742 VIF = 4.0006 VIF = 5.2425
The variable with VIF = 5.2425, since VIF > 5 indicates high collinearity
moving averages
A forecasting method that uses the average of the most recent k data values in the time series as the forecast for the next period.
Autoregressive model
A regression model in which a regression relationship based on past time series values is used to predict the future time series values.
linear trend model
A regression model where a time series variable changes by a constant amount each time period
exponential trend model
A regression model where a time series variable changes by a constant percentage each time period
curvilinear relationship
A relationship in which increases in the values of the first variable are accompanied by both increases and decreases in the values of the second variable. The value of the dependent variable Y increases or decreases at a changing rate as the value of X changes.
Exponential Smoothing
A weighted-moving-average forecasting technique in which data points are weighted by an exponential function.
In testing a Quadratic effect, we compare Adj R2 from simple regression model to which of the following from the quadratic model?
Adj R2
exponential smoothing formula
E(i) = W*Y(i) + (1-W)*E(i-1)
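A minimal sketch of the recursion above, using E(1) = Y(1) as a common starting convention; the series and W are hypothetical.

```python
def exponential_smoothing(y, w):
    """Apply E(i) = w*Y(i) + (1-w)*E(i-1), initialized with E(1) = Y(1)."""
    e = [y[0]]
    for value in y[1:]:
        e.append(w * value + (1 - w) * e[-1])
    return e

series = [42, 40, 43, 41, 45, 47, 44]
smoothed = exponential_smoothing(series, w=0.3)
print(smoothed[-1])   # forecast for the next period: Y-hat(i+1) = E(i)
```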
first-order autocorrelation
For a time series process ordered chronologically, the correlation coefficient between pairs of adjacent observations
causal forecasting methods
Forecasting methods that are based on the assumption that the variable we are trying to forecast exhibits a cause-effect relationship with one or more other variables.
Testing Quadratic Effect
Ho: B2 = 0 (no improvement) H1: B2 not equal to 0 (improvement)
Fstat
F = MSR/MSE; if greater than the critical value, reject the null
Trend component of a time series
Persistent, overall upward or downward pattern Changes due to population, technology, age, culture, etc. Typically several years duration
quadratic regression model
Regression model in which a nonlinear relationship between the independent and dependent variables is fit by including the independent variable and the square of the independent variable in the model: Yi = B0 + B1*X1i + B2*X1i^2 + ei; also referred to as a second-order polynomial model.
Seasonal Component (Time Series)
Regular pattern of up and down fluctuations Due to weather, customs, etc. Occurs within a single year
MSE
SSE/d.f., where the error d.f. = n - k - 1
MSR
SSR/d.f., where the regression d.f. = k (number of independent variables)
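A sketch tying the MSR and MSE cards to the F test above; the SSR, SSE, n, and k values are hypothetical, and SciPy is assumed for the critical value.

```python
from scipy import stats

SSR, SSE = 120.0, 30.0       # hypothetical sums of squares
n, k = 25, 2
MSR = SSR / k                # regression d.f. = k
MSE = SSE / (n - k - 1)      # error d.f. = n - k - 1
F = MSR / MSE
F_crit = stats.f.ppf(0.95, k, n - k - 1)
print(F, F_crit, F > F_crit)  # reject H0 if F exceeds the critical value
```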
irregular component
The component of the time series that accounts for the random variability in the time series.
possible consequences of collinearity
The signs of coefficients do not make sense based on logic It will inflate the standard errors of the estimated coefficients It will result in a model with high value of R2 but insignificant t-values for all parameter estimates
T or F Moving Averages Method: it can't be used for forecasting
True
T or F Moving Averages: Given a data set with 15 yearly observations, a 5-year moving average is smoother than a 3-year moving average
True
T or F Moving Averages: Given a data set with 15 yearly observations, a 5-year moving average will have fewer observations than a 3-year moving average
True
T or F Moving Averages: It can be used to smooth a series
True
T or F? Autoregressive Model: It takes advantage of autocorrelation between values in different periods
True
Pth order autoregressive model
Y(i) = A0 + A1*Y(i-1) + A2*Y(i-2) + ... + Ap*Y(i-p)
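A hedged sketch of fitting a 2nd-order autoregressive model by regressing Y(i) on Y(i-1) and Y(i-2) with ordinary least squares; the series is simulated, not real data. Note that p = 2 observations are lost, matching the "amount of values lost in autocorrelation" card below.

```python
import numpy as np

rng = np.random.default_rng(1)
y = [10.0, 11.0]
for _ in range(98):                       # simulate an AR(2)-like series
    y.append(5 + 0.5 * y[-1] - 0.2 * y[-2] + rng.normal())
y = np.array(y)

p = 2
Y = y[p:]                                 # p observations are lost
X = np.column_stack([np.ones(len(Y))] + [y[p - j:-j] for j in range(1, p + 1)])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(coef)                               # estimates of A0, A1, A2
```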
square root regression
Yi = B0 + B1*(X1)^(1/2)
forecasting time period
Y-hat(i+1) = E(i): the forecast for the next period is the current smoothed value
quadratic regression model
Yi = B0 + B1*X1i + B2*X1i^2
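A sketch of fitting the quadratic model with np.polyfit; the x and y data are made up, and polyfit returns coefficients highest power first.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.1, 3.9, 7.2, 12.1, 18.8, 27.5])   # roughly quadratic in x
b2, b1, b0 = np.polyfit(x, y, deg=2)               # highest power first
print(b0, b1, b2)                                  # compare to Yi = B0 + B1*X + B2*X^2
```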
time series
a forecasting technique that uses a series of past data points to make a forecast
Mean Absolute Deviation (MAD)
a measure of the overall forecast error for a model
Moving Average Method
a series of arithmetic means, used for smoothing not forecasting
lagged predictor variable
a variable based on the past values of the time series
transformations
are mathematical alterations of data values made to either overcome violations of the assumptions of regression or to make a model whose form is not linear into a linear model. -can be applied to the values of an independent X variable or the dependent Y variable or both
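A small sketch of the two transformations covered in these cards: square root for count data and log to linearize constant-percentage growth. The arrays are illustrative only.

```python
import numpy as np

counts = np.array([0, 1, 4, 9, 16, 25])
print(np.sqrt(counts))        # sqrt transform stabilizes variance of counts

y = np.array([2.0, 4.0, 8.0, 16.0, 32.0])   # constant-percentage growth
print(np.log(y))              # log transform turns it into a straight line
```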
Pth order autocorrelation
autocorrelation exists between values that are p periods apart
quadratic term
ax^2
exponential smoothing
consists of a series of exponential weighted moving averages
collinearity
correlation of independent variables with each other, which can bias estimates of regression coefficients
irregular
erratic, unsystematic variation due to random events (e.g., natural disasters, accidents); noise
Forecasting
estimates future business conditions by monitoring changes that occur over time
Best subsets approach
evaluates all possible regression models for a given set of independent variables; the model with the highest adjusted R^2 and lowest standard error is best
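A hedged sketch of the best-subsets idea: fit every subset of predictors and keep the one with the highest adjusted R^2. The data are random, with the third predictor irrelevant by construction.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 3))
y = 1 + 2 * X[:, 0] - X[:, 1] + rng.normal(size=40)   # X3 is irrelevant

def adj_r2(X_sub, y):
    """Adjusted R^2 for an OLS fit of y on X_sub (intercept added)."""
    n, k = X_sub.shape
    A = np.column_stack([np.ones(n), X_sub])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

best = max(
    (cols for r in range(1, 4) for cols in combinations(range(3), r)),
    key=lambda cols: adj_r2(X[:, list(cols)], y),
)
print(best)   # expected to pick the subset containing X1 and X2
```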
second order autocorrelation
autocorrelation exists between values that are two periods apart (every other measurement is related)
principle of parsimony
favors the hypothesis that requires the fewest assumptions
Cross-validation
first splits the existing data into two parts; you use the first part of the data to develop the regression model and the second part to evaluate the model's predictive ability (often used as a practical way of validating a model developed by predictive analytics methods)
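A minimal sketch of that two-part split: fit on the first part, judge predictive ability on the hold-out part. The data are simulated, and the evaluation metric here is the MAD defined earlier in these cards.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=60)
y = 3 + 2 * x + rng.normal(size=60)

x_fit, y_fit = x[:40], y[:40]            # part 1: develop the model
x_val, y_val = x[40:], y[40:]            # part 2: evaluate predictions

b1, b0 = np.polyfit(x_fit, y_fit, deg=1)
pred = b0 + b1 * x_val
print(np.mean(np.abs(y_val - pred)))     # MAD on the hold-out part
```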
qualitative forecasting
important when historical data are unavailable; subjective, judgmental
L
integer value that corresponds to or is a multiple of the estimated average length of a cycle in a time series; should be an odd number; computations cannot be made for the first and last (L-1)/2 years
variance inflationary factor (VIF)
is a measure of the amount of multicollinearity in regression analysis -Multicollinearity exists when there is a correlation between multiple independent variables in a multiple regression model.
stepwise regression
is useful when there are many independent variables, and a researcher wants to narrow the set down to a smaller number of statistically significant variables
Cp statistic
measures the differences between a fitted regression model and a true model, along with random error
degrees of freedom for autocorrelation
n - 2p - 1
trend
overall long-term upward or downward movement in a time series, can be linear or nonlinear
amount of values lost in autocorrelation
p
Collinearity
refers to situations in which two or more of the independent variables are highly correlated with one another
coded values for time periods
simplifies interpretation of coefficients, starting with 0 for first year
Choosing W
smaller W: more smoothing; use if you want to eliminate irregular and cyclical variations. Larger W: use if you want to forecast future short-term directions.
selection of L
subjective; for a cyclical trend, choose an L that is a multiple of the estimated average length of a cycle
determining appropriateness of selected autoregressive model
use the t stat on the highest-order parameter; if it is not significant, remove that parameter and rerun the regression until you can reject the null
lagged predictor variable
takes its value from the value of a predictor variable from another time period, used to overcome problems autocorrelations causes with other models
autoregressive modeling
technique used to forecast time series that display autocorrelation
After using the quadratic model rather than the linear model, we found that the r^2 for the quadratic model is 0.876. What does this mean?
the quadratic model explains 87.6% of the variation in Y
The adjusted R2 for the quadratic model is 0.9765 and the adjusted R2 for the simple regression model is 0.094567. What does this imply?
the quadratic model is better
Logarithmic Transformation
the use of "logging" either the y- and/or x-values to straighten the graphical representation of the data
Autocorrelation
the values of a time series at particular points in time are highly correlated with the values that precede and succeed them
Quantitative Forecasting
time series method and causal method
seasonal
patterns in monthly or quarterly data that recur within a 12-month period
T or F Moving Averages: the longer the period, the smaller the number of moving averages you can compute; a period longer than 5 to 7 is undesirable
True
cyclical
ups and downs lasting 2 to 10 years, differing in intensity
square root transformation
used to overcome violations of equal variance assumption as well as to transform a model whose form is not linear into a linear model
stepwise regression
variables are added or deleted from the regression at each step of the model-building process, attempts to find "best" regression model without examining all possible models
First-order autoregressive model equation
y_t = β0 + β1*y_(t-1) + ε_t
quadratic trend model
y=a+bt+ct^2 - the t^2 term allows a nonlinear shape. It is useful for a time series that has a turning point or that is not captured by the exponential model
If a quadratic regression model is Yi = β0 + β1*X1i + β2*X1i^2 + ε, what should be the alternate hypothesis (H1) for testing the quadratic effect?
β2 ≠ 0