Unit 2: Autoregressive and Moving Average (ARMA) Model
property of lag 0 ACF for white noise
The autocovariance function of white noise is nonzero only at lag 0, where it equals the variance; it is 0 at all other lags.
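In symbols (a minimal sketch, using the standard white noise notation with variance sigma^2):
\[
\gamma_Z(h) = \begin{cases} \sigma^2, & h = 0 \\ 0, & h \neq 0 \end{cases}
\qquad\Rightarrow\qquad
\rho_Z(h) = \begin{cases} 1, & h = 0 \\ 0, & h \neq 0. \end{cases}
\]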
MLE formula to minimize
To estimate the AR and MA coefficients, we maximize the likelihood function; equivalently, we minimize the negative log-likelihood.
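As a hedged sketch of what gets minimized: the Gaussian likelihood written in terms of one-step prediction errors (the form used in Brockwell and Davis; the course notes may parameterize it slightly differently) gives
\[
-2\ln L(\phi,\theta,\sigma^2) \;=\; n\ln(2\pi\sigma^2) \;+\; \sum_{j=1}^{n}\ln r_{j-1} \;+\; \frac{1}{\sigma^2}\sum_{j=1}^{n}\frac{(X_j-\hat{X}_j)^2}{r_{j-1}},
\]
where \(\hat{X}_j\) is the one-step predictor of \(X_j\) and \(\sigma^2 r_{j-1}\) is its mean squared error.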
why do we prefer that processes are invertible?
We prefer invertible ARMA processes in practice because, if we can invert an MA process to an AR process, we can find the value of Zt, which is not observable, from all past values of Xt, which are observable. If a process is non-invertible, then in order to find the value of Zt we would have to know all future values of Xt.
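In symbols (a sketch consistent with the standard definition, not taken verbatim from the notes): for an invertible process the noise can be recovered from the present and past of the observed series,
\[
Z_t \;=\; \sum_{j=0}^{\infty} \pi_j X_{t-j}, \qquad \text{with } \sum_{j=0}^{\infty} |\pi_j| < \infty .
\]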
can Yule Walker be used to estimate order of AR(p)?
YES
For ARMA parameter estimation, do you de-mean? Are p and q fixed?
Yes and Yes
Can MA be used on ARMA models?
Yes, if causal
invertible process
a process whose white noise Zt can be written in terms of present and past values of Xt
stochastic
a random process - an expected value + variation
ARMA stands for
auto-regressive moving average
why do we perform a histogram/Q-Q plot with ARMA/ARIMA?
b/c we assume normality
what is a causal process?
a process whose Xt can be written as a linear combination of present and past white noise values Zt (an MA(infinity) representation); equivalently, phi(z) is nonzero for all |z| <= 1, so a causal AR process can be "inverted" to an MA(infinity) form
What's a downside to AICC?
computationally very expensive
stationary solution to ARMA equation if this condition met
phi(z) cannot equal 0 for any value with | z | = 1 (no roots of the AR polynomial on the unit circle)
How can we use the Yule Walker equations?
for finding the phi coefficients of the AR model
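A minimal R sketch of Yule-Walker estimation for an AR(p); the function and method name are base R, while the simulated series and its coefficients are made up for illustration:

```r
set.seed(1)
x <- arima.sim(model = list(ar = c(0.6, -0.3)), n = 500)   # example AR(2) series

# Yule-Walker estimation of the phi coefficients (order fixed at p = 2)
fit.yw <- ar(x, method = "yule-walker", order.max = 2, aic = FALSE)
fit.yw$ar            # estimated AR coefficients
fit.yw$var.pred      # estimated noise variance
fit.yw$asy.var.coef  # asymptotic variance of the estimates (usable for confidence intervals)
```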
forecasting ARIMA model
forecasting an ARIMA model differs from forecasting an ARMA model because ARMA processes are assumed causal, and an ARIMA process with d > 0 cannot be causal (its AR polynomial has roots on the unit circle)
Sample PACF
good for identifying the order of an AR process, and hence a 'good' candidate model
what are the conditions of an ARMA process?
{Xt} is stationary and satisfies the ARMA equation Xt - phi1 Xt-1 - ... - phip Xt-p = Zt + theta1 Zt-1 + ... + thetaq Zt-q for every t, where Zt is white noise
What conditions make a time series ideal for Yule Walker, asides from the requirements of white noise w/ mean zero and constant variance?
large length of time series or large sample size
as phi increases, is it more or less likely that a process approaches causality?
less likely; for an AR(1), causality requires |phi| < 1, so a larger phi pushes the process toward (or past) the boundary. In the example, the top time series is stationary while the bottom is NOT
What does the last equation here mean?
means that for lags h > p, the sample PACF is AN (asymptotically normally distributed) with mean 0 and variance 1/n, where n is the sample size
what does side = 1 mean?
means we're generating an MA process: in R's filter() function, sides = 1 with method = "convolution" applies the filter to the current and past values only
what method do you use for an AR process?
method = "recursive" (in R's filter() function), which applies the autoregressive recursion
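A hedged R sketch of both cases with stats::filter(); the coefficient values are made up for illustration:

```r
set.seed(1)
z <- rnorm(200)    # white noise Z_t

# MA(1): X_t = Z_t + 0.6 Z_{t-1}
# method = "convolution" with sides = 1 uses the current and past values only
x.ma <- stats::filter(z, filter = c(1, 0.6), method = "convolution", sides = 1)

# AR(1): X_t = 0.7 X_{t-1} + Z_t
# method = "recursive" applies the autoregressive recursion to the input noise
x.ar <- stats::filter(z, filter = 0.7, method = "recursive")
```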
are all stationary processes causal?
no
another approach for evaluating a prediction is
seeing if the observed values fall within the prediction bands
how do you find the psi coefficients?
solve a system of linear equations obtained by matching coefficients in phi(z) * psi(z) = theta(z)
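In R, ARMAtoMA() solves this coefficient-matching system; a small sketch with made-up coefficients:

```r
# psi (MA(infinity)) coefficients of a causal ARMA(2, 1) model
psi <- ARMAtoMA(ar = c(0.5, -0.2), ma = 0.4, lag.max = 10)
psi
```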
what part of the notation refers to the autoregressive part?
the phi(B) polynomial applied to Xt, where B is the backshift operator
what happens to the co-variance between two points in a stationary process as the lag approaches infinity?
the covariance approaches 0
What's the d in the ARIMA model?
d is the number of differences applied to the series; differencing can remove trend
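A tiny hedged example in R (hypothetical series) showing that one difference (d = 1) removes a linear trend:

```r
set.seed(1)
t <- 1:200
x <- 2 + 0.5 * t + rnorm(200)     # linear trend plus noise
dx <- diff(x, differences = 1)    # first difference: the linear trend drops out
plot.ts(dx)                       # looks roughly like stationary noise around 0.5
```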
prediction accuracy measures
y* denotes the predicted values; capital Y-bar is the mean response.
- MSPE is appropriate for evaluating the prediction accuracy of the best linear prediction, but it depends on the scale of the data and is sensitive to outliers.
- MAE is not appropriate for evaluating the prediction accuracy of the best linear prediction; it also depends on the scale, but it is robust to outliers.
- MAPE is likewise not appropriate for evaluating the prediction accuracy of the best linear prediction, but it does not depend on the scale and is robust to outliers.
- PM, the precision measure, is appropriate for evaluating the prediction accuracy of the best linear prediction and does not depend on the scale. It is reminiscent of the R-squared used in linear regression and can be interpreted as the proportion between the variability in the prediction errors and the variability in the new data.
While MAE and MAPE are commonly used to evaluate prediction accuracy, the precision measure is recommended.
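A hedged R sketch of these measures as defined above; the vectors y.obs and y.pred are hypothetical placeholders for new observations and their predictions:

```r
set.seed(1)
y.obs  <- rnorm(20, mean = 10)          # hypothetical new observations
y.pred <- y.obs + rnorm(20, sd = 0.5)   # hypothetical predictions

mspe <- mean((y.pred - y.obs)^2)                                 # mean squared prediction error
mae  <- mean(abs(y.pred - y.obs))                                # mean absolute error
mape <- mean(abs(y.pred - y.obs) / abs(y.obs))                   # mean absolute percentage error
pm   <- sum((y.pred - y.obs)^2) / sum((y.obs - mean(y.obs))^2)   # precision measure (PM)
```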
ARMA order and ACF below confidence bands
we do not expect the sample autocorrelation to be exactly 0 for lags larger than the order of the MA process, only to be approximately 0 (i.e., within the confidence bands). *only for stationary processes
What do we use to estimate MA parameters, the theta parameters?
we use the Innovations algorithm
ARMA autocovariance function
gamma(h) = sigma^2 * sum_j psi_j psi_{j+|h|}, where the psi's are the coefficients of the linear (MA(infinity)) representation of the process
Can ARIMA be used on non-stationary processes?
yes
are all causal processes stationary?
yes
is there a significant difference between predicting 10 days ahead all at once and predicting 10 days ahead one day at a time (updating with each new observation)?
yes
if auto-regressing, will you have to filter your data?
yes; since this AR model has order p = 2, the response must use indices (p+1):end, because the first p observations are only available as lagged predictors
Can Yule Walker determine confidence intervals and statistical significance of parameters?
yes!
identifying order
AR(p) from PACF; MA(q) from ACF. For an AR process, it can be shown that the PACF is 0 for lags larger than the order p. Thus we can identify the lag (order) of the AR process using the PACF plot. To summarize: if we have an AR process, we can identify p using the PACF, and if we have an MA process, we can identify q using the ACF. Unfortunately, there are no such simple rules for ARMA(p,q) processes in general. (See the R sketch below.)
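A hedged R sketch illustrating these rules on simulated series; the orders and coefficients are made up:

```r
set.seed(1)
x.ar <- arima.sim(model = list(ar = c(0.6, -0.3)), n = 500)   # AR(2)
x.ma <- arima.sim(model = list(ma = c(0.5, 0.4)), n = 500)    # MA(2)

par(mfrow = c(2, 2))
acf(x.ar);  pacf(x.ar)   # PACF cuts off after lag 2, identifying p = 2
acf(x.ma);  pacf(x.ma)   # ACF cuts off after lag 2, identifying q = 2
```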
ARIMA limitations
ARIMA modeling is easy to implement, but it captures the non-stationarity in the trend by assuming similarity among prior observations, and thus it can mispredict if there are large changes in the trend. On the other hand, the trend-plus-ARIMA estimation approach is more difficult to implement, particularly if we are also interested in obtaining confidence bands, but it can capture long-memory trends. Prediction using ARIMA can perform well if only short prediction horizons are considered.
PACF and confounding
Another important measure of dependence for a time series, one that accounts for (removes) confounding, is the partial autocorrelation function.
In the ED data example, what methods do we use to test for correlation?
Box-Pierce and Ljung-Box
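A hedged R sketch of these tests on model residuals; the simulated series, fitted orders, and lag choice are illustrative, not the course's ED data:

```r
set.seed(1)
x   <- arima.sim(model = list(ar = 0.5, ma = 0.4), n = 500)
fit <- arima(x, order = c(1, 0, 1))
res <- residuals(fit)

# fitdf = number of fitted AR + MA parameters (here 1 + 1 = 2)
Box.test(res, lag = 10, type = "Box-Pierce", fitdf = 2)
Box.test(res, lag = 10, type = "Ljung-Box",  fitdf = 2)
```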
AR coefficient estimation via linear regression
Constraint - must assume normality
Confounding
Correlation between two observations of a time series at two time points can result from a mutual linear dependence on observations at other time points; this is called confounding.
what's a less computationally expensive order selection method than AICC?
the extended autocorrelation function (EACF)
coefficient correlation example
From this example, and other examples, we see that although Xt is modeled as a moving average, i.e., a linear combination of white noise, Xt and Xt-1 still depend on the same sequence of white noise. Thus, they are correlated, and this correlation is reflected in the ACF plots.
PACF above bands and order
If we were to use the PACF plot to identify the lag of the AR process, we would identify an AR process of order one. However, the property of the PACF for an AR process does not hold for non-stationary processes. See the non-stationary and stationary plots, both AR(2): the non-stationarity makes the process look like order 1 when it is actually order 2.
Are the expectation and covariance of the ARIMA process uniquely determined?
Importantly, the expectation and the covariance of the ARIMA process are not uniquely determined.
How does ARIMA differ from ARMA?
In ARMA, d = 0, while in ARIMA, d > 0. Also, a causal ARMA process will not be causal as an ARIMA process because there are d roots on the unit circle.
statistical properties of MLE
Last, we can use the properties of the MLEs (maximum likelihood estimators), in particular that they are asymptotically normally distributed regardless of the distributional assumption on the noise Zt. The MLEs are asymptotically unbiased: for large sample size, the expectation of the MLEs is close to the true parameter values. The covariance matrix of the estimators depends directly on the covariance of the time series. This asymptotic distribution can be used for statistical inference on the AR and MA coefficients, as well as for the asymptotic distributions of the ACF and PACF.
Consequence of ACF estimate versus exact value
Moreover, its estimate is not exact; it is a random variable with a distribution, as explained in more detail in a later lesson. Because of this, we do not expect the sample autocorrelation to be exactly 0 for lags larger than the order of the MA process, only to be approximately 0. Thus, the ACF plot indicates that the order of the MA process is 1, which is the order of the process we actually simulated.
consider the sample model
Mt is the trend, St is the seasonality, Zt is white noise. In practice: because there are many orders to determine, those of the ARIMA (p, d, q) along with those of a seasonal ARIMA (capital P, D, Q), such a model is difficult to implement.
To apply ARMA, do the random variables need a normal distribution?
NO. Note that the distribution of the random variables does not need to be normal.
Can Yule Walker be used for MA and ARMA?
NO!
Can the ACF and PACF plots predict p and q for an ARMA process?
No. The PACF plot here suggests AR order p = 3 and the ACF plot here suggests MA order q = 4; however, the true process is ARMA(2, 2).
would you expect to see periodic patterns in PACF?
No; by design, periodic (seasonal) patterns should have been removed before examining the PACF
If you fit a time series with AR alone and then ARMA, will both models have the same AR(p) that minimizes AIC?
No; in the ED data example the AR alone minimized AIC with p order 6 but with ARMA the p order was 5.
ARMA notation
Phi is a polynomial of order p with coefficients given by the AR portion of the model. Theta is a polynomial of order q with coefficients given by the MA portion
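In symbols (a sketch consistent with the standard textbook convention; sign conventions on the theta's can vary):
\[
\phi(B)X_t = \theta(B)Z_t,\qquad
\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p,\qquad
\theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q,
\]
where B is the backshift operator, \(B X_t = X_{t-1}\).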
AR linear regression process
1) Compute the sample PACF of the data and check that the suggested order aligns with the order chosen for the model. 2) Estimate the coefficients via regression with that order. 3) Check that the residuals' sample ACF/PACF stays inside the confidence bands, i.e., the residuals behave like AR(0) white noise. (See the R sketch below.)
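A hedged R sketch of estimating AR coefficients by linear regression, with the order assumed known (here p = 2) and a simulated series standing in for real data:

```r
set.seed(1)
x <- arima.sim(model = list(ar = c(0.6, -0.3)), n = 500)
p <- 2
n <- length(x)

# Response uses indices (p+1):n; predictors are the p lagged values
y  <- x[(p + 1):n]
X1 <- x[p:(n - 1)]
X2 <- x[(p - 1):(n - 2)]
fit.lm <- lm(y ~ X1 + X2)        # the intercept absorbs a nonzero mean
coef(fit.lm)                     # compare with the true AR coefficients

# Residual diagnostics: sample ACF/PACF should stay inside the bands (AR(0) behavior)
acf(residuals(fit.lm)); pacf(residuals(fit.lm))
```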
process used in IBM example
1) Check the log(time series) with ACF/PACF. 2) Check the first difference of the log series (difference order = 1) with ACF/PACF; the lags should give guidance on the AR order needed. 3) Check the fitted ARMA model with ACF/PACF of the residuals, histogram, Q-Q plot, and the Ljung-Box and Box-Pierce tests.
process to estimate ARMA parameters
1. The AR coefficients, denoted phi1 to phip, and the MA coefficients, denoted theta1 to thetaq, are unknown parameters. If the process has non-zero mean, then mu, the mean of the process, is also a parameter, along with the variance of the white noise Zt. 2. Commonly, we estimate the mean parameter mu first, and 3. subtract mu from the process, so that we can assume a zero-mean process. 4. Thus, we only estimate the AR and MA coefficients and the variance parameter.
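A hedged R sketch of this workflow using arima(), which handles the mean through its intercept term and estimates the AR/MA coefficients and noise variance by maximum likelihood; the series and orders are made up:

```r
set.seed(1)
x   <- arima.sim(model = list(ar = 0.5, ma = 0.4), n = 500) + 10   # nonzero mean mu = 10
fit <- arima(x, order = c(1, 0, 1), include.mean = TRUE, method = "ML")
fit$coef      # ar1, ma1, intercept (the estimate of mu)
fit$sigma2    # estimated white noise variance
```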