CFA Level 2 - Quantitative Methods
n-grams
A technique that compares runs of letters in two values to get a match score from 0 to 100 percent
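One way to turn runs of letters into a 0 to 100 score can be sketched as follows. This is only an illustration: the choice of character bigrams, the Dice-style overlap formula, and the function names `char_ngrams` and `ngram_match_score` are assumptions, not something the card specifies.

```python
def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of text (lowercased)."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def ngram_match_score(a, b, n=2):
    """Dice-style overlap of the two n-gram sets, scaled to 0-100 (assumed metric)."""
    grams_a, grams_b = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    if not grams_a and not grams_b:
        # Both strings are shorter than n: fall back to an exact comparison.
        return 100.0 if a.lower() == b.lower() else 0.0
    shared = len(grams_a & grams_b)
    return 100.0 * 2 * shared / (len(grams_a) + len(grams_b))
```

Calling `ngram_match_score("night", "nacht")` compares the bigram sets {ni, ig, gh, ht} and {na, ac, ch, ht}, which share one bigram out of eight in total.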
Deep Learning Algorithms
Algorithms such as neural networks and reinforcement learning that learn from their own prediction errors. They are used for complex tasks such as image recognition and natural language processing.
Log-linear regression
Assumes the dependent (financial) variable grows at a constant rate
How do you test for conditional heteroskedasticity?
Breusch-Pagan chi-square test
Ensemble Learning
Combines the predictions from multiple models
How to correct overfitting
Complexity reduction (penalties are introduced for added model complexity) and cross-validation
neural networks
Comprise an input layer, hidden layers (which process the input), and an output layer. The nodes in the hidden layer are called neurons, which comprise a summation operator (that calculates a weighted average) and an activation function (a nonlinear function).
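A single neuron, as described above, can be sketched like this. The sigmoid activation, the optional `bias` term, and the name `neuron_output` are assumptions chosen for illustration; the card only says the activation function is nonlinear.

```python
import math


def neuron_output(inputs, weights, bias=0.0):
    """Summation operator followed by a nonlinear activation."""
    # Summation operator: weighted sum of the inputs (plus an assumed bias term).
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Activation function: sigmoid, one common nonlinear choice (assumption).
    return 1.0 / (1.0 + math.exp(-total))
```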
three conditions of covariance stationarity
Constant and finite mean. Constant and finite variance. Constant and finite covariance with leading or lagged values.
What probabilistic models do not account for correlated variables
Decision Trees
Qualitative dependent variables
Dummy variables used as dependent variables rather than as independent variables (Zero or One)
How to test for Serial Correlation
Durbin Watson Test
Durbin-Watson Test Decision Rule
If DW < dL, reject H0 (positive serial correlation). If dL < DW < dU, the test is inconclusive. If DW > dU, do not reject H0.
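The decision rule can be sketched in code as below. The statistic is the standard Durbin-Watson formula (sum of squared changes in the residuals divided by the sum of squared residuals); the function names are hypothetical, and the critical values dL and dU are assumed to be looked up from a table by the caller.

```python
def durbin_watson(residuals):
    """DW = sum of squared successive residual differences / sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def dw_decision(dw, d_lower, d_upper):
    """Apply the rule for positive serial correlation (H0: no serial correlation)."""
    if dw < d_lower:
        return "reject H0: positive serial correlation"
    if dw <= d_upper:
        return "inconclusive"
    return "do not reject H0"
```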
Residual Sum of Squares (SSR)
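In multiple regression analysis, the sum of the squared OLS residuals across all observations. The other cards in this set call this quantity SSE and use RSS for the regression (explained) sum of squares; RSS/k is the mean regression sum of squares (MSR), not the residual sum of squares.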
Supervised Learning
Inputs and outputs are identified for the computer, and the algorithm uses this labeled training data to model relationships.
Overfitting & Issues
Large number of variables, but low out of sample accuracy/r2
Can the Durbin Watson Test be used with AR Models?
No. For AR models, use a t-test on the residual autocorrelations instead.
Underfit Model
Underfitting describes a machine learning model that is not complex enough to describe the data it is meant to analyze. An underfit model treats true parameters as noise and fails to identify the actual patterns and relationships
What regression do you use when rate of change is constant?
Whenever the growth rate (percentage rate of change) is constant over time, the appropriate model is a log-linear trend model, rather than a linear trend model or an autoregressive model.
How to correct conditional heteroskedasticity
White-corrected standard errors
correcting for structural shift
estimate the models before and after the change
When to use classification and regression tree
for non-linear data
Standard error of estimate
Measures the standard distance between the predicted Y values on the regression line and the actual Y values in the data. If the relationship between the dependent and independent variables is very strong, the SEE will be low.
F-test (ANOVA)
has two distinct degrees of freedom, one associated with the numerator (k, the number of independent variables) and one associated with the denominator (n − k − 1). The critical value is taken from an F-table. The decision rule for the F-test is reject H0 if F > Fcritical. Remember that this is always a one-tailed test.
what does adjusted r squared measure
The adjusted R² provides a measure of goodness of fit that adjusts for the number of independent variables included in the model.
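As a small worked sketch, the usual adjustment is adjusted R² = 1 − [(n − 1) / (n − k − 1)] × (1 − R²), with n observations and k independent variables. The function name `adjusted_r2` is hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k independent variables."""
    return 1.0 - ((n - 1) / (n - k - 1)) * (1.0 - r2)
```

For example, with R² = 0.80, n = 21, and k = 4, the penalty factor is 20/16 = 1.25, so the adjusted R² is 1 − 1.25 × 0.20 = 0.75.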
K-Nearest Neighbor
This is used to classify an observation based on its nearness to the observations in the training sample.
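A minimal sketch of the idea, assuming Euclidean distance and a simple majority vote among the k nearest training observations (both are assumptions; the name `knn_classify` and the data layout are hypothetical):

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """train: list of (features, label) pairs; query: a feature tuple."""
    # Rank training observations by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    # Majority vote among the k nearest labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```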
When is linear trend model used?
if the data points seem to be equally distributed above and below the line and the mean is constant.
presence of autoregressive conditional heteroskedasticity (ARCH)
indicates that the variance of the error terms is not constant. This is a violation of the regression assumptions upon which time series models are based.
Random Forest
A collection of a large number of classification trees, each trained on a different sample of the data, whose predictions are combined
AR Model Formula
$x_t = b_0 + b_1 x_{t-1} + b_2 x_{t-2} + \dots + b_p x_{t-p} + \varepsilon_t$
Log Linear Regression Formula
$y_t = e^{b_0 + b_1 t}$, so $\ln(y_t) = \ln\left(e^{b_0 + b_1 t}\right) \Rightarrow \ln(y_t) = b_0 + b_1 t$
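Because the log of the series is linear in time, the coefficients can be estimated by ordinary least squares on ln(y_t). The following is a rough sketch using only the standard library; the function name `fit_log_linear_trend` and the convention t = 1, ..., n are assumptions.

```python
import math


def fit_log_linear_trend(y):
    """OLS regression of ln(y_t) on t = 1..n; returns (b0, b1)."""
    n = len(y)
    t = list(range(1, n + 1))
    ln_y = [math.log(v) for v in y]
    t_bar = sum(t) / n
    ln_bar = sum(ln_y) / n
    # Slope = covariance(t, ln y) / variance(t); intercept from the means.
    b1 = sum((ti - t_bar) * (li - ln_bar) for ti, li in zip(t, ln_y)) / sum(
        (ti - t_bar) ** 2 for ti in t
    )
    b0 = ln_bar - b1 * t_bar
    return b0, b1
```

A series that doubles every period, such as 2, 4, 8, 16, 32, gives a slope close to ln(2) and an intercept close to zero.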
Sum of Squared Errors (SSE)
-Measures the variation in the dependent variable that is not explained by the independent variables -The sum of squared vertical distances between estimated and actual Y-values -Also called the sum of squared residuals
Dickey Fuller Test
-Test for covariance stationarity -Subtract a one-period lagged variable from each side of an autoregressive model, then test whether the new coefficient (b1 − 1) is different from 0. If it is not significantly different from 0, then b1 = 1 and the series has a unit root
Regression Assumptions
1) Linearity 2) independence of errors 3) homoscedasticity 4) normality of error distribution The assumptions include a normally distributed residual with a constant variance and a mean of zero.
Conditional Heteroskedasticity: 1. What is it? 2. Effect? 3. Detection? 4. Correction?
1. Residual variance related to level of independent variables. 2. Standard errors are unreliable, but the slope coefficients are consistent and unbiased. 3. BP Chi squared test 4. White-Corrected Std. Errors or Hansen method
Serial correlation (autocorrelation) 1. What is it? 2. Effect? 3. Detection? 4. Correction?
1. Residuals are correlated. 2. Too many Type I errors (for positive correlation), but the slope coefficients are consistent and unbiased. 3. Durbin-Watson test 4. Hansen-adjusted standard errors
Sum of Squares Total (SST)
= RSS + SSE
Chain Rule of Forecasting
A forecasting process in which the next period's value as predicted by the forecasting equation is substituted into the right-hand side of the equation to give a predicted value two periods ahead. $\hat{x}_{t+1} = \hat{b}_0 + \hat{b}_1 x_t$, then $\hat{x}_{t+2} = \hat{b}_0 + \hat{b}_1 \hat{x}_{t+1}$
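The substitution step can be sketched for an AR(1) model as follows. The function name `ar1_chain_forecast` and the example coefficients are hypothetical.

```python
def ar1_chain_forecast(b0, b1, x_t, steps=2):
    """Multi-period AR(1) forecasts built by feeding each forecast back in."""
    forecasts = []
    prev = x_t
    for _ in range(steps):
        # Each forecast becomes the right-hand-side input for the next period.
        prev = b0 + b1 * prev
        forecasts.append(prev)
    return forecasts
```

With b0 = 1.0, b1 = 0.5, and a current value of 4.0, the one-period forecast is 1.0 + 0.5 × 4.0 = 3.0, and the two-period forecast is 1.0 + 0.5 × 3.0 = 2.5.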
Autoregressive model
A regression model in which the independent variables are previous values of the time series.
Multicollinearity
A situation in which several independent variables are highly correlated with each other. This makes it difficult to estimate separate or independent regression coefficients for the correlated variables. Coefficient estimates become unreliable and standard errors are inflated, which leads to too many Type II errors.
F statistic
A statistical test to determine the relationship between the variances of two or more samples. (MSR/MSE)
ANOVA table
A table used to summarize the analysis of variance computations and results. It contains columns showing the source of variation, the sum of squares, the degrees of freedom, the mean square, the F value(s), and the p-value(s). 1. Regression (RSS) / df=k 2. Error (SSE) / df= n-k-1 3. Total (SST) / df = n-1
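The quantities in the table can be derived from the two sums of squares and the degrees of freedom. The sketch below follows the notation of these cards (RSS = regression sum of squares, SSE = sum of squared errors); the function name `anova_summary` and the example inputs are hypothetical.

```python
import math


def anova_summary(rss, sse, n, k):
    """Summarize a regression ANOVA table for n observations, k independent variables."""
    sst = rss + sse              # total variation, df = n - 1
    msr = rss / k                # mean regression sum of squares, df = k
    mse = sse / (n - k - 1)      # mean squared error, df = n - k - 1
    return {
        "SST": sst,
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,          # F-statistic
        "R2": rss / sst,         # coefficient of determination
        "SEE": math.sqrt(mse),   # standard error of estimate
    }
```

For RSS = 80, SSE = 20, n = 25, and k = 4, this gives MSR = 20, MSE = 1, F = 20, R² = 0.8, and SEE = 1.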
data curation
Activities to preserve/maintain the quality of data
correction for seasonality
Add a seasonal Lag - Plot data. Check seasonal residuals (autocorrelations) for significance. If residuals are significant, add the appropriate lag (e.g., for monthly data, add the 12th lag of the time series).
Machine Learning Summary
Find the pattern, apply the pattern
Correcting for Linear Trend
First Differencing
Likely candidates for linear trend model
Growth & GDP
Cointegration
Occurs when two time series are moving with a common pattern due to a connection between the two time series
How to Correct Multicollinearity
Omit one or more of the correlated independent variables
Six common misspecifications of the regression model
Omitting a variable. Failing to transform a variable that should be transformed. Incorrectly pooling data. Using a lagged dependent variable as an independent variable. Forecasting the past. Measuring independent variables with error.
How to test for covariance stationarity
Plot the data to see if the mean and variance remain constant. Perform the Dickey-Fuller test (a test for a unit root, i.e., whether b1 − 1 = 0, or equivalently b1 = 1).
positive serial correlation
Positive serial correlation is the condition where a positive regression error in one time period increases the likelihood of having a positive regression error in the next time period. The residual terms are correlated with one another, leading to coefficient standard errors that are too small.
Machine Learning Metrics
Precision (P) = true positives / (false positives + true positives); Recall (R) = true positives / (true positives + false negatives); Accuracy = (true positives + true negatives) / (all positives and negatives); F1 score = (2 × P × R) / (P + R)
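These four metrics can be computed directly from confusion-matrix counts. A minimal sketch, with the hypothetical function name `classification_metrics` and made-up counts in the example:

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = (2 * precision * recall) / (precision + recall)
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}
```

With 30 true positives, 10 false positives, 20 false negatives, and 40 true negatives, precision is 30/40 = 0.75, recall is 30/50 = 0.60, and accuracy is 70/100 = 0.70.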
k-means clustering
Process of organizing observations into one of k groups based on a measure of similarity.
RMSE (root mean squared error)
Root mean squared error (RMSE) is used to assess the predictive accuracy of autoregressive models (FOR OUT OF SAMPLE DATA) For example, you could compare the results of an AR(1) and an AR(2) model. The RMSE is the square root of the average (or mean) squared error. The model with the lower RMSE is better.
Data Normalization
Scales variables between 0 and 1
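A common way to do this is min-max scaling, (x − min) / (max − min). The sketch below assumes that form; the function name `normalize` is hypothetical.

```python
def normalize(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values]
```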
When to use a log-linear trend model
When the data series exhibits exponential growth (a constant growth rate), so that the mean is non-constant and the residuals from a linear trend model are correlated or predictable
Structural Change (Coefficient Instability)
Shorter time periods have higher coefficient stability. Structural change is indicated by a significant shift in the plotted data at a point in time that seems to divide the data into two distinct patterns.
t statistic formula
Single variable: t = (M − μ) / (estimated standard error of M). The numerator measures the actual difference between the sample mean (M) and the hypothesized population mean (μ). The estimated standard error in the denominator measures how much difference is reasonable to expect between a sample mean and the population mean.
Standard error of Estimate Formula
Square Root (SSresidual/df) or Square root (MSE)
Support Vector Machine
Supervised learning tool that seeks a dividing hyperplane in any number of dimensions; it can be used for regression or classification.
Dickey Fuller-Engle Granger (DF-EG)
Test for cointegration. It applies the Dickey-Fuller test (a test for a unit root, i.e., whether b1 − 1 is equal to zero). The null hypothesis is a unit root.
Tokenization
Text is considered to be a collection of tokens, where a token is equivalent to a word. Tokenization is the process of splitting a given text into separate tokens. Bag-of-words (BOW) is a collection of a distinct set of tokens from all the texts in a sample dataset. Stemming is the process of converting inflected word forms into a base word.
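Tokenization and the bag-of-words set can be sketched as below. The lowercasing step, the regular expression used to split words, and the function names are assumptions for illustration; stemming is not shown.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(texts):
    """Distinct set of tokens across all texts, returned in sorted order."""
    return sorted({token for text in texts for token in tokenize(text)})
```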
Unsupervised Machine Learning
The computer is not given labeled data; rather, it is provided unlabeled data that the algorithm uses to determine the structure of the data
Effects of Model Misspecification
The effects of the model misspecification on the regression results are basically the same for all the misspecifications: regression coefficients are biased and inconsistent, which means we can't have any confidence in our hypothesis tests of the coefficients or in the predictions of the model.
hierarchical clustering
This builds a hierarchy of clusters without any predefined number of clusters.
First Differencing
This is used to correct an AR model when it has a unit root. You subtract the previous value from the current value to define a NEW DEPENDENT variable, so you model the CHANGE in the values, which for a random walk is essentially the error term: $y_t = x_t - x_{t-1} \Rightarrow y_t = \varepsilon_t$. In the AR(1) model of $y_t$, $b_0 = b_1 = 0$.
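The transformation itself is a one-liner. A minimal sketch, with the hypothetical name `first_difference`:

```python
def first_difference(x):
    """Return y_t = x_t - x_(t-1); the result is one observation shorter than x."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]
```

For the series 100, 102, 101, 105, the differenced series is 2, −1, 4.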
Unit Root
To test: 1. Examine autocorrelations 2. Use the Dickey-Fuller test. If b1 is close or equal to 1, the series behaves like a random walk, which means it is NON-STATIONARY.
How to correct for serial correlation
Use Hansen method to adjust Standard Errors
Root Mean Squared Error
[Sum of (Predicted − Actual)^2 / n]^0.5. A lower RMSE indicates higher predictive power.
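The formula translates directly into code. A minimal sketch with the hypothetical function name `rmse`; the forecasts and actual values are assumed to be out-of-sample and aligned by position.

```python
import math


def rmse(predicted, actual):
    """Square root of the mean squared forecast error."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

To compare two models, compute `rmse` for each on the same out-of-sample data and prefer the lower value.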
Principal Component Analysis (PCA)
a dimension-reduction tool that can be used to reduce a large set of variables to a small set that still contains most of the information in the large set
low latency
a network with minimal delay
Deep Learning Nets
Neural networks with many hidden layers (often more than 20), useful for pattern, speech, and image recognition.
Mean Reversion Formula
b0 / (1-b1)
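A small sketch of the calculation for an AR(1) model, with the hypothetical function name `mean_reverting_level`. The guard for b1 = 1 reflects the fact that the level is undefined in that case.

```python
def mean_reverting_level(b0, b1):
    """Mean-reverting level of an AR(1) model: b0 / (1 - b1)."""
    if b1 == 1:
        raise ValueError("undefined when b1 = 1 (unit root)")
    return b0 / (1 - b1)
```

With b0 = 2.0 and b1 = 0.5, the mean-reverting level is 2.0 / 0.5 = 4.0.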
Data standardization
centers variables at mean of zero and std. dev of 1
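A minimal sketch of the transformation, (x − mean) / standard deviation. Using the sample standard deviation (n − 1 in the denominator) is an assumption here, and the function name `standardize` is hypothetical.

```python
import statistics


def standardize(values):
    """Center at a mean of zero and scale to a standard deviation of one."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(v - mean) / std for v in values]
```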
Autoregressive Conditional Heteroskedasticity (ARCH)
describes the condition where the variance of the residuals in one time period within a time series is dependent on the variance of the residuals in another period. When this condition exists, the standard errors of the regression coefficients in AR models and the hypothesis tests of these coefficients are invalid.
Confidence Interval
predicted Y value ± (critical t-value)(standard error of forecast)
bag of words model
A procedure that collects all the tokens in a document, without regard to their order
Confidence Interval of regression coefficient
regression coefficient ± (critical t-value)(standard error of regression coefficient) If zero is contained in the confidence interval constructed for a coefficient at a desired significance level, we conclude that the slope is not statistically different from zero.
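The interval and the resulting significance check can be sketched as follows. The function names are hypothetical, and the critical t-value is assumed to be looked up by the caller for the chosen significance level and n − k − 1 degrees of freedom.

```python
def coefficient_confidence_interval(coef, std_error, t_critical):
    """Return (lower, upper) = coef -/+ t_critical * std_error."""
    half_width = t_critical * std_error
    return coef - half_width, coef + half_width


def is_significant(coef, std_error, t_critical):
    """True when zero lies outside the confidence interval."""
    lower, upper = coefficient_confidence_interval(coef, std_error, t_critical)
    return not (lower <= 0.0 <= upper)
```

For a coefficient of 0.3 with a standard error of 0.2 and a critical t-value of 2.0, the interval runs from −0.1 to 0.7, so the slope is not statistically different from zero.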
Correcting for Exponential trend
take natural log and first difference
Mean Squared Error (MSE)
the average of the squared differences between the forecasted and observed values: SSE / (n − k − 1)
What is the mean reverting level when a unit root is present?
the mean reverting level is undefined (b1 = 1), so the series is not covariance stationary.
R squared
the proportion of the total variation in a dependent variable explained by an independent variable (coefficient of determination) (RSS/SST)
If a time series follows a random walk, then:
It has a unit root, which means that the least squares regression procedure used to estimate an AR(1) model cannot be used without transforming the data first. A time series with a unit root will follow a random walk process. Since a time series that follows a random walk is not covariance stationary, modeling such a time series in an AR model can lead to incorrect statistical conclusions, and decisions made on the basis of these conclusions may be wrong. Unit roots are most likely to occur in time series that trend over time or have a seasonal element.
How to test for serial correlation in an AR Model
use a t-test to see if correlations at any lag are statistically significant
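A rough sketch of that check: compute the residual autocorrelation at a given lag and divide it by its approximate standard error, 1/√T, where T is the number of observations. The function names are hypothetical, and the critical t-value for comparison is left to the caller.

```python
import math


def autocorrelation(residuals, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    den = sum((e - mean) ** 2 for e in residuals)
    num = sum((residuals[t] - mean) * (residuals[t - lag] - mean) for t in range(lag, n))
    return num / den


def autocorr_t_stat(rho, n_obs):
    """t-statistic = autocorrelation / (1 / sqrt(T))."""
    return rho * math.sqrt(n_obs)
```

For example, a residual autocorrelation of 0.25 estimated from 100 observations gives a t-statistic of 0.25 × 10 = 2.5.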