CFA Level II Quick Sheet - Quantitative Methods


F1 Score

(2 x P x R)/(P + R)

T - Test

(B^ - B1) divided by the standard error of B^, with df = n - 2

Standard Error of Estimate

(SSE / (n - k - 1))^1/2, i.e., the square root of the MSE (for simple regression, df = n - 2)

Accuracy

(True Positives + True Negatives) / (True Positives + False Positives + True Negatives + False Negatives)
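
A minimal Python sketch (toy counts assumed) tying this card to the Precision, Recall, and F1 Score cards on this sheet:

# Hypothetical confusion-matrix counts
tp, fp, tn, fn = 90, 10, 80, 20
accuracy  = (tp + tn) / (tp + fp + tn + fn)                 # share of all predictions that are correct
precision = tp / (tp + fp)                                  # of predicted positives, share truly positive
recall    = tp / (tp + fn)                                  # of actual positives, share found
f1        = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall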

Normalization

(X1 - Xmin) / (Xmax - Xmin)
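
A quick Python sketch of min-max normalization, using made-up sample values:

import numpy as np
x = np.array([2.0, 5.0, 9.0, 11.0])            # assumed sample feature values
x_norm = (x - x.min()) / (x.max() - x.min())   # rescales every value into [0, 1]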

Breusch-Pagan test

A test for conditional heteroskedasticity in the error term of a regression. Regress the squared residuals on the independent variables and test whether that regression's R^2 equals zero; the test statistic is n x R^2, chi-square distributed with k degrees of freedom. One-tailed test.
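
A sketch of the test in Python with statsmodels (simulated heteroskedastic data; variable names are illustrative):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200) * (1 + np.abs(x))   # error variance grows with x
X = sm.add_constant(x)
model = sm.OLS(y, X).fit()
# LM statistic = n x R^2 from regressing squared residuals on X; chi-square, one-tailed
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(model.resid, X)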

Deep Learning Algorithms

Algorithms such as neural networks and reinforcement learning that learn from their own prediction errors; used for complex tasks such as image recognition and natural language processing

Supervised Learning

Algorithm uses labeled training data to model the relationship between inputs and outputs

Unsupervised Machine Learning

Algorithm uses unlabeled data to determine the structure of the data

Serial Correlation

An autocorrelation divided by its standard error, where the standard error is approximately 1/(N)^1/2. Tested with a t-test; statistically significant residual autocorrelation indicates misspecification.
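
A short sketch of the significance test, assuming placeholder residuals:

import numpy as np
resid = np.random.default_rng(1).normal(size=100)   # placeholder residuals
T = len(resid)
r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]       # lag-1 autocorrelation
t_stat = r1 / (1 / np.sqrt(T))                      # a significant t suggests misspecification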

Mean Reverting Level

B0 divided by (1-B1)

Out of Sample Error

Bias Error + Variance Error + Base Error

Regressing a Random Walk

Both variables are covariance stationary --> Yes. Only one variable is covariance stationary --> No. Neither is covariance stationary --> Check for cointegration.

Categories of Supervised Learning

Classification --> Categorical and ordinal targets. Regression --> Continuous targets.

Data Sets for Supervised Learning

Classification and Prediction

Unit Root

The lag coefficient equals one, so the series is not covariance stationary.

Hansen method

Corrects for serial correlation (and heteroskedasticity) by adjusting standard errors upward.

White-Corrected Standard Errors

Corrects Heteroskedasticity by inflating the standard errors.

Autocorrelation

Correlation among error terms: each error term tends to follow the direction of the previous error term. Standard errors are too small.

F - Test - Multicollinearity

A significant F-test combined with insignificant individual t-tests detects multicollinearity; correcting it usually means omitting one of the correlated variables.

Random Forest

Does not reduce the signal-to-noise ratio

Support Vector Machine

A linear classifier that seeks the optimal hyperplane separating the data into two sets by the maximum margin

Test Significance of Regression Formula (F-Test)

MSR / MSE, with k and n - k - 1 degrees of freedom

F Test Formula

MSR / MSE = (RSS / k) divided by (SSE / (n - k - 1))
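
A worked example with assumed sums of squares:

rss, sse = 80.0, 20.0      # assumed regression and residual sums of squares
k, n = 2, 53               # regressors and observations
msr = rss / k              # 40.0
mse = sse / (n - k - 1)    # 0.4
f_stat = msr / mse         # 100.0; compare to the F critical value with k and n-k-1 df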

Covariance Stationary

Mean and variance do not change over time, and there is a mean-reverting level; the expected value, variance, and covariances are constant and finite. Autoregressive models are valid only if the series is covariance stationary.

Ensemble learning provides

More accurate and stable predictions

If a regression shows no significant t-tests but a significant F-test, the likely cause is

Multicollinearity

Dummy Variables Used

Use n - 1 dummies for n categories; the omitted category is captured by the intercept.
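
A sketch with pandas, using a made-up categorical feature:

import pandas as pd
sector = pd.Series(["tech", "energy", "health", "tech"])   # 3 categories (assumed)
dummies = pd.get_dummies(sector, drop_first=True)          # n - 1 = 2 dummy columns; the dropped category sits in the intercept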

Cointegration

Two time series are related to the same macroeconomic variables, which allows the regression coefficients to be tested validly (the error term is covariance stationary).

Multicollinearity

Two or more "X" variables are correlated with each other when in theory they should be independent. Standard errors are inflated (too high), so the null is harder to reject, increasing the chance of Type II error.

Neural Networks

Work well in the presence of non-linear and complex interactions among variables

Standardized X1

(X1 - Mean) / Standard Deviation

Confidence Interval for Predicted Y-Value

Y^ ± (two-tailed critical t-value at n - 2 degrees of freedom) x (standard error of forecast)
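
A sketch of the interval with scipy, assuming the forecast and its standard error are already computed:

from scipy.stats import t
y_hat, s_f, n = 5.0, 0.6, 30                 # assumed forecast, SE of forecast, sample size
t_crit = t.ppf(0.975, df=n - 2)              # two-tailed 5% critical value, n - 2 df
ci = (y_hat - t_crit * s_f, y_hat + t_crit * s_f)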

A Significant A1 in the ARCH Model allows

for the estimation of the variance of the error term

Big Data is defined as

high volume, velocity, and variety; high velocity means low-latency (near real-time) data

The Coefficient of Determination shows

how much of the variation in the dependent variable is explained by the independent variables

The Presence of Conditional heteroskedasticity of residuals would lead to

invalid standard errors and statistical tests

Autoregression Models use first differencing to

transform the data so that it becomes covariance stationary

A Durbin-Watson statistic above 2 shows

The possibility of negative serial correlation

Dickey Fuller Test

-Tests for a unit root (the covariance-stationarity condition). -Subtract a one-period lagged variable from each side of the autoregressive model, then test whether the new coefficient is different from 0. -If it is not different from 0, the series has a unit root and is not covariance stationary.
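
A sketch using the augmented Dickey-Fuller test in statsmodels on simulated random-walk data:

import numpy as np
from statsmodels.tsa.stattools import adfuller
series = np.cumsum(np.random.default_rng(2).normal(size=250))   # random walk: has a unit root
adf_stat, pvalue = adfuller(series)[:2]                         # failing to reject the null -> unit root present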

Assumptions of Linear Regression

1. X and Y share a linear relationship. 2. Variance of the error term (residual) is constant. 3. The residual term is independently and normally distributed.

An Autoregression Model regresses

A dependent variable against one or more of its own lagged values

Durbin Watson Test

A test to determine whether first-order autocorrelation is present. Durbin-Watson statistic ≈ 2(1 - r), where r is the residual autocorrelation. If the statistic is below the lower critical value (here 1.34), there is evidence of positive autocorrelation; above the upper value (here 2.66), negative autocorrelation.
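
A sketch computing the statistic with statsmodels on simulated regression residuals:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = 0.5 + 1.5 * x + rng.normal(size=100)
model = sm.OLS(y, sm.add_constant(x)).fit()
dw = durbin_watson(model.resid)   # roughly 2(1 - r); values near 2 suggest no first-order autocorrelation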

Seasonality

A time series shows consistent seasonal patterns, indicated by a statistically significant residual autocorrelation at the seasonal lag (t-test). Corrected by adding a seasonal lag term to the model.

Model Misspecification

Affects hypothesis testing and leads to biased coefficients. Forms: - Functional form - Incorrect pooling of data - Variables that should be transformed - Using lagged dependent variables as independent variables - Forecasting the past - Measuring independent variables with error

Data exploration

Encompasses exploratory data analysis, feature selection, and feature engineering

Forecasting the Past

Financial Statements are published with a lag

Random Walk

Happens when B1 = 1: Xt = Xt-1 + Et

Autoregression Conditional Heteroskedasticity Model (ARCH)

Heteroskedasticity occurs when the error variance is non-constant; in ARCH it is conditional on prior errors. Today's squared error correlates with yesterday's: Et^2 = a0 + a1 x Et-1^2 + ut
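
A minimal ARCH(1) check, regressing squared residuals on their own lag (placeholder residuals assumed):

import numpy as np
import statsmodels.api as sm

resid = np.random.default_rng(4).normal(size=200)        # placeholder residuals
e2 = resid ** 2
arch = sm.OLS(e2[1:], sm.add_constant(e2[:-1])).fit()    # e2_t on e2_{t-1}
a1_tstat = arch.tvalues[1]                               # significant a1 -> ARCH effects; variance is forecastable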

Heteroskedasticity

Non-constant error variance: as the independent variable increases, the residual variance expands. Standard errors are too low, inflating t-statistics and increasing the chance of Type I error.

Durbin Watson Test Parameters

The null hypothesis is that no autocorrelation exists. Reject the null if the DW statistic is below 1.34 or above 2.66.

F-Test Degrees of Freedom

Numerator --> Number of independent variables used (k). Denominator --> Sample size - independent variables - 1 (n - k - 1).

Measurement Error

Occurs when you are testing a proxy variable (i.e. corporate governance).

Least absolute shrinkage and selection operator (LASSO)

Penalized regression in which the penalty term is the sum of the absolute values of the regression coefficients
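
A sketch with scikit-learn's Lasso on toy data (alpha is an assumed penalty weight):

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] + rng.normal(size=100)   # only the first feature matters
model = Lasso(alpha=0.1).fit(X, y)         # penalty = alpha x sum of |coefficients|
print(model.coef_)                         # irrelevant coefficients shrink toward zero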

Incorrect Pooling of Data

Pooling data from a government that had a recent change in regime.

Sum of Squares Total Formula

RSS + SSE

Mean Squared Regression Formula (MSR)

RSS / k

R^2

Regression Sum of Squares (RSS) divided by Total Sum of Squares

Effects of Model Misspecification

Regression coefficients are often biased and/or inconsistent, which means we can't have any confidence in our hypothesis tests of the coefficients or in the predictions of the model.

Dimension Reduction Seeks to

Remove Noise and the excessive number of features from a data set

First Differencing

Removes the Unit Root from the Data Set

Mean Squared Error Formula (MSE)

SSE / (n - k - 1)

Test of Fitness

Smaller SEE --> better fit. Higher R^2 --> better fit.

Sum of Squared Error Formula (SSE)

Sum of (Yi - Y^)^2

Residual Sum of Squares Formula (RSS)

Sum of (Y^ - Ymean)^2
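
A sketch tying the three sums of squares together on a toy OLS fit (for a least-squares fit with an intercept, SST = RSS + SSE holds exactly):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])    # assumed sample data
b1, b0 = np.polyfit(x, y, 1)                # OLS slope and intercept
y_hat = b0 + b1 * x
sse = np.sum((y - y_hat) ** 2)              # unexplained variation
rss = np.sum((y_hat - y.mean()) ** 2)       # explained variation
sst = np.sum((y - y.mean()) ** 2)           # total variation; equals rss + sse for an OLS fit
r2 = rss / sst                              # coefficient of determination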

Autoregression

The dependent variable is regressed on its own prior values: to model future sales, we look at past sales. The model is correctly specified if the residual autocorrelations are not statistically significant.

The F-Test Indicates

The Joint Significance of the independent variables

Underfit Model

Treats the true parameters as noise

Precision

True Positives / (False Positives + True Positives)

Recall

True Positives/ (True Positives + False Negatives)

Root Mean Square Error

Used when the target variable is continuous. RMSE = square root of [Sum of (Predicted - Actual)^2 divided by N]
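
A quick computation on assumed arrays:

import numpy as np
predicted = np.array([1.1, 2.0, 2.9])
actual = np.array([1.0, 2.2, 3.1])
rmse = np.sqrt(np.mean((predicted - actual) ** 2))   # note the square root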

Log-Linear Model

Used when you do not want negative outcomes and B1 is the constant rate of growth: LN(Yt) = B0 + B1t, equivalently Yt = e^(B0 + B1t)
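
A sketch fitting the log-linear trend to simulated 5%-growth data:

import numpy as np

t = np.arange(1, 21)
y = 100 * np.exp(0.05 * t) * np.exp(np.random.default_rng(5).normal(0, 0.02, 20))   # simulated series
b1, b0 = np.polyfit(t, np.log(y), 1)   # b1 should recover roughly 0.05
forecast = np.exp(b0 + b1 * 21)        # back-transform for a level forecast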

Function Form Misspecification

Relevant variables are omitted. If the omitted variables are correlated with the included independent variables, the coefficient estimates will be biased.

Variance of ARCH

Predicted variance: Variance(t+1) = A0 + A1 x Et^2. The significance of A1 is tested through a t-stat.

Linear Trend Model

Yt = B0 + B1t + Error

White-corrected standard errors under heteroskedasticity are higher than

the biased (uncorrected) standard errors, leading to lower computed t-statistics and less frequent rejection of the null (fewer Type I errors)

In K-fold cross validation technique the data is

randomly shuffled and divided into k equal sets; one set serves as the validation sample and the remaining k - 1 sets are training samples, rotating so that each set is used for validation exactly once
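
A sketch of the fold rotation with scikit-learn (k = 5 assumed):

import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)   # toy feature matrix
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    pass   # fit on X[train_idx], validate on X[val_idx]; each fold validates exactly once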

The Standard Error of Estimate measures the distance between

the estimated values and the actual values of the dependent variable (it is the standard deviation of the residuals)

