CFA Level 2 Quantitative Methods


Accuracy formula

(TP + TN)/(TP + TN+ FP + FN)
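
The same counts also feed the precision, recall, and F1 formulas that appear later in this set. A minimal Python sketch, using made-up counts for illustration:

```python
# Confusion-matrix metrics from raw counts (TP, TN, FP, FN); counts are hypothetical.
def confusion_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                              # also called sensitivity
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
    return accuracy, precision, recall, f1

print(confusion_metrics(tp=80, tn=90, fp=10, fn=20))
# (0.85, 0.888..., 0.8, 0.842...)
```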

Brent is trying to explain the concept of the standard error of estimate (SEE) to Johnson. In his explanation, Brent makes three points about the SEE: Point 1: The SEE is the standard deviation of the differences between the estimated values for the independent variables and the actual observations for the independent variable. Point 2: Any violation of the basic assumptions of a multiple regression model is going to affect the SEE. Point 3: If there is a strong relationship between the variables and the SSE is small, the individual estimation errors will also be small. How many of Brent's points are most accurate?

2 The statements that if there is a strong relationship between the variables and the SSE is small, the individual estimation errors will also be small, and also that any violation of the basic assumptions of a multiple regression model is going to affect the SEE are both correct. The SEE is the standard deviation of the differences between the estimated values for the dependent variables (not independent) and the actual observations for the dependent variable. Brent's Point 1 is incorrect.

Assuming the a1 term of an ARCH(1) model is significant, the following can be forecast:

A model is ARCH(1) if the coefficient a1 is significant. If a1 is significant, the variance of the error term in the next period can be forecast from the current period's squared error.

Assumptions of a multiple regression model:

A linear relationship exists between the dependent and independent variables; in other words, the model on the first page of this topic review correctly describes the relationship. The independent variables are not random, and there is no exact linear relation between any two or more independent variables. The expected value of the error term, conditional on the independent variable, is zero. The variance of the error terms is constant for all observations. The error term for one observation is not correlated with that of another observation. The error term is normally distributed.

What are the assumptions of a multiple regression model?

A linear relationship exists between the dependent and independent variables; in other words, the model on the first page of this topic review correctly describes the relationship. The independent variables are not random, and there is no exact linear relation between any two or more independent variables. The expected value of the error term, conditional on the independent variables, is zero [i.e., E(ε | X1, X2, ..., Xk) = 0]. The variance of the error terms is constant for all observations [i.e., E(εi²) = σε²]. The error term for one observation is not correlated with that of another observation [i.e., E(εiεj) = 0, j ≠ i]. The error term is normally distributed.

How do you calculate a mean reverting level?

A time series exhibits mean reversion if it has a tendency to move toward its mean. xt=b0/(1−b1).
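
A quick check with assumed AR(1) coefficients (b0 = 1.2 and b1 = 0.7 are hypothetical values):

```python
# Mean-reverting level of an AR(1) model x_t = b0 + b1*x_(t-1) + e_t (coefficients assumed).
b0, b1 = 1.2, 0.7
mean_reverting_level = b0 / (1 - b1)
print(mean_reverting_level)  # 4.0 -- the series tends to move toward this value
```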

What does a value of 2 indicate in a Durbin-Watson (DW) test statistic?

A value of 2 indicates no correlation, a value greater than 2 indicates negative correlation, and a value less than 2 indicates positive correlation. There is a range of values in which the DW test is inconclusive. DW ≈ 2(1 − r), where r = the autocorrelation of the residuals.
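
A small sketch of the relationship, computed from a hypothetical residual series:

```python
import numpy as np

# Durbin-Watson statistic from regression residuals (residuals are made up for illustration).
resid = np.array([0.5, -0.2, 0.1, 0.4, -0.3, 0.2, -0.1, 0.3])

dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
r = np.corrcoef(resid[:-1], resid[1:])[0, 1]   # lag-1 autocorrelation of the residuals
print(dw, 2 * (1 - r))                         # DW is approximately 2(1 - r)
```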

In an autoregressive time-series model, seasonality may be corrected by:

Adding an appropriate lag is an appropriate solution to seasonality. Excluding variables can sometimes be used to solve multicollinearity. Transforming using first-differencing can be a cure for nonstationarity. (Module 3.4, LOS 3.l)

The F-Statistic

An F-test assesses how well a set of independent variables, as a group, explains the variation in the dependent variable. In multiple regression, the F-statistic is used to test whether at least one independent variable in a set of independent variables explains a significant portion of the variation of the dependent variable. The F-statistic is calculated as: F = MSR/MSE = [RSS/k] / [SSE/(n − k − 1)], where MSR = mean regression sum of squares and MSE = mean squared error. Important: this is always a one-tailed test! In multiple regression, the F-statistic tests all independent variables as a group.

What is an F-test used for and how do you calculate it?

An F-test assesses how well the set of independent variables, as a group, explains the variation in the dependent variable. F = MSR/MSE = (RSS/k) / (SSE/(n − k − 1)). One-tailed test. Decision rule: reject H0 if F (test statistic) > Fc (critical value).
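
A minimal sketch of the decision rule, using assumed ANOVA values (the RSS, SSE, k, and n below are hypothetical):

```python
from scipy.stats import f

# F-test for overall significance of a multiple regression (values assumed).
RSS, SSE = 120.0, 80.0       # regression and error sums of squares
k, n = 3, 64                 # independent variables, observations

F = (RSS / k) / (SSE / (n - k - 1))
F_crit = f.ppf(0.95, dfn=k, dfd=n - k - 1)   # one-tailed test at 5% significance
print(F, F_crit, F > F_crit)                 # reject H0 if F > critical value
```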

Classification and regression trees (CART)

Classification trees are appropriate when the target variable is categorical, and are typically used when the target is binary (e.g., an IPO will be successful vs. not successful). Logit models, discussed in a previous reading, are also used when the target is binary, but are ill-suited when there are significant nonlinear relationships among variables. In such cases, classification trees may be a viable alternative. Regression trees are appropriate when the target is continuous. Classification trees assign observations to one of two possible classifications at each node. At the top of the tree, the top feature (i.e., the one most important in explaining the target) is selected, and a cutoff value c is estimated. Observations with feature values greater than c are assigned to one classification, and the remainder are assigned to the other classification. The resulting classes are then evaluated based on a second feature, and again divided into one of two classes. Every successive classification should result in a lower estimation error than the nodes that preceded it.

Breusch-Pagan test, what is it used for and what is the formula?

Common way to test for heteroskedasticity; calls for the regression of the squared residuals on the independent variables. If conditional heteroskedasticity is present, the independent variables will significantly contribute to the explanation of the squared residuals. The test statistic for the Breusch-Pagan test, which has a chi-square (χ²) distribution with k degrees of freedom, is calculated as: BP = n × R²resid, where n = the number of observations, R²resid = the R² from a second regression of the squared residuals from the first regression on the independent variables, and k = the number of independent variables. This is a one-tailed test because heteroskedasticity is only a problem if the R² and the BP test statistic are too large.
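
A sketch of the test using plain least squares on simulated data (the data-generating process below is invented so that the error variance depends on X):

```python
import numpy as np
from scipy.stats import chi2

# Breusch-Pagan test built by hand on simulated heteroskedastic data.
rng = np.random.default_rng(0)
n, k = 200, 2
X = rng.normal(size=(n, k))
eps = rng.normal(size=n) * (1 + 0.8 * np.abs(X[:, 0]))   # error variance depends on X
y = 1.0 + X @ np.array([0.5, -0.3]) + eps

Xc = np.column_stack([np.ones(n), X])
beta = np.linalg.lstsq(Xc, y, rcond=None)[0]
resid2 = (y - Xc @ beta) ** 2                            # squared residuals, first regression

gamma = np.linalg.lstsq(Xc, resid2, rcond=None)[0]       # second regression on the same X
fitted = Xc @ gamma
r2 = 1 - np.sum((resid2 - fitted) ** 2) / np.sum((resid2 - resid2.mean()) ** 2)

bp_stat = n * r2
print(bp_stat, chi2.ppf(0.95, df=k))                     # one-tailed chi-square test, k df
```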

LOS 5.a: Identify and explain steps in a data analysis project.

Conceptualization of the modeling task. Data collection. Data preparation and wrangling. Data exploration. Model training.

Conditional heteroskedasticity

Conditional heteroskedasticity is heteroskedasticity that is related to the level of (i.e., conditional on) the independent variables. For example, conditional heteroskedasticity exists if the variance of the residual term increases as the value of the independent variable increases

Ensemble Learning

Ensemble learning is the technique of combining predictions from multiple models rather than a single model. The ensemble method results in a lower average error rate because the different models cancel out noise. Two kinds of ensemble methods are used: aggregation of heterogeneous learners and aggregation of homogenous learners. Under aggregation of heterogeneous learners, different algorithms are combined together via a voting classifier. The different algorithms each get a vote, and then we go with whichever answer gets the most votes. Ideally, the models selected will have sufficient diversity in approach, resulting in a greater level of confidence in the predictions. Under aggregation of homogenous learners, the same algorithm is used, but on different training data. The different training data samples (used by the same model) can be derived by bootstrap aggregating or bagging. The process relies on generating random samples (bags) with replacement from the initial training sample.

Steps in data exploration:

Exploratory data analysis (EDA) involves looking at data descriptors such as summary statistics, heat maps, word clouds, and so on. The objectives of EDA include: understanding data properties, distributions, and other characteristics; finding patterns or relationships, and evaluating basic questions and hypotheses; and planning modeling in future steps. Feature selection is a process to select only the needed attributes of the data for ML model training. The higher the number of features selected, the higher the model complexity and training time. Feature engineering is the process of creating new features by transforming (e.g., taking the natural logarithm), decomposing, or combining multiple features.

Clustering

Given a data set, clustering is the process of grouping observations into categories based on similarities in their attributes (called cohesion). For example, stocks can be assigned to different categories based on their past performance, rather than using standard sector classifiers (e.g., finance, healthcare, technology, etc.).

What are the three violations of regression assumptions and their effects?

Heteroskedasticity: the error term has nonconstant variance. Detect with the Breusch-Pagan test or scatter diagrams; correct by computing robust (White-corrected) standard errors for the t-stats. Serial correlation: the error terms are correlated with each other. Multicollinearity: a linear relationship exists between the independent variables.

Hierarchical clustering

Hierarchical clustering builds a hierarchy of clusters without any predefined number of clusters. In an agglomerative (or bottom-up) clustering, we start with one observation as its own cluster and add other similar observations to that group, or form another nonoverlapping cluster. A divisive (or top-down) clustering algorithm starts with one giant cluster, and then it partitions that cluster into smaller and smaller clusters.

Random Walk

If a time series follows a random walk process, the predicted value of the series (i.e., the value of the dependent variable) in one period is equal to the value of the series in the previous period plus a random error term.

What is a random walk?

If a time series follows a random walk process, the predicted value of the series (i.e., the value of the dependent variable) in one period is equal to the value of the series in the previous period plus a random error term.

What is a Random Walk with a Drift

If a time series follows a random walk with a drift, the intercept term is not equal to zero. That is, in addition to a random error term, the time series is expected to increase or decrease by a constant amount each period. A random walk with a drift can be described as: xt = b0 + b1xt-1 + εt, where b0 = the constant drift (b0 ≠ 0) and b1 = 1.
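
A short simulation under assumed parameters (b0 = 0.1 and T = 100 are arbitrary):

```python
import numpy as np

# Simulate a random walk with drift: x_t = b0 + x_(t-1) + e_t (b1 = 1).
rng = np.random.default_rng(42)
b0, T = 0.1, 100

x = np.zeros(T)
for t in range(1, T):
    x[t] = b0 + x[t - 1] + rng.normal()

print(x[-1])   # drifts by roughly b0 per period, plus accumulated noise
```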

A time-series model that uses quarterly data exhibits seasonality if the fourth autocorrelation of the error term:

If the fourth autocorrelation of the error term differs significantly from 0, this is an indication of seasonality. (Module 3.4, LOS 3.l)

In-sample forecasts

In-sample forecasts (ŷt) are within the range of data (i.e., time period) used to estimate the model, which for a time series is known as the sample or test period. In-sample forecast errors are (yt − ŷt), where t is an observation within the sample period. In other words, we are comparing how accurate our model is in forecasting the actual data we used to develop the model. The Predicted vs. Actual Capacity Utilization figure in our Trend Analysis example shows an example of values predicted by the model compared to the values used to generate the model.

To reduce type I error, Stokes should most appropriately increase the model's:

Increase precision High precision is valued when the cost of a type I error (i.e., FP) is large, while high recall is valued when the cost of a type II error (i.e., FN) is large. (LOS 5.g)

What is a dummy variable?

It is a variable created to take the place of a qualitative variable (e.g., a regression on "blue or green eyes"). Dummy variables take values of zero or one, and if there are n categories, the regression uses n − 1 dummy variables.

K-means clustering

K-means clustering partitions observations into k nonoverlapping clusters, where k is a hyperparameter (i.e., set by the researcher). Each cluster has a centroid (the center of the cluster), and each new observation is assigned to a cluster based on its proximity to the centroid. Initially, k centroids are randomly selected, and clustering starts. As a new observation gets assigned to a cluster, its centroid is recalculated, which may result in reassignment of some observations, thus resulting in a new centroid and so forth until all observations are assigned and no new reassignment is made. One limitation of this type of algorithm is that the hyperparameter k is chosen before clustering starts, meaning that one has to have some idea about the nature of the data set. K-means clustering is used in investment management to classify thousands of securities based on patterns in high dimensional data.
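
A minimal scikit-learn sketch on simulated data (the feature matrix and k = 4 are assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

# k-means on simulated, standardized security attributes; k is set by the researcher.
rng = np.random.default_rng(1)
features = rng.normal(size=(300, 5))          # 300 securities, 5 attributes

km = KMeans(n_clusters=4, n_init=10, random_state=1).fit(features)
print(km.labels_[:10])                        # cluster assignment for the first 10 securities
print(km.cluster_centers_.shape)              # (4, 5): one centroid per cluster
```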

K-nearest neighbor (KNN)

More commonly used in classification (but sometimes in regression), this technique is used to classify an observation based on nearness to the observations in the training sample. The researcher specifies the value of k, the hyperparameter, triggering the algorithm to look for the k observations in the sample that are closest to the new observation that is being classified. The specification of k is important because if it is too small, it will result in a high error rate, and if it is too large, it will dilute the result by averaging across too many outcomes. Also, if k is even, there may be ties, with no clear winner. KNN is a powerful, nonparametric model, but it requires a specification of what it means to be near. Analysts need to have a clear understanding of the data and the underlying business to be able to specify the distance metric that needs to be optimized. Another issue with KNN is the specification of feature set; inclusion of irrelevant or correlated features can skew the results. Investment applications of KNN include predicting bankruptcy, assigning a bond to a ratings class, predicting stock prices, and creating customized indices.
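
A minimal scikit-learn sketch on simulated data (k = 5 is an arbitrary choice):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# KNN classification; n_neighbors is the hyperparameter k.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)     # an odd k avoids ties in a binary problem
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))              # out-of-sample classification accuracy
```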

Multicollinearity

Multicollinearity refers to the condition when two or more of the independent variables, or linear combinations of the independent variables, in a multiple regression are highly correlated with each other. This condition distorts the standard error of estimate and the coefficient standard errors, leading to problems when conducting t-tests for statistical significance of parameters.

What machine learning model should be used for non-linear data?

Neural network (NN). Support vector machine (SVM) and least absolute shrinkage and selection operator (LASSO) models are appropriate for linear data.

Heteroskedasticity

One of the assumptions of multiple regression is that the variance of the residuals is constant across observations. Heteroskedasticity occurs when the variance of the residuals is not the same across all observations in the sample. This happens when there are subsamples that are more spread out than the rest of the sample.

Out-of-sample forecasts

Out-of-sample forecasts are made outside of the sample period. In other words, we compare how accurate a model is in forecasting the y variable value for a time period outside the period used to develop the model. Out-of-sample forecasts are important because they provide a test of whether the model adequately describes the time series and whether it has relevance (i.e., predictive power) in the real world. Nonetheless, an analyst should be aware that most published research employs in-sample forecasts only.

What is the effect of positive serial correlation?

Positive serial correlation can lead to standard errors that are too small, which will cause computed t-statistics to be larger than they should be, which will lead to too many Type I errors (i.e. the rejection of the null hypothesis when it is actually true).

Principal components analysis

Principal components analysis is a common methodology for reducing a complex or large data set with a significant number of features to the common behavioral attributes that have the strongest predictive power for the defined outcome (e.g., consumer default). Because the model reduces the number of variables needed to explain the variation in the data, it reduces the computational time required, rather than using every single parameter for every single record.

regularization

Regularization forces the beta coefficients of nonperforming features toward zero. Investment analysts use LASSO to build parsimonious models. Regularization can be applied to nonlinear models, such as the estimation of a stable covariance matrix that can be used for mean-variance optimization.

Standard Error of Estimate (SEE)

SEE for a regression is the standard deviation of its residuals. The lower the SEE, the better the model fit. SEE =√MSE

Support vector machine (SVM)

SVM is a linear classification algorithm that separates the data into one of two possible classifiers (e.g., sell vs. buy). Given n features, an n-dimensional hyperplane divides a sample into one of the two possible classifications. SVM maximizes the probability of making a correct prediction by determining the boundary that is farthest away from all the observations. This boundary comprises a discriminant boundary as well as margins on the side of the boundary. The margins are determined by the support vectors, observations that are closest to the boundary. Misclassified observations in the training data are handled via soft margin classification. This adaptation optimizes the tradeoff between a wider margin and classification error. We should note that a more complex, nonlinear model can be used for classification as opposed to SVM to reduce classification error, but this requires more features and may result in overfitting. Applications of SVM in investment management include classifying debt issuers into likely-to-default versus not-likely-to-default issuers, stocks-to-short versus not-to-short, and even classifying text (from news articles or company press releases) as positive or negative.
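
A minimal scikit-learn sketch on simulated data (the data set and C = 1.0 are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Linear SVM; C controls the soft-margin tradeoff between a wider margin and
# classification error on the training data.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

svm = SVC(kernel="linear", C=1.0).fit(X_train, y_train)
print(svm.score(X_test, y_test))              # out-of-sample accuracy
print(svm.support_vectors_.shape)             # the observations closest to the boundary
```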

covariance stationary

Statistical inferences based on ordinary least squares (OLS) estimates for an AR time series model may be invalid unless the time series being modeled is covariance stationary. A time series is covariance stationary if it satisfies the following three conditions: 1) Constant and finite expected value. The expected value of the time series is constant over time. (Later, we will refer to this value as the mean-reverting level.) 2) Constant and finite variance. The time series' volatility around its mean (i.e., the distribution of the individual observations around the mean) does not change over time. 3) Constant and finite covariance between values at any given lag. The covariance of the time series with leading or lagged values of itself is constant.

How to test autoregression models for serial correlation?

Step 1: Estimate the AR model being evaluated using linear regression: start with a first-order AR model [i.e., AR(1)] using xt = b0 + b1xt-1 + εt. Step 2: Calculate the autocorrelations of the model's residuals (i.e., the level of correlation between the forecast errors from one period to the next). Step 3: Test whether the autocorrelations are significantly different from zero: if the model is correctly specified, none of the autocorrelations will be statistically significant. To test for significance, a t-test is used to test the hypothesis that the correlations of the residuals are zero. The t-statistic is the estimated autocorrelation divided by the standard error (1/√T). If the absolute value of the t-statistic is less than the critical t-value, the autocorrelation is insignificant and the model is correctly specified.
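
A sketch of the three steps on a simulated AR(1) series (the coefficients used to generate the data are arbitrary):

```python
import numpy as np

# Simulate an AR(1) series, fit it by OLS, then t-test the residual autocorrelations.
rng = np.random.default_rng(0)
T = 200
x = np.zeros(T)
for t in range(1, T):
    x[t] = 1.0 + 0.6 * x[t - 1] + rng.normal()

# Step 1: estimate the AR(1) model.
X = np.column_stack([np.ones(T - 1), x[:-1]])
b = np.linalg.lstsq(X, x[1:], rcond=None)[0]
resid = x[1:] - X @ b

# Steps 2-3: residual autocorrelations and t-statistics (standard error = 1/sqrt(T)).
se = 1 / np.sqrt(len(resid))
for lag in (1, 2, 4):
    rho = np.corrcoef(resid[:-lag], resid[lag:])[0, 1]
    print(lag, rho, rho / se)   # |t| below the critical value => autocorrelation insignificant
```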

Recall formula

TP / (TP + FN)

Precision formula

TP/(TP+FP)

What is the Durbin-Watson test and how do you interpret it?

The Durbin-Watson statistic tests for serial correlation in the residuals. We are usually concerned with positive serial correlation, so look for DW < dl (you will be given dl and du). If DW is less than dl, there is a problem with positive serial correlation and you must reject H0 of no serial correlation.

What hypothesis does the F-statistic test?

The F-statistic tests whether all the slope coefficients in a linear regression are equal to zero.

Bea Carroll, CFA, has performed a regression analysis of the relationship between two economic measures. Her analysis indicates a standard error of estimate (SEE) that is high relative to total variability. Which of the following conclusions regarding the relationship between the two economic measures can Carroll most accurately draw from her SEE analysis? The relationship between the two variables is most likely:

The SEE is the standard deviation of the error terms in the regression, and is an indicator of the strength of the relationship between the dependent and independent variables. Essentially, the standard error of estimate is a measure of how well the regression model fits the data. If the SEE is small, the model fits well. The SEE will be low if the relationship is strong and conversely will be high if the relationship is weak. We do not have enough evidence to conclude that the relationship between the two variables is inverse or logarithmic.

How to detect multicollinearity?

The most common way to detect multicollinearity is the situation where t-tests indicate that none of the individual coefficients is significantly different than zero, while the F-test is statistically significant and the R2 is high. This suggests that the variables together explain much of the variation in the dependent variable, but the individual independent variables don't. The only way this can happen is when the independent variables are highly correlated with each other, so while their common source of variation is explaining the dependent variable, the high degree of correlation also "washes out" the individual effects. General rule of thumb: If the absolute value of the sample correlation between any two independent variables in the regression is greater than 0.7, multicollinearity is a potential problem. However, this only works if there are exactly two independent variables.

What is the equation for linear regression?

The regression line (Ŷi = b̂0 + b̂1Xi) is the line for which the sum of the squared differences (vertical distances) between the Y-values predicted by the regression equation and the actual Y-values, Yi, is minimized. The sum of the squared vertical distances between the estimated and actual Y-values is referred to as the sum of squared errors (SSE).

root mean squared error criterion (RMSE)

The root mean squared error criterion (RMSE) is used to compare the accuracy of autoregressive models in forecasting out-of-sample values. For example, a researcher may have two autoregressive (AR) models: an AR(1) model and an AR(2) model. To determine which model will more accurately forecast future values, we calculate the RMSE (the square root of the average of the squared errors) for the out-of-sample data. Note that the model with the lowest RMSE for in-sample data may not be the model with the lowest RMSE for out-of-sample data.

Jason Brock, CFA, is performing a regression analysis to identify and evaluate any relationship between the common stock of ABT Corp and the S&P 100 index. He utilizes monthly data from the past five years, and assumes that the sum of the squared errors is .0039. The calculated standard error of the estimate (SEE) is closest to:

The standard error of estimate of a regression equation measures the degree of variability between the actual and estimated Y-values. The SEE may also be referred to as the standard error of the residual or the standard error of the regression. The SEE is equal to the square root of the mean squared error. Expressed in a formula, SEE = √(SSE / (n-2)) = √(.0039 / (60-2)) = .0082

What is the standard error of the estimate and how do you calculate it?

The standard error of the estimate is equal to √[SSE / (n − k − 1)].
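
A small sketch from a hypothetical residual vector (k = 2 is assumed):

```python
import numpy as np

# SEE from the residuals of a regression with k independent variables (values assumed).
resid = np.array([0.3, -0.1, 0.2, -0.4, 0.1, 0.05, -0.15, 0.2])
n, k = len(resid), 2

sse = np.sum(resid ** 2)
see = np.sqrt(sse / (n - k - 1))     # standard deviation of the residuals
print(see)
```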

How to find the number of observations in a regression model

The standard error of the estimated autocorrelations is 1/√T, where T is the number of observations (periods). So, if the standard error is given as 0.0632, the number of observations, T, in the time series must be (1/0.0632)² ≈ 250.

What is the test for ARCH based off of?

The test for ARCH is based on a regression of the squared residuals on their lagged values

Effect of Heteroskedasticity on Regression Analysis

There are four effects of heteroskedasticity you need to be aware of: The standard errors are usually unreliable estimates. The coefficient estimates (the b̂j) aren't affected. If the standard errors are too small, but the coefficient estimates themselves are not affected, the t-statistics will be too large and the null hypothesis of no statistical significance is rejected too often. The opposite will be true if the standard errors are too large. The F-test is also unreliable.

Unit root testing for non-stationarity

To determine whether a time series is covariance stationary, we can (1) run an AR model and examine autocorrelations, or (2) perform the Dickey Fuller test. In the first method, an AR model is estimated and the statistical significance of the autocorrelations at various lags is examined. A stationary process will usually have residual autocorrelations insignificantly different from zero at all lags or residual autocorrelations that decay to zero as the number of lags increases. A more definitive test for unit root is the Dickey Fuller test. For statistical reasons, you cannot directly test whether the coefficient on the independent variable in an AR time series is equal to 1. To compensate, Dickey and Fuller created a rather ingenious test for a unit root.

How do you test if a model is covariance stationary?

To determine whether a time series is covariance stationary, we can (1) run an AR model and examine autocorrelations, or (2) perform the Dickey Fuller test. In the first method, an AR model is estimated and the statistical significance of the autocorrelations at various lags is examined. A stationary process will usually have residual autocorrelations insignificantly different from zero at all lags or residual autocorrelations that decay to zero as the number of lags increases. Dickey Fuller Test - null hypothesis is g = 0 (i.e., the time series has a unit root). For example, if on the exam you are told the null (g = 0) cannot be rejected, your answer is that the time series has a unit root. If the null is rejected, the time series does not have a unit root.

Least absolute shrinkage and selection operator (LASSO)

Type of supervised ML This is a popular penalized regression model. In addition to minimizing SSE, LASSO minimizes the sum of the absolute values of the slope coefficients. In such a framework, there is a tradeoff between reducing the SSE (by increasing the number of features) and the penalty imposed on the inclusion of more features. Through optimization, LASSO automatically eliminates the least predictive features. A penalty term, λ (lambda), is the hyperparameter that determines the balance between overfitting the model and keeping it parsimonious.
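
A minimal scikit-learn sketch on simulated data (alpha plays the role of the lambda penalty; its value here is arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso

# LASSO drives the coefficients of nonperforming features toward (exactly) zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=200)   # only 2 of 10 features matter

lasso = Lasso(alpha=0.1).fit(X, y)
print(np.round(lasso.coef_, 2))    # most coefficients are eliminated (set to zero)
```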

Penalized regressions

Type of supervised machine learning. Penalized regression models reduce the problem of overfitting by imposing a penalty based on the number of features used by the model. The penalty value increases with the number of independent variables (features) used. Imposing such a penalty can exclude features that are not meaningfully contributing to out-of-sample prediction accuracy (i.e., it makes the model more parsimonious). Penalized regression models seek to minimize the sum of square errors (SSE) as well as a penalty value.

Principal component analysis (PCA)

Type of unsupervised ML Problems associated with too much noise often arise when the number of features in a data set (i.e., its dimension) is excessive. Dimension reduction seeks to reduce this noise by discarding those attributes that contain little information. One method is PCA, which summarizes the information in a large number of correlated factors into a much smaller set of uncorrelated factors. These uncorrelated factors, called eigenvectors, are linear combinations of the original features. Each eigenvector has an eigenvalue—the proportion of total variance in the data set explained by the eigenvector. The first factor in PCA would be the one with the highest eigenvalue, and would represent the most important factor. The second factor is the second-most important (i.e., has the second-highest eigenvalue) and so on, up to the number of uncorrelated factors specified by the researcher. Scree plots show the proportion of total variance explained by each of the principal components. In practice, the smallest number of principal components that collectively capture 85%-95% of the total variance are retained. Since the principal components are linear combinations of the original data set, they cannot be easily labeled or interpreted, resulting in a black-box approach.
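
A minimal scikit-learn sketch on simulated correlated features (the data and the number of components retained are assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

# PCA: explained_variance_ratio_ is the proportion of total variance captured by each component.
rng = np.random.default_rng(0)
base = rng.normal(size=(500, 3))
X = np.column_stack([base, base @ rng.normal(size=(3, 7))])   # 10 correlated features

pca = PCA(n_components=5).fit(X)
print(np.round(pca.explained_variance_ratio_, 3))
print(np.round(np.cumsum(pca.explained_variance_ratio_), 3))  # retain enough for ~85-95%
```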

Unconditional heteroskedasticity

Unconditional heteroskedasticity occurs when the heteroskedasticity is not related to the level of the independent variables, which means that it doesn't systematically increase or decrease with changes in the value of the independent variable(s). While this is a violation of the equal variance assumption, it usually causes no major problems with the regression.

How do you correct heteroskedasticity?

Use robust or "White-corrected Standard Errors". Then recalculate the t-stats using it. A second method to correct for heteroskedasticity is the use of generalized least squares, which attempts to eliminate the heteroskedasticity by modifying the original equation

When using the root mean squared error (RMSE) criterion to evaluate the predictive power of the model, what criteria should you observe?

Use the model with the lowest RMSE calculated using the out-of-sample data. RMSE is a measure of error hence the lower the better. It should be calculated on the out-of-sample data i.e. the data not directly used in the development of the model. This measure thus indicates the predictive power of our model.
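
A minimal sketch comparing two sets of hypothetical out-of-sample forecasts:

```python
import numpy as np

# RMSE on out-of-sample data for two competing models (all values are made up).
actual  = np.array([4.1, 4.3, 4.0, 4.6, 4.4])
model_a = np.array([4.0, 4.2, 4.1, 4.5, 4.3])   # e.g., AR(1) forecasts
model_b = np.array([4.3, 4.0, 3.8, 4.9, 4.1])   # e.g., AR(2) forecasts

rmse = lambda pred: np.sqrt(np.mean((actual - pred) ** 2))
print(rmse(model_a), rmse(model_b))   # prefer the model with the lower out-of-sample RMSE
```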

Neural Networks

Useful in supervised regression and classification models, neural networks (NNs), (also called artificial neural networks, or ANNs) are constructed as nodes connected by links. The input layer consists of nodes with values for the features (independent variables). These values are scaled so that the information from multiple nodes is comparable and can be used to calculate a weighted average. The input values from the nodes in the input layer connect to a second set of nodes in the hidden layer. Typically, several inputs are connected to a particular hidden node, meaning that the node receives multiple input values via the links. The nodes that follow the input variables are called neurons because they process the input information. These neurons comprise a summation operator that collates the information (as a weighted average) and passes it on to a (typically nonlinear) activation function, to generate a value from the input values. This value is then passed forward to other neurons in subsequent hidden layers (a process called forward propagation). A related process, backward propagation, is employed to revise the weights used in the summation operator as the network learns from its errors.
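
A bare-bones sketch of forward propagation through one hidden layer (the weights, biases, and ReLU activation are arbitrary choices for illustration):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)          # a typical nonlinear activation function

x = np.array([0.2, -0.5, 1.0])       # scaled input features (input layer)
W1 = np.array([[0.4, -0.3, 0.1],
               [0.2,  0.8, -0.5]])
b1 = np.array([0.1, -0.2])
W2, b2 = np.array([0.6, -0.4]), 0.05

hidden = relu(W1 @ x + b1)           # neurons: weighted sum, then activation
output = W2 @ hidden + b2            # value passed forward to the output
print(output)
```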

Autoregressive (AR) Models

When the dependent variable is regressed against one or more lagged values of itself, the resultant model is called an autoregressive (AR) model. For example, the sales for a firm could be regressed against the sales for the firm in the previous month.

Autoregressive model (AR)

When the dependent variable is regressed against one or more lagged values of itself, the resultant model is called an autoregressive (AR) model. For example, the sales for a firm could be regressed against the sales for the firm in the previous month. In an autoregressive time series, past values of a variable are used to predict the current (and hence future) value of the variable.

If the p-value of a variable is less than the significance level, can the null hypothesis be rejected?

Yes. The p-value is the smallest level of significance for which the null hypothesis can be rejected. Therefore, for any given variable, if the p-value of a variable is less than the significance level, the null hypothesis can be rejected and the variable is considered to be statistically significant.

Analysis of variance (ANOVA)

a statistical procedure for analyzing the total variability of the dependent variable.

Random forest

a variant of classification trees whereby a large number of classification trees are trained using data bagged from the same data set. A randomly selected subset of features is used in creating each tree, and each tree is slightly different from the others. The process of using multiple classification trees to determine the final classification is akin to the practice of crowdsourcing. Because each tree only uses a subset of features, random forests can mitigate the problem of overfitting. Using random forests can increase the signal-to-noise ratio because errors across different trees tend to cancel each other out. A drawback of random forests is that the transparency of CART is lost, and we are back to the black-box category of algorithms.

Serial correlation

also known as autocorrelation, refers to the situation in which the residual terms are correlated with one another. Serial correlation is a relatively common problem with time series data. Positive serial correlation exists when a positive regression error in one time period increases the probability of observing a positive regression error for the next time period. Negative serial correlation occurs when a positive error in one period increases the probability of observing a negative error in the next period.

coefficient of determination(R2)

defined as the percentage of the total variation in the dependent variable explained by the independent variable. For example, an R² of 0.63 indicates that the independent variable explains 63% of the variation in the dependent variable. R² = RSS / SST

How do you calculate the T-stat?

forecasted coefficient / standard error of the coefficient estimate

F1 score

harmonic mean of precision and recall: F1 = (2 × precision × recall) / (precision + recall)

Total sum of squares (SST)

measures the total variation in the dependent variable. SST is equal to the sum of the squared differences between the actual Y-values and the mean of Y.

Sum of squared errors (SSE)

measures the unexplained variation in the dependent variable. It's also known as the sum of squared residuals or the residual sum of squares. SSE is the sum of the squared vertical distances between the actual Y-values and the predicted Y-values on the regression line

Regression sum of squares (RSS)

measures the variation in the dependent variable that is explained by the independent variable. RSS is the sum of the squared distances between the predicted Y-values and the mean of Y.
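
A small sketch of the ANOVA decomposition SST = RSS + SSE for a fitted OLS line (the data points are hypothetical):

```python
import numpy as np

# Fit a simple OLS line, then decompose the total variation in Y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 4.5, 5.0, 6.5, 7.0])

b1, b0 = np.polyfit(x, y, deg=1)           # OLS slope and intercept
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)          # total variation
sse = np.sum((y - y_hat) ** 2)             # unexplained variation (squared residuals)
rss = np.sum((y_hat - y.mean()) ** 2)      # explained variation
print(sst, rss + sse, rss / sst)           # SST = RSS + SSE; R^2 = RSS/SST
```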

Alexis Popov, CFA, is analyzing monthly data. Popov has estimated the model xt = b0 + b1 × xt-1 + b2 × xt-2 + et. The researcher finds that the residuals have a significant ARCH process. The best solution to this is to:

re-estimate the model with generalized least squares. If the residuals have an ARCH process, then the correct remedy is generalized least squares which will allow Popov to better interpret the results.

What methods are used to detect serial correlation?

residual plots and the Durbin-Watson statistic.

What is Term frequency (also known as collection frequency)?

the number of times a given word appears in the whole corpus (i.e., collection of sentences) divided by the total number of words in the corpus. *General strategy to identify noisy terms. Term frequency can be calculated and examined to identify outlier words. The statistics of TF range between 0 and 1 because TF values are ratios of total occurrences of a particular word to total number of words in the collection. A sample of words with the highest TF and lowest TF values is also shown to gain insight into what kinds of words occur at these extreme frequencies.
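
A minimal sketch on a tiny invented corpus:

```python
from collections import Counter

# Term frequency: occurrences of each word divided by total words in the corpus.
corpus = "the stock rose the bond fell the stock rose again".split()

counts = Counter(corpus)
tf = {word: n / len(corpus) for word, n in counts.items()}
print(sorted(tf.items(), key=lambda kv: kv[1], reverse=True))
# the highest-TF words (like "the") are likely noise; the lowest-TF words are rare terms
```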

A time series x that is a random walk with a drift is best described as:

xt = b0 + b1*xt−1 + εt, where b0 ≠ 0 (the drift) and b1 = 1. The best estimate of a random walk for period t is the value of the series at (t − 1). If the random walk has a drift component, this drift is added to the previous period's value of the time series to produce the forecast.

