Quantitative Methods

trend

A long-term pattern of movement in a particular direction.

discriminant analysis

A multivariate classification technique used to discriminate between groups, such as companies that either will or will not become bankrupt during some time frame.

learning rate

A parameter that affects the magnitude of adjustments in the weights in a neural network.

dummy variable

A type of qualitative variable that takes on a value of 1 if a particular condition is true and 0 if that condition is false.

dendrogram

A type of tree diagram used for visualizing a hierarchical cluster analysis; it highlights the hierarchical relationships among the clusters.

multiple linear regression

Linear regression involving two or more independent variables.

supervised learning

Machine learning where algorithms infer patterns between a set of inputs (the X's) and the desired output (Y). The inferred pattern is then used to map a given input set into a predicted output.

in-sample forecast errors

The residuals from a fitted time-series model within the sample period used to fit the model.

residual autocorrelation

The sample autocorrelations of the residuals.

root mean squared error (RMSE)

The square root of the average squared forecast error; used to compare the out-of-sample forecasting performance of forecasting models.
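A minimal Python sketch of the RMSE calculation, using hypothetical actual values and forecasts from two competing models; the function name `rmse` is illustrative only. The model with the lower RMSE would be preferred on out-of-sample accuracy.

```python
import math

def rmse(actual, forecast):
    """Square root of the average squared forecast error."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical out-of-sample values and forecasts from two models
actual = [1.0, 2.0, 3.0, 4.0]
model_a = [1.5, 2.5, 2.5, 3.5]  # every error is 0.5 in absolute value
model_b = [1.0, 2.0, 3.0, 2.0]  # a single error of 2.0
print(rmse(actual, model_a))  # 0.5
print(rmse(actual, model_b))  # 1.0
```

Squaring the errors penalizes model B's single large miss more heavily than model A's several small ones.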

regimes

With reference to a time series, the underlying model generating the time series.

log-linear model

With reference to time-series models, a model in which the growth rate of the time series as a function of time is constant.

trimming

Also called truncation, it is the process of removing extreme values and outliers from a dataset.

recall

Also known as sensitivity, in error analysis for classification problems it is the ratio of correctly predicted positive classes to all actual positive classes. Recall is useful in situations where the cost of false negatives (FN), or Type II error, is high.

precision

In error analysis for classification problems, it is the ratio of correctly predicted positive classes to all predicted positive classes. Precision is useful in situations where the cost of false positives (FP), or Type I error, is high.

target

In machine learning, the dependent variable (Y) in a labeled dataset; the company in a merger or acquisition that is being acquired.

robust standard errors

Standard errors of the estimated parameters of a regression that correct for the presence of heteroskedasticity in the regression's error term.

soft margin classification

An adaptation in the support vector machine algorithm which adds a penalty to the objective function for observations in the training set that are misclassified.

named entity recognition

An algorithm that analyzes individual tokens and their surrounding semantics while referring to its dictionary to tag an object class to the token.

parts of speech

An algorithm that uses language structure and dictionaries to tag every token in the text with a corresponding part of speech (e.g., noun, verb, adjective, proper noun).

hierarchical clustering

An iterative unsupervised learning procedure used for building a hierarchy of clusters.

market timing

Asset allocation in which the investment in the market is increased if one forecasts that the market will outperform T-bills.

seasonality

A characteristic of a time series in which the data experiences regular and predictable periodic changes, e.g., fan sales are highest during the summer months.

majority-vote classifier

A classifier that assigns to a new data point the predicted label with the most votes (i.e., occurrences).

K-means

A clustering algorithm that repeatedly partitions observations into a fixed number, k, of non-overlapping clusters.
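A minimal sketch of the K-means loop on one-dimensional data, assuming the analyst supplies the k starting centroids (here k = 2). Real implementations usually work in many dimensions and use smarter initialization; the helper `k_means_1d` is hypothetical.

```python
def k_means_1d(points, initial_centroids, max_iter=100):
    """Alternate between assigning each point to its nearest centroid and
    moving each centroid to the mean of its assigned points, until the
    assignments stop changing."""
    centroids = list(initial_centroids)
    assignments = None
    for _ in range(max_iter):
        new_assignments = [
            min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no observation changed cluster
        assignments = new_assignments
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = sum(members) / len(members)
    return centroids, assignments

points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centroids, labels = k_means_1d(points, initial_centroids=[1.0, 2.0])
print(centroids)  # [1.5, 8.5]
print(labels)     # [0, 0, 0, 1, 1, 1]
```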

bag-of-words (BOW)

A collection of a distinct set of tokens from all the texts in a sample dataset. BOW does not capture the position or sequence of words present in the text.

random forest classifier

A collection of a large number of decision trees trained via a bagging method.

corpus

A collection of text data in any form, including list, matrix, or data table forms.

learning curve

A curve which plots the accuracy rate (= 1 - error rate) in the validation or test samples (i.e., out-of-sample) against the amount of data in the training sample, so it is useful for describing under- and overfitting as a function of bias and variance errors.

fitting curve

A curve which shows in- and out-of-sample error rates (Ein and Eout) on the y-axis plotted against model complexity on the x-axis.

test sample

A data sample that is used to test a model's ability to predict well on new data.

training sample

A data sample that is used to train a model.

validation sample

A data sample that is used to validate and tune a model.

labeled data set

A dataset that contains matched sets of observed inputs or features (X's) and the associated output or target (Y).

chain rule of forecasting

A forecasting process in which the next period's value as predicted by the forecasting equation is substituted into the right-hand side of the equation to give a predicted value two periods ahead.
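A sketch of the chain rule of forecasting for an AR(1) model, x_t = b0 + b1 x_{t-1}, with hypothetical coefficients b0 = 1.0 and b1 = 0.5; each one-step forecast is substituted back into the right-hand side to produce the next. The function name is illustrative.

```python
def chain_rule_forecasts(b0, b1, last_value, horizon):
    """Multi-period forecasts from an AR(1) model x_t = b0 + b1 * x_{t-1}."""
    forecasts = []
    current = last_value
    for _ in range(horizon):
        current = b0 + b1 * current  # previous forecast feeds the next step
        forecasts.append(current)
    return forecasts

# Latest observation 4.0: one step ahead 1 + 0.5*4 = 3.0, two steps 1 + 0.5*3 = 2.5
print(chain_rule_forecasts(1.0, 0.5, 4.0, 2))  # [3.0, 2.5]
```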

summation operator

A functional part of a neural network's node that multiplies each input value received by a weight and sums the weighted values to form the total net input, which is then passed to the activation function.

activation function

A functional part of a neural network's node that transforms the total net input received into the final output of the node. The activation function operates like a light dimmer switch that decreases or increases the strength of the input.
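The summation operator and activation function together make up a node. The sketch below assumes a sigmoid activation (one common choice among several) and hypothetical inputs and weights; it omits the bias term that many networks add.

```python
import math

def summation_operator(inputs, weights):
    """Total net input: each input multiplied by its weight, then summed."""
    return sum(x * w for x, w in zip(inputs, weights))

def sigmoid(z):
    """Sigmoid activation: maps the net input to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def node_output(inputs, weights):
    return sigmoid(summation_operator(inputs, weights))

inputs, weights = [1.0, 2.0, 0.5], [0.5, -0.5, 2.0]
print(summation_operator(inputs, weights))       # 0.5 - 1.0 + 1.0 = 0.5
print(round(node_output(inputs, weights), 4))    # 0.6225
```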

confusion matrix

A grid used for error analysis in classification problems; it presents the counts of four outcomes: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).

support vector machine (SVM)

A linear classifier that determines the hyperplane that optimally separates the observations into two sets of data points.

multiple linear regression model

A linear regression model with two or more independent variables.

ensemble learning

A technique of combining the predictions from a collection of models to achieve a more accurate prediction.

bootstrap aggregating (or bagging)

A technique whereby the original training data set is used to generate n new training data sets or bags of data. Each new bag of data is generated by random sampling with replacement from the initial training set.
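A minimal sketch of how the bags could be generated, assuming each bag has the same size as the original training set; the helper `make_bags` is hypothetical, and a model would then be trained on each bag separately.

```python
import random

def make_bags(training_set, n_bags, seed=None):
    """Draw n_bags bootstrap samples, each the size of the training set,
    by random sampling with replacement."""
    rng = random.Random(seed)
    size = len(training_set)
    return [[rng.choice(training_set) for _ in range(size)] for _ in range(n_bags)]

bags = make_bags(list(range(10)), n_bags=3, seed=42)
for bag in bags:
    # Duplicates are expected, and some observations are left out of each bag
    print(sorted(bag))
```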

complexity

A term referring to the number of features, terms, or branches in a model and to whether the model is linear or non-linear (non-linear is more complex).

regularization

A term that describes methods for reducing statistical variability in high-dimensional data estimation problems.

Breusch-Pagan test

A test for conditional heteroskedasticity in the error term of a regression.

random walks

A time series in which the value of the series in one period is the value of the series in the previous period plus an unpredictable random error.

auto-regressive model (AR)

A time series regressed on its own past values, in which the independent variable is a lagged value of the dependent variable.
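A sketch of estimating an AR(1) model by ordinary least squares, regressing x_t on its lagged value x_{t-1}. The series is hypothetical and noise-free (generated by x_t = 1 + 0.5 x_{t-1}), so the estimates should recover the true coefficients.

```python
def fit_ar1(series):
    """OLS estimates of b0 and b1 in x_t = b0 + b1 * x_{t-1} + e_t."""
    y = series[1:]   # dependent variable: x_t
    x = series[:-1]  # independent variable: lagged value x_{t-1}
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    b1 = cov_xy / var_x
    b0 = mean_y - b1 * mean_x
    return b0, b1

series = [10.0]
for _ in range(10):
    series.append(1.0 + 0.5 * series[-1])
b0, b1 = fit_ar1(series)
print(round(b0, 6), round(b1, 6))  # 1.0 0.5
```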

unit root

A time series has a unit root when the coefficient on the lagged value in its AR(1) model equals 1, as in a random walk; such a series is not covariance stationary.

divisive clustering

A top-down hierarchical clustering method that starts with all observations belonging to a single large cluster. The observations are then divided into two clusters based on some measure of distance (similarity). The algorithm then progressively partitions the intermediate clusters into smaller ones until each cluster contains only 1 observation.

first-differencing

A transformation that subtracts the value of the time series in period t - 1 from its value in period t.
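A short sketch of first-differencing, including a simulated random walk (standard normal errors, an assumption for illustration): differencing the walk recovers its random error terms, which is why first-differencing is used to make a random walk covariance stationary.

```python
import random

def first_difference(series):
    """Return the first differences y_t = x_t - x_{t-1}."""
    return [curr - prev for prev, curr in zip(series[:-1], series[1:])]

print(first_difference([1.0, 3.0, 6.0, 10.0]))  # [2.0, 3.0, 4.0]

# Simulate a random walk x_t = x_{t-1} + e_t
rng = random.Random(0)
errors = [rng.gauss(0.0, 1.0) for _ in range(100)]
walk = [0.0]
for e in errors:
    walk.append(walk[-1] + e)

# Differencing recovers the random errors, up to floating-point rounding
diffs = first_difference(walk)
print(max(abs(d - e) for d, e in zip(diffs, errors)) < 1e-9)  # True
```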

linear trend

A trend in which the dependent variable changes at a constant rate with time.

composite variable

A variable that combines two or more variables that are statistically strongly related to each other.

independent variable

A variable used to explain the dependent variable in a regression; a right-hand-side variable in a regression equation.

eigenvectors

A vector that defines new mutually uncorrelated composite variables that are linear combinations of the original features.

deep learning

Algorithms based on complex neural networks, ones with many hidden layers (more than 3), that address highly complex tasks, such as image classification, face recognition, speech recognition, and natural language processing.

unconditional heteroskedasticity

Heteroskedasticity of the error term that is not correlated with the values of the independent variable(s) in the regression.

neural networks

Highly flexible machine learning algorithms that have been successfully applied to a variety of tasks characterized by non-linearities and interactions among features. They typically consist of three layers: input layer; hidden layer, where learning occurs; and output layer.

deep learning nets (DLN)

Algorithms based on complex neural networks, ones with many hidden layers (more than 3), that address highly complex tasks, such as image classification, face recognition, speech recognition, and natural language processing.

error autocorrelation

The autocorrelation of the error term.

features

The independent variables (X's) in a labeled dataset.

linear classification

A binary classifier that makes its classification decision based on a linear combination of the features of each data point.

agglomerative clustering

A bottom-up hierarchical clustering method that begins with each observation being treated as its own cluster. The algorithm finds the two closest clusters, based on some measure of distance (similarity), and combines them into 1 new larger cluster. This process is repeated iteratively until all observations are clumped into a single large cluster.

document term matrix (DTM)

A matrix where each row belongs to a document (or text file), and each column represents a token (or term). The number of rows is equal to the number of documents (or text files) in a sample text dataset. The number of columns is equal to the number of tokens from the BOW built using all the documents in the sample dataset. The cells typically contain the counts of the number of times a token is present in each document.
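A minimal sketch of building a bag-of-words and a document term matrix from three hypothetical documents. Lowercasing plus whitespace splitting stands in for a real tokenizer, and sorting the BOW alphabetically is an arbitrary choice for readability.

```python
from collections import Counter

def tokenize(text):
    """Simplified tokenizer: lowercase, then split on whitespace."""
    return text.lower().split()

def build_dtm(documents):
    tokenized = [tokenize(doc) for doc in documents]
    # BOW: the distinct tokens across all documents (word order is discarded)
    bow = sorted({tok for doc in tokenized for tok in doc})
    # One row per document, one column per BOW token; cells hold counts
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([counts[tok] for tok in bow])
    return bow, rows

docs = ["Profit rose", "Profit fell and costs rose", "costs rose"]
bow, dtm = build_dtm(docs)
print(bow)  # ['and', 'costs', 'fell', 'profit', 'rose']
for row in dtm:
    print(row)
# [0, 0, 0, 1, 1]
# [1, 1, 1, 1, 1]
# [0, 1, 0, 0, 1]
```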

adjusted R^2

A measure of goodness-of-fit of a regression that is adjusted for degrees of freedom and hence does not automatically increase when another independent variable is added to a regression.
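Adjusted R^2 is commonly computed as 1 - [(n - 1)/(n - k - 1)](1 - R^2), where n is the number of observations and k the number of independent variables. The sketch below uses hypothetical R^2 values to show that adding a variable can raise R^2 while lowering adjusted R^2.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a regression with n observations and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)

# Hypothetical: a fifth variable lifts R^2 from 0.600 to 0.605
print(round(adjusted_r2(0.60, n=50, k=4), 4))   # 0.5644
print(round(adjusted_r2(0.605, n=50, k=5), 4))  # 0.5601
```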

eigenvalue

A measure of the amount of variance in the initial dataset that is explained by each eigenvector; dividing an eigenvalue by the sum of all eigenvalues gives that eigenvector's proportion of total variance.
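A stdlib-only sketch for two hypothetical features: compute the sample covariance matrix, find its eigenvalues with the closed-form solution for a symmetric 2 x 2 matrix (an assumption that only works in two dimensions; larger problems need a linear algebra library), and divide each eigenvalue by their sum to get the proportion of total variance explained, the quantity a scree plot displays.

```python
import math

def covariance_2d(xs, ys):
    """Sample variances and covariance of two features."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, sxy, syy

def eigenvalues_sym_2x2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], largest first."""
    center = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return [center + radius, center - radius]

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0]
eigenvalues = eigenvalues_sym_2x2(*covariance_2d(xs, ys))
total = sum(eigenvalues)  # equals total variance, sxx + syy
proportions = [ev / total for ev in eigenvalues]
print(eigenvalues)  # [4.5, 0.5]
print(proportions)  # [0.9, 0.1]
```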

grid search

A method of systematically training a model by using various combinations of hyperparameter values, cross validating each model, and determining which combination of hyperparameter values ensures the best model performance.
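A minimal grid search sketch. The hyperparameter names `penalty` and `max_depth` and the function `validation_score` are hypothetical; `validation_score` stands in for training a model with the given values and returning its cross-validated performance.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Evaluate every combination of hyperparameter values and return the best."""
    best_params, best_score = None, float("-inf")
    names = list(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # placeholder for cross-validated performance
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def validation_score(params):
    # Hypothetical score surface that peaks at penalty = 0.1, max_depth = 3
    return 1.0 - (params["penalty"] - 0.1) ** 2 - 0.01 * (params["max_depth"] - 3) ** 2

grid = {"penalty": [0.01, 0.1, 1.0], "max_depth": [2, 3, 4]}
best, score = grid_search(grid, validation_score)
print(best, score)  # {'penalty': 0.1, 'max_depth': 3} 1.0
```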

hyperparameter

A parameter whose value must be set by the researcher before learning begins.

scree plots

A plot that shows the proportion of total variance in the data explained by each principal component.

feature engineering

A process of creating new features by changing or transforming existing features.

feature selection

A process whereby only pertinent features from the dataset are selected for model training. Selecting fewer features decreases model complexity and training time.

logistic regression (logit model)

A qualitative-dependent-variable multiple regression model based on the logistic probability distribution.
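A sketch of how a fitted logit model turns features into a probability, p = 1 / (1 + exp(-(b0 + b1 x1 + ... + bk xk))), with hypothetical coefficients. The log odds, ln[p / (1 - p)], recover the linear predictor.

```python
import math

def logit_probability(intercept, slopes, features):
    """Estimated P(Y = 1) under a logit model."""
    z = intercept + sum(b * x for b, x in zip(slopes, features))
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """ln(p / (1 - p)); for a logit model this equals the linear predictor."""
    return math.log(p / (1.0 - p))

# Hypothetical: z = -1.0 + 0.5*2.0 + 0.25*4.0 = 1.0
p = logit_probability(-1.0, [0.5, 0.25], [2.0, 4.0])
print(round(p, 4))            # 0.7311
print(round(log_odds(p), 4))  # 1.0
```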

probit regression (probit model)

A qualitative-dependent-variable multiple regression model based on the normal distribution.

multicollinearity

A regression assumption violation that occurs when two or more independent variables (or combinations of independent variables) are highly but not perfectly correlated with each other.

generalized least squares

A regression estimation technique that addresses heteroskedasticity of the error term.

log-log regression model

A regression that expresses the dependent and independent variables as natural logarithms.

penalized regression

A regression that includes a constraint such that the regression coefficients are chosen to minimize the sum of squared residuals plus a penalty term that increases in size with the number of included features.

pruning

A regularization technique used in CART to reduce the size of the classification or regression tree; pruning removes sections of the tree that provide little classifying power.

N-grams

A representation of word sequences. The length of a sequence varies from 1 to n. A one-word sequence is a unigram, a two-word sequence is a bigram, a three-word sequence is a trigram, and so on.
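A short sketch that extracts n-grams from an already tokenized, hypothetical sentence; representing each n-gram as a tuple is an illustrative choice.

```python
def ngrams(tokens, n):
    """Return all n-token sequences, in order, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the market rose sharply".split()
print(ngrams(tokens, 1))  # [('the',), ('market',), ('rose',), ('sharply',)]
print(ngrams(tokens, 2))  # [('the', 'market'), ('market', 'rose'), ('rose', 'sharply')]
print(ngrams(tokens, 3))  # [('the', 'market', 'rose'), ('market', 'rose', 'sharply')]
```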

regular expression (regex)

A sequence of characters, arranged in a particular order, that defines a search pattern. Regex is used to search for patterns of interest in a given text.
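An illustration using Python's standard `re` module on a hypothetical sentence. The two patterns (percentages, and four-digit numbers starting with 19 or 20 treated as years) are examples chosen for this sketch.

```python
import re

text = "Revenue rose 12.5% in 2021 and 8% in 2022."

# One or more digits, an optional decimal part, then a percent sign
percent_pattern = re.compile(r"\d+(?:\.\d+)?%")
print(percent_pattern.findall(text))  # ['12.5%', '8%']

# Whole four-digit numbers beginning with 19 or 20
year_pattern = re.compile(r"\b(?:19|20)\d{2}\b")
print(year_pattern.findall(text))  # ['2021', '2022']
```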

time series

A set of observations on a variable's outcomes in different time periods.

dimension reduction

A set of techniques for reducing the number of features in a dataset while retaining variation across observations to preserve the information contained in that variation.

application programming interface (API)

A set of well-defined methods of communication between various software components and typically used for accessing external data.

cluster

A subset of observations from a data set such that all the observations within the same cluster are deemed "similar."

K-nearest neighbor

A supervised learning technique that classifies a new observation by finding similarities ("nearness") between this new observation and the existing data.
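A minimal K-nearest neighbor sketch, assuming Euclidean distance as the measure of nearness and a majority vote among the k nearest labeled observations; the training data and labels are hypothetical.

```python
import math
from collections import Counter

def knn_classify(training, new_point, k):
    """Label new_point by majority vote of its k nearest neighbors.
    `training` is a list of (features, label) pairs."""
    by_distance = sorted(training, key=lambda obs: math.dist(obs[0], new_point))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

training = [
    ((1.0, 1.0), "low risk"),
    ((1.5, 2.0), "low risk"),
    ((2.0, 1.0), "low risk"),
    ((6.0, 6.0), "high risk"),
    ((7.0, 5.5), "high risk"),
]
print(knn_classify(training, (1.2, 1.4), k=3))  # low risk
print(knn_classify(training, (6.5, 6.0), k=3))  # high risk (2 votes to 1)
```

The choice of k is a hyperparameter: too small a k makes the classifier sensitive to noise, while too large a k blurs the boundary between classes.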

classification and regression tree (CART)

A supervised machine learning technique that can be applied to predict either a categorical target variable, producing a classification tree, or a continuous target variable, producing a regression tree. CART is commonly applied to binary classification or regression.

ceiling analysis

A systematic process of evaluating different components in the pipeline of model building. It helps to understand what part of the pipeline can potentially improve in performance by further tuning.

cross validation

A technique for estimating out-of-sample error directly by determining the error in validation samples.

k-fold cross-validation

A technique in which data (excluding the test sample and fresh data) are shuffled randomly and then divided into k equal sub-samples, with k - 1 sub-samples used as training samples and one sub-sample, the kth, used as a validation sample. The process is repeated k times so that each sub-sample serves once as the validation sample.
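A sketch of generating the k training/validation splits, assuming the test sample has already been set aside so only training and validation data are passed in. Assigning every kth shuffled observation to the same fold is one simple way to form near-equal folds.

```python
import random

def k_fold_splits(data, k, seed=None):
    """Shuffle data, divide it into k folds, and yield (training, validation)
    pairs; each fold serves exactly once as the validation sample."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [obs for j, fold in enumerate(folds) if j != i for obs in fold]
        yield training, validation

splits = list(k_fold_splits(range(10), k=5, seed=1))
for training, validation in splits:
    print(len(training), len(validation))  # 8 2, printed five times
```

Averaging the model's error over the k validation folds gives the cross-validated estimate of out-of-sample error.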

first-order serial correlation

Correlation between adjacent observations in a time series.

holdout samples

Data samples that are not used to train a model.

metadata

Data that describes and gives information about other data.

covariance stationary

Describes a time series when its expected value and variance are constant and finite in all periods and when its covariance with itself for a fixed number of periods in the past or future is constant and finite in all periods.

variance error

Describes how much a model's results change in response to new data from validation and test samples. Unstable models pick up noise and produce high variance error, causing overfitting and high out-of-sample error.

bias error

Describes the degree to which a model fits the training data. Algorithms with erroneous assumptions produce high bias error with poor approximation, causing underfitting and high in-sample error.

cointegrated

Describes two time series that have a long-term financial or economic relationship such that they do not diverge from each other without bound in the long run.

qualitative dependent variables

Dummy variables used as dependent variables rather than as independent variables.

common size statement

Financial statements in which all elements (accounts) are stated as a percentage of a key figure such as revenue for an income statement or total assets for a balance sheet.
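A small sketch that restates a hypothetical income statement in common-size form, with revenue as the key figure; the figures and item names are invented for illustration.

```python
def common_size(statement, base_item):
    """Express each line item as a percentage of the chosen key figure."""
    base = statement[base_item]
    return {item: 100.0 * value / base for item, value in statement.items()}

income_statement = {
    "revenue": 500.0,
    "cost of goods sold": 300.0,
    "gross profit": 200.0,
    "net income": 50.0,
}
print(common_size(income_statement, "revenue"))
# {'revenue': 100.0, 'cost of goods sold': 60.0, 'gross profit': 40.0, 'net income': 10.0}
```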

conditional heteroskedasticity

Heteroskedasticity in the error variance that is correlated with the values of the independent variable(s) in the regression.

LASSO

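Least absolute shrinkage and selection operator is a type of penalized regression in which the penalty term is the sum of the absolute values of the regression coefficients multiplied by a hyperparameter, lambda (λ); the coefficients are chosen to minimize the sum of squared residuals plus this penalty.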
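A sketch that evaluates the LASSO objective (sum of squared residuals plus lambda times the sum of absolute slope coefficients) for two hypothetical candidate fits. It only scores candidates; actually minimizing the objective requires an optimizer such as coordinate descent, which is not shown. Leaving the intercept unpenalized is a common convention assumed here.

```python
def lasso_objective(y, X, slopes, intercept, lam):
    """Sum of squared residuals plus lam * sum(|b_i|) over the slope coefficients."""
    ssr = 0.0
    for yi, xi in zip(y, X):
        fitted = intercept + sum(b * x for b, x in zip(slopes, xi))
        ssr += (yi - fitted) ** 2
    penalty = lam * sum(abs(b) for b in slopes)
    return ssr + penalty

y = [3.0, 5.0, 7.0]
X = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0]]
# Same intercept; the second candidate adds an unhelpful coefficient on feature 2
print(lasso_objective(y, X, [2.0, 0.0], 1.0, lam=0.5))  # 1.0  (SSR 0.0 + penalty 1.0)
print(lasso_objective(y, X, [2.0, 0.5], 1.0, lam=0.5))  # 1.5  (SSR 0.25 + penalty 1.25)
```

The penalty makes the extra coefficient costly, which is how LASSO pushes unhelpful coefficients toward zero and performs feature selection.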

reinforcement learning (RL)

Machine learning in which a computer learns from interacting with itself (or data generated by the same algorithm).

unsupervised learning

Machine learning that does not make use of labeled data.

mutual information

Measures how much information is contributed by a token to a class of texts. MI will be 0 if the token's distribution in all text classes is the same. MI approaches 1 as the token in any one class tends to occur more often in only that particular class of text.

base error

Model error due to randomness in the data.

web spidering (or scraping) projects

Programs that extract raw content from a source, typically web pages.

term frequency (TF)

Ratio of the number of times a given token occurs in all the texts in the dataset to the total number of tokens in the dataset.

linear regression

Regression that models the straight-line relationship between the dependent and independent variable(s).

positive serial correlation

Serial correlation in which a positive error for one observation increases the chance of a positive error for another observation, and a negative error for one observation increases the chance of a negative error for another observation.

readme files

Text files provided with raw data that contain information related to a data file. They are useful for understanding the data and how they can be interpreted correctly.

analysis of variance (ANOVA)

The analysis of the total variability of a dataset (such as observations on the dependent variable in a regression) into components representing different sources of variation; with reference to regression, ANOVA provides the inputs for an F-test of the significance of the regression as a whole.

n-period moving average

The average of the current and immediately prior n - 1 values of a time series.
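A short sketch of the n-period moving average on a hypothetical series; the first value is available only once n observations exist.

```python
def moving_average(series, n):
    """Mean of the current value and the immediately prior n - 1 values."""
    return [sum(series[t - n + 1 : t + 1]) / n for t in range(n - 1, len(series))]

print(moving_average([1, 2, 3, 4, 5, 6], 3))  # [2.0, 3.0, 4.0, 5.0]
```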

centroid

The center of a cluster formed using the K-means clustering algorithm.

kth order autocorrelation

The correlation between observations in a time series separated by k periods.
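A sketch of the sample kth-order autocorrelation, assuming the usual estimator that divides the sum of lag-k cross-products of deviations from the mean by the sum of squared deviations. A hypothetical alternating series shows strong negative first-order and positive second-order autocorrelation.

```python
def autocorrelation(series, k):
    """Sample kth-order autocorrelation of a time series."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    numer = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
    return numer / denom

alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print(autocorrelation(alternating, 1))  # -5/6, about -0.8333
print(autocorrelation(alternating, 2))  # 4/6, about 0.6667
```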

autocorrelations

The correlation of a time series with its own past values.

out-of-sample forecast errors

The differences between actual and predicted values of a time series outside the sample period used to fit the model.

tokenization

The process of splitting a given text into separate tokens. Tokenization can be performed at the word or character level but is most commonly performed at word level.

heteroskedasticity

The property of having a nonconstant variance; refers to an error term with the property that its variance differs across observations.

F1 score

The harmonic mean of precision and recall. F1 score is a more appropriate overall performance metric (than accuracy) when there is unequal class distribution in the dataset and it is necessary to measure the balance between precision and recall.
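A sketch computing precision, recall, accuracy, and F1 from hypothetical confusion-matrix counts with many more negatives than positives, illustrating why accuracy can look better than F1 when classes are unequal.

```python
def classification_metrics(tp, fp, tn, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, accuracy, f1

precision, recall, accuracy, f1 = classification_metrics(tp=80, fp=20, tn=890, fn=10)
print(f"precision={precision:.4f} recall={recall:.4f} accuracy={accuracy:.4f} F1={f1:.4f}")
# precision=0.8000 recall=0.8889 accuracy=0.9700 F1=0.8421
```

Accuracy is inflated by the 890 easy true negatives, while F1 reflects only performance on the positive class.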

regression coefficients

The intercept and slope coefficient(s) of a regression.

ground truth

The known outcome (i.e., target variable) of each observation in a labeled dataset.

ensemble method

The method of combining multiple learning algorithms, as in ensemble learning.

sentence length

The number of characters, including spaces, in a sentence.

document frequency (DF)

The number of documents (texts) that contain a particular token divided by the total number of documents. It is the simplest feature selection method and often performs well when many thousands of tokens are present.

collection frequency (CF)

The number of times a given word appears in the whole corpus (i.e., collection of sentences) divided by the total number of words in the corpus.
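A sketch contrasting collection frequency with document frequency on three hypothetical documents; simple lowercase-and-split tokenization is an assumption for illustration. A token repeated within one document raises its collection frequency but not its document frequency.

```python
from collections import Counter

def collection_frequency(documents):
    """Count of each token in the whole corpus divided by the total token count."""
    tokens = [tok for doc in documents for tok in doc.lower().split()]
    counts = Counter(tokens)
    return {tok: c / len(tokens) for tok, c in counts.items()}

def document_frequency(documents):
    """Share of documents that contain each token at least once."""
    doc_sets = [set(doc.lower().split()) for doc in documents]
    counts = Counter(tok for s in doc_sets for tok in s)
    return {tok: c / len(documents) for tok, c in counts.items()}

docs = ["profit rose", "profit fell profit", "costs rose"]
cf = collection_frequency(docs)
df = document_frequency(docs)
print(round(cf["profit"], 4), round(df["profit"], 4))  # 0.4286 0.6667  (3/7 and 2/3)
```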

accuracy

The percentage of correctly predicted classes out of total predictions. It is an overall performance metric in classification problems.

error term

The portion of the dependent variable that is not explained by the independent variable(s) in the regression.

data mining

The practice of determining a model by extensive searching through a dataset for statistically significant patterns.

exploratory data analysis (EDA)

The preliminary step in data exploration, where graphs, charts, and other visualizations (heat maps and word clouds) as well as quantitative methods (descriptive statistics and central tendency measures) are used to observe and summarize data.

parameter instability

The problem or issue of population regression parameters that have changed over time.

one hot encoding

The process by which categorical variables are converted into binary form (0 or 1) for machine reading. It is one of the most common methods for handling categorical features in text data.
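A minimal one-hot encoding sketch for a hypothetical categorical feature; ordering the categories alphabetically to fix the column order is an illustrative choice.

```python
def one_hot_encode(values):
    """Map each categorical value to a binary vector containing a single 1."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        encoded.append(row)
    return categories, encoded

cats, rows = one_hot_encode(["bond", "equity", "cash", "equity"])
print(cats)  # ['bond', 'cash', 'equity']
print(rows)  # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]]
```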

scaling

The process of adjusting the range of a feature by shifting and changing the scale of the data. Two of the most common ways of scaling are normalization and standardization.
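A sketch of the two scaling methods using Python's `statistics` module: normalization rescales values to [0, 1] using the minimum and maximum, and standardization centers on the mean and divides by the standard deviation (the population standard deviation is assumed here). The data are hypothetical.

```python
import statistics

def normalize(xs):
    """Min-max normalization: (x - min) / (max - min)."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """Standardization: (x - mean) / standard deviation."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, standard deviation 2
print(normalize(data))    # starts at 0.0, ends at 1.0
print(standardize(data))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Normalization is sensitive to outliers because they set the min and max; standardization is less so but assumes a meaningful mean and standard deviation.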

backward propagation

The process of adjusting weights in a neural network, to reduce total error of the network, by moving backward through the network's layers.

forward propagation

The process of passing input values forward through a neural network's layers (input, hidden, and output) to compute the network's output; the resulting error is then used in backward propagation to adjust the weights.

data preparation (cleansing)

The process of examining, identifying, and mitigating (i.e., cleansing) errors in raw data.

frequency analysis

The process of quantifying how important tokens are in a sentence and in the corpus as a whole. It helps in filtering unnecessary tokens (or features).

winsorization

The process of replacing extreme values and outliers in a dataset with the maximum (for large value outliers) and minimum (for small value outliers) values of data points that are not outliers.
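A sketch contrasting winsorization with trimming. It assumes outliers are defined by fixed lower and upper bounds supplied by the analyst (percentile cutoffs are another common choice); the data are hypothetical.

```python
def trim(data, lower, upper):
    """Trimming: drop observations outside [lower, upper]."""
    return [x for x in data if lower <= x <= upper]

def winsorize(data, lower, upper):
    """Winsorization: replace observations outside [lower, upper] with the
    smallest or largest observation that lies inside the bounds."""
    inliers = trim(data, lower, upper)
    lo, hi = min(inliers), max(inliers)
    return [hi if x > upper else lo if x < lower else x for x in data]

data = [-50.0, 1.0, 2.0, 3.0, 4.0, 100.0]
print(trim(data, 0.0, 10.0))       # [1.0, 2.0, 3.0, 4.0]
print(winsorize(data, 0.0, 10.0))  # [1.0, 1.0, 2.0, 3.0, 4.0, 4.0]
```

Trimming shrinks the sample, while winsorization keeps the sample size and caps the extremes.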

partial regression coefficients

The slope coefficients in a multiple regression. Also called partial slope coefficients.

clustering

The sorting of observations into groups (clusters) such that observations in the same cluster are more similar to each other than they are to observations in other clusters.

mean reversion

The tendency of a time series to fall when its level is above its mean and rise when its level is below its mean; a mean-reverting time series tends to return to its long-term mean.
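For an AR(1) model x_t = b0 + b1 x_{t-1} with |b1| < 1, the mean-reverting level is b0 / (1 - b1). The sketch uses hypothetical coefficients to show forecasts moving toward that level from above and below.

```python
def mean_reverting_level(b0, b1):
    """Long-run mean of a covariance-stationary AR(1) model."""
    if abs(b1) >= 1:
        raise ValueError("|b1| must be less than 1 for a finite mean-reverting level")
    return b0 / (1 - b1)

b0, b1 = 1.0, 0.5
level = mean_reverting_level(b0, b1)
print(level)  # 2.0
for x in (4.0, 0.0):
    # Above the level the forecast falls; below it the forecast rises
    print(x, "->", b0 + b1 * x)  # 4.0 -> 3.0, then 0.0 -> 1.0
```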

dependent variable

The variable whose variation about its mean is to be explained by the regression; the left-hand-side variable in a regression equation.

projection error

The perpendicular distance between a data point and a given principal component.

data wrangling (preprocessing)

This task performs transformations and critical processing steps on cleansed data to make the data ready for ML model training (i.e., preprocessing), and includes dealing with outliers, extracting useful variables from existing data points, and scaling the data.

overfitting

When a model fits the training data too well and so does not predict well using new data.

generalize

When a model retains its explanatory power when predicting out-of-sample (i.e., using new data).

nonstationarity

With reference to a random variable, the property of having characteristics such as mean and variance that are not constant through time.

estimated parameters

With reference to a regression analysis, the estimated values of the population intercept and population slope coefficient(s) in a regression.

fitted parameters

With reference to a regression analysis, the estimated values of the population intercept and population slope coefficient(s) in a regression.

serially correlated

With reference to regression errors, errors that are correlated across observations.

model specification

With reference to regression, the set of variables included in the regression and the regression equation's functional form.

heteroskedastic

With reference to the error term of regression, having a variance that differs across observations.
