Machine learning basics


classification problem types

* Binary - The label contains two possible classes.
* Multiclass - The label contains three or more mutually exclusive classes.
* Multi-label - The label has two or more classes, and a case may be associated with more than one category (categories are not mutually exclusive).
* Imbalanced - The classes are far from equally distributed, e.g., a rare disease or fraud detection. Typically the label is binary.

regression problem

A type of supervised learning problem with the goal of predicting a continuous valued output.

binary classification problem

A classification problem with exactly two classes.

multiclass classification problem

A classification problem with more than two classes.

test set

A data set used to evaluate a machine learning model.

training set

A data set used to train a machine learning model.

box plot

A graph that displays the five-number summary of a set of data: the minimum, first quartile, median, third quartile, and maximum. The box spans the first quartile to the third quartile with a line at the median, and the whiskers on either side of the box extend to the minimum and maximum. A variation of this display plots outlier values as individual points.

clustering algorithm

A machine learning technique that groups data points based on relationships among the variables, so that members of each group are more similar to one another than to members of other groups.

supervised learning

A machine learning technique where the prediction of the value of the label (output) is based on knowledge of the input data. The features (inputs) provide the information to train the model. There is an assumed relationship between the features and the predicted output.

unsupervised learning

A machine learning technique where there is no assumed relationships among the features (inputs) and the process looks for previously undiscovered relationships.

violin plot

A method of displaying numerical data that shows the probability density of the data.

kernel density estimation

A method of estimating the probability density function of an unknown variable.
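
As a sketch of the idea, a Gaussian kernel density estimate places a small normal "bump" at each observation and averages them. The data points and bandwidth below are illustrative assumptions, not from any particular source.

```python
import numpy as np

# Minimal Gaussian kernel density estimate (illustrative sketch).
def kde(samples, xs, bandwidth):
    samples = np.asarray(samples, dtype=float)
    xs = np.asarray(xs, dtype=float)
    # Place a Gaussian bump at each sample and average the bumps.
    diffs = (xs[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * diffs**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

data = [1.0, 1.2, 2.8, 3.0, 3.1]       # made-up sample
grid = np.linspace(0, 4, 5)
density = kde(data, grid, bandwidth=0.5)  # hand-picked bandwidth
```

In practice the bandwidth controls smoothness and is usually chosen by a rule of thumb or cross-validation rather than by hand.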

linear regression

A method of finding the best model for a linear relationship between the explanatory variables (features or inputs) and response variable (output or label).
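
A minimal sketch of a least-squares fit with NumPy, using made-up data where the true relationship is assumed to be y = 2x + 1:

```python
import numpy as np

# Illustrative noiseless data; real data would include noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1

# Design matrix with an intercept column, solved by least squares.
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef
```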

dummy variable

A numeric variable with two possible values (typically 0 or 1) that represent the presence or absence of a particular category value.

artificial intelligence

A process using a variety of prediction techniques such as clustering, regression, classification, decision trees, deep learning, neural networks, etc. to address different kinds of prediction problems.

automation

A process where a machine takes over an entire task.

task

A sequence of decisions.

work flow

A sequence of tasks where each task has one or more decisions, and the jobs that have to be done may span several tasks. The goal is to turn inputs into outputs.

quantile

A set of cut points that divide a sample of data into groups containing (as far as possible) equal numbers of observations in a data set.

data set

A set of data. Also spelled as dataset.

quartile

A set of three cut points that divide a sample of data into groups containing (as far as possible) a quarter of the observations in a dataset. The lowest cut point is referred to as Q1, the middle cut point (the median) as Q2, and the highest as Q3.
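
As a sketch, the three quartile cut points can be computed with NumPy's percentile function on an illustrative data set:

```python
import numpy as np

# Nine evenly spaced observations (made-up example data).
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Q1, Q2 (the median), and Q3 are the 25th, 50th, and 75th percentiles.
q1, q2, q3 = np.percentile(data, [25, 50, 75])
```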

underfitting

A situation where a machine learning algorithm is too simple or has too few features and the data shows structure not captured by the model.

overfitting

A situation where a machine learning algorithm with too many features fits a particular set of data very well but does a poor job of fitting new examples (does not generalize well).

logistic regression

A statistical model that uses a logistic function to make a binary prediction when the label is categorical, for example, whether or not something is in a particular class. The model's linear combination of the features represents the logarithm of the odds that the dependent variable is in that class and can take any value from negative to positive infinity.

support vector machine

A supervised learning technique that allows an algorithm to deal with an infinite number of features (independent variables).

scatter plot

A type of plot or mathematical diagram using Cartesian coordinates to display values for two variables in a data set.

classification problem

A type of supervised learning problem with the goal of predicting a discrete valued output.

outlier

A value in a data set that lies below the lower quartile or above the upper quartile by more than 1.5 times the interquartile range.
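
The 1.5 * IQR rule above can be sketched in a few lines; the data set is an illustrative assumption with one planted outlier:

```python
import numpy as np

# Made-up data with an obvious outlier (30).
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only the values outside the whisker bounds.
outliers = data[(data < lower) | (data > upper)]
```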

false positive rate

Also called fall-out or false alarm ratio, this is a success metric for a machine learning classification model that measures what proportion of actual negative cases were falsely identified as positive. Computed using (False Positives)/(False Positives + True Negatives) or (False Positives)/(All Negative cases)

false negative rate

Also called miss rate, this is a success metric for a machine learning classification model that measures what proportion of actual positive cases were falsely identified as negative. Computed using (False Negatives)/(True Positives + False Negatives) or (False Negatives)/(All Positive cases)

precision

Also called positive predictive value, this is a success metric for a machine learning classification model that measures what proportion or fraction of cases identified as positive were correct. Computed using (True Positives)/(True Positives + False Positives)

true negative rate

Also called selectivity or specificity, this is a success metric for a machine learning classification model that measures what proportion or fraction of actual negative cases were identified as negative. Computed using (True Negatives)/(True Negatives + False Positives) or (True Negatives)/(All Negative cases)

true positive rate

Also called sensitivity or recall, this is a success metric for a machine learning classification model that measures what proportion or fraction of actual positive cases were identified as positive. Computed using (True Positives)/(True Positives + False Negatives) or (True Positives)/(All Positive cases)
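
The four rates above can all be computed from the same confusion-matrix counts; the tp/fp/tn/fn numbers below are assumed example values:

```python
# Illustrative confusion-matrix counts (made-up numbers).
tp, fp, tn, fn = 80, 10, 90, 20

tpr = tp / (tp + fn)   # true positive rate (sensitivity / recall)
tnr = tn / (tn + fp)   # true negative rate (specificity)
fpr = fp / (fp + tn)   # false positive rate (fall-out)
fnr = fn / (fn + tp)   # false negative rate (miss rate)
precision = tp / (tp + fp)  # positive predictive value
```

Note the complements: TPR + FNR = 1 over actual positives, and TNR + FPR = 1 over actual negatives.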

logit function

Also known as the logarithm of the odds, this is the value given by log(p/(1-p)).
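
A short sketch of the logit and its inverse, the logistic (sigmoid) function, which maps log-odds back to a probability:

```python
import math

def logit(p):
    # Log-odds: log(p / (1 - p)), defined for 0 < p < 1.
    return math.log(p / (1 - p))

def logistic(x):
    # Inverse of the logit: maps any real number back to (0, 1).
    return 1 / (1 + math.exp(-x))
```

At p = 0.5 the odds are even (p/(1-p) = 1), so the log-odds is 0.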

deep learning

An AI function that imitates the workings of the human brain in processing data and creating patterns for use in decision making (relies on "back propagation" or learning by example).

learning algorithm

An algorithm that outputs a particular hypothesis.

feature

An input or independent variable of a machine learning model used to help make a prediction. The inputs may be numerical or a discretely valued categorical variable.

F-score

Based on both precision and recall, this is a measure of the accuracy of a binary classification algorithm. It is defined as: 2*[(Precision)(Recall)] / [Precision + Recall], where Precision = (True Positives)/(True Positives + False Positives) and Recall = (True Positives)/(True Positives + False Negatives). The closer the value is to 1, the more accurate the test.
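
The F1 formula above works out as follows on assumed example counts (the same kind of made-up tp/fp/fn numbers used for the other metrics):

```python
# Illustrative counts.
tp, fp, fn = 80, 10, 20

precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
```

Algebraically this simplifies to 2*TP / (2*TP + FP + FN).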

unknown unknowns

Black swans: events unpredictable from past data (a bolt out of the blue, Napster, the impact of social media).

visualization options for numerical data

* Box plots
* Bar charts
* Kernel density estimate plots
* Violin plots

one hot encoding

Converting a categorical variable with N category levels to N binary variables (dummy variables). Only N-1 binary variables are actually needed, because the presence of the Nth level can be represented by the other N-1 variables all being zero.
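
A minimal sketch in plain Python, assuming an illustrative categorical feature with three color levels:

```python
# Assumed category levels for the example.
levels = ["red", "green", "blue"]

def one_hot(value, levels):
    # One binary indicator per level.
    return [1 if value == level else 0 for level in levels]

def dummies(value, levels):
    # Drop the last indicator: N-1 columns suffice, since the
    # dropped level is encoded as all zeros.
    return one_hot(value, levels)[:-1]
```

Libraries like pandas (`get_dummies` with `drop_first=True`) do the same thing on whole data frames.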

externality

Costs or liabilities that are borne by others beyond the decision makers or the group that benefits from the decision (bitcoin, acid rain).

reward function engineering

Determining the rewards to various actions given the prediction made by the AI (prediction algorithm). Could include whether to pass off the reward determination to human judgment.

judgment

How different outcomes and errors get compared, ranked, and otherwise evaluated.

satisficing

If lacking good prediction skills, make decisions that are good enough.

decision making

In artificial intelligence, the process of making a choice or finding a solution after obtaining the results of a prediction.

intelligence vs. artificial intelligence

Intelligence encompasses several things including prediction, judgment, action, evaluating results, and choosing data. Artificial intelligence has a much narrower focus on prediction.

known unknowns

Situations with sparse data, where algorithms predict badly (earthquakes, elections). In some areas (catching a ball, facial recognition) humans do well with sparse data, so these are areas where joint machine-human decisions work well.

elements of a decision

* Prediction - What you need to know (the prediction) to make a decision
* Judgment - How you value the different outcomes and errors
* Action - What you are trying to do
* Outcome - Metrics for success
* Input - What is needed to run the predictive algorithm
* Training - What is needed to train the algorithm
* Feedback - How to use the outcome to improve the algorithm

churn

Replacing lost customers with new ones or an outcome where a current customer becomes a former customer.

known knowns

Rich data for use in predictions (fraud detection, medical diagnosis)

accuracy

Success metric for a machine learning classification model that measures the overall percentage of correctly assigned cases or fraction of correctly identified cases. Computed using (True Positives + True Negatives)/(All Cases)
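
Using the same kind of assumed confusion-matrix counts as the rate metrics, accuracy is one line:

```python
# Illustrative confusion-matrix counts.
tp, fp, tn, fn = 80, 10, 90, 20

# Fraction of all cases that were classified correctly.
accuracy = (tp + tn) / (tp + fp + tn + fn)
```

On imbalanced data, accuracy can look high even for a useless model, which is why the per-class rates and the F-score are also used.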

intelligence

The ability to obtain useful information, where prediction is a key component.

label

The dependent variable or output of a supervised machine learning process.

interquartile range

The difference between the upper and lower quartile value of a numerical data set.

machine learning

The discipline of allowing a machine such as a computer to learn from data without being explicitly programmed to do so.

cost function

The magnitude of the error between the prediction made by the hypothesis of a learning algorithm and the actual value.

scaling

The mathematical transformation of numeric features to make the distribution look more like a standard normal distribution.
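
One common form of this is standard scaling (z-scores): subtract the mean and divide by the standard deviation. The feature values below are an illustrative assumption:

```python
import numpy as np

# Made-up numeric feature.
x = np.array([2.0, 4.0, 6.0, 8.0])

# After scaling, the feature has mean 0 and standard deviation 1.
scaled = (x - x.mean()) / x.std()
```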

AI winter

The period from the 1950s to at least the 1980s where the promise and potential of AI was not realized due in part to underestimating the difficulties of the problem and the limitations of mathematical models, software, and hardware of the era.

odds

The probability of success (p) divided by the probability of failure (1-p) and is given as p/(1-p).

prediction

The process of filling in missing information that includes taking information you have (data) and creating information that you don't have. The ability to see otherwise hidden information, whether it be in the past, present, or future. A key component of intelligence and artificial intelligence.

data cleaning

The process of reviewing a set of data to detect and either correct or remove data and other information that is corrupt, incomplete, irrelevant, inappropriate, inaccurate, or unnecessary.

data wrangling

The process of transforming and mapping data from one "raw" data form into another format with the intent of making it more appropriate and valuable for other purposes such as analytics. Also known as data munging. Also distinct from data cleaning.

feature engineering

The process of using domain knowledge of the data to create features that make machine learning algorithms work.

training

The process of using past data (inputs and outputs) to develop a decision algorithm.

reward function

The relative rewards and penalties associated with taking particular actions that produce particular outcomes.

prediction data types

Three types of data:
1. Input - The independent variables used to predict the outcome of the prediction
2. Training - The known input and output data used to develop the decision process or algorithm
3. Feedback - Data, including past outcomes, used to improve the algorithm

unknown knowns

Wrong predictions made with high confidence, where neither the human nor the machine understands the underlying decision process that generated the data. Issues include reversing the causal sequence (causal inference), observing an action and thinking it leads to a particular outcome when the action only happens once the outcome is already assured (a grandmaster sacrificing the queen), or misunderstanding the relationship between demand, supply, and prices. It may be possible to counter these unknowns by modeling how the data gets generated.

