Machine learning basics
classification problem types
* Binary - The label contains two possible classes.
* Multiclass - The label contains three or more possible mutually exclusive classes.
* Multi-label - The label has two or more classes and a case may be associated with more than one category (the categories are not mutually exclusive).
* Imbalanced - The classes are not equally distributed, e.g., a rare disease or fraud detection. Typically the label is binary.
regression problem
A type of supervised learning problem with the goal of predicting a continuous valued output.
binary classification problem
A classification problem with exactly two classes.
multiclass classification problem
A classification problem with more than two classes.
test set
A data set used to evaluate a machine learning model.
training set
A data set used to train a machine learning model.
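As an illustration of how training and test sets are produced in practice, here is a minimal sketch assuming scikit-learn and NumPy are available; the feature and label arrays are invented for this example.

```python
# Split a data set into a training set (to fit a model) and a test set (to evaluate it).
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # illustrative features (inputs)
y = np.array([0, 1] * 5)           # illustrative labels (outputs)

# Hold out 30% of the cases for evaluation; random_state fixes the shuffle for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
```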
box plot
A graph that displays a five-number summary of a set of data: the minimum, first quartile, median, third quartile, and maximum. The box portion of the display spans the first quartile to the third quartile with a line at the median, and the whiskers on either side of the box extend to the minimum and maximum. A variation of this kind of display plots outlier values as individual points.
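A short sketch of the five-number summary behind a box plot, using NumPy percentiles; the sample values are invented and the plotting call is optional.

```python
# Compute the five-number summary that a box plot displays.
import numpy as np

data = np.array([2, 4, 4, 5, 7, 8, 9, 11, 12, 30])   # illustrative values
q1, median, q3 = np.percentile(data, [25, 50, 75])
print(data.min(), q1, median, q3, data.max())         # min, Q1, median, Q3, max
# To draw the plot itself (requires matplotlib):
# import matplotlib.pyplot as plt
# plt.boxplot(data); plt.show()
```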
clustering algorithm
A machine learning technique used to group data points based on relationships among the variables, where members of each group are more similar to each other than to members of other groups.
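A minimal clustering sketch using k-means, assuming scikit-learn; the points and the choice of two clusters are arbitrary.

```python
# Group unlabeled points into clusters based only on their similarity.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],    # one tight group
                   [8.0, 8.0], [8.1, 7.9], [7.9, 8.2]])   # another tight group
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)           # cluster assignment for each point
print(kmeans.cluster_centers_)  # approximate centers of the two groups
```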
supervised learning
A machine learning technique where the value of the label (output) is predicted based on knowledge of the input data. The features (inputs) provide the information used to train the model. There is an assumed relationship between the features and the predicted output.
unsupervised learning
A machine learning technique where there are no assumed relationships among the features (inputs) and the process looks for previously undiscovered relationships.
violin plot
A method of displaying numerical data that shows the probability density of the data.
kernel density estimation
A method of estimating the probability density function of an unknown variable.
linear regression
A method of finding the best model for a linear relationship between the explanatory variables (features or inputs) and response variable (output or label).
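A minimal linear regression sketch, assuming scikit-learn; the single feature and the response values are invented so that the underlying slope is roughly 2.

```python
# Fit a straight line relating an explanatory variable to a continuous response.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])      # feature (input)
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])     # label (continuous output)

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)         # slope near 2, intercept near 0
print(model.predict(np.array([[6]])))        # prediction for a new input
```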
dummy variable
A numeric variable with two possible values (typically 0 or 1) that represent the presence or absence of a particular category value.
artificial intelligence
A process using a variety of prediction techniques such as clustering, regression, classification, decision trees, deep learning, neural networks, etc. to address different kinds of prediction problems.
automation
A process where a machine takes over an entire task.
task
A sequence of decisions.
work flow
A sequence of tasks where each task has one or more decisions, and the jobs that have to be done may span several tasks. The goal is to turn inputs into outputs.
quantile
A set of cut points that divide a sample of data into groups containing (as far as possible) equal numbers of observations in a data set.
data set
A set of data. Also spelled as dataset.
quartile
A set of three cut points that divide a sample of data into groups containing (as far as possible) a quarter of the observations in a dataset. The lowest cut point is referred to as Q1, the middle cut point (the median) as Q2, and the highest as Q3.
underfitting
A situation where a machine learning algorithm is too simple or has too few features and the data shows structure not captured by the model.
overfitting
A situation where a machine learning algorithm with too many features fits a particular set of data very well but does a poor job of fitting new examples (does not generalize well).
logistic regression
A statistical model that uses a logistic function to make a binary prediction when the label is categorical, for example, whether or not something is in a particular class. Internally the model works with the logarithm of the odds that a case is in that class, which can take any value from negative to positive infinity.
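A minimal logistic regression sketch, assuming scikit-learn; the feature values and binary labels are invented.

```python
# Predict a binary class; the model works on the log-odds (logit) scale internally.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])  # single feature
y = np.array([0, 0, 0, 1, 1, 1])                          # binary label

model = LogisticRegression().fit(X, y)
print(model.predict(np.array([[2.0]])))         # predicted class (0 or 1)
print(model.predict_proba(np.array([[2.0]])))   # probability of each class
```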
support vector machine
A supervised learning technique that allows an algorithm to deal with an infinite number of features (independent variables).
scatter plot
A type of plot or mathematical diagram using Cartesian coordinates to display values for two variables in a data set.
classification problem
A type of supervised learning problem with the goal of predicting a discrete valued output.
outlier
A value in a data set that is less than the lower quartile value or greater than the upper quartile value by more than 1.5 times the interquartile range.
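A small sketch of the 1.5 x IQR rule described above, using NumPy; the data values are invented so that one point is an obvious outlier.

```python
# Flag outliers that fall more than 1.5 * IQR outside the quartiles.
import numpy as np

data = np.array([10, 11, 12, 12, 12, 13, 13, 14, 15, 102])   # illustrative values
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1                                   # interquartile range
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # outlier fences
print(data[(data < lower) | (data > upper)])    # [102] for this sample
```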
false positive rate
Also called fall-out or false alarm ratio, this is a success metric for a machine learning classification model that measures what proportion of actual negative cases were falsely identified as positive. Computed using (False Positives)/(False Positives + True Negatives) or (False Positives)/(All Negative cases).
false negative rate
Also called miss rate, this is a success metric for a machine learning classification model that measures what proportion of actual positive cases were falsely identified as negative. Computed using (False Negatives)/(True Positives + False Negatives) or (False Negatives)/(All Positive cases).
precision
Also called positive predictive value, this is a success metric for a machine learning classification model that measures what proportion or fraction of cases identified as positive were correct. Computed using (True Positives)/(True Positives + False Positives).
true negative rate
Also called selectivity or specificity, this is a success metric for a machine learning classification model that measures what proportion or fraction of actual negative cases were identified as negative. Computed using (True Negatives)/(True Negatives + False Positives) or (True Negatives)/(All Negative cases).
true positive rate
Also called sensitivity or recall, this is a success metric for a machine learning classification model that measures what proportion or fraction of actual positive cases were identified as positive. Computed using (True Positives)/(True Positives + False Negatives) or (True Positives)/(All Positive cases).
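To tie the rate definitions above together, here is a small sketch that computes them from a confusion matrix, assuming scikit-learn; the actual and predicted label vectors are made up.

```python
# Compute the classification rates defined above from actual vs. predicted labels.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])   # actual labels (illustrative)
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 1])   # model predictions (illustrative)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
tpr = tp / (tp + fn)          # true positive rate (sensitivity, recall)
tnr = tn / (tn + fp)          # true negative rate (specificity)
fpr = fp / (fp + tn)          # false positive rate (fall-out)
fnr = fn / (fn + tp)          # false negative rate (miss rate)
precision = tp / (tp + fp)    # positive predictive value
print(tpr, tnr, fpr, fnr, precision)
```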
logit function
Also known as the logarithm of the odds, this is the value given by log(p/(1-p)).
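A quick numeric sketch of the logit, assuming the natural logarithm; the probability value is arbitrary.

```python
# The logit (log-odds) of a probability p, per the definition above.
import math

p = 0.8                  # illustrative probability of success
odds = p / (1 - p)       # 4.0
logit = math.log(odds)   # log(p/(1-p)), about 1.386
print(odds, logit)
```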
deep learning
An AI function that imitates the workings of the human brain in processing data and creating patterns for use in decision making (relies on "backpropagation" or learning by example).
learning algorithm
An algorithm that outputs a particular hypothesis.
feature
An input or independent variable of a machine learning model used to help make a prediction. The inputs may be numerical or discretely valued categorical variables.
F-score
Based on both precision and recall, this is a measure of the accuracy of a binary classification algorithm. It is defined as 2*[(Precision)(Recall)] / [Precision + Recall], where Precision = (True Positives)/(True Positives + False Positives) and Recall = (True Positives)/(True Positives + False Negatives). The closer the value is to 1, the more accurate the test.
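A tiny sketch of the F-score formula, reusing precision and recall values like those from the confusion-matrix example above; the numbers are illustrative.

```python
# F-score (F1) combines precision and recall into a single number.
precision = 0.6    # e.g., 3 true positives out of 5 predicted positives
recall = 0.75      # e.g., 3 true positives out of 4 actual positives

f1 = 2 * (precision * recall) / (precision + recall)
print(f1)          # about 0.67; values closer to 1 indicate a more accurate classifier
```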
unknown unknowns
Black swans: events unpredictable from past data (a bolt out of the blue, Napster, the impact of social media).
visualization options for numerical data
* Box plots
* Bar charts
* Kernel density estimate plots
* Violin plots
one hot encoding
Converting a categorical variable with N category levels to binary variables (dummy variables). Only N-1 binary variables are needed because the presence of the Nth level can be represented by the N-1 variables all being zero.
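A short one hot encoding sketch, assuming pandas; the column name and category values are invented.

```python
# One hot encode a categorical column into dummy variables, dropping one level.
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})   # illustrative data
dummies = pd.get_dummies(df["color"], prefix="color", drop_first=True)
print(dummies)
# With drop_first=True, N category levels become N-1 binary columns;
# a row of all zeros represents the dropped (Nth) level.
```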
externality
Costs or liabilities that are borne by others beyond the decision makers or the group that benefits from the decision (bitcoin, acid rain).
reward function engineering
Determining the rewards to various actions given the prediction made by the AI (prediction algorithm). Could include whether to pass the reward determination off to human judgment.
judgment
How different outcomes and errors get compared, ranked, and otherwise evaluated.
satisficing
Making decisions that are good enough when good prediction is lacking.
decision making
In artificial intelligence, the process of making a choice or finding a solution after obtaining the results of a prediction.
intelligence vs. artificial intelligence
Intelligence encompasses several things including prediction, judgement, action, evaluating results, and choosing data. Artificial intelligence has a much more narrow focus on prediction.
known unknowns
Situations with sparse data where algorithms make bad predictions (earthquakes, elections); but in some areas (catching a ball, facial recognition) humans do well with sparse data, so this is an area where joint machine-human decisions work well.
elements of a decision
* Prediction - What you need to know (the prediction) to make a decision
* Judgment - How do you value the different outcomes and errors
* Action - What you are trying to do
* Outcome - Metrics for success
* Input - What is needed to run the predictive algorithm
* Training - What is needed to train the algorithm
* Feedback - How to use the outcome to improve the algorithm
churn
Replacing lost customers with new ones or an outcome where a current customer becomes a former customer.
known knowns
Rich data for use in predictions (fraud detection, medical diagnosis).
accuracy
Success metric for a machine learning classification model that measures the overall percentage of correctly assigned cases or fraction of correctly identified cases. Computed using (True Positives + True Negatives)/(All Cases).
intelligence
The ability to obtain useful information, where prediction is a key component.
label
The dependent variable or output of a supervised machine learning process.
interquartile range
The difference between the upper and lower quartile value of a numerical data set.
machine learning
The discipline of allowing a machine such as a computer to learn from data without being explicitly programmed to do so.
cost function
The magnitude of the error between the predictions of a learning algorithm's hypothesis and the actual values.
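The source does not name a specific cost function, so as one common example here is mean squared error, sketched with NumPy; the actual and predicted values are invented.

```python
# Mean squared error: a common cost function measuring the gap between
# the hypothesis's predictions and the actual values.
import numpy as np

y_actual = np.array([3.0, 5.0, 7.0, 9.0])       # observed outputs (illustrative)
y_predicted = np.array([2.8, 5.4, 6.9, 9.3])    # hypothesis predictions (illustrative)

mse = np.mean((y_predicted - y_actual) ** 2)
print(mse)    # 0.075 for these values
```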
scaling
The mathematical transformation of numeric features to make the distribution look more like a standard normal distribution.
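A minimal scaling sketch using standardization, assuming scikit-learn; the feature matrix is invented to show two columns on very different scales.

```python
# Standardize numeric features to zero mean and unit variance.
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])                 # illustrative features on different scales
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0))                 # approximately [0, 0]
print(X_scaled.std(axis=0))                  # approximately [1, 1]
```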
AI winter
The period from the 1950s to at least the 1980s where the promise and potential of AI was not realized due in part to underestimating the difficulties of the problem and the limitations of mathematical models, software, and hardware of the era.
odds
The probability of success (p) divided by the probability of failure (1-p), given as p/(1-p).
prediction
The process of filling in missing information: taking information you have (data) and creating information that you don't have. The ability to see otherwise hidden information, whether in the past, present, or future. A key component of intelligence and artificial intelligence.
data cleaning
The process of reviewing a set of data to detect and either correct or remove data and other information that is corrupt, incomplete, irrelevant, inappropriate, inaccurate, or unnecessary.
data wrangling
The process of transforming and mapping data from one "raw" data form into another format with the intent of making it more appropriate and valuable for other purposes such as analytics. Also known as data munging. Also distinct from data cleaning.
feature engineering
The process of using domain knowledge of the data to create features that make machine learning algorithms work.
training
The process of using past data (inputs and outputs) to develop a decision algorithm.
reward function
The relative rewards and penalties associated with taking particular actions that produce particular outcomes.
prediction data types
Three types of data:
1. Input - The independent variables used to make the prediction
2. Training - The known input and output data used to develop the decision process or algorithm
3. Feedback - Data, including past outcomes, used to improve the algorithm
unknown knowns
Wrong predictions made with high confidence, where the human or the machine does not understand the underlying decision process that generated the data. Among other issues is reversing the causal sequence (causal inference): observing an action and thinking it leads to a particular outcome when the action only happens once the outcome is already assured (a grandmaster sacrificing the queen), or misunderstanding the relationship between demand, supply, and prices. It may be possible to counter unknown knowns by modeling how the data gets generated.