Info MGMT Data Robot
Machine Learning
A field of study that gives computers the ability to learn without being explicitly programmed. In practice, it means using algorithms to parse data, learn from it, and then make a determination or prediction about something in the world.
LogLoss
A measure of accuracy. Rather than evaluating the model directly on whether it assigns cases (rows) to the correct "label", the model is evaluated based on probabilities generated by the model and their distance from the correct answer. Lower scores are better.
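To make the idea concrete, here is a minimal sketch of binary log loss in plain Python. The function name `log_loss`, the clipping constant `eps`, and the sample labels and probabilities are illustrative assumptions, not values from the course or from DataRobot.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Average binary log loss; lower is better."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities so log(0) never occurs.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

labels = [1, 0, 1, 0]                # made-up true labels
confident = [0.9, 0.1, 0.8, 0.2]     # probabilities close to the true labels
hesitant = [0.6, 0.4, 0.6, 0.4]      # right side of 0.5, but less sure

confident_score = log_loss(labels, confident)
hesitant_score = log_loss(labels, hesitant)
print(confident_score, hesitant_score)
```

Both sets of probabilities would assign every row to the correct label at a 0.5 cutoff, yet the hesitant model gets the higher (worse) score because its probabilities sit farther from the correct answers.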
Holdout Set
A subsection of a dataset used to provide a final estimate of the machine learning model's performance after it has been trained and validated. Holdout sets should never be used to make decisions about which algorithms to use or for improving or tuning algorithms.
Importance (green bars)
Alternating Conditional Expectations (ACE) score. Very loosely, a correlation measure that also captures non-linear relationships.
Predictors
Are features responses or predictors?
Variable Type
Boolean, Categorical, Numeric, Text
Features
Can be thought of as the independent variables we will use to predict the target.
Blenders
Combination of models.
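One simple way to combine models is to average their predicted probabilities row by row. The sketch below assumes that approach; the function name `average_blend` and the sample predictions are hypothetical, and a real blender may combine models differently.

```python
def average_blend(*model_predictions):
    """Average the predictions of several models, row by row."""
    return [sum(row) / len(row) for row in zip(*model_predictions)]

# Made-up predicted probabilities from two models for the same three rows.
model_a = [0.9, 0.2, 0.6]
model_b = [0.7, 0.4, 0.2]

blended = average_blend(model_a, model_b)
print(blended)
```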
Index
A common way to refer to a feature (e.g., feature #34).
Supervised ML
The data scientist tells the machine what to learn by identifying the target.
Feature Name
Taken directly from the flat file.
Model Diagnosis
Evaluation/ranking of the models.
Artificial Intelligence
Machines that can perform tasks that are characteristic of human intelligence.
Descriptive Stats
Mean, Standard Deviation, Median, Min, Max
Missing
Number of missing values.
Unique
Number of unique values.
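The Descriptive Stats, Missing, and Unique summaries can be sketched for a single numeric feature with Python's standard library. The function name `summarize`, the use of `None` to mark a missing value, the choice of sample standard deviation, and the sample ages are all assumptions made for illustration.

```python
import statistics

def summarize(values):
    """Summarize one numeric feature; None marks a missing value."""
    present = [v for v in values if v is not None]
    return {
        "mean": statistics.mean(present),
        "std_dev": statistics.stdev(present),  # sample standard deviation
        "median": statistics.median(present),
        "min": min(present),
        "max": max(present),
        "missing": len(values) - len(present),
        "unique": len(set(present)),
    }

ages = [34, 45, None, 29, 45, 52, None, 29]  # made-up feature values
summary = summarize(ages)
print(summary)
```

Here the missing values are counted but left out of the other statistics, which is one reasonable convention rather than the only one.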
Training Set
Subsection of a dataset from which the machine learning algorithm uncovers or "learns" relationships between the features and the target variable.
Validation (test) Set
Subsection of a dataset to which we apply the machine learning algorithm to see how accurately it identifies relationships between the known outcomes for the target variable and the dataset's other features.
Five Fold Cross Validation
The dataset, less the holdout, is split into five folds. Each fold serves once as the validation set while the other four are used for training.
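A rough sketch of how a holdout and five folds might be carved out of a dataset. The 20% holdout fraction, the fixed random seed, and the function name `split_holdout_and_folds` are assumptions for illustration, not settings taken from the course.

```python
import random

def split_holdout_and_folds(rows, holdout_fraction=0.2, n_folds=5, seed=0):
    """Set aside a holdout, then deal the remaining rows into folds."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)          # shuffle a copy, reproducibly
    n_holdout = int(len(rows) * holdout_fraction)
    holdout = rows[:n_holdout]
    remaining = rows[n_holdout:]
    # Every n_folds-th row goes to the same fold, so fold sizes stay even.
    folds = [remaining[i::n_folds] for i in range(n_folds)]
    return holdout, folds

holdout, folds = split_holdout_and_folds(range(100))

for i, validation in enumerate(folds):
    # Train on the other four folds, validate on the current one.
    training = [row for j, fold in enumerate(folds) if j != i for row in fold]
    print(f"fold {i}: train={len(training)} validation={len(validation)}")
```

The holdout rows never appear in any fold, so they stay untouched until the final performance estimate.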
Over Training
The model simply memorizes the training examples and cannot give correct outputs for patterns that were not in the training dataset. Poor generalization.
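A toy illustration of memorizing versus generalizing. The data, the lookup-table "model", and the threshold model are all made up for this sketch; real overtrained models are subtler, but the symptom is the same: a high training score and a poor validation score.

```python
def accuracy(model, rows):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(x) == y for x, y in rows) / len(rows)

# Made-up data: the true pattern is "label is 1 when x is 5 or more".
train = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
validation = [(0, 0), (4, 0), (6, 1), (10, 1)]

# Overtrained "model": a lookup table of the training rows.
memory = dict(train)

def memorizer(x):
    return memory.get(x, 0)  # guesses 0 for anything it has not seen

# Simpler model: one threshold halfway between the two classes.
threshold = (max(x for x, y in train if y == 0)
             + min(x for x, y in train if y == 1)) / 2

def threshold_model(x):
    return 1 if x >= threshold else 0

print("memorizer:", accuracy(memorizer, train), accuracy(memorizer, validation))
print("threshold:", accuracy(threshold_model, train), accuracy(threshold_model, validation))
```

The memorizer is perfect on the rows it has seen but only right half the time on new rows, while the threshold model carries over to the validation rows.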
Automated ML
The process of automating machine learning. Makes ML possible without extensive math, statistics, or programming expertise.
Target
The variable we are trying to predict and gain insights about.
Goal of Machine Learning
To build computational models with high prediction and generalization capabilities.
Unsupervised ML
It is up to the machine to decide what to learn; no target is identified.
Boolean Variables
Yes/No, True/False, 1/0 (binary)