Info MGMT Data Robot

Pataasin ang iyong marka sa homework at exams ngayon gamit ang Quizwiz!

Machine Learning

A field of study that gives computers the ability to learn without being explicitly programmed. The practice of using algorithms to parse data, learn from it, and then make a determination or prediction about something in the word.

LogLoss

A measure of accuracy. Rather than evaluating the model directly on whether it assigns cases (rows) to the correct "label", the model is evaluated based on probabilities generated by the model and their distance from the correct answer. Lower scores are better.

Holdout Set

A subsection of a dataset to provide a final estimate of the machine learning model's performance after if has been trained and validated. Holdout sets should never be used to make decisions about which algorithms to use for improving tuning algorithms.

Importance (green bars)

Alternating Conditional Expectations (ACE Score). Very loosely a correlation for non-linear.

Predictors

Are features responses or predictors?

Variable Type

Boolen, Categorical, Numeric, Text

Features

Can be thought of as the independent variables we will use to predict the target.

Blenders

Combination of models.

Index

Common way to talk about feature (i.e. feature #34).

Supervised ML

Data scientist tells the machine what it wants it learn (identifies target).

Feature Name

Directly from Flat File.

Model Diagnosis

Evaluation/ranking of the models.

Artificial Intelligence

Machines that can preform tasks that are characteristic of human intelligence.

Descriptive Stats

Mean, Standard Deviation, Median, Min, Max

Missing

Number of missing values.

Unique

Number of unique values.

Training Set

Subsection of a dataset from which the machine learning algorithm uncovers or "learns" relationships between the features and the target variable.

Validation (test) Set

Subsection of a dataset to which we apply the machine learning algorithm to see how accurately it identifies relationships between the known outcomes for the target variable and the dataset's other features.

Five Fold Cross Validation

The data set less the the holdout is split into five folds.

Over Training

The model simply memorizes the training examples and is not able to give correct outputs also for patterns that were not in the training dataset. Poor generalization.

Automated ML

The process of automating Machine Learning. Makes ML possible with out extensive math/stat/programming.

Target

The variable we are trying to predict and gain insights about.

Goal of Machine Learning

To build computational models with high prediction and generalization capabilities.

Unsupervised ML

Up to the machine to decide what it wants to learn.

Boolean Variables

Yes/No, True/False, 1/0 (binary)


Kaugnay na mga set ng pag-aaral

Chapter 1 What is a supply chain

View Set

Assessment, Diagnosis and Management of Common Neurological Problems

View Set

CHE 102: Chapter 3- Atoms and Elements

View Set

Week 2 - Introduction to Management

View Set

Social Media Marketing Ch.1-5 Terms

View Set

Subject Pronouns in Spanish- Andrew

View Set

Карпюк 9 клас, U1 "Who are you?", L1 "Vital statistics"

View Set

Inspirlang Unit 1 What is your name?

View Set