Artificial Intelligence Chapter 11
Label
data that has been annotated by a human to show what is to be learnt by the algorithm
Principal Component Analysis (PCA)
dimensionality reduction technique which reduces the number of dimensions of our data to a small number that best describes its core features
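A minimal PCA sketch, assuming scikit-learn (the cards do not name a library):

    # Reduce 10-dimensional data to the 2 components that best describe it.
    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.rand(100, 10)           # 100 samples, 10 original dimensions
    pca = PCA(n_components=2)             # keep the 2 principal components
    X_reduced = pca.fit_transform(X)      # project onto the principal components
    print(X_reduced.shape)                # (100, 2)
    print(pca.explained_variance_ratio_)  # variance captured by each component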
The goal of the GAN
generating fake samples to the extent that D cannot distinguish the samples produced by G from real data
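For reference, the standard minimax objective behind this goal (the usual GAN formulation; the card itself does not state the formula):

    \min_G \max_D V(D,G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]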
Clustering
grouping similar samples into the same group
The quality of selected features
has a direct impact on the quality of models, as models learn using informative features in order to arrive at a final prediction
The Issues of Machine Learning for Big Data
learning from large-scale data, learning from different data types, learning from high-speed streaming data, learning from uncertain and incomplete data, and learning from data with low value density and diverse meanings
Inductive Learning
learning from observation and prior knowledge by generalizing rules and conclusions
Transfer learning
learning of new tasks relies on previously learned tasks
Machine learning problems
- Unknown data generation process
- Only a given training set 𝕏, 𝕐 can be used to approximate a prediction model or generative model
Machine Learning Process Stages
1. Classify the problem
2. Acquire data
3. Process data
4. Model the problem
5. Validate and execute
6. Deploy
needed
A nonlinear model is __________
Generative
Learn a generative model
supervisor
The labels serve as a "___________" to the algorithm, teaching it during training by providing information on which samples it got correct or wrong
model training
The process of defining the predictor function
Classification
a supervised learning technique that defines a decision boundary so that there is a clear separation between the output classes
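A minimal classification sketch; the dataset and classifier choice (logistic regression via scikit-learn) are assumptions for illustration:

    # Fit a linear decision boundary on a toy 2-D dataset.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=200, n_features=2,
                               n_informative=2, n_redundant=0, random_state=0)
    clf = LogisticRegression().fit(X, y)  # learns the decision boundary
    print(clf.predict(X[:5]))             # predicted classes for the first samples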
Big Data processing
the process of cleaning, filtering, and organizing the data for successful mining and modeling, by solving or avoiding problems in the data
Dimensionality reduction
the process of reducing the number of random variables under consideration, by obtaining a set of principal variables
Feature engineering
the process of using domain knowledge of the data to create features that make machine learning algorithms work
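A tiny illustration with pandas; the column names and the derived feature are hypothetical:

    # Use domain knowledge (price depends on area per room) to derive a feature.
    import pandas as pd

    df = pd.DataFrame({"area_m2": [50, 80, 120], "n_rooms": [2, 3, 4]})
    df["area_per_room"] = df["area_m2"] / df["n_rooms"]  # engineered feature
    print(df)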
A good approach
to always choose an optimal number of features, not too many and not too few
The goal of supervised machine learning
to develop a finely tuned predictor function, h(x), called the hypothesis
The capacity of the first model
too small to fit the data sets, so it creates large errors (underfitting)
Test set
used only to assess the performance of a classifier. It is never used during the training process, so the error on the test set provides an unbiased estimate of the generalization error
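A common way to hold out such a test set, sketched with scikit-learn (the library choice is an assumption):

    # Split the data so the test portion is never seen during training.
    import numpy as np
    from sklearn.model_selection import train_test_split

    X, y = np.arange(100).reshape(50, 2), np.arange(50)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)  # 20% held out for final evaluation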
A predictor function
used to predict the outcome of the dependent variable
Generative Model Motivation
Generative models (in general) cope with all of the above:
- Can model P(X)
- Can generate new images
Hypothesis
a certain function that we believe (or hope) is similar to the true function, the target function that we want to model.
A feature
a characteristic of an observed data point in a dataset
Representation Learning
A set of techniques that allows a system to automatically discover the representations needed for feature detection or classification from raw data; Replaces manual feature engineering and allows a machine to both learn the features and use them to perform a specific task
the training data set
Adopting the 12th model approximates _____________________ almost perfectly, but if you anticipate "new" data, there's a big problem (overfitting)
selecting a model
After ______________ with a sufficiently large capacity, various regularization techniques are applied so that the selected model does not overfit
Supervised learning
All training samples have label information; the data mining task of inferring a function from labeled training data
Spatial transformation
Converting the original feature space into a low-dimensional or high-dimensional space
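One concrete instance of mapping into a higher-dimensional space is a polynomial feature map; a sketch with scikit-learn (the choice of map is an assumption):

    # Map a 2-D feature space into a 6-D polynomial space.
    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures

    X = np.array([[1.0, 2.0], [3.0, 4.0]])  # original 2-D features
    phi = PolynomialFeatures(degree=2)      # x -> (1, x1, x2, x1^2, x1*x2, x2^2)
    X_high = phi.fit_transform(X)           # now 6-dimensional
    print(X_high.shape)                     # (2, 6)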
neural networks
Deep learning finds a hierarchical feature space from low level (dots, lines, edges) to high level (faces) by using _____________ with multiple hidden layers
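A minimal sketch of a network with multiple hidden layers, using scikit-learn's MLPClassifier (an assumption; dedicated deep learning frameworks are more typical):

    # Three hidden layers stacked between input and output.
    from sklearn.datasets import make_classification
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=300, random_state=0)
    mlp = MLPClassifier(hidden_layer_sizes=(64, 32, 16),  # multiple hidden layers
                        max_iter=500, random_state=0).fit(X, y)
    print(mlp.score(X, y))  # training accuracy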
Density estimation
Estimation of probability distribution from data
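A minimal sketch of kernel density estimation, one common density-estimation method (the bandwidth is an illustrative choice):

    # Estimate a probability distribution from samples.
    import numpy as np
    from sklearn.neighbors import KernelDensity

    samples = np.random.normal(0, 1, size=(500, 1))  # data from an unknown density
    kde = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(samples)
    log_p = kde.score_samples(np.array([[0.0], [2.0]]))  # log-density at query points
    print(np.exp(log_p))                                 # estimated densities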
methods
Feature Engineering __________ allow us to choose the right representation to train models
application of machine learning
Feature engineering is fundamental to the ______________, and is both difficult and expensive
measurable
Features are usually ______________ and represent a specific axis of explanation for the data
a high discriminative tendency
Features that best describe the data should always be chosen, as such features have _____________, which helps the machine learning model classify outputs and predictions
The quality of the data
Gathering data for a given application in sufficient quantities increases estimation accuracy
Clustering, Density estimation, Spatial transformation
General tasks of Unsupervised Learning
the program
In supervised machine learning, the input and output data (training data) are used to create _________________ or the predictor function
the optimal spatial transformation
In a real problem, we need to automatically find ______________ using unsupervised learning
Machine Learning
Learning system that automatically configures the model M and improves performance P, based on the empirical data D acquired from the interaction with environment E
Adversarial
Trained in an adversarial setting
Networks
Use Deep Neural Networks
Modelling
____________ in machine learning is complicated and cannot be expressed by simple mathematical formulas
The real world
____________ is not linear, and noise is mixed in
The primary goal
____________________ of any supervised learning algorithm is to minimize the predictor error while defining a hypothesis based on the training data
Manifold
a low-dimensional space embedded in a high-dimensional space
Training set
a set of examples used for learning, where the objective value is known
Validation set
a set of examples used to tune the architecture of a classifier and estimate the error
The first principal component
accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible
Unsupervised learning
All training samples lack label information; the machine learning task of inferring a function to describe hidden structure from "unlabeled" data
The k-means
a clustering algorithm that minimizes the objective function: the sum of squared distances between each sample and its assigned cluster center
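A minimal k-means sketch with scikit-learn (library choice is an assumption); inertia_ is the value of the objective being minimized:

    # Cluster 2-D points into 3 groups by minimizing within-cluster squared distance.
    import numpy as np
    from sklearn.cluster import KMeans

    X = np.random.rand(100, 2)
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    print(km.labels_[:10])  # cluster assignment per sample
    print(km.inertia_)      # the objective: total within-cluster squared distance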
Ockham's Razor Principle
prefer the simplest hypothesis consistent with the data; related to the KISS principle ("Keep It Simple, Stupid")
The k-medoids
updates each cluster center using a selected representative sample (a medoid), which makes it less sensitive to noise than k-means
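A toy NumPy sketch of one k-medoids update step (illustrative only; the function name and setup are assumptions, and full implementations exist elsewhere, e.g. scikit-learn-extra's KMedoids):

    # One assignment + medoid-update step; medoids are actual data points.
    import numpy as np

    def kmedoids_step(X, medoid_idx):
        d = np.linalg.norm(X[:, None] - X[medoid_idx][None], axis=2)  # sample-to-medoid distances
        labels = d.argmin(axis=1)                                     # assign to nearest medoid
        new_medoids = []
        for k in range(len(medoid_idx)):
            members = np.where(labels == k)[0]
            # pick the member minimizing total distance to the rest of its cluster
            within = np.linalg.norm(X[members][:, None] - X[members][None], axis=2).sum(axis=1)
            new_medoids.append(members[within.argmin()])
        return np.array(new_medoids), labels

    X = np.random.rand(50, 2)
    medoids, labels = kmedoids_step(X, np.array([0, 1, 2]))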
Semi-supervised learning
unlabeled and labeled samples are mixed