Machine Learning Exam 1
Common Failure Cases of GANs
- The discriminator becomes too strong too quickly and the generator ends up not learning anything
- The generator only learns very specific weaknesses of the discriminator
- The generator learns only a very small subset of the true data distribution
Variational Autoencoder
A "beefed up" autoencoder. Instead of learning a latent vector, they learn a statistical representation of the code (usually a normal distribution denoted by a mean and a standard deviation)
Generative Adversarial Network
A successor to variational autoencoders that is composed of 2 neural networks (a generator and a discriminator network)
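A toy numpy sketch (an assumption, not the source's setup) of the two-network objective: D is scored on telling real from fake, and G is scored on fooling D. The linear models and the names d_w and g_w are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(x, w):
    # Toy discriminator: a logistic score for P(x is real).
    return 1.0 / (1.0 + np.exp(-x @ w))

def generator(z, w):
    # Toy generator: a linear map from noise z to a fake sample.
    return z @ w

d_w = rng.standard_normal(2)        # discriminator parameters
g_w = rng.standard_normal((2, 2))   # generator parameters

real = rng.standard_normal((4, 2)) + 3.0            # samples of "true" data
fake = generator(rng.standard_normal((4, 2)), g_w)  # samples G invented

# D's loss: label real data as 1 and fake data as 0 (binary cross-entropy).
d_loss = (-np.mean(np.log(discriminator(real, d_w) + 1e-8))
          - np.mean(np.log(1.0 - discriminator(fake, d_w) + 1e-8)))

# G's loss: G scores well exactly when D mistakes its fakes for real data.
g_loss = -np.mean(np.log(discriminator(fake, d_w) + 1e-8))
print(d_loss, g_loss)
```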
Autoencoder
A family of neural networks for which the input is the same as the output. They work by compressing the input into a latent-space representation and reconstructing the output from this representation (plain autoencoders impose no particular structure on the latent space)
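A minimal, untrained linear sketch of that compress-then-reconstruct computation; the random weights are placeholders, assuming a 4-feature input and a 2-dimensional latent space.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.standard_normal((8, 4))      # 8 samples with 4 features each

w_enc = rng.standard_normal((4, 2))  # encoder: compress 4 features -> 2
w_dec = rng.standard_normal((2, 4))  # decoder: reconstruct 2 -> 4

latent = x @ w_enc                   # latent-space representation
x_hat = latent @ w_dec               # reconstruction of the input

# The training signal: how far the reconstruction is from the input itself.
reconstruction_error = np.mean((x - x_hat) ** 2)
print(latent.shape, reconstruction_error)
```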
Multilayer perceptron
A fully connected feed-forward artificial neural network
Batch
A hyperparameter that defines the number of samples to work through before updating the internal model parameters
Epoch
A hyperparameter that defines the number of times that the learning algorithm will work through the entire training dataset
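A skeleton training loop showing how the two hyperparameters relate: each epoch is one full pass over the dataset, and each batch triggers one parameter update. The dataset and sizes here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((100, 3))  # 100 samples (rows), 3 features each

batch_size = 20   # samples to work through before each parameter update
n_epochs = 5      # full passes over the entire training dataset

for epoch in range(n_epochs):                   # one epoch = whole dataset
    rng.shuffle(data)
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]  # one batch = one update step
        # ... forward pass, loss, and parameter update would go here ...
print("updates per epoch:", len(data) // batch_size)
```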
"Deep" ANN
A neural network that has more than 1 hidden layer
"Shallow" ANN
A neural network that has only one hidden layer between input and output
Sample
A single row of data
Risk
The expected loss; a standard criterion for a "good" approximation is one which minimizes this expected loss
Linearly Separable
All problems in which the 2 classes can be separated by a hyperplane (e.g., the AND and OR patterns); solvable with no hidden layers
Perceptron
An early learning network that discovers a set of connection weights which correctly classify a set of binary input vectors
Sigmoid
An example of a response (output) function whose graph looks like an "S"; defined as sigmoid(x) = 1 / (1 + e^(-x))
Agent
Anything that can be viewed as perceiving its environment through inputs and acting upon that environment through outputs
TRUE
Backpropagation doesn't work on linear or binary activations: binary (step) activations have a derivative of zero almost everywhere, and purely linear activations make the network collapse into a single linear map, leaving gradient descent nothing useful to learn
Convolutional Neural Network
Commonly used in image processing; slides small learned filters ("cells") across the image so that each part of the image is compared against the same pattern
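A hand-rolled 2D convolution sketch of that filter-sliding idea, using a made-up 5x5 image and a vertical-edge kernel; in a real CNN the kernel values are learned, not hand-picked.

```python
import numpy as np

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 grayscale image
kernel = np.array([[1.0, 0.0, -1.0]] * 3)         # vertical-edge filter

# Slide the 3x3 kernel over every 3x3 patch of the image: each output
# cell compares one local region of the image against the same filter.
out = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        out[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)
print(out)
```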
Loss
The error obtained by comparing the result of the learned function against the label function
Supervised Learning
Data that consists of domain-range pairs (X,Y)
Unsupervised Learning
Data that consists only of domain values X
Activation function
Decides whether a neuron should be activated or not by applying a nonlinearity to the weighted sum of the neuron's inputs plus a bias (ex. Sigmoid)
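A small sketch of that computation with sigmoid as the nonlinearity; the inputs, weights, and bias are arbitrary illustrative values.

```python
import numpy as np

def sigmoid(z):
    # The S-shaped response function: squashes any real z into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

inputs = np.array([0.5, -1.0, 2.0])   # x_i arriving at the neuron
weights = np.array([0.8, 0.2, -0.5])  # synaptic weights w_i
bias = 0.1

z = np.dot(weights, inputs) + bias    # weighted sum plus bias
a = sigmoid(z)                        # activation decides the output
print(z, a)
```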
Problem of VAE
Does not really try to simulate real images; because correctness is judged pixel-wise, an output that is off by a single pixel is still counted as "correct", so reconstructions do not have to look realistic
Perceptron applications/limitations
Handles logic gates well and any problem that is linearly separable; cannot solve problems that are not linearly separable (e.g., XOR)
By choosing the number of neurons in each layer: the weight matrix between layers maps one dimensionality to the next (the bias added to the (weight * input) sum does not change dimensionality)
How do we increase/maintain/reduce dimensionality between layers?
Testing Set
In back-propagation, a collection of input-output patterns that are used to assess network performance
Training Set
In back-propagation, a collection of input-output patterns that are used to train the network
Learning Rate (η)
In back-propagation, a scalar parameter, analogous to step size in numerical integration, used to set the rate of adjustments
Underfitting
Occurs when there are too few neurons in the hidden layers to adequately detect the signals in a complicated data set
Overfitting
Occurs when the hidden layers have so many neurons, and thus so much information-processing capacity, that the limited amount of information in the training set is not enough to train them all; also increases training time
Entropy
Measure of chaos in a system; The average level of "information", "surprise", or "uncertainty" inherent in the variable's possible outcomes
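A short sketch computing entropy in bits for two coin distributions (illustrative values; assumes no zero probabilities).

```python
import numpy as np

# Entropy of a distribution: the average "surprise" over its outcomes.
def entropy(p):
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log2(p))

print(entropy([0.5, 0.5]))  # 1.0 bit: a fair coin is maximally uncertain
print(entropy([0.9, 0.1]))  # ~0.47 bits: a biased coin is more predictable
```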
Gradient Descent
Optimization algorithm for finding the input to a function that produces the optimal value; Weights are changed in proportion to the negative of an error derivative with respect to each weight
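A minimal sketch on a toy one-dimensional function, assuming f(x) = (x - 3)^2 so the error derivative is known in closed form.

```python
# Minimize f(x) = (x - 3)^2 by stepping against its derivative f'(x) = 2(x - 3).
x = 0.0
learning_rate = 0.1
for step in range(50):
    grad = 2.0 * (x - 3.0)     # error derivative with respect to x
    x -= learning_rate * grad  # move in the negative gradient direction
print(x)                       # converges toward the optimum x = 3
```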
Generator
Part of a GAN that learns to create fake data by incorporating feedback from the discriminator (it tries to trick D); we want G to keep working harder as D improves
Discriminator
Part of a GAN that tries to distinguish real data from fake data (Tries not to be fooled)
- 2D Arrays: weights from input->hidden and hidden->output, weight changes
- 1D Arrays: neuron layers, bias weights
Possible data structures for back-propagation networks
Weights
Represent synaptic efficacy and may be excitatory (positive) or inhibitory (negative).
net = SUM(w_i * I_i)
The equation for the net input of a perceptron; if the net input is greater than the threshold θ, then the output unit is turned on, otherwise it is turned off
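A sketch of that threshold unit implementing an AND gate; the weights and theta are hand-picked illustrative values.

```python
import numpy as np

def perceptron(inputs, weights, theta):
    net = np.sum(weights * inputs)   # net = SUM(w_i * I_i)
    return 1 if net > theta else 0   # fire only above the threshold theta

# AND gate with hand-picked weights: fires only when both inputs are 1.
for I in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(I, perceptron(np.array(I), np.array([1.0, 1.0]), theta=1.5))
```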
Hidden Layer
The middle layer of an artificial neural network that has three or more layers.
Learning
The process of modifying the weights in order to produce a network that performs some function
Latent Space
The representation of compressed data (the closer its features are to the observed features, the more accurate the resulting data becomes)
Domain
The set X is called feature space. The coordinates of X are called features. Individual features may take values in a continuum, a discrete, ordered set, or a discrete, unordered set
Cross Entropy
Used to quantify the difference between two probability distributions; the expected coding length when encoding samples from P with the code that is optimal for Q
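A short sketch of H(P, Q) = -SUM(P(x) * log Q(x)) in bits, with made-up distributions; when Q matches P it reduces to the entropy of P.

```python
import numpy as np

# Cross entropy H(P, Q): expected code length when events drawn from P
# are encoded with the code that would be optimal for Q.
def cross_entropy(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log2(q))

p = [0.5, 0.5]                       # true distribution
print(cross_entropy(p, p))           # 1.0 bit: equals the entropy of P
print(cross_entropy(p, [0.9, 0.1]))  # > 1 bit: the mismatch costs extra bits
```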
Range
Usually either a finite, unordered set, in which case learning is called classification, or it is a continuum, in which case learning is called regression
- Dimensionality Reduction
- Compression
- Denoising
- Super-Sampling
- Anomaly detection
What are some applications of autoencoders?
- Each pattern is presented to the network and propagated forward all the way to the output
- Gradient descent to minimize the total error on the training set patterns
What are the 2 defined learning steps in back-propagation networks?
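A compact numpy sketch of those two steps for a made-up one-hidden-layer network: a forward pass over all patterns, then a gradient-descent update of both weight matrices (the data, layer sizes, and learning rate are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 2))   # four input patterns
y = rng.standard_normal((4, 1))   # target output for each pattern

w1 = rng.standard_normal((2, 3))  # input -> hidden weights
w2 = rng.standard_normal((3, 1))  # hidden -> output weights
lr = 0.01                         # learning rate (eta)

for _ in range(500):
    # Step 1: present every pattern and propagate it forward to the output.
    h = np.tanh(x @ w1)
    out = h @ w2
    # Step 2: gradient descent on the total squared error over the training set.
    err = out - y
    grad_w2 = h.T @ err
    grad_w1 = x.T @ ((err @ w2.T) * (1.0 - h**2))  # chain rule through tanh
    w2 -= lr * grad_w2
    w1 -= lr * grad_w1
print(np.mean((np.tanh(x @ w1) @ w2 - y) ** 2))    # error after training
```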
- Between the size of the input layer and the size of the output layer
- 2/3 the size of the input layer, plus the size of the output layer
- Less than twice the size of the input layer
What are the 3 general rules when determining the correct number of neurons to use in the hidden layer?
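A worked example of the three rules of thumb, assuming a hypothetical network with 10 inputs and 2 outputs.

```python
# Hypothetical layer sizes, chosen only to make the arithmetic concrete.
n_in, n_out = 10, 2
print("rule 1:", n_out, "to", n_in)     # between output and input layer size
print("rule 2:", 2 * n_in / 3 + n_out)  # 2/3 * input size + output size ≈ 8.7
print("rule 3: fewer than", 2 * n_in)   # less than twice the input size
```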
- Computes the output Z, using the inputs and the weights
- Performs the activation on Z to give out the final output A of the neuron
What is the atomic unit of a neuron?
- Inputs x(i) arrive through presynaptic connections
- Synaptic efficacy is modeled using real weights w(i)
- The response of the neuron is a nonlinear function f of its weighted inputs
What is the basic neuron model in a feed-forward network?