Artificial Intelligence Chapter 12
Limitations
1. Local Minima 2. Over Fitting 3. Vanishing Gradient problem 4. Learning uses all the information, including what is unnecessary, so speed ↓ and cost ↑
Recurrent Neural Network(RNN)
A neural network in which the previous output is fed back as an input
Convolutional Structure
Multiple nodes share weights; zoom out (down-sample) with max pooling; used in CNNs
feature/representation learning
The hidden layer transforms the feature vector into a new feature space that is more advantageous for classification
The training set
used for generating the model that represents a hypothesis
Long Short-term memory(LSTM)
used primarily to handle long-term dependencies in very long sequential data (LSTM has selective memory capability)
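A minimal sketch of a single LSTM step in NumPy to illustrate that selective memory: gates decide what the cell forgets, stores, and exposes. The packed weight matrix W, the gate ordering, and the 3-input / 2-hidden sizes are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step: forget (f), input (i), and output (o) gates
    select what the cell state c keeps, adds, and exposes."""
    z = np.concatenate([x_t, h_prev]) @ W + b
    f, i, o, g = np.split(z, 4)
    f, i, o, g = sigmoid(f), sigmoid(i), sigmoid(o), np.tanh(g)
    c = f * c_prev + i * g        # keep old memory, add new candidate
    h = o * np.tanh(c)            # expose a gated view of the memory
    return h, c

# Hypothetical sizes: 3-dim inputs, 2-dim hidden/cell state.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(5, 8)), np.zeros(8)
h, c = np.zeros(2), np.zeros(2)
for x_t in rng.normal(size=(4, 3)):   # a length-4 input sequence
    h, c = lstm_step(x_t, h, c, W, b)
print(h)
```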
Weaknesses of Deep Learning
• Acquiring training data takes a lot of time and money. • It does not properly handle patterns that fall outside the range of the training data. • Since the generated model is a black box, it is difficult for humans to interpret or improve it.
Strengths of Deep Learning
• There is no need to hand-craft features to describe to the computer • It generally produces better results than human-designed models • Does not require advanced mathematical knowledge or programming skills • The abundance of open-source algorithms makes development cheaper and faster.
communicate
Interconnected neurons use electrical pulses to "____________" with each other.
Deep Learning Algorithms
• Deep Neural Network (DNN) • Convolutional Neural Network(CNN) • Recurrent Neural Network(RNN)
Characteristics of ANN
- A large number of very simple, neuron-like processing elements - A large number of weighted connections between the elements - Distributed representation of knowledge over the connections - Knowledge is acquired by the network through a learning process
Neural Signal Processing
1. Signals from connected neurons are collected by the dendrites. 2. The cell body sums the incoming signals. 3. When sufficient input is received, the neuron generates an action potential. 4. That action potential is transmitted along the axon to other neurons. 5. If sufficient input is not received, the inputs quickly decay and no action potential is generated. 6. Timing is clearly important.
Fully Connected Structure
All nodes between two adjacent layers are connected; used in DNNs
Imitate Human Brain
Builds learning algorithms that mimic the brain
the training phase
During ______________, the algorithm selects the nodes from the deep neural network to be dropped (activation value set to 0)
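A minimal sketch of that training-time dropout, assuming NumPy; the drop probability and the "inverted dropout" rescaling by 1/(1 - drop_prob) are common conventions chosen here for illustration.

```python
import numpy as np

def dropout(activations, drop_prob=0.5, training=True):
    """Randomly set activations to 0 during training; at inference
    the layer passes through unchanged thanks to the rescaling."""
    if not training or drop_prob == 0.0:
        return activations
    keep_mask = np.random.rand(*activations.shape) >= drop_prob
    return activations * keep_mask / (1.0 - drop_prob)

# Example: drop roughly half of a hidden layer's activations.
hidden = np.random.randn(4, 8)
print(dropout(hidden, drop_prob=0.5, training=True))
```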
Multi-Layer Perceptron Characteristics
Hidden layers, activation functions, backpropagation method for error correction
Vanishing Gradient problem
In gradient descent and backpropagation methods, each of the MLP weights receives an update proportional to the partial derivative of the error function with respect to the current weight in each iteration of training; when these derivatives are very small, their product across many layers shrinks toward zero, so the early layers effectively stop learning
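A tiny numeric illustration of the vanishing effect: with sigmoid activations each layer contributes a derivative factor of at most 0.25, so the gradient reaching the first layers of a 10-layer network (an arbitrary depth chosen for the example) is already near zero.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Backpropagation multiplies one sigmoid-derivative factor per layer;
# sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)) peaks at 0.25 (at x = 0).
grad = 1.0
for _ in range(10):                              # hypothetical 10-layer MLP
    grad *= sigmoid(0.0) * (1.0 - sigmoid(0.0))  # best case: 0.25 per layer
print(grad)                                      # 0.25**10 ≈ 9.5e-07
```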
Minsky's Perceptrons
Pointed out the limitations of the perceptron and suggested overcoming them with a multi-layered structure
Recurrent Structure
A node's output and state values are fed back as inputs at the next step; acts as a simple storage (memory) device; used in RNNs
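A minimal sketch of that recurrent structure as one vanilla RNN step in NumPy; the tanh activation and the 3-input / 5-hidden sizes are illustrative assumptions.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """The previous state h_prev is fed back in alongside the
    current input x_t, acting as a simple memory of the past."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
W_xh, W_hh, b_h = rng.normal(size=(3, 5)), rng.normal(size=(5, 5)), np.zeros(5)
h = np.zeros(5)
for x_t in rng.normal(size=(4, 3)):   # a length-4 input sequence
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
print(h)
```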
Deep Learning Characteristics
Layer-by-layer training, stacking up trained hidden layers, robustness against local minima, overfitting can be avoided even with less data, hidden layers can learn to have desired characteristics
the hidden layers
The first layer is called the input layer, the last layer is the output layer, and the middle layers are called ____________
Feed-forward Networks
The input signals are propagated to the output layer in the forward pass, and the weights are optimized iteratively so that the model generalizes to new input data based on the training set provided as input.
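A minimal sketch of the forward pass only, assuming NumPy; the 4-3-2 layer sizes, random weights, and sigmoid activation are arbitrary examples (the iterative weight optimization is omitted).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, layers):
    """Propagate the input through each (weights, bias) pair in turn."""
    for W, b in layers:
        x = sigmoid(x @ W + b)
    return x

rng = np.random.default_rng(1)
layers = [(rng.normal(size=(4, 3)), np.zeros(3)),   # input -> hidden
          (rng.normal(size=(3, 2)), np.zeros(2))]   # hidden -> output
print(forward(rng.normal(size=4), layers))
```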
Recurrent-feedback Networks
The output of one forward propagation is fed as input for the next iteration of training on sequences of data. Ex) text, speech, or any other form of sequential input
Sigmoid function
The output value is between 0 and 1. The graph of the function has an S shape, hence the name sigmoid
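For reference, the function is sigmoid(x) = 1 / (1 + e^(-x)); a one-line NumPy version:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ≈ [0.0067, 0.5, 0.9933]
```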
filter
The weight used in the process of creating the feature map
sequential data
While static data is acquired at any one moment and is of fixed length, __________________ is dynamic and usually of variable length
The pooling layer
________________ plays a role of reducing the dimension size of the image data by the down-sampling method
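A minimal sketch of 2x2 max pooling in NumPy, assuming the image dimensions divide evenly by the pool size; each output value is the maximum of one non-overlapping block.

```python
import numpy as np

def max_pool2d(image, size=2):
    """Down-sample by keeping the largest value in each size-by-size
    block, halving each spatial dimension for size=2."""
    h, w = image.shape
    blocks = image.reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

img = np.arange(16).reshape(4, 4)
print(max_pool2d(img))   # 4x4 -> 2x2: [[5, 7], [13, 15]]
```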
Artificial Neural Networks
a computational system inspired by the structure, processing method, and learning ability of a biological brain
A binary classifier
a function which can decide whether or not an input, represented by a vector of numbers, belongs to some specific class
DBN(deep belief network)
a generative graphical model, composed of multiple layers of latent variables ("hidden units"), with connections between the layers
Deep Learning
a set of algorithms in machine learning that attempt to learn in multiple levels, corresponding to different levels of abstraction. It typically uses artificial neural networks
The Gradient Descent method
a technique that finds the position of the minimum value of a function by observing the descending direction of the slope and repeatedly moving little by little in that direction
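A minimal sketch of the method on a one-variable function; the learning rate, step count, and the example f(x) = (x - 3)^2 are arbitrary choices.

```python
def gradient_descent(grad_f, x0, lr=0.1, steps=100):
    """Move little by little in the descending direction of the slope."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)     # step against the gradient
    return x

# f(x) = (x - 3)^2 has its minimum at x = 3, and f'(x) = 2(x - 3).
print(gradient_descent(lambda x: 2 * (x - 3), x0=0.0))  # ≈ 3.0
```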
Perceptron
an algorithm for supervised learning of binary classifiers; introduced concepts such as nodes, weights, layers, and learning
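A minimal sketch of the perceptron learning rule in NumPy, using logical AND as a linearly separable toy problem; the learning rate and epoch count are arbitrary.

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    """Binary (0/1) classifier: update weights only on mistakes."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            pred = 1 if x_i @ w + b > 0 else 0
            w += lr * (y_i - pred) * x_i
            b += lr * (y_i - pred)
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])                      # logical AND
w, b = train_perceptron(X, y)
print([1 if x @ w + b > 0 else 0 for x in X])   # [0, 0, 0, 1]
```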
Stochastic gradient descent(SGD)
an iterative technique that can distribute the work units and get us to the global minimum in a computationally optimal way
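A minimal sketch of mini-batch SGD fitting a one-parameter least-squares model y ≈ w * x; the toy data, batch size, and learning rate are illustrative assumptions.

```python
import numpy as np

def sgd(grad_f, w0, data, lr=0.01, epochs=50, batch_size=2):
    """Each update uses the gradient on a small random batch rather
    than the full dataset, distributing the work across cheap steps."""
    w, data, rng = w0, data.copy(), np.random.default_rng(0)
    for _ in range(epochs):
        rng.shuffle(data)                       # new random batches each epoch
        for i in range(0, len(data), batch_size):
            w -= lr * grad_f(w, data[i:i + batch_size])
    return w

# Toy (x, y) pairs generated near y = 2x; gradient of mean (w*x - y)^2.
data = np.array([[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2]])
grad = lambda w, batch: np.mean(2 * (w * batch[:, 0] - batch[:, 1]) * batch[:, 0])
print(sgd(grad, 0.0, data))   # ≈ 2.0
```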
Human brain
its anatomical connections can be modeled using graph theory
The receptors
collect information from the environment
Deep Neural Network(DNN)
commonly consists of at least two hidden layers between input and output - Multi-Layer Perceptron(MLP) - The number of layers can be very high
Over Fitting
excessive learning of the training data (usually occurs when data is insufficient) - Dropout or more data is required
The effectors
generate interactions with the environment - e.g. activate muscles
Convolutional Neural Network(CNN)
networks for data that has a known grid-like topology, such as image data, which is essentially a 2-dimensional grid of pixels
The circulating edge
responsible for transmitting the information generated at time t-1 to time t
Local Minima
learning stops even though it has not reached the global minimum - the Stochastic Gradient Descent (SGD) method is required
Convolution
the process of obtaining a specific feature in the local domain
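A minimal sketch of a valid 2-D convolution (implemented, as in most CNN libraries, as cross-correlation): the shared filter slides over every local region and sums element-wise products into a feature map. The vertical-edge kernel is a hypothetical example.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the filter over each local region of the image and
    sum the element-wise products to build the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[0, 0, 1, 1]] * 4, dtype=float)  # dark-to-bright edge
kernel = np.array([[-1.0, 1.0]])                   # responds to the jump
print(conv2d(image, kernel))                       # nonzero where intensity changes
```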
Rectified Linear Unit
the simplest, computationally optimized, and hence most popularly used activation function for the ANNs. The output value is 0 for all negative input and the same as the value of input for positive input
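A one-line NumPy version of that definition:

```python
import numpy as np

def relu(x):
    # 0 for all negative input, identity for positive input.
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 3.0])))  # [0. 0. 0. 3.]
```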
Neuron
the smallest information processing unit in the brain
The validation set
used to test the efficiency of the hypothesis function or the trained model