Recurrent Neural Networks (RNNs)

Types of Supervised Learning

"Regression and Classification" Artificial Neural Networks - Used for Regression & Classification Convolutional Neural Networks - Used for Computer Vision Recurrent Neural Networks - Used for Time Series Analysis

Types of Unsupervised Learning

"clustering and association" Self-Organizing Maps - Used for Feature Detection Deep Boltzmann Machines - Used for Recommendation Systems AutoEncoders - Used for Recommendation Systems

RNN Training Methods

- Back-Propagation Through Time (BPTT): Unfolding the RNN in time and applying the extended version of back-propagation.
- Extended Kalman Filtering (EKF): A set of mathematical equations that provides an efficient computational means to estimate the state of a process, in a way that minimizes the mean squared error (cost function) on a linear system.
- Real-Time Recurrent Learning (RTRL): Computing the error gradient and updating the weights at each time step.

Drawbacks of a typical RNN architecture

- Computation is slow
- It is difficult to access information from a long time ago
- The current state cannot take any future input into account

RNN Architectures

- Feed-Forward Neural Network
- Simple Recurrent Neural Network
- Fully Connected Recurrent Neural Network

Motivations for Recurrent Neural Networks

- A feed-forward network accepts a fixed-size vector as input and produces a fixed-size vector as output
- It also uses a fixed number of computational steps
- Recurrent nets allow us to operate over sequences of vectors

Advantages of a typical RNN architecture

- Possibility of processing input of any length
- Model size does not increase with the size of the input
- Computation takes historical information into account
- Weights are shared across time

Summary of RNNs

- RNNs allow a lot of flexibility in architecture design
- Vanilla RNNs are simple but don't work very well
- It is common to use LSTMs: their additive interactions improve gradient flow
- The backward flow of gradients in an RNN can explode or vanish
- Exploding gradients are controlled with gradient clipping (see the sketch below); vanishing gradients are controlled with additive interactions (LSTM)
- Better/simpler architectures are a hot topic of current research
- Better understanding (both theoretical and empirical) is needed
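
As a concrete illustration of gradient clipping, here is a minimal numpy sketch; the function name, the max-norm value, and the toy gradients are illustrative assumptions rather than a prescribed implementation.

```python
import numpy as np

def clip_gradients(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their global L2 norm
    does not exceed max_norm (gradient clipping)."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        grads = [g * (max_norm / total_norm) for g in grads]
    return grads

# A deliberately huge gradient gets rescaled so its global norm is ~5.
clipped = clip_gradients([np.full((2, 2), 100.0), np.full(2, 100.0)])
print(np.sqrt(sum(np.sum(g ** 2) for g in clipped)))  # ~5.0
```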

There are two kinds of neural networks:

1. Feed-forward Neural Networks: connections between the units do not form a cycle
2. Recurrent Neural Networks: connections between the units form cyclic paths

Long Short-Term Memory (LSTM)

An LSTM network is a special kind of recurrent neural network (RNN) that is optimized for learning from and acting upon time-related data which may have undefined or unknown lengths of time between relevant events. LSTMs work very well on a wide range of problems and are now widely used. They were introduced in 1997 by Hochreiter & Schmidhuber and were refined and popularized by many subsequent researchers.

Rectified Linear Unit (ReLU)

An activation function with the following rules: if the input is negative or zero, the output is 0; if the input is positive, the output equals the input. When using rectified linear units, the typical sigmoidal activation function used for node outputs is replaced with the function f(x) = max(0, x). This activation saturates in only one direction and is thus more resilient to vanishing gradients.
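
A one-line numpy sketch of the rule f(x) = max(0, x) described above (the function name is illustrative):

```python
import numpy as np

def relu(x):
    """Rectified linear unit: 0 for non-positive inputs, the input itself otherwise."""
    return np.maximum(0, x)

print(relu(np.array([-2.0, 0.0, 3.5])))  # [0.  0.  3.5]
```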

Back-propagation through time

Run forward through the entire sequence to compute the loss, then backward through the entire sequence to compute the gradients.
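
A minimal numpy sketch of this idea for a vanilla tanh RNN with a squared-error loss on the final hidden state; all names, sizes, and the choice of loss are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, H = 5, 3, 4                        # sequence length, input dim, hidden dim
Wxh = rng.normal(scale=0.1, size=(H, D))
Whh = rng.normal(scale=0.1, size=(H, H))
xs = rng.normal(size=(T, D))
target = np.zeros(H)

# Forward through the entire sequence, storing every hidden state.
hs = [np.zeros(H)]
for t in range(T):
    hs.append(np.tanh(Wxh @ xs[t] + Whh @ hs[-1]))
loss = 0.5 * np.sum((hs[-1] - target) ** 2)

# Backward through the entire sequence, accumulating weight gradients.
dWxh, dWhh = np.zeros_like(Wxh), np.zeros_like(Whh)
dh = hs[-1] - target                     # dL/dh_T
for t in reversed(range(T)):
    dpre = dh * (1.0 - hs[t + 1] ** 2)   # backprop through tanh
    dWxh += np.outer(dpre, xs[t])
    dWhh += np.outer(dpre, hs[t])
    dh = Whh.T @ dpre                    # pass the gradient to the previous step
```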

Memory Cell

LSTM networks introduce a new structure called a memory cell. Each memory cell contains four main elements:
- Input gate
- Forget gate
- Output gate
- Neuron with a self-recurrent connection
These gates allow the cell to keep and access information over long periods of time.
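
A minimal numpy sketch of one LSTM step with the four elements listed above; the stacked parameter layout and all names/shapes are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W (4H x D), U (4H x H) and b (4H,) stack the
    parameters for the input (i), forget (f), output (o) gates and the
    candidate cell value (g)."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])        # input gate: how much new information to write
    f = sigmoid(z[H:2*H])      # forget gate: how much of the old cell to keep
    o = sigmoid(z[2*H:3*H])    # output gate: how much of the cell to expose
    g = np.tanh(z[3*H:4*H])    # candidate cell value (self-recurrent neuron)
    c = f * c_prev + i * g     # additive cell-state update ("memory")
    h = o * np.tanh(c)         # new hidden state
    return h, c

# Usage with random toy parameters:
rng = np.random.default_rng(1)
D, H = 3, 4
W, U, b = rng.normal(size=(4*H, D)), rng.normal(size=(4*H, H)), np.zeros(4*H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b)
```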

Types of RNNs

- One-to-one (Tx = Ty = 1): traditional neural network
- One-to-many (Tx = 1, Ty > 1): music generation
- Many-to-one (Tx > 1, Ty = 1): sentiment classification
- Many-to-many (Tx = Ty): named entity recognition
- Many-to-many (Tx != Ty): machine translation

Recurrent Neural Network (RNN)

Recurrent Neural Networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while maintaining hidden states. An RNN models sequential interactions through a hidden state, or memory. It can take up to N inputs and produce up to N outputs. For example, the input may be a sentence with the outputs being the part-of-speech tag for each word (N-to-N); the input could be a sentence and the output a sentiment classification of that sentence (N-to-1); or the input could be a single image and the output a sequence of words describing the image (1-to-N).

At each time step, an RNN calculates a new hidden state ("memory") based on the current input and the previous hidden state. The term "recurrent" stems from the fact that the same parameters are used at each step, so the network performs the same calculation on different inputs. RNNs are very powerful dynamical systems for sequence tasks such as speech recognition or handwriting recognition, since they maintain a state vector that implicitly contains information about the history of all past elements of the sequence.
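
A minimal numpy sketch of the recurrence described above, showing that the same parameters are reused at every time step; all names and sizes are illustrative assumptions.

```python
import numpy as np

def rnn_step(x, h_prev, Wxh, Whh, b):
    """One vanilla RNN step: the new hidden state ("memory") is computed
    from the current input and the previous hidden state."""
    return np.tanh(Wxh @ x + Whh @ h_prev + b)

rng = np.random.default_rng(0)
T, D, H = 6, 3, 5
Wxh, Whh, b = rng.normal(size=(H, D)), rng.normal(size=(H, H)), np.zeros(H)

h = np.zeros(H)                      # initial hidden state
for x in rng.normal(size=(T, D)):    # same parameters at every step
    h = rnn_step(x, h, Wxh, Whh, b)  # h carries the history forward
```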

Truncated Back-propagation through time

Run forward and backward through chunks of the sequence instead of the whole sequence.
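
A minimal numpy sketch of the chunking idea, reusing a vanilla tanh RNN with a per-step squared-error loss placed directly on the hidden state (a simplification); the chunk size, learning rate, and all names are illustrative assumptions. Gradients are computed within each chunk only, while the hidden-state value is carried across chunk boundaries.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, H, chunk = 12, 3, 4, 4
Wxh = rng.normal(scale=0.1, size=(H, D))
Whh = rng.normal(scale=0.1, size=(H, H))
xs = rng.normal(size=(T, D))
targets = rng.normal(size=(T, H))

h_carry = np.zeros(H)                        # carried value, no gradient history
for start in range(0, T, chunk):
    x_chunk = xs[start:start + chunk]
    y_chunk = targets[start:start + chunk]
    # Forward through this chunk only.
    hs = [h_carry]
    for x in x_chunk:
        hs.append(np.tanh(Wxh @ x + Whh @ hs[-1]))
    # Backward through this chunk only.
    dWxh, dWhh = np.zeros_like(Wxh), np.zeros_like(Whh)
    dh = np.zeros(H)
    for t in reversed(range(len(x_chunk))):
        dh = dh + (hs[t + 1] - y_chunk[t])   # local loss gradient at step t
        dpre = dh * (1.0 - hs[t + 1] ** 2)
        dWxh += np.outer(dpre, x_chunk[t])
        dWhh += np.outer(dpre, hs[t])
        dh = Whh.T @ dpre
    Wxh -= 0.01 * dWxh                       # update after every chunk
    Whh -= 0.01 * dWhh
    h_carry = hs[-1]                         # carry the state, not the gradient
```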

vanishing gradient problem

The vanishing gradient problem arises in very deep neural networks, typically recurrent neural networks, that use activation functions whose gradients tend to be small (between 0 and 1). Because these small gradients are multiplied together during backpropagation, they tend to "vanish" through the layers, preventing the network from learning long-range dependencies (see the sketch below). Common ways to counter this problem are to use activation functions like ReLUs, which do not suffer from small gradients, or architectures like LSTMs, which explicitly combat vanishing gradients. The opposite of this problem is called the exploding gradient problem.
Solutions:
- Multi-level hierarchy
- Residual networks
- Rectified linear units (ReLUs)
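
A small numpy illustration of the multiplication effect described above: twenty saturating sigmoid layers shrink a unit gradient to almost nothing. The depth and pre-activation value are arbitrary choices for the demonstration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

grad = 1.0
z = 2.5                       # a moderately saturated pre-activation
for _ in range(20):           # backprop through 20 sigmoid layers
    s = sigmoid(z)
    grad *= s * (1.0 - s)     # sigmoid'(z) <= 0.25, much smaller when saturated
print(grad)                   # roughly 1e-23: the gradient has "vanished"
```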

Residual networks

This technique introduces bypass (shortcut) connections that link a layer to layers further back than its immediate predecessor. This allows gradients to propagate to deep layers more directly, before they can be attenuated to small or zero values.
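
A minimal numpy sketch of a residual (bypass) connection: the block computes F(x) + x, so the identity path gives gradients a direct route around the transform. The two-layer form of F and all names are illustrative assumptions.

```python
import numpy as np

def residual_block(x, W1, W2):
    """Output = F(x) + x, where F is a small two-layer transform and
    the added x is the bypass (shortcut) connection."""
    h = np.maximum(0, W1 @ x)      # F(x): linear layer + ReLU
    return W2 @ h + x              # add the shortcut connection

rng = np.random.default_rng(0)
D = 4
x = rng.normal(size=D)
y = residual_block(x,
                   rng.normal(scale=0.1, size=(D, D)),
                   rng.normal(scale=0.1, size=(D, D)))
```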

multi-level hierarchy

This technique pretrains one layer at a time and then performs backpropagation over the whole stack for fine-tuning.
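
A compact conceptual sketch of the layer-at-a-time idea, using linear autoencoders trained by gradient descent as the per-layer pretraining step; the layer sizes, learning rate, and the linear-autoencoder choice are illustrative assumptions, and the final fine-tuning pass is only indicated in a comment.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))              # toy unlabeled data
layer_sizes = [8, 6, 4]
weights = []

reps = X                                   # representation fed to the next layer
for d_in, d_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    W = rng.normal(scale=0.1, size=(d_out, d_in))   # encoder (kept)
    V = rng.normal(scale=0.1, size=(d_in, d_out))   # decoder (discarded later)
    for _ in range(200):                   # pretrain this layer in isolation
        Henc = reps @ W.T                  # encode
        R = Henc @ V.T                     # reconstruct
        err = R - reps
        V -= 0.01 * (err.T @ Henc) / len(reps)
        W -= 0.01 * ((err @ V).T @ reps) / len(reps)
    weights.append(W)
    reps = reps @ W.T                      # inputs for the next layer's pretraining
# ...followed by ordinary backpropagation over `weights` for fine-tuning.
```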

T/F: Cross-entropy loss is a common choice for the cost (loss) function in an RNN.

True

T/F: LSTM networks can outperform RNNs and Hidden Markov Models (HMMs).

True. This is achieved by multiplicative gate units that learn to open and close access to the constant error flow.

