RNN - LSTM - Transfer Learning (Lecture 6)
RNN is best for:
•Sequential or ordered data
Hyperparameter optimization methods covered
•Grid Search
•Random Search
•Hand-tuning
•Gaussian Process with Expected Improvement
•Tree-structured Parzen Estimators (TPE)
Grid search
•Grid search is a model hyperparameter optimization technique
•It tunes the hyperparameters of deep learning models (a sketch follows the hyperparameter list below)
Hand-tuned hyperparameters available
•Optimization Algorithm - optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
•Network Weight Initialization - init_mode = ['uniform', 'lecun_uniform', 'normal', 'zero', 'glorot_normal', 'glorot_uniform', 'he_normal', 'he_uniform']
•Neuron Activation Function - activation = ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear']
•Dropout Regularization - dropout_rate = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
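A minimal sketch of a grid search over two of these hyperparameters (optimizer and dropout_rate) in Keras; the build_model helper, the toy data, and the 5-epoch budget are illustrative assumptions, not part of the slides:

# Hypothetical sketch: manual grid search over optimizer and dropout_rate
import itertools
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_model(optimizer, dropout_rate):
    # Tiny fully connected model; the architecture is only for illustration
    model = keras.Sequential([
        keras.Input(shape=(20,)),
        layers.Dense(32, activation='relu'),
        layers.Dropout(dropout_rate),
        layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
    return model

# Toy random data, just to make the loop runnable
X = np.random.rand(200, 20)
y = np.random.randint(0, 2, size=(200,))

param_grid = {'optimizer': ['SGD', 'RMSprop', 'Adam'],
              'dropout_rate': [0.0, 0.2, 0.5]}

best_score, best_params = -np.inf, None
for optimizer, dropout_rate in itertools.product(param_grid['optimizer'],
                                                 param_grid['dropout_rate']):
    model = build_model(optimizer, dropout_rate)
    history = model.fit(X, y, epochs=5, batch_size=32,
                        validation_split=0.2, verbose=0)
    score = history.history['val_accuracy'][-1]   # evaluate each grid point
    if score > best_score:
        best_score, best_params = score, {'optimizer': optimizer,
                                          'dropout_rate': dropout_rate}

print('Best:', best_params, best_score)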
TL workflow:
1) Take layers from a previously trained model
2) Freeze them, so as to avoid destroying any of the information they contain (trainable = False)
3) Add some new, trainable layers on top of the frozen layers
4) Train the new layers on your dataset
5) Optional step: fine-tuning, which consists of unfreezing the base model (or part of it) and retraining it on the new data with a very low learning rate
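A minimal Keras sketch of this workflow, assuming an ImageNet-pretrained MobileNetV2 base and a hypothetical new_train_ds dataset for the target task:

from tensorflow import keras
from tensorflow.keras import layers

# 1) Take layers from a previously trained model (ImageNet weights, no top classifier)
base_model = keras.applications.MobileNetV2(input_shape=(160, 160, 3),
                                            include_top=False,
                                            weights='imagenet')

# 2) Freeze them so their learned information is not destroyed
base_model.trainable = False

# 3) Add some new, trainable layers on top of the frozen layers
inputs = keras.Input(shape=(160, 160, 3))
x = base_model(inputs, training=False)                 # keep the base in inference mode
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(1, activation='sigmoid')(x)     # e.g. a binary target task
model = keras.Model(inputs, outputs)

# 4) Train only the new layers on your dataset (new_train_ds is a placeholder)
model.compile(optimizer=keras.optimizers.Adam(),
              loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(new_train_ds, epochs=5)

# 5) Optional fine-tuning: unfreeze the base and retrain with a very low learning rate
base_model.trainable = True
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-5),
              loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(new_train_ds, epochs=5)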
Transfer Learning Types
1) Inductive: adapt an existing supervised model to a new labeled dataset
2) Transductive: the source task is supervised (labeled), but the target data is unlabeled
3) Unsupervised: both source and target tasks are unsupervised, working with unlabeled data
RNN
A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor.
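A small NumPy sketch of that idea: the same weights are reused at every time step, and the hidden state h is the message each copy passes to its successor (sizes and random data are illustrative assumptions):

import numpy as np

input_size, hidden_size, timesteps = 4, 8, 5
rng = np.random.default_rng(0)

W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1   # input-to-hidden weights
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1  # hidden-to-hidden weights
b_h = np.zeros(hidden_size)

x_seq = rng.standard_normal((timesteps, input_size))          # one input sequence
h = np.zeros(hidden_size)                                     # initial hidden state

for t in range(timesteps):
    # identical network at every step; only h (the message) changes
    h = np.tanh(W_xh @ x_seq[t] + W_hh @ h + b_h)

print(h.shape)   # final hidden state, shape (8,)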
Limitations of RNNs
Simple RNNs fail to understand the context behind an input: something that was said long before cannot be recalled when making predictions in the present. The reason behind this is the vanishing gradient problem.
RNN Main problems
Exploding gradients: the network assigns excessively high importance to the weights, producing an unstable network.
Vanishing gradients: the gradient values become too small, so the model stops learning.
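Gradient clipping is a common mitigation for exploding gradients (the clipnorm/clipvalue numbers below are arbitrary assumptions); vanishing gradients are instead addressed by architectures such as the LSTM described later:

from tensorflow import keras

# clipnorm rescales each gradient so its L2 norm never exceeds 1.0,
# which keeps a single large gradient from destabilising the weights
optimizer = keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)

# clipvalue would instead clip each gradient component to [-0.5, 0.5]:
# optimizer = keras.optimizers.Adam(learning_rate=1e-3, clipvalue=0.5)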
Why Recurrent Neural Network
Humans don't start their thinking from scratch every second. As you read this essay, you understand each word based on your understanding of previous words.
•Traditional neural networks can't do this, and it seems like a major shortcoming.
•Idea: retain information while preserving the importance of order.
LSTM
LSTMs, on the other hand, make small modifications to the information through multiplications and additions.
They are an alternative solution to the limitations and problems of RNNs.
Long Short-Term Memory (LSTM)
Cell states
At each time step, an LSTM combines three pieces of information:
-The previous cell state (i.e. the information that was present in the memory after the previous time step)
-The previous hidden state (i.e. this is the same as the output of the previous cell)
-The input at the current time step (i.e. the new information that is being fed in at that moment)
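A minimal Keras sketch that makes these quantities visible: with return_state=True an LSTM layer returns its output together with the final hidden state and cell state (shapes are illustrative assumptions):

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

timesteps, features, units = 10, 3, 16

inputs = keras.Input(shape=(timesteps, features))    # the inputs at each time step
outputs, hidden_state, cell_state = layers.LSTM(units, return_state=True)(inputs)
model = keras.Model(inputs, [outputs, hidden_state, cell_state])

x = np.random.rand(1, timesteps, features)
out, h, c = model.predict(x, verbose=0)
print(out.shape, h.shape, c.shape)   # (1, 16) (1, 16) (1, 16); out equals h here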
LSTMs can selectively remember or forget things (T/F)
True
With LSTMs, the information flows through a mechanism known as .............
cell states
model = Sequential()
model.add(Embedding(max_fatures, embed_dim, input_length = X.shape[??]))   ----------?
model.add(LSTM(lstm_out, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(?, activation='??????'))   -----------?
num of classes; i.e. Dense(num_classes, activation='softmax'). The first blank is X.shape[1] (the sequence length).
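Filling in the blanks, a runnable sketch of the model above; max_features, embed_dim, lstm_out, num_classes and the toy X/Y data are assumptions that follow the slide's naming, and the call mirrors the slide's TF 2.x-style Keras API (newer Keras versions simply omit input_length):

import numpy as np
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# Illustrative values; in the slides these come from the tokenised text dataset
max_features, embed_dim, lstm_out, num_classes = 2000, 128, 196, 2
X = np.random.randint(0, max_features, size=(100, 30))    # padded integer sequences
Y = keras.utils.to_categorical(np.random.randint(0, num_classes, 100), num_classes)

model = Sequential()
model.add(Embedding(max_features, embed_dim, input_length=X.shape[1]))   # blank 1: X.shape[1]
model.add(LSTM(lstm_out, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(num_classes, activation='softmax'))                      # blank 2: num of classes
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y, epochs=1, batch_size=32, verbose=0)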
Gates are composed of a ............... neural net layer and a ............... operation
sigmoid, pointwise multiplication
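A small NumPy sketch of a single gate (a forget-style gate): a sigmoid neural-net layer squashes values into (0, 1), and a pointwise multiplication applies them to the cell state; all sizes and weights are illustrative assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

hidden_size, input_size = 4, 3
rng = np.random.default_rng(1)

W_f = rng.standard_normal((hidden_size, hidden_size + input_size)) * 0.5
b_f = np.zeros(hidden_size)

h_prev = rng.standard_normal(hidden_size)   # previous hidden state
x_t = rng.standard_normal(input_size)       # current input
C_prev = rng.standard_normal(hidden_size)   # previous cell state

f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)   # sigmoid layer -> values in (0, 1)
C_gated = f_t * C_prev                                     # pointwise multiplication
# entries of f_t near 0 let nothing through; entries near 1 let everything through
print(f_t, C_gated)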
Transfer learning
Taking features learned on one problem and leveraging them on a new, similar problem (Domain Adaptation).
Gates in LSTM
LSTMs have the ability to remove or add information to the cell state, carefully regulated by structures called gates.
•Gates are a way to optionally let information through.
Gate values 0 and 1: what do they mean?
Zero: let nothing through. One: let everything through.