Chapter 9: RNN and LSTM


RNNs for various NLP tasks

1) Sequence labelling 2) Sequence classification 3) Text generation

LSTM

Addresses the challenges of vanishing gradients (during backpropagation through the hidden layers) and of capturing long-range dependencies in sequential data. Specialized neural units (gates) regulate the flow of information. Modular design.

Word Embedding Matrix

A lookup table that maps each word in the vocabulary to a dense vector representation. The input vector for a word is the corresponding row of the word embedding matrix.
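A minimal NumPy sketch of the lookup; the three-word vocabulary, embedding size, and the names `vocab`, `E` and `embed` are illustrative, not from the chapter:

```python
import numpy as np

vocab = {"the": 0, "cat": 1, "sat": 2}   # hypothetical vocabulary
V, d = len(vocab), 4                     # vocabulary size, embedding dimension

E = np.random.randn(V, d)                # word embedding matrix, one row per word

def embed(word):
    """Return the dense vector for a word: simply the corresponding row of E."""
    return E[vocab[word]]

x = embed("cat")                         # input vector for "cat" = row 1 of E
print(x.shape)                           # (4,)
```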

Text Generation

Used in machine translation, text summarization and question answering. Words are sampled sequentially, conditioned on previous choices, typically from a softmax distribution (autoregressive generation).
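A rough Python sketch of autoregressive sampling; `next_token_logits`, `BOS`/`EOS` and the vocabulary size are hypothetical stand-ins for a trained model's output layer:

```python
import numpy as np

VOCAB_SIZE = 10
BOS, EOS = 0, 1

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def next_token_logits(prefix):
    """Hypothetical stand-in for the model's output logits at the next step."""
    rng = np.random.default_rng(len(prefix))
    return rng.normal(size=VOCAB_SIZE)

# Sample words sequentially, each conditioned on the previous choices.
tokens = [BOS]
while tokens[-1] != EOS and len(tokens) < 20:
    probs = softmax(next_token_logits(tokens))              # softmax distribution over the vocabulary
    tokens.append(int(np.random.choice(VOCAB_SIZE, p=probs)))
print(tokens)
```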

Elaborate on Task 1

Part-of-speech tagging or named entity recognition. Word embeddings as input; tag probabilities as output, generated using a softmax layer; trained with cross-entropy loss.
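An illustrative NumPy sketch of the output side of sequence labelling for a single token, assuming a hidden state `h_t` and a tag projection `W_tag` (both made up here, not a trained model):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

d, num_tags = 8, 5
h_t = np.random.randn(d)             # hidden state for the current token
W_tag = np.random.randn(num_tags, d) # hidden-to-tag projection

tag_probs = softmax(W_tag @ h_t)     # distribution over POS / NER tags
gold_tag = 2                         # index of the correct tag
loss = -np.log(tag_probs[gold_tag])  # cross-entropy loss for this token
print(tag_probs, loss)
```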

Describe RNN computation process

Inputs are processed sequentially (one at a time), multiplied by weight matrices, and combined with the previous hidden layer's value through an activation function. There is no fixed-length limit on context: it can extend back to the beginning of the sentence. Training RNNs (backpropagation) involves two passes:
- Forward inference to accumulate the (cross-entropy) loss and save the hidden layer values.
- Reverse processing to compute gradients and update the weights.
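A minimal NumPy sketch of the forward pass under this description, using the common textbook notation h_t = g(U h_{t-1} + W x_t), with toy sizes and random weights:

```python
import numpy as np

d_in, d_h = 3, 4
W = np.random.randn(d_h, d_in)      # input-to-hidden weights
U = np.random.randn(d_h, d_h)       # hidden-to-hidden (recurrent) weights

xs = [np.random.randn(d_in) for _ in range(5)]   # a toy input sequence
h = np.zeros(d_h)                                # initial hidden state
hidden_states = []
for x_t in xs:                                   # inputs processed one at a time
    h = np.tanh(U @ h + W @ x_t)                 # combine input with previous hidden state
    hidden_states.append(h)                      # saved for use in the backward pass
```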

Issues with Encoder-Decoder Model

The context vector (i.e. the encoder's last hidden state) acts as a bottleneck: the amount of information it can store is limited, especially for long sentences.

Attention mechanism

The context vector is derived by computing a weighted sum of all encoder hidden states, where the weights vary for each generated token, enabling the model to focus on the relevant parts of the input sequence for each token generated in the output sequence. Relevance is scored using dot-product similarity or bilinear models and normalized with a softmax.
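A small NumPy sketch of dot-product attention for one decoder step; `H_enc` and `h_dec` are toy placeholders for the encoder hidden states and the current decoder hidden state:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

d, T = 4, 6
H_enc = np.random.randn(T, d)      # all encoder hidden states (toy values)
h_dec = np.random.randn(d)         # current decoder hidden state

scores = H_enc @ h_dec             # dot-product similarity with each encoder state
alpha = softmax(scores)            # normalized attention weights (vary per output token)
context = alpha @ H_enc            # weighted sum -> context vector for this step
```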

Stacked RNNs

Multiple RNN layers, where the output sequence of one RNN layer serves as the input sequence to the next. The layers create representations at lower levels of abstraction (e.g. edges/textures) and higher levels (more complex/abstract features), hence often outperforming single-layer networks. The optimal number of layers depends on the application and the training set; increasing the number of layers increases the training cost.
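A sketch of a two-layer stacked LSTM using PyTorch's `nn.LSTM` with `num_layers=2`; the sizes are arbitrary examples:

```python
import torch
import torch.nn as nn

# Layer 1's sequence of hidden states is fed as the input sequence to layer 2.
stacked = nn.LSTM(input_size=16, hidden_size=32, num_layers=2, batch_first=True)

x = torch.randn(8, 10, 16)          # (batch, sequence length, input size)
outputs, (h_n, c_n) = stacked(x)
print(outputs.shape)                # torch.Size([8, 10, 32]) - top layer's hidden states
print(h_n.shape)                    # torch.Size([2, 8, 32])  - one final state per layer
```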

Architecture of RNN

A self-supervised model that includes a hidden layer with a recurrent connection, which is responsible for retaining memory of earlier inputs and influencing later decisions.
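In the usual textbook formulation (notation assumed here, not quoted from this set), the recurrent hidden layer and the output layer can be written as:

```latex
h_t = g(U h_{t-1} + W x_t), \qquad y_t = \mathrm{softmax}(V h_t)
```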

Elaborate on Task 2

Sentiment analysis, spam detection, or topic classification. Consists of a feedforward network and a softmax layer combined with cross-entropy loss. Either utilize only the final hidden state for classification, or use pooling to aggregate information from all hidden states in the sequence.
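A NumPy sketch contrasting the two aggregation options (final hidden state vs. pooling) before the softmax classifier; all values below are toy placeholders:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

T, d, num_classes = 6, 8, 3
H = np.random.randn(T, d)                 # hidden states h_1 .. h_T for one sequence
W_cls = np.random.randn(num_classes, d)

z_final = H[-1]                           # option 1: use only the final hidden state
z_pool = H.mean(axis=0)                   # option 2: mean-pool over all hidden states

probs = softmax(W_cls @ z_pool)           # class distribution (e.g. sentiment labels)
gold = 1
loss = -np.log(probs[gold])               # cross-entropy loss
```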

Gates

Specialized neural units: the input gate adds new information to the context, the output gate controls what information is passed to the next hidden state, and the forget gate removes irrelevant information from the context.
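A sketch of a single LSTM step written out gate by gate in NumPy (biases omitted, weights random), to make the role of each gate concrete:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d_in, d_h = 3, 4
Wi, Ui = np.random.randn(d_h, d_in), np.random.randn(d_h, d_h)   # input gate weights
Wf, Uf = np.random.randn(d_h, d_in), np.random.randn(d_h, d_h)   # forget gate weights
Wo, Uo = np.random.randn(d_h, d_in), np.random.randn(d_h, d_h)   # output gate weights
Wg, Ug = np.random.randn(d_h, d_in), np.random.randn(d_h, d_h)   # candidate content weights

x_t = np.random.randn(d_in)
h_prev, c_prev = np.zeros(d_h), np.zeros(d_h)

i = sigmoid(Wi @ x_t + Ui @ h_prev)      # input gate: how much new info to add to context
f = sigmoid(Wf @ x_t + Uf @ h_prev)      # forget gate: what to drop from the context
o = sigmoid(Wo @ x_t + Uo @ h_prev)      # output gate: what to pass to the next hidden state
g = np.tanh(Wg @ x_t + Ug @ h_prev)      # candidate new context content

c_t = f * c_prev + i * g                 # updated context (cell state)
h_t = o * np.tanh(c_t)                   # new hidden state
```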

Teacher forcing

During training, use the gold target sentence (the ground truth) as the input at the next time step, instead of the model's output at the current time step.
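A rough Python sketch of teacher forcing; `decoder_step` is a hypothetical stand-in for one decoder time step, and the token ids are made up:

```python
import numpy as np

VOCAB_SIZE, d = 10, 4

def decoder_step(h, token_id):
    """Hypothetical stand-in for one decoder step: new hidden state + logits."""
    rng = np.random.default_rng(token_id)
    return h + rng.normal(size=h.shape), rng.normal(size=VOCAB_SIZE)

gold = [3, 7, 2, 1]                    # gold target sentence (ground-truth token ids)
inputs = [0] + gold[:-1]               # shifted right: feed the gold previous tokens
h, total_loss = np.zeros(d), 0.0

for x_t, y_t in zip(inputs, gold):
    h, logits = decoder_step(h, x_t)   # input is the gold token, not the model's own prediction
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    total_loss += -np.log(probs[y_t])  # cross-entropy against the gold next token
```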

Weight tying

Use a single set of embeddings at both the input embedding layer and the output softmax layer (tying the weights of these layers). Improves model perplexity and reduces the parameter count, hence reducing redundancy.
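One common way to tie weights in PyTorch is to share the embedding matrix with a bias-free output projection; the sizes below are illustrative:

```python
import torch.nn as nn

V, d = 10000, 256
embedding = nn.Embedding(V, d)             # input lookup table, shape (V, d)
output_layer = nn.Linear(d, V, bias=False) # pre-softmax projection, weight shape (V, d)
output_layer.weight = embedding.weight     # tie: one set of parameters for both layers

# Parameter count for these two layers drops from 2*V*d to V*d.
```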

Bidirectional RNNs

Utilize information from both left and right contexts by running two separate RNNs, one left-to-right and one right-to-left, and combining the forward and backward contexts (e.g. the final hidden states) into a single vector by concatenation, element-wise addition, or element-wise multiplication.
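A sketch using PyTorch's `nn.LSTM` with `bidirectional=True`, which concatenates the forward and backward hidden states at each position; the sizes are arbitrary examples:

```python
import torch
import torch.nn as nn

birnn = nn.LSTM(input_size=16, hidden_size=32, batch_first=True, bidirectional=True)

x = torch.randn(8, 10, 16)        # (batch, sequence length, input size)
outputs, (h_n, c_n) = birnn(x)
print(outputs.shape)              # torch.Size([8, 10, 64]) - forward and backward states concatenated
print(h_n.shape)                  # torch.Size([2, 8, 32])  - final state of each direction
```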

