RNN
Question: In an RNN, the ________ function is used to calculate the output of a neuron.
Answer: Activation Difficulty: Medium Explanation: In an RNN, the activation function is used to calculate the output of a neuron. It is typically a non-linear function such as tanh or ReLU. The activation function is a crucial part of the neuron because it introduces non-linearity, transforming the weighted sum of the inputs into the neuron's output.
Question: A method that allows RNNs to assign weights to different inputs in a sequence in order to focus more on the important parts of the input is known as ________.
Answer: Attention Mechanism Difficulty: Hard Explanation: Attention mechanisms allow RNNs to assign different weights to different parts of the input sequence, effectively allowing the model to 'focus' on the parts of the input that are most relevant to the task at hand. This mechanism has proved particularly useful in tasks like machine translation, where it is important to align the words in the input and output sequences.
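Code sketch (purely illustrative, assuming PyTorch and toy dimensions): a minimal dot-product attention step that scores each encoder hidden state against the current decoder state and forms a weighted context vector.
```python
import torch
import torch.nn.functional as F

# Toy dimensions (assumed for illustration).
seq_len, hidden = 5, 8
encoder_states = torch.randn(seq_len, hidden)   # one hidden state per input position
decoder_state = torch.randn(hidden)             # current decoder hidden state

# Dot-product attention: score each encoder state against the decoder state,
# normalize the scores into weights, and take the weighted sum as the context.
scores = encoder_states @ decoder_state         # (seq_len,)
weights = F.softmax(scores, dim=0)              # attention weights sum to 1
context = weights @ encoder_states              # (hidden,) weighted combination
```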
Question: A type of RNN that uses output from previous time steps as inputs for future time steps is known as an ________.
Answer: Autoregressive RNN Difficulty: Hard Explanation: An Autoregressive RNN is a type of recurrent neural network that uses its own previous outputs as inputs for future steps. This makes it particularly suitable for tasks where the output at each time step depends explicitly on previous outputs, such as time series forecasting or text generation.
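Code sketch (purely illustrative; the cell sizes, start token, and greedy argmax decoding are assumptions): a generation loop in which each step's predicted token is fed back as the next step's input.
```python
import torch
import torch.nn as nn

vocab_size, hidden = 100, 32                      # assumed toy sizes
embed = nn.Embedding(vocab_size, hidden)
rnn_cell = nn.GRUCell(hidden, hidden)
to_logits = nn.Linear(hidden, vocab_size)

token = torch.tensor([0])                         # assumed start-of-sequence token id
h = torch.zeros(1, hidden)
generated = []
for _ in range(10):                               # generate 10 steps
    h = rnn_cell(embed(token), h)                 # update hidden state
    token = to_logits(h).argmax(dim=-1)           # previous output becomes next input
    generated.append(token.item())
```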
Question: ________ in RNNs allows the gradient to be backpropagated through every time step of the sequence, which is done by unfolding the sequence and treating it as a deep neural network with repeating weights.
Answer: Backpropagation Through Time (BPTT) Difficulty: Hard Explanation: Backpropagation Through Time (BPTT) in RNNs is a modification of the standard backpropagation algorithm that allows the gradient to be backpropagated through every time step of the sequence. This is done by unfolding the sequence in time and treating it as a deep feedforward neural network in which the weights are shared across every layer (time step). The unfolded model is then trained using standard backpropagation.
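Code sketch (purely illustrative, with assumed toy shapes): unrolling a shared RNN cell over a short sequence, summing the per-step losses, and calling backward() once so the gradient flows back through every time step.
```python
import torch
import torch.nn as nn

cell = nn.RNNCell(input_size=4, hidden_size=8)    # shared weights at every time step
head = nn.Linear(8, 1)
loss_fn = nn.MSELoss()

xs = torch.randn(6, 1, 4)                         # assumed sequence of 6 time steps
ys = torch.randn(6, 1, 1)                         # assumed per-step targets

h = torch.zeros(1, 8)
loss = 0.0
for t in range(6):                                # unfold the recurrence in time
    h = cell(xs[t], h)
    loss = loss + loss_fn(head(h), ys[t])
loss.backward()                                   # gradients propagate back through all steps
```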
Question: One of the ways to train an RNN is with a method called ________ which updates the weights after computing the loss over all time steps.
Answer: Backpropagation Through Time (BPTT) Difficulty: Hard Explanation: Backpropagation Through Time (BPTT) is a training method used for RNNs. It is a variant of the standard backpropagation algorithm that is applied to the unfolded version of the RNN, computing the loss over all time steps and then updating the weights accordingly.
Question: The process of calculating the gradients of a Recurrent Neural Network by unrolling its computations over time is called ________.
Answer: Backpropagation Through Time (BPTT) Difficulty: Hard Explanation: Backpropagation Through Time (BPTT) is the process of calculating the gradients of a Recurrent Neural Network by unrolling its computations over time. In this approach, the sequence is treated as a deep feed-forward neural network, with the same weights at each layer, but with different inputs and hidden units. The weights are then updated to minimize the error over the entire sequence.
Question: In RNNs, the ________ architecture connects two hidden layers of opposite directions to the same output, helping to capture past and future context.
Answer: Bi-directional RNN Difficulty: Hard Explanation: A bi-directional RNN is a type of RNN that has two hidden layers that pass information in opposite directions. By doing this, it allows the network to have access to past and future context, making it particularly useful for tasks that depend on the surrounding context of an input.
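Code sketch (purely illustrative, assuming PyTorch): a bidirectional RNN whose per-step output concatenates the forward and backward hidden states, so its last dimension is twice the hidden size.
```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=16, bidirectional=True, batch_first=True)

x = torch.randn(2, 7, 10)          # (batch, seq_len, features), assumed toy shape
out, h_n = rnn(x)

print(out.shape)                   # torch.Size([2, 7, 32]) -> forward and backward states concatenated
print(h_n.shape)                   # torch.Size([2, 2, 16]) -> (num_directions, batch, hidden)
```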
Question: The ________ model is a type of RNN architecture that combines a bidirectional RNN with LSTM units to remember the context of words in a sentence.
Answer: Bidirectional LSTM (BiLSTM) Difficulty: Hard Explanation: The Bidirectional LSTM (BiLSTM) model is a type of RNN architecture that combines a bidirectional RNN with LSTM units to remember the context of words in a sentence. BiLSTMs effectively increase the amount of information available to the network by giving the forward and backward passes through the sequence separate hidden layers, whose outputs are then passed on to the next layer.
Question: A type of RNN that connects two hidden layers of opposite directions to the same output is known as a ________.
Answer: Bidirectional RNN Difficulty: Hard Explanation: A Bidirectional RNN is a type of RNN that connects two hidden layers of opposite directions to the same output. With this structure, the output layer can get information from past (backward) and future (forward) states simultaneously. Bidirectional RNNs are particularly useful when the surrounding context of the input is needed.
Question: The ___________ is a type of RNN that can capture dependencies of different time steps in sequence prediction problems.
Answer: Bidirectional RNN Difficulty: Hard Explanation: The Bidirectional RNN is a type of RNN that is specifically designed to capture dependencies across different time steps in sequence prediction problems. Unlike traditional RNNs, which can only preserve information from the past because the only inputs they see are from previous steps, bidirectional RNNs also see future data, which can lead to more accurate predictions.
Question: An RNN architecture that processes data from the past and future of a specific time step is called a ________.
Answer: Bidirectional RNN Difficulty: Hard Explanation: Bidirectional RNNs are a type of recurrent neural network architecture that processes data from both the past and future of a specific time step. They consist of two RNNs - one processing data as normal from past to future, and one processing data from future to past. Outputs from both RNNs are combined to create the final output for each time step. This can provide additional context compared to a unidirectional RNN, potentially improving model performance.
Question: In language modelling, a type of RNN that generates a sequence of outputs, such as a sentence, is known as a ________.
Answer: Decoder Difficulty: Medium Explanation: In language modelling, a decoder is a type of RNN that generates a sequence of outputs, like a sentence, based on some input. This input could be a fixed-size vector representation of the sentence (as in an encoder-decoder model), or the previous words in the sequence (as in a language generation model).
Question: The time step at which the output of the RNN is calculated is often referred to as the ________.
Answer: Decoding Step Difficulty: Medium Explanation: The decoding step in the context of an RNN often refers to the time step at which the output of the network is calculated. In a seq2seq model, for example, the encoder processes the input sequence, and then the decoder generates the output sequence one step at a time, with each of these steps being a decoding step.
Question: An RNN that stacks multiple hidden layers between input and output is called a ________.
Answer: Deep Recurrent Neural Network Difficulty: Medium Explanation: A Deep Recurrent Neural Network is an RNN that stacks multiple hidden (recurrent) layers between input and output. The additional hidden layers enable the network to learn more complex patterns, although they can also make the network harder to train.
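Code sketch (purely illustrative, with assumed sizes): stacking two recurrent layers with the num_layers argument; the second layer consumes the first layer's hidden-state sequence.
```python
import torch
import torch.nn as nn

# Two stacked recurrent layers form a deep RNN.
deep_rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, batch_first=True)

x = torch.randn(4, 15, 10)                 # (batch, seq_len, features), assumed toy shape
out, (h_n, c_n) = deep_rnn(x)

print(out.shape)    # torch.Size([4, 15, 20]) -> outputs of the top layer
print(h_n.shape)    # torch.Size([2, 4, 20]) -> final hidden state of each of the 2 layers
```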
Question: ________ is a technique often used in training RNNs to prevent overfitting, which randomly ignores or "drops" some neurons during training.
Answer: Dropout Difficulty: Medium Explanation: Dropout is a regularization technique often used in training RNNs (and other types of neural networks) to prevent overfitting. During training, it randomly "drops" or ignores some neurons, which helps to prevent complex co-adaptations on the training data that can overfit the model.
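Code sketch (purely illustrative; the sizes and dropout rates are assumptions): dropout applied between stacked LSTM layers and again before the final classifier.
```python
import torch
import torch.nn as nn

class RNNClassifier(nn.Module):
    def __init__(self, vocab=1000, hidden=64, classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        # dropout= applies dropout to the outputs of each LSTM layer except the last.
        self.lstm = nn.LSTM(hidden, hidden, num_layers=2, dropout=0.3, batch_first=True)
        self.drop = nn.Dropout(0.5)              # dropout before the final classifier
        self.fc = nn.Linear(hidden, classes)

    def forward(self, tokens):
        out, _ = self.lstm(self.embed(tokens))
        return self.fc(self.drop(out[:, -1]))    # classify from the last time step

model = RNNClassifier()
logits = model(torch.randint(0, 1000, (4, 12)))  # batch of 4 sequences of 12 token ids
```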
Question: In RNNs, ________ is a method for dealing with different lengths of inputs and outputs which works by processing one time step at a time.
Answer: Dynamic Unrolling Difficulty: Hard Explanation: Dynamic Unrolling is a method used in RNNs for dealing with sequences of different lengths. It works by processing one time step at a time and allowing the sequence length to be dynamic, i.e., changing from one sequence to another. This provides more flexibility in handling different lengths of sequences in the input or output.
Question: In the context of RNNs, ________ refers to the construction of a fixed-size vector representation from a variable-length input sequence.
Answer: Encoding Difficulty: Hard Explanation: Encoding, in the context of RNNs, often refers to the task of creating a fixed-size vector representation from a variable-length input sequence. This is typically done in models like seq2seq or encoder-decoder models, where the encoder RNN processes the input sequence to create a single vector representation.
Question: ________ is a variant of the RNN architecture that uses gating units to control and manage the memory, thereby handling the problem of long-term dependencies.
Answer: Gated Recurrent Unit (GRU) Difficulty: Hard Explanation: The Gated Recurrent Unit (GRU) is a variant of the RNN architecture that uses gating units to control and manage the memory. The GRU, much like the LSTM, has mechanisms by which it can forget its previous state and update it with new information. GRUs can also handle the problem of long-term dependencies, though they use a simpler and less resource-intensive method than LSTMs.
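Code sketch (purely illustrative, with randomly initialized weights and assumed toy sizes): one GRU update written out with explicit reset and update gates; in practice nn.GRU packages the same computation.
```python
import torch

hidden, inputs = 8, 4                            # assumed toy sizes
x = torch.randn(inputs)
h_prev = torch.zeros(hidden)

# Gate parameters (randomly initialized here purely for illustration).
W_z, U_z = torch.randn(hidden, inputs), torch.randn(hidden, hidden)   # update gate
W_r, U_r = torch.randn(hidden, inputs), torch.randn(hidden, hidden)   # reset gate
W_h, U_h = torch.randn(hidden, inputs), torch.randn(hidden, hidden)   # candidate state

z = torch.sigmoid(W_z @ x + U_z @ h_prev)        # how much of the state to update
r = torch.sigmoid(W_r @ x + U_r @ h_prev)        # how much of the past to expose
h_cand = torch.tanh(W_h @ x + U_h @ (r * h_prev))
h_new = (1 - z) * h_prev + z * h_cand            # gated blend of old state and candidate
```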
Question: A variant of RNN that uses gating mechanisms to adaptively control the flow of information in the network is called a ________.
Answer: Gated Recurrent Unit (GRU) Difficulty: Medium Explanation: A Gated Recurrent Unit (GRU) is a variant of the RNN that uses gating mechanisms to adaptively control the flow of information in the network. GRUs have fewer parameters and thus may train a bit faster or need less data to generalize, but they may not capture as many nuances in the data as LSTMs.
Question: ________ is a model similar to an LSTM, but has fewer parameters and thus is faster to train.
Answer: Gated Recurrent Unit (GRU) Difficulty: Medium Explanation: The Gated Recurrent Unit (GRU) is a type of recurrent neural network that, like the LSTM, uses gating units to control and manage the flow of information through the network. However, the GRU has fewer parameters than the LSTM and thus is generally faster to train, while often achieving similar performance.
Question: ________ in an RNN helps in preventing the gradient from exploding, but it does not mitigate the vanishing gradient problem.
Answer: Gradient Clipping Difficulty: Hard Explanation: Gradient Clipping is a technique used in training RNNs that prevents the gradient from becoming too large and causing numerical instability, a problem known as the exploding gradient problem. However, it does not address the vanishing gradient problem, where gradients become too small to effectively train the network.
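Code sketch (purely illustrative; the model, data, and the threshold of 1.0 are assumptions): clipping the global gradient norm after backward() and before the optimizer step.
```python
import torch
import torch.nn as nn

model = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)   # assumed toy model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x, target = torch.randn(4, 25, 10), torch.randn(4, 25, 20)

optimizer.zero_grad()
out, _ = model(x)
loss = nn.functional.mse_loss(out, target)
loss.backward()

# Rescale gradients so their global norm is at most 1.0 (prevents exploding gradients;
# does nothing to help gradients that are already vanishingly small).
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```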
Question: In an RNN, the output at time step t depends not only on the input at that time step, but also on a ________ from previous time steps.
Answer: Hidden State Difficulty: Medium Explanation: In an RNN, the output at time step t depends not only on the input at that time step, but also on a hidden state, which encapsulates some information about a sequence up to the current step. This hidden state allows RNNs to effectively process sequence data by "remembering" some of the past.
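Code sketch (purely illustrative, with assumed toy shapes): the basic recurrence h_t = tanh(W x_t + U h_{t-1} + b), in which each new hidden state depends on the current input and the previous hidden state.
```python
import torch

inputs, hidden, steps = 4, 8, 6                  # assumed toy sizes
W = torch.randn(hidden, inputs)                  # input-to-hidden weights
U = torch.randn(hidden, hidden)                  # hidden-to-hidden (recurrent) weights
b = torch.zeros(hidden)

xs = torch.randn(steps, inputs)
h = torch.zeros(hidden)                          # initial hidden state
for x_t in xs:
    h = torch.tanh(W @ x_t + U @ h + b)          # h carries information from all earlier steps
```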
Question: The variant of RNNs which uses gating units to control and manage the memory, thus solving the vanishing gradient problem, is called a ________.
Answer: LSTM (Long Short-Term Memory) Difficulty: Easy Explanation: LSTMs, or Long Short-Term Memory networks, are a type of RNN that use gating units to control and manage memory. These gates regulate the flow of information into and out of the memory cell of the LSTM, thereby allowing it to learn long-term dependencies and addressing the vanishing gradient problem.
Question: An ________ is a type of network that utilizes special units in addition to standard units in a recurrent neural network, which allows the model to have more complex patterns and longer time lags.
Answer: LSTM (Long Short-Term Memory) Difficulty: Medium Explanation: LSTM, or Long Short-Term Memory, is a type of recurrent neural network that utilizes special units in addition to standard units. These special units, or gates, allow the model to selectively remember or forget information, which enables them to capture more complex patterns and longer time dependencies compared to standard RNNs.
Question: ________ is a technique often used in RNNs where the model is trained to predict the next word in a sentence.
Answer: Language Modeling Difficulty: Medium Explanation: Language Modeling is a technique often used in RNNs where the model is trained to predict the next word in a sentence. This is a fundamental task in natural language processing and is used in a variety of applications, including machine translation, speech recognition, and text generation.
Question: ________ is an RNN training method that mitigates the vanishing gradient problem by using a constant error carousel.
Answer: Long Short-Term Memory (LSTM) Difficulty: Hard Explanation: Long Short-Term Memory (LSTM) is a type of RNN that is designed to mitigate the vanishing gradient problem by using a structure called a constant error carousel. The constant error carousel ensures that the error can flow through many time steps without vanishing or exploding, which makes it possible for the LSTM to learn longer sequences.
Question: A type of neural network designed for processing sequential data that introduces the concept of memory cells is known as ________.
Answer: Long Short-Term Memory (LSTM) Difficulty: Medium Explanation: Long Short-Term Memory (LSTM) is a type of RNN designed for processing sequential data. The key idea behind LSTMs is the introduction of a memory cell which can maintain its state over time, effectively allowing the LSTM to learn when to forget previous hidden states and when to update hidden states given new information.
Question: An extension of a simple RNN that attempts to model temporal dependencies while avoiding the vanishing gradient problem is the ________.
Answer: Long Short-Term Memory (LSTM) Difficulty: Medium Explanation: Long Short-Term Memory (LSTM) is an extension of a simple RNN that was developed to model temporal dependencies while avoiding the vanishing gradient problem. LSTMs introduce the concept of a cell state and utilize a series of gates to control the flow of information into and out of the cell, thereby preserving the error that can be backpropagated through time and layers.
Question: A solution to the vanishing gradient problem in RNNs that introduces a new gate called a forget gate, along with an input gate and output gate, is called ________.
Answer: Long Short-Term Memory (LSTM) Difficulty: Medium Explanation: The Long Short-Term Memory (LSTM) is a solution to the vanishing gradient problem in RNNs. It introduces a new gate called a forget gate, along with an input gate and an output gate. The forget gate determines how much of the previous cell state is discarded, the input gate controls how much new information is added to the cell state, and the output gate controls how much of the updated cell state is exposed as the output. These mechanisms allow LSTMs to remember and forget information over long sequences.
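Code sketch (purely illustrative, with randomly initialized weights and assumed toy sizes): a single LSTM step written out with explicit forget, input, and output gates acting on the cell state.
```python
import torch

inputs, hidden = 4, 8                                    # assumed toy sizes
x = torch.randn(inputs)
h_prev, c_prev = torch.zeros(hidden), torch.zeros(hidden)

def gate_params():
    # Random parameters purely for illustration.
    return torch.randn(hidden, inputs), torch.randn(hidden, hidden), torch.zeros(hidden)

(W_f, U_f, b_f), (W_i, U_i, b_i), (W_o, U_o, b_o), (W_c, U_c, b_c) = (
    gate_params(), gate_params(), gate_params(), gate_params())

f = torch.sigmoid(W_f @ x + U_f @ h_prev + b_f)          # forget gate: what to discard from c_prev
i = torch.sigmoid(W_i @ x + U_i @ h_prev + b_i)          # input gate: how much new info to add
o = torch.sigmoid(W_o @ x + U_o @ h_prev + b_o)          # output gate: what to expose as h
c_cand = torch.tanh(W_c @ x + U_c @ h_prev + b_c)        # candidate cell contents

c_new = f * c_prev + i * c_cand                          # updated cell state
h_new = o * torch.tanh(c_new)                            # new hidden state
```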
Question: A specific issue faced by RNNs, the ________ problem, refers to the difficulty the model faces when trying to learn to connect information where gaps between the relevant information are very large.
Answer: Long-Term Dependency Difficulty: Hard Explanation: The Long-Term Dependency problem is a difficulty faced by RNNs when trying to learn to connect information where gaps between the relevant information are very large. This issue arises because the gradient of the loss function decays exponentially with time, which makes it hard for the RNN to learn the appropriate weights to capture these long-term dependencies.
Question: The ________ problem is a difficulty faced by simple RNNs, where the model is unable to accurately connect the long-term dependencies between inputs and outputs.
Answer: Long-Term Dependency Difficulty: Hard Explanation: The Long-Term Dependency problem is a difficulty faced by simple RNNs, where they are unable to accurately connect the long-term dependencies between inputs and outputs. As a sequence gets longer, it becomes increasingly difficult for the RNN to learn to maintain the influence of an input several steps back. This is due to issues with the training process, specifically the vanishing and exploding gradient phenomena.
Question: The issue of not being able to access information from a long time ago in an input sequence, often faced by standard RNNs, is known as the ________ problem.
Answer: Long-Term Dependency Difficulty: Hard Explanation: The Long-Term Dependency problem is a difficulty faced by standard RNNs, where they struggle to access or use information from a long time ago in the input sequence. This is due to the vanishing gradients problem, which makes it hard for the RNN to learn the appropriate weights to capture these long-term dependencies.
Question: The use of RNNs in sequence-to-sequence models was first introduced in the field of ________.
Answer: Machine Translation Difficulty: Medium Explanation: The use of RNNs in sequence-to-sequence (seq2seq) models was first introduced in the field of machine translation. Seq2seq models, with an encoder RNN to process the input sequence and a decoder RNN to generate the output sequence, have since been used in a variety of other applications, including text summarization, speech recognition, and more.
Question: The term ________ in the context of RNNs refers to the maximum number of steps that the model can look back in time while learning.
Answer: Memory Length Difficulty: Medium Explanation: Memory length, in the context of RNNs, refers to the maximum number of steps that the model can look back in time while learning. In practice, this memory length is often limited by computational constraints and the vanishing gradient problem, which makes it difficult for RNNs to learn long-term dependencies.
Question: In RNNs, a training technique that involves reversing the input sequence while keeping the output the same, useful for many-to-one tasks, is called ________.
Answer: Sequence Reversal Difficulty: Hard Explanation: Sequence Reversal is a training technique used in RNNs that involves reversing the sequence of the input while keeping the output the same. This can be useful in many-to-one tasks, like sentiment analysis, where the final output is a function of the entire input sequence.
Question: ________ is a model that uses two RNNs - one to encode the input sequence into a fixed-length vector, and another to decode that vector into an output sequence.
Answer: Sequence-to-Sequence Model Difficulty: Hard Explanation: Sequence-to-Sequence (Seq2Seq) model is a type of model that uses two RNNs: one to encode the input sequence into a fixed-length vector, and another to decode that vector into an output sequence. This kind of model is often used in tasks where the lengths of the input and output sequences can vary, such as machine translation or text summarization.
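Code sketch (purely illustrative; all sizes, names, and the greedy decoding loop are assumptions): an encoder GRU that compresses the input into its final hidden state, and a decoder cell initialized with that vector and unrolled to emit the output sequence.
```python
import torch
import torch.nn as nn

hidden, src_vocab, tgt_vocab = 32, 50, 60               # assumed toy sizes
enc_embed = nn.Embedding(src_vocab, hidden)
encoder = nn.GRU(hidden, hidden, batch_first=True)
dec_embed = nn.Embedding(tgt_vocab, hidden)
decoder = nn.GRUCell(hidden, hidden)
to_logits = nn.Linear(hidden, tgt_vocab)

src = torch.randint(0, src_vocab, (1, 9))               # one source sequence of length 9
_, h = encoder(enc_embed(src))                          # h: fixed-length summary of the input
h = h.squeeze(0)                                        # (1, hidden)

token = torch.tensor([0])                               # assumed start-of-sequence id
outputs = []
for _ in range(12):                                     # decode up to 12 steps (greedy)
    h = decoder(dec_embed(token), h)
    token = to_logits(h).argmax(dim=-1)
    outputs.append(token.item())
```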
Question: RNNs have a distinctive feature called ________ which helps them carry information from earlier steps of a sequence forward.
Answer: Sequential Memory Difficulty: Medium Explanation: Sequential Memory is the feature of RNNs that allows them to 'remember', or carry forward, information from earlier steps in a sequence via the hidden state. This memory capability is essential in tasks such as time series prediction, natural language processing, and anything else that requires understanding the context from earlier inputs.
Question: A problem faced by RNNs in which the model assigns more importance to recent inputs than to older ones is called the ________ problem.
Answer: Short-term Memory Difficulty: Hard Explanation: The short-term memory problem refers to a limitation of RNNs whereby they struggle to connect information from long ago in the sequence to the present task. This stems from their architecture, which effectively gives more weight to recent inputs than to older ones.
Question: A special kind of RNN which performs the same task for every element of a sequence, with the output being dependent on previous computations is known as a ________.
Answer: Simple RNN Difficulty: Medium Explanation: A Simple RNN, also known as a Vanilla RNN, is a type of RNN that performs the same task for each element in a sequence, with the output being dependent on previous computations. Despite its simplicity, Simple RNNs can model complex temporal structures given enough neurons and layers, but they often struggle with long sequences due to the vanishing gradient problem.
Question: The type of RNN where connections between units form a directed cycle is known as a ________.
Answer: Simple Recurrent Neural Network Difficulty: Medium Explanation: A Simple Recurrent Neural Network is a type of RNN where connections between units form a directed cycle. This architecture allows it to exhibit dynamic temporal behavior and process sequence data. Despite its name, Simple RNNs can represent complex patterns in sequence data, but they often struggle with learning long-term dependencies due to the vanishing gradient problem.
Question: A type of RNN that has connections forming a directed cycle is known as a ________ RNN.
Answer: Simple or Elman Difficulty: Hard Explanation: A simple RNN, also known as an Elman network, is a type of RNN where the connections between units form a directed cycle. This allows information to be passed from one step in the sequence to the next, making it capable of processing sequence data.
Question: ________ is a method in RNNs to overcome the vanishing gradient problem by using a shortcut connection that allows the gradient to flow directly backward through time.
Answer: Skip Connection Difficulty: Hard Explanation: Skip Connections, also known as residual connections, are a method used in RNNs (and other types of networks like CNNs) to overcome the vanishing gradient problem. They work by creating a shortcut connection that bypasses one or more layers, allowing the gradient to flow directly backward through time, which can make it easier for the network to learn long-range dependencies in the data.
Question: ________ is an optimization algorithm that is commonly used in training deep learning models such as RNNs.
Answer: Stochastic Gradient Descent (SGD) Difficulty: Easy Explanation: Stochastic Gradient Descent (SGD) is an optimization algorithm that is commonly used to train deep learning models, including RNNs. It's a variant of the gradient descent algorithm that performs the update step using only a single or a few training examples, which can make the training process faster and more efficient.
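Code sketch (purely illustrative; the learning rate, momentum, and data are assumptions): a single SGD update for a small RNN, computed from one mini-batch.
```python
import torch
import torch.nn as nn

model = nn.RNN(input_size=10, hidden_size=20, batch_first=True)     # assumed toy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

x = torch.randn(8, 30, 10)                       # a mini-batch of 8 sequences
target = torch.randn(8, 30, 20)

optimizer.zero_grad()                            # clear gradients from the previous step
out, _ = model(x)
loss = nn.functional.mse_loss(out, target)
loss.backward()                                  # compute gradients for this mini-batch
optimizer.step()                                 # update weights using only this batch
```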
Question: A specific type of RNN that is structured to make predictions based on sequence data, with the assumption that future steps depend on the past steps, is called a ________.
Answer: Time Series RNN Difficulty: Medium Explanation: A Time Series RNN is a specific type of RNN that is designed to make predictions based on sequence data, where it is assumed that future steps are dependent on past steps. This makes them particularly useful in tasks such as weather forecasting, stock market prediction, and any other type of time series prediction task.
Question: In the context of Recurrent Neural Networks, the problem where the contributions of information decrease geometrically over time is known as the ________ problem.
Answer: Vanishing Gradient Difficulty: Hard Explanation: The Vanishing Gradient problem is a difficulty that occurs in the training of artificial neural networks with gradient-based learning methods. In Recurrent Neural Networks, this problem arises due to the nature of backpropagation in time, where the contribution of information decays geometrically over time and hence, it becomes harder to learn and tune the parameters of earlier layers.
Question: A severe problem that RNNs face when sequences are long is called the ________ problem.
Answer: Vanishing Gradient Difficulty: Medium Explanation: The vanishing gradient problem is a difficulty faced by RNNs, where the gradient of the loss function decays exponentially with time when sequences are long. This makes it hard for the RNN to learn and tune the parameters associated with the early time steps of the sequence.