DSV_ML_DL

What is exploration vs exploitation trade-off?

A dilemma an agent faces while pursuing its objective. Exploration gathers information about the environment but can lead to suboptimal or even negative rewards. With exploitation, the agent selects the actions that yield the highest reward based on its current knowledge; however, it may miss better actions that it would have found by exploring more.
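A minimal sketch of one common way to balance the two, an epsilon-greedy policy on a toy multi-armed bandit (the reward probabilities and the epsilon value are illustrative assumptions, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])   # hypothetical arm reward probabilities
estimates = np.zeros(3)                  # current value estimates per arm
counts = np.zeros(3)
epsilon = 0.1                            # fraction of steps spent exploring

for step in range(1000):
    if rng.random() < epsilon:
        action = int(rng.integers(3))          # explore: pick a random arm
    else:
        action = int(np.argmax(estimates))     # exploit: pick the best arm so far
    reward = float(rng.random() < true_means[action])
    counts[action] += 1
    # incremental average of observed rewards for this arm
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)  # estimates should approach true_means
```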

What is a Dynamical System?

A dynamical system describes how one state of a system moves to another state; in other words, a system that changes over time according to a set of fixed rules. Any function involving recurrence can be considered an RNN.
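A minimal sketch of a dynamical system as a fixed update rule applied repeatedly; the logistic map used here is just an illustrative choice of rule:

```python
def step(x, r=3.7):
    # fixed rule: the next state depends only on the current state
    return r * x * (1.0 - x)

x = 0.4              # initial state
trajectory = [x]
for t in range(10):  # apply the same rule over and over
    x = step(x)
    trajectory.append(x)

print(trajectory)
```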

What are RNNs?

A family of neural networks specialized for processing sequential data. RNNs can handle long sequences of variable length. At each "sweep" (time step), the hidden state accumulates information from the previous hidden state, which allows predictions based on sequential context; this is useful in speech recognition and other tasks.
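A minimal sketch of a vanilla RNN forward pass in NumPy, showing how each step mixes the new input with the previous hidden state (the dimensions, tanh activation, and random initialization are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, seq_len = 4, 8, 5

W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

xs = rng.normal(size=(seq_len, input_dim))  # one input sequence
h = np.zeros(hidden_dim)                    # initial hidden state

for x_t in xs:
    # each step accumulates information from the previous hidden state
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)

print(h)  # the final hidden state summarizes the whole sequence
```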

What is bootstrapping, and why can it be a bad idea?

Bootstrapping is resampling from the data by drawing samples uniformly at random, with replacement. It can be helpful in scenarios with noisy or small amounts of data. However, it can lead to low generalizability on unseen data, as it can overfit to the training data.
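A minimal sketch of drawing bootstrap samples and using them to estimate the spread of a statistic; the data here is synthetic and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=30)   # a small, noisy sample

boot_means = []
for _ in range(1000):
    # resample the same number of points, uniformly, with replacement
    sample = rng.choice(data, size=len(data), replace=True)
    boot_means.append(sample.mean())

# bootstrap estimate of the mean and of its variability
print(np.mean(boot_means), np.std(boot_means))
```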

What is the Exploding gradient problem?

Exploding gradients occur when back-propagated gradients grow very large, producing big updates to the weights, which can make training unstable and the predictions wrong.
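A minimal numerical sketch of how repeated multiplication by factors larger than one makes a gradient blow up across many steps (the factor 1.5 and the step count are illustrative assumptions):

```python
grad = 1.0
for step in range(50):
    grad *= 1.5        # e.g. a recurrent weight whose norm is > 1
print(grad)            # ~6.4e8: far too large for a stable weight update
```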

Difference between GRU and LSTM?

GRU is also a type of RNN and is slightly different from LSTM. The main difference between LSTM and GRU is the number of gates. 1) LSTM uses three gates (input, forget, and output) while GRU uses two (reset and update). 2) In GRU, the update gate plays a role similar to the LSTM's input and forget gates combined, and the reset gate decides how much past information to forget. 3) GRU is faster compared to LSTM. 4) GRU needs less data to generalize.

How does GRU work?

GRUs are similar to LSTMs but have only an update and a reset gate. Update gate: responsible for determining how much of the previous information needs to be passed along to the next state. Reset gate: decides whether the previous state is important or not, i.e. how much of it to forget when forming the new candidate state.
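A minimal sketch of a single GRU step in NumPy, following the standard update/reset formulation (the dimensions and random initialization are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 8

def init(shape):
    return rng.normal(scale=0.1, size=shape)

# one input-to-hidden and one hidden-to-hidden matrix per gate
W_z, U_z = init((hidden_dim, input_dim)), init((hidden_dim, hidden_dim))  # update gate
W_r, U_r = init((hidden_dim, input_dim)), init((hidden_dim, hidden_dim))  # reset gate
W_h, U_h = init((hidden_dim, input_dim)), init((hidden_dim, hidden_dim))  # candidate state

def gru_step(x_t, h_prev):
    z = sigmoid(W_z @ x_t + U_z @ h_prev)               # how much to update
    r = sigmoid(W_r @ x_t + U_r @ h_prev)               # how much past to forget
    h_tilde = np.tanh(W_h @ x_t + U_h @ (r * h_prev))   # candidate new state
    return (1 - z) * h_prev + z * h_tilde               # blend old and new state

h = gru_step(rng.normal(size=input_dim), np.zeros(hidden_dim))
print(h)
```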

What is gradient clipping?

Gradient clipping is a way to rectify the exploding gradient problem. Gradient clipping can interchangeably refer to two things. Gradient scaling: normalizing the error gradient vector so that its norm equals a defined value, such as 1.0. Gradient clipping: forcing gradient values to a specified minimum or maximum value if they exceed an expected range.
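A minimal sketch of both variants in NumPy (the threshold values are illustrative assumptions):

```python
import numpy as np

def clip_by_norm(grad, max_norm=1.0):
    # gradient scaling: rescale so the vector norm does not exceed max_norm
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

def clip_by_value(grad, min_val=-0.5, max_val=0.5):
    # gradient clipping: force each component into [min_val, max_val]
    return np.clip(grad, min_val, max_val)

g = np.array([3.0, -4.0])          # norm 5.0
print(clip_by_norm(g))             # [ 0.6 -0.8], norm 1.0
print(clip_by_value(g))            # [ 0.5 -0.5]
```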

Give some examples of GRU uses

Image captioning, traffic prediction, sentiment analysis, music generation.

Explain gates used in LSTM with their functions.

LSTM has three gates: input, forget, and output. The input gate is used to add new information to the cell state, the forget gate is used to discard information, and the output gate decides which information passes to the hidden state and output layer.

How does LSTM solve RNN's problem?

LSTM solves the vanishing gradient problem that is most prominent in vanilla RNN models.

Describe the contents of an LSTM and what each gate does.

An LSTM, at least the vanilla version, has three main gates. Input gate: decides which information will be stored in the long-term memory (it judges the usefulness of the incoming information). Forget gate: decides which information from the long-term memory is kept. Output gate: takes the current input and cell state to produce the new short-term memory, which is passed on to the next cell at the next time step or unit.
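A minimal sketch of one LSTM step in NumPy with the three gates made explicit (the shapes, random initialization, and omitted biases are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 8
# one weight matrix per gate plus one for the candidate cell update,
# each acting on the concatenated [input, previous hidden state]
W = {name: rng.normal(scale=0.1, size=(hidden_dim, input_dim + hidden_dim))
     for name in ("input", "forget", "output", "cand")}

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([x_t, h_prev])
    i = sigmoid(W["input"] @ z)        # input gate: what to write
    f = sigmoid(W["forget"] @ z)       # forget gate: what to keep
    o = sigmoid(W["output"] @ z)       # output gate: what to expose
    c_tilde = np.tanh(W["cand"] @ z)   # candidate cell content
    c = f * c_prev + i * c_tilde       # long-term memory (cell state)
    h = o * np.tanh(c)                 # short-term memory (hidden state)
    return h, c

h, c = lstm_step(rng.normal(size=input_dim),
                 np.zeros(hidden_dim), np.zeros(hidden_dim))
print(h)
```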

In which data input situations would you prefer an LSTM to a GRU unit?

LSTMs are preferred when input sequences have long-term dependencies or require selective retention of multiple features, while GRUs are preferred when computational efficiency is a priority or input sequences have shorter dependencies.

What is Lasso Regression?

Lasso tries to eliminate irrelevant features that do not help with the prediction. This is done using an L1 penalty, which adds a factor of the sum of the absolute values of the coefficients to the objective function.
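A minimal sketch using scikit-learn's Lasso on synthetic data (the alpha value and the data-generating setup are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# only the first two features actually matter
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)   # coefficients of the irrelevant features shrink toward zero
```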

What is markov decision process?

A mathematical framework used to define reinforcement learning problems, where an agent interacts with its environment to achieve a goal. An MDP can be defined as the tuple (S, A, P, R, γ), where S is the finite set of states, A is the set of actions, P is the state transition probability function, R is the reward function, and γ is the discount factor.
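A minimal sketch of a toy MDP written out as its (S, A, P, R, γ) components, solved with value iteration; the transition probabilities and rewards are made up for illustration:

```python
import numpy as np

# Tiny hypothetical MDP: 2 states, 2 actions.
# P[s, a, s'] = transition probability, R[s, a] = expected reward.
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9          # discount factor

# Value iteration: repeatedly apply the Bellman optimality backup.
V = np.zeros(2)
for _ in range(200):
    Q = R + gamma * (P @ V)   # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
    V = Q.max(axis=1)

print(V)                      # optimal value of each state
```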

What are CNNs?

A regularized version of the multilayer perceptron. CNNs assemble patterns of increasing complexity out of smaller patterns captured by filters. Formally, a CNN is a neural network with convolutional layers; each convolutional layer has a number of filters that perform convolution. A convolution can be thought of as "looking at a function's surroundings to make better/more accurate predictions of its outcome."
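A minimal sketch of a single 2D convolution (valid padding, stride 1) applied to a small image with one filter; the image values and the filter are illustrative assumptions:

```python
import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # each output value looks at a small neighbourhood of the input
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_filter = np.array([[1.0, 0.0, -1.0]] * 3)   # simple vertical-edge detector
print(conv2d(image, edge_filter))
```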

What is Ridge Regression?

Ridge penalizes (shrinks) irrelevant features that do not help with prediction. This is done using an L2 penalty, which adds a factor of the sum of the squares of the coefficients to the optimization objective.
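A minimal sketch using scikit-learn's Ridge on the same kind of synthetic data as the Lasso example above (the alpha value is an illustrative assumption):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_)   # coefficients are shrunk, but typically not exactly zero
```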

What are LSTMs?

A special type of RNN that can learn order dependence in sequence prediction problems. It also mitigates the vanishing gradient problem.

What is the main difference between LSTM and RNN?

The main advantage of using an LSTM is avoiding vanishing gradients. Another feature of the LSTM is the forget gate, which judges which token/input to give importance to, by discarding either information carried over from the previous hidden state or the new input.

What is the vanishing gradient problem?

When doing backpropagation, you calculate gradients for each layer. However, when the gradients become smaller and smaller through each hidden layer, they end up so small that they barely affect the earlier hidden layers' weights; in deep networks, the backpropagated updates are close to zero. ReLU and its variants address the vanishing gradient problem.
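A minimal numerical sketch of why this happens: the sigmoid's derivative is at most 0.25, so backpropagation through many sigmoid layers multiplies many such small factors together (the depth of 20 layers is an illustrative assumption):

```python
grad = 1.0
for layer in range(20):
    grad *= 0.25       # maximum possible sigmoid derivative per layer
print(grad)            # ~9e-13: effectively zero for the earliest layers' updates
```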

What is the link between Linear Regression and Logistic Regression?

In logistic regression, the weighted linear combination of the inputs (as in linear regression) is passed to a sigmoid function for classifying between 0 and 1. You can also do multi-class and ordinal classification with logistic regression.
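A minimal sketch of the link: compute a linear-regression-style score w·x + b, then squash it with the sigmoid to get a class probability (the weights and input here are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([1.5, -2.0])    # hypothetical learned coefficients
b = 0.3
x = np.array([0.8, 0.1])

linear_score = w @ x + b     # this part is exactly a linear model
probability = sigmoid(linear_score)
print(probability)           # P(y = 1 | x); threshold at 0.5 to classify
```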

What would be the consequence to an LSTM unit if the forget gate (ft) was removed?

Problems can arise such as: vanishing or exploding gradients and difficulty in training; accumulation of irrelevant information in the cell state, leading to overfitting. In summary, removing the forget gate impairs the LSTM's ability to selectively retain or discard information, leading to much worse modeling.

