L2: MLP, backpropagation and activation functions


Which loss function should be used in a binary classification task?

(Binary) Cross-entropy

Name 5 activation functions

1) Linear 2) Sigmoid 3) ReLU 4) Hyperbolic tangent 5) Softmax

Give three examples of loss functions

1) Mean squared error (MSE) 2) Mean absolute error 3) Cross-entropy
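
The three losses can be sketched in plain Python as follows. This is only an illustration: the function names are made up here, and the cross-entropy version assumes a one-hot target and predicted probabilities strictly greater than 0.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_entropy(y_true, y_pred):
    """Categorical cross-entropy; assumes y_true is one-hot and y_pred > 0."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)


print(mean_squared_error([1, 2, 3], [1, 2, 5]))
print(mean_absolute_error([1, 2, 3], [1, 2, 5]))
print(cross_entropy([0, 1, 0], [0.2, 0.7, 0.1]))
```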

What defines an MLP's output layer for a binary classification task?

A single neuron with sigmoid activation function.

What is Leaky-ReLU?

A variant of ReLU where for negative inputs, a small slope (<1) is used.
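
A minimal sketch of that definition, next to plain ReLU for comparison. The slope of 0.01 is a commonly used value but is an assumption here; the card only says the slope is small.

```python
def relu(z: float) -> float:
    """Plain ReLU: negative inputs are clipped to 0."""
    return max(z, 0.0)


def leaky_relu(z: float, slope: float = 0.01) -> float:
    """Leaky ReLU: positive inputs pass through, negative ones are scaled by `slope`."""
    return z if z > 0 else slope * z


for z in (-2.0, 0.0, 3.0):
    print(z, relu(z), leaky_relu(z))
```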

In a regression task, what should the output produce?

Any real value (in principle from -∞ to ∞)

What is a solution to the vanishing gradient problem?

An activation function with a constant or linear derivative, so that the gradient does not shrink at every layer.

At which stage does an MLP optimize its parameters?

Backpropagation

Which loss function should be used in a multiclass classification task?

Categorical cross-entropy

What are the layers in a multilayer perceptron or feed-forward network called?

Dense or Fully connected layers

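What is meant by the 'vanishing gradient'?

Due to the chain rule, the gradient is a product of per-layer derivatives. When these derivatives are smaller than 1, the gradient gets smaller layer after layer as it is propagated backward.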
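
The shrinking effect can be illustrated with a rough sketch. It assumes a chain of sigmoid units with all weights equal to 1 and the same pre-activation `z` at every layer, which is a simplification chosen only to make the product easy to follow.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z: float) -> float:
    """Derivative of the sigmoid; its maximum value is 0.25, reached at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def chained_gradient(num_layers: int, z: float = 0.0) -> float:
    """Multiply one sigmoid derivative per layer, as the chain rule does."""
    grad = 1.0
    for _ in range(num_layers):
        grad *= sigmoid_grad(z)
    return grad


for n in (1, 5, 10):
    print(n, chained_gradient(n))
```

Even in the best case for the sigmoid (z = 0), each extra layer multiplies the gradient by 0.25.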

What is meant by 'dying ReLU'?

During training, a ReLU unit can fall into a state where its output is 0 for any input. It is very difficult to recover from such a state, as the gradient will also be 0.

What is another name for the multilayer perceptron (MLP)?

Feed-forward network

What defines the output layer of an MLP that is designed for a classification task of five different classes?

Five neurons with softmax activation function.

What is special about the multilayer perceptron (MLP) or feedforward network?

Formed by more than one layer.

At which stage does an MLP calculate the loss/error?

Forward propagation

To which activation function does this formula belong? tanh(z) = (e^(2z) - 1)/(e^(2z) + 1)

Hyperbolic tangent
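
As a quick sanity check, the hyperbolic tangent written as (e^(2z) - 1)/(e^(2z) + 1) can be compared against Python's built-in `math.tanh`. The helper name `tanh_from_formula` is made up for this sketch.

```python
import math


def tanh_from_formula(z: float) -> float:
    """Evaluate tanh(z) = (e^(2z) - 1) / (e^(2z) + 1) directly."""
    e2z = math.exp(2.0 * z)
    return (e2z - 1.0) / (e2z + 1.0)


for z in (-1.0, 0.0, 0.5):
    print(z, tanh_from_formula(z), math.tanh(z))
```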

What does the chain rule say?

If L = f(a) and a = g(z), then dL/dz = dL/da × da/dz

When is a linear activation function appropriate?

In the output layer of a regression task

Give the formula for the cross-entropy loss function that is used in binary classification.

L = -y log(p) - (1-y) log(1-p)
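
A small sketch of this formula in Python. The clipping of `p` with `eps` is an added assumption to avoid log(0); it is not part of the formula on the card.

```python
import math


def binary_cross_entropy(y: float, p: float, eps: float = 1e-12) -> float:
    """L = -y log(p) - (1 - y) log(1 - p), with p clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return -y * math.log(p) - (1.0 - y) * math.log(1.0 - p)


# A confident correct prediction gives a small loss, a confident wrong one a large loss.
print(binary_cross_entropy(1, 0.9))
print(binary_cross_entropy(1, 0.1))
```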

To which activation function does this formula belong? σ(z) = z

Linear

Which activation function should be used in a regression task?

Linear

What is another name for the feed-forward network?

Multilayer perceptron (MLP)

What is special about the XOR?

No single straight line can separate the two classes.

Can the perceptron solve the XOR?

No, the perceptron can only separate the space linearly.
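
By contrast, a network with one hidden layer can represent XOR. The sketch below uses hand-picked weights and a step activation, both chosen here purely for illustration (nothing is learned): one hidden unit acts as OR, the other as AND, and the output fires when OR is on but AND is off.

```python
def step(z: float) -> int:
    """Threshold activation: 1 if the input is positive, else 0."""
    return 1 if z > 0 else 0


def xor_mlp(x1: int, x2: int) -> int:
    h_or = step(x1 + x2 - 0.5)      # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)     # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)  # OR but not AND


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_mlp(x1, x2))
```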

In a multiclass classification task, what should the output produce?

One value per class, matching the one-hot encoding that indicates the correct class

Give an example of an activation function with a constant or linear derivative.

ReLU

To which activation function does this formula belong? ReLU(z) = max(z, 0)

ReLU

To which activation function does this formula belong? σ(z) = 1/(1+e^(-z))

Sigmoid

Which activation function should be used in a binary classification task?

Sigmoid (or tanh, if the two classes are encoded as -1 and 1)

To which activation function does this formula belong? σ(z_i) = e^(z_i)/(∑_{j=1}^N e^(z_j))

Softmax
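
A minimal softmax sketch in plain Python. Subtracting the maximum before exponentiating is an added assumption for numerical stability; it does not change the result, because the common factor cancels between numerator and denominator.

```python
import math


def softmax(z):
    """Return e^(z_i) / sum_j e^(z_j) for every entry of the list z."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
print(probs, sum(probs))
```

The outputs are positive and sum to 1, so they can be read as class probabilities.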

Which activation function should be used in a multiclass classification task?

Softmax

How do you know how each individual parameter contributes to the error?

Take the derivative of the error (cost function) with respect to each parameter.

What does the loss (error/cost) function calculate?

The "cost" or distance between the network's output and the expected one.

What is dL/dw_ij^l?

The derivative of the error (cost function) with respect to each parameter.

What calculates the "cost" or distance between the network's output and the expected one?

The loss (error/cost) function

What happens at the stage 'forward propagation'?

The input is passed through the layers to produce the network's output, and the loss/error is calculated.

What are the outputs of XOR?

The output is 1 if exactly one of x1 and x2 is 1, and 0 otherwise.

For an MLP with n layers, what is the output of the network (y)?

The output of the last layer

What happens at the stage 'backpropagation'?

The parameters (weights and biases) are updated while minimizing the loss (with gradient descent).
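
The update itself can be sketched as a plain gradient descent step. The toy loss L(w) = (w - 3)^2, the learning rate of 0.1, and the function names are assumptions chosen only to keep the example short; in an MLP the gradients would come from backpropagation instead of a hand-written formula.

```python
def gradient_descent_step(params, grads, learning_rate=0.1):
    """Move every parameter a small step against its gradient."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


def toy_loss_grad(w: float) -> float:
    """Gradient of the toy loss L(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


params = [0.0]
for _ in range(50):
    grads = [toy_loss_grad(params[0])]
    params = gradient_descent_step(params, grads)

print(params[0])  # approaches the minimizer w = 3
```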

For what is a nonlinear activation function needed?

To learn more complex functions.

In a binary classification task, what should the output produce?

One of two different values

For an MLP with n layers, the output of each layer (a^l) is defined as: a^l = f(W^l × a^(l-1) + b^l). What is a^0? What is a^n?

a^0 = x (the input) and a^n = y (the output of the network)

How is the output of each layer (a^l) defined, for an MLP with n layers?

a^l = f(W^l × a^(l-1) + b^l)
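
The recurrence can be sketched as a loop over layers, starting from a^0 = x and returning a^n = y. The weights, biases, and the choice of ReLU followed by sigmoid below are arbitrary values picked for illustration, and the helper names are hypothetical.

```python
import math


def relu(z: float) -> float:
    return max(z, 0.0)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(W, a_prev, b, f):
    """Compute f(W × a_prev + b) for one layer; W is a list of weight rows."""
    return [f(sum(w * a for w, a in zip(row, a_prev)) + bias)
            for row, bias in zip(W, b)]


def forward(x, layers):
    a = x  # a^0 = x
    for W, b, f in layers:
        a = dense_layer(W, a, b, f)
    return a  # a^n = y


layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),  # hidden layer, 2 units
    ([[1.0, 1.0]], [0.0], sigmoid),                 # output layer, 1 unit
]
y = forward([2.0, 1.0], layers)
print(y)
```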

Suppose an MLP starts with an input x and a weight W₁, so that u₁ = W₁x. One more hidden-layer step leads from u₁ to a₁, and a₁ is used to calculate the loss. Give the formula for how much the parameter W₁ contributes to the error.

dloss/dW₁ = dloss/da₁ × da₁/du₁ × du₁/dW₁
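
This product of three factors can be checked numerically. The sketch below assumes a sigmoid activation (a₁ = σ(u₁)) and a squared-error loss, neither of which is specified on the card, and compares the chain-rule gradient with a finite-difference estimate.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def loss_fn(w1: float, x: float, y: float) -> float:
    """Assumed setup: u1 = w1 * x, a1 = sigmoid(u1), loss = (a1 - y)^2."""
    a1 = sigmoid(w1 * x)
    return (a1 - y) ** 2


def grad_w1(w1: float, x: float, y: float) -> float:
    """dloss/dW1 = dloss/da1 × da1/du1 × du1/dW1."""
    u1 = w1 * x
    a1 = sigmoid(u1)
    dloss_da1 = 2.0 * (a1 - y)
    da1_du1 = a1 * (1.0 - a1)
    du1_dw1 = x
    return dloss_da1 * da1_du1 * du1_dw1


def numerical_grad(w1: float, x: float, y: float, h: float = 1e-6) -> float:
    """Central finite-difference estimate of the same derivative."""
    return (loss_fn(w1 + h, x, y) - loss_fn(w1 - h, x, y)) / (2.0 * h)


print(grad_w1(0.5, 2.0, 1.0), numerical_grad(0.5, 2.0, 1.0))
```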

In the formula for the cross-entropy loss function that is used in binary classification ( L = -y log(p) - (1-y) log(1-p) ), what is p and what is y?

p is the output of the network and y is the target output.

Give the formula of the linear activation function

σ(z) = z

