Lecture 6-7: Neural networks


Describe the cost and misclassification rate trends for a model 1) without a decaying learning rate 2) with a decaying learning rate

1) Without a decaying learning rate, the cost and misclassification rate are expected to decrease as the model improves during training. However, if the learning rate is not set appropriately, the cost and misclassification rate may oscillate or diverge, indicating that the model is not learning effectively. 2) A decaying learning rate helps avoid oscillations or divergence and improves the convergence of the model. The cost and misclassification rate are expected to decrease as training progresses, albeit more slowly, because the learning rate keeps shrinking.
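For concreteness, a minimal sketch of one possible decay schedule (inverse-time decay; the specific schedule is an assumption, not necessarily the one used in the lecture):

# Hypothetical inverse-time decay: the step size shrinks as training progresses.
def decayed_lr(initial_lr, decay_rate, step):
    return initial_lr / (1.0 + decay_rate * step)

for step in (0, 100, 1000, 10000):
    print(step, decayed_lr(0.1, 0.01, step))   # 0.1, 0.05, ~0.009, ~0.001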

Explain how the LSTM is working internally (principally, 5 steps)?

1. Determine how much of the old long-term memory to retain (weights, bias, sigm - the forget gate) 2. Calculate the potential new contribution to long-term memory (weights, bias, tanh) 3. Determine how much of this potential contribution to add (weights, bias, sigm - the input gate) 4. Process the short-term memory and current input, as in an ordinary RNN, to form an output gate (weights, bias, sigm) 5. Scale the (tanh of the) updated long-term memory with this gate to produce the new short-term memory/output
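A minimal NumPy sketch of one LSTM cell step following these five steps (the variable and gate names are illustrative, not taken from the lecture):

import numpy as np

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev, x] to the four gates; b holds the biases."""
    z = np.concatenate([h_prev, x])
    f = sigm(W["f"] @ z + b["f"])           # 1. how much old long-term memory to keep
    c_tilde = np.tanh(W["c"] @ z + b["c"])  # 2. candidate contribution to long-term memory
    i = sigm(W["i"] @ z + b["i"])           # 3. how much of the candidate to add
    c = f * c_prev + i * c_tilde            #    updated long-term memory (cell state)
    o = sigm(W["o"] @ z + b["o"])           # 4. gate from short-term memory and input (RNN-like)
    h = o * np.tanh(c)                      # 5. output scaled by the long-term memory
    return h, c

# Tiny usage with random weights (hidden size 2, input size 3).
rng = np.random.default_rng(0)
nh, nx = 2, 3
W = {k: rng.standard_normal((nh, nh + nx)) for k in "fcio"}
b = {k: np.zeros(nh) for k in "fcio"}
h, c = lstm_step(rng.standard_normal(nx), np.zeros(nh), np.zeros(nh), W, b)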

Explain the dropout principle.

A regularization technique commonly used in deep learning to prevent overfitting. It works by randomly setting a portion of the activations in a layer to zero during training, which forces the network to learn multiple independent representations of the data.
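A minimal NumPy sketch of (inverted) dropout on a vector of activations, assuming the usual convention of rescaling the kept units so that nothing has to change at inference time:

import numpy as np

def dropout(activations, rate, training=True, rng=np.random.default_rng(0)):
    """Zero a random fraction `rate` of activations; scale the rest to keep the expected value."""
    if not training or rate == 0.0:
        return activations
    keep = rng.random(activations.shape) >= rate   # mask of units that stay active
    return activations * keep / (1.0 - rate)       # inverted-dropout scaling

print(dropout(np.ones(10), rate=0.5))   # roughly half the entries become zero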

Give the names of three pretrained networks. (No details required.)

AlexNet, ResNet, VGGNet

Describe the rationale behind convolutional neural networks, e.g., parameter sharing.

CNNs represent the input and hidden layers as matrices (or tensors), preserving spatial information. Sparse interactions and parameter sharing are key characteristics of CNNs that make them well suited to grid-like data such as images: they let the model learn rich and robust representations of the input while reducing the number of network parameters and improving generalization performance. Sparse interactions: each output element of a convolutional layer depends only on a small number of input elements, because features are extracted with small filters. Parameter sharing: the same filter weights are applied at every spatial location of the input, so the network reuses the same parameters across the whole image and learns more efficient representations of the data.
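A small illustration of why parameter sharing matters, comparing parameter counts for a dense layer and a convolutional layer on a 28x28 image (the layer sizes are made up for illustration):

# Dense layer: every output unit connects to every input pixel.
n_in = 28 * 28
n_out = 28 * 28                      # same-sized output, for comparison
dense_params = n_in * n_out + n_out  # weights + biases

# Convolutional layer: each 3x3 filter is shared across all positions.
n_filters = 16
conv_params = n_filters * (3 * 3 * 1 + 1)   # 3x3 weights per filter (+ bias), reused everywhere

print(dense_params)   # 615 440
print(conv_params)    # 160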

Explain how to recognize overfitting and how to use dropout to avoid overfitting (Examples 6.2-4). [figure]

Overfitting is recognized when the training cost keeps decreasing while the validation cost or misclassification rate stops improving or starts increasing, i.e., the gap between training and validation performance grows. Dropout is a regularization technique that is commonly used to avoid overfitting in neural networks. It works by randomly dropping out a fraction of the neurons in the model during training, which prevents the neurons from co-adapting and forces the model to learn more robust and generalizable representations of the data. To use dropout in a neural network, you can simply insert a dropout layer after each hidden layer of the network. The dropout layer randomly sets a fraction of the neurons to zero during training, while keeping all the neurons active during evaluation or inference. The fraction of neurons that are dropped out is controlled by a hyperparameter called the dropout rate (a value between 0 and 1); a higher dropout rate gives stronger regularization. Dropout is one of the most popular regularization techniques because it performs well and is simple and computationally cheap.
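For example, with PyTorch (a sketch; the layer sizes and the dropout rate of 0.5 are illustrative):

import torch.nn as nn

# A small fully connected network with a dropout layer after each hidden layer.
model = nn.Sequential(
    nn.Linear(784, 200), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(200, 200), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(200, 10),
)
# model.train() enables dropout during training; model.eval() disables it for inference.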

Make calculations of number of parameters in different neural networks? Examples 6.1 and 6.3 (be able to perform calculation). [figure]

Example 6.1 - The MNIST dataset of handwritten digits. One layer from 28x28 = 784 pixels to 10 classes (0-9) means 784*10 + 10 = 7 850 parameters (weights + biases). Two layers with 200 hidden units means (784*200 + 200) + (200*10 + 10) = 159 010 parameters.
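The same arithmetic as a quick Python check:

# One dense layer: 784 inputs -> 10 outputs
print(784 * 10 + 10)                         # 7850

# Two layers: 784 -> 200 hidden units -> 10 outputs
print((784 * 200 + 200) + (200 * 10 + 10))   # 159010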

What input data is CNN suited for?

Grid-like data with spatial dependencies, such as images. Used for tasks like image classification, object detection, and image segmentation.

Describe which layers that can be inside a feature extraction frontend and a classification backend in a CNN.

In the feature extraction frontend of a CNN: Convolutional layers apply learnable filters to the input data to extract features at different scales and locations. Pooling layers down-sample the input data by applying a pooling operation, such as max or average pooling. In the classification backend of a CNN: Fully connected layers (dense layers) classify the features extracted by the frontend. Consist of neurons connected to all neurons in the previous layer. Use extracted features to make predictions about the input data.
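As a sketch, a small PyTorch model with this frontend/backend split, assuming a 28x28 grayscale input such as MNIST (the exact layer sizes are illustrative):

import torch.nn as nn

model = nn.Sequential(
    # Feature extraction frontend
    nn.Conv2d(1, 32, kernel_size=3), nn.ReLU(),   # convolutional layer: learnable filters
    nn.MaxPool2d(2),                              # pooling layer: down-sampling
    nn.Conv2d(32, 64, kernel_size=3), nn.ReLU(),
    nn.MaxPool2d(2),
    # Classification backend
    nn.Flatten(),
    nn.Linear(64 * 5 * 5, 128), nn.ReLU(),        # fully connected (dense) layers
    nn.Linear(128, 10),                           # class scores for 10 classes
)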

Describe the structure of neural networks and their parameters.

Inspired by the structure and function of the human brain. They consist of layers of interconnected nodes (neurons), which process and transmit information. The layers of neurons in a NN are organized into an input layer, one or more hidden layers, and an output layer. The input layer receives the input data and passes it through the hidden layers, which perform computations on the data using a set of weights and biases. The weights and biases are the parameters of the neural network. The output layer produces the final output of the neural network, which can be a prediction, a classification, or a reconstruction of the input data.
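A minimal NumPy sketch of the forward pass through a network with one hidden layer (the sizes are illustrative):

import numpy as np

def forward(x, W1, b1, W2, b2):
    """Input layer -> hidden layer (weights W1, bias b1) -> output layer (W2, b2)."""
    h = np.maximum(0.0, W1 @ x + b1)   # hidden layer with ReLU activation
    return W2 @ h + b2                 # output layer, e.g. class scores

rng = np.random.default_rng(0)
x = rng.standard_normal(784)                            # input, e.g. a flattened 28x28 image
W1, b1 = rng.standard_normal((200, 784)) * 0.01, np.zeros(200)
W2, b2 = rng.standard_normal((10, 200)) * 0.01, np.zeros(10)
print(forward(x, W1, b1, W2, b2).shape)                 # (10,)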

Why was the LSTM developed from the RNN?

LSTM (Long Short-Term Memory) networks are a type of RNN designed to address the issue of long-term dependencies in sequential data. Traditional RNNs suffer from the problem of vanishing gradients, which makes it difficult for the network to learn long-term dependencies. LSTMs use a more complex structure with memory cells, input gates, output gates, and forget gates to allow the network to selectively retain or forget information from previous time steps, mitigating the vanishing gradient problem.

Describe what is unique with RNNs and CNNs and what they have in common.

RNNs: Process sequential data, capture temporal dependencies (time series). CNNs: Process grid-like data, capture spatial dependencies. Both: Learn hierarchical representations of data, achieve excellent performance on a wide range of tasks.

What does RNN stand for?

Recurrent Neural Network

What input data is RNN suited for?

Sequential data with temporal dependencies. Language translation and speech recognition are common applications.

Give examples of spatial filters and what output they produce for a simple input.

Spatial filters are image processing techniques that operate on the spatial domain of an image (the two-dimensional array of pixels that represents the image). They modify the pixel values of an image based on the values of the pixels in the surrounding area. Examples: a Sobel filter produces an edge map (large output values where intensities change sharply, near zero in flat regions); a median filter produces a smoothed image with salt-and-pepper noise removed; a Gabor filter responds strongly to oriented structures such as edges and textures at a particular orientation and frequency.
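As a small example, a Sobel filter applied to a simple input containing a vertical edge, using SciPy's 2-D convolution:

import numpy as np
from scipy.signal import convolve2d

# Simple input: dark left half, bright right half (a vertical edge).
image = np.zeros((5, 5))
image[:, 3:] = 1.0

# Sobel filter that responds to horizontal intensity changes (vertical edges).
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

# Output: large-magnitude values along the edge, zeros in the flat region.
print(convolve2d(image, sobel_x, mode="valid"))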

What are convolutional layers with stride, and pooling layers?

Stride and pooling layers are used to reduce the number of hidden units, reduce the size of the data passed through the network, and increase computational efficiency. Stride: in convolutional layers, the stride determines the step size by which the filter moves across the input during the convolution operation; a larger stride results in a smaller output feature map. Pooling layers: down-sample the input by applying a pooling operation, such as max or average pooling, replacing a group of adjacent pixels with a single value; this reduces complexity and improves generalization performance.
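A small sketch of the standard output-size arithmetic for convolution and pooling (floor division; the lecture may write the formula differently):

def output_size(n, f, stride=1, padding=0):
    """Spatial size of the output of a convolution/pooling with filter size f on an n x n input."""
    return (n + 2 * padding - f) // stride + 1

print(output_size(28, 3, stride=1))   # 26: 3x3 convolution on a 28x28 input
print(output_size(28, 3, stride=2))   # 13: same filter, stride 2 -> smaller feature map
print(output_size(26, 2, stride=2))   # 13: 2x2 max pooling halves the size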

Explain how CNNs are trained.

Supervised learning approach using a labeled dataset and loss function. Network trained to minimize loss by adjusting weights and biases.

Explain how RNNs are trained.

Supervised learning approach using a labeled dataset and loss function. Similar to CNN training, but temporal dependencies are captured through feedback connections: the hidden state (output) from the previous time step is used as an additional input at the current time step, and gradients are computed by unrolling the network through time (backpropagation through time).
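A minimal NumPy sketch of the recurrence that creates these temporal dependencies (names and sizes are illustrative):

import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One time step: the new hidden state depends on the input and the previous hidden state."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(0)
nx, nh = 3, 4
W_x, W_h, b = rng.standard_normal((nh, nx)), rng.standard_normal((nh, nh)), np.zeros(nh)
h = np.zeros(nh)
for x_t in rng.standard_normal((5, nx)):   # unroll over a sequence of 5 time steps
    h = rnn_step(x_t, h, W_x, W_h, b)
print(h.shape)                             # (4,)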

Both CNNs and RNNs are typically trained using a variant of stochastic gradient descent. how does it work?

The NN is updated based on the gradient of the loss function with respect to the weights and biases, which are iteratively adjusted to minimize the loss function. This is typically done using a combination of forward propagation to compute the predicted output and backpropagation to compute the gradients of the loss function. The "stochastic" part means that each update uses the gradient estimated on a single example or a small random mini-batch rather than the full dataset.
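A toy sketch of the gradient-descent update on a one-parameter problem, standing in for the full forward/backward pass of a network:

# Minimize (w - 3)^2 with gradient-descent updates.
w = 0.0
for _ in range(100):
    grad = 2 * (w - 3)    # gradient of the loss w.r.t. w (backpropagation in a real network)
    w = w - 0.1 * grad    # step against the gradient, scaled by the learning rate
print(round(w, 3))        # approximately 3.0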

What is important to think about when initializing a neural network? For ReLU?

The cost functions for training NNs are usually non-convex, which implies that training is sensitive to the values of the initial parameters. Important factors to consider when initializing a neural network are the scale, symmetry, and sparsity of the weights. For ReLU (Rectified Linear Unit), the scale of the weights should be small, to keep units out of the flat (zero) region of the ReLU where gradients approach zero and the model stops learning.
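A small sketch of one common choice for ReLU layers, He/Kaiming-style scaling (assuming this is the scheme intended; the lecture may use another):

import numpy as np

def he_init(n_out, n_in, rng=np.random.default_rng(0)):
    """Random weights scaled so ReLU activations keep a reasonable variance; also breaks symmetry."""
    return rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / n_in)

W1 = he_init(200, 784)   # first layer of a 784 -> 200 -> 10 network
print(W1.std())          # roughly sqrt(2/784) ≈ 0.05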

Which derivatives are needed for calculating gradients in a 2-layer neural network (no calculations needed)?

To calculate the gradients of the loss function with respect to all of the parameters in a 2-layer neural network, you will need to take the derivative of the loss function with respect to each of the weights and biases in both layers. For a 2-layer neural network, the parameters include the weights and biases of both the first and second layers. This will involve using the chain rule.
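Schematically, writing the hidden layer as $h = \sigma(W_1 x + b_1)$ and the output as $\hat{y} = f(W_2 h + b_2)$ (the notation is illustrative), the chain rule gives:

\[
\frac{\partial L}{\partial W_2} = \frac{\partial L}{\partial \hat{y}}\,\frac{\partial \hat{y}}{\partial W_2}, \qquad
\frac{\partial L}{\partial b_2} = \frac{\partial L}{\partial \hat{y}}\,\frac{\partial \hat{y}}{\partial b_2}, \qquad
\frac{\partial L}{\partial W_1} = \frac{\partial L}{\partial \hat{y}}\,\frac{\partial \hat{y}}{\partial h}\,\frac{\partial h}{\partial W_1}, \qquad
\frac{\partial L}{\partial b_1} = \frac{\partial L}{\partial \hat{y}}\,\frac{\partial \hat{y}}{\partial h}\,\frac{\partial h}{\partial b_1}.
\]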

What is zero-padding in the context of CNN?

Zero-padding is used to preserve the spatial dimensions of the input data and control the size of the output feature map. (Zero-padding refers to the process of adding a border of zero-valued pixels around the input data before applying a convolutional operation. This allows the output feature map to have the same spatial dimensions as the input data, which can be useful for a number of reasons. It is typically implemented by adding an equal number of zero-valued pixels to the top, bottom, left, and right sides of the input data.)
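A tiny NumPy illustration (np.pad adds a border of zeros by default):

import numpy as np

x = np.arange(9.0).reshape(3, 3)
padded = np.pad(x, 1)   # one zero-valued pixel added on every side
print(padded.shape)     # (5, 5): a 3x3 filter now produces a 3x3 output again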

Illustrate three different activation functions and give their names. [figure]

[figure]
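A small sketch of three common choices, assuming the figure shows the usual trio of sigmoid, tanh, and ReLU:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # S-shaped, output in (0, 1)

def tanh(z):
    return np.tanh(z)                 # S-shaped, output in (-1, 1)

def relu(z):
    return np.maximum(0.0, z)         # 0 for negative inputs, identity for positive

z = np.linspace(-3, 3, 7)
print(sigmoid(z), tanh(z), relu(z), sep="\n")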

