CS 254 Exam 2
Which of the following is not true about using convolutional neural networks (CNNs) for image analysis?
A CNN can be trained for unsupervised learning tasks, whereas an ordinary neural net cannot.
Having a max pooling layer between two convolutional layers always decreases the number of parameters (weights).
FALSE
If you increase the number of hidden layers in a Multi-Layer Perceptron (fully connected network), the classification error on test data always decreases.
FALSE
In backpropagation learning, we should start with a small learning rate parameter and slowly increase it during the learning process.
FALSE
Suppose a convolutional neural network is trained on the MNIST dataset (handwritten digits). This trained model is then given a completely black image (all-zero input). The output probabilities for this input would be zero for all classes.
FALSE
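Why this is false: a softmax output layer produces probabilities that sum to 1, so they can never all be zero. A minimal NumPy sketch, with made-up logits standing in for whatever a trained network would emit:

    import numpy as np

    def softmax(logits):
        # Subtract the max for numerical stability; the result always sums to 1.
        exps = np.exp(logits - np.max(logits))
        return exps / exps.sum()

    # Hypothetical logits for an all-black input; any values work.
    logits = np.array([0.3, -1.2, 0.0, 2.1, 0.5, -0.7, 1.0, 0.2, -0.4, 0.8])
    print(softmax(logits).sum())  # 1.0 -- the outputs cannot all be zero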
Suppose a convolutional neural network is trained on the MNIST dataset (handwritten digits). This trained model is then given a completely white image as input. The output probabilities for this input would be equal for all classes.
FALSE
Suppose we have a neural network with one hidden layer, as shown below. The hidden layer in this network works as a dimensionality reducer. Now, instead of using this hidden layer, we replace it with a dimensionality reduction technique such as PCA. Will the network that uses the dimensionality reduction technique always give the same output as the network with the hidden layer?
FALSE
The number of neurons in the output layer must match the number of classes (where the number of classes is greater than 2) in a supervised learning task.
FALSE
Weight sharing can occur in a convolutional neural network or a fully connected neural network (multi-layer perceptron).
FALSE
What are the steps for using a gradient descent algorithm in training neural networks?
1. Initialize random weights and biases.
2. Pass an input through the network and get values from the output layer.
3. Calculate the error between the actual value and the predicted value.
4. Go to each neuron that contributes to the error and change its respective values to reduce the error.
5. Reiterate until you find the best weights for the network.
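A minimal sketch of these steps in NumPy, assuming a single linear neuron trained with squared error (the data, shapes, and learning rate here are illustrative, not part of the exam):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))             # toy inputs
    y = X @ np.array([1.5, -2.0, 0.5]) + 1.0  # toy targets

    w = rng.normal(size=3)                    # step 1: random weights
    b = 0.0                                   #         and bias
    lr = 0.1

    for epoch in range(200):                  # step 5: reiterate
        pred = X @ w + b                      # step 2: forward pass
        err = pred - y                        # step 3: error
        w -= lr * (X.T @ err) / len(y)        # step 4: nudge each weight
        b -= lr * err.mean()                  #         to reduce the error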
Batch Normalization is helpful because
It normalizes (changes) all the inputs before sending them to the next layer
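A minimal sketch of what a batch-normalization layer does in the forward pass (NumPy; gamma and beta are the learnable scale and shift, and the batch size and feature count are arbitrary):

    import numpy as np

    def batch_norm(x, gamma, beta, eps=1e-5):
        # Normalize each feature over the batch, then rescale and shift.
        x_hat = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
        return gamma * x_hat + beta

    x = np.random.randn(32, 4) * 10 + 5       # batch of 32, badly scaled features
    out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
    print(out.mean(axis=0), out.std(axis=0))  # ~0 mean, ~1 std per feature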
In a neural network, knowing the weight and bias of each neuron is the most important step. If you can somehow get the correct value of weight and bias for each neuron, you can approximate any function. What would be the best way to approach this?
Iteratively check, after assigning a value, how far you are from the best values, and slightly change the assigned values to improve them
Which of the following gives non-linearity to a neural network?
Rectified Linear Unit
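One way to see that ReLU is non-linear: it fails additivity, since relu(a + b) generally differs from relu(a) + relu(b). A quick NumPy check:

    import numpy as np

    relu = lambda z: np.maximum(0.0, z)

    a, b = 2.0, -3.0
    print(relu(a + b))        # 0.0
    print(relu(a) + relu(b))  # 2.0 -- not equal, so ReLU is not linear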
Suppose you want to redesign the AlexNet architecture to reduce the number of arithmetic operations required for each backprop update. Which one of these choices would reduce the number of arithmetic operations the most?
Removing a convolutional layer
A multiple-layer neural network with linear activation functions is equivalent to one single-layer perceptron that uses the same error function on the output layer and has the same number of inputs.
TRUE
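A sketch of why: with no non-linear activation, stacking layers just multiplies the weight matrices, so the whole network collapses to a single linear map (NumPy, biases omitted and shapes chosen arbitrarily):

    import numpy as np

    rng = np.random.default_rng(1)
    W1 = rng.normal(size=(4, 3))       # layer 1
    W2 = rng.normal(size=(2, 4))       # layer 2
    x = rng.normal(size=3)

    deep = W2 @ (W1 @ x)               # two linear layers
    shallow = (W2 @ W1) @ x            # one equivalent layer
    print(np.allclose(deep, shallow))  # True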
A neural network with multiple hidden layers and sigmoid nodes can form non-linear decision boundaries.
TRUE
A perceptron is guaranteed to perfectly learn a given linearly separable function within a finite number of training steps.
TRUE
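A minimal sketch of the perceptron learning rule on a linearly separable toy problem (logical AND with labels in {-1, +1}); by the perceptron convergence theorem this loop is guaranteed to terminate:

    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([-1, -1, -1, 1])      # AND, linearly separable

    w, b = np.zeros(2), 0.0
    converged = False
    while not converged:
        converged = True
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0: # misclassified (or on the boundary)
                w += yi * xi           # perceptron update
                b += yi
                converged = False
    print(w, b)                        # a separating hyperplane for AND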
In a neural network, Data Augmentation, Dropout, and Regularization all deal with overfitting.
TRUE
In a neural network, Dropout, Regularization and Batch normalization all deal with overfitting.
TRUE
It is possible to use 1x1 convolution filters.
TRUE
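A 1x1 convolution mixes channels at each spatial position without touching neighbouring pixels; in effect it is a per-pixel linear layer across channels. A NumPy sketch with made-up shapes:

    import numpy as np

    def conv1x1(x, w):
        # x: (C_in, H, W); w: (C_out, C_in). Each pixel's channel vector
        # is multiplied by w independently of its neighbours.
        return np.einsum('oc,chw->ohw', w, x)

    x = np.random.randn(64, 28, 28)    # 64 input channels
    w = np.random.randn(16, 64)        # reduce to 16 channels
    print(conv1x1(x, w).shape)         # (16, 28, 28)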
Using momentum with gradient descent helps to find solutions more quickly.
TRUE
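A sketch of the classic momentum update on a toy, poorly conditioned quadratic (the learning rate and momentum coefficient are illustrative):

    import numpy as np

    A = np.array([[10.0, 0.0], [0.0, 1.0]])  # poorly conditioned quadratic
    grad = lambda w: A @ w                   # gradient of 0.5 * w.T @ A @ w

    w = np.array([1.0, 1.0])
    v = np.zeros_like(w)
    lr, mu = 0.02, 0.9
    for _ in range(300):
        v = mu * v - lr * grad(w)  # velocity accumulates in consistent directions
        w = w + v                  # damped oscillation, faster progress
    print(w)                       # near the minimum at the origin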
When a pooling layer is added to a convolutional neural network, translation invariance is preserved.
TRUE
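More precisely, max pooling makes the output invariant to small shifts that stay within a pooling window. A toy 2x2 max-pooling illustration in NumPy:

    import numpy as np

    def maxpool2x2(x):
        h, w = x.shape
        return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

    x = np.zeros((4, 4)); x[1, 1] = 1.0              # a single bright pixel
    shifted = np.zeros((4, 4)); shifted[0, 0] = 1.0  # shifted within its window

    print(np.array_equal(maxpool2x2(x), maxpool2x2(shifted)))  # True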
A dead unit in a neural network is a unit that is not updated during training by any of its neighbours.
TRUE
Using mini-batch gradient descent, the model update frequency is higher than with batch gradient descent, which allows for more robust convergence and helps avoid local minima.
TRUE
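A sketch of the update-frequency difference: with N samples and batch size B, mini-batch gradient descent makes N/B weight updates per epoch, versus one for full-batch (toy linear model in NumPy; all numbers are illustrative):

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(size=(1000, 5))
    y = X @ rng.normal(size=5)
    w, lr, B = np.zeros(5), 0.05, 50

    updates = 0
    for epoch in range(5):
        idx = rng.permutation(len(y))
        for start in range(0, len(y), B):
            batch = idx[start:start + B]
            err = X[batch] @ w - y[batch]
            w -= lr * (X[batch].T @ err) / B  # one update per mini-batch
            updates += 1
    print(updates)  # 100 updates; full-batch would have made only 5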
For a classification task, instead of random weight initializations in a neural network, we set all the weights to zero. Which of the following statements is true?
The neural network will train but all the neurons will end up recognizing the same thing
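A sketch of the symmetry problem behind this answer: with all-zero initialization, hidden units in the same layer get identical gradients and never differentiate (tiny two-hidden-unit sigmoid network; shapes and learning rate are illustrative):

    import numpy as np

    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    W1, W2 = np.zeros((2, 3)), np.zeros((1, 2))  # all-zero initialization
    x, y = np.array([0.5, -1.0, 2.0]), 1.0

    for _ in range(100):
        h = sig(W1 @ x)                 # both hidden units output the same value
        out = sig(W2 @ h)
        d_out = (out - y) * out * (1 - out)
        dW2 = np.outer(d_out, h)
        d_h = (W2.T @ d_out) * h * (1 - h)
        dW1 = np.outer(d_h, x)
        W2 -= 0.5 * dW2
        W1 -= 0.5 * dW1

    print(W1)  # both rows identical: the two units learned the same thing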
Which of the following is NOT correct about the ReLU (Rectified Linear Unit) activation function?
Zero-centered output.