Deep Learning Midterm 2
The Jaccard Coefficient is represented by: σ(X,Y)= (|X∩Y|)/(|X∪Y|) σ(X,Y)= (|X∪Y|)/(|X∩Y|) σ(X,Y)= (|X-Y|)/(|X∪Y|) σ(X,Y)= (|X∩Y|)/(|X-Y|)
σ(X,Y)= (|X∩Y|)/(|X∪Y|)
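A minimal sketch of the coefficient above using Python sets. The function name `jaccard` and the convention that two empty sets score 1.0 are assumptions made for illustration, not part of the original card.

```python
def jaccard(x, y):
    """Return |X ∩ Y| / |X ∪ Y| for two collections, treated as sets."""
    x, y = set(x), set(y)
    union = x | y
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(x & y) / len(union)


# Two shared elements (b, c) out of four distinct elements overall.
print(jaccard({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
```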
As a default contrib.learn.DNNClassifier(hidden_units=[300,100], n_classes=10, feature_columns=feature_cols) will use A max value classifier A softmax classifier
A softmax classifier
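To make "softmax classifier" concrete, here is a small standard-library sketch of the softmax function itself. It is not the DNNClassifier implementation; the logits are made-up values chosen only for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical output-layer scores for a 3-class problem.
probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))  # class with the highest probability
print(probs, predicted_class)
```

Every output lies strictly between 0 and 1, and the outputs always sum to 1, which is why they can be read as class probabilities.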
Manifold assumption, also called the manifold hypothesis, holds that most real-world X-dimensional datasets lie close to a much Y-dimensional manifold. (X=high, Y=high) (X=low, Y=low) (X=low, Y=high) (X=high, Y=low)
(X=high, Y=low)
In an LSTM 1) A ______________ gate controls which parts of the long-term state should be erased. 2) A ______________ gate controls which parts should be added to the long-term state. 3) A _______________ gate controls which parts of the long-term state should be read and output at this time step.
1) forget 2) input 3) output
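The three gates can be sketched as a single LSTM time step on scalars. This is a deliberate simplification (real cells use vectors and weight matrices), and the function name `lstm_step` and the weight dictionary layout are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, w):
    """One scalar LSTM step. `w` maps a gate name to (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # near 0 erases the long-term state, near 1 keeps it
    i = sigmoid(pre("input"))        # how much of the candidate is added
    o = sigmoid(pre("output"))       # how much of the long-term state is read out
    g = math.tanh(pre("candidate"))  # candidate values to add

    c = f * c_prev + i * g           # new long-term state
    h = o * math.tanh(c)             # new short-term state / output
    return h, c


# With all-zero weights every gate sits at 0.5 and the candidate is 0,
# so half of the previous long-term state survives.
zero_weights = {name: (0.0, 0.0, 0.0)
                for name in ("forget", "input", "output", "candidate")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, w=zero_weights)
print(h, c)
```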
Match the technique with the picture: (tsne=t-Sne, iso=Isomap, mds=Multidimensional scaling) 1) Swirl 2) Stretched parallelogram 3) Abstract fractal
1) mds 2) iso 3) tsne
If we are training the skip-gram model with a context window of size 2, then for each center word, how many training examples are typically generated? (choose the best answer) 2 4 6 8
4
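A short sketch of why the answer is 4: a window of size 2 covers two words on each side of the center word. The helper name `skipgram_pairs` and the sample sentence are assumptions for illustration; words near the edges of a sentence get fewer pairs, which is why the question says "typically".

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs for a skip-gram model."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # the center word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


tokens = "the quick brown fox jumps".split()
pairs = skipgram_pairs(tokens, window=2)

# "brown" has two neighbours on each side, so it yields 4 pairs.
brown_pairs = [p for p in pairs if p[0] == "brown"]
print(brown_pairs)
```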
You're given a collection of images and asked to build a model to classify each image into 1 of 100 classes. You decide to build a DNN for this. If each image is 30x30 pixels and the training batch size is 50: The number of neurons in the passthrough (input) layer is: _______________________. If you have 1000 neurons in the first hidden layer, then the weights matrix at this layer would have __________________ by __________________ dimensions. If you have 800 neurons in the second hidden layer, then the weights matrix at this layer would have _____________ by _____________ dimensions. The number of neurons in the output layer is ____________.
900; 900 by 1000; 1000 by 800; 100
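The arithmetic behind these answers can be sketched as follows, assuming single-channel (grayscale) images so that each image flattens to 30 x 30 inputs. The helper name `weight_shapes` is hypothetical. Note that the batch size of 50 does not appear anywhere: it changes the shape of the data fed through the network, not the shape of the weights.

```python
def weight_shapes(n_inputs, layer_sizes):
    """Return the (rows, cols) shape of each fully connected layer's weight matrix."""
    shapes = []
    fan_in = n_inputs
    for n_neurons in layer_sizes:
        shapes.append((fan_in, n_neurons))  # inputs to the layer by neurons in it
        fan_in = n_neurons                  # this layer's outputs feed the next layer
    return shapes


n_inputs = 30 * 30  # assumption: one value per pixel (grayscale)
shapes = weight_shapes(n_inputs, [1000, 800, 100])
print(n_inputs)  # 900
print(shapes)    # [(900, 1000), (1000, 800), (800, 100)]
```

The last entry is the output layer: 800 inputs feeding 100 neurons, one per class.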
The vanishing/exploding gradients problem in RNNs is controlled using which of the following tricks: (Select all possible answers) Good parameter initialization Non-saturating activation function (eg. ReLU) Batch Normalization Gradient Clipping Faster Optimizers
All of these are correct
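Of the tricks listed, gradient clipping is the easiest to show in a few lines. The sketch below clips by norm, which is one common variant (clipping each component to a fixed range is another); the function name `clip_by_norm` and the threshold are assumptions for illustration.

```python
import math


def clip_by_norm(grad, max_norm):
    """Rescale a gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)          # small gradients pass through unchanged
    scale = max_norm / norm
    return [g * scale for g in grad]  # direction is kept, length is capped


clipped = clip_by_norm([3.0, 4.0], max_norm=1.0)  # original norm is 5
print(clipped)
```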
Select all the Unsupervised Learning Algorithms from the list k-Means Clustering Apriori Logistic Regression Multinomial Naive Bayes Principal Component Analysis
k-means clustering, apriori, pca
In RNNs "Dynamic Unrolling Through Time" is used to Avoid 'out of memory' errors Improve the accuracy of the network Avoid the vanishing gradient problem Avoid the exploding gradient problem All of the above None of the above
Avoid 'out of memory' errors
The use of a DropoutWrapper is TF's remedy to Avoid the vanishing gradient problem Avoid the exploding gradient problem Avoid overfitting All of the above None
Avoid overfitting
When training RNNs on long sequences, they may suffer from the [blank] gradient problem Vanishing Exploding Both A and B Neither
Both A and B
True or False. A Manifold Hypothesis holds that most real world high-dimensional datasets lie very far away from a much lower-dimensional manifold based on the same data. (An example to think about would be multidimensional scaling).
False
True or False. The lower the dimensionality of the word embeddings vectors, the higher will be the accuracy on various semantic analogy tasks
False
True or False. Truncated Back Propagation through Time is when you unroll an RNN for an unlimited number of time steps during training.
False
The higher the dimensionality of the word embeddings vectors, the higher will be the accuracy on various semantic analogy tasks True False
False (Accuracy is not monotonic in embedding dimensionality: it tends to rise up to a point and then level off or decline)
In Scikit-Learn's GridSearchCV, all you need to do is tell it which _______________ you want it to experiment with, and what values to try out, and it will evaluate all the possible combinations of ___________________ values, using cross-validation.
hyperparameters, hyperparameter
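To illustrate what "all the possible combinations" means, the sketch below enumerates a grid with the standard library. It is not GridSearchCV's implementation, and the parameter names and values in `param_grid` are hypothetical.

```python
from itertools import product

# Hypothetical grid: 3 values for one hyperparameter, 4 for another.
param_grid = {"n_estimators": [3, 10, 30], "max_features": [2, 4, 6, 8]}


def expand_grid(grid):
    """Return one dict per combination of hyperparameter values."""
    names = sorted(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


combos = expand_grid(param_grid)
print(len(combos))  # 3 * 4 = 12
print(combos[0])
```

With cross-validation, each of these combinations is trained and scored once per fold, so the cost grows with both the grid size and the number of folds.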
The best activation function to use with Linear Regression is: Sigmoid ReLU TanH None of the above
None of the above
Which of the following statements are true: 1) A high-frequency term in the corpus has a low TF-IDF score 2) High Frequency of a term in an individual document has no impact on its TF-IDF score Only 1 Only 2 Both 1 and 2 Neither
Only 1
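Both statements can be checked with a small sketch. It uses one common unsmoothed definition, tf x log(N / df); libraries often apply smoothed variants, so exact scores will differ. The function name `tf_idf` and the toy corpus are assumptions for illustration.

```python
import math


def tf_idf(term, doc, corpus):
    """Unsmoothed TF-IDF of `term` in `doc`, where `corpus` is a list of token lists."""
    tf = doc.count(term) / len(doc)                   # frequency within this document
    df = sum(1 for d in corpus if term in d)          # number of documents containing the term
    idf = math.log(len(corpus) / df) if df else 0.0   # rarer across the corpus means higher idf
    return tf * idf


corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]

# "the" appears in every document, so its idf (and its score) is 0.
print(tf_idf("the", corpus[0], corpus))
# "mat" appears in only one document, so it scores higher.
print(tf_idf("mat", corpus[0], corpus))
```

Statement 2 is false because the tf factor grows with the term's count inside the individual document.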
Your classifier identifies 9 females in a scene where there are 11 females and 13 males. But only 6 are correct and the remaining 3 are males. What is the precision and recall? Precision is 9/11, recall is 9/13 Precision is 9/13, recall is 9/11 Precision is 6/13, recall is 6/11 Precision is 6/9, recall is 6/11
Precision is 6/9, recall is 6/11
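The counts in this question map onto the usual definitions as follows: 6 true positives (females correctly identified), 3 false positives (males labeled female), and 11 - 6 = 5 false negatives (females missed). The helper name `precision_recall` is an assumption for illustration.

```python
from fractions import Fraction


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return Fraction(tp, tp + fp), Fraction(tp, tp + fn)


tp = 6        # females correctly identified
fp = 3        # males wrongly identified as female
fn = 11 - 6   # females the classifier missed

precision, recall = precision_recall(tp, fp, fn)
print(precision, recall)  # 2/3 6/11  (6/9 reduces to 2/3)
```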
Which of the following are not Dimensionality Reduction Approaches (select all that apply) Projection Preservation Manifold Learning Principal Component Analysis Distribution
Preservation, Distribution
As a default contrib.learn.DNNClassifier(hidden_units=[300,100], n_classes=10, feature_columns=feature_cols) will use ReLU Sigmoid activation
ReLU
The curves/lines above can represent the softmax values for four inputs to the softmax function impossible! surely for some values not for four, but for two inputs (blue and green)
impossible!
Which of the following are not applications of RNNs (select all that apply) Time Series Analysis Shortest Distance Computation Parts of Speech Tagging Text Summarization Handwriting Recognition
Shortest Distance Computation
Unrolling an RNN through time, known as Backpropagation Through Time, is an important step in [blank] of an RNN Training Testing Validation Compression
Training
RNNs can achieve high image recognition accuracy (>98% on MNIST data). True or False
True
True or False. PCA requires the whole training set to fit in memory in order for the
True
True or False. The GRU cell is a simplified version of the LSTM cell
True
True or False. Under the hood, the DNNClassifier class creates all the neuron layers, based on the ReLU activation function
True
In TF, when using e.g. the command lstm_cell = tf.contrib.rnn.BasicLSTMCell(num_units=n_neurons) LSTM cells manage how many separate state vectors? One num_units Two Any number declared by a separate command
Two
The command: contrib.learn.DNNClassifier(hidden_units=[300,100], n_classes=10, feature_columns=feature_cols) will create a neural network with Two deep layers and no output layer Two layers and an output layer One hundred layers of 300 neurons and an output layer Three hundred layers of 100 neurons and an output layer
Two layers and an output layer
To analyze time series data, such as stock prices, you would use recurrent neural networks (RNN) convolutional neural networks (CNN) Either Neither
recurrent neural networks (RNN)
A ___________ node and its _____________ method are used to save models in TensorFlow
saver, save
In TensorFlow, when you evaluate [blank], TensorFlow automatically determines the set of nodes that it depends on
a node (for example, a variable or an operation)