Deep Learning With Python
What purpose does a loss function serve in deep learning?
It serves as a feedback signal
What is the process of manually engineering good layers of representations for data?
Feature engineering
What are kernel methods?
Kernel methods are a group of classification algorithms, the best known of which is the support vector machine (SVM)
The central class of Keras is the ___________________.
Layer
What might be a better term than 'deep' learning?
Layered representations learning, or hierarchical representations learning. These better describe what is actually happening.
How many axes does a 5-D tensor have?
Five
The Sequential class is used only for what arrangement of layers?
For linear stacks of layers
What does a loss function measure?
How the network will be able to measure its performance on the training data, and thus how it will be able to steer itself in the right direction.
You can intuitively understand the dimensionality of your representation space as _______________________
"how much freedom you're allowing the model to have when learning internal representations."
>>> x = np.zeros((300, 20)) >>> x = np.transpose(x) >>> print(x.shape) What is the new shape?
(20, 300)
If you have 60,000 28x28 images, what is the shape of the resulting tensor?
(60000, 28, 28)
The MNIST dataset has 60,000 training images each of which is 28 x 28. What is its shape?
(60000, 28, 28)
model = keras.Sequential([ layers.Dense(16, activation="relu"), layers.Dense(16, activation="relu"), layers.Dense(1, activation="sigmoid") ]) Having 16 units means the weight matrix W will have what shape?
(input_dimension, 16)
a = tf.ones((2, 2)) What does this describe?
A Tensor
What are examples of transformations?
A coordinate system change, or linear projections (which may destroy information), translations, nonlinear operations (such as "select all points such that x > 0"), and so on
What is a decision boundary?
A decision boundary can be thought of as a line or surface separating your training data into two spaces corresponding to two categories
What is a deep-learning model?
A directed, acyclic graph of layers
What are the three parts of the compilation step?
A loss function, an optimizer, and metrics to monitor during training and testing
What is a tensor that contains only one number?
A scalar
What does a 10-way softmax layer return?
An array of 10 probability scores which sum to 1
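To make the "10 probability scores which sum to 1" concrete, here is a minimal NumPy sketch of the softmax operation. The `softmax` helper and the `logits` values are illustrative assumptions, not taken from Keras; the Keras layer computes the same quantity per sample.

```python
import numpy as np

def softmax(logits):
    # Subtracting the max is a standard numerical-stability trick;
    # it does not change the resulting probabilities.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical raw scores for 10 classes (illustrative values only).
logits = np.array([2.0, 1.0, 0.1, -1.0, 0.5, 0.0, 3.0, -2.0, 1.5, 0.2])
probs = softmax(logits)

print(probs.shape)       # one score per class
print(probs.sum())       # the scores sum to 1 (up to floating-point error)
print(np.argmax(probs))  # index of the most likely class
```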
What kind of a shape does a scalar have?
An empty shape, ().
What algorithm does the optimizer use?
Backpropagation -- the central algorithm in deep learning.
What are the three most common use cases of neural networks?
Binary classification, multiclass classification, and scalar regression
What is step 3 of a Keras workflow?
Configure the learning process by choosing a loss function, an optimizer, and some metrics to monitor
What is a tuple?
It groups any number of items into a single compound value. Syntactically, it is a comma-separated sequence of values.
What shape does a vector have?
It has a shape with a single element such as (5,)
What does reshaping a tensor mean?
Rearranging its rows and columns to match a target shape
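A small NumPy sketch of reshaping, using a made-up 3 x 2 array. The total number of entries stays the same; only the arrangement changes.

```python
import numpy as np

x = np.array([[0., 1.],
              [2., 3.],
              [4., 5.]])

# Same six values, rearranged into different target shapes.
flat = x.reshape((6, 1))
wide = x.reshape((2, 3))

print(x.shape)
print(flat.shape)
print(wide)
```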
What is the simplest way to mitigate overfitting?
Reduce the size of the model
____________________ is the generalization of the concept of derivatives to functions of multidimensional inputs: that is, to functions that take tensors as inputs.
The gradient
What is the rank of a tensor?
The number of axes
What defines a hypothesis space?
The topology of a network
What is the most common network architecture by far?
linear stacks of layers
How do you modify the state of a variable?
via its assign method
Deep learning models consist of chains of simple tensor operations, parameterized by ______________________
weights
In padding lists so they all have the same length, what is the integer tensor's shape?
(samples, max_length)
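As a rough illustration of the padding approach, the sketch below pads hypothetical integer lists to a common length with plain NumPy. The helper name `pad_sequences_simple` is made up for this example, and padding at the end with 0 is a choice of this sketch rather than a statement about any library's default.

```python
import numpy as np

def pad_sequences_simple(sequences, max_length, pad_value=0):
    # Start from a (samples, max_length) integer tensor filled with the pad value.
    padded = np.full((len(sequences), max_length), pad_value, dtype="int64")
    for i, seq in enumerate(sequences):
        trimmed = seq[:max_length]          # truncate sequences that are too long
        padded[i, :len(trimmed)] = trimmed  # copy the values; the rest stays padded
    return padded

# Hypothetical word-index lists of different lengths.
sequences = [[3, 5], [8, 2, 7, 1], [4]]
padded = pad_sequences_simple(sequences, max_length=4)

print(padded.shape)  # (samples, max_length)
print(padded)
```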
For the array z defined in the rank question (three 4 x 5 matrices stacked together), what is the shape of z?
(3,4,5)
What is another word for a 2D tensor?
A matrix
How are kernel functions typically crafted?
By hand
What is a kernel method?
Kernel methods are a group of classification algorithms.
Everything in Keras is either a ________________ or something that closely interacts with one.
Layer
What is the second way of turning lists into a tensor?
Multi-hot encoding the lists to turn them into vectors of 0s and 1s.
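A minimal sketch of multi-hot encoding in NumPy. The function name `multi_hot`, the sample lists, and the vocabulary size of 8 are assumptions chosen for illustration.

```python
import numpy as np

def multi_hot(sequences, dimension):
    # One row per sample, one column per possible index, all zeros to start.
    results = np.zeros((len(sequences), dimension))
    for i, seq in enumerate(sequences):
        # Set a 1 at every index that appears in the sample;
        # repeated indices still yield a single 1.
        results[i, seq] = 1.0
    return results

# Hypothetical word-index lists.
sequences = [[3, 5], [1, 3, 3]]
encoded = multi_hot(sequences, dimension=8)

print(encoded.shape)
print(encoded)
```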
e *= d What does this do?
Multiply two tensors (element-wise).
In technical terms, we'd say that the transformation implemented by a layer is ____________________________ by its weights
parameterized
Input data in a neural network corresponds with what output data?
the corresponding targets
Image data is usually processed by what kind of layers?
2D convolution layers (Conv2D).
Image data is normally stored in n-dimensional tensors. What is n?
4
The loss is the quantity you'll attempt to minimize during training, so what should it represent?
A measure of success for the task you're trying to solve.
What is a hypothesis space?
A predefined set of operations through which machine learning algorithms search, in order to find the best transformations that turn data into more useful representations for a given task.
What are some advanced ways to help gradient propagation?
Batch normalization, residual connections, and depth-wise separable convolutions
What is Machine Learning?
Searching for useful representations of some input data, within a predefined space of possibilities, using guidance from a feedback signal.
What determines how learning proceeds?
The optimizer
Can a layer be stateless?
Yes
What is a disadvantage of having more units?
it makes the model more computationally expensive and may lead to learning unwanted patterns
A model that is too small will not _____________________
overfit
What is the class meant to manage modifiable state in TensorFlow?
tf.Variable
output = relu(dot(input, W) + b). What is W?
the weight matrix
If the shape of a tensor is (10000, 28, 28), what is its length (the size of its first axis)?
10000
z = np.array([[ ... [5, 78, 2, 34, 0], ... [59, 17, 4, 32, 11], ... [14, 88, 7, 22, 5], ... [7, 80, 4, 36, 2]], ... [[5, 78, 2, 34, 0], ... [4, 77, 5, 33, 3], ... [6, 79, 3, 35, 1], ... [7, 80, 4, 36, 2]], ... [[5, 78, 2, 34, 0], ... [6, 79, 3, 35, 1], ... [2, 56, 4, 44, 1], ... [7, 80, 4, 36, 2]]]) What is the rank of z (its ndim)?
3
What is a layer?
A data-processing module that you can think of as a filter for data.
from keras import layers layer = layers.Dense(32, input_shape=(784,)) What does 32 refer to?
The number of output units in the Dense layer
What is a synonym for a dense neural network?
A fully connected network
What is another way of saying 'a function of multidimensional inputs'?
A function that takes a tensor as input
What is a kernel function?
A kernel function is a computationally tractable operation that maps any two points in your initial space to the distance between these points in your target representation space, completely bypassing the explicit computation of the new representation
What is another word for a zero dimensional tensor?
A scalar
What is regularization?
A set of best practices that actively impede the model's ability to fit perfectly to the training data, with the goal of making the model perform better during validation.
What is a five-dimensional vector?
A vector with five entries
What might be a good metric to monitor in training and testing for an image classification problem?
Accuracy (the fraction of the images that were correctly classified)
How do you indicate a tuple?
Although it is not necessary, it is conventional to enclose tuples in parentheses: >>> julia = ("Julia", "Roberts", 1967, "Duplicity", 2009, "Actress", "Atlanta, Georgia")
What method amounts to a simple transformation such as a high-dimensional nonlinear projection?
An SVM
What is learning, in the context of machine learning?
An automatic search process for better representations
What is shallow learning?
Any approach to machine learning that tends to focus on learning only one or two layers of representations of the data.
What's a representation?
At its core, it's a different way to look at data—to represent or encode data. For instance, a color image can be encoded in the RGB format (red-green-blue) or in the HSV format (hue-saturation-value): these are two different representations of the same data.
What is step 2 of a Keras workflow?
Define a network of layers (or model) that maps your inputs to your targets
What is step 1 of a Keras workflow?
Define your training data: input tensors and target tensors
Simple vector data, stored in 2D tensors of shape (samples, features), is often processed by what kind of layers?
Densely connected layers, otherwise known as fully connected or dense layers (the Dense class in Keras)
What is dimensionality?
Dimensionality can denote either the number of entries along a specific axis (as in the case of our 5D vector) or the number of axes in a tensor (such as a 5D tensor), which can be confusing at times.
What is the name of the process of extracting useful representations manually?
Feature engineering
What is gradient boosting good for?
Gradient boosting is used for problems where structured data is available, whereas deep learning is used for perceptual problems such as image classification.
What is a gradient?
It is the derivative of a tensor operation
What does the deep in deep learning refer to?
It isn't a reference to any kind of deeper understanding achieved by the approach; rather, it stands for this idea of successive layers of representations.
What does the loss function do?
It takes the predictions of the network and the true target (what you wanted the network to output) and computes a distance score, capturing how well the network has done on this specific example
What does the term "regularization" imply?
It tends to make the model simpler, more "regular," its curve smoother, more "generic"; thus it is less specific to the training set and better able to generalize by more closely approximating the latent manifold of the data.
What is an objective function?
It's a synonym for the loss function: To control the output of a neural network, you need to be able to measure how far this output is from what you expected. In this sense the 'objective' function refers to how close you are to accomplishing the objective of the network.
What is step 4 of a Keras workflow?
Iterate on your training data by calling the fit() method of your model
What defines a layer's state?
Its weights
What is a dense neural network?
A network whose layers are fully connected (dense): each neuron in a layer receives input from all the neurons in the previous layer, so the layers are densely connected.
What are the three key attributes that define a tensor?
Number of axes (rank); shape (a tuple of integers that describes how many dimensions the tensor has along each axis) and data type
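The three attributes can be inspected directly on a NumPy array. The sketch below rebuilds the array z from the rank question and also shows a scalar and a vector for comparison; the exact integer dtype printed depends on the platform, so it is not stated here.

```python
import numpy as np

z = np.array([[[5, 78, 2, 34, 0],
               [59, 17, 4, 32, 11],
               [14, 88, 7, 22, 5],
               [7, 80, 4, 36, 2]],
              [[5, 78, 2, 34, 0],
               [4, 77, 5, 33, 3],
               [6, 79, 3, 35, 1],
               [7, 80, 4, 36, 2]],
              [[5, 78, 2, 34, 0],
               [6, 79, 3, 35, 1],
               [2, 56, 4, 44, 1],
               [7, 80, 4, 36, 2]]])

print(z.ndim)   # number of axes (rank)
print(z.shape)  # size along each axis
print(z.dtype)  # data type of the entries

scalar = np.array(12)               # rank-0 tensor
vector = np.array([1, 2, 3, 4, 5])  # rank-1 tensor with five entries
print(scalar.shape, vector.shape)
```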
How many axes does a five-dimensional vector have?
One
The gradient-descent process must be based on how many scalar loss values?
One
What is the first way of turning a list into a tensor?
Pad your lists so that they all have the same length
How do you reduce the size of the model?
Reduce the number of learnable parameters in the model, determined by the number of layers and the number of units per layer.
c = tf.sqrt(a) What does this do?
Take the square root.
What is an optimizer?
The mechanism through which the network will update itself based on the data it sees and its loss function.
What are the two ways of defining a model?
Using the Sequential class or the functional API
____________________ in a layer encapsulate some state and ___________________ some computation
Weights, and a forward pass
output = relu(dot(input, W) + b) and model = keras.Sequential([ layers.Dense(16, activation="relu"), layers.Dense(16, activation="relu"), layers.Dense(1, activation="sigmoid") ]) The dot product with W will project the input data onto what?
a 16-dimensional representation space (and then you'll add the bias vector b and apply the relu operation)
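To illustrate the projection, here is a NumPy sketch of output = relu(dot(input, W) + b) for a layer with 16 units. The input dimension of 100, the batch size of 4, and the random initial values are assumptions for the example; a real Keras Dense layer creates and trains W and b itself.

```python
import numpy as np

rng = np.random.default_rng(0)

input_dimension = 100  # hypothetical number of input features
units = 16

# W has shape (input_dimension, units); b has one entry per unit.
W = rng.normal(size=(input_dimension, units)) * 0.1
b = np.zeros(units)

def relu(x):
    # Zero out negative values, element-wise.
    return np.maximum(x, 0.0)

def dense_relu(inputs):
    # Project onto the 16-dimensional space, add the bias, apply relu.
    return relu(np.dot(inputs, W) + b)

batch = rng.normal(size=(4, input_dimension))  # 4 made-up samples
output = dense_relu(batch)

print(W.shape)
print(output.shape)
```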
What is GradientTape in TensorFlow?
An API for automatic differentiation (autodiff), which is core functionality in TensorFlow. It does not merely "track" the autodiff; it is a key part of performing it.
A simple API should have a single _____________________ around which everything is centered.
abstraction
In the context of tensors, what is another word for dimension?
axis
Classifying movie reviews as positive or negative is an example of what kind of classification?
binary classification
The Embedding layer is best understood as a _________________ that maps integer indices (which stand for specific words) to dense vectors.
dictionary
The entire learning process is made possible by the fact that neural networks are chains of ___________________________
differentiable tensor operations
Why is GradientTape so named?
It records ("tapes") the sequence of operations performed on some input to produce some output, so that the output can be differentiated with respect to the input via backpropagation (reverse-mode autodiff), which in turn makes gradient descent optimization possible.
What loss function might you use for a regression problem?
mean squared error
Classifying news wires by topic is an example of ______________
multiclass classification
Classifying news wires by topic is an example of what kind of classification?
multiclass classification
Reducing the network's size is one of the most common ______________________ techniques
regularization
What are the two axes of a matrix?
rows and columns
In machine learning, data points are called ______________.
samples
A vector has a shape with a ____________________ element, such as (5,)
single
What does a layer encapsulate?
some weights and some computation
In order to be useful for a neural network, a list must be turned into a __________________.
tensor
Selecting specific elements in a tensor is called _______________
tensor slicing
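A short NumPy sketch of tensor slicing. The `images` array is a zero-filled stand-in with the shape of a small batch of 28 x 28 images, not real data.

```python
import numpy as np

# Stand-in for 100 grayscale images of 28 x 28 pixels.
images = np.zeros((100, 28, 28))

# Select images 10 through 19 (the stop index is excluded).
my_slice = images[10:20]

# Equivalent, with every axis spelled out.
same_slice = images[10:20, :, :]

# Bottom-right 14 x 14 patch of every image.
corner = images[:, 14:, 14:]

print(my_slice.shape, same_slice.shape, corner.shape)
```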
What do we call selecting specific elements in a tensor?
tensor slicing
A layer is a data-processing module that takes as input one or more _____________ and that outputs one or more ______________.
tensors (both blanks)
Are weights scalars, matrices, or tensors?
tensors
To train a model, we'll need to update its state, which is a set of ____________.
tensors
In an image classification problem, what will the test set consist of?
test_images and test_labels. (That is, a set of images and their corresponding labels)
To train a model, what method do you use?
the fit() method
What is an advantage of having more units?
this allows your model to learn more-complex representations
What kind of variables are tracked by default?
trainable variables
Are NumPy arrays assignable?
yes
What is symbolic AI?
The idea that human-level artificial intelligence could be achieved by having programmers handcraft a sufficiently large set of explicit rules for manipulating knowledge
What are the two essential characteristics of how deep learning learns from data?
The incremental, layer-by-layer way in which increasingly complex representations are developed, and the fact that these intermediate incremental representations are learned jointly, each layer being updated to follow both the representational needs of the layer above and the needs of the layer below.
What is crossentropy?
A quantity from the field of information theory that measures the distance between probability distributions or, in this case, between the ground-truth distribution and your predictions.
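As a rough illustration, the sketch below computes categorical crossentropy for one sample in NumPy. The helper name, the clipping constant `eps`, and the two prediction vectors are assumptions for the example; the point is only that predictions closer to the ground truth give a smaller value.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    # Clip to avoid log(0); eps is an arbitrary small constant for this sketch.
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred))

# Ground truth: the sample belongs to class 1 (one-hot encoded).
y_true = np.array([0.0, 1.0, 0.0])

good_prediction = np.array([0.05, 0.90, 0.05])
bad_prediction = np.array([0.70, 0.10, 0.20])

loss_good = categorical_crossentropy(y_true, good_prediction)
loss_bad = categorical_crossentropy(y_true, bad_prediction)

print(loss_good)  # small: most probability mass is on the true class
print(loss_bad)   # larger: the true class received little probability
```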
What is the difference between a relu and a sigmoid function?
A relu (rectified linear unit) is a function meant to zero out negative values, whereas a sigmoid "squashes" arbitrary values into the [0, 1] interval, outputting something that can be interpreted as a probability.
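The difference is easy to see on a handful of values. This is a minimal NumPy sketch with made-up inputs, not the Keras implementation.

```python
import numpy as np

def relu(x):
    # Negative values become 0; non-negative values pass through unchanged.
    return np.maximum(x, 0.0)

def sigmoid(x):
    # Squashes any real value into the interval between 0 and 1.
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])

relu_out = relu(x)
sigmoid_out = sigmoid(x)

print(relu_out)
print(sigmoid_out)
```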
d=b+c What does this do?
Add two tensors (element-wise).
What is backpropagation?
The algorithm that applies the chain rule to compute gradients: because all tensor operations in neural networks are differentiable, it's possible to apply the chain rule of derivation to find the gradient function mapping the current parameters and current batch of data to a gradient value.
What should the loss represent?
The loss is the quantity you'll attempt to minimize during training, so it should represent a measure of success for the task you're trying to solve.
What is an optimizer in a Deep Learning network?
The mechanism through which the network will update itself based on the data it sees and its loss function.
What is the depth of a model?
The number of layers that contribute to a model of the data.
model = keras.Sequential([ layers.Dense(16, activation="relu"), layers.Dense(16, activation="relu"), layers.Dense(1, activation="sigmoid") ]) What is the first argument being passed to each layer?
The number of units in the layer
What does the optimizer specify?
The optimizer specifies the exact way in which the gradient of the loss will be used to update parameters: for instance, it could be the RMSProp optimizer, SGD with momentum, and so on.
What is the shape of a numpy array?
The shape attribute for numpy arrays returns the dimensions of the array. If Y has n rows and m columns, then Y.shape is (n,m). So Y.shape[0] is n.
What is the best known kernel method?
The support vector machine (SVM).
What is the name of the data that the model will learn from?
The training set
What is the purpose of word embeddings?
They are meant to map human language into a geometric space.
What is the central problem in machine learning and deep learning?
To meaningfully transform data: in other words, to learn useful representations of the input data at hand—representations that get us closer to the expected output.
What is transposition?
Transposing a matrix means exchanging its rows and its columns, so that x[i, :] becomes x[:, i]
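A small NumPy check of that statement, using a made-up 2 x 3 matrix and the (300, 20) array of zeros from the earlier transpose question.

```python
import numpy as np

x = np.arange(6).reshape((2, 3))  # [[0, 1, 2], [3, 4, 5]]
xt = np.transpose(x)

print(x.shape, xt.shape)

# Row i of x is column i of the transpose.
print(x[0, :], xt[:, 0])

big = np.transpose(np.zeros((300, 20)))
print(big.shape)
```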
What library do most practitioners of gradient boosting use?
XGBoost library, which offers support for the two most popular languages of data science: Python and R
How do you determine the right number of layers or the right size for each layer?
You must evaluate an array of different architectures on your validation set, not on your test set
How would you track a constant tensor?
You'd have to manually mark it as being tracked by calling tape.watch() on it.
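A minimal sketch of the difference, assuming TensorFlow 2.x with eager execution. The value 3.0 and the squared function are arbitrary choices for illustration.

```python
import tensorflow as tf

# A constant tensor is not tracked by default, so it must be watched explicitly.
const = tf.constant(3.0)
with tf.GradientTape() as tape:
    tape.watch(const)
    result = tf.square(const)
grad_const = tape.gradient(result, const)  # derivative of x^2 at x = 3

# A trainable tf.Variable is tracked automatically; no watch() call is needed.
var = tf.Variable(3.0)
with tf.GradientTape() as tape:
    result = tf.square(var)
grad_var = tape.gradient(result, var)

print(float(grad_const), float(grad_var))
```

Without the `tape.watch(const)` call, the gradient with respect to the constant would not be recorded.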
What's another way of saying "more units"?
a higher-dimensional representation space
In using the padding list to tensor technique, what kind of layer do you need to start your model?
A layer capable of handling such integer tensors, namely the Embedding layer
What is validation data?
a set of inputs that the model doesn't see during training.
For multiloss networks, all losses are combined (via averaging) into __________________________________.
a single scalar quantity
What shape does a scalar have?
an empty shape, ().
What is a feature?
an individual measurable property or characteristic of a phenomenon
Before you start training a model, you need to pick three things. What are they?
an optimizer, a loss, and some metrics
What algorithm does the optimizer use to adjust the values of the weights a little, in a direction that will lower the loss score?
backpropagation
What is the central algorithm in deep learning?
backpropagation
What are text-processing models called that treat input words as a set, discarding their original order?
bag-of-words models
Classifying movie reviews as positive or negative is an example of _________________________
binary classification
What loss function might you use for a two-class classification problem?
binary crossentropy
What loss function might you use for a many-class classification problem?
categorical crossentropy
On rare occasions, you may see a ______________ tensor
char
In machine learning, a category in a classification problem is called a _____________________.
class.
Is logistic regression a regression algorithm or a classification algorithm?
classification
Building deep-learning models in Keras is done by clipping together ___________________________ to form useful data-transformation pipelines.
compatible layers
What are convolutional networks generally used for?
computer vision
What loss function might you use for a sequence-learning problem?
connectionist temporal classification (CTC)
A neural network expects to process __________________ batches of data.
contiguous
What is the data type usually called in Python libraries?
dtype
What is learning in the context of machine learning?
finding a set of values for the weights of all layers in a network, such that the network will correctly map example inputs to their associated targets
A network, or model, can also be thought of as a _____________
function. In machine learning, the specific model you are using is the function and requires parameters in order to make a prediction on new data.
Machine learning, in a way, is the science of ______________________?
generalization
What do we call the attempt to generate knowledge that can be used across different tasks?
generalization
What does the topology of a network do?
it defines a hypothesis space
What does the fit() method do?
it runs mini-batch gradient descent for you. You can also use it to monitor your loss and metrics on validation data
One or more tensors learned with stochastic gradient descent, together contain the network's _______________________.
knowledge
The class associated with a specific sample is called a _________________.
label
Learning means finding a combination of ____________________ that minimizes a loss function for a given set of training data samples and their corresponding targets.
model parameters
Once your model is trained, what method do you use to generate predictions on new inputs?
model.predict()
Layers are assembled into ____________________.
models
Name some of the arbitrary network architectures that Keras supports
multi-input or multi-output models, layer sharing, and model sharing
A neural network that has multiple outputs may have how many loss functions?
multiple loss functions (one per output)
What is a tensor's rank called in Python libraries such as Numpy?
ndim
Layers are combined into a _____________________
network (or model)
What does the network.compile function look like in Keras?
network.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
Are TensorFlow tensors assignable?
no
What kind of object is a layer's weights?
one or more tensors
each Dense layer with a relu activation implements a chain of tensor operations. Setting the variable as 'output', what is that chain of tensor operations?
output = relu(dot(input, W) + b)
What are unwanted patterns?
patterns that will improve performance on the training data but not on the test data
What's the first requirement of creating a variable?
provide some initial value, such as a random tensor
Sequence data, stored in 3D tensors of shape (samples, timesteps, features), is typically processed by what kind of layers?
recurrent layers, such as an LSTM layer.
What kind of NN is used for sequence processing?
recurrent network
Estimating the price of a house, given real-estate data, is an example of ______________
regression
Estimating the price of a house, given real-estate data, is an example of what?
scalar regression
What are text-processing models that care about word order called?
sequence models
In the context of deep learning, what is another word for 'parameters'?
settings, also 'weights'
What is a hypothesis space?
space of possibilities
What is a constant tensor?
A tensor created with tf.constant: its values are fixed, and hence not trainable
In Keras, what is the single abstraction around which everything is centered?
the Layer class
What is the meaning of 'the number of units in the layer'?
the dimensionality of representation space of the layer
How/where do you specify the three required elements before you start training a model?
the model.compile() method.
The specification of what a layer does to its input data is stored in the layer's ____________________.
weights
If tensors aren't assignable, how do we update a model's state?
with variables
What does the loss function define?
It defines the feedback signal used for learning
What does a support vector machine do?
It finds a good decision boundary: a line or surface separating the training data into two spaces corresponding to two different categories.
What can you tell from this layer: network.add(layers.Dense(10, activation='softmax'))
It is densely connected; it is a 10-way softmax layer, which means it will return an array of 10 probability scores (summing to 1).
What is the functional API for in Keras?
It is for directed acyclic graphs of layers, which lets you build completely arbitrary architectures.
What is the purpose of the loss function?
It is how the network will be able to measure its performance on the training data, and thus how it will be able to steer itself in the right direction.
What is mini-batch stochastic gradient descent?
Learning happens by drawing random batches of data samples and their targets, and computing the gradient of the loss on the batch with respect to the model parameters. The model parameters are then moved a bit (the magnitude of the move is defined by the learning rate) in the opposite direction from the gradient.
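The loop described above can be sketched in plain NumPy on a toy problem. Everything here is an assumption made for illustration: the synthetic data (y = 3x + 2 plus a little noise), the linear model, the mean squared error loss, and the hyperparameter values. In Keras, fit() runs this kind of loop for you.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic training data: targets follow y = 3x + 2 with small noise.
x = rng.uniform(-1.0, 1.0, size=(256, 1))
y = 3.0 * x + 2.0 + 0.01 * rng.normal(size=(256, 1))

# Model parameters (the "weights" being learned).
w = np.zeros((1, 1))
b = np.zeros(1)

learning_rate = 0.1
batch_size = 32

for epoch in range(200):
    order = rng.permutation(len(x))  # draw batches in random order
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        x_batch, y_batch = x[idx], y[idx]

        # Forward pass and mean squared error on this batch.
        predictions = x_batch @ w + b
        error = predictions - y_batch

        # Gradient of the batch loss with respect to each parameter.
        grad_w = 2.0 * x_batch.T @ error / len(x_batch)
        grad_b = 2.0 * error.mean(axis=0)

        # Move the parameters a little, opposite to the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(w[0, 0], b[0])  # should end up close to 3 and 2
```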
What is the definition of learning in the context of deep learning?
Learning means finding a set of values for the model's weights that minimizes a loss function for a given set of training data samples and their corresponding targets.
What are element-wise operations?
Operations that are applied independently to each entry in the tensors being considered
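A brief NumPy sketch contrasting element-wise operations with a matrix product, using two made-up 2 x 2 arrays.

```python
import numpy as np

a = np.array([[1.0, -2.0],
              [3.0, 4.0]])
b = np.array([[10.0, 20.0],
              [30.0, 40.0]])

added = a + b                 # element-wise addition
multiplied = a * b            # element-wise multiplication
relu_a = np.maximum(a, 0.0)   # element-wise relu

# For contrast: the matrix product combines rows with columns,
# so it is not an element-wise operation.
matrix_product = a @ b

print(added)
print(multiplied)
print(relu_a)
print(matrix_product)
```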
A Layer is an object that encapsulates two things. What are they?
Some state and some computation
What is the general workflow for finding an appropriate model size?
Start with relatively few layers and parameters, and increase the size of the layers or add new layers until you see diminishing returns with regard to validation loss.
What kind of tensors don't exist in Numpy (or in most other libraries), and why not?
Strings, because tensors live in preallocated, contiguous memory segments and strings, being variable length, would preclude the use of this implementation.
How many axes does a 3D tensor have? How many does a matrix have?
3, and 2, respectively
e = tf.matmul(a, b) What does this do?
Take the matrix product of two tensors (not an element-wise product)
b = tf.square(a) What does this do?
Take the square.
What is eager execution?
Tensor operations get executed on the fly: at any point, you can print what the current result is, just like in NumPy.
What do we apply to find the gradient function mapping the current parameters and current batch of data to a gradient value?
The chain rule of derivation
______________________ takes integers as input, it looks up these integers in an internal dictionary, and it returns the associated vectors. It's effectively a dictionary lookup
The embedding layer
What does the optimizer specify?
The exact way in which the gradient of the loss will be used to update parameters: for instance, it could be the RMSProp optimizer, SGD with momentum, and so on.
The notion of layer compatibility refers specifically to what?
The fact that every layer will only accept input tensors of a certain shape and will return output tensors of a certain shape.
from keras import layers layer = layers.Dense(32, input_shape=(784,)) What does 784 refer to?
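The size of each input sample: the layer will only accept 2D tensors whose feature axis has 784 entries (the batch axis is left unspecified)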
What is accuracy in the context of image classification?
The fraction of the images that were correctly classified.
What is the purpose of the optimizer?
The fundamental trick in deep learning is to use the distance score generated by the loss function as a feedback signal to adjust the value of the weights a little, in a direction that will lower the loss score for the current example. That's what the optimizer does.