Deep Learning With Python

What purpose does a loss function serve in deep learning?

It serves as a feedback signal

What is the process of manually engineering good layers of representations for data?

Feature engineering

What are kernel methods?

Kernel methods are a group of classification algorithms, the best known of which is the support vector machine (SVM)

The central class of Keras is the ___________________.

Layer

What might be a better term than 'deep' learning?

Layered representations learning, or hierarchical representations learning. These better describe what is actually happening.

How many axes does a 5-D tensor have?

Five

The Sequential class is used only for what arrangement of layers?

For linear stacks of layers

What does a loss function measure?

The network's performance on the training data, which is how the network is able to steer itself in the right direction.

You can intuitively understand the dimensionality of your representation space as _______________________

"how much freedom you're allowing the model to have when learning internal representations."

>>> x = np.zeros((300, 20))
>>> x = np.transpose(x)
>>> print(x.shape)
What is the new shape?

(20, 300)

If you have 60,000 28x28 images, what is the shape of the resulting tensor?

(60000, 28, 28)

The MNIST dataset has 60,000 training images each of which is 28 x 28. What is its shape?

(60000, 28, 28)

model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
Having 16 units means the weight matrix W will have what shape?

(input_dimension, 16)

a = tf.ones((2, 2)) What does this describe?

A Tensor

What are examples of transformations?

A coordinate system change, or linear projections (which may destroy information), translations, nonlinear operations (such as "select all points such that x > 0"), and so on

What is a decision boundary?

A decision boundary can be thought of as a line or surface separating your training data into two spaces corresponding to two categories

What is a deep-learning model?

A directed, acyclic graph of layers

What are the three parts of the compilation step?

A loss function, an optimizer, and metrics to monitor during training and testing

What is a tensor that contains only one number?

A scalar

What does a 10-way softmax layer return?

An array of 10 probability scores which sum to 1
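
To make the "sum to 1" property concrete, here is a minimal NumPy sketch of a softmax computation. The helper name `softmax` and the sample scores are assumptions made for illustration; they are not taken from the source.

```python
import numpy as np

def softmax(scores):
    # Subtracting the max improves numerical stability and leaves the result unchanged.
    shifted = scores - np.max(scores)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Ten arbitrary raw scores, one per class (made-up values).
scores = np.array([2.0, 1.0, 0.1, -1.0, 0.5, 0.0, 3.0, -2.0, 1.5, 0.3])
probs = softmax(scores)

print(probs.shape)   # one probability per class
print(probs.sum())   # the probabilities add up to 1 (up to rounding)
```

The class with the largest raw score also gets the largest probability, which is why the predicted class is usually read off with `argmax`.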

What kind of a shape does a scalar have?

An empty shape, ().

What algorithm does the optimizer use?

Backpropagation -- the central algorithm in deep learning.

What are the three most common use cases of neural networks?

Binary classification, multiclass classification, and scalar regression

What is step 3 of a Keras workflow?

Configure the learning process by choosing a loss function, an optimizer, and some metrics to monitor

What is a tuple?

It groups any number of items into a single compound value. Syntactically, it is a comma-separated sequence of values.

What shape does a vector have?

It has a shape with a single element such as (5,)

What does reshaping a tensor mean?

Rearranging its rows and columns to match a target shape
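
A small NumPy sketch of reshaping, assuming an arbitrary 3x2 example matrix. The total number of entries has to stay the same before and after.

```python
import numpy as np

x = np.array([[0.0, 1.0],
              [2.0, 3.0],
              [4.0, 5.0]])

# Same six entries, arranged as one column.
col = x.reshape((6, 1))

# Same six entries, arranged as 2 rows of 3; entries are refilled in row order.
wide = x.reshape((2, 3))

print(x.shape, col.shape, wide.shape)
print(wide)
```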

What is the simplest way to mitigate overfitting?

Reduce the size of the model

____________________ is the generalization of the concept of derivatives to functions of multidimensional inputs: that is, to functions that take tensors as inputs.

The gradient

What is the rank of a tensor?

The number of axes

What defines a hypothesis space?

The topology of a network

What is the most common network architecture by far?

linear stacks of layers

How do you modify the state of a variable?

via its assign method

Deep learning models consist of chains of simple tensor operations, parameterized by ______________________

weights

In padding lists so they all have the same length, what is the integer tensor's shape?

(samples, max_length)

For the array z defined in the question about the rank of z, what is the shape of z?

(3,4,5)

What is another word for a 2D tensor?

A matrix

How are kernel functions typically crafted?

By hand

What is a kernel method?

Kernel methods are a group of classification algorithms.

Everything in Keras is either a ________________ or something that closely interacts with one.

Layer

What is the second way of turning lists into a tensor?

Multi-hot encoding the lists to turn them into vectors of 0s and 1s.
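
One possible way to write multi-hot encoding in NumPy is sketched below. The function name `multi_hot`, the `dimension` argument, and the sample lists are assumptions for illustration.

```python
import numpy as np

def multi_hot(sequences, dimension):
    # One row per list, one column per possible integer index.
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        for j in sequence:
            # Mark index j as present; repeated indices still give a single 1.
            results[i, j] = 1.0
    return results

encoded = multi_hot([[3, 5], [0, 5, 5, 7]], dimension=8)
print(encoded)
```

Each list becomes a fixed-length vector regardless of how many items it held, at the cost of losing order and repetition counts.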

e *= d What does this do?

Multiply two tensors (element-wise).

In technical terms, we'd say that the transformation implemented by a layer is ____________________________ by its weights

parameterized

Input data in a neural network corresponds with what output data?

the corresponding targets

Image data is usually processed by what kind of layers?

2D convolution layers (Conv2D).

Image data is normally stored in n-dimensional tensors. What is n?

4

The loss is the quantity you'll attempt to minimize during training, so what should it represent?

A measure of success for the task you're trying to solve.

What is a hypothesis space?

A predefined set of operations through which machine learning algorithms search, in order to find the best transformations that turn data into more useful representations for a given task.

What are some advanced ways to help gradient propagation?

Batch normalization, residual connections, and depth-wise separable convolutions

What is Machine Learning?

Searching for useful representations of some input data, within a predefined space of possibilities, using guidance from a feedback signal.

What determines how learning proceeds?

The optimizer

Can a layer be stateless?

Yes

What is a disadvantage of having more units?

it makes the model more computationally expensive and may lead to learning unwanted patterns

A model that is too small will not _____________________

overfit

What is the class meant to manage modifiable state in TensorFlow?

tf.Variable

output = relu(dot(input, W) + b). What is W?

the weight matrix

If the shape of a tensor is (10000, 28, 28), what is its length (the size of its first axis)?

10000

z = np.array([[[5, 78, 2, 34, 0],
               [59, 17, 4, 32, 11],
               [14, 88, 7, 22, 5],
               [7, 80, 4, 36, 2]],
              [[5, 78, 2, 34, 0],
               [4, 77, 5, 33, 3],
               [6, 79, 3, 35, 1],
               [7, 80, 4, 36, 2]],
              [[5, 78, 2, 34, 0],
               [6, 79, 3, 35, 1],
               [2, 56, 4, 44, 1],
               [7, 80, 4, 36, 2]]])
What is the rank of z (its ndim)?

3

What is a layer?

A data-processing module that you can think of as a filter for data.

from keras import layers
layer = layers.Dense(32, input_shape=(784,))
What does 32 refer to?

A dense layer with 32 outputs, aka output units

What is a synonym for a dense neural network?

A fully connected network

What is another way of saying 'a function of multidimensional inputs'?

A function that takes a tensor as inputs

What is a kernel function?

A kernel function is a computationally tractable operation that maps any two points in your initial space to the distance between these points in your target representation space, completely bypassing the explicit computation of the new representation

What is another word for a zero dimensional tensor?

A scalar

What is regularization?

A set of best practices that actively impede the model's ability to fit perfectly to the training data, with the goal of making the model perform better during validation.

What is a five-dimensional vector?

A vector with five entries

What might be a good metric to monitor in training and testing for an image classification problem?

Accuracy (the fraction of the images that were correctly classified)

How do you indicate a tuple?

Although it is not necessary, it is conventional to enclose tuples in parentheses:
>>> julia = ("Julia", "Roberts", 1967, "Duplicity", 2009, "Actress", "Atlanta, Georgia")

Which shallow-learning method relies on a simple transformation such as a high-dimensional nonlinear projection?

An SVM

What is learning, in the context of machine learning?

An automatic search process for better representations

What is shallow learning?

Any approach to machine learning that tends to focus on learning only one or two layers of representations of the data.

What's a representation?

At its core, it's a different way to look at data—to represent or encode data. For instance, a color image can be encoded in the RGB format (red-green-blue) or in the HSV format (hue-saturation-value): these are two different representations of the same data.

What is step 2 of a Keras workflow?

Define a network of layers (or model) that maps your inputs to your targets

What is step 1 of a Keras workflow?

Define your training data: input tensors and target tensors

Simple vector data, stored in 2D tensors of shape (samples, features), is often processed by what kind of layers?

Densely connected layers, otherwise known as fully connected or dense layers (the Dense class in Keras)

What is dimensionality?

Dimensionality can denote either the number of entries along a specific axis (as in the case of our 5D vector) or the number of axes in a tensor (such as a 5D tensor), which can be confusing at times.

What is the name of the process of extracting useful representations manually?

Feature engineering

What is gradient boosting good for?

Gradient boosting is used for problems where structured data is available, whereas deep learning is used for perceptual problems such as image classification.

What is a gradient?

It is the derivative of a tensor operation

What does the deep in deep learning refer to?

It isn't a reference to any kind of deeper understanding achieved by the approach; rather, it stands for this idea of successive layers of representations.

What does the loss function do?

It takes the predictions of the network and the true target (what you wanted the network to output) and computes a distance score, capturing how well the network has done on this specific example

What does the term "regularization" imply?

It tends to make the model simpler, more "regular," its curve smoother, more "generic"; thus it is less specific to the training set and better able to generalize by more closely approximating the latent manifold of the data.

What is an objective function?

It's a synonym for the loss function: to control the output of a neural network, you need to be able to measure how far this output is from what you expected. In this sense, the 'objective' function refers to how close you are to accomplishing the objective of the network.

What is step 4 of a Keras workflow?

Iterate on your training data by calling the fit() method of your model
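
The four workflow steps can be sketched end to end as follows. This is only an illustration under stated assumptions: the data is random, the layer sizes and hyperparameters are arbitrary, and the imports assume a TensorFlow 2 installation that bundles Keras.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Step 1: training data (random values, purely illustrative).
rng = np.random.default_rng(0)
inputs = rng.normal(size=(64, 8)).astype("float32")
targets = (inputs.sum(axis=1, keepdims=True) > 0).astype("float32")

# Step 2: a model that maps inputs to targets.
model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# Step 3: configure the learning process.
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# Step 4: iterate on the training data.
history = model.fit(inputs, targets, epochs=2, batch_size=16, verbose=0)

# After training, generate predictions for (here, the same) inputs.
predictions = model.predict(inputs, verbose=0)
print(predictions.shape)
```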

What defines a layer's state?

Its weights

What is a dense neural network?

A network whose layers are fully connected (dense): each neuron in a layer receives an input from all the neurons in the previous layer, so the layers are densely connected.

What are the three key attributes that define a tensor?

Number of axes (rank); shape (a tuple of integers that describes how many dimensions the tensor has along each axis) and data type
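
In NumPy these three attributes are exposed as `ndim`, `shape`, and `dtype`. The short sketch below uses arbitrary example arrays to show them for a scalar, a vector, and a 3D tensor.

```python
import numpy as np

scalar = np.array(12)                          # rank-0 tensor
vector = np.array([1, 2, 3, 4, 5])             # rank-1 tensor with five entries
cube = np.zeros((3, 4, 5), dtype="float32")    # rank-3 tensor

for t in (scalar, vector, cube):
    # Number of axes, size along each axis, and data type.
    print(t.ndim, t.shape, t.dtype)
```

Note how the five-entry vector has one axis, while a 5D tensor would have five axes.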

How many axes does a five-dimensional vector have?

One

The gradient-descent process must be based on how many scalar loss values?

One

What is the first way of turning a list into a tensor?

Pad your lists so that they all have the same length
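
A minimal sketch of padding, assuming zeros are appended at the end of each list. The helper `pad_sequences_simple` is hypothetical and is not the Keras utility of a similar name, whose defaults may differ.

```python
import numpy as np

def pad_sequences_simple(sequences, pad_value=0):
    # The longest list determines the second axis of the tensor.
    max_length = max(len(seq) for seq in sequences)
    padded = np.full((len(sequences), max_length), pad_value, dtype="int64")
    for i, seq in enumerate(sequences):
        # Copy the real values; the rest of the row keeps the pad value.
        padded[i, :len(seq)] = seq
    return padded

padded = pad_sequences_simple([[4, 9], [7, 1, 3, 8], [2]])
print(padded.shape)   # (samples, max_length)
print(padded)
```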

How do you reduce the size of the model?

Reduce the number of learnable parameters in the model, determined by the number of layers and the number of units per layer.

c = tf.sqrt(a) What does this do?

Take the square root.

What is an optimizer?

The mechanism through which the network will update itself based on the data it sees and its loss function.

What are the two ways of defining a model?

Using the Sequential class or the functional API

____________________ in a layer encapsulate some state and ___________________ some computation

Weights, and a forward pass

output = relu(dot(input, W) + b)
and
model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
The dot product with W will project the input data onto what?

a 16-dimensional representation space (and then you'll add the bias vector b and apply the relu operation)

What is GradientTape in TensorFlow?

a mathematical tool for automatic differentiation (autodiff), which is the core functionality of TensorFlow. It does not "track" the autodiff, it is a key part of performing the autodiff.

A simple API should have a single _____________________ around which everything is centered.

abstraction

In the context of tensors, what is another word for dimension?

axis

Classifying movie reviews as positive or negative is an example of what kind of classification?

binary classification

The Embedding layer is best understood as a _________________ that maps integer indices (which stand for specific words) to dense vectors.

dictionary

The entire learning process is made possible by the fact that neural networks are chains of ___________________________

differentiable tensor operations

Why is GradientTape so named?

it is used to record ("tape") a sequence of operations performed upon some input and producing some output, so that the output can be differentiated with respect to the input (via backpropagation / reverse-mode autodiff) (in order to then perform gradient descent optimization).
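
A short sketch of the recording idea, using a single scalar variable and the arbitrary function y = x * x. It assumes TensorFlow 2 with eager execution.

```python
import tensorflow as tf

x = tf.Variable(3.0)

with tf.GradientTape() as tape:
    # Operations on x inside this block are recorded on the tape.
    y = x * x

# Differentiate the recorded output with respect to the input: dy/dx = 2x.
grad = tape.gradient(y, x)
print(float(grad))
```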

What loss function might you use for a regression problem?

mean squared error

Classifying news wires by topic is an example of ______________

multiclass classification

Classifying news wires by topic is an example of what kind of classification?

multiclass classification

Reducing the network's size is one of the most common ______________________ techniques

regularization

What are the two axes of a matrix?

rows and columns

In machine learning, data points are called ______________.

samples

A vector has a shape with a ____________________ element, such as (5,)

single

What does a layer encapsulate?

some weights and some computation

In order to be useful for a neural network, a list must be turned into a __________________.

tensor

Selecting specific elements in a tensor is called _______________

tensor slicing

What do we call selecting specific elements in a tensor?

tensor slicing
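
Slicing can be illustrated with a stand-in image tensor in NumPy. The array of zeros and the chosen index ranges below are arbitrary assumptions.

```python
import numpy as np

# Stand-in for 100 grayscale images of 28x28 pixels.
images = np.zeros((100, 28, 28))

# Samples 10 through 19 along the first axis.
batch = images[10:20]

# Bottom-right 14x14 patch of every image.
patch = images[:, 14:, 14:]

print(batch.shape, patch.shape)
```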

A layer is a data-processing module that takes as input one or more _____________ and that outputs one or more ______________.

tensors

Are weights scalars, matrices, or tensors?

tensors

To train a model, we'll need to update its state, which is a set of ____________.

tensors

In an image classification problem, what will the test set consist of?

test_images and test_labels. (That is, a set of images and their corresponding labels)

To train a model, what method do you use?

the fit() method

What is an advantage of having more units?

this allows your model to learn more-complex representations

What kind of variables are tracked by default?

trainable variables

Are NumPy arrays assignable?

yes

What is symbolic AI?

The idea that human-level artificial intelligence could be achieved by having programmers handcraft a sufficiently large set of explicit rules for manipulating knowledge

What are the two essential characteristics of how deep learning learns from data?

The incremental, layer-by-layer way in which increasingly complex representations are developed, and the fact that these intermediate incremental representations are learned jointly, each layer being updated to follow both the representational needs of the layer above and the needs of the layer below.

What is crossentropy?

A quantity from the field of information theory that measures the distance between probability distributions or, in this case, between the ground-truth distribution and your predictions.
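
As a rough illustration, categorical crossentropy for a single sample can be written in NumPy as below. The function name, the clipping constant, and the example distributions are assumptions for this sketch.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    # Clip to avoid log(0) when a predicted probability is exactly zero.
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([0.0, 1.0, 0.0])        # ground truth: class 1
good = np.array([0.05, 0.90, 0.05])       # confident and correct
bad = np.array([0.70, 0.20, 0.10])        # mostly wrong

loss_good = categorical_crossentropy(y_true, good)
loss_bad = categorical_crossentropy(y_true, bad)
print(loss_good, loss_bad)
```

The prediction closer to the ground-truth distribution yields the smaller value, which is what makes the quantity usable as a loss.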

What is the difference between a relu and a sigmoid function?

A relu (rectified linear unit) is a function meant to zero out negative values, whereas a sigmoid "squashes" arbitrary values into the [0, 1] interval, outputting something that can be interpreted as a probability.
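
Both functions are short enough to sketch directly in NumPy. The input values are arbitrary.

```python
import numpy as np

def relu(x):
    # Negative values become 0; non-negative values pass through unchanged.
    return np.maximum(x, 0.0)

def sigmoid(x):
    # Squashes any real value into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 3.0])
relu_out = relu(x)
sigmoid_out = sigmoid(x)
print(relu_out)
print(sigmoid_out)
```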

d = b + c What does this do?

Add two tensors (element-wise).

What is backpropagation?

Because all tensor operations in neural networks are differentiable, it's possible to apply the chain rule of derivation to find the gradient function mapping the current parameters and current batch of data to a gradient value.

What should the loss represent?

The loss is the quantity you'll attempt to minimize during training, so it should represent a measure of success for the task you're trying to solve.

What is an optimizer in a Deep Learning network?

The mechanism through which the network will update itself based on the data it sees and its loss function.

What is the depth of a model?

The number of layers that contribute to a model of the data.

model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid")
])
What is the first argument being passed to each layer?

The number of units in the layer

What does the optimizer specify?

The optimizer specifies the exact way in which the gradient of the loss will be used to update parameters: for instance, it could be the RMSProp optimizer, SGD with momentum, and so on.

What is the shape of a numpy array?

The shape attribute for numpy arrays returns the dimensions of the array. If Y has n rows and m columns, then Y.shape is (n,m). So Y.shape[0] is n.

What is the best-known kernel method?

The support vector machine (SVM).

What is the name of the data that the model will learn from?

The training set

What is the purpose of word embeddings?

They are meant to map human language into a geometric space.

What is the central problem in machine learning and deep learning?

To meaningfully transform data: in other words, to learn useful representations of the input data at hand—representations that get us closer to the expected output.

What is transposition?

Transposing a matrix means exchanging its rows and its columns, so that x[i, :] becomes x[:, i]

What library do most practitioners of gradient boosting use?

XGBoost library, which offers support for the two most popular languages of data science: Python and R

How do you determine the right number of layers or the right size for each layer?

You must evaluate an array of different architectures on your validation set, not on your test set

How would you track a constant tensor?

You'd have to manually mark it as being tracked by calling tape.watch() on it.
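
A brief sketch of watching a constant, assuming TensorFlow 2 and the arbitrary function y = c squared.

```python
import tensorflow as tf

c = tf.constant(3.0)

with tf.GradientTape() as tape:
    # Constants are not tracked by default, so mark this one explicitly.
    tape.watch(c)
    y = tf.square(c)

# dy/dc = 2c
grad = tape.gradient(y, c)
print(float(grad))
```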

What's another way of saying "more units"?

a higher-dimensional representation space

In using the padding list to tensor technique, what kind of layer do you need to start your model?

a layer capable of handling such integer tensors, also known as an Embedding layer

What is validation data?

a set of inputs that the model doesn't see during training.

For multiloss networks, all losses are combined (via averaging) into __________________________________.

a single scalar quantity

What shape does a scalar have?

an empty shape, ().

What is a feature?

an individual measurable property or characteristic of a phenomenon

Before you start training a model, you need to pick three things. What are they?

an optimizer, a loss, and some metrics

What algorithm does the optimizer use to adjust the values of the weights a little, in a direction that will lower the loss score?

backpropagation

What is the central algorithm in deep learning?

backpropagation

What are text-processing models called that treat input words as a set, discarding their original order?

bag-of-words models

Classifying movie reviews as positive or negative is an example of _________________________

binary classification

What loss function might you use for a two-class classification problem?

binary crossentropy

What loss function might you use for a many-class classification problem?

categorical crossentropy

On rare occasions, you may see a ______________ tensor

char

In machine learning, a category in a classification problem is called a _____________________.

class.

Is logistic regression a regression algorithm or a classification algorithm?

classification

Building deep-learning models in Keras is done by clipping together ___________________________ to form useful data-transformation pipelines.

compatible layers

What are convolutional networks generally used for?

computer vision

What loss function might you use for a sequence-learning problem?

connectionist temporal classification (CTC)

A neural network expects to process __________________ batches of data.

contiguous

What is the data type usually called in Python libraries?

dtype

What is learning in the context of machine learning?

finding a set of values for the weights of all layers in a network, such that the network will correctly map example inputs to their associated targets

A network, or model, can also be thought of as a _____________

function. In machine learning, the specific model you are using is the function and requires parameters in order to make a prediction on new data.

Machine learning, in a way, is the science of ______________________?

generalization

The attempt to generate knowledge that can be used across different tasks.

generalization

What does the topology of a network do?

it defines a hypothesis space

What does the fit() method do?

it runs mini-batch gradient descent for you. You can also use it to monitor your loss and metrics on validation data

One or more tensors learned with stochastic gradient descent, together contain the network's _______________________.

knowledge

The class associated with a specific sample is called a _________________.

label

Learning means finding a combination of ____________________ that minimizes a loss function for a given set of training data samples and their corresponding targets.

model parameters

Once your model is trained, what method do you use to generate predictions on new inputs?

model.predict()

Layers are assembled into ____________________.

models

Name some of the arbitrary network architectures that Keras supports

multi-input or multi-output models, layer sharing, and model sharing

A neural network that has multiple outputs may have how many loss functions?

multiple loss functions (one per output)

What is a tensor's rank called in Python libraries such as Numpy?

ndim

Layers are combined into a _____________________

network (or model)

What does the network.compile function look like in Keras?

network.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

Are TensorFlow tensors assignable?

no

What kind of object is a layer's weights?

one or more tensors

each Dense layer with a relu activation implements a chain of tensor operations. Setting the variable as 'output', what is that chain of tensor operations?

output = relu(dot(input, W) + b)
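
The same chain of operations can be sketched in NumPy. The function name `naive_dense`, the batch size, the feature count, and the random values are assumptions; a real Dense layer learns W and b during training.

```python
import numpy as np

def naive_dense(inputs, W, b):
    # dot projects the inputs, + b shifts them, and the maximum with 0 is relu.
    return np.maximum(np.dot(inputs, W) + b, 0.0)

rng = np.random.default_rng(0)
inputs = rng.normal(size=(4, 8))   # batch of 4 samples with 8 features each
W = rng.normal(size=(8, 16))       # shape (input_dimension, units)
b = np.zeros(16)                   # one bias per unit

output = naive_dense(inputs, W, b)
print(output.shape)
```

With 16 units, every sample ends up as a point in a 16-dimensional representation space.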

What are unwanted patterns?

patterns that will improve performance on the training data but not on the test data

What's the first requirement of creating a variable?

provide some initial value, such as a random tensor

Sequence data, stored in 3D tensors of shape (samples, timesteps, features), is typically processed by what kind of layers?

recurrent layers, such as an LSTM layer.

What kind of NN is used for sequence processing?

recurrent network

Estimating the price of a house, given real-estate data, is an example of ______________

regression

Estimating the price of a house, given real-estate data, is an example of what?

scalar regression

What are text-processing models that care about word order called?

sequence models

In the context of deep learning, what is another word for 'parameters'?

settings, also 'weights'

What is a hypothesis space?

space of possibilities

What is a constant tensor?

A tensor created with tf.constant; its values are fixed, and hence not trainable

In Keras, what is the single abstraction around which everything is centered?

the Layer class

What is the meaning of 'the number of units in the layer'?

the dimensionality of representation space of the layer

How/where do you specify the three required elements before you start training a model?

the model.compile() method.

The specification of what a layer does to its input data is stored in the layer's ____________________.

weights

If tensors aren't assignable, how do we update a model's state?

with variables
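
A small sketch of updating state through a `tf.Variable`, assuming TensorFlow 2. The shapes and values are arbitrary.

```python
import tensorflow as tf

# A variable needs an initial value.
v = tf.Variable(initial_value=tf.zeros((3, 1)))

# Replace the whole state.
v.assign(tf.ones((3, 1)))

# Replace a single entry.
v[0, 0].assign(3.0)

# Add to the current state in place (the equivalent of +=).
v.assign_add(tf.ones((3, 1)))

print(v.numpy())
```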

What does the loss function define?

It defines the feedback signal used for learning

What does a support vector machine do?

It finds a good decision boundary: a line or surface separating the training data into two spaces corresponding to two different categories.

What can you tell from this layer: network.add(layers.Dense(10, activation='softmax'))

It is densely connected; it is a 10-way softmax layer, which means it will return an array of 10 probability scores (summing to 1).

What is the functional API for in Keras?

It is for directed acyclic graphs of layers, which lets you build completely arbitrary architectures.

What is the purpose of the loss function ?

It is how the network will be able to measure its performance on the training data, and thus how it will be able to steer itself in the right direction.

What is mini-batch stochastic gradient descent?

Learning happens by drawing random batches of data samples and their targets, and computing the gradient of the loss on the batch with respect to the model parameters. The model parameters are then moved a bit (the magnitude of the move is defined by the learning rate) in the opposite direction from the gradient.
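
The loop can be sketched in plain NumPy for a one-parameter linear model. Everything here is an assumption chosen for illustration: the synthetic data (y = 3x + 1), the mean squared error loss, the learning rate, and the batch size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, noise-free data that follows y = 3x + 1.
x = rng.uniform(-1.0, 1.0, size=(256, 1))
y = 3.0 * x + 1.0

w, b = 0.0, 0.0
learning_rate = 0.1
batch_size = 32

for step in range(500):
    # Draw a random batch of samples and their targets.
    idx = rng.choice(len(x), size=batch_size, replace=False)
    xb, yb = x[idx], y[idx]

    # Gradients of the batch mean squared error with respect to w and b.
    error = (w * xb + b) - yb
    grad_w = 2.0 * np.mean(error * xb)
    grad_b = 2.0 * np.mean(error)

    # Move the parameters a little in the direction opposite to the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)
```

In Keras this loop is what `fit()` runs for you, with the update rule supplied by the chosen optimizer.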

What is the definition of learning in the context of deep learning?

Learning means finding a set of values for the model's weights that minimizes a loss function for a given set of training data samples and their corresponding targets.

What are element-wise operations?

Operations that are applied independently to each entry in the tensors being considered
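
To show what "independently to each entry" means, the sketch below compares an explicit loop with NumPy's built-in element-wise addition. The helper `naive_add` and the sample matrices are assumptions.

```python
import numpy as np

def naive_add(x, y):
    # Works on two matrices of identical shape.
    assert x.shape == y.shape and x.ndim == 2
    result = x.copy()   # avoid overwriting the input
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Each output entry depends only on the matching input entries.
            result[i, j] += y[i, j]
    return result

a = np.array([[1.0, -2.0], [3.0, 4.0]])
b = np.array([[10.0, 20.0], [30.0, 40.0]])

looped = naive_add(a, b)
vectorized = a + b
print(np.array_equal(looped, vectorized))
```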

A Layer is an object that encapsulates two things. What are they?

Some state and some computation

What is the general workflow for finding an appropriate model size?

Start with relatively few layers and parameters, and increase the size of the layers or add new layers until you see diminishing returns with regard to validation loss.

What kind of tensors don't exist in Numpy (or in most other libraries), and why not?

Strings, because tensors live in preallocated, contiguous memory segments and strings, being variable length, would preclude the use of this implementation.

How many axes does a 3D tensor have? How many does a matrix have?

3, and 2, respectively

e = tf.matmul(a, b) What does this do?

Take the matrix product of two tensors

b = tf.square(a) What does this do?

Take the square.

What is eager execution?

Tensor operations get executed on the fly: at any point, you can print what the current result is, just like in NumPy.

What do we apply to find the gradient function mapping the current parameters and current batch of data to a gradient value?

The chain rule of derivation

______________________ takes integers as input, it looks up these integers in an internal dictionary, and it returns the associated vectors. It's effectively a dictionary lookup

The embedding layer
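
The lookup idea can be mimicked with plain NumPy indexing. The table below is random rather than learned, and the vocabulary size and vector length are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embedding_dim = 10, 4

# One row per integer index; in a real Embedding layer these rows are learned weights.
table = rng.normal(size=(vocab_size, embedding_dim))

# Two padded sequences of word indices, shape (samples, max_length).
indices = np.array([[1, 5, 0],
                    [7, 7, 2]])

# Indexing the table with the integer tensor returns one vector per index.
vectors = table[indices]
print(vectors.shape)
```

The result has shape (samples, max_length, embedding_dim), and identical indices always map to identical vectors.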

What does the optimizer specify?

The exact way in which the gradient of the loss will be used to update parameters: for instance, it could be the RMSProp optimizer, SGD with momentum, and so on.

The notion of layer compatibility refers specifically to what?

The fact that every layer will only accept input tensors of a certain shape and will return output tensors of a certain shape.

from keras import layers
layer = layers.Dense(32, input_shape=(784,))
What does 784 refer to?

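The size of each input sample: the layer accepts 2D tensors of shape (batch_size, 784), where the batch dimension is left unspecified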

What is accuracy in the context of image classification?

The fraction of the images that were correctly classified.

What is the purpose of the optimizer?

The fundamental trick in deep learning is to use the distance score generated by the loss function as a feedback signal to adjust the value of the weights a little, in a direction that will lower the loss score for the current example. That's what the optimizer does.

