Keras, Deep Learning With Python


What is a fully connected neural network?

Every neuron in one layer is connected to every neuron in the next.

What is a fully connected layer?

Every neuron in the layer is connected to every neuron in the next layer

What purpose does a loss function serve in deep learning?

It serves as a feedback signal

What is a kernel method?

Kernel methods are a group of classification algorithms.

The Sequential class is used only for what arrangement of layers?

For linear stacks of layers

What does GRU stand for?

Gated recurrent unit

What is a GRU?

Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks. The GRU is like a long short-term memory (LSTM) with a forget gate, but has fewer parameters than LSTM, as it lacks an output gate.

What's a metric used in image classification?

Accuracy: the fraction of the images that were correctly classified. It is a metric to monitor during training and testing.

What is the second way of turning lists into a tensor?

Multi-hot encoding the lists to turn them into vectors of 0s and 1s.
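
A minimal sketch of multi-hot encoding with NumPy. The function name `multi_hot_encode`, the toy index lists, and the vocabulary size of 4 are assumptions made for illustration.

```python
import numpy as np

def multi_hot_encode(sequences, dimension):
    # One row per sequence, one column per possible index.
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        # Set every index that appears in the sequence to 1 (repeats have no extra effect).
        results[i, sequence] = 1.0
    return results

encoded = multi_hot_encode([[0, 2], [1, 1, 3]], dimension=4)
print(encoded)
```

Each list becomes a fixed-length vector, so lists of different lengths end up in one tensor of shape (samples, dimension).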

When designing a neural network in Keras, we have to decide three things having to do with layers and nodes. What are they?

How many layers there should be, how many nodes should be in each layer and how the layers should be connected to each other

What does w, the weight (or parameter) imply?

How much weight or strength we should be giving to that incoming input. We can think of it as how important that input is

What is a loss function?

How the model will be able to measure its performance on the training data, and thus how it will be able to steer itself in the right direction

What does a loss function measure?

How the network will be able to measure its performance on the training data, and thus how it will be able to steer itself in the right direction.

You can intuitively understand the dimensionality of your representation space as _______________________

"how much freedom you're allowing the model to have when learning internal representations."

>>> x = np.zeros((300, 20)) >>> x = np.transpose(x) >>> print(x.shape) What is the new shape?

(20, 300)

x = np.array([[[5, 78, 2, 34, 0], [6, 79, 3, 35, 1], [7, 80, 4, 36, 2]], [[5, 78, 2, 34, 0], [6, 79, 3, 35, 1], [7, 80, 4, 36, 2]], [[5, 78, 2, 34, 0], [6, 79, 3, 35, 1], [7, 80, 4, 36, 2]]]) What is the shape of this tensor?

(3, 3, 5).

>>> x = np.array([[5, 78, 2, 34, 0], [6, 79, 3, 35, 1], [7, 80, 4, 36, 2]]) What is this rank 2 tensor's shape?

(3, 5)

From the above question, what is the shape of z?

(3,4,5)

x = np.array([[[5, 78, 2, 34, 0], [6, 79, 3, 35, 1], [7, 80, 4, 36, 2]], [[5, 78, 2, 34, 0], [6, 79, 3, 35, 1], [7, 80, 4, 36, 2]], [[5, 78, 2, 34, 0], [6, 79, 3, 35, 1], [7, 80, 4, 36, 2]], [[5, 78, 2, 34, 0], [6, 79, 3, 35, 1], [7, 80, 4, 36, 2]]]) what is x.shape?

(4, 3, 5). The outermost list holds 4 matrices, each matrix has 3 rows, and each row has 5 elements.

A vector has a shape with a single element, such as

(5, )

If you have 60,000 28x28 images, what is the shape of the resulting tensor?

(60000, 28, 28)

The MNIST dataset has 60,000 training images each of which is 28 x 28. What is its shape?

(60000, 28, 28)

train_images for MNIST has 60,000 images, each 28 x 28. What is its shape?

(60000, 28, 28)

>>> my_slice = train_images[10:100, 0:28, 0:28] What is my_slice.shape?

(90, 28, 28)

model = keras.Sequential([ layers.Dense(16, activation="relu"), layers.Dense(16, activation="relu"), layers.Dense(1, activation="sigmoid") ]) Having 16 units means the weight matrix W will have what shape?

(input_dimension, 16)
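
A rough NumPy sketch of what a 16-unit dense layer with relu computes. The input dimension of 8, the batch of 4 samples, and the helper name `dense_relu` are assumptions for illustration; this is not the Keras implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dimension = 8   # assumed for illustration
units = 16

# The weight matrix has shape (input_dimension, units); the bias has shape (units,).
W = rng.normal(size=(input_dimension, units))
b = np.zeros((units,))

def dense_relu(inputs, W, b):
    # output = relu(dot(input, W) + b)
    return np.maximum(inputs @ W + b, 0.0)

batch = rng.normal(size=(4, input_dimension))
output = dense_relu(batch, W, b)
print(W.shape, output.shape)
```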

>>> x = np.array([12, 3, 6, 14, 7]) >>> x.ndim What is x.ndim?

1

A 5D vector has how many axes?

1

What are the two steps of broadcasting?

1 Axes (called broadcast axes) are added to the smaller tensor to match the ndim of the larger tensor. 2 The smaller tensor is repeated alongside these new axes to match the full shape of the larger tensor.
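
The two steps can be spelled out by hand and compared with NumPy's implicit broadcasting. This is a sketch; the shapes (32, 10) and (10,) follow the other cards in this set, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((32, 10))
y = rng.random((10,))

# Step 1: add a broadcast axis so y has the same ndim as X.
y_expanded = np.expand_dims(y, axis=0)           # shape (1, 10)

# Step 2: repeat y along the new axis to match the full shape of X.
Y = np.concatenate([y_expanded] * 32, axis=0)    # shape (32, 10)

explicit = X + Y
implicit = X + y   # NumPy broadcasts y without materializing Y
```

In practice no new tensor is created; the repetition is virtual, which is why the implicit form is preferred.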

What are examples of transformations?

A coordinate system change, or linear projections (which may destroy information), translations, nonlinear operations (such as "select all points such that x > 0"), and so on

What is a layer?

A data-processing module that you can think of as a filter for data.

What is a decision boundary?

A decision boundary can be thought of as a line or surface separating your training data into two spaces corresponding to two categories

What is a deep-learning model?

A directed, acyclic graph of layers

For metrics, where would you write state-update logic for a custom metric?

In the update_state() method

What's another way of saying a 5D tensor?

A rank 5 tensor

What is the .h5 file extension?

An H5 file is a data file saved in the Hierarchical Data Format (HDF). It contains multidimensional arrays of scientific data.

What is a simple transformation such as a high-dimensional non-linear projection?

An SVM

What does a 10-way softmax layer return?

An array of 10 probability scores which sum to 1
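
As a sketch of why the scores sum to 1, here is a plain NumPy softmax applied to ten made-up logits. The `softmax` helper and the input values are assumptions for illustration, not the Keras layer itself.

```python
import numpy as np

def softmax(logits):
    # Subtracting the max keeps the exponentials numerically stable.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1, -1.0, 0.5, 0.0, 3.0, -2.0, 1.5, 0.2])
probs = softmax(logits)
print(probs.sum())
```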

What kind of a shape does a scalar have?

An empty shape, ().

What is the first step of broadcasting?

Axes (called broadcast axes) are added to the smaller tensor to match the ndim of the larger tensor.

What are the three most common use cases of neural networks?

Binary classification, multiclass classification, and scalar regression

How do you load a previously trained model?

By calling the load_model function and passing in a file name.

What shape does a vector have?

It has a shape with a single element such as (5,)

What can you tell from this layer: network.add(layers.Dense(10, activation='softmax'))

It is densely connected; it is a 10-way softmax layer, which means it will return an array of 10 probability scores (summing to 1).

What is the 'state' of an RNN?

RNNs iterate through the elements of input sequence while maintaining an internal "state", which encodes everything which it has seen so far.

e = tf.matmul(a, b) What does this do?

Take the matrix product of two tensors

What is the precise definition of encoding in machine learning?

The encoder is all of the model except for the final layer.

Each individual node is trained to perform a simple mathematical calculation and then feed its data to what?

To all the nodes it's connected to

What is the requirement for the 'deep' aspect of a deep neural network?

Two or more hidden layers

Where does the name "backpropagation" come from?

We "back propagate" the loss contributions of different nodes in a computation graph

How do you write an element-wise product?

With the * operator

Can a layer be stateless?

Yes

How would the nth batch of MNIST images be represented?

batch = train_images[128 * n : 128 * (n + 1)]
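
A small sketch of the same slicing wrapped in a helper. The zero-filled array stands in for the real MNIST data, and `get_batch` is a hypothetical name.

```python
import numpy as np

# Stand-in with the same shape as the MNIST training images.
train_images = np.zeros((60000, 28, 28), dtype="uint8")

def get_batch(images, n, batch_size=128):
    # Slice the nth batch along the first (samples) axis.
    return images[batch_size * n : batch_size * (n + 1)]

first_batch = get_batch(train_images, 0)
last_batch = get_batch(train_images, 468)   # 60000 is not a multiple of 128
print(first_batch.shape, last_batch.shape)
```

Because slicing stops at the end of the array, the final batch is simply shorter than the others.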

What is another term for 'tensor product'?

dot product

You can think of a layer as a _________ for data: some data goes in, and it comes out in a more useful form

filter

What does the topology of a network do?

it defines a hypothesis space

One or more tensors learned with stochastic gradient descent, together contain the network's _______________________.

knowledge

How do you modify the state of a variable?

via its assign method

If tensors aren't assignable, how do we update a model's state?

with variables

How do we add an empty first axis to y, whose shape becomes (1, 10)?

y = np.expand_dims(y, axis=0)

Given the following two vectors, how would you create a new tensor, z, built from the dot product of x and y?

z = np.dot(x, y)
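
To make the operation concrete, here is a naive loop version next to `np.dot`. The vectors and the helper name `naive_vector_dot` are chosen for illustration.

```python
import numpy as np

def naive_vector_dot(x, y):
    # Both inputs must be vectors of the same length.
    assert x.ndim == 1 and y.ndim == 1
    assert x.shape[0] == y.shape[0]
    z = 0.0
    for i in range(x.shape[0]):
        z += x[i] * y[i]
    return z

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
z = np.dot(x, y)
print(z, naive_vector_dot(x, y))
```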

How would you write a dot product in mathematical notation?

z=x•y

What is a kernel function?

A kernel function is a computationally tractable operation that maps any two points in your initial space to the distance between these points in your target representation space, completely bypassing the explicit computation of the new representation

What are the three parts of the compilation step?

A loss function, an optimizer, and metrics to monitor during training and testing

What is a tensor that contains only one number?

A scalar

How many axes does a 5-D tensor have?

Five

The central class of Keras is the ___________________.

Layer

What kind of tensors don't exist in Numpy (or in most other libraries), and why not?

Strings, because tensors live in preallocated, contiguous memory segments and strings, being variable length, would preclude the use of this implementation.

If y now has a shape of (1,10) which axis is axis=0?

The first axis, i.e., the axis of size 1

What is the purpose of the optimizer?

The fundamental trick in deep learning is to use the distance score generated by the loss function as a feedback signal to adjust the value of the weights a little, in a direction that will lower the loss score for the current example. That's what the optimizer does.

____________________ is the generalization of the concept of derivatives to functions of multidimensional inputs: that is, to functions that take tensors as inputs.

The gradient

What is an optimizer?

The mechanism through which the model will update itself based on the training data it sees, so as to improve its performance.

What is the rank of a tensor?

The number of axes

Does Layer.get_weights() return trainable or non-trainable weight values?

This function returns both trainable and non-trainable weight values associated with this layer as a list of NumPy arrays, which can in turn be used to load state into similarly parameterized layers.

What function do we call to train the model? What data do we pass in?

To train the model, we call model.fit and pass in the training data and the expected output for the training data.

In technical terms, we'd say that the transformation implemented by a layer is ____________________________ by its weights

parameterized

What does stochastic mean?

random

What is a hypothesis space?

space of possibilities

What do we call selecting specific elements in a tensor?

tensor slicing

What are the first two parameters in the layers.Dense() function?

units, activation=None (see https://machinelearningknowledge.ai/keras-dense-layer-explained-for-beginners/)

Deep learning models consist of chains of simple tensor operations, parameterized by ______________________

weights

Image data is usually processed by what kind of layers?

2D convolution layers (Conv2D).

train_images for MNIST has 60,000 images, each 28 x 28. What is its rank?

3

x = np.array([[[5, 78, 2, 34, 0], [6, 79, 3, 35, 1], [7, 80, 4, 36, 2]], [[5, 78, 2, 34, 0], [6, 79, 3, 35, 1], [7, 80, 4, 36, 2]], [[5, 78, 2, 34, 0], [6, 79, 3, 35, 1], [7, 80, 4, 36, 2]]]) What rank is this tensor?

3

How many axes does a 3D tensor have? How many does a matrix have?

3, and 2, respectively

The loss is the quantity you'll attempt to minimize during training, so what should it represent?

A measure of success for the task you're trying to solve.

What is a hypothesis space?

A predefined set of operations through which machine learning algorithms search, in order to find the best transformations that turn data into more useful representations for a given task.

What does backpropagation compute?

Backpropagation starts with the final loss value and works backward from the top layers to the bottom layers, computing the contribution that each parameter had in the loss value.

What are element-wise operations?

Operations that are applied independently to each entry in the tensors being considered
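
A sketch of an element-wise addition written as explicit loops, compared with the vectorized NumPy form. The helper name `naive_add` and the sample matrices are illustrative.

```python
import numpy as np

def naive_add(x, y):
    # Works on two rank-2 tensors of identical shape.
    assert x.ndim == 2 and x.shape == y.shape
    result = x.copy()   # avoid overwriting the input
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Each entry is processed independently of the others.
            result[i, j] += y[i, j]
    return result

x = np.array([[1.0, -2.0], [3.0, 4.0]])
y = np.array([[10.0, 20.0], [30.0, 40.0]])
looped = naive_add(x, y)
vectorized = x + y
```

Because every entry is independent, such operations are easy to parallelize, which is why the vectorized form is much faster.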

What is the rank, shape, and description of Vector data?

Rank-2 tensors of shape (samples, features), where each sample is a vector of numerical attributes ("features")

A scalar has what shape?

an empty shape, ( )

What is learning in the context of machine learning?

finding a set of values for the weights of all layers in a network, such that the network will correctly map example inputs to their associated targets

A vector has a shape with a ____________________ element, such as (5,)

single

output = relu(dot(input, W) + b). What is W?

the weight matrix

In padding lists so they all have the same length, what is the integer tensor's shape?

(samples, max_length)

So, for example, if we have b equal to negative 10, then the effects of x*w won't really start to overcome that b or bias term until their product surpasses _____________.

10

If the shape of a tensor is (10000, 28, 28), what is its length (the size of its first axis)?

10000

>>> x = np.array([[5, 78, 2, 34, 0], [6, 79, 3, 35, 1], [7, 80, 4, 36, 2]]) >>> x.ndim

2

model = keras.Sequential([ layers.Dense(512, activation="relu"), layers.Dense(10, activation="softmax") ]) How many layers are in this model?

2

z = np.array([[ ... [5, 78, 2, 34, 0], ... [59, 17, 4, 32, 11], ... [14, 88, 7, 22, 5], ... [7, 80, 4, 36, 2]], ... [[5, 78, 2, 34, 0], ... [4, 77, 5, 33, 3], ... [6, 79, 3, 35, 1], ... [7, 80, 4, 36, 2]], ... [[5, 78, 2, 34, 0], ... [6, 79, 3, 35, 1], ... [2, 56, 4, 44, 1], ... [7, 80, 4, 36, 2]]]) What is the rank of z (its ndim)?

3

Image data is normally stored in n-dimensional tensors. What is n?

4

>>> x = np.array([12, 3, 6, 14, 7]) This is a __-dimensional vector

5

A 5D tensor has how many axes?

5

A 5D vector has how many dimensions along its axis?

5

For the MNIST dataset, what is another way of writing: my_slice = train_images[10:100]

>>> my_slice = train_images[10:100, 0:28, 0:28]

For the MNIST dataset, what is this equivalent to: my_slice = train_images[10:100]

>>> my_slice = train_images[10:100, :, :] >>> my_slice.shape (90, 28, 28)

When is a sequential model appropriate?

A Sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor.

a = tf.ones((2, 2)) What does this describe?

A Tensor

from keras import layers layer = layers.Dense(32, input_shape=(784,)) What does 32 refer to?

A dense layer with 32 outputs, aka output units

What is a synonym for a dense neural network?

A fully connected network

What kind of neural network does this show?

A fully connected neural network.

What is another way of saying 'a function of multidimensional inputs'?

A function that takes a tensor as inputs

What is another word for a 2D tensor?

A matrix

What is crossentropy?

A quantity from the field of information theory that measures the distance between probability distributions or, in this case, between the ground-truth distribution and your predictions.
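
A minimal NumPy sketch of categorical crossentropy between a one-hot target and two candidate predictions. The function name, the clipping constant `eps`, and the example distributions are assumptions for illustration.

```python
import numpy as np

def categorical_crossentropy(true_dist, predicted_dist, eps=1e-12):
    # Clip to avoid log(0) for predictions that are exactly zero.
    predicted = np.clip(predicted_dist, eps, 1.0)
    return float(-np.sum(true_dist * np.log(predicted)))

target = np.array([0.0, 0.0, 1.0])       # ground truth: class 2
good = np.array([0.05, 0.05, 0.90])      # confident and correct
bad = np.array([0.70, 0.20, 0.10])       # confident and wrong

loss_good = categorical_crossentropy(target, good)
loss_bad = categorical_crossentropy(target, bad)
print(loss_good, loss_bad)
```

The prediction closer to the ground-truth distribution gets the lower score, which is what makes the quantity usable as a loss.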

What is the difference between a relu and a sigmoid function?

A relu (rectified linear unit) is a function meant to zero out negative values, whereas a sigmoid "squashes" arbitrary values into the [0, 1] interval, outputting something that can be interpreted as a probability.
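
Both functions are short enough to sketch directly in NumPy. The sample inputs are arbitrary.

```python
import numpy as np

def relu(x):
    # Negative values become 0; positive values pass through unchanged.
    return np.maximum(x, 0.0)

def sigmoid(x):
    # Squashes any real value into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

values = np.array([-2.0, 0.0, 3.0])
relu_out = relu(values)
sigmoid_out = sigmoid(values)
print(relu_out, sigmoid_out)
```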

In machine learning, how are rows used?

A row describes a single entity or observation and the columns describe properties about that entity or observation. The more rows you have, the more examples from the problem domain that you have.

What is another word for a zero dimensional tensor?

A scalar

What is regularization?

A set of best practices that actively impede the model's ability to fit perfectly to the training data, with the goal of making the model perform better during validation.

What is a five-dimensional vector?

A vector with five entries

What might be a good metric to monitor in training and testing for an image classification problem?

Accuracy (the fraction of the images that were correctly classified)

d=b+c What does this do?

Add two tensors (element-wise).

How do you indicate a tuple?

Although it is not necessary, it is conventional to enclose tuples in parentheses: >>> julia = ("Julia", "Roberts", 1967, "Duplicity", 2009, "Actress", "Atlanta, Georgia")

What is learning, in the context of machine learning?

An automatic search process for better representations

What acts as 'decoder' in a RNN?

Another RNN layer (or stack thereof) acts as "decoder"

What is shallow learning?

Any approach to machine learning that tends to focus on learning only one or two layers of representations of the data.

What's a representation?

At its core, it's a different way to look at data—to represent or encode data. For instance, a color image can be encoded in the RGB format (red-green-blue) or in the HSV format (hue-saturation-value): these are two different representations of the same data.

What algorithm does the optimizer use?

Backpropagation -- the central algorithm in deep learning.

What are some advanced ways to help gradient propagation?

Batch normalization, residual connections, and depth-wise separable convolutions

What is backpropagation?

Because all tensor operations in neural networks are differentiable, it's possible to apply the chain rule of derivation to find the gradient function mapping the current parameters and current batch of data to a gradient value.

How do we scale the test data and make sure it's scaled by the same amount as the training data?

By calling the transform function on this scaler instead of fit_transform.

How are kernel functions typically crafted?

By hand

What is step 3 of a Keras workflow?

Configure the learning process by choosing a loss function, an optimizer, and some metrics to monitor

What is a batch?

Deep learning models don't process an entire dataset at once; rather, they break the data into small batches. Concretely, here's one batch of our MNIST digits, with a batch size of 128: batch = train_images[:128]

What is step 2 of a Keras workflow?

Define a network of layers (or model ) that maps your inputs to your targets

What is step 1 of a Keras workflow?

Define your training data: input tensors and target tensors

Simple vector data, stored in 2D tensors of shape (samples, features), is often processed by what kind of layers?

Densely connected layers, otherwise known as fully connected or dense layers (the Dense class in Keras)

What is dimensionality?

Dimensionality can denote either the number of entries along a specific axis (as in the case of our 5D vector) or the number of axes in a tensor (such as a 5D tensor), which can be confusing at times.

What is the name of the process of extracting useful representations manually?

Feature engineering

What is the process of manually engineering good layers of representations for data?

Feature engineering

When we compile the model, we need to tell Keras two important things. What are they?

First, we need to tell it how we want to measure the accuracy of each prediction made by the model during the training process. This is called the loss function. Keras lets us choose from several standard loss functions or define our own. Second, we need to tell Keras which optimizer algorithm we want to use to train the model.

What is gradient boosting good for?

Gradient boosting is used for problems where structured data is available, whereas deep learning is used for perceptual problems such as image classification.

In machine learning, what is a kernel?

In machine learning, a "kernel" is usually used to refer to the kernel trick, a method of using a linear classifier to solve a non-linear problem.

Where does the knowledge of the model persist?

In weight tensors, which are attributes of the layers.

What does the loss function define?

It defines the feedback signal used for learning

What does a support vector machine do?

It finds a good decision boundary: a line or surface separating the training data into two spaces corresponding to two different categories.

What is a tuple?

It groups any number of items into a single compound value. Syntactically, it is a comma-separated sequence of values.

What is the functional API for in Keras?

It is for directed acyclic graphs of layers, which lets you build completely arbitrary architectures.

What is the purpose of the loss function ?

It is how the network will be able to measure its performance on the training data, and thus how it will be able to steer itself in the right direction.

What is a gradient?

It is the derivative of a tensor operation

What does the deep in deep learning refer to?

It isn't a reference to any kind of deeper understanding achieved by the approach; rather, it stands for this idea of successive layers of representations.

What does fit_transform mean?

It means we first fit the scaler to our data, to figure out how much to scale the numbers in each column, and then actually transform (scale) the data.
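
The split between fitting and transforming can be sketched with a hand-rolled min-max scaler. `MinMaxScalerSketch` is a hypothetical stand-in written for illustration, not the scaler class the cards refer to; it assumes no column is constant.

```python
import numpy as np

class MinMaxScalerSketch:
    """Hypothetical stand-in for a scaler with fit/transform methods."""

    def fit(self, data):
        # Learn the per-column minimum and range from the data.
        self.min_ = data.min(axis=0)
        self.range_ = data.max(axis=0) - self.min_
        return self

    def transform(self, data):
        # Apply the previously learned scaling; nothing is re-learned here.
        return (data - self.min_) / self.range_

    def fit_transform(self, data):
        return self.fit(data).transform(data)

train = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
test = np.array([[5.0, 40.0]])

scaler = MinMaxScalerSketch()
train_scaled = scaler.fit_transform(train)   # fit on training data, then scale it
test_scaled = scaler.transform(test)         # reuse the training statistics
print(train_scaled, test_scaled)
```

Calling only the transform step on the test data means it is scaled by the same amounts as the training data, so test values may fall outside the training range.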

What does the loss function do?

It takes the predictions of the network and the true target (what you wanted the network to output) and computes a distance score, capturing how well the network has done on this specific example

What does the term "regularization" imply?

It tends to make the model simpler, more "regular," its curve smoother, more "generic"; thus it is less specific to the training set and better able to generalize by more closely approximating the latent manifold of the data.

If the last layer is a 10-way softmax classification layer, what does that mean?

It will return an array of 10 probability scores (summing to 1)

What is an objective function?

It's a synonym for the loss function: to control the output of a neural network, you need to be able to measure how far this output is from what you expected. In this sense, the 'objective' function refers to how close you are to accomplishing the objective of the network.

What is the sequential model API in Keras and why is it called 'sequential'?

It's called the sequential model API because you first create an empty model object, and then you add layers to it one after another in sequence.

What is step 4 of a Keras workflow?

Iterate on your training data by calling the fit() method of your model

What defines a layer's state?

Its weights

What are kernel methods?

Kernel methods are a group of classification algorithms, the best known of which is the support vector machine (SVM)

Everything in Keras is either a ________________ or something that closely interacts with one.

Layer

What might be a better term than 'deep' learning?

Layered representations learning, or hierarchical representations learning. These better describe what is actually happening.

What is a dense neural network?

A network whose layers are fully connected (dense): each neuron in a layer receives an input from all the neurons in the previous layer, so the layers are densely connected.

What is mini-batch stochastic gradient descent?

Learning happens by drawing random batches of data samples and their targets, and computing the gradient of the model parameters with respect to the loss on the batch. The model parameters are then moved a bit (the magnitude of the move is defined by the learning rate) in the opposite direction from the gradient.
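
A toy sketch of mini-batch SGD on a one-variable linear model, written in plain NumPy. The synthetic data (targets near 3x + 1), the mean-squared-error loss, the learning rate, and the batch size are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data, assumed for illustration: targets follow y = 3x + 1 plus small noise.
x = rng.uniform(-1.0, 1.0, size=(1000, 1))
y = 3.0 * x + 1.0 + 0.01 * rng.normal(size=(1000, 1))

w = np.zeros((1, 1))
b = np.zeros((1,))
learning_rate = 0.1
batch_size = 32

for step in range(2000):
    # Draw a random batch of samples and their targets.
    idx = rng.integers(0, len(x), size=batch_size)
    x_batch, y_batch = x[idx], y[idx]

    # Forward pass and error on the batch.
    error = x_batch @ w + b - y_batch

    # Gradient of the mean squared error with respect to w and b.
    grad_w = 2.0 * x_batch.T @ error / batch_size
    grad_b = 2.0 * error.mean(axis=0)

    # Move the parameters a little in the opposite direction from the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)
```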

What is the definition of learning in the context of deep learning?

Learning means finding a set of values for the model's weights that minimizes a loss function for a given set of training data samples and their corresponding targets.

e *= d What does this do?

Multiply two tensors (element-wise).

What are the three key attributes that define a tensor?

Number of axes (rank); shape (a tuple of integers that describes how many dimensions the tensor has along each axis) and data type

In the model train test evaluation flow, what is the evaluation phase?

Once the model is trained and tested, we can use it in the real world. We pass in new data, and it gives us a prediction.

How many axes does a five-dimensional vector have?

One

The gradient-descent process must be based on how many scalar loss values?

One

What is the first way of turning a list into a tensor?

Pad your lists so that they all have the same length
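
A minimal sketch of padding with NumPy. The helper name `pad_lists`, the pad value of 0, and padding at the end rather than the start are arbitrary choices made for illustration.

```python
import numpy as np

def pad_lists(lists, pad_value=0):
    # The longest list determines the second dimension.
    max_length = max(len(item) for item in lists)
    padded = np.full((len(lists), max_length), pad_value, dtype="int64")
    for i, item in enumerate(lists):
        padded[i, : len(item)] = item
    return padded

padded = pad_lists([[3, 5], [7, 1, 4, 9], [2]])
print(padded.shape)
```

The result is an integer tensor of shape (samples, max_length).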

How would you schedule actions to be taken at specific points during training?

Pass callbacks to the fit() method

What is the rank, shape, and description of Timeseries data or sequence data?

Rank-3 tensors of shape (samples, timesteps, features), where each sample is a sequence (of length timesteps) of feature vectors

What is the rank, shape, and description of Images?

Rank-4 tensors of shape (samples, height, width, channels), where each sample is a 2D grid of pixels, and each pixel is represented by a vector of values ("channels")

What is the rank, shape, and description of Video?

Rank-5 tensors of shape (samples, frames, height, width, channels), where each sample is a sequence (of length frames) of images

What does reshaping a tensor mean?

Rearranging its rows and columns to match a target shape
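
A short NumPy sketch of reshaping; the 3 x 2 example matrix is made up for illustration.

```python
import numpy as np

x = np.array([[0.0, 1.0],
              [2.0, 3.0],
              [4.0, 5.0]])        # shape (3, 2)

column = x.reshape((6, 1))        # same six values as a single column
wide = x.reshape((2, 3))          # same six values as two rows of three
print(column.shape, wide.shape)
```

The total number of entries stays the same; only their arrangement into axes changes.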

How do you reduce the size of the model?

Reduce the number of learnable parameters in the model, determined by the number of layers and the number of units per layer.

What is the simplest way to mitigate overfitting?

Reduce the size of the model

What does Y = np.concatenate([y] * 32, axis=0) do?

Repeat y 32 times along axis 0 to obtain Y, which has shape (32, 10).

What does Layer.get_weights() return?

Returns the current weights of the layer, as NumPy arrays.

What is Machine Learning?

Searching for useful representations of some input data, within a predefined space of possibilities, using guidance from a feedback signal.

A Layer is an object that encapsulates two things. What are they?

Some state and some computation

What is the general workflow for finding an appropriate model size?

Start with relatively few layers and parameters, and increase the size of the layers or add new layers until you see diminishing returns with regard to validation loss.

b = tf.square(a) What does this do?

Take the square

c = tf.sqrt(a) What does this do?

Take the square root

What is eager execution?

Tensor operations get executed on the fly: at any point, you can print what the current result is, just like in NumPy.

What is Broadcasting?

The altering of a smaller tensor to match the shape of a larger tensor.

What is supervised machine learning?

The branch of machine learning where we train the model by showing it input data and the expected result for that data, and it works out how to transform the input into the expected output.

What do we apply to find the gradient function mapping the current parameters and current batch of data to a gradient value?

The chain rule of derivation

______________________ takes integers as input, it looks up these integers in an internal dictionary, and it returns the associated vectors. It's effectively a dictionary lookup

The embedding layer

In python, what is the colon (:) equivalent to?

The entire axis

What does the optimizer specify?

The exact way in which the gradient of the loss will be used to update parameters: for instance, it could be the RMSProp optimizer, SGD with momentum, and so on.

The notion of layer compatibility refers specifically to what?

The fact that every layer will only accept input tensors of a certain shape and will return output tensors of a certain shape.

from keras import layers layer = layers.Dense(32, input_shape=(784,)) What does 784 refer to?

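The size of the first dimension of the input: the layer accepts only 2D tensors whose first dimension is 784 (the batch axis is left unspecified)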

What is accuracy in the context of image classification?

The fraction of the images that were correctly classified.

What is symbolic AI?

The idea that human-level artificial intelligence could be achieved by having programmers handcraft a sufficiently large set of explicit rules for manipulating knowledge

What are the two essential characteristics of how deep learning learns from data?

The incremental, layer-by-layer way in which increasingly complex representations are developed, and the fact that these intermediate incremental representations are learned jointly, each layer being updated to follow both the representational needs of the layer above and the needs of the layer below.

What should the loss represent?

The loss is the quantity you'll attempt to minimize during training, so it should represent a measure of success for the task you're trying to solve.

What is an optimizer in a Deep Learning network?

The mechanism through which the network will update itself based on the data it sees and its loss function.

What is an optimizer?

The mechanism through which the network will update itself based on the data it sees and its loss function.

What is the decoder in an RNN trained to predict?

The next characters of the target sequence, given previous characters of the target sequence. Specifically, it is trained to turn the target sequences into the same sequences but offset by one timestep in the future

What are the key three attributes that define a tensor?

The number of axes (rank), the shape, and the data type

What is the depth of a model?

The number of layers that contribute to a model of the data.

model = keras.Sequential([ layers.Dense(16, activation="relu"), layers.Dense(16, activation="relu"), layers.Dense(1, activation="sigmoid") ]) What is the first argument being passed to each layer?

The number of units in the layer

What determines how learning proceeds?

The optimizer

What does the optimizer specify?

The optimizer specifies the exact way in which the gradient of the loss will be used to update parameters: for instance, it could be the RMSProp optimizer, SGD with momentum, and so on.

Why bother saving a model?

The saved model file will contain everything we need to use our model in another program.

What is the shape of a numpy array?

The shape attribute for numpy arrays returns the dimensions of the array. If Y has n rows and m columns, then Y.shape is (n,m). So Y.shape[0] is n.

What is the second step of broadcasting?

The smaller tensor is repeated alongside these new axes to match the full shape of the larger tensor.

When is the state of an RNN reset?

The state of an RNN is reset when processing two different and independent sequences.

What is the best known kernel method?

The support vector machine (SVM).

What defines a hypothesis space?

The topology of a network

What is the name of the data that the model will learn from?

The training set

What is the purpose of word embeddings?

They are meant to map human language into a geometric space.

What is the rank, shape, and description of timeseries data or sequence data?

Rank-3 tensors of shape (samples, timesteps, features), where each sample is a sequence (of length timesteps) of feature vectors

What is the central problem in machine learning and deep learning?

To meaningfully transform data: in other words, to learn useful representations of the input data at hand—representations that get us closer to the expected output.

What is transposition?

Transposing a matrix means exchanging its rows and its columns, so that x[i, :] becomes x[:, i]

What does the following code do, and what is the answer?
time = tf.Variable(0.)
with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        position = 4.9 * time ** 2
    speed = inner_tape.gradient(position, time)
acceleration = outer_tape.gradient(speed, time)

Uses the outer tape to compute the gradient of the gradient from the inner tape. Naturally, the answer is 4.9 * 2 = 9.8.
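
The value can be checked by hand by differentiating twice with respect to time, writing t for the `time` variable:

```latex
\begin{aligned}
\text{position}(t) &= 4.9\,t^{2} \\
\text{speed}(t) &= \frac{d}{dt}\,\text{position}(t) = 9.8\,t \\
\text{acceleration}(t) &= \frac{d}{dt}\,\text{speed}(t) = 9.8
\end{aligned}
```

The second derivative is constant, so the result does not depend on the starting value of the variable.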

What are the two ways of defining a model?

Using the Sequential class or the functional API

In the formula output = activation(dot(W, input) + b), what are W, b, and activation?

W and b are model parameters, and activation is an element-wise function (usually relu, but it could be softmax for the last layer)

When we are happy with the accuracy of the system, we can save the training model to a file. How do we do that?

We call model.save and pass in the file name.

How do we use the model to make new predictions? Which function do we call?

We call the predict function and pass in the new data we want predictions for.

How is a saved model used in the evaluation phase?

We load our previously trained model by calling the load_model function and passing in a file name.

A layer encapsulates some state (its ____________________) and some computation (a ___________________).

weights; forward pass

What is broadcasting?

When two tensors differ in shape and are subject to a mathematical operation, the smaller tensor will be adjusted to match the shape of the larger one

In numpy, how would you create a random matrix named X with shape (32, 10)?

X = np.random.random((32, 10))

What library do most practitioners of gradient boosting use?

XGBoost library, which offers support for the two most popular languages of data science: Python and R

Having added an empty first axis, how do we repeat y 32 times?

Y = np.concatenate([y] * 32, axis=0)
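
The cards on broadcasting, X, and y fit together as in the sketch below. The name y_row for the intermediate array with the added axis is an assumption made here for readability; the card reuses the name y.

```python
import numpy as np

X = np.random.random((32, 10))    # random matrix with shape (32, 10)
y = np.random.random((10,))       # random vector with shape (10,)

# Step 1: add an empty first axis, so the shape becomes (1, 10).
y_row = np.expand_dims(y, axis=0)

# Step 2: repeat the row 32 times along the new axis, giving shape (32, 10).
Y = np.concatenate([y_row] * 32, axis=0)

explicit_sum = X + Y     # shapes already match
broadcast_sum = X + y    # NumPy broadcasts y without materializing Y
```

In practice the explicit copy is not needed, since NumPy applies broadcasting at the algorithmic level; the two sums are identical.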

How do you determine the right number of layers or the right size for each layer?

You must evaluate an array of different architectures on your validation set, not on your test set

How would you track a constant tensor?

You'd have to manually mark it as being tracked by calling tape.watch() on it.

>>> x = np.array([[5, 78, 2, 34, 0], [6, 79, 3, 35, 1], [7, 80, 4, 36, 2]]) What is the first column of x?

[5, 6, 7]

>>> x = np.array([[5, 78, 2, 34, 0], [6, 79, 3, 35, 1], [7, 80, 4, 36, 2]]) What is the first row of x?

[5, 78, 2, 34, 0]
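
Both answers (first column and first row) can be reproduced with NumPy indexing, using the array from the cards:

```python
import numpy as np

x = np.array([[5, 78, 2, 34, 0],
              [6, 79, 3, 35, 1],
              [7, 80, 4, 36, 2]])

first_row = x[0]          # same as x[0, :], indexes the first axis
first_column = x[:, 0]    # every row, entry 0 along the second axis
```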

model = keras.Sequential([ layers.Dense(512, activation="relu"), layers.Dense(10, activation="softmax") ]) What does the 10 indicate?

a 10-way softmax classification layer, which means it will return an array of 10 probability scores (summing to 1). Each score will be the probability that the current digit image belongs to one of our 10 digit classes.
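
A sketch of what "10 probability scores summing to 1" means, written in plain NumPy. The logits are made-up numbers, and this softmax helper is an illustration, not the Keras implementation.

```python
import numpy as np


def softmax(logits):
    # Subtracting the max is a common numerical-stability trick;
    # it does not change the result.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)


# Hypothetical raw scores for the 10 digit classes (0 through 9).
logits = np.array([1.0, 2.0, 0.5, -1.0, 3.0, 0.0, 0.2, -0.3, 1.5, 0.7])
probs = softmax(logits)

# The predicted digit is the class with the highest probability.
predicted_class = int(np.argmax(probs))
```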

output = relu(dot(input, W) + b) and model = keras.Sequential([ layers.Dense(16, activation="relu"), layers.Dense(16, activation="relu"), layers.Dense(1, activation="sigmoid") ]) The dot product with W will project the input data onto what?

a 16-dimensional representation space (and then you'll add the bias vector b and apply the relu operation)

What is another term for a fully connected neural network?

a dense neural network

What's another way of saying "more units"?

a higher-dimensional representation space

In using the padding list to tensor technique, what kind of layer do you need to start your model?

a layer capable of handling such integer tensors: the Embedding layer

What is GradientTape in TensorFlow?

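an API for automatic differentiation (autodiff), which is the core functionality of TensorFlow. The tape records the operations run inside its scope and then uses that record to compute gradients, so it is a key part of performing the autodiff rather than something that merely "tracks" it.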

What is the result of (dot(x, y)) ?

a matrix with shape (x.shape[0], y.shape[1]), where the coefficients are the vector products between the rows of x and the columns of y.
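
A small NumPy check of the shape rule and of the "rows of x times columns of y" description. The matrices are arbitrary examples.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)     # shape (2, 3)
y = np.arange(12).reshape(3, 4)    # shape (3, 4); x.shape[1] == y.shape[0]

z = np.dot(x, y)                   # shape (x.shape[0], y.shape[1]) = (2, 4)

# Entry z[i, j] is the dot product of row i of x and column j of y.
entry = np.dot(x[1, :], y[:, 2])
```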

What is validation data?

a set of inputs that the model doesn't see during training.

For multiloss networks, all losses are combined (via averaging) into __________________________________.

a single scalar quantity

What is teacher forcing?

a training process by which a model is trained to turn the target sequence into the same sequence, but offset by one timestep in the future
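
The "offset by one timestep" idea can be sketched with plain Python lists. The tokens, including the [start] and [end] markers, are hypothetical examples and not taken from the source.

```python
# A hypothetical target sentence, already tokenized.
target = ["[start]", "le", "chat", "dort", "[end]"]

# What the decoder is fed during training (everything but the last token)...
decoder_input = target[:-1]
# ...and what it is asked to predict (the same sequence, shifted by one step).
decoder_target = target[1:]

# At each step the model sees the true previous token and predicts the next.
pairs = list(zip(decoder_input, decoder_target))
```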

What is shape?

a tuple of integers that describes how many dimensions the tensor has along each axis

A simple API should have a single _____________________ around which everything is centered.

abstraction

What shape does a scalar have?

an empty shape, ().
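
A quick NumPy illustration of shape and rank (ndim) for a scalar, a vector, and a matrix. The values are arbitrary.

```python
import numpy as np

scalar = np.array(12)                          # rank-0 tensor
vector = np.array([12, 3, 6, 14, 7])           # rank-1 tensor
matrix = np.array([[5, 78, 2], [6, 79, 3]])    # rank-2 tensor

shapes = [t.shape for t in (scalar, vector, matrix)]
ranks = [t.ndim for t in (scalar, vector, matrix)]
```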

What is a feature?

an individual measurable property or characteristic of a phenomenon

x*w + b. What can we think of b as?

an offset value. x times w has to reach a certain threshold before having an effect and overcoming that b term.

Before you start training a model, you need to pick three things. What are they?

an optimizer, a loss, and some metrics

The rank of a tensor is another way of describing how many ______________ it has

axes

In the context of tensors, what is another word for dimension?

axis

What algorithm does the optimizer use to adjust the values of the weights a little, in a direction that will lower the loss score?

backpropagation

What is the central algorithm in deep learning?

backpropagation

What are text-processing models called that treat input words as a set, discarding their original order?

bag-of-words models

With a batch size of 128, how would the second batch of MNIST images be represented?

batch = train_images[128:256]
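
The same slice generalizes to the n-th batch. In this sketch a small zero-filled array stands in for the real MNIST train_images, and get_batch is a hypothetical helper, not a Keras function.

```python
import numpy as np

# Stand-in for MNIST: 1000 blank 28x28 images (the real set is larger).
train_images = np.zeros((1000, 28, 28), dtype="uint8")


def get_batch(data, n, batch_size=128):
    # Batch n covers samples [batch_size * n, batch_size * (n + 1)).
    return data[batch_size * n : batch_size * (n + 1)]


first_batch = get_batch(train_images, 0)     # train_images[:128]
second_batch = get_batch(train_images, 1)    # train_images[128:256]
last_batch = get_batch(train_images, 7)      # partial: only 104 samples remain
```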

Classifying movie reviews as positive or negative is an example of _________________________

binary classification

Classifying movie reviews as positive or negative is an example of what kind of classification?

binary classification

What loss function might you use for a two-class classification problem?

binary crossentropy

How does the scaler scale the data behind the scenes?

by multiplying the data by a constant and adding another constant.
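
The card does not say which scaler it means. Assuming min-max scaling to the range [0, 1], the "multiply by a constant, add a constant" description works out as below. The data values are made up.

```python
import numpy as np

data = np.array([10.0, 20.0, 25.0, 50.0])

# Constant 1: the multiplier that maps the data range onto a width of 1.
scale = 1.0 / (data.max() - data.min())
# Constant 2: the offset that moves the minimum to 0.
offset = -data.min() * scale

scaled = data * scale + offset
```

Keeping the two constants around makes it possible to apply the identical transformation to new data later, and to undo it with (scaled - offset) / scale.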

What loss function might you use for a many-class classification problem?

categorical crossentropy

On rare occasions, you may see a ______________ tensor

char

Create a class called RootMeanSquaredError. What's the syntax to subclass the Metric class?

class RootMeanSquaredError(keras.metrics.Metric):

In machine learning, a category in a classification problem is called a _____________________.

class.

Is logistic regression a regression algorithm or a classification algorithm?

classification

The entries from the second axis in a NumPy matrix are called the __________.

columns

Building deep-learning models in Keras is done by clipping together ___________________________ to form useful data-transformation pipelines.

compatible layers

What are convolutional networks generally used for?

computer vision

What loss function might you use for a sequence-learning problem?

connectionist temporal classification (CTC)

A neural network expects to process __________________ batches of data.

contiguous

Most of deep learning consists of chaining together simple layers that will implement a form of progressive _____________________________

data distillation

The Embedding layer is best understood as a _________________ that maps integer indices (which stand for specific words) to dense vectors.

dictionary
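
The dictionary analogy can be sketched as a table lookup in NumPy. This illustrates the idea only and is not the Keras Embedding layer itself. The vocabulary size, vector size, and indices are arbitrary, and the table is random here, whereas a real layer learns it.

```python
import numpy as np

vocab_size, embedding_dim = 6, 3
rng = np.random.default_rng(0)

# One dense vector per integer word index.
embedding_table = rng.normal(size=(vocab_size, embedding_dim))

# Two padded sequences of four word indices each: shape (2, 4).
word_indices = np.array([[1, 4, 2, 0],
                         [3, 3, 5, 0]])

# Looking up every index yields a tensor of shape (2, 4, 3).
embedded = embedding_table[word_indices]
```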

The entire learning process is made possible by the fact that neural networks are chains of ___________________________

differentiable tensor operations

As soon as one of the two tensors has an ndim greater than 1, dot is no longer symmetric. What does this mean?

dot(x, y) isn't the same as dot(y, x)

What is the data type usually called in Python libraries?

dtype

How do you compute the error rate of a trained model on the test data?

error_rate = model.evaluate(testing_data, expected_output)

What can sequence-to-sequence translation be used for, in addition to machine translation?

free-form question answering (generating a natural language answer given a natural language question). More generally, it is applicable any time you need to generate text.

A network, or model, can also be thought of as a _____________

function. In machine learning, the specific model you are using is the function and requires parameters in order to make a prediction on new data.

Machine learning, in a way, is the science of ______________________?

generalization

The attempt to generate knowledge that can be used across different tasks.

generalization

Why is GradientTape so named?

it is used to record ("tape") a sequence of operations performed upon some input and producing some output, so that the output can be differentiated with respect to the input (via backpropagation / reverse-mode autodiff) (in order to then perform gradient descent optimization).

What is a disadvantage of having more units?

it makes the model more computationally expensive and may lead to learning unwanted patterns

What is the general definition of encoding in machine learning?

it means 'to convert data into a required format'

mini-batch stochastic gradient descent (mini-batch SGD). What does the word 'stochastic' mean in this context?

it refers to the fact that each batch of data is drawn at random
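
A minimal NumPy sketch of mini-batch SGD on a toy linear model, to show where the randomness enters. The data, learning rate, and batch size are all assumptions chosen for illustration, and the gradients are written out by hand rather than computed by a framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: targets follow y = 3x + 1 exactly.
X = rng.normal(size=(256, 1))
y = 3.0 * X[:, 0] + 1.0

w, b = 0.0, 0.0
learning_rate, batch_size = 0.1, 32

for step in range(500):
    # "Stochastic": each batch is drawn at random from the training data.
    idx = rng.choice(len(X), size=batch_size, replace=False)
    xb, yb = X[idx, 0], y[idx]

    # Mean squared error on the batch and its gradients.
    error = (w * xb + b) - yb
    grad_w = 2 * np.mean(error * xb)
    grad_b = 2 * np.mean(error)

    # Move the parameters a little in the direction that lowers the loss.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
```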

What does the fit() method do?

it runs mini-batch gradient descent for you. You can also use it to monitor your loss and metrics on validation data

What is the purpose of units in the layers.Dense() function?

units is a positive integer that sets the output size (dimensionality) of the layer.

What is sequence-to-sequence learning (Seq2Seq)?

it's about training models to convert sequences from one domain (e.g. sentences in English) to sequences in another domain (e.g. the same sentences translated to French).

What is matmul?

it's how you say "dot product" in TensorFlow

If the gradient of the position of an object with regard to time is the speed of that object, what is the second-order gradient?

its acceleration.

The class associated with a specific sample is called a _________________.

label

These nodes, or neurons, are arranged into a series of groups called _____________________.

layers

model = keras.Sequential([ layers.Dense(512, activation="relu"), layers.Dense(10, activation="softmax") ]) Which is the last layer?

layers.Dense(10, activation="softmax")

What is the most common network architecture by far?

linear stacks of layers

What loss function might you use for a regression problem?

mean squared error

What is an example of code to define a previously trained model?

model = keras.models.load_model('trained_model.h5')

Learning means finding a combination of ____________________ that minimizes a loss function for a given set of training data samples and their corresponding targets.

model parameters

What parameters does the model.fit() function take?

model.fit(training_data, expected_output)

Once your model is trained, what method do you use to generate predictions on new inputs?

model.predict()

What's an example of the model.save function?

model.save("trained_model.h5")

Layers are assembled into ____________________.

models

Name some of the arbitrary network architectures that Keras supports

multi-input or multi-output models, layer sharing, and model sharing

Classifying news wires by topic is an example of ______________

multiclass classification

Classifying news wires by topic is an example of what kind of classification?

multiclass classification

A neural network that has multiple outputs may have how many loss functions?

multiple loss functions (one per output)

From the train_images tensor of MNIST, select digits #10 to #100 (#100 isn't included) from the training set and put them in an array of shape (90, 28, 28) called my_slice.

my_slice = train_images[10:100]

In general, you may select slices between any two indices along each tensor axis. How would you select 14 × 14 pixels in the bottom-right corner of all MNIST training images?

my_slice = train_images[:, 14:, 14:] (because each image is 28 × 28)

How would you crop all MNIST training images to patches of 14 × 14 pixels centered in the middle?

my_slice = train_images[:, 7:-7, 7:-7]
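
The three slicing cards above can be checked by looking at the resulting shapes. A small zero-filled array stands in for the real MNIST train_images here, and the names bottom_right and center are chosen for this sketch.

```python
import numpy as np

# Stand-in for MNIST: 1000 blank 28x28 images (the real set is larger).
train_images = np.zeros((1000, 28, 28), dtype="uint8")

my_slice = train_images[10:100]             # digits #10 to #99
bottom_right = train_images[:, 14:, 14:]    # bottom-right 14x14 corner
center = train_images[:, 7:-7, 7:-7]        # centered 14x14 patch
```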

What is a tensor's rank called in Python libraries such as Numpy?

ndim

Layers are combined into a _____________________

network (or model)

What does the network.compile function look like in Keras?

network.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

Which machine learning algorithm does Keras use?

neural networks

A neural network is a machine-learning algorithm made up of individual nodes called ____________.

neurons

Are TensorFlow tensors assignable?

no

Like layers, are metrics updated via backpropagation?

no

What is another term for neurons?

nodes

Nodes in each layer are connected to ______________________________.

nodes in the following layer

In NumPy, a tensor product is done using the _________ function

np.dot (because the mathematical notation for tensor product is usually a dot)

What does ndim tell you?

number of axes (rank)

What kind of object is a layer's weights?

one or more tensors

What is the input transformation implemented by the Dense layer?

output = activation(dot(W, input) + b)

each Dense layer with a relu activation implements a chain of tensor operations. Setting the variable as 'output', what is that chain of tensor operations?

output = relu(dot(input, W) + b)
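
The chain of tensor operations can be written out in NumPy. This is a sketch of the formula and not the Keras Dense implementation. The batch size, input size, and random weights are assumptions; 16 units matches the Dense(16) example used elsewhere in this set.

```python
import numpy as np


def relu(x):
    # Element-wise max(x, 0).
    return np.maximum(x, 0.0)


rng = np.random.default_rng(0)
inputs = rng.normal(size=(4, 8))    # batch of 4 samples, 8 features each
W = rng.normal(size=(8, 16))        # weight matrix for 16 units
b = np.zeros(16)                    # bias vector, one entry per unit

# Project onto a 16-dimensional space, add the bias, apply relu.
output = relu(np.dot(inputs, W) + b)
```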

A model that is too small will not _____________________

overfit

What are unwanted patterns?

patterns that will improve performance on the training data but not on the test data

What's the first requirement of creating a variable?

provide some initial value, such as a random tensor

Sequence data, stored in 3D tensors of shape (samples, timesteps, features), is typically processed by what kind of layers?

recurrent layers, such as an LSTM layer.

What kind of neural network is used for sequence processing?

recurrent network

Estimating the price of a house, given real-estate data, is an example of ______________

regression

Reducing the network's size is one of the most common ______________________ techniques

regularization

What do layers extract from the data fed into them?

representations —hopefully, representations that are more meaningful for the problem at hand.

What does the following code do?
input_var = tf.Variable(initial_value=3.)
with tf.GradientTape() as tape:
    result = tf.square(input_var)
gradient = tape.gradient(result, input_var)

It retrieves the gradient of a differentiable expression with respect to one of its inputs. Here result = input_var ** 2, so the gradient at input_var = 3 is 2 * 3 = 6.

The entries from the first axis in a NumPy matrix are called the _____________

rows

What are the two axes of a matrix?

rows and columns

In machine learning, data points are called ______________.

samples

In general, the first axis (axis 0, because indexing starts at 0) in all data tensors you'll come across in deep learning will be the ___________ axis

samples axis (sometimes called the samples dimension). In the MNIST example, "samples" are images of digits.

The dot product between two vectors is a _____________

scalar

Estimating the price of a house, given real-estate data, is an example of what?

scalar regression

What are text-processing models that care about word order called?

sequence models

In the context of deep learning, what is another word for 'parameters'?

settings, also 'weights'

What does a layer encapsulate?

some weights and some computation

The weights of a layer represent the _______________ of the layer

state

In order to be useful for a neural network, a list must be turned into a __________________.

tensor

Selecting specific elements in a tensor is called _______________

tensor slicing

What is the difference between tensor.shape and tensor.size() (in PyTorch)?

In PyTorch, tensor.shape is an alias for tensor.size(); shape is an attribute, and size() is a method

A layer is a data-processing module that takes as input one or more _____________ and that outputs one or more ______________.

tensors

Are weights scalars, matrices, or tensors?

tensors

To train a model, we'll need to update its state, which is a set of ____________.

tensors

In an image classification problem, what will the test set consist of?

test_images and test_labels. (That is, a set of images and their corresponding labels)

What is the class meant to manage modifiable state in TensorFlow?

tf.Variable

What is a constant tensor?

a tensor created with tf.constant; its values are fixed, and hence not trainable

In Keras, what is the single abstraction around which everything is centered?

the Layer class

What is the first axis of a batch tensor (axis 0) called?

the batch axis or batch dimension. This is a term you'll frequently encounter when using Keras and other deep learning libraries.

Input data in a neural network corresponds with what output data?

the corresponding targets

What is the meaning of 'the number of units in the layer'?

the dimensionality of representation space of the layer

To train a model, what method do you use?

the fit() method

The gradient tape is capable of computing second-order gradients. What are they?

the gradient of a gradient

What is the core building block of neural networks?

the layer

How/where do you specify the three required elements before you start training a model?

the model.compile() method.

Only vectors with ______________________ are compatible for a dot product

the same number of elements

Neural networks work best when all data is scaled to ______________________.

the same range

What does the decoder use as initial state?

the state vectors from the encoder, which is how the decoder obtains information about what it is supposed to generate. Effectively, the decoder learns to generate targets[t+1...] given targets[...t], conditioned on the input sequence.

What is an advantage of having more units?

this allows your model to learn more-complex representations

What kind of variables are tracked by default?

trainable variables

The specification of what a layer does to its input data is stored in the layer's ____________________.

weights

You can take the dot product of two matrices x and y (dot(x, y)) if and only if _____________.

x.shape[1] == y.shape[0]

In numpy, how would you create a random matrix named y with shape (10,)?

y = np.random.random((10,))

Are NumPy arrays assignable?

yes

Like layers, do metrics have an internal state stored in TensorFlow variables?

yes

