Machine Learning

How would you compute action advantage?

Subtract the mean and divide by the standard deviation

The idea behind convolutional neural networks is to filter the images before training the deep network: features in the images come to the forefront, and we spot those features to identify something.

True

γ (gamma) is the __________

Discount factor

Reinforcement learning's training instances are generally

Not independent

A, B, C, D are actions taken consecutively by the agent: A -> B -> C -> D. Action A -> reward 50, Action B -> reward 10, Action C -> reward -30, Action D -> reward 20. The discount factor is 0.5. What is the return for Action B?

0
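The arithmetic behind this card can be sketched in a few lines: the return for an action is the sum of the rewards that come after it, with the discount factor applied at each step.

```python
# Discounted return: sum the rewards from an action onward,
# multiplying by gamma once per step into the future.
def discounted_return(rewards, gamma):
    ret = 0.0
    for step, reward in enumerate(rewards):
        ret += (gamma ** step) * reward
    return ret

# Rewards from action B onward: B -> 10, C -> -30, D -> 20, gamma = 0.5
print(discounted_return([10, -30, 20], 0.5))  # 10 + 0.5*(-30) + 0.25*20 = 0.0
```

The same function gives the return for action A by prepending its reward: 50 + 0.5 * 0 = 50.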

Which of the following is a common strategy for the credit assignment problem? A.) Evaluate an action based on the sum of all the rewards that come after it, usually applying a discount factor γ (gamma) at each step. B.) Evaluate an action based on the sum of all the rewards that come before it, usually applying a discount factor γ (gamma) at each step. C.) Evaluate an action based on the sum of all the rewards that come after and before it, usually applying a discount factor γ (gamma) at each step. D.) None of the options listed

A

____________ is how much better or worse an action is, compared to the other possible actions, on average.

Action advantage

In reinforcement learning, a software ________ makes observations and takes actions within an _________, and in return receives _______

Agent, environment, rewards

Which of the following is true for convolutional neural networks?

All of the options listed

When used for reinforcement learning, a neural network will estimate a probability for each action, and then we will select an action randomly, according to the estimated probabilities. Why are we picking a random action based on the probabilities rather than just picking the action with the highest score?

Allows the agent to find the right balance between exploring new actions and exploiting the actions that are known to work well
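A minimal sketch of sampling an action from the estimated probabilities (the action names and values here are invented for illustration, not from the card):

```python
import random

# Hypothetical action probabilities, as a policy network might output them.
action_probs = {"left": 0.7, "right": 0.3}

# Sample an action according to the probabilities instead of always taking
# the argmax: "right" is still tried ~30% of the time (exploration), while
# "left" is chosen most often (exploitation).
action = random.choices(list(action_probs), weights=action_probs.values(), k=1)[0]
print(action)
```

Always picking the argmax would mean the agent never tries "right" again, even if its probability estimate is wrong.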

The _____________ algorithm is used to decide the filter weights in the convolutional and fully connected layers of CNNs

Back propagation

The agent needs to find the right balance between exploring the environment, looking for new ways of getting rewards, and exploiting sources of rewards that it already knows.

Balance exploring vs exploiting

An extra bias feature is generally added (x0 = 1): it outputs 1 all the time

Bias neuron

The ___________ is a very simple environment composed of a cart that can move left or right, and a pole placed vertically on top of it. The agent must move the cart left or right to keep the pole upright.

Cartpole

Which of the following can be done to tune the hyperparameters when the validation accuracy of a neural network is not good enough?

Change the learning rate, try another optimizer, try tuning model hyperparameters, try other hyperparameters such as batch size

In reinforcement learning, the only guidance the agent gets is through rewards, and rewards are typically sparse and delayed. Therefore, the last action is not entirely responsible for getting the last reward. The __________ decides how the reward credit should be distributed to the actions taken.

Credit assignment problem

Which of the following are true? A.) The policy could be a neural network B.) The policy does not have to be deterministic C.) In some cases it does not even have to observe the environment D.) All of the options listed

D

When an artificial neural network contains a deep stack of hidden layers, it is called a

Deep neural network

Richard Bellman's algorithm to find the optimal state value in a Markov decision process uses the _____________ algorithm design technique, which breaks down a complex problem into tractable subproblems that can be tackled iteratively.

Dynamic programming
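A minimal value-iteration sketch of that dynamic-programming idea, on a toy MDP (the states, transition probabilities T(s, a, s'), and rewards here are invented for illustration, not from the card):

```python
# Toy MDP: T[s][a] -> list of (probability, next_state), R[s][a] -> reward.
T = {
    0: {"stay": [(1.0, 0)], "go": [(0.8, 1), (0.2, 0)]},
    1: {"stay": [(1.0, 1)]},
}
R = {
    0: {"stay": 0.0, "go": 1.0},
    1: {"stay": 2.0},
}
gamma = 0.9

V = {s: 0.0 for s in T}  # initialize all state values to zero
for _ in range(100):     # repeatedly apply the Bellman optimality update
    V = {
        s: max(
            R[s][a] + gamma * sum(p * V[s2] for p, s2 in T[s][a])
            for a in T[s]
        )
        for s in T
    }
print(V)
```

Each iteration solves the smaller subproblem "value assuming the old estimates of the next states", and the estimates converge to the optimal state values.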

Which of the following is a policy search technique: A.) Stochastic policy search B.) Brute force policy search C.) Policy search using genetic algorithms D.) Policy search using optimization algorithms E.) all of the options listed

E

Which of the following is true for perceptrons? A.) Perceptrons are based on a threshold logic unit B.) The perceptron is a simple artificial neural network architecture C.) A perceptron is composed of a single layer of TLUs, with each TLU connected to all the inputs D.) The perceptron was invented in 1957 by Frank Rosenblatt E.) All of the options

E

One of the challenges of reinforcement learning is that in order to train an agent, you first need to have a working _________

Environment

How does reinforcement learning learn the unknown?

Experience each state and transition at least once to know the reward, and experience them multiple times to estimate the transition probabilities.

Supervised and Unsupervised Learning generally don't need to worry about ___________

Exploration

What is the goal of reinforcement learning?

Find a good policy

Using the backpropagation algorithm, we figure out the filters that will give the best output; this process is called

Feature extraction

The ________ method of a model created using the Keras library returns a __________ object.

fit(), History

When all the neurons in a layer are connected to every neuron in the previous layer

Fully connected layer, or a dense layer

To find the best hyperparameters of a neural network, we can use ________ or ________ to explore the hyperparameter space

GridSearchCV, RandomizedSearchCV

Step functions used in perceptrons could be the ___________ step function or the ____________ function

Heaviside, sign
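Both step functions are simple enough to write out directly:

```python
# Heaviside step function: 0 for negative inputs, 1 otherwise.
def heaviside(z):
    return 0 if z < 0 else 1

# Sign function: -1, 0, or +1 depending on the sign of the input.
def sign(z):
    return -1 if z < 0 else (0 if z == 0 else 1)

print(heaviside(-2), heaviside(3))  # 0 1
print(sign(-2), sign(0), sign(3))   # -1 0 1
```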

A single TLU can be used for simple ____________ (Like a logistic regression or linear SVM classifier)

Linear binary classification

Supervised and Unsupervised Learning are ___________

Independent

Supervised learning's supervision is __________ by means of giving labeled data

Indirect

These special passthrough neurons output whatever input they are fed. All of them together form the input layer of the perceptron.

Input neurons

Developed by François Chollet and open-sourced in 2015, __________ is a high-level deep learning API that allows you to easily build, train, evaluate, and execute all sorts of neural networks

Keras

During neural network training, network learning means ___________ the cost function

Minimizing
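A toy sketch of what "minimizing the cost function" means in practice: gradient descent nudges a weight downhill on the cost surface. The cost function and starting point here are invented for illustration.

```python
# Toy cost function with its minimum at w = 3.
def cost(w):
    return (w - 3) ** 2

# Its derivative with respect to w.
def gradient(w):
    return 2 * (w - 3)

w = 0.0                 # poor initial weight (the network starts out bad)
learning_rate = 0.1
for _ in range(200):    # each step moves w a little downhill on the cost
    w -= learning_rate * gradient(w)
print(round(w, 4))      # converges close to 3.0, where the cost is minimal
```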

Unsupervised learning's supervision is __________

Nonexistent

A multilayer perceptron is composed of: ______________ ______________ ______________

One input layer, one or more layers of hidden layers, one output layer

McCulloch and Pitts proposed a very simple model of the biological neuron, called the artificial neuron: it has _____________ binary (on/off) inputs and one _____________ output. A neuron is activated when _____________ of its inputs are active.

One or more, binary, at least two

__________ is a toolkit for developing and comparing reinforcement learning algorithms

OpenAI Gym

The ____________, noted π*(s): when the agent is in state s, it should choose the action with the highest Q-value for that state

Optimal policy
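Given a Q-table, that rule is just an argmax per state. The states, actions, and Q-values below are made up for illustration.

```python
# Hypothetical Q-table: Q[state][action] -> estimated Q-value.
Q = {
    "s0": {"left": 1.2, "right": 3.4},
    "s1": {"left": 0.5, "right": -0.1},
}

# pi*(s): pick the action with the highest Q-value in state s.
def optimal_policy(state):
    return max(Q[state], key=Q[state].get)

print(optimal_policy("s0"))  # right
print(optimal_policy("s1"))  # left
```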

The algorithm a software agent uses to determine its actions is called

Policy

The __________ layer of a CNN shrinks the image stack into a smaller size: it groups the pixels of the images and filters them down to a subset

Pooling

Facebook's ___________ is a library, released in 2018, that can be utilized for building, training, and executing neural networks

PyTorch

The _______________ of a state-action pair (s, a) is the sum of discounted future rewards the agent can expect on average after it reaches the state s and chooses action a, but before it sees the outcome of this action, assuming it acts optimally after that action

Q-value

We initialize the weights and biases __________. Initially, the neural network performs very poorly.

Randomly

R(s, a, s') is the ____________ the agent gets when it goes from state s to state s', having chosen action a.

Reward

Reinforcement learning's supervision is learned through

Rewards

The ___________ library is a ___________ learning library based on __________, developed at Google and open-sourced in 2018.

TF-Agents, reinforcement, TensorFlow

Which of these is not a layer in CNNs?

TLU layer

What are the main differences in the neural network architecture when using the Sequential API of Keras for regression vs. for classification?

The output layer has a single neuron, uses no activation function, the loss function is the mean squared error

The inputs and outputs of a ____________ (TLU) are ___________ (instead of binary on/off values), and each input connection is associated with a weight

Threshold logic unit, numbers
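A TLU's forward pass is a weighted sum plus a bias, followed by a step function. The inputs and weights below are invented for illustration.

```python
# Threshold logic unit: weighted sum of numeric inputs plus a bias,
# then a Heaviside step function on the result.
def tlu(inputs, weights, bias):
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if z >= 0 else 0  # Heaviside step

print(tlu([2.0, 1.0], [0.5, -1.0], 0.1))  # z = 1.0 - 1.0 + 0.1 = 0.1  -> 1
print(tlu([0.0, 2.0], [0.5, -1.0], 0.1))  # z = -2.0 + 0.1      = -1.9 -> 0
```

With appropriate weights and bias, this single unit performs simple linear binary classification, as the card above notes.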

What is the goal of supervised and unsupervised learning?

To find patterns in the data and make predictions

T(s, a, s') is the ____________ of going from state s to state s', when the agent chooses action a.

Transition probability

During reinforcement learning, the agent is not explicitly given the correct answer (i.e. labeled data). It must learn by ___________

Trial and error

While reinforcement learning is an old branch of machine learning, existing since the 1950s, the recent successes are mainly due to the application of the power of deep learning to the field of reinforcement learning

True

While exploring a search space to find the best hyperparameters for a neural network, we can utilize the __________ technique, which explores a region more when that region of the search space turns out to be good

Zooming

