Machine Learning
How would you compute action advantage?
Subtract the mean and divide by the standard deviation
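A minimal NumPy sketch of that computation (the returns array of discounted returns is a hypothetical input):

import numpy as np

def action_advantages(returns):
    # Normalize: subtract the mean and divide by the standard deviation,
    # so each value says how much better or worse an action did than average.
    returns = np.asarray(returns, dtype=np.float64)
    return (returns - returns.mean()) / returns.std()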
The idea behind convolutional neural networks is to filter the images before training the deep network, so that the features in the images come to the forefront and can be used to spot and identify things.
True
γ (gamma) is the __________
Discount factor
Reinforcement learning's training instances are generally
Not independent
A, B, C, D are actions taken consecutively by the agent (A -> B -> C -> D), with rewards: Action A -> 50, Action B -> 10, Action C -> -30, Action D -> 20. The discount factor is 0.5. What is the return for Action B?
0
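Worked out from the definition of the return (an action's reward plus the discounted rewards that follow it), in LaTeX notation:

\text{Return}(B) = 10 + 0.5 \cdot (-30) + 0.5^2 \cdot 20 = 10 - 15 + 5 = 0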
Which of the following is a common strategy for the credit assignment problem? A.) Evaluate an action based on the sum of all the rewards that come after it, usually applying a discount factor γ (gamma) at each step. B.) Evaluate an action based on the sum of all the rewards that come before it, usually applying a discount factor γ (gamma) at each step. C.) Evaluate an action based on the sum of all the rewards that come after and before it, usually applying a discount factor γ (gamma) at each step. D.) None of the options listed
A
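A minimal Python sketch of strategy A, computing every action's discounted return in one backward pass (the reward list is the A->B->C->D example above):

def discounted_returns(rewards, gamma=0.5):
    # Walk backward: each action's return is its own reward
    # plus gamma times the return of the following step.
    returns = []
    cumulative = 0.0
    for r in reversed(rewards):
        cumulative = r + gamma * cumulative
        returns.append(cumulative)
    return list(reversed(returns))

print(discounted_returns([50, 10, -30, 20]))  # [50.0, 0.0, -20.0, 20.0] -- B's return is 0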
____________ is how much better or worse an action is, compared to the other possible actions, on average.
Action advantage
In reinforcement learning, a software ________ makes observations and takes actions within an _________, and in return it receives _______
Agent, environment, rewards
Which of the following is true for convolutional neural networks?
All of the options listed
When used for reinforcement learning, a neural network will estimate a probability for each action, and then we will select an action randomly, according to the estimated probabilities. Why are we picking a random action based on the probabilities rather than just picking the action with the highest score?
Allows the agent to find the right balance between exploring new actions and exploiting the actions that are known to work well
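A minimal NumPy sketch (the probability values are illustrative):

import numpy as np

action_probas = np.array([0.7, 0.3])  # e.g. P(left), P(right) estimated by the network
action = np.random.choice(len(action_probas), p=action_probas)
# A greedy argmax over the scores would never explore the lower-probability action.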
The _____________ algorithm is used to decide the filter weights in the convolutional and fully connected layers of CNNs.
Backpropagation
The agent needs to find the right balance between exploring the environment, looking for new ways of getting rewards, and exploiting sources of rewards that it already knows.
Balance exploring vs exploiting
Provides an extra bias feature, generally added as x₀ = 1: it outputs 1 all the time.
Bias neuron
The ___________ is a very simple environment composed of a cart that can move left or right, and a pole placed vertically on top of it. The agent must move the cart left or right to keep the pole upright.
CartPole
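A minimal sketch using the classic OpenAI Gym API (newer Gym/Gymnasium releases changed the reset() and step() signatures; the policy here is a naive illustration):

import gym

env = gym.make("CartPole-v1")
obs = env.reset()  # [cart position, cart velocity, pole angle, pole angular velocity]
done, total_reward = False, 0.0
while not done:
    action = 0 if obs[2] < 0 else 1  # push toward the side the pole is leaning
    obs, reward, done, info = env.step(action)
    total_reward += reward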
Which of the following can be done to tune the hyperparameters when the validation accuracy of a neural network is not good enough?
Change the learning rate, try another optimizer, try tuning model hyperparameters, try other hyperparameters such as batch size
In reinforcement learning, the only guidance the agent gets is through rewards, and rewards are typically sparse and delayed. Therefore, the last action is not entirely responsible for getting the last reward. The __________ is about deciding how the reward credit should be distributed among the actions taken.
Credit assignment problem
Which of the following are true? A.) The policy could be a neural network B.) The policy does not have to be deterministic C.) In some cases it does not even have to observe the environment D.) All of the options listed
D
When an artificial neural network contains a deep stack of hidden layers, it is called a
Deep neural network
Richard Bellman's algorithm to find the optimal state values in a Markov decision process uses the _____________ algorithm design technique, which breaks down a complex problem into tractable subproblems that can be tackled iteratively.
Dynamic programming
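In LaTeX notation, using the T(s, a, s'), R(s, a, s'), and γ defined elsewhere in this set, the Bellman optimality equation that Value Iteration applies repeatedly is:

V^*(s) = \max_a \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma V^*(s') \right]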
Which of the following is a policy search technique? A.) Stochastic policy search B.) Brute force policy search C.) Policy search using genetic algorithms D.) Policy search using optimization algorithms E.) All of the options listed
E
Which of the following is true for perceptrons? A.) Perceptrons are based on a threshold logic unit B.) The perceptron is a simple artificial neural network architecture C.) A perceptron is composed of a single layer of TLUs, with each TLU connected to all the inputs D.) The perceptron was invented in 1957 by Frank Rosenblatt E.) All of the options
E
One of the challenges of reinforcement learning is that in order to train an agent, you first need to have a working _________
Environment
How does reinforcement learning learn the unknown?
Experience each state and transition at least once to know the rewards, and experience them multiple times to estimate the transition probabilities.
Supervised and Unsupervised Learning generally don't need to worry about ___________
Exploration
What is the goal of reinforcement learning?
Find a good policy
Using the backpropagation algorithm, we figure out the filters that will give the best output. This process is called
Feature extraction
The ________ method of a model created using the Keras library returns a __________ object.
fit(), History
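A minimal Keras sketch (the model, data shapes, and epoch count are illustrative):

import numpy as np
from tensorflow import keras

model = keras.Sequential([keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="sgd", loss="mse")
X, y = np.random.rand(100, 4), np.random.rand(100)
history = model.fit(X, y, epochs=5, verbose=0)  # fit() returns a History object
print(history.history["loss"])                  # per-epoch training metrics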
When all the neurons in a layer are connected to every neuron in the previous layer
Fully connected layer, or a dense layer
To find the best hyperparameters of a neural network, we can use ________ or ________ to explore the hyperparameter space
GridSearchCV, RandomizedSearchCV
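A minimal scikit-learn sketch, shown with sklearn's own MLPClassifier and illustrative parameter ranges (searching over a Keras model instead would require an sklearn-compatible wrapper):

from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, random_state=42)
param_distributions = {
    "hidden_layer_sizes": [(16,), (32,), (32, 16)],
    "learning_rate_init": [1e-3, 1e-2, 1e-1],
}
search = RandomizedSearchCV(MLPClassifier(max_iter=500), param_distributions,
                            n_iter=5, cv=3, random_state=42)
search.fit(X, y)  # tries sampled hyperparameter combinations with cross-validation
print(search.best_params_)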
The step function used in perceptrons could be the ___________ step function or the ____________ function
Heaviside, sign
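In LaTeX notation, the two common choices are:

\mathrm{heaviside}(z) = \begin{cases} 0 & \text{if } z < 0 \\ 1 & \text{if } z \ge 0 \end{cases}
\qquad
\mathrm{sgn}(z) = \begin{cases} -1 & \text{if } z < 0 \\ 0 & \text{if } z = 0 \\ +1 & \text{if } z > 0 \end{cases}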
A single TLU can be used for simple ____________ (like a logistic regression or linear SVM classifier)
Linear binary classification
Supervised and unsupervised learning's training instances are generally ___________
Independent
Supervised learning's supervision is __________ by means of giving labeled data
Direct
Special passthrough neurons that output whatever input they are fed; all of these neurons together form the input layer of the perceptron.
Input neurons
Developed by François Chollet and open-sourced in 2015, it is a high-level deep learning API that allows you to easily build, train, evaluate, and execute all sorts of neural networks
Keras
During neural network training, network learning means ___________ the cost function
Minimizing
Unsupervised learning's supervision is __________
Nonexistent
A multilayer perceptron is composed of: ______________, ______________, ______________
One input layer, one or more layers of hidden layers, one output layer
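A minimal tf.keras sketch of that structure (layer sizes and activations are illustrative):

from tensorflow import keras

model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(784,)),   # input layer
    keras.layers.Dense(128, activation="relu"),    # hidden layer (one or more)
    keras.layers.Dense(10, activation="softmax"),  # output layer
])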
McCulloch and Pitts proposed a very simple model of the biological neuron, called the artificial neuron: it has _____________ binary (on/off) inputs and one _____________ output. A neuron is activated when _____________ of its inputs are active
One or more, binary, at least two
__________ is a toolkit for developing and comparing reinforcement learning algorithms
OpenAI Gym
The ____________, noted π*(s): when the agent is in state s, it should choose the action with the highest Q-Value for that state
Optimal policy
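In LaTeX notation:

\pi^*(s) = \operatorname{argmax}_a Q^*(s, a)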
The algorithm a software agent uses to determine its actions is called
Policy
The __________ layer of a CNN shrinks the image stack into a smaller size; __________ groups the pixels of the images and filters them down to a subset
Pooling
Facebook's ___________ is a library, released in 2018, that can be used for building, training, and executing neural networks
PyTorch
The _______________ of a state-action pair (s, a) is the sum of discounted future rewards the agent can expect on average after it reaches the state s and chooses the action a, but before it sees the outcome of this action, assuming it acts optimally after that action
Q value
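In LaTeX notation, using the T(s, a, s'), R(s, a, s'), and γ defined in this set, the optimal Q-Values satisfy:

Q^*(s, a) = \sum_{s'} T(s, a, s') \left[ R(s, a, s') + \gamma \max_{a'} Q^*(s', a') \right]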
We initialize the weights and biases __________. Initially, the neural network performs very poorly.
Randomly
R(s, a, s') is the ____________ the agent gets when it goes from state s to state s', having chosen action a.
Reward
Reinforcement learning's supervision is learned through
Rewards
The ___________ library is a ___________ learning library based on __________, developed at Google and open-sourced in 2018.
TF-Agents, reinforcement, TensorFlow
Which of these is not a layer in CNNs?
TLU layer
What are the main differences in the neural network architecture when using the Sequential API of Keras for regression instead of for classification?
The output layer has a single neuron and uses no activation function, and the loss function is the mean squared error
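A minimal Keras sketch of such a regression model (input size and hidden layer are illustrative):

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(30, activation="relu", input_shape=(8,)),
    keras.layers.Dense(1),  # single output neuron, no activation function
])
model.compile(loss="mean_squared_error", optimizer="sgd")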
The inputs and outputs of a ____________ (TLU) are ___________ (instead of binary on/off values), and each input connection is associated with a weight
Threshold logic unit, numbers
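In LaTeX notation, a TLU computes a weighted sum of its inputs and then applies a step function to it:

z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n = \mathbf{w}^\top \mathbf{x}, \qquad h_{\mathbf{w}}(\mathbf{x}) = \operatorname{step}(z)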
What is the goal of supervised and unsupervised learning?
To find patterns in the data and make predictions
T(s, a, s') is the ____________ from state s to state s', when the agent chooses action a.
Transition probability
During reinforcement learning, the agent is not explicitly given the correct answer (i.e. labeled data). It must learn by ___________
Trial and error
While reinforcement learning is an old branch of machine learning, existing since the 1950s, the recent successes are mainly due to the application of the power of deep learning to the field of reinforcement learning.
True
While exploring a search space to find the best hyperparameters for a neural network, we can use the __________ technique, which explores a region more when that region of the search space turns out to be good.
Zooming