AI Final Study Guide
26. Which technique is used to train a Q-network in reinforcement learning? a) Gradient ascent b) Experience replay c) Policy gradient d) Bellman equation
Answer: b) Experience replay Explanation: Experience replay stores transitions (state, action, reward, next state) in a replay memory and trains the Q-network on random minibatches sampled from that memory.
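The replay memory described in this answer can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the class name `ReplayMemory`, the capacity of 100, and the dummy transitions are all assumptions made for the example.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once it is full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement gives a random minibatch.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Hypothetical usage: 150 dummy transitions pushed into a memory of size 100.
memory = ReplayMemory(capacity=100)
for t in range(150):
    memory.push(t, 0, 1.0, t + 1, False)
batch = memory.sample(8)
```

Because the minibatch is drawn at random from the whole memory, the transitions in `batch` generally do not come from consecutive time steps.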
30. In Q-learning, what does the term "exploration-exploitation trade-off" refer to? a) Balancing the exploration of new states with exploiting known information b) Balancing the loss function with the value function c) Balancing the Q-values with the state-action pairs d) Balancing the gradient descent with the gradient ascent
Answer: a) Balancing the exploration of new states with exploiting known information Explanation: Q-learning involves choosing between exploring new states to gather more information and exploiting known information to maximize reward.
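One common way to balance the two is epsilon-greedy action selection, sketched below. The study guide does not name a specific strategy, so epsilon-greedy, the function name `epsilon_greedy`, and the example Q-values are assumptions chosen for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the best-known one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit


# Hypothetical Q-values for three actions in one state.
q = [0.1, 0.7, 0.3]
greedy_action = epsilon_greedy(q, epsilon=0.0)  # epsilon = 0 never explores
```

A larger `epsilon` favors exploration, a smaller one favors exploitation; many setups start high and decay it over time.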
10. What problem does experience replay in reinforcement learning aim to solve? a) Correlated samples in training data b) Overfitting of the Q-network c) Exploration-exploitation trade-off d) Gradient vanishing problem
Answer: a) Correlated samples in training data Explanation: Experience replay helps to break the correlation between consecutive samples, improving the efficiency of training.
11. What does the Bellman equation describe in reinforcement learning? a) The update rule for Q-values in Q-learning b) The gradient of the policy function c) The expected return from complete episodes d) The process of experience replay
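Answer: a) The update rule for Q-values in Q-learning Explanation: The Bellman equation expresses the value of a state-action pair recursively, as the immediate reward plus the discounted value of the best next action. Q-learning uses this relation as its update target, combining the observed reward with the estimate of future rewards.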
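The update this answer refers to can be sketched for a tabular Q-function as follows. The learning rate `alpha`, the discount factor `gamma`, and the tiny two-state table are assumptions for the example; the study guide itself does not fix any of these values.

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9, done=False):
    """One tabular Q-learning step based on the Bellman optimality relation."""
    # Target: observed reward plus the discounted best estimate for the next state.
    best_next = 0.0 if done else max(Q[s_next])
    target = r + gamma * best_next
    # Move Q(s, a) a fraction alpha of the way toward the target.
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]


# Hypothetical table: 2 states x 2 actions.
Q = [[0.0, 0.0], [1.0, 2.0]]
new_value = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
```

Here the target is 1.0 + 0.9 * 2.0 = 2.8, and with `alpha=0.5` the entry Q[0][1] moves halfway from 0.0 toward it.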
4. What is the actor in policy-based reinforcement learning? a) A function that estimates the value of state-action pairs b) A neural network that determines the probability of taking each action c) An algorithm that updates the Q-values based on experience d) A function that computes the gradient of the loss function
Answer: b) A neural network that determines the probability of taking each action Explanation: The actor in policy-based methods is typically represented by a neural network that outputs the probability distribution over actions given a state.
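As a rough sketch of what "outputs the probability distribution over actions" means, the example below uses a single linear layer followed by a softmax as a stand-in for the neural network. The weights, the state vector, and the function names are hypothetical.

```python
import math
import random


def softmax(logits):
    """Turn arbitrary scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def actor(state, weights):
    """Linear stand-in for the actor network: one weight row per action."""
    logits = [sum(w * x for w, x in zip(row, state)) for row in weights]
    return softmax(logits)


# Hypothetical weights for 3 actions and a 2-dimensional state.
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
probs = actor([2.0, 0.0], weights)
# The agent samples its action from the distribution instead of taking the argmax.
action = random.choices(range(len(probs)), weights=probs)[0]
```

Sampling from `probs` rather than always taking the most likely action is what keeps a stochastic policy exploring.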
3. Which approach in reinforcement learning aims to optimize the policy directly? a) Value-based methods b) Policy-based methods c) Actor-critic methods d) Q-learning
Answer: b) Policy-based methods Explanation: Policy-based methods directly optimize the policy function to maximize the expected reward.
19. What is the primary advantage of using a Q-network over Q-tables in reinforcement learning? a) Q-networks require less memory storage b) Q-networks can handle continuous action spaces c) Q-networks provide faster convergence d) Q-networks are more interpretable
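Answer: b) Q-networks can handle continuous action spaces Explanation: A Q-network approximates Q-values with a function instead of storing one table entry per state-action pair, so it scales to large or continuous spaces where a Q-table is not feasible. In practice the gain is mainly in the state space: a standard deep Q-network still outputs one Q-value per discrete action.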
24. Which approach in reinforcement learning updates the value estimates incrementally at each step? a) Monte Carlo methods b) Temporal-Difference methods c) Policy Gradient methods d) Q-learning
Answer: b) Temporal-Difference methods Explanation: Temporal-Difference methods update the value estimates at each time step, using the observed reward and the current estimate of the next state's value rather than waiting for the episode to end.
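A single step of the simplest variant, TD(0) on state values, might look like the sketch below. The step size `alpha`, the discount `gamma`, and the two-state value table are assumptions for illustration only.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9, done=False):
    """One TD(0) step: nudge V(s) toward r + gamma * V(s_next)."""
    target = r + (0.0 if done else gamma * V[s_next])
    td_error = target - V[s]
    V[s] += alpha * td_error
    return td_error


# Hypothetical value table with two states.
V = {"A": 0.0, "B": 1.0}
err = td0_update(V, "A", r=0.5, s_next="B")
```

The update happens immediately after one transition, which is the contrast with Monte Carlo methods that need the full episode first.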
25. What does the Q-function represent in Q-learning? a) The probability distribution over actions given a state b) The expected cumulative reward obtained from a state-action pair c) The value of a state under a particular policy d) The gradient of the policy function
Answer: b) The expected cumulative reward obtained from a state-action pair Explanation: The Q-function estimates the expected cumulative reward obtained by taking a specific action in a given state.
22. What does R_θ represent in reinforcement learning? a) The state-value function b) The expected reward obtained by an actor c) The Q-value function d) The loss function
Answer: b) The expected reward obtained by an actor Explanation: R_θ represents the expected cumulative reward obtained by an actor under a particular policy.
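Since R_θ is an expectation, it is usually estimated by averaging the total reward over many sampled episodes. The sketch below assumes a deliberately simple, hypothetical setting: the actor's only parameter is the probability `p_right` of choosing "right", and each "right" step earns a reward of 1.

```python
import random


def run_episode(p_right, rng, length=5):
    """Hypothetical episode: each of `length` steps earns 1 when the actor picks 'right'."""
    return sum(1.0 for _ in range(length) if rng.random() < p_right)


def estimate_expected_reward(p_right, n_episodes=2000, seed=0):
    """Sample average of the total reward over n_episodes episodes."""
    rng = random.Random(seed)
    returns = [run_episode(p_right, rng) for _ in range(n_episodes)]
    return sum(returns) / n_episodes


estimate = estimate_expected_reward(p_right=0.8)
```

In this toy setting the true expected reward is 5 * 0.8 = 4.0, so the estimate should land close to that value.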
21. What does the actor represent in policy-based reinforcement learning? a) The Q-value function b) The probability distribution over actions given a state c) The loss function d) The gradient of the policy function
Answer: b) The probability distribution over actions given a state Explanation: The actor represents the policy function, which outputs the probability distribution over actions given a state.
2. In reinforcement learning, what does the agent observe before taking an action? a) The reward obtained from the previous action b) The state of the environment c) The actions taken by other agents d) The gradient of the loss function
Answer: b) The state of the environment Explanation: The agent observes the current state of the environment before deciding which action to take.
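The observe-then-act cycle can be sketched with a toy environment. The `CorridorEnv` class, its `reset`/`step` interface, and the always-move-right policy are all hypothetical and exist only to show the order of events.

```python
class CorridorEnv:
    """Hypothetical 1-D corridor: start at position 0, goal at position 3."""

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # Action 1 moves right; any other action moves left (clamped at 0).
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        done = self.pos == 3
        reward = 1.0 if done else 0.0
        return self.pos, reward, done


env = CorridorEnv()
state = env.reset()  # the agent first observes the current state
total_reward, done, steps = 0.0, False, 0
while not done:
    action = 1  # trivial policy chosen after seeing `state`: always move right
    state, reward, done = env.step(action)  # the environment returns the next state
    total_reward += reward
    steps += 1
```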
14. What is the primary objective of experience replay in reinforcement learning? a) To store the entire trajectory of a game b) To break the correlation between consecutive samples c) To optimize the policy function d) To update the Q-values based on reward signals
Answer: b) To break the correlation between consecutive samples Explanation: Experience replay helps to decorrelate the training samples, making learning more efficient.
1. What is the main objective of reinforcement learning? a) To minimize the error between predicted and actual outputs b) To maximize the expected reward obtained by an agent c) To optimize the loss function of a neural network d) To minimize the variance of the training data
Answer: b) To maximize the expected reward obtained by an agent Explanation: Reinforcement learning aims to maximize the cumulative reward received by an agent interacting with an environment.
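The cumulative reward of one episode is often computed with a discount factor so that later rewards count less. The discount factor `gamma=0.9` and the reward sequence below are assumptions for the example; the study guide does not specify whether rewards are discounted.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where the reward at step t is weighted by gamma ** t."""
    g = 0.0
    # Work backwards so each step reuses the return of the step after it.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical three-step episode.
G = discounted_return([1.0, 0.0, 2.0])
```

For this episode the return is 1.0 + 0.9 * 0.0 + 0.81 * 2.0 = 2.62. Setting `gamma=1.0` gives the plain undiscounted sum.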
12. Which approach in reinforcement learning is mainly used for processing closely related continuous events? a) Q-learning b) Temporal-Difference methods c) Policy Gradient methods d) Monte Carlo methods
Answer: c) Policy Gradient methods Explanation: Policy Gradient methods evaluate the policy on whole trajectories, which makes them suitable for sequences of closely related, continuous events where the entire trajectory is needed.
20. Which method in reinforcement learning aims to directly optimize the policy function? a) Q-learning b) Temporal-Difference methods c) Policy Gradient methods d) Monte Carlo methods
Answer: c) Policy Gradient methods Explanation: Policy Gradient methods directly optimize the policy function to maximize the expected reward.
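"Directly optimizing the policy" means adjusting the policy parameters by gradient ascent on the expected reward. The sketch below uses a hypothetical two-armed bandit where the expected reward and its gradient can be written exactly; practical policy gradient methods replace this exact gradient with an estimate from sampled trajectories. The single parameter `theta`, the arm rewards, and the step size are assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def expected_reward(theta, r0=0.0, r1=1.0):
    """Policy picks arm 1 with probability sigmoid(theta), arm 0 otherwise."""
    p = sigmoid(theta)
    return (1.0 - p) * r0 + p * r1


def reward_gradient(theta, r0=0.0, r1=1.0):
    """Exact derivative of expected_reward with respect to theta."""
    p = sigmoid(theta)
    return p * (1.0 - p) * (r1 - r0)


theta = 0.0  # start with both arms equally likely
for _ in range(100):
    theta += 0.5 * reward_gradient(theta)  # gradient ascent step
```

After the loop the policy strongly prefers the better arm, so `expected_reward(theta)` is well above its starting value of 0.5.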
18. Which technique is used to update the Q-values incrementally at each time step? a) Experience replay b) Policy Gradient c) Temporal-Difference methods d) Monte Carlo methods
Answer: c) Temporal-Difference methods Explanation: Temporal-Difference methods update the Q-values at each time step based on the observed reward and the current estimate.
17. What does the Q-network estimate in reinforcement learning? a) The probability distribution over actions given a state b) The state-value function c) The Q-value function d) The gradient of the policy function
Answer: c) The Q-value function Explanation: The Q-network estimates the Q-values, which represent the expected cumulative reward obtained by taking a specific action in a given state.
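A Q-network maps a state to one Q-value per action. The forward pass below is a minimal sketch with one hidden layer written in plain Python; the layer sizes, the parameter values, and the names `q_network` and `params` are hypothetical, and no training is shown.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(x, W, b):
    """Dense layer: one output per row of W."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def q_network(state, params):
    """Forward pass: state -> hidden layer -> one Q-value per action."""
    hidden = relu(linear(state, params["W1"], params["b1"]))
    return linear(hidden, params["W2"], params["b2"])


# Hypothetical parameters: 2 state features, 2 hidden units, 3 actions.
params = {
    "W1": [[1.0, -1.0], [0.5, 0.5]],
    "b1": [0.0, 0.0],
    "W2": [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]],
    "b2": [0.0, 0.0, -0.5],
}
q = q_network([0.3, 0.1], params)
best_action = max(range(len(q)), key=lambda a: q[a])  # greedy action
```

The state here is a pair of real numbers, which a Q-table could not index directly without discretizing it first.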
23. What is the primary objective of the Monte Carlo method in reinforcement learning? a) To estimate the value of state-action pairs incrementally b) To update the Q-values based on experience at each time step c) To estimate the expected return from complete episodes/trajectories d) To optimize the policy using gradient ascent
Answer: c) To estimate the expected return from complete episodes/trajectories Explanation: The Monte Carlo method estimates the expected return by averaging the returns (total rewards) observed over complete episodes.
6. What is the purpose of the Monte Carlo method in reinforcement learning? a) To estimate the value of state-action pairs incrementally b) To update the Q-values based on experience at each time step c) To estimate the expected return from complete episodes/trajectories d) To optimize the policy using gradient ascent
Answer: c) To estimate the expected return from complete episodes/trajectories Explanation: The Monte Carlo method estimates the expected return by averaging the returns (total rewards) observed over complete episodes.
16. Which method is used to estimate the value of state-action pairs through complete episodes? a) Q-learning b) Policy Gradient c) Temporal-Difference methods d) Monte Carlo methods
Answer: d) Monte Carlo methods Explanation: Monte Carlo methods estimate the value of a state-action pair by averaging the returns that followed it in complete episodes.
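The averaging can be sketched as an every-visit Monte Carlo estimate. The episode format (a list of `(state, action, reward)` steps), the undiscounted default `gamma=1.0`, and the two sample episodes are assumptions made for this example.

```python
from collections import defaultdict


def mc_q_values(episodes, gamma=1.0):
    """Every-visit Monte Carlo: average the return that followed each (state, action)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for episode in episodes:  # each episode must be complete
        g = 0.0
        for state, action, reward in reversed(episode):
            g = reward + gamma * g  # return from this step to the end
            totals[(state, action)] += g
            counts[(state, action)] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Two hypothetical complete episodes.
episodes = [
    [("A", "right", 1.0), ("B", "right", 1.0)],
    [("A", "right", 0.0), ("B", "right", 3.0)],
]
q_estimates = mc_q_values(episodes)
```

For the pair ("A", "right") the two observed returns are 2.0 and 3.0, so its estimate is 2.5. Nothing is updated until an episode has finished, unlike Temporal-Difference methods.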
15. What is the main limitation of using Q-tables in reinforcement learning? a) They are computationally expensive to update b) They cannot handle continuous action spaces c) They require prior knowledge of the environment dynamics d) They suffer from the curse of dimensionality
Answer: d) They suffer from the curse of dimensionality Explanation: Q-tables become impractical for large state spaces because the number of states, and therefore table entries, grows exponentially with the number of state variables.
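A quick calculation makes the growth concrete. The numbers below (10 possible values per state variable, 4 actions) are hypothetical and chosen only to show the trend.

```python
def q_table_entries(values_per_variable, n_variables, n_actions):
    """Number of entries in a Q-table: one per (state, action) pair."""
    n_states = values_per_variable ** n_variables
    return n_states * n_actions


# Hypothetical setting: each state variable takes 10 values, 4 actions available.
sizes = {n: q_table_entries(10, n, 4) for n in (2, 4, 8)}
```

Going from 2 to 8 state variables raises the table from 400 entries to 400,000,000, which is why function approximation such as a Q-network is used instead.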
