COMP 560 Midterm: A.I. Artificial Intelligence


Quiescence search

(Mitigates the horizon effect) Continue the search until the position becomes stable. In chess, make sure that you complete all piece captures.

most constraining variable

(Used to improve backtracking) Picks the variable that is involved in the most constraints on other unassigned variables, so that assigning it prunes the most future choices.

Least Constraining Value

(Used to improve backtracking) Once a variable has been chosen, picks the value that rules out the smallest number of values in the variables connected to it by constraints. It is a good heuristic to choose the variable that is most constrained but the value that is least constraining in a CSP search.

Admissibility (of A* heuristic)

Admissibility means the heuristic h(n) never overestimates the true cost of reaching the goal from n.

Games: Perfect Information

monopoly, chess, checkers, tic-tac-toe (you can see the whole board)

Observable Environment

The agent always knows the current state. Example: chess, crossword. Partial: poker. Counterexample: guessing games.

Known Environment

The agent knows which states are reached by each action. Example: checkers. Counter: unknown environment (might require some exploration of the solution space).

RL: Basic Process

1. In each time step, take some action
2. Observe resulting state and reward
3. Update internal representation of the environment and policy
4. Upon reaching a terminal state, start over. (Mostly, esp. at the beginning, this is failure.)

Markov Decision Process

1. States S, with a beginning state s_0
2. Actions A(s), the actions available to us in state s
3. Transition model P(s' | s, a), the probability of ending up in state s' given current state s and action a
4. Reward function R(s, a, s'), the reward we'll get from moving from s to s' with the action a
5. Policy pi(s), which returns the action to take when in a given state (this is the solution to the MDP rather than part of its definition)
In short: a set of states, a set of actions, a probability function for the next state given the current state and action, and an immediate reward function given the current state, current action, and next state.

Local Search: How not to get stuck

1. random restarts
2. multiple random starting positions
3. random sideways jumps
4. simulated annealing
5. beam search
6. genetic algorithm

Basic steps for local search

1. Start with an initial assignment
2. Iterations are guided by an objective function trying to move the state toward the goal state
3. At each iteration, choose any conflicted variable and re-assign it to a value that reduces the number of conflicts (violated constraints)
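The steps above can be sketched on the n-queens problem. This is a minimal sketch of the min-conflicts variant of local search, not necessarily the version used in the course: the names `conflicts` and `min_conflicts`, the one-queen-per-row encoding, and the fixed random seed are all assumptions made for illustration. Note that min-conflicts picks the value with the fewest conflicts, which may tie with the current value rather than strictly reduce it.

```python
import random


def conflicts(cols, row, col):
    """Count queens that would attack a queen placed at (row, col)."""
    return sum(
        1
        for r, c in enumerate(cols)
        if r != row and (c == col or abs(c - col) == abs(r - row))
    )


def min_conflicts(n, max_steps=10000, seed=0):
    """Return a list cols where cols[r] is the queen's column in row r, or None."""
    rng = random.Random(seed)
    # Step 1: start with an initial (random) assignment, one queen per row.
    cols = [rng.randrange(n) for _ in range(n)]
    for _ in range(max_steps):
        # Step 2: the objective is the number of violated constraints.
        conflicted = [r for r in range(n) if conflicts(cols, r, cols[r]) > 0]
        if not conflicted:
            return cols  # no violated constraints: goal state
        # Step 3: pick any conflicted variable and move it to a
        # minimum-conflict value, breaking ties at random.
        row = rng.choice(conflicted)
        scores = [conflicts(cols, row, c) for c in range(n)]
        best = min(scores)
        cols[row] = rng.choice([c for c in range(n) if scores[c] == best])
    return None  # gave up: local search is not guaranteed to succeed


solution = min_conflicts(8)
print(solution)
```

Random tie-breaking is what lets the search drift across plateaus instead of cycling between the same few states.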

Minimax

Algorithm to choose the next move in a 2-player game. At each decision point in the tree the player to move chooses their best possible move. Recursively, this is the maximum of the minimums of the maximums of...
In a simple game like tic-tac-toe, values can be assigned to win, loss, or draw results. But in a more complex game like chess, where it is impossible to fully traverse the decision tree, minimax has to terminate at intermediate depths and evaluate the static position. This evaluation is an estimate of the final outcome given all current information; a very basic chess static position evaluator could use the difference in total piece values. For deterministic games that have too many possibilities to be considered, we can only use a partial tree to estimate our standing.
Weaknesses:
- Searches nodes that we don't need to search (taken care of with alpha-beta pruning)
- Time complexity is prohibitive for large problems
- Assumes that both players are playing optimally (not the case with chess!)
Time complexity: O(b^m), where b is the branching factor and m is the depth of the search space in the worst case.
Visualization: https://thimbleby.gitlab.io/algorithm-wiki-site/wiki/minimax_search/

Arc Consistency

Every value in a variable's domain is consistent with at least one value in the domain of each variable it shares a constraint with.
Basic steps (AC-3):
1. Create a queue with all the arcs in the CSP
2. While the queue is not empty:
3. Pop an arc (Xi, Xj) from the queue
4. Remove any values from the domain of Xi that do not have any valid values in the domain of Xj
5. If any values were removed from the domain of Xi, then add all arcs (Xk, Xi) that lead to Xi back to the queue
If arc consistency is performed on a CSP network of nodes, and only one option for a domain is left in each node, then there exists a unique solution. If any domain becomes empty, there is no solution. If multiple values in a domain exist, then a search still needs to be performed.
Let the max size of a variable domain be d and the number of constraints (arcs) be e: complexity is O(ed^3) worst-case.
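The queue-based procedure can be sketched as follows. This is an illustrative implementation under stated assumptions: `ac3`, the `domains` dict of sets, and the `constraints` dict mapping each directed arc (Xi, Xj) to a predicate are hypothetical names, and every binary constraint is assumed to be listed in both directions.

```python
from collections import deque


def ac3(domains, constraints):
    """Make a binary CSP arc consistent in place.

    domains: dict mapping variable -> set of values.
    constraints: dict mapping a directed arc (Xi, Xj) -> function
        ok(vi, vj) that is True when the pair of values is allowed.
    Returns False if some domain becomes empty, True otherwise.
    """
    queue = deque(constraints)  # start with every arc in the CSP
    while queue:
        xi, xj = queue.popleft()
        ok = constraints[(xi, xj)]
        # Values of Xi with no supporting value left in Xj's domain.
        removed = {
            vi for vi in domains[xi]
            if not any(ok(vi, vj) for vj in domains[xj])
        }
        if removed:
            domains[xi] -= removed
            if not domains[xi]:
                return False  # empty domain: no solution
            # Re-check every arc that leads into Xi (except from Xj).
            for xk, xl in constraints:
                if xl == xi and xk != xj:
                    queue.append((xk, xl))
    return True


# Example: X < Y < Z with each variable ranging over {1, 2, 3}.
domains = {v: {1, 2, 3} for v in "XYZ"}
constraints = {
    ("X", "Y"): lambda x, y: x < y,
    ("Y", "X"): lambda y, x: x < y,
    ("Y", "Z"): lambda y, z: y < z,
    ("Z", "Y"): lambda z, y: y < z,
}
print(ac3(domains, constraints), domains)
```

Here every domain shrinks to a single value (X=1, Y=2, Z=3), so no further search is needed.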

Alpha-beta

Alpha is the best value that the "max" player is guaranteed so far along the current path. Beta is the best value that the "min" player is guaranteed so far along the current path. The best score is kept in mind when exploring nodes, and if a branch cannot improve on it (alpha >= beta), then the branch is pruned. Ordering matters very much in these searches: the sooner the best move is found, the sooner unpromising branches are discarded.
Time: With perfect ordering the runtime drops from O(b^m) to O(b^(m/2)), doubling the depth that can be searched in the time plain minimax would take.
Visualization: https://thimbleby.gitlab.io/algorithm-wiki-site/wiki/minimax_search_with_alpha-beta_pruning/
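A minimal sketch of the pruning rule, assuming the game tree is given as nested lists with numeric leaves (an illustrative encoding, not from the course). The optional `visited` list is a hypothetical helper added only to show which leaves are actually evaluated.

```python
import math


def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Minimax value of a nested-list game tree with alpha-beta pruning."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)  # record each leaf we evaluate
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # min already has a better option elsewhere
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, best)
        if alpha >= beta:
            break  # max already has a better option elsewhere
    return best


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen = []
print(alphabeta(tree, visited=seen), seen)  # 3 [3, 12, 8, 2, 14, 5, 2]
```

In the second subtree the leaves 4 and 6 are never evaluated: once min can force a 2 there, max will prefer the first subtree (worth 3) regardless.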

Value Iteration

An algorithm for calculating an optimal policy. The basic idea is to calculate the utility of each state and then use the state utilities to select an optimal action in each state. https://www.youtube.com/watch?v=KovN7WKI9Y0
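A minimal sketch of the idea, using the reward form R(s, a, s') from the Markov Decision Process card. The `transitions` encoding (state -> action -> list of (probability, next state, reward)), the discount factor `gamma`, and the tiny three-state example are assumptions made for illustration.

```python
def q_value(outcomes, U, gamma):
    """Expected discounted return of one action, given its outcome list."""
    return sum(p * (r + gamma * U[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, gamma=0.9, tol=1e-9):
    """Return (utilities, policy) for a small MDP.

    transitions: dict state -> dict action -> list of
        (probability, next_state, reward). Terminal states map to {}.
    """
    U = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:
                continue  # terminal state keeps utility 0
            best = max(q_value(o, U, gamma) for o in actions.values())
            delta = max(delta, abs(best - U[s]))
            U[s] = best
        if delta < tol:
            break
    # Use the state utilities to select the best action in each state.
    policy = {
        s: max(actions, key=lambda a: q_value(actions[a], U, gamma))
        for s, actions in transitions.items()
        if actions
    }
    return U, policy


mdp = {
    "A": {"go": [(1.0, "B", 0.0)], "quit": [(1.0, "end", 1.0)]},
    "B": {"finish": [(1.0, "end", 10.0)]},
    "end": {},
}
U, policy = value_iteration(mdp)
print(U, policy)
```

In this example quitting from A pays 1 immediately, but going to B is worth 0.9 * 10 = 9, so the computed policy chooses "go" in A.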

Greedy Best First Search

Complete (iff the state space is finite and loops are handled)
Not optimal (returns first solution found; might take larger steps than is optimal)
Time O(b^m)
Space O(b^m)
Is an informed search, since it uses h(n), the estimated cost/distance from n to the goal. Expands the node that appears closest to the goal first. Tends to use a priority queue ordered by h(n).

DFS

Complete (iff search space is finite & we handle loops)
Not optimal (returns first solution)
Time O(b^m)
Space O(bm)
b is max number of children for any node
m is depth of search space in the worst case
Visualization: https://thimbleby.gitlab.io/algorithm-wiki-site/wiki/depth-first_search/

A*

Complete (iff there are finitely many nodes with cost less than or equal to the optimal cost)
Optimal (if h is admissible for tree search, or consistent for graph search)
Time O(# nodes where f(n) < optimal cost)
Space O(# nodes where f(n) < optimal cost)
Basically combines Uniform Cost with Greedy Best First: it keeps track of how far we've travelled and how far we have remaining.
Is an INFORMED SEARCH since it uses an evaluation function: f(n) = g(n) + h(n), i.e. total = distance travelled + est. remaining.
A* is optimally efficient for any given consistent heuristic. That is, no other optimal algorithm is guaranteed to expand fewer nodes than A*.
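A minimal sketch of A* on a small weighted graph, assuming the graph is a dict of dicts (node -> neighbor -> step cost) and the heuristic is a precomputed dict `h`. The function name `a_star`, the graph, and the heuristic values are made up for illustration; the heuristic shown is consistent for this graph.

```python
import heapq


def a_star(graph, h, start, goal):
    """Return (cost, path) of a cheapest path, or None if unreachable."""
    # Frontier entries are (f, g, node, path), ordered by f = g + h.
    frontier = [(h[start], 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue  # stale entry: a cheaper route to node was found
        for nbr, cost in graph[node].items():
            g2 = g + cost  # distance travelled so far
            if g2 < best_g.get(nbr, float("inf")):
                best_g[nbr] = g2
                heapq.heappush(frontier, (g2 + h[nbr], g2, nbr, path + [nbr]))
    return None


graph = {
    "S": {"A": 1, "B": 4},
    "A": {"B": 2, "G": 6},
    "B": {"G": 3},
    "G": {},
}
h = {"S": 5, "A": 4, "B": 3, "G": 0}  # never overestimates the true cost
print(a_star(graph, h, "S", "G"))  # (6, ['S', 'A', 'B', 'G'])
```

Setting every h value to 0 turns the same code into uniform cost search, since the frontier is then ordered by g alone.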

BFS

Complete (iff branching factor is finite)
Optimal when all step costs are equal (returns shallowest solution)
Time O(b^d)
Space O(b^d)
b is max number of children for any node
d is depth of the shallowest solution node
Visualization: https://thimbleby.gitlab.io/algorithm-wiki-site/wiki/breadth-first_search/

Resource Constraints

Constraining factors (normally physical, like time or space)

Iterative deepening

Running searches of increasing depth, using the time taken so far to decide whether it is reasonable to search to the next depth. We store the best move found so far and search it first in the next iteration.

Higher Order Constraints

Constraints involving three or more variables. Example: Between(X, Y, Z) means Y is between X and Z. Alldiff(A, B, C, D, E) means all the variables must take different values.

Games: Stochastic

Examples: Poker, monopoly (element of chance)

Games: Deterministic

Examples: chess, checkers (no chance, completely controlled)

Uniform Cost Search

Expands the node n with the lowest path cost g(n) (if all step costs are equal, this behaves like breadth-first search)
Complete (iff branching factor is finite and every step cost is at least some e > 0)
Optimal (nodes are expanded in order of increasing path cost, so the first goal expanded is the cheapest)
Time: O(b^(1 + floor(C*/e))), where C* is the cost of the optimal solution and e is the smallest action cost
Space: O(b^(1 + floor(C*/e)))

Exploration v. Exploitation (pros/cons)

Exploration: take a new action with unknown consequences
Pros:
- Get a more accurate model of the environment
- Uncover higher value states than you have already found
Cons:
- Might not be maximizing the utility
- Something bad might happen
Exploitation: go with the best strategy you've found so far
Pros:
- Maximize reward as reflected in the current utility functions
- Avoid bad stuff
Cons:
- Might prevent you from finding the true optimal strategy

Limited Search

Getting a good evaluation function is key. The simplest eval functions are a linear sum of some variables. Deep Blue had 8000 features

Dominance

If h1 and h2 are both admissible and h2(n) >= h1(n) for all n, h2 dominates h1. If h2 dominates h1, then A* will never expand more nodes with h2 than with h1. We want as granular a measure of goodness/badness as we can get, and a higher number (that still does not overestimate) gives us more precise information.

Local Policy Search

In some ways, policy search is the simplest of all the methods in this chapter: the idea is to keep twiddling the policy as long as its performance improves, then stop. Start with an initial (possibly random) policy. Remember that a policy π is a function that maps states to actions. We are interested primarily in parameterized representations of π that have far fewer parameters than there are states in the state space. For example, we could represent π by a collection of parameterized Q-functions, one for each action, and take the action with the highest predicted value: π(s) = argmax over a of Q(s, a), where each Q-function weights features of the state (in pacman: whether pacman is in a tunnel, dots eaten, etc.). Policy search needs an accurate model of the domain before it can find a policy.

Model-based v. Model-free RL

Model-based: know the policy & want to determine transitions and rewards for each state; uses a model P and a utility function U.
Model-free: only learn the utility functions for each state; uses an action-utility function Q.
Policy-search methods operate directly on a representation of the policy, attempting to improve it based on observed performance. The variation in the performance in a stochastic domain is a serious problem; for simulated domains this can be overcome by fixing the randomness in advance.

With X assumptions, the solution to any problem is a fixed sequence of actions

Observable, discrete, known, & deterministic Example: The Bucharest to Arad problem

Passive v. active RL

Passive: KNOW the policy & want to determine the utility function for each state Active: KNOW the actions for each state & want to determine the best policy

Games: Imperfect Information

Poker (partial), battleship How do we handle partially observable games?

Constraint Satisfaction Problem

Problem defined as variables with domains, and constraints on the variables

Iterative Deepening Depth First Search

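Repeated DFS with an increasing max depth: first 1, then 2, and so on (after depth n, try n + 1).
Complete (if branching factor is finite)
Optimal when all step costs are equal (returns the shallowest solution, like BFS)
Time is comparable to BFS; space is that of DFS
Time: O(b^d)
Space: O(bd)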
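A minimal sketch, assuming the search space is a dict mapping each node to a list of children. The names `depth_limited` and `iddfs`, the `max_depth` cap, and the example graph are illustrative assumptions; this version starts from depth limit 0 so that the start node itself is checked.

```python
def depth_limited(graph, node, goal, limit, path):
    """DFS that refuses to go more than `limit` steps below `node`."""
    if node == goal:
        return path
    if limit == 0:
        return None
    for child in graph.get(node, []):
        if child in path:
            continue  # avoid looping back along the current path
        found = depth_limited(graph, child, goal, limit - 1, path + [child])
        if found:
            return found
    return None


def iddfs(graph, start, goal, max_depth=20):
    """Repeat depth-limited DFS with limits 0, 1, 2, ... up to max_depth."""
    for limit in range(max_depth + 1):
        found = depth_limited(graph, start, goal, limit, [start])
        if found:
            return found
    return None


graph = {"A": ["B", "C"], "B": ["D"], "C": ["G"], "D": ["G"]}
print(iddfs(graph, "A", "G"))  # ['A', 'C', 'G']
```

Plain DFS on this graph would dive down A, B, D first and return the three-step path; the increasing depth limit is what makes the two-step path come back first.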

Unary Constraint

Restricts the value of a single variable

Consistency (of A* heuristic)

The addition has to work: h must fulfill the triangle inequality, which means that the estimated cost of reaching the goal from n cannot be greater than the step cost of getting to n' plus the estimated cost of reaching the goal from n'. That is, h(n) <= c(n, a, n') + h(n').

Monte Carlo Simulation

Simulate multiple games, each randomly assigning the unobservable information, and estimate your chances over many random outcomes ("I win in more worlds with this move than with that one"). Works for some games, but not for betting and bluffing in poker, where you must override the signals.

Preference Constraints

Soft constraints that say which solutions are preferred rather than which are allowed. Example: Try not to schedule a professor in the afternoon, but don't make it impossible to do so.

Horizon Effect

Something bad is going to happen but it is just beyond the horizon of the search. (example: doing an iterative deepening search for the best move but put a max depth of 6, and disaster is at layer 7) Even worse is that the bad outcome is seen but there are inconsequential moves that can be used to push the bad outcome beyond the horizon of the search.

Policy Iteration/Evaluation v. Passive RL

The main difference is that the passive learning agent does not know the transition model P(s' | s, a), which specifies the probability of reaching state s' from state s after doing action a; nor does it know the reward function R(s), which specifies the reward for each state. The RL agent is more blind: it has no transition model or probabilities to work from, only the utility it observes from following the policy.

How do we find the utility of each state?

The utility of each state equals its own reward plus the discounted expected utility of its successor states, assuming the agent picks the best action.
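Written out as an equation, this is the Bellman equation. The form below assumes a per-state reward R(s), as in the Policy Iteration/Evaluation v. Passive RL card, and a discount factor gamma between 0 and 1, which is a symbol introduced here for illustration:

```latex
U(s) = R(s) + \gamma \max_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\, U(s')
```

The sum is the expected utility of the successor states under action a, and the max picks the action with the best expectation.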

Transposition tables

There are often multiple paths that lead to the same position, and there is no need to search that position again. Store a hash of the position along with the value and the depth of the search where that position was previously found. When you process a node in the search, look it up in the transposition table and, if it exists, use its associated value. Example: Endgame solutions in chess. Once a board has a recognizable endgame solution, we can cease searching.
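A minimal sketch of the lookup-before-search idea on a toy game: players alternate taking 1, 2, or 3 stones, and whoever takes the last stone wins. The game, the function name `solve`, and the `counter` helper are assumptions made for illustration. Because this toy search always runs to the end of the game, the table stores only the value; a depth-limited search would also store the search depth, as the card describes.

```python
def solve(stones, table=None, counter=None):
    """Value for the player to move: +1 if they can force a win, else -1."""
    if counter is not None:
        counter[0] += 1  # count every node we process
    if table is not None and stones in table:
        return table[stones]  # position already searched: reuse its value
    if stones == 0:
        value = -1  # the opponent just took the last stone
    else:
        # Our value is the negation of the opponent's best reply.
        value = max(
            -solve(stones - k, table, counter)
            for k in (1, 2, 3)
            if k <= stones
        )
    if table is not None:
        table[stones] = value
    return value


plain, cached = [0], [0]
print(solve(12, None, plain), plain[0])
print(solve(12, {}, cached), cached[0])
```

Both calls return the same value, but the second processes far fewer nodes because many different move orders reach the same number of stones.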

Move Ordering

Try to order the moves so that the best move is processed first. Unfortunately this is domain dependent and even situation dependent. For alpha-beta pruning: BFS can be used to shallowly explore branches. Unfortunate ordering can result in processing every single node.

most constrained variable

Used to improve backtracking. Also called the minimum-remaining-values (MRV) heuristic: picks the variable with the fewest available legal values, i.e. the one most likely to cause a failure soon. If some variable has no legal values left, this heuristic will pick that variable right away and detect failure.

When does hill climbing work best?

When the objective has a single peak (no local maxima other than the global one). Warning: flat spaces can feel like progress, and it's easy to get stuck on shoulders.

Forward Checking

One of the simplest forms of inference. Whenever a variable X is assigned, the forward-checking process establishes arc consistency for it: for each unassigned variable Y that is connected to X by a constraint, delete from Y's domain any value that is inconsistent with the value chosen for X. If any variable ends up with an empty domain, no solution can be found with the current assignments, so we must backtrack and try a different value. There is no reason to do forward checking if we have already done arc consistency as a preprocessing step.
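A minimal sketch of backtracking search that combines forward checking with the most constrained variable (MRV) heuristic, assuming a map-coloring CSP where every constraint is "neighbors must differ". The function name `backtrack`, the data layout, and the Australia map example are illustrative assumptions.

```python
def backtrack(assignment, domains, neighbors):
    """Return a complete assignment dict, or None if none exists."""
    if len(assignment) == len(domains):
        return assignment
    # MRV: choose the unassigned variable with the fewest legal values.
    var = min(
        (v for v in domains if v not in assignment),
        key=lambda v: len(domains[v]),
    )
    for value in sorted(domains[var]):
        # Forward checking: remove `value` from each unassigned neighbor.
        pruned = {
            n: domains[n] - {value}
            for n in neighbors[var]
            if n not in assignment
        }
        if any(not d for d in pruned.values()):
            continue  # a neighbor has no values left: try another value
        new_domains = {**domains, **pruned, var: {value}}
        result = backtrack({**assignment, var: value}, new_domains, neighbors)
        if result is not None:
            return result
    return None  # every value failed: backtrack


neighbors = {
    "WA": ["NT", "SA"],
    "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"],
    "V": ["SA", "NSW"],
    "T": [],
}
domains = {region: {"red", "green", "blue"} for region in neighbors}
print(backtrack({}, domains, neighbors))
```

Because each assignment prunes the neighbors' domains immediately, a dead end is noticed as soon as some neighbor runs out of colors rather than several levels deeper in the search.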

Uninformed search algorithms

Are given no information about the problem other than its definition. They can find a solution to any solvable problem, but cannot do so efficiently. Examples: BFS, DFS, uniform cost, iterative deepening.

Discrete Environment

At any given state there are only finitely many actions to choose from. Example: poker (broken into moves). Counter: taxi driving (time itself is possibly continuous). Antonym: Continuous. There are infinitely many options, or the search space is infinitely granular.

Time and space complexity variables b, m, and d

b - max successors/children of ANY node
d - the depth of the SHALLOWEST GOAL node
m - the maximum length of ANY path in the state space

Informed search algorithms

Can do "quite well" given some guidance on where to look for solutions. May have access to a heuristic function h(n) that estimates the cost of a solution from n. Examples: A* and greedy best-first.

Minimax and alpha-beta algorithms work on games with these features

deterministic, perfect information, 2-player

Deterministic environment

Each action has exactly one outcome. Example: chess, crossword. Counterexample: poker, taxi driving. Antonym: Stochastic (like the agent in the maze with .8 probability of moving forward, .1 left, .1 right).

Binary Constraint

relates the values of two variables. Example: A(value) != B(value)

