Ch. 5 Adversarial Search

What are Alpha and Beta?

*Alpha = the value of the best (highest value) choice we have found so far at any choice point along the path for MAX. *Beta = the value of the best (lowest value) choice we have found so far at any choice point along the path for MIN.

Min nodes and Max nodes

*A Min node can take on a value of AT MOST the smallest successor value seen so far. *A Max node can take on a value of AT LEAST the largest successor value seen so far.

Card game partial observability solutions (3 steps)

1) Consider all deals of unseen cards 2) Solve all possible unseen card hands as if they were being played in a fully observable game 3) Choose move with best outcome averaged over all the deals.
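
The three steps above can be sketched as a weighted average over deals. This is a minimal sketch, not a full card-game solver: the `value` callback stands in for step 2 (solving each deal as a fully observable game), and the move names, deal names, and numbers are invented for illustration.

```python
def best_move_over_deals(moves, deals, value):
    """Pick the move with the highest value averaged over all deals.

    deals: list of (probability, deal) pairs whose probabilities sum to 1.
    value: hypothetical callback returning the solved, fully observable
           game value of playing `move` when the hidden cards are `deal`.
    """
    def expected(move):
        return sum(p * value(move, deal) for p, deal in deals)
    return max(moves, key=expected)

# Toy numbers, invented for illustration only.
solved = {
    ("lead_ace", "deal_A"): 1.0, ("lead_ace", "deal_B"): -1.0,
    ("lead_low", "deal_A"): 0.5, ("lead_low", "deal_B"): 0.5,
}
deals = [(0.5, "deal_A"), (0.5, "deal_B")]
choice = best_move_over_deals(
    ["lead_ace", "lead_low"], deals, lambda m, d: solved[(m, d)]
)
```

Here `lead_ace` averages 0.0 and `lead_low` averages 0.5, so `lead_low` is chosen.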

Parts of an evaluation function (1 part, 2 kinds of functions)

1) Features: *Important data about the state of the game. *Features taken together define categories of states. 2) Expected value (percent win/loss/tie * value of win/loss/tie) can be determined for each category, resulting in an evaluation function. 3) Weighted linear function: *Evaluation function is a weighted sum of feature values: Eval(s) = w1*f1(s) + ... + wn*fn(s). *Involves the assumption that each feature's contribution is independent of the values of the other features.
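
A weighted linear evaluation function can be sketched in a few lines. The feature names and weights below are assumptions (the commonly quoted chess material values), and each feature is taken to be "my count minus the opponent's count"; none of this comes from the card itself.

```python
def linear_eval(features, weights):
    """Weighted sum of feature values: w1*f1 + w2*f2 + ... + wn*fn."""
    return sum(weights[name] * value for name, value in features.items())

# Illustrative weights: commonly quoted chess material values (an assumption).
weights = {"pawn": 1.0, "knight": 3.0, "bishop": 3.0, "rook": 5.0, "queen": 9.0}

# Material difference from our point of view: up one pawn, down one bishop.
features = {"pawn": 1, "knight": 0, "bishop": -1, "rook": 0, "queen": 0}

score = linear_eval(features, weights)
```

The score here is 1.0 - 3.0 = -2.0. Because the sum treats every feature separately, it cannot express interactions such as "two bishops together are worth more than twice one bishop", which is the independence assumption noted above.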

Constraints of an evaluation function (3 constraints)

1) Should order terminal states in the same way as the utility function. 2) Computation shouldn't take too long. 3) For non-terminal states, the evaluation function should be strongly correlated with the actual chance of winning.

Monte Carlo simulation

1) Start with an alpha-beta search algorithm. 2) From the start position, have the algorithm play thousands of games against itself, using random dice rolls. 3) The resulting win percentage for each position is a good approximation of the value of the position.
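
The idea of estimating a position's value by its win percentage can be sketched on a toy dice race. Everything here is a simplifying assumption: the game (two players alternately roll one die and the first to cover their remaining distance wins) has no move choices, so no alpha-beta search is needed to pick moves, and the function names are hypothetical.

```python
import random

def play_out(my_dist, opp_dist, rng):
    """Play one random game; return True if the player to move wins."""
    dists = [my_dist, opp_dist]
    turn = 0  # 0 = the player to move in the position being evaluated
    while True:
        dists[turn] -= rng.randint(1, 6)  # random dice roll
        if dists[turn] <= 0:
            return turn == 0
        turn = 1 - turn

def rollout_value(my_dist, opp_dist, n_games=2000, seed=0):
    """Estimate the position's value as the fraction of playouts won."""
    rng = random.Random(seed)
    wins = sum(play_out(my_dist, opp_dist, rng) for _ in range(n_games))
    return wins / n_games
```

A position one square from the goal is always won (`rollout_value(1, 50)` is 1.0), and a hopeless one is always lost (`rollout_value(100, 1)` is 0.0); positions in between get a fraction that approaches the true win probability as `n_games` grows.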

Stochastic game

A game that includes some element of randomness. *Due to dice rolls *Due to partial observability

Policy

A mapping from every possible state to the best move in that state. *Usually only possible for end game.

Accidental checkmate

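A checkmate that is not guaranteed across the belief state: the move delivers mate only because the opponent's pieces happen to be in the right places, which the player could not know before making the move.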

Alpha Beta pruning with respect to bounds.

Alpha Beta search updates the values of alpha and beta as it goes along and prunes the remaining branches at a node AS SOON AS the value of the current node is known to be worse than the current alpha or beta for MAX or MIN, respectively.
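
A minimal sketch of this bound updating, assuming a game tree written as nested Python lists (a number is a leaf utility for MAX, a list is an internal node). The representation and the `visited` argument are illustrative choices, not part of the card.

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Return the minimax value of `node`, pruning with alpha/beta bounds."""
    if not isinstance(node, list):          # leaf: return its utility
        if visited is not None:
            visited.append(node)            # record which leaves were examined
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)       # best option so far for MAX
            if alpha >= beta:               # MIN above will never allow this
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)             # best option so far for MIN
        if alpha >= beta:                   # MAX above already has better
            break
    return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
leaves = []
best = alphabeta(tree, True, visited=leaves)
```

On this tree the root value is 3 and only 7 of the 9 leaves are examined: after the first MIN node fixes alpha at 3, the second MIN node is cut off as soon as its first leaf (2) shows it cannot beat 3.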

Equilibrium solution

An optimal randomized strategy for each player in a game.

Averaging over clairvoyance

Choosing the best move averaged over all possible unseen combinations. *Assumes that the game will become fully observable to both players after the first move. *FAILS because it does not consider the BELIEF state that an agent will be in after acting. Thus, its assumption means that it will never select actions that gather information.

Beam search forward pruning

Consider only a small subset of nodes based on their value when the evaluation function is applied.
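
As a rough sketch, beam-style forward pruning keeps only the top few moves by evaluation and discards the rest without searching them. The function name, the `beam_width` parameter, and the scores are hypothetical.

```python
def beam_prune(moves, evaluate, beam_width):
    """Keep only the `beam_width` moves with the highest evaluation."""
    return sorted(moves, key=evaluate, reverse=True)[:beam_width]

# Invented evaluation scores for four candidate moves.
scores = {"a": 0.1, "b": 0.9, "c": 0.4, "d": 0.7}
kept = beam_prune(list(scores), scores.get, beam_width=2)
```

Only `b` and `d` survive here. If the evaluation function misjudges a move, the true best move can be among those discarded, which is the risk of this kind of pruning.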

Chance nodes (And added condition for minimax)

Each branch is labeled with an outcome and its probability. The value of a chance node is the EXPECTED VALUE of its child nodes. *EV = Sum(Prob(X) * X.value) over the children X *Extra case added to the minimax algorithm: *Expectiminimax(s) = Sum over r of Prob(r) * Expectiminimax(Result(r,s)) if Player(s) = CHANCE
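
The extra chance-node case can be sketched as follows. The tuple-based tree encoding (`("max", [...])`, `("min", [...])`, `("chance", [(prob, child), ...])`) is an assumption made for this example.

```python
def expectiminimax(node):
    """Value of a tree with MAX, MIN, and CHANCE nodes."""
    if isinstance(node, (int, float)):      # leaf utility
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":                    # probability-weighted average
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind!r}")

tree = ("max", [
    ("chance", [(0.5, 2), (0.5, 4)]),       # expected value 3.0
    ("chance", [(0.75, 0), (0.25, 10)]),    # expected value 2.5
])
value = expectiminimax(tree)
```

MAX picks the first branch, so the root value is 3.0, even though the second branch contains the single best outcome (10).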

Competitive environment

Environment in which the agents' goals conflict. (Games)

Zero-sum game of perfect information

Environment that is deterministic and fully observable, where two agents act alternately, and utility values at the end of the game are always equal and opposite.

Restrictions on evaluation function for stochastic games

Evaluation function must be a positive linear transformation of the probability of winning from a position. (Pg 179)

Quiescence search

Extra search that expands non-quiescent positions beyond the normal cutoff until quiescent positions are reached, so that the evaluation function is applied only to quiescent states.

Evaluation function

Function that allows us to approximate the utility of a state without doing a complete search.

Zero-sum game

Game where total payoff to all players is the same for every instance of the game.

Games' formal relationship to search

Games can be formalized as a kind of search with the same 6 standard parts: definition of all possible states, initial state, possible actions in a state, transition function, terminal test (goal-like), and utility function (in place of path cost).

Imperfect information

Games in which the environment isn't fully observable.

Partially observable strategy

Goal is to move to good positions, but also to MINIMIZE the amount of information the opponent has.

Pruning

Ignore portions of the search tree that make NO DIFFERENCE to the final choice.

Table lookup methods

Methods that have a table of states and the move to take in each state. *Only useful for opening moves. *Typically after 10 moves, the game is in a state that is rarely seen.

Minimax Algorithm structure (3 parts)

Minimax(s) = 1) Utility(s) if Terminal-Test(s) 2) Max over a in Actions(s) of Minimax(Result(a,s)) if Player(s) = MAX 3) Min over a in Actions(s) of Minimax(Result(a,s)) if Player(s) = MIN *Each node only evaluates its successor nodes. *Minimax values are backed up through the tree as the recursion unwinds. *For multiplayer games, the single value of a node is replaced with a vector of utilities, one per player, and each player picks the move that maximizes their own component. *Time complexity: O(b^m) *Space complexity: O(bm), where b is the branching factor and m is the maximum depth.
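
The three cases map directly onto a short recursive function. This is a sketch under the assumption that the game tree is given as nested Python lists (a number is a terminal utility for MAX, a list is a node whose children are the results of each action) and that players strictly alternate.

```python
def minimax(node, maximizing):
    """Return the minimax value of `node`; players alternate each level."""
    if not isinstance(node, list):          # Terminal-Test(s)
        return node                         # Utility(s)
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Two-ply example: MAX moves at the root, MIN replies.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
root_value = minimax(tree, True)
```

The three MIN nodes back up 3, 2, and 2, so MAX's value at the root is 3.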

Singular extension

Move that is "clearly better" than all other moves in a position. *Once discovered, the move is remembered. *When the search reaches its normal depth limit, it checks whether the remembered move is legal; if so, the search is extended to consider that move.

Forward pruning

Pruning moves at a given node IMMEDIATELY, without further consideration. *Beam search *ProbCut: Uses statistics gained from prior experience to lessen the chance that the best move will be pruned.

Retrograde minimax search

Reverse the rules of a game to do unmoves rather than moves. *Solve the game backwards from goal state to current state.

Quiescent state

State that is unlikely to exhibit wild swings in the value of an evaluation function in the near future. *Evaluation functions should only be applied to quiescent states.

Optimal strategy

Strategy that leads to outcomes at least as good as any other strategy when one is playing an infallible opponent.

Real time games

Time limit is involved, so a cutoff test and evaluation function must be used. *Utility(s) and Terminal-test(s) are replaced with Evaluation(s) and Cutoff-test(s, d) *The Evaluation(s) function is a heuristic that estimates the expected utility of a given state.
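
A depth-limited version can be sketched on a toy take-away game (take 1 to 3 stones; whoever takes the last stone wins). The game, the `limit` parameter, and the heuristic in `evaluate` are assumptions chosen to keep the example small.

```python
def evaluate(stones, max_to_move):
    """Heuristic estimate in [-1, 1] from MAX's point of view (an assumption):
    a pile that is a multiple of 4 looks bad for the player to move."""
    mover_score = -0.5 if stones % 4 == 0 else 0.5
    return mover_score if max_to_move else -mover_score

def h_minimax(stones, depth, max_to_move, limit):
    """Minimax with Cutoff-Test(s, d) and Evaluation(s)."""
    if stones == 0:                         # terminal: previous mover won
        return -1.0 if max_to_move else 1.0
    if depth >= limit:                      # Cutoff-Test(s, d)
        return evaluate(stones, max_to_move)
    values = [
        h_minimax(stones - k, depth + 1, not max_to_move, limit)
        for k in (1, 2, 3) if k <= stones
    ]
    return max(values) if max_to_move else min(values)

shallow = h_minimax(5, 0, True, limit=2)    # estimate from a 2-ply search
deep = h_minimax(5, 0, True, limit=10)      # searched to the end of the game
```

With a 2-ply limit the root value is the heuristic estimate 0.5; searching to the end of the game gives the true utility 1.0. The terminal test is checked before the cutoff so that real outcomes are never replaced by estimates.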

Game tree

Tree in which nodes are game states, and edges are actions. Each level alternates which player's move led to a game state.

Minimax Algorithm concept

Maximize your own utility on each of your turns, assuming that your opponent will try to minimize your utility on each of their turns.

Killer move

One of the best moves found so far in the search. Trying these moves first when ordering moves is called the killer move heuristic.

Partially observable games and uncertainty

Uncertainty in these games arises completely from lack of access to the choices made by the opponent. *Use belief states and the belief state search space to solve the problem. Ex. Kriegspiel. (In card games such as poker, uncertainty also comes from the random deal.)

Alpha-Beta pruning concept

We can compute the correct minimax decision without looking at every node in the game tree. *With good move ordering, effectively cuts the exponent of the time complexity in half: O(b^(m/2)) instead of O(b^m). *Concept is that decision subtrees that end up worse for the current player, or better for the opponent, than moves previously found shouldn't be explored any further.

Alpha/Beta Move ordering

Which node gets evaluated next heavily affects the performance of Alpha/Beta pruning. Thus, methods for choosing the next node to process are important. *Choose next the nodes that look the most promising. *Use iterative deepening to gain more information about the current move: search 1 ply deep, then order the nodes to expand in the next ply based on the results.
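
The effect of ordering can be seen by counting leaf evaluations on the same game tree written in two different child orders. This is an illustrative sketch: the nested-list tree format, the function names, and the hand-picked orderings are assumptions, and in practice the ordering would come from a shallower search as described above.

```python
import math

def search(node, maximizing, alpha, beta, stats):
    """Alpha-beta search that counts how many leaves it evaluates."""
    if not isinstance(node, list):
        stats["leaves"] += 1
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        value = search(child, not maximizing, alpha, beta, stats)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:                   # remaining children are pruned
            break
    return best

def leaves_evaluated(tree):
    """Return (root value, number of leaves evaluated) with MAX at the root."""
    stats = {"leaves": 0}
    value = search(tree, True, -math.inf, math.inf, stats)
    return value, stats["leaves"]

# Same positions, different child order.
good_order = [[3, 8, 12], [2, 4, 6], [2, 5, 14]]   # most promising first
bad_order = [[14, 5, 2], [6, 4, 2], [12, 8, 3]]    # most promising last
```

Both orderings give the root value 3, but the good ordering evaluates 5 leaves while the bad one evaluates all 9.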

