Ch. 5 Adversarial Search
What are Alpha and Beta?
*Alpha = the value of the best (highest-value) choice we have found so far at any choice point along the path for MAX. *Beta = the value of the best (lowest-value) choice we have found so far at any choice point along the path for MIN.
Min nodes and Max nodes
*A MIN node can take on a value of AT MOST its smallest successor value seen so far. *A MAX node can take on a value of AT LEAST its largest successor value seen so far.
Card game partial observability solutions (3 steps)
1) Consider all deals of unseen cards 2) Solve all possible unseen card hands as if they were being played in a fully observable game 3) Choose move with best outcome averaged over all the deals.
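The three steps above can be sketched as follows; the deals and the per-deal payoff table are hypothetical placeholders standing in for step 2 (solving each deal as a fully observable game):

```python
# Sketch of step 3: pick the move whose outcome, averaged over all
# possible deals of the unseen cards, is best. The deals and payoffs
# below are made-up placeholders, not a real card game.

def best_move_over_deals(moves, deals, value):
    """value(move, deal) -> payoff of `move` when the hidden cards are
    `deal`. Each deal is weighted equally, as in a uniform shuffle."""
    def avg(move):
        return sum(value(move, d) for d in deals) / len(deals)
    return max(moves, key=avg)

# Toy example: two moves, three equally likely deals.
payoff = {("a", 1): 0, ("a", 2): 1, ("a", 3): 1,
          ("b", 1): 1, ("b", 2): 0, ("b", 3): 0}
choice = best_move_over_deals(["a", "b"], [1, 2, 3],
                              lambda m, d: payoff[(m, d)])
```

Here move "a" wins in 2 of 3 deals, so it is chosen even though it loses outright in deal 1.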
Parts of an evaluation function (1 part, 2 kinds of functions)
1) Features: *Important data about the state of the game. *Taken together, the features define categories (equivalence classes) of states. 2) Expected value: for each category, the expected value (probability of win/loss/tie * value of win/loss/tie, summed) can be determined, yielding an evaluation function. 3) Weighted linear function: *Evaluation function is a weighted linear combination of the feature values: Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s). *Involves the assumption that the features are independent.
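The weighted linear function in part 3 is a one-liner; the features and weights below are made-up material-count-style illustrations, not a real chess evaluator:

```python
# Minimal sketch of a weighted linear evaluation function:
# Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s).
# Assumes the features are independent (as noted above).

def linear_eval(features, weights):
    return sum(w * f for w, f in zip(weights, features))

# Toy example: features = (pawn diff, knight diff, queen diff),
# weights roughly proportional to piece values.
score = linear_eval((2, -1, 0), (1, 3, 9))   # 2*1 + (-1)*3 + 0*9 = -1
```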
Constraints of an evaluation function (3 constraints)
1) Should order terminal states in the same way as the utility function. 2) Computation shouldn't take too long. 3) For non-terminal states, the evaluation function should be strongly correlated with the actual chance of winning.
Monte carlo simulation
1) Start with an alpha-beta search algorithm 2) From start position, play thousands of games against itself, using random dice rolls. 3) Resulting win percentage for each position is a good approximation of the value of the position.
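Step 2 can be sketched as below; `random_playout` is a stand-in that flips a biased coin instead of playing out a real game with random dice rolls:

```python
import random

# Sketch of the simulation step: estimate a position's value as the
# win rate over many random self-play games. The playout here is a
# hypothetical stand-in, not a real game engine.

def monte_carlo_value(position_bias, n_games=10_000, rng=None):
    rng = rng or random.Random(0)             # fixed seed for reproducibility
    def random_playout():                     # stand-in for one full random game
        return rng.random() < position_bias   # True = win
    wins = sum(random_playout() for _ in range(n_games))
    return wins / n_games

estimate = monte_carlo_value(0.7)   # should approach the true value 0.7
```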
Stochastic game
A game that includes some element of randomness. *Due to random events such as dice rolls *Due to partial observability
Policy
A mapping from every possible state to the best move in that state. *Usually only feasible for endgames.
Accidental checkmate
A move that delivers checkmate even though the player could not know it would: before moving, it is not known whether the goal holds in every state of the resulting belief state.
Alpha Beta pruning with respect to bounds.
Alpha Beta search updates the values of alpha and beta as it goes along and prunes the remaining branches at a node AS SOON AS the value of the current node is known to be worse than the current alpha or beta for MAX or MIN.
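A minimal sketch of this update-and-prune loop, assuming the game tree is given as nested lists with numeric leaves as utilities (the three-branch tree at the end is the classic textbook example with minimax value 3):

```python
import math

# Alpha-beta search over an explicit game tree. A branch is cut off
# AS SOON AS the node's value crosses the bound held by the player above.

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    if isinstance(node, (int, float)):        # terminal: return utility
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            if value >= beta:                 # MIN above will never allow this
                break                         # prune remaining branches
            alpha = max(alpha, value)
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            if value <= alpha:                # MAX above will never allow this
                break                         # prune remaining branches
            beta = min(beta, value)
        return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]    # minimax value is 3
```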
Equilibrium solution
An optimal randomized strategy for each player in a game.
Averaging over clairvoyance
Choosing the best move averaged over all possible unseen combinations. *Assumes that the game will become fully observable to both players after the first move. *FAILS because it does not consider the BELIEF state that an agent will be in after acting. Thus, its assumption means that it will never select moves that gather information.
Beam search forward pruning
Consider only a small subset of nodes based on their value when the evaluation function is applied.
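A minimal sketch of that selection step, assuming a hypothetical `eval_fn` and successor values:

```python
# Beam-search forward pruning: keep only the beam_width most promising
# successors according to the evaluation function; the rest are dropped
# immediately (risking that the best move is among them).

def beam_prune(successors, eval_fn, beam_width=2):
    return sorted(successors, key=eval_fn, reverse=True)[:beam_width]

# Toy successors tagged with made-up evaluation scores.
kept = beam_prune([("a", 5), ("b", 9), ("c", 1)], eval_fn=lambda s: s[1])
```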
Chance nodes (And added condition for minimax)
Each branch is labeled with an outcome and its probability. The value of a chance node is the EXPECTED VALUE of its child nodes. *EV = Sum over outcomes x of P(x) * value(x) *Extra case added to the minimax definition: Sum over r of P(r) * Expectiminimax(Result(s, r)) if Player(s) = CHANCE
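The chance-node case can be sketched as an extra branch in the minimax recursion; the tuple-based tree encoding below is a hypothetical convention for illustration:

```python
# Expectiminimax sketch. Nodes are ("max", children), ("min", children),
# ("chance", [(prob, child), ...]), or a bare number for a terminal state.

def expectiminimax(node):
    if isinstance(node, (int, float)):
        return node                                   # Utility(s)
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    # Chance node: EV = sum over outcomes r of P(r) * value(Result(s, r)).
    return sum(p * expectiminimax(c) for p, c in children)

# 50/50 coin flip between a MIN node worth 2 and one worth 4 -> EV 3.
ev = expectiminimax(("chance", [(0.5, ("min", [2, 5])),
                                (0.5, ("min", [4, 9]))]))
```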
Competitive environment
Environment in which agents' goals conflict. (Games)
Zero-sum game of perfect information
Environment that is deterministic and fully observable, where two agents act alternately, and the utility values at the end of the game are always equal and opposite.
Restrictions on evaluation function for stochastic games
Evaluation function must be a positive linear transformation of the probability of winning from a position. (Pg 179)
Quiescence search
Extra search extended from non-quiescent positions until quiescent positions are reached, so that the evaluation function is applied only to quiescent states.
Evaluation function
Function that allows us to approximate the utility of a state without doing a complete search.
Zero-sum game
Game where total payoff to all players is the same for every instance of the game.
Games formal relationship to search
Games can be formalized as a kind of search problem, with the same six standard parts: definition of all possible states, initial state, possible actions in a state, transition function Result(s, a), terminal test (goal-like), and utility function (path-cost-like).
Imperfect information
Games in which the environment isn't fully observable.
Partially observable strategy
Goal is to move to good positions, but also to MINIMIZE the amount of information the opponent has.
Pruning
Ignore portions of the search tree that make NO DIFFERENCE to the final choice.
Table lookup methods
Methods that have a table of states and the move to take in each state. *Only useful for opening moves. *Typically after 10 moves, the game is in a state that is rarely seen.
Minimax Algorithm structure (3 parts)
Minimax(s) = 1) Utility(s) if Terminal-Test(s) 2) max over a in Actions(s) of Minimax(Result(s, a)) if Player(s) = MAX 3) min over a in Actions(s) of Minimax(Result(s, a)) if Player(s) = MIN *Each node only evaluates its successor nodes. *Minimax values are backed up through the tree as the recursion unwinds. *For multiplayer games, the single value of an action is replaced with a vector of values, one per player; each player maximizes their own component on their move. *Time complexity: O(b^m) *Space complexity: O(bm)
Singular extension
Move that is "clearly better" than all other moves in a position. *Once discovered, the move is remembered. If move is legal when search comes to it later, the move is taken.
Forward pruning
Pruning moves at a given node IMMEDIATELY, without further consideration. *Beam search *ProbCut: Uses statistics gained from prior experience to lessen the chance that the best move will be pruned.
Retrograde minimax search
Reverse the rules of a game to do unmoves rather than moves. *Solve the game backwards from goal state to current state.
Quiescent state
State that is unlikely to exhibit wild swings in the value of an evaluation function in the near future. *Evaluation functions should only be applied to quiescent states.
Optimal strategy
Strategy that leads to outcomes at least as good as any other strategy when one is playing an infallible opponent.
Real-time games
A time limit is involved, so a cutoff test and evaluation function must be used. *Utility(s) and Terminal-Test(s) are replaced with Eval(s) and Cutoff-Test(s, d). *Eval(s) is a heuristic that estimates the expected utility of state s.
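A sketch of that substitution (often called H-Minimax); the tree format and the averaging `evaluate` heuristic below are toy assumptions for illustration:

```python
# Depth-limited minimax: Terminal-Test(s) becomes Cutoff-Test(s, d)
# and Utility(s) becomes a heuristic Eval(s).

def evaluate(state):
    # Toy heuristic: a leaf evaluates to itself; an unexpanded subtree
    # to the average of its children (a stand-in utility estimate).
    if isinstance(state, (int, float)):
        return state
    vals = [evaluate(c) for c in state]
    return sum(vals) / len(vals)

def h_minimax(state, depth, maximizing=True, cutoff_depth=2):
    if isinstance(state, (int, float)) or depth >= cutoff_depth:
        return evaluate(state)                    # Eval(s) replaces Utility(s)
    values = (h_minimax(c, depth + 1, not maximizing, cutoff_depth)
              for c in state)
    return max(values) if maximizing else min(values)
```

For example, with cutoff_depth=1 the search stops after one ply and estimates each unexpanded subtree with Eval.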
Game tree
Tree in which nodes are game states, and edges are actions. Each level alternates which player's move led to a game state.
Minimax Algorithm concept
On each of your turns, choose the move that maximizes your utility, assuming your opponent will choose moves that minimize your utility on each of their turns.
Killer move
A move previously found to be best in a similar position. Trying killer moves first is the killer move heuristic for move ordering.
Partially observable games and uncertainty
Uncertainty in these games arises entirely from lack of access to the choices made by the opponent. *Use belief states and the belief-state search space to solve the problem. Ex. poker, Kriegspiel
Alpha-Beta pruning concept
We can compute the correct minimax decision without looking at every node in the game tree. *With perfect move ordering, time complexity drops from O(b^m) to O(b^(m/2)), effectively halving the exponent (doubling the reachable depth). *Concept: subtrees guaranteed to end up worse for the current player (or better for the opponent) than a move already found shouldn't be explored any further.
Alpha/Beta Move ordering
Which node gets evaluated next heavily affects the performance of Alpha/Beta pruning, so methods for choosing the next node to process are important. *Try the nodes that look most promising first. *Use iterative deepening to gain more information for ordering: search 1 ply deep, then order the nodes to expand in the next ply based on the results.