
What is a task environment?

"problems" to which rational agents are the solution

For the chain 10, b, c, d, 1: which gamma makes reaching the 10 and reaching the 1 from state d equally good?

Set the discounted values equal: 10*gamma^3 = 1*gamma^1 (the 10 is three steps from d, the 1 is one step away), so 10*gamma^3 = gamma, gamma^2 = 1/10, gamma = sqrt(1/10) ≈ 0.316
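
A quick numerical check in Python (a sketch assuming, as above, that the 10 is three steps from d and the 1 is one step away):

```python
import math

# Solve 10*g**3 = 1*g**1  =>  g**2 = 1/10 (assumes the 10 is three steps from d
# and the 1 is one step away, as in the card above).
gamma = math.sqrt(1 / 10)
print(round(gamma, 3))                   # 0.316
print(10 * gamma**3, 1 * gamma**1)       # both ~0.316, so the two goals are equally good
```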

Pseudocode of Tree/Graph Search

function Search(problem, fringe) returns a solution, or failure
  *closed <- an empty set
  fringe <- Insert(Make-Node(Initial-State[problem]), fringe)
  loop do
    if fringe is empty then return failure
    node <- Remove-Front(fringe)
    if Goal-Test(problem, State[node]) then return node
    *if State[node] is not in closed then
      *add State[node] to closed
      for child-node in Expand(State[node], problem) do
        fringe <- Insert(child-node, fringe)
  end
(lines marked with * are the additions that turn tree search into graph search)
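
A minimal runnable sketch of the graph-search version in Python, assuming a FIFO fringe (so it searches in BFS order) and a successors(state) function that yields (action, next_state) pairs; the edges dictionary is just a made-up example:

```python
from collections import deque

def graph_search(start, goal_test, successors):
    """Generic graph-search sketch: FIFO fringe (BFS order) plus a closed set."""
    fringe = deque([(start, [])])       # each entry: (state, plan so far)
    closed = set()                      # states already expanded (graph-search addition)
    while fringe:
        state, plan = fringe.popleft()
        if goal_test(state):            # goal test applied when the node is removed
            return plan
        if state in closed:
            continue
        closed.add(state)
        for action, child in successors(state):
            fringe.append((child, plan + [action]))
    return None                         # failure

# Tiny hypothetical example graph
edges = {'S': ['A', 'B'], 'A': ['G'], 'B': ['A'], 'G': []}
plan = graph_search('S', lambda s: s == 'G',
                    lambda s: [('go ' + c, c) for c in edges[s]])
print(plan)                             # ['go A', 'go G']
```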

What are uninformed search strategies (6)?

-BFS -UCS -DFS -DLS -Iterative Deepening DFS -Bidirectional Search

Value vs. Policy Iteration

-both compute the same thing (optimal values) -both are dynamic programs for solving MDPs Value: -every iteration updates both the values and (implicitly) the policy -the policy is not tracked, but taking the max over actions implicitly recomputes it Policy: -several passes that update utilities with a fixed policy are done ---each pass is fast because only one action is considered (vs. all of them) -after the policy is evaluated, a new policy is chosen (slow like a value iteration pass) -the new policy will be better or optimal

What are some issues with UCS?

-explores in every direction (no goal location information)

What persistent variable is in a goal-based agent that isn't in others? What about a utility-based agent?

-goal, a description of what the agent would like to achieve Utility: -possible states, possible states that may maximize happiness

Describe MDP

-they are non-deterministic (stochastic) and only depend on the present state -no guarantee about which state you'll end up in -contains "noisy movement": actions that do not always go as planned (0.8 for the intended direction, 0.1 for each of the two other directions) -agent receives a reward at each time step (can be negative) -the big reward comes at the end Goal: maximize the sum of rewards from a given state, the agent takes an action at time step t in the environment THEN the environment outputs a new state at time step t+1 and gives a reward r: S_t -- A_t, R_t+1 -- S_t+1 -- A_t+1, R_t+2 ...

Describe degrees of consistency

1-consistency (Node consistency): each single node's domain has a value which meets that node's unary constraints 2-consistency (Arc consistency): for each pair of nodes, any consistent assignment to one can be extended to the other 3-consistency: Path consistency k-consistency: for each k nodes, any consistent assignment to k-1 of them can be extended to the kth node -with strong n-consistency, the problem can be solved without backtracking, because it implies that (n-1)-, (n-2)-, ..., 1-consistency all hold

Agent States (3)

1. Atomic - no internal structure (black box) - either matches or does not match what you're looking for - X & Y are chosen with no explanation 2. Factored - each state is a set of attribute-value properties -can represent uncertainty ---CSPs, Bayesian Networks ---GPS location, gas in tank 3. Structured - the relationships between objects of a state can be explicitly expressed ---First-order logic, natural language understanding, knowledge-based learning ---car, truck, and cow are defined by their relationships with each other

Time complexities of... 1. look up table 2. finding element in sorted array with binary search 3. find max element in unsorted array 4. sorting via merge sort 5. sorting via bubble sort 6. 3 variable equation solver 7. find all subsets 8. find all permutations of a given set/string

1. Constant O(1) 2. Logarithmic O(log n) 3. Linear O(n) 4. Linearithmic O(n log n) 5. Quadratic O(n^2) 6. Cubic O(n^3) 7. Exponential O(2^n) 8. Factorial O(n!)

Your goal is to navigate a robot out of a maze. The robot starts in the center of the maze facing north. You can turn the robot to face north, east, south, or west. You can direct the robot to move forward a certain distance, although it will stop before hitting a wall. 1. Formulate this problem. How large is the state space? 2. In navigating a maze, the only place we need to turn is at the intersection of two or more corridors. Reformulate this problem using this observation. How large is the state space now? 3. From each point in the maze, we can move in any of the four directions until we reach a turning point, and this is the only action we need to do. Reformulate the problem using these actions. Do we need to keep track of the robot's orientation now? 4. In our initial description of the problem we already abstracted from the real world, restricting actions and removing details. List three such simplifications we made.

1. Coordinate system defined so the center is at (0,0) and the maze itself is a square from (-1,-1) to (1,1). -initial state = robot at (0,0) facing N -goal test = either |x| > 1 or |y| > 1 where (x,y) is the current location -successor function: move forwards any distance d; change the direction the robot is facing -cost function: total distance moved the state space is infinitely large since the robot's position is continuous 2. The state will record the intersection the robot is currently at, along with the direction it's facing. At the end of each corridor leaving the maze there will be an exit node. We'll assume some node corresponds to the center of the maze. -initial state: center of maze, facing north -goal test: an exit node -successor function: move to the next intersection in front of us, if there is one; turn to face a new direction -cost function: total distance moved there are 4n states, where n is the number of intersections 3. Initial state: center goal: exit node successor: move to the next intersection - N/S/E/W cost: total distance we no longer need to keep track of the robot's orientation since it is irrelevant to predicting the outcome of our actions and not part of the goal test. The motor system that executes this plan will need to keep track of the robot's current orientation to know when to rotate the robot.

For each of the following assertions, say whether it is true or false and support your answer with examples or counterexamples where appropriate. 1. An agent that senses only partial information about the state cannot be perfectly rational. 2. There exist task environments in which no pure reflex agent can behave rationally. 3. There exists a task environment in which every agent is rational. 4. The input to an agent program is the same as the input to the agent function. 5. Every agent function is implementable by some program/machine combination. 6. Suppose an agent selects its action uniformly at random from the set of possible actions. There exists a deterministic task environment in which this agent is rational. 7. It is possible for a given agent to be perfectly rational in two distinct task environments. 8. Every agent is rational in an unobservable environment. 9. A perfectly rational poker-playing agent never loses.

1. False 2. True 3. True 4. False 5. False 6. True 7. True 8. False 9. False

What are the first few steps in solving a search problem?

1. Goal formulation -What is the goal 2. Problem formulation -Process of deciding what actions and states to consider, given a goal simply, 1. formulate 2. search 3. execute

Environment Type Characteristics (7)

1. Observability: fully, partially, unobservable -sensors detect all relevant aspects of the environment; sensors have noisy/inaccurate/missing/unavailable data; no sensors ---tic-tac-toe vs. battleship 2. Number of agents: 1 or many - can be competitive (maximizing performance = being the best) or cooperative (maximizing performance = avoiding collisions) ---car surrounded by trees vs. car surrounded by cars 3. Probability: deterministic vs. stochastic -deterministic = the next state is determined by the current state and the actions executed by the agent -stochastic = outcomes are uncertain/unpredictable - stochastic = quantifiable by probabilities - non-deterministic = actions characterized by possible outcomes, no probabilities attached ---chess vs. monopoly 4. Independence: episodic vs. sequential -episodic: atomic episodes (receives a percept, performs an action - episodes are independent) -sequential: the current decision can affect all future decisions ---catching a ball vs. a game of tennis 5. Dynamicity: static, dynamic, semi-dynamic ---static: observe once and it doesn't ever change; dynamic: needs to be observed repeatedly; semi-dynamic: the environment itself doesn't change but the performance score does as time passes 6. Continuity: discrete vs. continuous -applies to the state of the environment, the percepts, and the actions of the agent ---integer vs. float 7. Knownness: known vs. unknown -known can be partially observable, unknown can be fully observable (not correlated) ---physics/laws/rules of the world are known vs. not known

What do you need to know to use bidirectional search?

1. Predecessors/parents (to be able to search backward) 2. An explicit goal state (not an abstract goal test)

Schemes for Forcing Exploration (3)

1. Random - simplest (epsilon-greedy) - with small probability epsilon, act randomly - with large probability (1-epsilon), act on the current policy - Problem: once learning is done, it will keep thrashing around -- Solution: lower epsilon over time 2. Exploration Function (count/density based) - explore areas whose badness is not (yet) established - takes a value estimate u and a visit count n and returns an optimistic utility -- f(u,n) = u + k/n -- Q(s,a) <- R(s,a,s') + gamma * max over a' of f(Q(s',a'), N(s',a')) 3. Regret (result based) - regret = a measure of total mistake cost (the difference between your expected rewards, including youthful suboptimality, and optimal expected rewards) - minimizing regret requires optimally learning to be optimal

Agent Types (4)

1. Simple Reflex - sees and does, no complication - based on the CURRENT state only (history not preserved) - only works well if the environment is FULLY OBSERVABLE -uses if-then (condition-action) rules 2. Model Based - thinks about a model, applies some function, does some action - can handle PARTIAL OBSERVABILITY - internal state depends on the percept history (best guess) - the model/how the internal state is updated is based on (a) how the world evolves independently from the agent and (b) how the agent affects the world 3. Goal Based - has an objective, all actions are done to achieve it - needs GOAL information (current state knowledge is not enough) - combines goal information with environment models to choose actions - considers the FUTURE (what will happen if x is chosen) - FLEXIBLE: the knowledge supporting decisions is explicitly represented and modifiable 4. Utility Based - the agent tries to maximize its utility - utility function = the agent's performance measure - agent happiness (utility) is required (goals are not enough)

This exercise explores the differences between agent functions and agent programs. 1. Can there be more than one agent program that implements a given agent function? Give an example, or show why one is not possible. 2. Are there agent functions that cannot be implemented by any agent program? 3. Given a fixed machine architecture, does each agent program implement exactly one agent function? 4. Given an architecture with n bits of storage, how many different possible agent programs are there? 5. Suppose we keep the agent program fixed but speed up the machine by a factor of two. Does that change the agent function?

1. Yes - Assume we are given an agent function whose actions only depend on the previous p percepts. One program can remember the previous p percepts to implement the agent function, while another could remember more than p percepts and still implement the same agent function. 2. Yes 3. Yes - Given a percept sequence, an agent program will select an action. To implement multiple agent functions would require the agent program to select different actions (or different distributions of actions) given the same percept sequence. 4. If x is the total number of actions, then the number of possible programs is x^(2^n): there are 2^n internal states and x choices for each state. 5. No, not directly. - However, this may allow the program to compress its memory further and to retain a better model of the world.

Three main problems of HMMs

1. calculate the probability that the given model explains the given evidence sequence 2. calculate the most likely explanation state given the evidence sequence 3. improve the given model to increase the probability that the model explains the given evidence

Two differences of UCS to BFS

1. goal test is applied when selected for expansion rather than upon generation 2. test is added in case a better path is found

What are two things to be wary of when searching?

1. looped paths 2. repeated states

What are the assumptions about CSP world?

1. single agent 2. deterministic 3. fully observable 4. discrete state space

Cons of value iteration

1. slow O(S^2*A) per iteration 2. max rarely changes 3. policy often converges before values

4 components of a node for search

1. state 2. parent 3. action (action that was applied to parent) 4. path cost

What does a search problem consist of?

1. state space 2. successor function /transition model -Actions -Path costs [weighted] 3. start state 4. goal test 5. solution (sequence of actions - a plan - which transforms the start state to a goal state) search problems are models and may not always be clear that it is a search problem

In what cases are goal-based agents inadequate while utility-based agents can still make rational decisions?

1. where there are conflicting goals -utility function specifies the appropriate tradeoff 2. Where there are several goals that the agent can aim for but none of which can be achieved with certainty -utility provides a way in which the likelihood of success can be weighed against the importance of goals

What is in a state space (2)?

1. world state - includes every last detail of the environment 2. search state - keeps only the details needed for planning (abstractions) note: don't get misled by existing graphs, make sure they are relevant to what the problem is asking for consists of position, direction, velocity, items

If we want to find the current EXP value for a node with children of value: 8, 24, -12 with edge of probability: 1/2, 1/3, 1/6 respectively, what is the value of current?

10
(1/2)*8 = 4
(1/3)*24 = 8
(1/6)*(-12) = -2
4 + 8 + (-2) = 10

if the leaf nodes for a tree are: 3, 12, 8 2, 4, 6 14, 5, 2 what is the min value preceding and the max value at root? what if it's the opposite?

min at the level below the root:
3, 12, 8 -> 3
2, 4, 6 -> 2
14, 5, 2 -> 2
root (max) = 3
opposite (max below, min at root):
3, 12, 8 -> 12
2, 4, 6 -> 6
14, 5, 2 -> 14
root (min) = 6

if the leaf nodes for a tree are: 3, 12, 8 2, 4, 6 14, 5, 2 what can be pruned if the predecessor is min? What if it is max?

If the predecessor is min (root is max): 4 and 6 in the second branch can be pruned. If the predecessor is max (root is min): 5 and 2 in the third branch can be pruned.

When gamma is 0.9, and the values are: 1, 1, 2, 4 what is the utility value?

6.436
1*(0.9^0) = 1
1*(0.9^1) = 0.9
2*(0.9^2) = 1.62
4*(0.9^3) = 2.916
1 + 0.9 + 1.62 + 2.916 = 6.436
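
The same discounted sum as a tiny Python sketch (assuming the first reward is received undiscounted, as in the calculation above):

```python
# Discounted utility of a reward sequence: sum of r_t * gamma**t
# (assumes the first reward is undiscounted, i.e. multiplied by gamma**0).
def discounted_utility(rewards, gamma):
    return sum(r * gamma**t for t, r in enumerate(rewards))

print(discounted_utility([1, 1, 2, 4], 0.9))   # 6.436 (up to float rounding)
```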

if the second level leaf nodes for a tree are: 10, 8 4, 50 compute the first level as min, and root as max. Do the opposite. Can anything be pruned? what about for: 10,6 100,8 1, 2 20, 4

First tree (children 10,8 and 4,50):
min then max at root: 8, 4 -> root = 8; the 50 can be pruned (the second branch starts with 4 <= 8)
max then min at root: 10, 50 -> root = 10; nothing can be pruned
Second tree (children 10,6 / 100,8 / 1,2 / 20,4):
min then max at root: 6, 8, 1, 4 -> root = 8; the 2 can be pruned
max then min at root: 10, 100, 2, 20 -> root = 2; the 8 (after seeing 100 >= 10) and the 4 (after seeing 20 >= 2) can be pruned

With: Noise = 0.2 Discount = 0.9 Living Reward = 0 End reward = 1 with probabilities 0.8, 0.1, 0.1 find V*

V* = 0.72
(0.8)[0 + 0.9*1] = 0.72
(0.1)[0 + 0.9*0] = 0.00
(0.1)[0 + 0.9*0] = 0.00
0.72 + 0.00 + 0.00 = 0.72
the sums for the other directions consist only of the 0.1 terms; take the max over actions = 0.72

Explain Minimax

A state-space search tree where players alternate turns and each node has a computed minimax value: the best achievable utility against a rational (optimal) adversary.
-maximize states under the agent's control, minimize states under the opponent's control
-you want the best for yourself, while the opponent wants the worst for you
def value(state):
  if state is a terminal state, return the state's utility
  if the next agent is MAX, return max-value(state)
  if the next agent is MIN, return min-value(state)
terminal states: V(s) = known
non-terminal states for the agent: V(s) = max of V(s') over successors s' of s
def max-value(state):
  initialize v = -inf
  for each successor of state:
    v = max(v, min-value(successor))
  return v
non-terminal states for the opponent: V(s) = min of V(s') over successors s' of s
def min-value(state):
  initialize v = +inf
  for each successor of state:
    v = min(v, max-value(successor))
  return v
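
A minimal Python sketch of minimax, assuming the game tree is given explicitly as nested lists (leaves are utilities) rather than generated from a successor function:

```python
# Minimax over a hypothetical game tree given as nested lists:
# leaves are utilities, internal nodes are lists of children.
def minimax(node, maximizing=True):
    if not isinstance(node, list):                 # terminal state: known utility
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# MAX at the root, MIN one level down (same leaves as the 3/12/8 example above)
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))                               # branch mins 3, 2, 2 -> root value 3
```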

What is a constraint graph?

A way to model constraints of a problem Can be drawn in two ways: -using only variables (only works for binary constraints) -using variables as circle nodes and rectangle nodes as constraints (can work for n-ary constraints)

Explain a state space graph. tree?

AKA Search Graph It's a mathematical representation of search problem -each state occurs only once (for graph, for a tree can be repeats) -nodes are abstracted world configurations -arcs represent successors (action results) -goal test is a set of goal nodes (maybe only one) -full graph is too big to build in memory, but is a useful idea tree: a "what if" tree of plans and their outcomes -root = start state -children = successors -nodes = states but correspond to PLANS to achieve those states -for most problems, can never build full tree as well notes: both are graphs, just special structure for trees and a time vs space tradeoff

Define in your own words the following terms: agent, agent function, agent program, rationality, autonomy, reflex agent, model-based agent, goal-based agent, utility-based agent, learning agent.

Agent: An algorithmic entity capable of displaying intelligent-like behavior. Agent function: a mapping from input-sequences to actions defining the behavior of an agent. Agent program: physical program implementing or approximating an agent function. Rationality: the behavior of maximizing one's own reward or performance. Reflex agent: agent only capable of considering its current perception of the world. Model-based agent: an agent that attempts to internalize aspects of the world through an approximating model. Goal-based agent: agent whose performance measure does not directly depend on local actions but on some (potentially) distant goal. Utility-based agent: agent whose performance measure is given by a utility function that determines which states are preferable and which are not on a continuous or many-valued scale. Learning agent: An agent whose performance can improve with experience.

Prove the Optimality of A* Search (Tree)

Assume:
  A = optimal goal node
  B = suboptimal goal node
  h(x) = admissible heuristic (needs to be admissible)
Claim: A will exit the fringe before B
Proof:
  Imagine B is on the fringe
  Some ancestor n of A is on the fringe too (possibly A itself)
  Claim: n will be expanded before B
  1. f(n) is less than or equal to f(A)
     f(n) = g(n) + h(n)          <- f-cost
     f(n) <= g(A)                <- admissibility of h
     g(A) = f(A)                 <- h = 0 at a goal
  2. f(A) is less than f(B)
     g(A) < g(B)                 <- B is suboptimal
     f(A) < f(B)                 <- h = 0 at a goal
  3. n expands before B
     f(n) <= f(A) < f(B)
  All ancestors of A expand before B, so A expands before B
  Therefore, A* tree search is optimal

Prove the Optimality of A* Search (Graph)

Assume: n' = a successor of n, a = the action from n to n'
Claim: if h(n) is consistent, then the values of f(n) along any path are nondecreasing
  g(n') = g(n) + c(n,a,n')
  f(n') = g(n') + h(n')
        = g(n) + c(n,a,n') + h(n')
        >= g(n) + h(n) = f(n)
Prove: whenever A* selects a node n for expansion, the optimal path to that node has been found
Proof (by contradiction):
  Possible problem: some n on the path to G* isn't in the queue when we need it, because some worse n' for the same state was dequeued and expanded first
  Take the highest such n in the tree
  Let p be the ancestor of n that was on the queue when n' was popped
  f(p) <= f(n) because of consistency
  f(n) < f(n') because n' is suboptimal
  So p would have been expanded before n' - contradiction!

Which search algorithms are optimal if step costs are all identical (3)?

BFS IDS BS note: UCS is always optimal

Which search algorithms are complete if b is finite (4)?

BFS UCS -also complete if step costs >= e for positive e IDS BS -also complete if both directions use BFS

What is a binary CSP? binary constraint graph?

Binary CSP: each constraint relates (at most) two variables Binary Constraint Graph: nodes are variables, arcs show constraints

Describe feature representation

Can describe a state using a vector of features (properties) ex. distance to the closest ghost, number of ghosts, distance to the closest dot, etc. - can also describe a q-state with features The value or Q-function can then be written using a few weights: V(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s) Q(s,a) = w1*f1(s,a) + w2*f2(s,a) + ... + wn*fn(s,a)

What is used for structure in CSPs?

Can have independent subproblems -ex. Tasmania in Australia map coloring any solution for main problem + solution for subproblem = solution for whole problem

Describe Expectimax

Compute the average score under optimal play (a probabilistic model)
-Used when there are unpredictable opponents, explicit randomness, and/or when actions can fail
-max nodes = same as minimax
-chance nodes are similar to min nodes, but the outcome is uncertain
--therefore we calculate their expected utilities
values now represent average-case instead of worst-case outcomes (minimax)
def value(state):
  if state is a terminal state, return the state's utility
  if the next agent is MAX, return max-value(state)
  if the next agent is EXP, return exp-value(state)
def max-value(state):
  initialize v = -inf
  for each successor of state:
    v = max(v, value(successor))
  return v
def exp-value(state):
  initialize v = 0
  for each successor of state:
    p = probability(successor)
    v += p * value(successor)
  return v
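
A minimal Python sketch of expectimax on a hand-built tree; the node encoding (tuples tagged 'max', 'exp', 'leaf') is an assumption made only for this example:

```python
# Expectimax: max nodes take the best child, chance ('exp') nodes average
# their children weighted by given probabilities.
def expectimax(node):
    kind = node[0]
    if kind == 'leaf':
        return node[1]
    if kind == 'max':
        return max(expectimax(child) for child in node[1:])
    if kind == 'exp':                       # node[1:] holds (probability, child) pairs
        return sum(p * expectimax(child) for p, child in node[1:])

# Same numbers as the EXP example earlier: children 8, 24, -12 with probs 1/2, 1/3, 1/6
chance = ('exp', (1/2, ('leaf', 8)), (1/3, ('leaf', 24)), (1/6, ('leaf', -12)))
print(expectimax(('max', chance)))          # 10.0
```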

Probabilistic Inference

Computes a desired probability from other known probabilities with new evidence, probabilities can change -> beliefs are updated

Define consistent

Consistency is when the heuristic "arc" cost <= the actual cost for each arc - the f value along a path should never decrease h(n) <= c(n, n') + h(n') where c(n, n') is the actual cost of the arc from n to n'; chaining this along a path gives h(A) <= cost(A to B) + h(B), i.e. the estimated distance d(A,G) <= c(A,B) + d(B,G) consistent = monotone all consistent heuristics are admissible (but not the opposite)

What is backtracking a variant of? Why is it advantageous?

DFS -space complexity is even better O(m) instead of O(bm)

What is used when depth is unlimited/infinite?

DLS -sets a predetermined depth limit l Optimal: if l > depth, no Complete: if l < depth, no time & space is same as DFS but l instead of m

What is DAG

Directed acyclic graph

What's one solution to reward preference determination?

Discounting

What is used for uncertain outcomes?

Expectimax search

What can be used with mixed layer types? (ex. layer 1 wants max, layer 2 wants min, layer 3 is random)

Expectiminimax

MEU Principle

Explores the question, do utilities exist? -Theorem states that if preferences are rational, there are utilities that exist

Forward vs. Viterbi

Forward Alg computes sum of paths, Viterbi computes best paths

What is used for filtering in CSPs?

Forward Checking -cross off values that violate a constraint when they are added to the existing assignment -if forward checking is implemented, there is no need to check whether a new assignment is valid (that would be redundant) Constraint Propagation (e.g., enforcing arc consistency)

If an agent has more knowledge than the current state, considers the future, and uses said knowledge to support decisions, which type is it?

Goal-Based

Write pseudocode agent programs for the goal-based and utility-based agents.

Goal:
function GOAL-BASED-AGENT(percept) returns an action
  persistent:
    state, the agent's current conception of the world state
    goal, a description of what the agent would like to achieve
    rules, a set of condition-action rules
    action, the most recent action, initially none
  state ← UPDATE-STATE(state, action, percept, goal)
  rule ← RULE-MATCH(state, rules, goal)
  action ← rule.ACTION
  return action
Utility:
function UTILITY-BASED-AGENT(percept) returns an action
  persistent:
    state, the agent's current conception of the world state
    possible states, possible states that may maximize happiness
    rules, a set of condition-action rules
    action, the most recent action, initially none
  state ← UPDATE-STATE(state, action, percept, possible states)
  rule ← RULE-MATCH(state, rules, possible states)
  action ← rule.ACTION
  return action

What is Direct Evaluation?

Goal: compute values for each state under a policy (pi) Idea: average together observed sample values Compute using DE Problems: - wastes information about state connections - takes a long time - each state is learned separately Good: - easy to understand - it doesn't need T or R - eventually it is correct

Explain heuristics as a lattice

Heuristics form a semi-lattice -the bottom is 0 (also known as a trivial heuristic) -top is the exact heuristic h(n) = max(ha(n), hb(n))

What are the sensors vs actuators of a Human? Robot? Software?

Human: S: eyes, ears, skin, etc. A: hands, legs, vocal tract, etc. Robot: S: cameras, infrared ranger A: motors Software: S: keystrokes, files, network packets A: screen display, write files, sending network packets

Conditional Independence

If P[A | B] = P[A] or P[B | A] = P[B] X is conditionally independent of Y | Z iff for all x, y ,z: P(x, y|z) = P(x|z)P(y|z)

What is Q learning?

Learn the Q-values directly, using an exponential-moving-average style of update; converges to the optimal policy even if acting suboptimally (off-policy learning)
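
A minimal tabular Q-learning update sketch in Python; the transition sample (s, a, r, s_next), the action set, and the constants alpha/gamma are assumptions made only for illustration:

```python
from collections import defaultdict

# One tabular Q-learning update from a single observed transition (s, a, r, s_next):
# Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * [r + gamma * max_a' Q(s_next, a')]
def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    sample = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * sample
    return Q

Q = defaultdict(float)                       # unseen Q-values default to 0
q_update(Q, s='A', a='right', r=1.0, s_next='B', actions=['left', 'right'])
print(Q[('A', 'right')])                     # 0.5: halfway toward the sample 1.0
```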

Describe exponential moving average

Makes more recent samples more important and forgets the past (distant-past values were wrong anyway) A decreasing learning rate (alpha) can give converging averages x̄_n = (x_n + (1-alpha)*x_{n-1} + (1-alpha)^2*x_{n-2} + ...) / (1 + (1-alpha) + (1-alpha)^2 + ...) Useful because the most recent values are more important A plain running average requires us to store a lot of data (we need the last x values) and is also sensitive to outliers/anomalies With this, you only need the current value (useful in RL because there may be millions of states) Alpha = constant
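
A small Python sketch of the incremental form of this running average (it keeps only the current estimate instead of storing past samples); the sample values are made up:

```python
# Exponential moving average: keep only the current estimate; each new sample
# moves it by a fraction alpha (recent samples count more, old ones decay).
def update_running_average(current, sample, alpha=0.1):
    return (1 - alpha) * current + alpha * sample

avg = 0.0
for x in [10, 10, 10, 50]:       # the late outlier 50 only partly shifts the average
    avg = update_running_average(avg, x)
print(round(avg, 3))             # 7.439
```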

What are the variables, domain, and constraints of the Australia map coloring problem? N-Queens?

Map: V = territories D = colors C = implicit - color of one territory cannot equal color of an adjacent territory explicit - (X, Y) has elements {(red, green), (red, blue)...} N-Queens: V = Square (Xij) D = Filled or Unfilled (0/1) C: cannot be same column, row, or diagonal as another queen

Markov Model vs Bayesian Network vs. Markov Network

Markov Model: - one variable - too simple, not very useful - HMM = one variable whose states are hidden -- medium complexity, wide applicability Bayesian Network: - many variables, DAG - medium complexity, wide applicability Markov Network: - many variables, undirected - aka Markov Random Field - too complex, useful in only specific scenarios

If the agent can handle partial observability and knows its past, which type is it?

Model-Based

Will an inadmissible heuristic change completeness?

No

Can you prune with expectimax?

No, the full tree needs to be explored since the probability of each child is used

Does money behave as a utility function?

No, but we can talk about the utility of having money (or being in debt)

What effect does multi-agents have on utilities?

Nothing really changes

Time complexity of value iteration

O(S^2 * A) -goes to nS^2*A for n = # of times for each iteration

Time and space of bidirectional search

O(b^(d/2))

memory required for minimax with alpha-beta pruning

O(b*m), where m = maximum depth; time = O(b^(m/2)) with perfect ordering

Time of tree-structured CSP

O(n*d^2), better than the O(d^n) of general CSPs -d^n because the domain of each variable needs to be checked? explanation: any tree with n nodes has n-1 arcs, so it can be made arc consistent in O(n) steps, each of which must compare up to d possible domain values for two variables, for a total time of O(n*d^2) -the two compared variables are parent/current

what is the run time for efficient tree-structured CSP

O(n*d^2) n = # variables d = max size of any variable's domain

runtime for the algorithm for tree-structured CSPs

O(n*d^2) n = number of variables d = max size of any variable's domain

Objectivist vs subjectivist probabilities

Objectivist = frequentist answer -averages over repeated experiments -assertion about how future experiments will go -new evidence changes the reference class -makes one think of inherently random events Subjectivist = bayesian answer -degrees of belief about unobserved variables -often learn probabilities from past experiences -new evidence updates beliefs

Define optimal vs complete

Optimal: will find best solution/least cost path Complete: will find a solution if one exists

What are the axioms of rationality? (5)

Orderability (Trichotomy): -given 2 outcomes, they must be orderable (there is an order of preferences): (A>B) v (B>A) v (A~B) Transitivity: - be able to chain choices: (A>B) ^ (B>C) -> (A>C) Continuity: A>B>C -> there exists a p such that [p, A; 1-p, C] ~ B Substitutability: A~B -> [p, A; 1-p, C] ~ [p, B; 1-p, C] Monotonicity: -if you prefer A over B, you should prefer the lottery that gives A with higher probability: A>B -> (p >= q <-> [p, A; 1-p, B] >= [q, A; 1-q, B])

Find joint distributions

P(A, B) or P(A ^ B) or P(A and B) the probability of event A given event B multiplied by the probability of event B P(A and B) = P(A given B) * P(B)

Antonio has an exciting soccer game coming up. In recent years, it has rained only 5 days each year in the city where they live. The weather-person has predicted rain for that day. When it actually rains, she correctly forecasts rain 90% of the time, when it doesn't rain, she incorrectly forecasts rain 10% of the time. What is the probability that it will rain on the day of Antonio's soccer game

P(A1) = 5/365 ≈ 0.0137 (it rains 5 days out of the year)
P(A2) = 360/365 ≈ 0.986 (it doesn't rain 360 days out of the year)
P(B|A1) = 0.9 (when it rains, the weather-person predicts rain)
P(B|A2) = 0.1 (when it does not rain, the weather-person predicts rain)
we want P(A1|B) = P(A1)P(B|A1) / (P(A1)P(B|A1) + P(A2)P(B|A2))
= (0.0137 * 0.9) / (0.0137 * 0.9 + 0.986 * 0.1)
≈ 0.111
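
The same Bayes' rule computation as a short Python check (numbers taken from the problem statement above):

```python
# Posterior P(rain | forecast of rain) via Bayes' rule.
p_rain = 5 / 365
p_dry = 360 / 365
p_forecast_given_rain = 0.9          # correct rain forecasts
p_forecast_given_dry = 0.1           # false rain forecasts

p_forecast = p_rain * p_forecast_given_rain + p_dry * p_forecast_given_dry
print(round(p_rain * p_forecast_given_rain / p_forecast, 3))   # 0.111
```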

Bayes Rule

P(A|B) = P(B|A)P(A)/P(B) describes the probability of an event, based on prior knowledge of conditions that might be related to the event

Joint distribution of an HMM

P(X1, E1, X2, E2, X3, E3) = P(X1) P(E1|X1) P(X2|X1) P(E2|X2) P(X3|X2) P(E3|X3)

Joint distribution of a markov model

P(X1, X2, X3, X4) = P(X1) P(X2|X1) P(X3|X2) P(X4|X3) more generally: P(X1, ..., XT) = P(X1) * product over t = 2..T of P(Xt | Xt-1)

Factor a joint distribution over two variables

P(x,y) = P(x|y) P(y) = P(y|x)P(x) dividing, we get: P(x|y) = (P(y|x)/P(y))* P(x) Important because: - allows us to build one conditional from its reverse

Mini-Forward Algorithm

P(x1) = known P(xt) = sum over xt-1 of P(xt-1, xt) = sum over xt-1 of P(xt | xt-1) P(xt-1)

The Chain Rule

P(x1, x2, x3...) = P(x1) P(x2 | x1) P(x3 | x1, x2) = incremental product of P(xi | x1, ..., xi-1)

The Product Rule

P(x,y) = P(y)P(x|y), equivalently P(x|y) = P(x,y) / P(y)

Task Environment Types (4) + auction example

PEAS / EPSA / CPOD P: performance measures E: environment A: actuators (actions) S: sensors (sensory options) used to explain the environment and have a good encapsulation of the task environment CPOD = context, purpose, observe, do Ex. Auction P: cost, quality, value, necessity of the item E: auctioneer, bidders, items to be bid on A: speakers, microphones, displaying items S: camera, price monitor, eyes, ears

For each of the following activities, give a PEAS description of the task environment and characterize it in terms of the properties listed - Playing soccer. - Playing a tennis match.

Playing soccer. P- Win/Lose E- Soccer field A- Legs, Head, Upper body S- Eyes, Ears. partially observable, multiagent, stochastic, sequential, dynamic, continuous, unknown Playing a tennis match. P- Win/Lose E- Tennis court A- Tennis racquet, Legs S- Eyes, Ears. partially observable, multiagent, stochastic, sequential, dynamic, continuous, unknown

Prior vs. posterior distribution

Prior: P(T) Posterior: P(T|s) <- calculated using Bayes' rule

What are the bellman equations?

Q* and V*, which are used for optimal utility: V*(s) = max over a of Q*(s,a) and Q*(s,a) = sum over s' of T(s,a,s')[R(s,a,s') + gamma*V*(s')] The Bellman equations characterize the optimal values, while value iteration computes them

Main Idea of Reinforcement Learning

Receive feedback in the form of rewards Agent's utility is defined by the reward function Must (learn to) act so as to maximize expected rewards All learning is based on observed samples of outcomes Similar to MDP, but we don't know T or R - must try actions and states to figure them out - MDP is offline while RL is online learning

Normalization Trick

Select the joint probabilities matching the evidence Normalize the selection (make it sum to one) -compute Z = sum over all selected entries -divide every entry by Z ex.
T W count    P
h s 20       20/50 = 0.4
h r  5        5/50 = 0.1
c s 10       10/50 = 0.2
c r 15       15/50 = 0.3
Z = 50

How are MDPs and RL different/similar?

Similar to MDP, but we don't know T or R - must try actions and states to figure them out - MDP is offline while RL is online learning - MDP is model-based while RL is model-free They both are looking for policy (pi)

If the agent only knows the current state, which type is it?

Simple Reflex

What are formalizations for adversarial search?

States: S (start at s0) Players (1, 2+) Actions (may depend on player/state) Transition Function (S x A -> S) -used to be called the successor function Terminal Test (S -> {t, f}) Terminal Utilities (S x P -> R) Policy (S -> A) -a policy = the solution for a player

Lookup Table consists of...

Sum from t = 1 to T of |P|^t, where P is the set of possible percepts and T is the lifetime of the agent (the total number of percepts it will receive)

3 Forms of Learning

Supervised: given training, learn the pattern Unsupervised: No training set, just learn/observe Reinforcement Learning: No training set, but can make actions and receive feedback. Feedback is similar to a mini training episode with costs and reward

Consider the n-queens problem using the "efficient" incremental formulation (each step adds a queen). Explain why the state space has at least ^3√n! states and estimate the largest n for which exhaustive exploration is feasible. (Hint: Derive a lower bound on the branching factor by considering the maximum number of squares that a queen can attack in any column.)

The first queen can be placed in any square in column 1 (n choices). The second queen can be placed in any square in column 2 except the square in the same row as the first queen and the two other squares that can still be attacked from the first column's choice (n-3 choices), etc. -the next would be (n-6)... thus the state space size: S >= n*(n-3)*(n-6)*... This leads us to: S^3 >= n*n*n*(n-3)*(n-3)*(n-3)*... >= n*(n-1)*(n-2)*(n-3)*... so S >= ^3√(n!)

How do you get O(S^2*A)

This is the complexity of EACH iteration for value iteration from V(s) -has to update S times to complete entire array V (S) -to compute one value, need to multiply A times S (A*S) ---from max(of a) sum (of s') in value iteration eq ---in total it would be nS^2*A where n = number iteration "max" and summation IS A LOOP don't be fooled

Difference of UCS and A* contours

UCS expands equally in all directions A* expands mainly towards the goal but does hedge its bets to ensure optimality

UCS vs. Greedy

UCS: -orders by path cost (backward cost, g(n)) -doesn't take direction into account Greedy: -orders by goal proximity (forward cost, h(n)) -doesn't take past covered distance into account

What are varieties of constraints?

Unary - single variable -ex. SA =/= green Binary - pairs of variables -ex. SA =/= WA Higher-order - 3+ variables -ex. cryptarithmetic (two + two = four example)

If an agent has additional measures for achieving a goal, which type is it?

Utility (goal = get to A from B, utility = get to A from B in less than 5 steps)

Under what conditions can an MDP be solved using standard state space search techniques (DFS, BFS, etc.)(4)?

When (i) all transitions are deterministic, (ii) all rewards are non-positive (corresponding to non-negative step costs), (iii) there is no discount, and (iv) there is at least one terminal state (the goal state)

Utility reasoning for worst-case vs. average-case

Worst-case minimax: -terminal function scale doesn't matter --it is insensitive to monotonic transformation --just want to get the order right Average-case expectimax: - we need magnitudes to be meaningful --monotonic transformations matter

Inference by Enumeration

Worst-case time and space are O(d^n); space is the big issue (it will crash)

Is the max of two admissible heuristics also admissible? Why or why not? What about the min of one admissible heuristic and anything else?

Yes: both are admissible (each is <= the true cost), so their max is also <= the true cost and either value works. Yes: the min of an admissible heuristic and anything else is guaranteed to be at most the admissible value, and is therefore still admissible.

What does informed search add to uninformed?

a concept of direction -heuristics (uninformed search is a structured search that adds protection against looping but still not intelligent)

What is "goal" in pseudocode persistent

a description of what the agent would like to achieve

Define heuristics

a function that estimates how close a state is to a goal -designed for a particular search problem should always be an underestimate (a lower bound); the lower bound can be impossible to actually achieve, it just needs to be no larger than the true cost ex. Manhattan distance (N/S/E/W movement), Euclidean distance (radial)

What is the principle of maximum expected utility?

a rational agent should choose the action that maximizes its expected utility, given its knowledge

What is "rules" in pseudocode persistent?

a set of condition-action rules rule ← RULE-MATCH (state, rules, goal)

Bayes' Nets

a technique for describing complex joint distributions (models) using simple, local distributions (conditional probabilities) - aka graphical models describing how variables interact locally = topology (graph) + local conditional probabilities

Turing Test

a test for intelligence in a computer, requiring that a human being should be unable to distinguish the machine from another human being by using the replies to questions put to both.

independent variable

a variable (often denoted by x ) whose variation does not depend on that of another. P(X, Y) = P(X) P(Y)

Consider a modified version of the vacuum environment in Exercise 2.7, in which the geography of the environment - its extent, boundaries, and obstacles - is unknown, as is the initial dirt configuration. (The agent can go Up and Down as well as Left and Right.) a. Can a simple reflex agent be perfectly rational for this environment? Explain. b. Can a simple reflex agent with a randomized agent function outperform a simple reflex agent? Design such an agent and measure its performance on several environments. c. Can you design an environment in which your randomized agent will perform very poorly? Show your results. d. Can a reflex agent with state outperform a simple reflex agent? Design such an agent and measure its performance on several environments. Can you design a rational agent of this type?

a. No. A simple reflex agent does not maintain a model of the geography and only perceives its location and local dirt. When it tries to move to a location that is blocked by a wall, it will get stuck forever. b. One possible design is as follows: if (Dirty), Suck; else randomly choose a direction to move. This simple agent works well in normal, compact environments, but needs a long time to cover all squares if the environment contains long connecting passages, such as the one in c. c. The above randomized agent will perform poorly in environments like the following one. It will need a lot of time to get through the long passage because of the random walk. d. A reflex agent with state can first explore the environment thoroughly and build a map of it. This agent can do much better than the simple reflex agent because it maintains the map of the environment and can choose actions based not only on the current percept, but also on its current location in the map.

The vacuum environments in the preceding exercises have all been deterministic. Discuss possible agent programs for each of the following stochastic versions: a. Murphy's law: twenty-five percent of the time, the Suck action fails to clean the floor if it is dirty and deposits dirt onto the floor if the floor is clean. How is your agent program affected if the dirt sensor gives the wrong answer 10% of the time? b. Small children: At each time step, each clean square has a 10% chance of becoming dirty. Can you come up with a rational agent design for this case?

a. The failure of the Suck action doesn't cause any problem at all as long as we replace the reflex agent's 'Suck' action by 'Suck until clean'. If the dirt sensor gives wrong answers from time to time, the agent might just leave the dirty location and maybe clean it when it tours back to this location again, or might stay at the location for several steps to get a more reliable measurement before leaving. Both strategies have their own advantages and disadvantages. The first one might leave a dirty location and never return. The latter might wait too long in each location. b. In this case, the agent must keep touring and cleaning the environment forever because a cleaned square might become dirty in the near future.

two ways to define utilities

additive: simply x + y + z discounted: x + gamma*y + gamma^2*z

Define admissibility

admissible (optimistic) -heuristics slow down bad plans but never outweigh true costs inadmissible (pessimistic) -heuristics break optimality by trapping good plans on the fringe -inadmissible heuristics can still be useful (can find solutions faster by expanding fewer nodes but not necessarily optimal) a heuristic is admissible if: 0 <= h(n) <= h*(n) where h*(n) is the true cost to nearest goal

Describe the concept of a rational agent

agent = an entity that PERCEIVES the environment through SENSORS and ACTS upon the environment through ACTUATORS -rational agent = something that acts to achieve BEST EXPECTED OUTCOME - = architecture + program

How do agents, percepts, environment, and action space interact?

agent = an entity that PERCEIVES the environment through SENSORS and ACTS upon the environment through ACTUATORS -something that acts -rational agent = something that acts to achieve best expected outcome - = architecture + program percepts, environment, and action space dictate techniques for selecting rational actions percept = agents perceptual input at any given moment -percept sequence = complete history of everything an agent has perceived

Describe pruning

aka alpha-beta pruning
-has no effect on the minimax value computed for the root
-with "perfect ordering", time complexity drops to O(b^(m/2)), which doubles the solvable depth
-if a value is less/greater than the current min/max, the rest doesn't need to be searched -provides cost savings
alpha = MAX's best option on the path to the root
beta = MIN's best option on the path to the root
def max-value(state, alpha, beta):
  initialize v = -inf
  for each successor of state:
    v = max(v, value(successor, alpha, beta))
    if v >= beta, return v
    alpha = max(alpha, v)
  return v
def min-value(state, alpha, beta):
  initialize v = +inf
  for each successor of state:
    v = min(v, value(successor, alpha, beta))
    if v <= alpha, return v
    beta = min(beta, v)
  return v
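
A minimal Python sketch of alpha-beta on the same nested-list tree format as the minimax example earlier in this deck (leaves are utilities, inner nodes are lists of children):

```python
# Alpha-beta pruning: same value as plain minimax at the root, fewer leaves visited.
def alphabeta(node, maximizing=True, alpha=float('-inf'), beta=float('inf')):
    if not isinstance(node, list):             # terminal: known utility
        return node
    if maximizing:
        v = float('-inf')
        for child in node:
            v = max(v, alphabeta(child, False, alpha, beta))
            if v >= beta:                      # MIN above will never let us get here
                return v
            alpha = max(alpha, v)
        return v
    v = float('inf')
    for child in node:
        v = min(v, alphabeta(child, True, alpha, beta))
        if v <= alpha:                         # MAX above will never let us get here
            return v
        beta = min(beta, v)
    return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree))         # 3 (the 4 and 6 in the middle branch get pruned)
```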

Proof of implied conditional independencies

aka past variables are independent of future variables given the present (i.e., if t1 < t2 < t3 or t1 > t2 > t3 then Xt1 is conditionally independent of Xt3 given Xt2) additional explicit assumption = P(Xt | Xt-1) is the same for all t t is like time - ex. the day before and the current day

How do you solve a linear matrix equation (multiply a vector by a matrix)?

aka solving systems of equations

What is heuristic dominance?

all values of h2 must be greater than or equal to the corresponding values of h1 ha dominates hc if for all n: ha(n) >= hc(n) if they are both admissible, the higher value is better (will do less work)

g(n) is

always the exact cost of the only path to n

What is a utility?

an agent's preference functions from outcomes (states of the world) to real numbers that describe an agent's preferences -they summarize an agent's goal -any "rational" preference can be summarized as a utility function L = [p, A; (1-p),B] preference: A > B (not actual > but fancy >) indifference: A ~ B

What is policy iteration?

an alternate approach for optimal values
step 1: policy evaluation -- calculate utilities for the fixed (not optimal) policy until convergence:
  V^pi_k+1(s) = sum over s' of T(s, pi(s), s') [R(s, pi(s), s') + gamma * V^pi_k(s')]
step 2: policy improvement -- update the policy using a one-step look-ahead with the resulting converged (but not optimal) future values:
  pi_i+1(s) = argmax over a of sum over s' of T(s, a, s') [R(s, a, s') + gamma * V^pi_i(s')]
step 3: repeat
it will be optimal and will converge faster than value iteration in some conditions
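
A small, self-contained Python sketch of policy iteration on a made-up 2-state MDP (the states, actions, transition table T, and discount are all hypothetical, chosen only to make the loop runnable):

```python
# Policy iteration sketch. T[(s, a)] lists (probability, next_state, reward) triples.
GAMMA = 0.9
STATES = ['A', 'B']
ACTIONS = ['stay', 'go']
T = {
    ('A', 'stay'): [(1.0, 'A', 0.0)],
    ('A', 'go'):   [(0.8, 'B', 1.0), (0.2, 'A', 0.0)],
    ('B', 'stay'): [(1.0, 'B', 0.5)],
    ('B', 'go'):   [(1.0, 'A', 0.0)],
}

def q_value(s, a, V):
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in T[(s, a)])

def policy_iteration():
    policy = {s: 'stay' for s in STATES}
    V = {s: 0.0 for s in STATES}
    while True:
        # 1. policy evaluation: repeated updates under the fixed policy
        for _ in range(50):
            V = {s: q_value(s, policy[s], V) for s in STATES}
        # 2. policy improvement: one-step look-ahead with the converged values
        new_policy = {s: max(ACTIONS, key=lambda a: q_value(s, a, V)) for s in STATES}
        if new_policy == policy:       # 3. stop once the policy is stable
            return policy, V
        policy = new_policy

print(policy_iteration())              # e.g. ({'A': 'go', 'B': 'stay'}, {...})
```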

Describe the concept of an agent

an entity that PERCEIVES the environment through SENSORS and ACTS upon the environment through ACTUATORS -something that acts

Enforcing arc consistency of a CSP

arc consistency = for an arc X -> Y, for every x in the tail there is some y in the head which could be assigned without violating a constraint if X loses a value, the neighbors of X need to be rechecked gives early detection of failure (earlier than forward checking because it checks farther than simply the next nodes) -can be run as a preprocessor or after each assignment **remember: delete from the tail Even after enforcing arc consistency, there can be 1 solution, multiple solutions, or no solution (and you don't know which) -it still runs inside backtracking

Performance measure design

better to design according to what one actually wants in the environment rather than according to how one thinks the agent should behave

What is cost sensitive search?

chooses the least cost path -BFS does shortest path in terms of action -CSS does in terms of cost -refer to UCS

What does doubly stochastic transition matrix mean?

both the rows and the columns sum to one

What is A* search?

combination of UCS and greedy (backward & forward cost) f(n) = g(n) + h(n) g(n) = the exact, cumulative cost of the path taken from the start to n (the sum of each step's cost, e.g. c(start,x) + c(x,y) + c(y,n)) h(n) = the estimated forward cost to the goal; h must be admissible (or consistent, for graph search) there is a tradeoff between the quality of the estimate and the work per node
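
A minimal A* sketch in Python on a made-up graph; the graph, the heuristic table h, and the costs are assumptions for illustration (h is chosen to be admissible):

```python
import heapq

# A*: priority = g (backward cost so far) + h (estimated forward cost).
# graph[s] lists (step_cost, neighbor); h maps a state to its heuristic value.
def a_star(start, goal, graph, h):
    fringe = [(h(start), 0, start, [start])]        # (f, g, state, path)
    best_g = {}
    while fringe:
        f, g, state, path = heapq.heappop(fringe)
        if state == goal:
            return g, path
        if state in best_g and best_g[state] <= g:
            continue                                # already reached more cheaply
        best_g[state] = g
        for cost, nxt in graph.get(state, []):
            heapq.heappush(fringe, (g + cost + h(nxt), g + cost, nxt, path + [nxt]))
    return None

graph = {'S': [(1, 'A'), (4, 'B')], 'A': [(2, 'G')], 'B': [(1, 'G')]}
h = {'S': 2, 'A': 2, 'B': 1, 'G': 0}.get            # admissible estimates
print(a_star('S', 'G', graph, h))                   # (3, ['S', 'A', 'G'])
```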

What is iterative deepening?

combines DFS space with BFS time ex. run a DFS with depth limit 1, if no solution... run a DFS with depth limit 2, if no solution... etc. Time: O(b^d) Space: O(bd) is it wastefully redundant? -most work generally happens in the lowest level searched, so it's not so bad -when the state space is large and the depth is unknown, it is preferred

Describe value iteration

computes optimal values start with initial V_0(s) = 0 -no time steps left means an expected reward sum of zero then repeat the Bellman update until convergence: V_k+1(s) = max over a of sum over s' of T(s,a,s')[R(s,a,s') + gamma * V_k(s')] -takes a while and a lot of recalculating -THE POLICY MAY CONVERGE LONG BEFORE THE VALUES DO
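
A small, self-contained value iteration sketch in Python, reusing the same hypothetical 2-state MDP format as the policy iteration sketch above (T maps (state, action) to (probability, next_state, reward) triples):

```python
# Value iteration sketch on a tiny, made-up MDP.
GAMMA = 0.9
STATES = ['A', 'B']
ACTIONS = ['stay', 'go']
T = {
    ('A', 'stay'): [(1.0, 'A', 0.0)],
    ('A', 'go'):   [(0.8, 'B', 1.0), (0.2, 'A', 0.0)],
    ('B', 'stay'): [(1.0, 'B', 0.5)],
    ('B', 'go'):   [(1.0, 'A', 0.0)],
}

def value_iteration(iterations=100):
    V = {s: 0.0 for s in STATES}                    # V_0(s) = 0: no time steps left
    for _ in range(iterations):
        V = {s: max(sum(p * (r + GAMMA * V[s2]) for p, s2, r in T[(s, a)])
                    for a in ACTIONS)
             for s in STATES}                       # Bellman update for every state
    return V

print(value_iteration())    # converges toward roughly V*(A) ~ 5.4, V*(B) ~ 5.0
```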

Define metareasoning

computing about what to compute

What is a CSP?

constraint satisfaction problem - a special subset of search problems State: - defined by variables Xi with values from a domain D (sometimes depends on i) --can be finite (colors, integers, etc.) where size d means O(d^n) complete assignments --infinite, but countable (natural numbers) --infinite and uncountable (real numbers) there is an initial state (empty assignment) Successor function (to assign a value to an unassigned variable) Goal test (current assignment is complete and satisfies all constraints) Goal Test: - a set of constraints specifying allowable combinations of values for subsets of variables --aka why it is called CSP PLANNING: Path to the goal is the important part - they have various costs and depths - use heuristics for problem-specific guidance IDENTIFICATION: the goal itself is important, not the path - CSPs are specialized for identification problems

Off-policy learning

converges to optimal policy even if acting suboptimally

How to find a cutset?

cutset conditioning: instantiate (in all ways) a set of variables such that the remaining constraint graph is a tree -ex. mainland Aus = cutset with SA as main variable --instantiate SA as red, green, and blue in separate branches --solve cutset of size c gives runtime: O((d^c)(n-c)*d^2) -have to try each of the d^c combinations of values for the variables in set -for each combination, we must solve a tree problem of size n-c

Model-Free learning

don't learn T and R but learn Q and V values directly ex. Passive RL Active RL Q-Learning

Describe expectiminimax

environment is an extra "random agent" player that moves are each min/max agent each node computes the appropriate combination of its children

What is Passive RL

evaluating a policy V/Q values for given policy Goal: learn state values. basically policy evaluation

What is Active RL

Goal: learn the optimal policy / optimal values Full RL: optimal values, like value iteration - but you don't know T or R - and you are choosing actions now Tradeoff: exploration vs. exploitation

What is a zero-sum game?

ex. TicTacToe agents have opposite utilities (values on outcomes) -we think of a single value that one maximizes while the other minimizes -pure competition Utility(A) + U(B) = 0 -in other words: U(A) = -U(B)

how to find stationary distribution?

example (rows = current state, columns = next state):
    1    2    3
1  .5   .4   .1
2  .2   .5   .3
3  .1   .3   .6
solve (x y z) * matrix = (x y z):
.5x + .2y + .1z = x
.4x + .5y + .3z = y
.1x + .3y + .6z = z
since these are linearly dependent, replace one equation with x + y + z = 1 (it is also a probability vector)
= (11/47, 19/47, 17/47)
if doubly stochastic:
    1    2    3
1  .5   .4   .1
2  .3   .4   .3
3  .2   .2   .6
= (1/n, 1/n, 1/n) = (1/3, 1/3, 1/3)
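The same stationary-distribution computation can be checked numerically; a short sketch using numpy (assuming numpy is available), solving pi P = pi together with the normalization constraint:

```python
import numpy as np

# Stationary distribution of the first (row-stochastic) matrix above:
# solve pi P = pi, i.e. (P^T - I) pi = 0, together with sum(pi) = 1.
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])

A = np.vstack([P.T - np.eye(3), np.ones(3)])   # 3 balance equations + normalization
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)                          # ~[0.234, 0.404, 0.362] = (11/47, 19/47, 17/47)
```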

Solutions to infinite utilities

finite horizon (similar to DLS) -terminate episodes after a fixed T steps -gives non-stationary policies use a gamma that is between 0 < gamma < 1 -smaller gamma means shorter horizon (short term focus) absorbing state: guaranteed that for every policy, a terminal state will eventually be reached

Running forward checking vs enforcing arc consistency?

forward checking will only check variables with constraints connected to the current variable, but arc consistency will check all other variables in the children's path

What is the difference in search algorithms?

fringe strategies! queue vs stack vs priority queue -technically priority queue can be used for all, but queue and stack are used for time savings (to avoid the log(n) overhead from priority queue)

Pseudocode for AC3

function AC-3(csp) returns the CSP, possibly with reduced domains
  inputs: csp, a binary CSP with variables {X1, X2, ...}
  local variables: queue, a queue of arcs, initially all the arcs in csp
  while queue is not empty do
    (Xi, Xj) <- Remove-First(queue)
    if Remove-Inconsistent-Values(Xi, Xj) then
      for each Xk in Neighbors[Xi] do
        add (Xk, Xi) to queue
end
function Remove-Inconsistent-Values(Xi, Xj) returns true iff it succeeds
  removed <- false
  for each x in Domain[Xi] do
    if no value y in Domain[Xj] allows (x, y) to satisfy the constraint Xi <-> Xj
      then delete x from Domain[Xi]
           removed <- true
  return removed
runtime: O(n^2*d^3), but it can be reduced to O(n^2*d^2)
--d^2 because you never put anything back in the tree, so instead of d^3, d^2
-number of arcs in a tree = n-1
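
A compact Python sketch of AC-3, assuming domains are sets, neighbors gives each variable's constraint neighbors, and allowed(x, y) says whether a pair of values satisfies the (binary) constraint; the map-coloring example at the bottom is made up:

```python
from collections import deque

def ac3(domains, neighbors, allowed):
    """AC-3 sketch: prune domain values from the tail of each arc (Xi, Xj)."""
    queue = deque((xi, xj) for xi in domains for xj in neighbors[xi])
    while queue:
        xi, xj = queue.popleft()
        removed = False
        for x in set(domains[xi]):                        # delete from the tail
            if not any(allowed(x, y) for y in domains[xj]):
                domains[xi].discard(x)
                removed = True
        if removed:
            if not domains[xi]:
                return False                              # a domain was wiped out: failure
            for xk in neighbors[xi]:
                if xk != xj:
                    queue.append((xk, xi))                # recheck Xi's other neighbors
    return True

# Tiny map-coloring example: three mutually adjacent regions, WA pre-assigned red.
domains = {'WA': {'red'}, 'NT': {'red', 'green', 'blue'}, 'SA': {'red', 'green', 'blue'}}
neighbors = {'WA': ['NT', 'SA'], 'NT': ['WA', 'SA'], 'SA': ['WA', 'NT']}
print(ac3(domains, neighbors, lambda a, b: a != b), domains)
```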

Write pseudocode for a model-based agent

function MBAgent(percept) returns an action
  persistent:
    state, the agent's current conception of the world state
    model, a description of how the next state depends on the current state & action
    rules, a set of condition-action rules
    action, the most recent action, initially none
  state ← Update-State(state, action, percept, model)
  rule ← Rule-Match(state, rules)
  action ← rule.ACTION
  return action
update-state: responsible for creating the new internal state description

Write pseudocode for a simple reflex agent

function SRAgent(percept) returns an action
  persistent:
    rules, a set of condition-action rules
  state ← Interpret-Input(percept)
  rule ← Rule-Match(state, rules)
  action ← rule.ACTION
  return action
interpret-input: generates an abstracted description of the current state from the input
rule-match: returns the first rule in the set of rules that matches the given state description

How to compute value of states (MDP)

fundamental operation: compute the (expectimax) value of a state value of a state s: V*(s) = max over a of Q*(s,a) -expected utility starting in s and acting optimally value of a q-state (s,a): Q*(s,a) = sum over s' of T(s,a,s')[R(s,a,s') + gamma*V*(s')] -expected utility starting out having taken action a from state s and (thereafter) acting optimally -a summation of transition probabilities * (reward + discounted future value)

Describe Hill Climbing

general idea: -start wherever and move to the best neighboring state -if no neighbors are better, quit Optimal: no Complete: no the algorithm can get stuck on local hills (local maxima) or plateaus (shoulders / flat local maxima) and not find the global max advantages: -quick -useful when you know when to use it (ex. travelling salesman) use simulated annealing to escape local maxima

What is policy extraction?

get policy implied by the values -actions are easier to select from q-values than values

The Forward Algorithm

help with problem 1: calculate the probability that the given model explains the given evidence sequence we are given evidence, but want to know the "belief state" B(X) = P(X | e)

Most Likely Explanation (MLE)

help with problem 2: calculate the most likely explanation state given the evidence sequence

What is the relevance of utilities of sequences?

helps determine whether the agent should have a certain preference over rewards (now vs later, etc.)

Describe Convergence

how do we know the value vectors are going to converge? 1. if the tree has max depth M, then V_M holds the actual untruncated values 2. if the discount is less than 1: -for any state, V_k and V_k+1 can be viewed as depth-(k+1) expectimax results in nearly identical search trees -the difference is that on the bottom layer, V_k+1 has actual rewards instead of zeros -the last layer is at best all Rmax, at worst all Rmin -but everything is discounted by gamma^k that far out -so V_k and V_k+1 are at most gamma^k * max|R| different -so AS K INCREASES THE VALUES CONVERGE: gamma^k -> 0 as k -> infinity, so it will converge

Policy evaluation

how to calculate V for a fixed policy pi? 1. turn the Bellman equations into updates: V^pi_0(s) = 0 V^pi_k+1(s) = sum over s' of T(s, pi(s), s')[R(s, pi(s), s') + gamma * V^pi_k(s')] -Efficiency = O(S^2) per iteration --because we need to compute the update for all S states, and each update needs a summation over all S 2. without the maxes, the Bellman equations are just a linear system -solvable without iteration

Describe Tree Decomposition

idea: create a tree-structured graph of mega-variables -each mega-variable encodes part of the original CSP --subproblems overlap to ensure consistent solutions ex. for the Australia map: WA, NT, SA is V1; NT, SA, Q = V2; Q, SA, NSW = V3; etc. -as you can see, variables within the mega-variables overlap each subproblem/mega-variable is solved independently, then they are put together -works well if no subproblem is too large Can be solved in O(n*d^(w+1)) where w = width, d = domain size, n = number of variables -solvable in polynomial time

What is simulated annealing?

idea: escape local maxima by allowing downhill moves -but make them rarer as time goes on -recorded as temperature (controls probability of taking downward steps)

explain (s, a, s')

in state s, take action a to state s'

Describe incremental formulation vs complete-state formulation

incremental - start with an empty state, add one step at a time complete - start with the final number of objects (ex. all 8 queens on the board, possibly not yet satisfying the constraints) and then move them around

Define diameter

max number of steps needed to reach goal from any node

V*(s)=

max of actions Q*(s,a) max of actions is basically determining the policy that is why in Vpi(s) the max is unnecessary

"rational"

maximizing expected utility/maximally achieving pre-defined goals -relative to a performance measure goals = utility of outcome (only concerns what decisions are made NOT the thought process behind)

If an environment has n locations, how many states does it have?

n*(2^n)

Do deterministic actions require convergence?

no, only stochastic where all children need to be used

Examples of utility scales: normalized, micromorts, QALYs

normalized: 0 - 1 micromort: one-millionth chance of death QALY: quality-adjusted life years

How to calculate branching factor?

number of actions ^ number of agents NSEW with 2 agents = 4^2

What is a tree width for tree decomposition?

one less than the size of the largest subproblem

Dangers of optimism and pessimism in adversarial worlds?

optimism: assuming chance (randomness) when the world is actually adversarial - may not reach the optimal outcome pessimism: assuming the worst when it is not likely - will take more time

How can we improve backtracking/CSPs (3)

ordering: -which variable should be assigned next -what order should values be tried filtering -can we detect failure early? -keeps track of domains for unassigned variables and crosses off bad options structure -can we exploit the problem structure

What is the fringe of a search tree?

the set of partial plans under consideration (nodes generated but not yet expanded)
-in DFS, it is kept as a LIFO stack
-in BFS, it is kept as a FIFO queue
-in UCS, it is kept as a priority queue ordered by cumulative cost
(see the sketch below)
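A small sketch of the three fringe data structures in Python; the node contents and the cost value are placeholders.

    from collections import deque
    import heapq

    # DFS: LIFO stack (push and pop from the same end)
    dfs_fringe = []
    dfs_fringe.append("node")
    node = dfs_fringe.pop()

    # BFS: FIFO queue (append to one end, pop from the other)
    bfs_fringe = deque()
    bfs_fringe.append("node")
    node = bfs_fringe.popleft()

    # UCS: priority queue ordered by cumulative path cost g(n)
    ucs_fringe = []
    heapq.heappush(ucs_fringe, (4.0, "node"))   # (cumulative cost, node)
    cost, node = heapq.heappop(ucs_fringe)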

Policy for MDP

pi* is the optimal policy
-a policy pi gives an action for every state (think of the arrow grid world), so it covers the entire world rather than a single start-to-end path
-the optimal policy depends on the reward function R(s, a, s') (e.g. when the living reward is x at each time step)
-given pi*, the agent determines which state s it is in and takes the action pi*(s)

How is the expected value calculated?

P(option 1) * value(option 1) + P(option 2) * value(option 2) + ... + P(option n) * value(option n)
-the probabilities should sum to 1; the number of options is defined by the problem
-remember: the expected value is NOT a plain average of the outcomes, it is the probability-weighted expectation (see the sketch below)
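A tiny sketch with an invented two-outcome gamble:

    def expected_value(outcomes):
        """outcomes: list of (probability, value) pairs whose probabilities sum to 1."""
        assert abs(sum(p for p, _ in outcomes) - 1.0) < 1e-9
        return sum(p * v for p, v in outcomes)

    # e.g. a gamble paying 10 with probability 0.25 and 2 with probability 0.75:
    print(expected_value([(0.25, 10), (0.75, 2)]))   # 0.25*10 + 0.75*2 = 4.0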

Define abstraction

process of removing detail from a representation -valid: can expand abstraction into a more detailed world -useful: if solution is easier than original problem

What is the basic nature of adversarial search?

recursion where we have to assume the opponent will also try to make the best move
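A minimal minimax sketch over a toy game tree; the nested-list tree and its leaf utilities are invented for illustration.

    # Leaves are utilities, internal nodes are lists of children.
    def minimax(node, maximizing=True):
        if not isinstance(node, list):        # leaf: terminal utility
            return node
        # assume the opponent also plays optimally: alternate max and min
        values = [minimax(child, not maximizing) for child in node]
        return max(values) if maximizing else min(values)

    # root is a max node; its two min-node children see the leaves below them
    tree = [[3, 12, 8], [2, 4, 6]]
    print(minimax(tree))   # min of first child = 3, min of second = 2, so max = 3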

What is a solution to resource limits for adversarial search?

resource limits cap how deep we can search
-depth-limited search is one solution; iterative deepening is also an option
-there is a tradeoff between the complexity of the evaluation features and the complexity of computation

What is the efficiency of minimax?

same as DFS Time: O(b^m) Space: O(bm)

Stationary Distributions

a probability distribution that remains unchanged as the Markov chain progresses
satisfies: P_inf(X) = P_inf+1(X) = sum over x of P(X | x) * P_inf(x)
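A minimal sketch that finds the stationary distribution of an invented 2-state chain by repeatedly applying the transition matrix (power iteration); the matrix values are made up.

    import numpy as np

    # P[i, j] = probability of moving from state i to state j (toy chain)
    P = np.array([[0.9, 0.1],
                  [0.3, 0.7]])

    p = np.array([1.0, 0.0])          # any starting distribution
    for _ in range(1000):
        p_next = p @ P                # p_{t+1}(x') = sum_x P(x'|x) * p_t(x)
        diff = np.max(np.abs(p_next - p))
        p = p_next
        if diff < 1e-12:
            break
    print(p)                          # approx [0.75, 0.25] for this chain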

What is an MDP defined by

-a set of states (where s is an element of S)
-a set of actions (where a is an element of A)
-a transition function T(s, a, s')
--the probability that a from s leads to s', i.e. P(s' | s, a)
--aka the model or dynamics
-a reward function R(s, a, s')
--sometimes just R(s) or R(s')
-a start state
-a terminal state

Backtracking for CSP

simply DFS that (1) considers one variable at a time and (2) checks constraints as you go
-variable ordering can be fixed, and values that conflict with previous assignments are skipped
-can be improved by ordering, filtering, and structure (see the sketch below)
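A minimal backtracking sketch on an invented 3-variable "all different" toy CSP; the variable ordering here is simply fixed left-to-right.

    variables = ['X', 'Y', 'Z']
    domains = {'X': [1, 2], 'Y': [1, 2, 3], 'Z': [1, 2, 3]}

    def consistent(var, value, assignment):
        # toy constraint: all assigned variables must take different values
        return all(value != v for v in assignment.values())

    def backtrack(assignment):
        if len(assignment) == len(variables):
            return assignment
        var = next(v for v in variables if v not in assignment)   # fixed ordering
        for value in domains[var]:
            if consistent(var, value, assignment):                # check as you go
                assignment[var] = value
                result = backtrack(assignment)
                if result is not None:
                    return result
                del assignment[var]                               # undo and backtrack
        return None

    print(backtrack({}))   # e.g. {'X': 1, 'Y': 2, 'Z': 3}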

What is the state space of the toy vacuum world?

states for the vacuum's position: 2 (left and right)
states for dirt: 2 per location (dirty or clean)
2 * 2^2 = 8 states
-technically (2*1) * 2^2: 2 positions in 1 row, then 2 cleanliness states for each location
-in general, with n locations: n * 2^n

Explain greedy search

strategy: expand the node that you think is closest to a goal state
-heuristic: an estimate of the distance to the nearest goal for each state
common case: the heuristic takes you fairly directly to the (possibly wrong) goal
worst case: a badly guided DFS that may not even find a solution

Marginal Distributions

sub-tables that eliminate variables
-marginalization = collapse rows by adding them up: P(T, W) -> P(T) and P(W), where P(T) and P(W) are the sub-tables
-remember to normalize so each sub-table sums to 1 (see the sketch below)
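A small sketch of marginalizing an invented joint table P(T, W) into P(T) and P(W); the probabilities are made up.

    from collections import defaultdict

    P_TW = {
        ('hot', 'sun'): 0.4, ('hot', 'rain'): 0.1,
        ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3,
    }

    P_T = defaultdict(float)
    P_W = defaultdict(float)
    for (t, w), p in P_TW.items():
        P_T[t] += p          # sum out W
        P_W[w] += p          # sum out T

    print(dict(P_T))         # {'hot': 0.5, 'cold': 0.5}
    print(dict(P_W))         # {'sun': 0.6, 'rain': 0.4} (up to float rounding)
    # each sub-table already sums to 1 here; renormalize if it does not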

Q*(s,a) =

Q*(s, a) = sum over s' of T(s, a, s') * [R(s, a, s') + gamma * V*(s')]
-V*(s') is the value of the destination state (which depends on the direction taken)
-T supplies the expectation weights (e.g. .8, .1, .1 for noisy movement)

What is "state" in pseudocode persistent

the agent's current conception of the world state state ← UPDATE-STATE (state, action, percept, goal)

What is the value of a state?

the best achievable outcome (utility) from that state
-terminal states: V(s) = known utility
-non-terminal states: V(s) = max over s' of V(s'), where s' ranges over the children of s
-need to go through all possibilities, so the depth explored can be very large
-if the value stays the same, that move is better than one that lowers it
-the value depends on the state; the new state depends on the action

How does one maintain a tree or graph?

graph search adds a closed list (set) that keeps track of states that have already been expanded, so they are not expanded again; tree search omits it

Describe the concept of utility

the means by which we evaluate an agent's performance in relation to a problem - uses performance measures - the happiness of the agent

What is "action" in pseudocode persistent?

the most recent action, initially none action ← rule.ACTION

Describe stationary preference

if [a1, a2, ...] is preferred to [b1, b2, ...], then prepending the same reward r to both sequences should not change the preference: [r, a1, a2, ...] is still preferred to [r, b1, b2, ...]
-this assumption forces utilities over sequences to be either additive or discounted

What is constraint propagation?

the process of communicating the domain reduction of a decision variable to all of the constraints stated over that variable
-this process can result in more domain reductions
-these domain reductions are, in turn, communicated to the appropriate constraints
-a method of inference that prunes variable domains so that the stated conditions (constraints) can still be satisfied
-basically: make sure all arcs are consistent (see the sketch below)
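A minimal arc-consistency (AC-3-style) sketch on an invented "X < Y" toy problem; the queue/revise structure follows the standard algorithm, but the constraint and domains are made up.

    from collections import deque

    domains = {'X': {1, 2, 3}, 'Y': {1, 2, 3}}
    constraints = {('X', 'Y'): lambda x, y: x < y,
                   ('Y', 'X'): lambda y, x: y > x}
    neighbors = {'X': ['Y'], 'Y': ['X']}

    def revise(xi, xj):
        """Remove values of xi that have no consistent value left in xj's domain."""
        removed = False
        for vi in set(domains[xi]):
            if not any(constraints[(xi, xj)](vi, vj) for vj in domains[xj]):
                domains[xi].discard(vi)
                removed = True
        return removed

    queue = deque(constraints.keys())
    while queue:
        xi, xj = queue.popleft()
        if revise(xi, xj):
            if not domains[xi]:
                break                                  # failure detected early
            for xk in neighbors[xi]:
                if xk != xj:
                    queue.append((xk, xi))

    print(domains)   # X loses 3, Y loses 1: {'X': {1, 2}, 'Y': {2, 3}}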

Conditional Probabilities

the relationship between joint and marginal probabilities P(W|T) = P(W, T) / P(T)

Filtering and Monitoring

the task of tracking the belief distribution over the current state (given all evidence so far) as time progresses

What happens with a fixed policy?

the tree becomes simpler (one action per state)
utility of state s under a fixed policy:
V^pi(s) = sum over s' of T(s, pi(s), s') * [R(s, pi(s), s') + gamma * V^pi(s')]
-instead of a as in V*, substitute pi(s)
-drop the max() because the policy is given (although not necessarily optimal)
in clearer words: instead of choosing among actions (a node with multiple action children, as in value iteration), each state has only one action to follow (as in policy evaluation/iteration)

What are iterative algorithms for CSPs?

they live on the edge (there is no fringe)
-all variables are assigned at the start and are reassigned as the search continues
-in practice, problems like n-queens can be solved in nearly constant time with high probability
states: d^n, e.g. 4 queens in 4 columns = 4^4
heuristic = min-conflicts:
-randomly select any conflicted variable
-choose the value that violates the fewest constraints
examples include hill climbing (see the sketch below)
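A minimal min-conflicts sketch for n-queens with one queen per column; the board size and iteration cap are arbitrary choices.

    import random

    n = 8

    def conflicts(rows, col, row):
        """Number of other queens attacking a queen placed at (row, col)."""
        return sum(1 for c in range(n)
                   if c != col and (rows[c] == row or abs(rows[c] - row) == abs(c - col)))

    rows = [random.randrange(n) for _ in range(n)]       # all variables assigned at start
    for _ in range(10000):
        conflicted = [c for c in range(n) if conflicts(rows, c, rows[c]) > 0]
        if not conflicted:
            break                                        # solution found
        col = random.choice(conflicted)                  # randomly pick a conflicted variable
        # reassign it to the value (row) that violates the fewest constraints
        rows[col] = min(range(n), key=lambda r: conflicts(rows, col, r))
    print(rows)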

Describe evaluation functions

they score non-terminal states in depth-limited search
-ideal function: returns the actual minimax value of the position
-in practice: typically a weighted linear sum of features:
Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)
-the state s stays the same across terms, while the weights w_i and features f_i vary

What do Markov Models introduce?

time (or space), because we want to reason about a sequence of observations
-the value of X at a given time step is the state

particle filtering

track samples (particles), not the full distribution over all values
-also called sequential Monte Carlo

stationary distribution uses ___

transition matrices

Describe genetic algorithms

use a natural-selection metaphor
-keep the best N hypotheses at each step (selection), based on a fitness function
-also use pairwise crossover operators, with optional mutation to give variety
-often misused, but effective in the right cases
steps: start (values) -> fitness (number) -> selection (percentage) -> pairing (value separation) -> crossover (value combination) -> mutation (end)
(see the sketch below)
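A minimal genetic-algorithm sketch whose fitness function (count of 1 bits), population size, crossover scheme, and mutation rate are all invented toy choices.

    import random

    LENGTH, POP, GENERATIONS = 12, 20, 100

    def fitness(ind):
        return sum(ind)                                   # count of 1 bits

    population = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
    for _ in range(GENERATIONS):
        # selection: keep the best half as parents
        population.sort(key=fitness, reverse=True)
        parents = population[:POP // 2]
        # crossover: splice two parents at a random cut point
        children = []
        while len(children) < POP - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, LENGTH)
            child = a[:cut] + b[cut:]
            # mutation: occasionally flip one bit for variety
            if random.random() < 0.1:
                i = random.randrange(LENGTH)
                child[i] = 1 - child[i]
            children.append(child)
        population = parents + children
    print(max(population, key=fitness))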

What are tools used for ordering in CSPs?

used in var <- Select-Unassigned-Variable(csp)
VARIABLE ORDERING:
-MRV (minimum remaining values): choose the variable with the fewest legal values left in its domain
--being more targeted means failing sooner rather than later (aka fail-fast ordering)
--tie breaker for MRV = the degree heuristic (most constraining variable); ties can remain even after this rule
VALUE ORDERING:
-LCV (least constraining value): given a chosen variable, pick the value that is least constraining
--rules out the fewest values in neighboring domains, keeping options open

Why are MDPs useful?

-when environments are not deterministic
-when the rules are not well known
-when there are more interim layers between start and goal

When is IDS preferred?

when search space is large and depth is unknown

Describe discounting

when the value of rewards decays exponentially
-we still want to maximize the sum of (discounted) rewards
step 1: worth 1
step 2: worth gamma
step 3: worth gamma^2
**remember, the sequence always starts at 1: 1, gamma, gamma^2, gamma^3, ...
-each time you descend a level, you multiply by the discount once more
-helps the algorithm converge (see the sketch below)
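A tiny sketch computing a discounted return for an invented reward sequence with gamma = 0.5.

    gamma = 0.5
    rewards = [1.0, 1.0, 1.0, 1.0]            # reward at steps 0, 1, 2, 3

    # reward k steps in the future is weighted by gamma**k, starting at gamma**0 = 1
    discounted = sum(gamma ** k * r for k, r in enumerate(rewards))
    print(discounted)                          # 1 + 0.5 + 0.25 + 0.125 = 1.875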

Time and space complexity of DFS for search trees?

where b is the branching factor (# of successors) and m is the maximum depth
Time: O(b^m)
-assuming m is finite
-time is proportional to the number of nodes: each node is handled in O(1), so with branching factor b and max depth m the worst-case total is 1 + b + b^2 + ... + b^m
-summing the geometric series, a(r^n - 1)/(r - 1), gives (b^(m+1) - 1)/(b - 1) = O(b^m)
-in general, if you know the number of nodes, time complexity = O(number of nodes)
Space (of fringe): O(b*m)
-only stores siblings on the path to the root
-m = longest path; for each of the m nodes along that path you store up to b siblings
Optimal: no - finds the leftmost solution regardless of depth or cost
Complete: no - m could be infinite, so only complete if we prevent cycles
-if the search space is finite and cycles are avoided, DFS is complete and uses far less memory than BFS

Time and space complexity of BFS for search trees?

where b is the branching factor (# of successors) and s is the depth of the shallowest solution
-time and space are identical
Time: O(b^s)
-proportional to the number of nodes processed down to depth s
Space (of fringe): O(b^s)
-roughly the last tier, which is larger than DFS's fringe
-at the root you generate b nodes; if none is the solution, each in turn generates b more, and this continues until a solution is found at depth s
Optimal: only if costs are all 1 (all step costs equal)
Complete: yes, provided the shallowest solution depth s is finite

Time and space complexity of UCS?

where:
-b is the branching factor (# of successors)
-C* is the optimal solution cost
-e is the minimum arc cost
-C*/e is the effective depth
-time and space are identical
Time: O(b^(1 + C*/e))
-exponential in the effective depth
Space (of fringe): O(b^(1 + C*/e))
-roughly the last tier
-if all step costs are equal, this is O(b^(d+1))
Optimal: yes
Complete: yes - if the best solution has finite cost and the minimum arc cost is positive

If there are 120 agent positions, with 30 item counts, 12 enemy positions, and directions NSEW, how many world states are there? states for pathing? states for eat all items?

world states: 120 x 2^30 x 12^2 x 4 (fundamental counting principle)
-120 agent positions x 4 directions x (12 x 12) enemy positions (each of the 2 enemies has 12 possible positions) x 2^30 item configurations (each of the 30 items is either eaten or not)
states for pathing: 120
states for eat-all-items: 120 x 2^30

What is a Hidden Markov Model?

you observe outputs (effects) at each time step instead of actual states

What is a trivial heuristic?

zero, bottom of the lattice -no useful information -still will always be admissible

