AI
Probability of an event X
# of possible occurrences where X holds / total # of possible occurrences
Effective branching factor (b*)
- # of successors generated by a typical node for a given search problem - evaluates effectiveness of a heuristic - N = total nodes expanded, d = solution depth - b* = branching factor that a tree of depth d needs in order to have N nodes
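A minimal sketch of how b* could be computed numerically, assuming the AIMA-style definition N + 1 = 1 + b* + (b*)² + ... + (b*)^d; the bisection approach and names are illustrative, not from the course:

```python
def effective_branching_factor(n_expanded, depth, tol=1e-6):
    """Solve N + 1 = 1 + b* + b*^2 + ... + b*^d for b* by bisection."""
    def total_nodes(b):
        return sum(b ** i for i in range(depth + 1))

    lo, hi = 1.0, float(n_expanded)     # b* must lie between 1 and N
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total_nodes(mid) < n_expanded + 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# e.g. 52 nodes expanded with a solution at depth 5 gives b* of about 1.92
print(round(effective_branching_factor(52, 5), 2))
```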
Three properties of logical connectives
- Commutativity: P ∨ Q ≡ Q ∨ P - Associativity: (P ∧ Q) ∧ R ≡ P ∧ (Q ∧ R) - Distributivity: P ∨ (Q ∧ R) ≡ (P ∨ Q) ∧ (P ∨ R) - Distributivity: P ∧ (Q ∨ R) ≡ (P ∧ Q) ∨ (P ∧ R)
Conjunction Elimination, Conjunction Introduction
- Conjunction Elimination: if P ∧ Q is true, then P is true and Q is true - Conjunction Introduction: if P and Q are true, P ∧ Q is true
Geometric view of classifiers
- Examples lie in attribute space - Each attribute is a dimension - Example is a point - Classes are colors - Classifier = partition of the attribute space, i.e. dividing into subspaces
Laws: excluded middle, contrapositive, operator precedence
- Excluded Middle: either P or its negation must be true, i.e. P ∨ ¬P - Contrapositive: P → Q ≡ ¬Q → ¬P - Precedence: ¬ precedes ∧ precedes ∨
Role of Knowledge Engineer
- Get knowledge and represent it in an appropriate way - Use it to derive previously unknown facts - follow chain of reasoning - must maintain a database of facts and rules that apply to them - real world problem formulation is difficult
What does it mean for an interpretation I to satisfy a sentence S? What about for I to be a model of S?
- I satisfies S if S has truth value T under I for at least one variable assignment - I is a model of S if I satisfies S for all possible variable assignments
Hill Climbing Search
- Local search algorithm - Agenda size = 1 (only the current state is kept) - returns a state that is a local maximum
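A minimal sketch of the loop, assuming hypothetical neighbours() and value() callbacks supplied by the caller:

```python
def hill_climbing(start, neighbours, value):
    """Agenda of size 1: keep only the current state, stop at a local maximum."""
    current = start
    while True:
        best = max(neighbours(current), key=value, default=None)
        if best is None or value(best) <= value(current):
            return current          # no uphill move left: local maximum
        current = best
```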
MINIMAX description (initial state, successors, terminal test, utility function)
- MAX moves first - initial state: starting position - successor function: list of (legal move, resulting state) pairs - terminal test: is the game finished? - utility function: gives numerical value of terminal states, i.e. MAX wins (+1), loses (-1), draws (0) - each player uses a search tree for next move (each level = different player, each node = state)
Neurally Inspired Computing
- NN are "neurally inspired", not brain science - NN can't explain outputs; hidden nodes and weights have no individual, explicit meaning
Games that include chance
- No longer deterministic - Multiply MINIMAX values by their probabilities - Calculate expected value of minimax
Overfitting
- Noise blurs boundaries between classes - Sampling (choice of training examples) leads to selection bias or rare events - Overfitting: learning a tree that models properties specific to training set
Non-adversarial search (solution, heuristics, eval function)
- Solution is a set of actions for reaching a goal - Heuristics & constraint satisfaction can find optimal solution - Evaluation function = estimate of cost from start to goal through given node
Adversarial search
- Solution is a strategy specifying a move for every opponent reply - Time or other constraints may force an approximate solution - Evaluation function evaluates quality of game position
Reduced Error Pruning
- Split training set into one for learning, one for pruning - build a tree on learning set - greedily prune nodes to improve classification of pruning set
Types of sentences: symbols, negation, conjunction, disjunction, implication, equivalence
- Symbols: True, False, P - Negation: ¬P, ¬False - Conjunction: P ∧ Q - Disjunction: P ∨ Q - Implication: P → Q ≡ ¬P ∨ Q - Equivalence: P ≡ Q, i.e. (P → Q) ∧ (Q → P)
Algorithm A*
- admissible heuristic: never overestimates the distance to the closest solution - h*(n) = cost of the optimal path from n to a goal node - h(n) is admissible if 0 ≤ h(n) ≤ h*(n)
Back propagation training
- apply the training algorithm once per perceptron layer (i.e. twice in a net with 1 hidden layer) - training works in the opposite direction from the data flow - pushing errors back through the system, assigning blame, correcting weights to reduce error - each hidden node i causes part of the error observed at output node j - node i's contribution to the error at node j is proportional to wi,j (connection weight) - compute hidden layer error values by dividing the output layer error in proportion to the connections from hidden to output layer
FOPC: interpretation over a domain
- assignment of entities in D to each of the constant, variable, predicate, and function symbols of a predicate calculus expression - each constant is assigned an element of D - each variable is assigned to a nonempty subset of D - each function of arity m is defined D^m → D - each predicate of arity n is defined D^n → {T, F}
the XOR problem
- bipolar perceptron can learn to classify data into two categories, i.e. straight line between linearly separable pairs of points - XOR doesn't have a straight line separation
Neural architecture: neuroplasticity
- brain's ability to change during life - can reorganize itself - can make new connections or change their strength
Training Perceptrons
- can be trained to compute a specific function - start with a perceptron with random values for weights - if the answer is wrong, modify weights by adding or subtracting some fraction of the input vector - add to increase output, subtract to decrease - keep iterating until success within a predefined degree of error - if the unit should have fired but didn't, increase the influence of inputs that are on by adding the input to the weights (and vice versa) - the fraction used is called the learning rate
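A minimal sketch of this training loop, assuming binary (0/1) inputs and targets with a fixed bias input; the data and names are hypothetical:

```python
import random

def train_perceptron(examples, n_inputs, rate=0.1, epochs=100):
    """examples: list of (input_vector, target) pairs; target is 0 or 1."""
    w = [random.uniform(-1, 1) for _ in range(n_inputs + 1)]   # +1 bias weight
    for _ in range(epochs):
        errors = 0
        for x, target in examples:
            x = list(x) + [1.0]                                # fixed bias input
            output = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            if output != target:
                errors += 1
                # should have fired but didn't: add the input to the weights;
                # fired but shouldn't have: subtract it (scaled by the rate)
                sign = 1 if target == 1 else -1
                w = [wi + sign * rate * xi for wi, xi in zip(w, x)]
        if errors == 0:
            break                                              # success
    return w

# e.g. learning AND: converges to weights that fire only when both inputs are 1
weights = train_perceptron([((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)], 2)
```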
Hardwiring Perceptrons
- can simulate basic logic gates such as AND - reproduces the truth table for AND - e.g. with inputs a, b ∈ {0, 1}, unit weights and a bias of −1 give the sum a + b − 1, which exceeds 0 only when both a and b are 1
Classification difficulties
- classes are not observable - classes represent high level information but only low level attributes are observed - there are many attributes that individually have little correlation with classes - learning algorithms may have high computational cost
Classifier x Machine Learning (what is supervised learning?)
- classifier: a function that takes features describing an object and returns the class of that object, typically from set of predefined labels - machine learning learns classifier from a training set and applies it to problem examples - supervised learning: classes of training examples are known
Perceptron model of a neural system
- collection of units, each has: - weighted inputs from other units (weight = how much unit is affected by activity, input = degree to which other units are active) - threshold that the sum of weighted inputs must exceed - single output that connects to inputs of other units - output computed using a function that reflects how far perceptron activation is above or below 0. Usually one input is a fixed bias
Back propagation algorithm
- compute ∆ values for output units using the observed error: ∆k = (yk − hk)·g′(sk), where sk = ∑ of weights * ai and ai = xi (input) or g(s) (otherwise) - propagate ∆ values from layer L back to layer L−1 - for each node j in layer L−1: ∆j = g′(sj)·∑k wj,k·∆k - update weights between layer L−1 and layer L: wi,j = wi,j + α·ai·∆j - repeat for each example
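A minimal sketch of one such step for a single hidden layer, assuming a sigmoid activation (so g′(s) = g(s)(1 − g(s))); the nested-list weight layout is illustrative and bias inputs are omitted for brevity:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def backprop_step(x, y, w_hidden, w_out, alpha=0.5):
    """w_hidden[j][i]: input i -> hidden j; w_out[k][j]: hidden j -> output k."""
    # forward pass
    a_hidden = [sigmoid(sum(w[i] * xi for i, xi in enumerate(x))) for w in w_hidden]
    a_out = [sigmoid(sum(w[j] * aj for j, aj in enumerate(a_hidden))) for w in w_out]
    # delta values for the output units: (y_k - h_k) * g'(s_k)
    d_out = [(yk - hk) * hk * (1 - hk) for yk, hk in zip(y, a_out)]
    # divide each output delta among hidden nodes in proportion to the weights
    d_hidden = [a_hidden[j] * (1 - a_hidden[j]) *
                sum(w_out[k][j] * d_out[k] for k in range(len(d_out)))
                for j in range(len(a_hidden))]
    # weight updates: w += alpha * a_i * delta_j
    for k in range(len(w_out)):
        for j in range(len(a_hidden)):
            w_out[k][j] += alpha * a_hidden[j] * d_out[k]
    for j in range(len(w_hidden)):
        for i in range(len(x)):
            w_hidden[j][i] += alpha * x[i] * d_hidden[j]
```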
First Order Predicate Calculus: constant symbols, variables, predicate symbols, function symbols, quantifiers
- constant symbols: stands for objects - variables: ranges over objects - predicate symbols: relationships between/properties of objects. written as propositions with arguments - function symbols: mapping between objects. written like predicates. - quantifiers: existential, universal
Decision Trees
- decompose complex problems into a sequence of simple questions - leaves = decisions, nodes = questions, branches = possible answers - classification is a decision problem - classifier should be general but fast to apply (shallow & bushy)
Iterative Deepening Search
- depth bounded search - DFS with a depth-bound - forces BFS like behavior - Start with depth cutoff of 1, expand only nodes with depth < cutoff - Increment cutoff
Information Gain (how related to mutual information?)
- difference between total entropy before and after partitioning set S into subsets Si - mutual information between attribute and class, conditioned on previous splits - G(S,Q) = H(S) - SUM(|Si|/|S| * H(Si))
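A minimal sketch of G(S, Q), assuming the caller passes the class labels of S and of each subset Si of the partition (the example data are hypothetical):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, subsets):
    """labels: classes of S; subsets: classes of each block Si of the partition."""
    n = len(labels)
    remainder = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(labels) - remainder

# e.g. a perfect split of a 50/50 set gains one full bit
print(information_gain(list("aabb"), [list("aa"), list("bb")]))   # -> 1.0
```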
FOPC: domain, n-ary relation
- domain D: non-empty set of objects, which may be related: - n-ary relation is a set of n-tuples of elements of D - n-ary function is a relation between n-tuples and objects in D
components of Propositional Calculus
- each symbol is either a proposition (basic, small unit of meaning) or a connective (combining propositions into sentences) - a sentence is a syntactic unit to which a truth value can be attached
Atomic event, event, sample space, universe
- elementary / atomic event: occurrence that can't be made up of other events - event: set of atomic events - sample space or universe: set of all possible outcomes
Goals of Knowledge Representation
- express facts or beliefs using a formal language - expressively and unambiguously - determine automatically what follows from facts correctly (soundly) and completely (tractably)
Attribute selection & entropy
- find attributes with high entropy - start with a compressing transform to reduce redundancy - variance can be used as a surrogate for entropy with numeric attributes - find features with high mutual information with classes
FOPC: functions & arity
- function: maps its arguments to a fixed, single value - has an arity, i.e. # of arguments - ∀x Person(x) → Parent(Mother-of(x), x) - argument order matters
Proof by Resolution
- given a knowledge base (set of sentences S and interpretation I) - prove sentence A under I - show that KB entails A, i.e. A follows from KB
First choice hill climbing
- good for big problems - stochastic: randomly generates successors and picks the first one that is better than the current state
Expected MINIMAX value (n terminal, n max/min, n chance)
- if n is terminal, expected(n) = utility(n) - if n is max or min, expected(n) = max or min over all successors of expected(s) - if n is a chance node, expected(n) = ∑ over all successors of P(s) * expected(s) - order of values and the values themselves are important (must be scaled)
MINIMAX value of each node, n (& DFS or BFS?)
- if n is terminal, value(n) = utility(n) - if n is max, value(n) = max over all successors of value(s) - if n is min, value(n) = min over all successors of value(s) - exhaustive DFS - maximizes worst case outcome for MAX
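A minimal sketch of the recursion, assuming a hypothetical game object exposing is_terminal, utility, and successors:

```python
def minimax(state, game, maximizing=True):
    """Exhaustive depth-first minimax; MAX and MIN alternate by level."""
    if game.is_terminal(state):
        return game.utility(state)              # e.g. +1 / -1 / 0
    values = [minimax(s, game, not maximizing) for s in game.successors(state)]
    return max(values) if maximizing else min(values)
```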
Random restart hill climbing
- if stuck, try again from a different place - non-optimality is recognized
Greedy search (describe agenda)
- informed search - estimates cost from current node to solution to sort agenda - most greedy search has agenda = 1, i.e. only best node kept - increase agenda -> less likely to be caught in local minima
Single Layer Perceptron Networks (incl. ∆wi,j)
- input units and output units; as many perceptrons as outputs - every input is connected to every output - each connection has a weight - each output has a thresholding function - outputs are vectors - weight updating algorithm: ∆wi,j = α * xi * (yj − hj) * g′(∑k wk,j * xk) - α = learning rate, i.e. adjustment at each step - yj = desired node output - hj = actual node output - (yj − hj) = error in current node output - g′ is the derivative of the activation function, adjusts error correction - ∑k wk,j * xk is the value before activation at the current node - (yj − hj) * g′(∑k wk,j * xk) is called the ∆ value - stopping criterion E = 0.5 ∑j (yj − hj)²
Local Beam Search
- keeps track of k states - start with k randomly generated states - generate all successors of all k states (not the same as an agenda of length k) - pick the k best successors - not equivalent to k separate hill-climbing runs: bad states are weeded out across the whole pool - can lead to lack of diversity - stochastic variant: choose k successors at random, biased toward the good ones
Neural networks: h = g(X*W)
- learning is achieved by adjusting weights - s = weighted sum of inputs from other perceptrons (s = ∑i wi * xi); output h = g(s) is sent on to other perceptrons - xi are reals in [0,1] or [−1,1] - wi are real weights - wn is usually a threshold with xn = 1 (bias) - s is the weighted sum including threshold (the activation level) - h is the output - g is the activation function; can be a simple step, sigmoid, etc.; pushes values toward the extremes (0, 1)
Support Vector Machines (linear classifiers, maximal margin learning)
- linear classifiers: boundaries are hyperplanes - maximal margin learning: learn the hyperplane that is farthest from making a mistake - maximize the distance to the closest (= worst) points of each class (the support vectors) - like neural networks, used for binary classification
Prior Probability or Unconditional Probability
- probability assigned to an event in the absence of knowledge supporting its occurrence or non-occurrence
subjective belief / interpretation
- probability that proposition A corresponds to degree of subjective belief, based on evidence available - has a strong formal basis
ID3 Find Best Question (entropy as you go down a tree, entropy of partition)
- questions that test only one attribute (boundary on one axis) - all borders are orthogonal to one axis - good for symbolic attributes - entropy of class variable c within set S measures how mixed the classes are - probability of an item belonging to c = # of items in c / # of items in training set - entropy decreases down the tree - entropy of a partition = sum of entropies of subsets weighted by size (a homogeneous leaf has entropy 0) - maximize information gain
Genetic Algorithms: mutation
- randomly change a bit - mutation is a low probability of flipping a random bit at each crossover step
Inference rules: soundness, completeness
- soundness: a set of rules is sound iff every sentence it infers from a set of sentences E logically follows from E - completeness: a set of inference rules is complete iff it can infer every expression that logically follows from a set of sentences
Successive Application of Unifiers, Composition of Unifiers
- successive application of unifiers: "result of applying unifiers to Term": P(x,y) {A/x} {B/y} evaluates to P(A, B) - composition of unifiers: can combine unifiers as long as there are no contradictory assignments: {A/x}{B/y} combine to {A/x, B/y}
FOPC: terms, sentences
- terms: correspond with things in the world - sentences: statements that can be true or false - atomic sentence: predicate symbol of arity n followed by n terms
Multilayer NN
- two layers of values (one input and one output) with weighted connections in between - a network of three layers of values with a linear output layer can approximate any function - one hidden layer of perceptrons - each unit connected to every unit in the next layer, each connection has a weight - each unit in the hidden layer has an output modifier function - too many units = overfitting, inefficiency; too few = can't learn - input layer: in a 2-input net, both values connect to both hidden units
testing trained neural network
- want it to generalize - divide the data and perform cross validation - reserve 10% of data for testing and 90% for training - compare predicted values against actual ones - 10-fold cross validation: partition data into 10 subsets, train on 9, test on 1, repeat.
Entropy equation
- weight logarithm by probability px of each event x from set X of all possible events - entropy: sum over possible outcomes of the probability of a particular outcome * log of the probability - H(X) = - SUM(px * log2(px)) - measures # of bits to encode a particular piece of information
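A small worked check: a fair coin gives H = −(0.5·log2 0.5 + 0.5·log2 0.5) = 1 bit per toss, while a biased 0.9/0.1 coin gives H = −(0.9·log2 0.9 + 0.1·log2 0.1) ≈ 0.47 bits, so the more predictable source needs fewer bits.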
Steps of CNF
0. Rewrite implications as OR 1. Minimize scope of negations (distribute them out) 2. Rewrite double negations 3. Standardize variables: rename all quantified variables so each quantifier has a different name 4. Skolemise all existential quantifiers, i.e. replace ∃x.P(x) ⇒ P(A) or ∀x.∃y.P(x, y) ⇒ ∀x.P(x, F(x)) 5. Drop all universal quantifiers 6. Convert sentences into CNF, i.e. a conjunction of disjunctions of atomic sentences 7. Split top level conjunction into set of disjunctions 8. Standardize variables apart 9. Resolution Proof
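A small worked example (sentence chosen for illustration): starting from ∀x (P(x) → ∃y Q(x, y)), step 0 rewrites the implication as ∀x (¬P(x) ∨ ∃y Q(x, y)); step 4 Skolemises y as a function of the enclosing universal variable, giving ∀x (¬P(x) ∨ Q(x, F(x))); step 5 drops the universal quantifier, leaving the single clause ¬P(x) ∨ Q(x, F(x)).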
Three rules of probability
1. 0 ≤ P(A) ≤ 1 2. P(A∨B) = P(A) + P(B) - P(A∧B) 3. P(True) = 1
Unification: 1. P(x,y) unified with P(A,B) 2. P(x,y) unified with Q(A,B) 3. P(F(x)) unified with P(F(A)) 4. P(F(x), x, u, u) unified with P(F(y), z, z, A)
1. P(x,y) & P(A,B) => P(A,B), unifier {A/x, B/y} 2. P(x,y) & Q(A,B) => no unifier because P ≠ Q 3. P(F(x)) & P(F(A)) => P(F(A)), unifier {A/x} 4. P(F(x), x, u, u) & P(F(y), z, z, A) => P(F(A), A, A, A), unifier {x/y, x/z, x/u, A/x}
Search Algorithm
1. Takes problem as input (description of world, set of possible actions, set of goals) 2. Thinking phase: returns a solution as action sequence 3. Execution phase: carries out actions
What does a heuristic do in an informed search?
A heuristic provides extra knowledge, returning a number describing the desirability of expanding a node. - Typically evaluate the estimated cost of a solution, try to minimize - Incorporate estimate of cost from state to closest goal state
Simple Reflex Agents
Agent senses its environment -> builds a model of what the world is like now -> condition-action rules -> chooses action -> does action
Interpretation of a set of sentences
Assignment of truth value to each propositional sentence and therefore each sentence in the set
4 Categories of AI: Thinking Humanly
Automation of activities associated with human thinking, decision making, etc; "machines with minds". Theory of human mind expressed as a computer program
Goal based agent
Combine goal information with information about best possible actions to achieve a goal. May need search and planning to find action sequences that achieve the agent's goals. Less efficient but more flexible; can't guarantee best sequence
Compare computers and brains?
Computers are better at: symbolic calculations, instructions. Brains better at: language, learning, perceiving.
Conjunction is like ____, disjunction is like ____
Conjunction = multiplication Disjunction = addition with max value of 1
Environments: Dynamic vs. Semidynamic vs. Static
Dynamic: environment may change while the agent is deliberating. Semidynamic: agent's utility score changes as time passes though the environment is static (timed game). Static: environment does not change while the agent deliberates.
What is a good attribute?
Easy case: the class itself is an attribute. Hard case: no informative attributes at all. A good attribute is one that carries a lot of information related to the class.
Environments: episodic vs. sequential
Episodic: experiences divided into episodes, agent perceives -> acts. Each episode is independent; don't have to think ahead. Sequential: the current decision can affect all future decisions, so the agent must think ahead.
How to evaluate an agent's action?
Evaluated in respect to an objective performance measure, which depends on what the agent is designed to receive. "Rational" depends on agent's knowledge, perception, actions.
4 Categories of AI: Acting Rationally
Explain intelligent behavior in terms of computational processing; automation of intelligent behavior. AI is the study of rational agents
Environments: fully vs. partially observable
Fully: Sensors allow the agent to perceive the complete state of the environment. Don't have to keep track of the world state. (Effectively observable: sensors detect all relevant aspects).
Classification goal
Given a set of examples described by several attributes and a set of classes, label each example with a class
Communication problem
How to compress a signal, i.e. a stream of independent random variables X? (X = coin toss) - the info in a signal = minimum bit rate to transmit it, i.e. average # of bits/second
Generalized Resolution
If (P ∨ Q) is true and ( R ∨ ¬Q ) is true, then either P or R is true - ( P ∨ Q ) ∧ ( R ∨ ¬Q ) → ( P ∨ R )
Unit Resolution
If ( P ∨ Q ) is true and Q is false, then P is true - ( P ∨ Q ) ∧ ¬Q → P
Modus Ponens, Modus Tollens
MP: if P implies Q and P is true, then Q is true, i.e. (P∧(P→Q))→Q MT: given that P implies Q and Q is false, P is false, i.e. (¬Q∧(P→Q))→¬P
Bayesian Classifiers
Maximum likelihood learning: for each class C, learn the probability distribution that maximizes the product ∏ p(m|C) over all training examples m belonging to C - uses Bayes rule p(C|m) = p(C)p(m|C)/p(m) - scales linearly with # of classes - for each point m, find the class C that maximizes p(C|m)
Mutual information (independent, correlated)
Measure how much information in two random variables is shared - 0 if two variables are independent - maximal if one variable is a function of another - I(X;Y) = H(X) + H(Y) - H(X,Y)
Von Neumann Architecture
Memory/Input/Output <-> Bus <-> CPU
For an ideal rational agent, what is behavior based on?
Own experience + built in knowledge. Autonomous because the behavior is determined by its own experience; can adapt. Must be able to learn.
Conditional or posterior probability
P(Event | Event1) means probability of Event given that Event1 has also happened
Bayes' Rule
P(M|S) = P(S|M)P(M)/P(S)
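A small worked example with hypothetical numbers: if P(M) = 0.01, P(S|M) = 0.9, and P(S) = 0.1, then P(M|S) = (0.9 × 0.01) / 0.1 = 0.09; even a reliable symptom yields a small posterior when the prior is small.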
Real world environment characteristics
Partially observable, stochastic, sequential, dynamic, continuous, multi agent
Types of games (matrix)
Perfect information vs. Imperfect information Deterministic vs. Chance
4 Categories of AI: Acting Humanly
Perform functions that require intelligence when performed by people; Turing Test
Agent Description: PEAS
Performance measure: how well an agent does what it is designed to do (external measure) Environment: domain in which it operates & interacts Actuators: means by which it acts upon its environment Sensors: means by which an agent perceives its environment
Search
Process of considering action sequences within a problem formulation. Create action sequences using an agenda to expand the search space by applying operators to present state.
Unification
Relationship(x,y) and Relationship(Maya, Michael) is unified to Relationship(Maya, Michael) with a unifier set of {Maya/x, Michael/y} - if either term is a variable, let it be identical to the other - if the functors don't match, it fails - resulting list is the Most General Unifier
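A minimal sketch of unification, under the assumption that terms are encoded as Python tuples (functor, arg, ...) with lowercase strings as variables; this encoding is illustrative, and the occurs check is omitted for brevity:

```python
def is_var(t):
    return isinstance(t, str) and t[0].islower()

def substitute(t, theta):
    while is_var(t) and t in theta:             # follow chains of bindings
        t = theta[t]
    if isinstance(t, tuple):
        return tuple(substitute(a, theta) for a in t)
    return t

def unify(a, b, theta=None):
    """Return the Most General Unifier of a and b, or None on failure."""
    theta = {} if theta is None else theta
    a, b = substitute(a, theta), substitute(b, theta)
    if a == b:
        return theta
    if is_var(a):                               # a variable matches anything
        return {**theta, a: b}
    if is_var(b):
        return {**theta, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for ai, bi in zip(a, b):                # functor and args must all match
            theta = unify(ai, bi, theta)
            if theta is None:
                return None
        return theta
    return None                                 # functors differ: failure

# unify(('Relationship', 'x', 'y'), ('Relationship', 'Maya', 'Michael'))
# -> {'x': 'Maya', 'y': 'Michael'}
```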
Atomic sentences (statement, property, relation)
Statement: Socrates-Is-A-Man Property(Object): isMan(Socrates) Relation(Ob1, Ob2): Occupation(Socrates, Philosopher)
Attribute types: symbolic v. numeric
Symbolic: finite, discrete valued, no order or distance, only Boolean comparison, can be enumerated, use logic to combine different attributes Numeric: real or integer valued, can be measured, use algebra to combine different attributes, can create new attribute
4 Categories of AI (list)
Thinking Humanly, Acting Humanly, Thinking Rationally, Acting Rationally
Resolution Refutation
To prove S, - add negation of S to KB - see if it leads to contradiction - uses law of excluded middle - if ¬S is inconsistent with KB, then KB ⊧ S
Uninformed vs. informed strategy
Uninformed: use only information in the problem formulation, vs. Informed: use a quality measure or a heuristic not explicitly in the problem
Universal Elimination, Universal Introduction
Universal Elimination: if ∀ x P(x) is true, P(a) is true for all constants a Universal Introduction: if P(a) is true for all constants a, then ∀ x P(x) is true
Time and space complexity: what is b, d, m
b: maximum branching factor d: depth of least cost solution m: maximum depth of state space
Joint Probability Distribution
enumeration of the probabilities for all the possible combinations of the joint outcomes of the random variables - take the cross product of the variables' domains - co-occurrence, not cause - marginal probabilities: sum across rows or columns
Local search solutions can be thought of in terms of ____ or ____
heuristic cost: find minimum cost / global minimum objective function: find maximum utility / global maximum
A set of sentences, E, is inconsistent...
iff it is not satisfiable
A sentence is valid...
iff it is satisfiable for all possible interpretations
A set of sentences, E, is satisfiable...
iff there is at least one interpretation and variable assignment that satisfies every S in E
A sentence is satisfiable if...
iff there is at least one interpretation and variable assignment that satisfies it
De Morgan's Laws
¬(P ∨ Q) ≡ ¬P ∧ ¬Q ¬(P ∧ Q) ≡ ¬P ∨ ¬Q
Components of Logical Calculus
• Formal language: words and syntactic rules that tell us how to build up sentences, including semantic mapping to tell us what words mean • Inference procedure to compute which sentences are valid inferences from other sentences
Environments: deterministic vs. stochastic vs. strategic
Deterministic: next state completely determined by the current state and the agent's action Stochastic: there is uncertainty. Strategic: deterministic except for actions of other agents.
Environment: discrete vs. continuous
Discrete: limited # of distinct, clearly defined percepts and actions. Continuous: percepts and actions range over continuous values.
AB Pruning Algorithm (explanation, time, branching factor)
- considers an [α, β] window of values - discards a subtree as soon as α ≥ β (the window is empty) - α = value of the best choice found so far for MAX, β = best found so far for MIN - node values are computed as in minimax, with α and β updated and passed down as the search proceeds - with perfect move ordering, time = O(b^(m/2)), i.e. the effective branching factor becomes √b
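A minimal sketch of the pruning, using the same kind of hypothetical game interface as plain minimax; a branch is abandoned as soon as the [α, β] window closes:

```python
def alphabeta(state, game, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    if game.is_terminal(state):
        return game.utility(state)
    if maximizing:
        value = float("-inf")
        for s in game.successors(state):
            value = max(value, alphabeta(s, game, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break               # MIN would never allow this branch: prune
        return value
    value = float("inf")
    for s in game.successors(state):
        value = min(value, alphabeta(s, game, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:
            break                   # MAX would never allow this branch: prune
    return value
```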
Local Search
- No goal test; the goal state itself is the solution - State: keep a current state and try to improve it - Usually constant memory - "Local" because we don't care about the path, only the goal
What questions to ask for information gain? (for symbolic, numerical)
- Symbolic attributes: branches for different values - Numerical: choose a threshold t and ask, is value < t? - best threshold between two training examples of different classes - try all candidates with midpoint between opposite training examples
Algorithm A
- UCS + Greedy Search - f(n) = g(n) + h(n) - g(n) = cost so far, h(n) = estimated remaining
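A minimal sketch of the agenda loop with f(n) = g(n) + h(n), assuming a hypothetical problem interface: successors(state) yields (next_state, step_cost) pairs and h is the heuristic:

```python
import heapq
from itertools import count

def a_star(start, goal_test, successors, h):
    tie = count()                                    # tie-breaker for the heap
    agenda = [(h(start), next(tie), 0, start, [start])]   # (f, _, g, state, path)
    best_g = {}
    while agenda:
        _, _, g, state, path = heapq.heappop(agenda)
        if goal_test(state):
            return path
        if best_g.get(state, float("inf")) <= g:
            continue                                 # already reached more cheaply
        best_g[state] = g
        for nxt, cost in successors(state):
            g2 = g + cost
            heapq.heappush(agenda, (g2 + h(nxt), next(tie), g2, nxt, path + [nxt]))
    return None                                      # agenda exhausted: no solution
```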
Rule Pruning (what is precision?)
- tree is a set of logical rules - each path is an AND of conditions, with OR between different paths - convert the tree into rules and greedily prune the rules - Precision: # of points of the correct class / # of points picked by the rule - don't need to split the set into two
Neural Architecture: neurons, dendrites, axons, synapses
- brain doesn't have a CPU; distributed processing - neurons: simple, parallel, async units (a single cell) - dendrites: short fibres, receivers - axon: long fibre, transmitter - synapse: connection between axons and dendrites
Stochastic hill climbing
- choose randomly between available uphill moves (choose from uniform or non-uniform distribution) - converges slowly but sometimes finds better solutions
Greedy local search (incl. ridges, plateau, foothills)
- grabs best neighbor without thinking ahead - rapid improvement in heuristic - local minima/maxima stops search (no best neighbor) - ridges: series of local maxima - plateau: uninformative heuristic values - foothills: local minima
Genetic Algorithms
- local beam search - combine 2 parents to create successor states - start with a population, each state a binary string - evaluation/fitness function assesses quality of a state - selection (parents), crossover, mutation (random) - every bit doubles the search space
Incomplete Trees
- minimax requires too many leaf node evaluations - cut off search with CUTOFF-TEST - use evaluation function instead of utility function - can introduce fixed or dynamic depth limit
MINIMAX time, space
- mutual recursion - time: O(b^m), exponential in depth of tree - space: O(m), linear in depth of tree
Neural Architecture: signal emitted
- neurons emit chemicals (neurotransmitters) that move across the synapse and change the electric potential applied to the cell body - when the potential reaches a threshold, an electrical pulse (action potential) travels down axon, which releases neurotransmitters
Simulated Annealing Search
- probabilistic technique for approximating the global optimum of a given function - contrast with hill climbing, which only improves on the current situation, is not complete, and may get stuck - moves randomly from state to state - escapes maxima by allowing some bad moves - decreases frequency of bad moves as the 'temperature' drops - like shaking a ball
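A minimal sketch, assuming hypothetical neighbour() and value() callbacks; the geometric cooling schedule is an assumption, not from the card:

```python
import math
import random

def simulated_annealing(start, neighbour, value, t=1.0, cooling=0.995, t_min=1e-3):
    current = start
    while t > t_min:
        candidate = neighbour(current)
        delta = value(candidate) - value(current)
        # always accept uphill moves; accept bad moves with probability
        # e^(delta / T), which shrinks as the temperature drops
        if delta > 0 or random.random() < math.exp(delta / t):
            current = candidate
        t *= cooling
    return current
```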
Probability and signal compression
- probability distribution of the signal represents what is known before the transmission (what doesn't depend on the samples) - bit rates are chosen before transmission - the more uncertain the distribution, the more information in the content (e.g. if a coin is two-headed, there is no info to transmit) - probabilities are multiplicative, information is additive - information content depends on the logarithm of the probability distribution - more bits for rare events, fewer for common events
Heuristic Evaluation (requirements)
- provide an estimate of the expected utility of the game from a given position - performance depends highly on the EVAL function - requirements: computation can't take long, EVAL should rank nodes in the same order as UTILITY, EVAL correlated with chance of winning
Multiplayer games
- replaces the single zero-sum utility with a function for each player - no longer zero-sum - use a vector of values, one for each player - each player acts like a max player but maximizes a different component
Error rate with information encoding
- result is probabilistic - for any bit rate b > H(X) and error rate e > 0, there is an encoder with bit rate b and error rate below e - H(X) = 0.4 means blocks of 10 samples can be encoded with 4 bits
Neurons as a device
- slow but all are active simultaneously (massive parallelism) - maybe: densely connected networks transmitting simple signals
ID3 algorithm
- takes a training set and builds a decision tree such that each leaf is homogeneous - root to leaves - greedy; no backtracking (not optimal) - if all points in training_set have same class C, return leaf(C) - else, if no questions remain, return majority_class(training_set) - else find best question, split training set on question, return tree(question, ID3(set_1), ID3(set_2)...)
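A minimal sketch of the recursion for symbolic attributes, assuming examples are (attribute_dict, label) pairs and gain(examples, attribute) scores questions as on the information-gain card:

```python
from collections import Counter

def id3(examples, attributes, gain):
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:
        return labels[0]                              # homogeneous leaf
    if not attributes:
        return Counter(labels).most_common(1)[0][0]   # majority class
    best = max(attributes, key=lambda a: gain(examples, a))   # greedy, no backtracking
    rest = [a for a in attributes if a != best]
    branches = {}
    for v in {x[best] for x, _ in examples}:          # split on the best question
        subset = [(x, label) for x, label in examples if x[best] == v]
        branches[v] = id3(subset, rest, gain)
    return (best, branches)
```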
Genetic Algorithms: crossover (roulette, tournament, select pairs)
- select pairs for reproduction, then decide which genetic material comes from each parent - crossover: pick a point to cut and swap segments, keeping good aspects of both parents - roulette wheel: selection probability proportional to fitness - tournament: rank selection from a random subset of the population
Genetic Algorithms: selection
After you encode states and calculate fitness, select pairs for reproduction - If parents are good, children may be good (fitness function) - generate random states and calculate probability of selection
Agent = Architecture + Program
Agent program is a function that implements the agent mapping (behavior from percept sequence to action). Architecture is the computing device that makes percepts available to the program, runs the program, and feeds the program's actions to the actuators.
Reflex Agents with State
Agents maintain an internal state / memory. Helps to distinguish between world states with same perceptual input but requiring different action. Agent senses -> updates model of world (understand new percepts, keep track of unseen parts, understand consequences of actions) -> condition action rules -> chooses action -> does action
What is an agent? How are they important in AI? What is the environment, the sensors, the actuators?
An agent is anything that (a) perceives its environment through sensors and (b) acts upon the environment through actuators. AI: the study & construction of rational agents. Environment: world / domain in which the agent lives. Sensors: input for the agent. Actuators: output generated by the agent
Problem formulation - knowledge representation (what is a goal? what are actions?)
Analysis of possible states & actions. Includes knowledge representation, i.e. what/how to represent. Includes: initial state, goal state/test, operators/actions, effect on state, path cost (Goals: set of world states in which the goal is satisfied. Actions: cause transitions between world states)
The Imitation Game / Turing Test*
A judge can't differentiate between a person and a computer in conversation. The computer must understand & generate language, know/reason about the world, learn from the dialogue, and combine all of this knowledge.
Environment: multiagent vs. single agent
Competitive: maximizing A minimizes B's performance. Cooperative: A and B's performance measures move in same direction
Evaluate DFS with the four criteria
Complete: no, fails in infinite depth / loops Time: O(b^m), bad if m > d Space: O(bm), linear Optimal: no Pros: may be faster at exploring small portion of space Cons: could go down wrong path
Evaluate UCS with four criteria
Complete: yes Time: # of nodes with path cost less than optimal Space: # of nodes with path cost less than optimal Optimal: if step cost > 0 First solution reported is the cheapest
Evaluate IDS with four criteria
Complete: yes Time: O(b^d) Space: O(bd) Optimal: if step cost = 1 Pros: optimal in path length, modest memory requirement Cons: states expanded multiple times
Evaluate BFS with the four criteria
Complete: yes, all nodes examined. Time: O(b^d) Space: O(b^d), all nodes Optimal: if cost = 1 Drawback: memory requirements
Strategy is evaluated along (4)
Completeness: does it find solution Time complexity: # of steps Space complexity: max # of nodes Optimality: does it always find least cost solution?
4 Categories of AI: Thinking Rationally
Computations that make it possible to perceive, reason, act; computational models. Emphasis on correct inferences and formal knowledge
What is a rational agent?
Does the right thing in context, i.e. the agent's knowledge of the world
Uniform Cost Search / Best First Search
Queueing Fn: removes current node, adds its children, sorts the agenda, stores cost; priority queue - Construct search tree by picking the best node on the agenda (for UCS, the one with lowest path cost) and expanding from there. - Guaranteed to find a solution if one exists - Finds the best (for UCS, cheapest) action sequence
BFS
Queuing Fn: removes current node. Appends its children to back of agenda, FIFO queue. - Guaranteed to find solution / shortest action sequences if it exists.
DFS
Queuing-Fn: removes current node from the front, appends its children to the front, LIFO stack. - Fails when tree is infinitely deep (cycle, continuous values). Depends on order in which nodes are expanded.
Search Tree
Search space represented as tree: actions = edges, states = nodes
State space landscape
State has: - Location given by values of state variables - Elevation given by heuristic cost or objective value
How should an ideal rational agent act?
They should do what is expected to maximize its performance measure, on the basis of percept sequence and built in knowledge. Percept sequence in context of background knowledge of the world.
Utility based agent (what is utility?)
Utility: measure of preference over world states. Agents maximize utility, which is useful when there are conflicting goals, several ways to reach a goal, or uncertainty
Weak vs. Strong AI
Weak AI: machines can be made to act as if they were intelligent (exists) Strong AI: machines have real, conscious minds and act intelligent