AI

Probability of an event X

# of possible occurrences where X holds / total # of possible occurrences

Effective branching factor (b*)

- # of successors generated by a typical node for a given search problem - evaluates the effectiveness of a heuristic - N = total nodes expanded, d = solution depth - b* = branching factor that a uniform tree of depth d needs in order to contain N nodes, i.e. N + 1 = 1 + b* + (b*)^2 + ... + (b*)^d
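
A minimal numeric sketch of recovering b* from N and d (assuming the N + 1 = 1 + b* + ... + (b*)^d relation above; the function name and sample numbers are illustrative):

```python
def effective_branching_factor(n_expanded, depth, tol=1e-6):
    """Solve N + 1 = 1 + b* + (b*)^2 + ... + (b*)^d for b* by bisection."""
    def total(b):
        return sum(b ** i for i in range(depth + 1))   # 1 + b + ... + b^d
    lo, hi = 1.0, float(n_expanded + 1)                # b* lies in this range
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) < n_expanded + 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(effective_branching_factor(52, 5))   # ~1.92 for 52 nodes at depth 5
```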

Three properties of logical connectives

- Commutativity: P ∨ Q ≡ Q ∨ P - Associativity: (P ∧ Q) ∧ R ≡ P ∧ (Q ∧ R) - Distributivity: P ∨ (Q ∧ R) ≡ (P ∨ Q) ∧ (P ∨ R) - Distributivity: P ∧ (Q ∨ R) ≡ (P ∧ Q) ∨ (P ∧ R)

(Conjunction & Elimination), (Conjunction & Introduction)

- Conjunction & Elimination: if P ∧ Q is true, then P and Q are each true - Conjunction & Introduction: if P and Q are true, then P ∧ Q is true

Geometric view of classifiers

- Examples lie in attribute space - Each attribute is a dimension - Example is a point - Classes are colors - Classifier = partition of the attribute space, i.e. dividing into subspaces

Laws: excluded middle, contrapositive, operator precedence

- Excluded Middle: either P or its negation must be true, i.e. P ∨ ¬P - Contrapositive: P → Q ≡ ¬Q → ¬P - Precedence: ¬ binds tighter than ∧, which binds tighter than ∨

Role of Knowledge Engineer

- Get knowledge and represent it in an appropriate way - Use it to derive previously unknown facts by following a chain of reasoning - Must maintain a database of facts and the rules that apply to them - Real-world problem formulation is difficult

What does it mean for an interpretation I to satisfy a sentence S? What about for I to be a model of S?

- I satisfies S if S has truth value T under I for at least one variable assignment - I is a model of S if I satisfies S for all possible variable assignments

Hill Climbing Search

- Local search algorithm - Agenda = 1 - returns a state that is a local maximum
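
A minimal sketch of the agenda = 1 idea; `neighbors` and `score` are hypothetical problem callbacks, not anything defined in these notes:

```python
def hill_climb(state, neighbors, score):
    """Keep only the single best successor until no neighbor improves."""
    while True:
        best = max(neighbors(state), key=score, default=None)
        if best is None or score(best) <= score(state):
            return state               # local maximum reached
        state = best
```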

MINIMAX description (initial state, successors, terminal test, utility function)

- MAX moves first - initial state: starting position - successor function: list of (legal move, resulting state) pairs - terminal test: is the game finished? - utility function: gives numerical value of terminal states, i.e. MAX wins (+1), loses (-1), draws (0) - each player uses a search tree for next move (each level = different player, each node = state)

Neurally Inspired Computing

- NN are "neurally inspired", not brain science - NN can't explain their outputs; hidden nodes and weights have no individual, explicit meaning

Games that include chance

- No longer deterministic - Multiply MINIMAX values by their probabilities - Calculate the expected value of MINIMAX

Overfitting

- Noise blurs boundaries between classes - Sampling (choice of training examples) leads to selection bias or rare events - Overfitting: learning a tree that models properties specific to training set

Non-adversarial search (solution, heuristics, eval function)

- Solution is a set of actions for reaching a goal - Heuristics & constraint satisfaction can find optimal solution - Evaluation function = estimate of cost from start to goal through given node

Adversarial search

- Solution is a strategy specifying a move for every opponent reply - Time or other constraints may force an approximate solution - Evaluation function evaluates quality of game position

Reduced Error Pruning

- Split training set into one for learning, one for pruning - build a tree on learning set - greedily prune nodes to improve classification of pruning set

Types of sentences: symbols, negation, conjunction, disjunction, implication, equivalence

- Symbols: True, False, P - Negation: ¬P, ¬False - Conjunction: P ∧ Q - Disjunction: P ∨ Q - Implication: P → Q ≡ ¬P ∨ Q - Equivalence: P ≡ Q, i.e. (P → Q) ∧ (Q → P)

Algorithm A*

- Algorithm A with an admissible heuristic - admissible = never overestimates the distance to the closest solution - h*(n) = cost of the optimal path from n to a goal node - h(n) is admissible if 0 ≤ h(n) ≤ h*(n)

Back propagation training

- apply the training algorithm once per perceptron layer (i.e. twice in a net with 1 hidden layer) - training works in the opposite direction from the data flow - push errors back through the system, assigning blame and correcting weights to reduce error - each hidden node i causes part of the error observed at output node j - node i's contribution to the error at node j is proportional to the connection weight wi,j - compute hidden layer error values by dividing the output layer error in proportion to the connections from hidden to output layer

FOPC: interpretation over a domain

- assignment of entities in D to each of the constant, variable, predicate, and function symbols of a predicate calculus expression - each constant is assigned an element of D - each variable is assigned a nonempty subset of D - each function of arity m is defined D^m → D - each predicate of arity n is defined D^n → {T, F}

the XOR problem

- bipolar perceptron can learn to classify data into two categories, i.e. straight line between linearly separable pairs of points - XOR doesn't have a straight line separation

Neural architecture: neuroplasticity

- brain's ability to change during life - can reorganize itself - can make new connections or change their strength

Training Perceptrons

- can be trained to compute a specific function - start with a perceptron with random values for weights - if the answer is wrong, modify the weights by adding or subtracting some fraction of the input vector - add to increase output, subtract to decrease - keep iterating until success within a predefined degree of error - if the unit should have fired but didn't, increase the influence of the inputs that are on by adding the input to the weight (and vice versa) - the fraction used is called the learning rate
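
A hedged sketch of that update rule, assuming binary {0, 1} inputs and targets, a fixed bias input, and a fixed learning rate (none of which the notes pin down); the AND data at the end is just a linearly separable test case:

```python
import random

def train_perceptron(examples, n_inputs, rate=0.1, epochs=100):
    """examples: list of (inputs, target) pairs with target in {0, 1}."""
    w = [random.uniform(-1, 1) for _ in range(n_inputs + 1)]  # last = bias weight
    for _ in range(epochs):
        errors = 0
        for x, target in examples:
            x = list(x) + [1.0]                        # fixed bias input
            out = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            if out != target:                          # wrong: nudge the weights
                errors += 1
                sign = 1 if target == 1 else -1        # add to fire, subtract to suppress
                w = [wi + rate * sign * xi for wi, xi in zip(w, x)]
        if errors == 0:                                # success on every example
            break
    return w

print(train_perceptron([((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)], 2))
```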

Hardwiring Perceptrons

- can simulate basic logic gates such as AND - reproduces the truth table for AND by firing when a + b - 1 > 0, i.e. inputs a and b each weighted 1 with a bias of -1 - weights and threshold are set by hand rather than learned

Classification difficulties

- classes are not observable - classes represent high level information but only low level attributes are observed - there are many attributes that individually have little correlation with classes - learning algorithms may have high computational cost

Classifier x Machine Learning (what is supervised learning?)

- classifier: a function that takes features describing an object and returns the class of that object, typically from a set of predefined labels - machine learning: learn a classifier from a training set and apply it to problem examples - supervised learning: the classes of the training examples are known

Perceptron model of a neural system

- collection of units, each has: - weighted inputs from other units (weight = how much unit is affected by activity, input = degree to which other units are active) - threshold that the sum of weighted inputs must exceed - single output that connects to inputs of other units - output computed using a function that reflects how far perceptron activation is above or below 0. Usually one input is a fixed bias

Back propagation algorithm

- compute ∆ values for output units using the observed error: ∆k = (yk - hk) * g'(sk), where sk = ∑i wi,k * ai and ai = xi (input layer) or g(si) (otherwise) - propagate ∆ values back from layer L to layer L-1 - for each node j in layer L-1: ∆j = g'(sj) * ∑k wj,k * ∆k - update weights between layer L-1 and layer L: wi,j = wi,j + α * ai * ∆j - repeat for each example
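
A hedged sketch of these updates for one hidden layer with sigmoid units and bias inputs throughout; layer sizes, the learning rate, and the epoch count are assumptions, not values from the notes:

```python
import math
import random

g = lambda s: 1.0 / (1.0 + math.exp(-s))   # sigmoid activation
dg = lambda h: h * (1.0 - h)               # g'(s), written in terms of the output h

def train(data, n_in, n_hid, alpha=0.5, epochs=5000):
    W1 = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hid)]
    W2 = [random.uniform(-1, 1) for _ in range(n_hid + 1)]
    for _ in range(epochs):
        for x, y in data:
            xb = list(x) + [1.0]                                     # bias input
            a = [g(sum(w * xi for w, xi in zip(row, xb))) for row in W1]
            ab = a + [1.0]                                           # bias unit
            h = g(sum(w * ai for w, ai in zip(W2, ab)))
            d_out = (y - h) * dg(h)                    # delta at the output unit
            d_hid = [dg(a[j]) * W2[j] * d_out          # blame in proportion to weight
                     for j in range(n_hid)]
            for j in range(n_hid + 1):                 # update hidden -> output
                W2[j] += alpha * ab[j] * d_out
            for j in range(n_hid):                     # update input -> hidden
                for i in range(n_in + 1):
                    W1[j][i] += alpha * xb[i] * d_hid[j]
    return W1, W2
```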

First Order Predicate Calculus: constant symbols, variables, predicate symbols, function symbols, quantifiers

- constant symbols: stands for objects - variables: ranges over objects - predicate symbols: relationships between/properties of objects. written as propositions with arguments - function symbols: mapping between objects. written like predicates. - quantifiers: existential, universal

Decision Trees

- decompose complex problems into a sequence of simple questions - leaves = decisions, nodes = questions, branches = possible answers - classification is a decision problem - classifier should be general but fast to apply (shallow & bushy)

Iterative Deepening Search

- depth bounded search - DFS with a depth-bound - forces BFS like behavior - Start with depth cutoff of 1, expand only nodes with depth < cutoff - Increment cutoff
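
A minimal sketch; `successors` and `is_goal` are hypothetical problem callbacks:

```python
def depth_limited(state, is_goal, successors, limit):
    """DFS that refuses to expand nodes at depth >= limit."""
    if is_goal(state):
        return [state]
    if limit == 0:
        return None
    for s in successors(state):
        path = depth_limited(s, is_goal, successors, limit - 1)
        if path is not None:
            return [state] + path
    return None

def iterative_deepening(start, is_goal, successors, max_depth=50):
    for cutoff in range(1, max_depth + 1):   # start with cutoff 1, increment
        path = depth_limited(start, is_goal, successors, cutoff)
        if path is not None:
            return path
```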

Information Gain (how related to mutual information?)

- difference between the total entropy before and after partitioning set S into subsets Si - mutual information between attribute and class, conditioned on previous splits - G(S,Q) = H(S) - SUM(|Si|/|S| * H(Si))
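
A small sketch of G(S, Q) computed directly from class labels; the 50/50 toy split at the end is illustrative:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(S) = -SUM(p * log2(p)) over the class frequencies in S."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())

def information_gain(labels, subsets):
    """G(S, Q) = H(S) - SUM(|Si|/|S| * H(Si)) for a candidate split Q."""
    n = len(labels)
    return entropy(labels) - sum(len(s) / n * entropy(s) for s in subsets)

# A perfect split of a 50/50 set gains the full 1 bit:
print(information_gain(['a', 'a', 'b', 'b'], [['a', 'a'], ['b', 'b']]))
```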

FOPC: domain, n-ary relation

- domain D: non-empty set of objects, which may be related - n-ary relation: a set of n-tuples of elements of D - n-ary function: a relation between n-tuples and objects in D

components of Propositional Calculus

- each symbol is either a proposition (basic, small unit of meaning) or a connective (combining propositions into sentences) - a sentence is a syntactic unit to which a truth value can be attached

Atomic event, event, sample space, universe

- elementary / atomic event: occurrence that can't be made up of other events - event: set of atomic events - sample space or universe: set of all possible outcomes

Goals of Knowledge Representation

- express facts or beliefs using a formal language - expressively and unambiguously - determine automatically what follows from the facts correctly (soundly), completely, and tractably

Attribute selection & entropy

- find attributes with high entropy - start with a compressing transform to reduce redundancy - variance can be used as a surrogate for entropy with numeric attributes - find features with high mutual information with classes

FOPC: functions & arity

- function: maps its arguments to a single fixed value - has an arity, i.e. # of arguments - ∀x Person(x) → Parent(Mother-of(x), x) - argument order matters

Proof by Resolution

- given a knowledge base (a set of sentences S and an interpretation I) - prove sentence A under I - show that KB entails A, i.e. A follows from KB

First choice hill climbing

- good for big problems - like stochastic hill climbing, but generates successors randomly and picks the first one better than the current state

Expected MINIMAX value (n terminal, n max/min, n chance)

- if n is terminal, expected(n) = utility(n) - if n is max or min, expected(n) = max or min over all successors of expected(s) - if n is a chance node, expected(n) = sum over all successors of P(s) * expected(s) - exact values matter, not just their order (values must be scaled)

MINIMAX value of each node, n (& DFS or BFS?)

- if n is terminal, value(n) = utility(n) - if n is max, value(n) = max over all successors of value(s) - if n is min, value(n) = min over all successors of value(s) - exhaustive DFS - maximizes worst case outcome for MAX
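
A minimal recursive sketch of this exhaustive DFS; the game callbacks are hypothetical stand-ins for a concrete game's rules:

```python
def minimax(state, is_max, is_terminal, successors, utility):
    """Back up utility values: MAX takes the max, MIN takes the min."""
    if is_terminal(state):
        return utility(state)
    values = [minimax(s, not is_max, is_terminal, successors, utility)
              for s in successors(state)]
    return max(values) if is_max else min(values)
```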

Random restart hill climbing

- if stuck, try again from a different place - non-optimality is recognized

Greedy search (describe agenda)

- informed search - estimates the cost from the current node to a solution to sort the agenda - usually greedy search has agenda = 1, i.e. only the best node is kept - increasing the agenda -> less likely to be caught in local minima

Single Layer Perceptron Networks (incl. ∆wi,j)

- input units and output units; as many perceptrons as outputs - every input is connected to every output - each connection has a weight - each output has a thresholding function - outputs are vectors - weight updating algorithm: ∆wi,j = α * xi * (yj - hj) * g'(∑k wk,j * xk) - α = learning rate, i.e. size of the adjustment at each step - yj = desired node output - hj = actual node output - (yj - hj) = error in current node output - g' is the derivative of the activation function; it adjusts the error correction - ∑k wk,j * xk is the value before activation at the current node - (yj - hj) * g'(∑k wk,j * xk) is called the ∆ value - stopping criterion: E = 0.5 * ∑j (yj - hj)²

Local Beam Search

- keeps track of k states - start with k randomly generated states - generate all successors of all k states (not the same as an agenda of length k) - pick the k best successors - not the same as k independent hill-climbing runs: information is shared and bad states are weeded out - can lead to lack of diversity - stochastic variant: choose k successors at random, biased toward good ones

Neural networks: h = g(X*W)

- learning is achieved by adjusting weights - s = sum of weights * inputs from other perceptrons, i.e. s = ∑ wi * xi - h = g(s) is sent to other perceptrons - xi are reals in [0,1] or [-1,1] - wi are real weights - wn is usually the threshold, with xn = 1 (bias) - s is the weighted sum including threshold (activation level) - h is the output - g is the activation function: can be a simple step, sigmoid, etc.; pushes values toward the extremes (0, 1)

Support Vector Machines (linear classifiers, maximal margin learning)

- linear classifiers: boundaries are hyperplanes - maximal margin learning: learn the hyperplane that is farthest away from making a mistake - maximize the distance to the closest (= worst) points of each class (the support vectors) - related to neural networks; used for binary classification

Prior Probability or Unconditional Probability

- probability assigned to an event in the absence of knowledge supporting its occurrence or non-occurrence

subjective belief / interpretation

- probability that proposition A corresponds to degree of subjective belief, based on evidence available - has a strong formal basis

ID3 Find Best Question (entropy as you go down a tree, entropy of partition)

- questions test only one attribute (boundary on one axis) - all borders are orthogonal to one axis - good for symbolic attributes - entropy of class variable c within set S measures how mixed the classes are - probability of an item belonging to c = # of items in c / # of items in the training set - entropy decreases down the tree - entropy of a partition = sum of entropies of subsets weighted by size (a homogeneous leaf has entropy 0) - maximize information gain

Genetic Algorithms: mutation

- randomly change a bit - mutation is a low probability of flipping a random bit at each crossover step

Inference rules: soundness, completeness

- soundness: a set of rules is sound iff every sentence it infers from a set of sentences E logically follows from E - completeness: a set of inference rules is complete iff it can infer every expression that logically follows from a set of sentences

Successive Application of Unifiers, Composition of Unifiers

- successive application of unifiers: result of applying unifiers to a term in sequence: P(x,y) {A/x} {B/y} evaluates to P(A, B) - composition of unifiers: can combine unifiers as long as there are no contradictory assignments: {A/x} and {B/y} combine to {A/x, B/y}

FOPC: terms, sentences

- terms: correspond with things in the world - sentences: statements that can be true or false - atomic sentence: predicate symbol of arity n followed by n terms

Multilayer NN

- extends the two layers of values (one input, one output) with a hidden layer of perceptrons in between, all with weighted connections - a network of three layers of values with a linear output layer can approximate any function - each unit is connected to every unit in the next layer; each connection has a weight - each unit in the hidden layer has an output modifier function - too many units = overfitting, inefficiency; too few = can't learn - input layer: every input value connected to every hidden unit

testing trained neural network

- want it to generalize - divide the data and perform cross-validation - e.g. reserve 10% of data for testing and 90% for training - compare predicted values to known classes - 10-fold cross validation: partition data into 10 subsets, train on 9, test on 1, repeat

Entropy equation

- weight logarithm by probability px of each event x from set X of all possible events - entropy: sum over possible outcomes of the probability of a particular outcome * log of the probability - H(X) = - SUM(px * log2(px)) - measures # of bits to encode a particular piece of information

Steps of CNF

0. Rewrite implications as OR 1. Minimize scope of negations (distribute them out) 2. Rewrite double negations 3. Standardize variables: rename all quantified variables so each quantifier has a different name 4. Skolemise all existential quantifiers, i.e. replace ∃x.P(x) ⇒ P(A) or ∀x.∃y.P(x, y) ⇒ ∀x.P(x, F(x)) 5. Drop all universal quantifiers 6. Convert sentences into CNF, i.e. a conjunction of disjunctions of atomic sentences 7. Split top level conjunction into set of disjunctions 8. Standardize variables apart 9. Resolution Proof

Three rules of probability

1. 0 ≤ P(A) ≤ 1 2. P(A∨B) = P(A) + P(B) - P(A∧B) 3. P(True) = 1

Unification: 1. P(x,y) unified with P(A,B) 2. P(x,y) unified with Q(A,B) 3. P(F(x)) unified with P(F(A)) 4. P(F(x), x, u, u) unified with P(F(y), z, z, A)

1. P(x,y) & P(A,B) => P(A,B), unifier {A/x, B/y} 2. P(x,y) & Q(A,B) => no unifier because P ≠ Q 3. P(F(x)) & P(F(A)) => P(F(A)), unifier {A/x} 4. P(F(x), x, u, u) & P(F(y), z, z, A) => P(F(A), A, A, A), unifier {x/y, x/z, x/u, A/x}

Search Algorithm

1. Takes problem as input (description of world, set of possible actions, set of goals) 2. Thinking phase: returns a solution as action sequence 3. Execution phase: carries out actions

What does a heuristic do in an informed search?

A heuristic provides extra knowledge, returning a number describing the desirability of expanding a node. - Typically evaluate the estimated cost of a solution, try to minimize - Incorporate estimate of cost from state to closest goal state

Simple Reflex Agents

Agent senses its environment -> determines what the world is like now -> condition-action rules -> chooses action -> does action

Interpretation of a set of sentences

Assignment of truth value to each propositional sentence and therefore each sentence in the set

4 Categories of AI: Thinking Humanly

Automation of activities associated with human thinking, decision making, etc; "machines with minds". Theory of human mind expressed as a computer program

Goal based agent

Combine goal information with information about best possible actions to achieve a goal. May need search and planning to find action sequences that achieve the agent's goals. Less efficient but more flexible; can't guarantee best sequence

Compare computers and brains?

Computers are better at: symbolic calculations, instructions. Brains better at: language, learning, perceiving.

Conjunction is like ____, disjunction is like ____

Conjunction = multiplication; Disjunction = addition with a max value of 1

Environments: Dynamic vs. Semidynamic vs. Static

Dynamic: environment may change while agent is deliberating. Semidynamic: agent's utility score changes as time passes though the environment is static (timed game). Static: environment does not change while the agent deliberates.

What is a good attribute?

Easy case: the class itself is an attribute. Hard case: no informative attributes at all. - A good attribute is one that carries a lot of information related to the class

Environments: episodic vs. sequential

Episodic: experiences divided into episodes; agent perceives -> acts. Each episode is independent; don't have to think ahead. Sequential: the current decision can affect all future decisions; must think ahead.

How to evaluate an agent's action?

Evaluated in respect to an objective performance measure, which depends on what the agent is designed to receive. "Rational" depends on agent's knowledge, perception, actions.

4 Categories of AI: Acting Rationally

Explain intelligent behavior in terms of computational processing; automation of intelligent behavior. AI is the study of rational agents

Environments: fully vs. partially observable

Fully: Sensors allow the agent to perceive the complete state of the environment. Don't have to keep track of the world state. (Effectively observable: sensors detect all relevant aspects).

Classification goal

Given a set of examples described by several attributes and a set of classes, label each example with a class

Communication problem

How to compress a signal, i.e. a stream of independent random variables X? (X = coin toss) - quantity of info in signal = minimum bit rate needed to transmit it, i.e. average # of bits/second

Generalized Resolution

If (P ∨ Q) is true and ( R ∨ ¬Q ) is true, then either P or R is true - ( P ∨ Q ) ∧ ( R ∨ ¬Q ) → ( P ∨ R )

Unit Resolution

If (P ∨ Q) is true and Q is false, then P is true - (P ∨ Q) ∧ ¬Q → P

Modus Ponens, Modus Tollens

MP: if we know P implies Q and P holds, then Q, i.e. (P∧(P→Q))→Q MT: given that P implies Q and Q is false, P is false, i.e. (¬Q∧(P→Q))→¬P

Bayesian Classifiers

Maximum likelihood learning: for each class C, learn the probability distribution that maximizes the product ∏p(m|C) over all m in the training set belonging to C - uses Bayes rule: p(C|m) = p(C)p(m|C)/p(m) - scales linearly with # of classes - for each point m, find the class C that maximizes p(C|m)

Mutual information (independent, correlated)

Measure how much information in two random variables is shared - 0 if two variables are independent - maximal if one variable is a function of another - I(X;Y) = H(X) + H(Y) - H(X,Y)

Von Neumann Architecture

Memory/Input/Output <-> Bus <-> CPU

For an ideal rational agent, what is behavior based on?

Own experience + built in knowledge. Autonomous because the behavior is determined by its own experience; can adapt. Must be able to learn.

Conditional or posterior probability

P(Event | Event1) means probability of Event given that Event1 has also happened

Bayes' Rule

P(M|S) = P(S|M)P(M)/P(S)
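
A worked example with made-up numbers (the prior, sensitivity, and marginal below are assumptions chosen purely for illustration):

```python
# Disease M with prior P(M) = 0.01, test sensitivity P(S|M) = 0.9,
# and overall positive-test rate P(S) = 0.05.
p_m, p_s_given_m, p_s = 0.01, 0.9, 0.05
p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)   # 0.18: a positive test raises belief from 1% to 18%
```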

Real world environment characteristics

Partially observable, stochastic, sequential, dynamic, continuous, multi agent

Types of games (matrix)

Perfect information vs. Imperfect information Deterministic vs. Chance

4 Categories of AI: Acting Humanly

Perform functions that require intelligence when performed by people; Turing Test

Agent Description: PEAS

Performance measure: how well an agent does what it is designed to do (external measure) Environment: domain in which it operates & interacts Actuators: means by which it acts upon its environment Sensors: means by which an agent perceives its environment

Search

Process of considering action sequences within a problem formulation. Create action sequences using an agenda to expand the search space by applying operators to present state.

Unification

Relationship(x,y) and Relationship(Maya, Michael) unify to Relationship(Maya, Michael) with unifier set {Maya/x, Michael/y} - if either term is a variable, let it be identical to the other - if the functors don't match, unification fails - the resulting list is the Most General Unifier
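
A toy unifier sketch over tuples, assuming the convention that lowercase strings are variables and capitalized ones are constants or functors; the occurs check is omitted for brevity:

```python
def is_var(t):
    return isinstance(t, str) and t[0].islower()

def substitute(t, theta):
    """Apply the bindings in theta all the way down a term."""
    if is_var(t):
        return substitute(theta[t], theta) if t in theta else t
    if isinstance(t, tuple):
        return tuple(substitute(a, theta) for a in t)
    return t

def unify(a, b, theta=None):
    theta = dict(theta or {})
    a, b = substitute(a, theta), substitute(b, theta)
    if a == b:
        return theta
    if is_var(a):                       # variable: bind it to the other term
        theta[a] = b
        return theta
    if is_var(b):
        theta[b] = a
        return theta
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and len(a) == len(b) and a[0] == b[0]):
        for x, y in zip(a[1:], b[1:]):  # unify arguments left to right
            theta = unify(x, y, theta)
            if theta is None:
                return None
        return theta
    return None                         # functors don't match: failure

print(unify(('Relationship', 'x', 'y'), ('Relationship', 'Maya', 'Michael')))
# {'x': 'Maya', 'y': 'Michael'}
```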

Atomic sentences (statement, property, relation)

Statement: Socrates-Is-A-Man Property(Object): isMan(Socrates) Relation(Ob1, Ob2): Occupation(Socrates, Philosopher)

Attribute types: symbolic v. numeric

Symbolic: finite, discrete valued, no order or distance, only Boolean comparison, can be enumerated, use logic to combine different attributes Numeric: real or integer valued, can be measured, use algebra to combine different attributes, can create new attribute

4 Categories of AI (list)

Thinking Humanly, Acting Humanly, Thinking Rationally, Acting Rationally

Resolution Refutation

To prove S, - add negation of S to KB - see if it leads to contradiction - uses law of excluded middle - if ¬S is inconsistent with KB, then KB ⊧ S

Uninformed vs. informed strategy

Uninformed: use only information in the problem formulation, vs. Informed: use a quality measure or a heuristic not explicitly in the problem

Universal Elimination, Universal Introduction

Universal Elimination: if ∀ x P(x) is true, P(a) is true for all constants a Universal Introduction: if P(a) is true for all constants a, then ∀ x P(x) is true

Time and space complexity: what is b, d, m

b: maximum branching factor d: depth of least cost solution m: maximum depth of state space

Joint Probability Distribution

enumeration of the probabilities of all possible combinations of the joint outcomes of the random variables - take the cross product of the variables' domains - captures co-occurrence, not cause - marginal probabilities: sum across rows or columns

Local search solutions can be thought of in terms of ____ or ____

- heuristic cost: find minimum cost / global minimum - objective function: find maximum utility / global maximum

A set of sentences, E, is inconsistent...

iff it is not satisfiable

A sentence is valid...

iff it is true under all possible interpretations

A set of sentences, E, is satisfiable...

iff there is at least one interpretation and variable assignment that satisfies every S in E

A sentence is satisfiable if...

iff there is at least one interpretation and variable assignment that satisfies it

De Morgan's Laws

¬(P ∨ Q) ≡ ¬P ∧ ¬Q ¬(P ∧ Q) ≡ ¬P ∨ ¬Q

Components of Logical Calculus

• Formal language: words and syntactic rules that tell us how to build up sentences, including semantic mapping to tell us what words mean • Inference procedure to compute which sentences are valid inferences from other sentences

Environments: deterministic vs. stochastic vs. strategic

Deterministic: next state completely determined by the current state and the agent's action Stochastic: there is uncertainty. Strategic: deterministic except for actions of other agents.

Environment: discrete vs. continuous

Discrete: limited # of distinct, clearly defined percepts and actions. Continuous: percepts and actions take values in continuous ranges.

AB Pruning Algorithm (explanation, time, branching factor)

- Considers an [α, β] range of values: α = best value found so far for MAX, β = best found so far for MIN - discards a subtree when α ≥ β (it cannot affect the decision) - at a MAX node: value = MIN-VALUE(result of move, α, β); if value > best, update best; if value ≥ β, return value immediately (prune); if value > α, set α = value; finally return best - with good move ordering, time = O(b^(m/2)), i.e. effective branching factor √b
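
A sketch mirroring the MINIMAX callbacks above: it returns the backed-up value while pruning subtrees that cannot affect the final decision:

```python
import math

def alphabeta(state, is_max, is_terminal, successors, utility,
              alpha=-math.inf, beta=math.inf):
    if is_terminal(state):
        return utility(state)
    if is_max:
        value = -math.inf
        for s in successors(state):
            value = max(value, alphabeta(s, False, is_terminal, successors,
                                         utility, alpha, beta))
            if value >= beta:          # MIN would never allow this branch
                return value
            alpha = max(alpha, value)
    else:
        value = math.inf
        for s in successors(state):
            value = min(value, alphabeta(s, True, is_terminal, successors,
                                         utility, alpha, beta))
            if value <= alpha:         # MAX would never allow this branch
                return value
            beta = min(beta, value)
    return value
```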

Local Search

- No goal test; the goal state itself is the solution - State: keep a current state and try to improve it - Usually constant memory - "Local" because we don't care about the path, only the goal

What questions to ask for information gain? (for symbolic, numerical)

- Symbolic attributes: branches for different values - Numerical: choose a threshold t and ask, is value < t? - the best threshold lies between two training examples of different classes - try candidate thresholds at midpoints between training examples of opposite classes

Algorithm A

- UCS + Greedy Search - f(n) = g(n) + h(n) - g(n) = cost so far, h(n) = estimated remaining

Rule Pruning (what is precision?)

- tree is a set of logical rules - each path is an AND of conditions, with an OR between different paths - convert the tree into rules and greedily prune the rules - Precision: # of points of the correct class picked by the rule / total # of points picked by the rule - don't need to split the set into two

Neural Architecture: neurons, dendrites, axons, synapses

- brain doesn't have a CPU; processing is distributed - neurons: simple, parallel, asynchronous units (a single cell) - dendrites: short fibres, receivers - axon: long fibre, transmitter - synapse: connection between axons and dendrites

Stochastic hill climbing

- choose randomly between available uphill moves (choose from uniform or non-uniform distribution) - converges slowly but sometimes finds better solutions

Greedy local search (incl. ridges, plateau, foothills)

- grabs best neighbor without thinking ahead - rapid improvement in heuristic - local minima/maxima stops search (no best neighbor) - ridges: series of local maxima - plateau: uninformative heuristic values - foothills: local minima

Genetic Algorithms

- variant of local beam search - combine 2 parent states to create successor states - start with a population; each state is a binary string - evaluation/fitness function assesses the quality of a state - selection (parents), crossover, mutation (random) - every bit doubles the search space
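
A compact sketch over bit strings; the one-max fitness function (count of 1-bits) and every parameter below are illustrative assumptions:

```python
import random

def ga(pop_size=20, length=16, generations=100, p_mutate=0.01):
    fitness = lambda s: sum(s)                     # one-max: count the 1-bits
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        weights = [fitness(s) + 1 for s in pop]    # roulette-style selection
        next_pop = []
        for _ in range(pop_size):
            p1, p2 = random.choices(pop, weights=weights, k=2)
            cut = random.randrange(1, length)      # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if random.random() < p_mutate else b  # rare bit flips
                     for b in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

print(ga())   # tends toward the all-ones string
```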

Incomplete Trees

- minimax requires too many leaf node evaluations - cut off search with CUTOFF-TEST - use evaluation function instead of utility function - can introduce fixed or dynamic depth limit

MINIMAX time, space

- mutual recursion - time: O(b^m), exponential in depth of tree - space: O(m), linear in depth of tree

Neural Architecture: signal emitted

- neurons emit chemicals (neurotransmitters) that move across the synapse and change the electric potential applied to the cell body - when the potential reaches a threshold, an electrical pulse (action potential) travels down axon, which releases neurotransmitters

Simulated Annealing Search

- probabilistic technique for approximating global optimum of a given function - hill climbing; only improves on the current situation. Not complete; may get stuck - moves randomly from state to state - escapes maxima by allowing some bad moves - decreases frequency of bad moves as 'temperature' drops - like shaking a ball
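
A minimal sketch; `neighbor` and `value` are hypothetical problem callbacks, and the geometric cooling schedule is an assumption:

```python
import math
import random

def simulated_annealing(state, neighbor, value, t0=1.0, cooling=0.995, steps=10000):
    t = t0
    for _ in range(steps):
        nxt = neighbor(state)
        delta = value(nxt) - value(state)
        # Always accept improvements; accept bad moves with prob e^(delta/T),
        # which shrinks as the temperature drops.
        if delta > 0 or random.random() < math.exp(delta / t):
            state = nxt
        t *= cooling
        if t < 1e-9:
            break
    return state
```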

Probability and signal compression

- probability distribution of the signal represents what is known before the transmission (what doesn't depend on samples) - bit rates chosen before transmission - more uncertain distribution, more information in content (i.e. if coin is double sided, no info to transmit) - probs are multiplicative, information is additive - information content depends on logarithm of probability distribution - more bits for rare events, fewer for common events

Heuristic Evaluation (requirements)

- provide an estimate of the expected utility of the game from a given position - performance depends highly on the EVAL function - requirements: computation can't take long; EVAL should order terminal nodes the same way as UTILITY; EVAL should be correlated with chances of winning

Multiplayer games

- replaces single zero-sum utility with a function for each player - no longer zero-sum - use a vector of values, one for each player - each player acts like a max player but maximize different things

Error rate with information encoding

- result is probabilistic - for any bit rate b > H(X) and error rate e > 0, there is an encoder with bit rate b and error rate below e - e.g. H(X) = 0.4 means blocks of 10 samples can be encoded with 4 bits

Neurons as a device

- slow but all are active simultaneously (massive parallelism) - maybe: densely connected networks transmitting simple signals

ID3 algorithm

- takes a training set and builds a decision tree such that each leaf is homogeneous - built from root to leaves - greedy; no backtracking (not optimal) - if all points in training_set have the same class C, return leaf(C) - else, if no questions remain, return majority_class(training_set) - else find the best question, split the training set on it, and return tree(question, ID3(set_1), ID3(set_2), ...)
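
A skeleton of that recursion under assumed input conventions (examples as (attributes, class) pairs, questions as (name, test) predicates); the gain-based choice of best question is inlined:

```python
from collections import Counter
from math import log2

def entropy(examples):
    n = len(examples)
    return -sum((k / n) * log2(k / n)
                for k in Counter(c for _, c in examples).values())

def id3(examples, questions):
    classes = {c for _, c in examples}
    if len(classes) == 1:                            # homogeneous: leaf(C)
        return ('leaf', classes.pop())
    majority = Counter(c for _, c in examples).most_common(1)[0][0]
    if not questions:                                # no questions left
        return ('leaf', majority)

    def gain(q):                                     # information gain of a split
        yes = [e for e in examples if q[1](e[0])]
        no = [e for e in examples if not q[1](e[0])]
        if not yes or not no:
            return -1.0
        return entropy(examples) - (len(yes) * entropy(yes)
                                    + len(no) * entropy(no)) / len(examples)

    best = max(questions, key=gain)                  # greedy, no backtracking
    yes = [e for e in examples if best[1](e[0])]
    no = [e for e in examples if not best[1](e[0])]
    if not yes or not no:                            # degenerate split
        return ('leaf', majority)
    rest = [q for q in questions if q is not best]
    return (best[0], id3(yes, rest), id3(no, rest))
```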

Genetic Algorithms: crossover (roulette, tournament, select pairs)

- determines which genetic material comes from each parent - select pairs - pick a point to cut and swap, keeping good aspects of both parents - roulette wheel: parents selected with probability proportional to fitness - tournament: rank selection from a random subset of the population

Genetic Algorithms: selection

After you encode states and calculate fitness, select pairs for reproduction - If parents are good, children may be good (fitness function) - generate random states and calculate probability of selection

Agent = Architecture + Program

Agent program is a function that implements the agent mapping (behavior from percept sequence to action). Architecture is the computing device that makes percepts available to the program, runs the program, and feeds the program's actions to the actuators.

Reflex Agents with State

Agents maintain an internal state / memory. Helps to distinguish between world states with same perceptual input but requiring different action. Agent senses -> updates model of world (understand new percepts, keep track of unseen parts, understand consequences of actions) -> condition action rules -> chooses action -> does action

What is an agent? How are they important in AI? What is the environment, the sensors, the actuators?

An agent is anything that (a) perceives its environment through sensors and (b) acts upon the environment through actuators. AI: the study & construction of rational agents. Environment: world / domain in which the agent lives. Sensors: input for the agent. Actuators: output generated by the agent.

Problem formulation - knowledge representation (what is a goal? what are actions?)

Analysis of possible states & actions. Includes knowledge representation, i.e. what/how to represent. Includes: initial state, goal state/test, operators/actions, effect on state, path cost. (Goals: set of world states in which the goal is satisfied. Actions: cause transitions between world states)

The Imitation Game / Turing Test*

Can't differentiate between a person and a computer talking. Computer must understand & generate language, know/reason about the world, learn about the dialogue, combine all knowledge

Environment: multiagent vs. single agent

Competitive: maximizing A minimizes B's performance. Cooperative: A and B's performance measures move in same direction

Evaluate DFS with the four criteria

Complete: no, fails in infinite depth / loops Time: O(b^m), bad if m > d Space: O(bm), linear Optimal: no Pros: may be faster at exploring small portion of space Cons: could go down wrong path

Evaluate UCS with four criteria

Complete: yes Time: # of nodes with path cost less than optimal Space: # of nodes with path cost less than optimal Optimal: if step cost > 0 First solution reported is the cheapest

Evaluate IDS with four criteria

Complete: yes Time: O(b^d) Space: O(bd) Optimal: if step cost = 1 Pros: optimal in path length, modest memory requirement Cons: states expanded multiple times

Evaluate BFS with the four criteria

Complete: yes, all nodes examined. Time: O(b^d) Space: O(b^d), all nodes Optimal: if cost = 1 Drawback: memory requirements

Strategy is evaluated along (4)

Completeness: does it find solution Time complexity: # of steps Space complexity: max # of nodes Optimality: does it always find least cost solution?

4 Categories of AI: Thinking Rationally

Computations that make it possible to perceive, reason, act; computational models. Emphasis on correct inferences and formal knowledge

What is a rational agent?

Does the right thing in context, i.e. given the agent's knowledge of the world

Uniform Cost Search / Best First Search

Queueing Fn: removes current node, adds its children, sorts the agenda by path cost, priority queue - Construct search tree by picking the lowest-cost (highest-utility) node and expanding from there - Guaranteed to find a solution if one exists - Finds the highest-utility action sequence

BFS

Queuing Fn: removes current node. Appends its children to back of agenda, FIFO queue. - Guaranteed to find solution / shortest action sequences if it exists.

DFS

Queuing-Fn: removes current node from the front, appends its children to the front, LIFO stack. - Fails when tree is infinitely deep (cycle, continuous values). Depends on order in which nodes are expanded.

Search Tree

Search space represented as tree: actions = edges, states = nodes

State space landscape

State has: - Location given by values of state variables - Elevation given by heuristic cost or objective value

How should an ideal rational agent act?

They should do what is expected to maximize its performance measure, on the basis of percept sequence and built in knowledge. Percept sequence in context of background knowledge of the world.

Utility based agent (what is utility?)

Utility: measure of preference over world states. Agents maximize utility, i.e. when there are conflicting goals or several ways to reach a goal or uncertainty

Weak vs. Strong AI

Weak AI: machines can be made to act as if they were intelligent (exists) Strong AI: machines have real, conscious minds and act intelligent

