Cogsci 200 Final

the expected utility maximization framework

"rational choice theory" -at heart of economics, important in psych, soc, polisci, phil, comp sci, and cog sci

implications of computational theory of mind

-Philosophy: implication that we have solved mind-body problem -Biology and neuroscience: implication that the fundamental function of neurons is computation

historic achievements in AI using "search"

-1956: theorem proving -1997: chess (Deep Blue) -2015: Texas hold'em poker -2016: Go (AlphaGo)

the inverse optics problem

-2D pattern on the retina --> 3D representation of scene -P(scene|retinal pattern) = [P(retinal pattern|scene) P(scene)] / P(retinal pattern)
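
A minimal sketch of the Bayes' rule above in Python; the scene hypotheses and probabilities are made up for illustration, not from the course:

```python
# Bayes' rule for the inverse optics problem (toy numbers, made up for illustration)
priors = {"convex_bump": 0.7, "concave_dent": 0.3}        # P(scene)
likelihoods = {"convex_bump": 0.9, "concave_dent": 0.4}   # P(retinal pattern | scene)

# P(retinal pattern) = sum over scenes of P(pattern | scene) * P(scene)
evidence = sum(likelihoods[s] * priors[s] for s in priors)

# P(scene | retinal pattern) = P(pattern | scene) * P(scene) / P(pattern)
posterior = {s: likelihoods[s] * priors[s] / evidence for s in priors}
print(posterior)  # the visual system's "best guess" distribution over 3D scenes
```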

the first serious design for a computing machine

-Charles Babbage: "Difference Engine" -designed in 1820s, constructed after his death

approx # of atoms in observable universe

10^80

Turing's result #1: formalizing the minimal essence of computation

-a function is "computable" if and only if there is a corresponding machine in this class that computes the function -also called the Church-Turing thesis

digital circuit design theory

-a key link from mathematical theory to physical machines -developed by Claude Shannon in 1937

multiple realizability claim

-a single algorithm can be realized (aka implemented) in multiple distinct physical substrates -if all mentation is computation --> mental states and processes are also multiply realizable --> a single mental state or process can be implemented in multiple different physical substrates

why do we believe Turing's result #1?

-bc every formalism proposed as the basis of computation since 1936 has been shown to be mathematically equivalent to Turing machines in computational power -bc large classes of mathematical/logical functions have been proven computable by Turing machines -bc increasingly complex computational functions continue to be implemented on physical computing machines that are essentially mathematically equivalent to Turing machines

why believe Turing's claim #2?

-bc he formally constructed one in 1936 --> the existence of such a machine is a mathematical truth

why does result #1 matter for cog sci?

-bc we have no alternative to computation as the explanation for how neurons (or any machine) can implement the complex functions of vision, perception, cognition, decision making, learning, motor control, etc

what the computational TOM claim actually is

-brains and (some) computers embody intelligence for some of the same reasons -they both embody abstract, general principles of computation -they both compute -ALL MENTATION IS COMPUTATION

Alan Turing

-British mathematician -deepest theoretical results about computation -1930s -key part in secret effort that led to the breaking of Germany's "Enigma machine" coding scheme in WWII

how computational theory of mind claim is misunderstood

-claim is NOT that modern computer is metaphor for mind and brain -not the same as the "despised" computer metaphor

all reward is internally generated -- Singh, Lewis, & Barto

-classic reinforcement framework (environment, critic, agent) vs revised framework (external enviro, internal enviro, critic, RL agent) that is better for understanding biological organisms and can learn the control of internal processes as well as external actions

Gary Kasparov vs Deep Blue

-ex of the effectiveness of search-based algorithms -deeper search = higher chess rating

ENIAC

-first general purpose electronic digital computer (1948) -Arthur Burks was one of its original designers

two aspects of reinforcement learning

-functional: what is the problem being solved by reinforcement (reward-based) learning? -algorithmic: what algorithms solve this problem?

humor is rewarding

-humor activates subcortical reward system -modulates the mesolimbic reward centers

when we try to increase the power of the Turing machine...

-if we allow it to jump to any cell on the tape instead of moving one cell at a time --> NO -if we increase the set of symbols, e.g. 100,000+ symbols instead of just 0s and 1s --> NO -if we give it additional read/write heads, so that it can operate in parallel --> NO -if we allow it to operate nondeterministically or probabilistically, aka the next state it enters is not deterministically specified --> NO

limitations of fixed stimulus-response mappings

-internal search is only useful if you have already learned something about the world and how it works -won't help a rat in a maze

the palindrome algorithm

-it is realized (aka implemented) in the physical states of the Mac, PC, and Golden Mac, but the algorithm is not identical to any of its realizers

properties of a computational procedure

-it maps one set of symbols into another set of symbols--aka it calculates a function -it is finitely specifiable -its execution doesn't itself require "intelligence"

what does it mean to learn how to behave?

-it means to learn state-->action mappings AKA a function -such mappings/functions are called policies -learning how to behave can be thought of as search in a space of possible policies -must mix external and internal search (exploration)
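
A minimal sketch of a policy as a state-->action mapping; the maze states and actions are made-up names, purely illustrative:

```python
# A policy maps states to actions; learning how to behave = searching the
# space of such mappings for one that yields high cumulative reward.
policy = {
    "start": "go_left",
    "junction": "go_right",
    "dead_end": "turn_around",
}

def act(state):
    """Return the action this policy prescribes for the given state."""
    return policy[state]

print(act("junction"))  # -> "go_right"
```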

function

-maps each member of one set of symbols to a (single) member of another set of symbols -to each argument of the function, a unique value is assigned -addition function -palindrome function
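
A sketch of the palindrome function as a mapping from strings to truth values (one possible realization among many):

```python
def is_palindrome(s):
    """Map each string (argument) to a single truth value: True iff s reads the same reversed."""
    return s == s[::-1]

print(is_palindrome("racecar"))  # True
print(is_palindrome("maize"))    # False
```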

take 1: utility maximization

-maximize objective value -problem: our rule can't handle probabilities

neurons that anticipate reward before actions

-neurons in striatum (part of the basal ganglia) firing before movements--but only rewarded movements -participate in representing a value function that maps from (state, action) pairs to future reward

involvement w/ subcortical structures

-not cortex -action selection, cognitive control, and reward-based learning depend on circuits involving subcortical structures and their connections to the frontal lobes

neurons that respond to reward

-positive hedonic "liking" -negative aversive "disliking" -opioid hedonic hotspots -cerebral cortex, nucleus accumbens, amygdala, hippocampus

Babbage machine

-precursor to Turing's machine in 1840s -2 types: the "Difference Engine" and the "Analytical Engine"

relationship between prediction error and change in firing rate -Bayer and Glimcher

-a quantitative relationship: the change in dopamine neuron firing rate tracks the size of the reward prediction error

temporal credit assignment problem

-refers to the fact that rewards, especially in fine-grained state-action spaces, can arrive greatly delayed in time -such reward signals will only very weakly affect all the temporally distant states that preceded them -almost as if the influence of a reward gets more and more diluted over time, and this can lead to bad convergence properties of the RL mechanism -many steps must be performed by any iterative reinforcement-learning algorithm to propagate the influence of delayed reinforcement to all states and actions that have an effect on that reinforcement

why 3) add knowledge?

-requires a way to program or learn knowledge about the domain -so sys can evaluate a node (ex: game board) w/o searching to the end (addresses depth prob) -so sys can be more selective in actions it explores in search (addresses breadth prob)

what is the difference btwn reward and value?

-reward: quantity associated with states that defines how "intrinsically desirable" a state is. rewards define the goals of a reinforcement learning agent -values: expected (discounted) sums of future rewards, and are also quantities associated with states, or states and actions ((state, action) values are called Q values) -rewards are sometimes called primary signals -values called secondary signals

The "Turing Machine"

-self-described as a model of a "computer" -minimalist formalization of the intuitive notion of computation -many mathematically equivalent notations for specifying Turing Machines

reward and reinforcement learning: the big idea//implicitly defining an optimization problem

-separate the goodness of states of the organism from behaviors required to attain those states -accomplish that by providing organism w reward sys that maps organism states to some quantitative signal -and a learning sys that uses signal along w experience to adjust behaviors so as to attain more of those good states

neurons that encode reward prediction error

-some midbrain dopamine neurons seem to encode this reward prediction error-->may participate in ERROR-DRIVEN LEARNING -first comes habitual (familiar) reward recognition -then a delayed (surprising) reward: suppressed firing followed by increased firing after reward -also an early (surprising) reward: increased firing after reward

algorithmic level

-specifies procedures and mechanisms that enable the problem to be solved

biological/physical level

-specifies the neural/chemical substrates in which the algorithm/procedures are implemented

functional level

-specifies what problem the capacity is supposed to solve

what is a good choice of action?

-the agent is at time step t and needs to pick its next action a_t. what do we do? -best action = action that maximizes expected cumulative reward

4) evolutionary basis

-the beginnings of a plausible evolutionary theory can be provided for the origins of specific reward/motivational functions through the formulation of an optimal reward prob that asks: what is the BEST REWARD FUNCTION TO PROVIDE THIS LIMITED AGENT IN ORDER TO MAXIMIZE ITS FITNESS in some distribution of environments?

Pinker's claim: how the mind works

-the mind is what the brain does -the brain processes information -thinking is a type of computation

Turing's result #2: the existence of universal computing machines

-there are single Turing Machines that can compute EVERY computable function, by taking as input a description (program) of the function to compute, and its input

Ada Lovelace

-translated an Italian memoir on Babbage's Analytical Engine -created a method for calculating the sequence of Bernoulli #s w the engine that would have run correctly if it had been built

thinking as computation via internal search

-we can create algorithms that process symbolic patterns representing possible states of the game, aka possible states of any aspect of the world -algorithms encode the "rules of the game" (aka how the world works) and can explore (search) the consequences of taking diff actions in the game (world) by "looking ahead" (predicting what might happen)

"bucket brigade" algorithm- John Holland

-wrote the book "Adaptation in Natural and Artificial Systems"

Alan Turing's 2 breakthroughs

1) a minimal formalization of computation 2) universal computation

two ways to represent value function

1) a table 2) a neural net that outputs a value given a feature vector representing the state
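
A sketch of both representations; the states, features, and weights are made up, not from the course:

```python
import numpy as np

# 1) Table: one stored value per (state, action) pair (numbers are made up)
q_table = {("s1", "left"): 0.2, ("s1", "right"): 0.8}

# 2) Neural net: a single-hidden-layer network mapping a feature vector
#    describing the state to a value estimate (weights here are random)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # hidden layer: 3 features in, 4 units
W2, b2 = rng.normal(size=4), 0.0               # output layer: single value out

def value(features):
    hidden = np.tanh(W1 @ features + b1)
    return W2 @ hidden + b2

print(q_table[("s1", "right")], value(np.array([1.0, 0.0, 0.5])))
```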

Parts of a Turing Machine

1) an infinite tape divided into cells which are either blank or have single symbols on them 2) finite alphabet of tape symbols 3) a read/write head that is positioned at a single cell on the tape and can read the symbol at that cell & erase or write a symbol 4) a state memory that stores the single current state of the Turing machine, one of a finite set of states 5) a finite transition table of instructions that determines the control of the machine. each entry in the table tells the machine what to do based on its current state and the symbol currently under the read/write head. the actions indicate the new symbol to write, the direction to move the head, and the next state to enter.
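
A minimal sketch of these parts in code; the transition table is a made-up toy machine that flips 0s and 1s until it reaches a blank cell, then halts:

```python
# Toy Turing machine: tape, alphabet, read/write head, state memory, transition table
tape = {0: "1", 1: "0", 2: "1"}   # infinite tape; cells not in the dict are blank
blank = "_"
state, head = "flip", 0           # state memory and read/write head position

transitions = {                   # (state, symbol) -> (symbol to write, head move, next state)
    ("flip", "0"): ("1", +1, "flip"),
    ("flip", "1"): ("0", +1, "flip"),
    ("flip", blank): (blank, 0, "halt"),
}

while state != "halt":
    symbol = tape.get(head, blank)                     # read the cell under the head
    write, move, state = transitions[(state, symbol)]  # look up the instruction
    tape[head] = write                                 # write, then move the head
    head += move

print("".join(tape[i] for i in sorted(tape) if tape[i] != blank))  # -> "010"
```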

what is computation?

1) execution of algorithms that implement functions 2) physical processes transforming physical symbols 3) what Alan Turing said

summary: reinforcement learning

1) functional problem 2) algorithmic solution 3) neural implementation 4) evolutionary basis

solution to delayed reward (Sutton and Barto, Holland)

1) keep track of value functions Q(s,a) 2) treat the current value function as a prediction, and gradually adjust it when there is an error in the prediction (in the direction that will reduce the error)

how to find neural correlates of reward-based learning?

1) neurons that detect the presence of reward (REWARD FUNCTION) 2) neurons that anticipate reward (the VALUES OF STATES and the VALUES OF (STATE,ACTION) PAIRS) 3) neurons that encode a reward prediction error (the ERROR SIGNAL- "DELTA")

secrets of AI (Turing)

1) search 2) reinforcement learning--what is the problem being solved by reward-based learning?

three ways to make search perform better

1) search more deeply 2) search more broadly *both 1 and 2 require additional computational resources (more, faster processors) 3) add knowledge

finding the actual framework of rational choice theory

1) start w simple choice rule 2) notice probs 3) propose better rule 4) notice probs 5) etc -when done, we're convinced that the rational chooser maximizes expected utility

intelligent thought

= knowledge + search -one of the deepest principles abt thought to emerge from artificial intelligence

the first "computer programmer"

Ada Lovelace

value functions

mappings from states or (state, action) pairs to expected future cumulative discounted reward

3) neural implementation

a frontal/midbrain/striatal circuit involving the DOPAMINERGIC SYSTEM underlies the implementation of this algorithm -evidence comes from neural recordings that reveal specialized representations of the diff quantities implicated in the algorithm, including prediction error

Tesauro: "TD-Gammon"

an RL system that learned how to play Backgammon -reward function was: +100 if win, -100 if lose, 0 for all other states -trained by playing 1.5 million games against itself -became as good as the best human players

3 lvls of explanation in cog sci: David Marr

any cognitive capacity can be described at three levels: 1) functional 2) algorithmic 3) biological/physical -all 3 lvls are indispensable -no competition between these levels of explanation

why does claim #2 matter to psychology and neuroscience?

bc we have no alternative to universal computation as the explanation for how a single computing machine (mind/brain) can implement the APPARENTLY UNBOUNDED VARIETY OF COMPLEX FUNCTIONS that are within human capacity

what is the best action to take in a given state?

best action is the one that maximizes EXPECTED CUMULATIVE FUTURE DISCOUNTED REWARD

what makes thought (& intelligence) possible?

computation

"the analytical engine"

designed to take paper cards that specified diff functions for it to compute -inspired by the paper cards of Jacquard loom weaving machines

discount factor in reinforcement learning

determines how much the organism cares about future reward relative to immediate reward
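
A small sketch of how the discount factor weights future reward; the reward sequence is made up:

```python
# Discounted return: sum of gamma^t * r_t over the reward sequence
rewards = [0, 0, 0, 10]   # made-up sequence; the big reward arrives late

def discounted_return(rewards, gamma):
    return sum(r * gamma**t for t, r in enumerate(rewards))

print(discounted_return(rewards, gamma=0.9))  # ~7.29: the late reward still matters
print(discounted_return(rewards, gamma=0.1))  # 0.01: the organism barely cares about it
```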

what shows us connections among brain regions

diffusion tensor imaging of white-matter tracts

2) algorithmic solution

effective computational algorithm for solving this learning problem is based on: -separate reward functions (presumably innate part of organism) from value functions that estimate expected future rewards for taking particular actions in particular states, and -update these estimates based on reward prediction errors

1) functional problem

even if agents are endowed w reward functions telling them what things (states) are good, they need a mechanism for learning behaviors that are effective in obtaining those good states --> learning mechanisms must solve the temporal credit assignment problem

extrinsic rewards

external force of motivation

Ada Lovelace's notes

foreshadowed the fundamental theory of computation developed 100 yrs later by Turing

DeepMind

had artificial intelligence breakthrough and google bought it for $617 million

intrinsic motivation and cognitive rewards

higher mammals (humans) are motivated to play and explore -internal reward functions may reward learning and exploration itself-->rewarding to experience "error signals" as we make predictions in the world--not just about reward but about how world works in general

we can use optimal Q value to select best action

if there are n actions available in current state s, and we had this value for each action, we could just pick the action with the highest Q value
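
A sketch of that selection rule with made-up Q values for one state:

```python
# Pick the action with the highest Q value in the current state (numbers made up)
q = {"left": 0.1, "right": 0.7, "forward": 0.4}   # Q(s, a) for each available action

best_action = max(q, key=q.get)
print(best_action)  # -> "right"
```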

Q-learning

kind of error-driven learning 1) current estimate 2) actual reward "received" at time t 3) what the estimate should be according to this sample of experience and our estimate of future reward from s_t+1 4) subtracting out the current estimate 5) learning rate (from 0 to 1)--says how fast to learn
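
A minimal sketch of the Q-learning update built from those five pieces; the states, actions, and single step of experience are made up:

```python
from collections import defaultdict

Q = defaultdict(float)     # current estimates Q(s, a), initialized to 0
alpha, gamma = 0.1, 0.9    # learning rate and discount factor

def q_update(s, a, r, s_next, actions_next):
    """One Q-learning step after taking action a in state s, receiving reward r,
    and arriving in s_next where actions_next are available."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions_next)  # what the estimate "should" be
    delta = target - Q[(s, a)]                                        # prediction error
    Q[(s, a)] += alpha * delta                                        # adjust toward the target

q_update("s1", "right", 1.0, "s2", ["left", "right"])   # one made-up step of experience
print(Q[("s1", "right")])                                # 0.1 after one update
```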

way of behaving

mapping of states to actions

is every conceivable function computable?

no

value functions vs reward functions

not the same, although both map states to quantities

how should we (as engineers) or evolution design an organism that acts effectively in the world?

one possibility: "wire-in" innate stimulus-action mappings -when appendage pain sensors activate, withdraw appendage immediately -swim in direction of most shrimp -when shadow overhead, run like crazy into a hole

quantity value

particular way of formalizing subjective value or utility in the context of sequential decision making--making multiple choices over time -we want algorithms that learn what the good actions are

algorithm

procedure that generates the specified mapping relation -can be physically realized in multiple ways -palindrome algorithm

computation

refers to the execution of a computational procedure, called an "algorithm" -physical processes transforming physical symbols (patterns coding information) -doesn't depend on special properties specific to neurons and electronic digital circuits

what is reward?

reward function is a mapping from states to quantities

reinforcement learning (RL) is concerned with ?

sequential decision making--making good decisions over time--in uncertain (probabilistic) environments -emphasized the sequential part, "made up" reward functions

what quantity is the Q-learning algorithm (and all reinforcement learners) trying to maximize?

sum of (discounted) rewards

problem that search algorithms face

the combinatoric, exponential explosion of possible futures -search spaces can grow exponentially -ex: the tree of lunch

what is the prediction error?

the difference between the current value estimate and a value estimate that takes into account the reward actually received

objective value

things that have publicly shared standards for how valuable they are -ex: money

Computational theory of mind

thinking/all mentation is type of computation

reward-based learning system in the brain

tracing out the neural circuitry that might underlie these computations

computations for classical conditioning

updating a value function using reward prediction error, with actions taken out

How did TD-Gammon represent the value function?

used a neural net with single hidden layer

intrinsic rewards

we are infovores--the rewards of learning

what does it mean to learn how to behave? what is a "way of behaving?"

we formalize ways of behaving as MAPPINGS FROM STATES TO ACTIONS --> CALLED POLICIES

optimal Q-value function

we have solved problem of learning how to behave -we replace problem of learning the optimal policy w prob of learning the optimal Q-value function -the expected future discounted reward of taking different actions in different states, then behaving optimally (following the optimal policy) thereafter

what is the VALUE of a state, or state and action?

we want algorithms to learn what the good actions are

computations for operant conditioning

with actions included-updating a value function using reward prediction error

