Psych 1X03: Instrumental Conditioning
CS ≠ SD
CS: paired with US, elicits a reflexive, involuntary response. SD: paired with a response-reinforcer outcome, sets the occasion for a response but does not reflexively elicit it.
fixed ratio schedules
Reinforcing a behavior after a specific number of responses have occurred.
fixed interval schedules
Reinforcing a behavior after a specific period of time has elapsed.
variable ratio schedules
Reinforcing the behavior after an unpredictable number of responses.
variable interval schedules
Reinforcing the behavior after an unpredictable period of time has elapsed.
Thorndike results
frequency of random behaviours before solving the puzzle decreased gradually rather than rapidly, there was never a distinct "aha!" moment
secondary reinforcer
becomes reinforcing only through previous learning (association with a primary reinforcer), ex. money/grades/stickers
generalization
stimuli similar to the SD can also signal that the contingency is valid, to a degree that depends on how similar they are
negative contrast
when switched from high to low reward, response is slower than group accustomed to low reward
positive contrast
when switched from low to high reward, response is faster than group accustomed to high reward
"stamping in"
behaviours followed by a favourable consequence are performed more frequently, ex. cat pulls the string and gets food
autoshaping
Refers to experiments in which an apparatus allows an animal to control its reinforcements through behaviors. The animal, in a sense, is shaping its own behavior.
chaining
a response is reinforced with the opportunity to perform next response
reinforcer
anything that increases the probability of response being emitted again in the future
positive punishment
arrival of aversive stimulus follows response, decreases probability of response occurring again
positive reinforcement
arrival of stimulus following response increases probability response will occur again, reward training
difference between shaping and autoshaping
autoshaping: rewards some behaviour that animal may do spontaneously, shaping: rewards successive steps towards behaviour that animal normally would not do
partial reinforcement schedules
based on either number of responses or time
continuous reinforcement
behaviour is continuously reinforced on every single trial, extinction comes faster
partial reinforcement
behaviour is reinforced on only some trials rather than every trial, extinction comes slower
Law of Effect
behaviours with positive consequences are performed more frequently (stamped in) whereas those with negative consequences are stamped out
extinction of behaviour via instrumental conditioning
can occur, the response is weakened when the contingency between response and reinforcer is no longer valid
contrast effects
changes in the value of a reward lead to shifts in response rate
shaping by successive approximation
complex behaviour is taught in steps that gradually build up to full response we want to condition (used by animal trainers)
SΔ (S-delta), also written S-
contingency between response and reinforcement is "off"
SD+
contingency between response and reinforcement is "on"
classical conditioning and punishment
punishment creates a contingency between the punishment and the parent, the parent may become a signal for pain/distress, which can damage the parent-child relationship
operant
describes behaviour in instrumental conditioning: voluntary actions operate on environment to produce change leading to specific consequence
fixed
the required number of responses or length of time is held constant across trials (ex. FR-1, FI-1)
primary reinforcers
have intrinsic value ex. access to food/water/mate
S+
informs you of what COULD happen
CS-
informs you of what WILL NOT happen
CS+
informs you of what WILL happen
S-
informs you that the response-reinforcer contingency is not in effect
instrumental conditioning
learning contingency between voluntary behaviours and their consequences
overjustification
a newly introduced reward for a previously unrewarded task can alter an individual's perception of the task, the previously perceived intrinsic value of the task becomes extrinsic
ratio
number of responses determines when reinforcement is given (ex. FR-10 = reinforcement given every 10 responses)
fixed ratio graph
pause and run, consequences are delivered following a specific number of behaviors
Skinner's shaping
ping pong pigeons, used shaping by successive approximation
variable ratio graph
positive slope, reinforcement delivered after random number of responses, no pauses because anticipation of reward is constant therefore behaviour is constant
variable interval graph
positive slope, reinforcement can become available at any time, no pauses because anticipation of reward is constant therefore behaviour is constant
punishment
presentation of a negative reinforcer (aversive stimulus) after response, decreases frequency of behaviour (controversial)
reward training
presentation of positive stimulus/reinforcer after response, increases frequency of behaviour
"stamping out"
random behaviours are performed less frequently, ex. turning in a circle
negative punishment
removal of appetitive stimulus decreases probability of response occurring again
negative reinforcement
removal of aversive stimulus following response increases probability response will occur again, escape training
escape training
removal of negative reinforcers, increases frequency of behaviour
Omission training
remove positive reinforcer, decrease frequency of behaviour
fixed interval graph
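scalloped, reinforcement delivered for a response after a specific period of time has elapsed, responding picks up as the end of the interval approaches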
discriminative stimulus
signals when a contingency between response and reinforcement is "on" or "off", indicates validity of contingency, leads to better discrimination
Thorndike
studied cats in puzzle boxes, observed overt behaviour - how long to pull string and escape box
ratio strain
the limit to how stingy a ratio can be: if too many responses are required per reinforcer, responding breaks down
interval
time elapsed since the last reinforced response determines when reinforcement is given (ex. FI-1 = a response is reinforced once every minute)
time out procedure
used by schools and parents, take away positive reinforcers
variable
variable number of responses/length of time, random around a mean, more resistant to extinction
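Worked example (not from the course): the four schedules above (fixed/variable x ratio/interval) can be sketched as a small simulation. This is a minimal sketch under stated assumptions. The `Schedule` class, its parameter names, and the uniform distributions used for the "variable" case are all hypothetical choices made for illustration; the notes only say that variable schedules are "random around a mean".

```python
import random


class Schedule:
    """Toy reinforcement schedule: decides if a response earns a reinforcer."""

    def __init__(self, basis, value, variable=False, seed=0):
        self.basis = basis          # "ratio" (responses) or "interval" (seconds)
        self.value = value          # fixed requirement, or the mean if variable
        self.variable = variable
        self.rng = random.Random(seed)   # seeded so runs are repeatable
        self.count = 0              # responses since the last reinforcer
        self.last_time = 0.0        # time of the last reinforcer (session starts at 0)
        self.requirement = self._next_requirement()

    def _next_requirement(self):
        if not self.variable:
            return self.value
        if self.basis == "ratio":
            # Assumption: uniform over 1 .. 2*mean - 1, which averages to the mean.
            return self.rng.randint(1, 2 * self.value - 1)
        # Assumption: uniform over 0 .. 2*mean seconds, which averages to the mean.
        return self.rng.uniform(0, 2 * self.value)

    def respond(self, t):
        """Register one response at time t; return True if it is reinforced."""
        self.count += 1
        if self.basis == "ratio":
            earned = self.count >= self.requirement
        else:
            earned = (t - self.last_time) >= self.requirement
        if earned:
            self.count = 0
            self.last_time = t
            self.requirement = self._next_requirement()
        return earned


# FR-10: every 10th response is reinforced.
fr10 = Schedule("ratio", 10)
print([n for n in range(1, 31) if fr10.respond(float(n))])        # [10, 20, 30]

# FI-60 s: responses every 10 s; only the first response after 60 s pays off.
fi60 = Schedule("interval", 60)
print([t for t in range(10, 190, 10) if fi60.respond(float(t))])  # [60, 120, 180]
```

Passing `variable=True` gives the VR or VI version: the requirement is redrawn after each reinforcer, so the subject cannot predict which response will pay off. This matches the steady responding without pauses described on the variable ratio graph and variable interval graph cards.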