Instrumental Conditioning
Autoshaping
contingency that is learned without any guidance - organism shapes its own behaviour
Discriminative Stimuli
environmental cues that suggest when a contingency is valid
CS+ v/s SD
- a CS+ will automatically elicit a response - an SD only sets the occasion; a response will not occur unless the subject voluntarily performs the behaviour
CS- v/s Sδ
- a CS- will automatically fail to elicit a response because the organism knows the contingency is invalid (even if the organism wanted to, it could not produce the response) - an Sδ tells the organism that the contingency is invalid, but the organism still has a choice whether to respond or not (it can respond even in the presence of the Sδ)
Instrumental Conditioning v/s Human Learning
- Humans have an "Aha!" moment, after which there is a steep drop in incorrect responses and a steep rise in correct responses - in instrumental conditioning, there is a gradual decrease in incorrect responses and a gradual increase in correct responses
B.F. Skinner
- another psychologist with a major influence on instrumental conditioning - built a variant of the puzzle box (the operant chamber, or Skinner box), except this time pressing the lever would directly deliver food
Cat in a Box Experiment
- place a hungry cat inside a puzzle box with a latched door - when the cat performs a certain action (usually pulling a lever), the door opens and the cat can reach food - as trials go on, it takes the cat less and less time to produce the correct response
Stimuli and Instrumental Conditioning
- stimuli associated with the environment act as occasion setters for many possible voluntary actions
Why is Variable Interval hard to extinguish?
- the subject doesn't know the required time interval, so it keeps responding - because the schedule is based on a time interval rather than a set number of responses, it takes the subject longer to realize that there won't be any more reinforcement
Schedules of Reinforcement
1. Continuous Reinforcement (CRF)
2. Partial Reinforcement (PRF)
   a. Ratio
      i. Fixed Ratio (FR-5)
      ii. Variable Ratio (VR-5)
   b. Interval
      i. Fixed Interval (FI-5)
      ii. Variable Interval (VI-5)
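Not from the course material: below is a minimal Python sketch, under simplifying assumptions, of how each partial schedule decides when a response earns reinforcement. The simulate helper, its parameters (p_respond, steps), and the toy model of a subject responding at random are illustrative inventions, not standard lab software.

```python
import random

def simulate(schedule, param, steps=1000, p_respond=0.5, seed=0):
    """Toy session: at each time step the subject responds with probability
    p_respond. Ratio schedules (FR/VR) reinforce after a number of responses;
    interval schedules (FI/VI) reinforce the first response made after some
    time has elapsed since the last reinforcer. Returns reinforcers earned."""
    rng = random.Random(seed)
    draw = lambda: rng.randint(1, 2 * param - 1)       # unpredictable requirement, mean ~ param
    requirement = draw() if schedule in ("VR", "VI") else param
    responses = elapsed = reinforcers = 0              # counters since the last reinforcer
    for _ in range(steps):
        elapsed += 1
        if rng.random() < p_respond:                   # a voluntary response occurs
            responses += 1
            if schedule in ("FR", "VR"):
                earned = responses >= requirement      # ratio: count responses
            else:
                earned = elapsed >= requirement        # interval: count elapsed time
            if earned:
                reinforcers += 1
                responses = elapsed = 0
                if schedule in ("VR", "VI"):
                    requirement = draw()               # redraw the hidden requirement
    return reinforcers

for sched in ("FR", "VR", "FI", "VI"):
    print(f"{sched}-5: {simulate(sched, 5)} reinforcers in 1000 time steps")
```

Because the toy subject responds at a constant rate, this sketch does not capture post-reinforcement pauses or FI scalloping; it only shows that on ratio schedules the reinforcement earned tracks the number of responses, while on interval schedules it is capped by elapsed time.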
Post-Reinforcement Pause
A pause in responding that typically occurs after the delivery of the reinforcer on FR schedules
Ratio Strain
As the number of responses required (on an FR schedule) increases, the length of the post-reinforcement pause increases
Punishment
An event that decreases the behaviour it follows - decreases the probability of the behaviour
Classical v/s Instrumental v/s Observational
Classical - learning relations between stimuli - elicited behaviour triggered by a stimulus
Instrumental - learning the relation between behaviour and its consequences - voluntary behaviour emitted by the subject
Observational - learning the relation between behaviour and its consequences - involves no immediate change in behaviour
Positive Punishment / Punishment
Decreasing a behaviour by presenting a negative consequence/stimulus following a response
Negative Punishment / Omission Training
Decreasing a behaviour by removing a positive stimulus after the response
Edward L. Thorndike
First psychologist to study instrumental conditioning (the training of voluntary responses); proposed the Law of Effect
Time Delay
In building a contingency, presenting the reinforcer immediately after the response is the fastest way to establish it
Positive Reinforcement / Reward Training
Increasing behaviors by presenting positive stimuli, such as food. A positive reinforcer is any stimulus that, when presented after a response, strengthens the response.
Negative Reinforcement / Escape Training
Increasing behaviors by stopping or reducing negative stimuli, such as shock. A negative reinforcer is any stimulus that, when removed after a response, strengthens the response. (Note: negative reinforcement is not punishment.)
Shaping by Successive Approximations
Reinforcing behaviours that aren't exactly the target behaviour but that are progressively closer versions of it - used to train complex behaviours - gradually approaches the target behaviour - reduces acquisition time
Fixed Ratio (FR)
Reinforcement occurs following every specified number of responses - quickest to extinguish - quickest to form a contingency - subject will show post-reinforcement pauses - procrastination before starting another set of responses - the smaller the required number of responses, the quicker the contingency forms - FR-1 = Continuous Reinforcement - FR-5: reinforcement follows every 5 correct responses
Variable Interval (VI)
Reinforcement occurs following the first response after an unpredictable (average) amount of time - results in the steadiest rate of responding - most resistant (hardest) to extinguish - hardest schedule on which to form a contingency
Fixed Interval (FI)
Reinforcement follows the first response made after a fixed duration of time has elapsed - the interval is timed from the LAST reinforcement: responses made before it elapses earn nothing, and the first response after it earns the next reinforcer - responses are produced at a low rate at the beginning of the interval and increase toward the end of the interval (like procrastinating until test time) - FI-5: the subject must wait 5 minutes after each reinforcement before a response can earn another one
Aversive Stimuli
Something that an organism will avoid. (something unpleasant)
Generalization
Stimuli similar to the SD can trick the organism into thinking the contingency is valid, so the organism will perform the desired behaviour
Why are Variable Schedules Linear?
The subject doesn't know what the required number of responses is or what the time interval is, so it keeps responding at a steady rate in the hope of reaching the requirement
Overjustification Effect
The effect of promising a reward for doing what one already likes to do. The person may now see the reward, rather than intrinsic interest, as the motivation for performing the task. - before, the person performed the response because they found it fun/enjoyable (intrinsic value) - once you start giving a reward, they think they are performing the response for the reward (extrinsic value) - stop the reward and the subject will stop the response
Variable Ratio (VR)
The reinforcer is delivered only after a variable (unpredictable) number of responses has occurred - the number of responses falls within some range around a mean/average - produces steady rates of responding - the smaller the variable mean, the steeper the slope of the cumulative response line - longer VR = harder to extinguish - shorter VR = quicker to learn - VR-5: reinforcements given after 4 responses, 6 responses, 2 responses, 8 responses (average = 5)
Law of Effect
Thorndike's principle that behaviors followed by favorable consequences become more likely, and that behaviors followed by unfavorable consequences become less likely - Behaviours with positive consequences are "stamped in" - Behaviours with negative consequences are "stamped out"
Interval Schedule
based on the time elapsed since the last reinforcement
Observational Learning
change in behavior due to watching other people behave - we imitate or avoid behaviour based on the consequences we observe happening to others
Contrast Effects
changes in the value of the reward lead to shifts in the response rate - an organism given a high reward will respond more than an organism given a low reward
Instrumental/Operant Conditioning
explicit training of the relation between voluntary behaviours and their consequences - a contingency forms between the behavioural response and the reinforcer - the subject's behaviour directly causes the satisfying/unsatisfying consequences - an association between stimuli and voluntary behaviour
Primary Reinforcers
have intrinsic value - reinforcers that are innately rewarding, such as food, water, and rest; their natural properties are reinforcing
Reinforcer
in operant conditioning, any event that strengthens the behavior it follows - increases the probability of the response
Continuous Reinforcement (CRF)
reinforcer follows EVERY correct response
Partial Reinforcement (PRF)
reinforcer follows only some of the responses - splits into Ratio and Interval schedules
Ratio Schedule
based on the number of responses made
Negative Discriminative Stimulus (S-)(Sδ)
signal that shows that the contingency between response and reinforcer is invalid - signals that the reinforcer will NOT be present if the target behaviour is performed - no need to perform the reinforced behaviour
Positive Discriminative Stimulus (S+)(SD)
signal that shows that the contingency between response and reinforcer is valid - informs you of what COULD happen if you perform a certain behaviour - signals that the reinforcer will be present if the target behaviour is performed
Appetitive Stimuli
something that produces satisfaction when received
Delay of Gratification
sometimes the reinforcer cannot be presented right away - we have the ability to produce the response and wait for the reinforcer
Break Point
the subject stops responding because the required number of responses (on an FR schedule) is so large that the subject assumes there won't be any reinforcement
Positive Contrast
switching from a low reward to a high reward - organism will now respond at a faster rate - faster than an organism that was always given a high reward
Secondary Reinforcers
used to obtain other items that are natural reinforcers - things we have learned to value, such as praise or money
Chaining
using operant conditioning to teach a complex response by linking together less complex skills - a response is reinforced with the opportunity to perform the next response
Negative Contrast
when a high reward is switched to a low reward - organism will now respond at a slower rate - slower than an organism that was always given low reward