Instrumental Conditioning

Autoshaping

a contingency that is learned without any guidance - the organism shapes its own behaviour

Discriminative Stimuli

environmental cues that suggest when a contingency is valid

CS+ v/s SD

- CS+ automatically elicits a response - SD only sets the occasion; no response occurs unless the subject voluntarily performs the behaviour

CS- v/s Sδ

- CS- automatically fails to elicit a response because the contingency is invalid (even if the organism wanted to, it could not produce the conditioned response) - Sδ signals that the contingency is invalid, but the organism still has a choice whether or not to respond (it can respond even in the presence of the Sδ)

Instrumental Conditioning v/s Human Learning

- Humans have an "Aha!" moment, after which there is a steep drop in incorrect responses and a steep rise in correct responses - in instrumental conditioning, there is a gradual decrease in incorrect responses and a gradual increase in correct responses

B.F Skinner

- another psychologist with a major influence on instrumental conditioning - ran a version of the box experiment (the operant chamber, or Skinner box), except this time pulling the lever would directly result in food

Cat in a Box Experiment

- place a hungry cat inside a puzzle box with a latched door - when the cat performs a certain action (usually pulling a lever), the door opens and the cat can reach food - as trials go on, it takes the cat less and less time to produce the correct response

Stimuli and Instrumental Conditioning

- stimuli associated with the environment act as an occasion setter for many possible voluntary actions

Why is Variable Interval hard to extinguish?

- the subject doesn't know the required time interval, so it keeps responding - because it is a time interval rather than a set number of responses, the subject takes longer to realize that there won't be any more reinforcements

Schedules of Reinforcement

1. Continuous Reinforcement (CRF)
2. Partial Reinforcement (PRF)
   a. Ratio
      i. Fixed Ratio (FR-5)
      ii. Variable Ratio (VR-5)
   b. Interval
      i. Fixed Interval (FI-5)
      ii. Variable Interval (VI-5)
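
The four PRF schedules are just decision rules over response counts and elapsed time, so a small simulation can make the differences concrete. This is a minimal sketch, assuming made-up function names (`required`, `simulate`) and simple ways of drawing the variable requirements; it is illustrative only, not a standard model from the course material.

```python
import random

def required(schedule, n):
    """Draw the requirement for the next reinforcer (illustrative only).
    Ratio schedules count responses; interval schedules count seconds."""
    if schedule == "FR":   # Fixed Ratio: exactly n responses
        return n
    if schedule == "VR":   # Variable Ratio: unpredictable count averaging n
        return random.randint(1, 2 * n - 1)
    if schedule == "FI":   # Fixed Interval: exactly n seconds
        return n
    if schedule == "VI":   # Variable Interval: unpredictable wait averaging n
        return random.uniform(0, 2 * n)

def simulate(schedule, n, response_times):
    """Count reinforcers earned for responses made at the given times."""
    target = required(schedule, n)
    count = reinforcers = 0
    last_reinforced_at = 0.0
    for t in response_times:
        count += 1
        ratio_met = schedule in ("FR", "VR") and count >= target
        interval_met = schedule in ("FI", "VI") and t - last_reinforced_at >= target
        if ratio_met or interval_met:
            reinforcers += 1                      # deliver the reinforcer
            count, last_reinforced_at = 0, t      # reset for the next cycle
            target = required(schedule, n)
    return reinforcers

# FR-5: 20 evenly spaced responses earn 4 reinforcers (every 5th response)
print(simulate("FR", 5, [i * 1.0 for i in range(1, 21)]))
```

Under this toy rule, FR-1 reduces to continuous reinforcement, and the variable schedules differ from the fixed ones only in that the requirement is redrawn around the mean after each reinforcer.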

Post-Reinforcement Pause

A pause in responding that typically occurs after the delivery of the reinforcer on FR schedules

Ratio Strain

As the number of responses required (on an FR schedule) increases, the length of the post-reinforcement pause increases

Punishment

An event that decreases the behaviour it follows - decreases the probability of the behaviour

Classical v/s Instrumental v/s Observational

Classical - learning relations between stimuli - elicited behaviour triggered by a stimulus
Instrumental - learning the relation between behaviour and its consequences - voluntary behaviour emitted by the subject
Observational - learning the relation between behaviour and its consequences by watching others - involves no immediate change in behaviour

Positive Punishment / Punishment

Decreasing a behaviour by presenting a negative consequence/stimulus following a response

Negative Punishment / Omission Training

Decreasing a behaviour by removing a positive stimulus after the response

Edward L. Thorndike

First psychologist to study instrumental conditioning (training of voluntary responses); proposed the Law of Effect

Time Delay

In building a contingency, presenting the reinforcement immediately after the response is the fastest way to form the association

Positive Reinforcements / Reward Training

Increasing behaviors by presenting positive stimuli, such as food. A positive reinforcer is any stimulus that, when presented after a response, strengthens the response.

Negative Reinforcements / Escape Training

Increasing behaviors by stopping or reducing negative stimuli, such as shock. A negative reinforcer is any stimulus that, when removed after a response, strengthens the response. (Note: negative reinforcement is not punishment.)

Shaping by Successive Approximations

Reinforcing behaviours that aren't exactly the target behaviour but are progressively closer versions of it - used to train complex behaviours - gradually gets closer to the target behaviour - reduces acquisition time

Fixed Ratio (FR)

Reinforcement occurs after every specified number of responses - quickest to go extinct - quickest to form a contingency - subject will show post-reinforcement pauses (procrastination before starting another set of responses) - the smaller the required number of responses, the quicker the contingency forms - FR-1 = Continuous Reinforcement - FR-5: reinforcement follows every 5 correct responses

Variable Interval (VI)

Reinforcement follows the first response after an unpredictable (average) amount of time - results in the steadiest rate of responding - most resistant (hardest) to extinction - hardest to form a contingency

Fixed Interval (FI)

Reinforcement is delivered for the first response after a fixed duration of time has elapsed - after a reinforcement, the fixed amount of time (measured from the LAST reinforcement) must pass before another response can be reinforced - responses occur at a low rate at the beginning of the interval and increase toward the end of the interval ("procrastinating until test time") - FI-5: the subject must wait 5 minutes between reinforcements; the first response after that wait is reinforced

Aversive Stimuli

Something that an organism will avoid. (something unpleasant)

Generalization

Stimuli similar to the SD can trick the organism into thinking the contingency is valid, so the organism will perform the desired behaviour

Why are Variable schedules linear?

The subject doesn't know what the required number of responses or the time interval is, so it keeps responding at a steady rate in the hope of reaching the requirement

Overjustification Effect

The effect of promising a reward for doing what one already likes to do. The person may now see the reward, rather than intrinsic interest, as the motivation for performing the task. - before, the person performed the response because they found it fun/enjoyable (intrinsic value) - once you start giving them a reward, they think they are performing the response for the reward (extrinsic value) - stop the reward and the subject will stop the response

Variable Ratio (VR)

Reinforcement is delivered only after a variable (unpredictable) number of responses has occurred - the number of responses falls within some range around a mean/average - produces steady rates of responding - the smaller the mean, the steeper the slope of the cumulative-response line - longer VR = harder to extinguish - shorter VR = quicker to learn - VR-5: reinforcements given after 4 responses, 6 responses, 2 responses, 8 responses (average = 5)

Law of Effect

Thorndike's principle that behaviors followed by favorable consequences become more likely, and that behaviors followed by unfavorable consequences become less likely - Behaviours with positive consequences are "stamped in" - Behaviours with negative consequences are "stamped out"

Interval Schedule

based on the time elapsed since the last reinforcement

Observational Learning

change in behavior due to watching other people behave - we imitate or avoid behaviour based on the consequences we observe happening to others

Contrast Effects

changes in the value of the reward lead to shifts in the response rate - an organism given a high reward will respond more than an organism given a low reward

Instrumental/Operant Conditioning

explicit training of the relation between voluntary behaviours and their consequences - a contingency forms between the behavioural response and the reinforcer - the subject's behaviour directly causes the satisfying/unsatisfying consequences - association between stimuli and voluntary behaviour

Primary Reinforcers

Reinforcers that have intrinsic value, such as food, water, and rest; their natural properties are reinforcing.

Reinforcer

in operant conditioning, any event that strengthens the behavior it follows - increases the probability of the response

Continuous Reinforcement (CRF)

reinforcer follows EVERY correct response

Partial Reinforcement (PRF)

reinforcer follows only some of the responses - splits into Ratio and Interval schedules

Ratio Schedule

based on the number of responses made

Negative Discriminative Stimulus (S-)(Sδ)

signal showing that the response-reinforcer contingency is invalid - signals that the reinforcer will NOT be presented if the target behaviour is performed - no need to perform the reinforced behaviour

Positive Discriminative Stimulus (S+)(SD)

signal showing that the response-reinforcer contingency is valid - informs you of what COULD happen if you perform a certain behaviour - signals that the reinforcer will be presented if the target behaviour is performed

Appetitive Stimuli

something that produces satisfaction when received

Delay of Gratification

sometimes the reinforcer cannot be presented right away - we have the ability to produce the response and wait for the reinforcer

Break Point

the subject stops responding because the required number of responses (on an FR schedule) is so large that the subject assumes there won't be a reinforcement

Positive Contrast

switching from a low reward to a high reward - the organism will now respond at a faster rate - faster than an organism that was always given a high reward

Secondary Reinforcers

used to obtain other items that are natural reinforcers - things we have learned to value, such as praise or money

Chaining

using operant conditioning to teach a complex response by linking together less complex skills - a response is reinforced with the opportunity to perform the next response

Negative Contrast

when a high reward is switched to a low reward - the organism will now respond at a slower rate - slower than an organism that was always given a low reward

