Psy 116 Learning: Instrumental Conditioning
Application of Appetitive Conditioning: Behavior Modification
The use of reinforcement and nonreinforcement to control people's behavior--> contingency management WATCH TED TALK
Behavioral Economics: Choice Behavior & Immediate vs. Delayed Reinforcers
*Choice Behavior* -Behavioral Allocation--> bliss point and contingencies *Importance of timing and immediate vs. delayed reinforcers* -Immediate reinforcers are hard to turn down -ex. go watch your favorite show, or turn off the TV and study?
What is a difference between the associations formed in operant and classical conditioning?
*Classical conditioning* forms associations between the stimuli (*CS and US*) *Operant conditioning* forms an association between *behaviors and the resulting events*, *R-O*
What is Skinner's contribution to Operant Conditioning?
*Contingency* -the specified relationship between behavior and reinforcement -behavioral modification -the environment determines the contingencies (environment determines the relationship between behavior and reinforcement) *Reinforcer* -An event (or termination of an event) that increases the probability of behavior -Defined merely by its effect on future behavior -If an event doesn't increase behavior--> it is not a reinforcer *Goal* -To influence (predict and control) behavior -Skinner's goal--> predict and control behavior
What are reinforcement schedules? Two basic schedules?
*Continuous Reinforcement* -Reinforces the desired response each time it occurs ex. every time lever pressed--> reinforcement *Partial Reinforcement* -Reinforces a response only part of the time -Results in slow acquisition in the beginning, but shows greater resistance to extinction later on -more realistic in real life because continuous reinforcement is not normal -anything not continuous; part of responses being reinforced
Discontinuance of Reinforcement
*Extinction* The elimination or suppression of a response caused by the discontinuation of reinforcement or the removal of the unconditioned stimulus -When reinforcement is first discontinued, the rate of responding remains high --> Under some conditions, it even increases
Why do we get the PRE?
*Frustration Theory* -In animals that received continuous reward, the elicitation of an anticipatory frustration response during extinction leads to a rapid suppression of responding -In animals that received intermittent reward, anticipation of non-reward does not produce sufficient frustration to suppress responding early in acquisition, due to the low associative strength -when reward then follows non-reward during acquisition, the anticipatory frustration response becomes associated with responding--> the animal learns to persist when it experiences frustration -the anticipatory frustration response is smaller/lower, but it is already conditioned to lever pressing, so it maintains responding -because of a partial schedule--> the animal gets conditioned to the partial schedule *Sequential Theory* -If reward follows nonreward, the animal will associate the memory of the nonrewarded experience with the operant or instrumental response -During extinction, the only memory present after the first nonrewarded experience is that of the unrewarded experience --> Animals receiving continuous reward do not experience nonrewarded responses, and so they do not associate nonrewarded responses with later reward--> the memory of receiving a reward after persistence in the face of nonreward becomes a cue for continued responding -the schedule dictates the association the animal is going to make
How do we measure operant conditioning?
*Operant conditioning* -Evidence of learning is the frequency and consistency of responding -measuring frequency and consistency in behavior *Operant chamber* -An enclosed environment with a bar on the side wall used for the study of operant behavior within it *Cumulative recorder* -Developed by Skinner; a pen attached to a recorder enabling the measurement of rates of behavior
The Features of a Fixed Ratio Schedule
*Post-Reinforcement pause* -A pause in behavior following reinforcement on a ratio schedule, which is followed by resumption of responding at the intensity characteristic of that ratio schedule -The higher the number of responses needed to obtain reinforcement, the more likely a post-reinforcement pause will occur -The higher the ratio schedule, the longer the pause -The greater the satiation, the longer the pause -ex. a rat on a 200-response schedule is going to pause longer than a rat on a 100-response schedule
What is a theory of Appetitive Conditioning?
*Premack's Probability-Differential Theory* -Reinforcers have a value that is calculated relative to other behaviors depending on the probability of the behavior -Reinforcers can be anything (not just traditional natural reinforcers ex. food), including activities -Any activity or behavior that has a higher probability can serve as a reinforcer for an activity or behavior with a lower probability **organisms will do a lower probability behavior in order to gain access to or engage in the higher probability behavior (see also *Response Deprivation*)
What are the different types of reinforcers?
*Primary Reinforcer* an activity whose reinforcing properties are innate (natural reinforcers i.e. food, sex, water, safety, or comfort, love or bonding) -natural; innate; unlearned -do not have to be trained to be liked *Secondary Reinforcer* an event that has developed its reinforcing properties through its association with primary reinforcers (classically conditioned to occur with or provide access to primary reinforcers) -associated with primary reinforcers (via classical conditioning) -ex. money (can get food, sex, love if your life sucks) ***Secondary reinforcers gain control over behavior through their associations with primary reinforcers* *Factors that affect behavior (rft)* -How often? --> contiguity, frequency -How much behavior? --> amount of work or effort to obtain the reinforcer (incentive value) -How hard is that response for you to do? the harder it is to press the lever--> the harder it is to learn -not just about frequency and time anymore--> have to think about cost and benefit -incentive value--> how valuable it is directs how much you'll work for it -How valuable?--> reward magnitude (how good is the thing) & How much is it worth to you?
What is Premack's principle effective in?
*Producing behavior changes* -ex. can get a kid (or yourself) to do a less probable (less desirable) behavior in order to gain access to a more probable (more desirable) behavior --> get the kid to take out the trash so that they can play a video game --> eat vegetables before getting dessert
How do we get a decrease in studying behavior?
*Punishment* Positive -Add something bad ex. a student is shamed in class by the professor for not knowing the answer to a question Negative -Take away something good ex. the students' chance to increase their grade with extra credit is taken away by the professor because they are upset by the low class average
Applying Operant Conditioning: How do we get a decrease in lever pressing behavior?
*Punishment* Positive -Add something bad ex. the rat activates a floor shock every time it presses a lever Negative -Take away something good ex. the rat loses access to the water bottle for 5 minutes in the cage every time it presses the lever
Applying Operant Conditioning: How do we get an increase in lever pressing behavior?
*Reinforcement* Positive -Add something good ex. a hungry rat lever pressing for food Negative -Take away something bad ex. a lever press removes a shock from the floor
How do we get an increase in studying behavior?
*Reinforcement* Positive -Add something good ex. the student gets a piece of candy when they print the class notes Negative -Take away something bad ex. the student reduces the impact of their first bad grade by increasing their grade on the current test by studying
How readily is an Instrumental or Operant Response learned?
*The Importance of Contiguity* Reinforcers can lead to the acquisition of an instrumental response if it immediately follows the behavior -Learning is impaired if reinforcer delivery is delayed The Effect of Delay -The presence of a secondary reinforcer can bridge the interval and reduce the impact of delay --> when these cues are not present, even short intervals produce little conditioning
Changes in Reward Magnitude
*The Importance of Past Experience* Depression effect -The effect in which a shift from high to low reward magnitude produces a lower level of response than if the reward magnitude had always been low--> aka *negative contrast* **Experience the good life, and when it changes, responding drops below baseline Elation effect -The effect in which a shift from low to high reward magnitude produces a greater level of responding than if the reward magnitude had always been high--> aka *positive contrast* -a contrast effect lasts only a short time, then responding returns to the level appropriate for the reward magnitude -Frustration seems to play a role in the negative contrast effect -an emotional response of elation may explain the positive contrast effect **ABOUT COMPARISON (for a little bit...)
What is evidence for both Response Deprivation & Premack?
*You can switch reinforcer value by changing motivation* -You can get a water deprived rat to run more on the wheel if running more means that it gets access to water -Can get a rat that is not water deprived to drink more water if drinking means it gets to run on the wheel
What are immediate and delayed reinforcers?
*immediate reinforcer* -a reinforcer that occurs instantly after a behavior ex. a rat gets a food pellet for a bar press -always more effective *delayed reinforcer* -a reinforcer that is delivered some time after the behavior ex. a paycheck that comes at the end of the week -we underestimate how much delay affects conditioning -delay--> longer to learn -how long before reinforcer delivery
Behavioral Economics
-Talk about costs and benefits -Cost is the work involved in the behavioral response -Benefit is the reinforcer What if you give a choice between two different schedules of reinforcement? What predicts an organism's choice behavior?
What is the Differential Reinforcement of Low Responding Schedule (DRL)?
-A schedule of reinforcement in which an interval of time must elapse before a response delivers reinforcement -DRL schedules effectively control behavior **Have to do something (behavior) and then wait for a given time interval for it to be reinforced
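These notes describe the DRL contingency in words; purely as an illustration (not course material), the standard laboratory version — a response is reinforced only if a minimum time has elapsed since the previous response — can be sketched in Python. The function name and example times are invented:

```python
def drl_reinforced(response_times, min_interval):
    """Which responses earn reinforcement on a DRL schedule:
    a response pays off only if at least min_interval seconds
    have elapsed since the previous response. The first response
    is treated as reinforced (a simplifying assumption)."""
    reinforced = []
    last = None
    for t in response_times:
        ok = last is None or (t - last) >= min_interval
        reinforced.append(ok)
        last = t
    return reinforced

# responses at t = 0, 3, 10, 12, 25 on a 5-second DRL
print(drl_reinforced([0, 3, 10, 12, 25], 5))
# [True, False, True, False, True]
```

Responding too quickly (t = 3 and t = 12 above) earns nothing, which is why DRL suppresses high-rate responding.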
What is the Differential Reinforcement of Other Behaviors Schedule (DRO)?
-A schedule of reinforcement in which the absence of a specific response within a specified time leads to reinforcement -Unique because it reinforces the failure to exhibit a specific behavior during a particular period of time -Widely used in behavioral modification
What is the Importance of Consistency of Reward?
-Extinction is slower following partial rather than continuous reinforcement (once you get used to something, going to fight back more)
What is Operant (Instrumental) Conditioning?
-Involves learned behavior that is an association between a response and a stimulus --> The effects of a particular behavior in a particular situation increase (reinforce) or decrease (punish) the probability of the behavior occurring again--> More flexible form of learning -Reinforcing stimulus- an appetitive stimulus that follows a particular behavior and makes the behavior more likely to occur -->Reinforcers -Punishing stimulus- an aversive stimulus that follows a particular behavior and makes the behavior less likely to occur -->Punishers -contingencies of the environment change--> behavior changes (relationships within the environment change--> behavior changes)
Differential Reinforcement of High Rates of Responding Schedules (DRH)
-Schedule of reinforcement in which a specified high number of responses must occur within a specified time in order for reinforcement to occur -Time limit for responding -Similar to an FI, but an FI has no time limit... in other words, there is no time on an FI schedule after the interval where a response is NOT reinforced **DRH adds the pressure of time! -ex. sales at stores? -There is a time limit in a DRH schedule: after a certain period of time, responses are not reinforced even if they are emitted -extremely effective -if the DRH requirement is too high, it cannot be maintained and responding will decrease
What is the Response Deprivation Theory?
-When you deprive an organism of its usual responsiveness for something, then the organism will want to get back up to its usual level of response
What are fixed ratio schedules?
A specific number of responses is needed to produce reinforcement Produces a consistent response rate Talk about an FR schedule with the number indicating how many responses are needed before a reinforcer is given -ex. a rat has to press a lever 5 times before it gets a food pellet as a reinforcer--> FR5 -continuous reinforcement is an FR1--> every response is reinforced
What is a Variable Interval Schedule?
An average interval of time between available reinforcers --> but the interval varies from one reinforcement to the next contingency -slower rates of responding -the longer the interval, the lower the response rate -Scallop effect does not occur on VI schedules because the time for the next interval is unknown
What are variable ratio schedules?
An average number of responses produces reinforcement, but the actual number of responses required to produce reinforcement varies over the course of training -VR5 would be a schedule where the number of responses required for reinforcement varies around an average of 5 --> ex. For a VR5 you might give a reinforcer after 2 responses, then after 6, then after 7, then after 4, and then after 6, to equal an average of 5 responses ***Produces a consistent rate of responding*** ****Very hard to extinguish**** *Post-reinforcement pauses occur only occasionally on variable ratio schedules--> thus the rate of responding is *higher on VR than FR schedules* Example of VR Schedule: -Fishing (VR schedule with unknown average) -Slot machines (if they are the kind that is set to pay out after a certain number of responses)
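The FR/VR difference described in these cards can be illustrated with a small simulation (an invented sketch, not course material): a ratio schedule just counts responses, with the requirement held constant (FR) or drawn from values that average out to the schedule's number (VR).

```python
import random

def simulate_ratio_schedule(n_responses, requirement_fn):
    """Count reinforcers earned over a run of responses.
    requirement_fn returns how many responses are needed for the
    next reinforcer: a constant for FR, variable for VR."""
    reinforcers = 0
    count = 0
    needed = requirement_fn()
    for _ in range(n_responses):
        count += 1
        if count >= needed:
            reinforcers += 1
            count = 0
            needed = requirement_fn()
    return reinforcers

random.seed(0)
fr5 = simulate_ratio_schedule(100, lambda: 5)  # FR5: always 5 responses
vr5 = simulate_ratio_schedule(100, lambda: random.choice([2, 4, 5, 6, 8]))  # VR5: averages 5
print(fr5)  # 20 (100 responses / 5 per reinforcer, perfectly regular)
```

The FR run pays out on an exactly predictable count (hence the post-reinforcement pause), while the VR run pays out roughly as often but unpredictably.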
What are examples of a variable interval schedule?
Doing a behavior to see if reinforcement has occurred Checking your email when you are waiting for a response from someone you like--> VI because it is a variable amount of time when you get the reinforcer and you have to do the behavior to receive the message or to receive the reinforcer --> Slow steady rates of responding- how many times is email checked after that date?
What are some real life examples of a DRH?
Exams -every 4 weeks -treated like an FI schedule because of the scallop effect -it is actually a DRH schedule because there is a time limit--> it would not matter if you studied for Exam 1 after it was administered (that responding is not reinforced) -on a true FI schedule, you would get reinforced anytime after the interval has passed **Complete something (behavior) within a given time frame to receive reinforcement
Comparison of Schedules of Reinforcement
Fixed and variable ratio produced the highest number of responses--> fixed ratio had steady intervals between reinforcers--> variable ratio was a little more staggered because the time between reinforcers was not set--> the rate of responding is higher on variable ratio than fixed ratio Fixed interval--> the number of responses would fall after a reinforcer and then rise as the next reinforcer was due Variable interval--> there was a steady rate of responding--> reinforcers were spread out, so responding was constant because the animal kept checking back for reinforcement
What is Premack's explanation?
Hungry rat presses lever for food because: -lever pressing is a less probable behavior (less likely to occur) than eating (more likely to occur) -because of that^ probability differential, you can have the rat work to lever press (less likely probable behavior) to gain access to the more probable behavior, which is eating
What is a real life example of a DRL?
Ice cream example -Tell kids to go play for 20 minutes and then they will get ice cream--> they have to wait for that 20-minute interval to pass before they receive reinforcement -predict that the ice-cream-asking behavior should go down
What is reinforcement?
Increases the frequency of desirable behavior -Positive Reinforcement: Applies stimulus -Negative Reinforcement: Removes stimulus -better than punishment -the only thing that Skinner cared about -more effective, longer lasting changes in behavior -transfers more easily
What is contingency management?
Indicates that contingent reinforcement is being used to increase the frequency of appropriate behaviors and to eliminate or reduce inappropriate responses
What is the Influence of Reward Magnitude on Resistance?
Influence of reward magnitude on resistance is dependent upon the amount of acquisition training -A small reward during acquisition produces more resistance to extinction --> When a small reward magnitude is provided during acquisition, an anticipatory goal response (a conditioned anticipation of impending reward) develops very slowly --> During extinction, substantial differences in the level of the anticipatory goal response will not occur--> so the frustration produced will be small -with extended acquisition, a larger reward produces faster extinction -->extended training with a large reward increases the frustration produced during extinction, leading to a more rapid extinction of the appetitive response General: small reward--> more resistance Take away the reinforcer--> elicit frustration
What are the two ideas of Behavioral Economics?
Matching & Maximizing! *Matching* Match the responding between the two choices in relation to the relative value of the reinforcer (benefit) in association with the work required to get response (the cost) --> only put in work for the better reinforcer *Maximizing* Maximize the responding to get the most reinforcer (benefit) for the least amount of work responding --> do the least amount of work you can to still receive reinforcement
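The matching idea above has a quantitative form (Herrnstein's matching law): the proportion of responding allocated to one option equals that option's share of the total obtained reinforcement. A minimal sketch with invented numbers, for illustration only:

```python
def matching_allocation(r1, r2):
    """Matching law: the proportion of responding on option 1
    equals option 1's share of total reinforcement, r1 / (r1 + r2)."""
    return r1 / (r1 + r2)

# if option 1 pays off 40 times per hour and option 2 pays off 20,
# matching predicts two-thirds of responses go to option 1
print(matching_allocation(40, 20))  # 0.666...
```

Maximizing, by contrast, would predict responding exclusively on the richer option; the matching law predicts the graded split shown here.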
What is shaping?
Operant conditioning procedure in which reinforcers guide behavior towards the desired target behavior through successive approximations -Reinforce along the way to get them there closer to target behavior
What are real examples of an FI schedule?
Reinforcement is available after a certain period of time Ex. Chores -kids get an allowance after they do the chores on Fridays when parent is paid--> do not ask for money on Wednesday and probably will not do the chores on Wednesday--> learn to do the chores on Fridays after the parent is paid
What is a Fixed Interval Schedule?
Reinforcement is available only after a specified period of time, and the first response made after the interval is reinforced Responding only matters after a certain amount of time has elapsed *Scallop effect*--> a pattern of behavior on fixed interval schedules where responding stops after reinforcement and then slowly increases as the time approaches when reinforcement will be available *Effects of the FI schedule* The length of the pause on an FI schedule is affected by: -Experience- the ability to withhold the response until close to the end of the interval increases with experience -The pause is longer with longer FI intervals
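The FI contingency defined above — only the first response after the interval elapses is reinforced, then the clock restarts — can be sketched as follows (function name and example response times are illustrative only):

```python
def fi_reinforced(response_times, interval):
    """Fixed-interval schedule: the first response at or after the
    interval elapses earns reinforcement, then the timer restarts
    from that reinforced response."""
    reinforced = []
    available_at = interval
    for t in sorted(response_times):
        if t >= available_at:
            reinforced.append(t)
            available_at = t + interval
    return reinforced

# responses at t = 1, 5, 11, 12, 30 on an FI-10 schedule
print(fi_reinforced([1, 5, 11, 12, 30], 10))  # [11, 30]
```

Responses at t = 1, 5, and 12 earn nothing, which is why responding pauses after reinforcement and scallops upward as the interval runs out.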
What factors contribute to the resistance to extinction of instrumental or operant responding?
Reward Magnitude & Schedule of Reinforcement! -how good is it -schedule is setting up an expectation -reward magnitude: whether the reward is really good or not; based on how you did the training--> not one rule--> bigger isn't always better
What should one think of when hear operant conditioning?
S-R-O -O is the outcome/consequence -Outcome will change depending on the probability of response -"S" tells you if this is the right environment for the R-O contingency--> tells when it is in effect -outcome changes--> change response (more flexible)
What is behaviorism? What is the goal?
Skinner's Behaviorism! --> father of operant conditioning -Figure out what environmental stimuli are influencing behavior--> idea that the environment is the only thing that directly influences behavior -Goal is to predict and control behavior -Focus is on the environment or nurture side of the nature/nurture debate
What is a real life example of DRO
Student -disruptive student -tell them if they are not rude for 5 minutes, they will be reinforced with a piece of candy -should lead to a reduction in calling out and being rude -as student learns, can increase the interval to 10 minutes, and then 15 minutes, and greatly reduce their behavior
The Impact of Reward Magnitude
The Acquisition of an Instrumental or Operant Response -The greater the magnitude of the reward, the faster the task is learned -The differences in performance may reflect motivational differences
What is Thorndike's law of effect?
Thorndike laid the groundwork for operant conditioning Law: Rewarded behavior is likely to occur again -Ex: After figuring out the puzzle to get out of the box, the time required to escape decreases over successive trials
What is a reward?
a pleasurable/positive stimulus that increases the probability of behavior **a reward is just positive--> only the pleasurable aspect! -A reward is a type of reinforcer (reinforcement = give a + stimulus or take away a - one to increase the probability of behavior)
What is the operant chamber?
aka Skinner box -comes with a bar or key that an animal manipulates to obtain a reinforcer like food or water--> the animal's responses are recorded -an isolated environment in which to watch behavior--> operant chamber -the animal can earn a reinforcer inside the chamber
What is punishment?
decreases behavior--> adding something bad or taking away something good Decreases the frequency of undesirable behavior -Positive Punishment: Applies stimulus -Negative Punishment: Removes stimulus -Skinner did not believe in punishment -learning is fast, but it does not transfer well
What is the Partial Reinforcement Effect (PRE)?
the greater resistance to extinction of an instrumental or operant response following intermittent rather than continuous reinforcement during acquisition -one of the most reliable phenomena in psychology (and in conditioning generally) -role of expectancy -argued to be a highly adaptive mechanism -any partial reinforcement is going to lead to more behavior during extinction -behavior persists longer during extinction in rats that received intermittent reinforcement than in those that received continuous reinforcement