Schedules of Reinforcement and Choice Behavior: Chapter 6
What does k stand for in this equation?
k = Bx + Bo, the total rate of responding (the target response plus any other responses)
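Note (the card's image is not reproduced here, so this assumes it shows Herrnstein's hyperbola): in this deck's notation the equation would read Bx = k(rx) / (rx + ro). As rx grows large, Bx approaches k, which is why k represents the total amount of behavior, Bx + Bo.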
Give some daily-life examples of an FR schedule.
-A delivery person needs to go to a certain number of houses to finish his route -A certain number of digits need to be pressed in order to make a phone call
The simplest way to study choice is a concurrent schedule, in which there are two response alternatives, each response is followed by a reinforcer, and each response-reinforcer relationship works according to its own schedule of reinforcement. What does this type of schedule allow for?
-Allows for continuous measurement -Allows for assessment of how the choices are distributed and how each schedule influences behavior
The equation in this image provides two ways of changing the rate of response. What are they?
-By changing its own rate of reinforcement -By changing the rate of other sources of reinforcement For example, say you give a pigeon two keys to peck, and you want it to peck the right one more. You can increase the rate of reinforcement for pecking the right key, AND lessen the rate of reinforcement for pecking the left key.
Fixed ratios produce a typical pattern of responding. (FR/CRF) produces a steady but moderate rate of responding. (FR/CRF) produces a pause before the start of the required number of responses.
-CRF -FR
What are four kinds of combined schedules?
-Concurrent schedules -Compound schedules -Tandem schedules -Mixed schedules
What are the two types of response-rate schedules?
-Differential reinforcement of high rates (DRH) -Differential reinforcement of low rates (DRL)
In the value discounting function, V is (directly/inversely) related to M and (directly/inversely) related to D.
-Directly -Inversely The value of a reward increases as the magnitude increases and decreases as the delay between response and reward increases.
The unpredictability of a VR produces (many/few) pauses in the subject's behavior. Thus, the cumulative record shows a (steady/unsteady) rate of responding.
-Few -Steady Ex: rats have no idea how many times they need to press a lever to get food so they just continue pressing it all the time
What are the four types of reinforcement schedules?
-Fixed ratio (FR) -Fixed interval (FI) -Variable ratio (VR) -Variable interval (VI)
(Mechanisms of the matching law) Describe molecular theories.
-Focus on individual responses -View matching relation as the net result of individual choices
Short IRTs result in a (higher/lower) rate of responding, whereas long IRTs result in a (higher/lower) rate of responding.
-Higher -Lower
What are some variables that affect reward discounting?
-IQ -Educational level -Income
(Mechanisms of the matching law) Describe molar theories.
-Ignore what might occur at the level of individual responses -Explain batches of responses and deal with the overall distribution of responses and reinforcers
Interval schedules differentially reinforce (short/long) IRTs, whereas ratio schedules differentially reinforce (short/long) IRTs.
-Long -Short
What are the three theoretical approaches to the matching law?
-Molecular theories -Molar theories -Melioration
Describe molecular maximizing.
-Organisms always choose whichever response alternative is most likely to be reinforced at the time -The subject will switch from Schedule A to Schedule B as the probability of reinforcement for Schedule B increases
What are some reinforcer factors that bias matching?
-Quantity of reinforcer -Palatability of reinforcer -Amount of delay between response and reinforcer
In a (ratio/interval) schedule, there is no limit to the amount of reinforcers that could be gained. In a (ratio/interval) schedule, there is an upper limit on the number of reinforcers that can be gained.
-Ratio -Interval
In a (ratio/interval) schedule, response rate is directly related to reinforcement rate. In a (ratio/interval) schedule, response rate is not directly related to reinforcement rate.
-Ratio -Interval
What are some techniques used to promote self-control?
-Shaping procedures, consisting of gradually increasing the delay for the large reward over trials -Introducing a distracting task during the delay to the large reward or distracting attention from the large reward during the delay period
What are some factors that influence matching?
-Species tested -Effort or difficulty in switching from one alternative to another -How schedules are constructed
(Delay discounting) With higher values of K, the reward discounting function will be (steeper/more gradual). This results in impulsive behavior.
-Steeper
In FR schedules, the cumulative record shows a typical postreinforcement or pre-ratio pause. What is this? Why does it occur?
-The low rate of responding just after delivery of reinforcement -This pause may occur because the animal needs to rest after obtaining a reward (postreinforcement), or because the animal needs to prepare for the upcoming task, that is, responding again to get the following reinforcer (pre-ratio)
In a ratio schedule, reinforcement depends only on what? When is reinforcement delivered?
-The number of responses the organism has to perform -Reinforcement is delivered once the required number of responses is reached
Describe the study by Rachlin and Green (1972) that examined choosing between a small/immediate reward and a large/delayed reward.
-There were two procedures, direct-choice and concurrent-chain -The direct-choice procedure led to either a small/immediate reward or a large/delayed reward -The concurrent-chain procedure involved a delay between the choice and the terminal link (same rewards as the direct-choice procedure) -Results showed that under the direct-choice procedure subjects lacked self control, and chose the small/immediate reward -In the concurrent-chain procedure the subjects exhibited self control and primarily selected the large/delayed reward All in all, if you don't have to wait between making a choice and receiving a reward, you're likely to choose a small/immediate reward. If you have to wait between making a choice and receiving a reward, you're likely to choose a large/delayed reward.
When does response bias occur?
-When response alternatives are different (ex: pecking a key or stepping on a treadle) -When the reinforcer provided for two responses is different (ex: two different types of seed)
(Relative rate of responding equation) When Ba=Bb, the relative rate of responding is?
0.5
If a pigeon is trained in a VI 60 VI 60 concurrent schedule, the pigeon will peck both keys equally often (Ba=Bb) resulting in a relative rate of responding of?
0.5
(Relative rate of responding equation) When Ba<Bb, the relative rate of responding is?
<0.5
(Relative rate of responding equation) When Ba>Bb, the relative rate of responding is?
>0.5
If a pigeon is trained in a VR 5 VR 60 concurrent schedule, the pigeon will choose to peck only on the key with the VR 5 (Ba>Bb), resulting in a relative rate of responding of?
>0.5
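Worked example (illustrative numbers, not from the text): the relative rate of responding on key A is Ba / (Ba + Bb). If a pigeon makes Ba = 30 pecks on key A and Bb = 10 pecks on key B, the relative rate is 30 / (30 + 10) = 0.75, which is greater than 0.5 because Ba > Bb.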
What is a variable ratio (VR)?
A different number of responses is required for the delivery of each reward; the requirement varies around an average number of responses.
What is the value discounting function?
A mathematical function describing the decrease in the value of a reward as a function of how long the organism has to wait for it.
What is a multiple schedule?
A mixed schedule in which each separate component schedule is signaled by an external cue.
VR schedules produce more responding than VI schedules. Describe the Reynolds (1975) study with pigeons that showed this.
A pigeon on a VI schedule received the same number of reinforcers as a pigeon on a VR schedule (they were yoked), however the VR pigeons responded more than the VI pigeons.
What is an interval schedule?
A response is reinforced only if it occurs after a certain amount of time has passed.
A schedule in which specific rates of responding are reinforced is called?
A response-rate schedule
Describe the rate of responding in VI.
A steady and stable rate of responding without regular pauses.
Response-rate schedules require a specific rate of responding in order for the reinforcement to occur. Reinforcement depends on how soon a response occurs after the preceding response. Give an example of this.
A subject is required to make a response every 5 seconds (no more, no less) resulting in a response rate of 12 responses per minute.
What is a chain schedule?
A tandem schedule in which each separate component schedule is signaled by an external cue. For example when a light is turned on, an animal switches to a different task.
What does FR 10 mean?
After 10 responses, the reinforcer is delivered (thus, 10 responses per reinforcer).
In an interval schedule, the time course determines when reinforcement becomes available, not when it is delivered. What does this mean?
After a certain amount of time, performing a target behavior will result in a reward. The reward is not just randomly dispensed, the animal still has to perform a behavior.
What does VI 2 mean?
An average of 2 minutes elapses between reinforcer availability. For example, 1 minute can pass before the first reinforcer is available, the second reinforcer can become available after 3 minutes, and the third reinforcer after 2 minutes. This averages out to 2 minutes.
Give a real life example of how a VR schedule results in continuous responding.
An example is gamblers playing slots. They have no idea when they're going to win money, so they continue sitting and playing at the slot machine for long periods of time.
What is a variable interval (VI)?
An unpredictable amount of time must pass before the reinforcer becomes available for the next response; the interval varies around an average.
Explain this graph.
At T2, the delay to both rewards is long, and the value of the large reward is higher than the value of the small reward. At T1, the delay to both rewards is short, and the value of the small reward becomes higher than the value of the large reward.
How is undermatching accounted for in the generalized matching law?
By an exponent s<1. This makes choice behavior less sensitive to the relative rate of reinforcement than strict matching predicts.
How is overmatching accounted for in the generalized matching law?
By an exponent s>1. This makes choice behavior more sensitive to the relative rate of reinforcement than strict matching predicts.
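Worked example (illustrative numbers): in the generalized matching law Ba/Bb = b(ra/rb)^s, suppose ra/rb = 4 and there is no bias (b = 1). With s = 1 the predicted response ratio is 4 (perfect matching); with s = 0.5 it is 4^0.5 = 2 (undermatching, behavior less extreme than the reinforcement ratio); with s = 1.5 it is 4^1.5 = 8 (overmatching, behavior more extreme than the reinforcement ratio).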
A cumulative record is now done with computers, but it was originally done using a device called a ________ ________
Chart recorder
What are some daily-life examples of FI's?
Checking for your mail at home. Your mail is delivered once a day, usually at the same time. There is a fixed length of time between each mail delivery, and every time the mail becomes available, the behavior of checking the mailbox is reinforced on the first response. Watching your favorite show on TV. The show is played every Wednesday at 9 pm. Every time the show starts, the behavior of turning to that channel is reinforced on the first response.
What are some daily-life examples of VI's?
Checking your email. You might get many emails a day, but they do not arrive regularly. The reinforcer (a new email) arrives unpredictably, and when it does, the response of checking and opening the email still has to occur. Standing in line at the grocery store. The amount of time you spend in line varies. Once you're at the front of the line, the behavior is giving your items to the cashier and the reinforcer is leaving with your groceries.
Describe the study by Eisenberger and Adornetto (1986) that showed that self control can be trained.
Children showed an initial preference for immediate reward. They were then trained with a delayed reward. After training, they showed preference for a delayed reward.
Is a concurrent-chain schedule choice with commitment or self control? Why?
Choice with commitment. The subject must choose between keys during the choice link. It cannot choose both. If this were self control, choosing one key in the choice link would not rule out the other key.
What is self-control? Give some examples.
Choosing a long-term benefit over a short-term benefit -Cigarette vs. nicotine gum -Juicy burger vs. vegetables -Hanging out with friends vs. studying for an exam
What is a choice with commitment? Give some examples.
Choosing one alternative which makes another alternative unavailable. -Choosing to find a job after high school or going to college -Dating person A or person B, not both -Voting for either a democrat or a republican, not both
(Which combined schedule?) A reward is given only after successful completion of two or more simultaneous component schedules (ratio and interval).
Compound Ex: A subject must perform a behavior a certain number of times within a certain period of time.
(Which combined schedule?) Two or more schedules operating at the same time.
Concurrent
Molar maximizing fails to explain choice in concurrent VI-VI schedules and concurrent VR-VI schedules. Why?
Concurrent VI-VI schedules: subjects switch back and forth to the schedule that is most likely to give them a reinforcer, rather than sticking with the one schedule that requires the least amount of effort. Concurrent VR-VI schedules: subjects respond much more on the VI schedule than predicted by the molar maximizing approach.
In FR 1, the FR schedule is also called what? Why is it called this?
Continuous reinforcement (CRF). It is called this because every instrumental behavior is reinforced.
A ________ ________ is a way of representing how a response is repeated over time. It shows the total number of responses that have occurred up to a particular point in time.
Cumulative record
The value of a reinforcer (increases/decreases) as a function of the delay.
Decreases
How can you avoid ratio strain?
Don't raise the FR ratio too much at one time.
Self control can be trained. How?
Exposing individuals to delayed rewards. This increases their tolerance for delay.
Both ______ and ______ schedules produce a post-reinforcement pause and high rates of responding just prior to the availability of the next reinforcer.
FI and FR
(True/false) Reward discounting is steeper in senior adults than young adults when it comes to monetary rewards.
False. It's the other way around. This means a large/delayed monetary reward will have less value to a young person compared to an old person.
(True/false) Behavior is motivated in the same way in ratio and interval schedules. Explain if false.
False. Ratio and interval schedules produce different neurochemical changes. Responding in interval schedules is mediated by the subject's sense of time.
In a feedback function, reinforcement is considered to be the ___________ of responding.
Feedback
Another explanation for the higher response rates on ratio schedules compared to interval schedules is __________ ___________, which is the relationship between response rates and reinforcement rates calculated over a long period of time.
Feedback function
Describe this diagram.
Fixed ratio: the number of responses needed to earn a reward is fixed. There are pauses after each reward. Variable ratio: the number of responses needed to earn a reward varies. The animal continuously performs the behavior without pausing because it doesn't know when the reward will arrive. Fixed interval: the amount of time between each reward is fixed. There are pauses after each reward. Variable interval: the amount of time between each reward varies. The animal continuously performs the behavior because it doesn't know when the reward will become available next.
Give an example of a limited hold situation.
Going to the movies. The movie takes place at a fixed interval, but the reinforcer (the movie itself) is available only for a limited amount of time.
According to the matching law, larger, more palatable, and immediate reinforcers are of (greater/lesser) value.
Greater
Provide an example scenario of molecular maximizing.
In a concurrent VI-VI schedule, the subject will continue to respond to Schedule A while the timer controlling Schedule B is still operating -The longer the subject stays on Schedule A, the greater the probability the required interval for Schedule B will elapse -Thus, by switching to Schedule B the subject can get the reinforcement
Describe this diagram.
In the direct-choice procedure, the pigeon will likely choose the small reward. In the concurrent-chain procedure, the pigeon will likely choose the large reward. No delay = choose small reward, delay = choose large reward.
In FR n, the FR schedule is also called what? Why is it called this?
Intermittent or partial reinforcement (PRF). It's called this because the instrumental behavior is only reinforced sometimes, not every time (like in FR 1).
In (ratio/interval) schedules, waiting (longer IRTs) allows for the reinforcer to be set up first.
Interval
To study how organisms make choices that involve commitment to one alternative or the other, the concurrent-chain schedule of reinforcement was developed. Describe this.
Involves two stages or links. The first is called a choice link, where the participant is allowed to choose between two schedule alternatives by making one of two responses. Next is the terminal link, which is where the reinforcement schedule is met. Once the schedule in the terminal link has ended, the subject is again presented the choice link, and must choose which key to press based on which terminal link it wants.
Performance on an FI schedule says what about animals and time?
It shows that animals are accurate in telling time and suggests that they have an internal clock.
Molar theory can deal with concurrent ratio-ratio schedules. Explain how.
It theorizes that subjects maximize reinforcement by responding to the ratio that requires the least amount of effort to provide the same amount of reinforcer. That is, subjects stick to a schedule of reinforcement.
Drug addicts tend to choose quick small rewards and discount long-term consequences. The value discounting function can explain this lack of self control through which parameter?
K (a larger K makes the value of a delayed reward drop off more steeply)
In variable interval, typically once the reinforcer is made available, it remains available until the subject makes the required response, unless there is a _________ _________.
Limited hold
(Melioration) If a subject responds 75 times on alternative A during the first 10 minutes of a 60 minute experimental session, the (overall/local) rate of response A is 75/10 = 7.5.
Local
FR schedule requirements influence the length of postreinforcement or pre-ratio pauses. The higher the ratio, the (shorter/longer) the post-reinforcement pause.
Longer Ex: there will be a longer pause during FR 25 compared to FR 10
___________ means to make something better. It operates on a timescale between molar and molecular mechanisms and focuses on local rates of responding and reinforcement.
Melioration
(Which combined schedule?) Like tandem schedules, but the different component schedules are presented in a random order.
Mixed
(Delay discounting) With lower values of K, the reward discounting function will be (steeper/more gradual). This results in self control.
More gradual
Grades and GPA correlate (positively/negatively) with reward discounting. Explain.
Negatively. Higher grades and GPA are associated with more self-control. Large/delayed rewards have more value.
What does FR 1 mean?
One response is required for each reinforcer. That is, every instrumental response results in the delivery of the reinforcer.
Describe molar maximizing.
Organisms respond to get the most reinforcement in the long run, or overall.
(Melioration) If a subject responds 75 times on alternative A during a 60-minute experimental session, the (local/overall) rate of response A is 75/60 = 1.25.
Overall
___________ is increased sensitivity of the choice behavior to the relative rates of reinforcement.
Overmatching In other words, this is when a pigeon pecks the richer key even more than the relative rate of reinforcement predicts.
Explain how this cumulative record chart recorder works.
Paper moves out of the machine to the left at a constant speed. Each response causes the pen to move up the paper one step. No responses occurred between points A and B. A slow rate of responding occurred between B and C, and responses occurred more frequently between C and D. At E, the top of the page was reached so the pen reset to the bottom of the page.
Performance on a FI schedule can be improved with an external clock. Describe the experiment by Ferster and Skinner (1957) with pigeons that showed this.
Pigeons were presented a spot of light that grew into a slit as time passed during the FI cycle. This treatment produced longer postreinforcement pauses and a shift of responses closer to the end of the cycle.
The slope of the line made by the cumulative record chart recorder represents the subject's what?
Rate of responding
In (ratio/interval) schedules, faster responding (shorter IRTs) is better. The faster the subject completes the behavior, the faster it will receive the reinforcer.
Ratio
After postreinforcement or pre-ratio pause, responding is again high and steady until the reinforcer is delivered. What is this called?
Ratio run
A sudden change from low ratio to high ratio can produce a pause before completion of the requirement. What is this called?
Ratio strain Ex: If you jump from FR 5 to FR 10, a rat may stop pressing after the fifth response because the reinforcer it expects does not arrive.
What does FI 4 mean?
Reinforcement occurs for the first response made after 4 minutes have elapsed; responses made before the 4 minutes are up have no effect.
What does FR n (with n > 1) mean?
Responding is reinforced only after a defined number of responses. The reinforcer is delivered after every nth response.
A program or rule that determines how and when the occurrence of a response will be followed by a reinforcer is called a?
Schedule of reinforcement
Relative rates of responding do not always exactly match relative rates of reinforcement. Therefore, the matching law was revised by Baum (1974), who renamed it the generalized matching law. What did he change it to?
See image.
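For reference (the image is not reproduced here): the generalized matching law is usually written Ba/Bb = b(ra/rb)^s, where Ba and Bb are the response rates on the two alternatives, ra and rb are the reinforcement rates they earn, b is response bias, and s is the sensitivity of choice behavior to the relative rates of reinforcement.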
What is the formula for the value discounting function? What does each variable stand for?
See image.
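For reference (image not reproduced here): the value discounting function is commonly written V = M / (1 + KD), where V is the value of the reward, M is the reward magnitude, D is the delay between response and reward, and K is the discounting-rate parameter.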
What was the original formula for the matching law? What does each letter stand for?
See image.
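For reference (image not reproduced here): Herrnstein's original matching law is Ba / (Ba + Bb) = ra / (ra + rb), where Ba and Bb are the rates of responding on alternatives A and B, and ra and rb are the rates of reinforcement earned on those alternatives.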
What is the formula for the matching law that takes into account other alternative responses which yield intrinsic reinforcers? What do the variables stand for?
See image. Bx is a target response and Bo is any other response. rx is a target reinforcement and ro is another intrinsic reinforcement.
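For reference (image not reproduced here): this version is usually written Bx / (Bx + Bo) = rx / (rx + ro), with Bx and rx the target response and its reinforcement, and Bo and ro all other behavior and its (intrinsic) reinforcement.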
(Mechanisms of the matching law) Describe melioration.
It falls between molecular and molar theories: it operates on a timescale between the two and focuses on local rates of responding and reinforcement, with behavior adjusting to improve the local rate of reinforcement.
(Which combined schedule?) The same as compound schedules, but sequentially presented.
Tandem
Explanations of the matching law are based on the idea that responses are distributed so as to maximize what?
The amount of reinforcement. That is, subjects switch back and forth between response alternatives in order to receive as many reinforcers as they can.
(Melioration) What is the overall rate?
The rate of responding on a choice alternative calculated over the entire duration of the experimental session (total responses divided by total session time).
What is the matching law? Give an example.
The relative rate at which an animal responds on a choice alternative matches the relative rate of reinforcement it earns from that alternative. For example, in a VI 5 VI 60 concurrent schedule, pigeons peck the VI 5 key more than the VI 60 key because the VI 5 key gives them more reinforcers.
Explain this equation in words.
The amount that an animal performs behavior x is directly proportional to the amount that reinforcer x is presented. It is inversely proportional to reinforcers given for any other behavior. When you want to increase the probability that one behavior will occur, you lessen reinforcement for all other behaviors.
What does VR 10 mean?
The average number of responses required for obtaining a reward is 10. Ex: 10 responses to earn first reward, 13 for the next, 7 for the next, 5 for the next, and 15 for the next. This averages out to 10.
Why might ratio schedules produce higher rates of responding than interval schedules?
The critical factor might be the spacing between responses just before reinforcement, or the interval between one response and the next.
FI schedules produce a fixed interval scallop. What is this?
The cumulative record shows a postreinforcement pause, followed by increased responding just prior to availability of the reinforcer.
What is the interresponse time (IRT)?
The interval between successive responses.
What is a fixed interval (FI)?
The interval is kept constant from one occasion to the next.
Explain this diagram.
This is a concurrent-chain schedule. A pigeon is first presented with a choice link. Based on which key it presses, it is presented with one of two terminal links that have different reinforcement schedules.
Herrnstein (1961) studied the distribution of responses on various concurrent VI-VI schedules. What did the results indicate?
The relative rate of responding on an alternative (one of two keys) matched the relative rate of reinforcement on that alternative.
According to the melioration theory, adjustments in the distribution of behavior are assumed to continue until?
The same local rate of reward is obtained for all alternatives.
Describe a limited hold.
The subject has a limited amount of time to make a required response.
Describe the role of classical conditioning in concurrent-chain schedules.
The terminal links (the specific colors on the pecking keys) become associated with food (a primary reinforcer). Thus, the terminal links become secondary reinforcers. The subject chooses which key to press in the choice link based on the strength of each terminal link as a conditioned reinforcer.
Explain these graphs.
These graphs show delay discounting with different values of K. On the left, K is very large because the slope is great. This means that reward value is very low except for directly before reward presentation. This is somebody with little self control. On the right, K is small because the slope is gradual. Reward value is much higher at all points. This is somebody with a lot of self control.
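Worked example (illustrative numbers, assuming V = M / (1 + KD)): for a reward of magnitude M = 10 at a delay of D = 5, a large K = 2 gives V = 10 / (1 + 10) = 0.91, while a small K = 0.1 gives V = 10 / 1.5 = 6.67. The larger K collapses the reward's value much more steeply, which corresponds to impulsive choice.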
Provided that a similar number of responses is required for each reinforcer, FR and VR schedules yield similar overall response rates. How do they differ?
They differ in how the response rates are distributed over time. FR schedules produce a pause-run pattern, whereas a steady pattern of responding is observed with VR schedules.
Reinforcement schedules are rarely presented in isolation. In the laboratory, different schedules can be combined. Explain this diagram that shows this.
This pigeon is being presented with two schedules. The left key is VI 60 sec, meaning food becomes available an average of every minute. The right key is FR 10, meaning every time the pigeon pecks the key 10 times, it gets food. The pigeon will likely focus on the right key because it's reliable, however it may go to the left key every now and then because it knows that sometimes when it pecks the key it results in a reward.
(Melioration) Local rates are calculated only over the?
Time period that a participant devotes to a particular choice alternative.
According to the melioration theory, why do organisms change from one response alternative to another?
To improve local rate of reinforcement.
(True/false) Melioration results in matching.
True
(True/false) The matching law describes how responses are distributed, but doesn't explain what mechanisms are responsible for the response distribution. It is a descriptive law of nature rather than a mechanistic law.
True
What is more common, overmatching or undermatching?
Undermatching
_________ is reduced sensitivity of the choice behavior to the relative rates of reinforcement.
Undermatching In other words, this is when a pigeon pecks the richer key less than the relative rate of reinforcement predicts.
Both ______ and ______ schedules produce steady rates of responding, without predictable pauses.
VR and VI
How can the discounting function explain self control, or the choice between impulsive behavior (small/immediate reward) and self control (large/delayed reward)?
The value of a reward decreases rapidly at first and then more slowly as its delay increases (hyperbolic discounting). Thus, when there is only a short wait for the small reward, the small reward has a greater value than the large reward. But as the delay to both rewards gets longer, the value of the large reward becomes greater than the value of the small reward.
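Worked example of the reversal (illustrative numbers, using V = M / (1 + KD) with K = 1): a small reward of M = 3 is due in 1 s and a large reward of M = 10 is due in 6 s. At that moment, V(small) = 3 / 2 = 1.5 and V(large) = 10 / 7 = 1.43, so the small/immediate reward wins. If both rewards are pushed 10 s further into the future (delays of 11 s and 16 s), V(small) = 3 / 12 = 0.25 and V(large) = 10 / 17 = 0.59, so the large/delayed reward now wins.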
Even single-response situations involve a choice. What is the choice between?
Whether to make a specific response or to engage in any other activities.
Describe differential reinforcement of low rates (DRL).
You get a reinforcer only for a response made AFTER a certain amount of time has elapsed since the previous response.
Describe differential reinforcement of high rates (DRH).
You get a reinforcer only if you make a response BEFORE a certain amount of time has elapsed since the previous response. This requires you to do things quickly!
What variables did Baum (1974) add to the matching law? What do they stand for?
b: response bias s: sensitivity of the choice behavior to the relative rates of reinforcement