Learning Ch 7
associative structure of instrumental conditioning
-Thorndike -molecular approach: focuses on individual responses and the specific stimulus antecedents and outcomes of those responses -instrumental conditioning involves more than just a response and a reinforcer; the instrumental response occurs in the context of specific environmental stimuli -3 events to consider in an analysis of instrumental learning: stimulus context (S), instrumental response (R), response outcome/reinforcer (O) -Skinner also believed in the three-term contingency
explanation of reinforcement effects
-a reinforcement effect is identified by an increase in the occurrence of an instrumental response above the level of that behavior in the absence of the response-reinforcer contingency -increased performance of the instrumental response (a reinforcement effect) results from a reallocation of responses that minimize deviations from the free baseline or bliss point
S-R association
-association between the contextual stimuli (S) and the instrumental response (R): S-R association -this is central to the law of effect, which states that instrumental conditioning involves the establishment of an S-R association between the instrumental response and the contextual stimuli that are present when the response is reinforced. the role of the reinforcer is to stamp in/strengthen the S-R association. -Thorndike believed that once established, the S-R association was solely responsible for the occurrence of the instrumental behavior. thus, the basic motivation for the instrumental behavior was the activation of the S-R association by exposing the participant to the contextual stimuli in the presence of which the response was previously reinforced. -the law of effect implies that instrumental conditioning does not involve learning about the reinforcer or response outcome at all. aka consequences don't matter (e.g., drug addiction)
conditioned emotional states or reward-specific expectancies?
-the two-process theory assumes that classical conditioning mediates instrumental behavior through the conditioning of positive or negative emotions, depending on the emotional valence of the reinforcer. however, organisms also acquire specific reward expectancies, not just categorical positive or negative emotions, during instrumental and classical conditioning. furthermore, in some cases reward-specific expectancies appear to determine the outcome of Pavlovian instrumental transfer experiments (ex: a CS+ for food pellets facilitated instrumental responding reinforced with pellets much more than instrumental behavior reinforced with sugar solution).
two-process theory Rescorla and Solomon 7.1 aimed at explaining S-O association
-two distinctive types of learning: Pavlovian and instrumental conditioning -during the course of instrumental conditioning, the stimuli (s) in the presence of which the instrumental response is reinforced become associated with the response outcome (o) through pavlovian (classical) conditioning, and this results in an S-O association. -the S-O association activates an emotional state that motivates the instrumental behavior...the emotional state is positive or negative depending on whether the reinforcer was an appetitive or an aversive stimulus -tested with pavlovian instrumental transfer experiment
Mitchell and Stoffelmayr study that used the Premack principle
2 hospitalized patients with chronic schizophrenia refused all tangible reinforcers that were offered (candy, cigarettes, fruit, biscuits) - other patients on the ward participated in a work project that involved removing copper wire from coils; all these 2 patients wanted to do was sit - SO sitting was used as the reinforcer - Asked patients to stand, handed them the wire; if they made the coil-stripping responses they could sit for 90 seconds = highly successful. - Contingency = more work than before
fig 7.9 demand curve: the relation between how much of a commodity is purchased and the price of the commodity
Fundamental to the application of economic concepts to the problem of reinforcement is the relation between the price of a commodity and how much of it is purchased. This relation is called the demand curve. Figure 7.9 shows three examples of demand curves. Curve A illustrates a situation in which the consumption of a commodity is very easily influenced by its price. This is the case with candy. If the price of candy increases substantially, the amount purchased quickly drops. Other commodities are less responsive to price changes (Curve C in Figure 7.9). The purchase of gasoline, for example, is not as easily discouraged by increases in price. People continue to purchase gas for their cars even if the price increases, showing a small decline only at the highest prices.
how is the response allocation approach different from the traditional view of instrumental behavior (S-R-O)
Instead of considering instrumental conditioning in terms of reinforcement of a response in the presence of certain stimuli (molecular), response allocation is a molar approach that focuses on HOW instrumental conditioning procedures put limitations on an organism's activities and cause redistribution of behavior among available response options. Contributions: Reinforcement effects are regarded as the consequences of schedule constraints on an organism's ongoing behavior. Instrumental conditioning is no longer considered to "stamp in" or strengthen instrumental behavior, but is seen as creating a new distribution or allocation of responses. Emphasizes that instrumental behavior can't be studied in a vacuum or behavioral test tube. All of the organism's response options at a given time must be considered as a system.
how is the response allocation approach diff from premack principle OR the response deprivation hypothesis when considering reinforcement
Premack: responses that accompany commonly used reinforcers involve activities that individuals are highly likely to perform (eating will reinforce bar pressing because eating is more likely than bar pressing) - must observe behaviors in baseline - differential probability necessary - contingency between 2 behaviors = the more probable/preferred will reinforce the less probable/preferred. response allocation approach: radically diff worldview (molar) - considers the entire range of activities always available to the individual, considers how the distribution of activities/responses is altered when a contingency is introduced and what factors determine the nature of the response reallocation. response deprivation hypothesis: focuses on whether access to a behavior is restricted/not allowed
relationship between response allocation and economics
The behavioral bliss point comes from economic theory and assumes that a person acts to minimize cost and maximize gain - instrumental response = cost (in labor and time) - reinforcing activity = gain (profit) - the prediction is that the animal emits the minimal # of instrumental responses needed to obtain the maximum level of reinforcing activities - Minimum Deviation model: the redistribution of responses will try to minimize the total deviation of the two responses from the bliss point - "economics is the study of the allocation of behavior within a system of constraint" o in instrumental conditioning the restrictions are: • the number of responses an organism is able to make (income) • the number of responses required to obtain each reinforcer (the price of the reinforcer) • therefore the price is determined by the schedule of reinforcement • responses (the price paid) → reinforcer (the commodity) • = GOAL IS TO UNDERSTAND how instrumental responding (spending) is controlled by instrumental contingencies (prices)
2 approaches for studying motivation of instrumental behavior
associative structure of instrumental conditioning response allocation approach
Hogarth and Chase study conducted with college students who smoked - R-O
participants smoked several times a week. 2-choice concurrent schedule of reinforcement used. o Pressing one key was reinforced with a picture of 1/4 of a cigarette o Pressing the other key was reinforced with a picture of 1/4 of a chocolate bar o Each outcome was devalued in a different group o Tested again o After devaluation: if tobacco was devalued, they responded with tobacco only 40% of the time; if chocolate was devalued, they responded with chocolate around 55% of the time. R-O mechanisms predominate in free-operant situations; S-R mechanisms are activated when drug taking is a response to drug-related cues
response deprivation hypothesis
probability of the reinforcer activity is kept at a high level by restricting access to the reinforcer. hypothesis: even low-probability responses can serve as reinforcers provided that participants are restricted from making the response - responses that are prohibited become effective reinforcers - ex: rats aren't allowed to have food before food is used as a reinforcer. CONTRARY TO PREMACK: shows that response deprivation is more basic to reinforcement effects than differential response probability. simple new strategy for creating reinforcers = withhold the reinforcer activity until the specified instrumental response has been performed
response allocation model: behavioral bliss point and introduction of an instrumental contingency, fig. 7.7
response allocation: refers to how an individual distributes their responses among the various options that are available. bliss point/unconstrained baseline: what you spend time doing when you have no obligations/all free time. instrumental contingency introduced: how does that change behavior - Figure 7.7: how high school students distribute their activities between FB and studying.... more time on FB than studying... without restrictions the student spends 60 min on FB for every 15 min of studying. - Unrestricted baseline = behavioral bliss point (preferred response allocation in the absence of restrictions) - Students defend their baseline: will add 45 minutes of studying to maintain the 60 minutes on FB - Dilemma faced: defending the FB baseline has disadvantages (must study 45 min more) and defending the study-time baseline has disadvantages (15 min of studying = only 15 min of FB)
response allocation approach:
Skinnerian; involves considering instrumental conditioning within the broader context of what organisms do. molar perspective: global perspective, considers long-term goals, how organisms manage goals within the context of their behaviors, concerned with how instrumental contingencies limit behavior
S-O association
- Instrumental response ensures that the participant will always experience certain distinctive stimuli (s) in connection with making the response. o S- stimuli involve place where the response is performed, the texture of the object, olfactory, or visual cues. o Reinforcement of the instrumental response will inevitably result in pairing of stimuli with reinforcer/response outcome (O) - what happens AFTER response
S(R-O) association
- R-O associations cannot act alone to produce instrumental behavior: they are not sufficient to explain why the behavior occurs in the first place - One possibility: the R-O association is activated by the stimuli that are present when the response is reinforced. - So S-R and S(R-O) - Evidence has firmly established that this association is learned during instrumental learning
Charlop, Kurtz, and Casey (1990) study of autism and related skills
- children with autism who engaged in unusual repetitive or stereotyped behaviors (delayed echolalia involves repeating words) - perseverative behavior: involves persistent manipulation of an object (a child may repeatedly handle only certain plastic toys) - = these responses can act as reinforcers in treatment procedures - The tasks included identifying which of several objects was the same as or different from the one held up by the teacher, adding up coins, and correctly responding to sentences designed to teach receptive pronouns or prepositions. - Reinforcers: preferred food vs. opportunity to perform a stereotyped response for 3-5 seconds - Delayed echolalia and perseverative behavior both served to increase task performance above what was observed with food reinforcement. These results indicated that high-probability responses can serve to reinforce lower probability responses, even if the reinforcer responses are not characteristic of normal behavior.
reinforcer devaluation procedure as evidence for R-O associations
- devaluation of the reinforcer: if you devalue the reinforcer (e.g., make food aversive), responding should decrease IF the instrumental response occurs because of an R-O association. - Similar to the US devaluation procedure.
concept of consumer demand and how its used to analyze instrumental behavior
-# of responses performed (or time spent responding) = money -reinforcer obtained = commodity that is purchased -price of a reinforcer = time/# of responses required to get the reward -study: 3 plungers could be pulled for diff reinforcers (3 puffs on a cig, 5 cents, or 25 cents) -the response requirement for getting the reinforcer increased during each session -5 cents most elastic (like candy: the more responses required, the less likely to respond) -least elastic is cigarette puffs: willing to make many more responses for puffs than for monetary rewards - Consumer demand: the relation between how much of a commodity is purchased and the price of the commodity - Elasticity of demand: degree to which price influences consumption o Demand for candy is highly elastic (the more it costs, the less people buy) o Demand for gasoline is less elastic (even when it costs more, people still buy gas) - Determinants of elasticity of demand: o Availability of substitutes: most important determinant; availability of substitutes increases sensitivity of the original commodity to price o Price range: an increase in price has less of an effect at low prices than at high prices o Income level: the higher the income, the less deterred by an increase in price o Link to complementary commodity: if a price increase drives down demand for one commodity, it will also lower demand for the complementary commodity (like computers and software, or lettuce and salad dressing)
consummatory-response theory
-reinforcers like food and water elicit species-typical unconditioned responses (chewing, licking, swallowing) -attributes reinforcement to these species-typical behaviors: species-typical consummatory responses (eating, drinking, etc.) are themselves the critical feature of reinforcers -radical innovation because it moved the search for reinforcers from special kinds of stimuli to special types of responses. reinforcer responses were assumed to be special because they involved the consummation, or completion, of an instinctive behavior sequence. the theory assumed that consummatory responses are fundamentally different from various potential instrumental responses, such as running, opening a latch, or pressing a lever.
response interactions in PIT: Krank study of the effects of a Pavlovian CS for alcohol on instrumental responding reinforced by alcohol
-classically conditioned stimuli elicit not only emotional states but also overt responses such as sign tracking. -consequently, the overt responses elicited by a Pavlovian CS may influence the results in a Pavlovian instrumental transfer experiment -experiment: lab rats in a chamber had two response levers, one on either side of a water well. -rats were first trained to press either response lever, reinforced by a drop of sweetened water, which was gradually replaced by ethanol. -Pavlovian conditioning was conducted during the next 8 sessions: a light was presented above each of the response levers; the response levers were removed from the chamber. the light was paired with ethanol. the light came to elicit a sign tracking response...the rats approached and sniffed the light. -then the levers were replaced and the PIT test took place. the paired (CS plus ethanol) group showed a significant increase in lever pressing during the CS period if the CS was presented on the same side as the lever the rat was pressing. -shows that an independently established S-O association can facilitate instrumental responding reinforced by that outcome. because the levers were removed from the chamber during the Pavlovian phase, no S-R associations could have been learned during that phase. -the facilitation of instrumental responding occurred only if the Pavlovian CS was presented on the same side as the lever the rat was pressing...thus the results of PIT depend on both the Pavlovian CR and the instrumental response
the response allocation approach
-considers a broad range of activities that are always available to an individual (instead of just focusing on the instrumental and reinforcer responses). -examines how the distribution of responses is altered when an instrumental conditioning procedure is introduced and what factors determine the nature of the response reallocation -unconstrained baseline: starting point for these analyses. the unconstrained baseline is how the individual allocates their responses to various behavioral options when there are no restrictions and presumably reflects the individual's unique preferences (also referred to as the behavioral bliss point).
S-O association
-expectancy of reward motivates instrumental behavior -specification of an instrumental response ensures that the participant will always experience certain distinctive stimuli (S) in connection with making the response. these stimuli may be the location, or distinctive olfactory or visual cues present when the response is performed. whatever the stimuli, reinforcement of the instrumental response will inevitably result in pairing these stimuli (S) with the reinforcer or response outcome (O) -S-O association provides the potential for classical conditioning
applications of premack principle
-had an enduring impact in the design of reinforcement procedures used to help various clinical populations and remains the basis for various point systems and voucher systems used in residential treatment settings. -schizophrenic patients that refused tangible reinforcers to remove tightly wound copper wire from coils. liked sitting a lot, so used sitting as a reinforcer. they were allowed to sit for 90 seconds after the appropriate amount of coil-stripping responses. -children with autism display delayed echolalia (repeating words) and perseverative behavior (persistent manipulation of an object). in a study, autistic children were taught academic skills and either a preferred food or a stereotyped behavior served as a reinforcer. results: the opportunity to engage in a prevalent stereotyped response resulted in better performance on the training tasks than food reinforcement. these results indicate that high-probability responses can serve to reinforce lower probability responses, even if the reinforcer responses are not characteristic of normal behavior -premack principle encouraged thinking about reinforcers as responses rather than as stimuli
behavioral economics
-how is the allocation of behavior among an individual's response options altered by the constraints imposed by an instrumental conditioning procedure? -in instrumental conditioning situations, the restrictions are provided by the number of responses an organism is able to make (its income) and the number of responses required to obtain each reinforcer (the price of the reinforcer)
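A rough sketch of the income/price analogy above, in Python. The function name, the 600-response budget, and the list of prices are made up for illustration and do not come from any study. It only shows the upper limit on how many reinforcers a fixed response budget can buy as the price per reinforcer rises; it does not model how much an organism actually chooses to respond.

```python
def max_reinforcers(response_budget, responses_per_reinforcer):
    """Upper limit on reinforcers obtainable with a fixed response budget.

    response_budget: total responses the organism can make (its "income").
    responses_per_reinforcer: responses required per reinforcer (the "price").
    Both values are hypothetical inputs chosen for illustration.
    """
    if responses_per_reinforcer <= 0:
        raise ValueError("the price must be a positive number of responses")
    return response_budget // responses_per_reinforcer


# Hypothetical budget of 600 responses in a session.
budget = 600
affordable = {price: max_reinforcers(budget, price) for price in (3, 30, 300)}
for price, count in affordable.items():
    print(f"price of {price} responses -> at most {count} reinforcers")
```

With the budget held constant, every increase in the response requirement lowers the ceiling on what can be obtained, which is the sense in which the schedule sets the price.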
S(R-O) relations
-in addition to activating R directly, S also activates the R-O association. the subject comes to think of the R-O association when it encounters S, and that motivates it to make the instrumental response. -Skinner supported this association - R-O associations cannot act alone to produce instrumental behavior o the R-O association is activated by the stimuli (S) that are present when the response is reinforced. So S activates R indirectly by activating the R-O association. - three-term contingency model (Skinner) • research with animals and humans has established that S(R-O) associations are learned during the course of instrumental conditioning. • Experiments on S(R-O) use complicated discrimination training procedures - Skinner argued that voluntary human behaviors are selected on the basis of their consequences o Accidental variations in behavior are selected by their reinforcing consequences o Concept of SELECTION is the KEY: similar to the
response deprivation hypothesis
-in most instrumental conditioning procedures, the probability of the reinforcer activity is kept at a high level by restricting access to the reinforcer -response-deprivation hypothesis: argued that restriction of the reinforcer activity was the critical factor for instrumental reinforcement -several researchers found that even a low-probability response can serve as a reinforcer provided that participants are restricted from making this response -study: students with mental retardation. teachers identified things the students were not likely to do, for example filing cards and tracing letters. however, the opportunity to trace (LP) became an effective reinforcer for filing behavior if access to tracing was restricted below baseline levels. this result is contrary to the premack principle. -the response deprivation hypothesis makes clear that a reinforcer is produced by the instrumental contingency itself and shows that reinforcer is not something that must exist independent of an instrumental conditioning procedure.
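A small Python sketch contrasting the two predictions. The ratio test used for response deprivation (schedule ratio I/C greater than baseline ratio Oi/Oc) is one common way of stating the hypothesis and is an assumption here, not something spelled out in these notes. The function names and all the minute values are hypothetical.

```python
def premack_predicts(baseline_instrumental, baseline_contingent):
    """Premack principle: the contingent activity reinforces only if it is
    more probable at baseline than the instrumental activity."""
    return baseline_contingent > baseline_instrumental


def response_deprivation_predicts(baseline_instrumental, baseline_contingent,
                                  required_instrumental, allowed_contingent):
    """Response deprivation (assumed ratio form): reinforcement is expected
    when the schedule demands more instrumental responding per unit of the
    contingent activity than the free baseline ratio, i.e. I/C > Oi/Oc."""
    schedule_ratio = required_instrumental / allowed_contingent
    baseline_ratio = baseline_instrumental / baseline_contingent
    return schedule_ratio > baseline_ratio


# Hypothetical baseline: 10 min filing and 5 min tracing in a free session.
# Hypothetical schedule: 10 min of filing earns 1 min of tracing.
premack = premack_predicts(10, 5)
deprivation = response_deprivation_predicts(10, 5, 10, 1)
print(premack, deprivation)
```

In this made-up case tracing is the less probable activity, so the Premack test fails, yet the schedule holds tracing below its baseline level, so the response deprivation test passes. That is the pattern the filing/tracing study is described as showing.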
imposing an instrumental contingency
-individuals will generally defend their response allocations against challenges to the unrestricted baseline or bliss point condition -the baseline response allocation usually cannot be reestablished after an instrumental contingency has been introduced -study: unrestricted baseline point was 60 minutes on FB for every 15 min of studying. once the instrumental contingency is imposed, there is no way the student can be on FB for 60 min and only study for 15 min. if he or she insists on being on FB for 60 min, the student will have to tolerate adding 45 min to their studying time. on the other hand, if the student insists on spending only the 15 min on his or her studies (as in the baseline condition), he or she will have to make do with 45 min less on FB. defending the baseline study time or defending the baseline FB time both have their disadvantages...that is often the dilemma posed by an instrumental contingency. -the distribution of responses between the instrumental and contingent behaviors becomes a matter of compromise. the rate of one response is brought as close as possible to its preferred level without moving the other response too far away from its preferred level
viewing reinforcement contingencies in a broader behavioral context
-instrumental contingencies occur in the context of all the responses and reinforcers the participant has available -the effect of a particular instrumental conditioning procedure depends on what alternative sources of reinforcement are available, how those other reinforcers are related to the particular one involved in the instrumental contingency, and the cost of obtaining those alternatives -ex: if a student enjoys listening to their iPod as much as being on FB, restricting FB may not increase studying behavior bc they can just switch to listening to the iPod. this undermines the instrumental contingency.
minimum deviation model
-introduction of a response-reinforcer contingency causes organisms to redistribute their behavior between the instrumental and contingent responses in a way that minimizes the total deviation of the two responses from the unrestricted baseline or bliss point
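A minimal Python sketch of the minimum deviation idea, using the FB/studying numbers from these notes (bliss point of 15 min studying and 60 min FB). Two things are assumptions: the schedule is taken to be 1 min of FB per 1 min of studying, and "total deviation" is measured as squared straight-line distance from the bliss point. Other deviation measures would give a different compromise point. The function name is hypothetical.

```python
def minimum_deviation_allocation(bliss_instrumental, bliss_contingent, ratio):
    """Return the (instrumental, contingent) allocation on the schedule line
    contingent = ratio * instrumental that lies closest to the bliss point.

    Assumption: deviation is squared Euclidean distance, so the answer is
    the orthogonal projection of the bliss point onto the schedule line.
    """
    instrumental = (bliss_instrumental + ratio * bliss_contingent) / (1 + ratio ** 2)
    contingent = ratio * instrumental
    return instrumental, contingent


# Bliss point from the notes: 15 min studying, 60 min on FB.
# Assumed schedule: 1 min of FB is earned per 1 min of studying.
study, fb = minimum_deviation_allocation(15, 60, ratio=1.0)
print(study, fb)
```

Under these assumptions the compromise lands between the two extremes described for the student: more studying than the 15 min baseline and less FB than the 60 min baseline.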
consummatory response theory
-reinforcers like food and water elicit species typical unconditioned responses (chewing, licking swallowing). - Consummatory response theory: attributes reinforcement to these species typical behaviors. o Species typical consummatory responses are themselves the critical feature of reinforcers. - Radical: radical innovation because it moved the search for reinforcers from special kinds of stimuli to special types of response. o Reinforcer responses were assumed to be special because they involved the consummation or completion of an instinctive behavior sequence.
R-O relations in instrumental conditioning
-most common technique used to demonstrate the existence of R-O associations involves devaluing the reinforcer after conditioning -if the reinforcer is food, can use conditioned taste aversion. if the instrumental response occurs because of an R-O association, devaluation of the reinforcer should reduce the rate of the instrumental response -for Pavlovian conditioning, if US devaluation disrupts the ability of the CS to elicit a CR, one may conclude that the CS activated the memory of the US and responding declined because the US memory was no longer as attractive -study: people who smoked several times per week. pressing one of the keys on a keyboard was reinforced with a picture of 1/4 of a cigarette, the other key was reinforced with a picture of 1/4 of a chocolate bar. after acquisition trials, the reinforcers were devalued. one group was able to smoke a cigarette and one group was able to eat many chocolate bars. -during training, before devaluation, the two outcomes were equally preferred (50% of responses on each key). when the tobacco outcome was devalued, responding on that key significantly declined. when the chocolate outcome was devalued, responding on the cigarette key increased, indicating a decline in the chocolate response. thus, devaluation produced a decline in behavior specific to the response whose reinforcer had been devalued. -the results of the devaluation tests indicate that training established an R-O association linking each response with its specific reinforcer. the results cannot be explained by S-R associations because S-R associations are not influenced by reinforcer devaluation. the results also cannot be explained by S-O associations because S-O associations could not explain the response specificity of the devaluation effects that were observed. -results indicate that R-O associations are also involved in instrumental drug seeking behavior - Devaluation: accomplished by satiating the participants with the corresponding reinforcer.
o Chocolate: they ate 8 chocolate bars in 10 min o Cigarette: allowed to smoke an entire cigarette - After devaluation, participants were tested on the concurrent schedule and told that although they would continue to earn cigarettes and chocolate bars, they would not find out how many of each they obtained until the end of the session (intended to maintain responding on the basis of the current status of the memory of each reinforcer) - When tobacco was devalued, responding on the cigarette key declined; when chocolate was devalued, responding on the cigarette key increased. - = established an R-O association linking each response with its specific reinforcer. o Results cannot be explained by S-R associations because S-R associations aren't influenced by reinforcer devaluation
limitations of response allocation model if it focuses on instrumental and reinforced responses
-must consider the broader context of the organisms response options (like listening to iPod or hanging out with friends besides just FB)
Pavlovian instrumental transfer experiment supports 2 process theory
-phase 1: standard instrumental conditioning (ex: lever pressing reinforced with food) -phase 2: Pavlovian conditioning (ex: tone paired with food) -phase 3: transfer phase. present the Pavlovian CS during performance of the instrumental response (lever pressing). if a Pavlovian S-O association motivates instrumental behavior, then the rate of lever pressing should increase when the tone CS is presented. -as predicted, the Pavlovian CS for food increases the rate of instrumental responding for food -this is presumably bc the positive emotion elicited by the CS+ for food adds to the appetitive motivation that is involved in lever pressing for food. a suppression of responding is predicted if the Pavlovian CS elicits a negative emotion. this was the case when the CS+ signaled shock (conditioned fear).
consumer demand
-relation between the price of a commodity and how much of it is purchased (demand curve) -curve A: a situation in which the consumption of a commodity is very easily influenced by its price. ex: candy. if the price of candy increases substantially, the amount purchased quickly drops. other commodities are less responsive to price, ex: gasoline -elasticity of demand: the degree to which price influences consumption. demand for candy is highly elastic, demand for gasoline is much less elastic. -concept of consumer demand has been used to analyze a variety of major behavior problems including eating and drug abuse. in a recent study, children increased their purchases of healthy food as the price of unhealthy alternatives was increased -has also been used to analyze instrumental behavior by considering the number of responses performed (or time spent responding) to be analogous to money and the reinforcer obtained to be analogous to the commodity that is being purchased. the goal is to understand how instrumental responding (spending) is controlled by instrumental contingencies (prices). -study: elasticity of demand for cigarettes and money in smokers with a mean age of 40 years who were not trying to quit. reinforcers were either puffs on a cig, 5 cents, or 25 cents. the response requirement for obtaining the reinforcer was gradually increased during each session from FR3 to FR6000 (lever pulls). the investigators wanted to determine at what point the participants would quit responding because the response requirement, or price, was too high. greatest elasticity of demand was for 5 cents. smoking had the least elasticity of demand.
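To make "degree to which price influences consumption" concrete, here is a short Python sketch using the standard midpoint (arc) elasticity formula from economics. The formula is not given in these notes, and every price and quantity below is invented for illustration. They are not data from the cigarette/money study or from Figure 7.9.

```python
def arc_elasticity(price_1, quantity_1, price_2, quantity_2):
    """Midpoint (arc) price elasticity of demand between two points.

    Percent change in quantity divided by percent change in price, with
    each percent change taken relative to the midpoint of the two values.
    """
    pct_quantity = (quantity_2 - quantity_1) / ((quantity_1 + quantity_2) / 2)
    pct_price = (price_2 - price_1) / ((price_1 + price_2) / 2)
    return pct_quantity / pct_price


# Hypothetical numbers: the price doubles for both commodities.
candy = arc_elasticity(1.0, 100, 2.0, 40)     # purchases drop sharply
gasoline = arc_elasticity(1.0, 100, 2.0, 90)  # purchases barely change

print(f"candy: {candy:.2f}, gasoline: {gasoline:.2f}")
```

An absolute value above 1 is conventionally called elastic and a value below 1 inelastic. For instrumental behavior the same calculation could be run with the response requirement as the price and the number of reinforcers earned as the quantity.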
response allocation and behavioral economics
-response allocation is a molar approach that focuses on how instrumental conditioning procedures put limitations on an organism's activities and cause redistributions of behavior among available response options
the premack principle
-responses that accompany commonly used reinforcers involve activities that individuals are highly likely to perform (ex: food/eating) -instrumental responses are typically low-probability activities -premack principle: given two responses of different likelihood, the opportunity to perform the higher probability response (H) after the lower probability response (L) will result in reinforcement of response L. (L -> H reinforces L). the opportunity to perform the lower probability response (L) after the higher probability response (H) will not result in reinforcement of response H. (H -> L does not reinforce H). -focuses on the difference in the likelihood of the instrumental and reinforcer responses -eating will reinforce bar pressing because eating is typically more likely than bar pressing -study: compared lever pressing on a fixed-interval 30 sec schedule, reinforced by either sucrose or the opportunity to run in a wheel for 15 seconds. as expected with a fixed-interval schedule, response rates increased closer to the end of the 30-second period. wheel running as the reinforcer was just as effective as 2.5% sucrose, more effective than 0% sucrose, but less effective than 10% sucrose.
determinants of elasticity of demand
1. availability of substitutes: the availability of substitutes increases the sensitivity of the original item to higher prices (more elasticity of demand) 2. price range: an increase in price has less of an effect at low prices than at high prices 3. income level: the higher your income, the less deterred you will be by increases in price. also true for reinforcers obtained on schedules of reinforcement...the number of responses or amount of time available for responding corresponds to income. the more responses or time animals have available, the less their behavior is influenced by increases in the cost of the reinforcer 4. link to complementary commodity: if a rise in the price of hot dogs drives down the number of hot dogs purchased, this will also decrease the purchase of hot dog buns.
Premack principle
responses that accompany commonly used reinforcers involve activities that individuals are highly likely to perform. preferred behaviors, or behaviors with a higher level of intrinsic reinforcement, can be used as a reward/reinforcement. instrumental responses are low-probability/less preferred activities. high-probability/preferred responses are the reinforcer. L then H = L reinforced. H then L = H not reinforced. the difference in response probabilities is critical for reinforcement