PSY 1400 Exam 2 (Chapter 5-8)


What must we do to use extinction to positively influence behavior?

1. Determine if the problem behavior is operant behavior 2. Identify the reinforcer that maintains the behavior (Called Functional Analysis of Behavior)

3 Objections to Reinforcement

1. Intrinsic Motivation 2. Performance-Inhibiting Properties of Reinforcement 3. Cheating

What are the 2 approaches to determining how reinforcement works?

1. Investigate the neurological events that occur when reinforcers are obtained. If reinforcers influence future behavior, they must do this by changing the biology of the individual. 2. Investigate this process at the level of the whole organism and how it interacts with its environment (without going into the neurological specifics that underlie the reinforcement process). This approach reveals in further detail how present behavior is influenced by past events and is the most practical approach to positively influencing human behavior.

6 Principles of Effective Shaping

1. Objectively define the terminal behavior. 2. Along what dimension does the learner's current behavior fall short of the terminal behavior? 3. When mapping out the sequence of successive approximations, ensure that each one is neither too easy nor too difficult. 4. Differential reinforcement: Reinforce the current response approximation and extinguish everything else, including old response approximations. 5. Be sure the learner has mastered each response approximation before advancing to the next one. 6. If the next approximation proves too difficult (extinction), lower the reinforcement criterion until responding is earning reinforcers again.

Generalized Conditioned Reinforcer

A conditioned reinforcer that signals a delay reduction to more than one backup reinforcer.

Primary Reinforcers

A consequence that functions as a reinforcer because it is important in sustaining the life of the individual or the continuation of the species. Primary reinforcers are those that work because of our genetic inheritance - these reinforcers help us survive. Human and nonhuman animals don't have to learn anything for primary reinforcers to work - we are phylogenetically prepared for these consequences to reinforce our behavior. The logic is the same across all these primary reinforcers: If this consequence did not function as a reinforcer, the individual would be less likely to survive. Without oxygen, we die; therefore, oxygen-seeking behaviors (swimming to the water's surface) are reinforced with access to this primary reinforcer.

Automatic Reinforcer

A consequence that is directly produced by the response - it is not provided by someone else - and which increases the behavior above a no-reinforcer baseline.

Negative Reinforcement - Avoidance (SRA-)

The consequent prevention of a stimulus, the effect of which is to increase operant behavior above its no-reinforcer baseline level. (Warning stimulus -- operant response --> aversive stimulus prevented.) The operant response prevents an unpleasant stimulus change from happening; it occurs because the warning stimulus signals the upcoming aversive event.

Negative Reinforcement - Escape (SRE-)

A consequent removal or reduction of a stimulus, the effect of which is to increase operant behavior above its no-reinforcer baseline level. (Aversive stimulus present-- operant response --> Aversive stimulus removed or reduced)

Differential Reinforcement

A procedure in which a previously reinforced behavior is placed on extinction while a second behavior is reinforced; it decreases the first behavior and increases the second. In treating alcohol addiction, for example, Behavior 1 (drinking) would be placed on extinction while Behavior 2 (working) is reinforced. It is a better option than the more common approach of punishment. Differential reinforcement also (1) provides the opportunity to teach an adaptive behavior that will replace the problem behavior and (2) decreases the frequency of extinction bursts and extinction-induced aggression. It is used to reduce problem behavior in children (especially autistic children) and in treating substance-use disorders. Example: Behavior 1 (opiate use) -- extinction; Behavior 2 (opiate refusal) -- reinforcement.

Reward Devaluation

A reinforcer will be effective at increasing the frequency of a behavior in some cases, but less so in others -- for example, after the reinforcer has lost value for the individual. The Hogarth and Chase study, in which smokers pressed buttons for cigarettes or chocolate, illustrates this effect.

Objection 2: Performance- inhibiting properties of reinforcement

A second concern about using positive reinforcement is that reinforcement contingencies can actually inhibit performances. This performance decline takes two forms - reinforcers reduce creativity and reinforcers can lead to "choking under pressure." If creativity is an important component of the performance (as it is in any artistic endeavor, and in many problem-solving contexts), a contingent relation should be established between creativity and obtaining the reinforcer. Contingencies arranging very large rewards would appear to increase the probability of choking under pressure.

Flow (Csikszentmihalyi)

A state in which one feels immersed in a rewarding activity and in which we lose track of time and self. Musicians report being in flow while playing a song. Rock climbers describe the flow state achieved when on the wall. Reading a page-turning novel can also produce a feeling of flow. To get a sense of this, consider a rock climber engaged in an afternoon ascent. Will she experience flow? Csikszentmihalyi's research suggests she will if the response-reinforcer contingencies naturally imposed by the rock face have three characteristics. First, flow will occur if the climb is neither too easy nor too difficult for the skills of our climber. If she has chosen a climb that is too easy, she will be bored. Conversely, if the climb is too difficult, she will feel frustrated, angry, and so on as her climbing fails to produce reinforcers (extinction). Flow will be achieved only in the "Goldilocks zone," where the challenge is neither too difficult nor too easy. It needs to be "just right" for her skill level. The responses required to meet the reinforcement contingency will require her complete attention. Second, our climber is more likely to experience flow if the rock wall naturally arranges what Csikszentmihalyi called "proximal goals." A proximal goal on the wall might be making it past a difficult set of holds and crossing over to the next section of the climb. This consequence signals a delay reduction to the ultimate reinforcer - getting to the top of the wall. Therefore, a "proximal goal" can be thought of as a contingency arranging a conditioned reinforcer along the way. Third, the flow state will be achieved if the wall provides immediate task-relevant consequences for the climber's behavior. Said another way, when a skillful or unskillful response occurs, she needs to know that right away. 
This immediate feedback is provided when the climber either loses a toehold (the immediate, contingent consequence of an unskillful response) or makes it to the next hold (contingent on a skillful response). These consequences are immediate; they mark the exact response that produced the outcome.

Contingency Management

A treatment for drug abuse that uses contingent consequences. A new causal relation between drug abstinence (behavior) and a reward (consequence). IF you abstain from using drugs for the next two days, THEN you will earn a modest cash reward.

Differential Reinforcement of Variability

A unique contingency in which responses, or patterns of responses, that have either never been emitted before or have not been emitted in quite some time are reinforced, and repetition of recent response topographies are extinguished. Researchers in applied settings have begun to explore the utility of differentially reinforcing variability in children with autism, a diagnosis often characterized by rigid patterns of activity or speech.
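One common laboratory implementation of this contingency is a lag schedule (e.g., Lag 1: a response is reinforced only if it differs from the previous response). A minimal sketch of that idea; the function name and response labels here are illustrative, not from the text:

```python
def lag_schedule(responses, lag=1):
    """Return, for each response in order, whether it meets a Lag-N
    variability criterion: reinforced only if it differs from each of
    the previous `lag` responses (repetitions go unreinforced)."""
    reinforced = []
    for i, r in enumerate(responses):
        recent = responses[max(0, i - lag):i]  # the last `lag` responses
        reinforced.append(r not in recent)
    return reinforced

# A repeated topography ("tap") stops earning reinforcers until the
# individual tries something different ("clap").
outcomes = lag_schedule(["tap", "tap", "clap", "tap", "tap"])
```

Here the second and fifth responses repeat the immediately preceding topography, so only the varied responses would be reinforced.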

What kind of learning does AI use and why?

AI uses trial-and-error learning and forced variability to solve complex problems. The problem is that if the AI exploited the first successful action (by exclusively engaging in that action) then it would settle upon a satisfactory solution, but not an optimal one. The thing that makes AI so effective in finding optimal solutions is that it is willing to continuously try something new. Every AI program is written with a forced-variability component. No matter how successful the AI is in solving the problem, its programming forces it to do something different some of the time.
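The "forced-variability component" described above corresponds to what machine-learning practitioners call an exploration rule, such as epsilon-greedy: mostly exploit the best-known action, but explore at random some fixed fraction of the time. A minimal sketch (the action names and the 10% exploration rate are illustrative assumptions):

```python
import random

def epsilon_greedy_choice(action_values, epsilon=0.1, rng=random):
    """Pick the best-known action most of the time, but force
    variability by trying a random action with probability epsilon."""
    if rng.random() < epsilon:
        # Forced variability: try something, anything, at random
        return rng.choice(list(action_values))
    # Exploitation: repeat the action with the highest estimated payoff
    return max(action_values, key=action_values.get)

# Even though "lever_B" currently looks best, the program will still
# occasionally sample the other actions, so it can discover a better one.
values = {"lever_A": 0.2, "lever_B": 0.9, "lever_C": 0.5}
choice = epsilon_greedy_choice(values, epsilon=0.1)
```

With epsilon set to 0 the program settles on the first satisfactory solution (pure exploitation); any epsilon above 0 guarantees it keeps "doing something different some of the time."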

Information Theory of Reinforcement

Agrees that reinforcers increase operant behavior above a baseline level, but it disagrees about how this happens. The Information Theory holds that reinforcers provide information that allows the individual to predict when or where subsequent reinforcers may be obtained. In much the same way that a road sign can direct you to Albuquerque, the delivery of a reinforcer provides information directing individuals to more reinforcers. In the words of Cowie and Davison (2016), "Behavior is... controlled by the likely future, as exemplified in the past"

What does flow have to do with shaping?

Although the climber is not a novice, she is constantly improving her skills. If she chooses to climb on a Goldilocks-zone rock wall, the contingencies inherent to the wall will differentially reinforce skillful responses that she finds challenging, but not impossible. Less skillful responses are not reinforced with progress up the wall. These skillful responses are approximations of the terminal skillset she desires - a skillset that would allow her to climb a wall that today she would find nothing but frustrating. Effective shaping, whether imposed naturally by a good rock wall or arranged artificially by a skilled video game designer or behavior analyst, will meet the learner where they are — offering challenges, immediately reinforcing skillful responses, and arranging conditioned reinforcers along the way. When achieved, the learner improves or acquires new skills, while achieving a state of flow in which all sense of time and self is lost.

Extinction-Induced Variability

An increase in the variety of operant response topographies following extinction. Example: saying "Hey Siri" with different tones of voice when Siri doesn't respond.

Examples of negative reinforcement- avoidance

Behavior -- aversive event prevented:
Turning in an assignment on time -- avoid a late penalty
Putting on sunblock -- avoid getting a sunburn
Paying your electricity bill -- avoid a service interruption
Ordering medium-heat wings -- avoid burning your mouth
Saying "yes, that shirt looks good on you" -- avoid your friend getting upset
Withdrawing your hand from a growling dog -- avoid getting bitten
Using a condom -- avoid an unplanned pregnancy

Information Theory of Reinforcement

Behavior is controlled by the likely future, as exemplified in the past. Past reinforcement experiences provide information about what's likely to happen next. To explain the PREE, Information Theory holds that the estimate of the likely future is updated only when the individual detects a change in the reinforcement rate. If it's difficult to detect the change in reinforcement rate following extinction, then the estimate of the likely future is unchanged for a much longer period of time.

Rewards

Beneficial consequences that we think will function as reinforcers, but we don't know if they will.

Functional Analysis of Self-Injurious Behavior

Conducting a functional analysis of behavior is critical in the treatment of self-injurious behavior. Is this behavior an operant? If so, what reinforcer could possibly maintain behavior that is so clearly harmful to the individual? Answers to these questions will influence the treatment prescribed. Therefore, it is good news that a functional analysis of behavior identifies the reinforcer maintaining problem behavior most of the time. Because the physical stimulation of self-injurious behavior is an automatic consequence of the response, if it functions as a reinforcer, it will be identified as an automatic reinforcer. To find out if self-injury is maintained by automatic reinforcement, the individual would, safety permitting, be given some time alone while the behavior analyst discreetly records the frequency of self-injury. If self-injury occurs at the usual rate while alone, then the only consequence that might maintain this behavior is the automatic outcome of that response, that is, the "painful" stimulation. If automatic reinforcers maintain a problem behavior, then extinction is impossible - the therapist cannot turn OFF the automatic stimulation experienced each time face slapping occurs (Vollmer et al., 2015). Preventing self-injury by restraining the client is not extinction - during extinction, the response occurs but the reinforcer does not. Let's assume that automatic reinforcement is not responsible for the self-injurious behavior. During a functional analysis of behavior, several other hypothesized reinforcers will be evaluated. Does problem behavior occur primarily when attention is the consequence of self-injurious behavior? If so, then extinction will involve no longer delivering this positive reinforcer contingent upon self-injury. However, it is often ethically impossible to extinguish attention-maintained self-injurious behavior. One cannot ignore self-injury if serious physical harm to the client is the outcome.
Alternatively a functional analysis may reveal that self-injurious behavior occurs because it allows the individual to escape from everyday tasks. For example, if self-injury occurs when the individual is asked to change clothes or transition to a new activity, then negative reinforcement (escape from the activity) is responsible for the problem behavior. In such cases, escape extinction can be effective in reducing self-injury; that is, clothes will be changed and transitions between activities will occur regardless of self-injurious behavior. But, as mentioned earlier, the behavior analyst should be prepared for extinction-induced emotions, physical aggression, and topographical variability in the self-injurious behavior, including an increase in the magnitude of the self-injurious response. Such reactions may make it impossible to use extinction alone.

Punisher

Consequences that decrease the probability of behaviors.

Examples of negative reinforcement - Escape

Deleting an email -- email removed from inbox
Taking an aspirin -- headache goes away
Picking your nose -- booger is removed
Soothing an infant -- baby stops crying
Taking out the trash -- garbage removed from apartment
Putting out a fire -- fire stops burning
Agreeing to go to the bar -- peer pressure ends
Seeing a behavior therapist -- depression is reduced
Leaving a room -- ending an unwanted conversation

Shaping

Differential reinforcement of successive approximations to a terminal behavior. Because the trainer asked for only a little more than before, the elephant succeeded in touching the target. This general strategy of asking for just a little bit more and gradually moving behavior toward a desired behavior is called "shaping." Let's dissect this definition so readers will fully understand shaping. Differential reinforcement involves reinforcing the desired behavior and extinguishing previously reinforced behaviors. For the elephant, the desired behavior is touching the trunk to the target stick. All other apple-seeking behaviors (e.g., trying to take apples from the trainer's bucket) are extinguished. Second, the terminal behavior is the performance you ultimately want. The trainer ultimately wants the elephant to walk to the target whenever and wherever it is presented. However, the trainer knows that if she asks for this performance at the beginning of training, the elephant will not be able to do it. If this happens, training fails, and the behavior of both the trainer and elephant is extinguished. Finally, shaping involves differentially reinforcing successive approximations to the terminal behavior. That is, we begin by reinforcing a first approximation of the terminal behavior; something simple, something we anticipate the elephant can already do. If the target is placed just an inch away from the trunk, the elephant will probably touch it. When the target is touched, the conditioned reinforcer is delivered and then the backup reinforcer is provided. The conditioned reinforcer marks the correct response and signals a delay reduction to the apple. All other apple-seeking behaviors are extinguished. Once the first approximation is mastered, shaping continues through a series of successive approximations of the terminal behavior. The next approximation is taking a step to reach the target. 
When the step is taken and the target is touched, the conditioned reinforcer marks the response and the backup reinforcer follows. The previous approximation, reaching for the target without taking a step, is extinguished, as are all other apple-seeking behaviors. Once the second approximation is mastered, the trainer moves on to the third (two steps must be taken to reach the target) and so on, until the terminal behavior is mastered.

The Response Strengthening Theory of Reinforcement

Each obtained reinforcer is hypothesized to strengthen the behavior it follows. The more frequently an operant behavior is followed by a reinforcer, the theory goes, the more firmly it is established, and the more difficult it will be to disrupt (Nevin & Grace, 2000). A metaphor for this hypothesis is a bucket of water - the heavier the bucket, the more response strength; the more response strength, the more probable is the response. Obtained reinforcers give the bucket its weight. Each reinforcer adds a bit more water to the bucket. Frequent or big reinforcers create a heavier bucket than infrequent or small reinforcers.

Response-Strengthening Theory of Reinforcement

Each time a reinforcer is obtained, the strength of the response it follows is increased. Operant responses with a lot of response strength are more likely to occur. Inconsistent with PREE and spontaneous recovery of operant behavior.

Examples of positive reinforcers

Food, water, electronic brain stimulation, heroin, methylphenidate, alcohol, cocaine, social reinforcers (reciprocal smiling, responsive parents)

2 categories of functional variables that may be used therapeutically to change behavior

1. Functional antecedent variables 2. Functional consequence variables

What are some everyday behaviors that are operant behaviors influenced by contingent consequences?

Grades, paychecks, soothed infants, smiles, frowns, speeding tickets.

What are 3 reasons for distinguishing between positive and negative reinforcement?

Heuristics, Loss Aversion, and Preference for Positive Reinforcement

What do behavior analysts do to positively influence behavior?

Identifying functional antecedent variables and functional consequence variables is the core of behavior analysis. Changing these variables is what behavior analysts do to positively influence behavior.

response variability

If a reinforcer never occurs, it cannot increase behavior. Behaving variably is critical to solving the puzzle of a new reinforcement contingency. When individuals are exposed to new contingencies of reinforcement, they tend to respond more variably - they explore in a trial-and-error way. When individuals learn how a reinforcement contingency works, they shift from exploration to exploitation - behaving efficiently and rarely trying something new.

How does the rate of reinforcement prior to extinction affect how quickly operant extinction occurs?

If behavior has been reinforced every time it occurs (high rate of reinforcement), then after extinction starts, behavior will quickly decrease to baseline levels. Conversely, if behavior was infrequently reinforced (low rate of reinforcement), then following the contingency change to extinction it will take longer for behavior to decrease to baseline levels. This relation between prior reinforcement rate and how quickly behavior undergoes extinction is called the partial reinforcement extinction effect (PREE). (Compare a car that always starts on the first try -- when it stops starting, you give up quickly -- with a slot machine that pays off only occasionally, which sustains long runs of unreinforced play.)

Objection 3: Cheating

If cheating can produce the positive reinforcer more easily than engaging in the desired behavior, some people will succumb to the temptation.

Preference for positive reinforcement in distinguishing between positive and negative reinforcement

If individuals object to negative reinforcement contingencies but not positive reinforcement, then it is important to distinguish between these two types of reinforcement.

Heuristics in distinguishing between positive and negative reinforcement

Important to remember all your options when influencing behavior through reinforcement. SR+, SRE-, and SRA- provide a heuristic for remembering three different ways in which you can arrange reinforcement contingencies: consequences can be presented (SR+), removed/reduced (SRE−), or prevented (SRA−).

The Token Economy

In the 1960s applied behavior analysts developed a conditioned-reinforcement system known as the "token economy". In this therapeutic system, tokens were used to reinforce prosocial and life-skill behaviors among hospitalized psychiatric patients, individuals with developmental disabilities, and delinquent youth. A token economy is a set of rules governing the delivery of response-contingent conditioned reinforcers (tokens, points, etc.) that may be later exchanged for one or more backup reinforcers. A backup reinforcer is the reinforcer provided after the conditioned reinforcer signals the delay reduction to its delivery. In a token economy, earning a token signals that the individual is nearer in time to a desired product or service (the backup reinforcer) than they were before the token was given to them.
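The rules of a token economy amount to simple bookkeeping: response-contingent token delivery plus later exchange for backup reinforcers. A minimal sketch of that structure (the class name, prices, and backup reinforcers are made up for illustration):

```python
class TokenEconomy:
    """Minimal token-economy bookkeeping: tokens are earned contingent
    on target responses and later exchanged for backup reinforcers."""

    def __init__(self, backup_prices):
        # Maps each backup reinforcer to its token cost
        self.backup_prices = backup_prices
        self.tokens = 0

    def earn(self, n=1):
        """Deliver n tokens contingent on a target response."""
        self.tokens += n

    def exchange(self, backup):
        """Trade tokens for a backup reinforcer, if affordable."""
        price = self.backup_prices[backup]
        if self.tokens < price:
            return False  # not enough tokens yet
        self.tokens -= price
        return True

economy = TokenEconomy({"snack": 3, "movie": 10})
for _ in range(4):
    economy.earn()                        # four target responses
got_snack = economy.exchange("snack")     # affordable; one token remains
```

Each `earn` call plays the role of the conditioned reinforcer (the token signals reduced delay to a backup reinforcer); `exchange` delivers the backup reinforcer itself.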

What theory explains the effects of extinction on operant behavior best?

Information Theory of Reinforcement (NOT Response Strengthening Theory)

clicker training with humans

Levy, when interviewed in 2018 by the Hidden Brain podcast, noted that clicker training has another, less easily quantified, advantage - using a clicker keeps the learner focused on the performance. Using verbal praise and criticism, the traditional teaching method, continuously draws the medical student's attention away from the surgical technique and places it on the teacher's evaluation of the student's competency as a person. Clickers, Levy noted, are "language free...baggage free...I'm quiet, you're quiet; we're just learning a skill." Learners find clicker feedback less disruptive and more fair than verbal feedback. A simple click keeps things objective, and better surgeons are the result. Using clickers to teach humans to execute complex behaviors has only begun to be studied, but the results are encouraging. Clicking marks correct responses at the moment they occur. This improves the skilled performances of athletes, dancers, and individuals with autism as they learn important life skills. They might also prove effective with patients in physical therapy, when learning to do the exercise their physical therapist is teaching. Here it is important that the exercise be done exactly correctly, lest the patient further injure themselves, or not benefit from the exercise as they should. Perhaps you too will think of an application of clicker training to positively influence human behavior.

Pavlovian Learning and Conditioned Reinforcers

Like all examples of Pavlovian conditioning, the neutral stimulus (ka-chunk) signals a reduction in the delay to the unconditioned stimulus (US). In this experiment, the US was food - a primary reinforcer. Because of this signaled delay reduction, ka-chunk comes to function as a conditioned stimulus (CS). In Figure 8.1, food is delivered once, on average, every 30 seconds. When the ka-chunk happens, the delay is reduced to a half-second; that's quite a delay reduction. Because of this large delay reduction, ka-chunk acquires a CS function - it evokes conditioned responding such as salivation and physiologically detectable excitation. If Pavlovian learning is necessary for a neutral consequence to become a conditioned reinforcer, then when the ka-chunk functions as a CS it should also function as a conditioned reinforcer. To test this, Skinner arranged an operant contingency between pressing a lever (newly introduced to the chamber) and the ka-chunk: IF lever press → THEN ka-chunk Note that the only consequence of pressing the lever was a ka-chunk; no food was provided. If Skinner's rats learned to press the lever (which they had never seen before), then he could conclude that ka-chunk functions as a reinforcer - a conditioned reinforcer. Indeed, Skinner's rats learned to press the lever when the only consequence was the ka-chunk. However, as they continued to earn ka-chunks, they gradually decreased their rate of lever pressing. Why? Because of Pavlovian extinction. When the CS (ka-chunk) was repeatedly presented without the US (food), the CS no longer signaled a delay reduction to the US. Eventually ka-chunk stopped functioning as a CS or as a conditioned reinforcer.

Examples of negative reinforcers

Medications: escape from pain, stress, depression, skin rashes etc. Addictive drugs: drug withdrawal (taking the drug stops withdrawal symptoms)

What does money function as?

Money has a conditioned-reinforcing function

Natural selection and primary reinforcers

Natural selection favors those who repeat behaviors that produce life-sustaining consequences. As an illustrative thought experiment, consider two infants living in Paleolithic (cave-person) times. One infant finds its mother's milk highly reinforcing - suckling behaviors that produce this milk are repeated again and again (Kron, 1967). The second infant's behavior is not reinforced by milk - the milk was sampled once and suckling was not repeated. The genes of the first infant have prepared it for survival. It did not need to learn anything new for mother's milk to reinforce its behavior; the first sip was enough to increase the future probability of successful suckling behaviors. The second infant's genes are, sadly, a death sentence in an age without modern medical technologies. An early death prevents these genes from reaching the next generation. The result of this natural selection process is that mother's milk functions as a reinforcer for virtually all human infants and other mammals. Individuals are more likely to survive if their genes establish life-sustaining consequences as reinforcers. Behavior analysts refer to these life-sustaining consequences as primary reinforcers.

noncontingent consequences in north korea

Under North Korea's communist system, everyone receives the same rewards regardless of how well they do their jobs -- the consequences are noncontingent. Because rewards are not contingent on behavior, there is little incentive to work hard, and the result is reduced productivity, innovation, and creative thinking.

Antecedent

Observable stimulus that is present before the behavior occurs. Example: Presence of the button

Percentile Schedule of Reinforcement

Offers a simple automated training technique incorporating the six principles of effective shaping. Behavior analysts have used percentile schedules of reinforcement to improve academic performance and social interactions among children with disabilities and to reduce cigarette smoking. Step 1: Determine the terminal behavior. Step 2: Collect 10 days of baseline data. Step 3: Each day, set that day's reinforcement criterion at the 60th percentile of the last 10 days of data (repeat daily).
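The criterion-setting step can be sketched as a short function. A minimal illustration using one simple percentile convention (real percentile schedules may compute the rank differently); the function name and baseline numbers are made up:

```python
def percentile_criterion(observations, window=10, percentile=60):
    """Set today's reinforcement criterion at the given percentile of
    the last `window` observations, so the goal automatically tracks
    the learner's recent performance (an automated shaping rule)."""
    recent = sorted(observations[-window:])
    # Index of the percentile rank within the sorted window
    k = min(int(len(recent) * percentile / 100), len(recent) - 1)
    return recent[k]

# Hypothetical baseline: minutes of on-task behavior over 10 days.
baseline = [4, 6, 5, 7, 5, 8, 6, 9, 7, 10]
today_goal = percentile_criterion(baseline)  # today's response must meet or beat this
```

Because the window slides forward each day, the criterion rises as performance improves -- each day's goal is a successive approximation that is neither too easy nor too hard.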

Verbal Learning and Conditioned Reinforcers

Parents explain to you that tickets win arcade prizes (that tickets are a delay-reduction stimuli for prizes), which makes them function as conditioned reinforcers to skillfully play games. During verbal learning, information is provided indicating that the conditioned reinforcer signals a delay reduction to another reinforcer. That is, the Pavlovian CS→US contingency is verbally described. Laboratory studies reveal that humans are capable of verbally learning Pavlovian contingencies. For example, if humans are instructed that a red light will precede the delivery of an electric shock (US), when that instructed CS is encountered, an involuntary physiological fear response occurs (the conditioned response [CR]), even when the shock does not (Phelps et al., 2001). Similarly, in your everyday life, if a friend tells you their dog bites in a vicious and unpredictable way (an event that would function as a highly aversive US), then when you see the dog approaching (CS), you will experience fear (CR), despite having never seen the dog bite anyone.

What are two ways that a consequence can come to function as a conditioned reinforcer?

Pavlovian learning and verbal learning

Ways positive and negative reinforcers are different

Presentation vs. removal/reduction/prevention. If given a choice, people prefer positive reinforcers.

What are the primary and other effects of operant extinction?

Primary: It returns behavior to its baseline (no-reinforcer) level. Other Effects: 1. Extinction-induced emotional behavior 2. Extinction burst 3. Extinction-induced variability 4. Extinction-induced resurgence

Principles for effective conditioned reinforcement

Principle 1: Use an effective backup reinforcer. Principle 2: Use a salient conditioned reinforcer. Principle 3: Use a conditioned reinforcer that signals a large delay reduction to the backup reinforcer. Principle 4: Make sure the conditioned reinforcer is not redundant.

Principle 1 for effective conditioned reinforcement

Principle 1: Use an effective backup reinforcer. The first principle of Pavlovian conditioning was "Use an important US." The more important the US, the better the Pavlovian conditioning. Translated to conditioned reinforcement, this becomes, Use an effective backup reinforcer. The better the backup reinforcer, the more effective the conditioned reinforcer will be. One strategy for arranging an effective conditioned reinforcer is to use a token that can be exchanged for a lot of different backup reinforcers. For example, a $100 bill is a highly effective conditioned reinforcer because its receipt signals a delay reduction to many different backup reinforcers (ice cream, concert tickets, a new pair of shoes, etc.). When a conditioned reinforcer signals a delay reduction to more than one backup reinforcer, it is referred to as a generalized conditioned reinforcer.

Principle 2 for effective conditioned reinforcement

Principle 2: Use a salient conditioned reinforcer. The second principle of Pavlovian conditioning was "Use a salient CS." Translated to conditioned reinforcement, this becomes, Use a salient conditioned reinforcer. Simply put, a noticeable conditioned reinforcer will work better than one that is easily overlooked. For example, giving a child a token and having her place it in her token bank is more salient than dropping the token into the bank for her. If you are going to deliver a conditioned reinforcer, you want to be certain the individual observes it. If they don't observe the conditioned reinforcer, it will never positively influence behavior. To increase the salience of conditioned reinforcers, animal trainers often use clickers instead of saying something like, "good boy" (after all, animals don't speak English). Clickers, which are inexpensive and may be purchased at most pet supply stores, present a salient auditory stimulus "click-click." This sound is established as a conditioned reinforcer by ensuring that it signals a delay reduction to an effective backup reinforcer, like a favorite treat. The sound of the clicker is unique. Pets, farm animals, and captive animals in zoos have never heard a sound like this before. This novelty makes the sound salient, that is, something that's difficult to miss. A second advantage of the clicker is that its "click-click" sound is brief, much briefer than the time it would take to say, "good boy." This brevity is important because an effective conditioned reinforcer marks the desired response (and no other behavior) as it occurs (Urcuioli & Kasprow, 1988; Williams, 1999). When we say marking, we mean that the conditioned reinforcer immediately follows the response, and this helps the individual learn which response produced the backup reinforcer.

Principle 3 for effective conditioned reinforcement

Principle 3: Use a conditioned reinforcer that signals a large delay reduction to the backup reinforcer. The third principle of Pavlovian conditioning was, "Use a CS that signals a large delay reduction to the US." Translation: Use a conditioned reinforcer that signals a large delay reduction to the backup reinforcer. The bigger the delay reduction to the backup reinforcer, the more effective the conditioned reinforcer will be. As in Chapter 4, the amount of delay reduction signaled by the conditioned reinforcer is easily calculated using the delay-reduction ratio:
Delay-reduction ratio = (US→US interval) / (CS→US interval)
In this equation, the US→US interval refers to the average time between backup reinforcers. The CS→US interval is the time separating the conditioned reinforcer and the delivery of the backup reinforcer.
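The ratio is simple arithmetic. A minimal sketch of the calculation (the interval values below are invented for illustration, not taken from the text):

```python
# Hypothetical illustration of the delay-reduction ratio.
# All interval values are made up for demonstration.

def delay_reduction_ratio(us_us_interval, cs_us_interval):
    """Delay-reduction ratio = (US->US interval) / (CS->US interval).
    Larger values mean the conditioned reinforcer (CS) signals a
    bigger reduction in the delay to the backup reinforcer (US)."""
    return us_us_interval / cs_us_interval

# Suppose treats (backup reinforcers) arrive every 60 s on average, and
# the clicker sounds 5 s before each treat. The click signals a 12-fold
# delay reduction, so it should be an effective conditioned reinforcer.
print(delay_reduction_ratio(60, 5))   # -> 12.0

# If the click instead came 30 s before the treat, the signaled delay
# reduction is much smaller, and the click would be a weaker reinforcer.
print(delay_reduction_ratio(60, 30))  # -> 2.0
```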

Principle 4 for effective conditioned reinforcement

Principle 4: Make sure the conditioned reinforcer is not redundant. The final principle of Pavlovian conditioning was, "Make sure the CS is not redundant." Translated to conditioned reinforcement, we want our conditioned reinforcer to be the only stimulus signaling a delay reduction to the backup reinforcer. For example, imagine that you want to start using a clicker as a conditioned reinforcer when training your dog. Establishing "click-click" as a conditioned reinforcer will be easier if you avoid simultaneously presenting a stimulus that already functions as a conditioned reinforcer. So, if you normally praise your dog just before giving her a treat, then praise probably already functions as a conditioned reinforcer. If you click at the same time that you praise the dog, then the click is a redundant stimulus - the praise already signals a delay reduction to the treat. Because the "click-click" is redundant with praise, it is unlikely to acquire conditioned reinforcing properties. Instead, when establishing "click-click" as a conditioned reinforcer, you should withhold the praise throughout. This ensures that "click-click" is the only stimulus signaling a delay reduction to the treat.

Functional Communication Training

Problematic demands for attention (e.g., tantruming) are extinguished while appropriate requests (e.g., "will you play with me please") are established and reinforced. When a child has learned the latter response, they can appropriately request attention when they want it. Functional Communication Training has proven effective in reducing inappropriate requests for social reinforcers in a variety of populations and settings.

What has been the most prevalent means of decreasing problem behavior throughout history?

Punishment. Spanking/hitting misbehaving children. Criminal justice and religions often depend on punishment. Differential Reinforcement offers a better option for reducing problem behavior.

Differential Reinforcement of low-rate behavior (DRL)

Responding quickly is extinguished and responding slowly is reinforced. This happens naturally if you visit a foreign country with only a basic comprehension of their language. If the person giving you directions to the bus terminal speaks too quickly, you cannot understand them, so you look at them quizzically and withhold the usual reinforcers such as "merci." When the person repeats the directions slowly, you understand and then say, "merci beaucoup."

Escape Extinction

Responding that meets the negative reinforcement contingency no longer removes or reduces the aversive event. As a result, responding decreases to baseline (no-reinforcer) levels. In the feeding-disorder intervention, two contingency changes were made. The first, escape extinction, was designed to reduce food refusals:
IF food refusal (tantrum) → THEN food is not removed
The second, escape-based negative reinforcement (SRE−), was designed to increase eating:
IF food consumed → THEN child can choose to end the meal

How can we effectively use Differential Reinforcement?

See flowchart picture on phone taken on 10/11/2021.

Shaping Principle 1

Shaping Principle 1 asks us to provide an objective definition of the terminal behavior. That is, describe the terminal behavior in enough detail that it can be objectively measured. The terminal behavior in Plants vs. Zombies - beating the final level of the game - involves a lot of skills that the game shapes up over time. Let's simplify things by focusing on just one of these skills - the expert player clicks things quickly. For example, when suns fall from the sky, the expert clicks them immediately after they appear. Likewise, when a desired seed packet is available, the expert player clicks it right away. When playing at the highest levels of the game, clicking slowly is disastrous - brains!

Shaping Principle 2

Shaping Principle 2 asks us to evaluate what the novice player can currently do and how that falls short of the terminal behavior. The designers of Plants vs. Zombies knew that novice players could click things - after all, they clicked the app that loaded the game. However, novices do not normally click apps the millisecond they become available. The dotted line shows how quickly the player will have to click things in order to win at the highest level of the game. This is the speed of the terminal behavior. Clearly, the current speed falls short. Identifying this gap between current and terminal performance helps the designers identify a dimension along which behavior needs to change - it needs to get faster (reduced latency between stimulus and response). Once this dimension is identified, the game designer can set the reinforcement contingency for the first response approximation.

Shaping Principle 3

Shaping Principle 3 provides advice for setting the reinforcement contingencies - the response approximations should challenge the player, but not so much that reinforcers cannot be obtained. Throughout shaping we need to hit that Goldilocks zone in which reinforcers can be earned, but only if the performance is a little better than before. If we set the reinforcement contingency at the superfast level shown in Figure 8.3, the player would never beat the level; that is, they would never obtain critical reinforcers in the game. Operant extinction would leave these players feeling frustrated and angry, and they would, in turn, leave negative reviews in the app store. Not a good thing if you are trying to make a living in video game design.

Shaping Principle 5

Shaping Principle 5 suggests we ensure the learner has mastered the current response approximation before moving on to the next one. This is achieved in Plants vs. Zombies by not letting players advance to the next level (where suns will have to be clicked a little faster) until they beat the current level. Beating a level demonstrates the player has mastered the current response approximation.

How is shaping used on human behavior?

Shaping is an effective way to help humans learn complex terminal behaviors, that is, those requiring skills that make it impossible for the novice to obtain a reinforcer. Shaping can transform the novice player into a zombie-killing machine. Imagine what would happen if the novice player's first experience with the game was the extreme challenge of the final level. The outcome would be disastrous. The brain-hungry zombies would quickly overrun the house and the reinforcer (defeating the level) would not be obtained. No matter how many times the novice tried to beat the final level, their game-playing behavior would be extinguished. Instead, the game's designers arranged lots of conditioned reinforcers for behaviors that were mere approximations of the skills needed to win when playing the final level. If you've played the game, you may remember that the first approximation taught to novice players is clicking on a seed packet (they appear in the upper left corner of the screen). This response produces a conditioned reinforcer - the sound of the packet being torn open. This sound marks the correct response and signals a small delay reduction to the backup reinforcer - killing all the zombies and winning the first level. The next approximation is to click suns as they fall from the sky. The first time this is done, another auditory conditioned reinforcer marks the response and points are added to a counter. In this token economy, points (generalized reinforcers) may be exchanged for many different backup reinforcers - zombie-killing plants, zombie-blocking obstacles, and sunflowers that produce more suns/points. All told, the game differentially reinforces thousands of successive approximations before the terminal behavior is acquired. When framed as a protracted learning task in which thousands of new skills must be acquired, it sounds tedious.
But because the game designers so effectively used shaping, we use other words to describe the game - "fun," "engaging," and "a place where I lose all track of time."

B.F. Skinner

Skinner box, operant conditioning

Stimulus Presentation

Something new is added to the environment

Differential Reinforcement of High-Rate Behavior (DRH)

Sometimes the problem with a behavior is not its topography, but the rate at which it occurs. Rates of responding may be increased or decreased with differential reinforcement. This is used if the rate of behavior is too slow. Here, low-rate responding is put on extinction and high-rate responding is reinforced.

How has reinforcement been used to positively influence behavior?

Teaching independent living skills to adults and children with intellectual and developmental disabilities, treatment of children with autism using positive reinforcement, positively influence behaviors that are at the heart of public health deficits, and teach critical helping behaviors to animals.

Extinction Burst

Terminating the reinforcement contingency sometimes, but not always, produces a temporary increase in the rate, magnitude, or duration of the previously reinforced response. If your rate of button-pressing temporarily increases (pressing rapidly), the magnitude of your presses increases (pressing really hard), or the duration of your presses increases (holding the button down longer than normal), then this was an extinction burst. Similarly, in clinical settings, extinction bursts happen in about half the cases in which extinction is the only therapy employed. Because extinction bursts can be stressful and can lead to accidental lapses in extinction-based therapy, it is important to continue studying the variables influencing extinction bursts.

Differential Reinforcement of Incompatible Behavior (DRI)

The "something else" is a response that is topographically incompatible with the problem behavior. This was the technique used in the zoo study - holding the ring outside the cage was topographically incompatible with throwing feces and shaking the cage.

Two Term Contingency

The basic operant contingency. Two terms: response and consequence.

Marking

The conditioned reinforcer immediately follows the response, and this helps the individual learn which response produced the backup reinforcer.

What two factors influence how quickly behavior decreases to baseline levels under an operant-extinction contingency?

The first factor is the rate of reinforcement prior to extinction - the higher the rate of reinforcement, the faster extinction will work. The second is the individual's motivation to acquire the reinforcer - the more the reinforcer is needed, the more persistent behavior will be during extinction.

Edward L. Thorndike

The first scientist to demonstrate that reinforcers increase the probability of behavior. Widely known for the law of effect: the principle that rewarded behavior is likely to recur and punished behavior is unlikely to recur. This principle was the basis for B.F. Skinner's behavioral technology.

Objection 1: Intrinsic Motivation

The natural drive to engage in a behavior because it fosters a sense of competence. (Extrinsic reinforcers: reinforcers that are not automatically obtained by engaging in the behavior; instead, they are artificially arranged.) The objection holds that extrinsic reinforcers undermine intrinsic motivation. However, studies have shown that extrinsic reinforcers do NOT decrease intrinsic motivation to engage in behavior. Extrinsic reinforcers can help people find automatic reinforcers that they would have otherwise never experienced. Verbal extrinsic reinforcers enhance intrinsic motivation. Tangible extrinsic reinforcers can have a temporary negative impact, but this negative effect can be avoided if the reinforcer comes as a surprise.

Positive Reinforcement (SR+)

The presentation of a consequence (a stimulus presentation), the effect of which is to increase operant behavior above its no-reinforcer baseline level. Any time that reinforcement involves the presentation of a stimulus we will classify it as a positive reinforcer.

Shaping Principle 4

The red dashed line in Figure 8.4 shows a reinforcement contingency that is neither too easy nor too difficult. If the player clicks things at least this quickly (anywhere in the reinforcement zone), the reinforcer will be obtained; that is, the player will defeat the zombies and win the level. Clicking slower than that (the extinction zone) is not reinforced. Reinforcing one response and extinguishing other, previously reinforced responses is, of course, differential reinforcement - Principle 4.

Functional Analysis of Behavior

The scientific method used to (1) determine if a problem behavior is an operant and (2) identify the reinforcer that maintains that operant. The functional analysis is a brief experiment in which consequences that might be reinforcers are turned ON and OFF, while the effects of these manipulations on problem behavior are recorded. If the problem behavior occurs at a higher rate when one of these consequences is arranged, then we may conclude that the consequence functions as a reinforcer. If no experimenter-controlled consequences function as reinforcers, then either the behavior is maintained by an automatic reinforcer or the problem behavior is not an operant behavior. The results of a functional analysis of behavior are useful. If the problem behavior is not an operant, then an operant-based intervention may not be the right approach (perhaps a Pavlovian intervention, like graduated-exposure therapy, would work better). But if the problem behavior is an operant, then a consequence-based intervention can help to reduce the behavior.

Loss Aversion

The strong tendency to regard losses as considerably more important than gains of comparable magnitude—and, with this, a tendency to take steps (including risky steps) to avoid possible loss. Negative reinforcers seem to more effectively influence behavior.

Loss aversion in distinguishing between positive and negative reinforcement

The tendency for loss prevention (SRA−) to influence behavior more than presentation of the same stimulus (SR+). People value loss prevention more highly than an equivalent gain.

Ways positive and negative reinforcers are the same

They are both consequences. They both increase behavior above baseline (no-reinforcement) level

What can reinforcers be used for?

To direct behavior toward actions that benefit the individual, those they interact with, and society at large.

Good Behavior Game

Token economies have also proven effective in schools. Perhaps, when you were growing up, your elementary-school teacher used a version of the "Good Behavior Game". In this game, students are assigned to teams, and points, later exchangeable for goods and privileges, are given (and taken away) contingent upon appropriate and inappropriate behavior. Token economies have several attractive features:
Motivationally robust: Because tokens can be exchanged for many different backup reinforcers, motivation to earn them remains fairly constant. For example, when a psychiatric patient uses his tokens to buy a shirt, he is still motivated to earn more tokens because they can be exchanged for candy, movies, and so on.
Nondisruptive: Reinforcing an ongoing behavior with a token is easier than with a backup reinforcer that disrupts the performance. For example, providing a token when a patient buttons his shirt is less disruptive than taking him to the hospital's theater to watch a movie.
Fair compensation: In a token economy, it is easy to assign larger reinforcers to motivate challenging or less-preferred behaviors - simply assign a larger number of tokens to that behavior. This ensures fair compensation is provided for each activity.
Portability: Tokens are easy to keep on hand at all times, allowing reinforcement of appropriate behavior whenever it is observed. Likewise, points can be added to a team's score with nothing more than a whiteboard marker. This portability increases the probability that appropriate behavior will be reinforced.
Delay-bridging: If Professor Dumbledore told the Hogwarts students that the best-behaved students would win the House Cup at the end of the year, they would soon forget about this delayed reward. However, by providing points immediately after a desirable response, the delay is bridged between good behavior and the awarding of the House Cup.
Nonfictional evidence for the importance of delay-bridging comes from animal experiments in which responding stops when backup reinforcers are delayed by just a few minutes. When the same animals' responding produces an immediate conditioned reinforcer, their behavior continues even when the backup reinforcer is delayed by an hour or more.

What is the consequence in SRA-?

Two-Factor Theory: The consequence is fear reduction. Relies on two learning processes (two factors): Pavlovian and operant conditioning. Pavlovian conditioning explains why fear arises (the warning stimulus is a CS that evokes fear) and operant conditioning explains why avoidance behavior occurs (fear reduction is the consequence that functions as an SRA−).
One-Factor Theory: Operant conditioning alone can explain SRA−; the other factor - Pavlovian conditioning - is not necessary. Thus, according to one-factor theory, momentarily preventing the aversive event is the consequence that maintains SRA− behavior. There is no need for fear reduction.

Conditioned Reinforcers

We have to learn something before these consequences will function as reinforcers. (Like Pavlovian conditioning) Conditioned reinforcers are those consequences that function as reinforcers only after learning occurs. Conditioned reinforcement is incredibly useful when teaching an individual to engage in new and complex behaviors. Therefore, those wishing to positively influence behavior will be more successful if they know precisely how to turn a neutral stimulus (e.g., a useless piece of paper) into an effective conditioned reinforcer (a $50 bill). Pavlovian learning is responsible for the transformation of a neutral consequence into a conditioned reinforcer.

Operant Extinction

When a normally reinforced behavior no longer works. When it no longer produces a reinforcer. Responding that meets the reinforcement contingency no longer produces the reinforcer and, as a result, it falls to baseline (no-reinforcer) levels. You did the IF behavior, but the THEN reinforcer didn't happen. Operant extinction can also increase the probability of behaviors other than the previously reinforced response.

Shaping Principle 6

When mastery is achieved, the red dotted line in Figure 8.4 will be shifted a little further to the right. Clicking at this slightly faster rate will be reinforced, but slower clicking will not. If the individual struggles to obtain reinforcers at this next level, Principle 6 instructs us to lower the criterion for reinforcement. The new criterion has asked for too much from this learner; the sequence of successive approximations will need to proceed more gradually. Plants vs. Zombies follows Principle 6 by constantly monitoring the player's performance during the initial levels. If the player is clicking things too slowly, the game lowers the reinforcement contingency a little. Zombies appear a bit slower, and the level is extended in duration to give the player more practice with the new contingency. By lowering the requirement for reinforcement, the game keeps the reinforcers flowing and prevents the player from encountering extinction. It also gives the player more practice, so they can continue to advance toward the terminal behavior. Keeping the game just a little bit challenging keeps players in flow.
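The criterion-adjustment logic described for Principles 5 and 6 can be sketched as a simple rule (a hypothetical model for illustration; the game's actual code is not public, and the thresholds, step sizes, and function name below are all invented):

```python
# Sketch of Shaping Principles 5 and 6: tighten the reinforcement
# criterion after mastery, loosen it when the learner hits extinction.
# Criterion = maximum allowed click latency in milliseconds (smaller =
# harder). All numbers are invented for illustration.

def adjust_criterion(criterion_ms, successes, failures,
                     mastery=5, struggle=3, step_ms=100):
    """Return an updated reinforcement criterion."""
    if successes >= mastery:      # Principle 5: mastery shown -> advance
        return criterion_ms - step_ms
    if failures >= struggle:      # Principle 6: extinction risk -> ease up
        return criterion_ms + step_ms
    return criterion_ms           # otherwise, hold the criterion steady

# Player beat the level repeatedly: demand slightly faster clicks.
print(adjust_criterion(1000, successes=5, failures=0))  # -> 900

# Player is struggling at the new level: lower the criterion so
# reinforcers keep flowing and the player stays out of extinction.
print(adjust_criterion(900, successes=0, failures=3))   # -> 1000
```

The key design choice, mirroring the text, is that the criterion moves in small steps in both directions: never so hard that reinforcers stop, never so easy that behavior stops improving.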

Extinction-Induced Resurgence

When one operant behavior is extinguished, other (different) behaviors that were previously reinforced are emitted again; that is, they become "resurgent." In this study of infant care, resurgence of previously successful behaviors was a good thing - it helped the caretakers soothe a crying infant. Individuals who have recovered from a substance-use disorder can often relapse to drug use when they lose a significant source of positive reinforcement. For example, an abstinent person with alcohol-use disorder is at risk of resurgent drinking if they lose their job or their spouse - both are sources of significant positive reinforcers.

Differential Reinforcement of Other Behavior (DRO)

When this procedure is used, reinforcement is provided contingent upon abstaining from the problem behavior for a specified interval of time; presumably while "other behavior" is occurring. By setting a short time interval (e.g., IF the individual abstains from the problem behavior for 5 seconds → THEN the reinforcer will be delivered), the DRO procedure can arrange a high rate of therapeutic reinforcement, which can effectively compete with the reinforcement contingency maintaining problem behavior. As the patient succeeds in abstaining from the problem behavior, the contingency can be modified to require gradually longer intervals before the therapeutic reinforcer is delivered again. This strategy can be effective, although it does not teach a specific activity to replace the problem behavior, and increasing intervals between reinforcers can produce resurgence of problem behavior.
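The DRO timing rule (reinforce after a fixed period of abstinence; problem behavior resets the clock) can be sketched as a small simulation. This is a hedged illustration, not a clinical protocol - the session length, interval, and event times below are invented:

```python
# Sketch of a DRO contingency: a reinforcer is delivered each time
# `interval` seconds elapse with no problem behavior; any occurrence of
# the problem behavior resets the abstinence clock.
# Event times are hypothetical.

def count_dro_reinforcers(problem_times, session_length, interval):
    """Count reinforcer deliveries in a session of `session_length`
    seconds, given the times at which problem behavior occurred."""
    reinforcers = 0
    clock_start = 0            # moment the current abstinence clock began
    for t in sorted(problem_times) + [session_length]:
        # one reinforcer per full interval of abstinence before this event
        reinforcers += (t - clock_start) // interval
        clock_start = t        # problem behavior (or session end) resets clock
    return reinforcers

# 60-s session, 5-s DRO interval, problem behavior at 12 s and 14 s:
# 2 reinforcers before 12 s (at 5 and 10 s), none in the 12-14 s window,
# then 9 more during the 46 s of abstinence that follow -> 11 total.
print(count_dro_reinforcers([12, 14], 60, 5))  # -> 11
```

Note how the model captures both points in the text: a short interval yields a high rate of therapeutic reinforcement, and lengthening `interval` as the patient succeeds stretches the time between reinforcers.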

Contingent relation between response and consequence

When you press the elevator button (response), the elevator begins to operate (consequence). If you don't press the elevator button, the elevator will not operate. When the consequence is ON, there is a contingent relation between response and consequence: IF the behavior occurred, THEN the consequence followed. When the consequence is OFF, there is no response-contingent relation. Describes the causal (IF → THEN) relation between an operant behavior and its consequence.

Reinforcer

Whether positive or negative, increase operant behavior above its no-reinforcer baseline level

Reinforcer

a consequence that increases the operant behavior above its baseline level. Refers to the consequence itself- the consequence of the behavior.

Consequence

an observable stimulus change that happens after behavior occurs. Consequence influences our behavior. Example: doors close and elevator begins to move.

Operant Behavior

behavior influenced by antecedent and consequence events. Changes the environment and in turn is influenced by that consequence. To find out how the consequence influences operant behavior, we need to turn ON and OFF the consequence (independent variable) and see if it affects the behavior. Example: Pressing the button. Operant behavior operates the environment by producing consequences.

Extinction-Induced Emotional Behavior

emotional responses induced by extinction: anger, frustration, violence. Long-term extinction contingencies placed on behaviors that previously produced important reinforcers can lead to debilitating emotions such as depression (long-term unemployment → depression).

Operant Behavior

generic class of responses influenced by antecedents, with each response in the class producing the same consequence (example: to open spotify, there are many ways you can press the button)

What makes video games so engaging?

just as quickly as one reinforcer is obtained, another one is needed. For example, when you've landed safely just past the pit of fire, now you need to execute another skillful sequence of button presses and joystick movements to avoid an incoming projectile. In this way, the game influences your behavior through a series of reinforcement contingencies.

noncontingent consequence

occurs after a response, but not because the response caused it to occur. No causal relationship between response and consequence.

superstitious behavior

occurs when the individual behaves as though a response-consequence contingency exists when, in fact, the relation between response and consequence is noncontingent.

Reinforcement

refers to the process or procedure whereby a reinforcer increases operant behavior above its baseline level. The whole process of the consequence increasing the behavior above baseline.

Organizational Behavior Management

systematic application of positive reinforcement principles in organizational settings for the purpose of raising the incidence of desirable organizational behaviors. Can improve employee performance by an average of 69%.

Spontaneous recovery of operant behavior

temporary resumption in operant responding following time away from the extinction setting

How does motivation to acquire the reinforcer affect how quickly operant extinction occurs?

the hungriest rats (those deprived of food for 23 hours) pressed the lever during extinction far more often than rats that had eaten 1-3 hours ago.

Differential Reinforcement of Alternative Behavior (DRA)

like DRI, except the reinforced response can be any adaptive behavior (it need not be topographically incompatible with the problem behavior). A teacher who calls on children when they raise their hand appropriately, and ignores children who yell out the answer, is using DRA to encourage appropriate classroom behavior.

