4.2 Learning Through Operant Conditioning

*Positive and Negative Reinforcement and Punishment*

"Positive" means that something is presented, and "negative" means that something is taken away. Reinforcement means that the behavior is strengthened, and punishment means that the behavior is weakened. In positive reinforcement, an appetitive (pleasant) stimulus is presented, and in positive punishment, an aversive (unpleasant) stimulus is presented. In negative reinforcement, an aversive stimulus is removed, and in negative punishment, an appetitive stimulus is removed.

Why does a cumulative record go flat when a response is being extinguished?

A cumulative record goes flat when a response is extinguished because no more responses are made; the cumulative total of responses remains the same over time. Thus, the record is flat because this total is not increasing at all. Remember that the cumulative record can never decrease because the total number of responses can only increase.

appetitive stimulus

pleasant stimulus

motivation

the set of internal and external factors that energize our behavior and direct it toward goals.

instinctual drift

the tendency for an animal to drift back from a learned operant response to an innate, instinctual response to an object

*Cumulative Record Illustrations of Acquisition, Extinction, and Spontaneous Recovery*

(a) This is an acquisition cumulative record; the responding rate increases as learning occurs so the cumulative record has a fairly steep slope reflecting this increase. (b) This is an extinction cumulative record; the responding rate has essentially fallen to zero. A flat cumulative record indicates extinction. (c) This is an example of spontaneous recovery—a burst of responding following a break in extinction training. As the extinction training continues, the record will return to flat (no responding).

stimulus generalization

(in operant conditioning) Giving the operant response in the presence of stimuli similar to the discriminative stimulus. The more similar the stimulus is to the discriminative stimulus, the higher the operant response rate

stimulus discrimination

(in operant conditioning) Learning to give the operant response only in the presence of the discriminative stimulus

extinction

(in operant conditioning) The diminishing of the operant response when it is no longer reinforced

discriminative stimulus

(in operant conditioning) The stimulus that has to be present for the operant response to be reinforced

acquisition

(in operant conditioning) The strengthening of a reinforced operant response

spontaneous recovery

(in operant conditioning) The temporary recovery of the operant response following a break during extinction training

Drive-Reduction Theory

*A theory of motivation* that proposes that our behavior is motivated to reduce drives (bodily tension states) created by unsatisfied bodily needs, in order to return the body to a balanced internal state. • Drives are disruptions of this balanced bodily state. We are "pushed" into action by these unpleasant drive states.

There are four types of partial schedules

*Ratio:* fixed ratio and variable ratio. *Interval:* fixed interval and variable interval.

Law of Effect

*A principle developed by Edward Thorndike that says that any behavior that results in satisfying consequences tends to be repeated and that any behavior that results in unsatisfying consequences tends not to be repeated.* Thorndike studied cats escaping from puzzle boxes. If the animal pressed the lever, it would escape the box and get the food (satisfying effects), and it would tend to repeat such successful behaviors when put back into the box. Other behaviors (for example, pushing the door) that did not lead to escaping and getting the food would not be repeated.

token economy

*A token economy is a form of behavior modification* designed to increase desirable behavior and decrease undesirable behavior with the use of tokens. Individuals receive tokens immediately after displaying desirable behavior. The tokens are collected and later exchanged for a meaningful object or privilege.
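
The earn-then-exchange cycle described in this card can be sketched as a small ledger. This is only an illustration under stated assumptions: the class name `TokenEconomy`, its methods, and the reward and price used in the usage lines are hypothetical, not part of any standard program.

```python
class TokenEconomy:
    """Toy ledger: tokens are awarded for desirable behavior and later exchanged."""

    def __init__(self, prices):
        # prices maps each reward (object or privilege) to its cost in tokens
        self.prices = dict(prices)
        self.balances = {}

    def award(self, person, tokens=1):
        """Give tokens immediately after a desirable behavior."""
        self.balances[person] = self.balances.get(person, 0) + tokens

    def exchange(self, person, reward):
        """Trade collected tokens for a reward; return False if too few tokens."""
        cost = self.prices[reward]
        if self.balances.get(person, 0) < cost:
            return False
        self.balances[person] -= cost
        return True


# Hypothetical usage: three tokens buy one privilege.
economy = TokenEconomy({"extra recess": 3})
economy.award("Ana")
economy.award("Ana", 2)
```

After these three tokens are awarded, one call to `exchange` for "extra recess" succeeds and a second call fails, because the balance is back to zero.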

In conducting their experiments with animals, operant conditioning researchers use ____

*operant chambers*, which resemble large plastic boxes. Computers record the animal's behavior, control the delivery of food, and maintain other aspects of the chamber. Thus, the operant chamber is a very controlled environment for studying the impact of the animal's behavior on its environment. Operant chambers are sometimes referred to as "Skinner boxes" because B. F. Skinner originally designed this type of chamber.

Premack principle

*The principle that the opportunity to perform a highly frequent behavior can reinforce a less frequent behavior.* For example, children typically spend more time watching television than doing homework, so watching television could be used as a reinforcer for doing homework. According to David Premack, you should view reinforcers as behaviors rather than stimuli. This enables the conceptualization of reinforcement as a sequence of two behaviors: the behavior that is being reinforced, followed by the behavior that is the reinforcer.
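
The comparison at the heart of the Premack principle can be written as a one-line check. This is a minimal sketch: the function name `can_reinforce` and the baseline minutes are made-up illustrations, and "frequency" is assumed to be measured as time spent per day.

```python
def can_reinforce(minutes_per_day, reinforcer, target):
    """Return True if `reinforcer` is the more frequent behavior.

    Under the Premack principle, the opportunity to perform the more
    frequent behavior can be used to reinforce the less frequent one.
    """
    return minutes_per_day[reinforcer] > minutes_per_day[target]


# Hypothetical baseline observations for one child (minutes per day).
baseline = {"watching television": 120, "doing homework": 30}
```

With this baseline, television can reinforce homework, but homework cannot reinforce television.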

Let's compare the cumulative records for the four types of partial-reinforcement schedules

1. First, ratio schedules lead to higher rates of responding than interval schedules. Their slopes are much steeper. This is because ratio schedules depend on responding, and interval schedules depend on time elapsing.
2. Second, variable schedules lead to fewer breaks (periods during which no responding occurs) after reinforcements than fixed schedules. This is because with variable schedules it is not known how many responses will have to be made or how much time will have to elapse before the next reinforcement.

*What are the side effects of punishment?*

1. The person being punished may get the wrong message or learn the wrong lesson.
2. Punishing may have negative effects on the person giving the punishment (e.g., guilt).
3. Punishment often lacks context, so the punisher's intended message is not made clear.
4. The punisher may conclude that a punishment works when the change is actually due to natural fluctuation in behavior (extremes even out over time; as pattern seekers, we tend to think punishment works when that may not be the case).

*When does punishment not work?*

1. When the timing is off: punishment has to be consistent and delivered right after the behavior you want to discourage.

fixed-ratio schedule

A partial schedule of reinforcement in which a reinforcer is delivered each time a fixed number of responses is made. The fixed number can be any number greater than one. For example, consider a factory in which a worker has to make a certain number of items (say two wallets) before receiving any pay. The worker makes two wallets and then receives a certain amount of money. Then he or she must make two more to be paid that amount of money again.
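
The wallet example can be sketched in a few lines. The function name `fixed_ratio_reinforcers` is illustrative; it simply lists which responses, counted from 1, earn a reinforcer.

```python
def fixed_ratio_reinforcers(num_responses, ratio):
    """Return the 1-based response numbers that earn a reinforcer.

    On a fixed-ratio schedule every `ratio`-th response is reinforced.
    A ratio of 1 would be continuous reinforcement, so it is rejected here.
    """
    if ratio < 2:
        raise ValueError("a partial fixed-ratio schedule needs ratio > 1")
    return [n for n in range(1, num_responses + 1) if n % ratio == 0]


# The worker is paid after every second wallet.
print(fixed_ratio_reinforcers(7, 2))  # [2, 4, 6]
```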

*Reinforcement without awareness*

According to behavioral psychologists, reinforcement should strengthen operant responding even when people are unaware of the contingency between their responding and the reinforcement.

*Yerkes-Dodson Law*

As arousal increases, the quality of performance increases—up to the point of optimal arousal. Further increases in arousal are detrimental to performance.

*Cumulative Records for Fixed-Ratio and Variable-Ratio Schedules of Partial Reinforcement*

Both ratio schedules lead to high rates of responding as indicated by the steep slopes of the two cumulative records. Each tick mark indicates when a reinforcer was delivered. As you can see, the tick marks appear regularly in the record for the fixed-ratio schedule, but irregularly in the record for the variable-ratio schedule. A fixed-ratio schedule leads to short pauses after reinforcement, but these pauses don't occur as often for a variable-ratio schedule.

*How to Understand a Cumulative Record*

By measuring how responses cumulate over time, a cumulative record shows the rate of responding. When no responses occur, the record is flat (has no slope). As the number of responses increases per unit of time, the cumulative total rises more quickly. The response rate is reflected in the slope of the record. The faster the response rate, the steeper the slope of the record.
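
A short sketch makes the link between response counts, the cumulative total, and the slope concrete. The function names and the per-interval counts are invented for illustration; each count is assumed to cover one equal unit of time.

```python
from itertools import accumulate


def cumulative_record(responses_per_interval):
    """Running total of responses; it can stay flat but never decrease."""
    if any(count < 0 for count in responses_per_interval):
        raise ValueError("response counts cannot be negative")
    return list(accumulate(responses_per_interval))


def response_rates(record):
    """Slope of the record in each interval (responses added per unit of time)."""
    return [later - earlier for earlier, later in zip([0] + record, record)]


# Hypothetical session: responding speeds up, then stops (extinction).
counts = [1, 3, 5, 5, 0, 0]
record = cumulative_record(counts)
print(record)                  # [1, 4, 9, 14, 14, 14]
print(response_rates(record))  # [1, 3, 5, 5, 0, 0]
```

The last two intervals add nothing to the total, so the record is flat there, which is what extinction looks like on a cumulative record.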

We only know whether a stimulus has served as a reinforcer or a punisher, and led to reinforcement or punishment, by whether the target behavior keeps occurring (reinforcement) or stops occurring (punishment).

For example, the spanking would be punishment if the disobedient behavior stopped, and the praise reinforcement if the chores continued to be done. However, if the disobedient behavior continued, the spanking would have to be considered reinforcement; and if the chores did not continue to be done, the praise would have to be considered punishment. This is an important point. What serves as reinforcement or punishment is relative to each individual, in a particular context, and at a particular point in time.

What is necessary to overcome the need for immediate consequences in operant conditioning is for the learner to have the cognitive capacity to link the relevant behavior to the consequences regardless of the delay interval between them. If the learner can make such causal links, then conditioning can occur over time lags between behaviors and their consequences.

For example, think about studying now for a psychology exam in two weeks. The consequences (your grade on the exam) do not immediately follow your present study behavior. They come two weeks later. Or think about any job that you have had. You likely didn't get paid immediately. Typically, you are paid weekly or biweekly.

*A ratio schedule is based on the number of responses made, and an interval schedule is based on the amount of time that has elapsed.*

In addition, the number of responses or the amount of elapsed time can be fixed or variable

*Extinction on continuous vs partial schedules*

Obviously, it is easier to extinguish the response if the reinforcement had been given continuously for every response in the past. If a response is made and doesn't get reinforced, the responder knows immediately something is wrong because they have always been reinforced after each response. With partial schedules, if a response is made and doesn't get reinforced, the responder doesn't know that anything is wrong because they have not been reinforced for every response. Thus, it will take longer before extinction occurs for the partial schedules because it will take longer to realize that something is wrong.

What does the overjustification effect warn us of?

The overjustification effect imposes a limitation on operant conditioning and its effectiveness in applied settings. It tells us that we need to be careful in our use of extrinsic motivation so that we do not undermine intrinsic motivation.

*Explain why the partial-reinforcement effect is greater for variable schedules than fixed schedules of reinforcement.*

The partial-reinforcement effect is greater for variable schedules than fixed schedules because there is no way for the person or animal to know how many responses are necessary (on a ratio schedule) or how much time has to elapse (on an interval schedule) to obtain a reinforcer. Thus, it is very difficult to realize that reinforcement has been withdrawn and so the responding is more resistant to extinction. On fixed schedules, however, you know how many responses have to be made or how much time has to elapse because these are set numbers or amounts of time. Thus, it is easier to detect that the reinforcement has been withdrawn, so fixed schedules are less resistant to extinction.

*Cumulative Records for Fixed-Interval and Variable-Interval Schedules of Partial Reinforcement*

The tick marks indicate when reinforcers were delivered for each of the two schedules. The flat sections following reinforcements for the fixed-interval schedule indicate periods when little or no responding occurred. Such pauses do not occur for a variable-interval schedule. A variable-interval schedule leads to steady responding.

*Extinction on a fixed vs variable interval*

Think about the fixed schedules versus the variable schedules. Wouldn't it be much more difficult to realize that something is wrong on a variable schedule? On a fixed schedule, it would be easy to notice that the reinforcement didn't appear following the fixed number of responses or fixed time interval. On a variable schedule, however, the disappearance of reinforcement would be very difficult to detect because it's not known how many responses will have to be made or how much time has to elapse. Think about the example of a variable-ratio schedule with the rat pressing a lever. Because there is no fixed number of responses that have to be made, the rat wouldn't realize that its responding was being extinguished. It could be that the number of lever presses necessary to get the next reinforcement is very large. Because of such uncertainty, the variable schedules are much more resistant to extinction than the fixed schedules.

Remember, whether the behavior is strengthened or weakened is the only thing that tells you whether the consequences were reinforcing or punishing, respectively.

While it is certainly possible to say that certain stimuli usually serve as reinforcers or punishers, they do not inevitably do so. Think about money. For most people, $100 would serve as a reinforcer, but it might not for Bill Gates (of Microsoft), whose net worth is in the billions.
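
Because a consequence is labeled only by its effect on behavior, the classification can be sketched as a comparison of response rates before and after the consequence. This is an illustrative simplification: the function name `classify_by_effect` is made up, and measuring "strengthened" or "weakened" as a change in rate is an assumption.

```python
def classify_by_effect(rate_before, rate_after):
    """Label a consequence by what happened to the behavior it followed."""
    if rate_after > rate_before:
        return "reinforcement"  # the behavior was strengthened
    if rate_after < rate_before:
        return "punishment"     # the behavior was weakened
    return "undetermined"       # no change in rate to go on


# The same stimulus (say, $100) can be classified differently for two people.
print(classify_by_effect(rate_before=2, rate_after=6))  # reinforcement
print(classify_by_effect(rate_before=2, rate_after=2))  # undetermined
```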

overjustification effect

a decrease in an intrinsically motivated behavior after the behavior is extrinsically reinforced and then the reinforcement is discontinued. A person may perceive the extrinsic reinforcement as an attempt at controlling their behavior, which may lead them to stop engaging in the activity to maintain their sense of choice. A person might also think that the reinforcement makes the activity more like work (something one does for extrinsic reinforcement) than play (something one does for its own sake), lessening their enjoyment of the activity and leading them to cease engaging in it.

Yerkes-Dodson Law

a law describing the relationship between the amount of arousal and the performance quality on a task • increasing arousal up to some optimal level increases performance quality on a task, but increasing arousal past this point is detrimental to performance. That is, you need to be aroused to do well on a task, but if you are too aroused, your performance will be negatively affected.

fixed-interval schedule

a partial schedule of reinforcement in which a reinforcer is delivered after the first response is given once a set interval of time has elapsed. Please note that the reinforcement does not automatically appear after the fixed interval of time has elapsed; the reinforcement merely becomes obtainable after the fixed interval. A response must be made in order to get the reinforcement. For example, when an exam is given every two weeks, you go through long periods of little studying followed by bursts of intense studying.
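
The point that the reinforcer only becomes obtainable, and still requires a response, can be sketched as follows. The function name `fixed_interval_reinforcers` is illustrative, and restarting the clock at the moment the reinforcer is delivered is an assumption made for this sketch.

```python
def fixed_interval_reinforcers(response_times, interval):
    """Return the response times that earn a reinforcer.

    The reinforcer becomes obtainable once `interval` time units have
    elapsed; only the first response made after that point is reinforced.
    """
    reinforced = []
    available_at = interval  # session starts at time 0
    for t in sorted(response_times):
        if t >= available_at:
            reinforced.append(t)
            # Assumption: the next interval starts when the reinforcer is delivered.
            available_at = t + interval
    return reinforced


# Responses at times 1 and 4 come too early; 11 is the first one after time 10.
print(fixed_interval_reinforcers([1, 4, 11, 12, 19, 25], interval=10))  # [11, 25]
```

Responses made before the interval has elapsed earn nothing, which is consistent with the pauses after reinforcement seen on fixed-interval cumulative records.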

variable-ratio schedule

a partial schedule of reinforcement in which the number of responses it takes to obtain a reinforcer varies on each trial but averages to a set number across trials. For example, a person playing a slot machine knows that it will eventually pay off but does not know how many responses (insertions of money into the slot machine) are necessary to get that payoff.
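
A variable-ratio schedule can be simulated by drawing a new response requirement after each reinforcer. This is a sketch under a loud assumption: requirements are drawn uniformly from 1 to 2 × mean − 1 so that they average to the set number; the source does not specify any distribution, and the function name is illustrative.

```python
import random


def variable_ratio_reinforcers(num_responses, mean_ratio, rng):
    """Return the 1-based response numbers that earn a reinforcer."""
    if mean_ratio < 2:
        raise ValueError("mean_ratio must be greater than one")
    reinforced = []
    # Assumption: uniform draw on 1..(2*mean_ratio - 1), whose mean is mean_ratio.
    required = rng.randint(1, 2 * mean_ratio - 1)
    since_last = 0
    for n in range(1, num_responses + 1):
        since_last += 1
        if since_last == required:
            reinforced.append(n)
            since_last = 0
            required = rng.randint(1, 2 * mean_ratio - 1)
    return reinforced


# Like the slot machine: the responder cannot tell which pull will pay off.
payoffs = variable_ratio_reinforcers(1000, mean_ratio=5, rng=random.Random(0))
```

Passing in the random generator keeps the simulation repeatable; the gaps between successive payoffs vary, but none exceeds 9 responses under this assumed distribution.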

variable-interval schedule

a partial schedule of reinforcement in which the time that must elapse on each trial before a response will lead to the delivery of a reinforcer varies from trial to trial but averages to a set time across trials. For example, when studying for surprise exams, you study steadily because you are unsure of when the next exam will take place.
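
The same idea can be sketched for a variable-interval schedule by drawing a new waiting time after each reinforcer. The uniform draw between 0 and twice the mean is an assumption chosen only so that waits average to the set time; the function name is illustrative.

```python
import random


def variable_interval_reinforcers(response_times, mean_interval, rng):
    """Return the response times that earn a reinforcer."""
    reinforced = []
    # Assumption: waits are uniform on [0, 2 * mean_interval], mean = mean_interval.
    available_at = rng.uniform(0, 2 * mean_interval)
    for t in sorted(response_times):
        if t >= available_at:
            reinforced.append(t)
            available_at = t + rng.uniform(0, 2 * mean_interval)
    return reinforced


# Steady responding, once per time unit, as with studying for surprise exams.
steady = list(range(1, 1001))
earned = variable_interval_reinforcers(steady, mean_interval=10, rng=random.Random(0))
```

Because the responder cannot predict when the next reinforcer becomes obtainable, steady responding is the only way to collect each one soon after it becomes available.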

cumulative record

a record of the total number of operant responses over time that visually depicts the rate of responding

punisher

a stimulus that decreases the probability of a prior response

secondary reinforcer

a stimulus that gains its reinforcing property through learning, e.g., money, applause

reinforcer

a stimulus that increases the probability of a prior response

primary reinforcer

a stimulus that is innately reinforcing, e.g., food and water. Note that "innately reinforcing" does not mean "always reinforcing." For example, food would probably not serve as a reinforcer for someone who has just finished eating a five-course meal. Innately reinforcing only means that the reinforcing property of the stimulus does not have to be learned.

Incentive Theory

a theory of motivation that proposes that our behavior is motivated by incentives, external stimuli that we have learned to associate with reinforcement. • source of the motivation is outside the person

Arousal Theory

a theory of motivation that proposes that our behavior is motivated to maintain an optimal level of physiological arousal

observational learning (modeling)

learning by observing others and imitating their behavior

latent learning

learning that occurs but is not demonstrated until there is incentive to do so

operant conditioning

learning to associate behaviors with their consequences • behaviors that are reinforced (lead to satisfying consequences) will be strengthened, and behaviors that are punished (lead to unsatisfying consequences) will be weakened

mirror neurons

neurons that fire both when performing an action and when observing another person perform that same action.

negative punishment

punishment in which an appetitive stimulus is removed, e.g., taking away a teenager's driving privileges after they break curfew

positive punishment

punishment in which an aversive stimulus is presented, e.g., spanking a child for not obeying the rules

positive reinforcement

reinforcement in which an appetitive stimulus is presented, e.g., praising a child for doing the chores

negative reinforcement

reinforcement in which an aversive stimulus is removed, e.g., taking an Advil to get rid of a headache (the headache is the aversive stimulus being removed)
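
The four procedures defined in these cards combine two independent questions: is the stimulus appetitive or aversive, and is it presented or removed? A minimal sketch of that lookup follows; the function name `operant_procedure` and the string labels are illustrative choices.

```python
def operant_procedure(stimulus, change):
    """Name the procedure for a stimulus type and what is done with it.

    stimulus: "appetitive" (pleasant) or "aversive" (unpleasant)
    change:   "presented" or "removed"
    """
    if stimulus not in ("appetitive", "aversive"):
        raise ValueError("stimulus must be 'appetitive' or 'aversive'")
    if change not in ("presented", "removed"):
        raise ValueError("change must be 'presented' or 'removed'")
    # "Positive" means something is presented; "negative" means it is taken away.
    sign = "positive" if change == "presented" else "negative"
    # Presenting a pleasant stimulus or removing an unpleasant one strengthens
    # behavior; the other two combinations weaken it.
    strengthens = (stimulus == "appetitive") == (change == "presented")
    effect = "reinforcement" if strengthens else "punishment"
    return f"{sign} {effect}"


print(operant_procedure("aversive", "removed"))    # negative reinforcement
print(operant_procedure("appetitive", "removed"))  # negative punishment
```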

continuous schedule of reinforcement

reinforcing the desired operant response each time it is made

partial schedule of reinforcement

reinforcing the desired operant response only part of the time

behavior modification

the application of classical and operant conditioning principles to eliminate undesirable behavior and to teach more desirable behavior

extrinsic motivation

the desire to perform a behavior for external reinforcement.

intrinsic motivation

the desire to perform a behavior for its own sake

partial-reinforcement effect

the finding that operant responses that are reinforced on partial schedules are more resistant to extinction than those reinforced on a continuous schedule

punishment

the process by which the probability of a response is decreased by the presentation of a punisher

reinforcement

the process by which the probability of a response is increased by the presentation of a reinforcer

shaping

training a human or animal to make an *operant response* by reinforcing successive approximations of the desired response

aversive stimulus

unpleasant stimulus

visual masking

when a visual stimulus is masked, it is exposed briefly (for maybe 50 milliseconds) and followed immediately by another visual stimulus that completely overrides it, thereby masking (preventing conscious perception of) the first stimulus

Theories of motivation

• drive-reduction theory • incentive theory • arousal theory

4 major aspects of partial-reinforcement schedules

• ratio • interval • fixed • variable

