Instrumental Conditioning: Foundations (Chapter 5)

What is temporal contiguity?

The time elapsing between the production of the response and presentation of the reinforcer.

Decreasing the contiguity makes the contingency (stronger/weaker).

Weaker

What is omission training? Give an example.

-Suppression or decrease in response rate -Ex: sending a student out of the classroom for talking on their cell phone (the student can no longer enjoy class)

(Instrumental conditioning procedures) The outcome can be, attending to its motivational characteristics, _________ or _________. Explain both.

-Appetitive: a pleasant outcome (food, water, sexual contact, social acceptance) -Aversive: an unpleasant outcome (shock, social rejection, embarrassment)

Describe Thorndike's theory of belongingness. Give an example.

-Certain responses naturally belong with the reinforcer because of the animal's evolutionary history -Ex: In his puzzle-box experiments with cats, operating a latch and pulling a string (responses naturally related to release from confinement), but not scratching oneself and yawning, were effectively learned as instrumental responses leading to release from confinement

In _________ conditioning, the animal has no control over the occurrence of the stimuli. In __________ conditioning, the occurrence/nonoccurrence of the stimuli is dependent upon the animal's response.

-Classical -Instrumental

There are two main current procedures to study instrumental learning. What are they?

-Discrete-trial procedures -Free-operant procedures

Describe how magazine training works.

-During this training, food is delivered in the magazine -The sound of the food-delivery device is paired with the delivery of the food into the cup (classical conditioning) -Afterwards, the sound of the food-delivery device elicits a sign-tracking response (it becomes a secondary, or conditioned, reinforcer)

Describe the Shors (2006) experiment with rats that supports the activity deficit hypothesis.

-Exposure to inescapable shock disrupts escape learning of rats in a shuttle box, but facilitates eyeblink conditioning -Suggests that helplessness effects are more likely to be observed in tasks that require movement

Describe Thorndike's law of effect.

-If a response is followed by a satisfying event, the S-R association is strengthened -If a response is followed by an annoying event, the S-R association is weakened

Give two examples of how completely new response forms can be shaped.

-Learning to throw a football 60 yards requires a specific combination of force, speed, and coordination that is unlikely to exist a priori in the player's repertoire -Training pigeons to peck for food with their beaks either wide open or closed

Aside from Staddon and Simmelhag's theory, how might the periodicity of interim and terminal responses be explained?

-Terminal and interim responses are different manifestations of the same motivational system -Interim responses are general search responses -Terminal responses are focal search responses

Not only the quantity or quality of the reinforcer is important, but also the shifts in quantity and/or quality. These shifts produce two kinds of contrasts. What are they?

-Positive contrast: increase in responding that takes place when the current reward is larger or better than the previous reward -Negative contrast: decrease in responding that takes place when the current reward is smaller or worse than the previous reward

(Instrumental conditioning procedures) The response-outcome contingency can be either _________ or _________. Describe each.

-Positive: the outcome occurs more frequently when the response was previously performed (the response produces the outcome) -Negative: the outcome occurs less frequently when the response was previously performed (the response eliminates or prevents the outcome)

(Shifts in reinforcer quality/quantity) Describe the Mellgren (1972) experiment in which rats were given varying quantities of food in a maze.

-Rats in a runway were given a food reward in the goal box -In phase 1, two groups were given a small reward (2 pellets) and two groups were given a large reward (22 pellets) -In phase 2, one group in each reward condition was shifted to the other quantity (S-L and L-S), whereas the remaining two groups continued to receive the same amount of reward (S-S and L-L) -Running speeds were ordered S-L > L-L > S-S > L-S -The rats that were suddenly given the larger reward ran much faster (positive contrast), and those shifted to the smaller reward ran much slower (negative contrast)

What is positive reinforcement? Give an example.

-Results in reinforcement or increase in response rate -Ex: giving sweets to a child when he/she is crying

What are the two types of negative reinforcement? Give examples.

-Results in reinforcement or increase in response rate 1. Escape: the response produces the termination of an aversive stimulus (turning the radio off when Britney Spears is singing) 2. Avoidance: the response prevents the presentation of an aversive stimulus (remembering your anniversary with your partner and buying a present to avoid severe psychological pain)

What is punishment? Give an example.

-Results in suppression or decrease in response rate -Ex: criticizing a student for using a cell phone in class

What are three dependent variables used in discrete-trial procedures? Describe them.

-Running speed: how quickly the animal gets from the start box to the goal box, typically derived from the traversal time (speed increases as conditioning proceeds) -Latency: the time it takes the animal to leave the start box (latency decreases as conditioning proceeds) -Choice: in the T-maze, measured as choice of the left arm vs. the right arm (the percentage of correct choices increases as conditioning proceeds)
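
As a small sketch (all times and the runway length here are hypothetical, not data from the chapter), the first two measures can be computed from recorded times:

leave_start_box = 3.2   # s after trial start, when the rat leaves the start box
reach_goal_box = 9.7    # s after trial start, when the rat reaches the goal box
runway_length = 1.8     # meters (assumed)

latency = leave_start_box                                            # time to leave the start box
running_speed = runway_length / (reach_goal_box - leave_start_box)   # meters per second
print(f"latency = {latency:.1f} s, speed = {running_speed:.2f} m/s")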

Describe Skinner's superstition experiment. What did it lead researchers to believe?

-Skinner used pigeons to show that the delivery of a food reinforcer independently of the animal's behavior (it was given every 15 s) resulted in the "reinforcement" of several behaviors -Behaviors occurring just before the reinforcer (i.e., responses that were contiguous with the reinforcer) were repeated in the future, even when the response-reinforcer contingency was zero -Because the pigeons were behaving as if their responses caused the reinforcer (when they were actually independent of each other), he called it superstitious behavior -The idea was that accidental, or adventitious, reinforcement was responsible for the observation of these behaviors -Although these responses were not causing the reinforcer, they were strengthened due to mere contiguity -This led researchers to believe that temporal contiguity was not only a necessary, but also a sufficient condition for instrumental conditioning

(Shifts in reinforcer quality or quantity) Negative contrast is usually (stronger/weaker) than positive contrast. Explain why.

-Stronger -Reasons are not entirely clear, but emotional factors (e.g. frustration) might play an important role

What are some examples of instrumental conditioning?

-Studying hard to get a high score on an exam -A baby crying to get attention from its parents -Putting a coin into a vending machine to get food

(Shifts in reinforcer quality/quantity) Positive and negative contrast effects can be __________ or __________. Explain.

-Successive: only one shift in reward magnitude, typically between phases -Simultaneous: reward conditions are shifted back and forth frequently, with a different cue signaling each reward condition

How did Staddon and Simmelhag (1971) explain the periodicity of interim and terminal responses?

-Terminal responses are species-typical responses that reflect the anticipation of food -Interim responses reflect other sources of motivation that are prominent early in the interval, when food is unlikely

(Learned-helplessness hypothesis) When an animal learns that there's nothing it can do to control an aversive outcome, there are two consequences. What are they?

-The lack of control reduces the motivation to perform the instrumental response in the future -Even if the animal makes the response, its expectation of lack of control makes it more difficult to learn that its behavior is now effective in producing reinforcement

What is a positive R-O contingency? What is a negative R-O contingency?

-Positive: the response produces the outcome -Negative: the response eliminates or prevents the outcome

There are two kinds of relationships between a response and a reinforcer that are known to be determinants of instrumental conditioning. What are they?

-The temporal relation, or temporal contiguity -The causal relation, or response-reinforcer contingency

What are three alternatives to the helplessness hypothesis?

1. Activity deficit hypothesis 2. Attention deficit hypothesis 3. Stimulus relations in escape conditioning

Three key elements are involved in instrumental conditioning. What are they?

1. Instrumental response 2. Outcome or reinforcer 3. Relation or contingency between the response and the reinforcer

Describe Edward Thorndike's puzzle box.

A hungry animal was placed in the box with some food left outside in plain view. The task of the animal was to get out of the box in order to get the food. To escape the box, the animal had to make different responses, such as pulling a ring to release a latch that blocked the door.

Give an example of more than one action acting as the same operant response.

A lever press can be performed by a rat with one paw or the other, or even with the tail, but all of these actions produce the same outcome (the closure of the microswitch by the lever being pressed). Thus, they are instances of the same operant response.

What is an interim response?

A response that increases after food delivery and declines as the next food delivery draws closer.

What is a terminal response?

A response that occurs toward the end of the interval between food deliveries.

Describe free-operant procedures.

These procedures allow the animal to repeat the instrumental response without constraint, so responding can occur at any time.

Describe a discrete-trial procedure.

At the end of each training trial the animal is removed from the apparatus. Thus, the response is performed only once on each trial.

Research shows that there are important limitations on the types of new behaviors or responses that may be modified by instrumental conditioning. Just as the CS (e.g., a tone) must be relevant to the US (e.g., food), the instrumental response must also be relevant to the reinforcer or outcome. Thus, Thorndike proposed the concept of ____________.

Belongingness

With a long response-outcome delay, contextual cues can become associated with the reinforcer, and these cues compete with the target response. What is this process called?

Classical conditioning

Instrumental behavior is commonly referred to as "________ ________".

Goal directed

What is another name for omission training? Why is it called this?

Differential reinforcement of other behavior (DRO). This term highlights the fact that the animal receives the appetitive stimulus provided that it is engaged in behavior other than the response specified by the procedure. Since the animal is always engaged in one behavior or another, this procedure allows the animal to receive reinforcement unless the target behavior is performed.
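
A minimal sketch of the DRO logic (the interval length, session length, and response times are assumptions for illustration): a reinforcer is delivered whenever a full interval elapses without the target response, and a target response resets the timer.

dro_interval = 10.0              # s without the target response earns a reinforcer
response_times = [3.0, 27.5]     # hypothetical occurrences of the target behavior
session_end = 40.0               # s

t, reinforcers = 0.0, 0
while t < session_end:
    window_end = t + dro_interval
    inside = [r for r in response_times if t < r <= window_end]
    if inside:
        t = max(inside)          # a target response resets the DRO timer
    else:
        reinforcers += 1         # interval completed without the target response
        t = window_end
print("reinforcers earned:", reinforcers)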

Are Thorndike's puzzle box and a runway (I-shaped) maze or T-shaped maze examples of discrete-trial procedures or free-operant procedures?

Discrete-trial procedures. The response is performed only once on each trial.

_________ __________ was a pioneer in the research of instrumental conditioning. He was interested in the empirical study of animal intelligence, and developed a set of puzzle boxes for his experiments.

Edward Thorndike

(True/false) Shaping involves generating a new behavior.

False. It usually consists of the combination of familiar responses into a new behavior. However, completely new response forms can also be shaped.

(True/false) Different responses that have the same environmental effect are instances of different operant responses.

False. They are the same.

(True/false) The learned-helplessness effect term is specific to aversive conditioning.

False. This effect has also been demonstrated with appetitive conditioning.

The creation of new responses by shaping depends on the inherent variability of behavior. Explain using a football player as an example.

If a shaping step requires a trainee to throw a football 30 yards, each throw will come out somewhat different. The trainee may throw the ball 25, 32, 29, or 34 yards on successive attempts. The criterion may then be raised to 33 yards. Each throw will again be different, but more of them will be near the 33-yard goal. As this pattern continues, the player will make longer and longer throws. The shaping process takes advantage of the variability of behavior to gradually move the distribution of responses away from the trainee's starting point and toward responses that were never in the trainee's repertoire, as the sketch below illustrates.
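
A minimal, hypothetical Python simulation of this idea (the distances, the 0.3-yard learning increment, and the criterion rule are assumptions, not data from the chapter): reinforcing only throws at or beyond a criterion that tracks the trainee's improving mean gradually shifts the whole distribution of throws.

import random

random.seed(1)
ability = 25.0      # current mean throw distance in yards (assumed starting point)
criterion = 27.0    # only throws at or beyond this distance are reinforced

for trial in range(200):
    throw = random.gauss(ability, 4.0)   # natural variability around current ability
    if throw >= criterion:               # reinforce successive approximations only
        ability += 0.3                   # reinforced throws nudge the mean upward
        criterion = ability + 2.0        # the criterion tracks the improving mean

print(f"final mean throw ~ {ability:.1f} yards")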

The instrumental response to be conditioned must be a part of the behavior system activated by the reinforcer. What does this mean?

If food is the reinforcer, the instrumental response must be a behavior that the animal performs to find food, such as digging, rooting, etc. It can't be a behavior such as grooming because that's not part of the behavior system for food.

What is the difference between instrumental and classical conditioning?

In classical conditioning, responses are elicited by stimuli. In instrumental conditioning, stimuli are a consequence of the animal's behavior.

Researchers falsely concluded that contiguity, not contingency, was the critical factor in instrumental conditioning. What led them to believe this?

In delay-of-reinforcement experiments, each instrumental response is typically followed by the reinforcer; it is just more or less delayed. Thus, a perfect response-reinforcer contingency exists in these experiments. Still, even with this perfect causal relation between the response and the reinforcer, conditioning does not occur if the reinforcer is delayed too long.

Describe the activity deficit hypothesis.

Inescapable shocks produce a decrease in motor movement that is responsible for subsequent performance deficits.

In the learned helplessness effect, exposing animals to a zero R-O contingency has what effect?

It disrupts subsequent learning of a positive R-O contingency

(Larger/smaller) amounts of reinforcer produce better conditioning.

Larger

The learned-helplessness effect is similar to _________ _________ in Pavlovian conditioning, in which exposing animals to a zero CS-US contingency disrupts subsequent learning of a positive CS-US contingency.

Learned irrelevance

What is instrumental conditioning?

Learning of behaviors that were previously instrumental in producing certain consequences.

Rats do not a priori press a lever to get food. They must first learn that the food is provided in a food-delivery device called a food magazine. This is called ________ ________ and is required in order to establish lever-press behavior in animals.

Magazine training

The temporal contiguity between the instrumental response and the reinforcer presentation influences instrumental conditioning. Why is immediate reinforcement more effective than delayed reinforcement?

One possibility is that delayed reinforcement makes it difficult to figure out which response was responsible for the occurrence of the reinforcement. Responses that occur immediately before the presentation of the reinforcer would be reinforced at the expense of the response that really caused the reinforcer.

Give an example of how response variability can be the basis of instrumental conditioning.

Pigeons were presented with 8 keys and were reinforced for pecking 2 of those keys in a sequence that differed from all previous trials.

The concept of belongingness in instrumental conditioning is compatible with the behavior systems approach. Explain and give an example.

Only responses corresponding to the feeding system can be subject to instrumental reinforcement with food as the outcome. For example, in hamsters, environment-directed activities (digging, scrabbling, and rearing) are part of the feeding system, whereas self-care responses (face washing and scratching) are not. Shettleworth (1975) showed that food reinforcement increased environment-directed responses but not self-care responses.

Skinner recognized that a measurable unit of behavior should be defined in order to experimentally analyze behavior. He developed the concept of ___________ __________ as a way of dividing behavior into meaningful units.

Operant response

A technique to minimize the (otherwise) weak conditioning with delayed reinforcement is to use a marking procedure. What is this?

Presenting a stimulus to mark the instrumental response, so that it can be distinguished from the other activities of the organism. An example is the music you hear on the phone while waiting for assistance: it signals that you have already performed the needed instrumental response (pressing a series of keys on the phone) and that the reinforcer (a customer service representative) will be with you shortly.

The effectiveness of a reinforcer is determined by the expectations of the animal based on what?

Prior experience

Explain this diagram that shows why immediate reinforcement is more effective than delayed reinforcement.

R1 is the target response for reinforcement. If too much time elapses between R1 and the reinforcer, the animal will have already performed other behaviors. The behavior performed directly before the presentation of the reinforcer will be reinforced instead of R1.
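
A toy Python illustration of this credit-assignment problem (the event times and response names are made up, not from the chapter): if contiguity alone assigned credit, the response closest in time to the reinforcer would be "reinforced" instead of the target response R1.

events = [
    (0.0, "lever press (target response R1)"),  # actually produced the reinforcer
    (2.0, "grooming"),
    (4.5, "rearing"),
]
reinforcer_time = 5.0  # reinforcer delivery delayed 5 s after R1

# Contiguity-based credit goes to the response nearest in time to the reinforcer.
credited = min(events, key=lambda e: abs(reinforcer_time - e[0]))
print("credited by contiguity:", credited[1])  # rearing, not the lever press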

Breland & Breland (1961) identified some responses that could not be associated with appetitive outcomes (food). Give an example.

Raccoons and pigs were not able to learn to pick up coins and put them in a container. They began treating the coins like food. The raccoons rubbed and dunked the coins and the pigs rooted the coins. This is instinctive drift.

After magazine training, shaping is used. What is shaping?

Reinforcing successive approximations to the required response and withholding reinforcement for earlier response forms. In other words, getting an animal to perform progressively more complex tasks.

Free-operant methods permit continuous observation of behavior over long periods. Because there is a continuous opportunity to respond, the organism determines the frequency of its instrumental response. Thus, free-operant methods allow the study of changes in the likelihood of behavior over time. The rate of occurrence of operant behavior (frequency of the response per unit of time) can be used as the measure of _________ __________.

Response probability
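
As a minimal sketch (with made-up event times), the rate measure is simply a count of responses divided by the observation time:

lever_press_times = [2.1, 5.4, 9.8, 12.0, 15.3]  # seconds into the session (hypothetical)
session_length = 60.0                            # seconds of continuous observation

rate = len(lever_press_times) / session_length   # responses per unit of time
print(f"{rate:.3f} responses/s ({rate * 60:.1f} responses/min)")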

Research shows that _________ _________ can be the basis of instrumental reinforcement. In this case, reinforcement is given for doing something new, that is, producing a response that differs from the responses produced on the preceding trials.

Response variability

What is instinctive drift?

The gradual intrusion of responses that animals perform instinctively with food-related objects, which compete with the trained instrumental response.

According to Thorndike, the consequence of escaping a puzzle box strengthened the ________ association, and as it became stronger, the animal made the response more quickly.

S-R

If you'd like to train a rat to press a lever, the following chain of responses may be reinforced: -Get up on hind legs anywhere in the experimental chamber -Get up on hind legs over the response lever -Touch the lever -Depress the lever What is this process called?

Shaping

Free-operant procedures were first devised by ___________ to study behavior in a more continuous manner.

Skinner

Describe the attention deficit hypothesis. Describe the Maier, Jackson and Tomie (1987) experiment that supports this.

States that inescapable shocks cause the animal to pay less attention to its actions. This experiment showed that a marking procedure can alleviate the learned-helplessness effect. Remember, marking helps the animal pay attention to the response that was instrumental in producing the outcome. In this experiment, group Y-M (yoked-marker) showed no evidence of learned helplessness.

Both Thorndike and Skinner believed that reinforcement increases the likelihood that the same instrumental response will be repeated in the future. That is, that instrumental conditioning produces uniformity or ____________ in behavior.

Stereotypy

Skinner compared the contingencies of reinforcement in shaping with the contingencies of _________.

Survival (natural selection)

The learned-helplessness hypothesis states that during exposure to uncontrollable reinforcers, the animal learns what?

That the reinforcers are independent of its behavior. In other words, the animal learns that there's nothing it can do to control the shocks.

An operant response is defined in terms of?

The effect that it has on the environment.

Describe the Skinner box.

The animal can receive appetitive stimuli (food or water) or aversive stimuli (shock). The occurrence of these stimuli can be dependent upon some responses of the animal. Some mechanisms can allow the animal to control the occurrence of those stimuli (a lever in experiments with rats and keys in experiments with pigeons). Other stimuli such as tones or light may be presented.

The learned-helplessness hypothesis assumes that animals can perceive what?

The contingency between behavior and the delivery of the reinforcer.

What does the activity deficit hypothesis fail to explain? Describe the Jackson, Alexander and Maier (1980) experiment that supports this.

The deficit in choice learning following inescapable shock. In this experiment, rats previously exposed to inescapable shock had difficulty learning an escape response consisting of selecting the correct arm in a Y-maze. Both choices required the same amount of movement; the rats simply chose incorrectly more often.

What is the difference between the learned-helplessness hypothesis and the learned-helplessness effect?

The effect is the pattern of results obtained with the triadic design. The hypothesis is an explanation or interpretation of the effect.

What is response-reinforcer contingency?

The extent to which the response is a necessary and sufficient condition for the occurrence of the reinforcer. In other words, how well the response predicts the reinforcer.
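
This "necessary and sufficient" idea is often formalized (outside this chapter) as the delta-P statistic, delta-P = P(outcome | response) - P(outcome | no response). A minimal Python sketch with hypothetical trial records:

def delta_p(trials):
    """Each trial is a (responded, outcome_occurred) pair of booleans."""
    with_r = [o for r, o in trials if r]
    without_r = [o for r, o in trials if not r]
    return sum(with_r) / len(with_r) - sum(without_r) / len(without_r)

# Perfect positive contingency: the outcome occurs if and only if the response does.
print(delta_p([(True, True), (True, True), (False, False), (False, False)]))  # 1.0

# Zero contingency, as in Skinner's superstition experiment: the outcome is
# independent of responding.
print(delta_p([(True, True), (True, False), (False, True), (False, False)]))  # 0.0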

The effect of poor temporal contiguity can be explained in terms of a poor response-reinforcement contingency. Explain.

The instrumental response won't be connected with the reinforcer because the reinforcer is treated as independent of the response.

What was Thorndike's main observation during his puzzle box experiments?

The latencies to escape from the boxes decreased across successive trials. On the initial trials, the responses displayed by the animals were varied and random. However, some responses eventually resulted in opening the door. As learning proceeded, the animal produced the effective responses (i.e., those that opened the door) faster on each trial.

After Skinner's (1948) superstition experiment, other studies demonstrated that the response-reinforcer contingency does matter. In those studies, animals exposed to uncontrollable reinforcers were later impaired in learning a response-reinforcer relationship. What is this effect called?

The learned-helplessness effect

Thorndike thought that the successful escapes from his puzzle box led to learning of an association, or bond between what two things?

The stimulus of being inside the puzzle box (S) and the escape response (R).

The first experiments on learned-helplessness were performed with dogs in an escape-avoidance preparation using a shuttle-box. Describe the triadic design of these experiments.

There are three groups of animals: Group E (escape), Group Y (yoked), and Group R (restricted). Exposure phase: Group E animals are exposed to shocks that they can terminate by performing an escape response (e.g., rotating a wheel). Group Y receives the same shocks as Group E, but cannot escape them. Group R receives no shocks but is restricted to the apparatus like the other groups. Conditioning phase: all groups receive escape-avoidance training. They are put in a shuttle apparatus with two compartments and must jump back and forth to avoid shocks. Result: Group Y showed slow avoidance learning in the conditioning phase, whereas the other two groups showed rapid avoidance learning. All in all, during the exposure phase Group Y learned that no matter what the animals did, they would still get shocked. Therefore, in the conditioning phase they were not as motivated to try to avoid the shocks.

Staddon and Simmelhag (1971) attempted to replicate Skinner's superstition experiment. What did they find?

They found that some responses occurred toward the end of the interval (terminal responses) and other responses increased after the food delivery and declined as the next food delivery drew closer (interim responses). Thus, periodic presentations of a reinforcer produced behavioral regularities, with some responses predominating late in the interval and other responses predominating early in the interval.

Describe stimulus relations in escape conditioning.

This focuses on Group E, the animals that were exposed to shocks but were able to escape. Researchers wanted to know why exposure to shock was less harmful if an escape was available. They found that making an escape response results in internal response feedback cues. Just before escaping the shock, there are shock-cessation feedback cues, and just after escaping the shock there are safety-signal feedback cues. These cues act as conditioned inhibitors for fear because they signal the absence of shock. The conditioned inhibitors act as a buffer to prevent chronic stress in the animal.

Explain this diagram.

This illustrates stimulus relations in escape conditioning. When animals are presented with a shock and they have the ability to escape, there are shock-cessation feedback cues before escape and safety-signal feedback cues after they escape. These cues act as conditioned inhibitors for fear because they signal the absence of shock. This alleviates stress in the animal. This is why in the triadic design of the learned helplessness theory, Group E did not experience learned helplessness in the conditioning phase.

Thorndike noticed that animals learned on a _________ basis. Initially, they produced many responses, but only a few of them were followed by the expected outcome.

Trial-and-error

(True/false) A change in the quantity of a reinforcer may make the reinforcer qualitatively different.

True

A technique to minimize the (otherwise) weak conditioning with delayed reinforcement is to use a secondary or conditioned reinforcer. What is this?

A stimulus that has acquired reinforcing properties through association with a primary reinforcer, such as verbal feedback like "well done" given after the response. This secondary reinforcer is delivered immediately after the behavior and serves as a bridge between the instrumental response and the delayed primary reinforcer.

