Expertise Ch.9 Cognitive Psych.
ventral medial prefrontal cortex
It is thought that the more medial portion of the anterior prefrontal region, where Gage's injury was localized, is important to motivation, emotional regulation, and social sensitivity (Barbey & Sloman, 2007).
There is also evidence that experience makes people more statistically tuned. In a study of medical diagnosis, Weber, Böckenholt, Hilton, and Wallace (1993) found that doctors were quite sensitive both to base rates and to the evidence provided by the symptoms. Moreover, the more clinical experience the doctors had, the more tuned were their judgments. ■ Although participants' processing of abstract probabilities often does not correspond with Bayes's theorem, their behavior based on experience often does.
intelligent tutoring systems
Probably the most extensive use of such componential analysis is for intelligent tutoring systems (Sleeman & Brown, 1982). These computer systems interact with students while they are learning and solving problems, much as a human tutor would. An example of such a tutor is the LISP tutor (J. R. Anderson, Conrad, & Corbett, 1989; J. R. Anderson & Reiser, 1985; Corbett & Anderson, 1990), which teaches LISP, the main programming language used in artificial intelligence in the 1980s and 1990s. The LISP tutor continuously taught LISP to students at Carnegie Mellon University from 1984 to 2002 and served as a prototype for a generation of intelligent tutors, many of which have focused on teaching middle-school and high-school mathematics. The mathematics tutors are now distributed by a company called Carnegie Learning, spun off by Carnegie Mellon University in 1998. The Carnegie Learning mathematics tutors have been deployed to about 3,000 schools nationwide and have interacted with over 600,000 students each year (Koedinger & Corbett, 2006; Ritter, Anderson, Koedinger, & Corbett, 2007; you can visit the Web site www.carnegielearning.com for promotional material that should be taken with a grain of salt). Color Plate 9.1 shows a screen shot from its most widely used product, which is a tutor for high-school algebra. A large-scale study conducted by the Rand Corporation (Pane, Griffin, McCaffrey, & Karam, 2013) indicates that the tutor does provide real, if modest, gains for high-school students.

[Figure 9.19. Data from the LISP tutor: (a) number of errors (maximum is three) per rule as a function of the number of opportunities for practice; (b) time to correctly code rules as a function of the amount of practice.]

Some students were able to learn new rules in a lesson quite rapidly, whereas other students had more difficulty. More or less independent of this acquisition factor, students could be classified according to how well they retained rules from earlier lessons. Thus, students differ in how rapidly they learn with the LISP tutor. However, the tutor employs a mastery learning system in which slower students are given more practice and so are brought to the same level of mastery achieved by other students. Students emerge from their interactions with the LISP tutor having acquired a complex and sophisticated skill. Their enhanced programming abilities make them appear more intelligent among their peers. However, when we examine what underlies that newfound intelligence, we find that it is the methodical acquisition of some 500 rules of programming. Some students can acquire these rules more easily than others because of past experience and specific abilities. However, when they graduate from the LISP course, all students have learned the 500 new rules. With the acquisition of these rules, few differences remain among the students with respect to ability to program in LISP. Thus, we see that, in the end, what is important with respect to individual differences is how much information students have previously learned, and not their native ability. ■ By carefully monitoring individual components of a skill and providing feedback on learning, intelligent tutors can help students rapidly master complex skills.
Priest and Lindsay (1992)
Priest and Lindsay (1992) failed to find a difference in problem-solving direction between novices and experts. Their study included British university students rather than American students, and they found that both novices and experts predominantly reasoned forward. However, their experts were much more successful in doing so. Priest and Lindsay suggest that the experts have the necessary experience to know which forward inferences are appropriate for a problem. It seems that novices have two choices—reason forward, but fail (Priest & Lindsay's students), or reason backward, which is hard (Larkin's students). Reasoning backward is hard because it requires setting goals and subgoals and keeping track of them. For instance, a student must remember that he or she is calculating F so that a can be calculated in order for v to be calculated. Thus, reasoning backward puts a severe strain on working memory, and this can lead to errors. Reasoning forward eliminates the need to keep track of subgoals. However, to successfully reason forward, one must know which of the many possible forward inferences are relevant to the final solution, which is what an expert learns with experience. That is, experts learn to associate various inferences with various patterns of features in the problems. The novices in Larkin's study seemed to prefer to struggle with backward reasoning, whereas the novices in Priest and Lindsay's study tried forward reasoning without success. Not all domains show this advantage for forward problem solving. A good counterexample is computer programming (J. R. Anderson, Farrell, & Sauers, 1984; Jeffries, Turner, Polson, & Atwood, 1981; Rist, 1989). Both novice and expert programmers develop programs in what is called a top-down manner: that is, they work from the statement of the problem to subproblems to sub-subproblems, and so on, until they solve the problem. This top-down development is basically the same as what is called reasoning backward in the context of geometry or physics. However, there are differences between expert programmers and novice programmers. Experts tend to develop problem solutions breadth first: they work out all of the high-level solution, then decompose that into more detail, and so on, until they get to the final code. In contrast, novices will completely code one part of the problem before really working out the overall solution. Physics and geometry problems have a rich set of givens that are more predictive of solutions than is the goal, and this enables forward problem solving. In contrast, nothing in the typical statement of a programming problem would guide a working-forward or bottom-up solution. The typical problem statement only describes the goal and often does so with information that will guide a top-down solution. Thus, we see that expertise in different domains requires the adoption of those approaches that will be successful for those particular domains. In summary, the transition from novice to expert does not entail the same changes in strategy in all domains. Different problem domains have different structures that make different strategies optimal. Physics experts learn to reason forward; programming experts learn breadth-first expansion.
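The contrast between the two strategies can be made concrete with a small sketch. This is not from the textbook: the rule set and function names are illustrative assumptions, but the control structures mirror the passage's description. Forward chaining fires any rule whose inputs are known, while backward chaining must hold a chain of subgoals (v needs a, which needs F) in working memory.

```python
# Illustrative sketch (not from the textbook) of forward vs. backward
# reasoning over toy physics rules. Each rule: the left-hand quantity
# can be computed once all right-hand quantities are known.
RULES = {
    "F": ["m", "g"],   # force from mass and gravity
    "a": ["F", "m"],   # acceleration from force and mass
    "v": ["a", "t"],   # velocity from acceleration and time
}

def forward(known, goal):
    """Expert style: fire any rule whose inputs are all known."""
    known = set(known)
    while goal not in known:
        fired = [q for q, needs in RULES.items()
                 if q not in known and all(n in known for n in needs)]
        if not fired:
            return None          # no inference applies; stuck
        known.update(fired)
    return sorted(known)

def backward(known, goal):
    """Novice style: recursively set subgoals for every unknown input.
    Returns the subgoals that must be tracked in working memory."""
    if goal in known:
        return []
    subgoals = [goal]
    for need in RULES[goal]:
        subgoals += backward(known, need)
    return subgoals

givens = {"m", "g", "t"}
print(forward(givens, "v"))    # ['F', 'a', 'g', 'm', 't', 'v']
print(backward(givens, "v"))   # ['v', 'a', 'F']: the subgoal chain to remember
```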
conditional probability
A conditional probability is the probability that a particular type of evidence is true if a particular hypothesis is true. Let us consider what the conditional probabilities of the evidence (door ajar) would be under the two hypotheses. First, suppose I believe that the probability of the door's being ajar is quite high if I have been burglarized, for example, 4 out of 5. Let E denote the evidence, or the event of the door being ajar. Then we will denote this conditional probability of E given that H is true as

Prob(E | H) = .8

Second, we determine the probability of E if H is not true—that is, the probability the door would be ajar even if there was not a burglary. Suppose I know that chances are only 1 out of 100 that the door would be left ajar by accident, by neighbors with a key, or for some other reason. We denote this probability by

Prob(E | ~H) = .01

the probability of E given that H is not true.
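Combining these conditional probabilities with a prior gives the posterior by Bayes's theorem. The prior probability of a burglary is not stated in this excerpt, so the 1-in-1,000 figure below is an assumption for illustration only.

```python
# Minimal sketch of the burglary example. The conditional probabilities
# come from the passage above; the prior Prob(H) is NOT given in this
# excerpt, so the 1-in-1,000 value is an illustrative assumption.
p_h = 0.001             # assumed prior probability of a burglary
p_e_given_h = 0.8       # Prob(E | H): door ajar if burglarized
p_e_given_not_h = 0.01  # Prob(E | ~H): door ajar with no burglary

# Bayes's theorem: P(H|E) = P(E|H)P(H) / [P(E|H)P(H) + P(E|~H)P(~H)]
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
posterior = p_e_given_h * p_h / p_e
print(f"Prob(H | E) = {posterior:.3f}")  # ~0.074: the ajar door raises .001 to ~7%
```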
theory of identical elements
A century ago Edward Thorndike criticized this doctrine of formal discipline, which holds that the mind can be trained like a muscle. Instead, he proposed his theory of identical elements. According to Thorndike, the mind is not composed of general faculties, but rather of specific habits and associations, which provide a person with a variety of narrow responses to very specific stimuli. In fact, during Thorndike's time, the mind was regarded as just a convenient name for countless special operations or functions (Stratton, 1922). Thorndike's theory stated that training in one kind of activity would transfer to another only if the activities had situation-response elements in common:

One mental function or activity improves others in so far as and because they are in part identical with it, because it contains elements common to them. Addition improves multiplication because multiplication is largely addition; knowledge of Latin gives increased ability to learn French because many of the facts learned in the one case are needed in the other. (Thorndike, 1906, p. 243)

Thus, Thorndike was happy to accept transfer between diverse skills as long as the transfer was mediated by identical elements. Generally, however, he concluded that

The mind is so specialized into a multitude of independent capacities that we alter human nature only in small spots, and any special school training has a much narrower influence upon the mind as a whole than has commonly been supposed. (p. 246)

Although the doctrine of formal discipline was too broad in its predictions of transfer, Thorndike formulated his theory of identical elements in what proved to be an overly narrow manner. For instance, he argued that if you solved a geometry problem in which one set of letters is used to label the points in a diagram, you would not be able to transfer to a geometry problem with a different set of letters. The research on analogy examined in Chapter 8 indicated that this is not true. Transfer is not tied to the identity of surface elements. In some cases, there is very large positive transfer between two skills that have the same logical structure even if they have different surface elements (see Singley & Anderson, 1989, for a review). Thus, for instance, there is large positive transfer between different word-processing systems, between different programming languages, and between using calculus to solve economics problems and using calculus to solve problems in solid geometry.
Lesgold et al
A good example of this shift in the processing of perceptual features is the interpretation of X rays. Figure 9.13 is a schematic of one of the X rays diagnosed by participants in the research by Lesgold et al. The sail-like area in the right lung is a shadow (shown on the left side of the X ray) caused by a collapsed lobe of the lung that created a denser shadow in the X ray than did other parts of the lung. Medical students interpreted this shadow as an indication of a tumor because tumors are the most common cause of shadows on the lung. Radiological experts, on the other hand, were able to correctly interpret the shadow as an indication of a collapsed lobe. They saw that features such as the size of the sail-like region are counterindicative of a tumor. Because the radiologists are experts at examining these X rays, they no longer rely on a simple association between shadows on the lungs and tumors, but rather can see a richer set of features in X rays. ■ An important dimension of growing expertise is the ability to learn to perceive problems in ways that enable more effective problem-solving procedures to apply.
mastery learning
A particularly effective part of such componential programs is mastery learning. The basic idea in mastery learning is to follow students' performance on each of the components underlying the cognitive skill and to ensure that all components are mastered. Typical instruction, without mastery learning, leaves some students not knowing some of the material. This failure to learn some of the components can snowball in a course in which mastery of earlier material is a prerequisite for mastery of later material. There is a good deal of evidence that mastery learning leads to higher achievement.
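A minimal sketch of the mastery-learning loop, in the spirit of the knowledge-tracing models used by such tutors (Corbett & Anderson, 1990), though not any real tutor's actual algorithm. The criterion and learning rate are illustrative assumptions; the point is only that slower students receive more opportunities yet end at the same criterion.

```python
# Illustrative sketch of mastery learning: each component gets practice
# until estimated mastery reaches a criterion, so slower learners simply
# receive more opportunities. All numbers are assumptions.
CRITERION = 0.95

def give_practice(p_known, learn_rate=0.2):
    """One practice opportunity: if the component is not yet known,
    there is some chance the student learns it from feedback."""
    return p_known + (1 - p_known) * learn_rate

def mastery_loop(components):
    """Return how many opportunities each component needed."""
    opportunities = {}
    for name, p_known in components.items():
        n = 0
        while p_known < CRITERION:
            p_known = give_practice(p_known)
            n += 1
        opportunities[name] = n
    return opportunities

# A slower learner (lower starting estimates) needs more opportunities
# than a faster one, but both end above the same criterion.
print(mastery_loop({"fractions": 0.3, "decimals": 0.6}))
print(mastery_loop({"fractions": 0.7, "decimals": 0.8}))
```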
(Walsh & Anderson, 2011)
A recent FRN study by a graduate student of mine (Walsh & Anderson, 2011) produced a striking demonstration of how experience-based (and stupid) this reinforcement learning can be. He had participants learn a simple task in which they were shown two repeating stimuli and had to choose one. Sometimes their choice was rewarded, and they were motivated to choose the one that was rewarded more often. The critical manipulation was whether the participants were told at the beginning what the better stimulus was or had to learn it from experience. Not surprisingly, if told which stimulus was better, they chose it from the start. If they were not told, it took them a while to learn the better stimulus. However, their FRN showed no difference between the two conditions. Whether participants had been told the correct response or not, the FRN started out responding identically to the two stimuli. Only with time did it come to respond more strongly when the reward (or lack of reward) for that stimulus was unexpected. So even though their choice behavior responded immediately to instruction, their FRN showed a slow learning process. It is as if their minds knew but their hearts had to learn. It is generally thought that the ventromedial prefrontal cortex is responsible for a more reflective processing of rewards, while the dopamine neurons in the basal ganglia are responsible for a more reflexive processing of rewards. A number of neural imaging studies seem consistent with this interpretation.
de Groot (1965, 1966)
A surprising discovery about expertise is that experts seem to display a special enhanced memory for information about problems in their domains of expertise. This enhanced memory was first discovered in the research of de Groot (1965, 1966), who was attempting to determine what separated master chess players from weaker chess players. It turns out that chess masters are not particularly more intelligent in domains other than chess. De Groot found hardly any differences between expert players and weaker players—except, of course, that the expert players chose much better moves. For instance, a chess master considers about the same number of possible moves as does a weak chess player before selecting a move. In fact, if anything, masters consider fewer moves than do chess duffers. However, de Groot did find one intriguing difference between masters and weaker players. He presented chess masters with chess positions (i.e., chessboards with pieces in a configuration that occurred in a game) for just 5 s and then removed the chess pieces. The chess masters were able to reconstruct the positions of more than 20 pieces after just 5 s of study. In contrast, the chess duffers could reconstruct only 4 or 5 pieces—an amount much more in line with the traditional capacity of working memory. Chess masters appear to have built up patterns of 4 or 5 pieces that correspond to common board configurations as a result of the massive amount of experience that they have had with chess. Thus, they remember not individual pieces but these patterns. In line with this analysis, if the players are presented with random chessboard positions rather than ones that are actually encountered in games, no difference is found between masters and duffers—both reconstruct the positions of only a few pieces. The masters also complain about being very uncomfortable and disturbed by such chaotic board positions.
framing effects
Although one might view the functions in Figures 11.7 and 11.8 as reasonable, there is evidence that they can lead people to do rather strange things. These demonstrations deal with framing effects. These effects refer to the fact that people's decisions vary, depending on where they perceive themselves to be on the subjective utility curve in Figure 11.7. Consider a bettor who has already lost $140 at the racetrack and is contemplating a final $10 bet on a long shot that would just about recover the day's losses. Framed relative to the start of the day, the choice is:

A. Refuse the bet and face the certainty of ending the day $140 behind.

B. Make the bet and face a good chance of losing $150 and a poor chance of breaking even.

Because the subjective difference between losing $140 and $150 is small, the person will likely choose B and make the bet. On the other hand, the bettor could view it as the following choice:

C. Refuse the bet and face the certainty of having nothing change.

D. Make the bet and face a good chance of losing an additional $10 and a poor chance of gaining $140.

In this case, because of the greater weight on losses than on gains and because of the negatively accelerated utility function, the bettor is likely to avoid the bet. The only difference is whether one places oneself at the −$140 point or the 0 point on the curve in Figure 11.7. However, one gets a different evaluation of the two outcomes, depending on where one places oneself. As an example that appears to be more consequential, consider this situation described by Kahneman and Tversky (1984):

Problem 1: Imagine that the U.S. is preparing for the outbreak of an unusual Asian disease, which is expected to kill 600 people. Two alternative programs to combat the disease have been proposed. Assume that the exact scientific estimates of the consequences of the programs are as follows: If program A is adopted, 200 people will be saved. If program B is adopted, there is a one-third probability that 600 people will be saved and a two-thirds probability that no people will be saved. Which of the two programs would you favor?

Seventy-two percent of the participants preferred program A, which guarantees saving 200 lives, to dealing with the risk of program B. However, consider what happens when, rather than describing the two programs in regard to saving lives, the two programs are described as follows: If program C is adopted, 400 people will die. If program D is adopted, there is a one-third probability nobody will die and a two-thirds probability that 600 people will die. With this description, only 22% preferred program C, which the reader will recognize as equivalent to A (and D is equivalent to B). Both of these choices can be understood in terms of a negatively accelerated utility function for lives. In the first case, the subjective value of 600 lives saved is less than three times the subjective value of 200 lives saved, whereas in the second case, the subjective value of 400 deaths is more than two-thirds the subjective value of 600 deaths.
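The betting reversal can be reproduced with a negatively accelerated, loss-averse value function. The sketch below uses the standard Tversky-Kahneman functional form with conventional parameter values (alpha = .88, lambda = 2.25) and treats the long shot as paying $150 on a $10 bet with about a 1-in-16 chance; these are illustrative assumptions, not figures from the passage.

```python
# Illustrative sketch: a concave, loss-averse value function makes the
# same bet look different under two framings. Parameters are the
# conventional Tversky-Kahneman values, assumed for illustration.
ALPHA, LAMBDA = 0.88, 2.25

def v(x):
    """Subjective value: concave for gains, steeper for losses."""
    return x**ALPHA if x >= 0 else -LAMBDA * (-x)**ALPHA

P_WIN = 1 / 16  # assumed chance the $10 long shot nets +$140

# Frame 1: outcomes evaluated from the start of the day (already down $140).
refuse_1 = v(-140)
bet_1 = P_WIN * v(0) + (1 - P_WIN) * v(-150)
print(bet_1 > refuse_1)   # True: losing $150 barely feels worse than $140

# Frame 2: outcomes evaluated from the current position (reference = 0).
refuse_2 = v(0)
bet_2 = P_WIN * v(140) + (1 - P_WIN) * v(-10)
print(bet_2 > refuse_2)   # False: the likely extra $10 loss looms large
```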
Greene, Sommerville, Nystrom, Darley, and Cohen (2001).
An interesting study in framing was performed by Greene, Sommerville, Nystrom, Darley, and Cohen (2001). They compared ethical dilemmas such as the following pair. In the first dilemma, a runaway trolley is headed for five people who will be killed if it proceeds on its current course. The only way to save them is to hit a switch that will turn the trolley onto an alternate set of tracks where it will kill one person instead of five. The second dilemma is like the first, except that you are standing next to a large stranger on a footbridge that spans the tracks in between the oncoming trolley and the five people. In this scenario, the only way to save the five people is to push the stranger off the bridge onto the tracks below. He will die, but his large body will stop the trolley from reaching the others. In the first case, most people are willing to sacrifice one person to save five, but in the second case, they are not. In an fMRI study, Greene et al. compared the brain areas activated when people considered an impersonal dilemma such as the first case with the brain areas activated when people considered a personal dilemma such as the second. In the impersonal case, the regions of the parietal cortex that are associated with cold calculation were active. On the other hand, when they judged the personal case, regions of the brain associated with emotion (such as the ventromedial prefrontal cortex that we discussed in the beginning of the chapter) were active. Thus, part of what can be involved in the different framing of problems seems to be which brain regions are engaged. ■ When there is no clear basis for making a decision, people are influenced by the way in which the problem is framed.
componential analyses
Approaches to instruction that begin with an analysis of the elements to be taught are called componential analyses. A description of the application of componential approaches to the instruction of a number of topics in reading and mathematics can be found in J. R. Anderson (2000). Generally, higher achievement is obtained in programs that include such componential analysis. A particularly effective part of such componential programs is mastery learning.
Bavelier, Green, Pouget, and Schrater (2012)
Bavelier, Green, Pouget, and Schrater (2012) emphasize the benefits of action video games, which include some of the more violent games such as the "Call of Duty" series. Most of the benefits seem confined to measures of vision and attention. This seems a plausible sort of transfer because these games often require monitoring rapidly changing visual displays. Among the benefits shown for players of action video games were greater visual acuity than nonplayers and the ability to track more objects in a random moving display of objects. Recently, however, many of the existing studies have been criticized (Boot, Blakely, & Simons, 2011) because they compare video-game players with non-video-game players, and different sorts of people may choose to become video-game players in the first place, so the groups can differ in ways other than their gaming experience.
Bayes's theorem
Bayes's theorem specifies how to combine the prior probability of a hypothesis with the conditional probabilities of the evidence to determine the posterior probability of the hypothesis. The burglary example can be cast as a conditional syllogism of the following sort:

If a burglar is in the house, then the door will be ajar.
The door is ajar.
A burglar is in the house.

As a conditional syllogism, it would be judged as the erroneous affirmation of the consequent. However, it does have a certain plausibility as an inductive argument. Bayes's theorem provides a way of assessing just how plausible it is by combining what are called a prior probability and a conditional probability to produce what is called a posterior probability, which is a measure of the strength of the conclusion. Because Bayes's theorem rests on a mathematical analysis of the nature of probability, the formula can be proved to evaluate hypotheses correctly. Thus, it enables us to precisely determine the posterior probability of a hypothesis given the prior and conditional probabilities. The theorem serves as a prescriptive model, or normative model, specifying the means of evaluating the probability of a hypothesis. Such a model contrasts with a descriptive model, which specifies what people actually do. People normally do not perform the calculations that we have just gone through any more than they follow the formal rules of logic when reasoning.
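For reference, in the notation used in these excerpts, the standard form of the theorem the passage describes in prose is:

Prob(H | E) = [Prob(E | H) × Prob(H)] / [Prob(E | H) × Prob(H) + Prob(E | ~H) × Prob(~H)]

The numerator is the evidence-weighted prior for the hypothesis, and the denominator sums the two ways the evidence can arise (with and without the hypothesis being true), which is why the result is a probability between 0 and 1.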
Chi, Feltovich, and Glaser (1981)
Chi, Feltovich, and Glaser (1981) asked participants to classify a large set of problems into similar categories. Figure 9.11 shows pairs of problems that novices thought were similar and the novices' explanations for the similarity groupings. As can be seen, the novices chose surface features, such as rotations or inclined planes, as their bases for classification. Being a physics novice myself, I have to admit that these seem very intuitive bases for similarity. Contrast these classifications with the pairs of problems in Figure 9.12 that the expert participants saw as similar. Problems that are completely different on the surface were seen as similar because they both entailed conservation of energy or they both used Newton's second law. Thus, experts have the ability to map surface features of a problem onto these deeper principles. This ability is very useful because the deeper principles are more predictive of the method of solution. This shift in classification from reliance on simple features to reliance on more complex features has been found in a number of domains, including mathematics (Silver, 1979; Schoenfeld & Herrmann, 1982), computer programming (Weiser & Shertz, 1983), and medical diagnosis (Lesgold et al., 1988).
Carraher, Carraher, and Schliemann (1985).
Carraher, Carraher, and Schliemann (1985). These researchers investigated the mathematical strategies used by Brazilian schoolchildren who also worked as street vendors. On the job, these children used quite sophisticated strategies for calculating the total cost of orders consisting of different numbers of different objects (e.g., the total cost of 4 coconuts and 12 lemons); what's more, they could perform such calculations reliably in their heads. Carraher et al. actually went to the trouble of going to the streets and posing as customers for these children, making certain kinds of purchases and recording the percentage of correct calculations. The experimenters then asked the children to come with them to the laboratory, where they were given written mathematics tests that included the same numbers and mathematical operations that they had manipulated successfully in the streets. For example, if a child had correctly calculated the total cost of 5 lemons at 35 cruzeiros apiece on the street, the child was given the following written problem:

5 × 35 = ?

Whereas children correctly solved 98% of the problems presented in the real-world context, they solved only 37% of the problems presented in the laboratory context. It should be stressed that these problems included the exact same numbers and mathematical operations. Interestingly, if the problems were stated in the form of word problems in the laboratory, performance improved to 74%. This improvement runs counter to the usual finding, which is that word problems are more difficult than equivalent "number" problems (Carpenter & Moser, 1982). Apparently, the additional context provided by the word problem allowed the Brazilian children to make contact with their pragmatic strategies. The study of Carraher et al. showed a curious failure of expertise to transfer from the real world to the classroom, but the typical concern of educators is whether what is taught in one class will transfer to other classes and the real world. Early in the 20th century, when educators were fairly optimistic on this matter, a number of educational psychologists subscribed to what has been called the doctrine of formal discipline (Angell, 1908; Pillsbury, 1908; Woodrow, 1927). This doctrine held that studying such esoteric subjects as Latin and geometry was of significant value because it served to discipline the mind. Those who believed in formal discipline subscribed to the faculty view of mind, which extends back to Aristotle and was first formalized by Thomas Reid in the late 18th century (Boring, 1950). The faculty view held that the mind is composed of a collection of general faculties, such as observation, attention, discrimination, and reasoning, which could be exercised in much the same way as a set of muscles. The content of the exercise was thought to matter less than the discipline it provided for these faculties.
Charness (1976)
Charness (1976) compared experts' memory for chess positions immediately after they had viewed the positions or after a 30-s delay filled with an interfering task. Class A chess players showed no loss in recall over the 30-s interval, unlike weaker participants, who showed a great deal of forgetting. Thus, expert chess players, unlike duffers, have an increased capacity to store information about the domain. Interestingly, these participants showed the same poor memory for three-letter trigrams as do ordinary participants. Thus, their increased long-term memory is only for the domain of expertise.
Chase and Ericsson (1982)
Chase and Ericsson (1982) studied the development of a simple but remarkable skill. They watched a participant, called SF, increase his digit span, which is the number of digits that he could repeat after one presentation. As discussed in Chapter 6, the normal digit span is about 7 or 8 items, just enough to accommodate a telephone number. After about 200 hr of practice, SF was able to recall 81 random digits presented at the rate of 1 digit per second. Figure 9.17 illustrates how his memory span grew with practice. What was behind this apparently superhuman feat of memory? In part, SF was learning to chunk the digits into meaningful patterns. He was a long-distance runner, and part of his technique was to convert digits into running times. So, he would take 4 digits, such as 3492, and convert them into "Three minutes, 49.2 seconds—near world-record mile time." Using such a strategy, he could convert a memory span for 7 digits into a memory span for 7 patterns consisting of 3 or 4 digits each. This would get him to a digit span of more than 20, far short of his eventual performance. In addition to this chunking, he developed what Chase and Ericsson called a retrieval structure, which enabled him to recall 22 such patterns. This retrieval structure was very specific; it did not generalize to retrieving letters rather than digits. Chase and Ericsson hypothesized that part of what underlies the development of expertise in other domains, such as chess, is the development of retrieval structures, which allow superior recall for past patterns. ■ As people become more expert in a domain, they develop a better ability to store problem information in long-term memory and to retrieve it. Chase and Ericsson's participant SF was unable to transfer his memory-span skill from digits to letters.
Chase and Simon (1973)
Chase and Simon (1973) compared novices, Class A (advanced) players, and masters. They compared these different types of players with respect to their ability to reproduce game positions such as those shown in Figure 9.14a and to reproduce random positions such as those illustrated in Figure 9.14b. As shown in Figure 9.15, memory was poorer for all groups for the random positions, and if anything, masters were worst at reproducing these positions. On the other hand, masters showed a considerable advantage for the actual board positions. This basic phenomenon of superior expert memory for meaningful problems has been demonstrated in a large number of domains, including the game of Go (Reitman, 1976), electronic circuit diagrams (Egan & Schwartz, 1979), bridge hands (Engle & Bukstel, 1978; Charness, 1979), and computer programming (McKeithen, Reitman, Rueter, & Hirtle, 1981; Schneiderman, 1976). Chase and Simon (1973) also used a chessboard-reproduction task to examine the nature of the patterns, or "chunks," used by chess masters. The participants' task was simply to reproduce the positions of pieces of a target chessboard on a test chessboard. In this task, participants glanced at the target board, placed some pieces on the test board, glanced back to the target board, placed some more pieces on the test board, and so on. Chase and Simon defined a chunk to be a group of pieces that participants moved after one glance. They found that these chunks tended to define meaningful game relations among the pieces. For instance, more than half of the masters' chunks were pawn chains (configurations of pawns that occur frequently in chess).
Chase and Simon (1973)
Chase and Simon (1973) in their study (see Figures 9.14 and 9.15) tried to identify the patterns that their participants used to recall the chessboards. They found that participants would tend to recall a pattern, pause, recall another pattern, pause, and so on. They found that they could use a 2-s pause to identify boundaries between patterns. With this objective definition of what a pattern is, they could then explore how many patterns were recalled and how large these patterns were. In comparing a master chess player with a beginner, they found large differences in both measures. First, the pattern size of the master averaged 3.8 pieces, whereas it was only 2.4 for the beginner. Second, the master also recalled an average of 7.7 patterns per board, whereas the beginner recalled an average of only 5.3. Thus, it seems that the experts' memory advantage is based not only on larger patterns but also on the ability to recall more of them.
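The 2-s boundary criterion is easy to state as a procedure. The sketch below is illustrative (the timestamps are made up, and this is not Chase and Simon's code), but it implements the segmentation rule the passage describes and computes the two measures they report: the number of patterns and the size of each.

```python
# Illustrative sketch of the 2-second-pause criterion: recall events with
# inter-item pauses under 2 s are grouped into one chunk; a longer pause
# starts a new chunk.
def chunks_from_recall(timestamps, boundary=2.0):
    """timestamps: seconds at which each piece was placed on the board."""
    chunks, current = [], [0]
    for i in range(1, len(timestamps)):
        if timestamps[i] - timestamps[i - 1] >= boundary:
            chunks.append(current)
            current = []
        current.append(i)
    chunks.append(current)
    return chunks

# Hypothetical placement times for 8 recalled pieces:
times = [0.0, 0.6, 1.1, 1.5, 4.2, 4.9, 5.3, 9.0]
groups = chunks_from_recall(times)
print(len(groups))               # 3 patterns recalled
print([len(g) for g in groups])  # pattern sizes: [4, 3, 1]
```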
Elbert, Pantev, Wienbruch, Rockstroh, and Taub (1995)
Elbert, Pantev, Wienbruch, Rockstroh, and Taub (1995) found that violinists, who finger strings with the left hand, show increased development of the right cortical regions that correspond to their fingers.
Ericsson, Krampe, and Tesch-Römer (1993)
Ericsson, Krampe, and Tesch-Römer (1993) compared the best violinists at a music academy in Berlin with those who were only very good. They looked at diaries and self-estimates to determine how much the two populations had practiced and estimated that the best violinists had practiced more than 7,000 hr before coming to the academy, whereas the very good had practiced only 5,000 hr. Ericsson et al. reviewed a great many fields where, as in music, time spent practicing is critical. Not only is time on task important at the highest levels of performance, but it is also essential to mastering school subjects. Ericsson et al. (1993) make the strong claim that almost all of expertise is to be accounted for by amount of practice, and that there is virtually no role for natural talent. Ericsson et al. are careful to note, however, that not all practice leads to the development of expertise. They note that many people spend a lifetime playing chess or some sport without ever getting any better. What is critical, according to Ericsson et al., is what they call deliberate practice. In deliberate practice, learners are motivated to learn, not just perform; they are given feedback on their performance; and they carefully monitor how well their performance corresponds to the correct performance and where the deviations exist. The learners focus on eliminating these points of discrepancy. The importance of deliberate practice in the acquisition of expertise is similar to the importance of deep and elaborative processing in improving memory, as described in Chapters 6 and 7, in which passive study was shown to yield few memory benefits.
Gigerenzer and Hoffrage (1995)
Gigerenzer and Hoffrage (1995) showed that base-rate neglect also decreases if events are stated in terms of frequencies rather than in terms of probabilities. Some of their participants were given a description in terms of probabilities, such as the one that follows:

The probability of breast cancer is 1% for women at age 40 who participate in routine screening. If a woman has breast cancer, the probability is 80% that she will get a positive mammography. If a woman does not have breast cancer, the probability is 9.6% that she also will get a positive mammography. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?

Fewer than 20 out of 100 (20%) of the participants given such statements calculated the correct Bayesian answer (which is about 8%). In the other condition, participants were given descriptions in terms of frequencies, such as the one that follows:

Ten out of every 1,000 women at age 40 who participate in routine screening have breast cancer. Eight of every 10 women with breast cancer will get a positive mammography. Ninety-five out of every 990 women without breast cancer also will get a positive mammography. Here is a new representative sample of women at age 40 who got a positive mammography in routine screening. How many of these women do you expect to actually have breast cancer?

Almost 50% of the participants given such statements calculated the correct Bayesian answer. Gigerenzer and Hoffrage argued that we can reason better with frequencies than with probabilities because we experience frequencies of events, but not probabilities, in our daily lives. However, just what people do in such a task continues to be debated.
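The "about 8%" answer can be checked directly with the numbers from the passage. A minimal sketch; note how the frequency format reduces the whole computation to a single division.

```python
# Checking the passage's numbers in both formats.
# Probability format (Bayes's theorem):
p_cancer, p_pos_cancer, p_pos_healthy = 0.01, 0.80, 0.096
p_pos = p_pos_cancer * p_cancer + p_pos_healthy * (1 - p_cancer)
print(p_pos_cancer * p_cancer / p_pos)  # ~0.078, the "about 8%" answer

# Frequency format: 8 true positives out of (8 + 95) positive mammograms.
print(8 / (8 + 95))                     # ~0.078, same answer, one division
```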
Gigerenzer, Todd, and ABC Research Group (1999)
Gigerenzer, Todd, and the ABC Research Group (1999), in their book Simple Heuristics That Make Us Smart, argue that such cases are the exception and not the rule. They argue that people tend to identify the most valid cues for making judgments and use these. For instance, through evolution people have acquired a tendency to pay attention to the availability of events in memory, which is more often helpful than not.
Gluck and Bower (1988)
Gluck and Bower (1988) performed an experiment that illustrates implicit Bayesian behavior. Participants were given records of fictitious patients who could display from one to four symptoms (bloody nose, stomach cramps, puffy eyes, and discolored gums) and made discriminative diagnoses about which of two hypothetical diseases the patients had. One of these diseases had a base rate three times that of the other. Additionally, the conditional probabilities of displaying the various symptoms, given the diseases, were varied. Participants were not told directly about these base rates or conditional probabilities. They merely looked at a series of 256 patient records, chose the disease they thought the patient had, and were given feedback on the correctness of their judgments. There are 15 possible combinations of one to four symptoms that a patient might have. Gluck and Bower calculated the probability of each disease for each pattern by using Bayes's theorem and arranged it so that each disease occurred with that probability when the symptoms were present. Thus, the participants experienced the base probabilities and conditional probabilities implicitly, in terms of the frequencies of symptom-disease combinations. Of interest is the probability with which they assigned the rarer disease to various symptom combinations. Gluck and Bower compared the participant probabilities with the true Bayesian probabilities. This correspondence is displayed by the scatterplot in Figure 11.2. There we have, for each symptom combination, the Bayesian probability (labeled objective probability) and the proportion of times that participants assigned the rare disease to that symptom combination. As can be seen, these points fall very close to a straight diagonal line with a slope of 1, which indicates that the proportions of the participants' choices were very close to the true probabilities. Thus, implicitly, the participants had become quite good Bayesians in this experiment. The behavior of choosing among alternatives in proportion to their success is called probability matching. After the experiment, Gluck and Bower presented the participants with the four symptoms individually and asked them how frequently the rare disease had appeared with each symptom. This result is presented in Figure 11.3 in a format similar to that of Figure 11.2. As can be seen, participants showed some neglect of the base rate, consistently overestimating the frequency of the rare disease. Still, their judgments show some influence of base rate in that their average estimated probability of the rare disease is less than 50%.
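The "true Bayesian probabilities" for each symptom pattern can be computed as below. The 3:1 base rate is from the passage; the per-symptom conditional probabilities are illustrative assumptions (the study's actual values are not given here), and symptoms are assumed independent given the disease.

```python
from itertools import product

# Illustrative sketch of the Bayesian bookkeeping behind the experiment.
# Base rate 3:1 is from the passage; per-symptom conditionals are assumed.
P_RARE, P_COMMON = 0.25, 0.75
p_sym_rare   = [0.6, 0.4, 0.3, 0.2]   # assumed P(symptom i | rare disease)
p_sym_common = [0.2, 0.3, 0.4, 0.6]   # assumed P(symptom i | common disease)

def likelihood(pattern, p_sym):
    """P(pattern | disease), assuming symptoms independent given disease."""
    out = 1.0
    for present, p in zip(pattern, p_sym):
        out *= p if present else (1 - p)
    return out

# Posterior P(rare | pattern) for each of the 15 nonempty symptom patterns:
for pattern in product([0, 1], repeat=4):
    if not any(pattern):
        continue
    w_rare = likelihood(pattern, p_sym_rare) * P_RARE
    w_common = likelihood(pattern, p_sym_common) * P_COMMON
    print(pattern, round(w_rare / (w_rare + w_common), 2))
```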
recognition heuristic
Goldstein and Gigerenzer (1999, 2002) report studies of what they call the recognition heuristic, which applies in cases where people recognize one thing and not another. This heuristic leads people to believe that the recognized item is bigger and more important than the unrecognized item. In one study, they looked at the ability of students at the University of Chicago to judge the relative size of various German cities. For instance, which city is larger—Bamberg or Heidelberg? Most of the students knew that Heidelberg is a German city, but most did not recognize Bamberg—that is, one city was available in memory and the other was not. Goldstein and Gigerenzer showed that when faced with pairs like this, students almost always picked the city they recognized. One might think this shows another fallacy based on availability in memory. However, Goldstein and Gigerenzer showed that the students were actually more accurate when they made their judgment for pairs of cities like this (where they recognized one and not the other) than when they were given two cities they recognized (such as Munich and Hamburg). When they recognized both cities, they had to use other bases for judging the relative size of the cities, and most American students have little knowledge about the populations of German cities. Thus, far from being a fallacy, the recognition heuristic proves to be an effective strategy for making accurate judgments. Also, American students do better at judging the relative size of German cities using this heuristic than either American students do judging American cities or German students do judging German cities, where this heuristic cannot be used because almost all the cities are recognized. German students do better than American students in judging the relative size of American cities because they can use the recognition heuristic and Americans cannot. Figure 11.6 illustrates Goldstein and Gigerenzer's explanation for why these students were more accurate in judging the relative size of two cities when they did not know one of them. They looked at the frequency with which German cities were mentioned in the Chicago Tribune and the frequency with which American cities were mentioned in the German newspaper Die Zeit. It turns out that there is a strong correlation between the actual size of a city and the frequency of its mention in these newspapers. Not surprisingly, people read about the larger cities in other countries more frequently. Gigerenzer and Goldstein also show that there is a strong correlation between the frequency of mention in the newspapers (and the media more generally) and the probability that these students will recognize the name. This is just the basic effect of frequency on memory. As a consequence of these two strong correlations, there will be a strong correlation between availability in memory and the actual size of the city. Goldstein and Gigerenzer argue that the recognition heuristic is useful in many but not all domains. In some domains, researchers have shown that people intelligently combine it with other information. For instance, Richter and Späth (2006) had participants judge which of two animals has the larger population size. For example, consider the following questions: Are there more Hainan partridges or arctic hares? Are there more giant pandas or mottled umbers?
In the first case, most people have heard of arctic hares and not Hainan partridges and would correctly choose arctic hares using the recognition heuristic. In the second case, most people would recognize giant pandas and not mottled umbers (a moth). Nonetheless, they also know giant pandas are an endangered species and therefore correctly choose mottled umbers. This is an example of how people can adaptively choose what aspects of information to pay attention to. ■ People can use their ability to recognize an item, and combine this with other information, to make good judgments.
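A minimal sketch of the decision rule the passage describes: recognition decides when exactly one option is recognized, with other knowledge allowed to override it, as in the Richter and Späth examples. The recognition set and the "known to be rare" tag are illustrative assumptions.

```python
# Illustrative sketch of the recognition heuristic plus a knowledge
# override. RECOGNIZED and KNOWN_SMALL are assumed for illustration.
RECOGNIZED = {"arctic hare", "giant panda", "Munich", "Hamburg", "Heidelberg"}
KNOWN_SMALL = {"giant panda"}  # recognized, but known to be endangered/rare

def choose_larger(a, b):
    a_rec, b_rec = a in RECOGNIZED, b in RECOGNIZED
    if a_rec != b_rec:                   # exactly one option recognized
        pick = a if a_rec else b
        if pick in KNOWN_SMALL:
            return b if pick == a else a  # override: pandas are known rare
        return pick
    return None  # both or neither recognized: heuristic does not apply

print(choose_larger("arctic hare", "Hainan partridge"))  # arctic hare
print(choose_larger("giant panda", "mottled umber"))     # mottled umber
print(choose_larger("Munich", "Hamburg"))                # None: need other cues
```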
Greeno (1974)
Greeno (1974) found that it took only about four repetitions of the hobbits and orcs problem (see the discussion surrounding Figure 8.7 in Chapter 8) before participants could solve the problem perfectly. In this experiment, participants were learning the sequence of moves to get the creatures across the river. Once they had learned the sequence, they could simply recall it and did not have to figure it out.
Knutson, Taylor, Kaufman, Peterson, and Glover (2005)
In one fMRI study, Knutson, Taylor, Kaufman, Peterson, and Glover (2005) presented participants with various uncertain outcomes. For instance, on one trial participants might be told that they had a 50% chance of winning $5; on another trial, that they had a 50% chance of winning $1. Knutson et al. imaged the brain activity associated with each such gamble. The magnitude of the fMRI response in the nucleus accumbens in the basal ganglia reflected the differential magnitude of these rewards. However, this region does not respond differently to information about probability of reward. For instance, it did not respond differently when participants were told on one trial that they had an 80% probability of a reward versus a 20% probability on another trial. In contrast, the ventromedial prefrontal cortex responded to probability of the reward. Figure 11.9 illustrates the contrasting response of these regions to reward magnitude and reward probability. Although the Knutson et al. study found the ventromedial prefrontal region responding only to probabilities, other research finds that it responds to magnitude as well. It is generally thought to be involved in the integration of the probability of succeeding in an action and the possible reward of success—that is, it is a key decision-making region. The ventromedial region is the portion that was destroyed in Phineas Gage (see Figure 11.1), and his problems went beyond judging probabilities. Subsequent research has confirmed that people who have damage to this region do have difficulty in responding adaptively in situations where they experience good and bad outcomes with different probabilities.
Iowa gambling task
Iowa gambling task (Bechara, Damasio, Damasio, & Anderson, 1994; Bechara, Damasio, Tranel, & Damasio, 2005), illustrated in Figure 11.10. The participants choose cards from four decks. In this version of the problem, decks A and B are equivalent, and decks C and D are equivalent. Every time a participant selects from deck A or B, he or she gains $100 but 1 time out of 10 also loses $1,250. So, applying our formula for expected value, the expected value of selecting a card from one of these decks is

$100 − 0.1 × $1,250 = −$25

or, equivalently, if participants play these decks for 10 trials, they can expect to lose $250. Every time they select a card from decks C and D, they get only $50, but they also lose only $250 on that 1 out of every 10 draws. The expected value of selecting from one of these decks is

$50 − 0.1 × $250 = +$25

and so, choosing from these decks, participants can expect to make $250 every 10 trials. Players are initially attracted to decks A and B because of their higher payoff, but normal participants eventually learn to avoid them. In contrast, patients with ventromedial damage keep coming back to the high-paying decks. Also, unlike normal participants, they do not show measures of emotional engagement (such as increased galvanic skin response) when they choose from these dangerous decks. ■ Dopamine activity in the nucleus accumbens reflects the magnitude of reward, whereas the human ventromedial cortex is involved in integrating probabilities with reward.
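The deck arithmetic is a direct application of the expected-value formula; this minimal sketch just checks the passage's numbers.

```python
# Checking the per-draw expected values given in the passage.
def expected_value(gain, p_loss, loss):
    return gain - p_loss * loss

print(expected_value(100, 0.1, 1250))  # decks A and B: -25 per draw
print(expected_value(50, 0.1, 250))    # decks C and D: +25 per draw
# Over 10 draws: lose about $250 on A/B, gain about $250 on C/D.
```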
Kahneman and Tversky (1973)
Kahneman and Tversky (1973) demonstrated this failure with personality descriptions. Participants were told that a description had been drawn from a set of 100 people consisting of 70 engineers and 30 lawyers (or, for other participants, 30 engineers and 70 lawyers). When the description was completely uninformative about the person's profession, participants judged the two professions about equally likely, ignoring the 70:30 base rate. Thus, they allowed a completely uninformative piece of information to change their probabilities. Once again, the participants were shown to be completely unable to use prior probabilities in assessing the posterior probability of a hypothesis. The failure to take prior probabilities into account can lead people to make some totally unwarranted conclusions. Suppose, for example, that you take a laboratory test for a rare form of cancer, that the test is quite accurate, and that your result comes back positive. You might conclude that you almost certainly have the cancer. What is the error? You would have failed to consider the base rate (prior probability) for the particular type of cancer in question. Suppose only 1 in 10,000 people have this cancer. This percentage would be your prior probability. Now, with this information, you would be able to determine the posterior probability of your having the cancer. People often fail to take base rates into account in making probability judgments.
J. R. Anderson, 1982
It is illustrated in my own work on the development of expertise in geometry (J. R. Anderson, 1982). One student had just learned the side-side-side (SSS) and side-angle-side (SAS) postulates for proving triangles congruent. The side-side-side postulate states that, if three sides of one triangle are congruent to the corresponding sides of another triangle, the triangles are congruent. The side-angle-side postulate states that, if two sides and the included angle of one triangle are congruent to the corresponding parts of another triangle, the triangles are congruent. Figure 9.6 illustrates the first problem that the student had to solve. The first thing that he did in trying to solve this problem was to decide which postulate to use. The following is a part of his thinking-aloud protocol, during which he decided on the appropriate postulate:

If you looked at the side-angle-side postulate (long pause) well RK and RJ could almost be (long pause) what the missing (long pause) the missing side. I think somehow the side-angle-side postulate works its way into here (long pause). Let's see what it says: "Two sides and the included angle." What would I have to have to have two sides. JS and KS are one of them. Then you could go back to RS = RS. So that would bring up the side-angle-side postulate (long pause). But where would Angle 1 and Angle 2 are right angles fit in (long pause) wait I see how they work (long pause). JS is congruent to KS (long pause) and with Angle 1 and Angle 2 are right angles that's a little problem (long pause). OK, what does it say—check it one more time: "If two sides and the included angle of one triangle are congruent to the corresponding parts." So I have got to find the two sides and the included angle. With the included angle you get Angle 1 and Angle 2. I suppose (long pause) they are both right angles, which means they are congruent to each other. My first side is JS is to KS. And the next one is RS to RS. So these are the two sides. Yes, I think it is the side-angle-side postulate. (J. R. Anderson, 1982, pp. 381-382)

After a series of four more problems (two solved by SAS and two by SSS), the student applied the SAS postulate in solving the problem illustrated in Figure 9.7. The method-recognition part of the protocol was as follows:

Right off the top of my head I am going to take a guess at what I am supposed to do: Angle DCK is congruent to Angle ABK. There is only one of two and the side-angle-side postulate is what they are getting to. (J. R. Anderson, 1982, p. 382)

A number of things seem striking about the contrast between these two protocols. One is that the application of the postulate has clearly sped up. A second is that there is no verbal rehearsal of the statement of the postulate in the second case. The student is no longer calling a declarative representation of the postulate into working memory. Note also that, in the first protocol, working memory fails a number of times—points at which the student had to recover information that he had forgotten. The third difference is that, in the first protocol, application of the postulate is piecemeal; the student is separately identifying every element of the postulate. Piecemeal application is absent in the second protocol. It appears that the postulate is being matched in a single step.
J. R. Anderson (2007)
In J. R. Anderson (2007), I reviewed a number of studies in our laboratory looking at the effects of practice on the performance of mathematical problem-solving tasks like the ones we have been discussing in this section. We were interested in the effects of this sort of practice on the three brain regions illustrated in Chapter 1, Figure 1.15: the motor region, which is involved in programming the actual motor movements in writing out the solution; the parietal region, which is involved in representing the problem internally; and the prefrontal region, which is involved in retrieving things like the task instructions. In addition, we looked at a fourth region: the anterior cingulate cortex (ACC), which is involved in the control of cognition (see Figure 3.1 and the discussion in Chapter 3). Figure 9.8 shows the mean level of activation in these regions initially and after 5 days of practice. The motor and cognitive-control demands of the tasks do not change much, and so there is comparable activation early versus late in the motor cortex and the ACC. There is some reduction in the parietal region, suggesting that the representational demands may be decreasing a bit. However, the dramatic change is in the prefrontal region, which shows a major decrease because the task instructions are no longer being retrieved. Rather, the knowledge is coming to be directly applied. ■ Proceduralization refers to the process by which people switch from explicit use of declarative knowledge to direct application of procedural knowledge, which enables them to perform the task without thinking about it.
J. R. Anderson, Reder, and Simon (1998)
J. R. Anderson, Reder, and Simon (1998) noted that a major reason for the higher achievement in mathematics of students in Asian countries is that those students spend twice as much time practicing mathematics.
Jaeggi, Buschkuehl, Jonides, and Perrig (2008)
Jaeggi, Buschkuehl, Jonides, and Perrig (2008) published a report on the effectiveness of the "dual n-back" training program. In a typical single n-back task, participants see or hear a long series of stimuli and have to say whether the current stimulus is the same as the one that occurred n items back. For example, in a 2-back task with letters, participants might see

T L H C H O C O R R K C K M

and would respond yes to the three matching cases: the second H, the second O, and the second K, each of which repeats the letter two items earlier (these were italicized in the original text). In the Jaeggi et al. (2008) dual n-back task, participants had the very demanding task of simultaneously tracking a sequence of letters presented auditorily and the locations of squares presented visually. The experimenters varied n (the length of the gap participants had to monitor) from 1 to 4, raising it as participants got better. This is a very demanding task. To see the effect of practicing this task, Jaeggi et al. had participants take the Raven's Progressive Matrices test, a general test of intelligence. Figure 9.18 shows how participants improved on the Raven's test as a function of how many days they had practiced the dual n-back tasks. It seems as though working-memory practice can raise general intelligence. Results like this led to a glowing article in the New York Times Magazine titled "Can You Make Yourself Smarter?" Numerous commercial companies have sprung up (e.g., Brain Age, BrainTwister, Cogmed, JungleMemory, Lumosity), marketing cognitive training programs to individuals and schools. However, a more careful investigation by cognitive scientists has led to questions, and just one year later the New Yorker published an article titled "Brain Games are Bogus." The early studies showing positive results had small sample sizes, and more adequately powered studies (Chooi & Thompson, 2012; Redick et al., 2013) have often failed to find positive results.
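The scoring rule for the single n-back task is simple to state as code. This sketch applies it to the letter sequence above (positions are 0-indexed), recovering the three matches described in the passage.

```python
# Minimal sketch of the n-back scoring rule, applied to the passage's
# letter sequence.
def n_back_targets(seq, n=2):
    """Return 0-indexed positions where the item matches the one n back."""
    return [i for i in range(n, len(seq)) if seq[i] == seq[i - n]]

letters = list("TLHCHOCORRKCKM")
hits = n_back_targets(letters, n=2)
print([(i, letters[i]) for i in hits])  # [(4, 'H'), (7, 'O'), (12, 'K')]
```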
Jenkins, Brooks, Nixon, Frackowiak, and Passingham (1994)
Jenkins, Brooks, Nixon, Frackowiak, and Passingham (1994) looked at participants learning to key out various sequences of finger presses such as "ring, index, middle, little, middle, index, ring, index." They compared participants initially learning these sequences with participants practiced in these sequences. Using PET imaging, they found that there was more activation in frontal areas early in learning than late in learning. On the other hand, later in learning, there was more activation in the hippocampus, which is a structure associated with memory. Such results indicate that, early in a task, there is significant involvement of the anterior cingulate in organizing the behavior but that, late in learning, participants are just recalling the answers from memory. Thus, these neurophysiological data are consistent with Logan's proposal. ■ Tactical learning refers to a process by which people learn specific procedures for solving specific problems.
Kolers (1979)
Kolers (1979) investigated the acquisition of reading skills by using materials such as those illustrated in Figure 9.4. The first type of text (N) is normal, but the others have been transformed in various ways. In the R transformation, the whole line has been turned upside down; in the I transformation, each letter has been inverted; in the M transformation, the sentence has been set as a mirror image of standard type. The rest are combinations of the several transformations. In one study, Kolers looked at the effect of massive practice on reading inverted (I) text. Participants took more than 16 min to read their first page of inverted text compared with 1.5 min for normal text. After the initial reading-speed test, participants practiced on 200 pages of inverted text. Figure 9.5 provides a log-log plot of reading time against amount of practice. In this figure, practice is measured as number of pages read. The change in speed with practice is given by the curve labeled "Original training on inverted text." Kolers interspersed a few tests on normal text; data for these tests are given by the curve labeled "Original tests on normal text." We see the same kind of improvement for inverted text as in Figures 9.2 and 9.3 (i.e., a straight-line function on a log-log plot). After reading 200 pages, Kolers's participants were reading at the rate of 1.6 min per page—almost the same rate as that of participants reading normal text. A year later, Kolers had his participants read inverted text again. These data are given by the curve in Figure 9.5 labeled "Retraining on inverted text." Participants now took about 3 min to read the first page of the inverted text. Compared with their performance of 16 min on their first page a year earlier, participants displayed an enormous savings in time, but it was now taking them almost twice as long to read the text as it did after their 200 pages of training a year earlier. They had clearly forgotten something. As Figure 9.5 illustrates, participants' improvement on the retraining trials showed a log-log relation between practice and performance, as had their original training. The same level of performance that participants had initially reached after 200 pages of training was now reached after 50 pages. Skills generally show very high levels of retention.
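The straight line on Kolers's log-log plot is the signature of a power law. The sketch below is illustrative: the exponent is an assumption chosen only to roughly match the reported endpoints (about 16 min on page 1 and about 1.6 min after 200 pages), not a fit to Kolers's data.

```python
import math

# Illustrative power law of practice: T = a * P**(-b), so
# log T = log a - b * log P, a straight line on log-log axes.
# a and b are assumptions matching the reported endpoints.
a, b = 16.0, 0.434   # T(1) = 16 min; T(200) = 16 * 200**-0.434 ~ 1.6 min

def reading_time(pages_read):
    return a * pages_read ** (-b)

for p in [1, 10, 50, 100, 200]:
    t = reading_time(p)
    # log T falls linearly in log P, as in Figure 9.5
    print(p, round(t, 2), round(math.log(t), 2))
```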
In many cases, such skills can be maintained for years with no retention loss. Someone coming back to a skill such as skiing after many years of absence often requires just a short warm-up period before the skill is reestablished (Schmidt, 1988).
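The straight line on Kolers's log-log plot is the signature of a power function. Here is a minimal Python sketch of such a function; the coefficients are illustrative, chosen only to approximate the reported times (16 min on the first page, about 1.6 min after 200 pages), not values fitted in the study.

```python
import math

# Hypothetical power law for Kolers's inverted-text data:
# T(N) = a * N**(-b), with a = 16 min on page 1 and b chosen
# so that T(200) is about 1.6 min, as reported.
a = 16.0
b = math.log(10) / math.log(200)   # ~0.43

for pages in [1, 10, 50, 100, 200]:
    t = a * pages ** (-b)
    # On log-log axes the relation is linear: log T = log a - b * log N
    print(f"{pages:>3} pages read: {t:4.1f} min/page, "
          f"log T = {math.log10(t):+.2f}")
```

Because log T falls linearly in log N, equal ratios of practice (1 to 10 pages, 10 to 100 pages) buy equal proportional speedups, which is exactly the pattern in Figure 9.5.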
LISP
The LISP tutor interprets a student's work in terms of about 500 rules for programming in LISP; when a student goes astray, it can diagnose the difficulty, show the correct form, and provide the student with practice on these rules. The success of the LISP tutor is one piece of evidence that these 500 rules indeed underlie coding skill in LISP. Besides providing an instructional tool, the LISP tutor is a research tool for studying the course of skill acquisition. The tutor can monitor how well a student is doing on each of the 500 rules, recording statistics such as the number of errors that a student is making and the time taken by a student to type the code corresponding to each of these rules. These data have indicated that students acquire the skill of LISP by independently acquiring each of the 500 rules. Figure 9.19 displays the learning curves for these rules. The two dependent measures are the number of errors made on a rule and the time taken to write the code corresponding to a rule (when that rule is correctly coded). These statistics are plotted as a function of learning opportunities, which present themselves each time the student comes to a point in a problem where that rule can be applied. As can be seen, performance on these rules dramatically improves from the first to the second learning opportunity and improves more gradually thereafter. These learning curves are similar to those identified in Chapter 6 for the learning of simple associations. There were substantial differences in the speed with which different students learned the material. Students who have already learned a programming language are at a considerable advantage compared with students for whom their first programming language is that of the LISP tutor. The "identical elements model" of transfer, in which rules for programming in one language transfer to programming in another language, can account for this advantage. We also analyzed the performance of individual students in the LISP tutor and found evidence for two factors underlying individual differences.
Larkin (1981)
Larkin (1981) found a difference in how they approached the problem. Table 9.1 shows a typical novice's solution to the problem and Table 9.2 shows a typical expert's solution. The novice's solution typifies the reasoning-backward method, which starts with the unknown, in this case the velocity v. Then the novice finds an equation for calculating v. However, to calculate v by this equation, it is necessary to calculate a, the acceleration. So the novice finds an equation for calculating a, and the novice chains backward until a set of equations is found for solving the problem. The expert, on the other hand, uses similar equations but in the completely opposite order. The expert starts with quantities that can be directly computed, such as gravitational force, and works toward the desired velocity. It is also apparent that the expert is speaking a bit like the physics teacher that he is, leaving the final substitutions for the student.
Logan (1988)
Logan (1988) argued that a general mechanism of skill acquisition involves learning to recall solutions to problems that formerly had to be figured out. A nice illustration of this mechanism comes from a domain called alpha-arithmetic. It entails solving problems such as F + 3, in which the participant is supposed to say the letter that is that number of letters forward in the alphabet; in this case, F + 3 = I. Logan and Klapp (1991) performed an experiment in which they gave participants problems with addends from 2 (e.g., C + 2) through 5 (e.g., G + 5). Figure 9.9 shows the time taken by participants to answer these problems initially and then after 12 sessions of practice. Initially, participants took 1.5 s longer on +5 problems than on +2 problems, because it takes longer to count five letters forward in the alphabet than two letters. However, the problems were repeated again and again across the sessions. With repeated, continued practice, participants became faster on all problems, reaching the point where they could solve +5 problems as quickly as +2 problems. They had memorized the answers to these problems and were no longer going through the procedure of solving the problems by counting. There is evidence that, as people become more practiced at a task and shift from computation to retrieval, brain activation shifts from the prefrontal cortex to more posterior areas of the cortex.
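Logan's proposal amounts to caching: compute the answer the first time, then retrieve it thereafter. The sketch below illustrates the idea as simple memoization; it is a toy model of the claimed mechanism, not a simulation of the latency data.

```python
import string

memory = {}  # stored instances of past solutions

def alpha_arithmetic(letter, addend):
    """Answer problems like F + 3 = I, shifting from counting to recall."""
    problem = (letter, addend)
    if problem in memory:             # retrieval: fast, independent of addend
        return memory[problem]
    # algorithm: count forward through the alphabet (slower for larger addends)
    idx = string.ascii_uppercase.index(letter)
    answer = string.ascii_uppercase[idx + addend]
    memory[problem] = answer          # store the instance for next time
    return answer

print(alpha_arithmetic("F", 3))  # first encounter: computed by counting -> I
print(alpha_arithmetic("F", 3))  # repeat encounter: retrieved from memory -> I
```

The first call pays the cost of counting, which grows with the addend; every later call is a single lookup, which is why the +5 versus +2 difference disappears with practice.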
Maguire et al. (2003)
Maguire et al. (2003) used imaging to examine the brains of London taxi drivers. It takes at least 3 years for London taxi drivers to acquire all of the knowledge necessary to navigate expertly through the streets of London. The taxi drivers were found to have significantly more gray matter in the hippocampal region than did their matched controls. This finding corresponds to the increased hippocampal volume reported in small mammals and birds that engage in behavior requiring navigation (Lee, Miyasato, & Clayton, 1998). For instance, food-storing birds show seasonal increases in hippocampal volume corresponding to times of the year when they need to remember where they stored food. ■ A great deal of deliberate practice is necessary to develop expertise in any field.
prior probability
Many people are surprised that the open door in the preceding example does not provide as much evidence for a burglary as might have been expected. The reason for the surprise is that they do not grasp the importance of the prior probabilities. People sometimes ignore prior probabilities. A prior probability is the probability that a hypothesis is true before consideration of the evidence (e.g., the door is ajar). The less likely the hypothesis was before the evidence, the less likely it should be after the evidence. Let us refer to the hypothesis that my house has been burglarized as H. Suppose that I know from police statistics that the probability of a house in my neighborhood being burglarized on any particular day is 1 in 1,000. This probability is expressed as Prob(H) = .001. This equation expresses the prior probability of the hypothesis, or the probability that the hypothesis is true before the evidence is considered. The other prior probability needed for the application of Bayes's theorem is the probability that the house has not been burglarized. This alternative hypothesis is denoted ~H. The probability of ~H is 1 minus Prob(H) and is expressed as Prob(~H) = .999.
McNeil, Pauker, Sox, and Tversky (1982)
McNeil, Pauker, Sox, and Tversky (1982) found that this tendency extended to actual medical treatment. What treatment a doctor will choose depends on whether the treatment is described in terms of odds of living or odds of dying. Situations in which framing effects are most prevalent tend to have one thing in common: no clear basis for choice. This commonality is true of the three examples that we have reviewed. In the case in which the shopper has an opportunity for a savings, whether $5 is worth going to another store is unclear. In the gambling example, there is no clear basis for making a decision. The stakes are very high in the third case, but it is, unfortunately, one of those social policy decisions that defy a clear analysis. Thus, these cases are hard to decide on their merits alone.
subjective probability
People make decisions under uncertainty in terms of subjective utilities and subjective probabilities. Kahneman and Tversky (1984) also argued that, as with subjective utility, people associate a subjective probability with an event that is not identical with the objective probability. They proposed the function in Figure 11.8 to relate subjective probability to objective probability. According to this function, very low probabilities are overweighted relative to high probabilities, producing a bowing in the function. Thus, a participant might prefer a 1% chance of $400 to a 2% chance of $200 because 1% is not represented as half of 2%. Kahneman and Tversky (1979) showed that a great deal of human decision making can be explained by assuming that participants are responding in terms of these subjective utilities and subjective probabilities.
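The chapter does not give a closed-form expression for the Figure 11.8 function, but a commonly used form from Tversky and Kahneman's later work makes the point concrete. In the sketch below, the functional form and the parameter gamma = 0.61 come from that later paper (Tversky & Kahneman, 1992), not from the figure itself.

```python
def w(p, gamma=0.61):
    """Probability weighting function from Tversky & Kahneman (1992)."""
    return p**gamma / (p**gamma + (1 - p)**gamma) ** (1 / gamma)

# Low probabilities are overweighted: w(.01) is well over half of w(.02),
# so a 1% chance of $400 can outweigh a 2% chance of $200.
print(round(w(0.01), 3), round(w(0.02), 3))   # ~0.055 vs ~0.081
print(w(0.01) * 400 > w(0.02) * 200)          # True
```

Because the weighted value of 1% is much more than half the weighted value of 2%, the preference described in the text follows directly.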
Poldrack and Gabrieli (2001)
Poldrack and Gabrieli (2001) investigated the brain correlates of the changes taking place as participants learn to read transformed text such as that in Figure 9.4. In an fMRI brain-imaging study, they found increased activity in the basal ganglia and decreased activation in the hippocampus as learning progressed. Recall from Chapters 6 and 7 that the basal ganglia are associated with procedural knowledge, whereas the hippocampus is associated with declarative knowledge.
Polson, Muncher, and Kieras (1987)
Polson, Muncher, and Kieras (1987) provided a good demonstration of lack of negative transfer in the domain of text editing on a computer (using the command-based word processors that were common at the time). They asked participants to learn one text editor and then learn a second, which was designed to be maximally confusing with the first. Whereas the command to go down a line of text might be n and the command to delete a character might be k in one text editor, n would mean to delete a character in another text editor and k would mean to go down a line. However, participants experienced overwhelming positive transfer in going from one text editor to the other because the two text editors worked in the same way, even though the surface commands had been scrambled. There is only one clearly documented kind of negative transfer in regard to cognitive skills—the Einstellung effect discussed in Chapter 8. Students can learn ways of solving problems in one domain that are no longer optimal for solving problems in another domain. So, for instance, someone may learn tricks in algebra to avoid having to perform difficult arithmetic computations. These tricks may no longer be necessary when that person uses a calculator to perform these computations. Still, students show a tendency to continue to perform these unnecessary simplifications in their algebraic manipulations. This example is not a case of failure to transfer; rather, it is a case of transferring knowledge that is no longer useful. ■ There is transfer between skills only when these skills have the same abstract knowledge elements.
Shafir (1993)
Shafir (1993) suggested that, in such situations, we may make a decision not on the basis of which decision is actually the best one but on the basis of which will be easiest to justify (to ourselves or to others). Different framings make it easier or harder to justify an action. In the disease example, the first framing focuses one on saving lives and the second framing focuses one on avoiding deaths. In the first case, one would justify the action by pointing to the people whose lives have been saved (therefore it is critical that there be some people to point to). In the second case, a justification would have to explain why people died (and it would be better if there were no such people). This need to justify one's action can lead one to pick the same alternative whether asked to pick something to accept or something to reject. Consider the example in Table 11.2, in which two parents are described in a divorce case and participants are asked to play the role of a judge who must decide to which parent to award custody of the child. In the award condition, participants are asked to decide who is to be awarded custody; in the deny condition, they are asked to decide who is to be denied custody. The parents are overall rather equivalent, but parent B has rather more extreme positive and negative factors. Asked to make an award decision, more participants choose to award custody to parent B; asked to make a deny decision, they tend to deny custody, again, to parent B. The reason, Shafir argued, is that parent B offers reasons, such as a close relation with the child, that can be used to justify the awarding of custody, but parent B also has reasons, such as time away from home, to justify denying custody of the child to that parent.
Shuford (1961)
Shuford (1961), who presented arrays such as the one shown in Figure 11.4 to participants for 1 s. He then asked participants to judge the proportion of vertical bars relative to horizontal bars. The number of vertical bars varied from 10% to 90% in different arrays. Shuford's results are shown in Figure 11.5, and as can be seen, participants' estimates are quite close to the true proportions. The situation just described is one where the participants can see the relevant information and make a judgment about proportions. When participants cannot see events and must recall them from memory, their judgments may be distorted if they recall too many of one kind from memory. A fair amount of research has been done on the ways in which participants can be biased in their estimation of the relative frequency of various events in the population.
Poldrack et al. (1999)
Similar changes in the activation of brain areas have been found by Poldrack et al. (1999) in another skill-acquisition task that required the classification of stimuli. As participants develop their skill, they appear to move to a direct recognition of the stimuli. Thus, the results of this brain-imaging research reveal changes consistent with the switch between the cognitive and the associative stages. In other words, qualitative changes appear to be contributing to the quantitative changes captured by the power function. We will consider these qualitative changes in more detail in the next section. ■ Performance of a cognitive skill improves as a power function of practice and shows modest declines only over long retention intervals.
Simon and Gilmartin (1973)
Simon and Gilmartin (1973) estimated that chess masters have acquired 50,000 different chess chunks, that they can quickly recognize such patterns on a chessboard, and that this ability is what underlies their superior memory performance in chess. This 50,000 figure is not unreasonable when one considers the years of dedicated study that becoming a chess master requires. What might be the relation between memory for so many chess patterns and superior performance in chess?
the posterior probability
The posterior probability is the probability that a hypothesis is true after consideration of the evidence. The notation Prob(H | E) is the posterior probability of hypothesis H given evidence E. According to Bayes's theorem, we can calculate the posterior probability of H, that the house has been burglarized given the evidence, thus:

Prob(H | E) = [Prob(E | H) × Prob(H)] / [Prob(E | H) × Prob(H) + Prob(E | ~H) × Prob(~H)]

Given our assumed values, we can solve for Prob(H | E) by substituting into the preceding equation:

Prob(H | E) = [(.8)(.001)] / [(.8)(.001) + (.01)(.999)] = .0008 / .01079 = .074

Thus, the probability that my house has been burglarized is still less than 8 in 100. Note that the posterior probability is this low even though an open door is good evidence for a burglary and not for a normal state of affairs: Prob(E | H) = .8 versus Prob(E | ~H) = .01. The posterior probability is still quite low because the prior probability of H, Prob(H) = .001, was very low to begin with. Relative to that low start, the posterior probability of .074 is a considerable increase. Table 11.1 offers an illustration of Bayes's theorem as applied to the burglary example. It offers an analysis of 100,000 households, assuming these statistics. There are four possible states of affairs, determined by whether the burglary hypothesis is true or not and by whether there is evidence of an open door or not. The frequency of each state of affairs is set forth in the four cells of the table. Let's consider the frequency in the upper-left cell, which is the case I was worried about: the door is open and my house has been burglarized. Because 1 in 1,000 households are burglarized (Prob(H) is .001), there should be 100 burglaries in the 100,000 households. This is the frequency of both events in the left column. Because 8 times out of 10 the front door is left open in a burglary (Prob(E | H) is .8), 80 of these 100 burglaries should leave the door open; this is the number in the upper left. Similarly, in the upper-right cell, we can calculate that of the 99,900 homes without burglary, the front door will be left open 1 in 100 times, for 999 cases. Thus, in total there are 80 + 999 = 1,079 cases of front doors left open, and the probability of the house being burglarized is 80/1,079 = .074. The calculations in Bayes's theorem perform the same calculation as afforded by Table 11.1, but in terms of probabilities rather than frequencies. As we will see, people find it easier to reason in terms of frequencies.
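Both routes to the .074 figure, through probabilities and through the frequencies of Table 11.1, are easy to verify in a few lines of Python; all of the numbers below come from the example in the text.

```python
# Burglary example: prior and conditional probabilities from the text.
p_h, p_not_h = 0.001, 0.999          # Prob(H), Prob(~H)
p_e_h, p_e_not_h = 0.8, 0.01         # Prob(E|H), Prob(E|~H)

# Bayes's theorem in probabilities:
posterior = (p_e_h * p_h) / (p_e_h * p_h + p_e_not_h * p_not_h)
print(round(posterior, 3))           # 0.074

# The same calculation in frequencies, as in Table 11.1:
households = 100_000
burgled_open = households * p_h * p_e_h           # 80 burglaries, door open
safe_open = households * p_not_h * p_e_not_h      # 999 non-burglaries, door open
print(burgled_open / (burgled_open + safe_open))  # 80 / 1,079 = 0.074...
```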
chess expertise
The acquisition of chess expertise appears to involve neural reorganization in the fusiform visual area. We reviewed in Chapter 2 how the fusiform tended to be engaged in recognition of faces but can be engaged by other stimuli (e.g., Figure 2.23) for which people have acquired high levels of expertise. It also appears to be engaged in the development of chess expertise. To summarize, chess experts have stored the solutions to many problems that duffers must solve as novel problems. Duffers have to analyze different configurations, try to figure out their consequences, and act accordingly. Masters have all this information stored in memory, thereby claiming two advantages. First, they do not risk making errors in solving these problems, because they have stored the correct solution. Second, because they have stored correct analyses of so many positions, they can focus their problem-solving efforts on more sophisticated aspects and strategies of chess. Thus, the experts' pattern learning and better memory for board positions is a part of the tactical learning discussed earlier.
Olds and Milner (1954)
The importance of this region to motivation has been known since the 1950s, when Olds and Milner (1954) discovered that rats would press a lever to the point of exhaustion to receive electrical stimulation from electrodes near this region. This stimulation caused release of dopamine in a region of the basal ganglia called the nucleus accumbens. Drugs like heroin and cocaine have their effect by producing increased levels of dopamine from this region. These dopamine neurons show increased activity for all sorts of positive rewards, including basic rewards like food and sex, but also social rewards like money or sports cars (Camerer, Loewenstein, & Prelec, 2005). Thus they might appear to be the neural equivalent of subjective utility. There is an interesting twist to the response of dopamine neurons (Schultz, 1998). When a reward was unexpectedly presented to monkeys, their dopamine neurons showed enhanced activity at the time of reward delivery. However, when a stimulus preceded the reward that reliably predicted the reward, the neurons no longer responded to reward delivery. Rather, the dopamine response transferred to the earlier stimulus. Finally, when a reward was unexpectedly omitted following the stimulus, dopamine neurons showed depressed activity at the expected time of reward delivery. These observations motivated the idea that the response of dopamine neurons codes for the difference between the actual reward and what was expected (Montague, Dayan, & Sejnowski, 1996). This seems related to the experience that pleasures fade upon repetition in the same circumstance. For instance, many people report that if they have a great meal at a new restaurant and return, the next meal is not as good. There are multiple possible explanations for this, but one is that the reward is expected and so the dopamine response is less. Most recording of the response of dopamine neurons is done in nonhumans (occasionally they are studied in patients as part of their treatment), but a number of measures have been found to track their behavior in healthy humans. One of the most frequently studied is an ERP response called feedback-related negativity (FRN; more than 200 studies have been run; for a review, see Walsh & Anderson, 2012). If the reward is less than expected, there is increased negativity in the ERP response 200 to 350 ms after the reward is delivered; if it is greater than expected, the ERP response is more positive. Other studies have used fMRI (e.g., O'Doherty et al., 2004; McClure, Laibson, Loewenstein, & Cohen, 2004), and generally there is a stronger response in areas that contain dopamine neurons when the reward deviates from expectation. The fact that dopamine neurons respond to changes from expectation implies a learning component, because their response is relative to a learned expectation. Their response has been associated with a popular learning technique in artificial intelligence called reinforcement learning (Holroyd & Coles, 2002). This is a mechanism for learning what actions to take in a novel environment through experience.
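A minimal sketch of the prediction-error idea follows. It uses a simple delta rule with an assumed learning rate, not the full temporal-difference model of Montague et al. (1996), but it shows the two signatures described above: the error signal fades as the reward becomes expected, and an omitted reward produces a negative error.

```python
# Minimal reward-prediction-error sketch (a delta rule, not the full
# temporal-difference model of Montague et al., 1996).
alpha = 0.3          # learning rate (assumed for illustration)
v = 0.0              # learned expectation of reward following the stimulus

for trial in range(1, 11):
    reward = 1.0
    delta = reward - v          # "dopamine" response: actual minus expected
    v += alpha * delta          # expectation moves toward the reward
    print(f"trial {trial:2}: prediction error = {delta:+.2f}")

# Once the reward is expected (v near 1), omitting it yields a negative
# error, mirroring the depressed dopamine activity at the expected time.
print("omission:", round(0.0 - v, 2))   # about -0.97 after ten rewarded trials
```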
proceduralization
The process of converting the deliberate use of declarative knowledge into pattern-driven application of procedural knowledge is called proceduralization.
subjective utility
The subjective utility of an outcome appears to be related to the activity of dopamine neurons in the basal ganglia. This idea has been formalized in terms of what is referred to as subjective utility: the value that we place on money is not linear with the face value of the money. Figure 11.7, which shows a typical function proposed for the relation of subjective utility to money (Kahneman & Tversky, 1984), has two interesting properties. The first is that it curves in such a way that the amount of money must more than double in order to double its utility. The second property of this utility function is that it is steeper in the loss region than in the gain region, so that a loss looms larger than a gain of the same size; this asymmetry shows up in the choices participants make between gambles framed as gains and gambles framed as losses.
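The chapter does not give an equation for the Figure 11.7 curve, but the value function from Tversky and Kahneman's (1992) formalization of prospect theory, with their estimated parameters, exhibits both properties. The sketch below uses that 1992 form as an illustration, not as the exact function plotted in the figure.

```python
def utility(x, alpha=0.88, lam=2.25):
    """Prospect-theory value function (Tversky & Kahneman, 1992)."""
    return x**alpha if x >= 0 else -lam * (-x)**alpha

# Doubling the money less than doubles its utility...
print(round(utility(200) / utility(100), 2))   # ~1.84, not 2
# ...and a loss looms larger than an equal gain.
print(round(utility(-100), 1), round(utility(100), 1))   # ~ -129.4 vs ~ 57.5
```

The exponent below 1 produces the bowed gain curve, and the multiplier lam > 1 produces the steeper loss limb.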
prescriptive model
The theorem serves as a prescriptive model, or normative model, specifying the means of evaluating the probability of a hypothesis. Such a model contrasts with a descriptive model, which specifies what people actually do. People normally do not perform the calculations that we have just gone through any more than they follow the steps prescribed by formal logic.
negative transfer
There is a positive side to this specificity in the transfer of skill: there seldom seems to be negative transfer, in which learning one skill makes a person worse at learning another skill. Interference, such as that which occurs in memory for facts (see Chapter 7), is almost nonexistent in skill acquisition.
superior expert memory for meaningful problems
This basic phenomenon of superior expert memory for meaningful problems has been demonstrated in a large number of domains, including the game of Go (Reitman, 1976), electronic circuit diagrams (Egan & Schwartz, 1979), bridge hands (Engle & Bukstel, 1978; Charness, 1979), and computer programming (McKeithen, Reitman, Rueter, & Hirtle, 1981; Shneiderman, 1976).
the normative theory
This theory says that they should choose the alternative with the highest expected value. The expected value of an alternative is calculated by multiplying the probability by the value. Thus, the expected value of alternative A is $8 × 1/3 = $2.67, whereas the expected value of alternative B is $3 × 5/6 = $2.50. Thus, the normative theory says that participants should select gamble A. However, most participants will select gamble B.
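The arithmetic is trivial but worth making explicit; a few lines of Python reproduce the comparison.

```python
def expected_value(payoff, probability):
    """Expected value of a gamble: payoff weighted by its probability."""
    return payoff * probability

gamble_a = expected_value(8, 1 / 3)   # $2.67
gamble_b = expected_value(3, 5 / 6)   # $2.50
print(f"A: ${gamble_a:.2f}  B: ${gamble_b:.2f}")
print("normative choice:", "A" if gamble_a > gamble_b else "B")
```

The normative choice is A, yet most participants take B, presumably because its much higher probability of winning something is more attractive than the slightly higher expected value.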
Tversky and Kahneman (1974)
Tversky and Kahneman (1974) demonstrated that judgments of proportion can be biased by differential availability of examples. These investigators asked participants to judge the proportion of English words that fit certain characteristics. For instance, they asked participants to estimate the proportion of words that begin with the letter k versus words with the letter k in the third position. How might participants perform this task? One obvious method is to briefly try to think of words that satisfy the specification and words that do not and to estimate the relative proportion of target words. How many words can you think of that begin with the letter k? How many words can you think of that do not? What is your estimate of their proportion? Now, how many words can you think of that have the letter k in the third position? How many words can you think of that do not? What is their relative proportion? Participants estimated that more words begin with the letter k than have the letter k in the third position, although, in actual fact, the opposite is true: three times as many words have the letter k in the third position as begin with the letter k. Generally, participants overestimate the frequency with which words begin with various letters. As in this experiment, many real-life circumstances require that we estimate probabilities without having direct access to the population that these probabilities describe. In such cases, we must rely on memory as the source for our estimates. The memory factors that we studied in Chapters 6 and 7 serve to explain how such estimates can be biased. Under the reasonable assumption that words are more strongly associated with their first letter than with their third letter, the bias exhibited in the experimental results can be explained by the spreading-activation theory (Chapter 6). With the focus of attention on the letter k, for example, activation will spread from that letter to words beginning with it. This process will tend to make words beginning with the letter k more available than other words. Thus, these words will be overrepresented in the sample that participants take from memory to estimate the true proportion in the population. The same overestimation is not made for words with the letter k in the third position because words are unlikely to be directly associated with the letters in the third position. Therefore, these words cannot be associatively primed and made more available.
Tversky and Kahneman (1974).
Tversky and Kahneman (1974). Which of the following sequences of six tosses of a coin (where H denotes heads and T tails) is more likely: H T H T T H or H H H H H H? Many people think the first sequence is more probable, but both sequences are actually equally probable. The probability of the first sequence is the probability of H on the first toss (which is .50) times the probability of T on the second toss (which is .50), times the probability of H on the third toss (which is .50), and so on. The probability of the whole sequence is .50 × .50 × .50 × .50 × .50 × .50 = .016. Similarly, the probability of the second sequence is the product of the probabilities of each coin toss, and the probability of a head on each coin toss is .50. Thus, again, the final probability is .50 × .50 × .50 × .50 × .50 × .50 = .016. Why do some people have the illusion that the first sequence is more probable? It is because the first sequence seems similar to a lot of other sequences (for example, H T H T H T or H T T H T H). These similar events serve to bias upward a person's probability estimate of the target event. On the other hand, H H H H H H, six straight heads, seems unlike any other sequence, and its probability will therefore not be biased upward by other similar sequences. In conclusion, a person's estimate of the probability of an event will be biased by other events that are similar to it. A related phenomenon is the gambler's fallacy: the belief that if an event has not occurred for a while, then it is more likely, by the "law of averages," to occur in the near future.
Ward Edwards (1968)
Ward Edwards (1968) extensively investigated how people use new information to adjust their estimates of the probabilities of various hypotheses. In one experiment, he presented participants with two bags, each containing 100 poker chips. Participants were shown that one of the bags contained 70 red chips and 30 blue, while the other contained 70 blue chips and 30 red. The experimenter chose one of the bags at random, and the participants' task was to decide which bag had been chosen. In the absence of any prior information, the probability of either bag having been chosen was 50%. Thus, Prob(H_R) = .50 and Prob(H_B) = .50, where H_R is the hypothesis of a predominantly red bag and H_B is the hypothesis of a predominantly blue bag. To obtain further information, participants sampled chips at random from the bag. Suppose the first chip drawn was red. The conditional probability of a red chip being drawn from each bag is Prob(R | H_R) = .70 and Prob(R | H_B) = .30. Now, we can calculate the posterior probability of the bag's being predominantly red, given that a red chip is drawn, by applying the Bayes equation to this situation:

Prob(H_R | R) = [Prob(R | H_R) × Prob(H_R)] / [Prob(R | H_R) × Prob(H_R) + Prob(R | H_B) × Prob(H_B)] = [(.70)(.50)] / [(.70)(.50) + (.30)(.50)] = .70

This result seems, to both naive and sophisticated observers, to be a rather sharp increase in probabilities. Typically, participants do not increase the probability of a red-majority bag to .70; rather, they make a more conservative revision to a value such as .60. After this first drawing, the experiment continues: The poker chip is put back in the bag and a second chip is drawn at random. Suppose this chip too is red. Again, by applying Bayes's theorem, we can show that the posterior probability of a red bag is now .84. Suppose our observations continued for 10 more trials and, after all 12 trials, we have observed eight reds and four blues. By continuing the Bayesian analysis, we could show that the new posterior probability of the hypothesis of a red bag is .97. Participants who see this sequence of 12 trials estimate subjectively a posterior probability of only .75 or less for the red bag. Edwards used the term conservative to refer to the tendency to underestimate the full force of available evidence. He estimated that we use between a fifth and a half of the evidence available to us in situations like this experiment. ■ People frequently underestimate the cumulative force of evidence in making probability judgments.
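The successive posteriors quoted above (.70, .84, and .97) can be checked by applying the update rule repeatedly; the following sketch does exactly that, with the likelihoods taken from the bag compositions.

```python
def update(p, chip):
    """One Bayesian update of Prob(H_R) after drawing a chip ('R' or 'B')."""
    like_r = 0.7 if chip == "R" else 0.3   # Prob(chip | red-majority bag)
    like_b = 0.3 if chip == "R" else 0.7   # Prob(chip | blue-majority bag)
    return like_r * p / (like_r * p + like_b * (1 - p))

p_red_bag = 0.5                      # prior: Prob(H_R)
for chip in "RR":
    p_red_bag = update(p_red_bag, chip)
    print(round(p_red_bag, 2))       # 0.7 after one red, 0.84 after two

# Eight reds and four blues over 12 draws (order does not matter):
p = 0.5
for chip in "RRRRRRRRBBBB":
    p = update(p, chip)
print(round(p, 2))                   # 0.97, far above the ~.75 people report
```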
Newell and Simon (1972)
Newell and Simon (1972) speculated that, in addition to learning many patterns, masters have learned what to do in the presence of such patterns. For instance, if the chunk pattern is symptomatic of weakness on one side of the board, the response might be to suggest an attack on the weak side. Thus, masters effectively "see" possibilities for moves; they do not have to think them out, which explains why chess masters do so well at lightning chess, in which they have only a few seconds for each move. The acquisition of chess expertise appears to involve neural reorganization in the fusiform visual area. We reviewed in Chapter 2 how the fusiform tended to be engaged in recognition of faces but can be engaged by other stimuli (e.g., Figure 2.23) for which people have acquired high levels of expertise. It also appears to be engaged in the development of chess expertise. Figure 9.16a shows examples of the board configurations that Bilalić, Langner, Ulrich, and Grodd (2011) presented to chess experts and to novices. The chessboards show positions found in normal chess games or random positions. Participants' tasks were to indicate whether the king was in check (the Check task) or whether the position included knights of both colors (the Knight task). In Figure 9.16b, the blue bars show activity levels in the fusiform area when participants were presented with normal chess positions, whereas the gray bars show activity for random positions. As you can see, activation in the fusiform area was considerably higher for experts than for novices. Also, for experts, the normal chess positions produced greater activation than did the random chess positions; in contrast, for novices, normal versus random positions produced no difference in activation. To summarize, chess experts have stored the solutions to many problems that duffers must solve as novel problems. Duffers have to analyze different configurations, try to figure out their consequences, and act accordingly. Masters have all this information stored in memory, thereby claiming two advantages. First, they do not risk making errors in solving these problems, because they have stored the correct solution. Second, because they have stored correct analyses of so many positions, they can focus their problem-solving efforts on more sophisticated aspects and strategies of chess. Thus, the experts' pattern learning and better memory for board positions is a part of the tactical learning discussed earlier.
descriptive model
A descriptive model specifies what people actually do. People normally do not perform the calculations that we have just gone through any more than they follow the steps prescribed by formal logic. Nonetheless, they do hold various strengths of belief in assertions such as "My house has been burglarized." Moreover, their strength of belief does vary with evidence such as whether the door has been found ajar. The interesting question is whether the strength of their belief changes in accord with Bayes's theorem. ■ Bayes's theorem specifies how to combine the prior probability of a hypothesis with the conditional probabilities of the evidence to determine the posterior probability of a hypothesis.
probability matching
Implicitly, the participants had become quite good Bayesians in this experiment. The behavior of choosing among alternatives in proportion to their success is called probability matching.
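As an illustration, the sketch below simulates a learner who estimates each option's success rate from feedback and then chooses options in proportion to those estimates. The payoff rates (.7 and .3) are hypothetical, chosen only to show that the choice proportions come to match the success rates rather than maximize.

```python
import random

random.seed(1)
p_success = {"A": 0.7, "B": 0.3}      # hypothetical true payoff rates
est = {"A": 0.5, "B": 0.5}            # learned success estimates
n = {"A": 1, "B": 1}                  # pseudo-counts keep estimates positive
choices = []

for _ in range(20_000):
    # matching: pick each option with probability proportional to its estimate
    p_a = est["A"] / (est["A"] + est["B"])
    choice = "A" if random.random() < p_a else "B"
    reward = 1 if random.random() < p_success[choice] else 0
    n[choice] += 1
    est[choice] += (reward - est[choice]) / n[choice]   # running mean
    choices.append(choice)

print(round(choices.count("A") / len(choices), 2))      # close to 0.7
```

A maximizer would instead settle on option A for every trial once its estimate pulled ahead; matching reproduces the 70/30 split in the choices themselves.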
Bloom (1985a, 1985b)
The research of Bloom (1985a, 1985b) looked at the histories of children who became great in fields such as music or tennis. Bloom found that most of these children got started by playing casually, but after a short time they typically showed promise and were encouraged by their parents to start serious training with a teacher. However, the early natural abilities of these children were surprisingly modest and did not predict ultimate success in the domain (Ericsson et al., 1993). Rather, what is critical seems to be that parents come to believe that a child is talented and consequently pay for their child's instruction and equipment as well as support their time-consuming practice. Ericsson et al. speculated that the resulting training is sufficient to account for the development of children's success. Talent almost certainly plays some role (considered in Chapter 14), but all the evidence indicates that genius is 90% perspiration and 10% inspiration.