Chapter 5 [S&P]
Semantic Regularities
-->Refers to the meaning of a scene. This meaning is often related to the function of a scene—what happens within it. -->For example, food preparation, cooking, and perhaps eating occur in a kitchen -->Semantic regularities are the characteristics associated with the functions carried out in different types of scenes. -->Experiment by Stephen Palmer (1975): Palmer first presented a context scene and then briefly flashed one of the target pictures. [Ex. Kitchen / Then flash knife] When Palmer asked observers to identify the object in the target picture, they correctly identified it [knife] 80 percent of the time, but correctly identified unrelated stimuli [computer] only 40 percent of the time. Apparently, Palmer's observers were using their knowledge about kitchens to help them perceive the briefly flashed knife. -->The multiple personalities of a blob / The blob is perceived as different objects depending on its orientation and the context within which it is seen.
Does Past Experience Play a Role in Perception?
-->The Gestalt idea that past experience and the meanings of stimuli (like the W and M) play a minor role in perceptual organization is also illustrated by the Gestalt proposal that one of the first things that occurs in the perceptual process is the segregation of figure from ground. They contended that the figure must stand out from the ground before it can be recognized. In other words, the figure has to be separated from the ground before we can assign a meaning to the figure. -->But Bradley Gibson and Mary Peterson (1994) did an experiment that argued against this idea by showing that figure-ground formation can be affected by the meaningfulness of a stimulus. They demonstrated this by presenting a display like the one in Figure 5.31a, which can be perceived in two ways: (1) a standing woman (the black part of the display) or (2) a less meaningful shape (the white part of the display). When they presented stimuli such as this for a fraction of a second and asked observers which region seemed to be the figure, they found that observers were more likely to say that the meaningful part of the display (the woman, in this example) was the figure. -->Why were the observers more likely to perceive the woman? One possibility is that they recognized that the black area was a familiar object. In fact, when Gibson and Peterson turned the display upside down, as in Figure 5.31b, so that it was more difficult to recognize the black area as a woman, subjects were less likely to see that area as being the figure. -->The fact that meaningfulness can influence the assignment of an area as figure means that the process of recognition must be occurring either before or at the same time as the figure is being separated from the ground.
Properties of Figure and Ground
-->The figure is more "thinglike" and more memorable than the ground. Thus, when you see the vase as figure, it appears as an object that can be remembered later. However, when you see the same light area as ground, it does not appear to be an object but is just "background" and is therefore not particularly memorable. -->The figure is seen as being in front of the ground. Thus, when the vase is seen as figure, it appears to be in front of the dark background (Figure 5.26a), and when the faces are seen as figure, they are on top of the light background (Figure 5.26b). -->Near the borders it shares with the figure, the ground is seen as unformed material, without a specific shape, and seems to extend behind the figure. This is not to say that grounds lack shape entirely. They are often shaped by borders distant from those they share with the figure; for instance, the backgrounds in Figure 5.26 are square. -->The border separating the figure from the ground appears to belong to the figure. Consider, for example, the Rubin face-vase in Figure 5.25. When the two faces are seen as figure, the border separating the blue faces from the grey background belongs to the faces. This property of the border belonging to one area is called *border ownership*. When perception shifts so the vase is perceived as figure, border ownership shifts as well, so now the border belongs to the vase.
Perceptions
-More complex conscious experiences such as our awareness of objects -Account for the vast majority of our sensory experiences. For example, when you look at Figure 5.13, you perceive a face, but the starting point, according to structuralism, would be many sensations, which are indicated by the small dots.
Think About It
1. Consider this situation: We saw in Chapter 1 that top-down processing occurs when perception is affected by the observer's knowledge and expectations. Of course, this knowledge is stored in neurons and groups of neurons in the brain. In this chapter, we saw that there are neurons that have become tuned to respond to specific characteristics of the environment. We could therefore say that some knowledge of the environment is built into these neurons. Thus, if a particular perception occurs because of the firing of these tuned neurons, does this qualify as top-down processing? 2. Reacting to the results of the recent DARPA race, Harry says, "Well, we've finally shown that computers can perceive as well as people." How would you respond to this statement? (p. 96) 3. Biological evolution caused our perceptual system to be tuned to the Stone Age world in which we evolved. Given this fact, how well do we handle activities like downhill skiing or driving, which are very recent additions to our behavioral repertoire? 4. Vecera showed that regions in the lower part of a stimulus are more likely to be perceived as figure. How does this result relate to the idea that our visual system is tuned to regularities in the environment? (p. 105) 5. The blue area in the painting in Figure 5.56 is a silhouette of a mountain. The dark area in the foreground represents trees. What happens to your perception of these two areas when you turn the picture upside down? (Hint: Can you see yellow mountains in the foreground of the upside-down picture?) Relate the changes in perception you experience to the determinants of figure and ground discussed in this chapter. 6. When you first look at Figure 5.57, do you notice anything funny about the walkers' legs? Do they initially appear tangled? What is it about this picture that makes the legs appear to be perceptually organized in that way? Can you relate your perception to any of the laws of perceptual organization? 
To cognitive processes based on assumptions or past experience? (pp. 102, 110)
Test Yourself III
1. Describe Grill-Spector's "Harrison Ford" experiment. What do the results indicate about the connection between brain activity and the ability to recognize faces? 2. Describe Sheinberg and Logothetis's binocular rivalry experiment in which they presented a picture of a butterfly to one eye and a sunburst to the other eye. What did the results indicate? 3. Describe Tong's experiment in which he presented a picture of a house to one eye and a picture of a face to the other eye. What did the results indicate? 4. Describe how "decoders" have enabled researchers to use the brain's response, measured using fMRI, to predict what orientation or what picture a person is looking at. Be sure you understand the difference between semantic encoding and structural encoding. 5. Why is it correct to say that faces are "special"? What do the face inversion experiments show? Do faces activate the brain mainly in one place or in many different places? 6. What is the evidence that newborns and young infants can perceive faces? What is the evidence that perceiving the full complexity of faces does not occur until late adolescence or adulthood?
Test Yourself I
1. What are some of the problems that make object perception difficult for computers but not for humans? 2. What is structuralism, and why did the Gestalt psychologists propose an alternative to this way of explaining perception? 3. How did the Gestalt psychologists explain perceptual organization? 4. How did the Gestalt psychologists describe figure-ground segregation? What are some basic properties of figure and ground? 5. What image-based properties of a stimulus tend to favor perceiving an area as "figure"? Be sure you understand Vecera's experiment that showed that the lower region of a display tends to be perceived as figure, and why Peterson and Salvagio stated that to understand how segregation occurs we have to consider what is happening in the wider scene. 6. Describe the Gestalt ideas about the role of meaning and past experience in determining figure-ground segregation. 7. Describe Gibson and Peterson's experiment that showed that meaning can play a role in figure-ground segregation. 8. What does the Bev Doolittle scene in Figure 5.32 demonstrate?
Test Yourself II
1. What is a "scene," and how is it different from an "object"? 2. What is the evidence that we can perceive the gist of a scene very rapidly? What information helps us identify the gist? 3. What are regularities in the environment? Give examples of physical regularities, and discuss how these regularities are related to the Gestalt laws of organization. 4. What are semantic regularities? How do semantic regularities affect our perception of objects within scenes? What is the relation between semantic regularities and the idea that perception involves inference? 5. What did Helmholtz have to say about inference and perception? 6. What is Bayesian inference, and how is it related to Helmholtz's ideas about inference?
Voxels
A small cube-shaped area of the brain about 2 or 3 mm on each side. (The size of a voxel depends on the resolution of the fMRI scanner. Scanners are being developed that will be able to resolve areas smaller than 2 or 3 mm on a side.)
Scene
A view of a real-world environment that contains (1) background elements and (2) multiple objects that are organized in a meaningful way relative to each other and the background --> One way of distinguishing between objects and scenes is that objects are compact and are acted upon, whereas scenes are extended in space and are acted within. For example, if we are walking down the street and mail a letter, we would be acting upon the mailbox (an object) and acting within the street (the scene).
Apparent Movement
Although movement is perceived, nothing is actually moving. There are three components to stimuli that create apparent movement: (1) One image flashes on and off (Figure 5.14a); (2) there is a period of darkness, lasting a fraction of a second (Figure 5.14b); and (3) the second image flashes on and off -->We don't see the darkness because our perceptual system adds something during the period of darkness—the perception of an image moving through the space between the flashing images -->A modern example of apparent movement is provided by electronic signs, which display moving advertisements or news headlines. It is difficult to imagine that they are made up of stationary lights flashing on and off. -->Wertheimer drew two conclusions from the phenomenon of apparent movement. First, apparent movement can't be explained by sensations, because there is nothing in the dark space between the flashing images. Second, the whole is different than the sum of its parts, because the perceptual system creates the perception of movement where there actually is none. This idea that the whole is different than the sum of its parts became the battle cry of the Gestalt psychologists. "Wholes" were in. "Sensations" were out!
Developmental Dimension: Infant Face Perception
At very close distances, a young infant can detect some gross features -->By 3 to 4 months, infants can tell the difference between faces that look happy and those that show surprise, anger, or are neutral, and can tell the difference between a cat and a dog -->The face that the infant sees most frequently is usually the mother's, and there is evidence that young infants can recognize their mother's face shortly after they are born. -->Using preferential looking, in which 2-day-old infants were given a choice between their mother's face and a stranger's, Ian Bushnell and coworkers (1989) found that newborns looked at the mother about 63 percent of the time. This result is above the 50 percent chance level, so Bushnell concluded that the 2-day-olds could recognize their mother's face. -->Olivier Pascalis and coworkers (1995) showed that when the mother and the stranger wore pink scarves that covered their hairlines, the preference for the mother disappeared. The high-contrast border between the mother's dark hairline and light forehead apparently provides important information about the mother's physical characteristics that infants use to recognize her. -->In an experiment that tested newborns within an hour after they were born, John Morton and Mark Johnson (1991) presented stimuli (see bottom of Figure 5.54) to the newborns and then moved the stimuli to the left and right. As they did this, they videotaped the infant's face. Later, scorers who were unaware of which stimulus had been presented viewed the tapes and noted whether the infant turned its head or eyes to follow the moving stimulus. The results in Figure 5.54 show that the newborns looked at the moving face more than at the other moving stimuli, which led Morton and Johnson to propose that infants are born with some information about the structure of faces.
Frank Tong and coworkers (1998)
Binocular rivalry procedure has been used to connect perception and neural responding in humans by using fMRI. -->Presented a picture of a person's face to one eye and a picture of a house to the other eye, by having observers view the pictures through colored glasses. (Binocular Rivalry) -->Observers perceived just the face or just the house, and these perceptions alternated back and forth every few seconds -->Subjects pushed one button when they perceived the house and another button when they perceived the face -->Tong used fMRI to measure activity in the subject's parahippocampal place area (PPA) and fusiform face area (FFA). When observers were perceiving the house, activity increased in the PPA (and decreased in the FFA); when they were perceiving the face, activity increased in the FFA (and decreased in the PPA). This result is therefore similar to what Sheinberg and Logothetis found in the monkey single-neuron butterfly-sunburst experiment. -->Even though the images on the retina remained the same throughout the experiment, activity in the brain changed depending on what the person was experiencing. These experiments generated a great deal of excitement among brain researchers because they measured brain activation and perception simultaneously and demonstrated a dynamic relationship between perception and brain activity in which changes in perception and changes in brain activity mirrored each other.
Role of Experience in Infant Face Perception
But there is also evidence for a role of experience in infant face perception. Ian Bushnell (2001) observed newborns over the first 3 days of life to determine whether there was a relationship between their looking behavior and the amount of time they were with their mother. He found that at 3 days of age, when the infants were given a choice between looking at a stranger's face or their mother's face, the infants who had been exposed to their mother longer were more likely to prefer her over the stranger. The two infants with the lowest exposure to the mother (an average of 1.5 hours) divided their looking evenly between the mother and stranger, but the two infants with the longest exposure (an average of 7.5 hours) looked at the mother 68 percent of the time. Analyzing the results from all of the infants led Bushnell to conclude that face perception emerges very rapidly after birth, but that experience in looking at faces does have an effect. -->Their ability to identify faces doesn't reach adult levels until adolescence or early adulthood
Reasons Behind Prolonged Face Perception Development
Can be traced to physiology. Figure 5.55 shows that the fusiform face area (FFA), indicated by red, is small in an 8-year-old child compared to the FFA in an adult. In contrast, the parahippocampal place area (PPA), indicated by green, is similar in the 8-year-old child and the adult. -->It has been suggested that this slow development of the specialized face area may be related to the maturation of the ability to recognize faces and their emotions, and especially the ability to perceive the overall configuration of facial features. Thus, the specialness of faces extends from birth, when newborns can react to some aspects of faces, to late adolescence, when the true complexity of our responses to faces finally emerges.
D. L. Sheinberg and Nikos Logothetis
D. L. Sheinberg and Nikos Logothetis (1997) used the principle (binocular rivalry), presenting a sunburst pattern to a monkey's left eye and simultaneously presenting a picture of a butterfly to the monkey's right eye -->To determine what the monkey was perceiving, they trained the monkey to pull one lever when it perceived the sunburst pattern and another lever when it perceived the butterfly. As the monkey was reporting, they simultaneously recorded the activity of a neuron in the inferotemporal (IT) cortex that had previously been shown to respond to the butterfly but not to the sunburst. -->Whenever the monkey perceived the sunburst, the neuron's firing rate was low, but when the monkey's perception shifted to the butterfly, firing increased -->Consider what happened in this experiment. The images on the monkey's retinas remained the same throughout the experiment—the sunburst was always imaged on the left retina, and the butterfly was always imaged on the right retina. The change in perception from "sunburst" to "butterfly" must therefore have been happening in the monkey's brain, and these changes in the perception were linked to changes in the firing of a neuron in the brain.
Sensations
Elementary processes that occur due to stimulation of the senses -Sensations might be linked to very simple experiences, such as seeing a single flash of light -The structuralists saw sensations as analogous to the atoms of chemistry. Just as atoms combine to create complex molecular structures, sensations combine to create complex perceptions.
Are Faces Special?
Faces are pervasive in the environment. What makes them special is that they are important sources of information. Faces establish a person's identity, which is important for social interactions (who is the person who just said hello to me?) and for security surveillance. They provide information about a person's mood and where the person is looking, and can elicit evaluative judgments in an observer (the person seems unfriendly, the person is attractive, and so on). -->Faces are also special because, as we've discussed in previous chapters, there are neurons that respond selectively to faces, and there are specialized places in the brain, such as the fusiform face area, that are rich in these neurons. -->When given a task that involves moving the eyes as rapidly as possible to look at a picture of either a face, an animal, or a vehicle, faces elicit the fastest eye movements, occurring within 138 ms. Results such as these have led to the suggestion that faces have a special status that allows them to be processed more efficiently and faster than other classes of objects -->One research finding that has been repeated many times is that inverting a picture of a face (turning it upside down) makes it more difficult to identify the face or to tell if two inverted faces are the same or different. Similar effects occur for other objects, such as cars, but the effect is much smaller -->The inversion effect has been interpreted as providing evidence that faces are processed holistically. Thus, while all faces contain the same basic features—two eyes, a nose, and a mouth—our ability to distinguish thousands of different faces seems to be based on our ability to detect the configuration of these features—how they are arranged relative to each other on the face. -->Changing a photograph of a face into a negative image makes it much more difficult to recognize; changing only the eyes back to positive greatly increases the ability to recognize the face (Gilad et al., 2009).
This suggests that eyes are an important cue for facial recognition and may explain why it is difficult to recognize someone who is wearing a mask that covers just the eyes. -->Finally, although the existence of areas of the brain that respond specifically to faces provides evidence for specialized modules in the brain, faces provide evidence for distributed processing as well. Initial processing of faces occurs in the occipital cortex, which sends signals to the fusiform gyrus, where visual information concerned with identification of the face is processed. Emotional aspects of the face, including facial expression and the observer's emotional reaction to the face, are reflected in activation of the amygdala, which is located deep within the brain. Evaluation of where a person is looking is linked to activity in the superior temporal sulcus; this area is also involved in perceiving movements of a person's mouth as the person speaks and general movement of faces. Evaluation of a face's attractiveness is linked to activity in the frontal area of the brain, and the pattern of activation across many areas of the brain differs for familiar faces compared to unfamiliar faces, with familiar faces causing more activation in areas associated with emotions. Faces, it appears, are special both because of the role they play in our environment and because of the widespread activity they trigger in the brain.
Reading the Brain
Following the success of the binocular rivalry experiments, researchers took the next step by asking whether it is possible to determine what a person is seeing by analyzing the pattern of activity in the brain. Achieving this involves measuring a person's brain activity as they are seeing something, and then somehow "decoding" that activity to determine the perception associated with it. -->Kamitani and Tong (2005) recorded their subjects' fMRI responses to a number of gratings with different orientations (the one in Figure 5.47a is 45 degrees to the right) and determined the response to the gratings in a number of fMRI voxels. -->Kamitani and Tong determined the pattern of voxel activity generated by each orientation and used the relationship between voxel activity and orientation to create an "orientation decoder." -->To test the decoder, they presented oriented gratings to a subject, fed the resulting fMRI response into the decoder, and had the decoder predict the grating's orientation. --> Decoder accurately predicted the orientations that were presented. -->What about complex stimuli like scenes in the environment? Expanding our stimulus set from eight grating orientations to every possible scene in the environment is quite a jump! But recent work toward creating such a "scene decoder" has had some success.
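The train-then-predict logic of an orientation decoder can be illustrated with a toy sketch. Everything below is simulated and hypothetical (the voxel count, signal sizes, and the nearest-centroid rule are illustrative choices, not Kamitani and Tong's actual analysis): voxel patterns evoked by each orientation are averaged into a template, and a new pattern is decoded by finding the closest template.

```python
import random

random.seed(0)
ORIENTATIONS = [0, 45, 90, 135]   # degrees; hypothetical stimulus set
N_VOXELS = 20

# Each simulated voxel is weakly "tuned": it responds a bit more strongly
# to one orientation than to the others.
preferred = [random.choice(ORIENTATIONS) for _ in range(N_VOXELS)]

def voxel_pattern(orientation, noise=0.3):
    """Simulated fMRI voxel activity for one grating presentation."""
    return [1.0 + (0.5 if preferred[v] == orientation else 0.0)
            + random.gauss(0, noise)
            for v in range(N_VOXELS)]

# "Train" the decoder: the average voxel pattern (centroid) per orientation.
train = {o: [voxel_pattern(o) for _ in range(30)] for o in ORIENTATIONS}
centroids = {o: [sum(t[v] for t in trials) / len(trials)
                 for v in range(N_VOXELS)]
             for o, trials in train.items()}

def decode(pattern):
    """Predict the orientation whose centroid is closest to this pattern."""
    def dist(c):
        return sum((p - cv) ** 2 for p, cv in zip(pattern, c))
    return min(centroids, key=lambda o: dist(centroids[o]))

# Test on fresh simulated trials: accuracy should be well above 25% chance.
trials = [(o, voxel_pattern(o)) for o in ORIENTATIONS for _ in range(25)]
accuracy = sum(decode(p) == o for o, p in trials) / len(trials)
```

With real fMRI noise and eight orientations the problem is much harder, and the actual study used more sophisticated pattern classifiers, but the structure is the same: learn the voxel-activity/orientation relationship, then invert it to predict the stimulus from the activity.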
Principle of Uniform Connectedness
Gestalt Organizing Principles -->A connected region of the same visual properties, such as lightness, colour, texture, or motion, is perceived as a single unit. For example, connected circles are perceived as grouped together, just as they were when they were in the same region in Figure 5.23a. Again, connectedness overpowers proximity.
Subjective Factors That Determine Which Area Is Figure
Gestalt psychologists disagreed with the idea that a person's past experience played an important role in determining perception. -->The following demonstration by the Gestalt psychologist Max Wertheimer (1912) illustrates how the Gestalt psychologists downplayed experience. -->Wertheimer notes that we tend to perceive the display in Figure 5.30a as a "W" sitting on top of an "M," largely because of our past experiences with those two letters. However, when the W and M are arranged as in Figure 5.30b, we see two uprights with a pattern in between. Although we can tell where the W and M are if we look closely, the pattern with the two uprights is the dominant perception. Returning to the Gestalt organizing principles, Wertheimer said that the uprights are created by the principle of good continuation, and that this principle overrides any effects of past experience due to having seen Ws or Ms before. Our discussion of organization and figure-ground described how our perception is influenced by characteristics such as nearness, good continuation, and similarity; by whether an area is higher or lower in the visual field; and by the convexity or concavity of borders.
Gestalt Organizing Principles
Having questioned the idea that perceptions are created by adding up sensations, the Gestalt psychologists proposed that perception depends on a number of organizing principles, which determine how elements in a scene become grouped together. -->Many of my students react to this idea by saying that the Gestalt principles aren't therefore anything special, because all they are doing is describing the obvious things we see every day. When they say this, I remind them that the reason we perceive scenes like the city buildings so easily is because we use observations about commonly occurring properties of the environment to organize the scene. Thus, we assume, without even thinking about it, that the men's legs in Figure 5.24 extend behind the gray board, because generally in the environment when two visible parts of an object (like the men's legs) have the same color and are "lined up," they belong to the same object and extend behind whatever is blocking it. -->People don't usually think about how we perceive situations like this as being based on assumptions, but that is, in fact, what is happening. The reason the "assumption" seems so obvious is that we have had so much experience with things such as this in the environment. That the "assumption" is actually almost a "sure thing," may cause us to take the Gestalt principles for granted, and label them as "obvious." But the reality is that the Gestalt principles are nothing less than the basic operating characteristics of our visual system that determine how our perceptual system organizes elements of the environment into larger units.
Brain Activity and Identifying a Picture
Kalanit Grill-Spector and coworkers (2004) were interested in determining the relationship between the brain activation that occurs when looking at an object and a person's ability to identify the object. -->The "objects" they used were pictures of Harrison Ford's face. They measured the response of the fusiform face area (FFA) in the temporal lobe to each picture. Three trial types: (a) a picture of Harrison Ford, (b) a picture of another person's face, or (c) a random texture. Each of these stimuli was presented briefly, followed immediately by a random-pattern mask, which limited the visibility of each stimulus to just 50 ms -->Results from the study are shown in Figure 5.44. The red curve shows that activation was greatest when observers correctly identified the picture as Harrison Ford's face. The next curve shows that activation was less when they responded "other object" to Harrison Ford's face. In this case, they detected the picture as a face but were not able to identify it as Harrison Ford's face. The lowest curve indicates that there was little activation when observers could not even tell that a face was presented. *Remember that all of the curves in Figure 5.44 represent the brain activity that occurred during presentation of Harrison Ford's face. These results therefore show that neural activity that occurs as a person is looking at a stimulus is related to that person's ability to identify the stimulus. A large neural response is associated with processing that results in the ability to identify the stimulus; a smaller response, with detecting the stimulus; and the absence of a response, with missing the stimulus altogether. This is important because it shows that how the brain reacts to a stimulus as it is being presented determines our ability to identify the stimulus.*
Decoder Database
Knowing the features of a scene and the type of scene doesn't tell us what the scene actually looks like. This step is achieved when the decoder consults a database of 6 million natural images and picks the images that most closely match the information determined from analyzing the person's brain activity. -->Figure 5.49a shows the results when just the structural encoder was used. The encoder has picked the three images on the right as the best match for the target image in the red box, which is the image the person was observing. The structure of all of the matching images is similar, with objects appearing on the left of the image and open spaces in the middle and right. However, whereas the target image contains buildings, buildings are either absent or difficult to see in the matching images. -->The structural encoder alone does a good job of matching the structure of the target image, but a poor job of matching the meaning of the target image. Adding the semantic encoder improves performance, as shown in Figure 5.49b. It is easy to see the effect of the semantic encoder, because now the meanings of the match images are much closer to the test image, with all of the matches showing the sides of buildings. -->One reason the images picked as matches are not exactly the same as the target is that the target images are not contained in the 6-million-picture database of images from which the encoder selected. Eventually, according to Naselaris, much larger image databases will result in matches that are much closer to the target. Accuracy will also increase as we learn more about how the neural activity of various areas of the brain represents the characteristics of environmental scenes.
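The database-matching step can be sketched in the same toy spirit. All image names and feature vectors below are invented for illustration (nothing here is the study's actual representation): each database image carries a structural feature vector and a semantic one, and candidates are ranked by their combined distance to the features decoded from brain activity.

```python
# Hypothetical database: image name -> (structural features, semantic features).
database = {
    "building_left": ([0.90, 0.10, 0.20], [1, 0, 0]),
    "trees_left":    ([0.82, 0.20, 0.20], [0, 1, 0]),
    "open_field":    ([0.10, 0.50, 0.90], [0, 0, 1]),
}

# Features "decoded" from the observer's brain activity (made-up values):
# the structure says "objects on the left", the semantics say "building".
target_struct = [0.85, 0.15, 0.20]
target_sem = [1, 0, 0]

def dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rank(use_semantic):
    """Order database images by distance to the decoded target features."""
    def score(name):
        struct, sem = database[name]
        total = dist(struct, target_struct)
        if use_semantic:
            total += dist(sem, target_sem)
        return total
    return sorted(database, key=score)

# Structure alone favors "trees_left" (left-heavy, like the target, but the
# wrong content); adding the semantic term makes "building_left" the best match.
best_match = rank(use_semantic=True)[0]
```

This mirrors the point made about Figure 5.49: a structural match alone can pick scenes with the right layout but the wrong meaning, and the semantic term is what pulls the buildings to the top of the ranking.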
Inverse Projection Problem Poses Serious Challenges to Computer Vision Systems
Objects Can Be Hidden or Blurred -->This problem of hidden objects occurs any time one object obscures part of another object. This occurs frequently in the environment, but people easily understand that the part of an object that is covered continues to exist, and they are able to use their knowledge of the environment to determine what is likely to be present. People are also able to recognize objects that are not in sharp focus. Despite the degraded nature of such images, people can often identify most of them, whereas computers perform poorly on this task. Objects Look Different From Different Viewpoints -->Another problem facing any perceiving machine is that objects are often viewed from different angles. The ability to recognize an object seen from different viewpoints is called *viewpoint invariance*. We've already seen that viewpoint invariance enables people to tell whether faces seen from different angles are the same person, but this task is difficult for computers. How do humans overcome these complexities? We begin answering this question by considering perceptual organization.
Principle of Prägnanz
One of the Gestalt Organizing Principles -->Roughly translated from the German, Prägnanz means "good figure." / The central principle of Gestalt psychology -->Every stimulus pattern is seen in such a way that the resulting structure is as simple as possible. The familiar Olympic symbol in Figure 5.19a is an example of the principle of simplicity at work. We see this display as five circles and not as a larger number of more complicated shapes such as the ones in Figure 5.19b.
Principle of Proximity / Nearness
One of the Gestalt Organizing Principles --> Things that are near each other appear to be grouped together
Principle of Common Region
One of the Gestalt Organizing Principles -->Fig. 5.23 Elements that are within the same region of space appear to be grouped together. Even though the circles inside the ovals are farther apart than the circles that are next to each other in neighbouring ovals, we see the circles inside the ovals as belonging together. This occurs because each oval is seen as a separate region of space. Notice that in this example, common region overpowers proximity, because proximity would predict that the nearby circles would be perceived together. But even though the circles that are in different regions are close to each other in space, they do not group with each other.
The Principle of Similarity
One of the Gestalt Organizing Principles -->Most people perceive Figure 5.20a as either horizontal rows of circles, vertical columns of circles, or both. When we change the colour of some of the columns, as in Figure 5.20b, most people perceive vertical columns of circles. This perception illustrates the principle of similarity: Similar things appear to be grouped together. This principle causes circles of the same colour to be grouped together. A striking example of grouping by similarity of colour is shown in Figure 5.21. Grouping can also occur because of similarity of shape, size, or orientation. Grouping also occurs for auditory stimuli. For example, notes that have similar pitches and that follow each other closely in time can become perceptually grouped to form a melody.
Principle of Good Continuation
One of the Gestalt Organizing Principles -->Points that when connected result in straight or smoothly curving lines are seen as belonging together, and the lines tend to be seen in such a way as to follow the smoothest path. [Rope] -->Objects that are partially covered by other objects are seen as continuing behind the covering object.
Principle of Common Fate
One of the Gestalt Organizing Principles -->Things that are moving in the same direction appear to be grouped together. Thus, when you see a flock of hundreds of birds all flying together, you tend to see the flock as a unit, and if some birds start flying in another direction, this creates a new unit. Note that common fate can work even if the objects in a group are dissimilar. The key to common fate is that a group of objects are moving in the same direction. The principles we have just described were proposed by the Gestalt psychologists in the early 1900s. The following additional principles have been proposed by modern perceptual psychologists.
Reversible Figure-Ground
One way the Gestalt psychologists studied the properties of figure and ground was by considering patterns like this one, which was introduced by Danish psychologist Edgar Rubin in 1915 (Reversible Figure-Ground Picture). -->It can be perceived alternately either as two dark blue faces looking at each other, in front of a gray background, or as a gray vase on a dark blue background.
Gist of a Scene
Perceiving scenes presents a paradox. Scenes are often large and complex; despite this size and complexity, you can identify most scenes after viewing them for only a fraction of a second. This general description of the type of scene is called the gist of a scene. -->An example of your ability to rapidly perceive the gist of a scene is the way you can rapidly flip from one TV channel to another, yet still grasp the meaning of each picture. --> Possible to perceive the gist of a scene within a fraction of a second / Ex. When a target picture was specified by a written description, such as "girl clapping," observers achieved an accuracy of almost 90 percent! --> Another approach to determining how rapidly people can perceive scenes was used by Li Fei-Fei and coworkers. Presented pictures of scenes for exposures ranging from 27 ms to 500 ms and asked observers to write a description of what they saw. This method of determining the observer's response is a nice example of the phenomenological method. Used a procedure called masking to be sure the observers saw the pictures for exactly the desired duration. Typical results of Fei-Fei's experiment are shown in Figure 5.34. At brief durations, observers saw only light and dark areas of the pictures. By 67 ms they could identify some large objects (a person, a table), and when the duration was increased to 500 ms (half a second) they were able to identify smaller objects and details (the boy, the laptop). For a picture of an ornate 1800s living room, observers were able to identify the picture as a room in a house at 67 ms and to identify details, such as chairs and portraits, at 500 ms. Thus, the overall gist of the scene is perceived first, followed by perception of details and smaller objects within the scene.
Physical Regularities
Regularly occurring physical properties of the environment. For example, there are more vertical and horizontal orientations in the environment than oblique (angled) orientations. Therefore, it is no coincidence that people can perceive horizontals and verticals more easily than other orientations—the oblique effect. -->Another physical regularity is that objects in the environment often have homogeneous (same) colors and nearby objects have different colors. Light-From-Above Assumption --> The assumption that light is coming from above has been called the light-from-above assumption. Apparently, people make the light-from-above assumption because most light in our environment comes from above. This includes the sun, as well as most artificial light sources. +Another example of the light-from-above assumption at work is provided by the two pictures in Figure 5.38. Figure 5.38a shows indentations created by people walking in the sand. But when we turn this picture upside down, as in Figure 5.38b, the indentations in the sand become rounded mounds. +It is clear from these examples of physical regularities in the environment that one of the reasons humans are able to perceive and recognize objects and scenes so much better than computer-guided robots is that our system is customized to respond to the physical characteristics of our environment. But this customization goes beyond physical characteristics. It also occurs because we have learned about what types of objects typically occur in specific types of scenes.
Connecting Neural Activity and Object Perception
So far in our discussion of objects and scenes, we have focused on how perception is determined by aspects of stimuli. In fact, the words neuron and brain haven't appeared even once! Now it is time to consider the relationship between physiological processes and the perception of objects. This relationship has been studied in a number of different ways, both in animals (mostly monkeys) and in humans.
Theory of Unconscious Inference
States that some of our perceptions are the result of unconscious assumptions we make about the environment. --> Proposed to account for our ability to create perceptions from stimulus information that can be seen in more than one way. --> This display could have been caused by a six-sided red shape positioned either in front of or behind the blue rectangle. According to the theory of unconscious inference, we infer that A is a rectangle covering another rectangle because of experiences we have had with similar situations in the past. -->The display in (a) is usually interpreted as being (b) a blue rectangle in front of a red rectangle. It could, however, be (c) a blue rectangle and an appropriately positioned six-sided red figure.
The Role of Inference in Perception
The idea that perception involves inference is nothing new; it was proposed in the 19th century by Hermann von Helmholtz. One of his proposals about perception is a principle called the *Theory of Unconscious Inference*.
Likelihood Principle
States that we perceive the object that is most likely to have caused the pattern of stimuli we have received. Thus, we perceive Figure A as a blue rectangle in front of a red rectangle because it is most likely, based on our past experience, to have caused that pattern. -->One reason that Helmholtz proposed the likelihood principle is to deal with the ambiguity of the perceptual stimulus that we described at the beginning of the chapter.
Perceptual Segregation
The Gestalt psychologists were also interested in determining characteristics of the environment responsible for perceptual segregation—*the perceptual separation of one object from another*, as occurs when you see buildings in a skyline as separate from one another.
Brain Activity and Seeing
The Harrison Ford experiment presented stimuli that were quickly flashed and so were difficult to see. Another approach to studying the relationship between brain activity and vision is to look for connections between brain activity and stimuli that are easy to see. -->Our two eyes receive slightly different images. These two images are similar enough so they can be combined into a single perception by the brain (*binocular fusion*). --> If each eye receives totally different images, the brain can't fuse the two images and a condition called *binocular rivalry* occurs, in which the observer perceives either the left-eye image or the right-eye image, but not both at the same time.
Viewpoint Invariance
The ability to recognize an object seen from different viewpoints.
Bayesian Inference
The idea that assumptions and inferences are important for perception has recurred throughout the history of perception research in various forms, from Helmholtz to the Gestalt principles to regularities of the environment. Most recently, modern psychologists have quantified the idea of inferential perception by using a statistical technique called Bayesian inference, which takes probabilities into account. -->For example, let's say we want to determine how likely it is that it will rain tomorrow. If we know it rained today, then this increases the chances that it will rain tomorrow, because if it rains one day it is more likely to rain the next day. Applying reasoning like this to perception, we can ask, for example, whether a given object in a kitchen is a loaf of bread or a mailbox. Since it is more likely that a loaf of bread will be in a kitchen, the perceptual system concludes that bread is present. Bayesian statistics involves this type of reasoning, expressed in mathematical formulas that we won't describe here.
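The bread-versus-mailbox example above can be written out with Bayes' rule: the posterior belief in each object is proportional to its prior probability times the likelihood of the observed image given that object. A minimal sketch, with all probability values invented purely for illustration:

```python
# Bayes' rule: p(object | image) ∝ p(image | object) * p(object).
# The specific numbers below are hypothetical, chosen only to show
# how a prior favoring "bread in a kitchen" shapes the conclusion.

def posterior(prior, likelihood):
    """Normalized posterior over the candidate objects."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnorm.values())
    return {h: v / total for h, v in unnorm.items()}

# Prior: how likely each object is in a kitchen (invented values).
prior = {"bread": 0.6, "mailbox": 0.4}
# Likelihood: how well each object explains the ambiguous image (invented).
likelihood = {"bread": 0.5, "mailbox": 0.1}

print(posterior(prior, likelihood))
# bread: 0.3/0.34 ≈ 0.88, mailbox: 0.04/0.34 ≈ 0.12
```

Because the prior for bread in a kitchen is higher, the posterior strongly favors bread, which is the Bayesian restatement of "the perceptual system concludes that bread is present."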
Inverse Projection Problem
The perceptual system is not concerned with determining an object's image on the retina. It starts with the image on the retina, and its job is to determine the object "out there" that created the image. The task of determining the object responsible for a particular image on the retina is called the inverse projection problem, because it involves starting with the retinal image and extending rays out from the eye. --> When we do this, as shown in Figure 5.6, we see that the rectangular page (in red) could have created the retinal image, but that a number of other objects, including a tilted trapezoid, a much larger rectangle, and an infinite number of other objects, could also have created that image. When we consider that a particular image on the retina can be created by many different objects in the environment, it is easy to see why we say that the image on the retina is ambiguous.
The Stimulus on the Receptors Is Ambiguous
The perceptual system is not concerned with determining an object's image on the retina. It starts with the image on the retina, and its job is to determine the object "out there" that created the image. The task of determining the object responsible for a particular image on the retina is called the *inverse projection problem*, because it involves starting with the retinal image and extending rays out from the eye.
Perceptual organization
The process by which elements in the environment become perceptually grouped to create our perception of objects. During this process, incoming stimulation is organized into coherent units such as objects. The process of perceptual organization involves two components: grouping and segregation.
Grouping
The process by which visual events are "put together" into units or objects. --> If you can perceive the Dalmatian dog in Figure 5.12, you have perceptually grouped some of the dark areas to form a Dalmatian, with the other dark areas being seen as shadows on the ground.
Segregation
The process of separating one area or object from another [Seeing multiple buildings on a skyline, realizing that they're not all one]
Figure-Ground Segregation
The question of what causes perceptual segregation is often referred to as the problem of figure-ground segregation. -->When we see a separate object, it is usually seen as a *figure* that stands out from its background, which is called the *ground*. Ex. Book on surface of desk (Figure) / Desk Surface (Ground) -->Gestalt psychologists were interested in determining the properties of the figure and the ground and what causes us to perceive one area as figure and the other as ground.
Semantic Encoding
The second method, called semantic encoding, is based on the relationship between voxel activation and the meaning or category of a scene. The semantic encoder is calibrated by measuring the pattern of voxel activation to a large number of images that have previously been classified into categories such as "crowd," "portrait," "vehicle," and "outdoor." --> From this calibration, the relationship between the pattern of voxel activation and image category is determined. -->The structural encoder might indicate that there are straight lines of various orientations on the left of the scene, that there are curved contours in some places, and that there are few straight or curved contours in another area. The semantic encoder, which provides a different type of information, might indicate that the subject is looking at an outdoor scene.
Illusory Contours
This figure argues against sensations and for the idea that the whole is different than the sum of its parts. -->The edges that create the triangle are called illusory contours because there are actually no physical edges present. Sensations can't explain illusory contours, because there aren't any sensations along the contours. This demonstration provides more evidence that the whole is different than the sum of its parts.
Structural Encoding
Thomas Naselaris and coworkers (2009) created a brain-reading device by developing two methods for analyzing the patterns of voxel activation recorded from visual areas of an observer's brain. -->The first method, called structural encoding, is based on the relationship between voxel activation and structural characteristics of a scene, such as lines, contrasts, shapes, and textures. -->Just as Kamitani and Tong's orientation decoder was calibrated by determining the voxel activation patterns generated by eight different orientations, Naselaris's structural decoder was calibrated by presenting a large number of images, like the ones in Figure 5.48, to an observer and determining how a large number of voxels responded to specific features of each scene, such as line orientation, detail, and the position of the image. Once the structural encoder was calibrated, it was "reversed" to make predictions in the opposite direction, using the patterns of voxel responses to predict the features of the image that the subject was viewing.
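The calibrate-then-reverse procedure can be sketched with a simple linear encoding model on synthetic data. This is only an illustration of the general idea (fit a model predicting voxel responses from image features, then invert the fit to predict features from a new voxel pattern); the model, dimensions, and numbers are all invented, not Naselaris's actual encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic calibration set (invented): 200 images, each described by
# 10 structural features, driving 50 simulated voxels linearly + noise.
n_imgs, n_feat, n_vox = 200, 10, 50
features = rng.normal(size=(n_imgs, n_feat))
true_w = rng.normal(size=(n_feat, n_vox))
voxels = features @ true_w + 0.1 * rng.normal(size=(n_imgs, n_vox))

# Calibration: fit the encoding model  voxels ≈ features @ W.
w_hat, *_ = np.linalg.lstsq(features, voxels, rcond=None)

# "Reversal": given a new voxel pattern, find the feature vector that
# best explains it under the fitted encoder (least squares again).
new_features = rng.normal(size=(1, n_feat))
new_voxels = new_features @ true_w
decoded, *_ = np.linalg.lstsq(w_hat.T, new_voxels.T, rcond=None)

# The decoded features should approximately recover the true ones.
print(np.allclose(decoded.T, new_features, atol=0.2))
```

The design choice mirrors the text: calibration estimates the feature-to-voxel mapping, and prediction runs that mapping backwards, which is what makes the same fitted model usable as a decoder.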
Structuralism
We can understand the Gestalt approach by first considering an approach that came before Gestalt psychology, called structuralism, which was proposed by Wilhelm Wundt, who established the first laboratory of scientific psychology at the University of Leipzig in 1879. -->Structuralism distinguished between sensations (elementary processes that occur due to stimulation of the senses) and perceptions (more complex conscious experiences such as our awareness of objects). -->Another principle of structuralism is that the combination of sensations to form perceptions is aided by the observer's past experience. -->The Gestalt psychologists rejected the idea that perceptions were formed by "adding up" sensations and also rejected past experience as playing a major role in perception. To see why the Gestalt psychologists felt that perceptions could not be explained by adding up small sensations, consider the experience of psychologist Max Wertheimer, who was on vacation taking a train ride through Germany in 1911. When he got off the train to stretch his legs at Frankfurt, he bought a toy stroboscope from a vendor who was selling toys on the train platform. The stroboscope, a mechanical device that created an illusion of movement by rapidly alternating two slightly different pictures, caused Wertheimer to wonder *how the structuralist idea that experience is created from sensations could explain the illusion of movement he observed.*
Regularities in the Environment
We learn, for example, that blue is associated with open sky, that landscapes are often green and smooth, and that verticals and horizontals are associated with buildings. Characteristics of the environment such as this, which occur frequently, are called regularities in the environment. -->We can distinguish two types of regularities: physical regularities and semantic regularities.
The Gestalt Approach to Perceptual Grouping
What causes some elements to become grouped so they are part of one object? Answers to this question were provided in the early 1900s by the Gestalt psychologists—where Gestalt, roughly translated, means configuration. "How," asked the Gestalt psychologists, "are configurations formed from smaller elements?"
Global Image Features
What enables observers to perceive the gist of a scene so rapidly? Aude Oliva and Antonio Torralba propose that observers use information called global image features, which can be perceived rapidly and are associated with specific types of scenes. ■ Degree of naturalness. Natural scenes, such as the ocean and forest in Figure 5.35, have textured zones and undulating contours. Man-made scenes, such as the street, are dominated by straight lines and horizontals and verticals. ■ Degree of openness. Open scenes, such as the ocean, often have a visible horizon line and contain few objects. The street scene is also open, although not as much as the ocean scene. The forest is an example of a scene with a low degree of openness. ■ Degree of roughness. Smooth scenes (low roughness) like the ocean contain fewer small elements. Scenes with high roughness like the forest contain many small elements and are more complex. ■ Degree of expansion. The convergence of parallel lines, like what you see when you look down railroad tracks that appear to vanish in the distance, or in the street scene in Figure 5.35, indicates a high degree of expansion. This feature is especially dependent on the observer's viewpoint. For example, in the street scene, looking directly at the side of a building would result in low expansion. ■ Color. Some scenes have characteristic colors, like the ocean scene (blue) and the forest (green and brown) (Goffaux et al., 2005). *Global image features are holistic and rapidly perceived. They are properties of the scene as a whole and do not depend on time-consuming processes such as perceiving small details, recognizing individual objects, or separating one object from another.* *Another property of global image features is that they contain information about a scene's structure and spatial layout. 
For example, the degree of openness and the degree of expansion refer directly to characteristics of a scene's layout, and naturalness also provides layout information that comes from knowing whether a scene is "from nature" or contains "human-made structures."* *Global image properties not only help explain how we can perceive the gist of scenes based on features that can be seen in brief exposures, they also illustrate the following general property of perception: Our past experiences in perceiving properties of the environment play a role in determining our perceptions.*
Using a Mask to Achieve Brief Stimulus Presentations
What if we want to present a stimulus that is visible for only 100 ms? Although you might think that the way to do this would be to flash a stimulus for 100 ms, this won't work because of a phenomenon called persistence of vision—the perception of a visual stimulus continues for about 250 ms (1/4 second) after the stimulus is extinguished. Thus, a picture that is presented for 100 ms will be perceived as lasting about 350 ms. But the persistence of vision can be eliminated by presenting a visual masking stimulus, usually a random pattern that covers the original stimulus, so if a picture is flashed for 100 ms followed immediately by a masking stimulus, the picture is visible for just 100 ms. A masking stimulus is therefore often presented immediately after a test stimulus to stop the persistence of vision from increasing the duration of the test stimulus.
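The timing arithmetic above can be captured in a few lines. This is a deliberately simplified model that assumes a fixed 250 ms persistence and an instantaneous mask, for illustration only:

```python
PERSISTENCE_MS = 250  # approximate persistence of vision after stimulus offset

def perceived_duration(flash_ms, mask_onset_ms=None):
    """Perceived duration of a flashed picture (simplified model).

    Without a mask, perception lasts the flash plus ~250 ms of
    persistence. A mask presented mask_onset_ms after stimulus onset
    cuts perception off at that moment.
    """
    if mask_onset_ms is None:
        return flash_ms + PERSISTENCE_MS
    return min(flash_ms + PERSISTENCE_MS, mask_onset_ms)

print(perceived_duration(100))       # no mask: 100 + 250 = 350 ms
print(perceived_duration(100, 100))  # mask right at offset: 100 ms
```

This reproduces the two cases in the paragraph: an unmasked 100 ms flash is perceived for about 350 ms, while a mask immediately after offset limits visibility to the intended 100 ms.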
Image-Based Factors That Determine Which Area Is Figure
Areas lower in the field of view are more likely to be perceived as figure. / This idea was confirmed experimentally years later by researchers who flashed stimuli like the ones in Figure 5.27a for 150 milliseconds (ms) and determined which area was seen as figure, the red area or the green area. For the upper-lower displays, observers were more likely to perceive the lower area as figure, but for the left-right displays, they showed only a small preference for the left region. -->The conclusion from this experiment, that the lower region of a display tends to be seen as figure, makes sense when we consider a natural scene. Typically the lower part of the scene is the figure and the sky is ground. In our normal experience, the "figure" is much more likely to be below the horizon.
Convex Side of Borders
Figures are more likely to be perceived on the convex side of borders (borders that bulge out). Peterson and Salvagio demonstrated this by presenting displays like the one in Figure 5.29a and asking observers to indicate whether the red square was "on" or "off" a perceived figure. If they perceived the dark area in this example as being a figure, they would say "on." If they perceived the dark area as ground, they would say "off." The result was that convex regions, like the dark region in this example, were perceived as figure 89 percent of the time. -->But Peterson and Salvagio went beyond simply confirming the Gestalt proposals by also presenting displays like the ones in Figure 5.29b and c, which had fewer components. Doing this greatly decreased the likelihood that convex displays would be seen as figure, with the convex region containing the red square in the two-component display being seen as figure only 58 percent of the time. To understand how segregation occurs, we need to go beyond simply identifying factors like convexity. Segregation is determined not just by what is happening at a single border but by what is happening in the wider scene; we need to consider that perception generally occurs in scenes that extend over a wide area.