PSYCH 3420 Final Questions
The main difference between the three display systems is whether the amount of information matches the limits of the sensory system:
Efficient (=): Information is matched to the sensory system.
Inefficient (>): There is more information than the sensory system can handle.
Restrictive (<): There is less information than the sensory system can handle.
No, we do not need all displays to be efficient.
1. Describe the differences between efficient, inefficient and restrictive display systems. Do we need all displays to be efficient?
The famous blue and black (or white and gold) dress is seen differently by different people because our visual systems can interpret the lighting in the photo differently: we see either a brightly illuminated blue and black dress or a dimly illuminated white and gold dress.
10. The famous blue and black (or white and gold) dress is seen differently by different people. Why? (Lecture 5)
Foveal and peripheral processing differ in:
Sensitivity to light -- foveal processing (cones) is much less sensitive to dim light, so it is specialized for day vision; peripheral processing (mostly rods) is specialized for night vision.
Acuity -- foveal processing has much higher acuity than peripheral processing. Because we are constantly moving our eyes around, we never really notice the massive drop in acuity in our periphery.
Chromaticity -- foveal processing is chromatic; peripheral processing is largely achromatic.
Flicker -- the periphery can respond faster to flicker. Relatedly, foveal processing has low convergence onto ganglion cells (as few as 1 photoreceptor in the excitatory field and 5 in the inhibitory field), while peripheral processing has high convergence onto ganglion cells (which causes the difference in acuity).
11. Describe four differences between foveal and peripheral processing. (Lecture 3 and L Ch 2)
Bits of Luminance: The number of bits x satisfies 2^x = number of intensity levels. A display with only two intensity values (ex. black and white) has 2^x = 2, so x = 1, i.e. 1 bit. 1 bit gives 2 levels, 2 bits give 4 levels, 3 bits give 8 levels, etc.
Photographic: 9-10 bits are needed to produce photographic quality images.
Three Factors: Three factors that have bearing on this number are screen range, particular image type (smooth intensity variations require more bits), and screen gamma.
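The relationship between bits and intensity levels above can be checked with a short sketch. The function names are hypothetical and chosen only for illustration; the arithmetic is simply levels = 2^bits.

```python
def levels_for_bits(bits: int) -> int:
    """Number of distinct intensity levels available with `bits` bits."""
    return 2 ** bits


def bits_for_levels(levels: int) -> int:
    """Smallest whole number of bits that can encode `levels` intensity levels."""
    # (levels - 1).bit_length() is an exact integer ceil(log2(levels)) for levels >= 1
    return (levels - 1).bit_length()


# A monochrome display with two intensity values needs 1 bit.
monochrome_bits = bits_for_levels(2)
# The 9-10 bits quoted for photographic quality correspond to 512-1024 levels.
photo_levels = (levels_for_bits(9), levels_for_bits(10))
```

Here `monochrome_bits` is 1 and `photo_levels` is (512, 1024), which matches the 1 bit / 2 levels, 2 bits / 4 levels progression in the answer.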
12. How many "bits" of luminance are available with a display that has only two intensity values (i.e., a monochrome display). How many are needed to produce photographic quality images? What are three factors that have bearing on this number?
Motion sickness is triggered when sensory expectations are being violated. The Mismatch Hypothesis states that motion sickness is caused by conflicting sensory input from the visual and vestibular systems. While your vision may suggest that you are stable, your inner ear senses movement (or when most of your visual field is moving but no corresponding information comes from the vestibular system). This sensory confusion is what makes you sick. However, the Mismatch Hypothesis does not fit all data. Camel vs. Horse: You are more likely to get sick on a camel as opposed to a horse because of the slower frequency of motion (swaying gait) of the camels, in comparison to the higher frequency movement of a horse. The camel's lower frequency of motion is closer to the frequency that usually causes motion sickness. Driver vs. Passenger: A passenger is more likely to get sick than a driver because the driver expects the sudden turns and stopping of a car while the passenger does not. Drivers keep a more focused attention on the road congruent to the direction of travel, so they are better able to integrate visual motion information with vestibular sensation of movement.
14. What causes motion sickness? How does this relate to the mismatch hypothesis? Why might motion cause sickness in the first place? Why is one more likely to get sick on a camel as opposed to a horse? Why is a passenger more likely to get sick than a driver even when they are both keeping their eyes on the road?
Luminance - A photometric measure of the light coming off a surface: basically the effect of radiance on our eyes, i.e. radiance weighted by the spectral sensitivity of the eye (measured in candelas per square meter).
Radiance - A physical (radiometric) measure of light or heat as emitted or reflected by something. Physical quantity related to light intensity: the power of the light, spreading out in some solid angle, over some area (measured in watts per steradian per square meter).
Hyperopia - Farsightedness. Occurs when the eyeball is too short or the cornea has too little curvature. Distant objects can be seen clearly but close ones do not come into proper focus.
Myopia - Nearsightedness. Occurs when the eyeball is too long or the cornea, the clear front cover of the eye, has too much curvature. Close objects can be seen clearly but distant ones do not come into proper focus. NOT necessarily inherited: likely caused by close work and lack of sufficient bright light, along with genetic predisposition.
Bits - Binary digits; the basic unit of information in computing and digital communications. Used to measure the size of a display's color palette.
Blindsight - The ability of a cortically blind person to respond accurately to a light source or other visual stimulus even though they are unable to see it consciously; such people can still avoid objects as if they could see them, even though they are not aware of it. Results from damage to the visual cortex.
Blindspot - The small circular area at the back of the retina where the optic nerve enters the eyeball, which is devoid of rods and cones and is not sensitive to light.
Lateral Geniculate Nucleus (LGN) - A relay center (way station) in the thalamus for the visual pathway, midway from the eye to the visual cortex. Ganglion cells send signals from the retina along the optic nerve to the LGN. From there, the signals are distributed to a number of areas, but primarily to visual area 1 (V1) of the cortex at the back of the head. About 90% of the optic nerve's output goes through the LGN to the visual cortex; about 10% goes to structures like the superior colliculus.
Ganglion Cell - A neuron that takes information from rods and cones and sends it from the eyeball up the optic nerve toward the cortex.
Visual Cortex - Main visual processing area of the brain. Receives about 90% of the visual signal, relayed through the LGN. Damage can cause "blindsight." The eye and the visual cortex of the brain form a massively parallel processor that provides the highest bandwidth channel into human cognitive centers.
Receptive Field - The visual area over which a cell responds to light (the field of cones and rods that feeds into a ganglion cell). The pattern of light falling on the receptive field influences the way the neuron responds. Retinal ganglion cells have circular receptive fields, which can be either on-center or off-center.
Fovea - Small area in the center of the retina that is densely packed ONLY with cones. Vision is sharpest here.
Lateral Inhibition - The capacity of an excited neuron to reduce the activity of its neighbors, which limits the spread of activity from excited neurons to neighboring neurons in the lateral direction. First stage of the edge detection process: signals the positions and contrasts of edges in the environment.
Diopters - A unit of measurement of the optical (refractive) power of a lens, equal to the reciprocal of the focal length (in meters) of the lens: [Diopters = 1 / focal length] and [Diopters = 1/object distance + 1/image distance].
Focal length - Distance between the center of a lens and its focus; the distance at which objects at infinity are focused.
Troland - A unit of conventional retinal illuminance, meant as a method for correcting photometric measurements of luminance values impinging on the human eye by scaling them by the effective pupil size. 1 troland = 1 candela/meter^2 seen through a 1 mm^2 pupil.
V Lambda Curve - Luminosity function that describes the average spectral sensitivity of human visual perception of brightness. Peaks at about 555 nm. (See Q1)
Purkinje Shift - The tendency for the peak luminance sensitivity of the human eye to shift toward the blue end of the color spectrum at low illumination levels. In a dark environment, shorter-wavelength light (blue and green) appears brighter than longer-wavelength light (red), but in bright conditions red looks brighter than blue.
Oblique Effect - The relative deficiency in perceptual performance for oblique visual stimuli as compared to the performance for horizontal or vertical visual stimuli. (People are good at detecting whether a picture is hung vertically, but are two- to four-fold worse at judging oblique angles.)
Mach Bands - An optical illusion that exaggerates the contrast between slightly differing shades of gray where they meet, by triggering edge detection in the human visual system. At the point where a uniform area meets a luminance ramp, a bright band is seen. Mach bands appear where there is an abrupt change in the first derivative of a brightness profile.
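The diopter and troland definitions in this glossary are simple arithmetic, so a minimal sketch may help when checking practice problems. The function names are hypothetical; the formulas are the ones given in the definitions (distances in meters, luminance in candelas/meter^2, pupil area in mm^2).

```python
def diopters_from_focal_length(focal_length_m: float) -> float:
    """Optical power = 1 / focal length (focal length in meters)."""
    return 1.0 / focal_length_m


def diopters_from_distances(object_distance_m: float, image_distance_m: float) -> float:
    """Optical power = 1/object distance + 1/image distance (both in meters)."""
    return 1.0 / object_distance_m + 1.0 / image_distance_m


def trolands(luminance_cd_m2: float, pupil_area_mm2: float) -> float:
    """Retinal illuminance = luminance scaled by pupil area."""
    return luminance_cd_m2 * pupil_area_mm2


# A lens with a 0.5 m focal length has a power of 2 diopters.
power = diopters_from_focal_length(0.5)
# 10 cd/m^2 viewed through a 2 mm^2 pupil gives 20 trolands.
retinal_illuminance = trolands(10.0, 2.0)
```

Note that a shorter focal length means a larger number of diopters, i.e. a stronger lens.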
15. Concepts to understand: Luminance vs. radiance, hyperopia, myopia, bits, blindsight, blindspot, lateral geniculate nucleus (LGN), ganglion cell, visual cortex, receptive field, fovea, lateral inhibition, diopters, focal length, troland, V lambda curve, Purkinje shift, the oblique effect, mach bands
The plot shows that we are more sensitive to cooler colors at night (hence blue looks brighter at night) and to warmer colors during the day. The Purkinje shift is explained by the scotopic curve (rods, periphery) peaking at about 505 nm, with limited sensitivity in the red part of the spectrum but enhanced sensitivity in the blue part, and the photopic curve (cones, fovea) peaking at about 555 nm, with limited sensitivity in the blue part of the spectrum but greater sensitivity in the red part. This shift in peak sensitivity to different wavelengths is known as the Purkinje shift. The shift from photopic to scotopic vision, however, can be very slow. Red light or red goggles are often used in scotopic conditions because rods are nearly insensitive to red light: the cones can still be used to see, while the rods are not saturated and stay dark adapted, so the eye adjusts quickly to low light levels. Red goggles keep both systems active.
16. Plot the relationship between scotopic and photopic spectral sensitivity and use this plot to explain the Purkinje shift and why red light or red goggles are often used under scotopic conditions.
Photogrammetry - Photogrammetry is the science of making measurements from photographs and dates back to the mid 19th century. The output of photogrammetry is typically a map, drawing, measurement, or a 3D model of some real-world object or scene.
How it creates realistic 3D graphics: Start by taking many photos of an object from every conceivable angle. The bigger the structure, the more photos are needed. Use software with a special algorithm, like Agisoft PhotoScan, to convert the 2D images into a 3D model by stitching all the photos together. The results are then compressed, since the software creates huge files. The very high resolution 3D models can then be used for topographic mapping, architecture, engineering, movies (like The Matrix), and video games.
164. What is photogrammetry? How does it create realistic 3D graphics?
Synesthesia - Synesthesia is a neurological phenomenon where stimulation in one sensory or cognitive pathway becomes linked involuntarily with stimulation in another separate, potentially unrelated neural pathway.
Examples:
Grapheme → Color Synesthesia (most common) - an individual's perception of numbers and letters is associated with the experience of colors.
Music → Color Synesthesia ("Chromesthesia") - an individual's perception of sounds evokes an involuntary experience of color.
Number form synesthesia - a mental map of numbers automatically and involuntarily appears whenever someone who experiences number forms thinks of numbers. Numbers are mapped into distinct spatial locations and the mapping may be different across individuals.
Personification - ordered sequences such as numbers, days, months, and letters are associated with personalities.
Lexical → gustatory synesthesia - reading, hearing, or articulating words or word sounds leads to the experience of a taste sensation on the tongue.
Evidence that synesthesia is a real perceptual phenomenon: The most common evidence comes from grapheme-color synesthesia. Those who perceive certain letters and numbers as having specific, attached colors are able to list the same colors for the numbers/letters decades later. In addition, a 5 hidden among S's (SS5SS) is hard for normal people to identify, but it stands out for a synesthete who sees the two characters in different colors.
165. What is synesthesia? Describe one line of evidence that synesthesia is a "real" perceptual phenomenon.
The video is remarkable because despite being blind, Ben Underwood is able to map a detailed mental plan of his surroundings using sound bounce backs of echolocation/sonar techniques. He makes clicking sounds, and he is able to use the echoes he hears to precisely locate where objects around him are. According to the video, he is the only person in the world who sees using echolocation. He taught himself echolocation at the age of five. Skills: He can run, play basketball, bicycle, rollerblade, play football, skateboard and video games through the use of his other senses and the detection of objects by making frequent clicking sounds with his tongue.
166. What is so remarkable about the boy that "clicks" video? What skills did he have?
1. The "vOICe" Vision Technology (Visual Substitution)
Offers the experience of live camera views through image-to-sound renderings (allows the totally blind to see with sound). Images are converted into sound by scanning them from left to right while associating elevation with pitch and brightness with loudness. The technology is a pair of augmented reality glasses with a camera and computer system that converts images into soundscapes. Those who use visual substitution systems are using their visual cortex in a highly proficient manner. The issue with visual substitution systems is that they render the sense being used useless for its original purpose. For instance, those who are born blind don't think about using their eyes, since they rely on their auditory system and are already good at navigating environments without their visual system.
2. Artificial Eyes (Cortical implants)
This technology is useful for patients with no eyes. The artificial eyes are implanted straight on the cortex; this is not approved in the US since it is very dangerous and prone to infections. The eyes have a lens with an ultrasonic distance gauge and a camera, which sends signals to an external computer system, which then sends the processed signal to the brain. Has 200 electrodes per eye. The result is "sight" of dot formations. (See Lecture 20)
3. Artificial retinas
For people with photoreceptor degeneration but still functioning ganglion cells. The implant converts light into an electrical signal that stimulates the ganglion cells directly (not the receptors) to activate the visual system. But ganglion cells respond to light differences, not just light. For example, a uniform field of light does not stimulate your ganglion cells.
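The image-to-sound mapping described for the vOICe (scan left to right, elevation to pitch, brightness to loudness) can be sketched in a few lines. This is only an illustration of the mapping, not the actual vOICe software: the function name, the 500-5000 Hz range, and the exponential pitch spacing are all assumptions.

```python
def image_to_soundscape(image, f_low=500.0, f_high=5000.0):
    """Map a grayscale image to per-column lists of (frequency_hz, loudness).

    `image` is a list of rows (row 0 = top), with brightness values in 0..1.
    The frequency range and exponential spacing are illustrative assumptions.
    """
    n_rows = len(image)
    n_cols = len(image[0])
    soundscape = []
    for col in range(n_cols):  # scan the image from left to right
        tones = []
        for row in range(n_rows):
            # elevation: 1.0 at the top row, 0.0 at the bottom row
            height = (n_rows - 1 - row) / (n_rows - 1) if n_rows > 1 else 0.0
            frequency = f_low * (f_high / f_low) ** height  # higher pixel, higher pitch
            loudness = image[row][col]  # brighter pixel, louder tone
            if loudness > 0:
                tones.append((frequency, loudness))
        soundscape.append(tones)
    return soundscape


# A bright pixel at top-left and a dimmer pixel at bottom-right.
demo = image_to_soundscape([[1.0, 0.0],
                            [0.0, 0.5]])
```

In `demo`, the first time step contains one loud high-pitched tone and the second contains one quieter low-pitched tone, which is how position and brightness in the image become audible.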
167. Provide three examples of technology that allow the blind to see.
In terms of color vision, the new, evolved What system needed to be backwards-compatible with the already existing achromatic Where system. Likewise, color television needed to be backwards-compatible with black-and-white sets, and primates still needed to be able to see the black-and-white image. Thus, rather than starting from scratch with three independent cone signals, the new What system added two color-difference signals to the already existing luminance signal. Television can tolerate much less information in the two color-difference signals than in the luminance signal because the Color part of our What system has a lower resolution than the Form part of our What/Where system, so a low-resolution color signal doesn't look as bad as a low-resolution luminance signal would. We tolerate especially low resolution in the blue-difference signal because our blue-yellow resolution is lower than our red-green resolution. This is because 1% of our cones are blue and 99% are red and green. We don't even notice that the color part of the video signal in our TV is much lower resolution than the luminance image.
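The idea of one luminance signal plus two color-difference signals can be sketched as follows. The luminance weights used here (0.299, 0.587, 0.114) are the standard-definition television weights from ITU-R BT.601, an assumption brought in for illustration rather than something stated in the lecture or in Livingstone; the function names are hypothetical.

```python
def rgb_to_luma_and_differences(r, g, b):
    """Convert RGB (0..1) to luminance plus blue- and red-difference signals."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luminance (BT.601 weights, assumed)
    return y, b - y, r - y


def subsample(signal, factor):
    """Keep every `factor`-th sample to mimic a lower-resolution signal."""
    return signal[::factor]


# A gray pixel carries luminance only: both color-difference signals are zero,
# which is what keeps the signal compatible with black-and-white receivers.
gray = rgb_to_luma_and_differences(0.5, 0.5, 0.5)

# Along a scan line, luminance can stay at full resolution while the
# color-difference signals are sent at a fraction of it.
blue_difference_line = [0.10, 0.12, 0.30, 0.28, -0.05, -0.04]
low_res_blue_difference = subsample(blue_difference_line, 2)
```

Because the color-difference signals are stored at lower resolution than luminance, the total amount of information is well below three full-resolution channels.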
168. Relate the evolution of color vision with the 'evolution' of color television. (Livingstone).
Color Television: In color television, the screen is coated with red, green and blue phosphors arranged in dots or stripes.
Pointillism: Pointillism is a technique of painting in which small, distinct dots of color are applied in patterns to form an image.
Similarities: Both use a limited color selection to get many different colors. We blend the TV color image spatially because the individual pixels are too small to be resolved by the photoreceptor spacing in our retinas. Similarly, pointillist paintings consist of tiny dots of colored pigment that we blend spatially because they are too small to be resolved.
Differences: TV images blend colors not only spatially but also chromatically and temporally. We blend the TV image chromatically in that we don't see only various shades of red, blue, and green, but all the colors in between. We blend the TV image temporally because of interlacing: the rows of pixels on a television screen are illuminated alternately and not sequentially, which minimizes flickering. Television also uses pixels instead of pigments; the pixels are spots of phosphor, a luminescent material.
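Interlacing, where rows are illuminated alternately rather than sequentially, can be illustrated with a small sketch. The function names are hypothetical, and a frame is represented simply as a list of scan lines.

```python
def split_into_fields(rows):
    """Split a frame into two fields: even-numbered rows and odd-numbered rows."""
    return rows[0::2], rows[1::2]


def weave(first_field, second_field):
    """Recombine two fields into a full frame, alternating their rows."""
    frame = []
    for i in range(len(first_field) + len(second_field)):
        source = first_field if i % 2 == 0 else second_field
        frame.append(source[i // 2])
    return frame


frame = ["row0", "row1", "row2", "row3", "row4"]
first_field, second_field = split_into_fields(frame)
rebuilt = weave(first_field, second_field)
```

Each field is drawn in its own pass, so the screen is refreshed twice as often as a full frame is completed; the viewer blends the two fields temporally.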
169. Describe the similarities and differences between color television and pointillism (Livingstone)
The Hermann grid is the black spot illusion: in a grid of white bands on a dark background, you see dark spots at the intersections of the white bands, but not at points away from the intersections. The explanation is lateral inhibition. At an intersection, light comes from all four sides, but in a band going away from the intersection, light comes from only two sides. The region viewing the intersection is therefore more inhibited than the region viewing the band, so the intersection appears darker than the rest of the band.
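The lateral-inhibition account of the Hermann grid can be sketched with a toy center-surround cell. This is a minimal illustration under stated assumptions: the surround is reduced to four neighboring samples (up, down, left, right), and the inhibition weight of 0.5 is an arbitrary hypothetical value.

```python
def center_surround_response(center, surround, inhibition=0.5):
    """Excitation from the center minus inhibition from the mean surround."""
    mean_surround = sum(surround) / len(surround)
    return center - inhibition * mean_surround


# At an intersection, all four neighbors lie on white bands.
intersection = center_surround_response(1.0, [1.0, 1.0, 1.0, 1.0])
# Within a band, only two neighbors are white; the other two are dark squares.
band = center_surround_response(1.0, [1.0, 1.0, 0.0, 0.0])
```

With these numbers the intersection response is 0.5 and the band response is 0.75, so the equally bright intersection is signaled as darker than the band.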
17. What is the Hermann grid and what is the current explanation?
Correlational properties (spatial frequency versus amplitude) in "good" art are very similar to those of the real world, because that's what we like.
Rules of natural images and art:
Neighboring points are highly correlated.
Edge hypothesis: If there is a sudden change in intensity, then that change is likely to continue along the discontinuity.
Continuity hypothesis: Edges are part of extended surfaces.
Examples of artwork: pp. 171-181 in the textbook, which talk about pointillism.
170. Provide two examples of art that demonstrate the relationship between spatial resolution and color perception (Livingstone).
Google Glass
Google Glass is an example of Augmented Reality.
Basic Components: Processor; Display (prism with a curved surface that projects the image at the correct optical distance); Speaker; Camera and Microphone; Battery; Sensors (accelerometer, etc.); Input devices (buttons, swipe gesture sensor on the frame).
Perceptual Experience: It allows you to view overlaid information in your normal field of view. You can function normally while receiving notifications or other data in a way that covers a portion of your regular view. Like a projector, it bounces an image into your eye so that it appears at the right optical distance. In Google Glass, this information is presented in the upper right corner of your field of view.
Oculus Rift
Oculus Rift is an example of Virtual Reality and is a VR headset.
Basic Components:
Head-mounted display (HMD) - helmet / headset device presenting visual info.
Motion tracker - method of tracking the position of the head.
Interactive input - input changes depending on the position of the head, corresponding to 3D movement. Means of interacting with the environment.
Lens - magnifies so that what you see appears to be at infinity.
Perceptual Experience: Using VR headsets, one is able to become immersed in a digitally created world. The field of view is large enough to create a perceptually believable environment that allows one to feel entirely transported.
171. Describe the basic components of Google Glass and the Oculus Rift. Briefly describe the perceptual experience of each.
Training - Military training and operations was one of the first practical applications of VR (flight simulators, development and testing of combat strategy). It allows avoidance of exposure to hazards such as firing of real weapons, or performing dangerous maneuvers. It also allows pilots and vehicle operators to operate a vehicle without windows or remotely. Medicine - VR has successfully helped burn victims as well as amputees with efforts in rehabilitation, reducing phantom or real pains as well as helping to cope with new lifestyles. Also, VR devices provide mechanisms for handicapped people to communicate and navigate. Physically handicapped individuals can participate in virtual activities such as sports or dancing. Science and Education - Interactive 3D virtual environments can help doctors, scientists and others to visually explore scientific models and improve understanding. Mathematicians can visualize and interact with three-dimensional equations. Students can learn about scientific concepts by experiencing them in VR. Chemists can study molecular structure and interactions without actually mixing chemicals.
172. Outside of entertainment, describe three current applications of virtual reality.
Force feedback systems - Force feedback combines output of forces from the system with input of positions and forces to the system. In other words, the user feels the force of objects in response to the forces he/she applies (ex. the Dextrous Handmaster).
Limits:
Update rates (too low can impact the "stiffness" of surfaces).
Force model (how realistic the haptic rendering is).
Safety (force feedback achieved via robotics with a human requires a fail-safe).
It cannot actually give the feeling of the thing, just the force or movement, and even this is limited at the scale of feedback on the fingers.
173. What are the limits in getting force feedback in Virtual reality?
Missing aspects: Other sensations besides the vision (i.e. texture, smell, etc). Lack of tangibility and true receptor sensation (i.e. pain, irritation, abrasion, textures, etc.) Hard to fool hands since it needs to induce feeling of mechanoreceptor cues (i.e. shape, texture, temperature, tactile acuity) and proprioceptive feedback (ie. shape, force, firmness). Update rates (too low can impact the "stiffness" of surfaces) Force model (how realistic the haptic rendering is) Safety (force-feedback achieved w/ robotics with a human in the loop requires a fail-safe) Aspects likely to change over the next 10 years: Resolution (Retina display) Augmented reality / More interactivity Data that's available More colors beyond RGB triangle
174. If you were to take a vacation using current virtual reality software, what aspects would be missing? What aspects are likely to change in the next 10 years?
To what sorts of changes can we adapt?
We can adapt to distortions (e.g. in virtual reality, telepresence, and artificial limbs).
Virtual reality - accomplished by changing the distance between the eyes and magnifying the movements.
Telepresence - allows a person to feel as if they were present, to give the appearance of being present, or to have an effect, via telerobotics, at a place other than their true location.
Artificial limbs - attached to nerves.
Others: orientation, spatial frequency, light, color, motion.
What adapts?
One can adapt one hand without the other. (Proprioception: the perceived position of the hand in space is what adapts.) The body part that we are using in the perceptual field will adapt.
How does the Held and Hein (1963) study with kittens relate to prism adaptation?
The Held and Hein study ("Kitten Carousel") aimed to investigate the role of experience in perceptual-motor development. Kittens were kept in the dark for 8 weeks from birth except for an hour per day, which they spent in the "Kitten Carousel". The Active kitten was allowed to move freely there while the Passive kitten was not. Though the Active and Passive kittens received an equal amount of visual experience, the Passive kitten's visual experience was not related to its own movements, while the Active kitten's visual experience was tied to its movements. In the end, both were released into the light: the Active kitten was indistinguishable from normal kittens, but the Passive kitten showed no evidence of perceiving depth.
Prism adaptation is when the motor system adapts to new visuospatial coordinates imposed by prisms that displace the visual field. The Kitten Carousel study showed the importance of self-produced movement in the development of visually guided reaching, as in human prism adaptation (in other words, the importance of active limb movements for prism adaptation).
175. What are the limits of perceptual adaptation? a. To what sorts of changes can we adapt? b. What adapts? c. How does the Held and Hein (1963) study with kittens relate to prism adaptation?
Prism adaptation is when the motor system adapts to new visuospatial coordinates imposed by prisms that displace the visual field.
Explanation: Initially, the person makes pointing errors to one side of a target (for example, to the right), but the errors disappear within a few pointing trials, depending upon exposure conditions. The person has smoothly adapted to the prismatic displacement. Once the prisms are withdrawn, the degree and strength of the adaptation can be measured by the spatial deviation of the motor actions in the direction opposite to the visual displacement imposed by the prisms, a phenomenon known as the aftereffect.
Adaptation is smooth because what you are perceiving in space must be a continuous point map. Initially, adaptation is linear (but this is debated) and has sudden transitions.
Adaptation is local to the body part you are actively using but not local to the world. It is not a visual phenomenon: your vision doesn't adapt, the body part adapts, and only the body parts used adapt.
Adaptation is active because you need to interact to adapt. When you hide one hand behind your back and use your other hand, you adapt only with the hand that is used.
176. Prism adaptation is smooth, and local and active. Explain.
The TVSS (Tactile Vision Substitution System) is a device that provides blind subjects with a substitute for vision by making use of their tactile perception. Guarniero, a blind man, was the first to be trained to use the TVSS. An array of vibrating pins placed on his back transmitted the image picked up by a camera to his skin, with the light intensity changing the vibration of the pins. As he learned, Guarniero eventually stopped feeling as though he had vibrating pins on his back and started to see things happening in front of him. The Visual Cortex is NOT involved in the interpretation of sensory inputs received from the device. Sensations first appeared on his back, but then they appeared in a 2D space. Training allowed him to detect things in an ordered, 2D space but without precise locations. At first, he was able to detect whether an object was moving or still. He had to learn how to differentiate his own influences on the camera (position, tilt, movement, etc.) from information in the real world. With the help of zoom, he was eventually able to detect objects and their features as well as determine distances. Then he detected object rotation/orientation, and finally learned how to use plane mirrors so as to see the front and back of something simultaneously. In particular, he was fascinated by the movement of a lit candle flame, and even more so when finally 'seeing' himself.
177. What is the TVSS? Briefly describe Guarniero's experience with it (see readings).
Subliminal perception is the perception of or reaction to a stimulus that occurs without awareness or consciousness.
The Kunst-Wilson Study: The Kunst-Wilson study says that animal and human subjects readily develop strong preferences for objects that have become familiar through repeated exposures. It presents experimental evidence that these preferences can develop even when the exposures are so degraded (subliminal) that recognition is precluded. They found that subjects have a strong preference for familiar stimuli, even when the subjects could not actually distinguish exposed stimuli from novel stimuli. The results suggest that there may exist a capacity for making affective discriminations without extensive participation of the cognitive system (aka subliminal perception). They also indicate that various forms of affect-linked reactions are possible with only minimal access to the content.
178. What does the Kunst-Wilson study say about the possibility of subliminal perception? What did they find?
Anchoring is a cognitive bias in decision making caused by a relevant or irrelevant source of information (the anchor). It describes the common human tendency to rely too heavily on the first piece of information offered (the anchor) when making decisions. Yes, anchoring is an example of subliminal perception, because it operates below your perceptual threshold.
Example: Subjects were shown a photo of a restaurant that had a sign reading either "Studio 17" or "Studio 97". Participants were asked how much they would pay for dinner at these 2 establishments. People said they were willing to pay $24.58 at Studio 17 and $32.84 at Studio 97. Effects were even stronger for those who could not recall the name of the restaurant.
179. What is "anchoring"? Is this an example of subliminal perception?
The prototype effect in face recognition refers to a tendency to recognize the face corresponding to the central value of a series of seen faces, even when this central value or prototype has not been seen. We recognize many objects in terms of their variation from the prototype.
Posner and Keele: Posner and Keele showed that humans readily identify a prototype dot figure (a triangle) underlying a series of distorted patterns of dots (with the level of distortion given in bits/dot). In the experiment, subjects are shown distorted variations of a dot pattern. Afterwards, when shown the prototype and a pattern that was actually seen, they experience more familiarity with the prototype. This showed that memory is better for the prototype than for the actual patterns. Thus, the more similar a novel pattern was to a category prototype, the easier it was to classify.
180. What is the relationship between prototypes and face recognition? What did Posner and Keele show?
Generating a caricature: A caricature is a picture, description, or imitation of a person or thing in which certain distinct characteristics are exaggerated and the person or thing is oversimplified. A caricature is generated by placing heavy emphasis on a few defining characteristics of the target in order to create a comic or grotesque effect.
Relevance to prototype: Prototypes are the average or typical version of something, while a caricature emphasizes the traits which are most different from the typical. We recognize distinct faces with greater variation away from the prototype more easily because we can most quickly recognize things when we can perceive their most defining characteristics. Caricatures emphasize defining characteristics, thus we recognize people more easily in them.
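One way to make the link between caricatures and prototypes concrete is to treat a face as a list of measurements and push each measurement further away from the prototype (average) face. This is a sketch of one common computational formulation, not necessarily the method presented in lecture; the function name, the feature lists, and the exaggeration factor are hypothetical.

```python
def caricature(face, prototype, exaggeration=1.5):
    """Exaggerate each feature's deviation from the prototype.

    exaggeration = 1.0 returns the original face; values above 1.0 push the
    face further from the prototype; 0.0 collapses it onto the prototype.
    """
    return [p + exaggeration * (f - p) for f, p in zip(face, prototype)]


prototype_face = [1.0, 1.0]  # e.g. average nose length, average chin width
target_face = [1.0, 2.0]     # typical nose, unusually wide chin
exaggerated = caricature(target_face, prototype_face, exaggeration=2.0)
```

Only the feature that differs from the prototype changes (the result is [1.0, 3.0]), which mirrors the point that a caricature emphasizes the traits most different from the typical.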
181. How does one generate a caricature? How is this relevant to prototypes?
Conjunctions of two stimulus parameters (more stimulus dimensions) can provide more information to an observer than one parameter, but not twice the information. If the information is mixed, segregation becomes difficult and it takes longer to search for the information. With one dimension, we can remember about 7 +/- 2 items. If we add another parameter, it becomes a conjunction, and if the other stimulus parameter also has 7 +/- 2 components, we probably cannot handle all the information in the conjunction. Successive features add less and less information. It requires focal attention to glue the parameters together to see if they match.
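For comparison, the information an ideal observer could get from independent stimulus dimensions is easy to compute: the bits simply add. The sketch below assumes the usual information-theoretic convention that distinguishing n equally likely levels carries log2(n) bits; the function name is hypothetical. Per the answer above, human observers fall short of this ideal.

```python
import math


def ideal_bits(*levels_per_dimension):
    """Information (in bits) if every level of every dimension were distinguishable."""
    return sum(math.log2(n) for n in levels_per_dimension)


one_dimension = ideal_bits(7)      # about 2.8 bits for 7 levels
two_dimensions = ideal_bits(7, 7)  # about 5.6 bits for 7 x 7 = 49 combinations
```

The ideal doubles from one dimension to two, so the finding that each added parameter contributes less and less information reflects a limit of the observer, not of the stimulus.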
182. Can conjunctions of two stimulus parameters provide more information to an observer than just one? Does it provide twice the information? Explain.
Identifying an object by a single feature is faster than identifying it by a conjunction of features. Search is fast when a single feature differs (everything is the same except one item). When the target is defined by a conjunction, search time increases and the segregation between the feature groups becomes less visible. Both search tasks and segregation tasks are therefore harder for conjunctions of features, and features can also be miscombined (illusory conjunctions).
183. How does the identification of an object based on a single feature differ from the identification based on a conjunction of features?
Evidence that subliminal perception exists: semantic priming, the Kunst-Wilson study, and the Corteen & Wood study. Kunst-Wilson Study (mere exposure effect): The Kunst-Wilson study showed that exposure to stimuli can increase liking for those stimuli without recognition. It provides evidence for subliminal perception because we tend to like things that we have already seen without being aware of it. Subjects were shown polygons and asked which ones they had seen before and which ones they liked better; the ones they liked better were more likely to be the ones they had seen before. Corteen & Wood Study (electric shocks/city names): The Corteen & Wood study showed that conditioned electrodermal responses to city names, presented via earphones, could be elicited without the subjects being aware that the city names had been presented. In the first phase, subjects were given electric shocks when presented with city names. In the second phase, the conditioned words were presented to the left ear while other phrases were presented to the right ear. Subjects were instructed to ignore the left-ear input and attend only to the right-ear input. Previously shocked city names elicited more electrodermal responses, so subjects were processing the words without knowing it (they heard the city names and registered them as city names). Likely Conditions: The stimulus must be detected but not actively recognized. Unlikely Conditions: Very precise timing would be required for flashed stimuli. The subject must not be actively looking for the subliminal message, as unattended stimuli are often easily recognized once attended. Perception is subliminal when the perceptual effect is greater than recognition.
184. What is the evidence that subliminal perception exists? What are the conditions under which it might be possible? Under what conditions is it unlikely?
Semantic Priming: Semantic priming refers to the observation that a response to a target (e.g. dog) is faster when it is preceded by a semantically related prime (e.g. cat) than by an unrelated prime (e.g. car). The prime and the target are from the same semantic category and share features. When a person thinks of one item in a category, similar items are activated in the brain (e.g. the word dog is a semantic prime for cat, because both are similar animals). Corteen and Wood Experiment: Corteen and Wood conducted an experiment that suggested that unattended words are processed to a semantic level. They associated electric shocks with words to produce a conditioned galvanic skin response. After the conditioned response was established, the participants performed a task in which the shock-associated words were presented to the unattended ear. Although participants were unaware of the words being said, they showed a clear galvanic skin response not only to the shock-associated words but also to semantic associates of those words. This experiment suggested that unattended messages, although not consciously perceived, are still processed to a semantic level. Shadowing: During shadowing, participants wear a headset that presents a different message to each ear. The participant is asked to repeat aloud (shadow) the message that is heard in a specified ear (called a channel). Cocktail Party Effect: The cocktail party effect is the ability to focus one's auditory attention on a particular stimulus while filtering out a range of other stimuli, much the same way that a partygoer can focus on a single conversation in a noisy room. Stroop Effect: The Stroop effect is the slowing of reaction time when the brain has to deal with conflicting information. The slowing happens because of interference, a processing delay caused by competing or incompatible functions in the brain. A common example is being asked to name the ink colors of words that are themselves color names (e.g. the word "blue" may be colored red, so you should say "red"). Illusory Conjunctions: Illusory conjunctions are perceptual errors in which participants combine features of two objects into one object. For example, a green circle is the illusory conjunction of a red circle and a green rectangle next to it.
185. Be able to identify and give a brief explanation of: Semantic priming, Corteen and Wood experiment, shadowing, the cocktail party effect, Stroop effect, illusory conjunctions.
If you designed a car speedometer that used luminance to indicate velocity, the driver would stop noticing changes in luminance beyond a certain speed, because the perception of luminance is nonlinear. We are poor at discriminating luminance levels when the display is already bright, so the car would seem to speed up quickly at first and then to pick up speed at a slower rate once it is going.
186. What would happen if you designed a speedometer on a car that used luminance to tell velocity?
Subliminal perception is the perception of or reaction to a stimulus that occurs without awareness or consciousness. It occurs whenever stimuli presented below the threshold for awareness are found to influence thoughts, feelings, or actions.
187. How should one define subliminal perception?
Affordance is the quality of an object that suggests how it might be used. For example, a mouse button invites pushing (and, in so doing, clicking) by the way it is physically constrained in its plastic shell. At a very simple level, to afford means "to give a clue" (Norman, 1988). When the affordances of a physical object are perceptually obvious, it is easy to know how to interact with it. Relation of Affordances to Design of Effective Displays: Shadowing and overlap make it seem as if one window or object is behind another. The mouse pointer moves in the same direction as the finger.
188. How does the concept of affordances relate to the design of effective displays? Give three examples.
Stages: 1) Recognizing the face as a face: decisions about faces are faster than face naming. 2) Recognizing the face as familiar: familiarity decisions are faster than identity decisions (Young et al., 1986). 3) Retrieving stored biographical knowledge: it is hardly ever possible to name a face without knowing something else about the person (Flude et al., 1989). 4) Retrieving the name of the person: decisions about familiarity are faster than decisions about person knowledge, which are in turn faster than producing the name (Young et al., 1986). Evidence: After a stroke, if you lose one stage, you lose the following stages as well.
189. Recognition of a face appears to have a number of stages. Explain. Provide one line of evidence for these stages?
Myope - Nearsighted: close objects can be seen clearly but distant ones do not come into proper focus, because the visual system (cornea plus lens) has too much power. Presbyope - The lens of the eye has lost elasticity, resulting in an inability to focus sharply for near vision. When you are young, the lens in your eye is soft and flexible (it changes shape easily, allowing you to focus on objects both close and far away). After the age of 40, the lens becomes more rigid and cannot change shape as easily as it once did, making it more difficult to read at close range. Age & Presbyopia - The lens hardens with age, becoming almost completely rigid at about 48; the ability to focus on close objects is lost because only a few diopters of accommodation are left. VR Helmet Design - The design of a virtual reality helmet can be a problem because older viewers will hold a constant focus (their lenses have hardened), while younger viewers will try to look far away and go in and out of focus (because they can).
19. In terms of the power of the lens and cornea, describe the difference between a myope and a presbyope. What is the relationship between age and presbyopia? What does this say about the design of a virtual reality helmet?
Negative priming is an implicit memory effect in which prior exposure to a stimulus unfavorably influences the response to the same stimulus. It is the effect caused by experiencing a stimulus and then ignoring it. After a person has ignored a stimulus, the processing of that ignored stimulus shortly afterwards is impaired. Negative priming is related to inattentional blindness and will slow down processing speed. According to the episodic retrieval model, this shows that our brain will flag ignored stimuli and deal with them later, causing a conflict that takes time to resolve resulting in negative priming.
190. What is negative priming?
2D Representation: The image itself is 2D. Depth cues let it be perceived as having depth and give more clues from which to infer 3D. We see 2D, the depth cues give us 2.5D, and that gives us the information to infer and assume 3D. 2.5D Representation: To truly see in 3D, you would have to be in a 4th dimension. You can see all of a 2D image because you are outside of that dimension, so our vision is 2.5D. When the viewing angle of a 2.5D representation changes, the perception of 3D is gone. 3D Representation: Our brain cannot take in all 3D vantage points at once; 3D is an inference. A 3D model can supply every vantage point, but any perception of that information will be at 2.5D. Escher's Paintings: Each local region gives consistent 2.5D cues, but there is no global 3D model that makes sense of all the 2.5D cues.
191. Describe the difference between 2D, 2.5D and 3D representations. How can Escher's paintings be explained in terms of this difference?
1. Our visual system assumes angles are 90 degree angles when possible (e.g. seeing a 2D drawing of a cube as 3D). 2. If lines are connected in 2D, we assume they are connected and overlapping in 3D. 3. If lines are parallel in 2D, we assume they are parallel in 3D. 4. The early visual system assumes neighboring points are similar (correlated) and only wants to know where they differ; it assumes a difference is due to an edge.
192. Provide 4 examples that demonstrate that the visual system is making assumptions about what is likely in the environment. Include examples on the relation between 2D and 3D structure.
1. Visibility and Feedback. Visibility: The more visible functions are, the more likely users will be able to know what to do next. In contrast, when functions are "out of sight," they are more difficult to find and to know how to use. Feedback: Feedback is about sending back information about what action has been done and what has been accomplished, allowing the person to continue with the activity. Various kinds of feedback are available for interaction design: audio, tactile, verbal, and combinations of these. 2. Natural Mapping: Natural mapping refers to the relationship between controls and their effects in the world. Nearly all artifacts need some kind of mapping between controls and effects, whether it is a flashlight, car, power plant, or cockpit. An example of a good mapping between control and effect is the up and down arrows on a computer keyboard, which move the cursor up and down, respectively. 3. Constraints: The design concept of constraining refers to determining ways of restricting the kind of user interaction that can take place at a given moment. There are various ways this can be achieved. 4. Design for Error: Design should allow for human error. Designers should understand the causes of error and try to minimize them. They should also make it possible to undo actions, and make errors easy to discover and easy to fix.
193. Describe four "rules of design" (according to Donald Norman).
Frank Rosenblatt was an American psychologist well known in the field of artificial intelligence. He developed the Perceptron, the first artificial neuron-based computer. It was the first computer that could learn new skills by trial and error, using a type of neural network that simulates human thought processes. Perceptron vs Deep Network: The perceptron had only a single layer of nodes, while a deep network has more than one layer, with some having hundreds or thousands.
194. Who was Frank Rosenblatt? How does the perceptron differ from a deep network?
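To make the "single layer" and "trial and error" points concrete, here is a minimal sketch of a perceptron with one layer of weights, trained with the classic error-correction rule. The AND task, the learning rate, and the epoch count are illustrative assumptions, not details from the lecture or from Rosenblatt's hardware.

```python
# Minimal single-layer perceptron (illustrative sketch, not Rosenblatt's machine).
# Assumptions: binary inputs, a step activation, and a toy AND task.

def step(x):
    """Threshold activation: fire (1) when the weighted sum is non-negative."""
    return 1 if x >= 0 else 0

def predict(weights, bias, inputs):
    """One layer only: a weighted sum of the inputs followed by a threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_perceptron(samples, epochs=20, lr=1.0):
    """Trial-and-error learning: nudge the weights only when the output is wrong."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

# Toy data set: logical AND of two inputs.
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_DATA)
predictions = [predict(weights, bias, x) for x, _ in AND_DATA]
print(predictions)  # [0, 0, 0, 1]
```

A deep network stacks several such layers, feeding the outputs of one layer into the next, which lets it learn patterns (XOR is the standard example) that no single layer of this kind can separate.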
The Window of Visibility: The window of visibility refers to the idea that, out of all the information in the world, there is a particular window of information that we can visibly perceive. For instance, the human eye can sense electromagnetic radiation only within a band of wavelengths extending from about 380 to 780 nm. Outside this band, we are blind. This band is thus a relatively narrow window through which we can see. Other examples include flicker at ~50 hertz and acuity at 100 pixels per degree. The implication for the design of displays and visual tasks is clear: displays should be designed around these limits so that the information they present falls inside our window of visibility. Discrimination Limits: Discrimination limits refer to our inability to distinguish one stimulus from another despite being able to detect each of them. For example, a flashing display may use 5 Hz to mean one thing and 6 Hz to mean another. A person could discern that there is flashing, but could not tell the difference between the two. Nonlinear Relationships: Nonlinear relationships refer to the idea that equal physical changes do not produce equal intervals of detectable change to the human eye. For instance, luminance changes are detectable early on, but even out and are non-discernible after a certain threshold. So, if one were to use luminance to show the amount of fuel in a car, the reading would be ambiguous until the fuel was very low, since luminance is nonlinear. Capacity Limits: Capacity limits refer to the idea that, when transmitted information is compared with input information, we have a limited capacity for information before we stop perceiving it accurately; when we are presented with overwhelming information, our working memory cannot comprehend it. This applies to displays because mobile and web interfaces can easily be oversaturated with information and should be designed to display only what can be taken in within our capacity limits.
Conjunction Limits: Conjunction limits refer to the idea that we are not good at identifying things that are defined by multiple cues in conjunction. For example, suppose every plane in the air above the US flashes when it is low on fuel, and civilian planes are green while government planes are red. We are not good at picking out the civilian planes that are low on fuel, since the target is defined by two cues in conjunction: flashing and green.
195. Provide examples of how each of the following limits relates to the design of a display. a. The window of visibility b. Discrimination limits c. Nonlinear relationships d. Capacity limits e. Conjunction limits
Limitations in detection (visibility): Humans can only perceive the visible spectrum; the visible and ultraviolet views of a flower are substantially different. A billboard that has a phone number well below your range of visibility cannot be read. Discriminability limitations: The information provided is detectable, but you are unable to tell the alternatives apart. For example, a light flashes at 5 Hz if it is safe to stay in the building and 6 Hz if it is not; you can tell it is flashing, but you are unable to detect the difference between safe and unsafe. Limitations in identification (capacity): There is more information than you can handle. Multimedia overwhelms our capacity: working (short-term) memory has a small capacity, and the efficient conversion of information from working memory to long-term memory requires full capacity. Hyperlinks and other multimedia elements grab working memory, and a number of studies demonstrate significantly lower comprehension of essays when hyperlinks are added. Multiple Object Tracking (MOT): 1. As many as 4 to 5 objects can be tracked simultaneously. 2. Targets can be tracked even when they disappear behind an occluder and, under certain conditions, even when all objects disappear from view, as in an eye blink. 3. Parts of objects (e.g., end points) are difficult to track relative to whole objects. 4. Tracking can improve with practice; video gamers are, on average, better.
196. Provide two examples of each of the following when viewing a display device: Limitations in detection (visibility). Discriminability limitations. Limitations in identification (capacity)
Experiment: Six different lengths of lines were shown labeled from 1 to 6. After a blank slide, one of the six lengths was shown. The majority of observers could remember the label of that particular line. However, when there were 10 different lengths of lines, and one of them was shown, people had a hard time getting the correct label.
197. Describe an experiment showing that sensory identification is limited to 7 plus or minus 2 levels.
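The 7 plus or minus 2 limit can be restated in bits, the same unit used elsewhere in these notes. The sketch below is only a back-of-the-envelope illustration: it assumes all stimulus levels are equally likely, so identifying one of N levels takes log2(N) bits. The function name is made up for this example.

```python
import math

def bits_to_identify(levels):
    """Bits needed to identify one of `levels` equally likely alternatives."""
    return math.log2(levels)

six_lines = bits_to_identify(6)    # about 2.58 bits: within the 7 +/- 2 range
ten_lines = bits_to_identify(10)   # about 3.32 bits: above the range
upper_limit = bits_to_identify(9)  # about 3.17 bits: top of the 7 +/- 2 range

print(round(six_lines, 2), round(ten_lines, 2), round(upper_limit, 2))
```

Under this assumption, six line lengths sit inside the range that observers can label reliably, while ten line lengths ask for more bits than even the top of the range provides, which matches the pattern of errors described in the experiment.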
Designing such a display is not an easy task since there may be perceptual problems involved like illusory conjunction, where a brief presentation / glance may result in seeing the wrong racecar. Also, it will take longer for viewers to find a green race car that is also low on fuel, since they have to look for two different cues rather than just one.
198. You have been asked to design a display to monitor racecars on a race track. The display should provide an aerial view of the track and allow the observer to quickly identify each of the 8 racecars, provide a cue as to the amount of fuel left and warn the observer when the fuel in any of the cars is critically low. Discuss the perceptual problems involved in designing such a display and a possible solution.
Biometrics Examples: Face recognition; fingerprint recognition; retina recognition. Issues: Someone can learn about you simply by using face recognition technology, and these biometric devices are installed in many public spaces. Security systems can also use a person's movement for recognition.
199. Provide 3 examples of biometrics and 2 issues regarding privacy
The 100 watt lamp that produces most of its energy in the range of 500-540 nm will be perceived as brighter. Why: 500 to 540 nanometers is GREEN, near the peak of the eye's sensitivity within the visible spectrum; 650 to 700 nanometers is RED, at the far end of the visible spectrum, where sensitivity is low. Watts measure power, not brightness; photometric units such as lux (illuminance) and nits (luminance) are weighted by the eye's sensitivity. If the lamps were matched in nits instead of watts, they would be perceived as having the same brightness.
2. What will be perceived as brighter? A 100 watt lamp that produces most of its energy in the range of 500 to 540 nanometers or a 100 watt lamp that produce most of its energy between 650 and 700? (Livingstone Chapter 2)
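A rough calculation shows how large the difference is. The sketch below assumes, hypothetically, that both lamps emit the same radiant power (10 W here, since a lamp's 100 W electrical rating is not its light output) spread evenly over three sample wavelengths in each band. The sensitivity values are rounded approximations of the CIE photopic luminous efficiency function, and 683 lm/W is the standard conversion factor at the 555 nm peak.

```python
# Approximate, rounded values of the CIE photopic luminous efficiency V(lambda).
# Treat these as illustrative, not as reference data.
V_LAMBDA = {
    500: 0.32, 520: 0.71, 540: 0.95,   # green band
    650: 0.11, 680: 0.017, 700: 0.004, # red band
}

LUMENS_PER_WATT_AT_PEAK = 683.0  # lm/W at 555 nm, by definition of the lumen

def luminous_flux(radiant_watts, wavelengths_nm):
    """Lumens for radiant power spread evenly over the sampled wavelengths."""
    mean_sensitivity = sum(V_LAMBDA[w] for w in wavelengths_nm) / len(wavelengths_nm)
    return LUMENS_PER_WATT_AT_PEAK * mean_sensitivity * radiant_watts

RADIANT_WATTS = 10.0  # hypothetical: same radiant output assumed for both lamps

green_lumens = luminous_flux(RADIANT_WATTS, [500, 520, 540])
red_lumens = luminous_flux(RADIANT_WATTS, [650, 680, 700])

print(round(green_lumens), round(red_lumens))
```

With these assumptions the green lamp produces more than ten times the luminous flux of the red lamp for the same radiant power, which is why equal watts do not mean equal brightness.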
Grapheme -> color synesthesia: Letters and numbers are linked to specific colors. There are often common trends (b is blue, etc.), and this is the MOST COMMON type of synesthesia. Evidence that it is real is that individuals with it always remain consistent in the colors they report for each grapheme, regardless of environment or priming. Sound -> color synesthesia: Specific pitches and tones are linked to colors. Famous composers are thought to have had this and are used as examples of its existence. Number(Letter?) -> form synesthesia: Numbers (and in some cases maybe letters) have distinct spatial relationships to each other for people with this kind of synesthesia. Each number has a place in a mental "map", which can help these individuals solve problems or make them uneasy when numbers are in a position that contradicts their mental mapping. People with this can draw the same map with the same numbers at any time, as it remains consistent.
200. Describe 3 forms of synesthesia. Provide one line of evidence that this is a real perceptual phenomenon. (W)
RSVP (rapid serial visual presentation) is an experimental technique. If you ask a group of people "Is there a dog in one of the following pictures?" and then show them a set of images in quick succession, all in the same place, at a rate of 10 per second, they will be able to detect the presence or absence of a dog somewhere in the sequence most of the time. This method has shown that the maximum rate at which people can detect common objects in images is about 10 images per second. People can recognize objects in images that are presented very rapidly.
201. Describe the RSVP method. What does it tell us about the speed of processing? (W)
Experimental work by Biederman and Cooper (1992) suggests that the optimal size is 4 - 6 degrees of visual angle. (p229)
202. According to Wade, what is the optimal size for object recognition?
Priming - If you identify something, even in a fleeting, meaningless encounter, you will identify it faster if you see it again in the near future. Bar and Biederman (1996): They exposed pictorial images to subjects so briefly that it was impossible for them to identify the objects. They followed the brief image exposure with a visual mask, a random pattern shown immediately after the target stimulus to remove the target from the visual iconic store, and they rigorously tested to show that subjects performed at chance levels when reporting what they had seen. Fifteen minutes later, this unperceived exposure substantially increased the chance of recognition. Although the information was not consciously perceived, exposure to the particular combination of image features apparently primed the visual system to make subsequent recognition easier. Result: The priming effect decreased substantially if the imagery was displaced sideways, so the mechanism of priming is highly image dependent and not based on high-level semantic information. Lawson et al. (1994): They devised a series of experiments in which subjects were required to identify a specific object in a series of briefly presented pictures. Recognition was much easier if subjects had been primed by visually similar images that were not representations of semantically related objects. They argued that this should not be the case if objects are recognized on the basis of a high-level, 3D structural model; only image-based storage can account for their result. Result: This supports the image-based theory of object recognition, because the effects are based on 2D image information.
203. Describe two experiments with priming that provide insights into object recognition. (W)
Life-Logging: The idea that it is becoming possible to have a personal memory data bank containing video and audio data collected during every waking moment through the course of a person's lifetime. Problem: Seeing a video replay is not at all the same as remembering.
204. What is "life logging"? Why is search a problem? (W)
Geon Theory - A hierarchical set of processing stages leading to object recognition. Visual information is decomposed into edges, then into component axes, oriented blobs, and vertices. In the next layer, 3D primitives such as cones, cylinders, and boxes, called geons, are identified. Then a structure is extracted that specifies how the geon components interconnect. Finally, object recognition is achieved.
205. Describe the basics of Biederman's "geon theory" (W)
Facial Action Coding System (FACS) - A widely applied method of measuring and defining groups of facial muscles and their effect on facial expression. Both static and dynamic facial expressions are produced by the contractions of facial muscles. Eyebrows, mouth, and the shape of the eyes are very important in portraying realistic emotion. False and true smiles can be distinguished from each other by a particular expression around the eyes. FACS is applied to avatars in order to allow them to convey realistic human emotion. Appropriate facial expressions may help make virtual salespersons more convincing.
206. What is FACS theory and how does this relate to Avatars. (W)
Canonical view - The view from which an object is most easily identified. (Not all views of an object are equally easy to recognize.) Theory: We recognize objects by matching the visual information with internally stored viewpoint-specific exemplars (prototypes).
207. Describe the role of canonical views in object recognition. (W)
Static version - Better retention of the information and better ability to generalize from the materials, indicating a deeper understanding. Advantages of animation: 1. Animation brings graphics closer to words in expressive capacity. 2. Booher (1975) concluded that an animated description is the best way to convey perceptual motor tasks.
208. Describe two advantages of using animated images over static images (W)
Deep learning refers to artificial neural networks that are composed of many layers. It can be either supervised or unsupervised: a supervised learning algorithm is fed labeled data, whereas an unsupervised learning algorithm is fed unlabeled data.
209. What is deep learning? Is this a supervised or an unsupervised learning algorithm?
Lens flare is a stray patch of brightness in a photographic image resulting from aberrant refractions or reflections within the lens due to an exceptionally bright light source, sometimes one just outside the image proper. Lens flare is usually unwanted, but can be exploited for artistic effect. Lens flare can also be caused by imperfections on the lens. The light entering the eye is coming from a diffuse source, scattering the rays and making the source seem intensely bright. This works because our brain perceives this impression of false increased contrast from the scattered light, making the rest of the image seem darker compared to the perceived brighter light source.
21. What is the relation between lens flare and perceived brightness. Why does this work?
Scene gist - important in data visualization because what we see depends enormously on the context. The gist of familiar visual displays will be processed just as fast as the gist of natural scenes and it will have a similar effect on our response biases.
210. What is scene gist? How long does it take to recognize the gist? (W)
1. Oddly high confidence that the change would be noticed 2. Changes in the focus of interest more likely to be noticed 3. Suggests simple labels for the environment; change in race, gender, significant age is noticed 4. Older individuals are more susceptible
211. What is meant by O'Regan's comment that the world has its own memory? How is that relevant to change blindness? (W)
The phenomenon in which, most of the time, we simply do not register what is going on in our environment unless we are looking for it. Attention is central to all perception. Although we are blind to many changes in our environment, some visual events are more likely than others to cause us to shift attention. (Mack and Rock 1998 experiment)
212. What is 'inattentional blindness'? (W)
An epistemic action is an activity intended to uncover new information. A good visualization is not just a static picture or a 3D virtual environment that we can walk through and inspect like a museum full of statues. A good visualization is something that allows us to drill down and find more data about anything that seems important: "overview first, zoom and filter, then details on demand." A good computer-based visualization is an interface that can support all these activities.
213. What are epistemic actions and how do they relate to perception? (W)
A visual thinking algorithm clearly describes a method or process for solving a problem. Epistemic Actions - Actions designed to seek information in some way. These include eye movements to focus on a different part of a display and mouse movements to select data objects or navigate through a data space. Externalizing - Instances where someone saves some knowledge that has been gained by putting it out into the world, e.g. adding marks to paper or entering something into a computer.
214. Describe an example of a 'visual thinking algorithm'. (W)
Object displays will be most effective when the components of the objects have a natural or metaphorical relationship to the data being represented. (p240) Make use of existing perceptual mechanisms, such as artificial spatial cues. One common technique that is used to enhance 3D scatter plots is dropping a line from each data point to the ground plane. Without these lines, only a 2D judgment of spatial layout is possible. With the lines, it is possible to estimate 3D position. (p279)
215. Describe four things that you learned from the Ware text regarding techniques that produce more effective displays.
The Day for Night technique is the name for cinematographic techniques used to simulate a night scene while filming in daylight. Techniques: 1. Using a blue shift/filter and blurring the image a little. 2. Using tungsten-balanced rather than daylight-balanced film stock, or special blue filters. 3. Underexposing the shot (usually in post-production) to create the illusion of darkness or moonlight.
22. What is the Day for Night technique. How does it work?
Lateral Inhibition is the second stage of level variance. Lateral inhibition is the capacity of an excited neuron to reduce the activity of its neighbors. It stops action potentials from spreading from excited neurons to neighboring neurons in the lateral direction. It helps the visual system to factor out the effects of the amount and color of the illumination. (p. 84) Lightness Constancy is the perception that the apparent brightness of light and dark surfaces remains more or less the same under different luminance conditions. It can be represented using the luminance ratio between two surface patches under the same illumination. Lateral inhibition provides the mechanism for measuring luminance ratios (log(center) - log(surround) = log(center/surround)).
23. What role does lateral inhibition play in lightness constancy?
If you are myopic, your eye has too much power. A negative (concave) lens helps by compensating for the excess positive diopters.
24. Explain why a negative lens helps a myope to see better.
Presbyopia is the condition in which the lens of the eye hardens with age, losing its ability to focus, making it difficult to see objects up close. Presbyopia usually affects everyone after the age of 40 and occurs naturally as people age, so there is no known way to avoid it. It brings the near points and far points closer together to a narrower range.
25. What is presbyopia? Can you avoid it? What does this do to the near and far points?
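If the monitors are mounted 10 cm = 0.1 m in front of the lens, a +10 D lens is needed (1/0.1 = 10 D), so that the display sits at the focal point of the lens and appears to be at infinity. If the observer normally requires -2 D of correction, a +8 D lens is needed (10 - 2 = 8 D).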
27. What type of lens does a virtual reality display require if the display monitors are mounted 10 cm in front of the lens and are intended to be viewed at infinity? What type of lens is required if the observer normally requires -2 diopters of correction?
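The arithmetic can be checked with a short sketch. It assumes the thin-lens approximation, signed powers (positive for a converging lens), and that the headset lens sits close enough to the eye for the viewer's prescription to simply add to it. The function names are made up for this example.

```python
def collimating_power(display_distance_m):
    """Lens power (diopters) that places a display at this distance at optical infinity."""
    return 1.0 / display_distance_m

def headset_lens_power(display_distance_m, prescription_d=0.0):
    """Collimating power combined with the viewer's own correction (thin lenses in contact)."""
    return collimating_power(display_distance_m) + prescription_d

no_correction = headset_lens_power(0.10)          # display 10 cm from the lens
myope_minus_2 = headset_lens_power(0.10, -2.0)    # viewer who normally wears -2 D

print(no_correction, myope_minus_2)
```

A viewer with no correction gets a converging lens of about 10 D, and the -2 D myope gets about 8 D, which places the virtual image at that viewer's far point of 0.5 m instead of at infinity.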
Astigmatism is a defect in the eye or in a lens caused by a deviation from spherical curvature, which results in distorted images, as light rays are prevented from meeting at a common focus. In other words, astigmatism is an imperfection in the curvature of your cornea or in the shape of the eye's lens. Normally, the cornea and lens are smooth and curved equally in all directions, helping to focus light rays sharply onto the retina at the back of your eye. However, if your cornea or lens isn't smooth and evenly curved, light rays aren't refracted properly. This is called a refractive error. The optics are stronger along some axes than along others. It can be fixed with contact lenses if the astigmatism is in the cornea. With the "bicycle wheel" test, those affected will see some lines clearly while other lines appear blurry.
28. What is astigmatism?
1. The surface of skin is extremely complex; it can be uneven, discolored, and have cracks, wrinkles, and other imperfections. Accounting for all of these differences to make realistic skin is meticulous, time consuming, and difficult. Skin does not only reflect light at its surface; light also passes through the outer layers and is reflected back from underlying structures, causing diffuse reflection from several points in a single given area and at various depths. To account for the various luminance sources and skin variations, graphics programs that create accurate skin must be extremely complex. For example, they must consider both diffuse light and specular light. 2. Humans are very accurate at judging how skin reflects light; our brains are good at understanding the complex physics.
29. Why is skin so hard to model in computer graphics (two reasons).
1. Measure the length of your arm and thumb. 2. Visual angle (theta) = 2 * arctan((thumb / 2) / arm) 3. For everything else, hold up the thumb at arm's length to the object being measured and count the number of "thumbs" wide the object is. Multiply by the thumb's visual angle.
3. Calculate the visual angle of your thumb at arm's length, your big toe when standing up, and the moon. Show your calculations. (Lecture Notes 2)
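The formula from the answer to question 3 can be checked numerically. The sketch below uses assumed body measurements (a 2 cm thumb at a 60 cm arm's length, a 3 cm big toe seen from 160 cm) purely for illustration; substitute your own. The moon's diameter (about 3,474 km) and mean distance (about 384,400 km) are standard figures.

```python
import math


def visual_angle_deg(size, distance):
    """Visual angle in degrees; size and distance must share the same unit."""
    return math.degrees(2 * math.atan((size / 2) / distance))


# Assumed measurements, for illustration only.
thumb = visual_angle_deg(2.0, 60.0)         # cm: thumb width, arm length
big_toe = visual_angle_deg(3.0, 160.0)      # cm: toe width, eye height
moon = visual_angle_deg(3474.0, 384400.0)   # km: diameter, mean distance

for name, angle in [("thumb", thumb), ("big toe", big_toe), ("moon", moon)]:
    print(f"{name}: {angle:.2f} degrees")
```

With these numbers the thumb comes out near 2 degrees, the toe near 1 degree, and the moon near half a degree, so the moon is roughly a quarter of a thumb wide.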
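Ratio Computation: Responses early in the visual system are roughly logarithmic in intensity, so the center-surround subtraction log(center) - log(surround) equals log(center / surround), i.e. the log of a ratio. Good: Ratios make lightness constancy more accurate, because when the illumination changes, the ratio between center and surround stays constant while the difference does not; this gives contrast constancy regardless of the amount of light.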
30. How does the early visual system compute ratios? Why is this a good thing?
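The point of question 30, that a log difference stays constant when the illumination changes while a plain difference does not, can be shown with a short numeric sketch. The luminance values are arbitrary illustrative numbers, not measurements.

```python
import math


def log_ratio(center, surround):
    """Center-surround signal computed as a difference of logs."""
    return math.log10(center) - math.log10(surround)


# The same surface pair under dim light and under 100x brighter light.
dim_center, dim_surround = 20.0, 10.0
bright_center, bright_surround = 2000.0, 1000.0

dim_signal = log_ratio(dim_center, dim_surround)
bright_signal = log_ratio(bright_center, bright_surround)

print(dim_center - dim_surround, bright_center - bright_surround)  # differences change
print(round(dim_signal, 4), round(bright_signal, 4))               # log ratios match
```

The raw differences are 10 and 1000, but both log ratios equal log10(2), which is why the ratio code supports lightness constancy.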
Shadows, together with the qualities of the illumination, help determine the material and depth of objects. Two examples: 1. Penumbra: the penumbra is the fuzzy outline expected at the edge of a shadow. It tells the viewer that the dark region is a shadow rather than an object or discoloration in its own right. In the graphically created shadow shown in class, a dark area with a distinct penumbra is interpreted as a shadow, while the same area with a clean edge looks more like an oil spill. 2. Motion: shadows help us interpret how an object is moving, which is especially relevant in computer graphics and animation. Depending on the shape or direction of movement of an object's shadow, the viewer perceives the object's movement differently.
31. Provide two examples that demonstrate that shadows play a role in the interpretation of objects.
Plastic vs Paper Balls - Plastic ball has higher specular reflectance than the paper or soft rubber ball. Capturing Skin - Skin is hard to capture in computer graphics because of the uncanny valley where a computer-generated figure looks close to a human but not human enough, so it arouses a sense of unease or revulsion in the person viewing it; humans are very good at determining material surfaces Specular Reflection - The mirror-like reflection of light from a surface in which light from a single incoming direction (a ray) is reflected into a SINGLE outgoing direction; the bright white spot on a shiny object Diffuse Reflection - The reflection of light from a surface such that an incident ray is reflected at MANY angles rather than just one angle like specular reflection; the light spread out over a dull object
32. What makes a plastic ball appear different from a paper or soft rubber ball? Why is skin so hard to capture in computer graphics? What is a specular reflection? What is a diffuse reflection?
1 cycle = 1 light bar and 1 dark bar. First count the number of cycles, then measure how many degrees across the grating is. For example, if your thumb is 2 degrees across and the grating is 14 thumbs wide, the grating spans 28 degrees, and spatial frequency = number of cycles / 28 degrees. Our acuity limit is about 100 pixels/degree, so 28 degrees would need 2800 lines (each line corresponds to one pixel).
33. Provide a rough estimate of the spatial frequency of the grating shown on the blackboard (in cycles/deg) and explain how you arrived at this estimate. Approximately, how many lines would there need to be to approach your acuity limit? (also tell us where you are sitting).
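The estimate for question 33 reduces to two divisions and a multiplication. In this sketch the cycle count of 56 is a hypothetical value chosen for illustration (the real count depends on the grating shown in class); the 2-degree thumb, the 14 thumbs, and the 100 pixels/degree limit come from the answer.

```python
def spatial_frequency_cpd(cycles, thumbs_across, thumb_deg=2.0):
    """Cycles per degree, measuring the grating width in thumb widths."""
    width_deg = thumbs_across * thumb_deg
    return cycles / width_deg


def lines_at_acuity_limit(width_deg, pixels_per_degree=100):
    """Lines needed across the grating to reach the acuity limit."""
    return width_deg * pixels_per_degree


frequency = spatial_frequency_cpd(cycles=56, thumbs_across=14)  # 56 is hypothetical
lines = lines_at_acuity_limit(width_deg=28)

print(frequency, "cycles/degree")
print(lines, "lines")
```

Under these assumptions the grating is 2 cycles/degree, and 2800 lines would be needed across the same 28 degrees.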
Light enters the eye and is picked up by the rods and cones; foveal and peripheral signals are processed differently, which determines the acuity, color, saturation, etc. of what we see. Pathway map: light hits the retina and stimulates the photoreceptors (rods and cones). This information is integrated by the ganglion cells and flows through the optic nerve to the optic chiasm, where information from the left visual field goes to the right lateral geniculate nucleus (LGN) and information from the right visual field goes to the left LGN. About 90% of the information takes this route to the visual cortex, while about 10% goes to the superior colliculus, a region closely connected to visual processing and eye movements that helps orient the eyes and head.
34. Provide a simple map of the pathways that take visual information from the eye to the brain. Include a discussion of the left and right visual fields and the superior colliculus.
Hypermetropia = Hyperopia = Farsightedness (not enough power in the visual system): distant objects are seen more clearly than near ones. Caused when the eye is shorter than normal, the cornea is not curved enough, or the lens sits farther back in the eye than it should; can also be caused by diabetes, a rare eye tumor, glaucoma, or genetics. Remedied with a positive lens. Myopia = Nearsightedness (too much power in the visual system): near objects are seen more clearly than distant ones. Caused when the eye is longer than normal or the cornea is too curved; can also be caused by close work, lack of sufficient bright light, or genetics. Remedied with a negative lens. Astigmatism = a defect in the eye or in a lens caused by a deviation from spherical curvature. It results in distorted images because light rays are prevented from meeting at a common focus. Remedied with a special contact lens if the astigmatism is in the cornea.
35. Distinguish between hypermetropia, myopia, and astigmatism including the symptoms, causes, and possible remedies of each.
Cornea acts as the eye's outermost lens. It controls and focuses the entry of light into the eye and refracts it onto the lens (contributes between 65-75% of the eye's total focusing power). The lens further refocuses that light onto the retina, a layer of light sensing cells lining the back of the eye, that starts the translation of light into vision. Cornea is fixed focus while lens can adjust to different depths of vision.
36. Describe the roles that the cornea and lens play in focusing an image.
Younger people have flexible lenses that adjust to focus on images, while older people's lenses no longer have the ability to adjust (this is called presbyopia). VR headsets currently have fixed focal lengths, which becomes an issue for young people because their eyes will constantly try to adjust, while older people's eyes won't. If younger people try to reach for close things, for example, their eyes will try to adjust and things go out of focus.
37. Why is accommodation a problem for virtual reality? Why is this not a significant problem for those older than 60?
An eye tracker can reduce bandwidth by rendering full acuity only where the observer is looking. This works well for simulators because there is typically a single participant and the display needs to be efficient, so high-resolution detail only has to be delivered to the location the user is currently fixating. Eye tracking is not useful for televisions or cinemas, where more than one observer is present and the view is fixed. Saccadic suppression is the phenomenon in which the brain selectively blocks visual processing during eye movements, so that neither the motion of the eye (and the resulting motion blur of the image) nor the gap in visual perception is noticeable to the viewer. Because we are less sensitive to visual input during an eye movement, the simulator designer gets a short delay in which to move the high-acuity region to the new fixation point before the viewer notices.
38. How can an eye-tracker be used to make a display more efficient? Why is this concept useful to designers of simulators but not very useful to designers of televisions or cinemas? In what way does saccadic suppression help the designer of the simulator.
The gamma of a display is the exponent that relates the input signal to the displayed intensity: output = input^γ (γ = gamma). Our eyes perceive brightness non-linearly, and gamma converts between linear, absolute brightness and perceived brightness, so values are stored more efficiently. A typical display has a gamma of about 2; the gamma of the visual system is about 0.5.
4. What is the gamma of a display? (Lecture Notes 2)
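The power-law relation in the answer to question 4 can be sketched as follows. This assumes signals normalized to the range 0 to 1 and uses the round gamma of 2 from the answer; real displays and standards differ in detail, and the function names are illustrative.

```python
def display_output(signal, gamma=2.0):
    """Displayed intensity for a normalized input signal (0..1)."""
    return signal ** gamma


def gamma_encode(intensity, gamma=2.0):
    """Inverse mapping: the signal needed to display a given intensity."""
    return intensity ** (1.0 / gamma)


mid_signal = 0.5
shown = display_output(mid_signal)   # a half-scale signal is displayed much darker
recovered = gamma_encode(shown)      # encoding with 1/gamma undoes the display curve

print(shown, recovered)
```

A mid-scale signal of 0.5 is displayed at a quarter of full intensity, and encoding with an exponent of 0.5 (the visual system's approximate gamma in the answer) recovers the original 0.5.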
Electro-oculograph - Non-invasive and mostly used for sleep research; it measures small electrical potentials around the eyes to determine when the eyes move. Disadvantage - You can determine when the eyes move, but not where; crude and highly imprecise. Infrared Sensing - Infrared beams are sent onto the eye, and sensors measure the ratio of light reflected from the sclera and the iris of each eye, tracking changes in that ratio to detect eye movement. Used to be the most popular method. Disadvantage - Not very accurate; requires a helmet, which can impair vision. Contact Lenses - A contact lens with a rod attached to its center is placed in one eye while the other is left alone to allow unimpaired sight. The direction of the rod is physically recorded to determine where the eye is pointing. Disadvantage - Painful and imprecise. Purkinje Eye Tracker - Most popular for research because of its high accuracy and precision. It uses a bite bar to hold the head still and measures reflections of light from different surfaces of the eye: 1st Purkinje image - front of cornea, 2nd - back of cornea, 3rd - front of lens, 4th - back of lens. These help determine where the eye is looking. Disadvantage - Extremely expensive and time consuming; not very feasible for business purposes. Image Processing Techniques - These come in a variety of forms and are the most widely used today. They allow head movements: a video of the eye is compared with a video of the world to determine where the user was looking. Disadvantage - Can be difficult to code and put together for analysis.
41. Describe three different ways of measuring eye-movements and a disadvantage of each.
Fixating: Low-frequency images can disappear after you fixate for a long time because the visual system adapts to an unchanging stimulus. Normally this does not happen, even when you fixate very accurately on one point, because saccades/microsaccades, tremor, and drift keep moving the image across the retina. Make disappear: 1. Blur the edges so small eye movements don't change the image much. 2. Use an eye tracker to stabilize the image on the retina.
42. Why doesn't an image normally disappear if I fixate very accurately at one point? What does one have to do to make it disappear?
Measuring CSF: We usually determine someone's contrast sensitivity function (CSF) by showing them sine gratings of varying CONTRAST and varying SPATIAL FREQUENCY, for example using the method of adjustment to find the lowest visible contrast at each frequency. We then plot contrast threshold (or its inverse, sensitivity) on the y-axis against spatial frequency on the x-axis. Usefulness: The CSF shows the spatial frequencies at which a person is best able to detect contrast, i.e. where contrast does not need to be as extreme for patterns to be seen. Contrast-sensitivity tests can also reveal visual loss that is not identifiable through visual acuity tests, provide another method of monitoring treatments, and give a better understanding of the visual performance problems faced by people with visual impairments.
45. How does one measure the contrast sensitivity function (CSF) of a human observer? Why is such a function useful?
Perceptual Differences: Most modern TVs and monitors have a refresh rate of 60 Hz, meaning they redraw the displayed frame 60 times a second. For games running at 30 FPS, each frame is therefore displayed twice. At 60 FPS, each new frame matches the screen's refresh rate, so flickering is reduced and inputs are translated into on-screen actions faster because the game polls your inputs twice as often. The result is a smoother, more responsive experience for players. Frame Rate: The faster the movement, the more noticeable the gaps between frames, so faster motion needs a higher frame rate. How high a frame rate you need depends on the size of the visual field and the speed of motion; 120 FPS is probably good.
46. Current gaming systems have frame rates of 30 or 60 FPS. How do these differ perceptually? How high of a frame rate do we need?
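The relationship between frame rate and a 60 Hz refresh rate in the answer to question 46 is simple arithmetic, sketched below. The function names are illustrative, and the sketch assumes the frame rate divides the refresh rate evenly.

```python
def refreshes_per_frame(refresh_hz, fps):
    """How many screen refreshes each rendered frame stays on screen."""
    return refresh_hz / fps


def frame_duration_ms(fps):
    """Time each rendered frame represents, in milliseconds."""
    return 1000.0 / fps


for fps in (30, 60, 120):
    print(fps, "FPS:", round(frame_duration_ms(fps), 2), "ms per frame")

print(refreshes_per_frame(60, 30))  # each 30 FPS frame is shown twice at 60 Hz
print(refreshes_per_frame(60, 60))  # each 60 FPS frame is shown once
```

At 30 FPS a moving object jumps once every 33 ms or so, versus roughly 17 ms at 60 FPS, which is why fast motion looks smoother at the higher rate.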
Blur: In a space-time plot, a moving object sampled at the frame rate appears as a series of discrete steps, with gaps in both time and space. Replicating the same image a few times (as a shutter does) fills the gaps in time, so there is no flicker, but gaps in space remain. Directional (motion) blur fills these spatial gaps. Because this makes the image look blurry, the alternative is to sample the image more often. We don't notice the blur because visual acuity is reduced for moving objects.
47. Use a space time plot to explain why directional blur is sometimes introduced into movies or games with fast motion. Why don't we notice the blur?
The TV is 1.5 m wide and the viewing distance is 1 m. With the viewer centered, visual angle θ = 2 * arctan((1.5 / 2) / 1) = 2 * arctan(0.75) ≈ 73.74 degrees. Assuming 100 PPD (lines/degree), the highest visible resolution is 73.74 degrees * 100 PPD ≈ 7,374 lines.
49. For a 1.5 meter television screen at 1 meter, how many lines are required to be at the highest visible resolution (assume 100 PPD).
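A minimal sketch of the question 49 calculation, assuming the viewer sits centered in front of a 1.5 m wide screen so that the full visual angle is 2 * arctan((width / 2) / distance), the same form used for the thumb in question 3. The function name and the default of 100 pixels per degree are illustrative choices.

```python
import math


def lines_for_screen(width_m, distance_m, pixels_per_degree=100):
    """Return (visual angle in degrees, lines needed at the acuity limit)."""
    angle_deg = math.degrees(2 * math.atan((width_m / 2) / distance_m))
    return angle_deg, angle_deg * pixels_per_degree


angle, lines = lines_for_screen(1.5, 1.0)
print(f"{angle:.2f} degrees, about {round(lines)} lines")
```

Under the centered-viewer assumption the screen spans a little under 74 degrees, so roughly 7,400 lines are needed.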
Real World: a typical scene is about 300-600 to 1, but the range can run from 20 to 1 up to 20,000-30,000 to 1. Modern LCD: 1,000-5,000 to 1. Hard Copy: 30 to 1.
5. Roughly, for a single image, what is the range of intensities found in the real world, modern LCD televisions and hard copy?
180-degree field: X pixels / 180 degrees = 100 pixels/degree, so X = 18,000 pixels. Binocular, 120 degrees per eye: X pixels / 120 degrees = 100 pixels/degree, so X = 12,000 pixels for one eye. Each eye needs its own image even though the two fields overlap, so there are 12,000 px for the left eye and 12,000 px for the right eye, or 24,000 pixels across the two displays.
50. For virtual reality glasses with a visual field of 180 degrees, how many pixels across are needed to be at the acuity limit. If the display is binocular with 120 degrees in each eye, how many pixels are required for each eye?
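The pixel counts for question 50 follow directly from the field of view and the acuity limit. The sketch below assumes the 100 pixels/degree limit used elsewhere in these notes; the function name is illustrative.

```python
def pixels_across(field_deg, pixels_per_degree=100):
    """Horizontal pixels needed to reach the acuity limit over a field of view."""
    return field_deg * pixels_per_degree


full_field = pixels_across(180)   # single 180-degree field
per_eye = pixels_across(120)      # one eye of the binocular display
both_eyes = 2 * per_eye           # each eye has its own image

print(full_field, per_eye, both_eyes)
```

This gives 18,000 pixels for the 180-degree field, 12,000 per eye, and 24,000 across both eye displays.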
Anti-aliasing - A method that produces smoother-looking images at low resolution. Lines that should be smooth are often rendered with jagged edges (aliasing); anti-aliasing smooths out these jagged lines. Aliasing arises from the Nyquist limit: a sampled image can only represent spatial frequencies up to half its sampling rate, which is set by its resolution. Frequencies above that limit are reflected back to lower frequencies, which is when aliasing occurs. To eliminate this: 1. Take (or render) the image at a higher resolution than desired, if possible. 2. Blur the image to remove the unwanted high frequencies before they can alias. 3. Subsample down to the needed resolution, producing a smooth image that appears high resolution. Put another way: render the graphic at a higher resolution, then shrink it to the target size with a scaling filter that averages the energy (the sum of the grey levels) falling on each output pixel. This emulates a higher-resolution display by using more than two levels in the graphic (there will be shades of gray along the edges), and the additional levels smooth the edges to a great extent. Anti-aliasing works well when a high number of gray levels is available.
51. Describe the steps for producing anti-aliasing on a computer screen. Why is this done and what does it achieve?
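The three steps in the answer to question 51 (render high, blur, subsample) can be sketched in plain Python. This is a toy illustration, not a production renderer: the scene is a hypothetical slanted edge, and a simple box average over each 4x4 block stands in for the blur and subsample steps combined.

```python
def render_edge(size):
    """Binary test scene: 1.0 where y > 0.5 * x, else 0.0 (hypothetical)."""
    return [[1.0 if y > 0.5 * x else 0.0 for x in range(size)]
            for y in range(size)]


def downsample(image, factor):
    """Average each factor-by-factor block: box blur and subsample in one pass."""
    out_size = len(image) // factor
    result = []
    for by in range(out_size):
        row = []
        for bx in range(out_size):
            block = [image[by * factor + dy][bx * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        result.append(row)
    return result


aliased = render_edge(8)                  # rendered directly at 8x8: only 0 or 1
smooth = downsample(render_edge(32), 4)   # rendered at 32x32, averaged to 8x8

for row in smooth:
    print(" ".join(f"{v:.2f}" for v in row))
```

The directly rendered image contains only black and white pixels, while the downsampled one has intermediate gray levels along the edge, which is what makes the edge look smooth.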
Dithering is an intentionally applied form of noise used to randomize quantization error, preventing large-scale patterns such as color banding in images (attempt by computer program to approximate a color from a mixture of other colors when the required color is not available). It is representing a color with a mixture of other colors in close proximity. It is used when the number of intensity levels is not high, but the resolution is high. It is effective perceptually because human resolution is limited so the mixture of colors is perceived as one.
52. What is dithering? Why is it effective perceptually?
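As an illustration of question 52, the sketch below represents a mid-gray patch using only black and white pixels. It uses ordered dithering with a standard 2x2 Bayer threshold pattern rather than random noise, chosen here only so the output is reproducible; the idea of trading spatial resolution for apparent intensity levels is the same.

```python
BAYER_2X2 = [[0, 2],
             [3, 1]]  # standard 2x2 Bayer threshold indices


def ordered_dither(image):
    """Map gray levels in [0, 1] to pure black (0) or white (1) pixels."""
    out = []
    for y, row in enumerate(image):
        out_row = []
        for x, value in enumerate(row):
            # Each pixel position gets its own threshold between 0 and 1.
            threshold = (BAYER_2X2[y % 2][x % 2] + 0.5) / 4.0
            out_row.append(1 if value > threshold else 0)
        out.append(out_row)
    return out


gray = [[0.5] * 4 for _ in range(4)]   # uniform mid-gray patch
dithered = ordered_dither(gray)
mean_level = sum(map(sum, dithered)) / 16

for row in dithered:
    print(row)
print("mean level:", mean_level)
```

The result is a checkerboard whose average level is 0.5; viewed from far enough away that the individual pixels cannot be resolved, it is perceived as the original gray.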
CSF (contrast sensitivity function) measures the ability to discern between luminances of different levels in a static image. CSF for infants quickly develops in the low frequencies, but as they get older, they get better at identifying the higher spatial frequencies (contrast sensitivity in infants is poor but steadily improves with age). This means infants can see larger forms just as well, but have less ability to detect fine patterns or detail. Before their visual acuity is fully developed, infants basically see a lot of light grey instead of high contrast.
53. What does the CSF say about the differences between the visual system of the infant and the adult?
Yes, natural scenes are redundant (redundancy = predictability). Three forms of statistical redundancy (properties of natural signals): 1. Nearby points in most data sets are highly correlated. 2. Data are sparse: events such as edges are localized in time and space, so natural scenes have sparse structure that is not described by the correlations between pixels alone (edge detection picks out dissimilar or distant points). 3. Data show continuity across space, time, and scale (spatial frequency). Understanding Compression: Because nearby points are highly correlated, we can compress the data and represent it with the minimum number of units. Compression takes advantage of redundancy by sending only the information that matches the image-coding ability of the observer. Visual System Advantage: The visual system takes advantage of this by responding only to differences in the image and looking for edges. 1. Spatial resolution - responds only to local differences (the periphery only to longer-range differences). 2. Temporal resolution - responds only to local changes over time.
54. Are natural scenes redundant? Describe 3 forms of statistical redundancy. Why is this important for understanding compression? Describe how the visual system takes advantage of this?
Scotopic: Contrast sensitivity is much lower than under photopic conditions (contrast thresholds are higher), and we can only see up to about 5 cycles/degree. Mesopic: Up to ~1 cycle/degree the function follows the photopic curve, but sensitivity at higher spatial frequencies drops off more quickly and we cannot see more than about 20 cycles/degree. Photopic: Contrast sensitivity peaks at around 6-8 cycles/degree, and gratings become undetectable at about 100 cycles/degree.
55. Explain the contrast sensitivity function under scotopic, mesopic and photopic conditions.
Flicker above the critical fusion frequency can still affect a human observer because we can still pick up jitter. When there is motion (of the eyes or of objects), the flickering light effectively samples the moving scene, so motion interacting with flicker can cause a strobing effect (you can test for flicker by waving a hand in front of the light). Flickering lights have also been linked to sick building syndrome.
56. How can flicker above the critical fusion frequency still have an effect on a human observer?
Yes, it is possible to compress an image to below 1 bit/pixel. Why: Instead of coding each pixel separately, we can describe the image with a set of descriptors, each covering a patch of pixels (e.g. 10 bits/patch). The rate is (number of patches * bits per patch) / total pixels in the original image. Whenever each patch covers more pixels than the number of bits used to describe it, this rate falls below 1 bit/pixel.
57. Is it possible to compress an image to below 1 bit/pixel? Explain.
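The rate calculation for question 57 can be made concrete with a small sketch. The 10 bits per patch comes from the answer; the patch sizes are hypothetical values chosen for illustration.

```python
def bits_per_pixel(patch_width, patch_height, bits_per_patch):
    """Average coding rate when each patch is described by a fixed bit budget."""
    return bits_per_patch / (patch_width * patch_height)


small_patch = bits_per_pixel(2, 2, 10)   # 10 bits for 4 pixels: above 1 bit/pixel
large_patch = bits_per_pixel(4, 4, 10)   # 10 bits for 16 pixels: below 1 bit/pixel

print(small_patch, large_patch)
```

With 10 bits describing a 4x4 patch the rate is 0.625 bits/pixel, so coding patches rather than individual pixels is what allows the rate to drop below 1.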
Sparse representation does not mean that we have a small number of neurons. It means that there are a small number of active neurons (think of rich language: use a small number of rules to describe a large representation). Similarly, only a small number of neurons need to be used to code the world. Brain has developed a "rich language" using sparse representation.
58. What does it mean when we say that the visual system uses a "sparse" representation?
We see interference colors in an oily puddle because the thickness of the oil film is in the range of the wavelengths of visible light. The oil film is thick in the center and thinner toward the edges. Over different ranges of oil-film thickness, between 370 and 730 nanometers, different colors of light reflected from the top and bottom surfaces of the film are in or out of phase, so some colors are reinforced and others cancel.
59. Why do we see colors in an oily puddle? (L)
Rods are responsible for night vision, and since the fovea (~2 degrees) is tightly packed with cones only, it is useful to look two or three degrees off target so that the image falls on the low-light-sensitive rods. Rods are slightly longer than cones and able to catch more light, so they can pick up very dim lights, but not the colors of those lights. The proportion of rods increases as you move away from the fovea, so looking a little off axis makes a dim target easier to see by activating your rods.
8. Under dim conditions (e.g., finding a star at night), it can be useful to look two or three degrees off the target. Why? (Lecture 3)