PSYC236 Final Exam - HH: Perception and Recognition

Epitome of object recognition

"Quite recently he had been struck by how objects changed their shape when he walked around them. He would look at a lamp post, walk around it, stand studying it from a different aspect, and wonder why it looked different and yet the same" Gregory and Wallace (1963)

Describe the phenomenon of pictures in the brain

- 1/3 to 1/2 of our brain is involved in processing vision - solving the impossible problem of perception - In a sense there are pictures in the brain: retinotopic mapping, where positions on the retina are preserved in how activity is spatially distributed across the cortex, e.g. as seen with fMRI

What is the Direct approach to perception?

- Emphasised the richness of the visual information that is available to us; 'ecological optics', i.e. the retinal image is adequate - look for things that are invariant in the image, then test whether this source of information is actually used by people - But... do we actually use all these sources of information? If so, how does the visual system process them? (James J. Gibson)

What is the constructivist approach to perception?

- Retinal image is inadequate; thus, we have to construct percepts from the data using assumptions, prior knowledge and learning - Hypothesis testing: you make a guess (a theory) about what is out there and fit it to your data (what you perceive of the environment through vision) (Hermann von Helmholtz and Richard Gregory).

What are the types of distance/depth cues?

1) oculomotor cues - to do with the eye
2) pictorial cues - in pictures
3) stereoscopic cues - binocular (we have 2 eyes)
4) motion cues - as in motion

The pictorial cues are...

2.1 occlusion - one thing in front of another
2.2 relative size - can tell the object is further away because it is smaller
2.2a converging parallels - e.g. Ponzo illusion, railroad illusion
2.2b compression gradient - horizontal lines get closer together with distance
2.2c texture gradient - e.g. grass
2.3 familiar size - using people as reference for other objects
2.4 height in field - further away is closer to the horizon
2.5 shadows and shading
2.6 aerial perspective
2.7 edge interpretation
2.8 image blur - further away is blurrier

Psychophysics is...

A branch of psychology that studies the relationship between the objective physical characteristics of a stimulus (e.g., its measured intensity) and the subjective perception of that stimulus (e.g., its apparent brightness).

Binocular stereopsis refers to...

A depth cue based on binocular disparity - the term is most often used to refer to the perception of depth and 3-dimensional structure obtained on the basis of visual information deriving from two eyes by individuals with normally developed binocular vision.

Define an illusion

A false sensory percept. Illusions of the senses, such as visual illusions, result from the misinterpretation of sensory stimuli. For example, parallel railroad tracks appear to meet in the distance (see alley problem; linear perspective). Other examples of visual illusions are apparent movement, contrast illusions, distortion illusions (such as the Hering illusion, Müller-Lyer illusion, Poggendorff illusion, Ponzo illusion, and Zöllner illusion), and the Panum phenomenon.

size-distance scaling

A hypothesized mechanism that helps maintain size constancy by taking an object's perceived distance into account. According to this mechanism, an object's perceived size, S, is determined by multiplying the size of the retinal image, R, times the object's perceived distance, D.
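
As a rough numerical sketch of this mechanism (not part of the original notes): assuming a small-angle approximation in which retinal image size R is physical size divided by physical distance, multiplying R by a correctly perceived distance D gives the same perceived size S at any distance. The function names and the scaling constant k = 1 are illustrative assumptions.

```python
def retinal_size(physical_size, physical_distance):
    """Approximate retinal image size R (small-angle assumption): size / distance."""
    return physical_size / physical_distance


def perceived_size(retinal, perceived_distance, k=1.0):
    """Size-distance scaling: S = k * R * D (k = 1 is an arbitrary choice)."""
    return k * retinal * perceived_distance


# The same 2 m object viewed from 5 m and from 20 m, with distance judged correctly.
near = perceived_size(retinal_size(2.0, 5.0), perceived_distance=5.0)
far = perceived_size(retinal_size(2.0, 20.0), perceived_distance=20.0)

# The object at 20 m wrongly perceived to be at 5 m: constancy breaks down.
misjudged = perceived_size(retinal_size(2.0, 20.0), perceived_distance=5.0)

print(near, far, misjudged)
```

With correct distances the two perceived sizes agree (size constancy); with a misjudged distance the same retinal image yields a smaller perceived size.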

Emmert's law

A law describing the relationship between size constancy and apparent distance—the farther away the object appears to be, the more the scaling device in the brain will compensate for its retinal size by enlarging our perception of the object. - A law stating that the size of an afterimage depends on the distance of the surface against which the afterimage is viewed. The farther away the surface, the larger the afterimage appears.

What is perception?

A process by which individuals organise and interpret their sensory impressions in order to give meaning to their environment - output of higher level processing of sensation - the conscious experience - the process by which information about the environment is extracted

template matching

A theory of pattern recognition stating that an object is recognized as a function of its overlap with various pattern templates stored in the brain - find major axis, normalize, scale, fit to template - there are two big classes of object recognition schemes: ones that are image or appearance based and ones that are 3D model based

Explain the kinetic occlusion cue...

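Accretion and deletion: deletion - the texture of the farther surface is progressively covered (occluded) by the nearer one; accretion - the texture of the farther surface is progressively uncovered. The surface being covered or uncovered is the farther one, so this gives ordinal depth information (knowing A is closer than B)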

The Ponzo Illusion

An illusion of size in which two objects of equal size that are positioned between two converging lines appear to be different in size. Also called the railroad track illusion. - The higher-up yellow line will appear longer than the lower-down yellow line (when in fact it is not).

What is another way of expressing 'the problem' for depth?

Any point in the image corresponds to a single line of sight in 3D space, but the light that stimulates your retina could come from any distance along that line of sight. A line of sight is just like one of the many 'rays' or 'lines' of light entering your eye. Infinitely many points along the same line of sight project to the same point on the retina. None of the pictorial cues resolves this on its own; you need motion or two eyes to do so.
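
The ambiguity can be illustrated with a minimal sketch, assuming a simple pinhole model of image formation (the pinhole model, the focal length of 1 and the function names are assumptions for illustration, not something stated in the notes). Points at different depths along one line of sight all land on the same image position.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a 3D point (X, Y, Z) onto a 2D image position (x, y)."""
    X, Y, Z = point
    return (focal_length * X / Z, focal_length * Y / Z)


def along_line_of_sight(point, scale):
    """Another 3D point on the same line of sight through the pinhole."""
    X, Y, Z = point
    return (scale * X, scale * Y, scale * Z)


base = (1.0, 2.0, 4.0)
# Three points at depths 4, 8 and 32 on the same line of sight.
candidates = [along_line_of_sight(base, s) for s in (1.0, 2.0, 8.0)]
images = [project(p) for p in candidates]

print(candidates)
print(images)
```

The three candidates have different Z values but identical image coordinates, which is why the 2D image alone cannot tell them apart.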

Holway and Boring (1941) tested...

Apparent size as a function of distance and available depth cues - Tested whether size was simply determined by visual angle or whether it scaled with distance resulting in size constancy. • S = kRD where S is the perceived size, R is the retinal image size and D is the perceived distance between the observer and the object - demonstrated in the Ames room - size perspective trick

Stereoscopic cues refer to...

Binocular (2 eyes) - many animals do not have much binocular vision - their eyes are on the sides of their head so they can see behind them - humans have forward-facing eyes with a large overlapping field of view, e.g. Wheatstone (1838) mirror stereoscope - two lines of sight can resolve the ambiguity of depth: if we are fixating a certain point, anything closer than that point will be displaced from the fovea in both eyes, and further objects will be displaced in a different way. Gives ordinal depth information (far). Gives relative depth information (near).

Pictorial cue converging parallels...

Cue within relative size - also called linear perspective - used in illusions such as the Ponzo illusion - occurs in train tracks - depth info is relative (how much further is B than A without knowing exact metres); static - fixed when giving info; optical - based on the image rather than the eye; monocular - can be done with one eye

Pictorial cue relative size involves...

E.g. assume two objects are the same size, so we know the smaller one must be further away - depth info is relative (how much further is B than A without knowing exact metres) - works over far distances - perspective is a combination of relative size and foreshortening

Pictorial cue occlusion involves...

E.g. square in front of a circle = square occluding the circle - depth range is both far and near - depth information is ordinal (knowing A is closer than B)

Explain the motion parallax

Everything is relative to the point you are fixating. A camera can simply point out of a moving car in a fixed direction, but this is very difficult for humans: we cannot keep our eyes fixed in one direction. Instead we follow a stationary point and then jump to the next, e.g. from tree to tree.

Define 'the problem' for depth

How do we see a 3D world from a 2D image? The retinal image is a 2D image, like e.g. a painting or TV - any point on the retina can be specified with an x dimension and a y dimension (x, y) - whereas in the world points have 3 dimensions (X, Y, Z) - so... how do we get 3 unknowns from 2 knowns?

What is 'the problem' for perception?

How do we see a 3D world on the basis of a 2D image? • The geometrical optics of how a three-dimensional scene (the distal stimulus) projects a two-dimensional image on the retina (the proximal stimulus) is well understood (i.e. the stimulus). So too is how that retinal image is transduced into patterns of neural firing by the photoreceptor cells (i.e. "sensation") • However, what the brain appears to do is solve the problem of inverse optics, that is reverse the process of image formation and give us a percept of a three-dimensional scene based on the two-dimensional image. • Inverse optics is a problem because the two-dimensional image is inherently ambiguous, in that any number of three-dimensional scenes can result in the same image.

What is meant by depth?

How far are things from me in whatever direction? - the radial distance from the observer

Absolute egocentric depth refers to...

How many exact metres away is B?

Relative depth refers to...

How much further is B than A? (Without knowing exact metres)

Is the eye a camera?

In a sense, the eye is a camera - it's a good camera but the brain helps it a lot, the dynamic range of our eye makes it a good camera - good at changing from bright to dark - works on an incredible range from brightest day on the beach to being in a movie theatre - The image/environment we see is the right way up but we initially produce an upside down retinal image and then flip it the right way up (as in a camera)

What is the distal stimulus?

In perception, the actual object in the environment that stimulates or acts on a sense organ.

Summary: What is the definition of depth? What is 'the problem'?

Information about depth can be ordinal, relative or absolute. The classical problem for vision is how we see a 3D world from a 2D image - the perception of depth is fundamentally ambiguous given that any image point corresponds to a single line of sight.

What is the Information Processing Approach to perception?

Information is extracted from images and represented by the brain - fits with cognition - you have stages and representations of the environment - Modular stages: primal sketch, 2.5D sketch, 3D model - Levels of explanation: computational level (what is being processed and why?), representation and algorithm (what is represented at each stage?), hardware/wetware implementation (physiology) (David Marr)

What are the rules for image interpretation?

Labelling contours (edges) - intrinsic contours - should be processed as part of the object (provide shape) - Extrinsic contours - don't belong to the object (belong to another object)

Does foreshortening tell you about depth?

Orientation is the first derivative of depth - the rate of change of depth, slant - general point; depth perception is closely related to the perception of 3D shapes. Not dependent on size-distance relationship.

size constancy refers to...

Perception of an object as the same size regardless of the distance from which it is viewed

Pictorial cue height in field...

Position relative to the horizon is a cue to depth order and relative depth: - objects below the horizon appear to be further away when they are higher - objects above the horizon appear to be further away when they are lower - range is both near and far - depth information is relative - seen in the Ames room

oculomotor cues - convergence

Relative inward rotation of the eyes also provides a cue to distance, e.g. feeling the discomfort of your eyes crossing when you try to look at your nose - when fixating on an object you move your eyes inwards or outwards to bring the object's images onto your foveas - this turning in of the eyes provides information about depth - simple geometry shows that the angle of convergence required to fixate an object is inversely related to its physical distance from the observer. Convergence is: ocular - involves the eyes; binocular - requires both eyes; static - fixed when giving info about depth; range - close; has potential for absolute egocentric depth
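
The "simple geometry" can be sketched as follows, assuming the fixated object lies straight ahead on the midline and an interocular separation of 0.063 m (both are assumptions made for illustration; the notes give no values). Under those assumptions the convergence angle is 2·arctan(separation / (2·distance)), and the relation can be inverted to recover distance.

```python
import math

EYE_SEPARATION_M = 0.063  # assumed interocular distance, for illustration only


def convergence_angle_deg(distance_m, eye_separation_m=EYE_SEPARATION_M):
    """Angle between the two lines of sight when fixating a point straight ahead."""
    return math.degrees(2 * math.atan(eye_separation_m / (2 * distance_m)))


def distance_from_convergence(angle_deg, eye_separation_m=EYE_SEPARATION_M):
    """Invert the geometry: recover fixation distance from the convergence angle."""
    return eye_separation_m / (2 * math.tan(math.radians(angle_deg) / 2))


for d in (0.25, 0.5, 1.0, 2.0, 10.0):
    print(f"{d:5.2f} m -> {convergence_angle_deg(d):.2f} deg")
```

The angle shrinks as distance grows and changes very little between far distances, which fits the note that convergence is mainly useful at close range.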

Does perception equate to the image on the retina?

Retinal Image is inadequate for perception i.e. retinal image does not equal perception. As demonstrated by seeing an up-right world but producing an upside-down retinal image

Three dimensional shapes are perceived through what means?

Slant and curvature; 1st and 2nd derivatives of depth respectively. Where slant is dependent on viewpoint and curvature is a property of the object - Depth perception needed to perceive the 3D shape of objects independent of their distance from us. - the Angle of the surface is still centred on you (differing perspectives) but curvature becomes a property of the object - if objects have different curvatures we can recognise them

visual angle

The angle that an object subtends at the observer's eye

just noticeable difference (JND) refers to...

The minimal change in a stimulus that can just barely be detected

lateral inhibition in the retina refers to...

The pattern of interaction among neurons in the visual system in which activity in one neuron inhibits adjacent neurons' responses, i.e. the reduction of activity in one neuron by activity in neighboring neurons - Actually increases the visual system's ability to respond to the edges of a surface (i.e. work out what it is and respond) - Sharpens contrasts to emphasize the borders of objects - The response of a cell in the visual system depends upon the net result of the excitatory and inhibitory messages it receives
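
A toy one-dimensional sketch of this idea, assuming each cell's output is its own input minus a fixed fraction of its two immediate neighbours' inputs (the inhibition weight of 0.2 and the edge handling are arbitrary assumptions, not values from the notes):

```python
def lateral_inhibition(inputs, weight=0.2):
    """Each cell's response = its input minus weight * (left + right neighbour inputs).

    Cells at the ends reuse their own input as the missing neighbour.
    """
    n = len(inputs)
    responses = []
    for i in range(n):
        left = inputs[max(i - 1, 0)]
        right = inputs[min(i + 1, n - 1)]
        responses.append(inputs[i] - weight * (left + right))
    return responses


# A luminance step: five dark cells followed by five bright cells.
step = [1.0] * 5 + [2.0] * 5
response = lateral_inhibition(step)
print([round(r, 2) for r in response])
```

The dark cell next to the border responds less than the other dark cells, and the bright cell next to the border responds more than the other bright cells, so the edge is exaggerated in the output.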

What is the proximal stimulus?

The proximal stimulus is the optical image on the retina. - The physical energy from a stimulus as it directly stimulates a sense organ or receptor, in contrast to the distal stimulus in the actual environment. In reading, for example, the distal stimulus is the printed page of a book, whereas the proximal stimulus is the light energy reflected by the page that stimulates the photoreceptors of the retina.

How does the retina differ from a camera?

The retina differs from a camera in that its receptors are not evenly distributed - Our eye is often likened to a camera: both have an aperture at the front, some type of lens to focus the light, and then something that absorbs the light at the back. This may not be a bad analogy for the eye itself, but it is a bad analogy for vision. Why? Because the camera doesn't have to do very much; it doesn't have to interpret the world and act accordingly, and it doesn't have to produce dogs from spots. The camera never lies, because the camera doesn't have to tell us what it sees, but our visual system does. Our visual system is not there to faithfully record the image outside, it is there to give us the necessary information for us to behave appropriately.

Is there more to perception than what meets the eye?

There is more to perception than meets the eye - what does this mean? The visual system works through the retina and photoreceptors to create vision and a perception of the world around us - or is there more to it? The question is what role (if any) memory and cognitive processes such as reasoning play. Do we simply pick up the information in the light and perceive the physical space around us (bottom-up processing: the data are processed stage by stage until you end up with a percept), or is knowledge about the world used to interpret the data delivered by bottom-up processing (top-down processing)?

Müller-Lyer Illusion

Two equal-length lines tipped with inward or outward pointing V's appear to be of different lengths.

perceiving size encompasses...

Two objects that differ in size can subtend the same visual angle if they are placed at the appropriate distance

How does viewpoint relate to object recognition?

Viewpoint: as you wander around a 3D object/s the viewpoint changes - objects out there in the world are constantly holding their shape but lighting, viewpoint etc. provides variation and we must recognise the object despite these changes - size, depending on viewing distance

viewing geometry

Visual angle as a measure of size - a = 2 arctan(h / (2d)) - where 'a' is the visual angle, 'h' is the object height and 'd' is the object distance
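
A small sketch of this formula in code (the function name and the example sizes are illustrative assumptions). It also shows that two objects of different size subtend the same visual angle when their size-to-distance ratio is the same.

```python
import math


def visual_angle_deg(height, distance):
    """Visual angle a = 2 * arctan(h / (2d)), returned in degrees.

    height and distance must be in the same units.
    """
    return math.degrees(2 * math.atan(height / (2 * distance)))


# A 1 m object at 10 m and a 2 m object at 20 m.
small_near = visual_angle_deg(1.0, 10.0)
large_far = visual_angle_deg(2.0, 20.0)

print(small_near, large_far)
```

Both calls give the same angle, so visual angle alone cannot separate a small near object from a large far one.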

Ordinal depth refers to...

Which is closer - Knowing A is closer than B

False Perspective is...

a technique which employs optical illusion to manipulate visual perception through the use of scaled objects and the correlation between them and the vantage point - E.g. the Leaning Tower of Pisa, stalking tricks, holding the Eiffel Tower in your hand etc.

What is the direct theory of size?

a. Size invariants in the optic array b. Emphasised the possible sources of information that are available - what is available? - Horizon ratio; invariants - how much is below the horizon and how much is above the horizon

What is the constructivist/indirect theoretical approach to size?

a. Size-distance scaling: "accounting for" (taking account of) distance b. Unconsciously infer size from retinal image and apparent distance c. Scale according to perceived distance

Summary; oculomotor cues are...

accommodation and convergence are linked; a blurry stimulus will drive both accommodation and convergence responses. This is a problem for 3D TV and film, as accommodation should always be to the screen but you converge on an object that appears nearer

depth is inherently...

ambiguous

Beuchet chair illusion

big chair back and regular legs placed apart so they line up - makes the person look tiny in a very large chair - size perspective

The proximal stimulus, the size of the image on the retina, is a function of...

both the size and the distance of the distal stimulus, the object in the world.

David Marr's Three Levels of object recognition analysis encompass...

bottom-up processing: 1) primal sketch 2) 2.5D sketch 3) 3D model - sees object recognition as the main goal of perception, to know what is where by seeing • primal sketch - you start off with your raw image and begin to get edge detection • 2.5D sketch - viewer centred - you're surrounded by a whole lot of surfaces, each at a certain distance and orientation - a map of the surfaces in the world from your point of view - egocentric • stage 3: 3D models - object-centred representation - most objects are 3D, so a 3D, object-centred description - advantages: 3D shape is constant (2D image is variable), need only store one model • In Marr's scheme axes are important for 3D object recognition - you also use generalised cones - a constant cross-sectional shape with variable size - so you have your description within those parameters

Pictorial cues to depth are...

called pictorial cues because they can be present in pictures - they are all: static - fixed when giving information; monocular - can be seen with only one eye; optical - based on the image rather than the eye

Pictorial cue to depth shadows involves...

cast shadow --> position of shadow can determine depth of object, e.g. lady giving a speech on the beach on a rug, where the flag's shadow makes her look as if she's floating

Ebbinghaus Illusion

causes us to perceive a circle as larger when surrounded by smaller circles and smaller when surrounded by larger circles - links with size constancy - The orange disc on the right appears larger than the one on the left, but both discs are precisely the same size.

Provide an example of basic level categorisation

chair as opposed to office chair, dining chair, arm chair etc. which is a more detailed level

What is the psychophysical evidence for oculomotor cues?

convergence - used at close distances when image cues are poor (Tresilian, Mon-Williams and Kelly, 1999); accommodation - coarse, ordinal information at best (Mon-Williams & Tresilian, 2000).

Pictorial cue texture gradient...

cue within relative size - E.g. Blades of grass - size and density gradient - our image in the eye will be a function of how far the blades of grass are and the angle at which they are slanted with respect to us - this geometry affects the apparent size or retinal image and can provide cues both to absolute and relative/slant.

Pictorial cue compression gradient...

cue within relative size - when horizontal lines get closer together as they get further away - depth info is relative (how much further is B than A without knowing exact metres); static - fixed when giving info; optical - based on the image rather than the eye; monocular - can be done with one eye

size perspective

depth information in scenes in which the size-distance relation is apparent

Retinal image is a function of the size and....

distance of the object - but the retina only gives us the size of the image

We tend to see occlusion clearly and...

fill in the occluded object

size

how big or small something is

3D objects present a number of additional problems compared to 2D letters... but...

if there are viewpoint invariant features such as colour we may be able to use them to help our recognition

Pictorial cue to depth shading involves...

if you think of a simple convex (outward bulging) surface lit from below, it is going to be lightest below and darkest above - shading is ambiguous: is it a convex surface lit from above or a concave surface lit from below (and vice versa for lit from below)? - the ambiguity is resolved by the assumption that light comes from above

How do we determine what is object and what is background?

image segmentation - determining what is the background and what is the object - interpretation of the image: what you see as the figure and what you see as the ground; if you see the object and not the background then the edges belong to the object rather than the background

Ames room illusion

involves a trapezium-shaped room that is longer and higher on one side than the other. When viewed through a peephole at the front of the room using only one eye, the room appears rectangular. The room's unusual shape and being restricted to the use of monocular vision to view it provides the basis for the illusion.

What is meant by cue combination?

no one cue dominates in all situations. Cues complement and compensate for each other - they provide different types of information based on different evidence (what you're seeing). The more cues, the better the impression of depth.

How can we classify depth cues?

ocular or optical? - eye or image? Monocular or binocular? - works with one eye or needs two? Static or dynamic? - still or changing? Ordinal, relative, absolute? - ref. definitions. Range: near, far, both? - over what distances does it work?

Binocular disparity is related to which depth cue?

oculomotor cue; convergence - points that are closer to us than the fixated object have crossed disparity and points further away have uncrossed disparity

What does the Ames room tell us?

perceived size depends on perceived distance; size constancy breaks down when distances are wrongly perceived

Binocular disparity refers to...

refers to the difference in image location of an object seen by the left and right eyes, resulting from the eyes' horizontal separation (parallax). The brain uses binocular disparity to extract depth information from the two-dimensional retinal images in stereopsis.

spatial frequency

refers to the level of detail present in a stimulus per degree of visual angle. A scene with small details and sharp edges contains more high spatial frequency information than one composed of large coarse stimuli. Frequency being encoded on the retina at that moment.

A foreshortening aspect ratio perspective encompasses...

slanting as the basis of the foreshortening depth illusion - nothing to do with size distance - a square may be perceived as rotated in depth e.g. slanting the square backwards

object recognition is very much located in the...

temporal lobe - agnosias are associated with different levels of damage in the temporal lobe

Pictorial depth cue image blur involves...

the blurrier the image looks, the further away it is perceived (and vice versa).

retinex theory is...

the cerebral cortex compares the patterns of light coming from different parts of the retina and synthesizes a color perception for each area

motion parallax is...

the interrelated movements of elements in a scene that can occur when the observer moves relative to the scene. Motion parallax is a depth cue.

Summary: Oculomotor cues

the monocular ocular cue is accommodation of the lens; the binocular ocular cue is convergence of the 2 eyes. Both ocular cues have the potential to give absolute egocentric depth but are only effective at relatively short ranges.

sensory threshold

the point at which a stimulus is strong enough to make a conscious impact on a person's awareness - the point at which increasing stimuli trigger the start of an afferent nerve impulse. Absolute threshold is the lowest point at which response to a stimulus can be perceived.

What is Sensation?

the process by which our sensory receptors and nervous system receive and represent stimulus energies from our environment - the output of lower level sensory mechanisms - the raw level encoding, like light striking the photoreceptors of the eye

receptive field

the region of the sensory surface that, when stimulated, causes a change in the firing rate of that neuron

size-distance invariance

the relation between perceived size and perceived distance: the perceived size of an object depends on its perceived distance, and vice versa

Motion cues refer to...

things closer to you move faster than things further away. This gives an impression of depth, e.g. when driving along, nearby trees move past faster than the clouds in the distance. Depends on the size-distance relationship, but it is dynamic, i.e. it relies on movement rather than being static
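
A back-of-envelope sketch of why nearer things sweep past faster, assuming the observer moves in a straight line at constant speed and the object is directly to the side at that moment (in that geometry the angular speed is observer speed divided by distance; the function name and the example speed and distances are illustrative assumptions):

```python
import math


def angular_speed_deg_per_s(observer_speed, distance):
    """Angular speed of an object directly abeam of a moving observer: v / d in rad/s,
    converted to degrees per second. Speed and distance must use matching units."""
    return math.degrees(observer_speed / distance)


speed = 20.0  # m/s, roughly a car on an open road (assumed value)
roadside_tree = angular_speed_deg_per_s(speed, 10.0)
distant_hill = angular_speed_deg_per_s(speed, 2000.0)

print(f"tree: {roadside_tree:.1f} deg/s, hill: {distant_hill:.2f} deg/s")
```

The roadside tree sweeps across the visual field far faster than the distant hill, and it is this difference in image speed that carries the depth information.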

Pictorial cue to depth atmospheric perspective involves...

things in the distance look blurrier and bluer than things in the near distance. - physical basis; scatter of light makes things further away blurrier --> transmission of light through the atmosphere - An example is the Bart Anderson Chess men

Oculomotor cues - accommodation

to focus your vision sharply onto a close object you have to change the shape of your lens. Ciliary muscles (the muscles that reshape the lens) relax to make the lens thinner if the object is further away from the observer; conversely, ciliary muscles contract to make the lens fatter if the object is near to the observer. Accommodation is: ocular - involves the eyes; monocular - works with 1 eye; static - fixed when giving info; range - close; absolute egocentric depth

Oculomotor cues to depth

two main oculomotor cues to depth - where ocular refers to eye rather than image: 1.1 angle of convergence between the 2 eyes 1.2 amount of accommodation of the lens - both are based on information from the eye muscles (extraocular muscles for convergence, ciliary muscles for accommodation) - both can potentially provide absolute egocentric depth at close distances (know exactly how far in metres B is from you)

Pictorial depth cue Familiar size involves...

under certain conditions, knowledge of an object's true size can influence our perception of its distance from us. Epstein (1965) experiment - observers were presented with equal-size photos of a dime, a quarter and a 50 cent piece in a darkened room - because the coins were the same physical size in the photos, observers perceived the dime as closer and the 50 cent coin as further away, based on their knowledge of the true relative sizes of the coins.

The appearance of an object varies greatly as a function of...

viewing conditions such as lighting and viewpoint - Viewing conditions - size, position, occlusion, distance, lighting, viewpoint - important both for objects and faces

Pictorial depth cue edge interpretation involves...

we cannot see a 3D cube as flat - how is it that we get depth from a line drawing? - it has foreshortening if you can connect the surfaces - it is not dependent on the size-distance relationship - different junction types, e.g. Y junction - edges signal discontinuities in depth

Pictorial cue to depth height in ground plane cue involves...

whether it's attached to the ground gives an idea of depth - not attached to the ground could mean further away (floating). Occurs as a result of lighting.

The neuropsychological segment of object recognition involves...

• Agnosia: apperceptive, associative - refer to later cognition lectures • Inferotemporal (IT) cortex: "what" pathway - Ventral stream

What are the key tasks within object recognition?

• Edge extraction, perceptual organization, image segmentation • View normalization, structural description • Matching to stored description/representation

Define object recognition...

• Incoming image matched to stored representation • "The ability to rapidly report object identity or category after just a single brief glimpse of visual input" (DiCarlo & Cox, 2007)

Empirical approaches to perception include

• Physiology, anatomy • Computational modelling, machine vision • Neuropsychology • Psychophysics

