Chapter 4: Object Recognition
Viewpoint Invariance
1. Property of object that doesn't change when an observer changes viewpoint 2. Class of theories of object recognition that proposes representations of objects that don't change when viewpoint changes
Common Region
2 features will group if they appear to be part of the same larger region
Object Recognition
What Pathway • V1 - cells respond best to lines & edges • V2 - early steps from local features to objects • V4 - interested in complex attributes • Functional imaging studies show different regions of cortex are activated better by some categories of stimuli than by others Moving from V1 into IT - neurons respond to more & more complex stimuli • By area V4 - interested in stimuli such as fans, spirals, pinwheels, different corners, etc. • After extrastriate cortex - processing of object info is split into what & where pathways Process 1. Determine features present in image (low-level) - line segments, edges, etc. 2. Group features into objects (middle) - gestalt, etc. 3. Match perceived to encoded representations (high-level) - neural processes
Connectedness
• 2 items will tend to group if they are connected
Face Recognition
• 2 steps 1. Recognition of face parts (eyes, mouth, nose, etc.) 2. Recognition of spatial configuration of these parts • Steps happen almost independently
Grandmother Cell
• Any cell that seems to be selectively responsive to 1 specific object • Recognition is more likely to involve a large network of cells, with individual cells participating in recognition of more than 1 stimulus
Geon
• Any of the geometric ions out of which perceptual objects are built • Represent structure of object, not what it looks like from 1 view • Combine geons into geometric objects • Identification is difficult if geons are obscured (vertices deleted) • Deleting midsections but not vertices doesn't hurt identification • Ground truth - what they actually look like
Recognition-By-Components Model
• Biederman's model of object recognition • Holds that objects are recognized by identities & relationships of their component parts • Set of geons could be basic building blocks of perception of objects in world • Visual system recognizes objects on basis of relationship of geons, regardless of how geon is oriented in space
Homologous Regions
• Brain regions that appear to have same function in different species
Feed-Forward Process
• Carries out computation (ex: object recognition) one neural step after another, without need for feedback from later stage to earlier stage
Texture Segmentation
• Carving an image into regions of common texture properties • Visual system looks at statistics of all features in one region - determines that those statistics differ from the statistics in neighbouring region
Illusory Contours
• Contour that is perceived even though nothing changes from 1 side of it to the other in an image • Looks like contour is present even without physical evidence at that location • Ex: see the house when it's just a bunch of pacman-like circles
Relatability
• Degree to which 2 line segments appear to be part of the same contour • Visual system is unwilling to propose elaborate relationships - relate lines by simple elbow curve - 2 bends are less likely than 1
Structural Description
• Description of object in terms of nature of its constituent parts & relationship between these parts • Biederman's Recognition-by-Components Model • Problems: • Real object recognition isn't viewpoint-independent • No geon-like primitives would work over all objects (faces, animals, etc.) • Observers show viewpoint effects in object recognition • Farther an object is rotated away from learned view, longer it takes to recognize it • RBC predicts viewpoint invariance - many empirical studies have found viewpoint dependence • Participants shown novel objects composed of geon-like structures - exposed to rotated versions - had to recognize whether same or different - rt depended on angle by which object was rotated (further away = longer rt)
Dynamic Grouping Rules
• Grouping rules that apply to elements that are changing over time (ex: synchrony, common fate)
Synchrony
• Elements that change at same time tend to group together
Common Fate
• Elements that move in same direction tend to group together
Agnosia
• Failure to recognize objects in spite of ability to see them • Lesion in IT • Typically due to brain damage • General agnosia - inability to recognize objects • Prosopagnosia - inability to recognize faces • Inanimate agnosia - inability to recognize inanimate objects • Place agnosia - inability to recognize places • Mirror agnosia - inability to understand mirrors • Indicates different tissue in brain responsible for processing each of these types of stimuli - anatomical separation in processing - separate agnosia for specific injuries
Reverse-Hierarchy Theory
• Fast, feed-forward processes give you crude info about objects & scenes based on activity in high-level parts of visual cortex • Become aware of details when activity flows back down hierarchy of visual areas to lower-level areas where detailed info is preserved • Re-entrant feedback & processing • Pathway for object recognition runs in both directions • Initial object recognition can occur very quickly (150 ms) - not the end of the story • Brain continues to process info, sending signals up & down the what pathway • Conversation among many parts of brain rather than 1 way progression
Nonaccidental Features
• Feature of object that isn't dependent on exact (or accidental) viewing position of observer • Y junction - indicate corners facing observer • T junction - top of T is in front & stem of T is in back • Arrow junction - indicate corners facing away from observer • Provide clues to object structure
Global Superiority Effect
• Finding in various experiments that properties of the whole object take precedence over properties of parts of object • Ex: Navon letters
Border Ownership
• For a given boundary, which side is part of object vs background • For an object like a black square sitting on a background, the edges defining the border between object & background belong to the object • V1 neuron would respond equally to both visual inputs (same content in receptive fields) • V2 might respond more to one where black edge is owned by square
Good Continuation
• Gestalt grouping rule stating that 2 elements will tend to group together if they seem to lie on the same contour • Some contours in image will group because of good continuation • Statistical basis - likelihood of lines occurring next to horizontal line by position & orientation
Proximity
• Gestalt grouping rule stating that tendency of 2 features to group together will increase as the distance between them decreases
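A toy sketch of proximity grouping for dots along a line. The hard cut-off `max_gap` is an assumption made for illustration only: the rule itself says the tendency to group grows gradually as distance shrinks, it does not name a threshold. The function name and dot positions are hypothetical.

```python
def group_by_proximity(positions, max_gap):
    """Split 1-D positions into groups wherever the gap exceeds max_gap."""
    groups = []
    for x in sorted(positions):
        # Join the current group if this dot is close to the previous one
        if groups and x - groups[-1][-1] <= max_gap:
            groups[-1].append(x)
        else:
            groups.append([x])
    return groups


# Hypothetical dot positions: two tight clusters separated by a wide gap
dots = [0, 1, 2, 6, 7, 8]
groups = group_by_proximity(dots, max_gap=1.5)
print(groups)  # [[0, 1, 2], [6, 7, 8]]
```

Shrinking the wide gap below `max_gap` merges the two clusters into one group.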
Similarity
• Gestalt grouping rule stating that the tendency of 2 features to group together will increase as the similarity between them increases • Can be based on size, colour, orientation, aspects of form
Closure
• Gestalt principle that holds that a closed contour is preferred to an open contour
Perceptual "Committees"
• Gets together & voices opinions about how stimulus ought to be understood • Opinions collide - result can be ambiguous • Rules - honour physics & avoid accidents • Visual systems know physical principles implicitly - ex: understand that solid objects block light
Camouflage
• Getting features to group with features of environment - persuade observer your features don't form perceptual group of their own • Organism's attempt at breaking Gestalt rules so that its features aren't perceived as an object on its own, but as part of larger object • Avoid detection • Sometimes used to confuse viewer rather than hide object • Dazzle camouflage in WWII - confusing design on ship - hard for viewer to determine what they're looking at
Structuralism
• School of thought that believed complex objects or perceptions could be understood by analysis of the components • Wilhelm Wundt, Edward Titchener
What/Ventral Pathway
• Heads down into temporal lobe • Locus for explicit acts of object recognition • Receptive fields get much bigger - what is in view seems more important than where it is • V4 & IT • Concerned with names & functions of objects regardless of location
Where/Dorsal Pathway
• Heads up into parietal lobe • Important for processing info relating to location of objects in space & actions required to interact with them (ex: moving hands, eyes, etc.) • Important role in deployment of attention • Action, navigation & attention • Concerned with locations & shapes of objects but not their names or functions
Inversion Effect
• Image inversion (upside down) hurts face recognition more than object recognition • Cost of inverting particular stimuli • Cars - upright vs inverted are identified about equally quickly • Faces - big cost - recognition decreases a lot from upright to inverted • Right configuration of faces is very important • Affects parts & configuration differently • Inverted face & inverted features - don't really notice inverted features when whole face is inverted
Subtraction Method
• In functional magnetic resonance imaging (fMRI) - comparison of brain activity measured in 2 conditions - one with & one without the involvement of mental process of interest • Ex: 1 with intact object & 1 with features that could make an object • Difference between images for 2 conditions may show regions of brain specifically activated by that mental process
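A minimal sketch of the subtraction logic, assuming made-up activation values for four hypothetical regions. The region names, numbers, and threshold are illustrative only and are not taken from any study.

```python
# Hypothetical mean activation per region in each condition
intact_object = {"region_1": 5.0, "region_2": 4.0, "region_3": 6.0, "region_4": 2.0}
scrambled_features = {"region_1": 4.5, "region_2": 4.0, "region_3": 2.0, "region_4": 1.5}


def subtract(task, control, threshold=1.0):
    """Keep regions whose task-minus-control difference exceeds threshold."""
    differences = {region: task[region] - control[region] for region in task}
    return {region: d for region, d in differences.items() if d > threshold}


# Regions more active for intact objects than for the same features scrambled
selective = subtract(intact_object, scrambled_features)
print(selective)  # {'region_3': 4.0}
```

Regions that respond about equally in both conditions drop out, which is the point of including a control condition that shares everything except the process of interest.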
Lesion
• In reference to neurophysiology 1. (n) Region of damaged brain 2. (v) To destroy a section of brain
Prosopagnosia
• Inability to recognize faces • Congenital prosopagnosia - form of face blindness apparently present from birth • Acquired prosopagnosia - result of injury to nervous system
Entry-Level Category
• Label for an object that comes to mind most quickly when we identify it (ex: bird) • Subordinate level - object might be more specifically named (ex: eagle) • Superordinate level - might be more generally named (ex: animal)
Gestalt
• Literally "form" in German • School of thought stressing that perceptual whole is "other" than the apparent sum of the parts • Max Wertheimer, Wolfgang Köhler, Kurt Koffka • Don't just enumerate parts as representation of whole - the whole together has a nature that is different from its parts
Mid-Level (Middle) Vision
• Loosely defined stage of visual processing that comes after basic features have been extracted from image (low-level/early vision) & before object recognition & scene understanding (high-level vision) • Organize elements of visual scene into groups that we can recognize as objects • Involves perception of edges & surfaces • Summary: 1. Bring together that which should be brought together (Gestalt grouping principles, processes that complete contours behind occluders, relatability heuristic) 2. Split asunder that which should be split asunder (edge-finding processes, figure-ground assignment, texture segmentation) 3. Use what you know (implicit knowledge of physics & image formation) 4. Avoid accidents (avoid interpretations that require assumptions of highly specific, accidental combinations of features or accidental viewpoints) 5. Seek consensus & avoid ambiguity ("committee" model)
Heuristic
• Mental shortcut • Ex: relatability • Not infallible
Finding Edges
• Occasional lack of edge doesn't seem to bother visual system • Computer algorithms don't do a good job - we have no problem • Visual system knows some gaps are accidents of lighting & fills in contour • Inferential nature of contour perception
Pandemonium Model
• Oliver Selfridge - account of letter recognition • Perception by committee - demons must agree • Demons loosely represent neurons - each level represents a different brain area • See "B" in world - image processing demon (see something in world) → feature demons (each processes different features - vertical line demon, horizontal line demon, right angle, etc.) → cognitive demons (look at the output of feature demons - combine the outputs - get louder when features match totally) → decision demon (picks out cognitive demon that shouts the loudest)
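The demon hierarchy can be sketched as a small scoring pipeline. This is a toy illustration, not Selfridge's actual implementation: the feature inventory for each letter and the "shout = negative mismatch" scoring are assumptions chosen to keep the example short.

```python
# Hypothetical feature counts each cognitive demon is listening for
LETTERS = {
    "E": {"vertical": 1, "horizontal": 3},
    "F": {"vertical": 1, "horizontal": 2},
    "L": {"vertical": 1, "horizontal": 1},
    "H": {"vertical": 2, "horizontal": 1},
}


def cognitive_demon(wanted, reported):
    """Shout loudest (0) on a total match; each mismatched feature lowers it."""
    names = set(wanted) | set(reported)
    mismatch = sum(abs(wanted.get(n, 0) - reported.get(n, 0)) for n in names)
    return -mismatch


def decision_demon(reported):
    """Pick the cognitive demon that shouts the loudest."""
    shouts = {letter: cognitive_demon(w, reported) for letter, w in LETTERS.items()}
    return max(shouts, key=shouts.get), shouts


# Output of the feature demons for a stimulus: 1 vertical and 2 horizontal lines
winner, shouts = decision_demon({"vertical": 1, "horizontal": 2})
print(winner, shouts)
```

With these assumed features the "F" demon matches totally, while "E" and "L" are each off by one horizontal line and shout less loudly.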
Necker Cube
• Outline that is perceptually bi-stable • 2 interpretations continually battle for perceptual dominance
Inferotemporal (IT) Cortex
• Part of cerebral cortex in lower portion of temporal lobe • Important in object recognition - part of what pathway • Cells have receptive fields that could spread over vast regions of field of view • Complex recognition • Close connection with parts of brain involved in memory formation - IT cells need to learn receptive-field properties • Receptive field properties • Very large - some cover half of visual field • Don't respond well to spots or lines • Do respond well to stimuli such as hands, faces, objects • Specific neuron response to Jennifer Aniston
Problem of Object Recognition
• Pictures are just a bunch of pixels on screen - we still perceive objects (ex: houses) • How do we recognize different looking houses? • We have processes that successfully combine features into objects • Retinal ganglion cells & LGN - spots • Primary visual cortex - bars of different orientations • How do spots & bars get turned into objects & surfaces?
Figure-Ground Assignment
• Process of determining that some regions of image belong to foreground object (figure) & others are part of background (ground) • Critical step on path from image to object recognition - object recognition starts before figure-ground assignment finishes • Meaning can influence result of assignment • Not just bottom-up - also top-down • Ambiguous cases - ex: vase/face figure
Decoding
• Process of determining the nature of a stimulus from the pattern of responses measured in the brain or, potentially, in an artificial system like a computer network • Take fMRI scans of participant looking at many images from various known categories • Present observer in MRI scanner with range of different images - catalogue responses in brain to each image - present new image of one of these objects - use patterns of activity to try to guess identity of object
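One simple way to sketch the decoding procedure is a nearest-prototype classifier: average the catalogued response patterns per category, then assign a new pattern to the closest average. The classifier choice, the three-voxel patterns, and all numbers are assumptions for illustration, not a description of how any particular study did it.

```python
import math

# Hypothetical catalogued responses (3 voxels) to images of known categories
training = {
    "face": [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1]],
    "house": [[0.1, 0.9, 0.3], [0.2, 0.8, 0.2]],
}


def mean_pattern(patterns):
    """Average the response patterns voxel by voxel."""
    return [sum(values) / len(patterns) for values in zip(*patterns)]


def decode(pattern, prototypes):
    """Guess the category whose average pattern is nearest (Euclidean)."""
    return min(prototypes, key=lambda c: math.dist(pattern, prototypes[c]))


prototypes = {category: mean_pattern(p) for category, p in training.items()}

# Response to a new image that was not in the training set
guess = decode([0.7, 0.3, 0.2], prototypes)
print(guess)  # face
```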
Holistic Processing
• Processing based on analysis of entire object or scene & not on adding together a set of smaller parts or features • Used in processing faces - process complex face as single thing • Concerned with precise configuration of eyes, nose & mouth
Naive Template Theory
• Proposal that visual system recognizes objects by matching neural representation of image with stored representation of same "shape" in brain • Internal representation of stimulus used to recognize the stimulus in the world • Lock-and-key representations • Perceived 'A' matches to stored representation of 'A' • Unlike its use in, for example, making a key, a mental template is not expected to actually look like the stimulus that it matches • Problem - we would need a lot of them - so many different kinds of 'A's • Used in some applications of machine vision • Postal codes - use template matching theory • Cheques - detect handwriting, depositing money • Barcodes - machine readable through template matching
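A toy sketch of template matching on tiny pixel grids, in the spirit of the machine-vision applications listed above. The 4×4 grids, the three letter templates, and the pixel-overlap score are all assumptions for illustration. It also shows the "too many templates" problem: the same T shifted one column over matches its own template much less well.

```python
# Hypothetical stored templates ('#' = ink, '.' = blank)
TEMPLATES = {
    "T": ["###.", ".#..", ".#..", "...."],
    "I": [".#..", ".#..", ".#..", "...."],
    "L": ["#...", "#...", "###.", "...."],
}


def match_score(image, template):
    """Count the pixels on which image and template agree."""
    return sum(
        a == b
        for image_row, template_row in zip(image, template)
        for a, b in zip(image_row, template_row)
    )


def recognize(image):
    """Return the label of the best-matching stored template."""
    return max(TEMPLATES, key=lambda name: match_score(image, TEMPLATES[name]))


perfect_t = ["###.", ".#..", ".#..", "...."]
shifted_t = [".###", "..#.", "..#.", "...."]  # same shape, one column to the right

print(recognize(perfect_t))                      # T
print(match_score(perfect_t, TEMPLATES["T"]))    # 16
print(match_score(shifted_t, TEMPLATES["T"]))    # 10
```

Handling the shifted T reliably would take either another stored template or some position-tolerant step before matching.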
Extrastriate Cortex
• Region of cortex bordering primary visual cortex & containing multiple areas involved in visual processing • Basic local properties are pulled out of image by early stages • Sophisticated tasks like object recognition require subsequent processing • V2 - receptive fields begin to show interest in properties important for object perception
Fusiform Face Area (FFA)
• Region of extrastriate visual cortex in humans that is specifically & reliably activated by human faces
Extrastriate Body Area (EBA)
• Region of extrastriate visual cortex in humans that is specifically & reliably activated by images of body other than face
Parahippocampal Place Area (PPA)
• Region of extrastriate visual cortex in humans that is specifically & reliably activated more by images of places than by other stimuli
Surroundedness
• Rule for figure-ground assignment stating that if 1 region is entirely surrounded by another, it is likely that the surrounded region is the figure
Parallelism
• Rule for figure-ground assignment stating that parallel contours are likely to belong to same figure • Many objects give rise to parallel lines
Symmetry
• Rule for figure-ground assignment stating that symmetrical regions are more likely to be seen as figure • Many objects are symmetric
Gestalt Grouping Rules
• Set of rules describing which elements in an image will appear to group together • Original list was assembled by members of Gestalt school of thought • Capitalize on certain regularities that characterize physical world • Ecological validity - work because they represent our environment
Size
• Smaller region is likely to be figure
Object Ambiguity
• Some ambiguous shapes - locally look fine but globally confusing • Accidental views - shapes in different spatial orientations - from 1 specific viewpoint they look nicely arranged on one flat surface - vision assumes non-accidental
Canonical View
• Some viewpoints are more easily recognized than others • Canonical - familiar • Non-canonical - unfamiliar - rarely see objects from this vantage point
Occlusion
• Something gets in the way of another thing, hiding it from our view • Explanation for illusory contour
Middle Temporal Area (MT)
• Specialized for motion processing
Relative Motion
• Surface detail movement relative to an edge
Deep Neural Network (DNN)
• Type of machine learning in AI • Computer is programmed to learn something (ex: object recognition) • First, network is trained using input for which the answer is known (ex: that is a cow) • Next, the network can provide answers from input that it has never seen before • Set of features is extracted from image • First layer - like what simple cells do • Then info is pooled like what complex cells do • Feature extraction & pooling repeated for number of layers
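A minimal sketch of one feature-extraction-plus-pooling stage in plain Python. The hand-written vertical-edge filter stands in for a learned one, and the 5×5 images are made up; a real network learns its filters during training and stacks many such stages.

```python
def extract_features(image, kernel):
    """Slide the kernel over the image (simple-cell-like filtering), then rectify."""
    k = len(kernel)
    size = len(image) - k + 1
    return [
        [
            max(0, sum(kernel[i][j] * image[r + i][c + j]
                       for i in range(k) for j in range(k)))
            for c in range(size)
        ]
        for r in range(size)
    ]


def max_pool(feature_map, size=2):
    """Keep the strongest response in each size x size block (complex-cell-like)."""
    return [
        [
            max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]), size)
        ]
        for r in range(0, len(feature_map), size)
    ]


vertical_edge = [[-1, 1], [-1, 1]]   # assumed filter: dark-to-light vertical edge
edge_a = [[0, 0, 1, 1, 1]] * 5       # edge between columns 1 and 2
edge_b = [[0, 1, 1, 1, 1]] * 5       # same edge shifted one column left

pooled_a = max_pool(extract_features(edge_a, vertical_edge))
pooled_b = max_pool(extract_features(edge_b, vertical_edge))
print(pooled_a, pooled_b)  # [[2, 0], [2, 0]] [[2, 0], [2, 0]]
```

The filter output differs between the two images, but after pooling the responses are identical, which is the kind of small position tolerance pooling is meant to provide.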
Object Categorization
• Useful to categorize objects in order to deploy general strategies • Based on appearance, function, taxonomy, context, etc.
Accidental Viewpoint
• Viewing position that produces some regularity in visual image that isn't present in the world • Ex: sides of 2 independent objects lining up perfectly • Perceptual committees assume viewpoints are non-accidental • Ex: align 2 pencils along same contour on desk - pencils by themselves don't generally align • Any slight shift in viewpoint destroys illusion of accidental viewpoint
Ambiguous Figure
• Visual stimulus giving rise to 2 or more interpretations of its identity or structure • Perceptual committees tend to obey laws of physics • Ex: Necker Cube
Bayesian Approach
• Way of formalizing idea that perception is combination of current stimulus & knowledge about conditions of world (what is & isn't likely to occur) • Visual system faced with stimulus - tries to figure out most likely situation in world that has produced this particular pattern of activity • P(A|O) = P(A) × P(O|A)/P(O) - enables us to calculate the probability (P) that the world is in a particular state (A) given a particular observation (O) • Two factors 1. How likely is what you are proposing - prior probability 2. How consistent is each hypothesis with observation
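A worked sketch of the formula above, assuming two hypotheses about what produced a single straight contour in the image. The hypothesis names and every probability are made up for illustration. P(O) is computed as the sum of P(A) × P(O|A) over the hypotheses considered.

```python
def posterior(prior, likelihood):
    """Return P(A|O) for every hypothesis A using Bayes' rule."""
    # P(O): total probability of the observation across all hypotheses
    evidence = sum(prior[a] * likelihood[a] for a in prior)
    return {a: prior[a] * likelihood[a] / evidence for a in prior}


# Factor 1 - prior probability: how likely is each state of the world?
prior = {"one_pencil": 0.9, "two_aligned_pencils": 0.1}
# Factor 2 - P(O|A): how consistent is each hypothesis with the observation?
likelihood = {"one_pencil": 0.8, "two_aligned_pencils": 0.05}

post = posterior(prior, likelihood)
best = max(post, key=post.get)
print(best, round(post[best], 3))  # one_pencil 0.993
```

With these assumed numbers the accidental-alignment hypothesis ends up very unlikely, because it has both a low prior and a poor fit to the observation.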