
Shape Processing in Higher Level Visual Cortex
Shape perception is an integral aspect of conscious experience. It is our primary means of identifying important items in our environment. Right now you are recognizing and interpreting alphanumeric symbols in order to understand this text. If you look away from the screen, you will recognize objects in the room and perhaps other people. All of this seems immediate and effortless, but shape perception is computationally very difficult: machine vision systems have not come anywhere near human performance.
What makes shape perception so difficult? First, any given object can present an infinity of different images to the retina, and your visual system has to somehow map all those images to one category. For example, the images in Fig. 1 differ completely in size, orientation, degree of occlusion by other objects, and color. Somehow, your visual system correctly maps all these very different images to the category “dolphins.”
Second, there is a virtual infinity of objects in the world, and you have to be able to perceive each one, store it in memory, and recognize it when you see it again. Fig. 2 shows a small sample of the endless variety of shapes you can identify based on previous experience.
All of this is accomplished by a large but finite number of neurons. There aren’t enough neurons in the brain to have one for each object, much less one for each retinal image of each object. So a simple neural lookup table of everything you’ve ever seen would be impractical. The visual system must have a neural representation scheme with the capacity to encode an infinity of shapes and the flexibility to handle different retinal images of the same shape.
There are lots of theories about how that might be done. Most of these theories depend on the idea of structural representation, or representation by parts.
Put in its simplest form, this is the idea that the neural description of an object is something like a list of parts that constitute that object, with the shapes and relative positions of those parts specified. For the dolphin in Fig. 3, the parts-level description might be something like:
• Sharp convex projection near the top
• Broad convex curve at the upper right
• Broad convex curve at the upper left
• Sharp convex projection at the left (etc.)
This kind of parts-level or structural description has the capacity to represent an infinite number of shapes, because parts can be combined in an infinite number of ways (just as the 26 letters of the alphabet can combine to form an unlimited number of words). It also has the flexibility to handle changes in the retinal image: you can change size, position, orientation, even posture, and the parts list will still be more or less the same.
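The parts-list idea can be made concrete with a toy sketch under stated assumptions: a shape is a polygon, each vertex counts as one "part", angular position is measured from the vertex centroid, and the signed turning angle at a vertex stands in for boundary curvature. `parts_description` and the example shape are hypothetical illustrations, not the lab's actual encoding scheme. Because both measures depend only on directions relative to the shape itself, scaling and translating the polygon should leave the description unchanged.

```python
import math

def parts_description(contour):
    """Describe a closed polygon as a list of (angular_position_deg, turning_angle_deg) parts.

    contour: (x, y) vertices in counter-clockwise order, with y pointing up.
    Angular position is measured from the vertex centroid: 0 = right, 90 = top.
    The signed turning angle at each vertex is a crude, hypothetical stand-in for
    boundary curvature: positive = convex, negative = concave.
    """
    n = len(contour)
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    parts = []
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = contour[i - 1], contour[i], contour[(i + 1) % n]
        heading_in = math.atan2(y1 - y0, x1 - x0)   # direction of the incoming edge
        heading_out = math.atan2(y2 - y1, x2 - x1)  # direction of the outgoing edge
        d = heading_out - heading_in
        turn = math.degrees(math.atan2(math.sin(d), math.cos(d)))  # wrapped to [-180, 180]
        position = math.degrees(math.atan2(y1 - cy, x1 - cx)) % 360.0
        parts.append((position, turn))
    return parts

# A square with a notch cut into its top edge (hypothetical example shape).
shape = [(0, 0), (4, 0), (4, 4), (2, 3), (0, 4)]
# The same shape, three times larger and shifted elsewhere on the "retina".
moved = [(3 * x + 10, 3 * y - 4) for x, y in shape]

for (p1, t1), (p2, t2) in zip(parts_description(shape), parts_description(moved)):
    kind = "convex" if t1 > 0 else "concave"
    print(f"position {p1:6.1f} vs {p2:6.1f}   turn {t1:7.1f} vs {t2:7.1f}   {kind}")
```

Rotating the polygon would shift every angular position by the same amount while leaving the turning angles intact, so the relative arrangement of parts is preserved as well.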
Of course, real neural representations wouldn't be discrete like word descriptions; they would involve many neurons with graded tuning for features such as boundary curvature fragments. We have found many neurons like that in higher areas of visual cortex (V4, PIT/TEO and CIT/TE) that seem to represent not entire shapes but parts of shapes. For example, Fig. 4 shows the responses of a V4 neuron to a large set of shape stimuli in which boundary curvature was varied parametrically. The response to each shape (in spikes per second) is indicated by the gray level of the background circle (darker means stronger responses). This cell responded to shapes containing sharp convex curvature at the lower left, flanked by shallow concave curvature near the bottom.
We quantify this kind of shape tuning with mathematical functions in boundary curvature space. The responses in Fig. 4 were characterized with the tuning function shown in Fig. 5.
Each subplot has a two-axis domain. The curvature axis runs from –0.5 (concave) through 0 (flat) to 1.0 (sharp convex). The angular position axis runs from 0° (right) through 90° (top), 180° (left) and 270° (bottom). The tuning function peaks at curvature 1.0 (sharp convex) and angular position 225° (lower left). The columns of subplots represent different levels of curvature of the adjacent boundary region in the counter-clockwise direction (in this case, towards the bottom). The peak in this dimension is at curvature –0.15 (shallow concave). The rows of subplots represent different levels of curvature of the adjacent boundary region in the clockwise direction (this cell was not tuned in this dimension).
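One way to write down a tuning function over this domain is sketched below. The Gaussian form and the widths (`sd_curv`, `sd_pos`, `sd_ccw`) are assumptions chosen for illustration, not the fitted values behind Fig. 5; only the peak locations (curvature 1.0, angular position 225°, adjacent counter-clockwise curvature –0.15) come from the description above. The clockwise-curvature argument is accepted but ignored, mirroring a cell with no tuning in that dimension.

```python
import math

def angular_diff(a, b):
    """Signed difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def tuning(curv, pos, curv_ccw, curv_cw,
           peak_curv=1.0, peak_pos=225.0, peak_ccw=-0.15,
           sd_curv=0.3, sd_pos=30.0, sd_ccw=0.2):
    """Hypothetical normalized response (0..1) to one boundary fragment.

    curv: curvature of the fragment (-0.5 concave ... 1.0 sharp convex)
    pos: angular position of the fragment in degrees (0 = right, 90 = top)
    curv_ccw / curv_cw: curvature of the adjacent fragments counter-clockwise / clockwise
    """
    g_curv = math.exp(-0.5 * ((curv - peak_curv) / sd_curv) ** 2)
    # Gaussian on the wrapped angular difference, so 350 deg and 10 deg count as close.
    g_pos = math.exp(-0.5 * (angular_diff(pos, peak_pos) / sd_pos) ** 2)
    g_ccw = math.exp(-0.5 * ((curv_ccw - peak_ccw) / sd_ccw) ** 2)
    # curv_cw is deliberately unused: the cell is untuned for clockwise curvature.
    return g_curv * g_pos * g_ccw

print(tuning(1.0, 225.0, -0.15, 0.0))   # preferred fragment → 1.0
print(tuning(1.0, 45.0, -0.15, 0.0))    # same fragment moved to the upper right: near zero
print(tuning(1.0, 225.0, -0.15, 0.9))   # clockwise neighbor changed: no effect → 1.0
```

A product of one-dimensional factors is the simplest choice that gives a single peak in each dimension; a fitted model could instead allow correlated or asymmetric tuning.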
Thus, this neuron is tuned for sharp convexity at the lower left flanked by concavity near the bottom. This cell would contribute to encoding shapes like the dolphin in Fig. 3: its response would signal the sharp nose and the shallow concave ventral surface. Our recent experiments in PIT/TEO and CIT/TE confirm that tuning functions derived from parametric shape stimuli are highly predictive of responses to images of natural objects. Our population-level analyses show that the combined signals of neurons tuned for different curvature fragments yield complete shape representations.
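The step from single-neuron tuning to a population code can be illustrated with a toy model built on several assumptions: each shape is summarized by a made-up list of (curvature, angular position) parts loosely following the dolphin description above, a model neuron responds according to its best-matching part, and the population signal is a response-weighted sum of the neurons' tuning functions. All names (`gauss2`, `response`, `population_surface`) and parameter values are hypothetical; this is a sketch of the general idea, not the analysis used in the experiments.

```python
import math

def angular_diff(a, b):
    """Signed difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def gauss2(curv, pos, pref_curv, pref_pos, sd_curv=0.25, sd_pos=30.0):
    """Hypothetical 2D Gaussian tuning over (curvature, angular position in degrees)."""
    return math.exp(-0.5 * ((curv - pref_curv) / sd_curv) ** 2
                    - 0.5 * (angular_diff(pos, pref_pos) / sd_pos) ** 2)

def response(parts, pref_curv, pref_pos):
    """Assumed rule: a neuron responds according to its best-matching part."""
    return max(gauss2(c, p, pref_curv, pref_pos) for c, p in parts)

# Hypothetical population whose preferred (curvature, position) values tile the domain.
PREF_CURVS = [-0.5, -0.25, 0.0, 0.25, 0.5, 0.75, 1.0]
PREF_POSITIONS = [0.0, 45.0, 90.0, 135.0, 180.0, 225.0, 270.0, 315.0]
POPULATION = [(c, p) for c in PREF_CURVS for p in PREF_POSITIONS]

def population_surface(parts, curv, pos):
    """Response-weighted sum of every neuron's tuning function at one point of the domain."""
    return sum(response(parts, pc, pp) * gauss2(curv, pos, pc, pp) for pc, pp in POPULATION)

# Made-up parts list loosely following the dolphin description: (curvature, position).
DOLPHIN_PARTS = [
    (1.0, 90.0),     # sharp convex projection near the top
    (0.3, 45.0),     # broad convex curve at the upper right
    (0.3, 135.0),    # broad convex curve at the upper left
    (1.0, 180.0),    # sharp convex projection at the left
    (-0.15, 270.0),  # shallow concavity near the bottom
]

for curv, pos in [(1.0, 90.0), (-0.15, 270.0), (1.0, 315.0), (-0.15, 90.0)]:
    value = population_surface(DOLPHIN_PARTS, curv, pos)
    print(f"curvature {curv:5.2f}, position {pos:5.1f} deg -> population signal {value:.3f}")
```

In this sketch the summed signal is high at points of curvature space occupied by the shape's parts and low elsewhere, which is the sense in which a population of fragment-tuned neurons can carry a description of the whole shape.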
A controversial issue in shape theory is whether the visual system represents shape in two dimensions or three. Our recent experiments show clear tuning for the 3D orientation of elongated image elements (such as edges and lines). The V4 cell represented in Fig. 6 was studied with bar-shaped stimuli at various 3D orientations. Stereoscopic presentation (slight image differences between the two eyes) was used to convey 3D shape. Orientation in Fig. 6 is indicated with perspective: each bar is plotted on the surface of a dome projecting out of the screen (tilted upward slightly to enhance the perspective effect). Response level is indicated by color (blue represents weak responses, yellow represents strong responses). This neuron was tuned for diagonally oriented stimuli with the far end at the lower right and the near end at the upper left. This kind of tuning suggests that the visual system uses stereoscopic depth information to represent the 3D structure of objects.
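A toy model of 3D orientation tuning shows why stereoscopic depth matters. The assumptions are all hypothetical: bars are direction vectors in a frame with x to the right, y up and z toward the viewer; the preferred axis runs from the upper-left near end to the lower-right far end with equal components along each axis; and the response falls off with the squared cosine of the angle to the preferred axis (concentration `kappa`), so a bar and its head-to-tail flip are equivalent. The three test bars share the same diagonal orientation on the screen and differ only in depth tilt.

```python
import math

def normalize(v):
    """Scale a 3D vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

# Hypothetical preferred axis: from the near end (upper left) to the far end (lower right).
# Axes: x = right, y = up, z = toward the viewer (out of the screen).
PREFERRED = normalize((1.0, -1.0, -1.0))

def orientation_response(bar_axis, preferred=PREFERRED, kappa=4.0):
    """Hypothetical axial tuning: 1.0 when the bar lies along the preferred axis.

    Depends on cos^2 of the angle between the two axes, so flipping bar_axis
    end for end leaves the response unchanged.
    """
    u = normalize(bar_axis)
    cos_angle = sum(a * b for a, b in zip(u, preferred))
    return math.exp(kappa * (cos_angle ** 2 - 1.0))

bars = {
    "preferred (far end lower right)": (1.0, -1.0, -1.0),
    "frontoparallel": (1.0, -1.0, 0.0),
    "reversed depth (far end upper left)": (1.0, -1.0, 1.0),
}
for name, axis in bars.items():
    print(f"{name:36s} {orientation_response(axis):.3f}")
```

The preferred bar gives the strongest response, the frontoparallel bar a weaker one, and the bar tilted the opposite way in depth the weakest, even though all three share the same orientation in the flat image; a cell tuned only for 2D orientation could not separate them.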
Ongoing experiments in V4, PIT/TEO and CIT/TE are directed at more complex aspects of 2D and 3D shape tuning. Our ultimate goal is to understand the neural code for object shape at a quantitative/algorithmic level. This would represent a major step in analyzing the neural basis of conscious experience.