When we view a scene, the world seems to be filled with objects that have particular shapes, colors, and material properties. Our primary source of information about the world is vision, which relies on the light reflected from object surfaces to a point of observation. Thus, our knowledge of object structure, or of any aspect of our visual world, is determined by the structure of the surfaces of objects, since it is here that light interacts with objects. Surface perception refers to our ability to use the images projected to our eyes to determine the color, shape, opacity, 3-D layout, and material properties of the objects in our environment. In this discussion, some of the basic problems studied in this domain are briefly introduced.
The problem of surface perception is to understand exactly how the visual system uses the structure in light to recover the 3-D structure of objects in the world. A solution to this problem requires that the visual system untangle the different causes that operate collectively to form the variations in luminance in the images projected to our eyes. The problem is hard because the same image could have been physically generated in a number of different ways. Consider, for example, the problem of recovering the apparent lightness of a surface. The same shade of gray can be created by a dimly illuminated white surface or by a brightly illuminated black surface. Yet we seem to be remarkably good at untangling the contributions of illumination from the contributions of reflectance, and at recovering the lightness of a surface. One of the major areas of research in surface perception is LIGHTNESS PERCEPTION, which is also one of the oldest areas of research in vision science. Yet even today, we are only beginning to understand how the photometric and geometric relationships in an image interact to determine the perceived lightness of a surface.
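The ambiguity can be made concrete with a small numerical sketch. It assumes the simplest possible image-formation model, in which the luminance reaching the eye is the product of surface reflectance and illumination; the function name and the specific values are illustrative assumptions, not figures from the research discussed here.

```python
import math

def luminance(reflectance, illumination):
    """Luminance under a simple multiplicative model (an assumption of this sketch)."""
    return reflectance * illumination

# Hypothetical values: a white surface reflecting 90% of the incident light,
# and a black surface reflecting 5%. Illumination is in arbitrary units.
white_in_dim_light = luminance(reflectance=0.90, illumination=10.0)
black_in_bright_light = luminance(reflectance=0.05, illumination=180.0)

# The two physically different situations send the same luminance to the eye.
same_gray = math.isclose(white_in_dim_light, black_in_bright_light)
print(white_in_dim_light, black_in_bright_light, same_gray)
```

Because a single luminance value is consistent with many reflectance and illumination pairs, the value alone cannot determine lightness; additional image structure is needed to separate the two factors.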
Another primary difficulty in recovering surface structure is in classifying the different types of luminance variations that arise in images. Consider the problem of determining the cause of a simple luminance discontinuity. Abrupt changes in luminance can be generated by occluding contours, shadows, or abrupt changes in the reflectance of a surface. An incorrect classification of luminance edges would lead to a variety of perceptual disasters. For example, consider a scene in which a face is brightly illuminated from the left, casting a strong shadow on the person's right cheek. If the boundary of this shadow were treated as an object boundary, the person's face would be split into pieces, and it would probably be impossible to recognize the underlying face. If the shadow boundary were interpreted as a change in the reflectance of the surface, the person's face would appear to have a large, dark stain. Each form of misclassification would lead to a distinct error. In order to perceive the person's face as a homogeneously pigmented single object, the visual system must be capable of correctly classifying the shadow boundary as a shadow boundary, which can then provide information about the 3-D surface that generated the shadow.
A related problem arises when attempting to use luminance gradients ("shading") to determine perceived shape. The "shape from shading problem" refers to the difficulty in using luminance variations to reveal 3-D shape. This difficulty arises because the amount of light reflected off a 3-D surface to a point of observation depends on a number of variables, including the position and intensity of the light source, as well as the orientation and reflectance function of the surface. In order to use the luminance variations in an image to recover surface structure, the visual system must distinguish luminance variations due to changes in 3-D shape from those due to changes in surface reflectance or changes in illumination. Virtually all models of this ability assume the existence of a single light source that has a known position, and further assume that the reflectance of the surface is known. The ability of our visual systems to recover shape from shading appears much more general than these models would suggest, but just how the visual system manages to use luminance gradients to infer 3-D shape in natural scenes remains largely unknown.
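To illustrate why shading alone underdetermines shape, the following sketch assumes a Lambertian (perfectly matte) surface lit by a single distant light source, the kind of simplification that the models mentioned above typically make. The function and variable names are hypothetical, and the values are chosen only for illustration.

```python
import math

def lambertian_luminance(normal, light_dir, albedo, intensity):
    """Luminance of a matte patch: albedo * intensity * cos(angle between normal and light)."""
    def unit(v):
        length = math.sqrt(sum(c * c for c in v))
        return tuple(c / length for c in v)

    n, l = unit(normal), unit(light_dir)
    cos_angle = sum(a * b for a, b in zip(n, l))
    # A patch facing away from the light receives no direct illumination.
    return albedo * intensity * max(0.0, cos_angle)

light = (0.0, 0.0, 1.0)  # assumed light direction, pointing from the surface toward the source

# Patch A faces the light directly but reflects only half of the incident light.
patch_a = lambertian_luminance((0.0, 0.0, 1.0), light, albedo=0.5, intensity=1.0)

# Patch B is tilted 60 degrees away from the light but is twice as reflective.
tilt = math.radians(60)
patch_b = lambertian_luminance((math.sin(tilt), 0.0, math.cos(tilt)), light,
                               albedo=1.0, intensity=1.0)

print(patch_a, patch_b, math.isclose(patch_a, patch_b))
```

Under these assumptions the two patches produce essentially the same luminance even though they differ in both orientation and reflectance, which is why a model must fix the light source and the reflectance before it can read shape from shading.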
More generally, the visual system must be able to distinguish variations in luminance caused by intrinsic changes in surface properties (such as surface reflectance) from those caused by factors that are extrinsic to a surface. There are two ways in which this decomposition seems to be accomplished. One method relies on image properties that provide a unique "signature" of their environmental cause to classify the causes of luminance variations. Such methods are usually described as "bottom-up" or data-driven processes, since they depend only on the form of the current input to the visual system. The other method is to use TOP-DOWN PROCESSING IN VISION to determine the causes of image structure, which relies on previously acquired information to classify ambiguous images. Both types of processes seem to operate in determining perceived surface structure when we view natural scenes.
One of the most challenging problems in the accurate recovery of surface structure is generated by the geometry of occlusion or camouflage. In cluttered scenes, many objects and surfaces are partially obscured by nearer, occluding surfaces, and some nearer surfaces are camouflaged by distant surfaces that have identical textural and reflectance properties. In order to recover surface structure in these scenes, the visual system must be capable of distinguishing between those situations in which an object actually ends and those in which it only appears to end because it is partially occluded or camouflaged by the background. Once it is determined that partial occlusion or camouflage is present, the visual system must integrate the spatially separated scene fragments into a single surface. The perceptual interpolation of objects behind occluders is known as amodal completion, and was originally studied by workers in GESTALT PERCEPTION. A related phenomenon is the modal completion of surfaces and contours over image regions that are partially camouflaged by a more distant surface. Recent physiological work has demonstrated that cells early in the cortical processing stream (area V2) respond to illusory contours, providing strong evidence that the interpolation of surfaces is truly a form of visual processing and does not require more abstract "cognitive" processes to occur.
In summary, surface perception is difficult because the visual system must untangle the different physical causes of the images on our retinas and fill in missing information when only portions of a surface are visible. Although much progress has been made in understanding how the visual system infers surface structure in some simplified images, much remains to be done before we have a full understanding of how our visual system works with the highly structured images created by natural scenes.