Multisensory Integration

Because of its importance in forming an appropriate picture of the external world, the representation of sensory information has been a powerful driving force in evolution. Extant organisms possess an impressive array of specialized sensory systems that allow them to monitor simultaneously a host of environmental cues. This "parallel" processing of multiple cues not only increases the probability of detecting a given stimulus but, because the information carried along each sensory channel reflects a different feature of that stimulus, it also increases the likelihood of its accurate identification. For example, stimuli that are similar along one physical dimension (how they sound) might be identified on the basis of a second dimension (how they look). But if a coherent representation of the external world is to be constructed, and if the appropriate responses are to be generated, the brain must synthesize the information originating from these different sensory channels. One way in which such a multimodal representation is generated is by having information from different sensory systems converge on a common group of neurons.

During the evolution of sensory systems, mechanisms were preserved or elaborated so that the combined action of sensory systems would provide information not available within any single sensory channel. Indeed, in many circumstances, events are more readily perceived, have less ambiguity, and elicit a response far more rapidly when signaled by the coordinated action of multiple sensory modalities. Sensory systems have evolved to work in concert, and normally, different sensory cues that originate from the same event are concordant in both space and time. The products of this spatial and temporal coherence are synergistic intersensory interactions within the central nervous system (CNS), interactions that are presumed to enhance the salience of the initiating event. For example, seeing a speaker's face makes the spoken message far easier to understand, especially in a noisy room (Sumby and Pollack 1954).

Similarly, discordant cues from different modalities can have powerful effects on perception, as illustrated by a host of interesting cross-modal illusions. One of the most compelling of these is the so-called McGurk effect, wherein a speaker lip-synchs the syllable "ga" in time with the sound "ba" (McGurk and MacDonald 1976). The perception is of neither "ga" nor "ba," but a synthesis of the two, "da." Similarly, in the "ventriloquism effect," the sight of movement (i.e., the dummy's head and lips) compels one to believe it is also the source of the sound.

Multisensory neurons, which receive input from more than a single sensory modality, are found in many areas of the CNS (see Stein and Meredith 1993 for a review). These neurons are involved in a number of circuits, and presumably in a variety of cognitive and behavioral functions. For example, multisensory neurons in neocortex are likely participants in the perceptual, mnemonic, and associative processes that serve to bind together the modality-specific components of a multisensory experience. Still other multisensory neurons, positioned at the sensorimotor interface, are known to mediate goal-directed orientation behavior. Such neurons, which are especially common in the superior colliculus (SC), have been the most extensively studied and serve as the model for deciphering how multiple sensory cues are integrated at the level of the single neuron (Stein and Meredith 1993). Visual, auditory, and somatosensory inputs converge on individual neurons in the SC, where each of these modalities is represented in a common coordinate frame. As a result, the modality-specific receptive fields of an individual multisensory neuron represent similar regions of space.

Figure 1. Multisensory integration in a visual-auditory SC neuron. The two receptive fields (RFs) of this neuron (dark gray shading shows the region of their overlap) are shown at the top. Icons depict stimuli: visual (V) is a moving bar of light, auditory (A) is a broad-band noise burst from a speaker either within (A1) or outside (A2) the RF. Below, peristimulus time histograms and bar graphs (means) show responses to the visual stimulus alone (movement is represented by a ramp), the within-field auditory stimulus alone (square wave), and the stimulus combination. The summary bar graph shows that the large response enhancement is greater than the sum of A+V. The bottom panel illustrates the inhibition of the visual response when the auditory stimulus is outside its RF.

An example of an SC neuron's ability to integrate two different sensory inputs is illustrated in figure 1. When presented simultaneously and paired within their receptive fields, a visual and auditory stimulus result in a substantial response enhancement, well above the sum of the two individual responses (see A1V). Conversely, when the auditory stimulus is presented outside its receptive field, the neuron's ability to generate a vigorous response to the visual stimulus is suppressed (see A2V). The timing of these stimuli is critical, and the magnitude of their interaction changes when the interval between the two stimuli is manipulated (Meredith and Stein 1986). However, this interval or "temporal window" generally is quite broad (e.g., several hundred milliseconds).
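The comparison described above can be made concrete with a little arithmetic. The following Python sketch is illustrative only: the function name `interaction_summary` and all spike counts are hypothetical and are not taken from figure 1. It compares the mean combined response with the two unimodal means, and it also reports a percent-change index relative to the best unimodal response, a measure commonly used in this literature but not defined in this article.

```python
from statistics import mean


def interaction_summary(visual, auditory, combined):
    """Summarize a multisensory interaction from per-trial response counts.

    Each argument is a list of responses (e.g., spikes per trial) for one
    stimulus condition. All names here are hypothetical illustrations.
    """
    v, a, c = mean(visual), mean(auditory), mean(combined)
    best_unimodal = max(v, a)
    if best_unimodal == 0:
        raise ValueError("need a nonzero unimodal response to compute the index")

    # Percent change of the combined response relative to the best
    # unimodal response (positive = enhancement, negative = depression).
    index_percent = 100.0 * (c - best_unimodal) / best_unimodal

    if c > v + a:
        label = "superadditive enhancement"  # exceeds the sum of A + V
    elif c > best_unimodal:
        label = "enhancement"
    elif c < best_unimodal:
        label = "depression"
    else:
        label = "no interaction"

    return {"visual": v, "auditory": a, "combined": c,
            "index_percent": index_percent, "label": label}


# Hypothetical spike counts per trial (assumed values, not measured data).
visual_alone = [4, 5, 6, 5]
auditory_within_alone = [2, 3, 2, 1]
auditory_outside_alone = [0, 0, 0, 0]   # assumed: no response outside the RF
paired_within = [14, 16, 15, 15]        # A1V condition
paired_outside = [2, 1, 3, 2]           # A2V condition

within = interaction_summary(visual_alone, auditory_within_alone, paired_within)
outside = interaction_summary(visual_alone, auditory_outside_alone, paired_outside)

print(within["label"], within["index_percent"])
print(outside["label"], outside["index_percent"])
```

With these assumed numbers, the within-field pairing is classed as superadditive because the combined mean exceeds the sum of the unimodal means, whereas the out-of-field pairing falls below the visual response alone and is classed as depression.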

The multisensory interactions that are observable at the level of the single neuron are reflected in the animal's behavior (Stein et al. 1989). Thus, an animal's ability to detect and orient toward a visual stimulus is markedly enhanced when that stimulus is paired with a neutral auditory cue at the same position in space. However, if the auditory cue is spatially disparate from the visual stimulus, the response is strongly degraded.

Although SC neurons can respond to different sensory stimuli via inputs from a variety of structures, their ability to integrate multisensory information depends on projections from a specific region of neocortex (Wallace and Stein 1994). If these inputs from cortex are removed, SC neurons continue to respond to stimuli from different sensory modalities but fail to exhibit the synergistic interactions that characterize multisensory integration. At the behavioral level, animals can still orient normally to unimodal cues, but the benefit derived from combined cues is markedly diminished (Wilkinson, Meredith, and Stein 1996). This intimate relationship between cortex and SC suggests that the higher-level cognitive functions of the neocortex play a substantial role in controlling the information-processing capability of multisensory neurons in the SC, as well as the overt behaviors they mediate.

At present, comparatively little is known about the multisensory integrative properties of the cortical multisensory neurons presumed to be involved in various aspects of perception. However, they have been shown to share some of the features of SC neurons (Wallace, Meredith, and Stein 1992). Future studies detailing their response properties and associated circuitry should greatly aid in our understanding of how multisensory information is used in higher cognitive functions and, in doing so, reveal the neural basis of a fully integrated multisensory experience.

-- Barry E. Stein, Terrence R. Stanford, J. William Vaughan, and Mark T. Wallace

References

McGurk, H., and J. MacDonald. (1976). Hearing lips and seeing voices. Nature 264:746-748.

Meredith, M. A., and B. E. Stein. (1986). Visual, auditory and somatosensory convergence on cells in superior colliculus results in multisensory integration. J. Neurophysiol. 56:640-662.

Stein, B. E., and M. A. Meredith. (1993). The Merging of the Senses. Cambridge, MA: MIT Press.

Stein, B. E., M. A. Meredith, W. S. Huneycutt, and L. McDade. (1989). Behavioral indices of multisensory integration: orientation to visual cues is affected by auditory stimuli. J. Cogn. Neurosci. 1:12-24.

Sumby, W. H., and I. Pollack. (1954). Visual contribution to speech intelligibility in noise. J. Acoust. Soc. Am. 26:212-215.

Wallace, M. T., M. A. Meredith, and B. E. Stein. (1992). Integration of multiple sensory modalities in cat cortex. Exp. Brain Res. 91:484-488.

Wallace, M. T., and B. E. Stein. (1994). Cross-modal synthesis in the midbrain depends on input from cortex. J. Neurophysiol. 71:429-432.

Wilkinson, L. K., M. A. Meredith, and B. E. Stein. (1996). The role of anterior ectosylvian cortex in cross-modality orientation and approach behavior. Exp. Brain Res. 112:1-10.

Further Readings

Cytowic, R. E. (1989). Synesthesia: A Union of the Senses. New York: Springer-Verlag.

Lewkowicz, D. J., and R. Lickliter. (1994). The Development of Intersensory Perception: Comparative Perspectives. Hillsdale, NJ: Erlbaum.

Stein, B. E., M. A. Meredith, and M. T. Wallace. (1994). Neural mechanisms mediating attention and orientation to multisensory cues. In M. Gazzaniga, Ed., The Cognitive Neurosciences. Cambridge, MA: MIT Press, pp. 683-702.

Walk, R. D., and L. H. Pick. (1981). Intersensory Perception and Sensory Integration. New York: Plenum Press.

Welch, R. B., and D. H. Warren. (1986). Intersensory interactions. In K. R. Boff, L. Kaufman, and J. P. Thomas, Eds., Handbook of Perception and Human Performance, vol. 1: Sensory Processes and Perception. New York: Wiley, pp. 25-1-25-36.