The two main functions of hearing are auditory communication and the localization of sounds. Auditory physiology tries to understand the perception, storage, and recognition of various types of sounds for both purposes in terms of neural activity patterns in the auditory pathways. The following article analyzes what auditory representations may have in common with those of other sensory systems, such as the visual system (see VISUAL ANATOMY AND PHYSIOLOGY), and what may be special about them.
Since the days of HELMHOLTZ (1885), the auditory system has been considered to function primarily as a frequency analyzer. According to the work of von Békésy (1960), for which he was awarded the Nobel Prize in 1961, sound reaching the tympanic membrane generates a traveling wave along the basilar membrane in the cochlea of the inner ear. Depending on the frequency of the sound, the traveling wave achieves its maximum amplitude at different locations. Frequency is thus translated into a place code, with high frequencies represented near the base and low frequencies near the apex of the cochlea. Although the traveling wave has a rather broad peak, various synergistic resonance mechanisms ensure effective stimulation of the cochlear hair cells at very precise locations.
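The place code can be made concrete with a back-of-the-envelope computation. The sketch below (Python) uses the Greenwood frequency-position function, a standard empirical description that is not cited in this article; the parameter values are those commonly quoted for the human cochlea and are given here purely for illustration.

```python
import math

# Greenwood-style frequency-to-place map (illustrative human parameters):
#   f(x) = A * (10**(a*x) - k), with x the relative distance from the apex (0..1)
A, a, k = 165.4, 2.1, 0.88

def best_frequency(x):
    """Characteristic frequency (Hz) at relative cochlear position x (0 = apex, 1 = base)."""
    return A * (10.0 ** (a * x) - k)

def best_place(f):
    """Inverse map: relative position of peak excitation for frequency f (Hz)."""
    return math.log10(f / A + k) / a

for f in (100.0, 1000.0, 10000.0):
    print(f"{f:7.0f} Hz -> x = {best_place(f):.2f}")
# Low frequencies map near the apex (x ~ 0), high frequencies near the base (x ~ 1).
```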
Electrophysiological studies using tones of a single frequency (pure tones) have yielded a multitude of valuable data on the responses of neurons to such stimuli and led to the recognition of tonotopic organization in the auditory pathways. Tonotopy, the orderly topographic mapping of best frequency onto position in neural structures, is analogous to retinotopy in the visual and somatotopy in the somatosensory system. The map is preserved by maintaining neighborhood relationships between best frequencies from the cochlea and auditory nerve through the initial stages of the central auditory system, such as the cochlear nuclei, inferior colliculus, and medial geniculate nucleus, to primary auditory cortex (A1; fig. 1). The standard assumption in pure-tone studies is that in order to understand stimulus coding at each subsequent level, one has to analyze the lower levels completely and then establish the transformations taking place from one level to the next (Kiang 1965). While this approach sounds logical, it assumes that the system is linear, which cannot always be taken for granted. Another problem this approach leaves unsolved is how information from different frequency channels is integrated, that is, how complex sounds are analyzed by the auditory system.
The use of complex sound stimuli, therefore, is of the essence in the analysis of higher auditory pathways. This has been done successfully in a number of species with specialized auditory behaviors, such as frogs, songbirds, owls, and bats, for all of which a neuroethological approach based on functional-behavioral data has been adopted (Capranica 1972; see also ANIMAL COMMUNICATION, ECHOLOCATION, and ETHOLOGY). The same approach has been used only sparingly in higher mammals, including primates (Winter and Funkenstein 1973).
The neurophysiological basis of the processing of complex sounds, such as speech (see SPEECH PERCEPTION), cannot be studied in humans directly with invasive methods. Therefore, animal models (e.g., nonhuman primates) have to be used. The question then arises to what extent human speech sounds can validly be applied as stimuli for the study of neurons in a different species. From a biological-evolutionary vantage point, it is more meaningful to employ the types of complex sounds that are used for communication by the species under study (see ANIMAL COMMUNICATION). In using conspecific vocalizations, we can be confident that the central auditory system of the studied species must be capable of processing these calls; human speech sounds, by contrast, may not be processed in the same way by that species.
When human speech sounds are compared with the communication sound systems of other species, it becomes apparent that most systems have certain components in common, which are used as carriers of (semantic) information. Among these distinctive features are segments of frequency changing over time (FM sweeps or "glides") and bandpass noise bursts with specific center frequencies and bandwidths (fig. 2). Such universal elements of auditory communication signals can be used as stimuli with a degree of complexity intermediate between the pure tones of traditional auditory physiology and the whole signal whose representation one really wants to understand.
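Such intermediate-complexity stimuli are straightforward to synthesize digitally. The following sketch (Python/NumPy) generates the two element types named above, a linear FM glide and a one-octave bandpass noise burst; all numerical values (sample rate, duration, frequencies) are arbitrary choices for illustration, not parameters from the studies discussed here.

```python
import numpy as np

fs = 44100                           # sample rate (Hz); all values illustrative
t = np.arange(0, 0.2, 1.0 / fs)      # 200-ms stimulus

# Linear FM glide: instantaneous frequency rises from 500 Hz to 4 kHz.
# Phase is the integral of instantaneous frequency: 2*pi*(f0*t + (f1-f0)*t^2/(2*T)).
f0, f1, T = 500.0, 4000.0, t[-1]
fm_glide = np.sin(2 * np.pi * (f0 * t + (f1 - f0) * t**2 / (2 * T)))

# Bandpass noise burst: white noise restricted to a one-octave band
# around 2 kHz with a brick-wall filter in the frequency domain.
rng = np.random.default_rng(0)
spectrum = np.fft.rfft(rng.standard_normal(t.size))
freqs = np.fft.rfftfreq(t.size, 1.0 / fs)
center = 2000.0
lo, hi = center / np.sqrt(2), center * np.sqrt(2)   # one-octave band edges
spectrum[(freqs < lo) | (freqs > hi)] = 0.0
bp_noise = np.fft.irfft(spectrum, n=t.size)
```

Swapping f0 and f1 yields a downward glide, and shifting the band edges varies the center frequency and bandwidth, the stimulus parameters to which lateral belt neurons were found to be tuned.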
Psychophysical studies have indeed provided evidence for the existence of neural mechanisms tuned to the rate and direction of FM glides (Liberman et al. 1967, Kay 1982) as well as to specific bands of noise (Zwicker 1970). Neurophysiologically, neurons selective for the same parameters have been identified in the auditory cortex of various species. Most notably, a large proportion of FM-selective neurons, as well as neurons tuned to certain bandwidths, has recently been found in the lateral belt areas of the superior temporal gyrus (STG) in rhesus monkeys (Rauschecker, Tian, and Hauser 1995). Functional neuroimaging studies indicate that the posterior STG region in humans also contains mechanisms selective for phoneme identification (see PHONOLOGY, NEURAL BASIS OF).
Many neurons in the lateral belt or STG region of rhesus monkeys (fig. 1B) also respond well, and quite selectively, to the monkey calls themselves. The question arises by what neural mechanisms such selectivity is generated. Studies in which monkey calls are dissected into their constituent elements (in both the spectral and the temporal domain) and the elements are played to the neurons separately or in combination can provide an answer to this question (Rauschecker 1998b). A sizable proportion of neurons in the STG (but not in A1) responds much better to the whole call than to any of its elements. These results indicate that nonlinear summation in the frequency and time domains plays a crucial role in generating selectivity for specific types of calls. Coincidence detection in the time domain is perhaps the most important mechanism shaping this selectivity. Temporal integration acts over several tens (or hundreds) of milliseconds, as most "syllables" in monkey calls (as well as in human speech) are of that duration.
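A toy model may help make this mechanism concrete. The sketch below is not the recorded neurons' actual circuitry but a minimal caricature of nonlinear summation: a unit whose firing threshold neither input drive reaches alone, so that output occurs only when the responses to two call elements coincide within the integration window; all numbers are illustrative.

```python
import numpy as np

def element_response(onset_ms, t_ms, dur_ms=40.0):
    """Idealized input drive while one call element is present (a few tens of ms)."""
    return ((t_ms >= onset_ms) & (t_ms < onset_ms + dur_ms)).astype(float)

def combination_output(r1, r2, threshold=1.5):
    """Coincidence detector: output only where the summed drive exceeds a
    threshold that neither element reaches alone (nonlinear summation)."""
    return np.maximum(r1 + r2 - threshold, 0.0)

t = np.arange(0.0, 300.0, 1.0)        # time axis in ms
r1 = element_response(50.0, t)        # element 1 at 50 ms
r2 = element_response(60.0, t)        # element 2 at 60 ms, overlapping element 1

print("element 1 alone:", combination_output(r1, np.zeros_like(r2)).sum())  # 0.0
print("element 2 alone:", combination_output(np.zeros_like(r1), r2).sum())  # 0.0
print("whole 'call':   ", combination_output(r1, r2).sum())                 # > 0
```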
There is some limited evidence for a columnar or patchy representation of specific types of monkey calls in the lateral belt areas. Rhesus calls can be subdivided into three coarse classes: tonal, harmonic, and noisy calls. Neurons responsive to one or another category are often found grouped together. It would be interesting to look for an orderly "phonetic map" of the constituent elements themselves, in which interactions across the two dimensions of time and frequency might be expected.
It is very likely that the lateral belt areas are not the ultimate stage in the processing of communication sounds. They may represent just an intermediate stage, similar to V4 in the visual system, which likewise contains neurons selective for the size of visual stimuli. Such size selectivity is obviously of great importance for the encoding of visual patterns or objects, but the differentiation into neurons selective for even more specific patterns, such as faces, is not accomplished until an even higher processing stage, namely the inferotemporal cortex (Desimone 1991; see also FACE RECOGNITION). In the auditory cortex, areas in the anterior or lateral parts of the STG or in the dorsal superior temporal sulcus (STS) may be target areas for the exploration of call-specific neurons.
The second main task of hearing is to localize sound sources in space. Because the auditory periphery, unlike the visual and somatosensory peripheries, does not a priori possess a two-dimensional quality, auditory space has to be computed from attributes of sound that vary systematically with spatial location and are thus processed differentially by the central auditory system. This problem is formally similar to the computation of three-dimensional information from two-dimensional sensory input in the visual system. The sound attributes most commonly associated with spatial quality are differences between the sounds arriving at the two ears: both the intensity and the time of arrival of sound from the same source differ when the source is located outside the median plane. Interaural time and intensity differences (ITD and IID, respectively) are registered and mapped as early as the brainstem, in structures such as the superior olivary complex (Irvine 1992). In addition, the spectral composition of the sound arriving at the two ears varies with position, owing to the spectral filter characteristics of the external ears (pinnae) and the head. Even monaurally, specific spectral "fingerprints" can be assigned to spatial locations, with the attenuation of particular frequency bands ("spectral notches") varying systematically with azimuth or elevation (Blauert 1996). Neurons in the dorsal cochlear nuclei are tuned to such spectral notches and may thus be involved in extracting spatial information from complex sounds (Young et al. 1992).
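As an illustration of the binaural cue, the sketch below simulates a broadband source at 30 degrees azimuth using the classical Woodworth spherical-head approximation of the ITD (head radius, sample rate, and all other values are assumptions for illustration) and then recovers the ITD with a Jeffress-style interaural cross-correlation, the kind of delay-line computation classically attributed to the superior olivary complex.

```python
import numpy as np

C, HEAD_RADIUS = 343.0, 0.09    # speed of sound (m/s), head radius (m); assumed values
fs = 48000                      # sample rate (Hz)

def itd_seconds(azimuth_deg):
    """Woodworth spherical-head approximation of the interaural time difference."""
    az = np.radians(azimuth_deg)
    return HEAD_RADIUS / C * (az + np.sin(az))

# Broadband source at 30 degrees azimuth: identical waveforms at the two ears
# except that the nearer ear leads by the ITD; IIDs and pinna filtering ignored.
rng = np.random.default_rng(1)
sound = rng.standard_normal(fs // 10)                 # 100 ms of white noise
lag = int(round(itd_seconds(30.0) * fs))              # ITD in samples
left = np.concatenate([np.zeros(lag), sound])         # far ear, delayed
right = np.concatenate([sound, np.zeros(lag)])        # near ear, leading

# Jeffress-style readout: the interaural cross-correlation peaks at the
# internal delay that compensates for the acoustic ITD.
max_lag = int(0.001 * fs)                             # search +/- 1 ms
n = sound.size - 2 * max_lag
lags = np.arange(-max_lag, max_lag + 1)
xcorr = [np.dot(left[max_lag + k : max_lag + k + n], right[max_lag : max_lag + n])
         for k in lags]
print("true ITD:", lag, "samples; estimated:", lags[int(np.argmax(xcorr))], "samples")
```

An array of such coincidence detectors, each tuned to a different internal delay, would convert ITD into a place code, analogous to the cochlear place code for frequency.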
The information computed by these lower brainstem structures is used by higher centers of the midbrain, such as the inferior and superior colliculi, to guide orienting movements toward sounds. For more "conscious" spatial perception in higher mammals, including humans, auditory cortex seems to be indispensable, as cortical lesions almost completely abolish the ability to judge the direction of sound in space. Neurons in the primary auditory cortex of cats show tuning to the spatial location of a sound presented in free field (Imig, Irons, and Samson 1990). Most recently, an area in the anterior ectosylvian sulcus (AES), which is part of the cat's parietal cortex, has been postulated to be crucially involved in sound localization (Korte and Rauschecker 1993, Middlebrooks et al. 1994). Functional neuroimaging studies in humans also demonstrate specific activation in the posterior parietal cortex of the right hemisphere by virtual auditory space stimuli (Rauschecker 1998a,b).
Both animal and human studies suggest, therefore, that information about auditory patterns or objects is processed, among other regions, in the superior temporal gyrus (STG), whereas auditory spatial information appears to be processed in parietal regions of cortex (fig. 1B). This dual processing scheme is reminiscent of the visual pathways, where a ventral stream has been postulated for the processing of visual object information and a dorsal stream for the processing of visual space and motion (Mishkin, Ungerleider, and Macko 1983; see also VISUAL PROCESSING STREAMS).
Békésy, G. von. (1960). Experiments in Hearing. New York: McGraw-Hill.
Blauert, J. (1996). Spatial Hearing. 2d ed. Cambridge, MA: MIT Press.
Capranica, R. R. (1972). Why auditory neurophysiologists should be more interested in animal sound communication. Physiologist 15:55-60.
Desimone, R. (1991). Face-selective cells in the temporal cortex of monkeys. Journal of Cognitive Neuroscience 3:1-8.
Helmholtz, H. von. (1885). On the Sensations of Tone. Reprinted 1954. New York: Dover Publications.
Imig, T. J., W. A. Irons, and F. R. Samson. (1990). Single-unit selectivity to azimuthal direction and sound pressure level of noise bursts in cat high-frequency primary auditory cortex. Journal of Neurophysiology 63:1448-1466.
Irvine, D. (1992). Auditory brainstem processing. In A. N. Popper and R. R. Fay, Eds., The Mammalian Auditory Pathway: Neurophysiology. New York: Springer, pp. 153-231.
Kay, R. H. (1982). Hearing of modulation in sounds. Physiological Reviews 62:894-975.
Kiang, N. Y.-S. (1965). Stimulus coding in the auditory nerve and cochlear nucleus. Acta Otolaryngologica 59:186-200.
Korte, M., and J. P. Rauschecker. (1993). Auditory spatial tuning of cortical neurons is sharpened in cats with early blindness. Journal of Neurophysiology 70:1717-1721.
Liberman, A. M., F. S. Cooper, D. P. Shankweiler, and M. Studdert-Kennedy. (1967). Perception of the speech code. Psychological Review 74:431-461.
Middlebrooks, J. C., A. E. Clock, L. Xu, and D. M. Green. (1994). A panoramic code for sound location by cortical neurons. Science 264:842-844.
Mishkin, M., L. G. Ungerleider, and K. A. Macko. (1983). Object vision and spatial vision: two cortical pathways. Trends in Neurosciences 6:414-417.
Rauschecker, J. P. (1998a). Parallel processing in the auditory cortex of primates. Audiology and Neuro-Otology 3:86-103.
Rauschecker, J. P. (1998b). Cortical processing of complex sounds. Current Opinion in Neurobiology 8:516-521.
Rauschecker, J. P., B. Tian, and M. Hauser. (1995). Processing of complex sounds in the macaque nonprimary auditory cortex. Science 268:111-114.
Winter, P., and H. H. Funkenstein. (1973). The effects of species-specific vocalization on the discharge of auditory cortical cells in the awake squirrel monkey (Saimiri sciureus). Experimental Brain Research 18:489-504.
Young, E. D., G. A. Spirou, J. J. Rice, and H. F. Voigt. (1992). Neural organization and responses to complex stimuli in the dorsal cochlear nucleus. Philosophical Transactions of the Royal Society of London B 336:407-413.
Zwicker, E. (1970). Masking and psychological excitation as consequences of the ear's frequency analysis. In R. Plomp and G. F. Smoorenburg, Eds., Frequency Analysis and Periodicity Detection in Hearing. Leiden: Sijthoff, pp. 376-394.