Articulation means movement. In speech, articulation is the process by which speech sounds are formed. The articulators are the movable speech organs, including the tongue, lips, jaw, velum, and pharynx. These organs, together with related tissues, comprise the vocal tract, or the resonating cavities of speech production that extend from the larynx (voice box) to the lips or nostrils. Human speech production is accomplished by the coordination of muscular actions in the respiratory, laryngeal, and vocal tract systems. Typically, the word articulation refers to the functions of the vocal tract, but speech production requires the action of all three systems. A full account of speech production would go beyond articulation to include such topics as intonation and emotional expression. Essentially, articulation is the means by which speech is formed to express language (see LANGUAGE PRODUCTION and LANGUAGE AND COMMUNICATION).

Articulation is a suitable topic for cognitive science for several reasons, but especially because it is (1) arguably the most precisely performed of human movements, (2) a serial behavior of exceptional complexity, (3) the most natural means of language expression in all communities except those of people with hearing impairments, and (4) a uniquely human behavior linked to a variety of other accomplishments.

Ordinary conversational speech is produced at rates of five to ten syllables per second, or about twenty to thirty phonemes (sound units that distinguish words) per second. Individual speech sounds therefore have an average duration of approximately fifty milliseconds. This rapid rate has been emphasized in studies of speech perception because no other sound sequence can be perceived at comparable rates of presentation (Liberman et al. 1967). The rapid rate is impressive also from the perspective of production and the motor control processes it entails. Each sound must be uttered in the correct sequence, and each, in turn, requires the precise timing of the movements that distinguish it from other sounds.

Although a given sound can be prototypically defined by its associated movements (e.g., closure of the lips and laryngeal vibrations for the b in boy), the actual pattern of movements varies with other sounds in the sequence to be produced (the phonetic context). Generally, articulatory movements overlap one another and can be mutually adjusted. At any one instant, the articulators may appear to be simultaneously adjusted to the requirements of two or more sounds. For example, the s sound in the word stew is typically produced with lip rounding, but the s sound in the word stay is not. The reason for this difference is that the s sound in the word stew anticipates the lip rounding required for the forthcoming rounded vowel. This phenomenon is called coarticulation and has been one of the most challenging issues in speech production theory. Coarticulation is not restricted to immediately adjacent sounds and may, in fact, extend over several segments and even cross syllable and word boundaries. The complex overlapping of articulatory movements has been the subject of considerable research, as summarized by Fowler and Saltzman (1993) and by Kent and Minifie (1977). Coarticulation is an obstacle to segmentation, or the demarcation of speech behavior into discrete units such as phonemes and words.
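The segment durations cited above follow directly from the rate figures; a one-line calculation (purely illustrative, not part of the original entry) makes the arithmetic explicit:

```python
# At 20-30 phonemes per second, the average segment lasts roughly
# 33-50 ms, consistent with the "approximately fifty milliseconds"
# figure for ordinary conversational speech.
def mean_segment_duration_ms(phonemes_per_second: float) -> float:
    """Average duration of one speech sound, in milliseconds."""
    return 1000.0 / phonemes_per_second

for rate in (20, 30):
    print(f"{rate} phonemes/s -> {mean_segment_duration_ms(rate):.1f} ms/segment")
```

At the slower end of the range (20 phonemes per second), each segment averages 50 ms; at the faster end (30 per second), about 33 ms.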

With the exception of SIGN LANGUAGES used by people who are deaf, speech is the primary means of communication in all human communities. Speech is therefore closely related to language (and to the auditory perception of language) and is often the only means by which a particular language can be studied, because the majority of the world's languages do not have a written form. Speech appears to be unique to humans (see ANIMAL COMMUNICATION). Because speech is harnessed to language, it is difficult or impossible to gain a deep understanding of speech apart from its linguistic service. As Fujimura (1990) observed, "While speech signals convey information other than linguistic codes, and the boundary between linguistic and extra- or paralinguistic issues may not be clearcut, there is no question that the primary goal of speech research is to understand the relation of the units and organization of linguistic forms to the properties of speech signals uttered and perceived under varying circumstances" (p. 244). The output of the phonological component of the grammar has often been assumed to be the input to the system that regulates speech production (see PHONETICS and PHONOLOGY).

Because the speech signal is perishable, expression and perception of its serial order are essential to communication by speech. In his classic paper, LASHLEY (1951) considered speech as exemplary of the problem of serial order in human behavior. He proposed three mechanisms for the control of seriation: determining tendency (the idea to be expressed), activation of the selected units (meaning that they are primed for use but not yet serially ordered), and the schema of order (or the syntax of the act that finally yields a serial ordering of the intended utterance). Lashley's insights illuminate some of the major cognitive dimensions of articulation, and his ideas resonate in contemporary studies of speech. One area in particular is the study of sequencing errors (e.g., substitutions, deletions, and exchanges of segments) in both normal and pathological speech. These errors have attracted careful study because of the belief that mistakes in the motor output of speech can reveal the underlying organization of speech behavior. Large corpora of speech errors have been collected and analyzed in attempts to discover the structures of speech organization (Fromkin 1980). But this is only part of the problem of serial order in speech. It is also necessary to understand how individual movements are coordinated to meet the needs of intelligibility while being energetically efficient (Kelso, Saltzman, and Tuller 1986; MacNeilage 1970).
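An exchange error of the kind mentioned above can be given a schematic form: if syllables are represented as sequences of segments, an onset exchange simply swaps the initial segments of two planned syllables. The sketch below is a toy illustration invented for this entry, not a model from the literature:

```python
# Toy model of an exchange (spoonerism-type) sequencing error:
# the word-initial segments of two planned syllables trade places,
# while the remainders stay in their planned serial positions.
def exchange_onsets(syll1, syll2):
    """Swap the first segment of two syllables, given as phoneme lists."""
    (o1, *rest1), (o2, *rest2) = syll1, syll2
    return [o2, *rest1], [o1, *rest2]

# "left hemisphere" -> "heft lemisphere"-style onset exchange
a, b = exchange_onsets(["l", "e", "f", "t"], ["h", "e", "m"])
print(a, b)  # ['h', 'e', 'f', 't'] ['l', 'e', 'm']
```

That such errors preserve syllable structure (onsets exchange with onsets, not with codas) is one of the regularities that makes error corpora informative about the organization of speech planning.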

A number of laboratory techniques have been developed to study speech production. The two major methodologies are physiological and acoustic. Physiological methods are diverse because no single method is suited to study the different structures and motor systems involved in speech. Among the methods used are electromyography, aerodynamics, various kinds of movement transduction, X-ray, and photoelectrical techniques (Stone 1997). Of these, X-ray techniques have provided the most direct information, but, to avoid the hazards of X-ray exposure, investigators have turned to alternatives such as miniature magnetometer systems. Acoustic studies offer the advantages of economy, convenience, and a focus on the physical signal that mediates between speaker and listener. Acoustic methods are limited to some degree because of uncertainties in inferring articulatory actions from the acoustic patterns of speech (Fant 1970), but acoustic analysis has been a primary source of information on articulation and its relation to speech perception (Fujimura and Erickson 1997, Stevens 1997).

Among the most influential theories or models of articulation have been stage models, dynamic systems, and connectionist networks. In stage models, information is successively processed in serially or hierarchically structured components (Meyer and Gordon 1985). Dynamic systems theories seek solutions in terms of task-dependent biomechanical properties (Kelso, Saltzman, and Tuller 1986). Connectionist networks employ massively parallel architectures that are trained with various kinds of input information (Jordan 1991). Significant progress has been made in the computer simulation of articulation, beginning in the 1960s (Henke 1966) and extending to contemporary efforts that combine various knowledge structures and control strategies (Saltzman and Munhall 1989, Guenther 1995, Wilhelms-Tricarico 1996). This work is relevant both to the understanding of how humans produce speech and to the development of articulatory speech synthesizers (see SPEECH SYNTHESIS).

A major construct of recent theorizing about speech articulation is the gesture, defined as an abstract characterization of an individual movement (e.g., closure of the lips). It has been proposed that gestures for individual articulators are combined in a motor score that specifies the movements for a particular phonetic sequence. A particularly appealing property of the gesture is its potential as a construct in phonology (Browman and Goldstein 1986), speech production (Saltzman and Munhall 1989), speech perception (Fowler 1986), and speech development in children (Goodell and Studdert-Kennedy 1993).
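The notion of a motor score of overlapping gestures can be pictured with a toy timeline. The gestures, labels, and timing values below are invented for illustration (they are not measured data or part of any published model); the point is only that coarticulation falls out naturally when gestures are represented as overlapping activation intervals:

```python
# Toy "gestural score": each gesture pairs an articulator-level label
# with an activation interval in milliseconds. Overlap between
# intervals stands in for coarticulation, e.g. lip rounding for the
# vowel in "stew" beginning during the preceding s.
gestures = [
    ("tongue tip: s constriction",   0, 120),
    ("lip rounding: rounded vowel", 60, 260),   # starts during the s
    ("tongue body: vowel target",  110, 260),
]

def overlapping(g1, g2):
    """True if two gestures' activation intervals overlap in time."""
    (_, start1, end1), (_, start2, end2) = g1, g2
    return start1 < end2 and start2 < end1

print(overlapping(gestures[0], gestures[1]))  # True: rounding overlaps the s
```

On this view, the rounded s in stew and the unrounded s in stay need not be stored as distinct units; they arise from the same tongue-tip gesture combined with differently timed neighboring gestures.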

-- Raymond D. Kent


Browman, C., and L. Goldstein. (1986). Towards an articulatory phonology. In C. Ewen and J. Anderson, Eds., Phonology Yearbook 3, pp. 219-252. Cambridge: Cambridge University Press.

Fowler, C. A. (1986). An event approach to the study of speech perception from a direct-realist perspective. Journal of Phonetics 14:3-28.

Fowler, C. A., and E. Saltzman. (1993). Coordination and coarticulation in speech production. Language and Speech 36:171-195.

Fromkin, V. A., Ed. (1980). Errors in Linguistic Performance: Slips of the Tongue, Ear, Pen and Hand. New York: Academic Press.

Fujimura, O., and D. Erickson. (1997). Acoustic phonetics. In W. J. Hardcastle and J. Laver, Eds., Handbook of Phonetic Sciences, pp. 65-115. Cambridge, MA: Blackwell.

Goodell, E. W., and M. Studdert-Kennedy. (1993). Acoustic evidence for the development of gestural coordination in the speech of 2-year-olds: a longitudinal study. Journal of Speech and Hearing Research 36:707-727.

Guenther, F. H. (1995). Speech sound acquisition, coarticulation and rate effects in a neural network model of speech production. Psychological Review 102:594-621.

Henke, W. L. (1966). Dynamic Articulatory Model of Speech Production Using Computer Simulation. Ph.D. diss., MIT.

Jordan, M. I. (1991). Serial order: a parallel distributed processing approach. In J. L. Elman and D. E. Rumelhart, Eds., Advances in Connectionist Theory: Speech. Hillsdale, NJ: Erlbaum, pp. 214-249.

Kelso, J. A. S., E. L. Saltzman, and B. Tuller. (1986). The dynamical perspective on speech production: data and theory. Journal of Phonetics 14:29-59.

Kent, R. D., and F. D. Minifie. (1977). Coarticulation in recent speech production models. Journal of Phonetics 5:115-133.

Lashley, K. (1951). The problem of serial order in behavior. In L. A. Jeffress, Ed., Cerebral Mechanisms in Behavior, pp. 506-528. New York: Wiley.

Liberman, A. M., F. S. Cooper, D. P. Shankweiler, and M. Studdert-Kennedy. (1967). Perception of the speech code. Psychological Review 74:431-461.

MacNeilage, P. (1970). Motor control of serial ordering of speech. Psychological Review 77:182-196.

Meyer, D. E., and P. C. Gordon. (1985). Speech production: motor programming of phonetic features. Journal of Memory and Language 24:3-26.

Saltzman, E. L., and K. G. Munhall. (1989). A dynamical approach to gestural patterning in speech production. Ecological Psychology 1:333-382.

Shattuck-Hufnagel, S. (1983). Sublexical units and suprasegmental structure in speech production planning. In P. MacNeilage, Ed., The Production of Speech. New York: Springer, pp. 109-136.

Stevens, K. N. (1997). Articulatory-acoustic-auditory relationships. In W. J. Hardcastle and J. Laver, Eds., Handbook of Phonetic Sciences, Cambridge, MA: Blackwell, pp. 462-506.

Stone, M. (1997). Laboratory techniques for investigating speech articulation. In W. J. Hardcastle and J. Laver, Eds., Handbook of Phonetic Sciences, Cambridge, MA: Blackwell, pp. 11-32.

Wilhelms-Tricarico, R. (1996). A biomechanical and physiologically-based vocal tract model and its control. Journal of Phonetics 24:23-38.

Further Readings

Fant, G. (1980). The relations between area functions and the acoustic signal. Phonetica 37:55-86.

Fujimura, O. (1990). Articulatory perspectives of speech organization. In W. J. Hardcastle and J. Laver, Eds., Speech Production and Speech Modelling. Dordrecht: Kluwer Academic Publishers, pp. 323-342.

Kent, R. D., S. G. Adams, and G. S. Turner. (1996). Models of speech production. In N. J. Lass, Ed., Principles of Experimental Phonetics. St. Louis: Mosby, pp. 3-45.

Kent, R. D., B. S. Atal, and J. L. Miller, Eds. (1991). Papers in Speech Communication: Speech Production. Woodbury, NY: Acoustical Society of America.

Levelt, W. J. M. (1989). Speaking: From Intention to Articulation. Cambridge, MA: MIT Press.

Lindblom, B. E. F., and J. E. F. Sundberg. (1971). Acoustical consequences of lip, tongue, jaw and larynx movement. Journal of the Acoustical Society of America 50:1166-1179.

Lofqvist, A. (1997). Theories and models of speech production. In W. J. Hardcastle and J. Laver, Eds., Handbook of Phonetic Sciences. Cambridge, MA: Blackwell, pp. 405-426.

Mermelstein, P. (1973). Articulatory model of speech production. Journal of the Acoustical Society of America 53:1070-1082.

Ohman, S. E. G. (1967). Numerical model of coarticulation. Journal of the Acoustical Society of America 41:310-320.

Perkell, J. S. (1997). Articulatory processes. In W. J. Hardcastle and J. Laver, Eds., Handbook of Phonetic Sciences. Cambridge, MA: Blackwell, pp. 333-370.

Smith, A. (1992). The control of orofacial movements in speech. Critical Reviews in Oral Biology and Medicine 3:233-267.

Stevens, K. N. (1989). On the quantal nature of speech. Journal of Phonetics 17:3-46.