Psycholinguistics is the study of people's actions and mental processes as they use language. At its core are speaking and listening, which have been studied in domains as different as LANGUAGE ACQUISITION and language disorders. Yet the primary domain of psycholinguistics is everyday language use.

Speaking and listening have several levels. At the bottom are the perceptible sounds and gestures of language: how speakers produce them, and how listeners hear, see, and identify them (see PHONETICS, PHONOLOGY, SIGN LANGUAGES). One level up are the words, gestural signals, and syntactic arrangement of what is uttered: how speakers formulate utterances, and how listeners identify them (see SENTENCE PROCESSING). At the next level up are communicative acts: what speakers do with their utterances, and how listeners understand what they mean (see PRAGMATICS). At the highest level is DISCOURSE, the joint activities people engage in as they use language. At each level, speakers and listeners have to coordinate their actions.

Speakers plan what they say more than one word at a time. In conversation and spontaneous narratives, they tend to plan in intonation units, generally a single major clause or phrase delivered under a unifying intonation contour (Chafe 1980). Intonation units take time to plan, so they often begin with pauses and disfluencies (uh or um, elongated words, repeated words). For example, one speaker recounting a film said: "[1.0 sec pause] A--nd u--m [2.6 sec pause] you see him taking . . . picking the pears off the leaves."

Planning such units generally proceeds from the top level of language down -- from intention to ARTICULATION (Levelt 1989). Speakers decide on a message, then choose constructions for expressing it, and finally program the phonetic segments for articulating it. They do this in overlapping stages.

Formulation starts at a functional level. Consider a woman planning "Take the steaks out of the freezer." First she chooses the subject, verb, direct object, and source she wants to express, roughly "the addressee is to get meat from a freezer." Then she chooses an appropriate syntactic frame, an imperative construction with a verb, object, and source location. She then retrieves the stems she needs -- the verb take, the noun steak, and the verb freeze. Finally, she fills in the necessary syntactic elements -- the article the, the preposition out of, and the suffixes -s and -er. Formulation then proceeds to a positional level, where she creates a phonetic plan for what she has formulated so far. She uses the plan to program her articulatory organs (tongue, lips, glottis) to produce the actual sounds, "Take the steaks out of the freezer." Processing at these levels overlaps: she plans later phrases while articulating earlier ones.

Much of the evidence for these stages comes from slips of the tongue collected over the past century (Fromkin 1973; Garrett 1980). Suppose that the speaker of the last example had, by mistake, transposed steak and freeze as she introduced them. She would then have added -s to freeze and -er to steak and produced "Take the freezes out of the steaker." Other slips occur at the positional level, as when the initial sounds in left hemisphere are switched to form heft lemisphere.

Listeners are often thought to work from the bottom up. They are assumed to start with the sounds they hear, infer the words and syntax of an utterance, and, finally, infer what the speakers meant. The actual picture is more complicated. In everyday conversation, listeners have a good idea of what speakers are trying to do, and working top down, they use this information to help them identify and understand what they hear (Tanenhaus and Trueswell 1995).

Spoken utterances are identified from left to right by an incremental process of elimination (Marslen-Wilson 1987). As listeners take in the sounds of "elephant," for example, they narrow down the words it might be. They start with an initial cohort of all words beginning with "e" (roughly 1000 words), narrow that to the cohort of all words beginning with "el" (roughly 100 words), and so on. By the sound "f" the cohort contains only one word, allowing them to identify the word as "elephant." In this way listeners often identify a word before it is complete. Evidence also suggests that listeners access all of the meanings of the words in these cohorts (Swinney 1979). For example, the moment they identify "bugs" in "He found several bugs in the corner of his room" they activate the two meanings "insects" and "hidden microphones." Remarkably, they activate the same two meanings in "He found several spiders, roaches, and other bugs in the corner of his room," even though the context rules out microphones. But after only 0.2 to 0.4 seconds "hidden microphones" gets suppressed in favor of "insects."
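The elimination process just described amounts to incremental prefix filtering over a lexicon. The sketch below uses a toy six-word lexicon (an illustrative assumption, not the thousand-word cohorts mentioned in the text) to show the cohort shrinking sound by sound until only one candidate remains.

```python
# Minimal sketch of the cohort model of spoken word recognition:
# after each incoming sound, keep only the words consistent with
# the input so far.

def cohort(lexicon, sounds):
    """Yield (prefix, candidate set) after each successive sound."""
    prefix = ""
    for sound in sounds:
        prefix += sound
        yield prefix, {w for w in lexicon if w.startswith(prefix)}

# A toy lexicon; real initial cohorts number in the hundreds or more.
lexicon = {"elephant", "elegant", "element", "eleven", "echo", "engine"}

for prefix, candidates in cohort(lexicon, "elephant"):
    print(prefix, sorted(candidates))
    if len(candidates) == 1:
        break  # uniqueness point: the word is identified before it ends
```

Here the cohort shrinks to a single candidate at "elep," well before the end of the word, which is the model's account of how listeners can recognize a word before it is complete.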

Still, listeners do use top-down information in identifying words and constructions (Tanenhaus et al. 1995). When people are placed at a table with many objects on it and are asked, "Pick up the candle," they move their gaze to the candle before they reach for it. Indeed, they start to move their eyes toward the candle about 50 msec before the end of "candle." But if there is candy on the table along with the candle, they do not start to move their eyes until 30 msec after the end of "candle." As a sentence, "Put the apple on the towel in the box" may mean either (1) an apple is to go on a towel that is in a box, or (2) an apple on a towel is to go into a box. Without context, listeners strongly prefer interpretation 1. But when people are placed at a table with two apples, one on a towel and another on a napkin, their eye movements show that they infer interpretation 2 from the beginning. In identifying utterances, then, listeners are flexible in the information they exploit -- auditory information, knowledge of syntax, and the context.

Speaking and listening are not autonomous processes. People talk in order to do things together, and to accomplish that they have to coordinate speaking with listening at many levels (Clark 1996).

One way people coordinate in conversation is with adjacency pairs. An adjacency pair consists of two turns, the first of which projects the second, as in questions and answers:

In his first turn Sam proposes a simple joint project, that he and Duncan exchange information about what Duncan is. In the next turn Duncan takes up his proposal, completing the joint project, by giving the information Sam wanted. People use adjacency pairs for establishing joint commitments throughout conversations. They use them for openings (as in the exchange "Hey, Barbara" "Yes?") and closings ("Bye" "Bye"). They use them for setting up narratives ("Tell you who I met yesterday --" "Who?"), elaborate questions ("Oh there's one thing I wanted to ask you" "Mhm"), and other extended joint projects.

Speakers use their utterances to perform illocutionary acts -- assertions, questions, requests, offers, promises, apologies, and the like -- acts that differ in the uptake they project. Most constructions (e.g., "Sit down") can be used for more than one illocutionary act (e.g., a command, a request, an advisory), so speakers and listeners have to coordinate on what is intended. One way they coordinate is by treating each utterance as a contribution to a larger joint project. For example, when restaurant managers were asked on the telephone, "Do you accept American Express cards?" they inferred that the caller had an American Express card and wanted a "yes" or "no" answer. But when they were asked "Do you accept any kinds of credit cards?" they inferred the caller had more than one credit card and wanted a list of the cards they accepted ("Visa and Mastercard"). Listeners draw such inferences more quickly when the construction is conventionally used for the intended action. "Can you tell me the time?" is a conventional way to ask for the time, making it harder to construe as a question about ability (Gibbs 1994).

People work hard in conversation to establish that each utterance has been understood as intended (Clark 1996). To do that, speakers monitor their speech for problems and repair them as quickly as reasonable (Levelt 1983; Schegloff, Jefferson, and Sacks 1977). In "if she'd been -- he'd been alive," the speaker discovers that "she" is wrong, replaces it with "he," and continues. Listeners also monitor and, on finding problems, often ask for repairs, as Barbara does here:

People monitor at all levels of speaking and listening. Speakers, for example, monitor their addressees for lapses of attention, mishearings, and misunderstandings. They also monitor for positive evidence of attention, hearing, and understanding, evidence that addressees provide. Addressees, for example, systematically signal their attention with eye gaze and acknowledge hearing and understanding with "yeah" and "uh huh."

Speaking and listening are not the same in all circumstances. They vary with the language (English, Japanese, etc.), with the medium (print, telephones, video, etc.), with age (infants, adults, etc.), with the genre (fiction, parody, etc.), with the trope (irony, metaphor, etc.), and with the joint activity (gossip, court trials, etc.). Accounting for these variations remains a major challenge for psycholinguistics.

-- Herbert H. Clark


Chafe, W. (1980). The deployment of consciousness in the production of a narrative. In W. Chafe, Ed., The Pear Stories. Norwood, NJ: Ablex, pp. 9-50.

Clark, H. H. (1996). Using Language. Cambridge: Cambridge University Press.

Fromkin, V. A., Ed. (1973). Speech Errors as Linguistic Evidence. The Hague: Mouton.

Garrett, M. F. (1980). Syntactic processes in sentence production. In B. Butterworth, Ed., Speech Production. New York: Academic Press, pp. 170-220.

Gibbs, R. W., Jr. (1994). The Poetics of Mind: Figurative Thought, Language, and Understanding. Cambridge: Cambridge University Press.

Levelt, W. J. M. (1983). Monitoring and self-repair in speech. Cognition 14:41-104.

Levelt, W. J. M. (1989). Speaking. Cambridge, MA: MIT Press.

Marslen-Wilson, W. (1987). Functional parallelism in spoken word recognition. Cognition 25:71-102.

Schegloff, E. A., G. Jefferson, and H. Sacks. (1977). The preference for self-correction in the organization of repair in conversation. Language 53:361-382.

Swinney, D. A. (1979). Lexical access during sentence comprehension: (Re)consideration of context effects. Journal of Verbal Learning and Verbal Behavior 18:645-660.

Tanenhaus, M. K., M. J. Spivey-Knowlton, K. M. Eberhard, and J. C. Sedivy. (1995). Integration of visual and linguistic information in spoken language comprehension. Science 268:1632-1634.

Tanenhaus, M. K., and J. C. Trueswell. (1995). Sentence comprehension. In J. L. Miller and P. D. Eimas, Eds., Handbook of Perception and Cognition: Speech, Language, and Communication. 2nd ed. San Diego: Academic Press, pp. 217-262.