Syntax

Syntax is the study of the part of the human linguistic system that determines how sentences are put together out of words. Syntax interfaces with the semantic and phonological components, which interpret the representations ("sentences") provided by syntax. It has emerged in the last several decades that the system is largely universal across languages and language types, and therefore presumably innate, with narrow channels within which languages may differ from one another (Chomsky 1965; 1981b), a conclusion confirmed by studies of child language acquisition (Crain 1990 and numerous others).

The term syntax is also used to refer to the "structure" of sentences in a particular language. One aspect of the syntactic structure of sentences is the division of a sentence into phrases, and those phrases into further phrases, and so forth. The range of phrase types is quite specific and generally considered to be innately determined (and hence not learned) and so are the rules that determine how phrases are formed from words and other phrases. Essentially, the inventory of phrase types is derived from the inventory of the parts of speech, or lexical categories (noun, verb, etc.). First, any part of speech can be the "head" or nucleus of a phrase. Then, larger phrases are built from a head by combining it with another phrase in one of three ways: the added phrase can express an argument of the head; it can modify the head; or it can be a "specifier" of the head:

(1)

a. argument:	see + the boy	[V NP]_VP
b. modification:	see + clearly	[VP Adv]_VP
c. specification:	the + boys	[Article N]_NP

The structure of a phrase is indicated by surrounding the parts of the phrase with brackets labeled with the name of the phrase; hence the notation [V NP]_VP asserts that a Verb Phrase (VP) can consist of a verb followed by a Noun Phrase (NP). The theory of phrase structure is sometimes called X-BAR THEORY, as the phrasal labels are alternatively written with a "bar" over the lexical category instead of a P after it: V" (for VP), N" (for NP), and so on.

Principles of this sort determine that every sentence has as part of its linguistic description a tree-like structure, which shows how the component phrases are related to each other by these three relations:

(2): a. (tree diagram not reproduced; it depicts the same structure as the labeled bracketing in b)

: b. [[John]_NP [[[sees]_V [[the]_Art [boys]_N]_NP]_VP [clearly]_Adv]_VP]_S

The two representations (a,b) are identical in their content -- they both represent the phrase structure of the sentence John sees the boys clearly. The tree representation is often used because it is easier to read. "S" stands for sentence.
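The equivalence of the two representations can be made concrete with a short program. The following is a minimal sketch, not part of the original presentation: it assumes the "[ ... ]_Label" bracketing convention used here, and the function names parse, render, and leaves are hypothetical. It reads the labeled bracketing for John sees the boys clearly and prints the corresponding tree as an indented outline.

```python
import re

def parse(text):
    """Parse '[ ... ]_Label' bracketing into (label, children) tuples.

    Children are either nested (label, children) tuples or plain words.
    """
    tokens = re.findall(r"\[|\]_\w+|[^\s\[\]]+", text)
    pos = 0

    def node():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "[":
            raise ValueError("expected an opening bracket")
        pos += 1
        children = []
        while pos < len(tokens) and not tokens[pos].startswith("]"):
            if tokens[pos] == "[":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        if pos >= len(tokens):
            raise ValueError("unbalanced bracketing")
        label = tokens[pos][2:]  # drop the leading ']_'
        pos += 1
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("unbalanced bracketing")
    return tree

def render(tree, depth=0):
    """Return the tree as a list of indented lines, one node or word per line."""
    label, children = tree
    lines = ["  " * depth + label]
    for child in children:
        if isinstance(child, str):
            lines.append("  " * (depth + 1) + child)
        else:
            lines.extend(render(child, depth + 1))
    return lines

def leaves(tree):
    """Return the words of the tree in left-to-right order."""
    _, children = tree
    words = []
    for child in children:
        words.extend([child] if isinstance(child, str) else leaves(child))
    return words

SENTENCE = "[[John]_NP [[[sees]_V [[the]_Art [boys]_N]_NP]_VP [clearly]_Adv]_VP]_S"
tree = parse(SENTENCE)
print("\n".join(render(tree)))
```

Reading the words of the tree from left to right recovers the original sentence, which is the sense in which the bracketing and the tree carry the same content.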

Ambiguity of phrasing arises when a single string of words can be associated with two different phrase structures according to the phrase definitions of a language; for example:

(3): John sees [the boys with the telescope]_NP; John sees [the boys]_NP with the telescope

In one structure, the boys have the telescope; in the other, John does.
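The ambiguity in (3) can be reproduced mechanically. The sketch below assumes a toy grammar and lexicon invented for illustration (the rule table BINARY, the table LEXICON, and the function parses are all hypothetical and cover only the words of this one example). It enumerates every phrase structure the grammar assigns to a string of words.

```python
from functools import lru_cache

# Toy phrase-structure rules (an assumption for illustration only).
# Each parent category lists the pairs of daughters it may consist of.
BINARY = {
    "S": [("NP", "VP")],
    "VP": [("V", "NP"), ("VP", "PP")],   # argument; modification of VP
    "NP": [("Art", "N"), ("NP", "PP")],  # specification; modification of NP
    "PP": [("P", "NP")],
}

# Toy lexicon: one category per word.
LEXICON = {
    "John": "NP", "sees": "V", "the": "Art",
    "boys": "N", "with": "P", "telescope": "N",
}

def parses(words, category="S"):
    """Return every labeled bracketing of `words` as `category`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def build(cat, start, end):
        results = []
        # A single word can be a phrase of its lexical category.
        if end - start == 1 and LEXICON.get(words[start]) == cat:
            results.append(f"[{words[start]}]_{cat}")
        # Otherwise split the span in two and try each binary rule.
        for left, right in BINARY.get(cat, []):
            for mid in range(start + 1, end):
                for l in build(left, start, mid):
                    for r in build(right, mid, end):
                        results.append(f"[{l} {r}]_{cat}")
        return tuple(results)

    return list(build(category, 0, len(words)))

ambiguous = parses("John sees the boys with the telescope".split())
for structure in ambiguous:
    print(structure)
```

Under these assumed rules the string receives two structures: one in which with the telescope is part of the object NP, and one in which it attaches to the VP.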

Great uniformity is found in how the three relations in (1) are instantiated in languages. One dimension of variation is the position of the head in its phrase. In a language like English, the head uniformly precedes its arguments (except for the subject); in Japanese, the head uniformly follows its arguments:

(4)

English				Japanese
VP:	[V NP]_VP	read a book	[NP V]_VP	sakana o taberu 'fish eat'
PP:	[P NP]_PP	to New York	[NP P]_PP	New York ni 'New York to'

We see here that in Japanese not only does the argument of the verb precede the verb, but the object of the preposition precedes the preposition (for which reason, it is called a postposition). Languages with postpositions instead of prepositions always show "object-verb" word order; because the verb and preposition are the heads of their respective phrases, this suggests that the head-order parameter is set once and for all for a given language, all phrases in that language taking the same value (Greenberg 1963).

Thus every language instantiates the "argument of" relation as a lexical item (the head) combined with a phrase (the argument), but languages differ in where the head stands in relation to the argument phrase. The rich concepts here -- the notion "phrase," the notion "argument of," and so on -- are innate; what must be learned is only the left-right order of head and argument. The ratio of learned things to innate things here is typical of syntax. The parameters of syntactic variation, of which head position is one, appear to be limited in number, and individually, limited in scope. The head parameter, for example, can only take "left" and "right" as values, meaning that there are only two types of languages in regard to head position.
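The division of labor between what is fixed and what is learned can be pictured as a single function with one two-valued setting. This is only an illustrative sketch: the function build_phrase and its head_position argument are hypothetical names, and the example words are those of (4).

```python
def build_phrase(head, argument, head_position):
    """Combine a head with its argument phrase.

    The combination itself is the same for every language; only
    `head_position` ("left" or "right") varies from language to language.
    """
    if head_position == "left":
        return f"{head} {argument}"
    if head_position == "right":
        return f"{argument} {head}"
    raise ValueError("the head parameter takes only 'left' or 'right'")

ENGLISH = "left"    # head precedes its argument
JAPANESE = "right"  # head follows its argument

print(build_phrase("read", "a book", ENGLISH))        # VP
print(build_phrase("to", "New York", ENGLISH))        # PP
print(build_phrase("taberu", "sakana o", JAPANESE))   # VP
print(build_phrase("ni", "New York", JAPANESE))       # PP
```

A single setting per language fixes the order in both VP and PP, which is the pattern shown in (4).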

Another aspect of the syntactic structure of a sentence is "movement" relations that hold between one syntactic position in a sentence and another. Among these are the relation of a question word in a question to the grammatical position in the sentence on which the question pivots:

(5): What_i does John think that Bill wants t_i?

The position of the "trace" (t_i) is the "understood" position of the wh-word what -- the question is about the "object" of Bill's desires, so to speak. What is an "argument of" wants but does not appear in the correct position for such arguments; it has been "moved" to the front of the sentence. The relation between the moved wh-phrase and its understood position (marked by its trace, t_i) is called WH-MOVEMENT. (5) is understood to have a phrase-structure representation of the following form (its "d(eep)-structure") transformed into (5) by an operation moving the wh-word to the front:

(6): John does think that Bill wants what?

Wh-movement is an instance of what is called a "grammatical transformation."
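The mapping from (6) to (5) can be sketched as an operation on a list of words. The code below is a deliberately crude illustration under stated assumptions: it works on a flat word string rather than on phrase structure, the word lists WH_WORDS and AUXILIARIES and the function wh_move are hypothetical, and the fronting of the auxiliary does is added only so that the output matches (5). It enforces none of the restrictions on the movement relation.

```python
WH_WORDS = {"what", "who"}          # assumed, minimal list
AUXILIARIES = {"do", "does", "did"}  # assumed, minimal list

def wh_move(d_structure, index="i"):
    """Front the wh-word of a flat d-structure, leaving a coindexed trace."""
    words = list(d_structure)
    positions = [n for n, w in enumerate(words) if w.lower() in WH_WORDS]
    if len(positions) != 1:
        raise ValueError("this sketch expects exactly one wh-word")
    pos = positions[0]
    wh_word = words[pos]
    words[pos] = f"t_{index}"  # the trace marks the understood position

    # Assumption beyond the text: the first auxiliary is also fronted,
    # so that the result has the word order of example (5).
    for n, w in enumerate(words):
        if w.lower() in AUXILIARIES:
            words.insert(0, words.pop(n))
            break

    return [f"{wh_word.capitalize()}_{index}"] + words

surface = wh_move("John does think that Bill wants what".split())
print(" ".join(surface) + "?")
```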

Wh-movement is a relation of very particular character, as the following examples might suggest, if examined closely:

(7): a. What_i does John think that Bill said that Mary would like t_i?; b. *What_i does John think that t_i would please Bill?; c. *What_i does John think t_i time it is?; d. *What_i does John wonder who has t_i?

The prefixed asterisk marks strings of words that are ungrammatical but that would correspond to reasonable questions if they were grammatical. (7a) suggests that the relation may span perhaps an arbitrarily large amount of sentence structure. But the rest of the examples suggest sharp limitations on the relation. The movement relation has been studied in great detail in a number of languages and language families. A core set of restrictions on the relation, some of which are illustrated in (7), has been found to hold universally. For example, (7b) illustrates what has been called the "Empty Category Principle" (ECP; Chomsky 1981a).

The movement relation occurs in a number of sentence and clause types besides questions:

(8)

a.	[What a fool]_i John turned out to be t_i.	(exclamative)
b.	The man who_i Mary thinks she met t_i.	(relative clause)
c.	Happy_i though Mary is t_i, she is still insensitive to others.	(though-clause)
d.	John_i I think I saw t_i in the store yesterday.	(topicalized clause)
e.	John saw [more people]_i than we thought he had seen t_i.	(comparative clause)

Exactly the same restrictions illustrated in (7) for questions hold for all these further types as well, suggesting the deep systematicity of the principles involved; for example, the ECP holds for comparative clauses, as the following shows:

(9): a. *John saw [more people]_i than we thought that t_i were there.; b. John saw [more people]_i than we thought t_i were there.

The mental computation of the relation of a wh-word to its trace is easily detected by psycholinguistic testing of on-line real-time sentence comprehension, on which it imposes an extra processing load.

Languages can differ in some limited ways in this aspect of sentence structure -- for example, in whether the wh-trace relation is instantiated for a given sentence type in the language. Chinese, unlike English, does not use movement for questions; the wh-word stays in its "original" position (Huang 1982):

(10): Zhangsan xiangxin [shei mai-le shu]; Zhangsan believes [who bought books]; 'Who does Zhangsan believe bought books?'

But if a language instantiates a sentence type with wh-movement, the movement will have the same very particular character it has in every other language that instantiates it.

The syntax of a language describes a set of forms (as in (3)) with which a sound and meaning can be associated, and so mediates between these two palpable aspects of a sentence. The interface of syntax to the sound and meaning components of the language system seems again largely universal. For example, languages typically have pronouns like the English reflexive and reciprocal pronouns (himself, each other) that require antecedents in the same utterance in which the pronouns themselves occur; such pronouns are called anaphors (see ANAPHORA):

(11): John likes himself.; *Why did John succeed? Because himself is ambitious.; Why did John succeed? Because he is ambitious.

A universal property of anaphors is that their antecedents cannot be contained in a phrase that does not contain the anaphor itself:

(12): a. [John's mother]_NP likes herself.; b. *[John's mother]_NP likes himself.

In both cases, John is contained in the subject NP John's mother; in the first case, the antecedent is John's mother, and so the "containment condition" is satisfied; but in the second case, John is the antecedent, so the condition is violated. This "containment condition" (known as the "c-command condition") is a part of Binding Theory (see BINDING THEORY), which treats in a general way pronominal antecedence and its relation to syntactic structure.
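The containment condition is a condition on positions in a tree, and so it can be checked mechanically. The following sketch assumes a simplified structure for (12) and a simplified definition of c-command (a node c-commands another if neither dominates the other and the first node's immediate parent dominates the second, which presupposes that every parent is branching). The class Node and the function c_commands are hypothetical names.

```python
class Node:
    """A tree node with a link to its parent."""

    def __init__(self, label, *children, word=None):
        self.label = label
        self.word = word
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def dominates(self, other):
        """True if `other` is properly contained in this node."""
        node = other.parent
        while node is not None:
            if node is self:
                return True
            node = node.parent
        return False

def c_commands(a, b):
    """Simplified c-command: a's parent dominates b, and neither contains the other."""
    if a is b or a.dominates(b) or b.dominates(a):
        return False
    return a.parent is not None and a.parent.dominates(b)

# Assumed structure for (12): [[[John's]_NP [mother]_N]_NP [[likes]_V [anaphor]_NP]_VP]_S
john = Node("NP", word="John's")
mother = Node("N", word="mother")
subject = Node("NP", john, mother)
verb = Node("V", word="likes")
anaphor = Node("NP", word="herself or himself")
vp = Node("VP", verb, anaphor)
sentence = Node("S", subject, vp)

print(c_commands(subject, anaphor))  # John's mother as antecedent, as in (12a)
print(c_commands(john, anaphor))     # John as antecedent, as in (12b)
```

The subject NP John's mother satisfies the condition with respect to the object position, whereas John, being contained in a phrase that excludes the anaphor, does not.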

Typically, the relation between anaphor and antecedent is governed by a locality condition -- the anaphor and antecedent cannot be "too far apart;" in English, the pronoun cannot occur in an embedded clause that does not contain the antecedent:

(13): *John thinks that [Mary likes himself]_S

The English locality condition is not universal, however; languages differ in what "too far apart" means. In Icelandic, for example, the reflexive pronoun sig can be separated from its antecedent by a subjunctive clause boundary but not an indicative one:

(14): a. Jon_i segir [að Maria elski sig_i]_{Subjunctive Clause}; 'Jon says that Maria loves himself'; b. *Jon_i segir [að Maria elskar sig_i]_{Indicative Clause}; 'Jon says that Maria loves himself'

The indices here indicate which NP is the antecedent of sig. Although the domain of anaphors is not fixed universally, there is only a small list of possible domains (Wexler and Manzini 1982). Although Icelandic differs from English in the details of the locality condition, it obeys the same containment condition mentioned earlier, as do all languages.

As with the case of phrase types, what is universal in the syntax of anaphors is considerable: the notion of anaphor, the necessity of antecedents for anaphors, the containment condition for anaphors, and the notion of locality condition. Beyond this, we see a slim range of linguistic variation: the identification of the domain of locality for anaphors. The language learner has simply to identify the anaphors in her language and identify the domain of locality in order for the full behavior of anaphors to be determined.

The syntactic system interfaces with lexical knowledge as well. The syntactic system of a language defines a set of general sentence patterns for that language, but any given lexical item will fit in only a subset of these. For example, the English VP could be described as a pattern of the following sort:

(15): V NP NP PP AP S AdvP

where only the head (V) is a necessary part of the phrase; but different verbs will in general match only limited subpatterns of this general pattern:

(16): think: [V S] "think that Bill was sick"; persuade: [V NP S] "persuade him that Bill was sick"; [V NP PP] "persuade him of my good intentions"

These must be learned along with the verb's meaning and other properties (see WORD MEANING, ACQUISITION OF). It is an open but much pursued question how much of the syntactic parameterization of a language might be reducible to aspects of lexical learning (see SYNTAX, ACQUISITION OF). The principal obstacle to firm conclusions is that the LEXICON, or the human lexical ability, is comparatively less well understood than the syntactic system.
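The relation between the general pattern in (15) and the verb-specific subpatterns in (16) can be sketched as a lookup table. This is an illustrative assumption rather than a claim about how the lexicon is organized: the names VP_PATTERN, SUBCAT, is_subpattern, and licensed are hypothetical, and the table lists only the two verbs given in (16).

```python
# The material that may follow V in the general English VP pattern (15).
VP_PATTERN = ("NP", "NP", "PP", "AP", "S", "AdvP")

# Verb-specific subpatterns from (16), listed without the head V.
SUBCAT = {
    "think": [("S",)],
    "persuade": [("NP", "S"), ("NP", "PP")],
}

def is_subpattern(frame, pattern=VP_PATTERN):
    """True if `frame` can be obtained from `pattern` by omitting elements."""
    remaining = iter(pattern)
    return all(category in remaining for category in frame)

def licensed(verb, complements):
    """True if the verb's lexical entry lists exactly this complement sequence."""
    return tuple(complements) in SUBCAT.get(verb, [])

print(licensed("think", ["S"]))            # think that Bill was sick
print(licensed("persuade", ["NP", "S"]))   # persuade him that Bill was sick
print(licensed("think", ["NP", "S"]))      # not listed for think
```

Every frame in the table is a subpattern of the general VP pattern, but the general pattern alone does not say which verb takes which frame; that information has to be stored with the verb.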

The most productive vein of research in syntax in recent years has been the comparison of closely related languages. The goal has been to discover the "minimum" differences between languages, as it stands to reason that these will correlate with the actual parameters of the syntactic system.

French and Italian are two closely related Romance languages with a signal difference in how the subject of the sentence is expressed. French, like English, requires a subject for every clause; but Italian permits the omission of subjects that are understood:

(17): Gianni crede che ha molto argento. (Italian); *Jean pense que a beaucoup d'argent. (French); Jean pense qu'il a beaucoup d'argent. (French); John thinks that he has a lot of money. (English); *John thinks that has a lot of money. (English)

By itself this difference between French and Italian might be of little general interest, but in fact it appears to correlate with other differences (Perlmutter 1978; Rizzi 1982). French is like English in blocking movement of the subject of embedded clauses when the "complementizer" that (que in French, che in Italian) is present, but Italian has no such restriction:

(18): Chi_i credi che t_i ha molto argento? (Italian); *Qui_i penses-tu que t_i a beaucoup d'argent? (French); *Who do you think that has a lot of money? (English)

And Italian permits its subject to appear in postposed position, after the verb, but French, like English, excludes this:

(19): Credo che t_i ha molto argento Sergio_i. (Italian); *Je crois que t_i a beaucoup d'argent Sergio_i. (French); *I believe that t_i has lots of money Sergio_i. (English)

In all three cases, Italian differs from English in permitting the trace of a moved or deleted subject in subject position under various circumstances. As there are other language pairs that differ in the same way, it is likely that these three differences between French and Italian are related and that in fact each is a manifestation of a single grammatical "parameter" set differently for French and Italian, a parameter governing the expression of the subject. Experiments in language acquisition have confirmed this view (Hyams 1986).
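The idea that one parameter underlies all three differences can be stated as a small model in which a single setting determines three predictions. The sketch is schematic and assumes nothing beyond the pattern in (17)-(19); the class Language, its null_subject field, and the function predictions are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Language:
    name: str
    null_subject: bool  # the single parameter assumed to govern subject expression

def predictions(language):
    """Derive the three properties illustrated in (17)-(19) from one setting."""
    value = language.null_subject
    return {
        "omitted_subject": value,                           # as in (17)
        "subject_extraction_across_complementizer": value,  # as in (18)
        "postposed_subject": value,                         # as in (19)
    }

italian = Language("Italian", null_subject=True)
french = Language("French", null_subject=False)
english = Language("English", null_subject=False)

for language in (italian, french, english):
    print(language.name, predictions(language))
```

On this view a learner who fixes the one setting gets the other two properties for free, which is why the clustering of the three differences matters.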

The mapping of the syntactic parameters through detailed study of language comparisons like the one just mentioned has been the principal goal of research in syntax in the 1980s and '90s. The general theory of syntax has been forced to become quite abstract in order to accommodate the parameterizations in a straightforward way, but the compensation has been a deeper understanding of what the range of possible human syntactic systems looks like.

-- Edwin Williams

References

Chomsky, N. (1965). Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.

Chomsky, N. (1981a). Lectures on Government and Binding. Dordrecht: Foris.

Chomsky, N. (1981b). Principles and parameters of linguistic theory. In N. Hornstein and D. Lightfoot, Eds., Explanations in Linguistics. Longmans, pp. 123-146.

Crain, S. (1990). Language learning in the absence of experience. Behavioral and Brain Sciences 14:597-650.

Greenberg, J. (1963). Universals of Language. Cambridge, MA: MIT Press.

Huang, J. (1982). Logical Relations in Chinese and the Theory of Grammar. Ph.D. diss., MIT.

Hyams, N. (1986). Language Acquisition and the Theory of Parameters. Dordrecht: Reidel.

Perlmutter, D. (1978). Impersonal passives and the unaccusative hypothesis. In J. Jaeger et al., Eds., Proceedings of the Fourth Annual Meeting of the Berkeley Linguistics Society, 1978:157-189.

Rizzi, L. (1982). Issues in Italian Syntax. Dordrecht: Foris.

Wexler, K., and R. Manzini. (1982). Parameters and learnability in binding theory. In T. Roeper and E. Williams, Eds., Parameter Setting. Dordrecht: Reidel, pp. 41-89.