A formal theory of language acquisition (FTLA) can be defined as a mathematical investigation of the learnability properties of the class of human languages. Every FTLA can therefore be seen as an application of COMPUTATIONAL LEARNING THEORY to the problem of LANGUAGE ACQUISITION, one of the core problems of LEARNING (see also LEARNING SYSTEMS).
The need for FTLAs stems from one of the standard assumptions of linguistics: a successful theory must show that the grammars proposed by linguists not only account for all the linguistic data (descriptive adequacy) but are also the kind of objects that can be acquired from the kind of data, and with the kind of cognitive resources, that are typical of human language learning (explanatory adequacy).
In order to be properly stated, every FTLA requires four distinct components: a class L of languages to be learned; a class A of learners (the learning algorithms under consideration); a criterion of success C; and a mode M in which data from the target language are presented to the learner.
Given this characterization, every FTLA consists of either a proof that there is at least one learner in A that successfully acquires every language in L when success is defined as in C and data are presented as prescribed by M (a positive result) or a proof that there is at least one language in L that no learner in A can successfully acquire according to C and M (a negative result).
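To fix ideas, the four components can be given a minimal sketch as type signatures. The names below (Sentence, Learner, and so on) are illustrative conventions introduced here, not part of any particular FTLA:

```python
from typing import Callable, Iterator

Sentence = str
Language = frozenset[str]                       # an element of L: a set of strings
Grammar = object                                # any finite representation of a Language
Text = Iterator[Sentence]                       # M: a positive-only presentation of data
Learner = Callable[[list[Sentence]], Grammar]   # an element of A: finite data -> conjecture
# C, the criterion of success, is a predicate on the stream of conjectures a
# Learner produces along a Text for a target Language (e.g., identification
# in the limit or PAC, discussed below).
```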
Although the importance of a positive result (typically the presentation of a model that proves the existence of a learning algorithm with the desired properties) is obvious, negative results can be just as useful: as explained earlier, they can be used to eliminate whole classes of theories that are descriptively but not explanatorily adequate.
Most recent FTLAs assume that, in human language learning, M consists of unordered and simple positive evidence. This assumption rests on twenty years of research in developmental psycholinguistics (reviewed in Marcus 1993), pointing to the conclusion that children receive a largely grammatical set of simple sentences from their target language with very little or no reliable instruction on what sentences are ungrammatical.
The criterion of success C that has been most commonly adopted is identification in the limit (Gold 1967): a learner is successful if and only if, for every language in L, it eventually stabilizes on a grammar that is equivalent to that of all the other speakers of that language (i.e., it yields the same grammaticality judgments and assigns the same meanings to sentences). Identification in the limit, however, can be argued to be at once too strict and too weak a criterion. It is too strict because the evolution of languages over time (see LANGUAGE VARIATION AND CHANGE) would be hard to explain, barring language contact, if each generation acquired exactly the language of the previous one, as the criterion requires. It is too weak because children appear to learn their target language(s) in a very short time, whereas identification in the limit counts as successful any learner that eventually stabilizes on a correct grammar, however long that might take.
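The classic positive result in this framework illustrates the criterion concretely: the class of all finite languages is identifiable in the limit from positive-only text by the learner that conjectures exactly the set of strings observed so far. The sketch below, with illustrative names, is a minimal rendering of that result, not a model of any particular FTLA:

```python
# Identification in the limit (Gold 1967) for the class of finite languages:
# conjecture exactly the set of strings seen so far.
def learner(data: list[str]) -> frozenset[str]:
    """Conjecture the finite language consisting of the strings seen so far."""
    return frozenset(data)

# A text for a language is any sequence (repetitions allowed) that eventually
# presents each of its strings; here, a finite illustrative prefix.
target = frozenset({"a", "ab", "abb"})
text = ["a", "ab", "a", "abb", "ab"]

conjectures = [learner(text[: i + 1]) for i in range(len(text))]
# Because the text contains only target strings, conjectures grow
# monotonically toward the target and never change once every target
# string has appeared.
assert conjectures[-1] == target
print(conjectures)
```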
These considerations seem to recommend as a plausible alternative the PAC criterion (Probably Approximately Correct; Valiant 1984): a learner is successful if and only if, for every language in L, it is very likely (but not certain) to produce a grammar that is very close (but not necessarily equivalent) to the target grammar, and to do so not in the limit but in a very short time, measured as a function of how close it gets and how likely it is to do so (see COMPUTATIONAL LEARNING THEORY). As an element of an FTLA, however, the PAC criterion is not without problems of its own. For example, if the error of a conjecture with respect to a target language is measured as the probability that the environment in which the language is exhibited presents a string that the conjecture misclassifies, then the assumption that children receive only positive evidence has the consequence that their conjectures have error zero even when they overgeneralize. In this respect, PAC would appear to be too weak a criterion because, empirically, human learners do not appear to overgeneralize in this fashion.
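In standard formulations of Valiant's criterion (restated here in the article's terms, not in Valiant's original notation), D is the distribution from which example sentences are drawn, h is the learner's conjectured grammar, and ε and δ are the accuracy and confidence parameters:

```latex
% PAC success: for every target language in L, every distribution D over
% sentences, and all \varepsilon, \delta \in (0,1), after m(\varepsilon,\delta)
% examples (m polynomial in 1/\varepsilon and 1/\delta) the learner outputs
% a grammar h such that
\[
  \Pr\big[\mathrm{error}_D(h) \le \varepsilon\big] \;\ge\; 1 - \delta,
  \qquad
  \mathrm{error}_D(h) \;=\; \Pr_{s \sim D}\big[\, h \text{ misclassifies } s \,\big].
\]
```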
The L and the A components have traditionally been the locus of the most important differences among alternative FTLAs. Common restrictions on A include memory limitations, smoothness (successive hypotheses must not be very different from one another), continuity (every hypothesis is a possible adult grammar), maturation (some possible adult grammars cannot be part of the child's early hypotheses), and so on. A principled investigation of the effects of such restrictions on identification in the limit can be found in Jain et al. (forthcoming). At the time of writing, however, developmental psycholinguists have not reached the kind of consensus on A that was reached on M.
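As an illustration of how such restrictions on A can be formalized, consider memory limitation: a memory-limited ("iterative") learner may consult only its current conjecture and the current datum, never the full history of past data. The sketch below, with illustrative names, shows that the finite-language learner of the previous example already satisfies this restriction; for other classes of languages, such restrictions can shrink what is learnable:

```python
# A memory-limited ("iterative") learner: the next conjecture is a function
# of the current conjecture and the current sentence alone.
def iterative_learner(conjecture: frozenset, sentence: str) -> frozenset:
    # For the class of finite languages this restriction costs nothing:
    # the set-of-strings-seen-so-far learner never needs past data.
    return conjecture | {sentence}

conjecture: frozenset = frozenset()
for sentence in ["a", "ab", "abb"]:
    conjecture = iterative_learner(conjecture, sentence)
assert conjecture == frozenset({"a", "ab", "abb"})
```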
As for the L component, it must be noted that no existing FTLA is based on a formal definition of the class of human languages, quite simply because no such definition is currently available. Indeed, some have even argued against the scientific relevance of formally defining a language as a set of strings (Chomsky 1986). In practice, although an FTLA would ultimately have to explain the child's ability to learn every aspect of a target language, most existing FTLAs have respected the division of labor that is traditional in linguistics, so that there are now formal theories of the acquisition of SYNTAX, of PHONOLOGY, and of word meaning.
Within the domain of syntax, for example, several very broad results have been established with respect to classes of languages generated by formal grammars. Positive learnability results have been established for the class of languages generated by suitably restricted Transformational Grammars (Wexler and Culicover 1980), the class generated by rigid CATEGORIAL GRAMMARS (Kanazawa 1994), and the class generated by a recently introduced formalism based on Chomsky's MINIMALISM (Stabler 1997).
It is an open question whether this division of labor can be recommended. Indeed, several nonformal theories have advocated one form or another of bootstrapping, the view that the acquisition of any one of these domains aids and must be aided by the acquisition of the others (see Pinker 1987, Gleitman 1990, and Mazuka 1996 for semantic, syntactic, and prosodic bootstrapping, respectively).
Many current FTLAs try to sidestep the unavailability of a formal characterization of L in one of two ways: either by explicitly modeling only those fragments of their intended domain (syntax, phonology, SEMANTICS) for which a formal grammar is available, or by providing a meta-analysis of the learnability properties of every class of languages that can be generated under various kinds of innate restrictions on the possible range of variation of human languages (as dictated, for example, by POVERTY OF THE STIMULUS ARGUMENTS; see also LINGUISTIC UNIVERSALS and INNATENESS OF LANGUAGE). Most such meta-analyses are based either on the Principles and Parameters Hypothesis (Chomsky 1981) or on OPTIMALITY THEORY (Prince and Smolensky 1993); for reviews, see Bertolo (forthcoming) and Tesar and Smolensky (forthcoming), respectively. It is instructive to note that exactly the same kind of meta-analysis can also be achieved in connectionist models (see NEURAL NETWORKS and CONNECTIONIST APPROACHES TO LANGUAGE) when certain principled restrictions are imposed on their architecture (Kremer 1996).
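As a concrete (and deliberately toy) illustration of the Principles and Parameters setting, the sketch below assumes a hypothetical space of two binary parameters, each vector generating a hand-listed "language," and an error-driven learner that flips a single randomly chosen parameter when its current grammar fails on an input sentence, keeping the flip only if it helps. This is a minimal sketch in the spirit of the parameter-setting learners reviewed in Bertolo (forthcoming), not a model from the literature:

```python
import random

# Hypothetical parameter space: each vector of two binary parameters
# generates a hand-listed toy "language" (a set of word-order patterns).
LANGUAGES = {
    (0, 0): {"s v", "s v o"},
    (0, 1): {"s v", "o v s"},
    (1, 0): {"v s", "v o s"},
    (1, 1): {"v s", "s o v"},
}

def learn(text, params=(0, 0), rng=random.Random(0)):
    for sentence in text:
        if sentence not in LANGUAGES[params]:       # error: current grammar fails
            i = rng.randrange(len(params))          # pick one parameter to flip
            candidate = tuple(p ^ (j == i) for j, p in enumerate(params))
            if sentence in LANGUAGES[candidate]:    # greedy: keep only helpful flips
                params = candidate
    return params

target = (1, 1)
rng = random.Random(1)
text = [rng.choice(sorted(LANGUAGES[target])) for _ in range(50)]
print(learn(text))  # on sufficiently long text, typically converges to (1, 1)
```

In this toy space the greedy, error-driven learner reaches the target from any starting vector; whether such learners converge in realistic parameter spaces is precisely the kind of question the meta-analyses cited above address.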
Bertolo, S., Ed. (Forthcoming). Principles and Parameters and Learnability. Cambridge: Cambridge University Press.
Chomsky, N. (1981). Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, N. (1986). Knowledge of Language: Its Nature, Origins and Use. New York: Praeger.
Gleitman, L. (1990). The structural sources of verb meaning. Language Acquisition 1(1):3-55.
Gold, E. M. (1967). Language identification in the limit. Information and Control 10:447-474.
Jain, S., D. Osherson, J. Royer, and A. Sharma. (Forthcoming). Systems That Learn. 2nd ed. Cambridge, MA: MIT Press.
Kanazawa, M. (1994). Learnable Classes of Categorial Grammars. Ph.D. diss., Stanford University.
Kremer, S. (1996). A Theory of Grammatical Induction in the Connectionist Paradigm. Ph.D. diss., University of Alberta.
Marcus, G. (1993). Negative evidence in language acquisition. Cognition 46(1):53-85.
Mazuka, R. (1996). Can a grammatical parameter be set before the first word? Prosodic contributions to early setting of a grammatical parameter. In J. L. Morgan and K. Demuth, Eds., Signal to Syntax: Bootstrapping from Speech to Grammar in Early Acquisition. Hillsdale, NJ: Erlbaum, pp. 313-330.
Pinker, S. (1987). The bootstrapping problem in language acquisition. In B. MacWhinney, Ed., Mechanisms of Language Acquisition. Hillsdale, NJ: Erlbaum, pp. 399-441.
Prince, A., and P. Smolensky. (1993). Optimality Theory: Constraint Interaction in Generative Grammar. Technical Report, Center for Cognitive Science, Rutgers University, New Brunswick, NJ.
Stabler, E. (1997). Acquiring and Parsing Languages with Movement. Unpublished manuscript, University of California, Los Angeles.
Tesar, B., and P. Smolensky. (Forthcoming). The learnability of Optimality Theory: an algorithm and some complexity results. Linguistic Inquiry.
Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM 27:1134-1142.
Wexler, K., and P. W. Culicover. (1980). Formal Principles of Language Acquisition. Cambridge, MA: MIT Press.