An automaton (pl. automata) was originally anything with the power of self-movement, then, more specifically, a machine with the power of self-movement, especially a figure that simulated the motion of living beings. Perhaps the most impressive such automata were those of Jacques de Vaucanson (1709-1782), including a duck that ate and drank with realistic motions of head and throat, produced the sound of quacking, and could pick up cornmeal and swallow, digest, and excrete it.
People acting in a mechanical, nonspontaneous way came to be called automata, but this begs the very question for which cognitive science seeks a positive answer: "Is the working of the human mind reducible to information processing embodied in the workings of the human brain?" that is, is human spontaneity and intelligence a purely material phenomenon? René DESCARTES (1596-1650) saw the functioning of nonhuman animals, and much of human function, as being explainable in terms of the automata of his day but drew the line at cognitive function. However, whereas Descartes's view was based on clockwork and hydraulic automata, most cognitive science is based on a view of automata as "information processing machines" (though there is now a welcome increase of interest in embodied automata).
The present article describes key concepts of information processing automata from 1936 through 1956 (the year of publication of Automata Studies by Shannon and McCarthy), including Turing machines, finite automata, automata for formal languages, McCulloch-Pitts neural networks, and self-reproducing automata.
TURING (1936) and Post (1936) introduced what is now called a Turing machine (TM), consisting of a control box containing a finite program; an indefinitely extendable tape divided lengthwise into squares; and a device for scanning and then printing on one square of the tape at a time and subsequently moving the tape one square left or right or not at all. We start the machine with a finite sequence of symbols on the tape, and a program in the control box. The symbol scanned, and the instruction now being executed from the program, determine what new symbol is printed on the square, how the tape is moved, and what instruction is executed next.
We associate with a TM Z a numerical function fZ by placing a number n encoded as a string <n> on the tape and start Z scanning the leftmost square of <n>. If and when Z stops, we decode the result to obtain the number fZ(n). If Z never stops, we leave fZ(n) undefined. Just consider the program Z, which always moves right on its tape (new squares are added whenever needed), as an example of a machine that never stops computing, no matter what its input. More subtle machines might, for example, test to see whether n is prime and stop computing only if that is the case. (We can only associate fZ with Z after we have chosen our encoding. Nonnumerical functions can be associated with a TM once we have a "straightforward" way of unambiguously coding the necessarily finite or countable input and output structures on the tape.)
Turing's Hypothesis (also called Church's thesis; see CHURCH-TURING THESIS) is that a function is effectively computable if and only if it is computable by some TM. This statement is informal, but each attempt to formalize the notion of effectiveness using finite "programs" has yielded procedures equivalent to those implementable by TMs (or a subclass thereof).
Because each TM is described by a finite list of instructions we may effectively enumerate the TMs as Z1, Z2, Z3, . . . -- given n we may effectively find Zn, and given the list of instructions for Z, we may effectively find the n for which Z = Zn. For example, we might list all the one-instruction programs first, then all those with two instructions, and so on, listing all programs of a given length in some suitable generalization of alphabetical order.
Turing showed that there is a universal Turing Machine that, given a coded description of Zn on its tape as well as x, will proceed to compute fZn(x) = fn(x), if it is defined. This is obvious if we accept Turing's hypothesis, for given n and x we find Zn effectively, and then use it to compute fn(x), and so there should exist a TM to implement the effective procedure of going from the pair (n, x) to the value fn(x). More directly, we can program a Turing machine U that divides the data on its tape into two parts, that on the left providing the instructions for Zn, and that on the right providing the string x(t) on which Zn would now be computing. U is programmed to place markers against the current instruction from Zn and the currently scanned symbol of x(t), and to move back and forth between instructions and data to simulate the effect of Zn on x. For further details see Minsky (1967) and Arbib (1969), the second of which generalizes Turing machines to those that can work in parallel on multiple, possibly multidimensional tapes.
Is every function mapping natural numbers (N = {0, 1, 2, 3, . . .}) to natural numbers computable? Obviously not, for we have seen that the number of computable functions is countable, whereas the number of all functions is uncountable. To understand the latter claim, note that any real number between zero and one can be represented by an infinite decimal expansion 0.d0d1d2d3. . . dn . . . and thus as a function f:N → N with f(n) = dn -- and thus the uncountable set of these real numbers can be viewed as a subset of the set of all functions from N to N. However, the interest of computability theory is that there are "computationally interesting" functions that are not computable.
The following provides a simple example of an explicit proof of noncomputability. We define a total function h (i.e., h(x) is defined for every x in N) as follows. Let h(n) = n if fn is itself total, while h(n) = n0 if fn is not total -- where n0 is a fixed choice of an integer for which fn0 is a total computable function; h thus has the interesting property that a computable function fn is total if and only if n = h(m) for some m. h is certainly a well-defined total function, but is h a computable function? The answer is no. For if h were computable, then so too would be the function f defined by f(n) = fh(n)(n) + 1, and f would also be total. Then f is total and computable, and so f = fh(m) for some m so that f(m) = fh(m)(m). But, by definition, f(m) = fh(m)(m) + 1, a contradiction! This is one example of the many things undecidable by any effective procedure.
The most famous example -- proved by an extension of the above proof -- is that we cannot tell effectively for arbitrary (n, x) whether Zn will stop computing for input x. Thus the halting problem for TMs is unsolvable.
Finite automata provide, essentially, the input-output behavior of the TM control box without regard for its interaction with the tape. A finite automaton is abstractly described as a quintuple M = (X, Y, Q, δ, β) where X, Y, and Q are finite sets of inputs, outputs, and states, and if at time t the automaton is in state q and receives input x, it will emit output β(q) at time t, and then make the transition to state δ(q, x) at time t + l.
Given this, we can view the control box of a Turing Machine as a finite automaton: let X be the set of tape symbols, Q the set of control-box instructions, and Y the set X × M of pairs (x, m) where x is symbol to be printed on the tape and m is a symbol for one of the three possible tape moves (left, right, or not at all).
Another special case of the finite automaton definition takes Y to be the set {0,1} -- this is equivalent to dividing the set Q into the set F of designated final states for which β(q) = 1, and its complement -- and then designates a specific state q0 of Q to be the "initial state." In this case, interest focuses on the language accepted by M, namely the set of strings w consisting of a sequence of 0 or more elements of X (we use X* to denote the set of such strings) with the property that if we start M in state q0 and apply input sequence w, then M will end up in a state -- denoted by δ*(q0, w) -- that belongs to F.
The above examples define two sets of "languages" (in the sense of subsets of X*, i.e., strings on some specified alphabet): finite-state languages defined as those languages accepted by some finite automaton, and recursively enumerable sets that have many equivalent definitions, including "a set R is recursively enumerable if and only if there is a Turing machine Z such that Z halts computation on initial tape x if and only if x belongs to R." Without going into details and definitions here, it is interesting to note that two intermediate classes of formal languages -- the context-free languages and the context-sensitive languages -- have also been associated with classes of automata. The former are associated with push-down automata and the latter with linear-bounded automata. Discussion of the relation of these language classes to human language was a staple topic for discussion in the 1960s (Chomsky and Miller 1963; Chomsky 1963; Miller and Chomsky 1963), and Chomsky has subsequently defined a number of other systems of formal grammar to account for human language competence. However, much work in PSYCHOLINGUISTICS now focuses on the claim that adaptive networks provide a better model of language performance than does any system based on using some fixed automaton structure which embodies such a formal grammar (e.g., Seidenberg 1995).
McCulloch and Pitts (1943) modeled the neuron as a logic element, dividing time into units so small that in each time period at most one spike can be initiated in the axon of a given neuron, with output 1 or 0 coding whether or not the neuron "fires" (i.e., has a spike on its axon). Each connection from neuron i to neuron j has an attached synaptic weight. They also associate a threshold with each neuron, and assume exactly one unit of delay in the effect of all presynaptic inputs on the cell's output. A McCulloch-Pitts neuron (MP-neuron) fires just in case the weighted value of its inputs at that time reaches threshold.
Clearly, a network of MP-neurons functions like a finite automaton, as each neuron changes state synchronously on each tick of the time scale. Conversely, it was shown, though inscrutably, by McCulloch and Pitts that any finite automaton can be simulated by a suitable network of MP-neurons -- providing formal "brains" for the TMs, which can carry out any effective procedure. Knowledge of these results inspired VON NEUMANN's logical design for digital computers with stored programs (von Neumann, Burks, and Goldstine 1947-48).
Intuitively, one has the idea that a construction machine must build a simpler machine. But in biological systems, the complexity of an offspring matches that of the parent; and evolution may yield increasing biological complexity. Von Neumann (1951) outlined the construction of an automaton A that, when furnished with the description of any other automaton M (composed from some suitable collection of elementary parts) would construct a copy of M. However, A is not self-reproducing: A, supplied with a copy of its own description, will build a copy of A without its own description. The passage from a universal constructor to a self-reproducing automaton was spelled out in von Neumann (1966) in which the "organism" is a pattern of activity in an unbounded array of identical finite automata, each with only twenty-nine states. This was one root of the study of cellular automata.
As embodied humans we have far richer interactions with the world than does a TM; our "computations" involve all of our bodies, not just our brains; and biological neurons are amazingly more subtle than MP-neurons (Arbib 1995, pp. 4-11). A TM might in principle be able to solve a given problem -- but take far too long to solve it. Thus it is not enough that something be computable -- it must be tractable, that is, computable with available resources (Garey and Johnson 1979). Distributed computation may render tractable a problem that a serial TM would solve too slowly. When some people say "the brain is a computer," they talk as if the notion of "computer" were already settled. However, a human brain is built of hundreds of regions, each with millions of cells, each cell with tens of thousands of connections, each connection involving subtle neurochemical processes. To understand this is to push the theory of automata -- and cognitive science -- far beyond anything available today. Our concepts of automata will grow immensely over the coming decades as we better understand how the brain functions.
Arbib (1987) provides much further information on neural nets, finite automata, TMs, and automata that construct as well as compute, as well as a refutation of the claim that Gödel's Incompleteness theorem (see GÖDEL'S THEOREMS) sets limits to the intelligence of automata. Many articles in Arbib (1995) and in the present volume explore the theme of connectionist approaches to cognitive modeling (see COGNITIVE MODELING, CONNECTIONIST).
Arbib, M. A. (1969). Theories of Abstract Automata. Englewood Cliffs, NJ: Prentice-Hall.
Arbib, M. A. (1987). Brains, Machines and Mathematics. Second edition. New York: Springer.
Arbib, M. A., Ed. (1995). The Handbook of Brain Theory and Neural Networks. Cambridge, MA: MIT Press. (See pp. 4-11.)
Chomsky, N. (1963). Formal properties of grammars. In R. D. Luce, R. R. Bush, and E. Galanter, Eds., Handbook of Mathematical Psychology, vol. 2. New York: Wiley, pp. 323-418.
Chomsky, N., and G. A. Miller. (1963). Introduction to the formal analysis of natural languages. In R. D. Luce, R. R. Bush, and E. Galanter, Eds., Handbook of Mathematical Psychology, vol. 2. New York: Wiley, pp. 269-321.
Garey, M. R., and D. S. Johnson. (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness. New York: W. H. Freeman.
McCulloch, W. S., and W. H. Pitts. (1943). A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5:115-133.
Miller, G. A., and N. Chomsky. (1963). Finitary models of language users. In R. D. Luce, R. R. Bush, and E. Galanter, Eds., Handbook of Mathematical Psychology, vol. 2. New York: Wiley, pp. 419-491.
Minsky, M. L. (1967). Computation: Finite and Infinite Machines. Englewood Cliffs, NJ: Prentice-Hall.
Post, E. L. (1936). Finite combinatory processes -- formulation I. Journal of Symbolic Logic 1:103-105.
Seidenberg, M. (1995). Linguistic morphology. In M. A. Arbib, Ed., The Handbook of Brain Theory and Neural Networks. Cambridge, MA: MIT Press.
Shannon, C. E., and J. McCarthy, Eds. (1956). Automata Studies. Princeton: Princeton University Press.
Turing, A. M. (1936). On computable numbers. Proc. London Math. Soc. Ser. 2, 42:230-265.
von Neumann, J. (1951). The general and logical theory of automata. In L. A. Jeffress, Ed., Cerebral Mechanisms in Behaviour. New York: Wiley.
von Neumann, J. (1966). The Theory of Self-Reproducing Automata. Edited and completed by Arthur W. Burks. Urbana: University of Illinois Press.
von Neumann, J., A. Burks, and H. H. Goldstine. (1947-48). Planning and Coding of Problems for an Electronic Computing Instrument. Institute for Advanced Study, Princeton. (Reprinted in von Neumann's Collected Works 5:80-235.)