An *automaton* (pl. *automata*) was originally anything
with the power of self-movement, then, more specifically, a
*machine* with the power of self-movement, especially a figure
that simulated the motion of living beings. Perhaps the most
impressive such automata were those of Jacques de Vaucanson (1709-1782), including a duck that ate and drank with realistic motions of
head and throat, produced the sound of quacking, and could pick up
cornmeal and swallow, digest, and excrete it.

People acting in a mechanical, nonspontaneous way came to be called
automata, but this begs the very question for which cognitive science
seeks a positive answer: "Is the working of the human mind reducible
to information processing embodied in the workings of the human
brain?" That is, are human spontaneity and intelligence purely
material phenomena? René DESCARTES
(1596-1650) saw the functioning of nonhuman animals, and much of
human function, as being explainable in terms of the automata of his
day but drew the line at cognitive function. However, whereas
Descartes's view was based on clockwork and hydraulic automata, most
cognitive science is based on a view of automata as "information
processing machines" (though there is now a welcome increase of
interest in *embodied automata*).

The present article describes key concepts of information processing
automata from 1936 through 1956 (the year of publication of
*Automata Studies* by Shannon and McCarthy), including Turing
machines, finite automata, automata for formal languages,
McCulloch-Pitts neural networks, and self-reproducing automata.

TURING (1936) and Post (1936) introduced
what is now called a *Turing machine* (TM), consisting of a
control box containing a finite program; an indefinitely extendable
tape divided lengthwise into squares; and a device for scanning and
then printing on one square of the tape at a time and subsequently
moving the tape one square left or right or not at all. We start the
machine with a finite sequence of symbols on the tape, and a program
in the control box. The symbol scanned, and the instruction now being
executed from the program, determine what new symbol is printed on the
square, how the tape is moved, and what instruction is executed next.

We associate with a TM Z a numerical function
`f`_{Z} by placing a number `n`
encoded as a string <`n`> on the tape and starting Z
scanning the leftmost square of <`n`>. If and when Z
stops, we decode the result to obtain the number
`f`_{Z}(`n`). If Z never stops, we leave
`f`_{Z}(`n`) undefined. For example, the
program that always moves right on its tape (new squares being added
whenever needed) never stops
computing, no matter what its input. More subtle machines might, for
example, test to see whether `n` is prime and stop computing
only if that is the case. (We can only associate
`f`_{Z} with Z after we have chosen our
encoding. Nonnumerical functions can be associated with a TM once we
have a "straightforward" way of unambiguously coding the necessarily
finite or countable input and output structures on the tape.)
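The machine just described can be sketched in a few lines of code. This is a minimal illustration, not from the article: the program table, the function name `run_tm`, and the unary encoding of numbers are all assumptions made for the example.

```python
# Sketch of a Turing machine simulator. A program is a table mapping
# (instruction, scanned symbol) -> (symbol to print, tape move, next instruction).
def run_tm(program, tape, start="q0", blank="_", max_steps=1000):
    """Run a TM; return the final tape contents, or None if it has not
    halted within max_steps (we cannot wait forever -- see the halting problem)."""
    tape = dict(enumerate(tape))          # sparse tape, indefinitely extendable
    head, state = 0, start
    for _ in range(max_steps):
        if state == "halt":
            cells = sorted(tape)
            return "".join(tape[i] for i in range(cells[0], cells[-1] + 1))
        symbol = tape.get(head, blank)
        new_symbol, move, state = program[(state, symbol)]
        tape[head] = new_symbol
        head += {"L": -1, "R": 1, "N": 0}[move]
    return None                           # no halt within the step budget

# With n encoded in unary (n as a string of n 1s), this machine computes
# the successor function f_Z(n) = n + 1: it moves right past the 1s,
# prints one more 1, and halts.
successor = {
    ("q0", "1"): ("1", "R", "q0"),
    ("q0", "_"): ("1", "N", "halt"),
}

# The machine that always moves right, mentioned above, never halts.
mover = {
    ("q0", "1"): ("1", "R", "q0"),
    ("q0", "_"): ("_", "R", "q0"),
}
```

Because `run_tm` takes the program as data alongside the tape, it also illustrates, in miniature, the idea behind the universal machine discussed below.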

*Turing's Hypothesis* (also called *Church's thesis*;
see CHURCH-TURING THESIS) is that a function
is effectively computable if and only if it is computable by some
TM. This statement is informal, but each attempt to formalize the
notion of effectiveness using finite "programs" has yielded procedures
equivalent to those implementable by TMs (or a subclass thereof).

Because each TM is described by a finite list of instructions we may
*effectively enumerate* the TMs as `Z`_{1},
`Z`_{2}, `Z`_{3}, . . . -- given
`n` we may effectively find `Z`_{n},
and given the list of instructions for Z, we may effectively find the
`n` for which Z = Z_{n}. For example, we might
list all the one-instruction programs first, then all those with two
instructions, and so on, listing all programs of a given length in
some suitable generalization of alphabetical order.
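The enumeration idea can be sketched as follows (a hypothetical illustration: real TM programs would be filtered out of these strings, but the point is only that the listing, and its inverse, are effective):

```python
from itertools import count, product

# List all strings over a finite alphabet in "shortlex" order: by
# length, and alphabetically within each length.
def enumerate_strings(alphabet):
    for n in count(1):
        for letters in product(sorted(alphabet), repeat=n):
            yield "".join(letters)

# Given a string s, effectively recover the n for which s is the
# n-th string in the enumeration (1-indexed).
def index_of(s, alphabet):
    for n, t in enumerate(enumerate_strings(alphabet), start=1):
        if t == s:
            return n
```

Over the alphabet {a, b} the listing runs a, b, aa, ab, ba, bb, aaa, . . . , so `index_of("ba", {"a", "b"})` returns 5.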

Turing showed that there is a *universal Turing Machine* that,
given a coded description of Z_{n} on its tape as
well as `x`, will proceed to compute
`f`_{Zn}(`x`)
= `f`_{n}(`x`), if it is defined.
This is obvious if we accept Turing's hypothesis, for given
`n` and `x` we find Z_{n}
effectively, and then use it to compute
`f`_{n}(`x`), and so there should
exist a TM to implement the effective procedure of going from the pair
(`n`, `x`) to the value
`f`_{n}(`x`). More directly, we
can program a Turing machine U that divides the data on its tape into
two parts, that on the left providing the instructions for
Z_{n}, and that on the right providing the string
`x`(`t`) on which Z_{n} would
now be computing. U is programmed to place markers against the current
instruction from Z_{n} and the currently scanned
symbol of `x`(`t`), and to move back and forth
between instructions and data to simulate the effect of
Z_{n} on `x`. For further details see
Minsky (1967) and Arbib (1969), the second of which generalizes Turing
machines to those that can work in parallel on multiple, possibly
multidimensional tapes.

Is every function mapping natural numbers (`N` = {0, 1, 2, 3,
. . .}) to natural numbers computable? Obviously not, for we have
seen that the number of computable functions is countable, whereas the
number of all functions is uncountable. To understand the latter
claim, note that any real number between zero and one can be
represented by an infinite decimal expansion
0.`d`_{0}`d`_{1}`d`_{2}`d`_{3} . . . `d`_{n} . . . and thus as a
function `d` from `N` to `N` with `d`(`n`) = `d`_{n}. Since the
reals between zero and one are uncountable, so is this set of
functions, and hence so is the set of all functions from `N` to `N`.

The following provides a simple example of an explicit proof of
noncomputability. We define a *total* function `h`
(i.e., `h`(`x`) is defined for every `x` in
`N`) as follows. Let `h`(`n`) =
`n` if `f`_{n} is itself total,
while `h`(`n`) = `n`_{0} if
`f`_{n} is not total -- where
`n`_{0} is a fixed choice of an integer for which
`f`_{n0} is a total computable
function; `h` thus has the interesting property that a
computable function `f`_{n} is total if
and only if `n` = `h`(`m`) for some
`m`. `h` is certainly a well-defined total function,
but is `h` a *computable* function? The answer is
no. For if `h` were computable, then so too would be the
function `f` defined by `f`(`n`) =
`f`_{h(n)}(`n`) + 1,
and `f` would also be total. Then `f` is total and
computable, and so `f` =
`f`_{h(m)} for some
`m` so that `f`(`m`) =
`f`_{h(m)}(`m`). But,
by definition, `f`(`m`) =
`f`_{h(m)}(`m`) + 1, a
contradiction! This is one example of the many things undecidable by
any effective procedure.

The most famous example -- proved by an extension of the above proof
-- is that we cannot tell effectively for arbitrary (`n`,
`x`) whether `Z`_{n} will stop
computing for input `x`. Thus *the halting problem for TMs
is unsolvable.*

Finite automata provide, essentially, the input-output behavior of the
TM control box without regard for its interaction with the tape. A
*finite automaton* is abstractly described as a quintuple
`M` = (`X`, `Y`, `Q`, δ,
β) where `X`, `Y`, and `Q` are finite
sets of inputs, outputs, and states, and if at time `t` the
automaton is in state `q` and receives input `x`, it
will emit output β(`q`) at time `t`, and then
make the transition to state δ(`q`, `x`) at
time `t` + 1.
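The quintuple can be transcribed directly into code (a minimal sketch; the function name and the parity example are assumptions, not from the article):

```python
# A finite automaton M = (X, Y, Q, delta, beta): delta maps (state, input)
# to the next state; beta maps each state to its output.
def run_automaton(delta, beta, q0, inputs):
    """Return the output sequence beta(q_t) for t = 0, 1, ..."""
    q, outputs = q0, []
    for x in inputs:
        outputs.append(beta[q])      # emit beta(q) at time t
        q = delta[(q, x)]            # enter delta(q, x) at time t + 1
    return outputs

# Example: a parity machine over X = {0, 1}; the state records whether
# an even or odd number of 1s has been received so far.
delta = {("even", 0): "even", ("even", 1): "odd",
         ("odd", 0): "odd", ("odd", 1): "even"}
beta = {"even": 0, "odd": 1}
```

Note that the output at time `t` depends only on the state, not on the current input, exactly as in the definition above.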

Given this, we can view the control box of a Turing Machine as a
finite automaton: let `X` be the set of tape symbols,
`Q` the set of control-box instructions, and `Y` the
set `X` × `M` of pairs (`x`,
`m`) where `x` is the symbol to be printed on the tape
and `m` is a symbol for one of the three possible tape moves
(left, right, or not at all).

Another special case of the finite automaton definition takes
`Y` to be the set {0,1} -- this is equivalent to dividing the
set `Q` into the set `F` of designated
*final* states for which β(`q`) = 1, and its
complement -- and then designates a specific state
`q`_{0} of `Q` to be the "initial state." In
this case, interest focuses on the *language accepted* by
`M`, namely the set of strings `w` consisting of a
sequence of 0 or more elements of `X` (we use `X`*
to denote the set of such strings) with the property that if we start
`M` in state `q`_{0} and apply input
sequence `w`, then `M` will end up in a state --
denoted by δ*(`q`_{0}, `w`) -- that
belongs to `F`.
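The acceptance condition can likewise be sketched in code (an illustrative example; the parity machine reappears here as an assumption):

```python
# A string w is accepted iff delta*(q0, w), the state reached by
# running M on w from q0, belongs to the set F of final states.
def accepts(delta, q0, F, w):
    q = q0
    for x in w:
        q = delta[(q, x)]
    return q in F

# Example over X = {0, 1}: with F = {"even"}, the language accepted is
# exactly the set of strings containing an even number of 1s --
# a finite-state language in the sense defined below.
delta = {("even", 0): "even", ("even", 1): "odd",
         ("odd", 0): "odd", ("odd", 1): "even"}
```

The empty string is accepted too, since `delta`*( `q`_{0}, ε) = `q`_{0} and the initial state is final in this example.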

The above examples define two sets of "languages" (in the sense of
subsets of `X`*, i.e., strings on some specified alphabet):
*finite-state languages* defined as those languages accepted by
some finite automaton, and *recursively enumerable sets* that
have many equivalent definitions, including "a set `R` is
*recursively enumerable* if and only if there is a Turing
machine Z such that Z halts computation on initial tape `x` if
and only if `x` belongs to `R`." Without going into
details and definitions here, it is interesting to note that two
intermediate classes of formal languages -- the context-free languages
and the context-sensitive languages -- have also been associated with
classes of automata. The former are associated with *push-down
automata* and the latter with *linear-bounded
automata*. Discussion of the relation of these language classes to
human language was a staple topic for discussion in the 1960s (Chomsky
and Miller 1963; Chomsky 1963; Miller and Chomsky 1963), and Chomsky
has subsequently defined a number of other systems of formal grammar
to account for human language competence. However, much work in PSYCHOLINGUISTICS now focuses on the claim that
adaptive networks provide a better model of language performance than
does any system based on using some fixed automaton structure which
embodies such a formal grammar (e.g., Seidenberg 1995).

McCulloch and Pitts (1943) modeled the neuron as a logic element,
dividing time into units so small that in each time period at most one
spike can be initiated in the axon of a given neuron, with output 1 or
0 coding whether or not the neuron "fires" (i.e., has a spike on its
axon). Each connection from neuron `i` to neuron
`j` has an attached *synaptic weight*. They also
associate a *threshold* with each neuron, and assume exactly
one unit of delay in the effect of all presynaptic inputs on the
cell's output. A McCulloch-Pitts neuron (MP-neuron) fires just in case
the weighted value of its inputs at that time reaches threshold.
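An MP-neuron is easy to write down, and the classical observation that single neurons can realize the basic logic gates follows at once (a minimal sketch; the particular weights and thresholds are standard textbook choices, not from the article):

```python
# A McCulloch-Pitts neuron: output 1 (fire) at the next time step iff
# the weighted sum of the binary inputs reaches the threshold.
def mp_neuron(weights, threshold, inputs):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

# Logic gates, each as a single MP-neuron:
AND = lambda a, b: mp_neuron([1, 1], 2, [a, b])   # fires only if both inputs fire
OR  = lambda a, b: mp_neuron([1, 1], 1, [a, b])   # fires if either input fires
NOT = lambda a: mp_neuron([-1], 0, [a])           # inhibitory weight inverts the input
```

Since AND, OR, and NOT suffice for all Boolean functions, networks of such neurons can realize any finite automaton's next-state and output logic, which is the substance of the McCulloch-Pitts result discussed next.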

Clearly, a network of MP-neurons functions like a finite automaton, as each neuron changes state synchronously on each tick of the time scale. Conversely, it was shown, though inscrutably, by McCulloch and Pitts that any finite automaton can be simulated by a suitable network of MP-neurons -- providing formal "brains" for the TMs, which can carry out any effective procedure. Knowledge of these results inspired VON NEUMANN's logical design for digital computers with stored programs (von Neumann, Burks, and Goldstine 1947-48).

Intuitively, one has the idea that a construction machine must build a
simpler machine. But in biological systems, the complexity of an
offspring matches that of the parent; and evolution may yield
increasing biological complexity. Von Neumann (1951) outlined the
construction of an automaton A that, when furnished with the
description of any other automaton M (composed from some suitable
collection of elementary parts) would construct a copy of M. However,
A is *not* self-reproducing: A, supplied with a copy of its own
description, will build a copy of A *without* its own
description. The passage from a *universal constructor* to a
*self-reproducing automaton* was spelled out in von Neumann
(1966) in which the "organism" is a pattern of activity in an
unbounded array of identical finite automata, each with only
twenty-nine states. This was one root of the study of *cellular
automata*.

As embodied humans we have far richer interactions with the world than
does a TM; our "computations" involve all of our bodies, not just our
brains; and biological neurons are amazingly more subtle than
MP-neurons (Arbib 1995, pp. 4-11). A TM might in principle be able
to solve a given problem -- but take far too long to solve it. Thus it
is not enough that something be computable -- it must be
*tractable*, that is, computable with available resources
(Garey and Johnson 1979). Distributed computation may render tractable
a problem that a serial TM would solve too slowly. When some people
say "the brain is a computer," they talk as if the notion of
"computer" were already settled. However, a human brain is built of
hundreds of regions, each with millions of cells, each cell with tens
of thousands of connections, each connection involving subtle
neurochemical processes. To understand this is to push the theory of
automata -- and cognitive science -- far beyond anything available
today. Our concepts of automata will grow immensely over the coming
decades as we better understand how the brain functions.

Arbib (1987) provides much further information on neural nets, finite automata, TMs, and automata that construct as well as compute, as well as a refutation of the claim that Gödel's Incompleteness theorem (see GÖDEL'S THEOREMS) sets limits to the intelligence of automata. Many articles in Arbib (1995) and in the present volume explore the theme of connectionist approaches to cognitive modeling (see COGNITIVE MODELING, CONNECTIONIST).

- COMPUTATION
- COMPUTATION AND THE BRAIN
- COMPUTATIONAL COMPLEXITY
- COMPUTATIONAL THEORY OF MIND
- FORMAL GRAMMARS
- VISUAL WORD RECOGNITION

Arbib, M. A. (1969). Theories of Abstract Automata. Englewood Cliffs, NJ: Prentice-Hall.

Arbib, M. A. (1987). Brains, Machines and Mathematics. Second edition. New York: Springer.

Arbib, M. A., Ed. (1995). The Handbook of Brain Theory and Neural Networks. Cambridge, MA: MIT Press. (See pp. 4-11.)

Chomsky, N. (1963). Formal properties of grammars. In R. D. Luce, R. R. Bush, and E. Galanter, Eds., Handbook of Mathematical Psychology, vol. 2. New York: Wiley, pp. 323-418.

Chomsky, N., and G. A. Miller. (1963). Introduction to the formal analysis of natural languages. In R. D. Luce, R. R. Bush, and E. Galanter, Eds., Handbook of Mathematical Psychology, vol. 2. New York: Wiley, pp. 269-321.

Garey, M. R., and D. S. Johnson. (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness. New York: W. H. Freeman.

McCulloch, W. S., and W. H. Pitts. (1943). A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5:115-133.

Miller, G. A., and N. Chomsky. (1963). Finitary models of language users. In R. D. Luce, R. R. Bush, and E. Galanter, Eds., Handbook of Mathematical Psychology, vol. 2. New York: Wiley, pp. 419-491.

Minsky, M. L. (1967). Computation: Finite and Infinite Machines. Englewood Cliffs, NJ: Prentice-Hall.

Post, E. L. (1936). Finite combinatory processes -- formulation I. Journal of Symbolic Logic 1:103-105.

Seidenberg, M. (1995). Linguistic morphology. In M. A. Arbib, Ed., The Handbook of Brain Theory and Neural Networks. Cambridge, MA: MIT Press.

Shannon, C. E., and J. McCarthy, Eds. (1956). Automata Studies. Princeton: Princeton University Press.

Turing, A. M. (1936). On computable numbers. Proc. London Math. Soc. Ser. 2, 42:230-265.

von Neumann, J. (1951). The general and logical theory of automata. In L. A. Jeffress, Ed., Cerebral Mechanisms in Behaviour. New York: Wiley.

von Neumann, J. (1966). The Theory of Self-Reproducing Automata. Edited and completed by Arthur W. Burks. Urbana: University of Illinois Press.

von Neumann, J., A. Burks, and H. H. Goldstine.
(1947-48). Planning and Coding of Problems for an Electronic
Computing Instrument. Institute for Advanced Study,
Princeton. (Reprinted in von Neumann's *Collected Works*
5:80-235.)