The study of neural networks is the study of information processing in networks of elementary numerical processors. In some cases these networks are endowed with a certain degree of biological realism and the goal is to build models that account for neurobiological data. In other cases abstract networks are studied and the goal is to develop a computational theory of highly parallel, distributed information-processing systems. In both cases the emphasis is on accounting for intelligence via the statistical and dynamic regularities of highly interconnected, large-scale networks.

Historically, neural networks arose as a number of loosely connected strands, many of which were subsequently absorbed into mainstream engineering disciplines. Some of the earliest research on neural networks (in the 1940s and 1950s) involved the study of interconnected systems of binary switches, or MCCULLOCH-PITTS neurons. This research also contributed to the development of AUTOMATA theory and dynamic systems theory. Links between these fields and neural networks continue to the present day.
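
For concreteness, a McCulloch-Pitts neuron is a binary threshold unit: it fires if the weighted sum of its binary inputs reaches a threshold. A minimal sketch (the particular weights and thresholds below are illustrative choices, not from the text):

```python
# A McCulloch-Pitts neuron: a binary threshold unit. The weights and
# thresholds below are illustrative choices.

def mp_neuron(inputs, weights, threshold):
    """Fire (output 1) iff the weighted sum of binary inputs reaches threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

# Elementary logic gates realized as threshold units:
AND = lambda x, y: mp_neuron([x, y], [1, 1], 2)
OR  = lambda x, y: mp_neuron([x, y], [1, 1], 1)
NOT = lambda x:    mp_neuron([x],    [-1],   0)
```

Networks of such switches can compute any Boolean function, which is one reason this early work fed naturally into automata theory.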

Other early research efforts emphasized
adaptive systems. In the 1950s and 1960s, Widrow and others studied
adaptive linear systems and in particular the *LMS algorithm* (cf. SUPERVISED LEARNING IN MULTILAYER NEURAL NETWORKS). This work led
to the field of adaptive signal processing and provided the basis
for later extensions to nonlinear neural networks. Adaptive classifiers (systems
with a discrete output variable) were also studied during the same
period in the form of the "perceptron" algorithm
and related schemes; these efforts contributed to the emergence
of the engineering field of PATTERN RECOGNITION,
which continues to house much neural network research. Finally, efforts
in the area of REINFORCEMENT LEARNING formed
a strand of neural network research with strong ties to CONTROL THEORY. In the 1980s, these ties were further solidified
by research establishing a link between reinforcement learning and
optimal control theory, in particular the optimization technique
of DYNAMIC PROGRAMMING.
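
The LMS rule admits a compact sketch: after each example, nudge the weights against the gradient of the instantaneous squared error. The data, step size, and iteration count below are illustrative choices, not from the text.

```python
import numpy as np

# A minimal sketch of the LMS (Widrow-Hoff) rule on noiseless linear data.
# All parameters here are illustrative.

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
X = rng.normal(size=(200, 2))
y = X @ true_w                      # noiseless linear targets

w = np.zeros(2)
lr = 0.05                           # step size
for x_t, y_t in zip(X, y):
    err = y_t - w @ x_t             # instantaneous error
    w += lr * err * x_t             # LMS update

print(w)                            # approaches true_w
```

The same error-correcting structure, with a thresholded output, is the heart of the perceptron algorithm mentioned above.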

Neural networks received much attention during the 1970s and 1980s, partly as a reaction against the prevailing symbolic approach to the study of intelligence in artificial intelligence (AI). Emphasizing architectures that largely dispense with centralized sequential processing and strict separation between process and data, researchers studied distributed processing in highly parallel architectures (cf. COGNITIVE MODELING, CONNECTIONIST). Intelligence was viewed in terms of mechanisms of CONSTRAINT SATISFACTION and pattern recognition rather than explicit symbol manipulation. A number of technical developments sustained research during this period, two of which stand out.

First, the dynamics of symmetrical
networks (networks in which a connection from node A to node B of
a given strength implies a connection from B to A of the same strength)
was elucidated by the discovery of *energy functions* (see RECURRENT NETWORKS). This allowed network dynamics to be understood
in terms of a (generally finite) set of *attractors,* points
in the state space toward which trajectories tend as the nodes in
the network are updated. This gave a satisfying formal interpretation
of constraint satisfaction in neural networks -- as the minimization of
an energy function -- and provided an interesting implementation
of an associative memory: the attractors are the memories.
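
A small sketch makes the idea concrete: with symmetric weights and no self-connections, asynchronous threshold updates never increase the energy, so the network settles into an attractor, and a corrupted pattern is restored to the stored memory nearest it. The stored patterns below are illustrative.

```python
import numpy as np

# A minimal sketch of an energy-based associative memory (a Hopfield-style
# network): symmetric weights, +/-1 states, asynchronous threshold updates.

patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, 1, -1, -1, -1]])
W = patterns.T @ patterns / patterns.shape[1]   # Hebbian outer-product rule
np.fill_diagonal(W, 0)                          # no self-connections

def energy(s):
    return -0.5 * s @ W @ s

def recall(s, sweeps=5):
    s = s.copy()
    for _ in range(sweeps):
        for i in range(len(s)):                 # asynchronous updates
            s[i] = 1 if W[i] @ s >= 0 else -1   # each flip lowers the energy
    return s

probe = np.array([1, -1, 1, -1, 1, 1])          # corrupted copy of pattern 0
print(recall(probe))                            # settles back to pattern 0
```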

The second important technical development was the discovery of a class of learning algorithms for general networks. The focus on learning algorithms can either be viewed as a natural outgrowth of the earlier research on adaptive algorithms for simple one-layer networks (e.g., the LMS algorithm and the perceptron), or as a necessity born of the fact that general networks are difficult to analyze and accordingly difficult to program. In any case, the algorithms have greatly extended the range of the networks that can be utilized in models and in practical applications, so much so that in AI and engineering the topic of neural networks has become essentially synonymous with the study of numerical learning algorithms.

The earliest successes were obtained
with SUPERVISED LEARNING algorithms. These
algorithms require an error signal at each of the output nodes of
the network. The paradigm case is that of the layered *feedforward
network,* a network with no feedback connections between layers
and no lateral connections within a layer. Input patterns are presented
at the first layer, and each subsequent layer is updated in turn,
resulting in an output at the final layer. This output is compared
to a desired output pattern, yielding an error signal. Algorithms differ
in how they utilize this error signal, but in one way or another
the error signal is propagated backward into the network to compute
updates to the weights and thereby decrease the error.
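
The forward-then-backward structure can be sketched for a two-layer network with a squared-error signal; the layer sizes, data, and step size below are illustrative choices, not from the text.

```python
import numpy as np

# A minimal sketch of error backpropagation in a two-layer feedforward
# network with a squared-error loss. All parameters are illustrative.

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2)) * 0.5   # input -> hidden weights
W2 = rng.normal(size=(1, 3)) * 0.5   # hidden -> output weights
x = np.array([0.2, -0.4])
target = np.array([1.0])

def forward(W1, W2, x):
    h = np.tanh(W1 @ x)              # hidden layer
    y = W2 @ h                       # linear output layer
    return h, y

# Forward pass, then propagate the output error backward through each layer.
h, y = forward(W1, W2, x)
err = y - target                     # error signal at the output
dW2 = np.outer(err, h)               # gradient for the output weights
dh = W2.T @ err                      # error propagated to the hidden layer
dW1 = np.outer(dh * (1 - h**2), x)   # tanh' = 1 - tanh^2

# One gradient-descent step on these gradients reduces the squared error.
lr = 0.1
_, y_new = forward(W1 - lr * dW1, W2 - lr * dW2, x)
print(((y - target) ** 2).item(), "->", ((y_new - target) ** 2).item())
```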

A wide variety of theoretical results
are available concerning neural network computation. Layered neural
networks have been shown to be *universal,* in the sense
of being able to represent essentially any function. *Best approximation* results
are available for large classes of feedforward networks. Recurrent
neural networks have been shown to be TURING-equivalent
and have also been shown to be able to represent a wide class of
nonlinear dynamic systems. A variety of results are also available
for supervised learning in neural networks. In particular, the Vapnik-Chervonenkis
(VC) dimension (a measure of the sample complexity of a learning
system; see COMPUTATIONAL LEARNING THEORY and STATISTICAL LEARNING THEORY) has been computed for simple networks,
and bounds on the VC dimension are available for more complex networks.
In classification problems, network learning algorithms have been
shown to converge to the posterior probabilities of the classes.
Methods from statistical physics have been utilized to characterize
learning curves. Finally, Bayesian statistical methods (see BAYESIAN LEARNING) have been exploited both for the analysis of
supervised learning and for the design of new algorithms.
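
The posterior-probability result can be illustrated directly: minimizing squared error against 0/1 class labels drives the outputs toward P(class = 1 | x). Here the "network" is just a lookup table (one output per input value) and the data are constructed with known conditional frequencies; the setup is illustrative, not from the text.

```python
import numpy as np

# Squared-error training on 0/1 labels converges to the class posterior.
# At x=0, 20 of 100 labels are 1; at x=1, 80 of 100 labels are 1.
X = np.array([0] * 100 + [1] * 100)
y = np.array([1] * 20 + [0] * 80 + [1] * 80 + [0] * 20, dtype=float)

f = np.zeros(2)                      # one output per input value
lr = 0.01
for _ in range(2000):
    grad = np.zeros(2)
    for v in (0, 1):
        grad[v] = 2 * np.mean(f[v] - y[X == v])   # gradient of mean squared error
    f -= lr * grad

print(f)                             # approaches the posteriors [0.2, 0.8]
```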

Recent years have seen increasing interest in UNSUPERVISED LEARNING and a
concomitant growth of interest in fully probabilistic approaches
to neural network design. The unsupervised learning framework is
in many ways more powerful and more general than supervised learning,
requiring no error signal and no explicit designation of nodes as
input nodes or output nodes. One general way to approach the problem
involves specifying a *generative model* -- an explicit
model of the way in which the environment is assumed to generate
data. In the neural network setting, such models are generally realized
in the form of a network. The learner's uncertainty about
the environment is formalized by annotating the network with probabilities.
The learning problem in this setting becomes the classic statistical
problem of finding the best model to fit the data. The learner may
either explicitly manipulate an instantiation of the generative
model, or may utilize a network that is obtained by inverting the
generative model (e.g., via an application of Bayes's rule).
The latter network is often referred to as a *discriminative* network.
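
The inversion can be sketched for the simplest case: a generative model that picks a class and then emits a Gaussian observation, inverted via Bayes's rule to give the discriminative direction P(class | x). The priors, means, and variance below are illustrative choices.

```python
import numpy as np

# A minimal sketch of inverting a generative model via Bayes's rule.
# Generative direction: pick a class, then emit a Gaussian observation.
# Discriminative direction: compute P(class | x). Parameters are illustrative.

priors = np.array([0.6, 0.4])        # P(class)
means = np.array([-1.0, 2.0])        # class-conditional Gaussian means
sigma = 1.0

def likelihood(x, k):
    return np.exp(-0.5 * ((x - means[k]) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def posterior(x):
    joint = np.array([priors[k] * likelihood(x, k) for k in (0, 1)])
    return joint / joint.sum()       # Bayes's rule: P(class | x)

print(posterior(-1.0))               # near class 0's mean: class 0 dominates
print(posterior(2.0))                # near class 1's mean: class 1 dominates
```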

Probabilistic network models are studied in other areas of AI. In particular, BAYESIAN NETWORKS provide a general formalism for designing probabilistic networks. It is interesting to note that essentially all of the unsupervised learning architectures that have been studied in the neural network literature can be obtained by specifying a generative model in the form of a Bayesian network.
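
A Bayesian network specifies a joint distribution as a product of local conditional probabilities, and queries are answered by summing over the hidden variables. A minimal sketch using the classic rain/sprinkler example (the probabilities below are illustrative):

```python
# A minimal Bayesian network: rain -> sprinkler, and both -> grass wet.
# The joint factors into local conditional probability tables.

P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: {True: 0.01, False: 0.99},   # P(sprinkler | rain)
               False: {True: 0.4, False: 0.6}}
P_wet = {(True, True): 0.99, (True, False): 0.9,  # P(wet | sprinkler, rain)
         (False, True): 0.8, (False, False): 0.0}

def joint(r, s, w):
    p_w = P_wet[(s, r)] if w else 1 - P_wet[(s, r)]
    return P_rain[r] * P_sprinkler[r][s] * p_w

# P(rain | grass wet), by enumeration over the hidden variable (sprinkler).
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
print(num / den)
```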

This rapprochement between neural networks and Bayesian networks has a number of important consequences that are of current research interest. First, the Bayesian network formalism makes it natural to specify and manipulate prior knowledge, an ability that eluded earlier, nonprobabilistic neural networks. By associating a generative model with a neural network, prior knowledge can be more readily incorporated and posterior knowledge more readily extracted from the network. Second, the relationship between generative models and discriminative models can be exploited, yielding architectures that utilize feedback connections and lateral connectivity. Third, the strengths of the neural network focus on LEARNING -- particularly discriminative learning -- and the Bayesian network focus on inference can be combined. Indeed, learning and inference can be fruitfully viewed as two sides of the same coin. Finally, the emphasis on approximation techniques and laws of large numbers that is present in the neural network literature can be transferred to the Bayesian network setting, yielding a variety of methods for approximate inference in complex Bayesian networks.

- COGNITIVE ARCHITECTURE
- COMPUTATION AND THE BRAIN
- COMPUTATIONAL NEUROSCIENCE
- CONNECTIONISM, PHILOSOPHICAL ISSUES
- DISTRIBUTED VS. LOCAL REPRESENTATION
- MODELING NEUROPSYCHOLOGICAL DEFICITS


Bishop, C. M. (1995). Neural Networks for Pattern Recognition. New York: Oxford University Press.

Duda, R. O., and P. E. Hart. (1973). Pattern Classification and Scene Analysis. New York: Wiley.

Haykin, S. (1994). Neural Networks: A Comprehensive Foundation. New York: Macmillan College Publishing.

Hertz, J., A. Krogh, and R. G. Palmer. (1991). Introduction to the Theory of Neural Computation. Redwood City, CA: Addison-Wesley.

Jensen, F. (1996). An Introduction to Bayesian Networks. London: UCL Press.

Jordan, M. I., Ed. (1998). Learning in Graphical Models. Cambridge, MA: MIT Press.

Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press.

Vapnik, V. (1995). The Nature of Statistical Learning Theory. New York: Springer.

Whittaker, J. (1990). Graphical Models in Applied Multivariate Statistics. New York: Wiley.