The study of neural networks is the study of information processing in networks of elementary numerical processors. In some cases these networks are endowed with a certain degree of biological realism and the goal is to build models that account for neurobiological data. In other cases abstract networks are studied and the goal is to develop a computational theory of highly parallel, distributed information-processing systems. In both cases the emphasis is on accounting for intelligence via the statistical and dynamic regularities of highly interconnected, large-scale networks.
Historically, neural networks arose as a number of loosely connected strands, many of which were subsequently absorbed into mainstream engineering disciplines. Some of the earliest research on neural networks (in the 1940s and 1950s) involved the study of interconnected systems of binary switches, or MCCULLOCH-PITTS neurons. This research also contributed to the development of AUTOMATA theory and dynamic systems theory. Links between these fields and neural networks continue to the present day.
Other early research efforts emphasized adaptive systems. In the 1950s and 1960s, Widrow and others studied adaptive linear systems and in particular the LMS algorithm (cf. SUPERVISED LEARNING IN MULTILAYER NEURAL NETWORKS). This work led to the field of adaptive signal processing and provided the basis for later extensions to nonlinear neural networks. Adaptive classifiers (systems with a discrete output variable) were also studied during the same period in the form of the "perceptron" algorithm and related schemes; these efforts contributed to the emergence of the engineering field of PATTERN RECOGNITION, which continues to house much neural network research. Finally, efforts in the area of REINFORCEMENT LEARNING formed a strand of neural network research with strong ties to CONTROL THEORY. In the 1980s, these ties were further solidified by research establishing a link between reinforcement learning and optimal control theory, in particular the optimization technique of DYNAMIC PROGRAMMING.
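The LMS rule itself is simple enough to state in a few lines. The following sketch (the variable names, step size, and toy target are illustrative assumptions, not drawn from the text above) adapts a linear filter's weights by the product of the instantaneous error and the input:

```python
import numpy as np

def lms_step(w, x, d, mu=0.05):
    """One LMS update: w <- w + mu * e * x, where e = d - w.x
    is the instantaneous error between desired and actual output."""
    e = d - w @ x
    return w + mu * e * x, e

# Recover a (hypothetical) noiseless linear target w* = [2, -1]
# from a stream of random input/output pairs.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
w = np.zeros(2)
for _ in range(2000):
    x = rng.normal(size=2)
    w, _ = lms_step(w, x, w_true @ x)
```

Because each update moves the weights along the instantaneous gradient of the squared error, the estimate contracts toward the target as samples stream in, which is what makes the rule attractive for online adaptive filtering.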
Neural networks received much attention during the 1970s and 1980s, partly as a reaction against the prevailing symbolic approach to the study of intelligence in artificial intelligence (AI). Emphasizing architectures that largely dispense with centralized sequential processing and strict separation between process and data, researchers studied distributed processing in highly parallel architectures (cf. COGNITIVE MODELING, CONNECTIONIST). Intelligence was viewed in terms of mechanisms of CONSTRAINT SATISFACTION and pattern recognition rather than explicit symbol manipulation. A number of technical developments sustained research during this period, two of which stand out.
First, the dynamics of symmetrical networks (networks in which a connection from node A to node B of a given strength implies a connection from B to A of the same strength) was elucidated by the discovery of energy functions (see RECURRENT NETWORKS). This allowed network dynamics to be understood in terms of a (generally finite) set of attractors, points in the state space toward which trajectories tend as the nodes in the network are updated. This gave a satisfying formal interpretation of constraint satisfaction in neural networks -- as the minimization of an energy function -- and provided an interesting implementation of an associative memory: the attractors are the memories.
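These ideas can be sketched concretely. The fragment below (a minimal illustration, assuming the standard outer-product storage rule; the specific network size and pattern are hypothetical) shows that asynchronous updates of a symmetric network never increase a quadratic energy, so a corrupted pattern relaxes to a stored attractor:

```python
import numpy as np

def energy(W, s):
    # E(s) = -1/2 s^T W s for a symmetric weight matrix with zero diagonal
    return -0.5 * s @ W @ s

def recall(W, s, n_sweeps=5):
    """Asynchronous threshold updates; each flip can only lower the
    energy, so the state settles into an attractor (a stored memory)."""
    s = s.copy()
    for _ in range(n_sweeps):
        for i in range(len(s)):
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s

# Store one +/-1 pattern with the outer-product rule (symmetric weights).
p = np.array([1, -1, 1, -1, 1, -1], dtype=float)
W = np.outer(p, p)
np.fill_diagonal(W, 0)

noisy = p.copy()
noisy[0] *= -1            # corrupt one bit
recalled = recall(W, noisy)
```

Here the corrupted state sits higher on the energy surface than the stored pattern, and the update dynamics slide it down into the attractor: the memory is retrieved by minimization rather than by lookup.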
The second important technical development was the discovery of a class of learning algorithms for general networks. The focus on learning algorithms can either be viewed as a natural outgrowth of the earlier research on adaptive algorithms for simple one-layer networks (e.g., the LMS algorithm and the perceptron), or as a necessity born of the fact that general networks are difficult to analyze and accordingly difficult to program. In any case, the algorithms have greatly extended the range of the networks that can be utilized in models and in practical applications, so much so that in AI and engineering the topic of neural networks has become essentially synonymous with the study of numerical learning algorithms.
The earliest successes were obtained with SUPERVISED LEARNING algorithms. These algorithms require an error signal at each of the output nodes of the network. The paradigm case is that of the layered feedforward network, a network with no feedback connections between layers and no lateral connections within a layer. Input patterns are presented at the first layer, and each subsequent layer is updated in turn, resulting in an output at the final layer. This output is compared to a desired output pattern, yielding an error signal. Algorithms differ in how they utilize this error signal, but in one way or another the error signal is propagated backward into the network to compute updates to the weights and thereby decrease the error.
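The backward flow of the error signal can be sketched for a two-layer network. In the following sketch the tanh hidden layer, the learning rate, and the XOR task (with a constant bias input) are illustrative assumptions, not prescriptions from the text above:

```python
import numpy as np

def forward(x, W1, W2):
    """Layered feedforward pass: tanh hidden layer, linear output."""
    h = np.tanh(W1 @ x)
    return h, W2 @ h

def backprop_step(x, t, W1, W2, lr=0.05):
    """Compare the output to the target, propagate the error backward,
    and update both weight layers to decrease the squared error."""
    h, y = forward(x, W1, W2)
    e = y - t                        # error signal at the output layer
    dW2 = np.outer(e, h)             # gradient for hidden-to-output weights
    delta = (W2.T @ e) * (1 - h**2)  # error pushed back through tanh
    dW1 = np.outer(delta, x)
    return W1 - lr * dW1, W2 - lr * dW2

# XOR -- the classic task a one-layer network cannot represent.
rng = np.random.default_rng(0)
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
T = np.array([0.0, 1.0, 1.0, 0.0])
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(1, 4))

def mse(W1, W2):
    return np.mean([(forward(x, W1, W2)[1][0] - t) ** 2
                    for x, t in zip(X, T)])

err0 = mse(W1, W2)
for _ in range(3000):
    for x, t in zip(X, T):
        W1, W2 = backprop_step(x, np.array([t]), W1, W2)
```

The essential pattern is visible in `backprop_step`: the output error `e` is multiplied back through the output weights and the derivative of the hidden nonlinearity, yielding a per-layer error signal from which each weight update follows.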
A wide variety of theoretical results are available concerning neural network computation. Layered neural networks have been shown to be universal, in the sense of being able to represent essentially any function. Best approximation results are available for large classes of feedforward networks. Recurrent neural networks have been shown to be TURING-equivalent and have also been shown to be able to represent a wide class of nonlinear dynamic systems. A variety of results are also available for supervised learning in neural networks. In particular, the Vapnik-Chervonenkis (VC) dimension (a measure of the sample complexity of a learning system; see COMPUTATIONAL LEARNING THEORY and STATISTICAL LEARNING THEORY) has been computed for simple networks, and bounds on the VC dimension are available for more complex networks. In classification problems, network learning algorithms have been shown to converge to the posterior probabilities of the classes. Methods from statistical physics have been utilized to characterize learning curves. Finally, Bayesian statistical methods (see BAYESIAN LEARNING) have been exploited both for the analysis of supervised learning and for the design of new algorithms.
Recent years have seen an increase in interest in UNSUPERVISED LEARNING and a concomitant growth in interest in fully probabilistic approaches to neural network design. The unsupervised learning framework is in many ways more powerful and more general than supervised learning, requiring no error signal and no explicit designation of nodes as input nodes or output nodes. One general way to approach the problem involves specifying a generative model -- an explicit model of the way in which the environment is assumed to generate data. In the neural network setting, such models are generally realized in the form of a network. The learner's uncertainty about the environment is formalized by annotating the network with probabilities. The learning problem in this setting becomes the classic statistical problem of finding the best model to fit the data. The learner may either explicitly manipulate an instantiation of the generative model, or may utilize a network that is obtained by inverting the generative model (e.g., via an application of Bayes's rule). The latter network is often referred to as a discriminative network.
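Inverting a generative model via Bayes's rule can be illustrated with a deliberately tiny example. The two-class Gaussian model and its parameters below are assumptions chosen for illustration, not part of the discussion above:

```python
import numpy as np

# Assumed generative model: pick a class c in {0, 1} with prior pi_c,
# then generate x from a Gaussian class-conditional density p(x | c).
pi = np.array([0.4, 0.6])       # hypothetical class priors
mu = np.array([-1.0, 2.0])      # hypothetical class means
sigma = 1.0                     # shared standard deviation

def gaussian(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def posterior(x):
    """Invert the generative model with Bayes's rule:
    p(c | x) = p(x | c) pi_c / sum_c' p(x | c') pi_c'."""
    joint = pi * gaussian(x, mu, sigma)
    return joint / joint.sum()
```

The inverted, discriminative direction maps observations to class beliefs: an `x` near a class mean yields a posterior concentrated on that class. For equal-variance Gaussian classes the posterior is in fact a logistic sigmoid in `x`, the familiar output unit of neural network classifiers, which hints at why generative and discriminative networks are so closely related.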
Probabilistic network models are studied in other areas of AI. In particular, BAYESIAN NETWORKS provide a general formalism for designing probabilistic networks. It is interesting to note that essentially all of the unsupervised learning architectures that have been studied in the neural network literature can be obtained by specifying a generative model in the form of a Bayesian network.
This rapprochement between neural networks and Bayesian networks has a number of important consequences that are of current research interest. First, the Bayesian network formalism makes it natural to specify and manipulate prior knowledge, an ability that eluded earlier, nonprobabilistic neural networks. By associating a generative model with a neural network, prior knowledge can be more readily incorporated and posterior knowledge more readily extracted from the network. Second, the relationship between generative models and discriminative models can be exploited, yielding architectures that utilize feedback connections and lateral connectivity. Third, the strengths of the neural network focus on LEARNING -- particularly discriminative learning -- and the Bayesian network focus on inference can be combined. Indeed, learning and inference can be fruitfully viewed as two sides of the same coin. Finally, the emphasis on approximation techniques and laws of large numbers that is present in the neural network literature can be transferred to the Bayesian network setting, yielding a variety of methods for approximate inference in complex Bayesian networks.