Conditioning

When Ivan Pavlov observed that hungry dogs salivated profusely not only at the taste or sight of food, but also at the sight or sound of the laboratory attendant who regularly fed them, he described this salivation as a "psychical reflex" and later as a "conditional reflex." Salivation was an inborn, reflexive response, unconditionally elicited by food in the mouth, but it could also be elicited by other stimuli, conditionally on their having signaled the delivery of food. The term "conditional" was translated as "conditioned," whence by back-formation the verb "to condition," which has been used ever since.

In Pavlov's experimental studies of conditioning (1927), the unconditional stimulus (US), food or dilute acid injected into the dog's mouth, was delivered immediately after the presentation of the conditional stimulus (CS), a bell, metronome, or flashing light, regardless of the animal's behavior. The US served to strengthen or reinforce the conditional reflex of salivating to the CS, which would extinguish if the US was no longer presented. Hence the US is often referred to as a "reinforcer." Pavlovian or classical conditioning is contrasted with instrumental or operant conditioning, where the delivery of the reinforcer is dependent on the animal performing a particular response or action. This was first studied in the laboratory by Thorndike (1911), at much the same time as, but quite independently of, Pavlov's experiments. Thorndike talked of "trial-and-error learning," but the "conditioning" terminology was popularized by Skinner (1938), who devised the first successful fully automated apparatus for studying instrumental conditioning.

In Pavlovian conditioning, the delivery of the reinforcer is contingent on the occurrence of a stimulus (the CS), whereas in instrumental conditioning, it is contingent on the occurrence of a designated response. This operational distinction was first clearly articulated by Skinner, but Miller and Konorski (1928) in Poland and Grindley (1932) in England had already argued, on experimental and theoretical grounds, for the importance of this distinction. According to the simplest, and still widely accepted, interpretation of Pavlovian conditioning, the US serves to elicit a response (e.g., salivation), and pairing a CS with this US results in the formation of an association between the two, such that the presentation of the CS can activate a representation of the US, which then elicits the same (or a related) response. This account cannot explain the occurrence of instrumental conditioning. If the delivery of a food reinforcer is contingent on the execution of a particular response, this may well lead to the formation of an association between response and reinforcer. The Pavlovian principle can then predict that the dog performing the required response will salivate when doing so (a prediction that has been confirmed), but what needs to be explained is why the dog learns to perform the response in the first place.

Another way of stating the distinction between Pavlovian and instrumental conditioning is to note that instrumentally conditioned responses are modified by their consequences, much as Thorndike's law of effect, or Skinner's talk of "controlling contingencies of reinforcement," implied. The hungry rat that presses a lever to obtain food will desist from pressing the lever if punished for doing so. But Pavlovian conditioned responses are not modified by their consequences; they are simply elicited by a CS associated with a US, as experiments employing omission schedules demonstrate. If, in a Pavlovian experiment, the delivery of food to a hungry pigeon is signaled by the illumination of a small light some distance away from the food hopper, the pigeon will soon learn to approach and peck at this light, even though this pattern of behavior takes it farther away from the food, to the point of reducing the amount of food it obtains. Indeed, it will continue to approach and peck the light on a high proportion of trials even if the experimenter arranges that any such response actually cancels the delivery of food on that trial. The light, as a CS, has been associated with food, as a US, and comes to elicit the same pattern of behavior as food, approach and pecking, regardless of its consequences (Mackintosh 1983).

Most research in COMPARATIVE PSYCHOLOGY accepts that the conditioning process is of wide generality, common at least to most vertebrates, and allows them to learn about the important contingencies in their environment -- what events predict danger, what signs reliably indicate the availability of food, how to take effective action to avoid predators or capture prey; in short, to learn about the causal structure of their world. But why should cognitive scientists pay attention to conditioning? One plausible answer is that conditioning experiments provide the best way to study simple associative LEARNING, and associative learning is what NEURAL NETWORKS implement. Conditioning experiments have unique advantages for the study of associative learning: experiments on eyelid conditioning in rabbits, conditioned suppression in rats, or autoshaping in pigeons reveal the operation of simple associative processes untrammeled by other, cognitive operations that people bring to bear when asked to solve problems. And through such preparations researchers can directly study the rules governing the formation of single associations between elementary events. As many commentators have noted, there is a striking similarity between the Rescorla-Wagner (1972) model of Pavlovian conditioning and the Widrow-Hoff or delta rule frequently used to determine changes in connection weights in a parallel distributed processing (PDP) network (Sutton and Barto 1981). The phenomenon of "blocking" in Pavlovian conditioning provides a direct illustration of the operation of this rule: if a given reinforcer is already well predicted by CS1, further conditioning trials on which CS2 is added to CS1 and the two are followed by the reinforcer result in little or no conditioning to CS2.
The Rescorla-Wagner model explains this by noting that the strength of an association between a CS and reinforcer will change only when there is a discrepancy between the reinforcer that actually occurs and the one that was expected to occur. According to the delta rule, connections between elements in a network are changed only insofar as is necessary to bring them into line with external inputs to those elements.
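The error-correcting account of blocking can be illustrated with a minimal sketch. It assumes the usual textbook form of the Rescorla-Wagner update, in which every CS present on a trial changes its strength by a learning rate times the difference between the reinforcer obtained and the summed prediction of all CSs present. The function name, the learning rate of 0.3, the asymptote of 1.0, and the trial counts are illustrative assumptions, not values taken from any experiment.

```python
def rescorla_wagner(trials, alpha=0.3, lam=1.0):
    """Return associative strengths after a sequence of trials.

    trials: list of (present, reinforced) pairs, where `present` is a
    collection of CS names and `reinforced` says whether the US occurred.
    alpha (learning rate) and lam (asymptote) are illustrative assumptions.
    """
    strengths = {}
    for present, reinforced in trials:
        # Summed prediction of every CS present on this trial.
        expected = sum(strengths.get(cs, 0.0) for cs in present)
        # Discrepancy between the reinforcer obtained and the one expected.
        error = (lam if reinforced else 0.0) - expected
        # Every CS present shares the same error term.
        for cs in present:
            strengths[cs] = strengths.get(cs, 0.0) + alpha * error
    return strengths


def blocking_group():
    # Phase 1: CS1 alone is reinforced; phase 2: CS1 + CS2 is reinforced.
    phase1 = [(("CS1",), True)] * 50
    phase2 = [(("CS1", "CS2"), True)] * 20
    return rescorla_wagner(phase1 + phase2)


def control_group():
    # Control animals receive only the compound phase.
    return rescorla_wagner([(("CS1", "CS2"), True)] * 20)


if __name__ == "__main__":
    print("blocking group:", blocking_group())
    print("control group: ", control_group())
```

Under these assumptions, CS2 ends with almost no strength in the blocking group, because CS1 already predicts the reinforcer and leaves no error to drive learning, whereas in the control group CS1 and CS2 share the available strength roughly equally.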

But conditioning theorists, not least Rescorla and Wagner themselves, have long known that the Rescorla-Wagner model is incomplete in several important respects. A second determinant of the rate of change in the strength of an association between a CS and a US is the associability of the CS -- which can itself change as a consequence of experience. For example, in the phenomenon of latent inhibition, a novel CS will enter into association with a US rapidly, but a familiar one will condition only slowly. Inhibitory conditioning, when a CS signals the absence of an otherwise predicted US, is not the symmetrical opposite of excitatory conditioning, when the CS signals the occurrence of an otherwise unexpected US. Even the rather simple stimuli used in most conditioning experiments are, at least sometimes, represented as configurations of patterns of elements rather than as a simple sum of their elements (Pearce 1994). This last point has indeed been incorporated into many connectionist networks because a simple, elementary representation of stimuli makes the solution of many discriminations impossible. A familiar example is the XOR (exclusive or) problem: if each of two stimuli, A and B, signaled the delivery of a US when presented alone, but their combination, AB, predicted the absence of the US, a simple elementary system would respond more vigorously to the AB compound than to A or B alone, and thus fail to learn the discrimination. The solution must be to represent the compound as something more than, or different from, the sum of its components. But apart from this, not all connectionist models have acknowledged the modifications to error-correcting associative systems that conditioning theorists have been willing to entertain to supplement the simple Rescorla-Wagner model. 
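The XOR case can be sketched in the same way. The example below trains an error-correcting learner on A reinforced, B reinforced, and AB nonreinforced, once with a purely elemental coding and once with an extra unit that is active only on compound trials. The extra unit is one simple way of making the compound "more than the sum of its components" and is an assumption made for illustration; it is not an implementation of Pearce's (1994) model, and the function names, learning rate, and number of training cycles are likewise hypothetical.

```python
TRIALS = [("A", 1.0), ("B", 1.0), ("AB", 0.0)]  # A+, B+, AB-


def elemental(stimulus):
    # "AB" is coded as nothing more than the elements A and B.
    return set(stimulus)


def configural(stimulus):
    # Assumption: a compound also activates one extra unit of its own.
    units = set(stimulus)
    if len(units) > 1:
        units.add("".join(sorted(units)))
    return units


def train(encode, cycles=1000, alpha=0.1):
    """Delta-rule training over repeated cycles of the three trial types."""
    weights = {}
    for _ in range(cycles):
        for stimulus, outcome in TRIALS:
            units = encode(stimulus)
            prediction = sum(weights.get(u, 0.0) for u in units)
            error = outcome - prediction
            for u in units:
                weights[u] = weights.get(u, 0.0) + alpha * error
    return weights


def respond(weights, encode, stimulus):
    # Response strength is the summed weight of the active units.
    return sum(weights.get(u, 0.0) for u in encode(stimulus))


if __name__ == "__main__":
    for name, encode in (("elemental", elemental), ("configural", configural)):
        w = train(encode)
        print(name, {s: round(respond(w, encode, s), 3) for s, _ in TRIALS})
```

With the elemental coding, the response to AB stays above the response to A or B alone however long training continues, whereas with the added compound unit the learner comes to respond to A and B but not to AB.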
Conversely, some of the phenomena once thought to contradict, or lie well outside the scope of, standard conditioning theory, such as evidence of so-called constraints on learning (Seligman and Hager 1972), turn out on closer experimental and theoretical analysis to require little more than minor parametric changes to the theory (Mackintosh 1983). Conditioning theory and conditioning experiments may still have some important lessons to teach.

-- Nicholas J. Mackintosh

References

Grindley, G. C. (1932). The formation of a simple habit in guinea pigs. British Journal of Psychology 23:127-147.

Mackintosh, N. J. (1983). Conditioning and Associative Learning. Oxford: Oxford University Press.

Miller, S., and J. Konorski. (1928). Sur une forme particulière des réflexes conditionnels. C. R. Séances Soc. Biol. 99:1155-1157.

Pavlov, I. P. (1927). Conditioned Reflexes. Oxford: Oxford University Press.

Pearce, J. M. (1994). Similarity and discrimination: A selective review and a connectionist model. Psychological Review 101:587-607.

Rescorla, R. A., and A. R. Wagner. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black and W. F. Prokasy, Eds., Classical Conditioning, vol. 2, Current Research and Theory. New York: Appleton-Century-Crofts, pp. 64-99.

Seligman, M. E. P., and J. L. Hager, Eds. (1972). Biological Boundaries of Learning. New York: Appleton-Century-Crofts.

Skinner, B. F. (1938). The Behavior of Organisms. New York: Appleton-Century-Crofts.

Sutton, R. S., and A. G. Barto. (1981). Toward a modern theory of adaptive networks: Expectation and prediction. Psychological Review 88:135-170.

Thorndike, E. L. (1911). Animal Intelligence: Experimental Studies. New York: Macmillan.