Robotics and Learning

Learning will play an increasingly important role in the design and implementation of autonomous robotic systems. Robots are notoriously difficult to program, because the correctness of their behavior depends on details of interaction with the environment, which are typically unknown to human engineers. In addition, truly flexible, robust robots will have to adapt to their specific and changing environmental conditions.

There are many opportunities for learning in a complex robotic system; the two most common are learning models and learning behavior. A robot may learn a model of its environment, in the form of a map, a kinematic or dynamical system, or an extended HIDDEN MARKOV MODEL. The model can then be used for planning, possibly using techniques of DYNAMIC PROGRAMMING. Another approach is to learn behavior directly, typically expressed as a mapping from perceptual inputs to effector outputs.
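As a rough illustration of the model-based route, the sketch below estimates transition probabilities and rewards for a small, discrete world from counted experience and then plans over the learned model with value iteration, a standard dynamic-programming technique. The problem size, names, and smoothing are illustrative assumptions, not part of any particular robot system.

    # Minimal model-learning-plus-planning sketch for an assumed toy discrete world.
    import numpy as np

    n_states, n_actions, gamma = 5, 2, 0.9

    counts = np.ones((n_states, n_actions, n_states))   # smoothed transition counts
    reward_sum = np.zeros((n_states, n_actions))         # summed observed rewards

    def record(s, a, s_next, r):
        """Fold one experienced transition into the learned model."""
        counts[s, a, s_next] += 1
        reward_sum[s, a] += r

    def plan(n_iters=100):
        """Value iteration on the estimated model; returns a greedy policy."""
        P = counts / counts.sum(axis=2, keepdims=True)    # estimated P(s' | s, a)
        R = reward_sum / counts.sum(axis=2)               # estimated mean reward
        V = np.zeros(n_states)
        for _ in range(n_iters):
            Q = R + gamma * P @ V                         # backed-up action values
            V = Q.max(axis=1)
        return Q.argmax(axis=1)                           # best action in each state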

A variety of MACHINE LEARNING algorithms exist, but they do not all apply well to robot learning, which is special for a number of reasons. Robots interact with their environments through physical sensors and effectors that always have some degree of noise; algorithms for robot learning must be particularly tolerant of noisy inputs and outputs. For learning to be effective, most robots must learn "on line"; that is, they learn while they are performing their task in the environment. This means that learning algorithms must be efficient and incremental (processing data singly or in small batches rather than all at once) and that errors must be minimized throughout the life of the robot. Many robots are deployed in changing environments; this requires learning algorithms that can track a changing function. Finally, one of the most interesting special properties of robots is that they can often collect data actively, choosing to explore their environment in such a way as to make learning more efficient. This freedom to explore comes at the price of having to decide how to trade off gaining more information about the environment versus acting in the best possible way given the current information; this is often called the "exploration/exploitation dilemma."
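The sketch below illustrates the on-line, incremental flavor of these requirements in the simplest possible setting: action-value estimates are updated one noisy observation at a time with a constant step size (so they can track a changing environment), and an epsilon-greedy rule occasionally explores rather than exploits. The bandit-like setting and all constants are illustrative assumptions; a real robot faces a far richer decision problem.

    # Incremental, on-line learner with an epsilon-greedy exploration rule.
    import random

    class EpsilonGreedyLearner:
        def __init__(self, n_actions, epsilon=0.1, step_size=0.1):
            self.epsilon = epsilon            # probability of exploring
            self.step_size = step_size        # constant step lets estimates track change
            self.values = [0.0] * n_actions   # incremental estimates of action value

        def choose(self):
            if random.random() < self.epsilon:                  # explore
                return random.randrange(len(self.values))
            return max(range(len(self.values)),                 # exploit
                       key=lambda a: self.values[a])

        def update(self, action, reward):
            # One noisy observation at a time; no stored data set is needed.
            self.values[action] += self.step_size * (reward - self.values[action])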

Some of the most effective robot learning systems are built on SUPERVISED LEARNING, where there is a "teacher" that can supply a stream of desired outputs corresponding to the perceptual inputs of the robot. Pomerleau's ALVINN system (Pomerleau 1993) is an excellent example of supervised robot learning. A van learns to drive down a variety of roads at moderately high speeds based on visual input. The learning data is gathered with a human supervisor at the wheel, enabling the system to collect a series of input-output pairs, each consisting of a computer-vision image and the desired steering angle. This data is used to train a neural network, which can then steer the van unaided.
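A toy version of this supervised setup is sketched below: recorded (image, steering angle) pairs are used to fit a small network mapping pixels to a steering command by incremental gradient descent. The 30 x 32 input size echoes ALVINN's low-resolution input retina, but the architecture and training details here are simplified assumptions, not Pomerleau's actual system.

    # Toy supervised learner: flattened image -> steering angle.
    import numpy as np

    rng = np.random.default_rng(0)
    n_pixels, n_hidden = 30 * 32, 16

    W1 = rng.normal(0, 0.01, (n_pixels, n_hidden))   # input-to-hidden weights
    W2 = rng.normal(0, 0.01, (n_hidden, 1))          # hidden-to-output weights

    def predict(image):
        """Map a flattened image (length n_pixels) to a steering angle."""
        h = np.tanh(image @ W1)
        return (h @ W2).item()

    def train_step(image, target_angle, lr=1e-3):
        """One gradient-descent step on the squared steering error."""
        global W1, W2
        h = np.tanh(image @ W1)
        err = h @ W2 - target_angle                           # prediction error
        W2 -= lr * np.outer(h, err)
        W1 -= lr * np.outer(image, (err * W2[:, 0]) * (1 - h ** 2))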

Unfortunately, it is difficult to find such reliable supervisory information. Generally, humans are much better able to supply a "reinforcement" signal, which simply indicates when the robot is performing well or poorly. Techniques of REINFORCEMENT LEARNING can be used to learn behavior based on a reinforcement signal. This problem is much more difficult than supervised learning, because the robot is not told what outputs to generate for each input; this requires the robot to explore the space of possible actions and to be able to notice that an action taken much earlier may have been a contributing factor to current performance.
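A minimal tabular Q-learning sketch of this idea appears below: the robot receives only a scalar reward, and repeated backups gradually propagate credit from rewards to the earlier actions that helped produce them. The states, actions, and constants are illustrative assumptions; real robot applications typically need far more structure, as discussed next.

    # Tabular Q-learning sketch: learning from a scalar reinforcement signal.
    import random
    from collections import defaultdict

    Q = defaultdict(float)                 # Q[(state, action)] -> estimated return
    alpha, gamma, epsilon = 0.1, 0.95, 0.1
    ACTIONS = ["left", "right", "forward"]

    def choose_action(state):
        if random.random() < epsilon:                        # occasional exploration
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q[(state, a)])     # otherwise exploit

    def q_update(state, action, reward, next_state):
        """One backup: move Q toward reward plus discounted best next value."""
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])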

Because reinforcement learning is difficult, robot systems that make use of it must have a large amount of built-in structure. In BEHAVIOR-BASED ROBOTICS, for example, the robot may learn to switch between fixed behaviors or may learn a collection of specific behaviors given different reinforcement functions. Other robots learn a dynamical model of the world using supervised techniques (Atkeson, Moore, and Schaal 1997), then use the model to generate actions with techniques of optimal control, which are appropriate for a restricted range of tasks.
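The sketch below gives the flavor of this model-based alternative: experienced transitions are stored, a locally weighted average predicts the outcome of each candidate action, and the action whose predicted outcome has the lowest task-specific cost is chosen. It is a deliberately crude simplification, not the locally weighted learning methods cited above.

    # Locally weighted forward model plus one-step lookahead action selection.
    import numpy as np

    memory = []   # list of (state-action vector, next state) pairs

    def remember(state, action, next_state):
        memory.append((np.concatenate([state, action]), next_state))

    def predict_next(state, action, bandwidth=0.5):
        """Locally weighted average of stored outcomes near this query.
        Assumes some experience has already been stored in memory."""
        q = np.concatenate([state, action])
        X = np.array([x for x, _ in memory])
        Y = np.array([y for _, y in memory])
        w = np.exp(-np.sum((X - q) ** 2, axis=1) / (2 * bandwidth ** 2))
        return (w[:, None] * Y).sum(axis=0) / (w.sum() + 1e-9)

    def choose_action(state, candidate_actions, cost):
        """Pick the candidate action whose predicted next state has least cost."""
        return min(candidate_actions,
                   key=lambda a: cost(predict_next(state, np.asarray(a))))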

There are still very few examples of real robotic systems that learn to behave, but this is a very active area of current research. The range of current robot learning applications and research is well documented in two collections of papers: a book edited by Connell and Mahadevan (1993) and a journal issue edited by Franklin, Mitchell, and Thrun (1996).

See also

-- Leslie Pack Kaelbling

References

Connell, J. H., and S. Mahadevan, Eds. (1993). Robot Learning. Dordrecht: Kluwer.

Franklin, J. A., T. M. Mitchell, and S. Thrun, Eds. (1996). Machine Learning, vol. 23, no. 2/3. Dordrecht: Kluwer.

Atkeson, C. G., A. W. Moore, and S. Schaal. (1997). Locally weighted learning for control. Artificial Intelligence Review 11:75-113.

Pomerleau, D. A. (1993). Neural Network Perception for Mobile Robot Guidance. Boston: Kluwer.