Game-Playing Systems

Games have long been popular in Artificial Intelligence as idealized domains suitable for research into various aspects of search, KNOWLEDGE REPRESENTATION, and the interaction between the two. CHESS, in particular, has been dubbed the "Drosophila of Artificial Intelligence" (McCarthy 1990), suggesting that the role of games in Artificial Intelligence is akin to that of the fruit fly in genetic research. In each case, certain practical advantages compensate for the lack of intrinsic importance of the given problem. In genetic research, fruit flies make it easy to maintain large populations with a short breeding cycle at low cost. In Artificial Intelligence research, games generally have rules that are well defined and can be stated in a few sentences, allowing for a relatively straightforward computer implementation. Yet the combinatorial complexity of interesting games can create immensely difficult problems. It has taken many decades of research, combined with sufficiently powerful computers, to approximate the level of leading human experts in many popular games. And in some games, human players still reign supreme.

Games can be classified according to a number of criteria, among them number of players, perfect versus hidden information, presence of a stochastic element, zero-sum versus non-zero-sum, average branching factor, and the size of the state space. Different combinations of characteristics emphasize different research issues. Much of the early research in game-playing systems concentrated on two-person zero-sum games of perfect information with low or moderate branching factors, in particular chess and checkers. Claude Shannon's 1950 paper on programming a computer to play chess mapped out much territory for later researchers. Alan TURING wrote a chess program (Turing et al. 1953), which he hand-simulated in the early 1950s. The earliest fully functioning chess program was described in Bernstein et al. (1958). The first chess program demonstrably superior to casual human chess players appeared in the mid-sixties (Greenblatt et al. 1967), and progress continued as faster machines became available and algorithms were refined (Slate and Atkin 1977; Condon and Thompson 1982; Hyatt, Gower, and Nelson 1990; Berliner and Ebeling 1990; Hsu et al. 1990). But it took until 1997 for a computer, the IBM Deep Blue chess machine, to defeat the human world chess champion, Garry Kasparov, in a regulation match.

Much of the success in game-playing systems has come from approaches based on depth-first minimax search with alpha-beta pruning in two-person zero-sum games of perfect information. This is essentially a brute-force search technique, searching forward as many moves as possible in an allotted time, assessing positions according to an evaluation function, and choosing the best move based on the minimax principle. The evaluation function captures essential domain knowledge. In fact, there is often a trade-off between the quality of the evaluation function and the depth of search required to achieve a given level of play. Minimax search is made more efficient through the use of alpha-beta pruning (Knuth and Moore 1975), which allows searching roughly twice as deep as would be possible in a pure minimax search. Notable examples of this approach include Deep Blue; Chinook (Schaeffer et al. 1992), which has defeated the world's best checkers players; and Logistello (Buro 1995), which has easily beaten the top human Othello players. The methods used in these programs and others of this type have been constantly improved and refined, and include such techniques as iterative deepening, transposition tables, null-move pruning, endgame databases, and singular extensions, as well as increasingly sophisticated evaluation functions.
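The search procedure these programs share can be sketched in a few lines. The following is a minimal illustrative implementation of depth-first minimax with alpha-beta pruning over a hypothetical hand-built game tree, not the code of any program named above; real systems add iterative deepening, transposition tables, and the other refinements listed.

```python
# Minimal sketch of depth-first minimax search with alpha-beta pruning.
# A leaf is an int (the static evaluation); an internal node is a list
# of child subtrees. Real programs use an incremental move generator
# and a domain-specific evaluation function instead.

def alphabeta(node, alpha, beta, maximizing):
    """Return the minimax value of `node`, pruning lines of play that
    cannot affect the final move choice."""
    if isinstance(node, int):                  # leaf: static evaluation
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:                  # beta cutoff: MIN avoids this line
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:                  # alpha cutoff: MAX avoids this line
                break
        return value

# A tiny 2-ply tree: MAX chooses a move, then MIN chooses a reply.
tree = [[3, 5], [6, 9], [1, 2]]
value = alphabeta(tree, float("-inf"), float("inf"), True)
print(value)  # prints 6: MIN answers each MAX move with the worst leaf
```

Note that the third subtree is cut off after its first leaf: once MIN can reach 1 there, the branch can never beat the value 6 already guaranteed to MAX, which is the source of the roughly doubled search depth mentioned above.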

In spite of the success of high-performance alpha-beta-based game-playing systems, there has been limited transfer of ideas generated in this work to other areas of Artificial Intelligence (although see, for example, Newborn's work on theorem proving; Newborn 1992). There are, however, many alternatives to minimax with alpha-beta that have been examined. Conspiracy numbers search (McAllester 1988) counts the number of positions whose evaluations must change in order for a different move to be chosen. This idea has led to methods capable of solving some interesting nontrivial games (Allis 1994). Decision-theoretic approaches to game playing, particularly under constraints of limited resources (BOUNDED RATIONALITY), are promising and have broad applicability outside the game-playing area. For example, Russell and Wefald (1991) reasoned explicitly about when to terminate a search, based on the expected utility of further search and the cost of the time required for the additional work. Statistical methods for search and evaluation (Baum and Smith 1997) have also shown promise, adding uncertainty to a standard evaluation function and then using the inexact statistical information to approximate the exploration of the most important positions.
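The Russell and Wefald stopping criterion can be illustrated schematically: search continues only while the estimated utility gain of another increment of search exceeds the cost of the time it consumes. The gain estimates and time cost below are hypothetical numbers chosen for illustration, not part of their actual formulation.

```python
# Schematic sketch of metareasoning about search termination
# (in the spirit of Russell and Wefald 1991): stop as soon as the
# expected improvement in decision quality from one more search step
# no longer pays for the time that step costs. All numbers here are
# hypothetical.

def steps_worth_searching(expected_gains, time_cost_per_step):
    """expected_gains[i]: estimated utility improvement from step i.
    Returns how many steps of search are worthwhile."""
    steps = 0
    for gain in expected_gains:
        if gain <= time_cost_per_step:   # further search no longer pays
            break
        steps += 1
    return steps

# Diminishing returns: deeper search helps less and less.
gains = [0.50, 0.30, 0.10, 0.05]
print(steps_worth_searching(gains, 0.20))  # prints 2
```

The interesting point is that the decision to search at all is itself made by expected-utility reasoning, which is why the idea travels well beyond games to general resource-bounded decision making.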

MACHINE LEARNING has a long history of using games as domains for experimentation. Samuel's checkers program (Samuel 1959), originally developed in the 1950s, employed both a rote-learning scheme and a method for tuning the coefficients of its evaluation function. Many current chess and Othello programs use forms of rote learning to avoid losing the same game twice. The Logistello program has also used an automatically tuned evaluation function with excellent results. However, the most noteworthy example of learning in game-playing systems is TD-Gammon (Tesauro 1995), a neural network program for playing backgammon. For games with stochastic elements, for instance the dice in backgammon, forward searching approaches are less efficient, which places a premium on the quality of the evaluation function. TD-Gammon used a reinforcement learning algorithm to train its neural network solely by playing against itself and learning from the results. This network, with or without some limited search, produces world-class play in backgammon (Tesauro and Galperin 1997). Reinforcement learning also has applications in other decision-making and scheduling problems.
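The temporal-difference idea behind TD-Gammon can be shown on a toy problem. The sketch below uses a simple random-walk game with a lookup-table value function rather than a neural network and a real backgammon position encoding; it is an illustrative simplification of the TD(0) update, not Tesauro's actual setup.

```python
# Toy sketch of temporal-difference learning, TD(0) style: after each
# move, nudge the estimated value of the previous state toward the
# estimated value of the successor state. The "game" is a random walk
# on states 0..6, where reaching 6 wins and reaching 0 loses; the true
# win probability from state s is s/6.

import random

random.seed(0)

N = 6                                    # terminal states: 0 (loss), N (win)
ALPHA = 0.1                              # learning rate
V = {s: 0.5 for s in range(N + 1)}       # V(s): estimated chance of winning
V[0], V[N] = 0.0, 1.0                    # terminal values are known

def self_play_episode():
    """Play one game from the middle; learn from each transition."""
    s = N // 2
    while 0 < s < N:
        s_next = s + random.choice([-1, 1])
        V[s] += ALPHA * (V[s_next] - V[s])   # TD(0) update
        s = s_next

for _ in range(2000):
    self_play_episode()
```

After a few thousand self-play episodes the table orders the states correctly (states nearer the winning end are valued higher), with no supervision beyond the game outcomes themselves; TD-Gammon applies the same principle with a neural network generalizing across backgammon positions.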

Some games are very difficult for computers at the present time. Go programs are actively being developed (Chen et al. 1990), but are far from the level of the best human players. The alpha-beta search paradigm, which is so successful in chess, checkers, and the like, is not directly applicable to Go because of the large branching factor. More subtle approaches involving decomposition of a game state into subproblems appear to be necessary. Hidden information games, such as bridge, are also not appropriate for direct application of the alpha-beta ALGORITHM, and have begun to receive more attention (Ginsberg 1996). Some other games are too difficult for even initial attempts. For example, in the game of Nomic (Suber 1990) a player's turn involves an amendment or addition to an existing set of rules. Initially players vote on proposed changes, but the game can evolve into something completely different. There is no clear way at present to design a game-playing system to play a game such as Nomic due to the tremendous amount of world knowledge required.

Game-playing systems have helped illustrate the role of search and knowledge working together in systems for solving complex problems, but games have also been useful as domains for experimentation with various types of machine learning and HEURISTIC SEARCH. The absence of significant learning capabilities in most game-playing systems, as well as the difficulty in creating high-performance programs for games such as Go, suggest that games are still fertile domains for Artificial Intelligence research.

-- Murray Campbell

References

Allis, V. (1994). Searching for Solutions in Games and Artificial Intelligence. Maastricht: University of Limburg.

Baum, E. B., and W. D. Smith. (1997). A Bayesian approach to game playing. Artificial Intelligence 97:195-242.

Berliner, H., and C. Ebeling. (1990). Hitech. In T. A. Marsland and J. Schaeffer, Eds., Computers, Chess, and Cognition. New York: Springer, pp. 79-110.

Bernstein, A., M. deV. Roberts, T. Arbuckle, and M. A. Belsky. (1958). A chess-playing program for the IBM 704. Proceedings of the Western Joint Computer Conference. New York: The American Institute of Electrical Engineers.

Buro, M. (1995). Statistical feature combination for the evaluation of game positions. Journal of Artificial Intelligence Research 3:373-382.

Chen, K., A. Kierulf, M. Müller, and J. Nievergelt. (1990). The design and evolution of Go Explorer. In T. A. Marsland and J. Schaeffer, Eds., Computers, Chess, and Cognition. New York: Springer, pp. 271-286.

Condon, J., and K. Thompson. (1982). Belle chess hardware. In M. Clarke, Ed., Advances in Computer Chess 3. New York: Pergamon, pp. 45-54.

Ginsberg, M. (1996). Partition search. Proceedings of AAAI-96.

Greenblatt, R. D., D. E. Eastlake, and S. D. Crocker. (1967). The Greenblatt Chess Program. Proceedings of the Fall Joint Computer Conference. The American Federation of Information Processing Societies.

Hsu, F., T. Anantharaman, M. Campbell, and A. Nowatzyk. (1990). Deep Thought. In T. A. Marsland and J. Schaeffer, Eds., Computers, Chess, and Cognition. New York: Springer, pp. 55-78.

Hyatt, R., A. Gower, and H. Nelson. (1990). Cray Blitz. In T. A. Marsland and J. Schaeffer, Eds., Computers, Chess, and Cognition. New York: Springer, pp. 111-130.

Knuth, D., and R. Moore. (1975). An analysis of alpha-beta pruning. Artificial Intelligence 6:293-326.

McAllester, D. (1988). Conspiracy numbers for min-max search. Artificial Intelligence 35:287-310.

McCarthy, J. (1990). Chess as the Drosophila of AI. In T. A. Marsland and J. Schaeffer, Eds., Computers, Chess, and Cognition. New York: Springer, pp. 227-238.

Newborn, M. (1992). A theorem-proving program for the IBM PC. In D. Kopec and R. B. Thompson, Eds., Artificial Intelligence and Intelligent Tutoring Systems. Upper Saddle River, NJ: Prentice Hall, pp. 65-92.

Russell, S., and E. Wefald. (1991). Principles of metareasoning. Artificial Intelligence 49:361-395.

Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development 3:210-229.

Schaeffer, J., J. Culberson, N. Treloar, B. Knight, P. Lu, and D. Szafron. (1992). A world championship caliber checkers program. Artificial Intelligence 53:273-290.

Shannon, C. (1950). Programming a computer for playing chess. Philosophical Magazine 41:256-275.

Slate, D., and L. Atkin. (1977). Chess 4.5 -- The Northwestern Chess Program. In P. Frey, Ed., Chess Skill in Man and Machine. Berlin: Springer, pp. 82-118.

Suber, P. (1990). The Paradox of Self-Amendment: A Study of Law, Logic, Omnipotence and Change. New York: Peter Lang.

Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM 38:58-68.

Tesauro, G., and R. Galperin. (1997). On-line policy improvement using Monte-Carlo search. In M. I. Jordan and M. C. Mozer, Eds., Advances in Neural Information Processing Systems 9: Proceedings of the 1996 Conference. Cambridge, MA: MIT Press.

Turing, A., C. Strachey, M. Bates, and B. Bowden. (1953). Digital computers applied to games. In B. Bowden, Ed., Faster Than Thought. New York: Pitman, pp. 286-310.