|
ISSN 1320-0682 | ||||
| Volume 3 | April 1996 | ||||
Ken Lodge, Jim Geyer & Terry Bossomaier
Poker is a game of simple rules yet complex strategy. It is particularly interesting, in that strategy is not constant with time but varies with players and duration of game. We investigate the training of neural nets for the classification of poker hands as a prelude to evolving strategies for playing the game itself. Choice of network is discussed and training algorithms are tested.
Neural networks are the quintessential complex adaptive system. Amongst the many individual examples of complexity, few general principles have so far emerged. However, many such systems consist of agents, perhaps very simple, which interact in some way. One important underlying principle [4] is the dominance of connectivity over dynamics: the pattern of connectiviy may determine significant properties of a system independently of the dynamics of the connections. Such a principle dramatically simplifies the high level analysis of complex systems.
The respect in which connectivity is extremely important is the way in which it relates to phase transitions. Studies in different domains have all led to the idea of computation at the edge of chaos [6], an idea with strands from many areas, such as random Boolean Networks [7], and discussed at length in Kauffman [5]. This behaviour may be understood through its isomorphism to the behaviour of random digraphs which exhibit so-called connectivity avalanches at certain threshold levels of connectivity. It is during these so-called phase transitions that order is of long range and the system is most adaptable.
The need for adaptability in neural networks varies with the application of course. An animal doesn't need to continually relearn to walk (unless it gets injured), but it does need to continually update and adapt to a changing environment. Thus, the level of connectivity which makes for an optimally adaptive neural network is both interesting and of great practical importance.
An area of particular activity in the study of adaptive agents is the development of strategies for playing games. Although in games such as chess, deterministic searching methodologies have reached expert levels, such programs lack the innovative character of biological systems - as pointed out by Penrose [8] with an example easily solved by human novices which defeated the best such program. On the other hand, top class backgammon programs such as TDGammon [12] which are neural networks trained only by feedback as to whether they win or lose, develop strategies recognised by human experts as novel. Even extremely simple games such as iterated prisoners dilemma (see Fogel [2] for a review) lead to complex evolved strategies.
These considerations led us to a simplified form of poker as a template for studying the evolution of maximally adaptable neural networks:
The first stage of the investigation is to look at the behaviour of nets for the classification task before considering the evolution of players of the game in order to ensure that the tasks that we know to be important (there may be others) can indeed be carried out by the networks. From this vantage point, we can then study the networks of greatest flexibility, by studying degrees of connectivity.
Although feedforward networks have been successfully used in games research [12, 13], they do not have the range of connectivity options to really investigate adaptability. Recurrent nets may contain all possible connections and are much closer to biological systems and are often used for real time systems.
The equation describing the evolution of the neural network in time is a simple equation with lumped synapses, no spatial distribution of the neuron and a sigmoid, continuous output function rather than a spiking mechanism. The equation is given below:
where v is the neuronal activation,
the sigmoid
output function and I the input. Networks were used with up to full
connectivity but excluding self-excitation.
The network architecture was chosen as follows:
The first stage in the development of the poker player is to have it correctly classify hands. This is a problem that is easily done by conventional programming and also by feedforward networks. However, for reasons outlined above, a recurrent network is used.
The first problem considered is the identification of pairs, where exactly two cards are of the same denomination. Instead of a pack of normal size and the normal five-card hands, we consider, at various stages, the following reduced structures:
Genetic algorithms have been used for training recurrent neural nets, particularly for real time applications as for example in [9]. A similar algorithm was used here, without the extension of phenotype modification, for training a recurrent net to identify pairs with the twenty-card pack. GAs were thought to be an appropriate tool because they are frequently good for high-dimensional search spaces with rough landscapes. They also provide a wide coverage of the search space which should be useful for avoiding problems associated with local minima.
The GA was used only to adjust the weights, using a 20-bit binary representation of each weight, stochastic universal sampling, single-point crossover and, initially, a mutation rate chosen so that the probability of a mutation in just one bit from all the weights was 0.5. The population consisted of 40 individuals.
The GA was successful in training sparse networks of about 10% connectivity (50 weights) using minimal training sets (10 hands). The resulting networks were not able to generalise to categorise hands other than those on which they were trained. The GA was not successful with an increased training set of 30 hands. Problems were encountered with decreasing population diversity. Increasing the mutation rate and introducing fitness-sharing were marginally successful in alleviating this.
The GA was also applied to the training of a fully connected network with 30 hands in the training set. In this case, problems of network instability arose due to the number of large weights selected by the GA. Attempts to mitigate this, by reducing the bounds on the weights and by applying different mappings from the chromosomes to the weights, were unsuccessful since a wide range of weight values seem to be necessary for effective networks.
The recurrent back-propagation algorithm of Pineda [10] was also used to train the recurrent networks described above. The Pineda algorithm is a steepest descent algorithm which assumes that the network will converge to a fixed point as time tends to infinity.
Care has to be taken to ensure that for the sparse networks, the connectivity matrix remains sparse and that weight updates are not allowed to percolate through to non-existent connections. It is also essential to train adiabatically - that is, with an adequately low learning rate.
In particular, using the twenty-card pack, Pineda's algorithm was able to train the fully connected network using training sets of 32 and 64 randomly chosen hands. However, the networks again showed poor generalisation. After increasing the training set to 128 hands, the Pineda algorithm did not successfully train the network. A possibly significant difference between the training sets is the inclusion of a number of hands containing three of a kind in the 128-hand training set.
To investigate these problems further, Pineda's algorithm was applied to a number of intuitively derived modular structures that separate out the denomination components of a hand. For simplicity, this was done using the pack with three suits. The function being investigated is given in Table 1.

Table 1: Sample training set for classification of pairs in a single denomination (for example, King or Ace)
Using fully connected modules with either one or two hidden units, we were able to train successfully the parity case using the complete set of inputs, but the pair classification remained obdurate, despite testing of a number of different starting weight distributions. It is interesting that we were able to train the 3-input even parity function, which differs from the above function only by having an output of 1 for the first row.
The recurrent networks used here trained by either GA or the recurrent back propagation algorithm showed a capability of fitting training examples but poor capacity to generalise. The networks have a considerable number of degrees of freedom but the constraint that they converge to a fixed point may be unduly restrictive. In fact, biological systems exhibit oscillatory behaviour in many situations [11, 3]
It is in fact possible to demonstrate that oscillatory sub-populations will lead asymptotically to an essentially constant output. However, determining this output involves solving a set of coupled differential equations, a considerably more computationally expensive exercise than finding the fixed point of a dynamical system. Algorithms for finding such solutions are a source of further work.
However, from the point of view of evolving strategies for poker automatons, there is a great deal of flexibility in determining when and what output to take from a network in determining a move (in this case, the placing of a bet). The fitness function can be biased to emphasise nets which stabilise quickly, have only fixed points and so on. This remains the subject of further investigation.
Adaptability in Neural Networks
This document was generated using the LaTeX2HTML translator Version 96.1 (Feb 5, 1996) Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
The command line arguments were:
latex2html klodge2.
The translation was initiated by Pam Milliken on Fri Jan 17 13:54:01 EST 1997