Complexity International
    ISSN 1320-0682     
Volume 3 April 1996
 

Adaptability in Neural Networks

Ken Lodge, Jim Geyer & Terry Bossomaier

...Lodge, Geyer & Bossomaier
School of Information Technology, Charles Sturt University, Bathurst, NSW 2795. Email: klodge@csu.edu.au

 

Abstract:

Poker is a game of simple rules yet complex strategy. It is particularly interesting, in that strategy is not constant with time but varies with players and duration of game. We investigate the training of neural nets for the classification of poker hands as a prelude to evolving strategies for playing the game itself. Choice of network is discussed and training algorithms are tested.


Introduction

Neural networks are the quintessential complex adaptive system. Amongst the many individual examples of complexity, few general principles have so far emerged. However, many such systems consist of agents, perhaps very simple, which interact in some way. One important underlying principle [4] is the dominance of connectivity over dynamics: the pattern of connectiviy may determine significant properties of a system independently of the dynamics of the connections. Such a principle dramatically simplifies the high level analysis of complex systems.

The respect in which connectivity is extremely important is the way in which it relates to phase transitions. Studies in different domains have all led to the idea of computation at the edge of chaos [6], an idea with strands from many areas, such as random Boolean Networks [7], and discussed at length in Kauffman [5]. This behaviour may be understood through its isomorphism to the behaviour of random digraphs which exhibit so-called connectivity avalanches at certain threshold levels of connectivity. It is during these so-called phase transitions that order is of long range and the system is most adaptable.

The need for adaptability in neural networks varies with the application of course. An animal doesn't need to continually relearn to walk (unless it gets injured), but it does need to continually update and adapt to a changing environment. Thus, the level of connectivity which makes for an optimally adaptive neural network is both interesting and of great practical importance.

An area of particular activity in the study of adaptive agents is the development of strategies for playing games. Although in games such as chess, deterministic searching methodologies have reached expert levels, such programs lack the innovative character of biological systems - as pointed out by Penrose [8] with an example easily solved by human novices which defeated the best such program. On the other hand, top class backgammon programs such as TDGammon [12] which are neural networks trained only by feedback as to whether they win or lose, develop strategies recognised by human experts as novel. Even extremely simple games such as iterated prisoners dilemma (see Fogel [2] for a review) lead to complex evolved strategies.

These considerations led us to a simplified form of poker as a template for studying the evolution of maximally adaptable neural networks:

The first stage of the investigation is to look at the behaviour of nets for the classification task before considering the evolution of players of the game in order to ensure that the tasks that we know to be important (there may be others) can indeed be carried out by the networks. From this vantage point, we can then study the networks of greatest flexibility, by studying degrees of connectivity.


Network Architecture

Although feedforward networks have been successfully used in games research [12, 13], they do not have the range of connectivity options to really investigate adaptability. Recurrent nets may contain all possible connections and are much closer to biological systems and are often used for real time systems.

The equation describing the evolution of the neural network in time is a simple equation with lumped synapses, no spatial distribution of the neuron and a sigmoid, continuous output function rather than a spiking mechanism. The equation is given below:

displaymath137

where v is the neuronal activation, tex2html_wrap_inline141 the sigmoid output function and I the input. Networks were used with up to full connectivity but excluding self-excitation.

The network architecture was chosen as follows:


Training

The first stage in the development of the poker player is to have it correctly classify hands. This is a problem that is easily done by conventional programming and also by feedforward networks. However, for reasons outlined above, a recurrent network is used.

The first problem considered is the identification of pairs, where exactly two cards are of the same denomination. Instead of a pack of normal size and the normal five-card hands, we consider, at various stages, the following reduced structures:


Use of a Genetic Algorithm (GA)

Genetic algorithms have been used for training recurrent neural nets, particularly for real time applications as for example in [9]. A similar algorithm was used here, without the extension of phenotype modification, for training a recurrent net to identify pairs with the twenty-card pack. GAs were thought to be an appropriate tool because they are frequently good for high-dimensional search spaces with rough landscapes. They also provide a wide coverage of the search space which should be useful for avoiding problems associated with local minima.

The GA was used only to adjust the weights, using a 20-bit binary representation of each weight, stochastic universal sampling, single-point crossover and, initially, a mutation rate chosen so that the probability of a mutation in just one bit from all the weights was 0.5. The population consisted of 40 individuals.

The GA was successful in training sparse networks of about 10% connectivity (50 weights) using minimal training sets (10 hands). The resulting networks were not able to generalise to categorise hands other than those on which they were trained. The GA was not successful with an increased training set of 30 hands. Problems were encountered with decreasing population diversity. Increasing the mutation rate and introducing fitness-sharing were marginally successful in alleviating this.

The GA was also applied to the training of a fully connected network with 30 hands in the training set. In this case, problems of network instability arose due to the number of large weights selected by the GA. Attempts to mitigate this, by reducing the bounds on the weights and by applying different mappings from the chromosomes to the weights, were unsuccessful since a wide range of weight values seem to be necessary for effective networks.


Use of Pineda's Algorithm

The recurrent back-propagation algorithm of Pineda [10] was also used to train the recurrent networks described above. The Pineda algorithm is a steepest descent algorithm which assumes that the network will converge to a fixed point as time tends to infinity.

Care has to be taken to ensure that for the sparse networks, the connectivity matrix remains sparse and that weight updates are not allowed to percolate through to non-existent connections. It is also essential to train adiabatically - that is, with an adequately low learning rate.

In particular, using the twenty-card pack, Pineda's algorithm was able to train the fully connected network using training sets of 32 and 64 randomly chosen hands. However, the networks again showed poor generalisation. After increasing the training set to 128 hands, the Pineda algorithm did not successfully train the network. A possibly significant difference between the training sets is the inclusion of a number of hands containing three of a kind in the 128-hand training set.

To investigate these problems further, Pineda's algorithm was applied to a number of intuitively derived modular structures that separate out the denomination components of a hand. For simplicity, this was done using the pack with three suits. The function being investigated is given in Table 1.

 

table36

Table 1: Sample training set for classification of pairs in a single denomination (for example, King or Ace)

Using fully connected modules with either one or two hidden units, we were able to train successfully the parity case using the complete set of inputs, but the pair classification remained obdurate, despite testing of a number of different starting weight distributions. It is interesting that we were able to train the 3-input even parity function, which differs from the above function only by having an output of 1 for the first row.


Discussion

The recurrent networks used here trained by either GA or the recurrent back propagation algorithm showed a capability of fitting training examples but poor capacity to generalise. The networks have a considerable number of degrees of freedom but the constraint that they converge to a fixed point may be unduly restrictive. In fact, biological systems exhibit oscillatory behaviour in many situations [11, 3]

It is in fact possible to demonstrate that oscillatory sub-populations will lead asymptotically to an essentially constant output. However, determining this output involves solving a set of coupled differential equations, a considerably more computationally expensive exercise than finding the fixed point of a dynamical system. Algorithms for finding such solutions are a source of further work.

However, from the point of view of evolving strategies for poker automatons, there is a great deal of flexibility in determining when and what output to take from a network in determining a move (in this case, the placing of a bet). The fitness function can be biased to emphasise nets which stabilise quickly, have only fixed points and so on. This remains the subject of further investigation.


References

1
D.H. Ackley and M.L. Littman. "Altruism in the evolution of communication". In R.A. Brooks and P. Maes, editors, ALife IV. MIT Press, 1994.

 

2
D. Fogel. "Evolutionary Computation". IEEE Press, 1995.

 

3
W.F. Freeman. "Colligation of cortical oscillations". In C.W. Gear, editor, Computers and Cognition, pages 69-103. SIAM, 1991.

 

4
D.G. Green. "Connectivity and the evolution of biological systems". Journal of Biological Systems, 2:91-103, 1994.

 

5
S. Kauffman. "The Origins of Order". Oxford University Press, 1993.

 

6
C.G. Langton. "Computation at the edge of chaos: Phase transitions and emergent computation". In S. Forrest, editor, Emergent Computation. MIT Press, 1990.

 

7
J.F. Lynch. "Phase transitions in random boolean networks". In R.A. Brooks and P. Maes, editors, ALife V, pages 236-245. MIT Press, 1994.

 

8
R. Penrose. "Shadows of the Mind". Oxford University Press, 1994.

 

9
V. Petridis and A. Papaikonomou. "Recurrent neural networks as pattern generators". In Proc. IEEE Conference on Neural Networks, pages 872-875, 1994.

 

10
F.J. Pineda. "Recurrent backpropagation and the dynamical approach to adaptive neural computation". Neural Computation, 1:161-172, 1989.

 

11
C. Skarda and W.F. Freeman. "How Brains Make Chaos in Order to Make Sense of the World". Volume 10, 1987.

 

12
G. Tesauro. "Practical issues in temporal difference learning". Machine Learning, 8:257-277, 1996.

 

13
L. Weaver and T.R.J. Bossomaier. "Evolution of neural networks to play the game of dots and boxes". In ALife V, 1996.

About this document ...

Adaptability in Neural Networks

This document was generated using the LaTeX2HTML translator Version 96.1 (Feb 5, 1996) Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.

The command line arguments were:
latex2html klodge2.

The translation was initiated by Pam Milliken on Fri Jan 17 13:54:01 EST 1997


Complexity International (1996) 3