Complexity International      ISSN 1320-0682     
Volume 02 April 1995

On the Dynamics of Genetic Algorithms (and Other Evolving Systems)

Ben Goertzel, Malwane Ananda and Matthew Iklé
Computer Science Department
Waikato University
Hamilton, New Zealand

Department of Mathematical Sciences
University of Nevada, Las Vegas
Las Vegas Nevada 89154 USA

Email: goertzel@waikato.ac.nz

Abstract:

The genetic algorithm is approximated by a deterministic iteration on , called the "iterated mean path" and obtained by repeatedly iterating the function which takes a population into its expected value after one step of the GA. This is a "no genetic drift" approximation which becomes increasingly accurate as population size increases. It is shown that the optimum of the fitness function is usually not an attractor for the iterated mean path; a result which reflects the empirical phenomenon of "dithering". Also, a surprisingly simple formula for the Jacobian determinant of is presented, which states that under appropriate conditions, this determinant is inversely proportional to a positive power of the average fitness of the population. These results, it is argued, suggest the possibility of two general principles of evolutionary dynamics, extending beyond the context of the GA: a law of diversity generation and a law of fitness-driven focussing.

 


Introduction

The genetic algorithm (GA) is both a useful optimisation technique and a rough schematic model of biological evolution [1,2,3]. Thus, there is reason to expect that a solid theoretical understanding of the dynamics of genetic optimisation would be doubly useful. Not only would it provide guidance for practical GA applications, but it might well provide hints toward the solution of the more general mystery of how evolution works.

Unfortunately, the underlying dynamics of the GA have, as yet, largely eluded understanding. This paper describes a new method of analysis which partially remedies this deficit. We describe how, by approximating the GA with a certain nonlinear mapping on , one may obtain interesting mathematical results which lend significant insight into the empirical behaviour of the GA, and which also, perhaps more importantly, suggest general principles of evolutionary dynamics extending beyond the context of genetic algorithms.

Inspired by physical research in which complex, intractable field dynamics are analysed with by a rough but workable "mean field approximation", we introduce a "mean population approximation" of the GA. The GA is replaced with a nonlinear recursion formed by repeatedly iterating the function, which takes a population into its expected value after one step of the GA. By starting this recursion from an initial point representing a fixed initial population, one obtains a deterministic path through population space, which we call the "iterated mean path" of the GA.

Iterated mean path analysis is exact for the case where population size approaches infinity. If one assumes that GA performance scales up with population size (an intuitively reasonable claim regarding which, however, there is no unanimity [2]), then it follows that the iterated mean path theory is a "best case" analysis of the long-run average behaviour of the GA, and thus, in a sense, an upper bound on GA performance. In biological terms, the iterated mean path is a "no genetic drift" approximation. It therefore coincides with a long tradition in theoretical biology of ignoring genetic drift [4].

Here we will show, first of all, that iterated mean path analysis gives a novel explanation of the oft-observed phenomenon of "dithering". Even for very large population sizes, the GA will dither around the answer instead of zooming right in; and this is because the optimum of the objective function is not, under usual circumstances, an attractor for the dynamical system implied by the iterated mean path.

Next, we will present a compact formula for the Jacobian determinant of the iterated mean path. Under appropriate conditions, this determinant is proportional to the product of the values of the fitness function, divided by a positive integer power of the average fitness of the population. This striking formula shows that, if one ignores genetic drift, the scope of search of the GA progressively decreases as the overall fitness of the population increases. The proof of this formula is relatively lengthy and will be given elsewhere [5].

The results described here are restricted to a relatively simple, standardised version of the GA: we will assume a fixed-size population of fixed-length binary strings evolving with non-overlapping generations, and probability of selection proportional to fitness. For our treatment of dithering, the crossover operator will remain fairly general; for our analysis of the Jacobian determinant, however, we will assume standard crossover at one cut-point. However, these restrictions have been made in the interest of mathematical tractability, and are not considered to reflect fundamental conceptual limitations of the iterated mean path approach. In fact, it seems quite plausible that the results may apply, in spirit if not in detail, beyond the context of the GA, to the actual dynamics of evolving biological, psychological, chemical and computational systems.

In particular, these results suggest two potential "laws of evolution":

These two principles, if validated by further investigation, would go a long way toward explaining the remarkable ability of evolutionary systems to generate complex structures.


The iterated mean path

Throughout this paper we will assume a simple version of the "standard genetic algorithm" described in [2]; that is, we will consider a population of N M-bit binary sequences, evolving with non-overlapping generations and with selection probability proportional to unscaled fitness. For the sake of simplicity, we will set the mutation rate to zero, although as will be noted, this does not affect the fundamental character of the results. Finally, where f is the non-negative-valued fitness function, we will use the notation = .

To facilitate the expression of formulae involving crossover, we will represent a crossover operator as a collection of probabilities: will denote the probability of crossing bit string k with bit string l to obtain bit string i. For instance, perhaps the most common crossover operator is one-cut-point crossover.

Some insight into the mathematical peculiarity of the GA may be obtained by considering binary strings as integers, and observing that for this simple case one has

where gives the greatest integer not greater than x, and is 1 or 0 depending on whether s is true or false.

Suppose that gives the proportion of type i in generation n-1. It is not hard to see that the expected proportion of type i in generation n, is given by

Where , the iterated mean path of the genetic algorithm is then given by the formula:

where the initial vector represents the vector of proportions derived from the fixed initial population.

One may represent the GA as a mapping from vectors of probabilities into probability measures on the space of probability vectors. Where is taken to denote the mapping resulting from r generations of the GA, the following lemma follows from straightforward applications of Chebyshev's Inequality and the Bonferroni Inequality.

Though the function is a perfectly accurate representation of the iterated mean path, it is useful to introduce a different representation - one which takes into account the implicit constraint that all the arguments of must sum to one. By replacing with the function defined by

-dimensional space. The results to be derived below, all regard the Jacobian derivative of the function , as given by the following formulas:

 


Dithering

Whoever has done serious practical work with genetic algorithms has observed the phenomenon of "dithering". Given a reasonable fitness function, a suitable representation and an adequate population size, the GA is often successful at evolving a fair approximation to the optimum of the fitness function. But when it comes to the next stage of the optimisation process - closing in on the precise value of the answer - the GA's performance declines rather drastically. It tends to dither around the answer, moving from one nearby value to another in an apparently chaotic fashion. In this way, the GA is precisely the opposite of classical optimisation schemes like Newton's method, which are at their best when in the vicinity of the answer.

The iterated mean path approximation illuminates the phenomenon of dithering in a very simple way. By the definition of derivative, the behaviour of the iterated mean path near the answer can be determined by calculating its derivative at the answer. For example, if we arbitrarily declare that the maximum of the objective function occurs at , then it can be determined by plugging the "answer" vector into the derivative formula.

In classical numerical analysis, one classifies optimisation methods by their "order of convergence" to the answer; it is assumed that the answer will be an attractor. But it turns out that, in the case of the GA, the answer is generally not an attractor. To see this, it suffices to show that the norm of the Jacobian at the answer, in any natural norm, exceeds one. This implies that there is some direction from the answer in which the iteration is expanding rather than contracting, so that the answer must be either a saddle point or a repeller [6].

For ease of computation, we choose the 1-norm, and arrive at the following formula. Assume that whenever ; then the question is whether, where satisfies .

This formula is not terribly enlightening. It may be simplified, however, making use of specific properties of the bit string representation. Also, it is easiest to restrict attention to one cut-point crossover, although similar formulas may be generated for the more general case. The worst case is where q=0, yielding

But this worst case is extremely unlikely. For the vast majority of objective functions, q will differ from 0 more than of its bits (for a review of Hamming distance geometry, see [7]), yielding

for "almost all" objective functions f in the limit of large M.

The appearance of in these inequalities is puzzling and one is tempted to ignore it - after all, how much importance can be attached to the value of f at one particular point in its domain? Its appearance would appear to result from the use of the 1-norm and could probably be removed by a more sophisticated analysis.

But despite its limitations, this result explains the phenomenon of dithering. The requirement that be less than of for all is much too strong to be satisfied in reality. In general, there will be a spectrum of fitness values gradually approaching the optimum .

The conclusion is that for most objective functions, even for arbitrarily large populations and no mutations, the answer is not an attractor in probability space. So how could one possibly expect the GA to converge with finite population size and mutation? The genetic algorithm, it would seem, intrinsically generates diversity. Even once it has found a good answer to its problem, it keeps on forming a variety of different individuals anyway. And one suspects that this property may hold for all evolving systems, biological, chemical, psychological and computational.


A surprising formula

Given a Jacobian matrix, the natural thing to do is to compute its eigenvalues and determinant. When the elements of the matrix are multivariable polynomials, however, direct calculation becomes implausible. It is thus pleasantly surprising to find that, for the case of one cut-point crossover, at any rate, the quantity takes on a simple and elegant form. The question of the eigenvalues remains open, as does the generalisation to more general crossover operators.

The formula in question is given by Theorem 2.

The proof of this formula is subtle and fairly lengthy, and may be found in [5]. It is a surprisingly, almost shockingly neat equation, given the horrendously complex Jacobian which gave rise to it. The only complicated part is the constant, which apparently has to do with the dimension reduction from the operator to the operator , since it is equal to .

Finally, the assumption of zero mutation rate is not crucial to this formula. To incorporate a mutation rate, it suffices to consider the GA with mutation as a two-stage process: one stage of true crossover and another stage of mutation. The mutation stage may be represented as a constant linear operator on probability vectors, namely

where denotes the probability of i mutating into j. The dimension of this matrix may be reduced just as with . According to the chain rule, the Jacobian of the two-stage GA is then given by the product of the determinant of this mutation matrix with the determinant of the crossover Jacobian. Thus, mutation merely serves to add an extra constant outside the formula of the theorem. For most mutation schemes, the mutation matrix is only singular for finitely many values of the mutation rate, so that the constant will rarely be zero (numerical experiments verify that this is true for independent point mutation with M<6).

As noted in the Introduction, this determinant formula has interesting philosophical implications. The numerator simply scales the determinant by a power of the geometric mean of the fitness function; and the denominator decreases the determinant whenever the fitness increases, and vice versa. The quantity measures the proportion by which stretches areas in probability vector space; thus, it is a rough measure of the scope of the search carried out by the genetic algorithm. The theorem suggests that genetic algorithms accomplish their remarkably effective optimisation by the simple strategy of adaptively narrowing the scope of their crossover-based statistical search.


Broader implications

Iterated mean path analysis does not give an exact description of the behaviour of the genetic algorithm. However, it has the immense benefit of mathematical tractability, and, as we have seen, it leads to very suggestive results - results which hold out the promise of extending beyond the genetic algorithm to the general domains of biological, psychological, chemical and computational evolution.

Theorem 1 suggests a law which, from the biological point of view, seems at first sight, fairly obvious: a law of diversity generation. However, the novelty is that this law is seen to follow from crossover in a constant environment, without need for mutation or environmental variation. This suggests that diversity generation is a far more integral part of evolution that has previously been thought.

Theorem 2 suggests a law which is far less obvious. Taken literally, it hints at the possibility of a general power law relationship between scope of evolutionary search and average fitness; or, taken more loosely, it suggests a general law of fitness-driving focussing. The idea that the scope of evolutionary search depends only on the average fitness of the population, and not on the specific makeup of the population, is a new and powerful one.

Due to genetic drift, Theorems 1 and 2 hold only approximately for real genetic algorithms. Thus, when thinking of further applications, the best that can sensibly be hoped for is that the laws suggested by these results approximate the behaviour of real evolving systems. But this is not terribly disappointing - after all, evolution theory is not physics; there are no exact laws in evolutionary biology. The laws of diversity generation and fitness-driven focussing give us a new view of the ideal to which evolutionary systems aspire.

Clearly, these two proposed laws, crucial though they may perhaps be to our understanding of evolutionary dynamics, do not suffice to explain the creative power of evolution. They say nothing about the "amenability to evolution" of different environments (different fitness functions), let alone about the intricate interdependence of evolution and ecology [3]. However, it seems quite likely that, as we persist in searching for robust, general laws of evolution, the iterated mean path approximation will continue to be a useful tool.


References

1
Holland J. (1975), Adaptation in Natural and Artificial Systems, Ann Arbor: University of Michigan Press.

2
Goldberg D. (1988), Genetic Algorithms for Search, Optimization and Machine Learning, New York: Addison-Wesley.

3
Goertzel B. (1993), The Evolving Mind, New York: Gordon and Breach.

4
Kimura Y. (1978), The Neutral Theory of Molecular Evolution, Cambridge: Cambridge University Press.

5
Bachman G., Goertzel B. & Iklé M. (1994), "A surprising determinant formula concerning the genetic algorithm", in preparation.

6
Devaney R. (1988), Chaotic Dynamical Systems, New York: Addison-Wesley.

7
Kanerva P. (1988), Sparse Distributed Memory, Cambridge, MA: MIT Press.

About this document ...

On the Dynamics of Genetic Algorithms (and Other Evolving Systems)

This document was generated using the LaTeX2HTML translator Version 95.1 (Fri Jan 20 1995) Copyright © 1993, 1994, Nikos Drakos, Computer Based Learning Unit, University of Leeds.

The command line arguments were:
l2h -dir test gozl2.tex.

The translation was initiated by Pam Milliken on Fri Aug 16 16:12:12 EST 1996


Complexity International (1995) 2