|
ISSN 1320-0682 | |||
| Volume 02 | April 1995 | |||
Ben Goertzel, Malwane Ananda and Matthew Iklé
Department of Mathematical Sciences
Email: goertzel@waikato.ac.nz
Computer Science Department
Waikato University
Hamilton, New Zealand
University of Nevada, Las Vegas
Las Vegas Nevada 89154 USA
The genetic algorithm (GA) is both a useful optimisation technique and a rough schematic model of biological evolution [1,2,3]. Thus, there is reason to expect that a solid theoretical understanding of the dynamics of genetic optimisation would be doubly useful. Not only would it provide guidance for practical GA applications, but it might well provide hints toward the solution of the more general mystery of how evolution works.
Unfortunately, the underlying dynamics of the GA have, as yet,
largely eluded understanding. This paper describes a new
method of analysis which partially remedies this deficit. We
describe how, by approximating the GA with a certain nonlinear
mapping on
, one may obtain interesting mathematical
results which lend significant insight into the empirical
behaviour of the GA, and which also, perhaps more importantly,
suggest general principles of evolutionary dynamics extending
beyond the context of genetic algorithms.
Inspired by physical research in which complex, intractable field dynamics are analysed with by a rough but workable "mean field approximation", we introduce a "mean population approximation" of the GA. The GA is replaced with a nonlinear recursion formed by repeatedly iterating the function, which takes a population into its expected value after one step of the GA. By starting this recursion from an initial point representing a fixed initial population, one obtains a deterministic path through population space, which we call the "iterated mean path" of the GA.
Iterated mean path analysis is exact for the case where population size approaches infinity. If one assumes that GA performance scales up with population size (an intuitively reasonable claim regarding which, however, there is no unanimity [2]), then it follows that the iterated mean path theory is a "best case" analysis of the long-run average behaviour of the GA, and thus, in a sense, an upper bound on GA performance. In biological terms, the iterated mean path is a "no genetic drift" approximation. It therefore coincides with a long tradition in theoretical biology of ignoring genetic drift [4].
Here we will show, first of all, that iterated mean path analysis gives a novel explanation of the oft-observed phenomenon of "dithering". Even for very large population sizes, the GA will dither around the answer instead of zooming right in; and this is because the optimum of the objective function is not, under usual circumstances, an attractor for the dynamical system implied by the iterated mean path.
Next, we will present a compact formula for the Jacobian determinant of the iterated mean path. Under appropriate conditions, this determinant is proportional to the product of the values of the fitness function, divided by a positive integer power of the average fitness of the population. This striking formula shows that, if one ignores genetic drift, the scope of search of the GA progressively decreases as the overall fitness of the population increases. The proof of this formula is relatively lengthy and will be given elsewhere [5].
The results described here are restricted to a relatively simple, standardised version of the GA: we will assume a fixed-size population of fixed-length binary strings evolving with non-overlapping generations, and probability of selection proportional to fitness. For our treatment of dithering, the crossover operator will remain fairly general; for our analysis of the Jacobian determinant, however, we will assume standard crossover at one cut-point. However, these restrictions have been made in the interest of mathematical tractability, and are not considered to reflect fundamental conceptual limitations of the iterated mean path approach. In fact, it seems quite plausible that the results may apply, in spirit if not in detail, beyond the context of the GA, to the actual dynamics of evolving biological, psychological, chemical and computational systems.
In particular, these results suggest two potential "laws of evolution":
These two principles, if validated by further investigation, would go a long way toward explaining the remarkable ability of evolutionary systems to generate complex structures.
Throughout this paper we will assume a simple version of the
"standard genetic algorithm" described in [2]; that
is, we will consider a population of N M-bit binary
sequences, evolving with non-overlapping generations and with
selection probability proportional to unscaled fitness. For the sake
of simplicity, we will set the mutation rate to zero, although as
will be noted, this does not affect the fundamental character of
the results. Finally, where f is the non-negative-valued
fitness function, we will use the notation
=
.
To facilitate the expression of formulae involving crossover, we
will represent a crossover operator as a collection of
probabilities:
will denote the probability of crossing
bit string k with bit string l to obtain bit string i. For
instance, perhaps the most common crossover operator is
one-cut-point crossover.
Some insight into the mathematical peculiarity of the GA may be obtained by considering binary strings as integers, and observing that for this simple case one has
![]()
where
gives the greatest integer not greater
than x,
and
is 1 or 0 depending on whether s is true or
false.
Suppose that
gives the proportion of type i in
generation n-1. It is not hard to see that the expected
proportion of type i in generation n, is given by

Where
, the iterated mean path of
the genetic algorithm is then given by the formula:
![]()
where the initial vector
represents the vector of
proportions derived from the fixed initial population.
One may represent the GA as a mapping
from vectors of
probabilities into probability measures on the space of
probability vectors.
Where
is taken to denote the mapping resulting from
r generations of the GA, the following lemma follows from
straightforward applications of Chebyshev's Inequality and the
Bonferroni Inequality.

Though the function
is a perfectly accurate
representation of the iterated mean path, it is useful to
introduce a different representation - one which takes into
account the implicit constraint that all the arguments of
must sum to one. By replacing
with the function
defined by
![]()
-dimensional space.
The results to be derived below, all regard the Jacobian
derivative of the function
, as given by the following
formulas:

![]()
Whoever has done serious practical work with genetic algorithms has observed the phenomenon of "dithering". Given a reasonable fitness function, a suitable representation and an adequate population size, the GA is often successful at evolving a fair approximation to the optimum of the fitness function. But when it comes to the next stage of the optimisation process - closing in on the precise value of the answer - the GA's performance declines rather drastically. It tends to dither around the answer, moving from one nearby value to another in an apparently chaotic fashion. In this way, the GA is precisely the opposite of classical optimisation schemes like Newton's method, which are at their best when in the vicinity of the answer.
The iterated mean path approximation illuminates the phenomenon
of dithering in a very simple way. By the definition of
derivative, the behaviour of the iterated mean path near the
answer can be determined by calculating its derivative at the
answer. For example, if we arbitrarily declare that the maximum
of the objective function occurs at
, then it can
be determined by plugging the "answer" vector
into the derivative formula.
In classical numerical analysis, one classifies optimisation methods by their "order of convergence" to the answer; it is assumed that the answer will be an attractor. But it turns out that, in the case of the GA, the answer is generally not an attractor. To see this, it suffices to show that the norm of the Jacobian at the answer, in any natural norm, exceeds one. This implies that there is some direction from the answer in which the iteration is expanding rather than contracting, so that the answer must be either a saddle point or a repeller [6].
For ease of computation, we choose the 1-norm, and arrive at
the following formula. Assume that
whenever
; then the question is whether, where
satisfies
.

This formula is not terribly enlightening. It may be simplified, however, making use of specific properties of the bit string representation. Also, it is easiest to restrict attention to one cut-point crossover, although similar formulas may be generated for the more general case. The worst case is where q=0, yielding
![]()
But this worst case is extremely unlikely. For the vast majority
of objective functions, q will differ from 0 more than
of its bits (for a review of Hamming distance
geometry, see [7]), yielding
![]()
for "almost all" objective functions f in the limit of large M.
The appearance of
in these inequalities is puzzling and one
is tempted to ignore it - after all, how much importance can be
attached to the value of f at one particular point in its domain?
Its appearance would appear to result from the use of the 1-norm
and could probably be removed by a more sophisticated
analysis.
But despite its limitations, this result explains the phenomenon
of dithering. The requirement that
be less than
of
for all
is much too strong
to be satisfied in reality. In general, there will be a spectrum
of fitness values gradually approaching the optimum
.
The conclusion is that for most objective functions, even for arbitrarily large populations and no mutations, the answer is not an attractor in probability space. So how could one possibly expect the GA to converge with finite population size and mutation? The genetic algorithm, it would seem, intrinsically generates diversity. Even once it has found a good answer to its problem, it keeps on forming a variety of different individuals anyway. And one suspects that this property may hold for all evolving systems, biological, chemical, psychological and computational.
Given a Jacobian matrix, the natural thing to do is to compute
its eigenvalues and determinant. When the elements of the matrix
are multivariable polynomials, however, direct calculation
becomes implausible. It is thus pleasantly surprising to find
that, for the case of one cut-point crossover, at any rate, the
quantity
takes on a simple and elegant form.
The question of the eigenvalues remains open, as does the
generalisation to more general crossover operators.
The formula in question is given by Theorem 2.

The proof of this formula is subtle and fairly lengthy, and may
be found in [5]. It is a surprisingly, almost
shockingly neat equation, given the horrendously complex Jacobian
which gave rise to it. The only complicated part is the
constant, which apparently has to do with the dimension reduction
from the operator
to the operator
, since it is
equal to
.
Finally, the assumption of zero mutation rate is not crucial to this formula. To incorporate a mutation rate, it suffices to consider the GA with mutation as a two-stage process: one stage of true crossover and another stage of mutation. The mutation stage may be represented as a constant linear operator on probability vectors, namely
![]()
where
denotes the probability of i mutating into j.
The dimension of this matrix may be reduced just as with
.
According to the chain rule, the Jacobian of the two-stage GA is
then given by the product of the determinant of this mutation
matrix with the determinant of the crossover Jacobian. Thus,
mutation merely serves to add an extra constant outside the
formula of the theorem. For most mutation schemes, the mutation
matrix is only singular for finitely many values of the mutation
rate, so that the constant will rarely be zero (numerical
experiments verify that this is true for independent point
mutation with M<6).
As noted in the Introduction, this determinant formula has
interesting philosophical implications. The numerator simply
scales the determinant by a power of the geometric mean of the
fitness function; and the denominator decreases the determinant
whenever the fitness increases, and vice versa. The quantity
measures the proportion by which
stretches areas in probability vector space; thus, it is a rough
measure of the scope of the search carried out by the
genetic algorithm. The theorem suggests that genetic algorithms
accomplish their remarkably effective optimisation by the simple
strategy of adaptively narrowing the scope of their
crossover-based statistical search.
Iterated mean path analysis does not give an exact description of the behaviour of the genetic algorithm. However, it has the immense benefit of mathematical tractability, and, as we have seen, it leads to very suggestive results - results which hold out the promise of extending beyond the genetic algorithm to the general domains of biological, psychological, chemical and computational evolution.
Theorem 1 suggests a law which, from the biological point of view, seems at first sight, fairly obvious: a law of diversity generation. However, the novelty is that this law is seen to follow from crossover in a constant environment, without need for mutation or environmental variation. This suggests that diversity generation is a far more integral part of evolution that has previously been thought.
Theorem 2 suggests a law which is far less obvious. Taken literally, it hints at the possibility of a general power law relationship between scope of evolutionary search and average fitness; or, taken more loosely, it suggests a general law of fitness-driving focussing. The idea that the scope of evolutionary search depends only on the average fitness of the population, and not on the specific makeup of the population, is a new and powerful one.
Due to genetic drift, Theorems 1 and 2 hold only approximately for real genetic algorithms. Thus, when thinking of further applications, the best that can sensibly be hoped for is that the laws suggested by these results approximate the behaviour of real evolving systems. But this is not terribly disappointing - after all, evolution theory is not physics; there are no exact laws in evolutionary biology. The laws of diversity generation and fitness-driven focussing give us a new view of the ideal to which evolutionary systems aspire.
Clearly, these two proposed laws, crucial though they may perhaps be to our understanding of evolutionary dynamics, do not suffice to explain the creative power of evolution. They say nothing about the "amenability to evolution" of different environments (different fitness functions), let alone about the intricate interdependence of evolution and ecology [3]. However, it seems quite likely that, as we persist in searching for robust, general laws of evolution, the iterated mean path approximation will continue to be a useful tool.
On the Dynamics of Genetic Algorithms (and Other Evolving Systems)
This document was generated using the LaTeX2HTML translator Version 95.1 (Fri Jan 20 1995) Copyright © 1993, 1994, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
The
command line arguments were:
l2h -dir test gozl2.tex.
The translation was initiated by Pam Milliken on Fri Aug 16 16:12:12 EST 1996