ISSN 1320-0682, Volume 03, April 1996
Russell K. Standish
Evolution is a process that involves both ecology and genetic mutation. Ecological interactions between species give rise to the selective pressures that comprise natural selection, and mutation gives rise to the variation within species upon which selection acts. There is a long history of the study of the dynamics of evolution, starting with the Lotka-Volterra equation [7]. There is a similar, though not quite so long, history in genetic algorithms [4], [5], studying the dynamics of mutation. Only recently, however, have people been able to consider the two processes together in order to understand evolution. These efforts have generally involved simulating the lives and procreation of individual organisms, for example Thomas Ray's Tierra model [8], [9], [10], or the coupling of genetic algorithms with neural networks [1]. These models are computationally intensive, and do not necessarily illuminate the system dynamics.
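We start with a generalised form of the Lotka-Volterra equation (in tensor notation)
$$\dot{\mathbf{n}} = \mathbf{r} * \mathbf{n} + \mathbf{n} * \boldsymbol{\beta}\mathbf{n} + \mathrm{mutate}(\boldsymbol{\mu}, \mathbf{r}, \mathbf{n}) \qquad (1)$$
Here, $\mathbf{n}$ is the population density, the component $n_i$ being the
number of individuals of species $i$, $\mathbf{r}$ is the
difference between reproduction and death, $\boldsymbol{\beta}$ is the
interaction matrix, with $\beta_{ij}$ being the
interaction between species $i$ and $j$, $*$ denotes elementwise
multiplication, and mutate is the mutation operator.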
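The ecological part of Equation (1), with the mutation operator left out, can be explored numerically. The following is a minimal sketch, assuming the componentwise form dn_i/dt = r_i n_i + n_i sum_j beta_ij n_j and a simple forward-Euler integrator. The function names, step size and the two-species parameter values are hypothetical choices for illustration, and are not taken from Ecolab.

```python
def lotka_volterra_step(n, r, beta, dt):
    """Advance dn_i/dt = r_i*n_i + n_i*sum_j(beta_ij*n_j) by one Euler step."""
    updated = []
    for i, n_i in enumerate(n):
        interaction = sum(beta[i][j] * n_j for j, n_j in enumerate(n))
        growth = (r[i] + interaction) * n_i
        # Clamp at zero: a population density cannot go negative.
        updated.append(max(n_i + dt * growth, 0.0))
    return updated


def integrate(n, r, beta, dt=0.01, steps=5000):
    """Apply repeated Euler steps and return the final densities."""
    for _ in range(steps):
        n = lotka_volterra_step(n, r, beta, dt)
    return n


# Hypothetical two-species competition; values chosen for illustration only.
r = [1.0, 0.5]
beta = [[-0.01, -0.005],
        [-0.002, -0.01]]
n_final = integrate([10.0, 10.0], r, beta)
print(n_final)
```

For these illustrative values the densities settle near the interior fixed point, where r_i + sum_j beta_ij n_j = 0 for both species.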
The difficulty with adding mutation to this model is how to define the
mapping between genotype space and phenotype space, or in other words, what
defines the embryology. A few studies, including Ray's Tierra world, do
this with an explicit mapping from the genotype to some particular organism
property (for example, interpreted as machine language instructions, or as
weights in a neural net). These organisms then interact with one another to
determine the population dynamics. In this model, however, we are doing away
with the organismal layer and so an explicit embryology is impossible. The only
possibility left is to use a statistical model of embryology. The mapping
between genotype space and the population parameters $\mathbf{r}$, $\boldsymbol{\beta}$ is expected
to look like a rugged landscape. However, if two genotypes are close together
(in a Hamming sense) then one might expect that the phenotypes are likely to be
similar, as would the population parameters. This I call random embryology
with locality.
In the simple case of point mutations, the probability P(x) of any
child lying distance x in genotype space from its parent follows a
Poisson distribution. Random embryology with locality implies that the
phenotypic parameters are distributed randomly about the parent species, with a
standard deviation that depends monotonically on the genotypic displacement.
The simplest such model is to distribute the phenotypic parameters in a
Gaussian fashion about the parent's values, with standard deviation
proportional to the genotypic displacement. This constant of proportionality
can be conflated with the species' intrinsic mutation rate, to give rise to
another phenotypic parameter $\boldsymbol{\mu}$. It is
assumed that the probability of a mutation generating a previously existing
species is negligible, and can be ignored. We also need another arbitrary
parameter $\rho$, the "species radius", which can be understood as the minimum
genotypic distance separating species, conflated with the same constant of
proportionality as $\boldsymbol{\mu}$.
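Random embryology with locality, as described above, can be illustrated for a single phenotypic parameter. This is only a sketch under stated assumptions: the genotypic distance is drawn from a Poisson distribution with a hypothetical mean `mean_distance`, the Gaussian standard deviation is taken to be `mu` times that distance, and a child whose scaled displacement falls inside the species radius `rho` is treated as belonging to the parent species. The function names are invented for the example and this is not Ecolab's implementation.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw a Poisson variate using Knuth's multiplication method."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def mutate_parameter(rng, parent_value, mu, rho, mean_distance=1.0):
    """Return a child's phenotypic value, or None if no new species arises.

    mean_distance (hypothetical) is the mean genotypic distance of a child.
    """
    # Genotypic displacement of the child from its parent.
    distance = sample_poisson(rng, mean_distance)
    # Scale by mu, the proportionality constant conflated with mutation rate.
    spread = mu * distance
    # Assumption: displacements inside the species radius stay in the parent species.
    if spread < rho:
        return None
    # Gaussian scatter about the parent's value with standard deviation spread.
    return rng.gauss(parent_value, spread)


rng = random.Random(42)
children = [mutate_parameter(rng, 1.0, mu=0.05, rho=0.08) for _ in range(1000)]
new_species = [c for c in children if c is not None]
print(len(new_species), "of", len(children), "mutations founded a new species")
```

Larger genotypic jumps are rarer but scatter the phenotype more widely, which is the locality property: nearby genotypes tend to have similar parameters.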
In summary, the mutation algorithm is as follows:
Figure 1: Evolution of the maximal eigenvalue of
for a typical
Ecolab run.
Figure 2: Evolution of the number of species for a typical Ecolab run.
These equations have been implemented in a computer model called Ecolab [12], [11]. Also reported is a stability analysis of the equations [12], [13], of which I will briefly summarise the results:
Figure 1 shows the evolution of the maximal eigenvalue of
. With a
random seeding of species and phenotypic values, the system rapidly finds one
of the fixed points (by a massive extinction event!) with a negative definite
. Over time,
mutations build up in the system, increasing the maximal eigenvalue towards 0.
What then follows are periods of episodic extinctions, and system growth
through speciation. Figure 2 shows the number of species with more than 10
individuals for a typical run. This is an example of self-organised
criticality [3], and gives rise to power law behaviour.
Do we see the same power law behaviour observed by others [2], [14]? The answer is emphatically yes. If speciation and extinction events occurred uniformly throughout history, as mandated by gradualism, one would expect species lifetimes to be exponentially distributed, as the waiting times of a Poisson process are. On a log-linear plot, this would be a straight line. Alternatively, if a power law spectrum were evident, the log-log plot would be straight. The two plots are shown in Figures 3 and 4.
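The straight-line test can also be made quantitative. The sketch below, which is an illustration and not part of Ecolab, measures how straight a lifetime histogram is on log-linear and on log-log axes using the squared correlation coefficient. The synthetic histogram and its exponent of -2 are hypothetical and stand in for real run data.

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation: 1.0 means the points lie on a straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)


def straightness(lifetimes, counts):
    """Return (R^2 on log-linear axes, R^2 on log-log axes)."""
    log_counts = [math.log(c) for c in counts]
    log_lifetimes = [math.log(t) for t in lifetimes]
    return r_squared(lifetimes, log_counts), r_squared(log_lifetimes, log_counts)


# Synthetic histogram following a power law with a hypothetical exponent of -2.
lifetimes = list(range(1, 51))
counts = [1000.0 * t ** -2 for t in lifetimes]
r2_loglinear, r2_loglog = straightness(lifetimes, counts)
print(f"log-linear R^2 = {r2_loglinear:.3f}, log-log R^2 = {r2_loglog:.3f}")
```

A power law gives the higher value on log-log axes, while an exponential distribution gives the higher value on log-linear axes. Empty histogram bins would need to be dropped before taking logarithms.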
Figure 3: Distribution of species lifetimes on a log-linear plot.
Figure 4: Distribution of species lifetimes on a log-log plot.
The equations and assumptions underlying Ecolab are general, and should be
testable by specific systems. It seems unlikely that anyone can create a
biological system where evolution occurs on human timescales, with the ability
to measure the phenotypic parameters $\mathbf{r}$, $\boldsymbol{\beta}$ and $\boldsymbol{\mu}$. However,
artificial life systems do offer this promise, in particular Tierra, which
mutates via point mutation and is a "well-stirred" model. Other systems, such
as Avida, can be applicable once migration is taken into account.
The general procedure of the tests is to extract individual species from a
Tierra run, run them individually to determine the $r_i$ and the diagonal
$\beta_{ii}$ terms, and then pit them in duels to extract the off-diagonal
$\beta_{ij}$ terms.
However, a number of pitfalls occur when one tries to do this. In the case of a
single species reproducing in Tierra, the population grows exponentially until
the carrying capacity is reached (memory is exhausted), after which each new
daughter cell born requires another cell to be killed. Tierra provided 6
different ways of doing this, ranging from killing at random to killing from
the top of a reaper queue, which roughly corresponds to the age of an
organism. This "clipped" dynamics is qualitatively different from the "sigmoid"
dynamics predicted by Equation (1).
There is an alternative form of "artificial death" which can reproduce the sigmoid behaviour. As each organism attempts to reproduce, it must take space from the environment. If this space is chosen randomly, a creature will die with probability proportional to the total number of cells (both adult and embryonic) currently allocated. In this work, we take this creature from the top of the reaper queue, preserving the age structure of the population.
Stated mathematically, the second order term due to this process reads (assuming equi-probability of any species occupying the top of the reaper queue)
where
is
the length in instructions of organism j,
is the
number of embryonic daughter cells allocated per organism, and soup_size
is the size of the soup in instructions. The extra factor of
comes about
because not only is the adult killed, but so is the embryo that has already
been counted in the
term.
In the case of a single replicating organism, this expression simplifies to
with
2,
as nearly every adult organism has an embryonic daughter cell. Table 1 shows
the ratio -
soup_size /
which should
be equal to
by Equation 3, for a short Tierra run. In Tierra notation, species names have
the genome length as a prefix, and an arbitrary 3 letter suffix. In these
examples, I have used instruction set number 1, and seeded the soup with Ray's
ancestor creature 0080aaa. In the table, the value NaN (for not a number)
refers to the situation
. Most of
the entries have
2, but a few
depart significantly from 2. These are worth closer inspection to determine
what is going on.
Table 1: Self phenotypic parameters for assorted Tierran organisms
Tierra was modified to print out the number of organisms
every million
instructions executed, giving rise to a time series
. The first
difference
=
-
is related
to the derivative by
by virtue of the fact that the computer time shares between individuals -
the greater the number of individuals, the fewer instructions each individual
can execute for every million instructions.
and
can be
computed by fitting the line
=
+
![]()
by least squares.
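The least-squares step can be sketched as follows. This is an illustration under an explicit assumption: the fitted line is taken to relate the first difference of the organism count linearly to the count itself, with the intercept playing the role of the growth term and the slope that of the self-interaction term. The names `fit_growth_parameters`, `r_fit` and `beta_fit`, and the synthetic series, are hypothetical and are not part of Tierra.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_growth_parameters(series):
    """Fit the first difference of a count time series against the count."""
    populations = series[:-1]
    differences = [later - earlier for earlier, later in zip(series, series[1:])]
    return fit_line(populations, differences)


# Synthetic series built from hypothetical values: intercept 5, slope -0.01.
series = [10.0]
for _ in range(40):
    series.append(series[-1] + 5.0 - 0.01 * series[-1])
r_fit, beta_fit = fit_growth_parameters(series)
print(r_fit, beta_fit)
```

With real Tierra output the series is noisy, so the scatter of points about the fitted line gives a rough indication of how well the linear form holds.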
By Equation (2), two non-interacting organisms
and
satisfy the
following equation of evolution:
In the case where
=
(for
example, maximally replicating same-length organisms), the quadratic form can
be expressed:
with
If
, then the
quadratic term becomes
Let
and
By fitting the plane
we can extract the value of
. The values
of
and
can be
compared with Equations 3 and 7 to provide a sensitive test for independence of
two organisms.
Figure 5: Dynamics of 0080boz and 0080asj for different initial conditions.
As an example of how strange the behaviour of two organisms can get,
consider the behaviour of 0080boz and 0080asj. The organism count time series
are plotted in Figure 5 for five different initial conditions A-E. In curves
A, C and D, each organism was introduced into the soup as a solid colony, each
colony abutting the other one. In curves B and E, the two organisms were
intermixed. The time series moves temporally upwards, and the most obvious
thing is that there is more than one limit point. These appear collinearly,
marking the locus where the two organism counts sum to the carrying
capacity. This is in stark contrast with Lotka-Volterra dynamics, with its
single equilibrium in the interior of the plane. Upon closer inspection of
these organisms, it turns out that 0080boz is incapable of self-replication,
yet 0080asj will replicate any piece of code that looks like itself (has the
same start and end templates). In effect, 0080boz is a virus that infects
0080asj colonies. The dynamics of this situation is quite complex, and cannot
be readily understood in a Lotka-Volterra framework.
The next stage of this project is to concentrate on a large sample of organisms, and to try to understand the behaviour of those pairs that interact.
A full parameter space study should be undertaken - it would be interesting, for example, to know what affects the slope of the line in Figure 4.
Migration is another important factor, as this is known to alter stability
in the ecological equations. This can be added readily with the addition of a
term
n
to the basic equations. However, spatial variation is probably most important
when considering evolution under sex (crossover mutation), as geographic
isolation leading to genetic drift is considered a prime mechanism for
speciation. Modelling sex is a challenge under Ecolab because of the necessity
of representing variation within a species. An example of how extreme this can
get is the case of ring species, where a species of bird in Europe can
mate with similar birds in Russia, which in turn can mate with those in North
America, and those in turn with a completely different species in Europe that
cannot interbreed with the species with which we started out! It is still
unclear whether it is necessary to go to an individual model [6],
at an order of magnitude increased computational cost, in order to answer
questions about sexual evolution.
Another area of interest is to study the effect of catastrophes. Catastrophes are undoubtedly a feature of evolution, as the well-documented case of the K/T event that caused the extinction of the dinosaurs testifies. However, it is interesting to ask whether the turnover of species is dominated by catastrophic events or by the self-organised criticality that Ecolab shows.
Ecolab: where to now?