|
ISSN 1320-0682 | ||||
| Volume 4 | 1997 | ||||
Ecolab [9, 7] is a model system of an ecology that attempts to understand evolutionary processes. Ecolab's dynamics is taken to be a generalised form of the Lotka-Volterra equation which is, in fact, the most general analytic two-term interaction model.
Here n is the population density, the component
being the
number of individuals of species i, r is the
difference between reproduction and death,
is the
interaction matrix, with
being the
interaction between species i and j, and mutate
is the mutation operator. * is used to denote elementwise multiplication of two
vectors.
The mutation operator must generate new degrees of freedom
(where
is the
number of species currently in the ecology), somehow defining the new
ecological coefficients
from the
previous state of the system. In reality, there is another layer (hidden in
Equation (1)
called the genotypic layer, where each organism has a definite genotype. There
is a specific map from the genotypic layer to the space of ecological
coefficients (hereafter called the phenotypic layer) called the embryology;
then the mutation operator is a convolution of the genetic algorithm operations
operating at the genotypic layer, with the embryology.
A few studies, including Ray's Tierra world, define an explicit mapping from
the genotype to some particular organism property (for example, interpreted as
machine language instructions or as weight in a neural net). These organisms
then interact with one another to determine the population dynamics. In Ecolab,
however, we are doing away with the organismal layer, and so an explicit
embryology is impossible. The only possibility left is to use a statistical
model of embryology. The mapping between genotype space and the population
parameters r,
is
expected to look like a rugged landscape. However, if two genotypes are close
together (in a Hamming sense) then one might expect that the phenotypes are
likely to be similar, as would the population parameters. This we call
random embryology with locality. Here we tend to idealise genotypes as bit
strings, although strings over an arbitrary alphabet (for example, the four DNA
bases ACGT) can equally be considered. The Hamming distance is the number of
bits (bases) that differ between the two strings. Thus, for example, if a
single bit has been removed from one string, the Hamming distance is one.
In the simple case of point mutations, the probability P(x)
of any child lying distance x in genotype space from its parent
follows a Poisson distribution, as this is the distribution of the number of
bit flips, or deletions that might occur with a point mutation. Random
embryology with locality implies that the phenotypic parameters are distributed
randomly about the parent species, with a standard deviation that depends
monotonically on the genotypic displacement. The simplest such model is to
distribute the phenotypic parameters in a Gaussian fashion about the parent's
values, with standard deviation proportional to the genotypic displacement.
This constant of proportionality can be conflated with the species' intrinsic
mutation rate, to give rise to another phenotypic parameter
. It is
assumed that the probability of a mutation generating a previously existing
species is negligible and can be ignored. We also need another arbitrary
parameter
,
"species radius", which can be understood as the minimum genotypic distance
separating species conflated with the same constant of proportionality as
.
In summary, the mutation algorithm is as follows:
mutation(random,maxval) variable. We may represent the Ecolab embryology as a probability distribution
where
or
is the
distance between two species' phenotypic parameters, and g is the
difference between the two genotypes. In the case of
which
undergoes log-normal evolution,
for small
displacements. Figure
1 shows the general form of this probability distribution, where we have
ignored the "species radius"
. Taking
into account
would mean that all genotypes
would be
considered to have the same phenotypes and lie at p=0. In effect,
this would replace a slice of the surface
with a wall
along the p=0 axis (that is,
).
Figure 1 Probability distribution of the
relation between genotype difference and the corresponding phenotype difference
Figure 2 shows the probability distribution of a mutant phenotypical coefficient about that of its parent's value. This is given by
Figure 2 The probability distribution of a
mutant phenotypical coefficient about that of its parent's value.
Tierra [2,
4,
3] is a system where small assembly language programs execute (and hence
replicate), with a small chance of errors inducing mutations in the code. As
the systems space (called the soup) is filled, a reaper operation
comes into play that kills the organisms. In a previous paper [7],
much was made about the form of the artificial death, with an argument put
forward to alter the reaper function so that its activity was proportional to
the number of cells filled and the number of cells dividing. However,
regardless of whether the original reaper function is chosen, or the new one,
the phenotypical coefficients that limit population growth (in Ecolab, these
are the diagonal terms
) depend in
a simple way on the soup size and the reproduction rate
. Therefore,
only the growth term is a relevant phenotypical coefficient.
A single species ecosystem in Tierra has one of the following behaviours:
It turns out that organisms replicating a finite number of times (but more than once) is very rare in Tierra. Cases 3 and 4 can be distinguished by comparing CPU states at successive cell divisions. If the CPU state is identical, then the organism will replicate continuously. In our study, case 3 never occurred. The ecological equation describing a single species ecology in Tierra is then given by:
where
is
the number of newborn organisms and
is the number
of adults at time t.
is given by
the recursive relation
with quantities at negative time being zero and
given by
initial seed population.
is the
number of instructions taken to replicate the first time and
is the
number of instructions taken to replicate after that. In developing this
equation, we assume that the system has been evolving long enough for the
individual cell divisions to be independent of each other in time so that the
first term of Equation (5)
is just the average number of cell divisions per time step (
). Clearly
the normal reproduction rate is
. If the
organism only replicates once, then
, in which
case the growth is arithmetic rather than exponential; and if it doesn't
replicate at all, then
also.
Within the Tierran instruction set (instruction set 0), there is only a limited number of ways two organisms can interact with each other. These occur when a matching operation fails to match a target within the executing organism. The archetypical matching operation is adrf/adrb, which interprets the following instructions as a template made up of the two instructions nop0 and nop1 (which when executed theselves do nothing - a no operation). The matching operation then searches in a forward or backward direction for a complementary template. In this study, we define all matching operations to work in both a forward and backwards direction, with the operation returning the closest match to the current program counter. Other operations in this family include jmp and call. The outcomes are as follows:
Within Tierra, the contribution of a matching operation on the ecological
dynamics is ill-determined because either the matching templates lies within a
radius SearchLimit of the calling template, in which case the cost
is exactly one instruction; or the template fails to match, most likely
preventing the organism from reproducing, therefore incurring a huge cost. As
argued in Standish [8],
an alternative is to cost the matching operation proportional to the distance
that the search needs to perform the match. Within the miniTierra
system presented in that paper, an algorithm is presented to perform this type
of match extremely efficiently. If the latter case is taken, the dynamical
Equation (5)
describing an ecology where species only interact pairwise, is given by:
As with Equation (5),
we assume
and that the ecology has evolved sufficiently so that each cell division is
independent of the others. In this case, the third and fourth terms of the
right hand side refer to the average rate of cell divisions when a template
match occurs. There is an additional cost due to the matching which is
described by the
parameters.
The fifth and sixth terms refer to those organisms that enter the adult
(reproducing) stage for the first time, like the second term. Here the
superscript p refers to the parasitic case, where the
organism does not reproduce unless at least one member of the host species
exists. The superscript c refers to the cuckold case, where a
species harnesses the metabolic processes of another species to reproduce
itself. The
s and
s are
defined similarly to the single species case; they are the time taken for the
species to reproduce the first time and the time taken to reproduce after that.
The
s and
s refer to
the accumulated matching interaction times between the two species. The ratio
of species densities multiplying the
and
refers to
the fact that the likely distance between parasite and host will depend on the
relative densities of each species.
The equations of dynamics for three or more species interacting can be worked out in a similar way as these are a logical possibility but, in this study, only pair-wise interactions amongst species were considered.
Neutral Mutation is the concept that the embryology is a
many-to-one map, and that there exist mutations of the genotype that do not
affect the phenotype. Work by Peter Schuster's group in Jena [5,
6] indicate that in the case of RNA folding
, there are connected
sequences of neutrally variant
genotypes that cover
genotype space (giant components). Basically this means that any
genotype can be reached by a single mutation from the giant component, allowing
neutral evolution to effectively walk through genotype space. Kauffman
[1]
makes the same point in saying that evolution will proceed along neutral ridges
to find new fitness peaks.
In Tierra, possibilities for neutral mutations abound. For example, the instruction used to mark the end of a template (if z is the ancestor) could be anything other than a nop instruction. A large number of neutral variants can be created by altering this instruction. Another example is if a template in a matching instruction is reduced by one instruction; the organism will often still successfully match the counterpart template. In the ancestral code, the initial search is offset by 4 instructions; this calculation is generally redundant and many neutral variants appear in this section. An example of this is the following code fragments from the ancestor (0080aaa) and single site mutant (0080aev):
nop1 ; 110 01 0 nop1 ; 110 01 1 nop1 ; 110 01 2 nop1 ; 110 01 3 zero ; 110 04 4 not0 ; 110 02 5 shl ; 110 03 6 shl ; 110 03 7 movDC ; 110 18 8 adrb ; 110 1c 9 nop0 ; 100 00 10 nop0 ; 100 00 11 nop0 ; 100 00 12 nop0 ; 100 00 13
nop1 ; 000 01 0 nop1 ; 000 01 1 nop1 ; 000 01 2 nop1 ; 000 01 3 zero ; 000 04 4 subCAB ; 000 06 5 shl ; 000 03 6 shl ; 000 03 7 movDC ; 000 18 8 adrb ; 000 1c 9 nop0 ; 000 00 10 nop0 ; 000 00 11 nop0 ; 000 00 12 nop0 ; 000 00 13
The difference between these two organisms is instruction 5, where not0 has been changed to subCAB. The point is that the adrb instruction does not use the value of the C register and, in fact, overwrites it, and the D register is not otherwise accessed anywhere in the code. Therefore, the code sequence 4-8 is a neutral area, and intron as it were, left over from Ray's earlier experiments with instruction sets.
In order to compute the phenotypical coefficients for organisms, a single
organism must be executed (without competition, and without mutation) in order
to measure the parameters
and
, and pairs
of organisms must be run in a tournament in order to calculate the interaction
terms. Initially we attempted to do this using Tierra (with all mutation etc.
turned off) as the execution environment as described in Standish [7].
The idea was to infer these quantities from the measured dynamics (
).
The measured value of
was
reasonable (or
quite well
when
), but
the technique was unable to generate accurate results for the interaction
terms. The sheer cost of executing the organisms sufficiently long enough to
obtain statistically significant results meant that months would be spent
analysing a single day's Tierra run.
By analysing the situation more carefully, we realised that it would be possible to measure the phenotypic coefficients directly if we wrote our own Tierra interpreter, as the modifications required on Tierra itself were prohibitive. Since in this project we are only interested in pairs of organisms executing, this simplified the task considerably, so the code was dubbed miniTierra [8]. It is also only necessary to execute an archetype of the organism rather than every individual. The net result is that the anticipated day's Tierra run could be analysed in ten minutes, and still give accurate results. To obtain the final results in this paper, Tierra was run for about three months continuously and miniTierra was able to analyse the data in less than an hour on a 20-processor Power Challenge.
To compute the embryology distribution f(p,g),
two distributions are computed: one for the differences in reproductive rates
(inverses of the
s and
s) as these
all measure on the same scale (that is, in instructions executed), and another
distribution for the differences in matching costs (
s and
s) as these
measure on a different scale (instructions executed per organism density
ratio); then the phenotypic difference between two organisms i
and j is given by
and
The genetic difference is relatively trivial to define, the difference being
the number of sites along the genome that the two organisms differ. Figures
3 and
4 show the histograms of
and
tuples from
a 3-month Tierra run.
These figures are also presented in VRML (Virtual Reality Markup Language) format, which allows the readers having a VRML-capable browser to rotate the graph interactively using their mouse. More information on VRML can be obtained from the VRML repository.
Figure 3
The distribution of genome differences (
) is included
at one end.
VRML version
Figure 4
The distribution of genome differences (
) is included
at one end.
VRML version
The large ridge at the origin of phenotype space indicates the fact that
most organisms are neutrally related. The only other feature of the data is a
significant pair of ridges some distance from
. This
really indicates the difference between organisms having different lifestyles -
for example, the self-reproducers versus the parasites.
By examining the parentage of the organism, it is possible to calculate the distribution of genotypic and phenotypic displacements between parent and mutant species, as shown in figures 5 and 6.
Figure 5 Distribution of
differences
that arise through mutation
Figure 6 Distribution of
differences
that arise through mutation
Clearly, Tierra has a remarkably different embryology to that posited with Ecolab in a number of ways. The first is the significance of neutral variation. The second is major changes in lifestyle between parent and child species caused by "cleavage". In Tierra, mutations are caused by single byte errors either through "cosmic rays", copying errors or instruction errors; yet, if that error occurs in a critical area of the organism like a template, a large chunk of the organism is no longer copied and hence is lost. This contrasts sharply with the point mutations that Ecolab models.
Since embryologies will vary considerably from ecosystem to ecosystem, the next step should be to examine the effect of different forms of embryologies on the emergent behaviour of the ecosystem.
The author would like to thank the New South Wales Centre for Parallel Computing for use of their Power Challenge facility. He would also like to thank Ben Simons from Sydney Vislab for advice on the use of AVS and VRML.