Complexity International
    ISSN 1320-0682     
Volume 4 1997
 

Embryology in Tierra: A study of a genotype to phenotype map

Russell K. Standish
ACSU,
Division of Information Services
The University of New South Wales
Sydney 2052, Australia

Email: R.Standish@unsw.edu.au
URL: http://parallel.acsu.unsw.edu.au/rks

Abstract:

Ecolab [9, 7] is a model system of an ecology that attempts to understand evolutionary processes. It makes a particular assumption about the embryology or genotype-phenotype map in order to generate the novel ecological interaction coefficients from the novel genotypes as they arise through mutation. The Eco-Tierra project [7] examines this assumption using Tierra [2] as a model ecosystem, which has an implicitly defined embryology.

Ecolab's Embryology

Ecolab [9, 7] is a model system of an ecology that attempts to understand evolutionary processes. Ecolab's dynamics is taken to be a generalised form of the Lotka-Volterra equation which is, in fact, the most general analytic two-term interaction model.

  equation28

Here n is the population density, the component tex2html_wrap_inline582 being the number of individuals of species i, r is the difference between reproduction and death, tex2html_wrap_inline588 is the interaction matrix, with tex2html_wrap_inline590 being the interaction between species i and j, and mutate is the mutation operator. * is used to denote elementwise multiplication of two vectors.

The mutation operator must generate new degrees of freedom tex2html_wrap_inline598 (where tex2html_wrap_inline600 is the number of species currently in the ecology), somehow defining the new ecological coefficients tex2html_wrap_inline602 from the previous state of the system. In reality, there is another layer (hidden in Equation (1) called the genotypic layer, where each organism has a definite genotype. There is a specific map from the genotypic layer to the space of ecological coefficients (hereafter called the phenotypic layer) called the embryology; then the mutation operator is a convolution of the genetic algorithm operations operating at the genotypic layer, with the embryology.

A few studies, including Ray's Tierra world, define an explicit mapping from the genotype to some particular organism property (for example, interpreted as machine language instructions or as weight in a neural net). These organisms then interact with one another to determine the population dynamics. In Ecolab, however, we are doing away with the organismal layer, and so an explicit embryology is impossible. The only possibility left is to use a statistical model of embryology. The mapping between genotype space and the population parameters r, tex2html_wrap_inline588 is expected to look like a rugged landscape. However, if two genotypes are close together (in a Hamming sense) then one might expect that the phenotypes are likely to be similar, as would the population parameters. This we call random embryology with locality. Here we tend to idealise genotypes as bit strings, although strings over an arbitrary alphabet (for example, the four DNA bases ACGT) can equally be considered. The Hamming distance is the number of bits (bases) that differ between the two strings. Thus, for example, if a single bit has been removed from one string, the Hamming distance is one.

In the simple case of point mutations, the probability P(x) of any child lying distance x in genotype space from its parent follows a Poisson distribution, as this is the distribution of the number of bit flips, or deletions that might occur with a point mutation. Random embryology with locality implies that the phenotypic parameters are distributed randomly about the parent species, with a standard deviation that depends monotonically on the genotypic displacement. The simplest such model is to distribute the phenotypic parameters in a Gaussian fashion about the parent's values, with standard deviation proportional to the genotypic displacement. This constant of proportionality can be conflated with the species' intrinsic mutation rate, to give rise to another phenotypic parameter tex2html_wrap_inline612 . It is assumed that the probability of a mutation generating a previously existing species is negligible and can be ignored. We also need another arbitrary parameter tex2html_wrap_inline614 , "species radius", which can be understood as the minimum genotypic distance separating species conflated with the same constant of proportionality as tex2html_wrap_inline612 .

In summary, the mutation algorithm is as follows:

  1. The number of mutant species arising from species i within a timestep is tex2html_wrap_inline620 . This number is rounded stochastically to the nearest integer; for example, 0.25 is rounded up to one 25% of the time and down to zero 75% of the time.
  2. Roll a random number from a Poisson distribution tex2html_wrap_inline622 to determine the standard deviation tex2html_wrap_inline624 of phenotypic variation.
  3. Vary r according to a Gaussian distribution about the parents' values, with tex2html_wrap_inline628 as the standard deviation, where tex2html_wrap_inline630 is the range of values tp which r is initialised; that is, tex2html_wrap_inline634 .
  4. The diagonal part of tex2html_wrap_inline588 must be negative, so vary tex2html_wrap_inline588 according to a log-normal distribution. This means that if the old value is tex2html_wrap_inline588 , the new value becomes tex2html_wrap_inline642 , where tex2html_wrap_inline644 is a normally distributed random variate with standard deviation tex2html_wrap_inline624 . These values cannot arbitrarily approach zero however, as this would imply that some species make arbitrarily small demands on the environment and will become infinite in number. In Ecolab, the diagonal interactions terms are prevented from becoming larger than tex2html_wrap_inline648 , where r is the corresponding growth rate for the new species.
  5. The off-diagonal components of tex2html_wrap_inline588 , are varied in a similar fashion to r, where tex2html_wrap_inline656 takes the place of tex2html_wrap_inline630 . However, new connections are added or old ones removed according to tex2html_wrap_inline660 , where tex2html_wrap_inline662 is chosen from a uniform distribution. The values on the new connections are chosen from the same initial distribution that the off-diagonal values were originally set with; that is, the range tex2html_wrap_inline664 to tex2html_wrap_inline666 . In order to ensure that the system is bounded, yet still allow "interesting" non-definite matrix behaviour, the sum tex2html_wrap_inline668 is adjusted to be non-positive.
  6. tex2html_wrap_inline612 must be positive, so it should evolve according to the log-normal distribution like the diagonal components of tex2html_wrap_inline588 . Similar to tex2html_wrap_inline588 , it is a catastrophe to allow tex2html_wrap_inline612 to become arbitrarily large. In the real world, mutation normally exists at some fixed background rate; species can reduce the level of mutation by improving their genetic repair algorithms. In Ecolab, this ceiling on tex2html_wrap_inline612 is given by the mutation(random,maxval) variable.

We may represent the Ecolab embryology as a probability distribution

equation66

where tex2html_wrap_inline680 or tex2html_wrap_inline682 is the distance between two species' phenotypic parameters, and g is the difference between the two genotypes. In the case of tex2html_wrap_inline686 which undergoes log-normal evolution, tex2html_wrap_inline688 for small displacements. Figure 1 shows the general form of this probability distribution, where we have ignored the "species radius" tex2html_wrap_inline614 . Taking tex2html_wrap_inline614 into account would mean that all genotypes tex2html_wrap_inline694 would be considered to have the same phenotypes and lie at p=0. In effect, this would replace a slice of the surface tex2html_wrap_inline698 with a wall along the p=0 axis (that is, tex2html_wrap_inline702 ).

 

   figure86
Figure 1     Probability distribution of the relation between genotype difference and the corresponding phenotype difference

 

Figure 2 shows the probability distribution of a mutant phenotypical coefficient about that of its parent's value. This is given by

equation98

 

   figure106
Figure 2     The probability distribution of a mutant phenotypical coefficient about that of its parent's value.


Tierran Dynamics

Tierra [2, 4, 3] is a system where small assembly language programs execute (and hence replicate), with a small chance of errors inducing mutations in the code. As the systems space (called the soup) is filled, a reaper operation comes into play that kills the organisms. In a previous paper [7], much was made about the form of the artificial death, with an argument put forward to alter the reaper function so that its activity was proportional to the number of cells filled and the number of cells dividing. However, regardless of whether the original reaper function is chosen, or the new one, the phenotypical coefficients that limit population growth (in Ecolab, these are the diagonal terms tex2html_wrap_inline686 ) depend in a simple way on the soup size and the reproduction rate tex2html_wrap_inline712 . Therefore, only the growth term is a relevant phenotypical coefficient.

A single species ecosystem in Tierra has one of the following behaviours:

  1. The organism does not do anything (that is, it does not replicate);
  2. The organism replicates once then does not replicate again;
  3.  The organism replicates a finite number of times;
  4.  The organism replicates continuously.

It turns out that organisms replicating a finite number of times (but more than once) is very rare in Tierra. Cases 3 and 4 can be distinguished by comparing CPU states at successive cell divisions. If the CPU state is identical, then the organism will replicate continuously. In our study, case 3 never occurred. The ecological equation describing a single species ecology in Tierra is then given by:

  equation124

where tex2html_wrap_inline714 is the number of newborn organisms and tex2html_wrap_inline582 is the number of adults at time t. tex2html_wrap_inline714 is given by the recursive relation

  equation129

with quantities at negative time being zero and tex2html_wrap_inline722 given by initial seed population. tex2html_wrap_inline724 is the number of instructions taken to replicate the first time and tex2html_wrap_inline726 is the number of instructions taken to replicate after that. In developing this equation, we assume that the system has been evolving long enough for the individual cell divisions to be independent of each other in time so that the first term of Equation (5) is just the average number of cell divisions per time step ( tex2html_wrap_inline728 ). Clearly the normal reproduction rate is tex2html_wrap_inline730 . If the organism only replicates once, then tex2html_wrap_inline732 , in which case the growth is arithmetic rather than exponential; and if it doesn't replicate at all, then tex2html_wrap_inline734 also.

Within the Tierran instruction set (instruction set 0), there is only a limited number of ways two organisms can interact with each other. These occur when a matching operation fails to match a target within the executing organism. The archetypical matching operation is adrf/adrb, which interprets the following instructions as a template made up of the two instructions nop0 and nop1 (which when executed theselves do nothing - a no operation). The matching operation then searches in a forward or backward direction for a complementary template. In this study, we define all matching operations to work in both a forward and backwards direction, with the operation returning the closest match to the current program counter. Other operations in this family include jmp and call. The outcomes are as follows:

  1. The organism does not reproduce unless at least one individual of the other organism is present;
  2. The organism reproduces the other organism (its instruction pointer is captured).

Within Tierra, the contribution of a matching operation on the ecological dynamics is ill-determined because either the matching templates lies within a radius SearchLimit of the calling template, in which case the cost is exactly one instruction; or the template fails to match, most likely preventing the organism from reproducing, therefore incurring a huge cost. As argued in Standish [8], an alternative is to cost the matching operation proportional to the distance that the search needs to perform the match. Within the miniTierra system presented in that paper, an algorithm is presented to perform this type of match extremely efficiently. If the latter case is taken, the dynamical Equation (5) describing an ecology where species only interact pairwise, is given by:

  eqnarray146

As with Equation (5), we assume tex2html_wrap_inline736 and that the ecology has evolved sufficiently so that each cell division is independent of the others. In this case, the third and fourth terms of the right hand side refer to the average rate of cell divisions when a template match occurs. There is an additional cost due to the matching which is described by the tex2html_wrap_inline738 parameters. The fifth and sixth terms refer to those organisms that enter the adult (reproducing) stage for the first time, like the second term. Here the superscript p refers to the parasitic case, where the organism does not reproduce unless at least one member of the host species exists. The superscript c refers to the cuckold case, where a species harnesses the metabolic processes of another species to reproduce itself. The tex2html_wrap_inline624 s and tex2html_wrap_inline746 s are defined similarly to the single species case; they are the time taken for the species to reproduce the first time and the time taken to reproduce after that. The tex2html_wrap_inline612 s and tex2html_wrap_inline738 s refer to the accumulated matching interaction times between the two species. The ratio of species densities multiplying the tex2html_wrap_inline612 and tex2html_wrap_inline738 refers to the fact that the likely distance between parasite and host will depend on the relative densities of each species.

The equations of dynamics for three or more species interacting can be worked out in a similar way as these are a logical possibility but, in this study, only pair-wise interactions amongst species were considered.


Neutral Mutation

Neutral Mutation is the concept that the embryology is a many-to-one map, and that there exist mutations of the genotype that do not affect the phenotype. Work by Peter Schuster's group in Jena [5, 6] indicate that in the case of RNA folding gif, there are connected sequences of neutrally variant gif genotypes that cover genotype space (giant components). Basically this means that any genotype can be reached by a single mutation from the giant component, allowing neutral evolution to effectively walk through genotype space. Kauffman [1] makes the same point in saying that evolution will proceed along neutral ridges to find new fitness peaks.

In Tierra, possibilities for neutral mutations abound. For example, the instruction used to mark the end of a template (if z is the ancestor) could be anything other than a nop instruction. A large number of neutral variants can be created by altering this instruction. Another example is if a template in a matching instruction is reduced by one instruction; the organism will often still successfully match the counterpart template. In the ancestral code, the initial search is offset by 4 instructions; this calculation is generally redundant and many neutral variants appear in this section. An example of this is the following code fragments from the ancestor (0080aaa) and single site mutant (0080aev):

0080aaa
nop1    ; 110 01   0
nop1    ; 110 01   1
nop1    ; 110 01   2
nop1    ; 110 01   3
zero    ; 110 04   4
not0    ; 110 02   5
shl     ; 110 03   6
shl     ; 110 03   7
movDC   ; 110 18   8
adrb    ; 110 1c   9
nop0    ; 100 00  10
nop0    ; 100 00  11
nop0    ; 100 00  12
nop0    ; 100 00  13
0080aev
nop1    ; 000 01   0
nop1    ; 000 01   1
nop1    ; 000 01   2
nop1    ; 000 01   3
zero    ; 000 04   4
subCAB  ; 000 06   5
shl     ; 000 03   6
shl     ; 000 03   7
movDC   ; 000 18   8
adrb    ; 000 1c   9
nop0    ; 000 00  10
nop0    ; 000 00  11
nop0    ; 000 00  12
nop0    ; 000 00  13

The difference between these two organisms is instruction 5, where not0 has been changed to subCAB. The point is that the adrb instruction does not use the value of the C register and, in fact, overwrites it, and the D register is not otherwise accessed anywhere in the code. Therefore, the code sequence 4-8 is a neutral area, and intron as it were, left over from Ray's earlier experiments with instruction sets.


Results and Discussion

In order to compute the phenotypical coefficients for organisms, a single organism must be executed (without competition, and without mutation) in order to measure the parameters tex2html_wrap_inline724 and tex2html_wrap_inline726 , and pairs of organisms must be run in a tournament in order to calculate the interaction terms. Initially we attempted to do this using Tierra (with all mutation etc. turned off) as the execution environment as described in Standish [7]. The idea was to infer these quantities from the measured dynamics (tex2html_wrap_inline760). The measured value of tex2html_wrap_inline726 was reasonable (or tex2html_wrap_inline724 quite well when tex2html_wrap_inline732), but the technique was unable to generate accurate results for the interaction terms. The sheer cost of executing the organisms sufficiently long enough to obtain statistically significant results meant that months would be spent analysing a single day's Tierra run.

By analysing the situation more carefully, we realised that it would be possible to measure the phenotypic coefficients directly if we wrote our own Tierra interpreter, as the modifications required on Tierra itself were prohibitive. Since in this project we are only interested in pairs of organisms executing, this simplified the task considerably, so the code was dubbed miniTierra [8]. It is also only necessary to execute an archetype of the organism rather than every individual. The net result is that the anticipated day's Tierra run could be analysed in ten minutes, and still give accurate results. To obtain the final results in this paper, Tierra was run for about three months continuously and miniTierra was able to analyse the data in less than an hour on a 20-processor Power Challenge.

To compute the embryology distribution f(p,g), two distributions are computed: one for the differences in reproductive rates (inverses of the tex2html_wrap_inline746 s and tex2html_wrap_inline624 s) as these all measure on the same scale (that is, in instructions executed), and another distribution for the differences in matching costs ( tex2html_wrap_inline612 s and tex2html_wrap_inline738 s) as these measure on a different scale (instructions executed per organism density ratio); then the phenotypic difference between two organisms i and j is given by

equation188

and

equation208

The genetic difference is relatively trivial to define, the difference being the number of sites along the genome that the two organisms differ. Figures 3 and 4 show the histograms of tex2html_wrap_inline782 and tex2html_wrap_inline784 tuples from a 3-month Tierra run.

These figures are also presented in VRML (Virtual Reality Markup Language) format, which allows the readers having a VRML-capable browser to rotate the graph interactively using their mouse. More information on VRML can be obtained from the VRML repository.

 

   figure225


Figure 3      tex2html_wrap_inline554
                         The distribution of genome differences ( tex2html_wrap_inline556 ) is included at one end.
VRML version
 

   figure244


Figure 4      tex2html_wrap_inline558
                         The distribution of genome differences ( tex2html_wrap_inline556 ) is included at one end.
VRML version
 

The large ridge at the origin of phenotype space indicates the fact that most organisms are neutrally related. The only other feature of the data is a significant pair of ridges some distance from tex2html_wrap_inline806 . This really indicates the difference between organisms having different lifestyles - for example, the self-reproducers versus the parasites.

By examining the parentage of the organism, it is possible to calculate the distribution of genotypic and phenotypic displacements between parent and mutant species, as shown in figures 5 and 6.

 

   figure266


Figure 5     Distribution of tex2html_wrap_inline562 differences that arise through mutation

 

   figure271


Figure 6     Distribution of tex2html_wrap_inline564 differences that arise through mutation


Conclusion

Clearly, Tierra has a remarkably different embryology to that posited with Ecolab in a number of ways. The first is the significance of neutral variation. The second is major changes in lifestyle between parent and child species caused by "cleavage". In Tierra, mutations are caused by single byte errors either through "cosmic rays", copying errors or instruction errors; yet, if that error occurs in a critical area of the organism like a template, a large chunk of the organism is no longer copied and hence is lost. This contrasts sharply with the point mutations that Ecolab models.

Since embryologies will vary considerably from ecosystem to ecosystem, the next step should be to examine the effect of different forms of embryologies on the emergent behaviour of the ecosystem.


Acknowledgements

The author would like to thank the New South Wales Centre for Parallel Computing for use of their Power Challenge facility. He would also like to thank Ben Simons from Sydney Vislab for advice on the use of AVS and VRML.


References

1
Kauffman, S. (1995) At Home in the Universe: The Search for Laws of Complexity, Oxford UP, New York.

 

2
Ray, T. S. (1991) An approach to the synthesis of life, in Artificial Life II, (eds) C. G. Langton, J. Taylor, D. Farmer & S. Rasmussen, Addison-Wesley, New York, p. 371.

 

3
Ray, T. S. (1994) Evolution, complexity, entropy and artificial reality, Physica D, 75: 239-263
URL. http://www.hip.atr.co.jp/~ray/pubs/oji/ojihtm .html.

 

4
Ray, T. S. (1994) An evolutionary approach to synthetic biology, zen and the art of creating life, Artificial Life, 1: 195.
URL: http://www.hip.atr.co.jp/~ray/pubs/zen/z nhtml.html

 

5
Reidys, C., Kopp, S. & Schuster, P. (1997) Evolutionary optimization of biopolymers and sequence structure maps, in Artificial Life V,(eds) C. Langton & K. Shimohara, MIT Press, p. 379.

 

6
Schuster, P. (1997) Landscapes and molecular evolution, Physica D, (submitted).
URL: http://www.santafe.edu/s i/publications/Working-Papers/96-07-047.ps

 

7
Standish, R. K. (1996) Ecolab: Where to now? in Complex Systems: From Local Interaction to Global Phenomena, (eds) R. Stocker, H. Jelinek, B. Durnota & T. Bossomaier, IOS Press.
URL: Complexity International, vol. 3

 

8
Standish, R. K. (1997) On an efficient implementation of Tierra. Complexity International, vol 4.
URL: http://www.complexity.org.au/vol4/eff-t erra/eff-tierra.html

 

9
Standish, R. K. (1994) Population models with random embryologies as a paradigm for evolution, in Complex Systems: Mechanism of Adaption, (eds) R. J. Stonier & Xing Huo Yu IOS Press, Amsterdam.
URL: Complexity International, vol. 2

...folding,
here the amino acid sequence is the genotype, and the secondary structure, or shape of the molecule is the phenotype
...variant
i.e. have the same phenotype

Complexity International (1997) 4