|
ISSN 1320-0682 | ||||
| Volume 4 | 1997 | ||||
A common way of pursuing this approach involves use of the crossover-based genetic algorithm or C-GA (Goldberg 1989). In this approach, candidate solutions are represented as strings of characters or genotypes. Reproduction involves the production of a new individual through the splicing together of genotypes from two `parents'. In the usual approach, parent genotypes are split at a certain point, forming a left part and a right part. The right part from one parent is then joined to left part from other, and vice versa. This produces two offspring genotypes which then replace relatively unfit individuals from the population.
*10**0****
Note that we can construct a schema which will match some specific genotype by replacing any one of its characters with asterisks. There are thus 2l schemas matching to any genotype of length l. There are potentially n . 2l schemas for a population of n genotypes of length l, although in practice there will usually be fewer due to duplication. (There are always at least 2l.) The defining length of a schema is the number of positions between its first and last specific position. Its order is the number of specified bits. Thus the defining length of "*10**0****" is 5 and its order is 3.
We view the goal of the GA as the multiplication of highly-fit schemas in the population.(It is of course the instances rather than the schemas which are "in" the population.)
| Genotype | Fitness | Schemata | Instances | Mean fitness |
| 01100 (1) | 103 | *1*0** | 1,2 | 63.5 |
| 01001 (2) | 24 | 0**0** | 1,2,4 | 83 |
| 10000 (3) | 87 | *****1 | 2,4 | 72 |
| 00011 (4) | 122 | 11**** | 0 | |
| 01010 (5) | 90 | **01** | 4,5 | 106 |
As Holland (Holland 1975) has shown, this goal is achieved by ordinary reproduction (that is, genotype-copying) without the need for crossover. Under this scenario, the evolutionary process involves repeatedly selecting an individual and making a copy of it. Provided that genotypes are selected with a probability proportional to their fitness, high-fitness schemas are bound to become more prevalent in the population.
Highly fit schemas will obviously tend to have many, highly fit instances. Each of these has a high chance of being selected for reproduction. Thus, highly fit schemas will tend to multiply. In fact under pure reproduction, schemas can be expected to grow "as the ratio of average fitness of the schema to the average fitness of the population" (Goldberg 1989). Unfit schemas gradually get squeezed out of the population because their instances tend not to be reproduced.
The growth formula for schemas in a pure-reproduction scenario is as follows (Goldberg 1989).
Here H is a schema, m(H,t) is the number of instances of H in the
population at time t. f(H) is the fitness of H and
is the
average fitness.
Unfortunately, the crossover method involves cutting parental genotypes into two parts and thus runs the risk of slicing up and destroying useful schemas. We therefore need to determine the extent to which the destructive properties of the crossover operator undermine schema-processing.
In the schema analysis, this is dealt with by introducing a schema conservation probability to the growth formula. In the original growth formula, the multiplication of a schema in each round is determined solely by the ratio of its fitness advantage (the difference between fitness under consideration and mean fitness) to the mean fitness. The probability of schema destruction depends on the defining length of the schema: longer schemas have a higher probability of being destroyed under crossover. The probability of destruction is thus the ratio of defining length to genotype length.(In Goldberg's book - Goldberg 1989 - it is the ratio of defining length to genotype length - 1) and the probability of conservation is the complement of this. The amended formula thus becomes

The destructive properties of the crossover operation are assumed to be negligible with schemas of short defining length since, in this case, the bracketed part of the growth formula evaluates to a value close to 1. This leads directly to the so-called schema theorem which states that "short, low-order, above-average schemata receive exponentially increasing trials in subsequent generations" (Goldberg 1989).
If fitness values range between 80 and 120 with the average value being 100, the maximum fitness ratio is 120/100 and the maximum fitness advantage is 20/100. Any schema for which the ratio of defining length to genotype length exceeds this will not be preserved correctly. If the schema is of above average fitness, it will receive decreasing rather than increasing trials in future populations (and vice versa). With a genotype length of 100 bits, the defining length of a schema must thus be less than 20 bits. The number of schemas with a defining length of less than 20 bits form less than 10-22 percent of the total number of schemas in this scenario. (This estimate is arrived at by taking the number of 20-bit schemas (220) as a proportion of the number of genotypes (2100), and multiplying by 100. The value obtained is 0.0000000000000000000000827181.) In this scenario then the shortness assumption is effectively violated and the C-GA processes a negligable fraction of the total number of schemas. Since the original fitness values used are not atypical, we have to infer that the shortness assumption is easily violated.
The schema theorem also covertly introduces an assumption concerning schema fitness. It explicitly refers to "above-average schemata" and is thus implicitly assuming that schemas have an independent fitness value. The fitness of a schema is defined as the average fitness of its instances. In the pure reproduction scenario, this quantity cannot change much since instances remain the same from generation to generation. However, in the crossover scenario, instances do change from generation to generation. Thus the fitness value of a particular schema may vary.
A schema which turns out to have a constant fitness value under crossover has an independent impact on fitness since, clearly, its fitness value is not affected by its genetic context. Conversely, a schema fitness value which does change under crossover is affected by context. The technical term for the situation where schema fitness is affected by context is epistasis. The schema theorem assumes that schema fitness is not affected by context. It thus introduces the assumption of low epistasis. How easily is this assumption satisfied?
We can imagine scenarios in which the assumption is quite reasonable. We might, for example, be using the C-GA to evolve agents whose fitness depends in some way on their colour. We might use a part of the genotype to encode a colour value. Provided that colour impacts fitness independently of other factors - as we might expect - all those schemas which effectively identify the high-fitness colour value clearly do have an independent impact on fitness.
However, where genotypes encode for any sort of mechanism or system of components, the assumption of schema-fitness is likely to be violated. In such a scenario, parts of the genotype will typically encode for parts of the mechanism. But the fitness of the whole is then related with the interactions of the parts. Thus, in principle, no one part makes an independent contribution to fitness. Where GAs are used to evolve genotypes which encode for systems or mechanisms, the low-epistasis assumption will thus typically be violated.
The scenario in which fitness is affected by `epistatic interactions' between parts of the genotype has, of course, been intensively investigated by the GA community. In fact, the construction of the so-called "GA-deceptive" problem is typically a matter of deliberately nurturing epistasis ("nonlinearity") in an encoding (cf. Goldberg 1989, ch. 2). However, the observation that the schema theorem effectively assumes low epistasis, may help to explain the problems that some researchers have encountered in the use of C-GAs in genetic programming - Koza 1992; cf. Lang 1995 and O'Reilly, in press.
In this application, high-fitness genotypes are programs for a given task and the genotype is thus literally an encoding of a mechanism. Fitness is not independently attributable to individual parts of the genotype, but only to their interactions.
The building-block hypothesis assumes that the fitness of any one block is typically affected by the other blocks on the genotype. If this were not the case it would be meaningless to talk about a "building-block process" operating over and above the usual evolutionary process. Thus the building-block hypothesis implicitly assumes only a positive effect of epistasis on fitness and thus contradicts the low-epistasis assumption introduced by the schema theorem.
When we come to consider the length implications of the building-block hypothesis we uncover a further contradiction. During the building-block process, the schemas that require processing at any given stage are actually the blocks that have been put together by the prior building-block process. Except at the initial stage, the defining length of these schemas is related to the defining lengths of the components of the blocks. Consider a block made up of just two schemas. One schema may be nested inside the other. In this case the defining length of the block is simply the defining length of the longer of the two schemas. At the other extreme the two schemas might be situated at opposite ends of the genotype. In this case the defining length of the new block will be close to the genotype length l.
If we make the conservative assumption that, on average, the defining lengths of blocks will be at least the sum of the defining lengths of the components, then we see that real schema length will increase at least exponentially during the building-block process. Assume all original schemas have defining length 1 and that all blocks have two components. Then a first-level block has defining length 2. A second level block has defining length 4. A third level block has defining length 8, and so on.
The schemas that must be processed at any given stage are the blocks that have been created by previous processing. Thus, strictly speaking, the schema growth formula - which defines growth in terms of a particular schema - cannot be meaningfully applied to a C-GA in a building-block scenario. If it is to be applied, we should at least use a time-fixed schema identifier and a definition of defining length which explicitly captures the time-related growth. For example, we might let Ht denote a schema processed at time t, and define the schema length as:
Once, the implicit growth in schema lengths is made explicit, we see that the building-block hypothesis (which depends on negligible schema length) is very likely to be violated in all but the initial stage of processing. It demands increasing schema length and thus violates the shortness assumption introduced by the schema theorem.
Thus we find that both of the assumptions introduced by building-block hypothesis directly contradict assumptions introduced by the schema theorem. The situation is illustrated schematically in the figure below.
Given the importance of the building-block hypothesis within the GA paradigm this clash of assumptions occurring at the most fundamental level of the analysis is of particular interest. As Forrest and Mitchell (1996) have commented there is a "need for a deeper theory of how low-order building blocks are discovered and combined into higher-order schemas".
When we come to consider the assumptions implicitly introduced by the building-block hypothesis we find that they implicitly contradict the assumptions underpinning the schema theorem. Thus, if we assume that the viability of the GA process is established by the schema theorem but that its `power' is accounted for by the building-block hypothesis, we have to conclude that the GA only works well in the situation in which it is guaranteed not to work effectively. The practical failures of the method, such as those reported by Forrest and Mitchell (1996) for the standard C-GA, and by Lang (1995) and O'Reilly (in press) for the GP variant, might thus be viewed as a predictable consequence of the contradictory assumptions upon which the method is based.