Complexity International       /vol09/wagner01/ © Copyright 2002     
Volume 09 Received: 
Accepted: 
15 Jan 2002
4 Dec 2002




Evolving consensus among a population of communicators

Kyle Wagner and Jim Reggia
University of Maryland at Baltimore, University of Maryland Institute for Advanced Computer Studies
A. V. Williams Building, University of Maryland, College Park, MD 20742
Phone: (410)381-1158    Fax: (301) 405-6707
email: elyk@cs.umd.edu
http://www.cs.umd.edu/~elyk



Abstract

How does a group of individuals who lack a shared communication system evolve to achieve a consensus, so that every member of the group uses each signal in a manner consistent with others in the group? There are many factors that affect the difficulty of this task, including the number of signals available, the number of meanings or situations to convey, the population size, and whether or not any learning occurs. Each of these factors is explored in simulations which use a genetic algorithm that selects for agents who communicate meanings effectively with other agents. The difficulty of gaining consensus among a population of signalers increases as the number of meanings (and signals) increases, but decreases if more signals than meanings are allowed, although there is probably a limit to the benefit derived from having more signals than meanings. Surprisingly, difficulty decreased as population size increased, but the learning algorithms used did not significantly improve the population's ability to achieve consensus. An analysis is made of the exponentially increasing difficulty of achieving consensus as the number of meanings and signals grows. The implications for the evolution of communication are discussed.


1     Introduction

One of the basic phenomena associated with the evolution of a communication system is that the communication system becomes shared among a population [32,33]: there is a consensus among signalers about which utterances are appropriate in various contexts, and there is a consensus among receivers about the appropriate responses to each utterance. Many animals use communication of this sort. Belding's ground squirrels produce different alarm calls to indicate different kinds of predators [24] as do vervet monkeys [4]. Many birds emit food calls when they find a large patch of food, including house sparrows [12,13,10]. Many species of squid produce a large variety of different body patterns and postures to communicate intentions and internal states (e.g., aggression, alarm, courtship, readiness to mate) [16,8]. Clearly, many species have evolved and learned to use a variety of efficient communication systems, with a well-defined meaning for each signal. But how does a group of individuals which initially lacks a shared communication system evolve to achieve a consensus?

The problem of evolving a shared communication system is different from evolving other kinds of traits. For example, eye colour mainly requires coordination with the environment but not with other organisms. Communication on the other hand requires coordination not only with the environment but also with the other members of a population of organisms. This highly relative fitness entails a much more complex process. Understanding the added difficulties of evolving interactive traits, such as communication or rearing offspring, will help in more fully characterizing the complexity of the evolutionary process.

Many computer simulations over the last decade have demonstrated that consensus can be achieved among a population of agents by a variety of adaptive processes. Supervised and reinforcement learning methods [37,32,18,28] and evolutionary algorithms [35,22,1,2,3,34,29,28] have shown how agent populations can evolve or learn to communicate to solve various cooperative tasks (food finding, mate finding, predator avoidance). These simulations have used small signal repertoires (typically from 1 to 8 signals), and employ a variety of behavioral representations (e.g., feedforward neural nets, recurrent neural nets, finite state machines, lookup tables, production systems, and a few unique mechanisms).

A more detailed and methodical examination is needed of the effects of various factors on the difficulty of evolving consensus among a population of simple signalers. Dunbar [9] has argued that population size and the amount of interaction among a social group has a large effect on what kind of communication might evolve. Although some studies have investigated the effects of population size on signalers (e.g., [3,18,28,20,29]), only Levin [20] has systematically investigated the connection between population size and the difficulty in achieving consensus. Levin did not describe in detail the level of consensus that his populations reached. In addition to population size and levels of interaction, other important factors in achieving consensus include signal repertoire and learning mechanisms. Although game-theoretical analyses have been made of signaling, such as Newman and Caraco's analysis of food-calling in house sparrows [26], no analyses seem to have explored the difficulty of evolving the signals in the first place.

As no simulation work (other than Levin [20]) has focused extensively on the difficulty of the problem of evolving consensus, we used a multi-agent, computational simulation to systematically explore various factors: the number of signals and meanings in use, population size and gregariousness (level of interaction among the population), and whether or not agents can learn. Our focus was on the dynamics of achieving consensus under simple learning and evolutionary processes. A simulation is described which allows agents (simulated communicators) to interact by sending and decoding signals. Fitness points are awarded for correct decoding of a signal. A genetic algorithm selects for those agents whose communication results in higher fitness scores (due to more correct than incorrect decodings). The results of manipulating the previously mentioned factors are presented: small population sizes, large numbers of meanings, and the addition of ``one-shot'' learning made evolving consensus more difficult, while having more signals available than meanings made the task easier.

1.1 Previous work

When approaching the problem of consensus, it is useful to consider the Saussurean notion that a communication system consists of a mapping between meanings and signals [27,6]. Although some might suggest that communication at this level of abstraction is unrealistic, it nevertheless characterises much of the simulation work described above--the search for a shared communication system among a population of agents. At this level, agents can be treated simply as meaning encoders (producing a signal for each meaning) and signal decoders (producing a meaning heard from another's signal). With regard to this kind of communication system, Steels [32] argued that increasing population size and the number of signals would lead to a considerable increase in the difficulty of achieving convergence among the signalers. Hutchins and Hazlehurst [18] explored these factors by simulating a population of agents who taught each other what their signal was for each of a set of ``visual'' input patterns. Each agent had a feedforward neural net, used supervised learning to autoassociate visual input patterns and used one of its hidden layers (the ``verbal'' layer) as a signal output. Agents were constrained to develop unique verbal layer patterns. Over time, the population of agents developed a unique signal to correspond with each different visual input.

Similarly, Oliphant [28] let agents in a population teach each other their communication systems. Increasing the population size caused convergence time to increase dramatically, as did increasing the number of meanings and signals to be mapped. Livingstone and Fyfe [21] placed 120 agents in a linear world who then learned to communicate about their internal states from each other by alternately being teacher and learner (randomly paired) in a series of supervised learning epochs. Livingstone and Fyfe found that consensus could be reached when there were no spatial constraints but that local, mutually incompatible ``dialects'' developed when agents were permitted to communicate only with close neighbors.

Instead of using learning, Levin [20] used a standard genetic algorithm to perform a systematic exploration of many factors that may affect the evolution of communication. Agents were given fitness points for being able to determine another agent's internal state by decoding the signal it made. Agents who had higher fitness scores were allowed to mate and produce offspring similar to themselves. His simulations resulted in agents who evolved to use the same signals for a set of internal states, analogous to the visual patterns and internal states in the work mentioned above. An agent was represented as a meaningsignal matrix (encoder) and a signalmeaning matrix (decoder), similar to lookup tables.

Agents in Hutchins and Hazlehurst's, Oliphant's, and Livingstone and Fyfe's simulations all used learning, and all were trained on both the internal and external properties of a model agent. Such access is unrealistic among real organisms, where access to internal states can only be inferred from indirect signs. Levin's [20] simulations were more realistic in that they prevented agents from knowing each others' internal states.

Simulations involving agents occupying two-dimensional worlds have also demonstrated that a population of agents can evolve to share a communication system (for example, see [3,29,34,35,22,1,30,15,19,7]). These simulations typically used agents with 1 to 8 possible signals and population sizes from 2 to several hundred (except for Ackley and Littman's much larger populations). Most of them, however, did not drastically vary population size or the number of signals and meanings. These simulations were concerned with the nature of the signals themselves, the environmental factors contributing to the evolution of communication, and the consequences that signals have for coordination among a population of agents.

2 Methods and Expectations

2.1 The simulator and task

At the start of each experimental run, a fixed size population of agents is created. Each agent is given a randomised genome, which is used to build the agent's encoder and decoder. The simulation proceeds as follows:

  1. interaction phase, where the agents interact with each other by encoding meanings as signals and subsequently decoding those signals (agents are scored on their communicative success)
  2. a genetic algorithm (described in Section 2.3) selects agents based on their fitness scores to be parents of the offspring that will form the next generation
  3. offspring replace parents
The simulation repeats steps 1-3 for each new generation until either a preset number of generations is reached or the population achieves consensus (evolves a shared communication system which accounts for all possible meanings).

During the interaction phase, each agent plays games with partners randomly-chosen with replacement from the population (see Figure 1). The number of partners for each agent is determined by a gregariousness parameter, g (see [20]), a proportion in the range [0,1]. If gregariousness were 0.25 and population size were P, for instance, then each agent would have P/4 partners.

Figure 1: Each agent (circle) is chosen to play some number of games with a subset of the population. The current interactant, I, will play games with a set of partners selected randomly with replacement from the population. Every agent in the population eventually gets a turn to be the interactant.

During a game (see Figure 2), an agent and its partner take turns being speaker and hearer. The object of each game is for the hearer to determine what meaning caused the speaker to produce a specific signal. (The term game is derived from Wittgenstein's and Steels' notions of a language game [36,33] and is only weakly related to the notion of games in game theory [31].) The simulator presents the speaker agent with a randomly-chosen meaning which the speaker must encode as a signal. The hearer receives the signal and decodes it as a meaning. If the decoded meaning matches the meaning given to the speaker, then both agents get a fitness point, representing the idea used in many past models that effective communication leads to increased survival. No points are given for mismatches. Since real speakers and hearers are subject to error, a small amount of ``noise'' is added to this process for realism. On some small probability, pn, the speaker's encoding and/or the hearer's decoding are changed to something other than what they originally produced.

Although it is not necessarily realistic to reward both participants for a successful decoding, there are two good reasons to do this. Oliphant found that consensus among communicators was best achieved when both sender and receiver benefitted from the communication's success [27], and this is also the case for many other simulations, including [22,35,34]. Since one purpose of these experiments is to determine how difficult it is to achieve consensus even under the best conditions, awarding both sender and receiver for a correct decoding should make the task as easy as possible. Even so, as will be shown in the results, consensus can be difficult to attain under these idealistic conditions.

Figure 2: A single game is played by first giving one agent a meaning. This agent, the sender, encodes that meaning and produces a signal (s). The other agent, the receiver, takes the sender's signal and decodes it into a meaning. If the decoded meaning (m2) matches the sender's given meaning (m1), then both sender and receiver get a point of fitness. Here, the interactant is the sender, and one of its partners is the receiver. Both the interactant and its partner will get several chances to be sender and receiver.

The agent and its partner play a number of games equal to twice the number of meanings so that most meanings will be explored (e.g., with 3 meanings, 6 games will be played with each partner). After the completion of all interaction phases, each agent's final fitness score is the ratio of number of games ``won'' to total number of games played, normalised to a 1000-point scale. This normalisation accounts for a different number of games played by each agent (due to the random selection of partners and the different levels of interaction across experiments).

2.2 Simulated agents

When acting as a speaker, an agent determines which signal it should send for a given meaning by looking up the corresponding entry in its encoder table. When acting as a hearer, an agent determines the meaning of a signal by looking up the corresponding entry in its decoder table. Each signal and meaning is represented as a positive integer. The encoder and decoder are represented as two separate, discrete, deterministic lookup tables (see Figure 3). Each table entry is specified by a gene in the agent's genome (chromosome). Since the encoder and decoder tables have a canonical order (encoder entries are ordered by ascending meaning value, while decoder entries are ordered by ascending signal value), the genome need only specify the encoder entries followed by the decoder entries.

(A)
(B)
Figure 3: Simple meaning-signal mapping agents. Signals and meanings are discrete elements. (A) Each agent has an encoder, which takes a specific meaning and produces a signal. Each agent also has a separate decoder, which takes a signal (produced by another agent) and produces a meaning. (B) A self-consistent lookup table. The encoder describes a signaling system which maps each meaning to a unique signal. An agent using this encoder would emit signal 2 when given meaning 3 by the simulation. The decoder similarly maps each signal to a unique meaning. The encoder can be uniquely specified by listing the signals emitted for each meaning: ``3, 1, 2''. The decoder is similarly specified, ``2, 3, 1'', so the genome for this encoder/decoder pair would be ``3 1 2 2 3 1'', where the first 3 integers specify the encoder entries and the last 3 integers specify the decoder entries.

Lookup tables are functionally similar to mechanisms used by Levin [20], MacLennan and Burghardt [22], Oliphant [28] and even Hutchins and Hazlehurst [18]. The use of the term ``meaning'' does not imply any special status--meanings in this simulation simply stand for the internal states or local environments that other researchers have used.


2.3 Evolutionary processes

A simple genetic algorithm (GA) is used to simulate evolution. Tournament selection determines which agents will be parents. For P agents, P tournaments are conducted to select parents (thus, for each tournament, one agent becomes a parent). Tournaments are always conducted between two randomly-chosen agents from the population, and the best of the two is chosen to be a parent 80% of the time; 20% of the time, the lesser fit of the two becomes a parent. This implies that an agent with higher than average fitness should be chosen as a parent during several tournaments, while an agent with lower than average fitness may not be chosen to be a parent at all. Each pair of agents chosen in this way is mated, producing one offspring. Enough offspring are generated to completely replace their parents' generation. Crossover is used to create offspring genomes from the parents' genomes, and offspring genomes are mutated. Crossover is multi-point, with a probability of .8 that each gene of an offspring comes from one parent, and a .2 probability that the gene comes from the other parent (see [25]). Mutation rates are always .001 per gene to ensure that diversity can be introduced into several members of the population during one generation (since the population sizes are generally low and the number of genes is small). A mutation to a lookup table gene changes that entry value to some other randomly-selected value.


2.4 Experimental procedures

A baseline set of experiments was run to provide a control for comparison with subsequent experiments. For these control runs, N = Ns = Nm = 3 (where Nm is the number of meanings and Ns is the number of signals), no learning was used, noise level was pn = .05, population size was 50, and gregariousness was moderate (g = .4).

Several parameters were then varied in different experiments to test their impact on the population's achievement (or lack) of consensus. Gregariousness and population size were varied, agents were tested using different values of N, a series of experiments were run where Ns > Nm, and learning types were compared.

The population was considered to have achieved consensus when at least 50% of the population had achieved the maximum fitness, given the amount of noise in the system. Once consensus was reached, the program was stopped. The maximum expected fitness for communicating with 5% noise is 892.5 points. A consensus of 50% of the population was a reasonable stopping point because when 50% of the population had a perfect fitness score, it was almost always the case that the rest of the population had an almost perfect fitness score as well, and almost every agent necessarily shared the same communication system. Noise, an unlucky mutation, or a single lingering "bad" gene would account for slightly less than perfect performance among these remaining 50%.

The number of generations required for the population of agents to reach consensus was the primary measurement of difficulty to reach consensus. This value was recorded for each simulation run (along with many other measurements including genetic diversity and fitness at every generation). The generation of consensus was averaged over all runs for each experimental condition to produce a single generation of consensus, G, for each condition.

To determine the statistical significance of any differences, a two-sample t-test was used to compare the means of the generation of consensus, G, for two experimental conditions. Since the direction of difference was usually known, a one-tailed test was usually employed. A t-test might not be adequate for some results since the distribution may not always be normal; regardless, the results of the t-test are provided uniformly for comparison purposes.


2.5 The complexity of consensus

There are a number of reasons to expect that the evolution of shared communication will be quite difficult. Here we show that as the number of signals and meanings increases, the difficulty of evolving a shared communication system increases dramatically. It is assumed in the following that the number of meanings is the same as the number of signals, Ns = Nm. The conclusions would differ if Ns > Nm. Consensus here implies that all agents use the same system and that all meanings are unambiguously communicated.

As the number of meanings and signals increases, the difficulty in gaining consensus among the population will increase rapidly because the size of the search space increases exponentially. Consider a signaler who can use N distinct signals and needs to communicate N distinct meanings. There are N! ways to uniquely map a set of N distinct meanings to a set of N distinct signals--therefore, there are N! possible unique encoders. However, there are NN possible encoders, many of them associating multiple meanings with the same signal. Therefore, for consensus among just encoders (or just decoders), a huge space of NN must be searched to find one of N! possibilities. In addition, a unique encoder is only useful if receivers have a matching decoder. Given that any one of the N! unique encoders has been found, there is only one decoder from the NN possible decoders which will perfectly match it. For a population to achieve a reasonable level of consensus, a great many of these possible decoders must be ruled out.

Aside from the enormous size of the search space, the difficulty of associating unique meanings and signals increases as the population begins to achieve consensus. The first few signal-meaning associations should be relatively easy to find since the number of partially-matching decoders for a given encoder is large (and vice-versa). At the start of a simulation, assume a population of size P is uniformly distributed with respect to signal-meaning associations. With regard to the meaning m1, there would be roughly an equal number of agents with associations between m1 and each possible signal sj (1 ≤ jN). There are N possible signal associations for m1, so there will be an expected P / N agents with a specific association. For example, with P = 6 agents and encoders with N = 3 possible associations between a single meaning and the 3 available signals, there will be an expected 2 agents with each possible association mi sj, where 1 ≤ j ≤ 3 (see Figure 4a). Early in the simulation, any single change (from evolutionary or learning processes) to any of these agents' encoders will take it out of a group of expected size P / N and place into a group of the same size plus one (itself), causing the group it left to decrease in size. This larger group of agents encoding a particular meaning-signal (mx sj) association now exerts more pressure on other agents to decode the corresponding signal-meaning association (sj mx) (see Figure 4b), i.e., any agents whose decoders are compatible with the encoders of this larger group will have a higher expected fitness.

A B
Figure 4: Caricature of the effect of a single mutation to an encoder on a group of agents at the beginning of a simulation. Agents are indicated by black dots. Each agent is shown in the appropriate box based on which meaning-signal association it has in its encoder. (A) Of the six agents in the population, two encode meaning a using each of the three signals. (B) The next generation contains a different composition of agents who encode meaning a. Now, any agents who decode signal 2 as meaning a will have a higher fitness on average than those who decode signals 1 or 3 as meaning a.

However, as more signals become commonly associated with specific meanings, fewer signals are left for the remaining meanings. Any change (due to evolution or learning) to an agent's decoder or encoder might disrupt an existing, widespread association, and thus lower the fitness of that agent. To take an extreme example, the number of good associations diminishes greatly when there is only one meaning and one signal left to associate. The last encoding is a choice among N possible signals where only one signal is not already in use; a similar situation exists for the last decoding. A mutation to an encoder or a decoder must not disrupt the N - 1 existing associations or it will conflict with previous associations. Therefore, a change to an encoder when consensus is almost complete has a 1 in NN chance of being the correct one. Furthermore, another agent must experience a corresponding change in its decoder (with the same probability). All other changes will either change the unused association from one incorrect value to another, leaving fitness the same (an unlikely N - 1 in NN probability); or will destroy an existing, widespread association, decreasing fitness (a highly likely NN - N in NN probability).

A final point to make is that population size should affect the time to achieve consensus. Steels [32] has argued and Hutchins and Hazlehurst [18] and Oliphant [28] have demonstrated that populations of learning agents will take more time to reach consensus when population sizes increase. Evolving agents experience similar interaction constraints as learners, but genetic diversity is also an important factor. If diversity is quickly reduced in a population, then the chance of gaining consensus for an ideal communication system drops dramatically as explained above. While a large population can provide a higher level of diversity than a smaller one, convergence in genetic algorithms is often faster in larger populations [14]. The process of genetic convergence conflicts with the difficult problem of finding an ideal, shared communication system. It is important to realize that genetic convergence does not necessarily lead to consensus in the way we mean it. As we are defining consensus, agents must agree on one way to encode meanings and a matching way to decode signals to derive those meanings; furthermore, in this paper, we take consensus to mean that agents also communicate all meanings unambiguously. A population could genetically converge on one encoder and decoder which were simply incompatible: all agents might encode meanings in one way, and decode signals in a totally different way; no agent could even understand its own signals. Overall, evolving communicators should benefit from large populations since there will be more diversity available for natural selection, and the problem of consensus can be solved in step with genetic convergence.

3 Results

For each experimental condition, the simulation was run at least 10 times (except in a few computationally intensive cases), using a different random population of genomes each time. Most experimental runs ended with the population's achieving consensus according to the criteria described above. Some experiments resulted in populations that did not achieve consensus.

A B
Figure 5: Typical run under baseline conditions (defined in Section 2.4). (A) Average fitness of the population for each generation until consensus was reached. (B) Genetic diversity of the population for each generation. The lower the value for diversity, the closer the agent's genomes are to each other.

Table 1 and Figure 5 illustrate a typical run under the baseline conditions. This population reached consensus by the 22nd generation. Figure 5 shows the rise in fitness that was typical of all runs. Table 1 shows a sample agent's decoders and encoders from the initial population and an agent from the final population. In the initial population, most agents' decoders differed from each other; the same was true for agents' encoders. 48 of the 50 agents in the final population had identical tables to the agent shown in Table 1. Agent fitnesses varied mostly because of noise and also because of who they interacted with; final fitnesses ranged from 759 to 919, with an average of 886. For 20 runs of the baseline condition, the average number of generations to consensus, G, was between 15 and 28, with three outliers at 95, 296 and 337. These kinds of outliers were common and probably reflect local minima due to initial population diversity as well as to the variability of interactions and genetic operators. In the results that follow, these outliers seem to be part of a bimodal distribution, which is discussed in more detail below. The seeming bimodal nature of the outcomes inflated the standard deviations for most experiments.


Table 1: Sample initial and final encoders and decoders.
initial final
encoder decoder
1 3 1 1
2 3 2 2
3 2 3 3
encoder decoder
1 2 1 3
2 3 2 1
3 1 3 2


Some runs exhibited a difficult climb to consensus. When a population of agents failed to achieve consensus early (or ever), it was often because the agents in the population became almost genetically identical without establishing a unique one-to-one mapping between meanings and signals. An example of this is exhibited in Figure 6. In this case, as in many cases where consensus took longer than normal for that condition, genetic diversity was lost early on (diversity falls to 0 well before generation 50). Only after enough diversity was regained could a fortunate mutation or crossover have had any chance of helping the population achieve consensus.

A B
Figure 6: The effects of premature genetic convergence (convergence before consensus is reached) during a baseline run. Population size was 50 and N = 3. (A) Average fitness of the population at each generation until consensus was reached. (B) Genetic diversity of the population for each generation. Notice that the population converged genetically long before fitness reached its maximal value. Diversity had to be built up again over generations until the final meaning-signal associations were found, and only then did the genetic diversity diminish again.

3.1 Number of meanings

Since others (e.g., [20,28]) have demonstrated that increasing the number of meanings also increases time to consensus, we also replicated this result. Figure 7 illustrates that the relationship seems to be less than exponential, but still superlinear, as suggested by the complexity analysis in Section 2.5.

Figure 7: The effects of the number of meanings on the number of generations required for consensus among the population of signalers.

3.2 Gregariousness and population size

Gregariousness (g) and population size (P) were manipulated simultaneously. Gregariousness was either high (g = 1.0), medium (g = .4) or low (g = .1). Population size was 10, 50, 100, 300 and 1000, while the number of meanings and signals, N, was 3 or 10. The results of these experiments are shown in Figures 8 and 9.


A B
Figure 8: The effects of population size on the number of generations, G, required for consensus among the population of signalers. A N = 3 and B N = 10*. Note that because of scale, many of the bars for N = 3 appear to represent zero generations before consensus; they actually represent between 18 and 24 generations for all but medium gregariousness, P = 50 and 100, where populations took an average of 56 generations to reach consensus (for both population sizes).

* For N = 10/P = 10, all levels of gregariousness, and for N = 10/P = 50 high and low gregariousness, at least one run did not reach consensus over a large number of generations (the number of runs reaching consensus / the total number of runs for that condition).


Population size had a large effect on the number of generations required for consensus. Surprisingly, as population size grew from 10 to 1000, the number of generations required to get consensus decreased dramatically (see Figure 8). These increases were statistically significant in most cases (going from P = 10 to 100 and P = 100 to 1000). For N = 3, increase in population size was only significant up to P = 50, although the decrease in difficulty as population size increases clearly holds. The decrease in difficulty as population size increases is surprising considering that Steels [32] expected larger populations sizes to make the task harder, and Hutchins and Hazlehurst [18] found that for small populations of agents (5 to 15 agents), increasing population size led to greater difficulty in learning a shared communication system.

Another surprising result was that a population size of 10 was detrimental to achieving consensus (as Levin [20] also found), whereas population sizes of 50 or more afforded a much lower time to consensus. A population size of 20 was used in some pilot experiments not reported here; these experiments resulted in times to convergence that were much better than those at a population size of 10. Finally, it is important to note that the standard deviation of the outcomes drops significantly as the population size increases. This is a clue that larger populations reduce the bimodality found in the distribution of G.

Figure 9: The effects of population size on the number of generations required for consensus among the population of signalers, shown with standard deviations (error bars). N = 10. Note the strong downward general trend in number of generations to consensus as population size increases.

Overall, gregariousness had very little systematic effect, and no effects were seen for N = 10. When N = 3, for P = 100, g = high was slightly more effective than g = low (p < .0025), but not g = medium. A similar situation was true for P = 1000: g = medium was a bit more effective than g = low (p < .025). No other differences in gregariousness were significant. The fact that consensus can emerge even under low levels of gregariousness indicates that a species could evolve consensus even in a large population with low levels of interaction.

3.3 Learning

It is difficult to devise flexible learning schemes for lookup tables, so for these learning experiments, agents used simple feedforward neural networks (with one layer of weights--no hidden units) instead of lookup tables. Each output unit has n weighted inputs and has a bias. For example, if Nm = 2 and Ns = 3, the decoder has 2 output units and 3 input units. Each output unit would then have 3 weights and a bias for a total of 6 weights and 2 biases. The genome has a real-valued gene for each weight and bias. The segment of the genome for the above decoder might look like < .15, - .32,.94, 0 > , < .3,.18, - .35, - .002 >, thereby specifying 3 weights and a bias for the first output unit and 3 weights and a bias for the second output unit.

A simple form of reinforcement learning was enabled in agents for the following experiments. When an agent sends a signal and the hearer decodes a different meaning from the sender, the sender adjusts its neural net so as to not produce that signal again when given the same meaning. Hearers do not learn; only senders learn. A one-shot adjustment causes the agent to use a different signal the next time it is given the same meaning, while a gradual adjustment only causes the agent to be more likely to use a different signal next time. Enough gradual adjustments will eventually cause the agent to produce a new signal.

Neural net agents were tested, with N = 3 (see Table 2), using one-shot or gradual reinforcement learning, or no learning at all. The GA was also operating under all of these conditions. The changes resulting from learning were not propagated to offspring (thus, inheritance remained Darwinian, not Lamarckian). The weight adjustment during gradual learning was always ±.05. Higher values of this adjustment resulted in much poorer performance during pilot runs. Runs were stopped after 2000 generations if they had not achieved consensus by that point.


Table 2: Learning: Number of generations, G ± standard deviation, to consensus.
no learning gradual one-shot
277±394 120±168 *(0/10)

* At least one run did not reach consensus over a large number of generations (the number of runs reaching consensus / the total number of runs for that condition).


Gradual reinforcement learning seemed to be better than no learning at all. The differences are not statistically significant, but this is due to the high variability in the results of the non-learning condition. Thus, gradual reinforcement learning seemed to reduce the mean and variability of the time to consensus, but not overwhelmingly. The most surprising result was that one-shot learning was terrible compared to either gradual learning or no learning at all. No population under one-shot learning reached consensus after 2000 generations (statistically significant from the non-learning and gradual results, p < .0005).

3.4 Extra signals

Although, as predicted, the difficulty in achieving consensus increased dramatically as the number of meanings, Nm, increased, we hypothesized that consensus would be more readily attained if more signals than meanings were available. Experiments were run with Nm set to 2, 3, 5 or 10. The number of available signals, Ns, was always set to be larger than Nm by an amount Ne, where Ne = Ns - Nm. Ne was set to 1, 2, 3, 4, 5, 10, 20 or 50. As seen in Table 3 and Figure 10, extra signals were clearly beneficial to agents in achieving consensus in signaling.


Table 3: Extra signals: Number of generations, G ± standard deviation, to consensus
Nm Number of extra signals, Ne
0 1 2 3 4 5 10 20 50
2 13       11       13
3 56       17       16
5 807 *(9/10) 397 412 27 110 30 27 32 24
10 3252 3065 3022 2471 765 900 1166 704 167
Note: Blank entries are conditions that were not explored.

Different values of Ne seemed to be best, depending on Nm. When Nm = 5, Ne = 3 was a significant improvement over Ne = 2 (p < .02). Similarly, when Nm = 10, Ne = 4 improved the situation greatly over Ne = 3 (p < .025). For Nm = 5, smaller Ne than 3 was not statistically better than Ne = 0, and for Nm = 10, smaller Ne than 4 was not statistically better than Ne = 0. It seems that there is some minimum number of extra signals needed before any significant effect appears, but this minimum number increases as Nm increases. Twenty extra signals was an improvement over three extra signals (p < .025), and fifty extra signals was an even more dramatic improvement (p < .025).

Figure 10: The effects of extra signals on the number of generations required for consensus among the population of signalers.

Finally, to see if the above findings with Ns > Nm would hold for neural nets as well, neural net agents were used instead of lookup table agents in several otherwise identical conditions, where Ne = 50, and the same dramatic improvements occurred. For Nm = 10, when Ne = 0, no condition achieved consensus by 10,000 or even 20,0000 generations. However, when Ne = 50, all 10 runs resulted in consensus by G = 305 ± 300 (p < .001).

4 Discussion

In summary, for a population of agents to reach consensus with a simple signal-meaning mapping system, there are many difficulties. As the number of meanings and signals increases, the difficulty in reaching consensus grows exponentially. However, having more signals than meanings makes the task much easier. It is important to maintain diversity in the population before consensus is reached, which can be achieved by using a large population. Gradual reinforcement learning showed a slight advantage over no learning, while one-shot learning was actually worse than evolution alone.

In our experiments, an increase in population size led to a decrease in the mean time (and variability) to evolve consensus. Yet Steels [32] expected and Hutchins and Hazlehurst [18] and Oliphant [28] both showed that as population size increases, so does the difficulty of achieving consensus. Why are our results so different? The most important difference is that our simulation used evolution while their simulations used learning. Their learning agents needed to teach each other an initially random communication system, and it would take time to propagate a single communication system throughout a population with limited interactions. Also, since the learning agents were not subject to a genetic algorithm, diversity might not have been as much of an issue: there was no inheritance, and genetic operators like mutation and crossover did not play a role. Since Hutchins and Hazlehurst's neural net agents began with random initial weights, a slightly larger population would have helped to provide more initial diversity, but this might have been counteracted by the problem of many learning interactions among a larger population (which was also a problem for our learning experiments described above).

There were a few curious points in the gregariousness/population size data which need to be explained. With 3 signals and meanings (N = 3) and either high or low gregariousness, consensus emerged quickly with a small population size (P = 50). On the other hand, with medium gregariousness, consensus emerges quickly only with a large population size (P = 1000). What could cause this disparity? Few interactions (g = low) will afford more stability as the communication system develops, while many interactions (g = high) will exert strong selection pressure for rapid compliance. With N = 3, the signal/meaning space is small, so small populations will usually have the solution already. Medium gregariousness may simply be the "worst of both worlds": there are not enough interactions to promote universal agreement, but there are too many interactions to maintain stability. N = 10 is a different case: the signal/meaning space is huge, so small populations will always have a difficult time (due mainly to lack of diversity).

Diversity was important--the bigger the gene pool, the more potential solutions exist, and also the more potential interactants for a specific mutation. Maintenance of diversity was crucial to completing the last few signal-meaning associations. Early genetic convergence probably explains the bimodal outcomes of many of the experiments, when these conditions resulted in early consensus (typically before 100 generations) or very late consensus (typically 1000 or more generations). Either some subset of the population converged on a complete system and the rest of the genomes were quickly selected against, or the population genetically converged early on one system that was not consistent, and the loss of diversity prevented the individuals from quickly agreeing on the final signal-meaning pairs; they instead had to wait for matching changes to appear in one agent's decoder and in another agent's encoder (for example, see Figure 6). For lookup tables, the chance of a paired fortuitous mutation like this is p = [m/(NsNm)]2 where m is the mutation rate and 1/(NsNm) is the chance that a mutation is in the right place with the right value. For neural nets, the chance is lower since a mutation may have no effect on an association. A large population size probably confers its advantage through genetic (and thus communicative) diversity. Nevertheless, it is possible that very large populations, say 10,000 or more, would exhibit an increase in time to consensus due to the problem of coordinating multiple independent, effective signaling systems.

A few other simulations have shown the effects of different numbers of meanings, but Levin's [20] experiments went no higher than N = 6 (probably due to computational constraints) and Oliphant's [28] agents used learning instead of evolution. Like Levin and Oliphant, we found that the difficulty in achieving consensus clearly increased as the number of meanings (and signals) increased. As explained in Section 2.5, the search space of possible communication systems greatly increases in size as N increases. For N = 2, there are 2 unique, consistent decoder-encoder systems (N!) out of 16 possible ones (N2N), whereas for N = 5 there are 120 unique systems out of 9.8 million possible systems, and for N = 10 there are 3.6 million unique systems out of 1020 possible systems.

The ability of large populations to achieve consensus more rapidly than smaller ones--even as the number of meanings grows--makes sense in terms of diversity: the increased diversity takes more time to destroy and also provides more options for decoders and encoders alike. Noise may also play a role similar to diversity; its importance to the evolution of a communication system was studied by Grim et al. [15].

Why was one-shot learning so disruptive to achieving consensus? In the meaning-signal mapping task used in our simulations, agents initially had highly divergent, random communication systems. Each time an agent was paired with another, it was likely to learn a particular meaning-signal association for each meaning (since each agent was initially quite different from the others). However, when the same agent was paired with a new partner, the associations it just learned would be mostly incorrect. The agent would fail to communicate very well, gaining few fitness points. Additionally, the agent would again learn new associations. This cycle would continue, preventing the agent from ever settling on a single system. The random genetic starting point of any agent would usually be no better than that of any other agent, so evolution would have very little to work with. Gradual learning, on the other hand, would be more sensitive to the successes and failures an agent would encounter over time, while being less disruptive to existing associations. Although it did not perform well in our simulations, one-shot learning would be appropriate for some behaviors (e.g., noxious stimulus aversion), including many human linguistic behaviors (e.g., fast mapping [5]).

One-shot learning doesn't have to be disruptive. MacLennan and Burghardt's agents achieved consensus much faster when using one-shot supervised learning as opposed to evolution alone [22]. These agents only interacted with a small, fairly fixed subset of the population, so one-shot learning could have been more beneficial due to a more predictable signaling environment.

Only selection pressure for many distinct signals coupled with a lengthy period of evolution (and perhaps learning) can produce a population that shares all parts of a large communication system. This may explain why having Ne extra signals helped so much. Having extra signals is like having a blackboard to work on: new signals can be ``selected'' from those available, and the likelihood that they are already in use is lower as Ne increases. The benefit is multiplicative, so for Ne extra signals, there should be O(NNe) possible unused associations to explore, rather than O(1) in the case where Ne = 0. However, there is probably a point of diminishing returns: significantly more signals than meanings (e.g., a large Ne) might be problematic since the receivers may have difficulty finding exactly which unused signal is now in use by a signaler for an unassociated meaning (a meaning currently with no associated signal). Livingstone and Fyfe [21] pointed out another advantage of extra signals, which is to allow several alternatives for a specific meaning (or internal state). In this way, agents are more flexible in how they interpret signals, and the population can actually retain some genetic diversity while still achieving communicative consensus.

Future work might focus on other factors playing a role in achieving consensus, comparing various kinds of learning agents to evolving agents. Other behavioral representations need to be explored (e.g., finite state machines, recurrent neural networks and production systems) as well as other genetic representations (e.g., diploid, haploid, epistatic). An active role for noise should be examined as Grim [15] has begun to do. More complex representations of signals and meanings (e.g., distributed, temporal) that resemble natural communication systems need to be used. Finally, different results for gregariousness and population size might be attained if agents interacted with a fixed number of others.

More work needs to be done to characterise the complexity and difficulty of acquiring a communication system. The number of signals used in a communication system is not an adequate measure of its complexity. Other factors such as the form and structure of the signals and the contexts in which they appear must be taken into account (e.g., [11,17]).

Acknowledgments

We thank Jim Newkirk, Jerry Wilkinson, Juan Uriagereka, Yotam Hermoni, Rita Berndt and two anonymous reviewers for useful discussions and comments related to this paper. This research was supported by an NIH Post-doctoral Fellowship (T32 DC00061) at the University of Maryland at Baltimore and by funding from the University of Maryland Institute for Advanced Computer Studies.

References

1
Ackley, D. & Littman, M. (1994) Altruism in the evolution of communication, in Artificial Life IV: Proceedings of the fourth international workshop on the synthesis and simulation of living systems, (eds) R. Brooks & P. Maes, MIT Press: 40-48.

2
Baray, C. (1997) Evolving cooperation via communication in homogeneous multi-agent systems, in Proceedings of Intelligent Information Systems, Los Alamitos, CA. IEEE Computer Society, IEEE Computer Society : 204-208.

3
Baray, C. (1999) Evolution of coordination in reactive multi-agent systems, PhD thesis, Indiana University, Bloomington, IN.

4
Cheney, D. L. and Seyfarth, R. M. (1990) How monkeys see the world: Inside the mind of another species, University of Chicago Press, Chicago.

5
Clark, E. (1995) The lexicon in acquisition, Cambridge University Press, Cambridge.

6
de Saussure, F. (1985/1959) The linguistic sign in Semiotics: an introductory anthology, (ed) R. Innis, Indiana University Press, Bloomington, IN: 24-46.

7
Di Paolo, E. A. (2000) Behavioral coordination, structural congruence and entrainment in a simulation of acoustically coupled agents, Adaptive Behavior, 8(1): 27-48.

8
DiMarco, F. P. and Hanlon, R. T. (1997) Agonistic behavior in the squid Loligo plei (loliginidae, teuthoidea): fighting tactics and the effects of size and resource value, Ethology, 103: 89-108.

9
Dunbar, R. (1993) Coevolution of neocortical size, group size and language in humans, Behavioral and Brain Sciences, 16(4): 681-735.

10
Elgar, M. A. (1986) House sparrows establish foraging flocks by giving chirrup calls if the resources are divisible, Animal Behavior, 34: 169-174.

11
Elman, J. L. (1993) Learning and development in neural networks: The importance of starting small, Cognition, 48: 72-99.

12
Ficken, M. S. (1981) Food finding in black-capped chickadees: altruistic communication? Wilson Bulletin, 93(3): 393-394.

13
Ficken, M. S. (1989) Acoustic characteristics of alarm calls associated with predation risk in chickadees, Animal Behaviour, 39(2): 400-401.

14
Greenhalgh, D. and Marshall, S. (2000) Convergence criteria for genetic algorithms, SIAM Journal on Computing, 30(1): 269-282.

15
Grim P., Kokalis T., Tafti, A., and Kilb, N. (2000) Evolution of communication in perfect and imperfect worlds, World Futures: The Journal of General Evolution, 56: 179-197.

16
Hanlon, R. T., Smale, M. J., and Sauer H. H. W. (1994) An ethogram of body patterning behavior in the squid Loligo vulgaris reynaudii on spawning grounds in South Africa, Biological Bulletin, 187: 363-372.

17
Hare, M. and Elman, J. L. (1995) Learning and morphological change, Cognition, 56: 61-98.

18
Hutchins, E. and Hazlehurst, B. How to invent a lexicon: the development of shared symbols in interaction, in Artificial societies: the computer simulation of social life, (eds) N. Gilbert and R. Conte, UCL Press, London: 157-189.

19
Jim, K.-C. and Giles, C. L. (2000) Talking helps: Evolving communicating agents for the predator-prey pursuit problem, Artificial Life, 6(3): 237-254.

20
Levin, M. (1995) The evolution of understanding: A genetic algorithm model of the evolution of communication, BioSystems, 36: 167-178.

21
Livingstone, D. and Fyfe, C. (1999) Modeling the evolution of linguistic diversity. in Advances in Artificial Life, volume 1674 of Lecture Notes in Artificial Intelligence (eds) D. Floreano, J.-D. Nicoud, and F. Mondada, Springer.

22
MacLennan, B. J. and Burghardt, G. M. (1993) Synthetic ethology and the evolution of cooperative communication, Adaptive Behavior, 2(2): 161-188.

23
Maes, P., Mataric, M. J., Meyer, J.-A., Pollack, J., and Wilson, S. W. (1996) From animals to animats 4, Cape Cod, MA, sep 9-13, MIT Press.

24
Mateo, J. M. (1996) The development of alarm-call response behaviour in free-living juvenile belding's ground squirrels, Animal Behaviour, 52: 489-505.

25
Mitchell, M. (1996) An introduction to genetic algorithms, MIT Press, MA.

26
Newman, J. A. and Caraco, T. (1989) Co-operative and non-co-operative bases of food-calling, Journal of Theoretical Biology, 141: 197-209.

27
Oliphant, M. (1996) The dilemma of saussurean communication, Biosystems, 37(1-2): 31-38.

28
Oliphant, M. (1999) The learning barrier: Moving from innate to learned systems of communication, Adaptive Behavior, 7(3-4): 371-384.

29
Reggia, J. A., Schulz, R., Wilkinson, G. S., and Uriagereka, J. (2001) Conditions enabling the emergence of inter-agent signalling in an artificial world, Artificial Life, 7(1): 3-32.

30
Saunders, G. M. and Pollack, J. B. (1996) The evolution of communication schemes over continuous channels, in [23]: 580-589.

31
Smith, J. M. (1982) Evolution and the theory of games, Cambridge University Press, Cambridge, UK.

32
Steels, L. (1996) Emergent adapative lexicons. in [23].

33
Steels, L. (1997) The synthetic modeling of language origins, Evolution of Communication, 1(1): 1-34.

34
Wagner, K. (2000) Cooperative strategies and the evolution of communication, Artificial Life, 6(2): 149-179.

35
Werner, G. M. and Dyer, M. G. (1991) Evolution of communication in artificial organisms, in Artificial Life II, SFI Studies in the Sciences of Complexity, volume X, Addison-Wesley, MA: 659-687.

36
Wittgenstein, L. (1968) Philosophical investigations, The Macmillan Company, New York, third edition.

37
Yanco, H. and Stein, L. (1996) An adaptive communication protocol for cooperating mobile robots, in From animals to animats 2, (eds) J.-A. Meyer, H .L. Roitblat, and S .W. Wilson, Cape Cod, MA, MIT Press: 478-485.