Complexity International
    ISSN 1320-0682     
Volume 3 April 1996
 

Towards a Mathematics of Complexity

David G. Green

...Green
Johnstone Centre, Charles Sturt University, PO Box 789 Albury 2640 Australia Email: dgreen@csu.edu.au

Abstract:

There is a need for a unified, mathematical theory of complexity that is capable of expressing the structures and processes common to different phenomena. This account proposes three elements for such a theory. (1) Universal measurement generalises the concept of measurement to include many formal systems of observation. (2) Graphs, which are inherent in the structure and behaviour of all complex systems, provide possible units of measurement that are the equivalent of numbers for organisation. (3) Syntactic homomorphism, which identifies the underlying "programs" common to classes of patterns and processes, provides at least one analytical tool. As well as suggesting new techniques, the above elements imply a need to re-interpret many current ideas and methods. They also suggest a series of open questions for future research.


Introduction

One of the great successes of complexity theory has been to identify common patterns and processes in distinct systems. Despite this success, there is as yet no uniform approach to dealing with complexity. Separate systems are still treated in separate fashion. There is a need for a body of mathematical theory that enables us to represent systems in ways that embody key concepts and make common structures clear.

Many results about complexity concern the organisation and behaviour of computational systems, rather than numerical ones, so traditional mathematical methods are often inappropriate. Complexity has become increasingly associated with a new scientific paradigm that sees processes in the real world as "natural computation". This paradigm treats natural processes as forms of computation; conversely, it seeks inspiration from nature in looking for new approaches to computation. Several authors have pointed to the need for a general mathematical approach to complexity that would unify all of these diverse ideas.

In this account, I outline three ideas that may help to provide the foundations for a general approach. First, I generalise the definition of measurement to include a wide range of formalised observations [4]. Secondly, I identify graphs as fundamental units of measurement for organisation and behaviour. Finally, I propose the notion of "syntactic homomorphism" as a fundamental analytic concept. Even these preliminary ideas are useful as they cast new light on various well-known methods and point to a range of fundamental questions for further research.


Universal measurement

The term "measurement" has become almost synonymous with "numbers". However, it is not how values are expressed, but how they are obtained that distinguishes measurement from other forms of observation. The essential features of any system of measurement are a formalised procedure whose outputs express features in a systematic way. Many observational systems (for example, event recorders, pattern recognition) provide methods that are just as rigorous as those of any numerical measure. To highlight the similarities, we shall use the following broad definition of measurement.

A measure is a formal system of observation that yields values for a variable. Its elements are:

  1. A formal language within which to express the values that the measure can take;
  2. A model that defines rules about relationships between the values;
  3. A set of objects and observations (standards) that define the semantics of the formal model;
  4. A formal procedure (algorithm) for obtaining values.

Firstly, notice that numerical measures do satisfy this definition. For instance, in weighing objects we use the language of arithmetic. The model defines the properties of numbers, with which we express values for mass and ensure that the laws of arithmetic apply. To measure mass, we require standards to fix the values 0 and 1, and to define how we compare and combine masses. We define these standards in terms of the action of a balance. Zero mass is the mass on an empty balance pan. Unit mass is the mass of some universal standard object. Two objects are defined to have equal mass if they do not tip a balance when placed on opposite pans. Adding two masses is equivalent to placing two objects on one pan together. The algorithm to determine mass assigns a real number to a test object by comparing it with the unit standard. The experimental procedure is to balance the test object against multiples or fractions of the standard. In practice, we normally simplify this procedure by using pre-calibrated scales.

As a non-numeric example, records of animal behaviour often take as values the sequence of actions that an animal performs. The underlying model represents behavior as a sequence of discrete actions. We would express these actions as symbols denoting, say, sleeping (S), feeding (F), or walking (W). We might record these sequences of symbols by tapping keys on an event recorder (a device that places symbols in sequence onto a chart) whilst observing the animal. The standards would be criteria, perhaps supported by film or photographs, that uniquely define each category of behaviour in the system. In practice, a biologist has to rely on his or her memory of such standards when making observations.

Some Implications

The above definition has several immediate consequences. Firstly, it implies that we should re-interpret many observational systems as forms of measurement. In particular, many pattern recognition systems produce syntactic descriptions that could be regarded as measurements in the sense we have defined here. More generally, almost no effort has been devoted to developing systems whose outputs are precise syntactic descriptions.

Secondly, anything we can measure, we can represent as variables. So, by analogy with conventional algebra, we can ask whether there are identifiable relationships between sets of variables. We shall address the question of how to find, represent and use such relationships later.

If we adopt the notion of syntactic variables then we need to change the way we study many natural phenomena. The process of identifying and quantifying variables has had an enormous effect on the development of theory in every area of science. To give just one example, the notion of diversity has always been a central theme in ecology. The introduction of numerical measures of diversity has transformed the nature of ecological research. Whole study programmes centre on measuring diversity and many theories concern trends, patterns and influences on diversity.

The above considerations lead to a crucial question. What are the variables that could and should be measured non-numerically? One prospect is that non-numeric approaches might help to make some key ideas in complexity more precise. Some of these ideas include organisation, behaviour, and the notion of complexity itself.


Units of measurement

Generalising the idea of measurement leads directly to the next question: what should we use as the units of measurement? The usefulness of arithmetic stems directly from two facts. Firstly, numbers are inherent in many different phenomena. Secondly, arithmetic is a familiar and well-developed body of theory. So, in seeking units of complexity, we need to look for constructs that are widely applicable, familiar, and have the capacity for development of useful methodologies. There are several structures that meet these requirements.

If we adopt graphs  [12] as units, then two issues immediately arise:

  1. How are we to represent the units? Unlike numbers, there are many different possible graphs for a given number of nodes. We also need systematic methods of expressing graph structure for computational purposes. Tree graphs, for instance, can be represented as strings, by using (say) nested brackets to denote branches. The string (((1)(2))((3)(4))), for example, denotes a balanced binary tree with four terminal nodes.
  2. What are the most relevant operations on graphs? An obvious analogue of addition is to join two graphs by adding an edge to link two nodes. Again, however, this operation is more complex than addition: two graphs with M and N nodes respectively can be joined in MN possible ways. Another possible operation is convolution, in which nodes in one graph are replaced by a second graph. This operation corresponds to multiplication for arithmetic and to substitution in string grammars. Some relevant ideas, such as matroids, already exist in graph theory  [12], but more work will clearly need to be done to formalise the use of graphs as fundamental units.

 

  figure37
Figure 1: Some issues arising in using graphs as units of measurement. (a) All four possible graphs of 3 nodes. If arcs linking a node to itself are allowed, then the total rises to 32 possible graphs. If nodes are labelled, then the total rises to 64. (b) An example of "adding" two graphs by joining selected nodes. (c) An example of "convolution" of two graphs. Here, a tree is substituted for every node in another graph.


Algebraic properties

A potentially fruitful approach to the need for finding and expressing algebraic relationships is to build on the ideas of computation and syntax. We can describe any structure via a formal syntax and we can consider that syntax as the program of an automaton. Thus, we can express any pattern as the result of a program plus data. For example, if S denotes a starting word and A,B,C,... are alphabet symbols, then we can generate patterns as follows:

tabular45

An important corollary of the above definition is that we can express relationships between variables in terms of common elements within their underlying programs. A set S of patterns are related if we can identify a common form ("coform") co(S). This is a pattern that is inherent in each pattern in the set. As a simple example, consider the following set of sentences:

          THE CAT SAT ON THE MAT
          THE DOG SAT ON THE MAT
          THE DOG SAT NEAR THE FIRE

One common form for this system would be

displaymath50

To make this precise, we can say that the patterns tex2html_wrap_inline213 are related if there is a pattern co(S) and a set of homomorphisms tex2html_wrap_inline217 such that

tabular55

These ideas differ from current thinking about computational complexity in two important ways. Firstly, in general we do not know the full program of any pattern. Secondly, attention has usually been given to the notion of the "shortest" program that generates a pattern and the problem of finding that program. The approach here considers the nature of the program that arises within a particular framework. In practical terms, the coform co(S) is the pattern produced by a particular algorithm A. Different algorithms will produce different coforms.

Some examples

An obvious application of the above ideas is to data compression. Many algorithms, such as run-length encoding, reduce data volume by methods related to the above. More generally, any coding system compresses information by taking advantage of common elements in a dataset.

Another example is the notion of schemas for genetic algorithms [8]. A schema is a string of symbols that denotes a pattern of similarities within a set of genotypes. For instance, suppose that the encoding is binary; then genotypes can be represented as strings of 0 and 1: say 0011010 or 1100010. These two strings match in the last three positions. So we can represent the common pattern by a similarity template ****010, where an asterisk denotes either 0 or 1. Holland [8] used this approach in his Schema Theorem.

Genomic analysis

We can find a practical example of the above ideas in genome data analysis [2]. Suppose that we have a set of corresponding DNA sequences from different organisms; then questions of interest are what similarities and differences exist in the sequences. How are the organisms related? We also want to understand the function of the sequences. Do corresponding DNA sequences code for functionally similar proteins? In the production of proteins, redundancies in the genetic code mean that different DNA base sequences may translate to identical amino acid sequences. Also, the amino acids themselves fall into several groups with similar properties. Thus, apparently quite different protein sequences can yield functionally identical proteins.

Consensus methods in genomic analysis provide a practical application of the above ideas. These methods identify common patterns in a set of sequences. One approach is to classify bases or amino acids according to the physical properties that highlight features of the 3-D folding structure . A practical advantage of consensus methods is that new sequences can be aligned by comparing them against the consensus pattern, thus restricting alignment to pairs of sequences. One variation - profile analysis [7] - generates fuzzy consensus sequences by providing a score for all 20 possible amino acids at each point in the sequence.

  figure73

Figure 2: Application of structural rules to amino acid sequence data. The rules (charge and hydrophobicity) emphasise structural similarities between the sequences and provide markers that can be used in aligning the sequences and in deducing protein-folding patterns. The consensus pattern here shows simple majorities wherever they occur, with capitals indicating perfect consensus.

A popular consensus technique is to identify motifs. Motifs are short strings of amino acids that have particular chemical or physical properties. They are often crucial in the formation of 3-D structures (for example, helixes, beta-sheets) that affect the way a protein functions. Such motifs therefore tend to be conserved by evolution, and appear as constant features within whole families of sequences  [11]. Families of proteins, have been characterised in terms of the patterns of motifs that they share in common. The Prosite Database  [1], for instance, catalogues numerous motif patterns, such as [LIVFM]-x(2)-Y-D-S-[YA]. Here, LIVFM denotes a particular string of amino acids, the notation [] denotes any one of the enclosed amino acids and x(2) denotes a sequence of any two amino acids. Motif searches can simplify alignment enormously because they provide markers that constrain the possible configuration of sequences.


Discussion

The ideas developed here fall a long way short of a general theory of complexity. They do not even constitute an adequate mathematical toolkit for dealing with complexity. On the other hand, they do hopefully demonstrate the possibility of attaining both goals. What is more, they provide a possible starting point for developing a general, systematic approach.

There are many potential advantages in a mathematical approach to complexity. For a start, a unifying formalism enables us to address questions at a general level, rather than case by case. It also raises the possibility of developing new methodologies, as some of the examples given earlier suggest.

Even the preliminary ideas developed here have potential uses, such as the development of high-level application languages. For example, implicit in every simulation model is an abstract, algebraic structure. A model of (say) a factory might assume a network structure, in which workstations are the nodes, and the flow of materials defines the edges. Identifying implicit structures such as this shows that studying the behaviour of the abstract systems can teach us about many different systems at once. Studies of 2-D cellular automata, for instance, simultaneously can tell us about systems as different as landscapes and parallel computation.

Finally, our earlier discussions suggest some open questions about complexity. Not only can these serve as useful research goals, but also answering them is likely to involve developing the mathematical approach suggested here.

  1. Why are trees such common structures in nature? Do natural processes tend to create trees? Or is the observed frequency of trees just an artefact of our conceptual framework?

  2. One of the major research issues in algebra is how structures are constructed. For instance, how do we build graphs? Are there general principles for finding and using basis sets to build networks? Such questions are relevant to many issues in complexity. For instance, a promising approach to neural networks is to build large networks out of existing modules, each with different functions.

  3. Are there general mechanisms involved in the way in which systems develop ( [9],  [10])? In previous work  [6], I have suggested that phase changes in connectivity may be an important source of novelty in nature; that is, connections within a system gradually build up until they exceed a critical level, at which point a collapse occurs and the system crystallises into a new structure.

  4. Does connectivity underlie all criticality? The graph theorems alluded to here ( [5],  [6]) ensure that graphic structure is present in the structure and behaviour of all complex systems. Together with the connectivity avalanche for graphs  [3] they ensure as a minimal condition that connectivity leads to some critical phenomena in all complex systems. However, this result alone does not guarantee that criticality cannot arise in other ways. A general proof may not be possible.

  5. Does connectivity breakdown lead to natural sizes in social structures, such as human social groups? Studies of human and primate social structures suggest that ape societies have a natural group size of about 30 individuals, whereas humans appear to have a natural group size of over 100. The increased social bonding possible through speech has been suggested as the crucial difference.

  6. Models of computational complexity usually refer to the minimal algorithm describing a pattern. The notion of pattern developed here leads to the idea of the complexity of a family of patterns "relative" to the algorithm that characterises that family. This approach has several advantages. It makes it easier to measure complexity. It also raises some interesting questions. For instance, are identifiable trends in complexity associated with natural processes? In interacting systems, can we find causal relationships underlying complexity?

In conclusion, it is not clear at this point whether a truly universal theory of complexity is achievable. However, the ability to pose questions such as the above show that even the search itself is useful.


References

1
A. Bairoch. "PROSITE: a dictionary of sites and patterns in proteins". Nucleic Acids Research, 19:2241-2245, 1991.

 

2
R.F. Doolittle, editor. "Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences". Academic Press, San Diego, 1990.

 

3
P. Erdos and A. Renyi. "On the evolution of random graphs". Mat. Kutato. Int. Kozl., (5):17-61, 1960.

 

4
D.G. Green. "Syntactic modelling and simulation". Simulation, 54(6):281-286, 1990.

 

5
D.G. Green. "Emergent behaviour in biological systems". In D.G. Green and T.J. Bossomaier, editors, Complex Systems - From Biology to Computation, pages 25-36. IOS Press, Amsterdam, 1992.

 

6
D.G. Green. "Connectivity and the evolution of biological systems". Journal of Biological Systems, 2(1):91-103, 1994.

 

7
M. Gribskov, A.D. MacLachlan, and D. Eisenberg. "Profile analysis". Proc. National Acad. Sci., 84:4355-4360, 1987.

 

8
J.H. Holland. "Adaptation in Natural and Artificial Systems". University of Michigan Press, Ann Arbor, 1975.

 

9
S.A. Kauffman. "Origins of Order: Self-Organization and Selection in Evolution". Oxford University Press, Oxford, 1992.

 

10
C.G. Langton. "Computation at the edge of chaos: phase transitions and emergent computation". Physica D, 42(1-3):12-37, 1990.

 

11
R. Staden. "Finding protein coding regions in genomic sequences". In R.F. Doolittle, editor, Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences, pages 163-179. Academic Press, San Diego, 1990.

 

12
R.J. Wilson. "Introdution to Graph Theory". Longman, Burnt Mill, 3 edition, 1985.

About this document ...

Towards A Mathematics Of Complexity

This document was generated using the LaTeX2HTML translator Version 96.1 (Feb 5, 1996) Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.

The command line arguments were:
latex2html meas.

The translation was initiated by Pam Milliken on Fri Jan 10 12:44:10 EST 1997


Complexity International (1996) 3