Complexity International      ISSN 1320-0682     
Volume 02 April 1995

PFSA Modelling of Behavioural Sequences by Evolutionary Programming

Charlie H. Clelland and Douglas A. Newlands
School of Computing and Mathematics
Deakin University
Victoria 3217, Australia
Email: fractal@deakin.edu.au, doug@deakin.edu.au

Abstract:

Behavioural observations can often be described as a sequence of symbols drawn from a finite alphabet. However, the inductive inference of such strings by any automated technique to produce models of the data is a nontrivial task. This paper considers modelling of behavioural data using probabilistic finite state automata (PFSAs). There are a number of information-theoretic techniques for evaluating possible hypotheses. The measure used in this paper is the minimum message length (MML) of Wallace.

Although attempts have been made to construct PFSA models by incremental addition of substrings using heuristic rules and the MML to give the lowest information cost, the resultant models cannot be shown to be globally optimal. Fogel's evolutionary programming can produce globally optimal PSFA models by evolving data structures of arbitrary complexity without the requirement to encode the PFSA into binary strings as in genetic algorithms. However, evaluation of PSFAs during the evolution process by the MML of the PFSA alone is not possible since there will be symbols which cannot be consumed by a partially correct solution. It is suggested that the addition of a "can't consume" symbol-to-symbol alphabet obviates this difficulty. The addition of this null symbol to the alphabet also permits the evolution of explanatory models which need not explain all of the data, a useful property to avoid overfitting noisy data.

Results are given for two test sets for which the optimal PFSA model is known and for two sets of eye-glance data derived from an instrument panel simulator.


Introduction

Inductive inference, the creation of models or hypotheses from data sets, has been accepted to be one of the primary formal operations of scientific inquiry since the 17th Century. However, it has been known since then that a large and possibly infinite number of hypotheses may be inferred from a single data set. The selection of the best hypothesis inferred from a data set is guided by Occam's Razor. This may be interpreted as accepting the simplest hypothesis with the maximum explanatory capability. Ideally, the process of inducing models should be capable of being automated by some computational procedure.

The problem of inductive inference may be considered as a search of possible models to find the optimal model which explains the data set. Some simplification of this search procedure can be achieved by restricting the search to a single class of models. Even with this simplification, it is known that inducing a single class of models (finite state machines) from a data set is NP-complete [4]; that is, the time requirement for an induction algorithm to find an optimal finite state machine is non-deterministic and increases as a polynomial function of the size of the data set. Given this limitation on the performance of automated induction procedures, the size of data sets which can be processed by a deterministic search algorithm is clearly limited [3].

There is, however, a class of algorithmic techniques collectively known as evolutionary computation (EC) which are non-deterministic and can give close-to-optimal solutions to many problems known to be NP-complete.


Problem domain

It is a long-term goal in behavioural science to identify the underlying processes which mediate behaviour. As an initial step toward this goal, the identification of regularities and patterns in observable behaviour is a necessary precursor activity. However, this process is not trivial since most behavioural data is highly contaminated with noise and often the metrics of behaviour are not known. The simplest procedure to transform behavioural data for automated induction is to map the raw data into a finite alphabet of symbols [9]. In the example used, the data is eye-glance data obtained from an instrument panel simulator; the raw data is mapped to a 4-symbol alphabet where each symbol corresponds to an instrument on the simulator panel. The task is to find the optimal model which explains the scanning sequence embedded in the data.


PFSA models

The choice of model used is arbitrary; for this work, probabilistic finite state machine (PFSA) models [8] are induced from the data sets. PFSAs are rather general models which consist of a set of states and transitions between those states. When a PFSA receives a symbol, it consumes it if a transition for that symbol is present and moves to the state defined by the transition. In addition, PFSAs have a probability associated with each outgoing transition from a state and can be used to predict the most likely symbol to be consumed next. In this application, the PFSA transitions have a count of the number of symbols consumed on each transition associated with them. The probabilities are not calculated, although, in principle, they can be. Note that, unlike other finite state models, there are no output symbols associated with either transitions or states; in this respect the PFSA models used are similar to the finite state automata models used in parsing formal languages.

The PFSA models used are constrained by rules of construction. These are:

Since the task is to find the optimal PFSA model, an evaluation procedure is required. Notionally, this could be done by some sort of quasi-objective scoring of nodes and transition counts in each model generated. A better method is that of estimating the length of the message required to communicate the structure of a PFSA model. The optimal model is that which has the shortest estimated message length [11]. The minimum message length (MML) for a PFSA can be approximated by the expression:

where:

The expression contains 3 terms corresponding to the message length costs of:

The message length is expressed in nits, that is, the natural logarithm (bits).

MML measures can be compared in a similar way to statistical measures. The difference between 2 MMLs may be taken as an odds ratio expressed as a power of 2. A difference of approximately 6 is reasonable evidence for accepting one model or hypothesis over another [10].


Evolutionary programming (EP)

EP is a rather venerable algorithm which has attracted a resurgence of interest in recent years; the primary text dates from the 1960s [2]. The algorithm, in part, emulates biological evolution and operates on a population of candidate solutions to a problem. In this case, the candidate solutions are PFSA models coded as dynamically sized two-dimensional arrays. Unlike Genetic Algorithms (GAs), EP places no restriction on the representations used allowing simple and direct representation of complex data structures. GAs are usually restricted to bit string representations which do not permit changes in data structure size during a run [12]. Although versions of the basic GA algorithm exist which permit dynamic bit strings [5], the mapping of the bit string to the desired structure represents a significant computational overhead. EP may be expressed in pseudocode as:

        Initialise a population of N candidate solutions

        Evaluate the population

        while ( not terminate condition)

                Copy each parent to create N children

                Mutate each child

                Evaluate the children

                Stochastically select the best N individuals

                Discard the poorest N individuals

        end while

Initialisation

For the studies reported, the initial PFSAs were simple ring structures of 2 to 20 nodes with a single transition between each node. It was assumed that there was no existing knowledge about the PFSA models which could be used to initialise the population.


Evaluation

The evaluation function used in EP, supplies selection pressure on the population. In this case, the primary evaluation measure is the MML of each PFSA. Procedurally, each PFSA model was presented with the data set and counts maintained on each transition of the symbols consumed by that transition. There are two other conditions which must be selected against.

In the first case, a simple procedure was implemented. An additional "can't consume" symbol was added to the symbol alphabet which was always expressed as a self-transition on each node on which a symbol could not be consumed. This had two benefits:

Similarly, the total number of redundant transitions in a PFSA - that is, those transitions which were never used in consuming the data set - was used in the evaluation function. The evaluation function used was:

The constants were empirically determined and set to the same value, usually 5.


Mutation

The mutations used were structural alterations of the PFSA, consistent with the rules of construction of a PFSA. Each mutation was only accepted if the resulting PFSA could consume the data set. In part, this restriction prunes the search space since the space which contains all "broken" and viable PFSAs is much larger than that for only viable PFSAs. The mutations used were:

The mutations were stochastically selected with each transition operation occurring 30% of total mutations and the node operations 5%. The norm in EP is to have equal probabilities for each mutation type but it was thought in this case that the node operations would have gross and possibly disruptive effects on the PFSAs and were reduced in probability [6].


Selection

Selection occurred by "tournament" selection where each individual's evaluation measure in the population is compared to a randomly selected sample of the others in the population. A score of the number of "wins" determines the survival into the next generation. Tournaments of size 5 and 7 were used.

 
Figure 1: Best runs for 100-symbol (cba) data set


Experimental description

Two data sets were used.


cba

A 100-symbol set, which was known to be composed of the characters "cba" interspersed with the character "d" [3] was tested. A population size of 50 with 5 tournaments was used. Convergence was relatively rapid with the unconsumed symbol count falling to zero in approximately 20 generations with very little improvement noted after 100 generations. The mean evaluation score of 2 runs is shown in Figure 1, with the 2 best PFSAs obtained in 20 runs shown in Figure 2. The best PFSAs are 3-state with the best machine identifying the "cba" sequence. Gaines [3] claimed the optimal PFSA was a 4-machine for this data set; this view is not supported by the present data.

 
Figure 2: Best 2 PFSA models induced from 100-symbol (cba) data set.

 
Figure 3: Best 2 runs for 2000-symbol (eye-glance) data set.


Eye-glance

The second data set contained 2000 symbols and represented eye-glance data from an instrument panel simulator. In this case, self-transitions were permitted in the PFSAs. A population size of 100 with 7 tournaments was used. The runs behaved in a similar fashion to the first data set, although the convergence was somewhat slower. The best 2 PFSAs are shown in Figures 3 and 4. Interpretation of these PFSAs is not obvious but there seems to be some sequences which occur with higher frequencies than others, such as the sequence "bc" in both machines.

 
Figure 4: Best 2 PFSA models induced from 2000-symbol (eye-glance) data set.


Conclusions and further work

The combination of PFSA models and EP seems to show some possibilities for a robust, rapid technique for inferring regularities and patterns in behavioural data. The technique appears to work on real data sets rather than on "toy" problems. However, this conclusion must be considered tentative since the present method has no adequate way of avoiding over-fitting noisy data. It is thought that, by permitting the PFSA models to retain a proportion of "can't consume" symbols, the evolutionary search may be able to retrieve regularities from noisy data sets. The present system under-utilises the available information during the search; the MML not only can give an assessment of PFSA structure but can also provide a measure of the effect of each mutation type on the PFSAs. This may allow for dynamic tuning of the mutation probabilities during a run in a similar fashion to evolving mutation schemes used in EP for continuous function optimisation [1].


References

1
Fogel L. J. (1993), "Evolving mutation", Evolutionary Computation Workshop - Australian Joint Artificial Intelligence Conference, Melbourne.

2
Fogel L. J., Owens A. J. & Walsh M. J. (1966), Artificial Intelligence through Simulated Evolution, New York: Wiley and Sons.

3
Gaines B. R. (1976), "Behaviour/structure transformations under uncertainty", Intl. J. Man-Machine Studies, 8, pp. 337-365.

4
Gold E. A. (1978), "Complexity of automaton identification from given data", Information and Control, 37, pp. 302-320.

5
Goldberg D. E., Korb B. & Deb K. (1989), "Messy genetic algorithms: Motivation, analysis, and first results", Complex Systems, 3, pp. 493-530.

6
Lutter B. E. & Huntsinger R. C. (1969), "Engineering applications of finite automata", Simulation, pp. 5-11.

7
Patrick J. D. & Chong K. E. (1987) "Real-time inductive inference for analysing human behaviour", AI '87 - Proceedings Australian Joint Artificial Intelligence Conference, Sydney, pp. 305-322.

8
Rabin M. O. (1963), "Probabilistic automata", Information and Control, 6, pp. 230-245.

9
Solomonoff R. J. (1964), "A formal theory of inductive inference: Part 1", Information and Control, 1, pp. 1-22.

10
Wallace C. S. & Freeman P. R. (1987), "Estimation and inference by compact coding", J. R. Statist. Soc. B, 49(3), pp. 240-265.

11
Wallace C. S. & Georgeff M. P. (1984), "A general objective for inductive inference", Technical Report No. 32, Department of Computer Science, Monash University, Victoria, Australia.

12
Zhou H. & Grefenstette J. J. (1986), "Induction of finite automata by genetic algorithms", Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Atlanta, GA, pp. 170-174.

About this document ...

PFSA Modelling of Behavioural Sequences by Evolutionary Programming

This document was generated using the LaTeX2HTML translator Version 95.1 (Fri Jan 20 1995) Copyright © 1993, 1994, Nikos Drakos, Computer Based Learning Unit, University of Leeds.

The command line arguments were:
l2h -dir test dn_cmplx.tex.

The translation was initiated by Pam Milliken on Fri Aug 16 16:39:47 EST 1996


Complexity International (1995) 2