ISSN 1320-0682
Volume 02, April 1995
Terry Bossomaier
School of Information Technology
Charles Sturt University, Bathurst, NSW 2795
Email: tbossomaier@csu.edu.au
The field of artificial neural networks is one of the fastest growing of the present time. There is activity on many fronts - such as learning algorithms, dynamics, capacity - producing new ideas and results frequently. Yet, comparing artificial with biological neural networks reveals several striking differences:
The first two of these we have to live with for the moment. The third, however, is not so much a limitation as a different strategy. The need for modularity is clear when we consider the difficulties of learning complex visual tasks. Face recognition, for example, has been of major interest to law enforcement agencies for a long time, but human beings still outperform computers. A minimal representation of a face would need several thousand pixels. The training requirements for an input vector of that size are huge.
Yet the task is not a homogeneous one. Faces have structure, and learning this structure seems an obvious requirement. In practice we would try to break up the task, introducing pre-processing steps to reduce the scale of the learning phase; but ideally we would like the network to learn this substructure. Simple feedforward networks can learn substructure in small problems, such as the shape-from-shading problem discussed below.
In the macaque, face-recognising cells appear in infero-temporal cortex, right at the end of the visual system. There are several intermediate areas, and perhaps as many as twenty processing stages. We know a lot about what happens at the beginning, but much less about the functionality of the cells in the middle. Part of the trouble is a shortage of reasonable guesses of what to look for. The intermediate representation could be anything at all! The present paper proposes the use of algorithmic information theory as a metric for understanding this organisation.
The numerous visual areas form the global organisation, but within each area there is substantial specialisation on a range of scales for a range of tasks. One important aspect of local structure is the columnar organisation. This is found in V1, the first area of visual processing, where cells of similar orientation specificity are arranged in columns perpendicular to the cortical sheet. Columns of orientations are then clustered into neat pin-wheel arrangements [3]. But this highly regular organisation is not just a characteristic of early visual processing. In infero-temporal cortex, a similar columnar structure appears, but this time the elements are not primitive visual tokens, but are figural elements; that is, shape, shading and other building blocks of images [4]. Similarly, tasks are grouped together, such as the quasi-independent streaming of colour, form and movement, forming cortical structures visible under staining as blobs and stripes [5].
In trying to understand the modularity of visual cortex, we can put forward a range of theories from a wholesale surrender to evolution, to complete ab initio learning by the individual.
Unfortunately, evolution does not follow a predictable path, and one of the problems frequently encountered in biology is that the existence of a species, or biological substructure, is not an argument for its inevitability. Thus, one of the arguments for the organisation of visual cortex is that it just evolved that way, nothing more or less - a position for historical rather than scientific analysis.
In some cases, a convincing case for the evolutionary model can be made. Dumont and Robertson demonstrate pre-adaptation in invertebrate neural systems such as the kick reflex of the crayfish [7]: certain parts of the neural architecture are redundant and are clearly leftovers from pre-adaptation. Stork et al. [8] successfully built on the biological data with a genetic model of the proposed evolutionary pathway. Colour vision in primates uses separate pathways [5], and could be viewed as an independently evolved system. The mechanisms by which independently evolved modules might be integrated are imperfectly understood, but in any case there is striking adaptation of visual systems to environment and lifestyle [9].
Whatever the capacity of an organism to learn, the more hints it can start off with the better. Some generic features of images change slowly, if at all, over geological time. The most fundamental are obviously the properties of a solid three-dimensional world and the various characteristics of the reflection of light from surfaces. Similarly, the visual processing of movement may have underlying algorithmic properties which do not change with time. Other phenomena which change slowly but still last typically over many generations are things like colour, the spectrum of the sun's radiation or the absorption of chlorophyll. It would make sense for each individual not to have to learn them afresh and it would obviously be possible for these mechanisms to persist and be individually optimised over time. But that would not guarantee optimal behaviour of the entire system and further evolution might improve their integration.
On the other hand, evolution is now recognised to be a powerful mathematical optimisation technique and we should not exclude the possibility that there is a high level of optimality in the primate visual system. Very early on, at the initial stages of capture of the image and its encoding for the bottleneck of the optic nerve, Shannon's information theory of communication [10] has proved effective in demonstrating that the early system is highly optimised [11]. Some authors continue to use this as a metric into the cortex itself [6,12]. Nevertheless, at some point, data compression must cease and be replaced with actual computation. Is there a metric for optimality of computational tasks? If we switch our attention from encoding of an ensemble to the precise description of each individual, unique image, then we should use algorithmic [13] rather than Shannon information theory. Whereas early vision, particularly at the photoreceptor level, is acutely concerned with noise, at higher levels noise is less important than the efficient organisation of, and access to, visual data.
We propose that the modularity of cortex is driven primarily by the search for the most economical representation of the external world. In simple terms, the algorithmic information or Kolmogorov complexity of a data set is the length of the shortest program that describes it. For completely random sets, this program must be basically a list of the elements. But if a data set is non-random, then a shorter, algorithmic description exists. At a simple level this is intuitive. Breaking the visual image up into features, moving to object-centred co-ordinates, even merely the reconstruction of a three-dimensional object from two-dimensional projections, are all simplifications. More importantly, once these transformations have been carried out, the original image information is frequently discarded in human vision [14]. However, the selection of appropriate features and their extraction has repeatedly proved non-trivial in real world images, a point we return to below.
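As a rough illustration of the idea of a shortest description, the sketch below uses zlib compression as a stand-in for algorithmic information. This is an assumption of the example, not a method from this paper: Kolmogorov complexity itself is uncomputable, and a general-purpose compressor only gives an upper bound on it. The helper names (`compressed_size`, `random_image`, `striped_image`) and the image sizes are hypothetical.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (a crude upper bound
    on description length, not the true Kolmogorov complexity)."""
    return len(zlib.compress(data, 9))


def random_image(n_pixels: int, seed: int = 0) -> bytes:
    """A structureless 'image': each pixel is an independent random grey level."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n_pixels))


def striped_image(n_pixels: int, period: int = 16) -> bytes:
    """A highly regular 'image': alternating light and dark runs of pixels."""
    return bytes(255 if (i // period) % 2 == 0 else 0 for i in range(n_pixels))


if __name__ == "__main__":
    n = 4096
    print("raw size      :", n)
    print("random image  :", compressed_size(random_image(n)))
    print("striped image :", compressed_size(striped_image(n)))
```

Under these assumptions the random image should not compress at all, so its description is essentially a list of its pixels, while the striped image shrinks to a small fraction of its raw size.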
The difficulty of ascribing function to intermediate neural modules is illustrated by the simple cells of the striate cortex. Originally they appeared to be feature detectors and numerous studies suggested a role in edge detection. Morrone et al. [15] suggested such a mechanism derived from psychophysical data, subsequently shown to be an effective image-processing technique (Morrone and Owens [16]). Yet the same cells have been shown to be effective for image compression [12,17] and to arise from self-organisation in response to pink noise [6]. Even more interesting is their appearance as the hidden layer in the shape from shading work of Lehky and Sejnowski [18]. This multiplicity suggests an underlying common functionality of some kind, not adequately described by any existing theoretical framework.
Let us now switch our attention to a specific example using feedforward networks, where some firm theoretical results for learning and representation have been established. Suppose we wish to classify a set of random patterns into two categories. Since they are without any structure, we cannot do better than a perceptron with a number of inputs of the order of the number of images. We can view the classification task in terms of the Vapnik-Chervonenkis dimension [19]. The classification machine must have VC dimension at least equal to the number of pictures. For feedforward networks, the VC dimension is of the order of the number of variable weights in the network, and for the perceptron it is simply N+1, where N is the number of inputs.
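The N+1 figure for the perceptron can be checked from below with a small sketch. It takes N+1 points (the origin plus the N unit vectors, a choice made purely for this example) and shows that every one of the 2^(N+1) labellings is realised by some weight vector and bias. This demonstrates only that the VC dimension is at least N+1; the matching upper bound is not shown here. The function names are hypothetical.

```python
from itertools import product


def simplex_points(n):
    """Origin plus the n unit vectors: n + 1 affinely independent points in R^n."""
    origin = [0] * n
    units = [[1 if i == j else 0 for i in range(n)] for j in range(n)]
    return [origin] + units


def perceptron_for(labels):
    """Weights and bias realising a given +1/-1 labelling of simplex_points(n).

    The origin fixes the bias; each unit vector then fixes one weight.
    """
    bias = labels[0]
    weights = [y - bias for y in labels[1:]]
    return weights, bias


def classify(weights, bias, x):
    """Threshold unit: +1 if the weighted sum plus bias is positive, else -1."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else -1


def shatters(n):
    """True if a perceptron with n inputs realises every labelling of the n + 1 points."""
    points = simplex_points(n)
    for labels in product((-1, 1), repeat=n + 1):
        weights, bias = perceptron_for(labels)
        if any(classify(weights, bias, x) != y for x, y in zip(points, labels)):
            return False
    return True


if __name__ == "__main__":
    for n in (1, 2, 5):
        print(n, "inputs shatter", n + 1, "points:", shatters(n))
```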
Now consider what happens if the number of input pictures to classify is increased dramatically as a result of some deterministic transformation. To take a concrete example, imagine the random patterns to be illuminated from different orientations, producing some degree of shading. This is quite deterministic. A sufficiently flexible multi-layer system will learn to deconvolve the shading and generate a representation which is again linearly separable. On the other hand, if we were to stay with the single layer system, the best we would be able to do would be to classify all the individual images. This would greatly increase the size of the network.
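The growth described above can be put in numbers with a small arithmetic sketch. It assumes, purely for illustration, that each of P random patterns is seen under K illumination directions, and uses the N+1 VC dimension of the perceptron quoted earlier. The function names and the example values of P and K are hypothetical.

```python
def perceptron_vc_dimension(n_inputs: int) -> int:
    """VC dimension of a single perceptron with n_inputs inputs (N + 1)."""
    return n_inputs + 1


def min_perceptron_inputs(n_images: int) -> int:
    """Fewest inputs N such that N + 1 is at least the number of images."""
    return max(n_images - 1, 1)


def images_after_shading(n_patterns: int, n_illuminations: int) -> int:
    """Each pattern appears once per illumination direction."""
    return n_patterns * n_illuminations


if __name__ == "__main__":
    patterns, illuminations = 100, 20  # assumed values for the example
    images = images_after_shading(patterns, illuminations)
    print("images to classify      :", images)
    print("single layer, inputs >= :", min_perceptron_inputs(images))
    print("unshaded patterns only  :", min_perceptron_inputs(patterns))
```

With these assumed values the single layer system must cope with 2000 images rather than 100, so its required size scales with P times K rather than with P alone.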
The representation of functions by simple feedforward neural networks has been addressed in some detail [20], and the nature of this transformation need not concern us. The penultimate layer must simply be linearly separable, and this determines the number of hidden units of that layer, which must grow essentially linearly with the number of random patterns to classify. As already mentioned, current training methodologies can successfully find a set of hidden unit weights, but if we have many tasks which share intermediate levels, the training would be far more complicated.
For the early stages of visual information processing, we may in fact be able to use Shannon information to approximate algorithmic information. Part of the difficulty faced by machine pattern recognition arises from the irregularity of natural images. Abu-Mostafa [21] describes such problems as random; that is, of very high algorithmic complexity. In such cases, the two information measures are very close [13] and optimising the Shannon information should suffice. As the image information gets processed, however, and becomes much more structured, this relationship may no longer hold. To determine the properties of inner layers then becomes a matter of empirical investigation.
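The contrast between the two measures can be illustrated with a minimal sketch. It compares a zeroth-order Shannon estimate (the entropy of the grey-level histogram) with the zlib-compressed size of the same data. Both are crude proxies chosen for this example: the histogram entropy ignores all ordering, and zlib only bounds algorithmic information from above. The function names and test signals are hypothetical.

```python
import math
import random
import zlib
from collections import Counter


def entropy_bytes(data: bytes) -> float:
    """Zeroth-order Shannon estimate of the coded size of data, in bytes."""
    counts = Counter(data)
    n = len(data)
    bits_per_symbol = -sum(c / n * math.log2(c / n) for c in counts.values())
    return bits_per_symbol * n / 8


def zlib_bytes(data: bytes) -> int:
    """Compressed size in bytes, used here as a proxy for algorithmic information."""
    return len(zlib.compress(data, 9))


def ratio(data: bytes) -> float:
    """Compressed size divided by the histogram-entropy estimate."""
    return zlib_bytes(data) / entropy_bytes(data)


if __name__ == "__main__":
    n = 16384
    rng = random.Random(0)
    noise = bytes(rng.randrange(256) for _ in range(n))  # no structure
    ramp = bytes(i % 256 for i in range(n))              # same flat histogram, fully ordered
    print("noise ratio:", round(ratio(noise), 3))
    print("ramp ratio :", round(ratio(ramp), 3))
```

For the noise signal the two estimates should nearly coincide, whereas the ramp has the same flat histogram but a far shorter description, which is the regime in which the Shannon measure stops being a good guide.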
Although the transformations needed to go from the visual image to a model of the three-dimensional world which generated it have been identified and discussed, at least since the influential thinking of David Marr, their implementation in artificial vision for natural scenes has proved difficult, at least partly due to the randomness referred to above.
It is within this context that computer simulations of evolution can prove particularly useful. Nilsson and Pelger [22], for example, examine the development of the optics of an eye, subject to an evolutionary pressure to increase the amount of spatial information. Similarly, the development of individual neural modules should be amenable to study. In particular, within an artificial life framework, the alternative hypotheses discussed above, between developing modules for specific slowly varying visual tasks and the optimisation of intermediate processing stages, should be discriminable. Imposing a size penalty on networks should force them to evolve towards the optimal algorithmic complexity.
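A toy version of such a simulation is sketched below. It is not the simulation proposed in this paper: the "network" is reduced to a bit mask saying which of six inputs are connected, the task (a majority vote of the first three inputs) is invented for the example, and the penalty of 0.01 per connection is an arbitrary assumption. Fitness is the best accuracy achievable from the connected inputs minus the size penalty, and the population evolves by point mutation with truncation selection.

```python
import random
from itertools import product

N_INPUTS = 6
RELEVANT = (0, 1, 2)   # hypothetical task: majority vote of the first three inputs
SIZE_PENALTY = 0.01    # assumed cost per connected input

PATTERNS = list(product((0, 1), repeat=N_INPUTS))


def target(x):
    """Label of an input pattern under the invented majority task."""
    return 1 if sum(x[i] for i in RELEVANT) >= 2 else 0


def accuracy(mask):
    """Best accuracy of any classifier that sees only the connected inputs."""
    groups = {}
    for x in PATTERNS:
        key = tuple(xi for xi, m in zip(x, mask) if m)
        ones, total = groups.get(key, (0, 0))
        groups[key] = (ones + target(x), total + 1)
    correct = sum(max(ones, total - ones) for ones, total in groups.values())
    return correct / len(PATTERNS)


def fitness(mask):
    """Accuracy minus a penalty proportional to the number of connections."""
    return accuracy(mask) - SIZE_PENALTY * sum(mask)


def evolve(generations=60, pop_size=20, seed=0):
    """Evolve connection masks; the fitter half survives each generation."""
    rng = random.Random(seed)
    population = [tuple(rng.randint(0, 1) for _ in range(N_INPUTS))
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        for parent in parents:
            child = list(parent)
            flip = rng.randrange(N_INPUTS)  # connect or disconnect one input
            child[flip] = 1 - child[flip]
            children.append(tuple(child))
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print("best mask:", best, "fitness:", round(fitness(best), 3))
```

In this toy setting the fittest possible mask connects exactly the three relevant inputs, since extra connections add cost without adding accuracy. Whether a given run reaches it depends on the random seed and the number of generations.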
The organisation of the visual cortex up to the level of complex pattern recognition is a major biological challenge and promises to contribute much to the creation of large scale artificial neural networks. Evolution towards the smallest neural networks is seen to have advantages for higher level pattern analysis and efficient computation. The simulation of evolution provides a framework for discriminating between various models of cortical organisation.
I am grateful to David Green, Daniel Osorio and Allan Snyder for discussion on the topics of this paper.
Evolution of Neural Modules