|
ISSN 1320-0682 | |||
| Volume 02 | April 1995 | |||
Stevan M. Vajdic
Department of Electrical and Electronic Engineering
University of Adelaide, SA 5005
Michael J. Brooks
Department of Computer Science
University of Adelaide, SA 5005
Andrew Downing
School of Engineering
Flinders University of SA, SA 5001
Henry E. Katz
ISCS Inc., 1B Forest Hills, NY 11375, USA
Email: svajdic@eleceng.adelaide.edu.au
Matching/fusion strategies are discussed and a set of functions for evaluation of relational inexact matching is introduced. They are implemented in the domain of 2D/3D medical imagery and are based on Artificial Intelligence (AI) paradigms - in particular, rule-based knowledge representation and tree search. The 2D reference and target images are selected from 3D sets and are segmented into non-touching and non-overlapping regions, using iterative thresholding and/or knowledge about the anatomical regional shapes of human organs. Selected image region attributes and their inter-relationships (relations) are calculated. Region matches are obtained using a tree search, and the error is minimised by evaluating a "goodness of matching" function based on similarities of region attributes/relationships. Once the matched regions are found and the spline geometric transform is applied to regional centres of gravity, images are ready for fusion into a single 2D and/or 3D image of higher clarity.
Image fusion (composing images of the same objects taken by different means) has played a significant role in disciplines such as medical imaging, pattern recognition, computer vision and remote image sensing. The idea is to combine images of a single object obtained using either different sensors, different illumination conditions, with perhaps different sections of the object observed at different times, from various positions or viewing angles, under different illumination conditions or polarisations. The resultant single image would be of a better clarity and contain more information than could be obtained from any of the individual "matched" images.
The idea of combining images in medicine becomes very important because of the necessity to improve the diagnosis of diseases and the treatment and care of patients. 2D/3D fusion of medical images acquired by medical scanners - for example, Computerised Tomography (CT), Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET) and Single Photon Emission Computer Tomography (SPECT) - is of special interest for medical experts because of the different image characteristics produced by each device that are useful for combining.
It is known that CT and MRI can provide anatomical information (structure, texture, grey-level shapes) of human organs and PET and SPECT can provide functional information (tissue function, blood vessels, cell function, metabolism). Moreover, CT can provide good bone structure with less distortion but with less tissue texture, as opposed to MRI that can give a high quality structure description but with more distortion due to secondary magnetic field effects formed around the examined object. As a consequence, any one of the following five image pairs - (CT,PET), (CT,SPECT), (MRI,PET), (MRI,SPECT), (CT,MRI) is useful for fusion. The first four are classified as structure-function fusion, the last one as structure-structure fusion. The first component in each pair is referred to as a reference image (the one that is not changed), the second component is referred to as a target one (the one that is changed). The mutual characteristic for all four modalities is that they contain a finite set of 2D cross-sections (slices, scans) of a human organ, forming a 3D structure of the organ, with a chosen distance between the 2D slices.
Before the images are fused they have to be matched (registered); that is, the target image needs to be modified somehow to match the reference image. All image matching/fusion algorithms developed and classified [1, 2, 3, 4] so far generally adopt a four-step strategy: a) extraction of key structures, b) matching of key structures, c) evaluation of similarities between structures, and d) transformation of target image.
Classification of matching/fusion algorithms is a very complex task in general, since it involves many classification criteria [2] such as image dimensionality (2D, 3D), image properties (attributes and their inter-relationships - extrinsic, intrinsic) and image property coupling (interpolation, approximation), domain (global, local) and elasticity (rigid, affine, projective, curved) of transformation, parameter determination (direct, search-oriented) and interaction (interactive, semi-automatic, automatic). A complex classification with all explanations is beyond the scope of this paper. Yet, it would be worth mentioning that different classifications would be possible from the following global set of principles of matching methods: a) control points, b) moments, c) knowledge base representation of object characteristics, d) similarity optimisation, e) stochastic sign change, f) Fourier analysis, g) cross correlation, h) eigenvalue decomposition, i) warping, j) 3D surface-fitting, k) anatomical atlas, l) internal landmarks, m) external landmarks, n) principal axes, o) inexact relational matching, and p) neural network segmentation.
Which method is to be chosen depends on the type of images to be matched/fused (for example, natural scene images, stereo images, Landsat images or medical images). Moreover, a set of possible problems related to the matching task and type of images would be given as: a) choosing a matching method, b) choosing the reference image, c) selecting image properties, d) imposing constraints, e) evaluating "goodness of matching", f) choosing an appropriate geometric transformation, g) coping with distortion, and h) choosing a fusion method.
This project is concerned with medical images. Hence, with
regard to the suggested four-step matching methodology and related
problems, the following key steps are chosen: a) image regions, similar
to the anatomical shapes of
human organs, and their attributes, are extracted from 2D
slices to make less complex the image-matching procedure
and the finding of the best match; b) CT or MRI image modality is
selected to be the
reference since they provide anatomical information which
is closely related to segmented regions; c) tree search is employed as
the best method for
finding the region matches, and similarity between regions (for
finding the best match) is evaluated by applying functions
that incorporate comparisons of region attributes; d) a spline geometric
transform is applied to correct the local distortion.
Based on these key steps and knowledge base representation (IF-THEN
rule-based analysis) 2D and 3D generalised approaches were proposed
and discussed by
the authors of this paper [5, 6].
In order to get a single 2D (3D) composite image out of two
modality components, two (three) major procedures needed to be
performed: matching and fusion (and 3D visualisation). These procedures
were broken into steps: a) selection of two 3D medical image data sets,
b) resampling of
the 3D target modality using bilinear interpolation (getting the same
dimension of both 3D sets (number and size of slices)), c) selection of 2D
slices of interest, d) generation of non-touching regions (region segmentation
via iterative thresholding and/or using knowledge base representation of
the anatomical shapes of human organs), d) calculation of region attributes
and their inter-relationships, e) application of inexact relational region
matching, f) application of the geometric transformation (spline
transform applied to the region centres of gravity as the best transform
for eliminating local distortions [7], and
g) fusion of
images [8] - composition of weighted average of corresponding
pixels (voxels) of either any chosen pair of 2D slices
or complete 3D sets. In the case of 3D fusion, a 3D visualisation
was also performed. Two distinct, sequential step-designed
approaches were proposed: a) initially supervised segmentation based on
iterative thresholding to form the image regions, region-based relational
matching, and a tree search for finding the best match, and
b) unsupervised
knowledge-based segmentation to form image regions, region-based relational
matching, and a tree search for finding the best match. The shapes of
closed region boundaries, produced by segmentation, were as similar as
possible to those based on the anatomical structure of human organs.
Segmentation by iterative thresholding was manually supervised only for
the initial trials, either until the
shapes of significant parts of organs in all of the matching
modalities became similar to those initially suggested, or
until the number and sizes of regions became close enough
to match. Once the set of threshold
values
(where m represents
modality - CT, MRI, PET, SPECT) was found, it could
be used as the set of initial values for any future iterative
threshold procedures. That would provide an automatic approach
without significant human intervention. Also, separate
sets of threshold values would have to be found and stored for
different organs or structures such as the head, chest, spine
or neck. An automatic knowledge-based unsupervised segmentation
procedure was based on prior knowledge about the regions
that expected to be obtained; for example, in the case of brain images, a definition of
knowledge-based shapes of characteristic parts - thalamus, sinus, skull,
cerebellum, white matter, gray matter, etc., was demanded.
Each small region, obtained using initial low-level
segmentation, was checked against the prior knowledge about
the chosen anatomical region shapes and merged into larger
regions. The prior knowledge was represented in the form of
masks consisting of anatomical shapes of specific organs,
and was driven by rule-based (IF-THEN) facts that used the
image features and guided the region merging process.
Afterwards, segmentation regions of both images were to be matched.
In this paper the emphasis is given to evaluation of inexact relational matching (match evaluation functions based on a similarity measure between region attributes and their inter-relationships), and finding the best match within a set of "good" matches (AI-based tree search methods).
Among many image primitives (points, lines, regions), regions are adopted because of the special segmentation methods applied for their extraction (iterative thresholding or knowledge-based segmentation). More attributes can be extracted from regions than from points and lines. Nevertheless, by applying Hough or medial axis transformations, points and lines could also be used with the same algorithm proposed here, but the accuracy of matching may well be below the level that can be reached with regions.
Image region attributes including perimeter, area, shape, mean/max/min radius, centre of gravity, and amplitude, histogram and average gray values are computed. Either the attributes of the regions, or calculable inter-relationships between them, can be used. If the inter-relationships are used, then attributes of each region within either the reference or target images are recalculated in terms of their relationships with other regions in order to provide uniformity. A major experimental search for the best combination of region attributes and/or relationships is underway. The main criterion for a good choice is invariance under linear transformations (translation, rotation, scaling, skewing) and possibly non-linear distortions. The non-linear distortions are to be corrected mainly by applying the spline geometric transformation using the centres of gravity of matched regions after the region matching has been performed.
Let sets
and
represent regions as image primitives in the reference and
target images, where m and n are the numbers of regions in
each (in general, m is not equal to n). Each region is
described by a finite number
of region properties
with
values
,
k=1,2,...,r (the term "property" will stand for attributes and their
inter-relationships
(usually called relations) - numerical or symbolical) to form hierarchical
structural image pattern
descriptions - representations. All numerical property values are scaled
into the interval [0,1] using the expression
or, in the case of Gaussian-distributed properties, normalised using the expression
where
and
are the minimum and maximum
values, and
and
are the mean value and
standard deviation of property
. Properties are weighted
depending on what pair of modalities (CT, MRI, PET,
SPECT) is to be fused. The property weights differ from
each other, especially when a structure-function fusion
takes place, due to different regions being extracted in the
segmentation process. For example, properties "area" and "perimeter" (or
"relative area" and "relative perimeter"
in the case of forming relations),
are
given a higher weight than shape; the property "mean radius" (or "relative
mean radius"), is
considered more influential than property "distance between region centres
of gravity", etc. Unfortunately, there is no mathematical apparatus that
can theoretically describe the way of assigning weights to
properties - experiments are the only alternative researchers have for the time being.
The matching procedure is complete when l regions (
) are found in both matched images. Relational region
matching is solved by means of tree search methods. The
matching procedure is guided by the match evaluation
function that compares similarities between weighted
region property values in the reference and the target images in
order to find the best match. A tree search is applied only to
find region matches prior to the application of a geometric
transformation.
The match evaluation function has to be applied to two data descriptions (here, reference and target image regions as primitives and their properties - attributes and their relationships-relations) in order to guide the search method in finding the best match. The best match between two sets of image region primitives is to be found as the mapping for which the regions show the best similarity in their property values. The major problems of definition of similarity of properties - what property is more important and/or how to combine numerical and symbolical properties - need to be overcome. Thus, the property values have to be weighted and their inter-relationships defined in terms of all the relations over the set of property values, and/or a conditional probability function of properties needs to be employed.
Three types of evaluation function that compare similarities of two
descriptions (here, sets of regions) have been proposed so far: a distance
measure (cost function) and a merit function, both involving mainly comparison
of pairs of primitive property values, and the function that involves
more sophisticated parameters - for example, the number of primitives, the number of
properties and the number of successful matchings of each pair of properties
[9].
The distance measure (distance function) is to be minimised, the merit
function is to be maximised, and the third function could be defined in such
a way to allow either minimisation or maximisation. In the case of the two
sets of region primitives defined in
Section 3, the cost function of similarity between
regions
and
is usually defined as the sum of the absolute or square
value differences, or even as the square root of the sum of the square value
differences of all region properties. These forms can be applied to numerical
properties only. In the case of either only symbolical properties or a
combination of both symbolical and numerical properties, a conditional
probability function can be successfully applied [11],
or some
penalty for non-similarity needs to be incorporated into an evaluation function
expression. The conditional probability (density) function (or
conditional information)
of property
, is
defined as the likelihood that the property
of the primitive
in the target image will have the
value
if it is known that the property
of the
corresponding
primitive
in the reference image has the value
.
The contribution to the distance measure between these two primitives
from property
could then be
defined as:
Minimising the sum of all distance measures will lead to a maximising of the product of all probabilities and, consequently, a maximising of the likelihood mapping. The maximum likelihood mapping can be obtained by minimising the sum of the square root property value differences in the case of Gaussian-distributed properties or by minimising the sum of the absolute property value differences in the case of Laplacian-distributed properties. In the case of symbolical properties, beside adding a penalty value to the total distance measure if compared property values are not the same, which is rather a rough method since it does not distinguish the level of similarity, the only smooth method involves the above mentioned probabilistic approach.
Boyer and Kak [10] developed a cost function whereas Vosselman [11] developed a merit function based on probabilistic information theory and the formula defined in Equation (3). At issue here is the calculation of property conditional probabilities. All methods (analytical, numerical and hand-training) require either complex functions to be found (analytical and numerical methods) or many matches to be done by hand to guarantee correctness.
Two well-known cost (distance) functions were designed by Sanfeliu and Fu [12] and by Shapiro and Haralick [13, 14]. The first one defines a distance measure on graphs (primitives are graph nodes, branches are inter-relationships/relations between primitives). The function primarily includes a measure of differences between normalised attribute values, and some heuristics involving structural differences between graphs. The second one is based on comparing weighted primitives, weighted attributes (using attribute value thresholds) and weighted relation vectors/tuples, and adding a normalised distance (between 0 and 1) for each corresponding pair of primitive properties (relation vectors/tuples) that is inexactly matched. Price and Reddy [15] and Faugeras and Price [16] extended this weighted attributes comparison to assigning a strength for attribute contribution in particular matching tasks; for example, the same attributes may have more or less influence on the evaluation of matching depending on the type of images being used and primitives being extracted. Difficulties here are the need for an experimental (non-theoretical) search for weights and strengths.
Barrow and Popplestone [9] proposed an evaluation function involving more parameters than a comparison of attributes and their relationships:
where
is the number of successful matchings of
relations (inter-relationships between region attributes),
and
are respectively the number of relations and the number
of regions in the reference image region descriptions. Relations between
reference and target images (
and
) are considered to be
matched if their
difference is within some threshold. The threshold is chosen experimentally
to be
,
where
is the standard deviation of
the relation
.
The drawback is that neither attribute/relation weights
have been introduced, nor penalties applied, for combining
numerical and symbolical region properties.
To overcome some of the problems mentioned above the authors propose a set of new evaluation functions:
where
,
and
are the weights that
define the level of contribution of the number of regions segmented in
reference and target images and the number of the properties used;
and
are the numbers of regions in reference and
target images;
is the number of initially suggested anatomical
regions that should be segmented;
is the number of properties extracted; and
is
the number of successful matchings of
property values. The factor
represents the
sum of penalty values for each pair of unmatched properties. The first two
factors
contribute to the function as the relative difference between the number of
extracted and initially suggested regions in both images. All the functions are to be
minimised for good matches. All properties are weighted and
are compared with
regard to the threshold given by
, k=1,...,r, where
is a
constant to be found experimentally, and
is the standard
deviation of all region property values of one kind in the reference image.
If the difference between two property values is above the threshold, a
penalty, equal to the weight of
that property, is added to the evaluation function. Properties of regions
of both images can either be attributes or relations (all attributes of
one kind for all regions are recalculated with regard to each other
forming inter-relationships/relations).
Although detailed explanations about tree search algorithms and related problems are not the prime concern of this paper, some pointers related to evaluation of matching will be given, to make the matching/fusion algorithm complete.
Tree search methods have been well developed in AI. Here, they are used to search the space of similarities between region property values of reference and target images and find the best match based on the proposed evaluation function. In general, the search for the best match is very time-consuming since there are many regions and their properties of both images involved. The search time could be reduced by imposing some constraints. Hard constraints limit the set of all mappings within the search space to those for which the values of the evaluation function are inside experimentally predefined thresholds. Soft constraints define a relative preference over the space of possible mappings by implementing some heuristics in the evaluation function that direct the search method towards "better mappings". To minimise the search time for the best match, both hard and soft constraints could be employed. For example, with regard to the defined evaluation functions (Equations (5), (6) and (7)), the authors propose soft constraints that direct the search algorithm to compare the properties with high weights first; after each comparison the partial value of the evaluation function of currently compared regions is calculated and compared with the stored minimum value of the function of some previously completed region comparisons; the process of the calculation of the current function (and, consequently, the process of the comparison of the rest region properties) is stopped when its value exceeds the minimum stored value of another pair of compared regions; the current match is then assigned the tag "not the best one"; the track of "the best match" is always kept.
Evaluation of inexact relational matching in 2D/3D medical image matching/fusion based on tree search principles has been considered. Improvements are given in considering the evaluation function that involves the weighted attribute/relation comparison, the weighted penalty for non-similarity, the number of attributes/relations, the number of successful attribute/relation matches, and the weighted relative difference between the segmented and initially suggested anatomical regions of human organs of both the reference and the target images. Although both relaxation labelling ([17, 18]) and relational matching methods can be used for matching relational image descriptions, the latter has been chosen for this work because it always finds the best mapping between the primitives of image descriptions (in this case, the image regions). The trade-off between the computational cost and accuracy also needs to be taken into account whereby the more image regions and region attributes/relationships adopted, the better the accuracy, but the greater the computation.
The authors are grateful to Prof. Marilyn Noz of the Department of Nuclear Medicine, New York University, for her unreserved support in supplying matched medical image data sets used in this project.
AI and Medical Imagery: Strategy and Evaluation of Inexact Relational Matching
This document was generated using the LaTeX2HTML translator Version 96.1 (Feb 5, 1996) Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
The command line arguments were:
latex2html vajdic.tex.
The translation was initiated by Pam Milliken on Mon Oct 28 15:19:03 EST 1996