|
ISSN 1320-0682 | ||||
| Volume 3 | April 1996 | ||||
Irfan Altas and Murli M. Gupta
...Altas
School of Information Studies, Charles Sturt University, Wagga Wagga NSW 2678
AUSTRALIA
...Gupta
Department of Mathematics, George Washington University, Washington DC 20052
USA
Email:ialtas@csu.edu.au & mmg@math.gwu.edu
It takes considerable effort to develop cost-effective, efficient parallel algorithms in a high performance computing environment for computationally intensive problems arising from many areas, such as complex systems dynamics and molecular sequence analysis. In this work, we demonstrate some steps which guide us to design an efficient parallel algorithm. An ill-conditioned problem, which is the case for most complex systems, is selected to identify the appropriate parameters for a successive over relaxation (SOR) algorithm. It is illustrated that once the right parameters are identified, they also help to select an appropriate type of parallel algorithm to be developed for a particular problem.
High Performance Computing (HPC) is widely employed in a diverse spectrum of computational problems such as weather simulation, complex systems dynamics, molecular sequence analysis and the numerical solution of differential equations. The methods used in these areas are not uniform in the sense that some require local interaction among processing elements whereas others need global interaction. However, a considerable number of such applications including dynamic programming, genetic algorithms, network flow problems and system of equations, can be classified under the category of "fixed point problems". The problems under this category are usually solved by applying an iterative scheme.
A number of iteration methods are developed in the literature to solve fixed point problems. In this work, we mainly concentrate on iterative schemes developed to solve a system of equations arising from a model problem presented in the next section. However, the observations given in this work are equally applicable to other types of fixed point problems. There are various iterative techniques such as successive over relaxation (SOR) and conjugate gradients in the literature to solve a system of equations (for example, see [4], [2]). Some of those iterative schemes are employed in various parallel platforms [7]. The speed of convergence of some classical iteration schemes, such as Jacobi and Gauss-Seidel, is examined for synchronous and asynchronous communication approaches in some parallel environments (for example, see [1], [6]).
Our aim in this paper is to examine the convergence behaviour of SOR method for a model problem. It has been demonstrated that the expected benefit from parallelisation of the algorithm will not be gained if the right parameters for SOR method were not chosen before the computation was carried over a parallel platform. The parameters chosen for the sequential algorithm were carried over into a parallel platform to help in designing an efficient parallel algorithm.
In the next section, we introduce the model problem. We examine the convergence behaviour of SOR method with various parameters in the following section. Then the parallel version of the algorithm is given, and finally, the conclusions are introduced.
Most of the problems arising from complex systems are ill-conditioned and, hence, can not usually be solved before standard techniques developed for well-conditioned problems are modified. As a representative of an ill-conditioned problem, we pick the following model problem. The model problem does not come from a specific complex phenomenon. However, the solution shows a similar characteristic with some complex systems. The coefficient matrix of the model problem is ill-conditioned. The solution of this system can not easily be obtained by directly applying the standard iterative schemes. We study the convergence behaviours of several variations of classical iterative algorithms such as Gauss-Seidel and SOR to solve the model problem. The observations from this study can usually be used to solve other kinds of ill-conditioned problems.
Consider the second biharmonic equation -

The model problem above can be derived from the calculus of variations combined with the variational principle of minimum potential energy. This principle applies to problems of stable equilibrium of a system that is governed by the conditions necessary to ensure minimum potential energy.
The discretised form of the above equation in a unit square domain is derived by applying a 9-point compact difference scheme given in [8]. These equations are -

where "h" shows discretisation step size and i=1...N, j=1...N for some positive integer "N". Note that the number of unknowns is 3(N*N).
The scheme produces a fourth order discretisation of the problem. In this approach, not only u's but the second derivatives of u are also treated as unknowns to avoid the fictitious points which fall outside of the problem domain. Another advantage of this difference scheme, compared with other discretisation schemes in the literature, is that the boundary conditions are exactly represented.
The direct solvers for solving the system of linear equations obtained from discretisation of the differential equation in the last section, by employing various difference schemes, can only be used for moderate values of grid width "h". This is because of ill-conditioned coefficient matrix. For example, the coefficient matrix of the system derived from a 13-point difference formula is 5789.7 for N=20 in [5]. In fact, it can be shown by employing a symbolic algebra package such as Maple that the condition number of the coefficient matrix of the system given in the previous section is inversely proportional to the square of the step size. (for example for N=2, the condition number of the coefficient matrix in L1 norm is 12.6+13.1/(h*h) which produces the condition number 130.5 by substituting h=1/3.)
We do the computations for the biharmonic equation with the exact solution u=(1-x*x)*(1-y*y) [3]. The boundary conditions are obtained from the exact solution. We start our computations by applying the Gauss-Seidel iteration scheme. The first issue which should be explored, is updating the three equations in the last section either individually or simultaneously. We may first update "u" values for i=1...N, j=1...N by using Equation (2). Then, we may update the second derivative of u with respect to x for i=1...N, j=1...N by using Equation (3) and finally, Equation (4) may be employed to update the second derivative of u with respect to y. We refer to this approach as individual updating technique. The second approach may be to update all unknowns (u as well as its second order derivatives) at the same time for every "i" and "j" value. The number of iterations required for 10e -12 tolerance is given in Table 1.
As indicated from the results in Table 1, simultaneous updating produces a slightly better convergence rate. Hence, we carry simultaneous updating for the rest of the computations.
Another parameter of the Gauss-Siedel iteration technique we explore is the order of iteration technique over discretisation points in the solution domain. Amongst some variations of Gauss-Seidel we test the "zebra" and "multi-colour" Gauss-Seidel iteration techniques. In this context, the table above was produced by using point Gauss-Seidel iteration technique. These techniques as well as some others in multigrid context are discussed in [9] for convection-diffusion equation. The multi-colour approach is widely used in algorithms designed for parallel computing environments, since each colour may be assigned to a different process and the updating of unknowns can be carried out independently in each process. For our model problem, we used four different colours which converged with 185113 iterations for N=31. It did not provide any advantage over the results given in Table 1.
We tested some versions of zebra Gauss-Seidel iteration technique such as horizontal, vertical and alternating zebra. All tests produced similar results. Therefore, we only discuss horizontal zebra version here. Let us define grid points (i,j) where i=1,...,N denote the numbering of grid points along the x-axis direction (similarly for "j" along the y-axis direction). In the horizontal zebra technique, we first update the unknowns for j-odd and i=1,...,N; then, the unknowns for j-even and i=1,...,N are updated. For N=31, the horizontal zebra converged with 184892 iterations, which is not much different from the point and the multi-colour iterations given above. However, a slightly modified version of the horizontal zebra produced a considerable saving in the number of iterations.
When the horizontal zebra Gauss-Seidel technique is applied to nonlinear problems it is usual that a few iterations are required for j-odd before we start the iteration for j-even (and vice versa). Our model problem is linear, but due to its ill-conditioned nature, we thought this approach might work. In fact, that was the case. We employed 5 iterations every time before we switched from j-odd to j-even (and from j-even to j-odd). (Applying 5 iterations was decided after several experiments; for example, applying 3 iterations resulted in total 54587 iterations.) We refer to this approach as inner-outer loop technique. The results are summarised in Table 2.
The inner-outer loop method converged with 3823 iterations for N=31. By applying 5 iterations for j-odd and j-even, the actual number of iterations is 5*3823=19115. This represents ((184871-19115)/184871)*100=89% savings in the number of iterations and, hence, in the computations.
Note that the inner-outer loop approach was employed for multi-colour case. It did not provide a significant improvement over the point Gauss-Seidel case.
The Gauss-Seidel method can be accelerated by introducing successive over relaxation (SOR) version. In order to get the benefit of SOR, the optimal relaxation parameter should be chosen. The calculation of the optimal value of the relaxation parameter might be difficult for every problem [2]. However, it can be approximately calculated by some simple approaches such as trying several values and observing the effect on the speed of convergence. For the model problem, the approximated relaxation parameters 1.21, 1.1 and 1.1 for u and the second derivatives of u with respect to x and y respectively, were used to accelerate the horizontal zebra method with N=31. The horizontal zebra SOR converged with 7940 iterations for N=31. This represents a (19115-7940)/19115 * 100% = 58% saving in the number of iterations over the horizontal zebra Gauss-Seidel approach. Thus, overall reduction in the number of iterations for N=31 can be summarised in Table 3.
Note that the appropriate relaxation parameters used for the point Gauss-Seidel SOR approach for N=31 were 1.1, 1.1 and 1.1 for u, and its second order derivatives. It converged with 151908 iterations which was 32963 less iterations than the point Gauss-Seidel approach.
Before we proceed with the parallel algorithm, we must stress here that the horizontal zebra SOR algorithm is only one of the efficient techniques to solve this problem. There are some other techniques such as multigrid which may solve this problem more cost effectively.
In the previous section, we selected some suitable parameters which provided fastest convergence for the sequential algorithm amongst the tested parameters. In this section, our purpose is to develop a parallel algorithm by preserving those parameters. As is demonstrated in the previous section, we reduced the number of iterations from 198524 to down 7940 by testing several versions of the sequential Gauss-Seidel algorithm. This approximately represents an overall 96% reduction in the computations. In other words, we approximately speed up the initial sequential algorithm by a factor of 25. Hence, if we would have attempted to parallelise the initial algorithm, we would need to employ well above 25 processors in a high performance computing environment to get up the same speed.
For simplicity, we restrict the parallelisation discussion to two processors only. However, it extends naturally to any number of processors. It is also natural to only concentrate on the horizontal zebra SOR method which provided the fastest convergence rate. Note that 7940 iterations are needed to obtain the results for this technique. However, the parallel form of the algorithm needs only 1588 (=7940/5) times swapping the updated values between processors since the 5 iterations of inner-outer technique do not require any change of updated values. Iterative algorithms are usually parallelised by employing domain decomposition techniques due to their high communication rates. This is also true for our model problem.
One processor may be assigned to update unknowns along the lines j-even, whereas the other processor updates unknowns along the lines j-odd. This is not a viable option communicationwise, since all updated values in one processor should be passed to the other. On the other hand, if one processor updates all unknowns in the lower half of the unit square and if the unknowns in the upper half of the unit square are updated by the other process, then only the updated values along the common boundary in the middle of the unit square need to be changed. We use this option in our implementation.
As it was explained in the previous section, during the implementation of inner-outer loop zebra technique, we do a certain number of iterations before we switch from j-odd to j-even, and vice versa. The number of iterations at this stage has a considerable effect on the convergence rate. For example, instead of doing 5 iterations for each of j-even and j-odd case, if we do 7 and 3 iterations respectively, then the number of iterations increases from 7940 to 11975. Applying asynchronous communication techniques may cause an unbalanced number of iterations, as in the above example for j-even and j-odd due to early or late arrivals of messages. Therefore, we prefer synchronous communication over asynchronous.
The parallel version of the horizontal zebra SOR algorithm is implemented in the parallel virtual machine (PVM) environment. We employed 2 and 4 Sun workstations. The speed-up gained for each case is as follows. The speed-up factor is defined as the ratio of timing of the sequential algorithm to timing of the parallel algorithm.
We have demonstrated that designing a cost-effective parallel algorithm not only depends on parallelisation parameters, but also parameters of the sequential algorithm. By choosing an appropriate parameter of the Gauss-Seidel method, we obtained 96% reduction in the computations for the sequential algorithm. In order to get the same reduction by parallelising the point Guass-Seidel algorithm, we need well above 25 processors. We also demonstrated that selecting appropriate parameters for the sequential method, before carrying out any parallelisation effort, may help in choosing the right parallelisation approach.
Iterative Methods For Fixed Point Problems On High Performance Computing Environment
This document was generated using the LaTeX2HTML translator Version 96.1 (Feb 5, 1996) Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
The command line arguments were:
latex2html altas2.
The translation was initiated by Pam Milliken on Tue Jan 14 16:09:28 EST 1997