A fast 2-D parallel multilevel fast multipole algorithm solver for oblique plane wave incidence



[1] We present a multilevel fast multipole algorithm (MLFMA) implementation to numerically solve Maxwell's equations for large two-dimensional geometries illuminated by an arbitrary plane wave in three-dimensional space. The solver's capabilities are augmented by means of an asynchronous and hierarchical parallelization. Its accuracy is demonstrated by comparing the analytical and numerically obtained scattering width of several canonical examples with a size of 700,000 wavelengths.

1. Introduction

[2] Significant research efforts have been expended on the development of ever more powerful electromagnetic solvers. Over the past two decades, the multilevel fast multipole algorithm (MLFMA) has proved to be a robust and error-controllable recipe to reduce the computational complexity of the matrix-vector product during the iterative Method of Moments (MoM) solution of boundary integral equations [Chew et al., 2001]. Most efforts have been directed toward the solution of large three-dimensional perfect electric conducting (PEC) objects as, e.g., by Velamparambil et al. [2003] and Ergül and Gürel [2009].

[3] This paper is concerned with the development of an efficient MLFMA solver that can handle a 2D geometry illuminated by arbitrary plane waves in 3D space. This means that even though the geometry is infinitely long in, e.g., the z direction, field values still depend on the z coordinate in general. In the case of oblique plane wave incidence, this dependency is of the form $e^{-j\beta z}$, which leads to a coupling of the 2D transverse magnetic (TM) and transverse electric (TE) problems.

[4] The boundary integral equation that was employed and its classic MoM solution were first developed by Olyslager et al. [1993] to study the behavior of waveguides. In a later effort, De Backer et al. [1997] used the Impedance Matrix Localization (IML) method to sparsify the system matrix. This allowed for the application of the integral equation to the prediction of indoor wave propagation. Other applications can be found in research fields such as imaging and tomography [see, e.g., Van den Bulcke and Franchois, 2009; Ngakosso et al., 1998]. In that context, the problem at hand is often referred to as ‘2.5-D’. This term, however, is also frequently used for scattering problems in layered media and is hence a possible source of confusion.

[5] In earlier work, we have proposed a kernel-independent, asynchronous, hierarchical parallel MLFMA applied to (pure) 2D TM scattering problems. The asynchronous algorithm allows for an efficient parallelization of simulations that involve multiple dielectric and/or PEC objects [Fostier and Olyslager, 2008b], while the hierarchical approach [Ergül and Gürel, 2008] allows for scalable parallel computations [Fostier and Olyslager, 2008a]. The term kernel-independent denotes that the parallel framework makes no assumptions about the electromagnetic MoM scheme that is used. In work by Peeters et al. [2008], the same framework was used for 3D broadband electromagnetic shielding problems. In this contribution, we report the application of this parallel framework to the coupled TE/TM integral equation.

[6] Compared to 3D simulations, 2D solvers allow for significantly larger simulations in terms of wavelengths. However, because the interactions between the discretized elements have a longer action range due to the slower decay of the 2D Green function, a high precision of the numerical methods is imperative. To the best of the authors' knowledge, this paper is the first one to deal with the application of the MLFMA to coupled TE/TM simulations.

[7] This paper is organized as follows: in section 2, we outline the integral equation and its MoM solution. In section 3, the MLFMA is developed. Special attention is devoted to the memory-efficient storage of the lowest-level aggregation and disaggregation matrices. Also, a short outline of the parallel methodology and low-frequency stabilization is given. Finally, in section 4, we demonstrate the accuracy of the solver for different angles of oblique plane wave incidence and for very large scale examples with a diameter of 700,000λ.

2. Method of Moments

[8] Consider a number of homogeneous dielectric and PEC cylindrical objects of arbitrary shape that are embedded in a background medium or in another dielectric. The objects do not overlap or touch each other and are illuminated by an incoming electromagnetic field Einc, Hinc. If the total number of dielectric objects is D, then the number of homogeneous regions is D + 1. Each region is characterized by its material parameters εi and μi and by its contour Ci. The z axis is oriented along the longitudinal direction of the 2D objects. This is illustrated in Figure 1.

Figure 1.

Geometry under study: cylindrical objects illuminated by an incoming electromagnetic field Einc, Hinc.

[9] Using the inverse spatial Fourier transform in the z direction, the incoming and scattered fields can be assembled from their respective spectra, e.g., for the electric field:

equation image

with $\boldsymbol{\rho} = x\,\mathbf{u}_x + y\,\mathbf{u}_y$. Only the β components in the range $[-k_i, k_i]$ (with $k_i = \omega\sqrt{\varepsilon_i \mu_i}$) contribute to the propagating fields in medium i. In what follows, we focus on a single β component and suppress the time and z dependence $e^{j(\omega t - \beta z)}$, while retaining the same notations for the field components.

[10] From elementary electromagnetic theory, it is found that the longitudinal field components Ez and Hz satisfy the Helmholtz equation in each homogeneous medium i:

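$$\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \gamma_i^2\right) F_z(\boldsymbol{\rho}) = 0 \tag{2}$$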

with $\gamma_i^2 = k_i^2 - \beta^2$ and F = E or F = H. In the local tangential coordinate system to boundary curve Ci of medium i (see Figure 1), the Et and Ht components can be expressed from the longitudinal components as follows:

equation image

[11] By making use of Green's identity and by eliminating the normal derivatives using (3), the fields in a point ρ within medium i can be expressed as a contour integral along its boundary Ci [see Olyslager et al., 1993]:

equation image
equation image

The integral expressions for the Hz(ρ) and Ht(ρ) field components can be obtained from (4) and (5), respectively, through the well-known duality substitutions: E → H, H → −E, εi → μi, μi → εi. The two-dimensional Green function is given by

equation image

[12] The final set of integral equations is obtained by expressing the continuity of the field components Ez, Hz, Et and Ht across dielectric boundaries. For a PEC body, only the Hz and Ht unknowns remain and the corresponding integral equation is obtained by expressing that the tangential electrical fields vanish at its boundary. In the appropriate media, a contribution from an excitation source needs to be taken into account (this is usually the background medium).

[13] Each contour Ci is divided into a number of line segments s with a length of λi/10. The wavelength λi is determined by considering the material with the highest contrast at either side of Ci. Overlapping triangular basis functions equation imagen are defined over two adjacent segments sn and sn+1 (see Figure 2) to expand the longitudinal components Ez and Hz. This results in a piecewise linear approximation of Ez and Hz, which makes the determination of the tangential derivatives ∂Ez/∂t and ∂Hz/∂t straightforward. The Et and Ht components are expanded into pulse basis functions equation imagen defined over a single segment sn. Conversely, pulse functions are used to test the Ez and Hz equations whereas the Et and Ht equations are tested by triangular functions. Finally, the MoM results in a linear system of unknowns

$$\mathbf{Z} \cdot \mathbf{X} = \mathbf{B} \tag{7}$$

where X is a vector containing the unknown expansion coefficients of the field components. Z is a dense matrix which describes the interactions between the expansion coefficients. If the total number of discretization elements is given by N, the matrix Z consists of N × N blocks Zmn(m, n = 1 … N). In the dielectric case, Zmn takes the following form:

equation image

where $[XY]_{mn}$ (X, Y = Ez, Hz, Et, Ht) represents the contribution generated by the basis function of field component Y at segment $s_n$ (and $s_{n+1}$ in the case of a triangular function) to the field component X tested at observer segment $s_m$ (the same remark holds). Note that there is no contribution from Et to Ez and from Ht to Hz.

Figure 2.

The contour C is approximated by linear segments sn over which triangular and pulse basis functions are defined.

[14] The elements [XY]mn represent a double integral over a basis and a test function. The integrand consists of the Green function or its derivatives. The tangential derivatives of the Green function can be eliminated through partial integration. For example, [EtHt]mn is given by

equation image

where we used the fact that the triangular functions are zero at the integration boundaries. This way, only the Green function G and its normal derivatives equation image, equation image and equation image occur in the integrands. All interaction integrals [XY]mn are calculated numerically using Gauss-Legendre quadrature.

[15] In order to calculate the interaction of a segment with either itself or with a neighboring segment to a sufficiently high precision, the singularity of the Green function or its derivatives needs to be extracted. Although this leads to tedious calculations, it is possible to handle all the so-called self and neighbor patches analytically [Fostier and Olyslager, 2010].

[16] Finally, in (7), B is a vector containing the tested incoming fields. An incoming plane wave $\mathbf{E}(\mathbf{r}) = \mathbf{E}_0 e^{-j\mathbf{k}\cdot\mathbf{r}}$, propagating along wave vector k = k(cos α ux + sin α uz), can be described with a single β component (β = k sin α). In what follows, we choose an electric field linearly polarized in the (x, z) plane, i.e., E0 = E0(equation imageux + equation imageuz). In that case, the tested field components in B are given by

equation image

with t the tangent to the segment, and Z = equation image.

[17] Note that more complex 3D sources like a point source or a dipole can also be used. In that case, the sources will need to be decomposed in several β components. Each β component requires a separate simulation. Using (1), the resulting field values can be assembled from these individual simulations. In this manuscript, however, we focus on the simulation of a single β component.

[18] The set of equations (7) is solved iteratively by using the TFQMR algorithm [Freund, 1993]. A simple block-Jacobi preconditioner with an adjustable block size is used to reduce the number of iterations. By storing these Jacobi blocks in LU decomposed form, no extra memory is required for the preconditioning.
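The preconditioning step can be illustrated with a minimal sketch. It assumes NumPy and SciPy are available and uses a small random matrix as a stand-in for the near-interaction part of Z; the function names `build_block_jacobi` and `apply_block_jacobi` are hypothetical and are not taken from the solver. The Krylov iteration itself (TFQMR) is not shown.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def build_block_jacobi(Z, block_size):
    """LU-factorize the diagonal blocks of Z once, before the iterations start."""
    n = Z.shape[0]
    factors = []
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        factors.append((start, stop, lu_factor(Z[start:stop, start:stop])))
    return factors

def apply_block_jacobi(factors, r):
    """Return M^{-1} r, where M is the block-diagonal part of Z."""
    out = np.empty(r.shape, dtype=complex)
    for start, stop, lu_piv in factors:
        out[start:stop] = lu_solve(lu_piv, r[start:stop])
    return out

# Toy example (assumed data): a diagonally dominant 8 x 8 matrix, blocks of size 4.
rng = np.random.default_rng(0)
Z = rng.standard_normal((8, 8)) + 8 * np.eye(8)
r = rng.standard_normal(8)
factors = build_block_jacobi(Z, block_size=4)
z = apply_block_jacobi(factors, r)
```

Each application costs only forward and backward substitutions per block, which is why a larger block size trades setup time and memory for fewer iterations.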

3. High-Frequency MLFMA

3.1. Mathematical Description

[19] In this section, the application of the MLFMA to the MoM is outlined. For a good introduction to the mathematics and data structures of the MLFMA, we refer to Chew et al. [2001].

[20] The geometry is divided into a grid of squares (called ‘boxes’) with a size of 0.15–0.5λ. The fast multipole method allows for a fast evaluation of the fields in a certain box due to sources located in another box, provided that these boxes are sufficiently separated from each other. Mathematically, this is expressed through the following factorization of the Green function G:

equation image

where Ps and Po are the centers of the source and the observation box, respectively, and where ρs and ρo are two arbitrary points in the source and observation box, respectively. γq denote wave vectors along equidistant directions equation imageq = equation image and Tq is the translation operator given by

equation image

with Φ the angle between P and the x axis.

[21] The first factor in (11), the aggregation, depends only on the relative position of the source ρs in the source box. The aggregation is a plane wave expansion along directions ϕq and is represented by 2Q + 1 sampling points, a so-called outgoing radiation pattern (ORP). Multiple sources in a box can be aggregated in a single ORP, and the same ORP can be reused to calculate the fields in several observation boxes.

[22] The second factor in (11), the translation, depends only on the relative position of the source and observation box. Intuitively, the outgoing plane waves represented by the ORP are transformed into incoming plane waves represented by an incoming radiation pattern (IRP) in the observation box.

[23] Finally, the third factor in (11), the disaggregation, depends only on the relative position of the observation point ρo in the observation box. In this step, the IRP is used to calculate the actual field value in ρo. Again, the same IRP can be used to evaluate the field in many observation points in the box.
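A factorization of this kind can be checked numerically. The sketch below uses a standard 2D plane-wave form that follows from Graf's addition theorem, applied to the Hankel function H0(2) to which the 2D Green function is proportional. The normalization and sign conventions of the original expressions are not reproduced here, so the formulas in the code, the function names and the geometry values are assumptions made for illustration. NumPy and SciPy are assumed to be available.

```python
import numpy as np
from scipy.special import hankel2

def translation_operator(gamma, P, Q):
    """Assumed form: T_q = sum_{n=-Q}^{Q} H_n^(2)(gamma |P|) exp(j n (Phi - phi_q - pi/2))."""
    phi = 2 * np.pi * np.arange(2 * Q + 1) / (2 * Q + 1)  # equidistant directions phi_q
    Phi = np.arctan2(P[1], P[0])                          # angle between P and the x axis
    n = np.arange(-Q, Q + 1)
    H = hankel2(n, gamma * np.hypot(P[0], P[1]))
    phase = np.exp(1j * n[None, :] * (Phi - phi[:, None] - np.pi / 2))
    return phi, (H[None, :] * phase).sum(axis=1)

def factorized_hankel(gamma, rho_s, rho_o, P_s, P_o, Q):
    """Aggregation, translation and disaggregation combined into H_0^(2)(gamma |rho_o - rho_s|)."""
    phi, T = translation_operator(gamma, P_o - P_s, Q)
    u = np.stack([np.cos(phi), np.sin(phi)], axis=1)          # unit vectors of the directions
    aggregation = np.exp(1j * gamma * (u @ (rho_s - P_s)))    # depends on the source box only
    disaggregation = np.exp(-1j * gamma * (u @ (rho_o - P_o)))  # depends on the observation box only
    return (disaggregation * T * aggregation).sum() / (2 * Q + 1)

# Assumed test geometry, lengths in wavelengths (gamma = 2 pi).
gamma = 2 * np.pi
P_s, P_o = np.array([0.0, 0.0]), np.array([3.0, 1.0])
rho_s, rho_o = np.array([0.2, -0.15]), np.array([3.1, 1.2])
approx = factorized_hankel(gamma, rho_s, rho_o, P_s, P_o, Q=15)
exact = hankel2(0, gamma * np.linalg.norm(rho_o - rho_s))
print(abs(approx - exact))
```

The aggregation and disaggregation factors are independent of the box separation, which is what allows one radiation pattern per box to serve many box pairs.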

[24] In the MLFMA this idea is extended hierarchically by assembling boxes into larger boxes and so on. In that way, all the interactions, normally requiring a computational complexity of O(N²), can be reduced to O(N log N) complexity.

3.2. Radiation Patterns

[25] At first glance it would appear that four radiation patterns—one for each field component—are necessary to evaluate all interactions between the expansion coefficients. However, upon closer inspection, only the radiation patterns for the Ez and Hz components are required. Indeed, equations (3) can be used to obtain the other two components Et and Ht. For each box, the ORPs are calculated from the expansion coefficients as follows:

equation image

where ezn, hzn, etn and htn are the expansion coefficients associated with segment sn, as estimated by TFQMR in the iterative solution process. The elements Tq,na, Tq,na and Pq,na(q = −Q,…,Q) are given by (the superscript ‘a’ denotes ‘aggregation’)

equation image
equation image
equation image

where n′ denotes the normal to the segment, which appears from taking the normal derivative equation image in (11).

[26] Similarly, the disaggregation of the incoming radiation patterns (IRP) at each box can be expressed as follows:

equation image

where equation imagezm, equation imagezm, equation imagetm and equation imagetm represent the tested coefficients of the corresponding field components at segment m. The elements Tq,md, Tq,md and Pq,md (the superscript d denotes the disaggregation) are defined by

equation image
equation image
equation image

where t is the tangent to the segment.

[27] As mentioned before, we have used (3) to derive the Et and Ht components from the Ez and Hz radiation pattern. This observation results in a reduction by a factor of two of the memory requirements for storing the outgoing and incoming far field patterns and eliminates unnecessary shifting, interpolation and translation operations.

3.3. Compression of the (Dis)aggregation Matrices

[28] If a certain box contains S segments of a dielectric interface and if the sampling rate for the associated radiation pattern is denoted by N (with N = 2Q + 1), then the full storage of both matrices would require 16SN elements. Expressing the lowest-level (dis)aggregation as a matrix-vector product has the advantage that the BLAS [Dongarra et al., 1988] can be employed for their fast evaluation.

[29] One can, however, easily observe that storing only the elements Pq,na, Tq,na and Tq,na for the aggregation and Pq,nd, Tq,nd and Tq,nd for the disaggregation is sufficient in the general case. The prefactors from the matrix elements can be introduced in the right hand side vector. In this way, the memory requirements are reduced to 6SN elements.

[30] If the medium is lossless (Im(γ) = 0), it is easily observed that Pq,na = (Pq,nd)* and Tq,na = (Tq,nd)*, where (.)* denotes the complex conjugate. Furthermore, if N is taken even (this is always possible), a factor of two is gained by observing that Pq,na = (Pq,na)*, etc. In that case, only the storage of the Pq,na, Tq,na, Tq,na and Tq,nd for half of the N values is required. This amounts to only 2SN elements in total. Note that the use of BLAS is still possible to some extent.

[31] If the medium is lossy, these relations do not hold. However, if N is even then Pq,na = Pq,nd and Tq,na = Tq,nd. In that case, 4SN elements need to be stored.
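The symmetries behind these savings can be illustrated on plain plane-wave samples. The sketch below assumes aggregation and disaggregation factors of the form exp(±jγ u_q·d) with N equidistant directions u_q; this is an assumed simplified form, without the prefactors of the actual matrix elements, and the function names are hypothetical. For even N, direction q + N/2 is opposite to direction q, which yields the relations checked in the test values below.

```python
import numpy as np

def aggregation_samples(gamma, d, N):
    """exp(+j gamma u_q . d) for N equidistant directions u_q (assumed form)."""
    phi = 2 * np.pi * np.arange(N) / N
    return np.exp(1j * gamma * (np.cos(phi) * d[0] + np.sin(phi) * d[1]))

def disaggregation_samples(gamma, d, N):
    """exp(-j gamma u_q . d) for the same directions (assumed form)."""
    phi = 2 * np.pi * np.arange(N) / N
    return np.exp(-1j * gamma * (np.cos(phi) * d[0] + np.sin(phi) * d[1]))

N = 32                                  # even number of samples
d = np.array([0.12, -0.07])             # assumed offset from the box center, in wavelengths

# Lossless medium: real gamma.
g = 2 * np.pi
agg, dis = aggregation_samples(g, d, N), disaggregation_samples(g, d, N)
lossless_conjugate = np.allclose(agg, np.conj(dis))
lossless_opposite = np.allclose(np.roll(agg, -N // 2), np.conj(agg))

# Lossy medium: complex gamma. Conjugation no longer helps, the opposite direction still does.
g_lossy = 2 * np.pi * (1 - 0.1j)
agg_l, dis_l = aggregation_samples(g_lossy, d, N), disaggregation_samples(g_lossy, d, N)
lossy_opposite = np.allclose(np.roll(agg_l, -N // 2), dis_l)
lossy_conjugate = np.allclose(agg_l, np.conj(dis_l))
print(lossless_conjugate, lossless_opposite, lossy_opposite, lossy_conjugate)
```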

[32] In all cases, the memory for the (dis)aggregation matrices can be further reduced by carefully selecting the number of sampling points that is required to store the radiation patterns. It is a known fact that the Nyquist sampling rate for storing the radiation pattern (denoted by N′) is smaller than the oversampled rate N that is required for accurate translations [Sarvas, 2003]. The ORPs for each box at the lowest level can be determined at a sampling rate of N′, after which they are interpolated to a higher sampling rate of N, using, e.g., fast Fourier transform (FFT) interpolation. Conversely, for the disaggregation, the IRPs are downsampled from N to N′ without loss of information, after which the tested expansion coefficients are obtained through the matrix-vector product. This results in even smaller aggregation and disaggregation matrices, at the cost of an extra interpolation.
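The interpolation step can be sketched with NumPy's FFT. The example assumes a radiation pattern that is a single plane-wave factor and an odd low rate N′, which avoids special treatment of the Nyquist bin; the function name `fft_interpolate` and the numerical values are hypothetical, and the solver itself uses FFTW rather than NumPy.

```python
import numpy as np

def fft_interpolate(samples, n_out):
    """Resample a periodic, band-limited pattern from len(samples) to n_out points."""
    n_in = samples.size
    if n_in % 2 == 0 or n_in < 3 or n_out < n_in:
        raise ValueError("expects an odd input length >= 3 and n_out >= n_in")
    spectrum = np.fft.fft(samples) / n_in        # Fourier coefficients c_m
    half = n_in // 2
    padded = np.zeros(n_out, dtype=complex)
    padded[:half + 1] = spectrum[:half + 1]      # harmonics m = 0 .. half
    padded[-half:] = spectrum[-half:]            # harmonics m = -half .. -1
    return np.fft.ifft(padded) * n_out

def pattern(phi, x=3.0, phi0=0.3):
    """Assumed test pattern: one plane-wave factor with electrical size x."""
    return np.exp(1j * x * np.cos(phi - phi0))

n_low, n_high = 31, 64                           # N' and N (assumed values)
phi_low = 2 * np.pi * np.arange(n_low) / n_low
phi_high = 2 * np.pi * np.arange(n_high) / n_high
interpolated = fft_interpolate(pattern(phi_low), n_high)
max_error = np.max(np.abs(interpolated - pattern(phi_high)))
print(max_error)
```

Because the pattern is effectively band-limited, the low-rate samples carry all the information and the zero-padded spectrum reproduces the high-rate samples to near machine precision.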

[33] Figure 3 illustrates the memory requirements and the run time of the lowest-level aggregations in the lossless case. We consider three cases: the full storage of the matrices at a sampling rate of N, the full storage of these matrices at a sampling rate of N′ and the compressed storage at a sampling rate of N′. In the first case, only a matrix-vector product is required for the (dis)aggregation. In the case of the reduced sampling rate N′, a smaller matrix-vector product and an additional FFT interpolation are required. The FFTW package [Frigo and Johnson, 2005] was used for this. Not only does this method require less memory, it is also faster. In the third case, this memory is further reduced and although this scheme requires somewhat more computational operations, it appears to be the fastest method. This contradiction can most likely be attributed to a better use of the processor's cache memory.

Figure 3.

Comparison of the memory requirements and the run time for the different approaches of the lowest-level (dis)aggregations in the lossless case.

[34] We conclude that by carefully sampling the lowest-level radiation patterns and by exploiting their symmetry, memory requirements can be significantly reduced without compromising speed of execution.

3.4. Low-Frequency MLFMA

[35] It is a known fact that the plane wave MLFMA described in section 3.1 breaks down at low frequencies because of accumulating numerical instabilities in the evaluation of (11). An elegant solution to this problem was theoretically developed by Bogaert et al. [2006] and first implemented in the parallel framework by Michiels et al. [2011] for the pure TM and TE case. The same method can also be employed for the coupled TE/TM problem at hand, and will prove very useful for the case where β ≈ k and hence γ ≈ 0.

3.5. Hierarchical/Asynchronous Parallelization

[36] Efficient implementations of the MLFMA can handle problems with up to one million unknowns on a typical workstation. In order to deal with even larger simulations, the MLFMA can be parallelized. In this way, the combined memory and CPU power of a cluster of computers can be used. Recently, complex schemes based on a so-called hierarchical load distribution were introduced [Ergül and Gürel, 2008]. In two dimensions, this approach was shown to lead to a scalable parallelization [Fostier and Olyslager, 2008a], which means that larger simulations can be handled by using a proportional increase in the number of workstations, without loss of efficiency.

[37] Parallelization efforts of the MLFMA are typically focused on a single, large 3D PEC object [Velamparambil et al., 2003; Ergül and Gürel, 2009]. In order to efficiently deal with simulations that contain different dielectric regions, an asynchronous parallelization scheme was introduced [Fostier and Olyslager, 2008b] at the cost of additional implementation complexity.

[38] The hierarchical and asynchronous parallelization of the MLFMA were implemented in a framework that is decoupled from the actual boundary integral formulation that is used. In this way, this framework can also be used for the coupled TE/TM problem.

4. Results and Discussion

4.1. Validation for Different Values of β

[39] In this section, the solver is validated for different values of β, both for β < k (propagating fields) and β > k (evanescent fields). We consider the following geometry (see Figure 4): a circular dielectric cylinder (medium 3; εr = 1) with a diameter of 5λ concentrically embedded in a larger dielectric cylinder (medium 2; εr = 2) with a diameter of 10λ. Here, λ = equation image denotes the free space wavelength. The dielectric background medium (medium 1) has εr = 4 as permittivity. The wavenumbers in the respective media are denoted by ki (i = 1, 2, 3).

Figure 4.

Two embedded dielectrics illuminated by a plane wave incident at an angle of α with the (x, y) plane.

[40] The cylinders are illuminated by a plane wave incident at an angle α (i.e., β = k1 sin α) with the xy plane. We consider four values for α. For α = 0°, the problem reduces to a pure 2D TM scattering problem. The case of oblique incidence (α = 20°) leads to a coupling of the 2D TE and TM problems. Because β < ki (i = 1, 2, 3), the fields propagate in the three media. When α is increased to 40°, β > k3, which means that the fields in the inner cylinder (medium 3) are evanescent. Finally, for α = 60°, the fields in both cylinders (medium 2 and 3) are evanescent. Figure 5 shows the electric field density ∣Ez∣ for the different angles α. Note that the low-frequency MLFMA stabilization mentioned in section 3.4 was used for an accurate evaluation of the interactions in the last two simulations.
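The classification into propagating and evanescent media can be reproduced with a few lines of Python. The sketch assumes the relation γi² = ki² − β² for the transverse wavenumber and normalizes the free space wavelength to 1; the function name `transverse_wavenumbers` is hypothetical.

```python
import math

def transverse_wavenumbers(alpha_deg, eps_r, k0=2 * math.pi):
    """Return beta and the list gamma_i^2 = k_i^2 - beta^2 for each medium.

    eps_r[0] is the background medium that carries the incident plane wave.
    """
    k = [k0 * math.sqrt(e) for e in eps_r]
    beta = k[0] * math.sin(math.radians(alpha_deg))
    gamma_sq = [ki ** 2 - beta ** 2 for ki in k]
    return beta, gamma_sq

eps_r = [4.0, 2.0, 1.0]  # media 1, 2 and 3 of the validation geometry
for alpha in (0, 20, 40, 60):
    beta, gamma_sq = transverse_wavenumbers(alpha, eps_r)
    kinds = ["propagating" if g > 0 else "evanescent" for g in gamma_sq]
    print(alpha, kinds)
```

A negative γi² corresponds to an imaginary γi, i.e., fields that decay away from the boundary in that medium.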

Figure 5.

Electric field density |Ez| (V/m) for (top left) α = 0°, (top right) α = 20°, (bottom left) α = 40°, and (bottom right) α = 60°.

[41] In all cases, the tolerance for the relative residual error was set to $10^{-6}$, and a solution was obtained in a few minutes using a single CPU. For this geometry, the analytical solution can be derived in closed form by separation of variables [Van Bladel, 2007]. In Table 1, the root-mean-square (RMS) error for the bistatic scattering width (SW) for both polarizations (VV and VH) is given. The RMS error is defined as follows:

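$$\mathrm{RMS} = \sqrt{\frac{1}{N_\theta} \sum_{n=1}^{N_\theta} \left[\sigma_a(\theta_n) - \sigma_c(\theta_n)\right]^2}$$

with $\sigma_a(\theta_n)$ and $\sigma_c(\theta_n)$ the analytical and calculated SW, respectively, expressed in dB and sampled at $N_\theta$ equidistant angles $\theta_n$. Very accurate results are obtained for all values of β.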
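As a minimal sketch of how such an error measure behaves, the function below computes the RMS of the dB difference between two sampled scattering widths. It assumes the usual definition (square root of the mean squared difference) and a conversion of 10 log10 from linear units to dB; the function name `rms_error_db` and the sample values are hypothetical.

```python
import numpy as np

def rms_error_db(sw_analytical, sw_calculated):
    """RMS of the dB difference between two scattering widths given in linear units."""
    a_db = 10 * np.log10(np.asarray(sw_analytical, dtype=float))
    c_db = 10 * np.log10(np.asarray(sw_calculated, dtype=float))
    return np.sqrt(np.mean((a_db - c_db) ** 2))

# Two sample angles (assumed values): a 60 dB maximum reproduced exactly and a
# -60 dB minimum that the calculation reports as -50 dB.
analytical = [1e6, 1e-6]
calculated = [1e6, 1e-5]
print(rms_error_db(analytical, calculated))
```

A single mismatched minimum dominates the result in this example, even though the absolute difference in linear units is negligible.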

Table 1. The RMS Error for the Bistatic SW for Different Values of β
| α | β (1/m) | γ1 (1/m) | γ2 (1/m) | γ3 (1/m) | RMSVV (dB) | RMSVH (dB) |
| --- | --- | --- | --- | --- | --- | --- |

^a NA means not applicable.


4.2. Validation for Extremely Large Problems

[42] In this section, the solver is validated for extremely large scattering problems. We consider four circular cylindrical geometries: (1) a single PEC cylinder, (2) a single dielectric (εr = 2) cylinder, (3) a PEC cylinder concentrically embedded in a dielectric (εr = 2) cylinder, and (4) a dielectric cylinder (εr = 4) concentrically embedded in a dielectric (εr = 2) cylinder. In all cases, the outer cylinder has a diameter of 700,000λ whereas the inner cylinder has a diameter of 350,000λ. The cylinders are illuminated by a plane wave incident at an angle of 20° with the xy plane. The background medium has permittivity εr = 1.

[43] The numerical solutions were obtained on a parallel system consisting of 1024 processor cores (32 machines each containing four eight-core AMD Opteron 6136 processors) with 2 TB of memory in total (hence 2 GB of memory per core). As an interconnection network, a double QDR Infiniband link was used. The TFQMR solver was stopped after 7000 iterations. A 2λ × 2λ preconditioner was used in all cases. Again, the bistatic SW of these geometries was determined in 18,652,634 equidistant angles equation imagen = equation image. In Table 2, the RMS error is listed for all cases, together with the number of unknowns, the total run time and the relative residual error after iteration 7000. Even though the RMS error is higher than for the smaller simulations in section 4.1, the results are still accurate. Indeed, depending on the simulation, an RMS error between 0.2039 dB and 1.4089 dB is obtained, while the actual SW ranges roughly between −60 dB and 60 dB. A rather large contribution to the RMS error is caused by sampling the SW minima. In these minima, the difference between the analytical and simulated values can be large when these values are expressed in dB. For most purposes, there is no practical difference between, e.g., −50 dB and −60 dB. Note that in the case of the single PEC cylinder, no cross polarization occurs. In Figure 6, the bistatic SW for both polarizations is illustrated in the case of the embedded dielectric cylinders. We have chosen to show only a small portion of the angular range in order to illustrate the very good correspondence between analytical and simulated results, which both oscillate rapidly as a function of the angle.

Figure 6.

Comparison between the analytical and calculated bistatic SW with (left) VV polarization and (right) VH polarization of a 350,000λ0 dielectric cylinder (εr = 4) embedded into a 700,000λ0 dielectric cylinder (εr = 2): (top) θ ∈ [0°, 0.003°] and (bottom) θ ∈ [90°, 90.003°].

Table 2. The RMS Error for the Bistatic SW of a Few Cylindrical Objects With a Diameter of 700,000λ0
| Object | Number of Unknowns | Run Time | Number of Iterations | Relative Residual Error | RMSVV (dB) | RMSVH (dB) |
| --- | --- | --- | --- | --- | --- | --- |
| PEC | 41,329,842 | 3 h 35 min | 7,000 | 1.641E-3 | 0.2039 | NA^a |
| Dielectric | 120,707,884 | 9 h 19 min | 7,000 | 6.736E-3 | 0.4549 | 0.7853 |
| PEC/Dielectric | 150,884,856 | 11 h 55 min | 7,000 | 9.996E-3 | 0.5276 | 1.4089 |
| Dielectric/Dielectric | 207,376,700 | 27 h 50 min | 7,000 | 1.315E-2 | 0.7717 | 1.0257 |

^a NA means not applicable.

[44] For the largest simulation, the embedded dielectrics, a total of approximately 1.8 TByte of memory was required. The near interactions and preconditioner account for approximately 25% of this memory, while the outgoing and incoming radiation patterns account for 42% of the memory usage. Due to the compressed representation of the (dis)aggregation matrices, only 5.7% of the total memory was needed to store their elements. The remaining memory was mainly used as work memory for the iterative solver and as communication buffers.

5. Conclusion

[45] We have developed a two-dimensional parallel MLFMA solver that can handle arbitrary plane waves in three-dimensional space. Simulations and analytic results for very large cylindrical objects with a diameter of 700,000λ and over two hundred million unknowns were found to be in excellent agreement.


[46] Parts of this work were supervised by the late Femke Olyslager. The work of J. Fostier was supported by a doctoral grant from the Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT-Vlaanderen). The work of B. Michiels was supported by a doctoral grant from the Special Research Fund (BOF) at Ghent University. The work of I. Bogaert was supported by a postdoctoral grant from the Fund for Scientific Research (Fonds Wetenschappelijk Onderzoek). The computational resources (Stevin Supercomputer Infrastructure) and services used in this work were provided by Ghent University, the Hercules Foundation and the Flemish Government—department EWI.