A fast algorithm for multiscale electromagnetic problems using interpolative decomposition and multilevel fast multipole algorithm



[1] The interpolative decomposition (ID) is combined with the multilevel fast multipole algorithm (MLFMA), denoted by ID-MLFMA, to handle multiscale problems. The ID-MLFMA first generates ID levels by recursively dividing the boxes at the finest MLFMA level into smaller boxes. It is specifically shown that near-field interactions with respect to the MLFMA, in the form of the matrix vector multiplication (MVM), are efficiently approximated at the ID levels. Meanwhile, computations on far-field interactions at the MLFMA levels remain unchanged. Only a small portion of the matrix entries is required to approximate the coupling among well-separated boxes at the ID levels, and these submatrices can be filled without computing the complete original coupling matrix. It follows that the matrix filling in the ID-MLFMA becomes much less expensive. The memory consumption is thus greatly reduced, and the MVM is accelerated as well. Several factors that may influence the accuracy, efficiency and reliability of the proposed ID-MLFMA are investigated by numerical experiments. Complex targets are calculated to demonstrate the capability of the ID-MLFMA algorithm.

1. Introduction

[2] Efficient and accurate solutions of electromagnetic (EM) scattering and radiation problems have attracted considerable interest for decades. Typical applications include radar cross section (RCS) estimation, antenna analysis and design, electromagnetic compatibility (EMC), electromagnetic interference (EMI), radiation hazards (EMR), remote sensing, etc. Among many full-wave numerical methods, the algorithms based on the method of moments (MoM) [Peterson et al., 1998] have been widely used due to their high fidelity and superior capability to handle arbitrarily shaped targets. A typical MoM solution procedure begins with properly meshing the target of interest and selecting basis functions to model the equivalent electric and magnetic currents. After modeling a target with a set of N expansion functions and performing the traditional Galerkin testing for the integral equation, an N × N dense impedance matrix is generated with a memory requirement of O(N^2). The resultant matrix system can be solved by direct or iterative solvers. In terms of CPU time, the computational complexity of the MoM is O(N^3) for a conventional direct solver, such as LU factorization, and O(N^2) for an iterative algorithm, such as CG or GMRES. The RWG [Rao et al., 1982] basis functions are the typical choice for discretizing the integral equations. To achieve accurate solutions, the average size of each element is generally on the order of 1/10 wavelength (λ). Consequently, the size of the associated MoM matrix grows very rapidly as the object size becomes larger with respect to λ; this challenges the MoM in a variety of applications. To make matters worse, there are so-called multiscale applications. In these cases, targets are over-meshed to conduct wide-band calculations, or partly over-meshed to capture tiny geometrical structures. The discretization size is then virtually independent of λ, and N can be very large even for electrically small targets.

[3] In the MoM, both the CPU time and memory space are a great burden for even a moderate N, even on modern computers. To mitigate this technical difficulty, MoM matrix equations are typically solved iteratively along with techniques to accelerate the matrix vector multiplication (MVM). These accelerations are performed either by adopting sets of directional basis and testing functions which radiate narrow beams (giving rise to quasi-sparse impedance matrices) or by approximating the MVM through the physical or mathematical properties of the MoM matrix. Examples of the former case include the impedance matrix localization (IML) [Canning, 1995], complex multipole beam approach (CMBA) [Boag and Mittra, 1994], and wavelet expansion [Steinberg and Leviatan, 1993]. The latter includes the fast multipole method (FMM) [Coifman et al., 1993], its multilevel version [Chew et al., 2001; Velamparambil and Chew, 2005; Pan and Sheng, 2006, 2008; Ergul and Gurel, 2009; Taboada et al., 2010], adaptive integral method (AIM) [Bleszynski et al., 1996], precorrected fast Fourier transform (pFFT) method [Phillips and White, 1997], multilevel matrix decomposition algorithm (MLMDA) [Michielssen and Boag, 1996; Rius et al., 2008], IES3 [Kapur and Long, 1998], QR-based or SVD-based methods [Tsang and Li, 2004; Breuer et al., 2003; Gope and Jandhyala, 2005; Seo and Lee, 2004; Burkholder and Lee, 2004], and adaptive cross-approximation (ACA) method [Kurz et al., 2002; Zhao et al., 2005; Shaeffer, 2008]. In some of these formulations the memory requirements and CPU time are reduced from O(N^2) to O(N^1.5) for single-level implementations and O(N log^α N) (1 ≤ α ≤ 2) for multilevel ones. The pFFT, the FMM and its multilevel version are based on analytic properties of the Green's function, while the MLMDA, IES3, ACA and the other QR-based methods are based on the rank deficiency of the coupling matrices between well-separated mesh partitions.
The approximating methods mentioned above differ in implementation and performance despite their similarity in essence. Among these methods, the MLFMA seems to be the most appealing one because of its fidelity, efficiency and generality. Although the MLFMA has well-documented success in solving large-scale MoM-based problems, its application to scenarios involving over-meshing still presents challenges. It is well known that the FMM and MLFMA suffer from sub-wavelength breakdown when targets are over-meshed [Chew et al., 2001]. This results in expensive operations associated with near-field coupling submatrices, which may consist of millions of entries; computing and storing them is often impractical.

[4] One solution to this difficulty is to combine the traditional MLFMA with its low-frequency versions developed through analytic approaches [Hu et al., 2001; Darve and Have, 2004; Cheng et al., 2006; Jiang and Chew, 2004; Daniela and Bunger, 2009; Vikram et al., 2009]. The efficiency issues associated with these approaches are discussed in Section 5.4. Another possibility is to adopt algebraic techniques, such as the ACA, QR or SVD-based methods, to approximate the near-field interactions in the MLFMA. However, the QR or SVD-based methods require all entries of the near-field matrix to be computed. This prevents them from efficiently treating multiscale problems because evaluating and storing the near-field matrix would be too expensive. Furthermore, the complexity of the QR or SVD-based algorithms is O(N^3), where N is the dimension of the objective matrix. Recently, Rodriguez et al. [2008] efficiently approximated the near-field interactions of the FMM according to the corresponding data sparse representation of the far-field interactions, obtained by applying the SVD to the aggregation matrix; but the error arising from this approximation is hard to analyze. On the other hand, the ACA avoids the large time/memory requirement of computing all the matrix entries, while its error control is still an ongoing area of research. In this paper, the interpolative decomposition (ID) [Liberty et al., 2007] is combined with the conventional MLFMA for multiscale problems. In particular, the ID is employed to efficiently approximate the near-field interactions (NFIs) with respect to the MLFMA (MLFMA-NFIs). Furthermore, a specific mechanism for the ID approximation is developed to avoid the expensive evaluation of all the MLFMA-NFI matrix elements.

[5] The rest of the paper is organized as follows. Section 2 begins with a brief outline of the conventional MLFMA and then discusses the main framework of the proposed ID-MLFMA. Section 3 gives the basic idea of the ID algorithm and its applications on matrix approximations. Some details about computations at the ID levels are discussed in Section 4, including the employment of the artificial sphere. Section 5 presents some illustrative numerical results, and finally, a summary and some conclusions are given in Section 6.

2. Outline of MLFMA and ID-MLFMA

2.1. MLFMA

[6] For perfectly electric conducting (PEC) objects, discretization and testing of surface integral equations yields an N × N dense matrix equation in the form of

$$\mathbf{Z}\,\mathbf{I} = \mathbf{V} \qquad (1)$$

where Z is the impedance matrix, N is the number of unknowns, I is the vector of equivalent current coefficients, and V corresponds to the incident wave. The matrix equation, equation (1), can be solved iteratively, and the required MVM can be accelerated by the FMM or MLFMA [Coifman et al., 1993; Chew et al., 2001]. The FMM/MLFMA decomposes the MVM into two parts: NFIs and far-field interactions (FFIs). The former is computed directly, while the latter is accelerated by the FMM/MLFMA. The matrix equation in the context of the FMM has the form

$$\mathbf{V}_o = \sum_{s \in B_o} \mathbf{Z}_{o,s}\,\mathbf{I}_s + \mathbf{D}_o \sum_{s \notin B_o} \mathbf{T}_{o,s}\,\mathbf{A}_s\,\mathbf{I}_s \qquad (2)$$

where $\mathbf{Z}_{o,s}$ is the impedance submatrix coupling the observation box o and the source box s; $\mathbf{I}_s$ contains the coefficients of the RWG basis functions in box s; $B_o$ denotes the set of near neighbors of box o; $\mathbf{T}_{o,s}$ is the translator; and $\mathbf{D}_o$ and $\mathbf{A}_s$ are the disaggregation and aggregation matrices, respectively. The first term in equation (2) accounts for the contribution from the self-coupling of box o and its near neighbors, while the second collects the contribution from the remaining boxes.

[7] To conduct the far-field interactions by the MLFMA, a hierarchical tree structure is constructed by recursively subdividing the spatial domain. The computational domain is first enclosed in a box; this box is divided into eight equal children, and each child is then recursively divided into smaller boxes. The recursive division does not stop until the size of the leaf box is less than a given size; in our case, we use 0.3λ. In the end, a tree structure is established, and all computations in the FMM/MLFMA are organized by the boxes of this tree. In the MLFMA, interpolation/anterpolation combined with center-shifting operations is required to transfer far-field patterns from a child box to its parent box and vice versa. A detailed explanation of the MLFMA can be found in the work by Chew et al. [2001].
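The recursive subdivision described above can be sketched in a few lines of Python. This is an illustrative toy (not the authors' code): point clouds stand in for basis-function centers, lengths are in wavelengths, and the 0.3λ stopping size from the text is the default leaf criterion.

```python
import numpy as np

def build_tree(points, center, size, leaf_size=0.3, level=0):
    """Recursively subdivide a cubic box into 8 children until the box
    edge length is less than `leaf_size` (here, wavelengths).  Empty
    children are pruned.  Returns a nested dict representing the tree."""
    node = {"center": center, "size": size, "level": level,
            "points": points, "children": []}
    if size < leaf_size or len(points) == 0:
        return node
    half = size / 2.0
    for dx in (-1, 1):
        for dy in (-1, 1):
            for dz in (-1, 1):
                # child center is offset by a quarter of the parent edge
                c = center + 0.25 * size * np.array([dx, dy, dz])
                # keep the points falling inside this octant
                mask = np.all(np.abs(points - c) <= half / 2.0 + 1e-12, axis=1)
                child = build_tree(points[mask], c, half, leaf_size, level + 1)
                if len(child["points"]):
                    node["children"].append(child)
    return node
```

In the ID-MLFMA the same recursion simply continues past the sub-0.3λ boxes, with the stopping rule switched to a per-box unknown count, which yields the ID levels of Figure 1.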


[8] In order to compute the FFIs, the MLFMA maps basis/testing functions to plane waves using far-field expansion functions. The interactions between the plane wave functions are efficiently handled through the aggregation, translation and disaggregation steps; furthermore, such interactions are independent of the supporting mesh. In contrast, the MLFMA-NFI matrix depends heavily on the properties of the mesh: as the mesh density grows, so does the computational cost of evaluating the near-field coupling. Consequently, the efficiency of the MLFMA degrades rapidly for finely meshed targets.

[9] To circumvent this problem, we propose a new approach that combines the interpolative decomposition (ID) [Liberty et al., 2007] with the conventional MLFMA, denoted by ID-MLFMA. The finest level boxes in terms of the MLFMA, which are determined by the size criterion, are decomposed further in the ID-MLFMA; the decomposition is discontinued when a box contains fewer than a predetermined number of basis functions (e.g. 50). Figure 1 shows such a tree structure for the ID-MLFMA in a 2D case. The levels in the tree are classified into three categories: the MLFMA levels, the transition level and the ID levels. The transition level is identical to the finest MLFMA level. It is worth pointing out that the box division can be carried out independently; namely, further division is conducted only on boxes with more than 50 unknowns. The computations on the MLFMA-FFIs are conducted by aggregation, translation and disaggregation, while those on the MLFMA-NFIs are conducted through the skeleton approximation at the ID levels. The MLFMA-NFI matrix consists of block matrices that represent the coupling among the finest MLFMA boxes. These block matrices generally permit no approximation because of their high rank; submatrices with deficient rank must be extracted from them in advance. To this end, the ID-MLFMA further classifies the MLFMA-NFIs into ID-NFIs and ID-FFIs at the ID levels; the distinction between the two is based on the so-called "one-buffer-box" criterion [Chew et al., 2001]. Based on the ID [Liberty et al., 2007], an efficient data sparse representation is found to approximate the low rank ID-FFI submatrices. Because the high rank submatrices corresponding to the ID-NFIs are excluded from the approximation, the resultant data sparse representation of the ID-FFI submatrices is highly compressed. The proposed ID-MLFMA has the following virtues.

1. The error of the ID-MLFMA is controllable since both the MLFMA and the ID are error controllable.

2. The data sparse representation of ID-FFI submatrices is obtained without computing and storing all the original coupling matrix elements.

3. The integration of the ID into the MLFMA is straightforward.

Figure 1.

Tree structure of the ID-MLFMA (2D case).

[10] The implementation details of the ID-MLFMA will be given in Section 4 after the discussion of the ID and its applications on skeletonization in Section 3.

3. ID and Matrix Approximation

[11] Skeletons and data sparse representation are efficiently computed for the ID-FFIs by the ID. In the following, the mathematical background of the ID and the procedures of constructing skeletons are elucidated.

3.1. ID

[12] We first give a brief introduction to the ID proposed by Liberty et al. [2007]. Suppose C is a complex m × n matrix of rank k, with k ≤ m and k ≤ n. There exist a complex k × n matrix P and a complex m × k matrix B, whose columns consist of a subset of the columns of C, such that

1. some subset of the columns of P makes up the k × k identity matrix;

2. no element of P has an absolute value greater than 1;

3. $\|\mathbf{P}\| \le \sqrt{k(n-k)+1}$;

4. the least (that is, the k-th greatest) singular value of P is at least 1; and

5. when k < m and k < n, $\|\mathbf{B}\,\mathbf{P} - \mathbf{C}\| \le \sqrt{k(n-k)+1}\;\sigma_{k+1}$, where $\sigma_{k+1}$ is the (k + 1)-st greatest singular value of C.

[13] Based on these statements, an approximation can be derived as

$$\mathbf{C}_{m \times n} \approx \mathbf{B}_{m \times k}\,\mathbf{P}_{k \times n} \qquad (3)$$

when the exact rank of Cm×n is greater than k, but the (k + 1)-st greatest singular value of Cm×n is small.

[14] The ID employs randomness to reach the decomposition described in equation (3). It begins with generating a random vector ω with Gaussian distribution and forming the product y = ω^H C, where the superscript H denotes the adjoint (conjugate transpose). The vector y can be regarded as a random sample from the row space of C. Repeating this sampling process l (l > k) times gives

$$\mathbf{y}^{(i)} = \left(\boldsymbol{\omega}^{(i)}\right)^{H} \mathbf{C}, \qquad i = 1, 2, \cdots, l \qquad (4)$$

Owing to the randomness, the set of random vectors $\{\boldsymbol{\omega}^{(i)} : i = 1, 2, \cdots, l\}$ is almost surely linearly independent, and no linear combination of the $\boldsymbol{\omega}^{(i)}$ falls in the null space of $\mathbf{C}^H$. Therefore, to capture the row space of C, we just need to orthonormalize the sample vectors, rewriting equation (4) in the compact form

$$\mathbf{Y}_{l \times n} = \boldsymbol{\Omega}_{l \times m}\,\mathbf{C}_{m \times n} \qquad (5)$$

Employing a stable method for performing the orthonormalization, such as the pivoted QR factorization, a k × n interpolation matrix P can be obtained such that

$$\mathbf{Y}_{l \times n} \approx \mathbf{L}_{l \times k}\,\mathbf{P}_{k \times n} \qquad (6)$$

where the columns of $\mathbf{L}_{l \times k}$ constitute a subset of the columns of Y. That is to say, there exists a set of integers $i_1, i_2, \cdots, i_k$ such that, for any $j = 1, 2, \cdots, k$, the j-th column of L is the $i_j$-th column of Y. Collecting the corresponding columns of C into a complex m × k matrix B, we have that, for any $j = 1, 2, \cdots, k$, the j-th column of B is the $i_j$-th column of C.

[15] The ID algorithm typically requires [Liberty et al., 2007]

$$l\,C_{\mathbf{C}^{H}} + O(l^{2} n) \qquad (7)$$

floating-point operations, where $C_{\mathbf{C}^{H}}$ is the cost of applying $\mathbf{C}^{H}$ to a vector.

[16] As shown by Liberty et al. [2007], l = k + 5 or l = k + 10 is sufficient. In practice, the rank k is rarely known in advance. The ID is therefore usually implemented in an adaptive fashion, where the number of samples is increased until the error satisfies the desired threshold εID, as discussed in Section 5.2; this at most doubles the cost [Liberty et al., 2007]. Due to the randomness used, the ID may fail, but the probability of failure is very slim [Liberty et al., 2007]. In short, compared with the classical pivoted QR factorization, the cost is greatly reduced since only the small matrix Y needs to be factorized.

[17] In some cases, it is more efficient to construct matrix Ωl×m in such a manner that the resultant matrix consists of uniformly randomly selected rows of the product of the discrete Fourier transform matrix and a random diagonal matrix [Liberty et al., 2007].

3.2. Approximating Matrix by ID

[18] Suppose boxes o and s are a pair of well separated boxes, as shown in Figure 2. There are $N_s$ basis functions in the source box s, with coefficients $\mathbf{I}_s$, and $N_o$ testing functions in the observation box o. Basis functions are denoted by lines with arrows, while testing functions are denoted by lines with double arrows. Skeletons to be determined are indicated by lines with solid arrows. Suppose $\mathbf{Z}_{o,s}^{N_o \times N_s}$ is the rank deficient coupling matrix for these two boxes. Applying the ID of equation (3) to the coupling matrix yields

$$\mathbf{Z}_{o,s}^{N_o \times N_s} \approx \tilde{\mathbf{Z}}_{o,s}^{N_o \times k_s}\,\mathbf{R}_{s}^{k_s \times N_s} \qquad (8)$$

where $k_s$ ($k_s \le N_s$) is the number of skeletons for the source group s, and $\tilde{\mathbf{Z}}_{o,s}^{N_o \times k_s}$ is the compressed representation of $\mathbf{Z}_{o,s}$, consisting of $k_s$ columns of the original matrix. Furthermore, employing the ID to conduct the row approximation on $\tilde{\mathbf{Z}}_{o,s}$ results in

$$\tilde{\mathbf{Z}}_{o,s}^{N_o \times k_s} \approx \mathbf{L}_{o}^{N_o \times k_o}\,\mathbf{S}_{o,s}^{k_o \times k_s} \qquad (9)$$

where $k_o$ ($k_o \le N_o$) is the number of skeletons for the observation group o, and $\mathbf{S}_{o,s}^{k_o \times k_s}$ is the sampling matrix consisting of $k_o$ rows of $\tilde{\mathbf{Z}}_{o,s}$.

Figure 2.

Construction of skeletons, (a) before and (b) after skeletonization.

[19] The matrix vector multiplication $\mathbf{V}_o = \mathbf{Z}_{o,s}\mathbf{I}_s$ evaluates the fields $\mathbf{V}_o$ at the $N_o$ observation points generated by the source coefficients $\mathbf{I}_s$. According to equations (8) and (9), the MVM can be written as

$$\mathbf{V}_o = \mathbf{Z}_{o,s}\,\mathbf{I}_s \approx \mathbf{L}_o\,\mathbf{S}_{o,s}\,\mathbf{R}_s\,\mathbf{I}_s \qquad (10)$$

The matrices $\mathbf{R}_s$ and $\mathbf{L}_o$ are projection matrices. The former selects the $k_s$ dominant radiating elements, which sufficiently approximate the outgoing fields radiated from box s. The latter projects the dominant field components onto each testing function located in box o. The sampling matrix $\mathbf{S}_{o,s}$ acts as a translation operator; it converts the outgoing skeletonized representation into an incoming one, and is essentially a compressed version of the original coupling matrix $\mathbf{Z}_{o,s}$. If $k_o \ll N_o$ and $k_s \ll N_s$, the compression can be significant.

4. Implementation of ID-MLFMA

[20] In the ID-MLFMA, interactions at levels above the transition level are carried out by aggregation, translation and disaggregation, which are well documented by Coifman et al. [1993] and Chew et al. [2001]. Computations below the transition level (i.e. the ID levels) are conducted by the skeleton approximation. The implementation of this approximation will be discussed in detail in this section.

4.1. Extracting and Approximating Low Rank Submatrices

[21] The MLFMA-NFI matrix consists of block matrices corresponding to the coupling among boxes at the transition level of the ID-MLFMA. These block matrices are generally not rank deficient and thus not subject to low rank approximation. However, rank deficiency can be exploited at the ID levels. Suppose the $l_{trans}$-th level is the transition level, and box b1 is a box at this level with three near neighbors, b2, b3 and b4, as shown in Figure 3a for a 2D case. Each matrix associated with the interactions among these 4 boxes (including the self-coupling matrix) has almost full rank, so the skeleton approximation cannot be efficiently applied to them. After dividing each of the 4 boxes into 4 sub-boxes, b12, b13 and b14 are near neighbors of box b11, while all children of b2, b3 and b4, the gray boxes in Figure 3b, become second near neighbors of b11. Their coupling belongs to the ID-FFIs, and the corresponding submatrices are of low rank. In this manner, all the low rank matrices can be extracted at the $(l_{trans} + 1)$-th level.

Figure 3.

The division of boxes, showing (a) ltrans-th and (b) (ltrans + 1)-th levels.

[22] Applying the ID to the low rank matrices, the first term in equation (2) can be written as

$$\mathbf{V}_q = \sum_{p \in B_q} \mathbf{Z}_{q,p}\,\mathbf{I}_p + \mathbf{L}_q \sum_{p \notin B_q} \mathbf{S}_{q,p}\,\mathbf{R}_p\,\mathbf{I}_p \qquad (11)$$

where q and p, residing at the $(l_{trans}+1)$-th level, are children of boxes o and s, respectively. The notations in equation (11) are defined similarly to those in equation (10). Suppose $C_q = N_q / k_q$ and $C_p = N_p / k_p$; then the storage of the low rank matrix $\mathbf{Z}_{q,p}$ can be reduced by a factor of $C_q \cdot C_p$.

[23] The first term in equation (11) can be recursively approximated by skeletons at the $(l_{trans} + 2)$-th level, the $(l_{trans} + 3)$-th level, and so on. It is worth pointing out that the skeletonization can also be used to efficiently approximate the aggregation and disaggregation matrices.

4.2. Constructing Projection Matrices Efficiently

[24] A simple way to obtain the skeletons of box q is to concatenate all the submatrices $\mathbf{Z}_{q,p}$ and $\mathbf{Z}_{p,q}^H$, where $p \notin B_q$, into a matrix as

$$\mathbf{Z}_q = \left( \mathbf{Z}_{q,p_1}, \cdots, \mathbf{Z}_{q,p_M}, \mathbf{Z}_{p_1,q}^{H}, \cdots, \mathbf{Z}_{p_M,q}^{H} \right) \qquad (12)$$

which results in an $N_q \times N_{tot}$ matrix with $N_{tot} = 2 \sum_{p \notin B_q} N_p$. The ID can be utilized on the matrix $\mathbf{Z}_q$ to conduct the row approximation as

$$\mathbf{Z}_q \approx \mathbf{L}_q\,\tilde{\mathbf{Z}}_q \qquad (13)$$

where $\tilde{\mathbf{Z}}_q$ consists of $k_q$ rows of $\mathbf{Z}_q$, and $\mathbf{L}_q^{N_q \times k_q}$ is the incoming projection matrix of box q. Thus, for any submatrix $\mathbf{Z}_{q,p}$ ($\forall p \notin B_q$), we have

$$\mathbf{Z}_{q,p} \approx \mathbf{L}_q\,\tilde{\mathbf{Z}}_{q,p} \qquad (14)$$

where $\tilde{\mathbf{Z}}_{q,p}$ consists of $k_q$ rows of $\mathbf{Z}_{q,p}$. According to Martinsson and Rokhlin [2005, 2007] and Greengard et al. [2009], the outgoing projection matrix of any box can be taken as the adjoint of its incoming projection matrix. As a result, one arrives at

$$\mathbf{R}_q = \left( \mathbf{L}_q \right)^{H} \qquad (15)$$

Although the above procedure seems simple, constructing the projection matrices R/L of $\mathbf{Z}_q$ may be time consuming. The reason is that the computation in equation (13) requires all the elements of $\mathbf{Z}_{q,p}$ ($\{p : p \notin B_q\}$) to be available. Since these matrices are dense, computing and storing them is very time consuming; additionally, although $N_q$ may be of moderate size, $N_{tot}$ can be considerably large, leading to a large dimension of the matrix $\mathbf{Z}_q$.

[25] Conceptually, finding the skeletons of a box is a procedure of selecting the basis/testing functions in this box according to their radiating or receiving ability. It is realized by calculating and sorting the singular values of the corresponding coupling matrix. Due to the fast decay of the Green's function with respect to the distance between the source and observation points, Martinsson and Rokhlin [2005, 2007] and Greengard et al. [2009] proposed to accelerate this step by introducing a "supercell" around box q. In the supercell approach, $\mathbf{Z}_q$ includes all $\mathbf{Z}_{q,p}$ ($p \ne q$). By contrast, in the ID-MLFMA, $\mathbf{Z}_q$ consists of $\mathbf{Z}_{q,p}$ ($p \notin B_q$) and excludes the ID-NFI block matrices. Consequently, we can substitute an artificial sphere for all the boxes p ($\{p : p \notin B_q\}$), as shown in Figure 4. The radius of the artificial sphere is $r_{sph} = 2.5\,r_{box}$, where $r_{box}$ is the size of the box of interest. Because each second near neighbor (box p, $\{p : p \notin B_q\}$) resides outside the artificial sphere, the radiating or receiving ability of the elements in box q can be well measured on it. On this basis, we take into account only the coupling between box q and the corresponding artificial sphere to construct the skeletons of box q. In particular, we compute the MoM submatrices $\mathbf{Z}_{q,a}$ and $\mathbf{Z}_{a,q}$ for the mutual coupling between box q and the artificial sphere a according to the standard MoM discretization procedure. The matrix used to construct the skeletons of box q is then written as $\mathbf{Z}_q = (\mathbf{Z}_{q,a}, (\mathbf{Z}_{a,q})^H)$. After applying the ID to $\mathbf{Z}_q$ as done in equation (13), we obtain $\mathbf{L}_q$ and $\mathbf{R}_q$ by

$$\mathbf{Z}_q^{N_q \times 2N_a} \approx \mathbf{L}_q\,\tilde{\mathbf{Z}}_q, \qquad \mathbf{R}_q = \left( \mathbf{L}_q \right)^{H} \qquad (16)$$

where $N_a$ is the number of unknowns required to discretize the artificial sphere a. $N_a$ is determined by a user-specified accuracy $\varepsilon_{ID}$ for the ID approximation, as shown in Sections 5.1 and 5.2. Since the number of unknowns on the artificial sphere is always much smaller than $\sum_{p \notin B_q} N_p$, a lot of CPU time is saved in applying the ID to $\mathbf{Z}_q$.

Figure 4.

The artificial sphere for skeleton construction (rsph = 2.5rbox).

[26] The employment of the artificial sphere provides a mechanism to efficiently construct the projection matrices without evaluating the MLFMA-NFI matrix, which is usually expensive in multiscale problems. After skeletonization, the matrix $\mathbf{S}_{q,p}$ is evaluated directly according to the computed skeletons. Thus, only $k_q \times k_p$ matrix elements need to be computed and stored, instead of the $N_q \times N_p$ elements of the original $\mathbf{Z}_{q,p}$. This may be one of the most attractive virtues of the ID-MLFMA. Furthermore, integrating the ID into an existing MLFMA is straightforward because all skeletonization operations are carried out on the MLFMA-NFI matrix. Also, the skeletonization procedure exhibits inherent parallelism.

5. Numerical Results

[27] All the computations are carried out on a Dell OptiPlex 980 personal computer configured with an Intel Core i7-870 CPU and 16 GB of memory. RWG functions are chosen as basis and testing functions to discretize the CFIE with a combination coefficient of 0.2. The GMRES iteration process is terminated when the L2-norm of the residual vector is reduced to 10^-3. To simplify the implementation of the ID-MLFMA, skeletons at the child level are not used to construct skeletons at the parent level. (See Notation for definitions of the notation used in this section.)

[28] In the following, the compression ratios for memory and CPU time are computed by

$$C_{M} = \frac{M_{ID}}{M_{MLFMA\text{-}NFI}} \qquad (17)$$
$$C_{T} = \frac{T_{ID}}{T_{MLFMA\text{-}NFI}} \qquad (18)$$

Errors in the electric current, $\delta_J$, and in the radar cross section (RCS), $\delta_{RCS}$, are computed via

$$\delta = \frac{\left\| f_{ID\text{-}MLFMA} - f_{ref} \right\|}{\left\| f_{ref} \right\|} \times 100\% \qquad (19)$$

where $f_{ID\text{-}MLFMA}$ is the result obtained by the ID-MLFMA, $f_{ref}$ is the reference data computed by an analytical approach or by the MLFMA, and $\|\cdot\|$ is the Euclidean norm.

5.1. Mesh Size of the Artificial Sphere

[29] In this subsection, experiments are carried out to study how the mesh size of the artificial sphere impacts the efficiency and accuracy of the skeletonization, by calculating the scattering from a PEC sphere, denoted by Sph-12, at a frequency of 60 MHz. Sph-12 has a diameter of 12 meters and is modeled by 164,268 unknowns with an average mesh size of 0.1 m. In the experiment, εID is 0.001. The average mesh size of the artificial sphere is set to be 0.15λ, 0.10λ and 0.06λ. In the ID-MLFMA a total of 6 levels are required (the 0th and 1st levels involve no computations). The 3rd level is the transition level between the MLFMA and the ID, and the last two levels (the 4th and 5th) are the ID levels. Skeletonization is performed independently on boxes at the ID levels. No skeleton is constructed for boxes with fewer than 50 basis functions.

[30] Table 1 shows that the accuracy of the ID-MLFMA is quite insensitive to the mesh size of the artificial sphere. The RCS errors (taking the Mie series as the reference) for these three cases are almost identical. To understand this phenomenon, it is beneficial to recall the essence of the skeleton construction. The selection of basis and testing functions, which is carried out by the projection matrices R and L, is based on the elements that contribute significantly to the radiating and receiving abilities of a given box. Such projection matrices are obtained by calculating and sorting the singular values of the original coupling matrix through some rank revealing technique (e.g. the ID). In this procedure, the ID cares more about the relative magnitudes of the singular values than about their exact values. That is to say, the accuracy of the singular values does not matter much as long as they can be sorted correctly.

Table 1. Impacts of the Mesh Size of the Artificial Sphere on the ID-MLFMA

                 Average Mesh Size of the Artificial Sphere (λ)
                   0.15      0.10      0.06
Memory (MB)
Time (s)
δJ (%)             0.35      0.32      0.31
δRCS (%)           0.30      0.30      0.30

[31] To support this analysis, the normalized singular values of the coupling matrix for a selected box (a box containing 305 unknowns at the 4th level) are calculated under different meshings of the artificial sphere. As shown in Figure 5, the singular values are almost identical in the three cases because near neighbor coupling is excluded from $\mathbf{Z}_q$ and $r_{sph}$ is large in comparison with the box size. Since the ID employs randomness in finding skeletons, the skeletons obtained in the three cases are not exactly the same; however, 95% of the skeletons are identical. In other words, the skeletons are insensitive to the mesh size of the artificial sphere. It should be noted that the quadrature rule order does affect the accuracy of the matrix elements and thus the accuracy of the very small singular values (i.e., the ones with normalized magnitudes less than 10^-5) [Rius et al., 2008]. Since a threshold of 0.001 always yields a sufficiently accurate ID approximation, as will be shown in Section 5.2, the basis/testing functions associated with small singular values contribute little to the accuracy of the ID. Additionally, high order quadrature rules are unnecessary to some extent because Sph-12 is already overly meshed. In short, the quadrature rule order does not play an important role in constructing skeletons.

Figure 5.

The normalized singular values of a selected box.

5.2. Threshold for Skeleton Constructing

[32] A threshold εID, closely related to the singular values of the objective matrix, must be prescribed in advance for the ID to construct skeletons. The following experiments show the impact of εID on the efficiency and accuracy of the ID-MLFMA. In these experiments, ID-MLFMA computations are conducted on Sph-12 at 60 MHz with εID set to 0.01, 0.001 and 0.0001, respectively. The average mesh size of the artificial sphere is set to 0.15λ according to the results in Section 5.1. Table 2 shows that the accuracy of the ID-MLFMA is acceptable for Sph-12 even when εID = 0.01, and that the accuracy can be further improved by decreasing the threshold. In summary, the error of the ID-MLFMA can be well controlled since both the ID and the MLFMA are error controllable.

Table 2. Impacts of εID on the ID-MLFMA

                 εID = 0.01    εID = 0.001    εID = 0.0001
Memory (MB)
Time (s)
δJ (%)              1.43           0.32            0.05
δRCS (%)            1.22           0.30            0.05

[33] In the following computations, the mesh size of the artificial sphere is set to be 0.15λ, and a threshold of 0.001 is used in the ID-MLFMA computations according to the investigations in Section 5.1 and Section 5.2.

5.3. Comparison Between ID and Pivoted QR Factorization

[34] The outgoing and incoming projection matrices can also be computed by the pivoted QR factorization [Martinsson and Rokhlin, 2005, 2007; Greengard et al., 2009]. However, it is revealed by Liberty et al. [2007] that the ID always exhibits a higher efficiency than the pivoted QR, especially when the objective matrix is considerably rank deficient. For a 4096 × 4096 matrix with an effective rank of 248, the ID is 11 times faster than the pivoted QR, as shown by Liberty et al. [2007]. In the following experiments, we compare the efficiency of the ID and the pivoted QR by computing the skeletons of Sph-12 at 60 MHz. Regarding the size of the matrix $\mathbf{Z}_q$, $N_q$ is no more than 300 and $N_{tot}$ is 10800. The statistics shown in Table 3 for computations at the 4th level, where 1160 non-empty boxes reside, reinforce the statement by Liberty et al. [2007].

Table 3. CPU Time (s) Used by the ID and the Pivoted QR

                 Average Mesh Size of the Artificial Sphere (λ)
                   0.15      0.10      0.06
Pivoted QR          329       850      1607

5.4. Performance of ID-MLFMA

[35] To investigate the performance of the ID-MLFMA, we calculate the scattering from Sph-12 at three different frequencies: 90 MHz, 60 MHz and 30 MHz. RCS results are also calculated by the traditional MLFMA for comparison. Tables 4 and 5 list the resources required by the traditional MLFMA and by the ID-MLFMA. The ID-MLFMA shows superior performance compared with the traditional MLFMA. For example, the ID-MLFMA needs less than 2.7 GB of memory, while it is estimated that the traditional MLFMA would require 47.9 GB of memory for the NFI matrix in the 30 MHz case. The 90 MHz and 60 MHz computations likewise show that the ID-MLFMA is capable of saving memory and accelerating the calculations.

Table 4. Computational Statistics of the MLFMA on the Sph-12
                              Frequency (MHz)
                              90        60        30
Average size of mesh (λ)
Finest MLFMA level            3rd       3rd       2nd
Finest MLFMA box size (λ)     0.45      0.30      0.3
MMLFMA-NFI (MB)               10795     10795     47905
TMLFMA-NFI (s)                3755      5247      a
TMVM (s)                      14.3      14.3      a
Ttot (s)                      4163      5663      a

a Computations cannot be completed because of limited memory.
Table 5. Computational Statistics of the ID-MLFMA on the Sph-12
                              Frequency (MHz)
                              90        60        30
Finest ID level               4th       5th       5th
Memory (MB)
Time (s)
δJ (%)                        0.11      0.32      0.44
δRCS (%)                      0.11      0.30      0.41

[36] Wide-band computations are conducted on the NASA almond to further investigate the performance of the ID-MLFMA. The frequency is swept from 1.4 GHz to 14.0 GHz in steps of 200 MHz. The almond is modeled with 37,122 unknowns, giving an average mesh size of about 0.1λ at 14.0 GHz and 0.015λ at 1.4 GHz. A 6-level oct-tree is constructed for all the ID-MLFMA computations. Since the computations are costly, only 8 sampling points, 1.4 GHz, 2.0 GHz, 4.0 GHz, 6.0 GHz, 8.0 GHz, 10.0 GHz, 12.0 GHz and 14.0 GHz, are computed by the MLFMA. In the MLFMA computations, the division of boxes stops when the leaf box size reaches about 0.29λ; as a result, a 6-level MLFMA is used in the 14.0 GHz case and a 3-level one in the 1.4 GHz case. Figure 6 presents the monostatic RCS at the direction (90°, 30°) under different frequencies; the results computed by the ID-MLFMA agree very well with those by the conventional MLFMA. Figure 7 plots MID against MMLFMA-NFI under different frequencies; in all ID-MLFMA computations, MID-NFI = 41 MB. Figure 7 indicates that the memory consumption is significantly reduced when the frequency is low. For example, the MLFMA needs 7326 MB of memory to store the corresponding NFI matrix in the 1.4 GHz case, but this is reduced to 491 MB, a factor of 15.0, when the ID is employed. Figure 8 lists the statistics on TMLFMA-NFI and TID, where TID = TID-NFI + Tproj + Tsamp, and Figure 9 gives the statistics on TMVM. As shown in Figures 8 and 9, the ID-MLFMA saves total solution time in two ways. First, CPU time is reduced because only a small portion of the MLFMA-NFI matrix elements need to be evaluated; in the 1.4 GHz case, the ID-MLFMA cuts the matrix-filling time from 5840 s to 2593 s, a factor of over 2.0 compared with the MLFMA. Second, the MVM itself is greatly accelerated by the approximation: as shown in Figure 9, the time for one MVM (excluding the time for FFI) is reduced from 9.8 s to 0.9 s in the 1.4 GHz case, a factor of over 11.0.

Figure 6.

Monostatic RCS by the NASA almond under different frequencies.

Figure 7.

Memory for the MoM matrix under different frequencies for the NASA almond.

Figure 8.

Time filling the MoM matrix under different frequencies for the NASA almond.

Figure 9.

Time for one MVM under different frequencies for the NASA almond.

[37] The RCS of a ship model is calculated to demonstrate the capability of the ID-MLFMA. The incident plane wave illuminates the ship from the direction (90°, 90°) at a frequency of 200 MHz. The ship, 100 m in its largest dimension, is simulated using 94,008 unknowns. Because the model contains many fine geometrical details, the resulting mesh is, as expected, drastically non-uniform, as shown in Figure 10: the longest edge is about 0.1λ, while the shortest is less than λ/100. This is a typical multiscale application. The mesh is generated such that only the tiny structures are overly meshed, to capture their geometrical shapes while minimizing the total number of unknowns. We employ this ship model to exhibit the capability of the ID-MLFMA in solving multiscale problems; whether this mesh is the most efficient one for a given accuracy is beyond the scope of the current study.

Figure 10.

The ship model.

[38] A 5-level tree is used for the pure MLFMA, while a 7-level one is constructed for the ID-MLFMA; that is, the 4th level is the finest MLFMA level and serves as the transition level in the ID-MLFMA computation. The statistics on the computational resources are listed in Table 6, and the computed RCS is presented in Figure 11. As shown in Table 6, the relative RCS error against the MLFMA results is about 0.6%. The memory for the MoM matrix is compressed by a factor of over 6.0, the CPU time to fill this matrix is cut by a factor of 2.4, and the MVM is accelerated by a factor of 6.5. As a result, the total solution time is reduced by a factor of over 3.0. The acceleration of the total solution time can approach 6.5 for monostatic RCS calculations, where the cost of the iterations dominates the whole computation.
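The source of this MVM speedup, namely replacing a dense coupling block by its skeleton rows plus an interpolation operator, can be illustrated with a small SciPy sketch. The matrix below is synthetic, and the sizes, rank, and tolerance are arbitrary choices; the paper's actual skeletons are built level by level from the artificial-sphere construction, not by factoring a fully assembled block as done here.

```python
import numpy as np
import scipy.linalg.interpolative as sli

# Low-rank stand-in for the coupling between two well-separated boxes.
rng = np.random.default_rng(1)
n, m, r = 800, 600, 25
Z = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))

# Row skeletonization: a column ID of Z.T selects skeleton rows of Z.
k, idx, proj = sli.interp_decomp(Z.T.copy(), 1e-3)
skel = idx[:k]                         # indices of the skeleton rows

# Interpolation matrix P (k x n) with Z.T ≈ Z.T[:, skel] @ P,
# i.e. Z ≈ P.T @ Z[skel, :].
P = sli.reconstruct_interp_matrix(idx, proj)

x = rng.standard_normal(m)
y_full = Z @ x                         # dense MVM: touches all n*m entries
y_skel = P.T @ (Z[skel, :] @ x)        # skeletonized MVM: only k rows of Z

err = np.linalg.norm(y_full - y_skel) / np.linalg.norm(y_full)
print(f"skeletons: {k} of {n} rows, rel. MVM error {err:.1e}")
```

Only k of the n rows of the coupling block are ever needed, which is why both the matrix filling and the per-iteration MVM cost drop once the ID levels take over the near-field interactions.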

Table 6. Computational Statistics on the Ship Model
                           MLFMA     ID-MLFMA
MMLFMA-NFI / MID (MB)      12085     1950
TMLFMA-NFI / TID (s)       4247      1753
TMVM (s)                   16.2      2.5
Ttot (s)                   7882      2504
δJ (%)                               0.61
δRCS (%)                             0.60
Figure 11.

Bi-static RCS from the ship model.

[39] In particular, our experiments reveal that the proposed ID-MLFMA significantly reduces both the memory requirement and the total solution time compared with the MLFMA; time is saved not only on NFI matrix filling but also on the MVM during the iterations. This does not always hold for other low-frequency fast algorithms. Vikram et al. [2009] reported that the CPU time per MVM increased for the MLFMA based on accelerated Cartesian expansions (ACE), because of the additional computational cost at the ACE levels. As indicated by Hu et al. [2001], the memory requirement and the CPU time for one MVM with respect to the number of unknowns in the multilevel FIPWA are almost the same as in the traditional MLFMA.

[40] As with other fast algorithms [Hu et al., 2001; Jiang and Chew, 2004], the loop-tree decomposition can be integrated with the ID-MLFMA to address the low-frequency accuracy problem of the MoM [Zhao and Chew, 2000] as the frequency decreases further.

6. Conclusions

[41] The proposed ID-MLFMA combines the interpolative decomposition (ID) with the conventional MLFMA for multiscale problems. Through the ID, the effective unknowns, referred to as skeletons, are obtained to approximate the coupling among well-separated boxes at the ID levels. With an artificial sphere, the skeletons are constructed without evaluating entries of the MLFMA-NFI matrix. In particular, the memory consumed by the skeletonized coupling submatrices at the ID levels is greatly reduced; furthermore, the matrix filling becomes less expensive and the matrix–vector multiplication is accelerated. Numerical experiments show that the ID-MLFMA is error controllable and that its accuracy is quite insensitive to the mesh employed for the artificial sphere. The ID-MLFMA is thus much more efficient than the conventional MLFMA in terms of memory consumption and total solution time for multiscale problems. Future work will extend the method to radiation problems, where non-uniform meshes are generally required to accurately model the details in the neighborhood of the feed point; in addition, techniques such as loop-tree basis functions can be incorporated into the ID-MLFMA to address the low-frequency MoM accuracy problem.


Notation

- The memory for all MLFMA-NFI block matrices (Zo,s, s ∈ Bo)
- The memory for all ID-NFI block matrices (Zq,p, p ∈ Bq) at the finest ID level
- The memory for the projection matrices R/L at all ID levels
- The memory for the sampling matrices S at all ID levels
- The CPU time filling the MLFMA-NFI matrix
- The CPU time filling the NFI matrix at the finest ID level
- The CPU time constructing the projection matrices R/L at all ID levels
- The CPU time filling the sampling matrices S at all ID levels
- The CPU time for one MVM, excluding the time for the corresponding MLFMA-FFI
- The total solution time


Acknowledgments

[42] This work was supported by the NSFC under grants 10832002 and 60901005, by the Excellent Young Scholars Research Fund of Beijing Institute of Technology under grant 2010YS0502, and by the Basic Research Fund of Beijing Institute of Technology under grant 20090542001.