Research Article
PetFMM—A dynamically load-balancing parallel fast multipole library
Article first published online: 1 SEP 2010
DOI: 10.1002/nme.2972
Copyright © 2010 John Wiley & Sons, Ltd.

International Journal for Numerical Methods in Engineering
Volume 85, Issue 4, pages 403–428, 28 January 2011
How to Cite
Cruz, F. A., Knepley, M. G. and Barba, L. A. (2011), PetFMM—A dynamically load-balancing parallel fast multipole library. International Journal for Numerical Methods in Engineering, 85: 403–428. doi: 10.1002/nme.2972
Publication History
- Issue published online: 28 DEC 2010
- Article first published online: 1 SEP 2010
- Manuscript Accepted: 28 MAY 2010
- Manuscript Revised: 11 MAY 2010
- Manuscript Received: 5 OCT 2009
Keywords:
- fast multipole method;
- order-N algorithms;
- particle methods;
- vortex method;
- hierarchical algorithms;
- parallel computing;
- dynamic load balancing
Abstract
Fast algorithms for the computation of N-body problems can be broadly classified into mesh-based interpolation methods and hierarchical, or multiresolution, methods. To the latter class belongs the well-known fast multipole method (FMM), which offers O(N) complexity. The FMM is a complex algorithm, and the programming difficulty associated with it has arguably diminished its impact by acting as a barrier to adoption. This paper presents an extensible parallel library for N-body interactions utilizing the FMM algorithm. A prominent feature of this library is its extensible design, which aims to unify efforts involving the many algorithms based on the same principles as the FMM and to enable easy development of scientific application codes. The paper also details an exhaustive model for the computation of tree-based N-body algorithms in parallel, including both work estimates and communication estimates. With this model, we are able to implement a method that provides automatic, a priori load balancing of the parallel execution, achieving optimal distribution of the computational work among processors and minimal inter-processor communication. The library was extensively verified and tested using a client application that computes the velocity induced by N vortex particles in two dimensions. Strong scaling results, including both speedup and parallel efficiency, are presented for 10 million particles on up to 256 processors. The largest problem run with the PetFMM library to date had 64 million particles, on 64 processors. The library currently achieves over 85% parallel efficiency for 64 processes. The performance study, computational model, and application demonstrations presented in this paper are limited to 2D. However, the software architecture was designed to make extending this work to 3D straightforward, as the framework is templated over the dimension.
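For orientation, the vortex client computes a quantity that can be written as a direct O(N^2) sum of the 2D Biot-Savart law. The FMM accelerates this sum to O(N). The sketch below is not PetFMM code and does not use the library's API. Several details are illustrative assumptions: the function name vortex_velocity_direct, the optional Gaussian-core smoothing parameter sigma, and the use of plain Python lists. A direct sum of this kind is a common reference when checking the accuracy of a fast summation method.

```python
import math


def vortex_velocity_direct(xs, ys, gammas, sigma=None):
    """Direct O(N^2) summation of the 2D Biot-Savart law (illustrative sketch).

    xs, ys  -- particle coordinates
    gammas  -- particle circulations
    sigma   -- hypothetical Gaussian core size; if given, each interaction is
               scaled by 1 - exp(-r^2 / (2 sigma^2)) (an assumed regularization)

    Returns lists (u, v) with the velocity induced at each particle by all
    the others. Particles are assumed to sit at distinct positions.
    """
    n = len(xs)
    u = [0.0] * n
    v = [0.0] * n
    for i in range(n):              # target particle
        for j in range(n):          # source particle
            if i == j:
                continue            # a point vortex induces no velocity on itself
            dx = xs[i] - xs[j]
            dy = ys[i] - ys[j]
            r2 = dx * dx + dy * dy
            factor = gammas[j] / (2.0 * math.pi * r2)
            if sigma is not None:
                # Assumed Gaussian smoothing of the singular point-vortex kernel
                factor *= 1.0 - math.exp(-r2 / (2.0 * sigma * sigma))
            # Velocity is perpendicular to the separation vector (counterclockwise for gamma > 0)
            u[i] += -dy * factor
            v[i] += dx * factor
    return u, v
```

Consider two vortices of strength 2π at (0, 0) and (1, 0). The call vortex_velocity_direct([0.0, 1.0], [0.0, 0.0], [2*math.pi, 2*math.pi]) gives u = [0.0, 0.0] and v = [-1.0, 1.0]. Each vortex moves perpendicular to the line joining them, so the pair rotates counterclockwise about its midpoint. The double loop is the cost the FMM avoids by grouping distant sources into multipole expansions.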
The software library is open source under the PETSc license, which is even less restrictive than the BSD license. This is intended to maximize the library's impact on the scientific community and to encourage peer-based collaboration on extensions and applications. Copyright © 2010 John Wiley & Sons, Ltd.
