Research Article
Finite element assembly strategies on multi-core and many-core architectures
Article first published online: 19 JAN 2012
DOI: 10.1002/fld.3648
Copyright © 2012 John Wiley & Sons, Ltd.
Issue

International Journal for Numerical Methods in Fluids
Early View (Online Version of Record published before inclusion in an issue)
Additional Information
How to Cite
Markall, G.R., Slemmer, A., Ham, D.A., Kelly, P.H.J., Cantwell, C.D. and Sherwin, S.J. (2012), Finite element assembly strategies on multi-core and many-core architectures. Int. J. Numer. Meth. Fluids. doi: 10.1002/fld.3648
Publication History
- Article first published online: 19 JAN 2012
- Manuscript Accepted: 21 DEC 2011
- Manuscript Revised: 13 DEC 2011
- Manuscript Received: 23 JUN 2011
Keywords: FEM; GPU; multi-core; many-core
SUMMARY
We demonstrate that radically differing implementations of finite element methods (FEMs) are needed on multi-core (CPU) and many-core (GPU) architectures, if their respective performance potential is to be realised. Our numerical investigations using a finite element advection–diffusion solver show that increased performance on each architecture can only be achieved by committing to specific and diverse algorithmic choices that cut across the high-level structure of the implementation. Making these commitments to achieve high performance for a single architecture leads to a loss of performance portability. Data structures that include redundant data but enable coalesced memory accesses are faster on many-core architectures, whereas redundancy-free data structures that are accessed indirectly are faster on multi-core architectures. The Addto algorithm for global assembly is optimal on multi-core architectures, whereas the Local Matrix Approach is optimal on many-core architectures despite requiring more computation than the Addto algorithm. These results demonstrate the value in making the correct choice of algorithm and data structure when implementing FEMs, spectral element methods and low-order discontinuous Galerkin methods on modern high-performance architectures.
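The contrast between the two assembly strategies named in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a 1D mesh of two-node linear elements, a stiffness-like 2x2 local matrix, and a dense global matrix in place of the sparse storage a real solver would use. All function and variable names are hypothetical. The Addto variant scatter-adds every local matrix into one global matrix before multiplying, whereas the Local Matrix Approach variant keeps the local matrices separate and performs a gather, a local multiply and a scatter-add for each element.

```python
# Illustrative sketch only: 1D mesh of two-node linear elements.
# All names are hypothetical and not taken from the paper.

def local_matrices(num_elements, h=1.0):
    """Return one 2x2 stiffness-like local matrix per element."""
    k = 1.0 / h
    return [[[k, -k], [-k, k]] for _ in range(num_elements)]

def element_nodes(num_elements):
    """Element-to-node map: element e touches global nodes e and e + 1."""
    return [(e, e + 1) for e in range(num_elements)]

def addto_assemble(local, conn, num_nodes):
    """Addto: scatter-add each local matrix into a single global matrix.

    A dense matrix is used here for brevity (an assumption of this sketch);
    a production code would use a sparse format.
    """
    A = [[0.0] * num_nodes for _ in range(num_nodes)]
    for Ke, nodes in zip(local, conn):
        for i, gi in enumerate(nodes):
            for j, gj in enumerate(nodes):
                A[gi][gj] += Ke[i][j]
    return A

def global_matvec(A, x):
    """Multiply the assembled global matrix by the vector x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def lma_matvec(local, conn, x):
    """Local Matrix Approach: the global matrix is never formed."""
    y = [0.0] * len(x)
    for Ke, nodes in zip(local, conn):
        xe = [x[g] for g in nodes]                    # gather nodal values
        n = len(nodes)
        ye = [sum(Ke[i][j] * xe[j] for j in range(n)) for i in range(n)]
        for i, g in enumerate(nodes):                 # scatter-add result
            y[g] += ye[i]
    return y

num_elements = 4
num_nodes = num_elements + 1
local = local_matrices(num_elements)
conn = element_nodes(num_elements)
x = [float(i * i) for i in range(num_nodes)]

y_addto = global_matvec(addto_assemble(local, conn, num_nodes), x)
y_lma = lma_matvec(local, conn, x)
print(y_addto == y_lma)
```

Both variants produce the same vector. In the Local Matrix Approach, contributions at nodes shared between elements are summed anew in every product rather than once during assembly, which is one way to read the summary's remark that it requires more computation. This toy example says nothing about the relative performance reported in the article.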
