Research Article
369 Tflop/s molecular dynamics simulations on the petaflop hybrid supercomputer ‘Roadrunner’
Article first published online: 14 AUG 2009
DOI: 10.1002/cpe.1483
Copyright © 2009 John Wiley & Sons, Ltd.
Issue
Concurrency and Computation: Practice and Experience
Special Issue: Exploring the Frontiers of Computing Science and Technology: Adapting Emerging Multi-and Many-core Processors
Volume 21, Issue 17, pages 2143–2159, 10 December 2009
Additional Information
How to Cite
Germann, T. C., Kadau, K. and Swaminarayan, S. (2009), 369 Tflop/s molecular dynamics simulations on the petaflop hybrid supercomputer ‘Roadrunner’. Concurrency Computat.: Pract. Exper., 21: 2143–2159. doi: 10.1002/cpe.1483
Publication History
- Issue published online: 22 OCT 2009
- Article first published online: 14 AUG 2009
- Manuscript Accepted: 23 JUN 2009
- Manuscript Received: 16 FEB 2009
Funded by
- U.S. Department of Energy, NNSA Advanced Simulation and Computing Program. Grant Number: DE-AC52-06NA25396
Keywords:
- heterogeneous computing;
- cell processor;
- Roadrunner;
- molecular dynamics
Abstract
We describe the implementation of a short-range parallel molecular dynamics (MD) code, SPaSM, on the heterogeneous general-purpose Roadrunner supercomputer. Each Roadrunner ‘TriBlade’ compute node consists of two AMD Opteron dual-core microprocessors and four IBM PowerXCell 8i enhanced Cell microprocessors (each consisting of one PPU and eight SPU cores), so that there are four MPI ranks per node, each with one Opteron core and one Cell processor. We briefly describe the Roadrunner architecture and some of the initial hybrid programming approaches that have been taken, focusing on the SPaSM application as a case study. An initial ‘evolutionary’ port, in which the existing legacy code runs with minor modifications on the Opterons and the Cells are used only to compute interatomic forces, achieves roughly a 2× speedup over the unaccelerated code. In contrast, our ‘revolutionary’ implementation adopts a Cell-centric view, with data structures optimized for, and living on, the Cells. The Opterons are mainly used to direct inter-rank communication and to perform I/O-heavy periodic analysis, visualization, and checkpointing tasks. Our initial implementation of a standard Lennard–Jones pair potential benchmark reached a peak of 369 Tflop/s double-precision floating-point performance on the full Roadrunner system (27.7% of peak), nearly 10× faster than the unaccelerated (Opteron-only) version.
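For readers unfamiliar with the benchmark named in the abstract, the following is a minimal sketch of a short-range Lennard–Jones 12-6 pair interaction in Python. It is not the SPaSM implementation and says nothing about how SPaSM organizes its data on the Cells. The function names `lj_pair` and `total_energy`, the reduced units (epsilon = sigma = 1), and the cutoff of 2.5 sigma are all assumptions made for illustration; the abstract does not state the cutoff or the parameters used.

```python
import math


def lj_pair(r, epsilon=1.0, sigma=1.0):
    """Return (energy, force) of the Lennard-Jones 12-6 potential at separation r.

    force is -dU/dr, so a positive value means the pair repels.
    epsilon and sigma default to reduced units (an assumption).
    """
    sr6 = (sigma / r) ** 6
    energy = 4.0 * epsilon * (sr6 * sr6 - sr6)
    force = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r
    return energy, force


def total_energy(positions, r_cut=2.5, epsilon=1.0, sigma=1.0):
    """Sum pair energies over all pairs closer than r_cut.

    Uses a plain O(N^2) double loop for clarity; r_cut = 2.5 is an
    assumed, commonly used cutoff, not a value taken from the paper.
    """
    total = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])
            if r < r_cut:  # short-range: ignore pairs beyond the cutoff
                total += lj_pair(r, epsilon, sigma)[0]
    return total
```

In these units the potential has its minimum at a separation of 2^(1/6) sigma, where the energy equals -epsilon and the force vanishes. The cutoff is what makes the interaction short-range: each atom interacts only with nearby neighbours, so the work per atom stays bounded as the system grows.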
