Designing and implementing lightweight kernels for capability computing
Article first published online: 6 AUG 2008
Copyright © 2008 John Wiley & Sons, Ltd.
Concurrency and Computation: Practice and Experience
Volume 21, Issue 6, pages 793–817, 25 April 2009
How to Cite
Riesen, R., Brightwell, R., Bridges, P. G., Hudson, T., Maccabe, A. B., Widener, P. M. and Ferreira, K. (2009), Designing and implementing lightweight kernels for capability computing. Concurrency Computat.: Pract. Exper., 21: 793–817. doi: 10.1002/cpe.1361
Publication History
- Issue published online: 18 MAR 2009
- Manuscript Accepted: 11 MAY 2008
- Manuscript Revised: 6 MAY 2008
- Manuscript Received: 7 MAR 2008
Funded by
- United States Department of Energy's National Nuclear Security Administration. Grant Number: DE-AC04-94AL85000
Keywords
- parallel computing
- operating systems
In the early 1990s, researchers at Sandia National Laboratories and the University of New Mexico began development of customized system software for massively parallel ‘capability’ computing platforms. These lightweight kernels have proven to be essential for delivering the full power of the underlying hardware to applications. This claim is underscored by the success of several supercomputers, including the Intel Paragon, Intel Accelerated Strategic Computing Initiative (ASCI) Red, and the Cray XT series of systems, each of which established a new standard for high-performance computing upon introduction. In this paper, we describe our approach to lightweight compute node kernel design and discuss the design principles that have guided several generations of implementation and deployment. A broad strategy of operating system specialization has led to a focus on user-level resource management, deterministic behavior, and scalable system services. The relative importance of each of these areas has changed over the years in response to changes in applications and in hardware and system architecture. We detail our approach and the associated principles, describe how our application of these principles has changed over time, and provide design and performance comparisons to contemporaneous supercomputing operating systems.