Employee of Computer Sciences Corporation
Research Article
Performance evaluation of the SX-6 vector architecture for scientific computations
Article first published online: 29 NOV 2004
DOI: 10.1002/cpe.884
Copyright © 2005 John Wiley & Sons, Ltd.
Issue
Concurrency and Computation: Practice and Experience
Volume 17, Issue 1, pages 69–93, January 2005
Additional Information
How to Cite
Oliker, L., Canning, A., Carter, J., Shalf, J., Skinner, D., Ethier, S., Biswas, R., Djomehri, J. and Van der Wijngaart, R. (2005), Performance evaluation of the SX-6 vector architecture for scientific computations. Concurrency Computat.: Pract. Exper., 17: 69–93. doi: 10.1002/cpe.884
Publication History
- Issue published online: 29 NOV 2004
- Article first published online: 29 NOV 2004
- Manuscript Accepted: 9 MAR 2004
- Manuscript Revised: 9 FEB 2004
- Manuscript Received: 15 AUG 2003
Keywords:
- microbenchmarks;
- NAS Parallel Benchmarks;
- scientific applications;
- vectorization;
- superscalar performance
Abstract
The growing gap between sustained and peak performance for scientific applications is a well-known problem in high-performance computing. The recent development of parallel vector systems offers the potential to reduce this gap for many computational science codes and deliver a substantial increase in computing capabilities. This paper examines the intranode performance of the NEC SX-6 vector processor, and compares it against the cache-based IBM Power3 and Power4 superscalar architectures, across a number of key scientific computing areas. First, we present the performance of a microbenchmark suite that examines many low-level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks. Finally, we evaluate the performance of several scientific computing codes. Overall results demonstrate that the SX-6 achieves high performance on a large fraction of our application suite and often significantly outperforms the cache-based architectures. However, certain classes of applications are not easily amenable to vectorization and would require extensive algorithm and implementation reengineering to utilize the SX-6 effectively.