Computation and data movement on RP3
Article first published online: 24 OCT 2006
DOI: 10.1002/cpe.4330040105
Copyright © 1992 John Wiley & Sons, Ltd
How to Cite
Brochard, L. and Freau, A. (1992), Computation and data movement on RP3. Concurrency: Practice and Experience, 4: 57–78. doi: 10.1002/cpe.4330040105
Publication History
- Issue published online: 24 OCT 2006
- Article first published online: 24 OCT 2006
- Manuscript Revised: 15 MAY 1991
- Manuscript Received: 7 FEB 1990
Abstract
This paper studies computation and communication costs on RP3 and examines several algorithm design issues for a multiprocessor with a three-level memory hierarchy. Using very simple algorithms (vector-add, vector-sum, saxpy, …), we compare implementations that differ in data localization (global or local) and data cacheability (cacheable or non-cacheable). The comparison uses a performance monitoring system (VPMC) that records instructions, data movement, cache requests and misses. The VPMC output then serves as input to an analytical performance model, from which we compute the elemental computation and communication times of each basic algorithm. Regarding cacheability (marking the data cacheable instead of non-cacheable), we found it worthwhile as long as the data are blocked adequately. For our simple 1-D data structures, a block size equal to a multiple of the cache line size gives the best results; however, once possible load imbalance is taken into account, a block size equal to the cache line size seems optimal. Regarding localization (copying data from global to local memory, working on the local copy instead of the global data, and copying the data back), we found it ineffective, at least with the RP3 local and global communication speed ratios (1:10:15).
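The blocking remark in the abstract can be illustrated with a small sketch. The abstract does not show the authors' code or the RP3 cache parameters, so everything below is an assumption made for illustration: the helper names (`block_ranges`, `saxpy_blocked`, `lines_shared`), the round-robin assignment of blocks to processors, and the cache line length expressed in vector elements. The sketch runs sequentially and only mimics how a 1-D saxpy might be partitioned; it then counts how many cache lines would be touched by more than one processor for a given block size.

```python
def block_ranges(n, block, num_procs, proc):
    """Yield the (start, stop) index ranges owned by processor `proc` when
    blocks of `block` elements are dealt out round-robin (an assumed scheme)."""
    for start in range(proc * block, n, num_procs * block):
        yield start, min(start + block, n)


def saxpy_blocked(a, x, y, block, num_procs):
    """Return a*x + y, visiting elements in the order the hypothetical
    processors would: each processor handles only its own blocks."""
    out = list(y)
    for proc in range(num_procs):
        for start, stop in block_ranges(len(x), block, num_procs, proc):
            for i in range(start, stop):
                out[i] = a * x[i] + y[i]
    return out


def lines_shared(n, block, line, num_procs):
    """Count cache lines (of `line` elements each, an assumed size) that are
    touched by more than one processor under the blocked partition."""
    owners = {}
    for proc in range(num_procs):
        for start, stop in block_ranges(n, block, num_procs, proc):
            for i in range(start, stop):
                owners.setdefault(i // line, set()).add(proc)
    return sum(1 for procs in owners.values() if len(procs) > 1)
```

With an assumed line of 4 elements and 4 processors, `lines_shared(64, 4, 4, 4)` and `lines_shared(64, 8, 4, 4)` both report no shared lines, whereas a block of 2 elements splits every line between two processors. This only shows the alignment side of the argument; the load-balance consideration that leads the authors to prefer a block equal to one cache line is not modeled here.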
