Keywords:

  • FFT;
  • distributed transpose;
  • MPI;
  • communication policy

Abstract

The best approach to parallelizing multidimensional FFT algorithms has long been under debate. Distributed transposes are widely used, but they vary in communication policy and hence in performance. In this work we analyze the impact of different redistribution strategies on the performance of parallel FFT on various machine architectures. We found that some redistribution strategies were consistently superior, while others were unexpectedly inferior. An in-depth investigation into the reasons for this behavior is included. Copyright © 2001 John Wiley & Sons, Ltd.