The nonequispaced FFT on graphics processing units



The fast Fourier transform (FFT) is without doubt among the algorithms with the greatest impact on science and engineering. Using appropriate approximations, it has been generalized to arbitrary spatial sampling points. This so-called nonequispaced FFT is the core of the sequential NFFT3 library, and we discuss its computational costs in detail. Programmable graphics processing units, meanwhile, have evolved into highly parallel, multithreaded, manycore processors with enormous computational capacity and very high memory bandwidth. Using the Compute Unified Device Architecture (CUDA), we parallelized the nonequispaced FFT by combining the CUDA FFT library with a dedicated parallelization of the approximation scheme. (© 2012 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim)
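
To make "arbitrary spatial sampling points" concrete, the following is a minimal sketch of the direct nonequispaced discrete Fourier transform, i.e. the exact sum that a nonequispaced FFT approximates. It is not the NFFT3 or CUDA implementation discussed here. The function name `ndft`, the sign of the exponent, and the symmetric frequency range k = -N/2, ..., N/2 - 1 are assumptions chosen for illustration.

```python
import cmath


def ndft(f_hat, nodes):
    """Direct nonequispaced DFT (illustrative sketch, not the NFFT3 API).

    Evaluates f_j = sum_k f_hat[k] * exp(-2*pi*i*k*x_j) for frequencies
    k = -N/2, ..., N/2 - 1 (assumed convention) at arbitrary nodes x_j.
    The cost is O(N * M) for N coefficients and M nodes.
    """
    n = len(f_hat)
    half = n // 2
    values = []
    for x in nodes:
        total = 0j
        for idx, coeff in enumerate(f_hat):
            k = idx - half  # map array index to signed frequency
            total += coeff * cmath.exp(-2j * cmath.pi * k * x)
        values.append(total)
    return values


if __name__ == "__main__":
    # Four coefficients, three nodes that do not lie on a regular grid.
    coefficients = [0.5, 1.0, 2.0, -1.0]
    nodes = [-0.37, 0.02, 0.41]
    for x, value in zip(nodes, ndft(coefficients, nodes)):
        print(x, value)
```

Each node needs its own pass over all coefficients, so this direct sum scales with the product of the number of coefficients and the number of nodes. A nonequispaced FFT avoids that by pairing an ordinary FFT with a local approximation step around each node.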