Porting ONETEP to graphical processing unit-based coprocessors. 1. FFT box operations


E-mail: c.skylaris@soton.ac.uk


We present the first graphical processing unit (GPU) coprocessor-enabled version of the Order-N Electronic Total Energy Package (ONETEP) code for linear-scaling first principles quantum mechanical calculations on materials. This work focuses on porting to the GPU the parts of the code that involve atom-localized fast Fourier transform (FFT) operations. These are among the most computationally intensive parts of the code and are used in core algorithms such as the calculation of the charge density, the local potential integrals, the kinetic energy integrals, and the nonorthogonal generalized Wannier function gradient. We found that directly porting the isolated FFT operations provided no benefit. Instead, it was necessary to tailor the port to each of these algorithms to optimize data transfer to and from the GPU. We discuss the methods used in detail and present performance tests, which show that individual steps in the relevant algorithms are significantly accelerated. However, the transfer of data between the GPU and the host machine remains a significant bottleneck in the reported version of the code. In addition, we have carried out an initial investigation into a dynamic precision scheme for the ONETEP energy calculation, intended to take advantage of the enhanced single precision capabilities of GPUs. The methods used here cause no disruption to the existing code base. Furthermore, because the developments reported here concern the core algorithms, they will benefit the full range of ONETEP functionality. Our use of a directive-based programming model ensures portability to other forms of coprocessors and will allow this work to form the basis of future developments to the code designed to support emerging high-performance computing platforms.

Copyright © 2013 Wiley Periodicals, Inc.