We have developed a micromagnetic simulator for graphics processing units (GPUs) using the CUDA framework. In this paper, we discuss the optimization of the effective field calculation from both a mathematical and a hardware-specific point of view. With a finite-difference discretization scheme, the long-range magnetostatic field can be calculated using fast Fourier transforms, an approach well suited to the GPU. We show how the implementation can be tuned to the GPU hardware and how the performance can be further increased by exploiting the large number of zeros that typically occur in the micromagnetic field computation. Additionally, we show how the ferromagnetic exchange interaction can be included in the magnetostatic field calculation at no additional computational cost. The resulting high-performance software can be used to run large-scale simulations that would be very time-consuming on regular CPU hardware. As an example, we present a case study on the de-pinning of domain walls in racetrack memory devices. Copyright © 2012 John Wiley & Sons, Ltd.
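
The FFT-based evaluation of a long-range field on a regular grid can be illustrated with a minimal one-dimensional sketch. It is not the simulator's implementation: the names `toy_kernel`, `field_direct` and `field_fft` are hypothetical, the kernel is an arbitrary decaying function rather than the actual demagnetizing tensor, and a naive discrete Fourier transform stands in for a GPU FFT library. The sketch shows only that zero-padding the magnetization to twice its length turns the cyclic convolution computed in Fourier space into the open-boundary sum, and it makes visible that half of the padded input consists of zeros.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT library)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
        out.append(s / n if inverse else s)
    return out


def toy_kernel(d):
    """Hypothetical decaying interaction between cells separated by d cells."""
    return 1.0 / (1.0 + abs(d) ** 3)


def field_direct(m, kernel_fn):
    """Reference result: explicit double sum h[i] = sum_j K(i - j) * m[j]."""
    n = len(m)
    return [sum(kernel_fn(i - j) * m[j] for j in range(n)) for i in range(n)]


def field_fft(m, kernel_fn):
    """Same sum evaluated as a zero-padded cyclic convolution in Fourier space."""
    n = len(m)
    size = 2 * n
    # Zero-pad the magnetization: the second half of the buffer is all zeros.
    m_pad = list(m) + [0.0] * n
    # Store the kernel cyclically: offsets 0..n-1 at the front,
    # negative offsets -1..-(n-1) wrapped around to the back.
    k_pad = [0.0] * size
    for d in range(n):
        k_pad[d] = kernel_fn(d)
    for d in range(1, n):
        k_pad[size - d] = kernel_fn(-d)
    m_hat = dft(m_pad)
    k_hat = dft(k_pad)
    # Point-wise product in Fourier space, then transform back.
    h_pad = dft([a * b for a, b in zip(k_hat, m_hat)], inverse=True)
    # Only the first n entries correspond to physical cells.
    return [value.real for value in h_pad[:n]]


if __name__ == "__main__":
    m = [1.0, -0.5, 0.25, 2.0]
    print(field_direct(m, toy_kernel))
    print(field_fft(m, toy_kernel))
```

Because the convolution is linear, any further interaction that can be written as a kernel on the same grid could be added to the kernel buffer before it is transformed, leaving the cost of the transforms and of the point-wise product unchanged.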