## 1 Introduction

[2] Spherical harmonics are the eigenfunctions of the Laplace operator on the 2-sphere. They form a basis that is convenient for describing data on a sphere consistently in spectral space. Spherical harmonic transforms (SHT) are the spherical counterpart of the Fourier transform, casting spatial data to the spectral domain and vice versa. They are commonly used in pseudospectral direct numerical simulations in spherical geometry, for example to simulate the Sun or the liquid core of the Earth [*Glatzmaier*, 1984; *Sakuraba*, 1999; *Christensen et al*., 2001; *Brun & Rempel*, 2009; *Wicht & Tilgner*, 2010].

[3] All numerical simulations that take advantage of spherical harmonics use the classical Gauss-Legendre algorithm (see section 2), whose complexity is $\mathcal{O}(N^3)$ for a truncation at spherical harmonic degree *N*. Because this cost grows quickly with *N*, high-resolution spherical codes currently spend most of their time performing SHT. A few years ago, state-of-the-art numerical simulations used *N* = 255 [*Sakuraba & Roberts*, 2009].
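The cubic growth of the Gauss-Legendre cost can be illustrated with a rough operation count. The sketch below assumes a direct Legendre summation over *N* + 1 latitudes for every pair (ℓ, *m*) with 0 ≤ *m* ≤ ℓ ≤ *N*, and counts one multiply-add per term. The function name `legendre_stage_madds` is hypothetical, and the count ignores the Fourier stage and any constant factors, so it is only an order-of-magnitude estimate rather than a measurement of a real code.

```python
def legendre_stage_madds(n_trunc):
    """Rough multiply-add count for a direct Legendre summation (illustrative)."""
    n_lat = n_trunc + 1                            # assumed number of Gauss latitudes
    n_modes = (n_trunc + 1) * (n_trunc + 2) // 2   # pairs (l, m) with 0 <= m <= l <= N
    return n_lat * n_modes                         # one multiply-add per (latitude, l, m)


if __name__ == "__main__":
    for n in (255, 511, 1023):
        # Doubling the resolution (N + 1 -> 2(N + 1)) multiplies the count by about 8.
        ratio = legendre_stage_madds(2 * n + 1) / legendre_stage_madds(n)
        print(n, legendre_stage_madds(n), round(ratio, 2))
```

Under these assumptions, each doubling of the resolution costs close to a factor of eight, which is consistent with the transform dominating the run time at high *N*.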

[4] Several asymptotically fast algorithms exist [*Driscoll & Healy*, 1994; *Potts et al*., 1998; *Mohlenkamp*, 1999; *Suda & Takami*, 2002; *Healy et al*., 2003; *Tygert*, 2008], but their overhead is such that they do not claim to be effectively faster for *N* < 512. In addition, some of them lack stability (the error becomes too large even for moderate *N*) and flexibility (e.g., *N* + 1 must be a power of 2).

[5] Among the asymptotically fast algorithms, only two have open-source implementations, and the only one that seems to perform reasonably well is SpharmonicKit, based on the algorithms described by *Healy et al*. [2003]. Its main drawback is the need for a latitudinal grid of size 2(*N* + 1), whereas the Gauss-Legendre quadrature requires only *N* + 1 collocation points. Thus, even if it were as fast as the Gauss-Legendre approach for the same truncation *N*, the overall numerical simulation would be slower because it would operate on twice as many points. These facts explain why the Gauss-Legendre algorithm is still the most efficient solution for numerical simulations.
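The reason *N* + 1 latitudinal points suffice is that a Gauss-Legendre rule with *N* + 1 nodes integrates polynomials of degree up to 2*N* + 1 exactly, which covers the products of two Legendre functions of degree at most *N* that appear in the analysis step. The following minimal sketch checks this property numerically. It is not taken from any of the codes cited here: the helper names `legendre_and_derivative` and `gauss_legendre` are hypothetical, and the nodes are found with a plain Newton iteration, chosen only to keep the example free of external dependencies.

```python
import math


def legendre_and_derivative(n, x):
    """Return (P_n(x), P_n'(x)) using the three-term recurrence; valid for |x| < 1."""
    if n == 0:
        return 1.0, 0.0
    p_prev, p = 1.0, x
    for k in range(2, n + 1):
        p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
    dp = n * (x * p - p_prev) / (x * x - 1.0)
    return p, dp


def gauss_legendre(n_points):
    """Nodes and weights of the n_points Gauss-Legendre rule on [-1, 1]."""
    nodes, weights = [], []
    for i in range(1, n_points + 1):
        # Classical initial guess for the i-th root of P_n, refined by Newton's method.
        x = math.cos(math.pi * (i - 0.25) / (n_points + 0.5))
        for _ in range(100):
            p, dp = legendre_and_derivative(n_points, x)
            dx = p / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        _, dp = legendre_and_derivative(n_points, x)
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def integrate_monomial(degree, nodes, weights):
    """Quadrature estimate of the integral of x**degree over [-1, 1]."""
    return sum(w * x ** degree for x, w in zip(nodes, weights))


if __name__ == "__main__":
    n_trunc = 4                                  # illustrative truncation N
    nodes, weights = gauss_legendre(n_trunc + 1)
    # Degree 2N is integrated exactly; degree 2N + 2 is not.
    for degree in (2 * n_trunc, 2 * n_trunc + 2):
        exact = 2.0 / (degree + 1)
        print(degree, integrate_monomial(degree, nodes, weights) - exact)
```

A scheme that needs 2(*N* + 1) latitudes for the same truncation therefore carries twice as many grid points through every other step of the simulation.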

[6] A recent paper [*Dickson et al*., 2011] reports that carefully tuned software could run nine times faster on the same CPU than the initial nonoptimized version, and stresses the importance of vectorization and careful optimization of the code. As the goal of this work is to speed up numerical simulations, we have written a highly optimized and explicitly vectorized version of the Gauss-Legendre SHT algorithm. The next section recalls the basics of spherical harmonic transforms. We then describe the optimizations we used and compare the performance of our transform to other SHT implementations. We conclude with a short summary and perspectives for future developments.