Testing normality in any dimension by Fourier methods in a multivariate Stein equation

We study a novel class of affine invariant and consistent tests for multivariate normality. The tests are based on a characterization of the standard $d$-variate normal distribution as the unique solution of an initial value problem connected to a partial differential equation, which is motivated by a multivariate Stein equation. The test criterion is a suitably weighted $L^2$-statistic. We derive the limit distribution of the test statistic under the null hypothesis as well as under contiguous and fixed alternatives to normality. A consistent estimator of the limiting variance under fixed alternatives is derived, as is an asymptotic confidence interval for the distance of an underlying alternative from the multivariate normal law. In simulation studies, we show that the tests are strong in comparison with prominent competitors, and that the empirical coverage rate of the asymptotic confidence interval converges to the nominal level. We present a real data example, and we outline topics for further research.

We consider testing the hypothesis $H_0$ that the distribution of a $d$-variate random vector $X$ is some non-degenerate normal law, based on i.i.d. copies $X_1, \ldots, X_n$ of $X$, against general alternatives. The purpose of this paper is to introduce and study a novel class of affine invariant and consistent tests based on a partial differential equation (PDE) that determines the characteristic function of the multivariate standard normal law. We write $\nabla$ for the gradient operator and consider, for $f \in L^2(\mathbb{R}^d)$, the initial value problem

$$\nabla f(t) + t\, f(t) = 0, \quad t \in \mathbb{R}^d, \qquad f(0) = 1. \tag{1}$$

Note that the operator $Af(x) = x f(x) - \nabla f(x)$ is a multivariate Stein operator in the following sense: for a centred random vector $X$ with $\mathbb{E}[XX^\top] = I_d$ that has a differentiable density with full support $\mathbb{R}^d$, we have $\mathbb{E}[Af(X)] = \mathbb{E}[X f(X) - \nabla f(X)] = 0$ for each function $f$ with existing derivatives in every direction and for which all occurring expectations exist, if and only if $X$ has the normal distribution $\mathrm{N}_d(0, I_d)$; see Theorem 3.5 in [43] as well as [37,40,52] for more information on the multivariate Stein lemma. Here and in the following, the symbol $\top$ means transposition of column vectors and matrices. In the spirit of the Stein–Tikhomirov method, see [1,19], and hence using the functions $\{\exp(i t^\top x),\, t \in \mathbb{R}^d\}$ as test functions, a simple calculation shows the equivalence of the Stein equation to the initial value problem in (1). In the case $d = 1$, the same initial value problem was motivated by a fixed point of the zero bias transform in [17]. For more information on the zero bias transform, see [21,50].
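For the reader's convenience, the short calculation behind this equivalence can be sketched as follows (our reconstruction, in the notation above, with $\psi(t) = \mathbb{E}[\exp(i t^\top X)]$ denoting the CF of $X$):

```latex
% Apply the Stein operator A f(x) = x f(x) - \nabla f(x)
% to the test function f_t(x) = \exp(i t^\top x), for which \nabla f_t(x) = i t f_t(x):
\mathbb{E}[A f_t(X)]
  = \mathbb{E}\bigl[X e^{\,i t^\top X}\bigr] - i t\,\psi(t)
  = -i\,\nabla\psi(t) - i t\,\psi(t)
  = -i\bigl(\nabla\psi(t) + t\,\psi(t)\bigr),
```

using $\nabla\psi(t) = i\,\mathbb{E}[X e^{i t^\top X}]$. Hence $\mathbb{E}[A f_t(X)] = 0$ for each $t \in \mathbb{R}^d$ if and only if $\nabla\psi(t) = -t\,\psi(t)$, which, in view of $\psi(0) = 1$, is precisely the initial value problem (1).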
Theorem 1.1. The characteristic function of the d-variate standard normal distribution N d (0,I d ) is the only solution of (1).
Proof. If $f \in L^2(\mathbb{R}^d)$ is an arbitrary solution of (1), the product rule yields $\nabla\big(\exp(\|t\|^2/2) f(t)\big) = \exp(\|t\|^2/2)\big(t f(t) + \nabla f(t)\big) = 0$, $t \in \mathbb{R}^d$, so that $\exp(\|t\|^2/2) f(t)$ is constant. In view of $f(0) = 1$, we have $\exp(\|t\|^2/2) f(t) = 1$, i.e.,

$$f(t) = \exp(-\|t\|^2/2) =: \psi(t), \quad t \in \mathbb{R}^d, \tag{2}$$

and the assertion follows.
According to Theorem 1.1, the characteristic function (CF) of the $d$-variate standard normal distribution is the only CF $\psi$ satisfying $\nabla\psi(t) = -t\,\psi(t)$, $t \in \mathbb{R}^d$. Our test statistic will be based on this equation. To achieve affine invariance of the test statistic with respect to full rank affine transformations of $X_1, \ldots, X_n$, let $Y_{n,j} := S_n^{-1/2}(X_j - \overline{X}_n)$, $j = 1, \ldots, n$, denote the so-called scaled residuals, where $\overline{X}_n = n^{-1}\sum_{j=1}^n X_j$ and $S_n := n^{-1}\sum_{j=1}^n (X_j - \overline{X}_n)(X_j - \overline{X}_n)^\top$ stand for the sample mean and the sample covariance matrix of $X_1, \ldots, X_n$, respectively. The matrix $S_n^{-1/2}$ is the unique symmetric positive definite square root of $S_n^{-1}$. To ensure almost sure invertibility of $S_n$, we tacitly assume $n \ge d+1$ in what follows, see [15]. Writing

$$\psi_n(t) := \frac{1}{n}\sum_{j=1}^n \exp(i t^\top Y_{n,j}), \quad t \in \mathbb{R}^d, \tag{3}$$

for the empirical CF of $Y_{n,1}, \ldots, Y_{n,n}$, and recalling $\psi$ from (2), our test statistic is the weighted $L^2$-statistic

$$T_{n,a} := n \int \Big\| \frac{1}{n}\sum_{j=1}^n Y_{n,j}\big(\cos(t^\top Y_{n,j}) + \sin(t^\top Y_{n,j})\big) - t\,\psi(t) \Big\|^2 w_a(t)\, \mathrm{d}t, \tag{4}$$

where $w_a(t) := \exp(-a\|t\|^2)$ and $a > 0$ is a fixed tuning parameter. The integrand in (4) is the empirical counterpart of the characterizing equation $\mathbb{E}[X(\cos(t^\top X) + \sin(t^\top X))] = t\,\psi(t)$, $t \in \mathbb{R}^d$, which for standardized $X$ is equivalent to $\nabla\psi(t) = -t\,\psi(t)$. Approaches based on correlations, see [56], and on the notion of energy, see [54], are other approaches to this testing problem. Comparative empirical Monte Carlo studies can be found in [18,58].
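For illustration, the scaled residuals can be computed in a few lines of base R; the following is our own sketch (not the implementation of [9]), assuming the observations are the rows of an $n \times d$ matrix `X` with $n \ge d+1$:

```r
# Scaled residuals Y_{n,j} = S_n^{-1/2} (X_j - X_bar), with S_n^{-1/2} the
# symmetric positive definite square root of S_n^{-1}, via an eigendecomposition.
scaled_residuals <- function(X) {
  n  <- nrow(X)
  Xc <- scale(X, center = TRUE, scale = FALSE)   # rows: X_j - X_bar
  Sn <- crossprod(Xc) / n                        # sample covariance (denominator n)
  e  <- eigen(Sn, symmetric = TRUE)              # Sn = U diag(lambda) U'
  S_inv_sqrt <- e$vectors %*% diag(1 / sqrt(e$values), ncol(X)) %*% t(e$vectors)
  Xc %*% S_inv_sqrt                              # rows: Y_{n,j}' (S_inv_sqrt is symmetric)
}
```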
The rest of this paper unfolds as follows: In Section 2, we give a representation of $T_{n,a}$ that is amenable to computation. Moreover, we derive limits of $T_{n,a}$, after suitable affine transformations, as $a \to \infty$ and $a \to 0$, that hold elementwise on the underlying probability space. Section 3 deals with the limit distribution of $T_{n,a}$ under the null hypothesis, and Section 4 considers the limit behavior of $T_{n,a}$ both under contiguous and fixed alternatives to $H_0$. Section 5 presents the results of a simulation study, and Section 6 exhibits a real data example. Section 7 contains a brief summary and indicates topics for further research. For the sake of readability, some of the proofs are deferred to Appendix A.
Throughout the paper, we use the following notation: the symbol $\overset{\mathcal{D}}{=}$ means equality in distribution, and $\overset{P}{\longrightarrow}$ and $\overset{\text{a.s.}}{\longrightarrow}$ stand for convergence in probability and almost sure convergence, respectively. Moreover, $\overset{\mathcal{D}}{\longrightarrow}$ is shorthand for convergence in distribution of random elements in whatever space is under consideration (which will be clear from the context). If not stated otherwise, each limit refers to $n \to \infty$, and each unspecified integral is over $\mathbb{R}^d$. The stochastic Landau symbols $o_P(1)$ and $O_P(1)$ refer to convergence to zero in probability and to stochastic boundedness, respectively.

Basic properties of the test statistic
In this section, we provide some information on the test statistic $T_{n,a}$ defined in (4). The first result shows that $T_{n,a}$ allows for a simple representation that is amenable to computational purposes. Moreover, since this representation shows that $T_{n,a}$ depends on $X_1, \ldots, X_n$ only via the scalar products $Y_{n,i}^\top Y_{n,j}$, $i, j \in \{1, \ldots, n\}$, the statistic $T_{n,a}$ is affine invariant. Note that this representation is implemented in the R package mnt, see [9]. The proof of Theorem 2.1 is given in Appendix A.
We now consider the elementwise limits (on the underlying probability space) of $T_{n,a}$ for fixed $n$ as $a \to \infty$ and $a \to 0$. It will be seen that the class of tests based on $T_{n,a}$ is 'closed at the boundaries' $a \to \infty$ and $a \to 0$ in the sense that, after suitable affine transformations, there are well-defined 'limit statistics'. Our first result refers to the limit $a \to \infty$.
Proof. Invoking (5), it follows that $T_{n,a}$, suitably normalized, splits into three terms, $=: A_n - B_n + C_n$ (say). We now use elementary expansions such as $a/(a+1) \to 1$ as $a \to \infty$ and Taylor expansions of the exponential function as $x \to 0$, and we employ the identities $\sum_{j=1}^n Y_{n,j} = 0$ and $\sum_{j=1}^n \|Y_{n,j}\|^2 = nd$. Upon combining the resulting expansions of $A_n$, $B_n$ and $C_n$, the assertion follows.
Notice that the right hand side of (6) is a linear combination of two time-honored measures of multivariate skewness. Notably, the same linear combination showed up not only for the class of BHEP tests (see Theorem 2.1 of [27]), but also as a limit of a related test statistic in connection with a test for multivariate normality based on a partial differential equation for the moment generating function of the normal distribution, see [31].
Regarding the limit of $T_{n,a}$ as $a \to 0$, we have an analogous result (Theorem 2.3).

The limit null distribution of T n,a

In this section, we derive the limit distribution of $T_{n,a}$ under the hypothesis $H_0$. In view of affine invariance, we assume without loss of generality that $X$ has the standard normal distribution $\mathrm{N}_d(0, I_d)$ in what follows. The starting point is the alternative representation

$$T_{n,a} = \int \|Z_n(t)\|^2\, w_a(t)\, \mathrm{d}t, \tag{9}$$

where

$$Z_n(t) = \frac{1}{\sqrt{n}}\sum_{j=1}^n \Big( Y_{n,j}\big(\cos(t^\top Y_{n,j}) + \sin(t^\top Y_{n,j})\big) - t\,\psi(t) \Big). \tag{10}$$
This representation follows from straightforward calculations. Under $H_0$, the summands in (10) are asymptotically centred, since

$$\mathbb{E}\big[X\big(\cos(t^\top X) + \sin(t^\top X)\big)\big] = t\,\psi(t), \quad t \in \mathbb{R}^d, \tag{11}$$

if $X \overset{\mathcal{D}}{=} \mathrm{N}_d(0, I_d)$. Writing $L^2 := L^2(\mathbb{R}^d, \mathcal{B}^d, w_a(t)\mathrm{d}t)$ for the separable Hilbert space of (equivalence classes of) functions $f: \mathbb{R}^d \to \mathbb{R}$ that are square integrable with respect to $w_a(t)\mathrm{d}t$, we regard $Z_n$ as a random element of the Hilbert space $\mathrm{H} := L^2 \otimes \cdots \otimes L^2$ ($d$ copies). Putting $f = (f_1, \ldots, f_d)^\top$, $g = (g_1, \ldots, g_d)^\top$, the space $\mathrm{H}$ is equipped with the inner product $\langle f, g \rangle_\mathrm{H} := \sum_{k=1}^d \int f_k(t)\, g_k(t)\, w_a(t)\, \mathrm{d}t$. The main theorem of this section is as follows.

Theorem 3.1. There is a centred Gaussian random element $Z$ of $\mathrm{H}$ with covariance matrix kernel $K(s,t)$ given in (12) such that $Z_n \overset{\mathcal{D}}{\longrightarrow} Z$ in $\mathrm{H}$, where $Z_n$ is the random element defined in (10).
Since the proof of Theorem 3.1 is long and tedious, it is deferred to Appendix A. From Theorem 3.1 and the continuous mapping theorem, we obtain the following result.
It is well-known that the distribution of $T_{\infty,a} := \|Z\|_\mathrm{H}^2$ admits the representation $T_{\infty,a} \overset{\mathcal{D}}{=} \sum_{j=1}^\infty \lambda_j(a) N_j^2$, where $N_1, N_2, \ldots$ is a sequence of i.i.d. standard normal random variables, and $\lambda_1(a), \lambda_2(a), \ldots$ are the positive eigenvalues associated with the integral operator

$$(\mathcal{K}f)(s) := \int K(s,t)\, f(t)\, w_a(t)\, \mathrm{d}t, \quad f \in \mathrm{H}. \tag{13}$$

In view of the complexity of $K(s,t)$, we did not succeed in obtaining closed-form expressions for these eigenvalues. In our simulation study presented in Section 5, we use approximate critical values for $T_{n,a}$ that have been obtained by means of simulations. Some information on the limit null distribution, however, is given by the following result.
Proof. From Fubini's theorem, it follows that $\mathbb{E}[T_{\infty,a}] = \int \mathbb{E}\|Z(t)\|^2\, w_a(t)\, \mathrm{d}t$. Moreover, writing $\mathrm{tr}$ for the trace, we have $\mathbb{E}\|Z(t)\|^2 = \mathrm{tr}\, K(t,t)$. Since $K(t,t)$ is available from (12), the assertion follows by straightforward computations.
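To indicate how critical values of a limit law of the form $\sum_j \lambda_j(a) N_j^2$ could be approximated once finitely many eigenvalues were available, here is a small base-R Monte Carlo sketch; the eigenvalues below are purely hypothetical placeholders, not eigenvalues of $\mathcal{K}$:

```r
# Monte Carlo approximation of upper quantiles of sum_j lambda_j * N_j^2
# for a *truncated*, hypothetical eigenvalue sequence.
set.seed(42)
lambda <- c(0.60, 0.25, 0.10, 0.05)                        # placeholder eigenvalues
sims <- replicate(1e5, sum(lambda * rnorm(length(lambda))^2))
quantile(sims, probs = c(0.90, 0.95, 0.99))                # approximate critical values
```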
In the univariate case, which is deliberately not excluded from our study, we have been able to calculate the first four cumulants of $T_{\infty,a}$. By the methods presented in Chapter 5 of [51], the $m$th cumulant of $T_{\infty,a}$ is given by

$$\kappa_m(a) = 2^{m-1}(m-1)! \int h_m(t,t)\, w_a(t)\, \mathrm{d}t.$$

Here, $h_1(s,t) = K(s,t)$, and $h_m(s,t) := \int h_{m-1}(s,u)\, K(u,t)\, w_a(u)\, \mathrm{d}u$, $m \ge 2$, denote the iterated kernels. In order to calculate $\kappa_m(a)$, $m \in \{1,2,3,4\}$, we used the computer algebra system Maple, see [41]. The formulae for $\kappa_3(a)$ and $\kappa_4(a)$ are given in the appendix.
For $\kappa_1(a)$ and $\kappa_2(a)$, we obtained explicit closed-form expressions in terms of $a$. From these cumulants, we obtain the expectation, the variance, as well as the skewness $\beta_1$ and the kurtosis $\beta_2$ of $T_{\infty,a}$ for the case $d = 1$ (see Table 1). In complete analogy with [17,24], we can now approximate the distribution of $T_{\infty,a}$ by that of the member of the Pearson system of distributions that has the same first four moments as $T_{\infty,a}$. To this end, we used the statistics software R, see [47], and the package PearsonDS, see [7]. Table 2 shows the quantiles of the fitted Pearson distribution, which serve as approximations of the corresponding quantiles of the distribution of $T_{\infty,a}$.
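In R, the moment-based Pearson fit can be carried out with the PearsonDS package cited above; the sketch below uses placeholder moment values rather than the actual moments of $T_{\infty,a}$ from Table 1:

```r
library(PearsonDS)
# Hypothetical first four moments (mean, variance, skewness, kurtosis);
# substitute the values derived from kappa_1(a), ..., kappa_4(a).
mom <- c(mean = 1, variance = 0.5, skewness = 1.8, kurtosis = 9)
fit <- pearsonFitM(moments = mom)            # selects the matching Pearson type
qpearson(c(0.90, 0.95, 0.99), params = fit)  # approximate quantiles of the fit
```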

Limit behavior of T n,a under alternatives
In this section, we assume that $H_0$ does not hold, and we derive limit distributions of $T_{n,a}$ both under contiguous and under fixed alternatives to $H_0$. To define the setting for a triangular array of contiguous alternatives, we assume that, for each $n \ge d+1$, $X_{n,1}, \ldots, X_{n,n}$ are i.i.d. $d$-variate random vectors having the Lebesgue density

$$f_n(x) = \varphi(x)\big(1 + n^{-1/2} g(x)\big), \quad x \in \mathbb{R}^d.$$

Here, $\varphi(x) = (2\pi)^{-d/2}\exp(-\|x\|^2/2)$, $x \in \mathbb{R}^d$, is the density of the distribution $\mathrm{N}_d(0, I_d)$, and $g$ is a bounded measurable function satisfying $\int g(x)\varphi(x)\,\mathrm{d}x = 0$. Notice that $f_n$ is nonnegative for sufficiently large $n$ due to the boundedness of $g$. To derive the limit distribution of $T_{n,a}$ under this sequence of alternatives, we employ the representation (9), which comprises the random element $Z_n$ defined in (10). For repeated later use, we put

$$\mathrm{CS}^\pm(t, x) := \cos(t^\top x) \pm \sin(t^\top x), \quad t, x \in \mathbb{R}^d. \tag{14}$$

Theorem 4.1. Under the triangular array $(X_{n,1}, \ldots, X_{n,n})_{n \ge d+1}$, we have $Z_n \overset{\mathcal{D}}{\longrightarrow} Z + c$ in $\mathrm{H}$. Here, $Z_n$ is defined in (10), $Z$ is the centred Gaussian random element of $\mathrm{H}$ figuring in Theorem 3.1, and the shift function $c(\cdot)$ is given in (15).

Proof. We write $\lambda^d$ for $d$-dimensional Lebesgue measure, and we put $P^{(n)} := \otimes_{j=1}^n (\varphi\lambda^d)$, $Q^{(n)} := \otimes_{j=1}^n (f_n\lambda^d)$. Furthermore, let $L_n := \mathrm{d}Q^{(n)}/\mathrm{d}P^{(n)}$. The boundedness of $g$ and a Taylor expansion give

$$\log L_n = \sum_{j=1}^n \log\big(1 + n^{-1/2} g(X_{n,j})\big) = \frac{1}{\sqrt{n}}\sum_{j=1}^n g(X_{n,j}) - \frac{1}{2n}\sum_{j=1}^n g(X_{n,j})^2 + o_P(1). \tag{16}$$

In the following, we write $\sigma^2 := \int g(x)^2\varphi(x)\,\mathrm{d}x < \infty$. Since, under $P^{(n)}$, expectation and variance of the sum figuring in (16) converge to $-\sigma^2/2$ and $\sigma^2$, respectively, the Lindeberg–Feller central limit theorem and Slutsky's lemma yield $\log L_n \overset{\mathcal{D}}{\longrightarrow} \mathrm{N}(-\sigma^2/2, \sigma^2)$ under $P^{(n)}$. Notice that the boundedness of $g$ ensures the validity of the Lindeberg condition. In view of Le Cam's first lemma (see, e.g., [39], p. 297), the probability measures $Q^{(n)}$ and $P^{(n)}$ are mutually contiguous. According to Theorem 3.1, the auxiliary process $Z_n^*$ introduced in (38) is tight under $P^{(n)}$ and thus, in view of contiguity, also under $Q^{(n)}$. Let $\{e_k, k \ge 1\}$ be an arbitrary complete orthonormal system of $\mathrm{H}$. It remains to show that, for each $\ell \ge 1$, the projection $\Pi_\ell Z_n$ converges in distribution under $Q^{(n)}$ to $\Pi_\ell(Z + c)$, where $\Pi_\ell$ denotes the orthogonal projection onto the linear subspace of $\mathrm{H}$ spanned by $e_1, \ldots, e_\ell$. We first consider $Z_n^*$ as given in (38), with the only difference that $X_j$ is throughout replaced with $X_{n,j}$. In view of Theorem 3.1, the asymptotic distribution of $Z_n^*$ under $P^{(n)}$ is centred Gaussian with the covariance operator induced by the kernel $K$, and, by Fubini's theorem, $c_j := \lim_{n\to\infty} \mathrm{Cov}\big(\langle Z_n^*, e_j\rangle_\mathrm{H}, \log L_n\big) = \langle c, e_j\rangle_\mathrm{H}$, where $c$ is given in (15). According to Le Cam's third lemma (see, e.g., [39], p. 300), it follows that $(\langle Z_n^*, e_1\rangle_\mathrm{H}, \ldots, \langle Z_n^*, e_\ell\rangle_\mathrm{H})$ converges under $Q^{(n)}$ in distribution to $(\langle Z + c, e_1\rangle_\mathrm{H}, \ldots, \langle Z + c, e_\ell\rangle_\mathrm{H})$. The assertion now follows from Slutsky's lemma since, in view of (41) and (42), $\|Z_n - Z_n^*\|_\mathrm{H}$ is asymptotically negligible under $P^{(n)}$ and thus, because of contiguity, also under $Q^{(n)}$.
As a corollary of Theorem 4.1 and the continuous mapping theorem, we have the following result.

Corollary 4.2. Under the triangular array of Theorem 4.1, $T_{n,a} \overset{\mathcal{D}}{\longrightarrow} \|Z + c\|_\mathrm{H}^2$.
We now consider fixed alternatives to $H_0$, and we suppose that the underlying distribution, in addition to being absolutely continuous, satisfies $\mathbb{E}\|X\|^4 < \infty$. In view of affine invariance, we assume $\mathbb{E}[X] = 0$ and $\mathbb{E}[XX^\top] = I_d$. Our first result is a strong limit of $T_{n,a}/n$ as $n \to \infty$.

Theorem 4.3. We have $T_{n,a}/n \overset{\text{a.s.}}{\longrightarrow} \Delta_a$, where

$$\Delta_a := \int \|z(t)\|^2\, w_a(t)\, \mathrm{d}t, \qquad z(t) := \mathbb{E}\big[X\, \mathrm{CS}^+(t, X)\big] - t\,\psi(t). \tag{19}$$

Proof. Invoking (9), we have $n^{-1} T_{n,a} = \|n^{-1/2} Z_n\|_\mathrm{H}^2$, where $Z_n$ is given in (10). By the strong law of large numbers in Hilbert spaces, it suffices to show that the effect of replacing $X_j$ with $Y_{n,j}$ is asymptotically negligible; with $\Delta_{n,j} := Y_{n,j} - X_j$, this boils down to showing that $n^{-1}\sum_{j=1}^n (1 + \|X_j\|)\|\Delta_{n,j}\|$ converges to zero almost surely. Since $\mathbb{E}\|X\|^4 < \infty$, Theorem 5.2 of [6] yields $n^{-1/4}\max_{j=1,\ldots,n} \|X_j\| \overset{\text{a.s.}}{\longrightarrow} 0$. Since the right-hand side of the resulting bound converges to 0 $P$-almost surely according to Proposition A.1 of [14], it follows that $n^{-1}\sum_{j=1}^n (1 + \|X_j\|)\|\Delta_{n,j}\| \overset{\text{a.s.}}{\longrightarrow} 0$, and the assertion follows from the triangle inequality.
As a corollary, we obtain the following result.
Corollary 4.4. The test for multivariate normality based on $T_{n,a}$ is consistent against each alternative distribution satisfying $\mathbb{E}\|X\|^4 < \infty$.
Proof. Let $\psi_X(t) = \mathbb{E}[\exp(i t^\top X)]$ be the CF of $X$. By straightforward calculations, $z \equiv 0$ holds if and only if $\nabla\psi_X(t) = -t\,\psi_X(t)$ for each $t \in \mathbb{R}^d$, where $z$ figures in the definition of $\Delta_a$ given in (19). Since, by Theorem 1.1, the latter holds if and only if $X \overset{\mathcal{D}}{=} \mathrm{N}_d(0, I_d)$, we have $\Delta_a = 0$ if and only if $X$ is standard normal, and the assertion follows from Theorem 4.3. Notice that, for each $a > 0$, $\Delta_a$ may be regarded as a measure of deviation from normality. The following result sheds some more light on $\Delta_a$.
Theorem 4.5. If $\mathbb{E}\|X\|^6 < \infty$, then $\Delta_a$ admits the expansion (20) as $a \to \infty$, as well as a corresponding expansion as $a \to 0$.

Proof. Straightforward calculations give $\Delta_a = I_{a,1} - I_{a,2} + I_{a,3}$, where $I_{a,1}$, $I_{a,2}$ and $I_{a,3}$ correspond to the three terms that arise upon expanding $\|z(t)\|^2$. Using addition theorems for the sine and the cosine function as well as (11) and (35), (36) and (37), it follows that each of these integrals can be evaluated in closed form. The Taylor expansions (7) and (8), together with $\mathbb{E}[X] = 0$, $\mathbb{E}[XX^\top] = I_d$ and $\mathbb{E}\|X\|^6 < \infty$, then yield the expansion as $a \to \infty$. Upon summarizing, the assertion follows. The second statement is proved using similar arguments.
We remark in passing that the first term on the right-hand side of (20) is the population measure of multivariate skewness in the sense of Móri, Rohatgi and Székely [44], and $\mathbb{E}[(X_1^\top X_2)^3]$ is population skewness in the sense of Mardia [42]. Thus, Theorem 4.5 can be regarded as the 'population counterpart' of Theorems 2.2 and 2.3.
In [3] it was observed that, in the context of goodness-of-fit testing of a general parametric hypothesis $H_0$ (say), weighted $L^2$-statistics have a normal limit under fixed alternatives to $H_0$. To state such a result in our setting, we first introduce some notation. Again, we write $\psi_X(t) = \mathbb{E}[\exp(i t^\top X)]$ for the CF of $X$, and we put $\psi_X^\pm(t) := \mathbb{E}[\mathrm{CS}^\pm(t, X)]$, $t \in \mathbb{R}^d$. We then have the following result.

Theorem 4.6. If $\mathbb{E}\|X\|^4 < \infty$, then

$$\sqrt{n}\Big(\frac{T_{n,a}}{n} - \Delta_a\Big) \overset{\mathcal{D}}{\longrightarrow} \mathrm{N}(0, \sigma_a^2).$$

Here,

$$\sigma_a^2 := 4 \int\!\!\int z(s)^\top L(s,t)\, z(t)\, w_a(s)\, w_a(t)\, \mathrm{d}s\, \mathrm{d}t, \tag{23}$$

and $L(s,t)$ is defined in (22).
Proof. The basic observation is that, with $Z_n$ defined in (10) and $V_n := Z_n - \sqrt{n}\, z$,

$$\sqrt{n}\Big(\frac{T_{n,a}}{n} - \Delta_a\Big) = 2\,\langle V_n, z \rangle_\mathrm{H} + \frac{1}{\sqrt{n}}\,\|V_n\|_\mathrm{H}^2, \tag{25}$$

and that

$$V_n \overset{\mathcal{D}}{\longrightarrow} V \tag{26}$$

for some centred Gaussian random element $V$ of $\mathrm{H}$ having covariance matrix kernel $L(s,t)$ given in (22). The proof of (26) is completely analogous to that of Theorem 3.1 and is therefore omitted. In view of (26), the second summand in (25) is $o_P(1)$, and the first converges in distribution to $2\langle V, z \rangle_\mathrm{H}$ by the continuous mapping theorem. The distribution of $2\langle V, z \rangle_\mathrm{H}$ is the normal distribution $\mathrm{N}(0, \sigma_a^2)$. Using Slutsky's lemma, Theorem 4.6 yields the following asymptotic confidence interval for $\Delta_a$.
Corollary 4.7. If $(\widehat{\sigma}_{n,a}^2)_{n \ge 1}$ is a consistent sequence of estimators for $\sigma_a^2$, and if $\sigma_a^2 > 0$, then

$$I_{n,a,\alpha} := \Big[\frac{T_{n,a}}{n} - \frac{\widehat{\sigma}_{n,a}}{\sqrt{n}}\, u_{1-\alpha/2},\ \frac{T_{n,a}}{n} + \frac{\widehat{\sigma}_{n,a}}{\sqrt{n}}\, u_{1-\alpha/2}\Big],$$

where $u_{1-\alpha/2}$ denotes the $(1-\alpha/2)$-quantile of the standard normal distribution, is an asymptotic confidence interval with level $1-\alpha$ for $\Delta_a$.
A necessary and sufficient condition for $\sigma_a^2 > 0$ is that the function $\mathbb{R}^d \ni s \mapsto \int L(s,t)\, z(t)\, w_a(t)\, \mathrm{d}t$ does not vanish $\lambda^d$-almost everywhere, see Remark 1 of [3].
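For concreteness, the interval of Corollary 4.7 (in the two-sided Wald form reconstructed above, which is our reading of the lost display) can be coded as follows; all inputs are hypothetical:

```r
# Asymptotic level-(1 - alpha) confidence interval for Delta_a, given the value
# Tna of T_{n,a}, the sample size n, and a consistent estimate sigma_hat of
# sigma_a (e.g., obtained via Theorem 4.8).
ci_delta_a <- function(Tna, n, sigma_hat, alpha = 0.05) {
  center <- Tna / n
  half   <- qnorm(1 - alpha / 2) * sigma_hat / sqrt(n)
  c(lower = center - half, upper = center + half)
}
```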
To construct a consistent sequence of estimators of $\sigma_a^2$, we replace $z(s)$, $z(t)$ and $L(s,t)$ figuring in (23) with suitable empirical counterparts. In view of (22) and (21) and the fact that $\nabla\psi_X^+(t) = \mathbb{E}[X\,\mathrm{CS}^-(t, X)]$, all occurring population quantities can be estimated by sample means: the quantities $\Psi_{\ell,n}(t)$, $\ell \in \{1,2,3,4\}$, defined in (29) and (30) estimate the corresponding expectations, the empirical kernel $L_n(s,t)$ is defined in (27) and (28), and the empirical counterpart $z_n(t)$ of $z(t)$ is defined in (31). We then have the following result.
Theorem 4.8. Let

$$\widehat{\sigma}_{n,a}^2 := 4 \int\!\!\int z_n(s)^\top L_n(s,t)\, z_n(t)\, w_a(s)\, w_a(t)\, \mathrm{d}s\, \mathrm{d}t,$$

where $L_n(s,t)$ and $z_n(t)$ are defined in (27) and (31), respectively. If $\mathbb{E}\|X\|^4 < \infty$, then $(\widehat{\sigma}_{n,a}^2)$ is a consistent sequence of estimators for $\sigma_a^2$, i.e., we have $\widehat{\sigma}_{n,a}^2 \overset{P}{\longrightarrow} \sigma_a^2$. Moreover,

$$\widehat{\sigma}_{n,a}^2 = \sum_{i,j=1}^5 \widehat{\sigma}_{n,a}^{\,i,j}, \tag{32}$$

where $\widehat{\sigma}_{n,a}^{\,i,j}$ is given in (47).
Since the proof of Theorem 4.8 is long and tedious, it is deferred to Appendix A. We stress that the representation (32) does not comprise any integral, which means that $\widehat{\sigma}_{n,a}^2$ is a feasible estimator. We close this section with an example that illustrates the feasibility of the asymptotic confidence interval. To this end, we consider the following standardized symmetric alternatives to normality. Firstly, we consider an alternative for which $z(t)$ admits a closed form whose $j$th component involves $\nabla\varphi(t)^{(j)}$, the $j$th component of $\nabla\varphi(t)$. Secondly, we consider a Laplace distribution with i.i.d. standardized marginals. Finally, let $X$ have a logistic distribution with i.i.d. marginals, denoted by $\mathrm{Logistic}(0, \sqrt{3}/\pi)^d$. In each case, $\Delta_a$ has been computed by numerical integration. The resulting values are displayed in Table 3.
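As a hedged illustration of the numerical integration behind Table 3, consider the univariate standardized Laplace case: by symmetry $\mathbb{E}[X\cos(tX)] = 0$, and with scale $b = 1/\sqrt{2}$ (unit variance) one finds $\mathbb{E}[X\sin(tX)] = 2b^2 t/(1 + b^2 t^2)^2$, so that $z(t) = 2b^2 t/(1+b^2t^2)^2 - t\, e^{-t^2/2}$. The following sketch (our own derivation, assuming the weight $w_a(t) = e^{-a t^2}$) integrates $z(t)^2 w_a(t)$ numerically:

```r
# Numerical evaluation of Delta_a for the standardized Laplace law (d = 1).
delta_a_laplace <- function(a, b = 1 / sqrt(2)) {
  z <- function(t) 2 * b^2 * t / (1 + b^2 * t^2)^2 - t * exp(-t^2 / 2)
  integrate(function(t) z(t)^2 * exp(-a * t^2), -Inf, Inf)$value
}
delta_a_laplace(0.5)   # compare with the corresponding entry of Table 3
```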
By means of a Monte Carlo study, we estimated the coverage probability of the confidence interval $I_{n,a,\alpha}$ figuring in Corollary 4.7 for $a \in \{0.5, 1, 2, 5\}$, $d \in \{1, 2\}$, and the sample sizes $n \in \{10, 20, 30, 50, 100, 200\}$. The nominal level is 0.95, and the number of replications is 10000. Simulations have been carried out with the statistics software R, see [47]. In particular, we used the package extraDistr, see [59], to generate variates from the Laplace distribution.
The results are displayed in Table 4. As can be seen, the empirical coverage converges to the nominal level, although convergence is visibly slower in higher dimensions. For larger values of the tuning parameter $a$, the confidence interval tends to be too wide, so we conjecture that the asymptotic interval can be improved.

Simulations
This section presents the results of a Monte Carlo study, with the aim of comparing the power of the proposed tests with that of prominent competitors against selected alternatives. We used the statistics software R, see [47], and we employed the package MonteCarlo, see [38], which allows for parallel computing. In addition, we used the package expm, see [22], for the standardization of the data. Critical values for the test statistic have been estimated by means of extensive simulations (100000 replications), and they are displayed in Table 5 for the weight parameters $a \in \{0.5, 1, 2, 5, 10, \infty\}$ and the sample sizes $n \in \{20, 50, 100\}$. Throughout, the level of significance is $\alpha = 0.05$. For the sake of comparison, Table 5 also displays the approximate critical values of $T_{\infty,a}$ in the special case $d = 1$, which have been obtained in Section 3 by fitting a distribution of the Pearson family via the first four moments. As already mentioned in Section 2, the test statistic $T_{n,\infty}$ is a linear combination of skewness in the sense of Mardia [42] and skewness in the sense of Móri, Rohatgi and Székely [44], and it equals the statistic $\mathrm{HV}_\infty$ of Henze–Visagie, see [31].
The first three of these tests are well-known. The CvM-test and the AD-test have been implemented with the R package nortest, see [23], which contains the functions cvm.test and ad.test, and for the SW-test we used the function shapiro.test of the stats package. The test statistics BHEP and HV will be explained in (33) and (34), respectively.
For the BHEP-test and the HV-test, critical values have been simulated with 100000 replications. These values and those of Table 5 for the novel test statistics have been employed to assess the power of the various tests against several alternatives. Table 6 exhibits percentages of rejection based on 100000 replications. An asterisk denotes a power of 100%, and the best-performing test for each alternative is marked in boldface. The choice of alternatives follows that of [31]. The acronym NMix1 denotes a mixture of the normal distributions N(0, 1) and N(3, 1) with weights 0.9 and 0.1, respectively.
The novel tests outperform the selected competitors for the $t_3$-distribution, the $\chi^2(15)$-distribution and the distribution NMix1, and they keep up with the other procedures against the remaining alternatives. For most of the alternatives, the power does not change much as the weight parameter $a$ varies. A notable exception is the uniform distribution $\mathrm{U}(-\sqrt{3}, \sqrt{3})$, against which the power breaks down for larger tuning parameters, a feature shared by the HV-test.
A recent synopsis of tests for multivariate normality is given in [18]. Just as the novel procedure, the BHEP-test (see [32]) is based on the empirical characteristic function (ECF). More precisely, it employs the test statistic

$$\mathrm{BHEP}_a := n \int \big| \psi_n(t) - \psi(t) \big|^2\, \varphi_a(t)\, \mathrm{d}t, \tag{33}$$

where $\varphi_a(t) = (2\pi a^2)^{-d/2}\exp(-\|t\|^2/(2a^2))$, and $\psi_n(t)$ and $\psi(t)$ are given in (3) and (2), respectively. An alternative representation for $\mathrm{BHEP}_a$ is

$$\mathrm{BHEP}_a = \frac{1}{n}\sum_{j,k=1}^n \exp\Big(-\frac{a^2}{2}\|Y_{n,j} - Y_{n,k}\|^2\Big) - \frac{2}{(1+a^2)^{d/2}}\sum_{j=1}^n \exp\Big(-\frac{a^2\|Y_{n,j}\|^2}{2(1+a^2)}\Big) + \frac{n}{(1+2a^2)^{d/2}}.$$

In our study, we used the special value $a = 1$.
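For illustration, the closed form of $\mathrm{BHEP}_a$ above can be coded directly in base R; this is our own sketch (the tests are also available in the packages cited in this paper), with `Y` the $n \times d$ matrix of scaled residuals:

```r
# BHEP statistic in its integral-free form; a is the smoothing parameter.
bhep <- function(Y, a = 1) {
  n  <- nrow(Y); d <- ncol(Y)
  D2 <- as.matrix(dist(Y))^2                 # squared distances ||Y_j - Y_k||^2
  R2 <- rowSums(Y^2)                         # squared norms ||Y_j||^2
  sum(exp(-a^2 * D2 / 2)) / n -
    2 * (1 + a^2)^(-d / 2) * sum(exp(-a^2 * R2 / (2 * (1 + a^2)))) +
    n * (1 + 2 * a^2)^(-d / 2)
}
```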
The recent test of Henze–Visagie, see [31], is the 'moment generating function analogue' of our novel test statistic. It employs the test statistic

$$\mathrm{HV}_a := n \int \big\| \nabla M_n(t) - t\, M_n(t) \big\|^2 \exp(-a\|t\|^2)\, \mathrm{d}t, \tag{34}$$

where $M_n(t) = n^{-1}\sum_{j=1}^n \exp(t^\top Y_{n,j})$ is the empirical moment generating function of the scaled residuals. An alternative, integral-free representation of $\mathrm{HV}_a$ is given in [31]. In our comparative study, we put $a = 5$, as recommended in [31].
The rationale of the energy test of Székely and Rizzo, see [54], is based on the fact that, if $X$ and $Y$ are independent integrable $d$-dimensional random vectors and $X'$, $Y'$ denote independent copies of $X$ and $Y$, respectively, then

$$2\,\mathbb{E}\|X - Y\| - \mathbb{E}\|X - X'\| - \mathbb{E}\|Y - Y'\| \ge 0.$$

Here, equality holds if and only if $X \overset{\mathcal{D}}{=} Y$. The statistic of the energy test for multivariate normality is

$$\mathrm{EN} := n\bigg( \frac{2}{n}\sum_{j=1}^n \mathbb{E}\big\|\widetilde{Y}_{n,j} - Z_1\big\| - \mathbb{E}\|Z_1 - Z_2\| - \frac{1}{n^2}\sum_{j,k=1}^n \big\|\widetilde{Y}_{n,j} - \widetilde{Y}_{n,k}\big\| \bigg).$$

Here, $\widetilde{Y}_{n,j} = \sqrt{n/(n-1)}\, Y_{n,j}$, and $Z_1, Z_2$ are i.i.d. with the normal distribution $\mathrm{N}_d(0, I_d)$ and independent of $Y_{n,1}, \ldots, Y_{n,n}$. To calculate $\mathrm{EN}$, one needs closed-form expressions for $\mathbb{E}\|y - Z_1\|$, $y \in \mathbb{R}^d$, and for $\mathbb{E}\|Z_1 - Z_2\|$; these are given in [54].
The R-package energy [48] contains the function mvnorm.etest to calculate EN. Note that all of the mentioned procedures are also implemented in the R-package mnt, see [9].
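A minimal usage example of the energy test via the energy package (toy data under $H_0$; `R` is the number of Monte Carlo replicates used to calibrate the test):

```r
library(energy)
set.seed(1)
x <- matrix(rnorm(50 * 3), nrow = 50, ncol = 3)  # n = 50 observations in d = 3
mvnorm.etest(x, R = 199)                         # energy test of multivariate normality
```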
Just as in the case $d = 1$, we first simulated critical values using 100000 replications. With the same number of replications, we then simulated the power of the tests under discussion against selected alternatives. Again, the choice of alternatives follows that of [31]. Tables 7, 8 and 9 display percentages of rejection of $H_0$ for dimensions $d = 2$, $d = 3$ and $d = 5$, respectively; an asterisk again denotes a power of 100%. To generate pseudo-random numbers, we used the R packages mvtnorm, see [20], and PearsonDS, see [7]. Suppressing the dimension $d$, the distribution NMix1 is a mixture of the normal distributions $\mathrm{N}_d(0, I_d)$ and $\mathrm{N}_d(3, I_d)$ with mixing proportions 0.9 and 0.1, respectively. Here, 3 stands for the $d$-dimensional vector with each component equal to 3. Likewise, NMix2 denotes a mixture of the normal distributions $\mathrm{N}_d(0, I_d)$ and $\mathrm{N}_d(0, B_d)$ with mixing proportions 0.1 and 0.9, respectively. Here, $B_d$ is the $d \times d$ matrix with 1 in each diagonal entry and 0.9 in each off-diagonal entry.
The novel tests outperform their competitors for some alternatives, notably for the $\chi^2$-, the $\Gamma$-, and the NMix-distributions, but they can also keep up for the other alternatives. However, just as in the univariate case, the power is extremely low against the uniform distribution $\mathrm{U}(-\sqrt{3}, \sqrt{3})$, a feature shared by the HV-test. Based on the results of this simulation study, we recommend the choice $a = 5$ for the tuning parameter, since it leads to competitive power against nearly all of the alternatives considered.

A real data example
The Black–Scholes–Merton model is a stochastic model for the dynamics of a financial market that contains derivative investment instruments. One of the basic assumptions of this model is the normality of the log returns of stocks and indexes. To test the hypothesis of joint normality of the log returns of several indexes, we consider the five stock indexes Standard & Poor's 500 (^GSPC), Dow Jones Industrial Average (^DJI), NASDAQ Composite (^IXIC), DAX Performance Index (^GDAXI), and EURO STOXX 50 (^STOXX50E) over a period of 50 trading days, starting July 1st, 2017. The data (daily closing prices) were obtained by means of the R package quantmod, see [49]. To make the independence assumption between the realisations plausible, we left a gap of 10 trading days between consecutive five-dimensional observations. Figure 1 shows a plot of the two-dimensional projections of the log returns. For each value $a \in \{0.5, 1, 2, 5, 10\}$ of the weight parameter, we performed a Monte Carlo simulation based on 100000 replications in order to estimate the p-value of the observations. The empirical p-values are displayed in Table 10. As can be seen, the hypothesis of multivariate normality of the log returns of the selected stock indexes is rejected at the 1%-level for each of the choices of the weight parameter $a$.
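The data retrieval step can be sketched with quantmod as follows; ticker, date range, and the thinning are illustrative placeholders rather than the authors' exact code:

```r
library(quantmod)
getSymbols("^GSPC", src = "yahoo", from = "2017-07-01", to = "2017-12-31")
prices <- Cl(GSPC)                 # daily closing prices of the S&P 500
logret <- diff(log(prices))[-1]    # daily log returns (drop the leading NA)
thinned <- logret[seq(1, length(logret), by = 10)]  # keep every 10th trading day
head(thinned)
```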

Summary and outlook
We have proposed a novel class of tests of normality in any dimension, based on an initial value problem connected to a multivariate Stein equation, which characterises the multivariate standard normal law. We derived asymptotic theory under the null hypothesis as well as under contiguous and fixed alternatives. Moreover, we proved consistency against each alternative distribution satisfying a weak moment condition, and we provided insight into the behaviour of the test statistic under fixed alternatives by deriving an asymptotic confidence interval for $\Delta_a$ and by providing a consistent estimator of the limiting variance $\sigma_a^2$. Monte Carlo simulations show that the methods work as expected, and that the new family of tests is a strong class of competitors to established procedures.
A first open question for further research is to find explicit formulae or numerically stable approximations for the eigenvalues $\lambda_j(a)$, $j = 1, 2, \ldots$, associated with the integral operator $\mathcal{K}$ in (13). We also leave as an open problem the calculation of higher cumulants of $T_{\infty,a}$ for dimensions $d > 1$. Results of this kind would pave the way both for efficient approximation methods for the computation of critical values that avoid Monte Carlo simulation and for efficiency statements, since the largest eigenvalue has a crucial influence on the approximate Bahadur efficiency, see [2,45]. A promising new field of interest in connection with tests of multivariate normality is their behaviour in high-dimensional settings, i.e., the question whether one can find a suitable rescaling and shifting of the test statistic that yields a non-trivial limit distribution under a limiting regime in which, e.g., $n, d \to \infty$ such that $d/n \to \tau \in [0, \infty]$. For first results, see [10]. As a starting point, we conjecture that such a standardization exists along suitable sequences $(n_d)_{d\in\mathbb{N}}$. Finally, it would be of interest to consider the related family of test statistics $S_{n,a}$ that originates from $T_{n,a}$ by replacing the theoretical CF $\psi(t)$ figuring in (10) with its empirical counterpart. Note that, in the univariate case, this family has been studied extensively in [17], but the generalisation to higher dimensions is still open. We conjecture that results similar to those derived in Sections 2 to 4 hold for $S_{n,a}$.

Acknowledgment
The authors thank Yvik Swan for sharing his knowledge of multivariate Stein operators and Stein characterisations.

A Proofs
A.1 Proof of Theorem 2.1

Proof. Starting from the definition of $T_{n,a}$ in (4) and expanding the squared norm, some algebra (using symmetry and the addition theorem for the cosine function), together with known closed-form Gaussian-type integrals with respect to $w_a$, yields explicit expressions for all terms involving $\cos(t^\top Y_{n,j})$ and $\sin(t^\top Y_{n,j})$. Upon combining these expressions, the assertion follows readily.
A.2 Proof of Theorem 3.1

Proof. Recall that, in view of affine invariance, there is no loss of generality if we assume $X \overset{\mathcal{D}}{=} \mathrm{N}_d(0, I_d)$. With the notation in (14), $Z_n$ defined in (10) takes the form

$$Z_n(t) = \frac{1}{\sqrt{n}}\sum_{j=1}^n \big( Y_{n,j}\,\mathrm{CS}^+(t, Y_{n,j}) - t\,\psi(t) \big).$$

To prove Theorem 3.1, we use a central limit theorem for Hilbert space valued random elements, see, e.g., Theorem 2.7 of [8]. Since $Z_n$ does not comprise independent summands, we approximate $Z_n$ by a sum of i.i.d. random elements of $\mathrm{H}$. To this end, we introduce the auxiliary random elements $Z_n^*$ (defined in (38), together with $\Delta_{n,j}$ in (39)) and the i.i.d. elements $Z_1^{**}, Z_2^{**}, \ldots$. The proof of Theorem 3.1 comprises three steps: we show the central limit theorem (40) as well as the negligibility statements (41) and (42). The assertion then follows from Slutsky's lemma.

To prove (40), notice that $Z_1^{**}, Z_2^{**}, \ldots$ is a sequence of i.i.d. random elements of $\mathrm{H}$. These elements are centred, since $\mathbb{E}[X\,\mathrm{CS}^+(t, X)] = t\,\psi(t)$ by (11). The covariance matrix kernel $\mathbb{E}[Z_n^*(s) Z_n^*(t)^\top] = \mathbb{E}[Z_1^{**}(s) Z_1^{**}(t)^\top] =: K(s,t)$ (say), $s, t \in \mathbb{R}^d$, is obtained as follows: in view of $\mathbb{E}[X] = 0$ and $\mathbb{E}[XX^\top] = I_d$, tedious but straightforward calculations, in which the occurring Gaussian expectations are evaluated in closed form (one of them being $t s^\top + (s^\top t)\, I_d$), show that $K(s,t)$ takes the form given in (12). Thus, by the central limit theorem in Hilbert spaces, (40) follows.

To prove (41), notice that $\cos(t^\top Y_{n,j}) = \cos(t^\top X_j) - \sin(t^\top X_j)\, t^\top\Delta_{n,j} + \varepsilon_{n,j}(t)$ and $\sin(t^\top Y_{n,j}) = \sin(t^\top X_j) + \cos(t^\top X_j)\, t^\top\Delta_{n,j} + \eta_{n,j}(t)$, where $\max(|\varepsilon_{n,j}(t)|, |\eta_{n,j}(t)|) \le \|t\|^2 \|\Delta_{n,j}\|^2$.
The resulting decomposition consists of three expressions $A_n$, $B_n$ and $C_n$ (say). Invoking Proposition A.1 of [14], according to which $n^{1/4}\max_{j=1,\ldots,n}\|\Delta_{n,j}\| \overset{\text{a.s.}}{\longrightarrow} 0$ and $\sum_{j=1}^n \|\Delta_{n,j}\|^2 = O_P(1)$, it is readily seen that each of the expressions $A_n$, $B_n$ and $C_n$ converges to zero in probability as $n \to \infty$. In view of these facts, the proof of (41) is finished.

To prove (42), we bound the difference of the two processes. Using the triangle inequality, some calculations give $\|Z_n - Z_n^*\|_\mathrm{H} \le \|A_n\|_\mathrm{H} + \|B_n\|_\mathrm{H}$ (with new meanings of $A_n$ and $B_n$), and thus (42) follows if we can show that $\|A_n\|_\mathrm{H} = o_P(1)$ and $\|B_n\|_\mathrm{H} = o_P(1)$. We only prove $\|A_n\|_\mathrm{H} = o_P(1)$, since the reasoning for $\|B_n\|_\mathrm{H} = o_P(1)$ is completely similar. From the definition of $\Delta_{n,j}$ in (39), we have a decomposition $A_n = A_{n,1} + A_{n,2} + A_{n,3} + A_{n,4}$ (say), and thus it remains to prove that each of $\|A_{n,k}\|_\mathrm{H}$, $k \in \{1,2,3,4\}$, is $o_P(1)$. Letting $\|\cdot\|_2$ denote the spectral norm, $\|A_{n,1}\|_\mathrm{H}^2$ is bounded by a product whose first factor is $O_P(1)$ and whose second factor converges to zero almost surely because of the strong law of large numbers in $\mathrm{H}$. As for $\|A_{n,2}\|_\mathrm{H}^2$, each of the first two factors of the corresponding bound is $O_P(1)$, and the last one converges to zero almost surely because of the strong law of large numbers in $L^2$. The terms $\|A_{n,3}\|_\mathrm{H}^2$ and $\|A_{n,4}\|_\mathrm{H}^2$ are bounded from above by expressions in which, by display (2.13) of [32], the factor preceding the integral is $o_P(1)$. The proof of Theorem 3.1 is completed.

A.3 Proof of Theorem 4.8
Proof. Since the proof is analogous to that given in [14], it will only be sketched. The first observation is that the quantities $\Psi_{\ell,n}(t)$, $\ell \in \{1,2,3,4\}$, defined in (29) and (30), have almost sure limits: the convergence of $\Psi_{3,n}(t)$ is assertion a) of Lemma 6.6 of [14], and the remaining claims follow mutatis mutandis from the reasoning given in the proof of Lemma 6.6 of [14]. From (27) and (28), we have $L_n(s,t) = \sum_{i,j=1}^5 L_n^{i,j}(s,t)$, where $L_n^{i,j}(s,t) = L_n^{j,i}(t,s)^\top$ and where, putting $I_{n,j}^\pm := Y_{n,j} Y_{n,j}^\top \pm I_d$, the summands $L_n^{i,j}(s,t)$ are sample means built from the quantities $Y_{n,j}\mathrm{CS}^+(s, Y_{n,j})$, $\Psi_{1,n}$, $\Psi_{3,n}$ and $I_{n,j}^+$ (for instance, terms of the type $Y_{n,j}\mathrm{CS}^+(s, Y_{n,j})\, Y_{n,j}^\top \Psi_{1,n}(t)$, $Y_{n,j}\mathrm{CS}^+(s, Y_{n,j})\, \Psi_{3,n}(t)^\top I_{n,j}^+$ and $Y_{n,j}^\top \Psi_{1,n}(s)\, Y_{n,j}^\top \Psi_{1,n}(t)$ occur). From (44), it follows that $\widehat{\sigma}_{n,a}^2 = \sum_{i,j=1}^5 \widehat{\sigma}_{n,a}^{\,i,j}$, where

$$\widehat{\sigma}_{n,a}^{\,i,j} = 4 \int\!\!\int z_n(s)^\top L_n^{i,j}(s,t)\, z_n(t)\, w_a(s)\, w_a(t)\, \mathrm{d}s\, \mathrm{d}t. \tag{45}$$

Notice that $\widehat{\sigma}_{n,a}^{\,i,j} = \widehat{\sigma}_{n,a}^{\,j,i}$. In view of (22) and (21), we have $L(s,t) = \sum_{i,j=1}^5 L^{i,j}(s,t)$, where $L^{i,j}(s,t) = \mathbb{E}[w_i(s,X)\, w_j(t,X)^\top]$ and, among others, $w_1(t,X) = X\,\mathrm{CS}^+(t,X)$, $w_2(t,X) = -X\,\psi_X^+(t)$, $w_3(t,X) = -t^\top X\, \nabla\psi_X^+(t)$. Therefore, $\sigma_a^2 = \sum_{i,j=1}^5 \sigma_a^{i,j}$, where $\sigma_a^{i,j} = 4\int\!\!\int z(s)^\top L^{i,j}(s,t)\, z(t)\, w_a(s) w_a(t)\,\mathrm{d}s\,\mathrm{d}t$ and, by symmetry, $L^{i,j}(s,t) = L^{j,i}(t,s)^\top$ and hence $\sigma_a^{i,j} = \sigma_a^{j,i}$. We thus have to prove $\widehat{\sigma}_{n,a}^{\,i,j} \overset{P}{\longrightarrow} \sigma_a^{i,j}$ for each choice of $i, j \in \{1, \ldots, 5\}$. To this end, we proceed in two steps. The first is to replace $L_n^{i,j}(s,t)$ in (45) with $L_{n,0}^{i,j}(s,t)$. Here, $L_{n,0}^{i,j}(s,t)$ originates from $L_n^{i,j}(s,t)$ by throughout replacing $Y_{n,j}$ with $X_j$, and this replacement also affects the quantities $\Psi_{\ell,n}(t)$, $\ell \in \{1, \ldots, 4\}$. Moreover, we replace $z_n(t)$ with $z_{n,0}(t) = n^{-1}\sum_{j=1}^n X_j\,\mathrm{CS}^+(t, X_j) - t\,\psi(t)$. Putting $\widehat{\sigma}_{n,0,a}^{\,i,j} := 4\int\!\!\int z_{n,0}(s)^\top L_{n,0}^{i,j}(s,t)\, z_{n,0}(t)\, w_a(s) w_a(t)\,\mathrm{d}s\,\mathrm{d}t$, it follows from Fubini's theorem and the strong law of large numbers that $\widehat{\sigma}_{n,0,a}^{\,i,j} \overset{P}{\longrightarrow} \sigma_a^{i,j}$. The second, much more technical step is to prove $\widehat{\sigma}_{n,a}^{\,i,j} - \widehat{\sigma}_{n,0,a}^{\,i,j} = o_P(1)$. To this end, notice that

$$z_n(s)^\top L_n^{i,j}(s,t)\, z_n(t) - z_{n,0}(s)^\top L_{n,0}^{i,j}(s,t)\, z_{n,0}(t) = z_n(s)^\top\big(L_n^{i,j}(s,t) - L_{n,0}^{i,j}(s,t)\big) z_n(t) + \big(z_n(s) - z_{n,0}(s)\big)^\top L_{n,0}^{i,j}(s,t)\, z_n(t) + z_{n,0}(s)^\top L_{n,0}^{i,j}(s,t)\big(z_n(t) - z_{n,0}(t)\big), \tag{46}$$

where, with $\|\cdot\|_2$ denoting the spectral norm,

$$\big|\big(z_n(s) - z_{n,0}(s)\big)^\top L_{n,0}^{i,j}(s,t)\, z_n(t)\big| \le \|z_n(s) - z_{n,0}(s)\|\, \|L_{n,0}^{i,j}(s,t)\|_2\, \|z_n(t)\|,$$
$$\big|z_{n,0}(s)^\top L_{n,0}^{i,j}(s,t)\big(z_n(t) - z_{n,0}(t)\big)\big| \le \|z_{n,0}(s)\|\, \|L_{n,0}^{i,j}(s,t)\|_2\, \|z_n(t) - z_{n,0}(t)\|.$$

We have $\|z_{n,0}(t)\| \le 2 n^{-1}\sum_{j=1}^n \|X_j\| + \|t\|\,\psi(t)$, and a Taylor expansion yields an analogous bound for $\|z_n(t) - z_{n,0}(t)\|$ in terms of the $\Delta_{n,j}$. Notice that each of the terms $\|L_{n,0}^{i,j}(s,t)\|_2$ is bounded from above by terms of the type $2^k \|s\|^\ell \|t\|^m$, multiplied by finitely many products of the type $n^{-1}\sum_{j=1}^n \|X_j\|^\beta$, with $k \le 2$, $\ell, m \in \{0,1\}$, and $\beta \in \{1,2,3,4\}$. In view of the condition $\mathbb{E}\|X\|^4 < \infty$ and the fact that $n^{-1}\sum_{j=1}^n \|\Delta_{n,j}\|^k \|X_j\|^\ell \overset{\text{a.s.}}{\longrightarrow} 0$ (see Proposition A.2 of [14]), it follows that the second and the third term on the right-hand side of (46), integrated with respect to $w_a(s) w_a(t)\,\mathrm{d}s\,\mathrm{d}t$, are $o_P(1)$. As a consequence, we only have to consider the first term on the right-hand side of (46). To find an upper bound for $\|L_n^{i,j}(s,t) - L_{n,0}^{i,j}(s,t)\|_2$, we have to consider each case $i, j \in \{1, \ldots, 5\}$ such that $i \le j$ separately. We will elaborate on the case $i = j = 1$; the other cases are treated similarly. A Taylor expansion yields

$$\|L_n^{1,1}(s,t) - L_{n,0}^{1,1}(s,t)\|_2 \le \frac{4}{n}\sum_{j=1}^n \|X_j\|^2\big(\|t\|\|\Delta_{n,j}\| + \|s\|\|\Delta_{n,j}\|\big) + \frac{4}{n}\sum_{j=1}^n \|X_j\|^2 \|s\|\|t\|\|\Delta_{n,j}\|^2 + \frac{8}{n}\sum_{j=1}^n \|X_j\|\|\Delta_{n,j}\|\big(1 + \|s\|\|\Delta_{n,j}\|\big)\big(1 + \|t\|\|\Delta_{n,j}\|\big) + \frac{4}{n}\sum_{j=1}^n \|\Delta_{n,j}\|^2\big(1 + \|s\|\|\Delta_{n,j}\|\big)\big(1 + \|t\|\|\Delta_{n,j}\|\big).$$

By Proposition A.2 of [14], the right-hand side of this bound, integrated together with $\|z_n(s)\|\|z_n(t)\|$ with respect to $w_a(s) w_a(t)\,\mathrm{d}s\,\mathrm{d}t$, converges to zero in probability, whence $\widehat{\sigma}_{n,a}^{\,1,1} - \widehat{\sigma}_{n,0,a}^{\,1,1} = o_P(1)$.

To prove (32), we need the integrals

$$P_{1,a}^{i,j} := Y_{n,i}^\top Y_{n,j}\, I_{1,a}(Y_{n,i}, Y_{n,j}) - L_{1,a}(Y_{n,j})^\top Y_{n,j},$$
$$P_{2,a}^{i,j,k} := Y_{n,i}^\top Y_{n,j}\, I_{1,a}(Y_{n,i}, Y_{n,k}) - L_{1,a}(Y_{n,k})^\top Y_{n,j},$$
$$P_{3,a}^{i,j,k} := Y_{n,i}^\top Y_{n,k}\, Y_{n,j}^\top I_{2,a}(Y_{n,i}, Y_{n,k}) - Y_{n,j}^\top L_{2,a}(Y_{n,k}),$$

where $I_{1,a}$, $I_{2,a}$, $L_{1,a}$ and $L_{2,a}$ denote the closed-form weighted integrals introduced for this purpose.