Testing multivariate normality by zeros of the harmonic oscillator in characteristic function spaces

We study a novel class of affine invariant and consistent tests for normality in any dimension. The tests are based on a characterization of the standard $d$-variate normal distribution as the unique solution of an initial value problem of a partial differential equation motivated by the harmonic oscillator, which is a special case of a Schr\"odinger operator. We derive the asymptotic distribution of the test statistics under the hypothesis of normality as well as under fixed and contiguous alternatives. The tests are consistent against general alternatives, exhibit strong power performance for finite samples, and they are applied to a classical data set due to R.A. Fisher. The results can also be used for a neighborhood-of-model validation procedure.


Introduction
The multivariate normal distribution plays a key role in classical and hence widely used procedures.Serious statistical inference that involves the assumption of multivariate normality should therefore start with a test of fit to this model.There is a continuing interest in this testing problem, as evidenced by a multitude of papers.The proposed tests may be roughly classified as follows: [4,7,36,37,47,55] consider tests based on the empirical characteristic function, while [31,32,35] employ the empirical moment generating function.A classical (and still popular) approach is to consider measures of multivariate skewness and kurtosis (see, e.g., [16,38,41,42,43,45]), as supposedly diagnostic tools with regard to the kind of deviation from normality when this hypothesis has been rejected, but the deficiencies of those measures in this regard have been clearly demonstrated (see, e.g., [8,9,26,27,29]).Other approaches involve generalizations of tests for univariate normality [2,39,53], the examination of nonlinearity of dependence [13,19], canonical correlations [57] and the notion of energy, see [54].For a survey of affine invariant tests for multivariate normality, see [30].Monte Carlo studies can be found in [20,44,58].
To be specific, let X, X 1 , . . ., X n , . . .be a sequence of independent identically distributed (i.i.d.) d-dimensional random (column) vectors, which are defined on some common probability space (Ω, A, P).Here, d ≥ 1 is a fixed integer, which means that the univariate case is deliberately not excluded.We write P X for the distribution of X.The d-variate normal distribution with expectation µ and nonsingular covariance matrix Σ will be denoted by N d (µ, Σ).Furthermore, stands for the family of nondegenerate d-variate normal distributions.A check of the assumption of multivariate normality means to test the hypothesis H 0 : against general alternatives.
Writing I d for the unit matrix of order d, our novel idea for testing H 0 is to use a characterization of the Fourier transform of N d (0, I d ) as the unique solution of an initial value problem of a partial differential equation motivated by the harmonic oscillator, which is a special case of a Schrödinger operator.The proposed test statistic is based on the squared norm of a functional of the empirical characteristic function in a suitably weighted L 2 -space.This statistic is close to zero under the hypothesis (1), and rejection will be for large values of the test statistic.
Let L 2 (R d ) be the space of square integrable functions, equipped with the usual norm and scalar product •, • .Consider for sufficiently regular f ∈ L 2 (R d ) the partial differential equation Here, ∆ stands for the Laplace operator, and • denotes the Euclidean norm.Notice that we can rewrite (2) as (−∆ + x 2 − d)f (x) = 0 or, equivalently, as The operator −∆ + x 2 − d is known as the harmonic oscillator, see [24].In the univariate case, (2) reduces to a fixed point problem (or, equivalently, to the problem of finding the eigenfunction that corresponds to the eigenvalue 1) of the Hermite operator, see equation (1.1.9)in [56].The solution of this problem is the 0 th Hermite function, which coincides with the solution given in the following theorem.
Theorem 1.1.The characteristic function of the d-variate standard normal distribution N d (0, I d ) is the unique solution of (2).
Proof.Let f be an arbitrary solution of (2).Writing i for the imaginary unit, we introduce the creation and annihilation operators a j = x j + ip j and a j = x j − ip j , j = 1, . . ., d, where p j = −i ∂ ∂xj , j = 1, . . ., d.For each j ∈ {1, . . ., d} we have So we can rewrite (3) as and thus a j f = 0 for each j ∈ {1, . . ., d}.We therefore have (x + ∇)f = 0, where ∇ denotes the gradient operator.By the last statement and the product rule, it follows that which, in view of the condition f (0) = 1, completes the proof.
Remark 1.2.The operator H = −∆ + x 2 is the Hermite operator in R d , and ψ is the product of the one-dimensional 0 th Hermite functions.Therefore, since Hψ = dψ (as we have shown in Theorem 1.1), ψ is the eigenfunction associated with the eigenvalue d.For details on the Hermite operator in R d and corresponding eigenfunctions, see p. 5 of [56].
In this paper, we study a family of affine invariant test statistics for H 0 that is based on the characterization given in Theorem 1.1.Since the class N d is closed under full rank affine transformations, Theorem 1.1 does not restrict the scope of the testing problem.We make the tacit standing assumption that P X is absolutely continuous with respect to Lebesgue measure, and that n ≥ d + 1.Let X n = n −1 n j=1 X j denote the sample mean and ) the sample covariance matrix of X 1 , . . ., X n , where x means transposition of a column vector x.The assumptions made above guarantee that S n is invertible almost surely, see [18].The test statistic will be based on the so-called scaled residuals which represent an empirical standardization of the data.Here, is the unique symmetric positive definite square root of S −1 n .The test statistic will be based on the empirical characteristic function Notice that an application of the Laplace operator ∆ to ψ n yields Motivated by ( 2) and Theorem 1.1, we propose the weighted L 2 -statistic and a > 0 is a fixed constant.Moreover, |z| is the modulus of a complex number z, and integration is, unless otherwise specified, over R d .In principle, other weight functions than w a are conceivable in the definition of T n,a .Since, for c ∈ R d (see [37], p. 3601), as well as an attractive feature of the choice of w a is that the test statistic takes the simple form which is amenable to computational purposes.Moreover, T n,a depends only on the scalar products , where i, j ∈ {1, . . ., n}.This shows that T n,a is invariant with respect to full rank affine transformations of X 1 , . . ., X n .Moreover, not even the square root S −1/2 n of S −1 n is needed.The rest of the paper is organized as follows: In Section 2, we show that, as the tuning parameter a tends to infinity, the test statistic T n,a , after a suitable scaling, converges to a certain measure of multivariate skewness.On the other hand, a time-honored measure of multivariate kurtosis emerges as a → 0. Section 3 presents a basic Hilbert space central limit theorem, which proves beneficial for obtaining the limit distribution of T n,a both under H 0 and under fixed alternatives to normality.In Section 4, we derive the limit null distribution of T n,a as n → ∞.Section 5 considers the behavior of T n,a with respect to contiguous alternatives to H 0 .In Section 6, we show that the test for multivariate normality that rejects H 0 for large values of T n,a is consistent against general alternatives.Moreover, the limit distribution of T n,a under a fixed alternative distribution is seen to be normal.Since the variance of this normal distribution can be estimated from the data, there is an asymptotic confidence interval for the measure of distance from normality under alternative distributions that is inherent in the procedure.Furthermore, there is the option for a neighborhood-of-model validation procedure.The results of a simulation study, presented in Section 7, show that the novel test is strong with respect to prominent competitors.In Section 8, the new tests are applied to the Iris flower data set due to R.A. Fisher.The paper concludes with some remarks.For the sake of readability, most of the proofs and some auxiliary results are deferred to Section A. Finally, the following abbreviations, valid for t, x ∈ R d , will be used in several sections: 2 The limits a → ∞ and a → 0 The results of this section show that the class of tests for multivariate normality based on T n,a is "closed at the boundaries" a → ∞ and a → 0 and thus shed some light on the tuning parameter a, which figures in the weight function w a given in (6).We first consider the case a → ∞.
Theorem 2.1.Elementwise on the underlying probability space, we have Proof.For short, put Y j := Y n,j .From the representation of T n,a we have nd, an expansion of the exponential function yields as a → ∞.To tackle U n,2 , we use 2a 2a+1 and, after some algebra, obtain Finally, a binomial expansion of (a/(a + 1)) d/2+1 yields Summing up, the assertion follows.
Remark 2.2.The limit , which figures on the right-hand side of (14), is a measure of multivariate (sample) skewness, introduced by Móri, Rohatgi and Székely, see [45].A much older time-honored measure of multivariate (sample) skewness is skewness in the sense of Mardia (see [42]), which is given by It is interesting to compare Theorem 2.1 with similar results found in connection with other weighted L 2 -statistics that have been studied for testing H 0 .Thus, by Theorem 2.1 of [28], the time-honored class of BHEP-statistics for testing for multivariate normality (see [36]), after suitable rescaling, approaches the linear combination 2b 1,d + 3 b 1,d , as a smoothing parameter (called β in that paper) tends to 0. Since β and a are related by β = a −1/2 , this corresponds to letting a tend to infinity.The same linear combination 2b 1,d + 3 b 1,d also showed up as a limit statistic in [31] and [32].Notice that, in the univariate case, the limit statistic b 1,d figuring in Theorem 2.1 is nothing but three times squared sample skewness.We stress that tests for multivariate normality based on b 1,d or b 1,d or on related measures of multivariate skewness and kurtosis lack consistency against general alternatives, see, e.g., [8,9,26,27,29].
We now consider the case a → 0. Since, elementwise on the underlying probability space, the expressions in (11) and (12) have finite limits as a → 0, and since the double sum figuring in (10) converges to n j=1 Y n,j 4 as a → 0, we have the following result.
Theorem 2.3.Elementwise on the underlying probability space, we have Remark 2.4.The limit statistic on the right-hand side of ( 15) is Mardia's celebrated measure b 2,d of multivariate sample kurtosis, see [42].Together with Theorem 2.1, this result shows that, just like the class of BHEP tests for multivariate normality (see [28]), also the class of tests based on T n,a is "closed at the boundaries" a → ∞ and a → 0. Notably, Mardia's measure of kurtosis shows up for the first time in connection with limits of weighted L 2 -statistics for testing for multivariate normality.The corresponding limit statistic for the class of BHEP tests is, up to a linear transformation, n −1 n j=1 exp(− Y n,j 2 /2), see Theorem 3.1 of [28].

A basic Hilbert space central limit theorem
In this chapter, we present a basic Hilbert space central limit theorem.This theorem implies the limit distribution of T n,a under the null hypothesis (1), but it is also beneficial for proving a limit normal distribution of T n,a under fixed alternatives to H 0 .Throughout this section, we assume that the underlying distribution satisfies E X 4 < ∞.Moreover, we let E(X) = 0 and E(XX ) = I d in view of affine invariance.To motivate the benefit of a Hilbert space setting and for later purposes, it will be convenient to represent T n,a in a different way.
Proposition 3.1.Recall ψ(t) from (4), and let We then have Proof.The proof follows by straightforward algebra using the addition theorems for the sine function and the cosine function and the fact that sin(t y)m(t)w a (t) dt = 0, y ∈ R d .
A convenient setting for asymptotics will be the separable Hilbert space H of (equivalence classes of) measurable functions f : R d → R satisfying f 2 (t)w a (t) dt < ∞.The scalar product and the norm in H will be denoted by 13), and putting the main object of this section is the random element V n of H, defined by Observe that V n = Z n if the distribution of X is N d (0, I d ), since then the functions µ and m coincide.We will show that, as n → ∞, V n converges in distribution to a centred Gaussian random element V of H.The only technical problem in proving such a result is the fact that V n is based on the scaled residuals Y n,1 , . . ., Y n,n and not on X 1 , . . ., X n .If V 0 n (t) denotes the modification of V n (t) that results from replacing Y n,j with X j , a Hilbert space central limit theorem holds for V 0 n , since the summands comprising V 0 n (t) are i.i.d.square-integrable centred random elements of H. Writing D −→ for convergence in distribution of random elements of H and random variables, the basic idea to prove In what follows, the stochastic Landau symbol o P (1) refers to convergence to zero in probability in H, i.e., we have to show To state the main result of this section, let ψ X (t) = E[exp(it X)], t ∈ R d , denote the characteristic function of X, and put , where Re w and Im w stand for the real and the imaginary part of a complex number w, respectively.For a twice continuously differentiable function f : R d → R, let Hf (t) denote the Hessian matrix of f , evaluated at t. Furthermore, recall the gradient operator ∇ and the Laplace operator ∆ from Section 1. where We then have (21).

The proof of Proposition 3.2 is given in Section
, we have (writing tr for trace) Ev(t, X) = −∆ψ + X (t) + tr(Hψ + X (t)) = 0. Thus, v(•, X 1 ), . . ., v(•, X n ) are i.i.d.centred square-integrable random elements of H, and the central limit theorem in Hilbert spaces gives V n D −→ V for some centred Gaussian element V of H.In view of ( 21) and Slutsky's lemma, we therefore can state the main result of this section.Theorem 3.3.Let X, X 1 , X 2 , . . .be i.i.d.random vectors satisfying E X 4 < ∞, E(X) = 0 and E(XX ) = I d .For the sequence of random elements V n defined in (20) we have where V is a centred Gaussian element of H having covariance kernel where v(t, x) is given in (23). 4 The limit null distribution of T n,a In this section we derive the limit distribution of T n,a under the null hypothesis (1).In view of affine invariance, we assume that X has a d-variate standard normal distribution.Since the process Z n (t) given in ( 17) is nothing but V n , as defined in (20), in this special case, we have the following result.
Theorem 4.1.Suppose that X has some non-degenerate normal distribution.Putting we have the following: a) There is a centred Gaussian random element Z of H with covariance kernel Notice that b) follows from a) and the continuous mapping theorem.Part a) follows from Theorem 3.3.For the special case , and ∇∆ψ(t) = tψ(t) 2 + d − t 2 .Thus, the function v(t, x) figuring in the statement of Proposition 3.2 takes the special form Long but straightforward computations, using symmetry arguments and the identities ] takes the form given above.
Let T ∞,a be a random variable with the limit null distribution of T n,a , i.e., with the distribution of Z 2 (t)w a (t) dt.
Since E(T ∞,a ) = K(t, t)w a (t) dt, the following result may be obtained by straightforward but tedious manipulations of integrals.
Theorem 4.2.Putting we have The quantiles of the distribution of T ∞,a can be approximated by a Monte Carlo method, see Section 7.

Contiguous alternatives
In this section, we consider a triangular array (X n1 , . . ., X nn ), n ≥ d + 1, of rowwise i.i.d.random vectors with Lebesgue density , is the density of the standard normal distribution N d (0, I d ), and g is some bounded measurable function satisfying g(x)ϕ(x) dx = 0. We assume that n is sufficiently large to render g nonnegative.Recall Z n (t) from ( 17).
Theorem 5.1.Under the triangular X n,1 , . . ., X n,n given above, we have where Z is the centred random element of H figuring in Theorem 4.1, and with h(x, t) given in (27).
Proof.Since the reasoning uses standard LeCam theory on contiguous probability measures and closely parallels that given in Section 3 of [36], it will be omitted.
Corollary 5.2.Under the conditions of Theorem 5.1, we have From Theorem 5.1 and the above corollary, we conclude that the test for multivariate normality based on T n,a is able to detect alternatives which converge to the normal distribution at the rate n −1/2 , irrespective of the underlying dimension d.The test of Bowman and Foster (see [12]) is a prominent example of an affine invariant tests for multivariate normality that is consistent against each fixed non-normal alternative distribution but nevertheless lacks this property of n −1/2 -consistency, see Section 7 of [30].

Fixed alternatives and consistency
In this section we assume that X, X 1 , X 2 , . . .are i.i.d. with a distribution that is absolutely continuous with respect to Lebesgue measure, and that E X 4 < ∞.In view of affine invariance, we make the additional assumptions E(X) = 0 and E(XX ) = I d .Recall m(t) from ( 16) and µ(t) from (19).Writing a.s.
−→ for P-almost sure convergence, our first result is a strong limit for T n,a /n.
The proof of Theorem 6.1 is given in Section A. By analogy with Theorem 2.1, the next result shows that the (population) measure of multivariate skewness in the sense of Móri, Rohatgi and Székely (see [45]) emerges as the limit of ∆ a , after a suitable normalization, as a → ∞.
The proof of Theorem 6.4 is given in Section A.
We now show that the limit distribution of This fact is essentially a consequence of Theorem 1 in [6].The reasoning is as follows: By (18), we have A little thought shows that , where V n is given in (20).By Theorem 3.3, V n D −→ V for a centred Gaussian element V of H.As a consequence, the second summand in ( 29) is o P (1) as n → ∞, and, by the continuous mapping theorem, the first summand converges in distribution to 2 V, z H .The latter random variable has a centred normal distribution with variance The following theorem summarizes our findings.
Theorem 6.5.For a fixed alternative distribution satisfying E X 4 < ∞, E(X) = 0 and E(XX ) = I d , we have where and L(s, t) is given in (26).
Proof.To complete the proof, notice that, by Fubini's theorem, Under slightly stronger conditions on P X , there is a consistent estimator of σ 2 a .To obtain such an estimator, we replace L(s, t) as well as z(s) and z(t) figuring in (30) with suitable empirical counterparts.To this end, notice that, by (26), we have where and v j (t, x), j ∈ {1, 2, 3, 4}, are given in (24), (25).Since ∇∆ψ ± parts a) -d) of the following lemma show that the unknown quantities ∇∆ψ + X (t), ∇ψ − X (t), ∆ψ − X (t) and Hψ + X (t) that figure in the expressions of v 2 (t, x), v 3 (t, x) and v 4 (t, x) can be replaced with consistent estimators that are based on the scaled residuals Y n,1 , . . ., Y n,n defined in (5).
The proof of Lemma 6.6 is given in Section 9.
In view of Lemma 6.6, a suitable estimator of L(s, t) defined in (31) is where and By straightforward algebra we have Writing P −→ for convergence in probability, we then have the following result.
where L n (s, t) and z n (s) are defined in ( 33) and ( 36), respectively.We then have The proof of Theorem 6.7 is given in Section A.
Under the conditions of Theorem 6.7, Theorem 6.5 and Sluzki's lemma yield provided that σ 2 a > 0. We thus obtain the following asymptotic confidence interval for ∆ a .
Example 6.9.We consider the case d = 1, a = 0.1 and the following standardized symmetric alternative distributions: the uniform distribution U − √ 3, √ 3 , the Laplace distribution L 0, 1/ √ 2 , and the logistic distribution Lo 0, √ 3/π .The corresponding characteristic functions and their second derivatives are given by The pertaining values of ∆ 0.1 are ∆ 0.1,U ≈ 0.3322, ∆ 0.1,L ≈ 0.127 and ∆ 0.1,Lo ≈ 0.033.Table 1 shows the percentages of coverage of the confidence intervals I n,0.1,α for α = 0.05 and several sample sizes.Each entry is based on 5000 Monte-Carlo-replications.The results are quite satisfactory for n ≥ 50.
Remark 6.10.A further application of (38) addresses a fundamental drawback inherent in any goodness of fit test, see [6].If a level-α-test of H 0 does not reject H 0 , the hypothesis H 0 is by no means validated or confirmed, since each model is wrong, and there is probably only not enough evidence to reject H 0 .However, if we define a certain "essential distance" δ 0 > 0, we can consider the inverse testing problem Here, the dependence of ∆ a on the underlying distribution has been made explicit.
From (38), we obtain an asymptotic level-α-neighborhood-of-model-validation test of H δ0 against K δ0 , which rejects H δ0 if and only if Indeed, from (38) we have for each Moreover, it follows that for each F such that ∆ a (F ) = δ 0 .Finally, we have if ∆ a (F ) < δ 0 .Thus, the test is consistent against each alternative.
Remark 6.11.The double integral figuring (37) may be calculated explicitly.To this end, notice that σ 2 n,a = 4 i,j=1 σ i,j n,a , where and σ i,j n,a = σ j,i n,a , i, j ∈ {1, . . ., 4}, by symmetry.We put where Y j is shorthand for Y n,j .Notice that P 1,a,1 n , P 1,a,2 n and P 2,a n are vectors and P

1,a
n is a matrix.By straightforward but tedious manipulations of the integrals in (39), each σ i,j n is seen to be an arithmetic mean of functions of the scaled residuals, namely:

Simulations
In this section, we present the results of a Monte Carlo simulation study on the finite-sample power performance of the test based on T n,a with that of several competitors.and multivariate data, we distinguish the cases d = 1 and d ≥ 2. In the univariate case, the sample sizes are n ∈ {20, 50, 100}, and in the multivariate case we restrict the simulations to n ∈ {20, 50}.The nominal level of significance is fixed throughout all simulations to 0.05.Critical values for T n,a (in fact, for a scaled version of T n,a in order to obtain values of a similar magnitude across the range of values for d and a considered) have been simulated under H 0 with 100 000 replications, see Table 2.The critical values in the rows in Table 2 denoted by "∞" represent approximations of quantiles of the distribution of the limit random element T ∞,a = Z 2 (t) w a (t) dt in Theorem 4.1 b).
To obtain these values we simulate (say) m i.i.d.random supporting points U 1 , . . ., U m where U 1 ∼ N d (0, (2a) −1 I d ), which are adapted to the integration over R d with respect to the weight function w a (t) = exp(−a t 2 ) and a large number (say) of random variables Z j = 1 d 2 m X j 2 , j = 1, . . ., , with i.i.d.X j ∼ N m (0, Σ K ), where the covariance matrix Σ k is given by Σ K = (K(U k1 , U k2 )) k1,k2∈{1,...,m} and K is the covariance kernel in Theorem 4.1 a).Next, we calculate the empirical 95% quantile of Z 1 , . . ., Z .Each approximation was simulated with = 100, 000 and m = 1, 000 for d ∈ {2, 3, 5, 10} and each entry in Tables 4 -7, which exhibit percentages of rejection of H 0 of the tests under consideration against various alternative distributions, is based on 10 000 replications.Entries that are marked with * denote 100%.The simulations have been performed using the statistical computing environment R, see [48].
For the implementation of the first three tests in R we refer to the package nortest by [23].Tests based on the empirical characteristic function are represented by the BHEP-test with tuning parameter a > 0. The statistic is given in (40), a = 1 is fixed, and the critical values are taken from [25].
The BCMR-test is based on the L 2 -Wasserstein distance, see section 3.3 in [14], and is given by Here, X (k) is the k-th order statistic of X 1 , . . ., X n , S 2 n is the sample variance, and Φ −1 is the quantile function of the standard normal law.Simulated critical values can be found in [40], or in Table 4 of [11].
The is based on a L 2 -distance between the empirical zero-bias transformation and the empirical distribution.Since the zero-bias transformation has a unique fixed point under normality, this distance is minimal under H 0 .The statistic is given by where Y 1 , . . ., Y n is shorthand for the scaled residuals Y n,1 , . . ., Y n,n , Y (1) ≤ . . .≤ Y (n) are the order statistics of Y 1 , . . ., Y n , and Φ stands for the distribution function of the standard normal law.The implementation employs a bootstrap procedure to find a data driven version of the tuning parameter a, see [1].We used B = 400 bootstrap replications and the same grid of tuning parameters as in [11], p. 19.
The alternative distributions are chosen to fit the extensive power study of univariate normality tests by [49], in order to ease the comparison with a wide variety of other existing tests.As representatives of symmetric distributions we simulate the Student t ν -distribution with ν ∈ {3, 5, 10} degrees of freedom, as well as the uniform distribution U(− √ 3, √ 3).The asymmetric distributions are represented by the χ 2 ν -distribution with ν ∈ {5, 15} degrees of freedom, the Beta distributions B(1, 4) and B(2, 5), the Gamma distributions Γ(1, 5) and Γ(5, 1), parametrized by their shape and rate parameter, the Gumbel distribution Gum(1, 2) with location parameter 1 and scale parameter 2, the lognormal distribution LN(0, 1) as well as the Weibull distribution W(1, 0.5) with scale parameter 1 and shape parameter 0.5.As representatives of bimodal distributions we take the mixture of normal distributions NMix(p, µ, σ 2 ), where the random variables are generated by Table 4 shows that the empirical power estimates of the new test T n,a outperform the other strong procedures for the symmetric t-distribution, and they can compete for most of the other alternatives.Interestingly, the power does not differ too much when varying the tuning parameter a, although an effect is clearly visible, especially for the uniform distribution.A data driven choice as in [1] might be of benefit also in connection with the new testing procedure.
The HV-test uses a weighted L 2 -type statistic based on a characterization of the moment generating function that employs a system of first-order partial differential equations.The statistic is defined by where γ > 2. We followed the recommendation of the authors in [35] and fixed γ = 5.Since the limiting statistic HV ∞ for γ → ∞ is a linear combination of sample skewness in the sense of Mardia and that of Móri, Rohatgi and Székely, we also included HV ∞ .
The HJG-test uses a weighted L 2 -distance between the empirical moment generating function of the standardized sample and the moment generating function of the standard normal distribution.The test statistic is given by , with β > 0. In our simulation we fix β = 1.5.For each of the tests based on HV 5 , HV ∞ and HJG 1.5 , critical values were simulated with 100 000 replications.
Finally, the now classical BHEP-test examines the weighted L 2 -distance between the empirical characteristic function of the standardized data and the characteristic function of the d-variate standard normal distribution.The statistic has the simple form with a tuning parameter a > 0. A variety of values of a, i.e., a ∈ {0.1, 0.25, 0.5, 0.75, 1, 2, 3, 5, 10}, has been considered.Critical values can be found in Tables I -III of [36], whereas missing critical values have been simulated separately with 100 000 replications.
The alternative distributions are chosen to fit the simulation study in [35] and are defined as follows.We denote by NMix(p, µ, Σ) the normal mixture distributions generated by where Σ > 0 stands for a positive definite matrix.We write in the notation of above µ = 3 for a d-variate vector of 3's and Σ = B d for a (d × d)-matrix containing 1's on the main diagonal and 0.9's on each off-diagonal entry.We denote by t ν (0, I d ) the multivariate t-distribution with ν degrees of freedom, see [22].By DIST d (ϑ) we denote the d-variate random vector generated by independently simulated components of the distribution DIST with parameter vector (ϑ), where DIST is taken to be the Cauchy distribution C, the logistic distribution L, the Gamma distribution Γ as well as the Pearson Type VII distribution P V II , with ϑ denoting in this specific case the degrees of freedom.The spherical symmetric distributions where simulated using the R package distrEllipse, see [50], and are denoted by S d (DIST), where DIST stands for the distribution of the radii and was chosen to be the exponential, the beta and the χ 2 -distribution.
From Tables 5 to 7, it is obvious that T n,a outperforms the competing tests for most of the alternatives considered, again showing that the tuning parameter has -compared to the BHEP test-little effect on the power.As in the univariate case T n,a has very strong power against the multivariate t-distribution.If the radial distribution of the spherical symmetric alternatives has compact support, the BHEP test exhibits a better performance than T n,a .The HV 5 -and the HJG 1.5 -test have a good power, but they are mostly dominated by the BHEP-test and the T n,a -test.Again, a data driven choice of the tuning parameter a would be beneficial for the test, but to the best of our knowledge no reliable multivariate method is yet available.We suggest to choose a small tuning parameter like a = 0.25.

Real Data Example: The Iris Data set
In 1936 R.A. Fisher presented the now classical data set called Iris Flower, see Table I in [21].The data consist of the four variables sepal, length, sepal width, petal length, and petal width, measured on fifty specimens of each of three types or iris, namely Iris setosa, Iris versicolor, and Iris virginica.This data set is included in the statistical language R, and it can be downloaded from the UCI Machine Learning Repository, see [17].That reference provides a list of articles that use this specific data set to validate clustering methods, neural networks or learning algorithms, and it presents a typical test case for statistical classification techniques in machine learning, such as support vector machines.A visualization of two-dimensional projections of the data set is given in Figure 1.
In Table 3 we present empirical p-values simulated with 10 000 replications.As can be seen, the test does not reject the hypothesis of normality on a small significance level (like α = 0.01) for the different species for each of the tuning  parameters considered.For the Iris setosa data, however, an increase of the significance level to 0.05 results in a rejection of the hypothesis for a = 2 and a = 3.For the whole data set, we observe a small p-value due to the mixture of the three populations and consequently reject the hypothesis H 0 .

Concluding remarks
We proved consistency of the test for multivariate normality based on T n,a against each alternative distribution that satisfies the moment condition E X 4 < ∞.Intuitively, the test should be "all the more consistent" if E X 4 = ∞.
In fact, we conjecture consistency of the new test against any non-normal alternative distribution.The limiting random element T ∞,a = Z 2 H from Theorem 4.1 b) has the same distribution as ∞ j=1 λ j N 2 j , where the N j are i.i.d.standard normal random variables, and the λ j are the positive eigenvalues corresponding to eigenfunctions of the linear integral operator Kf (s) = R d K(s, t)f (t) w a (t) dt associated with the covariance kernel K from Theorem 4.1 a), i.e., we have λf (s) = R d K(s, t)f (t) w a (t) dt.These positive eigenvalues clearly depend on K and the weight function w a .It is hardly possible to obtain a closed form expression for λ 1 , λ 2 , . ... It would be interesting to approximate the eigenvalues either numerically or by some Monte Carlo method, since the largest eigenvalue has a crucial influence on the approximate Bahadur efficiency, see [5] and the monograph [46], as well as [34] for results on distribution-free L p -type statistics.say.Letting X 1 , X 2 be independent copies of X, Fubini's theorem, the addition theorems for the sine function and the cosine function, considerations of symmetry and (7) yield

Species
From ( 8) and ( 9), we have and thus say.An expansion of the exponential terms, dominated convergence (notice that exp(−u) ≤ 1 − u + u 2 if u ≥ 0) and a binomial expansion gives and thus Notice that the condition E X 6 < ∞ is not only needed for the existence of the final limit, but it also occurs when dealing with J 2,a .
Proof of Lemma 6.6 Putting Y j = Y n,j and ∆ j = Y j − X j , we have where |ε j (t)|, |η j (t)| ≤ t 2 ∆ j 2 (see [36], p. 8) and thus To prove a), notice that Since |CS ± (t, X j )| ≤ 2 and |ε j (t) + η j (t)| ≤ 2 t 2 ∆ j 2 , the Cauchy-Schwarz inequality yields In view of Proposition A.2, each summand converges to zero almost surely, which proves a).The remaining assertions b), . . ., e) are treated similarly.To tackle b) and e), one can show negligibility of terms by using the fact that the spectral norm • sp of a matrix satisfies xy sp = x y , if x, y are (column) vectors in R d .The condition E X 1 6 < ∞ is needed for part e), since and multiplication with ∆ j 2 (which arises from an expansion of CS + (t, Y n,j )) gives monomials of order 6.
Proof of Theorem 6.7 The proof is similar to the proof of Theorem 5 of [33] and will therefore only be sketched.From ( 26), ( 23) and ( 30), we have σ 2 a = 4 i,j=1 σ i,j a , where σ i,j a = 4 L i,j (s, t)z(s)z(t)w a (s)w a (t) dsdt, and L i,j (s, t) is given in (32).It thus suffices to prove σ i,j n,a P −→ σ i,j a for each pair (i, j), where σ i,j n,a is given in (39).The first step of the proof is to replace L i,j n (s, t) with L i,j n,0 (s, t), which arises from L i,j n (s, t) given in (34) by throughout replacing Y n,k with X k in the functions v n,j (s, Y n,k ), j ∈ {1, . . ., 4}.Notice that this replacement also refers to the quantities Ψ − 4,n (s), Ψ 1,n (s), Ψ − 3,n (s) and Ψ 2,n (s) that figure in the definition of v n,2 , v n,3 and v n,4 .Moreover, we replace z n (s) with z n,0 (s) = n −1 n j=1 CS + (s, X j ) X j 2 − m(s).Putting σ i,j n,0,a = 4 L i,j n,0 (s, t)z n,0 (s)z n,0 (t)w a (s)w a (t) dsdt, Fubini's theorem shows that σ i,j n,0,a P −→ σ i,j a .It thus remains to prove σ i,j n,a − σ i,j n,0,a = o P (1) as n → ∞.To tackle σ i,j n,a − σ i,j n,0,a , we put n,0 (s, t)z n,0 (s)z n,0 (t) and notice that +L i,j n,0 (s, t) (z n (s)z n (t) − z n,0 (s)z n,0 (t)) ,

d j=1 a j a j f = 0, which implies f, d j=1 a j a j f = d j=1 f, a j a j f = d j=1 a j f, a j f = d j=1 a j f 2
= 0

Figure 1 :
Figure 1: 2D projections of the iris data set with coloured species.
By Theorem 1.1, we have ∆ a = 0 if and only if X has the normal distribution N d (0, I d ).In view of Theorem 6.1, T n,a → ∞ P-almost surely under any alternative distribution satisfying E X 4 < ∞.Since, according to Theorem 4.1, the sequence of critical values of a level-α-test of H 0 that rejects H 0 for large values of T n,a stays bounded, we have the following result.Corollary 6.3.The test for multivariate normality based on T n,a is consistent against any fixed alternative distribution satisfying E X 4 < ∞.

Table 1 :
Percentages of coverage of ∆ 0.1 by I n,0.1,α for different distributions (5000 replications) Notice that, by symmetry, L i,j n

Table 2 :
Empirical and approximate quantiles for d −2 a π d/2 T n,a and α = 0.05 (100 000 replications) Since different procedures have been used for univariate

Table 3 :
Empirical p-values for T n,a .
(52). is bounded from above by s t m times finitely many products of the type n −1 n j=1 X j β , where , m ∈ {0, 1} and β ∈ {1, 2, 3, 4}.It now follows from Proposition A.2 that the term figuring in(52), multiplied with w a (s)w a (t) and integrated over R d × R d , converges to zero in probability.