White noise testing using wavelets

Testing whether a time series is consistent with white noise is an important task within time series analysis and for model fitting and criticism via residual diagnostics. We introduce three fast and efficient white noise tests that assess spectral constancy via the wavelet coefficients of a periodogram. The Haar wavelet white noise test derives the exact distribution of the Haar wavelet coefficients of the asymptotic periodogram under mild conditions. The single‐coefficient white noise test uses a single Haar wavelet coefficient obtaining a test statistic as a linear combination of odd‐indexed autocorrelations. The general wavelet white noise test uses compactly supported Daubechies wavelets, shows that its coefficients are asymptotically normal and derives its theoretical power for an arbitrary spectrum. All our tests are available in the freely available hwwntest package for the R system. We present a comprehensive simulation study that shows the good performance of our new tests against alternatives commonly found in available software and show an example applied to a wind power time series. © 2014 The Authors. Stat published by John Wiley & Sons Ltd.


Introduction
Time series white noise tests are useful in their own right or as part of a procedure that assesses model fit from residuals. A stochastic process is white noise if and only if its spectrum is flat and if and only if the wavelet coefficients of the spectrum are all zero. Our tests work by assessing whether the well-known estimator of the spectrum, the periodogram, has wavelet coefficients that are significantly different from zero.
We develop a highly accurate approximation to the Haar wavelet coefficient distribution of the normalized periodogram. Using this approximation, and contemporary multiple hypothesis testing methods, Section 2 introduces a test for white noise based on all the Haar wavelet coefficients and also a single Haar wavelet coefficient test. A further general wavelet test, based on assessing wavelet coefficients produced by compactly supported Daubechies' wavelets, appears in Section 3. Here, an exact distribution is out of reach, so we use the asymptotic normality result due to Neumann (1996) to assess coefficient significance. Section 3 additionally develops a theoretical formula for the general wavelet test's power for any spectrum for a fixed number of observations T. The power formula is implemented in our software package and provides useful guidance on questions such as 'how large does my sample have to be to detect departures from white noise against some specific alternative'?
Many white noise tests already exist, so why do we need another? The answer is that all tests are 'directional' in that they have excellent power for some alternatives, reasonable power for many alternatives and poor power for others. We show that our test achieves high power when others fail (and, naturally, vice versa) and hence is a useful complement to existing methods. Apart from specifying statistical size, our test requires no tuning parameters. Section 4 describes an extensive simulation study comparing our new tests with some established ones and a wind speed data example.
Next, we briefly summarize a few popular white noise tests and cite more extensive review articles. Sections 1.2 and 1.3 set up the basic components of our test and describe how they are combined to assess spectral constancy. Box & Pierce (1970) introduced a popular white noise test that was designed to work with the residuals from an autoregressive integrated moving average model fit. They developed a mathematical approximation to the residual distribution and showed it to be chi-square-distributed with degrees of freedom equal to the lag up to which the autocorrelations are tested (minus the number of fitted autoregressive moving average (ARMA) parameters). This work formed the basis for two tests-Box-Pierce and Ljung-Box from Ljung & Box (1978), the latter a modification of the earlier test. The test statistics used were proportional to T P p iD1 O 2 i where T is the number of observations, O i is the usual estimate of autocorrelation and p denotes an upper bound on the lags being tested. Both versions are portmanteau tests, and the null hypothesis is that i is zero for i D 1, : : : , p. The operation of these tests requires the user to prespecify a likely value of p. Selecting a too small p causes the test to miss any significant autocorrelations at higher lags. Selecting a too large p can detect more significant autocorrelations, but often at reduced power. Many subsequent papers have attempted to address the problem of selecting the 'best' lag for a given problem or averaging results over several lags (e.g. Guay et al., 2013).

Brief overview of white noise tests
Frequency domain white noise tests assess the 'flatness' of a spectrum estimate. Early tests of this type are that of Bartlett (1954Bartlett ( , 1955, which apply a Kolmogorov-Smirnov test to the cumulative periodogram ordinates. See also Durbin (1969Durbin ( , 1970. Some more recent tests and comprehensive reviews can be found in the works of Lobato (2001) and Guay et al. (2013).

Basic components of our tests
Suppose that ¹X t º t2Z is a real-valued second-order stationary stochastic process with mean D 0, variance 2 X < 1, autocovariance X .k/ for integers k and spectrum f.!/ for ! 2[ , / D …. In practice, we remove the mean for any time series for which ¤ 0. We address the problem of testing the hypothesis H 0 that ¹X t º is a white noise versus the alternative H A that ¹X t º is not white noise given a realization X t , t D 1, : : : , T, for some positive integer T.
The process ¹X t º is white noise if and only if its spectrum f.!/ is a constant function on …. The spectrum, f, can be estimated from a realization ¹X t º T tD1 via the periodogram which can be computed at the Fourier frequencies I p D I T .! p /, where ! p D 2 pT 1 for p D 1, : : : , T=2.
Our tests will be based on wavelet decompositions of a spectrum defined as follows.

Definition 1 (Wavelets)
Define < , > to be the usual inner product on L 2 .…/ given by < f, g >D R f.!/g.!/ d!. Let N 0 D N [ 0 and T . j/ D ¹0, : : : , 2 j 1º for j 2 N 0 . Let ¹ j,k º j2N 0 ,k2T . j / be an orthonormal periodic wavelet basis for functions f 2 L 2 .…/, where j,k .x/ D 2 j=2 .2 j x k/, where 2 L 2 .…/ is a (suitably rescaled) Daubechies' where .!/ is the scaling function associated with the wavelet basis and j,k is defined similarly to j,k . The coefficients of the expansion are given by v j,k D< f, j,k > for j 2 N 0 and k 2 T . j/.
Here, wavelets are used provide a decomposition of a spectrum, f.!/, that is localized in frequency and 'scale of frequency' (in contrast to the usual localization in time and frequency for a function defined over time). The wavelet coefficient sets for a wide variety of functions (technically in some Besov class) are sparse. This means that the essential information present in a spectrum can potentially be represented using few wavelet coefficients, generally fewer than in other representations such as Fourier or orthogonal polynomials. Our procedures exploit this sparsity, particularly for the general wavelet test presented in Section 3. Wavelet transforms have further advantages in that their implementations are extremely fast and efficient and the coefficients provide useful information about location and scale of non-constancies. For further information on wavelets, see Daubechies (1992), Mallat (1998), Vidakovic (1999) or Nason (2008) in the statistical context.

Assessing spectral constancy
We investigate the constancy of a normalized spectrum by examining its wavelet coefficients given by For example, j,k .x/ might be the usual Haar wavelet system defined by H The white noise null hypothesis can be reformulated in terms of the wavelet coefficients: if f.!/ is constant, then every v j,k D 0, and the null hypothesis is equivalent to H 0 : v j,k D 0 for all j 2 N 0 , k 2 T . j/. This is because a defining property of wavelets is that the integral of every wavelet is zero.
In practice, we do not know the spectrum and hence estimate it using the periodogram. For the Haar white noise test, we apply the discrete Haar wavelet transform to the normalized periodogram to form the following estimates of v j,k : for .j, k/ 2 I T D ¹.j, k/ : j D 0, : : : , log 2 .T/, k 2 T . j/º. Here, T D 2 J for some J 2 N, but the test can be extended to arbitrary-length data sets by using the Haar wavelet transform for arbitrary T. Alternatively, with truncation or minor 'zero-padding', a series can be fed to the current software without a major effect. A version of (3) exists for more general Daubechies' wavelets, and a fast transform, called the discrete wavelet transform, exists due to Mallat (1989).
To test the constancy of f, we perform multiple hypothesis tests for H 0 : v j,k D 0 for all .j, k/ 2 I T against H A : 9.j, k/ such that v j,k ¤ 0 using O v j,k as test statistic. Sections 2 and 3 explain how to compute the distribution of O v j,k , to a high degree of accuracy, for both the Haar and general wavelet situations. From this, we can obtain a 'per-test' p-value with respect to each j, k pair. We use multiple hypothesis test size adjustment techniques such as Bonferroni correction for a test of size˛or the less conservative false discovery rate method of Benjamini & Hochberg (1995). The idea of using Haar wavelet coefficients for testing constancy appeared in the work of von Sachs & Neumann (2000) in the context of stationarity testing. Here, we test for constancy over frequency, rather than over time, and, additionally, use more general wavelets to improve the statistical power of our tests.

A Haar wavelet test
Under the null hypothesis of ¹X t º being independent and identically distributed with mean zero and variance 2 X < 1, it is known that a fixed number of periodogram ordinates are distributed asymptotically as independent exponential random variables (Brockwell & Davis, 1991, p. 344; and, apart from p D T=2, identically). Further asymptotic results that also show the exponential limit for all periodogram ordinates under certain conditions can be found in the works of Freedman & Lane (1980), Kokoszka & Mikosch (2000), Fay & Soulier (2001) and Shao & Wu (2007). If the X t are Gaussian, then the periodogram ordinates are independent.
We will model the periodogram ordinates as independent exponential random variables with appropriate means.
For now, suppose that 2 (3) is given by the following result.
Proof: See Simon (2002, p. 26) or Kotz et al. (2001) for the essential result. A tailored version is provided in the supplementary material, which also provides information on the asymptotic behaviour of g m .x/, which is useful later.
In practice, 2 X is unknown, and so we estimate it using the standard sum-of-squares sample variance formula. Then we operate our test on the normalized periodogram O 2 X I p , which is distributed, asymptotically, as an exponentially distributed random variable with a rate of 1. The test requires no tuning parameters.
Haar wavelet white noise (HWWN) test procedure: Compute the normalized periodogram, and then compute its Haar wavelet transform as specified by (3). The Haar coefficients have distribution specified, to a high degree of accuracy, by (4), and the p-value for testing H 0 : v j,k D 0 versus H A : v j,k ¤ 0 can be obtained by using the inverse cumulative distribution of g m .x/ from (4).
To demonstrate the accuracy of our approximations, Figure 1 shows probability and cumulative density estimates of the wavelet coefficients with their theoretical g 2 .x/ versions superimposed and associated empirical p-values obtained by the test procedure operating on 2 14 independent Gaussian random variables. The empirical p-values appear to be close to uniform random variables, which is what one would expect for the test statistic under the null hypothesis.
In general, Proposition 1 is only approximate for the distribution of O v j,k even for Gaussian X t . For finite T, the normalization by O 2 X causes the periodogram ordinates to obey an F-distribution (Koen, 1990), although, in practice, the difference between this and the exponential is small, even for moderate sample sizes. For non-Gaussian random variables, the ordinates will not be precisely exponential and will suffer small correlations, although these tend to be mitigated by the 'decorrelation' action of the wavelet transform (Craigmile & Percival, 2005).
Any single non-zero Haar coefficient in the test described earlier indicates departures from white noise. We developed a further test using the single coefficient v 0,0 , the coarsest scale wavelet coefficient that can be written in the frequency Because of the Fourier relationship between autocovariance and spectrum, the coefficient v 0,0 can be rewritten as . 1/ m .2m C 1/=.2m C 1/ 0.90 .1/ 0.30 .3/ C 0.18 .5/ 0.13 .7/ C , where . / is the usual autocorrelation. A test statistic can obtained from v 0,0 by substituting the sample autocorrelation r for . / and comparing jO v 0,0 j with a critical value obtained either from the g m distribution (4) from Proposition 1 or using the Gaussian approximation derived in the next section. Simulations (in the following sections) show that this single coefficient, SCWN, test is surprisingly effective in practice. The SCWN test is also particularly easy to compute from the sample autocorrelations, which, in many cases, would have already been computed and plotted.

(wileyonlinelibrary.com) DOI: 10.1002/sta4.69
The ISI's Journal for the Rapid Dissemination of Statistics Research 3 A general wavelet test Wavelet expansions for a wide variety of functions are sparse (e.g. Wasserman, 2005, p. 208;Nason, 2008). The bottom-right plot in Figure 1 shows the Haar wavelet coefficients of the AR.1/ spectrum with˛1 D 0.9: most coefficients are small or zero, and only four or five large ones are required to represent the spectrum accurately. One reason for the sparsity of representation can be explained via wavelets' vanishing moments property. A wavelet .x/ has m vanishing moments if R x` .x/ dx D 0 for`D 0, : : : , m 1. The property means that coefficients of wavelets that overlap the spectrum, where it is locally like a polynomial of degree less than m, will be small and in many cases zero. Consequently, smooth spectral densities, such as those belonging to ARMA.p, q/ processes, have sparse wavelet representations with increasing sparsity for higher numbers of vanishing moments.
Hence, this section uses Daubechies' compactly supported wavelets with ten vanishing moments in (2)  For general wavelets, no simple closed form for the null distribution of O v j,k , such as in Proposition 1 for Haar wavelets, seems to exist because the wavelet weights, g i , are not constant and change from scale to scale. Alternatively, as long as we stay away from the finer scales, it turns out that the wavelet coefficients are asymptotically normal.
For Haar wavelets, the asymptotic normality can be seen immediately by noticing that the characteristic function of the g m distribution in (4) is essentially Student's t-distribution, which tends asymptotically to the Gaussian as the number of degrees of freedom tends to infinity. Hence, asymptotically, the g m distribution tends to a Gaussian as that density is a fixed point of the Fourier transform. Asymptotic normality of coefficients arising from general Daubechies' compactly supported wavelets can be established using the following result from Neumann (1996).

Corollary 1 (Coefficient asymptotic normality)
Let O v j,k be the empirical wavelet coefficients of the normalized periodogram computed by (3)  More details can be found in the work of Nason & Savchev (2014). The 2 j D O.T 1=2 / ensures that we only consider coefficients away from the fine scales and is assumption (A1) from von Sachs & Neumann (2000). Our general wavelet test is called GENWWN, and its definition is identical to the earlier HWWN except that we gauge the distribution of the O v j,k using the Gaussian approximation suggested by Corollary 1.
Few tests in the literature have freely available software to compute them, and none, as far as we know, provides functionality to compute theoretical power. Flexible computation of the theoretical power of a test for a given ARMA model is extremely useful as it sheds considerable light on the likely sample size required to detect departures of a given size in a given direction. The theoretical power function of GENWWN is given by the following.

Proposition 2 (Test power function)
Let the nominal size of the test be˛, the Bonferroni corrected size be˛c D N 1 c˛a nd the Bonferroni critical value C˛c Dˆ 1 .1 ˛c=2/, whereˆis the standard normal cumulative distribution function. Then the (approximate)  (7) for a general wavelet test (solid line); circles are simulation study results (below). Top left: independent and identically distributed normal random variables for checking statistical size (5%); Top right: AR.1/ with˛D 0.3 (Table II). Bottom left: MA.2/ withˇ1 D 0,ˇ2 D 0.5 (Table III). Bottom right: AR.12/ with 1 D D˛1 1 D 0,˛1 2 D 0.4 (Table III).

power function of the test is given by
where v j,k and Á j,k are given by (2) and Corollary 1, respectively. Figure 2 shows the utility and high accuracy of our theoretical power function (solid line) by comparing it with results (circles) from four of the models used in our simulation study later. The top left-hand plot in Figure 2 corresponds to the tests' statistical size and agrees well with the 5% nominal value. 4 Implementation, simulations and real data example

Implementation
Our tests are implemented in an R package hwwntest where the spectrum and its normalization are computed using the fast Fourier transform and the regular var variance function and the wavelet coefficients computed using the (wileyonlinelibrary.com) DOI: 10.1002/sta4.69 The ISI's Journal for the Rapid Dissemination of Statistics Research wavethresh package (Nason, 2013). The wavelet coefficients are then compared with the g m distribution from (4) (for Haar) or the normal distribution (for general wavelets), and then the coefficient p-values are adjusted by a Bonferroni correction (alternatively, other methods such as false discovery rate can be used).
To compute the theoretical power of the GENWWN test for a general spectrum, we first compute the wavelet coefficients of the normalized spectrum. In particular, the variances Á 2 j,k D< 2 j,k , ¹ 2 X fº 2 > need to be calculated. We choose to use the fast approximate methods due to Herrick et al. (2001) and Barber et al. (2002) where 2 .x/ is represented as a linear combination of father wavelets, `,m at finer scales. Then all the rescaled squared wavelets 2 j,k in the calculation of Á 2 j,k can be rapidly computed by using father wavelet coefficients at finer scales still.

Simulation results
We carried out an extensive simulation study to evaluate the empirical size and power of our new tests in comparison with the Box-Ljung, Bartlett and whitenoise.test() test from Lobato & Velasco (2004), which we call LVWN in the tables. Tables I-III present a selection of results, and more can be found in the work of Nason & Savchev (2014).
The nominal size of all hypothesis tests was set to 5%, and results were established via 10 5 Monte Carlo replications. Table I shows empirical size results for independent and identically distributed standard normal and Cauchy distributions for sample sizes of T D 64, 256 and 1024. In most cases, the empirical size is close to, and never exceeds, the nominal size for all tests for the standard normal data. However, for Cauchy data, the empirical size for the LVWN test, and to a much lesser extent the Ljung-Box test, dramatically exceeds its nominal value, and hence, we would not recommend its use in heavy-tailed noise scenarios. The periodogram-based tests (HWWN, GENWWN and Bartlett) are not affected, probably because of the stable exponential asymptotic limit of the periodogram under mild distributional conditions.     Overall conclusions from the simulations are as follows: (i) one should avoid the LVWN test if one suspects heavy tails; (ii) the performance of the Box-Ljung test is highly variable and depends on the number of lags; and (iii) all of the tests perform well against some alternatives and not against others. The GENWWN test performs well, especially for a reasonable T. Figure 3 shows an hourly wind speed series of length T D 128 recorded at Aberporth, Wales, during 2010. The top left plot shows the series with a non-trivial trend, which we remove by first differencing (top right) and is then tested for consistency with white noise.  Figure 3, bottom left, shows the first differences' autocorrelation function, which seems close to white noise, although there may be significant autocorrelations at lags 1 and 10. This kind of borderline situation, where the series is close to white noise and where the autocorrelations are close to being assessed as zero, is interesting and challenging for white noise tests. Most tests are less challenged by obvious non-white noise or exact white noise.