Appendices for “Locally stationary wavelet packet processes: basis selection and model fitting.”



Appendix: Proof of Theorem 1
The proof is based on three intermediate results: (i) positive definiteness of $A$; (ii) decay of the entries of $A$; (iii) boundedness of the inverse $A^{-1}$.
(i) Positive definiteness of $A$. $A$ is positive semi-definite, since it can be written as $A = \Psi'\Psi$, where $\Psi$ is a matrix constructed from autocorrelation wavelet packets; hence $x'Ax = x'\Psi'\Psi x = (\Psi x)'(\Psi x) = y'y \geq 0$ with $y = \Psi x$. For definiteness, suppose $x'Ax = 0$ for some $x$. This implies
$$\sum_\tau \Bigl\{\sum_p x_p \Psi_p(\tau)\Bigr\} \Bigl\{\sum_{p'} x_{p'} \Psi_{p'}(\tau)\Bigr\} = 0,$$
in other words $\sum_\tau a_\tau^2 = 0$, where $a_\tau = \sum_p x_p \Psi_p(\tau)$, and hence $a_\tau = 0$ for all $\tau$. Thus $\sum_p x_p \Psi_p(\tau) = 0$ for all $\tau$ and, by the linear independence of $\{\Psi_p(\tau)\}_p$, we must have $x_p = 0$ for all $p$.

(ii) Decay of entries of $A$. We can use the alternative definition of $A$ in terms of the cross-autocorrelation wavelets
$$\Psi_{p,p'}(\tau) = \sum_t \psi_p(t)\,\psi_{p'}(t + \tau)$$
and express the decay of the entries of $A$ through that of $\Psi_{p,p'}(\tau)$. In the following we use, in cascade, the two-scale relationship $\psi_{j,i}(t) = \sum_k p^{j,i}_{t-2k}\,\psi_{j-1,i'}(k)$, where $i' = [i/2]$ is the integer part (see Percival and Walden (2000), equation 231b). Note that this relation holds for any packet $(j,i)$, whether or not it belongs to a basis $b$. We also use the defining property of orthogonal wavelet packet filters, $\sum_t p_t\,p_{t-2k} = \delta_k$ for all $(j,i)$; since this relation holds for any packet, in the following we simply write $p_t$. For $p, p' \in b$ with $p \neq p'$, let $j_p = j$ and $j_{p'} = l$. Moreover, we assume without loss of generality that there are at least two wavelet packets belonging to different scales ($j_p \neq j_{p'}$), with $\min(j_p, j_{p'}) = \min(j, l) = l$. Setting $k - m = \tau$ and noting that, as a cross-correlation (wavelet) function, $\Psi_{p,p'}$ peaks at lag 0, we can bound it via the Cauchy–Schwarz inequality. Writing the wavelet packet filter's transfer function as $m_{u,j,b}(\omega)$, $u = 0, 1$, under the same conditions as (2) we obtain a bound for $\Psi^2_{p,p'}(0)$ with constant $K_2 < \infty$. Using (3), (4) and (5), we then obtain a bound with constant $0 < K_3 < \infty$. For $j > l$, using the difference of triangular numbers and completing the square, we finally obtain the bound in (7). To obtain (4), we bounded the squared gain functions for a generic sequence by the squared gain of the scaling filter, using the properties of trigonometric polynomials. Hence the squared cross-correlation has (lower-bound) decay $O(2^{-|j-l|})$ and, in view of (1), $A$ has the same decay. As an illustration for the less general wavelet case, see Figure 1, which plots numerically computed values of $\log A_{1,j}$ against the scale index $j$ and shows that the (known) worst case for decay is the Haar wavelet, which achieves the $2^{-j}$ decay rate of (7).
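As a concrete numerical check of part (i) in the plain (non-packet) Haar wavelet case, the following sketch builds the autocorrelation wavelets $\Psi_j(\tau)$ from the discrete Haar wavelet vectors and verifies that the Gram matrix $A$ is symmetric positive definite. The number of scales ($J = 6$) and the helper names are illustrative, not the paper's notation.

```python
import numpy as np

def haar_wavelet(j):
    """Discrete Haar wavelet at scale j: 2^(j-1) positive taps, then negative."""
    half = 2 ** (j - 1)
    a = 2.0 ** (-j / 2)
    return np.concatenate([np.full(half, a), np.full(half, -a)])

def autocorr_wavelet(psi, max_lag):
    """Psi(tau) = sum_t psi(t) psi(t + tau), for tau = 0, ..., max_lag."""
    n = len(psi)
    return np.array([psi[: n - tau] @ psi[tau:] if tau < n else 0.0
                     for tau in range(max_lag + 1)])

J = 6
max_lag = 2 ** J  # covers the support of the widest autocorrelation wavelet
Psi = [autocorr_wavelet(haar_wavelet(j), max_lag) for j in range(1, J + 1)]

# A_{jl} = sum over all lags tau; Psi is even in tau, so fold the negative lags
A = np.array([[P[0] * Q[0] + 2.0 * (P[1:] @ Q[1:]) for Q in Psi] for P in Psi])

print(A[0, 0])                       # 1.5, the known Haar value A_{11} = 3/2
print(np.linalg.eigvalsh(A).min())   # strictly positive: A is positive definite
```

The Gram structure makes $A$ positive semi-definite by construction; the strictly positive smallest eigenvalue reflects the linear independence of the $\{\Psi_j(\tau)\}_j$.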
(iii) Boundedness of the inverse $A^{-1}$. Definition 1 (GMRS-239): a matrix $A = (A_{ij})_{i,j \in J}$ (where for our purposes the set $J = \mathbb{Z}$ or $\mathbb{Z}^+$, the non-negative integers) decays exponentially if there exist constants $c > 0$ and $0 < \lambda < 1$ such that $|A_{ij}| \leq c\lambda^{|i-j|}$ for all $i, j \in J$. GMRS-240 notes that if $A, B$ are semi-infinite matrices such that $B$ decays exponentially and $A \sim B$, then $A$ decays exponentially also. GMRS-235 guarantees that a semi-infinite positive definite $A$ can be written in Cholesky form as $A = MM'$, where $M$ is lower triangular with positive diagonal entries. Then, using the decay properties of $A$ from (ii) above, we construct a lower triangular $L$ that decays exponentially such that $A \sim LL'$. Theorem 2.2 of GMRS then states that $M \sim L$ and that $M$ has a lower triangular inverse $M^{-1}$ with $M^{-1} \sim L^{-1}$; hence $M^{-1}$ decays exponentially. From GMRS Lemma 2.2, $A^{-1} = (M^{-1})'M^{-1}$ decays exponentially, and from GMRS Lemma 2.1, $A^{-1}$ is bounded on $\ell^2(\mathbb{Z}^+)$. This completes the proof.
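Definition 1 can be illustrated on a toy matrix (not the paper's $A$): the Kac–Murdock–Szegő matrix $B_{ij} = \lambda^{|i-j|}$ decays exponentially, its inverse is tridiagonal (so it trivially decays exponentially too), and its eigenvalues are bounded away from zero, mirroring the conclusions drawn for $A^{-1}$. The choices $n = 50$ and $\lambda = 0.5$ are illustrative assumptions.

```python
import numpy as np

n, lam = 50, 0.5
idx = np.arange(n)
dist = np.abs(idx[:, None] - idx[None, :])
B = lam ** dist                # |B_ij| = lam^{|i-j|}: exponential decay (Definition 1)
Binv = np.linalg.inv(B)

# Known property of this Toeplitz (AR(1)-covariance) matrix: its inverse is
# tridiagonal, so B^{-1} decays exponentially as well.
beyond_band = np.abs(Binv[dist > 1]).max()
print(beyond_band)             # ~ 0 up to round-off

# B is uniformly positive definite, so B^{-1} is bounded in operator norm
print(np.linalg.eigvalsh(B).min())
```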

Proof of Proposition 1
By omitting the constant term, the (negative) Gaussian log-likelihood for the LSWP vector $X_t$ is approximated by (8), where $E X_t X_t' = \Sigma_{S(b)} + O(T^{-1})$ is the covariance matrix for LSWP processes. This $(T \times T)$ matrix can be decomposed as $\Sigma_{S(b)} = V S V'$, where $S = \operatorname{diag}\{S_p(t/T)\}_{p,t}$ is a diagonal matrix of dimensions $(|b|T \times |b|T)$ containing all the spectral values. The rectangular block matrix $V = (V_p)_p$ has dimensions $(T \times |b|T)$ and is formed by $|b|$ square matrices, each of dimensions $(T \times T)$. Each row of the $V_p$ blocks is made of circularly shifted NDWP vectors $\Psi_{j_p,i_p}$ defined in Section 2, and each block is full rank. This implies that $V$ and $V'$ are also full rank, and therefore their generalized inverses exist; these will be denoted by $V^{-1}$ and $(V')^{-1}$ respectively. We also define the sample covariance matrix $\Sigma_{L(b)} = X_t X_t'$, which can be decomposed as $\Sigma_{L(b)} = V L V'$, where $V$ is defined as above and $L = \operatorname{diag}(L_{p,t})_{p,t}$ has dimensions $(|b|T \times |b|T)$. The discussion of Remark 7 (in the main paper) and the references therein imply that this is an asymptotically unbiased estimator of the covariance matrix $\Sigma_{S(b)}$, even though it will not be exactly symmetric in finite samples. The first part of the log-likelihood (8) can be rewritten using the identity involving $v_{t,T} = E(X_t - \hat{X}_t)^2$, the prediction error variance, where $\hat{X}_t$ is the best linear predictor for $X_t$, which can be obtained, given a basis $b \in B$, in the same fashion as in Fryzlewicz et al. (2003).
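A minimal numerical sketch of the decomposition $\Sigma_{S(b)} = V S V'$ for a toy basis of Haar scales $\{1, 2\}$ with constant spectra: the values of $T$ and $s_p$ are illustrative assumptions, and with $S = \operatorname{diag}(s_p I_T)$ the product $V S V'$ reduces to a sum of circulant blocks.

```python
import numpy as np

T = 32

def haar_wavelet(j):
    half = 2 ** (j - 1)
    a = 2.0 ** (-j / 2)
    return np.concatenate([np.full(half, a), np.full(half, -a)])

def shift_block(psi, T):
    """T x T block V_p: rows are circular shifts of the zero-padded wavelet vector."""
    row = np.zeros(T)
    row[: len(psi)] = psi
    return np.array([np.roll(row, s) for s in range(T)])

# Toy basis b = {scale 1, scale 2} with constant spectra s_p (illustrative values)
V = {1: shift_block(haar_wavelet(1), T), 2: shift_block(haar_wavelet(2), T)}
s = {1: 2.0, 2: 0.5}

# Sigma = V S V' with V = [V_1 V_2] and S = diag(s_p I_T): a sum of block products
Sigma = sum(s[p] * V[p] @ V[p].T for p in V)

print(Sigma.shape, np.allclose(Sigma, Sigma.T))   # (32, 32) True
print(np.linalg.eigvalsh(Sigma).min() >= -1e-10)  # positive semi-definite: True
```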
In fact, in our case this identity holds for all packets $p$. The second part of the log-likelihood (8) can be rewritten using the properties of the trace operator, since $X_T' \Sigma^{-1}_{S(b)} X_T = \operatorname{tr}(X_T' \Sigma^{-1}_{S(b)} X_T)$, scalars being equal to their trace. Using the decompositions of the covariance matrices above, the trace can then be evaluated.
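The scalar-trace step can be checked numerically in a few lines; the matrix $M$ below is a generic symmetric positive definite stand-in for $\Sigma^{-1}_{S(b)}$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
M = rng.standard_normal((8, 8))
M = M @ M.T + np.eye(8)             # symmetric positive definite stand-in

quad = x @ M @ x                    # the scalar quadratic form  x' M x
tr = np.trace(M @ np.outer(x, x))   # the same quantity as a trace, tr(M x x')
print(np.isclose(quad, tr))         # True
```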
Combining these two results, the log-likelihood can finally be expressed in the stated form, which completes the proof.

Proof of Proposition 2
Equation (13) gives an explicit expression for the log-likelihood in terms of the spectral ordinates $\{S_p(t/T)\}$, which are the nuisance parameters in the basis selection process. To prove part 1, the profile likelihood is obtained by replacing the spectra in equation (13) by their estimates $L_{p,t}$, and the result follows from the asymptotic unbiasedness of $L_{p,t}$ and an application of Jensen's inequality.

Figure 1: Plot of numerically computed $\log A_{1,j}$ for $j = 1, \ldots, 20$ for Haar wavelets (plotted as H) and Daubechies' extremal phase wavelets with 4 and 8 vanishing moments. The slope of the Haar line obtained by linear regression is (numerically) exactly $-\log 2$.
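The Haar computation behind Figure 1 can be reproduced in a few lines: this sketch computes $A_{1,j} = \sum_\tau \Psi_1(\tau)\Psi_j(\tau)$ from the discrete Haar wavelet vectors (for Haar, $A_{1,j} = 3 \cdot 2^{-j}$ exactly) and recovers the regression slope $-\log 2$. The scale range $j \le 8$ is an illustrative choice.

```python
import numpy as np

def haar_wavelet(j):
    half = 2 ** (j - 1)
    a = 2.0 ** (-j / 2)
    return np.concatenate([np.full(half, a), np.full(half, -a)])

def cross_A(j, l):
    """A_{jl} = sum_tau Psi_j(tau) Psi_l(tau), via full autocorrelations."""
    Pj = np.correlate(haar_wavelet(j), haar_wavelet(j), mode="full")
    Pl = np.correlate(haar_wavelet(l), haar_wavelet(l), mode="full")
    # both autocorrelations are centred on lag 0; pad the shorter one to match
    m = max(len(Pj), len(Pl))
    Pj = np.pad(Pj, ((m - len(Pj)) // 2,) * 2)
    Pl = np.pad(Pl, ((m - len(Pl)) // 2,) * 2)
    return float(Pj @ Pl)

vals = np.array([cross_A(1, j) for j in range(1, 9)])
slope = np.polyfit(np.arange(1, 9), np.log(vals), 1)[0]
print(vals[:3], slope)   # A_{1,j} = 3 * 2^{-j}; the fitted slope is -log 2
```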
To prove part 2, we simply calculate the expectation of the log-likelihood difference using equation (13), which is therefore given by
$$E\left[L_T(b) - L_T\{b, S(b)\}\right] = \frac{1}{2T} \sum_{p \in b} \sum_t E\left[\log L_p(t/T) - \log S_p(t/T)\right] + O(T^{-1}).$$
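One way the Jensen step may be written out (a sketch under the assumption of asymptotic unbiasedness, $E\,L_p(t/T) = S_p(t/T) + o(1)$; the exact constants follow the paper's argument):

```latex
\begin{align*}
E\bigl[\log L_p(t/T)\bigr]
  &\le \log E\bigl[L_p(t/T)\bigr]         && \text{(Jensen, $\log$ concave)} \\
  &=   \log\bigl\{S_p(t/T) + o(1)\bigr\}, && \text{(asymptotic unbiasedness)}
\end{align*}
```

so each summand $E[\log L_p(t/T) - \log S_p(t/T)]$ is asymptotically nonpositive.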