Functional principal component analysis for cointegrated functional time series

Functional principal component analysis (FPCA) has played an important role in the development of functional time series analysis. This note investigates how FPCA can be used to analyze cointegrated functional time series and proposes a modification of FPCA as a novel statistical tool. Our modified FPCA not only provides an asymptotically more efficient estimator of the cointegrating vectors, but also leads to novel FPCA‐based tests for examining essential properties of cointegrated functional time series.


Introduction
Functional principal component analysis (FPCA) has been a central tool for functional time series (FTS) in various contexts encompassing FTS regression (Park and Qian, 2012;Seong and Seo, 2022), prediction (Hyndman and Ullah, 2007;Aue et al., 2015) and long memory FTS (Li et al., 2020) to name a few.In the recent work by Chang et al. (2016), FPCA is applied to nonstationary cointegrated FTS.Given that many economic time series are nonstationary but allow a long run stable relationship, their analysis paves the way for potential applications and extensions (see e.g., Chang et al., 2020, Li et al., 2022, Seo and Shang, 2022).This note further investigates how FPCA can be used in statistical analysis of cointegrated FTS {X t } t≥1 taking values in a Hilbert space H. Cointegrated FTS considered in this paper can be decomposed by orthogonal projections P N and P S = I −P N (where I denotes the identity map) into a finite dimensional unit root process {P N X t } t≥1 and a potentially infinite dimensional stationary process {P S X t } t≥1 .When such a time series is given, it is important to estimate P N or P S since either of them characterizes the cointegrating behavior.Chang et al. (2016) earlier studied how the eigenelements of the sample covariance operator can be used to construct a consistent estimator of P N .In this note, we first add some novel asymptotic results, including the convergence rate and the asymptotic limit of the existing estimator of P N (or P S ) obtained from FPCA.Based on the results, we propose a modified FPCA methodology that guarantees (i) a more asymptotically efficient estimator and (ii) a more convenient asymptotic limit which leads to novel statistical tests for examining hypotheses about cointegration.Our modified FPCA is motivated from semiparametric modifications to obtain asymptotically efficient estimators of the cointegrating vectors in a multivariate cointegrated system as in Phillips and Hansen (1990) and Harris (1997), where the latter article is more related to the present paper (this will be discussed in detail in Section A of the Supplementary Material to this article).
The rest of this paper is organized as follows.Section 2 provides main assumptions.In Section 3, we discuss on the ordinary FPCA and our proposed modification of it as statistical tools to estimate P N or P S , and provide the relevant asymptotic theory.Based on our modified FPCA, Section 4 gives statistical tests to examine various hypotheses on cointegration.In Section 5, we conclude with some cautions about using our methodology in practice and present some idea how the methodology can be further extended.
The Supplementary Material to this article contains six sections (Sections A-F) on some extensions of the theoretical results to be given, Monte Carlo simulations, empirical applications and mathematical proofs.
We review notation for the subsequent sections.Let H denote a real separable Hilbert space with inner product •, • and norm • = •, • 1/2 .The space of bounded linear operator is denoted by L H , which is assumed to be equipped with the operator norm • L H .For any A ∈ L H , its adjoint (A * ), range (ran A), kernel (ker A), rank (rank A), and Moore-Penrose inverse (A † ) are introduced in Section E.1.Moreover, various properties of A ∈ L H such as compactness, positive (semi)definiteness and self-adjointness are also defined in detail.As discussed in Section E.1, if A is self-adjoint, positive semidefinite, and compact, we may define its m-regularized inverse A| † m = m j=1 a −1 j u j ⊗ u j where {a j } m j=1 are positive eigenvalues of A and {u j } m j=1 are the corresponding eigenvectors; A| † m is understood as the partial inverse of A on the restricted domain span({u j } m j=1 ) (for convenience, we let A| † m = 0 if A = 0).Random elements taking values in H and L H are introduced in Section E.2.We let L 2 H denote the space of H-valued random variables X satisfying EX = 0 and E X 2 < ∞.For any X ∈ L 2 H , its covariance operator is defined by C If A is a random linear operator taking values in L H and x 1 , . . ., x n , y 1 , . . ., y n ∈ H for n ∈ N, we let the distribution of ( Ax 1 , y 1 , . . ., Ax n , y n ) be called the finite dimensional distribution (fdd) of A with respect to the choice of vectors; if n = m 2 for m ∈ N, x j = e j and y k = e k for all j, k ≤ m, and {e j } m j=1 is an orthonormal basis of a subspace H m , then the associated fdd can be viewed as the distribution of A as a map from H m to H m If two random bounded linear operators A and B have the same fdd regardless of n and x 1 , . . ., x n , y 1 , . . ., y n ∈ H, we write A = fdd B.Moreover, if {A j } j≥1 is a sequence of random elements in L H and A j − A L H → p 0 for some A taking its value in L H , we write A j → L H A.

Cointegrated FTS in Hilbert space
We let {ε t } t∈Z be an iid sequence in L 2 H with a positive definite covariance and let {Φ j } j≥0 be a sequence in L H satisfying ∞ j=0 j Φ j L H < ∞.We define {ζ t } t∈Z and {η t } t∈Z as follows, where Φ j = − k≥j+1 Φ k .We consider a sequence {X t } t≥0 of which first differences ∆X t = X t − X t−1 satisfy the equations ∆X t = ζ t for t ≥ 1.Then by the Phillips-Solo decomposition (and ignoring the initial values X 0 and η 0 that are unimportant in the development of our asymptotic theory), we find that where ε c t = t s=1 ε s and Φ(1) = j≥0 Φ j .Note that {X t } t≥1 is a unit-root-type nonstationary process unless Φ(1) = 0; however it may have an element x such that { X t , x } t≥1 is stationary.Such an element is called a cointegrating vector, and the collection of the cointegrating vectors, denoted by H S , is called the cointegrating space.The attractor space, denoted by H N , is defined by the orthogonal complement to H S .When X t satisfies (2.1), H S (resp.H N ) is given by [ran Φ(1)] ⊥ (resp.the closure of ran Φ(1)), in this case, H = H N ⊕ H S and the orthogonal projection P N onto H N is uniquely defined (see Proposition 3.3 of Beare et al., 2017).Then P S = I − P N is the orthogonal projection onto H S .Note that {P N X t } t≥1 is a unit root process which does not allow any cointegrating vector while {P S X t } t≥1 is stationary.
{X t } t≥1 in (2.1) has zero mean by construction, but which is not essentially required in this paper.A deterministic component such as a nonzero intercept or a linear trend may be allowed in our analysis with a proper modification, which will be discussed in Section C of the Supplementary Material.
For our asymptotic analysis, we apply the following additional conditions throughout.
Assumption M-(i) is employed for mathematical proofs, which may not be restrictive in practice.Assumption M-(ii) implies that dim(H N ) < ∞, which is commonly assumed in the recent literature on cointegrated FTS for statistical analysis, see e.g., Chang et al. (2016) and Nielsen et al. (2022).
Remark 2.1.Finiteness of ϕ seems to be reasonable in many empirical examples; see Chang et al. (2016, Section 5) and Nielsen et al. (2022, Section 5).Moreover, it is known that functional autoregressive (AR) processes with unit roots (including functional ARMA processes that are considered in e.g.Klepsch et al., 2017) with compact AR operators always satisfy Assumption M-(ii); see Beare and Seo (2020), Franchi and Paruolo (2020) and Seo (2022).It is common to assume compactness of AR operators in statistical analysis of such time series, and thus Assumption M-(ii) does not seem to be restrictive in practice.
The case with H S = {0} for each ϕ is uninteresting for our study of cointegrated FTS, and thus such a case is not considered throughout this paper.Moreover, if ϕ = 0, {X t } t≥1 is stationary and thus H N = {0}; this is also an uninteresting case except when we examine the null hypothesis of ϕ = 0 in Section 4. Thus, our asymptotic theory, to be developed in this section, concerns the case when ϕ ≥ 1.
Related to a sequence {X t } t≥1 satisfying Assumption M, it will be convenient to introduce additional notation.Note that the identity I = P N + P S implies the following operator decomposition: for A ∈ L H , We also define note that each sequence of E N t , E S t and E t is stationary in H.We then let Ω (resp.Γ) denote the long run covariance (resp.one-sided long run covariance) operator of {E t } t∈Z , which are defined by (2.4) Ω and Γ are well defined under Assumption M. As in (2.2), we may also decompose those operators as Brownian motion in H whose covariance operator is given by Ω (see e.g., Chen and White, 1998).We then let W N = P N W and W S = P S W .Note that W N and W S are correlated unless Ω N S = Ω SN = 0.
For convenience in our asymptotic analysis, we employ the following assumption: in the assumption below, we let the long-run covariance Ω of E t be positive definite on H and, for x 1 , . . ., Appropriate lower-level conditions for the above can be found in Hansen (1992); specifically, Assump- converges weakly in the Skorohod topology to W k , (ii) sup t≥1 E t a < ∞, and (iii) {E t } t≥1 is strongly mixing of size −ab(a − b) for some a > b > 2. Among these conditions, the first is implied by Assumption M (see Lemma F.1 in the Supplementary Material).
3 FPCA of cointegrated FTS and asymptotic results

Ordinary FPCA of cointegrated FTS
We now consider estimation of P N and P S from observations {X t } T t=1 .Throughout this section, we assume that ϕ = dim(H N ) is known.Of course, this is not a realistic assumption in most applications.However, we already have a few available methods to determine ϕ such as those in Chang et al. (2016), Nielsen et al. (2022), andLi et al. (2022).In addition to those, it will be shown in Section 4 that the asymptotic results to be given in this section lead to a novel FPCA-based testing procedure to determine ϕ.
The sample covariance operator C of {X t } T t=1 is the random operator given by C = T −1 T t=1 X t ⊗X t .Broadly speaking, FPCA reduces to solving the following eigenvalue problem associated with C, where λ1 ≥ . . .≥ λT ≥ 0 = λT +1 = λT +2 = . . .and {v j } j≥1 constitutes an orthonormal basis of H.
From the estimated eigenvectors {v j } j≥1 , we construct our preliminary estimators of P N and P S as follows, Note that ϕ as a subscript in (3.2) denotes the the number of eigenvectors used to construct P N ϕ ; even if keeping such a subscript may increase notational complexity, it will be eventually helpful to avoid potential confusion when we need to consider P N ϕ 0 , the projection onto span{v j } ϕ 0 j=1 , for a hypothesized value ϕ 0 in Section 4. To investigate the asymptotic properties of the preliminary estimators, we may establish the limiting behavior of either of P N ϕ − P N or P S ϕ − P S (one is simply given by the negative of the other).The following theorem deals with the former case: we hereafter write A(r) to denote 1 0 A(r)dr for any operator-or vector-valued function A defined on [0, 1].
Theorem 3.1.Suppose that Assumptions M and W hold with ϕ ≥ 1.Then Chang et al. (2016, Proposition 2.1) earlier established that P N ϕ − P N L H = O p (T −1 ).Theorem 3.1 extends their result by providing a more detailed limiting behavior of P N ϕ .Even if Theorem 3.1 implies consistency of P N ϕ , it does not facilitate a direct asymptotic inference since F depends on nuisance parameters.We particularly focus on Γ N S and Ω N S ; as shown in Theorem 3.1, Γ N S = 0 makes the limiting ran-dom operator F + F * be not centered at zero, and Ω N S = 0 makes W N and W S be correlated.Therefore, the asymptotic limit of any statistic based on P N ϕ or P S ϕ is, if it exists, also dependent on Γ N S and Ω N S in general.As will be shown throughout this paper, elimination of such dependence is the most important step to our modified FPCA and statistical tests based on it.

Modified FPCA for cointegrated FTS
We now propose a modification of the ordinary FPCA, which not only provides an asymptotically more efficient estimator of P N (or P S ) but establishes the foundation for our statistical tests that will be developed in Section 4. Our modified FPCA may be viewed as an adjustment of the ordinary FPCA for cointegrated FTS rather than an alternative methodology; we actually take full advantage of the asymptotic properties of the preliminary estimators given in Section 3.1.To introduce our methodology, we first define where P N ϕ and P S ϕ are the preliminary estimators obtained from the ordinary FPCA in Section 3. Then the sample counterparts of Ω and Γ given in (2.4) are defined by where k(•) (resp.h) is the kernel (resp.the bandwidth) satisfying the following conditions: h → ∞ and h/T → 0 as T → ∞.
Note that the one-sided kernel k(•) in Assumption K satisfies the identical conditions given by Horváth et al.
(2013) for estimation of the long run covariance of a stationary FTS.From the identity I = P N ϕ + P S ϕ , we may decompose Ω ϕ and Γ ϕ as in (2.2), i.e., We then define a modified variable X ϕ,t as follows, The ϕ-regularized inverse Ω N N ϕ | † ϕ in (3.5) can easily be computed from eigendecomposition of Ω N N once we know ϕ.Let C ϕ be the sample covariance operator of {X ϕ,t } T t=1 , i.e., C ϕ = T −1 T t=1 X ϕ,t ⊗ X ϕ,t .Roughly speaking, replacing C with C ϕ in (3.1) has the effect of transforming W S , appearing in the expression of F in Theorem 3.1, into another Brownian motion that is independent of W N ; however, this does not resolve the issue that the center of the limiting operator depends on nuisance parameters.We therefore need a further adjustment, which is achieved by considering the following modified eigenvalue problem: where Note that the operator C ϕ − Υ ϕ − Υ * ϕ is self-adjoint, so (3.6) does not produce any complex eigenvalues.
From the estimated eigenvectors { ŵj } j≥1 , we construct our proposed estimators of P N and P S as follows: Note that Π N ϕ and Π S ϕ are simply obtained by replacing vj with ŵj in (3.2).The asymptotic properties of Π N ϕ or Π S ϕ are described by the following theorem.
Theorem 3.2.Suppose that Assumptions M, W and K hold with ϕ ≥ 1.Then and W S.N is independent of W N .
The asymptotic limit of the proposed estimator Π N ϕ is of a more convenient form than that of the preliminary estimator based on the ordinary FPCA.First note that the limiting operator G + G * is now centered at zero, whereas F + F * in Theorem 3.1 is not so in general due to Γ N S .In addition, G is characterized by two independent Brownian motions while F is not so in general due to Ω N S .These properties of Π N ϕ not only help us develop statistical tests for examining various hypotheses about cointegration in Section 4, but also make Π N ϕ asymptotically more efficient than P N ϕ in a certain sense, see Remark 3.1.
Remark 3.1.For any k, x = (x 1 , . . ., x k ) ∈ H k and y = (y 1 , . . ., y k ) ∈ H k , let P (k, x, y) = ( T P N P S ϕ x 1 , y 1 , . . ., T P N P S ϕ x k , y k ) and Π(k, x, y) = ( T P N Π S ϕ x 1 , y 1 , . . ., T P N Π S ϕ x k , y k ) .From similar arguments used in Saikkonen (1991, Theorem 3.1) and Harris (1997, Section 2), the following can be shown: for any choice of k, x, y, and Θ ⊂ R k that is convex and symmetric around the origin, lim T →∞ Prob.{Π(k, x, y) ∈ Θ} ≥ lim T →∞ Prob.{P (k, x, y) ∈ Θ} and the equality does not hold in general unless Ω N S = Γ N S = 0.A detailed discussion is given at the end of Section F.1.2 in the Supplementary Material.This implies that the asymptotic distribution of Π(k, x, y) is more concentrated at zero than that of P (k, x, y).A similar result can be shown for P S Π N ϕ and P S P N ϕ .In this sense, we say that Π N ϕ is more asymptotically efficient than P N ϕ .
Remark 3.2.Our methodology can be applied in the case where dim(H) < ∞ without further adjustment, which is differentiated from that the existing methods (Nielsen et al., 2022, Chang et al., 2016, and Li et al., 2022) are only discussed in a specific infinite dimensional Hilbert space setting.In this finite dimensional case, the modified FPCA is related to the PCA methodology of Harris (1997), but the latter cannot directly be applied to an infinite dimensional setting.For more details, see Section A of the Supplementary Material.
Remark 3.3.We assumed that ϕ = dim(H N ) is known in this section, but this is not generally possible for practitioners.Of course, ϕ can be replaced by its consistent estimator and this does not affect the asymptotic result given by Theorem 3.2.Nevertheless, as long as this replacement is necessary, our estimator is not guaranteed to be better (in the sense of Remark 3.1) than the ordinary FPCA-based estimator in finite samples; more detailed discussion on this issue is beyond the scope of the present article and hence we leave it as a future work.Despite this limitation, Theorem 3.2 by itself can be a basis for statistical inference on cointegrated FTS, and one such example is our test to be presented in the next section.
4 Applications: statistical tests based on the modified FPCA We develop statistical tests to examine various hypotheses about H N or H S based on the asymptotic results given in Section 3.2.As in the previous sections, we focus on the case without deterministic terms; the discussion is extended to allow a deterministic component in Section C of the Supplementary Material.
4.1 FPCA-based test for the dimension of H N In the previous sections, we need prior knowledge of ϕ = dim(H N ).We here provide a novel FPCA-based test that can be applied sequentially to determine ϕ.Consider the following null and alternative hypotheses, for ϕ 0 ≥ 0. It is worth mentioning that stationarity of {X t } t≥1 , i.e., ϕ 0 = 0, can be examined by the test to be developed.This means that our test can be used as an alternative to the existing tests of stationarity of FTS proposed by e.g.Horváth et al. (2014), Kokoszka and Young (2016), and Aue and van Delft (2020).
It should also be noted that our selection of hypotheses in (4.1) is the opposite of that used for the existing tests proposed by Chang et al. (2016) and Nielsen et al. (2022) in the sense that the alternative hypothesis in those tests is set to H 1 : dim(H N ) < ϕ 0 .Due to this difference, our test has its own advantage especially when it is sequentially applied to estimate ϕ; this will be more detailed in Remark 4.2.
For each ϕ 0 , we define P N ϕ 0 and P S ϕ 0 as in (3.2) from the eigenproblem (3.1); if ϕ 0 = 0, P N ϕ 0 is set to zero.Given P N ϕ 0 and P S ϕ 0 , define Z ϕ 0 ,t , Ω ϕ 0 , Γ ϕ 0 , Υ ϕ 0 , X ϕ 0 ,t , and C ϕ 0 as in Section 3; see (3.3)-(3.7).We expect from Theorem 3.1 that { P S ϕ 0 X t } t≥1 will behave as a stationary process if ϕ 0 = ϕ (since P S ϕ 0 − P S L H = O p (T −1 )), and the eigenvectors {v j } ϕ j=ϕ 0 +1 are asymptotically included in H N if ϕ 0 < ϕ.In the latter case, ( X t , vϕ 0 +1 , . . ., X t , vϕ ) is expected to behave as a unit root process, hence { P S ϕ 0 X t } t≥1 will not be stationary.Based on this idea, it may be reasonable to ask if we can distinguish the correct hypothesis from incorrect ones specifying ϕ 0 < ϕ by examining stationarity of { P S ϕ 0 X t } t≥1 .To simplify the discussion, we focus on the time series { X t , vϕ 0 +1 } t≥1 , which is expected to be stationary under H 0 .To examine stationarity of { X t , vϕ 0 +1 } t≥1 , we consider the following test statistic, where , and k(•) and h satisfy Assumption K.Note that the test statistic is given by the ratio of the sum of squared partial sums of the univariate time series ({ X t , vϕ 0 +1 } T t=1 ) to its sample long-run variance (LRV( X t , vϕ 0 +1 )).This is similar to the well-known KPSS test for examining the null hypothesis of stationarity; however, it should be noted that (4.2) is not identical to the original LM statistic proposed by Kwiatkowski et al. (1992) unless { X t , vϕ 0 +1 } T t=1 is demeaned (see Appendix of their paper).If Ω N S = Γ N S = 0, the asymptotic null distribution of (4.2) does not depend on any nuisance parameters and is given by a functional of two independent standard Brownian motions.We thus may assess the plausibility of the null hypothesis with no significant difficulty.In general cases where Ω N S = Γ N S = 0 is not satisfied, the limiting distribution of (4.2) depends on both of Ω N S and Γ N S , which are unknown in practice (see Theorem B.1 and its proof given in Section F.1 of the Supplementary Material).As expected from Theorem 3.1, this issue is linked closely to the fact that the asymptotic limit of P S ϕ 0 depends on Ω N S and Γ N S .The assumption that Ω N S = Γ N S = 0 is too restrictive in modeling cointegrated FTS, hence a naive use of (4.2) in practice to examine (4.1) should be limited to very special circumstances.However, using the asymptotic results developed for our modified FPCA, the test statistic (4.2) can be modified to have an asymptotic null distribution that is free of nuisance parameters in general cases.Consider the following eigenvalue problem: We then replace X t and vϕ 0 +1 in (4.2) with X ϕ 0 ,t and ŵϕ 0 +1 , respectively, and obtain the following statistic: There is a big gain from this simple replacement: different from (4.2), the asymptotic null distribution of (4.4) does not depend on any nuisance parameters and is given by a functional of two independent standard Brownian motions without requiring Ω N S = Γ N S = 0; this is, of course, closely related to the asymptotic result given by Theorem 3.2.We thus may asymptotically assess the plausibility of the null hypothesis relative to the alternative as follows: below, we let B and W denote independent ϕ 0 -dimensional and onedimensional standard Brownian motions, respectively.
Proposition 4.1.Suppose that Assumptions M, W and K hold.Then Q → d V 2 (s)ds under H 0 of (4.1) and If ϕ 0 = 0 (and thus X ϕ,t = X t , C ϕ 0 = C, and Γ ϕ 0 = Γ * ϕ 0 = 0), the above test is similar but not identical to the test of Horváth et al. (2014); their test is based on computation of the eigenelements of the sample long-run covariance operator of X t unlike ours is based on those of C. The asymptotic test given in Proposition 4.1 in fact corresponds to a special case of the general version of our FPCA-based test, which is discussed in the Supplementary Material (Section B); see also Section C.2 for our extension of the test for the case where there is a nonzero intercept and/or a linear trend.
Remark 4.1.To determine ϕ, we may apply our test for ϕ 0 = 0, 1, . . .sequentially.Let φ denote the value under H 0 that is not rejected for the first time at a fixed significance level α.We deduce from Theorem B.1 that Prob.{ φ = ϕ} → 1 − α and Prob.{ φ < ϕ} → 0. Note that, even if the sequential procedure requires multiple applications of the proposed test, the correct asymptotic size, α, is guaranteed without any adjustments; this property is shared by the existing sequential procedures developed in a Euclidean/Hilbert space setting (see e.g., Johansen, 1995;Nyblom and Harvey, 2000;Chang et al., 2016;Nielsen et al., 2022).
Remark 4.2.In Chang et al. (2016) and Nielsen et al. (2022), ϕ is determined by testing the following hypotheses: for a pre-specified positive integer ϕ max and ϕ 0 = ϕ max , ϕ max − 1, . . ., 1, sequentially until H 0 is not rejected for the first time.Note that we need a prior information on an upper bound of ϕ, i.e., ϕ max , for this type of procedure.However, in an infinite dimensional setting, there is no nat-ural upper bound of dim(H N ).On the other hand, our sequential procedure described in Remark 4.1 does not require such a prior information; the procedure first examines the null hypothesis H 0 : dim(H N ) = 0, and 0 is the minimal possible value of dim(H N ).Even with this difference resulting from a different formulation of the alternative hypothesis, our method can be compared with the existing ones as a way to estimate ϕ.From our simulation study we find that our procedure seems to perform comparably with the recent method proposed by Nielsen et al. (2022) and tends to work better if the stationary component is persistent; see Section D. 1 and Table 8.If ϕ ≥ 1 is known in advance, ϕ can also be estimated by the eigenvalue ratio criterion proposed by Li et al. (2022) under their assumptions.Our procedure given in Remark 4.1 is significantly differentiated from any of these existing procedures, and thus ours can complement those in practice.

Tests of hypotheses about cointegration
Practitioners may be interested in testing various hypotheses about H N or H S .For example, we may want to test if a specific element x 0 is included in H N or the span of a specified set of vectors contains H N .More generally, we here consider testing the following hypotheses: for a specified subspace M, Let P M denote the projection onto M and assume that ϕ is known; in practice, we may apply any statistical method discussed in Remarks 4.1 and 4.2 to determine ϕ.In this case, (4.5) and (4.6) can be tested by examining the dimension of the attractor space associated with the residuals Specifically, let Q M be the test statistic computed as in (4.4) from {(I − P M )X t } T t=1 for ϕ 0 = ϕ − dim(M) (resp.ϕ 0 = 0) if we examine (4.5) (resp.(4.6)).Then we may deduce from Proposition 4.1 that, under H 0 of (4.5), Q M → d W 2 (s)ds while it diverges to infinity under H 1 of (4.5).
Remark 4.3.We assess the plausibility of H 0 in (4.5) or (4.6) by checking if the dimension of the attractor space associated with {(I−P M )X t } t≥1 is higher than that is implied by H 0 .This is possible since our test is consistent against the alternative hypotheses with higher dimensional attractor spaces.Unlike our proposed test, the top-down tests proposed by Chang et al. (2016) and Nielsen et al. (2022) cannot be used in this way.

Numerical studies on the proposed tests: a brief summary
In Sections D.1-D.2 of the Supplementary Material, we examine the finite sample properties of our proposed tests for (4.1), (4.5) and (4.6).The tests seem to perform reasonably but slightly over-reject the null hypothesis when the stationary component is persistent.We also compare the finite sample performance of our test as a way to estimate ϕ = dim(H N ) (Remark 4.1) with the recent testing procedure of Nielsen et al. (2022).
We found that, overall, ours performs comparably with theirs in general; moreover, in our simulation setting with a persistent stationary component, ours tends to outperform theirs.The performance of our testing procedure seems to be dependent on the choice of h while the method proposed by Nielsen et al. (2022) does not require such a bandwidth choice (since theirs does not require any long-run covariance estimation).Despite this disadvantage, we may conclude from the simulation evidence that our testing procedure can be an attractive alternative or complement to theirs.Of course, as mentioned in Remark 4.3, the proposed test in the present paper can also be used to examine various hypotheses on cointegration from a relevant residual FTS, which may be another advantage of ours over the test of Nielsen et al. (2022).In addition to the simulation studies, we in Sections D.3 and D.4 illustrate our methodology with two empirical examples: U.S. age-specific employment rates and monthly earning densities.

Concluding remarks : some cautions and future direction
This note adds some novel asymptotic results for the FPCA-based estimator of P N or P S of cointegrated FTS.We then propose a modification of FPCA for statistical inference on the cointegrating behavior.Some cautions need to be made for practitioners.First, in order for the modified FPCA to work well, we need to accurately estimate P N first; however, as remarked by Nielsen et al. (2022), it is not easy particularly if dim(H N ) is large but the sample size is small.Moreover, this note does not address in detail on the choice of the bandwidth parameter h used to estimate long-run covariance operators of some residual time series considered in the note.Given that all such time series are expected to be stationary, the method proposed by Rice and Shang (2017) may be used, but this issue obviously requires a further investigation.Despite these limitations, our proposed methodology gives us some useful results, such as statistical tests on dim(H N ) and cointegrating properties.To put it in more detail, our test as a way to estimate dim(H N ) performs comparably with the existing methods (it is found to perform better for some simulation setting considered in Section D.1), and the test by itself can be used to examine various hypotheses on cointegration (see Remark 4.3); see also Remarks 3.1 and 3.2.The theoretical results in this paper may be used for more sophisticated statistical models requiring dimension-reduction of a FTS exhibiting unit-root-like behavior; a potential example may be the cointegrating regression model involving functional regressors.Provided that practitioners want to employ FPCA-based dimension-reduction, which has been widely used in various contexts, our results will be helpful to derive detailed asymptotic properties of estimators for such models.Supplementary Material on "Functional principal component analysis for cointegrated functional time series" Abstract This supplementary material consists of two parts.In Part I, we provide some relevant extensions of the theoretical results given in the main article.In Part II, we provide mathematical justifications for the results given in the main article and Part I.

Outline
Part I of this supplementary material consists of four sections (Sections A-D).Section A shows how our modified FPCA is related to the PCA methodology proposed by Harris (1997).Section B provides a more general version of the statistical test given in Section 4.1 and Section C extends the theoretical results given in the main article (and also those in Section B).In Section D, we provide simulation results (to see the finite sample performances of the proposed tests) and empirical applications (to illustrate our methodology for practitioners).In Part II, we provide mathematical justifications for the theoretical results given in the main article and Part I. Section E summarizes mathematical preliminaries and Section F provides proofs.

Part I : Supplementary results and numerical studies
A Modified FPCA in a finite dimensional setting We consider a special case when dim(H) < ∞ and the minimum eigenvalue of E[E t ⊗E t ] is strictly positive.
Even if our modified FPCA can be applied without any adjustment in this case, we here provide a way to obtain an estimator whose asymptotic properties are equal to those of Π N ϕ given in Theorem 3.2.Define where Let Cϕ be the sample covariance of { Ẍϕ,t } T t=1 and consider the following eigenvalue problem, Cϕ ẅj = μj ẅj , j = 1, 2, . . ., dim(H). (A.1) We then define ΠN ϕ = ϕ j=1 ẅj ⊗ ẅj and ΠS ϕ = I − ΠN ϕ .In fact, (A.1) is a simple adaptation of the eigenvalue problem proposed in Harris (1997) for multivariate cointegrated systems.Based on the asymptotic results given in Harris (1997), the following can be shown.
Proposition A.1.Suppose that Assumptions M, W and K hold with ϕ ≥ 1, dim(H) < ∞, and the minimum where G is given in Theorem 3.2.
However, the asymptotic result given in Proposition A.1 essentially requires C −1 ϕ,Z to converge in probability to the inverse of E[E t ⊗ E t ] (see the proof of Theorem 2 in Harris, 1997).In an infinite dimensional setting, E[E t ⊗ E t ] does not allow its inverse as an element of L H and, moreover, C −1 ϕ,Z does not converge to any bounded linear operator since the inverse of the minimum eigenvalue of E[E t ⊗ E t ] diverges to infinity.This is the reason why Proposition A.1, unlike Theorem 3.2, is not applicable in an infinite dimensional setting.
Remark A.1.As in Section 3 of Harris (1997), it may also be possible to develop Wald-type tests of various linear restrictions on the cointegrating vectors H S and they will have standard χ 2 limiting distributions (due to the mixed Gaussianity induced by the modified estimator, see the proof of Theorem 4 of Harris, 1997) with degrees of freedom increasing linearly in dim(H).However, this approach explicitly requires a finite dimensional setting and thus not directly applicable to a more general potentially infinite dimensional setting.On the other hand, our tests developed in Section 4.2 can be used regardless of if dim(H) < ∞ or not.

B General version of the test given in Proposition 4.1
Let Π K ϕ 0 be the projection onto the span of { ŵj } K j=1 for any arbitrary finite integer K in Once { ŵj } K j=1 are given, z ϕ 0 ,t can easily be computed.It should be noted that z ϕ 0 ,t may be understood as Π K ϕ 0 Π S ϕ 0 X ϕ 0 ,t , i.e., the projected image of possibly infinite dimensional element Π S ϕ 0 X ϕ 0 ,t on span({ ŵj } K j=1 ) of dimension K; this K-dimensional time series of z ϕ 0 ,t is all we need to construct our test statistic.It may be helpful to outline how the sequence {z ϕ 0 ,t } t≥1 behaves under H 0 and H 1 prior to defining the test statistic.From the asymptotic results derived in Section 3, the following can be shown (Lemma From the above results, we expect that the subvector ( X ϕ 0 ,t , ŵϕ 0 +1 , . . ., X ϕ 0 ,t , ŵϕ ) of z ϕ 0 ,t will behave as a unit root process while the remaining subvector will behave as a stationary process.Only when ϕ 0 = ϕ, {z ϕ 0 ,t } t≥1 will behave as a stationary process.Based on this idea, our test statistic is constructed to examine stationarity of {z ϕ 0 ,t } t≥1 as follows: where The test statistic is similarly constructed as other KPSS-type statistics developed for multivariate cointegrated systems; see e.g.Shin (1994), Choi andAhn (1995), andHarris (1997).Computation of our test statistic is easy once the projected time series {z ϕ 0 ,t } T t=1 is obtained from the modified eigenvalue problem (4.3).If K = ϕ 0 + 1, the test statistic (B.2) becomes identical to (4.4).
We hereafter let B and W denote independent ϕ 0 -dimensional and (K − ϕ 0 )-dimensional standard Brownian motions, respectively.The asymptotic properties of our test statistic are described by these independent Brownian motions as follows.
Theorem B.1.Suppose that Assumptions M, W and K hold, and K is a finite integer in (ϕ 0 , dim(H)].
Under H 0 of (4.1), where , and the second term is regarded as zero if

C Inclusion of deterministic terms
We now extend the main results given in Sections 3, 4 and B to allow deterministic terms that may be included in the time series of interest.The proofs of the results in this section will be given later in Section F.
In particular, we in this section consider the following unobserved component models: where X t is a cointegrated time series considered in Section 3. We define the functional residual U t as in Kokoszka and Young (2016): for t = 1, . . ., T , The sample covariance operator of {U t } T t=1 is given by

Extension of the results given in Section 3
Consider the eigenvalue problems given by We similarly obtain our preliminary estimator P Define for t = 1, . . ., T , The sample long run covariance and one-sided long run covariance of {Z ϕ,t } T t=1 are similarly defined as in (3.4), and denoted by Ω ϕ (resp.Γ ϕ ).As in (2.2), we consider the following operator decompositions: for i ∈ {N, S} and j ∈ {N, S}, For t = 1, . . ., T , define As in Section 3.2, we let and consider the following modified eigenvalue problems that are parallel to (3.6), We then construct Π Moreover, we let F and G be random bounded linear operators satisfying that where, as in Theorem 3.2, W S.N = W S − Ω SN (Ω N N ) † W N and W S.N is independent of W N .The asymptotic properties of the estimators are given as follows.
Theorem C.1.Suppose that Assumptions M, W and K hold with ϕ ≥ 1.
Moreover, as in Remark 3.1, it can be shown without difficulty that Π N ϕ is more asymptotically efficient than P N ϕ ; see Remark 3.1 and our detailed discussion to be given in Section F.1.2.

C.2 Extension of the results given in Sections 4 and B
For any hypothesized value ϕ 0 , we similarly construct P N ϕ 0 and P S ϕ 0 from (C.3).Define Z ϕ 0 ,t , Ω ϕ 0 , Γ ϕ 0 , Υ ϕ 0 , U ϕ,t , and C ϕ 0 for each of Model D1 and Model D2 as in Section C. We then consider the following modified eigenvalue problems, As in (B.1), we define the following: for K > ϕ 0 , To examine the null and alternative hypotheses given in (4.1), we consider the following statistics for Model D1 and Model D2 respectively; To describe the asymptotic properties of Q(K, ϕ 0 ), we let where B and W are the independent Brownian motions that are defined for the case with no deterministic terms.The asymptotic properties of the statistics are given follows: Theorem C.2. Suppose that Assumptions M, W and K hold, and K is a finite integer in (ϕ 0 , dim(H)]. where and the second term in the expression of V is regarded as zero if ϕ 0 = 0.Under H 1 and each of the models, Tables 2 and 3 of Harris (1997) report critical values for some choices of K and ϕ 0 .At least to some extent, our test may be viewed as an extension of the PCA-based test proposed by Harris (1997), which extends the KPSS test for univariate/multivariate cointegrated time series.In such a finite dimensional setting, the test statistic (B.2) can be reconstructed by the outputs from the PCA described in Section A and K can be set to dim(H); this makes our test become identical to that given by Harris (1997).
It should be noted that the asymptotic null distribution given in Theorem B.1 depends on K.This is a new aspect that arises from the fact that the proposed test examines stationarity of potentially infinite dimensional time series { Π S ϕ 0 X ϕ 0 ,t } t≥1 by projecting it onto a K-dimensional subspace as in Horváth et al. (2014).As a consequence, critical values for the test statistic depend on both K and ϕ 0 ; however, for any fixed K and ϕ 0 , those can easily be simulated by standard methods since the limiting distribution of the test statistic is simply given by a functional of two independent standard Brownian motions.One can refer to Table 1 of Harris (1997) reporting critical values for a few different choices of K and ϕ 0 , with the caution that the table reports critical values depending on the dimension of time series and the cointegration rank, which correspond to K and K − ϕ 0 in this paper, respectively.Even if Theorem B.1 holds for any arbitrary finite integer K in (ϕ 0 , dim(H)], it may be better, in practice, to set K to be only slightly greater than ϕ 0 , such as K = ϕ 0 + 1 or ϕ 0 + 2; our simulation results support that a large value of K tends to yield worse finite sample properties of the test; see Section D.1 and also the discussion given by Nielsen et al. (2022, Section 3.5) in a similar context.

D Numerical studies D.1 Monte Carlo Simulations
We first investigate the finite sample performances of our tests given in Sections 4 and B by Monte Carlo study.For all simulation experiments, the number of replications is 2000, and the nominal size is 5%.

Simulation setups
Let {f j } j≥1 be the Fourier basis of L 2 [0, 1], the Hilbert space of square integrable functions on [0, 1] equipped with inner product f, g = f (u)g(u)du for f, g ∈ L 2 [0, 1].For each ϕ (≤ 5, in our simulation expements), we let E N t and E S t be generated by the following stationary functional AR(1) models: where {g N j } ϕ j=1 (resp.{g S j } 10 j=1 ) are randomly drawn from {f 1 , . . ., f 6 } (resp.{f 7 , . . ., f 18 }) without replacement, and ε t is given as follows: for standard normal random variables {θ j,t } j≥1 that are independent across j and t, ε t = 80 j=1 θ j,t (0.95) j−1 f j .We then construct X t from the relationships P N ∆X t = E N t and P S X t = E S t for t ≥ 1.Note that H N is given by the span of {g N j } ϕ j=1 , which is not fixed across different realizations of the DGP; moreover, an orthonormal basis {g N j } ϕ j=1 of H N is selected only from a set of smoother Fourier basis functions {f j } 6 j=1 .The former is to avoid potential effects caused by the particular shapes of H N as in Nielsen et al. (2022), and the latter is not to make estimation of H N with small samples too difficult: as expected from the results given by Nielsen et al. (2022), the finite sample performances of tests for the dimension of H N may become poorer as H N includes less smooth functions.We also let {α j } ϕ j=1 and {β j } 10 j=1 be randomly chosen as follows: for some β min and β max , It is known that persistence of the stationary component ({E S t } t∈Z in this paper) has a significant effect on finite sample properties of KPSS-type tests (see e.g., Kwiatkowski et al., 1992;Nyblom and Harvey, 2000).
We thus will investigate finite sample properties of our tests under a lower persistence scheme (β min = 0, β max = 0.5) and a higher persistence scheme (β min = 0.5, β max = 0.7).It may be more common in practice to have a nonzero intercept or a linear time trend in (2.1), and Section C.2 provides a relevant extension of Theorem B.1 to accommodate such a case.We here consider the former case and add an intercept µ 1 , which is also randomly chosen, to each realization of the DGP (some simulation results for the case with a linear trend are reported in Table 5); specifically, µ 1 = 4 j=1 θj p j / 4 j=1 θ2 j , { θj } 4 j=1 are independent standard normal random variables, and p j is (j − 1)-th order Legendre polynomial defined on [0, 1].Finally, functional observations used to compute the test statistic are constructed by smoothing X t observed on 200 regularly spaced points of [0, 1] using 20 quadratic B-spline basis functions (the choice of basis functions has minimal effect in our simulation setting).
We need to specify the kernel function k(•), the bandwidth h, and a positive integer K satisfying K > ϕ 0 to implement our test.In our simulation experiments, k(•) is set to the Parzen kernel, and two different values of h, T 1/3 and T 2/5 , are employed to see the effect of the bandwidth parameter on the finite sample performances of our tests.Moreover, our simulation results reported in this section are obtained by setting K = ϕ 0 + 1, which is the smallest possible choice of K. We found some simulation evidence supporting that this choice of K tends to result in better finite sample properties; to see this, one can compare the results reported in Table 1 and those in Tables 3 and 4.

Simulation results
We investigate the finite sample performance of the proposed test for the dimension of H N .In our simulation experiments, finite sample powers are computed when ϕ = ϕ 0 + 1; as ϕ gets larger away from ϕ 0 , our test tends to exhibit a better finite sample power as is expected (see Table 2 with Table 1).Table 1 summarizes the simulation results under the two different persistence schemes.Under the lower persistence scheme (β min = 0, β max = 0.5), for all considered values of ϕ and h the test has excellent size control with a reasonably good finite sample power.On the other hand, it displays over-rejection under the higher persistence scheme (β min = 0.5, β max = 0.7).Our simulation results evidently show that (i) such an over-rejection tends to disappear as T gets larger and/or h gets bigger, and (ii) choosing a bigger bandwidth with fixed T tends to lower finite sample power.To summarize, employing a bigger bandwidth helps us avoid potential overrejection, which can happen when {E S t } t∈Z is persistent, at the expense of power.This trade-off between correct size and power in finite samples seems to be commonly observed for other KPSS-type tests; see e.g.Kwiatkowski et al. (1992) and Nyblom and Harvey (2000).
We also investigate finite sample properties of the tests for examining hypotheses about H N or H S .Among many potentially interesting hypotheses, we consider the following: for a specific vector x 0 ∈ H and an orthonormal set {x j } ϕ j=1 ⊂ H, Note that span({g N j } ϕ j=1 ) = H N and g S 1 ∈ H S in each realization of the DGP (see (D.1)).In our simulation experiments for (D.2), finite sample sizes and powers are computed by setting Clearly, x 0 = g N 1 ∈ H N if γ = 0. On the other hand, x 0 deviates slightly from H N in the direction of g S 1 when γ > 0, but such a deviation gets smaller as T increases.In our experiments for (D.3), finite sample sizes and powers are computed by setting x j = g N j for j = 2, . . ., ϕ, γ = 0, 20, 40, 60. (D.4) Obviously, γ = 0 implies that span({x j } ϕ j=1 ) = H N in (D.4) while γ > 0 makes span({x j } ϕ j=1 ) deviates slightly from H N in the direction of g S 1 .The simulation results for our tests of the hypotheses (D.2) and (D.3) for a few different values of ϕ are reported in Tables 6 and 7; since the proposed tests are essentially simple modifications of our test for the dimension of H N , they seem to have similar finite sample properties to those we observed in Table 1.We can also observe a trade-off between correct size and power when {E S t } t∈Z is persistent in each of Tables 6 and 7.As mentioned in Remark 4.1, our test can sequentially be applied to estimate ϕ = dim(H N ) as an alternative method to the existing ones.In Table 8, we compare ours with the recent testing procedure proposed by Nielsen et al. (2022).According to our simulation results, our proposed method in general tends to perform comparably with that of Nielsen et al. (2022).In panel (a) where the stationary component is not persistent, even if the test of Nielsen et al. (2022) tends to be better when ϕ is large, our test appears to exhibit some advantages when T is small and ϕ ≤ 3. Thus the results do not indicate any absolute advantage.On the other hand, in panel (b) where the stationary component is persistent, the performance of our testing procedure tends to be significantly better than that of Nielsen et al. (2022).This simulation evidence (with our additional simulation results which are reported in panel (c) of Table 8) suggests that ours can be an attractive alternative or complement to the test of Nielsen et al. (2022) for practitioners.
Among the considered tests in this section, our test for (D.2) requires a prior knowledge on the true dimension ϕ = dim(H N ) and the rejection frequencies given in Table 6 are computed assuming that ϕ is known.In Table 9, we replace ϕ with φ obtained from our testing procedure (Remark 4.1).In this case, the magnitude of over-rejection is found to be bigger overall compared to that in the previous case with known ϕ.Such an over-rejection seems to be particularly severe when ϕ is large and T is small.This may be due to that inaccuracy of φ tends to be bigger when ϕ is larger and/or T is smaller (see Table 8).

D.2 Supplementary simulation results
Table 2: Supplementary results to Table 1 (a)    are independent copies of N (0, 1), and pj is (j − 1)-th order Legendre polynomial defined on [0, 1].µ2 is generated in a similar manner.Notes: ϕ is estimated by our proposed method with h = T 1/3 and h = T 2/5 .For comparison, ϕ is also estimated by the sequential method proposed by Nielsen et al. (2022) when the dimension of the asymptotic superspace (see Remark 13, Nielsen et al., 2022) is set to ϕ0 + (NSS1) and ϕ0 + 2 (NSS2).The maximal possible number of stochastic trends, which is also a necessary tuning parameter for their method, is set to 7. (2022, Section 5.1).Such employment rates take values in [0, 1] by construction, hence we take the logit transformation φ(Z a,t ) as suggested by Nielsen et al. (2022), and then obtain functional observations X t (u) for u ∈ [15, 64] by smoothing φ(Z a,t ) over a using 18 quadratic B-spline functions.Due to this construction, {X t } T t=1 may be understood as a high but finite dimensional time series; however as discussed in Section A, our methodology is compatible with the finite dimensionality of X t , so this does not cause any theoretical issues in applying our methodology.In Figure 1 we plot the functional observations and univariate time series { X t , x } t≥1 for some choices of x, which may help us explore characteristics of the FTS.Specifically we consider x = x 1 , x 2 and x 3 defined as follows: Clearly, X t , x 1 (resp.X t , x 2 ) computes the average (logit) employment rate of individuals aged less than 40 years (resp.no less than 40 years), and X t , x 3 computes their difference.Provided that the two Notes: T = 408.We use * and * * to denote rejection at 5% and 1% significance level, respectively; the approximate critical values for 95% (resp.99%) are given by 0.15, 0.12, and 0.10 (resp.0.22, 0.18, and 0.15) sequentially.
time series given in Figure 1-(b) seem to be nonstationary, we expect that there exists at least one stochastic trend; however, it is not possible to conclude from the plots that there are multiple stochastic trends since a single stochastic trend, say w, can make both time series of X t , x 1 and X t , x 2 nonstationary if w, x 1 and w, x 2 are nonzero.It can also be seen from Figure 1-(c) that for some choice of x the time series of X t , x may exhibit a lower persistence than those of x 1 and x 2 ; this may be suggestive of the possible existence of cointegrating vectors.
When a cointegrated FTS is given, it is important to estimate H N .We already know from our earlier results that H N can be estimated by the span of the first ϕ (=dim(H N )) eigenvectors computed from the proposed eigenvalue problem (3.6); however in practice ϕ is unknown.We thus apply our FPCA-based sequential procedure to determine ϕ.As in Nielsen et al. (2022), the model with a linear trend (see Section C) is considered for this empirical example.Table 10 reports the test results when k(•), h, and K are set to the Parzen kernel, T 2/5 , and ϕ 0 + 1, respectively.The proposed sequential procedure concludes that the dimension of H N is 1 both at 5% and 1% significance levels, hence H N can be estimated by the span of the first eigenvector w 1 obtained from our modified FPCA. Figure 2 shows the estimated eigenvector and the time series { U t , w 1 } T t=1 .Note that the time series of X t , x for any x ∈ H N is a unit root process with a linear deterministic trend, hence the time series of U t , w 1 is expected to behave as a unit root process, which is as shown in Figure 2-(b).
In practice, one may also be interested in testing various hypotheses about cointegration such as (4.5) and (4.6).These hypotheses can be examined by the tests given in Section 4.2 once ϕ (=dim(H N )) is estimated.For illustrative purposes, we consider the following hypotheses, where x = x 1 , x 2 or x 3 given in (D.5).The test results are reported in Table 11 under the assumption that  ϕ = 1 as we earlier concluded from our sequential procedure.The null hypothesis that x ∈ H N is rejected in every case at 1% significance level, which means that each of the considered functions is not entirely included in H N but given by a linear combination of nonzero elements of H N and H S ; these results are, at least to some extent, expected from that H N is estimated by span{w 1 } and the considered functions have quite different shapes from that of w 1 ; see Figure 2-(a).Moreover, the null hypothesis that x ∈ H S is rejected at 1% significance level for x = x 1 or x 2 , and it is rejected at 5% (but not 1%) significance level for x = x 3 .These test results would lead us to similarly conclude that each of the considered functions is not entirely included in H S , but now it is worth noting that the null hypothesis for x = x 3 is rejected with less confidence; that is, the data supports that x 3 is relatively closer to H S than x 1 and x 2 .This result was earlier conjectured from the plots given in Figure 1, and we here note that such a conjecture has been, at least to some extent, examined by our proposed FPCA-based test.
In this example, we found a strong evidence that there exists at least one stochastic trend in the time series of age-specific employment rates.It may be of interest to investigate whether this time series is cointegrated with some economic variables exhibiting a unit root-type nonstationarity.This can be further explored in the future by developing a cointegrating regression model involving functional observations.

D.4 Empirical illustration 2: cross-sectional earning densities
We consider a monthly sequence of U.S. earning densities running from January 1989 to December 2019; this empirical example is similar to that given by Chang et al. (2016, Section 5.1).The raw individual weekly earning data can be obtained from the CPS at https://ipums.org.Since individual earnings in each month are reported in current dollars, they are all adjusted to January 2000 prices using monthly consumer price index data obtained from the Federal Reserve Economic Database at https://fred.stlouisfed.org.Reported weekly earnings are censored from above, with the threshold for censoring switching partway through the sample: the top-coded nominal earning is 1923 before January 1998, and 2885 afterward.In addition, the raw dataset contains many abnormally small nominal earnings, such as Notes: T = 372.We use * and * * to denote rejection at 5% and 1% significance level, respectively; the approximate critical values for 95% (resp.99%) are given by 0.15, 0.12, and 0.10 (resp.0.22, 0.18, and 0.15) sequentially.
$0.01 a week.In the subsequent analysis, weekly earnings below the 3th percentile and above the 97th percentile are excluded to avoid potential effects of those abnormal earnings.Each earning density is estimated from the remaining individual earnings as in Seo and Beare (2019) by applying local likelihood density estimation proposed in Loader (1996); this will be more detailed at the end of this section.One may treat such density-valued observations as random elements of the familiar Hilbert space L 2 of square integrable functions to apply our methodology; however, as pointed out by Petersen and Müller (2016), this is not in general advisable since the set of probability densities is not a linear subspace in this case (see also Delicado, 2011;Hron et al., 2016;Kokoszka et al., 2019;Zhang et al., 2021).Therefore, we first embed the estimated densities into a Hilbert space via the transformation approach proposed in Egozcue et al. (2006).Let S ⊂ R be the support of the probability density functions and let |S| be the length of S; in this empirical example S is set to [75.36, 1823.94]where the left endpoint (resp.the right endpoint) corresponds to the minimal (resp. the maximal) individual earning over the time span.We then define Then the transformed series {f t } t≥1 are regarded as a time series taking values in L 2 0 [0, 1], the collection of all x ∈ L 2 [0, 1] satisfying x(u)du = 0, which is a Hilbert space.The transformation ψ given above turns out to be an isomorphism between a Hilbert space of probability density functions (called a Bayes Hilbert space) and L 2 0 [0, 1], and its inverse is given by ψ −1 (f ) = exp(f )/ exp(f (u))du for any f ∈ L 2 0 [0, 1]; see Egozcue et al. (2006).For our purpose to illustrate our modified FPCA methodology, we hereafter only concern with the transformed series {f t } t≥1 ; the results obtained in this section can be naturally converted into those for the original density-valued time series {X t } T t=1 using the inverse transformation ψ −1 under certain mathematical conditions; see Seo and Beare (2019) for a more detailed discussion.Figure 3 reports the estimated earning densities and their transformations as our functional observations.Given that the functional observations are constructed from weekly earnings which tend to increase over time, it may be reasonable to consider the model with a linear trend as in Section D.3.We apply the sequential procedure based on our proposed test to determine ϕ.We use 18 quadratic B-spline functions for the representation of the functional observations, and k(•), h, and K are set in the same way.Table 12 reports the test results.Our sequential procedure concludes that the dimension of H N is 2 at 1% significant level.
Given this result, H N is estimated by the span of the first two eigenvectors {w j } 2 j=1 , which are obtained from our modified FPCA.If we let f t denote the residual from detrending f t , it is then expected that the time series of f t , w j behaves as a unit root process for j = 1 and 2. These are shown in Figure 4. Once the dimension of H N is estimated, then we may examine various hypotheses about cointegration using the tests developed in Section 4.2.We in this example found an evidence that the time series of earning densities exhibits stochastic trends.
As in the example given in Section D.3, it may be interesting to investigate if this time series of earning densities is cointegrated with other economic time series, and this can certainly be further explored in the future study.

Estimation of densities of earning
To obtain densities of earning, we employ the local likelihood method by Loader (1996Loader ( , 2006)).Suppose that X is the density of interest which is supported on S, and there are n individual earnings available for estimation of X.Given survey responses a 1 , . . ., a n with design weights w 1 , . . ., w n such that n i=1 w i = n, the weighted log-likelihood is given by L(X) = n i=1 w i log(X(a i )) − n X(u)du − 1 .Under some smoothness assumptions, we can consider a localized version of L(X), and log X(a) can be locally approximated by a polynomial function, as follows.
where W(•) is a suitable kernel function, b is a nearest neighborhood bandwidth ensuring that a fixed percent of the data is included in the local neighborhood of a, and Q(u − a; ) is the q-th order polynomial in u − a with coefficients = ( 0 , . . ., q ).Let ( ˆ 0 , . . ., ˆ q ) be the maximizer of (D.7).The local likelihood log-density estimate is then given by log X(a) = ˆ 0 and therefore X(a) = exp( ˆ 0 ).The procedure is repeated for a fine grid of points, and then X may be obtained from an interpolation method described in (Loader, 2006, Chapter 12).Following this procedure, each earning density is obtained on the common support S = [75.35, 1823.94] in our empirical application in Section D.4: to be more specific, we set , and choose b so that 40% of the data is contained in the local neighborhood of a.The employed kernel function is called the tricube kernel and used in Loader (1996).
Part II : Mathematical Appendix denotes the operator z → x, z y of rank one.An operator A ∈ L H is said to be compact if there exists two orthonormal bases {u j } j≥1 and {v j } j≥1 , and a real-valued sequence {a j } j≥1 tending to zero, such that A = ∞ j=1 a j u j ⊗ v j ; we may assume that u j = v j and a 1 ≥ a 2 ≥ . . .≥ 0 if A is also self-adjoint and positive semidefinite (Bosq, 2000, p. 35).For any A ∈ L H with a closed range, we let A † ∈ L H denote the Moore-Penrose inverse; see Engl and Nashed (1981).For any self-adjoint, positive semidefinite, and m is understood as the partial inverse of A on the restricted domain span({u j } m j=1 ).
E.2 Random elements in H and L H Let (S, F, P) denote the underlying probability triple.An H-valued random variable X is defined as a measurable map from S to H, where H is understood to be equipped with the usual Borel σ-field.An Hvalued random variable X is said to be integrable (resp.square integrable) if E X < ∞ (resp.E X 2 < ∞).If X is integrable, there exists a unique element EX ∈ H satisfying E X, x = EX, x for any x ∈ H.
The element EX is called the expectation of X.Let L 2 H denote the space of H-valued random variables X (identifying random elements that are equal almost surely) that satisfy EX = 0 and E X 2 < ∞.For any X ∈ L 2 H , we may define its covariance operator as C X = E [X ⊗ X], which is guaranteed to be selfadjoint, positive semidefinite and compact.Now let A be a map from S to L H such that Ax, y is Borel measurable for all x, y ∈ H.Such a map A is called a random bounded linear operator; see Skorohod (1983).For such operators A and B, we write We provide useful lemmas.
Lemma F.2.Under Assumptions M and K, Ω ϕ and Γ ϕ in (3.4) are consistent, i.e., Ω ϕ → L H Ω and Γ ϕ → L H Γ. Moreover, let Ω ϕ and Γ ϕ be defined as in Section C.Under the same assumptions, Ω ϕ and Γ ϕ are consistent in the same sense.
Proof.We know from Theorem 3.1 that P N ϕ = P N + O p (T −1 ) and P S ϕ = P S + O p (T −1 ), hence where Ω 0,ϕ = P N Ω ϕ P N + P N Ω ϕ P S + P S Ω ϕ P N + P S Ω ϕ P S .Note that Ω 0,ϕ is the sample long-run covariance of {E t } T t=1 , where E t = E N t + E S t , E N t = ∞ j=0 P N Φ j ε t−j , and E S t = ∞ j=0 P S Φ j ε t−j .We know from our proof of Lemma F.1 that the summability conditions ∞ j=1 j Φ j L H < ∞ and ∞ j=1 j Φ j L H < ∞ imply that {E t } t∈Z is L 2 -m-approximable.We then apply Theorem 2 of Horváth et al. (2013) to obtain Ω 0,ϕ → L H Ω, which in turn establishes Ω ϕ → L H Ω. To show Γ ϕ → L H Γ, we note that Γ ϕ = Γ 0,ϕ + O p (T −1 ) where Γ 0,ϕ = P N Γ ϕ P N + P N Γ ϕ P S + P S Γ ϕ P N + P S Γ ϕ P S .Then Γ ϕ → L H Γ may be deduced from the proof of Theorem 2 in Horváth et al. (2013).
For the cases with deterministic terms, the desired results are deduced from Theorem 2 of Horváth et al.
(2013) (when Model D1 is true), Theorem 5.3 of Kokoszka and Young (2016) (when Model D2 is true), and our previous proof for the case without deterministic terms.

F.1.2 Proofs of the main results
Proof of Theorem 3.1.We note the identity Since P N ϕ is the projection onto the first ϕ leading eigenvectors, P N C P S ϕ = P N CP N P S ϕ + P N CP S P S ϕ = P N C(I − ϕ j=1 vj , • vj ) = P N P S ϕ Λ, where Λ = ∞ j=1 λj vj ⊗ vj .We thus find that where (T −1 P N CP N ) † is well defined since T −1 P N CP N is a finite rank operator, hence has a closed range.We first deduce from the weak convergence result given in (F.1) and the continuous mapping theorem that the following holds under Assumption M: for k ≥ 1, x 1 , . . ., x k ∈ H and y 1 , . . ., y k ∈ H, This result implies that F.4) due to the Cramér-Wold device (Lemma E.2-(ii)).Let X S k,t = ( P S X t , x 1 , . . ., P S X t , x k ) , X N k,t = ( P N X t , y 1 , . . ., P N X t , y k ) .We deduce from the convergence result given by Assumption W that where For any x, y ∈ H, P N CP S x, y = T −1 T t=1 P S X t , x P N X t , y , from which we find that k j=1 P N CP S x j , y j = tr(T −1 T t=1 X S k,t X N k,t ).From the continuous mapping theorem, we have dW S (r), x j W N (r), y j + Γ N S x j , y j .
(F.5) From (F.5) and the Cramér-Wold device (Lemma E.2-(ii)), we deduce that It will be shown that the convergence results given by (F.4) and (F.6) can be strengthened as follows: Claim 1: Almost surely, the eigenvalues of A 1 are distinct and each eigenvalue is associated with one-dimensional eigenspace.We then know from Claim 1 and Lemma 4.3 of Bosq (2000) that the eigenvectors of T −1 P N CP N converge to those of A 1 .From similar arguments to the proofs of Proposition 3.2 and Theorem 3.3 of Chang et al. (2016), we find that P N P S ϕ = O p (T −1 ), P S P S ϕ − P S = O p (T −1 ) and λϕ+k = O p (1) for k ≥ 1.Given these results, the following can be additionally proved.
Since P N P S ϕ = O p (T −1 ), (F.3) can be written as T P N P S ϕ = − T −1 P N CP N † P N CP S + O p (T −1 ).
From this equation, Claim 3 and (F.2), we find that We know from Claims 1-2 and Lemma E.1 that our proof becomes complete if we show Note that both T −1 P N CP N and A 1 are self-adjoint positive definite operators of rank ϕ, almost surely.
Consider the spectral representations of T −1 P N CP N and A 1 as follows: Let ūj = sgn( ûj , u j )u j .Then it can be shown that sup We then deduce from (F.8) and (F.9) that (F.10) converges in probability to zero as desired, so (F.7) holds.
Proofs of Claims 1 & 2 : We only provide our proof of Claim 2; the arguments to be given can be applied to prove Claim 1 with a minor modification.To simplify expressions, we let A = P N CP S and A * = P S CP N .From (F.6) and Lemma E.2-(iii), we may assume that A → w A 2 = fdd dW S (r)⊗W N (r)+Γ N S and A * → w A * 2 .Let {u j } ϕ j=1 (resp.{u j } ∞ j=ϕ+1 ) denote an orthonormal basis of H N (resp.H S ).Since A → w A 2 , ran A ⊂ H N and ran A 2 ⊂ H N , we have for any x ∈ H where the equality follows from Parseval's identity.From the properties of operator norm, we have 2 is self-adjoint, positive semidefinite and has finite rank.Moreover, its operator norm is bounded above by the trace norm (see equation (1.55) in Bosq, 2000), which is in turn bounded by where the last equality follows from the fact that (F.11) and sup Proof of Claim 3: Note that P N ϕ = P S P N ϕ + P N P N ϕ and P S ϕ P N ϕ = 0 by construction, we thus have T P S ϕ P S P N ϕ = −T P S ϕ P N P N ϕ .From this equation and the identity I = P N ϕ + P S ϕ , the following can be shown: T P S P N ϕ − T P N ϕ P S P N ϕ = −T (P N P S ϕ ) * + T ( P S ϕ P N P S ϕ ) * .
Since P N P S ϕ = O p (T −1 ) and P S P S ϕ − P S = O p (T −1 ), we find that T P N ϕ P S P N ϕ L H = T (I − P S ϕ )P S P S (I − P S ϕ ) L H = O p (T −1 ) and T P S ϕ P N P S ϕ L H = T P S ϕ P N P N P S ϕ L H = O p (T −1 ).We therefore conclude that T P S P N ϕ = −T (P N P S ϕ ) * = O p (T −1 ).
Proof of Theorem 3.2.
We first show that the desired result can be obtained from the following eigenvalue problem: C ϕ − Υ ϕ ŵj = μj ŵj , j = 1, 2, . . . .(F.16) It can be shown that P N ( C ϕ − Υ ϕ )P S and P S ( C ϕ − Υ)P S weakly converge in distribution to some elements in L H , then Lemma E.1-(i) implies that each of P N ( C ϕ − Υ ϕ )P S L H and P S ( C ϕ − Υ ϕ )P S L H is O p (1).We know from Lemma F.2 that Υ ϕ = O p (1), and also deduce from our construction of X ϕ,t that T −1 P N C ϕ P N = T −1 P N CP N + O p (T −1 ).Combining all these results and arguments used in our proof of Theorem 3.1, we have (F.17 We know from Π S ϕ P N = O p (T −1 ) and P S ϕ P N = O p (T −1 ) that P N Π S ϕ M = O p (T −1 ) and P N Υ ϕ P N = O p (T −1 ).This in turn implies that T P N Π S ϕ = − T −1 P N C ϕ P N † P N C ϕ P S − P N Υ ϕ P S + O p (T −1 ), (F.18)where note that (T −1 P N C ϕ P N ) † is well define since T −1 P N C ϕ P N is a finite rank operator, so has a closed range.Using similar results to (F.2) and Claim 3 in our proof of Theorem 3.1, we find that T ( Π N ϕ − P N ) = P S C ϕ P N − P S Υ * ϕ P N T −1 P N C ϕ P N † + T −1 P N C ϕ P N † P N C ϕ P S − P N Υ ϕ P S + O p (T −1 ).(F.19) From (F.17) and nearly identical arguments to those used to derive (F.7), (F.20) We will derive the limit of A = P N C ϕ P S − P N Υ ϕ P S .First, it may be deduced from Lemma F.2 that Moreover, from the fact that rank Ω N N = rank Ω N N = ϕ almost surely, B → B −1 is a continuous map for every positive definite matrix B ∈ R ϕ×ϕ , and 2) and this may be understood as a convergence in probability in R ϕ×ϕ , we find that Combining (F.21) and (F.22), we have, for any k ≥ 1, x 1 , . . ., x k ∈ H and y 1 , . . ., y k ∈ H, P N Υ ϕ P S x j , y j → p Υx j , y j , where Υ = Γ N S − Γ N N Ω N N † Ω N S .We then note that Ax j , y j = 1 T T t=1 P S X t − Ω SN Ω N N † P N ∆X t , x j P N X t , y j − Υx j , y j + o p (1), where the first term converges in distribution to dW S.N (r), x j W N (r), y j + Υx j , y j ; see e.g. the proofs of Theorems 1 and 2 of Harris (1997).Using this result and nearly identical arguments used to derive (F.6), it is quite obvious to establish that A → wd dW S.N (r) ⊗ W N (r).Furthermore, from similar arguments to those used to prove Claim 2 in our proof of Theorem 3.1, we find that A → L H A 2 = fdd dW S.N (r) ⊗ W N (r).(F.23) Combining (F.19), (F.20), (F.23), and Lemma E.1-(ii) and (iii), we obtain the desired result.
We now consider the eigenvalue problem (3.6).On top of the previous proof, it is easy to show that Υ * ϕ = O p (1) and P N Υ * ϕ P S = O p (T −2 ), so (F.18) still holds for Π S ϕ obtained from (3.6).The rest of proof is almost identical to our proof for the eigenvalue problem (F.16).
Independence of W S.N and W N is deduced from the fact that E[W S.N ⊗ W N ] = 0.
Proof of Proposition A.1.Since dim(H) < ∞ and the minimum eigenvalue of E[E t ⊗ E t ] is strictly positive, we may deduce from the proof of Theorem 2 in Harris (1997)  the eigenvectors of T −1 ( C ϕ − Υ ϕ 0 − Υ * ϕ 0 ) converge to those of W N (r) ⊗ W N (r).Note also that W N (r) ⊗ W N (r) is a positive definite operator of rank ϕ = dim(H N ) almost surely.This implies that the eigenvectors of T −1 ( C ϕ 0 − Υ ϕ 0 − Υ * ϕ 0 ) corresponding to the largest ϕ eigenvalues converge to an orthonormal basis of H N .We now let Q N ϕ = ϕ j=1 ŵj ⊗ ŵj and let Q S ϕ = I − Q N ϕ .Then from (F.30) and similar arguments used in the proof of Proposition 3.2 in Chang et al. (2016), it can be shown that Q N ϕ − P N L H = O p (T −1 ) and Q S ϕ − P S L H = O p (T −1 ).Then from a nearly identical argument used in the proof of Theorem 3.3 in Chang et al. (2016), it can be shown that ŵj − sgn( ŵj , w j )w j → p 0, j = ϕ + 1, . . ., where w ϕ+1 , . . .are the eigenvectors of E[E S t ⊗ E S t ].This completes our proof of (i).We now show (ii).From our proof of Theorem C.1, we may similarly deduce that Then the rest of proof is similar to that for the case without deterministic terms.

Nϕ
from (C.3) as in (3.2), and let P S ϕ = I − P N ϕ .As will be shown in Theorem C.1, the asymptotic limits of these preliminary estimators depend on Ω N S and Γ N S .

Figure 3 :
Figure 3: Estimated earning densities and transformed densities It will be convenient to define two other modes of convergence of random bounded linear operators for our proofs of the main results.First, we write A j → w A, if for all x, y ∈ H,| A j x, y − Ax, y | → p 0.Moreover, we write A j → wd A if, for all k and x 1 , . . ., x k , y 1 , . . ., y k ∈ H,lim j→∞ Ef ( A j x 1 , y 1 , . . ., A j x k , y k ) = Ef ( Ax 1 , y 1 , . . ., Ax k , y k )F.1 Mathematical proofs for the results in Sections 3 and A F.1.1 Preliminary results

Table 8 :
Relative frequencies of correct determination of ϕ

Table 9 :
Rejection frequencies (%) of the test for (D.2) with D.3 Empirical illustration 1: age-specific employment ratesWe revisit two empirical applications considered, respectively, byNielsen et al. (2022, Section 5.1) andChang et al. (2016, Section 5.1)with extended time spans.In this section, we discuss the first empirical example in detail.Specifically, we here consider the time series of U.S. age-specific employment rates for the working age (15-64) population, observed monthly from January 1986 to Dec 2019.The raw survey data is available from the Current Population Survey (CPS) at https://www.ipums.org/.For age a ∈ [15, 64], the age-specific employment rate at time t, denoted by Z a,t , is computed as inNielsen et al.

Table 10 :
Test results for the dimension of H N -age-specific employment rates

Table 12 :
Test results for the dimension of H N -earning densities Hilbert space and bounded linear operatorsLet H denote a real separable Hilbert space with inner product •, • and norm • = •, • 1/2 .We let L H denote the space of bounded linear operators on H, equipped with the usual operator norm A L H = sup x ≤1 Ax .For an operator A ∈ L H , we let A * ∈ L H denote the adjoint of A, and let ran A (resp.ker A) denote the range (resp.kernel) of A, which are, respectively, defined by ran A = {Ax : x ∈ H} andker A = {x ∈ H : Ax = 0}.The rank of A, denoted by rank A, is equal to dim(ran A).If A = A * , thenA is said to be self-adjoint.An operator A ∈ L H is positive semidefinite if Ax, x ≥ 0 for any x ∈ H, and positive definite if also Ax, x = 0 for any nonzero x ∈ H. Throughout this paper, x ⊗ y for x, y ∈ H Bosq (2000) j > 0 and the associated eigenspace is one-dimensional for each j = 1, ..., ϕ almost surely.It is deduced from Lemma 4.3 ofBosq (2000)that the eigenvectors satisfy sup 1≤j≤ϕ ûj − sgn( ûj , u j )u j → p 0.(F.9) T A * = O p (1) (Lemma E.1-(i)).We therefore conclude that A A * → w A 2 A * 2 .(F.14) From parallel arguments, A 2 A * → w A 2 A * 2 , AA * 2 → w A 2 A * 2 .(F.15)Equations (F.14) and (F.15) imply that (F.13) is o p (1), which in turn implies that (F.12) is o p (1).