Functional data analysis for longitudinal data with informative observation times

In functional data analysis for longitudinal data, the observation process is typically assumed to be noninformative, which is often violated in real applications. Thus, methods that fail to account for the dependence between observation times and longitudinal outcomes may result in biased estimation. For longitudinal data with informative observation times, we find that under a general class of shared random effect models, a commonly used functional data method may lead to inconsistent model estimation while another functional data method results in consistent and even rate‐optimal estimation. Indeed, we show that the mean function can be estimated appropriately via penalized splines and that the covariance function can be estimated appropriately via penalized tensor‐product splines, both with specific choices of parameters. For the proposed method, theoretical results are provided, and simulation studies and a real data analysis are conducted to demonstrate its performance.


INTRODUCTION
In functional data analysis for longitudinal data, the observation process is typically assumed to be noninformative, that is, it is assumed to be independent of longitudinal outcomes. However, this assumption is often violated in real applications. For example, in a longitudinal study, the frequency and timing of a subject's observations may be correlated with the values of longitudinal outcomes. Data collected in such a setting are referred to as longitudinal data with informative observation times.
A motivating example is a longitudinal data set of trajectories of patient-reported symptom severity of Parkinson's disease (Mischley et al., 2017). The data consist of self-reported severity of each of 33 symptoms on a scale of 0-100 for 371 patients, with the higher number reflecting greater symptom severity. The sum of these symptoms gen-This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. © 2022 The Authors. Biometrics published by Wiley Periodicals LLC on behalf of International Biometric Society. erates the total patient-reported outcome in Parkinson's disease (PRO-PD) score. The number of observations for each patient ranges from 2 to 38, and observations are made on a voluntary basis at varying times since a patient's initial diagnosis. Patients entered the study at arbitrary times since diagnosis. There are no instances of dropout, and we only consider data observed less than 20 years following any patient's initial diagnosis. Figure 1 illustrates the 371 trajectories along with smoothed group-mean profiles of subjects with 10 or fewer observations and of subjects with more than 10 observations in relation to patient-reported quality of life (Cella et al., 2010). The average (standard error) PRO-PD score of subjects with 10 or fewer observations is 828.3 (16.5), and the average PRO-PD score of subjects with more than 10 observations is 752.4 (13.2). It is apparent that subjects reporting lower symptom severity tend to report more frequent observations, which provides

F I G U R E 1
The 371 trajectories of symptom severity along with smoothed mean profiles of subjects with less than or equal to 10 observations and of subjects with more than 10 observations. Reference lines indicate the average PRO-PD of subjects answering the prompt "In general, would you say your quality of life is:" with either "poor," "fair," "good," "very good," or "excellent." This figure appears in color in the electronic version of this article, and any mention of color refers to that version evidence against the assumption of noninformative observation times.
It is well understood that standard methods for longitudinal data that ignore the informative observation processes may result in bias in parameter estimation. Hence, a number of methods have been proposed for the situation in which longitudinal outcomes are correlated with observation times (Lin et al., 2004;Sun et al., 2005Sun et al., , 2007Sun et al., , 2012Liang et al., 2009;Zhu et al., 2011;Cai et al., 2012;Song et al., 2012;Zhao et al., 2012;Fang et al., 2016). Among them, a commonly used modeling strategy is to consider shared random effects models (eg, Sun et al., 2005;Liang et al., 2009). Such models can allow flexible dependence between longitudinal outcomes and observation times.
However, existing literature for longitudinal data with informative observation times mostly focuses on the estimation of an unspecified smooth mean function or integrated mean function. In the analysis of longitudinal data, besides the mean function, it might also be interesting to consider a nonparametric covariance function, which captures the patterns of subject-specific trajectories. A popular nonparametric modeling approach for estimating both mean and covariance functions of longitudinal data is via methods of functional data analysis (Ramsay & Silverman, 2006), in which one views the values of a longitudinal trajectory as observations from a random smooth function; see Guo (2002) and Yao et al. (2003Yao et al. ( , 2005. The advantage of functional data methods compared to traditional parametric longitudinal methods is that they allow for unspecified mean/covariance functions, which may better capture complex variation patterns of subject-specific trajectories. Theoretical properties of functional data analysis have been well studied in several papers (Li & Hsing, 2010;Cai & Yuan, 2011;Zhang & Wang, 2016;Xiao, 2020). In particular, Zhang and Wang (2016) and Xiao (2020) considered general weighting schemes for aggregating multiple observations of each subject in the estimation of mean and covariance functions of functional data. The theories therein assume that the observation times are either fixed or randomly sampled and independent from the longitudinal outcomes. Hence, existing theories are not directly applicable to settings in which the observation process is informative. Nevertheless, it might be helpful to clarify that, as shall be seen by our model specification, we consider longitudinal data where each subject only has only a finite number of observations, and hence the sparse to dense phase transition of functional data in terms of the number of observations per subject as those in Cai and Yuan (2011) and other works is not relevant here.
In this paper, we consider functional data methods for longitudinal data with informative observation times. We use a heuristic argument to show that a commonly used functional data method may be inconsistent for such data. Using the same argument, we identify one particular functional data method that could give consistent estimation under the considered shared random effect model. The difference in these methods lies in how the multiple observations of each subject are aggregated to formulate the weighted least square estimation. This contrasts dramatically with existing functional data theories, which found that both methods lead to consistent estimation. For the suggested method with penalized splines, we derive the corresponding rates of convergence, which are either optimal or the best in the literature. It is worthy of pointing out that the theoretic derivation is far from a straightforward application of existing theoretic results which consider noninformative observation times while the theory in the paper has to handle informative time points. Simulation studies demonstrate the desired performance of the suggested method and the inconsistent behavior of the aforementioned commonly used method in functional data analysis. Finally, the suggested method is applied to the motivating data and we conclude the paper with some discussion.

Model specification
Let ( ) be subject 's underlying longitudinal outcome ( = 1, … , ) at time ∈  where  is a compact time interval. Without loss of generality, let  = [0, 1]. The observations = ( ) of subject are taken at time points 1 < 2 < … < . Let ∈  denote the censoring time of the th subject. The cumulative number of observations for the th subject by time is Here, is the indicator function and ∧ denotes the minimum of and . The observed values of ( ) are obtained at the jump points of ( ). Consider the functional data model where * (⋅, ) is a conditional mean function parameterized by the latent frailty variable , (⋅) is a zero-mean random function that models subject 's smooth deviation from the mean, and (⋅) is zero-mean white noise with finite variance 2 . Define the marginal mean function ( ) = { * ( , )}, the covariance function of the conditional mean functions * ( , ) = cov{ * ( , ), * ( , )}, and the covariance function of the random subject-specific functions ( , ) = cov{ ( ), ( )}. Then, the responses can be considered as observations from a random function with mean function ( ) and covariance function ( , ) = * ( , ) + ( , ), contaminated with measurement error.
The dependence between longitudinal outcomes and observation times is induced by the latent frailty variable , but the form and magnitude of the dependence is left completely unspecified for flexibility. To simplify theoretical analysis, we allow to be 0. In formulas and equations involving −1 , the term −1 should always be interpreted as −1 ( ≠ 0), which equals 0 if = 0.

Estimation of mean function
The mean function ( ) is approximated by a spline function B ( ) where B( ) = { 1 ( ), … , ( )} ∈ ℝ is a vector of th order B-spline basis functions constructed from equally spaced knots in  , and is a vector of coefficients. The coefficient vector is estimated via the minimization where s are nonnegative weights that may depend on and , is a smoothing parameter that balances fit and smoothness, and P is a penalty matrix such that P equals the squared sum of the th order consecutive differences of as in Eilers and Marx (1996). The mean function is then estimated byˆ( ) = B ( )ˆ.
A common choice of the weights is = 1, which means that each observation has equal weight in forming the least squares in (2). We will denote this choice of weights by "OBS," following Zhang and Wang (2016) and Xiao (2020). To our knowledge, this is the most often used choice for formulating the least squares in functional data analysis. Another choice of weights is = −1 , which implies that the set of observations from each subject has equal weight (eg, Li & Hsing, 2010). We will denote this choice of weights by "SUBJ." Both choices of weights could lead to consistent and even rate-optimal estimation in functional data analysis as shown in Zhang and Wang (2016) and Xiao (2020) for functional data with noninformative observation times.
However, in what follows, we shall heuristically derive different weighting schemes will affect estimation for functional data with informative observation times. Let As shown in Liang et al. (2009), Thus, Similarly, we derive that It follows by Equation (3) that as increases to infinity, ( ) converges to We now evaluate expression (4) with either the OBS weights or the SUBJ weights. First, consider the SUBJ weights for which = −1 .
The above derivations show that the OBS weights lead to inconsistent estimation of the mean function while in contrary the SUBJ weights might give consistent estimation. This differs from traditional functional data analysis, where both OBS and SUBJ weights will lead to consistent estimation (Zhang & Wang, 2016;Xiao, 2020). Thus, for functional data with informative observation times under the shared random effects model, we propose to use the SUBJ weights. Accordingly, we shall rigorously derive the rate of convergence of the mean function estimation in Section 3.
The covariance function ( , ) is approximated by a tensor-product spline, ≤ is a coefficient matrix. To simplify notation, we use the same univariate B-splines as in the mean function estimation. Since the covariance function is symmetric, it is natural to impose the constraint that Θ = Θ , which ensures that ( , ) = ( , ). Let vec(⋅) be the vectorization operator, which is invertible, and define where s are nonnegative weights, −1 is a smoothing parameter, and P is the same penalty matrix for mean function estimation. We use the same notation as in the mean function estimation to simplify the notation in theoretical analysis. The above method was proposed in Xiao et al. (2018) and can be shown to be a special case of the bivariate tensor-product P-splines in Eilers and Marx (2003). Letˆ= vec(Θ). Then, the covariance function ( , ) is estimated byˆ( , ) = B ( , )ˆ. As in the mean function estimation, we consider two choices of weights. We will denote the choice of = 1 by "OBS" and the choice of = { ( − 1)} −1 by "SUBJ." Similar to the mean function estimation, the former weight implies each auxiliary variable carries the same weight in forming the weighted least squares while the latter implies the set of auxiliary variables from each subject carries the same weight. Then, by similar arguments (omitted) as in the mean function estimation, it can be shown that the SUBJ weights should be adopted. And we shall also derive rates of convergence of the above proposed estimator with the SUBJ weights.

Assumption 2. The censoring times
(1 ≤ ≤ ) are independent and identically distributed.  We now discuss Assumption 4. First, 1 ( ) is the density function of the theoretical counterpart of the empirical distribution of the observed time points. In nonparametric regression, it is usually required this function is bounded away from 0 and infinity. When there are no censoring, that is, = 1, then 1 ( ) = Λ (1) 0 ( ) and 2 ( , ) = Λ (1) 0 ( )Λ (1) 0 ( ). Because of Assumption 3, Assumption 4 will always hold. When there are censoring, we could show that one generating distribution for in the simulation study also ensures that Assumption 4 holds.
For theoretical results, we shall focus on the SUBJ weights and scale the weights with = ( ) −1 so that ∑ =1 = 1. The notation denotes the order of splines and denotes the order of penalty. Let ℎ = −1 and ℎ = ℎ ∨ 1∕(2 ) . By Xiao (2019), ℎ plays the role of bandwidth parameter for penalized splines.
of Theorem 1. The proof of Theorem 3.3 in Xiao (2020) can be adapted if Lemmas A.7, A.8, and A.9 in Xiao (2020) hold almost surely, which are given in Lemma 1 and proved in Section A of the Supporting Information. These lemmas deal with empirical distributions of the observed time points, which are informative, and hence different proofs are required.

Theorem 2. Suppose that Assumptions 1-8 hold. If
The proof of Theorem 2 is given in Section B of the Supporting Information. The derived 2 rate in Theorem 2 can be shown to achieve the best rate in the literature (Zhang and Wang, 2016;Xiao, 2020), −2 ∕(2 +2) , which is the optimal rate for estimating bivariate smooth functions (Stone, 1982).

Simulation settings
In this section, we conduct simulation studies to evaluate the performance of the proposed method, that is, penalized splines with SUBJ weights, and compare it with that of penalized splines with OBS weights. In the following simulations, the maximum follow-up time is = 1 and we consider three censoring time distributions: (1) = 1, which corresponds to no censoring; (2) is sampled from a mixture distribution of Uniform(0.2, 1) and a point mass at 1, with mixing factors of 0.8 and 0.2, respectively; and (3) ∼ Uniform(0, 1). For the second generating distribution for the censoring times, which we call mixed censoring, Assumptions 4 and 8 still hold and hence our theoretical results will still hold; see Section C of the Supporting Information for a proof. The third generating distribution, which we call uniform censoring, serves for the purpose of a sensitivity analysis. The frailty variable is chosen as = (1 − ) + * , where * is sampled from a gamma distribution or lognormal distribution, both with mean 1 and variance 2 * = 0.5 or 2 * = 1, and = 0.5 or 0.85, corresponding to the informative observation process settings, and = 0, corresponding to the noninformative observation process settings. The observation process follows a Poisson process with the intensity function 10 . Given and , the number of observations of subject then follows a Poisson distribution with the mean 10 . The conditional mean function is * ( , ) = ( 2 + 2∕3), implying a marginal mean function of ( ) = 2 + 2∕3. The covariance function of the random functions is chosen as ( , ) = 0.05 exp( 2 ) × 0.05 exp( 2 ) × (1− | − |). The observed values ( ) are then generated from (1), where ( ) ∼  (0, 2 ) and 2 is set to be 0.5 in the informative observation process settings, and 1 in the noninformative observation process settings. The number of subjects is set to = 150 or = 300.
We replicate each simulation setting 200 times. For univariate penalized splines, we use 10 cubic B-spline bases with equally spaced knots in the unit interval and for the tensor-product penalized splines, we use 10 marginal cubic B-splines with equally spaced knots in both dimensions. The smoothing parameters are selected using generalized cross-validation via a modified version of the function fpca.sparse in the R package face, which performs the fast covariance estimation (FACE) method for sparse functional data (Xiao et al., 2018).

Simulation results
Define the integrated squared error as either ∫ {ˆ( ) − ( )} 2 d or ∬ {ˆ( , ) − ( , )} 2 d d whereˆ( ) andˆ( , ) are estimates of ( ) and ( , ). For each simulation setting, we calculate the median and interquartile range of the integrated squared error of the estimates using the OBS and SUBJ choices of weights. Figure 2 plots the averages of the estimated mean functions from penalized splines with either OBS or SUBJ weights for a few simulation settings with gamma frailty variables, for no censoring and mixed censoring times. In these plots, 2 * = 0.5. A nonzero value of in these plots indicate informative observation times. We first observe from the plots that penalized splines with OBS weights converge to ( ) = ( 2 + 2∕3)(1 + 2 2 * ), the derived expression in (4) for OBS weights, which differs from the true ( ) = 2 + 2∕3 and the difference becomes more pronounced as increases. Next we see that penalized splines with SUBJ weights converge to the true mean function as desired, irrespective of the values of . Thus, the plots demonstrate the inconsistent estimation of the mean function by penalized splines with OBS weights and the desired model estimation by penalized splines with SUBJ weights. The plots for uniform censoring and lognormal frailty variables are similar and are presented in Web Appendix D.

F I G U R E 2
Average of 200 estimates of the mean function using penalized splines with both OBS and SUBJ weights for simulation settings in which the frailty variable * follows a gamma distribution with variance 2 * = 0.5. The true mean function is ( ) = 2 + 2∕3 and ( ) is the formula in (4) associated with OBS weights, specifically ( ) = ( 2 + 2∕3)(1 + 2 2 * ). This figure appears in color in the electronic version of this article, and any mention of color refers to that version Table 1 gives the median and interquantile range of integrated squared error for estimating the mean function using both methods in all simulation settings with informative observation times. The numerical results show that penalized splines with SUBJ weights have much smaller median integrated squared error for estimating the mean function, in agreement with the above plots. In addition, the estimators based on the SUBJ weights have much smaller variation in terms of interquartile range in the presence of information observation times. While there is some additional variability in the estimates at the larger time points in the mixed censoring settings compared to the no censoring settings, censoring seems to have a less significant effect on the estimates. Similar simulation results for the uniform censoring setting are found and given in Web Appendix D.
The numerical results for the covariance function estimation under the informative observation times settings are summarized in Table 2, and the corresponding numerical results under the noninformative observation times settings are included in Section D of the Supporting Information. The estimate using the SUBJ weights clearly outperforms the estimate using the OBS weights in both cases. Similar to the results for the mean function estimation, the estimators based on the SUBJ weights have much smaller integrated squared error and variation than those based on the OBS weights under all the settings with information observation times. For illustration, Figure 3 TA B L E 1 Median (interquartile range) of integrated squared error for estimating the mean function using penalized splines with both OBS and SUBJ weights

F I G U R E 3
True covariance function and corresponding estimates using penalized splines with both OBS and SUBJ weights for two simulated data sets in which the frailty variable * follows a gamma distribution with variance 2 * = 0.5, with = 0.5 (top row) and = 0.85 (bottom row). This figure appears in color in the electronic version of this article, and any mention of color refers to that version presents the estimated covariance function from both methods along with the true covariance function from two simulated data sets.
In simulation settings with noninformative observations, that is, = 0, we find no substantial difference in the integrated squared errors for estimating the mean and covariance functions using either the SUBJ or OBS weights. Indeed, penalized splines with both types of weights perform similarly and both estimate accurately the mean function. The numerical results are given in Web Appendix D.
Based on the above empirical results, we can recommend the use of the SUBJ weights for any longitudinal data with or without informative observations, without concern for suboptimal estimation in the case that the observation process is noninformative.

DATA APPLICATION
In this section, we apply the discussed methods to the Parkinson's disease symptom severity data set described in Section 1. Recall that in this data set, there was evidence of an informative observation process in which the average severity score of subjects with more than 10 observations is less than that of subjects with less than or equal to 10 observations. The intuition behind the selection of weights is clearly illustrated by this example. When weighting each observation equally, the estimates are more heavily influenced toward subjects with more observations. Thus, the mean function is likely to be underestimated. Similarly, the estimate of the covariance function is likely to be biased. Also recall that the time at which a patient entered the study is unrelated to the time since their initial diagnosis, there is no instance of dropout, and we only consider data observed less than 20 years following a patient's diagnosis. There is therefore no informative truncation or censoring present in the data. Therefore, it seems appropriate to use the proposed functional method with SUBJ weights. The trajectories are plotted in Figure 4 along with the mean function estimates of the penalized spline estimator with the OBS and SUBJ weights. As expected, the mean estimate when using SUBJ weights is greater than the estimate when using OBS weights over the whole time domain. By the estimate from penalized splines with SUBJ weights, we see a monotonic increase in severity over the first 20 years after diagnosis, which agrees with existing research (Group, 2004;Venuto et al., 2016;Holden et al., 2018). The estimated correlation and variance functions of the penalized spline estimator with SUBJ weights are displayed in Figure 4. It is interesting to see that the variance function takes its smallest value around 8 years after diagnosis and is greatest at the maximal follow-up time. As a comparison, the estimated variance function with OBS weights is also plotted and has uniformly higher values. We also consider the spectral decomposition of the estimated covariance function. The first three eigenfunctions are shown in Figure 4. These first three eigenfunctions account for 60%, 30%, and 6% of the total variation, respectively. The first eigenfunction corresponds to a contrast between early and late time after diagnosis. The second accounts for deviation from the mean trajectory around 8 years after diagnosis. The estimated correlation function and associated eigenfunctions using OBS weightings are displayed in Section E of the Supporting Information and very similar to the above results using SUBJ weights.

DISCUSSION
In this paper, we studied some commonly used functional data methods for analysis of longitudinal data with informative observation times and identified a method with a proper subject-specific weighting scheme that can achieve the consistent estimation of the mean and covariance functions under a class of shared random effect models. The suggested method can be easily implemented using existing software. The theoretical properties of the estimator were studied under the setting with the informative observation process, and special care must be taken to handle the distribution of observation times for establishing the convergence rate results. In our current work, no covariates are included in the model. But the proposed functional data method can be easily extended to accommodate covariates. In addition, censoring times are assumed to be independent of longitudinal outcomes and observation times, which we think is reasonable for our data application. It is certainly of interest to extend the proposed method to the setting with the informative observation process and informative dropout. In such a setting, the shared random effects model can be extended to account for dependence between censoring times and longitudinal outcomes and observation times. Then, the nonparametric approach in Wang et al. (2001) can be used to estimate the baseline cumulative intensity function. Note that the derivations that lead to Equation (4) remain valid for informative censoring. Therefore, Equation (4) may be used for selecting an appropriate set of weights in aggregating the observations. Such a research direction merits further investigation.

A C K N O W L E D G M E N T S
The authors thank the editor and the associate editor for their constructive and helpful comments, which improved the paper. This work was partially supported by the NIH grants R56 AG064803 (LX), R01 AG064803 (LX), and R01 NS112303 (LX and CW). This work represents the opinions of the researchers and not necessarily that of the granting organizations.

D ATA AVA I L A B I L I T Y S TAT E M E N T
The data that support the findings in this paper are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.