Multistate analysis of multitype recurrent event and failure time data with event feedbacks in biomarkers

In this paper we propose a class of multistate models for the analysis of multitype recurrent event and failure time data when there are past event feedbacks in longitudinal biomarkers. It can well incorporate various effects, including time‐dependent and time‐independent effects, of different event paths or the number of occurrences of events of different types. Asymptotic unbiased estimating equations based on polynomial splines approximation are developed. The consistency and asymptotic normality of the proposed estimators are provided. Simulation studies show that the naive estimators which either ignore the past event feedback or the measurement errors are biased. Our method has a better coverage probability of the time‐varying/constant coefficients, compared to the naive methods. An application to the dataset from the Atherosclerosis Risk in Communities Study, which is also the motivating example to develop the method, is presented.


INTRODUCTION
Characterizing the association between multiple ordered events and time-varying covariates, under a feedback mechanism in the event process, is of great interest in many areas. Steele (2011) studied women employment histories, where the time-varying covariate, presence of children, may affect employment transitions; and employment status may in turn affect the timing of childbearing. Another example occurs in cardiovascular disease (CVD) studies. Some clinical evidences (Amarenco & Labreuche, 2009;Elisaf et al., 1999;Khan et al., 2017;Woo et al., 1990) suggest that the biomarker data may carry information from the past CVD events, and may have further impact on the occurrences of the future ones.
In our motivating example, the Atherosclerosis Risk in Communities (ARIC) Study, researchers sought to evaluate the relationship between the levels of and/or changes in risk factors with CVD events in men and women from four U.S. communities (The ARIC Investigators, 1989). Various types of events were recorded and more than one type of event was observed from the same participant, including myocardial infarction (MI), stroke, and cardiovascular death. The existing analysis of the ARIC data, including but not limited to Wattanakit et al. (2005), Yatsuya et al. (2011), Kim et al. (2012), and Li and Lu (2015), focused on survival event or a single type of recurrent event but did not address the issue of event feedback mechanism. To properly reflect the overall cardiovascular burden of the risk factors, the multitype CVD events should be modeled simultaneously, because such events are very likely associated. Moreover, lack of consideration of the feedback mechanism may lead to biased estimate of the association between event times and biomarkers. For example, the profile of systolic blood pressure (SBP) can be associated with prior MI history and the levels of high-density lipoprotein cholesterol (HDLC) may change after onset of stroke. The statistical estimation procedure is challenged by the fact that the biomarkers are only measured at 3-year intervals in the ARIC Study and are very likely subject to measurement errors as well.
Various approaches have been proposed for the analysis of time to multitype events. Chen et al. (2012) developed an additive marginal rate model for the analysis of multitype recurrent event data. Such model characterizes the rates of the event process without specifying the dependence structure among event occurrences. Frailty models are also popular choices to account for the association among different types of time-to-event process (Brown & Ezekowitz, 2019;Chen et al., 2005;Yu et al., 2018). The aforementioned methods, however, either fail to capture the competing risks structure of the multitype events or is not able to characterize disease progression, which is reflected by the impact of the previous event history on new event occurrences. For example for some chronic disease, a larger number of hospital admissions may be associated with a decrease in the times between hospital admissions in the future (Ieva et al., 2017). To have a natural formulation of all possible combinations of multitype events and reflect the progressive feature of a disease process, multistate models have also been employed for the analysis of recurrent event and terminal event data. Nathoo and Dean (2008) proposed a hierarchical multistate modeling framework for the analysis of longitudinal event data where the dependence between different transition rates can be well addressed. Their model and estimation method were applied to investigate the readmission rate and mortality for acute coronary syndrome. Ieva et al. (2017) proposed a flexible multistate representation of the event process. Repeated hospitalizations and death in heart failure patients were investigated. These multistate models are effective in exploring the dependence structure among event occurrences. But they only allow for a single type of recurrent event. Therefore new statistical models need to be developed for the joint analysis of multitype recurrent events and terminal events, which is very common in practice.
When the past event history enters the model as a single covariate itself, it is defined as a dynamic covariate indicating a feedback mechanism in the event process (Gjessing et al., 2010). There have been many works dealing with the dynamic covariate in time-to-event analysis. Aalen et al. (2004) presented an approach for the analysis of multivariate counting process, in which the dynamic covariate comes into the intensity model in an additive manner. Several forms of the dynamic covariates were discussed, such as the number of previous events, time since last event and average number of prior events. Peña (2006) described a general class of models for recurrent events, where the number of past events was used as a covariate in the the intensity function. Fosen et al. (2006) developed a method for the analysis of recurrent event data using information on previous occurrences of the event as a time-dependent covariate. Direct, indirect, and total effects of other baseline covariates were investigated based on the additive hazard model. Yiu et al. (2018) considered a clustered multistate model with dynamic covariates and analyzed the transition intensities and sojourn times in a study of psoriatic arthritis. These methods, however, are not adequate for the event history analysis when the past event feedbacks are associated with the longitudinal covariates. As mentioned in the motivating example, the trajectories of biomarkers are influenced by prior event history. Hence the past event impact is not simply served as a single covariate for the event intensity function but is actually associated with other time-dependent covariates in a complex way. Dai and Pan (2018) considered the joint models for survival and longitudinal data where the longitudinal process is related to the history of some random observation time points. Their models were developed under the framework of informative observation times for longitudinal data and hence is not appropriate for our analysis where primary interest is in the time-to-event modelling.
In this paper, we propose a new class of multistate models with past event feedbacks in the associated biomarkers. To our knowledge, this is the first attempt at modeling past event feedback in the longitudinal covariates in event history analysis. There is also limited literature that considered a multistate representation of the multitype recurrent event and terminal event data. Our proposed model and estimation procedure are novel and feature the following aspects. First, the competing risks feature of events of different types and the correlation structure of the consecutive events are well addressed within the multistate model framework. The functional forms of the event intensities are allowed to depend on the event order or event type or a combination of the both, reflecting disease progression in a flexible way. Second, the trajectories of the associated biomarkers are characterized by random effects models. Function of event paths enters the random effects model as a predictor variable to capture the past event feedback. The distributions of the random effects are left unspecified making our model robust against distribution misspecification. Asymptotically unbiased estimating equations are obtained based on an extension of the corrected score approach. Third, both time-dependent coefficients and time-independent regression parameters are incorporated in the multistate models, considering the fact that our motivating example is a long term study. The association between event time and biomarker may change along time and therefore the proportional assumption may be violated. The rest of the paper is structured as follows. Section 2 introduces the model and notation. Section 3 presents the estimation procedure and the large sample results. Simulation studies and real application are presented in Sections 4 and 5, respectively.

MODEL AND NOTATION
Suppose that there are n individuals and the data from different individuals are independent. Each individual can possibly experience K types of recurrent events. Let M denote the maximum number of nonterminal events observed among all individuals. We represent the event process in terms of a multistate process. An individual moves between the intermediate states when the recurrent events occur and moves to the absorbing state when a terminal event occurs. A simple example showing state transition paths when K = 2 and M = 4 is shown in Figure 1. We use white boxes to represent intermediate states and use grey boxes for absorbing states. The intermediate states in the first row are defined by the type 1 recurrent events and those in the second row are defined by the type 2 recurrent events. The mth column of the intermediate states represents that this is the mth nonterminal event a subject has experienced. For example, consider stroke as a type 1 recurrent event, MI as a type 2 recurrent event and cardiovascular death as a terminal event. Transition to state 2 corresponds to the fact that an individual has experienced a second recurrent event and this event is a stroke. Transition to state 7 represents the case that a third recurrent event has occurred and it is a MI. The absorbing state 9 represents direct cardiovascular death and the absorbing state 10 represents death with previous stroke and/or MI history. If an individual experienced the following series of events: MI → stroke → MI → cardiovascular death, then the multi-state transition path is 5 → 2 → 7 → 10 according to Let Y i h (t) be the at risk process which equals 1 if subject i is at risk for the transition out of state h at t. We assume that the multistate process is Markov and the intensity process can be written as To characterize the associations between the risk factors and state transition times, we consider the following multiplicative intensity model where W i (t) are p w -dimensional covariates with constant effects on the event intensities and hg are the time-independent coefficients capturing such effects. X i (t) are p x -dimensional covariates with effects that change over time and hg (t) are the time-dependent coefficients capturing such effects. 0 hg (t) are baseline intensity functions. In practical situations some of the states may have very sparse data for estimation. One can consider a more parsimonious model assuming that several transitions share the same intensity model parameters in this case. For example in our real analysis, among those participants who had at least one recurrent event, the average number of type 1 recurrent events observed per subject is around 1.26 and the average number of type 2 recurrent events is about 1.19. It is expected that some intermediate states have sparse event data. Therefore we assume that all the type 1 recurrent event times share the same intensity model parameters and all the type 2 recurrent event times share the same parameters. Another way to tackle the sparse data problem is to choose a smaller M and treat the data from the subjects observed to have more than M intermediate events to be censored (Cook et al., 2004). Different applications may use different values of M and it depends on the model complexity and the background of the data. For example in Ieva et al. (2017) the repeated hospitalization and death in patients with heart failure were investigated using multistate modeling approach. In their research, a maximum of six hospital admissions are modeled, and subsequent admissions (but not deaths) are ignored, due to the sparsity of data from individuals with more than six admissions. We suggest trying different values of M and compute the AIC-type criterion using the formulas in section 3.1 in the Appendix S1. The number M that minimizes the AIC-type criterion can indicate a best fit of the data.
Estimating the regression coefficients is challenging by the fact that covariates X i (t) and W i (t) are not observed at each event time point and are possibly subject to measurement errors. The presence of past event feedback makes the problem even more complicated because we need to properly evaluate the event history information in the longitudinal observations. For each rth component in X i (t), denote the longitudinal observations as X ijr , j = 1, … , m Xir , r = 1, … , p x . Similarly for each sth component in W i (t) denote the observations as W ijs , j = 1, … , m Wis , s = 1, … , p w . Let  i (t−) represent the history of the multistate process of subject i before time t. Assume that the longitudinal observations of the covariates are generated from the following models with additive measurement errors.
where f Xr (⋅) and f Ws (⋅) are known functions of time and past event history. They should satisfy the condition that the intensity process of N i hg (t) will not go to infinity over time (Gjessing et al., 2010). Xir and Wis are vectors of random effects, of dimensions c Xr and c Ws , respectively. Note that we do not make any assumptions about the distributions of these random effects. The measurement errors Xijr and Wijs are mutually independent with Xijr iid ∼ N(0, 2 Xr ) and Wijs iid ∼ N(0, 2 Ws ). The normal distribution assumption is a standard practice in measurement error theories. To assess this normal assumption, one can apply the common method such as QQ plot to check the residuals of the fitted models of the longitudinal data. For simplicity, the measurement errors of any two components of the covariates are assumed to be independent. We also assume that the observation time points t ij are noninformative. The measurement errors are independent of the true covariates, the event times and the censoring time.
Model (3) extends the existing joint models for time-to-event data and longitudinal data, in the sense that it allows the covariates to depend on past event history. The linear combinations T Xir f Xr provide flexible modeling of the true covariate process, allowing various forms of time trajectories and past event feedbacks. Let I(⋅) represents an indicator function. The longitudinal model T Xir f Xr ) represents a linear trajectory of observation times and the type 1 recurrent event history. If the model of the true covariate process is chosen as , it indicates that the levels of biomarker have a nonlinear relationship with the observation times and they are influenced by the successive occurrences of events of the same type. In practice one has to choose the functional forms of the longitudinal trajectory f Xr (⋅) and f Ws (⋅), and it mainly depends on the background and knowledge of the application field (Fisher & Lin, 1999). For example in our real data analysis, blood pressure and cholesterol levels are modeled as a linear trajectory of measurement times plus an indicator function of prior CVD history. The linear trajectory is chosen according to the previous medical research of the ARIC data (Barrett et al., 2019;Paige et al., 2017) and the indicator function is chosen with relevant medical theories (Amarenco et al., 2020;Khan et al., 2017Khan et al., , 2018. If there is no background knowledge of the functional forms, one may try several different functions and compute the AIC-type criterion using the formulas in section 3.1 in Appendix S1. The function with a smaller AIC-type value can indicate a better fit of the data. Model (3) is also applicable to several types of covariates. For example, a precisely measured time-dependent covariate X ir (t) can be accommodated by setting Xijr = 0, and a baseline covariate W is without measurement errors can be handled by setting m Wis = 1 and

ESTIMATION PROCEDURE
When the true values of the biomarkers are known at all event time points, we can obtain an accurate estimator of the coefficients hg (t) and hg in Model (2) based on the local partial likelihood (Cai & Sun, 2003) or the spline-based partial likelihood (Nan et al., 2005). Even if the biomarkers are intermittently measured and possibly with errors, analysis focusing on a single event is still feasible using the conditional score approach (Tsiatis & Davidian, 2001) or the corrected score approach (Huang et al., 2016;Song & Wang, 2008. However in the case of multiple events with feedback mechanism, the existing methods cannot be easily adapted to obtain a valid estimate. In what follows, we develop an inference procedure based on an extension to the corrected score approach. The idea is that we first estimate the values of the time-varying covariates at each event time point using the least squares method. We also approximate the time-dependent coefficients with polynomial splines. Then we replace the corresponding quantities in the score function of the log-partial likelihood with the least square estimates and the spline approximations, and derive the resulting bias due to the substitution. The asymptotically unbiased estimating equations are constructed.

The estimating equations
To calculate the least square estimates of the covariate value at time t, we use only the longitudinal data up to and including time t. This is to ensure the measurable property of the covariate estimates, which is essential for the derivation of the theoretical results. Let m Xir (t) be the number of longitudinal observations of the rth component of X i (u) before time t and similarly define m Wis (t) for W i (u). The estimated values of the covariates at time t are denoted asX ir (t) and W is (t). And the conditional variances ofX ir (t) andŴ is (t) given the measurement time points and the variances of the measurement errors 2 Xr and 2 Ws , are denoted aŝ2 Xir (t) and̂2 Wis (t). The detailed derivation is placed in section 1 of Appendix S1. The matrix notations can be easily written asX ) T . If we simply replace the unknown values of the true covariates with their least squares estimates, the "naive" score function of the log-partial likelihood is where the observation period is [0, L]. Let  i (t−) be the filtration containing all the history information of the multi-state process and the covariate process before time t for individual i. Let } . Using the following property of a normal random variable X with mean and variance matrix Σ: E{ (X − Σ ) exp( T X − 1∕2 T Σ ) } = exp( T ), it can be shown that the conditional expectation of the numerator in the score function is Therefore the biases are resulted from the terms involving j X (u) and j W (u). To show the spline-based approximation of the time-dependent coefficients, we need the following notations. For the rth component in hg (t), let n hg,r be the number of knots of the spline function that approximates it and let d be the degree of the spline function. ) p x r=1 , and the vector containing the spline coefficients and constant parameters as } be a D hg × D hg -dimensional matrix of the conditional variances of the estimated covariates.
Denote C i as the censoring time of subject i and we assume that it is noninformative.
be the overall at risk process. Denote A ⊗0 = 1, A ⊗1 = A and A ⊗2 = AA T for a general vector A. The following functions are defined according to the results in (5), which aim to remove the bias due to the least square estimates of the covariates. For l = 0, 1, 2, Combining (4) and (5), it can be easily shown that the conditional expectations of the numerator and denominator in S (1) hg ( hg , u)∕S (0) hg ( hg , u) in the following estimating function equal to the corresponding quantities in the score function of the log-partial likelihood with true values of the covariates. Therefore as the sample size increases, the fraction approaches the corresponding fraction in (4) with true covariates. The asymptotically unbiased estimating equation of hg is . The Hessian function can be written as Given (6) and (7), the estimates of the coefficients can be found using Newton's method.

Theoretical properties
We will next present the asymptotic results of the estimators, which are used to calculate the standard errors and confidence intervals. The proofs are given in Appendix S1. We first define the norm functions that are used in the theorems. For any matrix function F(t), denote the supremum norm as ||F|| ∞ = max i,j sup t |f i,j (t)|. Let the empirical L 2 -norm be ||F(t)|| 2 n = 1∕n ∑ n i=1 F T (t)F(t) and the corresponding theoretical norm be ||F(t)|| 2 2 = E{F T (t)F(t)}. For any vector a, denote the Euclidean norm as ||a||. For simplicity, we assume that the number of knots n hg,r are the same for each component in hg (t) and write n hg,r as n hg . We consider orthonormal B-spline basis functions for ( B hg,r (t) ) p x r=1 . Let I A×B be a A by B dimensional diagonal matrix with the first B diagonal elements to be 1 and the rest to be 0. Let and S (l) hg ( hg , u) in the proposed estimating equation (6), except that some terms involving the product of spline basis function and covariate are replaced with the covariate itself.
. The following functions are notated in a way that the letter I is used when the true values of the covariates are incorporated and the letter S is used when the least squares estimates of the covariates are incorporated. For l = 0, 1, 2, define ) , where we simply write hg (t) as hg for the argument of I (l) hg and Q (l) hg ( hg , u) as the limit of S (l) hg ( hg , u), I (l) hg ( hg , u) andĨ (l) hg ( hg , u), respectively. The following notations are defined for the theoretical properties of the proposed estimating equations, which will be used to derive the asymptotic variance-covariance matrix of the estimator of the time-fixed regression coefficient.
Note thatU * hg ( hg ) is some quantity derived from the negative of the Hessian matrix. Denote the limit of the variance-covariance matrix as v hg ( hg ) = E and denote the limit of the negative Hessian matrix asu * , and defining then we can write the asymptotic variance-covariance matrix of̂h g as Assume that the following regularity conditions hold for Models (2.2) and (2.3).
(A1) P The above conditions will be used to derive the theoretical properties of the parameter esti-matorŝh g and the nonparametric estimatorŝh g (t). Conditions (A1) and (A2) are standard assumptions for proportional intensity models. Condition (A3) ensures that the counting process martingale properties are valid in the case of intensity models with time-dependent covariates. Condition (A5) ensures that the Lindeberg's condition is satisfied and also provides conditions under which the functional law of large numbers can be applied. Condition (A5) and Condition (A6) are standard regularity conditions in time-to-event data analysis with spline approximations. Condition (A6) implies that the limit of the Hessian matrix is positive definite.

NUMERICAL STUDY
We conducted several groups of simulation studies to assess the finite sample performance of the proposed estimation methods. The following scenarios were considered. Scenario 1: Random coefficients of the time-dependent covariate process were assumed to be normally distributed. Different sample size, different censoring percentages and different strength of feedback were considered: (a) sample size n = 500, censoring rate 50% (Censoring times were generated from a uniform distribution U[0, 3] and truncated at 1), (b) n = 1000, censoring rate 50%, (c) n = 500, censoring rate 35% (Censoring times were all chosen as 1), (d) greater impact of past event feedback. Scenario 2: Random coefficients of the time-dependent covariate process were generated from other distributions: (a) mixture of normal distributions (b) mixture of normal distributions and smaller variances of the measurement errors (c) Dirichlet distributions. Scenario 3: The true models that generate the longitudinal data did not contain past event feedbacks. For all scenarios, we generated two types of recurrent events. Let  1 denote the set of possible transition paths for type 1 recurrent events and Here we assume that events of the same type share the same baseline intensity functions and regression coefficients, to mimic the real analysis. The estimating equations can be similarly derived with some straightforward algebra and are presented in section 2 in Appendix S1. W i1 is a baseline covariate generated from a Bernoulli distribution with probability 0.5. The observed data of X i (t) and W i2 (t) are generated from the following models The function of past event history in the first model implies that the intercept of W i2 (t) will change after the first occurrence of type 1 recurrent event and the function of event history in the second model implies that the intercept of X i (t) will change after each occurrence of recurrent events of any kind and will remain the same after the fifth recurrent event. Particularly, the functional form of the past event feedback in W i2 (t) mimic the setting of the real analysis, where we assume that the blood pressure profile is related to prior MI (type 1 recurrent event) and the levels of HDLC is associated with previous stroke (type 2 recurrent event). var (  In each scenario, the sample size was chosen as 500 and simulation results were based on 500 Monte Carlo replications. Cubic splines were used to estimate the time-varying coefficients. The number of knots for each spline function was chosen by minimizing a AIC-type criterion based on the corrected log-partial likelihood function. The detailed formulas can be found in section 3.1 in Appendix S1. The time-varying coefficients and constant coefficients in all scenarios were estimated in four ways: (1) our proposed estimator (DC) where D is short for dynamic feedbacks and C is short for bias correction; (2) the naive estimator that ignores the dynamic feedbacks (NaivD); (3) the naive estimator using least squares estimate of the random coefficients without correcting bias (NaivC); and (4) the ideal estimator (I) that assumes the random coefficients to be known at each time point. For each simulation, we calculated the following summary statistics for the constant parameters: the difference between the average of the estimates and the true parameter (Bias), the Monte Carlo SD, the average of the estimated SE, and the 95% empirical CP. As for the time-varying coefficients, the average of the spline estimates, the 95% pointwise confidence band and the pointwise coverage probability were calculated. To save space, the simulation results under Scenario 1, setting (a) and Scenario 2, setting (a) are presented here and the rest are shown in Appendix S1.

Scenario 1, setting (a)
In this simulation, the random coefficients of the time-dependent covariates were generated from normal distributions. In settings (a)-(c), the time-to-event data and longitudinal data are generated from a system with moderate feedback effects. The mean of Xi was chosen as (1, −0.2, 0.3) and the variance was (0.5, 0.1, 0.1) with correlations corr( Xi0 , Xi1 ) = 0.2, corr( Xi0 , Xi2 ) = 0.8, corr( Xi1 , Xi2 ) = 0.6. Wi were sampled from a normal distribution with mean (1, −0.5, 0.2), variance (0.4, 0.2, 0.1) and correlations corr( Wi0 , Wi1 ) = 0.2, corr( Wi0 , Wi2 ) = 0.8, corr( Wi1 , Wi2 ) = 0.6. In setting (d), we generated data from a stochastic processes with stronger feedback by setting E( Xi2 ) = 0.5 and E( Wi2 ) = 0.8 with the other parameters remaining the same. Table 1 presents the estimation result of the constant parameters when sample size n = 500 and the censoring rate is around 50%. We can see that our proposed estimator is noticeably TA B L E 1 Estimation of the constant parameters in Simulation 1 setting (a) unbiased and the Monte Carlo SDs agree with the estimated standard errors. The empirical coverage probabilities are very close to their nominal values 95%. On the other hand, the NaivC estimator is seriously biased with poor coverage percentage. The NaivD estimator is also biased with coverage probability below the nominal level. The four estimators perform almost the same in terms of the coverage probabilities of the coefficients of the baseline covariates. This is expected since the baseline covariates were assumed to be precisely measured and did not contain any past event feedback. Therefore ignoring the past event feedback or the bias resulted from the measurement errors in the time-dependent covariates have less impact on the estimation of the coefficients of the baseline covariates. Figure 2 displays the true and fitted curves of the time-varying coefficients. The estimates based on the proposed method are close to the true curves. Our proposed estimator shows better performance than the naive ones in terms of the pointwise coverage probabilities. It can be seen that the coverage rates of the time-dependent coefficients calculated via the naive methods are lower comparing to the proposed method at most of the time points.
It can be found in Figure 2 that the coverage probabilities of 2 (t) and 3 (t) were closer to the nominal levels comparing to those of 1 (t). As discussed in Cummins et al. (2001), nonparametric curve estimates based on a single global smoothing parameter tend to be most highly biased at points of sharp curvature. And because bias in nonparametric estimation is not uniform, they do not hold the desired level of coverage probabilities uniformly at all design points. In our estimation procedure, there is only one global smoothing parameter which is the number of knots. And the curvature of 1 (t) is greater than that of 2 (t) and 3 (t), especially around the minimum point of 1 (t). Therefore it is expected that the coverage probabilities of 1 (t) around the minimum point would be lower. The simulations based on a larger sample size and a lower censoring rate presented in Appendix S1 show that the performance of our proposed estimator can be further improved with a larger effective sample size. The simulation results also display a more

Scenario 2, setting (a)
In this simulation, the random effects of the longitudinal covariates were generated from a mixture of two normal distributions. This simulation aims to show that our estimation method is robust against random effect distributions. The mixing proportion was chosen as 0.3. The means TA B L E 2 Estimation of the constant parameters in Simulation 2 setting (a) and distances of the two distinct normal distributions being mixed were chosen in a way such that the mean of the generated random coefficients Xi and Wi were the same as those in Scenario 1. The skewness of the generated random coefficients Xi were 0.33, −0.04, and −0.15; and the skewness of Wi were 0.20, −0.15, and −0.04. The variances of Xi were 0.86, 0.12, and 0.12; and the variances of Wi were 0.75, 0.23, and 0.13. The censoring rate was around 50%. The estimation results are presented in Table 2 and Figure 3. It is shown that the naive estimators are biased with poor coverage probabilities. The proposed estimators outperform the naive estimators. The coverage probabilities of̂1(t) around its minimum point was a bit lower than the nominal level due to the greater curvature. But it can be shown by the results of setting (b) which are presented in Appendix S1 that the coverage probabilities improve with the decrease of the variance of the measurement errors. The simulation result of setting (c) shows that the proposed estimators still perform well when the random coefficients were generated from other asymmetric distributions with greater skewness.

APPLICATION
In this section, we apply the proposed model and estimation method to the ARIC study described in the Introduction. Our aim is to investigate the association of the biomarker trajectory and other established risk factors with recurrent MI, stroke, and total mortality. We restricted our attention to White male participants living in Washington County and Suburban Minneapolis, who had no history of stroke or MI. The events of interest were definite MI, probable MI, definite stroke, probable stroke, and all death events before 2011. The follow-up data of biomarker measurements and censoring information before 2011 were used. The final sample included 2764 subjects with 445 MI events, 210 stroke events, and 29.2% mortality rate. The covariates considered were longitudinal HDLC (mmol/L), longitudinal SBP (in mmHg) and the following baseline covariates: age (in years); indicators for diabetes (1 for fasting glucose ≥ 126 mg/dl and 0 otherwise), indicators for hypertension medication (1 for hypertension lowering medication within past 2 weeks at baseline examination, and 0 otherwise) and smoking (1 for current smoker at baseline examination and 0 otherwise). Among them, age was divided by 10 and SBP measurements were divided by 50. All event times and longitudinal observation times were divided by 365. The structure of time-varying coefficients and constant parameters in the multi-state models were chosen based on some preliminary analysis treating the effects of all covariates nonparametrically. It was found that the coefficients of SBP, HDLC and hypertension did not change much over time and in addition, the effect of SBP and HDLC were not significant over most of the time.
Therefore we assumed that these three covariates have constant effects. Using the same notations of the sets of possible transition paths  1 ,  2 , and  3 in Section 4, we considered the following models i hg (t) = 0,1 (t) exp( 1,1 (t)age i + 1,2 (t)diab i + 1,3 (t)smoke i where M is the maximum number of nonterminal events observed among all subjects. M = 6 in our sample. The first model is for recurrent MI events, the second model is for recurrent stroke events and the third one is for fatal events. We assumed that the intensity functions of the same type of events share the same baseline intensities and regression coefficients. The models of SBP and HDLC were chosen as follows according to the longitudinal data trajectories shown in Appendix S1 and the biological knowledge discussed in  Table 3, we can see that the current levels of SBP were positively and significantly associated with the intensity of stroke events. This suggests that an increase of 50 mmHg of SBP level was related to a exp(1.2239) = 3.4 higher risk of stroke occurrences. This is consistent with existing results in the medical research of the ARIC data. Shahar et al. (2003) found that baseline SBP was positively associated with incident stroke events and Petruski-Ivleva et al. (2016) found that an increase in SBP independent of baseline is associated with increased risk of stroke. In our model

Full model
. Since both the baseline SBP level and the increase in SPB along time are positively associated with the risk of stroke occurrences, it is expected that the current levels of SBP, which summarize both effects, are positively associated with the stroke event intensity. As shown in Table 3, the current levels of SBP was found to be negatively associated with the risk of MI event and death, but these effects are not significant. Although existing research found that the baseline SBP was positively associated with the risk of MI and death (Mok et al., 2018), some studies found that the slope of SBP was negatively associated with CVD events (Barrett et al., 2019). This may help in understanding why the associations between the current levels of SBP and the risks of MI and death are not significant as the effects of the baseline SBP level and the effects of the slope of SBP have opposing signs. HDLC was negatively related to the risk of all types of events, but neither of the effects are significant. Taking hypertension medication was significantly and positively associated with death, indicating that subjects with hypertension medication use are more likely to experience a fatal event. It can be seen from Figures 4 and 6 that the baseline age and smoking status were positively associated with the risks of MI events and death. These effects become stronger and significant as time moves on. Figure 5 shows that the baseline age, diabetes and smoking status were all positively associated with the risks of stroke events over most of the time.
Comparing the results obtained from the full model and the reduced model, it can be found that the strength and the significance of some coefficients are quite different. We ran a preliminary longitudinal data analysis of the SBP and HDLC data using linear mixed effects models and the results show that the fixed effects of the CVD event history are significant for both biomarkers. Therefore it is necessary to include the event history term in the time-dependent covariate models and ignoring the past event feedbacks yields biased estimation results as shown in the simulation studies. Therefore the statistical evidence provided by the reduced model may be invalid.

DISCUSSION
This paper proposed a novel modeling framework for the analysis of multitype recurrent event and terminal event data. Our methodology has great flexibilities and are computationally efficient. First of all, the formulations of the intensity functions are allowed to depend on the event order or/and event types. Various forms of event feedbacks can be considered. Second, no distribution assumptions are required for the random effects in the covariate process. Third, standard Cox regression models are extended to include both time-dependent coefficients and constant coefficients. The use of regression splines provides an computationally efficient estimate of the time-dependent coefficients, comparing to the local polynomials approach where estimation should be implemented at a series of dense grid points. Fourth, an asymptotically unbiased estimating equation is developed and the variance estimators of the coefficients are derived. The variance estimators can be used to construct hypothesis tests and confidence intervals without going to bootstrap methods. In the real analysis, we assumed that the prior CVD history provide feedback to the future occurrences of CVD events through the biomarkers SBP and HDLC. Though there is medical evidence for the association between prior CVD events and the trajectories of SBP and HDLC, a formal hypothesis test for testing the existence of past event feedback is desirable. In addition, we assumed a Markov model for the multistate events. However in some applications the form of past event feedback may result in assumption violations, for example when the event history is a function of state duration time. In this case, the multi-state model with state duration times as event feedbacks is a non-Markov model. Therefore model checking techniques to evaluate the Markov assumption is essential. And an extension of the proposed modeling framework to the semi-Markov or non-Markov settings will be left as future work.