In many medical follow-up studies, each subject may experience a series of events of different types, representing different states of a process. A fundamental question in analysing such multiple event data is the estimation of the multivariate distribution function for gap times in the presence of right censoring. In the sequentially ordered multivariate failure time setting, every gap time except the first is subject to induced dependent censoring, even if the overall follow-up time is independently right censored. Any analysis must account for this dependent censoring, which makes estimation for gap times challenging. In what follows, we present several simple and convenient approaches to estimating the joint and conditional distribution or survival functions of gap times.

##### Joint and conditional distribution estimations by inverse probability weighting method

Suppose that each subject under observation experiences *J* ordered events of different types at times *Y*_{1} < *Y*_{2} < … < *Y*_{J}. The gap times of interest are *T*_{1} = *Y*_{1}, *T*_{2} = *Y*_{2} − *Y*_{1}, … , *T*_{J} = *Y*_{J} − *Y*_{J − 1}. With *C* denoting the overall censoring time and *Y*_{0} = 0, the *j*-th gap time, *T*_{j}, is censored by *C* − *Y*_{j − 1}, which depends on the earlier gap times whenever *j* ≥ 2. A non-parametric estimator for the joint distribution function of gap times is provided by Lin *et al*. (1999). In this method, the follow-up time is assumed to be subject to independent right censoring, but no assumption is imposed on the dependence structure of the gap times. The inverse probability weighting methodology is used to adjust for dependent censoring.

For ease of discussion, we first consider the case *J* = 2 and then extend the estimation method to the general setting of *J* > 2. Suppose that there are *n* subjects in the study, each experiencing two events at times *Y*_{1} < *Y*_{2}. The gap times are *T*_{1} = *Y*_{1} and *T*_{2} = *Y*_{2} − *Y*_{1}. The observed data consist of $\{(\tilde Y_{i1}, \tilde Y_{i2}, \delta_{i1}, \delta_{i2}) : i = 1, \dots, n\}$, where $\tilde Y_{ij} = Y_{ij} \wedge C_i$ and *δ*_{ij} = *I*(*Y*_{ij} ≤ *C*_{i}) for *j* = 1,2, *i* = 1, … ,*n*; ∧ is the minimization function, and *I*( · ) is the indicator function. Let *F* be the joint distribution function of (*T*_{1},*T*_{2}) and *G* be the survival function of the censoring time *C*. We have

$$F(t_1,t_2) = H(t_1,0) - H(t_1,t_2),$$

where *H*(*t*_{1},*t*_{2}) = *P*(*T*_{1} ≤ *t*_{1},*T*_{2} > *t*_{2}). Therefore, an estimator of *F* can be obtained by estimating *H*. Because

$$E\{I(\tilde Y_{i1} \le t_1,\ \tilde Y_{i2} - \tilde Y_{i1} > t_2) \mid T_{i1}, T_{i2}\} = I(T_{i1} \le t_1,\ T_{i2} > t_2)\, G(T_{i1} + t_2),$$

we have

$$H(t_1,t_2) = E\left\{\frac{I(\tilde Y_{i1} \le t_1,\ \tilde Y_{i2} - \tilde Y_{i1} > t_2)}{G(\tilde Y_{i1} + t_2)}\right\},$$

which indicates that *H*(*t*_{1},*t*_{2}) can be estimated by a weighted empirical estimator,

$$\hat H(t_1,t_2) = \frac{1}{n}\sum_{i=1}^{n} \frac{I(\tilde Y_{i1} \le t_1,\ \tilde Y_{i2} - \tilde Y_{i1} > t_2)}{\hat G(\tilde Y_{i1} + t_2)},$$

where $\hat G$ is the Kaplan–Meier estimator of *G* based on the data $(\tilde Y_{i2}, 1 - \delta_{i2})$ or $(\tilde Y_{i1}, 1 - \delta_{i1})$ for *i* = 1, … ,*n*, and *τ*_{c} = sup{*t* : *G*(*t*) > 0}. The corresponding estimator of *F*(*t*_{1},*t*_{2}) is $\hat F(t_1,t_2) = \hat H(t_1,0) - \hat H(t_1,t_2)$, for *t*_{1} + *t*_{2} < *τ*_{c}. If *t*_{1} + *t*_{2} > *τ*_{c}, *F*(*t*_{1},*t*_{2}) and *H*(*t*_{1},*t*_{2}) are not estimable. In addition, the estimator $\hat F$ is not always a proper distribution function in that it may have negative mass, although it converges to a proper distribution for large samples. As shown in Lin *et al*. (1999), the estimator $\hat F$ is strongly consistent, and the process $n^{1/2}\{\hat F(t_1,t_2) - F(t_1,t_2)\}$ converges weakly to a bivariate zero-mean Gaussian process. Its covariance function, given in Lin *et al*. (1999), is expressed in terms of *D*(*t*_{1},*t*_{2};*u*) = {pr(*T*_{1} ≤ *t*_{1}) − pr(*T*_{1} ≤ *u*)}^{+} − {*H*(*t*_{1},*t*_{2}) − *H*(*u* − *t*_{2},*t*_{2})}^{+}, where Λ_{c} is the cumulative hazard function of *C*.

For the first gap time *T*_{1}, the marginal distribution can be estimated by $\hat F(t_1, \infty) = \hat H(t_1, 0)$, which is identical to the Kaplan–Meier estimator. Although the marginal distribution function for *T*_{2} is not generally estimable, it is both interesting and possible to estimate the conditional distribution *F*_{2 | 1}(*t*_{2} | *t*_{1}) = *P*(*T*_{2} ≤ *t*_{2} ∣ *T*_{1} ≤ *t*_{1}) = *F*(*t*_{1},*t*_{2}) ∕ *F*(*t*_{1}, ∞ ) as long as *t*_{1} + *t*_{2} < *τ*_{c}. A natural estimator of *F*_{2 | 1}(*t*_{2} | *t*_{1}) is obtained by replacing *F* with its estimator $\hat F$. The conditional estimator is consistent and converges weakly to a zero-mean Gaussian process with an easily estimated covariance function (Lin *et al*., 1999).

It is straightforward to extend the method to the general case of *J* > 2. Define **t** = (*t*_{1}, … ,*t*_{J}), **t**_{0} = (*t*_{1}, … ,*t*_{J − 1},0), *F*(**t**) = *P*(*T*_{1} ≤ *t*_{1}, … ,*T*_{J} ≤ *t*_{J}), and *H*(**t**) = *P*(*T*_{1} ≤ *t*_{1}, … ,*T*_{J − 1} ≤ *t*_{J − 1},*T*_{J} > *t*_{J}). Obviously, *F*(**t**) = *H*(**t**_{0}) − *H*(**t**). Similarly, *H*(**t**) can be estimated by

$$\hat H(\mathbf{t}) = \frac{1}{n}\sum_{i=1}^{n} \frac{I(\tilde T_{i1} \le t_1, \dots, \tilde T_{i,J-1} \le t_{J-1},\ \tilde T_{iJ} > t_J)}{\hat G(\tilde Y_{i,J-1} + t_J)},$$

where $\tilde Y_{ij} = Y_{ij} \wedge C_i$ is the observed time of the *j*-th event (with $\tilde Y_{i0} = 0$), $\tilde T_{ij} = \tilde Y_{ij} - \tilde Y_{i,j-1}$ is the observed *j*-th gap time, and $\hat G$ is the Kaplan–Meier estimator of *G*.

The method provides a simple solution to gap time distribution estimation for multivariate failure times. The joint distribution function is estimated by an empirical mean-type estimator with a closed-form covariance structure. Essentially, each observation is inversely weighted by the probability of being uncensored, which effectively removes the bias from dependent censoring. An estimator of the conditional distribution of the second gap time is also provided, and such a measure could be more informative and more readily interpretable for practitioners. Take, for example, a natural history study of HIV in which the event sequence of interest is birth → HIV infection → death. A regression analysis of time from HIV infection to death on age at HIV infection, based on the Cox proportional hazards model, would give some general understanding of the relationship between the corresponding two gap times. Nevertheless, the disease progression could be depicted more precisely by the conditional probability of surviving beyond a certain amount of time given some restricted age at infection. In addition, it is often of interest to compare gap time distributions among groups of subjects. The method plays an important role in constructing a family of non-parametric two-sample tests for differences in gap time distributions (Lin & Ying, 2001). The non-parametric method, however, requires that the censoring time be completely independent of the gap times, which could be a practical limitation in some contexts.
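The inverse probability weighting idea for *J* = 2 can be illustrated with a minimal Python sketch. It assumes the estimator takes the weighted empirical form described above, that there are no ties between event and censoring times, and that the censoring survival function is estimated from the final follow-up times. The function names (`km_censoring_survival`, `H_hat`, `F_hat`, `F_cond_hat`) and the toy data are hypothetical and are not taken from Lin *et al*. (1999).

```python
import numpy as np

def km_censoring_survival(times, event_observed):
    """Kaplan-Meier estimate of G(t) = P(C > t), treating censoring as the 'event'.
    Assumption: no ties between event times and censoring times."""
    times = np.asarray(times, dtype=float)
    censored = np.asarray(event_observed, dtype=int) == 0
    cens_times = np.unique(times[censored])
    factors = np.array(
        [1.0 - np.sum(censored & (times == u)) / np.sum(times >= u) for u in cens_times],
        dtype=float,
    )
    surv = np.concatenate(([1.0], np.cumprod(factors)))

    def G_hat(t):
        # Number of distinct censoring times <= t selects the step of the KM curve.
        return surv[np.searchsorted(cens_times, t, side="right")]

    return G_hat

def H_hat(t1, t2, y1, y2, G_hat):
    """Weighted empirical estimate of H(t1, t2) = P(T1 <= t1, T2 > t2).
    y1, y2 are the observed (possibly censored) times of the first and second events."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    keep = (y1 <= t1) & (y2 - y1 > t2)
    weights = np.zeros(len(y1))
    weights[keep] = 1.0 / G_hat(y1[keep] + t2)  # inverse probability of being uncensored
    return weights.mean()

def F_hat(t1, t2, y1, y2, G_hat):
    """Estimate of F(t1, t2) = H(t1, 0) - H(t1, t2)."""
    return H_hat(t1, 0.0, y1, y2, G_hat) - H_hat(t1, t2, y1, y2, G_hat)

def F_cond_hat(t2, t1, y1, y2, G_hat):
    """Estimate of P(T2 <= t2 | T1 <= t1) = F(t1, t2) / F(t1, infinity)."""
    denom = H_hat(t1, 0.0, y1, y2, G_hat)
    return F_hat(t1, t2, y1, y2, G_hat) / denom if denom > 0 else float("nan")

# Toy data (hypothetical): subject 2 is censored at time 2 after its first event.
y1_obs = [1.0, 1.5, 3.0, 4.0]
y2_obs = [3.0, 2.0, 6.0, 5.0]
delta2 = [1, 0, 1, 1]
G = km_censoring_survival(y2_obs, delta2)
joint = F_hat(2.0, 1.0, y1_obs, y2_obs, G)
conditional = F_cond_hat(1.0, 2.0, y1_obs, y2_obs, G)
```

Because `F_hat` is a difference of two weighted means, nothing in the sketch forces it to be monotone, which is how the negative-mass issue mentioned above can arise in small samples.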

##### Conditional survival estimation based on conditional cumulative hazard

As an alternative to the aforementioned method suggested by Lin *et al*. (1999), an estimate of the conditional survival function of a gap time can be developed on the basis of the corresponding conditional hazard function, via the relationship *S*(*t*) = exp{ − Λ(*t*)}, where Λ(*t*) is the cumulative hazard function. In this way, the resulting estimate is not subject to negative mass, which is a problem with many survival function estimators that depend on estimation of the joint survival function, as in Section 3.2.1.

Because the marginal distributions of the second and subsequent gap times are not identifiable, the quantity of interest for *j* ≥ 2 is the conditional survival function of the *j*-th gap time *T*_{j} given that the (*j* − 1)-th event occurs prior to a fixed time, *S*_{j ∣ j − 1}(*t*_{j} | *t*_{j − 1}) = *P*(*T*_{j} > *t*_{j} ∣ *Y*_{j − 1} ≤ *t*_{j − 1}), where *Y*_{j − 1} is the total time to the (*j* − 1)-th event. Schaubel & Cai (2004a) proposed to estimate the conditional survival function based on a conditional cumulative hazard estimator. In the absence of censoring, the conditional cumulative hazard for (*T*_{j} ∣ *Y*_{j − 1} ≤ *t*_{j − 1}) could be estimated by

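$$\hat\Lambda_{j \mid j-1}(t_j \mid t_{j-1}) = \int_0^{t_j} \frac{\sum_{i=1}^{n} dN_{ij}(s; t_{j-1})}{\sum_{i=1}^{n} R_{ij}(s; t_{j-1})}, \tag{1}$$

where $N_{ij}(s; t_{j-1}) = I(T_{ij} \le s,\ Y_{i,j-1} \le t_{j-1})$ and $R_{ij}(s; t_{j-1}) = I(T_{ij} \ge s,\ Y_{i,j-1} \le t_{j-1})$. With censoring, the observed gap time data consist of $\{(\tilde T_{ij}, \delta_{ij})\}$, where $\tilde T_{ij} = T_{ij} \wedge C_{ij}$, *δ*_{ij} = *I*(*T*_{ij} ≤ *C*_{ij}), *C*_{ij} = *C*_{i} − *Y*_{i,j − 1}, and *C*_{i} is the overall censoring time for subject *i*, for *j* = 1,2, … , and *i* = 1, … ,*n*. Denote *G*(*t*) = *P*(*C*_{i} > *t*). In the presence of censoring, for *j* ≥ 2, *S*_{j ∣ j − 1}(*t*_{j} | *t*_{j − 1}) is identifiable for *t*_{j} ∈ [0,*τ*_{j}], where *t*_{j − 1} + *τ*_{j} ≤ *τ*_{c} and *τ*_{c} = sup{*t* : *G*(*t*) > 0}. We have for *j* ≥ 2,

$$E\left\{\frac{d\tilde N_{ij}(s; t_{j-1})}{G(Y_{i,j-1} + s)}\right\} = E\{dN_{ij}(s; t_{j-1})\}, \qquad E\left\{\frac{\tilde R_{ij}(s; t_{j-1})}{G(Y_{i,j-1} + s)}\right\} = E\{R_{ij}(s; t_{j-1})\},$$

where $\tilde N_{ij}(s; t_{j-1}) = \delta_{ij} I(\tilde T_{ij} \le s,\ Y_{i,j-1} \le t_{j-1})$ and $\tilde R_{ij}(s; t_{j-1}) = I(\tilde T_{ij} \ge s,\ Y_{i,j-1} \le t_{j-1})$. An estimate of the conditional cumulative hazard function can be obtained by replacing the potentially unobservable random variables in (1) with consistent estimates of quantities with the same conditional expectation, and is given as

$$\hat\Lambda_{j \mid j-1}(t_j \mid t_{j-1}) = \sum_{k:\, t_{jk} \le t_j} \frac{\sum_{i=1}^{n} \delta_{ij}\, I(\tilde T_{ij} = t_{jk},\ Y_{i,j-1} \le t_{j-1}) \big/ \hat G(Y_{i,j-1} + t_{jk})}{\sum_{i=1}^{n} I(\tilde T_{ij} \ge t_{jk},\ Y_{i,j-1} \le t_{j-1}) \big/ \hat G(Y_{i,j-1} + t_{jk})}$$

for *j* ≥ 2, where $t_{j1} < t_{j2} < \cdots$ are the ordered and distinct uncensored gap times for *T*_{j}, and $\hat G$ is the Kaplan–Meier estimator of *G*(*t*) based on (*Y*_{iJ} ∧ *C*_{i},1 − *δ*_{iJ}) for *i* = 1, … ,*n*. The conditional survival function of interest can then be estimated by $\hat S_{j \mid j-1}(t_j \mid t_{j-1}) = \exp\{-\hat\Lambda_{j \mid j-1}(t_j \mid t_{j-1})\}$, which is bounded by 0 and 1 and is monotone in *t*_{j}, for *t*_{j} ∈ [0,*τ*_{j}]. The asymptotic results are established by empirical process techniques and the functional delta method. The estimator is uniformly consistent and converges weakly to a zero-mean Gaussian process with a covariance function that can be consistently estimated using formula (3) in Schaubel & Cai (2004a). In addition, a 100(1 − *α*)% simultaneous confidence band for the conditional survival function can be constructed on the basis of the normality result.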

Similar in spirit to Lin *et al*. (1999), the method adjusts for induced dependent censoring by weighting risk set contributions by the inverse of the probability of being uncensored. The conditional survival function is estimated directly through a cumulative hazard estimator, rather than through a joint distribution estimator. One advantage of the estimator compared with that in Section 3.2.1 is the ease of computing asymptotic standard errors, which is important in practice. For the sake of identifiability, the method conditions on the (*j* − 1)-th event occurring in the interval [0,*t*_{j − 1}]. In applications, such conditioning may often make the survival estimators more meaningful to investigators, particularly when the conditioning serves to identify the subjects of primary interest. For instance, in the disease process HIV → AIDS → death, it could be argued that subjects who developed AIDS within 1 year after HIV infection may have extremely poor disease progression, and therefore special focus should be placed on the survival behaviour after AIDS onset for these subjects. The method could be applied to address questions of this kind.
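The weighting of risk set contributions can be illustrated with a short Python sketch. It assumes an inverse-probability-weighted Nelson–Aalen-type form for the conditional cumulative hazard, with no ties between event and censoring times; the exact estimator and its variance are those of Schaubel & Cai (2004a), and the function and argument names below are hypothetical.

```python
import numpy as np

def km_censoring_survival(times, event_observed):
    """Kaplan-Meier estimate of G(t) = P(C > t), treating censoring as the 'event'.
    Assumption: no ties between event times and censoring times."""
    times = np.asarray(times, dtype=float)
    censored = np.asarray(event_observed, dtype=int) == 0
    cens_times = np.unique(times[censored])
    factors = np.array(
        [1.0 - np.sum(censored & (times == u)) / np.sum(times >= u) for u in cens_times],
        dtype=float,
    )
    surv = np.concatenate(([1.0], np.cumprod(factors)))
    return lambda t: surv[np.searchsorted(cens_times, t, side="right")]

def cond_cum_hazard(t, t_prev, y_prev, d_prev, y_cur, d_cur, G_hat):
    """Weighted estimate of the cumulative hazard of T_j given Y_{j-1} <= t_prev.
    y_prev, d_prev: observed time and indicator of the (j-1)-th event;
    y_cur, d_cur: observed time and indicator of the j-th event."""
    y_prev = np.asarray(y_prev, dtype=float)
    y_cur = np.asarray(y_cur, dtype=float)
    d_prev = np.asarray(d_prev, dtype=int)
    d_cur = np.asarray(d_cur, dtype=int)
    eligible = (d_prev == 1) & (y_prev <= t_prev)   # (j-1)-th event seen by t_prev
    gap = y_cur - y_prev                            # observed j-th gap time
    event_gaps = np.unique(gap[eligible & (d_cur == 1) & (gap <= t)])
    cum_haz = 0.0
    for s in event_gaps:
        at_risk = eligible & (gap >= s)
        w = 1.0 / G_hat(y_prev[at_risk] + s)        # inverse probability weights
        is_event = (d_cur[at_risk] == 1) & (gap[at_risk] == s)
        cum_haz += w[is_event].sum() / w.sum()
    return cum_haz

def cond_survival(t, t_prev, y_prev, d_prev, y_cur, d_cur, G_hat):
    """S(t | t_prev) = exp(-Lambda(t | t_prev)); always lies in [0, 1]."""
    return float(np.exp(-cond_cum_hazard(t, t_prev, y_prev, d_prev, y_cur, d_cur, G_hat)))

# Toy data (hypothetical) with J = 2: subject 2 is censored at time 4.
y1_obs, d1 = [1.0, 3.8, 3.0, 4.0], [1, 1, 1, 1]
y2_obs, d2 = [3.0, 4.0, 6.0, 5.0], [1, 0, 1, 1]
G = km_censoring_survival(y2_obs, d2)
surv_2_given_4 = cond_survival(2.0, 4.0, y1_obs, d1, y2_obs, d2, G)
```

Since the estimate is the exponential of a non-decreasing sum of non-negative increments, it is monotone in `t` by construction, in contrast to an estimator built from differences of joint distribution estimates.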

##### Joint distribution estimation based on method for marked survival data

Huang & Louis (1998) proposed a general estimation procedure for the joint distribution of a survival time and mark variables, where the mark variables mark the endpoint and are observed only if the survival time is uncensored. Analogous to the product integral representation of a survival function, the joint distribution of the survival time and mark variables can be represented through a cumulative mark-specific hazard function, which serves as the basis of the estimation method. In the sequentially ordered multiple events setting, the survival time is the overall follow-up time, and the mark variables are a vector of gap times. Therefore, by appropriately choosing the endpoint and mark variables, an estimator for the joint distribution of gap times can be obtained using the method for marked survival data. We present the gap times of interest and the corresponding procedure for estimating their joint distribution function under the framework of marked survival data.

For simplicity of discussion, we consider the case *J* = 2, where the gap times are *T*_{1} and *T*_{2}. Define the random variables *X* = (*T*_{1},*T*_{2}) as the bivariate mark vector and *Y* = *T*_{1} + *T*_{2} as the survival time, and denote their joint distribution function by *F*_{XY} and the marginal survival function of *Y* by *S*_{Y}. For *x* = (*t*_{1},*t*_{2}), *F*_{XY}(*x*,*y*) = *P*(*T*_{1} ≤ *t*_{1},*T*_{2} ≤ *t*_{2},*Y* ≤ *y*) and *S*_{Y}(*y*) = 1 − *F*_{XY}{( ∞ , ∞ ),*y*}. Obviously, the joint distribution function of (*T*_{1},*T*_{2}) is determined by *F*_{XY} through *F*(*t*_{1},*t*_{2}) = *F*_{XY}{(*t*_{1},*t*_{2}),*t*_{1} + *t*_{2}}. Suppose that the overall follow-up time *Y* is subject to independent censoring with censoring time *C* and that the values of *X* = (*T*_{1},*T*_{2}) are completely observed only when *Y* is uncensored. The observed data consist of $\{(\tilde Y_i, \delta_i, \delta_i X_i) : i = 1, \dots, n\}$, where $\tilde Y_i = Y_i \wedge C_i$ and *δ*_{i} = *I*(*Y*_{i} ≤ *C*_{i}), for *i* = 1, … ,*n*. Denote the marginal survival function of $\tilde Y$ by $S_{\tilde Y}$ and let $F^{*}_{XY}(x,y) = P(X \le x,\ \tilde Y \le y,\ \delta = 1)$. Under the assumption that *Y* and *C* are independent, the joint distribution function *F*_{XY} can be expressed as

$$F_{XY}(x,y) = \int_0^{y} \frac{S_Y(u-)}{S_{\tilde Y}(u-)}\, F^{*}_{XY}(x, du),$$

where $S_{\tilde Y}(u) = S_Y(u)\,G(u)$, and *G*( · ) is the survival function of *C*. In constructing the estimator of *F*_{XY}, the survival function *S*_{Y} is estimated by the Kaplan–Meier estimator, and $S_{\tilde Y}$ and $F^{*}_{XY}$ are estimated by their corresponding empirical measures. The joint distribution of gap times (*T*_{1},*T*_{2}) is then estimated by $\hat F(t_1,t_2) = \hat F_{XY}\{(t_1,t_2),\ t_1 + t_2\}$, which enjoys desirable asymptotic properties following Theorem 6 in Huang & Louis (1998). By defining a mark vector *X* = (*T*_{1}, … ,*T*_{J}) and a survival time *Y* = *T*_{1} + … + *T*_{J} based on gap times, it is straightforward to extend the method to the general case of *J* > 2. The resulting estimator is unbiased, uniformly strongly consistent and asymptotically normal. Like the other methods considered in this paper, the approach relies on the random censorship assumption.
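The plug-in construction for *J* = 2 can be sketched in Python as follows. The sketch assumes that the estimator assigns to each uncensored subject the Kaplan–Meier left limit of the survival function of the total time divided by the size of the risk set, which is one way to combine the Kaplan–Meier and empirical estimates described above. The function name `marked_joint_cdf` and the toy data are hypothetical and are not taken from Huang & Louis (1998).

```python
import numpy as np

def marked_joint_cdf(t1, t2, gap1, gap2, y_obs, delta):
    """Plug-in estimate of F(t1, t2) = F_XY{(t1, t2), t1 + t2}.
    gap1, gap2: gap times (marks), used only where delta == 1 (NaN is fine elsewhere);
    y_obs: observed total follow-up time min(Y, C); delta: 1 if Y is uncensored."""
    gap1 = np.asarray(gap1, dtype=float)
    gap2 = np.asarray(gap2, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    delta = np.asarray(delta, dtype=int)

    # Kaplan-Meier estimate of S_Y, stored as a step function over event times.
    ev_times = np.unique(y_obs[delta == 1])
    factors = np.array(
        [1.0 - np.sum((y_obs == u) & (delta == 1)) / np.sum(y_obs >= u) for u in ev_times],
        dtype=float,
    )
    surv = np.concatenate(([1.0], np.cumprod(factors)))
    # Left limit S_Y(u-) at each observed time: product over event times strictly below u.
    s_left = surv[np.searchsorted(ev_times, y_obs, side="left")]

    # n * empirical S_{Y~}(u-) is the number of subjects still under observation at u.
    n_at_risk = np.array([np.sum(y_obs >= u) for u in y_obs], dtype=float)

    # Mass attached to each uncensored subject: S_Y(u-) / (n * S_{Y~}(u-)).
    mass = delta * s_left / n_at_risk
    inside = (delta == 1) & (gap1 <= t1) & (gap2 <= t2) & (y_obs <= t1 + t2)
    return float(mass[inside].sum())

# Toy data (hypothetical): subject 2 is censored at time 4, so its marks are unobserved.
g1 = [1.0, np.nan, 3.0, 4.0]
g2 = [2.0, np.nan, 3.0, 1.0]
y = [3.0, 4.0, 6.0, 5.0]
d = [1, 0, 1, 1]
joint = marked_joint_cdf(4.0, 1.0, g1, g2, y, d)
```

In this sketch the mass lost by the censored subject is redistributed to later uncensored subjects, and without censoring every subject receives mass 1/*n*, so the estimate reduces to the empirical joint distribution function.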

In summary, the three non-parametric methods presented in this section for estimating gap time distributions from multivariate failure times all adjust appropriately for induced dependent censoring. For the conditional survival function, the variance is easier to compute for the estimator based on the conditional cumulative hazard than for the one derived from the estimated joint and marginal distributions. In addition, simultaneous confidence bands can be constructed by the method based on the conditional cumulative hazard, whereas neither of the other two methods provides such a technique. Further, when combined with the method for univariate recurrent events in Section 3.1, the method based on marked survival data can be applied not only to analyse gap times from multivariate failure times but also to draw inference on bivariate alternating gap time data from bivariate recurrent events, as shown in Section 3.3.