A semi-competing risks model for data with interval-censoring and informative observation: An application to the MRC cognitive function and ageing study

Semi-competing risks data occur frequently in medical research when interest is in simultaneous modelling of two or more processes, one of which may censor the others. We consider the analysis of semi-competing risks data in the presence of interval-censoring and informative loss-to-followup. The work is motivated by a data set from the MRC UK Cognitive Function and Ageing Study, which we use to model two processes, cognitive impairment and death. Analysis is carried out using a multi-state model, which is an extension of that used by Siannis et al. (Statist. Med. 2007; 26:426–442) to model semi-competing risks data with exact transition times, to data which is interval-censored. Model parameters are estimated using maximum likelihood. The role of a sensitivity parameter k, which influences the nature of informative censoring, is explored. Copyright © 2010 John Wiley & Sons, Ltd.


Introduction
A competing risks framework consists of survival data where failure may be due to one of a number of competing causes. This notion can be extended to that of semi-competing risks, where one type of event may censor the other events, but not vice versa. The 'censoring' event is sometimes known as the terminal event. A simple example of a semi-competing risks model is an illness-death model, where death may occur after illness, but death censors illness. In this case individuals who have a greater risk of death may also have a greater risk of illness, resulting in informative censoring of illness by death. Semi-competing risks frameworks have previously been discussed in Day et al. [1] and Fine et al. [2]. A multi-state model approach to semi-competing risks data was suggested by Wang [3]. We will use a fully parametric version of multi-state models reviewed by Putter et al. in [4], extended to allow for interval censoring. Covariates are incorporated naturally via proportional hazards assumptions.
Our work is motivated by the MRC UK Cognitive Function and Ageing Study (MRC CFAS), which is a longitudinal study investigating the cognitive function of older people [5]. Cognitive function has been assessed in this study using a continuous measure, but for research purposes has been dichotomized into two states: healthy and cognitively impaired (CI). Our aim is to jointly model two processes, CI and death. The CI process may be censored informatively by death, but there is an additional censoring process which may also be informative-participants who are CI may be more likely to withdraw from the study. We will therefore consider a semi-competing risks model with a death process, and two competing non-terminal processes, CI and loss-to-followup (LTF). A multi-state model analysis of the MRC CFAS data that does not take account of informative LTF has been carried out by van den Hout and Matthews [6].
Our model extends that of Siannis et al. [7], who consider a five-state model where the LTF process is modelled explicitly by introducing LTF states in the illness-death model. Siannis et al. [7] assumed that transition times were known exactly. The situation is more complicated for the analysis of the MRC CFAS data because there are long gaps of up to 8 years between measurements of cognitive function; hence, we must consider the data to be interval-censored. As a result our model will require an additional transition compared with the model used by Siannis et al. In Section 2 we will describe the MRC CFAS. In Section 3 we will describe our model and methods of inference. In Section 4 we will present our data analysis, and in Section 5 we give some concluding remarks and discuss outstanding issues.

MRC CFAS
MRC CFAS is a UK-based longitudinal multi-centre study investigating the health and cognitive function of older people [5]. The six centres in the study are Newcastle, Nottingham, Liverpool, Cambridgeshire, Gwynedd and Oxford. CI was assessed using the Mini-Mental State Examination (MMSE), a widely used test of memory and cognitive function [8]. MMSE scores range from 0 to 30 with scores less than or equal to 21 indicating CI. Participants who were assessed as CI at the prevalence screen, or at the incidence screen 2 years later, were scheduled to have assessments every one to 3 years. A random sample of healthy participants also followed this assessment schedule. The remainder were assessed less frequently, with gaps of up to 8 years between assessments, which means that we must consider the data to be interval-censored. All participants were flagged in the Medical Research Information Service, so exact death times were recorded for all those who died during the study, including those who were LTF. Because the pattern of observations is fixed by the design of the study, it is not informative in MRC CFAS. In other contexts this may not be true, and might require consideration. Figure 1 shows the model used by Siannis et al. [7] to model cardiovascular disease with data from the Whitehall study. In this model the process of being LTF is modelled explicitly by the inclusion of an LTF state in the model. Assumption of non-informative LTF has been avoided by allowing transition rates to a Non-Fatal event and death to differ after LTF (i.e. 1 = 5 and 2 = 6 ). Non-fatal events were not observed after LTF, hence, Non-Fatal(LTF) is an unobserved state, indicated in the model diagram by a dotted box. There is no transition from the Non-Fatal state to the LTF state in this model because the interest is in the time of the first non-fatal event, after which LTF provides no extra information as full information is available on mortality.   Figure 2 shows the extension of this model that we have used to model the interval-censored data from MRC CFAS. In this model CI takes the place of the Non-Fatal state in the Whitehall model. We have introduced a transition from the CI state to the LTF state because otherwise participants who become LTF are assumed to be healthy until the time of LTF. This was not a problem for the Whitehall data because exact transition times were assumed to be known, and so any non-fatal event prior to LTF would have been observed. For the interval-censored data set we can no longer make this assumption, and we must therefore allow the possibility of participants becoming CI in the time interval between the last healthy observation and the time of LTF. The state Cognitively Impaired And Lost-To-Followup (CI(LTF)) is now observed for participants LTF after CI. But it remains unobserved for those who were healthy prior to LTF because we do not know whether or not they become CI later on. In this model we have implicitly assumed that the hazard of death for those who are in the CI(LTF) state is independent of the order in which CI and LTF has occurred.

The likelihood
There are 12 routes that we might observe a participant to take in the MRC CFAS model. where 'H' represents the healthy state, 'CI' is the cognitively impaired state, 'D' is the death state, 'LTF' is the lost-to-followup state and 'CI(LTF)' is the cognitively impaired and lost-to-followup state. Some observable routes correspond to more than one model trajectory, for example observable route (5) corresponds to H → LTF and H → LTF → CI(LTF).
We will fit the model using maximum likelihood estimation. The likelihood function is a product of the probabilities of observed routes where Q ji is the probability of observing route j for subject i and I ji is an indicator function for subject i taking route j. Expressions for the Q ji are given in the Appendix. Because our data are interval-censored, we must sum over all possible trajectories that might have taken place between observations when calculating a transition probability. For example, if we observe a transition from the healthy state to the death state we must allow for the possibility of the participant having passed through the CI and/or LTF states between the last healthy observation and the time of death.
We take the timescale t to be the time from study entry, as in van den Hout and Matthews [6] (see also the discussion in Siannis et al. [7]). We use Weibull hazards for transition rates: logistic model for the probability of a participant being CI at the prevalence screen, again with age and sex as covariates, as was used by van den Hout and Matthews in [6].

Constraints
Not all of the parameters in the MRC CFAS model are identifiable, and we must therefore introduce some constraints on parameters. We use one constraint as used by Siannis et al.: Although somewhat unusual, this constraint was chosen to impose minimal restrictions on parameter values. The assumption is that the ratio of the hazard of death to the hazard of CI for those who are not LTF is proportional to the equivalent ratio for those who are. Note that it is only the intercept parameters which are affected by the choice of k in equation (2). In particular, regression coefficients for explanatory variables should be robust to changes in the value of k. (We thank one of our referees for highlighting this.) There is no information in the data concerning the value of the parameter k, and this parameter must therefore be regarded as a fixed constant. We will test the sensitivity of our results using various values of k.
The sensitivity parameter k in equation (2) plays a very specific role in the analysis, exploring certain aspects of non-ignorable censoring. As a default, analysis is carried out assuming k = 1, which means that the ratio of the'healthy to death' rate to the CI rate is not affected by LTF. The assumption of k = 1 does not exclude the possibility of informative LTF, since the two rates may change in proportion to each other after LTF, even though the ratio remains unchanged. The parameter k, therefore, measures the relative change of one rate to the other following LTF. If k>1 then the CI rate increases more after LTF relative to the death rate, while k<1 implies exactly the opposite.
Unlike Siannis et al. we do not need to introduce any further parameter constraints to ensure identifiability of our model. This is a consequence of the assumption, discussed earlier, that the hazard of death for those who are CI and LTF does not depend on the order in which those events occurred. It is therefore possible, in principle, to estimate 7 parameters using the observable data of those who became LTF after CI, because we can be certain that they entered the CI(LTF) state. However, to reduce the number of parameters in our model we impose the additional (but not required) constraint This constraint implies that the hazard of death for CI participants is independent of whether or not they are LTF. This assumption appears reasonable if we regard the critical information to be whether or not a participant is CI, and that CI has the same effect before or after a participant is LTF.

Data assumptions
For simplicity only data from the Newcastle centre have been used. The initial data set contained 2524 participants. We removed from this data set 56 who had missing MMSE scores at the prevalence screen, 4 who were audited as dead but with no record of the date of death and a further 12 participants, for whom our information was uncertain at the time of analysis. The data set used in the analysis therefore contained a total of 2452 participants. In the MRC CFAS data set exact times of LTF have not been recorded, only the interview at which the participant was first LTF. Because interviews are scheduled at varied times, we have to define an interval during which LTF can be considered to have taken place. As an upper end-point for the LTF interval we could use the average time since prevalence screen of the missed interview. However, this can cause problems when interviews are close together because an interview may take place later than the average time of the interview scheduled to follow it. Instead we have used the time of the last interview which took place plus the average time to the next scheduled interview as an upper end-point for the LTF interval.
As described earlier, when calculating the probability of a transition in the likelihood, we must sum over all possible transitions which may have taken place during the intervening period. In practice this means that we never observe a direct healthy to death transition, because we must also allow for the probabilities of passing through the CI and/or LTF states between the last healthy observation and the time of death. We found that a likelihood composed in this way may lead to misleading parameter estimates because an unfeasible set of parameter values may have a high likelihood. For example a near-zero healthy-to-death transition rate may be compensated for by other death rates being inflated. To avoid this problem we introduced an extra assumption that participants who died within 2 years of their last observation did not become CI or LTF during those 2 years.

Results
The age range of participants in the data set was 64-103 years at baseline, with a median of 74 years and an inter-quartile range of 11 years. Of the 2452 participants in the data set, 908 were male and 1544 were female. Before presenting the results of estimation of our multi-state model, we will first consider a simpler time-to-event analysis which may shed light on the relationship between the death, CI and LTF processes. Table II shows results of proportional hazards regression analyses for time to death with (a) CI and (b) LTF as time-dependent covariates, and with age in years and sex as time-independent covariates. This analysis assumes non-informative censoring. This assumption can be justified for the LTF analysis because censoring only occurs at the end of the study. It may not be justified for the CI analysis where individuals who are LTF have been censored at their last observation time. The results of analysis (a) show a considerable increase in risk of death for participants who were CI. The results of analysis (b) suggest that there may be a slight increase in the risk of death for participants who have become LTF. For both analyses the results show a higher risk of death for older ages and a lower risk of death for females, as we would expect.
In the remainder of this section we will consider the multi-state model of Figure 2. Counts of observed transitions between states are given in Table I. We have fitted the model in R using maximum likelihood estimation. Table III shows maximum likelihood estimates for the model parameters. Here transitions have been divided into transitions to death and others. The parameters relating to 6 and 7 can be determined by the constraints (2) and (3). For the death transitions older participants and male participants have a greater hazard of death, as we would expect. The intercept and shape parameters of 3 are greater than those of 1 indicating a greater hazard of death for CI participants, which agrees with the results of the simple proportional hazards analyses. The 5 intercept and shape estimates are somewhat different from the 1 estimates, with a smaller hazard at earlier times, but a greater hazard at later times due to the larger shape parameter. For other transitions, the H → CI hazard is greater for older and female participants, whereas the H → LTF hazard is greater for female participants. The hazard of the H → LTF transition is decreasing with time (i.e. 4 <1) because there was more drop-out from the study at earlier times. The CI → CI(LTF) transition has a very high intercept and small shape parameter because participants who were assessed as CI at the prevalence screen were scheduled to have a follow-up interview 1-2 months later, and there was a high level of drop-out before this second interview. It could be argued that participants who were CI at study entry could belong to a different population from those who were not. These individuals have been included partly because of the information they provide for estimation of the CI → CI(LTF) transition rate. Note, however, that if they are removed then the Weibull hazard for the CI → D transition is less elevated initially, and the hazard for the CI → CI(LTF) transition is smaller with more uncertainty about its shape. Table IV shows that the maximum likelihood estimates of the covariate parameters are fairly robust to different values of k (this was to be expected, as was discussed in Section 3.3). We have also fitted a k = 1 model with an age by sex interaction and a model with an age 2 term in the linear predictor. There was little evidence that these additional variables improved the fit of the models.  In order to test for the presence of non-informative LTF in the data, we can compare the multi-state model with a model which is identical, except with 1 = 5 and 2 = 6 . A likelihood ratio test gave a test statistic of 16.3 on 4 degrees of freedom, which corresponds to a p-value of 0.003. There is therefore evidence that the more complicated model provides a better fit to the data.
The time to death process can be modelled non-parametrically using an adapted Kaplan-Meier estimate of the survival curve with adjustment for age and sex through the Cox model. This estimate should be unbiased because we have exact death times and no informative censoring for the death process. It can therefore be used to assess one aspect of the fit of the multi-state model by comparison with the multi-state estimate of the probability of survival. Figure 3 shows plots of the non-parametric and multi-state model survival curve estimates for males and females of mean study age. The multi-state model provides a generally reasonable fit, but with slight underestimation of the risk of death, especially for males at later times. This could be due to the extra assumptions made when fitting the multi-state model.
We can also use the multi-state model to plot cumulative incidence curves for the time to CI outcome. Figure 4 shows plots of the cumulative incidence curves for the CI process, calculated using the multi-state model with various values of k. These plots are quite sensitive to the value of k, especially for females. The sensitivity to k appears to be greater here than was found in Siannis et al. [7] for the case with exact transition times, possibly because of extra uncertainty inherent in the use of interval-censored data.

Discussion
We have fitted a semi-competing risks model to data from the MRC Cognitive Function and Ageing Study, which takes into account informative LTF and interval-censoring. We used a five-state model, explicitly modelling the LTF process by the inclusion of two LTF states in the model. This paper extends previous work on MRC CFAS data [6] by allowing for informative observation. In addition, the presence of interval-censored data necessitated the use of a new model which  has an extra transition compared with one used previously to handle informative observation [7]. The results presented here agree qualitatively with those reported in the previous analysis. However, we have demonstrated that informative observation is present and therefore should be considered in future analyses of MRC CFAS data. In addition our analysis reveals that participants were more likely to become LTF from the MRC CFAS if they were older or female. By using Weibull hazards for the transition intensities we were able to incorporate covariates in a natural way. However, other distributions could be used. If the model was to be used for prediction purposes, piecewise-constant intensities would be more appropriate [6].
The five-state model introduced here contains two LTF states and a transition between them which can never be observed. This leads to identifiability issues, and forces us to introduce constraints on some parameters in the model. One of the constraints we imposed involves a constant k, which is interpreted as the ratio of the hazard of death divided by the hazard of CI for those who are not LTF to the equivalent quantity for those who are LTF. Because there is no information about k in the data, it must be considered to be fixed in any estimation of the model. We used the most natural choice, k = 1, in our analysis, but also carried out sensitivity analyses using other values of k. We found the estimates of covariate parameters to be fairly robust to different values of k. However, estimated cumulative incidence curves were found to be more sensitive. The dependence of survival curves on k was more severe here than was found by Siannis et al. [7] using exact time transitions.
Our model could be extended in several ways. The assessment of cognitive function is subject to measurement error, and so we could take into account the possibility of misclassification of both healthy and CI states using methods similar to those of van den Hout and Matthews [6]. Here we have concentrated instead on the time to the first assessment of CI and ignored later assessments, whether CI or not. Our approach can, however, be justified because in general participants show improvement in MMSE scores with time through familiarity with the MMSE questionnaire. The first CI assessment may therefore carry more weight, even though later assessments may indicate that the participant in question is healthy. Other possible extensions to our model could be to allow the hazard for the H → LTF transition to depend on the last recorded MMSE score, to allow for non-linear time effects, particularly for the H → CI transition, and to use a mixture model for the CI → CI(LTF) transition.
An alternative approach to the analysis of the MRC CFAS data set would be to use a joint longitudinal model in which the MMSE score was modelled as a continuous variable, and the hazard of death was allowed to depend on the MMSE score to account for the informative censoring of death by LTF due to cognitive decline. We used the multi-state model approach because the CI process itself is of clinical interest, as well as the death process, and we therefore wish to take into account informative LTF when modelling the CI process. In addition, modelling of the MMSE score as a continuous outcome variable is not straightforward (see, for example, Muniz Terrera et al. [9]). Another advantage of the multi-state model is that it accounts for LTF which may be informative for a variety of reasons and for which relevant data may not be available to include as covariates in the model.