Reuse of this article is permitted in accordance with the Terms and Conditions set out at http://wileyonlinelibrary.com/onlineopen#OnlineOpen_Terms
Research Article
A semi-Markov model for stroke with piecewise-constant hazards in the presence of left, right and interval censoring
Article first published online: 18 AUG 2012
DOI: 10.1002/sim.5534
Copyright © 2012 John Wiley & Sons, Ltd.
Additional Information
How to Cite
Kapetanakis, V., Matthews, F. E. and van den Hout, A. (2013), A semi-Markov model for stroke with piecewise-constant hazards in the presence of left, right and interval censoring. Statist. Med., 32: 697–713. doi: 10.1002/sim.5534
Publication History
 Issue published online: 23 JAN 2013
 Article first published online: 18 AUG 2012
 Manuscript Accepted: 27 JUN 2012
 Manuscript Revised: 4 MAY 2012
 Manuscript Received: 26 OCT 2011
Keywords:
 censored data;
 semi-Markov model;
 multistate modelling;
 piecewise-constant hazards;
 EM algorithm;
 stroke
Abstract
This paper presents a parametric method of fitting semi-Markov models with piecewise-constant hazards in the presence of left, right and interval censoring. We investigate transition intensities in a three-state illness–death model with no recovery. We relax the Markov assumption by adjusting the intensity for the transition from state 2 (illness) to state 3 (death) for the time spent in state 2 through a time-varying covariate. This involves the exact time of the transition from state 1 (healthy) to state 2. When the data are subject to left or interval censoring, this time is unknown. In the estimation of the likelihood, we take into account interval censoring by integrating out all possible times for the transition from state 1 to state 2. For left censoring, we use an Expectation–Maximisation inspired algorithm. A simulation study reflects the performance of the method. The proposed combination of statistical procedures provides great flexibility. We illustrate the method in an application by using data on stroke onset for the older population from the UK Medical Research Council Cognitive Function and Ageing Study.
1 Introduction
Stroke is the rapidly developing loss of brain function due to a disorder in the blood supply to the brain. It can cause serious complications that may lead to death. Stroke is the third largest cause of death in the UK and the USA [1, 2]. Nonfatal stroke may cause serious complications including permanent neurological damage and adult disability.
Multistate modelling is a method of analysing longitudinal data when the observed outcome is a categorical variable. In medical research, multistate models are often used to model the development or progression of a disease, where the different levels of the disease can be seen as the states of the model. This approach enables the investigation of ageing in the older population by jointly modelling the rate of having a nonfatal stroke or dying among healthy individuals and the rate of dying after having a nonfatal stroke. Multistate models have been used in a wide range of applications including AIDS [3], liver cirrhosis [4], cognitive impairment [5], coronary heart disease [6], stroke [7] and various types of cancer [8, 9]. Putter et al. [10] have published a concise introduction to multistate modelling.
Norris [11] has discussed the theory of stochastic processes and Markov chains. Fitting multistate models involves various assumptions. A common hypothesis is that the data satisfy the first-order time-homogeneous Markov property. According to this assumption, the transition to the next state depends only on the current state. This means that any previous history of the process can be ignored. Although this assumption simplifies statistical modelling, it may often be inappropriate and lead to incorrect conclusions. A number of extensions to the theory have been proposed including the incorporation of history in the underlying stochastic process. Weiss and Zelen [12] first proposed a semi-Markov model for clinical trials. In semi-Markov models, the transition to the next state depends not only on the current state but also on the time spent in the current state. This involves the exact transition time from one state to the other, which in many applications is unknown. In 1999, Commenges introduced the terminology of a partial Markov model [13]. In partial Markov models, the transition to the next state depends not only on the current state but also on a multivariate explanatory process that can be predicted at the current state. This enables the inclusion of explanatory covariates in multistate modelling. Faddy [14] originally applied a model with piecewise-constant transition intensities, which enables intensities to depend on time-varying covariates. Van den Hout and Matthews [15] have also discussed a piecewise-constant approach for the effect estimation of explanatory variables in multistate modelling.
Longitudinal studies, as opposed to cross-sectional studies, involve repeated observations on the same individuals over time. In such studies, researchers often recruit individuals over a range of ages at which some participants may have already developed and progressed through the different study endpoints. Longitudinal data are usually collected by monitoring individuals at prespecified times over the period of an observational study. Thus, the value of monitored variables is known only at a discrete set of times. The case where the exact value of a variable is unknown and only partial information is available is referred to as censoring [16]. There are three types of censoring, namely left, right and interval censoring. In left and right censoring, the value of a variable is known to lie below and above a certain value, respectively. In interval censoring, the value of a variable is known to lie within an interval with known limits. Methods for handling right-censored data have been discussed in a number of statistical textbooks [17, 18] and are widely implemented in medical research. However, methods for adjusting for left censoring are less frequently employed in longitudinal studies [19]. Ignoring the presence of left censoring when estimating the underlying stochastic process that explains the data observed may cause substantial bias [19]. Cain et al. have shown that including individuals whose data are subject to left censoring (by collecting all necessary information at the time of recruitment) rather than excluding them from the analysis reduces bias significantly [19]. A notion similar to left censoring is that of left truncation. However, left truncation is to be distinguished from left censoring. A left-truncated distribution is one formed from another distribution by cutting off and ignoring the part lying to the left of a fixed variable value [20]. A left-truncated sample is likewise obtained by ignoring all values smaller than a fixed value [20].
Left truncation may occur in longitudinal studies when individuals who have already developed and progressed through the different study endpoints before the beginning of the study are not included in the study. A reason for an individual not to be included in the study is the event of death before the initiation of the study. In 1986, Kay [21] introduced a method that dealt with the problem of right censoring and also handled the case where the time of death is known precisely. Foucher et al. have investigated ways to fit multistate models in the presence of left, right and interval censoring by using a generalised Weibull distribution for the waiting times of the underlying process [22]. Interval censoring has often been dealt with by integration [6]. In 1993, Lindsey and Ryan [8] presented another approach for adjusting for interval censoring based on the Expectation–Maximisation (EM) algorithm.
This paper presents a method to incorporate history in the underlying process in the presence of left truncation and left, right and interval censoring. The proposed model combines properties of semi-Markov models and partial Markov models. We handle interval censoring by integration and adjust for left censoring by using an EM-inspired algorithm [23]. We bypass left truncation by analysing data only over the period of follow-up although, for the adjustment for left censoring, assumptions about the process before baseline need to be made. We illustrate the method in an application by using data from the UK Medical Research Council Cognitive Function and Ageing Study (MRC CFAS). The objective was to investigate ageing in the older population by modelling the transition intensities in a three-state model that comprises the states ‘healthy’ (state 1), ‘history of stroke’ (state 2) and ‘death’ (state 3) and to investigate how time after an individual has a stroke affects the rate of dying. Statistical inference about ageing is feasible only for the older population because the study includes individuals in their 65th year and above. Survival after having a stroke has been discussed in several articles [24, 1]. These articles assist the understanding of the mechanisms and the difficulties that exist in the particular data set that is used in the application and enable the validation of the results of the proposed method.
Section 2 presents the available data of the MRC CFAS. Section 3 presents the statistical model and the method to include time-varying explanatory covariates in the presence of right and interval censoring. We discuss handling left censoring in Section 4. A simulation study in Section 5 shows how assumptions about the process before baseline affect the performance of the method. Section 6 illustrates the method on the MRC CFAS data and investigates model fit graphically. Finally, Section 7 is the discussion.
2 Data
The MRC CFAS is a large scale multicentre longitudinal study conducted in the UK [25]. The study was launched in the late 1980s to explore dementia and cognitive decline by using a representative sample of 13 004 people in the older population. The data have also been used to investigate other disorders such as depression [26] and physical disability [27] and to look at healthy active life expectancy [15]. To date, over 46 000 interviews with participants have been completed. More information on the design of this study is available online (www.cfas.ac.uk).
The objective in this paper was to investigate ageing in the older population by modelling the transition intensities in a three-state model that comprises the states ‘healthy’ (state 1), ‘history of stroke’ (state 2) and ‘death’ (state 3). Figure 1 illustrates the multistate model. Of interest is how time after an individual has a stroke affects the rate of dying.
We analyse a subset of the MRC CFAS data, that is, data of the Newcastle centre only. We denote this data set as the MRC CFAS throughout this work. This subset includes data of 2316 individuals in their 65th year and above, interviewed during the period from 1991 to 2003. These individuals had up to nine interviews where they were asked whether they had had a stroke since they were last seen, and age at interviews was recorded. Exact dates of death are available even after the end of follow-up. At baseline, history of stroke up to that time was investigated, and individual data for age (A), gender (G; 0 for women and 1 for men), years of education (E; 0 for less than 10 years and 1 for 10 years or more) and smoking status at age 60 years (S; 0 for nonsmoker or ex-smoker and 1 for current smoker) were collected. Defining smoking in this way reduces a bias from giving up due to ill health. Smoking habits rarely change after age 60 years. According to the annual report for smoking-related behaviour and attitudes in 2005 [28], smokers over the age of 65 years are the least likely to want to stop smoking, and those who want to give up are more likely to have quit before the age of 65 years.
Both the number of interviews and the time between interviews varied among individuals. Figure 2(a) and (b) show the number of interviews per individual and the distribution of the length of follow-up intervals, respectively. The median length of follow-up intervals was 2 years, and the median number of interviews was 2. Figure 2(c) illustrates the distribution of the time between the last interview and the time of either death or right censoring. Table 1 shows the frequencies of pairs of consecutive states in the data. For each state i and j and over all individuals, these frequencies correspond to the number of times an individual had an observation in state i followed by an observation in state j. Owing to the definitions of the states, there were no transitions from state 2 to state 1.
Table 1. Frequencies of pairs of consecutive states in the data.
From (state i)     To (state j)
                   Healthy  History of stroke  Death  Censored  Total
Healthy            2964     113                1328   710       5115
History of stroke  0        303                223    55        581
Total              2964     416                1551   765       5696
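As an illustration of how such frequencies of consecutive state pairs can be tabulated, the following is a minimal Python sketch. The record layout, the state coding and the toy histories are assumptions made for this example only; they are not the MRC CFAS data or the authors' code.

```python
from collections import Counter

def consecutive_pair_counts(histories):
    """Count (state i, state j) pairs over consecutive observations.

    histories: dict mapping an individual's id to the ordered list of
    observed states (hypothetical coding: 1, 2, 3 and "censored").
    """
    counts = Counter()
    for states in histories.values():
        # Pair each observation with the one that follows it.
        counts.update(zip(states, states[1:]))
    return counts

# Toy records, invented for illustration only.
toy = {
    "id1": [1, 1, 2, 3],        # healthy, healthy, stroke, death
    "id2": [1, "censored"],     # healthy, then right censored
    "id3": [2, 2, "censored"],  # history of stroke at baseline
}
table = consecutive_pair_counts(toy)
```

Each individual contributes one pair per follow-up interval, so an individual with many interviews appears several times in the table.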
In the MRC CFAS longitudinal study, there are a number of potentially observed patterns of follow-up for each individual. For example, if at the beginning of the study an individual is healthy, then he or she can either have a stroke in the coming years and die or be still alive when the study ends, or not have a stroke and either die before the end of the study or be right censored. Likewise, if at the beginning of the study an individual is reported to have had a stroke, then he or she may remain alive or die before the end of the study. We depict these various patterns graphically in Figure 3 and label them as separate patterns A–F. In patterns A, B, E and F, a transition from state 1 to state 2 is known to have happened. For patterns C and D, however, the presence of censoring makes it impossible to know whether such a transition has taken place. Therefore, two scenarios are possible. An individual may have moved to state 2 and never been recorded in this state owing to censoring or may have remained in state 1 until he or she died or the state was right censored.
In the data, 2151 individuals were observed in state 1 at baseline, whereas 165 had a stroke before the beginning of the study. Individuals who had a stroke before the initiation of the study were asked at their first interview to report the time of their first stroke. Self-reported data are often subject to measurement error due to digit preference, that is, the tendency to round outcomes to pleasing digits [29], and should be treated with caution. Moreover, in most longitudinal studies, information about the measured endpoints prior to baseline is rarely available. For this reason, we have developed in this paper a method that does not need this information and have not used self-reported data regarding the time of first stroke. Hence, the way the proposed method handles all types of censoring makes the method applicable to most longitudinal studies.
The median age of individuals at baseline was 74 years. This was imposed by the study design, according to which individuals over their 75th year were oversampled to achieve equal numbers with individuals aged 65–74 years at baseline. By the study design, every individual was followed up approximately every 2 years. The time of death is known exactly. Because the exact time of transition from state 1 to state 2 is unknown, the data are subject to left, right and interval censoring. It is possible that transitions from state 1 to state 2 may have occurred and not have been observed before death or right censoring at the end of follow-up. Transitions from state 1 to state 2 that take place before the beginning of the study are left censored if individuals are enrolled in the study or left truncated if they are not. A reason for an individual not to be enrolled is the event of death before baseline.
To include the individuals who were observed in state 2 at the beginning of the study, the estimation of the exact age of onset of state 2 is necessary. This estimation involves assumptions with regard to the age at which these individuals were healthy in the past. Using data published in ‘Key health statistics from general practice’ reports of the Office for National Statistics [30], we estimated that 90% of individuals who have a stroke before the age of 76 years (the median age at baseline for individuals who had a stroke before the beginning of the study) have the stroke after the age of 40 years. For these individuals, the probability of having the stroke within the age span 35–44 and 45–55 years is 5.06% and 12.6%, respectively. In the estimation of these figures, we ignored possible cohort effects owing to unavailability of data. Nevertheless, we expect the true estimates to be of similar magnitude. Therefore, when modelling stroke for individuals who were observed in state 2 at baseline, a realistic assumption with regard to the age at which these individuals can be assumed to have been healthy in the past is to assume that they were healthy at the age of 40 years.
To overcome the difficulties imposed by the study design and the presence of censoring, we used age, A, as the time scale. Age is also the natural time scale for processes in the older population. We introduce the following notation:
 A_{0}: Age before the beginning of the study at which all individuals are assumed to have been healthy.
 A_{b}: Age at baseline.
 A_{1N}: Age at the last time an individual is observed in state 1.
 A_{20}: Age at the first time an individual is observed in state 2.
 A_{N}: Age at the end of the follow-up.
 W: Age at the time of transition from state 1 to state 2.
3 Method
We model the semi-Markov process via regression equations for transition intensities and fit the three-state model illustrated in Figure 1, adjusting all transition intensities for a number of possible confounders. We introduce history in the underlying stochastic process by fitting a semi-Markov model where the waiting time in state 2 is used as a time-varying covariate.
3.1 The regression model
Let Y(A) ∈ {1,2,3} denote an individual's state at age A, and let q_{ij} be the intensity for the transition from state i to state j, where (i,j) ∈ {(1,2),(1,3),(2,3)}. We model the transition intensities as follows:
 q_{1j}(A) = \exp\{\beta_{1j}^{T} Z(A)\},  (1)
 q_{23}(A \mid W) = \exp\{\beta_{23}^{T} Z(A) + \gamma (A - W)\},  (2)
where j = 2 or 3; Z(A) = (1,A,X_{1},X_{2}, … ,X_{r})^{T} is a vector including age, A, and r explanatory variables, X_{1},X_{2}, … ,X_{r}; β_{ij} is the vector of corresponding regression coefficients for a transition from state i to state j; and W is age at the time of transition from state 1 to state 2. The parameter γ is the regression coefficient corresponding to the time spent in state 2, A − W. Without loss of generality, we assume that the explanatory variables X_{1},X_{2}, … ,X_{r} may vary with age, A. Throughout this work, the vector (1,A,X_{1},X_{2}, … ,X_{r})^{T}, evaluated when an individual is A years old, is denoted by Z(A).
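For simplicity, we consider the restricted model when Z(A) = (1,A)^{T}. The transition intensities q_{1j}(A) for j ∈ {2,3} can be expressed as follows:
 q_{1j}(A) = \exp(\beta_{0.1j} + \beta_{A.1j} A) = \alpha_{1j} \exp(\beta_{A.1j} A),
where α_{1j} = exp(β_{0.1j}). Thus, the baseline intensity function for a transition from state 1 to state j ∈ {2,3} comes from a Gompertz distribution with age as the time scale. Similarly, transition intensity q_{23}(A | W) can be expressed as follows:
 q_{23}(A \mid W) = \exp\{\beta_{0.23} + \beta_{A.23} A + \gamma (A - W)\}.
If we let T_{2} = A − W, then transition intensity q_{23} takes the following form:
 q_{23}(T_{2} \mid W) = \exp\{\beta_{0.23} + \beta_{A.23} W + (\beta_{A.23} + \gamma) T_{2}\}.
When an individual moves to state 2, W remains constant thereafter, and q_{23} can be expressed as follows:
 q_{23}(T_{2}) = \alpha_{23} \exp\{(\beta_{A.23} + \gamma) T_{2}\},  (3)
where α_{23} = exp(β_{0.23} + β_{A.23}W) and β_{A.23} + γ is the coefficient of T_{2}. This equation shows that the baseline intensity function for a transition from state 2 to state 3 comes from a Gompertz distribution with time spent in state 2 as the time scale.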
The Gompertz distribution is suitable for modelling data with monotonic hazard rates that either increase or decrease exponentially with time. In Equation (3), when β_{A.23} + γ > 0, the hazard function increases with time, and survival tends to 0 as T_{2} tends to infinity. When β_{A.23} + γ is zero, the hazard function is equal to α_{23} for all T_{2}, so the model reduces to an exponential. A negative β_{A.23} + γ would imply that the hazard function decreases with time. However, when α_{23} ≤ 0 or β_{A.23} + γ < 0, the cumulative distribution function is improper. In particular, when α_{23} > 0 and β_{A.23} + γ < 0, survival tends to exp{α_{23}/(β_{A.23} + γ)} as T_{2} goes to infinity. This implies that a proportion exp{α_{23}/(β_{A.23} + γ)} of individuals never move to state 3. Similarly, in Equation (2), when − β_{A.23} < γ < 0, an individual's mortality after having a stroke increases with age, but this increase becomes smaller with time after the stroke. However, when γ < − β_{A.23} < 0 (i.e. β_{A.23} + γ < 0), the model implies that there is a nonzero probability of never failing (living forever). That is, there is always a nonzero hazard rate, yet it decreases exponentially.
Some authors of recent survival analysis textbooks would restrict β_{A.23} + γ to be strictly positive so that the survivor function always goes to zero as T_{2} tends to infinity [31]. Although this may be a desirable mathematical property, the more traditional approach, which standard statistical software also takes, is not to impose this restriction [32]. The main reason for this is that, in survival studies, individuals are not monitored forever. There is a date when the study ends, and in many applications in medical research, an exponentially decreasing hazard rate is clinically appealing [32].
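To make the behaviour of the Gompertz form concrete, the following is a minimal Python sketch of the hazard for the transition from state 2 to state 3, the corresponding survivor function, and its limit as time in state 2 grows. The function names and the intercept value of −6.0 are hypothetical and chosen for illustration; the values 0.052 for the age coefficient and −0.11 for γ are those quoted for the simulation study in Section 5.

```python
import math

def q23(t2, beta0, beta_age, gamma, w):
    """Hazard of death at time t2 since stroke for stroke onset at age w."""
    alpha = math.exp(beta0 + beta_age * w)
    return alpha * math.exp((beta_age + gamma) * t2)

def survival_state2(t2, beta0, beta_age, gamma, w):
    """Gompertz survivor function S(t2) = exp(-(alpha/b) * (exp(b*t2) - 1))."""
    alpha = math.exp(beta0 + beta_age * w)
    b = beta_age + gamma
    if b == 0.0:
        # Constant hazard: the model reduces to an exponential.
        return math.exp(-alpha * t2)
    return math.exp(-(alpha / b) * math.expm1(b * t2))

def limiting_survival(beta0, beta_age, gamma, w):
    """Limit of S(t2) as t2 -> infinity; positive only when b < 0."""
    alpha = math.exp(beta0 + beta_age * w)
    b = beta_age + gamma
    return math.exp(alpha / b) if b < 0 else 0.0

# Hypothetical intercept; age coefficient and gamma as in Section 5.
lim = limiting_survival(-6.0, 0.052, -0.11, 65.0)
```

With these illustrative values the limit is strictly between 0 and 1, so the implied distribution of time to death after stroke is improper, as discussed above.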
3.2 A piecewise-constant hazards model
The transition intensities are functions that vary with age, and this should be taken into account in the estimation of the likelihood contributions. In particular, all the transition intensities should be adjusted for age, A, and explanatory variables, X_{1},X_{2}, … ,X_{r}, which may vary with age. Furthermore, the transition intensity q_{23} should be adjusted for the time spent in state 2, A − W, in a time-varying way. Let [A_{L},A_{U}] be any age interval with lower and upper age limits A_{L} and A_{U}, respectively. A piecewise-constant approach for the transition intensities involves splitting [A_{L},A_{U}] into small pieces, within which the transition intensities are assumed to be constant. More precisely, a resolution, h, is specified; and starting from A_{L}, interval [A_{L},A_{U}] is split into as many subintervals of length h as possible. Without loss of generality, we assume that [A_{L},A_{U}] can be split into exactly K such subintervals. In every subinterval k, k = 1, … ,K, the transition intensities are evaluated at the left subinterval limit. Figure 4 illustrates the piecewise-constant approach for a transition from state 2 to state 3. In this case, A_{L} = W and A_{U} = A_{N}. As an example, at subinterval 2, we evaluate q_{23} at age W + h. We assume the time spent in state 2 throughout subinterval 2 to be (W + h) − W = h.
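The splitting described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the model is restricted to age and time in state 2 as covariates, and the coefficient values used in the example call are not estimates from the paper.

```python
import math

def left_limits(a_lower, a_upper, h):
    """Left limits of the K subintervals of length h that fit in [a_lower, a_upper]."""
    k_total = int(math.floor((a_upper - a_lower) / h + 1e-12))
    return [a_lower + i * h for i in range(k_total)]

def piecewise_q23(w, a_n, h, beta0, beta_age, gamma):
    """q23 held constant on each subinterval of [w, a_n], evaluated at its left limit."""
    return [
        math.exp(beta0 + beta_age * a + gamma * (a - w))
        for a in left_limits(w, a_n, h)
    ]

# Stroke at age 70, follow-up to age 72, resolution h = 0.5 years.
limits = left_limits(70.0, 72.0, 0.5)
rates = piecewise_q23(70.0, 72.0, 0.5, -6.0, 0.052, -0.11)
```

In this example the second element of `rates` is the intensity evaluated at age W + h, with time in state 2 equal to h, which matches the treatment of subinterval 2 described above.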
3.3 Likelihood contributions
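To fit the model defined by Equations (1) and (2), age at the exact time of transition from state 1 to state 2, W, is needed. However, W is unknown for all patterns. For data that follow patterns A–D, this problem is solved by integrating out all possible values of W in the calculation of the likelihood contribution of every individual. Equations (4) and (5) give the individuals' contribution to the likelihood for patterns A and D, respectively. These likelihood contributions are conditional on individuals' state at baseline, Y(A_{b}).
 L_{A} = \int_{A_{1N}}^{A_{20}} P_{11}(A_{b}, W \mid \lambda_{1}) \, q_{12}(W) \, P_{22}(W, A_{N} \mid \lambda_{2}) \, q_{23}(A_{N} \mid W) \, dW,  (4)
 L_{D} = \int_{A_{1N}}^{A_{N}} P_{11}(A_{b}, W \mid \lambda_{1}) \, q_{12}(W) \, P_{22}(W, A_{N} \mid \lambda_{2}) \, dW + P_{11}(A_{b}, A_{N} \mid \lambda_{1}),  (5)
where P_{ii}(A_{L},A_{U} | λ_{i}) is the probability that an individual remains in state i throughout age interval [A_{L},A_{U}] when the escape rate from state i is λ_{i} for i ∈ {1,2}. Under the assumption of piecewise-constant transition intensities, it follows that λ_{i} is piecewise constant over the same subintervals as the transition intensities and that the probability of remaining in state i at a specific subinterval k follows an exponential distribution with parameter λ_{i.k}. Hence, for every age interval [A_{L},A_{U}], it follows that
 P_{11}(A_{L}, A_{U} \mid \lambda_{1}) = \prod_{k=1}^{K} \exp(-\lambda_{1.k} h),  (6)
 P_{22}(A_{L}, A_{U} \mid \lambda_{2}) = \prod_{k=1}^{K} \exp(-\lambda_{2.k} h),  (7)
where λ_{1.k} = q_{12}(A_{L} + (k − 1)h) + q_{13}(A_{L} + (k − 1)h) and λ_{2.k} = q_{23}(A_{L} + (k − 1)h | W) are the escape rates from states 1 and 2, evaluated at the left limit of subinterval k.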
The rationale behind Equations (4) and (5) becomes clear by looking at Figure 3. In pattern A, every individual is healthy at the beginning of the study, moves to state 2 at some point and then dies. The time of the transition to state 2 is unknown. Under the assumption that this transition takes place when the individual is W years old, the likelihood contribution of this individual is given by the integrand of the integral in Equation (4). That is the probability that the individual remains in state 1 for time W − A_{b} following a distribution with piecewiseconstant rate q_{12} + q_{13}, then moves to state 2 with an instant transition rate q_{12}, then remains in state 2 for time A_{N} − W following a distribution with piecewiseconstant rate q_{23}, and finally moves to state 3. However, the exact value of W is unknown. We obtain the likelihood contribution by integrating out all possible values for W in (A_{1N},A_{20}).
In pattern D, two different things can happen, and the likelihood contribution of each individual consists of two terms. The first term corresponds to the case where a transition to state 2 does take place. In this case, the likelihood contribution is given by a similar integral as in pattern A. The difference is the right censoring at the end of the follow-up. In pattern D, we include the probability of remaining in state 2 for at least time A_{N} − W, but there is no multiplication with the rate of transition to state 3 at the end of the follow-up. The second term in Equation (5) corresponds to the case where a transition to state 2 does not happen. The likelihood contribution is then given by the probability of remaining in state 1 throughout interval [A_{b},A_{N}].
For data patterns B and C, we calculate the likelihood contributions in a similar way. For patterns E and F, where the transition from state 1 to state 2 takes place before baseline, we estimate the time of onset of state 2 as described in Section 4. Once this estimate is obtained, the model for these patterns is fully specified. Hence, for individuals in patterns E and F, the contribution to the likelihood is the probability of remaining in state 2 throughout [A_{b},A_{N}], given by Equation (7), multiplied by q_{23}(A_{N} | W) when death is observed at A_{N} and left unmultiplied when the individual is right censored at A_{N}. These likelihood contributions are conditional on individuals' state at baseline, Y(A_{b}), and on W. We calculate the full likelihood by multiplying all the individual likelihood contributions.
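The verbal description of the contributions for patterns A and D can be turned into a small numerical sketch. The Python code below is an illustration under stated assumptions, not the authors' implementation: the integral over W is approximated with a midpoint rule, intervals whose length is not a multiple of h end with a shorter final piece, the integration range for pattern D is taken to run from the last observation in state 1 to the end of follow-up, and all coefficient values and ages are hypothetical.

```python
import math

def survive(rate, a_lower, a_upper, h):
    """P(no exit over [a_lower, a_upper]) with the rate frozen at the
    left limit of each piece of length h (last piece may be shorter)."""
    total, a = 0.0, a_lower
    while a < a_upper - 1e-12:
        step = min(h, a_upper - a)
        total += rate(a) * step
        a += step
    return math.exp(-total)

# Hypothetical coefficients (intercept, age slope) and gamma.
B12, B13, B23, GAMMA = (-9.0, 0.06), (-8.0, 0.08), (-6.0, 0.052), -0.11

def q12(a):
    return math.exp(B12[0] + B12[1] * a)

def q13(a):
    return math.exp(B13[0] + B13[1] * a)

def q23(a, w):
    return math.exp(B23[0] + B23[1] * a + GAMMA * (a - w))

def integrate_over_w(a_b, lo, hi, a_n, death_at_end, h=0.25, n_grid=200):
    """Midpoint rule over possible transition ages W in (lo, hi)."""
    width = (hi - lo) / n_grid
    total = 0.0
    for m in range(n_grid):
        w = lo + (m + 0.5) * width
        stay1 = survive(lambda a: q12(a) + q13(a), a_b, w, h)
        stay2 = survive(lambda a: q23(a, w), w, a_n, h)
        term = stay1 * q12(w) * stay2
        if death_at_end:
            term *= q23(a_n, w)
        total += term * width
    return total

def lik_pattern_a(a_b, a_1n, a_20, a_n):
    """Healthy at baseline, stroke between a_1n and a_20, death at a_n."""
    return integrate_over_w(a_b, a_1n, a_20, a_n, death_at_end=True)

def lik_pattern_d(a_b, a_1n, a_n):
    """Healthy when last seen at a_1n, right censored at a_n."""
    stay_healthy = survive(lambda a: q12(a) + q13(a), a_b, a_n, 0.25)
    return integrate_over_w(a_b, a_1n, a_n, a_n, death_at_end=False) + stay_healthy

lik_a = lik_pattern_a(a_b=70.0, a_1n=72.0, a_20=74.0, a_n=77.5)
lik_d = lik_pattern_d(a_b=70.0, a_1n=74.0, a_n=75.0)
```

A finer grid or an adaptive quadrature routine could replace the midpoint rule; the choice here only keeps the sketch free of external dependencies.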
4 Handling left censoring
When the data are subject to left censoring (patterns E and F), there is no left age limit at which the individual is known to have been healthy. However, we assume that there is some age, A_{0}, at which all individuals in the data were healthy. Figure 3 illustrates the assumption. Let θ be the parameter vector of the regression model defined by Equations (1) and (2). We use a method inspired by the EM algorithm [23] to handle left censoring, as follows:

We use parameter θ^{0} to find the expected age of transition from state 1 to state 2, W, for individuals in patterns E and F. An estimate of the time spent in state 2 for individuals in patterns E and F at baseline is A_{b} − W.

We repeat steps 2 and 3, creating a sequence of parameter vectors {θ^{0},θ^{1}, … }, until convergence of both the parameter vector and the maximum of the likelihood.

After obtaining the converged maximum likelihood parameters of θ, we find confidence intervals for the parameters by using the nonparametric bootstrap method. We obtain each bootstrap sample by weighting all sampling units of the available data with equal weight, and by randomly taking a sample of equal size from the available data, with replacement. In longitudinal studies where an individual may have multiple records, every individual is one sampling unit. The resampling scheme is stratified by the following strata: (i) patterns A–D and (ii) patterns E and F. Stratification preserves the same proportion of left-truncated and left-censored data in all bootstrap samples and eliminates a potential source of bias. We estimate the variance of the mean by using the studentised bootstrap statistic and the nonparametric delta method, as described by Davison and Hinkley [33].
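The stratified resampling of individuals can be sketched as follows in Python. This is a minimal illustration, assuming individuals are identified by integer ids and grouped by stratum; the function name and the id layout are hypothetical. The stratum sizes 2151 and 165 are the numbers of individuals observed in state 1 and state 2 at baseline, as reported in Section 2. Refitting the model to each sample and the studentised interval construction are not shown.

```python
import random

def stratified_bootstrap(ids_by_stratum, rng):
    """Draw one bootstrap sample of individuals, with replacement,
    keeping the size of every stratum fixed."""
    sample = []
    for ids in ids_by_stratum.values():
        sample.extend(rng.choice(ids) for _ in range(len(ids)))
    return sample

# Hypothetical ids: 2151 individuals in patterns A-D, 165 in patterns E and F.
strata = {
    "A-D": list(range(0, 2151)),
    "E-F": list(range(2151, 2316)),
}
rng = random.Random(2012)
boot_ids = stratified_bootstrap(strata, rng)
```

Because the sampling unit is the individual, all records of a selected individual would enter the bootstrap data set together, once for each time the id is drawn.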
We compute the expected age at the time of transition from state 1 to state 2 in step 2 by splitting interval [A_{0},A_{b}] following the piecewise-constant approach described in Section 3.2 and by finding the probability of transition in each subinterval conditioning on the fact that a transition from state 1 to state 2 takes place within [A_{0},A_{b}]. Without loss of generality, let us assume that [A_{0},A_{b}] can be split into exactly K such subintervals. For each k = 1, … ,K, let P_{k} be the probability that an individual moves to state 2 within subinterval k, given that he or she is healthy at age A_{0} and is found in state 2 at baseline, that is, P_{k} = P(transition within subinterval k | Y(A_{0}) = 1, Y(A_{b}) = 2), where Y(A) is the underlying multistate process of an individual at the time point where he or she is A years old. Using Bayes' theorem, we can obtain P_{k} as
 (8)
where at subinterval k  Y (A_{b}) = 2), which is given by the following formulas:
 (9)
where p_{12.k}(h) is the probability of transition from state 1 to state 2 within time h at subinterval k and p_{ii.k}(h) is the probability of remaining in state i ∈ {1,2}for time h at subinterval k, for k = 1, … ,K. The probabilities P_{k} in Equation (8) simplify the problem of finding the expected age at the time of transition from state 1 to state 2 within [A_{0},A_{b}] by reducing it to the calculation of the mean of a multinomial distribution with K possible outcomes with corresponding probabilities P_{1}, … ,P_{K}. The expected subinterval of the transition from state 1 to state 2 is therefore subinterval k*, where
 (10)
and ⌊ ⋅ ⌋ is the floor function, which maps a real number to the largest integer not exceeding it. The expected time of transition within subinterval k*, given that a transition has taken place within subinterval k*, is
 (11)
where λ_{1.k*} is the escape rate from state 1, evaluated at the beginning of subinterval k*. Hence, the age at which an individual in pattern E or F is expected to move from state 1 to state 2 is
 (12)
where k* and t are given by Equations (10) and (11), respectively.
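The computation of the subinterval probabilities and of an expected age of onset can be sketched in Python as follows. This is a simplified illustration under several assumptions that are not taken from the paper: the probability of moving from state 1 to state 2 within a subinterval is approximated by the share q_{12}/(q_{12} + q_{13}) of the exit probability; for a transition in subinterval k, the onset age used in q_{23} is the left limit of that subinterval; and the expected age is taken as a probability-weighted average over subintervals rather than through the floor-based choice of a single subinterval k* used above. The coefficient values are hypothetical.

```python
import math

# Hypothetical coefficients (intercept, age slope) and gamma.
B12, B13, B23, GAMMA = (-9.0, 0.06), (-8.0, 0.08), (-6.0, 0.052), -0.11

def q12(a):
    return math.exp(B12[0] + B12[1] * a)

def q13(a):
    return math.exp(B13[0] + B13[1] * a)

def q23(a, w):
    return math.exp(B23[0] + B23[1] * a + GAMMA * (a - w))

def truncated_exp_mean(lam, h):
    """Mean of an exponential(lam) waiting time given that it is below h."""
    return 1.0 / lam - h / math.expm1(lam * h)

def expected_onset_age(a0, a_b, h):
    """Posterior subinterval probabilities and a weighted expected onset age."""
    k_total = int(math.floor((a_b - a0) / h + 1e-12))
    left = [a0 + i * h for i in range(k_total)]
    lam1 = [q12(a) + q13(a) for a in left]
    p11 = [math.exp(-lam * h) for lam in lam1]
    p12 = [q12(a) / lam * (1.0 - p) for a, lam, p in zip(left, lam1, p11)]
    joint = []
    for k in range(k_total):
        stay1 = math.prod(p11[:k])            # healthy up to subinterval k
        w = left[k]                           # assumed onset age for q23
        stay2 = math.prod(math.exp(-q23(a, w) * h) for a in left[k + 1:])
        joint.append(stay1 * p12[k] * stay2)  # alive in state 2 at baseline
    norm = sum(joint)
    post = [j / norm for j in joint]
    age = sum(p * (a + truncated_exp_mean(lam, h))
              for p, a, lam in zip(post, left, lam1))
    return post, age

post, age = expected_onset_age(a0=40.0, a_b=76.0, h=1.0)
```

In an EM-type iteration, an estimate such as `age` would be recomputed for each individual in patterns E and F from the current parameter vector before the likelihood is maximised again.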
5 Simulation study
Assuming that A_{0} = 40 for the 165 individuals in patterns E and F in the MRC CFAS data implies that for these individuals, A_{b} − A_{0} is 36 years on average. This period is three times longer than the time of study follow-up, A_{N} − A_{b}, which has a maximum of 12 years. To examine whether this assumption influences the performance of the proposed method, we carried out a simulation study with two investigated scenarios (Scenarios I and II). The differences between the two simulation scenarios were the age at which everyone was assumed to have been healthy before baseline, A_{0}, and the distribution of age at baseline. In Scenario I, we assumed A_{0} to be 60 years and we simulated age at baseline from a normal distribution with mean 65 years and standard deviation 2 years, left truncated at the age of 64 years. We chose this combination so as to reflect moderate left censoring, and this implied that the range of A_{b} − A_{0} was 4 to 12 years with an average of 6 years. In Scenario II, we assumed A_{0} to be 40 years and we simulated age at baseline from a normal distribution with mean 75 years and standard deviation 6.5 years, left truncated at the age of 64 years. This distribution had the same mean and range as the observed age at baseline in the MRC CFAS data, although age at baseline in the MRC CFAS was bimodal by design.
In both simulation scenarios, we simulated data according to Equations (1) and (2), adjusting all transition intensities for age, A. We additionally adjusted the transition intensity q_{23} for the time spent in state 2. We included no other covariates in the simulation study. Although no further time-varying covariates were included in the simulation study, such covariates can be included after postulating a model for their trajectories with age.
To select appropriate values for the model coefficients in the simulation study, we chose the values of β_{0.12}, β_{0.13}, β_{0.23}, β_{A.12}, β_{A.13} and β_{A.23} after fitting a three-state multistate model to the MRC CFAS data with age as the only covariate. We fitted the multistate model by using the msm package [34] in R following a piecewise-constant approach, which allowed transition intensities to change after every observation point. We investigated several values for γ. When simulating individuals' trajectories, positive values of γ made individuals move to state 3 very quickly after a simulated transition to state 2. We avoided this by choosing γ to be negative. Table 2 shows the selected values of model coefficients for both simulation scenarios. As discussed in Section 3.1, choosing γ to be −0.11 and β_{A.23} to be 0.052 implies that a maximum of 1 − exp{exp(β_{0.23} + β_{A.23}W)/(β_{A.23} + γ)} of individuals who have a stroke at age W die after having a stroke. Thus, a maximum of 74.3% of individuals with a simulated stroke at the age of 65 years will eventually move to state 3. This makes the survival function for transitions from state 2 to state 3 improper. Although the selected value of γ implies an unrealistic long-term survival, this choice facilitates the simulation study because it leads to simulated trajectories with non-negligible time spent in state 2. The choice of γ should not affect the findings of the simulation study.
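The 74.3% figure can be reproduced under the assumption that q_{23} has the log-linear form exp(β_{0.23} + β_{A.23}A + γ(A − W)) in continuous time, where W is the age at stroke. Equations (1) and (2) are not repeated in this section, so this form is an assumption of the sketch and the function name is hypothetical. Because β_{A.23} + γ is negative, the cumulative hazard from age W onwards converges to a finite limit, and so does the probability of ever reaching state 3.

```python
import math


def limiting_death_probability(onset_age, b0=-5.920, bA=0.052, gamma=-0.110):
    """Probability of ever moving from state 2 to state 3 for a stroke at
    onset_age, assuming q23(A) = exp(b0 + bA * A + gamma * (A - onset_age))."""
    slope = bA + gamma
    if slope >= 0:
        # The hazard does not decay, so death after stroke is certain.
        return 1.0
    # Integral of exp(b0 + bA * W + slope * t) over t from 0 to infinity.
    cumulative_hazard = math.exp(b0 + bA * onset_age) / (-slope)
    return 1.0 - math.exp(-cumulative_hazard)


p65 = limiting_death_probability(65.0)
print(round(100.0 * p65, 1))
```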
Regression coefficient  True value  Scenario I: mean estimate  Scenario I: percentage bias (%)  Scenario I: RMSE  Scenario II: mean estimate  Scenario II: percentage bias (%)  Scenario II: RMSE
β_{0.12}  −8.780  −8.962  2.07  1.818  −8.631  1.70  1.057
β_{0.13}  −10.310  −10.365  0.53  1.020  −10.240  0.68  0.638
β_{0.23}  −5.920  −5.888  0.53  2.212  −5.522  6.73  1.033
β_{A.12}  0.065  0.067  2.87  0.025  0.062  4.03  0.014
β_{A.13}  0.093  0.094  0.87  0.014  0.092  0.83  0.008
β_{A.23}  0.052  0.050  3.10  0.031  0.046  11.98  0.013
γ  −0.110  −0.102  7.71  0.034  −0.087  21.14  0.026
We simulated age at baseline, A_{b}, for all individuals. All simulated data sets included 1500 individuals at baseline. To obtain this sample size, we simulated trajectories for 2500 individuals starting from age A_{0}. We excluded from the analysis individuals who moved to state 3 before the simulated age at baseline, treating their data as left truncated. In all simulations, more than 1500 individuals were alive at age A_{b}; the analyses nevertheless included exactly 1500 individuals so that all simulated data sets had the same sample size. When simulating the trajectories of individuals, we chose the resolution of the piecewise-constant approach so that the transition intensities changed with age every 0.25 year (3 months). For both simulation scenarios, the length of follow-up was 12 years, and the length of follow-up intervals was 2 years.
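A trajectory with piecewise-constant intensities on a 0.25-year age grid could be simulated along the following lines. This is a minimal sketch rather than the simulation code used in the study: it assumes log-linear intensities with the true values of Table 2, evaluates each intensity (including the time spent in state 2) at the start of the current subinterval, and uses hypothetical function names throughout.

```python
import math
import random

# "True" coefficients from Table 2: (intercept, age slope) per transition.
BETA = {"12": (-8.780, 0.065), "13": (-10.310, 0.093), "23": (-5.920, 0.052)}
GAMMA = -0.110   # effect of time spent in state 2 on q23
STEP = 0.25      # years; intensities are held constant within each step


def intensity(transition, age, time_in_state2=0.0):
    """Assumed log-linear intensity, evaluated at the start of a subinterval."""
    intercept, age_slope = BETA[transition]
    log_q = intercept + age_slope * age
    if transition == "23":
        log_q += GAMMA * time_in_state2
    return math.exp(log_q)


def simulate_trajectory(start_age, max_age, rng):
    """Simulate one individual who is in state 1 at start_age.

    Returns (onset_age, death_age); either may be None if the event
    did not happen before max_age.
    """
    age, state, onset, death = start_age, 1, None, None
    k = 0  # index of the current subinterval of the age grid
    while state != 3:
        left = start_age + k * STEP
        right = left + STEP
        if left >= max_age:
            break
        if state == 1:
            q12 = intensity("12", left)
            rate = q12 + intensity("13", left)
        else:
            rate = intensity("23", left, max(left - onset, 0.0))
        wait = rng.expovariate(rate)
        if age + wait >= right:
            # No event in this subinterval: restart at the next grid point.
            age, k = right, k + 1
            continue
        age += wait
        if state == 1 and rng.random() < q12 / rate:
            state, onset = 2, age   # stroke
        else:
            state, death = 3, age   # death (from state 1 or state 2)
    return onset, death


rng = random.Random(2012)
trajectories = [simulate_trajectory(60.0, 110.0, rng) for _ in range(200)]
```

Restarting the exponential waiting time at each grid point is valid because the exponential distribution is memoryless, so the simulated times follow the piecewise-constant hazard exactly.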
In both simulation scenarios, we fitted the model defined by Equations (1) and (2) to 1000 simulated data sets. When fitting the model, we evaluated the integrals in the likelihood by using the composite Simpson's rule for numerical integration [35], with resolution equal to 0.05 year (18 days). The average percentage of individuals with left-truncated data was considerably greater in Scenario II (36.8%) than in Scenario I (7.8%) because individuals' simulated entry to the study was at a more advanced age. Moreover, the percentage of individuals observed in state 2 at baseline was 20.3% and 4.9% for Scenarios II and I, respectively. Table 2 shows the results of Scenarios I and II. Mean estimates in both scenarios converged within 250 simulations. In Scenario I, the estimates of all regression coefficients were very close to their true values. Percentage bias remained below 5% for all coefficients except the one corresponding to the time spent in state 2 (percentage bias for γ = 7.71%). Nevertheless, the absolute bias for γ was very small (bias = 0.008). In Scenario II, the estimated coefficients for intensities q_{12} and q_{13} were close to their true values. However, the coefficients for q_{23} were estimated with more than 6% bias; in particular, the percentage bias for γ was more than 21%. The presence of extensive left censoring and left truncation had a greater impact on the coefficients that model the rate of escape from state 2. There are two reasons for this. The first is that left censoring is handled by the EM-inspired algorithm. When A_{b} − A_{0} is large, the difference between the expected age of onset of state 2 in the expectation step of the EM-inspired algorithm and the true age of onset of state 2 within the interval [A_{0},A_{b}] may be large. When A_{b} − A_{0} is small, the estimation of the true age of onset of state 2 becomes more accurate.
The second reason is that the likelihood contributions of individuals in patterns E and F include only q_{23} and not q_{12} or q_{13}. Because left censoring appears only in patterns E and F, it affects mostly intensity q_{23}. Intensities q_{12} and q_{13} are affected by left censoring to a lesser extent.
The root mean squared error (RMSE) of the regression coefficients, although similar, was consistently smaller in Scenario II than in Scenario I. Although this is counterintuitive, it can be explained. For the chosen true values of the regression coefficients in these simulations, the percentage of individuals observed in state 2 at baseline was more than four times greater in Scenario II (20.3%) than in Scenario I (4.9%). When the percentage of individuals in patterns E and F is larger, there is more information for the estimation of q_{23}. Because the transition intensities are jointly modelled, this affects the precision of all parameter estimates. This is why all RMSE estimates were smaller in Scenario II than in Scenario I. However, Scenarios I and II are not directly comparable because both A_{0} and the mean age at baseline were chosen differently. Scenario I shows that the method works well in the presence of moderate left censoring and left truncation, where A_{b} − A_{0} is small as compared with A_{N} − A_{0}. In Scenario II, A_{0} was chosen to be smaller and the mean age at baseline larger than in Scenario I, to investigate the performance of the method in the case of severe left censoring and truncation. Smaller simulation studies (not presented in this paper), in which the distribution of A_{b} was fixed and only A_{0} was allowed to vary, confirmed that the proposed method works well when A_{b} − A_{0} is reduced, independently of whether this is achieved by increasing A_{0}, reducing A_{b} or both.
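The two summary measures reported in Table 2 could be computed from the per-data-set estimates as sketched below. The function names are hypothetical, and the percentage bias is taken in absolute value, which is an assumption based on the fact that Table 2 reports only positive values.

```python
import math


def percentage_bias(estimates, true_value):
    """Absolute difference between the mean estimate and the true value,
    expressed as a percentage of the true value."""
    mean_estimate = sum(estimates) / len(estimates)
    return 100.0 * abs(mean_estimate - true_value) / abs(true_value)


def rmse(estimates, true_value):
    """Square root of the mean squared error across simulated data sets."""
    squared_errors = [(e - true_value) ** 2 for e in estimates]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Applying percentage_bias to the rounded mean estimate of the intercept
# for q12 in Scenario I gives a value close to the reported 2.07%.
bias_b012 = percentage_bias([-8.962], -8.780)
```

Values recomputed from the rounded means in Table 2 need not match the reported percentages exactly, because the reported figures were presumably based on unrounded estimates.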
Apart from the value of A_{0} and the distribution of A_{b}, the results of the simulation scenarios also depend on the values of the regression coefficients that were used to simulate the data. These coefficients affect the extent of left censoring and left truncation. Although the distribution of A_{b} had the same mean and range as the observed age at baseline in the MRC CFAS data, the percentage of individuals observed in state 2 at baseline was 20.3% in Scenario II and 7.1% in the MRC CFAS. Of those who had had a stroke and reported their history in the MRC CFAS, 98% experienced their first stroke after the age of 40 years. Moreover, 59% and 79% of these individuals had had their first stroke within 5 and 10 years of the baseline interview, respectively. By contrast, in Scenario II, the corresponding figures were 29% and 48%, showing that the data in the MRC CFAS are subject to less severe left censoring than the data simulated in Scenario II. This, and the fact that the percentage of left-censored data in the MRC CFAS is comparable with that of Scenario I (7.1% and 4.9%, respectively), indicates that the method is applicable to the MRC CFAS data. To investigate the effect of the length of follow-up intervals, the length of follow-up and the sample size, we carried out additional simulations based on a range of different scenarios. The results (not presented) confirmed that bias is reduced by more frequent interviews and longer follow-up and showed that the proposed method performs equally well with 500, 1000 and 1500 individuals.
6 Application: The Medical Research Council Cognitive Function and Ageing Study data
We demonstrate the method by using data from the MRC CFAS, adjusting all transition intensities for age (A), gender (G), years of education (E) and smoking status at the age of 60 years (S). We additionally adjust the intensity for the transition from state 2 to state 3 for the time spent in state 2. More specifically, we fit the model defined by Equations (1) and (2) for Z(A) = (1,A,G,E,S)^{T}. The time-varying covariates in this model are age (A) and the time spent in state 2 (A − W), where W is age at the time of transition to state 2. We assume individuals who had a stroke before the beginning of the study to have been healthy at the age of 40 years (A_{0} = 40).
We undertake all the calculations for the likelihood contributions in C++ and perform the maximisation of the likelihood in R by using optim with the BFGS quasi-Newton optimising method [36–40]. We set the resolution for the piecewise-constant approach to 0.25 year (3 months). We evaluate the integrals in the likelihood by using the composite Simpson's rule of numerical integration [35], with resolution equal to 0.05 year (18 days). We base the confidence intervals for the parameters on 450 bootstrap samples obtained using the studentised nonparametric bootstrap, as described by Davison and Hinkley [33]. This involves resampling sampling units (i.e. individuals) from the data, stratifying by the following strata: (i) patterns A–D and (ii) patterns E and F. This stratification guarantees the same proportion of missing information with regard to the age at which individuals in patterns E and F had a stroke and guarantees the same amount of left truncation in all bootstrap samples. Failing to control for censoring, truncation or both may introduce bias in the estimated uncertainty of the parameters. To improve the accuracy of the studentised bootstrap method, the variance of the parameter estimates is stabilised by transforming the parameter scale by using the Box–Cox transformation [41].
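The stratified resampling step of the bootstrap can be sketched as follows. This illustration covers only the resampling of individuals within the two strata; the studentisation and the Box–Cox transformation are not shown, and the record layout (a dictionary with a "pattern" field) and the function names are hypothetical.

```python
import random


def stratified_bootstrap(individuals, stratum_of, rng):
    """Resample individuals with replacement within each stratum, so that
    every bootstrap sample keeps the original stratum sizes."""
    strata = {}
    for person in individuals:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(rng.choices(members, k=len(members)))
    return sample


def pattern_stratum(person):
    """Stratum (ii) holds patterns E and F; stratum (i) holds patterns A-D."""
    return "EF" if person["pattern"] in ("E", "F") else "AD"


# Toy data with the stratum sizes of the application:
# 165 individuals in patterns E and F out of 2316 in total.
people = [{"id": i, "pattern": "E" if i < 165 else "A"} for i in range(2316)]
rng = random.Random(0)
bootstrap_sample = stratified_bootstrap(people, pattern_stratum, rng)
```

Each call produces one bootstrap data set; repeating it 450 times would give the number of samples used for the confidence intervals.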
Table 3 shows the results of the model. The bootstrap confidence intervals indicate that the time spent in state 2, the history variable, is not statistically significant (the 95% confidence interval for γ includes 0). Although γ has a negative sign, which is the opposite of what would be expected according to the literature [1, 24] (survival has been shown to be adversely affected by the age at stroke, and the risk of dying has been reported to be increased in the first several months after stroke [42, 43]), its absolute value is very small, and the confidence interval is relatively wide around zero. The positive coefficients for age reflect that, having adjusted for gender, smoking, years of education and time spent in state 2, older people have increased mortality and hazard of having a stroke. The positive coefficients for gender reflect that men (as compared with women) have increased mortality at all ages and hazard for a stroke independently of their years of education or smoking status. These findings are consistent with those obtained by others [44, 45]. The positive coefficients for smoking show that smokers have increased mortality, as expected. Finally, higher education decreases the risk of dying among healthy individuals.
Covariate  Regression coefficient  Estimate  95% CI
Model intercepts  β_{0.12}  −8.184  (−12.028, −5.245)
Model intercepts  β_{0.13}  −11.150  (−12.006, −10.106)
Model intercepts  β_{0.23}  −7.472  (−10.168, −5.265)
Age (years)  β_{A.12}  0.051  (0.013, 0.104)
Age (years)  β_{A.13}  0.101  (0.089, 0.111)
Age (years)  β_{A.23}  0.065  (0.039, 0.097)
Gender (men versus women)  β_{G.12}  0.340  (−0.050, 0.744)
Gender (men versus women)  β_{G.13}  0.298  (0.159, 0.427)
Gender (men versus women)  β_{G.23}  0.412  (0.128, 0.707)
Education (10 years or more)  β_{E.12}  −0.025  (−0.472, 0.355)
Education (10 years or more)  β_{E.13}  −0.228  (−0.381, −0.077)
Education (10 years or more)  β_{E.23}  0.159  (−0.144, 0.507)
Smoking (current versus never/ex)  β_{S.12}  0.203  (−0.155, 0.591)
Smoking (current versus never/ex)  β_{S.13}  0.503  (0.383, 0.644)
Smoking (current versus never/ex)  β_{S.23}  0.347  (0.124, 0.647)
Time spent in state 2 (years)  γ  −0.001  (−0.021, 0.021)
We assess the goodness of fit graphically by comparing observed with fitted survival curves and observed with expected prevalence in each state over time. When the model includes time-varying covariates, graphical illustration is not straightforward. We simulated a number of trajectories at an individual level under the fitted model to reflect the range of survival curves expected under the fitted model. The method of simulating trajectories at an individual level, known as microsimulation, has been used for studying the paths of dynamic processes in medical research, including studies of disability in the older population [46, 47].
Figure 5 illustrates survival probabilities for individuals who had a stroke before the beginning of the study. The bold lines correspond to the Kaplan–Meier survival curve and its 95% confidence limits for the observed data in the MRC CFAS. The dotted lines correspond to the Kaplan–Meier survival curves obtained as follows:

We assume that there are data sets, identical to the MRC CFAS data set at baseline, that is, data sets that include the same number of individuals with the same covariate specifications at baseline as the individuals in the MRC CFAS.

For every data set, we simulate one trajectory for every individual over time, assuming that the coefficients from Table 3 are correct. This generates simulated data sets.

We plot the Kaplan–Meier survival curve for individuals in patterns E and F for all simulated data sets.
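The last step above requires a Kaplan–Meier curve for each simulated data set. A minimal product-limit estimator is sketched below, assuming right-censored follow-up times measured from the beginning of the study; the function name and input layout are hypothetical, and confidence limits are not computed.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time of each individual
    events : True if death was observed at that time, False if censored
    Returns a list of (time, survival probability) pairs, one for each
    distinct time at which at least one death was observed.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all individuals who leave the risk set at time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# One death at 1, one censoring at 2, deaths at 3 and 4.
example_curve = kaplan_meier([1.0, 2.0, 3.0, 4.0], [True, False, True, True])
```

Applying the function to every simulated data set and overlaying the resulting curves on the observed curve gives a plot of the kind described for Figure 5.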
Figure 6 illustrates the percentage of individuals in state 3 as a function of the time since the beginning of the study. Because state 3 is an absorbing state, prevalence is an increasing function of time. The bold solid line represents the prevalence observed in the MRC CFAS data. The dotted lines correspond to prevalence obtained by simulating trajectories, assuming that the coefficients from the fitted model are correct, as described earlier.
For individuals in patterns E and F, the model underestimates survival in the first 4 years, but model fit improves thereafter. Prevalence plots can be constructed for all states. However, the assessment of model fit by using prevalence plots becomes difficult because of the presence of interval censoring. Individuals are observed in state 2 later than when they actually move to state 2; thus, the observed prevalence in state 1 is greater than expected. The percentage of individuals in state 3 is not influenced by censoring, because exact times of deaths are available. The model predicts prevalence in state 3 remarkably well, considering the effect of the study design and the fact that the period A_{b} − A_{0} is on average 36 years for individuals in patterns E and F. Nevertheless, Figures 5 and 6 can only show that some aspects of the model fit the data well. No claim about the overall goodness of fit can be made as graphical methods are not formal statistical tests. Titman and Sharples [48] discuss model diagnostics for multistate models and present both graphical methods and formal statistical tests, but their discussion does not include our model.
7 Discussion
This paper presented a three-state illness–death model with no recovery in the presence of all types of censoring, where intensities for the transition from one state to another were allowed to change in a piecewise-constant manner. We handled interval censoring by using integration and handled left censoring by using an EM-inspired algorithm. We illustrated the method by using data for stroke from the MRC CFAS. The illness state was ‘history of stroke’, and we included time since having a stroke as a covariate in the intensity for the transition from the illness state to death.
We included individuals who had a stroke before the beginning of the study in the model after making the assumption that they were healthy at age A_{0} = 40. According to data published in the ‘Key health statistics from general practice’ reports of the Office for National Statistics [30], this was a realistic assumption with regard to the age at which these individuals could be assumed to have been healthy in the past. Of those who had had a stroke and reported their history, 59% and 79% had had their first stroke within 5 and 10 years of the baseline interview, respectively, and 98% experienced their first stroke after the age of 40 years, indicating that assuming individuals to have been healthy at the age of 40 years was reasonable.
After choosing an appropriate value for A_{0}, one way to handle left censoring is to include an extra observation for all individuals (not only those who had a stroke before the initiation of the study), in which everyone is in state 1 at age A_{0}. Left censoring can then be treated as interval censoring by integrating out all possible transition times. When the time interval between age at baseline, A_{b}, and A_{0} is disproportionately large as compared with the period of study follow-up, this approach may introduce considerable bias. For the MRC CFAS data, the assumption that all individuals were healthy at age A_{0} = 40 would add an extra 35 years of healthy life to the vast majority of individuals (2151 out of a total of 2316 individuals). Unobserved person-years for the period between A_{0} and A_{b} would be treated as observed in the calculation of the likelihood. If the study had recruited individuals from age A_{0}, some of them would have died within [A_{0},A_{b}]. Thus, their data would have been left truncated. Left truncation makes the estimation of the left-censored entry time in state 2 more difficult. Including person-years for the period [A_{0},A_{b}] in the likelihood while failing to include individuals whose data would have been observed within that period, or otherwise to adjust for left truncation, may introduce bias.
The proposed method overcomes this problem by imputing the sojourn time in state 2 for individuals who had a stroke before baseline and by utilising an EM-inspired algorithm. All the formulas for the likelihood contributions are conditional on state at the beginning of the study and bypass the problem of left truncation. We calculate the likelihood over the period of study follow-up, [A_{b},A_{N}], and the assumption that study participants were healthy at age A_{0} is made only for those who were observed in state 2 at baseline. In particular, for these individuals, we assume that the underlying stochastic process that governs transitions between states is the same throughout the period [A_{0},max{A_{N}}], where max{A_{N}} is the maximum age A_{N} observed in the data. This involves the extrapolation of the formulas for the transition intensities back in time to age A_{0}. This assumption offers a practical solution to an otherwise unsolvable problem. Nevertheless, the lack of information over the period from age A_{0} to A_{b} makes this assumption unverifiable. The introduced EM-inspired algorithm is an intuitively appealing method because missing values of age of transition to state 2 are imputed under a specified model iteratively until they converge to optimum values that maximise the likelihood of the complete data. Although no problems were experienced in the implementation of the method in simulation studies and the method has been shown to be able to produce unbiased estimates, the mathematical properties of the proposed EM-inspired algorithm regarding convergence and susceptibility to bias require further investigation.
A simulation study showed that the performance of the method is affected by the choice of A_{0}. The performance of the method improves as A_{b} − A_{0} decreases. The method was found to perform well when the data are subject to moderate left censoring, where moderate left censoring is with respect to both the average length of [A_{0},A_{b}] and the percentage of left-censored data. Data subject to moderate left censoring are common in medical research. For example, when modelling HIV, the difference between age at baseline and age at which all individuals could be assumed to have been healthy would commonly be less than 10 years.
Fitting the model defined by Equations (1) and (2) by using piecewise-constant transition intensities and age as the time scale enables the explanatory variables X_{1},X_{2}, … ,X_{r} to vary with age A. When the change of a covariate X_{s} for s = 1, … ,r over time cannot be calculated using age in a deterministic way, the trajectory of X_{s} needs to be estimated. This can be achieved by carrying forward the last observed value of X_{s} or by jointly modelling the trajectory of X_{s} with the transition intensities (Equations (1) and (2)). For example, the time since the beginning of the study can be obtained deterministically by the difference in age between two time points, whereas the trajectory of a covariate such as an individual's blood pressure over time cannot. One way of estimating individuals' blood pressure is by carrying forward the last observation. In this way, time-varying covariates remain constant between successive follow-up visits and at any time they are equal to the value obtained from the latest follow-up visit. To avoid having follow-up visits with updated information on covariates lying within any subinterval where transition intensities are assumed to be constant, all study follow-up visits can be included as time points in the age grid of the piecewise-constant approach. Another way is to predict individuals' blood pressure by using linear regression adjusting for possible confounders. Depending on the application, more complex methods of prediction may be appropriate.
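Carrying the last observation forward and adding the follow-up visits to the age grid can be illustrated with the short sketch below. The function names, the blood pressure values and the visit ages are hypothetical and serve only to show the mechanics.

```python
from bisect import bisect_right


def locf_value(visit_ages, visit_values, age):
    """Value of a covariate at a given age, carrying forward the most
    recent observation. visit_ages must be sorted in increasing order."""
    idx = bisect_right(visit_ages, age) - 1
    if idx < 0:
        raise ValueError("no observation at or before this age")
    return visit_values[idx]


def age_grid_with_visits(start_age, end_age, step, visit_ages):
    """Regular age grid with the visit ages added as extra grid points, so
    that a covariate never changes inside a subinterval."""
    n = int(round((end_age - start_age) / step))
    grid = {start_age + k * step for k in range(n + 1)}
    grid.update(a for a in visit_ages if start_age <= a <= end_age)
    return sorted(grid)


visit_ages = [70.0, 72.1, 74.0]
blood_pressure = [140.0, 150.0, 135.0]
bp_at_73 = locf_value(visit_ages, blood_pressure, 73.0)
grid = age_grid_with_visits(70.0, 71.0, 0.25, [70.6])
```

With the visit at age 70.6 added to the grid, the subinterval from 70.5 to 70.75 is split in two, and the covariate is constant within each part.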
The piecewise-constant approach is to be distinguished from a discretization used to approximate the integrals in the likelihood contributions. In fact, the proposed method involves two time grids. The first grid is used for numerical integration and can be chosen to have a fine resolution. The second time grid corresponds to the intervals within which transition intensities are assumed to remain constant (piecewise-constant hazards). For a given time of onset of state 2, the integrand of a likelihood contribution is calculated using the piecewise-constant approach described in Section 3.2. Thus, for any two different times of onset of state 2, the integrand is calculated in a different way. All possible transition times are integrated out because the time of onset of state 2 is unknown.
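The interplay of the two grids can be sketched for an individual seen in state 1 at age a and in state 2 at age b. The integrand below is the usual illness–death form (remain in state 1 until w, move to state 2 at w, remain in state 2 until b) and is an assumption of this illustration, not a transcription of the likelihood contributions of Section 3. Intensities use the true values of Table 2 in an assumed log-linear form, and all function names are hypothetical.

```python
import math

ORIGIN = 40.0        # age at which the hazard grid starts (A_0 in the application)
HAZARD_STEP = 0.25   # grid 2: intensities constant within each subinterval
SIMPSON_STEP = 0.05  # grid 1: resolution of the numerical integration


def grid_left(age):
    """Start of the hazard subinterval that contains age."""
    k = math.floor((age - ORIGIN) / HAZARD_STEP + 1e-9)
    return ORIGIN + k * HAZARD_STEP


def q12(age):
    return math.exp(-8.780 + 0.065 * age)


def q13(age):
    return math.exp(-10.310 + 0.093 * age)


def q23(age, onset):
    return math.exp(-5.920 + 0.052 * age - 0.110 * max(age - onset, 0.0))


def cumulative_hazard(rate_at, start, end):
    """Integrate a piecewise-constant hazard from start to end."""
    total, t = 0.0, start
    while t < end - 1e-12:
        left = grid_left(t)
        right = min(left + HAZARD_STEP, end)
        total += rate_at(left) * (right - t)
        t = right
    return total


def integrand(w, a, b):
    """Stay in state 1 on [a, w], move to state 2 at w, stay there until b."""
    stay1 = math.exp(-cumulative_hazard(lambda x: q12(x) + q13(x), a, w))
    stay2 = math.exp(-cumulative_hazard(lambda x: q23(x, w), w, b))
    return stay1 * q12(grid_left(w)) * stay2


def composite_simpson(f, a, b, step=SIMPSON_STEP):
    n = max(2, int(round((b - a) / step)))
    n += n % 2  # Simpson's rule needs an even number of subintervals
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


# Onset time integrated out over a 2-year follow-up interval.
contribution = composite_simpson(lambda w: integrand(w, 70.0, 72.0), 70.0, 72.0)
```

The integration step (0.05 year) divides the hazard step (0.25 year), so every change point of the intensities coincides with an integration node.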
When covariates X_{1},X_{2}, … ,X_{r} do not depend on age, the only time-varying covariates are age, A, and time spent in state 2, A − W. In this case, the model defined by Equations (1) and (2) can be fitted without adopting a piecewise-constant approach for the transition intensities. Although the piecewise-constant approach was not necessary in the application presented in this paper, we fitted the model defined by Equations (1) and (2) by using piecewise-constant transition intensities because this approach is more general and allows the inclusion of time-varying covariates other than age and time spent in state 2. When such covariates are included in the model and a piecewise-constant approach is adopted, we believe that expressing the model in the form of Equations (1) and (2) provides better intuition and is more concise than the form of the product of a baseline intensity function and a factor that includes the effects of covariates.
Applying the method to the MRC CFAS data (Newcastle centre) showed that the time since having a stroke was not statistically significant. Age was shown to increase mortality and the risk of having a stroke. Compared with women, men were found to have increased mortality at all ages and independently of their years of education or smoking status. These findings are consistent with those obtained by others [44, 45]. Smoking was shown to increase mortality, whereas higher education was shown to decrease the hazard of death among healthy individuals. Nevertheless, whether education alone can affect the risk of dying needs further discussion. Levels of education are usually associated with socioeconomic status and other causal factors, such as quality of health care received. Although the MRC CFAS data were subject to left censoring, graphical assessment of model fit showed that the model fitted some aspects of the data very well. Nevertheless, graphical assessment of model fit in multistate modelling cannot assess the overall goodness of fit, even when the model is correctly specified. The presence of interval censoring may cause discrepancies between the observed and expected prevalence in some states of the multistate model. Individuals are observed in state 2 later than the time they move to state 2. Hence, observed prevalence in state 1 is greater than expected, even when the model is correctly specified and fits the data very well. Graphs can show only that some aspects of the model fit the data well. Titman and Sharples [48] have developed formal statistical tests for diagnosing model fit in multistate modelling. However, these methods fall beyond the scope of this study.
This paper has presented a method to fit semi-Markov models in the presence of all types of censoring and left truncation. The method allows the inclusion of time-varying covariates that may change over time in a stochastic way and can be applied successfully in a range of medical applications with data subject to moderate left censoring. The parametric form of the piecewise-constant approach presented in this paper offers flexibility and, additionally, the option to use the fitted model to extrapolate into the future (prediction). This is particularly important in research areas such as the estimation of life expectancies. Furthermore, the parametric form of the transition intensities facilitates the adjustment for left censoring. Finally, this paper has proposed a new graphical way of investigating goodness of fit when the model includes time-varying covariates, which is based on microsimulation.
Acknowledgements
The authors wish to acknowledge the significant contribution of all the individuals who participated in the study. Gratitude needs to be given to Steve Miller at the MRC Biostatistics Unit for his support with regard to programming in C++ with multithreaded computations. The Medical Research Council CFAS is supported by major awards from the Medical Research Council and the Department of Health (grant MRC/G99001400). This investigation is funded by grant MC_US_A030_0031.01.
References
1. Survival and recurrence following stroke: the Framingham study. Stroke 1982; 13(3):290–295.
2. The Stroke Association. R11 stroke statistics resource sheet, 2006. Available from: http://www.stroke.org.uk/document.rm?id=877 [accessed on 12 April 2010].
3. Hidden Markov models for settings with interval censored transition times and uncertain time origin: application to HIV genetic analyses. Biostatistics 2007; 8(2):438–452.
4. Multistate models for bleeding episodes and mortality in liver cirrhosis. Statistics in Medicine 2000; 19:587–599.
5. A semi-competing risks model for data with interval-censoring and informative observation: an application to the MRC cognitive function and ageing study. Statistics in Medicine 2010; 30(1):1–10.
6. A multistate model for joint modelling of terminal and non-terminal events with application to Whitehall II. Statistics in Medicine 2007; 26:426–442.
7. Estimating stroke-free and total life expectancy in the presence of non-ignorable missing values. Journal of the Royal Statistical Society: Series A 2010; 173(2):331–349.
8. A three-state multiplicative model for rodent tumorigenicity experiments. Journal of the Royal Statistical Society: Series C 1993; 42(2):283–300.
9. Estimation and prediction in a multistate model for breast cancer. Biometrical Journal 2006; 48(3):366–380.
10. Tutorial in biostatistics: competing risks and multistate models. Statistics in Medicine 2007; 26:2389–2430.
11. Markov Chains, Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press: Cambridge, UK, 1997.
12.
13. Multistate models in epidemiology. Lifetime Data Analysis 1999; 5:315–327.
14. A note on the general time-dependent stochastic compartmental model. Biometrics 1976; 32:443–448.
15. A piecewise-constant Markov model and the effects of study design on the estimation of life expectancies in health and ill health. Statistical Methods in Medical Research 2009; 18:145–162.
16. Applied Life Data Analysis, Wiley Series in Probability and Statistics. Wiley: Hoboken, New Jersey, USA, 2004.
17. Modelling Survival Data in Medical Research, 2nd ed. Chapman and Hall: Boca Raton, Florida, USA, 2009.
18. Survival and Event History Analysis. Springer: Baltimore, Maryland, USA, 2010.
19. Bias due to left-truncation and left-censoring in longitudinal studies of developmental and disease processes. American Journal of Epidemiology 2011; 173(9):1078–1084. DOI: 10.1093/aje/kwq481.
20. The Oxford Dictionary of Statistical Terms. Oxford University Press: New York, USA, 2010.
21. A Markov model for analysing cancer markers and disease states in survival studies. Biometrics 1986; 42:855–865.
22. A semi-Markov model for multistate and interval-censored data with multiple terminal events. Application in renal transplantation. Statistics in Medicine 2007; 26:5381–5393.
23. The EM Algorithm and Extensions, 2nd ed., Series in Probability and Statistics. Wiley: Hoboken, New Jersey, USA, 2008.
24. Long-term survival after first-ever stroke: the Oxfordshire Community Stroke Project. Stroke 1993; 24(6):796–800.
25. Cohort profile: the Medical Research Council Cognitive Function and Ageing Study (CFAS). International Journal of Epidemiology 2006; 35:1140–1145.
26. Prevalence of depression in older people in England and Wales: the MRC CFA Study. Psychological Medicine 2007; 37(12):1787–1795.
27. Medical Research Council Cognitive Function and Ageing Study (MRC CFAS) and Resource Implications Study (RIS MRC CFAS). Profile of disability in elderly people: estimates from a longitudinal population study. British Medical Journal 1999; 318(7191):1108–1111.
28. Smoking-related behaviour and attitudes. A report on research using the National Statistics Omnibus Survey produced on behalf of the Department of Health and the Information Centre for Health and Social Care. Office for National Statistics, 2005. (Available from: http://www.statistics.gov.uk/downloads/theme_health/Smoking2005.pdf) [accessed on 4 December 2010].
29. Modelling general patterns of digit preference. Statistical Modelling 2008; 8(4):385–401.
30. Office for National Statistics. Key health statistics from general practice, 1998. (Available from: http://www.statistics.gov.uk/downloads/theme_health/key_Health_Stats_1998.pdf) [accessed on 8 April 2010].
31. Survival Analysis: Techniques for Censored and Truncated Data, 2nd ed. Springer: Baltimore, Maryland, USA, 2003.
32. Stata Corporation. STATA Manual, 11th edn. STATA/SE: College Station, Texas, USA, 2009.
33. Bootstrap Methods and Their Application, 9th ed., Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press: Cambridge, UK, 2007.
34. Multistate models for panel data: the msm package for R. Journal of Statistical Software 2011; 38(8):1–28.
35. An Introduction to Numerical Analysis, 2nd ed. John Wiley & Sons: Hoboken, New Jersey, USA, 1989.
36. The convergence of a class of double-rank minimization algorithms. Journal of the Institute of Mathematics and Its Applications 1970; 6:76–90.
37. A new approach to variable metric algorithms. Computer Journal 1970; 13:317–322.
38. A family of variable metric updates derived by variational means. Mathematics of Computation 1970; 24:23–26.
39. Conditioning of quasi-Newton methods for function minimization. Mathematics of Computation 1970; 24:647–656.
40. Optimal conditioning of quasi-Newton methods. Mathematics of Computation 1970; 24:657–664.
41. An analysis of transformations. Journal of the Royal Statistical Society: Series B 1964; 26(2):211–252.
42.
43.
44. New methods for analyzing active life expectancy. Journal of Aging and Health 1998; 10:214–241.
 45Estimating incrementdecrement life tables with multiple covariates from panel data: the case of active life expectancy. Demography 1994; 31:297–319., , .
 46The American Way of Ageing: An Event History Analysis. University of Chicago Press: Chicago, 1990., .
 47Sharing the Burden: Strategies for Public and Private Longterm Care Insurance. Brookings Institution: Washington, DC., , .
 48Model diagnostics for multistate models. Statistical Methods in Medical Research 2009; 19:621–651., .