A multistate survival model of the natural history of cancer using data from screened and unscreened population

One of the main aims of models using cancer screening data is to determine the time between the onset of preclinical screen‐detectable cancer and the onset of the clinical state of the cancer. This time is called the sojourn time. One problem in using screening data is that an individual can be observed in preclinical phase or clinically diagnosed but not both. Multistate survival models provide a method of modeling the natural history of cancer. The natural history model allows for the calculation of the sojourn time. We developed a continuous‐time Markov model and the corresponding likelihood function. The model allows for the use of interval‐censored, left‐truncated and right‐censored data. The model uses data of clinically diagnosed cancers from both screened and nonscreened individuals. Parameters of age‐varying hazards and age‐varying misclassification are estimated simultaneously. The mean sojourn time is calculated from a micro‐simulation using model parameters. The model is applied to data from a prostate screening trial. The simulation study showed that the model parameters could be estimated accurately.


INTRODUCTION
The purpose of cancer screening is to detect cancer prior to the onset of symptoms, in order to improve prognosis by earlier intervention. Early in the natural history of cancer, a newly developed tumor would be asymptomatic and undetectable. At some point it becomes detectable by screening. In nonscreened individuals or if the screening test does not detect the cancer, it may progress to become symptomatic and could be diagnosed clinically. The time by which screen detection advances the diagnosis is the lead time. The maximum lead time is the sojourn time. This is the preclinical screen-detectable period (PCDP).
The potential for screening to detect cancer early depends on the sojourn time, the frequency of screening, and the sensitivity of the screening process. Sensitivity refers to the potential for screening to identify a cancer in the PCDP. Estimates of the probability of cancer initiation, the sojourn time, and the sensitivity inform the design and evaluation of a screening program.
Data from cancer screening programs and cancer screening trials provide unique problems in determining sojourn time. Once a tumor in PCDP is screen-detected and treated by surgical resection, then its progression into the clinical phase is aborted. The exact time of entry into PCDP is not observed. In the data, we only observe individuals either in PCDP or with symptomatic clinical cancer but never both. Models used to determine the sojourn time must overcome this problem.
The phases of the cancer can be defined as mutually exclusive states. A multistate Markov model can be used to determine the transitions between states involving survival data. 1 Such a multistate survival model must incorporate interval-censored and left-truncated (delayed entry) data. Left-truncation accounts for prevalence at first observation. Transition rates and test sensitivity can be estimated simultaneously. 2 Cancer incidence and mortality increase with age. 3 Parmigiani et al 4 show that the incidence of preclinical cancers varies with age. Weibull or Gompertz hazards are two particular functions used, in multistate models in survival analysis, for age-varying transition rates. 2 However, most studies of multistate models using cancer screening data have constant exponential hazards. 5,6 Piecewise-constant hazards are used to account for age-varying transition rates. [7][8][9] Models with piecewise constant hazards have a greater number of parameters to fit than continuous hazards, and the transition rates jump between age groups. Other studies use the Weibull hazard function to model nonhomogeneous rates; 8,10 however, those studies fail to take into account deaths as a competing risk. Death from competing causes has been studied in multi-state models using screening data by Wu et al 11 and Taghipour et al 12 However, both studies used piecewise constant hazards as opposed to continuous hazards.
Observations of clinical diagnoses are required to derive the transition rate from PCDP to the clinical state. Existing multi-state models using screening data, use either interval cancers (cancers diagnosed clinically in the interval between two screening rounds) only or all clinical diagnoses. 5,13 All clinical diagnoses consist of interval cancers, symptomatic cancers arising among individuals not attending a screening episode or not invited to screening such as individuals in the control arm of a screening trial. Chen et al 13 used a three-and five-state multistate models for breast cancer screening data; however, the authors did not consider death from other causes or age-varying transition rates. To our knowledge, there are no published models that combine using data on nonscreened individuals while accounting for death from competing causes.
We propose a four-state model to account for competing risks in survival. The model includes age-varying hazards and age-varying misclassification. Additionally, the survival model allows for left-truncated data and requires data from screened and nonscreened participants to estimate the hazard parameters. This study is novel in that it combines all of these elements into a multistate model. In Section 2, we describe the model and its likelihood function. In Section 3, we describe data from randomized prostate cancer screening trial and its application in modeling the natural history of prostate cancer. Lastly, in Section 4, we perform a simulation study.

NATURAL HISTORY MODEL
We developed a four-state continuous-time Markov model with states S 1 , healthy, S 2 , screen-detectable, S 3 , clinical cancer , S 4 , death. State 3, clinical cancer, refers to the point at which a cancer, if it is not screen-detected, will be diagnosed. Screening tests are not 100% accurate, so we include the possibility of misclassification in the hidden Markov model, which we present in Figure 1. Screen-detected cancers are biopsy confirmed diagnosis following testing positive on a screening test. Therefore a subject in true state 1 cannot have an observation of state 2. The time of the observation refers to the time of the screening test and not of the biopsy. We have not accounted for the transition from state 3 to state 4, as the time spent in state 2, sojourn time, is independent of any transition after state 3. Transition rates are elements of an intensity matrix, and the misclassification rates are elements of a misclassification matrix. Misclassification in this context reflects screening uptake, Prostate-Specific Antigen (PSA) performance, screening interval, uptake of diagnostic investigation. Sensitivity, 1 -misclassification, of the screening process would be lower than the test sensitivity. 14 The intensity matrix Q(t), at time t > 0, is given by where q ij (t) are hazard functions. In this paper, we concentrate on two parametric hazard functions, the exponential hazard and the Gompertz hazard. The Gompertz hazard function takes the form, where ij > 0, is an age-varying hazard. The exponential is a constant hazard that is, (2) with ij = 0. The variable Z is the vector of time-constant covariate values, and ij , and are the parameters of interest. However, in this article, we focus on the case without covariates, Z, ij ≡ 0 without loss of generality. An assumption we have made is that the transition rates to death from healthy and screen-detectable states are equal. That is the hazards q 14 (t) = q 24 (t), and the parameters are constrained such that 14 = 24 and 14 = 24 . It is difficult to identify all parameters if we relax one (or both) of these restrictions, such that 14 ≠ 24 or 14 ≠ 24 . The probability of observing j when the true state is i at age t, Pr(O t = j|S t = i), is given by the function where m ∈ R and m ∈ R. e ij (t) is the ijth element of the 4 × 4 misclassification matrix.
The probability of transitioning from state i to state j in time interval (t 0 , t 1 ] is Pr(S t 1 = j|S t 0 = i). This is given by the ijth element of the transition probability matrix, P. For a constant intensity matrix where Q(t 0 ) = Q, the probability matrix P(t 0 , t 1 ) is A piecewise constant approximation (section 3.6 in Reference 2) is employed for an intensity matrix, Q(t). The transition probability matrix, P(t 0 , t 1 ), is approximated using a grid with grid length l. The probability matrix is where 0 ≤ n < t 1 −t 0 l . As an example, consider t 0 = 2 and t 1 = 4.5 with grid length, l = 1, then The grid length is chosen in relation to the screening frequency and the transition rates. We have chosen a grid length of l = 1 for the trial data and the simulation data.

Likelihood function
In this section, we explain the likelihood function for the model. The data for the model include interval-censored, censored state, misclassification, right censored, and left-truncated observations. The likelihood contribution for an individual, who has N observations, at ages t 1 to t N , is given by where is the model parameters. Let t 0 be the left-truncation age, the age at which it is assumed all individuals are healthy. The likelihood contribution, L( |t 1 … t N ) is a scalar value; L n is a matrix dependent on the ages t n−1 , t n , and the corresponding observation o n . Additionally, 1 = [1, 1, 1, 1] T , and 1 = [1, 0, 0, 0] T , the basis vector, indicates all individuals are in state 1 (healthy) at age t 0 . The random variable time to state 2 is interval-censored since it is only known that the individual enters state 2 between observations. If the observation at age t n is o n = 1 then there exists a possibility of misclassification defined by (3). The probability of transitioning from state 1 or 2 to state s n but observing o n Not accounting for the sensitivity of cancer screening tests would under-estimate the length of sojourn time. Satten and Longini proposed methods in determining underlying dynamics of diseases with measurement error. 15 We follow this method along with the assumption that misclassification is independent of previous misclassification. The contributing matrix, L n , in (7) is where As an example, let us say an individual is first observed at age t 1 . The second observation at age t 2 is a negative screening result, o 2 = 1. In this scenario (9) is where p ij is the ijth element of the P matrix. Nonscreened individuals would be in either healthy (true state 1) or screen-detectable (true state 2) states if alive and not clinically diagnosed. The contributing matrix for the likelihood Equation 7 for an individual with a censored state observation is L n = P(t n−1 , t n ). The age at death, state 4, is always known and is not interval-censored. The other absorbing state is clinical cancer, state 3, and is also assumed to be not interval-censored. Hence, observations of states 3 and 4 are exact time transitions, and without misclassification, the contribution to (7) for observations of these states, d, is The method above assumes the intensity matrix is constant in the time interval (t N−1 , t N ] and transition instantaneously at age t N . The method aligns with the msm package in R. 16 The msm package does not account for left-truncation, we construct the likelihood function with left-truncation. The first observation is a censored state for individuals in the control arm or nonscreened participants in screening data. A censored state is when we do not know whether the true state of these individuals is 1 or 2. Screened individuals that have a negative screen result at the first observation may be the result of incorrect observation of healthy state when the true state is screen-detectable. To model the first observation we followed the method described by van den Hout (section 8.4 in "Multi-state models for interval censored data"). 2 The left-truncation method assumes all individuals are in true state 1 at some initial age, t 0 . Furthermore, individuals who had entered state 3 or had died before t 1 were censored from the analysis. The probability of a left truncated observation is The contribution of the first observation, L 1 , in (7) is The likelihood function is a product of individual contributions L (k) for individual k, We maximize the log-likelihood function to estimate model parameters. The SEs of the estimates are determined from the resulting Hessian.

Data
We used data from the Prostate Lung Colorectal and Ovarian screening trial. 17 A total of 76 685 men, aged 55 to 74 years, were randomly assigned to intervention (annual screening with the blood test, PSA for 6 years, 38 340) or control (usual care, which included opportunistic PSA testing, 38 345) and followed up for a median of 13 years. 18 The individuals enter and leave the trial at different ages, either due to death, positive cancer diagnosis or end of the follow-up period, see Figure 2. The state table in Table 1 illustrates the problem of deriving the sojourn time from cancer screening data; that is there is no observed transition from state 2 to state 3. Figure 3 shows more screen detected cancers at first screening compared to subsequent screenings, over 28% of all the screen-detected cancer. Participants with screen-detected cancer in the remaining five screenings represented 1.03% of all participants. To perform the analysis of the prostate cancer data, we took a random sample of 10 000 participants, 5000 from the screening arm and 5000 from the control arm.

Model
We present three models of the natural history of prostate cancer, using different hazard distributions on the transitions. Firstly, we apply the model (Figure 1), with exponential hazards on all transitions, to the trial data. Let us call this Model A. Secondly, we employ a Gompertz hazard on all transitions. Let us call this Model B. Lastly, a model with Gompertz hazard on transitions from state 1 to state 2 and state 1 to state 4 (states 2 to 4), and an exponential hazard on state 2 to state 3. Let us call this Model C. We estimate model parameters, by maximizing the likelihood function (15). We assume all patients are healthy (in state 1) at some age t 0 before the trial starts. Healthy refers to no detectable cancer. We assumed that every individual of the PLCO trial was in state 1 at age 40. Secondly, the hazard rate of death from competing causes is equal when patients are in state 1 or state 2. Lastly, we assumed that hazards are monotonically increasing or are constant with respect to age.
From (2) and (3) the hazards and misclassification functions are TA B L E 2 Estimated parameters for Model C on prostate cancer screening data. Model C has a Gompertz hazard on transition from state 1 to state 2 (q ij = exp( ij + ij t)), and on transitions to state 4, exponential on transition on transition from state 2 to state 3 (q ij = exp( ij )), and with age-varying . Age was centered in this model where an age of 68 was set to 0. The program sensitivity for a 55 year old is 18.4% and for a 80-year old the sensitivity is 92.5%

Parameters
Value (95% CI) SE = exp( 24 The hazards and misclassification functions for Model A are Equations (16) to (20) Table 2, and the graphs of the hazards and misclassification are presented in Figures A1 and A2. The results indicate that constant hazards yield worse results than time-varying hazards. The AIC of Model A was 30 243, and the AIC of Model C was significantly lower with 29 412. The SE of the misclassification parameter, m , was very high (50.20) for Model A. The high SE suggests the exponential model, Model A, was not able to correctly identify the true misclassification rate. Model B yielded a negative 23 . However, since we restricted our analysis to 23 ≥ 0 we concentrate on Model C.

Model application
The sojourn time can be calculated from a micro-simulation using the model parameter estimates in Table 2. The length of time in state 2 (the screen-detectable phase) given that the individual enters state 3 (clinical cancer) is the sojourn time.
The algorithm for the micro-simulation is in Box 1. To generate survival times we follow the method described by Bender et al 19 The time to an event, t n -t n−1 , is given by where H is the cumulative hazard function and U ∼ U[0, 1]. The micro-simulation consisted of 10 000 individuals, 3134 of whom entered state 3. The mean sojourn time, 3.79 (95% confidence interval: 3.58-3.85) years, is found by taking the mean of the time in state 2 of the 3134 simulated individuals.
Overdiagnosis is the proportion of screen-detected cases that would not have been diagnosed clinically within a person's lifetime. Using the model, we can calculate overdiagnosis by employing a micro-simulation. Varying the start and stop age, and the frequency of screening, one could determine a screening strategy that minimizes overdiagnosis while detecting harmful cancers.

Model validation using cumulative incidence function
To validate the model, we compare nonparametric and model-based cumulative incidence functions (CIF). For state k, the CIF I k is the probability, Pr(T ≤ t, S = k) of experiencing the absorbing state k before age t. 20 A comparison of the CIF I 3 and I 4 provides a heuristic measure for the goodness of fit.  The nonparametric estimate of the CIF for state k is given bŷ where n j is the number at risk at age t j and d k,j is the number of observations of state k at age t j . We estimate I 3 and I 4 for the control group in the PLCO data. The model-based estimate of Pr(T ≤ t, S = k) is computed using the transition probabilities from the model. We use 38 323 individuals of the control group to validate the model. The model CIF replicates the ages at baseline of the PLCO data, see Figure 4. The estimates for I 3 , in Figure 5A, shows good agreement between the model and the PLCO data. However, at an advanced age the accuracy of the model becomes less accurate. The estimates for I 4 of the data and model in Figure 5B match closely. The graphs of the CIFs indicate there is a good fit between the model and the data.

Data generation
To validate the models, we performed a simulation study, estimating model parameters and comparing them to the input parameters. The simulation study involved creating 200 sets of panel data. Each panel data were created with distributions with known parameters. To create the data, we followed the algorithm in Box 2.

Algorithm 2. Algorithm to generate simulation data
Simulation of cancer screening trial data: 1. Initial state is 1 the participants age is between 20 and 39, say t −1 . 2. From this state, the individual can transfer to 2 and 4, from Equation (21).We simulate the ages at these states, taking the minimum of the simulated ages and the corresponding state. If individual enters state 4 go to step 4 3. Simulate ages at states from state 2, the individual can transfer to states 3 or 4. 4. If the entry to state 2 is before age 40, or the entry to the absorbing states is before t 1 = t −1 + 21, go back to step 1. 5. Amend the data such that the first observation is at t 1 . 6. Implement screening procedure, including misclassification, for the screening arm. We have chosen annual-screening to mimic the PLCO data. 7. For control arm from age t 1 add in the censored state, which we label −1. 8. Remove observations of state 3 and state 4, where an observation of state 2 exists. 9. To implement the grid structure, we add "observations" of censored state (−1), where the time between observations is more than one year. 10. Implement maximum follow-up period, remove all observations after 12 years. 11. Add right censoring state a year after the last observations if the last observation is 1 or −1.
We present a study of the model in Figure 1, using two different hazard functions on the transition from state 2 to state 3. Firstly, the model with an exponential hazard between state 2 and state 3, and a Gompertz hazard on the remaining transition. This is Model C from Section 3.2. Each dataset contains 1600 individuals, 800 simulated individuals in the screening arm and 800 in the control arm. Secondly, the model employed Gompertz hazards on all transitions. This is Model B from Section 3.2. The simulation study for this model used data sets containing 1000 individuals, 500 in each arm. Both models include age-varying misclassification. The averages of the parameters in 200 simulations, estimated by a Nelder-Mead algorithm, are compared numerically to the input parameters. We assumed the simulation TA B L E 3 Simulation results for a Model C. Model C has a Gompertz hazard on transition from state 1 to state 2, and on transitions to state 4 (q ij = exp( ij + ij t)), exponential on transition on transition from state 2 to state 3 (q ij = exp( ij )) and with age-varying misclassification . The true parameter values were chosen to be similar to the prostate cancer screening trial in correctly identifies the parameters if the relative bias is less than 0.1 for every parameter in the model. The models were implemented in R and simulated with the use of a high performance computing cluster.

Results of the simulation study
The piecewise approximation method accurately estimates the parameters of age-varying hazards in the multistate model in Figure 1 with (16) to (20). The simulations of Model C correctly identified all parameters. Table 3 shows the median relative bias is less than 0.1 for all the parameters. However, not all parameter estimates have a mean relative bias of less than 0.1. Of the 200 simulations, 162 simulations yielded a positive definite Hessian and converged. Table 4 compares results of truly specifying the left-truncation age, and wrongly assuming the left-truncation age to be 10 years lower, and 10 years higher. The relative bias in the estimates was small as seen in Table 4. Simulation results for Model C where the number of subjects is 300, 600, 900, and 1200 are presented in Table B1. The results show that if the number of subjects is 900 or below that not all the parameters are identifiable, with a relative bias of less than 0.1. The SEs prominently decrease as the number of subjects increase. Secondly, Model B has a relative bias of less than 0.1 for all parameters (see Table 5). A positive definite Hessian was exhibited in 173 of 200 simulations. The coverage is the proportion of simulations in which the true parameter value is contained within the confidence interval. The coverage is calculated only with simulations with a positive definite Hessian. The 95% confidence interval coverage of both models is presented in Figures 6 and 7. The true parameter values were well covered in the simulations. The coverage was between 85% and 97% for all parameters. TA B L E 5 Simulation results for a Gompertz hazards (q ij = exp( ij + ij t)) on all transitions (Model B), and The coverage graphs show in some simulations the SEs of the misclassification parameters were large. Even though the true parameter values lie within the 95% confidence interval, the wide confidence interval in these simulations indicates the parameter estimates were not precise. The confidence intervals that do not contain the true parameter value in Figures 6B,E and 7A,D; the misclassification parameters Figure 7G,H, may indicate that overestimating the 's in the hazard functions is compensated by underestimating the effect of age, 's.
The model can identify parameters similar to the estimates for prostate cancer in Table 2. However, the hazard rate, from state 2 to state 3, might be monotonically increasing for other different cancers. We have shown these parameters are identifiable too.

DISCUSSION
We have developed a model using cancer screening data and the corresponding likelihood function. The simulation study shows that a maximum likelihood estimation can retrieve the correct parameter values. The simulation study also shows that the piecewise approximation method to determine the probability matrix, (5), is appropriate for age-varying hazards.
We use a cancer screening trial to model the natural history of prostate cancer. The mean sojourn time is calculated from the prostate cancer model parameters by a micro-simulation. The mean sojourn time (3.79 years) is lower than might be expected for prostate cancer. However, this is due to the high amount of contamination in the trial, where men in the control arm have taken the screening test. 21 Our model assumes that participants in State 1 and State 2 are random sample of the general population. However, PLCO trial, as any other screening trial, has healthy volunteer effect, where the study participants might be in better health than the general population. 22 Compliance with screening and survival patterns could be different between the screening trial participants and the general population. As we use data from both arms of the screening trial to derive the transition probabilities, healthy volunteer effect would not affect the internal validity of our findings but could affect its generalizability. Correction for healthy volunteer effect is important but would require additional data and further assumptions. 23 Additionally we assumed all subjects were in state 1 at age 40. However, we show that misspecifying the left-truncation age by 10 years would have a minimal effect on the parameters estimates.
The extent of overdiagnosis is important in the evaluation of a screening strategy. A benefit of the model is that it can be used to determine the effects of different screening programs on the cancer. Using the model we can determine the proportion of cancers detected by a screening program, and the proportion of those cases that are non-overdiagnosed harmful cancers.
Existing software is available for many of the elements needed for the natural history model but not all. For example, models in the msm package in R can have time-varying hazards, and hidden Markov models with time-varying misclassification. 16 However, we developed software for models that include left-truncated data, which msm is unable to do.  We have derived the transition to the clinical state by using data on clinical cases from both the screened and unscreened population. Using only the interval cases would capture only fast progressing cancers and cancers that are missed by screening. The sojourn time distribution derived from interval cancers would not represent the sojourn time distribution of all cancers. Not using an unscreened cohort will cause an unrepresentative mix of the cancers in these studies. Most other models that use screening data neglect the control arm or use it to determine prevalent cases, test sensitivity, or used to test the validity of the model. Not using clinically diagnosed cancers from an unscreened population will bias results.
Some studies calculate prevalence from a control arm or assume prevalence from other data sources. Figure 3 highlights the need for accurately describing prevalent cases. The left-truncation method is the most suitable method. 24 We estimate hazard rates simultaneously with age-varying misclassification. In many models, test sensitivity is an input though there are some notable exceptions. 5,12,25 In a study of breast cancer, Taghipour et al 12 show test sensitivity changes with age. However, the calculated sensitivity is not continuous but piecewise constant. The authors do not use the left-truncation method and diagnosed cases in the first 6 months of enrolment are removed from the study. While both Uhry et al and Wu et al have assumed constant sensitivity without competing risks. 5,25 In our analysis, we use a Gompertz hazard, but this does not prohibit other time-varying hazards being used in the model. For the transition from state 2 to state 3, a Gompertz hazard implies mean sojourn time becomes shorter with age. Other hazards that have non-monotonic functions may also be employed, but the additional complexity may require a further simulation study. Incorporating a hazard function of time since entry into state 2 could improve the model. Since state 2 is interval censored this semi-Markov model would require integrating through all possible entry times. While this is possible and perhaps may increase the accuracy of the model, the CIFs indicate that there is a good fit between the model and the data. In the model, we assumed that individuals in the healthy and in screen-detectable state experience the same other cause mortality rate. Another assumption we made is that a false-negative screening is independent of previous false negatives. These assumptions are unlikely to impact findings.
Our approach solves many issues in modeling the natural history of cancer using cancer screening data. The model is particularly novel in that it incorporates age-varying hazards and age-varying misclassification, with a state of death as a competing risk, while using data from non-screened individuals. Figure A1 is the hazard rates for modeled parameters from the PLCO data Figure A2 is the probability of misclassification and sensitivity by age in the PLCO trial: The verticle lines indicate the minimum and maximum screening ages in the PLCO trial. The vertical dashed lines specify the age range of screening in the PLCO trial. Estimates of probability of misclassification and sensitivity outside these age ranges should be read with caution.

APPENDIX B. SIMULATION STUDY RESULTS OF DIFFERING SAMPLES
A simulation study of Model C; Gompertz hazards on transitions of 1 to 2, 1 to 4 and 2 to 4, and a exponential hazard on the transition from 2 to 3. Model C includes age-varying misclassification.