The authors are indebted to their collaborators in this study: Prof. O. Kronborg, Mrs. Dr. D. Gyrd-Hansen, Odense University Hospital; Prof. J. Faivre, Mrs. Dr. C. Lejeune, Burgundy Cancer Registry; Prof. J.D. Hardcastle, Prof. D.K. Whynes, University of Nottingham; Dr. N. Segnan, Dr. G. Castiglione, Dr. C. Senore, Centro Prevenzione Oncologica Regione Piemonte; Dr. G. Hoff, Telemark Central Hospital; Dr. E. Thiis-Evensen, Rikshospitalet; Dr. H. Brevinge, Sahlgrens Hospital; Dr. T. Church, University of Minnesota; Dr. F. Loeve and Dr. G. van Oortmarssen, Erasmus MC University Medical Center Rotterdam. Their cooperation was essential for the successful completion of the study.
Estimates of the fecal occult blood test (FOBT) (Hemoccult II) sensitivity differed widely between screening trials and led to divergent conclusions on the effects of FOBT screening. We used microsimulation modeling to estimate a preclinical colorectal cancer (CRC) duration and sensitivity for unrehydrated FOBT from the data of 3 randomized controlled trials of Minnesota, Nottingham, and Funen. In addition to 2 usual hypotheses on the sensitivity of FOBT, we tested a novel hypothesis where sensitivity is linked to the stage of clinical diagnosis in the situation without screening.
We used the MISCAN-Colon microsimulation model to estimate sensitivity and duration, accounting for differences between the trials in demography, background incidence, and trial design. We tested 3 hypotheses for FOBT sensitivity: sensitivity is the same for all preclinical CRC stages, sensitivity increases with each stage, and sensitivity is higher for the stage in which the cancer would have been diagnosed in the absence of screening than for earlier stages. Goodness-of-fit was evaluated by comparing expected and observed rates of screen-detected and interval CRC.
The hypothesis with a higher sensitivity in the stage of clinical diagnosis gave the best fit. Under this hypothesis, sensitivity of FOBT was 51% in the stage of clinical diagnosis and 19% in earlier stages. The average duration of preclinical CRC was estimated at 6.7 years.
Colorectal cancer (CRC) is the second leading cause of cancer mortality in developed countries.1 Because prognosis for CRC is mainly related to the extent of tumor spread at the time of diagnosis, earlier presymptomatic diagnosis offers hope of mortality reduction. Three large randomized trials have conclusively shown that screening with the Hemoccult II fecal occult blood test (FOBT) can reduce CRC mortality by 11% to 33%.2-4
FOBT trials provide information on estimates of mortality reduction, as well as rates of screen-detected CRC, stage distribution of screen-detected CRC and interval cancers. This information can be used to obtain estimates of sensitivity of FOBT and sojourn time (ie, the duration of the preclinical screen-detectable cancer period). Sensitivity of FOBT screening has been estimated individually for each screening trial, but these estimates vary widely: 54% to 59% for the Nottingham trial,5 62% for the Funen trial,6 and 94% to 96% for the Minnesota trial.7 These differences can at least partly be explained by differences in estimation methods. Using different estimates of sensitivity, and of how it relates to sojourn time, to make predictions of CRC screening beyond the trial setting will lead to diverging conclusions concerning the (cost-) effectiveness of FOBT screening. This holds not only for the guaiac FOBT but also for new and more sensitive FOBTs, for which no randomized controlled trial results are available.
In this study, we used the MISCAN-Colon microsimulation model to estimate unrehydrated FOBT sensitivity and preclinical CRC duration simultaneously on the randomized controlled FOBT trials of Minnesota, Nottingham, and Funen. Although the methodology used is standard (we simulated the trials and evaluated which values of sensitivity and duration bring the expected [ie, simulated] outcomes closest to the observed),8, 9 the distinctive feature of this analysis is that we simulated 3 trial populations instead of 1. In addition to the usual hypotheses for which FOBT sensitivity is the same for all CRC stages or increases with stage, we also evaluated a novel hypothesis for which sensitivity is linked to the stage in which the cancer would have been diagnosed in the absence of screening. In the model, each clinical CRC diagnosis in a certain stage is preceded by a preclinical phase in the same stage. In the novel hypothesis, we assumed that sensitivity was higher in this preclinical stage than in the earlier stages.
MATERIALS AND METHODS
Table 1 contains an overview of the most important differences in trial design among the Minnesota, Nottingham, and Funen trials, which we accounted for.
Table 1. Overview of Differences in Design of 3 Large FOBT Screening Trials*
All 3 trials used the 6-slide Hemoccult II FOBT. FOBT indicates fecal occult blood test.
Screening was not performed in the period 1982 to 1986.
Only attending individuals were reinvited; from 1990, all were reinvited
Only attending individuals were reinvited
Unrehydrated, later rehydrated
4 or fewer slides positive: retest and eventually colonoscopy
5 or more positive: mainly colonoscopy
The Minnesota trial was originally designed to screen and follow participants from 1975 through 1982.10 In this period, 46,551 participants ages 50 to 80 years were recruited among volunteers in Minnesota. In February 1986, screening was reinstituted and continued through February 1992. Participants were randomly assigned to screening once a year, to screening once every 2 years, or to a control group. Participants in the 2 screening groups were each asked to collect 2 samples from 3 consecutive stools on a Hemoccult II FOBT-kit. The participants were instructed to abstain from dietary factors influencing the specificity of the test. Initially, the slides were processed unrehydrated; from 1977 onward, slides were rehydrated with a drop of deionized water to increase sensitivity. Persons with 1 or more slides testing positive were referred for diagnostic follow-up, mainly by colonoscopy. All persons alive without CRC were reinvited for screening after 1 year or 2 years, depending on the study arm. Controls were not invited for screening. Eighteen years after initiation, the study reported a 33% CRC mortality reduction in the annual arm and 21% in the biennial arm.4
From 1981 to February 1995, a total of 152,850 subjects from the area of Nottingham were randomly allocated to biennial FOBT screening or no screening (controls).2 Controls were not informed about the study. FOBTs were not rehydrated, and dietary restrictions were imposed only for retesting borderline results (4 or fewer positive slides). Screening-group participants with a positive test were offered full colonoscopy. Initially, only individuals who attended screening were invited to take part in further screening every 2 years. From 1990 onward, nonattenders were also reinvited. After 14 years, the study reported a 15% reduction in CRC mortality in the intervention group.
From 1985 to 2002, a total of 61,933 inhabitants of Funen, Denmark, ages 45 to 74 years were randomly allocated to either FOBT screening every 2 years or no intervention. Six-slide Hemoccult II blood tests (with similar dietary restrictions as in Minnesota but without rehydration) were sent to screening-group participants. Only participants who completed screening were invited for further rounds. Participants with positive tests were offered colonoscopy whenever possible. The reported mortality reduction in this study was 18% after 7 screening rounds.3
The MISCAN-Colon microsimulation model was developed at the Department of Public Health at Erasmus MC, Netherlands, in collaboration with the US National Cancer Institute and experts in the field of CRC to assess the effect of different interventions on CRC. A graphical representation of the natural history in the model is given in Figure 1. A detailed description and the data sources that inform the quantification of the model can be found in previous studies,11-13 and in a standardized model profile.14 In brief, the MISCAN-Colon model simulates the relevant biographies of a large population of individuals from birth to death, first without screening and, subsequently, the changes that would occur under the implementation of screening. CRC arises in this population from the development of adenomatous polyps that may progress to carcinoma.15, 16 More than 1 adenoma can occur in an individual and each can independently develop into CRC. Adenomas progress in size from small (≤5-mm) to medium (6-9 mm) to large (≥10-mm). Some of the adenomas eventually become malignant, transforming to a localized (Dukes A) cancer. The cancer can then progress through Dukes B and C stages to metastasized (Dukes D) cancer. In every stage, there is a chance of diagnosis of the cancer because of symptoms. The survival after clinical diagnosis depends on the stage in which the cancer was detected.
After the life history of an individual in the absence of screening is generated, the model simulates if and when screening interrupts the development of CRC in that same life history. With screening, adenomas are detected and removed and cancers are detected and treated earlier in time. The probability of detection of a certain lesion depends on the sensitivity of the test for the stage the lesion is in. Because the life history in the absence of screening is first simulated, the stage in which the cancer would have been diagnosed in the absence of screening is known in the model.
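The two-step logic described above (first generate the life history without screening, then overlay screening on that same history) can be illustrated with a minimal sketch. This is a toy illustration, not the MISCAN-Colon implementation: the function names, the exponential dwell times per stage, and the per-stage diagnosis probabilities are assumptions introduced here, and adenomas, survival, and attendance are left out entirely.

```python
import random

STAGES = ["A", "B", "C", "D"]  # Dukes stages, in order of progression


def simulate_history(rng, mean_duration, p_diagnosis):
    """Simulate one preclinical cancer in the absence of screening.

    Returns (clinical_stage, intervals), where intervals is a list of
    (stage, start, end) in years since cancer onset. Dwell times are drawn
    from exponential distributions (an assumption of this sketch).
    """
    t = 0.0
    intervals = []
    for stage in STAGES:
        dwell = rng.expovariate(1.0 / mean_duration[stage])
        intervals.append((stage, t, t + dwell))
        t += dwell
        # At the end of each stage the cancer is either diagnosed clinically
        # or progresses; the last stage always ends in clinical diagnosis.
        if stage == STAGES[-1] or rng.random() < p_diagnosis[stage]:
            return stage, intervals


def apply_screening(rng, clinical_stage, intervals, screen_times,
                    sens_final, sens_earlier):
    """Overlay screening on an already simulated history.

    Because the unscreened history is known, the stage of clinical diagnosis
    is known too, so test sensitivity can depend on it.
    Returns (time, stage) of screen detection, or None.
    """
    for t_screen in sorted(screen_times):
        for stage, start, end in intervals:
            if start <= t_screen < end:
                sens = sens_final if stage == clinical_stage else sens_earlier
                if rng.random() < sens:
                    return t_screen, stage
    return None


if __name__ == "__main__":
    rng = random.Random(1)
    mean_duration = {"A": 2.5, "B": 2.5, "C": 3.7, "D": 1.5}  # illustrative
    p_diagnosis = {"A": 0.2, "B": 0.4, "C": 0.6, "D": 1.0}    # hypothetical
    stage, intervals = simulate_history(rng, mean_duration, p_diagnosis)
    print(stage, apply_screening(rng, stage, intervals, [1, 3, 5], 0.51, 0.19))
```

Keeping the unscreened history as an explicit input to the screening step is what makes the stage of clinical diagnosis available when the test result is drawn.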
The model as quantified for the general US population11, 13 served as the basis of this analysis. The model was the same for each trial with respect to the natural history of disease and FOBT sensitivity, but differed with respect to trial-specific characteristics such as the age distribution of the eligible population, the attendance pattern, and CRC risk. Table 2 contains an overview of model parameters that were adjusted to trial-specific characteristics. We assumed that differences in CRC incidence between the general US population and the control groups in the 3 trials were caused by differences in adenoma onset, and we adjusted the adenoma risk parameter accordingly (Table 2). Also, the probability of clinical diagnosis for each CRC stage was varied between the trials, reflecting differences in stage distribution of CRC in the control groups. Screening ages, invitation protocol, and compliance with screening and follow-up of positive test results were explicitly modeled in each population according to what was observed in each of the corresponding trials. As observed in the trials in first and consecutive rounds, not all invited individuals attend screening in the model. Each invited individual has a certain probability of attending the first screening. For consecutive screenings, previous attenders have a higher probability of attending the next screening round than nonattenders. The adenoma risk in the nonattenders was adjusted to reproduce observed CRC incidence in this group in each trial. Because, based on randomization, the CRC risk in the total intervention group should on average match that of the control group, the attenders were left with a correspondingly lower adenoma risk. Because of the difference in dietary restrictions between the trials, specificity of FOBT was allowed to vary among the 3 trials. With this complete set of adjustments, simulated incidence and stage distribution of the control group were within 1% of observed for all 3 trials (data not shown).
Table 2. MISCAN-Colon Model Parameters as Adjusted Specifically to the Trials
We assessed 3 different hypotheses for FOBT sensitivity. Hypothesis A: Sensitivity of FOBT is the same for all 4 preclinical cancer stages (1 parameter). Hypothesis B: Sensitivity of FOBT increases with each preclinical cancer stage (4 parameters). Hypothesis C: Sensitivity of FOBT is higher in the stage in which the cancer would have been diagnosed in the absence of screening than in earlier stages (2 parameters). A total of 4 parameters for average duration were estimated, 1 for each preclinical CRC stage.
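The three hypotheses differ only in how test sensitivity is looked up for a preclinical cancer. A minimal sketch of that lookup follows; the function name and dictionary keys are hypothetical. The values for hypotheses A and C, and for Dukes A and D under hypothesis B, are the estimates reported in the Results; the split of the reported 35%-38% range between Dukes B and C is an assumption made here for illustration.

```python
STAGES = ("A", "B", "C", "D")  # preclinical Dukes stages

# Sensitivity parameters per hypothesis (unrehydrated FOBT).
SENS_A = {"all": 0.33}                               # 1 parameter
SENS_B = {"A": 0.13, "B": 0.35, "C": 0.38, "D": 0.66}  # 4 parameters; B/C split assumed
SENS_C = {"final": 0.51, "earlier": 0.19}            # 2 parameters


def sensitivity(hypothesis, stage, clinical_stage, params):
    """Return test sensitivity for a cancer currently in `stage`.

    `clinical_stage` is the stage in which the cancer would have been
    diagnosed without screening; only hypothesis C uses it.
    """
    if stage not in STAGES or clinical_stage not in STAGES:
        raise ValueError("unknown stage")
    if hypothesis == "A":
        return params["all"]
    if hypothesis == "B":
        return params[stage]
    if hypothesis == "C":
        return params["final"] if stage == clinical_stage else params["earlier"]
    raise ValueError("unknown hypothesis")


if __name__ == "__main__":
    # A Dukes B cancer that would surface clinically in Dukes C:
    for name, params in (("A", SENS_A), ("B", SENS_B), ("C", SENS_C)):
        print(name, sensitivity(name, "B", "C", params))
```

Under hypothesis C the same Dukes B lesion is assigned 19% or 51% depending on whether Dukes B is the stage in which it would have been diagnosed clinically, which is the feature that distinguishes it from hypotheses A and B.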
In the Minnesota trial, both unrehydrated and rehydrated FOBT were used. As part of the estimation procedure, we, therefore, also estimated sensitivity for rehydrated FOBT assuming the same hypotheses as for unrehydrated FOBT. Because the Nottingham and Funen trials did not rehydrate tests, rehydrated FOBT was not the focus of our analysis.
The sensitivity and duration parameters for each hypothesis were estimated by minimizing the difference between observed and expected trial outcomes. Trial outcomes used for estimation were as follows: 1) screen-detected cancers by screening round, 2) stage distribution of screen-detected cancers for first and consecutive screening rounds, and 3) interval cancers by years since negative screening. Because the trials differed in number of screening rounds and interval, the number of outcomes per trial was different. There were 26 outcomes for Minnesota, 15 for Nottingham, and 18 for Funen. The corresponding expected outcomes were generated per trial with the MISCAN-Colon microsimulation model. The significance of the difference between observed and expected outcomes was assessed by the following chi-square statistic:
$$\chi^2_{k,i} = \frac{\left(O_{k,i} - E_{k,i}\right)^2}{E_{k,i}}$$

where $E_{k,i}$ = expected number of CRC cases for outcome $i$ in trial $k$; $O_{k,i}$ = observed number of CRC cases for outcome $i$ in trial $k$.
The overall chi-square statistic of each hypothesis was calculated as the sum of the chi-square statistics of the individual outcomes. We assumed outcomes to be independent and uncorrelated. This overall chi-square statistic was minimized with an adaptation of the Nelder-and-Mead Simplex Method.8 The Nelder-and-Mead method is a common approach to estimating parameters with microsimulation models, because derivatives of equations of these models are often too complex to use Maximum-Likelihood approaches. The resulting chi-square statistic after estimation of the parameters was a measure of the goodness-of-fit of each hypothesis. The degrees of freedom of the chi-square statistic were equal to the total number of trial outcomes compared minus the number of parameters under the respective hypothesis. The chi-square statistics of hypotheses B and C could not be directly compared statistically because there is no hierarchical relationship between the hypotheses. We used the Akaike Information Criterion to compare these 2 hypotheses. We assumed the outcomes were Poisson distributed. The formula for the Akaike Information Criterion with Poisson distributed outcomes is:
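The goodness-of-fit bookkeeping described above can be sketched in a few lines. The sketch assumes the usual Pearson form, (O - E)^2 / E, for each outcome; the function names are hypothetical. The outcome counts per trial are those given in the text, and the example numbers are the aggregated hypothesis A figures quoted in the Results, used here only as sample input (the study summed over trial-specific outcomes, not over aggregates).

```python
# Number of trial outcomes compared, as stated in the text.
OUTCOMES_PER_TRIAL = {"Minnesota": 26, "Nottingham": 15, "Funen": 18}


def pearson_chi_square(observed, expected):
    """Sum of (O - E)^2 / E over outcomes, treating outcomes as independent."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(n_parameters):
    """Total number of outcomes compared minus the number of fitted parameters."""
    return sum(OUTCOMES_PER_TRIAL.values()) - n_parameters


if __name__ == "__main__":
    # First-round Dukes A screen-detected cancers and interval cancers in the
    # first 2 years (observed vs expected under hypothesis A, aggregated).
    observed = [116, 369]
    expected = [91, 432]
    print(round(pearson_chi_square(observed, expected), 2))
    print(degrees_of_freedom(8))  # 8 parameters are listed for hypothesis C
```

In the study this objective was minimized with an adaptation of the Nelder-Mead simplex method; a general-purpose optimizer could play that role in a sketch like this one, with each evaluation of the objective requiring a fresh run of the simulation.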
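$$\mathrm{AIC} = 2n - 2\sum_{k}\sum_{i}\left(O_{k,i}\ln E_{k,i} - E_{k,i}\right)$$

where $n$ = number of parameters; $E_{k,i}$ = expected number of CRC cases for outcome $i$ in trial $k$; $O_{k,i}$ = observed number of CRC cases for outcome $i$ in trial $k$. The term $\ln(O_{k,i}!)$ of the Poisson log-likelihood is omitted because it does not depend on the model.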
The Akaike Information Criterion is a standard tool for model selection, with the model having the lowest value being the best.
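For Poisson-distributed counts, the criterion follows from the standard definition AIC = 2n - 2 ln L. The sketch below is an illustration under that definition, not the authors' code; the function names and the `include_constant` flag are hypothetical. By default it drops the ln(O!) term, which is the same for every model fitted to the same data and therefore cancels in any comparison.

```python
import math


def poisson_log_likelihood(observed, expected, include_constant=False):
    """Poisson log-likelihood summed over independent outcomes.

    With include_constant=False the ln(O!) term is dropped; it does not
    depend on the expected values, so model comparisons are unaffected.
    """
    total = 0.0
    for o, e in zip(observed, expected):
        total += o * math.log(e) - e
        if include_constant:
            total -= math.lgamma(o + 1)  # ln(o!)
    return total


def aic(n_parameters, observed, expected, include_constant=False):
    """Akaike Information Criterion: 2n - 2 ln L (lower is better)."""
    return 2 * n_parameters - 2 * poisson_log_likelihood(
        observed, expected, include_constant
    )


if __name__ == "__main__":
    observed = [116, 369]
    model_1 = aic(6, observed, [91, 432])   # expected counts quoted for hypothesis A
    model_2 = aic(8, observed, [110, 380])  # hypothetical expected counts
    print(model_1 > model_2)  # the better-fitting model wins despite 2 extra parameters
```

Dropping the constant makes the criterion negative when counts are large, so only differences between models carry meaning, not the absolute values.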
We also derived conditional confidence intervals around the estimated parameters. We determined to what values we could change each of the estimated parameters without significantly worsening the goodness-of-fit of the model. The values closest to the estimated parameter at which the goodness-of-fit of the model significantly worsened (P = .05) constituted the boundaries of the confidence interval.
Sensitivity and Duration
Table 3 shows the estimates for sensitivity and duration. Assuming the same sensitivity of FOBT for all preclinical CRC stages resulted in shorter duration of Dukes A and B (1.6 and 2.1 years) than in Dukes C and D (4.0 and 3.2 years), due to higher detection rates in later stages than in earlier ones. With these durations, it took on average 6.0 years for a preclinical cancer to become clinically diagnosed. The estimated sensitivity of FOBT under this hypothesis was 33%. Assuming a higher sensitivity of FOBT with each Dukes stage resulted in a longer duration for Dukes A and C (3.8 and 3.6 years, respectively) compared with Dukes B and D (2.4 and 2.1 years). The average duration of preclinical CRC was 8.0 years. The sensitivity of FOBT is comparable for Dukes B and C disease (35%-38%), and lower for Dukes A (13%), and higher for Dukes D (66%). Assuming a higher sensitivity of FOBT in the stage of clinical diagnosis, Dukes C has longer duration (3.7 years) than the other 3 stages (2.5 years for Dukes A and B and 1.5 years for Dukes D). The average duration of preclinical CRC is 6.7 years. Sensitivity is considerably higher in the stage of clinical diagnosis than in earlier stages (51% vs 19%).
Table 3. Estimated Values (Confidence Interval) for Sensitivity of FOBT and Duration of Preclinical CRC for 3 Sensitivity hypotheses
FOBT indicates fecal occult blood test; CRC, colorectal cancer; Hypothesis A, same sensitivity of FOBT for all cancer stages; Hypothesis B, sensitivity increases with each cancer stage; Hypothesis C, sensitivity of FOBT different in stage of clinical diagnosis than in earlier stages.
Calculated as (% in stage A×duration A)+(% in stage B×duration A+B)+(% in stage C×duration A+B+C)+(% in stage D×duration A+B+C+D).
For the Minnesota trial, sensitivities of rehydrated FOBT were as follows: 28% for Hypothesis A; 10% Dukes A, 26% Dukes B, 56% Dukes C and 63% Dukes D for Hypothesis B; 55% stage of clinical diagnosis, 10% earlier stages for Hypothesis C.
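The footnote formula for the average preclinical duration weights the cumulative duration up to each stage by the share of cancers diagnosed clinically in that stage. A minimal sketch, with a hypothetical function name: the stage durations are the hypothesis C estimates from the text, whereas the stage shares are invented for illustration, so the result is not expected to reproduce the reported 6.7 years.

```python
def average_preclinical_duration(stage_share, stage_duration,
                                 stages=("A", "B", "C", "D")):
    """Sum over stages of (share diagnosed in stage) x (cumulative duration)."""
    total = 0.0
    cumulative = 0.0
    for stage in stages:
        cumulative += stage_duration[stage]      # duration A, A+B, A+B+C, ...
        total += stage_share[stage] * cumulative
    return total


if __name__ == "__main__":
    duration = {"A": 2.5, "B": 2.5, "C": 3.7, "D": 1.5}  # years, hypothesis C
    share = {"A": 0.10, "B": 0.35, "C": 0.30, "D": 0.25}  # hypothetical shares
    print(round(average_preclinical_duration(share, duration), 2))
```

Because later stages carry the accumulated time of all earlier ones, the average depends on the clinical stage distribution as well as on the stage durations.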
Table 4 shows observed and expected detection and interval cancer rates aggregated for the 3 FOBT trials and the associated goodness-of-fit for each hypothesis. For hypothesis A, the expected outcomes differed significantly from observed (P < .01). This was mainly due to a significantly lower number of expected screen-detected cancers in Dukes A (first round, 91 expected vs 116 observed), and a significantly higher rate of interval cancers in the first 2 years after screening (432 expected vs 369 observed). For hypothesis B, the expected outcomes also differed significantly from observed (P < .01). Four expected outcomes under this hypothesis were different from observed: as with hypothesis A, the expected number of first round screen-detected cancer cases in Dukes A was lower than observed (93 vs 116) and the number of interval cancers was higher than observed (19 vs 369). Moreover, the observed number of screen-detected cancer cases in consecutive screen rounds was 543, where 492 were expected and the observed cases in stage B were 157, where 132 were expected. Hypothesis C had the lowest chi-square statistic ($\chi^2_{51} = 73$) (Table 4). Although none of the expected outcomes aggregated over the 3 trials differed significantly from observed under hypothesis C, summed together, the outcomes significantly differed (P = .02). Nonetheless, hypothesis C was significantly better than hypothesis A (P < .01), whereas hypothesis B was not significantly better than hypothesis A (P = .48). Finally, hypothesis C had a better goodness-of-fit than hypothesis B with fewer parameters. This finding was also reflected in the Akaike Information Criterion, which was −10,582 for hypothesis C, better than the −10,562 for hypothesis B.
Table 4. Observed and Expected Screen-detected CRC, Stage Distribution of Screen-detected Cancers by Phase for First and Consecutive Rounds and Interval Cancers and Chi-Square Statistic for 3 Hypotheses for FOBT Sensitivity, 3 Trials Aggregated
Hypothesis A (6 parameters)
Hypothesis B (12 parameters)
Hypothesis C (8 parameters)
CRC indicates colorectal cancer; FOBT, fecal occult blood test; Hypothesis A, same sensitivity for all cancer stages; Hypothesis B, sensitivity increases with each cancer stage; Hypothesis C, sensitivity is higher in stage of clinical diagnosis.
Expected outcome significantly different from observed (P < 0.01).
Expected outcome significantly different from observed (P < 0.05).
Comparison of detailed trial-specific results (results not shown)
Under hypothesis C, 5 expected trial-specific outcomes differed significantly from observed: the expected interval cancer rate in the first year after screening in the Minnesota trial; the expected number of screen-detected cases in the first screening round in the Nottingham trial; and the number of screen-detected cases in the first screening round, the number of screen-detected cases in the second round, and the percentage of screen-detected cases in Dukes B in the Funen trial. In addition to these outcomes, there were 3 other significant differences under hypotheses A and B: the expected rate of interval cancers in the second year after screening in the Minnesota trial, the interval cancers after the first screening round in the Nottingham trial, and the screen-detected cancers in the seventh round in the Funen trial.
We have fitted sensitivity and duration for 3 different sensitivity models to the Minnesota, Nottingham, and Funen trial results. We found that the hypothesis in which sensitivity of FOBT is highest in the stage in which the cancer would have been clinically diagnosed in the absence of screening gave the best fit, with an estimate of 51%. In earlier stages, estimated sensitivity was 19%. The mean preclinical CRC duration was estimated at 6.7 years.
The hypothesis that sensitivity of FOBT is highest in the stage of clinical diagnosis was best for 3 reasons. First, it gave the best statistical fit to observed trial outcomes (although differences in goodness-of-fit between the hypotheses are small). Second, it is also biologically the most plausible one, because tumor bleeding resulting in (macroscopic) detection of blood in stool is often the symptom leading to clinical detection of CRC. Approximately 34% to 58% of CRC patients present with rectal bleeding.17-20 It is very plausible that occult bleeding precedes macroscopic bleeding and, thus, that sensitivity of FOBT depends on time to clinical diagnosis. Interestingly, the range of cancers that present with bleeding compares well with our sensitivity estimate of 51%. Third, this hypothesis is able to explain the discrepancy between the high FOBT sensitivity estimates based on trial results (54%-96%)5-7 and the low estimates based on back-to-back studies with colonoscopy (11%-50%).21-26 With a 1- to 2-year screening interval, trials mainly estimate sensitivity in the last phase of cancer progression, that is, the stage before diagnosis in the absence of screening. Our sensitivity estimate of 51% for this phase is in line with the individual estimates by the investigators of the Nottingham and Funen trials.5, 6 Colonoscopy is sensitive for all stages of CRC and showed that FOBT detects a much smaller proportion of all CRC. The weighted average of our sensitivity in stage of clinical diagnosis and our sensitivity in earlier stages of 32% is in line with that observation.
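As a back-of-envelope check on the 32% weighted average quoted above (not a calculation reported in the study), one can solve for the weight that the stage of clinical diagnosis would need to carry, given the two estimates of 51% and 19%. The symbol $w$ is introduced here for illustration only.

```latex
\begin{align*}
0.51\,w + 0.19\,(1 - w) &= 0.32 \\
w &= \frac{0.32 - 0.19}{0.51 - 0.19} = \frac{0.13}{0.32} \approx 0.41
\end{align*}
```

That is, roughly 40% of the weight would fall on the stage of clinical diagnosis; how the weights were actually defined is not stated in this section.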
In all 3 trials, the observed stage distribution in repeat screening rounds is less favorable than the stage distribution in the first screening round, whereas for all 3 hypotheses, this is predicted to be the other way around. This discrepancy can be explained by assuming the presence of occult bleeding indolent cancers (ie, early-stage cancers never progressing or giving symptoms), especially in stage A. These indolent cancers would be detected during first screening, allowing for many early stage cancers in the first screening round. At consecutive screening rounds, these cancers would no longer be present, so that then fewer early-stage cancers are detected. This would be adding a considerable amount of length-biased sampling. With the current assumption of an exponential distribution, there already is a considerable variability in the duration of CRC and, therefore, amount of length-biased sampling accounted for in the model, but modeling indolent cancers would further increase length-biased sampling. This would potentially further improve the fit of the model, not only for the favorable stage distribution in first screenings but potentially also regarding the sensitivity of rehydrated FOBT. Currently, our estimate for rehydrated FOBT in stages before the stage of clinical detection is lower than for unrehydrated FOBT. Several studies have shown that rehydration of FOBT slides increases sensitivity.10, 27-30 Rehydration of FOBT slides was mainly done in the second phase of the Minnesota trial with only follow-up screening rounds. Because the modeled detection rates in follow-up rounds, and thus in this phase, are higher than observed, the estimated sensitivity for rehydrated FOBT needed to be low to compensate. With indolent cancers, the detection rates at consecutive screenings would be lower and, consequently, the estimate for rehydrated FOBT sensitivity higher.
Dividing FOBT sensitivity into a phase with low sensitivity and a phase with high sensitivity is a novel way of describing the occult blood detection process. Despite its plausibility, this hypothesis was never tested, maybe because it cannot be observed in studies (time of clinical manifestation of a disease is not known), or estimated through classic sensitivity estimation. With microsimulation, time of clinical manifestation is pseudo-observed and, therefore, sensitivity of the test can be varied accordingly. But up to now, microsimulation models have assigned a certain sensitivity of FOBT for preclinical CRC stages, regardless of when individual cancers become clinical.31 In these models, sensitivity was not varied at all between stages (our hypothesis A).
Our improved estimates can be used to better extrapolate the trial results to newer and more sensitive FOBTs, for which no randomized controlled trial results are available. Because these tests have higher sensitivity, one could argue that the screening interval could be lengthened with these tests. However, the mechanism of detection of occult blood is the same for these tests, so it is likely that these more sensitive tests are also mainly sensitive for lesions shortly before clinical diagnosis. Therefore, even with a higher sensitivity, it will remain important to screen with FOBT frequently. Our results also have implications for endoscopy screening. Although the focus of endoscopy is often on detection and treatment of precancerous adenomas, this analysis underscores the effectiveness gained by detecting cancers at a (very) early stage. A longer preclinical CRC duration improves the efficacy of endoscopy screening. Altogether, the improved model will be better suited to comparing (newer) FOBT testing with endoscopy screening. To test the 6.7-year dwell time for preclinical cancer as estimated here, the CRC detection rates of endoscopy together with incidence in the control group are required.
In conclusion, the results of the Minnesota, Nottingham, and Funen trials were best explained by the hypothesis that FOBT becomes more sensitive shortly before clinical diagnosis. The total preclinical cancer duration was estimated to be as long as 6.7 years. FOBT has only approximately 20% sensitivity for the majority of this period. Only for cancers in the stage in which they would have been diagnosed in the absence of screening (on average the last 2.5 years before diagnosis) does sensitivity rise to approximately 50%.
Conflict of Interest Disclosures
Funding was received from the European Commission (99/CAN/36,898) and the National Cancer Institute (U01 CA97426). All authors declare that they have no proprietary, financial, professional, or other personal interest of any nature in any product, service, and/or company that could be affected by the position presented in this manuscript. Rob Boer has participated since 1989 in the screening research group at the Department of Public Health of the Erasmus MC. He has been affiliated with RAND since 2000. Since 2007, he has been a Director of Evidence Based Strategies - Disease Modeling and Economic Evaluation at Pfizer Inc, which develops and sells various medicines for cancer and other diseases. This research and article were not funded or supported by Pfizer.