Tampere School of Public Health, University of Tampere, Tampere, Finland

Graduate Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan

Graduate Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Room 533, No. 17, Hsuchou Road, Taipei 100, Taiwan

With the widespread use of prostate cancer (Pca) screening with prostate-specific antigen testing, overdetection has gained increasing attention. We aimed to estimate the absolute risk of overdetection (RO) in Pca screening with various interscreening intervals and ages at start of screening. We estimated age-specific preclinical incidence rates (per 100,000 person-years) for progressive cancer (from 128 for age group 55–58 years to 774 for age group 67–71 years) and nonprogressive cancer (from 40 for age group 55–58 years to 66 for age group 67–71 years), the mean sojourn time (7.72 years) and the sensitivity (42.8% at the first screen and 59.8% at the second screen) by using a multistep epidemiological model with data from the Finnish randomized controlled trial. The overall number of screens for overdetection (NSO) was 29 (95% confidence interval (CI): 18, 48) for screenees aged 55–67 years, equivalent to 3.4 (95% CI: 2.1, 5.7) overdetected Pcas per 100 screenees. The NSO decreased from 63 (95% CI: 37, 109) at the first screen to 29 (95% CI: 18, 48) at the third screen and from 43 (95% CI: 36, 52) for age 55 years to 25 (95% CI: 8, 75) at age 67 years at the first screen. In conclusion, around 3.4 cases for every 100 screened men would be overdetected during three screen rounds (∼ 13 years of follow-up) in the Finnish randomized controlled trial. Elucidating the absolute RO under various scenarios contributes to evaluating the benefits and harms of Pca screening.

With the widespread use of prostate cancer (Pca) screening with prostate-specific antigen (PSA) testing, overdetection has gained increasing attention because treatment of indolent cases may do harm while providing little or no benefit.1 In autopsy studies on men who died without symptoms, 26–46% of men aged 50 years or older had latent Pcas.2–5 In the United States, the lifetime risk of developing a clinical Pca is ∼ 16%, whereas the lifetime risk of Pca death is only 3%.6 This suggests that a large proportion of diagnosed Pcas would not have progressed to an advanced stage during one's lifetime in the absence of screening.

Previous epidemiological studies suggest that 18–84% of screen-detected Pcas can be regarded as overdetection.7–16 Apart from differences in the definition of overdetection,17 such a large variation across studies depends not only on the individual temporal natural history of the disease and the sensitivity of the screening tool but also on screening policies regarding age at start and termination of screening and the interscreening interval. Very few studies have adopted a quantitative epidemiological model that first estimates the parameters pertaining to the inherent disease natural history and the detectability of screening and then assesses their influence on overdetection under a given screening policy, making use of information on the various detection methods of Pca available from population-based randomized controlled trials such as the two main series, the European Randomised Study of Screening for Prostate Cancer (ERSPC) and the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial.18, 19 In our study, we first estimated the transition parameters governing the natural course of Pca and those related to the sensitivity of the PSA test. On this basis, we developed a quantitative measure, the number of screens for overdetection (NSO), for estimating the absolute risk of harm caused by overdetection of Pca through screening. We used it to quantify the effect of age at start of screening and interscreening interval on overdetection, making allowance for the lead time related to disease natural history and for the sensitivity estimated from empirical data.

Methods

Study subjects

The data for estimating overdetection were obtained from the Finnish prostate cancer screening trial, which is the largest component of the ERSPC. During 1996–1999, 80,458 men aged 55, 59, 63 and 67 years were enrolled in a randomized controlled trial with 32,000 randomly allocated to the screen arm. Men in the screen arm were invited to screen up to three times with a 4-year interval until age 71 years. The remaining 48,458 men comprised the control group. In the screen arm, men with serum PSA of 4 ng/ml or higher were referred to diagnostic examinations consisting of a digital rectal examination, transrectal ultrasound and prostate biopsy. For men with serum PSA 3.0–3.9 ng/ml, an ancillary test was provided (digital rectal examination in 1996–1998 and a free/total PSA ratio with a cutoff of ≤0.16 from 1999 onward), and those with a suspicious finding were referred to diagnostic evaluation. The design and preliminary screening findings have been reported in detail elsewhere.17

In our study, we followed the men in the screen arm until the end of the second screen or the end of December 2005. Men in the control arm were followed until the end of the fourth year after randomization. The shorter follow-up for the control arm was chosen to minimize the extent of contamination, i.e., unorganized screening in the control arm. Follow-up in both groups ended at the common closing date, diagnosis of Pca, death or emigration, whichever came first. Table 1 shows the descriptive findings for 20,796 screenees at the first and subsequent screens and 48,285 participants in the control arm. A total of 290 interval cancers (defined as cancers among men with a previous negative screening test and cancers among men with a positive screening test that were detected more than 1 year after the biopsies) were identified by record linkage with the Finnish Cancer Registry during the first and the second interscreening intervals. An additional 286 Pcas were identified among the 8,707 men who had not been screened at all and 76 among the 3,095 men who attended the first screening but refused the second round. In the control arm, 893 Pcas occurred during the first 4-year period after randomization.

Table 1. The number of subjects in each detection mode in the Finnish Prostate Cancer Screening Trial, 1996–2005

Temporal natural history model for defining progressive and nonprogressive Pca and assumptions

We used two random variables, X and Y, to describe the age at entry into the preclinical detectable phase (PCDP) and the interval between entry into the PCDP and surfacing to the clinical phase (CP) of Pca, respectively (Fig. 1a), and used them to define progressive and nonprogressive Pca for the following analysis. The age at entry into the PCDP (X) is quantified by the incidence rates of progressive Pca (λ_{1}) and nonprogressive Pca (λ_{3}) (Fig. 1b). The sojourn time (Y) is governed by the transition rate from the PCDP to the CP (λ_{2} in Fig. 1b). We define progressive and nonprogressive Pca according to whether Y is infinite (∞). For those with potential to progress, the sojourn time is finite, i.e., P (Y < ∞|X < ∞). For those without potential to progress (indolent Pca), the sojourn time is infinite, i.e., P (Y = ∞|X < ∞). Under this definition, Pca detected through PSA screening consists of both progressive and nonprogressive Pca.

There are several assumptions pertaining to the construction of the temporal natural history model for progressive and nonprogressive Pca as defined above. First, we assume that the value of X must be finite (<∞). Those who died before entering the PCDP are defined as nonsusceptible subjects, who are distinct from subjects with nonprogressive Pca. The temporal natural history model for the preclinical incidence rates of progressive and nonprogressive Pca is therefore conditional on survival to the age of entering the PCDP. These nonsusceptible subjects are accounted for in the likelihood function as those lost to follow-up due to competing causes of death (see the part I of parameter estimation of Supporting Information). The risk for nonprogressive Pca is predicted by using a five-state Markov model that incorporates information on competing causes of death (see below). Second, for those who had the potential to develop progressive Pca and were detected in the PCDP but died from other causes before surfacing to the CP, we treat the date of death as noninformative censoring of the sojourn time distribution. Namely, the definition of overdetection in our study includes not only those without potential to progress to the CP (nonprogressive Pca) but also those with potentially progressive Pca who died from other causes before surfacing to the CP, assuming that their progression was censored independently of the sojourn time. These noninformatively censored cases are accounted for in the likelihood function (see the part I of parameter estimation of Supporting Information). Third, we assume that a nonprogressive Pca cannot become a progressive Pca. From a biological viewpoint, this assumption is not unreasonable. Even if such a transition were possible, such a hidden process could hardly be estimated from the current empirical screening data on the PSA test. In the light of this definition, we assume that screen-detected cases consist of both progressive and nonprogressive Pca, whereas the other detection modes (interval cancers, cancers among nonparticipants and cancers in the control group) identify only progressive Pca.

We further depict how the definition of progressive and nonprogressive Pca shown in Figure 1a relates to the temporal natural history of Pca and the sensitivity of the PSA test shown in Figure 1b, onto which the various detection methods of the Finnish population-based randomized controlled trial can be superimposed (screen-detected cancers, interval cancers, cancers in the nonparticipant group and cancers in the control arm). The middle box pertains to the temporal natural history of the disease. For progressive Pca, this is a two-step progression: entry into the PCDP at a given time (age) (i.e., X in Fig. 1a), quantified by the incidence rate (λ_{1}), followed by progression to the CP, quantified by the annual transition rate (λ_{2}). For nonprogressive Pca, it is a one-step transition, entry into the PCDP, quantified by λ_{3}. The inverse of λ_{2} is called the mean sojourn time (MST), the expected value of Y in Figure 1a for progressive Pca only. It should be noted that these natural pathways are hidden processes that cannot be observed in the absence of PSA screening. With the application of the PSA test, both progressive and nonprogressive Pca in the PCDP can be detected. The PSA-tested normal group in the upper panel of Figure 1b includes three latent states: truly free of Pca, false-negative nonprogressive Pca and false-negative progressive Pca. False-negative cases either stay in the PCDP until they are detected at the next screening round (subsequent screen pathway) or, for progressive Pca, surface to the CP as interval cancers before the next screen (interval cancer pathway). In addition to PSA-tested normal, the observed data (in rectangular boxes) at the bottom of Figure 1b are classified by two major detection methods, screen-detected Pca and clinically detected Pca. The former can be further divided into Pca detected at the first screen (prevalent screen-detected cases) and Pca detected at later screens (subsequent screen-detected cases). As both include progressive and nonprogressive Pca, they contribute to the estimation of the preclinical incidence rates of progressive and nonprogressive Pca. The latter include interval cancers, cancers among nonparticipants and cancers in the control arm. As seen in Figure 1b, because cancers detected through these three modes are found mainly due to clinical signs and symptoms, they follow the pathway of progressive Pca, in line with the assumption made above. Furthermore, because interval cancers consist of false-negative cases and Pca newly diagnosed after screening, this mode, in conjunction with screen-detected cases, plays a crucial role in estimating the annual transition rate from the PCDP to the CP (λ_{2}), the inverse of the MST. Cancers among nonparticipants render the estimates of the preclinical incidence rates and the MST less vulnerable to the selection bias that often arises when only data on attendees are used. Because of the randomized design, data on the control group are representative of progressive Pca in the underlying population and provide an unbiased estimate of the preclinical incidence rate of progressive Pca (λ_{1}), computed directly as the number of incident Pcas divided by person-years, assuming that the preclinical incidence rate equals the clinical incidence rate in the absence of overdetection, as in the control group (see the bottom panel of Table 1). In addition to the parameters of disease natural history, the detection rates are also determined by the sensitivity of the PSA test. Men with a positive PSA test but no subsequent biopsy, who either surfaced to the CP or returned for the next screen, were treated as normal at that screen. The former were modeled as interval cancers through the false-negative parameters, and the latter were treated as being in the normal state at the next screen.

Derivation of likelihood function

The likelihood functions were formulated in a way similar to the method of Chen et al.20 It can be clearly seen from Figure 1b that the data used for formulating the likelihood function under repeated periodical screening are characterized by a multivariate joint transition history. Suppose an individual has a screening history denoted by (state, transition time) as follows: [(free of Pca (state 1), age 45 years at the first screen), (free of Pca (state 1), 4 years between the first and the second screen) and (interval cancer (state 3), 18 months after the second screen)]. To deal with the repeated and correlated data resulting from periodical screening, we applied the Markov (memoryless) property, which assumes that the occurrence of interval cancer in this case depends only on the state at the second screen and is independent of the state at the first screen. This simplifies the joint likelihood function by decomposing the multivariate joint probability of the transition history, i.e., [(1→ 1, 45 years), (1→ 1, 4 years) and (1→ 3, 18 months)], into a product of three respective likelihood functions, i.e., probability (1→ 1, 45 years) × probability (1→ 1, 4 years) × probability (1→ 3, 18 months). The correlation between the preclinical incidence rate (λ_{1}) (duration of being free of Pca) and the MST was captured by the simultaneous estimation of λ_{1} and λ_{2}, the inverse of the MST. Because of this Markov property, an exponential distribution was applied to the duration of stay in each state. However, as such an assumption may be inappropriate given the increasing trend with age in preclinical progressive and nonprogressive Pca, piecewise (age-specific) estimates (i.e., a nonhomogeneous process; see the part I of parameter estimation of Supporting Information) were used to relax the assumption of constant preclinical incidence rates (λ_{1} and λ_{3}). The detailed algebra for the likelihood function based on different detection modes is given in the part I of parameter estimation of Supporting Information. Note that we illustrate the formulae for periodical screening to make the expressions explicit. Because our model is a continuous-time Markov model, the transition probability, P_{ij}(t), is a function of time t. The model can therefore accommodate irregular interscreening intervals by using a random variable, m_{r}, to denote the interval between the rth and (r + 1)th screens.
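As a rough illustration of the Markov decomposition described above, the sketch below multiplies the three transition probabilities for the example history. It is a minimal sketch under simplifying assumptions that do not correspond to the fitted model: a progressive three-state chain only (free of Pca → PCDP → CP), constant rates over the whole history, perfect test sensitivity and no competing mortality. The function name `transition_probs` is hypothetical, and the rates are the point estimates reported in the Results for the youngest age band.

```python
import math

def transition_probs(lam1, lam2, t):
    """Probabilities of being in states 1, 2, 3 after t years, starting in state 1.

    State 1 = free of Pca, 2 = PCDP, 3 = CP; constant rates lam1 (1 -> 2)
    and lam2 (2 -> 3), with lam1 != lam2. Illustrative simplification only.
    """
    p11 = math.exp(-lam1 * t)
    p12 = lam1 / (lam2 - lam1) * (math.exp(-lam1 * t) - math.exp(-lam2 * t))
    p13 = 1.0 - p11 - p12
    return p11, p12, p13

# Assumed inputs: point estimates quoted in the Results, held constant here.
LAM1 = 128 / 100_000   # preclinical incidence of progressive Pca per person-year (ages 55-58)
LAM2 = 0.1295          # PCDP -> CP transition rate per year

# Example history from the text: (1 -> 1, 45 years), (1 -> 1, 4 years), (1 -> 3, 18 months)
p_first = transition_probs(LAM1, LAM2, 45.0)[0]
p_second = transition_probs(LAM1, LAM2, 4.0)[0]
p_interval = transition_probs(LAM1, LAM2, 1.5)[2]

# Markov property: the joint probability factorizes into a product of three terms.
joint = p_first * p_second * p_interval
print(f"{p_first:.4f} x {p_second:.4f} x {p_interval:.6f} = {joint:.6f}")
```

In the full model the rates vary by age band and each term is further weighted by test sensitivity and competing mortality, so these numbers are illustrative only.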

Statistical analysis

The parameters of the temporal natural history model were estimated by constructing the likelihood functions based on the transition probabilities and data on the various detection methods as illustrated in Figure 1b. The detailed algebra is given in the part I of parameter estimation of Supporting Information. The E-M algorithm was further adopted to obtain the converged estimates and standard errors by iterating between the E-step, based on data from the control arm, and the M-step, based on data from the screened group. A detailed description of the E-M algorithm is also given in the part I of parameter estimation of Supporting Information. The corresponding standard error and 95% confidence interval (CI) for each parameter were derived from the inverse of the second derivative of the likelihood function given the maximum likelihood estimates.

On the basis of the estimated results, we predicted the risk of overdetection (RO), that is, the chance that a person will have an overdetected Pca in a given period, making allowance for the possibility that the person will die of competing causes of death other than Pca before being overdetected by screening (e.g., nonsusceptible subjects). The prediction was based on a five-state Markov model [Eq. (26) in the part II of RO of Supporting Information], with the transition rate of death borrowed from Statistics Finland. The extent of overdetection due to various screening policies with different ages at entry and interscreening intervals was evaluated by the RO and the NSO, the inverse of the RO. The NSO indicates the expected number of men screened to result in overdetection of one case, similar to the concept of the number needed to screen to avert a cancer death as well as the number needed to harm, which represents the number of subjects needed to be exposed to cause harm in one subject.21 The detailed algebra for NSO and RO is given in the part II of RO of Supporting Information. As there is no direct algebra for calculating the variance of the NSO, 95% confidence intervals of the NSO were obtained by using the jackknife bootstrap method, in which the NSO was recalculated from each set of parameters estimated after deleting one case at a time.22
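Because the NSO is defined as the inverse of the RO, point estimates convert directly between the two scales. The sketch below uses the overall RO reported in this paper (3.4%, 95% CI: 2.1, 5.7%). Inverting the RO confidence limits is only a consistency check assumed here for illustration, since the published interval for the NSO was obtained by the jackknife procedure; the function name `nso_from_ro` is hypothetical.

```python
def nso_from_ro(ro):
    """Number of screens for overdetection as the inverse of the risk of overdetection."""
    if not 0.0 < ro <= 1.0:
        raise ValueError("RO must be a probability in (0, 1]")
    return 1.0 / ro

# Overall RO after three screen rounds, expressed as proportions.
ro, ro_low, ro_high = 0.034, 0.021, 0.057

nso = nso_from_ro(ro)
# A larger risk corresponds to a smaller NSO, so the interval bounds swap.
nso_ci = (nso_from_ro(ro_high), nso_from_ro(ro_low))

print(f"NSO = {nso:.1f} (from RO limits: {nso_ci[0]:.1f}, {nso_ci[1]:.1f})")
```

Rounded to whole men, these values agree with the reported NSO of 29 (95% CI: 18, 48).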

Results

Table 1 shows the number of person-years, the number of Pcas and the empirical age-specific preclinical incidence rate of progressive Pca in the control group (the bottom panel). The estimated annual incidence rate of preclinical progressive cancer increased with age at first screen from 128 (95% CI: 99, 156) per 100,000 for age group 55–58 years to 774 (95% CI: 466, 1221) per 100,000 for age group 67–71 years, and the incidence rate of preclinical nonprogressive cancer increased from 40 (95% CI: 27, 54) per 100,000 for age group 55–58 years to 66 (95% CI: 30, 130) per 100,000 for age group 67–71 years (Table 2). The transition rate from the preclinical progressive Pca to the CP was estimated as 0.1295 (95% CI: 0.0931, 0.1660) per year, corresponding to an MST of 7.72 (95% CI: 6.03, 10.74) years for progressive Pca (after excluding the nonprogressive cases). The episode sensitivity was estimated as 42.8% (95% CI: 35.1, 50.6%) for the first screen and 59.8% (95% CI: 47.5, 72.0%) for the second screen.
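The relation between the transition rate and the MST can be checked with a few lines of arithmetic. The sketch below assumes, as stated in the Methods, that the MST is the inverse of λ_{2}, and that the confidence limits of the MST follow from inverting the limits of the rate; the variable names are illustrative.

```python
# Transition rate from the PCDP to the CP (per year) with its 95% CI, as reported above.
lam2, lam2_low, lam2_high = 0.1295, 0.0931, 0.1660

# Under an exponential sojourn time, the mean sojourn time is 1 / lam2.
mst = 1.0 / lam2
# Inverting a rate reverses the order of the confidence limits.
mst_ci = (1.0 / lam2_high, 1.0 / lam2_low)

print(f"MST = {mst:.2f} years (95% CI: {mst_ci[0]:.2f}, {mst_ci[1]:.2f})")
```

The result matches the reported MST to within rounding of the published rate limits.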

Table 2. Estimated results of preclinical incidence of progressive and nonprogressive prostate cancer, transition rates and sensitivity

Based on the above estimates, the predicted long-term risks of Pca in the screen arm by age at the first screen with and without the overdetected cases are shown in Figures 2a–2d. The risk of Pca increased sharply with every screen round (those first screened at 67 years had only two rounds; Fig. 2d). During the screening intervals, the risk increased slowly as interval cancers occurred after each round. Table 3 shows that the overall RO (shown as the difference between the two curves in Figs. 2a–2d) was 3.4% (95% CI: 2.1, 5.7%) and increased with each screening round, implying that overdetection occurs not only at the first screen but also at the subsequent screens. The NSO estimates for the overall group aged between 55 and 67 years following one, two and three rounds of screens were 63 (95% CI: 37, 109), 34 (95% CI: 20, 58) and 29 (95% CI: 18, 48), respectively. The effect of age at entry on NSO is comparable to that of the screen round (Table 3). For men aged 55 years at entry, the NSO estimates at the first, the second and the third screen were 104 (95% CI: 87, 124), 54 (95% CI: 46, 65) and 43 (95% CI: 36, 52), respectively, compared with 61 (95% CI: 40, 93) for men aged 59 years at first screen and 46 (95% CI: 23, 91) for those aged 63 years at entry.

Table 3. Risk of overdetection (RO) and number of screens for overdetection (NSO) in the Finnish Prostate Cancer Screening Trial, 1996–2005

As far as the adequacy of model assumption is concerned, the Pearson chi-square for the goodness-of-fit test was 67.2 with 61 degrees of freedom by various detection modes, indicating an adequate fit of the model (p = 0.27). The predicted cumulative risks of Pca in the screen arm and the control arm were also very close to the observed ones (Fig. 2e), further indicating a good fit of the model to the data from the Finnish randomized controlled trial.
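The reported p-value can be reproduced approximately from the chi-square statistic and its degrees of freedom. The sketch below is a hedged check, not the authors' computation: it uses the Wilson-Hilferty normal approximation to the chi-square distribution so that only the Python standard library is needed, and the function name is hypothetical. An exact tail probability would require a statistics package.

```python
import math

def chi2_upper_tail_wilson_hilferty(x, df):
    """Approximate P(chi-square with df degrees of freedom > x).

    Uses the Wilson-Hilferty cube-root transformation to a standard normal
    variate; this is an approximation, adequate for moderately large df.
    """
    mean = 1.0 - 2.0 / (9.0 * df)
    sd = math.sqrt(2.0 / (9.0 * df))
    z = ((x / df) ** (1.0 / 3.0) - mean) / sd
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

p_value = chi2_upper_tail_wilson_hilferty(67.2, 61)
print(f"approximate p = {p_value:.2f}")
```

A p-value well above 0.05 gives no evidence of lack of fit, in line with the interpretation given above.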

For a single screen, NSO decreased by half from 104 (95% CI: 87, 124) for those starting at 55 years of age to 48 (95% CI: 25, 92) for those starting at 60 years of age (Table 4). However, the difference between starting at age 60 and 65 years was small. Similarly, different interscreening intervals did not substantially affect the extent of overdetection. The NSO estimates were similar for 1- and 2-year interscreening interval, for example, 33 (95% CI: 26, 41) and 33 (95% CI: 27, 41) for starting screening at 55 years and stopping at 71 years of age. However, for longer interscreening intervals (4 and 8 years), the change of NSO estimates was more remarkable with an increase by a fifth (14–23%) from a 4-year to an 8-year interscreening interval.

Table 4. Risk of overdetection (RO) and number of screens for overdetection (NSO) by the combination of age of starting screening and interscreening interval

Discussion

By applying a multistate epidemiological model to data from the Finnish population-based randomized controlled trial, we first estimated the parameters governing the natural course of progressive and nonprogressive Pca and those related to the sensitivity of the PSA test. On this basis, we proposed an indicator, the NSO, to quantify the risk of overdiagnosis as a result of screening. Overall, the NSO after three screen rounds was estimated as 29 (95% CI: 18, 48), i.e., for every 29 men screened, one Pca case is detected that would not have been diagnosed in the absence of screening. Equivalently, ∼ 3.4% (95% CI: 2.1%, 5.7%) of the men screened with the PSA test would have a Pca diagnosis attributable to overdetection in a screening program with two or three rounds of screens. The NSO decreased (indicating more overdiagnosis) with age at the first screen. When comparing different hypothetical screening strategies, overdetection also increased when the interscreening interval was shortened from 8 to 4 years and when the starting age was raised from 55 to 60 years. However, overdetection increased only slightly when the interscreening interval was shortened to 2 years or less or when the starting age was raised beyond 60 years.

Although overdetection in Pca screening has been investigated in several earlier studies,7–16, 23 the NSO has been rarely addressed previously. Table 5 (Supporting Information) presents the results of NSO for previous studies. A high correlation between the NSO and the detection rate was found in these studies.7, 9, 10 Moreover, previous studies show a wide variation in detection rates from 0.5% in Italy7 to 7.5% for whites and 24.3% for blacks in the United States.9 The detection rate was affected by Pca incidence in the target population, age at screening and the characteristic of screening program including test sensitivity,24 biopsy procedures,25, 26 interscreening interval and screen rounds. This implies that the NSO is correlated with these characteristics.

Our analyses also demonstrate an association of NSO with age, interscreening interval and number of screen rounds. The lower NSO found in older subjects is mainly attributed to the gradual accumulation of nonprogressive Pca, which leads to a substantial pool of slowly growing tumors at older age. These findings were consistent with the previous studies.7, 10, 11, 14 The NSO also decreases with shorter interscreening interval, particularly when 8 years is shortened to 4 years, but less strongly when the interscreening interval is shorter than 2 years. This is related to the MST for progressive Pca and sensitivity for detecting both progressive and nonprogressive types.

Because sensitivity may depend on the time spent in the PCDP (a growing tumor generates more PSA and, later, clinical symptoms), we applied different sensitivity estimates to the prevalence and subsequent screens. Sensitivity was also considered as a function of age in our study, but the difference was not significant (data not shown). In our study, sensitivity refers to the full diagnostic process (both screening test and diagnostic assessment), i.e., episode sensitivity.27 Our estimate of sensitivity for the first screen (42.8%) was close to the estimated sensitivity of 48% from Hakama et al.27 The impact of sensitivity on overdetection was also examined (results not shown). Comparing the studies conducted by Auvinen et al.28 and Gann et al.,29 the test sensitivity of PSA testing decreased from 87 to 73% after removing the ancillary test for PSA values of 3–3.9 ng/ml, corresponding to a relative decrease in sensitivity of 16%. With this 16% decrease in sensitivity, the NSO increased slightly from 35 to 37.

The correlation between the incidence rate of the PCDP (λ_{1}) and the transition rate from the PCDP to the CP (λ_{2}) was taken into account by estimating the two parameters simultaneously, which accommodates the fact that early-onset Pca tends to be more aggressive. A positive correlation between the estimated λ_{1} and λ_{2} was observed (data not shown). It should be noted that the Markov property was used in our study to simplify the joint likelihood function rather than to assume independence between the two parameters.

There is one possible limitation of our model. Because of the inherent Markov property, we assume that the transition rates are constant with time, and the age effect is therefore incorporated in a proportional hazards regression form with a time-independent covariate. As age changes with time, the ideal way is to model the effect of age with a time-dependent covariate. Instead, we approximated such a time-dependent age effect by using piecewise estimates of the age-specific preclinical incidence rate in 4-year age bands (see the part I of parameter estimation of Supporting Information), because information on the effect of age is limited by the 4-year interscreening interval. This means that when age changes from the ith age band at first screen to the jth age band, the corresponding age-specific preclinical incidence rates are applied following the proportional hazards regression form of expressions (2) and (3) (see the part I of parameter estimation of Supporting Information). We believe that this approximation of the age effect by piecewise constant transition rates is reasonable because the duration of the trial is short compared to the time frame of the age effect on the natural history.

In conclusion, by using the NSO calculated from the transition parameters of the disease natural course and the sensitivity of the PSA test estimated from Finnish randomized controlled trial data, we estimated that, for every 100 screened men, 3.4 cases would be overdetected during three screen rounds (∼ 13 years of follow-up) in the Finnish trial. The impact on overdetection of age at start of screening, number of screen rounds and interscreening interval was also quantified. The method and findings on overdetection can inform screening policy makers in evaluating the benefits and harms of PSA screening.

Acknowledgements

The funders had no role in the conduct of the study; collection, management, analysis or interpretation of the data or preparation, review or approval of the manuscript.