In most Western countries, prostate cancer is now the second most frequent cause of cancer death in men, after lung cancer. Studies have shown that confined prostate cancer can be eradicated by either radiotherapy or radical prostatectomy and that in some instances watchful waiting may be offered.1 To determine whether screening for prostate cancer followed by appropriate treatment decreases prostate cancer-specific mortality, the European Randomised Screening for Prostate Cancer (ERSPC) trial was started.2 Since 1992, 8 centres in Europe (Finland, The Netherlands, Sweden, Italy, Spain, Belgium, Switzerland and France) have been recruiting men aged 50–74 years. In the same period, a trial with similar objectives was initiated in the United States, the Prostate, Lung, Colorectal and Ovary cancer screening (PLCO) trial.3 Initiatives in the United Kingdom have recently been stopped.
Before starting the ERSPC trial, sample size calculations were performed to estimate how many men needed to be included in the study to demonstrate an expected reduction in prostate cancer mortality due to screening. The target value of the power was 80–90%. For these calculations, assumptions had to be made about compliance, contamination and intervention effect.4, 5 Features specific to the ERSPC are the mix of screening centres in different countries with different underlying risks for prostate cancer and the differences in trial design.
To date, the screening centres have enrolled about 90% of the target number of subjects. It is now possible to estimate the expected power of the trial more accurately. Since prostate cancer mortality rates are decreasing in several countries,6–8 the power of the ongoing randomised trials, the differences they can show and the point in time at which they can show them are crucial questions. Prostate-specific antigen (PSA) testing is increasingly being done and is even advocated by some groups on the basis of observational, nonrandomised studies. Lack of knowledge about whether and when the randomised trials can give the answer is partially responsible for this situation. We present information that may justify a restrictive attitude in the coming years. The strength of pooling data from the ERSPC and PLCO trials is also quantified, as well as the role a possible United Kingdom trial could have had in the short run.
MATERIAL AND METHODS
The design and outline of planned evaluations of the PLCO and ERSPC trials have been previously described.2, 3 The PLCO trial has a strict joint protocol that is followed by all centres, while participants in the ERSPC have agreed on a common core protocol, which allows for variability in the target population and screening procedure. The core age group that is targeted by nearly all countries is 55–69 years old; the Swedish centre included men aged 51–66 at the time of the first invitation and the Finnish centres recruit men aged 55, 59, 63 or 67 at the time of the first invitation.
All centres represent either a randomised population-based or a volunteer-based sample. Randomised trials inviting participants to be screened are often population-based trials, which make use of population registers that identify the eligible trial population by age. Those eligible people are subsequently randomised to either the screening or the control arm of the trial (Finland, Italy, Sweden, France).
In trials based on volunteers, participants are required to give their informed consent before randomisation. Such trials are chosen where it is deemed unethical for an individual to be randomised and/or placed in a control group without consent (United States, The Netherlands), where it is likely that only a small proportion of those invited for screening would comply with the invitation (Belgium, Spain, Switzerland) or where a population register is lacking (United States). Although the volunteers may be biased with respect to the presence of prostate cancer, so that the resulting data may not be generalizable to the whole population, they may be considered representative of those who readily participate in screening. In the volunteer-based approach used at various European centres, all those eligible in the population registers are invited to volunteer. Local and/or national policies determine whether it is necessary to obtain informed consent. All centres apply computerised individual randomisation and have been approved by a local institutional (medical ethics) review board.
The numbers of subjects recruited to the trial centres in the age group 55–69 years, sorted by year and by 5-year age groups separately, were obtained from the screening centres (Table I). The Swiss and French centres will finish recruitment relatively late, so we used target recruitment numbers for these. For each country, we collected national age-specific mortality rates for the ages 55–85 years for the 3 years before the start of the trial in that country (specific country/year data can be obtained from the authors).9 If 5-year age-specific mortality rates were not available in the age groups above 74 years, mortality rates for the whole 74+ group were used and corrected to 5-year age-specific mortality rates by using the Dutch proportions in mortality rates for the different age groups.
Table I. Number of Subjects Enrolled in Screen (S) and Control (C) Group in Each Country Since the Start of the ERSPC Trial (Ages 55–69 at Entry)
1 Actually 31-12-1994; 2 Men eligible at time of screening; 3 Targeted recruitment.
The calculation consisted of 3 parts. First, we calculated the expected number of prostate cancer deaths in our study. Next, we tested whether the difference between the study and control groups in the expected number of prostate cancer deaths was significant, assuming a prostate cancer mortality reduction of 20% (baseline). Last, sensitivity analyses were performed on the mortality reduction, compliance and contamination rate assumptions.
The starting population is the number of subjects enrolled in the screen and control groups during the period 1992–2002. We assumed the age distribution to be constant in each recruitment year. This population will be followed until 2008 (10-year follow-up since 1999, first screens completed at the initial screening centres). During follow-up, the men will get older and some will die from other causes. To estimate how many subjects will be alive in the trial population after each year of follow-up, we applied the Dutch population life table for people aged 55–90 years to our starting population.10
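The attrition step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `survivors_by_year` and the annual death probabilities are hypothetical placeholders and are not taken from the Dutch life table.

```python
def survivors_by_year(n_start, start_age, death_prob_by_age, years):
    """Expected number of subjects alive at the start of each follow-up year.

    death_prob_by_age maps an age to the annual all-cause death probability.
    """
    alive = [float(n_start)]
    for k in range(years):
        q = death_prob_by_age[start_age + k]
        alive.append(alive[-1] * (1.0 - q))
    return alive


# Hypothetical annual death probabilities for ages 55-90 (illustrative only).
q = {age: 0.01 + 0.001 * (age - 55) for age in range(55, 91)}

# A hypothetical cohort of 10,000 men aged 60 at entry, followed for 10 years.
cohort = survivors_by_year(10_000, 60, q, 10)
for year, n in enumerate(cohort):
    print(f"year {year:2d}: {n:8.1f} expected alive")
```

In a fuller version, one such cohort would be built per centre, recruitment year and 5-year age group, and the results summed.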
By multiplying the number of subjects present in each year by the prostate cancer mortality rates for the specific countries, the expected number of prostate cancer deaths in each centre is calculated. However, since men with prostate cancer diagnosed before entry into the trial are excluded from our study, this method overestimates the expected prostate cancer deaths in each centre. To correct for this, we made use of the fact that prostate cancer mortality in a given year can be calculated from the incidence and survival rates of men with prostate cancer in the previous 25 years. We calculated for each year which part of the mortality arises from men with prostate cancer diagnosed before entry into the trial and which part arises from cases newly diagnosed after the start of the trial (Fig. 1).
Ten- or 5-year age- and disease-specific survival rates of prostate cancer were obtained from a Dutch cancer registry, from Norway and from SEER (Surveillance, Epidemiology and End Results) data, and 5-year age-specific incidence rates were obtained from The Netherlands Cancer Registry.11
In year 1, 16% of prostate cancer mortality is caused by new prostate cancer cases. Over the next 24 years, this percentage increases gradually until it reaches 100% in year 25. The largest increase is seen in the first 10 years. For each country, we applied these correction percentages to the previously calculated number of cancer deaths in each year.
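The correction can be illustrated with a short Python sketch. Only the 16% figure for year 1 comes from the text; the cohort sizes, mortality rates and the later correction fractions below are hypothetical, and `expected_pc_deaths` is an illustrative name rather than the authors' implementation.

```python
def expected_pc_deaths(alive_by_year, pc_rate_by_year, new_case_fraction):
    """Sum over follow-up years of: subjects alive x PC mortality rate x
    fraction of that mortality arising from cases diagnosed after entry."""
    return sum(
        n * m * f
        for n, m, f in zip(alive_by_year, pc_rate_by_year, new_case_fraction)
    )


# Hypothetical inputs for the first 3 follow-up years of one centre.
alive = [10_000, 9_850, 9_700]    # expected subjects alive
rates = [0.0010, 0.0011, 0.0012]  # PC deaths per person-year (illustrative)
fractions = [0.16, 0.30, 0.42]    # year 1 from the text; others hypothetical

uncorrected = expected_pc_deaths(alive, rates, [1.0, 1.0, 1.0])
corrected = expected_pc_deaths(alive, rates, fractions)
print(f"uncorrected: {uncorrected:.2f}, corrected: {corrected:.2f}")
```

Because the correction fraction is small in the early years, most of the expected deaths accrue late in follow-up.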
In our baseline assumptions, we used the compliance values as reported by the different centres: The Netherlands, 95%; Finland, 70%; Sweden, 60%; Italy, 70%; Belgium, 90%; Spain, 50%; Switzerland, 90%; France, 50%. We assumed a 20% prostate cancer (PC) mortality reduction (r = 0.20) among men who attend screening. In an ideal situation, people in the control group will not be screened for prostate cancer. The proportion of people in the control group who are nevertheless screened for prostate cancer is called the contamination percentage. We assumed a 20% contamination percentage for all countries.
The power (1-β) of our study is the probability of detecting a statistically significant difference in prostate cancer mortality if our assumptions about the intervention effect are true. We calculated the power by using a 1-sided significance test with significance level α = 0.05 (see Appendix).5
We calculated the power with different intervention effect assumptions from 20–50% and with different contamination percentages from 0–40%.
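A sensitivity grid of this kind can be sketched as follows. This is an illustration under stated assumptions, not the authors' calculation: it uses a normal approximation to the fraction of PC deaths falling in the screen arm, binomial standard errors based on the total expected deaths (an assumption of this sketch), a single pooled compliance of 70%, and hypothetical expected death counts `DS0` and `DC0`.

```python
from math import sqrt
from statistics import NormalDist


def trial_power(ds0, dc0, compliance, contamination, r, alpha=0.05):
    """One-sided power under a normal approximation (illustrative)."""
    ds1 = ds0 * (1 - compliance * r)      # screen-arm deaths with screening
    dc1 = dc0 * (1 - contamination * r)   # control-arm deaths with contamination
    p0 = ds0 / (ds0 + dc0)                # fraction in screen arm, no effect
    p1 = ds1 / (ds1 + dc1)                # fraction in screen arm, with effect
    n = ds1 + dc1                         # total expected deaths
    se0 = sqrt(p0 * (1 - p0) / n)         # assumed binomial standard errors
    se1 = sqrt(p1 * (1 - p1) / n)
    z = NormalDist().inv_cdf(1 - alpha)
    critical = p0 - z * se0               # reject H0 below this fraction
    return NormalDist().cdf((critical - p1) / se1)


# Hypothetical expected PC deaths without any screening effect.
DS0, DC0 = 800.0, 850.0

print("effect  cont=0%  cont=20%  cont=40%")
for r in (0.20, 0.30, 0.40, 0.50):
    row = [trial_power(DS0, DC0, 0.70, c, r) for c in (0.0, 0.2, 0.4)]
    print(f"{r:5.0%}  " + "  ".join(f"{p:7.2f}" for p in row))
```

With these inputs the power rises with the assumed effect and falls as contamination grows, which is the qualitative pattern the sensitivity analysis examines.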
We used Dutch data on population distribution to predict the decrease of the study group population through deaths from all causes during follow-up. However, the population distribution in other countries might differ from that in The Netherlands. To test whether this would influence our results, we calculated the power as if the hazard of mortality from all causes were 50% lower than in the Dutch data.
At the time these calculations were made, a new trial was being planned in the United Kingdom. The effect on the power of including the PLCO trial as well as the UK trial was estimated. Recruitment data from the PLCO trial were collected from the screening centres (40,000 subjects randomised 1:1 from 1992–1999). For the UK, expected numbers were 180,000 subjects randomised 1:2 between 2002 and 2003. For the UK trial, a compliance rate of 60% and a contamination rate of 20% were assumed. For the PLCO trial, we used 3 sets of assumptions. First, the contamination and mortality reduction assumptions from the ERSPC trial and a 95% compliance were used. A second set of assumptions considered contamination to be increased to 30% and the underlying prostate cancer mortality rates to be 50% lower, to correct for the fact that a substantial proportion of people had already had PSA testing before entering the trial,3 thereby possibly lowering the underlying risk of prostate cancer (death) in the trial cohort. A third set of assumptions related to higher intervention effects due to annual screening in the PLCO trial instead of 2-/4-yearly screening in the ERSPC trial (40% effect).
RESULTS
The total number of subjects randomised into the ERSPC trial until 2001 is 163,126; the final numbers are anticipated to be approximately 90,450 for the screen group and 99,150 for the control group (ages 55–69) (Table I). Not all centres started simultaneously. In Sweden, all subjects were randomised in 1994, whereas Finland and The Netherlands recruited most subjects in the period after 1994.
Figure 2 shows the power of the ERSPC trial for respective years. If a 20% mortality reduction and a contamination rate of 20% are assumed, the chance of getting results that show a significant decrease in prostate cancer (PC) mortality is 69% in 2008. Assuming a 25% intervention effect will result in an 86% power in 2008. With a 40% mortality reduction, a power of 90% will already be reached in 2003–2004.
Table II shows the results of the sensitivity analyses. With a 10% contamination percentage, the power increased from 69% to 81% (baseline assumption 20% reduction). A contamination rate of 40% resulted in a decrease of the power to 40%. The effect of 50% lower all-cause mortality on the power was small: an increase to 75% (baseline).
Table II. Effect of Different Contamination Rates and Assumed PC Mortality Reduction Due to Screening on the Power of the ERSPC Trial in the Year 2008 (column heading: PC mortality reduction)
With inclusion of the PLCO trial, the power increased from 69% to 89% in 2008 (when using ERSPC assumptions) (Fig. 3). With probably more realistic assumptions about contamination and prostate cancer mortality in the PLCO trial, combined analysis of the trials increases the power to 79% in 2008. The PLCO trial, however, has annual screens, so that higher powers of 87% (30% effect) to 92% (40% effect) may be reached. Inclusion of the UK trial would not have added much to the expected power in 2008.
Table III shows, with 2 ERSPC centres as an example, that pooling data from centres with compliance rates lower than 46–52% is expected to decrease the power of the trial as a whole.
Table III. Effect of Different Compliance Rates in the Study Arm in 2 ERSPC Centres on the Power of the ERSPC Trial in the Year 2008 (headings: ERSPC without Netherlands; ERSPC without France)
DISCUSSION
We have estimated the contribution that the 2 ongoing large-scale randomised controlled trials of screening for prostate cancer may make to proving a PC mortality reduction, using the number of subjects enrolled and the compliance and contamination rates observed thus far. A reasonable power is achieved if screening for PC leads to a 25% PC mortality reduction or if contamination is limited to 10%.
Is it reasonable to assume a higher prostate cancer mortality reduction than 20%? The figure of 20% that was chosen as the baseline assumption for the potential mortality reduction through screening was based on the meta-analysis presented by Adolfsson et al.12 Recent literature, however, suggests that mortality reductions higher than 20% are not unlikely. In the United States, the first country where PSA testing for early detection of prostate cancer was introduced, PC mortality declined from 1991 to 1996.13 In Canada, age-standardised prostate cancer mortality rates declined by 23% from 1991 to 1997.7 Observational data on population-based PSA testing in Innsbruck also suggest a higher prostate cancer mortality decline of 42%.8 Not all of these estimates provide evidence about the exact impact screening might have, but they do not refute the possibility of a relatively large effect.
Different assumptions on contamination had an important effect on power as well. The exact rate of contamination with PSA testing in the control group is not yet entirely known. In initial sample size considerations, a value of 10–15% was assumed. However, since the rate of PSA testing in the general population is increasing,14, 15 we assumed a rate of 20% for our baseline calculation. Contamination rates higher than 20% will result in a substantial decrease of the power, although we think this is less likely in the European setting than in the United States. PSA testing in the control arm as such may not satisfy a proper definition of “contamination” if abnormal values are not followed by appropriate biopsy.
It is therefore possible that the effect of contamination is overestimated in our model. It may be incorrect to assume that PSA testing outside the trial will have the same effect on PC mortality reduction as the regular PSA testing in the screen population. The number of cancers detected by opportunistic screening is the decisive parameter and it may be lower than in the trial. In the trial, a strict protocol is used to yield an optimal screening effect, while in a general practitioner practice less strict guidelines are used with often higher cut-off levels.
We kept mortality reduction constant for each follow-up year. It has been observed in breast cancer screening that the positive effects of screening do not occur immediately.16, 17 In the first year of prostate cancer screening, direct treatment mortality will have a relatively large negative effect, whereas the positive effects of screening will occur later during the follow-up period. Although it seems more appropriate to use specific mortality reductions for each follow-up year, this assumption does not have a large effect on power. In the first years after randomisation, prostate cancer mortality without intervention is very low.
As people need to give their written consent for participation in some centres of the trial, selection may occur. People may choose to be screened because they have symptoms of or risk factors for prostate cancer. It is also possible that men who give their consent care more about their health.18 This would result in a study group that is healthier than the general population. Since it is unknown to what extent these two forms of bias could influence the power, we did not make assumptions about them in our calculation.
The decline in prostate cancer mortality observed in several countries may partly be due to improvements in therapy. If so, this may impact the number of PC deaths in both trial arms in the ERSPC trial, but we feel it is not likely to introduce any substantial bias in our findings. A much more important question is 1-sided vs. 2-sided testing;5 we have used 1-sided testing, as agreed upon by the Data Monitoring Committee of the ERSPC trial for power calculation.
Power increases if data from the PLCO trial are pooled with those of the ERSPC trial. The contribution of a new trial to the power would be small before 2008. Although cooperation with the PLCO trial will have a positive effect on the power, it is important to consider whether it is possible and advisable to combine the 2 trials, which follow different protocols. Criteria should include correct randomisation, effective follow-up of positive screening results, complete follow-up and review of deaths. A meta-analysis (overview analysis) may be a second-best option.
The PLCO trial is based on a sample size calculation made in 1994.19 It was established that, to prove a 20% PC mortality reduction, the PLCO trial could reach a power of 0.90 with 74,000 subjects. However, this calculation was based on a higher age group than finally implemented (60–74 vs. 55–74), which overestimates the power. In addition, instead of calculating the effects of compliance and contamination separately, the assumption was made that the mortality reduction of 20% would already include the effects of compliance and contamination rates. In fact, an intervention effect of 34% was assumed. We anticipate a lower underlying risk of dying from prostate cancer in enrolled men, e.g., due to previous PSA testing, and a substantially higher contamination rate in the PLCO trial than in the ERSPC trial.
Data have become available from the small, randomised Quebec trial on prostate cancer screening,20 with approximately 31,000 men in the intervention arm and 15,000 in the control arm. Given the sample size calculations presented here, this trial can be seen as an informative but low-power trial. It is a randomised population-based trial with only a 23% acceptance rate of screens in the intervention arm (total 7,155 screens). The intention-to-screen analysis revealed no statistically significant effect on prostate cancer mortality.21 We show that pooling with centres that have less than roughly 50% compliance in the study arm decreases the power to show an effect.
We refrain here from discussing whether a 25% PC mortality reduction would eventually be large enough to justify a routine screening programme. In breast cancer screening, it is,22 but prostate cancer incidence is approximately half that of breast cancer, and the unfavourable side effects of primary treatment and the percentage of screen-detected patients not benefiting are likely to be much higher in prostate cancer screening. In that sense, a smaller difference in PC mortality between the 2 arms in 2008 may be quite conclusive in favour of not implementing nationwide screening.
With current numbers of subjects enrolled, the ERSPC trial has sufficient power to detect a significant difference in prostate cancer mortality between the 2 arms if the true reduction in mortality by screening is 25% or more, or if the true effect is 20% or more and contamination remains limited to 10%. These analyses will form the basis for interim evaluations by the Data Monitoring Committee of the ERSPC. Interim data reports, however, pose the problem of repeated significance testing. With successive repeated significance tests at the usual 0.05 significance level, the likelihood of a type I (false-positive) error increases. An obvious solution would be to require a smaller p-value for statistical significance at each interim test and/or to limit the total number of interim tests. An appropriate design is presently being discussed.
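The inflation of the type I error under repeated testing can be illustrated with a simple calculation. The sketch below assumes the tests are independent, which interim analyses of accumulating data are not, so it should be read as a rough upper-end illustration; the Bonferroni-style division of α is shown only as one simple example of "a smaller p-value at each test", not as the design under discussion.

```python
def familywise_error(alpha, k):
    """Chance of at least one false-positive result in k independent tests,
    each performed at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_alpha(alpha, k):
    """Per-test significance level that keeps the overall error below alpha."""
    return alpha / k


for k in (1, 2, 5, 10):
    unadjusted = familywise_error(0.05, k)
    adjusted = familywise_error(bonferroni_alpha(0.05, k), k)
    print(f"{k:2d} tests: unadjusted {unadjusted:.3f}, adjusted {adjusted:.3f}")
```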
As long as its benefit has not been established, PSA testing should not be disseminated through the control group of the trial. It would also cause serious problems for the power. Studies to determine the actual contamination rates are crucial and ongoing. If requirements for combining trials are met, a cooperation with the PLCO trial is recommended to improve power, so that the benefits of prostate cancer screening can be proven, possibly even earlier in time (2002–2006). Essential in pooling is that program performances of individual centres are adequate and comparable enough.23
If one believes PSA testing and early treatment is very effective and primarily responsible for decreases in prostate cancer mortality in some countries, the ERSPC trial is likely to conclusively show that within the next 5 years.
If one questions the efficacy of screening for prostate cancer, the 2 large-scale randomised controlled trials may have the power to show a (lower) effect around 2008.
ACKNOWLEDGEMENTS
We thank the principal investigators and data managers from the 8 individual screening centres in Europe, in particular V. Nelen, M. Roobol, J. Hugosson, A. Auvinen, S. Ciatto, A. Berengaer Sanchez, M. Kwiatkowski, A. Villers and Dr. Ph. Prorok (PLCO trial) for providing data.
APPENDIX
We first calculated the expected prostate cancer deaths in the screen arms of all 8 ERSPC centres as if no screening intervention took place. The outcome for all centres together is called Ds0 (number of prostate cancer deaths in the screen group without intervention). To calculate the expected number of deaths in the control group (Dc0), we used the number of cancer deaths in the screen group per centre and adjusted it for differences in size between the screen and control groups.
To calculate the expected numbers of prostate cancer deaths with the effect of the screening intervention, we used the following formulas.
Expected number of prostate cancer deaths in the screen group:
$$D_{s1} = D_{s0}\,(1 - P_s\,r)$$
Ds1: expected number of prostate cancer (PC) deaths in the screen group with an intervention effect; Ds0: expected number of PC deaths in the screen group without an intervention effect; Ps = compliance (the percentage of people in the screen group that actually receive one or more screening tests); r = expected reduction in prostate cancer mortality for people who attend screening.
We estimated the number of PC deaths in the control group, taking into account the contamination percentage, by using:
$$D_{c1} = D_{c0}\,(1 - P_c\,r)$$
Dc1: expected number of PC deaths in the control group with an intervention effect; Dc0: expected number of PC deaths in the control group without an intervention effect; Pc = expected contamination percentage; r = expected reduction in prostate cancer mortality by screening.
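As a small numerical illustration of how compliance and contamination dilute the effect, the sketch below assumes that the reduction r applies only to the attending fraction of the screen arm and to the contaminated fraction of the control arm. The compliance (70%), contamination (20%) and r (20%) values are baseline figures mentioned in the text; the death counts `DS0` and `DC0` and the function name are hypothetical.

```python
def deaths_with_screening(d0, exposed_fraction, r):
    """Expected PC deaths when a fraction of the arm is actually screened
    and screening reduces PC mortality by r among those screened."""
    return d0 * (1.0 - exposed_fraction * r)


DS0, DC0 = 800.0, 850.0   # hypothetical expected deaths without screening
P_S, P_C, R = 0.70, 0.20, 0.20

ds1 = deaths_with_screening(DS0, P_S, R)   # screen arm, diluted by compliance
dc1 = deaths_with_screening(DC0, P_C, R)   # control arm, reduced by contamination

# Relative reduction that would be visible between the arms.
observed_reduction = 1.0 - (ds1 / DS0) / (dc1 / DC0)
print(f"Ds1 = {ds1:.1f}, Dc1 = {dc1:.1f}, "
      f"observed reduction = {observed_reduction:.1%}")
```

Under these inputs a true 20% effect among screened men appears as a between-arm difference of only about 10%, which is why compliance and contamination weigh so heavily on power.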
Calculation of the power
A power calculation in a screening trial is based on the distribution of mortality rates in the 2 study arms.4 In our calculation, the control arm is also affected by contamination effects of screening. To take that into account, we used the proportions of deaths that are in the screened arm (p0 and p1) as our test statistic. This method is essentially the same as the standard method of calculating the power in a screening trial.
The expected numbers of PC deaths in situations with and without screening were compared, using the following formulas:
$$p_0 = \frac{D_{s0}}{D_{s0} + D_{c0}}, \qquad p_1 = \frac{D_{s1}}{D_{s1} + D_{c1}}$$
p0 is the fraction of PC deaths that occur in the screen arm in a situation without a screening effect. p1 is the corresponding fraction in a situation with screening under our baseline assumptions. Both values were assumed to be normally distributed, with standard errors: