Two articles in this issue of Cancer are representative of an evolving trend in research on breast cancer screening, and each raises unique issues regarding how we should view the role of screening in the control of breast cancer and the importance of continuing to address questions concerning early breast cancer detection.
The first trial of breast cancer screening, the Health Insurance Plan of Greater New York (HIP) study, was launched in 1963.1 Seven additional trials in Europe and Canada followed and have since concluded, the most recent being the National Breast Screening Studies I and II (NBSS), which were initiated in Canada in 1980.2, 3 One additional trial currently is underway, the Age Trial, which was established by the United Kingdom Coordinating Committee on Cancer Research (UKCCCR) in 1991 to address the efficacy of initiating breast cancer screening in women ages 40–41 years.4 An interim analysis is expected in the near future.5
During the 40 years since trials of breast cancer screening were initiated, strong scientific evidence supporting the value of the early detection of breast cancer has accumulated and served as the basis for health policies supporting regular screening. However, during this same period, there have been ongoing debates, with occasional public flare-ups, over various aspects of breast cancer screening, most notably questions regarding screening efficacy for women in their 40s,6, 7 concerns about harms associated with false-positive results,8 and concerns regarding the detection rate of ductal carcinoma in situ and the overtreatment of potentially nonprogressive lesions.9–11 Some researchers have even questioned the value of breast cancer screening for any woman,12 despite the abundance of solid evidence and the conclusions of expert groups that screening effectively advances the time of breast cancer diagnosis and reduces mortality.13–17
Although the trials have made important contributions, their data, individually and collectively, have been limited to varying degrees in their ability to answer both enduring and new questions concerning the efficacy and effectiveness of mammography. Each randomized clinical trial followed a somewhat different protocol, and the outcome in each was influenced by a number of factors that have important implications for the interpretation of study results, including study methodology, the screening interval and number of views, participation rates in the study group (compliance), screening rates in the control group (contamination), and the number of screening rounds before an invitation to screening was extended to the control group. Other factors also likely influenced trial results, including the quality of the screening process, thresholds for a positive test, and follow-up for women with an abnormality. These limitations are most evident when considering the question of screening efficacy for women ages 40–49 years, primarily due to the eventual realization that wide screening intervals (i.e., ≥ 24 months) were less effective for premenopausal women. Until recently, inference, more than unambiguous evidence, was the basis for drawing conclusions regarding the effectiveness of screening in that group. The absence of solid data from which to draw evidence-based conclusions meant many years of debate and many additional years of follow-up to increase statistical power. At this point, individual trials, meta-analyses, and long-term evaluation of ongoing screening programs have led expert groups to conclude that screening is beneficial after age 40 years.17–19
The limitations of the existing randomized controlled trials in estimating the true benefit of modern mammography, and interest in measuring the contribution of screening programs to the reduction of breast cancer mortality, have resulted in new investigations focused on evaluating the impact of screening in the community setting, referred to as service screening. If surveillance systems are adequate, the evaluation of service screening can focus on mortality reductions in an entire population, which approximates an intent-to-treat analysis, and also on mortality reductions in women who actually participate in screening. The evaluation of service screening also can be applied to estimate differences in mortality over time due to screening compared with improvements in therapy and increased awareness, although establishing the relative contributions of screening and nonscreening factors is complex and only indirect estimates are possible.20–22 However interesting this question might be, it should not lead to competition between therapists and screeners over which group makes the greater contribution to reducing breast cancer mortality. Duffy et al. described the unique methodologic challenges in the evaluation of screening in the population.19 These challenges are not trivial, and include not only distinguishing screened from unscreened cohorts in the population but also adjusting end results for the influence of lead-time and length bias.
An appreciation of the difficulty of measuring the contribution of service screening is important because evaluation of population trends without attention to these important details may lead to unwarranted conclusions that mammography is not as effective in the service screening arena as it was in the trials.12, 23, 24 In fact, recent evaluations of service screening programs have shown that many programs are matching or exceeding the performance of the trials.19, 21, 22 Ongoing surveillance programs hold great potential today for providing data with which to answer questions concerning ways in which breast cancer screening can be made more effective and more cost-effective, and a growing global effort currently is underway to organize researchers to answer these questions.25, 26
A good example of the kinds of studies that we can continue to anticipate from trial data, especially questions that depend on long-term follow-up, can be found in the study by Warwick et al. in this issue of Cancer.27 Using data from the Swedish Two-County Trial, Warwick et al. have shown that the traditional prognostic factors for breast cancer (i.e., tumor grade, lymph node status, and tumor size) appear to have a lasting influence on survival 15–20 years after diagnosis. As the authors note, these findings are important because the prevailing assumption has been that tumor characteristics primarily are influential in the near term, and after 10 years are largely inconsequential. Although Warwick et al. have shown that the influence of tumor characteristics is attenuated over time, the effects do not disappear entirely and still demonstrate a statistically significant influence after 20 years. What are the implications of these findings? Attenuation of the effect of these prognostic factors over time is to be expected, as the authors have noted, because individuals with very advanced disease are at greater risk of death in the early years after diagnosis. However, although long-term survival data demonstrate that most of the mortality occurs in the first 10 years after diagnosis, small declines in survival continue after 10 years.28 In the years since women in the Two-County Trial were diagnosed, surgeons and oncologists have developed a greater appreciation for the importance of long-term follow-up based on tumor characteristics at the time of diagnosis, and an appreciation for the limits of current therapy has led to investigations to evaluate alternative hormonal treatments.
Recent findings, such as those from the MA17 trial,29 are encouraging and may hold the potential for further reductions in breast cancer mortality, although they also suggest that some women eventually may receive sequential adjuvant therapy for many years longer than is currently recommended. Nevertheless, although the findings of the study by Warwick et al. reinforce the importance of continued progress in the development of therapeutic strategies for women with advanced breast tumors at the time of diagnosis, the more important finding for long-term prognosis and quality of life is that an early stage of disease at the time of diagnosis still is protective against breast cancer mortality after 20 years. Twenty years after diagnosis, a woman with a small, lymph node-negative tumor has a lower risk of dying of breast cancer than a woman diagnosed with advanced disease.
Ernst et al. compared breast cancer stage distribution, treatment, and survival in the south of the Netherlands both prior to the initiation of a program of biennial screening for women ages 50–69 years (1985–1991) and after the introduction of the program (1992–1999).30 They observed a significant improvement in stage distribution, prognosis, and survival in the age group invited to undergo screening, benefits that were most pronounced in those women with screen-detected tumors compared with women who were nonparticipants, women who skipped screening rounds, or those who were not yet invited to undergo screening. No improvement in survival was observed for women age < 50 years or those age ≥ 70 years, neither of which groups was invited to screening.
Although the evaluation of a population-based breast cancer screening program can contribute to the scientific literature, an equally important goal of evaluation is to determine whether the program is achieving high standards of performance, and to identify those aspects that require further attention. The screening program in Tilburg resulted in a near-doubling of new breast cancer cases diagnosed at Stage I, and also in improved survival. However, fewer than half (40%) of the tumors were screen detected, 28% were interval tumors, and 32% were detected in women who either refused screening or had skipped 1 or more screening rounds. The interval tumor rate in this program should be a cause for concern because approximately 40% of tumors diagnosed among women participating in screening (28 of every 68) were detected between screening rounds. This finding has several possible explanations. First, a higher interval tumor rate may be observed in the first few years of a screening program because the program is at an early point in the learning curve. Second, the authors do not report the call-back rate, but if the program has a policy of minimizing the false-positive rate, this may, although not necessarily, contribute to a higher rate of missed tumors. Third, some radiologists participating in the program may have low tumor detection rates. Ongoing evaluation of the performance of interpreting physicians can both identify those clinicians who need additional training and provide ongoing feedback regarding performance. Interval tumors in breast cancer screening programs are inevitable, but concerted efforts should be made to minimize the number of tumors diagnosed among women in a screening program that are not detected at the time of screening.
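The 40% interval figure follows from the reported proportions once women who refused screening or skipped rounds are removed from the denominator. A minimal worked calculation, assuming the three percentages (40% screen detected, 28% interval, 32% nonparticipant or lapsed) all refer to the same set of tumors in the invited age group:

```latex
\frac{\text{interval}}{\text{screen detected} + \text{interval}}
  = \frac{28\%}{40\% + 28\%}
  = \frac{28}{68}
  \approx 0.41
```

In other words, roughly 2 of every 5 tumors diagnosed in screening participants surfaced between rounds rather than at a screening examination.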
Finally, the fact that 32% of breast cancer cases were diagnosed among women who refused participation or had lapsed participation in screening indicates that special attention should be devoted to understanding behavioral or structural factors that have led to a high rate of nonparticipation in screening programs.
It is unlikely that new trials of breast cancer screening will be initiated anywhere in the world, and thus we will continue to depend on existing trial data and the evaluation of service screening. The latter has great potential to answer presently unanswered questions regarding various approaches to breast cancer screening, and factors that may improve detection rates and minimize harms. Programs that organize the evaluation of service screening data, such as those currently underway in Sweden and other countries, and the Breast Cancer Surveillance Consortium and International Breast Screening Network supported by the National Cancer Institute are important steps toward ensuring that knowledge continues to accumulate regarding the role of early detection in reducing the morbidity and mortality from breast cancer.