Modeling in Estimations of Mortality Reduction
Dr. Berry and colleagues prefer “modeling” rather than direct patient data, ignoring the reality that the results from modeling are determined completely by the assumptions on which the models are built. They evaluated the decline in breast cancer deaths that began in the United States in 1990 to determine the proportion caused by screening versus systemic therapy. Their 7 models widely disagreed in the proportion of mortality decline caused by screening, ranging from 28% to 65% (median, 46%), because they had differing assumptions. They ignored direct patient data from Sweden, the Netherlands, and other countries that consistently demonstrate that screening is the main reason for reduced mortality. Berry et al favor widely disparate computer models over actual patient data.
Complex reasoning is often used to obfuscate the facts and disparage screening. It can be difficult to unravel the analyses of opponents of screening, as in Berry's attack on our findings. Many readers may not understand his complicated, obscure, and misleading arguments but may defer to his mathematical expertise. On close inspection, the arguments that Berry makes have little bearing on reality or logic.
In response to our article, Berry presents a new model in which the majority of breast cancer deaths could be among women who were not participating in screening, even if screening had no benefit. To parallel our study, he imagines a cohort of women with rapidly progressive cancers that will be lethal within 17 years. He neglects to point out the important reality that, based on his own model, and with the exception of a few “interval” cancers, all of these cancers might have been detected by mammography had the women participated in screening.
His hypothesis requires that women destined to die from breast cancer have a premonition that makes them shun screening. If the majority of women destined to die from breast cancer avoided screening, then the majority of deaths would be among women who were not screened. The idea is simple, but it is highly speculative and bizarre. If it were true, then Berry has discovered a new way to stratify women for screening. Because the randomized controlled trials (RCTs) have proven that screening saves lives, Berry should argue that screening should be denied to women who want it (because they will not have lethal cancers), and only women who refuse to be screened should be screened! His model requires the implausible, the improbable, and a willing suspension of disbelief.
The Role of Observational Studies of Screening Mammography
Berry faults the media for exaggerating the benefits of screening, but then compliments the media when it suits him, citing an editorial in the Chicago Tribune supporting opposition to screening that disparaged coverage of a 2002 publication in Cancer. That article indicated that the death rate from breast cancer had declined in 7 counties in Sweden only after the start of population-based screening. The finding that screening saves lives had already been demonstrated by RCTs. The Swedish study answered the question, “what happens when screening is introduced into the general population?” The answer is that the breast cancer death rate goes down. The authors of that study had extensive experience analyzing RCT data and were well aware of the potential pitfalls of drawing conclusions from observational data. The Chicago Tribune argued that the death rate had likely declined because of improvements in therapy and not because of screening, but it failed to mention that the Cancer article had addressed that specific issue and determined that screening was the key to the decline in mortality. The breast cancer death rate decreased to a much greater extent among women who participated in screening compared with those who did not, although they all had access to the same modern systemic therapy.
Berry, other critics, and the press forget that, although population RCTs test the hypothesis that offering women screening will reduce mortality, lack of adherence to the randomized assignment biases the results strongly against screening. Because of compliance bias (offered but not screened) and contamination bias (not offered but screened anyway), RCTs, and especially meta-analyses of RCTs, can underestimate what screening can achieve. When these large biases are acknowledged along with other biases, it can be deduced that the RCT results underestimate the real mortality reduction of actual screening by 50% to 100%.
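The dilution caused by compliance and contamination can be illustrated with a minimal sketch. It assumes, purely for illustration, that breast cancer mortality without screening is the same in women who attend and women who do not, and that screening lowers mortality by the same proportion in anyone who is actually screened. The function name and the attendance, contamination, and efficacy values are hypothetical and are not taken from any trial.

```python
def intention_to_treat_reduction(true_reduction, attendance, contamination):
    """Mortality reduction a trial would report when arms are compared as randomized.

    true_reduction: proportional mortality reduction in women actually screened
    attendance:     fraction of the invited arm that is screened (compliance)
    contamination:  fraction of the control arm that is screened anyway
    """
    # Relative death rate in the invited arm: only attenders receive the benefit.
    invited_rate = 1.0 - attendance * true_reduction
    # Relative death rate in the control arm: contaminated controls also benefit.
    control_rate = 1.0 - contamination * true_reduction
    return 1.0 - invited_rate / control_rate


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the direction of the bias.
    true_reduction = 0.40
    observed = intention_to_treat_reduction(true_reduction, attendance=0.70, contamination=0.20)
    print(f"reduction in screened women: {true_reduction:.0%}")
    print(f"reduction reported by trial: {observed:.1%}")
    print(f"ratio of true to reported:   {true_reduction / observed:.2f}")
```

With these hypothetical inputs, the trial would report a reduction of roughly 22% even though screened women experience a 40% reduction. Under these simplified assumptions, both biases push the reported figure below the true one.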
Every study that has investigated the correlation reveals that mortality from invasive breast cancer is directly related to the size of the cancer and the number of lymph node metastases. Mammography reduces mortality by identifying smaller cancers with fewer or no lymph node metastases (stage-shifting). This is the mechanism of screening which is successful in the great majority of invasive breast cancers that have a biologically progressive course. The great majority of breast cancers have such a progressive course as has been shown in the “spectrum” biological model to occur in about 70% of invasive breast cancers. Indeed, there appears to be a “tipping point” in size reduction with screening at which progressive mortality reduction begins.
Berry argues that those who recognize the scientific evidence (by RCTs) supporting breast cancer screening are incorrect and are confusing the public. We would argue that it is the unscientific opposition to screening that has created confusion. Berry questions challenges to the US Preventive Services Task Force (USPSTF) 2009 guidelines as if those guidelines have unimpeachable validity. In fact, the USPSTF panel conclusions were based on models and ignored important scientific facts. They based their recommendations on faulty analyses and subjective emphasis on the harms of so-called “false-positives.” The USPSTF failed to explain that “false-positives” are the approximately 10% or less of women who are recalled after screening for additional evaluation (a recall rate similar to cervical cancer [Papanicolaou] screening). Most recalls are resolved by an extra image or ultrasound. Fewer than 2% of screened women are recommended to undergo image-guided needle biopsy using local anesthesia, and 30% of those biopsies reveal cancer. This is a far greater yield than when biopsies are performed for palpable abnormalities, most of which are benign. Only 15% of palpable lesions that are biopsied prove to be cancer, and palpable cancers are much larger, later stage, and prognostically far worse than those detected by screening. The USPSTF decided that it was better to increase the risk of death from breast cancer than to risk the harm of anxiety and inconvenience of being recalled for additional evaluation. In codifying this subjective assessment, the USPSTF imposed their values on American women. In fact, screened women readily accept the alleged harms of screening for the proven mortality reduction benefits.
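The recall and biopsy figures quoted above can be put on a per-1000-women basis with a short sketch. It uses the upper bounds stated in the text (a recall rate of 10%, a biopsy rate of 2%) together with the stated yields of 30% for image-guided biopsies and 15% for biopsies of palpable lesions. The function name and the cohort size of 1000 are illustrative assumptions.

```python
def screening_cascade(screened, recall_rate, biopsy_rate, biopsy_yield):
    """Return (recalled, biopsied, cancers found) for a cohort of screened women."""
    recalled = screened * recall_rate    # women called back for extra imaging
    biopsied = screened * biopsy_rate    # women recommended for needle biopsy
    cancers = biopsied * biopsy_yield    # biopsies that reveal cancer
    return recalled, biopsied, cancers


if __name__ == "__main__":
    # Upper-bound rates from the text, applied to a hypothetical 1000 women.
    recalled, biopsied, cancers = screening_cascade(
        screened=1000, recall_rate=0.10, biopsy_rate=0.02, biopsy_yield=0.30
    )
    print(f"recalled: {recalled:.0f}, biopsied: {biopsied:.0f}, cancers: {cancers:.0f}")

    # Biopsies needed to find one cancer at each stated yield.
    for label, biopsy_yield in (("screen-detected", 0.30), ("palpable", 0.15)):
        print(f"{label}: {1 / biopsy_yield:.1f} biopsies per cancer found")
```

At these upper bounds, no more than about 100 of 1000 screened women are recalled and about 20 undergo biopsy, of which about 6 reveal cancer. Finding one cancer takes about 3 biopsies at the screening yield, compared with about 7 at the yield quoted for palpable lesions.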
Relying on modeling instead of direct patient data, the USPSTF recommendation, ironically, ignores the demonstration by their own models that many more lives are saved by annual screening beginning at age 40 years. By using the USPSTF models, Hendrick and Helvie demonstrated that if women who were then in their thirties followed the USPSTF guidelines for biennial screening starting at the age of 50, then as many as 100,000 lives would be lost that could have been saved by screening annually beginning at age 40 years, because there was a further 70% decrease in breast cancer mortality in the USPSTF models when screening began at age 40 years. Yet the USPSTF promotes a screening program that is far less effective because of their subjective emphasis on “harms.”
A Timeline of 20 Years of Misinformation
Contrary to Berry's assertions, a review of the history of the screening debates reveals that it is those who wish to reduce access to screening who have been misleading the public and making them “wary of all studies and guidelines.”
1992: Women were told that screening before age 50 years leads to earlier deaths
The National Breast Screening Study of Canada 1 (CNBSS1) made headlines by telling the media that the early excess of breast cancer deaths among screened women was caused by mammography squeezing cancer into the blood. This supposition had no scientific basis and was eventually recanted, but not before raising fears around the world.
There were major, fundamental flaws in CNBSS1. First, the quality of their mammography was poor. Their own study physicist stated that the quality of CNBSS mammography was far below the state of the art, even for that time (early 1980s). Second, the randomization procedure was decentralized and unblinded. The trial included volunteers and not a geographic population, and there was a very high contamination bias in their control group. Randomization was not centralized but was dispersed into the multiple sites, a potentially great source of nonrandom allocation. An excess of cancer deaths in the screening arm was due to a compromise of the random allocation process. Women were first examined, and those with masses or large axillary lymph nodes were more frequently allocated to the mammography screening arm on open, local allocation lists. This excess of women with positive examination findings assigned to the mammography group caused a statistically significant excess of women with advanced or incurable cancers allocated to the screening arm. This subversion of random allocation explains why there were more early deaths among the “screened” women. This compromise of randomization also explains how the 5-year breast cancer survival rate among women in the “unscreened” control arm was >90% whereas it was only 75% among Canadian women outside the trial. These fundamental corruptions of the CNBSS have been widely ignored by critics of mammography, as is evident in Berry's comments, and were ignored in the recent 25-year review.
1993: Retrospective, unplanned subgroup analysis of data lacking statistical power was used to drop support for screening women ages 40 to 49 years
In 1993, the National Cancer Institute (NCI) dropped support for screening women ages 40 to 49 years based on the CNBSS1 and subgroup analyses of the other RCTs despite their knowledge that the trials were not powered to evaluate women ages 40 to 49 years separately and had no power to permit legitimate recommendations based on the review. To continue their support, the NCI had required a statistically significant mortality reduction of 25% within 5 years of the start of screening. Such an expectation for early reduction in mortality is contrary to data and fundamentals of screening. Because of length bias sampling, it takes 5 to 7 years after the onset of population screening to begin seeing a reduction in mortality. Furthermore, immediate reduction in the expected number of deaths was mathematically impossible given the number of women in their 40s who participated in the RCTs.
1994: Reinforcing the NCI position by grouping data to make age 50 years appear as a legitimate threshold when it is not
None of the factors associated with screening change abruptly at age 50 years or at any other age. Flawed analyses have made it appear that there is a major biologic change at age 50 years and in the factors associated with screening. Data were grouped and averaged to make what are continuous variables artificially appear to jump at age 50 years.[24, 25] There has never been a biologic or scientific reason to delay screening until age 50 years. Furthermore, >40% of the years of life that are lost to breast cancer are among women diagnosed at age <50 years. When RCTs are analyzed as they were designed, they have always demonstrated a significant reduction in breast cancer deaths from screening starting at age 40 years,[27, 28] yet the myth has been perpetuated that the age of 50 years is a valid threshold. Indeed, our failure analysis, which is criticized by Berry, revealed that 50% of breast cancer deaths at our institutions occurred in women who were first diagnosed at a median age of 49 years.
1997: Consensus Development Conference
A Consensus Development Conference of the National Institutes of Health on “Breast Cancer Screening for Women Ages 40-49” was held to reevaluate the NCI 1993 guidelines. The panel included Berry and had been convened to review the latest data on breast cancer screening for women ages 40 to 49 years. Updated results from RCTs indicated that the decline in deaths among women ages 40 to 49 years had become statistically significant, with a 35% mortality reduction in the Malmo trial and a 44% mortality reduction in the Gothenburg trial. Yet, on January 23, 1997, Berry read a summary to the meeting and the media that stated, in part, that the panel had concluded that the currently available data did not warrant a universal recommendation of mammography for all women in their 40s. The media were misled and were told that the panel was unanimous, but the late Dr. J. Petrek had resigned from the panel; and, after evaluating the evidence, Drs. D. C. Sullivan and R. T. Zern concluded that routine screening mammography should be actively encouraged for women in their 40s. The media reported the “unanimous” agreement unaware of the minority opinion.
The Consensus Development Conference summary, presented by Berry, ignored updated information:
- Without mention of the Gothenburg and Malmo trials, Berry suggested that the decrease in mortality might only be as much as 30%.
- Berry disparaged the mortality decline for women ages 40 to 49 years, saying that there was no difference in breast cancer mortality within 7 years. Berry, an expert in biostatistics, invokes length bias in his attack on our results, but he ignored this phenomenon in the NCI's expectation of an immediate decline in mortality among women in their 40s despite this being scientifically unlikely. In the Swedish Two-County RCT, the initial mortality reduction begins between years 5 and 7 after allocation to the RCT. The decreasing mortality for women ages ≥40 years in the screening group continues to improve up to 30 years after RCT allocation with a continuing, progressive decline in breast cancer deaths compared with the control group.
- Berry suggested that the decline in breast cancer deaths may be caused by women reaching the age of 50 years, at which point screening suddenly begins to save lives. Called “age creep,” this phenomenon was suggested by de Koning et al in 1995.[34] Berry failed to tell the media that de Koning had retracted his article the day before Berry's 1997 Consensus Development Conference press conference and admitted that most of the decline in deaths was because of cancers detected while women were in their 40s. There is no such thing as “age creep,” yet the false argument was perpetuated by Berry.
- Although there is no proven risk from radiation to the breast for women aged ≥40 years—and there is almost certainly no risk from the extremely small mammographic doses of radiation—Berry told the media that annual mammograms during ages 40 to 49 years could cause an estimated 1 additional breast cancer death per 10,000 women without providing any scientific proof to support that speculation.
All this fallacious information was immediately made public by the media, spreading unjustified fear among women, in contrast to Berry's assertion of positive media bias. The legitimate objections of attendees at the Consensus Development Conference to the dissemination of misinformation were ridiculed and belittled in the New York Times (NYT).
Richard Klausner, MD, then Director of the NCI, having heard much of the Consensus Development Conference, told the assembly that he disagreed with the panel and would have the National Cancer Advisory Board review the data. When the National Cancer Advisory Board advised Dr. Klausner that screening should begin at age 40 years and the NCI guidelines were revised to reflect that, opponents, ignoring Dr. Klausner's earlier disagreement, wrote that these revised NCI guidelines supporting screening for women in their 40s were politically driven.
2000: Lancet article disparages all breast cancer screening
Gotzsche and Olsen used unsupportable criteria to reject results from all but 2 of the 7 well done RCTs, accepting only the flawed CNBSS and Malmo trial data, and stated that those 2 adequately randomized trials demonstrated no effect—not even a tendency toward an effect—of screening on mortality from breast cancer. The most recent data from the Malmo trial, however, had demonstrated a benefit, and Gotzsche and Olsen ignored the obvious compromises of the allocation process, the volunteer nature of the participants, and other failings in the CNBSS as well as the high contamination bias. Their article caused a great deal of consternation and confusion and subsequently was rejected by the scientific community based on the unsupportable methodology that was used.
Gotzsche and Olsen were given a second opportunity by The Lancet in 2001 to publish a reanalysis of the data, but they made no change in their methods or conclusions. Reported on the front page of the NYT, doubt about screening spread once again. Meetings and rereviews of the screening data were held in the United States and Europe. All these consensus scientific meetings concluded that the Gotzsche and Olsen articles were methodologically poor, and they disparaged Gotzsche and Olsen's conclusions.[28, 41-43] The finding that mammography screening saves lives was again reinforced. Gotzsche, now heading the Cochrane Reviews and ignoring these scientific criticisms, still hopes to eliminate screening.
2007: American College of Physicians drops support for screening women ages 40 to 49 years
The American College of Physicians (ACP) published new breast cancer screening guidelines suggesting that screening should begin at the age of 50 years and could be repeated every 2 years. The Annals of Internal Medicine allowed only a truncated Letter to the Editor that highlighted 9 fundamental errors in this ACP review. The ACP authors' response to legitimate scientific criticism of their guidelines was, “We disagree.”
In 2010 the ACP and the American College of Radiology (ACR) met to find common ground on guidelines, agreeing that mammography screening decreases the number of deaths from breast cancer among women ages 40 to 74 years and that its primary benefit is a reduction in mortality from breast cancer. The Annals of Internal Medicine refused to publish a joint ACP/ACR review that recognized the importance of starting screening at age 40 years.
2008-2012: Attempts to shore up opposition to screening
Publications have suggested that the introduction of screening has had little effect on death rates in Europe.[48-50] These were registry reviews by analysts outside the countries specified and without direct patient data. Studies tracking individual patients by analysts from within the countries indicated that screening was related to major declines in breast cancer deaths.[51-54]
2009 USPSTF: 2013 Overdiagnosis takes center stage
“Overdiagnosis,” as previously refuted, has been reintroduced in the latest effort to limit screening. An article by Bleyer and Welch is the most recent, methodologically inappropriate effort among several[2, 43] promulgating this concept. RCTs are the only method for accurately assessing overdiagnosis, and these trials have demonstrated that the overdiagnosis rate is <10% and probably <1%. Reports of higher rates are based on poor methodology. Although the treatment of preinvasive ductal carcinoma in situ (frequently referred to as “overdiagnosis”) remains a legitimate area of disagreement, there are strong indications that its removal has resulted in a decline in subsequent invasive cancer incidence.[59, 60] Pathologists agree that the vast majority of invasive ductal carcinomas have their origins in the accompanying ductal carcinoma in situ. Almost all invasive cancers will become clinically evident and potentially lethal if not treated. It is the dissemination of scientifically unsupportable concepts and misinformation that has resulted in confusion among women and physicians.
Berry faults journals like Cancer that, “over the past 20 years or so… have published articles promoting cancer screening” that are “typically overly optimistic about screening benefits.” In fact, major biases by prominent journals have precluded the publication of articles supporting breast cancer screening, especially for women ages 40 to 49 years. For instance, all the biases in mammography RCTs minimize mortality reductions.
A review published in Radiology listed fundamental problems with the USPSTF analysis. The Annals of Internal Medicine, having rejected that article, subsequently published an editorial stating that opposition to the USPSTF was nothing more than emotion, politics, and anecdote. The New England Journal of Medicine (NEJM), also having rejected the same article, subsequently published a Sounding Board article that dismissed concerns raised by screening experts as nothing more than “vested interests.” Despite numerous submissions, the NEJM has failed to publish articles in support of screening, particularly for women ages 40 to 49 years. If Berry discovers biased publications and misinformation passed on by the media, then he need only look to the NEJM and the NYT. Two clear examples are the NEJM articles by Kalager et al in 2010[63] and by Bleyer and Welch in 2012, both of which were promoted by major articles in the NYT.
In their article, Kalager et al claimed that the introduction of screening in Norway had little effect on the breast cancer death rate. An immediate drop in breast cancer deaths is highly unlikely, as noted above, yet Kalager and colleagues' median follow-up was only 2.2 years. Those authors claimed that very few women were being screened before the start of their analysis; but, in fact, >40% of Norwegian women were being screened before the program began, thus reducing the apparent impact of beginning the screening program. Despite all of the biases, that flawed study attributed 33% of the reduction in mortality to screening. That scientifically inadequate article with false information and only 2.2 years of follow-up was covered on the front page of the NYT. A subsequent Swedish report with 18 years of follow-up[66] in which a significant decline in breast cancer deaths associated with screening was demonstrated among women ages 40 to 49 years was covered only briefly on a NYT back page and was not emphasized.
The NEJM article by Bleyer and Welch claimed that mammography screening has led to massive “overdiagnosis” of breast cancer in the United States. Those authors claimed that as many as 70,000 women with breast cancer in 2008 alone (almost one-third of all cases) had breast cancers detected by mammography that would have regressed or disappeared on their own if left undetected. This is complete conjecture and is not supported by any direct patient data. With more than 200,000 female invasive breast cancers diagnosed every year, there should be many reports of invasive breast cancers regressing spontaneously, yet there is not a single credible report of this phenomenon ever occurring. Bleyer and Welch extrapolated data from the Surveillance, Epidemiology, and End Results registry without direct access to patient records. The authors could not determine which patients had mammograms or which cancers were mammographically detected, yet they faulted mammography. They relied on estimates and extrapolations that were grossly incorrect. More accurate estimates demonstrate that there has been little or no overdiagnosis of invasive cancers[57, 64]; yet, because the article was in the NEJM, it has become “fact.” The NYT simultaneously published a supporting OpEd piece by Welch himself promulgating his faulty conclusions. A subsequent multipage argument against screening was published in the NYT Magazine that failed to mention any scientific data, including the RCTs. Consumer Reports stated that cancer screening is “oversold” and advised women to wait until the age of 50 years to begin screening. Even the Ladies Home Journal advised women aged <50 years that they can “blow off” screening until age 50 years.
Berry and critics who wish to reduce access to screening are convinced that improvements in therapy are the reason that the age-adjusted death rate from breast cancer—which had been unchanged since 1930—began to decline in 1990 soon after the start of screening at a national level and temporally unrelated to any immediate advances in therapy. If improvements in therapy can be credited with decreased breast cancer mortality, then why is it that oncologists have not supported reduced access to screening? Oncologists recognize that systemic therapy reduces mortality by a similar proportion whether cancers are discovered early or late in their biologic growth. With the exception of HER2-positive breast cancers treated with trastuzumab, cancers that are detected late, as by palpating a mass, have a far worse prognosis despite the proportional mortality reduction of systemic therapy.
RCTs prove that screening mammography significantly reduces breast cancer mortality in women aged ≥40 years despite their negative biases, but Berry and others confuse women and physicians by suggesting that there is little if any benefit. Once again, we emphasize that the only way to prove that screening for breast cancer saves lives is through RCTs. These trials of mammography screening, despite underestimating the benefit of screening because of contamination and noncompliance biases, have clearly demonstrated significant mortality reductions from screening women beginning at age 40 years, and our study merely supports these findings. Our failure analysis investigated survival from the date of diagnosis of women who died from breast cancer with a minimum of 8 years, a maximum of 17 years, and a median of 12.5 years of follow-up. The results clearly indicate that 50% of women who ultimately died from breast cancer were diagnosed at age <50 years, and 70% of the women who died from breast cancer were in the 20% of women who did not participate in screening. These observational findings support the RCTs that have also demonstrated benefit in early mammographic screening. Those studies estimate mortality reductions of 25% to 31% for women who are invited to screening and of 38% to 48% for women who actually are screened. Berry would prefer to rely on assumption-based modeling while ignoring actual patient data. Berry also criticizes the findings of observational studies such as ours as though there were not RCTs that have reported similar findings. As reported but not acknowledged by the USPSTF, their models have demonstrated that the most deaths are averted by screening women annually beginning at the age of 40 years. Our failure analysis strongly supports the recommendation that women should not delay screening until the age of 50 years but should consider more frequent screening at younger ages and perhaps less frequent screening at older ages.
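The failure-analysis proportions quoted above (70% of breast cancer deaths among the 20% of women who did not participate in screening) can be expressed as a crude ratio of death rates, as in the sketch below. The function name is hypothetical. The ratio is unadjusted: it is simple arithmetic on the two stated proportions and does not by itself separate the effect of screening from other differences between participants and nonparticipants.

```python
def crude_death_rate_ratio(nonparticipant_share, nonparticipant_death_share):
    """Crude ratio of breast cancer death rates, nonparticipants vs participants.

    nonparticipant_share:       fraction of women who did not take part in screening
    nonparticipant_death_share: fraction of breast cancer deaths occurring in that group
    """
    if not (0.0 < nonparticipant_share < 1.0 and 0.0 < nonparticipant_death_share < 1.0):
        raise ValueError("both shares must lie strictly between 0 and 1")
    # Deaths per unit of population in each group, in arbitrary common units.
    unscreened_rate = nonparticipant_death_share / nonparticipant_share
    screened_rate = (1.0 - nonparticipant_death_share) / (1.0 - nonparticipant_share)
    return unscreened_rate / screened_rate


if __name__ == "__main__":
    # Proportions as stated in the text: 20% of women, 70% of deaths.
    ratio = crude_death_rate_ratio(nonparticipant_share=0.20, nonparticipant_death_share=0.70)
    print(f"crude death rate ratio (unscreened / screened): {ratio:.1f}")
```

On the stated proportions, the crude death rate among nonparticipants is roughly 9 times that among participants.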