The cumulative risk of a false-positive recall in the Norwegian Breast Cancer Screening Program
Biennial breast cancer screening for women ages 50–69 years is recommended by the World Health Organization. It has been claimed that the cumulative risk of a false-positive recall is a significant disadvantage in breast cancer screening programs. The primary objective of this study was to estimate the cumulative risk of a false-positive recall during a screening period of 20 years in women ages 50–51 years who are screened biennially in a population-based screening program. A secondary objective was to estimate the cumulative risk of undergoing fine-needle aspiration cytology, core needle biopsy, and open biopsy with benign morphology in the same group of women.
The Norwegian Breast Cancer Screening Program invites all women ages 50–69 years who reside in the country to biennial two-view mammography. A nationwide database that covers all invited women includes individual information about all screening activity. Results from three screening rounds in four counties formed the basis for this study. The estimates were based on false-positive recalls due to abnormal mammograms among the 83,416 women who participated in all three screening rounds.
It was calculated that women ages 50–51 years who participate in biennial screening run a cumulative risk of 20.8% for a false-positive recall during a screening period of 2 decades. The cumulative risk of undergoing fine-needle aspiration cytology was estimated at 3.9%, and the risk of undergoing core needle biopsy or open biopsy with benign morphology was 1.5% and 0.9%, respectively.
False-positive recalls are a disadvantage of breast cancer screening programs, but the cumulative risk seemed acceptable in the Norwegian Breast Cancer Screening Program. It is important to communicate the existence and extent of this risk to the target group. Cancer 2004. © 2004 American Cancer Society.
The World Health Organization (WHO) advises women ages 50–69 years to have breast cancer screening biennially.1 Today, breast cancer screening is available in most Western countries. The benefit of screening is reflected in years of life saved.1–3 However, breast cancer screening also has human as well as economic costs. A false-positive recall is one such disadvantage and reportedly causes adverse psychological consequences at the individual level.4–8 It has been claimed that the cumulative risk of experiencing a false-positive recall during a long screening period is substantial.9 However, this risk has not been investigated thoroughly. Lack of consistency in the definitions of recall and false-positive recall,9–13 combined with different ways of organizing breast cancer screening, further complicates the issue.13–16
The primary objective of this study was to estimate the cumulative risk of a false-positive recall in a cohort of women ages 50–51 years who will participate in the Norwegian Breast Cancer Screening Program (NBCSP) biennially in 10 screening rounds until they are ages 68–69 years. The secondary objective was to estimate the cumulative risk of undergoing an invasive procedure with benign morphology for the same group of women. Invasive procedures were separated into three assessment groups: fine-needle aspiration cytology (FNAC), core needle biopsy (CNB), and open biopsy (OB).
MATERIALS AND METHODS
The NBCSP was founded by the Norwegian government and is administered by the Cancer Registry of Norway. The program started in 1996 as a 4-year pilot in 4 counties and became nationwide in February, 2004.17 The program is carried out on the basis of a quality-assurance manual,18 which was based on European guidelines19 and the International Agency for Research on Cancer (IARC) Handbook of Cancer Prevention.1 The target population is ≈ 460,000 women ages 50–69 years. The fee is approximately 25 Euros per screening round for each woman. The public health care system covers the medical expenses in relation to diagnostic work-up and medical treatment. This study was based on results as of December 31, 2002 from the first 3 screening rounds in the 4 counties where the pilot started.
Process Indicators in the First Three NBCSP Screening Rounds
Approximately 160,000 women were invited to each of the first 3 screening rounds. Overall, 126,659 women attended the first screening round, 127,297 attended the second, and 130,350 attended the third, corresponding to an attendance rate close to 80%. A total of 854 breast cancers were diagnosed in the first screening round, 657 in the second, and 717 in the third. Approximately 20% were ductal carcinoma in situ (DCIS). The mean tumor size was 15.0 mm (95% confidence interval [95% CI], 14.2–15.7 mm) in the first round, 14.4 mm (95% CI, 11.4–15.2 mm) in the second round, and 14.0 mm (95% CI, 13.3–14.7 mm) in the third round. Among the invasive carcinomas, 21.4%, 27.0%, and 24.1% were lymph node positive in the first, second, and third rounds of the program, respectively. Interval cancer was diagnosed in 245 women (including 14 women with DCIS) between the first and second screening rounds and in 231 women (including 16 women with DCIS) between the second and third rounds. The sensitivity was estimated at 77.7% in the first round and 74.0% in the second round. The specificity was 95.8% and 96.0% in the first and second rounds, respectively.
Invitations and Recalls
All women ages 50–69 years who are resident in Norway are invited personally by letter to participate in the NBCSP. Each woman was identified by the unique 11-digit personal identification number (PIN) assigned to all inhabitants of Norway. The women were offered two-view mammography biennially. The 2-year interval could vary slightly if a woman moved to another municipality; however, this applied to only approximately 0.1% of the women who participated in the first 3 screening rounds.
The same mammography equipment was used in all three screening rounds. Two radiologists interpreted the mammograms of each woman independently. Interpretation of the mammograms and the diagnostic work-up took place at centralized breast clinics at university or county hospitals. Four breast clinics with specialized radiologists were involved in the interpretations. Each radiologist read mammograms from > 5000 women per year. A 5-point scale (1, normal/benign; 2, probably benign; 3, intermediate; 4, probably malignant; 5, malignant) was used for the interpretations. A score of 1 by both radiologists was regarded as a negative screening, whereas a score ≥ 2 by either radiologist led to a consensus review and a final decision on recall.18 Mammograms of insufficient technical quality and self-declared symptoms were also reasons for recall. However, the estimates in the current study were based only on abnormal mammograms accompanied by a recommendation for diagnostic work-up, because this is the most commonly used definition of a recall.1, 10, 12 The recall rate due to insufficient technical quality was 0.4%, and an additional recall rate of 0.4% was due to self-declared symptoms.
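The double-reading rule described above can be sketched as a small decision function. This is a minimal illustration, not the program's software: it assumes each radiologist's score is an integer on the 1–5 scale, the function name `read_outcome` and its return labels are hypothetical, and the consensus step itself (where the final recall decision is made) is not modelled.

```python
def read_outcome(score_a: int, score_b: int) -> str:
    """Triage one woman's screening mammograms from two independent readings.

    Hypothetical sketch of the rule in the text: scores use the 1-5 scale
    (1 normal/benign ... 5 malignant). Both readers scoring 1 gives a negative
    screen; a score >= 2 from either reader sends the case to consensus review,
    where the final recall decision is made (not modelled here).
    """
    for s in (score_a, score_b):
        if s not in range(1, 6):
            raise ValueError(f"score must be 1-5, got {s}")
    if score_a == 1 and score_b == 1:
        return "negative"
    return "consensus"


print(read_outcome(1, 1))
print(read_outcome(1, 3))
```

Note that a single score of 2 ("probably benign") is enough to trigger consensus review, which is why double reading raises sensitivity at some cost in recalls.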
The diagnostic work-up took place 1–4 weeks after screening and had to be completed within 4 months. If no malignancy was found, the woman was recommended to continue participating in the ordinary program. All cancers (invasive carcinomas and DCIS) were verified histologically.
Additional mammograms and ultrasound studies (noninvasive methods) are used to differentiate abnormal mammograms. If these methods are insufficient, an invasive procedure such as FNAC, CNB, or OB is performed. A false-positive recall is defined as a diagnostic work-up with a negative outcome (including noninvasive procedures and, where performed, biopsy). An invasive procedure with a benign result means an FNAC, a CNB, or an OB with benign morphology. An FNAC or a biopsy usually is performed after a noninvasive procedure has failed to clarify the finding. All information about attendance, screening outcome, and diagnostic work-up was registered in the central nationwide database, with the PIN used as the unique identifier for each woman.
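The outcome definitions above can be made concrete with a small classifier. This is a sketch under stated assumptions: the `Procedure`, `WorkUp`, and `classify` names are hypothetical, and every invasive procedure performed in a work-up that ends without malignancy is treated as having benign morphology.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Procedure(Enum):
    ADDITIONAL_MAMMOGRAM = auto()  # noninvasive
    ULTRASOUND = auto()            # noninvasive
    FNAC = auto()                  # fine-needle aspiration cytology (invasive)
    CNB = auto()                   # core needle biopsy (invasive)
    OB = auto()                    # open biopsy (invasive)


INVASIVE = {Procedure.FNAC, Procedure.CNB, Procedure.OB}


@dataclass
class WorkUp:
    """Hypothetical record of one diagnostic work-up after a recall."""
    procedures: list[Procedure]
    malignant: bool  # histologically verified invasive carcinoma or DCIS


def classify(work_up: WorkUp) -> dict:
    """Apply the definitions from the text to one work-up.

    - false-positive recall: the work-up ends without a malignancy,
      whichever procedures were used;
    - invasive procedures with benign result: the FNAC/CNB/OB procedures
      performed in such a false-positive work-up (assumed benign).
    """
    false_positive = not work_up.malignant
    benign_invasive = (
        {p for p in work_up.procedures if p in INVASIVE} if false_positive else set()
    )
    return {"false_positive_recall": false_positive, "benign_invasive": benign_invasive}


example = WorkUp([Procedure.ADDITIONAL_MAMMOGRAM, Procedure.ULTRASOUND, Procedure.FNAC],
                 malignant=False)
print(classify(example))
```

A work-up resolved by additional mammograms and ultrasound alone still counts as a false-positive recall; it simply contributes nothing to the invasive-procedure counts.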
The Study Population
In total, 159,747 women ages 50–69 years were invited to the NBCSP's first screening round. Among these women, 132,099 were ages 50–65 years and thus could potentially participate in all 3 screening rounds. Women who had notified the Cancer Registry that they did not want further invitations, or who had breast cancer diagnosed at routine screening or in the interval between screenings, were excluded (n = 10,396 women). Of the remaining 121,703 women, 83,416 (68.5%) participated in all 3 rounds. These women have so far attended in compliance with IARC1 and NBCSP recommendations. Data from the 83,416 women, corresponding to approximately 1,000,000 mammograms over 3 screening rounds (4 mammograms for each woman in each of the 3 screening rounds), formed the basis for the estimations.
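The study-population counts above can be checked with simple arithmetic. The sketch only uses figures stated in the text; the 4 images per woman per round is taken from the text as given.

```python
eligible = 132_099      # invited women aged 50-65 at round 1 (could attend all 3 rounds)
excluded = 10_396       # opted out, or breast cancer diagnosed at screening or in an interval
remaining = eligible - excluded
attended_all = 83_416   # women who attended all 3 rounds

attendance_share = attended_all / remaining
# 4 mammograms per woman per round, 3 rounds (as stated in the text).
images = attended_all * 4 * 3

print(remaining, f"{attendance_share:.1%}", images)
```

The product, 1,000,992 images, is what the text rounds to "approximately 1,000,000 mammograms".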
One objective of the current study was to estimate the cumulative risk of a false-positive recall for a woman who enters the screening program at age 50–51 years and intends to participate in compliance with the program's recommendations until she reaches age 68–69 years. The estimates were based on the three screening rounds performed in four counties. Let $rc_j$ denote a false-positive recall in screening round $j$, $j = 1, 2, \ldots, 10$, and let $P$ denote probability. The probability of at least 1 false-positive recall due to abnormal mammograms in 10 biennial screening rounds is then given by the inclusion-exclusion formula:

$$P(rc_1 \cup rc_2 \cup \cdots \cup rc_{10}) = \sum_i P(rc_i) + (-1)^{2-1}\sum_{i<j} P(rc_i \cap rc_j) + (-1)^{3-1}\sum_{i<j<k} P(rc_i \cap rc_j \cap rc_k) + \cdots + (-1)^{10-1} P(rc_1 \cap rc_2 \cap \cdots \cap rc_{10})$$
This formula may be approximated by

$$P(rc_1 \cup rc_2 \cup \cdots \cup rc_{10}) \approx \sum_i P(rc_i) + (-1)^{2-1}\sum_{i<j} P(rc_i \cap rc_j) + (-1)^{3-1}\sum_{i<j<k} P(rc_i \cap rc_j \cap rc_k),$$

because all higher-order terms (those involving recalls in more than 3 screening rounds) were small and could be neglected. The approximation will slightly overestimate $P(rc_1 \cup rc_2 \cup \cdots \cup rc_{10})$.
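The direction of the approximation error follows from the Bonferroni inequalities, a standard result for inclusion-exclusion: stopping after an odd-order term (here the third) gives an upper bound, and stopping after an even-order term gives a lower bound. With 10 rounds, the retained sums contain $\binom{10}{2} = 45$ pairwise terms and $\binom{10}{3} = 120$ triple terms.

$$\sum_i P(rc_i) - \sum_{i<j} P(rc_i \cap rc_j) \;\le\; P(rc_1 \cup \cdots \cup rc_{10}) \;\le\; \sum_i P(rc_i) - \sum_{i<j} P(rc_i \cap rc_j) + \sum_{i<j<k} P(rc_i \cap rc_j \cap rc_k)$$

Because the triple terms are small, the gap between the two bounds, and hence the overestimate, is small as well.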
$\sum_i P(rc_i)$ was estimated by adding all of the numbers shown in boldface in Table 1. $P(rc_i \cap rc_j)$ was estimated directly from the observed data for $i = 1, 2$; $j = i + 1$. For the other terms, independence was assumed: $P(rc_i \cap rc_j) = P(rc_i) \cdot P(rc_j)$.
Table 1. The Proportions of False-Positive Recalls and Invasive Procedures with Benign Morphology by Age and Screening Round Carried Out in Women who Participated in the Norwegian Breast Cancer Screening Program in Compliance with the Program's Recommendations^a

| Age (yrs) | Round 1: No. of women | Round 1: False-positive recall, % (95% CI) | Round 1: Invasive procedure with benign morphology, % (95% CI) | Round 2: No. of women | Round 2: False-positive recall, % (95% CI) | Round 2: Invasive procedure with benign morphology, % (95% CI) | Round 3: No. of women | Round 3: False-positive recall, % (95% CI) | Round 3: Invasive procedure with benign morphology, % (95% CI) |
|---|---|---|---|---|---|---|---|---|---|
| 50–51 | 14,859 | 4.4 (4.1–4.8) | 1.7 (1.5–1.9) | | | | | | |
| 52–53 | 12,797 | 3.9 (3.6–4.2) | 1.4 (1.2–1.7) | 14,859 | 2.6 (2.3–2.8) | 0.6 (0.5–0.7) | | | |
| 54–55 | 10,440 | 3.7 (3.3–4.1) | 1.5 (1.3–1.7) | 12,797 | 2.3 (2.1–2.6) | 0.6 (0.4–0.7) | 14,859 | 2.4 (2.1–2.6) | 0.6 (0.5–0.8) |
| 56–57 | 10,176 | 3.8 (3.4–4.2) | 1.4 (1.2–1.7) | 10,440 | 2.3 (2.1–2.6) | 0.6 (0.5–0.8) | 12,797 | 2.3 (2.0–2.5) | 0.7 (0.5–0.8) |
| 58–59 | 9426 | 2.9 (2.6–3.3) | 1.1 (0.9–1.3) | 10,176 | 2.2 (1.9–2.5) | 0.6 (0.5–0.8) | 10,440 | 1.9 (1.6–2.1) | 0.4 (0.3–0.6) |
| 60–61 | 8702 | 3.0 (2.7–3.4) | 1.2 (1.0–1.5) | 9426 | 2.0 (1.7–2.3) | 0.6 (0.5–0.8) | 10,176 | 2.0 (1.8–2.3) | 0.5 (0.4–0.7) |
| 62–63 | 8387 | 3.0 (2.7–3.4) | 1.1 (0.9–1.4) | 8702 | 2.2 (1.9–2.5) | 0.5 (0.3–0.6) | 9426 | 2.1 (1.8–2.4) | 0.6 (0.4–0.7) |
| 64–65 | 8629 | 2.7 (2.3–3.0) | 0.9 (0.7–1.2) | 8387 | 2.1 (1.8–2.5) | 0.5 (0.3–0.6) | 8702 | 1.9 (1.6–2.2) | 0.5 (0.3–0.6) |
| 66–67 | | | | 8629 | 1.7 (1.4–2.0) | 0.4 (0.3–0.6) | 8387 | 1.6 (1.3–1.9) | 0.5 (0.4–0.7) |
| 68–69 | | | | | | | 8629 | 1.8 (1.5–2.1) | 0.4 (0.3–0.5) |
| Total | 83,416 | 3.5 (3.4–3.7) | 1.3 (1.3–1.4) | 83,416 | 2.2 (2.1–2.3) | 0.6 (0.5–0.6) | 83,416 | 2.0 (1.9–2.1) | 0.5 (0.5–0.6) |
$P(rc_1 \cap rc_2 \cap rc_3)$ was estimated directly from the observed data. For the other terms, independence was assumed: $P(rc_i \cap rc_j \cap rc_k) = P(rc_i) \cdot P(rc_j) \cdot P(rc_k)$. The same procedure was used to estimate the cumulative risk of undergoing a biopsy with benign morphology, which was given as the cumulative risk of undergoing a biopsy overall (FNAC, CNB, and OB together) and separately for FNAC, CNB, and OB.
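The estimation procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code. The boldface marking in Table 1 did not survive in this version, so the per-round rates below are a hypothetical reading: the cohort's first-round rate at ages 50–51, its second-round rate at 52–53, and third-round rates for ages 54–55 through 68–69, on the premise that later rounds resemble round 3. Under that reading the rates sum to 23.0%, and the pairwise sum under independence is about 2.35%, close to the 2.36% quoted in the text. The joint counts (25, 12, and 3 women) are divided by the cohort size of 14,859 from Table 1.

```python
from itertools import combinations
from math import prod

# Hypothetical reading of the boldface cells of Table 1 (percent): round-1 rate
# at ages 50-51, round-2 rate at 52-53, then round-3 rates for ages 54-55 ... 68-69.
RATES_PCT = [4.4, 2.6, 2.4, 2.3, 1.9, 2.0, 2.1, 1.9, 1.6, 1.8]


def cumulative_risk(p, observed=None):
    """Third-order inclusion-exclusion approximation of P(at least one recall).

    p        -- per-round probabilities P(rc_i), rounds indexed from 0.
    observed -- optional {sorted index tuple: joint probability} that replaces
                the independence product for specific pair or triple terms.
    Truncating after the third-order term gives an upper bound (Bonferroni).
    """
    observed = observed or {}

    def joint(idx):
        # Observed joint probability if supplied, otherwise assume independence.
        return observed.get(idx, prod(p[i] for i in idx))

    rounds = range(len(p))
    s1 = sum(p)
    s2 = sum(joint(c) for c in combinations(rounds, 2))  # 45 terms for 10 rounds
    s3 = sum(joint(c) for c in combinations(rounds, 3))  # 120 terms for 10 rounds
    return s1 - s2 + s3


p = [r / 100 for r in RATES_PCT]
print(f"{cumulative_risk(p):.1%}")  # every joint term under independence

# Terms the text says were observed directly (cohort n = 14,859):
# P(rc1 and rc2), P(rc2 and rc3), and P(rc1 and rc2 and rc3).
n = 14_859
observed = {(0, 1): 25 / n, (1, 2): 12 / n, (0, 1, 2): 3 / n}
print(f"{cumulative_risk(p, observed):.1%}")
```

With these rounded inputs, the all-independence version gives about 20.8% and substituting the three observed joint proportions gives about 20.7%; the small difference from the published 20.8% is within what one-decimal rounding of the rates can produce. The same function would apply to the biopsy estimates if given the corresponding per-round rates.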
Table 1 demonstrates a decreasing rate of false-positive recalls by screening round. In the cohort of women ages 50–65 years in the first screening round, the false-positive recall rates are 3.5% (95% CI, 3.4–3.7%), 2.2% (95% CI, 2.1–2.3%), and 2.0% (95% CI, 1.9–2.1%) in the first, second, and third screening rounds, respectively. These rates apply to women who attended all three screening rounds. The rates decrease with age in all 3 screening rounds. The highest rate, 4.4% (95% CI, 4.1–4.8%), is seen in the first screening round in women ages 50–51 years. A false-positive recall occurred in 25 of 14,859 women (0.17%) in both the first and second screening rounds, in 22 of 14,859 women (0.15%) in the first and third screening rounds, and in 12 of 14,859 women (0.08%) in the second and third screening rounds. Three of 14,859 women (0.02%) had false-positive recalls in all 3 screening rounds. Seven of 83,416 women had a false-positive recall in all 3 screening rounds. Using the estimation procedure described, the cumulative risk of a false-positive recall is 20.8% for a cohort of women ages 50–51 years who will be screened biennially for 2 decades (10 screening rounds).
Among women whose diagnostic work-up had a benign outcome, 62.0% (95% CI, 60.2–63.8%) were examined only with additional mammograms and ultrasound studies in the first screening round, increasing to 75.1% (95% CI, 73.1–77.1%) and 74.2% (95% CI, 72.1–76.3%) in the second and third screening rounds, respectively (not shown in Table 1). Hence, 38.0%, 24.9%, and 25.8% of these women underwent an invasive procedure in the first, second, and third rounds, respectively. The rates of invasive procedures with negative outcomes decreased with age and screening round (Table 1). Three of 14,859 women (0.02%) underwent 2 invasive procedures with benign morphology, in the first and second, the first and third, or the second and third screening rounds. One woman in 83,416 underwent such procedures in all 3 screening rounds. Using the estimation procedure described above, the estimated cumulative risk of undergoing an invasive procedure with benign morphology is 6.2% for women who enter the NBCSP at ages 50–51 years and participate in compliance with the program's recommendations.
Invasive procedures included FNAC, CNB, and OB. FNAC accounted for 68.2% (95% CI, 65.3–70.9%), 70.6% (95% CI, 66.2–74.7%), and 56.8% (95% CI, 52.0–61.5%) of negative invasive procedures in the first, second, and third screening rounds, respectively. CNB accounted for 11.0% (95% CI, 9.2–12.9%), 16.7% (95% CI, 13.4–20.4%), and 30.0% (95% CI, 25.8–34.5%) of negative invasive procedures in the first, second, and third screening rounds, respectively. The proportion of OB decreased from 20.9% (95% CI, 18.5–23.4%) of negative invasive procedures in the first screening round to 12.8% (95% CI, 9.9–16.2%) and 13.2% (95% CI, 10.2–16.7%) in the second and third rounds, respectively (data not shown).
The invasive procedures carried out in women who participated in the first three screening rounds in the NBCSP are shown in Table 2. Three of 14,859 women (0.007%) underwent an FNAC with benign morphology that was performed in the first and second screening rounds, the first and third screening rounds, or the second and third screening rounds. Only 1 woman in 83,416 (0.001%) underwent an FNAC with benign morphology in all 3 screening rounds. Three women in 83,416 (0.004%) underwent a benign CNB in the first and second screening rounds or in the second and third screening rounds, whereas 1 woman (0.001%) underwent a benign CNB in the second and third screening rounds. No women were registered with more than two CNB procedures or more than one negative OB procedure. When using the estimation procedure described above, the cumulative risk of undergoing an FNAC was estimated at 3.9% in women who enter the NBCSP at ages 50–51 years and who participate in compliance with the program's recommendations until age 69 years. The risk of undergoing a CNB with benign morphology was estimated at 1.5%, and the risk of undergoing a benign OB was estimated at 0.9%.
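Table 2. The Proportions of Different Invasive Procedures with Benign Morphology, by Age and Screening Round Carried Out in Women who Participated in the Norwegian Breast Cancer Screening Program in Compliance with the Program's Recommendations^a

| Age (yrs) | Round 1: No. of women | Round 1: FNAC, % | Round 1: CNB, % | Round 1: OB, % | Round 2: No. of women | Round 2: FNAC, % | Round 2: CNB, % | Round 2: OB, % | Round 3: No. of women | Round 3: FNAC, % | Round 3: CNB, % | Round 3: OB, % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 50–51 | 14,859 | 1.16 | 0.20 | 0.31 | | | | | | | | |
| 52–53 | 12,797 | 1.02 | 0.16 | 0.26 | 14,859 | 0.42 | 0.13 | 0.02 | | | | |
| 66–67 | | | | | 8629 | 0.30 | 0.02 | 0.09 | 8387 | 0.32 | 0.16 | 0.02 |
| 68–69 | | | | | | | | | 8629 | 0.23 | 0.09 | 0.05 |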
Of the 2147 women who were recalled in the third screening round, 1698 had a negative outcome. In total, 888 women underwent a biopsy, and 449 women were diagnosed with breast cancer (50.6% of those biopsied; 95% CI, 47.2–53.9%).
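The third-round counts above can be checked with a few lines of arithmetic. This sketch assumes the 50.6% is the share of biopsied women diagnosed with cancer (449/888) and uses a Wald (normal-approximation) 95% interval; the paper does not state its interval method, so the lower bound here comes out at about 47.3% rather than the published 47.2%, suggesting the authors used a different (possibly exact) method.

```python
from math import sqrt

recalled_round3 = 2147
negative_round3 = 1698
biopsied = 888

cancers = recalled_round3 - negative_round3  # women diagnosed with breast cancer
share = cancers / biopsied                   # proportion of biopsied women with cancer

# Wald (normal-approximation) 95% CI; an assumption, not necessarily the authors' method.
se = sqrt(share * (1 - share) / biopsied)
low, high = share - 1.96 * se, share + 1.96 * se
print(f"{cancers} cancers, {share:.1%} (95% CI {low:.1%}-{high:.1%})")
```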
It is estimated that one in five women ages 50–51 years who participate in breast cancer screening in compliance with the IARC and NBCSP recommendations will experience a false-positive recall over a 20-year screening period. Furthermore, according to the estimates, these women run a cumulative risk of 6.2% of undergoing a negative invasive procedure and a risk of < 1% of undergoing an open biopsy with benign morphology.
A false-positive recall is considered a hazard of breast cancer screening. The disadvantage relates to the diagnostic work-up, biopsy, and anxiety that would never have occurred in the absence of screening. It has been claimed that the cumulative risk of a false-positive recall during a screening period is high.9, 12 The accuracy of that statement may be questioned. Results from Denmark indicate a risk similar to that found in the current study.20 Conversely, Elmore et al. calculated that the cumulative risk was 49.1% after 10 mammograms,12 and Christensen et al. showed that the cumulative risk after 9 mammograms varied from 5% to 100%, depending on individual risk factors.9 The study by Elmore et al. was followed by numerous responses.13–15, 21, 22 None of those responders has published work suggesting alternatives to the cumulative risk reported by Elmore et al. To our knowledge, no clarification has been published to date, although the IARC has presented guidelines for estimation.1 Hence, there is a need for more studies based on quality-controlled data.
The varying estimates are probably due mainly to different ways of organizing screening programs, including the methodologic and health service systems.23 Malpractice concerns and the fiscal environment probably also have an influence. The NBCSP is part of the public health system in Norway. There is a single central administration, and all activity is run in accordance with the program's quality-assurance manual, which was adapted from the European guidelines.19 Independent double reading, the use of arbitration, and the number of mammograms read by each radiologist are conceivably factors that contribute to the achieved recall rate. Accurate reporting of recalls linked to the PIN, the nationwide screening database, and the fact that the NBCSP is a population-based screening program with a high attendance rate make the data in this study of high quality. Furthermore, since 1952, it has been compulsory for all physicians in Norway (including hospital departments and histopathology laboratories) to report invasive cancers to the Cancer Registry, and registration is almost 100% complete for breast cancer.
The estimates in the current study were based on the assumption that a woman age 50–51 years attends biennial screening in 10 screening rounds or until breast cancer is diagnosed. The assumption excludes women who drop out of the screening program. It has been claimed that drop-out is more common among women who have experienced a false-positive recall than among women with a negative screening experience.7, 24, 25 Studies have taken different approaches to the topic and, thus, may have heterogeneous results.6–8, 24–26 For this reason, we compared reattendance by screening outcome in the group of women that formed the basis for this study (n = 132,099 women). In the second screening round, reattendance rates were 88.3% and 86.3% (P < 0.01) for women who had been screened negative and false positive, respectively. In the third screening round, reattendance was 91.4% among women with 2 previous negative screening tests, 91.0% among women with one negative screening test and one false-positive recall, and 88.9% among women with 2 false-positive recalls (P = 0.39). It seems reasonable that a false-positive recall can cause more harm than a negative screening test and, thus, may influence the rate of reattendance; however, this does not seem to be decisive in the NBCSP.
The psychological consequences of a false-positive recall may depend on how prepared the women are for a recall and on the methods used in the diagnostic work-up. The procedures carried out in the clinic differ distinctly: FNAC and CNB clearly differ in terms of risk, anxiety, and morbidity from an OB procedure carried out under general anesthesia in an operating theatre. By our estimate, ≈ 7 of 10 recalled women have their status clarified with additional mammograms and ultrasound studies alone. Accordingly, a preliminary result usually is given immediately after the work-up, which keeps the waiting period short. It has been shown that the time lag of a recall influences the psychological consequences.6, 7, 27 Undergoing an invasive procedure may cause more adverse psychological consequences than additional mammograms and ultrasound studies, both because of the examination itself and because of the time lag. The cumulative risk of a work-up that includes an invasive procedure with a negative result is only 6.2%. Several studies have covered the psychological effects of a recall,4, 6, 7, 27 but only a few of them separated recalls according to the methods used in the work-up.7 The cumulative risk of undergoing an OB with benign morphology is only 0.9%. In any event, women are likely to undergo an FNAC or a CNB before an OB. Thus, it is important to view the cumulative risk of the different types of biopsy both separately and together.
Table 1 demonstrates that a previous screening mammogram decreases the false-positive recall rate. Access to previous mammograms for comparison during interpretation may be the main reason.28 Unfortunately, data from only three screening rounds are currently available; however, the consistent age trends across all screening rounds support the assumptions underlying the estimation. The estimation procedures in this study assume independence in the risk of recall between screening rounds. The approximated formula includes 45 negative terms of type $P(rc_i \cap rc_j)$, $i < j$, and 120 positive terms of type $P(rc_i \cap rc_j \cap rc_k)$, $i < j < k$. Assuming independence between $rc_i$ and $rc_j$, the contribution from $\sum_{i<j} P(rc_i \cap rc_j)$ is estimated at 2.36%. The observed $P(rc_1 \cap rc_2) + P(rc_1 \cap rc_3) + P(rc_2 \cap rc_3)$ is 0.40%, compared with the corresponding value of 0.28% expected under the independence assumption. This rough estimation indicates that the contribution from $\sum_{i<j} P(rc_i \cap rc_j)$ could be as large as 3.32% (2.36 × 0.40/0.28). Only 7 of 83,416 women (0.01%) had a false-positive recall in all 3 screening rounds. In summary, because of the independence assumption, the estimated 20.8% cumulative false-positive recall rate could be overestimated by approximately 1 percentage point (3.32% − 2.36% = 0.96%).
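The rough correction in the preceding paragraph can be reproduced from the joint-recall counts given earlier (25, 22, and 12 women out of the cohort of 14,859). This sketch assumes the independence expectation uses the cohort's Table 1 rates in rounds 1–3 (4.4%, 2.6%, 2.4%), which reproduces the 0.28% in the text. Note that multiplying the rounded figures, 2.36 × 0.40/0.28, gives 3.37; the 3.32% in the text is reproduced when the unrounded proportions are used.

```python
# Cohort of women aged 50-51 at the first round (n from Table 1).
n = 14_859
# Observed joint false-positive recalls: rounds 1&2, 1&3, 2&3.
observed_pairs = (25 + 22 + 12) / n
# Expected under independence, using the cohort's rates in rounds 1-3
# (assumed to be the rates the authors used).
p1, p2, p3 = 0.044, 0.026, 0.024
expected_pairs = p1 * p2 + p1 * p3 + p2 * p3

ratio = observed_pairs / expected_pairs
pairwise_indep = 2.36                       # percent, all 45 pairs under independence (from the text)
pairwise_scaled = pairwise_indep * ratio    # percent, scaled by observed/expected
overestimate = pairwise_scaled - pairwise_indep  # percentage points

print(f"observed {observed_pairs:.2%}, expected {expected_pairs:.2%}")
print(f"scaled pairwise sum {pairwise_scaled:.2f}%, overestimate {overestimate:.2f} points")
```

Because the pairwise sum enters the formula with a negative sign, underestimating it under independence inflates the cumulative risk by roughly the difference.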
The current study provides estimates only for recalls due to abnormal mammograms; however, recalls also can occur for technical reasons (e.g., equipment failures or problems in the developing process) or because of a self-declared lump. The recall rate due to technical reasons and self-declared lumps is approximately 1% per screening round. Including those recalls would raise the cumulative risk of a false-positive recall to ≈ 30%.
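The text does not show how the ≈ 30% figure was obtained. One rough way to see its order of magnitude is sketched below, under loudly stated assumptions: the additional recalls occur at about 1% per round, are independent of the abnormal-mammogram recalls and of each other across rounds, and are all counted as false positive. This is an illustration, not the authors' calculation.

```python
# Illustrative assumptions only: ~1% extra recalls per round (technical reasons or
# self-declared lumps), independent of abnormal-mammogram recalls and across rounds,
# and all treated as false positive.
cum_fp_abnormal = 0.208     # estimated cumulative risk from abnormal mammograms
other_recall_rate = 0.01    # approximate per-round rate stated in the text
rounds = 10

# Probability of escaping every extra recall over 10 rounds.
p_no_other_recall = (1 - other_recall_rate) ** rounds
combined = 1 - (1 - cum_fp_abnormal) * p_no_other_recall
print(f"{combined:.1%}")
```

This gives roughly 28%, the same order as the ≈ 30% stated; the true figure would be lower to the extent that some recalls for self-declared lumps lead to a cancer diagnosis.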
In conclusion, to justify screening as a health care offer, the benefits have to outweigh the disadvantages. The risk of experiencing a false-positive recall can be considered a disadvantage. However, recall rates should not be too low, because there is a risk of missing detectable cancers. This study shows a cumulative risk of 20.8% for a false-positive recall for women ages 50–51 years who attend biennial breast cancer screening over 2 decades, and a cumulative risk of 6.2% of undergoing a negative invasive procedure. The cumulative risk of undergoing an OB with benign morphology was estimated at 0.9%. All of these estimates seem acceptable and ought to be communicated to the target group. Further knowledge is needed about the extent of recalls with negative outcomes, about methods of examination, and about the psychological consequences involved.