Department of Research, Cancer Registry of Norway, Oslo, Norway
Faculty of Health, Oslo and Akershus University College of Applied Sciences, Oslo, Norway
Corresponding author: Solveig Hofvind, PhD, Cancer Registry of Norway, Oslo and Akershus University College of Applied Sciences, Majorstua 0403, Oslo, Norway; Fax: (011) 47 22 45 13 70; Solveig.Hofvind@kreftregisteret.no
Some false-positive results are inevitable in mammographic screening, but the impact of false-positive findings on the program and the participants is a disadvantage of screening. The objective of the current study was to estimate the cumulative risk of a false-positive result over 10 biennial screening examinations and the cumulative risk of undergoing an invasive procedure with a benign outcome in women screened between the ages of 50 years and 69 years.
A retrospective cohort study was performed in 231,310 women aged 50 years to 51 years at the time of first mammography screening who underwent 715,311 screening mammograms in the Norwegian Breast Cancer Screening Program from 1996 through 2010. Generalized linear mixed models were used to estimate the probability of a false-positive screening result and to compute the cumulative false-positive risk for up to 10 biennial screening examinations.
The cumulative false-positive risk after 20 years of biennial screening for women who initiated screening aged 50 years to 51 years was 20.0% (95% confidence interval [95% CI], 19.7%-20.4%). The cumulative risk of undergoing an invasive procedure with a benign outcome for the same group of women was 4.1% (95% CI, 3.9%-4.3%). The cumulative risk of undergoing a fine-needle aspiration cytology, core needle biopsy, or open biopsy with a benign outcome was 1.4% (95% CI, 1.3%-1.5%), 2.0% (95% CI, 1.9%-2.1%), and 0.16% (95% CI, 0.13%-0.19%), respectively.
False-positive screening results are a concern in mammographic screening, although a certain rate is inevitable and must be accepted for adequate cancer detection. The negative effects of false-positive results have been widely noted and include the psychological harm of being recalled for further assessment, particularly in women who undergo a biopsy. Furthermore, a false-positive screening result entails extra economic costs to the screening program[2, 3] and may lead to decreased participation in future screenings.[4, 5]
Most screening programs in Europe invite women aged 50 years to 69 years to mammographic screening every 2 years. A recent study based on results from European screening programs demonstrated an average recall rate of 4% at screening rounds after the first screen (range, 1%-11%). The cumulative risk of a false-positive screening result is defined as the risk of experiencing at least 1 false-positive recall if a woman is screened biennially from ages 50 years to 69 years. In the European study, the pooled estimate of the cumulative risk of a false-positive recall after 10 rounds of screening was 20% and the cumulative risk of an invasive procedure with a benign outcome was 3%. The recall rate and the cumulative risk of a false-positive screening result are reported to be substantially higher in the United States, ranging from 13% to 16% at first screen and 8% to 10% at subsequent screens. The cumulative risk of a false-positive result after 10 years of annual screening in the United States ranges from 42% to 61% for a recall and from 4.8% to 18.6% for a biopsy recommendation.[7-10]
The first European study estimating the cumulative risk of a false-positive screening result used information from the first 3 screening rounds in only 4 of the 19 counties of Norway from 1996 through 2002. The estimates were based on direct probability calculations and did not include adjustment for any factors such as the calendar year or the variability among the counties. In addition, the estimates assumed that each screening result was independent of prior screening results. The study was one of 3 included in the recently published review of false-positive screening results in European screening programs, which identified only 2 prior studies estimating the cumulative risk of an invasive procedure with a benign outcome and only 1 study that had adjusted for confounding factors.
The availability of longer follow-up time, data from all 19 counties, and more appropriate estimation methods underscore the need for an update of the estimates of the cumulative risk of a false-positive screening result in the Norwegian Breast Cancer Screening Program (NBCSP). The goal of the current study was to update the estimates of the cumulative risk of a false-positive screening result (including additional assessment with mammography, ultrasound, and/or an invasive procedure) and the risk of a recall for further assessment including an invasive procedure (fine-needle aspiration cytology [FNAC], core needle biopsy [CNB], or open biopsy [OB]) with a benign outcome, using 15 years of individual-level data collected as a part of the NBCSP.
MATERIALS AND METHODS
The study population included all women with at least one screening examination performed in the NBCSP during the study period (1996-2010). The program invites women aged 50 years to 69 years to 2-view mammography every second year and is administered according to the European guidelines. The screening program started as a pilot in 4 counties in 1996 and became nationwide in 2005. The women are identified by a unique personal identification number given to all inhabitants of Norway. Information regarding attendance, screening outcome, and diagnostic workup was registered in the central nationwide database, with the personal identification number used as the unique identifier for each woman. We received an anonymized file with individual-level dates of invitations and attendances on all women targeted in the screening program. No ethical committee approval was necessary because we received anonymized data only.
The NBCSP uses independent double reading. An interpretation score ranging from 1 to 5 is given for both breasts and from both readers. A score of 1 indicates a negative screening examination whereas a score of 5 indicates a finding that is highly suspicious for malignancy. All mammograms with an interpretation score of ≥ 2 by one or both readers are discussed at a consensus/arbitration meeting to decide whether to recall the patient. Interpretation of the screening mammograms and the diagnostic workup take place at centralized breast clinics at university or county hospitals. Additional mammograms and ultrasound (noninvasive methods) are used to evaluate abnormal mammograms. If these methods are insufficient to rule out cancer, then an invasive procedure such as FNAC, CNB, or OB is performed. The diagnostic workup takes place 1 to 4 weeks after screening. If no malignancy is found, women are referred back to routine screening. Women who receive a diagnosis of breast cancer are referred for treatment. All malignancies (invasive carcinomas and ductal carcinoma in situ) are histologically verified.
Any recall for further assessment was considered a false-positive screening result if breast cancer was not diagnosed during the diagnostic workup (within 4 months), regardless of the procedures performed. We defined a false-positive screening result for a benign invasive procedure as any diagnostic workup including an FNAC, a CNB, or an OB with benign morphology. OB was defined as a diagnostic procedure including excision, incision, and marker biopsy. Women recalled due to insufficient technical quality or self-declared symptoms (< 0.5% for both together) were included in neither the numerator nor the denominator in the estimates of false-positive screening tests. We considered a woman as an irregular attendee if she missed her last screening invitation but attended after ≥ 4 years. Otherwise, she was considered a regular attendee.
Our estimates are based on all screening examinations performed on women aged 50 years to 51 years at the time of first screening in the 19 Norwegian counties. The women contributed data from the time of their first invitation until the end of follow-up (December 31, 2010). Data from up to 6 screening examinations performed during the study period were used for estimation. Screening examinations for the seventh and eighth screenings were not used because they represented < 3% of the overall screening examinations, and therefore the estimates of false-positive risk for these screening rounds were imprecise. The probability and 95% confidence intervals (95% CI) for the risk of a false-positive screening result at each screening examination were estimated using generalized linear mixed models. The regression model included adjustment for year of the screening examination, taking the last year (2010) as the reference category, and a random intercept for county to allow for variation across counties in false-positive risk. Women were included in analyses only up to the time of their first false-positive result. The probability of a false-positive result at the ith examination ($\pi_i$) was expressed as $\ln\left(\frac{\pi_i}{1-\pi_i}\right) = \alpha_i D_i + \beta_1 X_i + \delta$, in which $D_i$ is a vector of binary indicators denoting participation in the ith screening round. $D_i$ is equal to 1 if the woman participated in the ith screening examination and 0 otherwise. $X_i$ is a mammogram-level covariate indicating the year in which the screening examination was performed. $\delta$ is a county-specific random effect to account for the correlation among screening tests performed in the same county. We reported the results for the county using the median false-positive risk. The models are described in detail by Singer and Willett.[14, 15]
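As a minimal sketch of how the linear predictor above maps to a probability, the following Python snippet inverts the logit link. The function name and all coefficient values are hypothetical placeholders chosen only for illustration; they are not estimates from this study, and the snippet does not fit the mixed model itself.

```python
import math

def false_positive_probability(alpha_i, beta_year, county_effect=0.0):
    """Invert the logit link: pi_i = 1 / (1 + exp(-eta)),
    where eta = alpha_i + beta_year + county_effect."""
    eta = alpha_i + beta_year + county_effect
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical values (NOT study estimates): a round-specific intercept,
# a calendar-year effect relative to the reference year, and a county
# random effect set to zero (the center of the random-effect distribution).
p = false_positive_probability(alpha_i=-3.0, beta_year=0.1, county_effect=0.0)
print(f"Illustrative probability: {p:.4f}")
```

Because the county effect enters on the log-odds scale, setting it to zero gives the probability for a county at the center of the assumed random-effect distribution.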
Separate models were computed to estimate the probability of a false-positive screening result, the probability of any invasive procedure with a benign outcome, and the probability of a benign invasive procedure involving an FNAC, CNB, or OB, independently. We tested whether irregular attendees had a higher false-positive risk than regularly screened women by incorporating “irregular attendance” as an additional covariate in our regression model. The point estimates of the cumulative risks of a false-positive screening result were calculated assuming that the probability of experiencing a false-positive result at the 7th to the 10th screening examinations was equal to that at the 6th examination. The cumulative risk of a false-positive result for each round up to the 10th screening examination was calculated by multiplying the probability of receiving a first false-positive test result at each round by the probability of receiving no false-positive test results at any previous round, and summing these products over all rounds up to and including that round. Standard errors for the calculation of the 95% CIs for the cumulative risk probability were estimated using the Greenwood approximation. This approximation is based on the estimated probabilities and the observed sample size in the current study population. Standard errors based on the Greenwood formula will be inflated relative to true standard errors. To assess the possibility of dependent censoring, we conducted a sensitivity analysis in which cumulative false-positive risk was also estimated, conditional on the number of screening examinations a woman was observed to receive. Statistical significance was defined using a 2-sided α level of .05. Model parameters were estimated via residual pseudo-likelihood using the GLIMMIX procedure in SAS statistical software, version 9.1 (SAS Institute Inc, Cary, NC).
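The cumulative risk calculation and the Greenwood-type standard error described above can be sketched as follows. This is an illustration under stated assumptions, not the study's SAS implementation: the function names, the round-specific probabilities, and the numbers at risk are all hypothetical, and the standard error uses the textbook Greenwood formula expressed in terms of estimated probabilities and sample sizes.

```python
import math

def cumulative_risk(round_probs, total_rounds=10):
    """Cumulative risk of at least one false-positive result by each round.

    round_probs: probabilities of a first false-positive result at rounds
    1..k, given no false-positive result earlier. Rounds beyond k reuse the
    last observed probability, mirroring the projection described in the text.
    """
    probs = list(round_probs)
    probs += [probs[-1]] * (total_rounds - len(probs))
    survival = 1.0  # probability of no false-positive result so far
    risk = 0.0
    risks = []
    for p in probs[:total_rounds]:
        risk += survival * p      # first false-positive occurs at this round
        survival *= 1.0 - p       # still free of false-positives afterwards
        risks.append(risk)
    return risks

def greenwood_se(round_probs, n_at_risk):
    """Greenwood-type standard error of the cumulative risk after the
    supplied rounds: S * sqrt(sum(p_i / (n_i * (1 - p_i))))."""
    survival = math.prod(1.0 - p for p in round_probs)
    var_sum = sum(p / (n * (1.0 - p)) for p, n in zip(round_probs, n_at_risk))
    return survival * math.sqrt(var_sum)

# Hypothetical inputs (NOT study estimates).
probs = [0.05, 0.03, 0.02, 0.02, 0.02, 0.02]
n_at_risk = [10000, 8000, 6000, 4000, 2000, 1000]
risks = cumulative_risk(probs, total_rounds=10)
se_6 = greenwood_se(probs, n_at_risk)
print(f"Cumulative risk after 10 rounds: {risks[-1]:.3f}")
print(f"Greenwood SE after 6 observed rounds: {se_6:.4f}")
```

Summing the per-round products is algebraically the same as one minus the probability of no false-positive result in any round, which is why the standard error of the cumulative risk equals that of the "survival" probability.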
RESULTS
We analyzed information from 231,310 women aged 50 years to 51 years at the time of the initial screening examination in the NBCSP, contributing 715,311 screening examinations. A second screening examination was performed in 177,007 women (76.5%), 131,139 women (56.7%) underwent a third screening examination, and 30,077 women (13.0%) had a sixth screening examination (Table 1).
Table 1. Number of Women Screened and Percentages of Women Recalled for Further Assessment With a Negative Outcome by Screening Round and 3-Year Time Period in the Norwegian Breast Cancer Screening Program, 1996 Through 2010
The percentage of women with a false-positive screening result was higher at the initial screening examination than at subsequent screening examinations (Table 1). The overall crude false-positive rates decreased from 5.8% (95% CI, 5.7%-5.9%) at initial screening to 2.5% (95% CI, 2.5%-2.6%) at the second screening (Table 1). The highest crude false-positive rate, 6.9% (95% CI, 6.7%-7.1%), was observed in women receiving their first screening mammogram between 2008 and 2010 (Table 1). The overall crude rates of a benign invasive procedure decreased from 1.7% (95% CI, 1.6%-1.7%) at initial screening to 0.5% (95% CI, 0.5%-0.6%) at the second screening (Table 2). The highest crude rate of a benign invasive procedure, 1.9% (95% CI, 1.8%-2.0%), was observed in women receiving their first screening mammogram between 2008 and 2010.
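To illustrate how a confidence interval of this width arises for a crude rate, the sketch below computes a normal-approximation (Wald) interval for the 5.8% rate at initial screening. Two assumptions are made here that the text does not confirm: that a Wald interval was the method used, and that the denominator is the full cohort of 231,310 women (the exact denominator after the stated exclusions is not reported). The function name is hypothetical.

```python
import math

def wald_ci(proportion, n, z=1.959964):
    """Normal-approximation 95% CI for a proportion (assumed method)."""
    se = math.sqrt(proportion * (1.0 - proportion) / n)
    return proportion - z * se, proportion + z * se

# Reported crude rate at initial screening; denominator assumed to be the
# full cohort size reported in the Results.
lower, upper = wald_ci(0.058, 231_310)
print(f"95% CI: {100 * lower:.1f}%-{100 * upper:.1f}%")
```

Under these assumptions the interval rounds to 5.7%-5.9%, in line with the reported values.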
Table 2. Number of Women Screened and Percentages of Women Recalled for Further Assessment Including An Invasive Procedure With a Benign Outcome by Screening Round and 3-Year Time Period in the Norwegian Breast Cancer Screening Program, 1996 Through 2010
Benign invasive procedure indicates a fine-needle aspiration cytology, core needle biopsy, or open biopsy with a benign outcome.
The estimated cumulative risk of a false-positive screening result at 10 screening examinations for the cohort of women who initiated screening at ages 50 years to 51 years was 20.0% (95% CI, 19.7%-20.4%) (Fig. 1). The cumulative risk of undergoing an invasive procedure with a benign outcome at 10 screening examinations for the same group of women was 4.1% (95% CI, 3.9%-4.3%).
A total of 6063 screened women (2.6%) underwent an invasive procedure with a benign outcome. FNAC constituted 2862 of the benign invasive procedures performed (47.2%), CNB represented 2498 (41.2%), and OB represented 703 of the benign invasive procedures performed (11.6%). The estimated cumulative risk of undergoing an FNAC, CNB, or OB with a benign outcome after 10 screening examinations for women initiating screening at ages 50 years to 51 years was 1.4% (95% CI, 1.3%-1.5%), 2.0% (95% CI, 1.9%-2.1%), and 0.16% (95% CI, 0.13%-0.19%), respectively.
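The procedure counts and percentages above can be cross-checked with a few lines of arithmetic. The sketch uses only the counts reported in the text; the variable names are illustrative.

```python
# Counts of benign invasive procedures as reported in the text.
counts = {"FNAC": 2862, "CNB": 2498, "OB": 703}
cohort_size = 231_310  # women in the study cohort

total = sum(counts.values())
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
overall = round(100 * total / cohort_size, 1)

print(total, shares, overall)
```

The three counts sum to the reported total of 6063, and the computed shares agree with the reported 47.2%, 41.2%, and 11.6%.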
We found that irregular screening attendees had a higher risk of a false-positive screening result (odds ratio, 1.12; 95% CI, 1.06-1.20) and a higher, but not statistically significant, risk of an invasive procedure with a benign outcome (odds ratio, 1.11; 95% CI, 0.98-1.26) compared with regularly screened women.
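The distinction between the two odds ratios can be illustrated by recovering an approximate z statistic from each reported interval. This sketch assumes the intervals are Wald intervals that are symmetric on the log-odds scale, which the text does not state, and it works from rounded published values, so the results are approximate. The function name is hypothetical.

```python
import math

def z_from_odds_ratio(or_est, ci_low, ci_high, z_crit=1.959964):
    """Approximate z statistic from an odds ratio and its 95% CI,
    assuming a symmetric interval on the log scale."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    return math.log(or_est) / se

z_false_positive = z_from_odds_ratio(1.12, 1.06, 1.20)
z_benign_invasive = z_from_odds_ratio(1.11, 0.98, 1.26)
print(f"{z_false_positive:.2f}, {z_benign_invasive:.2f}")
```

A z statistic above roughly 1.96 corresponds to significance at the 2-sided .05 level used in the study; only the first odds ratio clears that threshold, consistent with its interval excluding 1.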
We evaluated the possible impact of dependent censoring on our cumulative false-positive risk estimates. The cumulative risk projecting the first 6 observed examinations up to 10 screening examinations was 20.5%, whereas the cumulative risk with the dependent censoring model was 19.9%. Furthermore, the cumulative risk of a benign invasive procedure was 5.4% based on the first 6 observed examinations, and was 5.2% in the dependent censoring model.
DISCUSSION
We estimated that 1 in every 5 women who participate in the NBCSP will have a false-positive screening result over the course of 10 biennial screening examinations. Furthermore, we found that these women had a cumulative risk of undergoing an invasive procedure with a benign outcome of 4.1%. The results, which are based on nationwide data, confirm the results from a study published in 2004 for false-positive screening results, but are somewhat lower for an invasive procedure (4.1% vs 6.2%).
These results are in agreement with other studies performed in European service screening programs based on biennial screening in women aged 50 years to 69 years.[17-19] The risk of a false-positive screening result was estimated to be 20.4% in a study from Spain, which used the same regression models as the current study. For Copenhagen and Fyn, the cumulative risks were estimated to be 15.8% and 8.1%, respectively, in the study by Njor et al, whereas a letter to the editor by Puliti et al gave a cumulative risk of 15.2% after 7 screening rounds in Italy. The recent review by the Euroscreen Working Group, which included 4 countries, demonstrated a pooled estimate of 19.7%. To the best of our knowledge, only 3 studies in Europe have estimated the cumulative risk of undergoing an invasive procedure with a benign outcome. The estimates ranged from 1.8% in Spain to 8.5% in the United Kingdom.[10, 18]
False-positive risk estimates from the United States are substantially higher than those from Europe, ranging from 42% to 61% for false-positive results[7-9] and from 4.8% to 18.6% for a false-positive biopsy result after 10 screening examinations.[8-10] The differences have been attributed to the screening setting and practice environment. In Europe, breast cancer screening is population-based and all women aged 50 years to 69 years are invited every second year, whereas opportunistic screening of women aged ≥ 40 years using 1-year to 2-year screening intervals is the most common screening practice in the United States.[10, 20-22] Furthermore, the recall rate might be influenced by different reading procedures (independent double reading with consensus in Europe and usually single reading in the United States) or different interpretive volumes, with a recommendation to read at least 5000 screening mammograms per year in Europe compared with the requirement to read at least 960 mammograms every 2 years in the United States. Approximately 40% of the radiologists reading screening mammograms in Norway reach the European volume standard and are specialized in mammography, whereas most mammograms in the United States are read by general radiologists who interpret a wide range of imaging types. In addition, the recommended maximum level of recalls is 3% for subsequent screens under European guidelines while it is 5% to 12% in the United States,[25, 26] which may be due to medico-legal consequences in case of missed cancers at screening. Furthermore, because screening in the United States tends to be opportunistic, US women may be more likely to attend screening at multiple facilities than their European counterparts. This could influence the false-positive rate if comparison films are not made available.
The underlying cancer incidence rate is similar in Europe (77 per 100,000) and the United States (76 per 100,000) and thus is unlikely to influence the false-positive rate.
The estimated cumulative risk of undergoing an invasive procedure with a benign outcome decreased from 6.2% in the previous study from the Norwegian program to 4.1% in the current study. The difference most likely is due to fewer FNACs of cysts being performed in recent years than at the time of the initiation of the program. In addition, in Norway, as in many other countries, CNB has replaced FNAC over the years due to its higher sensitivity for detecting breast cancer. Undergoing an invasive procedure is assumed to have a greater psychological impact than having additional mammography images and/or ultrasound; thus, the rate should be kept as low as possible while maintaining adequate cancer detection. The increased breast cancer risk observed in women with a false-positive recall assessment, even > 6 years after the recall, underscores the importance of a complete assessment for any kind of breast abnormality.
We also found a decrease in the cumulative risk of an OB with a benign outcome from 0.9% in the previous study to 0.2% in the current study. This is likely due to the movement toward performing CNB instead of surgical biopsies as a first invasive diagnostic approach, with women undergoing surgical biopsy only if the CNB result is inconclusive. Surgical biopsies are assumed to have a high positive predictive value, but they also cause psychological stress for the women and more significant scarring than a CNB. A surgical biopsy could be performed for either the diagnosis or treatment of the breast malignancy.
The regression approach we used for estimation is appropriate for studying false-positive screening results, accounting for multiple adjustment variables and changes over time in the absence of dependent censoring. Risk studies from the United States have identified an association between the number of screening examinations and the risk of a false-positive test result.[9, 32] Therefore, we computed the cumulative risk of a false-positive screening result, accounting for dependent censoring, and found little difference in the false-positive risk estimates, suggesting that censoring was independent in our setting.
The availability of > 700,000 screening examinations from > 230,000 women aged 50 years to 51 years at the time of first screening and a study period of 15 years provide robust estimates for the cumulative risk of a false-positive screening examination, including recalls for different procedures. However, no women had the possibility of receiving 10 invitations during the study period, which led us to base our estimates on data from 6 rounds instead of 10. The current study is based on data from a population-based screening program with an attendance rate of 77% of the invited women, and in which 84% of eligible women had attended at least 1 screening examination during the study period. Estimating the cumulative probability of a false-positive result after 10 screening examinations is important for quantifying the potential harms of a screening program if a woman receives all recommended screens.
We estimated that approximately 1 in 5 women undergoing biennial mammography screening from ages 50 years to 69 years will have at least 1 false-positive screening result during that 20-year period, and < 5% will undergo an invasive procedure with a benign outcome. False-positive screening results are an unavoidable part of breast cancer screening and some risk of false-positive results must be accepted for adequate cancer detection. Undergoing an invasive procedure with a benign outcome does not mean that the biopsy was unnecessary, because some mammography findings require a biopsy to determine whether they are benign or malignant. The harm of false-positive recalls must be balanced against the goal of maintaining reasonable detection of early-stage cancers. There is a need for further knowledge regarding the recalls of patients with negative outcomes, and how to reduce their associated harms.
No specific funding was disclosed.
CONFLICT OF INTEREST DISCLOSURES
Dr. Hubbard has received grants from the National Institutes of Health and GE Healthcare. Dr. Hofvind and Ms. Sebuodegard are employed at the Cancer Registry of Norway, which administers the Norwegian Breast Cancer Screening Program.