Cumulative risks of false positive recall and screen‐detected breast cancer after multiple screening examinations

Women tend to make a decision about participation in breast cancer screening and adhere to this for future invitations. Therefore, our study aimed to provide high‐quality information on cumulative risks of false‐positive (FP) recall and screen‐detected breast cancer over multiple screening examinations. Individual Dutch screening registry data (2005‐2018) were gathered on subsequent screening examinations of 92 902 women age 49 to 51 years in 2005. Survival analyses were used to calculate cumulative risks of a FP and a true‐positive (TP) result after seven examinations. Data from 66 472 women age 58 to 59 years were used to extrapolate to 11 examinations. Participation, detection and additional FP rates were calculated for women who previously received FP results compared to women with true negative (TN) results. After 7 examinations, the cumulative risk of a TP result was 3.7% and the cumulative risk of a FP result was 9.1%. After 11 examinations, this increased to 7.1% and 13.5%, respectively. Following a FP result, participation was lower (71%‐81%) than following a TN result (>90%). In women with a FP result, more TP results (factor 1.59 [95% CI: 1.44‐1.72]), more interval cancers (factor 1.66 [95% CI: 1.41‐1.91]) and more FP results (factor 1.96 [95% CI: 1.87‐2.05]) were found than in women with TN results. In conclusion, due to a low recall rate in the Netherlands, the cumulative risk of a FP recall is relatively low, while the cumulative risk of a TP result is comparable. Breast cancer diagnoses and FP results were more common in women with FP results than in women with TN results, while participation was lower.

women to make an informed choice about participation. In this breast cancer screening nationwide registry study using 13 years of follow-up data from the Netherlands, the cumulative risk of a false-positive recall was relatively low, while the cumulative risk of a true-positive result was comparable to that in other European countries. The rates of screen-detected and interval cancers and false-positives were higher in women who had received false-positive results than in women with true-negative results, while their participation was lower.

| INTRODUCTION
Population-based breast cancer screening programmes have been shown to reduce breast cancer mortality by detecting breast cancers earlier. 1 Because of this, many Western countries have implemented a national or regional breast cancer screening programme for their citizens. 2 Within these programmes, women between age 50 and 69, but sometimes also slightly younger or older, are invited for breast cancer screening annually, biennially or triennially. 2 This means that these women are invited to participate in multiple breast cancer screening examinations during their life.
In addition to the reduction in breast cancer mortality, the detection of earlier stage cancers also leads to less invasive treatment, potentially leading to an increase in quality of life. 3 However, breast cancer screening is also associated with harms including overdiagnosis and false positive (FP) screening results. 4,5 Many studies investigated the extent of these harms to be able to weigh them against the benefits, but also to be able to inform the invited women so they can make an informed decision whether to participate or not. In these studies, the extent of overdiagnosis and FP results of a screening programme was found to differ substantially between countries. 6,7 These differences in FP rate can mainly be attributed to differing aspects of the screening programmes, such as programme organisation (ie, extent of centralisation, single vs double reading, experience of radiologists, screening interval and age of the population invited) and cultural factors (ie, risk aversion and litigation culture). 6,7 For example, the specificity of subsequent breast cancer screening examinations in Denmark is considerably higher than that in the United States (US; 99% compared to 92%) and Denmark, thus, has a substantially lower FP rate. 7 In the Netherlands, women are invited for breast cancer screening with digital mammography biennially between the ages of 50 and 74.
The programme has a relatively low recall rate of 2.4% which leads to a FP rate of 1.7%. 8 This percentage also represents the average risk of a screening test resulting in a FP result. However, since women are invited up to 13 times in their lives, it is important to provide high-quality information on the cumulative risks over multiple screening examinations to enable women to make an informed decision about participating. Re-attendance is high in the Netherlands and it is suggested that most women make a decision about participation and adhere to this decision for future invitations. 8,9 Therefore, presenting risks over multiple screening examinations is crucial to enable women to make an informed choice. Furthermore, the rate of FP results per true positive (TP) result gives an indication of the balance between short-term screening benefits and harms in a specific screening programme. It is known that this rate is higher in the initial screening examination than in subsequent examinations. 8,10 However, it is uncertain what this rate will be over multiple examinations cumulatively.
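To illustrate why a per-examination FP rate understates the risk over a screening lifetime, the sketch below compounds the 1.7% FP rate over repeated examinations. It assumes, purely for illustration, that examinations are independent and that the rate is constant. Neither assumption is made in this study, and the function name `naive_cumulative_risk` is hypothetical.

```python
def naive_cumulative_risk(per_exam_risk: float, n_exams: int) -> float:
    """Probability of at least one event in n_exams examinations.

    Illustrative assumption: every examination carries the same risk and
    outcomes are independent. This is NOT how the study estimates risk.
    """
    if not 0.0 <= per_exam_risk <= 1.0:
        raise ValueError("per_exam_risk must be a probability")
    if n_exams < 0:
        raise ValueError("n_exams must be non-negative")
    # Chance of no event in any examination is (1 - p)^n; take the complement.
    return 1.0 - (1.0 - per_exam_risk) ** n_exams


if __name__ == "__main__":
    fp_rate = 0.017  # average FP rate per examination reported for the programme
    for n in (1, 7, 11, 13):
        print(f"{n:2d} examinations: {naive_cumulative_risk(fp_rate, n):.1%}")
```

Because FP rates are highest at the first examination and FP results are more common in women who previously had one, observed cumulative risks need not match this simple compounding, which is one reason to estimate them from individual follow-up data.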
Several studies have analysed cumulative risks of FP results over multiple screening examinations in different countries and found ranges from 8% to 61% over 10 examinations for women with an average breast cancer risk. [11][12][13][14][15][16] The biggest difference was seen when comparing results from studies in the United States to those in Europe, due to the difference in screening interval and recall rate.
Within Europe, where screening intervals and recall rates are more comparable, only a few countries calculated cumulative risks. Despite this comparability in programme, the cumulative risks still ranged between 8% and 23% over 10 screening examinations. [11][12][13][14] Specifically for the Dutch breast cancer screening programme, analyses were performed for 13 examinations which resulted in a cumulative risk of FP results of 16.1%. 17 In that study, data from women starting screening in 1975 to 1976 were used and data on five screening examinations from women starting screening in 1997 were extrapolated using the data from 1975 and incorporating the expected effect of digital mammography. However, in the meantime, changes have been made in the programme such as the introduction of digital mammography, the implementation of two-view mammography in both initial and subsequent screening examinations and changes to the referral strategy including the use of the Breast Imaging-Reporting and Data System (BI-RADS) categories which affected the number of FP results. 3,18,19
Furthermore, international studies found that women who previously had a FP result are more likely to be diagnosed with breast cancer later on. [20][21][22][23] The reported hazard ratios (HRs) and relative risks (RRs) were between 1.67 and 2.18 for women who previously had a FP result and increased to HRs between 4.22 and 9.13 for women who had multiple FP results. [20][21][22][23] Risks of both screen-detected and interval cancers were found to be increased and remained higher until 12 years after receiving the FP result. 23 This suggests that there might be some underlying biological susceptibility that causes some of the excess cancer risk in women with a FP test. 20 However, since FP rates differ between countries, it can be expected that the population of women with a history of a FP result and their risk factors are different as well.
Therefore, it is unclear if, and to what extent, FP results in the Dutch breast cancer screening programme lead to an increased risk of a breast cancer diagnosis. This is especially relevant since women were found to be less likely to participate in screening after a FP result in the Dutch breast cancer screening programme. 24 Therefore, our study aimed to estimate the cumulative risk of false positive recall and screen-detected breast cancer after multiple screening examinations in the Netherlands using more recent data.
Furthermore, our study aimed to investigate screening behaviour and outcomes in women with a history of FP results.

| METHODS
The population-wide breast cancer screening programme in the Netherlands started in 1990 with biennial mammography screening for women aged 50 to 69. In 1998, this age range was extended to also include women aged 70 to 74. Initially, screen-film mammography was used, but this was gradually replaced by full-field digital mammography between 2003 and 2010. Mammographic examinations are performed by specialised radiographers who check the images and immediately repeat examinations in case of vagueness or incompleteness. Independent double reading is performed by specialised screening radiologists who use the BI-RADS system to classify mammograms. In the Netherlands, women with a BI-RADS score of 0, 4 or 5 are referred for follow-up testing. 18 BI-RADS 3 is not used.

| Data collection
Data were retrieved from the Dutch Cancer Registry (NKR) at the Netherlands Comprehensive Cancer Organisation (IKNL). The dataset included data on screening invitations, participation and outcomes.
Furthermore, the age of the women at each screening examination was included. Participation was defined as a screening test registered after a screening invitation and before the sending of the invitation of the subsequent examination (ie, 24 months).
At the start of the screening programme, screening data were stored in multiple regional screening registries. More recently, the data were brought together in a national database. However, due to differences between registries, data from before 2005 were incomplete, which made them unreliable for this analysis. Therefore, we chose to only include data from screening invitations sent from the year 2005 onwards.
During the time period 2005 to 2019, women who regularly received biennial breast cancer screening invitations could have been invited for breast cancer screening seven or eight times. Women who moved to another municipality in the meantime could have received more or fewer invitations, and women who permanently unregistered for breast cancer screening or had breast cancer received fewer invitations.

| Population
This longitudinal, observational cohort study included two cohorts of women who were invited for breast cancer screening. The first cohort

| Statistical analyses
Because data were only available from 2005 onwards, the analyses included seven consecutive screening examinations of the 13 that were offered in the Dutch breast cancer screening programme. However, by using the data of a second cohort of older women, extrapolation was possible until 11 examinations of screening (Figure 1).
After seven examinations, the cumulative risk of a FP result was 9.1% (95% CI: 8.9%-9.4%) and of a TP result was 3.7% (95% CI: 3.6%-3.9%). The ratio of FP to TP results decreased with an increasing number of examinations and increasing age of the women (2.5 after 7 examinations and 1.9 after 11). During the first examination the highest percentage of FP results was found, after which the increase in cumulative risk seemed to follow a less steep linear trend (Figure 2). After a relatively high number of TP results during the first examination, the cumulative risk increased more slowly, followed by an increasing steepness during later examinations at higher age.
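As an illustration of how a cumulative risk over consecutive examinations can be obtained from per-examination counts, the sketch below uses a discrete-time product-limit (Kaplan-Meier type) calculation. This is an assumption about the general technique: the text only states that survival analyses were used, and the exact estimator is not described here. The function names and all counts in the usage example are hypothetical. The FP/TP ratio helper simply divides the two cumulative risks reported in this study.

```python
from typing import List, Sequence


def cumulative_risk(at_risk: Sequence[int], events: Sequence[int]) -> List[float]:
    """Cumulative risk after each examination: 1 - prod_k (1 - d_k / n_k).

    at_risk[k] is the number of women still event-free and screened at
    examination k; events[k] is the number with a first event at that
    examination. Women who stop participating simply leave the at-risk set.
    """
    if len(at_risk) != len(events):
        raise ValueError("at_risk and events must have the same length")
    event_free = 1.0
    risks: List[float] = []
    for n_k, d_k in zip(at_risk, events):
        if n_k <= 0 or not 0 <= d_k <= n_k:
            raise ValueError("need n_k > 0 and 0 <= d_k <= n_k")
        event_free *= 1.0 - d_k / n_k  # probability of remaining event-free
        risks.append(1.0 - event_free)
    return risks


def fp_per_tp(cum_fp_risk: float, cum_tp_risk: float) -> float:
    """Number of FP results per TP result, from two cumulative risks."""
    if cum_tp_risk <= 0:
        raise ValueError("cum_tp_risk must be positive")
    return cum_fp_risk / cum_tp_risk


if __name__ == "__main__":
    # Hypothetical counts for three examinations, for illustration only.
    print(cumulative_risk(at_risk=[1000, 900, 800], events=[30, 18, 16]))
    # Cumulative risks reported in this study (percent).
    print(round(fp_per_tp(9.1, 3.7), 1), round(fp_per_tp(13.5, 7.1), 1))
```

With the reported cumulative risks, the ratio helper reproduces the FP/TP ratios of 2.5 after 7 examinations and 1.9 after 11.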
Participation in the screening examination following a TN screening result was found to be over 90%, independent of the examination in which the TN result was received (Figure 3). When a FP result was received in the first screening examination, participation in the second examination was 71%. However, the later the FP result was received, the higher the participation rate in the subsequent examination with a maximum of 81% participation in examinations 6 and 7. Even though the participation rate increased as the FP was received later, it was always lower than when a TN result was received.
A study in Finland found a cumulative risk of a screen-detected breast cancer of 3.4% over 7 screening examinations and 5.7% over 10 examinations with the highest risk in women with a history of breast cancer symptoms. 11 Furthermore, a Spanish study found that women with a history of benign breast disease had a cumulative risk of 3.6%, women with a family history of breast cancer had a cumulative risk of 4.5%, women with both had a risk of 6.1% and women with neither had a cumulative risk of 2.6% over seven screening examinations. 25 The weighted average of these four groups would come down to a cumulative risk of 3.0%. Compared to our results on TPs, the Finnish and Spanish risks are a little lower. A reason for this is the lower breast cancer incidence in both countries compared to the Netherlands. 26 However, differences in screening detection performance may also play a role. 27 A previous study on the Dutch breast cancer screening programme predicted that the cumulative risk of a screen-detected breast cancer after implementation of digital mammography would be 7.1% over 13 examinations of screening. 17 Our study already found a cumulative risk of 7.1% after 11 examinations.
The increase in cumulative risk can probably be explained by the usage of data from a more recent cohort of women who have a higher risk of developing breast cancer. 28 Only a few studies present the cumulative risk of a FP result after seven examinations of breast cancer screening. A study in Spain found cumulative risks between 20.7% and 34.3% depending on family history and previous benign breast disease, an Italian study found a cumulative risk of 15.2%, and a Finnish study found a cumulative risk of 13.6% to receive a FP result after seven examinations. 11,25,29 All three estimates are higher than the cumulative risk of 9.1% that we found after seven examinations in the current study. More European studies reported cumulative risks after 10 screening examinations and found estimates between 8% and 23%. [11][12][13][14] Only the cumulative risk of 8% found in the region of Fyn in the Danish study was lower than the 12.6% the current study found after 10 examinations of screening. 14 The estimate for the Copenhagen region and the other studies were all higher than the 12.6% we found over 10 examinations and also higher than the 13.5% we found for 11 screening examinations.
In addition, two American studies found even higher cumulative risks between 38.1% and 42% after five examinations of biennial screening and between 56.3% and 61.3% after 10 examinations of annual screening. 15,16 The considerable difference between most European countries and the US can probably be explained by the lower recall rate in most European countries compared to the US. 7 Even within Europe, recall rates differ and can explain the differences between countries, but differences in the calendar years of the data used could also have an influence. 27 Additionally, the study by Ho et al found that screening with digital breast tomosynthesis (DBT) instead of digital mammography can decrease the cumulative risk of a FP result by 6.7 percentage points in annual and 2.4 percentage points in biennial screening. 16 In the US, DBT is used in a proportion of the screening settings, while in Europe DBT is hardly used in screening, which can also explain part of the difference between the estimates in the US and Europe.
Overall, the cumulative risks of FP and TP results in the Netherlands are relatively favourable compared to other countries.
Despite the lower recall rate in the Netherlands, the cumulative risk of a screen-detected breast cancer remains quite comparable. This is an indication that the lower recall rate did not compromise the detection rate. This was also reflected in the low FP/TP ratios found, com-
Participation among women with a TN result was found to be high, over 90%, which is in line with the participation loyalty in the monitors of the Dutch screening programme. 8 Among women with a FP result, participation was found to be lower. This was also found in two previous studies in the Netherlands which found even lower participation rates of around 65% among women with a FP result compared to 93% to 95% among women with a negative screening result. 9,24 However, Setz-Pels et al also found that nearly 30% of women with a FP result had follow-up surveillance in the hospital, which suggested that the mammography coverage, that is, screening coverage and hospital surveillance combined, in women with a history of FP results would be almost as high as for women with a negative screening result. 9 On the other hand, a study in Copenhagen did not find any difference between women with a negative and women with a FP screening result in their participation rate in the next round. 32 Interestingly, Chiarelli et al found that, in Ontario, reattendance of women with a previous FP result was lower in screening centres without an assessment programme, like the policy in the Netherlands, and equal to that of women with a negative result in centres with an assessment programme, like the policy in Denmark. 33
this effect was expected to be relatively small since the age difference was only 9 years. 
28,34 Even though the extrapolation is less precise than analysis based on observed data, the benefit of this was that the cumulative risks are more applicable to women eligible for screening in current times, because screening performance has changed due to the implementation of digital mammography and because the breast cancer risk has increased over the years. 28,35
Given that most women in the Netherlands seem to make a fundamental decision about participation in breast cancer screening and adhere to this decision for future invitations, it is important that this decision is based on information encompassing benefits and harms of participation in multiple screening examinations. In addition, providing stratified information on increased risks in women who previously had a FP outcome may give them insights into their personal risk of developing breast cancer and may potentially increase their participation in the screening programme. Furthermore, in the prospect of risk-stratified screening, it may be useful to take history of FP results into consideration when forming risk groups.
To conclude, we found that women who participate in the Dutch