Minimizing misclassification of hormone users at mammography screening
Article first published online: 24 NOV 2008
Copyright © 2009 Wiley-Liss, Inc.
International Journal of Cancer
Volume 124, Issue 9, pages 2159–2165, 1 May 2009
How to Cite
Njor, S. H., Pedersen, A. T., Schwartz, W., Hallas, J. and Lynge, E. (2009), Minimizing misclassification of hormone users at mammography screening. Int. J. Cancer, 124: 2159–2165. doi: 10.1002/ijc.24181
- Issue published online: 24 FEB 2009
- Manuscript Accepted: 28 OCT 2008
- Manuscript Received: 11 JUL 2008
- Danish Medical Research Council
- hormone therapy;
- false positive test;
- interval cancer
The aim of the study was to determine retrospectively how comparing current mammograms with prior mammograms affects the risk of misclassification, especially for hormone users. Data on mammography screening were retrieved for 1993–2005 from Fyn, Denmark. At first screen, two projections were made; at subsequent screens, one projection was made for fatty breasts and two for mixed/dense breasts. Until June 3, 2002, 2-year-old mammograms were used for comparison, and thereafter 4-year-old mammograms. Prescription drug data were used to identify hormone therapy (HT) use. The dependency of false positive risk and interval cancer proportion on age, hormone use, screen number, projection and prior mammogram was tested with logistic regression. Controlled for breast density, current HT users had a lower risk of a false positive test, RR 0.69 (95% CI 0.55–0.86), and a lower interval cancer proportion, RR 0.66 (95% CI 0.45–0.99), when 4-year-old instead of 2-year-old mammograms were used for comparison. The use of 4-year-old instead of 2-year-old mammograms for comparison lowered the risk of a false positive test in never users, but otherwise the age of the comparison mammogram had no impact on classification of never and past users of HT. The study indicated that misclassification at screening mammography in current users of HT can be reduced considerably when the screening mammograms are viewed with the mammograms taken 4 years earlier. It should be stressed that these results come from a single clinic, and replication in other observational and/or experimental studies is warranted. © 2008 Wiley-Liss, Inc.
In breast cancer screening with mammography, a certain proportion of participants will inevitably experience a misclassification, i.e. a false positive test or an interval cancer. Misclassification is a serious problem both for the individual woman and for the screening programme. Many efforts have therefore been made to reduce misclassification. Roelofs et al. found that viewing current mammograms with prior mammograms reduced referrals due to benign lesions.1 To minimize misclassification, radiologists in the United States are advised to obtain prior mammograms when “the interpreting physician deems it necessary.”2 Radiologists in Europe are advised, whenever possible, to display previous mammograms at the time of screen reading, but it is a matter of personal choice whether these films are from the immediately previous or an earlier screening round.3
Many users of hormone therapy (HT) continue to have dense breast tissue in postmenopausal age,4, 5 thus complicating mammography screening. Several studies have shown that the risk of a false positive test as well as a false negative test is increased in HT users compared to that of non-users.4, 6–10 In the Million Women Study, current HT users had a 64% increased risk of getting a false positive mammogram compared to the risk in never users.8 The sensitivity was 83% among current users and 91% among never users. However, as described above, different viewing procedures may be used in mammography screening. In 2002, the radiologist in charge of the mammography screening programme in Fyn, Denmark, felt that growth of minor abnormalities might be easier to see if 4-year-old instead of 2-year-old mammograms were used for comparison in the film reading. We have tested whether this assumption holds true for users and non-users of HT by combining data from the population-based mammography screening programme in Fyn with the population-based drug prescription register covering the same population. Mammography screening started in Fyn in November 1993, and the drug prescription register has been population-based since November 1992. The two registers can be linked by use of unique personal identification numbers.
Throughout the study period, the same technical equipment was used and the same radiologist (WS) was in charge of the mammography screening programme in Fyn. This ensured continuity, as did the fact that the clinic had been in operation for almost 10 years before the reading procedure was changed. The studied setting is therefore well suited for evaluating the impact of using 4-year-old instead of 2-year-old mammograms for comparison.
Material and methods
Mammography screening registers for the county of Fyn, Denmark
The organized mammography screening programme in the county of Fyn, Denmark, started in November 1993. The programme is population based and offers biennial screening to women aged 50–69 years. Women living in and around the main city, Odense, are screened at a central clinic, while women living in other parts of Fyn are screened in a mobile unit. Independent double reading is standard, and all women have two-view mammography at the first screen. Breast tissue density is dichotomized based on the outcome of the first screen. Women with very fatty breasts are scheduled for one-view mammography at the next screen, while women with mixed/dense breasts (about 60%) are scheduled for two-view mammography.11 The same procedure is followed at subsequent screens. From the second screen onward, new mammograms were always compared to previous mammograms at screen reading. Until June 3, 2002, mammograms were compared to those taken 2 years earlier (2y mammograms). After this date, mammograms were compared to those taken 4 years earlier (4y mammograms).
From the screening programme, we collected data on all screens from the first six invitation rounds from November 1993 to December 2005. Data included information on personal identification number (including date of birth), date of screen, number of projections, outcome of screening mammogram, outcomes of further assessment and operation when undertaken. All mammograms were read in one clinic headed by the same radiologist (WS) for the entire study period. A screening mammogram was defined as positive if it was equivalent to BI-RADS categories 4 and 5. A false positive test was defined as a positive screen where the assessment and/or operation did not show invasive breast cancer or ductal carcinoma in situ, DCIS. The false positive risk was defined as number of false positive tests divided by number of screened women.
Interval cancer data
The screened women were followed-up for interval cancers from the date of the negative finding until date of next screen or a maximum of 2 years, whichever date came first. Information on interval cancers was retrieved from the registers of the Danish Breast Cancer Cooperative Group. The interval cancer proportion was defined as number of interval cancers divided by number of screen detected cancers plus number of interval cancers.
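The two outcome measures defined above are simple proportions. The following minimal Python sketch computes them together with a normal-approximation (Wald) 95% interval truncated at zero. The choice of interval is an assumption: it appears consistent with the intervals printed in the results tables, but the text does not state which method was used. The function name is hypothetical, and the counts are taken from the first-screen rows for never users in the results tables.

```python
import math

def proportion_with_wald_ci(events, total, z=1.96):
    """Return (proportion, lower, upper) with a normal-approximation interval.

    Assumption: a Wald interval truncated to [0, 1]; the paper does not
    name the interval method it used.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = events / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# False positive risk = false positive tests / screened women
# (never users, first screen: 357 of 22722).
fp_risk = proportion_with_wald_ci(357, 22722)

# Interval cancer proportion = interval cancers /
# (screen-detected cancers + interval cancers)
# (never users, first screen: 26 of 285 cancers in total).
ic_proportion = proportion_with_wald_ci(26, 285)

print("false positive risk (%):", [round(100 * x, 2) for x in fp_risk])
print("interval cancer proportion (%):", [round(100 * x, 1) for x in ic_proportion])
```

Note that the denominators differ: the false positive risk is per screened woman, whereas the interval cancer proportion is per diagnosed cancer.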
Drug prescription register for the county of Fyn, Denmark
The Odense University Pharmacoepidemiological Database, OPED, collects information on reimbursed prescription drugs.12 OPED covered the city of Odense and suburbs from October 1990 (210,000 inhabitants), while coverage for the rest of Fyn increased to 88% by the end of 1991, and was complete by November 1992 (468,000 inhabitants). The completeness of the prescription register is excellent for reimbursed prescription drugs.12 The register includes information on personal identification number of the patient, date of purchase, drug code according to the Anatomical Therapeutic Chemical (ATC) Classification System,13 commercial name and quantity measured in grams, number of units (e.g. pills) or millilitres (e.g. of gel).
All but two HT drugs were reimbursed in the period October 1990 to December 2005. On the basis of public drug statistics on number of women in the county of Fyn treated with one of the two missing drugs from 1997 and onwards,14 we estimated that information on hormone use was missing for less than 1% of screened women included in the present study.
HT drugs included were as follows: oestrogen, combined oestrogen-progesterone, SERM and tibolone. To include combined oestrogen-progesterone treatment when bought separately, we included progesterone, when bought at the same time as systemically administered oestrogen.
Linkage of registers
The three registers were linked by use of the personal identification numbers. For each screened woman, information was collected on purchased HT drugs prior to any of the maximum of six screens. A never user was a woman not registered with any purchase of HT before the date of screening. A woman was defined as current user of HT, if she had purchased HT before attending screening in a quantity that would last at least until 14 days before the date of screening, and she was defined as past user of HT, if she had purchased HT before attending screening, but in a quantity that would not last until 14 days before the date of screening.
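The user-group definitions above can be sketched as a small classification rule. This is only an illustration under stated assumptions: the register records a purchased quantity, and the conversion of that quantity into a number of days of supply is not described in the text, so the `days_of_supply` field, the function name and the example dates are hypothetical.

```python
from datetime import date, timedelta

def classify_ht_user(purchases, screen_date, window_days=14):
    """Classify a woman as 'never', 'current' or 'past' HT user at a screen.

    purchases: iterable of (purchase_date, days_of_supply) tuples.
    Assumption: days_of_supply has already been derived from the
    purchased quantity; the paper does not describe that conversion.
    """
    cutoff = screen_date - timedelta(days=window_days)
    # Only purchases made before the date of screening count.
    prior = [(d, supply) for d, supply in purchases if d < screen_date]
    if not prior:
        return "never"
    for purchase_date, days_of_supply in prior:
        # Current user: the supply lasts at least until 14 days before screening.
        if purchase_date + timedelta(days=days_of_supply) >= cutoff:
            return "current"
    return "past"

screen = date(2003, 6, 1)
print(classify_ht_user([(date(2003, 3, 1), 90)], screen))
print(classify_ht_user([(date(2002, 1, 1), 30)], screen))
print(classify_ht_user([], screen))
```

Because status is evaluated per screen, the same woman can be a never user at one screen and a current or past user at a later one.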
Models and tests—Analysis of false positive tests
As the false positive risk in the screening programme depended strongly on screen number, being considerably higher for the first than for subsequent screens, we distinguished between these.15 We furthermore divided the subsequent screens into those from fatty and mixed/dense breasts, and those compared to 2y or 4y mammograms. We have previously shown that the outcomes of subsequent screens are independent.15 This was found for two mammography screening programmes, in Fyn and Copenhagen, Denmark, respectively. The finding of independence is not surprising, as the new mammograms are compared to old ones, and a suspicious benign finding is therefore only reacted on once. For every woman we therefore included all screens in the analysis, and assumed independence between the observations. The number of false positive tests in women belonging to a specific age, user group, screen number, breast density and comparison group follows a binomial distribution. To be able to separate the effect of postmenopausal HT use from the effect of endogenous hormones, we restricted the analysis to women aged 55–69 years, assuming the vast majority of these women to be postmenopausal. Using logistic regression, we tested whether the false positive risk depended on age (55–59, 60–64 or 65–69 years), comparison group (no, 2y or 4y), breast density (mixed/dense or fatty), screen number (1, 2, 3, 4, 5, 6) and user group (current, past, never). The proportion of women with mixed/dense breasts was higher during the 4y period than during the 2y period. This could be due to a real change in the density distribution, and to take account of this we adjusted for breast density when we compared the false positive risks between the 4y and the 2y periods. The higher proportion of women with mixed/dense breasts during the 4y period could, however, also be due to a drift over time in the classification practice while the underlying biological distribution was actually stable. As this latter explanation cannot be ruled out, we also compared the false positive risks between the 4y and the 2y periods unadjusted for breast density.
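For orientation, the model described above can be written out as follows. This is a sketch of the main-effects form only; the symbols are chosen here for illustration and are not taken from the original article, and interactions (such as the one between comparison group and user group reported in the results) would enter as additional terms.

```latex
% X_g: number of false positive tests among the n_g screens in covariate group g
% p_g: false positive risk in group g
X_g \sim \mathrm{Binomial}(n_g,\, p_g), \qquad
\log\frac{p_g}{1 - p_g}
  = \beta_0
  + \beta^{\text{age}}_{a(g)}
  + \beta^{\text{comparison}}_{c(g)}
  + \beta^{\text{density}}_{d(g)}
  + \beta^{\text{screen}}_{s(g)}
  + \beta^{\text{user}}_{u(g)}
```

Here a(g), c(g), d(g), s(g) and u(g) denote the age group, comparison group, breast density, screen number and HT user group of covariate group g.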
Models and tests—Analysis of interval cancer proportion
As previous studies have shown that false negative tests depend on breast density, screen number and user group,6, 8, 9 we included these as variables in the analysis of the interval cancer proportion. In our data, no woman had more than one breast cancer. We therefore included all screen detected and interval cancers in the analysis. The model for analyzing the interval cancer proportion is thereby the same as the model we used for analyzing false positive tests.
From November 1993 until the end of the sixth invitation round in December 2005, 271,367 screens were undertaken in women in the target group 50–69 years, Table I. In total, 75,581 women were screened for the first time; 61,375 women for the second time; 49,786 women for the third time; 38,708 women for the fourth time; 28,525 women for the fifth time and 17,392 women for the sixth time. Of the 271,367 screens, 84,996 were undertaken in women aged 50–54 on the date of screening; 75,102 in women aged 55–59; 60,883 in women aged 60–64 years and 50,386 in women aged 65–69. In total, 169,903 screens were undertaken in never HT users; 53,659 in current HT users, and the remaining 47,805 screens were undertaken in past HT users.
Table I

| Screen number | Age 50–54 | Age 55–59 | Age 60–64 | Age 65–69 | Never users | Current users | Past users | Total |
|---|---|---|---|---|---|---|---|---|
| First | 43837 (58.0%) | 12104 (16.0%) | 10476 (13.9%) | 9164 (12.1%) | 52952 (70.1%) | 14467 (19.1%) | 8162 (10.8%) | 75581 (100%) |
| Second | 30538 (49.8%) | 12124 (19.8%) | 10056 (16.4%) | 8657 (14.1%) | 39147 (63.8%) | 13031 (21.2%) | 9197 (15.0%) | 61375 (100%) |
| Third | 10601 (21.3%) | 20679 (41.5%) | 10145 (20.4%) | 8361 (16.8%) | 29864 (60.0%) | 10831 (21.8%) | 9091 (18.3%) | 49786 (100%) |
| Fourth | 19 (0.1%) | 20356 (52.6%) | 10019 (25.9%) | 8314 (21.5%) | 22324 (57.7%) | 7918 (20.5%) | 8466 (21.9%) | 38708 (100%) |
| Fifth | 1 (0.0%) | 9488 (33.3%) | 10785 (37.8%) | 8251 (28.9%) | 16047 (56.3%) | 5268 (18.5%) | 7210 (25.3%) | 28525 (100%) |
| Sixth | 0 (0.0%) | 351 (2.0%) | 9402 (54.1%) | 7639 (43.9%) | 9569 (55.0%) | 2144 (12.3%) | 5679 (32.7%) | 17392 (100%) |
| Total | 84996 (31.3%) | 75102 (27.7%) | 60883 (22.4%) | 50386 (18.6%) | 169903 (62.6%) | 53659 (19.8%) | 47805 (17.6%) | 271367 (100%) |
False positive risk
Age and screen number did not significantly affect the false positive risk, and the only significant interaction was between comparison group and user group. It might seem odd that screen number was not a significant variable, as the false positive risk is known to be considerably higher at first than at subsequent screens. This is, however, because the distinction between incident and prevalent screens is captured by the variable comparison group, where “no” by definition denotes prevalent screens.
Current HT users systematically had an excess relative risk of false positive tests compared with never users, Table II. The risk was increased by about 50% at the first screen, RR 1.49 [95%CI 1.22–1.81]. The risk was doubled at all the subsequent screens, RR 2.19 [95%CI 1.62–2.96] for fatty breasts, 2y mammogram; RR 2.30 [95%CI 1.91–2.78] for mixed/dense breasts, 2y mammograms; RR 2.28 [95%CI 1.21–4.30] for fatty breasts, 4y mammogram; and RR 1.91 [95%CI 1.47–2.49] for mixed/dense breasts, 4y mammogram. The risk of false positive tests in past HT users did not differ significantly from that of never users in any of the comparison groups. Where 2y mammograms were used in the viewing, never users with mixed/dense breasts had a significantly higher false positive risk than those with fatty breasts, p = 0.012. The same was true for current HT users, p = 0.026, while breast density had no impact on the false positive risk in any of the other groups.
Table II

| Comparison mammogram | Breast density | Never users: false positive | Never users: total | Never users: % [95% CI] / p for trend | Current HT users: false positive | Current HT users: total | Current HT users: % [95% CI] / p for trend | Current HT users: RR [95% CI] | Past HT users: false positive | Past HT users: total | Past HT users: % [95% CI] / p for trend | Past HT users: RR [95% CI] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| None (first screen) | All | 357 | 22722 | 1.57% [1.41–1.73] | 145 | 6129 | 2.37% [1.99–2.75] | 1.49 [1.22–1.81] | 45 | 2893 | 1.56% [1.10–2.01] | 0.97 [0.71–1.33] |
| 2y mammogram | Fatty | 149 | 27401 | 0.54% [0.46–0.63] | 63 | 5302 | 1.19% [0.90–1.48] | 2.19 [1.62–2.96] | 42 | 6107 | 0.69% [0.48–0.90] | 1.25 [0.88–1.77] |
| 2y mammogram | Mixed/dense | 217 | 30618 | 0.71% [0.61–0.80] | 236 | 14716 | 1.60% [1.40–1.81] | 2.30 [1.91–2.78] | 62 | 9584 | 0.65% [0.49–0.81] | 0.93 [0.70–1.23] |
| 2y mammogram | Trend for density | | | p = 0.012 | | | p = 0.026 | | | | p = 0.790 | |
| 4y mammogram | Fatty | 39 | 9134 | 0.43% [0.29–0.56] | 13 | 1322 | 0.98% [0.45–1.52] | 2.28 [1.21–4.30] | 16 | 3366 | 0.48% [0.24–0.71] | 1.11 [0.62–1.99] |
| 4y mammogram | Mixed/dense | 141 | 24981 | 0.56% [0.47–0.66] | 93 | 8645 | 1.08% [0.86–1.29] | 1.91 [1.47–2.49] | 87 | 13451 | 0.65% [0.51–0.78] | 1.15 [0.88–1.50] |
| 4y mammogram | Trend for density | | | p = 0.104 | | | p = 0.780 | | | | p = 0.229 | |

RR: relative risk compared with never users.
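As a rough check on the relative risks in Table II, a crude relative risk with a log-scale 95% interval can be computed directly from the counts. This is not the authors' method: the published estimates come from logistic regression, so crude values are expected to be close but not identical. The function name is hypothetical.

```python
import math

def relative_risk(events_exposed, total_exposed, events_ref, total_ref, z=1.96):
    """Crude relative risk with a log-scale confidence interval.

    Assumption: a simple two-group comparison without adjustment for age
    or other covariates, unlike the regression used in the paper.
    """
    rr = (events_exposed / total_exposed) / (events_ref / total_ref)
    se_log = math.sqrt(
        1 / events_exposed - 1 / total_exposed + 1 / events_ref - 1 / total_ref
    )
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Current HT users vs never users, fatty breasts, 2y comparison mammograms.
rr, lower, upper = relative_risk(63, 5302, 149, 27401)
print(f"RR {rr:.2f} [{lower:.2f}, {upper:.2f}]")
```

The same call with the counts from any other row gives the corresponding crude estimate, with never users as the reference group.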
As breast density had an impact on the false positive risk in some of the groups, we adjusted for breast density when calculating the relative risk of a false positive test for never HT users, current HT users and past HT users by type of mammogram used for comparison. Adjusted for breast density, use of 4y mammograms as compared with use of 2y mammograms was associated with a significant reduction in the false positive risk for both never users, RR 0.79 [95%CI 0.66–0.95], and current HT users, RR 0.69 [95%CI 0.55–0.86], while no difference was found for past users, RR 0.91 [95%CI 0.69–1.20], Table III. However, to take account of a possible drift over time in the density classification practice, we also calculated the relative risks unadjusted for breast density. These relative risks were RR 0.84 [95%CI 0.70–0.9992] for never users, RR 0.71 [95%CI 0.57–0.89] for current HT users, and RR 0.92 [95%CI 0.70–1.21] for past users. As seen, the adjustment for breast density changed the relative risk estimates only very slightly.
Table III

| HT user group | Type of mammogram used for comparison | False positive | Total | False positive % of total | RR [95% CI], unadjusted for breast density | RR [95% CI], adjusted for breast density |
|---|---|---|---|---|---|---|
| Never users | 2y mammogram = baseline | 366 | 58019 | 0.63% | 1 | 1 |
| Never users | 4y mammogram | 180 | 34115 | 0.53% | 0.84 [0.70–0.9992] | 0.79 [0.66–0.95] |
| Current HT users | 2y mammogram = baseline | 299 | 20018 | 1.49% | 1 | 1 |
| Current HT users | 4y mammogram | 106 | 9967 | 1.06% | 0.71 [0.57–0.89] | 0.69 [0.55–0.86] |
| Past HT users | 2y mammogram = baseline | 104 | 15691 | 0.66% | 1 | 1 |
| Past HT users | 4y mammogram | 103 | 16817 | 0.61% | 0.92 [0.70–1.21] | 0.91 [0.69–1.20] |
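The effect of adjusting for breast density can be illustrated with a stratified estimate. The sketch below uses a Mantel-Haenszel relative risk over the two density strata, which is a technique swapped in here for illustration; the paper obtained its adjusted estimates from logistic regression. The function name is hypothetical, and the counts are the 4y and 2y rows for fatty and mixed/dense breasts from Table II.

```python
def mantel_haenszel_rr(strata):
    """Mantel-Haenszel relative risk pooled over strata.

    strata: iterable of (events_exposed, total_exposed, events_ref, total_ref).
    Assumption: stands in for the density adjustment; not the authors' model.
    """
    numerator = 0.0
    denominator = 0.0
    for events_exposed, total_exposed, events_ref, total_ref in strata:
        stratum_total = total_exposed + total_ref
        numerator += events_exposed * total_ref / stratum_total
        denominator += events_ref * total_exposed / stratum_total
    return numerator / denominator

# 4y mammograms (exposed) vs 2y mammograms (reference), by breast density.
current_users = [
    (13, 1322, 63, 5302),     # fatty
    (93, 8645, 236, 14716),   # mixed/dense
]
never_users = [
    (39, 9134, 149, 27401),   # fatty
    (141, 24981, 217, 30618), # mixed/dense
]

print(f"current HT users: {mantel_haenszel_rr(current_users):.2f}")
print(f"never users: {mantel_haenszel_rr(never_users):.2f}")
```

Pooling within strata prevents the larger share of mixed/dense breasts in the 4y period from being mistaken for an effect of the comparison mammogram.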
Table IV

| Comparison mammogram | Breast density | Never users: interval cancer | Never users: all cancer | Never users: % [95% CI] / p for trend | Current HT users: interval cancer | Current HT users: all cancer | Current HT users: % [95% CI] / p for trend | Current HT users: RR [95% CI] | Past HT users: interval cancer | Past HT users: all cancer | Past HT users: % [95% CI] / p for trend | Past HT users: RR [95% CI] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| None (first screen) | All | 26 | 285 | 9.1% [5.8–12.5] | 39 | 115 | 33.9% [25.3–42.6] | 3.72 [2.26–6.11] | 3 | 29 | 10.3% [0–21.4] | 1.13 [0.34–3.75] |
| 2y mammogram | Fatty | 34 | 110 | 30.9% [22.3–39.6] | 12 | 32 | 37.5% [20.7–54.3] | 1.21 [0.63–2.34] | 3 | 21 | 14.3% [0–29.3] | 0.46 [0.14–1.50] |
| 2y mammogram | Mixed/dense | 61 | 272 | 22.4% [17.5–27.4] | 91 | 227 | 40.1% [33.7–46.5] | 1.79 [1.29–2.47] | 25 | 81 | 30.9% [20.8–40.9] | 1.38 [0.86–2.19] |
| 2y mammogram | Trend for density | | | p = 0.084 | | | p = 0.968 | | | | p = 0.131 | |
| 4y mammogram | Fatty | 11 | 46 | 23.9% [11.6–36.2] | 2 | 6 | 33.3% [0–71.1] | 1.39 [0.31–6.29] | 2 | 14 | 14.3% [0–32.6] | 0.60 [0.13–2.70] |
| 4y mammogram | Mixed/dense | 50 | 215 | 23.3% [17.6–28.9] | 30 | 115 | 26.1% [18.1–34.1] | 1.11 [0.70–1.76] | 38 | 136 | 27.9% [20.4–35.5] | 1.21 [0.79–1.85] |
| 4y mammogram | Trend for density | | | p = 0.924 | | | p = 0.696 | | | | p = 0.284 | |

RR: relative risk compared with never users.
The false positive risk in current HT users was fairly similar for those also being current HT users when the comparison mammogram was taken, 1.49% (247/16565) and 0.98% (75/7641), for 2y mammograms and 4y mammograms, respectively; and for those being never HT users at that time, 1.52% (11/724) and 1.03% (9/877), for 2y mammograms and 4y mammograms, respectively, though the numbers were fairly small for those with a changed HT status (data not shown).
The size of the false positive risk varied from 2.37% for current HT users at the prevalent screen to 0.43% for never users at subsequent screens with fatty breasts and 4y mammograms, Figure 1.
Interval cancer proportion
Age, screen number and breast density did not significantly affect the interval cancer proportion, and the only significant interaction was between comparison group and user group. It might seem odd that screen number was not a significant variable, as the interval cancer proportion is known to be considerably higher at first than at subsequent screens. This is, however, because the distinction between incident and prevalent screens is captured by the variable comparison group, where “no” by definition denotes prevalent screens.
Compared to never users, current HT users had a higher interval cancer proportion at the first screen, RR 3.72 [95%CI 2.26–6.11], Table IV. The interval cancer proportion was increased at all the subsequent screens, but the increase was only significant for mixed/dense breasts where 2y mammograms were used, RR 1.79 [95%CI 1.29–2.47]. The interval cancer proportion among past HT users did not differ significantly from that of never users in any of the comparison groups. Breast density had no significant impact on the interval cancer proportion.
As breast density had a nearly significant impact on the interval cancer proportion in some of the groups, we adjusted for breast density when calculating the relative risk of interval cancer for never HT users, current HT users and past HT users by type of mammogram used for comparison, Table V. Adjusted for breast density, use of 4y mammograms as compared with use of 2y mammograms was associated with a significant reduction in the interval cancer proportion for current users, RR 0.66 [95%CI 0.45–0.99], while no difference was found for never HT users, RR 0.97 [95%CI 0.70–1.34], or for past users, RR 0.91 [95%CI 0.56–1.48]. Relative risk estimates without adjustment for breast density were RR 0.97 [95%CI 0.60–1.57] for never users, RR 0.66 [95%CI 0.45–0.99] for current users and RR 0.94 [95%CI 0.68–1.30] for past users. As seen, the adjustment for breast density changed the relative risk estimates only very slightly.
Table V

| HT user group | Type of mammogram used for comparison | Interval cancer | All cancer | Interval % of all cancer | RR [95% CI], unadjusted for breast density | RR [95% CI], adjusted for breast density |
|---|---|---|---|---|---|---|
| Never users | 2y mammogram = baseline | 95 | 382 | 24.87% | 1 | 1 |
| Never users | 4y mammogram | 61 | 261 | 23.37% | 0.97 [0.60–1.57] | 0.97 [0.70–1.34] |
| Current HT users | 2y mammogram = baseline | 103 | 259 | 39.77% | 1 | 1 |
| Current HT users | 4y mammogram | 32 | 121 | 26.45% | 0.66 [0.45–0.99] | 0.66 [0.45–0.99] |
| Past HT users | 2y mammogram = baseline | 28 | 102 | 27.45% | 1 | 1 |
| Past HT users | 4y mammogram | 40 | 150 | 26.67% | 0.94 [0.68–1.30] | 0.91 [0.56–1.48] |
It is a well-known clinical problem that the risk of misclassification at mammography screening is increased in HT users compared to that of non-users. The present study, based on comprehensive data from an organized mammography screening programme, confirmed this observation. HT users compared with non-users had a 50% increased risk of a false positive test at first screen, and a doubled risk at subsequent screens. HT users compared to non-users had an almost 4-fold interval cancer proportion at first screen, and smaller, varying increased risks at subsequent screens. For current HT users, however, the study showed that both the high false positive risk and the high interval cancer proportion could be significantly reduced by using 4-year-old instead of 2-year-old mammograms for comparison at the screen reading. The use of 4-year-old instead of 2-year-old mammograms for comparison had no adverse impact for never and past users of HT.
The Million Women Study found the relative risk of a false positive test among postmenopausal current HT users to be 1.64 [95%CI 1.50–1.80] compared to that of postmenopausal never users.8 We found that the relative risk of a false positive test among current HT users compared to that of never users was smaller at first screen, RR 1.49, than at subsequent screens, where it varied between 1.91 and 2.30 depending on breast density and age of comparison mammograms. It is therefore clear that the overall estimate of the excess risk associated with HT use depends on the proportion of prevalence screens included in the study. We found an overall estimate for the relative risk of a false positive test among current HT users compared to never users of 1.95 [95%CI 1.75–2.17] (unadjusted for breast density and age of comparison mammogram). A similar pattern was seen for the interval cancer proportion. Calculated from the data reported from the Million Women Study,16 the relative risk of interval cancer among current users compared to never users was 2.18 [95% CI 1.29–3.70]. We found that the relative risk of interval cancer among current users compared to never users was high at first screen, RR 3.72, and somewhat smaller at subsequent screens, where it varied between 1.11 and 1.79 depending on breast density and age of comparison mammograms.
The Million Women Study found a significantly increased relative risk of a false positive test among past users compared to never users, RR 1.21 [95%CI 1.06–1.38]. In our study, past users (women who ceased HT use at least 14 days before the screening) did not have a significantly increased risk of a false positive test compared to never users. Again the same pattern was seen for the interval cancer proportion. Calculated from the data reported from the Million Women Study,16 the interval cancer proportion for past users compared to that of never users was non-significantly increased, RR 1.86 [95% CI 0.89–3.89]. In our study, past users did not have a significantly increased interval cancer proportion compared to never users. This difference between the two studies could be due to the fact that past users are a very heterogeneous group, consisting both of women who have just ceased using HT and of women who ceased using HT several years ago. Our data are too sparse to evaluate the potential effect of HT use by time since cessation.
In Sumkin et al.,17 eleven radiologists and one resident read 128 screening mammograms three times: once without prior mammograms for comparison, once with mammograms from the most recent (1 year) examination, and once with mammograms acquired 2 years previously. Sensitivity was not significantly affected by age of the comparison mammogram, 1 year versus 2 years (p > 0.10), but specificity was affected, 1 year versus 2 years (p = 0.03). Sumkin et al. found that the latest prior examination seemed to be the optimal one and reported that HT use did not appear to influence their results. The contradiction between Sumkin et al.'s findings and ours may be due to the small numbers in Sumkin's study. But different distributions of study subjects on HT use and different ways of defining current and past users could also have caused these differences.
Age of comparison mammogram could potentially be confounded with calendar period. However, the fact that the same radiologist was in charge of the screening programme ensured continuity over time. Moreover, the clinic had been in operation for almost 10 years before the evaluated change in the reading procedure took place, and no major change in false positive rates or interval cancer proportion was seen before the change of comparison mammogram. It therefore does not seem plausible that the change in misclassification of HT users after the change of comparison mammogram is explained by calendar period.
The fact that information on hormone use in this study was derived from a drug prescription register strengthens our results. By using a drug prescription register, we avoided the recall and reporting biases that potentially affect studies in which information on hormone use comes from questionnaires. Because information on hormone use was missing for less than 1% of the 75,581 screened women, a few women may have been misclassified as never users instead of current or past users. This could only marginally have increased the false positive risk for never users. It has been the practice in Denmark to use one-view or two-view mammography at subsequent screens depending on the previous breast density. This could potentially limit the relevance of our results for screening settings using two-view mammography for all women. In analyzing the impact of type of comparison mammogram on the false positive risk and the interval cancer proportion, we did, however, adjust for breast density, and the results are therefore also applicable to systematic use of two-view mammography screening.
Some studies of the impact of HT use on mammography outcome have included oestrogen given as local treatment in the analysis.4, 6–8 We have chosen not to include oestrogen given as a low dose local treatment, as no pharmacodynamic effects can be detected, reflecting a very low systemic exposure.18, 19 It is therefore not likely that oestrogen given in this way is capable of affecting the breast tissue. However, inclusion of oestrogen given as local treatment in the analysis did not change the results (data not shown).
The cumulative risk of experiencing at least one false positive test during participation in a biennial mammography screening programme from age 50 to age 70 can be calculated from the outcome of the individual screens.15 By using this method and the results found for the different screening modalities in the present study, we estimated the cumulative risk of a false positive test for each of the HT user groups. Never users with fatty breasts after first screen had a cumulative risk of 5.6% when 4y mammograms were used where possible, while their cumulative risk was 6.5% when 2y mammograms were used. For never users with mixed/dense breasts after first screen, the cumulative risks were 7.4% and 8.3%, respectively. For women being current HT users for 10 years, from age 55 to age 65, deemed to have low breast density before and after HT use and higher breast density during use, the cumulative risk will be 11.4% if 2y mammograms are used for comparison, but only 8.7% if 4y mammograms are used where possible. A similar group of women deemed to have higher breast density throughout their screening career will have a cumulative risk of 12.8% if 2y mammograms are used for comparison, but only 9.8% if 4y mammograms are used. Finally, for women deemed to have low breast density throughout their screening career, the numbers become 9.5% and 8.5%, respectively. This means that the excess risk of a false positive mammogram for current HT users compared with that of never users can be reduced by ∼40% for women with mixed/dense breasts during hormone use, when screening mammograms are compared to 4y mammograms instead of 2y mammograms.
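The principle behind such a cumulative risk can be sketched as one minus the probability of having no false positive test at any screen, assuming independent screen outcomes as stated in the methods. The sketch below is an illustration only and is not the procedure of reference 15. Its inputs are the pooled Table II risks for never users with fatty breasts, and the number of screens and the handling of the second screen are assumptions, so the output should not be expected to reproduce the published cumulative risks.

```python
import math

def cumulative_false_positive_risk(per_screen_risks):
    """Probability of at least one false positive over a series of screens.

    Assumption: screen outcomes are independent, so the probability of no
    false positive is the product of the per-screen complements.
    """
    return 1.0 - math.prod(1.0 - p for p in per_screen_risks)

# Illustrative inputs: pooled risks for never users with fatty breasts.
FIRST_SCREEN = 0.0157
SUBSEQUENT_2Y = 0.0054
SUBSEQUENT_4Y = 0.0043
N_SCREENS = 10  # assumption: biennial screens at ages 50, 52, ..., 68

risks_2y = [FIRST_SCREEN] + [SUBSEQUENT_2Y] * (N_SCREENS - 1)
# Assumption: no 4-year-old mammogram exists at the second screen,
# so that screen falls back to the 2y comparison.
risks_4y = [FIRST_SCREEN, SUBSEQUENT_2Y] + [SUBSEQUENT_4Y] * (N_SCREENS - 2)

print(f"2y comparison: {cumulative_false_positive_risk(risks_2y):.3f}")
print(f"4y where possible: {cumulative_false_positive_risk(risks_4y):.3f}")
```

Scenarios with a change in HT use or breast density during follow-up can be expressed by changing the per-screen risks in the list.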
The mammography screening programme in the county of Fyn has a low false positive risk compared to that of most other programmes,20 although the false positive risk is lower in the Dutch national screening programme.21 Mammograms from the Dutch national screening programme have been used to show that breast cancer can be detected earlier by lowering the threshold for recall, especially for recall rates of 1–4%.22 If our results hold true also for screening programmes with a higher false positive risk, HT users in these programmes are—in comparison with the Funen women—expected to benefit even more in absolute terms from a systematic viewing of current screening mammograms with 4-year-old mammograms. Studies of other screening settings are therefore warranted. The Fyn setting was ideal for evaluating the impact of using 2 different sets of comparison mammograms, but as for all other studies from a single clinic, the use of data from one setting may imply limitations for the generalizability of the results, and studies from other clinics are desirable. It would certainly also be desirable to test the results from the present observational study in an experimental setting.
In conclusion, our observational study from one clinic indicated that misclassification at screening mammography in current users of HT can be reduced considerably, when the current screening mammograms are viewed with the mammograms taken 4 years earlier.
We are indebted to Dr. Stephen Taplin, National Cancer Institute for comments on a draft of this manuscript. The study was approved by the Danish Data Inspection Agency.
- 3. Perry N, Broeders M, de Wolf C, Törnberg S, Holland R, von Karsa L, eds. European guidelines for quality assurance in breast cancer screening and diagnosis, 4th edn. Luxembourg: Office for Official Publications of the European Communities, 2006.
- 11. Mammography screening in the county of Fyn November 1993–December 1999. APMIS 2003 (Suppl); 110: 1–33.
- 13. www.whocc.no/atcddd [Accessed 13 July 2007].
- 14. www.medstat.dk [Accessed 15 November 2004].