Evaluation of a breast cancer risk prediction model expanded to include category of prior benign breast disease lesion
Benign breast diseases (BBD) encompass several histologic subtypes with various risks of subsequent breast cancer. Information on previous benign breast disease biopsies has been incorporated into breast cancer risk prediction models; however, the type of histologic lesion has not been taken into account. Given the substantial heterogeneity in breast cancer risk dependent on the type of benign lesion, the authors evaluated whether incorporating this level of detail would improve the discriminatory power of risk classification models.
By using data from the Nurses' Health Study, a breast cancer nested case-control study (240 cases; 1036 controls), the authors determined predictors of categories of BBD lesions and developed imputation models. The type of BBD, imputed for each cohort member who reported a diagnosis, was added to a modified version of the Rosner-Colditz breast cancer risk prediction model.
Compared with the model that included only previous BBD (yes/no), the model that included categories of BBD was significantly improved (P < .0001). Overall, including the category of BBD increased the concordance statistic from 0.628 to 0.635. By using risk reclassification, inclusion of the type of BBD resulted in a 17% increase in incidence per increase of 1 risk decile, holding the model without BBD type risk decile constant.
Although the current data suggested that the inclusion of BBD category may improve breast cancer risk classification, the clinical utility of such a model will depend on the consistency of histologic classification of benign breast disease lesions. Cancer 2010. © 2010 American Cancer Society.
There is consistent evidence that women who are diagnosed with benign breast disease (BBD) are at an increased risk of breast cancer compared with women who do not have a history of BBD.1 Although BBD encompasses several histologic subtypes, women who have this diagnosis have double the risk of breast cancer compared with women without a BBD diagnosis. Increased morphologic data have refined our ability to estimate a woman's risk of subsequent breast cancer. Compared with women who do not have BBD or who have nonproliferative lesions, the risk of developing breast cancer increases 1.5-fold to 2-fold among women who have proliferative changes without atypia and increases 3-fold to 5-fold among women who have atypical hyperplasia (AH).2-5
The initiation of wide-scale mammographic screening has made a diagnosis of BBD a common occurrence. Because of the strong association between having a BBD and the subsequent risk of breast cancer, information on previous BBD has been incorporated into breast cancer risk prediction models.6, 7 However, the level of detail usually is limited to ever having a previous BBD (yes/no)6 or the number of previous biopsies8 and does not take into account the histologic category of the benign lesion. The original Gail model7 included the number of previous biopsies and the presence of AH (yes/no), and the Tyrer-Cuzick model includes both previous AH and lobular carcinoma in situ.9 Breast cancer risk prediction models have been used to determine eligibility for clinical trials10 and to identify women at high risk who may benefit from chemoprevention.11 Although these models perform well in estimating the number of breast cancers that will occur in a population, their ability to discriminate between individuals who will develop breast cancer and those who will not is modest. Given the substantial heterogeneity in risk of breast cancer depending on the type of benign lesion, we evaluated whether incorporating this level of histologic detail improved the discriminatory power of breast cancer risk classification models.
MATERIALS AND METHODS
The Nurses' Health Study (NHS) cohort was initiated in 1976, when 121,700 US registered nurses ages 30 to 55 years returned an initial questionnaire. Every 2 years, information on reproductive variables, body mass index, exogenous hormone use, and disease outcomes has been updated. This study was approved by the Committee on Human Subjects at Brigham and Women's Hospital.
Population for Analysis
The population of women whose data have been used in the current analysis has been described elsewhere in detail.6, 12 Briefly, we excluded women with unknown, inconsistent, or out-of-range reports for height; weight in 1976 or at age 18 years; age at menarche, at menopause, or at each pregnancy; parity; and duration or type of postmenopausal hormone use. In addition, women who underwent a simple hysterectomy were excluded, as were women with prevalent cancers in 1976 or with no follow-up after 1978. Overall, 75,022 participants remained in the analysis. These women contributed 1,167,715 person-years from 1980 to 2000, during which 3221 incident, invasive cases of breast cancer occurred.
Breast Cancer Nested Case-Control Study
We conducted a case-control study nested within the subcohort of participants in the NHS and Nurses' Health Study II (NHS II) who had a biopsy-confirmed BBD. Similar to NHS, NHS II also is an ongoing cohort study of >116,000 US women who were nurses and were ages 25 to 42 years in 1989 when the study was initiated. The methods that were developed to follow participants and to confirm incident cancers and deaths in the study have been described previously.13
Beginning with the initial NHS questionnaire in 1976, participants have been asked on every biennial questionnaire to report any diagnosis of fibrocystic disease or other BBD. Beginning in 1982, the NHS I questionnaires asked whether participants had biopsy-confirmed BBD. The initial 1989 NHS II questionnaire and all subsequent questionnaires also asked participants to report any diagnosis of BBD and to indicate whether it was confirmed by biopsy. For women from whom we are able to obtain specimens, 95% of the self-reported BBD is confirmed by pathology review.14, 15 Upon centralized review, the pathologists consider women with the following histologies to have a BBD: cysts, apocrine metaplasia, mild hyperplasia, fibroadenoma, moderate or florid hyperplasia, intraductal papilloma, sclerosing adenosis, atypical ductal hyperplasia, and atypical lobular hyperplasia.
Within the subcohort of women with biopsy-confirmed BBD, eligible participants were women who reported a first diagnosis of breast cancer between 1976 and the return of the 1996 questionnaire (NHS) or between 1989 and the return of the 1995 questionnaire (NHS II). Incident breast cancer cases in both cohorts were identified through the nurses' own reports and were confirmed by a review of medical records. Eligible controls were women who did not have a diagnosis of breast cancer at the time the case was diagnosed and who also had a previous biopsy-confirmed BBD. Controls were matched to cases on year of birth and year of biopsy. Attempts were made to identify 4 matched controls for each case when possible.
Collection and Review of BBD Specimens
We identified incident, confirmed breast cancer cases that were diagnosed after return of the initial questionnaire through the 1996/1995 follow-up cycle and controls who also reported a previous biopsy-confirmed BBD. This nested case-control study has been described in detail previously.16 Briefly, a total of 1310 cases originally were identified for this study, and 5273 matched controls were selected. Greater than 70% of eligible participants confirmed their BBD biopsy and granted permission for a review of their pathology slides. We received specimens for 465 cases and 1939 controls. There were no significant differences in the success of obtaining slides between cases and controls. Approximately 98% of pathology specimens obtained were of good quality and were evaluated by study pathologists (431 cases and 1869 controls). After excluding participants whose benign biopsy specimens were of poor quality or had no breast tissue, evidence of carcinoma in situ or invasive carcinoma, invalid dates of diagnosis, or insufficient information on laterality, in total, there were 395 breast cancer cases and 1610 controls.16, 17
Hematoxylin and eosin-stained biopsy slides were reviewed independently by 1 of 2 collaborating pathologists (S.J.S. or J.L.C.) in a blinded fashion. Any slide that was identified as having atypia or questionable atypia was reviewed jointly by both pathologists. For each set of slides reviewed, a detailed work sheet was completed. BBDs were classified according to the Page classification system18 into 1 of 3 categories: nonproliferative, proliferative without atypia (PWOA), or AH. To mimic the larger population that would be used in the risk prediction modeling, only cases and controls who met the specific inclusion criteria described above were included in the analysis. Thus, women with unknown type of menopause or simple hysterectomy (and, thus, unknown age at menopause) or with unknown age at menopause were not included (n = 729). In total, 1276 women (240 cases and 1036 controls) were included from the nested case-control study.
Description of the Risk Prediction Model
We fit the log-incidence model of breast cancer to incident breast cancer cases. This model has been described in detail previously.6, 19 We assume that incidence at time $t$ ($I_t$) is proportional to the number of cell divisions $C_t$ accumulated throughout life up to age $t$, that is,
$$I_t = k\,C_t \tag{1}$$
where $k$ is a constant of proportionality. The cumulative number of breast cell divisions is factored as follows:
$$C_t = C_0 \times \frac{C_1}{C_0} \times \frac{C_2}{C_1} \times \cdots \times \frac{C_t}{C_{t-1}} = C_0 \prod_{i=0}^{t-1} \lambda_i \tag{2}$$
Thus, $\lambda_i = C_{i+1}/C_i$ represents the rate of increase of breast cell divisions from age $i$ to age $i + 1$. This assumes that $\log(\lambda_i)$ is a linear function of risk factors that are relevant at age $i$. The set of risk factors and their magnitude may vary according to the stage of reproductive life. Details of the representation of $C_i$ are provided by Colditz and Rosner.6
The general rationale for a log-incidence model is that the number of precancerous cells increases multiplicatively with time but that historic exposures differentially affect the rate of increase. Specifically, for breast cancer, it is assumed that the number of precancerous cells increases annually at the rate of exp(β0) before menopause for nulliparous women, at the rate of exp(β0 + β1s) before menopause for parous women with parity = s, and so forth. Finally, the number of precancerous cells increases immediately after the first birth by exp(β2[t1 − t0]), where t1 is the age at first birth, and t0 is the age at menarche. This assumes that the incidence rate of breast cancer is approximately proportional to the number of precancerous cells.
The log-incidence model was fit by using iteratively reweighted least squares with PROC NLIN in SAS statistical software version 9.1 (SAS Institute, Cary, NC). The parameters of the model are readily interpretable in a relative risk (RR) context. For example, exp(−β0) = RR for a 1-year increase in age at menarche among nulliparous women, exp(−[β0 + β2]) = RR for a 1-year increase in age at menarche among parous women, and so forth. In this analysis, women were censored if they developed types of cancer other than nonmelanoma skin cancer or if they died.
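The relative-risk interpretation above can be checked with a small Python sketch. It accumulates the yearly log rates for a premenopausal woman using only the three terms described in the text (β0, β1, β2). The function names, the coefficient values, and the restriction to premenopausal ages are assumptions made for illustration; the fitted model contains many more terms (Table 3), and this is not the SAS code used in the study.

```python
import math

def log_cell_divisions(age, menarche, birth_ages, b0, b1, b2):
    """Log of accumulated breast cell divisions up to a premenopausal age.

    Simplified, hypothetical reading of the text: each year from menarche
    contributes b0 + b1 * (parity in that year), and the first birth adds a
    one-time term b2 * (age at first birth - age at menarche).
    """
    total = 0.0
    for year in range(menarche, age):
        parity = sum(1 for b in birth_ages if b <= year)
        total += b0 + b1 * parity
    births_so_far = [b for b in birth_ages if b <= age]
    if births_so_far:
        total += b2 * (min(births_so_far) - menarche)
    return total

def relative_risk(log_c_a, log_c_b):
    """Incidence ratio implied by two log cell-division totals."""
    return math.exp(log_c_a - log_c_b)

# Illustrative coefficients only (not the fitted estimates).
B0, B1, B2 = 0.07, -0.003, 0.007

# Effect of a 1-year later menarche (13 vs 12) at age 45.
rr_nulliparous = relative_risk(
    log_cell_divisions(45, 13, [], B0, B1, B2),
    log_cell_divisions(45, 12, [], B0, B1, B2),
)
rr_parous = relative_risk(
    log_cell_divisions(45, 13, [25, 28], B0, B1, B2),
    log_cell_divisions(45, 12, [25, 28], B0, B1, B2),
)
print(rr_nulliparous, math.exp(-B0))          # the two values agree
print(rr_parous, math.exp(-(B0 + B2)))        # the two values agree
```

Under these assumptions, delaying menarche by 1 year removes one year of accumulation at rate β0 and, for parous women, also shortens the menarche-to-first-birth interval by 1 year, which is why the two relative risks take the forms exp(−β0) and exp(−[β0 + β2]).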
Imputation and Inclusion of Type of BBD in the Risk Classification Model
Ideally, we would have information on the type of BBD from a centralized pathology review for each study participant. However, because this was not logistically possible, we used an indirect approach to impute the probability of each of the categories of BBD among women who reported a diagnosis of BBD.
Let x1 = nonproliferative BBD, x2 = proliferative BBD without atypia, x3 = AH, and z = other covariates in the risk prediction model. From the main study, we can obtain Pr(D|z) under the rare-disease assumption (Equation 3). We want to estimate Pr(D|x1, x2, x3, z), again under the rare-disease assumption (Equation 4).
From the nested case-control study, we can estimate δ1*, δ2*, and δ3* based on the polytomous logistic regression model. Indeed, in principle, we also could estimate the parameters of Equation 4 directly from the breast cancer nested case-control study, but the estimates would be very imprecise because of the small sample size.
Therefore, we used the main study population to estimate the parameters in Equation 4 by estimating x1, x2, and x3 for all individuals in the main study who reported a BBD. By using the breast cancer nested case-control study, in which the type of BBD was determined for each participant, we developed a polytomous logistic regression model to predict the probabilities of proliferative BBD without atypia and of AH relative to nonproliferative BBD. The covariates that were included as predictors of the type of BBD in the model were age at biopsy, menopausal status at biopsy, nulliparity at biopsy, early breast cancer case (diagnosed within 8 years of biopsy), and late breast cancer case (diagnosed ≥8 years after biopsy).
In the breast cancer nested case-control study, cases are more likely than controls to have a proliferative BBD (and, specifically, to have AH). The rationale for including case status as a covariate in these equations is to account for this relation in the main study as well. Early cases were considered those below the median time from biopsy to breast cancer diagnosis (8 years), and late cases were breast cancer cases diagnosed ≥8 years after BBD biopsy. Then, we applied the estimates obtained from the nested case-control study to the larger cohort to estimate the probability of having each of the 3 types of BBD lesions (p1, p2, and p3).
Next, we imputed the type of BBD for women in the larger cohort who reported a BBD, applying the β values from the polytomous logistic regression models to estimate the probability of each category of BBD for each woman who reported a BBD in the larger cohort. To impute the type of BBD for each woman with BBD in the main study, we drew a random number (u) using the RANUNI function in SAS. If u < p1, then the woman was assigned the category nonproliferative BBD; if p1 ≤ u < p1 + p2, then the woman was assigned the category proliferative BBD without atypia; if u ≥ p1 + p2, then the woman was assigned the category AH.
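The two steps just described (category probabilities from a polytomous logit, then a uniform draw) can be sketched in Python as follows. This is an illustration under stated assumptions, not the study code: the coefficient values and covariate names are hypothetical, and Python's `random.Random` stands in for the SAS RANUNI function.

```python
import math
import random

CATEGORIES = ("nonproliferative", "PWOA", "AH")

def category_probabilities(covariates, coef_pwoa, coef_ah):
    """Return (p1, p2, p3) from a baseline-category (polytomous) logit.

    Nonproliferative BBD is the reference category, so its linear
    predictor is fixed at 0.
    """
    eta_pwoa = sum(coef_pwoa[k] * v for k, v in covariates.items())
    eta_ah = sum(coef_ah[k] * v for k, v in covariates.items())
    denom = 1.0 + math.exp(eta_pwoa) + math.exp(eta_ah)
    return 1.0 / denom, math.exp(eta_pwoa) / denom, math.exp(eta_ah) / denom

def assign_category(u, p1, p2):
    """Map a uniform draw u in [0, 1) to a BBD category using the cut points p1 and p1 + p2."""
    if u < p1:
        return CATEGORIES[0]
    if u < p1 + p2:
        return CATEGORIES[1]
    return CATEGORIES[2]

# Hypothetical coefficients and covariates, for illustration only.
coef_pwoa = {"intercept": -0.2, "nulliparous": 0.5, "late_case": 0.4}
coef_ah = {"intercept": -2.0, "nulliparous": 0.5, "late_case": 1.4}
woman = {"intercept": 1.0, "nulliparous": 1.0, "late_case": 0.0}

p1, p2, p3 = category_probabilities(woman, coef_pwoa, coef_ah)
rng = random.Random(2010)  # fixed seed so the draw is reproducible
imputed = assign_category(rng.random(), p1, p2)
print(round(p1, 3), round(p2, 3), round(p3, 3), imputed)
```

Because the three probabilities sum to 1, the cut points p1 and p1 + p2 partition the unit interval, so each category is drawn with exactly its estimated probability.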
We then fit Equation 4 using the imputed category indicators in place of x1, x2, and x3, thus obtaining the model referred to below as Equation 5.
Because the parameter estimates above (Eq. 5) may be influenced by random error, we repeated this imputation approach 4 additional times and used multiple imputation20 to combine estimates from the separate imputations to obtain an overall estimate. In addition, we included an additional category for BBD that could not be classified into 1 of the 3 categories, because those women were missing necessary information (eg, age at BBD, menopausal status at BBD) for the prediction model (n = 7707).
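A common way to combine estimates across imputations is Rubin's rules: average the point estimates and add the between-imputation variance, inflated by (1 + 1/m), to the average within-imputation variance. The sketch below assumes that approach (the text cites reference 20 without spelling out the formula). The five AH coefficients are those reported in Table 3; the within-imputation variances are hypothetical because they are not reported.

```python
import math
import statistics

def pool_estimates(estimates, variances):
    """Combine m imputation-specific estimates with Rubin's rules.

    Returns the pooled estimate and its total variance:
    within + (1 + 1/m) * between.
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)  # sample variance, divisor m - 1
    total_var = within + (1.0 + 1.0 / m) * between
    return pooled, total_var

# AH coefficients from the 5 imputations (Table 3).
ah_betas = [1.12, 1.04, 1.15, 1.10, 1.13]
# Hypothetical within-imputation variances (not reported in the source).
ah_vars = [0.005] * 5

pooled_beta, total_var = pool_estimates(ah_betas, ah_vars)
print(round(pooled_beta, 3), round(math.sqrt(total_var), 3))
```

The pooled coefficient from these rounded inputs is 1.108, in line with the average of 1.11 reported in Table 3.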
To assess the additional predictive power of BBD category, we computed age-specific deciles (5-year age groups) of the risk function with BBD included as a yes/no variable but without the BBD category (Model A) and then including the imputed BBD category (Model B). From the cross-classification of risk decile Model A × risk decile Model B, we then compared the observed number of cases in specific risk deciles of Model B with the expected number of cases within strata defined by the Model A risk decile. Specifically, let Xij = number of breast cancer cases, Nij = number of person-years, and Pij = Xij/Nij = estimated incidence rate within the ith age-specific risk decile for Model A and the jth age-specific risk decile for Model B, and let ln(Pij) = αi + β(j − 1).21, 22 Then, exp(β) estimates the incidence rate ratio, and exp(β) − 1 the proportional increase in breast cancer incidence, for an increase of 1 Model B risk decile, holding the Model A risk decile constant.
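To make the slope in ln(Pij) = αi + β(j − 1) concrete, the following Python sketch estimates β within a single Model A decile by Poisson maximum likelihood. Using Poisson likelihood, profiling out the intercept, and solving by bisection are assumptions for illustration; the text cites references 21 and 22 for the actual method, and the overall estimate pools all Model A strata, each with its own intercept.

```python
import math

def fit_decile_slope(cases, person_years, lo=-5.0, hi=5.0, tol=1e-10):
    """Poisson ML estimate of beta in ln(P_j) = alpha + beta * (j - 1).

    cases[j-1] and person_years[j-1] belong to Model B decile j within one
    Model A decile. alpha is profiled out, and the score equation for beta
    is solved by bisection (the score is monotone decreasing in beta).
    """
    steps = range(len(cases))  # j - 1 = 0, 1, ..., 9
    total_cases = sum(cases)
    weighted_cases = sum(s * x for s, x in zip(steps, cases))

    def score(beta):
        w = [n * math.exp(beta * s) for s, n in zip(steps, person_years)]
        mean_step = sum(s * wi for s, wi in zip(steps, w)) / sum(w)
        return weighted_cases - total_cases * mean_step

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Synthetic check: build expected counts from a known slope and recover it.
true_alpha, true_beta = -6.0, 0.16
person_years = [5000.0 + 1000.0 * s for s in range(10)]
cases = [n * math.exp(true_alpha + true_beta * s)
         for s, n in zip(range(10), person_years)]

beta_hat = fit_decile_slope(cases, person_years)
percent_increase = (math.exp(beta_hat) - 1.0) * 100.0
print(round(beta_hat, 4), round(percent_increase, 1))
```

The same function could be applied to the cases and person-years in one row of the Model A × Model B cross-classification, although agreement with the published row-specific slopes is not guaranteed under these assumptions.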
In addition, to assess the additional predictive ability of our risk prediction models, we used the area under the receiver operating characteristic (ROC) curve (ie, the concordance or C statistic). This statistic ranges from 0.5 to 1.0 and represents the probability that, for a randomly selected pair of women—1 with breast cancer and 1 without breast cancer—the woman with breast cancer has the higher estimated disease probability. Also, we compared the C statistic for different risk prediction rules.23
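The pairwise definition of the C statistic given above translates directly into code. The sketch below is a minimal illustration with made-up risk scores; counting tied pairs as one-half is a common convention that is assumed here, not stated in the text.

```python
def concordance_statistic(case_scores, noncase_scores):
    """Probability that a randomly chosen case has a higher estimated risk
    than a randomly chosen non-case; tied pairs count as 1/2."""
    concordant = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(case_scores) * len(noncase_scores))

# Hypothetical estimated risks for 3 cases and 4 non-cases.
case_risks = [0.30, 0.20, 0.10]
noncase_risks = [0.25, 0.10, 0.05, 0.02]

c_stat = concordance_statistic(case_risks, noncase_risks)
print(round(c_stat, 3))
```

This brute-force version compares every case with every non-case, so it is practical only for small samples; for a cohort of this size a rank-based calculation would be used instead.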
RESULTS
Within this dataset, as demonstrated previously, women with proliferative BBD lesions are at an increased risk of breast cancer. Women with proliferative disease without atypia are at a 30% increased risk (RR, 1.29; 95% confidence interval [CI], 0.93-1.79) (Table 1), and those with AH are at a 3.5-fold increased risk of breast cancer (RR, 3.47; 95% CI, 2.26-5.34) (Table 1) relative to women who have nonproliferative BBD.
Table 1. Relative Risk of Breast Cancer According to Type of Benign Breast Disease
|Type of BBD||Cases, No. (%)||Controls, No. (%)||RR [95% CI]|
|Nonproliferative||67 (27.9)||393 (37.9)||1.0 [Referent]|
|Proliferative without atypia||116 (48.3)||538 (51.9)||1.29 [0.93-1.79]|
|Atypical hyperplasia||57 (23.8)||105 (10.1)||3.47 [2.26-5.34]|
Breast cancer case status and nulliparity were the strongest predictors of the type of BBD (Table 2). In addition, age and menopausal status at biopsy were modest predictors of the type of BBD. The effect of nulliparity was similar for PWOA and AH; the estimated effects of all other variables in the model differed between PWOA and AH. No other variables from the Rosner and Colditz model were associated significantly with the type of BBD.
Table 2. β Estimates (Standard Error) for Predicting Proliferative Benign Breast Disease Without Atypia and Atypical Hyperplasia Relative to Nonproliferative Benign Breast Disease (n = 460) From a Polytomous Logistic Regression Model
|Variable||PWOA, β (SE)||AH, β (SE)|
|Intercept||−0.1527 (0.47)||−2.3229 (0.89)|
|Ageᵇ||0.0185 (0.01)||0.0972 (0.01)|
|Premenopausal||0.0818 (0.20)||0.5051 (0.31)|
|Nulliparous||0.5642 (0.23)||0.5642 (0.23)|
|Early breast cancerᶜ||0.0289 (0.24)||1.0765 (0.29)|
|Late breast cancerᵈ||0.4135 (0.22)||1.3956 (0.29)|
The type of BBD, which was imputed for each cohort member who reported a BBD, was added to a modified version of the Rosner and Colditz model (Table 3). In total, 1,164,494 person-years with 3221 breast cancer cases were included in this analysis. Women with nonproliferative BBD were at a nonsignificant 10% increased risk of breast cancer relative to women without a BBD (RR, 1.10; 95% CI, 0.97-1.25). Compared with women without BBD, women with PWOA had a 47% increased risk of breast cancer (RR, 1.47; 95% CI, 1.34-1.61), and women with AH had a 3-fold increased risk of breast cancer (RR, 3.02; 95% CI, 2.57-3.55). Women with an unclassified type of BBD had a 50% increased risk of breast cancer relative to women without BBD (RR, 1.49; 95% CI, 1.31-1.69). Compared with using only BBD (yes/no), adding specific categories of BBD significantly improved the model (difference in −2log-likelihood = 1331.86; 3 degrees of freedom; P < .0001). Overall, including the type of BBD increased the concordance statistic from 0.628 to 0.635 (Table 4). Because not all women will have BBD, we also calculated the area under the ROC curve for women with BBD and women without BBD. Among women with BBD, 422,986 person-years and 1576 breast cancer cases contributed to this analysis. In the population with BBD, the improvement in the concordance statistic with type of BBD was 0.03, whereas there was no improvement in the statistic among women without BBD.
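The likelihood ratio comparison reported above can be checked by referring the difference in −2 log-likelihood to a chi-square distribution with 3 degrees of freedom. The sketch below assumes that reference distribution and uses the closed-form upper tail for 3 degrees of freedom, so no statistics library is needed; the function name is hypothetical.

```python
import math

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

lr_statistic = 1331.86  # difference in -2 log-likelihood reported in the text
p_value = chi2_sf_3df(lr_statistic)
print(p_value < 1e-4)
```

For reference, the conventional 5% critical value for 3 degrees of freedom is about 7.815, so a statistic of 1331.86 is far beyond any usual significance threshold.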
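Table 3. Summary of 5 Imputations of the Model With Imputed Benign Breast Disease Categories and the Average β Estimates, Corresponding Relative Risks, and P Values
|Variable||Model With Prior BBD (yes/no), β (P)||Imputation 1, β||Imputation 2, β||Imputation 3, β||Imputation 4, β||Imputation 5, β||Average β||P||Increment for RR||RR [95% CI]|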
|Intercept||−8.96||−8.88||−8.90||−8.86||−8.88||−8.87||−8.88|| || || |
|Duration of premenopausal years||0.075 (<.001)||0.073||0.073||0.072||0.073||0.073||0.073||<.001||1||1.08 [1.07-1.09]|
|Age at first birth—age at menarche||0.0065 (.18)||0.0066||0.0070||0.0067||0.007||0.0068||0.0068||.06||10||1.07 [1.00-1.15]|
|Birth index||−0.0033 (.01)||−0.0033||−0.0032||−0.0032||−0.0033||−0.0033||−0.0032||<.001||102||0.72 [0.64-0.80]|
|Duration of natural menopause||0.020 (.01)||0.017||0.017||0.017||0.017||0.017||0.017||.08||1||1.02 [1.00-1.03]|
|Duration of bilateral oophorectomy||0.009 (.16)||0.008||0.008||0.007||0.008||0.0074||0.0075||.82||1||1.01 [0.95-1.07]|
|Prior BBD (yes/no)||0.39 (<.001)||—||—||—||—||—||—|| || || |
|Duration of E only PMH||0.033 (.03)||0.030||0.031||0.031||0.031||0.032||0.031||<.001||10||1.36 [1.20-1.55]|
|Duration of E+P PMH||0.062 (.01)||0.062||0.062||0.062||0.062||0.062||0.062||<.001||10||1.86 [1.52-2.27]|
|Duration of other PMH||0.026 (.09)||0.025||0.025||0.025||0.025||0.025||0.025||.007||10||1.28 [1.07-1.54]|
|Current use of PMH||0.027 (.49)||0.031||0.035||0.032||0.034||0.035||0.033||.56||1||1.03 [0.92-1.16]|
|Past use of PMH||−0.14 (.12)||−0.14||−0.14||−0.14||−0.14||−0.14||−0.14||.01||1||0.87 [0.78-0.97]|
|Average BMI pre*duration premenopause||−0.0013 (<.001)||−0.0013||−0.0013||−0.0013||−0.0013||−0.0013||−0.0013||<.001||370ᵇ||0.61 [0.53-0.70]|
|Average BMI post*duration postmenopause||0.0039 (.005)||0.0039||0.0038||0.0038||0.0038||0.0039||0.0038||<.001||200ᶜ||2.16 [1.78-2.61]|
|Height*duration premenopause||0.00086 (.06)||0.00088||0.00086||0.00086||0.00086||0.00087||0.00087||<.001||222ᵈ||1.21 [1.09-1.35]|
|Height*duration postmenopause||−0.0010 (.34)||−0.0011||−0.0011||−0.0010||−0.0012||−0.001||−0.0011||.32||120ᵉ||0.88 [0.68-1.13]|
|Cumulative alcohol premenopause||0.00020 (.06)||0.00021||0.00019||0.00020||0.00020||0.00020||0.00020||<.001||384ᶠ||1.08 [1.04-1.12]|
|Cumulative alcohol postmenopause if PMH||0.00022 (.32)||0.00018||0.00025||0.00020||0.00023||0.00022||0.00021||.34||120ᵍ||1.03 [0.97-1.08]|
|Cumulative alcohol postmenopause no PMH||0.00030 (.16)||0.00029||0.00030||0.00030||0.00031||0.00030||0.00030||.04||240ʰ||1.07 [1.00-1.15]|
|Family history of breast cancer||0.42 (.003)||0.42||0.43||0.43||0.42||0.43||0.43||<.001||1||1.53 [1.39-1.68]|
|Nonproliferative BBD||—||0.09||0.12||0.05||0.10||0.11||0.095||.15||1||1.10 [0.97-1.25]|
|PWOA BBD||—||0.38||0.39||0.39||0.38||0.37||0.38||<.001||1||1.47 [1.34-1.61]|
|AH BBD||—||1.12||1.04||1.15||1.10||1.13||1.11||<.001||1||3.02 [2.57-3.55]|
|Unclassified BBD|| ||0.40||0.40||0.40||0.40||0.40||0.40||<.001||1||1.49 [1.31-1.69]|
Table 4. Age-Specific and Overall Area Under the Receiver Operating Characteristic Curves for Models With and Without Imputed Probabilities of Benign Breast Disease Category
Cross-classifying Model A (without BBD category) risk deciles with Model B (with BBD category) risk deciles (Table 5) revealed that there were substantial differences in estimated incidence. Overall, the observed number of cases was higher than the expected number when the Model B decile was high and was lower than the expected number when the Model B decile was low relative to Model A. The overall slope was β = 0.16 (P < .001), indicating a significant estimated 17% increase in breast cancer incidence for an increase of 1 Model B age-specific risk decile, holding the age-specific Model A risk decile constant. Thus, adding the BBD category to the risk prediction model increases its predictive power.
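The 17% figure follows from exponentiating the slope. As a quick check of the arithmetic, and assuming the log-linear trend holds across several deciles (the 3-decile example is illustrative, not a result from the study):

```latex
\[
  e^{\beta} = e^{0.16} \approx 1.174
  \quad\Longrightarrow\quad
  \left(e^{0.16} - 1\right) \times 100\% \approx 17\%
\]
\[
  \text{for a difference of 3 Model B deciles:}\quad
  e^{3 \times 0.16} = e^{0.48} \approx 1.62
\]
```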
Table 5. Cross Classification of Model A (Without Benign Breast Disease Category) Risk Decile and Model B (With Benign Breast Disease Category) Risk Decile
Cells show cases/person-years (incidence per 100,000 person-years).
|Model A Decile (Without BBD Category)||Model B Decile (With BBD Category): 1||2||3||4||5||6||7||8||9||10||β||SE||P|
|1||125/109,649 (114)||7/6975 (100)||0/24 (0)||1/43 (2325)||1/67 (1493)||0/146 (0)||0/82 (0)||0/22 (0)||0/4 (0)||0/0 (—)||0.59||0.18||.001|
|2||4/2980 (134)||152/98,468 (154)||25/14,737 (170)||0/0 (—)||0/0 (—)||0/32 (0)||1/201 (498)||2/302 (662)||0/50 (0)||0/2 (0)||0.21||0.09||.02|
|3||5/2731 (183)||7/1633 (429)||157/88,411 (178)||41/23,223 (177)||0/0 (—)||0/0 (—)||0/10 (0)||1/341 (293)||3/395 (759)||0/30 (0)||0.10||0.07||.16|
|4||2/1240 (161)||6/3482 (172)||0/1566 (0)||166/77,754 (213)||69/31,598 (218)||0/0 (—)||0/0 (—)||0/48 (0)||6/854 (703)||1/229 (437)||0.16||0.06||.009|
|5||0/316 (0)||9/4655 (193)||3/2473 (121)||4/1398 (286)||147/67,632 (217)||84/38,612 (218)||0/0 (—)||0/0 (—)||4/748 (535)||7/935 (749)||0.15||0.05||.003|
|6||0/90 (0)||5/1301 (384)||13/7423 (175)||9/3183 (283)||4/954 (363)||161/59,325 (271)||99/41,667 (238)||0/2 (0)||0/170 (0)||15/2663 (563)||0.07||0.04||.10|
|7||0/8 (0)||0/244 (0)||4/1918 (209)||18/9944 (181)||25/6885 (363)||3/675 (444)||188/55,486 (339)||109/37,547 (290)||0/0 (—)||22/4062 (542)||0.09||0.04||.03|
|8||0/0 (—)||0/14 (0)||0/220 (0)||0/1224 (0)||23/9469 (243)||29/14,197 (204)||5/1309 (382)||231/59,249 (390)||73/26,161 (279)||49/4931 (994)||0.23||0.04||<.001|
|9||0/0 (—)||0/0 (—)||0/0 (—)||0/2 (0)||0/166 (0)||11/3791 (290)||50/18,004 (278)||37/9581 (386)||288/69,874 (412)||87/15,355 (567)||0.20||0.05||<.001|
|10||0/0 (—)||0/0 (—)||0/0 (—)||0/0 (—)||0/0 (—)||0/0 (—)||0/8 (0)||30/9680 (310)||71/18,517 (383)||522/88,562 (589)||0.36||0.08||<.001|
|Overall|| || || || || || || || || || ||0.16||0.006||<.001|
Previous work in the NHS suggests that age at menarche and menopause may modify the association between BBD and subsequent breast cancer risk.6 Similar to what is observed in the larger cohort,6 women without BBD experience the protective effects of late age at menarche, whereas women with any type of BBD do not (Table 6). In contrast, an early age at menopause appears to be protective for all women regardless of BBD status (Table 6).
Table 6. Effect of Age at Menarche, Age at Menopause, and Benign Breast Disease Category on the Risk of Breast Cancer
|BBD Category||Age at Menarche, RR (95% CI)||Age at Menopause, RR (95% CI)|
|No BBD||0.72 (0.71-0.73)||0.43 (0.42-0.44)|
|Nonproliferative BBD||1.16 (1.08-1.24)||0.37 (0.34-0.40)|
|Proliferative without atypia||0.93 (0.86-0.99)||0.42 (0.40-0.44)|
|Atypical hyperplasia||0.83 (0.73-0.93)||0.50 (0.46-0.53)|
|Unclassified||1.08 (0.99-1.18)||0.40 (0.36-0.43)|
In secondary analyses, we used the type of BBD determined by central review for the subset of women whose specimens had undergone centralized review rather than the imputed type of BBD, and the results were nearly identical. In addition, we also conducted analyses in which imputation of type of BBD was restricted to those women whose first BBD was confirmed by biopsy and had the necessary information, and an additional 10,008 women with BBD that was not confirmed by biopsy were included in the unclassified BBD category. These results were very similar to the results obtained when imputation was conducted on all women with a BBD regardless of biopsy status.
DISCUSSION
In the NHS, we observed that the type of BBD, as imputed from a nested case-control study with centralized pathology review, added significantly to a modified Rosner-Colditz breast cancer risk prediction model. By using risk reclassification, inclusion of the type of BBD resulted in a 17% increase in incidence per increase of 1 risk decile, holding the model without BBD type risk decile constant. The increase in the C-statistic also was statistically significant, especially when the analyses were restricted to women with BBD. The Rosner-Colditz breast cancer risk prediction model is a log-incidence model that fits numerous time-varying epidemiologic risk factors efficiently to a large dataset. The complex nature of breast cancer incidence, with many time-dependent risk factors, requires prediction models that account for change in risk factors over time. Such models outperform traditional approaches that fit indicator variables with fixed effects across time.24 Using this model requires recording the year of birth, age at menarche, age at first birth and at each subsequent birth, age at menopause and type of menopause, history of BBD, family history of breast cancer in mother or sister, height, weight at age 18 years, current use of postmenopausal hormones (including type and duration of use), and alcohol intake. Although the model requires a more extensive list of personal factors than those considered in the Gail or Tyrer-Cuzick model, each of these characteristics represents an established reproductive or behavioral risk factor for breast cancer.25
There are a few additional differences between the Rosner-Colditz model and other breast cancer risk prediction models. It is noteworthy that Gail Model 2 (reference 8) does not include details of menopause or use of postmenopausal hormones in its prediction algorithm. These clearly are established risk factors26 and, accordingly, the model performance after including these factors is improved. We have not compared this model with the model that was developed by Tyrer et al,9 which incorporated breast cancer 1 gene (BRCA1) and BRCA2 estimates and a hypothetical low-penetrance gene along with some personal risk factors (including age at menarche, age at first birth, height, body mass index, and age at menopause). With respect to incorporating BBDs into these models, Gail Model 2 includes the number of biopsies and the presence of AH,8 and the Tyrer-Cuzick model includes previous AH and lobular carcinoma in situ (yes/no) only.9
The strengths of the current study include the large size of the cohort, prospectively collected data, and centralized pathology review for a subset of women. By using both risk factors and breast cancer case status in polytomous logistic regression models, our imputed categories of BBD, as applied to the larger cohort, accounted both for the association between BBD category and breast cancer and for the correlation between BBD and other risk factors that already were included in the risk prediction model.
A limitation of this study is that we did not have the category of BBD determined by central pathology review on all cohort members with BBD. In imputing category of BBD, only age, menopausal status and nulliparity at the time of BBD, and breast cancer case status were significant predictors. Although some degree of misclassification is expected in the imputed categories of BBD, the RR estimates for breast cancer were nearly identical to those observed in our own and other studies with centralized pathology review. Although our nested case-control study from which the imputation model was developed was limited to women who had biopsy-confirmed BBD, our primary analysis imputed type of BBD for all women who reported a BBD. The results from secondary analyses in which imputation was restricted only to women whose first BBD was confirmed by biopsy were very similar. One explanation for the similar results is that women without a biopsy-confirmed BBD may have a distribution of histologic classifications similar to that of women who undergo biopsy. An alternative explanation may reflect the methods used in the current study. In this study, we used a woman's first self-report of BBD to impute the type of BBD she had and did not update with subsequent reports of BBD. Thus, it is possible that a woman who has a diagnosis of BBD without biopsy confirmation, in fact, may go on to have a second BBD that is biopsy-confirmed. Because the model similarly is improved when imputation is applied to all women with BBD regardless of biopsy status, we presented those as our primary results.
With the inclusion of imputed category of BBD, the C-statistic increased from 0.628 to 0.635. There is increasing acknowledgment of the limitations to using the ROC to evaluate risk prediction.27-29 The correlation of 1 risk factor or a combination of risk factors with disease must be very strong—RRs on the order of 100 to 200 between exposed and unexposed—to serve as a screening tool at the individual level.30-32
Although the measures of model change suggest significant improvements, the magnitude of the effects is modest. One factor contributing to this is that the change to the model only applies to a subset of women, namely those with BBD. In addition, the original model includes a variable for BBD. The β estimate of this parameter in the original model is very similar in magnitude to that for women classified with proliferative BBD without atypia and for women with BBD that could not be classified. Thus, risk estimates from the modified Rosner-Colditz model with BBD category will differ from the original estimates only for a small percentage of women. The majority of the population did not report a BBD; therefore, their estimated risk will not change. Thus, the effect of including these variables is small at the population level and is greatest among women with BBD.
Recent work has demonstrated improvements in the area under the ROC curve when mammographic density33 and breast cancer genetic susceptibility loci34 were added to the National Cancer Institute's Breast Cancer Risk Assessment Tool. There were increases in the average area under the ROC curve of 0.047 with the addition of mammographic density33 and of 0.025 with the addition of 7 breast cancer single nucleotide polymorphisms.34 The improvement in the prediction model with the addition of mammographic density and genetic single nucleotide polymorphisms is similar to what we observed when we restricted the analysis to women with BBD (difference in AUC = 0.03).
Although these data suggest that the inclusion of BBD category may improve breast cancer risk classification, the clinical utility of such a model will depend on the consistency of histologic classification of BBD lesions. Continued expansion of current models with other risk factors that can be estimated on everyone (eg, mammographic density35) may further improve breast cancer risk classification.
We thank Drs. Stuart J. Schnitt and James L. Connolly for their expertise in breast pathology and review of the Nurses' Health Study benign breast disease slides. We also thank the participants of the Nurses' Health Study for their continued participation and dedication to the study.
CONFLICT OF INTEREST DISCLOSURES
Financial support was received from Public Health Service Grants CA046475 and CA087969; Specialized Program of Research Excellence (SPORE) in Breast Cancer CA089393; and from the National Cancer Institute, National Institutes of Health, and the Breast Cancer Research Foundation. Dr. Colditz is supported in part by an American Cancer Society Cissy Hornung Clinical Research Professorship.