Development and validation of nomograms for predicting residual tumor size and the probability of successful conservative surgery with neoadjuvant chemotherapy for breast cancer
Article first published online: 31 AUG 2006
Copyright © 2006 American Cancer Society
Volume 107, Issue 7, pages 1459–1466, 1 October 2006
How to Cite
Rouzier, R., Pusztai, L., Garbay, J.-R., Delaloge, S., Hunt, K. K., Hortobagyi, G. N., Berry, D. and Kuerer, H. M. (2006), Development and validation of nomograms for predicting residual tumor size and the probability of successful conservative surgery with neoadjuvant chemotherapy for breast cancer. Cancer, 107: 1459–1466. doi: 10.1002/cncr.22177
- Issue published online: 18 SEP 2006
- Manuscript Accepted: 24 MAY 2006
- Manuscript Revised: 10 MAR 2006
- Manuscript Received: 14 NOV 2005
- Nellie B. Connally Breast Cancer Research Fund
- Philippe Foundation, Paris and New York
- breast cancer;
- neoadjuvant chemotherapy;
- breast conservation;
Neoadjuvant chemotherapy (NACT) increases the likelihood that breast conservation therapy for breast cancer patients will be successful. There is no available nomogram to predict breast conservation after NACT. The aim of the current study was to develop and validate nomograms for predicting residual tumor size and probability of a patient becoming eligible for breast conservation surgery after NACT.
A total of 1147 patients treated at M. D. Anderson Cancer Center (Houston, TX) and the Institut Gustave Roussy (Villejuif, France) who received anthracycline with or without paclitaxel NACT were included in the analysis. Clinicopathologic data from 1 series were used to construct logistic regression models for breast conservation and residual tumor size < 3 cm after NACT and were validated on an independent series.
The discrimination and the calibration of the nomogram for predicting the probability of residual tumor size < 3 cm after anthracycline-based NACT were good when applied to the validation set (concordance index = 0.71; U-index = 10⁻³). The discrimination of the nomogram for predicting eligibility for breast conservation therapy was also good (concordance index = 0.67). However, the calibration had to be adjusted to take into account global rates of breast conservation surgery. A second nomogram adapted to preoperative chemotherapy regimens containing paclitaxel was established. The concordance index of the nomogram for predicting breast conservation was 0.71 (P < 10⁻⁶) for the independent dataset and the calibration was also good. A comparison of the 2 nomograms showed that their predictions were highly correlated (r = 0.97), suggesting that eligibility for breast conservation therapy was independent of the preoperative chemotherapy regimen used.
Nomograms were developed for breast cancer patients who received NACT to predict residual tumor size and whether the patient would thus become eligible for breast conservation therapy. These tools may be useful when counseling patients about treatment options, and a web-based interface is now available to help guide patients and physicians in these decisions. Cancer 2006. © 2006 American Cancer Society.
For patients with inflammatory and large-volume breast carcinoma, neoadjuvant chemotherapy (NACT) is considered the cornerstone of treatment. In 6 randomized trials comparing adjuvant and primary chemotherapy for patients with operable tumors, however, researchers found no survival benefit for patients treated with NACT.1, 2 Nevertheless, NACT has been definitively shown to increase the rate of breast conservation compared with that of patients who received only adjuvant chemotherapy.1 We and others have conducted studies to identify patients treated with primary chemotherapy who were most likely to achieve a complete pathologic response and qualify for breast conservation therapy.3, 4 However, the options for making such determinations of eligibility at the individual patient level are quite limited.
Nomograms are frequently being developed to assist clinicians and patients in clinical decision-making. Nomograms are statistical tools that enable users to calculate the overall probability of a specific outcome (e.g., death from a disease or the chance of success of a particular surgery) for an individual patient.5, 6 Increasingly, nomograms are being accepted as models in which known prognostic factors can be combined and used to predict outcome.7 To our knowledge, however, no calibrated and validated nomogram for predicting residual tumor size and potential eligibility for breast conservation therapy is available for patients who are considered candidates for primary chemotherapy.
The aim of this study was to develop and validate nomograms that could be used to predict residual tumor size < 3 cm and the probability that a patient would become eligible for breast conservation therapy after anthracycline-based or paclitaxel plus anthracycline-based primary chemotherapy. In this study, we report on 3 nomograms: 1 to predict residual tumor size < 3 cm after anthracycline-based primary chemotherapy, 1 to predict breast conservation eligibility after anthracycline-based primary chemotherapy, and 1 to predict breast conservation eligibility after anthracycline plus paclitaxel-based primary chemotherapy.
MATERIALS AND METHODS
We identified in institutional clinical databases 651 women from the University of Texas M. D. Anderson Cancer Center (MDACC) in Houston, Texas, and 496 patients from the Institut Gustave Roussy (IGR) in Villejuif, France, who had been diagnosed with primary breast cancer and treated with either anthracycline-based or paclitaxel plus anthracycline-based chemotherapy. All the women gave informed written consent to therapeutic procedures and to the analysis of clinicopathologic data related to their malignancy in accordance with Institutional Review Board guidelines and the Declaration of Helsinki. Patients were divided into 4 cohorts according to the treatment center and the regimen. The first cohort included 496 patients treated at IGR with 3 or 4 courses of anthracycline-based preoperative chemotherapy and was used as a training set to develop the predictive models. The second cohort included 337 patients treated at MDACC with 4 courses of anthracycline-based preoperative chemotherapy and was used as a validation set. The third cohort included 237 patients from MDACC who received either 4 preoperative courses of paclitaxel every 3 weeks or weekly paclitaxel for 12 weeks, followed by 4 courses of 5-fluorouracil, doxorubicin, and cyclophosphamide (FAC). Data from these patients were used to develop the nomograms for paclitaxel-containing preoperative chemotherapy. The fourth cohort included 109 patients treated with paclitaxel and anthracycline-based preoperative chemotherapy at MDACC and was used to estimate the accuracy of the paclitaxel-anthracycline predictor. The clinical and histologic characteristics were prospectively recorded in each institution's clinical database. The patients from IGR were referred there between 1987 and 2000; patients from MDACC were referred there between 1989 and 2003. All patients were treated with NACT followed by surgery.
Patient characteristics and chemotherapy modalities have been reported previously3, 8, 9 and are shown in Table 1.
|Characteristic||Cohort 1: training for anthracycline-based NACT||Cohort 2: validation for anthracycline-based NACT||Cohort 3: training for paclitaxel/FAC NACT||Cohort 4: validation for paclitaxel/FAC NACT|
|Center, treatment period||IGR, 1987–2000||MDACC, 1989–2000||MDACC, 1999–2001||MDACC, 2001–2003|
|Age (mean ± SD), y||52 ± 10||52 ± 11||51 ± 11||53 ± 12|
|Initial dimension (mean ± SD), mm||50 ± 14||62 ± 28||34 ± 16||43 ± 18|
|T0-1||3 (1%)||13 (4%)||37 (16%)||10 (9%)|
|T2||293 (59%)||109 (32%)||176 (74%)||65 (60%)|
|T3||161 (32%)||90 (27%)||23 (10%)||19 (17%)|
|T4||39 (8%)||125 (37%)||1 (0.4%)||15 (14%)|
|N0||213 (43%)||85 (25%)||135 (57%)||34 (31%)|
|N1-2||283 (57%)||250 (74%)||102 (43%)||75 (69%)|
|ER negative||144 (29%)||155 (46%)||88 (37%)||44 (40%)|
|ER positive||352 (71%)||182 (54%)||149 (63%)||65 (60%)|
|SBR I/mBNG I||37 (7%)||17 (5%)||14 (6%)||4 (4%)|
|SBR II/mBNG II||282 (57%)||136 (40%)||105 (44%)||41 (38%)|
|SBR III/mBNG III||177 (36%)||184 (55%)||118 (50%)||64 (58%)|
|Lobular/mixed||56 (11%)||16 (5%)||205 (86%)||99 (91%)|
|Ductal/other||440 (89%)||321 (95%)||29 (12%)||10 (9%)|
|Breast conservation rate||53%||28%||45%||38%|
The largest tumor dimension at clinical examination was recorded as the tumor diameter. Multifocality was defined as > 1 tumor location. Histoprognostic grade, defined according to the modified Scarff, Bloom, and Richardson system described by Contesso et al.10 was used at IGR, and the modified Black's nuclear grade was used at MDACC. These grading systems have been reported to provide similar results.11 All the patients underwent mastectomy or segmental mastectomy and axillary lymph node dissection. Breast conservation was considered successful if margin status was negative. Patients with positive margin status on segmental mastectomy specimen underwent reexcision or mastectomy. No prophylactic mastectomies were performed.
In order to develop well-calibrated and exportable nomograms, we built each model in a training cohort and validated it in an independent validation cohort. Multivariate logistic regression analysis was used to test the association between clinicopathologic characteristics (including age, tumor size, lymph node status, histologic type and grade, estrogen receptor [ER] status, multifocality, and number of courses of treatment) and eligibility for breast conservation therapy or size of residual tumor. Backward variable selection was performed to determine independent covariates. These models were used to construct the nomograms.
The model performance was quantified with respect to discrimination and calibration. Discrimination (i.e., whether the relative ranking of individual predictions was in the correct order) was quantified with the concordance index, which is identical to the area under the receiver operating characteristic curve. The concordance index ranges from 0 to 1, with 1 indicating perfect concordance, 0.5 indicating no association (no better than flipping a coin), and 0 indicating perfect discordance. We used the bootstrapping technique to obtain relatively unbiased estimates (200 repetitions): it provides an estimate of the average optimism of the C index when all data are included. The bootstrap is a general data-based computational tool that can be used to assign measures of accuracy to statistical estimates. Calibration (i.e., agreement between observed outcome frequencies and predicted probabilities) was studied with graphic representations of the relation between the observed outcome frequencies and the predicted probabilities (calibration curves) for groups of patients defined by quartiles (each quartile contained at least 20 cases). A calibration curve can be approximated by a regression line with intercept α and slope β. These parameters can be estimated in a logistic regression model with the event as outcome and the linear predictor as only covariate. Well-calibrated models have α = 0 and β = 1. Therefore, a sensible measure of calibration is a likelihood ratio statistic testing the null hypothesis that α = 0 and β = 1, which we used. The statistic has a chi square distribution with 2 degrees of freedom (“unreliability” [U]-statistic).12 All analyses were performed using the R package with the Design, Hmisc, and Lexis libraries (available at URL: http://lib.stat.cmu.edu/R/CRAN/ [accessed August 8, 2006]).13–16
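The two performance measures described above can be illustrated with a minimal sketch. It is written in Python rather than the R libraries used for the study, the function names are hypothetical, and the eight data points are invented toy values, not study data. The concordance index is computed as the proportion of (event, non-event) pairs in which the event case received the higher predicted probability, with ties counted as one-half; calibration is summarized by comparing the mean predicted probability with the observed frequency within groups ordered by predicted probability.

```python
def concordance_index(predicted, observed):
    """Proportion of (event, non-event) pairs ranked correctly; ties count 0.5.

    For a binary outcome this equals the area under the ROC curve.
    """
    events = [p for p, y in zip(predicted, observed) if y == 1]
    non_events = [p for p, y in zip(predicted, observed) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                score += 1.0
            elif p_event == p_non_event:
                score += 0.5
    return score / (len(events) * len(non_events))


def calibration_by_groups(predicted, observed, n_groups=4):
    """Return (mean predicted probability, observed frequency) per group.

    Cases are sorted by predicted probability and cut into n_groups groups
    of nearly equal size (quartiles when n_groups is 4).
    """
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        observed_frequency = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_predicted, observed_frequency))
    return rows


# Toy data (invented for illustration only).
predicted = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
observed = [0, 0, 1, 0, 1, 0, 1, 1]

c_index = concordance_index(predicted, observed)
groups = calibration_by_groups(predicted, observed)
```

In a well-calibrated model the two numbers in each group are close, which corresponds to points lying near the 45° line of a calibration curve. The bootstrap optimism correction and the likelihood ratio test of α = 0 and β = 1 are not reproduced in this sketch.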
Prediction of Probability of Residual Disease < 3 cm and Breast Conservation after Anthracycline-Based NACT without Taxanes
First, we built a nomogram to predict residual tumor size < 3 cm. The model was built with the IGR series and was validated on the MDACC series (patients who underwent surgery after anthracycline-based chemotherapy without taxanes). We chose 3 cm because this threshold was identified as the first split by recursive partitioning on the training set. The aim was to explore whether a threshold of residual tumor could be validated and not to validate a particular value for a threshold. Correlation between residual tumor size and breast conservation in the 4 cohorts is reported in Table 2. In the multivariate logistic regression analysis, initial tumor diameter (P < .0001), grade (P < .0001), and histologic type (P = .001) were independently associated with a residual tumor size < 3 cm. ER status (P = .074) and number of NACT chemotherapy courses (P = .09) were of borderline significance. The concordance indices before and after bootstrapping were 0.71 (P < 10⁻¹⁴) and 0.70 (P < 10⁻¹³), respectively. The concordance index for the independent dataset was 0.71 (P < 10⁻¹⁰). We compared the nomogram-predicted rates of residual tumor size < 3 cm with the observed rates in this validation set. The nomogram accurately predicted the observed rates (Fig. 1). For the validation set, the calibration was good with no significant difference (P = .08) between the predicted and the observed proportions. The nomogram built from this model is reported in Figure 2.
|Cohort||Residual tumor size||Breast conservation||Mastectomy|
|1||pT < 3 cm||83%||17%|
|1||pT ≥ 3 cm||36%||64%|
|2||pT < 3 cm||42%||58%|
|2||pT ≥ 3 cm||6%||94%|
|3||pT < 3 cm||50%||50%|
|3||pT ≥ 3 cm||9%||91%|
|4||pT < 3 cm||41%||59%|
|4||pT ≥ 3 cm||25%||75%|
The same analysis was performed for the probability of a patient becoming eligible for breast conservation therapy. In the multivariate logistic regression analysis, initial tumor diameter (P = .0007), grade (P = .07), histologic type (P = .0005), multicentricity (P = .01), and ER status (P = .04) were independently associated with eligibility for breast conservation therapy after anthracycline-based NACT. The concordance indices before and after bootstrapping were 0.68 (P < 10⁻¹¹) and 0.67 (P < 10⁻¹¹), respectively. The concordance index for the independent dataset was 0.67 (P < 10⁻⁶). However, the calibration was poor, with a significant difference (P < .01) between the predicted and the observed proportions (Fig. 3A). Breast conservation rates were different between IGR and MDACC: 53% and 28%, respectively (Table 1). Therefore, the prediction model built with IGR data overestimated the actual probability in the MDACC data. It should be noted that patients included in the validation set had larger tumors than patients included in the training set (37% T4 tumors in the validation set vs. 8% in the training set, and the mean tumor size was 62 mm in the validation set vs. 50 mm in the training set) (Table 1). However, the prediction resulted in a line with a 45° slope but a different intercept. Integrating the ratio of breast conservation between the 2 institutions restored a good calibration (P = .33), thus demonstrating that breast conservation was performed less often at MDACC independently of the patient characteristics integrated into the nomogram. This suggests that regional and cultural attitudes account for this finding. The nomogram built from this model is shown in Figure 4.
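A calibration curve with a 45° slope but a shifted intercept is the pattern expected when two centers differ in their overall event rate but not in how covariates relate to the outcome. The exact adjustment applied in the study is not specified beyond "integrating the ratio of breast conservation between the 2 institutions", so the following Python sketch is only one plausible reading: it shifts each prediction on the logit scale by the difference between the log odds of the two global rates. The function names and the choice of a logit-scale shift are assumptions; only the rates of 53% and 28% come from Table 1.

```python
import math


def logit(p):
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of logit: map log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def recalibrate(p, source_rate, target_rate):
    """Shift a prediction to a center with a different global rate.

    ASSUMPTION: an intercept-only correction on the logit scale; the slope
    (relative effect of each covariate) is left unchanged.
    """
    shift = logit(target_rate) - logit(source_rate)
    return inv_logit(logit(p) + shift)


IGR_RATE = 0.53    # global breast conservation rate, training center (Table 1)
MDACC_RATE = 0.28  # global breast conservation rate, validation center (Table 1)

# A patient predicted at the IGR average maps to the MDACC average.
adjusted_average = recalibrate(0.53, IGR_RATE, MDACC_RATE)
# A patient with a high predicted probability at IGR.
adjusted_high = recalibrate(0.80, IGR_RATE, MDACC_RATE)
```

Under this assumed correction the ranking of patients is preserved, so the concordance index is unchanged and only the calibration is affected.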
Prediction of Probability of Residual Disease Less than 3 cm and Breast Conservation after Paclitaxel/FAC NACT
We used the population of a randomized trial, comparing 4 courses of paclitaxel given every 3 weeks followed by 4 courses of FAC or weekly paclitaxel for 12 weeks followed by 4 courses of FAC, as the training set and an independent validation population of 109 patients. We failed to establish a model that could correctly predict residual tumor size < 3 cm. The principal reason was that 84% of the tumors measured < 3 cm after paclitaxel/FAC.
In a multivariate model, initial tumor diameter (P = .007), histologic type (P = .08), multicentricity (P = .0001), and ER status (P = .04) were independently associated with breast conservation after paclitaxel/FAC NACT. The concordance indices before and after bootstrapping were 0.73 (P < 10⁻¹¹) and 0.71 (P < 10⁻¹⁰), respectively. The concordance index for the independent dataset was 0.71 (P < 10⁻⁶). The calibration was also good, with no significant difference (P = .25) between the predicted and the observed proportions (Fig. 3A). The nomogram built from this model is reported in Figure 5.
Comparison of the Two Nomograms
Improvement of breast conservation in patients receiving paclitaxel and anthracycline may result either from a direct benefit of adding paclitaxel in the neoadjuvant setting or from a change in attitude toward breast conservation. Therefore, we examined for which category of patients the addition of preoperative paclitaxel improved eligibility for breast conservation therapy. For this purpose we used the last cohort and plotted the probability of breast conservation with anthracycline-based NACT against the probability of breast conservation eligibility with FAC + paclitaxel NACT. The results are reported in Figure 6. The probability of breast conservation eligibility calculated with the nomogram for predicting the probability of breast conservation with anthracycline-based NACT was readjusted for the global rate of breast conservation. The correlation was highly significant (r = .97; P < .001). Of note, the models yielded similar parameters: the parameters of the linear predictor of the model for breast conservation after anthracycline-based NACT were −0.92 (standard error [SE] of 0.58), 0.43 (SE of 0.22), 0.023 (SE of 0.007), −0.25 (SE of 0.17), 1.3 (SE of 0.52), and 1.3 (SE of 0.36) for intercept, ER status, initial diameter, grade, multicentricity, and histologic type, respectively. For the breast conservation model after paclitaxel/FAC NACT, the parameters were −1.4 (SE of 0.97), 0.73 (SE of 0.35), 0.03 (SE of 0.01), −0.11 (SE of 0.31), 2.1 (SE of 0.52), and 0.96 (SE of 0.55) for intercept, ER status, initial diameter, grade, multicentricity, and histologic type, respectively, before backward variable selection. Indeed, the nomogram for breast conservation after anthracycline-based NACT (with integration of the local breast conservation rate obtained from cohort 3) predicted breast conservation after paclitaxel/FAC NACT well, with no significant difference (P = .45) between the predicted and the observed proportions in cohort 4.
Taken together, these data suggest that predictive factors of breast conservation were relatively independent of the regimen and that the benefit of increasing the number of courses and the number of chemotherapeutic agents is limited.
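To make the comparison of the two sets of parameters concrete, the reported coefficients can be plugged into the logistic form, in which the predicted probability is 1 / (1 + exp(−linear predictor)). The Python sketch below does this for one invented patient. The coefficients are those reported above; the covariate coding (binary covariates as 0 or 1, diameter in mm, grade as 1 to 3) and the patient values are assumptions made purely for illustration, because the text states neither how the covariates were coded nor which outcome level the model predicts. The resulting numbers should therefore not be read as clinical predictions.

```python
import math

# Coefficients as reported in the text. Covariate coding is NOT given in the
# source; the coding used below is a labeled assumption.
ANTHRACYCLINE = {
    "intercept": -0.92,
    "er_status": 0.43,
    "initial_diameter": 0.023,
    "grade": -0.25,
    "multicentricity": 1.3,
    "histologic_type": 1.3,
}
PACLITAXEL_FAC = {
    "intercept": -1.4,
    "er_status": 0.73,
    "initial_diameter": 0.03,
    "grade": -0.11,
    "multicentricity": 2.1,
    "histologic_type": 0.96,
}


def linear_predictor(coefs, patient):
    """Intercept plus the sum of coefficient * covariate value."""
    return coefs["intercept"] + sum(
        coefs[name] * value for name, value in patient.items()
    )


def predicted_probability(coefs, patient):
    """Logistic transform of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(coefs, patient)))


# Hypothetical patient under the ASSUMED coding (0/1 indicators, mm, grade 1-3).
patient = {
    "er_status": 1,
    "initial_diameter": 40,
    "grade": 2,
    "multicentricity": 0,
    "histologic_type": 0,
}

lp_anthracycline = linear_predictor(ANTHRACYCLINE, patient)
lp_paclitaxel_fac = linear_predictor(PACLITAXEL_FAC, patient)
p_anthracycline = predicted_probability(ANTHRACYCLINE, patient)
p_paclitaxel_fac = predicted_probability(PACLITAXEL_FAC, patient)
```

Because the two models share the same covariates with coefficients of the same sign and similar magnitude, their linear predictors move together across patients, which is consistent with the high correlation reported between the two sets of predictions.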
We developed a computer program to help patients and physicians make decisions about NACT. The program, called Neo!adjuvant (version 1), is programmed in Java. An Internet browser with Java capability is required to run the applets. An example of a screen is shown in Figure 7. The applets will be available online through the MDACC and the IGR websites (available at URL: http://www.ecsvd.org/nomogram.html or http://www.mdanderson.org/care_centers/breastcenter/dIndex.cfm?pn=448442B2-3EA5-4BAC-98310076A9553E63).
Improving the accuracy of our prognostic estimates is increasingly important for making surgical treatment recommendations. On the basis of patient data from 2 institutions, we developed several tools for predicting outcome after NACT in patients with breast cancer. We internally and externally validated these tools in terms of discrimination and calibration. Moreover, we developed a Web-based interface that provides physicians with a user-friendly version of our nomograms, which are the first nomograms available for predicting residual tumor size and breast conservation eligibility probabilities for patients treated with NACT.
The nomograms provide probability estimates that might be useful at an individual level. For example, a 50-year-old woman with a unicentric (60 points), 70-mm (40 points), ER-positive (0 points), low nuclear grade (0 points) lobular carcinoma (0 points) has a 10% probability of being eligible for breast conservation after paclitaxel/FAC NACT in a center with a breast conservation rate of 50% after this regimen. Because increasing eligibility for breast conservation is theoretically the principal advantage of NACT, the low probability might lead her to opt for initial mastectomy instead of NACT. In contrast, a 45-year-old woman with a unicentric (60 points), 40-mm (65 points), ER-negative (20 points), high nuclear grade (5 points) ductal (25 points) carcinoma has a 64% probability of being eligible for breast conservation after paclitaxel/FAC NACT in the same center. Therefore, in her case, NACT appears to be a very valuable option. The nomograms reported in this study make no actual treatment recommendations but may be useful when counseling patients regarding treatment options. Additional validation remains necessary to confirm the accuracy of the nomograms reported in our study. The grading system was different between institutions. A less than optimal level of agreement for both intraobserver and interobserver reproducibility has been consistently reported with breast cancer grading. The discrepancies generated from different grading systems are not different from intraobserver and interobserver reproducibility with a unique grading system.17 Indeed, the model takes this aspect into account by allocating a coefficient that integrates the uncertainty related to grade assessment.
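The point totals behind the two examples above can be tallied as follows. This Python sketch only adds up the points quoted in the text; the dictionary keys are hypothetical labels, and the conversion from total points to probability is read off the nomogram's probability axis (Figure 5), which is not reproduced here, so the reported probabilities are stored as given rather than computed.

```python
# Points per characteristic, as quoted in the text for the paclitaxel/FAC nomogram.
PATIENT_1 = {
    "unicentric": 60,
    "diameter_70_mm": 40,
    "er_positive": 0,
    "low_nuclear_grade": 0,
    "lobular": 0,
}
PATIENT_2 = {
    "unicentric": 60,
    "diameter_40_mm": 65,
    "er_negative": 20,
    "high_nuclear_grade": 5,
    "ductal": 25,
}

# Probabilities reported in the text for a center with a 50% conservation rate.
REPORTED_PROBABILITY = {"patient_1": 0.10, "patient_2": 0.64}

total_points_1 = sum(PATIENT_1.values())
total_points_2 = sum(PATIENT_2.values())
```

The first patient accumulates 100 points and the second 175 points, and the higher total corresponds to the higher reported probability of eligibility for breast conservation.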
The factors involved in the prediction of response to primary chemotherapy would intuitively be predictive of the ability of NACT to reduce the primary tumor enough to make the patient eligible for conservation therapy. We chose 3 cm as the threshold for residual tumor size because this threshold was identified as the first split by recursive partitioning. However, the possibility of breast conservation also depends on multicentricity, breast size, and tumor location. This explains why the nomograms for residual tumor size < 3 cm and for breast conservation do not provide the same results. We plan to include breast size and tumor location in the next version of our nomogram. Other parameters, such as imaging data and the extent of the ductal carcinoma in situ component, should also be included in the next version. These latter parameters were not prospectively registered in our databases during the study period and were therefore not included in the nomogram. It is likely that their integration in the next version of the nomogram will improve its calibration and its discrimination.
Interestingly, prediction of residual tumor size was well calibrated, although eligibility for breast conservation therapy needed to be readjusted for patients treated with anthracycline-based NACT at MDACC versus IGR. The need for readjustment suggests that, although tumor behavior can be predicted based on objective factors, decisions regarding mastectomy versus breast conservation therapy are influenced by more subjective factors such as regional and cultural attitudes. The decision to opt for breast conservation has been previously reported to be dependent on country. Recently, Locker et al.18 reported that women in the U.S. enrolled in the Arimidex, Tamoxifen Alone or in Combination trial were more likely to undergo mastectomy compared with their counterparts from the U.K. Our study corroborates this report. One could suggest that rates of breast conservation and local recurrence rates might be correlated. In patients treated with breast conservation, the 5-year local recurrence rates were 5% in the MDACC and 10% in the IGR series. Of note, the local recurrence rate was 8% in patients treated with mastectomy at IGR. Difference in terms of adjuvant treatment (boost, adjuvant hormonotherapy) precludes direct comparison.
Patients treated with preoperative taxane and anthracyclines have been reported to have a higher probability for becoming good candidates for breast conservation than patients treated preoperatively with anthracyclines alone. In the study conducted by the Aberdeen Breast Group,20 patients receiving docetaxel after anthracycline-based NACT had an increased rate of breast conservation compared with patients treated with anthracycline-based NACT only (67% vs. 48%). Similarly, Dieras et al.19 reported that breast-conserving surgery was performed in 58% of patients treated with 4 courses of combined doxorubicin plus paclitaxel and 45% of those treated with doxorubicin plus cyclophosphamide. However, the results of the National Surgical Adjuvant Breast and Bowel Project, Protocol B27 (NSABP B27), a randomized trial comparing neoadjuvant doxorubicin/cyclophosphamide with neoadjuvant doxorubicin/cyclophosphamide followed by docetaxel, did not confirm these findings.21 Overall, although 60% of patients were able to undergo lumpectomy, no significant difference was found between the study arms with regard to the use of breast conservation therapy. Interestingly, our logistic regression model for breast conservation after anthracycline-based NACT and after paclitaxel/FAC NACT showed that similar variables were predictive of breast-conserving surgery. Moreover, the models yielded similar parameters. This similarity indeed suggests that the probability of eligibility for breast conservation therapy is relatively independent of the preoperative regimen and corroborates the results of the NSABP B27 study. The correlation between the predictions of breast conservation after anthracycline-based NACT and paclitaxel/FAC NACT in the last cohort may be interpreted as a change of attitude toward breast conservation over the years.
In summary, we developed nomograms that can be used to predict the probability of residual tumor size < 3 cm and eligibility for breast conservation therapy after NACT. These nomograms may be useful when counseling patients about treatment options. We also developed a nomogram to predict the probability of pathologic complete response22 and plan to develop other nomograms and Web-based interfaces, such as for predicting lymph node status after NACT, which might be useful when counseling patients regarding sentinel lymph node biopsy or axillary lymph node dissection.
- 11. Relative worth of estrogen or progesterone receptor and pathologic characteristics of differentiation as indicators of prognosis in node negative breast cancer patients: findings from National Surgical Adjuvant Breast and Bowel Project Protocol B-06. J Clin Oncol. 1988; 6: 1076–1087.
- 13. Epi: a package for statistical analysis in epidemiology. R package version 0.2. 2005. Available at URL: http://www.pubhealth.ku.dk/~bxc/Epi/ [accessed August 8, 2006].
- 14. Regression Modeling Strategies, With Applications to Linear Models, Logistic Regression, and Survival Analysis. New York: Springer; 2001.
- 15. Hmisc: A Package of Miscellaneous S Functions. Nashville, TN: Vanderbilt University; 2005. Available at URL: http://biostat.mc.vanderbilt.edu/s/Hmisc [accessed August 8, 2006].