Among the several proposed risk classification schemes for predicting survival in women with breast carcinoma, one of the most commonly used is the Nottingham Prognostic Index (NPI). The goal of the current study was to use a continuous prognostic model (similar to those that have already been demonstrated to possess greater predictive accuracy than risk group–based models in other malignancies) to predict breast carcinoma mortality more accurately compared with the NPI.
A total of 519 women who had been treated with mastectomy and axillary lymph node dissection at Memorial Sloan-Kettering Cancer Center (New York, NY) between 1976 and 1979 met the following requirements for study inclusion: confirmation of the presence of invasive mammary carcinoma, no receipt of neoadjuvant or adjuvant systemic therapy, no previous history of malignancy, and negative lymph node status as assessed on routine histopathologic examination. Paraffin blocks were available for 368 of the 519 eligible patients. All available axillary lymph node tissue blocks were subjected to enhanced pathologic analysis. The competing-risk method was used to predict disease-specific death, and the accuracy of the novel prognostic model that emerged from this process was evaluated using the concordance index. Jackknife and 10-fold cross-validation predictions yielded by this new model were compared with predictions yielded by the NPI.
Of the 348 women for whom complete data were available, 73 died of disease; the 15-year probability of breast carcinoma–related death was 20%. On the basis of these 348 cases, the authors developed a prognostic model that took patient age, disease multifocality, tumor size, tumor grade, lymphovascular invasion, and enhanced lymph node staining into account, and using competing-risks regression analysis, they found that this new model predicted disease-specific death more accurately compared with the NPI.
The complexity of the decision regarding the use of adjuvant therapy to treat breast carcinoma is well documented. It is clear that the benefits of adjuvant therapy are modest and that these benefits must be weighed against the associated toxicities.1 Decisions regarding adjuvant therapy use generally are based on four key considerations: 1) the risk of recurrence in the absence of adjuvant therapy; 2) the toxicity associated with adjuvant therapy; 3) the efficacy of adjuvant therapy; and 4) the patient's preferences.2 Thus, it has been recognized that by improving our ability to predict the risk of recurrence in the absence of adjuvant therapy, we can make better decisions regarding whether adjuvant therapy is warranted in a given situation.3
The Nottingham Prognostic Index (NPI), an established and validated model for predicting disease-specific survival in women with breast carcinoma, operates under the assumption that adjuvant therapy is not used. The NPI separates women into three risk categories using a simple equation that depends on tumor size, tumor grade, and lymph node status.4
The goal of the current study was to determine whether the NPI could be made more accurate by the incorporation of additional variables and the construction of a continuous function for calculating the probability of disease-specific death. When used to assess patients with other types of malignant disease, continuous prediction models have exhibited greater prognostic accuracy compared with risk group–based models.5, 6
MATERIALS AND METHODS
A total of 519 consecutive women who had been treated with mastectomy and axillary lymph node dissection at Memorial Sloan-Kettering Cancer Center (New York, NY) between 1976 and 1979 met the following requirements for study inclusion: confirmation of the presence of invasive mammary carcinoma, no receipt of neoadjuvant or adjuvant systemic therapy, no previous history of malignancy, and negative lymph node status as assessed on routine histopathologic examination. Paraffin blocks were available for 368 of the 519 eligible patients. Enhanced pathologic analysis of available axillary lymph nodes was performed by sectioning tissue blocks at 2 deeper levels (50 μm apart from one another) and then staining with hematoxylin and eosin (H & E) and immunohistochemical (IHC) stains (AE1 and AE3 antibodies; Ventana Medical Systems, Tucson, AZ). Tumor grading was performed using the standard modified Bloom–Richardson system.7 Diagnoses of lobular carcinoma were made on the basis of morphologic criteria; E-cadherin staining was not performed. No attempt was made to further characterize cases of lobular carcinoma as being classic, alveolar, or pleomorphic. Lymphovascular invasion was defined according to morphologic criteria; the expression of endothelial markers such as CD31 was not considered. Vessel involvement outside the confines of the invasive carcinoma was taken into account in the diagnosis of lymphovascular invasion, whereas intratumoral vessel involvement was ignored.
With regard to the endpoint of disease-specific death, we believed that data on the following variables would be widely available and potentially prognostically significant: patient age, disease multifocality, tumor size, tumor grade, lymphovascular invasion, and enhanced lymph node staining. Patients for whom 1 or more values were unavailable (multifocality, n = 1; tumor size, n = 2; tumor grade, n = 17) were excluded from the study, leaving 348 complete patient records. Causes of death were recorded for patients who died.
Disease-specific mortality was estimated using the competing-risk method, as nearly half of all deaths in the study population were attributable to other causes.8 A model was constructed on the basis of the results of conditional cumulative incidence analysis,9 and this model served as the starting point for the development of a computerized prediction tool. The prediction model also was represented graphically in nomogram form.10
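The reason for preferring a competing-risk estimate can be illustrated with a minimal sketch of the nonparametric cumulative incidence estimator. This is not the regression model fitted in the current study (which was built with the cmprsk library in S-Plus); the function name, the cause coding (0 = censored, 1 = death due to disease, 2 = death due to other causes), and the toy data are assumptions made purely for illustration.

```python
from collections import Counter


def cumulative_incidence(times, causes, horizon):
    """Nonparametric cumulative incidence of cause 1 by `horizon`.

    causes: 0 = censored, 1 = death due to disease, 2 = death due to other causes.
    """
    disease_deaths = Counter(t for t, c in zip(times, causes) if c == 1)
    all_deaths = Counter(t for t, c in zip(times, causes) if c != 0)
    leaving = Counter(times)      # everyone leaves the risk set at her own time

    at_risk = len(times)
    alive = 1.0                   # probability of being alive just before time t
    incidence = 0.0
    for t in sorted(leaving):
        if t > horizon:
            break
        # A disease death at t can only happen to a patient still alive before t.
        incidence += alive * disease_deaths[t] / at_risk
        # Deaths from any cause reduce the probability of still being alive.
        alive *= 1.0 - all_deaths[t] / at_risk
        at_risk -= leaving[t]
    return incidence


# Hypothetical follow-up times (years) and causes for four patients.
toy_times = [2, 4, 6, 8]
toy_causes = [1, 2, 0, 1]
toy_estimate = cumulative_incidence(toy_times, toy_causes, horizon=10)
```

In this toy data set, treating the death due to other causes as a censored observation in a Kaplan-Meier calculation would give a disease-specific mortality of 100% by Year 10, whereas the competing-risk estimate is 75%, because a patient who has died of another cause can no longer die of disease.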
The process of model validation comprised two steps. First, the discriminatory power of the model was quantified using the concordance index.11 Similar to the area under the receiver-operating characteristic curve (but appropriate for censored data), the concordance index represents the probability that the model will predict a poorer outcome for the patient who dies first out of a randomly selected pair of patients. Note that for the purposes of assessing concordance, it is not required that both patients in a given pair die of disease; one patient simply needs to survive longer than the other. The concordance index represents the fraction of these patient pairs in which the prediction model correctly identifies the patient with the shorter survival duration. Flipping a coin to identify the patient with the shorter survival duration would be expected to yield a concordance index of 0.50, corresponding to a success rate of 50%.
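The pairwise definition given above can be written out as a short sketch. The function and variable names are hypothetical, the data are invented, and the handling of tied predictions (counted as one-half) is an assumption that follows common practice rather than a detail stated in this report.

```python
def concordance_index(times, death_observed, predicted_risk):
    """Fraction of usable patient pairs ranked correctly by the model.

    A pair is usable when the patient with the shorter follow-up time was
    observed to die; the other patient only needs to have survived longer.
    """
    usable = 0
    concordant = 0.0
    n = len(times)
    for i in range(n):
        if not death_observed[i]:
            continue              # a censored patient cannot be "first to die"
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if predicted_risk[i] > predicted_risk[j]:
                    concordant += 1.0      # higher risk assigned to earlier death
                elif predicted_risk[i] == predicted_risk[j]:
                    concordant += 0.5      # tied predictions count as one-half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Hypothetical data: four patients, the third of whom is censored at 7 years.
toy_times = [2, 5, 7, 9]
toy_observed = [True, True, False, True]
toy_risk = [0.9, 0.6, 0.7, 0.1]
toy_c = concordance_index(toy_times, toy_observed, toy_risk)
```

Here, 5 pairs are usable and 4 are ranked correctly, giving a concordance index of 0.80. A model that assigns the same risk to every patient scores 0.50, which matches the coin-flip benchmark described above.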
In the second step of the validation process, we assessed the calibration of our prognostic model. This assessment was performed by grouping patients according to their jackknife-calculated, model-predicted mortality probabilities and then, for each group, comparing the mean predicted probability of death with the observed cumulative disease-specific mortality rate. All analyses were performed using the S-Plus 2000 Professional software package (Statistical Sciences, Seattle, WA) with the cmprsk, Design, and Hmisc libraries included.12
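The grouping step of the calibration assessment can be sketched as follows. For simplicity, the sketch assumes that each patient's 15-year disease-specific status is known, so that the observed rate is a simple proportion; in the study itself, the observed rate was a cumulative disease-specific mortality estimate that accounts for censoring and competing causes of death. All names and data are hypothetical.

```python
def calibration_by_group(predicted, died_of_disease, n_groups=4):
    """Return (mean predicted risk, observed death rate) for each risk group.

    Patients are sorted by predicted risk and split into `n_groups` groups
    of nearly equal size (quartiles when n_groups is 4).
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    rows = []
    for g in range(n_groups):
        start = g * len(order) // n_groups
        stop = (g + 1) * len(order) // n_groups
        members = order[start:stop]
        if not members:
            continue              # can happen when there are fewer patients than groups
        mean_predicted = sum(predicted[i] for i in members) / len(members)
        observed = sum(1 for i in members if died_of_disease[i]) / len(members)
        rows.append((mean_predicted, observed))
    return rows


# Hypothetical predictions and outcomes for eight patients.
toy_predicted = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50, 0.60]
toy_died = [False, False, False, True, False, True, True, True]
toy_rows = calibration_by_group(toy_predicted, toy_died)
```

Plotting the second element of each pair against the first yields a calibration plot; points lying near the diagonal indicate agreement between predicted and observed mortality.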
Ultimately, predictions made by our model were compared with those made by the NPI. First, a jackknife prediction was obtained for each patient by removing the patient in question from the data set, refitting our model to the remaining data, and then calculating the removed patient's probability of death within 15 years. These predictions and the predictions yielded by the NPI were compared with respect to their concordance indices. NPI values were calculated in the manner described by the Swedish Breast Cancer Cooperative Group,13 using the following equation:
NPI = (0.2 × tumor size [cm]) + lymph node score + tumor grade
Lymph node scores were assigned as follows: patients with no positive lymph nodes detected on H & E staining were assigned a score of 1, patients with 1–3 positive lymph nodes were assigned a score of 2, and patients with > 3 positive lymph nodes were assigned a score of 3. Survival predictions were obtained by using NPI values to group patients into low-risk (NPI ≤ 3.4, group I), moderate-risk (NPI = 3.4–5.4, group II), and high-risk (NPI > 5.4, group III) categories.4 Note that for all comparisons involving jackknife-predicted probabilities, the full model was refit following the omission of each patient. Variable selection was not performed. Throughout the course of the study, the full model was used regardless of the statistical significance of the individual predictors within, as it is likely that for future patients, the full model will yield more accurate predictions compared with a reduced version of the model.14 In a subsequent analysis aimed at assessing whether our results were sensitive to the internal validation procedure, we used the 10-fold cross-validation method, rather than the jackknife method, to calculate mortality probabilities.
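The scoring and grouping rules just described can be expressed as a brief sketch. The lymph node scores and the group cutoff values are taken from the text; the equation itself is assumed to be the commonly cited form of the index (0.2 × tumor size in cm, plus lymph node score, plus tumor grade), and the function names are hypothetical.

```python
def lymph_node_score(positive_nodes):
    """Score of 1 for no positive nodes, 2 for 1-3, and 3 for more than 3."""
    if positive_nodes == 0:
        return 1
    if positive_nodes <= 3:
        return 2
    return 3


def npi(tumor_size_cm, tumor_grade, positive_nodes):
    """Assumed standard form: 0.2 x size (cm) + lymph node score + grade."""
    return 0.2 * tumor_size_cm + lymph_node_score(positive_nodes) + tumor_grade


def npi_group(npi_value):
    """Group I: NPI <= 3.4; group II: up to 5.4; group III: above 5.4."""
    if npi_value <= 3.4:
        return "I"
    if npi_value <= 5.4:
        return "II"
    return "III"


# Hypothetical patients: (tumor size in cm, grade, number of positive nodes).
low = npi(1.0, 1, 0)       # 0.2 + 1 + 1
middle = npi(2.0, 2, 1)    # 0.4 + 2 + 2
high = npi(3.0, 3, 2)      # 0.6 + 2 + 3
```

Because every patient within a group receives the same survival prediction, two women whose NPI values are, for example, 3.5 and 5.3 are treated identically, which is the loss of information that a continuous model is intended to avoid.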
RESULTS
Descriptive statistics for the current cohort are summarized in Table 1. At most recent follow-up, 73 patients had died of disease, and 67 had died of other causes. Disease-specific mortality according to NPI risk group is depicted in Figure 1.
Table 1. Descriptive Statistics for Breast Carcinoma Cohort
IHC: immunohistochemistry; H&E: hematoxylin and eosin; +: positive; −: negative.
In the conditional cumulative incidence model, tumor size (P = 0.006), Grade II (vs. Grade I) disease (P = 0.010), Grade III (vs. Grade I) disease (P = 0.012), lobular (vs. Grade I) disease (P = 0.002), lymphovascular invasion (P = 0.008), and positive H & E staining of the lymph nodes (P = 0.005) were found to be associated with disease-specific death, whereas patient age (P = 0.270), disease multifocality (P = 0.440), and IHC staining of the lymph nodes (P = 0.800) were not. The concordance index yielded by the model was 0.69 when the jackknife method was used and 0.68 when the 10-fold cross-validation method was used. Figure 2 illustrates the acceptable degree of calibration exhibited by our prognostic model. This figure plots the observed cumulative mortality rate against the mean predicted mortality risk for each of the jackknife-predicted mortality probability quartiles. The solid diagonal line represents the performance of an ideal model, for which predicted and observed 15-year mortality rates would be in perfect agreement. The actual data points are reasonably close to this line, suggesting that our model is relatively well calibrated. Furthermore, the observed disease-specific mortality rate did not differ significantly from the mean predicted mortality risk in any of the patient quartiles.
In addition to assessing concordance and calibration, we also compared the predictions yielded by our model with those obtained using the NPI risk groupings. Individual NPI and model predictions were compared in terms of their ability to rank patients according to mortality risk (i.e., in terms of their concordance indices); only patients for whom both the current model and the NPI were applicable (i.e., patients with ductal carcinoma) were included in this comparison. To correct for overfitting, model predictions were calculated on a ‘leave-one-out’ basis, as well as on a 10-fold cross-validated basis; in this way, each patient was excluded from the model used to generate her probability of disease-specific death. Regardless of whether the jackknife method or the 10-fold cross-validation method was used, the discriminatory capability of our model proved to be superior to that of the NPI (concordance index, 0.70 vs. 0.61; P = 0.003). This difference is difficult to appreciate from a clinical perspective, and thus, the actual discrepancies between the results generated by our model and the results generated by the NPI are illustrated more clearly in Figure 3. From this figure, it can be seen that within each NPI category, there is significant heterogeneity in terms of model-predicted probability of death.
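The two internal validation schemes mentioned above can be sketched generically. The `fit` and `predict` arguments are placeholders for any modeling procedure; the stand-in used in the example (the mean death rate of the training patients) is purely hypothetical and is not the regression model of the current study. Folds are assigned here by position for reproducibility, whereas in practice they would ordinarily be assigned at random.

```python
def jackknife_predictions(records, fit, predict):
    """Leave-one-out: refit without each patient, then predict for that patient."""
    predictions = []
    for i, held_out in enumerate(records):
        training = records[:i] + records[i + 1:]
        model = fit(training)
        predictions.append(predict(model, held_out))
    return predictions


def k_fold_predictions(records, fit, predict, k=10):
    """k-fold: each patient is predicted by a model fitted without her fold."""
    predictions = [None] * len(records)
    for fold in range(k):
        held_out_idx = set(range(fold, len(records), k))
        training = [r for i, r in enumerate(records) if i not in held_out_idx]
        model = fit(training)
        for i in held_out_idx:
            predictions[i] = predict(model, records[i])
    return predictions


# Hypothetical stand-in model: predicted risk = death rate among training patients.
def fit_mean(training):
    return sum(died for _, died in training) / len(training)


def predict_mean(model, record):
    return model


# Hypothetical records: (tumor size in cm, 1 if died of disease else 0).
toy_records = [(1.0, 0), (2.0, 1), (3.0, 0), (4.0, 1)]
loo = jackknife_predictions(toy_records, fit_mean, predict_mean)
two_fold = k_fold_predictions(toy_records, fit_mean, predict_mean, k=2)
```

In both schemes, a patient's own outcome never contributes to the model that generates her prediction, which is what protects the resulting concordance index from the optimism of overfitting.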
It also warrants mentioning that the NPI takes into consideration lymph node status as assessed via H & E staining. Some women in the current cohort, on reanalysis of their lymph nodes, were found to have had positive lymph node status despite originally being diagnosed as having negative lymph node status decades ago. Consequently, we recomputed NPI scores using the results of our reanalysis of lymph node status. Doing so improved the performance of the NPI (concordance index, 0.64 [new lymph node data] vs. 0.61 [old lymph node data]), but not to the level of our prognostic model (concordance index, 0.69; P = 0.014).
Figure 4 depicts our novel prognostic tool in nomogram form. This graphic representation of the regression model readily allows the user to compute a patient's predicted probability of death due to breast carcinoma within the ensuing 15 years. For example, in the absence of adjuvant therapy, a 50-year-old woman (5 points) who had a Grade II (83 points) unifocal lesion (0 points) measuring 1 cm in diameter (10 points), no lymphovascular invasion (0 points), and negative lymph node status (0 points) would have a 17% probability (total score, 98) of dying due to breast carcinoma within the next 15 years.
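The arithmetic of the worked example can be checked with a few lines of code. Only the point values quoted in the example are used; the dictionary keys are descriptive labels chosen for illustration, and the conversion from total points to probability must still be read from the nomogram itself.

```python
# Point values as quoted in the worked example above.
example_points = {
    "age 50 years": 5,
    "Grade II": 83,
    "unifocal lesion": 0,
    "tumor size 1 cm": 10,
    "no lymphovascular invasion": 0,
    "negative lymph node status": 0,
}

# The total is the sum of the points assigned to each characteristic.
total_points = sum(example_points.values())
```

The total of 98 points is the value that the nomogram maps to a 17% probability of death due to breast carcinoma within 15 years.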
DISCUSSION
The postmastectomy prognosis for women with negative lymph nodes is of critical importance. In one study, 91% of women considering adjuvant therapy expressed a desire to know what their prognosis would be in the absence of such treatment15; however, when asked after the initiation of therapy, only 39% of women claimed to have received quantitative estimates of their prognosis, and only 31% stated that they had been provided with quantitative estimates of what their prognosis would be both with and without adjuvant therapy.16 Thus, for the simple purposes of notifying and counseling patients in this setting, information on prognosis is critical, and it appears that such information is not being communicated adequately.
Decisions regarding whether adjuvant therapy is warranted in a given situation are exceedingly difficult. It is clear that adjuvant therapy provides modest benefit1 and that this benefit can be accompanied by a number of complications.17 For patients with breast carcinoma, the issue of whether to receive adjuvant therapy is a legitimate one,17 and one that demands that a number of tradeoffs be considered.18 Although a National Cancer Institute clinical alert has recommended that all women with lymph node–negative disease receive adjuvant therapy,19 refinement of the method for assessing mortality risk in patients with breast carcinoma remains necessary.3, 19
In the current study, we performed an enhanced pathologic analysis of a cohort of women with breast carcinoma who underwent mastectomy but did not receive adjuvant therapy. The results of this pathologic assessment, which revealed that 9% of all patients had lymph nodes that stained positively for H & E, were found to be associated with disease-specific death on multivariate analysis (P = 0.005). Using these pathologic data in conjunction with a number of other variables, we developed a model for generating continuously valued probabilities of disease-specific mortality, and this model appears to predict death more accurately compared with previous models. Specifically, the current model had a higher concordance index than did the NPI (P < 0.02). Note that in comparing our model with the NPI, we considered NPI prognostic groups rather than NPI scores. To our knowledge, NPI-predicted survival probabilities can only be obtained from these groupings, and not from raw scores.
Our model suggests that tumor size is a significant prognostic variable, as has been reported by others.20, 21 Nonetheless, it is clear that establishing cutoff points with regard to tumor size is problematic. Simply put, with all other factors being held constant, prognosis becomes poorer with increasing tumor size. Consequently, the use of a heuristic, such as a 1 cm cutoff,22 will result in inferior predictive accuracy. Furthermore, the categorization of tumor size causes valuable information to be lost. For the purpose of counseling a patient or making a decision regarding adjuvant therapy, all predictive factors should be considered in an optimal fashion, so that the prognosis that is made is as accurate as possible. The development of a computerized version of our prognostic model would represent an important step in this direction, and such a tool might provide the most accurate method currently available for predicting mortality in women with breast carcinoma. Figure 5 compares model predictions with predictions generated using a heuristic cutoff point of 1 cm in women with lymph node–negative disease. In this figure, a high degree of heterogeneity in terms of model-predicted mortality risk is evident among women in the > 1 cm category. Furthermore, it can be seen that certain patients in the ≤ 1 cm category have a mortality risk of > 20%. Thus, we believe that women should not be counseled or managed in a uniform fashion on the basis of their tumor size category. In support of this idea, our model (concordance index, 0.69) proved to be a better predictor of risk compared with either a model classifying tumor size as ≤ 1 cm or > 1 cm (concordance index, 0.53)22 or a model classifying tumor size as < 2 cm, 2–4.9 cm, or ≥ 5 cm (concordance index, 0.60).21
In addition to being useful in patient counseling, the nomogram also allows interpretation of the relative weight of each variable in the risk model. In general, nomograms possess numerous advantages over typical hazard ratio tables.23
Our work is similar in spirit to, although more limited in scope than, the work of Ravdin et al.24 and Loprinzi and Thome.25 Their approaches extend well beyond ours by examining the effects of adjuvant therapy on the subsequent probabilities of recurrence and death. Our model does not make such predictions, but it does take more information into account in assessing mortality risk in the absence of adjuvant therapy. For example, in our model, tumor grade and staining method, neither of which is included in the other two models, are both statistically significant predictors of disease-specific death. In addition, whereas the other two models predict 10-year survival, our model predicts survival out to 15 years. Furthermore, in terms of concordance index, the Adjuvant! software package (Adjuvant! Inc., San Antonio, TX) proved to be less accurate (concordance index, 0.65) than our nomogram when model probabilities were calculated using the jackknife method or the 10-fold cross-validation method.
On the basis of internal validation studies, it appears that our model predicts the probability of breast carcinoma–specific death more accurately compared with other popular models or common heuristic tools. Nonetheless, the ‘value added’ by our model is subject to debate. Our belief is that due to the complexity and the serious implications associated with decisions regarding adjuvant treatment, steady, marginal improvements in our ability to predict outcome do in fact represent progress. In this respect, it is best to use the most accurate prediction tool available when counseling women with regard to adjuvant therapy use. Nonetheless, the clinical implications of the use of our model in a real-world setting are difficult to ascertain. Still, it is clear that relative to the use of a model incorporating a 1 cm tumor size cutoff, the use of our model would have dramatic ramifications with respect to the identification of high-risk patients (Fig. 5).
In addition to being useful for patient counseling, our model, which predicts mortality risk in the absence of adjuvant therapy, also has the potential to assist physicians in deciding whether adjuvant therapy is warranted in a given situation. For example, a patient with a low baseline level of risk might wish to avoid the toxicity associated with adjuvant therapy, and the ability to inform such a patient of her risk would be useful in any discussion of treatment options. In this way, our prognostic model could serve as an effective decision-making aid.26 It also is possible that our model could act as a benchmark for judging the predictive ability of new technologies, such as gene expression analysis. It is hoped that in the future, such novel techniques will become more widely accessible and allow mortality risk to be predicted with even greater accuracy.
The current study is not without significant limitations. For example, the cohort investigated comprised a relatively small number of patients (although the follow-up of these patients was excellent). Although such a limitation would be expected to make the derivation of an accurate and robust prediction model difficult, our model's ability to outperform the NPI suggests that this was not an issue. A more important concern is that our model requires external validation by other investigators.
In conclusion, we have developed and internally validated a tool for predicting 15-year disease-specific mortality in women with breast carcinoma who have been treated with mastectomy alone. This tool appears to represent an improvement over the NPI, although external validation clearly is necessary.