A nomogram for predicting long-term tumor-specific death in patients with soft tissue sarcoma (STS) was developed at the Memorial Sloan-Kettering Cancer Center (MSKCC).
To assess the performance of the MSKCC nomogram, 642 consecutive patients with extremity STS who underwent surgery over a 20-year span at a single referral center were analyzed. Nomogram predictions were based on tumor size, depth, site, patient age, histologic subtype, and grade. The latter, at variance with the system in use at the MSKCC, was classified as Grade 1–3 according to the French Federation of Cancer Centers Sarcoma Group (FNCLCC) system. The statistical approach used for nomogram performance assessment was that of “validation by calibration” proposed by Van Houwelingen.
Graphic comparison of observed and predicted sarcoma-specific survival curves showed that predictions by the nomogram were quite accurate, within 10% of actual survival for all prognostic strata. Statistical analysis showed that such predictions could be improved by employing approximately 25% shrinkage to achieve good calibration. The contribution of histologic grade was highly significant in both univariate analysis (P < 0.001) and multivariate analysis (P < 0.001), and a survival trend across the 3 grade categories was observed. Based on those findings, a nomogram that included the FNCLCC histologic grade classification was produced.
Results of the current study confirmed that the MSKCC nomogram is a valuable tool for individual prognostic assessment. A nomogram that included the FNCLCC histologic grade classification was proposed and was validated internally. Cancer 2005. © 2004 American Cancer Society.
Soft tissue sarcomas (STS) are a group of rare neoplasms of mesenchymal origin with an expected incidence of 1.5–2.0 per 100,000 inhabitants per year and an overall mortality rate of 30–50%1, 2 in the United States. Extremity STS in particular accounts for approximately 50% of all STS.1
The clinical course of STS is highly variable, depending on a number of patient, tumor, and treatment-related factors.3–10 Recently, Kattan et al.11 developed a prognostic model and derived a nomogram that integrates the information on patient age, tumor size, histologic grade, histologic subtype, tumor depth, and site in such a way that it arrives at an individual prediction of sarcoma-specific death. This model was developed from a series of 2136 adult patients with tumors of any site who were treated at Memorial Sloan-Kettering Cancer Center (MSKCC) and is referred to herein as the MSKCC model.
The MSKCC model was validated on the same series used for its development by means of bootstrap (internal validation). To our knowledge, the performance of this tool has never been tested on an independent series. Furthermore, the authors classified histologic grade as either low or high according to Hajdu et al.12 Other classification systems are available and are widely used, such as the French Federation of Cancer Centers Sarcoma Group (FNCLCC) classification.3
Over a 20-year time span, > 1000 consecutive patients with localized extremity STS underwent surgery with curative intent at our institute, and the FNCLCC classification of histologic grade was adopted. This provided us with an opportunity both to check the MSKCC model and to adapt it to our retrospective series so as to incorporate information from this different classification system for histologic grading.
Between January 1980 and December 2000, 1013 consecutive patients with localized extremity STS underwent surgery with curative intent at the Istituto Nazionale per lo Studio e la Cura dei Tumori (INT) (Milan, Italy). From this series, we included 642 patients who presented with primary disease, as proposed by Kattan et al.
The main series characteristics are shown in Table 1. The same definitions used by Kattan et al. were applied to tumor size (≤ 5 cm, 5–10 cm, or > 10 cm), histologic subtype (fibrosarcoma, leiomyosarcoma, liposarcoma, malignant fibrous histiocytoma, malignant peripheral nerve tumor, synovial, or other), tumor depth (superficial or deep), and site (upper or lower extremity), whereas histologic grade was classified in our series according to the FNCLCC system.3 This is a 3-level classification system (Grade 1, Grade 2, and Grade 3) that defines a score in relation to tumor differentiation, mitotic index, and tumoral necrosis.
| Characteristic | No. of patients | Percent |
|---|---|---|
| ≤ 5 cm | 291 | 45 |
| > 10 cm | 159 | 25 |
| Malignant fibrous histiocytoma | 52 | 8 |
| Malignant peripheral nerve tumor | 65 | 10 |
All surgical resections were macroscopically complete, which we defined as the absence of macroscopic residual disease after surgical excision of the tumor. Adjuvant radiation therapy was delivered to 237 patients (37%). External beam radiation was used in all such patients, and the doses ranged from 45 grays (Gy) to 65 Gy (median, 57 Gy). Adjuvant chemotherapy (mainly anthracycline-based regimens associated with ifosfamide) was given to 114 patients (18%) at the discretion of the multidisciplinary STS group or as part of clinical trials.
The median follow-up duration as of June 2003 was 99 months (interquartile range, 91–106 months). A small fraction of patients (n = 29; 4.5%) were lost to follow-up before 10 years.
In developing the MSKCC nomogram, Kattan et al.11 used a multiple Cox regression model in which patient age, tumor size, histologic subtype, tumor depth, and site were entered as covariates. Histologic grade was included as a stratification factor because of nonproportional hazards. Hence, two separate baseline hazard functions were fitted, one for low-grade tumors and the other for high-grade tumors; the covariate joint information was incorporated into the model linear predictor (LP). In this way, nomogram predictions were based on both components (baseline hazards and LP) of the Cox model. The MSKCC nomogram predicts the probability that the patient will die of sarcoma within 12 years of initial surgery, assuming that the patient does not die first of another cause.
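In symbols, the structure described above can be written as follows. This is a generic statement of a stratified Cox model, not the published MSKCC coefficients; the symbols $x_j$, $\beta_j$, $g$, and $S_{0g}$ are notation introduced here for illustration.

```latex
\mathrm{LP} = \sum_{j} \beta_j x_j, \qquad
S(t \mid x, g) = S_{0g}(t)^{\exp(\mathrm{LP})}, \qquad
P(\text{sarcoma-specific death by } t \mid x, g) = 1 - S(t \mid x, g)
```

Here $g$ indexes the grade stratum (low or high), $S_{0g}(t)$ is the baseline sarcoma-specific survival function fitted for that stratum, and the nomogram reads off the last quantity at $t = 12$ years.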
To compare graphically the predictions based on the MSKCC model with the observed outcome in our series, Grade 2–3 tumors were grouped into a “high-grade” category, consistent with the American Joint Committee on Cancer-International Union Against Cancer (AJCC-UICC) staging system.13 The patients with Grade 2–3 tumors were stratified into 4 prognostic groups corresponding to the quartiles of the distribution of the LP from the MSKCC model. The Kaplan–Meier curves for each group were then plotted together with the predicted survival curves, which were obtained within each group as the average of the individual curves estimated by the MSKCC model. A similar description was not possible considering Grade 1 tumors as “low grade” because of the small number of disease-related deaths (only 7 patients) in this subgroup.
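The grouping and averaging steps can be sketched as follows. This is a minimal illustration and not the authors' S-Plus code: the function names, the example LP values, and the baseline survival value are hypothetical, and a single time point stands in for the whole predicted curve.

```python
import math
import statistics


def quartile_groups(lps):
    """Assign each linear predictor to a group 0-3 using sample quartiles."""
    cuts = statistics.quantiles(lps, n=4)  # three cut points
    # Group index = number of cut points that the value exceeds.
    return [sum(lp > cut for cut in cuts) for lp in lps]


def mean_predicted_survival(lps, groups, baseline_survival):
    """Average the individual predictions S0 ** exp(LP) within each group."""
    means = {}
    for g in sorted(set(groups)):
        preds = [
            baseline_survival ** math.exp(lp)
            for lp, grp in zip(lps, groups)
            if grp == g
        ]
        means[g] = sum(preds) / len(preds)
    return means


# HYPOTHETICAL linear predictors and baseline survival, for illustration only.
example_lps = [-1.5, -1.0, -0.5, -0.2, 0.0, 0.3, 0.6, 1.0]
example_groups = quartile_groups(example_lps)
example_means = mean_predicted_survival(example_lps, example_groups, 0.80)

for g, value in example_means.items():
    print(f"LP quartile group {g}: mean predicted survival = {value:.3f}")
```

In the comparison described above, each of these group averages would be plotted against the Kaplan–Meier curve of the same group.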
For MSKCC model testing and revision, we adopted the approach of “validation by calibration” proposed by Van Houwelingen and Thorogood14 and Van Houwelingen.15 With such an approach, the original prognostic model is embedded in a “calibration model,” which allows testing the baseline hazard function, that is, whether the new patients fare better or worse on average than predicted by the model, as well as whether the relative effects of covariates are stronger or weaker than originally estimated. In our case series the patients were classified using a grading system different from that in use at MSKCC; therefore any checking of the baseline hazards was hampered. Furthermore, we observed a survival trend from Grade 1 to Grade 3 (as discussed below; see Results), contraindicating any grouping of the 3 classes for predictive purposes. Therefore, for checking the calibration of covariate effects, we fitted a Cox model in which the LP from the MSKCC model was entered as a covariate. The corresponding regression coefficient allowed us to test the hypothesis βLP = 1, which, if not rejected, indicates the validity of covariate-based predictions on the new data. Interaction terms between the LP and histologic grade also were tested jointly to detect a possibly different behavior of the LP covariates in the three grade categories.
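Written out, the calibration check on covariate effects has the following form. The notation ($h_{0g}$ for the baseline hazard in FNCLCC grade stratum $g$, $\gamma_g$ for the grade-specific interaction terms) is introduced here for illustration and is an assumption about how the model would be parameterized, with grade treated as a stratification factor.

```latex
h(t \mid \mathrm{LP}, g) = h_{0g}(t)\,\exp\!\left(\beta_{\mathrm{LP}} \cdot \mathrm{LP}\right),
\qquad H_0:\ \beta_{\mathrm{LP}} = 1

h(t \mid \mathrm{LP}, g) = h_{0g}(t)\,\exp\!\left((\beta_{\mathrm{LP}} + \gamma_g)\,\mathrm{LP}\right),
\qquad \gamma_1 = 0, \qquad H_0:\ \gamma_2 = \gamma_3 = 0
```

A slope $\beta_{\mathrm{LP}} < 1$ indicates that covariate effects are weaker in the new series than in the original model, so that the original predictions are too extreme and should be shrunk; a slope greater than 1 would indicate the opposite.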
For nomogram revision, histologic grade was entered into the Cox model as either a three-level stratification factor or an additional covariate. The latter option did not have a meaningful effect on the estimated regression coefficient corresponding to the LP and had the advantage that only one baseline hazard function was estimated. Furthermore, unlike in the data reported by Kattan et al., the proportional hazard assumption was met for histologic grade based on scaled Schoenfeld residuals plots16 and time-dependent covariate testing. We therefore based the revision process on the Cox model including LP and histologic grade as covariates, and we constructed a revised nomogram for survival predictions at 10 years, which was the maximum allowed by our data.
To quantify the prognostic accuracy of the nomogram, the c statistic described by Harrell et al.17 was computed. This is a discrimination measure corresponding to the nonparametric estimate of the area under the receiver-operating characteristic curve, which may range from 0.5 (no discrimination) to 1.0 (perfect discrimination). To account for possible overfitting, we calculated the degree of shrinkage of Cox model regression coefficients and the optimism in the estimated c statistic by means of bootstrap.18 Further details can be found in the statistical section of the report by Kattan et al.11
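For readers unfamiliar with the c statistic, a minimal sketch of its computation for right-censored data is given below. It is a simplified illustration (pairs with tied follow-up times are ignored, and no bootstrap optimism correction is applied), not the routine from Harrell's libraries; the function name and the example data are hypothetical.

```python
def harrell_c(times, events, risk_scores):
    """Simplified Harrell concordance index for right-censored data.

    times: follow-up times.
    events: 1 if the event was observed, 0 if the time is censored.
    risk_scores: higher value means higher predicted risk (e.g. the LP).
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when subject i has an observed event
            # strictly before subject j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as one half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# HYPOTHETICAL toy data: the third subject is censored at time 6.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
c_perfect = harrell_c(toy_times, toy_events, [3.0, 2.0, 1.5, 0.5])
c_uninformative = harrell_c(toy_times, toy_events, [1.0, 1.0, 1.0, 1.0])
```

A model whose risk ordering always matches the order of events reaches 1.0, whereas a model that assigns the same score to everyone scores 0.5.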
LP computation, taking into account individual covariate values, was based on the original model regression coefficients supplied by MSKCC. Survival time, which was computed from the date of surgery to the date of death or last follow-up, was censored for living patients and for patients who died of causes unrelated to STS, because we modeled disease-specific death. All statistical analyses were performed using S-Plus® (StatSci, MathSoft, WA) with Harrell's Design and Hmisc libraries (available at http://lib.stat.cmu.edu or in S-Plus 6.0). The reported P values are two-sided and were obtained from likelihood ratio tests.
The characteristics of patients in the INT series (Table 1) and the MSKCC series (see Table 1 from the article by Kattan et al.) did not differ greatly overall, despite the selective inclusion of patients with extremity STS in our series. Age distribution and site distribution were similar, with a median age of 49 years (range, 16–89 years) and a relative frequency of lower extremity STS of 72% in our series, compared with a median age of 51 years (range, 16–93 years) and a 71% relative frequency of lower extremity STS (within the subset of extremity STS) in the MSKCC series. Tumors that measured > 10 cm in size or that were located deeply were slightly less frequent in our series (25% and 79%, respectively, vs. 36% and 84% in the MSKCC series). With regard to tumor histology, liposarcomas, leiomyosarcomas, malignant peripheral nerve tumors, and synovial sarcomas were more common in the INT series, whereas fibrosarcomas, malignant fibrous histiocytomas, and other histologic types were more common in the MSKCC series. This may be related both to the selection criteria we used for our series (only patients with extremity STS were considered) and to the discrepancies that occur commonly between pathologists in classifying adult-type STS. All 3 grade categories were represented well in the INT series, with a prevalence of Grade 3 tumors (46%) over Grade 1 tumors (28%) and Grade 2 tumors (27%). There were 176 deaths overall; of these, 143 deaths (81%) were due to sarcoma and, thus, contributed to the current analysis.
Kaplan–Meier curves stratified by histologic grade (Fig. 1) showed a clear survival trend across Grades 1–3; the difference between the curves was highly significant at the log-rank test (P < 0.001). Ten-year survival estimates (95% confidence interval) were 0.958 (from 0.913 to 0.980) in patients with Grade 1 STS, 0.765 (from 0.678 to 0.832) in patients with Grade 2 STS, and 0.594 (from 0.528 to 0.653) in patients with Grade 3 STS.
Figure 2 compares observed and predicted survival curves, which were obtained as described above (see Statistical Methods) in patients with Grade 2 and Grade 3 STS. No such investigation was possible in patients with Grade 1 STS, as discussed above. Differences between observed and predicted 10-year survival rates ranged between −5.5% and 8.9%. The prognostic accuracy of the MSKCC nomogram, with Grade 1 tumors considered low-grade and with Grade 2–3 tumors considered high grade, was 0.75, as measured by the Harrell c statistic. It is shown in Figure 2 that there was a clear survival trend across the 4 prognostic groups, with the 10-year survival rate ranging from ≈ 80% in the group with the best survival to ≈ 40% in the group with the poorest survival. It is noteworthy, however, that the spread among predicted survival curves tended to be greater than that achieved among observed survival curves. This finding suggests that predictions would better fit our data if they were shrunk.
The Cox model stratified by histologic grade failed to detect significant interactions between the LP and histologic grade (P = 0.13) but yielded a significant result when testing the hypothesis βLP = 1 (P = 0.047). The regression coefficient estimated for the LP (95% confidence limits) was 0.76 (0.520–0.994). The latter result is in agreement with the indications given by Figure 2 and denotes that predictions based on the MSKCC model could be improved by using ≈ 25% shrinkage (on the log-hazard scale) to achieve good calibration.
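The effect of this shrinkage on individual predictions can be illustrated with a short sketch. The slope of 0.76 is the value reported above (1 − 0.76 = 0.24, hence ≈ 25% shrinkage); the baseline survival value, the LP values, and the function name are hypothetical, and the baseline is held fixed purely for illustration, whereas a real recalibration would re-estimate it.

```python
import math


def predicted_survival(baseline_survival, lp, shrinkage=1.0):
    """Cox-model prediction S0 ** exp(shrinkage * LP)."""
    return baseline_survival ** math.exp(shrinkage * lp)


# HYPOTHETICAL baseline 10-year survival at LP = 0 (not from the study).
S0_10Y = 0.80
# Calibration slope for the LP as reported in the text.
BETA_LP = 0.76

for lp in (-1.0, 0.0, 1.0):
    original = predicted_survival(S0_10Y, lp)
    shrunken = predicted_survival(S0_10Y, lp, shrinkage=BETA_LP)
    print(f"LP={lp:+.1f}  original={original:.3f}  shrunken={shrunken:.3f}")
```

Shrinking the LP pulls favorable and unfavorable predictions toward each other, which is the narrowing of the spread among predicted curves that the comparison with the observed curves called for.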
The results obtained with the Cox model that included both the MSKCC LP and the histologic grade as covariates are shown in Table 2. Notably, the contribution of histologic grade was highly significant (P < 0.001), as also shown in the univariate analysis with the log-rank test. The hazard ratio estimates confirmed the survival trend shown in Figure 1: 4.51 and 8.93 for the comparisons of Grade 2 versus Grade 1 tumors and of Grade 3 versus Grade 1 tumors, respectively. The correction term estimated with bootstrap for shrinkage of Cox model regression coefficients was 0.98, suggesting that overfitting was not a problem with our data. The revised nomogram (Fig. 3), therefore, could be derived directly from the Cox model described above. The corresponding bootstrap-corrected concordance index was c = 0.76, a value that compares favorably with that reported by Kattan et al. for the MSKCC nomogram (0.77).
| Variable | β | HR | 95% CI | P value |
|---|---|---|---|---|
| MSKCC linear predictor | 0.761 | — | — | 0.048a |
| Grade 2 vs. Grade 1 | 1.507 | 4.51 | 1.99–10.2 | — |
| Grade 3 vs. Grade 1 | 2.190 | 8.93 | 4.14–19.3 | — |
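The hazard ratios in Table 2 are the exponentiated regression coefficients, which can be checked with a few lines of code. The coefficients are those tabulated above; because they are rounded to three decimals, the recomputed values may differ from the table in the last digit. The Grade 3 versus Grade 2 comparison is not reported in the table and is derived here only as an illustration.

```python
import math

# Regression coefficients for histologic grade as reported in Table 2.
coefficients = {
    "Grade 2 vs. Grade 1": 1.507,
    "Grade 3 vs. Grade 1": 2.190,
}

# HR = exp(beta) for each comparison against the reference category.
hazard_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, hr in hazard_ratios.items():
    print(f"{name}: HR = {hr:.2f}")

# Grade 3 vs. Grade 2 follows from the difference of the two coefficients.
hr_3_vs_2 = math.exp(
    coefficients["Grade 3 vs. Grade 1"] - coefficients["Grade 2 vs. Grade 1"]
)
print(f"Grade 3 vs. Grade 2 (derived): HR = {hr_3_vs_2:.2f}")
```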
Prognostic models are of major interest in clinical oncology, because they can assist therapeutic decision making, provide information to the patient, and allow patient selection or stratification into randomized trials. When the models are complex and include information on many covariates, possibly with nonlinear effects, interactions, or heterogeneous baseline hazards, computation of individual predictions may be difficult. For such a purpose, one established graphic tool is the nomogram, through which a score is computed for each covariate according to its prognostic importance; then, a total score reflecting the joint contribution of all covariates is obtained and converted into a clinically interpretable measure, usually the cumulative probability of an outcome event at a given observation time.
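The scoring logic of a nomogram can be sketched in a few lines. The sketch assumes the common convention that the covariate with the widest range of effects spans 0 to 100 points; the function names, the second covariate with its coefficients, and the baseline survival value are hypothetical and serve only to make the example run. The grade coefficients are those of Table 2.

```python
import math


def build_point_scale(effects):
    """Convert per-covariate contributions to the LP into nomogram points.

    effects maps covariate name -> {level: contribution (beta * x)}.
    The covariate with the widest range of contributions spans 0-100 points.
    """
    widest = max(max(v.values()) - min(v.values()) for v in effects.values())
    points = {}
    for name, levels in effects.items():
        lowest = min(levels.values())
        points[name] = {
            level: 100.0 * ((value - lowest) / widest)
            for level, value in levels.items()
        }
    return points, widest


def event_probability(total_points, widest, offset, baseline_survival):
    """Turn a total score back into a cumulative event probability."""
    lp = offset + total_points * widest / 100.0
    return 1.0 - baseline_survival ** math.exp(lp)


effects = {
    # Grade coefficients as reported in Table 2 (Grade 1 is the reference).
    "grade": {"Grade 1": 0.0, "Grade 2": 1.507, "Grade 3": 2.190},
    # HYPOTHETICAL second covariate; values invented for illustration.
    "size": {"<= 5 cm": 0.0, "5-10 cm": 0.4, "> 10 cm": 0.9},
}
points, widest = build_point_scale(effects)
offset = sum(min(v.values()) for v in effects.values())

total = points["grade"]["Grade 2"] + points["size"]["> 10 cm"]
# HYPOTHETICAL baseline 10-year survival; not an estimate from the study.
risk = event_probability(total, widest, offset, baseline_survival=0.95)
print(f"total points = {total:.1f}, predicted 10-year risk = {risk:.3f}")
```

The length of each axis is proportional to the range of the corresponding effects, so a covariate with a large coefficient range occupies a long axis and dominates the total score.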
Considering STS, one example is the nomogram developed at the MSKCC by Kattan et al.11 that predicts the probability of disease-specific death within 12 years after surgery using commonly available information on patient age, tumor size, histologic grade, histologic subtype, depth, and site. The nomogram was derived from a Cox model, which, in turn, was shown to be the best modeling strategy for the MSKCC data compared with Kaplan–Meier analysis of all possible subsets or recursive partitioning. The authors also developed a software application (available at www.nomograms.org) that makes computations even easier and that extends the nomogram by allowing 5-year and 10-year predictions.
Although the MSKCC nomogram was developed carefully and was validated internally, its performance has never been assessed in an independent series. This task, which is commonly called “external validation,” is important for establishing the generalizability of any prognostic tool.
The current study was conducted along these lines, even though there were some deviations from the ideal situation. First, the original model, although it was developed for all types of STS, was applied in our study only to patients with extremity STS, covering ≈ 50% of patients in most series. Second, and more important, the FNCLCC system3 has been used at the INT for histologic grade classification in place of the system proposed by Hajdu et al.,12 which is in use at the MSKCC. The most striking difference between the 2 classifications is in the number of categories, namely, 3 grades (Grades 1–3) in the FNCLCC and 2 grades (low grade and high grade) in the system developed by Hajdu et al.
For the purpose of tumor staging, the AJCC-UICC13 suggests that 3-tiered systems should be reduced to a binary classification by including Grades 2 and 3 in the “high-grade” group. Following this suggestion, patients with Grade 2 tumors and patients with Grade 3 tumors were classified jointly into 4 prognostic categories based on the quartiles of the MSKCC LP distribution; for each group, we plotted observed and predicted survival curves for the purpose of graphic comparison (Fig. 2). The observed 10-year survival rate ranged from ≈ 80% in the most favorable group to ≈ 40% in the group with the poorest survival, thus confirming that prognostic stratification can be achieved using the MSKCC model. Predictions by the nomogram were quite accurate, within 10% of actual survival for all strata. However, there was also some discrepancy between predictions and the true outcome, and the pattern of disagreement suggested the need to shrink predictions to fit the data better.
Grade 2–3 grouping, although useful for descriptive purposes, is questionable for 2 reasons. First and foremost, different classification systems are based on distinct histologic and/or pathologic features, and no study has empirically compared the two classification systems under consideration. A comparative study of the National Cancer Institute and FNCLCC 3-tiered classification systems in a population of 410 patients with STS has been published.19 That study showed a good level of agreement for Grade 1 and 3 tumors, but a weak correlation in Grade 2 tumors. Based on the above reasoning, the French and the Hajdu classifications appear hardly comparable. Second, we observed a survival trend across Grades 1 through 3 (Fig. 1); therefore, any grouping would imply a loss of prognostic information, as observed previously by Kandel et al.,20 by using a different 3-grade system. Accordingly, within a statistical approach of “validation by calibration,”14, 15 to formally check and revise the MSKCC model, we fitted a new Cox model in which histologic grade was entered as an additional covariate together with the MSKCC LP. In agreement with the indications from the above-mentioned descriptive analysis, the latter model showed that the predictions based on the MSKCC model according to tumor size, histologic subtype, tumor depth, and site were somewhat overstated and could be improved by applying to the LP a shrinkage factor of ≈ 25%. Histologic grade yielded a highly significant result (P < 0.001), and the estimated hazard ratios confirmed the survival trend across Grades 1 through 3, which also was shown in the univariate analysis (Fig. 1). Finally, the Harrell c statistic, which was used to quantify the prognostic accuracy of the model, was equal to 0.76. Kattan et al. obtained a value very similar to ours (0.77), which those authors judged suggestive of good discrimination among patients. Our value of 0.76 also was slightly higher than that achieved by the original nomogram on our patient series (0.75) when considering Grade 1 tumors as low grade and Grade 2–3 tumors as high grade.
The findings described above suggest that improvement in predictions using the three-grade system may be less than suggested by the Cox model insofar as the c statistic is used. Nevertheless, we modified the nomogram originally developed by Kattan et al. and obtained the revised nomogram portrayed in Figure 3. Notably, some differences are apparent between the two nomograms. In particular, the revised nomogram applies only to patients with extremity STS and allows estimation of 10-year, rather than 12-year, mortality rates, in keeping with the composition and follow-up pattern of the INT series. Histologic grade, which was modeled as a covariate rather than as a stratification factor, no longer requires distinct “mortality axes,” one for each grade category, but contributes to the computation of the total score together with the other covariates.
The nomogram, as noted by Kattan et al., also is useful for visualizing the association between each predictor variable and sarcoma-specific death: the longer the covariate axis, the greater the corresponding shift in prognosis when moving across the levels (or values) of the covariate, holding all the others fixed. In the revised nomogram, the axis corresponding to histologic grade is the longest one, which underlines the prognostic importance of this factor. The estimated hazard ratio for Grade 3 tumors versus Grade 1 tumors was 8.93, as discussed above (see Results), whereas a hazard ratio of ≈ 3.0 (which is to be thought of as an average over time, in consideration of the nonproportional hazard in the MSKCC series) can be derived from the original nomogram. This suggests that the prognostic impact of the FNCLCC system may be stronger compared with that of the classification system reported by Hajdu et al.
In conclusion, the current study confirmed that the MSKCC nomogram is a valuable tool for individual prognostic assessment. However, some degree of adjustment seems useful for improving the quality of predictions. This hypothetically may reflect statistical “overfitting” in the original model, a weaker prognostic effect of covariates in extremity STS compared with STS in other sites, the application of a three-grade system instead of a two-grade system, or some combination of the above mechanisms. The revised nomogram incorporates such an adjustment of predictions, and it is proposed as an extension in extremity STS of the MSKCC nomogram whenever histologic grade is classified according to the FNCLCC system, which is now the system used most widely all over the world.