• renal cell carcinoma;
  • prognosis;
  • validation;
  • prognostic models




The objective of the current study was to compare, in a large multicenter study, the discriminating accuracy of four prognostic models developed to predict the survival of patients undergoing nephrectomy for nonmetastatic renal cell carcinoma (RCC).


A total of 2404 records of patients from 6 European centers were retrospectively reviewed. For each patient, prognostic scores were calculated according to four models: the Kattan model, the University of California at Los Angeles integrated staging system (UISS) model, the Yaycioglu model, and the Cindolo model. Survival curves were estimated by the Kaplan–Meier method and compared by the log-rank test. Discriminating ability was assessed by the Harrell c-index for censored data. The primary end point was overall survival (OS), and the secondary end points were cancer-specific survival (CSS) and disease recurrence-free survival (RFS).


At last follow-up, 541 subjects had died of any cause, with a 5-year OS rate of 80%. The 5-year CSS and RFS rates were 85% and 78%, respectively. All models discriminated well (P < 0.0001). The c-indexes for OS were 0.706 for the Kattan nomogram, 0.683 for the UISS model, and 0.589 and 0.615 for the Yaycioglu and Cindolo models, respectively. The Kattan nomogram was found to improve discrimination substantially in UISS intermediate-risk patients.


The current study appears to better define the general applicability of prognostic models for predicting survival in patients with nonmetastatic RCC treated with nephrectomy. The results suggest that postoperative models discriminate substantially better than preoperative ones. The Kattan model was consistently found to be the most accurate, although the UISS model performed only slightly less well. The Kattan model can be useful in UISS intermediate-risk patients. Cancer 2005. © 2005 American Cancer Society.

Renal cell carcinoma (RCC) is the most frequent malignancy of the adult kidney, with approximately 30,000 new cases expected per year in the U.S. and 20,000 cases expected in the European Union,1 with a relative increase of > 30% in the past 2 decades.2

Survival is closely related to initial stage. The 5-year survival rate is 50–90% for localized disease, decreasing to 0–13% for metastatic disease.3

It is estimated that, at the time of diagnosis, 25% of patients have metastatic disease and another 25% have locally advanced disease.4 The main therapy for nonmetastatic RCC is nephrectomy. However, it is estimated that 30% of patients will eventually develop metastases during the course of the disease.5 The results concerning the treatment of metastatic disease still remain poor. For patients with advanced-stage kidney carcinoma, the only available option is immunotherapy, albeit of restricted efficacy.6–8

In this context, the use of reliable prognostic indicators aimed at distinguishing between patients with good or poor prognosis plays a crucial role in predicting outcome and in enrolling subjects in new trials of adjuvant agents.

Leibovich et al.9 and Motzer et al.10 presented two prognostic scoring systems for patients with metastatic RCC who were treated with nephrectomy followed by cytokine-based therapy.

Several prognostic scores have recently been published to predict outcome after nephrectomy for nonmetastatic RCC.11–16 All models consider clinical and/or pathologic variables, but they differ with regard to the number and type of covariates, tool properties (nomogram or prognostic categories), and endpoints (overall survival [OS], cancer-specific survival [CSS], and disease recurrence-free survival [RFS]). To our knowledge, only a few models to date have been internally validated to reduce overfitting (e.g., by bootstrap), and only the University of California at Los Angeles integrated staging system (UISS) has been validated externally.17, 18

Moreover, patient selection is often unclear, sample size is not usually justified, and precision of estimates is not assessed. Therefore, the generalization of these models to external cohorts of patients with different characteristics is questionable. Furthermore, to our knowledge, no direct comparison of models has been published to date in the literature to improve the decision-making ability of clinicians caring for patients who underwent nephrectomy for RCC.

The aim of the current study was to compare the discriminating accuracy of the prognostic models for nonmetastatic RCC in a large multicenter study involving six centers from three European countries (Italy, France, and Austria).



Prognostic Scores

Currently, five prognostic scores are available for nonmetastatic RCC.11, 13–16 The Kattan et al. nomogram11 included histologic type, tumor size, 1997 TNM classification, and clinical presentation. For each subject, the probability of RFS after 5 years of follow-up was estimated.

The UISS model14 included TNM classification, Eastern Cooperative Oncology Group (ECOG) performance status (PS) score, and Fuhrman grade, separately for metastatic and nonmetastatic RCC. Patients were categorized in three groups with low, intermediate, and high risk. The endpoint was OS.

The stage, size, grade, necrosis (SSIGN) score developed by Frank et al.15 included tumor stage, tumor size, grading, and necrosis. The endpoint of the SSIGN model was CSS.

All the previous models assigned postoperative scores. Conversely, Yaycioglu et al.13 and Cindolo et al.16 developed pure preoperative scores taking into account only clinical presentation and clinical size of the renal masses and using RFS as the endpoint. Only two risk groups were derived.

Because the information concerning necrosis required by the SSIGN model15 was insufficient for many patients, comparison was restricted to the other four models.

Study Subjects

The study population of this multicenter retrospective study consisted of subjects who underwent surgery for RCC between 1984 and 2002 in six urologic centers in Italy (Napoli and Verona), France (Rennes, St. Etienne, Créteil), and Austria (Graz). Institutional databases were available in the participating clinical centers and covered 3151 subjects. Patient management and follow-up procedures were performed in accordance with the protocols of the centers. Information on follow-up was updated in each center by direct phone call or, alternatively, by contacting general practitioners or relatives.

Inclusion criteria mirrored the eligibility criteria common to all prognostic scores. We excluded from the comparison subjects who were not eligible under all four models (e.g., patients with distant metastasis [M+], histologically confirmed lymph node-positive disease [N+], or large tumors [pT4]) as well as patients with benign lesions, bilateral disease, carcinoma of the Bellini ducts, or unclassified histology. Subjects who died of surgical complications (perioperative death) in the first month after nephrectomy and subjects with a follow-up < 1 month were not eligible.

An additional 61 subjects were excluded because the lack of information on Fuhrman grade precluded calculation of the UISS score. Eventually, 2404 patients were available for analysis. A flow chart of patient selection is illustrated in Figure 1.


Figure 1. Flow chart of patient selection. *Some patients had multiple exclusion criteria.


Study Variables

From each institutional database, anonymous patient data regarding demographic, surgical, and pathologic information needed to define prognostic models were collected and pooled centrally.

Patients were staged preoperatively with an abdominopelvic computed tomography (CT) scan and either a chest CT scan or X-ray. Pathologic staging was determined in accordance with the 1997 TNM classification.19 Classifications pT3b and pT3c were pooled together. The tumor size of pathologic specimens was determined as the greatest dimension in centimeters. The Heidelberg classification was used to stratify histologic subtypes20 and the Fuhrman grading scheme was used to determine the nuclear grade of tumors.

Patient presentation was categorized as incidental or symptomatic (local or systemic), as described by Patard et al.21 General health status was measured by the ECOG PS score, categorized as an ECOG PS of 0 versus an ECOG PS ≥ 1.

Statistical Analysis

A total of 2404 patients with complete baseline and follow-up data were available for analysis.

The primary end point was OS, defined as the time from surgery to death from any cause or, for living patients, to the date of last available information. The secondary end points were CSS and RFS. CSS was defined as the time from surgery to death attributed to cancer by the clinicians; patients who died of causes other than cancer were censored at the date of death. RFS was defined as the time from surgery to recurrence or progression of cancer or cancer-specific death, whichever occurred first. Information on the date of disease recurrence was available in only three centers, so the RFS comparison was restricted to those centers. Patients who did not experience disease progression were censored at the date of the last follow-up. Survival curves were estimated by the Kaplan–Meier product-limit method and compared by the log-rank statistic. The appropriate score was attributed to each patient according to each model. For descriptive purposes only, individual probability values from the Kattan nomogram were arbitrarily categorized in 5 classes (< 0.6, 0.6–0.7, 0.7–0.8, 0.8–0.9, and 0.9–1.0).
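The endpoint definitions and the product-limit estimate can be illustrated with a minimal Python sketch. This is not the authors' code (the analyses were run in SAS and S-Plus); the status labels `dead_cancer`, `dead_other`, and `alive`, the function names, and the toy follow-up data are all hypothetical, chosen only to show how the same patients yield different event indicators for OS and CSS.

```python
def event_indicator(status, endpoint):
    """Code a follow-up status as 1 (event) or 0 (censored) for an endpoint.

    The status labels are hypothetical; the paper does not name its fields.
    """
    if endpoint == "OS":   # death from any cause is the event
        return int(status in ("dead_cancer", "dead_other"))
    if endpoint == "CSS":  # non-cancer deaths are censored at the date of death
        return int(status == "dead_cancer")
    raise ValueError(f"unsupported endpoint: {endpoint}")


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit method)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Events and censored subjects both leave the risk set after time t.
        n_at_risk -= removed
    return curve


# Toy follow-up data (months from surgery), invented for illustration only.
statuses = ["dead_cancer", "alive", "dead_other", "alive", "dead_cancer", "alive"]
months = [6, 12, 12, 20, 30, 45]

os_curve = kaplan_meier(months, [event_indicator(s, "OS") for s in statuses])
css_curve = kaplan_meier(months, [event_indicator(s, "CSS") for s in statuses])
print("OS :", os_curve)
print("CSS:", css_curve)
```

In the toy data the non-cancer death at 12 months lowers the OS curve but only shrinks the risk set for CSS, which is why CSS estimates sit above OS estimates for the same cohort.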

The discriminating ability of the prognostic models was assessed using receiver operating characteristic (ROC) curves. The area under the ROC curve was calculated using a version of the c-index modified for censored data22: among all usable pairs of patients, a credit equal to 1 was assigned if the predicted survival probability was larger for the patient who lived longer, and equal to zero if the reverse was true. Any pair with equal predicted survival received one-half credit. All possible pairs of patients were considered, at least one of whom had the studied event. A patient pair was unusable if both patients developed the event at the same time, or if one had the event and the other was censored at a shorter time. The c-index is the mean credit over the usable pairs: a value of 0.5 represents no discriminating ability and a value of 1.0 represents perfect discrimination.
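The pair-counting rule described above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the authors' implementation: the function name `c_index` and the toy data are assumptions, and, as a simplification, every pair with tied follow-up times is skipped (the text only states that pairs with two simultaneous events are unusable).

```python
def c_index(times, events, pred_surv):
    """Concordance index for right-censored data.

    times     -- follow-up time for each patient
    events    -- 1 if the event was observed, 0 if censored
    pred_surv -- predicted survival probability (higher = better prognosis)
    """
    credit = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            if times[i] == times[j]:
                continue  # simplification: all tied follow-up times are skipped
            short, longer = (i, j) if times[i] < times[j] else (j, i)
            if not events[short]:
                continue  # shorter time is censored, so the ordering is unknown
            usable += 1
            if pred_surv[longer] > pred_surv[short]:
                credit += 1.0  # prediction agrees with the observed ordering
            elif pred_surv[longer] == pred_surv[short]:
                credit += 0.5  # tied predictions get half credit
    if usable == 0:
        raise ValueError("no usable pairs")
    return credit / usable


# Toy example, invented for illustration only.
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
pred = [0.3, 0.6, 0.5, 0.9]
print(c_index(times, events, pred))  # 4 of 5 usable pairs are concordant
```

The pair formed by the patient censored at 12 and the patient with an event at 20 is unusable, which leaves five usable pairs in the toy data.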

The 95% confidence intervals (95% CI) of c-indexes were calculated by bootstrapping. One thousand bootstrap samples, each involving a resampling of the entire dataset of patients with replacement, were assessed and 2-tailed percentile 95% CIs were calculated.23 The 95% CIs of pairwise differences between c-indexes of prognostic models were estimated similarly.
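A percentile bootstrap of this kind can be sketched as follows. The code is a hedged illustration rather than the S-Plus or SAS routine used in the study: `c_index` is a compact stand-in that skips tied follow-up times, `bootstrap_ci` and its parameters are hypothetical names, and the data are simulated. Passing a second set of predictions gives the interval for the difference between two models, computed on the same resampled patients.

```python
import random


def c_index(times, events, pred_surv):
    """Compact concordance index (pairs with tied follow-up times skipped)."""
    credit, usable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            if times[i] == times[j]:
                continue
            short, longer = (i, j) if times[i] < times[j] else (j, i)
            if not events[short]:
                continue
            usable += 1
            if pred_surv[longer] > pred_surv[short]:
                credit += 1.0
            elif pred_surv[longer] == pred_surv[short]:
                credit += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return credit / usable


def bootstrap_ci(times, events, pred_a, pred_b=None, n_boot=1000, seed=0):
    """Two-tailed percentile 95% CI for a c-index, or for the difference
    c(pred_a) - c(pred_b) when pred_b is given."""
    rng = random.Random(seed)
    n = len(times)
    stats = []
    for _ in range(10 * n_boot):  # cap on attempts in case resamples fail
        if len(stats) == n_boot:
            break
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        t = [times[k] for k in idx]
        e = [events[k] for k in idx]
        try:
            stat = c_index(t, e, [pred_a[k] for k in idx])
            if pred_b is not None:
                stat -= c_index(t, e, [pred_b[k] for k in idx])
        except ValueError:
            continue  # resample without usable pairs: draw again
        stats.append(stat)
    if not stats:
        raise ValueError("no valid bootstrap replicates")
    stats.sort()
    lower = stats[int(round(0.025 * (len(stats) - 1)))]
    upper = stats[int(round(0.975 * (len(stats) - 1)))]
    return lower, upper


# Simulated cohort, for illustration only.
demo = random.Random(1)
risk = [demo.random() for _ in range(60)]
times = [demo.expovariate(0.5 + 2.0 * r) for r in risk]
events = [int(demo.random() < 0.7) for _ in risk]
model_a = [1.0 - r for r in risk]          # fine-grained predictions
model_b = [round(p, 1) for p in model_a]   # coarser version of the same score

ci_a = bootstrap_ci(times, events, model_a, n_boot=200)
ci_diff = bootstrap_ci(times, events, model_a, model_b, n_boot=200)
print("95% CI, model A:", ci_a)
print("95% CI, A minus B:", ci_diff)
```

Resampling patients once per replicate and evaluating both models on that same resample keeps the comparison paired, which is what makes the interval for the difference narrower than the two separate intervals would suggest.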

Statistical analyses were performed using SAS version 8.1 (SAS Institute, Inc., Cary, NC) and S-Plus 6 (Insightful Corporation, Seattle, WA) software packages.



Baseline characteristics by center are reported in Table 1. Patient recruitment was limited to more recent years in two centers (i.e., Centers 3 and 6). The overall median age was 62 years and the middle 50% of patients' ages ranged from 53 to 70 years. Male gender was predominant. In Centers 5 and 6, only a few patients had systemic symptoms at the time of presentation. Although radical nephrectomy was the prevailing treatment at all centers, the rate of partial nephrectomy ranged from 2% to 28% among centers. Good PS (ECOG PS = 0) was reported for 75% of subjects, varying from 42% of those in Center 1 to 94% of those in Center 5. Large tumors (> 7 cm) were found in 40% of RCCs in 2 centers, but only in 20% of RCC cases in the remaining 4 centers. Fuhrman grade distribution was highly heterogeneous among the centers.

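Table 1. Baseline Characteristics^a

| Variable | Whole data set (n = 2404) | Center 1 (n = 454) | Center 2 (n = 125) | Center 3 (n = 236) | Center 4 (n = 379) | Center 5 (n = 565) | Center 6 (n = 645) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Date of surgery | 1984–2002 | 1984–2002 | 1986–2002 | 1991–2002 | 1984–2002 | 1984–2001 | 1994–2000 |
| Age in yrs | | | | | | | |
| ECOG performance status | | | | | | | |
| = 0 | 75.2 | 42.3 | 63.2 | 56.8 | 77.0 | 93.8 | 90.1 |
| > 0 | 24.8 | 57.7 | 36.8 | 43. | | | |
| Nephrectomy type | | | | | | | |
| Clear cell | 86.9 | 92.3 | 96.0 | 85.6 | 85.2 | 86.4 | 83.4 |
| TNM 1997 classification | | | | | | | |
| Tumor size (cm) | | | | | | | |
| Fuhrman's grade | | | | | | | |

ECOG: Eastern Cooperative Oncology Group.

^a Table entries are percentages of the sample, except for date of surgery and age.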

The distribution of risk categories among centers according to prognostic models is reported in Table 2. Because the Kattan nomogram estimates individual probabilities of 5-year RFS, we arbitrarily derived 5 classes of increasing risk for descriptive purposes only. Consistent with baseline characteristics, a higher prevalence of patients with good prognosis was found in Centers 5 and 6, whereas Center 1 had a higher number of high-risk subjects.
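The descriptive grouping of Kattan-predicted probabilities into five classes can be sketched as below. The class limits come from the text; which class a boundary value such as 0.7 belongs to is not stated, so the lower-inclusive rule used here is an assumption, as are the function names and the example values.

```python
from bisect import bisect_right
from collections import Counter

# Class limits as stated in the text; boundary handling is an assumption.
CUTS = [0.6, 0.7, 0.8, 0.9]
LABELS = ["< 0.6", "0.6–0.7", "0.7–0.8", "0.8–0.9", "0.9–1.0"]


def kattan_class(p):
    """Map a predicted 5-year RFS probability to its descriptive class."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # bisect_right puts a boundary value in the upper class (lower-inclusive).
    return LABELS[bisect_right(CUTS, p)]


def class_percentages(probs):
    """Percentage of patients in each class, in the order of LABELS."""
    counts = Counter(kattan_class(p) for p in probs)
    return {label: 100.0 * counts[label] / len(probs) for label in LABELS}


# Hypothetical predicted probabilities, for illustration only.
example = [0.95, 0.92, 0.85, 0.50]
print(class_percentages(example))
```

Because the classes are descriptive only, the c-index analyses still use the individual predicted probabilities rather than these groups.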

Table 2. Distribution of Risk Categories According to Prognostic Model and Center^a

| Variable | Whole data set (n = 2404) | Center 1 (n = 454) | Center 2 (n = 125) | Center 3 (n = 236) | Center 4 (n = 379) | Center 5 (n = 565) | Center 6 (n = 645) |
| --- | --- | --- | --- | --- | --- | --- | --- |

UISS: University of California at Los Angeles integrated staging system.

^a Table entries are percentage values.

^b Arbitrary classes of 5-year predicted disease recurrence-free survival.


At the time of analysis, 541 subjects (22.5%) of the entire cohort had died, with an estimated 5-year survival rate of 80% (Table 3). In 360 cases (15%), clinicians attributed death to cancer progression. Cancer recurrence was recorded in 152 of 815 patients (18.7%) in the centers in which this information was available.

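Table 3. Follow-Up Characteristics

| Variable | Whole data set (n = 2404) | Center 1 (n = 454) | Center 2 (n = 125) | Center 3 (n = 236) | Center 4 (n = 379) | Center 5 (n = 565) | Center 6 (n = 645) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Follow-up status | | | | | | | |
| Overall dead | 541 | 127 | 21 | 23 | 67 | 135 | 168 |
| Cancer-specific dead | 360 | 91 | 17 | 19 | 32 | 85 | 116 |
| Disease recurrence | 152 | 110 | 20 | 22 | NA | NA | NA |
| 3-Yr Kaplan–Meier survival probability | | | | | | | |
| Cancer-specific dead | 0.907 | 0.855 | 0.892 | 0.926 | 0.954 | 0.922 | 0.900 |
| Disease recurrence free | 0.844 | 0.795 | 0.873 | 0.925 | NA | NA | NA |
| 5-Yr Kaplan–Meier survival probability | | | | | | | |
| Cancer-specific dead | 0.846 | 0.778 | 0.877 | 0.889 | 0.908 | 0.879 | 0.808 |
| Disease recurrence free | 0.783 | 0.732 | 0.814 | 0.868 | NA | NA | NA |

NA: not available.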

Observed OS curves of the 4 prognostic models are depicted in Figure 2. As expected, all four models discriminated well and log-rank tests were all highly significant. Similar results were achieved for CSS and RFS (data not shown).


Figure 2. Kaplan–Meier overall survival curves for the four prognostic scores. P < 0.0001 for all models.


The discriminating accuracy of the 4 models is reported in Table 4. The c-indexes and 95% bootstrap CIs were calculated for OS, CSS, and RFS. Pairwise differences between models are also reported. For all four models, discriminating accuracy was lower for OS than for CSS and RFS. The Kattan model was consistently the most accurate, although the UISS model performed only slightly less well, with plausible differences from the Kattan model of less than 7%. Conversely, both preoperative models were less accurate in predicting the prognosis of patients with RCC who underwent nephrectomy, without substantial differences between them.

Table 4. Comparison of Prognostic Models Discrimination^a

| Variable | Overall survival (95% CI) | Cancer-specific survival (95% CI) | Disease recurrence-free survival (95% CI)^b |
| --- | --- | --- | --- |
| Kattan | 0.706 (0.681–0.731) | 0.771 (0.745–0.795) | 0.807 (0.777–0.835) |
| UISS | 0.683 (0.661–0.705) | 0.733 (0.709–0.757) | 0.782 (0.752–0.812) |
| Yaycioglu | 0.589 (0.566–0.611) | 0.629 (0.601–0.655) | 0.651 (0.609–0.691) |
| Cindolo | 0.615 (0.592–0.636) | 0.648 (0.620–0.673) | 0.672 (0.640–0.704) |
| Differences between models | | | |
| Kattan–UISS | 0.023 (0.004–0.041) | 0.039 (0.016–0.059) | 0.025 (−0.004 to 0.054) |
| Kattan–Yaycioglu | 0.117 (0.094–0.141) | 0.143 (0.118–0.166) | 0.156 (0.120–0.190) |
| Kattan–Cindolo | 0.091 (0.066–0.115) | 0.124 (0.099–0.150) | 0.134 (0.095–0.168) |
| UISS–Yaycioglu | 0.094 (0.071–0.116) | 0.104 (0.075–0.132) | 0.131 (0.090–0.173) |
| UISS–Cindolo | 0.068 (0.047–0.088) | 0.085 (0.060–0.110) | 0.110 (0.076–0.141) |
| Cindolo–Yaycioglu | 0.026 (0.004–0.048) | 0.019 (−0.007 to 0.042) | 0.021 (−0.019 to 0.061) |

95% CI: 95% confidence interval; UISS: University of California at Los Angeles integrated staging system.

^a c-index with bootstrap confidence intervals.

^b Centers 1, 2, and 3 only.
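The figures in Table 4 can be read with a short Python sketch that recomputes each pairwise difference from the reported c-indexes and flags the comparisons whose bootstrap interval includes zero. The numbers are transcribed from the table; the variable names, the 0.0015 rounding tolerance, and the reading of "interval includes zero" as "no clear difference" are assumptions made for illustration.

```python
ENDPOINTS = ("OS", "CSS", "RFS")

# c-indexes from Table 4, in the order OS, CSS, RFS.
C_INDEX = {
    "Kattan": (0.706, 0.771, 0.807),
    "UISS": (0.683, 0.733, 0.782),
    "Yaycioglu": (0.589, 0.629, 0.651),
    "Cindolo": (0.615, 0.648, 0.672),
}

# Reported differences as (estimate, lower 95% limit, upper 95% limit).
DIFFS = {
    ("Kattan", "UISS"): [(0.023, 0.004, 0.041), (0.039, 0.016, 0.059), (0.025, -0.004, 0.054)],
    ("Kattan", "Yaycioglu"): [(0.117, 0.094, 0.141), (0.143, 0.118, 0.166), (0.156, 0.120, 0.190)],
    ("Kattan", "Cindolo"): [(0.091, 0.066, 0.115), (0.124, 0.099, 0.150), (0.134, 0.095, 0.168)],
    ("UISS", "Yaycioglu"): [(0.094, 0.071, 0.116), (0.104, 0.075, 0.132), (0.131, 0.090, 0.173)],
    ("UISS", "Cindolo"): [(0.068, 0.047, 0.088), (0.085, 0.060, 0.110), (0.110, 0.076, 0.141)],
    ("Cindolo", "Yaycioglu"): [(0.026, 0.004, 0.048), (0.019, -0.007, 0.042), (0.021, -0.019, 0.061)],
}

TOLERANCE = 0.0015  # allows for rounding of the published three-decimal values

inconsistent = []   # reported difference disagrees with the c-indexes
includes_zero = []  # bootstrap interval does not exclude zero

for (a, b), rows in DIFFS.items():
    for endpoint, ca, cb, (est, lower, upper) in zip(ENDPOINTS, C_INDEX[a], C_INDEX[b], rows):
        if abs((ca - cb) - est) > TOLERANCE:
            inconsistent.append((a, b, endpoint))
        if lower <= 0.0 <= upper:
            includes_zero.append((a, b, endpoint))

print("Inconsistent rows:", inconsistent)
print("Intervals including zero:", includes_zero)
```

On these figures the Kattan–UISS difference is distinguishable from zero for OS and CSS but not for RFS, where only three centers contributed data.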

Results were confirmed when the c-indexes of the four models were estimated separately by centers, although with a high heterogeneity (Table 5).

Table 5. Comparison of Prognostic Models Discrimination by Center^a

| Models | Whole data set (n = 2404) | Center 1 (n = 454) | Center 2 (n = 125) | Center 3 (n = 236) | Center 4 (n = 379) | Center 5 (n = 565) | Center 6 (n = 645) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Overall survival | | | | | | | |
| Cancer-specific survival | | | | | | | |
| Disease recurrence-free survival^b | | | | | | | |

UISS: University of California at Los Angeles integrated staging system; NA: not available.

^b Centers 1, 2, and 3 only.

Based on these results, a previously unplanned descriptive analysis of the relative contributions of the UISS and Kattan scores was performed. The observed Kaplan–Meier OS curves of the Kattan classes separately for UISS risk categories are reported in Figure 3. Although the Kattan predictions added discrimination throughout, they added substantially in the UISS intermediate-risk category, in which the 5-year OS was 0.745 (log-rank test P = 0.034, < 0.0001, and 0.044 for low, intermediate, and high-risk, respectively).


Figure 3. Kaplan-Meier overall survival curves of Kattan-predicted probabilities according to the risk categories of the University of California at Los Angeles integrated staging system.


Assuming that the additional discriminating ability of the Kattan model might be attributed to prognostic information excluded from the UISS model, we explored the distributions of symptoms, histology, and tumor size among Kattan categories within UISS categories (Table 6). Tumor size appeared to be the variable mainly associated with prognosis, whereas clear cell histology had only limited prognostic value. Symptoms were uncommon in subjects with good prognosis but became more clearly associated with prognosis in the intermediate-risk and high-risk subjects.

Table 6. Distribution of Variables Included in the Kattan Prognostic Model and Excluded in the UISS Prognostic Model

Variable | Kattan-predicted probabilities

Systemic symptoms (%)^a | 1 (0.1) | 5 (10.0) | 2 (100)
Clear cell histology (%)^a | 913 (86.1) | 50 (100) | 2 (100)
Median tumor size in cm | 3.8 | 7.0 | 7.0

Systemic symptoms (%)^a | 9 (2.7) | 38 (8.1) | 22 (15.0) | 22 (40.7) | 19 (52.8)
Clear cell histology (%)^a | 200 (60.6) | 447 (95.7) | 146 (99.3) | 54 (100) | 36 (100)
Median tumor size in cm | 4.

Systemic symptoms (%)^a | 1 (10.0) | 1 (2.0) | 9 (10.5) | 27 (51.9) | 45 (75.0)
Clear cell histology (%)^a | 1 (10.0) | 48 (96.0) | 83 (96.5) | 51 (98.1) | 59 (98.3)
Median tumor size in cm | 5.

UISS: University of California at Los Angeles integrated staging system.

^a Absolute numbers and percentages.



To our knowledge, this is the first article to attempt to compare the discriminating accuracy of prognostic models for nonmetastatic RCC. A total of 2404 patients who underwent nephrectomy from 6 centers scattered throughout Europe with heterogeneous baseline characteristics and data management were retrospectively assessed for analysis. This is both the strength and the main limitation of our study.

External validation of prognostic models suggests their applicability in various clinical environments, beyond factors such as demographic characteristics, differences in tumor presentation, variations of symptom classification, and variability in the pathologists' and surgeons' abilities that might affect the performance of the prognostic algorithm. With regard to this issue, the study offered a number of heterogeneous frameworks to evaluate transportability and consistency of predictions of the proposed tools. Central review by a single set of pathologists ideally increases validity by minimizing interobserver variability24 and could be desirable in a research setting (e.g., clinical trials), but it is less relevant in clinical settings in which variability is common. When looking for generalizability, the heterogeneity of baseline characteristics, rather than homogeneity, is desirable. Heterogeneity accounts for the entire spectrum of disease presentations and is more representative of the community.

The period covered by the current study largely overlapped the recruitment periods of the primary studies, so no bias due to time trends in diagnostic or clinical management should be expected.25 Finally, pooling information from different countries should increase the geographic transportability of the observed results. Differences in the methods of patient selection and data collection between primary studies and clinical practice are more likely to be detected when multiple independent centers are involved. The more heterogeneous the settings in which a system is tested and found accurate, the more likely it is to generalize to an untested setting.25 Consistency of results among centers is therefore an argument in favor of the validity of the conclusions.

The retrospective design is the main drawback of the current study insofar as it may entail loss of information and poorer data quality. Lack of information on the presence/absence of necrosis and cellular grading precluded us from evaluating the SSIGN score.15 Possible differences in follow-up assessment among centers could affect RFS and, to a lesser extent, OS. Furthermore, heterogeneity of diagnostic assessment could affect CSS. Disease-specific death largely depends on subjective assessment and is often not substantiated.26, 27 The usefulness of RFS was further limited by lack of information in three of six centers. To limit potential biases, we used OS as the primary endpoint, although a survival selection bias could still operate in early recruited patients.

All models confirmed their ability to discriminate among categories with different prognoses. However, the main question to be answered was whether there is a best model for clinical practice. Only discrimination, and not calibration accuracy, was compared because different endpoints were measured when the prognostic models were originally developed.

Overall results suggest that, perhaps unsurprisingly, postoperative models involving both clinical and pathologic variables (e.g., symptoms or ECOG PS) discriminate substantially better than preoperative models, although there appears to be room for improvement. The Cindolo model was previously developed using a smaller sample of patients taken from the current series.16 As a consequence, some overfitting could be expected. However, as shown in Table 5, this was not the case and no c-index differences were found among centers.

Overall, the Kattan model worked best, but its advantage over the UISS model was slight and well below the heterogeneity among centers, suggesting that the two models are most likely equivalent for practical purposes. However, an additional exploratory analysis showed that the greatest advantage is attained in the heterogeneous UISS intermediate-risk category, thanks to the substantial prognostic information added by tumor size and, to some extent, by clinical presentation.

Surprisingly, the difference in discriminating accuracy between the Kattan and the UISS models did not increase from OS to RFS in the three centers in which both were available. It is noteworthy that the Kattan model was originally developed with RFS as the endpoint, whereas the UISS score was originally derived according to OS.

Similar results were found for the two preoperative models of Yaycioglu et al. and Cindolo et al. These models have a lower discriminating accuracy, but they may still be useful in specific situations, such as preoperative patient counseling or morcellation after laparoscopic nephrectomy.

Relative performances of the different models were consistent among centers, strengthening our conclusions and suggesting that no differential bias affected the results.

Surprisingly, after approximately two decades of basic research, only clinical and pathologic variables are still retained in modern prognostic equations. Postoperative models appear to be the best indicators of survival. Nevertheless, more powerful and accurate systems need to be developed and validated. It is expected that the combination of the usual prognostic variables (such as stage, grade, PS, histology, tumor size) with new molecular targets28, 29 will be the next step in the search for a better integrated prognostic system.



The authors thank E. Caccese, a professional translator, who revised the entire article.

