Clinically significant Gleason sum upgrade†
External validation and head-to-head comparison of the existing nomograms
Article first published online: 5 JUL 2011
Copyright © 2011 American Cancer Society
Volume 118, Issue 2, pages 378–385, 15 January 2012
How to Cite
Iremashvili, V., Manoharan, M., Pelaez, L., Rosenberg, D. L. and Soloway, M. S. (2012), Clinically significant Gleason sum upgrade. Cancer, 118: 378–385. doi: 10.1002/cncr.26306
We thank the Center for Urologic Research, Education, and Diseases (CURED) and Mr. Vincent Rodriguez.
- Issue published online: 5 JAN 2012
- Manuscript Accepted: 9 MAY 2011
- Manuscript Revised: 4 MAY 2011
- Manuscript Received: 9 MAR 2011
- prostate biopsy;
- prostatic neoplasms;
Several nomograms have been developed for the purpose of predicting the likelihood of an increase in Gleason sum (GS) from biopsy information compared with the GS determined after examination of the “entire prostate” in patients with prostate cancer. In this study, the authors evaluated and compared the ability of 4 nomograms (published by Capitanio et al, Chun et al, Kulkarni et al, and Moussa et al) to predict GS upgrades for patients with biopsy GS ≤6 prostate cancer who underwent radical prostatectomy (RP) at their center.
The entire study cohort included 942 patients with a biopsy GS ≤6. Predictive performances of the nomograms were compared using area under the receiver operating characteristic curve (AUC-ROC) analysis, calibration plots, and decision curve analysis (DCA) in the entire cohort, in patients with low-risk prostate cancer (LRPC), and a subgroup of those patients who underwent extended biopsy with ≥10 cores.
Patients with a GS ≥7 at prostatectomy included 319 of 942 patients (33.9%) in the entire study cohort, 263 of 814 patients (32.3%) with LRPC, and 84 of 301 patients (27.9%) with LRPC who underwent extended biopsy. With an AUC-ROC of 0.637 to 0.647 in the different subgroups of patients with low-risk cancer, the Kulkarni et al nomogram demonstrated significantly higher discriminative ability compared with the other nomograms. The same nomogram provided a small clinical benefit at DCA. All nomograms were poorly calibrated.
The available prognostic tools had limited ability to predict clinically significant upgrading in patients with biopsy GS ≤6 and, thus, the authors concluded that these tools are not ready for clinical application. Cancer 2012;118:378-385. © 2011 American Cancer Society.
The biopsy Gleason sum (GS) is 1 of the most important determinants for accurately assessing risks and making informed choices regarding treatment options in patients with prostate cancer. However, it has been well documented that the biopsy GS is prone to error, because it is based on the examination of a small portion of the prostate.1, 2 Consequently, it is not surprising that the GS obtained after an examination of the “entire gland” often is higher than the GS determined at biopsy.
The possibility of GS undergrading at biopsy is particularly important for patients with GS ≤6 low-risk disease who are eligible for less aggressive treatments, such as active surveillance and brachytherapy. Failure of the biopsy to detect high-grade (Gleason grade 4/5) cancer leads to an underestimation of the risks and may negatively affect treatment outcome.3, 4 Several factors have been associated with an increased risk of missing high-grade cancer at biopsy, including prostate size, preoperative prostate-specific antigen (PSA) level, the number of biopsy cores taken, and the prostate cancer volume at biopsy.5-7 These variables can be combined into a multivariable statistical model presented in the form of a nomogram. A nomogram allows for the derivation of an individualized estimate of the risk of GS undergrading from the biopsy. Over the last few years, several nomograms predicting biopsy GS upgrades have been described8-11; however, to our knowledge, they have not been validated in external samples.
The objective of the current study was to evaluate and compare the ability of 4 nomograms to predict GS upgrade for patients with biopsy GS ≤6 prostate cancer who underwent radical prostatectomy at our center. Performance of the nomograms also was tested in subsets of patients with low-risk prostate cancer and those who underwent extended (≥10 cores) prostate biopsy. The subsets were analyzed, because these patients often are candidates for less aggressive methods of treatment; therefore, missing high-grade cancer would have adverse implications for disease progression and survival.
MATERIALS AND METHODS
This study was approved by the Institutional Review Board (IRB) (IRB no. 20020391). From December 1991 to December 2010, 2251 consecutive patients underwent radical prostatectomy using open (n = 2191) and robot-assisted laparoscopic (n = 60) techniques. Data were collected in the prostatectomy database. In total, we noted that 1264 patients had a biopsy GS ≤6. Of these, 322 patients were excluded from the current study because of incomplete prostate sampling (<6 cores; n = 36), the receipt of neoadjuvant therapy (n = 97), or missing data (n = 189). After this process was completed, the study cohort included 942 patients.
The pretreatment serum PSA level was measured before digital rectal or transrectal ultrasound (TRUS) examination. Indications for biopsy included an increased PSA level and/or an abnormal digital rectal examination. Prostate biopsies were performed under TRUS guidance. All outside biopsy slides were reviewed by an institutional genitourinary pathologist. Clinical stage was assigned by the attending urologist. The final analysis of this characteristic was based on the 2002 tumor-lymph node-metastasis (TNM) classification system.
The protocol for processing surgical specimens was consistent over the years of the study. The prostate was step-sectioned with 3-mm to 5-mm intervals. All sections were embedded for analysis and examined as quarter mounts. Tumor areas were traced and outlined on the slides. Primary and secondary Gleason grade was documented for each patient.
Characteristics of the analyzed nomograms are listed in Table 1. All patients had complete information regarding all preoperative clinical and pathologic variables that were used in the nomograms. The only exception was prostate volume, which was not available for most (889 of 942 patients; 94.4%) of the cohort. Instead, we used pathologic prostate weight as a surrogate for preoperative TRUS volume. Several studies have demonstrated a strong correlation between prostate weight and TRUS-based volume.12, 13 The percentage probability of GS upgrading was recorded for each patient individually using published versions of the nomograms.8-11
Table 1. Characteristics of the Analyzed Nomograms
|Reference||Country||Type of GS Upgrade||No. of Patients||Percentage of Extended Biopsies [a]||Variables Included in the Nomogram||Predictive Accuracy (AUC) in the Original Cohort|
|Chun 2006 (ref. 8)||Italy, Germany, United States||From ≤6 to ≥7 or from 7 to ≥8||4789||35.1||PSA, clinical stage, and biopsy GS||0.757|
|Kulkarni 2007 (ref. 9)||Canada||From ≤6 to ≥7 in patients with low-risk disease [b]||175||13||Age, PSA, pathology expertise, digital rectal examination results, presence of PIN, prostate volume, TRUS results, type of biopsy (sextant vs extended), and cancer volume (%)||0.710|
|Capitanio 2009 (ref. 11)||Italy||From ≤6 to ≥7 in patients with low-risk disease [b]||301||100||GS, no. of cores taken at biopsy, no. of positive cores||0.661|
|Moussa 2009 (ref. 10)||United States||From ≤6 to ≥7 or from 7 (3+4) to ≥7 (4+3)||1017||Not shown [c]||Age, race, digital rectal examination results, prostate volume, clinical stage, number of previous biopsies, PSA, no. of cores, no. of positive cores, maximum percent of cancer, secondary Gleason grade, perineural invasion, inflammation, high-grade PIN, atypia||0.680|
To calculate the predictive ability (ie, discriminative power) of the nomograms, receiver operating characteristic (ROC) curves were generated. The areas under the ROC curves (AUC) were calculated and compared using the method described by DeLong et al.14 The AUC can be interpreted with respect to a randomly selected pair of patients with biopsy GS ≤6, one whose GS was upgraded at prostatectomy and one whose GS was unchanged: it represents the probability that the nomogram assigned a higher predicted probability of upgrade to the patient with the upgrade than to the patient with the unchanged score. This parameter may range from 0.5 (no discrimination, equivalent to the toss of a coin) to 1.0 (perfect discrimination).
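The pairwise interpretation of the AUC can be illustrated with a minimal Python sketch. It is not the Stata procedure used in the study, and it does not implement the DeLong comparison; the function name `concordance_auc` and the six example patients are hypothetical and serve only to show the calculation.

```python
from itertools import product


def concordance_auc(predicted, upgraded):
    """AUC as the fraction of (upgraded, not upgraded) patient pairs in which
    the upgraded patient received the higher predicted probability.

    Ties in predicted probability count as half a concordant pair.
    """
    pos = [p for p, y in zip(predicted, upgraded) if y]
    neg = [p for p, y in zip(predicted, upgraded) if not y]
    if not pos or not neg:
        raise ValueError("need at least one patient in each outcome group")
    score = 0.0
    for p_upgraded, p_unchanged in product(pos, neg):
        if p_upgraded > p_unchanged:
            score += 1.0
        elif p_upgraded == p_unchanged:
            score += 0.5
    return score / (len(pos) * len(neg))


# Hypothetical nomogram outputs and outcomes (illustrative only, not study data)
predicted = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60]
upgraded = [0, 0, 1, 0, 1, 1]

auc = concordance_auc(predicted, upgraded)
print(f"AUC = {auc:.3f}")
```

In this toy example, 8 of the 9 possible pairs are ranked correctly, so the AUC is 8/9. A nomogram that assigned the same probability to every patient would score 0.5.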
The agreement between observed outcome frequencies and predicted probabilities was studied graphically by constructing calibration plots in which the x-axis represented the probability of GS upgrade predicted by the nomograms, and the y-axis represented the actual frequency of high-grade disease in prostatectomy specimens. A separate graph was generated for each of the nomograms using the LOWESS smoother function in the Stata statistical software package (version 11.0; Stata Corp., College Station, Tex).
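The idea behind a calibration plot can be sketched in Python with a simpler stand-in for the LOWESS smoother: patients are grouped into equal-width bins of predicted probability, and the mean prediction in each bin is compared with the observed upgrade frequency. The binning approach, the function name `calibration_table`, and the example values are assumptions made for illustration and are not the method or data of the study.

```python
def calibration_table(predicted, upgraded, n_bins=5):
    """Group patients into equal-width bins of predicted probability and
    return (mean predicted, observed frequency, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, upgraded):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    rows = []
    for members in bins:
        if members:
            n = len(members)
            mean_pred = sum(p for p, _ in members) / n
            observed = sum(y for _, y in members) / n
            rows.append((mean_pred, observed, n))
    return rows


# Hypothetical predictions and outcomes (illustrative only, not study data)
predicted = [0.05, 0.15, 0.25, 0.30, 0.45, 0.50, 0.70, 0.90]
upgraded = [0, 0, 0, 1, 0, 1, 1, 1]

rows = calibration_table(predicted, upgraded)
for mean_pred, observed, n in rows:
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n = {n}")
```

For a well-calibrated model, the observed frequency in each bin is close to the mean predicted probability, so the points lie near the 45-degree line. Bins in which the prediction exceeds the observed frequency indicate overestimation of risk.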
To evaluate the potential effect of the nomograms on clinical management, we implemented decision curve analysis (DCA) as described by Vickers and Elkin.15 Decision curves are constructed by plotting “net benefit” against threshold probability. In our study, this analysis estimates the magnitude of benefit resulting from altering clinical management in patients with different threshold probabilities of GS undergrading at biopsy.
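The net benefit plotted in a decision curve can be sketched as follows. At a threshold probability p_t, net benefit is the proportion of true positives minus the proportion of false positives weighted by the odds p_t / (1 - p_t). The function names and the example patients below are hypothetical, and treating a prediction at or above the threshold as positive is an assumption of this sketch.

```python
def net_benefit(predicted, upgraded, threshold):
    """Net benefit of acting on patients whose predicted probability of
    upgrade is at or above the threshold probability."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(predicted)
    tp = sum(1 for p, y in zip(predicted, upgraded) if p >= threshold and y)
    fp = sum(1 for p, y in zip(predicted, upgraded) if p >= threshold and not y)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(upgraded, threshold):
    """Net benefit of the reference strategy that treats every patient as
    if upgrading were present."""
    prevalence = sum(upgraded) / len(upgraded)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical predictions and outcomes (illustrative only, not study data)
predicted = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80]
upgraded = [0, 0, 1, 0, 0, 1, 1, 1]

nb_model = net_benefit(predicted, upgraded, threshold=0.25)
nb_all = net_benefit_treat_all(upgraded, threshold=0.25)
print(f"model: {nb_model:.3f}  treat all: {nb_all:.3f}")
```

A model is useful at a given threshold only if its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is 0 by definition. Repeating the calculation over a range of thresholds gives the decision curve.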
Two additional series of analyses were performed in the group of patients who had low-risk prostate cancer. These patients were identified using the classification proposed by D'Amico et al,3 which included clinical stage T1c or T2a disease and a PSA level <10 ng/mL. The subgroup of patients with low-risk prostate cancer who underwent extended biopsy was analyzed separately, because the probability of GS undergrading potentially may have a significant, negative impact on clinical management. Because the Kulkarni et al nomogram was based on a group of patients with low-risk cancer (Table 1), it was studied only in the 2 corresponding low-risk subgroups of our cohort. Furthermore, the predictive performance of the Capitanio et al nomogram was analyzed only in patients who underwent extended biopsy, because only these patients were included in the original cohort.
All statistical tests were performed using Stata software (version 11.0; Stata Corp). All tests were 2-sided, and the level of significance was set at .05.
RESULTS
Table 2 displays the clinical and pathologic characteristics of the study cohort and the low-risk subgroups. Overall, 25.4% of patients were aged >65 years, and 9.9% had PSA levels >10 ng/mL. The median numbers of total and positive biopsy cores in the general group were 8 and 2, respectively. The proportion of patients who had a GS ≥7 at prostatectomy was 33.9% (319 of 942 patients) in the study cohort, 32.3% (263 of 814 patients) in the group with low-risk disease, and 27.9% (84 of 301 patients) in the subgroup with low-risk disease who underwent extended biopsy.
Table 2. Clinical and Pathologic Characteristics of the Study Cohort and the Low-Risk Subgroups
|No. of Patients (%)|
|Characteristic||All Patients||Patients With Low-Risk Disease||Patients With Low-Risk Disease and Extended Biopsy|
|No. of patients||942||814||301|
|Median age at surgery [IQR], y||60.5 [54.8-65.2]||60.3 [54.6-64.9]||59.8 [56.2-64.1]|
|Median preoperative PSA [IQR], ng/mL||5.5 [4.3-7.5]||5.2 [4.3-6.7]||4.9 [3.9-6.4]|
|No. of cores sampled at biopsy|
|<10||589 (62.5)||513 (63)||0 (0)|
|≥10||353 (37.5)||301 (37)||301 (100)|
|No. of cores positive for cancer|
|1||412 (43.7)||358 (44)||116 (38.5)|
|2||231 (24.5)||203 (24.9)||68 (22.6)|
|3||140 (14.9)||120 (14.7)||42 (14)|
|>3||159 (16.9)||133 (16.3)||75 (24.9)|
|Median cancer volume as percentage of total tissue [IQR]||2.7 [1.0-6.7]||2.8 [1.0-6.7]||2.0 [0.8-5.8]|
|Clinical tumor classification|
|T1||645 (68.5)||568 (69.8)||221 (73.4)|
|T2||273 (29)||246 (30.2)||80 (26.6)|
|T3/T4||24 (2.5)||0 (0)||0 (0)|
|Pathologic tumor classification|
|T2||830 (88.1)||728 (89.4)||284 (94.4)|
|T3/T4||112 (11.9)||86 (10.6)||17 (5.6)|
|Pathologic primary Gleason grade|
|1-3||888 (94.3)||774 (95.1)||290 (96.3)|
|4||46 (4.9)||35 (4.3)||11 (3.7)|
|5||8 (0.8)||5 (0.6)||0 (0)|
|Pathologic secondary Gleason grade|
|1-3||669 (71)||584 (71.7)||228 (75.7)|
|4||260 (27.6)||223 (27.4)||70 (23.3)|
|5||13 (1.4)||7 (0.9)||3 (1)|
|Pathologic Gleason sum|
|2-6||623 (66.1)||551 (67.7)||217 (72.1)|
|7||295 (31.3)||247 (30.3)||81 (26.9)|
|8||20 (2.1)||14 (1.7)||3 (1)|
|9||4 (0.4)||2 (0.2)||0 (0)|
The discriminative ability of each nomogram is illustrated with ROC curves (Fig. 1) and is summarized as the AUC in Table 3. In the entire cohort, the discriminative properties of both the Chun et al and Moussa et al nomograms were similarly weak. In the low-risk group, the AUC for the Kulkarni et al nomogram was significantly larger than that for the other tools studied, whereas it did not differ significantly from the Moussa et al nomogram in the extended biopsy subgroup. The Moussa et al nomogram statistically significantly outperformed the Chun et al nomogram in all groups. Other comparison pairs demonstrated no statistically significant differences between the curves. In both subgroups of patients with low-risk disease, the AUC for the Chun et al nomogram did not differ significantly from 0.5 (random prediction). In addition, the AUC for the Capitanio et al nomogram in the extended biopsy subgroup was not significantly different from 0.5.
Table 3. Discriminative Ability (AUC) of the Nomograms
|All Patients (n = 942)||Patients With Low-Risk Disease (n = 814)||Patients With Low-Risk Disease Who Underwent Extended Biopsy (n = 301)|
|Nomogram||AUC||SE||95% CI||AUC||SE||95% CI||AUC||SE||95% CI|
Figure 2 presents LOWESS-based calibration plots that examine the relation between predicted and observed rates of GS upgrade in different groups of patients. All models differed substantially from ideal predictions. The risk values estimated by the Kulkarni et al nomogram demonstrated a relatively higher correlation with the actual frequency of the upgrade. However, this model tended to overestimate the probability of upgrading as the actual frequency increased. The magnitude of departure from ideal predictions increased from <10% at an actual probability of 20% to >20% in patients who had an actual probability of 40%.
The DCA (Fig. 3) indicated that the Kulkarni et al nomogram was the only one that provided a moderate net benefit. For example, net benefits in the low-risk subgroup were 1.6% and 7% for threshold probabilities of 25% and 35%, respectively. Similar net benefit values were observed in the other subgroup.
DISCUSSION
Accurate and preferably objective estimates of the likelihood of events affecting treatment success are essential for patient counseling and the decision-making process. Although these estimates traditionally relied on physician judgment, more accurate and individualized predictions can be made using prognostic tools based on statistical models.16 Several prognostic factors modeled as continuous variables to predict a particular endpoint may be represented graphically with a nomogram. By plotting certain characteristics of patients on continuous scales, an individual prediction can be calculated. It has been demonstrated that nomograms are the most accurate and discriminative tools for predicting outcomes in patients with prostate cancer.17
Although nomograms often are used as objective tools, each is based on the particular population of patients from which it was developed. Accordingly, using a nomogram on a different patient population may negatively affect its predictive power. External validation represents the gold standard for assessing the applicability of predictive tools by studying predictive accuracy, validity, and other performance characteristics in independent data sets.17 If several models are applied to the same group of patients, then performance may be compared in a head-to-head fashion.
In our study, we performed external validation and comparison of 4 nomograms for predicting the risk of pathologic upgrading in a cohort of patients who had prostate cancer with a biopsy GS ≤6. Analysis of the entire cohort indicated that the ability of the nomograms to discriminate between patients with and without upgrades was limited. The models produced an AUC-ROC in the range of 0.5 to 0.6, which demonstrates low discriminatory power.18
Missing high-grade cancer at biopsy carries particular clinical significance in patients with low-risk prostate cancer, who are candidates for active surveillance19 or brachytherapy. Thus, we performed a separate analysis in the corresponding subgroup in our cohort. The performance of the nomograms in this subset of patients did not differ significantly from the performance in the entire cohort (Table 3). Calibration plots illustrated poor performances of the models across the entire range of predicted values. The Kulkarni et al nomogram had some correlation with the ideal prediction but tended to significantly overestimate the risk of upgrade for predicted probabilities >20%. The same nomogram was the only tool that provided a marginal net benefit in DCA.
Extended prostate biopsies consisting of at least 10 cores have emerged as the current standard of care.20 Previous studies and our current analysis indicate that extended biopsy schemes are associated with a decreased risk of clinically significant GS upgrading. Therefore, it is important to evaluate the performance of nomograms in patients who undergo extended biopsy, because they most closely represent the contemporary population of patients being considered for less aggressive management. In our patients with low-risk disease and extended biopsy, the efficacy of the studied nomograms was similar to that for other cohorts.
It should be noted that the number of cores taken at biopsy is not the only factor that determines the probability of missing high-grade cancers. Certainly, even saturation biopsies are not 100% accurate in their ability to predict the pathologic Gleason score.21 Furthermore, the number of prostate biopsy cores obtained in clinical practice is limited to a certain extent by both medical and nonmedical factors. Therefore, nomograms may be very valuable additions to biopsy results, because they integrate different clinical variables associated with the possibility of an upgrade in Gleason score.
A possible reason for the relative underperformance of the Chun et al and Capitanio et al nomograms, whose AUCs did not differ significantly from 0.5 in the low-risk subgroups, may be the limited number of factors that were included in those models. The nomogram described by Chun et al8 was constructed from a large data set of 4789 patients, but it did not incorporate potentially important predictors of upgrading, such as biopsy characteristics, results, and prostate volume. In contrast, the Capitanio et al nomogram relies heavily on the features of the prostate biopsy and does not include any other clinical or demographic characteristics of the patients.11 Although using small numbers of parameters may make a nomogram more practical, this also may limit its predictive power. To increase their efficacy, future models could integrate more variables, possibly including newer markers, such as prostate cancer antigen 3.22
The clinical backgrounds of the studied nomograms differ significantly. Therefore, we have analyzed their performances in different subgroups of our cohort. Although each of the studied tools had inherent limitations, we believe that it is important to analyze their clinical performances not only to demonstrate deficiencies but also to provide information for further improvements.
The current study cohort included patients with PSA levels >10 ng/mL, and such patients are not considered low-risk according to the widely used D'Amico et al model.3 They were included for 2 main reasons. First, patients with PSA levels >10 ng/mL were included in the cohorts from which 2 of the 4 studied nomograms were constructed. Second, it is known that the value of PSA for distinguishing patients with low-risk/low-volume prostate cancer is limited. For example, the PSA level is not being used as an eligibility criterion for active surveillance and/or as an indication of treatment in many contemporary series,23-25 including our own.19 Therefore, the possibility of a GS upgrade remains clinically important even in patients who have PSA levels >10 ng/mL.
Our study had several limitations. The number of patients included in the analysis could have been larger; however, the number of available patients was limited by the large number of variables required for analysis. Although most biopsies were performed at outside institutions because of the referral nature of our practice, urologic pathology specialists performed centralized histopathologic review in all cases. Although we used prostate weight instead of TRUS-based volume, it has been demonstrated that these 2 variables are closely correlated. In addition, we used the published versions of the nomograms to make our calculations, because the actual regression coefficients from the original cohorts were not available. Although this may have resulted in small errors, it is unlikely that they significantly affected the meaning of the outcomes. Some models also were developed for other types of GS upgrading, for which their predictive performance may be different.
In conclusion, our ability to predict clinically significant GS upgrades remains limited. In our analysis, the best performing nomogram was characterized by low accuracy, poor calibration, and small potential clinical benefit. Clearly, further studies are needed to develop prognostic tools that can accurately estimate the probability of GS undergrading at transrectal prostate biopsy and, thus, will be useful for clinical practice.
No specific funding was disclosed.
CONFLICT OF INTEREST DISCLOSURES
The authors made no disclosures.
- 12. Comparison of transrectal ultrasound prostatic volume estimation with magnetic resonance imaging volume estimation and surgical specimen weight in patients with benign prostatic hyperplasia. J Clin Ultrasound. 1996;24:169-174.