The probability of Gleason score upgrading between biopsy and radical prostatectomy can be accurately predicted

Authors


Pierre I. Karakiewicz, MD, FRCSC, Cancer Prognostics and Health Outcomes Unit, University of Montreal Health Center (CHUM), 1058, rue St-Denis, Montreal, Quebec, H2X 3J4, Canada. Email: pierre.karakiewicz@umontreal.ca

Abstract

The objective of this study was to test the external validity of a previously developed nomogram for the prediction of Gleason score upgrading (GSU) between biopsy and radical prostatectomy (RP). The study population consisted of 973 assessable patients treated with RP at a tertiary care institution. The accuracy of the nomogram was quantified with the area under the receiver operating characteristics curve. The performance characteristics (predicted vs observed rate of GSU) were assessed with a calibration plot. Overall, GSU was recorded in 39.8% (n = 387) of patients at RP. Of patients with GSU, 70 (18.1%), 23 (5.9%) and 32 (8.3%), respectively, had extracapsular extension, seminal vesicle invasion and lymph node invasion. The accuracy of the nomogram was 74.9% (confidence interval 72.1–77.6%). The model tended to underestimate the observed rate of GSU at predicted probabilities below 65% and to overestimate it above 65%; the discordance between the predicted and observed rates of GSU ranged from −7 to +10%. The current tool represents the most accurate of the available methods for predicting GSU between biopsy and RP. Nonetheless, it is not perfect, and its performance characteristics should be known before it is used in clinical decision-making.

Introduction

Previous reports indicate that 30–40% of men will have a more aggressive prostate cancer (PCa) grade at radical prostatectomy (RP) than was diagnosed at biopsy.1–3 Gleason score upgrading (GSU) affects treatment selection. For instance, GSU increases the rate of extracapsular extension (ECE) and should prompt the clinician to consider a treatment modality that can effectively address ECE.4–10 RP and external beam radiotherapy (EBRT) represent adequate choices for patients at high risk of GSU and associated ECE. Conversely, brachytherapy, high-intensity focused ultrasound (HIFU) and active surveillance represent less adequate options.11–14

Three models have been developed to predict the probability of GSU.1,2,9 One was externally validated and yielded a concordance index of 0.5 (50% accuracy), which is equivalent to a coin flip.9 The second model achieved 71% accuracy in internal validation; however, it relied on nine predictor variables, some of which are not routinely recorded in patients' charts.2 The third demonstrated a concordance index of 0.8 (80% accuracy).1 However, this accuracy estimate was not confirmed in an independent external population, which prompted criticism founded on the lack of formal external validation.15,16 Given the importance of GSU in PCa decision-making, we performed a formal external validation of the Chun et al. model1 using a fully independent dataset. Moreover, we examined the rates of ECE, seminal vesicle invasion (SVI) and lymph node invasion (LNI) according to the presence or absence of GSU to confirm the validity of the GSU concept.

Methods

The study population consisted of 973 assessable patients treated with RP at a tertiary care institution. The attending urologist assigned the clinical stage according to the 2002 tumor–node–metastasis system. All biopsies were performed at our institution, and 92% of patients had more than 10 cores taken. All prostatectomy specimens were processed according to the Stanford protocol and graded according to the Gleason system. The RP specimens were assessed for the presence of ECE, which was diagnosed if cancer was evident outside the prostatic capsule while the seminal vesicles and lymph nodes were free of tumor.17 SVI was diagnosed if the tumor invaded the muscular wall of one or both seminal vesicles without evidence of LNI.17 LNI was assigned if one or more pelvic lymph nodes were involved with cancer.17 No patient received neoadjuvant androgen therapy.

The accuracy of the original Chun et al. nomogram1 was quantified with the receiver operating characteristics (ROC) curve. To obtain the nomogram-predicted probability of GSU, we applied the Chun et al. nomogram (Fig. 1) to all 973 patients. ROC methodology was then used to compute the area under the curve (AUC) relative to the observed GSU status. For a model with any discriminatory ability, the AUC lies between 0.5 and 1.0, where 0.5 is no better than a coin flip and 1.0 denotes perfect prediction. Finally, the agreement between the nomogram-predicted and observed rates of GSU was quantified in calibration plots based on the Loess smoother. Within a calibration plot, the 45° line indicates perfect agreement, and the departure from perfect agreement can be quantified at specific predicted values.
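
The AUC described above equals the probability that a randomly chosen patient with GSU receives a higher nomogram-predicted probability than a randomly chosen patient without GSU, with tied predictions counting as one half (the Mann–Whitney formulation). The Python sketch below computes it from predicted probabilities and observed 0/1 outcomes. The function name is hypothetical, the toy values are made up, and this is not the S-Plus code used in the study.

```python
def auc_mann_whitney(predicted, observed):
    """AUC as the share of (upgraded, not upgraded) pairs ranked correctly.

    predicted: nomogram probabilities; observed: 1 if GSU occurred, else 0.
    Tied predictions count as half a concordant pair.
    """
    pos = [p for p, y in zip(predicted, observed) if y == 1]
    neg = [p for p, y in zip(predicted, observed) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one patient with and one without GSU")
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Toy example with made-up values (not study data).
print(auc_mann_whitney([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The pairwise loop is adequate for a cohort of this size (387 × 586 pairs); a rank-based formula gives the same value faster for larger datasets.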

Figure 1.

Original Chun et al. nomogram1 predicting Gleason sum upgrading between biopsy and radical prostatectomy. Instructions: To obtain the nomogram-predicted probability of biopsy upgrading, locate the patient's value on each variable axis. Draw a vertical line to the 'Point' axis to determine how many points are attributed to that value. Sum the points for all variables. Locate the sum on the 'Total Points' line and read the individual probability of biopsy Gleason sum upgrading on the 'P(Upgrade)' line. PSA: prostate-specific antigen (ng/mL); BX Gleason Pri: primary biopsy Gleason grade; BX Gleason Sec: secondary biopsy Gleason grade; P(Upgrade): probability of biopsy Gleason sum upgrading.
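
Points-based nomograms of this kind are typically a graphical form of a logistic regression model: each axis converts one variable into points proportional to its contribution to the linear predictor, and the 'Total Points' line maps back to a probability. The sketch below illustrates that equivalence under loudly stated assumptions: the intercept, coefficients, log-PSA coding, variable names and ranges are hypothetical placeholders, NOT the Chun et al. estimates, and the 0–100 point scaling follows a common convention for such nomograms rather than anything stated in this article.

```python
import math

# HYPOTHETICAL model for illustration only; these are NOT the Chun et al. coefficients.
INTERCEPT = 3.0
COEFS = {"log_psa": 0.6, "bx_primary": -0.9, "bx_secondary": -0.4}
RANGES = {"log_psa": (0.0, 5.0), "bx_primary": (3, 5), "bx_secondary": (3, 5)}


def logistic(lp):
    return 1.0 / (1.0 + math.exp(-lp))


def axis_floor(beta, lo, hi):
    # Each axis starts (0 points) at the end of the range contributing least to the linear predictor.
    return min(beta * lo, beta * hi)


def nomogram_points(values, coefs=COEFS, ranges=RANGES):
    """Return per-variable points on a 0-100 scale and the scale factor used."""
    # The variable with the widest linear-predictor span defines the full 100-point axis.
    scale = max(abs(b) * (ranges[v][1] - ranges[v][0]) for v, b in coefs.items())
    points = {
        v: 100.0 * (b * values[v] - axis_floor(b, *ranges[v])) / scale
        for v, b in coefs.items()
    }
    return points, scale


def probability_from_total(total_points, scale, intercept=INTERCEPT, coefs=COEFS, ranges=RANGES):
    """Map 'Total Points' back to P(Upgrade) through the linear predictor."""
    floor_sum = sum(axis_floor(b, *ranges[v]) for v, b in coefs.items())
    return logistic(intercept + floor_sum + total_points * scale / 100.0)


# Hypothetical patient: PSA 6.4 ng/mL, biopsy Gleason 3 + 3.
patient = {"log_psa": math.log(6.4), "bx_primary": 3, "bx_secondary": 3}
pts, scale = nomogram_points(patient)
total = sum(pts.values())
print(round(total, 1), round(probability_from_total(total, scale), 3))
```

Reading the probability off the 'Total Points' line gives the same value as evaluating the logistic model directly; the graphical form simply replaces the arithmetic with a ruler.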

All statistical analyses were performed using S-Plus Professional, version 1 (MathSoft Inc., Seattle, WA, USA). All tests were two-sided, with a significance level of 0.05.

Results

The clinical stage, biopsy Gleason and pathological Gleason distributions of the current population differed significantly from those of the nomogram development cohort (Table 1).

Table 1.  Descriptive characteristics of the validation cohort (n = 973) and of the original cohort (Chun et al.,1 n = 2982)

Variables | External validation cohort (n = 973) | Original cohort (n = 2982) | P-value
PSA (ng/mL) | | | NA
  Mean (median) | 8.1 (6.4) | 9.6 (7.0) |
  Range | 0.8–110.0 | 0–125.0 |
Clinical stage, n (%) | | | <0.001
  T1c | 734 (75.4) | 1951 (65.4) |
  T2a | 133 (13.7) | 493 (16.5) |
  T2b | 23 (2.4) | 349 (11.7) |
  T2c | 4 (0.4) | 108 (3.6) |
  T3 | 79 (8.1) | 81 (2.7) |
Primary biopsy Gleason grade, n (%) | | | <0.001
  ≤3 | 869 (89.3) | 2667 (89.4) |
  4 | 100 (10.3) | 310 (10.4) |
  5 | 4 (0.4) | 5 (0.2) |
Secondary biopsy Gleason grade, n (%) | | | <0.001
  ≤3 | 775 (79.7) | 2209 (74.1) |
  4 | 180 (18.5) | 742 (24.9) |
  5 | 18 (1.8) | 31 (1.0) |
Biopsy Gleason sum, n (%) | | | <0.001
  ≤6 | 719 (73.9) | 1993 (66.8) |
  7 | 203 (20.9) | 887 (29.7) |
  8 | 36 (3.7) | 73 (2.4) |
  9 | 15 (1.5) | 29 (1.0) |
Radical prostatectomy primary Gleason grade, n (%) | | | <0.001
  ≤3 | 818 (84.1) | 2608 (87.5) |
  4 | 146 (15.0) | 368 (12.3) |
  5 | 9 (0.9) | 6 (0.2) |
Radical prostatectomy secondary Gleason grade, n (%) | | | <0.001
  ≤3 | 564 (58.0) | 1719 (57.6) |
  4 | 363 (37.3) | 1223 (41.0) |
  5 | 46 (4.7) | 40 (1.3) |
Radical prostatectomy Gleason sum, n (%) | | | <0.001
  ≤6 | 467 (48.0) | 1397 (46.8) |
  7 | 438 (45.0) | 1527 (51.2) |
  8 | 25 (2.6) | 19 (0.6) |
  9 | 42 (4.3) | 39 (1.3) |
  10 | 1 (0.1) | 0 (0.0) |
Gleason sum upgrading, n (%) | 387 (39.8) | 875 (29.3) | <0.001

NA, not applicable; PSA, prostate-specific antigen.

Overall, 387 patients (39.8%) had a higher Gleason sum at RP than at biopsy. Of patients with GSU, 70 (18.1%), 23 (5.9%) and 32 (8.3%), respectively, had ECE, SVI and LNI. The rates of ECE and SVI, but not LNI, were significantly higher than in patients without GSU (Table 2). The accuracy of the nomogram in predicting GSU was 74.9% (confidence interval 72.1–77.6%).
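
The article does not state how the confidence interval around the AUC was obtained. As one possible approach, the sketch below applies the Hanley and McNeil (1982) standard-error approximation with a normal-theory 95% interval to the reported AUC and group sizes. The function name is hypothetical, and its output need not reproduce the published 72.1–77.6% interval, which may have been derived with a different method (for example, bootstrapping).

```python
import math


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate standard error and CI for an AUC (Hanley & McNeil, 1982).

    n_pos: patients with the outcome (GSU); n_neg: patients without it.
    """
    q1 = auc / (2.0 - auc)             # P(two positives both outrank one negative)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)  # P(one positive outranks two negatives)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = math.sqrt(var)
    # Clip to the valid AUC range [0, 1].
    return se, max(0.0, auc - z * se), min(1.0, auc + z * se)


se, lo, hi = hanley_mcneil_ci(0.749, n_pos=387, n_neg=586)
print(f"SE={se:.4f}, 95% CI {lo:.3f}-{hi:.3f}")
```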

Table 2.  Prevalence of extracapsular extension, seminal vesicle invasion and lymph node invasion stratified according to the presence of Gleason sum upgrading between biopsy and radical prostatectomy

Variables | Gleason sum upgrading | No Gleason sum upgrading | P-value
Number of patients | 387 (39.8%) | 586 (60.2%) | NA
Extracapsular extension | 70 (18.1%) | 73 (12.5%) | 0.01
Seminal vesicle invasion | 23 (5.9%) | 19 (3.2%) | 0.04
Lymph node invasion | 32 (8.3%) | 33 (5.6%) | 0.11

NA, not applicable.
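
The test behind the P-values in Table 2 is not named. Assuming a Pearson chi-square test without continuity correction on each 2 × 2 table (an assumption), the sketch below recomputes the three comparisons from the counts in Table 2 using only the Python standard library. For one degree of freedom, the upper-tail probability of the statistic equals erfc(√(χ²/2)). The function name is hypothetical, and the recomputed values need not match the published ones exactly.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows: GSU / no GSU; columns: feature present / absent.
    Returns (statistic, upper-tail P-value with 1 degree of freedom).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(stat / 2.0))  # survival function of chi-square with df = 1
    return stat, p


GSU_N, NO_GSU_N = 387, 586
table2 = {"ECE": (70, 73), "SVI": (23, 19), "LNI": (32, 33)}  # (with GSU, without GSU)
for name, (with_gsu, without_gsu) in table2.items():
    stat, p = chi_square_2x2(with_gsu, GSU_N - with_gsu, without_gsu, NO_GSU_N - without_gsu)
    print(f"{name}: chi2={stat:.2f}, P={p:.3f}")
```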

Figure 2 shows the agreement between the predicted and observed rates of GSU. For predicted probabilities between 0 and 65%, the nomogram tended to underestimate the rate of GSU. For example, when the predicted probability was 35%, the observed rate was close to 38%. For predicted probabilities above 65%, the nomogram tended to overestimate the observed rate of GSU. For example, when the predicted probability was 80%, the observed rate was close to 72%. At most, the nomogram underestimated the observed rate by 7% and overestimated it by 10%.

Figure 2.

External validation (n = 973) calibration plot of the original Chun et al. Gleason sum upgrading nomogram.1
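
A calibration plot such as Fig. 2 smooths the observed 0/1 GSU outcomes against the predicted probabilities and compares the smoothed curve with the 45° line. The sketch below implements a basic local linear Loess fit with tricube weights (a single pass, without the robustness iterations some Loess implementations add) and reports predicted minus smoothed observed rate at chosen points, so that negative values indicate underestimation, consistent with the −7 to +10% range given in the abstract. The span of 0.75, the function names and the synthetic data are assumptions, not the settings or data behind Fig. 2.

```python
import math
import random


def loess_at(x0, xs, ys, frac=0.75):
    """Local linear fit at x0 with tricube weights over the nearest frac*n points."""
    n = len(xs)
    k = min(n, max(2, math.ceil(frac * n)))
    h = sorted(abs(x - x0) for x in xs)[k - 1]  # neighbourhood radius
    w = [(1.0 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 if h > 0 else float(x == x0)
         for x in xs]
    sw = sum(w)
    if sw == 0.0:  # degenerate neighbourhood: fall back to the overall mean
        return sum(ys) / n
    xm = sum(wi * x for wi, x in zip(w, xs)) / sw
    ym = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xm) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xm) * (y - ym) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return ym + slope * (x0 - xm)


def calibration_departures(predicted, observed, grid, frac=0.75):
    """Predicted minus smoothed observed rate at each grid point (negative = underestimation)."""
    return {g: g - loess_at(g, predicted, observed, frac) for g in grid}


# Synthetic, perfectly calibrated example (not study data).
random.seed(1)
pred = [random.random() for _ in range(973)]
obs = [1 if random.random() < p else 0 for p in pred]
print({g: round(v, 3) for g, v in calibration_departures(pred, obs, [0.2, 0.35, 0.5, 0.65, 0.8]).items()})
```

With perfectly calibrated synthetic outcomes, the departures should be small and scatter around zero; systematic negative values at low predictions and positive values at high predictions would reproduce the pattern described for Fig. 2.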

Discussion

Biopsy upgrading has important clinical implications for candidates for watchful waiting, surgery and radiotherapy (RT).1–3,9,18 Arguably, the implications are greatest for patients who might be considered for brachytherapy, HIFU or watchful waiting. For brachytherapy, a Gleason sum of 7 or higher represents a contraindication.19 Similarly, a Gleason sum of 7 or higher represents a contraindication for active surveillance (AS) candidates.20 Conversely, such individuals are good candidates for RP or EBRT,20–22 which may effectively address a more aggressive grade. The efficacy of these modalities in patients with a Gleason sum of 7–10 is established and surpasses that of brachytherapy, HIFU and active surveillance.23,24 Moreover, RP or EBRT may be adjusted to better address the possibility of ECE or SVI. At RP, a wide resection and inclusion of the seminal vesicles represent means of improving efficacy. A wider radiation field, a higher dose (afterloading) and the addition of adjuvant therapy represent options for EBRT patients.25 The nomogram for predicting individual GSU probability can help discriminate between patients in whom only RP or EBRT should be considered and those in whom alternatives such as brachytherapy, HIFU or active surveillance should not be ruled out.

Given the importance of GSU, we validated the most accurate and simplest nomogram predicting GSU1 in an independent external validation sample. In this adequately sized sample, the AUC was 5.1 percentage points lower than in the original sample (74.9% vs 80%). Nonetheless, the accuracy in the external population surpassed that of the other two available models predicting GSU (51% and 71% accuracy).2,9 Notably, the 71% accuracy of the second model may decrease if it is subjected to formal external validation, as was the case with the Chun et al. nomogram and several other externally validated models. The lower accuracy observed here may be explained by differences in baseline prostate-specific antigen, clinical stage, and biopsy and pathological Gleason score characteristics between the external validation dataset and the original cohort. These population differences might also have contributed to less perfect performance characteristics than in the original article, in which there were virtually no departures from ideal predictions.

Additional limitations of this study warrant mention. We limited the study sample to patients whose biopsy and RP specimens were assessed at our institution by seven dedicated genitourinary pathologists. Nonetheless, inter-observer variability may exist between these seven highly trained individuals. This variability may account for the differences in accuracy and performance characteristics between the current validation and the original results, which relied on pathological reports from a single pathologist and were therefore free of inter-observer variability.

Despite these limitations, our study supports the ability of the GSU nomogram to accurately predict the individual risk of a more aggressive Gleason sum at RP than at biopsy. This information may improve treatment decision-making and may result in more appropriate treatment selection and more favorable cancer control outcomes.26,27

Conclusion

Our findings confirm the accuracy of the GSU nomogram. They also provide information about the potential differences that might exist between expected and observed rates of GSU. Finally, our ECE, SVI and LNI data highlight the importance of the GSU concept in clinical practice.

Acknowledgment

Dr. Pierre I. Karakiewicz is partially supported by the University of Montreal Urology Associates, Fonds de la Recherche en Santé du Québec, the University of Montreal Department of Surgery and the University of Montreal Foundation.
