Significant upgrading affects a third of men diagnosed with prostate cancer: predictive nomogram and internal validation

Authors


  • FKHC and AB contributed equally to the manuscript

Pierre I. Karakiewicz, Cancer Prognostics and Health Outcomes Unit, University of Montreal Health Center (CHUM), 1058, rue St-Denis, Montréal, Québec, Canada, H2X 3J4. e-mail: pierre.karakiewicz@umontreal.ca

Abstract

OBJECTIVE

To explore the rate of significant upgrading from biopsy to radical prostatectomy (RP) specimens in a contemporary cohort, and to develop a prognostic model capable of predicting the probability of significant upgrading, as previous reports indicate that up to 43% of men with low-grade prostate cancer at biopsy will be diagnosed with high-grade cancer at RP.

PATIENTS AND METHODS

The study cohort comprised 4789 men (median age 63 years, range 39–82) treated with RP, with available clinical stage, prostate-specific antigen levels, biopsy and RP Gleason sum values. These variables were used as predictors in multivariate logistic regression models (LRMs) addressing the rate of significant Gleason sum upgrading, defined as a Gleason sum increase either from ≤ 6 to ≥ 7 or from 7 to ≥ 8 between the biopsy and RP specimens. Regression coefficients were used to develop and validate (200 bootstrap re-samples) a nomogram predicting significant biopsy Gleason sum upgrading.

RESULTS

Significant biopsy Gleason sum upgrading was recorded in 1349 (28.2%) patients. In multivariate LRMs, all predictors were highly significant (all P < 0.001). The bootstrap-corrected accuracy of the nomogram predicting the probability of significant Gleason sum upgrading between biopsy and RP specimens was 75.7%.

CONCLUSION

Our nomogram might prove highly useful when the possibility of a more aggressive Gleason variant could change the treatment options.

Abbreviations

RP, radical prostatectomy; LRM, logistic regression model; AUC, area under the curve.

INTRODUCTION

In men with prostate cancer, any Gleason sum upgrading from that of the biopsy to the final pathological specimen can alter the treatment options [1–3]. Previous reports [4–7] suggest that up to 43% of men with a low-grade prostate cancer at biopsy will be diagnosed with high-grade prostate cancer at radical prostatectomy (RP). The pathological Gleason sum represents a better predictor of biochemical recurrence than the biopsy Gleason sum [8]. A high RP Gleason sum is associated with a higher rate of biochemical recurrence and worse prostate cancer-specific survival [9–11]. We previously reported a prognostic model [12] that was 80.4% accurate in predicting the probability of biopsy Gleason sum upgrading. Currently, it represents the only available and highly accurate clinical aid capable of predicting pathological Gleason sum upgrading. King [4] and King et al.[7] defined significant Gleason sum upgrading as a Gleason sum increase either from ≤ 6 to ≥ 7 or from 7 to ≥ 8 between the biopsy and RP specimens. They distinguished between any upgrading and significant upgrading, and suggest that significant upgrading represents a clinically meaningful entity. Moreover, previous reports indicate that with more extended biopsy schemes, the risk of significant upgrading decreases [13,14] due to higher sampling density and more accurate pathological biopsy evaluation.

We hypothesized that significant biopsy Gleason sum upgrading can be predicted as accurately as the overall rate of upgrading. Moreover, we hypothesized that the extent of gland sampling (sextant vs ≥ 10 cores) will have a negligible effect on the ability to predict this clinically meaningful entity. To address these hypotheses, we examined the rate of significant Gleason sum upgrading between biopsy and final pathology, and developed a nomogram predicting the probability of significant biopsy Gleason sum upgrading in a large multi-institutional cohort.

PATIENTS AND METHODS

Clinical and pathological data were prospectively gathered in 5301 consecutive patients from two continents and three centres (University Vita Salute, San Raffaele, Milan, Italy; University of Hamburg, Germany; and University of Texas, Southwestern Medical School, USA). All men had biopsy-confirmed, clinically localized prostate cancer and all underwent RP. Of these, 512 patients were excluded because of missing data on the number of removed biopsy cores. Analyses targeted 4789 evaluable patients assessed with six or more biopsy cores. Analyses were repeated in 1682 men assessed with ≥ 10 biopsy cores to assess the effect of the extent of biopsy sampling.

Clinical stage was assigned by the attending urologist according to the 2002 TNM system. Under TRUS guidance, 6–14 needle cores were obtained; 10–14 biopsy cores were taken in men included in the subgroup analysis. Pre-treatment PSA levels were measured before a DRE and TRUS. The biopsy Gleason sum was assigned by pathologists from each centre. All RP specimens were processed according to the Stanford protocol and were graded according to the Gleason system [15]. No patient received neoadjuvant androgen deprivation therapy.

For both patient cohorts, the same predictors, i.e. PSA level, clinical stage and biopsy Gleason sum, were used in univariate and multivariate logistic regression models (LRMs) addressing the rate of significant Gleason sum upgrading between biopsy and RP pathology. Significant upgrading was defined as a biopsy Gleason sum changing from ≤ 6 to ≥ 7, or from 7 to ≥ 8, according to previous reports by King [4] and King et al.[7]. LRM coefficients were then used to develop a nomogram predicting the probability of significant Gleason sum upgrading. The accuracy of the nomogram was quantified using the area under the curve (AUC) and then subjected to 200 bootstrap re-samples for internal validation and to reduce the over-fit bias. The extent of over- or underestimation relative to the observed rate of significant upgrading was explored graphically using nonparametric Loess smoothing plots. All analyses were repeated for the subgroup who had ≥ 10 biopsy cores taken. All tests were two-sided with a significance level set at P < 0.05.
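The bootstrap step described above can be sketched as follows. This is a minimal illustration of optimism correction, not the authors' code: instead of the multivariate LRM it uses a deliberately simple stand-in model (the observed upgrading rate per biopsy Gleason category), with counts summed from the cross-tabulation in Table 2. The names `GROUPS`, `fit_rates`, `auc` and `optimism_corrected_auc` are hypothetical.

```python
import random

# (significantly upgraded, not upgraded) per biopsy Gleason category,
# summed from the Table 2 cross-tabulation.
GROUPS = {"2-5": (193, 348), "6": (1095, 1660), "7": (61, 1259), "8-10": (0, 173)}

def fit_rates(records):
    """Toy stand-in model: the observed upgrading rate in each category."""
    up, total = {}, {}
    for cat, y in records:
        up[cat] = up.get(cat, 0) + y
        total[cat] = total.get(cat, 0) + 1
    return {cat: up[cat] / total[cat] for cat in total}

def auc(records, rates):
    """P(score of an upgraded man > score of a non-upgraded man) + 0.5 * P(tie)."""
    by_score = {}
    for cat, y in records:
        cell = by_score.setdefault(rates.get(cat, 0.0), [0, 0])
        cell[0 if y else 1] += 1  # [upgraded, not upgraded]
    concordant = ties = neg_below = 0
    for score in sorted(by_score):
        pos, neg = by_score[score]
        concordant += pos * neg_below
        ties += pos * neg
        neg_below += neg
    n_pos = sum(pos for pos, _ in by_score.values())
    return (concordant + 0.5 * ties) / (n_pos * neg_below)

def optimism_corrected_auc(records, n_boot=200, seed=0):
    """Apparent AUC minus the mean optimism over n_boot bootstrap re-samples."""
    rng = random.Random(seed)
    apparent = auc(records, fit_rates(records))
    optimism = 0.0
    for _ in range(n_boot):
        sample = rng.choices(records, k=len(records))  # re-sample with replacement
        rates = fit_rates(sample)                      # refit on the re-sample
        optimism += auc(sample, rates) - auc(records, rates)
    return apparent, apparent - optimism / n_boot

records = [(cat, y) for cat, (up, no) in GROUPS.items() for y in [1] * up + [0] * no]
apparent, corrected = optimism_corrected_auc(records)
print(f"apparent AUC {apparent:.3f}, bootstrap-corrected AUC {corrected:.3f}")
```

With only four categories and 4789 men the optimism is expected to be very small; a model with more parameters would usually show a larger correction.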

RESULTS

The patients’ characteristics are shown in Table 1 for the entire cohort, for men assessed with ≥ 10 biopsy cores, and for each participating institution. Pre-treatment PSA levels were 0.1–50 ng/mL, and PSA levels of >20 ng/mL were recorded in 299 patients (6.2%), vs ≤ 10 ng/mL in 3542 (74.0%) men. Clinical stages T1c and T2 were recorded in 4732 (98.8%) patients. Of all men, 4075 (85.1%) had a biopsy Gleason sum of 6 or 7, representing those who were at greatest risk of being significantly upgraded to a more aggressive pathological Gleason sum. The characteristics of the subgroup with ≥ 10 cores were virtually the same (Table 1) as those of the entire cohort.

Table 1.  Descriptive characteristics of entire cohort and subgroups according to extended initial biopsy scheme and institutions

Variable                    Entire cohort         Subgroup with ≥10 biopsy cores  Hamburg               Milan                 Texas
Mean (median, range):
  Age, years                62 (63, 39–82)        63 (64, 39–82)                  62 (63, 39–79)        66 (66, 47–82)        60 (61, 39–75)
  Preop. PSA level, ng/mL   8.7 (6.7, 0.1–50.0)   8.4 (6.5, 0.1–47.3)             8.7 (6.7, 0.1–50.0)   9.4 (7.3, 0.2–49.9)   7.7 (6.1, 0.3–44.5)
Clinical stage, n (%):
  T1c                       3191 (66.6)           1183 (70.3)                     2565 (66.9)           233 (59.6)            393 (69.6)
  T2                        1541 (32.2)           478 (28.4)                      1220 (31.8)           151 (38.6)            170 (30.1)
  T3                        57 (1.2)              21 (1.2)                        48 (1.3)              7 (1.8)               2 (0.4)
Mean (median, range):
  Number of biopsy cores    8 (8, 6–14)           11 (10, 10–14)                  8 (6, 6–14)           11 (12, 6–14)         8 (6, 6–14)
N (%):
  Biopsy Gleason sum
  2                         4 (0.1)               1 (0.1)                         3 (0.1)               –                     1 (0.2)
  3                         17 (0.4)              4 (0.2)                         17 (0.4)              –                     –
  4                         69 (1.4)              24 (1.4)                        48 (1.3)              15 (3.8)              6 (1.1)
  5                         451 (9.4)             185 (11.0)                      325 (8.5)             99 (25.3)             27 (4.8)
  6                         2755 (57.5)           971 (57.7)                      2264 (59.1)           146 (37.3)            345 (61.1)
  7                         1320 (27.6)           440 (26.2)                      1055 (27.5)           109 (27.9)            156 (27.6)
  8                         122 (2.5)             39 (2.3)                        88 (2.3)              13 (3.3)              21 (3.7)
  9                         50 (1.0)              18 (1.1)                        33 (0.9)              9 (2.3)               8 (1.4)
  10                        1 (0.02)              –                               –                     –                     1 (0.2)
  Pathological Gleason sum
  4                         15 (0.3)              8 (0.5)                         5 (0.1)               8 (2.0)               2 (0.4)
  5                         531 (11.1)            220 (13.1)                      435 (11.3)            82 (21.0)             14 (2.5)
  6                         1663 (34.7)           577 (34.3)                      1377 (35.9)           68 (17.4)             218 (38.6)
  7                         2418 (50.5)           812 (48.3)                      1941 (50.6)           198 (50.6)            279 (49.4)
  8                         73 (1.5)              25 (1.5)                        28 (0.7)              17 (4.3)              28 (5.0)
  9                         88 (1.8)              40 (2.4)                        47 (1.2)              18 (4.6)              23 (4.1)
  10                        1 (0.02)              –                               –                     –                     1 (0.2)
Grade agreement             2570 (53.7)           872 (51.8)                      2107 (55.0)           165 (42.2)            298 (52.7)
Biopsy Gleason sum:
  upgrading                 1576 (32.9)           565 (33.6)                      1205 (31.4)           158 (40.4)            213 (37.7)
  downgrading               643 (13.4)            245 (14.6)                      521 (13.6)            68 (17.4)             54 (9.6)
Significant upgrading       1349 (28.2)           482 (28.7)                      1022 (26.7)           128 (32.7)            199 (35.2)
Significant downgrading     298 (6.2)             105 (6.2)                       234 (6.1)             22 (5.6)              42 (7.4)
Number of patients          4789 (100)            1682 (35.1)                     3833 (80.0)           391 (8.2)             565 (11.8)

Significant upgrading: biopsy Gleason sum changing from either ≤6 to ≥7 or from 7 to ≥8 at RP pathology. –, no entry.

In all patients, the concordance between the biopsy and RP Gleason sum was recorded in 2570 (53.7%). Overall upgrading was recorded in 1576 (32.9%) men, whereas 13.4% were downgraded. In 298 (6.2%) men the Gleason sum decreased from ≥ 8 to ≤ 7, or from 7 to ≤ 6, and was defined as significant downgrading. There was significant upgrading in 1349 (28.2%) men. In patients assessed with ≥ 10 biopsy cores, concordance between the biopsy and RP Gleason sum was recorded in 872 (51.8%). Overall upgrading was recorded in 565 (33.6%) men, whereas 14.6% were downgraded. Stratified according to institutions, Gleason biopsy and pathology agreement was highest in the Hamburg dataset (2107 men, 55.0%), followed by Texas (298, 52.7%) and Milan (165, 42.2%). Interestingly, significant upgrading was highest in Texas (199 men, 35.2%), followed by Milan (128, 32.7%) and Hamburg (1022, 26.7%). In the subgroup assessed with ≥ 10 biopsy cores, significant up- and downgrading were recorded in 482 (28.7%) and 105 (6.2%) men, respectively (Table 1). Table 2 shows the concordance between the biopsy and RP Gleason sum in the entire cohort.

Table 2.  The concordance between biopsy and RP Gleason sum in 4789 men

Pathological  Biopsy Gleason sum
Gleason sum        2     3     4     5     6     7     8     9    10   Total
4                  0     0     2     3     9     1     0     0     0      15
5                  2     2    15   135   327    48     2     0     0     531
6                  1     7    25   156  1324   147     1     2     0    1663
7                  0     8    26   153  1071  1063    80    17     0    2418
8                  1     0     1     2    14    30    20     5     0      73
9                  0     0     0     2    10    30    19    26     1      88
10                 0     0     0     0     0     1     0     0     0       1
Total              4    17    69   451  2755  1320   122    50     1    4789
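The rates quoted in the text can be recomputed directly from the Table 2 cross-tabulation. The short sketch below applies the definitions of significant up- and downgrading used in this report; the names `TABLE2`, `grade_group` and `counts` are hypothetical and chosen for illustration only.

```python
# Rows: pathological (RP) Gleason sum; columns: biopsy Gleason sum 2..10 (Table 2).
BIOPSY_SUMS = range(2, 11)
TABLE2 = {
    4:  [0, 0, 2, 3, 9, 1, 0, 0, 0],
    5:  [2, 2, 15, 135, 327, 48, 2, 0, 0],
    6:  [1, 7, 25, 156, 1324, 147, 1, 2, 0],
    7:  [0, 8, 26, 153, 1071, 1063, 80, 17, 0],
    8:  [1, 0, 1, 2, 14, 30, 20, 5, 0],
    9:  [0, 0, 0, 2, 10, 30, 19, 26, 1],
    10: [0, 0, 0, 0, 0, 1, 0, 0, 0],
}

def grade_group(gleason):
    """Strata behind the definition of 'significant' change: <=6, 7, >=8."""
    return 0 if gleason <= 6 else 1 if gleason == 7 else 2

counts = {"agreement": 0, "upgrading": 0, "downgrading": 0,
          "significant upgrading": 0, "significant downgrading": 0}
for rp, row in TABLE2.items():
    for biopsy, n in zip(BIOPSY_SUMS, row):
        if rp == biopsy:
            counts["agreement"] += n
        elif rp > biopsy:
            counts["upgrading"] += n
        else:
            counts["downgrading"] += n
        # Significant change: the RP specimen falls in a different stratum.
        if grade_group(rp) > grade_group(biopsy):
            counts["significant upgrading"] += n
        elif grade_group(rp) < grade_group(biopsy):
            counts["significant downgrading"] += n

total = sum(sum(row) for row in TABLE2.values())
for name, n in counts.items():
    print(f"{name}: {n} ({100 * n / total:.1f}%)")
```

The output reproduces the agreement (2570, 53.7%), overall upgrading (1576, 32.9%), downgrading (643, 13.4%), significant upgrading (1349, 28.2%) and significant downgrading (298, 6.2%) figures reported for the entire cohort.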

To emphasize the clinical usefulness of significant upgrading vs overall upgrading, we divided the cohort into three risk levels according to a suggestion by D’Amico et al.[8]. Thus, significant upgrading was noted in 38.3 vs 96.7% and 28.6 vs 32.5% for low- and intermediate-risk patients.

Table 3 shows the uni- and multivariate LRMs for the entire cohort. In univariate analyses, all variables were statistically significant (all P ≤ 0.026) predictors of significant biopsy Gleason sum upgrading. Of all predictors, biopsy Gleason sum (69.3%) was the most informative, followed by PSA level (53.5%), and clinical stage (51.6%), where 50% accuracy represents random chance. In multivariate analyses, all variables were highly statistically significant (P < 0.001). The multivariate, 200 bootstrap-corrected predictive accuracy was 75.7% and exceeded the accuracy of any univariate predictor. Table 3 also shows the uni- and multivariate LRMs restricted to patients assessed with ≥ 10 biopsy cores. In univariate analyses, all variables were statistically significant predictors of significant biopsy Gleason sum upgrading (all P ≤ 0.015). Again, biopsy Gleason sum (67.0%) was the most informative, followed by PSA level (55.9%) and clinical stage (53.2%). In multivariate analyses, all variables were highly statistically significant (P < 0.001). The multivariate, 200 bootstrap-corrected predictive accuracy was 74.2% and exceeded the accuracy of any univariate predictor.

Table 3.  Univariate and multivariate LRMs with the corresponding predictive accuracy estimates to predict significant upgrading

                          Univariate LRM              Multivariate LRM
Predictors                OR       P        PA, %     OR       P        PA, %
All patients (4789)
  PSA, ng/mL              –        <0.001   53.49     –        <0.001   75.73
    PSA linear            1.072    <0.001   –         1.133    <0.001   –
    PSA cubic spline      0.916    0.002    –         0.882    <0.001   –
  Clinical stage          –        0.026    51.63     –        <0.001   –
    T2 vs T1c             1.128    0.078    –         1.855    <0.001   –
    T3 vs T1c             0.494    0.054    –         1.911    0.191    –
  Biopsy Gleason sum      –        <0.001   69.27     –        <0.001   –
    6 vs 2–5              1.189    0.076    –         1.207    0.061    –
    7 vs 2–5              0.087    <0.001   –         0.060    <0.001   –
    8–10 vs 2–5           <0.001   0.645    –         <0.001   0.618    –
≥10 biopsy cores (1682)
  PSA, ng/mL              –        <0.001   55.92     –        <0.001   74.16
    PSA linear            1.147    <0.001   –         1.202    <0.001   –
    PSA cubic spline      0.821    <0.001   –         0.797    <0.001   –
  Clinical stage          –        0.015    53.17     –        <0.001   –
    T2 vs T1c             1.302    0.024    –         1.839    <0.001   –
    T3 vs T1c             0.280    0.088    –         2.666    0.341    –
  Biopsy Gleason sum      –        <0.001   66.98     –        <0.001   –
    6 vs 2–5              1.157    0.353    –         1.262    0.150    –
    7 vs 2–5              0.123    <0.001   –         0.096    <0.001   –
    8–10 vs 2–5           <0.001   0.691    –         <0.001   0.669    –

OR, odds ratio; PA, predictive accuracy. –, no entry.
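For a single categorical predictor, the univariate odds ratios in Table 3 can be checked against the raw counts, because the LRM coefficient for a category equals the log odds ratio of the corresponding 2 × 2 table. The sketch below does this for biopsy Gleason sum in the entire cohort, using counts summed from Table 2 and a Wald test on the log odds ratio. It is an illustrative check under those assumptions, not the authors' code; `COUNTS` and `odds_ratio_wald` are hypothetical names.

```python
import math

# (significantly upgraded, not upgraded) per biopsy Gleason category,
# summed from the Table 2 cross-tabulation.
COUNTS = {"2-5": (193, 348), "6": (1095, 1660), "7": (61, 1259), "8-10": (0, 173)}

def odds_ratio_wald(category, reference="2-5"):
    """Odds ratio vs the reference category with a two-sided Wald P value."""
    up, no = COUNTS[category]
    ref_up, ref_no = COUNTS[reference]
    odds_ratio = (up / no) / (ref_up / ref_no)
    # Standard error of the log odds ratio from the four cell counts.
    se = math.sqrt(1 / up + 1 / no + 1 / ref_up + 1 / ref_no)
    z = math.log(odds_ratio) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return odds_ratio, p_value

# The 8-10 category has no upgraded men, so its log odds ratio is undefined here;
# Table 3 reports its OR only as <0.001.
or6, p6 = odds_ratio_wald("6")
or7, p7 = odds_ratio_wald("7")
print(f"6 vs 2-5: OR {or6:.3f}, P {p6:.3f}")
print(f"7 vs 2-5: OR {or7:.3f}, P {p7:.2g}")
```

The resulting odds ratios (about 1.189 and 0.087) agree with the univariate entries for biopsy Gleason sum in Table 3.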

Figure 1 shows the regression coefficient-based nomogram. Its axes indicate that biopsy Gleason sum has the strongest effect on the probability of significant Gleason sum upgrading. Interestingly, virtually the same risk is associated with a biopsy Gleason sum of 6 as with biopsy Gleason sums of 2–5. As expected, biopsy Gleason sum 7 carries an intermediate risk and biopsy Gleason sums 8–10 do not increase the risk. The risk increases with PSA level, and the relation between PSA level and ‘risk-points’ is steepest for PSA values of 0–10 ng/mL. Finally, cT2 and cT3 stages marginally increase the risk vs cT1c.

Figure 1.

Nomogram predicting significant Gleason sum upgrading between biopsy and RP specimen. ClinStage, clinical stage; BXGlSum, biopsy Gleason sum, Prob. of Gl.Up, probability of significant biopsy Gleason sum upgrading. To obtain the nomogram-predicted probability of significant biopsy upgrading, locate the patient values at each axis, draw a vertical line to the ‘Point’ axis to determine how many points are attributed for each variable value; sum the points for all variables; locate the sum on the ‘Total Points’ line to be able to assess the individual probability of significant biopsy Gleason sum upgrading on the ‘Prob. of Gl.Up’ line.
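How regression coefficients translate into nomogram points can be sketched with the multivariate odds ratios of the two categorical predictors in Table 3. The scaling used here (the predictor with the widest coefficient range spans 0–100 points, and the lowest-risk level of each predictor gets 0 points) is a common convention and is an assumption, not a description of Figure 1. PSA level and biopsy Gleason sums 8–10 are left out because their spline terms and exact coefficients are not reported, so the points will not match the published axes, and no probability is derived because the intercept is not given. `ODDS_RATIOS`, `nomogram_points` and `total_points` are hypothetical names.

```python
import math

# Multivariate odds ratios (Table 3, all patients); reference levels have OR = 1.
# PSA and biopsy Gleason 8-10 are omitted (coefficients not fully reported).
ODDS_RATIOS = {
    "ClinStage": {"T1c": 1.0, "T2": 1.855, "T3": 1.911},
    "BXGlSum": {"2-5": 1.0, "6": 1.207, "7": 0.060},
}

def nomogram_points(odds_ratios):
    """Rescale log odds ratios so the widest predictor spans 0-100 points."""
    betas = {var: {lvl: math.log(or_) for lvl, or_ in lvls.items()}
             for var, lvls in odds_ratios.items()}
    widest = max(max(b.values()) - min(b.values()) for b in betas.values())
    return {var: {lvl: 100 * (beta - min(b.values())) / widest
                  for lvl, beta in b.items()}
            for var, b in betas.items()}

POINTS = nomogram_points(ODDS_RATIOS)

def total_points(stage, gleason):
    """Sum the points of each predictor, as when reading the 'Total Points' line."""
    return POINTS["ClinStage"][stage] + POINTS["BXGlSum"][gleason]

for var, levels in POINTS.items():
    print(var, {lvl: round(pts, 1) for lvl, pts in levels.items()})
print("cT2, biopsy Gleason 6:", round(total_points("T2", "6"), 1))
```

In this reduced illustration biopsy Gleason sum spans the full 0–100 range, whereas clinical stage contributes only about 20 points, which is in line with the relative axis lengths described for Figure 1.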

Figure 2A–D depicts the uni- and multivariate calibration plots, showing the relationship between observed and predicted probability for each predictor (Fig. 2B–D) and for the combined effect of all three predictors (Fig. 2A). The performance of PSA level (AUC 55.9%; Fig. 2B) and of clinical stage (AUC 53.2%; Fig. 2C) shows major departures from ideal predictions. The performance characteristics of biopsy Gleason sum virtually parallel the ideal prediction line and closely resemble the performance characteristics of the combined model (Fig. 2A). Despite this resemblance, the predictive accuracy of biopsy Gleason sum is 69.3% vs 75.7% for the entire model. This indicates that areas of under- and overestimation of the risk of significant biopsy Gleason sum upgrading are evenly distributed for biopsy Gleason sum, as well as for the entire model. However, incorrect predictions will be made in 30.7% of cases when biopsy Gleason sum is used in isolation, vs 24.3% when all variables are considered.

Figure 2.

Univariate and multivariate calibration plots for significant Gleason sum upgrading between biopsy and pathology: A, Multivariate nomogram calibration plot. The calibration plot shows the performance of the multivariate nomogram model to predict significant biopsy Gleason sum upgrading. Specifically, nomogram-predicted probabilities are compared to the observed rates of significant biopsy Gleason sum upgrading. B, the PSA calibration plot; this shows the univariate performance of PSA level in predicting significant biopsy Gleason sum upgrading. Specifically, univariate PSA-predicted probabilities are compared to the observed rates of significant biopsy Gleason sum upgrading. C, The clinical stage calibration plot; this shows the univariate performance of clinical stage in predicting significant biopsy Gleason sum upgrading. Specifically, univariate clinical stage predicted probabilities are compared to the observed rates of significant biopsy Gleason sum upgrading. D, Biopsy Gleason sum calibration plot, showing the univariate performance of biopsy Gleason sum to predict significant biopsy Gleason sum upgrading. Specifically, univariate biopsy Gleason sum predicted probabilities are compared to the observed rates of significant biopsy Gleason sum upgrading. In all plots, the abscissa represents the predicted probability of significant biopsy upgrading, and the ordinate the observed rate of significant upgrading. Perfect prediction would correspond to a slope of 1 (diagonal 45° broken line).

DISCUSSION

Most reported biopsy Gleason sums consist of either 6 or 7; these Gleason sums are at greatest risk of being upgraded. King [4] and King et al.[7] coined the term of ‘significant upgrading’, defined as a biopsy Gleason sum upgrading from ≤ 6 to ≥ 7, or from 7 to ≥ 8. The notion of significant upgrading rests on the concept that a single Gleason sum difference might affect treatment decision-making. However, to date there are no tools capable of reliably and accurately predicting this phenomenon. To address this omission we successfully developed and validated a model predicting significant Gleason sum upgrading from biopsy to final pathology using clinical variables (PSA level, clinical stage and biopsy Gleason sum).

Our model relies on three readily available clinical variables; all are statistically significant univariate and multivariate predictors of significant biopsy Gleason sum upgrading. Despite their virtually equal statistical significance, the three variables had different abilities to predict significant biopsy Gleason sum upgrading, varying from 51.6% to 69.3%. The individual ability of the predictors was substantially exceeded by their combined input, which resulted in a predictive accuracy of 75.7%. The increase in accuracy was paralleled by remarkably better combined performance characteristics, relative to those recorded for each variable alone (Fig. 2B–D). Examination of Fig. 2B,C shows a poor performance for PSA level and clinical stage, when predicted vs observed rates of significant biopsy Gleason sum upgrading are compared. Performance characteristics indicate that the low predictive accuracy of PSA level (Fig. 2B) mainly stems from the overestimation of the observed rate of upgrading by PSA level in low predicted risk ranges. The effect was opposite for clinical stage (Fig. 2C), which underestimated the risk in low predicted risk ranges. Biopsy Gleason sum (Fig. 2D), as a single variable, showed close agreement between its predictions and observed rates, which paralleled the concordance of the combined model. However, its ability to accurately predict significant biopsy Gleason sum upgrading was grossly inferior (AUC 69%) to that of the multivariate combined model (75.7%). This indicates that individual variables can have a seemingly good performance that is not invariably supported by accuracy. This reinforces previous observations suggesting that multivariate input is better than univariate input [16].

Extended biopsy schemes (≥10 cores) might affect the rate of significant biopsy Gleason sum upgrading, and the ability to predict it. Thus, we repeated all statistical analyses in a subset of 1682 (35.1%) men who had ≥ 10 cores taken at the initial biopsy. The results virtually replicated the findings of the entire cohort. The rate of significant biopsy Gleason sum upgrading was 28.7%, vs 28.2% in the entire cohort. The ability to predict the rate of significant biopsy Gleason sum upgrading was 74.2%, vs 75.7% in the entire cohort. These results indicate that the difference in the extent of gland sampling resulting from the use of extended biopsy schemes is almost negligible in the context of significant biopsy Gleason sum upgrading.
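The similarity of the two rates can be illustrated with a two-proportion z-test. Because the ≥ 10-core subgroup is part of the entire cohort, the sketch compares the subgroup (482 of 1682) with the remaining men, whose counts (867 of 3107) are derived here by subtraction. This is an illustrative calculation under that assumption, not an analysis reported by the authors; `two_proportion_z` is a hypothetical helper.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic with a two-sided P value."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# >=10 cores: 482 of 1682 significantly upgraded (Table 1).
# Remaining men: derived by subtraction from the entire cohort (1349 of 4789).
z, p = two_proportion_z(482, 1682, 1349 - 482, 4789 - 1682)
print(f"z = {z:.2f}, two-sided P = {p:.2f}")
```

The P value is well above 0.05, which is compatible with the statement that the extent of gland sampling had little effect on the rate of significant upgrading in this cohort.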

The present findings are important in view of a substantial rate of significant Gleason sum upgrading in this contemporary cohort, and in previous patient populations. The largest published cohort (2982 men) showed a rate of overall biopsy Gleason sum upgrading of 29.3%[12]. The present cohort of 4789 patients showed an overall upgrading rate of 32.9% and a rate of significant Gleason sum upgrading of 28.2%. This is slightly lower than reported by King et al.[13], where there was significant upgrading in 32% of patients. Close agreement between the present results and the data of King et al. can be explained by the similar extent of gland sampling in both studies. Other explanations might include similar patient characteristics, such as PSA level, clinical stage and distribution of biopsy Gleason sum.

Several applications of the present findings can be considered, e.g. the choice of interstitial brachytherapy might be reconsidered in men who are at greater risk of significant biopsy Gleason sum upgrading. Similarly, neoadjuvant hormonal therapy might be considered if radiation therapy is contemplated. Finally, in surgical candidates, the risk of significant biopsy Gleason sum upgrading might contribute to different considerations on the extent of neurovascular bundle resection and the implications of positive surgical margins [15].

We are not the first to recommend the use of multivariate models to predict Gleason sum upgrading between biopsy and RP [17]. Unfortunately, the accuracy and performance characteristics of previous models were not provided. Testing of this model in our previous report [12] revealed suboptimal performance (52.3%). The advantage of the present significant upgrading nomogram over the ‘look-up’ table of D’Amico et al.[17] stems from its capacity to adjust the effect of one variable for the contribution of other covariates, from its better accuracy (75.7% vs 52.3%), and from the ability of the present model to not only predict upgrading but to predict significant upgrading.

To date there are no other models capable of accurately predicting the rate of significant upgrading. Consequently, despite its limitations, such as predictions that are not perfect, our model represents the only alternative to clinical ratings of the probability of significant biopsy Gleason sum upgrading. Moreover, the current model is better than our previously published nomogram [12], which predicts overall Gleason sum upgrading. Additionally, the notion of significant biopsy Gleason sum upgrading is clinically more relevant than any degree of biopsy upgrading. Better clinical relevance compensates for the loss in predictive accuracy (present 75.7%, vs previous 80.4%).

There are clear limitations to the present study; we included systematic sextant and 10-core biopsy data in the cohort, but in the present data the difference in the rate of upgrading was not statistically significant between these biopsy regimens. However, biopsy schemes that rely on taking even more cores might be associated with a lower rate of biopsy Gleason sum upgrading. Lack of differences between limited (median number of biopsy cores, six) and extended (median 8) biopsy schemes in the observed rate of upgrading might be attributable to a relatively narrow range of biopsy cores. Thus, it is conceivable that there might be a more pronounced difference in individuals exposed to sextant/octant vs men exposed to 14 or 18 cores. Moreover, the pathologist’s experience could also strongly contribute. Our institutions represent centres with a large surgical volume, and therefore the rates of upgrading might be lower than in smaller-volume hospitals. The overall accuracy of our model was not perfect (75.7%); this limitation is shared with other predictive models, where the accuracy rarely exceeds 80%[18,19]. Therefore, up to 24.3% of patients might be provided with incorrect predictions. Also, we have not tested the performance of the nomogram in an external data set. Instead, we confirmed its accuracy using a statistically accepted surrogate of external validation, i.e. bootstrapping. Finally, the accuracy of the model could potentially be improved by integrating additional predictor variables, e.g. the level of expertise of the pathologist, or by existing biomarkers [20]. Despite these limitations, our model represents an important contribution and we strongly advocate its use.

In conclusion, significant Gleason sum upgrading between biopsy and final pathology represents an important consideration in treatment decision-making, even in most contemporary patients. Nearly a third of patients with prostate cancer will be significantly upgraded and our nomogram can accurately identify those men.

CONFLICT OF INTEREST

None declared.
