Limited ability of existing nomograms to predict outcomes in men undergoing active surveillance for prostate cancer


Abstract

Objective

To assess the ability of current nomograms to predict disease progression at repeat biopsy or at delayed radical prostatectomy (RP) in a prospectively accrued cohort of patients managed by active surveillance (AS).

Materials and Methods

A total of 273 patients meeting low-risk criteria who were managed by AS and who underwent multiple biopsies and/or delayed RP were included in the study.

The Kattan (base, medium and full), Steyerberg, Nakanishi and Chun nomograms were used to calculate the likelihood of indolent disease (‘nomogram probability’) as well as to predict ‘biopsy progression’ by grade or volume, ‘surgical progression’ by grade or stage, or ‘any progression’ on repeat biopsy or surgery.

We evaluated the associations between each nomogram probability and each progression outcome using logistic regression with area under the receiver-operating characteristic curve (AUC) values and decision curve analysis.

Results

The nomogram probabilities of indolent disease were lower in patients with biopsy progression (P < 0.01) and any progression on repeat biopsy or surgical pathology (P < 0.05).

In regression analyses, nomograms showed a modest ability to predict biopsy progression, adjusted for total number of biopsies (AUC range 0.52–0.67) and any progression (AUC range 0.52–0.70).

Decision curve analyses showed that all the nomograms, except for the Kattan base model, have similar value in predicting biopsy progression and any progression.

Nomogram probabilities were not associated with surgical progression in a subgroup of 58 men who underwent delayed RP.

Conclusions

Existing nomograms have only modest accuracy in predicting the outcomes of patients undergoing AS.

Improvements to existing nomograms should be made before they are implemented in clinical practice and used to select patients for AS.

Introduction

Prostate cancer screening using PSA has been shown to increase the detection of disease and reduce disease-related mortality among men with a long life expectancy [1, 2]; however, many detected prostate tumours may be clinically insignificant and, if treated, may lead to unnecessary morbidities associated with intervention, without substantial benefit [3, 4]. Active surveillance (AS) with physical examination, serial PSA assessments and biopsies, and offering treatment in response to disease progression can be a safe alternative to immediate treatment [4]. In practice, while AS studies have reported excellent short-term survival rates, viable candidates for AS are often overtreated with immediate radical prostatectomy (RP) [4, 5]. Some patients may also be misclassified at diagnosis. In a retrospective study, as many as 28% of patients undergoing AS at our institution were upgraded and up to 21% had T3 disease at RP, depending on the eligibility criteria used for AS [6]. Improvements in the ability to distinguish between indolent and significant disease are needed to increase the safety and effectiveness of AS.

Nomograms that use the clinical characteristics of patients at diagnosis have been developed to predict the presence of pathologically indolent tumours, defined according to Epstein et al. [16] as tumour volume ≤0.5 mL and no Gleason pattern >3 [7, 8]. Kattan et al. [9] created the first nomograms in 2003 based on PSA, biopsy Gleason grade, clinical stage, TRUS-based prostate volume, and percentage and total length of positive cores. Steyerberg et al. [10] developed an updated model which included similar variables. Other nomograms have subsequently been developed by Nakanishi et al. [11], based on a cohort of men with one positive core, and by Chun et al. [12], based on a larger cohort, in an attempt to increase accuracy.

These nomograms have been found to be 61–79% accurate in predicting pathological indolence in patients undergoing surgery [13]; however, none of the nomograms was evaluated in patients undergoing AS and Epstein's definition of indolence is debatable. Tumour volume has not been shown to be associated consistently and independently with outcome in screened populations of patients with prostate cancer, especially when controlling for grade and stage [14]. The nomograms have been externally validated for predicting pathology after immediate surgery but not for predicting important clinical endpoints, such as disease progression [7, 15].

In the present study, we assessed the ability of these nomograms to predict disease progression defined by upgrading at repeat biopsy and by upgrading or upstaging at surgical pathology in a prospectively accrued cohort of patients on AS. We hypothesized that a lower predicted probability of indolent disease at diagnosis using any of six nomograms would be associated with a higher likelihood of disease progression on repeat biopsy and/or surgery.

Patients and Methods

From 1990 to 2011, 656 men at the University of California, San Francisco (UCSF) enrolled in an AS programme and consented to prospective data collection under institutional review board supervision. A total of 308 patients met the strict low-risk criteria of biopsy Gleason 3+3, PSA <10 ng/mL, ≤33% biopsy cores involved, <50% involvement in a single core, and clinical stage T1–T2 disease. Thirty-five men with neither repeat biopsies nor surgical pathology (n = 27 were still on AS at the end of the study and n = 8 eventually received radiation or hormonal therapy) were excluded because progression by grade, volume or stage could not be assessed in them. The remaining 273 patients, who were followed for at least 6 months with no active treatment and who underwent multiple biopsies and/or delayed RP, were included in the study.
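The low-risk inclusion criteria listed above can be written as a simple check. The following is a minimal sketch only: the record type, field names and function name are hypothetical, and the ≤33% cut-off is applied literally as stated (a one-third cut-off would be an alternative reading).

```python
from dataclasses import dataclass


@dataclass
class DiagnosticBiopsy:
    """Hypothetical record of findings at diagnosis (field names are illustrative)."""
    primary_gleason: int
    secondary_gleason: int
    psa_ng_ml: float
    positive_cores: int
    total_cores: int
    max_single_core_involvement_pct: float
    clinical_stage: str  # e.g. "T1c" or "T2a"


def meets_low_risk_criteria(b: DiagnosticBiopsy) -> bool:
    """Return True if all of the stated low-risk criteria are satisfied."""
    if b.total_cores <= 0:
        raise ValueError("total_cores must be positive")
    gleason_ok = (b.primary_gleason, b.secondary_gleason) == (3, 3)
    psa_ok = b.psa_ng_ml < 10
    # Percentage of cores involved, compared literally against 33%.
    cores_ok = 100.0 * b.positive_cores / b.total_cores <= 33
    single_core_ok = b.max_single_core_involvement_pct < 50
    stage_ok = b.clinical_stage.startswith(("T1", "T2"))
    return gleason_ok and psa_ok and cores_ok and single_core_ok and stage_ok
```

For example, a man with Gleason 3+3 disease, PSA 5.0 ng/mL, 2 of 12 cores positive, 20% maximum core involvement and stage T1c would pass this check, whereas the same man with a PSA of 10.0 ng/mL would not.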

Patients were followed on AS with quarterly DRE and PSA measurements, TRUS every 6–12 months, and extended pattern biopsies every 12–24 months with a median (interquartile range [IQR]) of 12 (10–16) cores. Biopsies performed outside UCSF were routinely reviewed by UCSF pathologists. The total duration of follow-up was calculated as time from diagnosis to last surveillance PSA, TRUS, biopsy or office visit.

Probabilities of indolent disease were calculated with the nomograms designed by Kattan (base, medium and full), Steyerberg, Nakanishi and Chun [9-12] using each patient's clinical data at diagnosis. They all defined indolent disease according to Epstein et al. [13, 16] as tumour volume ≤0.5 mL and no Gleason pattern >3. The primary independent exposure variable was ‘nomogram probability’, defined as the predicted probability of indolent disease computed by each nomogram. Outcomes analysed included ‘biopsy progression’ (Gleason upgrade ≥3+4 or volume increase to >33% positive cores at repeat biopsy), ‘surgical progression’ (upgrading or upstaging to ≥ pT3a or pN1 at surgical pathology among patients undergoing AS who eventually underwent RP), and ‘any progression’ on biopsy or surgery.

We tested the associations between each nomogram probability and each individual outcome using logistic regression models adjusted for total number of biopsies and compared the models using area under the receiver-operating characteristic curve (AUC) values as a measure of predictive accuracy [17]. Sensitivity analyses were also performed with similarly adjusted models, restricted to biopsies within 2 and 3 years of diagnosis. Decision curve analysis was performed to determine the threshold probability of progression above which the patient would have a ‘net benefit’ from treatment, thus mitigating harm from over- or underestimating progression using a nomogram probability [18]. Statistical analyses were performed using SAS 9.1 (SAS Institute, Cary, NC, USA) and Stata 11 (StataCorp, College Station, TX, USA), with a P value < 0.05 considered to indicate statistical significance.
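The net benefit that underlies decision curve analysis can be sketched in a few lines. The example below assumes the standard definition (true positives minus false positives weighted by the odds of the threshold probability, divided by the cohort size). The function and variable names are illustrative, and `risk` stands for a predicted probability of progression from a fitted model, not the nomogram probability of indolent disease itself. It is not the code used in this study, which was analysed in SAS and Stata.

```python
from typing import Sequence


def net_benefit(progressed: Sequence[int], risk: Sequence[float], threshold: float) -> float:
    """Net benefit of treating every patient whose predicted risk is >= threshold.

    progressed: 1 if the patient progressed, 0 otherwise.
    risk: predicted probability of progression for each patient.
    threshold: threshold probability, strictly between 0 and 1.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    if len(progressed) != len(risk) or len(progressed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(progressed)
    treated = [(y, r) for y, r in zip(progressed, risk) if r >= threshold]
    true_pos = sum(1 for y, _ in treated if y == 1)
    false_pos = sum(1 for y, _ in treated if y == 0)
    # False positives are down-weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_benefit_treat_all(progressed: Sequence[int], threshold: float) -> float:
    """Reference strategy: treat everyone regardless of predicted risk."""
    return net_benefit(progressed, [1.0] * len(progressed), threshold)
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and plotting it alongside the treat-all reference and the treat-none reference, whose net benefit is zero at every threshold.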

Results

Table 1 shows the patient characteristics at diagnosis. The patients' mean (range; SD) age was 61.3 (40–82; 7.4) years and the median (IQR) PSA was 5.0 (3.8–6.2) ng/mL. A total of 204 patients (75%) had clinical stage T1 disease. The median (IQR) prostate volume at diagnosis was 35.0 (27.0–48.0) mL and the median (range) percentage of positive biopsy cores was 10% (4–33%).

Table 1. Characteristics of the final cohort of 273 patients managed by active surveillance at UCSF Department of Urology, 1990–2011

Patient characteristic                        Value
Age, years
  Mean                                        61.3
  Range                                       40–82
  SD                                          7.4
Race, n (%)
  Asian/Pacific                               7 (3)
  Latino                                      4 (1)
  African-American                            5 (2)
  White                                       236 (86)
  Unknown                                     21 (8)
PSA at diagnosis, ng/mL
  Median                                      5.0
  IQR                                         3.8–6.2
Gleason sum ≤6 at diagnosis, n (%)            273 (100)
Clinical T-stage at diagnosis, n (%)
  T1a                                         1 (0)
  T1c                                         203 (74)
  T2                                          7 (3)
  T2a                                         62 (23)
TRUS prostate volume at diagnosis, mL
  Median                                      35
  IQR                                         27.0–48.0
Positive biopsy cores at diagnosis, %
  Median                                      10
  Range                                       4–33
Positive tissue per core at diagnosis, %
  Median                                      2.0
  IQR                                         1–6
Follow-up after diagnosis, months
  Median                                      44
  Range                                       7–201

Outcome                                       n (%)
Any progression on repeat biopsy or surgery
  Yes                                         129 (47)
  No                                          144 (53)
Biopsy progression by grade or volume
  Yes                                         117 (45)
  No                                          143 (55)
Upgrade ≥7 at repeat biopsy
  Yes                                         86 (33)
  No                                          174 (67)
Volume increase ≥33% positive cores
  Yes                                         52 (20)
  No                                          207 (80)
Surgical progression by grade or stage
  Yes                                         26 (45)
  No                                          32 (55)
Upgrade ≥7 on surgical pathology
  Yes                                         20 (35)
  No                                          37 (65)
Upstage ≥pT3a or pN1
  Yes                                         9 (16)
  No                                          49 (84)

The median (range) follow-up time was 44 (7–201) months and the median (IQR) time between all biopsies was 14 (10–21) months. Most patients (n = 261, 96%) had at least one repeat biopsy, of whom 183 continued on AS and 78 eventually underwent active treatment (surgery: n = 46; radiotherapy and/or hormones: n = 32). Of those 78 patients, 58 were treated after biopsy progression while 20 had no progression. An additional 12 patients subsequently underwent RP without undergoing repeat biopsy. Decision-making data were not available for these 32 patients who were treated without biopsy progression.

Mean nomogram probabilities of indolent disease ranged from 0.18 with the Steyerberg model to 0.53 using the Nakanishi model (Table 2) and all were significantly correlated with each other (Table 3; P < 0.01). Patients with any progression had lower probabilities than men without progression for all nomograms (Table 4; t-test, P < 0.01 for all but the Kattan base nomogram, for which P = 0.04). Among 261 patients with repeat biopsies, the 117 (45%) with biopsy progression had nomogram probabilities that were significantly lower than those of patients without biopsy progression (Table 4; t-test, P < 0.01); however, among the 58 patients who underwent RP, none of the nomogram probabilities differed between the 26 (45%) men with surgical progression and those without progression (Table 4).

Table 2. Predicted probabilities of indolent disease using six different nomograms developed in surgical cohorts

Model                                                             N    Mean  Range      SD
Kattan, base (PSA, cT, grade)                                     273  0.23  0.11–0.85  0.13
Kattan, medium (PSA, cT, grade, % positive cores, volume*)        273  0.34  0.06–0.93  0.17
Kattan, full (PSA, cT, grade, mm positive, mm negative, volume*)  273  0.28  0.00–0.96  0.26
Steyerberg (PSA, grade, mm positive, mm negative, volume*)        273  0.18  0.02–0.86  0.15
Nakanishi (PSA density, mm positive, age)                         273  0.53  0.00–0.88  0.22
Chun (PSA, cT, grade, % positive cores, mm positive)              273  0.20  0.00–0.59  0.13

*Ultrasonography volume at diagnosis; mm positive, length of cancerous tissue; mm negative, length of non-cancerous tissue.

Table 3. Pearson's correlations of probabilities between nomograms for indolent prostate cancer (Pearson's R correlation coefficient)

Nomogram        Kattan, base    Kattan, medium  Kattan, full    Steyerberg      Nakanishi       Chun
Kattan, base
Kattan, medium  0.73
Kattan, full    0.34            0.58
Steyerberg      0.88            0.79            0.66
Nakanishi       0.28            0.51            0.60            0.59
Chun            0.30            0.56            0.71            0.66            0.80

All correlations significant at P < 0.01.

Table 4. Predicted probabilities of indolent disease by nomogram, among 273 men following a contemporary active surveillance protocol at UCSF

Outcome: any progression on repeat biopsy or surgery
Model             Yes, mean (SD)    No, mean (SD)     P*
Kattan, base      0.21 (0.09)       0.24 (0.16)       0.04
Kattan, medium    0.29 (0.14)       0.39 (0.19)       <0.01
Kattan, full      0.22 (0.25)       0.33 (0.26)       <0.01
Steyerberg        0.14 (0.11)       0.21 (0.18)       <0.01
Nakanishi         0.45 (0.22)       0.59 (0.20)       <0.01
Chun              0.16 (0.12)       0.23 (0.13)       <0.01

Outcome: biopsy progression by grade or volume
Model             Yes, mean (SD)    No, mean (SD)     P*
Kattan, base      0.21 (0.07)       0.25 (0.16)       <0.01
Kattan, medium    0.29 (0.12)       0.39 (0.19)       <0.01
Kattan, full      0.22 (0.25)       0.33 (0.27)       <0.01
Steyerberg        0.14 (0.10)       0.22 (0.18)       <0.01
Nakanishi         0.44 (0.22)       0.59 (0.19)       <0.01
Chun              0.16 (0.12)       0.23 (0.13)       <0.01

Outcome: surgical progression by grade or stage
Model             Yes, mean (SD)    No, mean (SD)     P*
Kattan, base      0.22 (0.13)       0.20 (0.06)       0.49
Kattan, medium    0.27 (0.17)       0.32 (0.15)       0.25
Kattan, full      0.14 (0.19)       0.19 (0.23)       0.36
Steyerberg        0.14 (0.14)       0.13 (0.08)       0.61
Nakanishi         0.45 (0.21)       0.47 (0.25)       0.82
Chun              0.12 (0.09)       0.17 (0.12)       0.14

*t-test.

A set of logistic regression models, one for each nomogram probability, was run for each progression outcome and comparative ROC curves were constructed. All nomogram probabilities were associated with any progression (AUC ranging from 0.52 to 0.70, Fig. 1; P < 0.05) and with biopsy progression adjusting for total number of biopsies (AUC ranging from 0.52 to 0.67, Fig. 1, P ≤ 0.01). Odds ratios and AUCs for biopsy progression were extremely similar to the main findings in sensitivity analyses restricting models to biopsies within 2 years (AUC 0.62–0.69) and within 3 years (AUC 0.58–0.69). Again, none of the nomogram probabilities was associated with surgical progression among the 58 men who ultimately underwent surgery (Fig. 1).
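To make the AUC values concrete, the sketch below computes the AUC directly as a rank statistic: the probability that a randomly chosen man who progressed has a lower nomogram probability of indolent disease than a randomly chosen man who did not, with ties counted as one half. This is an illustration under simplifying assumptions. The function name and the input values in the usage note are hypothetical, and the calculation is unadjusted, whereas the AUCs reported here for biopsy progression come from logistic models adjusted for total number of biopsies.

```python
from typing import Sequence


def auc_for_progression(
    indolence_prob_progressors: Sequence[float],
    indolence_prob_nonprogressors: Sequence[float],
) -> float:
    """Unadjusted AUC for predicting progression from a probability of indolent disease.

    A LOWER probability of indolent disease is treated as indicating a HIGHER
    risk of progression, so a pair is concordant when the progressor has the
    lower probability. Ties contribute one half.
    """
    if not indolence_prob_progressors or not indolence_prob_nonprogressors:
        raise ValueError("both groups must contain at least one patient")
    concordant = 0.0
    for p in indolence_prob_progressors:
        for q in indolence_prob_nonprogressors:
            if p < q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    n_pairs = len(indolence_prob_progressors) * len(indolence_prob_nonprogressors)
    return concordant / n_pairs
```

With made-up inputs such as `auc_for_progression([0.1, 0.2, 0.4], [0.3, 0.5, 0.6])`, eight of the nine progressor/non-progressor pairs are concordant. An AUC of 0.5 corresponds to no discrimination, which is why values of 0.52–0.70 are described as modest.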

Figure 1.

Receiver-operating characteristic curves for the Kattan (base, medium, and full), Steyerberg, Nakanishi, and Chun nomograms when predicting various outcomes. (A) Any progression on repeat biopsy or surgery is predicted by the nomograms. (B) Biopsy progression by grade or volume adjusted for total number of biopsies is predicted by the nomograms. (C) Surgical progression by grade or stage is not predicted by the nomograms.

In decision curve analyses, most of the nomogram probabilities resulted in similar net benefits for prediction of biopsy progression and any progression except the Kattan base model, which was shown to have a lower net benefit (Fig. 2). For example, at a 50% threshold probability of progression, the net benefit for predicting any progression approached 0 for the Kattan base model compared with 0.1 for the other nomograms. Most of the net benefit was realized for men with intermediate risk of progression (i.e. threshold probabilities for treatment between 40 and 60%). None of the nomograms showed any net benefit for the prediction of surgical progression in the subgroup of patients who underwent RP (data not shown).
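For scale, an illustrative calculation (not reported in this study) applies the standard decision-curve definition of net benefit to the reference strategy of treating every patient, using the counts for any progression in Table 1 (129 men with progression, 144 without, n = 273) at a threshold probability of 50%. The symbols TP, FP and p_t (true positives, false positives and threshold probability) follow common usage and are assumptions here.

```latex
\mathrm{NB}(p_t) = \frac{\mathrm{TP}}{n} - \frac{\mathrm{FP}}{n}\cdot\frac{p_t}{1-p_t},
\qquad
\mathrm{NB}_{\text{treat all}}(0.5) = \frac{129}{273} - \frac{144}{273}\cdot\frac{0.5}{1-0.5}
= -\frac{15}{273} \approx -0.055
```

Under this definition, a net benefit of 0.1 is equivalent to correctly identifying 10 men with progression per 100 patients without treating any man unnecessarily, whereas treating all men at this threshold would give a slightly negative net benefit.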

Figure 2.

Decision curve analysis of the Kattan (base, medium, and full), Steyerberg, Nakanishi, and Chun nomograms when predicting various outcomes. (A) Any progression on repeat biopsy or surgery is predicted by the nomograms. (B) Biopsy progression by grade or volume adjusted for total number of biopsies is predicted by the nomograms.

Discussion

We assessed nomograms designed to predict indolent disease based on the criteria used by Epstein et al. [16] for indolent cancer, which includes both pretreatment and post-prostatectomy measures. These nomograms use clinical characteristics of patients meeting pretreatment requirements to predict minimal pathology in post-surgical specimens [19]. The tumour volume threshold of 0.5 mL in the Epstein et al. criteria was determined from a cohort of patients with incidentally detected prostate cancer using an 8% lifetime risk of clinically significant disease [8]. As noted above, larger tumours may also be indolent in terms of clinical behaviour. The European Randomized Study of Screening for Prostate Cancer showed that organ-confined tumours with a Gleason sum ≤6 may be considered insignificant with volumes up to at least 1.3 mL [20].

We aimed to determine the ability of nomograms to predict outcomes in AS patients. In our cohort, the mean nomogram probabilities for indolent disease varied widely, from 0.18 to 0.53, consistent with those of the surgical populations used to construct the nomograms, which ranged from 20% in the Kattan studies to 55% in the Chun studies [9, 12]. These results were as expected, because the AS inclusion criteria were similar to pretreatment variables included in the Epstein et al. study. In addition, patients found to have biopsy progression had lower predicted probabilities for indolent disease.

In the present cohort of patients undergoing AS, lower nomogram probabilities were associated with biopsy progression and any progression but not with surgical progression. Since the nomograms use Gleason patterns as a major component in defining indolent disease, a lower predicted probability of indolent disease would also be expected among men with surgical progression. It is unclear whether differences are indiscernible because of the small sample size (patients undergoing RP: n = 58) or reflect limited accuracy of the nomograms when applied to patients on AS.

To ensure that AS is used appropriately, providers must more accurately predict the likelihood of progression in patients. The nomograms evaluated in the present study were found to have a modest ability to predict the patients who will have biopsy progression and any progression. The best performing nomogram was by Nakanishi et al. [11] which had AUC values of 0.67 for biopsy progression and 0.70 for any progression.

Decision curve analysis showed that nomograms, excluding the Kattan base model, increased net benefit of treatment when the threshold probability of biopsy progression or any progression was between 40 and 60%. This suggests that nomograms offer modestly accurate estimations of likelihood of progression and have value in predicting patient outcomes in limited threshold probabilities; however, the probabilities obtained from these nomograms do not appear reliable enough to warrant routine use in treatment decision-making. Improvements in both discriminatory ability and clinical benefit are needed before nomograms can be confidently incorporated into AS protocols.

Although the pathological outcomes of men receiving delayed RP are similar to those of patients in low-risk groups receiving immediate surgery, patients and clinicians express concern that AS will compromise the ability to cure disease if treatment is delayed [21, 22]. This anxiety may contribute to the avoidance of or withdrawal from AS in the absence of progression [23]. Indeed, 32 patients (12%) receiving treatment in the present cohort of patients on AS did so without any triggers for intervention. Novel approaches that accurately predict patient outcomes are needed to reduce anxiety and optimize the selection criteria for AS. The present analysis suggests that at demonstrated levels of accuracy, nomograms currently available for predicting indolent disease are not yet able to fill this void. Nonetheless, there is a growing consensus in support of at least a trial period on AS for the majority of men diagnosed with low-risk disease [24].

As more patients with low-risk prostate cancer elect to undergo AS, new risk prediction tools based explicitly on AS populations may yield more useful ways to counsel patients. Incorporating imaging studies, such as MRI, into models using clinical characteristics may also improve our ability to predict outcomes of low-risk disease [25]. Similarly, new biomarkers may be useful in both identifying candidates for AS and monitoring disease progression. Incorporating genetic data using gene expression profiling from paraffin-embedded biopsy specimens is also a promising new area of discovery. Novel biomarkers have shown the ability to improve other existing nomograms and will remain a fruitful area of research in the years ahead [26].

The present study has some limitations. We designed the study to evaluate how well different nomograms predict outcomes of patients on current AS protocols. We used an increase in Gleason grade or volume to indicate clinically significant progression but acknowledge that upgrading, particularly on early surveillance biopsies, could reflect undersampling of existing disease rather than true biological progression. While a higher Gleason grade is associated with an increased risk of prostate cancer mortality, recent studies suggest that even patients at intermediate risk with Gleason 7 cancer have an excellent survival rate on AS, although these studies have only short- to intermediate-term follow-up [27, 28]. Studies that include metastasis and disease-specific mortalities as outcomes will be critical in improving AS protocols.

In conclusion, nomograms designed to predict indolent tumours were found to have a modest ability to predict biopsy progression and any progression on either biopsy or surgery in patients undergoing AS. None of the various nomograms showed any value in predicting surgical progression at RP in a small group of 58 men. Before nomograms are routinely used in patient care and clinical decision-making for AS candidates, additional improvements must be made. Furthermore, tools for predicting the need for treatment must be validated using the endpoints of metastatic disease and prostate cancer-specific mortality. We anticipate that future instruments incorporating imaging tests and/or biomarkers will prove to be of great value in optimizing AS for men with prostate cancer.

Acknowledgements

The authors thank Jeanette Broering, Hazel Dias, Alex Ignatov, Sarah Joost and Frank Stauf for operating and managing the UCSF Urologic Oncology Outcomes Database and Nannette Perez for supporting the UCSF prostate cancer active surveillance cohort.

Conflict of Interest

None declared.

Abbreviations

RP, radical prostatectomy
AS, active surveillance
AUC, area under the receiver-operating characteristic curve
UCSF, University of California, San Francisco
IQR, interquartile range
