Department of Urology, University of California San Francisco Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, California
Department of Urology, University of California San Francisco Helen Diller Family Comprehensive Cancer Center, University of California at San Francisco, Box 1695, 1600 Divisadero St, A-607; San Francisco, CA 94143-1695; Fax: (415) 885-7443
In the current study, the authors propose the quantitative Gleason score (qGS), a modification of the current Gleason grading system for prostate cancer, based on the weighted average of Gleason patterns present in the pathology specimen. They hypothesize that the qGS can improve prostate cancer risk stratification and help prevent the overtreatment of patients with clinically indolent tumors.
The qGS was applied to patients in the University of California San Francisco urologic oncology database with tumors determined to have a GS of 7 on prostate biopsy or final pathology after radical prostatectomy (RP). Using multivariable logistic regression, Cox proportional hazards regression, receiver operating characteristic (ROC), and decision curve analyses, the ability of qGS to predict pathological GS and the risk of disease recurrence after RP was assessed.
A total of 225 men were included in the analysis of biopsy specimens and 618 men were included in the assessment of RP specimens. Compared with traditional Gleason scoring, the qGS improved concordance between biopsy and pathological GS on decision curve and ROC analyses (area under the ROC curve, 0.79 vs 0.71). On regression analysis, the qGS of biopsy specimens was significantly associated with pathological grade after RP (odds ratio [OR] per 0.2-point increase, 1.78; 95% confidence interval [95% CI], 1.49-2.12), and the qGS of RP specimens was significantly associated with the risk of biochemical disease recurrence after RP (hazard ratio [HR] per 0.2-point increase, 1.13; 95% CI, 1.04-1.24).
In 1966, Gleason et al proposed an ordinal system for grading the histopathological architecture of prostate tumors.1, 2 In the decades since, this grading system has proven to be a durable, powerful predictor of outcome in patients with prostate cancer. The Gleason score (GS) of prostate biopsy specimens is associated with adverse pathologic outcomes after radical prostatectomy (RP) and an increased risk of disease recurrence and progression, regardless of treatment modality.3-5 Furthermore, both the primary and secondary Gleason patterns have been reported to be strong predictors in several multivariable models designed to forecast the risks of biochemical disease recurrence (BCR), clinical progression, and mortality.6, 7
Despite these strengths, the current Gleason system has several potential drawbacks. First, because the GS is a categorical assessment, patients with differing risks of disease progression are likely to be grouped together into the same Gleason category. Second, the primary Gleason pattern is the pattern that accounts for the majority of total tumor volume, and the secondary pattern represents the less common pattern. This 50% threshold was established arbitrarily, and no data suggest that it is the optimal cutoff for identifying more aggressive biology in contemporary patients. Finally, although pathologists often, but not always, assign a separate GS to each prostate needle biopsy specimen, the overall GS for prostate biopsies typically represents the highest Gleason grade of any single core needle biopsy specimen.8 This practice often results in discrepancies between the biopsy GS and the pathological GS after RP.9 The inability to accurately assess tumor grade at the time of diagnosis could lead to the overtreatment of patients with relatively indolent lesions, thereby subjecting men with prostate cancer to significant unnecessary morbidity.
In the current study, we propose a novel technique for assigning GS. The quantitative GS (qGS) is derived from the percentage of high-grade (Gleason pattern 4) tumor in the pathology specimen. We applied this technique to patients assigned a GS of 7 on pathological analysis of prostate biopsy or RP specimens, a group with a heterogeneous risk of adverse outcomes.10 We then compared the qGS with traditional Gleason scoring. For prostate biopsies, we hypothesized that the qGS would improve the concordance between the biopsy and RP GSs. For RP specimens, we aimed to assess whether the qGS would allow a more accurate assessment of recurrence risk in men undergoing surgery for prostate cancer.
MATERIALS AND METHODS
We performed an analysis of the University of California San Francisco (UCSF) urologic oncology database (UODB). Data for the UODB are prospectively collected for all patients undergoing surgery for prostate cancer who consent to participate, under supervision of the UCSF institutional review board.
In the current study, we developed the qGS, a novel method of assigning a GS to prostate cancer specimens. The qGS is based on the weighted average of the Gleason patterns documented in surgical pathology reports. Since July 2006, UCSF pathology staff have noted the percentage of tumor composed of Gleason pattern 4 or 5 on both biopsy and RP specimens. This percentage of high-grade tumor (ie, Gleason pattern > 3) is reported in deciles, ranging from 10% to 90%.
The qGS is calculated as follows:

$$\mathrm{qGS} = 2 \times \left[(3 \times \%\mathrm{GS3}) + (4 \times \%\mathrm{GS4})\right]$$

in which the percentage GS3 and percentage GS4 represent the proportions of Gleason pattern 3 and pattern 4 tumor, respectively, expressed as fractions that sum to 1. The weighted average of the Gleason patterns is multiplied by 2 to yield a continuous qGS between 6 and 8, an indicator of risk consistent in magnitude with the current Gleason system. Thus, a tumor with 50% Gleason pattern 3 and 50% Gleason pattern 4 would have a qGS of 7 (2 × ([3 × 0.5] + [4 × 0.5]) = 7). As pattern 3 or pattern 4 disease increasingly predominates, the qGS approaches 6 or 8, respectively.
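As an illustration only, the formula can be written as a small Python function. This is a minimal sketch, assuming both inputs are fractions of total tumor volume (for example, 0.3 for 30%) that sum to 1; the function name `quantitative_gleason_score` is hypothetical and does not come from the study.

```python
import math


def quantitative_gleason_score(frac_pattern3: float, frac_pattern4: float) -> float:
    """Compute the qGS for a tumor containing only Gleason patterns 3 and 4.

    Hypothetical helper: both arguments are fractions of total tumor (0-1)
    and must sum to 1.
    """
    for frac in (frac_pattern3, frac_pattern4):
        if not 0.0 <= frac <= 1.0:
            raise ValueError("fractions must lie between 0 and 1")
    if not math.isclose(frac_pattern3 + frac_pattern4, 1.0):
        raise ValueError("pattern 3 and pattern 4 fractions must sum to 1")
    # Weighted average of the Gleason patterns, doubled onto the 6-8 scale.
    return 2 * (3 * frac_pattern3 + 4 * frac_pattern4)


print(quantitative_gleason_score(0.5, 0.5))            # 7.0, the worked example in the text
print(round(quantitative_gleason_score(0.8, 0.2), 2))  # 6.4
```

Because the two fractions sum to 1, the formula simplifies to qGS = 6 + 2 × (fraction of pattern 4). Each 10% step in the reported percentage of pattern 4 tumor therefore moves the qGS by 0.2 points, the increment used for the ratios reported in the Results.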
In assessing biopsy specimens, the percentage GS3 and percentage GS4 across all positive biopsies were determined and used to calculate the biopsy qGS. For RP specimens, the pathological qGS was calculated using the percentage GS3 and percentage GS4 on the final pathology specimen. Figure 1 shows sample qGS calculations for 2 hypothetical prostate biopsy (Fig. 1a) and RP (Fig. 1b) specimens and compares this score with the traditional GS.
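The text does not state how per-core percentages were combined across all positive biopsies. One plausible approach, shown here purely as an assumption, is to weight each core's pattern 4 fraction by its linear tumor length before applying the qGS formula. The `BiopsyCore` record and the `pooled_biopsy_qgs` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BiopsyCore:
    """One positive needle biopsy core (hypothetical record layout)."""

    tumor_length_mm: float  # assumed weighting variable; not specified in the study
    frac_pattern4: float    # fraction of this core's tumor that is pattern 4 (0-1)


def pooled_biopsy_qgs(cores: list[BiopsyCore]) -> float:
    """Pool pattern 4 across all positive cores, then apply the qGS formula.

    Assumption: cores are weighted by tumor length.
    """
    total_length = sum(core.tumor_length_mm for core in cores)
    if total_length <= 0:
        raise ValueError("at least one core must contain measurable tumor")
    pattern4_length = sum(core.tumor_length_mm * core.frac_pattern4 for core in cores)
    frac4 = pattern4_length / total_length
    frac3 = 1.0 - frac4
    return 2 * (3 * frac3 + 4 * frac4)


# Hypothetical biopsy: a long core that is mostly pattern 3 and a short core
# in which pattern 4 predominates.
example_cores = [BiopsyCore(10.0, 0.1), BiopsyCore(2.0, 0.6)]
print(round(pooled_biopsy_qgs(example_cores), 2))  # 6.37
```

Under this weighting the pooled score reflects mostly pattern 3 disease, whereas grading by the highest-grade single core would label the same hypothetical biopsy 4 + 3.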
We assigned each patient a “traditional” GS according to the guidelines established at the 2005 International Society of Urological Pathology consensus conference on Gleason grading.11 For RP specimens, the primary Gleason pattern was the most prevalent pattern observed on final pathology, and the secondary pattern was the less prevalent pattern. Although there is some controversy regarding how to assign an overall GS when several biopsy cores are positive, most urologists assign the biopsy GS as the highest Gleason grade of any single needle biopsy specimen.8 We therefore applied this rule in determining the “traditional” GS of needle biopsy specimens.
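The traditional rules described above can be sketched as follows. This is a simplification, assuming each specimen contains only Gleason patterns 3 and 4 and is summarized by its fraction of pattern 4; it ignores other ISUP refinements (for example, how minor or tertiary patterns are handled on biopsy), and all names are hypothetical.

```python
# Rank of each traditional label for patterns 3 and 4 only; 4 + 3 outranks 3 + 4.
GRADE_RANK = {"3 + 3": 0, "3 + 4": 1, "4 + 3": 2, "4 + 4": 3}


def traditional_label(frac_pattern4: float) -> str:
    """Label a specimen containing only patterns 3 and 4.

    The more prevalent pattern is primary. Assumption: an exact 50/50 split
    is labelled 3 + 4; the source does not say how ties are handled.
    """
    if frac_pattern4 <= 0.0:
        return "3 + 3"
    if frac_pattern4 >= 1.0:
        return "4 + 4"
    return "4 + 3" if frac_pattern4 > 0.5 else "3 + 4"


def traditional_biopsy_gs(core_fracs_pattern4: list[float]) -> str:
    """Overall biopsy GS = highest-grade label of any single positive core."""
    if not core_fracs_pattern4:
        raise ValueError("no positive cores supplied")
    labels = [traditional_label(frac) for frac in core_fracs_pattern4]
    return max(labels, key=GRADE_RANK.__getitem__)


print(traditional_biopsy_gs([0.1, 0.6]))  # 4 + 3
print(traditional_label(0.3))             # 3 + 4, e.g. an RP specimen with 30% pattern 4
```

For a hypothetical biopsy with one core at 10% pattern 4 and another at 60%, the rule returns 4 + 3 even though most of the sampled tumor may be pattern 3.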
We first calculated the qGS for prostate biopsy specimens to assess whether this quantitative score improved concordance between the biopsy and pathological GS. This analysis included all patients in the UODB diagnosed with prostate cancer between July 2006 and December 2009 with a traditional GS of 7 on prostate biopsy. Patients diagnosed before July 2006 were excluded because pathologists at UCSF were not reporting the percentage of high-grade tumor at that time. Only those patients who underwent RP as their primary treatment within 1 year of the last prostate biopsy were included.
The primary outcome in this analysis was the traditional pathological GS after RP. The Student t test was used to compare the mean biopsy qGS in patients with traditional pathological Gleason 4 + 3 versus 3 + 4 tumors after RP. Receiver operating characteristic (ROC) and decision curve analyses12 were used to compare the ability of the traditional biopsy GS and the biopsy qGS to predict pathological grade after RP. Decision curve analysis assesses both the predictive accuracy and the net benefit (ie, clinical value) of 1 model relative to another, reflecting both discrimination and calibration.12 The decision curve in this analysis compared the net benefit of the traditional GS and the qGS as a function of threshold probability (ie, the probability of upgrading above which a patient would opt for more aggressive treatment).
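For readers unfamiliar with decision curve analysis, the standard net-benefit calculation can be sketched as below. It assumes each model has already been converted into a predicted probability of upgrading for every patient (for example, by a logistic model); the probabilities and outcomes in the example are invented toy values, not study data, and the function names are hypothetical.

```python
def net_benefit(probs: list[float], outcomes: list[int], threshold: float) -> float:
    """Net benefit of treating every patient whose predicted risk >= threshold.

    probs: predicted probability of the outcome (e.g., pathological upgrading).
    outcomes: 1 if the outcome occurred, otherwise 0.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    treated = [(p >= threshold, y) for p, y in zip(probs, outcomes)]
    true_pos = sum(1 for flag, y in treated if flag and y == 1)
    false_pos = sum(1 for flag, y in treated if flag and y == 0)
    # False positives are penalized by the odds at the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcomes: list[int], threshold: float) -> float:
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


toy_probs = [0.9, 0.7, 0.4, 0.2, 0.1]
toy_outcomes = [1, 1, 0, 1, 0]
for pt in (0.1, 0.3, 0.5):
    print(pt, round(net_benefit(toy_probs, toy_outcomes, pt), 3),
          round(net_benefit_treat_all(toy_outcomes, pt), 3))
```

A decision curve plots these values across a range of threshold probabilities, alongside "treat none" (net benefit 0); the model whose curve is higher at a given threshold offers more net benefit there, which is how the threshold ranges reported in the Results are read.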
Multivariable logistic regression analysis was then used to evaluate the associations of the traditional biopsy GS and the biopsy qGS with the pathological GS after RP. Models were adjusted for patient age, prostate-specific antigen (PSA) level at diagnosis, percentage of positive core needle biopsy specimens, percentage of positive biopsy tissue length, and clinical T stage.
Prostatectomy Specimen Analysis
Our second analysis applied the qGS to RP specimens, and investigated whether this modified grading system improved the risk assessment of BCR compared with traditional Gleason grading. This analysis included all patients who underwent RP as their primary treatment between July 2006 and December 2009 and had Gleason sum 7 tumors on final pathology. We excluded patients with < 1 year of follow-up or < 2 postoperative PSA measurements, for whom BCR could not be determined. BCR was defined as 2 consecutive PSA measurements ≥ 0.2 ng/mL after RP or any secondary treatment at least 6 months after surgery.
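The eligibility and outcome rules above can be encoded as shown. This is a minimal sketch: the data layout, the function names, and the choice to date BCR to the first of the 2 consecutive elevated PSA values are assumptions, because the source does not state how the event date was assigned.

```python
from typing import Optional, Sequence, Tuple

PSA_CUTOFF_NG_ML = 0.2        # BCR threshold from the study definition
MIN_SECONDARY_TX_MONTHS = 6   # secondary treatment counts as BCR only from 6 months on


def is_eligible(psa_series: Sequence[Tuple[float, float]], followup_months: float) -> bool:
    """At least 1 year of follow-up and at least 2 postoperative PSA values."""
    return followup_months >= 12 and len(psa_series) >= 2


def bcr_status(
    psa_series: Sequence[Tuple[float, float]],
    followup_months: float,
    secondary_tx_month: Optional[float] = None,
) -> Tuple[bool, float]:
    """Return (had_bcr, months to BCR or to censoring).

    psa_series holds postoperative (months after RP, PSA in ng/mL) pairs.
    Assumption: BCR is dated to the first of the 2 consecutive elevated values.
    """
    ordered = sorted(psa_series)
    event_times = []
    for (month, psa), (_, next_psa) in zip(ordered, ordered[1:]):
        if psa >= PSA_CUTOFF_NG_ML and next_psa >= PSA_CUTOFF_NG_ML:
            event_times.append(month)
            break
    if secondary_tx_month is not None and secondary_tx_month >= MIN_SECONDARY_TX_MONTHS:
        event_times.append(secondary_tx_month)
    if event_times:
        return True, min(event_times)
    return False, followup_months


rising = [(3, 0.01), (9, 0.05), (15, 0.22), (18, 0.31)]
print(is_eligible(rising, 24), bcr_status(rising, 24))  # True (True, 15)
# Adjuvant treatment at 4 months does not meet the 6-month rule.
stable = [(3, 0.01), (12, 0.02)]
print(bcr_status(stable, 24, secondary_tx_month=4))     # (False, 24)
```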
Decision curve analysis was used to compare the net benefit of the traditional versus qGS in predicting BCR. Cox proportional hazards regression analysis was used to test for associations between traditional or quantitative pathological GS and BCR. Models were adjusted for patient age, PSA level at diagnosis, positive surgical margins, extraprostatic extension, seminal vesicle invasion, and lymph node involvement.
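To make the hazard ratios per 0.2-point increase concrete, the sketch below fits a Cox model with a single covariate (the qGS) by Newton-Raphson on the partial likelihood, handling tied event times with the Breslow approximation. It is illustrative only: the study's models were multivariable and were presumably fitted with standard statistical software, the data are invented, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cox_score_and_information(
    beta: float, times: Sequence[float], events: Sequence[int], x: Sequence[float]
) -> Tuple[float, float]:
    """Score and observed information of the Cox partial likelihood (1 covariate)."""
    score = 0.0
    information = 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if not d_i:
            continue  # censored patients contribute only through risk sets
        s0 = s1 = s2 = 0.0
        for t_j, x_j in zip(times, x):
            if t_j >= t_i:  # patient j is still at risk at time t_i
                w = math.exp(beta * x_j)
                s0 += w
                s1 += w * x_j
                s2 += w * x_j * x_j
        mean_x = s1 / s0
        score += x_i - mean_x
        information += s2 / s0 - mean_x * mean_x
    return score, information


def fit_cox_single_covariate(
    times: Sequence[float], events: Sequence[int], x: Sequence[float],
    tol: float = 1e-10, max_iter: int = 50,
) -> float:
    """Newton-Raphson estimate of the log hazard ratio per 1-unit increase in x."""
    beta = 0.0
    for _ in range(max_iter):
        score, information = cox_score_and_information(beta, times, events, x)
        if information <= 0:
            raise ValueError("partial likelihood is flat; beta is not estimable")
        step = score / information
        beta += step
        if abs(step) < tol:
            return beta
    raise RuntimeError("Newton-Raphson did not converge")


# Invented toy data: months to BCR or censoring, event flag, pathological qGS.
toy_months = [5, 8, 12, 20, 25, 30, 36, 40]
toy_events = [1, 1, 0, 1, 0, 1, 0, 0]
toy_qgs = [7.6, 6.4, 7.2, 7.0, 6.2, 7.4, 6.6, 6.8]

beta_hat = fit_cox_single_covariate(toy_months, toy_events, toy_qgs)
print("HR per 0.2-point increase:", round(math.exp(0.2 * beta_hat), 2))
```

In practice an established survival package would be used; this version only shows that a ratio reported per 0.2-point increase equals exp(0.2 × β), where β is the log hazard ratio per 1 qGS point.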
Quantitative Gleason Grading of Prostate Biopsy Specimens
A total of 3479 men with prostate cancer were enrolled in the UCSF UODB as of December 2009. Of these, 823 were diagnosed after July 1, 2006, and underwent RP as their primary treatment. A total of 287 patients had a biopsy Gleason sum of 7, and the percentage of high-grade tumor was reported in 225 of these men.
Demographic and disease characteristics for the study population are shown in Table 1.6 Figure 2a compares the distributions of the traditional GS and the qGS of biopsy specimens for men in the current study cohort. Using the traditional GS, 84% of the men were grouped into a single Gleason category (3 + 4), and 16% were grouped as Gleason 4 + 3. The continuous qGS allowed finer discrimination within the study population.
Table 1. Demographic and Disease Characteristics of the Study Population
Analysis of qGS of Prostate Biopsy Specimens (N=225)
Analysis of qGS of Radical Prostatectomy Specimens (N=618)
Abbreviations: PSA, prostate-specific antigen; qGS, quantitative Gleason score; UCSF-CAPRA, University of California San Francisco-Cancer of the Prostate Risk Assessment.
On pathological analysis of RP specimens, 165 men (73%) had a traditional pathologic GS ≤ 3 + 4. The mean biopsy qGS was significantly lower in these men than in men with GS ≥ 4 + 3 tumors (mean biopsy qGS, 6.35 vs 6.89; P < .01).
The traditional biopsy GS matched the pathological GS in 68% of men, whereas 16% were upgraded and 16% were downgraded at the time of RP.
Figure 3 stratifies patients by biopsy qGS, and compares the percentage of men with traditional pathologic Gleason ≤ 3 + 4 tumors with the percentage with ≥ 4 + 3 tumors. There was a consistent trend toward a larger percentage of high-grade tumors in men with higher biopsy qGS.
Figure 4 shows a univariate ROC curve assessing the accuracy of traditional versus quantitative Gleason grading of prostate biopsies in predicting the traditional pathological GS after RP. The AUC of the qGS exceeded that of traditional Gleason grading (0.79 vs 0.71; P < .01). Figure 5a illustrates a decision curve comparing the usefulness of traditional versus qGS of biopsies. The net benefit of qGS exceeded that of traditional Gleason scoring at threshold probabilities (ie, probability of pathological upgrading) < 47%.
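The AUC can be read as the probability that a randomly chosen patient with pathological GS ≥ 4 + 3 has a higher biopsy score than a randomly chosen patient without, with ties counted as one half. The sketch below computes it this way on invented toy data; it assumes a binary outcome coding, uses hypothetical names, and does not include the significance test used to compare the 2 curves.

```python
def roc_auc(scores: list[float], outcomes: list[int]) -> float:
    """Mann-Whitney estimate of the area under the ROC curve."""
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # tied scores count as half a correct ordering
    return wins / (len(positives) * len(negatives))


# Invented toy data: 1 = pathological GS >= 4 + 3 at RP.
path_ge_4_3 = [0, 0, 0, 1, 0, 1, 1, 0]
biopsy_qgs = [6.2, 6.4, 6.6, 6.8, 6.8, 7.4, 7.0, 6.4]
biopsy_traditional = [0, 0, 0, 0, 0, 1, 1, 0]  # 0 = 3 + 4, 1 = 4 + 3

print(round(roc_auc(biopsy_qgs, path_ge_4_3), 3))          # 0.967
print(round(roc_auc(biopsy_traditional, path_ge_4_3), 3))  # 0.833
```

A 2-level score can only split patients into 2 groups, so many patient pairs end up tied; this is one reason a continuous score can achieve a higher AUC.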
Table 2 summarizes the results of multivariable logistic regression models investigating clinical factors associated with advanced traditional pathologic GS. In separate models, both traditional (odds ratio [OR], 18.74; 95% confidence interval [95% CI], 7.42-47.34) (model 1) and quantitative (OR, 1.78 per 0.2-point increase; 95% CI, 1.49-2.12) (model 2) Gleason grading of biopsies were significantly associated with advanced pathologic GS. However, in a backward stepwise selection model containing both the traditional GS and the qGS, only the qGS retained statistical significance (OR, 1.78 per 0.2-point increase; 95% CI, 1.49-2.12) (model 3).
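Because the qGS enters these models as a continuous predictor on the log scale, a ratio reported per 0.2-point increase (one 10% step in pattern 4 tumor) can be rescaled to any other increment. This small sketch assumes a linear effect on the log-odds or log-hazard scale; the helper names are hypothetical.

```python
import math


def ratio_per_increment(log_ratio_per_unit: float, increment: float) -> float:
    """Odds or hazard ratio for a given increase in a continuous predictor."""
    return math.exp(log_ratio_per_unit * increment)


def rescale_ratio(ratio: float, from_increment: float, to_increment: float) -> float:
    """Re-express a ratio reported per `from_increment` units per `to_increment` units."""
    return ratio ** (to_increment / from_increment)


# The biopsy qGS ratio reported per 0.2 points, re-expressed per whole qGS point.
print(round(rescale_ratio(1.78, 0.2, 1.0), 1))  # 17.9
```

The same conversion applies to the hazard ratios reported for BCR after RP.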
Table 2. Multivariable Regression Models of Clinical Factors Associated With Advanced (≥ Gleason 4+3) Traditional Pathological Gleason Score After Radical Prostatectomy
Of the 3479 men with prostate cancer in the UCSF UODB, 1157 underwent RP as their primary treatment and were found to have Gleason 7 tumors on final pathology. Of these, 618 patients had at least 2 postoperative PSA values and at least 1 year of follow-up after surgery, as well as the percentage of high-grade tumor recorded on their final pathology report. The mean follow-up for these men was 40 months (range, 12-113 months).
Table 1 shows the demographic and disease characteristics of these 618 men. The distributions of traditional GS and qGS for RP specimens are shown in Figure 2b. The majority of RP specimens (73.0%) were grouped into a single Gleason category (3 + 4), whereas the qGS enabled the finer categorization of risk according to extent of Gleason pattern 4 disease.
A decision curve comparing the traditional GS with the qGS of RP specimens is shown in Figure 5b. At threshold probabilities (ie, probability of BCR) < 28%, the net benefit of the qGS exceeded that of traditional Gleason grading.
For the entire cohort, the 5-year BCR-free survival rate after RP was 82%. The results of the Cox proportional hazards regression model investigating factors associated with BCR after RP are shown in Table 3. In the presence of other predictor variables, the traditional pathologic GS was not significantly associated with BCR after RP (HR, 1.48; 95% CI, 0.91-2.40) (model 1). In contrast, the pathologic qGS was significantly associated with the risk of BCR (HR, 1.13 per 0.2-point increase; 95% CI, 1.04-1.24) (model 2). Moreover, the qGS (HR, 1.15 per 0.2-point increase; 95% CI, 1.06-1.24) remained significantly associated with BCR in a backward stepwise selection model containing both the traditional GS and the qGS (model 3).
Table 3. Cox Proportional Hazards Model of Factors Associated With Biochemical Disease Recurrence After Radical Prostatectomy
In the current study, we present a novel method of assigning GS to prostate biopsy and RP specimens, transforming the GS for mixed Gleason pattern 3 and 4 tumors from a categorical to a continuous variable, the qGS. We have shown that applying this novel score to prostate biopsy and RP specimens results in superior risk stratification for men with prostate cancer compared with traditional Gleason scoring.
The GS of prostatic tumors, whether applied to prostate biopsy or radical prostatectomy specimens, is strongly associated with prostate cancer outcomes.3-5, 13 However, categorical reporting of Gleason patterns according to current standards may not be optimal. Compared with continuous variables, categorical variables offer less discrimination, resulting in decreased statistical power in risk assessment models. Moreover, although the Gleason system was originally designed to assign a score ranging from 2 to 10, the evolution of grading standards over time and preferential sampling of the peripheral zone on prostate biopsy have essentially eliminated GS 2 through 5.11, 14-16 Thus, the contemporary GS effectively ranges from 6 to 10,14, 17 forcing patients with a range of risk levels into relatively few Gleason categories.10, 16
In the current study, 83.6% of prostate biopsies and 73.0% of RP specimens were assigned a traditional GS of 3 + 4, indicating anywhere from 1% to 49% of Gleason pattern 4 tumor.14 It is unlikely that all of these patients harbored equivalent disease risk; rather, higher volumes of poorly differentiated disease are likely to be associated with increased risk. The qGS captures this continuum of risk and increases statistical power by assigning a continuous, decimal-valued GS based on the percentage of high-grade disease.
Compared with traditional Gleason grading, the qGS of prostate biopsies is better correlated with the “gold standard,” the traditional pathological GS of RP specimens. The suboptimal concordance of biopsy and pathological GS has been well documented, and several authors have explored various manipulations of grading prostate needle biopsy specimens to minimize this discordance.8, 9, 18 Whereas the majority of practitioners assign grade based on the most aggressive pattern of any individual needle biopsy specimen,8 the qGS accounts for the histology of all positive needle biopsy specimens. The data from the current study suggest that the qGS better estimates the grade of the entire volume of prostate tumor compared with traditional Gleason grading.
The improved risk assessment offered by the qGS carries several important clinical implications. A growing body of evidence suggests that active surveillance is an effective management strategy for appropriately selected patients with low-risk prostate cancer.19 Men with Gleason 7 tumors are typically considered to have intermediate-risk disease20 and are therefore often considered inappropriate candidates for active surveillance.19 However, recent data suggest that active surveillance may be safe in carefully selected men with intermediate-risk tumors, usually low-volume Gleason 3 + 4 lesions.21 The current practice of grading prostate biopsies based on the highest grade of any single core needle biopsy specimen could overestimate disease risk, thus excluding men from active surveillance and subjecting them to unnecessary prostate cancer treatment. In men with Gleason 3 + 4 tumors, the finer risk stratification offered by the qGS may allow the identification of a larger number of men who can be safely managed with active surveillance.
Furthermore, management strategies are often altered by the presence and/or extent of Gleason pattern 4 tumor. Because the risk of lymph node metastases is greater in men with higher grade tumors,22 lymphadenectomy is more often performed in this patient group during RP. For patients undergoing radiotherapy, dose escalation, concurrent pelvic radiotherapy, and/or hormonal therapy are often recommended based on increasing Gleason grade,23 interventions that carry with them the potential for treatment-related morbidity.24, 25 The qGS may allow for the identification of “lower risk” men who could be spared the morbidity associated with these additional treatments.
We have also shown that applying the qGS to RP specimens improves the prediction of BCR after surgery. Stamey et al first reported a similar association between the percentage of high-grade tumor and disease recurrence after RP,26 a finding that has been confirmed in a more contemporary cohort.27 However, to the best of our knowledge, the current study is the first to investigate the percentage of high-grade tumor in prostate biopsy specimens. Furthermore, our modification of the Gleason system incorporates these data into a single numerical GS. Because the range of this qGS is similar to that of the traditional GS (ie, 6-8), the qGS can be incorporated intuitively within existing risk stratification paradigms and in clinical practice. Moreover, the additional discrimination offered by the qGS may improve the performance of multivariable prediction models and nomograms. This additional risk discrimination may help to better identify those men who would benefit the most from adjuvant or early salvage treatment after RP, or allow for the intensity of follow-up regimens after RP to be tailored to an individual patient's risk profile.
A potential limitation of the current study is that only Gleason 7 tumors were included in the analysis. However, the weighted average of Gleason patterns used to calculate the qGS could easily be applied to other traditional Gleason grades. For example, a traditional Gleason 9 tumor would be assigned a qGS ranging from 8.0 to 10.0 based on the relative percentage of Gleason pattern 4 versus pattern 5 tumor. In addition, although the presence of tertiary Gleason pattern 5 is known to be an adverse prognostic indicator,18, 28 these data are not recorded in the UCSF UODB and therefore could not be assessed. However, tertiary patterns could easily be incorporated into the qGS by adding a third term into the weighted average equation.
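The extension proposed here amounts to a weighted average over however many patterns are present. A minimal sketch follows, assuming each pattern's share of total tumor is known and that a tertiary pattern is weighted by its volume like any other; whether that weighting captures the prognostic weight of tertiary pattern 5 is untested, and the function name is hypothetical.

```python
import math


def generalized_qgs(pattern_fractions: dict[int, float]) -> float:
    """Twice the weighted average of the Gleason patterns present in a specimen.

    pattern_fractions maps a Gleason pattern (1-5) to its fraction of total
    tumor; fractions must sum to 1. A tertiary pattern is simply a third entry.
    """
    if not math.isclose(sum(pattern_fractions.values()), 1.0):
        raise ValueError("pattern fractions must sum to 1")
    for pattern, frac in pattern_fractions.items():
        if pattern not in range(1, 6) or not 0.0 <= frac <= 1.0:
            raise ValueError(f"invalid entry: pattern {pattern}, fraction {frac}")
    return 2 * sum(pattern * frac for pattern, frac in pattern_fractions.items())


print(generalized_qgs({4: 0.5, 5: 0.5}))                      # 9.0, a Gleason 9 tumor
print(round(generalized_qgs({3: 0.6, 4: 0.35, 5: 0.05}), 2))  # 6.9, with tertiary pattern 5
```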
Additional limitations of the current study include its retrospective design, although all patients were enrolled and all data were collected prospectively. Furthermore, because the percentage of high-grade tumor has only been reported at our institution since July 2006, follow-up for the assessment of BCR after RP is relatively limited. Finally, pathologists across a range of practices do not uniformly report sufficient detail for calculation of the qGS. However, whether or not calculation of the qGS per se is the goal, we hope that reporting standards will evolve to include this degree of detail, which we believe can help inform better clinical decision-making.
The current study establishes the qGS, a simple modification of traditional Gleason grading techniques. This scoring system can be applied to both prostate biopsy and RP specimens, and appears to be better associated with prostate cancer outcomes compared with traditional Gleason grading. Validation studies and additional research are needed to explore the full potential of this novel method for grading prostate tumors.