Prof Rodolfo Montironi, Institute of Pathological Anatomy and Histopathology, Polytechnic University of the Marche Region (Ancona), School of Medicine, Azienda Ospedaliera Umberto I°, I-60020 Torrette, Ancona, Italy. e-mail: firstname.lastname@example.org
The Gleason grading system is a powerful tool to prognosticate and aid in the treatment of men with prostate cancer. The needle biopsy Gleason score correlates with virtually all other pathological variables, including tumour volume and margin status in radical prostatectomy specimens, serum prostate-specific antigen levels and many molecular markers. The Gleason score assigned to the tumour at radical prostatectomy is the most powerful predictor of progression after radical prostatectomy. However, there are significant deficiencies in the practice of this grading system. Not only are there problems among practising pathologists, but there is also a relative lack of interobserver reproducibility among experts.
Numerous grading systems have been designed for the histopathological grading of prostate cancer. The main controversies have been whether grading should be based on glandular differentiation alone or a combination of glandular differentiation and nuclear atypia, and whether prostate cancer should be graded according to its least differentiated or dominant pattern. The Gleason grading system, named after D.F. Gleason, is now the predominant grading system (Fig. 1A), and in 1993 it was recommended by a WHO consensus conference. The Gleason grading system is based on glandular architecture; nuclear atypia is not evaluated [2,3]. Nuclear atypia, as adopted in some grading systems, correlates with the prognosis of prostate cancer but there is no convincing evidence that it adds independent prognostic information to that obtained by grading glandular differentiation (i.e. pattern of growth) alone.
The aim of this review is to evaluate the current clinical significance of Gleason scores in needle biopsies and radical prostatectomy (RP) specimens; it also draws on the results of a recent WHO-sponsored consensus [4,5]. This contribution includes an analysis of the effect of pathological evaluation on the differences in Gleason scores between these types of specimens.
GLEASON GRADING SYSTEM
The Gleason grading system defines five histological patterns or grades with decreasing differentiation. The primary and secondary patterns, i.e. the most prevalent and the second most prevalent patterns, are added to obtain a Gleason score or sum.
Gleason pattern 1 is composed of a very well circumscribed nodule of separate, closely packed glands which do not infiltrate into adjacent benign prostatic tissue. The glands are of intermediate size, and similar in size and shape. This pattern is usually seen in transition zone cancers (Fig. 1B). Gleason pattern 1 is exceedingly rare.
Gleason pattern 2 is composed of round or oval glands with smooth ends. The glands are more loosely arranged and not quite as uniform in size and shape as those of Gleason pattern 1. There may be minimal invasion by neoplastic glands into the surrounding non-neoplastic prostatic tissue. The glands are of intermediate size and larger than in Gleason pattern 1. The variation in glandular size and separation between glands is less than that seen in pattern 3. Although not evaluated in Gleason grading, the cytoplasm of Gleason pattern 1 and 2 cancers is abundant and pale-staining (Fig. 1C). Gleason pattern 2 is usually seen in transition zone cancers but may occasionally be found in the peripheral zone.
Gleason pattern 3 is the most common; the glands are more infiltrative and the distance between them is more variable than in patterns 1 and 2. Malignant glands often infiltrate between adjacent non-neoplastic glands. The glands of pattern 3 vary in size and shape and are often angular (Fig. 1D). Small glands are typical for pattern 3, but there may also be large, irregular glands. Each gland has an open lumen and is circumscribed by stroma. Cribriform pattern 3 is rare and difficult to distinguish morphologically from cribriform high-grade prostatic intraepithelial neoplasia. The latter shows the presence of basal cells; these are lacking in cribriform pattern 3 prostate cancer.
In Gleason pattern 4, the glands appear fused, cribriform or they may be poorly defined. Fused glands are composed of a group of glands that are no longer completely separated by stroma (Fig. 1E). The edge of a group of fused glands is scalloped and there are occasional thin strands of connective tissue within this group. The hypernephroid pattern described by Gleason is a rare variant of fused glands, with clear or very pale-staining cytoplasm.
Cribriform pattern 4 glands are large or they may be irregular with jagged edges. As opposed to fused glands, there are no strands of stroma within a cribriform gland. Most cribriform invasive cancers should be assigned a pattern 4 rather than pattern 3. Poorly defined glands do not have a lumen that is completely encircled by epithelium.
In Gleason pattern 5, there is an almost complete loss of glandular lumina, with only occasional lumina apparent. The epithelium forms solid sheets, solid strands or single cells invading the stroma (Fig. 1F). Care must be applied when assigning a Gleason pattern 4 or 5 to limited cancer on needle biopsy to exclude an artefact of tangential sectioning of lower grade cancer. Comedonecrosis may be present.
GLEASON SCORES IN PROSTATE BIOPSIES
While the prime goal of the needle biopsy is to diagnose prostatic adenocarcinoma, once carcinoma is detected further descriptive information on the type, Gleason score and amount of cancer forms the cornerstone for managing the patient, assessing the potential for local cure and the risk for distant metastasis [6–9].
THE GLEASON SCORE AS A PROGNOSTIC FACTOR
The Gleason score of adenocarcinoma of the prostate is the quintessential prognostic factor in predicting findings in the RP specimen (pathological stage), biochemical failure, local recurrences and lymph node or distant metastasis in patients receiving no treatment, radiation therapy, RP and other therapies including cryotherapy and neoadjuvant therapy [9–12]. The needle biopsy Gleason score also correlates with virtually all other pathological variables, including tumour volume and inked margin status in RP specimens, serum PSA levels and many molecular markers [6–9]. Specifically, Gleason scores of 7–10 are associated with worse prognoses, and tumours with Gleason scores 5–6 are associated with lower progression rates after definitive therapy. The predictive value of the Gleason score is enhanced when combined with other clinical variables, including a DRE and serum PSA levels. In recent years, nomograms have been developed to predict pathological stage at RP and disease progression after surgery or radiation therapy. Nomograms typically include pretreatment variables including clinical stage, Gleason score, serum PSA, amount of cancer in needle biopsy, etc. [9,11,13]. Based on statistical modelling of cumulative, prospectively accrued data on large consecutive series of patients, the nomograms have reasonable discriminatory ability to predict (depending on the patient cohort of the nomogram and statistical modelling) the pathological stage, seminal vesicle involvement, lymph node metastases, biochemical failure, small-volume organ-confined tumours, response to radiotherapy, etc. Such nomograms are used with increasing frequency in clinical practice by urologists and radiation oncologists to counsel their patients on the therapeutic options and potential risk for failure, based on the therapy they may choose. 
Inclusion of the needle biopsy Gleason score in all clinically valid nomograms is testimony to the prognostic and predictive power of this grading system and its central role in the contemporary management of patients with prostate cancer [9,11,13]. The Gleason score is also often used to determine eligibility for clinical trials, including those for watchful waiting [14,15].
While the pivotal role of the Gleason score in the needle biopsy is not in question, the method of reporting needs clarification for a few issues, including some not addressed in the original Gleason system. The recommendations for reporting the Gleason score in needle biopsies [4,16–18] are:
• Report the primary and secondary pattern, and assign a Gleason score.
• If one pattern is present, double it to yield the Gleason score.
• A Gleason score can usually be assigned even if the cancer is extremely small.
• In a needle biopsy with more than two patterns, the worst pattern must be reflected in the Gleason score, even if it is not the predominant or secondary pattern (see text).
• Provide a Gleason score for each separately identifiable involved core.
• A diagnosis of Gleason score 2–4 should not be made (see text).
• Do not report the Gleason score after hormonal or radiation therapy, except if the cancer shows no treatment effect.
• Provide a Gleason score for all adenocarcinoma variants, i.e. ductal, signet ring and mucinous.
• Do not assign a Gleason score for small cell carcinoma and sarcomatoid carcinomas, but specify the histological variant pattern.
• Provide a Gleason score for adenocarcinoma morphological patterns (e.g. hypernephroid, foamy gland, atrophic, pseudohyperplastic).
• Provide a composite (overall) Gleason score for all cores for the patient.
• Provide the percentage of tumour with Gleason pattern 4 in Gleason score 7.
• Provide the percentage of tumour with Gleason patterns 4 and 5 in tumours with Gleason score 8–10.
The most significant new recommendation is to separately report the Gleason score for each recognisable core, irrespective of whether the cores are individually submitted (in individual containers signifying specific anatomical location, e.g. right base), or submitted together (more than one core, possibly sampling different areas of the prostate, e.g. three cores from the left apex, mid and base sent in one container). The needle biopsy core(s) with the highest Gleason score is often given the most weight in clinical decision-making and hence should be identifiable as a separate Gleason score, information which would be lost if individual cores were not graded. If extreme fragmentation makes grading of individual cores difficult, the emphasis should be to identify and provide information on the core with the highest Gleason score. A recent survey of the surgical members of the Society of Urologic Oncology indicated that 81% used the highest Gleason score in a positive biopsy, regardless of the overall percentage involvement, to determine their treatment plan. This paradigm was also used in creating and validating the Kattan nomograms and Partin tables that are currently in wide clinical use [20,21]. Assigning an overall (composite) score is optional.
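As a minimal sketch of the per-core reporting described above, the following Python fragment sums the primary and secondary patterns for each core, doubles a lone pattern, and picks out the core with the highest score. The `Core` class, its field names and the example sites are hypothetical, introduced only for illustration; they are not part of any reporting standard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Core:
    """One needle-biopsy core (hypothetical structure, for illustration only)."""
    site: str
    primary: int
    secondary: Optional[int] = None  # None when only one pattern is present

    def __post_init__(self) -> None:
        for pattern in (self.primary, self.secondary):
            if pattern is not None and not 1 <= pattern <= 5:
                raise ValueError("Gleason patterns range from 1 to 5")

    def score(self) -> int:
        # A single pattern is doubled; otherwise primary + secondary.
        secondary = self.primary if self.secondary is None else self.secondary
        return self.primary + secondary


def highest_scoring_core(cores: List[Core]) -> Core:
    """Return the core with the highest Gleason score (first one on ties)."""
    return max(cores, key=lambda core: core.score())


# Illustrative cores only; these are not patient data.
cores = [
    Core("right base", 3, 3),
    Core("left apex", 4, 3),
    Core("left mid", 3),  # only pattern 3 present, so the score is 3 + 3
]
worst = highest_scoring_core(cores)
print(worst.site, worst.score())
```

Keeping the score attached to each core preserves the information that would be lost if only a pooled grade were recorded.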
Another important change is the recognition and reporting of the tertiary pattern in needle biopsies. Tertiary patterns are uncommon but when the worst Gleason grade is the tertiary pattern, it should influence the final Gleason score. For example: a case with primary Gleason pattern 3, secondary pattern 4, and tertiary pattern 5 should be assigned a Gleason score of 8; a case with primary Gleason pattern 4, secondary pattern 3, and tertiary pattern 5 should also be assigned a Gleason score of 8 (the secondary pattern being taken as 4, the average of patterns 3 and 5) or, alternatively, a Gleason score of 9 (patterns 4 + 5).
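The tertiary-pattern rule can be sketched in Python under one simplifying assumption: when the tertiary pattern is the worst grade present, the score is taken as the primary pattern plus that worst pattern. This reproduces the first example in the text (3, 4, 5 gives 8) and, for the second example (4, 3, 5), yields the 4 + 5 = 9 reading rather than the averaged score of 8. The function name is hypothetical.

```python
from typing import Optional


def biopsy_score_with_tertiary(primary: int, secondary: int,
                               tertiary: Optional[int] = None) -> int:
    """Needle-biopsy Gleason score under the ASSUMED rule 'primary + worst'.

    If a tertiary pattern is present and is higher than both the primary
    and secondary patterns, it replaces the secondary pattern in the sum.
    """
    for pattern in (primary, secondary, tertiary):
        if pattern is not None and not 1 <= pattern <= 5:
            raise ValueError("Gleason patterns range from 1 to 5")
    if tertiary is not None and tertiary > max(primary, secondary):
        return primary + tertiary
    return primary + secondary


print(biopsy_score_with_tertiary(3, 4, 5))  # first example in the text
print(biopsy_score_with_tertiary(4, 3, 5))  # the 4 + 5 reading of the second example
print(biopsy_score_with_tertiary(3, 4))     # no tertiary pattern
```

The averaging alternative mentioned in the text is deliberately left out, because the text does not say how it would apply to other pattern combinations.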
GLEASON 4 PATTERN IN GLEASON SCORE 7 TUMOURS
The data on the importance of the percentage of Gleason 4 pattern in Gleason score 7 tumours are rapidly expanding [22,23]. In recently generated nomograms, patients with Gleason score 4 + 3 vs 3 + 4 are stratified differently, underlining the importance of the relative amount of pattern 4. Whether the actual percentage of pattern 4 tumour should be included in the report is not clear from the data published to date; if this emerges as an important variable, meaningful discriminatory thresholds for the percentage of pattern 4 will need to be defined.
GLEASON SCORE 2–4
The diagnosis of Gleason score 2–4 should not be made on needle biopsies; the reasons for this are compelling: (i) Gleason score 2–4 cancer is extraordinarily rare in needle biopsies compared with TURP specimens; (ii) there is poor reproducibility among experts for lower grade tumours; (iii) the correlation with the RP specimen score for Gleason 2–4 tumours is poor and about half of the RP specimens in one study had extraprostatic extension; and (iv) a ‘low’ score of Gleason 2–4 may misguide clinicians and patients into believing that there is an indolent tumour.
GLEASON SCORES IN RP SPECIMENS
The Gleason score assigned to the tumour at RP is the most powerful predictor of progression after RP.
Gleason scores 2–4 are rarely seen as the grade of the main tumour at RP for stages T1c or T2 disease. Tumours with these scores are typical in small multifocal incidental adenocarcinomas of the prostate, most commonly found within the transition zone. Because these tumours are small and anterior they are rarely seen on needle biopsy. The situation where Gleason score 2–4 tumours represent the major tumour is in RPs for tumour incidentally found on TURP (stages T1a and T1b). In one analysis of > 2494 men with clinically localized adenocarcinoma of the prostate, Gleason score 2–4 was the grade of the main tumour in only 2% of the RP specimens. This value represents a disproportionate number of T1a and T1b tumours compared to what would be seen in current practice, as this series encompassed older cases where RP for stages T1a and T1b disease was more prevalent. All men with only Gleason score 2–4 tumour at RP are cured.
Tumours with Gleason scores 5–6 at RP show a spectrum in biological behaviour, depending on other variables such as margin status and organ-confined status. It is important to recognize that most tumours with these Gleason scores are cured, regardless of whether they show extraprostatic extension or positive margins.
Tumours with a Gleason score of 7 have a significantly worse prognosis than those with a Gleason score of 6 [28,29]. Given the adverse prognosis associated with Gleason pattern 4, it would be expected that whether a tumour is Gleason score 3 + 4 or 4 + 3 would influence the prognosis. There are several studies addressing Gleason score 3 + 4 vs 4 + 3 at RP, with somewhat conflicting results.
One study reported no significant survival advantage for Gleason pattern 3 + 4 over 4 + 3; however, the lack of statistical significance in that study might be ascribable to there being too few patients and the inclusion of patients with positive lymph nodes and/or seminal vesicle invasion. In another study, Gleason score 3 + 4 or 4 + 3 correlated with both stage and progression, but the median follow-up was only 25.8 months and the difference between the scores was independently predictive only in men with serum PSA values of <10 ng/mL and in those with organ-confined disease.
Several other reports show that Gleason score 4 + 3 has a worse prognosis than Gleason score 3 + 4, yet it was not reported whether it was an independent prognosticator [32,33]. Other studies show a significant difference in recurrence-free survival rate between patients with Gleason score 3 + 4 and 4 + 3 tumours, independent of surgical margins and extraprostatic extension [27,29]. Men with Gleason score 7 tumours were stratified into four different prognostic groups; those with the best prognosis had either Gleason score 3 + 4 or 4 + 3 and organ-confined disease, or had extraprostatic extension of any degree with Gleason score 3 + 4 and negative surgical margins. In the next worst prognostic group, men had focal extraprostatic extension and only one adverse finding, meaning either Gleason score 3 + 4 with positive margins or Gleason score 4 + 3 with negative margins. The group after that had established extraprostatic extension with, again, only one adverse finding in terms of grade and margins. The patients with the worst prognosis were those who had focal or established extraprostatic extension and both adverse findings (i.e. positive margins and Gleason score 4 + 3). In a multivariate analysis, surgical margin status was more influential than the extent of extraprostatic extension in predicting progression after RP. It is unclear whether the adverse effect of positive margins relates to the intrinsic biology of disease or to the ability to achieve local control. Preoperative PSA levels did not add to the multivariate model.
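The four-group stratification of Gleason score 7 tumours described above can be written out as a small decision function. This is only a sketch of the grouping as worded in the text; the function name, the argument names and the numbering of groups from 1 (best) to 4 (worst) are assumptions made for illustration.

```python
def gleason7_prognostic_group(primary_is_4: bool, epe: str,
                              positive_margins: bool) -> int:
    """Return 1 (best prognosis) to 4 (worst) for a Gleason score 7 tumour.

    primary_is_4     -- True for Gleason 4 + 3, False for 3 + 4
    epe              -- extraprostatic extension: "none", "focal" or "established"
    positive_margins -- True if the surgical margins are positive
    """
    if epe not in ("none", "focal", "established"):
        raise ValueError("epe must be 'none', 'focal' or 'established'")
    if epe == "none":
        return 1  # organ-confined disease, either 3 + 4 or 4 + 3
    # The two adverse findings named in the text: 4 + 3 and positive margins.
    adverse = int(primary_is_4) + int(positive_margins)
    if adverse == 0:
        return 1  # extension of any degree, 3 + 4, negative margins
    if adverse == 2:
        return 4  # focal or established extension with both adverse findings
    return 2 if epe == "focal" else 3  # exactly one adverse finding


print(gleason7_prognostic_group(False, "established", False))
print(gleason7_prognostic_group(True, "focal", False))
print(gleason7_prognostic_group(False, "established", True))
print(gleason7_prognostic_group(True, "focal", True))
```

Counting the adverse findings first keeps the branching close to the wording of the text, where the extent of extension only matters when exactly one adverse finding is present.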
Gleason score 8–10 tumour accounted for only 7% of the grades seen at RP at one large centre. Typically, men with Gleason score 8–10 tumours have highly aggressive tumours and present at an advanced stage, such that they are not amenable to local therapy. Even in more recent series 70–91% of men with Gleason score 8–10 tumour do not present with organ-confined disease [34,35]. Overall, patients with Gleason scores 8–10 at RP have a 15% chance of having no evidence of disease at 15 years after surgery. Although there are few men in each study and the follow-up is fairly short, it was reported that when Gleason score 8–10 tumour is organ-confined the prognosis is significantly better [30,35–37].
PERCENTAGE GLEASON PATTERN 4/5
The group from Stanford has been a strong proponent of using the proportion of high-grade tumour as the preferred method for grading prostate cancer. However, the percentage of pattern 4/5 is only strongly predictive for progression at the extremes (> 70% or < 20% pattern 4/5). It has not been shown that classifying tumours based on the percentage of pattern 4/5 is more predictive than stratifying patients into Gleason scores 2–4, 5–6, 3 + 4, 4 + 3 and 8–10. Furthermore, assessing the percentage of Gleason pattern 4 is often difficult, as patterns 4 and 3 are often intimately admixed. Further difficulty in asking pathologists to derive a specific percentage of pattern 4/5 stems from studies showing interobserver variability in grading tumours with Gleason scores 5–7. Therefore, while an accurate measurement of the percentage of Gleason pattern 4 may not be practical, distinguishing Gleason score 3 + 4 from 4 + 3 is simpler and more likely to be part of a routine pathological examination.
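The five strata named above (Gleason scores 2–4, 5–6, 3 + 4, 4 + 3 and 8–10) can be assigned from the primary and secondary patterns alone, as the following minimal Python sketch shows. The function name and the string labels are hypothetical. A score of 7 made up of patterns other than 3 and 4 is rejected, because the text does not say where such a tumour would fall.

```python
def rp_stratum(primary: int, secondary: int) -> str:
    """Map primary and secondary patterns to one of the five strata in the text."""
    for pattern in (primary, secondary):
        if not 1 <= pattern <= 5:
            raise ValueError("Gleason patterns range from 1 to 5")
    score = primary + secondary
    if score <= 4:
        return "2-4"
    if score <= 6:
        return "5-6"
    if score == 7:
        if (primary, secondary) == (3, 4):
            return "3+4"
        if (primary, secondary) == (4, 3):
            return "4+3"
        # e.g. 2 + 5: not covered by the strata listed in the text.
        raise ValueError("score 7 is only stratified as 3 + 4 or 4 + 3")
    return "8-10"


print(rp_stratum(3, 4), rp_stratum(4, 3), rp_stratum(4, 5))
```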
TERTIARY GLEASON PATTERN
Within RP specimens, as a result of there being more tumour available for histological examination, a higher proportion of cases are found to contain more than two grades. Aihara et al. found an average of 2.7 different Gleason patterns per case and over half of cases contained at least three different grades in a series of 101 RPs. There is no consensus about how to grade these tumours, as the Gleason system only accounts for the primary and secondary patterns. The other controversy is how to grade tumours which are > 95% of one pattern, where there is only a very small percentage of higher grade tumour. For example, if a tumour is composed of > 95% Gleason pattern 3 and < 5% pattern 4, some experts would assign a Gleason score 3 + 3 = 6, as it was proposed that there must be > 5% of a pattern present for it to be incorporated within the Gleason score. Others might grade the tumour as Gleason score 3 + 4 = 7. In the only studies to address this issue, the existence of a high-grade component, even if it constituted < 5% of the whole tumour, had a significant adverse influence on the overall biological behaviour. The progression rate of Gleason score 5–6 tumours with a tertiary component of Gleason pattern 4 is almost the same as that of a pure Gleason score 7 tumour. Gleason score 7 tumours with tertiary pattern 5 are associated with progression rates after RP approximating those for a pure Gleason 8 tumour. However, there was no such significance in cases of Gleason score 8 with tertiary pattern 5, partly because of the limited sample size, yet it is also likely that Gleason score 8 tumours are already so advanced that the existence of pattern 5 elements makes no difference. Consequently, when a tumour contains tertiary high grades, the tumour should be graded routinely with a comment in the report noting the presence of the tertiary element.
CORRELATION AND SOURCES OF DISCREPANCIES BETWEEN NEEDLE BIOPSY AND RP GLEASON SCORES
There have been several studies addressing the correlation between Gleason scores in needle biopsies and corresponding RP specimens. Although earlier studies used the thicker (14 G) needle biopsies [43,44], more recent series are based on thin-core (18 G) needles used in conjunction with biopsy guns and TRUS guidance. Sextant or other methods of systematic sampling are typical in the more current series. In a recent compilation of data on 3789 patients from 18 studies, there was exact correlation of Gleason scores in 43% of cases and agreement within plus or minus one Gleason score unit in 77%. Under-grading of carcinoma in needle biopsy is the most common problem, occurring in 42% of all reviewed cases. Importantly, over-grading of carcinoma in needle biopsies may also occur, but this was only found in 15% of cases. In general, adverse findings on needle biopsy accurately predict adverse findings in the RP specimen, whereas favourable findings on the needle biopsy do not necessarily predict favourable findings in the RP specimens, largely through sampling error.
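The concordance figures quoted above (exact agreement, agreement within one score unit, under-grading and over-grading) can be computed from paired scores as in the following Python sketch. The function names are hypothetical, and the five score pairs are invented purely to exercise the code; they are not taken from any of the cited studies.

```python
from collections import Counter
from typing import Dict, List, Tuple


def compare_scores(biopsy: int, rp: int) -> str:
    """Classify one biopsy score against the matching RP score."""
    if biopsy == rp:
        return "exact"
    return "under-graded" if biopsy < rp else "over-graded"


def concordance_summary(pairs: List[Tuple[int, int]]) -> Dict[str, float]:
    """Return the fraction of cases in each category, plus agreement within one unit."""
    if not pairs:
        raise ValueError("at least one (biopsy, RP) pair is required")
    n = len(pairs)
    counts = Counter(compare_scores(biopsy, rp) for biopsy, rp in pairs)
    within_one = sum(1 for biopsy, rp in pairs if abs(biopsy - rp) <= 1)
    return {
        "exact": counts["exact"] / n,
        "under-graded": counts["under-graded"] / n,
        "over-graded": counts["over-graded"] / n,
        "within_one": within_one / n,
    }


# Invented (biopsy, RP) score pairs, for illustration only.
example_pairs = [(6, 7), (6, 6), (7, 7), (8, 7), (6, 8)]
summary = concordance_summary(example_pairs)
print(summary)
```

The exact, under-graded and over-graded fractions are mutually exclusive and sum to one, as do the 43%, 42% and 15% reported in the compilation.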
SOURCES OF DISCREPANCIES
Perhaps the most important factor is sampling error, which relates to the small amount of tissue removed by thin-core needle biopsies. The average 20-mm, 18-G core samples ≈ 0.04% of the average gland volume (40 mL). The most common type of sampling error occurs when there is a higher grade component present within the RP specimen which is not sampled on needle biopsy. This typically occurs when a needle biopsy tumour is graded as Gleason score 3 + 3 = 6. In the RP specimen there is a Gleason pattern 4 which was not sampled on the biopsy, resulting in an RP Gleason score of 3 + 4 = 7.
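The ≈ 0.04% figure can be checked with simple geometry. The sketch below treats the core as a cylinder 20 mm long and compares it with a 40 mL gland, both values from the text. The core diameter of about 1 mm is an assumption made here for illustration; the text does not give it.

```python
import math

CORE_LENGTH_MM = 20.0    # from the text
GLAND_VOLUME_ML = 40.0   # from the text
CORE_DIAMETER_MM = 1.0   # ASSUMPTION: tissue-core diameter is not given in the text


def sampled_fraction(length_mm: float, diameter_mm: float, gland_ml: float) -> float:
    """Fraction of the gland volume contained in one cylindrical core."""
    core_mm3 = math.pi * (diameter_mm / 2.0) ** 2 * length_mm
    gland_mm3 = gland_ml * 1000.0  # 1 mL = 1000 mm^3
    return core_mm3 / gland_mm3


fraction = sampled_fraction(CORE_LENGTH_MM, CORE_DIAMETER_MM, GLAND_VOLUME_ML)
print(f"{fraction * 100:.3f}% of the gland per core")
```

With the assumed diameter, one core comes to roughly 0.04% of the gland, in line with the figure quoted above.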
In some instances under-grading results from an attempt to grade very small areas of carcinoma, so-called ‘minimal’ or ‘limited’ adenocarcinoma. Scores of minimal adenocarcinoma in needle biopsies show a reasonably strong correlation with RP scores, but the Gleason scores do not have the same power to predict extraprostatic extension and positive margin status as they do in non-minimal carcinomas.
Over-grading can result from sampling error in cases where the high-grade pattern is selectively represented in needle biopsy. It may only represent a very minor element in the RP specimen. Even the same cancer focus may have different grades, depending on the area sampled.
The other source of discrepancy between biopsy and RP is borderline cases. In the description of the Gleason grading system there are some cases that fall at the interface between different patterns, where there will be interobserver variability and possibly even intra-observer variability.
Pathology error is most common when pathologists assign a Gleason score of ≤4 on a needle biopsy which in fact is Gleason score 5–6. Many pathologists under-grade needle biopsies by confusing quantitative changes with qualitative changes. When there is a limited focus of small glands of cancer on needle biopsy, by definition this is a Gleason pattern 3 (this consists of small glands with an infiltrative pattern). Taking a biopsy of truly low-grade adenocarcinoma of the prostate could not result in just a few neoplastic glands, but rather would be more extensive, as low-grade adenocarcinoma grows as nodules of closely packed glands rather than infiltrating in and amongst normal glands. Under-grading may result from difficulty in recognizing an infiltrative growth pattern or failing to recognize the presence of small areas of gland fusion.
PATHOLOGISTS’ EDUCATION AND EXPERIENCE
The pathologists’ experience in grading thin-core needle biopsies can also influence the overall correlation with RP results. With experience, pathologists recognize grading pitfalls, in particular that Gleason scores of 4 and lower are almost non-existent in the needle biopsy. Furthermore, small areas of fusion in the presence of a predominantly grade 3 background are recognized and will yield a Gleason score of 7, which often correlates well with RP results.
INTRA-OBSERVER AND INTEROBSERVER VARIABILITY
Reproducibility studies can be categorized as intra-observer and interobserver; for investigations of intra-observer agreement of Gleason grades, exact agreement was reported in 43–78% of cases [49,50], and agreement within plus or minus one Gleason score unit was reported in 72–87% of cases. Gleason wrote that he duplicated exactly his previous histological scores about half the time. Highly variable levels of interobserver agreement on Gleason scores were also reported, at 36–81% for exact agreement and 69–86% of observers within plus or minus one Gleason score unit. The reproducibility of Gleason grading can be improved by recognizing problematic areas and educating physicians via meetings, courses, website tutorials and publications that specifically focus on the Gleason grading system.
The Gleason grading system for prostatic carcinoma, based on glandular architecture, is the dominant method worldwide in research and in daily practice. The Gleason grading system should be used in all prostatic tissue samples, including needle-core biopsies and RP specimens. Its prognostic value was tested in a large population with a long-term follow-up that included the use of survival as an endpoint. The Gleason grading system shows a reasonable degree of correlation between biopsy and RP specimens. Several sources of discrepancy between these types of specimens have been identified. Further educational endeavours are needed to arrive at a greater consensus and accuracy in the use of the Gleason system.
This publication was supported by grants from the Polytechnic University of the Marche Region (Ancona) (M.S.) and the Italian Ministry of University and Scientific Research (R.M., 2003). The content of this paper is solely the responsibility of the authors and does not necessarily represent the official views of the Polytechnic University of the Marche Region (Ancona) (Italy).