The Phoenix definition of biochemical failure predicts for overall survival in patients with prostate cancer†
The contents are solely the responsibility of the authors and do not necessarily represent the official views of the National Cancer Institute or Varian Medical Systems.
The American Society for Therapeutic Radiology and Oncology (ASTRO) definition of biochemical failure (BF) incorporates backdating, resulting in an artificial flattening of Kaplan-Meier curves and overly favorable estimates when follow-up is short. The nadir + 2 ng/mL (Nadir + 2; Phoenix) definition reduces these artifacts. The objective of the current study was to compare ASTRO and Phoenix BF estimates as determinants of distant metastasis (DM), cause-specific mortality (CSM), and overall mortality (OM).
A total of 1831 patients with T1-4N0M0 prostate cancer were treated with external beam radiotherapy (RT) using conventional or three-dimensional conformal methods to at least 60 grays (Gy). The median follow-up was 71 months and the median RT dose was 72 Gy (range, 60–79 Gy). Cox regression models incorporating BF as a time-dependent covariate were used for both univariate and multivariate analyses. Other covariates included in the analyses were T classification, Gleason score, neoadjuvant/adjuvant androgen deprivation, age, RT dose, and pretreatment prostate-specific antigen.
BF was observed in 389 men (21%) using the Phoenix definition and 460 men (25%) using the ASTRO definition. DM was observed in 84 patients (5%), 48 patients (3%) died of prostate cancer, and 404 patients (22%) died of any cause. The Phoenix definition of BF was found to be a significant predictor of DM, CSM, and OM, after controlling for other significant covariates. The ASTRO definition was found to be associated with CSM and DM, but not OM.
The Phoenix definition of BF is a more robust determinant of patient outcome compared with the ASTRO definition. The correlation with mortality, including OM, and the independence of this correlation from the use of neoadjuvant/adjuvant androgen deprivation, supports the use of Nadir + 2 in prostate cancer clinical trials of RT with or without androgen deprivation. Cancer 2008. © 2007 American Cancer Society.
In the late 1990s, the American Society for Therapeutic Radiology and Oncology (ASTRO) consensus guidelines provided a starting point for a uniform definition of biochemical failure (BF) after treatment with radiotherapy alone.1 However, with greater patient numbers and longer follow-up, several weaknesses have become clear. First, the ASTRO definition backdates BF, causing an artificial early drop and late flattening of Kaplan-Meier curves. These effects contribute to a pronounced dependency on the length of follow-up.2, 3
Second, the ASTRO definition was not, at the time of its recommendation, associated with clinical outcomes. In fact, after its incorporation as an endpoint, evidence of the prediction of overall mortality (OM) remained elusive.4–8 It was not until 2004 that the ASTRO definition, used as a time-dependent variable, was shown to be correlated with OM.9
Third, ASTRO BF was originally designed as an endpoint for patients treated with external beam radiotherapy alone. Because there is a natural elevation of the prostate-specific antigen (PSA) level after the withdrawal of androgen deprivation (AD), the ASTRO definition will overestimate BF in these patients.10 It also fails in many cases to account for the PSA bounce phenomenon, which has been well described11–13; patients treated with brachytherapy have a greater risk of PSA bounce and are more apt to be misclassified using the ASTRO definition.
There are other BF definitions that are more accurate predictors of clinical outcome.14 Recently, many of these were compared in a pooled analysis.15 The results showed that when BF was described as a 2-ng/mL rise in PSA over the PSA nadir with BF designated as the time the event occurred (at call), the results were more sensitive and specific for clinical failure than the ASTRO definition and avoided its pitfalls. This analysis did not evaluate the correlations between the ASTRO and nadir + 2 ng/mL definitions with cause-specific mortality (CSM) or OM. Although the appeal of the nadir + 2 ng/mL (Nadir + 2) definition was acknowledged at a consensus conference in 2006,16 to our knowledge, there is little evidence that this definition is a stronger predictor of mortality.4–7 Although both the Nadir + 2 and ASTRO definitions have been shown to predict for overall survival, these studies did not compare the 2 definitions, were based on patients treated with low doses of radiotherapy (<70 grays [Gy] in the majority of cases), and did not include dose in their analyses.7, 9
As a potential surrogate endpoint in clinical trials, it is important that the Nadir + 2 definition be substantiated as a strong correlate of mortality. In the current study, we demonstrate that Nadir + 2 is a robust determinant of distant metastases (DM), CSM, and OM, much more so than the ASTRO BF.
MATERIALS AND METHODS
A total of 1831 patients with pathologically diagnosed prostate cancer who were treated at Fox Chase Cancer Center between 1987 and 2001 were included in our analysis. Postprostatectomy patients as well as those with lymph node-positive disease or evidence of metastasis at the time of presentation were excluded. Patient characteristics are shown in Table 1. The median age of the patients was 69 years (range, 43–89 years); 231 patients (13%) were aged ≤60 years, 773 patients (42%) were aged 61 to 70 years, and 827 patients (45%) were aged >71 years. The average initial pretreatment PSA (iPSA) level was 7.1 ng/mL (range, 0–371 ng/mL). There were 716 patients (41%) with T1 disease, 857 patients (49%) with T2 disease, and 171 patients (10%) with T3/T4 disease. Neoadjuvant, concurrent, and/or adjuvant hormone therapy was used in 291 patients (16%). The median length of AD was 7.69 months (range, 0–107.6 months). The Gleason score was 2 to 6 in 1244 patients (68%), 7 in 442 patients (24%), and 8 to 10 in 145 patients (8%). The median follow-up was 71 months (range, 1–204 months).
Table 1. Patient Characteristics
| 3-dimensional conformal||1750||95.6%|
|Initial hormone treatment|
| <6 mo||189||10.3%|
| >24 mo||102||5.6%|
|Salvage hormone treatment|
Patients were treated with conventional (81 patients) and 3-dimensional conformal radiotherapy (1750 patients) to a minimum dose of 60 Gy (range, 60–79 Gy). These methods have been reported previously.17 The International Commission on Radiation Units and Measurements reference point doses were used.18 Briefly, patients were simulated and treated in a supine position in an α-cradle cast for immobilization. In general, patients with T1-2AB disease and a Gleason score <7 received treatment to the prostate alone. Patients with T2C-T4 disease or a Gleason score of 7 to 10 were treated to a dose of 46 to 50 Gy to the prostate and periprostatic tissues (small pelvic field) followed by a boost to the prostate and seminal vesicles. The dose was prescribed to the 95% isodose line. All patients were treated with 10 to 18 megavolt photons. Patients treated using intensity modulated radiation therapy (IMRT) are not included in this analysis due to the short follow-up.
Endpoints and Statistical Analysis
The ASTRO definition of BF (3 consecutive PSA rises after the post-treatment PSA nadir, dated at the midpoint between the nadir and the first rise)1 was compared with the Radiation Therapy Oncology Group (RTOG)-ASTRO Phoenix definition (Nadir + 2).15, 16 DM was confirmed by imaging. CSM was defined as death determined to be due to prostate cancer. All patient data were collected prospectively by a single data manager. Calculations were begun from the completion of radiotherapy.
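To make the two definitions concrete, the following Python sketch applies them to a single patient's PSA series. It is an illustration only and not the software used in this study: the function names, the use of a running minimum as the nadir, and the rule of taking the first run of 3 consecutive rises are simplifying assumptions.

```python
from typing import Optional, Sequence


def phoenix_failure_time(times: Sequence[float], psa: Sequence[float],
                         rise: float = 2.0) -> Optional[float]:
    """Time of the first PSA value at least `rise` ng/mL above the nadir.

    Assumption: the nadir is the lowest PSA seen so far, and failure is
    dated "at call", that is, at the visit where the threshold is reached.
    """
    nadir = float("inf")
    for t, value in zip(times, psa):
        if value >= nadir + rise:
            return t
        nadir = min(nadir, value)
    return None


def astro_failure_time(times: Sequence[float],
                       psa: Sequence[float]) -> Optional[float]:
    """Backdated time of failure under a simplified ASTRO rule.

    Assumption: the first run of 3 consecutive rises defines failure, and
    the value just before that run is treated as the nadir. The event is
    dated midway between the nadir and the first rise.
    """
    for i in range(len(psa) - 3):
        if psa[i] < psa[i + 1] < psa[i + 2] < psa[i + 3]:
            return (times[i] + times[i + 1]) / 2.0
    return None


# Hypothetical PSA series (months after radiotherapy, ng/mL).
example_times = [3, 6, 9, 12, 15, 18]
example_psa = [1.0, 0.5, 0.8, 1.4, 2.6, 4.0]
print(phoenix_failure_time(example_times, example_psa))
print(astro_failure_time(example_times, example_psa))
```

In this hypothetical series the Phoenix rule dates failure at 15 months, the visit at which PSA first reaches the nadir plus 2 ng/mL, whereas the simplified ASTRO rule backdates the same failure to 7.5 months.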
Both definitions of BF were treated as time-dependent variables. Therefore, Cox regression models incorporating BF as a time-dependent variable were used for both univariate and multivariate analyses. Covariates included T classification (T1/T2 vs T3/T4), initial hormone treatment (yes vs no), Gleason score (2–6, 7, and 8–10), radiotherapy dose (continuous variable), iPSA (continuous variable), and age (≤60 years, 61–70 years, and >71 years). Kaplan-Meier estimates were calculated from the time of the completion of radiotherapy.
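One common way to represent a time-dependent covariate is to split each patient's follow-up into intervals over which the covariate is constant (the counting-process layout). The sketch below shows that layout for BF; the function name and tuple layout are hypothetical, and it is not a description of how the data in this study were actually prepared.

```python
from typing import List, Optional, Tuple

# (start, stop, bf_status, event_at_stop), with times in months from the
# completion of radiotherapy.
Interval = Tuple[float, float, int, int]


def split_follow_up(follow_up: float, event: int,
                    bf_time: Optional[float] = None) -> List[Interval]:
    """Split one patient's follow-up at the time of biochemical failure.

    Before bf_time the patient contributes person-time with bf_status 0;
    afterward with bf_status 1. The endpoint (for example death) is
    attached only to the last interval.
    """
    if bf_time is None or bf_time >= follow_up:
        # No BF recorded before the end of follow-up: a single interval.
        return [(0.0, follow_up, 0, event)]
    rows: List[Interval] = []
    if bf_time > 0:
        rows.append((0.0, bf_time, 0, 0))
    rows.append((bf_time, follow_up, 1, event))
    return rows


# Hypothetical patient: BF at 30 months, death at 71 months.
print(split_follow_up(71.0, 1, bf_time=30.0))
# Hypothetical patient: censored at 50 months without BF.
print(split_follow_up(50.0, 0))
```

With this layout, person-time before failure is counted as failure-free, so a patient who fails late is not treated as having failed from the start of follow-up.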
Cox regression models and Kaplan-Meier calculations were performed using both definitions of BF. A stepwise model reduction procedure was applied to determine the most parsimonious model.
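For readers unfamiliar with the Kaplan-Meier (product-limit) estimate, a minimal Python sketch follows. It is a generic textbook implementation under the assumption of right-censored data, with a hypothetical function name and made-up input values; it is not the statistical software used for the analyses reported here.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival) pairs at each distinct event time.

    events[i] is 1 if the endpoint was observed at durations[i] and 0 if
    the patient was censored at that time.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Made-up follow-up times (months) and event indicators.
print(kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]))
```

Because each step depends on the number of patients still at risk, moving event times earlier (as backdating does) changes where the drops in the curve occur.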
Results Using the Nadir + 2 Definition of BF
BF was observed in 389 patients (21%) using the Nadir + 2 definition. DM occurred in 84 patients (4.6%) and 48 patients (3%) died of prostate cancer. On univariate analysis, the Phoenix definition was found to be a significant predictor of DM, CSM, and OM, with hazard ratios (HRs) of 213, 517, and 2.2, respectively (Table 2).
Table 2. Univariate Analyses of Factors Related to DM, CSM, and OM
|DM||Nadir BF||Yes vs no||213.4||91.56–497.18||<.0001|
|ASTRO BF||Yes vs no||86.7||37.55–200.32||<.0001|
|Pretreatment PSA (Unit/10)||Continuous variable||1.01||1.011-1.007||<.0001|
|RT dose||Continuous variable||1.002||1.002-.937||.96143|
|T classification||T3/T4 vs T1/T2||4.49||4.488-2.814||<.0001|
|Hormone therapy||No vs yes||0.46||0.462-0.291||.00107|
|Gleason score||7 vs 2–6||3.93||3.927-2.398||<.0001|
|Gleason score||8–10 vs 2–6||7.08||7.077-3.876||<.0001|
|Age, y||61–70 vs ≤60||0.61||0.613-0.332||.11862|
|Age, y||≥70 vs ≤60||0.64||0.644-0.351||.15648|
|CSM||Nadir BF||Yes vs no||516.93||58.768–4546.9||<.0001|
|ASTRO BF||Yes vs no||39.90||39.89-200.32||<.0001|
|Pretreatment PSA (U/10)||Continuous variable||1.01||1.008–1.016||<.0001|
|RT dose||Continuous variable||1.06||1.060-0.967||.21564|
|T classification||T3/T4 vs T1/T2||7.55||7.553-4.195||<.0001|
|AD||No vs yes||0.31||0.311-0.172||.00011|
|Gleason score||7 vs 2–6||2.90||2.901-1.458||.00241|
|Gleason score||8–10 vs 2–6||8.48||8.483-4.069||<.0001|
|Age, y||61–70 vs ≤60||0.80||0.797-0.338||.60352|
|Age, y||≥70 vs ≤60||0.64||0.638–0.262||.32215|
|OM||Nadir + 2||Yes vs no||2.16||1.725–2.698||<.0001|
|ASTRO||Yes vs no||1.15||0.920–1.425||.22547|
|Pretreatment PSA (U/10)||Continuous variable||1.01||1.006-1.003||.00004|
|RT dose||Continuous variable||0.98||0.977-0.945||.17684|
|T classification||T3/T4 vs T1/T2||1.57||1.568-1.188||.00147|
|Hormone therapy||No vs yes||0.78||0.783-0.619||.04051|
|Gleason score||7 vs 2–6||1.27||1.274-1.006||.04419|
|Gleason score||8–10 vs 2–6||2.35||2.347-1.717||<.0001|
|Age, y||61–70 vs ≤60||1.20||0.803-1.802||.36929|
|Age, y||≥70 vs ≤60||2.34||2.339-1.590||.00002|
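A Cox model reports an HR as exp(β). If the intervals in Table 2 are Wald-type intervals of the form exp(β ± z·SE), which is an assumption because the interval method is not stated, then the HR is the geometric mean of the two bounds. The Python sketch below uses that relation to cross-check the three Nadir + 2 rows for which both bounds are given; the function names and the choice of z = 1.96 are illustrative.

```python
import math


def hr_from_interval(lower: float, upper: float) -> float:
    """Hazard ratio implied by a log-symmetric (Wald) interval."""
    return math.sqrt(lower * upper)


def log_hr_standard_error(lower: float, upper: float,
                          z: float = 1.96) -> float:
    """Standard error of log(HR) implied by the interval (z is assumed)."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


# (reported HR, lower bound, upper bound) copied from Table 2.
rows = {
    "DM, Nadir + 2 BF": (213.4, 91.56, 497.18),
    "CSM, Nadir + 2 BF": (516.93, 58.768, 4546.9),
    "OM, Nadir + 2 BF": (2.16, 1.725, 2.698),
}
for label, (reported, lower, upper) in rows.items():
    implied = hr_from_interval(lower, upper)
    se = log_hr_standard_error(lower, upper)
    print(f"{label}: reported {reported}, implied {implied:.2f}, SE {se:.3f}")
```

For these three rows the implied HR agrees with the reported HR to within rounding. The very wide intervals for DM and CSM correspond to a large standard error on the log scale, which is worth keeping in mind when comparing the magnitude of these HRs.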
Table 3 shows that Nadir + 2 BF as a time-dependent covariate in multivariate analysis was the most significant predictor of DM, CSM, and OM. On multivariate analysis for OM, BF, age, Gleason score, and T classification were all found to significantly increase the HR for death. T classification and Gleason score also were found to be predictive of DM, CSM, and OM. Age was not found on multivariate analysis to be a predictor of CSM or DM. Pretreatment PSA was not found to be a significant predictor of DM, CSM, or OM.
Table 3. Nadir + 2 Multivariate Analysis of Factors Related to DM, CSM, and OM
|DM||Nadir + 2||173||74–404||<.0001|
|Gleason score 7 vs 2–6||1.8||1.1–2.9||.01|
|Gleason score 8–10 vs 2–6||2.2||1.3–3.8||.005|
|T classification T3/T4 vs T1/T2||1.8||1.1–2.8||.02|
|CSM||Nadir + 2||308||38–2483||<.0001|
|Gleason score 7 vs 2–6||1.2||0.6–2.4||.6|
|Gleason score 8–10 vs 2–6||2.6||1.3–5.2||.01|
|T classification T3/T4 vs T1/T2||2.9||1.6–5.3||.001|
|OM||Nadir + 2||2.0||1.6–2.6||<.0001|
|Age 61–70 y vs ≤60 y||1.3||0.9–2.0||.2|
|Age ≥70 y vs ≤60 y||2.6||1.7–3.7||<.0001|
|Gleason score 7 vs 2–6||1.04||0.8–1.3||.7|
|Gleason score 8–10||1.6||1.2–2.1||.002|
|T classification T3/T4 vs T1/T2||1.4||1.02–1.8||.03|
Results Using the ASTRO Definition of BF
ASTRO BF was observed in 460 patients (25%). On univariate analysis, the ASTRO definition was found to be predictive of DM (HR of 87) and CSM (HR of 40), but not OM.
Table 4 shows the time-dependent multivariate analysis results using the ASTRO BF definition for DM, CSM, and OM. With regard to DM, ASTRO BF was found to be significant, with an HR of 63. Other significant covariates were Gleason score and T classification. For CSM, ASTRO BF was found to be a significant predictor, with an HR of 26. Other factors that were found to be predictive for CSM were iPSA, T classification, and Gleason score. The ASTRO definition was not found to be an independent predictor of OM (HR of 1.0). Factors that did predict for OM were age, iPSA, Gleason score, T classification, and radiotherapy dose.
Table 4. ASTRO Multivariate Analysis for DM, CSM, and OM
|Gleason score 7||1.7||1.1–2.7||.02|
|Gleason score 8–10||2.3||1.4–4.0||.002|
|T classification T3/T4||2.0||1.2–3.2||.004|
|Pretreatment PSA (Unit/10)||1.07||1.02–1.12||.01|
|Gleason score 7 vs 2–6||1.2||0.6–2.4||.5|
|Gleason score 8–10 vs 2–6||3.8||1.9–7.5||.0001|
|T classification T3/T4 vs T1/T2||2.9||1.6–5.4||<.001|
|Age 61–70 y vs ≤60 y||1.2||0.8–1.8||.32|
|Age ≥70 y vs ≤60 y||2.4||1.6–3.5||<.0001|
|Pretreatment PSA (U/10)||1.05||1.01–1.08||.01|
|Gleason score 7 vs 2–6||1.15||0.9–1.5||.2|
|Gleason score 8–10 vs 2–6||1.8||1.3–2.4||<.0001|
|T classification T3/T4 vs T1/T2||1.4||1.1–2.0||<.02|
|RT dose (Gy)||0.96||0.93–0.99||.02|
The ASTRO definition was the first consensus definition of PSA failure that promoted the standardization of analyses between institutions. However, several shortcomings have been recognized, including the artificial flattening of Kaplan-Meier curves and confounding bias from the length of follow-up. The Phoenix BF definition minimizes these problems and more accurately classifies BF in the setting of AD.15, 19 There is a limited experience with Nadir + 2 BF, particularly in terms of how this definition relates to mortality. This is especially true in patients treated with contemporary doses of radiotherapy.
In the current study, the Nadir + 2 definition was found to be the strongest determinant of DM, CSM, and OM after adjusting for the conventional clinical factors of iPSA, Gleason score, and stage, as well as radiotherapy dose and the use of neoadjuvant/adjuvant AD. The correlation between the Nadir + 2 definition and OM was notably robust, especially when considering that the ASTRO BF definition was not associated with OM. Although the ASTRO BF definition has been found to predict for OM,9 this correlation has been observed inconsistently.4–7
The current study is not the first to compare the ASTRO and Phoenix definitions.7, 20 However, to our knowledge, this is the first time the Phoenix definition has been shown to be superior to the ASTRO definition for these endpoints. The study by Kwan et al. was the first study to demonstrate that the Phoenix definition predicts for overall survival, but it did not discuss the pros and cons of various definitions for BF.7 In addition, the current study is the first to show this using high-dose radiotherapy. This is noteworthy given that dose affects DM and BF rates.21
The ASTRO definition previously has been shown to be inferior to other definitions by Kuban et al. and more recently in a pooled analysis by Horwitz et al.15, 22 In the latter study, the ASTRO and Phoenix definitions as well as multiple other definitions of BF were compared. Horwitz et al. concluded that the Nadir + 2 definition was the preferable definition given its simplicity, sensitivity, and specificity. However, survival endpoints were not evaluated in these studies.
The consensus conference statement from the RTOG-ASTRO has furthered this opinion. The Phoenix conference recommendation was for the Nadir + 2 definition to be accepted as the current standard. They further recommend that use of the ASTRO consensus definition be considered inappropriate if AD is used.16
The results of the current study demonstrate that prostate cancer recurrence or death due to prostate cancer was extremely uncommon without BF using either definition. The robust HRs noted in the current study are consistent with other published results.8, 23, 24 However, in these other studies, including the one by Pollack et al.,8 the correlation between BF and OM using the ASTRO definition was found to be marginal at best.
The benefit of AD in high-risk prostate cancer patients is well known. It is interesting that this therapy was not found to be predictive for outcome. Because the use of salvage AD was defined as a failure in both definitions, this would not affect our results. The percentage of patients in the current study who were treated with AD was also low, which would make our ability to detect statistical significance on multivariate analysis poor. Finally, the majority of these patients were treated with dose-escalated radiotherapy, in which the benefits of AD are less well known.
The current study contrasts the ability of the Phoenix and ASTRO definitions to predict clinical outcomes and survival in patients with prostate cancer. Based on the current study results, it is clear that the Nadir + 2 definition is a superior predictor of DM, CSM, and OM. This is especially important when trying to assess the impact of treatment modalities on a disease with a long natural history in a clinical trial. The Phoenix definition is a superior endpoint for use in clinical trials and also may be of use in discussions with patients.
The results of the current study demonstrate that the Phoenix definition of BF is an early predictor of OM, CSM, and metastatic disease. It is superior to the ASTRO definition with regard to these endpoints.
We thank Dr. Gerald Hanks for his leadership in the establishment of the Fox Chase Cancer Center database for the treatment of prostate cancer reported herein and Ruth Peter for her dedication to its maintenance.