Prior presentations of this study: Lee SJ, Richardson PG, Sonneveld P et al, Health-related quality of life (HRQL) with bortezomib compared with high-dose dexamethasone in relapsed multiple myeloma (MM): Results from the APEX study. Presented at the 41st Annual Meeting of the American Society of Clinical Oncology, Orlando, FL, May 13–17, 2005.
Stephanie J. Lee, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, D5-290, PO Box 19024, Seattle, WA 98109, USA. E-mail: firstname.lastname@example.org
Health-related quality of life (HRQL) was prospectively measured during the phase III APEX trial of bortezomib versus dexamethasone in relapsed multiple myeloma patients. The European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Questionnaire – Core (QLQ-C30) and Functional Assessment of Cancer Therapy/Gynecologic Oncology Group–Neurotoxicity (NTX) side-effects questionnaires were administered at baseline and every 6 weeks up to 42 weeks. Patients receiving bortezomib (1·3 mg/m2, days 1, 4, 8 and 11 for eight 3-week cycles, then days 1, 8, 15 and 22 for three 5-week cycles; n = 296) demonstrated significantly better mean Global Health Status over the study versus patients receiving dexamethasone (40 mg/d, days 1–4, 9–12, and 17–20 for four 5-week cycles, then days 1–4 only for five 4-week cycles; n = 302), plus significantly better physical health, role, cognitive, and emotional functioning scores, lower dyspnoea and sleep symptom scores, and better NTX questionnaire score, using multiple imputation to account for missing data. Results were similar using available-data analyses. Sensitivity analyses suggested that improved HRQL with bortezomib is at least partially explained by improved survival. These results show that bortezomib was associated with significantly better multidimensional HRQL compared with dexamethasone, consistent with the better clinical outcomes seen with bortezomib.
Patients diagnosed with MM suffer from pain, fatigue, reduced physical and role functioning, and reduced overall HRQL compared with an age- and gender-matched population. These symptoms can improve with successful treatment with either cytotoxic therapy or supportive care (Gulbrandsen et al, 2004). For example, in randomised studies, increases in haemoglobin following treatment with erythropoietic agents are associated with improvements in fatigue and overall HRQL (Osterborg et al, 2002; Hedenus et al, 2003). HRQL and performance status, whether measured at diagnosis or after therapy, have been reported to have prognostic value for survival similar to that of biological characteristics, such as β2-microglobulin and age (Wisløff & Hjorth, 1997), and various patient-reported outcomes have also been shown to help predict survival in conjunction with clinical data in MM (Viala et al, 2007). Indeed, results from a recent systematic review indicated that patient-reported outcomes in various cancer clinical trials provided distinct prognostic information for survival (Gotay et al, 2008).
Bortezomib (VELCADE, Millennium Pharmaceuticals, The Takeda Oncology Company, Cambridge, MA, USA and Johnson & Johnson Pharmaceutical Research & Development, L.L.C., Raritan, NJ, USA) was recently approved for the treatment of multiple myeloma based on the phase III VISTA trial; this expands the existing indication to include previously untreated multiple myeloma patients. Previously, based on the randomised, phase III APEX trial in relapsed MM patients, which compared single-agent bortezomib with high-dose dexamethasone (Richardson et al, 2005), bortezomib received full approval in the United States and European Union for the second-line treatment of MM (Kane et al, 2006). Patients treated with bortezomib had significantly longer median time-to-progression (6·2 vs. 3·5 months, P < 0·001), higher response rates (complete/partial response [CR/PR]: 38% vs. 18%, P < 0·001; CR/near CR: 13% vs. 2%, P < 0·0001), and improved survival (1-year survival rate: 80% vs. 66%, P = 0·003) (Richardson et al, 2005). Consequently, the dexamethasone arm was halted at interim analysis following the recommendation of an independent data-monitoring committee (IDMC). In an updated analysis of APEX, with a median follow-up of 22 months, there was a 6-month survival benefit for patients randomised to receive single-agent bortezomib (29·8 months) compared with dexamethasone (23·7 months) (Richardson et al, 2007).
Results from SUMMIT, a phase II study in relapsed and refractory MM, showed that response to bortezomib was associated with improved HRQL scores and suggested that patient-reported outcomes may serve as a complementary tool to traditional clinical assessments (Lee et al, 2003; Richardson et al, 2003; Dubois et al, 2006; Viala et al, 2007). Assessment of HRQL was included as a prespecified exploratory efficacy objective of the APEX trial. Here we report the results of this assessment and compare the effects of bortezomib and dexamethasone on HRQL domains and symptoms.
Study design and treatments
APEX was a prospective, international, randomised (1:1), open-label study that compared bortezomib with high-dose dexamethasone. Patients randomised to bortezomib received 1·3 mg/m2 on days 1, 4, 8, and 11 for eight 3-week cycles, followed by three 5-week maintenance cycles with bortezomib 1·3 mg/m2 on days 1, 8, 15, and 22. Patients randomised to dexamethasone received 40 mg/d on days 1 to 4, 9 to 12, and 17 to 20 for four 5-week cycles, and on days 1 to 4 only for five 4-week cycles. Further details have been published elsewhere (Richardson et al, 2005). The study was conducted according to the Declaration of Helsinki, the International Conference on Harmonisation, and the Guidelines for Good Clinical Practice.
The HRQL analysis included clinically evaluable patients who had participated in the APEX study and who had a valid HRQL assessment at baseline and at least one postbaseline assessment.
Two disease-specific HRQL questionnaires were administered to assess the impact of treatment: the European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire – Core (EORTC QLQ-C30) (Fayers et al, 1999), a validated cancer-specific HRQL questionnaire, and the Functional Assessment of Cancer Therapy/Gynecologic Oncology Group–Neurotoxicity (FACT/GOG-NTX) subscale (Calhoun et al, 2000, 2003), a side-effects questionnaire focused on neurotoxicity symptoms. The questionnaires were to be administered at baseline and weeks 6, 12, 18, 24, 30, 36, and 42. HRQL assessment was stopped when a subject discontinued from the protocol for any reason (patient or physician preference, disease progression, death, or premature closure of the study). Instruments were scored according to the developers’ recommendations. The EORTC QLQ-C30 comprises a global health/quality of life scale, five functional scales, three symptom scales, and six single item scales for symptoms. Scores range from 0 to 100. For quality of life and functional scales, higher scores reflect better quality of life or functioning; for symptom scales, higher scores reflect worse symptoms. The FACT/GOG-NTX comprises 11 individual items evaluating symptoms of neurotoxicity on a scale of 0 (not at all) to 4 (very much). For the present analysis, to align directionality with quality of life scales, items were reversed (reversed score = 4 – raw score); therefore, total scores ranged from 0 to 44, with higher values indicating a lower burden of neurotoxicity.
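The NTX scoring rule described above can be illustrated with a minimal sketch. The function name `ntx_total` is hypothetical, and the sketch assumes that all 11 items were answered; how partially completed questionnaires were handled is not stated here.

```python
def ntx_total(raw_items):
    """Return the reversed FACT/GOG-NTX total score.

    raw_items: 11 responses, each an integer from 0 (not at all) to 4 (very much).
    Each item is reversed (4 - raw), so the total runs from 0 to 44 and a
    higher total indicates a lower burden of neurotoxicity.
    Assumption: no missing items (missing-item rules are not given in the text).
    """
    if len(raw_items) != 11:
        raise ValueError("expected 11 item responses")
    if any(r not in (0, 1, 2, 3, 4) for r in raw_items):
        raise ValueError("responses must be integers from 0 to 4")
    return sum(4 - r for r in raw_items)
```

For example, a patient answering "not at all" to every item receives the maximum total of 44, and a patient answering "very much" to every item receives 0.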
The principal aim of the HRQL analysis was to determine the treatment effect of bortezomib versus dexamethasone in the Global Health Status (GHS) domain of the EORTC QLQ-C30 over the 42 weeks of the APEX study. Other aims were to determine treatment effects of bortezomib and dexamethasone in the remaining EORTC QLQ-C30 scores and in the FACT/GOG-NTX questionnaire over the study duration, and to assess the change in each component from baseline to best clinical response.
GHS assessment. Treatment differences between bortezomib and dexamethasone on longitudinal GHS of the EORTC questionnaire were assessed using generalised estimating equations (GEE) analysis of covariance (Zeger et al, 1988). The GEE approach considers all observations during the given time period, and accounts for the correlation among repeated observations from the same subject. For comparison between the treatment groups, the models included effects of treatment, and the covariates age, disease duration, number of prior therapies, and baseline HRQL value. Covariates were included regardless of the homogeneity of their distribution between treatment groups, as they have been shown to relate to HRQL and could increase precision in the treatment-group estimates. A significance level of 0·05 was used for these analyses.
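As a hedged illustration, the marginal mean model implied by this description can be written as below. The notation is assumed here rather than taken from the analysis plan: Y_ij is the GHS score of patient i at post-baseline assessment j, Y_i0 is the baseline score, and T_i is 1 for bortezomib and 0 for dexamethasone. The working correlation structure is not stated in the text and is left unspecified.

```latex
\mathrm{E}[Y_{ij}] = \beta_0 + \beta_1 T_i + \beta_2\,\mathrm{age}_i
  + \beta_3\,\mathrm{duration}_i + \beta_4\,\mathrm{prior}_i + \beta_5\,Y_{i0},
\qquad H_0\colon \beta_1 = 0 \ \text{tested at}\ \alpha = 0.05 .
```

Under this reading, the treatment effect of interest is the coefficient on T_i, adjusted for the four covariates.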
Patients contributed different numbers of observations to the GEE analysis, depending on their length of participation in the APEX study, which varied due to withdrawal from the study, loss to follow-up, closure of the dexamethasone arm, and death. To account for this, missing data were multiply imputed following a previously described approach (Little & Rubin, 1987). Patients with missing data due to death were assigned the lowest/worst possible score from the date of death through to the end of the study (Method A). Data missing for other reasons (i.e. for patients who were alive at the time of discontinuation) were first imputed once, using the Markov Chain Monte Carlo approach, to obtain a monotonic missing data pattern. The remaining missing data due to premature discontinuation were then multiply imputed using the propensity score method in SAS® software (SAS Institute Inc., Cary, NC, USA). Specifically, monotonic missing data were multiply imputed four times to form four complete data sets (the imputed values are drawn from a distribution predicted by the previous HRQL scores and covariates). This method results in valid statistical inferences that reflect the uncertainty due to missing values (Little & Rubin, 1987). Each of the four “complete” datasets was then analysed by standard complete data methods to assess treatment effect on the HRQL outcome; results from the four analyses were combined to yield a P-value assessing the significance of the treatment difference.
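The combination step can be sketched as follows. The text does not state which combining rule was applied; this sketch assumes the standard rules of Rubin for pooling an estimate across M imputed datasets. The function name `pool_rubin` and its inputs (one treatment-effect estimate and one squared standard error per imputed dataset) are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation results with Rubin's rules (assumed, not confirmed
    by the source as the rule actually used).

    estimates: treatment-effect estimate from each imputed dataset
    variances: squared standard error of each estimate
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists with at least two imputations")
    q_bar = mean(estimates)            # pooled point estimate
    w = mean(variances)                # within-imputation variance
    b = variance(estimates)            # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b            # total variance
    if b == 0:
        df = math.inf                  # no between-imputation variability
    else:
        df = (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df
```

The pooled estimate divided by the square root of the total variance is then referred to a t distribution with the returned degrees of freedom to obtain a single P-value. With only four imputations, as here, the between-imputation term carries a weight of 1·25.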
As there were more deaths on the dexamethasone arm than on the bortezomib arm, how they are treated in the analysis could have important implications. To reduce the potential impact of more deaths on the dexamethasone arm, the analysis using multiple imputation was also conducted with patients who died treated as withdrawals/missing (Method B) instead of being assigned the worst possible score from the date of death to study end. Other available methods for treating missing data, including the Sun and Song method (Sun & Song, 2001) and the Pattern-Mixture model (Michiels et al, 2002), were also employed. Conclusions drawn from these analyses generally supported the findings of our primary analyses and will not be discussed further.
Furthermore, analyses were conducted using only the available data, without multiple imputation, both with patients who died assigned the worst possible score (Method C) and with deaths treated as withdrawals/missing data (Method D).
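The two ways of handling deaths that distinguish Methods A and C from Methods B and D can be sketched as below. This is an illustration only: the function name `apply_death_rule`, the dictionary layout, and the use of week-level timing for the date of death are assumptions. The worst possible score is 0 for GHS, the functional scales, and the reversed NTX total, and 100 for symptom scales.

```python
# Scheduled assessment weeks as described in the text (baseline = week 0).
WEEKS = [0, 6, 12, 18, 24, 30, 36, 42]


def apply_death_rule(scores, death_week, worst=0.0, deaths_as_worst=True):
    """Build one patient's score series under a chosen death-handling rule.

    scores: dict mapping assessment week -> observed score
    death_week: week of death, or None if the patient did not die on study
    worst: worst possible value for the scale being analysed
    deaths_as_worst: True for Methods A/C, False for Methods B/D
    Returns a dict week -> score, with None marking a missing value.
    """
    series = {}
    for week in WEEKS:
        if death_week is not None and week >= death_week:
            # Assessments from death onwards: worst score, or missing.
            series[week] = worst if deaths_as_worst else None
        else:
            # Before death (or no death): keep what was observed, if anything.
            series[week] = scores.get(week)
    return series
```

Values left as None would then either be multiply imputed (Methods A and B) or simply omitted from the available-data analyses (Methods C and D).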
Individual EORTC scale scores and NTX assessment. Individual scale scores from the EORTC questionnaire and the NTX score were examined for treatment differences using Methods A–D. To avoid the risk of identifying non-existent treatment differences, analyses were adjusted for multiple comparisons using the Hochberg–Benjamini sequential testing procedure, keeping the experiment-wise error level at 0·05 (Hochberg, 1988; Westfall et al, 1999). Only adjusted P-values are presented for the individual EORTC scale scores and NTX score, except for Method B, for which this adjustment was not applicable as there were no initial statistical differences.
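A minimal sketch of a Hochberg (1988) step-up adjustment is given below for illustration. Whether it matches the exact procedure run by the authors is an assumption; the function name `hochberg_adjust` is hypothetical.

```python
def hochberg_adjust(p_values):
    """Return Hochberg step-up adjusted P-values, in the input order.

    The k-th largest raw P-value is multiplied by k, and a running minimum
    is taken from the largest P-value downwards so that adjusted values
    stay monotone. An adjusted value at or below 0.05 is declared
    significant while holding the family-wise error rate at 0.05.
    """
    m = len(p_values)
    # Indices ordered from the largest raw P-value to the smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, idx in enumerate(order, start=1):
        running_min = min(running_min, k * p_values[idx])
        adjusted[idx] = running_min
    return adjusted
```

Applied to the set of individual scale scores, each raw P-value from the GEE model would be replaced by its adjusted value before comparison with 0·05.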
Change from baseline to best response. Changes in scores from baseline to best clinical response were analysed within treatment groups using the Wilcoxon signed-rank test. These analyses did not include covariates. Within-group changes were also evaluated by response (CR/PR, minimal response [MR]/stable disease [SD], and progressive disease [PD]); it was anticipated that those who achieved a better clinical outcome (CR/PR) would have significantly improved HRQL scores, whereas those in whom the disease progressed would have significantly impaired HRQL. Changes in scores from baseline to best clinical response between treatment groups were analysed using a general linear model with effects for treatment group and the covariates age, disease duration, and number of prior therapies.
Baseline demographic and patient disposition
A total of 669 patients were randomised and 663 received at least one dose of study medication. From a population of 642 patients who completed at least one EORTC questionnaire, the population for the HRQL analysis of EORTC data included 296 bortezomib-treated and 302 dexamethasone-treated patients. Baseline demographic and clinical characteristics were comparable (Table I). A total of 45 patients were excluded from the analysis because they did not complete a baseline assessment (n = 23) or because they completed only a baseline assessment (n = 22). EORTC scores range from 0 to 100; higher numbers represent better functioning or worse symptoms. At baseline, prior to treatment, the mean EORTC scores were comparable for the bortezomib and dexamethasone arms, except for emotional functioning (76·2 vs. 72·2, P = 0·047), fatigue (35·7 vs. 40·8, P = 0·013), sleep (26·1 vs. 32·7, P = 0·020) and diarrhoea (7·5 vs. 10·4, P = 0·046), for which patients on the bortezomib arm reported better functioning and fewer symptoms.
Table I. Demographic and clinical characteristics of 598 subjects included in the HRQL analysis using the EORTC questionnaire.
Columns: Bortezomib (n = 296); Dexamethasone (n = 302); Total subjects (n = 598).
Rows:
Age, median years (range)
Male sex, n (%)
Race, n (%)
Karnofsky Performance Status, n (%)
Weight, median kg (range)
Disease duration from diagnosis, median years (range)
Time to prior treatment failure, n (%): progressed on or within 6 months; progressed more than 6 months
Number of prior treatments, n (%): 1 prior treatment; 2 or more prior treatments
From a population of 640 patients who completed at least one NTX questionnaire, the population for analysis included 303 bortezomib-treated and 303 dexamethasone-treated patients. A total of 34 patients who completed at least one NTX questionnaire were excluded from the analysis because they were missing a baseline assessment (n = 22) or because only a baseline assessment was available (n = 12). NTX scores range from 0 to 44; per the reversal of item scores used in the present analysis, a higher number denotes less neurotoxicity. At baseline, mean NTX scores were comparable between the bortezomib and dexamethasone groups (35·9 and 35·1, respectively).
Over time, the amount of missing data increased due to discontinuation from the protocol because of adverse events, disease progression, the premature termination of the dexamethasone arm of the study, and death. HRQL assessments were discontinued when patients stopped protocol treatment. Overall, 92%, 69%, 45%, and 29% of patients in the bortezomib arm completed at least 2, 4, 6, and 8 cycles, respectively, and 9% completed all protocol-specified treatment. In the dexamethasone arm, 77%, 36%, 21%, and 11% completed at least 2, 4, 6, and 8 cycles, respectively, and 5% completed treatment. The primary reasons for discontinuing treatment were progressive disease (29% and 52% of patients in the bortezomib and dexamethasone arms, respectively), adverse events (20% and 15%), and patient request (5% and 4%). Furthermore, 28% and 15% of patients in the bortezomib and dexamethasone arms, respectively, were receiving treatment at the time of the IDMC decision to stop the study. In the overall study population, the percentage of patients who died over time was greater in the dexamethasone arm than in the bortezomib arm: Week 6 (2·1% vs. 1·2%), Week 12 (7·7% vs. 3·9%), Week 18 (11·3% vs. 7·5%), Week 24 (14·9% vs. 8·7%), Week 30 (18·2% vs. 8·7%), Week 36 (21·1% vs. 10·8%) and Week 42 (22·9% vs. 12·6%). As a consequence of all these factors, at each time point, the amount of missing data for the analysis of GHS was: Week 6 (12·5%), Week 12 (22·7%), Week 18 (41·1%), Week 24 (57·4%), Week 30 (66·4%), Week 36 (69·9%) and Week 42 (75·6%).
GHS assessment. Generalised estimating equations analysis using Method A, multiple imputation with deaths assigned to the worst possible score, found a significant difference in GHS over the 42 weeks of the study, favouring the bortezomib arm (P = 0·001, Table II). No significant difference was seen using multiple imputation with deaths treated as withdrawals/missing data (Method B). Analyses using only the available data, with deaths either assigned the worst possible score (Method C) or treated as withdrawals/missing data (Method D), again found a significant difference in GHS favouring the bortezomib arm (both P = 0·002).
Table II. Statistical differences between changes in component/symptom scores in the longitudinal HRQL analysis over 42 weeks in patients treated with bortezomib or dexamethasone, as assessed using multiple imputation to account for missing data or using available data only, with deaths either assigned the worst possible score or treated as withdrawals/missing data.
Columns: EORTC QLQ-C30 component or symptom scale; GEE using multiply imputed datasets (M = 4), Method A (HRQL and NTX scores set to worst possible after death) and Method B (HRQL and NTX scores set to missing after death); GEE using available data, Method C (HRQL and NTX scores set to worst possible after death) and Method D (HRQL and NTX scores set to missing after death).
Rows:
Global Health Status
*Bortezomib significantly better than dexamethasone.
†P-values adjusted using the Hochberg-Benjamini sequential testing procedure to account for multiple testing.
EORTC QLQ-C30, European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire – Core; GEE, generalised estimating equations; FACT/GOG-NTX, Functional Assessment of Cancer Therapy/Gynecologic Oncology Group–Neurotoxicity questionnaire.
Figure 1 shows mean GHS by treatment arm over time using only the available data, with deaths assigned the worst possible score (Method C). Although GHS declined in both arms, bortezomib significantly delayed the time to a decline of at least 5 points, the minimum important difference, compared with dexamethasone (P = 0·030).
Individual EORTC scale scores. Analysis of the individual scale scores from the EORTC questionnaire confirmed that patients randomised to bortezomib had better or comparable HRQL relative to patients randomised to dexamethasone. The component scores for physical health, and role, cognitive, and emotional functioning, and the symptom scores for dyspnoea and sleep were significantly better for the bortezomib group by all of Methods A, C, and D, while other significant differences favouring bortezomib were noted for the component score of social functioning and the symptom scores for nausea, pain, diarrhoea, and financial impact by one or two of these methods of analysis (Table II). However, no significant differences were seen using Method B, multiple imputation with deaths treated as withdrawals/missing.
NTX assessment. For overall NTX score, statistically significant differences were seen, favouring the bortezomib arm, using both methods of analysis in which deaths were assigned the worst possible score (Methods A and C, Table II). However, using the methods in which deaths were treated as withdrawals/missing (Methods B and D), no significant differences were seen between bortezomib and dexamethasone. Figure 2 illustrates the differing results obtained in mean NTX score over time using all available data with deaths either assigned the worst possible score or treated as withdrawals/missing.
Change from baseline to best response. Across all response categories, using available data with deaths assigned the worst possible score (Method C), GHS declined over time from baseline to best response. Additionally, patients on the bortezomib and dexamethasone arms statistically significantly worsened from baseline to best response in 8 and 11 of the EORTC scale scores, respectively, while patients on the bortezomib arm significantly improved in the pain symptom score. Changes from baseline to best response did not significantly differ between the bortezomib and dexamethasone arms with the exception of the nausea symptom score, for which the magnitude of change favoured dexamethasone (bortezomib 7·1 points vs. dexamethasone 1·5 points, P = 0·010). For patients achieving CR/PR or MR/SD, there were no significant differences between the bortezomib and dexamethasone arms in the change from baseline to best response in EORTC component scores. However, there were significant differences in symptom score changes among CR/PR patients for sleep, which favoured bortezomib (P = 0·039), and for nausea and appetite, which favoured dexamethasone (P = 0·006 and P = 0·014, respectively).
As with the GHS scores, using Method C, mean overall NTX score declined over time. There were no significant differences between the bortezomib and dexamethasone arms in the changes in NTX score from baseline to best response in the overall study population or in patient groups categorised by response. In the bortezomib arm, there was a significant within-group decline in mean NTX score from baseline to best response among CR/PR patients (n = 102, P = 0·002). In the dexamethasone arm, a significant within-group decline was seen in the overall population (P = 0·003), but not when patients were categorised by response (P = 0·14, P = 0·056 and P = 0·14 for patients with CR/PR, MR/SD and PD, respectively).
Assessment of HRQL in conjunction with clinical outcomes provides an additional means of evaluating treatment effects. Treatment with bortezomib was associated with better HRQL than high-dose dexamethasone across multiple domains over 42 weeks of study treatment. Specifically, bortezomib was associated with better GHS, better physical health, role, cognitive, and emotional functioning, and less dyspnoea and sleep disturbance over time compared with high-dose dexamethasone.
As demonstrated by the different methods of data analysis used, the way in which deaths were treated had an important impact on data interpretation. The APEX trial showed that 1-year survival was 80% on the bortezomib arm and 66% on the dexamethasone arm (Richardson et al, 2005). Thus, our prespecified plan to set the HRQL of deceased patients to the worst possible value (Methods A and C) may have skewed the results in favour of bortezomib. While there is clear methodological support for this analytical approach, appropriate interpretation of these data requires knowledge of, and integration with, survival data. To reduce the potential impact of higher death rates in the dexamethasone arm, deaths were treated as withdrawals/missing data in two additional analyses (Methods B and D). Results of these sensitivity analyses are more interpretable to patients and clinicians as the analyses address the question: “What kind of health-related quality of life am I likely to experience if I am treated with one medication or the other?”, the answer to which is only meaningful to surviving patients. Regardless of the method used to analyse the data, HRQL was often better with bortezomib than with dexamethasone. Notably, in one of the four analyses (Method B), no statistically significant differences between bortezomib and dexamethasone were seen when deaths were treated as withdrawals/missing and multiple imputation was used to account for missing data, suggesting that improved HRQL with bortezomib is at least partially explained by improved survival.
Although there is a higher incidence of some toxicities with bortezomib than with dexamethasone (Richardson et al, 2005), this is not reflected in overall HRQL. Notably, although the APEX trial showed bortezomib to be associated with a significantly greater incidence of grade ≥3 peripheral neuropathy compared with dexamethasone, this HRQL analysis showed that bortezomib was better than, or comparable with, dexamethasone with respect to changes in NTX score over the study period. This may be because the NTX questionnaire measures several aspects of neurotoxicity and includes a number of questions not specifically related to assessment of chemotherapy-induced peripheral neuropathy. An analysis of NTX score limited to only those questions specific to assessment of peripheral neuropathy would likely have revealed HRQL differences between patients receiving bortezomib and dexamethasone reflective of the greater incidence of peripheral neuropathy in the bortezomib arm of the APEX trial. NTX scores did worsen significantly from baseline in patients who achieved CR/PR with bortezomib. Patients who achieved CR/PR in the bortezomib arm received more cycles of therapy (median 8) than patients who did not respond (median 4); this may account for the difference in NTX scores, as the risk of peripheral neuropathy increases during the first five cycles of therapy before reaching a plateau (San Miguel et al, 2005). Comparison of NTX scores may also be confounded by the improved survival with bortezomib; analyses by Methods B and D showed no significant differences between bortezomib and dexamethasone.
At baseline, prior to treatment, there were some differences in emotional functioning, fatigue, sleep and diarrhoea that favoured bortezomib. Given that the treatment arm was randomly assigned, HRQL assessments were made before therapy started, and the two groups were well balanced for clinical characteristics, these differences may be due to chance. Within the group achieving CR/PR with either bortezomib or high-dose dexamethasone, changes in HRQL from baseline were similar for most domains. The areas in which they differed (better sleep and more neurotoxicity in the bortezomib-treated group achieving CR/PR, and less nausea and anorexia in the high-dose dexamethasone-treated group achieving CR/PR) are consistent with clinical experience and support the validity of the HRQL instruments.
Recent data from a randomised trial published in abstract form suggest that a combination of lenalidomide plus low-dose dexamethasone (40 mg/d on days 1, 8, 15 and 22 every 4 weeks) is associated with better survival and less toxicity than a combination including high-dose dexamethasone, as used in the APEX study (Rajkumar et al, 2007). Adoption of a lower dexamethasone dose may slow the decline in HRQL seen with high-dose dexamethasone treatment, providing a better combination of response and HRQL. This hypothesis will need to be tested in future studies.
With an incurable disease like MM, efforts to improve survival are important. However, it is equally important to ensure that the quality of life is maximised during the extended survival period and that the side effects of treatment are not worse than the disease symptoms. This study confirms that the clinical benefits of bortezomib therapy demonstrated in the APEX trial are associated with significantly better HRQL than high-dose dexamethasone treatment. Compared with patients treated with high-dose dexamethasone, patients treated with bortezomib experience clinical benefits of living longer and living better. Whether this same HRQL advantage is maintained if lower-dose dexamethasone regimens are compared with bortezomib requires additional study.
We thank Lucy Cooke, Steve Hill, and Rosemary Washbrook for their assistance in drafting this manuscript. Lucy Cooke and Steve Hill are medical writers and Rosemary Washbrook is a medical editor with Gardiner-Caldwell London.