Evaluating the ability to detect change of health-related quality of life in children with Hodgkin disease

Abstract

BACKGROUND:

We evaluated 4 different health-related quality of life (HRQL) measures prospectively to determine their ability to detect change over time: the Health Utilities Index Mark 2 and Mark 3, the Pediatric Quality of Life Inventory (PedsQL) 4.0 Generic Core and Cancer Module, the EuroQol EQ-5D visual analogue scale (EuroQol), and the Lansky Play-Performance Scale.

METHODS:

Children with all stages of Hodgkin disease from 12 centers across Canada were asked to complete the 4 measures at 4 time points: 2 weeks after the first course of chemotherapy, on the third day of the second course of chemotherapy, during the third week of radiation, and 1 year after diagnosis.

RESULTS:

Fifty-one patients were enrolled in the study between May 1, 2002 and March 31, 2005. Two patients were excluded: 1 patient died shortly after the first time point and the other patient failed to complete any of the questionnaires. All measures showed a significant change between Time 1 and Time 4 (P < .05). When the change in child scores was analyzed between the time points using the child's self-reported change in HRQL, the PedsQL and the EuroQol showed significant change at all time points.

CONCLUSIONS:

All of the measures were able to detect change in a diverse group of children with Hodgkin disease. The PedsQL and the EuroQol appeared to be the most sensitive to change. Cancer 2010. © 2010 American Cancer Society.

Children with cancer have had a dramatic decrease in mortality over the past 30 years, with the overall 5-year survival in Canada estimated to be greater than 80%.1 In light of the excellent survival from many types of cancer in childhood, more emphasis is being placed now on assessment of quality of life, with the development of a variety of different measures. The Children's Oncology Group (COG), whose affiliated institutions treat approximately 94% of all children diagnosed with cancer in Canada and the United States,2 has not adopted a specific health-related quality of life (HRQL) measure for use in clinical trials and is reviewing the various tools available.

Among the most widely used HRQL measures are the Health Utilities Index Mark 2 and Mark 3 (HUI 2 and 3), which were developed initially for pediatric oncology.3, 4 Another popular measure is the Pediatric Quality of Life Inventory (PedsQL), which comprises a generic core with several disease-specific modules, including one for patients with cancer.5, 6 If either or both of these measures are to be used as an outcome in pediatric clinical studies, then we need to be clear about their measurement properties. In particular, can these measures detect small but clinically significant change over time (ie, are they responsive)? This has important implications for clinical trials that include the assessment of quality of life as a secondary outcome.

To help investigators choose HRQL measures, it is essential that we establish the limitations of the different types of measures so that they can select the optimal instruments for the particular study design. The purpose of this study is to determine the ability to detect change in HRQL over time (responsiveness) using 4 different HRQL measures: the PedsQL generic core and cancer modules, the HUI 2 and 3, the EuroQol EQ-5D visual analogue scale (EuroQol), and the Lansky Play-Performance Scale (Lansky). We decided to perform this study in patients with Hodgkin disease, as the median age of diagnosis is 13 years, with very few patients presenting before 8 years of age, allowing for self-completion of the measures by virtually all patients.7 This relatively homogeneous patient group allowed us to anticipate the pattern of change in quality of life to better assess the responsiveness properties of the measures.

A companion study looking at the agreement in HRQL scores among the children, their parents, and their clinic nurse has been submitted for publication.

MATERIALS AND METHODS

Recruitment of Participants

Patients and their families were recruited from oncology clinics at 12 pediatric centers across Canada: B.C. Children's Hospital, Vancouver, British Columbia; Alberta Children's Hospital, Calgary, Alberta; Stollery Children's Hospital, Edmonton, Alberta; Saskatoon Cancer Centre, Saskatoon, Saskatchewan; Cancer Care Manitoba, Winnipeg, Manitoba; Children's Hospital of Western Ontario, London, Ontario; McMaster Children's Hospital, Hamilton, Ontario; Hospital for Sick Children, Toronto, Ontario; Children's Hospital of Eastern Ontario, Ottawa, Ontario; Kingston General Hospital, Kingston, Ontario; Montreal Children's Hospital, Montreal, Quebec; and IWK Health Centre, Halifax, Nova Scotia. Each institution obtained approval from its local research ethics board before opening the study. Each participant was required to sign a consent form before enrolling in the study.

Eligible participants included all children between 8 and 17.99 years of age with a new presentation of pathologically confirmed Hodgkin disease. A lower age limit of 8 was chosen so that participants could self-complete all of the questionnaires. Hodgkin disease was selected as the optimal pediatric tumor group for this study design as the vast majority of newly diagnosed patients are older than 7 years of age. Patients had to be enrolled by the second week after starting chemotherapy so that all stages of disease could participate. Children who were unable to communicate in English were excluded. The study design was a prospective cohort study with repeated measures.

The research assistant approached consecutive newly diagnosed patients fitting the inclusion criteria. Consenting children were all asked to complete the HUI 2 and 3, the PedsQL generic and cancer modules, the EuroQol, and the Lansky at 4 time points (Fig. 1). Patients who did not receive radiation did not complete the Time 3 questionnaires. Questionnaires were given to the participants at a clinic visit and they were requested to complete the package that same day before leaving the clinic. If the patient had questions, then they were directed to the research assistant, not their parent.

Figure 1.

Depicted is the study design. Open arrows indicate timing of administration of the health-related quality of life measures: 2 weeks after the start of the first course of chemotherapy (Time 1), on Day 3 of the second course of chemotherapy (Time 2), during the third week of radiation (Time 3), and 1 year after diagnosis (Time 4).

Questionnaires

All of the measures were framed so as to encompass the HRQL of the preceding week.

PedsQL 4.0 Generic Core Scales and PedsQL 3.0 Cancer Module

The PedsQL takes a modular approach to measuring HRQL, comprising one generic core scale in combination with a disease-specific module. The measure was developed originally for a population of children with cancer.8

The generic component of the PedsQL 4.0 was designed to address the 3 core dimensions of health (physical, mental, and social) as delineated by the World Health Organization, as well as role (school) functioning.9 This scale comprises 23 items divided among 4 domains: physical functioning (8 items), emotional functioning (5 items), social functioning (5 items), and school functioning (5 items). The PedsQL 3.0 Cancer Module comprises 27 items divided among 8 domains, namely, pain and hurt (2 items), nausea (5 items), procedural anxiety (3 items), treatment anxiety (3 items), worry (3 items), cognitive problems (5 items), perceived physical appearance (3 items), and communication (3 items).10

The PedsQL maintains consistency across a broad age range of respondents and generally has a high level of internal and external reliability. This measure is deemed an acceptable tool for use in both individual patient analyses and primary analysis of HRQL in clinical trials.8

Health Utilities Index Mark 2 (HUI 2) and Mark 3 (HUI 3)

The HUI 2 and HUI 3 are complementary measures combined into a single questionnaire, the HUI23. They are generic, preference-based instruments that measure health status and HRQL. The HUI 2 assesses 7 attributes of health: sensation, mobility, emotion, cognition, self-care, pain, and fertility. Fertility was not assessed in this study. Although the HUI 3 also assesses cognition, emotion, and pain (using different constructs from HUI 2), it includes other attributes, such as ambulation, speech, hearing, vision, and dexterity. The HUI 2 health status classification system provides 3-5 levels of function for each attribute, while the HUI 3 system has 5-6 levels for each attribute. The 15-item HUI23 questionnaire for self-report allows the investigator to describe a subject's health status according to the classification systems of both the HUI 2 and the HUI 3, using corresponding coding algorithms. The respective utility functions then generate utility scores for single attributes and overall HRQL. The scores are defined on a scale from 0-1, but to facilitate comparison with the other measures, the results were multiplied by 100. The HUI 2 and HUI 3 demonstrate moderate convergent validity, good discriminant validity, and high test-retest reliability.11, 12

The EuroQol (EQ-5D)

The EuroQol was developed by an international, multidisciplinary group of researchers in adult patients to establish a standardized generic instrument capable of being expressed as a single index value. It is made up of 2 components: a description of the patient's health status and a self-rating of their current status using a visual analogue scale (VAS). The descriptive component comprises 5 dimensions, each having 3 levels: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. The VAS is a vertical thermometer anchored by the “best imaginable health state” at the top of the scale (100) and the “worst imaginable health state” at the bottom (0). The EuroQol has been shown to have good consistency, convergent validity, and reliability.13 For the purposes of this study, we used only the VAS for analysis, as there are no “value sets” for populations younger than 20 years of age to use with the descriptive component.

The Lansky Play-Performance Scale

The Lansky scale is a pediatric adaptation of the Karnofsky Performance Status Scale Index of global functioning. It comprises a single domain with 11 items scaled from 0 to 100, ranging from unresponsive (0) to fully active (100). The reliability and validity of the scale are very good.14 This measure has been used widely in pediatric oncology, but it is limited by looking solely at functioning. As it is the best known of all the various measures to pediatric oncologists, we have included it despite its limitations.

Global Rating of Change

The global rating of change provides an overview of the child's change in quality of life. At Time 2, Time 3, and Time 4, all participants were asked to rate whether the health of the child had improved, stayed the same, or worsened since the last time they completed the questionnaires.

A Priori Hypothesis

We hypothesized that the PedsQL 4.0 would be more responsive to change over time than the HUI 2 and 3 when administered to a group of children with Hodgkin disease at various stages of therapy. We expected that the HRQL would be lowest during chemotherapy (Time 1 and Time 2), increase during radiotherapy (Time 3), and be highest at 1 year after diagnosis (Time 4).

Statistical Analysis

We analyzed the correlation between measures as well as the responsiveness of the measures over time.

Correlation Analyses

We evaluated the associations between measures. The objective was to assess the concurrent validity of the 4 measures. The Spearman rank-order correlation coefficient was used. We adopted the scheme described by Juniper et al 1996 for evaluating correlations: 0.00 to 0.19, negligible or not correlated; 0.20 to 0.34, weakly correlated; 0.35 to 0.50, moderately correlated; and >0.50, strongly correlated.15
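As an illustration only, the correlation step might be sketched as follows. The function names are hypothetical and not taken from the study; the rank-based computation is the textbook definition of the Spearman coefficient, and applying the bands to the absolute value of the coefficient, with anything below 0.20 treated as negligible, is an assumption made here.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def correlation_strength(rho):
    """Label a coefficient with the bands quoted in the text.

    Assumption: the bands are applied to the absolute value of rho.
    """
    magnitude = abs(rho)
    if magnitude > 0.50:
        return "strong"
    if magnitude >= 0.35:
        return "moderate"
    if magnitude >= 0.20:
        return "weak"
    return "negligible"
```

Under this scheme a coefficient of 0.41 would be labeled moderate and one of 0.57 strong.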

Responsiveness Analyses

We used scatter diagrams to examine the trend over time in the HRQL scores.

To compare the responsiveness of the 4 HRQL measures to change in health status, we used 4 responsiveness statistics discussed by Deyo et al.16 These are as follows: 1) the standard t statistic, obtained from tests comparing mean HRQL scores between the beginning and end of a period, for patients whose health status changed over that period; 2) the effect size, calculated as the difference in mean HRQL scores between the beginning and end of a period, divided by the standard deviation of the scores at the beginning of the period, for patients whose health status changed over that period; 3) Cohen's effect size, computed by dividing the difference in mean HRQL scores between the beginning and end of a period by the standard deviation of the individual differences;17 and 4) the area under the receiver operating characteristic (ROC) curve. Effect sizes were interpreted as described by Cohen: 0.00-0.19, negligible; 0.20-0.49, small; 0.50-0.79, moderate; and ≥0.80, large.17
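The 4 statistics can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function names are hypothetical, the t statistic is assumed to come from a paired test, and the ROC area is assumed to compare score changes in patients who changed against those who remained stable, with ties counted as one half.

```python
from math import sqrt
from statistics import mean, stdev


def responsiveness(baseline, follow_up):
    """Return the paired t statistic and both effect sizes for paired scores."""
    diffs = [after - before for before, after in zip(baseline, follow_up)]
    mean_change = mean(diffs)
    sd_baseline = stdev(baseline)  # SD of scores at the start of the period
    sd_change = stdev(diffs)       # SD of the individual differences
    return {
        # Assumption: a paired t test on the individual differences.
        "t_statistic": mean_change / (sd_change / sqrt(len(diffs))),
        "effect_size": mean_change / sd_baseline,
        "cohen_effect_size": mean_change / sd_change,
    }


def roc_area(changes_in_changed, changes_in_stable):
    """Area under the ROC curve as the probability that a changed patient
    shows a larger score change than a stable one (ties count one half)."""
    wins = sum(
        1.0 if c > s else 0.5 if c == s else 0.0
        for c in changes_in_changed
        for s in changes_in_stable
    )
    return wins / (len(changes_in_changed) * len(changes_in_stable))


def interpret_effect_size(value):
    """Apply the Cohen bands quoted in the text to the absolute value."""
    magnitude = abs(value)
    if magnitude >= 0.80:
        return "large"
    if magnitude >= 0.50:
        return "moderate"
    if magnitude >= 0.20:
        return "small"
    return "negligible"
```

The two effect sizes share a numerator and differ only in the denominator, so a measure whose individual changes are very consistent can show a Cohen's effect size well above its baseline-standardized effect size.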

In addition, we performed a repeated measures analysis of variance (ANOVA) to compare the trends over time among instruments. Such comparisons are meaningful only when there are methodological grounds for expressing the instrument domain scores on a common scale (eg, between 0 and 100). Standard residual diagnostics were used to assess the goodness of fit of the ANOVA model. By using the fitted models, we derived the least squares mean estimates of domain scores at different time points, together with their 95% confidence intervals.

RESULTS

Fifty-one patients were enrolled in the study between May 1, 2002 and March 31, 2005, with 49 patients included in the final analysis. The 2 excluded patients included 1 child who died shortly after completing the first set of questionnaires and another child who failed to complete any of the questionnaires. During the study period, there were 4 patients who relapsed (8%). Patient characteristics and treatment details are summarized in Table 1.

Table 1. Patient Characteristics and Treatment Details (N = 49)

Characteristics         No. (%)
Mean age, range, y      14.7, 8.9–18.0
Male sex                22 (45)
Stage
  IIA                   19 (39)
  IIB                   5 (10)
  IVA                   9 (18)
  IVB                   6 (12)
  Other                 10 (20)
Chemo protocol
  POG 9425              11 (22)
  POG 9426              6 (12)
  AHOD 0031             18 (37)
  GPOH-HD95             5 (10)
  Other                 9 (18)
Radiation
  21 Gy                 27 (61)
  Other dose            5 (11)
  No radiation          12 (27)

POG 9425 indicates Pediatric Oncology Group Advanced Stage Hodgkin Disease study; POG 9426, Pediatric Oncology Group Early Stage Hodgkin Disease study; AHOD 0031, Children's Oncology Group Intermediate Risk Hodgkin Disease study; GPOH-HD95, German multinational study which includes all stages of Hodgkin Disease.

Correlation

All of the measures were strongly correlated at Time 1 (Table 2) except for the Lansky and the PedsQL cancer module, which were only moderately correlated.

Table 2. Correlation Between Questionnaire Scores at Time 1

                 Lansky   EuroQol   PedsQL   PedsQL cancer   HUI 2    HUI 3
Lansky           1.0
EuroQol          0.57a    1.0
PedsQL           0.70a    0.58a     1.0
PedsQL cancer    0.41a    0.52a     0.72a    1.0
HUI 2            0.59a    0.65a     0.73a    0.61a           1.0
HUI 3            0.60a    0.65a     0.72a    0.71a           0.83a    1.0

Lansky indicates Lansky Play-Performance Scale; EuroQol, EuroQol EQ-5D visual analogue scale; PedsQL, Pediatric Quality of Life Inventory; HUI 2, Health Utilities Index Mark 2; HUI 3, Health Utilities Index Mark 3.

a Correlation is significant at the 0.01 level (2-tailed).

Responsiveness

At 1 year from diagnosis (Time 4), 91% of patients rated that their health had improved from when they were receiving treatment, with 6% stating they were the same and 3% stating they were worse. All measures showed a significant change in summary scores between Time 1 and Time 4 (P < .05). The mean change in scores ranged from 10 for the HUI 2 to 27 for the Lansky. Figure 2 shows the change in the summary scores over the various stages of treatment for all children. Figure 3 shows the change for children who rated their health as improved at least once during the study and as the same or improved at the remaining time points; children were excluded if they rated their health as worse at any time point or did not complete the rating. This restriction ensured that responsiveness was assessed only in patients who improved during the course of the study, as there were too few patients to analyze the group that worsened. No child rated their health as the same throughout the 3 follow-up time points.
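The inclusion rule for the Figure 3 subgroup can be written as a short predicate. The sketch below is illustrative only: the function name and the rating labels are hypothetical, and it assumes the caller passes one rating per follow-up time point at which the child was due to complete the questionnaires, with None marking a rating that was expected but missing.

```python
def in_improved_group(ratings):
    """Decide whether a child belongs to the 'improved' subgroup.

    ratings: global ratings of change, each "improved", "same", "worse",
    or None when an expected rating was not completed (an assumption about
    how missing ratings are recorded).
    """
    # Any "worse" rating or any missing rating excludes the child.
    if any(r is None or r == "worse" for r in ratings):
        return False
    # Otherwise at least one "improved" rating is required.
    return "improved" in ratings
```

For example, a child rating "improved", "same", "improved" would be included, whereas one rating "improved", "worse", "improved" would not.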

Figure 2.

Change in child-reported health-related quality of life scores from the time of enrollment to 1 year after diagnosis shows (A) change in Health Utilities Index Mark 2 and Mark 3 scores, (B) change in the Pediatric Quality of Life Inventory generic and cancer module scores, and (C) change in the Lansky Play-Performance Scale and EuroQol EQ-5D visual analogue scale scores.

Figure 3.

Change in summary scores for participants who rated their health as improved shows (A) change in Health Utilities Index Mark 2 and Mark 3 scores, (B) change in the Pediatric Quality of Life Inventory generic and cancer module scores, and (C) change in the Lansky Play-Performance Scale and EuroQol EQ-5D visual analogue scale scores.

A comparison of the responsiveness of the various measures is summarized in Table 3. All of the Cohen's effect sizes were large (≥0.80) and clinically relevant, as were all of the baseline-standardized effect sizes except that of the HUI 2 (0.77), which fell just below the threshold for a large effect. When the changes between the individual time points were analyzed, all of the measures demonstrated a large effect size between Time 1 and Time 2. Between Times 2 and 3 and between Times 3 and 4, the HUI 2 and HUI 3 had negligible to small effect sizes, whereas the PedsQL, Lansky, and EuroQol had moderate to large effect sizes. The PedsQL cancer module had small to moderate effect sizes. A summary of the effect sizes can be seen in Table 4.

Table 3. Summary of Deyo's Statistics for Analysis of Responsiveness Between Baseline and Final Visit

                 Test
Measure          t Statistic   Effect Size   Cohen's Effect Size   Area Under ROC Curve
HUI 2            3.74          0.77          0.85                  0.801
HUI 3            3.87          0.86          0.90                  0.681
PedsQL           7.38          1.17          1.69                  0.690
PedsQL cancer    5.83          1.18          1.22                  0.688
Lansky           7.77          1.38          1.95                  0.547
EuroQol          5.93          1.33          1.36                  0.737

ROC indicates receiver operating characteristic; HUI 2, Health Utilities Index Mark 2; HUI 3, Health Utilities Index Mark 3; PedsQL, Pediatric Quality of Life Inventory; Lansky, Lansky Play-Performance Scale; EuroQol, EuroQol EQ-5D visual analogue scale.

Table 4. Effect Size of Change in Measure Scores Between Time Points

                 Effect Size                Cohen's Effect Size
Measure          1-2      2-3      3-4      1-2      2-3      3-4
HUI 2            1.12a    0.0      0.3      1.0a     0        0.3
HUI 3            1.61a    0.24     0.4      1.3a     0.2      0.4
PedsQL           1.53a    0.51b    0.5b     1.3a     0.6b     0.8a
PedsQL cancer    1.16a    0.38     0.3      0.9a     0.6b     0.5b
Lansky           1.08a    0.78b    1.0a     0.9a     0.5b     1.0a
EuroQol          2.03a    0.78b    0.9a     1.9a     0.6b     1.1a

HUI 2 indicates Health Utilities Index Mark 2; HUI 3, Health Utilities Index Mark 3; PedsQL, Pediatric Quality of Life Inventory; Lansky, Lansky Play-Performance Scale; EuroQol, EuroQol EQ-5D visual analogue scale.

a Indicates a large effect size (≥0.8).

b Indicates a moderate effect size (0.5-0.79).

DISCUSSION

The HUI 2 and 3, PedsQL, Lansky, and EuroQol all provide different information regarding a patient's quality of life. The HUI 2 and 3 are generic, preference-based measures that provide utility scores that can be used for economic evaluation. The PedsQL is a comprehensive modular health profile with a generic core questionnaire and a disease-specific module. The Lansky scale is a simple tool that asks a single question about global functioning. The VAS from the EuroQol allows patients to “ball park” their quality of life on a line anchored on either end by what they consider to be the best and worst imaginable health state.

There was strong correlation among all the measures, supporting the finding that they are all measuring a similar construct: HRQL. All of the measures were able to detect the change between baseline and the end of therapy, and all with major effect sizes (>0.8). Not surprisingly, the EuroQol was one of the most sensitive to change, as patients are able to succinctly capture all relevant aspects of their own HRQL using a VAS, allowing very detailed assessment of change. The main limitation of a VAS is that it does not allow the investigator to differentiate which specific area of HRQL is affected. The Lansky scale is similarly limited in that it asks only about physical functioning. The advantage of tools such as the PedsQL and the HUI is that they are subdivided into domains (dimensions or attributes) of health, so that we can assess the area in which the majority of the improvement or deterioration occurs: is it mainly emotional, physical, or social?

Fortunately, all of the measures showed major effect sizes from Time 1 to Time 4, albeit in the setting of a significant change in clinical status. Whether these measures are able to detect more subtle changes is an important issue that is not fully resolved. Of the more comprehensive measures, the PedsQL showed moderate to large Cohen's effect sizes when the patients went from receiving chemotherapy to radiotherapy to off therapy, whereas the HUI 2 and 3 showed only small to no effect sizes.

A smaller single institution study looking at a heterogeneous group of pediatric cancer patients also found improved responsiveness characteristics for the PedsQL when compared with the HUI 2 and 3.18 A detailed analysis of responsiveness was carried out in adults with epilepsy, which showed the EuroQol to be the measure most responsive to change with a moderate effect size, with the HUI 3 having a small effect size and the HUI 2 a negligible effect size. However, when analyzed by using the minimally clinically important difference and the “really important difference,” the HUI 2 was the most responsive.19 A study of angioplasty patients showed a significant change in HUI 3 scores from baseline but did not report on effect sizes.20 Another study of the HUI 2 and 3 found a large effect size for the change in both measures in osteoarthritis patients before and after total hip arthroplasty.21

One limitation of our study is that when assessing change, one should compare the change in scores between stable patients and patients who either improve or worsen. In our study, virtually all of the patients improved, making this analysis impossible. This is not surprising given the clinical setting: indeed, it would have been surprising not to see an improvement in HRQL going from cancer therapy to off treatment. The few children who did not improve were the patients who relapsed, which occurred in only 4 subjects in our study population. Because of this, we are unable to assess if any of the measures were “too sensitive” to change, but this is unlikely because all of the tools have previously been shown to be reliable. Another limitation is that we did not collect data on nonparticipants, and so we cannot be certain of how representative our study sample is.

Conclusion

In children with Hodgkin disease, all of the measures used showed significant change in HRQL from 2 weeks after receiving their first chemotherapy to 1 year after diagnosis. We recommend that, in particular, the PedsQL and the EuroQol be included as outcome measures in any future studies in children with cancer when change in HRQL needs to be measured. If utility or economic analysis is part of the study, then the HUI 2 and 3 should also be included.

Acknowledgements

We acknowledge Dr. Ronald Barr, Dr. Rochelle Yanofsky, Dr. Paul Rogers, and Dr. Anne-Sophie Carret for their contributions as coauthors of this manuscript.

CONFLICT OF INTEREST DISCLOSURES

This research was supported by the Hospital for Sick Children Foundation.
