Development and validation of the revised Cedars-Sinai health-related quality of life for rheumatoid arthritis instrument

Abstract

Objective

To improve accuracy and content coverage of the original 33-item Cedars-Sinai Health-Related Quality of Life for Rheumatoid Arthritis Instrument (CSHQ-RA).

Methods

A total of 312 RA patients from 55 sites were screened in a 24-week trial. Patients completed an expanded 48-item version of the CSHQ-RA, Medical Outcomes Study Short Form 36 (MOS SF-36), and Stanford Health Assessment Questionnaire (HAQ) Disability Index at 5 visits. The revised CSHQ-RA was created based on response frequencies and distributions, item-to-item correlation, factor and Rasch analysis, and input from experts. Psychometric evaluation included internal consistency, test–retest reliability, convergent and discriminant validity, and responsiveness. Minimum clinically important difference (MCID) was also measured.

Results

Response rates were 93% at baseline and 71% at 12 weeks. Eighty-one percent of respondents at baseline were women, mean ± SD age was 52 ± 12 years, and mean ± SD duration of RA was 10.8 ± 10.4 years. The revised CSHQ-RA included 36 items measuring 7 domains (4 original and 3 new). All Cronbach's alpha coefficients were >0.8, indicating good internal consistency. Test–retest reliability was good, with intraclass correlation coefficients ranging from 0.86 to 0.95. All 7 domains correlated significantly with the MOS SF-36 and HAQ, indicating good convergent validity. Analysis of variance of disability group scores showed good discriminant validity (P < 0.0001). The MCIDs ranged from 6.2 for social well-being to 14.8 for pain/discomfort.

Conclusion

The revised CSHQ-RA was validated using a broader RA patient population. It captures 3 additional domains (social well-being, pain/discomfort, and fatigue), which allow for measuring all important aspects of health-related quality of life.

INTRODUCTION

In 2001–2002, the original Cedars-Sinai Health-Related Quality of Life for Rheumatoid Arthritis Instrument (CSHQ-RA) (1, 2) was specifically designed to measure the impact of RA on patients' health-related quality of life (HRQOL) (3). The CSHQ-RA is a disease-specific instrument that differs from well-validated instruments such as the Health Assessment Questionnaire (HAQ) Disability Index, which measures functional status. The original CSHQ-RA looks at the broader and more inclusive construct of HRQOL. The instrument has 33 items, measuring 5 domains: dexterity, mobility, physical activity, emotional well-being, and sexual function. There is also a short form version containing 11 items (4). High correlations between the original CSHQ-RA subscales and the Stanford HAQ Disability Index and Medical Outcomes Study Short Form 36 (MOS SF-36) demonstrated the convergent validity of the instrument (2). The subscale scores of the original CSHQ-RA also showed high reproducibility over the 4-week test–retest time frame (2).

However, the original CSHQ-RA has several limitations due to the study design and convenience sample of patients with RA used in its validation. First, although the original CSHQ-RA was found to be valid and reliable, important HRQOL domains such as social well-being were not measured. Second, the initial psychometric assessment of the original CSHQ-RA did not determine the magnitude that represents a minimum clinically important difference (MCID), and the cross-sectional study design precluded testing of the instrument's responsiveness to changes in health states. Consequently, it is not clear how accurately the original CSHQ-RA can measure changes in HRQOL in RA patient populations over time and across treatment groups.

To address these limitations, an effort to revalidate the instrument was made while at the same time a revised version of the CSHQ-RA was developed. Because the original CSHQ-RA has been used in several clinical studies in different countries, it was important to further validate the original instrument (5). Meanwhile, the present study seeks to develop a more comprehensive version of the CSHQ-RA and to assess the psychometric characteristics of the revised instrument using a larger and more geographically distributed RA patient population. A further objective was to determine the magnitude of MCID in the scores on the revised CSHQ-RA and its responsiveness to changes in health status in patients with RA.

PATIENTS AND METHODS

Role of the study sponsor.

Amgen had a main role in study design, data collection, data analysis, and writing of the manuscript. Both Amgen and Cerner LifeSciences had roles in agreement to submit the manuscript and in approval of the content of the manuscript.

Data sources and material.

New items.

In an effort to expand the clinically relevant areas, 15 items were added to the original CSHQ-RA by 5 of the 6 experts who developed the original 33-item CSHQ-RA instrument. The 48-item expanded survey (see Appendix A, available at the Arthritis Care & Research Web site at http://www.interscience.wiley.com/jpages/0004-3591:1/suppmat/index.html) included additional items selected from the original item pool and items developed by the panel to fill perceived gaps in the original CSHQ-RA.

Phase IV anakinra trial.

Data were collected during a phase IV, multicenter, open-label, single-arm study of patients with RA receiving anakinra therapy. Patients were enrolled at 55 sites, mainly clinics, that were evenly distributed throughout the US. All patients were ≥18 years old with active RA. Active RA was defined by the presence of at least 3 of the following 4 criteria: ≥3 swollen joints, ≥3 tender joints, morning stiffness ≥30 minutes, or C-reactive protein levels ≥1.0 mg/dl or erythrocyte sedimentation rate ≥28 mm/hour (required only if the previous 3 criteria were not all met). Patients were included only if they were currently receiving stable doses of disease-modifying antirheumatic drugs for a minimum of 2 months, stable doses of nonsteroidal antiinflammatory drugs and oral corticosteroids for 1 month, or were not receiving either therapy.

Patients were screened in an initial session; received anakinra injections on site at baseline, 4-week, 12-week, and 24-week visits; and received tools for daily self-injection between clinic visits. At screening and baseline sessions, demographic information was collected and patients completed the expanded CSHQ-RA questionnaire, as well as the HAQ and MOS SF-36. The 3 questionnaires were each administered again at the 4-week, 12-week, and 24-week visits. All data were collected from April 2002 to September 2003.

Missing data.

If responses to more than half of the items in a subscale of the MOS SF-36 were missing, then the observation(s) were not included. Otherwise, the given item was assigned a value equal to the mean of the subscale (6). No imputation methods were used to fill in missing data on the HAQ or the expanded CSHQ-RA. Responses with missing values were included in the initial item reduction analysis of the expanded CSHQ-RA. Responses were not included in the factor analysis if a response to any of the expanded CSHQ-RA items was missing. The number of patients with missing data on any of the 48 items ranged from 1 to 28 (0.3% to 9.6%). Only 2 items had >5% missing data. Patients who had missing values for any given subscale of the expanded CSHQ-RA were not included during the convergent and discriminant analyses of that subscale. For regression and discriminant analysis, only patients with complete data for the specified analysis were used.
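The half-scale rule described above for the MOS SF-36 can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, and reading "the mean of the subscale" as the mean of that respondent's answered items in the subscale is an assumption rather than a detail stated in the study.

```python
from statistics import mean
from typing import List, Optional, Sequence


def impute_subscale(items: Sequence[Optional[float]]) -> Optional[List[float]]:
    """Apply the half-scale rule to one respondent's subscale.

    Returns None when more than half of the items are missing (the
    observation is excluded). Otherwise each missing item is replaced by
    the mean of the answered items (assumed interpretation).
    """
    answered = [x for x in items if x is not None]
    n_missing = len(items) - len(answered)
    if not answered or n_missing > len(items) / 2:
        return None
    fill = mean(answered)
    return [fill if x is None else x for x in items]


# Hypothetical 4-item subscale with one unanswered item.
completed = impute_subscale([1, None, 3, 5])
```

Exactly half of the items missing is still imputed under this reading, because the text excludes only cases with more than half missing.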

Development of the revised CSHQ-RA.

The first stage of the development of the revised CSHQ-RA was an initial item reduction based on statistical characteristics of the responses to all 48 items on the expanded CSHQ-RA, including response rates, floor and ceiling effects, and item-to-item correlations. In the second stage, factor analysis was conducted to determine the number of interpretable and meaningful domains and their corresponding items in the revised instrument. Item response theory (IRT)–based analysis was then conducted to evaluate whether each item measured the same underlying concept as the other items in its domain. At each stage, the expert panel reviewed results. Item reduction and the final factor structure were reviewed and determined by expert panel consensus based on the statistical analyses.

Item reduction.

Items with >5% missing responses were considered for exclusion from the analysis. Items with >50% of responses at either the upper (ceiling) or lower (floor) end of the response scale were considered as not assessing a wide enough spectrum of the relevant issue and were considered for deletion (1, 7).

Item-to-item correlations were calculated for each item on the expanded CSHQ-RA using Pearson's correlation coefficients. Those items that showed strong homogeneity (r ≥ 0.70) were considered for deletion.
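The three screening criteria above (missing responses, floor and ceiling effects, and item-to-item correlation) can be sketched as simple checks. The function names are hypothetical, and computing floor and ceiling rates over answered responses only is an assumption. The thresholds are those given in the text.

```python
from math import sqrt
from typing import Dict, Optional, Sequence


def screen_item(responses: Sequence[Optional[int]], low: int, high: int) -> Dict[str, bool]:
    """Flag one item for review by the panel.

    missing: more than 5% of responses absent
    floor/ceiling: more than 50% of answered responses at a scale end
    """
    answered = [r for r in responses if r is not None]
    missing_rate = 1 - len(answered) / len(responses)
    floor_rate = sum(r == low for r in answered) / len(answered)
    ceiling_rate = sum(r == high for r in answered) / len(answered)
    return {
        "missing": missing_rate > 0.05,
        "floor": floor_rate > 0.50,
        "ceiling": ceiling_rate > 0.50,
    }


def pearson_r(x: Sequence[Optional[float]], y: Sequence[Optional[float]]) -> float:
    """Pearson correlation over respondents who answered both items."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy)


def is_redundant(x: Sequence[Optional[float]], y: Sequence[Optional[float]]) -> bool:
    """Item pairs with r >= 0.70 are candidates for deletion."""
    return pearson_r(x, y) >= 0.70
```

A flag marks an item for consideration only; in the procedure described here the expert panel made the final decision.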

Factor analysis.

Principal components analysis with oblique promax rotation, as opposed to an orthogonal rotation, was conducted to determine the number of interpretable domains of the expanded CSHQ-RA to retain, because we expected the derived domains to be somewhat correlated. The number of factors was determined by the number of eigenvalues >1. The item-factor membership was determined by the factor loadings as an indication of the degree to which each item was associated with each factor. Items were retained in a given factor if they had a factor loading ≥0.30. Items that loaded on multiple factors (factor loading ≥0.30) were reviewed and their final factor membership was determined by the expert panel based on their content similarities with other items in the same factor.
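The retention rules in this paragraph can be illustrated with a short sketch that takes eigenvalues and a rotated loading matrix as given (the rotation itself is not shown). The function names, the example loadings, and the use of absolute loadings are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple

LOADING_CUTOFF = 0.30  # retention threshold stated in the text


def count_factors(eigenvalues: Sequence[float]) -> int:
    """Number of factors to retain: eigenvalues greater than 1."""
    return sum(1 for value in eigenvalues if value > 1.0)


def assign_items(
    loadings: Dict[str, Sequence[float]], cutoff: float = LOADING_CUTOFF
) -> Tuple[Dict[str, int], Dict[str, List[int]], List[str]]:
    """Split items into single-factor, cross-loading, and unassigned.

    Absolute loadings are compared with the cutoff (an assumption).
    Cross-loading items are only listed; in the study their final
    membership was decided by the expert panel, not by a rule.
    """
    assigned: Dict[str, int] = {}
    needs_review: Dict[str, List[int]] = {}
    unassigned: List[str] = []
    for item, row in loadings.items():
        hits = [k for k, value in enumerate(row) if abs(value) >= cutoff]
        if len(hits) == 1:
            assigned[item] = hits[0]
        elif hits:
            needs_review[item] = hits
        else:
            unassigned.append(item)
    return assigned, needs_review, unassigned


# Hypothetical loadings for three items on two factors.
example = {"item_a": [0.72, 0.10], "item_b": [0.45, 0.38], "item_c": [0.05, 0.12]}
assigned, needs_review, unassigned = assign_items(example)
```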

Item response theory analysis.

Polytomous Rasch model IRT-based analyses (8) were conducted to further facilitate the item selection/reduction for those items retained from the factor analysis. Specifically, Andrich's rating scale model (9–12) was applied for analyzing the Likert-type response data of the expanded CSHQ-RA. To examine the fit of each item to measure a unidimensional factor, the infit and outfit mean square item fit statistics were evaluated. Values substantially above 1 indicate noise in the data or a “misfitting” item; values substantially below 1 indicate redundancy or an “overfitting” item and do not disturb the meaning of a measure. These values are on a ratio scale, so that 1.3 indicates 30% excess noise in the data. We set 0.7–1.3 as a criterion for items with good fit, and focused more on infit than outfit because infit provides more information about item performance among individuals located near the position of the item on the continuum.
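For readers unfamiliar with the two fit statistics, the sketch below shows the usual definitions of the infit and outfit mean squares for a single item. It assumes that the model-expected score and model variance of each response have already been obtained from a fitted rating scale model (for example, from Rasch software). The function names and input values are hypothetical.

```python
from typing import Sequence, Tuple


def item_fit(
    observed: Sequence[float], expected: Sequence[float], variance: Sequence[float]
) -> Tuple[float, float]:
    """Return (infit, outfit) mean squares for one item.

    outfit: unweighted mean of squared standardized residuals
    infit:  information-weighted version, sum of squared residuals
            divided by the sum of model variances
    """
    sq_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]
    outfit = sum(r / w for r, w in zip(sq_resid, variance)) / len(sq_resid)
    infit = sum(sq_resid) / sum(variance)
    return infit, outfit


def classify_fit(mean_square: float, low: float = 0.7, high: float = 1.3) -> str:
    """Apply the 0.7-1.3 good-fit range given in the text."""
    if mean_square > high:
        return "misfit"
    if mean_square < low:
        return "overfit"
    return "good"


# Hypothetical responses of four patients to one item.
infit, outfit = item_fit(
    observed=[3, 1, 2, 4],
    expected=[2.5, 1.5, 2.0, 3.0],
    variance=[0.5, 0.5, 1.0, 1.0],
)
```

Because infit weights each squared residual by the model variance, responses from patients located near the item contribute most, which is the reason given above for focusing on infit.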

Cronbach's alpha coefficients were calculated before and after removal of items to examine how the internal consistency of each domain changed. The expert panel reviewed all the statistical analysis results and reached consensus on the final form of the revised CSHQ-RA from both clinical/conceptual and statistical viewpoints.
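Cronbach's alpha before and after item removal can be computed from complete-case item scores as sketched below, using the standard formula alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). The function names and data are hypothetical.

```python
from statistics import variance
from typing import Sequence, Set


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """rows: one list of k item scores per respondent (no missing values)."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_without(rows: Sequence[Sequence[float]], drop: Set[int]) -> float:
    """Alpha recomputed after removing the item columns listed in drop."""
    kept = [[v for i, v in enumerate(row) if i not in drop] for row in rows]
    return cronbach_alpha(kept)


# Hypothetical domain with two consistent items and one inconsistent item.
scores = [[1, 1, 5], [2, 2, 1], [3, 3, 4], [4, 4, 2]]
alpha_all = cronbach_alpha(scores)
alpha_reduced = alpha_without(scores, {2})
```

Comparing `alpha_all` with `alpha_reduced` mirrors the before-and-after comparison made for each domain.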

Validation of the revised CSHQ-RA.

Test–retest reliability was assessed using responses to the revised CSHQ-RA recorded at screening and baseline sessions. Patients were included in the analysis if they completed the revised CSHQ-RA at screening and baseline visits within 7–14 days of each other and reported no change in health status between screening and baseline visits, as defined by an unchanged score on the global health item of the MOS SF-36 questionnaire. Intraclass correlation coefficients (ICCs) were computed for each subscale score to indicate the test–retest reliability, with 0.7 as the threshold value (13).
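The inclusion rule and the reliability coefficient can be sketched as follows. The study does not state which ICC form was computed in SPSS, so the one-way random-effects, single-measure form used here is an assumption, as are the function names and example scores.

```python
from typing import Sequence, Tuple


def eligible(days_apart: int, gh_screening: int, gh_baseline: int) -> bool:
    """Visits 7-14 days apart and an unchanged SF-36 global health score."""
    return 7 <= days_apart <= 14 and gh_screening == gh_baseline


def icc_oneway(pairs: Sequence[Tuple[float, float]]) -> float:
    """One-way random-effects ICC for two measurements per patient (assumed form).

    pairs: (score at screening, score at baseline) for each patient
    """
    n, k = len(pairs), 2
    grand = sum(a + b for a, b in pairs) / (n * k)
    subject_means = [(a + b) / 2 for a, b in pairs]
    ms_between = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, subject_means)
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical subscale scores for three stable patients.
icc = icc_oneway([(10, 12), (20, 18), (30, 31)])
meets_threshold = icc >= 0.7
```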

Tests for convergent validity were based on patients' responses to the revised CSHQ-RA, the MOS SF-36, and the HAQ at baseline. Pearson's correlation coefficients were calculated between subscale scores of the revised CSHQ-RA, between revised CSHQ-RA subscale scores and the HAQ, and between revised CSHQ-RA subscale scores and the MOS SF-36 Physical Component Summary (PCS) and Mental Component Summary (MCS) scores. Tests for known-group discriminant validity were based on patients' responses to the revised CSHQ-RA and the HAQ at baseline.

Studies have demonstrated that a change of 0.22 to 0.5 points on the HAQ indicates clinically meaningful change in the health of a patient (14–16). Because HAQ scores can vary from 0 to 3, patients were divided into 6 groups of RA severity (i.e., using 0.5 as the cutoff point for each group) based on their responses to the HAQ to test the known-group discriminant validity. An analysis of variance (ANOVA) was performed to determine whether the mean score on each subscale of the revised CSHQ-RA varied significantly across these 6 groups, and whether individuals with greater disability reported statistically higher revised CSHQ-RA scores than those with less severe disability. Internal consistency was tested for each domain using Cronbach's alpha coefficient, with coefficients ≥0.70 indicating acceptable internal consistency (17).
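The known-group comparison can be sketched as a banding step followed by a one-way ANOVA F statistic. How scores falling exactly on a cutoff were assigned is not stated, so placing them in the upper band is an assumption. The function names are hypothetical, and the P value (which requires the F distribution) is not computed here.

```python
from typing import Sequence


def haq_group(score: float) -> int:
    """Map a HAQ score in [0, 3] to one of 6 bands of width 0.5 (0 to 5).

    A score on a cutoff goes to the upper band (assumed); 3.0 stays in
    the top band.
    """
    return min(int(score / 0.5), 5)


def anova_f(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F statistic for a subscale score across groups."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical subscale scores in three severity bands.
f_stat = anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```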

To test the responsiveness or sensitivity to change, patients were divided into 2 groups: patients who had a change of at least 1 point on the 5-point Likert scale of the SF-36 global health question from baseline visit to week 12, and those who did not. For the group that showed change (either became worse or better), a Wilcoxon signed rank test was conducted on the revised CSHQ-RA subscale scores at the 2 time points. In addition, an analysis of covariance (ANCOVA) was conducted to examine whether subscale scores changed differently for the 2 groups from baseline visit to week 12. The covariates included scores at baseline visit, age, sex, education, marital status, employment status, race/ethnicity, and duration of RA in years.
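The paired comparison in the group that changed can be illustrated with a plain implementation of the Wilcoxon signed rank statistic. This sketch uses average ranks for ties and a normal approximation without tie or continuity correction, which may differ from the SAS procedure used in the study. The function name and data are hypothetical.

```python
from math import sqrt
from typing import Sequence, Tuple


def wilcoxon_signed_rank(
    before: Sequence[float], after: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (W+, W-, z) for paired scores; zero differences are dropped."""
    diffs = sorted((a - b for b, a in zip(before, after) if a != b), key=abs)
    n = len(diffs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j < n and abs(diffs[j]) == abs(diffs[i]):
            j += 1
        for k in range(i, j):
            ranks[k] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    mean_w = n * (n + 1) / 4
    sd_w = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean_w) / sd_w
    return w_plus, w_minus, z


# Hypothetical subscale scores at baseline and week 12 for four patients.
w_plus, w_minus, z = wilcoxon_signed_rank([10, 20, 30, 40], [8, 17, 26, 35])
```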

Patient's or physician's global assessment has been used as a benchmark for defining MCID or clinically meaningful change (18, 19). We used the global health question of the MOS SF-36 as the benchmark for determining the MCID for the revised CSHQ-RA. The global health question is a 5-point Likert scale with 1 indicating excellent health and 5 representing poor health. Study participants were classified into 2 groups depending on whether their change in global health score from baseline visit to 12 weeks was <1 or ≥1. A discriminant function was then generated using the classification variable and the change in revised CSHQ-RA subscale scores from baseline visit to week 12. One set of coefficients was obtained from the discriminant function for each of the 2 groups. MCID was then calculated using these coefficients. MCID was defined as the cut-point that yielded the dividing line between those patients who achieved MCID on the global health question and those who did not.
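One way to read this procedure is as a one-variable linear discriminant analysis per domain, in which each group has a linear classification function and the cut-point is the score change at which the two functions are equal. The sketch below follows that reading. The univariate setup, the pooled variance, the equal default priors, and all names and data are assumptions, not the authors' exact computation.

```python
from math import log
from statistics import mean, variance
from typing import Sequence, Tuple


def lda_cut_point(
    changed: Sequence[float],
    unchanged: Sequence[float],
    priors: Tuple[float, float] = (0.5, 0.5),
) -> float:
    """Score change at which the two classification functions are equal.

    changed / unchanged: change in one subscale score (week 12 minus
    baseline) for patients with a global health change of >=1 and <1.
    """
    n1, n2 = len(changed), len(unchanged)
    m1, m2 = mean(changed), mean(unchanged)
    if m1 == m2:
        raise ValueError("group means are equal; no cut-point exists")
    pooled = ((n1 - 1) * variance(changed) + (n2 - 1) * variance(unchanged)) / (
        n1 + n2 - 2
    )
    # One set of coefficients per group: d_k(x) = slope_k * x + const_k
    slope1, const1 = m1 / pooled, -m1 ** 2 / (2 * pooled) + log(priors[0])
    slope2, const2 = m2 / pooled, -m2 ** 2 / (2 * pooled) + log(priors[1])
    return (const2 - const1) / (slope1 - slope2)


# Hypothetical score changes in the two groups.
cut = lda_cut_point([-14, -10, -12, -16], [-2, 0, -4, -2])
mcid = abs(cut)
```

With equal priors the cut-point reduces to the midpoint of the two group means.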

All statistical analyses used for instrument development and validation except for ICC and Rasch analysis were performed using SAS version 8.2 (SAS Institute, Cary, NC). ICCs were calculated using SPSS software (SPSS, Chicago, IL). Rasch analysis was performed using WINSTEPS (MESA Press, Chicago, IL).

RESULTS

Sample characteristics.

Data from 307, 291, 259, 222, and 238 patients were recorded at the screening visit, baseline visit, week 4, week 12, and week 24, respectively; 202 patients completed all 5 assessments. Data from the 291 patients who completed the baseline session were used for item reduction and tests of convergent and discriminant validity. Data from 286 patients were recorded at both screening and baseline visits; of those patients, 207 had stable health at the 2 time points, and therefore their data were used to examine test–retest reliability. The 222 patients who completed both the baseline visit and week 12 sessions were included in the analysis of instrument responsiveness and calculation of MCID.

The mean ± SD age of the sample was 52 ± 12 years, with an average RA duration of 10.8 ± 10.4 years. The majority of patients were women (80% at screening visit and 81% at baseline visit). Approximately 65% of patients were married or living with a significant other, and ∼46% were either retired or unemployed. This patient population was broader in its education level than the convenience sample used for validation of the original CSHQ-RA. At baseline visit, 16.4% of the patients had completed <12 years of schooling, 36% only had a high school degree, 19.2% had some college experience (but no degree), and 28.3% had obtained at least a 2-year college degree. The sample was also more ethnically diverse, with 10.5% of patients being African American and 9.4% being Latino/Mexican American.

Revised CSHQ-RA development.

No item showed ceiling or floor effects. Only 2 items had an item-to-item correlation >0.70, indicating possible item content redundancy. The Pearson correlation coefficient for items 1 and 2 (“During the past 4 weeks, how difficult was it for you to get in or out of bed?” and “During the past 4 weeks, how difficult was it for you to get in or out of a chair?”) was 0.833. The expert panel examined the 4 items that met criteria for possible deletion. After judging their contents, the panel decided to retain all of those items. As a result, all 48 items of the expanded CSHQ-RA were subjected to factor and Rasch analyses.

The first principal components analysis resulted in 7 factors with an eigenvalue >1. Another principal components analysis with 7-factor solution was then conducted to determine the retention of items in each factor. Items were retained if their factor loading was >0.30. The 7 factors had 14, 7, 7, 6, 6, 4, and 4 items, respectively.

Separate Rasch analyses were conducted to evaluate scale homogeneity and item fit for the items in each retained factor/domain. Item fit statistics for each retained item in the same factor were evaluated. In factor 1, items 22, 28, 29, 35, and 40 had an infit or outfit mean square >1.40 in at least 1 of the combinations tested. High misfit was also found for item 44 in factor 2; items 20, 23, and 41 in factor 3; items 42 and 43 in factor 5; and item 27 in factor 6. The internal consistency of each factor when composed of different items is reported in Table 1 to examine whether the removal of misfitting items improves internal consistency. For most factors, Cronbach's alpha coefficient increased after removing items with misfit, with one exception. The internal consistency of factor 1 remained virtually unchanged when items 22, 28, 29, 35, and 40 were removed. Based on these results, the expert panel removed all 12 items with high misfit from the expanded CSHQ-RA. The final revised CSHQ-RA retained 36 items in 7 clinically meaningful domains: emotional well-being (9 items), dexterity (6 items), physical activity (4 items), mobility (6 items), social well-being (4 items), pain/discomfort (3 items), and fatigue (4 items). Strong internal consistency is indicated by the final Cronbach's alpha coefficients, which ranged from 0.88 to 0.96.

Table 1. Internal consistency analysis and test–retest reliability of the revised CSHQ-RA*

| Domain | No. items | Items used or deleted | Cronbach's alpha | ICC |
|---|---|---|---|---|
| 1. Emotional well-being | 14 | All items: CSHQ34, CSHQ39, CSHQ33, CSHQ38, CSHQ37, CSHQ36, CSHQ30, CSHQ32, CSHQ35, CSHQ22, CSHQ40, CSHQ31, CSHQ29, CSHQ28 | 0.95 | |
| | 10 | Deleted items: CSHQ40, CSHQ29, CSHQ28, CSHQ22 | 0.951 | |
| | 9 | Deleted items: CSHQ40, CSHQ35, CSHQ29, CSHQ28, CSHQ22 | 0.949 | 0.94 |
| 2. Dexterity | 7 | All items: CSHQ10, CSHQ11, CSHQ9, CSHQ8, CSHQ7, CSHQ12, CSHQ44 | 0.927 | |
| | 6 | Deleted items: CSHQ44 | 0.937 | 0.94 |
| 3. Physical activity | 7 | All items: CSHQ19, CSHQ18, CSHQ21, CSHQ17, CSHQ23, CSHQ41, CSHQ20 | 0.945 | |
| | 5 | Deleted items: CSHQ41, CSHQ20 | 0.955 | |
| | 4 | Deleted items: CSHQ41, CSHQ23, CSHQ20 | 0.961 | 0.95 |
| 4. Mobility | 6 | All items: CSHQ3, CSHQ2, CSHQ4, CSHQ1, CSHQ6, CSHQ5 | 0.910 | 0.90 |
| 5. Social well-being | 6 | All items: CSHQ46, CSHQ45, CSHQ47, CSHQ48, CSHQ42, CSHQ43 | 0.858 | |
| | 4 | Deleted items: CSHQ42, CSHQ43 | 0.894 | 0.90 |
| 6. Pain/discomfort | 4 | All items: CSHQ25, CSHQ26, CSHQ24, CSHQ27 | 0.875 | |
| | 3 | Deleted items: CSHQ27 | 0.877 | 0.86 |
| 7. Fatigue | 4 | All items: CSHQ14, CSHQ15, CSHQ16, CSHQ13 | 0.893 | 0.87 |

* CSHQ-RA = Cedars-Sinai Health-Related Quality of Life for Rheumatoid Arthritis Instrument; ICC = intraclass correlation coefficient.

The final domain scores were calculated by summing the items and multiplying by a factor to normalize the score. The equations were as follows:

[Equations for the 7 domain scores appeared here as images and are not reproduced.]

The scores were standardized and varied from 0 to 100.

Revised CSHQ-RA validation.

A total of 207 patients met the criteria for stable health across both the screening and baseline visits (no change in global health item); their data were included in analyses of test–retest reliability. ICCs for all domains exceeded 0.70 (range 0.86–0.95), indicating good test–retest reliability (Table 1).

The mean, median, and interquartile ranges for each domain of the revised CSHQ-RA, the HAQ, and the PCS and MCS of the MOS SF-36 at baseline visit are summarized in Table 2. Pearson's correlation coefficients (Table 3) indicated strong correlations between the revised CSHQ-RA subscales, as well as between each subscale and the other instruments (the SF-36 MCS and PCS and the HAQ). As expected, the PCS was more highly correlated with the physical activity and mobility domains, whereas the MCS was more highly correlated with the emotional well-being and social well-being domains. Between-domain correlation coefficients ranged from 0.47 to 0.72. Correlations were higher with the HAQ than with the PCS and MCS as expected, but good convergent validity was indicated with both instruments (Table 3).

Table 2. Mean, median, and interquartile ranges (IQRs) for revised CSHQ-RA, PCS, MCS, and HAQ at baseline visit*

| Instruments | Mean ± SD | Median (IQR) | Range |
|---|---|---|---|
| Revised CSHQ-RA domains† | | | |
| Dexterity | 52.5 ± 20.5 | 50.0 (36.7–66.7) | 0–100.0 |
| Mobility | 62.8 ± 19.9 | 63.3 (50.0–76.7) | 0–100.0 |
| Physical activity | 69.2 ± 23.9 | 70.0 (55.0–90.0) | 0–100.0 |
| Emotional well-being | 69.4 ± 19.5 | 68.9 (55.6–84.4) | 0–100.0 |
| Social well-being | 59.0 ± 20.9 | 60.0 (45.0–75.0) | 0–100.0 |
| Pain/discomfort | 76.2 ± 18.4 | 80.0 (60.0–93.3) | 0–100.0 |
| Fatigue | 62.1 ± 22.8 | 60.0 (45.0–80.0) | 0–100.0 |
| MOS SF-36: generic quality of life‡ | | | |
| PCS | 31.2 ± 8.4 | 30.5 (25.6–35.6) | 8.7–54.5 |
| MCS | 37.9 ± 10.9 | 38.7 (30.0–44.8) | 3.7–66.2 |
| HAQ disability index: physical disability (standard)§ | | | |
| Total score | 1.5 ± 0.7 | 1.6 (1.0–2.0) | 0–3.0 |

* CSHQ-RA = Cedars-Sinai Health-Related Quality of Life for Rheumatoid Arthritis Instrument; PCS = Physical Component Summary; MCS = Mental Component Summary; HAQ = Health Assessment Questionnaire; MOS SF-36 = Medical Outcomes Study Short Form 36.
† Lower values indicate better health-related quality of life (HRQOL) (CSHQ-RA) or physical well-being (HAQ); mean scores are standardized from 0 to 100.
‡ Mean scores are standardized from 0 to 100; higher values indicate better HRQOL for the MOS SF-36.
§ Lower values indicate better HRQOL (CSHQ-RA) or physical well-being (HAQ); range of possible scores is 0–3.

Table 3. Pearson's correlation coefficients for all HRQOL instruments and revised CSHQ-RA*

| Domain† | Dexterity | Mobility | Physical activity | Emotional well-being | Social well-being | Pain/discomfort | Fatigue |
|---|---|---|---|---|---|---|---|
| Dexterity | | | | | | | |
| Mobility | 0.71 | | | | | | |
| Physical activity | 0.54 | 0.62 | | | | | |
| Emotional well-being | 0.51 | 0.54 | 0.65 | | | | |
| Social well-being | 0.60 | 0.61 | 0.65 | 0.72 | | | |
| Pain | 0.55 | 0.63 | 0.65 | 0.62 | 0.57 | | |
| Fatigue | 0.47 | 0.54 | 0.66 | 0.62 | 0.59 | 0.61 | |
| PCS‡ | −0.58 | −0.67 | −0.64 | −0.59 | −0.58 | −0.55 | −0.47 |
| MCS‡ | −0.43 | −0.47 | −0.61 | −0.68 | −0.66 | −0.56 | −0.62 |
| HAQ§ | 0.74 | 0.72 | 0.64 | 0.54 | 0.66 | 0.52 | 0.50 |

* HRQOL = health-related quality of life; CSHQ-RA = Cedars-Sinai Health-Related Quality of Life for Rheumatoid Arthritis Instrument; PCS = Physical Component Summary; MCS = Mental Component Summary; HAQ = Health Assessment Questionnaire.
† Lower values indicate better HRQOL (CSHQ-RA) or physical well-being (HAQ); mean scores are standardized from 0 to 100.
‡ Mean scores are standardized from 0 to 100; higher values indicate better HRQOL (Medical Outcomes Study Short Form 36).
§ Lower values indicate better HRQOL (CSHQ-RA) or physical well-being (HAQ); range of possible scores is 0–3.

The mean domain scores (standardized from 0 to 100) for the 6 HAQ disability groups are shown in Table 4. Results of the ANOVA showed a statistically significant difference in domain means across the 6 HAQ groups (P < 0.0001), demonstrating discriminant validity for all domains of the revised CSHQ-RA. Mean scores of all revised CSHQ-RA domains increased with HAQ disability scores (Table 4).

Table 4. Discriminative validity of the revised CSHQ-RA relative to the HAQ known groups*

Columns give the mean subscale score within each HAQ disability score range.

| CSHQ-RA subscale | 0–0.5 (n = 20) | 0.5–1.0 (n = 35) | 1.0–1.5 (n = 60) | 1.5–2.0 (n = 77) | 2.0–2.5 (n = 69) | 2.5–3.0 (n = 24) |
|---|---|---|---|---|---|---|
| Dexterity | 24.5 | 34.1 | 43.8 | 56.3 | 66.5 | 82.0 |
| Mobility | 32.5 | 47.9 | 54.8 | 66.1 | 76.3 | 89.4 |
| Physical activity | 32.3 | 50.8 | 62.1 | 75.7 | 84.1 | 91.4 |
| Emotional well-being | 46.0 | 54.7 | 66.1 | 72.5 | 79.4 | 85.8 |
| Social well-being | 32.0 | 41.6 | 53.5 | 60.2 | 73.1 | 84.7 |
| Pain/discomfort | 53.7 | 65.8 | 71.8 | 78.5 | 86.3 | 90.7 |
| Fatigue | 38.3 | 45.3 | 58.2 | 65.1 | 75.3 | 74.2 |

* CSHQ-RA = Cedars-Sinai Health-Related Quality of Life for Rheumatoid Arthritis Instrument; HAQ = Health Assessment Questionnaire.
Mean scores on the CSHQ-RA are standardized from 0 to 100.
P value of the analysis of variance is <0.0001. P value of the Kruskal-Wallis test is <0.0001.

Seventy-eight patients reported a change of ≥1 in global health score from baseline visit to 12 weeks; for 144 patients, change in health was <1. Both the Wilcoxon signed rank test and the ANCOVA indicated that all domains of the revised CSHQ-RA were responsive to the change in patients' HRQOL from baseline visit to week 12. Wilcoxon's signed rank test (Table 5) showed that differences between mean scores at the 2 time points were significant for every domain (all P values <0.0001). The ANCOVA results confirmed that finding, showing a significant difference between groups for change in domain scores from baseline visit to week 12 (all P values <0.0001).

Table 5. Results of Wilcoxon's signed rank tests of revised CSHQ-RA for change over time (responsiveness)*

| Domain | Domain baseline score | Domain 12-week score | P |
|---|---|---|---|
| Dexterity | 53.80 ± 19.71 | 39.10 ± 17.93 | <0.001 |
| Mobility | 66.62 ± 19.37 | 48.93 ± 18.74 | <0.001 |
| Physical activity | 73.97 ± 23.15 | 58.01 ± 26.18 | <0.001 |
| Emotional well-being | 72.91 ± 19.80 | 56.72 ± 19.18 | <0.001 |
| Social well-being | 61.99 ± 21.14 | 47.82 ± 21.30 | <0.001 |
| Pain/discomfort | 81.37 ± 15.06 | 60.60 ± 19.14 | <0.001 |
| Fatigue | 66.79 ± 24.01 | 52.69 ± 20.32 | <0.001 |

* Values are the mean ± SD unless otherwise indicated. CSHQ-RA = Cedars-Sinai Health-Related Quality of Life for Rheumatoid Arthritis Instrument.
Mean scores on the CSHQ-RA are standardized from 0 to 100.

The cut-points between those patients who achieved an MCID on the SF-36 global health question and those who did not were determined for each domain of the revised CSHQ-RA. MCIDs were calculated to be 12.2 for emotional well-being, 10.9 for dexterity, 11.7 for physical activity, 12.5 for mobility, 6.2 for social well-being, 14.8 for pain/discomfort, and 9.1 for fatigue. Cut-points were calculated for domain scores standardized from 0 to 100.

All Pearson's correlation coefficients between domains of the revised CSHQ-RA and the original CSHQ-RA were significant (P < 0.0001), and correlations for matching domains were the highest (0.952–0.978). These results demonstrated that the revised CSHQ-RA is at least as effective an instrument for measuring HRQOL in RA populations as is the original CSHQ-RA.

DISCUSSION

To enhance the comprehensiveness and validity of the original CSHQ-RA, the revised CSHQ-RA instrument was developed and validated using data from patients in a phase IV, multicenter, open-label, single-arm study of RA treatment. The patient population and anakinra protocol held several advantages over the convenience sample used to develop and validate the original CSHQ-RA instrument (1, 2). This trial included a representative population of patients diagnosed with RA from across the US; this population was more ethnically diverse than the original patient pool, and represented a wider range of educational backgrounds. In addition, the anakinra trial provided a subpopulation of patients who showed a change in HRQOL over a 12-week treatment period and could therefore be studied to determine both the responsiveness of the instrument and the magnitude of MCID for the revised CSHQ-RA and each of its domains. A known MCID makes the revised CSHQ-RA an effective tool for studying changes in HRQOL in RA patient populations over time and across treatment groups. This important measure was not available for the original CSHQ-RA.

The revised CSHQ-RA consists of 36 items measuring 7 clinically distinct yet related quality of life domains for patients with RA: dexterity (6 items), mobility (6 items), physical activity (4 items), emotional well-being (9 items), social well-being (4 items), pain/discomfort (3 items), and fatigue (4 items). Some items in the pain/discomfort and fatigue domains were originally embedded in the physical activity or sexual function subscales of the original CSHQ-RA. Social well-being is a new domain, which contains questions based on items in the physical activity domain of the original CSHQ-RA. The original “severity of overall pain and discomfort” item was replaced with 3 more specific items and became an individual domain. All revised CSHQ-RA domains appear to contain more distinct items than do the original domains. Because of the greater number of domains and items, and because of the specificity of items, the revised CSHQ-RA is a more comprehensive instrument than the original CSHQ-RA.

The subscale scores of the revised CSHQ-RA had high test–retest reliability (ICC 0.86–0.95) based on results from screening and baseline visits. There was also good internal consistency in each subscale of the instrument, as shown by Cronbach's alpha coefficients ≥0.88. Strong correlations between the revised CSHQ-RA domains and MOS SF-36 subscales and the HAQ demonstrated good convergent validity. As might be predicted, the items concerning physical disability were more strongly correlated with the HAQ and the PCS than with the MCS. The social well-being subscale had an equivalent correlation with both the MCS and the HAQ, and both of those correlations were stronger than the correlation with the PCS. As also expected, the emotional well-being domain of the revised CSHQ-RA was more strongly correlated with the MCS than with the PCS or the HAQ. Both the physical activity and pain/discomfort subscales had nearly equivalent correlations to the MCS, PCS, and HAQ. This pattern of correlation indicates that the revised CSHQ-RA successfully captures the various dimensions of HRQOL measured by multiple separate scales.

The development and validation of the revised CSHQ-RA addresses the shortcomings of the original CSHQ-RA, but some limitations are apparent in the current study. Although the additional items and domains make the revised instrument more comprehensive than the original, the length of the revised CSHQ-RA might limit its effectiveness and application in some circumstances; for example, a shorter instrument might be easier for severely disabled patients to complete. The 11-item CSHQ-RA Short Form has been developed to address this problem and should be used to make the assessment of HRQOL less burdensome for large RA population studies and more severely disabled patients (2, 4). However, the short form cannot replace the comprehensiveness of the full version. The longer version may be preferred when resources allow, especially when the goal of the study is to examine health in each specific domain. Another limitation is that there was no control group in the anakinra protocol. The study did not include a group for which there was no expected change in HRQOL to compare with the predicted change in the treatment group. Although the revised CSHQ-RA did measure differences between patients who reported a change on the SF-36 global health question and those who did not, it would be useful to establish that the revised CSHQ-RA can detect differences between control and treatment groups in RA treatment trials. To determine whether the revised CSHQ-RA can accurately measure the effectiveness of RA treatments, it warrants study in a therapeutic design that compares changes in HRQOL for control and treatment groups. Additionally, it has not been proven that a 1-point change on the SF-36 global health question would result from a change in clinical management. However, experts believe that a 1-point change on a 5-point Likert scale is equivalent to a 20% improvement in a patient's general health, and therefore the change should be meaningful to most patients. Finally, this study uses the SF-36 as a criterion measure of HRQOL. There is no existing RA-specific gold-standard measure for comparison; we recognize that our choice of the SF-36 may therefore limit criterion validity.

Results of this study show that the revised CSHQ-RA is a reliable, valid instrument for measuring HRQOL in patients with RA. The revised CSHQ-RA correlated well with other commonly used generic instruments, the HAQ and the MOS SF-36, but is more comprehensive and is uniquely relevant to patients with RA. The questions on the revised CSHQ-RA are specifically tailored to measure issues and concerns in 7 different areas related to HRQOL in RA patient populations. The revised RA-specific instrument is sensitive to change in overall health and in each domain. Applying the MCID of the revised CSHQ-RA and each of its 7 subscales will allow researchers to track the progress of patient populations in treatment trials as well as to examine which specific areas do or do not show improvement. The revised CSHQ-RA will therefore prove to be a useful tool in evaluating expanded treatment options for RA. The instrument is both more sensitive and more specific than other instruments not designed to specifically measure the issues affecting HRQOL of patients with RA.
