Development and validation of Cedars-Sinai Health-Related Quality of Life in Rheumatoid Arthritis (CSHQ-RA) short form instrument

Abstract

Objective

To develop an abridged version of the 33-item Cedars-Sinai Health-Related Quality of Life in Rheumatoid Arthritis instrument (CSHQ-RA) and test the validity and reliability of the abridged instrument.

Methods

Items from the original 33-item, 5-domain CSHQ-RA were assessed using psychometric and regression analyses of survey responses from 274 patients with rheumatoid arthritis. Items were retained in the final instrument based on statistical analysis and evaluation by an expert panel. Test-retest reliability, internal consistency, convergent and discriminant validity, and ceiling and floor effects were examined for the shortened CSHQ-RA.

Results

Statistical analysis and expert assessment yielded an 11-item instrument including questions in 4 domains. Test-retest reliability and internal consistency were high and the instrument showed good convergent and discriminant validity.

Conclusion

The abridged CSHQ-RA short form is a valid and reliable instrument that can be used to examine the impact of RA on patients' health-related quality of life. Prospective validation in clinical trial settings is warranted.

INTRODUCTION

Rheumatoid arthritis (RA) is a chronic and progressive inflammatory musculoskeletal disease affecting approximately 2.1 million Americans (1–4). The course of the illness can result in joint destruction, impaired psychological and social functioning, and increased mortality (5). Economically, the disease is responsible for $4.2 billion in medical costs annually (6). Significant resources have been devoted to developing new treatments in an effort to reduce the impact of RA at the individual and societal levels. Accompanying this effort is the need to accurately assess changes in different aspects of patients' health, including physical function and health-related quality of life (HRQOL), due to disease activity and therapeutic efficacy.

Assessing patient-centered outcomes in RA has become a high priority for patients and providers, particularly in light of several new and effective treatment options. Currently, there is no single clinical or laboratory variable that can completely measure disease activity or treatment response in RA. To complement existing instruments, such as the Stanford Health Assessment Questionnaire (HAQ) (7), the modified HAQ (8), and the McMaster Toronto Arthritis Patient Preference Disability Questionnaire (9), we developed a multidimensional disease-specific instrument, the Cedars-Sinai Health-Related Quality of Life in Rheumatoid Arthritis instrument (CSHQ-RA; see Appendix A, available at the Arthritis Care & Research Web site at http://www.interscience.wiley.com/jpages/0004-3591:1/suppmat/index.html) (10). It is intended to reflect the impact of RA on patients' HRQOL. HRQOL refers to those aspects of human life and activities that are generally affected by health conditions or health services (11).

Although the 33-item CSHQ-RA has been found to be valid and reliable, there may be situations, such as application of the instrument in a large-scale study, in which it would present too much of a response burden. Consequently, we decided to reduce the potential response burden by identifying a subset of items that most adequately measure the true construct of each of the 5 included domains. The present study was designed to develop a shorter RA-specific HRQOL instrument and to examine the test-retest reliability and convergent and discriminant validity (i.e., the ability to discriminate between levels of physical disability) of the shortened CSHQ-RA.

SUBJECTS AND METHODS

Institutional Review Board approval was obtained from the Cedars-Sinai Health System (Los Angeles, California) in January 2001 for survey administration.

Participants.

A total of 350 adult, English-speaking patients with RA identified by their treating physicians from 13 rheumatology practices in the metropolitan Los Angeles area agreed to participate in the study, as described previously (10). Among them, 276 completed both the initial and followup surveys and were included in the study population. During analysis, 2 subjects were inadvertently given the same subject identification number and had to be dropped, so the total number of patients included in analyses was 274. Table 1 lists sociodemographic and medical characteristics of the study population.

Table 1. Sociodemographic and medical characteristics*
| Characteristic | Female n = 232 (85%) | Male n = 41 (15%) | Total n = 274 |
|---|---|---|---|
| Age, mean ± SD (range) years | 57 ± 14 (20–92) | 59 ± 13 (25–80) | 57 ± 14 (20–92) |
| Education | | | |
|   Grades 1–11 | 5 (2.2) | 1 (2.4) | 6 (2.2) |
|   High school | 38 (16.5) | 8 (19.5) | 46 (17.0) |
|   Some college | 57 (24.8) | 7 (17.1) | 64 (23.6) |
|   2-year college graduate | 33 (14.4) | 4 (9.8) | 37 (13.7) |
|   4-year university graduate | 40 (17.4) | 10 (24.4) | 50 (18.5) |
|   Attended or completed graduate school | 57 (24.7) | 11 (26.8) | 68 (25.0) |
| Ethnicity | | | |
|   White, non-Hispanic | 172 (74.8) | 33 (80.5) | 205 (75.7) |
|   Black, African American | 13 (5.7) | 3 (7.3) | 16 (5.9) |
|   Asian | 15 (6.5) | 1 (2.4) | 16 (5.9) |
|   Latino, Mexican American | 16 (7.0) | 2 (4.9) | 18 (6.6) |
|   Other | 14 (6.0) | 2 (4.9) | 16 (5.9) |
| Marital status | | | |
|   Single, never married | 22 (9.5) | 5 (12.2) | 27 (9.9) |
|   Married | 132 (57.1) | 28 (68.3) | 160 (58.8) |
|   Separated | 6 (2.6) | 1 (2.4) | 7 (2.6) |
|   Divorced | 31 (13.4) | 1 (2.4) | 32 (11.8) |
|   Widowed | 35 (15.2) | 2 (4.9) | 37 (13.6) |
|   Living with significant other but not married | 5 (2.2) | 4 (9.8) | 9 (3.3) |
| Employment status | | | |
|   Working for pay (full time) | 63 (27.5) | 17 (41.5) | 80 (29.6) |
|   Working for pay (part time) | 26 (11.4) | 3 (7.3) | 29 (10.7) |
|   Student (full or part time) | 3 (1.3) | 0 (0) | 3 (1.1) |
|   Full-time homemaker | 27 (11.8) | 0 (0) | 27 (10.0) |
|   Retired | 71 (31.0) | 16 (39.0) | 87 (32.3) |
|   Self employed | 12 (5.2) | 4 (9.8) | 16 (5.9) |
|   Unemployed | 27 (11.8) | 1 (2.4) | 28 (10.4) |
| Duration since diagnosis of RA, years | | | |
|   1 | 18 (9.4) | 0 (0) | 18 (8.0) |
|   2–5 | 53 (27.5) | 14 (43.8) | 67 (29.9) |
|   6–10 | 33 (17.2) | 6 (18.8) | 39 (17.4) |
|   11–20 | 52 (27.1) | 8 (25.0) | 60 (26.8) |
|   21–30 | 22 (11.5) | 3 (9.4) | 25 (11.2) |
|   31–50 | 14 (7.3) | 1 (3.0) | 15 (6.7) |
| Severity of joint pain, mean ± SD (range)§ | 47 ± 26 (0–100) | 36 ± 27 (0–100) | 45 ± 26 (0–100) |

  • * Data presented as no. (%) unless otherwise indicated. RA = rheumatoid arthritis.
  • One patient did not indicate sex on their survey.
  • Sum not always equal to 274 due to missing data.
  • § Visual analog scale score range is from 0 (no pain) to 100 (most severe pain possible).

Administration and materials.

The initial survey mailing included a 58-item CSHQ-RA draft questionnaire, a sociodemographic and medical questionnaire, and 3 general health status questions: patient's global assessment of functioning (visual analog scale [VAS]) (12); patient's global assessment of joint pain (VAS); and an overall rating of health (item 1 from the Medical Outcomes Study Short Form 36 [SF-36], “In general, how would you say your health is?”) (13).

A reminder phone call was made to those who did not return the first mailing within 1 week. After 4 weeks, patients responding to the first survey mailing were sent the CSHQ-RA draft questionnaire a second time along with the SF-36 and the HAQ.

The 58 items of the CSHQ-RA draft were selected and refined from an item pool comprising 324 items derived from 24 existing HRQOL instruments and patient focus groups. Items were selected by a panel of rheumatologists, psychometricians, and 11 patients with RA. Factor analysis and multitrait scaling resulted in 33 items that examine 5 domains: physical activity (PA), dexterity (D), mobility (M), emotional well being (EWB), and sexual function (10). For the CSHQ-RA, a higher score indicates poorer HRQOL. The 33-item CSHQ-RA was validated against the HAQ and the SF-36. The validation results support the construct validity, discriminant validity, and reliability of the instrument as a measure that captures the impact of RA on patients' quality of life (14).

The SF-36 is a 36-item non–disease-specific self-reported measure that assesses 8 domains of patients' HRQOL. Results from several studies of RA patients demonstrated the reliability and construct validity of the SF-36. Two summary scales, the mental component summary (MCS) and the physical component summary (PCS) can be calculated where higher scores represent better health (13). The MCS consists of the vitality, social functioning, role-emotional, and mental health domains and the PCS consists of the physical functioning, role-physical, bodily pain, and general health domains.

The HAQ is a 20-item, self-administered, arthritis-specific instrument used to assess patients' level of physical disability. The HAQ has been extensively validated in arthritis (including RA) samples. The instrument consists of 4 sections: disability, discomfort and pain, drug side effects, and dollar costs (7). Only the disability index was used in the present study. The index yields a score ranging from 0 (no physical difficulty) to 3 (severe physical difficulty); a higher score indicates a more severe condition. Several studies have reported that a change of approximately 0.5 in the HAQ disability index score indicates a clinically important change in health status (15, 16).

Assessment.

The sexual function domain contained only 2 items, and according to the data used to develop the 33-item instrument, the response rate for these 2 items was lower than for the other items (indicating low generalizability). Therefore, these 2 items were removed before the item-reduction analyses. The remaining 4 domains were subjected to both regression and item-response analyses. The goal of these analyses was to inform experts' selection of a subset of items from each domain that most accurately measured the true construct of each domain.

The regression models used the sum of the item scores in each domain as the dependent variable and individual item scores in that domain as the independent variables. To avoid singularity, 1 item in each domain was eliminated for each regression; the item eliminated was the one with the smallest correlation with the dependent variable (total score). Both forward stepwise and all-subsets regression models were used, and the best model of no more than 2 variables per domain was selected based on the size of the multiple R².
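
The all-subsets step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SAS procedure used in the study: the function names and the `domain` data are hypothetical, and the multiple R² of a 2-predictor model is computed from pairwise Pearson correlations rather than by fitting the regression directly. Because adding a predictor never lowers R², the best model of "no more than 2 variables" is always a pair, so only pairs are searched.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r2_two_predictors(y, x1, x2):
    """Multiple R^2 of y regressed on x1 and x2 (with intercept)."""
    r1, r2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)


def best_pair(items):
    """items: dict of item name -> list of scores, one per respondent.

    Returns ((name_a, name_b), R^2) for the best 2-item model of the domain total.
    """
    total = [sum(scores) for scores in zip(*items.values())]
    # Drop the item least correlated with the domain total, as in the text.
    weakest = min(items, key=lambda k: pearson(items[k], total))
    candidates = [k for k in items if k != weakest]
    scored = [
        (pair, r2_two_predictors(total, items[pair[0]], items[pair[1]]))
        for pair in combinations(candidates, 2)
    ]
    return max(scored, key=lambda t: t[1])


# Hypothetical 4-item domain answered by 6 respondents (not study data).
domain = {
    "a": [1, 2, 3, 4, 5, 3],
    "b": [2, 2, 3, 5, 4, 1],
    "c": [1, 3, 2, 4, 5, 2],
    "d": [3, 3, 3, 2, 3, 3],
}
pair, r2 = best_pair(domain)
print(pair, round(r2, 3))
```

The sketch assumes complete responses and no constant or perfectly collinear items; real survey data would need missing-value handling first.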

Rasch analysis, a psychometric technique (1-parameter item response theory [IRT]) (17), was also used to reduce the number of items by identifying the items in each domain most likely to elicit responses distributed evenly across the response categories. The program used for this analysis was NCSS2000 (Kaysville, UT, 2001). For each item in the instrument, IRT examined the probability that patients would give a particular response and determined a slope function across the 5 possible responses. A slope value <2.00 was set as the threshold for retaining items in this analysis.

Additional criteria used as references for eliminating items included 1) if responses to that item were missing on more than 5% of patient surveys; 2) if >50% of respondents chose either the highest (ceiling effect) or lowest (floor effect) rating for that item; and 3) if a significant item-to-item correlation (≥0.70) was found.
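
The three screening criteria above translate directly into simple checks. The sketch below is illustrative only: the function names and `demo` data are hypothetical, missing responses are represented as `None`, and floor and ceiling percentages are computed among respondents who answered the item, which is an assumption because the text does not state the denominator.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def screen_item(responses, low=1, high=5):
    """Return the screening flags raised by one item's responses."""
    answered = [r for r in responses if r is not None]
    flags = []
    # Criterion 1: more than 5% of surveys missing this item.
    if (len(responses) - len(answered)) / len(responses) > 0.05:
        flags.append("missing")
    # Criterion 2: more than 50% choosing the lowest or highest rating.
    if answered:
        if sum(r == low for r in answered) / len(answered) > 0.50:
            flags.append("floor")
        if sum(r == high for r in answered) / len(answered) > 0.50:
            flags.append("ceiling")
    return flags


def redundant_pairs(items, threshold=0.70):
    """Criterion 3: item pairs with correlation >= threshold (pairwise deletion)."""
    flagged = []
    for a, b in combinations(items, 2):
        both = [(x, y) for x, y in zip(items[a], items[b])
                if x is not None and y is not None]
        xs, ys = zip(*both)
        if pearson(xs, ys) >= threshold:
            flagged.append((a, b))
    return flagged


# Hypothetical responses (not study data).
demo = {"x": [1, 2, 3, 4, 5], "y": [1, 2, 3, 4, 4], "z": [3, 1, 4, 1, 5]}
print(screen_item([1, 1, 1, 2, None, 1, 1, 3, 1, 1]))
print(redundant_pairs(demo))
```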

Three experts (one rheumatologist, noted as expert A in Table 2, and 2 psychometricians, noted as experts B and C in Table 2) reviewed the regression and IRT analysis results and provided subjective judgment against those results for each of the remaining 31 items. The experts then discussed the results and selected items for the shorter instrument based on consensus. All 3 experts participated in the development and validation study of the CSHQ-RA and have extensive experience and knowledge in both rheumatology and HRQOL fields.

Table 2. Summary results of statistical analysis and experts' opinions*
| Item | Step-wise regression | IRT analysis | Expert A | Expert B | Expert C |
|---|---|---|---|---|---|
| M 1 | | Difficult | Retain | Delete | Delete |
| **M 2** | | Difficult | Retain | Retain | Delete |
| **M 3** | Retain | Retain | Delete | Retain | Retain |
| M 4 | | Difficult | Delete | Delete | Delete |
| D 5 | | Difficult | Delete | Delete | Delete |
| D 6 | | Difficult | Delete | Delete | Delete |
| D 7 | | Difficult | Retain | Retain | Delete |
| D 8 | Retain | Retain | Delete | Delete | Retain§ |
| D 9 | | Difficult | Retain | Retain | Delete |
| **D 10** | | Difficult | Retain | Retain | Retain§ |
| **D 11** | Retain | Retain | Retain | Retain | Retain |
| M 12 | Retain | Difficult | Delete | Retain | Delete |
| M 13 | | Difficult | Retain | Delete | Delete |
| **M 14** | | Retain | Retain | Retain | Retain |
| **PA 15** | Retain | Retain | Retain | Retain | Retain |
| PA 16 | | Retain | Delete | Delete | Delete |
| **EWB 17** | | Difficult | Delete | Retain | Retain |
| **EWB 18** | Retain | Difficult | Retain | Retain | Retain |
| EWB 19 | | Difficult | Delete | Retain | Delete |
| EWB 20 | | Difficult | Delete | Delete | Delete |
| EWB 21 | | Difficult | Delete | Delete | Delete |
| **EWB 22** | Retain | Retain | Retain | Retain | Retain |
| EWB 23 | | Retain | Delete | Delete | Delete |
| EWB 24 | | Difficult | Retain | Delete | Delete |
| **PA 25** | | Difficult | Retain | Delete | Retain |
| PA 26 | | Difficult | Delete | Retain | Delete |
| PA 27 | | Difficult | Delete | Retain | Delete |
| PA 28 | | Difficult | Delete | Delete | Delete |
| PA 31 | | Difficult | Delete | Delete | Delete |
| **PA 32** | Retain | Retain | Retain | Retain | Retain |
| D 33 | | Difficult | Delete | Delete | Delete |

  • * Items in bold were selected for the short-form Cedars-Sinai Health-Related Quality of Life in Rheumatoid Arthritis instrument. Mobility (M) 4, dexterity (D) 5 and 33, emotional well being (EWB) 21, and physical activity (PA) 28 and 31 are the worst items based on statistical analyses. Items 29 and 30 are the 2 in the sexual function domain.
  • The regression analysis retained the 2 items in each domain with the greatest contribution to the total R².
  • Items were retained based on the ‘discrimination parameter or slope’ value from the item response theory (IRT) analysis. An item with a higher slope represents a higher level of difficulty. Any item with a slope value <2.00 was retained in this analysis.
  • § Expert C suggested keeping both dexterity 8 and dexterity 10, but preferred dexterity 8 over dexterity 10 because it is less similar to dexterity 11.
  • Expert B found PA 25 unclear. Otherwise, expert B is indifferent between it and PA 26.

Test-retest reliability for the 11-item CSHQ-RA was assessed using intraclass correlation coefficients (ICCs) (18). Responses to the global health question of the SF-36 (13) were used to identify participants with stable health status—that is, whose responses did not change across baseline and 4 weeks later. ICCs were calculated for the total score at the 2 time points for this subpopulation only. The threshold for test-retest reliability was defined as ICC ≥ 0.70 (19, 20).
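
As an illustration of this procedure, the sketch below first keeps only respondents whose global health rating was unchanged and then computes an ICC on their paired total scores. The text does not say which ICC form was used in SPSS, so the one-way random-effects form, ICC(1,1), is an assumption here; the record field names and the data are hypothetical.

```python
def stable_pairs(records):
    """Keep (baseline, followup) totals for respondents with unchanged global health."""
    return [
        (r["total_t1"], r["total_t2"])
        for r in records
        if r["health_t1"] == r["health_t2"]
    ]


def icc_oneway(pairs):
    """One-way random-effects ICC(1,1) for paired scores (k = 2 time points)."""
    n, k = len(pairs), 2
    grand = sum(a + b for a, b in pairs) / (n * k)
    means = [(a + b) / k for a, b in pairs]
    # Between-subject and within-subject mean squares.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((a - m) ** 2 + (b - m) ** 2
              for (a, b), m in zip(pairs, means)) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical records (not study data); the third respondent's health changed.
records = [
    {"health_t1": 3, "health_t2": 3, "total_t1": 10, "total_t2": 12},
    {"health_t1": 2, "health_t2": 2, "total_t1": 20, "total_t2": 19},
    {"health_t1": 4, "health_t2": 3, "total_t1": 25, "total_t2": 35},
    {"health_t1": 1, "health_t2": 1, "total_t1": 30, "total_t2": 33},
    {"health_t1": 5, "health_t2": 5, "total_t1": 40, "total_t2": 38},
]
stable = stable_pairs(records)
icc = icc_oneway(stable)
print(len(stable), round(icc, 3), icc >= 0.70)
```

Other ICC forms (for example, two-way models) can give somewhat different values on the same data, so the form should be reported alongside the coefficient.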

To assess internal consistency of the shortened instrument, Cronbach's alpha and item-to-total score correlations (measured with Pearson correlation coefficients) were calculated for each of the 11 items. The criterion used for good overall internal consistency was Cronbach's alpha coefficient ≥0.70 (19) and the criterion for good item-internal consistency was item-to-total score correlation of ≥0.40 (21).
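
Both statistics can be computed in a few lines. The sketch below assumes complete cases and uses the raw (unstandardized) alpha; the item-to-total correlation is taken against the total that includes the item itself, which is an assumption because the text does not say whether a corrected item-total correlation was used. Function names and `rows` are hypothetical.

```python
from math import sqrt
from statistics import variance


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cronbach_alpha(rows):
    """Raw Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    item_vars = [variance(col) for col in zip(*rows)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def item_total_correlations(rows):
    """Pearson correlation of each item with the summed total score."""
    totals = [sum(r) for r in rows]
    return [pearson(col, totals) for col in zip(*rows)]


# Hypothetical 2-item responses from 4 respondents (not study data).
rows = [[1, 2], [2, 1], [3, 4], [4, 3]]
alpha = cronbach_alpha(rows)
print(round(alpha, 3), alpha >= 0.70)
print([round(r, 3) >= 0.40 for r in item_total_correlations(rows)])
```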

Construct validity, including convergent and discriminant validity, was tested. Tests for convergent validity were performed by calculating the Pearson correlation coefficients between the summed 11-item CSHQ-RA score and the SF-36 MCS, PCS, and the HAQ disability index. High negative correlation coefficients would indicate good convergent validity of the CSHQ-RA and the MCS and the PCS, whereas high positive correlation coefficients would indicate good convergent validity of the CSHQ-RA and the HAQ. We hypothesized that all of these correlations would be significant and that the correlations with the PCS and the HAQ would be stronger than the correlation with the MCS.

To determine whether the 11-item CSHQ-RA can measure differences between patients or changes in patient status over time, survey respondents were divided into 6 groups of severity based on their HAQ scores. For example, patients were included in the lowest severity group if their HAQ score was between 0 and 0.5 and in the highest severity group if their HAQ score was between 2.5 and 3. Mean total CSHQ-RA scores for these 6 groups were compared using analysis of variance (ANOVA) and a Kruskal-Wallis test.
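
The grouping and the two test statistics can be sketched as below. This is an illustration under assumptions, not the SAS analysis: HAQ scores are mapped to 6 bands of width 0.5, with a score of exactly 0.5 falling in the lowest band, following the boundaries shown in Table 5; only the F and H statistics are computed, and P values would still have to be obtained from the F and chi-square distributions. Function names and example data are hypothetical.

```python
from math import ceil


def haq_group(haq):
    """Map a HAQ disability index (0 to 3) to severity group 1..6."""
    return max(1, ceil(haq / 0.5))


def anova_f(groups):
    """One-way ANOVA F statistic; groups is a list of non-empty score lists."""
    scores = [x for g in groups for x in g]
    n, k = len(scores), len(groups)
    grand = sum(scores) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def kruskal_h(groups):
    """Kruskal-Wallis H statistic with average ranks and tie correction."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    rank, ties, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2  # mean of 1-based ranks i+1..j
        ties += (j - i) ** 3 - (j - i)
        i = j
    rank_term = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    return h / (1 - ties / (n ** 3 - n))


# Hypothetical total scores in three severity groups (not study data).
example = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(haq_group(0.625), anova_f(example), round(kruskal_h(example), 3))
```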

All statistical analyses used for instrument validation except ICC were performed using the Statistical Analysis System (SAS) version 6.12 for Windows (SAS Institute, Cary, NC). ICC was calculated with SPSS (Chicago, IL). It is worth noting that data analyzed in the present study are the sum of responses to several items and the central limit theorem helps to support approximate normality of such scores, even though the responses to each item of the CSHQ-RA are ordinal.

RESULTS

Item reduction.

Table 2 lists each of the 33 items from the long-form CSHQ-RA, summarizing the results of the statistical analyses and expert evaluations. The step-wise regression indicated 8 items to be retained, including the 2 items from each domain that contributed most heavily to the R². The IRT analysis generated slope values <2.00 for 9 of the 33 CSHQ-RA items, and those items were retained by the analysis. The items retained by the 2 analyses overlapped considerably: the IRT analysis dropped only 1 item that was retained by the regression and retained 2 that were not included in the best model from the regression analysis.

With the results of the regression and IRT analyses as references, the expert panel selected items for the final CSHQ-RA short form. The 3 experts independently selected 14, 16, and 11 items to retain, prior to their discussion. Eight items (M 4; D 5, 6, and 33; EWB 20 and 21; and PA 28 and 31) were rejected by all 3 experts and both statistical analyses, and were dropped from the instrument. One additional item (PA 16) was retained by the IRT analysis but rejected by all 3 experts and so was dropped. Seven items (D 10 and 11; PA 15 and 32; EWB 18 and 22; and M 14) unanimously selected by the experts were included in the final instrument, even though 3 of them were indicated as rejected by the regression analysis (M 14), the IRT analysis (EWB 18), or both (D 10). Among the remaining 15 items subjected to discussion among experts for the second round of selection, 3 additional items (M 2 and 3 and EWB 17) were selected because at least 2 experts agreed during the discussion and the statistical analyses supported the decision. Two items (D 7 and 9) were first selected by 2 experts even though rejected by the statistical analyses. However, they were not retained at the end after experts' discussion due to the belief that D 10 and 11 represented the same domain better and because it was important to limit the number of items of a single domain as much as possible. The final item selected was PA 25 because it was judged to be valuable to the instrument by consensus of the experts. The final shortened instrument (see Appendix B, available at the Arthritis Care & Research Web site at http://www.interscience.wiley.com/jpages/0004-3591:1/suppmat/index.html) included 11 items: 2 in the dexterity domain (D 10 and 11) and 3 each in the mobility (M 2, 3, and 14), physical activity (PA 15, 25, and 32), and emotional well being (EWB 17, 18, and 22) domains.

Table 3 shows the percentage of respondents choosing highest and lowest ratings for each of the 11 items included in the final short form of the CSHQ-RA. No items showed floor or ceiling effects. There was also no item-to-item correlation ≥0.70.

Table 3. Response distribution for each item*
| Item | Mean ± SD | % in lowest rating (=1) | % in highest rating (=5) |
|---|---|---|---|
| M 2 (n = 268) | 2.01 ± 1.00 | 39.2 | 1.1 |
| M 3 (n = 270) | 2.49 ± 1.26 | 26.7 | 7.4 |
| D 10 (n = 265) | 2.11 ± 1.10 | 36.6 | 2.3 |
| D 11 (n = 264) | 2.67 ± 1.18 | 17.4 | 8.7 |
| M 14 (n = 265) | 3.31 ± 1.02 | 3.8 | 9.1 |
| PA 15 (n = 268) | 3.03 ± 1.16 | 13.1 | 9.3 |
| EWB 17 (n = 267) | 3.09 ± 1.22 | 12.4 | 16.5 |
| EWB 18 (n = 268) | 2.96 ± 1.24 | 15.7 | 13.1 |
| EWB 22 (n = 268) | 3.42 ± 1.24 | 8.2 | 25.8 |
| PA 25 (n = 272) | 3.33 ± 1.37 | 13.6 | 27.2 |
| PA 32 (n = 270) | 2.33 ± 1.17 | 32.6 | 4.4 |

  • * Based on response at the baseline; n is not always equal to 274 due to missing data. M = mobility; D = dexterity; PA = physical activity; EWB = emotional well being.

Instrument validation.

Fifty-seven percent (n = 157) of the 274 patients who completed both baseline and followup surveys reported stable health across the 4-week time frame and were included in the analysis of test-retest reliability. The ICC for the baseline and followup total scores was 0.884 (95% confidence interval 0.841–0.916), well above the 0.70 threshold. The 11-item CSHQ-RA met the criteria for both overall internal consistency and item-internal consistency. Raw and standardized Cronbach's alphas were 0.907 and 0.909, respectively, indicating high overall internal consistency. Table 4 gives item-to-total score Pearson correlation coefficients for the 11 CSHQ-RA items. Coefficients for all items were significantly >0.40 (P < 0.0001 for each item).

Table 4. Item to total score correlation (n = 251)*
| Item | Pearson correlation coefficient |
|---|---|
| M 2 | 0.712 |
| M 3 | 0.780 |
| D 10 | 0.649 |
| D 11 | 0.730 |
| M 14 | 0.767 |
| PA 15 | 0.809 |
| EWB 17 | 0.804 |
| EWB 18 | 0.664 |
| EWB 22 | 0.494 |
| PA 25 | 0.807 |
| PA 32 | 0.738 |

  • * Based on response at the baseline; n is not equal to 274 due to missing data. See Table 3 for abbreviations.
  • All items have a P value < 0.0001.

Convergent validity was confirmed by significant correlations between the 11-item CSHQ-RA and SF-36 MCS, SF-36 PCS, and HAQ. The Pearson correlation coefficients for the SF-36 MCS and SF-36 PCS total scores were −0.497 and −0.760, respectively. The correlation coefficient for the HAQ total scores was 0.780. All 3 correlation coefficients were significant at P < 0.0001. Discriminant validity was also confirmed. Table 5 contains mean CSHQ-RA scores of each of the 6 RA severity subgroups based on HAQ scores. ANOVA and Kruskal-Wallis tests both indicate that total CSHQ-RA score varies significantly across HAQ subgroups (P < 0.0001 for both tests). All these results indicated that the 11-item CSHQ-RA has good construct validity and test-retest reliability as an RA-specific HRQOL instrument.

Table 5. Mean total score of the 11-item CSHQ-RA relative to the HAQ*
| HAQ score subgroup | 0–0.5 (n = 82) | 0.51–1.0 (n = 65) | 1.1–1.5 (n = 37) | 1.6–2.0 (n = 22) | 2.1–2.5 (n = 8) | 2.5–3.0 (n = 5) |
|---|---|---|---|---|---|---|
| Mean CSHQ score | 26.12 | 32.32 | 37.24 | 41.86 | 44.13 | 45.8 |

  • * P value of the ANOVA is < 0.0001. P value of the Kruskal-Wallis test is < 0.0001. CSHQ-RA = Cedars-Sinai Health-Related Quality of Life in Rheumatoid Arthritis instrument; HAQ = Health Assessment Questionnaire; ANOVA = analysis of variance.
  • The 11-item CSHQ-RA score can range from 0 to 55 with higher scores indicating poorer health-related quality of life.

DISCUSSION

Reduction of the 33-item CSHQ-RA instrument resulted in an 11-item survey with questions in 4 of the 5 original domains: mobility, dexterity, physical activity, and emotional well being. The shortened CSHQ-RA instrument has good internal consistency according to Cronbach's alpha and all item-to-total score correlations, even though items were selected from different domains. This result suggests that all items are in close agreement with each other on this measurement. The results of the present study also indicate that the 11-item CSHQ-RA has good reliability for measuring HRQOL in adults with RA. Responses were highly reproducible over a 4-week test-retest timeframe. None of the survey items showed ceiling or floor effects, and thus the instrument will be useful in measuring HRQOL for a wide spectrum of patients with RA.

Results from the present investigation show that the shortened CSHQ-RA has good convergent validity. The 11-item total score correlated well with 2 widely used and reliable instruments, the HAQ and the PCS of the SF-36. Correlation with the SF-36 MCS was moderate, though still significant. The lower correlation was not surprising, however, because there are only 3 items on the shortened CSHQ-RA that measure emotional well being, the focus of the MCS. The correlation between the sum of the 3 emotional well being items and the MCS would likely be higher. The significant total score correlation with the MCS indicates that emotional and physical well being may be largely related in patients with RA (22–25). The instrument also shows good discriminant validity, distinguishing levels of severity in RA. When patients in the study population were grouped by HAQ score, which has been demonstrated to measure clinically important changes in health status (15, 16), there were significant differences between those groups. These results indicate that total score on the 11-item CSHQ-RA can be used to measure differences in severity of RA.

Several characteristics of the study population upon which the CSHQ-RA is based might not be consistent with the general RA population. Patient responses were collected from a convenience sample from an urban section of Southern California. Although the sex distribution of the sample (predominantly female) was approximately the same as the typical RA population (4), the sample was skewed toward a population with a relatively high socioeconomic status who classified themselves as non-Hispanic whites. In addition, few of the patients (5%) reported having severe physical disability. Indeed, several patients dropped out, citing increases in their physical disability as the reason they could not continue. The survey materials received by the study population included a 58-item draft version of the CSHQ-RA, as well as the SF-36 and the HAQ, to be completed on 2 separate occasions (10). It is likely that the magnitude of that pencil-and-paper task selected for a higher-functioning RA population. The shortened form of the survey would be far easier for more severely disabled patients with RA to complete and, consequently, should be more useful for studying the full spectrum of patients with RA in future studies. As described above, none of the items showed a ceiling effect, so there is sufficient room at the top of the scale for responses from more severely disabled patients. On average, a greater proportion of responses by the current study population fell into the lowest rating than into the highest rating (see Table 3): 19.9% of responses to all items were “1” whereas only 11.4% of responses were “5.” Thus the 11-item CSHQ-RA should be particularly useful for accommodating severely disabled patients with RA who might otherwise be left out of study populations.
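
The 19.9% and 11.4% figures quoted above are consistent with the unweighted means of the per-item percentages in Table 3, as the short check below shows. Treating them as unweighted item means is an assumption; pooling all responses would weight items by their slightly different n.

```python
# Per-item percentages from Table 3, in table order (M 2 through PA 32).
lowest = [39.2, 26.7, 36.6, 17.4, 3.8, 13.1, 12.4, 15.7, 8.2, 13.6, 32.6]
highest = [1.1, 7.4, 2.3, 8.7, 9.1, 9.3, 16.5, 13.1, 25.8, 27.2, 4.4]

mean_lowest = sum(lowest) / len(lowest)
mean_highest = sum(highest) / len(highest)

print(round(mean_lowest, 1), round(mean_highest, 1))
```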

The cross-sectional nature of the present study did not allow for testing the CSHQ-RA's ability to detect clinically important changes in health status. We have not yet determined how much improvement in CSHQ-RA scores represents a clinically meaningful change. Inclusion of the 11-item CSHQ-RA in several controlled trials will be necessary to validate it for use in clinical trials. Furthermore, prospective longitudinal studies will also be helpful toward validation of the 11-item CSHQ-RA for longitudinal studies.

Results from the present study provide evidence for the internal consistency, convergent and discriminant validity, and test-retest reliability of the 11-item CSHQ-RA in the measurement of HRQOL in individuals with RA. The comprehensiveness of the 33-item version cannot be replaced by the 11-item version, and the long version may be preferred when resources allow, especially when the goal of the study is to examine the health in each specific domain of patients with RA. The shorter version of the CSHQ-RA, however, will make the assessment of HRQOL less burdensome in large RA population studies and for more severely disabled patients.
