Abstract

Objective

The Health Assessment Questionnaire (HAQ) disability index (DI) is the most common self-reported measure of physical disability in rheumatoid arthritis (RA). Recently, the HAQ-II was developed in the US as a short, valid, and reliable alternative using Rasch analysis. Our objective was to compare the scaling properties of the HAQ DI and HAQ-II in Dutch patients with RA.

Methods

We used data from 472 patients with confirmed RA. Internal construct validity of the HAQ versions was assessed using Rasch analysis. Additionally, external construct validity was assessed by examining correlates with other outcome measures.

Results

The HAQ DI had a large floor effect, with 9.5% of the patients indicating no disability compared with 4.3% for the HAQ-II. Both versions were unidimensional and adequately fit the Rasch model, containing only 1 nonfitting item. Additionally, 2 HAQ-II items demonstrated overfit and a high residual correlation, suggesting overlap or redundancy in item content. The HAQ-II demonstrated better item separation, indicating that it covered a wider range of physical function. Item difficulty estimates were reasonably well spread for the HAQ-II, whereas the HAQ DI items tended to cluster around similar difficulty levels. Both scales contained several items with differential item functioning by sex, age, or disease duration. Both scales demonstrated the expected pattern of correlations with other outcome measures.

Conclusion

The results indicate that both the HAQ DI and HAQ-II are psychometrically robust measures of physical function. The Rasch-developed HAQ-II, however, has several favorable scaling properties, including a better scale length and a reduced floor effect.

INTRODUCTION

Patient assessment of physical function is one of the core measures of clinical trials and observational studies of patients with rheumatoid arthritis (RA) (1–3). Over the years, the Health Assessment Questionnaire (HAQ) disability index (DI) has become the measure of choice for assessing self-reported disability in RA (4, 5). Although the HAQ DI has proven to be reliable, valid, and responsive (6, 7), it is not without its limitations. In particular, its reduced sensitivity in patients with lower levels of disability due to a floor effect and the nonlinear nature of the scale have been repeatedly noted (8–13).

In an effort to overcome these problems, Wolfe et al recently developed a revised version of the HAQ DI, the HAQ-II (14). Using Rasch analysis on a set of 31 items, including the 20 items from the HAQ DI, they selected those 10 items that best balanced the concerns of item fit, scale length, and evenly spaced items. The resulting HAQ-II showed excellent scaling properties, a reduced floor effect, and similar convergent validity and sensitivity to change as the original HAQ DI. The aim of the present study was to examine the construct validity of the Dutch versions of the HAQ DI and HAQ-II in a cross-sectional sample of patients with RA.

PATIENTS AND METHODS

Patients and study design.

The data for this study were collected at the outpatient rheumatology clinic of Medisch Spectrum Twente in Enschede, The Netherlands. During 3 waves of data collection between 2005 and 2007, all patients visiting the clinic were asked to complete a questionnaire consisting of demographic questions and standard self-reported measures of disease activity and health status. In total, 1,363 unique patients were included during the 3 study periods. From this sample, all patients with confirmed RA were selected, resulting in a cross-sectional sample of 472 patients. For patients with multiple visits during the study periods, data from the first visit were used in the analysis.

Measures.

The HAQ DI contains 20 items measuring physical disabilities over the past week in 8 categories of daily living: dressing and grooming, rising, eating, walking, hygiene, reach, grip, and activities (5). Each item of the HAQ DI is scored on a 4-point rating scale from 0 (without any difficulty) to 3 (unable to do). The overall HAQ DI score is calculated by summing and averaging the highest item score of each category when at least 6 categories are completed, essentially reducing the HAQ DI to an 8-item scale (14). The overall score ranges from 0 to 3, where scores of 0–1 are generally considered to represent mild to moderate disability, 1–2 moderate to severe disability, and 2–3 severe to very severe disability (6). We used the standard scoring method, which corrects for the use of devices or assistance from others (7). The validated Dutch version of the HAQ DI used in this study is a literal translation of the current US version with one important modification (15). In the Dutch version, the metric weight in the item “reach and get down a 5-pound object (such as a bag of sugar) from just above your head” has been reduced to 1 kg, making the task easier to complete (16).

The HAQ-II consists of 10 items: 5 items from the original HAQ DI and 5 additional items. All items are scored using the same 4-point response scale as the HAQ DI. Following the original validation study of the HAQ-II (14), we added 11 disability items to the patient questionnaire. The additional items were literally translated into Dutch by 2 bilingual individuals, using a forward-backward translation procedure. The HAQ-II is scored by simply taking the mean of the items when at least 8 items are completed, also resulting in a score from 0 to 3, with higher scores indicating more disability.
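The two scoring rules described above can be sketched as follows. This is a minimal illustration, not the authors' scoring code: the function names and the dictionary layout are hypothetical, missing answers are represented as None, and the correction for aids, devices, or help from others that the standard HAQ DI method applies is deliberately left out.

```python
from statistics import mean

def haq_di_score(category_items):
    """Mean of the highest item score in each category (0-3).

    category_items: dict mapping a category name to a list of item
    scores (0-3), with None for unanswered items (hypothetical layout).
    Returns None when fewer than 6 categories have any answer.
    """
    category_scores = []
    for items in category_items.values():
        answered = [s for s in items if s is not None]
        if answered:
            category_scores.append(max(answered))
    if len(category_scores) < 6:
        return None
    return mean(category_scores)

def haq_ii_score(items):
    """Mean of the answered items (0-3); None when fewer than 8 are answered."""
    answered = [s for s in items if s is not None]
    if len(answered) < 8:
        return None
    return mean(answered)

# Made-up responses for one patient, for illustration only.
example = {
    "dressing and grooming": [0, 1],
    "rising": [1, None],
    "eating": [0, 0, 0],
    "walking": [2, 1],
    "hygiene": [3, 0, 0],
    "reach": [1, 1],
    "grip": [0, 0, 0],
    "activities": [2, 2, 1],
}
print(haq_di_score(example))  # category maxima 1, 1, 0, 2, 3, 1, 0, 2
```

Taking the maximum within each category is what reduces the 20 items to 8 scored units, which is why a single hard item can dominate a category score.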

Besides the HAQ DI and the additional items for the HAQ-II, patients completed the Medical Outcomes Study 36-Item Short Form Health Survey (SF-36; version 2) (17) and numerical rating scales (NRS) for pain (NRS-P) and general health (NRS-GH). The SF-36 has 8 scales that can be aggregated into a physical component summary (PCS) and mental component summary (MCS) score. The scales and summary scores range from 0 to 100, with higher scores representing better health status. The component summary scores are standardized using normative data from the 1998 US general population with a mean score of 50 and an SD of 10. The NRS-P and NRS-GH consisted of 11-point rating scales ranging from 0 (“no pain” or “very good”) to 10 (“unbearable pain” or “very bad”).

Statistical analysis.

Internal construct validity of the HAQ DI and HAQ-II was assessed using Rasch analysis, while external validity was assessed by testing for expected associations with other established outcome measures in RA. Rasch analyses were performed with Winsteps software, version 3.60 (Winsteps, Chicago, IL). All other analyses were performed using SPSS software, version 14.0 (SPSS, Chicago, IL).

The Rasch model is a 1-parameter item response theory model that assumes that the probability of a certain response to a questionnaire item is a function of the person's ability on the underlying dimension being measured by the scale and the difficulty of the item (18, 19). The model asserts that the easier the item (in the case of the HAQ, the ability to perform an activity), the more likely it will be answered affirmatively, and the more able a person is, the more likely he or she will affirm an item compared with a less able person (20). When data are fitted to the Rasch model, both person ability and item difficulty are calibrated in log-odds units (logits) on a common interval-level scale.
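For reference, the dichotomous form of the model described above is commonly written as shown here. The symbols are notation chosen for this illustration rather than taken from the article: \theta_n is the measure of person n, b_i is the difficulty of item i, and X_{ni} is the scored response. Sign conventions (ability versus disability) differ between applications.

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)},
\qquad
\ln \frac{P(X_{ni} = 1)}{P(X_{ni} = 0)} = \theta_n - b_i
```

The second expression shows why measures are reported in logits: the log-odds of affirming an item equal the difference between the person and item locations.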

Rasch analysis provides a powerful method for evaluating the internal construct validity of a scale. First, Rasch analysis is useful in testing whether the items of a scale measure a single, unidimensional construct. Moreover, it can be used to evaluate whether the items are arranged hierarchically, with sufficient spread in difficulty to measure the full range of the underlying construct. Redundant items (items that are too easy or too difficult or items with the same item difficulty) and large gaps in difficulty between items can be identified to examine the efficiency and precision of the scale. Finally, another useful attribute of Rasch analysis is that it allows for the identification of items with differential item functioning (DIF; also called item bias, i.e., items that have different levels of difficulty across subgroups of patients after controlling for overall ability). In recent years, Rasch analysis has been increasingly and successfully used in the development and evaluation of functional disability questionnaires in rheumatology (10, 13, 21–28).

Because the HAQ DI and the HAQ-II consist of polytomously scored items with ordered response categories, the unrestricted partial credit model, which does not require the distance between item thresholds to be equal across items, was applied throughout the current analyses (29). As all Rasch models assume that the items in a scale are unidimensional and locally independent, these assumptions were tested within the Rasch analysis process. Unidimensionality and fit of the HAQ DI and HAQ-II to the Rasch model were first assessed by examination of the information-weighted mean square (InFit MNSQ) and outlier-sensitive mean-square (OutFit MNSQ) fit statistics for each item. MNSQ values are the ratio between observed and predicted variance and have an expected value of 1.0. Higher values suggest that the item is inconsistent with the Rasch model or does not measure the same underlying dimension as the other items. Lower values indicate that the item measures redundant or overlapping item content. MNSQ values between 0.7 and 1.3 were considered acceptable (30). Additionally, a principal component analysis of the standardized residuals was performed. Once the Rasch factor has been extracted, there should be no secondary structures (factors) left in the data. The following rules of thumb were used to confirm unidimensionality: >60% of the variance explained by the Rasch factor and an eigenvalue and explained variance of the first residual factor <3.0 and <5%, respectively (31). Finally, residual correlations between pairs of items were examined. A relatively high residual correlation (e.g., >0.5) (32) between 2 items indicates that these items are not locally independent and can also point to highly overlapping or redundant items or the existence of some other shared dimension.
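The fit statistics described above can be illustrated with a small sketch. It assumes that person measures and item thresholds have already been estimated (in the study this was done in Winsteps); the function names are hypothetical and the code follows the usual textbook definitions of the partial credit model and of the InFit and OutFit mean squares, not any particular software implementation.

```python
import math

def pcm_probabilities(theta, thresholds):
    """Category probabilities for one item under the partial credit model.

    theta: person measure in logits.
    thresholds: step difficulties of the item in logits.
    Returns probabilities for categories 0..len(thresholds).
    """
    # Cumulative sums of (theta - threshold); category 0 has an empty sum.
    exponents = [0.0]
    for delta in thresholds:
        exponents.append(exponents[-1] + (theta - delta))
    peak = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def item_fit(responses, thetas, thresholds):
    """Return (infit_mnsq, outfit_mnsq) for one item.

    InFit weights squared residuals by the model variance; OutFit is the
    plain mean of squared standardized residuals and is therefore more
    sensitive to unexpected responses far from a person's measure.
    """
    sq_resid_sum = 0.0
    variance_sum = 0.0
    z_sq_sum = 0.0
    for x, theta in zip(responses, thetas):
        probs = pcm_probabilities(theta, thresholds)
        expected = sum(k * p for k, p in enumerate(probs))
        variance = sum((k - expected) ** 2 * p for k, p in enumerate(probs))
        sq_resid = (x - expected) ** 2
        sq_resid_sum += sq_resid
        variance_sum += variance
        z_sq_sum += sq_resid / variance
    infit = sq_resid_sum / variance_sum
    outfit = z_sq_sum / len(responses)
    return infit, outfit
```

Both statistics have an expected value of 1.0 when the data match the model, which is the reference point for the 0.7 to 1.3 range used in this study.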

The efficiency and precision in measuring the underlying disability construct were examined by inspection of the item difficulty calibrations of the scales. Ideally, item difficulty levels (in logits) should be spread across a wide range of ability. Additionally, person and item separation and reliability indices were examined. The item separation index gives an estimate of the potential range of item difficulty covered by the scale (scale length), with larger values indicating a greater spread of items. The person separation index indicates the extent to which the items can distinguish between statistically different levels of person ability. Values >2.0 were considered acceptable, as this corresponds to the ability of the scale to differentiate 3 distinct levels of ability (e.g., high, medium, and low ability). Person reliability is an indicator of the degree to which the items measure persons in a consistent manner and is analogous to Cronbach's alpha, where values >0.7 are required for group use and >0.85 for individual patient use (33).
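The link between a separation index and the corresponding reliability, and between separation and the number of distinguishable levels, can be sketched with the relations commonly used in Rasch measurement. The function names are hypothetical, and the assumption is that the reported indices follow these standard definitions.

```python
def reliability_from_separation(separation):
    """Reliability implied by a separation index G: G^2 / (1 + G^2)."""
    return separation ** 2 / (1 + separation ** 2)

def strata(separation):
    """Number of statistically distinct levels: (4G + 1) / 3."""
    return (4 * separation + 1) / 3

print(strata(2.0))  # the >2.0 criterion corresponds to 3 distinct levels
print(round(reliability_from_separation(2.49), 2))
print(round(reliability_from_separation(2.74), 2))
```

Under these relations, person separation indices of 2.49 and 2.74 correspond to reliabilities of about 0.86 and 0.88.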

Possible DIF was evaluated between subgroups of patients based on sex, age, and disease duration. Age and disease duration were split at the median to create high and low subgroups. The presence of uniform DIF was assessed using the Rasch approach implemented in Winsteps (31). Items were considered to display substantial DIF when the difference between the separate item calibrations was statistically significant as determined by the t-test and the size of difference was at least 0.5 logits (34–36).
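The two-part criterion for substantial DIF can be sketched as below. This is an approximation under stated assumptions, not the Winsteps procedure itself: the function name is hypothetical, the significance level of 0.05 is assumed because the text does not state one, and the P value uses a normal approximation instead of a t distribution with estimated degrees of freedom.

```python
import math

def dif_test(measure_a, se_a, measure_b, se_b, min_contrast=0.5, alpha=0.05):
    """Compare the difficulty of one item calibrated separately in two subgroups.

    Returns (contrast, t, p, substantial), where substantial is True only
    when the contrast is at least min_contrast logits in absolute size
    and statistically significant at level alpha.
    """
    contrast = measure_a - measure_b
    joint_se = math.sqrt(se_a ** 2 + se_b ** 2)
    t = contrast / joint_se
    # Two-sided P value from the normal approximation (assumes large subgroups).
    p = math.erfc(abs(t) / math.sqrt(2))
    substantial = abs(contrast) >= min_contrast and p < alpha
    return contrast, t, p, substantial

# Made-up calibrations for illustration only.
print(dif_test(0.80, 0.12, 0.05, 0.12))
```

Requiring both conditions guards against flagging contrasts that are significant only because of sample size, or large only because of imprecise calibrations.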

Additionally, we assessed the agreement and convergent validity of the scales. Agreement between the Rasch-transformed scores of the HAQ DI and HAQ-II was assessed by the Bland-Altman approach (37). Convergent validity was tested by correlating both the raw and the Rasch-transformed scores of the scales with the SF-36 PCS and MCS, the NRS-P, and the NRS-GH. It was hypothesized a priori that the scales should be strongly associated (r >0.6) with the SF-36 PCS, which is intended to measure a similar construct of physical functioning, and moderately (r = 0.3–0.6) with the NRS-P, NRS-GH, and SF-36 MCS, which are related but conceptually distinct constructs.
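The Bland-Altman approach mentioned above reduces to the mean of the paired differences and its 95% limits of agreement. A minimal sketch follows, assuming paired scores on the same logit scale and the conventional multiplier of 1.96; the function name and the example values are hypothetical.

```python
from statistics import mean, stdev

def bland_altman(scores_a, scores_b):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of the paired differences (a - b); the limits of
    agreement are bias +/- 1.96 times the SD of those differences.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 in the denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Made-up paired scores for illustration only.
bias, lower, upper = bland_altman([1.0, 2.0, 3.0, 4.0], [0.5, 2.5, 2.0, 4.0])
print(bias, lower, upper)
```

A small bias combined with wide limits indicates that two scales agree on average but can differ substantially for an individual patient.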

RESULTS

The demographic and clinical characteristics of the study sample are listed in Table 1. Physical disability of the included patients was generally mild to moderate, with 46% of the patients scoring <1.0 on the HAQ DI and 41% between 1.0 and 2.0 (6).

Table 1. Demographic and clinical characteristics of the study sample*

Characteristic                 Value
Sex, %
  Female                       69.7
  Male                         30.3
Age, years
  Mean ± SD                    59.6 ± 14.2
  Median (IQR)                 59.0 (51.0–70.0)
Disease duration, years
  Mean ± SD                    10.5 ± 11.2
  Median (IQR)                 7.0 (2.0–16.0)
HAQ DI (range 0–3)
  Mean ± SD                    1.1 ± 0.7
  Median (IQR)                 1.0 (0.5–1.6)
HAQ-II (range 0–3)
  Mean ± SD                    1.0 ± 0.7
  Median (IQR)                 1.0 (0.5–1.5)
SF-36 PCS (range 0–100)
  Mean ± SD                    36.8 ± 9.2
  Median (IQR)                 36.8 (30.6–43.5)
SF-36 MCS (range 0–100)
  Mean ± SD                    47.8 ± 11.9
  Median (IQR)                 48.7 (39.5–58.2)
NRS-P (range 0–10)
  Mean ± SD                    4.5 ± 2.8
  Median (IQR)                 4.0 (2.0–7.0)
NRS-GH (range 0–10)
  Mean ± SD                    4.2 ± 2.5
  Median (IQR)                 4.0 (2.0–6.0)

* IQR = interquartile range; HAQ DI = Health Assessment Questionnaire disability index; HAQ-II = Health Assessment Questionnaire II; SF-36 = Medical Outcomes Study 36-Item Short Form; PCS = physical component summary; MCS = mental component summary; NRS-P = numerical rating scale for pain; NRS-GH = numerical rating scale for general health.

Both the HAQ DI and HAQ-II scores tended toward normal distributions but were slightly skewed toward lower scores (Figure 1). The HAQ DI had a relatively large floor effect with 9.5% of the patients scoring 0 (no disability) compared with 4.3% of the patients on the HAQ-II.

Figure 1. Distribution of the total Health Assessment Questionnaire (HAQ) disability index (DI) and HAQ-II scores. HAQ DI: skewness 0.32, kurtosis −0.80; HAQ-II: skewness 0.39, kurtosis −0.75.

In general, the HAQ DI and HAQ-II items adequately fit the unidimensional Rasch model (Tables 2 and 3). Both scales had only 1 inconsistent item with an InFit or OutFit statistic >1.3. The hygiene category of the HAQ DI did not fit with the overall construct of functional disability, whereas “walk outdoors on flat ground” did not fit well with the other items of the HAQ-II. Additionally, the 2 most difficult items of the HAQ-II (“move heavy objects” and “lift heavy objects”) had OutFit statistics <0.7, suggesting overfit or overlapping item content. The principal component analyses of the standardized residuals confirmed the unidimensionality of both scales. For the HAQ DI, 62.9% of the variance was explained by the Rasch dimension, whereas 7.4% of the unexplained variance was accounted for by the first residual factor with an eigenvalue of 1.6. The Rasch dimension in the HAQ-II accounted for 74.8% of the variance and the first residual factor, with an eigenvalue of 2.4, explained only 6% of the variance. Finally, interitem residual correlations were generally low for both measures. All residual correlations in the HAQ DI were <0.30. The HAQ-II, however, contained a high residual correlation of 0.59 between the items “move heavy objects” and “lift heavy objects.” Interitem residual correlations were low for the other items of the HAQ-II (r < 0.35).

Table 2. Item difficulties and fit statistics of the HAQ DI items ordered by difficulty level*

Item                     Item difficulty (logits)†   SE     InFit MNSQ   OutFit MNSQ
Rising                    2.00                       0.09   1.01         0.94
Walking                   0.47                       0.08   1.06         1.07
Dressing and grooming     0.34                       0.08   0.86         0.84
Reach                     0.21                       0.08   0.87         0.82
Eating                   −0.03                       0.08   0.96         0.98
Grip                     −0.80                       0.08   1.19         1.26
Activities               −1.03                       0.08   0.89         0.88
Hygiene                  −1.17                       0.07   1.23         1.37§

* HAQ DI = Health Assessment Questionnaire disability index; InFit = information-weighted fit; MNSQ = mean square; OutFit = outlier-sensitive fit. Person separation index 2.49, person reliability 0.86, item separation index 11.45.
† More negative scores indicate more difficult items.
‡ No MNSQ values <0.70 (overlapping or redundant items).
§ MNSQ values >1.30 (inconsistent items or items not measuring the underlying construct).

Table 3. Item difficulties and fit statistics of the HAQ-II items ordered by difficulty level*

Item                                                  Item difficulty (logits)†   SE     InFit MNSQ   OutFit MNSQ
Get on and off the toilet?                             2.89                       0.12   1.02         0.88
Stand up from a straight chair?                        2.33                       0.10   1.10         1.19
Open car doors?                                        1.87                       0.10   0.96         1.08
Walk outdoors on flat ground?                          1.28                       0.09   1.18         1.70‡
Reach and get down a 1-kg object (such as a bag
  of sugar) from just above your head?                 0.71                       0.09   1.07         1.12
Go up 2 or more flights of stairs?                    −0.63                       0.08   1.07         1.11
Wait in a line for 15 minutes?                        −0.86                       0.08   1.02         1.20
Do outside work (such as yard work)?                  −1.56                       0.08   0.95         0.91
Move heavy objects?                                   −2.92                       0.08   0.74         0.66§
Lift heavy objects?                                   −3.11                       0.08   0.77         0.64§

* HAQ-II = Health Assessment Questionnaire II; InFit = information-weighted fit; MNSQ = mean square; OutFit = outlier-sensitive fit. Person separation index 2.74, person reliability 0.88, item separation index 21.63.
† More negative scores indicate more difficult items.
‡ MNSQ values >1.30 (inconsistent items or items not measuring the underlying construct).
§ MNSQ values <0.70 (overlapping or redundant items).

The HAQ-II had an excellent scale length with an item separation index of 21.63 compared with 11.45 for the HAQ DI, indicating that the HAQ-II covered a much wider range of the functional disability construct. Inspection of the item difficulty calibrations showed that the items of the HAQ-II were reasonably well spread across a wide range of difficulty. Besides its relatively limited scale length, the items of the HAQ DI tended to cluster around similar difficulty levels around the middle of the scale hierarchy, with relatively few items at the extremes. Both scales had person separation and reliability indices >2.0 and >0.85, respectively, indicating that both can adequately discriminate between levels of physical disability and are sufficiently reliable for individual patient use.

Three items of the HAQ DI exhibited substantial uniform DIF between subgroups of patients (Figure 2). After controlling for overall disability, women had less difficulty with dressing (DIF contrast 0.93, P < 0.001), but more difficulty with grip (DIF contrast 0.75, P < 0.001). Hygiene was less difficult for younger patients (DIF contrast 0.86, P < 0.001) and patients with shorter disease duration (DIF contrast 0.78, P < 0.001). The HAQ-II also had 3 items with DIF. Standing up from a straight chair was more difficult for men (DIF contrast 0.62, P = 0.007), younger patients (DIF contrast 0.55, P = 0.008), and patients with shorter disease duration (DIF contrast 0.75, P < 0.001). Getting on and off the toilet was more difficult for younger patients (DIF contrast 0.75, P = 0.001) and patients with shorter disease duration (DIF contrast 0.54, P < 0.020). Finally, opening car doors was more difficult for younger patients (DIF contrast 0.53, P = 0.010).

Figure 2. Differential item functioning plots of the Health Assessment Questionnaire (HAQ) disability index (left column) and HAQ-II (right column) between patient groups based on sex (top), age (middle), and disease duration (bottom). Age and disease duration are split at the median.

The raw and Rasch-transformed HAQ DI and HAQ-II scores were highly intercorrelated (Table 4) and the absolute difference in raw mean scores was only 0.04 units. Additional Bland-Altman analysis of the Rasch-transformed scores showed that the mean HAQ-II scores were systematically biased toward worse scores on the HAQ (mean difference 0.216 logits, paired t-test, P < 0.001) with the 95% limits of agreement ranging from −2.156 to 2.588. Both scales demonstrated the expected pattern of correlations with other outcome measures, where HAQ-II correlates tended to be slightly stronger (Table 4).
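As a rough consistency check on the reported limits of agreement, and assuming they were computed as the mean difference ± 1.96 SD of the paired differences (the multiplier is an assumption; the text does not state it), the implied SD of the differences, written here as s_d, can be worked out:

```latex
s_d \approx \frac{2.588 - (-2.156)}{2 \times 1.96} = \frac{4.744}{3.92} \approx 1.21 \text{ logits},
\qquad
0.216 \pm 1.96 \times 1.21 \approx (-2.16,\ 2.59)
```

An SD of roughly 1.2 logits is large relative to the mean difference of 0.216 logits, which is what the wide limits of agreement reflect.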

Table 4. Pearson's correlations between the HAQ DI and HAQ-II scores and other self-reported outcome measures*

             HAQ DI          HAQ-II          SF-36 PCS   SF-36 MCS   NRS-P   NRS-GH
HAQ DI       1.00
HAQ-II       0.92 (0.89)     1.00
SF-36 PCS    −0.65 (−0.65)   −0.71 (−0.71)   1.00
SF-36 MCS    −0.32 (−0.32)   −0.34 (−0.34)   0.09        1.00
NRS-P        0.46 (0.46)     0.46 (0.47)     −0.59       −0.21       1.00
NRS-GH       0.40 (0.41)     0.41 (0.44)     −0.54       −0.28       0.69    1.00

* See Table 1 for definitions. Correlations with the Rasch-transformed scores of the HAQ versions are presented in parentheses.

DISCUSSION

This study used Rasch analysis to examine the construct validity of the HAQ DI and the HAQ-II in patients with RA. The results suggest that both scales are psychometrically robust measures of physical disability. Compared with the HAQ DI, the HAQ-II has favorable scaling properties, as demonstrated by a better scale length and a reduced floor effect. The results further point to some improvements that could be made with respect to misfitting or redundant items and items with DIF that were present in both scales.

In general, both the HAQ DI and the HAQ-II showed an acceptable fit to the Rasch model as judged by the InFit and OutFit mean square statistics. Following previous studies examining the HAQ using Rasch analysis (10, 13, 14), we applied the common critical range of 0.7–1.3 for reasonable fit. However, it should be noted that these statistics are sensitive to sample size and may lead to an unacceptable Type I error rate with large samples (38). In the HAQ DI, only the hygiene category demonstrated substantial underfit as indicated by a relatively high OutFit value. The finding that this category is inconsistent or does not measure the same dimension as the other categories is in accordance with previous Rasch analyses of the scale (13, 14, 25). Wolfe et al (14) have suggested that this noisiness may be caused by patients guessing at their ability to answer the underlying item “take a tub bath,” because many people use showers instead of bathtubs. The HAQ-II also contained 1 item with high OutFit. The finding that “walk outdoors on flat ground” did not closely relate to the overall construct of disability is somewhat surprising and has not been reported in previous studies. Given that the corresponding InFit statistic, which is less sensitive to unexpected responses to items far from a person's level of ability, is acceptable, the misfit of this item may be inflated by a few unexpected responses of patients with high disability.

Besides this underfitting item, the HAQ-II additionally contained 2 items with low InFit and OutFit statistics, which usually indicates overlap or redundancy in the pattern of responses. This overlap between “move heavy objects” and “lift heavy objects” was also apparent by an unacceptably high residual correlation between these items. Indeed, simple inspection of the item content suggests that the items assess very similar and interdependent tasks as people will usually try to lift objects in order to move them. Omission of one of these items or, even better, replacement with a slightly less difficult item in future studies could lead to better overall scaling properties of the HAQ-II.

As would be expected from a Rasch-developed measure, the HAQ-II demonstrated better distributional and scaling properties than the traditional HAQ DI. Total HAQ-II scores showed a substantially lower floor effect than total HAQ DI scores. The floor effect of the HAQ DI, in which patients report a normal score but nonetheless experience functional limitations, is a well-known problem of the HAQ DI (8–11, 13, 14) and was one of the main reasons for the development of the HAQ-II. This smaller floor effect is achieved by a better scale length and item difficulty calibration of the HAQ-II. The HAQ-II measures a wider range of disability and has specifically more items probing relatively difficult activities. In fact, according to the current results, the 5 most difficult items on the HAQ-II were the ones that were added to the scale by Wolfe et al. This resulted in a high item separation index for the HAQ-II, which was almost twice the size of that for the HAQ DI.

The present study design did not allow for a direct examination of the responsiveness of the scales. However, the high person separation and reliability indices indicate that both scales can identify several statistically distinct levels of person ability. This lends support to the sensitivity of both scales to changes in physical disability, where the HAQ-II would theoretically be somewhat more sensitive.

One concern with both scales is the presence of items with DIF between patient groups. In particular, older and younger patients, who are at the same level of disability, appear to respond somewhat differently to several items. Although the absolute magnitude of DIF was generally small and may average out across the items in a scale, future studies should continue to examine the presence of DIF and its influence on the total scale scores.

Although the actual item difficulty estimates of the HAQ DI in this study were somewhat different from those reported in previous studies in RA (10, 13, 25), the difficulty hierarchy (rank order) was quite similar to those most recently found by Wolfe (25) and Taylor and McPherson (13). In addition, both the actual difficulty estimates and the difficulty hierarchy of the HAQ-II items in this study were very similar to those of the original US version (14). This finding provides some preliminary support for both the comparability of the scales across different RA cohorts and the robustness of the present findings. Future research, using Rasch analysis on pooled data from different countries and cohorts, could assess more thoroughly whether the scales are equivalent across cultures and different cohorts.

An important and possibly related issue is the divergent translation of the item “reach and get down a 5-pound object (such as a bag of sugar) from just above your head” in the Dutch HAQ DI and HAQ-II. In the versions we used in this study, this item is made “easier” by reducing the object weight to 1 kg. In the Rasch analysis, this was reflected in substantially lower item difficulty estimates for this item of the HAQ-II and the corresponding reach category of the HAQ DI compared with previous studies using the original wording (10, 13, 14, 25). This difference in item difficulty should be kept in mind when comparing the present results with previous (Rasch) analyses of the HAQ DI and HAQ-II. Recently, a new consensus version of the Dutch HAQ DI was proposed, which should overcome this problem (16).

Finally, the HAQ DI and HAQ-II were highly intercorrelated and demonstrated a similar pattern of associations with other validated outcomes, suggesting that both scales assess the same underlying construct. Of course, this is not very surprising because 5 of the 10 HAQ-II items stem directly from the HAQ DI. Although the absolute mean difference between the raw HAQ DI and HAQ-II scores was negligible, Bland-Altman analysis of the Rasch-transformed scores showed that the HAQ DI and HAQ-II scores were significantly different and characterized by high intraindividual variation. Wolfe et al (14) have suggested conversion formulae for transforming group-level data from HAQ DI to HAQ-II and vice versa. The current results, however, confirm their finding that the HAQ DI and HAQ-II cannot be used interchangeably at the individual patient level.

In conclusion, this study suggests that the HAQ DI and HAQ-II are both adequately valid measures of physical disability in patients with RA, but confirms that the Rasch-developed HAQ-II has better distributional and scaling properties. Moreover, given that the HAQ-II is much shorter (particularly when the aids and devices section of the HAQ DI is considered) and easier to score, the HAQ-II appears to be more suitable for use in clinical care.

AUTHOR CONTRIBUTIONS

Dr. ten Klooster had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study design. Ten Klooster, Taal, van de Laar.

Acquisition of data. Ten Klooster, Taal, van de Laar.

Analysis and interpretation of data. Ten Klooster, Taal.

Manuscript preparation. Ten Klooster, Taal, van de Laar.

Statistical analysis. Ten Klooster.

Acknowledgements

The authors thank Christine Bellmann, Ilse Bosgra, Petra Hagens, Nicolette Kupper, Julia Rulle, Lisanne Schmit, Amrah Schotanus, Johan Steehouder, Katharine Steentjes, Lidewij van Gessel, and Anouk van der Heij for collecting the data, and Christina Bode and Andre Brands for their help in organizing the study.

REFERENCES

  • 1
    Boers M, Tugwell P, Felson DT, van Riel PL, Kirwan JR, Edmonds JP, et al. World Health Organization and International League of Associations for Rheumatology core endpoints for symptom modifying antirheumatic drugs in rheumatoid arthritis clinical trials. J Rheumatol 1994; 21: 869.
  • 2
    Felson DT, Anderson JJ, Boers M, Bombardier C, Chernoff M, Fried B, et al. The American College of Rheumatology preliminary core set of disease activity measures for rheumatoid arthritis clinical trials. Arthritis Rheum 1993; 36: 72940.
  • 3
    Wolfe F, Lassere M, van der Heijde D, Stucki G, Suarez-Almazor M, Pincus T, et al. Preliminary core set of domains and reporting requirements for longitudinal observational studies in rheumatology. J Rheumatol 1999; 26: 4849.
  • 4
    Fries JF, Spitz P, Kraines RG, Holman HR. Measurement of patient outcome in arthritis. Arthritis Rheum 1980; 23: 13745.
  • 5
    Fries JF, Spitz PW, Young DY. The dimensions of health outcomes: the Health Assessment Questionnaire, disability and pain scales. J Rheumatol 1982; 9: 78993.
  • 6
    Bruce B, Fries JF. The Stanford Health Assessment Questionnaire: dimensions and practical applications [review]. Health Qual Life Outcomes 2003; 1: 20.
  • 7
    Bruce B, Fries JF. The Stanford Health Assessment Questionnaire: a review of its history, issues, progress, and documentation. J Rheumatol 2003; 30: 16778.
  • 8
    Stucki G, Stucki S, Bruhlmann P, Michel BA. Ceiling effects of the Health Assessment Questionnaire and its modified version in some ambulatory rheumatoid arthritis patients. Ann Rheum Dis 1995; 54: 4615.
  • 9
    Pincus T, Swearingen C, Wolfe F. Toward a Multidimensional Health Assessment Questionnaire (MDHAQ): assessment of advanced activities of daily living and psychological status in the patient-friendly Health Assessment Questionnaire format. Arthritis Rheum 1999; 42: 2220–30.
  • 10
    Tennant A, Hillman M, Fear J, Pickering A, Chamberlain MA. Are we making the most of the Stanford Health Assessment Questionnaire? Br J Rheumatol 1996; 35: 574–8.
  • 11
    Uhlig T, Haavardsholm EA, Kvien TK. Comparison of the Health Assessment Questionnaire (HAQ) and the modified HAQ (MHAQ) in patients with rheumatoid arthritis. Rheumatology (Oxford) 2006; 45: 454–8.
  • 12
    Wolfe F. The psychometrics of functional status questionnaires: room for improvement. J Rheumatol 2002; 29: 865–8.
  • 13
    Taylor WJ, McPherson KM. Using Rasch analysis to compare the psychometric properties of the Short Form 36 physical function score and the Health Assessment Questionnaire disability index in patients with psoriatic arthritis and rheumatoid arthritis. Arthritis Rheum 2007; 57: 723–9.
  • 14
    Wolfe F, Michaud K, Pincus T. Development and validation of the Health Assessment Questionnaire II: a revised version of the Health Assessment Questionnaire. Arthritis Rheum 2004; 50: 3296–305.
  • 15
    Zandbelt MM, Welsing PM, van Gestel AM, van Riel PL. Health Assessment Questionnaire modifications: is standardisation needed? Ann Rheum Dis 2001; 60: 841–5.
  • 16
    Boers M, Jacobs JW, van Vliet Vlieland TP, van Riel PL. Consensus Dutch Health Assessment Questionnaire. Ann Rheum Dis 2007; 66: 132–3.
  • 17
    Ware JE, Kosinski M, Dewey JE. How to score version 2 of the SF-36 health survey. Lincoln (RI): QualityMetric Incorporated; 2000.
  • 18
    Hays RD, Morales LS, Reise SP. Item response theory and health outcomes measurement in the 21st century. Med Care 2000; 38: II28–42.
  • 19
    Reeve BB, Fayers P. Applying item response theory modelling for evaluating questionnaire item and scale properties. In: Fayers PM, Hays RD, editors. Assessing quality of life in clinical trials: methods and practice. Oxford: Oxford University Press; 2005. p. 55–73.
  • 20
    Tennant A, McKenna SP, Hagell P. Application of Rasch analysis in the development and application of quality of life instruments. Value Health 2004; 7 Suppl 1: S22–6.
  • 21
    Wolfe F, Kong SX. Rasch analysis of the Western Ontario MacMaster questionnaire (WOMAC) in 2205 patients with osteoarthritis, rheumatoid arthritis, and fibromyalgia. Ann Rheum Dis 1999; 58: 563–8.
  • 22
    Roorda LD, Jones CA, Waltz M, Lankhorst GJ, Bouter LM, van der Eijken JW, et al. Satisfactory cross cultural equivalence of the Dutch WOMAC in patients with hip osteoarthritis waiting for arthroplasty. Ann Rheum Dis 2004; 63: 36–42.
  • 23
    Ryser L, Wright BD, Aeschlimann A, Mariacher-Gehler S, Stucki G. A new look at the Western Ontario and McMaster Universities Osteoarthritis Index using Rasch analysis. Arthritis Care Res 1999; 12: 331–5.
  • 24
    Wolfe F, Hawley DJ, Goldenberg DL, Russell IJ, Buskila D, Neumann L. The assessment of functional impairment in fibromyalgia (FM): Rasch analyses of 5 functional scales and the development of the FM Health Assessment Questionnaire. J Rheumatol 2000; 27: 1989–99.
  • 25
    Wolfe F. Which HAQ is best? A comparison of the HAQ, MHAQ and RA-HAQ, a difficult 8 item HAQ (DHAQ), and a rescored 20 item HAQ (HAQ20): analyses in 2,491 rheumatoid arthritis patients following leflunomide initiation. J Rheumatol 2001; 28: 982–9.
  • 26
    Pouchot J, Ecosse E, Coste J, Guillemin F, for the French Quality of Life Study Group and the Paediatric Rheumatology International Trials Organisation. Validity of the Childhood Health Assessment Questionnaire is independent of age in juvenile idiopathic arthritis. Arthritis Rheum 2004; 51: 519–26.
  • 27
    Kucukdeveci AA, Sahin H, Ataman S, Griffiths B, Tennant A. Issues in cross-cultural validity: example from the adaptation, reliability, and validity testing of a Turkish version of the Stanford Health Assessment Questionnaire. Arthritis Rheum 2004; 51: 14–9.
  • 28
    Durez P, Fraselle V, Houssiau F, Thonnard JL, Nielens H, Penta M. Validation of the ABILHAND questionnaire as a measure of manual ability in patients with rheumatoid arthritis. Ann Rheum Dis 2007; 66: 1098–105.
  • 29
    Masters G. A Rasch model for partial credit scoring. Psychometrika 1982; 47: 149–74.
  • 30
    Wright BD, Linacre JM, Gustafson JE, Martin-Lof P. Reasonable mean-square fit values. Rasch Meas Trans 1994; 8: 370.
  • 31
    Linacre JM. A user's guide to WINSTEPS MINISTEP Rasch-model computer programs. Chicago: Winsteps; 2006.
  • 32
    Davidson M, Keating JL, Eyres S. A low back-specific version of the SF-36 physical functioning scale. Spine 2004; 29: 586–94.
  • 33
    Tennant A, Conaghan PG. The Rasch measurement model in rheumatology: what is it and why use it? When should it be applied, and what should one look for in a Rasch paper? Arthritis Rheum 2007; 57: 1358–62.
  • 34
    Draba RE. The identification and interpretation of item bias (research memorandum no. 25). Chicago: University of Chicago; 1977.
  • 35
    Lai JS, Teresi J, Gershon R. Procedures for the analysis of differential item functioning (DIF) for small sample sizes. Eval Health Prof 2005; 28: 283–94.
  • 36
    Linacre M. Sample size and item calibration stability. Rasch Meas Trans 1994; 7: 328.
  • 37
    Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986; 1: 307–10.
  • 38
    Smith RM, Schumacker RE, Bush MJ. Using item mean squares to evaluate fit to the Rasch model. J Outcome Meas 1998; 2: 66–78.