Assessment of functional ability in rheumatology is a central element in health status measurement. It provides indicators of outcomes of interventions and of disease severity. The integrity of the evaluation depends on whether the instrument possesses proper measurement properties. The most effective instruments are reliable (yield the same results repeatedly), valid (measure what they are supposed to measure), responsive (able to detect changes), generalizable (widely used in many different settings), have good norms (based on representative samples), are available in multiple languages, and have been stable for a sufficient length of time to permit longitudinal study. There are a number of such instruments. Some of them are multipurpose or generic, i.e., they are applicable across many diseases and conditions, such as the Short Form 36 (SF-36) (1); whereas some are disease-specific, i.e., they have restricted application to 1 or a small number of disease conditions, such as the Arthritis Impact Measurement Scale (2). The primary advantages of using a disease-specific instrument rather than a more generic one are improved content validity and better responsiveness to change in the specific disease condition for which it was tailored.
Two functional status assessment instruments widely used in clinical and observational trials for assessing physical function are the disease-specific Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) (3) and the more generic Health Assessment Questionnaire (HAQ) (4). Both are patient-centered, self-assessment tools that measure multiple dimensions of health status and take ∼5–10 minutes to complete. The WOMAC is designed specifically for patients with osteoarthritis (OA) of the knee and/or hip joints and evaluates 3 dimensions: physical function, pain, and stiffness (5). The HAQ, although initially developed and validated in patients with rheumatoid arthritis (6), has been broadly and extensively used and validated in widely diverse populations, including patients with OA, human immunodeficiency virus/acquired immunodeficiency syndrome, juvenile rheumatoid arthritis, systemic lupus erythematosus, scleroderma, ankylosing spondylitis, fibromyalgia, psoriatic arthritis, systemic sclerosis, and in healthy aging populations (7). The full HAQ is designed to measure 5 generic dimensions: physical function, pain, drug side effects, health care utilization, and mortality. However, it is the short, or 2-page, HAQ that is most recognized for its 2 core measurement scales: the disability index and the pain scale, as well as its assessment of global health status (7). The 2-page HAQ is one of the most frequently used instruments for evaluation of functional status, is frequently used in clinical settings, and is one of the instruments recommended for patient outcome assessment in rheumatoid arthritis clinical trials (7, 8). Although the HAQ disability index (DI) is copyrighted to preserve its integrity, the English version is provided free of charge, whereas use of the WOMAC may be fee based. Both instruments have repeatedly demonstrated reliability, validity, and responsiveness both in observational and clinical trials (5, 7, 9).
Although both instruments have a long history, and similarities and distinctions between the 2 instruments are readily observable, reports of head-to-head comparisons are few (5). In particular, face and content validity have not been directly compared, nor has the seminal characteristic, sensitivity to change, been rigorously compared. Analysis of comparability between instruments can be helpful for instrument selection, can permit comparisons of findings across disease conditions, and can provide insight into considerations regarding generic and disease-specific instruments. For clinical trials, greater sensitivity to change (responsiveness) means greater statistical power for a given number of subjects, or fewer subjects required to reach a given level of statistical power.
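The link between responsiveness and the number of subjects can be illustrated with a minimal sketch. It assumes a two-group comparison of mean change scores and uses the standard normal-approximation formula n = 2(z_{1-α/2} + z_{1-β})² / d², where d is the standardized effect size that an instrument detects. The function name and the effect sizes are hypothetical and are not taken from this study.

```python
from math import ceil
from statistics import NormalDist


def subjects_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group for a two-sided, two-group
    comparison of means (normal approximation; illustrative only)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile corresponding to desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Hypothetical standardized effect sizes for a more and a less responsive instrument.
for label, d in [("more responsive", 0.5), ("less responsive", 0.3)]:
    print(label, d, subjects_per_group(d))
```

Under these assumptions, the instrument that registers the larger standardized change needs markedly fewer subjects per group to reach the same power.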
In this study, we address 3 questions. What are the similarities and differences in face and content validity between the physical function and pain scales of the 2 instruments? How well do the disease-specific WOMAC and the generic HAQ correlate with each other? Does the disease-specific WOMAC demonstrate better responsiveness to change in a specific disease condition, knee or hip OA, as compared with the more generic HAQ?
The first goal of our study was to compare the similarities and differences in face and content validity of the HAQ and WOMAC physical function and pain scales. Overall, we found that there was a great deal of commonality between the 2 physical function scales in the types of functional abilities assessed. These similarities, along with their respective histories, reaffirm their utility as measures of functional status.
However, we found that there are significant differences between the 2 instruments. Only the HAQ DI contains items that specifically assess upper extremity function, which was not unexpected because the WOMAC is designed to be a measure of lower extremity function (3). On the other hand, the WOMAC physical function scale contains more items than the HAQ DI that assess both upper and lower extremity function, which was surprising in an instrument designed for lower extremity function in OA of the knee or hip. In contrast to the single-item pain scale of the HAQ, the 5 items used to assess pain in the WOMAC consist of function-based behaviors and suggest attributes of a disability measure rather than purely pain assessment. This finding is similar to those of Wolfe (18), who had reported that the WOMAC tapped into other domains and was influenced by the existence of fatigue, symptom counts, depression, and back pain. In addition, the more general wording of many WOMAC items, compared with the specificity in the HAQ, may have contributed to equivocal responses. Ambiguity in item construction has been reported to negatively affect reliability and sensitivity to change relative to items that are more precisely constructed (19).
The second goal of our study was to examine the association between the HAQ and WOMAC physical function and pain scales to determine the extent of their relationship and whether they measure similar constructs. Both physical function scales were significantly and strongly correlated, indicating that the scales performed well and that patients interpreted them similarly. This finding is consistent with the strong correlations between these scales that have been reported by Brazier et al (20), Wolfe (18), and in a study that compared the SF-36 with the WOMAC (21). The WOMAC physical function scale and the SF-36 physical function component were found to be highly correlated (rs = 0.71–0.75), further confirming the WOMAC's ability to measure functional status.
However, the strongest correlations at both time points and with change scores were among the 3 WOMAC scales. These strong intrascale correlations suggest that the 3 WOMAC scales are measuring similar constructs within the overall instrument. In contrast, the relationship between the HAQ DI and the HAQ pain scale was moderate, which may be due in part to the HAQ DI assessing a broader range of functioning and its pain scale being more global and capturing reported pain from both upper and lower extremities. The HAQ and WOMAC pain scales were also correlated with each other, albeit weakly, at both times of administration.
We also found that the 2 physical function scales differed in their responsiveness; the HAQ DI was more sensitive to change than the WOMAC in detection of disease progression, although less sensitive than radiograph scores. This finding is particularly significant because sensitivity to change is crucial for statistical power calculations for estimation of patient numbers for a clinical study. Plausible explanations for this observation are that the items in the HAQ DI are more specific and that the HAQ DI scoring rule, which averages the highest scores in each of the 8 categories rather than averaging all responses, is more sensitive than the corresponding scoring of the WOMAC. This phenomenon has been demonstrated in a comparison of the HAQ DI and the modified HAQ, which asks a single question in each of the 8 categories and then averages the responses (22, 23). In addition, our modification of the WOMAC timeframe from its convention of 48 hours to 1 week, made to match the HAQ timeframe, could have affected the results. However, Griffiths et al (13) reported finding no time dependency of responses between 1 and 14 days.
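The difference between the 2 scoring rules described above can be sketched in a few lines. This is a simplified illustration, not the official scoring procedure of either instrument: the category labels, item counts, and responses (scored 0 to 3) are hypothetical, and any further adjustments that a scoring manual may specify are ignored.

```python
from statistics import mean


def highest_per_category_score(responses):
    """Average the highest item score within each category
    (the rule the text attributes to the HAQ DI)."""
    return mean(max(items) for items in responses.values())


def all_items_score(responses):
    """Average every item response, ignoring category boundaries."""
    return mean(score for items in responses.values() for score in items)


# Hypothetical responses for one patient at 2 time points (illustrative data).
baseline = {
    "dressing": [0, 0], "arising": [1, 0], "eating": [0, 0, 0],
    "walking": [1, 1], "hygiene": [0, 0, 0], "reach": [0, 0],
    "grip": [0, 0, 0], "activities": [1, 0, 0],
}
# At followup, a single item in one category worsens by 1 point.
followup = dict(baseline, arising=[2, 0])

for rule in (highest_per_category_score, all_items_score):
    change = rule(followup) - rule(baseline)
    print(rule.__name__, round(change, 3))
```

In this constructed case the highest-per-category rule registers a change of 0.125, whereas averaging all 20 responses registers 0.05. The example was chosen so that the worsening item is the highest in its category; a change confined to a lower-scoring item would leave the highest-per-category score unchanged, so the sketch shows only one way the rule can amplify change.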
Furthermore, although radiographic progression was not a principal subject of this analysis, the TKS performed well and was able to distinguish progression at the group level. Although the TKS was poorly correlated with other outcome measures, this was not unexpected because they represent different entities and other researchers have also found little evidence of a strong association between such outcomes as pain and disability with radiographic changes in patients with knee OA (24, 25).
Outcome assessment in arthritis is highly dependent on the deployment of valid, reliable, and responsive measurement tools, such as the HAQ and WOMAC. Overall, both instruments performed well, but the more generic HAQ demonstrated somewhat better measurement properties than the disease-specific WOMAC in this group of patients with OA of the knee or hip. A number of plausible explanations for this finding were identified. These results differ from what has been suggested previously (26), specifically that a disease-specific instrument, such as the WOMAC, would be expected to be more sensitive in assessment of the condition for which it was developed. The WOMAC has been shown to be a useful measurement tool in OA of the knee or hip and has achieved status as a standard in these conditions. Whether the HAQ or WOMAC is used should depend in part on the expected magnitude of differences between groups. If small differences are to be shown significant, then the measure that is more sensitive to change is preferred. By this criterion, the HAQ would be the better choice. These results also suggest that findings across studies may be better compared using the HAQ, which offers the additional advantages of wider usage, multiple language and cultural translations, and adaptability to a wide variety of diseases and conditions, all of which contribute to the generalizability of findings.