To compare the measurement properties of the generic Health Assessment Questionnaire (HAQ) and the disease-specific Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC).
Physical function, pain, and radiographic progression were assessed in knee or hip osteoarthritis patients (n = 271) who had 2 radiographs that were at least 6 months apart from 6 ARAMIS (Arthritis, Rheumatism, and Aging Medical Information System) databanks. Data were compared at baseline and after a mean of 3.2 (SE 0.10) years. Correlation coefficients and standardized effect sizes (SES) were used to assess their relationship and responsiveness.
The majority of items in the 2 function and pain scales overlapped and were highly and significantly correlated with each other at baseline and last assessments (function at baseline rs = 0.71 and function at last assessment rs = 0.79, P < 0.0001; pain at baseline rs = 0.70 and pain at last assessment rs = 0.76, P < 0.0001). The HAQ disability index and total knee score were more sensitive to detection of disease progression than the WOMAC (SES for HAQ = 0.27; SES for WOMAC = −0.05).
Both instruments showed favorable measurement properties, with the HAQ having the advantages of being more sensitive to change and adaptable to a wide variety of diseases and conditions, which contribute to the generalizability of findings.
Assessment of functional ability in rheumatology is a central element in health status measurement. It provides indicators of outcomes of interventions and of disease severity. The integrity of the evaluation depends upon whether or not the instrument possesses proper measurement properties. The most effective instruments are reliable (yield the same results repeatedly), valid (measure what they are supposed to measure), responsive (able to detect changes), widely used in many different settings (generalizable), have good norms (based on representative samples), are available in multiple languages, and have been stable for a sufficient length of time to permit longitudinal study. There are a number of such instruments. Some of them are multipurpose or generic, i.e., they are applicable across many diseases and conditions, such as the Short Form 36 (SF-36) (1); whereas some are disease-specific, i.e., they have restricted application to 1 or a small number of disease conditions, such as the Arthritis Impact Measurement Scale (2). The primary advantages of using a disease-specific instrument rather than a more generic one are improved content validity and better responsiveness to change in the specific disease condition for which it was tailored.
Two functional status assessment instruments widely used in clinical and observational trials for assessing physical function are the disease-specific Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) (3) and the more generic Health Assessment Questionnaire (HAQ) (4). Both are patient-centered, self-assessment tools that measure multiple dimensions of health status and take ∼5–10 minutes to complete. The WOMAC is designed specifically for patients with osteoarthritis (OA) of the knee and/or hip joints and evaluates 3 dimensions: physical function, pain, and stiffness (5). The HAQ, although initially developed and validated in patients with rheumatoid arthritis (6), has been broadly and extensively used and validated in widely diverse populations, including patients with OA, human immunodeficiency virus/acquired immunodeficiency syndrome, juvenile rheumatoid arthritis, systemic lupus erythematosus, scleroderma, ankylosing spondylitis, fibromyalgia, psoriatic arthritis, systemic sclerosis, and in healthy aging populations (7). The full HAQ is designed to measure 5 generic dimensions: physical function, pain, drug side effects, health care utilization, and mortality. However, it is the short, or 2-page, HAQ that is most recognized for its 2 core measurement scales: the disability index and the pain scale, as well as its assessment of global health status (7). The 2-page HAQ is one of the most frequently used instruments for evaluation of functional status, is frequently used in clinical settings, and is one of the instruments recommended for patient outcome assessment in rheumatoid arthritis clinical trials (7, 8). Although the HAQ disability index (DI) is copyrighted to preserve its integrity, the English version is provided free of charge, whereas use of the WOMAC may be fee based. Both instruments have repeatedly demonstrated reliability, validity, and responsiveness both in observational and clinical trials (5, 7, 9).
Although both instruments have a long history and similarities and distinctions between the 2 instruments are readily observable, reports of head-to-head comparisons are few (5). In particular, face and content validity have not been directly compared, nor has the seminal characteristic, sensitivity to change, been rigorously compared. Analysis of comparability between instruments can be helpful for instrument selection, can permit comparisons of findings across disease conditions, and can provide insight into considerations regarding generic and disease-specific instruments. For clinical trials, greater sensitivity to change (responsiveness) means greater statistical power for a given number of subjects, or fewer subjects required to achieve a given level of statistical power.
In this study, we address 3 questions. What are the similarities and differences in face and content validity between the physical function and pain scales of the 2 instruments? How well do the disease-specific WOMAC and the generic HAQ correlate with each other? Does the disease-specific WOMAC demonstrate better responsiveness to change in a specific disease condition, knee or hip OA, as compared with the more generic HAQ?
Between 1996 and 2000, we studied 271 patients with OA of the knee or hip with a mean (SE) of 3.1 (0.1) years of observation. They primarily were white (80%) women (79%) with an average age of 65.7 (0.6) years, 13.8 (0.1) years of education, and a mean disease duration of 14.1 (0.7) years. At the first administration, patients reported a HAQ DI score of 0.79 (0.04), representing mild to moderate disability.
These patients had been participants of a larger randomized trial and an associated observational study whose data were pooled for this investigation. The larger study was designed to compare radiographic and clinical outcomes between patients with painful hip or knee OA who were regular users of nonsteroidal antiinflammatory drugs and those using other analgesics (primarily acetaminophen) (10, 11). In both the longitudinal observational study and the randomized controlled trial, there were no differences in patient outcomes between the 2 groups, and their degrees of OA progression were similar (10, 11). To be included in these analyses, patients from the larger studies had to have had 2 radiographs taken that were at least 6 months apart.
All study patients were drawn from 6 ARAMIS (Arthritis, Rheumatism, and Aging Medical Information System) databanks. ARAMIS is a longitudinal outcome assessment program that has been collecting detailed outcome data on patients with rheumatic diseases semiannually for >2 decades from databank centers in North America. Patient data are reviewed by trained outcome assessors who follow up with patients about ambiguities, inconsistencies, or missing data following standard ARAMIS protocols (12). In general, patients in this study mirror ARAMIS cohorts of OA patients, who have been shown to be similar to the general OA patient population. The study protocol and informed consent were approved by the Stanford University Administrative Panel on Human Subjects in Medical Research, and each patient gave written informed consent.
Patients completed the full HAQ (comprised of the HAQ DI, HAQ pain scale, patient global health assessment, drug side effects, health care resource utilization, health behavior, and demographic variables) and the visual analog scale (VAS) WOMAC physical function, pain, and stiffness scales on 2 occasions, an average of 3.2 (0.10) years apart. The HAQ DI and pain scale and the 3 WOMAC scales were separated by several pages at each administration. The HAQ DI, pain scale, and patient global assessment were unchanged from their standard format and reference timeframe of 1 week. To minimize patient reporting differences, the reference timeframe on the 3 scales of the VAS WOMAC was set to match the HAQ timeframe of 1 week, rather than the original WOMAC timeframe of the previous 48 hours. This modification has been reported to have little effect on results (13).
The HAQ DI and pain scale and the WOMAC physical function, pain, and stiffness scales were assessed for face and content validity by conducting a content analysis. The content and degree of overlap between the HAQ and WOMAC physical function scales were evaluated by categorizing each item by whether it measured lower extremity function, upper extremity function, both upper and lower extremity function, or fell into an “other” category when it did not fit into any of the other groups. The content analysis also included comparison of items for specificity and for face validity.
In addition, we used a total knee score (TKS) obtained from knee radiographs as an indicator of OA progression. Weight-bearing knee radiographs with knee flexion at 15° using the Buckland-Wright method (14) had been evaluated for a global Kellgren-Lawrence (15) summary grade and for joint space narrowing, osteophytes, sclerosis, and minimal joint space in millimeters to obtain a TKS for each knee from 0 (normal) to 64 (worst). Radiographs were read and evaluated by 1 trained reader (Nancy Lane, MD) who was blinded to group assignment. For this study, as an external measure of validity, our gold standard was the relative ability to detect disease progression.
The 2-page HAQ, consisting of the HAQ DI, the HAQ pain scale, and a measure of global health status was used for this study (7). The HAQ DI is comprised of 20 items in 8 categories (dressing, arising, eating, walking, hygiene, reaching, gripping, and outside activity) and is measured on a 4-point ordinal scale from 0 to 3: 0 = without any difficulty; 1 = with some difficulty; 2 = with much difficulty; and 3 = unable to do. The highest score in each of 8 categories is averaged into a disability index on a scale from 0 (no disability) to 3 (complete disability). The HAQ pain scale evaluates the presence or absence of arthritis-related pain and its severity using a single-item, 15-cm double-anchored VAS that is scored from 0 (no pain) to 3 (severe pain). The patient global health scale is also a single-item, 15-cm double-anchored VAS that is scored from 0 (very well) to 100 (very poor).
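The scoring rule described above (take the highest item score within each of the 8 categories, then average those 8 values) can be sketched in a few lines. This is a minimal illustration only, not the official scoring algorithm: the function name `haq_di`, the example responses, and the assignment of items to categories are assumptions made here, and the sketch covers only the rule stated in the text.

```python
# Category names follow the text; which items belong to which category is
# supplied by the caller (the grouping below is hypothetical example data).
HAQ_CATEGORIES = (
    "dressing", "arising", "eating", "walking",
    "hygiene", "reaching", "gripping", "outside activity",
)


def haq_di(item_scores):
    """Average the highest item score within each of the 8 categories.

    item_scores maps a category name to a list of item scores, each 0-3
    (0 = without any difficulty ... 3 = unable to do).
    """
    category_maxima = []
    for category in HAQ_CATEGORIES:
        scores = item_scores[category]
        if not scores or any(s not in (0, 1, 2, 3) for s in scores):
            raise ValueError(f"invalid scores for {category!r}: {scores}")
        category_maxima.append(max(scores))
    # Disability index on the 0 (no disability) to 3 (complete disability) scale.
    return sum(category_maxima) / len(category_maxima)


# Hypothetical responses for one patient (20 items in 8 categories).
example = {
    "dressing": [1, 0], "arising": [0, 0], "eating": [0, 0, 0],
    "walking": [2, 1], "hygiene": [1, 1, 0], "reaching": [0, 1],
    "gripping": [0, 0, 0], "outside activity": [1, 3, 0],
}
example_di = haq_di(example)  # category maxima 1, 0, 0, 2, 1, 1, 0, 3
```

Because each category contributes its worst item rather than its average, a single severely limited activity raises the index more than it would under a simple mean of all 20 items.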
The WOMAC consists of 24 items that comprise 3 scales: a physical function scale that consists of 17 items, each rated from 0 (no difficulty) to 100 (extreme difficulty); a pain scale that consists of 5 items related to pain levels in different activities, each rated from 0 (no pain) to 100 (extreme pain); and a stiffness scale that consists of 2 items that assess degree of stiffness, rated from 0 (no stiffness) to 100 (extreme stiffness). The WOMAC can be administered in either a Likert scale format, scored from 0–4, or a VAS format, scored from 0–100. The VAS version has been shown to be slightly more responsive (5) and was used in this study. The VAS scales are scored by calculating a value for each subscale by summing the assigned value on component items, resulting in subscale scores: physical function = 0–1,700; pain = 0–500; and stiffness = 0–200 where higher scores on all 3 scales indicate poorer status. We also normalized the 3 WOMAC indices on 0–100 scales to correct for differences in scale range by using the following correction factors as recommended by Bellamy (3): physical function scale × 0.059; pain scale × 0.20; and stiffness scale × 0.50.
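As a rough sketch of the VAS scoring just described, the following sums the item values for a subscale and applies the correction factors quoted above. The function name `womac_subscale` and the example responses are hypothetical. Note that 0.059 is a rounded value of 100/1,700, so the maximum normalized function score comes out at about 100.3 rather than exactly 100.

```python
# Correction factors and item counts as quoted in the text.
CORRECTION = {"function": 0.059, "pain": 0.20, "stiffness": 0.50}
N_ITEMS = {"function": 17, "pain": 5, "stiffness": 2}


def womac_subscale(scale, item_scores):
    """Return (raw_sum, normalized_score) for one VAS WOMAC subscale.

    Each item is a VAS value from 0 (none) to 100 (extreme); higher
    scores indicate poorer status.
    """
    if len(item_scores) != N_ITEMS[scale]:
        raise ValueError(f"{scale} expects {N_ITEMS[scale]} items")
    if any(not 0 <= s <= 100 for s in item_scores):
        raise ValueError("VAS item scores must lie between 0 and 100")
    raw = sum(item_scores)                 # e.g. 0-500 for the pain scale
    return raw, raw * CORRECTION[scale]    # rescaled to roughly 0-100


# Hypothetical pain responses for one patient.
pain_raw, pain_normalized = womac_subscale("pain", [20, 40, 30, 10, 50])
```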
Spearman's correlation coefficients were estimated for the HAQ DI, pain, and patient global scales and the 3 WOMAC scales to assess the extent to which they were correlated and measured similar constructs. We performed correlations with patient global and TKS to assess convergent validity. Means and standard errors for the first and last measurements and change scores (SE) between first and last administrations were computed. Standardized effect sizes were computed to assess sensitivity to change by dividing the change scores by the standard deviation of the difference (16). We followed general convention for interpretation of effect sizes with 0.2 = small change; 0.5 = moderate change; and 0.8 = large change (17). Nonparametric Wilcoxon's signed rank tests were used to test statistical significance at P < 0.05. Statistics were computed using SAS version 8.2 (SAS Institute, Cary, NC) for Windows.
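For readers who wish to reproduce these two statistics, a minimal sketch follows. It is not the SAS code used in the study; the function names and the example data are hypothetical. The standardized effect size is computed as described above (mean change divided by the standard deviation of the individual changes), and Spearman's coefficient is obtained as the Pearson correlation of average ranks.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_effect_size(first, last):
    """Mean change divided by the (sample) SD of the individual changes."""
    changes = [b - a for a, b in zip(first, last)]
    return mean(changes) / stdev(changes)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Hypothetical first and last scores for four patients.
first_scores = [0.5, 1.0, 0.75, 1.25]
last_scores = [0.75, 1.0, 1.0, 1.75]
ses = standardized_effect_size(first_scores, last_scores)
```

Dividing by the standard deviation of the changes, rather than by the baseline standard deviation, means the statistic reflects how consistently patients changed as well as how much.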
Table 1 shows that the WOMAC and HAQ differ structurally and contextually; that the stems are not the same; the wording, numbering, and ordering of items are different; and that the particular response items vary. Overall, the HAQ contains slightly fewer items than the WOMAC (22 versus 24). The HAQ DI contains more items than the WOMAC function scale (20 versus 17). The HAQ pain scale is comprised of 1 item, whereas the WOMAC contains 5. Stiffness is measured by 2 items on the WOMAC that assess degree of stiffness relative to defined activities (after first awakening in the morning and after sitting, lying, or resting later in the day).
|Scale||HAQ||WOMAC|
|Physical function|
|Structure||Ordinal scale, 20 items, scored 0–3 (3 = completely disabled)||VAS, 17 items, scored 0–100 (100 = extreme difficulty)|
|Stem||Are you able to…||The following questions concern your physical function. By this we mean your ability to move around and to look after yourself. Please indicate the degree of difficulty you have experienced…|
|Pain|
|Structure||VAS, 1 item, scored 0–3 (3 = severe pain)||VAS, 5 items, scored 0–100 (100 = extreme pain)|
|Stem||How much pain have you had because of your illness?||The following questions concern the amount of pain you are currently experiencing. For each situation, please enter the amount of pain that you have experienced in the past week…|
|Stiffness|
|Structure|| ||VAS, 2 items, scored 0–100 (100 = extreme stiffness)|
|Stem|| ||The following questions concern the amount of joint stiffness (not pain) you have experienced in the past week…|
|Patient global|| ||Not applicable|
|Structure||VAS, 1 item, scored 0–100 (100 = very poor)|| |
|Stem||Considering all the ways that your arthritis affects you, rate how you are doing…|| |
Comparison of the 2 physical function scales (Table 2) shows more similarities than differences in behaviors being measured. The majority of items in the HAQ DI overlap with the WOMAC physical function scale, and similarly the majority of items in the WOMAC physical function scale overlap with those in the HAQ DI. Both scales contain items that assess both upper and lower extremity function (8 items on the HAQ and 10 on the WOMAC) as well as only lower extremity function (3 items on the HAQ and 5 items in the WOMAC). The HAQ, but not the WOMAC, contains 9 items that specifically assess upper extremity function. Two items on the WOMAC, “lying in bed” and “sitting,” were classed as “other,” because they did not fit into any of the other specific categories.
|HAQ DI||Overlaps with WOMAC||WOMAC function||Overlaps with HAQ|
|Upper and lower extremity|
|1. Dress yourself, including tying shoelaces and doing buttons||✓||1. Putting on socks, stockings||✓|
|2. Get in and out of bed||✓||2. Taking off socks/stockings||✓|
|3. Take a tub bath||✓||3. Rising from bed||✓|
|4. Bend down to pick up clothing from the floor||✓||4. Getting in/out of bath||✓|
|5. Run errands and shop||✓||5. Bending to floor||✓|
|6. Get in and out of a car||✓||6. Going shopping||✓|
|7. Do chores like vacuuming or yard work||✓||7. Getting in/out of car||✓|
|8. Get on and off the toilet||✓||8. Heavy domestic duties||✓|
| || ||9. Getting on/off toilet||✓|
| || ||10. Light domestic duties||✓|
|Upper extremity|
|9. Shampoo your hair||NA|| || |
|10. Cut your meat||NA|| || |
|11. Lift a full cup or glass to your mouth||NA|| || |
|12. Open a new milk carton||NA|| || |
|13. Wash and dry your body||NA|| || |
|14. Reach and get down a 5-pound object (such as a bag of sugar) from just above your head||NA|| || |
|15. Open car doors||NA|| || |
|16. Open jars that have been previously opened||NA|| || |
|17. Turn faucets on and off||NA|| || |
|Lower extremity|
|18. Stand up from a straight chair||✓||11. Rising from sitting||✓|
|19. Walk outdoors on flat ground||✓||12. Walking on flat surface||✓|
|20. Climb up five steps||✓||13. Ascending stairs||✓|
| || ||15. Descending stairs|| |
|Other|
| || ||16. Lying in bed|| |
We found that the WOMAC pain scale evaluates pain with function-based items (e.g., walking on a flat surface, going up/down stairs, at night while in bed, sitting or lying, and standing upright), whereas the single-item VAS pain scale in the HAQ assesses amount of arthritis-related pain. We also observed that by and large the items on the WOMAC are more general and do not articulate particular behaviors that might be specific to knee or hip OA, such as squatting, flexing beyond 90°, full straightening, side-to-side instability, etc., which may also be noted about the HAQ. However, the HAQ operates more like a generic instrument and as such points to no advantage or disadvantage between the 2 instruments, although its more generic nature may be less of an advantage when exact or specific measurement is desired.
Analysis of individual items reveals that the language on the WOMAC as compared with the HAQ is more general, which could potentially contribute to misinterpretation. For example, different patients may construe the WOMAC item “bending to floor” in different ways, e.g., as bending to pick up something, bending only partway from the waist, bending all the way so that fingers touch the floor, or even bending knees first. In the WOMAC User's Guide (3), Bellamy defines this item as referring to the degree of difficulty bending to pick something up (unspecified) off the floor; however this definition is not provided to the patient. In contrast, the HAQ companion item, “bend down to pick up clothing from the floor” provides a specific reference, effort level, and a particular range of motion. Furthermore, many WOMAC items omit the conjunction, which can contribute to ambiguity. For example, it is unclear whether “getting in/out of bath,” “getting in/out of car,” “getting on/off toilet” refer to one or both of the activities named.
Table 3 shows the mean (SE) response values at first and last assessments for all outcomes of the pooled data. They represent disease progression at the group level rather than the individual patient level, because numerous variable factors could affect individual disease course. WOMAC function and stiffness scores decreased slightly, patient global scores increased, and we found essentially no changes in either of the pain scales. None of these changes were statistically significant, and all had relatively small effect sizes. However, the HAQ DI and the TKS showed highly significant (P < 0.0001) progression accompanied by meaningful effect sizes (0.27 and 0.75, respectively). This major difference in the ability to detect disease progression in the pooled data shown here was similarly seen in both the observational study and the randomized trial from which these patients were drawn (10, 11).
|Measure||First, mean (SE)||Last, mean (SE)||P||Standardized effect size|
|HAQ DI (0–3, 3 = completely disabled)||0.8 (0.04)||0.9 (0.04)||< 0.0001||0.27|
|HAQ pain (0–3, 3 = severe pain)||1.2 (0.1)||1.2 (0.06)||0.83||−0.003|
|WOMAC function (0–100, 100 = extreme difficulty)||32.0 (1.4)||31.3 (1.4)||0.98||−0.05|
|WOMAC pain (0–100, 100 = extreme pain)||33.5 (1.4)||32.2 (1.5)||0.23||−0.10|
|WOMAC stiffness (0–100, 100 = extreme stiffness)||43.2 (1.8)||38.7 (1.7)||0.05||−0.15|
|Patient global (0–100, 100 = very poor)||32.3 (1.4)||34.9 (1.6)||0.05||0.11|
|Total knee score (0–64, 64 = worst)||8.4 (0.5)||11.1 (0.6)||< 0.0001||0.75|
Correlation analyses (Table 4) suggest a high degree of construct similarity among measures. The HAQ DI and the WOMAC physical function scale are highly and significantly correlated at both times of assessment (rs = 0.71 and rs = 0.79, both P < 0.0001). However, the WOMAC function and WOMAC pain scales also exhibit strong correlations with each other at both times (rs = 0.86 and rs = 0.88, both P < 0.0001). In contrast, the corresponding HAQ scales are modestly correlated (rs = 0.62 and rs = 0.63, both P < 0.0001). Global health is similarly and moderately associated with all other measures at both times, but lowest for WOMAC stiffness (rs = 0.54 and rs = 0.61 for stiffness, both P < 0.0001).
|HAQ pain||WOMAC function||WOMAC pain||WOMAC stiffness||Patient global||Total knee score|
In general, correlations among the WOMAC scales trended higher than those among the HAQ scales. TKS showed the lowest correlations with other measures; its correlations were highest with WOMAC function, WOMAC pain, and patient global assessment. Neither the HAQ DI nor the WOMAC stiffness scale was significantly correlated with TKS at either administration. Analysis of the correlations between change scores (Table 5) similarly reveals that, as expected, most relationships between measures were statistically significant (P < 0.0001), but were weaker than raw score correlations.
|Δ HAQ pain||Δ WOMAC function||Δ WOMAC pain||Δ WOMAC stiffness||Δ Global health||Δ Total knee score|
The first goal of our study was to compare the similarities and differences in face and content validity of the HAQ and WOMAC physical function and pain scales. Overall, we found that there was a great deal of commonality between the 2 physical function scales in the types of functional abilities assessed. These similarities, along with their respective histories, reaffirm their utility as measures of functional status.
However, we found that there are significant differences between the 2 instruments. Only the HAQ DI contains items that specifically assess upper extremity function, which was not unexpected because the WOMAC is designed to be a measure of lower extremity function (3). On the other hand, the WOMAC physical function scale contains more items than the HAQ DI that assess both upper and lower extremity function, which was surprising in an instrument designed for lower extremity function in OA of the knee or hip. In contrast to the single-item pain scale of the HAQ, the 5 items used to assess pain in the WOMAC consist of function-based behaviors and suggest attributes of a disability measure rather than purely pain assessment. This finding is similar to those of Wolfe (18), who had reported that the WOMAC tapped into other domains and was influenced by existence of fatigue, symptom counts, depression, and back pain. In addition, the more general wording of many WOMAC items, compared with the specificity in the HAQ, may have contributed to equivocal responses. Ambiguity in item construction has been reported to negatively affect reliability and sensitivity to change relative to items that are more precisely constructed (19).
The second goal of our study was to examine the association between the HAQ and WOMAC physical function and pain scales to determine the extent of their relationship and whether they measure similar constructs. Both physical function scales were significantly and strongly correlated, indicating that the scales performed well and that patients interpreted them similarly. This finding is consistent with the strong correlations between these scales that have been reported by Brazier et al (20), Wolfe (18), and in a study that compared the SF-36 with the WOMAC (21). The WOMAC physical function scale and the SF-36 physical function component were found to be highly correlated (rs = 0.71–0.75), further confirming the WOMAC's ability to measure functional status.
However, the strongest correlations at both time points and with change scores were among the 3 WOMAC scales. These strong intrascale correlations suggest that the 3 WOMAC scales are measuring similar constructs within the overall instrument. In contrast, the relationship between the HAQ DI and the HAQ pain scale was moderate, which may be due in part to the HAQ DI assessing a broader range of functioning and its pain scale being more global and capturing reported pain from both upper and lower extremities. The HAQ and WOMAC pain scales were also correlated with each other, albeit weakly, at both times of administration.
We also found that the 2 physical function scales differed in their responsiveness; the HAQ DI was more sensitive to change than the WOMAC in detection of disease progression, although less sensitive than radiograph scores. This finding is particularly significant because sensitivity to change is crucial for statistical power calculations for estimation of patient numbers for a clinical study. Plausible explanations for this observation are that the items in the HAQ DI are more specific and that the HAQ DI scoring rule, which averages the highest scores in each of the 8 categories rather than averaging all responses, is more sensitive than the corresponding scoring of the WOMAC. This phenomenon has been demonstrated in a comparison of the HAQ DI and the modified HAQ, which asks a single question in each of the 8 categories and then averages the responses (22, 23). In addition, because we had modified the WOMAC timeframe from its convention of 48 hours to 1 week to match the HAQ timeframe, this could have affected the results. However, Griffiths et al (13) reported finding no time dependency of responses between 1 and 14 days.
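To illustrate why responsiveness matters for patient numbers, the sketch below applies a common textbook normal approximation for detecting a within-group mean change, n ≈ ((z for 1 − α/2 + z for power) / SES)². This is not a calculation performed in the study; the function name `paired_sample_size` and the default α and power values are assumptions chosen for illustration, and a 2-arm trial would require a different formula.

```python
from math import ceil
from statistics import NormalDist


def paired_sample_size(ses, alpha=0.05, power=0.80):
    """Approximate number of patients needed to detect a within-group
    change with standardized effect size `ses` (two-sided test),
    using the normal approximation n = ((z_alpha + z_power) / ses)^2.
    """
    if ses == 0:
        raise ValueError("an effect size of 0 cannot be detected")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    return ceil(((z_alpha + z_power) / abs(ses)) ** 2)


# Effect sizes reported in Table 3 for HAQ DI, WOMAC function, and TKS.
n_haq = paired_sample_size(0.27)
n_womac = paired_sample_size(-0.05)
n_tks = paired_sample_size(0.75)
```

Under these assumptions, the effect size observed for the HAQ DI corresponds to roughly 108 patients and that for the TKS to roughly 14, whereas an effect size as small as that observed for WOMAC function would require several thousand.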
Furthermore, although radiographic progression was not a principal subject of this analysis, the TKS performed well and was able to distinguish progression at the group level. Although the TKS was poorly correlated with other outcome measures, this was not unexpected because they represent different entities and other researchers have also found little evidence of a strong association between such outcomes as pain and disability with radiographic changes in patients with knee OA (24, 25).
Outcome assessment in arthritis is highly dependent on the deployment of valid, reliable, and responsive measurement tools, such as the HAQ and WOMAC. Overall, both instruments performed well, but the more generic HAQ demonstrated somewhat better measurement properties than the disease-specific WOMAC in this group of patients with OA of the knee or hip. A number of plausible explanations for this finding were identified. These differ from what has been suggested previously (26), specifically that a disease-specific instrument, such as the WOMAC, would be expected to be more sensitive in assessment of the condition for which it was developed. The WOMAC has been shown to be a useful measurement tool in OA of the knee or hip and has achieved status as a standard in these conditions. Whether the HAQ or WOMAC is used should depend in part on the expected magnitude of differences between groups. If small differences are to be shown significant, then the measure that is more sensitive to change is preferred. As such, this indicates that the HAQ would be the better choice. These results also suggest that findings across studies may be better compared using the HAQ, which offers the additional advantages of having a wider usage, multiple language and cultural translations, and being adaptable to a wide variety of diseases and conditions, which contribute to the generalizability of findings.
The authors would like to acknowledge the contributions of Nancy Lane (San Francisco, California), Michael Luggen (Cincinnati, Ohio), Dena Ramey (Blue Bell, Pennsylvania), John Sibley (Saskatoon, Canada), and Frederick Wolfe (Wichita, Kansas).