The health assessment questionnaire disability index and scleroderma health assessment questionnaire in scleroderma trials: An evaluation of their measurement properties
To evaluate the measurement properties of the Health Assessment Questionnaire (HAQ) disability index (DI) for group comparisons in scleroderma trials, and to determine if the Scleroderma Health Assessment Questionnaire (SHAQ) visual analog scales confer any measurement advantage over the HAQ DI.
A computer search for articles describing the use of the HAQ DI and SHAQ in scleroderma was performed. Evidence supporting the sensibility, reliability, validity, and responsiveness of these measures was evaluated.
The SHAQ has incremental face and content validity over the HAQ DI because it addresses scleroderma-specific manifestations that also contribute to disability. The HAQ DI has good concurrent validity, construct validity, and predictive validity. Whether SHAQ confers incremental construct, concurrent, or predictive validity over the HAQ DI is uncertain. The HAQ DI appears more reliable than the SHAQ; however, reliability studies provide insufficient data to ascertain if minimum standards have been achieved. Responsiveness of the HAQ DI subscales has been demonstrated.
The SHAQ has incremental face and content validity over the HAQ DI. The HAQ DI has greater reliability and demonstrated construct, concurrent, and predictive validity. Further investigation into the measurement properties of the HAQ DI and SHAQ visual analog scales, and their relation to the required standards of measurement is needed.
The Health Assessment Questionnaire (HAQ) disability index (DI) and the Scleroderma Health Assessment Questionnaire (SHAQ) are instruments increasingly utilized to assess scleroderma patients in randomized trials (1–4). The HAQ DI is a measure of disability developed for rheumatoid arthritis patients. It is valid, reliable, and responsive to change in this population (5, 6). The HAQ DI contains 8 domains of activity (dressing, arising, eating, walking, hygiene, reach, grip, and common daily activities) each of which has at least 2 questions, for a total of 20 items. For each item, patients report the amount of difficulty experienced performing the activity. There are 4 possible responses for each item ranging from 0 (without any difficulty) to 3 (unable to do). A mean score is calculated for each domain ranging from 0 to 3. A composite HAQ DI score is calculated by dividing the summed domain scores by the number of domains answered. The composite score is reported, falling between 0 and 3 on an ordinal scale. The scores are interpreted as 0 (no impairment in function) to 3 (maximal impairment of function) (7).
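The scoring arithmetic described above can be sketched as follows. This is a minimal illustration that follows the description in the text (a mean per domain, then the sum of domain scores divided by the number of domains answered); the function name, the input layout, and the example responses are assumptions made for illustration, and the official scoring instructions should be consulted before use.

```python
from statistics import mean

# Domain names as listed in the text. The number of items per domain is not
# given there, so the input layout used here is an assumption.
DOMAINS = ("dressing", "arising", "eating", "walking",
           "hygiene", "reach", "grip", "activities")


def haq_di_score(responses):
    """Return the composite HAQ DI score (0-3), or None if nothing was answered.

    `responses` maps a domain name to a list of item responses, each an
    integer from 0 (without any difficulty) to 3 (unable to do), or None
    for an unanswered item.
    """
    domain_scores = []
    for domain in DOMAINS:
        answered = [r for r in responses.get(domain, []) if r is not None]
        if not answered:
            continue  # unanswered domain: left out of the denominator
        if any(r not in (0, 1, 2, 3) for r in answered):
            raise ValueError(f"invalid response in domain {domain!r}")
        domain_scores.append(mean(answered))  # per-domain mean, as described
    if not domain_scores:
        return None
    # Summed domain scores divided by the number of domains answered.
    return sum(domain_scores) / len(domain_scores)
```

For a hypothetical patient who skips one domain entirely, the remaining 7 domain scores are averaged, so the composite score still falls between 0 and 3.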
The HAQ DI also contains a visual analog scale (VAS) that patients use to report the amount of pain experienced in the past week. The VAS is a 15-cm line that is converted to a continuous scale from 0 to 3 where 1 cm is equivalent to 0.2 points. The anchors of the VAS are 0 (no pain) to 100 (very severe pain). To obtain the patient score, a metric ruler is used to measure the distance in centimeters from the left anchor to the patient's mark, and this distance is then multiplied by 0.2 (8). The VAS pain score is not incorporated into the HAQ DI composite score.
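The conversion from the measured distance to the 0 to 3 scale is simple arithmetic; a short sketch is given here. The function and constant names are hypothetical, while the line length and conversion factor are those stated in the text.

```python
VAS_LENGTH_CM = 15.0   # length of the VAS line stated in the text
POINTS_PER_CM = 0.2    # 1 cm is equivalent to 0.2 points


def vas_score(distance_cm):
    """Convert the distance from the left anchor (in cm) to a 0-3 score."""
    if not 0.0 <= distance_cm <= VAS_LENGTH_CM:
        raise ValueError("distance must lie between 0 and 15 cm")
    return distance_cm * POINTS_PER_CM
```

A mark at the midpoint of the line (7.5 cm) therefore corresponds to a score of 1.5, and a mark at the right anchor (15 cm) to the maximum score of 3.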
Steen and Medsger extrapolated use of the HAQ DI to scleroderma patients. Believing the HAQ DI was inadequate to evaluate the multisystem effects of scleroderma, they added 5 scleroderma-specific VASs, thereby creating the SHAQ (8) (See Appendix A available at the Arthritis Care & Research Web site at http://www.interscience.wiley.com/jpages/0004-3591:1/suppmat/index.html). The domains address overall disease activity, Raynaud's phenomenon, finger ulcers, breathing, and intestinal problems. The 5 VASs ask patients how much symptoms interfere with daily activities and are scored similarly to the pain VAS. A composite VAS score is not created nor are the individual VAS scores incorporated into the HAQ DI score. Each VAS score is reported individually. Georges et al (9) have proposed a combined score obtained by pooling the 8 domains of the HAQ DI and the 5 VASs; however, this approach has not yet been widely accepted.
With the advent of new therapeutic interventions, the HAQ DI and SHAQ are being recommended for use in trials to evaluate change over time in groups of scleroderma patients (10). Indeed, a few trials have already implemented these instruments to evaluate change in disability (2–4). Although some of the operational issues regarding the HAQ DI in scleroderma trials have been described, the measurement properties of the HAQ DI and SHAQ have not been comparatively related to the recommended standards of measurement (11). Thus, it is difficult for future investigators to ascertain if these measures are appropriate for evaluation of scleroderma patients in trials, and if one measure confers an advantage over the other. The first objective of this study was to evaluate the measurement properties (sensibility, reliability, validity, and responsiveness) of the HAQ DI for group comparisons in scleroderma trials through appraisal of the published literature. The second objective was to evaluate the measurement properties of the SHAQ VASs to determine whether or not they confer added value to the HAQ DI.
Identification of published literature.
Studies were identified using Medline (1966–October 2004), CINAHL (1982–October 2004), and Health and Psychosocial Instruments (1985–September 2004) databases without language restriction. The following subject headings were used: health assessment questionnaire, HAQ, health assessment questionnaire disability index, HAQ-DI, disability, scleroderma, and systemic sclerosis. All subheadings were used to increase comprehensiveness. Titles and abstracts were screened to exclude ineligible studies. Included studies were entered into PubMed and the “related articles” tool was used to search for other eligible studies. The bibliographies of included studies and published reviews were also searched.
Evaluation of measurement properties.
Articles were reviewed using Kirshner's framework for assessing an evaluative index and Wright's criteria to evaluate the quality of a clinical measure (12, 13). The following criteria were used to evaluate the HAQ DI and SHAQ: item selection, item reduction, sensibility, reliability, validity, and responsiveness.
Item selection refers to the process used to collate items that may be included in the final instrument (12). Item reduction is the process used to reduce a large number of items to a manageable number by eliminating inappropriate items (13). Once developed, an instrument's quality can be evaluated by assessing its sensibility, reliability, validity, and responsiveness.
Sensibility is an indication of the usefulness of a measure (13). Principles used to appraise sensibility include a statement of the purpose for which the measure will be used, population, setting, content validity, face validity, and feasibility (14). Face validity evaluates if the information being sought reflects the personal attributes of the patient, and if there is biologic coherence of the items (14). Content validity evaluates if the components of an instrument reflect the conceptual framework (15). Current standards require evaluation of a patient's own perceptions of their health (16, 17). The depth of a measure is evaluated by the prevalence of floor and ceiling effects, referring to the percentage of the sample achieving the best and worst scores possible (18). McHorney and Tarlov (18) suggest that ceiling and floor effects should occur in <15% of respondents. Feasibility refers to the ease of usage of the instrument. The recommended standard of feasibility is that self-reported instruments should be completed in <15 minutes (18).
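As an illustration of how floor and ceiling effects might be checked against the <15% standard, the following sketch computes the percentage of respondents at each extreme of the scale. The function name, the returned keys, and the default best and worst values (0 and 3, as on the HAQ DI) are assumptions for this example.

```python
def extreme_score_percentages(scores, best=0.0, worst=3.0, threshold=15.0):
    """Percentage of respondents at the best and worst possible scores.

    `threshold` is the suggested maximum percentage at either extreme.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("no scores supplied")
    pct_best = 100.0 * sum(1 for s in scores if s == best) / n
    pct_worst = 100.0 * sum(1 for s in scores if s == worst) / n
    return {
        "pct_best": pct_best,
        "pct_worst": pct_worst,
        # Both extremes must stay below the threshold to meet the standard.
        "meets_standard": pct_best < threshold and pct_worst < threshold,
    }
```

In a hypothetical sample of 10 patients in which 2 report the best possible score, 20% of respondents sit at that extreme, so the standard would not be met.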
Reliability refers to the reproducibility of a measure. The test–retest reliability is determined when the measure is administered to the same group of patients on 2 different occasions (13). A common test of reliability is the intraclass correlation coefficient (ICC). The recommended minimum standard for reliability is an ICC of 0.90 if the unit of analysis is a group of patients and 0.95 if the unit of analysis is an individual patient (18).
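A test–retest ICC can be computed in several ways, and the text does not specify a particular form. The sketch below assumes a one-way random-effects model for single measurements, often written ICC(1,1), with 2 administrations per patient; the function name and the example data are hypothetical.

```python
from statistics import mean


def icc_oneway(pairs):
    """One-way random-effects ICC for single measurements, ICC(1,1).

    `pairs` is a list of (first_score, second_score) tuples, one per patient.
    The scores must not all be identical, or the ratio is undefined.
    """
    n = len(pairs)
    k = 2  # two administrations per patient
    if n < 2:
        raise ValueError("at least two patients are required")
    grand = mean(x for pair in pairs for x in pair)
    subject_means = [mean(pair) for pair in pairs]
    # Between-patient and within-patient mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for pair, m in zip(pairs, subject_means)
                    for x in pair) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

With the hypothetical pairs (0, 1), (1, 2), and (2, 3), every patient scores exactly 1 point higher on retest. A Pearson correlation of these pairs would be 1.0, whereas this ICC is 0.6, because a statistic of concordance penalizes the systematic shift.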
Construct validity evaluates the relationship of a measure with other measures. Two strongly correlated measures of the same construct have convergent construct validity (19). Concurrent validity assesses the correlation between a measure and the gold standard when they are administered concurrently. Predictive validity assesses the correlation between a measure given at baseline and the standard measure administered some time later (19). These subtypes of validity are evaluated using correlation coefficients (Pearson's r, Spearman's rho, or Kendall's tau) (20).
Responsiveness refers to the ability of an instrument to accurately detect change when it has occurred (21). A traditional method of assessing responsiveness involves correlating a change in the instrument score with changes in physiologic measures (22). More recent methods include reporting a difference in scores, reporting an effect size (calculated by dividing the change score by the baseline standard deviation), or reporting the standardized response mean (calculated by dividing the change in score by the standard deviation of the difference) (20, 21, 23, 24).
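The two standardized indices defined above differ only in their denominators, as the following sketch shows. The function names and the example scores are hypothetical; the sample standard deviation is assumed in both cases.

```python
from statistics import mean, stdev


def _changes(baseline, follow_up):
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow-up must be paired")
    return [f - b for b, f in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """Mean change divided by the standard deviation of baseline scores."""
    return mean(_changes(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the standard deviation of the change scores."""
    changes = _changes(baseline, follow_up)
    return mean(changes) / stdev(changes)
```

For example, with hypothetical baseline scores of 1.0, 2.0, and 3.0 and follow-up scores of 0.5, 1.0, and 2.5, the mean change is about −0.67. The effect size is about −0.67 (baseline standard deviation of 1.0), whereas the standardized response mean is about −2.31, because the change scores vary much less than the baseline scores.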
Seventy-nine citations were identified from the literature search. Seven articles contained measurement data that could be abstracted for this study (Table 1). Four additional articles provided the historical data pertaining to the HAQ DI, item generation, and item reduction. The remaining citations were excluded because they described other measures of disability, assessed outcomes that did not include disability, used selected portions of the HAQ DI, or did not report data pertaining to the measurement properties of the HAQ DI or SHAQ. A list of the excluded articles can be obtained from the authors upon request.
Table 1. Summary of studies addressing measurement properties of the HAQ DI and SHAQ VAS in scleroderma patients*
| Measurement property | HAQ DI | SHAQ VAS |
|---|---|---|
| Sensibility | | |
| Purpose, population, setting | Yes (29) | Yes (8) |
| Content validity | NT | NT |
| Face validity | NT | Yes (8) |
| Feasibility | Yes (30) | Yes (30) |
| Reliability | Yes (33) | Yes (33) |
| Validity | | |
| Convergent construct | Yes (30) | Yes (30) |
| Construct | Yes (29) | NT |
| Predictive | Yes (47) | NT |
| Concurrent | Yes (40, 43) | Yes (30) |
| Responsiveness | Yes (8, 33) | Yes (8, 33) |
The content of the HAQ DI was determined by Fries and coauthors (5) after review of domains and questions of other existing physical function instruments. These included the Uniform Database for Rheumatic Diseases, a patient status scale developed by Convery et al, the Barthel index, and an activities of daily living index from Katz et al (25–28). Materials were reviewed and 62 questions designed to assess functional abilities were selected.
The SHAQ uses the full HAQ DI, including the pain VAS, together with 5 additional scleroderma-specific VASs. These additional VASs were developed and included based on the clinical judgment of the developers.
In a sample of 40 consecutive rheumatoid arthritis patients, a psychometric approach was used for HAQ DI item reduction. Spearman's rank correlation coefficients were calculated for each of the 62 potential questions against every other question, and against a composite disability index representing the mean of question responses. Questions with low correlations (<0.5) with the overall index were eliminated because they did not relate to the underlying concept being measured. Where interitem correlations were ≥0.90, indicating redundancy, 1 item was eliminated. Using 3 iterations of this process, the most parsimonious set of items was created (5, 6).
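The item reduction procedure described above can be illustrated with a simplified, single-pass sketch. It is not the developers' actual procedure (which used 3 iterations and 62 candidate questions); the function names, the choice of which redundant item to drop (the later one), and the example data are assumptions. Spearman's coefficient is computed here as Pearson's correlation of average ranks.

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for i in order[pos:end + 1]:
            result[i] = (pos + end) / 2 + 1
        pos = end + 1
    return result


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5  # undefined if either input is constant


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


def reduce_items(items, low=0.5, redundant=0.90):
    """One pass of item reduction; `items` maps item name -> responses."""
    names = list(items)
    n_patients = len(items[names[0]])
    # Composite index: mean of all question responses for each patient.
    composite = [mean(items[name][p] for name in names)
                 for p in range(n_patients)]
    # Step 1: drop items poorly correlated with the composite index.
    related = [name for name in names
               if spearman(items[name], composite) >= low]
    # Step 2: of any highly intercorrelated pair, keep only the first item.
    kept = []
    for name in related:
        if all(spearman(items[name], items[other]) < redundant
               for other in kept):
            kept.append(name)
    return kept
```

With 4 hypothetical patients and items a = [0, 1, 2, 3], b = [0, 1, 2, 3], c = [0, 1, 3, 2], and d = [3, 0, 1, 2], item d is dropped for its low correlation with the composite index and item b is dropped as redundant with item a, leaving items a and c.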
Only rheumatoid arthritis patients were consulted in the generative and reductive processes. Thus, it is difficult to ascertain if these items are relevant or generalizable to scleroderma patients. The developers of the SHAQ created the VASs based on issues they believed were important to scleroderma patients. The degree of participation by scleroderma patients in the VAS development is uncertain.
Sensibility assessment of a measure includes a statement of the purpose for which the measure will be used, population, setting, content validity, face validity, and feasibility (14).
Purpose, population, and setting.
The HAQ DI was designed to describe and evaluate changes in disability in adult rheumatoid arthritis patients in the clinic setting. In the first article describing use of the HAQ DI in scleroderma patients, Poole and Steen use the HAQ DI to “determine disability in patients with systemic sclerosis” (29).
The addition of disease-specific VASs made the SHAQ a disease-specific instrument for assessment of disability in scleroderma patients in the clinic setting. Its stated purpose is to evaluate “meaningful clinical changes in the course of the disease over time” (8). Although the population and setting are clearly specified, the purpose is ambiguous because the conceptual framework of disability in scleroderma is not clearly defined. The ambiguity in a conceptual framework has resulted in subsequent investigators using these tools to measure functional “limitation,” “capacity,” “impairment,” “status,” and “disability” interchangeably in scleroderma patients (4, 30–33).
The HAQ DI has reasonable content validity because it contains important domains pertaining to activities of daily living. However, some investigators have criticized the HAQ DI for insufficiently assessing disability caused by skin tightness and muscle weakness (34). This has led to the development of other scleroderma-specific indices (30, 34–37). Examples of activities deemed to be important include, “Can you lift and pour off water from a sauce pan?” and “Can you unscrew a jam jar lid from a jar that has been opened?” (34).
The SHAQ has incremental content validity over the HAQ DI, because it contains scleroderma-specific domains that contribute to the multifaceted conceptual framework of disability in scleroderma. The addition of the SHAQ VASs enhances the ability to capture disability secondary to internal organ involvement over the HAQ DI alone. One threat to the content validity of the SHAQ VASs is the phrasing of the questions. In essence, they are double-barreled questions; they require the scleroderma patient both to ascertain the degree of severity of involvement of the organ in question and to ascertain the degree of interference with daily activities.
The face validity of the HAQ DI has not been tested among scleroderma patients. Because the HAQ DI was originally intended for rheumatoid arthritis patients, scleroderma patients have not been consulted regarding the applicability of the HAQ DI domains to their lives. One potential threat to both face and content validity of the HAQ DI is that some items (walking and arising) are not considered major problems in scleroderma (34). However, these items may be an issue in a minority of patients, particularly those with severe disease. Thus their inclusion allows the measure to capture the depth and breadth of problems experienced by scleroderma patients. To clarify this issue, future investigators should report the percentage of patients reporting the best and worst possible scores, thereby indicating the ceiling and floor effects of these domains.
Steen and Medsger (8) assessed the face validity of the SHAQ VASs. They asked 11 scleroderma patients how they described their symptoms when they responded to the VAS questions. All patients used at least one of the same words or phrases as used in the stem of the VAS.
The HAQ DI and the SHAQ are inexpensive and are easily completed by patients in the clinic (30). Both measures are in the public domain, and permission for their use is routinely given without charge. However, the authors should be acknowledged in subsequent publications. The scoring of the SHAQ limits its feasibility. The SHAQ does not provide a simple aggregate score. An aggregate score for the HAQ DI is easily calculated using simple arithmetic, but each of the VAS scores is reported individually. In essence, the SHAQ is composed of separate scales. Comparisons within and between patients require comparisons of each scale individually. Georges et al (9) have proposed a composite score obtained by pooling the 8 domains of the HAQ DI with the 5 VASs. If widely accepted, this scoring system may improve the feasibility of within- and between-group comparisons.
Reliability is an essential quality of an instrument. It represents the degree of consistency across repeated assessments. A difference in a reliable clinical measurement can be more confidently attributed to clinical change. Furthermore, reliable measures have less noise (measurement error), thus fewer subjects are required in trials in which the measured outcome of interest has high versus low reliability (38).
Reliability of the HAQ DI and SHAQ VASs was evaluated in patients with Raynaud's phenomenon secondary to scleroderma during the stable posttrial period of an iloprost study (33). Investigators estimated the repeat-measure reliability by dividing the standard deviation of the difference in scores between weeks 6 and 12 by the observed range of the variable at baseline (39); smaller values indicate greater reliability. In the evaluation of Raynaud's phenomenon, the investigators found all the VASs to have lower reliability than the other outcome measures assessed (Raynaud's condition score, physician's and patient's global assessment, Arthritis Impact Measurement Scales 2). For the VASs, the standard deviation of the differences varied between 15.0% and 25.3% of the baseline range. The HAQ DI had an intermediate value of 11.9%.
Steen and Medsger (8) assessed test–retest reliability of the SHAQ in 50 scleroderma patients who completed the SHAQ on 2 occasions within 1 month. The correlation coefficients for the 2 scores were 0.89 for the HAQ DI and 0.78–0.87 for the Raynaud's phenomenon, finger ulceration, and breathing VASs (P < 0.001). The pain and gastrointestinal (GI) VASs were less well correlated, with coefficients of 0.69 and 0.68, respectively (P < 0.001). Uncertainty regarding the correlation coefficients used to analyze the data is a limitation of this study. It is unclear whether a statistic of association (Pearson's, Spearman's) or concordance (intraclass or Kappa) was used.
Smyth et al (30) assessed concurrent validity by comparing the HAQ DI with the United Kingdom Scleroderma Functional Score (UKFS). The UKFS is an 11-item questionnaire with questions pertaining to upper limb function and muscle weakness (34). When compared, the UKFS and HAQ DI showed excellent correlation (r = 0.90). The SHAQ VASs were moderately correlated with the UKFS (r = 0.45–0.72), with the exception of the digital ulceration VAS, which was poorly correlated with the UKFS (r = 0.18).
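The repeat-measure statistic used in that study can be sketched as follows: the standard deviation of the paired differences between the 2 posttrial visits, expressed as a percentage of the range observed at baseline. The function name and the example values are hypothetical, and the sample standard deviation is assumed.

```python
from statistics import stdev


def sd_difference_pct_of_range(baseline, week6, week12):
    """SD of (week 12 - week 6) differences as a % of the baseline range."""
    if not len(baseline) == len(week6) == len(week12):
        raise ValueError("all three visits must cover the same patients")
    differences = [later - earlier for earlier, later in zip(week6, week12)]
    observed_range = max(baseline) - min(baseline)
    if observed_range == 0:
        raise ValueError("baseline scores show no variation")
    return 100.0 * stdev(differences) / observed_range
```

A lower percentage indicates that scores changed little between visits relative to the spread of scores in the sample, and therefore indicates a more reproducible measure.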
Poole and Steen (29) also assessed construct validity using the method of extreme groups (19). They hypothesized that patients with diffuse scleroderma would have more disability as reflected by higher HAQ DI scores than patients with limited scleroderma (29). Their results indicate a significantly higher HAQ DI aggregate score in patients with diffuse versus limited scleroderma (1.10 versus 0.67, P < 0.001).
Poole et al (40) assessed concurrent validity by comparing self-reported disability on the HAQ DI with the performance of 10 of the items on the HAQ DI scored by an occupational therapist blinded to the self-report measures. The overall ICC was 0.76, suggesting the HAQ DI has reasonable concurrent validity. However, the individual items had ICCs ranging from 0.38 to 0.72. These results are not surprising because ratings of performance compared with self-reported performance ratings have been shown to differ (41, 42). Thus, correlation coefficients in this range are acceptable.
Similarly, Brower and Poole (43) evaluated the concurrent validity of the HAQ DI and the Duruöz Hand Index (DHI) in 40 scleroderma patients. The DHI is a self-administered questionnaire containing 18 items regarding hand ability in performing kitchen tasks, dressing, personal hygiene, and office tasks. The investigators report a Spearman's rho of 0.79 between the HAQ DI and DHI (P < 0.01) (43).
Predictive validity of the HAQ DI was evaluated using data from a trial of methotrexate in scleroderma (3). The sample was divided into those who had ≥20% improvement in the primary outcome measures (patient global assessment, physician global assessment, University of California Los Angeles skin tethering score, modified Rodnan skin score, diffusing capacity for carbon monoxide [DLCO] as % predicted, and HAQ DI score) at 1 year and those who did not. The investigators identified baseline characteristics that correlated with ≥20% improvement. They tested these variables using data from a trial of D-penicillamine in scleroderma to determine if they were still predictive of improved outcome at 1 and 2 years (4). They found that when dichotomized, a HAQ DI score of less than the median score for the sample was associated with improved outcomes (a modified Rodnan skin score odds ratio [OR] 2.3, P < 0.09 and OR 3.4, P < 0.02 at 1 and 2 years, respectively; DLCO OR 3.4, P < 0.04 at 2 years; and physician global assessment at 2 years OR 3.0, P < 0.03). However, the correlation coefficients ranged from 0.18 to 0.35 for the various measures. This suggests that a dichotomized HAQ DI score has good predictive validity. However, if the score is used as a continuous measure, then the predictive validity is less strong. Furthermore, because the median of this sample was used as the cut-point, it is unclear whether this is the most relevant cut-point, since the median is sample dependent. The predictive validity of the SHAQ VASs has not been published.
Responsiveness of the SHAQ was evaluated in 1,250 scleroderma patients (8). Scleroderma patients who died had a significant increase in HAQ DI score prior to death. Patients who had a >15% worsening of their skin score had a significant worsening in their HAQ DI score; those who had a >15% improvement in skin score had a significant improvement of their HAQ DI score (r = 0.68, P < 0.001). The HAQ DI was also shown to reflect changes in disease status associated with medication use. Patients treated with D-penicillamine had an improvement in HAQ DI score over time (1.21 to 0.88), whereas those not treated had, on average, worsening of their HAQ DI score (0.95 to 1.46; P < 0.0001). These results are in keeping with the direction and magnitude of change one would expect.
The SHAQ VASs are also responsive to change over time (8). Patients who developed digital ulceration had a worsening of their vascular VAS (0.38 to 1.13; P < 0.001), whereas patients who demonstrated improvement in digital ulceration had an improvement in their vascular VAS score (1.31 to 0.34, P < 0.001). Patients who developed GI symptoms had a worsening of their GI VAS score (0.47 to 0.96), whereas those who had an improvement in GI symptoms demonstrated an improvement in their GI VAS score (1.01 to 0.49; P < 0.001). Similarly, patients who demonstrated >15% decline in forced vital capacity % predicted had an increase of 1.11 in the lung VAS score, whereas patients who demonstrated a >15% improvement demonstrated a decrease in lung VAS score by 0.62.
This study summarizes the sensibility, reliability, validity, and responsiveness of the HAQ DI for the purpose of assisting researchers in deciding its appropriateness for group comparisons in scleroderma trials. Secondly, this study evaluated the measurement properties of the SHAQ VASs to determine whether or not they confer incremental value over the HAQ DI. Incremental validity refers to the degree to which a measure makes a contribution to the predicted outcome over that possible with a simpler measure (44). Dimensions of incremental validity include incremental content validity, incremental predictive validity, incremental sensitivity to change, and ecologic validity or generalizability across settings (45).
Both the HAQ DI and SHAQ VASs are sensible measures of difficulty performing daily activities in scleroderma patients in trials. Both are equally feasible. The SHAQ VASs confer incremental face and content validity over the HAQ DI alone because they evaluate the impact of scleroderma-specific disease manifestations on daily activities. However, in this era of multicenter international trials, the crosscultural content validity of both instruments is uncertain. To date, the SHAQ has been validated in France (9). However, the phrasing and content of the daily activities may not have ecologic validity to scleroderma patients in other parts of Europe or Asia. Additionally, to meet current standards of content validity, further inquiry regarding patients' perceptions of their health in relation to the content of the instruments is required.
Due to the incremental content validity of the SHAQ, future investigators may consider using the SHAQ over the HAQ DI alone as an outcome measure in scleroderma trials. The SHAQ VASs enhance the ability of the HAQ DI to capture a greater breadth of factors contributing to disability in scleroderma. One portion of the SHAQ may be used as the primary outcome measure, whereas other VASs may be used as secondary outcome measures. For example, endothelin receptor antagonists have been studied for the treatment of pulmonary hypertension in scleroderma, but more recently have been shown to reduce digital ulcer burden and improve hand function (46). The use of the SHAQ in this manner has implications for trials assessing responsiveness to therapy. In a trial where an individual VAS is the outcome measure (e.g., overall disease activity), the sample size calculation will need to be based on the effect size of the overall disease activity VAS. However, if the outcome measure is the entire SHAQ, the sample size calculation will need to be based on the VAS with the smallest effect size.
With regard to validity, it is important to recognize that testing of construct validity is difficult because there is no gold standard of disability in scleroderma against which one can compare an index. The HAQ DI has excellent convergent construct validity when compared with the UKFS and DHI. The relatively poor correlation of the SHAQ VASs with the UKFS is not a surprising result. Because the UKFS focuses on disability caused by skin tightness in the upper limb and muscle weakness, one would not expect a strong correlation with disability secondary to intestinal or breathing problems. The construct validity of the SHAQ VASs needs to be evaluated before conclusions regarding their incremental construct validity over the HAQ DI can be made.
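The reason the smallest effect size governs the calculation can be seen in a worked sketch. It assumes a 2-arm parallel-group comparison of means, the usual normal-approximation formula, and effect sizes expressed as standardized between-group differences; none of these specifics come from the studies reviewed, and the effect sizes shown are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate patients per arm for a 2-sided, 2-sample comparison."""
    if effect_size <= 0:
        raise ValueError("effect size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Hypothetical effect sizes for 3 of the VASs (illustrative values only).
hypothetical_effect_sizes = {"overall": 0.5, "raynaud": 0.8, "breathing": 0.3}

# Powering the trial for every scale means powering it for the weakest one.
smallest = min(hypothetical_effect_sizes.values())
required = n_per_group(smallest)
```

Because the required sample size grows with the inverse square of the effect size, the scale with the smallest expected effect dominates: under these assumptions an effect size of 0.3 calls for 175 patients per arm, compared with 63 for an effect size of 0.5.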
The HAQ DI has reasonable concurrent validity when compared with an occupational therapy assessment (ICC 0.76) and reasonable predictive validity when the HAQ DI score is used as a dichotomous outcome. The concurrent and predictive validity of the SHAQ VASs have not been assessed, and thus it is difficult to ascertain if the SHAQ confers incremental concurrent or predictive validity over the HAQ DI.
Studies assessing reliability suggest the SHAQ VASs do not confer incremental reliability over the HAQ DI alone. However, both studies provide insufficient data to make definitive conclusions. Further reliability testing of the HAQ DI and SHAQ VASs is imperative because reliability is a basic requirement of a valid scientific instrument (13). Appropriate analysis including correlation coefficients of concordance should be reported.
The HAQ DI and SHAQ VASs have demonstrated responsiveness to change in clinical parameters. However, only the absolute change in scores was described, and data to calculate the standardized response mean or effect size were not provided. Thus it is difficult to ascertain the magnitude of change over time and the relative sensitivity to change. Future studies should report the effect size and standardized response mean. These values will assist readers in assessing the value of observed change in relation to expected change in functional outcome. The effect size will also assist sample size calculations for future trials.
Determination of the incremental value of the SHAQ VASs over the HAQ DI alone is complicated by a number of factors. First, there is a lack of rigorous evaluation of the HAQ DI and SHAQ VASs in the aforementioned measurement areas. Second, the SHAQ VASs are 5 stand-alone measures that evaluate distinct aspects of disability in scleroderma. A scleroderma patient may have a high lung VAS score secondary to pulmonary hypertension but have a low HAQ DI score because they do not have difficulty eating, arising, or grooming. Because an external criterion for disability does not exist, it is difficult to quantify the incremental value of the lung VAS over the HAQ DI alone. Thus, incremental value assessment of the SHAQ VASs over the HAQ DI is difficult. Future investigators may consider using a composite SHAQ score to ascertain incremental validity over the HAQ DI alone (9).
There are several important directions for the future. Future investigators should outline the conceptual framework of disability that the HAQ DI or SHAQ VASs are being used to measure. Additional reliability and responsiveness studies of these instruments are required and investigators should report the effect size and standardized response means. Further construct and predictive validity testing of the SHAQ VASs is required.
In conclusion, the SHAQ has incremental face and content validity over the HAQ DI alone because it addresses scleroderma-specific manifestations that contribute to the conceptual framework of disability in scleroderma. The HAQ DI has good concurrent, construct, and predictive validity. Whether the SHAQ VASs confer incremental construct, concurrent, or predictive validity is uncertain. The SHAQ VASs do not appear to confer incremental reliability over the HAQ DI. Further investigation into the measurement properties of the HAQ DI and SHAQ VASs and their relation to the minimum standards of measurement is needed before these instruments can be confidently utilized in scleroderma trials.
We would like to acknowledge Dr. Virginia Steen for providing information regarding the historic development of the SHAQ, and for granting permission for the SHAQ to be added as an appendix to this manuscript.