The rheumatic disease process frequently leads to physical disability (1). Researchers aiming for a better understanding of rheumatoid arthritis (RA) outcome must first quantify it in a meaningful and reliable way. However, physical disability is a hypothetical construct, i.e., it was put together by scientists to explain the decline in the ability to perform physical activities that can occur in RA and other diseases (2). This implies that physical disability cannot be directly observed or measured, and as such, it can be considered a latent variable (3). Available measurement tools to assess physical disability in RA indirectly tap into the underlying construct (3).
To measure disability in RA, researchers have a variety of instruments and scales from which to choose (4). Some of these are considered arthritis-specific because they center on outcomes more immediately relevant to arthritis (5–7). Generic scales, on the other hand, measure more global outcomes and are suitable for studying a diversity of diseases (8). Each has its own set of advantages (9–12). Some empirical studies, however, have not found major differences in performance between the 2 types of scales (13, 14). The choice between one type over the other not being clear cut, some authorities reasonably advocate including both an arthritis-specific and a generic outcome measure in RA trials (9, 15). This recommendation has the added benefit that 2 or more measurement tools will provide a more reliable representation of the underlying construct (16).
However, not much attention has been given to how to report the results of studies that include 2 or more outcome measures of the same construct. The option of describing results on both scales separately in the same, or different, reports has certain disadvantages, including the need to conduct separate, parallel analyses; a greater potential for type I errors due to multiple comparisons; the added space needed to show results fully; enticement to duplicate publication; and problems of interpretation if results on the different scales diverge. When the last of these occurs, investigators may be lured into simply omitting results on the scale that does not fit their hypotheses. These potential problems, theoretical or real, can be averted by a data-reduction process aimed at estimating the underlying latent variable, conserving or enhancing information provided by the various scales.
We have confronted some of the above dilemmas during an ongoing study of the disablement process in RA. We selected 2 self-report scales, one generic and the other disease-specific, and an observer-derived classification system to assess the extent of physical disability in RA. In this article, we describe the data-reduction process we utilized to derive a parsimonious, single variable representing the construct of physical disability. We also show evidence of its equivalence, or superiority, to the 3 primary scales.
As expected for a group of people with established RA visiting a rheumatologist, most were women, median disease duration was 8 years, and rheumatoid factor was present in the majority (Table 1). The median number of 8 deformed joints indicates a substantial amount of joint damage (22). In accord with this finding, only 21% of the patients were working full or part time, and 27% stated they were unable to work. Of the 756 patients on whom we had followup information up to 6 years later, 71 were known to have died (9%).
Table 1. Clinical characteristics of the 776 RA patients studied*
|Characteristic||No. with data available||Distribution|
|Age, median (range), years||776||57 (19–90)|
|Male, no. (%)||776||229 (30)|
|Ethnic group, no. (%)||776|| |
| White|| ||272 (35)|
| Black|| ||53 (7)|
| Asian|| ||14 (2)|
| Hispanic|| ||431 (56)|
| Other|| ||6 (1)|
|Education, median (range), years||772||12 (0–17)|
|Currently working, no. (%)||776||166 (21)|
|Disabled for work, no. (%)||776||213 (27)|
|Time from disease onset, median (range), years||776||8 (0–52)|
|Tender joint count, no. (%)||776||15 (13)|
|Swollen joint count, no. (%)||776||7 (7)|
|Deformed joint count, no. (%)||776||10 (11)|
|Nodules, no. (%)||776||233 (30)|
|Rheumatoid factor positive, no. (%)||770||682 (89)|
|Walking velocity, mean ± SD, meters/minute||775||59 ± 25|
|Grip strength, mean ± SD, lbs||776||14 ± 10|
|Button test, mean ± SD, buttons/minute||769||7.1 ± 3.8|
|MHAQ, mean ± SD||776||1.89 ± 0.70|
|SF-36, mean ± SD||776||35.6 ± 27.87|
|Steinbrocker functional class, no. (%)||776|| |
| I|| ||163 (21)|
| II|| ||383 (49)|
| III|| ||190 (24)|
| IV|| ||40 (5)|
|Latent Disability Scale, mean ± SD||776||56 ± 23|
|Deaths as of March 2002, no. (%)||756||71 (9)|
Figure 1 is a diagram of the factor analysis we used to derive the physical disability latent variable. The 3 primary variables, M-HAQ, SF-36PF, and Steinbrocker class, loaded strongly on a single factor, with loadings ≥0.8. This factor explained ≥75% of the primary variables' combined variance. Uniqueness values were <0.3 for each of the primary variables, indicating that these share more than two-thirds of their combined variance. We extracted the single factor without rotation, using linear regression scoring. Figure 2 shows probability distributions for the 3 primary scales and the latent variable.
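The extraction described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the function name `principal_factor`, the `ref` argument, and the synthetic data are all assumptions introduced here, and the 3 simulated columns merely stand in for the M-HAQ, SF-36PF, and Steinbrocker class.

```python
import numpy as np

def principal_factor(X, ref=0):
    """Extract one unrotated principal-component factor from the columns of X.

    Returns loadings, uniqueness values, the proportion of variance explained,
    and regression-method factor scores. `ref` is the column whose loading is
    forced to be positive (the sign of a factor is otherwise arbitrary).
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each scale
    R = np.corrcoef(Z, rowvar=False)                   # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # eigenvalues in ascending order
    lam, v = eigvals[-1], eigvecs[:, -1]               # largest eigenvalue and its vector
    loadings = v * np.sqrt(lam)                        # correlation of each scale with the factor
    if loadings[ref] < 0:
        loadings = -loadings
    uniqueness = 1.0 - loadings ** 2                   # variance not shared with the factor
    explained = lam / R.shape[0]                       # share of total variance
    weights = np.linalg.solve(R, loadings)             # regression scoring: R^{-1} * loadings
    scores = Z @ weights
    return loadings, uniqueness, explained, scores

# Synthetic stand-in data (hypothetical): one latent trait, three noisy scales,
# two of which are keyed in the opposite direction to the reference scale.
rng = np.random.default_rng(0)
n = 776
trait = rng.normal(size=n)
X = np.column_stack([
    -trait + 0.5 * rng.normal(size=n),
    trait + 0.5 * rng.normal(size=n),
    -trait + 0.5 * rng.normal(size=n),
])

loadings, uniqueness, explained, scores = principal_factor(X, ref=1)
print("loadings:", np.round(loadings, 2))
print("uniqueness:", np.round(uniqueness, 2))
print("variance explained:", round(float(explained), 2))
```

With principal component extraction, the regression-method scores have unit variance and each loading equals the correlation between the corresponding scale and the score, which is why the loadings and the Pearson coefficients reported for the latent variable are of similar magnitude.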
Figure 1. Diagram of the factor analysis we conducted to extract the latent variable measuring physical disability. The 3 primary variables are represented by squares and the circles represent information outside the latent variable. M.H.A.Q. = modified Health Assessment Questionnaire; SF-36 = Short Form 36.
Figure 2. Frequency distributions of the disability scales employed. Disability level decreases from left to right. A large proportion of patients had a score of 1 on the modified Health Assessment Questionnaire (M.H.A.Q.), indicating low disability levels on this scale (top left). However, the opposite is true for the Short Form 36 physical function (SF36PF) scale, in which the largest category is made up of patients with low scores, indicating high disability levels (top right). The Steinbrocker functional class provides only 4 levels to classify physical disability (lower left). The distribution of scores on the latent disability scale approached normality (lower right).
The Pearson correlation coefficients between the extracted latent variable and the primary variables were, as expected, also strong, with absolute r values ≥0.8. Figure 3 shows scatterplots of the bivariate distribution of these variables. The correlations between the latent variable and the criterion variables (pain, joint tenderness, swelling or deformity, grip strength, walking velocity, and the timed button test) are shown in Table 2, contrasted with the correlation coefficients between the primary scales and the same criterion standards. The latent variable had a significantly stronger correlation with most of the criterion standards than did the primary variables M-HAQ, SF-36PF, and Steinbrocker class. Notable exceptions were the correlations with the pain and articular examination variables, for which there was no significant difference between the M-HAQ and the latent variable. Also of interest, the number of deformed joints correlated more strongly with Steinbrocker class than with any of the other physical disability scales.
Figure 3. Matrix plot showing the bivariate distribution of the 3 primary variables and the latent variable. The Pearson correlation coefficient between the latent variable and the modified Health Assessment Questionnaire was −0.87; between the latent variable and Short Form 36 physical function scale (SF-36PF) was 0.89; and between the latent variable and the Steinbrocker class was −0.85. All coefficients were significant at P ≤ 0.0001.
Table 2. Correlation between physical disability scales and variables measured as criterion standards*
| ||MHAQ||SF-36PF||Steinbrocker class||Latent variable|
Figure 4 shows the relationship between the latent variable and selected comparison criteria. These graphs show the association between higher values in the latent variable and graded decreases in the number of deformed joints, the proportion of disabled patients, and the proportion of those who died within 6 years. Conversely, performance-based functional measures (grip strength, timed button test, and walking velocity) displayed a proportional rise with increasing values on the latent scale, as did the probability of working full or part time.
Figure 4. Relationship between the latent variable measuring physical disability and the criterion measures deformed joint count (top left; trend P ≤ 0.001); walking velocity, grip strength, and timed button test (top right; trend P ≤ 0.001 for each variable); work disability and death within 6 years (bottom left; trend P ≤ 0.001 for each); and currently working (bottom right; trend P ≤ 0.001). Error bars represent standard error.
Table 3 shows the BICs of models that contained age, sex, and each of the 4 disability scales (the M-HAQ, SF-36PF, Steinbrocker class, and the latent variable) as independent variables for each of the criterion standards. For most of the criterion standards, the BIC was smaller, indicating better fit, in the models that included the latent variable (Table 3). Notable exceptions, again, included the Steinbrocker class, whose model fit the deformed joint count better than did the models containing any of the other physical disability scales. Likewise, there was positive evidence that the SF-36PF fit better in a model for disabled work status than did any of the other physical disability scales.
Table 3. Bayesian information criterion of multivariate models, according to physical disability scale used as independent variable*
|Dependent variable||Physical disability scale included as independent variable in multivariate model†|
| ||MHAQ||SF-36PF||Steinbrocker class||Latent variable‡|
|Death within 6 years||−4,782#||−4,780§||−4,775§||−4,791|
|Tender joint count||809#||855§||933§||818|
|Swollen joint count||−8¶||−5¶||7§||−14|
|Deformed joint count||662§||661§||519||607|
|Timed button test||−7,501§||−7,465§||−7,478§||−7,597|
One desirable characteristic of research data is parsimony, or simplicity of explanation (31). Under this principle, one variable is preferable to 2 or more, providing that the single variable is as informative as the 2 or more. We have shown evidence that a single latent variable derived from principal component factor analysis of 3 scales, the M-HAQ, the SF-36PF, and the Steinbrocker functional class, has equal or superior performance to the primary scales, as manifested by an equal or stronger degree of association with the criterion standards we selected. We used the disablement process as a theoretical framework to inform our selection of criterion standards (18, 32, 33), aiming to test the underlying physical disability construct from as many perspectives as possible. Thus, our comparison criteria included key RA impairments, such as the amount of pain and the number of tender, swollen, and deformed joints (33). We also used measures of functional limitation, occupational status, and death within 6 years as criteria.
The correlation between the joint impairments and the latent variable was nearly always stronger than that between the same impairments and the primary disability scales (Table 2). This is likely due to the superior reliability of the latent variable, which is a composite of the 3 primary disability scales. This approach has been referred to as incomplete principal component regression because the variable of interest is provided by the first principal component in a factor analysis (34). The composite measure's stronger correlation with most criterion standards conforms to a fundamental theorem of measurement theory, according to which the correlation between 2 variables, x and y, is limited by the square root of the product of each variable's reliability (16), that is, $r_{xy} \le \sqrt{r_{xx}\, r_{yy}}$, where $r_{xx}$ and $r_{yy}$ denote the reliabilities of x and y. However, there were 2 comparisons that did not follow this rule: the M-HAQ correlated as strongly as the latent variable with the impairments, and the Steinbrocker class correlated more strongly with the number of deformities than did any of the other disability scales, including the latent variable. A possible reason is that examiners incorporated findings from the joint exam into their judgment of the Steinbrocker class. In contrast, the M-HAQ and the SF-36 are self-reported scales that patients answer according to their own perceived condition.
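The reliability argument can be illustrated with a small simulation. The sketch below is hypothetical throughout: the data are synthetic, the noise levels are arbitrary, and a simple mean of 3 scales is used as a stand-in for the factor score derived in this study. It shows only the general principle that a composite of several noisy measures of one construct tends to correlate more strongly with an external criterion than any single measure does.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

true_score = rng.normal(size=n)                           # unobserved construct
criterion = 0.6 * true_score + 0.8 * rng.normal(size=n)   # external criterion variable

# Three scales measuring the same construct, each with independent error.
scales = np.column_stack([true_score + rng.normal(size=n) for _ in range(3)])
composite = scales.mean(axis=1)                           # errors partly cancel out

def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

r_single = [corr(scales[:, j], criterion) for j in range(3)]
r_composite = corr(composite, criterion)
r_true = corr(true_score, criterion)                      # ceiling set by the construct itself

print("single scales:", np.round(r_single, 2))
print("composite:", round(float(r_composite), 2))
print("error-free construct:", round(float(r_true), 2))
```

Averaging reduces the error variance of the composite, so its reliability rises and the ceiling on its correlation with the criterion rises with it, while still remaining below the correlation attainable by the error-free construct.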
We also used 3 performance-based measures of functional limitation: grip strength, walking velocity, and timed button test. Within the disablement process framework, these measurements are closer to the physical disability construct than are the joint impairments (18, 32, 33) and, consequently, their degree of correlation with the disability scales was stronger. Here, even more so than with the impairments, the latent variable's association with the performance-based measures was stronger than that of any of the 3 primary physical disability scales considered individually.
Work loss is one of the main adverse consequences of RA (35). We found that work status was strongly associated with the 4 physical disability scales, with a tendency for the association to be stronger for the latent variable. Death displayed a similar pattern of association. One of the main uses of these comparison standards, occupational and vital status, is as anchors that researchers or clinicians can use to interpret the values along the latent variable scale. As shown in Figure 4, lower values of the latent variable are associated with markedly worse outcomes.
The physical disability scales we used in the present analyses, including the latent variable we developed, often find use in multivariate models, either as outcomes or predictors. We thus compared the fit of models that included the different physical disability scales as independent variables and each of the different criterion variables as outcomes. Because each of the criteria we used can be heavily influenced by age and sex, we included these 2 variables as covariates in all of the multivariate models. We chose the BIC as a comparative measure because it is a tool used often for model selection (30, 36). We expected that the models that included the new latent variable would have smaller BICs, indicating better fit. Indeed this was the case with nearly all of the criterion variables.
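A comparison of this kind can be sketched as follows. The example is an assumption-laden illustration rather than the models reported in Table 3: the data are simulated, the variable names are hypothetical, the outcome is continuous and fitted by ordinary least squares, and the BIC is computed from the Gaussian log-likelihood up to an additive constant, a convention that need not match the one behind the published values. Only differences in BIC between models fitted to the same outcome are meaningful.

```python
import numpy as np

def gaussian_bic(y, X):
    """BIC of a least-squares fit of y on X (an intercept column is added here)."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ beta) ** 2)
    k = A.shape[1]                                  # number of estimated coefficients
    return n * np.log(rss / n) + k * np.log(n)

# Hypothetical data: age, sex, an unobserved disability trait, and 3 noisy scales.
rng = np.random.default_rng(2)
n = 776
age = rng.normal(57, 12, size=n)
male = rng.integers(0, 2, size=n).astype(float)
disability = rng.normal(size=n)

scales = {name: disability + 0.6 * rng.normal(size=n)
          for name in ("scale_a", "scale_b", "scale_c")}
scales["composite"] = np.mean(list(scales.values()), axis=0)

# Continuous criterion (for example, a walking-velocity-like measure).
outcome = (60 - 15 * disability - 0.2 * (age - 57) + 3 * male
           + 15 * rng.normal(size=n))

# One model per disability scale, each adjusted for age and sex.
bic = {name: gaussian_bic(outcome, np.column_stack([age, male, s]))
       for name, s in scales.items()}
best = min(bic, key=bic.get)

for name, value in bic.items():
    print(f"{name:10s} BIC = {value:8.1f}")
print("smallest BIC:", best)
```

Because every model contains the same number of predictors, the BIC differences here reflect fit alone. A binary criterion such as death or work disability would call for a logistic model, with the BIC computed from its log-likelihood instead of the residual sum of squares.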
The latent disability variable has distributional advantages over the 3 primary scales. Both the M-HAQ and the SF-36PF display skewed distributions (Figure 2), the former displaying a ceiling effect, the latter, a floor effect (37). The latent scale lacks skewness in either direction, more closely approximating normality than any of the primary scales. Moreover, the latent variable displays an interval or near-interval distribution, as suggested by the monotonic rise in criterion variables as the scale increases (Figure 4).
The latent variable has theoretical advantages as well: Physical disability is a hypothetical construct and claims that any one disability measurement scale is superior to others are debatable. Using more than one measurement tool may be a more accurate way to get at the underlying construct because it enables the unmeasured construct to be assessed from a variety of angles. For this same reason, the idea of using both a scale intended specifically for arthritis and one intended for unselected populations (9, 15) is quite attractive, because the arthritis-specific scale, the M-HAQ in our study, will capture the arthritis-relevant outcomes whereas the generic scale, here provided by the SF-36PF, will capture an overall nonspecific disease impact.
We acknowledge some limitations of our analysis. Factor analysis assumes that data are distributed on interval, multivariate normal scales, an assumption that may not be stringently met by the 3 disability scales we entered into the factor analysis. However, this assumption is a strict requirement only if statistical inference is used to determine the number of factors and can be relaxed when factor analysis is used descriptively (26, 38). The least squares factor extraction method we used is also robust to deviations from normality (39). The M-HAQ and SF-36PF scales we used were developed using sound psychometric theory to produce results on interval or near-interval scales, and they have each been used as such in numerous studies over many years. We used the composite scores of both these scales, scored as originally intended. It is possible, however, to select items from each of these scales and calibrate their weights so that they more closely approximate a true interval or ratio scale by using item response theory or Rasch analysis (40, 41). This may represent an alternative method to accomplish the aims we pursued here.
Data parsimony is a desirable feature in a research study because, among other reasons, it avoids the problems we mentioned at the beginning of this article. In the present analysis, we have reduced the original 3 scales to a single variable that in many respects outperforms the individual primary scales. A similar data reduction strategy could be used for other RA processes, such as inflammatory disease activity, disease damage, joint impairment, and functional limitation (32, 33). For example, a latent variable extracted from the disease activity measures recommended for RA clinical trials (42) could potentially lead to more efficient trials if the latent variable outperforms the primary scales, as was the case for disability measures in the present analysis.
It is important to point out that ours is a data-driven approach, and that the latent variable cannot be fully specified as an outcome measure in advance of a study. We do not advise investigators to directly apply the factor loadings we estimated here to develop a latent disability variable for use in their own studies, because data from another patient sample could be quite different. Moreover, investigators may have reasons to choose a different set of primary disability scales from those used here. We do believe, however, that researchers can apply a principal component factor analysis, similar to that shown here, to their own data to extract a latent variable that will likely exceed the primary scales in reliability.
In conclusion, we have used factor analysis to derive a latent variable that measures physical disability in RA. The new variable outperforms the primary scales in a number of tests of association with comparison criterion standards. This approach may be used to develop latent variables measuring other RA disease components, such as disease activity, damage, and functional limitation.