Confirmatory factor analysis of the Arthritis Impact Measurement Scales 2 short form in patients with rheumatoid arthritis




To examine the factorial validity of the short form Arthritis Impact Measurement Scales 2 (AIMS2-SF) in patients with rheumatoid arthritis (RA).


Data were from a sample of 279 patients with active RA who completed the long form AIMS2 before starting treatment with tumor necrosis factor α–blocking agents. Confirmatory factor analyses were conducted to test and compare the fit of the currently used theoretical measurement model of the AIMS2-SF, originally suggested for the long form AIMS2, and 2 alternative models based on previous exploratory research.


A model with the physical dimension divided into upper and lower body limitations was superior to the current model, and both models provided a clearly better fit than a model without a separate symptom dimension. Under the restrictive assumption of uncorrelated error terms, none of the models achieved a consistent and acceptable fit as judged by several goodness-of-fit indices. Allowing error covariances between 6 pairs of items within the same dimension resulted in an improved and acceptable fit of both the current model and the model with a separate upper and lower body component.


This study generally supports the factorial validity of the AIMS2-SF and suggests the use of separate scores for upper and lower body limitations. Further research is needed to resolve the issue of high error correlations associated with particular items.


Since the introduction of the Arthritis Impact Measurement Scales (AIMS) by Meenan et al in 1980 (1), the AIMS and the revised AIMS2 (2) have been widely used for measuring health status in patients with rheumatic diseases. Because the length of the AIMS2 limited its use in clinical research and routine practice, Guillemin et al (3) developed a 26-item short form of the questionnaire (AIMS2-SF) for patients with rheumatoid arthritis (RA). The investigators attempted to preserve the content validity of the questionnaire, and the final short form showed psychometric properties similar to those of the original long form AIMS2.

Despite the increasing use of the AIMS2-SF, empirical support for its factorial validity is still somewhat limited. For multidimensional measures such as the AIMS2-SF, factorial validation is important for understanding how to score and interpret the different dimensions. Following Guillemin et al (3), the AIMS2-SF is usually scored using the 5 dimensions originally suggested as second-order components for the long form AIMS2. Although a number of studies have examined the underlying structure of the AIMS2-SF, all of them used exploratory factor analysis. Moreover, somewhat different factor solutions were found across study samples (Table 1).

Table 1. Sample characteristics and factor structures from previous exploratory factor analyses of the Arthritis Impact Measurement Scales 2 Short Form (AIMS2-SF)*

| Author, year (ref.) | Study sample | No. of patients | Study design | Disease severity† | Mean age, years | Mean disease duration, years | Female, % | Factor solution‡ |
|---|---|---|---|---|---|---|---|---|
| Guillemin et al, 1997 (3) | RA | 127 | Prospective cohort of patients starting methotrexate | Severe | 51 | 8 | 76 | 5 factors: upper body, lower body, affect, symptom, social interaction |
| Ren et al, 1999 (4) | OA | 147 | Cross-sectional performance test sample | Moderate | 66 | 14 | 81 | 5 factors: upper body, lower body, affect, symptom, social interaction |
| Taal et al, 2003 (5) | RA | 587 | Cross-sectional outpatient samples | Moderate | 61 | 16 | 63 | 3 factors: physical, psychological, social |
| Rosemann et al, 2005 (6) | OA | 220 | Cross-sectional primary care patient sample | Moderate | 47 | 10 | 44 | 3 factors: physical, psychological, social |

* RA = rheumatoid arthritis; OA = osteoarthritis.
† Arbitrarily defined using the reported mean AIMS2-SF symptom scale score (<3 = mild, 3–6 = moderate, >6 = severe).
‡ Items from the role component were excluded from all factor analyses.

Analysis of the original AIMS2-SF in a French cohort study of patients with RA starting treatment with methotrexate identified 5 factors, representing upper body function, lower body function, affect, symptom, and social interaction (3). This factor structure was close to the original dimensions of the long form, except that the physical dimension was split into 2 parts. In a cross-sectional study of patients with osteoarthritis (OA) in the US, Ren et al (4) reported a very similar 5-factor solution. Contrary to these findings, a Dutch study using cross-sectional data from 3 studies of outpatients with RA found a 3-factor solution representing a physical, psychological, and social dimension (5). All lower and upper body items loaded on 1 factor, and the 3 symptom items loaded on both the physical and psychological dimensions, but more strongly on the psychological dimension. This 3-factor solution was closely replicated in another cross-sectional study of German patients with OA in primary care (6).

Given the inconclusiveness of previous efforts to identify the factor structure of the AIMS2-SF, it is unclear whether its current scoring procedure is appropriate. Therefore, the objective of the present study was to further examine the factor structure of the AIMS2-SF using confirmatory factor analysis (CFA) in a new sample of patients with active RA. CFA provides a more powerful test of factorial validity than exploratory approaches by examining whether a hypothesized measurement model adequately fits the data of a given sample. Moreover, it allows for the comparison of competing models and selection of the best fitting model. Finally, CFA can be used to refine existing models to increase their parsimony and statistical power. Therefore, CFAs were conducted to test the current measurement model of the AIMS2-SF and compare it with 2 alternative models based on previous exploratory analyses.



Participants in this study were patients with RA who were included in the Dutch Rheumatoid Arthritis Anti–Tumor Necrosis Factor Monitoring (DREAM) register. The DREAM register is an ongoing observational cohort study of patients with RA starting anti–tumor necrosis factor treatment in 11 hospitals in The Netherlands. Details on the inclusion criteria, methodology, and cohort characteristics of the DREAM study are reported elsewhere (7). For this part of the study, we used data from a subset of patients who additionally completed the AIMS2 at baseline. The DREAM cohort study was approved by the appropriate institutional ethics committees and all patients provided written informed consent.


The AIMS2 is a self-administered questionnaire designed to measure various dimensions of health status in patients with arthritis (2). It has been used in different rheumatic conditions and an increasing number of validated translations are available (8–15). The Dutch version that was used in this study has shown good psychometric properties in patients with RA (16–18). The core part of the AIMS2 contains 57 items that are categorized into 12 scales representing different areas of health. Each scale contains 4 or 5 items measured on 5-point Likert-type rating scales. The scales can be combined into 5 second-order summary component scores: physical (mobility level, walking and bending, hand and finger function, arm function, self-care, household tasks), affect (level of tension, mood), symptom (arthritis pain), social interaction (social activity, support from family), and role (work).

The AIMS2-SF (3) comprises 26 items from the long form version. Item reduction for the AIMS2-SF was mainly based on a Delphi exercise of patients' and experts' judgments of relevance and was aimed at preserving the 5 original dimensions of the AIMS2. The AIMS2-SF showed convergent validity, reliability, and sensitivity to change similar to those of the original AIMS2. Several studies have since confirmed the psychometric qualities of the AIMS2-SF (4–6, 18, 19). Although in the Dutch version (5) items 31 and 38 from the long form version are recommended instead of items 33 and 42, the items from the original French version were used in the current study. The AIMS2-SF is usually scored by combining the items from the 5 dimensions of the long form AIMS2. First, responses to 16 items are recoded so that higher scores indicate worse health. After recoding the raw responses, the scores on each item within a dimension are summed and transformed to a score ranging from 0 to 10, with higher scores representing poorer health status.
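The scoring steps described above can be illustrated with a minimal Python sketch. It assumes 5-point items coded 1 to 5 and a linear rescaling of the summed score to the 0–10 range; the function names and the example responses are hypothetical, and the set of reverse-coded items must be taken from the official scoring instructions, which are not reproduced here.

```python
from typing import AbstractSet, Mapping, Sequence


def recode(response: int, reverse: bool, n_options: int = 5) -> int:
    """Return the response so that higher values indicate worse health."""
    if not 1 <= response <= n_options:
        raise ValueError(f"response {response} is outside 1..{n_options}")
    return n_options + 1 - response if reverse else response


def dimension_score(
    responses: Mapping[int, int],
    items: Sequence[int],
    reversed_items: AbstractSet[int] = frozenset(),
    n_options: int = 5,
) -> float:
    """Sum the recoded items of one dimension and rescale to 0-10.

    The linear rescaling is an assumption made for this illustration.
    """
    recoded = [recode(responses[i], i in reversed_items, n_options) for i in items]
    lowest = len(recoded)               # every item at its best option (1)
    highest = len(recoded) * n_options  # every item at its worst option
    return 10.0 * (sum(recoded) - lowest) / (highest - lowest)


# Hypothetical responses to the 3 symptom items (long form numbers 39, 41, 42).
example = {39: 4, 41: 3, 42: 5}
symptom = dimension_score(example, items=[39, 41, 42])
print(round(symptom, 2))  # 7.5
```

Keeping the recoding step separate from the summation makes it easy to apply the same function to every dimension with a different item list.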

Statistical analyses.

CFAs were conducted using LISREL 8.70 (Scientific Software International, Lincolnwood, IL). Because the AIMS2-SF consists of ordinal Likert-type items, robust maximum likelihood (RML) estimation with Satorra-Bentler (SB)–scaled statistics was used (20). This estimation procedure corrects for non-normality using the asymptotic covariance matrix and provides more accurate fit indices. Given the relatively small sample size in this study, this procedure was considered to be preferable to alternative asymptotic distribution-free methods (21–24).

The conventional overall measure of fit in CFA is the chi-square statistic, where small, nonsignificant values indicate a good fit with the data. Because of several problems associated with this statistic, such as its sensitivity to large sample sizes, the chi-square statistic is likely to overstate the lack of fit of a model. Therefore, this measure was primarily used to statistically compare the relative fit of the nested models by means of chi-square difference tests. Because the simple difference between 2 SB-scaled chi-squares does not yield a correct test statistic, we used the SB-scaled difference test statistic (ΔSBχ2) procedure (25).
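As an illustration of the scaled difference test, the sketch below follows the commonly described Satorra-Bentler computation, in which the scaling correction of each model is the ratio of its normal-theory chi-square to its scaled chi-square. The function names and the input values are hypothetical; the normal-theory chi-squares of the models in this study are not reported here, so the published test statistics cannot be recomputed from this sketch. The survival function is a closed form that holds only for even degrees of freedom.

```python
import math


def sb_scaled_difference(t0_ml, t0_sb, df0, t1_ml, t1_sb, df1):
    """Scaled chi-square difference for two nested models.

    Model 0 is the more restricted model (more degrees of freedom) and
    model 1 the less restricted one. t*_ml are normal-theory chi-squares,
    t*_sb the Satorra-Bentler-scaled chi-squares of the same models.
    Returns (scaled difference statistic, difference in df).
    """
    if df0 <= df1:
        raise ValueError("model 0 must have more degrees of freedom than model 1")
    c0 = t0_ml / t0_sb  # scaling correction factor of model 0
    c1 = t1_ml / t1_sb  # scaling correction factor of model 1
    cd = (df0 * c0 - df1 * c1) / (df0 - df1)
    if cd <= 0:
        raise ValueError("non-positive difference scaling factor; test undefined")
    return (t0_ml - t1_ml) / cd, df0 - df1


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate; even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


# Hypothetical chi-squares for a 4-factor model nested in a 5-factor model.
stat, ddf = sb_scaled_difference(1000.0, 800.0, 246, 900.0, 730.0, 242)
p_value = chi2_sf_even_df(stat, ddf)
```

When both models have the same scaling correction of 1, the statistic reduces to the ordinary chi-square difference.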

Additionally, a variety of indices have been developed that account for the problems associated with the chi-square statistic. As suggested by Hu and Bentler (26, 27), multiple fit indices were used to examine the fit of the models: the non-normed fit index (NNFI; also known as the Tucker–Lewis index), the comparative fit index (CFI), the standardized root mean square residual (SRMR), and the root mean square error of approximation (RMSEA). Values ≥0.95 for the NNFI and CFI, ≤0.08 for the SRMR, and ≤0.06 for the RMSEA indicate a good fit between the hypothesized model and the data (26, 27).
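The cutoff values above can be applied mechanically, as in the following minimal sketch. The function names are hypothetical. The RMSEA formula is the usual textbook point estimate based on the chi-square, its degrees of freedom, and the sample size; it is included only as an illustration and need not match the robust output of a particular program to the last decimal.

```python
import math

# Cutoffs suggested by Hu and Bentler, as cited in the text.
CUTOFFS = {"NNFI": 0.95, "CFI": 0.95, "SRMR": 0.08, "RMSEA": 0.06}


def rmsea_point_estimate(chi2, df, n):
    """Textbook RMSEA point estimate for a sample of n respondents."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def check_fit(nnfi, cfi, srmr, rmsea):
    """Return, per index, whether the reported value meets its cutoff."""
    return {
        "NNFI": nnfi >= CUTOFFS["NNFI"],
        "CFI": cfi >= CUTOFFS["CFI"],
        "SRMR": srmr <= CUTOFFS["SRMR"],
        "RMSEA": rmsea <= CUTOFFS["RMSEA"],
    }


# Rounded values as reported in Table 3.
restricted_model_1 = check_fit(nnfi=0.92, cfi=0.93, srmr=0.10, rmsea=0.09)
refined_model_2 = check_fit(nnfi=0.97, cfi=0.98, srmr=0.08, rmsea=0.05)
print(all(refined_model_2.values()))     # True
print(any(restricted_model_1.values()))  # False
```

Returning one flag per index, rather than a single verdict, makes it visible when a model meets some criteria but not others.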

As in other studies exploring the factor structure of the AIMS2-SF, the 2 role items were excluded from the factor analyses because these are not completed by patients who are unemployed, disabled, or retired at the time of study. At baseline, 48% of the patients fell into one of these categories. Moreover, constructing a factor with only 2 items may lead to identification and convergence problems (28–30). The percentage of missing values for the 24 remaining items ranged from 0.0% to 3.6%. Because no missing data patterns were identified, missing values were imputed using the expectation-maximization algorithm in LISREL.

We tested and compared 3 different factor models. Model 1 is the currently used measurement model of the AIMS2-SF that combines the remaining items of the AIMS2-SF in the same dimensions as the long form AIMS2: physical, social interaction, symptom, and affect. Model 2 is based on the findings of 2 previous exploratory factor analyses indicating that the items in the physical dimension should be split into 2 factors, 1 reflecting upper body limitations and 1 reflecting lower body limitations (3, 4). Finally, model 3 reflects the 3-factor solution found in 2 independent studies, in which the symptom items did not form a separate factor but, instead, loaded highly on the psychological (affect) dimension (5, 6).

In all models, the items were constrained to load on a single factor, the variance of the hypothesized latent factors was fixed at 1.0, and the factors were allowed to correlate freely. Initial comparison of the models was based on the restrictive assumption that the error terms of the items were uncorrelated. Correlated error terms between items generally indicate that these items share a common variance that is not accounted for by the hypothesized factor structure, such as the presence of 1 or more meaningful unspecified factors. However, findings of correlated error terms are not unusual in the validation of assessment instruments in general, and of psychological measures in particular (31). The presence of correlated error terms can also reflect certain method effects, especially perceived redundancy or overlap in item content (32, 33). A model can be further improved by allowing such error terms to correlate, but only when this can be justified and interpreted substantively (34). In general, it is considered justifiable to allow error correlations between items within the same factor. When none of the restrictive models achieved acceptable fit, the constraints of the models were relaxed one at a time by allowing the error terms with the largest modification index within each factor to correlate, provided that this made substantive sense.
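The degrees of freedom implied by this specification can be checked by counting parameters. The sketch below assumes exactly the constraints described above (each item loads on a single factor, factor variances fixed at 1.0, all factor correlations free) and counts one additional free parameter for every error covariance that is released; the function name is hypothetical.

```python
def cfa_degrees_of_freedom(n_items, n_factors, n_error_covariances=0):
    """Degrees of freedom of a simple-structure CFA with standardized factors."""
    observed_moments = n_items * (n_items + 1) // 2
    free_parameters = (
        n_items                                # one loading per item
        + n_items                              # one error variance per item
        + n_factors * (n_factors - 1) // 2     # factor correlations
        + n_error_covariances                  # released error covariances
    )
    return observed_moments - free_parameters


# 24 items remain after excluding the 2 role items.
print(cfa_degrees_of_freedom(24, 4))     # 246 (model 1)
print(cfa_degrees_of_freedom(24, 5))     # 242 (model 2)
print(cfa_degrees_of_freedom(24, 3))     # 249 (model 3)
print(cfa_degrees_of_freedom(24, 5, 6))  # 236 (model 2, 6 correlated errors)
```

These counts agree with the degrees of freedom listed for the corresponding models in Table 3, and each released error covariance costs exactly 1 degree of freedom.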


Between March 2003 and November 2004, 302 patients were enrolled in this part of the study. Of these patients, 279 (92.4%) completed the long form AIMS2 at study entry. There were no significant differences in demographic and clinical characteristics between patients who did and those who did not complete the AIMS2 (data not shown). Of the participants who completed the AIMS2, 70% were female and the mean ± SD age and median (interquartile range) disease duration were 54.6 ± 12.8 years and 8.0 (3.0–15.0) years, respectively. Assessment of disease severity at baseline generally indicated severe RA, with a mean ± SD 28-joint Disease Activity Score (35) of 5.43 ± 1.22, AIMS2-SF symptom scale score of 6.9 ± 2.0, and Health Assessment Questionnaire disability index (36, 37) score of 1.49 ± 0.59.

The means, SDs, skewness, and kurtosis of the recoded scores on the items are listed in Table 2. The full range of the response options was used for all items, except items 32 and 53, for which the highest response option (“never” or “no days”) was not used. Skewness and kurtosis values of the items suggested low to moderate non-normality of the items, further supporting the use of RML estimation with SB-scaled statistics.
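For readers who wish to compute comparable descriptives, the following sketch calculates moment-based skewness and excess kurtosis for a single item. Whether the statistics in Table 2 were computed with these population-moment formulas or with bias-corrected variants is not stated, so the choice of estimator is an assumption; the function name and the example responses are hypothetical.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (0 for a normal distribution)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least 2 observations are required")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    if m2 == 0:
        raise ValueError("all observations are identical")
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical recoded responses to one 5-point item.
item_scores = [1, 1, 1, 2, 2, 3, 5]
skew, kurt = skewness_and_excess_kurtosis(item_scores)
```

A positive skewness, as in this example, indicates that most respondents chose the lower (better) response options, with a tail toward poorer status.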

Table 2. Distribution of the recoded Arthritis Impact Measurement Scales 2 Short Form (AIMS2-SF) scores*

| Item | Mean ± SD | Skewness | Kurtosis |
|---|---|---|---|
| 1. Use a car or public transportation | 1.95 ± 1.01 | 0.49 | −0.65 |
| 5. In a bed or a chair for most or all of the day | 1.82 ± 0.96 | 0.65 | −0.60 |
| 6. Trouble doing vigorous activities | 4.59 ± 0.77 | −1.35 | 0.59 |
| 7. Trouble walking several blocks or climbing a few flights of stairs | 3.76 ± 1.19 | −0.36 | −0.82 |
| 10. Unable to walk unless assisted | 2.00 ± 1.36 | 0.76 | −0.81 |
| 11. Write with a pen or pencil | 2.34 ± 1.07 | 0.24 | −0.60 |
| 12. Button a shirt or blouse | 2.56 ± 1.14 | 0.17 | −0.63 |
| 13. Turn a key in a lock | 2.46 ± 1.06 | 0.17 | −0.55 |
| 18. Comb or brush your hair | 2.10 ± 1.03 | 0.39 | −0.69 |
| 20. Reach a shelf that was above your head | 2.76 ± 1.25 | 0.10 | −0.75 |
| 22. Need help to get dressed | 1.90 ± 1.10 | 0.69 | −0.66 |
| 24. Need help to get in or out of bed | 1.45 ± 0.79 | 1.31 | 0.46 |
| 29. Get together with friends or relatives | 2.95 ± 0.62 | −0.03 | 1.29 |
| 32. On the telephone with close friends or relatives | 2.48 ± 0.74 | −0.09 | −0.16 |
| 33. Go to a meeting of a church, club, team, or other group | 3.66 ± 0.95 | −0.14 | −0.49 |
| 35. Family and friends sensitive to your personal needs | 1.91 ± 0.98 | 0.54 | −0.62 |
| 39. Severe pain from your arthritis | 3.72 ± 1.07 | −0.30 | −0.65 |
| 41. Morning stiffness lasts more than 1 hour | 3.49 ± 1.31 | −0.26 | −0.91 |
| 42. Pain makes it difficult for you to sleep | 3.14 ± 1.15 | −0.05 | −0.57 |
| 48. Felt tense or high strung | 2.67 ± 0.91 | 0.03 | −0.17 |
| 49. Bothered by nervousness or your nerves | 2.32 ± 0.92 | 0.16 | −0.42 |
| 53. Enjoyed the things you do | 2.19 ± 0.73 | 0.16 | 0.08 |
| 54. In low or very low spirits | 3.11 ± 0.89 | −0.04 | −0.04 |
| 56. Others better off if you were dead | 1.38 ± 0.76 | 1.55 | 1.09 |

* The item numbers refer to the original numbers in the long form AIMS2. Higher scores indicate poorer status for all items.

The results of the CFAs are presented in Table 3. None of the restricted models satisfied the recommended criteria of acceptable fit on any of the fit indices. From the 3 models tested, the alternative measurement model with the physical dimension split into upper and lower body limitations (model 2) provided the best fit to the data, with an NNFI and CFI close to 0.95 and an SRMR and RMSEA just above the cutoff values of 0.08 and 0.06, respectively. Although the fit indices of the current model (model 1) were only marginally worse, the difference between these nested models was significant (ΔSBχ2[4] = 87.44, P < 0.001). Both models performed substantially better than the model without a separate factor for symptoms (model 3).

Table 3. Summary of fit indices for the different models of the Arthritis Impact Measurement Scales 2 Short Form (AIMS2-SF)*

| Model | χ2 | df | NNFI | CFI | SRMR | RMSEA (90% CI) |
|---|---|---|---|---|---|---|
| Model 1 | 805.03 | 246 | 0.92 | 0.93 | 0.10 | 0.09 (0.08–0.10) |
| Model 1, refined: 6 correlated errors | 483.90 | 240 | 0.96 | 0.97 | 0.08 | 0.06 (0.05–0.07) |
| Model 2 | 727.17 | 242 | 0.93 | 0.94 | 0.10 | 0.09 (0.08–0.09) |
| Model 2, refined: 6 correlated errors | 413.72 | 236 | 0.97 | 0.98 | 0.08 | 0.05 (0.04–0.06) |
| Model 3 | 1038.23 | 249 | 0.88 | 0.90 | 0.11 | 0.11 (0.10–0.11) |
| Model 3, refined: 7 correlated errors | 549.51 | 242 | 0.95 | 0.96 | 0.10 | 0.07 (0.06–0.08) |

* χ2 = Satorra-Bentler–scaled chi-square; df = degrees of freedom; NNFI = non-normed fit index; CFI = comparative fit index; SRMR = standardized root mean square residual; RMSEA = root mean square error of approximation; 90% CI = 90% confidence interval; model 1 = current model (physical, social interaction, symptom, and affect); model 2 = alternative model with the physical dimension split into upper and lower body limitations; model 3 = alternative model with the symptom items loading on the affect dimension.

Subsequent examination of the different models showed that the assumption of no correlation between the error terms did not hold. The modification indices indicated the presence of several high error covariances between pairs of items within the same dimension. The 6 largest modification indices (all >20) were consistently present in all models. These error covariances involved items 48 (felt tense or high strung) and 49 (bothered by nervousness or your nerves), items 6 (trouble doing vigorous activities) and 7 (trouble walking several blocks or climbing a few flights of stairs), items 22 (need help to get dressed) and 24 (need help to get in or out of bed), items 11 (write with a pen or pencil) and 12 (button a shirt or blouse), items 12 and 13 (turn a key in a lock), and items 11 and 13. In model 3, an additional high correlation was identified between the error terms of items 53 (enjoyed the things you do) and 54 (in low or very low spirits). All of these correlated error terms appeared to involve pairs of items with a high degree of similarity in feeling state or items reflecting similar degrees of severity on the same functional limitation.

Because allowing these error terms to correlate did not seriously compromise the structure of the original models, the models were respecified to include these correlations. The final refined models of the AIMS2-SF are shown in Figure 1, including the standardized factor loadings, correlations between factors, and correlations between error terms. The fit indices of the refined models with correlated error terms showed marked improvements in model fit for all 3 models (see Table 3). Moreover, the refined versions of both the current measurement model (model 1) and the alternative model with the physical dimension split into upper and lower body limitations (model 2) now satisfied the criteria for acceptable model fit for the NNFI, CFI, SRMR, and RMSEA. As with the restricted versions of the models, the latter model (model 2) performed significantly better than model 1 (ΔSBχ2[4] = 144.86, P < 0.001) and both models provided a clearly better fit than the model without a separate symptom factor (model 3).

Figure 1. Standardized parameter estimates for the 3 refined models of the Arthritis Impact Measurement Scales 2 Short Form. Rectangles represent the observed variables (items) and ellipses represent the hypothesized latent constructs (factors). Values on the single-headed arrows leading from the factors to the items are standardized factor loadings. Values to the left of the items represent error variances. Values on the curved double-headed arrows are correlations between factors or error terms.


The AIMS2-SF is usually scored by grouping its items into the same (second-order) components of the original long form AIMS2. Although this scoring procedure certainly has face validity, it has not been confirmed using appropriate statistical analyses. The current study used CFA to test and compare the goodness-of-fit of the current measurement model of the AIMS2-SF and 2 alternative models based on previous exploratory factor analyses. The results support the validity of the current measurement model of the AIMS2-SF, but suggest that an alternative model with separate upper and lower body limitations factors may be more meaningful.

This study confirms the findings by Guillemin et al (3) and Ren et al (4) that an upper body limitations factor should be distinguished from lower body limitations. This distinction was already suggested for version 1 of the long form AIMS (1, 38) and is quite common in the assessment of physical function in rheumatic diseases. Despite the intuitive appeal of a distinction between upper and lower body limitations, the validity of this distinction depends critically on whether all items can be classified as involving upper or lower body functions. Although most of the items for physical function are clearly associated with either upper body activities (e.g., write with a pen or pencil) or lower body activities (e.g., trouble walking several blocks or climbing a few flights of stairs), the AIMS2-SF also contains some items for which this classification is not so clear. Items such as “need help to get dressed” or “trouble doing vigorous activities” may involve both upper and lower body functions or may be related to other factors such as physical condition or athleticism. Indeed, in the final model (model 2) the latter item loaded very poorly (0.20) on the specified lower body limitations factor. However, because this item demonstrated a similarly weak factor loading in both the current measurement model (model 1) and the alternative model without a symptom factor (model 3), it may not represent the proposed physical function factor in general. All other items, however, demonstrated a sufficient loading on the specified upper or lower body limitations factor, supporting the use of separate scales for these factors.

The results also suggest the presence of unspecified factors or overlapping or redundant items, especially within the physical and psychological dimensions. The assumption of uncorrelated error terms did not hold for the items in the AIMS2-SF and several error covariances were added post hoc to the models. The high correlations among the error terms of items 11, 12, and 13 most likely indicate the presence of an additional factor related to restricted finger movement, which may be distinct from other physical limitations. Indeed, in the original long form AIMS2, these items were part of the hand and finger function subscale. Future studies could examine whether adding a separate factor for these items would improve the factor structure of the AIMS2-SF. Although correlated error terms are generally considered indicative of the omission of 1 or more relevant factors, they can also point to other types of method effects such as overlapping item content or perceived commonality (33). A review of the other items with high error term correlations indeed showed that these items used the same wording (e.g., “need help”) or assessed very similar feelings. Because high correlations between the error terms of such items are not uncommon, we considered it justifiable to allow error covariances between pairs of items within the same dimension. Nonetheless, this finding deserves further attention and the final models need to be cross-validated to increase confidence in the replicability of the post hoc modifications. Moreover, further research could focus more closely on possible modifications in the wording of these items to improve the precision of the AIMS2-SF.

It is also possible that the different factor structures found in previous studies are the result of specific differences in the patient samples studied. For instance, Guillemin et al (3) developed the AIMS2-SF using data from a prospective cohort study of patients with severe RA. Other studies, however, investigated its factor structure in OA (4, 6), which is a distinct disease, or used cross-sectional data from patients with RA of moderate severity (5). Although we did not find a clear association between sample characteristics and the reported factor structures in these studies, the results of the current study should be cross-validated in other samples, such as OA patients or RA patients with less severe disease.

An important limitation of this study is the omission of the proposed role dimension of the AIMS2-SF in the models. This is the result of a general problem associated with the use of both the long form and the short form of the AIMS2. Because patients who are unemployed, disabled, or retired at the time of the study are asked to skip the items from this dimension, the proportion of missing values for these items is often high. Consequently, the items from this dimension are usually excluded from factor analyses and the actual presence of a separate role dimension has not yet been confirmed.

In summary, this study supports the factorial validity of the AIMS2-SF and suggests the use of separate scales for upper and lower body limitations in scoring the questionnaire. The results also point to certain problems associated with some of the items that need further study.


Mr. ten Klooster had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study design. ten Klooster, Veehof, Taal, van de Laar.

Acquisition of data. Veehof, van Riel.

Analysis and interpretation of data. ten Klooster, Taal, van Riel, van de Laar.

Manuscript preparation. ten Klooster, Veehof, Taal, van Riel, van de Laar.

Statistical analysis. ten Klooster.


Schering-Plough had no role in the study design, data collection, data analysis, writing of the manuscript, or the decision to submit the manuscript for publication.


We thank T. van Gaalen, W. Kievit, and P. Welsing for their contribution to the organization of the study and data management. We thank the following rheumatologists and research nurses for their assistance in patient recruitment and data collection: J. Alberts, C. Allaart, A. ter Avest, P. Barrera Rico, T. Berends, H. Bernelot Moens, K. Bevers, C. Bijkerk, A. van der Bijl, J. de Boer, A. Boonen, E. ter Borg, E. Bos, A. Branten, F. Breedveld, H. van den Brink, J. Bürer, G. Bruyn, H. Cats, M. Creemers, J. Deenen, C. De Gendt, K. Drossaers-Bakker, A. van Ede, A. Eijsbouts, S. Erasmus, M. Franssen, I. Geerdink, M. Geurts, E. Griep, E. de Groot, C. Haagsma, H. Haanen, J. Harbers, A. Hartkamp, J. Haverman, H. van Heereveld, A. van de Helm-van Mil, I. Henkes, S. Herfkens, M. Hoekstra, K. van de Hoeven, D. M. Hofman, M. Horbeek, F. van den Hoogen, P. M. Houtman, T. Huizinga, H. Hulsmans, P. Jacobs, T. Jansen, M. Janssen, M. Jeurissen, A. de Jong, M. Kleine Schaar, G. Kloppenburg, H. Knaapen, P. Koelmans, M. Kortekaas, B. Kraft, A. Krol, M. Kruijssen, D. Kuiper-Geertsma, I. Kuper, R. Laan, J. van de Laan, J. van Laar, P. Lanting, H. Lim, S. van der Linden, A. Mooij, J. Moolenburgh, N. Olsthoorn, P. van Oijen, M. van Oosterhout, J. Oostveen, P. van 't Pad Bosch, K. Rasing, K. Ronday, D. de Rooij, L. Schalkwijk, P. Seys, P. de Sonnaville, A. Spoorenberg, A. Stenger, G. Steup, W. Swen, J. Terwiel, M. van der Veen, M. Veerkamp, C. Versteegden, H. Visser, C. Vogel, M. Vonk, H. Vonkeman, A. Westgeest, H. van Wijk, and N. Wouters.