The content herein is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Objective. To investigate the usefulness of various scales for evaluating joint and fascia manifestations in patients with chronic graft-versus-host disease (GVHD) after allogeneic hematopoietic cell transplantation, and to compare the scales in terms of simplicity of use and ability to yield reliable and clinically meaningful results.
Methods. In a prospective, multicenter, longitudinal, observational cohort of patients with chronic GVHD (n = 567), we evaluated 3 scales proposed for assessing joint status: the National Institutes of Health (NIH) joint/fascia scale, the Hopkins fascia scale, and the Photographic Range of Motion (P-ROM) scale. Ten other scales were also tested for assessment of symptoms, quality of life, and physical functions.
Results. Joint and fascia manifestations were present at study enrollment in 164 (29%) of the patients. Limited range of motion was most frequent at the wrists or fingers. Among the 3 joint assessment scales, changes in the NIH scale correlated with both clinician- and patient-perceived improvement of joint and fascia manifestations, with higher sensitivity than the Hopkins fascia scale. Changes in all 3 scales correlated with clinician- and patient-perceived worsening, but the P-ROM scale was the most sensitive in this regard. Onset of joint and fascia manifestations was not associated with subsequent mortality.
Conclusion. Joint and fascia manifestations are common in patients with chronic GVHD and should be assessed carefully in these patients. Our results support the use of the NIH joint/fascia scale and P-ROM scale to assess joint and fascia manifestations. The NIH scale better captures improvement, while the P-ROM scale better captures worsening. The utility of these scales could also be tested in the rheumatic diseases.
Allogeneic hematopoietic cell transplantation is a curative treatment for many hematologic diseases (). Chronic graft-versus-host disease (GVHD) occurs in approximately half of the transplant survivors and is the leading cause of late morbidity that compromises quality of life (QOL) and function ([2-4]). Chronic GVHD is thought to occur because the donor's immune system recognizes recipient tissue, causing inflammation and fibrosis. Joint/fascia manifestations have been considered to be infrequent in patients with chronic GVHD, but studies investigating this complication have been limited. Reported joint/fascia manifestations include joint stiffness, edema, restricted range of motion (ROM), arthralgia, and rarely, arthritis or synovitis (). Joint/fascia manifestations may be clinically detectable when inflammation and fibrosis arise in deep tissue (deep sclerosis/fasciitis) or skin overlying joints (superficial sclerosis), and deep sclerosis may occur with or without superficial sclerosis. Isolated fasciitis is frequently recognizable by restricted ROM or joint contractures. It is usually accompanied by stiffness or edema of the extremities, while the overlying skin remains freely mobile (). For example, inability to assume a “Buddha prayer” posture with full bilateral wrist extension indicates limited wrist extension due to tightening of flexor tendons. Sometimes superficial sclerosis is confluent with deep sclerosis or fasciitis, in which case the skin may be hidebound and underlying tissue has a wooden texture.
Joint/fascia manifestations in patients with chronic GVHD need to be assessed reliably, simply, and in a clinically meaningful way. The severity of these manifestations and response to therapy require documentation in both clinical trials and clinical practice, to guide therapy. Recognizing the lack of validated joint assessment scales, participants in a 2005 National Institutes of Health (NIH) consensus conference and other investigators have proposed various measurement scales ([5-13]). In the present study, in order to determine the optimal approach for identifying changes in joint/fascia manifestations in patients with chronic GVHD, we evaluated 3 joint assessment scales and 10 other scales that assess symptoms, QOL, and physical function. We also examined longitudinal joint responses according to the validated scales and investigated associations of joint/fascia manifestations with subsequent mortality.
PATIENTS AND METHODS
The present investigation was conducted under the auspices of the Chronic GVHD Cohort Study, a prospective, multicenter, longitudinal, observational study by the Chronic GVHD Consortium (). Patients who were at least 2 years of age, with systemically treated chronic GVHD within 3 years after transplantation, were eligible for the Chronic GVHD Cohort Study; patients with recurrent disease or anticipated survival of <6 months were not eligible. Chronic GVHD was diagnosed according to the NIH consensus criteria (). Incident cases (enrollment <3 months after chronic GVHD diagnosis) and prevalent cases (enrollment ≥3 months after chronic GVHD diagnosis but within 3 years after transplantation) were included. At enrollment and every 6 months thereafter, clinicians and patients (or their guardians, in the case of young juvenile patients) reported standardized information on chronic GVHD organ involvement and manifestations. Incident cases underwent an additional assessment at 3 months after enrollment. Patients were treated according to institutional practice in compliance with the NIH chronic GVHD consensus guidelines (). The study protocol was approved by the institutional review board of each participating center, and all participants or their guardians provided written informed consent in accordance with the Declaration of Helsinki.
A total of 13 assessment scales were evaluated in the present study (Table 1). The NIH joint/fascia scale uses a 0–3-point scale to calculate a composite score for tightness, ROM, and activities of daily living (ADL). The Hopkins fascia scale uses a 0–3-point scale but scores only tightness. The Photographic Range of Motion (P-ROM) scale is a series of images that captures ROM separately for shoulders, elbows, wrists/fingers, and ankles () (see Supplementary Figure 1, on the Arthritis & Rheumatology web site at http://onlinelibrary.wiley.com/doi/10.1002/art.38293/abstract), with lower scores indicating more limited ROM. The P-ROM total score is the sum of scores in all 4 joints, with a maximum possible score of 25. P-ROM data were collected among 502 patients who were enrolled after November 2008. The Lee symptom scale is a 30-item self-administered patient questionnaire specific to symptoms of chronic GVHD (). The muscle/joint subscale from the Lee overall symptom scale was also evaluated in this study. Patients reported their overall chronic GVHD symptoms on a 10-point scale of peak severity during the past week (). The Functional Assessment of Cancer Therapy–General (FACT-G) () and the Short Form 36 (SF-36) () were used for QOL assessment. The Human Activities Profile (HAP) is a 94-item self-reported assessment of energy expenditure or physical fitness (). The walk test measures physical performance based on the total distance in feet walked in 2 minutes (). The grip strength test measures physical performance using a hydraulic dynamometer (), with average pounds of pressure from 3 measurements in the dominant hand used for analysis.
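As an illustration of how the P-ROM total score is assembled, the following minimal sketch sums the 4 per-joint scores. The per-joint ranges (1–7 for shoulders, elbows, and wrists/fingers; 1–4 for ankles) are an assumption chosen to be consistent with the stated maximum of 25 and are not given explicitly in the text; the function names and the rule "below the maximum means limited ROM" are likewise hypothetical.

```python
# Assumed per-joint score ranges (not stated explicitly in the text):
# 7 + 7 + 7 + 4 = 25, which matches the stated maximum total score.
PROM_RANGES = {
    "shoulders": (1, 7),
    "elbows": (1, 7),
    "wrists_fingers": (1, 7),
    "ankles": (1, 4),
}


def prom_total(scores):
    """Sum the 4 per-joint P-ROM scores after range-checking each one."""
    total = 0
    for joint, (low, high) in PROM_RANGES.items():
        value = scores[joint]
        if not low <= value <= high:
            raise ValueError(f"{joint} score {value} outside {low}-{high}")
        total += value
    return total


def limited_joints(scores):
    """List joints scoring below the assumed maximum (lower = more limited ROM)."""
    return [joint for joint, (_, high) in PROM_RANGES.items()
            if scores[joint] < high]


example = {"shoulders": 7, "elbows": 7, "wrists_fingers": 5, "ankles": 3}
print(prom_total(example), limited_joints(example))
```

Under these assumptions, a patient with full ROM in every joint scores 25, and any lower total reflects limitation in at least 1 joint.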
Table 1. Assessment scales evaluated in this study*
No. of items
Clinically meaningful change
NIH = National Institutes of Health; ROM = range of motion; ADL = activities of daily living; NA = not applicable; P-ROM = Photographic ROM; GVHD = graft-versus-host disease; FACT-G = Functional Assessment of Cancer Therapy–General; QOL = quality of life; SF-36 = Short Form 36; MCS = mental component score; PCS = physical component score; HAP = Human Activities Profile; MAS = maximum activity score; AAS = adjusted activity score.
c Derived from half of the standard deviation of baseline values.
NIH joint/fascia scale (range 0–3)
0 = no symptoms; 1 = mild tightness of arms or legs, normal or mild decreased ROM, and not affecting ADL; 2 = tightness of arms or legs or joint contractures, erythema thought due to fasciitis, moderate decrease in ROM, and mild-to-moderate limitation of ADL; 3 = contracture with significant decrease of ROM and significant limitation of ADL (unable to tie shoes, button shirts, dress self, etc.)
Summary of 4 items, i.e., joint and muscle aches, limited joint movement, muscle cramps, and weak muscles, with each item rated as follows: 0 = not at all; 1 = slightly; 2 = moderately; 3 = quite a bit; 4 = extremely
Joint/fascia manifestations were defined as an NIH joint/fascia score of ≥1 at any study visit. At followup visits every 3–6 months, as an anchor of response, both the clinician and the patient rated their perception of change in joint/fascia manifestations on an 8-point scale that was collapsed for analysis into the following categories: improved (1 = completely gone; 2 = very much better; 3 = moderately better), stable (4 = a little better; 5 = about the same; 6 = a little worse), or worse (7 = moderately worse; 8 = very much worse). Longitudinal change scores on the scales were calculated by subtracting previous values from current values. To account for within-patient correlation, multivariable linear mixed models with random patient effect were used to evaluate correlations between changes in each scale and clinician- or patient-perceived changes in joint status (improved versus stable, or worse versus stable). The analysis included all paired visits when joint/fascia manifestations were documented in the previous or current visit. Linear mixed models were chosen since missing data had little effect on the models ([17, 18]). All models were adjusted for covariates that were associated with longitudinal changes in measures in univariate analysis at a P value of ≤0.01. In comparing performance among the different scales, the estimated differences in measures according to clinician- or patient-perceived improvement or worsening (versus stability) were standardized to the clinically meaningful change on the particular scale. This standardization is important because each scale has a different increment and potential range. As described in the NIH consensus report (), clinically meaningful changes were defined according to the original design of the scale or half of the standard deviation of baseline values (Table 1).
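The bookkeeping steps described above (collapsing the 8-point anchor, computing change scores, and standardizing an estimated difference to the clinically meaningful change) can be sketched as follows. This is an illustrative sketch only and does not reproduce the mixed-model estimation; the function names are hypothetical, and the use of the sample standard deviation for the half-SD rule is an assumption.

```python
from statistics import stdev


def collapse_anchor(rating):
    """Collapse the 8-point perceived-change rating into 3 categories."""
    if not 1 <= rating <= 8:
        raise ValueError("rating must be between 1 and 8")
    if rating <= 3:        # completely gone, very much better, moderately better
        return "improved"
    if rating <= 6:        # a little better, about the same, a little worse
        return "stable"
    return "worse"         # moderately worse, very much worse


def change_score(previous, current):
    """Longitudinal change: current value minus previous value."""
    return current - previous


def half_sd(baseline_values):
    """Clinically meaningful change as half of the baseline SD (sample SD assumed)."""
    return 0.5 * stdev(baseline_values)


def standardized_difference(estimated_difference, meaningful_change):
    """Express a model-estimated difference in units of the meaningful change."""
    return estimated_difference / meaningful_change


print(collapse_anchor(4), change_score(previous=2, current=1))
```

Dividing by the clinically meaningful change places scales with different increments and ranges on a common footing, which is the purpose of the standardization described above.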
Cox regression models were used to examine correlations between onset of joint/fascia manifestations and subsequent overall and nonrelapse mortality, with onset of joint/fascia manifestations treated as a time-dependent covariate. The models were adjusted for study site, case type, months from transplantation to enrollment, platelet count, serum total bilirubin level, Karnofsky score, prednisone dosage, age at transplantation, HLA matching and donor relation, donor/recipient sex combination, conditioning intensity, history of grades II–IV acute GVHD, and classic or overlap subcategory. These covariates were chosen to control for known chronic GVHD mortality risk factors and potential outcome differences among study sites ([19-22]). Proportions of patients with joint response (improvement, stability, or worsening) across time after visits at which newly developed joint/fascia manifestations were recorded were graphically plotted. Newly developed joint/fascia manifestations were defined as an NIH joint/fascia score of ≥1 at enrollment for incident cases, and as the first visit with an NIH joint/fascia score of ≥1 without previous joint/fascia manifestations for prevalent cases.
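Treating onset as a time-dependent covariate is commonly handled by splitting each patient's followup into (start, stop] intervals so that time before onset is counted as unexposed. The sketch below shows only this data layout; it is not the authors' SAS or R code, the function name and tuple layout are hypothetical, and model fitting is left to whichever Cox routine accepts start/stop data.

```python
def split_followup(followup_months, onset_month, died):
    """Return (start, stop, joint_fascia, event) rows for one patient.

    onset_month is None when joint/fascia manifestations never developed.
    The event flag is attached only to the final interval.
    """
    # Never exposed during followup: a single unexposed interval.
    if onset_month is None or onset_month >= followup_months:
        return [(0.0, followup_months, 0, died)]
    # Manifestations already present at enrollment: exposed throughout.
    if onset_month <= 0:
        return [(0.0, followup_months, 1, died)]
    # Onset during followup: unexposed until onset, exposed afterwards.
    return [
        (0.0, onset_month, 0, False),
        (onset_month, followup_months, 1, died),
    ]


# Hypothetical patient: onset at 6 months, death at 24 months.
for row in split_followup(24.0, 6.0, True):
    print(row)
```

This layout avoids crediting survival time before onset to the exposed group, which is the bias a time-dependent covariate is meant to prevent.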
Medians and interquartile ranges (IQRs) were calculated for continuous variables, and numbers and percentages of patients were recorded for categorical variables. Statistical analyses were performed using SAS/STAT, version 9.3 and R, version 2.15.2.
Patient characteristics and presence of joint/fascia manifestations
A total of 567 participants at 10 sites were enrolled through December 31, 2011. Among patients who remained alive at the last followup, the median duration of followup after enrollment was 23.6 months (IQR 13.3–34.0). Table 2 shows characteristics of the 567 patients at the time of enrollment (baseline). Joint/fascia manifestations (NIH joint/fascia score of ≥1) were present at the time of study enrollment in 164 of the patients (29%).
Table 2. Characteristics of the patients grouped by presence or absence of joint/fascia manifestations at the time of enrollment*
Except where indicated otherwise, values are the number (%). IQR = interquartile range; NA = information not available; TBI = total-body irradiation; GVHD = graft-versus-host disease.
a By 2-sample t-test or chi-square test of independence.
Time from transplantation to enrollment, median (IQR) months
Age, median (IQR) years
Age <18 years
Stem cell source
Mobilized blood cells
Donor/patient sex combination
Female to male
HLA matching and donor relation
Myeloablative with high-dose TBI
Nonmyeloablative/reduced-intensity with low-dose TBI
Prior grade II–IV acute GVHD
Presence of joint/fascia manifestations at enrollment was associated with longer duration from transplantation to enrollment, prevalent GVHD cases, and the use of high-dose total-body irradiation conditioning. Other characteristics were similar between the group with and the group without joint/fascia manifestations at enrollment. Features of chronic GVHD were also compared between patients with and those without joint/fascia manifestations at enrollment (Table 3). In this context, joint/fascia manifestations were associated with more frequent skin involvement and skin sclerosis, less frequent mouth and liver involvement, higher NIH global severity score, higher symptom burden, and lower QOL as measured by the FACT-G and the SF-36 physical component score. SF-36 mental component score, maximum and adjusted HAP scores, and walk test and grip strength test results were similar between the two groups. Walk test results also did not differ between patients with limited ROM in the ankles at enrollment and those without joint/fascia manifestations (median 466 feet [IQR 400–536] versus 500 feet [IQR 410–575]; P = 0.08). Grip strength test results were lower among patients with limited ROM in the wrists/fingers at enrollment than among those without joint/fascia manifestations (median 51 pounds [IQR 42.7–75.3] versus 62.3 pounds [IQR 49.7–81]; P = 0.02).
Table 3. Features of chronic GVHD in the patients grouped by presence or absence of joint/fascia manifestations at the time of enrollment*
IQR = interquartile range (see Table 1 for other definitions).
a By 2-sample t-test or chi-square test of independence.
NIH joint/fascia score
Hopkins fascia score
P-ROM total score
Other site involvement
NIH global score
Lee muscle/joint subscale
Lee overall symptom score
10-point overall global rating
Physical function measures
Walk test, feet
Grip strength test, pounds
Among the 164 patients with joint/fascia manifestations at enrollment, 107 (65%) had mild joint/fascia manifestations according to the NIH joint/fascia score, 51 (31%) had moderate manifestations, and 6 (4%) had severe manifestations. Among the 98 patients with joint/fascia manifestations and available P-ROM data at enrollment, limitations in ROM were present in the wrists/fingers (64%), ankles (47%), shoulders (35%), and elbows (30%) (Figure 1). Limitations in ROM were most frequently mild in all joints, according to the P-ROM score (i.e., score of 6 or 5 for shoulders, elbows, and wrists/fingers, and score of 3 for ankles), and limitations were present in multiple joints for 72% of the patients with limited ROM in at least 1 joint. The median and mean ± SD P-ROM total scores at enrollment were 25 (IQR 24–25) and 23.9 ± 2.1, respectively.
Difference in longitudinal changes in measurement scores according to perceived changes at followup visits
Changes in joint status were examined for 652 paired visits when joint/fascia manifestations were documented at the previous or current visit. Joint status at the later visit was rated by clinicians as improved in 44% of the visits, stable in 51%, and worse in 5%, and by patients as improved, stable, and worse in 45%, 44%, and 11% of the visits, respectively. Agreement between clinicians and patients was moderate (weighted κ = 0.32).
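For readers unfamiliar with the weighted κ statistic, the following sketch computes it from a square cross-tabulation of clinician ratings (rows) against patient ratings (columns) over the ordered categories improved, stable, and worse. Linear disagreement weights are an assumption, since the weighting scheme is not specified in the text, and the example counts are invented for illustration and are not study data.

```python
def weighted_kappa(table):
    """Linearly weighted kappa for a k x k table of ordered categories."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # 0 on the diagonal, 1 at the extremes
            observed += weight * table[i][j] / n
            expected += weight * row_totals[i] * col_totals[j] / (n * n)
    return 1.0 - observed / expected


# Hypothetical counts (rows: clinician; columns: patient).
example = [
    [20, 5, 0],
    [5, 20, 5],
    [0, 5, 20],
]
print(round(weighted_kappa(example), 2))
```

A value of 1 indicates perfect agreement and a value of 0 indicates agreement no better than chance; weighting penalizes an improved-versus-worse disagreement more heavily than a disagreement between adjacent categories.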
Estimated differences in longitudinal changes in measures between improvement and stability or between worsening and stability for the 3 joint/fascia scales are shown in Figure 2A. The estimated difference in linear mixed models indicates the average difference in scores for the group of visits associated with perceived improvement or perceived worsening as compared to the group of visits associated with perceived stability (for details, see Supplementary Figure 2, on the Arthritis & Rheumatology web site at http://onlinelibrary.wiley.com/doi/10.1002/art.38293/abstract). For example, Figure 2A shows that the NIH joint/fascia score improved by an estimated average of 0.41 points (95% confidence interval 0.28–0.55, P < 0.001) when clinicians perceived improvement versus stability.
Among the 3 joint/fascia scales, changes in the NIH joint/fascia score and the Hopkins fascia score correlated with both clinician- and patient-perceived joint improvement, whereas changes in the P-ROM total score correlated with clinician-perceived improvement but not with patient-perceived improvement (Figure 2A). Estimated differences between clinician-perceived improvement and stability were larger for the NIH score than for the Hopkins score. This indicates that the NIH joint/fascia scale is more sensitive to clinician-perceived improvement than is the Hopkins fascia scale. With regard to patient perception, estimated differences were similar between the NIH score and Hopkins score. In comparing worsening versus stability, changes in all 3 of the joint/fascia scales correlated with both clinician- and patient-perceived joint worsening (Figure 2A). Among the 3 scales, estimated differences between worsening and stability were significantly larger for the P-ROM total scale than for the other 2 scales, by both clinician and patient perception. Thus, the P-ROM scale is most sensitive to worsening. The NIH joint/fascia score might have had an advantage in demonstrating change since this score was used to select visit pairs, but results were similar even if the P-ROM score was used to select visit pairs for analysis (data not shown).
Estimated standardized differences in scores for other scales are shown in Figure 2B. Changes in only the SF-36 physical component score correlated with both clinician- and patient-perceived joint improvement. In comparing worsening versus stability, changes in all 3 symptom scores and in FACT-G scores correlated with both clinician- and patient-perceived joint worsening. Changes in HAP scores correlated with worsening as perceived by clinicians but not by patients, and did not correlate with perceptions of improvement by either clinicians or patients. Changes in walk test or grip strength test results were not significantly associated with clinician- or patient-perceived changes in the joints, and the results were similar even when the analysis was limited to patients with limited ROM in the ankles or wrists.
Longitudinal response assessment according to the NIH joint/fascia scale and P-ROM scale
Seventy-seven percent of the patients in our study cohort with “new” joint/fascia manifestations (108 of 140) had subsequent visits at 3 months and/or 6 months. Joint response at these visits according to the NIH joint/fascia and P-ROM scales is shown in Figure 3. Analysis beyond 6 months was not possible because more than half of the data were missing. Among incident cases (Figures 3A and B), there was little difference between the 3-month and 6-month visits in the proportions of patients categorized as having joint improvement, stability, and worsening according to both scales, suggesting that the changes were evident by 3 months after onset of joint/fascia manifestations. The proportion of patients with improvement among incident cases was ∼10% lower according to the P-ROM scale compared to the NIH scale, while the proportion with worsening was 10–15% higher according to the P-ROM scale compared to the NIH scale. This trend was more apparent among prevalent cases than among incident cases. Compared with incident cases, improvement was less frequent among prevalent cases, while worsening was more frequent (Figures 3A–D).
Association of joint/fascia manifestations with survival outcomes
In multivariable time-dependent Cox models, joint/fascia manifestations (NIH joint/fascia score ≥1) at any time were not associated with subsequent overall mortality or nonrelapse mortality (data not shown). Results were similar when only moderate or severe joint/fascia manifestations (NIH joint/fascia score ≥2) were considered. The number of patients with severe manifestations was not sufficient for a separate analysis of this group.
Our results showed that joint/fascia manifestations were present at enrollment in 29% of patients with chronic GVHD. Although the cohort did not include all consecutive patients at each participating center and therefore a selection bias may have been present, we believe these findings highlight the importance of careful examination of the joints and fasciae in this population. Based on our data, it is particularly important to provide education about potential joint/fascia manifestations among patients who are >1 year posttransplantation, those who received high-dose total-body irradiation conditioning, and those who had skin involvement or sclerosis with GVHD ([23, 24]).
The NIH joint/fascia score was originally intended to evaluate the severity of GVHD manifestations in joints and fasciae for baseline or cross-sectional use (), but our results suggest that longitudinal changes in the NIH joint/fascia score between visits could be used for evaluating response. Recent studies demonstrated similar utility of longitudinal changes in the NIH organ score for measuring response in the skin and eyes ([25, 26]). Changes in the Hopkins fascia score also correlated with clinician- and patient-perceived improvement and worsening, but estimated differences were smaller with the Hopkins fascia score than with the NIH joint/fascia score. This indicates that the Hopkins fascia scale has a lower sensitivity, which may be explained by differences in the information it captures. The NIH joint/fascia score incorporates all 3 domains of tightness, ROM, and ADL, whereas the Hopkins fascia score addresses only tightness. Thus, the Hopkins score need not be collected if the NIH score is collected.
The greatest merit of the P-ROM scale is its objectivity and simplicity (). The NIH consensus group recommended active-assisted ROM as a useful objective measure of joint response, but the need for an adequately trained professional who can conduct ROM measurements in a standardized and reproducible manner is a major limitation of this assessment technique (). Therefore, the P-ROM scale was developed as an alternative for clinical use since any provider, including a family physician, can complete the assessment in 1–2 minutes. Although we found that the P-ROM scale was the most sensitive to perceived joint worsening among all scales, it was insensitive to patient-perceived joint improvement, perhaps because it does not capture information on tightness or ADL. Patients often report improvement in tightness before they or their clinicians observe improvement in ROM, which tends to occur more slowly. Such subtle changes may be more readily apparent to patients than to clinicians. One consideration for the future would be to increase the sensitivity of the P-ROM by incorporating a tightness component in this scale.
Changes in scores on symptom scales did not correlate with clinician-perceived improvement. Symptom information must be obtained from patients, and patients' perceptions are often discordant with clinicians' assessments. In this context, the Lee muscle/joint symptom subscale is useful for capturing changes in joint-specific symptoms. Similarly, either the Lee overall symptom scale or the 10-point global rating scale is useful for capturing changes in overall symptoms.
The FACT-G was sensitive to worsening but not to improvement, while the converse was true of the SF-36 physical component score, suggesting that neither scale was perfectly sufficient to capture changes in QOL associated with joint response. We did not observe a correlation of changes in scores on activity or physical function scales with joint response. These scales may lack either sufficient sensitivity or sufficient relevance to enable detection of changes in joint status. Nonarticular manifestations of GVHD may have more impact on these measures.
The onset of joint/fascia manifestations was not associated with subsequent mortality outcomes, supporting our understanding that issues related to disability and morbidity are more important factors in these patients. This result is consistent with the finding in another study that transplant outcomes, except for prolonged duration of immunosuppressive treatment, were similar in patients with chronic GVHD who had sclerotic manifestations and those who did not have sclerotic manifestations ().
The present study has some limitations. First, the study population consisted mostly of adults who received mobilized blood cell grafts. The results may not apply to children or those who received transplantation from other stem cell sources. Second, the scales used may not reflect symptoms associated with arthralgia or arthritis. Arthralgia is sometimes observed, but is often difficult to document and not captured. In contrast, arthritis with destruction occurs rarely, although true incidence data are lacking. Future studies should elucidate the frequency, presentation, and significance of these manifestations. Finally, we were unable to evaluate treatment effect for joint/fascia manifestations since immunosuppressive or physical therapies were not mandated in this observational study. Future prospective interventional studies could address this question using the validated scales.
To our knowledge, this study represents the first attempt to validate scales for assessing joint/fascia manifestations in patients with chronic GVHD. Our results support use of the NIH joint/fascia scale and P-ROM scale. The NIH scale better captures improvement, while the P-ROM scale better captures worsening. Our longitudinal assessments demonstrated that joint improvement was evident by 3 months after the onset of joint/fascia manifestations, and that significant proportions of patients experienced worsening in ROM within 6 months if joint/fascia manifestations developed >3 months after diagnosis of chronic GVHD. The utility of these scales could also be tested in the rheumatic diseases.
All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be published. Dr. Carpenter had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study conception and design. Inamoto, Kurland, Lee, Carpenter.