To develop adapted versions of the Sharp/van der Heijde radiographic scoring system for use in juvenile idiopathic arthritis (JIA), and to investigate their validity in JIA patients with polyarticular disease.
The study group comprised 177 patients with polyarticular JIA. Radiographs of the wrist/hand of each patient were obtained at baseline (first observation) and then at 1, 3, 5, 7/8, and 10 years and were assessed independently by 2 pediatric rheumatologists according to different adaptations of the Sharp/van der Heijde method. To facilitate score assignment, the radiograph for each patient was compared with a bone age–related standard. Validation procedures included analysis of reliability, construct validity, and score progression over time.
Interobserver and intraobserver agreement on longitudinal score values and score changes was good for all of the adapted scoring versions (intraclass correlation coefficient >0.85). Score changes over time were moderately to strongly correlated with the clinical indicators of long-term joint damage and with the amount of long-term radiographic damage as measured with the carpo:metacarpal ratio, thereby demonstrating good construct validity. A steady increase in scores over time was observed, with joint space narrowing being the most common form of damage throughout the disease course. The inclusion of 5 new areas appeared to increase the overall construct validity of erosion scores.
Our results show that the adapted versions of the Sharp/van der Heijde score are reliable and valid for the assessment of radiographic progression in patients with JIA.
Juvenile idiopathic arthritis (JIA) is a chronic and heterogeneous disease characterized by prolonged synovial inflammation that may lead to destructive lesions of joint structures (1). Because the prevention or retardation of joint changes is a major objective of treatment of chronic arthritis, evaluation of radiographic joint damage has become an important tool for assessing disease severity and progression in patients with JIA. The assessment of structural joint damage is considered the gold standard of treatment efficacy studies in patients with chronic arthritis (2) and is now required by the US Food and Drug Administration to be used as a measure of disease progression in clinical trials of potential disease-modifying drugs (3). The evaluation of radiographic progression has never been included in controlled trials in JIA, reflecting primarily the paucity of established radiographic scoring systems for use in the pediatric age group. However, because new potent therapeutic agents are now available for children with JIA (4), there is a growing need for a reliable radiographic assessment standard to investigate thoroughly the effectiveness of these new agents.
In recent years, there has been a great deal of effort to devise new radiographic scoring systems or validate existing methods for use in JIA (5–13). However, only a few of these measures have undergone a detailed validation process or have been tested in sufficient numbers of patients. It is commonly believed that the traditional scoring methods used for adult patients with rheumatoid arthritis, which are based on the assessment of joint space narrowing (JSN) and erosions, may not be suitable for the evaluation of pediatric patients with joint diseases. In contrast to the situation in adults, it is difficult to reliably determine cartilage loss and erosions in children by simple examination of radiographs, because growing joints change anatomically over time (14, 15). In a recent pilot study, however, we observed that the Sharp and Larsen scoring methods are potentially reliable and valid for the assessment of radiographic progression in children with JIA (16).
In the present study, we describe the development of adapted versions of the Sharp/van der Heijde radiographic scoring method (17) for use in JIA and provide preliminary evidence of their validity in a large sample of children with polyarticular disease.
Because we were principally interested in examining the value of the scoring system for assessing radiographic progression, we chose a longitudinal design and aimed at analyzing continuous data. We reviewed the radiology records of patients seen at the study units from January 1986 to December 2004, to identify those who had a diagnosis of JIA according to the International League of Associations for Rheumatology revised criteria (18) and polyarthritis with wrist and/or hand joint involvement, and who had ≥2 standard radiographs of both wrists and hands in the posteroanterior view, one that was obtained at baseline (first observation) and one or more that were obtained at years 1, 3, 5, 7 or 8, and 10.
All films were scored using the wrist/hand component of the Sharp/van der Heijde method (i.e., feet were excluded). This method applies to 15 areas for JSN and 16 areas for erosion in each hand and wrist (17, 19). Scores for JSN and erosion in each area range from 0 to 4 and from 0 to 5, respectively. The total Sharp/van der Heijde score is calculated as the sum of the scores for JSN (range 0–120) and erosion (range 0–160) and ranges from 0 to 280. These scores will be referred to as “original” Sharp/van der Heijde scores.
We previously observed that JSN was more common than erosive changes in patients with JIA (16), and that erosions were frequent in some wrist areas not included in the adult scores (Ravelli A, et al: unpublished observations). Furthermore, erosive damage has been shown to be a better predictor than JSN of the long-term outcome of chronic arthritis (20, 21). For these reasons, we devised a modified version of the Sharp/van der Heijde erosion score, which included 5 additional areas in each wrist: the second, third, and fourth metacarpal bases, the capitate bone, and the hamate bone. Values for this modified erosion score range from 0 to 210. The sum of the JSN and modified erosion scores yields the total modified Sharp/van der Heijde score, which ranges from 0 to 330. The areas included in the original and modified radiographic scores are shown in Figure 1.
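The score ranges quoted above follow directly from the number of areas, the per-area maximum, and the fact that both hands and wrists are scored. The short sketch below reproduces that arithmetic; the constant and function names are illustrative and are not part of the published method.

```python
# Areas per side (one hand and wrist) and per-area maxima, as stated in the text.
JSN_AREAS = 15                # areas scored for joint space narrowing
EROSION_AREAS_ORIGINAL = 16   # areas scored for erosion in the original method
EXTRA_EROSION_AREAS = 5       # wrist areas added in the modified erosion score
JSN_MAX_PER_AREA = 4          # JSN is graded 0-4 in each area
EROSION_MAX_PER_AREA = 5      # erosion is graded 0-5 in each area
SIDES = 2                     # both hands and wrists are scored


def max_score(areas_per_side: int, max_per_area: int, sides: int = SIDES) -> int:
    """Upper bound of a score summed over all areas on both sides."""
    return areas_per_side * max_per_area * sides


jsn_max = max_score(JSN_AREAS, JSN_MAX_PER_AREA)
erosion_max_original = max_score(EROSION_AREAS_ORIGINAL, EROSION_MAX_PER_AREA)
erosion_max_modified = max_score(
    EROSION_AREAS_ORIGINAL + EXTRA_EROSION_AREAS, EROSION_MAX_PER_AREA
)
total_original_max = jsn_max + erosion_max_original
total_modified_max = jsn_max + erosion_max_modified
```

With these inputs, the computed maxima match the ranges given in the text (120 for JSN, 160 and 210 for the original and modified erosion scores, and 280 and 330 for the corresponding totals).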
In younger children (generally boys with a bone age <5 years and girls with a bone age <6 years), some of the wrist areas were not assessable due to incomplete ossification of the carpal bones. In such cases, each area that was not assessable was assigned the average score of assessable areas, rounded to the closest integer.
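The imputation rule for non-assessable areas can be sketched as follows. This is a minimal illustration under stated assumptions: the text does not specify which set of assessable areas is averaged (for example, the same wrist or the whole film) or how exact halves are rounded, so the caller supplies the set of areas, and halves are rounded up here. The function name is hypothetical.

```python
import math
from typing import List, Optional, Sequence


def impute_unassessable(scores: Sequence[Optional[int]]) -> List[int]:
    """Replace each non-assessable area (None) with the rounded mean of assessable areas.

    Assumption: `scores` holds the areas over which the average should be taken;
    the source text does not state the exact pooling scope.
    """
    assessable = [s for s in scores if s is not None]
    if not assessable:
        raise ValueError("no assessable areas to average")
    mean_score = sum(assessable) / len(assessable)
    # Round to the closest integer; ties (x.5) round up, which is an assumption.
    fill_value = math.floor(mean_score + 0.5)
    return [fill_value if s is None else s for s in scores]


# Example: two wrist areas are not yet ossified and therefore not assessable.
example = impute_unassessable([2, None, 3, 1, None])
```

Python's built-in `round` resolves ties to the nearest even integer, which is why the sketch uses `math.floor(x + 0.5)` to make the tie rule explicit.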
As noted previously (16), in younger children the changes in carpal bones and, to a lesser extent, in distal metacarpal epiphyses, were frequently seen as deformity in shape, from squaring to squeezing to gross deformity, rather than as discrete erosions. Although bone deformity and erosion are presumably caused by different pathogenetic mechanisms, for practical reasons they were considered as equivalent, and the severity of bone deformity was graded in the Sharp/van der Heijde erosion score, on the same 0–5-point severity scale.
Because in childhood the degree of ossification and the width of joint spaces vary with age, the evaluation of time progression of JSN and bony erosion in an individual patient is difficult; the same applies to comparison of films from patients of different ages. To facilitate assignment of Sharp/van der Heijde scores, we compared each study patient's radiograph with a wrist/hand radiograph from a healthy child with the same bone age. Radiographs obtained from healthy boys and girls of all bone ages, ranging from 1.5 to 16 years according to the atlas of Greulich and Pyle (22), were identified by reviewing a large sample of radiographs in the study units. All children had undergone a bone age evaluation for short stature and were found to have a constitutional growth delay without endocrinologic abnormalities or had a radiograph (disclosing no abnormalities) that was obtained after wrist/hand trauma.
Two observers (MI and XN) independently assigned the original and modified Sharp/van der Heijde scores to all study radiographs. Radiographs from each patient were read in sequential order, and previous radiographs and scores were available to observers when examining and scoring followup radiographs. Both observers are pediatric rheumatologists with >5 years of clinical experience in the field, but they were not familiar with radiographic scoring. Before the beginning of the study, the observers had a training session with the principal investigator (AR), a pediatric rheumatologist with ∼20 years of clinical experience and familiarity with radiographic scoring, in order to gain experience with the Sharp/van der Heijde method.
Interobserver reliability of each scoring method was assessed for all of the films read by the 2 observers. Intraobserver reliability was based on the scores of radiographs obtained from a subset of 39 randomly selected patients, whose films were read a second time in a blinded manner by the 2 observers (81 films from 20 patients for observer MI and 71 films from 19 patients for observer XN), 3 months after the previous review.
Patient information included age at disease onset, sex, disease duration at baseline and at last followup visit, JIA subtype, and therapy with second-line medications and systemic corticosteroids throughout the study period. The following clinical assessments made at the last followup visits were recorded: physician's global assessment of overall disease activity, as measured on a 10-cm visual analog scale (0 = no activity and 10 = maximum activity); count of joints with swelling, pain on motion/tenderness, restricted motion, and active disease (23); assessment of functional ability, using the Italian version of the Childhood Health Assessment Questionnaire (C-HAQ; 0 = best and 3 = worst) (24, 25); Steinbrocker functional class (26); erythrocyte sedimentation rate (ESR) (Westergren method); C-reactive protein (CRP) level (as determined by nephelometry); the Juvenile Arthritis Damage Index, Articular (JADI-A) score (27); and the Poznanski score for radiographic damage (28). Briefly, the JADI-A assessed 36 joints or joint groups for the presence of damage, and the damage observed in each joint was scored on a 3-point scale (0 = no damage, 1 = partial damage, and 2 = severe damage, ankylosis, or prosthesis). The maximum total score is 72. The Poznanski score is a measure of the carpo:metacarpal ratio (29) and reflects the amount of radiographic damage in the wrist. Poznanski scores that are more negative represent more severe radiographic damage.
Validation procedures were primarily based on the analysis of reliability, construct validity, and score progression over time. Interobserver agreement and intraobserver agreement for the Sharp/van der Heijde scores were analyzed by computing the intraclass correlation coefficient (ICC) (30) for both longitudinal score values and score changes between study time points. For interpretation of the ICC values, the following classification was used: <0.4 = poor agreement, ≥0.4–<0.75 = moderate agreement, and ≥0.75 = good agreement (31). To visualize observer agreement, we plotted the scoring values (both absolute and changes) using the method described by Bland and Altman (32). The independent scores determined by the 2 observers for each radiograph were then averaged, and this average was used for the analyses.
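As an illustration of the agreement analysis described above, the sketch below classifies an ICC value with the stated cutoffs and computes the Bland and Altman average difference and 95% limits of agreement. It assumes the conventional definition of the limits (mean difference ± 1.96 standard deviations of the paired differences); the ICC itself is not computed, because the text does not state which ICC form was used. Function names are illustrative.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def classify_icc(icc: float) -> str:
    """Interpret an ICC with the cutoffs given in the text."""
    if icc < 0.4:
        return "poor"
    if icc < 0.75:
        return "moderate"
    return "good"


def bland_altman(
    scores_a: Sequence[float], scores_b: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (average difference, lower limit, upper limit) for paired scores.

    Assumes the conventional 95% limits: mean difference +/- 1.96 * SD
    of the paired differences (sample SD).
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def average_of_observers(score_a: float, score_b: float) -> float:
    """Average of the two observers' scores, as used in the later analyses."""
    return (score_a + score_b) / 2


# Hypothetical scores from two observers for three films (not study data).
bias, lower, upper = bland_altman([10, 12, 14], [9, 12, 15])
```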
Construct validity is a form of validation that examines whether the construct in question, in this case the Sharp/van der Heijde score, is related to other measures in a manner consistent with a priori prediction. Given that the Sharp/van der Heijde score is a measure of structural joint damage, it was predicted that the correlations of baseline Sharp/van der Heijde scores and score changes over time with the values at the last followup visit for the count of joints with restricted motion, Steinbrocker functional class, the JADI-A, and the Poznanski score, which measure closely related constructs, would be in the moderate-to-high range. Correlations with disease activity parameters at the final visit, such as the physician's global assessment of disease activity, the count of swollen and tender joints, the ESR, and the CRP level, were predicted to be poor. No predictions were made for the correlation with the C-HAQ, because this measure was found to reflect both disease activity and damage in all stages of JIA (33). The level of radiographic damage at baseline and the rate of radiographic progression over time were expected to be associated with the amount of long-term radiographic damage. All correlations were assessed using Spearman's rank correlation coefficient. For the purpose of this analysis, correlations >0.7 were considered high, correlations ranging from 0.4 to 0.7 were considered moderate, and correlations <0.4 were considered low (34). Agreement between predicted and observed correlations was taken as evidence of construct validity.
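For readers who wish to reproduce this type of analysis, the following sketch computes Spearman's rank correlation (as the Pearson correlation of average ranks) and applies the interpretation cutoffs stated above. It is an illustration only, not the software used in the study. Applying the cutoffs to the absolute value of the coefficient is an assumption, made so that negative correlations (such as those with the Poznanski score) are graded by their strength.

```python
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes both inputs have nonzero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def classify_correlation(rho: float) -> str:
    """Grade a correlation by strength: >0.7 high, 0.4-0.7 moderate, <0.4 low."""
    strength = abs(rho)  # assumption: sign is ignored when grading strength
    if strength > 0.7:
        return "high"
    if strength >= 0.4:
        return "moderate"
    return "low"
```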
The time course of radiographic scores was assessed by calculating the absolute value at each study time point and the change between time points, and was described by plotting the median score values and changes over time and by using the cumulative probability plot method (35). Because the range of scores differs according to various versions of the Sharp/van der Heijde score, we compared the grading of joint damage for each score by normalizing each score by its possible range, according to the following formula: (observed value − minimum value)/possible range × 100. Likewise, the comparison of score changes between study time points was made after normalization of each observed change by the maximum possible change. Because the study sample comprised JIA patients with polyarthritis, who represent the subset of patients with the most severe form of JIA, an average steady increase in scores over time was expected. Statistical analysis was performed with Statistica (StatSoft, Tulsa, OK).
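The normalization formula can be written as a small function, shown here as a sketch rather than as the code used in the study. Expressing the normalized change as a percentage of the maximum possible change (the full score range) is an assumption; the text states only that each change was normalized by the maximum possible change. Function names are illustrative.

```python
def normalize_score(observed: float, minimum: float, maximum: float) -> float:
    """(observed value - minimum value) / possible range * 100."""
    possible_range = maximum - minimum
    if possible_range <= 0:
        raise ValueError("maximum must be greater than minimum")
    return (observed - minimum) / possible_range * 100


def normalize_change(change: float, minimum: float, maximum: float) -> float:
    """Observed change as a percentage of the maximum possible change.

    Assumption: the maximum possible change equals the full score range.
    """
    possible_range = maximum - minimum
    if possible_range <= 0:
        raise ValueError("maximum must be greater than minimum")
    return change / possible_range * 100


# Hypothetical values: a JSN score of 60 on the 0-120 scale and a total
# modified score of 33 on the 0-330 scale.
jsn_normalized = normalize_score(60, 0, 120)
total_modified_normalized = normalize_score(33, 0, 330)
```

After normalization, scores with different ranges share a common 0–100 scale, which is what allows the JSN, erosion, and total scores to be compared directly.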
A total of 180 patients who were eligible for the study were identified. Three patients were excluded because they had only a baseline radiograph available for review. Of the 177 patients included (57 boys and 120 girls), 55 had systemic arthritis, 55 had polyarthritis (8 were rheumatoid factor positive), 52 had extended oligoarthritis, 9 had psoriatic arthritis, and 6 had undifferentiated arthritis. The mean age at disease onset was 3.7 years (range 0.3–15.7 years), and the mean disease duration at baseline was 1.4 years (range 0.6–6.5 years). During the study period, 170 patients (96%) had received ≥1 second-line drugs, and 96 patients (54%) had received systemic corticosteroids. At 1 year, 3 years, 5 years, 7/8 years, and 10 years, 147 patients (83%), 103 patients (58%), 75 patients (42%), 44 patients (25%), and 22 patients (12%) had radiographs available; a total of 568 radiographs were available for study.
The interobserver agreement and intraobserver agreement (as assessed by the ICC) for the radiographic scores, either absolute values or score changes, were good, with 90% of the ICCs >0.9 and the remaining ICCs >0.85 (data not shown). The ICCs for radiographs examined separately at single time points and for all radiographs combined (and for the original and modified scores) were comparable. Table 1 shows the results obtained for the total original and modified scores, using the Bland and Altman method. The 95% limits of agreement tended to be slightly wider for the modified scores, which included more areas for the scoring of erosion. Overall, the 95% limits of agreement for absolute score values and score changes were comparable. Assessment of intraobserver reliability revealed better results, in terms of both the average difference and 95% limits of agreement, for observer 1 than for observer 2.
| ||Score range||Average difference||95% limits of agreement|
|All radiographs (n = 568)|
|Total original score||0–280||1.2||−16.1, 18.6|
|Total modified score||0–330||0.4||−19.4, 20.2|
|Baseline radiographs (n = 177)|
|Total original score||0–280||−0.5||−11.8, 10.8|
|Total modified score||0–330||−0.8||−14.1, 12.4|
|1-year radiographs (n = 147)|
|Total original score||0–280||0.6||−14.7, 16.0|
|Total modified score||0–330||0.1||−17.6, 17.9|
|5-year radiographs (n = 75)|
|Total original score||0–280||2.9||−13.1, 19.0|
|Total modified score||0–330||1.3||−15.8, 18.3|
|Baseline to 1 year (n = 145)|
|Total original score||0–280||1.3||−9.6, 12.2|
|Total modified score||0–330||−1.9||−20.1, 16.2|
|Baseline to 5 years (n = 75)|
|Total original score||0–280||3.9||−13.6, 21.3|
|Total modified score||0–330||2.7||−15.1, 20.4|
|Observer 1 (n = 81)|
|Total original score||0–280||−0.1||−9.8, 9.6|
|Total modified score||0–330||−0.2||−9.5, 9.0|
|Observer 2 (n = 71)|
|Total original score||0–280||3.8||−14.3, 22.0|
|Total modified score||0–330||1.9||−15.1, 19.0|
Table 2 presents Spearman's correlations between the baseline radiographic score values, the change in radiographic scores between baseline and 1 year, the change in radiographic scores between baseline and 5 years, and the clinical measures of JIA severity at the last followup visit, ≥5 years (range 5–21 years) after the baseline visit. This analysis involved 96 patients who were followed up for ≥5 years and for whom followup clinical data were available. The radiography scores for these patients were comparable with those for the 81 patients who could not be included because of a followup period <5 years or a lack of clinical information (data not shown). Radiography score changes in the first year and in the first 5 years, but not the baseline score values, were moderately correlated, as predicted, with the clinical indicators of disease damage, such as the number of joints with restricted motion, the JADI-A, the Steinbrocker functional class, and the Poznanski score. Also as predicted, radiographic damage was poorly correlated with disease activity measures such as the physician's global assessment of disease activity, the swollen and tender joint counts, and the laboratory indicators of inflammation. Unexpectedly, all correlations for the functional ability tool (the C-HAQ) were in the poor range. Looking at the correlations concerning change in radiography scores between baseline and 1 year, which are the most meaningful clinically, the modified erosion score demonstrated overall better results than the original erosion score.
| ||JIA severity measure|
| ||Physician's global assessment||No. of swollen joints||No. of tender joints||No. of joints with restricted motion||No. of actively involved joints||C-HAQ||JADI-A||Steinbrocker class||Poznanski score||ESR||CRP|
|Baseline|
|Total original score||−0.06||−0.08||−0.04||0.16||−0.07||−0.19||0.21||0.21||−0.16||−0.22||−0.22|
|Total modified score||−0.12||−0.12||−0.09||0.10||−0.10||−0.18||0.22||0.26||−0.15||−0.28||−0.27|
|Change 0–1 year|
|Total original score||0.18||0.25||0.24||0.35||0.23||0.09||0.53†||0.48†||−0.46†||0.09||−0.01|
|Total modified score||0.18||0.25||0.24||0.40†||0.25||0.09||0.53†||0.51†||−0.48†||0.11||0.02|
|Change 0–5 years|
|Total original score||0.13||0.24||0.35||0.59†||0.38||0.30||0.60†||0.48†||−0.55†||0.37||0.21|
|Total modified score||0.17||0.25||0.35||0.56†||0.38||0.28||0.57†||0.46†||−0.51†||0.39||0.25|
Spearman's correlations between the baseline radiography score values, the changes from baseline to 1 year and the changes from baseline to 5 years, and the absolute radiography score values at 5 years are shown in Table 3. As expected, the level of correlation increased progressively, revealing that the amount of radiographic damage at 5 years was predicted poorly by the level of radiographic damage at baseline, moderately by the level of radiographic progression during the first year, and strongly by the level of radiographic progression during the first 5 years. Looking again at the correlations concerning the changes in radiography scores between baseline and 1 year, there was a tendency for the modified scores to yield better correlations than did the original scores. Furthermore, the modified erosion scores, but not the original erosion scores, yielded overall better correlations compared with the JSN scores.
| ||Total original score at 5 years||Total modified score at 5 years|
|Baseline|
|Original erosion score||0.27||0.27|
|Modified erosion score||0.35||0.37|
|Total original score||0.38||0.39|
|Total modified score||0.42†||0.40†|
|Change, baseline to 1 year|
|Original erosion score||0.49†||0.50†|
|Modified erosion score||0.60†||0.62†|
|Total original score||0.61†||0.61†|
|Total modified score||0.66†||0.66†|
|Change, baseline to 5 years‡|
|Original erosion score||0.79||0.81|
|Modified erosion score||0.80||0.84|
|Total original score||0.89||0.88|
|Total modified score||0.88||0.89|
The amount of radiographic damage at baseline was comparable across the different JIA subtypes (data not shown). Plotting of the median normalized values of absolute score values at different study time points showed a steady increase in scores during the study period (Figure 2). The JSN scores increased more rapidly and remained consistently higher over time than both the original and modified erosion scores, suggesting that cartilage loss remained the most common and severe form of radiographic damage throughout the entire disease course. Modified erosion scores offered a slight advantage over original erosion scores in capturing erosive changes over time. Analysis of the median normalized change in scores over time confirmed that the rate of JSN change was constantly greater than that of erosive change (data not shown). Figure 3 depicts the cumulative probability plot of individual baseline-to–1 year changes in total original and modified scores. Five percent to 15% of the patients, depending on the study interval, showed improvement (negative change) in radiography scores over time.
The course and prognosis of JIA are highly variable: some patients experience a benign course and recover fully, whereas others have unremitting illness and carry a significant risk of joint destruction and permanent disability (36, 37). Although the outcome of JIA is generally unpredictable, especially early in the disease course (38), it is well known that patients with polyarticular arthritis are those in whom progressive destructive disease is more likely to develop (36, 37, 39). Furthermore, a higher than expected percentage of these patients have been found to have JSN and erosions early in their illness (40, 41). The presence of polyarthritis is a prerequisite for a patient's inclusion in controlled trials of second-line or biologic agents (42–44).
JIA patients with polyarthritis and wrist disease are at high risk of experiencing radiographic progression (10, 45). As many as 85% of the 633 patients enrolled in a controlled trial aimed at comparing intermediate and high doses of methotrexate in polyarticular JIA (44) had active disease in the wrist and/or hand joints (Ruperto N, et al: unpublished observations). Thus, the wrist and hand joints represent optimal sites at which to investigate radiographic progression in patients with polyarticular JIA.
We observed that the Sharp/van der Heijde score is a reliable and valid method for assessing radiographic progression in children with chronic polyarthritis. The study sample represents 30–40% of the entire population of JIA patients seen by the authors during the study period. Furthermore, the sample includes the majority of the most severe cases, as shown by the fact that as many as 96% of the patients had received second-line medications, and roughly half had received systemic corticosteroids.
We chose to investigate the Sharp/van der Heijde scoring method, because we believed that its manner of grading bony erosion, which not only is based on the count of the number of erosions but also takes into account their size in relation to bone surface, is particularly suited for application in pediatric patients. The bone size in children changes with skeletal maturation. Furthermore, in younger children with JIA, the changes in carpal bones and, to a lesser extent, in distal metacarpal epiphyses, are seen most frequently as deformity in shape rather than as discrete erosions (16). This phenomenon is unique to JIA and is likely attributable to a combination of growth abnormalities, ossification of previous cartilage injury, and true bony erosions (46–48). We excluded the foot component of the Sharp/van der Heijde score, because foot joints are rarely involved in JIA.
The assessment of JSN and erosions in growing children is challenging, owing to the changes in the morphology of bones and joints during skeletal maturation. To overcome this problem, we compared each patient's radiograph with a wrist/hand radiograph obtained in a healthy child of the same sex and bone age. We chose bone age–related instead of age-related or size-related standards, because patients with JIA frequently have advanced skeletal maturation (45, 46) and are small for their age (with their bones being correspondingly small), making these standards unreliable.
In the investigational setting chosen, the radiography scores under study proved to be reliable. Interobserver and intraobserver agreement, as assessed by the ICC, were good for both absolute values and score changes. The overall good concordance among observers was confirmed using the Bland and Altman method and the cumulative probability plots method.
The results of construct validity analysis were consistent with our expectations. Radiography score changes over time were moderately correlated with the clinical indicators of long-term joint damage and were poorly correlated with clinical measures of disease activity at the last followup visit. Similarly, radiographic progression over time was highly correlated with the amount of long-term radiographic damage. Evaluation of the rate of radiographic progression over time confirmed our previous observation (16) that JSN, which captures cartilage resorption, is the most common form of damage throughout the disease course in children with JIA. A proportion of patients (5–15%, depending on the study interval) experienced improvement in radiography scores over time, which may reflect both the effectiveness of antirheumatic therapies and the distinctive regenerative capacity of articular cartilage in growing children (39, 46).
Of the different radiography scoring methods tested, the modified versions (which included more areas at which erosion can be scored) appeared to be overall advantageous compared with the original versions. The modified scores revealed slightly wider 95% limits of agreement on the Bland and Altman analysis of interobserver reliability and were only modestly superior to the original scores in capturing radiographic progression over time. However, the modified scores revealed better construct validity in the clinically most meaningful subset of correlations between radiographic change from baseline to 1 year and long-term clinical and radiographic damage. Furthermore, 4 of the 5 new erosion areas included in the modified score were shown to be the most frequent sites of erosive changes within the entire hand and wrist areas (data not shown).
Our study should be viewed in the context of certain limitations. We chose a longitudinal design, because our aim was to examine the reliability of scores in the assessment of radiographic progression. The reading of serial radiographs may have facilitated concordance among readers, whereas agreement on scoring of cross-sectional films might have been more difficult to achieve. Readers examined the radiographs in chronological sequence and were allowed to see the previous scores. There is no definite consensus regarding whether readers should be aware of the time order of radiographs (49). However, blinding of films to chronological order in children is impossible due to readily apparent growth and maturation of the skeleton. We recognize that our choice of considering bone deformity as equivalent to bony erosion represents a potential limitation of our work, because these forms of damage are presumably unrelated pathogenetically. We also acknowledge that because the study scores assess radiographic damage only in the wrist/hand joints, our findings are of value only for patients with wrist and/or hand disease.
We conclude that in our cohort of JIA patients with polyarthritis, the adapted versions of the Sharp/van der Heijde scoring method proved reliable and valid and performed well in terms of capturing radiographic damage and its progression. These findings confirm and extend our previous demonstration that scoring methods for adults can be used to assess radiographic progression in JIA. Furthermore, they support the use of quantitative measures of radiographic damage in pediatric rheumatology care and their inclusion in future observational studies and therapeutic trials in JIA.
Dr. Ravelli had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study design. Ravelli, Pistorio, Ruperto, Magni-Manzoni, Martini.
Acquisition of data. Ioseliani, Norambuena, Sato, Rossi, Ullmann.
Analysis and interpretation of data. Ravelli, Sato, Pistorio, Magni-Manzoni.
Manuscript preparation. Ravelli, Martini.
Statistical analysis. Pistorio, Ruperto.