To evaluate agreement among musculoskeletal pediatric specialists in assessing radiographic joint damage in juvenile idiopathic arthritis (JIA).
Two pediatric rheumatologists, 2 pediatric radiologists, and 2 pediatric orthopedic surgeons independently evaluated 60 radiographs of both wrists and hands of children with polyarticular-course JIA. Films were scored using an adapted and simplified version of the Larsen score, ranging from 0–5. Study radiographs were selected from the 568 films used in a previous study that validated an adapted pediatric version of the Sharp/van der Heijde (SHS) score. To enable comparison of the specialists' scores with the adapted SHS score, the 60 radiographs were divided into 6 classes of damage severity based on quintiles of the adapted SHS score. Agreement was evaluated in terms of absolute agreement and through weighted kappa statistics.
The pediatric radiologists tended to assign lower scores, and to assign scores of 0 more frequently, than did the other specialists. Weighted kappa for the 3 pairs of specialists ranged from 0.67–0.69, indicating substantial agreement. Absolute agreement ranged from 51.3–55.7%, depending on the pair of specialists examined. Both absolute and weighted kappa concordance between specialists' scores and the adapted SHS score were poorer for the pediatric radiologists than for the other specialists.
We observed fair agreement in the assessment of radiographic damage among pediatric specialists involved in the care of children with JIA. The radiologists tended to be more reserved than the rheumatologists and orthopedic surgeons in labeling radiographs as damaged or in considering changes as important.
Juvenile idiopathic arthritis (JIA) is a chronic and heterogeneous disease characterized by prolonged synovial inflammation that may cause cartilage and bone destruction (). Structural joint lesions can lead to serious impairment of physical function and affect the quality of life of children and their families (). For this reason, evaluation of radiographic changes is an important tool for assessing disease severity and progression and for monitoring the effectiveness of therapeutic interventions in patients with JIA. Notably, in the recent American College of Rheumatology recommendations for the treatment of JIA, radiographic damage has been recognized as a feature of poor prognosis ().
Although newer imaging techniques such as magnetic resonance imaging and ultrasound allow earlier detection of bone and cartilage changes, conventional radiography remains the gold standard for the demonstration of structural joint damage in patients with chronic arthritis (). In the routine clinical setting, the presence and extent of joint changes are assessed by visual inspection of plain radiographs to search for joint space narrowing (JSN), bone erosions, or other abnormalities. However, in spite of the recent advances in standardized radiographic scoring ([5-10]), the experience with the application of quantitative scoring systems in childhood arthritis is still limited.
The management of children with JIA is ideally conducted through the establishment of a multidisciplinary team of specialists (). Among the various specialists, the pediatric rheumatologist, the pediatric radiologist, and the pediatric orthopedic surgeon may all be involved in the evaluation of joint features on radiographs. Some therapeutic decisions, either medical or surgical, are made through discussion and consensus between specialists by viewing patient radiographs. However, it is unknown whether and to what extent different specialists agree in the assessment of the amount of radiographic joint damage.
The primary aim of the present study was to evaluate the agreement between musculoskeletal pediatric specialists in assessing structural joint changes in children with JIA. A secondary aim was to investigate the concordance between the evaluation made by specialists and the quantitative assessment made by 2 external readers using an established radiographic scoring system.
For the purposes of the present study, 60 radiographs were selected from a sample of 568 films of both wrists and hands in the posteroanterior view used in a previous study that validated an adapted pediatric version of the Sharp/van der Heijde (SHS) scoring system (). Before selection, the 568 radiographs were divided into 6 classes based on the adapted SHS score assigned in the original study: class 1 included radiographs scored as 0, and classes 2–6 included radiographs with a score >0, divided according to the quintiles of the score distribution (range 1–324). A random sample of 10 radiographs was then selected from each class to make up the study sample. The 6 classes of films based on the adapted SHS score were chosen to represent the entire spectrum of severity of radiographic damage and to enable comparison with the 6 categories of the adapted Larsen score used in the present study (see below). The study protocol was approved by the Independent Ethics Committee of the Istituto Giannina Gaslini of Genova, Italy.
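The class-assignment procedure described above (class 1 for a score of 0; classes 2–6 from quintiles of the positive scores) can be sketched as follows. This is a minimal illustration using a short hypothetical score vector, not the 568 actual adapted SHS scores from the original study:

```python
import numpy as np

def assign_severity_classes(scores):
    """Divide adapted SHS scores into 6 severity classes:
    class 1 = score 0; classes 2-6 = quintiles of the positive scores."""
    scores = np.asarray(scores, dtype=float)
    positive = scores[scores > 0]
    # Quintile boundaries of the positive-score distribution
    cuts = np.percentile(positive, [20, 40, 60, 80])
    classes = np.ones(len(scores), dtype=int)  # class 1 for score 0
    classes[scores > 0] = 2 + np.searchsorted(cuts, positive, side="left")
    return classes

# Hypothetical scores spanning the reported range (0-324)
demo = assign_severity_classes([0, 1, 10, 50, 120, 324])
```

A random sample of 10 films per class would then be drawn from each of the resulting strata.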
In this study (), 568 radiographs from 177 patients with JIA were scored using the adapted SHS scoring system. This method scores 15 areas for JSN and 21 areas for erosion or bone deformity in each hand and wrist. Scores for JSN and erosion in each area range from 0–4 and from 0–5, respectively. The total adapted SHS score is calculated as the sum of the JSN (range 0–120) and erosion (range 0–210) scores, and ranges from 0–330. To facilitate assignment of adapted SHS scores, each patient's radiograph was compared with a wrist/hand radiograph from a healthy child with the same bone age. The adapted SHS scores were assigned to all study radiographs by 2 independent readers (XN and MI). Both inter- and intraobserver reliability were good. The average of the 2 readers' scores for each radiograph was used in the analyses.
Two pediatric rheumatologists (RP and SV), 2 pediatric radiologists (MV and FM), and 2 pediatric orthopedic surgeons (SG and SR) independently evaluated the study radiographs. Three specialists (SV, MV, and SG) evaluated all 60 radiographs. The other 3 specialists (RP, FM, and SR), who joined the study at a later stage, evaluated only 55 films because 5 radiographs could not be retrieved from the radiology archive at the time of their readings. All specialists had more than 3 years of experience in the assessment or care of children with JIA, but none of them was familiar with radiographic scoring. Each specialist was asked to score each film using an adapted version of the Larsen score, ranging from 0–5 () (Table 1). The score was adapted by considering bone deformity equivalent to bone erosion. This modification was made to take into account the peculiarities of radiographic changes seen in childhood arthritis, particularly disturbance of bone growth ([13, 14]). The readers were instructed not to score specific bone or joint areas, but to assign an overall score to the film, referring to the most damaged area(s) in either wrists or hands. No information on International League of Associations for Rheumatology (ILAR) category, disease duration, disease activity, or severity was given. Before the beginning of the study, the observers had a training session with the senior investigator (AR), a pediatric rheumatologist with >20 years of clinical experience and familiarity with radiographic scoring, in order to gain experience with the adapted Larsen score.
|Score|Description|
|---|---|
|0|Normal joint space and intact bony outlines|
|1|Slight reduction of joint space, erosion <1 mm in diameter, or slight bone deformity|
|2|Moderate reduction of joint space, erosion >1 mm in diameter, or moderate bone deformity|
|3|Marked reduction of joint space, marked erosion, or marked bone deformity|
|4|Severe reduction of joint space, severe erosion, or severe bone deformity|
|5|Mutilating bone changes and/or ankylosis|
Interobserver reliability of the adapted Larsen score was assessed for all of the films read by each pair of specialists in the same field. Intraobserver reliability was based on the scores of radiographs obtained from a subset of 20 randomly selected films, which were read a second time in a blinded manner by 1 pediatric rheumatologist (SV), 1 pediatric radiologist (MV), and 1 orthopedic surgeon (SG) 3 months after the previous review.
Descriptive statistics were reported as means, SDs, medians, and interquartile ranges for continuous variables, and as absolute frequencies and percentages for categorical variables. Interobserver and intraobserver agreement for the adapted Larsen score were analyzed by computing the intraclass correlation coefficient (ICC) (). For interpretation of the ICC values, the following classification was used: <0.4 = poor agreement, ≥0.4 to <0.75 = moderate agreement, and ≥0.75 = good agreement (). Agreement between the adapted Larsen scores assigned by the 6 specialists and concordance between specialists' scores and adapted SHS classes was evaluated in terms of absolute agreement or concordance and by means of unweighted and weighted kappa statistics. Absolute agreement or concordance was defined as the percentage of films to which the specialists assigned the same score. Results of kappa statistics were interpreted as follows: <0.4 = poor agreement, 0.4–0.60 = moderate agreement, 0.61–0.80 = substantial agreement, and >0.80 = excellent agreement. Concordance between specialists' scores and individual adapted SHS total, JSN, and erosion scores was assessed using Spearman's rank correlation coefficient. For the purpose of this analysis, correlations >0.7 were considered high, correlations ranging from 0.4–0.7 were considered moderate, and correlations <0.4 were considered low ().
The study radiographs were obtained from 60 children with JIA, whose main demographic and clinical features are presented in Table 2. All patients had a polyarticular disease course, with the most common ILAR category being systemic arthritis. No case of enthesitis-related arthritis was included. On average, patients had longstanding disease with a mean or median disease duration of more than 5 years. The spectrum of severity of radiographic joint damage was wide, as shown by the adapted SHS score range of 0 to 324.
|Features|No. (%)|Mean ± SD|Median|Lower quartile|Upper quartile|
|---|---|---|---|---|---|
|ILAR category| | | | | |
|Systemic arthritis|22 (36.6)| | | | |
|Oligoarthritis extended|15 (25)| | | | |
|RF-negative polyarthritis|14 (23.3)| | | | |
|RF-positive polyarthritis|3 (5)| | | | |
|Psoriatic arthritis|2 (3.3)| | | | |
|Enthesitis-related arthritis|0 (0)| | | | |
|Undifferentiated arthritis|4 (6.7)| | | | |
|Age at disease onset, years| |5.5 ± 3.9|4.55|2.4|7.2|
|Onset to radiograph interval, years| |5.6 ± 4.1|5.09|1.7|9.2|
|Total adapted SHS score| |68.3 ± 85.8|30.5|3.5|108.3|
|Classes of adapted SHS score| | | | | |
|1| |3.9 ± 2.2|3.5|2.3|5.4|
|2| |16.4 ± 5.02|15.5|14.3|16.4|
|3| |49.5 ± 16.7|41.75|39.1|63.3|
|4| |112.2 ± 19.3|110|97.3|130.6|
|5| |227.7 ± 63.3|216.25|176.4|277.4|
The interobserver agreement for the adapted Larsen score, as assessed by ICC (95% confidence interval [95% CI]), was 0.87 (0.78–0.92) between pediatric rheumatologists, 0.84 (0.73–0.90) between pediatric radiologists, and 0.84 (0.74–0.91) between pediatric orthopedic surgeons. The ICCs (95% CIs) for intraobserver agreement were 0.85 (0.61–0.94) for the pediatric rheumatologist, 0.91 (0.76–0.97) for the pediatric radiologist, and 0.87 (0.67–0.95) for the pediatric orthopedic surgeon. Owing to the good interobserver agreement, the scores provided by each pair of specialists in the same field were combined for the purposes of the remaining analyses. This means that a total of 115 (60 + 55) film readings were assessed for each pair of specialists.
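The interobserver ICCs reported above can be computed from a subjects-by-raters score matrix. The report does not specify which ICC form was used; the sketch below implements ICC(2,1) (two-way random effects, absolute agreement, single rater), a common choice for this design, as an assumption:

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    `ratings` is an (n subjects x k raters) array. A minimal sketch; the
    study does not state which ICC form was actually applied."""
    Y = np.asarray(ratings, dtype=float)
    n, k = Y.shape
    grand = Y.mean()
    row_means = Y.mean(axis=1)              # per-subject means
    col_means = Y.mean(axis=0)              # per-rater means
    ssr = k * ((row_means - grand) ** 2).sum()   # between-subjects SS
    ssc = n * ((col_means - grand) ** 2).sum()   # between-raters SS
    sse = ((Y - row_means[:, None] - col_means[None, :] + grand) ** 2).sum()
    msr = ssr / (n - 1)
    msc = ssc / (k - 1)
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Note that a constant offset between two raters lowers this ICC (absolute agreement), which is the property that makes it suitable for checking whether paired specialists' scores can be pooled.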
The comparison of the adapted Larsen score assigned by each pair of specialists is reported in Table 3. On average, the scores provided by pediatric rheumatologists and pediatric orthopedic surgeons were higher than those of pediatric radiologists. Looking at individual scores, all specialists tended to cluster their evaluations at the lowest scores of 0 or 1, with this tendency being more pronounced for pediatric radiologists (52.1% of films) than for pediatric orthopedic surgeons (48.2% of films) and pediatric rheumatologists (46.1% of films). Pediatric rheumatologists assigned the score of 4 (16.5%) more frequently than pediatric radiologists (7.8%) and pediatric orthopedic surgeons (10.4%). The proportion of films that were given the maximum score of 5 was comparable across specialists.
| |Pediatric rheumatologists|Pediatric radiologists|Pediatric orthopedic surgeons|
|---|---|---|---|
|Mean ± SD|2.4 ± 1.8|2.1 ± 1.9|2.3 ± 1.7|
|Median (IQR)|2 (1–4)|1 (0.5–4)|2 (1–4)|
|Individual score, no. (%) of radiographs (n = 115)| | | |
|0|17 (14.8)|29 (25.2)|12 (10.4)|
|1|36 (31.3)|31 (26.9)|40 (37.8)|
|2|11 (9.6)|12 (10.4)|18 (15.7)|
|3|9 (7.8)|10 (8.7)|12 (10.4)|
|4|19 (16.5)|9 (7.8)|12 (10.4)|
|5|23 (20.0)|24 (20.9)|21 (18.3)|
Table 4 shows the level of agreement among specialists in assigning the adapted Larsen score. The weighted kappa for the 3 pairs of specialists ranged from 0.67–0.69, indicating agreement in the substantial range. Absolute agreement was fair: depending on the pair of specialists examined, 51.3–55.7% of the films were assigned the same score. Agreement among specialists was overall greater for the scores of 0, 1, and 5 and poorer for the scores 2–4. All specialists agreed better in assigning a score of 1 than a score of 0; this phenomenon was particularly pronounced between pediatric rheumatologists and pediatric orthopedic surgeons.
|Score|Rheumatologists vs. radiologists (n = 115): absolute agreement, no. (%)|Weighted kappa|Rheumatologists vs. orthopedic surgeons (n = 115): absolute agreement, no. (%)|Weighted kappa|Radiologists vs. orthopedic surgeons (n = 115): absolute agreement, no. (%)|Weighted kappa|
|---|---|---|---|---|---|---|
|All radiographs|59 (51.3)|0.69|64 (55.7)|0.68|60 (52.2)|0.67|
|0|12 (20.3)| |5 (7.8)| |10 (16.7)| |
|1|17 (28.8)| |25 (39.1)| |20 (33.3)| |
|2|4 (6.8)| |6 (9.4)| |7 (11.7)| |
|3|3 (5.1)| |5 (7.8)| |4 (6.7)| |
|4|4 (6.8)| |6 (9.4)| |2 (3.3)| |
|5|19 (32.2)| |17 (26.6)| |17 (28.3)| |
The concordance between the adapted Larsen score assigned by the 3 pairs of specialists and the 6 classes of the adapted SHS score is illustrated in Table 5. Both absolute and weighted kappa concordance were poorer for pediatric radiologists than for pediatric rheumatologists and pediatric orthopedic surgeons. Furthermore, pediatric radiologists revealed good concordance only for the minimum and maximum scores of 0 and 5. As seen for agreement among specialists, the concordance was poorer for the intermediate scores of 2 to 4 (with the exception of the good concordance shown by pediatric rheumatologists for score 4 and by pediatric orthopedic surgeons for score 1).
|SHS score class|N|Rheumatologists: absolute concordance, no. (%)|Weighted kappa|Radiologists: absolute concordance, no. (%)|Weighted kappa|Orthopedic surgeons: absolute concordance, no. (%)|Weighted kappa|
|---|---|---|---|---|---|---|---|
|All radiographs|115|57 (49.6)|0.67|48 (41.7)|0.64|54 (47.0)|0.66|
|0|20|9 (45.0)| |15 (75.0)| |6 (30.0)| |
|1|19|10 (52.6)| |7 (36.8)| |14 (73.7)| |
|2|19|6 (31.6)| |5 (26.3)| |8 (42.1)| |
|3|19|10 (52.6)| |4 (21.1)| |6 (31.6)| |
|4|18|14 (77.8)| |3 (16.7)| |5 (27.8)| |
|5|20|15 (75.0)| |14 (70.0)| |16 (80.0)| |
The comparison of the frequency of individual adapted Larsen scores assigned by specialists and the classes of the adapted SHS score is depicted in Figure 1. The bar graph confirms the pediatric radiologists' greater tendency to assign a score of 0, and shows that concordance between specialists' scores and adapted SHS classes was best for score 5. Compared with the adapted SHS assessment, the pediatric rheumatologists and the pediatric orthopedic surgeons tended to overestimate radiographic damage at score 1, and all specialists tended to underestimate radiographic damage at scores 2–4 (with the exception of the overestimation by pediatric rheumatologists at score 4).
The Spearman's correlation between the adapted Larsen scores assigned by each pair of specialists and the individual adapted SHS scores assigned in the previous study () ranged from 0.86 to 0.87 for the total score, from 0.84 to 0.85 for the JSN score, and from 0.84 to 0.86 for the erosion score.
We investigated the agreement in the assessment of radiographic joint damage among musculoskeletal pediatric specialists traditionally involved in the care of children with JIA. To take into account the peculiarities of radiographic changes seen in childhood arthritis, particularly the disturbance of bone growth ([13, 14]), we devised an adapted version of the Larsen scoring system in which bone deformity was considered equivalent to bone erosion. Furthermore, because we were primarily interested in obtaining information of relevance to standard clinical practice, we asked the readers to assign a single overall score to the whole of the wrist and hand bones and joints, rather than to provide a detailed score of multiple areas, as required by the original Larsen method (). The scoring system developed for this study may constitute a suitable platform for future efforts aimed at introducing standardized assessment of radiographic joint damage into the routine care of children with chronic arthritis.
The study radiographs were obtained from patients with polyarthritis and bilateral wrist disease. These features identify a subset of JIA patients at high risk for radiographic progression (). The study population had a wide range of disease duration and severity of radiographic joint damage. Thus, the selected sample of radiographs was well suited for the study purpose.
The comparison of the scores assigned by the 3 pairs of specialists showed that pediatric radiologists tended to assign lower scores, and to assign scores of 0 (meaning no radiographic damage) more frequently, than did pediatric rheumatologists and pediatric orthopedic surgeons (Table 3). This finding suggests that the radiologists were more reserved in labeling radiographs as damaged or in considering minor changes as substantial. This discrepancy may be explained by the inclination of the radiologist, who is generally asked to provide a qualitative judgment of films for diagnostic purposes, to judge only important abnormalities as pathologic. Because the rheumatologist and the orthopedic surgeon are involved in management decisions, they may be more prone to evaluate the films from the perspective of a therapeutic intervention or change and may thus give more weight to minor abnormalities. In keeping with our observation, a comparative study of the assessment of radiographic progression by rheumatologists and radiologists in adult patients with rheumatoid arthritis showed that the radiologists judged fewer radiograph sets as progressive than did the rheumatologists ().
Another interesting finding was that all specialists tended to cluster their evaluations at the extremes of the scoring scale and to assign fewer intermediate scores, i.e., scores of 2, 3, and (with the exception of pediatric rheumatologists) 4 (Table 3). This phenomenon may be related to the lack of specific training in radiographic scoring or to the difficulty of differentiating between moderate, marked, and severe damage. Notably, the lack of objective parameters differentiating the intermediate scoring categories may explain the difficulty all observers had in choosing them as the most likely score. That the absolute agreement between each pair of specialists was much lower for scores 2, 3, and 4 than for scores 0, 1, and 5 indirectly confirms that assignment of intermediate scores was most problematic (Table 4).
Although a gold standard against which to test the validity of radiographic assessments was not available, we compared the semiquantitative scores provided by the 3 pairs of specialists in the present study with those assigned to the same set of films in a previous study by 2 independent readers (both pediatric rheumatologists), who used an established quantitative scoring system (). As seen for the evaluation of agreement among pairs of specialists, concordance was better for the extreme scores than for the intermediate scores (Table 5). Both absolute concordance and kappa values were lower for pediatric radiologists, which further underscores the distinctive attitude of these specialists in assessing radiographic features. That the independently obtained adapted SHS scores have been documented to be related to outcomes like physical functioning or clinical damage () lends support to the judgments of rheumatologists and orthopedic surgeons.
Our analysis should be interpreted in light of some potential limitations. Radiographic scoring was performed only on cross-sectional films. The lack of evaluation of longitudinal radiographs did not allow us to investigate agreement in the assessment of the progression of structural changes. Progression of radiographic damage over time is clinically more important than the amount of damage at a single point in time and plays a major role in the decision to change therapy. We also acknowledge that which specialists gave the most valid judgment remains debatable and cannot be definitively determined by this study.
In conclusion, we observed fair agreement in the assessment of radiographic joint damage among musculoskeletal pediatric specialists involved in the care of children with JIA. Agreement was greater in scoring absence of damage and maximum damage, and lower in scoring intermediate levels of damage. The radiologists tended to be more reserved than the rheumatologists and the orthopedic surgeons in labeling radiographs as damaged or in considering changes as important. These findings underscore the need for consensus initiatives aimed to improve concordance among physicians in the evaluation of radiographic changes in childhood arthritis.
All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be submitted for publication. Dr. Ravelli had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study conception and design. Rodriguez-Lozano, Giancane, Pignataro, Martini, Ravelli.
Acquisition of data. Pignataro, Viola, Valle, Gregorio, Norambuena, Ioseliani, Magnaguagno, Riganti.
Analysis and interpretation of data. Giancane, Pistorio, Martini, Ravelli.