Keywords:

  • Gout;
  • Radiography;
  • Outcome measure

Abstract

  1. Top of page
  2. Abstract
  3. INTRODUCTION
  4. PATIENTS AND METHODS
  5. RESULTS
  6. DISCUSSION
  7. AUTHOR CONTRIBUTIONS
  8. REFERENCES

Objective

To identify a valid method to measure radiographic damage in patients with chronic gout.

Methods

The scoring method that best represented radiographic damage in individual joints was analyzed by comparing a gold standard rheumatologist consensus global score with recognized scoring methods, including the Sharp/van der Heijde erosion and narrowing scores, Ratingen destruction score, and Steinbrocker score. Ninety-five proximal interphalangeal joints from 12 patients with gout were included in this analysis. Scoring of hand and feet radiographs from an additional 35 patients with gout was used to analyze the sites to be included in a scoring system and the additional features to be recorded.

Results

For individual joints, the combination of the Sharp/van der Heijde erosion and narrowing scores correlated best with the consensus global score. In addition, the limits of agreement were narrowest for the combined Sharp/van der Heijde erosion and narrowing score. All joint areas in the Sharp/van der Heijde rheumatoid arthritis score and the distal interphalangeal joints were affected by chronic gout and contributed to the total score. Additional features (extraarticular erosions, joint space widening, and ankylosis) occurred infrequently, and scoring of these features did not increase the reliability of the total score. The reliability of the total score was high: intraclass correlation coefficient for intraobserver reproducibility was 0.993–0.998 and for interobserver reproducibility was 0.963–0.966. The modified Sharp/van der Heijde score was able to discriminate between early and advanced disease.

Conclusion

A modified Sharp/van der Heijde system accurately and reliably represents radiographic joint damage in chronic gout.


INTRODUCTION

Structural articular damage is an important end point in studies of patients with chronic rheumatic disease. Radiographic scoring is frequently used to analyze the severity and progression of structural damage over time or in response to a therapeutic intervention. Studies of patients with erosive arthropathies such as rheumatoid arthritis (RA) and psoriatic arthritis (PsA) have emphasized that evaluation of radiographic damage is a key outcome measure that predicts other outcomes, such as joint deformity, functional status, economic outcomes, and mortality (1–4).

A number of radiographic scoring methods have been described for analysis of radiographic damage in patients with RA and PsA. The modified Sharp/van der Heijde method is most widely used in RA, and involves scoring of erosions and joint space narrowing in the joints of the hands and feet (5, 6). Modification of this method has been described for PsA, with additional scoring of the distal interphalangeal (DIP) joints and separate scoring for gross osteolysis and pencil-in-cup deformities (7, 8). The Steinbrocker scoring method (with modifications) has also been used in PsA, with one or a number of joints scored globally based on the severity of joint damage (9, 10). The Ratingen scoring method is another option, with scoring based on the extent of joint destruction in the hands and feet (11, 12). In order for any scoring method to be of value, it must meet the Outcome Measures in Rheumatology Clinical Trials (OMERACT) filters of truth (face, content, construct, and criterion validity), discrimination (reliability and sensitivity to change), and feasibility (13).

Chronic gout is characterized by typical radiographic changes (14–16). Subcutaneous tophi appear as asymmetric and lobulated soft tissue masses. Intraarticular tophi frequently lead to bone erosions, which are well defined with overhanging edges and without associated osteopenia. Compared with RA, joint space narrowing occurs late in disease. Ankylosis or joint space widening may also occur in advanced disease. Extraarticular erosions and osteolytic lesions have also been described. Some studies have indicated that the radiographic features of chronic gout can improve (15, 17). However, to date no radiographic scoring method has been described for gout. OMERACT VIII has included radiographic scoring as part of the preliminary core set for outcome measures in chronic gout (18). The goal of this work was to identify a valid scoring method for radiographic damage in chronic gout.

The authors initially met in Auckland, New Zealand in March 2006 with the intention of developing a draft scoring method for further testing. This meeting included a review of the radiographic features of gout and established scoring systems for other forms of erosive arthritis and a structured analysis of radiographs. This preliminary literature review and analysis indicated that an erosion score alone might not sufficiently represent the extent of disease, due to the apparent ceiling effects of established erosion scores. The potential importance of a joint destruction score was highlighted. This review also suggested that scoring for joint space narrowing might not be instructive in a chronic gout score, because this feature is typically associated with advanced disease. Therefore, validation of the scoring method proceeded in 3 stages: identification of the scoring method that best reflected radiographic evidence of disease in individual joints, analysis of hand and feet radiographs to determine which joints and additional features should be included in a scoring system, and analysis of the reliability of the full scoring system.

PATIENTS AND METHODS

Individual joint analysis.

To determine the scoring method that best reflected radiographic changes of chronic gout in individual joints, we compared various scoring methods with a consensus global impression of radiographic damage. The proximal interphalangeal (PIP) joints of both hand radiographs of 12 patients with chronic gout were analyzed. These films were chosen from patients attending rheumatology clinics to represent a spectrum of disease severity. The median disease duration for these patients was 17.5 years (range 3–38 years) and all patients had tophaceous disease. All patients met the criteria for diagnosis of gout developed by Wallace et al (19). Three rheumatologists independently scored each PIP joint on a scale of 0 (normal) to 10 (extremely abnormal joint). A total of 95 PIP joints were scored (one PIP joint was not assessed due to amputation). Joints in which the scores differed by more than 2 grades between at least 2 raters were rescored by the rheumatologist with the more extreme score. The final ratings were averaged to form the consensus global score, which was used as a gold standard index of joint damage. The same joints were assessed by a radiologist (BC) for Sharp/van der Heijde erosion score (0–5), Sharp/van der Heijde joint space narrowing score (0–4), Ratingen destruction score (0–5), and Steinbrocker global score (0–4) (6, 9, 11).
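The consensus procedure described above can be sketched in a few lines of Python. This is an illustration only, not the authors' software: the function names, the example scores, and the reading of "more extreme" as the score farthest from the median of the raters are all assumptions.

```python
from statistics import mean, median

MAX_GAP = 2  # grades; a larger gap between any two raters triggers rescoring


def needs_rescoring(scores, max_gap=MAX_GAP):
    """Return True when any two raters differ by more than max_gap grades."""
    return max(scores) - min(scores) > max_gap


def most_extreme_rater(scores):
    """Index of the rater whose score lies farthest from the median.

    Assumption: "more extreme" means farthest from the median of the raters;
    on a tie the first such rater is returned.
    """
    mid = median(scores)
    return max(range(len(scores)), key=lambda i: abs(scores[i] - mid))


def consensus_global_score(scores):
    """Average the final ratings into one consensus global score (0-10)."""
    return mean(scores)


# Hypothetical joint scored 3, 4, and 8 by the three rheumatologists.
first_pass = [3, 4, 8]
final = list(first_pass)
if needs_rescoring(first_pass):
    rater = most_extreme_rater(first_pass)  # the rater who gave 8
    final[rater] = 5                        # hypothetical rescore by that rater
print(consensus_global_score(final))
```

Only the rater flagged as most extreme changes a score, so the other two ratings carry through unchanged to the average.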

Each score and combinations of these scores were compared with the rheumatology global assessment using Spearman's correlation, linear regression modeling, and analysis of differences between measures, as described by Bland and Altman (20). In the analysis of mean differences, combination scores were adjusted to a scale of 0–10, if required. To model the differences between the gold standard consensus global score and the score from various scoring systems, general linear modeling was used that also incorporated the individual patient as a random factor because the 95 joints were from 12 patients and were therefore not independent.
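The rescaling and the analysis of differences can be sketched as follows. This is a minimal illustration assuming simple linear rescaling to the 0–10 range and limits drawn at 2 SDs of the differences (as in the figure captions); the function names and the data are hypothetical, and the random-effects modeling is not reproduced here.

```python
from statistics import mean, stdev


def rescale(score, max_score, target_max=10):
    """Linearly map a score on 0..max_score onto 0..target_max."""
    return score * target_max / max_score


def bland_altman(method_scores, reference_scores, k=2.0):
    """Return (mean difference, lower limit, upper limit).

    Differences are method minus reference; the limits are the mean
    difference plus or minus k sample standard deviations.
    """
    diffs = [m - r for m, r in zip(method_scores, reference_scores)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - k * sd, bias + k * sd


# Hypothetical joints: combined erosion + narrowing score (0-9) versus
# the consensus global score (0-10).
combined_raw = [0, 3, 5, 9, 7]
consensus = [0.3, 3.0, 6.0, 9.7, 8.0]
combined_adj = [rescale(s, max_score=9) for s in combined_raw]
bias, lower, upper = bland_altman(combined_adj, consensus)
print(round(bias, 2), round(lower, 2), round(upper, 2))
```

With the mean difference and SD reported in Table 1 for the combined erosion plus narrowing score (−0.049 ± 1.41), the same arithmetic gives 2-SD limits of roughly −2.87 to 2.77 on the 0–10 scale.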

Hand and feet radiograph scoring using the Sharp/van der Heijde method.

After identification of a scoring method that most accurately represented disease severity in individual joints, we proceeded to score the hand and feet radiographs of 35 additional patients with chronic gout using this system. These films were obtained from unselected patients attending rheumatology clinics for management of gout. The median disease duration for these patients was 16 years (range 1–46 years) and 26 (74%) patients had tophaceous disease. All patients met the Wallace criteria for diagnosis of gout (19). The joints scored were those included in the Sharp/van der Heijde scoring system for RA, with additional scoring of the DIP joints of the hands. The presence of extraarticular erosions, joint space widening, and ankylosis was also recorded. Following a training exercise, these films were independently scored by a rheumatologist (ND) and a radiologist (BC). The films were scored twice in random order by each reader. These data were used to analyze the distribution of affected joints and the observer reliability of the proposed full scoring system.

To assess which joints ought to be included, we determined the distribution of scores for each joint area: DIP, PIP/interphalangeal, wrist/carpus, metacarpophalangeal, metatarsophalangeal, and toe first interphalangeal joint. Because the possible score range for each joint area was different, the joint area score was expressed as the percentage of the maximum possible score for that area. The median scores for each joint area indicate its contribution to the total score. Scoring of additional features (extraarticular erosions, joint space widening, and ankylosis) was assessed by analysis of the frequency of the features and reliability analysis.
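The normalization by joint area can be illustrated with a short sketch. The maximum scores and patient data below are placeholders chosen for easy arithmetic, not the actual Sharp/van der Heijde maxima, and the function names are hypothetical.

```python
from statistics import median


def percent_of_max(area_score, area_max):
    """Express a joint area score as a percentage of its maximum possible score."""
    if area_max <= 0:
        raise ValueError("area_max must be positive")
    return 100.0 * area_score / area_max


def median_percent_by_area(patients, area_max):
    """Median percentage-of-maximum score per joint area across patients.

    patients: list of dicts mapping joint area name to raw score.
    area_max: dict mapping joint area name to maximum possible score.
    """
    return {
        area: median(percent_of_max(p[area], top) for p in patients)
        for area, top in area_max.items()
    }


# Placeholder maxima and three hypothetical patients.
area_max = {"DIP": 72, "PIP/IP": 90}
patients = [
    {"DIP": 18, "PIP/IP": 45},
    {"DIP": 36, "PIP/IP": 9},
    {"DIP": 0, "PIP/IP": 27},
]
print(median_percent_by_area(patients, area_max))
```

Dividing by the area maximum is what makes areas with different numbers of joints comparable on one axis.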

The intra- and interobserver reliability of the scoring system were assessed by intraclass correlation coefficients (ICCs), coefficients of variation with 95% confidence intervals calculated as described by Vangel (21), and Bland and Altman limits of agreement analysis. Discrimination analysis comparing scores in early and late disease was performed using the Mann-Whitney U test, with the mean of the assessors' first scores.
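To make the reliability statistic concrete, here is a sketch of one common ICC form. The text does not state which ICC model was used, so the one-way random-effects, single-measure form below is an assumption chosen purely for illustration; the function name and the data are hypothetical.

```python
from statistics import mean


def icc_oneway(ratings):
    """One-way random-effects, single-measure ICC.

    ratings: list of per-subject tuples, each holding k repeated scores.
    Computed from the between-subject and within-subject mean squares.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for row, m in zip(ratings, row_means) for v in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical total scores: (first read, second read) for four patients.
repeat_reads = [(10, 12), (40, 38), (95, 97), (150, 149)]
print(round(icc_oneway(repeat_reads), 3))
```

When between-patient spread is large relative to the disagreement between reads, as in this made-up example, the coefficient approaches 1.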

RESULTS

Individual PIP joint analysis.

The final rheumatologist consensus global scores had excellent agreement (ICC 0.91 preconsensus and 0.96 postconsensus). Initial analysis consisted of Spearman's correlations between the consensus global score and the different scoring methods, individually and in combination (Table 1). Of all the scores and combinations of scores tested, the combination of the Sharp/van der Heijde erosion and narrowing scores showed the greatest correlation with the consensus global score (r = 0.881, P < 0.001).

Table 1. Correlation between the different scoring methods and consensus global score for proximal interphalangeal joint analysis

| Scoring method (score range) | r | P | Difference from consensus score, mean ± SD* |
|---|---|---|---|
| Sharp/van der Heijde erosion score (0–5) | 0.825 | < 0.001 | −0.38 ± 2.37 |
| Sharp/van der Heijde narrowing score (0–4) | 0.766 | < 0.001 | 0.58 ± 2.19 |
| Steinbrocker score (0–4) | 0.860 | < 0.001 | −0.55 ± 1.87 |
| Ratingen destruction score (0–5) | 0.718 | < 0.001 | 1.49 ± 1.95 |
| Sharp/van der Heijde erosion + Ratingen (0–10) | 0.831 | < 0.001 | 0.56 ± 1.66 |
| Sharp/van der Heijde erosion + narrowing (0–9)† | 0.881 | < 0.001 | −0.049 ± 1.41 |
| Sharp/van der Heijde erosion + narrowing + Ratingen (0–14) | 0.880 | < 0.001 | 0.57 ± 1.32 |

* Scores adjusted to 0–10 range.

Multiple regression analysis was then undertaken to identify determinants of the consensus global score. In this analysis, the Sharp/van der Heijde erosion score (beta weight 0.50, P < 0.001) and the Sharp/van der Heijde joint space narrowing score (beta weight 0.43, P < 0.001) independently contributed to the consensus global score, and addition of the Ratingen destruction score did not further predict the consensus global score (beta weight 0.11, P = 0.15).

General linear modeling (incorporating the patient as a random factor) demonstrated that there was a significant difference between the mean difference scores (consensus global score compared with each of the 7 scoring methods, F = 13.2, df = 6, P < 0.001). Tukey's post hoc tests demonstrated that the agreement between the consensus global score and the Ratingen score was worse than for the other methods, and that the other differences observed were due to the direction of difference rather than the absolute value. However, the limits of agreement for the mean difference between the scoring method and the consensus global score were narrowest for the adjusted Sharp/van der Heijde erosion plus narrowing score (mean −0.049; 95% confidence interval −0.337, 0.239). The Bland-Altman plot for the combined Sharp/van der Heijde erosion plus narrowing score is shown in Figure 1. Examples of the consensus global scores and comparative Sharp/van der Heijde scores are shown in Figure 2.

Figure 1. Bland-Altman plot of the difference between the combined Sharp/van der Heijde score (S-vdH) and the consensus global score against the average of the scores. The combined Sharp/van der Heijde erosion and narrowing score has been adjusted to range from 0 to 10. Solid horizontal lines represent the mean difference; broken horizontal lines represent 2 SDs of difference from the mean.

Figure 2. Individual joint analysis scores: examples of proximal interphalangeal joint analysis with rheumatologist consensus global scores, Sharp/van der Heijde (S/vdH) erosion scores, S/vdH narrowing scores, and combination scores.

Distribution of the affected joints.

The distribution of affected joints that contributed to the modified Sharp/van der Heijde score is represented in Figure 3. Multiple regression analysis using the joint area scores (expressed as the percentage of maximum possible score for that joint area) as the independent variable and the total raw score as the dependent variable demonstrated that all joint areas (including the DIP joint) contributed significantly to the total score. Reliability for erosion and narrowing scores was excellent for all joint areas: ICC for intraobserver analysis ranged from 0.94 to 0.99, and ICC for interobserver analysis ranged from 0.87 to 0.95.

Figure 3. Box plot of the Sharp/van der Heijde score (erosion plus narrowing) expressed as a percentage of the maximum possible score for each joint area. * Extreme values. Boxes represent the interquartile range, circles represent outliers, and error bars represent all cases that are not outliers or extremes. DIP = distal interphalangeal; MCP = metacarpophalangeal; MTP = metatarsophalangeal; PIP = proximal interphalangeal; IP = interphalangeal.

Scoring of additional features.

Extraarticular erosions were less frequently observed, with 37% (assessor 1) and 63% (assessor 2) of cases scoring 0. The interobserver reliability for this feature was worse than for the full modified Sharp/van der Heijde system (ICC 0.71). Furthermore, the correlation between the modified Sharp/van der Heijde score and the same score with the addition of the extraarticular erosion score was very high (>0.95), suggesting that the addition of the extraarticular erosion score added relatively little information.

Ankylosis and joint space widening were also observed infrequently: 5 (assessor 1) and 8 (assessor 2) instances of ankylosis were recorded in 5 patients, and 8 (assessor 1) and 19 (assessor 2) instances of joint space widening were recorded in 3 and 4 patients, respectively. Consequently, a marked floor effect for these features was observed: 86% of patients had ankylosis scores of 0 and 89% of patients had joint space widening scores of 0. The reliability was also poor (interobserver ICC 0.31 [range 0–0.58] for ankylosis and 0.035 [range 0–0.36] for joint space widening).
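The floor effect reported above is simply the share of patients scoring 0. As a rough check, the sketch below assumes 35 patients of whom 5 have any ankylosis; the per-patient instance counts are hypothetical (chosen to total 8), and the function name is an assumption.

```python
def floor_effect(scores):
    """Fraction of cases sitting at the minimum possible score of 0."""
    return sum(1 for s in scores if s == 0) / len(scores)


# 30 patients with no ankylosis, 5 patients with hypothetical instance counts.
ankylosis = [0] * 30 + [1, 1, 1, 2, 3]
print(f"{floor_effect(ankylosis):.0%}")
```

With 30 of 35 patients at 0 this gives about 86%, in line with the figure quoted for ankylosis.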

Reproducibility of the modified Sharp/van der Heijde method.

These data suggested that the most reliable method is the modified Sharp/van der Heijde method for RA with inclusion of the DIP joints of the hands. Using this method, high levels of reproducibility were found (ICC for intraobserver reproducibility 0.993–0.998 and for interobserver reproducibility 0.963–0.966). The reliability data for the 35 patients with chronic gout are summarized in Table 2. The Bland-Altman plots for this method are shown in Figure 4. Overall, more variability was demonstrated between observers than within the same observer.

Table 2. Summary of inter- and intraobserver reproducibility for the modified Sharp/van der Heijde score in 35 patients with chronic gout*

| | ICC (95% CI) | Limits of agreement, mean ± SD difference | CV (95% CI), % |
|---|---|---|---|
| Intraobserver reproducibility (score 1 vs score 2) | | | |
| Assessor 1 | 0.998 (0.996, 0.999) | 0.8 ± 4.8 | 4.3 (3.4, 5.6) |
| Assessor 2 | 0.993 (0.986, 0.997) | 0.74 ± 8.9 | 7.8 (6.3, 10.2) |
| Interobserver reproducibility (assessor 1 vs assessor 2) | | | |
| Score 1 | 0.963 (0.927, 0.981) | −1.3 ± 20.5 | 17.9 (14.4, 23.7) |
| Score 2 | 0.966 (0.934, 0.991) | −1.4 ± 19.0 | 16.8 (13.5, 22.3) |

* ICC = intraclass correlation coefficient; 95% CI = 95% confidence interval; CV = coefficient of variation.

Figure 4. Bland-Altman plots of difference for intra- and interobserver reproducibility. Solid horizontal lines represent the mean difference; broken horizontal lines represent 2 SDs of difference from the mean.

Discrimination of the modified Sharp/van der Heijde method.

We hypothesized that this score would have the ability to discriminate between early and advanced disease. The 35 patients were separated into 2 groups based on the median disease duration of the entire group: disease duration <16 years and disease duration ≥16 years. The median score for the modified Sharp/van der Heijde method was 37 for those with shorter disease duration compared with 108.5 for those with longer disease duration (Mann-Whitney U 71.5, P = 0.007). Similarly, patients without tophaceous disease had lower median radiology damage scores than those with tophaceous disease (median scores 11.5 versus 96, Mann-Whitney U 28, P = 0.001), further supporting the ability of this score to discriminate between early and advanced disease.
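The median split and the Mann-Whitney U statistic can be sketched in plain Python. This is an illustrative sketch with hypothetical patient data and function names; it computes only the U statistic (the smaller of the two U values, with average ranks for ties) and does not reproduce the reported P values.

```python
def average_ranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Smaller of the two Mann-Whitney U statistics for samples x and y."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return min(u_x, u_y)


def split_by_duration(records, cutoff_years):
    """Split (duration, score) pairs into scores below and at/above the cutoff."""
    shorter = [score for years, score in records if years < cutoff_years]
    longer = [score for years, score in records if years >= cutoff_years]
    return shorter, longer


# Hypothetical (disease duration in years, total score) pairs.
records = [(3, 12), (8, 40), (12, 35), (16, 90), (22, 130), (30, 110)]
shorter, longer = split_by_duration(records, cutoff_years=16)
print(mann_whitney_u(shorter, longer))
```

A U of 0 means every score in one group is lower than every score in the other; values closer to half the product of the group sizes indicate more overlap.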

DISCUSSION

The purpose of this study was to identify a scoring method that meets the OMERACT filters of truth, discrimination, and feasibility (13). We compared a gold standard consensus score with various radiographic scoring methods in an attempt to identify the scoring method that most accurately represents radiographic joint damage in patients with gout. The results show that scoring of erosion and joint space narrowing using the Sharp/van der Heijde method can accurately represent radiographic damage in chronic gout. Disease affecting the DIP joints contributes to the scores, and therefore these joints should also be included within the scoring method. Measurement of additional features characteristic of chronic gout, such as joint space widening, ankylosis, and extraarticular erosions, does not increase the comprehensiveness of the scoring method. Scoring of plain radiographs of the hands and feet using this method is reproducible in patients with gout, and is able to discriminate between early and advanced disease. Furthermore, this method is feasible as an outcome measure for chronic gout, due to ease of access, low cost, and rapid imaging times.

We recognize that ongoing work is needed to further validate this scoring system for use in gout. This study was not designed to analyze sensitivity to change, either over time or in response to therapy, and this information would greatly assist in understanding the discrimination of the radiographic damage index as an outcome measure. In particular, at present the velocity of radiographic damage in patients with gout is unknown. Such analysis requires longitudinal data, and will be the focus of further work by our group. In addition, studies that compare other measures of disease severity, such as functional outcomes, with radiographic damage scores may further validate the use of this measure.

We believe that this system will be useful in the study of patients with gout. A number of new urate-lowering therapies are currently in development for gout, and this scoring system may be of benefit in quantifying the effects of these agents on prevention or regression of articular damage in patients with chronic gout. Some studies have suggested that radiographic damage in gout may regress. However, the role of serum urate lowering in achieving radiographic regression remains uncertain (17). The availability of a validated radiographic damage index will also assist in further understanding the intensity of serum urate lowering that should be targeted in gout; recent studies have suggested that lowering serum uric acid (SUA) levels to less than 6 mg/dl is required to achieve tophus regression and suppression of acute gout flares (22–24). However, uncertainty remains about whether SUA targets should be even lower than 6 mg/dl. Quantification of radiographic damage may allow for further understanding of the required targets in patients with gout, particularly those with erosive disease.

This system may also assist in understanding the mechanisms and impact of joint damage in patients with gout. Although the relationship between tophaceous disease and bone erosion is well documented, the underlying cause of joint damage in gout remains speculative; a number of mechanisms have been suggested, including direct damage through proximity to the tophus, associated synovitis, and release of proinflammatory cytokines or matrix metalloproteases in response to uric acid crystals (25–28). Similarly, although radiographic damage has been well described as a prognostic factor in other forms of erosive arthritis, its relative importance in the prognosis of gout is uncertain. Thus, the availability of a reliable radiographic damage index will further clarify these key questions regarding the pathophysiology and natural history of disease.

In summary, we have identified a valid radiographic damage index for use in clinical studies of chronic gout. This system provides a feasible method of representing articular damage in gout, and may allow for better understanding of the natural history, pathophysiology, and impact of therapies in this disease.

AUTHOR CONTRIBUTIONS

Dr. Dalbeth had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study design. Dalbeth, Clark, McQueen, Doyle, Taylor.

Acquisition of data. Dalbeth, Clark, McQueen.

Analysis and interpretation of data. Dalbeth, Clark, McQueen, Doyle, Taylor.

Manuscript preparation. Dalbeth, McQueen, Doyle, Taylor.

Statistical analysis. Dalbeth, Taylor.
