Keywords:

  • Magnetic resonance imaging;
  • Ankylosing spondylitis;
  • Validation

Abstract

  1. Top of page
  2. Abstract
  3. INTRODUCTION
  4. METHODS
  5. RESULTS
  6. DISCUSSION
  7. REFERENCES

Objective

Magnetic resonance imaging (MRI) demonstrates characteristic inflammatory lesions in the spine of patients with ankylosing spondylitis (AS) prior to the development of typical features on plain radiography. Our objective was to develop a feasible MRI-based scoring system for spinal inflammation in patients with spondylarthropathy that requires minimal scan time, does not require contrast enhancement, evaluates the extent of lesions in 3 dimensional planes, and limits the number of vertebral levels that are scored.

Methods

Our scoring method was based entirely on the assessment of increased signal denoting bone marrow edema on T2-weighted STIR sequences. MRI films were assessed in random order by 3 blinded readers at each of 2 sites (the Universities of Alberta and Toronto). Intra- and interreader reliability was assessed by intraclass correlation coefficient. The 24-week response of patients with AS randomized to infliximab:placebo (8:3) was assessed by effect size and standardized response mean.

Results

We scanned 11 patients with AS with clinically active disease and 20 additional patients randomized to a 24-week trial of either infliximab or placebo. An initial analysis of all discovertebral units (DVUs) in the spine of 11 patients demonstrated a mean of 3.2 (95% confidence interval 1.2, 5.2) affected units, while limiting the scoring to a maximum of 6 units captured most of the affected units. Intraobserver reproducibility for the 6-DVU STIR score ranged from 0.93 to 0.98 (P < 0.0001). Interobserver reproducibility of scores by readers from both sites was 0.79 (P < 0.0001) for status score and 0.82 (P < 0.0001) for change score. Analysis of pretreatment and posttreatment scores for all 20 patients randomized to infliximab/placebo showed a large degree of responsiveness (standardized response mean = 0.87). Reproducibility and responsiveness were only slightly improved by using contrast enhancement with gadolinium diethylenetriaminepentaacetic acid.

Conclusion

The Spondyloarthritis Research Consortium of Canada MRI index is a feasible, reproducible, and responsive index for measuring spinal inflammation in AS.


INTRODUCTION

A characteristic feature of ankylosing spondylitis (AS) is the progression of axial inflammation from the sacroiliac joints to the intervertebral discs, facet joints, and ligamentous structures of the spine. This progression is typically assessed in clinical practice using plain radiography. However, plain radiographic features of spinal disease primarily include reparative phenomena such as syndesmophytes and ankylosis. Although instruments that measure these structural changes have been described and validated, they demonstrate poor sensitivity to change (1–3).

Magnetic resonance imaging (MRI) of the spine in patients with AS has demonstrated abnormalities in the spine prior to the development of plain radiographic findings (4). In particular, the introduction of fat-suppression sequences has allowed the visualization of lesions within bone marrow that may be obscured on MRI by marrow fat. These lesions have included bone marrow edema (BME) adjacent to vertebral endplates, at the attachment of the annulus fibrosus to the vertebral rim, at the insertions of anterior and posterior longitudinal ligaments, and within the facet joints.

These developments have led some researchers to propose that MRI be used to provide an objective measure of disease activity in the spine. One report has described a scoring scheme for spinal inflammation that scores lesions in each discovertebral unit, which is defined as the region including the intervertebral disc and both adjacent vertebral endplates (5). This activity index integrates scores for BME and erosion into a single measure of disease activity. Lesions are scored in the sagittal plane from cervical to lumbar regions. These index scores were shown to be reproducible when scored by a rheumatologist and a radiologist at a single site, to be sensitive to change in patients receiving infliximab, and to correlate with changes in the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) (5). However, this scheme does not stipulate how to score lesions confined to one vertebral endplate within the unit and does not address the potential advantage of scoring the extent of the lesion in both sagittal and coronal planes. It also assigns higher scores to structural damage (grades 4–6) than to inflammation (grades 1–3).

The Spondyloarthritis Research Consortium of Canada has embarked on a systematic program of developing feasible and responsive MRI-based outcome tools for scoring inflammation and structural damage in both the spine and sacroiliac joints. We describe the development and validation of a simplified scoring system for the assessment of disease activity in 3 dimensional planes in the spine that has been validated by investigators at 2 Canadian sites, the Universities of Alberta and Toronto.

METHODS

Patients and study protocol

We studied 2 cohorts of patients with AS as defined by the modified New York criteria (6). Cohort A was a cross-sectional cohort of 11 patients with AS, 8 men and 3 women with a mean age of 38.9 years (range 29–66 years), a mean disease duration of 12.7 years (range 2–36 years), and a mean BASDAI score of 5.9, who attended the outpatient clinic in the Rheumatic Disease Unit at the University of Alberta. All patients were receiving nonsteroidal antiinflammatory drug therapy and were considered candidates for anti–tumor necrosis factor α therapy. Cohort B was a group of 20 patients who had severe, active disease as defined by a BASDAI score ≥4 and who had been randomized to receive either infliximab or placebo in a 24-week, double-blind, placebo-controlled trial. Eleven patients in cohort B were recruited at the University of Alberta and comprised 8 men and 3 women with a mean age of 45.1 years (range 36–59 years) and a mean disease duration of 19.2 years (range 11–38 years). Nine patients in cohort B were recruited at the University of Toronto and comprised 8 men and 1 woman with a mean age of 40.2 years and a mean disease duration of 16.1 years. The mean BASDAI score for the total group of 20 patients was 6.2.

Cohort A underwent MRI at a single time point, whereas cohort B underwent MRI at baseline and at 24 weeks after randomization. A dosage of either placebo or infliximab 5 mg/kg was administered intravenously at baseline, 2 weeks, 6 weeks, and every 6 weeks thereafter. The study was approved by the ethics committees of the University of Alberta and the University Health Network (Toronto).

Magnetic resonance imaging

MRI was performed with 1.5 Tesla (Siemens, Erlangen, Germany) or GE systems (GE, Milwaukee, WI) using appropriate surface coils. Sequences were obtained in sagittal orientation with 4-mm slice thickness and 12–15 slices acquired. Spine sequences were T1-weighted spin echo (repetition time [TR] 517–618 msec, echo time [TE] 13 msec) and STIR (TR 2,720–3,170 msec, inversion time 140 msec, TE 38–61 msec). Patients in cohort B also had scans performed after intravenous administration of gadolinium diethylenetriaminepentaacetic acid (Gd-DTPA), 0.1 mmole/kg body weight. The spine was imaged in 2 parts: the upper half comprising the entire cervical and most of the thoracic spine, and the lower half comprising the lower portion of the thoracic spine and the entire lumbar spine. The specific MRI parameters for acquiring spine images are provided on our website (www.altarheum.com/research.html).

Scoring of MRI lesions

Our scoring method (www.altarheum.com/research.html) for active inflammatory lesions in the spine relied on the use of a T2-weighted sequence that incorporated suppression of normal marrow fat signal. We opted for the STIR sequence, which offered greater reliability when using large fields of view compared with T2 spin echo with spectral presaturation. Signal from marrow fat frequently obscures signal emanating from BME associated with inflammation. Consequently, the use of fat suppression improves sensitivity for detection of abnormal water content.

To score lesions in defined regions of the spine we used a previously reported definition of a discovertebral unit (5), which is defined as the region between 2 virtual lines through the middle of each vertebra and includes the intervertebral disc and the adjacent vertebral endplates. Each vertebral endplate was scored independently for BME. T1 spin echo images were included for anatomic reference only and were not scored. For each lesion, a total of 3 consecutive sagittal slices were assessed. This allowed assessment of the extent of the lesion in the coronal and the sagittal planes. Discal lesions were not scored because they are often abnormal in patients with mechanical low back pain and degenerative disc disease.

Definition of abnormal STIR signal

Bone marrow signal in the center of the vertebra constituted the reference for designation of normal signal. To facilitate designation of abnormal signal on STIR, images from 3 non-AS controls (mechanical back pain) and a set of reference images from patients with AS were included.

Scoring of depth and intensity

The signal from cerebrospinal fluid constituted the reference for designating an inflammatory lesion as intense. A lesion was graded as deep if there was a homogeneous and unequivocal increase in signal extending over a depth of ≥1 cm. Assessment of depth was made possible by including a scale on the image.

Scoring method

Examples of the scoring method are shown in Figure 1. The scoring sheet is available for download at www.altarheum.com/research.html. Each discovertebral unit was divided into 4 quadrants: upper anterior endplate, upper posterior endplate, lower anterior endplate, and lower posterior endplate. The presence of increased STIR signal in each of these 4 quadrants was scored on a dichotomous basis: 1 = increased signal, 0 = normal signal. This was repeated for each of 3 consecutive sagittal slices resulting in a maximum score of 12 per discovertebral unit. On each slice, the presence of a lesion exhibiting intense signal in any quadrant was given an additional score of 1. Similarly, the presence of a lesion exhibiting depth ≥1 cm in any quadrant was given an additional score of 1, leading to a maximum additional score of 6 for each specific vertebral unit and bringing the total maximum score to 18 per unit. Because our preliminary analyses indicated that scoring only 6 discovertebral units would be sufficient (see below), this brought the total maximum score for our method to 108.
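The arithmetic of the scoring rules above can be sketched in a few lines of Python. This is a minimal illustration only, not the consortium's scoring sheet or software: the names `SliceReading`, `unit_score`, and `six_dvu_score` are hypothetical, and the check that an intensity or depth point requires at least one positive quadrant on the slice is an assumption drawn from the wording "a lesion exhibiting intense signal in any quadrant".

```python
from dataclasses import dataclass
from typing import List, Sequence

SLICES_PER_UNIT = 3   # 3 consecutive sagittal slices per discovertebral unit
MAX_UNITS = 6         # the 6-DVU score covers at most 6 units


@dataclass
class SliceReading:
    """One sagittal slice through one discovertebral unit (hypothetical type)."""
    # 0/1 for: upper anterior, upper posterior, lower anterior, lower posterior
    quadrants: List[int]
    intense: bool = False   # any lesion on this slice as bright as CSF
    deep: bool = False      # any lesion on this slice with depth >= 1 cm

    def score(self) -> int:
        if len(self.quadrants) != 4 or any(q not in (0, 1) for q in self.quadrants):
            raise ValueError("expected four dichotomous quadrant scores")
        # Assumption: intensity/depth points only apply when a lesion is present.
        if (self.intense or self.deep) and sum(self.quadrants) == 0:
            raise ValueError("intensity/depth scored without a lesion")
        return sum(self.quadrants) + int(self.intense) + int(self.deep)


def unit_score(slices: Sequence[SliceReading]) -> int:
    """Score for one discovertebral unit: maximum 3 x (4 + 1 + 1) = 18."""
    if len(slices) != SLICES_PER_UNIT:
        raise ValueError("a unit is read on exactly 3 consecutive slices")
    return sum(s.score() for s in slices)


def six_dvu_score(units: Sequence[Sequence[SliceReading]]) -> int:
    """Total score over at most 6 units: maximum 6 x 18 = 108."""
    if len(units) > MAX_UNITS:
        raise ValueError("at most 6 discovertebral units are scored")
    return sum(unit_score(u) for u in units)


# Worst case on every slice of every unit reproduces the stated maxima.
worst = SliceReading([1, 1, 1, 1], intense=True, deep=True)
print(unit_score([worst] * 3))            # 18
print(six_dvu_score([[worst] * 3] * 6))   # 108
```

Keeping intensity and depth as per-slice flags, rather than per-quadrant values, mirrors the rule that each contributes at most 1 point per slice regardless of how many quadrants are affected.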

Figure 1. Three examples of the Spondyloarthritis Research Consortium of Canada (SPARCC) magnetic resonance imaging (MRI) index for scoring inflammatory lesions in the spine. Figures represent sagittal STIR sequences through the thoracolumbar spine demonstrating some lesions characterized as deep and intense. A) Example 1. Total score = 2. Level 1: score = 1 (antero-inferior quadrant); Level 2: score = 1 (antero-inferior quadrant). B) Example 2. Total score = 7. Level 1: score = 2 (antero-superior quadrant plus 1 for intensity); Level 2: score = 1 (postero-inferior quadrant); Level 3: score = 4 (antero-superior and both inferior quadrants plus 1 for depth). C) Example 3. Total score = 19. Level 1: score = 5 (3 quadrants plus 1 each for depth and intensity); Level 2: score = 4 (2 quadrants plus 1 each for depth and intensity); Level 3: score = 4 (2 quadrants plus 1 each for depth and intensity); Level 4: score = 6 (4 quadrants plus 1 each for depth and intensity).

Improvement of feasibility: defining the 6-DVU score

We conducted an initial pilot examination of STIR images of the spine in 12 patients with active AS (mean BASDAI of 6.2) at a single site (University of Alberta). There was consensus among the 3 readers at this site that the mean ± SD number of discovertebral units with unequivocal BME was 3.2 ± 3.2 (95% confidence interval [95% CI] 1.2, 5.2). Consequently, in developing a more feasible approach to the scoring of spinal inflammation, we elected to score only 6 discovertebral units for all future exercises. Applying this approach to the scoring of these 12 patients would have captured 84.2% of all affected discovertebral units (32 of a total of 38 affected units). The 6 chosen units would constitute those units demonstrating the most apparent lesions on the STIR sequence and would be the same units scored concurrently on pre- and posttreatment images. Therefore, the data presented below describe the scoring of a maximum of 6 affected discovertebral units per patient, which has been designated the 6-DVU score. Confining the analysis to a maximum of 6 affected units also reduced the time required to score the MRI films from 20–30 minutes to 10–15 minutes per patient.
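The two figures in this paragraph can be checked with a short sketch. The per-patient counts below are hypothetical, chosen only so that they total 38 affected units across 12 patients; the study's actual per-patient distribution is not given in the text. The confidence interval uses the reported mean, SD, and n with a t-based interval, which is an assumption about how the interval was derived (the critical value 2.201 is the two-sided 95% point of the t distribution with 11 degrees of freedom).

```python
import math
from typing import List, Tuple


def capture_fraction(affected_per_patient: List[int], cap: int = 6) -> Tuple[int, int, float]:
    """Units retained when at most `cap` affected units are scored per patient."""
    total = sum(affected_per_patient)
    captured = sum(min(n, cap) for n in affected_per_patient)
    return captured, total, captured / total


def t_interval(mean: float, sd: float, n: int, t_crit: float) -> Tuple[float, float]:
    """Confidence interval for a mean from summary statistics."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Reported totals: 32 of 38 affected units captured.
print(round(100 * 32 / 38, 1))   # 84.2

# Hypothetical counts for 12 patients (NOT study data) that sum to 38.
hypothetical_counts = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 6, 12]
print(capture_fraction(hypothetical_counts))

# Reported summary statistics: mean 3.2, SD 3.2, n = 12.
low, high = t_interval(3.2, 3.2, 12, t_crit=2.201)
print(round(low, 1), round(high, 1))   # 1.2 5.2
```

With any distribution of counts, only patients with more than 6 affected units lose information under the cap, which is why the captured fraction stays high when the mean number of affected units is close to 3.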

MRI reading exercises

A unique MRI study number was allocated to each patient, thereby ensuring blinding to all patient identifiers. Allocation was done by a technologist not connected with the study. Assessment was based on printed film to allow multiple readers to assess scans that were set up on multiple view boxes at the same time. Each film was only identified by the MRI study number and images were read in random order. Pre- and posttreatment images were scored concurrently with the observer blinded to time sequence. For assessment of intraobserver reproducibility, MR images were randomly scored on 2 separate occasions 2 weeks apart after MR images had been allocated different study numbers and rerandomized for the second reading exercise by an independent observer who was not involved in the scoring exercise.

MRI readings were preceded by 2 training exercises conducted by a rheumatologist and a radiologist (WPM and RGWL) at the University of Alberta site to determine feasibility of the scoring method and obtain preliminary data on reproducibility. In addition, a set of 6 reference MRI images from patients with AS was chosen to depict the range of abnormalities typically seen on MRI in patients with clinically active AS. Images (in JPEG format) of these reference AS cases were then sent electronically to the University of Toronto site and the scoring method was discussed by teleconference. Scoring of MRI lesions at the University of Toronto site then ensued without a formal training exercise.

Cohort A readings

Abnormal signals on STIR images from spinal MRI scans in 11 patients with AS and 3 controls with nonspecific back pain were blindly scored by 3 independent readers at the University of Alberta, 2 radiologists (RGWL and SSD) and 1 rheumatologist (WPM), on 2 separate occasions.

Cohort B readings

Abnormal signals on STIR images from spinal MRI scans in 11 patients with AS and 3 controls with nonspecific back pain were blindly scored by the same 3 independent readers at the University of Alberta at baseline and at 24 weeks after randomization of the AS patients to receive either placebo or infliximab (3:8 randomization). Spinal lesions from 9 patients with AS were also scored blindly by 3 readers at the University of Toronto, a radiologist (DS) and 2 rheumatologists (RDI and MS), at baseline and 24 weeks after randomization to receive either infliximab or placebo. The Toronto pre- and posttreatment films were then sent to the University of Alberta and read by 2 independent readers (RGWL and SSD). These exercises allowed the assessment of interobserver reliability of the scoring method across both sites. We also compared the reliability and responsiveness of scores recorded for STIR with those recorded for Gd-DTPA–enhanced MRI.

Statistical analysis

Descriptive statistics (mean, median, SD), frequency histograms, and box-plots with median, interquartile ranges, and maximum and minimum values were used to describe the overall distribution of scores. The intra- and interobserver reproducibility were calculated using analysis of variance to provide an intraclass correlation coefficient (ICC). A 2-way mixed effects model with the observer as a fixed factor was used. A value >0.6 was considered good reproducibility, a value >0.8 represented very good reproducibility, and a value >0.9 represented excellent reproducibility. Reproducibility was also examined using Bland-Altman plots and 95% limits of agreement. These plots allow the visualization of interobserver differences across the whole range of scores. The intrarater variance was used to calculate the smallest detectable difference (SDD) between 2 readings by a single rater for a single patient. The SDD was calculated by multiplying the SD of the differences by 1.96.
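The reproducibility statistics described above can be sketched as follows. The text specifies a 2-way mixed effects model with the observer as a fixed factor but does not state whether the consistency or absolute-agreement form was used, nor whether single or average measures were reported; the sketch assumes the single-measure consistency form, often written ICC(3,1). Function names are hypothetical and the data are illustrative only.

```python
import math
import statistics
from typing import Sequence, Tuple


def icc_consistency(scores: Sequence[Sequence[float]]) -> float:
    """Single-measure consistency ICC from a patients x readers table.

    Assumed form: (MS_rows - MS_error) / (MS_rows + (k - 1) * MS_error),
    from a 2-way ANOVA without replication.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def smallest_detectable_difference(first: Sequence[float], second: Sequence[float]) -> float:
    """SDD as described in the text: 1.96 x SD of the paired differences."""
    diffs = [a - b for a, b in zip(first, second)]
    return 1.96 * statistics.stdev(diffs)


def limits_of_agreement(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Bland-Altman 95% limits: mean difference +/- 1.96 x SD of differences."""
    diffs = [x - y for x, y in zip(a, b)]
    bias, sd = statistics.mean(diffs), statistics.stdev(diffs)
    return bias - 1.96 * sd, bias + 1.96 * sd


# Illustrative readings only: 3 patients scored by 2 readers.
print(icc_consistency([[1, 1], [2, 3], [3, 2]]))   # 0.5

# Illustrative repeat readings by one reader for 4 patients.
print(smallest_detectable_difference([5, 7, 9, 11], [4, 8, 8, 12]))
print(limits_of_agreement([5, 7, 9, 11], [4, 8, 8, 12]))
```

Under the consistency form, a reader who scores systematically higher by a constant amount does not lower the ICC; the Bland-Altman bias term is what exposes that kind of offset.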

The criterion validity of this MRI scoring method for quantifying inflammatory lesions was assessed by comparing changes in the index score with changes in disease activity as quantified by the BASDAI, nocturnal back pain, and C-reactive protein (CRP) levels. This comparison was done by linear regression and Spearman's correlation coefficient analysis.
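For readers unfamiliar with the rank-based coefficient used here, Spearman's correlation is Pearson's correlation computed on ranks, with tied values sharing the average of their rank positions. The sketch below uses only the standard library; the function names and the example values are hypothetical and do not reproduce study data.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of positions i..j, 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical change scores (not study data): MRI change vs. BASDAI change.
mri_change = [12.0, 3.5, 8.0, 0.0, 15.5]
basdai_change = [3.1, 0.4, 2.2, 0.9, 2.8]
print(round(spearman(mri_change, basdai_change), 2))
```

A rank-based coefficient suits change scores of this kind because it does not assume a linear relationship or normally distributed values.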

Two statistical methods were used to assess responsiveness: the effect size and the standardized response mean. Values of 0.20, 0.50, and 0.80 or greater were considered to represent small, moderate, and large degrees of responsiveness, respectively. Discrimination was not assessed because the open-label phase of the clinical trial is still ongoing and treatment codes remain unbroken at this time.
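The text does not give formulas for the two responsiveness statistics. The sketch below assumes the conventional definitions: effect size as mean change divided by the SD of baseline scores, and standardized response mean as mean change divided by the SD of the change scores. Change is taken as pretreatment minus posttreatment so that a reduction in inflammation is positive. The function names and data are hypothetical.

```python
import statistics
from typing import Sequence


def effect_size(pre: Sequence[float], post: Sequence[float]) -> float:
    """Assumed definition: mean change / SD of baseline scores."""
    change = [a - b for a, b in zip(pre, post)]
    return statistics.mean(change) / statistics.stdev(pre)


def standardized_response_mean(pre: Sequence[float], post: Sequence[float]) -> float:
    """Assumed definition: mean change / SD of change scores."""
    change = [a - b for a, b in zip(pre, post)]
    return statistics.mean(change) / statistics.stdev(change)


def responsiveness_label(value: float) -> str:
    """Thresholds stated in the text: 0.20 small, 0.50 moderate, 0.80 large."""
    if value >= 0.80:
        return "large"
    if value >= 0.50:
        return "moderate"
    if value >= 0.20:
        return "small"
    return "below small"


# Hypothetical 6-DVU scores for 3 patients (not study data).
pre, post = [10, 20, 30], [5, 10, 15]
es = effect_size(pre, post)
srm = standardized_response_mean(pre, post)
print(es, srm)                        # 1.0 2.0
print(responsiveness_label(es))       # large
```

The two statistics can diverge: the effect size is sensitive to how heterogeneous patients are at baseline, whereas the standardized response mean depends on how uniformly they change.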

RESULTS

Distribution of scores

The detailed distribution of scores for STIR sequences in the spines of all 22 patients with AS seen at the University of Alberta is shown in Table 1. A mean of 72.2 affected discovertebral units was recorded by the 3 readers, of which 47.7 (66.0%) were in the thoracic spine, 18 (24.9%) were in the lumbar spine, and 6.5 (9.0%) were in the cervical spine. Affected cervical segments were only evident in 5 patients and contributed a mean score of 6.3 (range 1.5–11.7) out of a maximum score of 108 for total spinal score. Involvement of only cervical segments was noted in 2 patients, in whom the mean scores were 1.7 and 11.7. Slightly more lesions were noted in the anterior compared with the posterior quadrants of the discovertebral units. The 3 readers assigned intensity scores in 3 (13.6%), 5 (22.7%), and 3 (13.6%) patients, respectively, and the total intensity score for all 22 patients was 5, 10, and 5, respectively. A score for depth was assigned in 6 (27.3%), 3 (13.6%), and 6 (27.3%) patients by the 3 readers, respectively, and the total depth score assigned for all 22 patients was 14, 16, and 24, respectively.

Table 1. Distribution of spinal STIR scores of 22 patients with ankylosing spondylitis (cohorts A and B, University of Alberta site) as recorded by 3 readers according to the Spondyloarthritis Research Consortium of Canada magnetic resonance imaging scoring scheme*

Parameter                        Mean ± SD      Median (25/75 percentiles)   Range
Total 6-DVU score                15.1 ± 11.9    14.75 (3.25–24.75)           0–46
Total anterior quadrant score    8.6 ± 8.4      7.25 (0.25–16.75)            0–29
Total posterior quadrant score   6.0 ± 6.2      5.5 (0–9.5)                  0–30
Median DVU score                 2.1 ± 2.1      1.75 (0–3.75)                0–8
Minimum DVU score                0.46 ± 0.81    0 (0–1)                      0–2.5
Maximum DVU score                5.75 ± 4.1     6.0 (1.75–9.0)               0–14

* The scheme scores up to 6 affected discovertebral units per patient (the 6-DVU score). DVU = discovertebral unit.

Reproducibility of scores

Intraobserver reproducibility

Overall, intraobserver reproducibility was excellent for the 6-DVU score (ICC 0.93–0.98) and maximum discovertebral unit scores (ICC 0.85–0.97) recorded per patient in cohort A by the 3 readers at the University of Alberta site (Table 2). ICC values for the 6-DVU score were 0.95, 0.98, and 0.94 for the 3 readers, respectively, when the 3 controls were included in the analysis. Mean 6-DVU scores for the controls were only 0, 0.3, and 0 for the 3 readers, respectively, indicating appropriate definition of abnormal STIR signal. The mean percentage intraobserver concordance for the selection of affected discovertebral units was 78.8%, 87.9%, and 80.3% for the 3 readers, respectively (range 50–100%). SDD values ranged from 6.0 to 8.7 for the 6-DVU score.

Table 2. Intraobserver reproducibility of spinal STIR scores in 11 patients with ankylosing spondylitis (cohort A, University of Alberta site) as recorded by 3 readers according to the Spondyloarthritis Research Consortium of Canada magnetic resonance imaging scoring scheme*

                      Intraobserver ICC
Parameter             Reader 1   Reader 2   Reader 3
6-DVU score           0.93       0.98       0.93
Median DVU score      0.92       0.98       0.90
Maximum DVU score     0.85       0.97       0.92

* Up to 6 discovertebral units are scored per patient (6-DVU score). ICC = intraclass correlation coefficient. P < 0.0001 for all values.

Interobserver reproducibility of status score

For each of the 3 readers, status scores were available for 22 patients at the University of Alberta site and represented the mean of the 2 values recorded for cross-sectional cohort A patients (n = 11) and the baseline (pretreatment) value for cohort B patients (n = 11). Interobserver reproducibility for STIR status score was very good for the 6-DVU score (ICC 0.80) and also for the maximum discovertebral unit score (ICC 0.86) (Table 3). The median, interquartile range, and maximum plus minimum values for the 6-DVU STIR status score were comparable between the 3 readers (data not shown). Bland-Altman plots of interobserver differences plotted against the mean of the interobserver scores showed that reader 1 tended to score higher than readers 2 and 3 (Figure 2). Interobserver reproducibility for the 6-DVU Gd-DTPA status score was also very good (ICC 0.83). The mean percentage interobserver concordance for the selection of affected discovertebral units was 78.8% between readers 1 and 2, 77.3% between readers 1 and 3, and 97.0% between readers 2 and 3, the latter being the 2 radiologists.

Table 3. Interobserver reproducibility of the 6-DVU STIR status and change scores in patients with ankylosing spondylitis as recorded by 3 readers at the University of Alberta and 3 readers at the University of Toronto according to the Spondyloarthritis Research Consortium of Canada magnetic resonance imaging index*

                           Interobserver ICC
Parameter                  University of Alberta   University of Toronto   Combined
6-DVU STIR score status    0.80                    0.73                    0.79
6-DVU STIR score change    0.66                    0.83                    0.82

* Up to 6 discovertebral units are scored per patient (6-DVU score). ICC = intraclass correlation coefficient. P < 0.0001 for all values.

Figure 2. Bland-Altman plot with 95% limits of agreement for the 6-discovertebral unit STIR scores recorded by readers 1 and 2 (at the University of Alberta site) in 22 patients with ankylosing spondylitis.

Interobserver reproducibility for the 6-DVU STIR status score in cohort B patients (n = 9) was good (ICC 0.73) for the 3 readers at the University of Toronto site despite no formal training exercise (Table 3). Interobserver reproducibility was also good (ICC 0.79) for the 6-DVU status score when the same MRI films were read by readers at both sites (Table 3).

Interobserver reproducibility of change scores

Interobserver reproducibility for the 6-DVU STIR change scores for patients with AS in cohort B who were randomized to receive either placebo or infliximab for 24 weeks was good for readers at the University of Alberta (ICC 0.66) and very good for readers at the University of Toronto (ICC 0.83) (Table 3). Reproducibility of Gd-DTPA change scores for Alberta readers was somewhat better (ICC 0.82). Reproducibility of STIR change scores was also very good when readers at both sites analyzed the same pre- and posttreatment spinal STIR images (ICC 0.82).

Responsiveness of scoring method

Responsiveness of the spinal scoring method was large regardless of the MRI method used and was similar at both sites despite the fact that the groups included placebo patients (Table 4). When the 6-DVU STIR scores on pre- and posttreatment images from patients at both sites (n = 20) were analyzed, the standardized response mean was 0.87. Twelve (60%) of 20 patients had a reduction in spinal inflammation score that was greater than the calculated mean SDD for the 3 readers at the University of Alberta.

Table 4. Responsiveness of the 6-DVU STIR and Gd-DTPA scores in patients with ankylosing spondylitis in cohort B randomized to placebo:infliximab (3:8) for 24 weeks*

                                                                  Responsiveness
MRI parameter          Site    Pre-treatment   Post-treatment    ES      SRM
6-DVU score STIR       A       13.4 ± 11.4     4.7 ± 5.8         0.77    0.82
6-DVU score Gd-DTPA    A       19.4 ± 16.1     6.4 ± 8.7         0.90    0.90
6-DVU score STIR       T       23.7 ± 16.2     9.3 ± 10.4        0.95    0.80
6-DVU score STIR       A + T   17.6 ± 14.6     7.1 ± 8.0         0.76    0.87

* Values are the mean ± SD unless indicated otherwise. 6-DVU = 6-discovertebral units; Gd-DTPA = gadolinium diethylenetriaminepentaacetic acid; MRI = magnetic resonance imaging; ES = effect size; SRM = standardized response mean; A = University of Alberta; T = University of Toronto.

Criterion validity of scoring method

The correlation between change in the BASDAI (7) and change in spine STIR MRI scores 24 weeks following randomization to infliximab or placebo groups was not significant when the data from all patients at both sites were analyzed (Spearman correlation = 0.32; 95% CI −0.16, 0.68) but was significant when only the subset of data from Toronto patients were analyzed (Spearman correlation = 0.90; 95% CI 0.32, 0.96; P = 0.002). Similarly, although there was no significant correlation between overall change in nocturnal pain and spinal STIR scores, there was a significant correlation when only the subset of data from Toronto patients were analyzed (Spearman correlation = 0.86; 95% CI 0.42, 0.97; P = 0.005). A significant correlation was evident between change in spine STIR MRI scores and change in CRP values at 24 weeks for the combined site data (Spearman correlation = 0.79; 95% CI 0.53, 0.92; P < 0.001).

DISCUSSION

We have developed an outcome tool for scoring inflammation in the spine by MRI that appears to meet the standards of feasibility, truth, and discrimination, which have been used by Outcome Measures in Rheumatology Clinical Trials (OMERACT) as the key elements of a filter through which outcomes are evaluated (8). We were able to meet these standards by incorporating several novel methodologic approaches to the scoring of MRI abnormalities, namely the use of reference films; objective definitions of abnormal STIR signal, high-intensity signal, and pronounced depth of signal; and development of a simplified scoring sheet. These approaches improved both our ability to detect and discriminate abnormal from normal MRI signal and to record abnormalities within a reasonable time frame per patient (10–15 minutes). These results were reinforced by the reliability of scores recorded at the University of Toronto site despite no formal training exercise for the readers.

A previous study has reported that the mean ± SD number of discovertebral units with abnormal lesions observed on STIR MRI in a series of 39 patients with AS and active disease was 3.7 ± 6.0 (mean ± SD BASDAI score of 6.4 ± 1.4) (9). A mean of 1–1.5 discovertebral units were involved in each of the 3 segments of the spine. These results are consistent with those in the present analysis and, together with the high degree of intra- and interobserver concordance in selecting affected discovertebral units evident in our study, reinforce the view that examination of all discovertebral units may be unnecessary and feasibility may be improved by focusing solely on those lesions that are most apparent when first viewing the MRI film. Our data show that most inflammatory lesions are captured by limiting the scoring to only 6 affected discovertebral units. Thus, feasibility appears to be improved compared with a prior scoring method for spinal inflammation that analyzed and recorded lesions in each discovertebral unit (5), although we acknowledge that this will require further examination in future scoring exercises. A minority of patients had lesions that were considered to have either intensity or depth, and the contribution of these scores to the overall spinal score was minor. Although assignment of scores for intensity and depth has obvious face validity, their inclusion in the scoring system may not add to the metrologic properties of our instrument. This will require further examination in a broader spectrum of disease.

Reproducibility was good for status scores between observers at the individual sites and also for images analyzed at both sites. Reproducibility between readers was not as good for change scores at the University of Alberta site despite greater familiarity with the scoring method. This finding could reflect both the less active disease at baseline and the lesser degree of change following treatment that appeared evident in the patients with AS at this site compared with Toronto patients. Responsiveness was large, particularly when Toronto patients were analyzed, despite the fact that placebo-treated patients were included in the analysis. The primary limitation of our study was that we were not able to assess the discriminant properties of our instrument. The open-label phase of the infliximab trial is ongoing and treatment codes remain unbroken.

Gadolinium augmentation is costly and virtually doubles the scan time to 1 hour. This is particularly uncomfortable for patients with AS. Our data show that contrast enhancement contributes only slight improvement to reproducibility and responsiveness, suggesting that the added expense and inconvenience may not be worthwhile. A previous study that validated an alternative method for scoring spinal inflammation came to the same conclusion (5).

There is no objective gold standard for measuring disease activity in AS. The assessment of criterion validity has therefore been performed by analyzing change in STIR scores with change in symptomatic measures and acute phase reactants. One report has described a relatively weak correlation between changes in the BASDAI score and changes on both gadolinium-enhanced (r = 0.49) and STIR (r = 0.6) MRI in 20 patients randomized to infliximab/placebo for 12 weeks (5). We were unable to confirm this in our overall cohort of 20 patients, although a relatively strong correlation was noted in the Toronto cohort (r = 0.90). Three patients in the Alberta cohort reported reductions in the BASDAI whereas significant change was not evident on MRI. However, treatment allocation was not available and so it is not known if these responses represented placebo effects. A significant correlation was noted between change in CRP levels and change in spinal STIR scores for the overall cohort.

We are aware of only 1 published scoring scheme for measuring spinal inflammation by MRI (5). This method simultaneously scores erosions and bone marrow inflammation on both sides of the vertebral disc. The approach to scoring lesions confined to 1 vertebral endplate in the discovertebral unit is not specified and scoring is confined to the sagittal plane so that the assessment of the extent of inflammation in the coronal plane is not possible. Higher scores are assigned for erosions than for marrow edema. The instrument performed well in a placebo-controlled trial of infliximab in patients with AS where sufficient reproducibility and sensitivity to change was evident to discriminate between active treatment and placebo groups.

The assessment of feasibility, truth, and discrimination of our instrument requires further validation across sites as has been organized in previous OMERACT exercises. However, our data reinforce the conclusions of an earlier study (5) that MRI is a responsive tool and should be included as an outcome instrument in clinical trials of novel therapeutics for AS. Furthermore, MRI is currently the only objective parameter for measuring disease activity that has been validated by both histopathology and the development of structural damage on plain radiography in patients with AS (10, 11). Further validation of this scoring methodology will therefore be necessary before more cost-effective approaches can be developed.

  • 1

    Walter P. Maksymowych is a Senior Scholar of the Alberta Heritage Foundation for Medical Research.

REFERENCES

  1. Spoorenberg A, de Vlam K, van der Linden S, Dougados M, Mielants H, van de Tempel H, et al. Radiological scoring methods in ankylosing spondylitis: reliability and change over 1 and 2 years. J Rheumatol 2004; 31: 125–32.
  2. MacKay K, Mack C, Brophy S, Calin A. The Bath Ankylosing Spondylitis Radiology Index (BASRI): a new, validated approach to disease assessment. Arthritis Rheum 1998; 41: 2263–70.
  3. Wanders AJ, Landewe RB, Spoorenberg A, Dougados M, van der Linden S, Mielants H, et al. What is the most appropriate radiologic scoring method for ankylosing spondylitis? A comparison of the available methods based on the Outcome Measures in Rheumatology Clinical Trials filter. Arthritis Rheum 2004; 50: 2622–32.
  4. Maksymowych WP, Lambert RG. MR imaging of the sacroiliac joints: How? Why? When? So what? J Clin Rheumatol 2000; 6: 305–8.
  5. Braun J, Baraliakos X, Golder W, Brandt J, Rudwaleit M, Listing J, et al. Magnetic resonance imaging examinations of the spine in patients with ankylosing spondylitis, before and after successful therapy with infliximab: evaluation of a new scoring system. Arthritis Rheum 2003; 48: 1126–36.
  6. Van der Linden S, Valkenburg HA, Cats A. Evaluation of diagnostic criteria for ankylosing spondylitis: a proposal for modification of the New York criteria. Arthritis Rheum 1984; 27: 361–8.
  7. Garrett S, Jenkinson T, Kennedy LG, Whitelock H, Gaisford P, Calin A. A new approach to defining disease status in ankylosing spondylitis: the Bath Ankylosing Spondylitis Disease Activity Index. J Rheumatol 1994; 21: 2286–91.
  8. Bellamy N. Clinimetric concepts in outcome assessment: the OMERACT filter. J Rheumatol 1999; 26: 948–50.
  9. Braun J, Baraliakos X, Golder W, Brandt J, Rudwaleit M, Sieper J, et al. Analyzing acute spinal changes in ankylosing spondylitis: a systematic comparison of conventional x-rays with magnetic resonance imaging (MRI) using established and new scoring systems [abstract]. Ann Rheum Dis 2003; 62 Suppl 1: 251.
  10. Bollow M, Fischer T, Reisshauer H, Backhaus M, Sieper J, Hamm B, et al. Quantitative analyses of sacroiliac biopsies in spondyloarthropathies: T cells and macrophages predominate in early and active sacroiliitis: cellularity correlates with the degree of enhancement detected by magnetic resonance imaging. Ann Rheum Dis 2000; 59: 135–40.
  11. Oostveen J, Prevo R, den Boer J, van de Laar M. Early detection of sacroiliitis on magnetic resonance imaging and subsequent development of sacroiliitis on plain radiography: a prospective, longitudinal study. J Rheumatol 1999; 26: 1953–8.