Dr. Maksymowych is a Senior Scholar of the Alberta Heritage Foundation for Medical Research.
Spondyloarthritis Research Consortium of Canada magnetic resonance imaging index for assessment of sacroiliac joint inflammation in ankylosing spondylitis
Article first published online: 5 OCT 2005
Copyright © 2005 by the American College of Rheumatology
Arthritis Care & Research
Volume 53, Issue 5, pages 703–709, 15 October 2005
How to Cite
Maksymowych, W. P., Inman, R. D., Salonen, D., Dhillon, S. S., Williams, M., Stone, M., Conner-Spady, B., Palsat, J. and Lambert, R. G. W. (2005), Spondyloarthritis Research Consortium of Canada magnetic resonance imaging index for assessment of sacroiliac joint inflammation in ankylosing spondylitis. Arthritis & Rheumatism, 53: 703–709. doi: 10.1002/art.21445
- Issue published online: 5 OCT 2005
- Manuscript Accepted: 24 JAN 2005
- Manuscript Received: 26 AUG 2004
- Magnetic resonance imaging
- Ankylosing spondylitis
To develop a feasible magnetic resonance imaging (MRI)–based scoring system for sacroiliac joint inflammation in patients with ankylosing spondylitis (AS) that requires minimal scan time, does not require contrast enhancement, evaluates lesions separately at each articular surface, and limits the number of sacroiliac images that are scored.
A scoring method based on the assessment of increased signal denoting bone marrow edema on T2-weighted STIR sequences was used. MRI films were assessed blindly in random order at 2 sites by multiple readers. Intra- and interreader reliability was assessed by intraclass correlation coefficient (ICC); the 24-week response of patients with AS randomized to placebo:infliximab (3:8) was assessed by effect size and standardized response mean. The reliability and responsiveness of the scoring method were compared for STIR and gadolinium diethylenetriaminepentaacetic acid (Gd-DTPA)–enhanced MRI sequences.
We scanned 11 patients with AS with clinically active disease and 11 additional patients randomized to the trial of infliximab therapy. Intraobserver ICC for total sacroiliac joint STIR score ranged from 0.90 to 0.98 (P < 0.00001) and interobserver ICC for combined readers from the 2 sites was 0.84 (P < 0.0001). ICC for change scores was lower for STIR (ICC 0.53) than for Gd-DTPA–enhanced sequences (ICC 0.79). Responsiveness was poor, although fusion was evident in one-third of patients who received treatment (placebo:infliximab) and inflammation scores were low.
The Spondyloarthritis Research Consortium of Canada MRI index is a feasible and reproducible index for measuring sacroiliac joint inflammation in patients with AS.
Ankylosing spondylitis (AS) is a relatively common form of arthritis that, until recently, has had limited therapeutic options. Although recent advances in the development, validation, and standardization of clinical outcome instruments have greatly facilitated the development of new therapeutics for this disease, these self-administered instruments are subjective and primarily assess symptoms (1, 2). There are currently no established validated instruments that provide objective measures of disease activity. Acute phase reactants are elevated in only 40% of patients with disease confined to the axial spine and correlate poorly with the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) (3, 4). Radiologic instruments such as plain radiography and computed tomography assess structural damage rather than disease activity, whereas nuclear isotopic imaging lacks resolution and specificity (5, 6).
Magnetic resonance imaging (MRI) is a relatively new imaging tool that allows detailed visualization of soft tissues. It was first introduced as a tool in the evaluation of patients with AS a decade ago (7) and since then its use has further advanced our understanding of the pathogenesis of disease. In particular, the use of fat suppression techniques (allowing visualization of lesions within bone marrow that may be obscured by marrow fat on other MRI sequences) has shown that one of the earliest demonstrable lesions in the sacroiliac joints of patients with inflammatory back pain is subchondral bone marrow edema (BME) (8, 9). Prospective followup of these patients has also shown that these lesions identified by MRI may have predictive validity for the development of the typical plain radiographic features of sacroiliitis (10, 11).
A further advance has been the use of gadolinium contrast enhancement in MRI, which is used to delineate regions of increased vascularity and altered capillary permeability typically associated with BME and inflammation. Dynamic imaging constitutes a further refinement whereby the rate and maximal uptake of gadolinium can be assessed by consecutive scans of regions of interest within the joint that are defined subjectively. This has led some investigators to develop grading schemes for assessing the degree of inflammation evident on MRI of the sacroiliac joint (12). A significant drawback of scoring lesions by dynamic MRI, however, has been the unreliability of defining the region of interest on successive scans.
The construct validity of the sacroiliac joint lesion observed on MRI has been addressed by demonstrating correlations between the grade of inflammation on MRI and both the severity of patient symptoms and the response to intraarticular steroid injections (13). The MRI grade of inflammation has also been shown to correlate with the histopathologic grade of inflammation observed in sacroiliac joint biopsies (14).
These developments have led some investigators to propose that MRI be used to provide an objective measure of disease activity in AS. To date, there have been no detailed reports describing the validation of a scoring system for inflammation of the sacroiliac joints. The Spondyloarthritis Research Consortium of Canada (SPARCC) has begun a systematic program of developing feasible and responsive MRI-based outcome tools for scoring inflammation and structural damage in both spine and sacroiliac joints. We describe the development and validation of a simplified scoring system for the assessment of disease activity in 2-dimensional planes in the sacroiliac joints that has been validated by investigators at 2 Canadian sites, the Universities of Alberta and Toronto.
PATIENTS AND METHODS
Patients and study protocol.
We studied 2 cohorts of patients with AS as defined by the modified New York criteria (15). Cohort A was a cross-sectional cohort of 11 patients with AS (8 men, mean age 38.9 years [range 29–66 years], mean disease duration 12.7 years [range 2–36 years], mean BASDAI score 5.9) who attended the outpatient clinic in the rheumatic disease unit at the University of Alberta. All patients were receiving nonsteroidal antiinflammatory drug therapy and were considered candidates for anti–tumor necrosis factor α therapy.
Cohort B was a group of 11 patients who had severe, active disease as defined by a BASDAI score ≥4 and who had been randomized to receive either placebo or infliximab (3:8) in a 24-week, double-blind, placebo-controlled trial. These patients were recruited at the University of Alberta and comprised 8 men and 3 women (mean age 45.1 years [range 36–59 years], mean disease duration 19.2 years [range 11–38 years], mean BASDAI score at baseline 6.2). These patients were part of a larger multicenter international study of 275 patients, although MRI of the sacroiliac joints was not included in the study protocol. Consequently, images of the sacroiliac joints were only available at the University of Alberta site.
Cohort A underwent MRI at a single time point, whereas Cohort B underwent MRI at baseline and 24 weeks after randomization. Either placebo or infliximab 5 mg/kg was administered intravenously at baseline, 2 weeks, 6 weeks, and every 6 weeks thereafter. The study was approved by the University of Alberta ethics committee.
Magnetic resonance imaging.
MRI was performed with 1.5 Tesla (Siemens, Erlangen, Germany) systems using appropriate surface coils. Sequences were acquired in a coronal plane tilted parallel to the long axis of the sacroiliac joint (SI joint) with 4-mm slice thickness and 12 slices acquired. Sequences were as follows: T1-weighted spin echo (SE; time to recovery [TR] 517–618 msec, time to echo [TE] 13 msec) and STIR (TR 2,720–3,170 msec, time to inversion 140 msec, TE 38–61 msec). Patients in cohort B also had T1-weighted SE sequences performed after intravenous administration of gadolinium diethylenetriaminepentaacetic acid (Gd-DTPA), 0.1 mmole/kg body weight. Detailed protocol description and multiple scout images allowed repetitive acquisition of SI joint sequences in near identical anatomic sites and angles for the posttreatment images. The specific MRI parameters for acquiring sacroiliac images are provided on our website (available at www.altarheum.com/research.html).
Scoring of MRI lesions.
Our scoring method for active inflammatory lesions in the SI joint relied on the use of a T2-weighted sequence that incorporates suppression of normal marrow fat signal. In other sequences, signal from marrow fat frequently obscures signal emanating from marrow edema associated with inflammation. Consequently, the use of fat suppression improves sensitivity for detection of abnormal water content.
Scoring of the SI joints was confined to those coronal slices depicting the synovial portion of the joint. In a preliminary overview of SI joint MR images from other patients with AS, the synovial portion was consistently evident in 6 consecutive coronal slices. Of the 12 acquisitions from posterior to anterior, this was typically slices 4 to 9. We therefore scored 6 consecutive coronal slices from posterior to anterior. T1-weighted SE images were included for anatomic reference only and were not scored. All lesions within the iliac bone and within the sacrum up to the sacral foramina were scored. Increased signal within the sacroiliac joint space or in the ligamentous portion of the joint was not scored.
Definition of abnormal lesion on STIR sequence.
Sacral interforaminal bone marrow signal formed the reference for assignment of normal signal in the joint. Three non-AS control (mechanical back pain) images and a set of reference AS images were included to facilitate the designation of abnormal increased signal.
Scoring of depth and intensity.
The signal from presacral blood vessels served as the reference for scoring a lesion as intense. A lesion was graded as deep if there was a homogeneous and unequivocal increase in signal extending over at least 1 cm from the articular surface. Assessment of depth was made possible by including a scale on the image.
Figure 1 illustrates 2 examples of the scoring method. Each SI joint is divided into 4 quadrants: upper iliac, lower iliac, upper sacral, and lower sacral. The presence of increased signal on STIR in each of these 4 quadrants was scored on a dichotomous basis, where 1 = increased signal and 0 = normal signal. The maximum score for abnormal signal in the 2 SI joints of 1 coronal slice was therefore 8. Joints that included a lesion exhibiting intense signal were each given an additional score of 1 per slice that demonstrated this feature. Similarly, each joint that included a lesion demonstrating continuous increased signal of depth ≥1 cm from the articular surface was also given an additional score of 1. This brought the maximal score for a single coronal slice to 12. The scoring was repeated in each of the 6 consecutive coronal slices leading to a maximum score of 72.
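The arithmetic of the scoring scheme described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' scoring sheet: the `JointSlice` record and the function names are hypothetical, and the readers' judgments about increased signal, intensity, and depth are simply taken as inputs.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class JointSlice:
    """One SI joint on one coronal STIR slice (hypothetical record layout)."""
    # 0/1 flags in the order: upper iliac, lower iliac, upper sacral, lower sacral
    quadrants: Tuple[int, int, int, int]
    intense: bool = False  # joint contains a lesion with intense signal on this slice
    deep: bool = False     # joint contains a lesion with depth >= 1 cm on this slice


def joint_score(joint: JointSlice) -> int:
    """Score one joint on one slice: 0-4 for quadrants, +1 intense, +1 deep."""
    if len(joint.quadrants) != 4 or any(q not in (0, 1) for q in joint.quadrants):
        raise ValueError("expected four dichotomous (0/1) quadrant scores")
    return sum(joint.quadrants) + int(joint.intense) + int(joint.deep)


def slice_score(left: JointSlice, right: JointSlice) -> int:
    """Score both SI joints on one coronal slice (maximum 12)."""
    return joint_score(left) + joint_score(right)


def total_score(slices: Sequence[Tuple[JointSlice, JointSlice]]) -> int:
    """Sum the slice scores over the 6 consecutive coronal slices (maximum 72)."""
    if len(slices) != 6:
        raise ValueError("the index is defined over 6 consecutive coronal slices")
    return sum(slice_score(left, right) for left, right in slices)


if __name__ == "__main__":
    worst = JointSlice((1, 1, 1, 1), intense=True, deep=True)
    print(slice_score(worst, worst))          # 12
    print(total_score([(worst, worst)] * 6))  # 72
```

A joint with all 4 quadrants abnormal plus an intense and a deep lesion scores 6, which reproduces the stated maxima of 12 per slice and 72 overall.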
MRI reading exercises.
A unique MRI study number was allocated for each patient by a technologist unconnected with the study, thereby ensuring blinding to all patient demographics. Assessment was based on printed film to allow multiple readers to assess scans that were set up on multiple view boxes at the same time. Each film was only identified by the MRI study number and images were read in random order. Pre- and posttreatment images were scored concurrently with the observer blinded to time sequence. For assessment of intraobserver reproducibility, MR images were randomly scored on 2 separate occasions 2 weeks apart after allocation of different MRI study numbers for the second reading exercise and rerandomization of time sequence by an independent observer who was not involved in the scoring exercise.
MRI readings were preceded by 2 training exercises conducted by a rheumatologist and a radiologist (WPM and RGWL) at the University of Alberta site to determine feasibility of the scoring method and obtain preliminary data on reproducibility. In addition, a set of 6 reference MR images from other patients with AS were chosen, depicting the range of abnormalities typically seen on MRI in patients with clinically active AS. Electronic images of these reference AS cases were then sent to readers at a second site, the University of Toronto, and the scoring method was discussed by teleconference. Scoring of MRI lesions at the University of Toronto site then ensued without a formal training exercise.
Cohort A readings.
SI joint lesions in MR images from 11 patients with AS and 3 controls with non-specific back pain were blindly scored by 3 independent readers, 2 radiologists (RGWL and SSD) and 1 rheumatologist (WPM), at the University of Alberta at 2 time points. In addition, these MR images were scored blindly at a single time point by 2 independent readers at the University of Toronto, a radiologist (DS) and a rheumatologist (RDI). These exercises allowed the assessment of interobserver reliability of the scoring method across both sites.
Cohort B readings.
SI joint STIR and Gd-DTPA–enhanced lesions in MR images from 11 patients with AS and 3 controls with non-specific back pain were blindly scored by the same 3 independent readers at the University of Alberta at baseline and at 24 weeks after randomization of the patients to either placebo or infliximab (3:8). These exercises allowed us to examine not only the responsiveness of the scoring method but also the relative reliability and responsiveness of the 2 imaging techniques.
Statistical analysis.
Descriptive statistics (mean, median, SD), frequency histograms, and box plots with median, interquartile ranges, and maximum and minimum values were used to describe the overall distribution of scores. The intra- and interobserver reproducibility were calculated using analysis of variance to provide an intraclass correlation coefficient (ICC). A 2-way mixed effects model with the observer as a fixed factor was used. A value >0.6 was designated as representing good reproducibility, >0.8 represented very good reproducibility, and >0.9 represented excellent reproducibility. The P value for the significance of the F statistic for each ICC is presented. Reproducibility was also examined using Bland-Altman plots and 95% limits of agreement. The intrarater variance was used to calculate the smallest detectable difference (SDD) between 2 readings by a single rater for a single patient and was calculated by multiplying the SD of the differences by 1.96.
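The reliability statistics named above can be illustrated with a short Python sketch. Several choices here are assumptions rather than details given in the text: the ICC is computed in its single-measure consistency form for a 2-way mixed model (often written ICC(3,1)), the SD of the differences is taken as the sample SD, and all function names are hypothetical.

```python
from statistics import mean, stdev


def icc_two_way_mixed(scores):
    """ICC(3,1) from a two-way ANOVA; scores[i][j] = patient i, reader j."""
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 patients and >= 2 readers")
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between readers
    ms_rows = ss_rows / (n - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def describe_icc(icc):
    """Apply the reproducibility thresholds stated in the text."""
    if icc > 0.9:
        return "excellent"
    if icc > 0.8:
        return "very good"
    if icc > 0.6:
        return "good"
    return "below good"


def smallest_detectable_difference(first, second):
    """SDD = 1.96 x SD of the paired differences between 2 readings."""
    diffs = [a - b for a, b in zip(first, second)]
    return 1.96 * stdev(diffs)


def limits_of_agreement(first, second):
    """Bland-Altman 95% limits: mean difference +/- 1.96 x SD of differences."""
    diffs = [a - b for a, b in zip(first, second)]
    half_width = 1.96 * stdev(diffs)
    return mean(diffs) - half_width, mean(diffs) + half_width
```

Because the consistency form ignores a constant offset between readers, a reader who always scores 1 point higher than a colleague still yields an ICC of 1; an absolute-agreement form would penalize that offset.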
The effect size (ES) and the standardized response mean (SRM) were used to assess responsiveness. Values of 0.20, 0.50, and 0.80 or greater were considered to represent small, moderate, and large degrees of responsiveness, respectively. Discrimination was not assessed because the open-label phase of the clinical trial is ongoing and treatment codes remain unbroken at this time.
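As a rough illustration of the two responsiveness statistics, the sketch below uses their conventional definitions, which the text does not spell out: ES as the mean change divided by the SD of the baseline scores, and SRM as the mean change divided by the SD of the change scores. Taking change as baseline minus followup (so that a fall in inflammation score is positive) is also an assumption, as are the function names and the example data.

```python
from statistics import mean, stdev


def _changes(baseline, followup):
    if len(baseline) != len(followup) or len(baseline) < 2:
        raise ValueError("need paired scores for at least 2 patients")
    # Assumed sign convention: positive change = lower score at followup
    return [b - f for b, f in zip(baseline, followup)]


def effect_size(baseline, followup):
    """ES = mean change / SD of baseline scores (conventional definition)."""
    return mean(_changes(baseline, followup)) / stdev(baseline)


def standardized_response_mean(baseline, followup):
    """SRM = mean change / SD of change scores (conventional definition)."""
    changes = _changes(baseline, followup)
    return mean(changes) / stdev(changes)


def describe_responsiveness(value):
    """Apply the 0.20 / 0.50 / 0.80 thresholds stated in the text."""
    magnitude = abs(value)
    if magnitude >= 0.80:
        return "large"
    if magnitude >= 0.50:
        return "moderate"
    if magnitude >= 0.20:
        return "small"
    return "less than small"


if __name__ == "__main__":
    # Made-up scores for 4 patients, for illustration only
    baseline = [10, 6, 8, 4]
    followup = [7, 5, 6, 2]
    print(round(effect_size(baseline, followup), 2))
    print(round(standardized_response_mean(baseline, followup), 2))
```

The two statistics differ only in the denominator, so SRM rewards changes that are consistent across patients while ES scales the change against the spread of scores at baseline.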
RESULTS
Distribution of scores.
Scores for lesions in the sacroiliac joints were distributed towards the lower end of the scoring range (0–72) (Table 1). These results were based on data from 22 patients, 11 patients in cohort A and baseline data from 11 patients in cohort B. Five patients were noted to have fused sacroiliac joints, which likely accounts for the clustering of scores at the lower end of the range. Table 1 also shows that scoring only the single most severely affected coronal slice (maximum coronal slice score) resulted in a median score (2.75) that was likewise towards the lower end of the scoring range per coronal slice (0–12).
| Parameter | Mean ± SD | Median (25/75 percentiles) | Range |
|---|---|---|---|
| Total score | 9.6 ± 10.1 | 7.0 (1.0–14.25) | 0–39.5 |
| Right SI joint score | 5.6 ± 6.0 | 3.25 (1.0–8.0) | 0–23 |
| Left SI joint score | 3.9 ± 5.2 | 2.0 (0–5.5) | 0–20.5 |
| Median coronal slice score | 1.5 ± 1.8 | 1.0 (0–2.5) | 0–6.5 |
| Minimum coronal slice score | 0.6 ± 1.3 | 0.0 (0–0) | 0–5.5 |
| Maximum coronal slice score | 2.9 ± 2.3 | 2.75 (1–4) | 0–9 |
Reproducibility of scores.
Overall, intraobserver reproducibility was excellent not only for the total score (0.90–0.98) but also for the distribution of scores between the individual coronal slices of the SI joint, with similar reproducibility for median, minimum, and maximum coronal slice scores (Table 2). Intraobserver reproducibility was even better for the 2 radiologists (readers 2 and 3). ICC values were the same when the 3 controls were included in the analysis (data not shown). Mean total SI joint scores for the controls were 1.2, 1.7, and 2.7 for the 3 readers, respectively, indicating appropriate definition of abnormal STIR signal. SDD values ranged from 4.7 to 12.9 for total SI joint score and were smaller for the 2 radiologists (5.4 and 4.7 for readers 2 and 3, respectively) than for the rheumatologist (12.9).
| Parameter | Reader 1 | Reader 2 | Reader 3 |
|---|---|---|---|
| Total SI joint score | 0.90† | 0.98† | 0.98† |
| Median SI joint coronal slice score | 0.90† | 0.98† | 0.94† |
| Minimum SI joint coronal slice score | 0.74‡ | 0.98† | 0.98† |
| Maximum SI joint coronal slice score | 0.85† | 0.96† | 0.92† |
Interobserver reproducibility (single site) of status score.
Figure 2 shows that the median, interquartile range, and maximum and minimum values for the SI joint lesion scores from 22 patients recorded at the University of Alberta were comparable between the 3 readers. The scores represent the mean of the 2 values recorded for cross-sectional cohort A patients (n = 11) and the baseline (pretreatment) values for cohort B patients (n = 11). Overall, interobserver reproducibility was very good not only for the total SI joint score (ICC 0.89) but also for the distribution (median, maximum values) of scores among individual SI joint slices (Table 3). Bland-Altman plots of interobserver differences plotted against the mean of the interobserver scores showed that reader 1 tended to score higher than readers 2 and 3 (data not shown).
| Parameter | University of Alberta | University of Toronto | Combined |
|---|---|---|---|
| Total SI joint score | 0.89† | 0.90† | 0.86† |
| Median SI joint coronal slice score | 0.85† | 0.85† | 0.79† |
| Maximum SI joint coronal slice score | 0.84† | 0.54‡ | 0.71† |
Interobserver reproducibility for total SI joint score was very good (ICC 0.90) for the 2 readers at the University of Toronto site despite no formal training exercise (Table 3). Reproducibility was less good when only the score for the single most severely affected coronal slice (maximum coronal slice score) was analyzed.
Interobserver reproducibility (2 sites) of status score.
Interobserver reproducibility was very good when total SI joint scores from both sites (5 readers) were analyzed (ICC 0.86) and was less good when only the score for the single most severely affected coronal slice (maximum coronal slice score) was analyzed (ICC 0.71) (Table 3).
Interobserver reproducibility of change score.
These data were available for the 11 patients in cohort B who were randomized to placebo/infliximab (3:8) for 24 weeks. Table 4 compares interobserver reproducibility of status and change scores for STIR and Gd-DTPA sequences in cohort B patients. Reproducibility was less good for change in STIR lesion score (ICC 0.53), where complete fusion was noted in 3 patients and features supporting the presence of marrow edema were subtle in the remaining patients. Reproducibility of change scores was somewhat better when Gd-DTPA–enhanced sequences were analyzed (ICC 0.79).
| Parameter | Mean ± SD | ICC | P |
|---|---|---|---|
| STIR status | 6.5 ± 5.7 | 0.67 | < 0.001 |
| STIR change | −0.3 ± 4.2 | 0.53 | 0.002 |
| Gd-DTPA status | 9.7 ± 9.0 | 0.70 | < 0.001 |
| Gd-DTPA change | 1.5 ± 4.9 | 0.79 | < 0.001 |
Responsiveness of scoring method.
Responsiveness of the SI joint scoring method was poor when either STIR (ES = 0.06, SRM = 0.08) or Gd-DTPA (ES = −0.20, SRM = −0.33) sequences were used, but 3 patients had fused joints and features indicating marrow edema were subtle in the remaining patients. This is further highlighted by the low scores for active inflammation on STIR images (Table 1).
DISCUSSION
We have developed an outcome tool for scoring inflammation by MRI in the SI joints that appears to meet the standards of feasibility and reproducibility. Innovations that facilitate scoring include the availability of reference films, objective definitions of abnormal signal on STIR sequences, high-intensity signal, and pronounced depth of signal, together with a simplified scoring sheet that allows us to both improve our ability to detect abnormal MRI signal and record the abnormalities within a reasonable time frame per patient (5–10 minutes). The feasibility and simplicity of the scoring method are reinforced by the reliability of scores recorded at the University of Toronto site despite no formal training exercise.
The use of gadolinium is costly and nearly doubles total scanning time to 1 hour, which is particularly uncomfortable for patients with AS. Our data show comparable reproducibility between gadolinium and STIR sequences for status scores although reproducibility of change scores may be somewhat better with gadolinium. These findings will require further examination in a cohort of patients with a broader spectrum of inflammatory disease, as most of our patients in cohort B had chronic changes and one-third had fused SI joints. This was also the reason why we did not examine the construct validity of this scoring method.
A further limitation of our study was that we were not able to assess the discriminant properties of our instrument. The open-label phase of the infliximab trial is ongoing and treatment codes remain unbroken. We are therefore unable to analyze changes in lesion scores by treatment allocation. The responsiveness of the scoring method for SI joint inflammation will also require further analysis in patients with a broader spectrum of inflammatory disease.
Two additional methods for scoring inflammation in the SI joints have been published, although as yet neither has been comprehensively validated (16, 17). One report also described a semiquantitative method for scoring sacroiliitis that was based on the degree of gadolinium enhancement from baseline during dynamic MRI (12). Inflammation was graded according to gadolinium enhancement of <20% (no inflammation), 20–90% (latent sacroiliitis), and >90% (florid inflammation) of a region of interest that was subjectively designated. This method performed well when validated against patient symptomatology and the response to computed tomography-guided sacroiliac cortisone injections (13). Primary drawbacks to this method include the costs and lengthy scan times associated with the use of gadolinium, the requirement for special technical expertise, and the difficulty in reproducing regions of interest on posttreatment MRI. A second method for scoring sacroiliitis has been published in abstract form (17). The SI joints were divided into quadrants and the extent of inflammation in each quadrant was graded for severity, although a method for defining abnormal signal was not provided. Reproducibility across the 5 participating sites was poor (ICC < 0.5), however, and in developing our instrument we similarly noted that reproducibility was poor when increased signal was graded for severity rather than being scored in a dichotomous manner as in our current scoring scheme (data not shown). A third scoring scheme for sacroiliitis has also proposed grading the extent of BME noted on STIR and the degree of contrast enhancement on each side of the SI joint (16). An overall score is then calculated based on the sum of scores for BME and gadolinium enhancement in both the bone marrow and the joint space. Interobserver agreement was poor (kappa = 0.29). We also noted poor reproducibility of joint space inflammation scores during the pilot development of our instrument, and therefore we did not incorporate scoring of joint space inflammation into our instrument (data not shown).
The assessment of feasibility, truth, and discrimination of our instrument will now require further validation across sites, as has been organized in previous Outcome Measures in Rheumatology Clinical Trials exercises. In addition, further validation of the scoring method for inflammation in the SI joints should be carried out in patients at an earlier stage of disease prior to the development of significant chronic features of disease.
The authors wish to express their gratitude to Medical Imaging Consultants for all MRI scans of control subjects and SI joint MRI scans in the patients with AS.
- 5. MR imaging of the sacroiliac joints: How? Why? When? So what? J Clin Rheumatol 2000; 6: 305–8.
- 17. Interreader agreement in the assessment of magnetic resonance imaging of the sacroiliac joints in spondyloarthropathy: the 1st MISS study [abstract]. Arthritis Rheum 2002; 46 Suppl: S428.