To validate a magnetic resonance imaging (MRI) reference criterion for a positive sacroiliac (SI) joint MRI finding based on the level of confidence in the classification of spondyloarthritis (SpA) by expert MRI readers.
Four readers assessed SI joint MRIs in 2 inception cohorts (cohorts A and B) of 157 consecutive patients with back pain ages ≤50 years and 20 age-matched healthy controls. Patients were classified according to clinical examination and pelvic radiography as having nonradiographic axial SpA (n = 51), ankylosing spondylitis (n = 34), or nonspecific back pain (n = 72). Readers indicated their level of confidence in their classification of SpA on a 0–10 scale, where 0 = definitely not SpA and 10 = definite SpA. The MRI reference criterion was prespecified by consensus as the majority of readers indicating a confidence score of 8–10; the absence of SpA required all readers to indicate non-SpA (a confidence score of 0–4). We calculated interreader reliability and agreement between MRI-based and clinical classification using kappa statistics. We estimated cutoff values for MRI lesions attaining a specificity of ≥0.90 for SpA.
In cohorts A and B, 76.4% and 71.6% of subjects met the MRI criterion, respectively. The kappa values for interreader agreement were 0.76 for cohort A and 0.80 for cohort B and between MRI-based and clinical assessment were 0.93 for cohort A and 0.57 for cohort B. Using this MRI reference criterion, the cutoff for the number of affected SI joint quadrants needed to reach a predefined specificity of ≥0.90 was ≥2 for bone marrow edema (BME) in both cohorts and ≥1 for erosion in both cohorts, and the BME and/or erosion lesions increased sensitivity without reducing specificity.
This data-driven study using 2 inception cohorts and comparing clinical and MRI-based classification supports the case for including both erosion and BME to define a positive SI joint MRI finding for the classification of axial SpA.
Inflammation on magnetic resonance imaging (MRI) of the sacroiliac (SI) joints in patients with spondyloarthritis (SpA) is a major criterion in the Assessment of SpondyloArthritis international Society (ASAS) classification criteria for axial SpA, which are based on expert clinical opinion as the gold standard. The definition of a positive SI joint MRI finding in the ASAS criteria was generated by consensus among experts and is entirely based on the presence of bone marrow edema (BME) on the STIR sequence or osteitis on the T1-weighted gadolinium-enhanced sequence. Studies using a data-driven approach to defining a positive SI joint MRI finding are scarce. Moreover, there is growing evidence that structural lesions may contribute substantially to the diagnostic utility of SI joint MRI in nonradiographic axial SpA (nr-axSpA) [4, 5].
The approaches to define a positive SI joint MRI finding in early axial SpA require a gold standard criterion for the classification of axial SpA. This requirement is challenging in the early stages of the disease because structural changes on pelvic radiographs, which constitute the basis of the modified New York criteria for classifying SpA, may take >10 years to become apparent [6, 7]. Consequently, the expert opinion of clinicians has been used as a gold standard, which has obvious limitations associated with false-positive or false-negative assignments and ultimately still requires a lengthy followup period to ascertain the development of radiographic sacroiliitis.
The level of confidence in the classification of SpA according to the global assessment of SI joint MRI by expert readers could itself be considered a reference criterion for a positive MRI finding if this approach were to be standardized and the certainty of classification according to the levels of confidence were shown to be reliable among the readers. This approach also addresses the reality that diagnostic ascertainment by MRI in routine practice is based on the simultaneous assessment of both T1-weighted spin-echo (T1SE) and STIR sequences in a global manner and is not lesion based, as described in the ASAS definition. A previous study of ours has also shown that classification of SpA is improved when the readers are trained to recognize lesions on the T1SE sequence. This important observation reflects that lesions on the STIR sequence are often subtle, resulting in diagnostic uncertainty, and there is often additional information on the corresponding T1SE sequence that enhances confidence in the diagnosis when viewed simultaneously with the STIR sequence.
In this study, we described the development and preliminary validation of an MRI reference criterion that is based on the level of confidence in the classification of SpA according to the global evaluation of both the T1SE and STIR sequences of SI joint MRI. We assessed SI joint MRI in 2 SpA inception cohorts that were recruited by 2 entirely different clinical strategies identifying patients with early SpA. We generated consensus on an operational definition of the levels of confidence for the MRI-based classification of SpA, assessed interreader reliability for the level of confidence, compared a candidate MRI-based reference criterion with clinical classification by rheumatologists, and, finally, determined which specific MRI lesions, alone and in combination, best defined the MRI-based reference criterion by calculating minimum cutoff values for the number of affected SI joint quadrants.
The study sample comprised 2 inception cohorts of consecutive patients ages ≤50 years with back pain who were newly referred to 2 university outpatient clinics. The patients with only back pain in cohort A (n = 69; Balgrist University Hospital, Zurich, Switzerland) were referred by rheumatologists and primary care physicians for further evaluation of suspected SpA. Twenty age-matched healthy controls, defined by the Nordic questionnaire and by the absence of clinical features indicative of SpA, were concomitantly recruited from the hospital staff of the same university clinic. The patients with back pain and acute anterior uveitis (AAU) composing cohort B (n = 88; University of Alberta, Edmonton, Alberta, Canada) presented with AAU to a university ophthalmology department; all AAU patients with past or present back pain were identified by a structured questionnaire and referred to the rheumatology department of the same university hospital for the assessment of SpA. Subjects of both inception cohorts had not participated in previous imaging studies of SpA.
In both inception cohorts, a classification of SpA was made based on the clinical opinion of the local rheumatologist (UW for cohort A and WPM for cohort B). Patients were classified by clinical examination and pelvic radiographs as having nr-axSpA (n = 20 and 31 for cohorts A and B, respectively), ankylosing spondylitis (AS; n = 10 and 24 for cohorts A and B, respectively), and nonspecific back pain (NSBP; n = 39 and 33 for cohorts A and B, respectively). In cohort A, only AS patients with a symptom duration of ≤5 years were enrolled. Two readers at each site independently categorized pelvic radiographs according to the modified New York criteria; discrepancies in the radiographic evaluation were resolved by consensus.
Patients with ongoing or previous treatment with biologic agents were excluded in both cohorts. The study protocol was approved by the local ethics review boards, and written informed consent was obtained from the study participants.
The technical parameters for the STIR and T1SE sequences of semicoronal MR SI joint scans performed at both institutions have been described previously. The MRIs were read and scored independently by 4 readers (1 radiologist [VZ] and 3 rheumatologists [SJP, UW, WPM]) blinded to the diagnosis and patient identifiers. The scans of each cohort were evaluated separately in random order on electronic workstations at the institution of each reader. MRI scores were entered into a customized online data entry module.
The evaluation of the MRIs followed a standardized module comprising 2 sections: first, a global assessment indicating the presence or absence of SpA according to all MRI findings on both sequences, and second, a scoring section indicating the number of SI joint quadrants affected by different MRI lesions according to standardized lesion definitions.
The readers indicated the presence or absence of SpA and their level of confidence with this classification by global evaluation of both the T1SE and STIR sequences of the MRIs on a numerical rating scale ranging from 0–10, where 0 = definitely not SpA and 10 = definite SpA. Scores within the ranges of 8–10 and 0–2 were defined by consensus as constituting a high level of confidence in a classification of SpA and non-SpA, respectively, while scores within the ranges of 6–7 and 3–4 were defined as moderate confidence for a classification of SpA and non-SpA, respectively. A score of 5 was defined as an equivocal level of confidence.
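The consensus categories above can be written as a simple lookup. The following is a minimal sketch for illustration only, not the scoring module used in the study; the function name and category labels are hypothetical, and integer scores are assumed.

```python
def confidence_category(score: int) -> str:
    """Map a 0-10 reader confidence score to the consensus category.

    Hypothetical helper: the labels are illustrative, the score ranges
    follow the consensus definitions given in the text.
    """
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score <= 2:
        return "high confidence, non-SpA"
    if score <= 4:
        return "moderate confidence, non-SpA"
    if score == 5:
        return "equivocal"
    if score <= 7:
        return "moderate confidence, SpA"
    return "high confidence, SpA"
```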
We assessed 4 MRI lesion types (BME, joint erosion, marrow fat infiltration, and ankylosis) according to standardized lesion definitions and a reference SI joint MRI set developed by consensus among the study investigators [3, 5, 11]. The presence or absence of BME, erosion, and fat infiltration was indicated as a binary variable in each quadrant (upper and lower ilium and upper and lower sacrum) of both SI joints on all MRI slices. Ankylosis was indicated in each half of the joint (upper and/or lower). Erosion in the absence of BME was analyzed in nr-axSpA patients as the mean number (percentage) of subjects over 4 readers.
The differences between cohorts A and B in demographic and clinical characteristics were assessed by Fisher's exact test for nominal and Wilcoxon's test for continuous variables. The frequency of single and combined MRI lesions according to the number of affected SI joint quadrants in patients and controls was analyzed descriptively as indicated concordantly by the majority of readers (≥3 of 4).
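The majority rule used for the descriptive lesion frequencies can be illustrated as follows. This is a sketch under stated assumptions: each reader reports a number of affected SI joint quadrants for a given lesion type, and a lesion is counted as present when at least 3 of the 4 readers report at least the required number of quadrants. The function name and parameters are hypothetical.

```python
from typing import Sequence


def lesion_present_by_majority(
    quadrant_counts: Sequence[int],
    min_quadrants: int = 1,
    min_readers: int = 3,
) -> bool:
    """Return True if enough readers report the lesion in enough quadrants.

    quadrant_counts holds one count of affected quadrants per reader.
    """
    readers_positive = sum(count >= min_quadrants for count in quadrant_counts)
    return readers_positive >= min_readers
```

For example, reader counts of 2, 1, 0, and 3 affected quadrants would count as lesion present in ≥1 quadrant, because 3 of the 4 readers report at least 1 affected quadrant.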
The agreement between all readers jointly for 2 categories of confidence scores (0–5 and 6–10) and for 5 categories of confidence scores (0–2, 3–4, 5, 6–7, and 8–10) was calculated by the mean percentage agreement and kappa statistics. Cohen's kappa was used for binary variables and Fleiss' kappa for ordinal variables [13, 14]. For kappa statistics, we provide confidence intervals (CIs) based on 1,000 bootstrap samples. Agreement was defined as slight, fair, moderate, substantial, and almost perfect by the values κ < 0.2, 0.2 ≤ κ < 0.4, 0.4 ≤ κ < 0.6, 0.6 ≤ κ < 0.8, and 0.8 ≤ κ < 1, respectively.
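To make the agreement measures concrete, the sketch below computes the mean pairwise percentage agreement and the standard (unweighted) Fleiss' kappa for several readers, and maps a kappa value to the agreement labels listed above. It is an illustration, not the analysis code used in the study; the function names are hypothetical, category labels are treated as nominal, and a value of exactly 1 is assumed to fall in the top band.

```python
from collections import Counter
from itertools import combinations


def mean_pairwise_agreement(ratings):
    """Mean fraction of subjects on which each pair of readers agrees.

    ratings: list of per-subject lists, one category label per reader.
    """
    n_raters = len(ratings[0])
    pairs = list(combinations(range(n_raters), 2))
    agree = sum(row[a] == row[b] for row in ratings for a, b in pairs)
    return agree / (len(pairs) * len(ratings))


def fleiss_kappa(ratings):
    """Unweighted Fleiss' kappa for a fixed number of readers per subject."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()
    p_bar = 0.0
    for row in ratings:
        counts = Counter(row)
        totals.update(counts)
        # Proportion of agreeing reader pairs for this subject.
        p_bar += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_bar /= n_subjects
    # Chance agreement from the overall category proportions.
    p_e = sum((t / (n_subjects * n_raters)) ** 2 for t in totals.values())
    if p_e == 1.0:
        return float("nan")  # undefined when only one category was ever used
    return (p_bar - p_e) / (1 - p_e)


def agreement_label(kappa):
    """Map kappa to the agreement bands quoted in the text."""
    if kappa < 0.2:
        return "slight"
    if kappa < 0.4:
        return "fair"
    if kappa < 0.6:
        return "moderate"
    if kappa < 0.8:
        return "substantial"
    return "almost perfect"  # assumption: kappa = 1 is included here
```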
The assignment of a definitive classification of SpA by SI joint MRI was prespecified by consensus as the majority of readers (≥3 of 4) indicating definite SpA with a confidence score of 8–10. Classification of the absence of SpA by SI joint MRI required all 4 readers to indicate non-SpA (confidence score ≤4 on a 0–10 scale). This approach combining the majority of readers indicating a high level of confidence for a classification of SpA with all readers indicating a moderate to high level of confidence of non-SpA aimed at high specificity for a definition of a positive SI joint MRI finding. Further analysis was based on the study subjects who met this MRI criterion. We compared this MRI-based criterion for all possible reader combinations with the clinical classification of the rheumatologists, both by kappa statistics and percentage agreement, to assess the external validity of this approach to defining a positive SI joint MRI finding.
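The prespecified rule can be expressed compactly. The sketch below is an illustration of the rule as stated in the text, with a hypothetical function name; subjects who meet neither condition are returned as `None`, i.e., they do not meet the MRI criterion and would be left out of analyses based on it.

```python
from typing import Optional, Sequence


def mri_reference_class(scores: Sequence[int]) -> Optional[str]:
    """Classify one subject from the 4 readers' confidence scores (0-10).

    Returns "SpA" if at least 3 of 4 readers score 8-10, "non-SpA" if all
    4 readers score 0-4, and None if the subject meets neither condition.
    """
    if len(scores) != 4:
        raise ValueError("expected exactly 4 reader scores")
    if sum(score >= 8 for score in scores) >= 3:
        return "SpA"
    if all(score <= 4 for score in scores):
        return "non-SpA"
    return None
```

The two conditions cannot both hold for the same subject, so the order of the checks does not affect the result.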
We determined which specific MRI lesions, alone and in combination, best defined this MRI-based reference criterion as positive or negative for SpA by calculating the minimum cutoff values according to the number of affected SI joint quadrants using a receiver operating characteristic (ROC) curve analysis. The single MRI lesions were BME, erosion, fat infiltration, and ankylosis and the combined lesions BME and/or erosion, BME and/or fat infiltration, and fat infiltration and/or erosion. The ROC curve and the area under the curve (AUC) served to analyze which single or combined lesions provided the best discrimination between positive and negative SI joint MRI findings in SpA. We predefined a specificity threshold of ≥0.90 for SpA to be reached for a positive SI joint MRI finding and computed the corresponding cutoff values and sensitivity for all of the MRI lesions under consideration. The minimum number of affected SI joint MRI quadrants needed to discriminate between SpA and non-SpA was computed as the mean value of all 4 raters and then, for practical purposes, expressed as the next entire number of affected SI joint quadrants. We performed these computations for both the MRI reference criterion (confidence score 8–10 by the majority of readers [≥3 of 4] for SpA and confidence score ≤4 by all 4 readers for non-SpA on a 0–10 scale) and the clinical classification (SpA presence or absence as a binary variable) to compare the candidate MRI reference criterion with the traditional clinical gold standard.
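The idea of a minimum cutoff at a predefined specificity can be illustrated for a single reader with empirical proportions. Note that this sketch substitutes a simple empirical calculation for the smooth ROC estimate used in the study, and it does not reproduce the averaging over readers or the conversion to a whole number of quadrants; the function name and the input lists are hypothetical.

```python
def min_cutoff_for_specificity(neg_counts, pos_counts, target=0.90):
    """Smallest cutoff (in affected quadrants) reaching the target specificity.

    neg_counts / pos_counts: affected-quadrant counts for non-SpA and SpA
    subjects according to the chosen reference standard. A subject is test
    positive when the count is >= cutoff.
    Returns (cutoff, specificity, sensitivity).
    """
    max_count = max(list(neg_counts) + list(pos_counts))
    # At cutoff = max_count + 1 no subject is test positive, so specificity
    # is 1.0 and the loop always returns.
    for cutoff in range(1, max_count + 2):
        specificity = sum(c < cutoff for c in neg_counts) / len(neg_counts)
        if specificity >= target:
            sensitivity = sum(c >= cutoff for c in pos_counts) / len(pos_counts)
            return cutoff, specificity, sensitivity
```

With made-up counts `neg = [0]*8 + [1, 3]` and `pos = [0, 2, 2, 4, 5]`, a cutoff of 1 gives a specificity of 0.80, so the function moves on to a cutoff of 2, where specificity is 0.90 and sensitivity is 0.80.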
We provide smooth ROC curve estimates computed according to the method described by Hall and Hyndman. This kernel-based ROC curve estimator is an alternative to the commonly used empirical ROC curve estimate, and we decided to apply the method described by Hall and Hyndman for 2 reasons: we assumed that the underlying true ROC curve is smooth, implying that a smooth estimate will on average yield a more efficient estimate in finite samples, and this method allows an easier comparison of several curves in 1 plot.
For the AUC based on the kernel estimate, we computed CIs based on 1,000 bootstrap samples. In all analyses, P values less than or equal to 0.05 were considered significant. All CIs were computed using a confidence level of 95%.
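A percentile bootstrap CI for an AUC can be sketched as follows. This illustration uses the empirical (Mann-Whitney) AUC rather than the kernel-based estimate applied in the study, resamples SpA and non-SpA subjects separately, and uses hypothetical function names and a fixed seed for reproducibility; it is not the authors' implementation.

```python
import random


def empirical_auc(neg, pos):
    """Probability that a positive subject scores higher than a negative one.

    Ties count as one half (Mann-Whitney form of the empirical AUC).
    """
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(neg, pos, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI, resampling each group with replacement."""
    rng = random.Random(seed)
    aucs = sorted(
        empirical_auc(rng.choices(neg, k=len(neg)), rng.choices(pos, k=len(pos)))
        for _ in range(n_boot)
    )
    lower = aucs[int((1 - level) / 2 * n_boot)]
    upper = aucs[min(n_boot - 1, int((1 + level) / 2 * n_boot))]
    return lower, upper
```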
Demographic and clinical characteristics of the 2 SpA inception cohorts are shown in Table 1. The 2 different recruitment strategies to identify patients with early SpA resulted in substantial demographic and clinical differences between the 2 cohorts, with patients in cohort B (AAU + back pain) having a much longer disease duration and less severe disease. The AS patients in cohort B were older (median age 41.5 years versus 30.0 years in cohort A; P = 0.005) and the median symptom duration was substantially longer in cohort B both for the nr-axSpA group (10.0 years versus 1.3 years in cohort A; P < 0.0001) and the AS group (12.5 versus 3.9 years in cohort A; P = 0.0002 [in cohort A, AS patients with a symptom duration ≤5 years were preselected]). SpA patients in cohort B had less severe disease, with a statistically significantly lower Bath Ankylosing Spondylitis Functional Index (BASFI) score in the nr-axSpA group (median BASFI score 0.8 versus 1.8 in cohort A; P = 0.03). Disease activity assessed by the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) was also lower in cohort B, without reaching statistical significance. NSBP controls in cohorts A and B had a similar median age of 32.7 and 33.6 years, respectively, which was close to the median ages of 32.2 and 36.2 years of the corresponding nr-axSpA groups.
|Characteristic||Cohort A (BP; n = 89)||Cohort B (AAU + BP; n = 88)|
|Subgroup||nr-axSpA||AS||NSBP||Healthy controls||nr-axSpA||AS||NSBP|
|No. of subjects||20||10||39||20||31||24||33|
|Male:female ratio (% male)||11:9 (55.0)||8:2 (80.0)||11:28 (28.2)||7:13 (35.0)||17:14 (54.8)||11:13 (45.8)||17:16 (51.5)|
|Age, years||32.2 (12.3)||30.0 (9.5)b||32.7 (11.5)||30.6 (6.5)||36.2 (12.1)||41.5 (7.1)b||33.6 (15.7)|
|Symptom duration, years||1.3 (1.8)b||3.9 (1.8)b||N/A||N/A||10.0 (14.0)b||12.5 (13.5)b||N/A|
|HLA–B27 positive, no. (%)||12 (60.0)||9 (90.0)||N/A||N/A||24 (80.0)c||21 (87.5)||N/A|
|BASDAI score, NRS||4.4 (3.1)d||5.4 (1.5)d||N/A||N/A||3.5 (4.4)||2.0 (3.4)c||N/A|
|BASFI score, NRS||1.8 (3.9)e||2.7 (1.5)d||N/A||N/A||0.8 (2.3)b||0.6 (2.8)c||N/A|
|CRP level, mg/liter||4.0 (4.5)d||5.0 (8.0)d||N/A||N/A||2.7 (5.2)||8.0 (8.7)c||N/A|
|MRI lesions, no. (%)f|
|BME||16 (80.0)||8 (80.0)||7 (17.9)||6 (30.0)||12 (38.7)||17 (70.8)||8 (24.2)|
|ER||10 (50.0)||10 (100.0)||0 (0)||0 (0)||8 (25.8)||19 (79.2)||4 (12.1)|
|FI||8 (40.0)||6 (60.0)||8 (20.5)||2 (10.0)||12 (38.7)||18 (75.0)||4 (12.1)|
|ANK||0 (0)||1 (10.0)||0 (0)||0 (0)||1 (3.2)||5 (20.8)||0 (0)|
|BME and/or ER||10 (50.0)||8 (80.0)||0 (0)||0 (0)||5 (16.1)||16 (66.7)||4 (12.1)|
|BME and/or FI||7 (35.0)||4 (40.0)||0 (0)||1 (5.0)||8 (25.8)||11 (45.8)||4 (12.1)|
|ER and/or FI||5 (25.0)||6 (60.0)||0 (0)||0 (0)||7 (22.6)||13 (54.2)||3 (9.1)|
The number and percentage of patients and controls with single or combined MRI lesions in ≥1 SI joint quadrants as indicated by the majority of readers (≥3 of 4) are shown in Table 1. BME and erosion in the SpA patients were observed more frequently in cohort A, particularly in the nr-axSpA group (BME in 80% and erosion in 50% in cohort A and BME in 39% and erosion in 26% in cohort B). BME also was a frequent feature in the control groups (present in 18% of the NSBP patients and in 30% of the healthy controls in cohort A and in 24% of the NSBP controls of cohort B), while erosion was rarely reported in controls of either cohort (none in cohort A, 12% of NSBP controls in cohort B). Combined lesions were virtually absent in the controls of cohort A, while BME and/or erosion or BME and/or fat infiltration was observed in 12% of the NSBP controls in cohort B. Erosion in the absence of BME was indicated in a mean of 1.5 nr-axSpA patients (7.5%) and a mean of 3.5 nr-axSpA patients (11.3%) in cohorts A and B, respectively.
The agreement between 4 readers jointly for the 2 categories of confidence scores 0–5 (i.e., subjects considered most likely not to have SpA) and 6–10 (i.e., subjects considered most likely to have SpA) for a classification of SpA by SI joint MRI was excellent in both cohorts, with Cohen's kappa values of 0.76 (95% CI 0.64–0.86) for cohort A and 0.80 (95% CI 0.71–0.88) for cohort B. The mean percentage agreement for 6 possible reader pairs was very good as well, with 89.7% (positive/negative 25.5%/64.2%) and 90.1% (positive/negative 43.0%/47.1%) for cohorts A and B, respectively. After stratifying the level of reader confidence on a scale from 0–10 into the 5 score categories (0–2, 3–4, 5, 6–7, and 8–10), the agreement for all readers jointly remained substantial, resulting in kappa values of 0.73 (95% CI 0.62–0.81) and 0.74 (95% CI 0.65–0.80) for cohorts A and B, respectively.
Table 2 shows the agreement between all possible reader combinations of the MRI-based reference criterion and clinical classification. Sixty-eight (76.4%) of 89 subjects in cohort A and 63 (71.6%) of 88 subjects in cohort B met the prespecified MRI criterion (either ≥3 of 4 readers scoring ≥8 for SpA or all 4 readers scoring ≤4 for non-SpA) (Table 3). The agreement between this reference MRI criterion and clinical classification was excellent in cohort A, showing a kappa value of 0.93 (95% CI 0.83–1.00) and a percentage agreement of 97.1%. In cohort B, the agreement was moderate, with a kappa value of 0.57 (95% CI 0.38–0.75) and a percentage agreement of 77.7% (Table 2).
|Classification by SI joint MRIb||Clinical classification in cohort A (BP; n = 89)||Clinical classification in cohort B (AAU + BP; n = 88)|
|No.||κ (95% CI)||Agreement (positive/negative), %||No.||κ (95% CI)||Agreement (positive/negative), %|
|≥1/≥1||84||0.83 (0.68–0.94)||92.9 (25.0/67.9)||75||0.56 (0.38–0.71)||77.3 (41.3/36.0)|
|≥1/≥2||87||0.84 (0.70–0.95)||93.1 (27.6/65.5)||79||0.55 (0.37–0.72)||77.2 (44.3/32.9)|
|≥1/≥3||82||0.89 (0.77–0.97)||95.2 (29.3/65.9)||80||0.52 (0.34–0.68)||76.2 (46.2/30.0)|
|≥1/4||72||0.88 (0.74–0.97)||94.4 (33.3/61.1)||78||0.49 (0.30–0.69)||75.6 (48.7/26.9)|
|≥2/≥1||88||0.78 (0.62–0.91)||90.9 (23.9/67.0)||82||0.51 (0.35–0.68)||74.4 (37.8/36.6)|
|≥2/≥2||85||0.86 (0.72–0.97)||94.1 (25.9/68.2)||82||0.53 (0.36–0.70)||75.6 (40.2/35.4)|
|≥2/≥3||80||0.91 (0.79–1.00)||96.3 (27.5/68.8)||77||0.54 (0.37–0.71)||76.7 (42.9/33.8)|
|≥2/4||68||0.93 (0.83–1.00)||97.1 (32.4/64.7)||69||0.56 (0.37–0.75)||78.2 (47.8/30.4)|
|≥3/≥1||88||0.78 (0.62–0.91)||90.9 (23.9/67.0)||80||0.48 (0.32–0.64)||72.6 (33.8/38.8)|
|≥3/≥2||85||0.86 (0.72–0.97)||94.1 (25.9/68.2)||76||0.52 (0.37–0.69)||75.0 (36.8/38.2)|
|≥3/≥3||80||0.91 (0.79–1.00)||96.3 (27.5/68.8)||71||0.54 (0.37–0.70)||76.0 (39.4/36.6)|
|≥3/4||68||0.93 (0.83–1.00)||97.1 (32.4/64.7)||63||0.57 (0.38–0.75)||77.7 (44.4/33.3)|
|4/≥1||86||0.73 (0.57–0.88)||89.5 (20.9/68.6)||73||0.44 (0.28–0.60)||69.9 (27.4/42.5)|
|4/≥2||81||0.84 (0.69–0.96)||93.8 (22.2/71.6)||67||0.49 (0.33–0.66)||73.2 (29.9/43.3)|
|4/≥3||76||0.90 (0.77–1.00)||96.1 (23.7/72.4)||62||0.51 (0.34–0.70)||74.2 (32.3/41.9)|
|4/4||64||0.93 (0.80–1.00)||96.9 (28.1/68.8)||54||0.55 (0.35–0.75)||75.9 (37.0/38.9)|
|Level of confidence for a classification of SpA||Cohort A (BP; n = 89)||Cohort B (AAU + BP; n = 88)|
|High, confidence score 8–10b|
|nr-axSpA||12/20 (60.0)||8/31 (25.8)|
|AS||10/10 (100.0)||20/24 (83.3)|
|NSBP||0/39 (0)||1/33 (3.0)|
|Low, confidence score 0–4c|
|nr-axSpA||2/20 (10.0)||13/31 (41.9)|
|AS||0/10 (0)||0/24 (0)|
|NSBP||28/39 (71.8)||21/33 (63.6)|
|Total, SpA + non-SpA||68/89 (76.4)||63/88 (71.6)|
Based on the prespecified MRI reference criterion (≥3 of 4 readers scoring ≥8 for SpA, or all 4 readers scoring ≤4 for non-SpA), Table 4 shows the specificity, sensitivity, and cutoff values for the number of affected SI joint quadrants needed to reach a predefined specificity of ≥0.90, and also the AUCs for single and combined MRI features based on the kernel estimate. In both cohorts, erosion in ≥1 SI joint quadrant or BME in ≥2 SI joint quadrants yielded a specificity of ≥0.90 for a positive SI joint MRI finding. The combined features erosion and/or BME increased sensitivity to 0.98 and 0.96 for cohorts A and B, respectively, compared to BME alone (0.91 and 0.83 for cohorts A and B, respectively), without reducing specificity. Fat infiltration showed low sensitivity, resulting in poor utility for classification with high cutoff values for fat lesions.
|Lesion||Cohort A, BP||Cohort B, AAU + BP|
|Specificity||Sensitivity||Cutoff, no. of affected SI joint quadrantsb||AUC (95% CI)||Specificity||Sensitivity||Cutoff, no. of affected SI joint quadrantsb||AUC (95% CI)|
|MRI criterion as the gold standardc|
|BME||0.90||0.91||2 (2.32)||0.96 (0.88–1.00)||0.90||0.83||2 (1.64)||0.91 (0.82–0.98)|
|ER||0.90||1.00||1 (0.02)||1.00 (NC)d||0.90||1.00d||1 (0.25)d||1.00 (NC)d|
|FI||0.90||0.34||13 (13.11)||0.79 (0.68–0.89)||0.90||0.74||5 (5.04)||0.95 (0.88–1.00)|
|BME and/or ER||0.90||0.98||2 (2.32)||1.00 (0.99–1.00)||0.90||0.96||2 (1.64)||1.00 (0.99–1.00)|
|BME and/or FI||0.90||0.96||13 (13.46)||0.99 (0.98–1.00)||0.90||0.94||6 (5.63)||0.99 (0.97–1.00)|
|ER and/or FI||0.90||0.60||13 (13.11)||0.91 (0.84–0.98)||0.90||0.92||5 (5.03)||0.98 (0.94–1.00)|
|Clinical classification as the gold standard|
|BME||0.90||0.73||3 (3.24)||0.88 (0.81–0.95)||0.90||0.39||4 (4.46)||0.72 (0.62–0.81)|
|ER||0.90||0.77||1 (0.54)||0.93 (0.87–0.98)||0.90||0.54||2 (1.89)||0.78 (0.69–0.86)|
|FI||0.90||0.30||12 (11.63)||0.74 (0.63–0.84)||0.90||0.49||9 (9.39)||0.77 (0.68–0.86)|
|BME and/or ER||0.90||0.82||3 (3.28)||0.95 (0.90–0.99)||0.90||0.51||6 (6.13)||0.79 (0.70–0.88)|
|BME and/or FI||0.90||0.76||12 (11.83)||0.91 (0.83–0.97)||0.90||0.47||14 (13.81)||0.79 (0.71–0.88)|
|ER and/or FI||0.90||0.50||12 (11.83)||0.84 (0.76–0.93)||0.90||0.58||11 (10.91)||0.79 (0.69–0.87)|
Using the clinical classification as the gold standard, the cutoff values to obtain a specificity of ≥0.90 for a positive SI joint MRI finding were erosion in ≥1 and ≥2 SI joint quadrants for cohorts A and B, respectively, or BME in ≥3 and ≥4 SI joint quadrants for cohorts A and B, respectively. Again, the AUCs for the combined features erosion and/or BME (AUC 0.95 [95% CI 0.90–0.99] for cohort A and AUC 0.79 [95% CI 0.70–0.88] for cohort B) were superior compared to each lesion alone (for BME, AUC 0.88 [95% CI 0.81–0.95] for cohort A and AUC 0.72 [95% CI 0.62–0.81] for cohort B; for erosion, AUC 0.93 [95% CI 0.87–0.98] for cohort A and AUC 0.78 [95% CI 0.69–0.86] for cohort B) (Figure 1).
This cross-sectional analysis of SI joint MRI in 2 SpA inception cohorts by expert readers showed several findings relevant to what constitutes a positive MRI finding in SpA and clarified which lesions are most important for classification of SpA. The finding of BME in up to 30% of controls represents a significant limitation to using BME alone to define a positive MRI finding for the purpose of a classification criterion as currently proposed for axial SpA, where high specificity is desirable. Moreover, the cutoff value for the number of affected SI joint quadrants with BME was ≥3 and ≥4 for cohorts A and B, respectively, when the gold standard was clinical classification of SpA. A cutoff value of ≥2 affected SI joint quadrants is consistent with the proposed ASAS criterion for a positive MRI finding. Conversely, the high specificity of erosion resulted in the very low cutoff value of only 1 affected SI joint quadrant as constituting a positive MRI finding in both inception cohorts. Nevertheless, the combined features erosion and/or BME increased sensitivity the most without reducing specificity as measured by the AUC values in both cohorts for both clinical and MRI-based classification. Consequently, this data-driven study supports the case of including both erosion and BME as a basis for defining a positive SI joint MRI finding for the classification of axial SpA, as proposed in an earlier study.
Studies on the diagnostic utility of a given test should include healthy individuals and controls with a disorder that is clinically challenging to differentiate from the disease under observation. In both inception cohorts of the present study, BME was indicated in up to 24% of NSBP controls and in 30% of healthy individuals. In contrast, erosion was absent in both the NSBP and healthy control subjects of cohort A and was present in only 12% of NSBP controls in cohort B. These observations are consistent with a previous study reporting BME in the SI joint of 27% of a control group consisting of individuals with NSBP and healthy individuals and another study reporting BME in the SI joint of 23% and 7% of NSBP and healthy controls, respectively, while erosion was seen in only 4% and 2% of NSBP and healthy controls, respectively. The clinical relevance of these abnormal MRI findings in NSBP and healthy controls remains unclear, with the most likely explanation being mechanically induced signal alterations or degenerative changes.
The combined features BME and/or erosion were observed in 50% and in 16% of nr-axSpA patients in cohorts A and B, respectively, supporting a previous finding that structural damage may start early in the SI joint, long before the damage is detectable by pelvic radiography. Erosion in the absence of BME was indicated in 7.5% and 11.3% of nr-axSpA patients in cohorts A and B, respectively; these results compare well to a previous study with other nr-axSpA patients showing structural lesions without BME (13.0%). The recognition of erosion on SI joint MRI may be challenging, but trained readers are able to detect erosion and BME to a comparable degree of reliability. Identifying erosion on SI joint MRI requires specific training to recognize features of SpA on T1SE MRI; our previous work has shown that such training enhances the diagnostic utility of MRI assessment for SpA.
The cutoff values of all MRI lesion types under consideration were higher when using clinical classification as the gold standard compared with the MRI criterion. This lower specificity may be explained by the different number of patients included in the evaluations according to the 2 reference standards. An analysis according to clinical classification is performed for an entire cohort, while an evaluation by an MRI-based criterion is performed only for the fraction of patients meeting the imaging criterion. The exclusion of patients with equivocal SI joint MRI findings results in lower cutoff values of MRI-based approaches compared with clinical classification, which includes all patients with equivocal SI joint MRIs. Lesion cutoff values may vary depending on the gold standard used, the recruitment criteria of patients and controls, and various features, such as disease characteristics or symptom duration. Therefore, an MRI-derived definition of a positive SI joint MRI finding based on the confidence of expert readers upon performing simultaneous global assessment of T1SE and STIR sequences represents an alternative candidate reference criterion. Our prespecified MRI criterion was based on the majority of readers indicating a high confidence level for a classification of SpA. However, the analysis based on any 2 readers was nearly identical to the approach using the majority of readers (Table 2).
The agreement between the MRI-based criterion and clinical classification varied between the 2 cohorts, even though the MRI techniques used in both institutions were comparable and despite the inclusion of a comparable fraction of subjects from each cohort for analysis (76% versus 72%) (Table 2). The agreement between MRI-based and clinical assessment was substantial to excellent in cohort A compared with a moderate agreement in cohort B. Kappa values across 16 possible reader combinations ranged from 0.73–0.93 for cohort A and 0.44–0.57 for cohort B. The study participants included in cohort A were referred by rheumatologists and primary care physicians to a tertiary care center for the evaluation of suspected SpA and were much more symptomatic than the patients in cohort B, with cohort A having higher BASDAI and BASFI scores and a higher frequency of lesions on SI joint MRI. The major source of disagreement in cohort B between the MRI-based criterion and the clinical classification was the group of patients clinically classified as having nr-axSpA but showing no abnormalities indicative of SpA on SI joint MRI. The more subtle and less severe clinical manifestations of inflammatory back pain in cohort B, which led to less precise clinical ascertainment, were the most likely explanation for the lower agreement between clinical and MRI-based classification. The comparison between the 2 cohorts illustrates how classification based on SI joint MRI is affected by the mode of recruitment of patients with suspected early SpA; by demographic and clinical characteristics such as age, symptom duration, and disease activity; and by the interpretation of clinical symptoms such as inflammatory back pain by local rheumatologists. Prospective data are needed to determine whether clinical or MRI criteria are preferable for classification of nr-axSpA.
The long symptom duration of the nr-axSpA patients in cohort B is in line with several recent studies. Two interventional trials of nr-axSpA reported a symptom duration of 7–8 years and 10.1 years with HLA–B27 frequencies of 59–75% and 70–75%, respectively [22, 23]. In 2 prospective studies of SpA inception cohorts with 7.7 and 10 years of followup, only 33.3% and 24.3% of nr-axSpA patients, respectively, progressed to radiographic sacroiliitis, meeting the modified New York criteria [7, 24].
In conclusion, the level of confidence in the classification of SpA according to global assessment of SI joint MRI by expert readers may represent a candidate reference criterion for studies aiming to define a positive SI joint MRI finding for the purpose of classifying SpA. Using this MRI reference criterion as well as traditional clinical classification as the gold standard, we showed that BME performed less well than erosion because of a high frequency of false-positives in non-SpA controls. Moreover, the combined lesions erosion and/or BME increased sensitivity, compared to BME alone, without reducing specificity. The combined features erosion and/or BME represented the best candidate definition for a lesion-based definition of a positive SI joint MRI finding in SpA.
All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be published. Dr. Weber had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study conception and design. Weber, Pedersen, Rufibach, Lambert, Chan, Østergaard, Maksymowych.
Acquisition of data. Weber, Zubler, Pedersen, Lambert, Chan, Maksymowych.
Analysis and interpretation of data. Weber, Zubler, Pedersen, Rufibach, Lambert, Chan, Østergaard, Maksymowych.
Dr. Rufibach is founder and owner of Rufibach rePROstat and is an employee of F. Hoffmann-La Roche.
The authors thank the patients and healthy volunteers for their participation in the study; Tracey Clare, Clinical Research Manager, and Paul Filipow, Data Manager, Department of Radiology, University of Alberta, Edmonton, Alberta, Canada, for coordinating the web-based MRI scoring module; and Rudolf O. Kissling, MD, Department of Rheumatology, Balgrist University Hospital, Zurich, Switzerland, for scoring the SI joints (Balgrist patients) on pelvic radiographs.