Abstract

Objective

To select the most appropriate radiologic scoring method for the evaluation of radiographic progression in ankylosing spondylitis (AS) in clinical trials.

Methods

The validity of the currently available methods, the Bath Ankylosing Spondylitis Radiology Index (BASRI), the Stoke Ankylosing Spondylitis Spine Score (SASSS), and the modified SASSS (M-SASSS), was tested according to the aspects of the Outcome Measures in Rheumatology Clinical Trials filter: truth, discrimination (reliability and sensitivity to change), and feasibility, using radiographs of 133 patients at 4 different time points (baseline, 1 year, 2 years, and 4 years). One observer scored these sets in chronological order. To assess interobserver reliability, a second observer scored radiographs of 20 patients at the 4 different time points.

Results

After 4 years, 9% and 8% of patients showed changes >0 in the sacroiliac (SI) joints and hips, respectively. Independent of the method chosen, ∼40% of patients showed changes in both the lumbar and cervical spine. Therefore, it was concluded that, for the assessment of progression, SI joints and hips are of minor importance. For status scores, the interobserver intraclass correlation coefficients (ICCs) varied from 0.87 to 0.98 and the intraobserver ICCs varied from 0.96 to 0.99. Concerning progression scores, only the interobserver ICC for the M-SASSS measured after 2 years remained acceptable (0.82). The intraobserver ICCs for progression after 2 years of followup were 0.93 for the BASRI, 0.79 for the SASSS, and 0.95 for the M-SASSS. Concerning sensitivity to change, the M-SASSS classified the highest percentage of patients as having a change >0.

Conclusion

The M-SASSS is the most appropriate method by which to score the radiographic progression in AS patients in clinical trials.

In assessing the disease-modifying potential of drugs used in the treatment of ankylosing spondylitis (AS), the demonstration of a reduction or termination of structural damage is essential. Structural damage in AS can be measured on radiographs of the spine and hips. A number of radiographic scoring methods are available for this purpose: the Bath Ankylosing Spondylitis Radiology Index (BASRI) (1), the Stoke Ankylosing Spondylitis Spine Score (SASSS) (2), and a modification of the SASSS (M-SASSS) (3). The BASRI exists in 2 forms: BASRI-spine and BASRI-total. The former excludes, and the latter includes, the hips. The BASRI and the SASSS have been published in peer-reviewed journals, and the M-SASSS has been published in thesis form only. All methods have been validated by their developers.

It would be desirable for one of these methods to be selected as the radiographic outcome assessment of choice for clinical trials, in order to ensure uniformity and allow a comparison of data across trials in the future. The Assessments in Ankylosing Spondylitis Working Group is attempting to standardize the measurements in AS (4), and the selection of a method to assess radiographic progression is one of the important issues on its research agenda.

The validity of radiographic scoring methods in AS has hardly been investigated in the past. Spoorenberg et al (5, 6) have initiated method comparisons with a maximum followup of 2 years, primarily related to assessing damage scores. In these studies, some aspects of reliability (intra- and interobserver reliability of status scores) of all 3 methods were established. Moreover, agreement between 2 observers on progression in individual patients was assessed, but only with a strict definition of “agreement.” In clinical trials, however, the subject of interest is change in radiographic damage, primarily on the group level, and not the absolute level of damage itself. Apart from that, and according to the Outcome Measures in Rheumatology Clinical Trials (OMERACT), discrimination (sensitivity to change), truth (construct validity), and feasibility of scoring methods should be investigated before a preference is made (7).

The main objective of the present study was therefore to test the radiographic scoring methods against all 3 aspects of the OMERACT filter over a followup period of 4 years, including an evaluation of the reliability of progression scores.

PATIENTS AND METHODS

Patients and radiographs.

Radiographs from a cohort in the Outcome Assessments in Ankylosing Spondylitis International Study (OASIS) population, an international longitudinal, observational study of outcomes in AS, were used (5). Originally, 217 patients from 4 centers in The Netherlands, Belgium, and France were included in this cohort. Radiographs were obtained at baseline and after 1, 2, and 4 years. One set of radiographs consisted of a posteroanterior view of the pelvis to score the sacroiliac (SI) joints and the hips, an anteroposterior (AP) and lateral view of the lumbar spine, and a lateral view of the cervical spine. Because some of the patients were lost to followup, sets of radiographs that included the 4-year films were available for only 133 patients; only these patients were included in our study.

Scoring methods.

Currently, there are 2 widely known scoring methods, the BASRI and the SASSS. In 1995, Kennedy et al (8) used a radiology score that was the precursor of the BASRI. In this score, the SI joints were graded according to the New York criteria (9), which describe 5 grades of sacroiliitis ranging from 0 to 4 (Table 1). For the hips and cervical and lumbar spine, a comparable system was developed. In 1998, MacKay et al (1) introduced the currently used BASRI, the BASRI-spine, which includes the SI joints (still scored according to the New York criteria) and the lumbar and cervical spine. To assess the lumbar spine, AP and lateral views are used. The score for the lumbar spine is a composite score of both views (e.g., if one view shows syndesmophytes at a particular level and the other view shows syndesmophytes at a different level, syndesmophytes are considered to be present at 2 different levels and are scored accordingly). The lumbar spine is defined as extending from the lower border of T12 to the upper border of S1. To assess the cervical spine, a lateral view is used, and the cervical spine is defined as extending from the lower border of C1 to the upper border of C7. The lumbar spine and cervical spine are graded separately on a scale of 0–4 (Table 1). The BASRI-spine is the sum of the mean score of the right and left SI joints (to 1 decimal place) plus the scores of the lumbar spine and the cervical spine. According to the New York criteria, AS patients are supposed to have radiographic sacroiliitis, so the range of the BASRI-spine in patients fulfilling the criteria is 2–12.

Table 1. Radiologic scoring methods*
* SI = sacroiliac; BASRI = Bath Ankylosing Spondylitis Radiology Index; AP = anteroposterior; SASSS = Stoke Ankylosing Spondylitis Spine Score.

New York criteria for sacroiliitis (mean score of both SI joints is used in the BASRI)
 0 = normal
 1 = suspicious (no definite change)
 2 = minimal (minimal sacroiliitis, defined as the loss of definition at the edge of the SI joints, some juxtaarticular sclerosis, minimal erosions, and possible joint space narrowing)
 3 = moderate (moderate sacroiliitis, defined as definite sclerosis on both sides, blurring and indistinct margins, and erosive changes, with loss of joint space)
 4 = severe (complete fusion or ankylosis of the joints)
BASRI-hips (mean score of both hips is used in the BASRI-total)
 0 = normal (no change)
 1 = suspicious (focal joint space narrowing)
 2 = mild (circumferential joint space narrowing >2 mm)
 3 = moderate (circumferential joint space narrowing ≤2 mm or bone-on-bone apposition of <2 cm)
 4 = severe (bone deformity or bone-on-bone apposition ≥2 cm)
  The grade should be increased by 1 if 2 of the following bone changes are present: erosions, osteophytes, and protrusion.
BASRI-spine (for the lumbar spine, AP and lateral views are scored, and the view with the highest score is taken; for the cervical spine, lateral view is scored)
 0 = normal (no change)
 1 = suspicious (no definite change)
 2 = mild (any number of erosions, squaring, or sclerosis, with or without syndesmophytes, on ≤2 vertebrae)
 3 = moderate (syndesmophytes on ≥3 vertebrae, with or without fusion involving 2 vertebrae)
 4 = severe (fusion involving ≥3 vertebrae)
SASSS and modified SASSS (range 0–72) (for the SASSS, the anterior and posterior sites from the lower border of T12 to the upper border of S1 are scored; for the modified SASSS, only the anterior site of the lumbar spine and the anterior site of the cervical spine from the lower border of C2 to the upper border of T1 are scored)
 0 = normal
 1 = erosion, sclerosis, or squaring
 2 = syndesmophyte
 3 = bridging syndesmophyte

In 2000, MacKay et al (10) introduced the BASRI-total, which is the total of the BASRI-hip and the BASRI-spine. The score for the hip was based on the same grading system that applies to the other parts of the BASRI (Table 1). The BASRI-hip score is the mean of the right and left hips (to 1 decimal place). In the studies concerning the BASRI, it is not explained how missing observations should be handled. It was decided that when it was impossible to assess 75% or more of a view or if a view was missing, the patient was excluded from this study.
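The BASRI arithmetic described above can be sketched in a few lines. This is a minimal illustration rather than the published scoring sheet: the function and parameter names are hypothetical, and the only rules encoded are the ones stated in the text (each component graded 0–4, right/left means taken to 1 decimal place, BASRI-total = BASRI-spine + BASRI-hip).

```python
def mean_1dp(left: int, right: int) -> float:
    """Mean of the left and right grades, to 1 decimal place."""
    return round((left + right) / 2, 1)


def _check_grades(*grades: int) -> None:
    # Every BASRI component is graded on the integer scale 0-4.
    for grade in grades:
        if grade not in (0, 1, 2, 3, 4):
            raise ValueError(f"grade must be an integer from 0 to 4, got {grade!r}")


def basri_spine(si_left: int, si_right: int, lumbar: int, cervical: int) -> float:
    """BASRI-spine: mean SI grade plus lumbar and cervical grades (0-12)."""
    _check_grades(si_left, si_right, lumbar, cervical)
    return mean_1dp(si_left, si_right) + lumbar + cervical


def basri_total(si_left: int, si_right: int, lumbar: int, cervical: int,
                hip_left: int, hip_right: int) -> float:
    """BASRI-total: BASRI-spine plus the mean hip grade (0-16)."""
    _check_grades(hip_left, hip_right)
    return basri_spine(si_left, si_right, lumbar, cervical) + mean_1dp(hip_left, hip_right)


if __name__ == "__main__":
    # Hypothetical patient: SI grades 2 and 3, lumbar grade 1, cervical grade 0, normal hips.
    print(basri_spine(2, 3, 1, 0))
    print(basri_total(2, 3, 1, 0, 0, 0))
```

A patient with grade 2 sacroiliitis on both sides and a normal spine scores 2, which is the lower end of the 2–12 range quoted for patients fulfilling the New York criteria.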

The second method that was evaluated in this study was the SASSS. In 1991, Taylor et al (11) introduced this detailed scoring system for the anterior and posterior sites of the lumbar spine, with a range of 0–72. The SASSS is obtained by assessing the lower border of T12, all 5 lumbar vertebrae, and the upper border of the sacrum on a lateral view. All 4 corners of each vertebra are examined and scored 1 for an erosion, sclerosis, and/or squaring, 2 for a syndesmophyte, and 3 for total bony bridging at each site, giving a maximum possible score of 72 (Table 1). In both studies concerning the SASSS, it is not explained how missing observations should be handled. We decided that if >3 scoring sites were missing, the radiographs were excluded. If 3 or fewer sites were missing, the mean of the other scoring sites was used as a substitute for the missing sites.
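The missing-data rule above can be made concrete with a short sketch. It assumes the 24 scoring sites (12 vertebral borders, anterior and posterior) are passed as a flat list in any fixed order, with `None` marking a site that could not be scored; the function name and the `None` convention are illustrative choices, not part of the published method.

```python
from typing import Optional, Sequence

N_SITES = 24      # 12 vertebral borders x 2 sites (anterior, posterior)
MAX_MISSING = 3   # more than 3 missing sites -> radiograph excluded


def sasss_total(site_scores: Sequence[Optional[int]]) -> Optional[float]:
    """Sum the site scores, substituting the mean of the scored sites
    for up to 3 missing sites. Returns None if the film is excluded."""
    if len(site_scores) != N_SITES:
        raise ValueError(f"expected {N_SITES} sites, got {len(site_scores)}")
    observed = [s for s in site_scores if s is not None]
    if any(s not in (0, 1, 2, 3) for s in observed):
        raise ValueError("each site score must be 0, 1, 2, or 3")
    n_missing = N_SITES - len(observed)
    if n_missing > MAX_MISSING:
        return None  # too many unreadable sites: exclude this radiograph
    site_mean = sum(observed) / len(observed)
    return sum(observed) + n_missing * site_mean


if __name__ == "__main__":
    # Hypothetical film: syndesmophytes (score 2) at 21 sites, 3 sites unreadable.
    print(sasss_total([2] * 21 + [None] * 3))
```

Because missing sites are replaced by a mean, the total can be non-integer, which is consistent with fractional percentile values such as 16.9 appearing among the reported scores.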

The final method included in this study was a method derived from Creemers et al (3). This method is a modification of the SASSS and scores the anterior sites of the lumbar and cervical spine on a lateral view. The anterior sites of the same vertebrae of the lumbar spine as described for the SASSS are scored, as are the anterior sites of the cervical spine from the lower border of C2 to the upper border of T1. So the range remains 0–72. We dealt with missing observations in a manner similar to that described for the SASSS.

One observer (AJBW) scored the radiographs according to the different methods. For each patient, the order in which the methods were applied was always the same. First, the SI joints and hips were scored according to the BASRI. Second, the lumbar spine was scored according to the BASRI and the SASSS. Finally, the cervical spine was scored according to the BASRI and the M-SASSS. During the followup, the format of the radiographs was changed in some centers, which made it possible for the observer to identify the point in time. Therefore, all radiographs were scored with known chronology for all methods.

Use of the OMERACT filter to evaluate the various scoring methods.

The BASRI (split into BASRI-spine and BASRI-total), the SASSS, and the M-SASSS were all judged with respect to the different aspects of the OMERACT filter: truth, discrimination, and feasibility (7).

Truth.

The truth aspect deals with the following questions: Is the measure truthful, and does it measure what is intended? How valid is the measure? To establish a valid radiologic scoring method for AS in clinical trials, it is important that the method include the relevant parts of the skeleton (construct validity). Therefore, we evaluated every part of the skeleton included in the different scoring methods. By doing this, we got an impression of the involvement of the different parts, and we identified the parts in which changes occurred. For the lumbar spine, we also compared the additional information obtained from the AP view with the additional information obtained from the lateral view, since both views are used in the BASRI and only the lateral view is used in the SASSS and the M-SASSS. Also, the anterior spine versus the posterior spine was evaluated, since in the SASSS, both sites are scored and in the M-SASSS, only the anterior site is scored. The construct validity of the methods was also assessed by examining the correlation of the scoring methods with measures of spinal mobility, disease duration, and functional limitation. As measures of spinal mobility, the occiput-to-wall distance, the modified Schober test, and lateral spinal flexion (12) were used. Lateral spinal flexion is measured as the difference between the fingertip-to-floor distance (measured at the patient's middle fingertip) when the patient is standing upright and the same distance when the patient has bent sideward maximally. As a measure of functional limitation, the Bath Ankylosing Spondylitis Functional Index (BASFI) was used (13). The correlation was expressed as Spearman's rho.
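For readers unfamiliar with Spearman's rho, the following sketch computes it as the Pearson correlation of the ranks, with tied values given their average rank. It is a generic textbook implementation, not the software used in the study, and the example numbers are invented for illustration.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Invented data: higher damage scores paired with lower lateral flexion (mm).
    damage = [0, 2, 5, 10, 40]
    flexion_mm = [150, 120, 90, 60, 10]
    print(spearman_rho(damage, flexion_mm))
```

A negative rho for lateral flexion and a positive rho for occiput-to-wall distance both indicate that more radiographic damage goes with worse mobility, because the two measures run in opposite directions.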

Discrimination.

The discrimination aspect concentrates on the following question: Does the measure discriminate between situations of interest? It focuses on reliability and sensitivity to change. To assess interobserver reliability, another observer (RBML) scored sets of radiographs of 20 patients, with 4 time points per patient and with known chronological order. These same 80 sets were scored in chronological order with a time interval of 4 weeks by the first observer (AJBW) to assess intraobserver reliability. Therefore, inter- and intraobserver reliability of the different scoring methods was assessed on 80 status scores and 60 progression intervals.

Inter- and intraobserver reliability can be expressed as intraclass correlation coefficients (ICCs) or as components calculated by analysis of variance (ANOVA). We chose an ANOVA with the patient, observer, and residual variance components, because a comparison of variance components provides better insight into the kind of error. Each variance component represents the percentage of the total variance that can be explained by that particular component. The patient variance component reflects the variance that is caused by true differences among patients, the observer variance component reflects the variance attributable to differences between observers, and the residual variance component reflects the remaining random error. The patient variance component can be used in comparisons with studies in which ICCs are presented, since it is equal to the ICC if the latter is calculated with observer as the fixed factor.
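To illustrate how such variance components can be obtained, the sketch below uses the standard method-of-moments estimators for a two-way layout (patients by observers, one score per cell) and expresses each component as a percentage of the total. The text does not specify the estimation procedure, so this is an assumption about the general approach, not a reproduction of the authors' analysis; negative estimates are truncated to zero, which is also an assumption.

```python
from typing import Dict, Sequence


def variance_components(scores: Sequence[Sequence[float]]) -> Dict[str, float]:
    """scores[i][j] is the score of patient i given by observer j.
    Returns patient, observer, and residual components as % of total variance."""
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with >= 2 patients and >= 2 observers")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Mean squares of the two-way ANOVA without replication.
    ms_patient = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_observer = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_residual = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    )
    ms_residual = ss_residual / ((n - 1) * (k - 1))

    # Method-of-moments estimates; negative estimates are set to 0 (assumption).
    components = {
        "patient": max((ms_patient - ms_residual) / k, 0.0),
        "observer": max((ms_observer - ms_residual) / n, 0.0),
        "residual": ms_residual,
    }
    total = sum(components.values())
    if total == 0:
        raise ValueError("all scores are identical; components are undefined")
    return {name: 100 * value / total for name, value in components.items()}


if __name__ == "__main__":
    # Invented scores: observer 2 reads every patient exactly 1 point higher.
    print(variance_components([[1, 2], [3, 4], [5, 6]]))
```

In the invented example all disagreement is systematic, so the residual component is 0 and the observer component is nonzero; purely random disagreement would show the opposite pattern. For intraobserver reliability, the same calculation applies with the two reading sessions taking the place of the two observers.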

To obtain insight into the sensitivity to change of the methods, the means and medians are given for baseline and after 1, 2, and 4 years of followup. Also, the percentages of patients with any changes from baseline are given, and effect sizes were calculated. Effect sizes were calculated on logarithmically transformed data because of the skewed distribution pattern.
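The text does not give the effect-size formula, so the following is only a sketch under two explicit assumptions: the effect size is taken as the mean change divided by the standard deviation of the baseline scores (one common definition), and the logarithmic transformation is `log(score + 1)` so that scores of 0 remain defined. Both choices are illustrative and may differ from what the authors used.

```python
import math
import statistics
from typing import Sequence


def log_effect_size(baseline: Sequence[float], followup: Sequence[float]) -> float:
    """Effect size on log(score + 1) data: mean change / SD of baseline.
    ASSUMPTIONS: this formula and the +1 offset are not taken from the paper."""
    if len(baseline) != len(followup) or len(baseline) < 2:
        raise ValueError("need paired scores for at least 2 patients")
    log_base = [math.log1p(b) for b in baseline]
    log_follow = [math.log1p(f) for f in followup]
    mean_change = statistics.mean(f - b for b, f in zip(log_base, log_follow))
    sd_baseline = statistics.stdev(log_base)  # sample SD (n - 1 denominator)
    if sd_baseline == 0:
        raise ValueError("baseline scores are all identical; effect size undefined")
    return mean_change / sd_baseline


if __name__ == "__main__":
    # Invented scores for 3 patients at baseline and at followup.
    print(log_effect_size([0, 1, 3], [1, 3, 7]))
```

Whatever the exact definition, the point of the transformation is the same: with a strongly right-skewed score, a few patients with very high values would otherwise dominate both the mean change and the standard deviation.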

Feasibility.

This last aspect of the OMERACT filter focuses on the question of whether the measure can be applied easily, given the constraints of time, money, and interpretability. In order to provide insight into these matters, information is given about the time needed for training and for scoring of the methods, as well as the radiation exposure of the patients.

Statistical analysis.

P values less than 0.05 were considered significant.

RESULTS

Characteristics of the patients.

In Table 2, the characteristics of the patients of the OASIS cohort who were included in this study are described, as well as the characteristics of the patients who were not included. Patients who were not included were younger and had a shorter disease duration, but they were affected somewhat more severely than those who were included. Although radiologic damage, disease activity, function, and spinal mobility measures were somewhat worse for those in the group not included, the differences were not statistically significant. Therefore, the included population was considered to be representative of the entire OASIS cohort.

Table 2. Characteristics of the OASIS cohort at baseline*

| Variable | Included (n = 133): mean ± SD | Included: median | Included: 25th percentile | Included: 75th percentile | Included: range | Not included (n = 84): mean ± SD | Not included: median | Not included: 25th percentile | Not included: 75th percentile | Not included: range |
|---|---|---|---|---|---|---|---|---|---|---|
| Age, years | 44.6 ± 11.7 | 44.0 | 35.0 | 53.3 | 20.0–78.0 | 42.1 ± 14.0 | 40.2 | 31.6 | 50.3 | 19.0–77.0 |
| Duration of symptoms, years | 21.0 ± 11.6 | 18.2 | 12.3 | 27.1 | 0.4–51.0 | 18.9 ± 12.0 | 15.9 | 9.6 | 25.5 | 0.0–54.0 |
| Time since disease diagnosis, years | 11.7 ± 9.3 | 10.0 | 4.5 | 15.7 | 0.2–42.0 | 10.7 ± 9.5 | 9.5 | 4.2 | 9.5 | 0.0–34.0 |
| Male, % | 69.7 | | | | | 73.5 | | | | |
| Radiologic outcome | | | | | | | | | | |
| BASRI-spine score | 6.5 ± 3.0 | 7.0 | 4.0 | 9.0 | 1.0–12.0 | 6.4 ± 3.7 | 5.5 | 3.0 | 10.0 | 0.0–12.0 |
| BASRI-total score | 6.9 ± 3.4 | 7.0 | 4.0 | 9.0 | 1.0–16.0 | 7.0 ± 4.5 | 6.5 | 3.0 | 10.0 | 0.0–16.0 |
| SASSS score | 10.1 ± 18.0 | 2.0 | 0.0 | 12.0 | 0.0–72.0 | 12.6 ± 20.5 | 2.0 | 0.0 | 17.0 | 0.0–72.0 |
| Modified SASSS score | 12.7 ± 17.4 | 5.0 | 0.0 | 16.9 | 0.0–72.0 | 16.5 ± 22.9 | 4.0 | 0.0 | 31.3 | 0.0–72.0 |
| Disease activity, BASDAI | 3.4 ± 2.1 | 3.2 | 1.7 | 5.0 | 0.0–8.5 | 3.5 ± 2.2 | 3.2 | 1.7 | 4.9 | 0.0–9.7 |
| Function, BASFI | 3.3 ± 2.5 | 3.2 | 1.0 | 5.0 | 0.0–9.7 | 3.5 ± 2.7 | 3.2 | 1.0 | 5.3 | 0.0–10.0 |
| Spinal mobility | | | | | | | | | | |
| Lateral spinal flexion, mm | 101.8 ± 63.7 | 100.0 | 54.5 | 150.5 | 0.0–242.0 | 97.1 ± 63.1 | 94.0 | 44.8 | 144.0 | 0.0–261.0 |
| Occiput-to-wall distance, mm | 36.4 ± 48.7 | 15.0 | 0.0 | 61.8 | 0.0–260.0 | 42.7 ± 65.5 | 0.0 | 0.0 | 58.0 | 0.0–261.0 |
| Modified Schober test, mm | 129.0 ± 19.9 | 132.0 | 119.0 | 140.8 | 14.0–168.0 | 125.2 ± 14.5 | 125.0 | 110.0 | 137.0 | 100.0–160.0 |

* OASIS = Outcome in Ankylosing Spondylitis International Study; BASRI = Bath Ankylosing Spondylitis Radiology Index; SASSS = Stoke Ankylosing Spondylitis Spine Score; BASDAI = Bath Ankylosing Spondylitis Disease Activity Index; BASFI = Bath Ankylosing Spondylitis Functional Index.

Handling of missing data.

The sites most often scored as missing on the lateral view of the cervical spine were the lowest 3 sites: the upper and lower borders of C7 (in 4% and 10% of patients, respectively) and the upper border of T1 (in 10% of patients). One way of dealing with this problem would be to exclude these sites from the scoring system, which would lead to a loss of 25% of the information about the cervical spine. Therefore, we thought substitution in ∼10% of patients was a preferable approach.

The OMERACT filter.

Truth.

The involvement of, and progression in, the different parts of the skeleton are presented in Table 3. For the SI joints and the hips, only a small percentage of patients showed progression, and involvement of the hips was limited. Abnormalities in the lumbar spine were scored for the majority of patients at baseline with all methods, and changes could also be scored by all methods in the followup period of 4 years. The same was observed for the cervical spine. When the anterior and posterior sites of the lumbar spine were evaluated separately, the majority of patients showed damage at the anterior site, and this site also showed the most progression. The difference in the percentage of patients who showed progression in the lumbar and cervical spine using the various methods was noteworthy. The SASSS and the M-SASSS quantified a higher percentage of patients as having progression than did the BASRI.

Table 3. Percentages of patients with structural damage at baseline and radiographic progression at the 4-year followup, for the different parts of the skeleton, scored according to the 3 methods*

| Skeletal site | BASRI: % of patients with baseline damage | BASRI: % of patients with any change from baseline† | SASSS: % of patients with baseline damage | SASSS: % of patients with any change from baseline† | Modified SASSS: % of patients with baseline damage | Modified SASSS: % of patients with any change from baseline† |
|---|---|---|---|---|---|---|
| SI joints | 100 | 9 | NA | NA | NA | NA |
| Hips | 24 | 8 | NA | NA | NA | NA |
| Lumbar spine | 68 | 18 | 60 | 46 | 60 | 43 |
| Lumbar spine, anterior | NA | NA | 60 | 43 | 60 | 43 |
| Lumbar spine, posterior | NA | NA | 18 | 15 | NA | NA |
| Cervical spine | 65 | 23 | NA | NA | 56 | 41 |
| Cervical spine, anterior | NA | NA | NA | NA | 56 | 41 |

* BASRI = Bath Ankylosing Spondylitis Radiology Index; SASSS = Stoke Ankylosing Spondylitis Spine Score; SI = sacroiliac; NA = not applicable.
† Change is defined as every change >0.

We compared the radiologic damage and progression visible on the AP view of the lumbar spine with that on the lateral view. The 2 views did not always provide the same information. In 12% of all cases, more damage was seen on the AP view. Thus, if the AP view were to be omitted for staging AS patients, valuable information from 12% of patients would be missed.

We also investigated whether loss of information was similar if progression scores were used. In half of the cases in which the progression on both views differed, the progression scored on the AP view was greater. Whether the progression on the AP view was significantly greater than that on the lateral view was investigated by focusing on these sets of radiographs. Of all 389 intervals studied, in 39 (10%) the AP view showed more progression than the lateral view. These intervals concerned 19 patients. For each patient, the progression on the AP view was compared with the progression on the lateral view. In only 4 patients (3%) would the missed information on the AP view have added information to the scoring derived from only the lateral view. So, for the aim of staging, the AP lumbar view provided relevant additional information, but if radiographs were assessed with the aim of evaluating progression, the AP view did not contribute importantly.

We compared the radiologic damage and progression visible on the anterior site of the lumbar spine with that on the posterior site. In more than half of the patients, there was a difference in damage on the anterior site versus the posterior site of the spine. Almost always, this difference was caused by the fact that the damage at the anterior site was worse. But 13 patients showed more progression at the posterior site than at the anterior site. For each patient, the progression at the anterior site was compared with the progression at the posterior site. We found that in 10 of these 13 patients, the progression at the posterior site significantly contributed to the total progression. Assessing the posterior site did not contribute for staging purposes, but for assessing progression, it contributed significantly in <10% of the patients.

In Table 4, the correlations of the different scoring methods with measurements of spinal mobility, disease duration, and the BASFI are shown. There were significant correlations of spinal mobility, disease duration, and the BASFI with radiologic damage. For all methods, the correlations were of similar magnitude.

Table 4. Range of correlations between radiologic damage and spinal mobility, expressed as Spearman's rho, calculated at baseline and at 1 year, 2 years, and 4 years of followup*

| Measure | BASRI-spine | BASRI-total | SASSS | Modified SASSS |
|---|---|---|---|---|
| Lateral spinal flexion | −0.50, −0.75 | −0.47, −0.75 | −0.56, −0.77 | −0.52, −0.75 |
| Occiput-to-wall distance | 0.59, 0.65 | 0.56, 0.63 | 0.53, 0.61 | 0.52, 0.64 |
| Modified Schober test | −0.51, −0.65 | −0.50, −0.65 | −0.61, −0.76 | −0.56, −0.67 |
| Disease duration | 0.37, 0.42 | 0.38, 0.42 | 0.34, 0.36 | 0.33, 0.36 |
| BASFI | 0.33, 0.39 | 0.34, 0.39 | 0.33, 0.41 | 0.32, 0.37 |

* BASRI = Bath Ankylosing Spondylitis Radiology Index; SASSS = Stoke Ankylosing Spondylitis Spine Score; BASFI = Bath Ankylosing Spondylitis Functional Index.

Discrimination.

The first part of the discrimination aspect is reliability. Inter- and intraobserver reliability values are shown in Table 5 for status scores at 2 years and for progression scores after a followup of 2 years. The interobserver reliability for the status scores was very good for all methods and excellent for the M-SASSS. The intraobserver reliability for status scores was excellent for all methods. The interobserver reliability for progression scores was good only for the M-SASSS; reliability for the BASRI and the SASSS was unsatisfactory. When we focused on the kind of error for the different methods, it appeared that for the BASRI, the error was random and that for the SASSS, the error consisted of random error and error caused by differences between the observers. The difference between the SASSS and the M-SASSS was remarkable, so interobserver reliability was investigated separately for the different sites of the SASSS.

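Table 5. Inter- and intraobserver reliability based on evaluation of 20 patients, expressed in variance components*

| Method | Reliability | Score | Residual variance component | Observer variance component | Patient variance component |
|---|---|---|---|---|---|
| BASRI-total | Interobserver | Status score at 2 years | 13.4 | 0.0 | 86.6 |
| BASRI-total | Interobserver | Progression score after 2 years | 50.9 | 0.6 | 48.5 |
| BASRI-total | Intraobserver | Status score at 2 years | 3.4 | 0.4 | 96.3 |
| BASRI-total | Intraobserver | Progression score after 2 years | 6.3 | 0.7 | 93.0 |
| BASRI-spine | Interobserver | Status score at 2 years | 14.9 | 0.0 | 85.1 |
| BASRI-spine | Interobserver | Progression score after 2 years | 48.9 | 0.0 | 51.1 |
| BASRI-spine | Intraobserver | Status score at 2 years | 2.7 | 0.3 | 97.0 |
| BASRI-spine | Intraobserver | Progression score after 2 years | 6.3 | 0.0 | 93.0 |
| SASSS | Interobserver | Status score at 2 years | 7.4 | 4.0 | 88.6 |
| SASSS | Interobserver | Progression score after 2 years | 32.5 | 23.8 | 43.7 |
| SASSS | Intraobserver | Status score at 2 years | 0.9 | 0.0 | 99.1 |
| SASSS | Intraobserver | Progression score after 2 years | 21.3 | 0.0 | 78.7 |
| Modified SASSS | Interobserver | Status score at 2 years | 2.2 | 0.0 | 98.4 |
| Modified SASSS | Interobserver | Progression score after 2 years | 17.9 | 0.4 | 81.7 |
| Modified SASSS | Intraobserver | Status score at 2 years | 0.8 | 0.2 | 99.1 |
| Modified SASSS | Intraobserver | Progression score after 2 years | 5.0 | 0.0 | 95.0 |

* BASRI = Bath Ankylosing Spondylitis Radiology Index; SASSS = Stoke Ankylosing Spondylitis Spine Score.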

For the anterior site of the SASSS, the residual variance component was 32.4, the observer variance component was 6.8, and the patient variance component was 60.9. For the posterior site, these values were 71.4, 25.9, and 2.7, respectively. Thus, it was mainly the interobserver reliability of the posterior site that was very poor. The intraobserver reliability of progression scores was good for all methods. Only the reliability scores at 2 years and after an interval of 2 years are given, but all results also apply to the status scores at baseline, 1 year, and 4 years and to the progression scores after 1- and 4-year intervals (data not shown). One exception was the interobserver reliability of the progression score according to the M-SASSS after a 1-year interval. This score was not good; the patient variance component was 49.6. This poor interobserver reliability was due to residual error (40.7).

Data concerning sensitivity to change are presented in Table 6, where it can be seen that changes could be detected by all methods. After 4 years, the BASRI-spine and the BASRI-total showed changes of 0.6 and 0.7 points, respectively, and the SASSS and the M-SASSS showed changes of 3.5 and 4.4, respectively. In Table 7, the percentages of patients who showed changes with each method are listed. The M-SASSS quantified the highest percentage of patients with changes. We also investigated whether a ceiling effect occurred. At baseline, 5.3% of patients had a maximal score for the BASRI-spine; the corresponding values were 0.8% for the BASRI-total, 3.8% for the SASSS, and 0.8% for the M-SASSS. If the different parts of the scoring systems were considered separately, then 14% of patients had a maximal score for the lumbar spine according to the BASRI versus 5% with the SASSS and the M-SASSS. For the cervical spine, 12% of patients had a maximal score according to the BASRI versus 2% of patients with the M-SASSS. Therefore, progression scores as assessed by the BASRI may have been influenced by a ceiling effect. Across the different intervals, the BASRI-spine and the BASRI-total had lower effect sizes (0.12–0.36) than did the SASSS (0.32–0.51) and the M-SASSS (0.34–0.58).
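The two descriptive quantities used here, the percentage of patients with a change >0 and the percentage already at the maximal score, are simple to compute. The sketch below shows one way to do so; the function names and the example scores are hypothetical and do not come from the study data.

```python
from typing import Sequence


def percent_with_change(baseline: Sequence[float], followup: Sequence[float]) -> float:
    """Percentage of patients whose followup score exceeds the baseline score."""
    if len(baseline) != len(followup) or not baseline:
        raise ValueError("need paired, non-empty score lists")
    changed = sum(1 for b, f in zip(baseline, followup) if f - b > 0)
    return 100 * changed / len(baseline)


def percent_at_ceiling(scores: Sequence[float], max_score: float) -> float:
    """Percentage of patients whose score already equals the scale maximum."""
    if not scores:
        raise ValueError("need at least one score")
    return 100 * sum(1 for s in scores if s >= max_score) / len(scores)


if __name__ == "__main__":
    # Invented scores for 4 patients on a 0-72 scale.
    baseline = [0, 5, 72, 10]
    followup = [2, 5, 72, 13]
    print(percent_with_change(baseline, followup))
    print(percent_at_ceiling(baseline, max_score=72))
```

A patient at the ceiling at baseline cannot show further progression on that scale, so a method with many patients at the maximum will tend to report a lower percentage with change.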

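Table 6. Four-year followup of structural damage assessed by the different scoring methods in 133 patients*

| Scoring method | Time point | No. of patients | Mean ± SD | Median | 25th percentile | 75th percentile |
|---|---|---|---|---|---|---|
| BASRI-spine | Baseline | 133 | 6.5 ± 3.0 | 7.0 | 4.0 | 9.0 |
| BASRI-spine | 1 year | 129 | 6.7 ± 3.1 | 7.0 | 4.0 | 9.0 |
| BASRI-spine | 2 years | 127 | 6.9 ± 3.0 | 7.0 | 4.0 | 9.0 |
| BASRI-spine | 4 years | 133 | 7.1 ± 3.1 | 7.5 | 4.5 | 9.8 |
| BASRI-total | Baseline | 133 | 6.9 ± 3.4 | 7.0 | 4.0 | 9.0 |
| BASRI-total | 1 year | 129 | 7.1 ± 3.4 | 7.0 | 4.0 | 9.0 |
| BASRI-total | 2 years | 127 | 7.3 ± 3.4 | 7.5 | 4.5 | 10.0 |
| BASRI-total | 4 years | 133 | 7.6 ± 3.5 | 8.0 | 4.8 | 10.0 |
| SASSS | Baseline | 132 | 10.1 ± 18.0 | 2.0 | 0.0 | 12.0 |
| SASSS | 1 year | 129 | 11.7 ± 18.8 | 3.0 | 0.0 | 14.0 |
| SASSS | 2 years | 127 | 12.6 ± 19.2 | 4.0 | 0.0 | 16.0 |
| SASSS | 4 years | 133 | 13.6 ± 19.3 | 6.0 | 0.0 | 20.0 |
| Modified SASSS | Baseline | 131 | 12.7 ± 17.4 | 5.0 | 0.0 | 16.9 |
| Modified SASSS | 1 year | 128 | 14.4 ± 18.3 | 5.5 | 0.0 | 20.0 |
| Modified SASSS | 2 years | 126 | 15.5 ± 18.9 | 6.5 | 0.0 | 22.4 |
| Modified SASSS | 4 years | 132 | 17.1 ± 19.6 | 9.9 | 1.0 | 27.6 |

* BASRI = Bath Ankylosing Spondylitis Radiology Index; SASSS = Stoke Ankylosing Spondylitis Spine Score.

Table 7. Percentages of patients showing a change of ≥1 unit per scoring method*

| Scoring method | 1 year | 2 years | 4 years |
|---|---|---|---|
| BASRI-spine | 14.0 | 25.2 | 37.6 |
| BASRI-total | 14.7 | 26.8 | 42.1 |
| SASSS | 31.0 | 38.1 | 45.5 |
| Modified SASSS | 41.6 | 46.4 | 56.5 |

* BASRI = Bath Ankylosing Spondylitis Radiology Index; SASSS = Stoke Ankylosing Spondylitis Spine Score.

Feasibility.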

The time needed for each scoring method differed. It was not possible to provide a mean time required for scoring a single set of radiographs for a patient. The BASRI took the least time, since this method is less detailed than the SASSS and the M-SASSS (in the latter 2 methods, every corner of a vertebra must be assessed). The same also applied to the time needed for training. The radiation exposure for the patients was as follows (based on data provided by the Radiology Department of the University Hospital Maastricht): AP view of the pelvis = 0.54 mSv, AP view of the lumbar spine = 0.54 mSv, lateral view of the lumbar spine = 0.93 mSv, and lateral view of the cervical spine = 0.07 mSv. The total exposure for the different methods was 2.08 mSv for the BASRI, 0.93 mSv for the SASSS, and 1 mSv for the M-SASSS.
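The totals follow directly from the per-view doses and from the views each method requires, as the short check below shows. The dictionary keys are illustrative labels; the doses and the view lists are the ones given in the text.

```python
# Per-view effective dose in mSv, as quoted in the text.
DOSE_MSV = {
    "pelvis": 0.54,
    "lumbar_ap": 0.54,
    "lumbar_lateral": 0.93,
    "cervical_lateral": 0.07,
}

# Views required by each scoring method, as described in the Methods.
VIEWS_PER_METHOD = {
    "BASRI": ["pelvis", "lumbar_ap", "lumbar_lateral", "cervical_lateral"],
    "SASSS": ["lumbar_lateral"],
    "M-SASSS": ["lumbar_lateral", "cervical_lateral"],
}

# Sum the doses per method; round to 2 decimals to hide floating-point noise.
TOTAL_MSV = {
    method: round(sum(DOSE_MSV[view] for view in views), 2)
    for method, views in VIEWS_PER_METHOD.items()
}

if __name__ == "__main__":
    for method, total in TOTAL_MSV.items():
        print(f"{method}: {total:.2f} mSv")
```

The M-SASSS therefore adds only 0.07 mSv (the lateral cervical view) to the exposure of the SASSS, whereas the BASRI roughly doubles it because of the pelvic and AP lumbar views.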

DISCUSSION

The M-SASSS seems to be the most appropriate method for scoring radiologic progression in AS patients. This conclusion is based on the following aspects of the OMERACT filter: truth, discrimination, and feasibility.

With regard to truth, a valid scoring system requires assessments of the cervical and lumbar spine. Inclusion of the SI joints and hips has no additional value for the detection of progression. An AP view of the lumbar spine as well as an assessment of the posterior site of the lumbar spine do not provide sufficient additional information about progression to justify the extra effort, but an AP view will provide additional information (and therefore better reflects the truth) if the level of damage rather than the progression of damage is the major concern. The consequences are that the SASSS is not recommended because it does not take into account the cervical spine. The BASRI is recommended because of its AP view if radiographic damage is the matter of interest, but this AP view does not supply valuable additional information if progression must be scored.

With regard to discrimination, the M-SASSS demonstrated superior interobserver reliability. In terms of sensitivity to change, this method quantifies a higher proportion of patients as having progression as compared with the BASRI. It also appeared that the BASRI, in contrast with the M-SASSS, might be subject to a ceiling effect. With regard to feasibility, the BASRI takes less time for scoring and training but yields the highest radiation exposure to the patient. For the aspect of feasibility, there is no preferred method.

When the results of our comparison of the different scoring methods against the OMERACT filter are surveyed, the M-SASSS seems to be preferable for the evaluation of radiologic progression in clinical trials and cohort studies. Several studies related to the scoring of radiographs of AS patients have been published, mainly by the developers of the BASRI and the SASSS. It appears that our results are consistent with the results of those studies.

In our study, we could not find support for including the hips in a staging or a progression score. This finding is supported by the data of MacKay et al (1), who persuasively explain why the hips are not included in the BASRI-spine. Because hip disease affects only 18–37% of the AS population, the use of a global score for every AS patient, with a maximum score of 16 rather than 12, may inappropriately dilute the score of the majority of AS patients. Those with severe, or grade 4, spinal disease without hip arthritis would rate only 12 on a 16-point global scale despite having a bamboo spine, poor metrologic values, and poor function. It may be better to grade these populations separately, using the BASRI-spine for one and the BASRI-total for the other. Note that omitting the hips and SI joints from our scoring method does not necessarily mean that these joints are unimportant for prognostication in AS. For example, hip involvement has been identified as an important predictor of severe disease (14).

The results reported by MacKay et al (1) are also consistent with our conclusion about the essential inclusion of the cervical spine. They presented data on the involvement of the cervical and lumbar spine, SI joints, and hips in a group of 470 patients (15). More than 80% of the patients showed involvement of the cervical or lumbar spine or both (43%), and 8% of the patients showed changes only in the cervical, but not the lumbar, spine. As for which view is needed for scoring the lumbar spine, our conclusion on the status scores is again supported by MacKay et al (1). They judged 58 sets of AP and lateral views of the lumbar spine, and scores for the AP view, the lateral view, and a combination score (the highest score of the 2 views) were obtained. The combination score differed from the AP or lateral score if syndesmophytes or fusion was seen at different levels on each projection. This occurred in 3 of the 58 patients. The combination score differed from the AP score alone in 9 of the 58 patients (15.5%) and from the lateral score alone in 21 patients (36%). Overall, the use of 2 projections changed the score in 46% of the cases. Assuming that the combination view provides the most truthful assessment, the sensitivity of the AP view alone is 0.83 and that of the lateral view alone is 0.73. For the aim of staging, both views are therefore necessary. Unfortunately, MacKay et al did not investigate whether both views are also necessary for assessing progression.

The thoracic spine is not included in any of the scoring methods. This is due to technical problems related to the anatomy of the chest with superimposed lung tissue. Another structure of the spine that has not been mentioned is the facet joints. In lateral views of the lumbar spine, these joints are difficult to assess with any degree of confidence even by an experienced musculoskeletal radiologist (2). On an AP view, these joints can be assessed. This is an advantage of the BASRI scoring method. All other methods ignore the posterior structures of the spine, classifying those who have only posterior element fusion as normal or as having mild radiographic changes, when in fact the spine may be completely fused (1). In Table 4, measurements of spinal mobility were compared with radiologic scores. We found a good correlation, and this relationship was as good for the BASRI-spine as for the other methods.

An important disadvantage of the BASRI in comparison with the SASSS methods is the fact that it does not pick up minor radiologic change. The score does not change with each additional erosion or sclerosis, and will always remain grade 2 until there is fusion between 2 vertebrae or ≥3 syndesmophytes are identified. The developers of the BASRI and the SASSS evaluated their reliability and sensitivity to change. Inter- and intraobserver reliability of the BASRI was assessed on status scores (1), which showed good reliability. After a period of 1 year, no change was observed. In a 2-year period, the mean BASRI-spine value increased from 7.0 to 7.9 (in 40 patients), which was statistically significant. The radiographs in that study were blinded for chronology, confirming that the BASRI could determine “forward progression” (i.e., identify the earlier of 2 radiographs obtained from the same individual). We found a smaller progression, from 6.5 to 6.9 over a 2-year time interval, even though our radiographs were read in chronological order, which often amplifies progression scores. The difference might be explained by the fact that the patient population in Bath, UK, differs from the population in the OASIS cohort with respect to disease severity.

The developers of the SASSS (16) also investigated the reliability of their method. They showed a good interobserver reliability but, unexpectedly, poorer intraobserver reliability. Sensitivity to change was assessed in 28 patients over a 12-month time interval, and the radiographs were read in known order. The SASSS increased by 4.1 (from 14.4 to 18.5), which was statistically significant. This increase is considerable in comparison with our results; after 4 years of observation, we observed a progression of only 3.5 points.
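
To put the progression figures quoted in this discussion on a common footing, the changes can be expressed per year of followup. The sketch below is illustrative only: `annualized_change` is a hypothetical helper, it assumes progression is linear over each interval (a simplification), and the 3.5-point change over 4 years is assumed to be on the same scale as the SASSS figure it is compared with.

```python
def annualized_change(start, end, years):
    """Mean score change per year, assuming linear progression."""
    if years <= 0:
        raise ValueError("years must be positive")
    return (end - start) / years


rates = {
    # BASRI-spine, developers' cohort: 7.0 -> 7.9 over 2 years.
    "BASRI-spine, developers (2 years)": annualized_change(7.0, 7.9, 2),
    # BASRI-spine, present study: 6.5 -> 6.9 over 2 years.
    "BASRI-spine, present study (2 years)": annualized_change(6.5, 6.9, 2),
    # SASSS, developers' cohort: 14.4 -> 18.5 over 12 months.
    "SASSS, developers (1 year)": annualized_change(14.4, 18.5, 1),
    # Present study: only the 4-year change (3.5 points) is quoted,
    # so the start value is set to 0 for the purpose of the arithmetic.
    "Present study, 3.5-point change (4 years)": annualized_change(0.0, 3.5, 4),
}

for label, rate in rates.items():
    print(f"{label}: {rate:.2f} points/year")
```

Under these assumptions, the yearly change reported by the developers is several times larger than that seen in the present cohort, which is the contrast the text draws.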

The cohort used in this study has been studied before, as mentioned above (5, 6). In contrast with the previous study, we observed a change after 1 year; however, the order in which the radiographs were scored was known in the present study, whereas it was unknown in the study by Spoorenberg et al. This can markedly influence the results, as has been shown for rheumatoid arthritis (17). Moreover, in the Spoorenberg study, the average of the 2 observers' progression scores was used to determine whether a patient was classified as having progression, and the criteria for defining progression were much stricter than those applied in the present study. This was especially a disadvantage for the M-SASSS. With the 4-year data available, we observed that the minor changes after 1 and 2 years indeed forecast further progression after 4 years, which adds to the validity of these minor changes.

The different results on progression in all studies can also be explained by a difference in composition of the patient populations. There are 2 different concepts in the mode of radiographic and functional progression of AS during the first 10 years after disease onset. While 2 groups (18, 19) have reported that the most rapid progression occurred in this period, another group (20) recently reported that, in their patient population, radiographic progression was linear, with no significant changes between the decades.

This study may evoke some concerns. First, the conclusions of this study are based on findings in the OASIS cohort. Although this cohort represents the entire spectrum of AS patients, which adds to the external validity of the observations, the conclusions still need to be confirmed by other independent investigators examining a different cohort. Second, we did not investigate whether any of the measures are subject to spectrum bias, i.e., whether they perform differently in patients with early versus late AS. The group was too small to make subgroups for such an analysis. Third, most of the analyses in this study are based on the scores of one reader. Although interobserver reliability appeared to be satisfactory, future studies should include more readers in order to limit biases due to single observers.

None of the studies describing the measurement of radiologic change in AS patients report data on the reliability of progression scores, which is important in clinical trials. Therefore, we would like to emphasize that future studies need to pay attention to the reliability of these scores. As can be seen from our results, the reliability of progression scores can add important information to the reliability of status scores. In our study, change could be assessed reliably by the M-SASSS.
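
The distinction between the reliability of status scores and that of progression scores can be illustrated with a small sketch. The code below computes an intraclass correlation coefficient in the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1). That choice of ICC form is an assumption, since this discussion does not state which form was used, and the function name and all scores are hypothetical.

```python
def icc_two_way_absolute(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` holds one row per patient and one column per reader.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 patients and >= 2 readers, rectangular data")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (patients = rows, readers = columns).
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores from two readers for five patients.
reader_a_baseline = [10, 20, 30, 40, 50]
reader_b_baseline = [11, 19, 31, 39, 51]
reader_a_year2 = [12, 21, 34, 41, 56]
reader_b_year2 = [12, 21, 33, 41, 55]

status_pairs = [list(p) for p in zip(reader_a_baseline, reader_b_baseline)]
progression_pairs = [
    [a2 - a0, b2 - b0]
    for a0, b0, a2, b2 in zip(
        reader_a_baseline, reader_b_baseline, reader_a_year2, reader_b_year2
    )
]

icc_status = icc_two_way_absolute(status_pairs)
icc_progression = icc_two_way_absolute(progression_pairs)
print(f"status ICC: {icc_status:.3f}, progression ICC: {icc_progression:.3f}")
```

In this made-up example the readers disagree by about 1 point per radiograph. That error is negligible relative to the spread of status scores between patients but large relative to the spread of change scores, so the progression ICC comes out markedly lower than the status ICC.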

In summary, comparing the BASRI, the SASSS, and the M-SASSS with respect to their use in clinical trials, we have shown that the M-SASSS offers advantages in measurement properties. However, the BASRI is a feasible and user-friendly method that reliably detects damage in patients with AS, and can be used for that purpose in clinical practice.

REFERENCES

  1. MacKay K, Mack C, Brophy S, Calin A. The Bath Ankylosing Spondylitis Radiology Index (BASRI): a new, validated approach to disease assessment. Arthritis Rheum 1998; 41: 2263–70.
  2. Averns HL, Oxtoby J, Taylor HG, Jones PW, Dziedzic K, Dawes PT. Radiological outcome in ankylosing spondylitis: use of the Stoke Ankylosing Spondylitis Spine Score (SASSS). Br J Rheumatol 1996; 35: 373–6.
  3. Creemers MC, Franssen MJ, van 't Hof MA, Gribnau FW, van de Putte LB, van Riel PL. A radiographic scoring system and identification of variables measuring structural damage in ankylosing spondylitis [thesis]. Nijmegen (The Netherlands): University of Nijmegen; 1993.
  4. Van der Heijde D, Bellamy N, Calin A, Dougados M, Khan MA, van der Linden S, and the Assessments in Ankylosing Spondylitis Working Group. Preliminary core sets for endpoints in ankylosing spondylitis. J Rheumatol 1997; 24: 2225–9.
  5. Spoorenberg A, de Vlam K, van der Heijde D, de Klerk E, Dougados M, Mielants H, et al. Radiological scoring methods in ankylosing spondylitis: reliability and sensitivity to change over one year. J Rheumatol 1999; 26: 997–1002.
  6. Spoorenberg A, de Vlam K, van der Linden S, Dougados M, Mielants H, van de Tempel H, et al. Radiological scoring methods in ankylosing spondylitis: reliability and change over 1 and 2 years. J Rheumatol 2004; 31: 125–32.
  7. Boers M, Brooks P, Strand CV, Tugwell P. The OMERACT filter for Outcome Measures in Rheumatology [editorial]. J Rheumatol 1998; 25: 198–9.
  8. Kennedy LG, Jenkinson TR, Mallorie PA, Whitelock HC, Garrett SL, Calin A. Ankylosing spondylitis: the correlation between a new metrology score and radiology. Br J Rheumatol 1995; 34: 767–70.
  9. Dale K. Radiographic gradings of sacroiliitis in Bechterew's syndrome and allied disorders. Scand J Rheumatol 1979; 32 Suppl 32: 92–7.
  10. MacKay K, Brophy S, Mack C, Doran M, Calin A. The development and validation of a radiographic grading system for the hip in ankylosing spondylitis: the Bath ankylosing spondylitis radiology hip index. J Rheumatol 2000; 27: 2866–72.
  11. Taylor HG, Beswick EJ, Dawes PT. Sulphasalazine in ankylosing spondylitis: a radiological, clinical and laboratory assessment. Clin Rheumatol 1991; 10: 43–8.
  12. Bellamy N. Musculoskeletal clinical metrology. Dordrecht (The Netherlands): Kluwer Academic Publishers Group; 1993. p. 259.
  13. Calin A, Garrett S, Whitelock H, Kennedy LG, O'Hea J, Mallorie P, et al. A new approach to defining functional ability in ankylosing spondylitis: the development of the Bath Ankylosing Spondylitis Functional Index. J Rheumatol 1994; 21: 2281–5.
  14. Amor B, Santos RS, Nahal R, Listrat V, Dougados M. Predictive factors for the longterm outcome of spondyloarthropathies. J Rheumatol 1994; 21: 1883–7.
  15. MacKay K, Brophy S, Mack C, Calin A. Patterns of radiological axial involvement in 470 ankylosing spondylitis patients [abstract]. Arthritis Rheum 1997; 40 Suppl 9: S61.
  16. Taylor HG, Wardle T, Beswick EJ, Dawes PT. The relationship of clinical and laboratory measurements to radiological change in ankylosing spondylitis. Br J Rheumatol 1991; 30: 330–5.
  17. Bruynesteyn K, van der Heijde D, Boers M, Saudan A, Peloso P, Paulus H, et al. Detecting radiological changes in rheumatoid arthritis that are considered important by clinical experts: influence of reading with or without known sequence. J Rheumatol 2002; 29: 2306–12.
  18. Carette S, Graham D, Little H, Rubenstein J, Rosen P. The natural disease course of ankylosing spondylitis. Arthritis Rheum 1983; 26: 186–90.
  19. Gran JT, Skomsvoll JF. The outcome of ankylosing spondylitis: a study of 100 patients. Br J Rheumatol 1997; 36: 766–71.
  20. Brophy S, MacKay K, Al-Saidi A, Taylor G, Calin A. The natural history of ankylosing spondylitis as defined by radiological progression. J Rheumatol 2002; 29: 1236–43.