The modified Stokes Ankylosing Spondylitis Spinal Score (mSASSS) quantifies radiographic changes in the cervical spine (C-spine) and the lumbar spine (L-spine), but not in the thoracic spine (T-spine). Our objective was to study the contribution of the lower part of the T-spine to structural damage in patients with ankylosing spondylitis (AS).
Radiographs of 80 AS patients obtained at baseline and after 2 years were scored by 2 readers using the mSASSS. In addition, changes in the lower T-spine (T10–T12) were quantified. On this basis, a new scoring tool was developed: the Radiographic Ankylosing Spondylitis Spinal Score (RASSS). The RASSS includes 2 changes: no scoring of erosions in order to confine the scoring to new bone formation, and no scoring of squaring in the C-spine for anatomic and feasibility reasons.
The mean ± SD change was 0.9 ± 2.5 units using the mSASSS and 1.6 ± 2.8 units using the RASSS (P < 0.001). Although the mSASSS identified new syndesmophytes in a mean ± SD of 1.4 ± 2.9 vertebral edges over 2 years, an additional 0.6 ± 1.2 vertebral edges were seen in the lower T-spine. New syndesmophytes or ankylosis were found in 15 patients (21.4%; 95% confidence interval [95% CI] 13.1–32.4%) in the C-spine/L-spine and in 6 patients (8.6%; 95% CI 3.8–17.2%) in the T-spine alone. The reliability of the RASSS and the agreement between readers were excellent.
Inclusion of the lower T-spine improves the sensitivity to change of scoring radiographic progression in AS. The tool developed in this study, the RASSS, showed better face and content validity than the mSASSS and was proven to be superior in the quantification of new bone formation in AS.
Ankylosing spondylitis (AS) is a frequent chronic inflammatory rheumatic disease that affects the axial skeleton at a young age (1), starting from the sacroiliac joints and later potentially spreading to the entire spine (2). New bone formation, syndesmophytes, and ankylosis of the vertebral column, which are pathognomonic for the structural changes occurring in AS, are used to assess the course of the disease.
The gold standard for the assessment of chronic structural changes in AS is conventional radiography (3, 4), although magnetic resonance imaging (MRI) techniques are also useful to assess spinal inflammation (2, 5). Using MRI, recent studies from Europe (6, 7) and North America (8, 9) have shown that the lower half of the thoracic spine (T-spine) is the region most frequently affected by active as well as chronic lesions in patients with AS.
It is of major interest in clinical studies and daily practice to know whether or not and how much radiographic progression related to AS can be detected in individual patients. These questions relate to the scoring systems used. Chronic spinal changes in AS are currently quantified by the modified Stokes Ankylosing Spondylitis Spinal Score (mSASSS) (10), an evaluated scoring system that is currently regarded as the best available on a data-driven basis (11). For the assessment of spinal radiographic progression in patients with AS, an observation period of 2 years is the shortest possible followup period based on the reliability and sensitivity to change of the mSASSS (12).
The smallest detectable change (SDC) (13) is a measure of a statistically significant radiographic change in individual patients beyond background noise in the case of a paired reading of the films. The SDC has a better sensitivity to change than the frequently used smallest detectable difference (SDD), which expresses the smallest difference between 2 independently obtained measures that can be interpreted as real (14). Because radiographic deterioration is scored with the films compared side by side (paired reading) and not independently in rheumatologic studies (14), the SDD is less appropriate than the SDC for the definition of cutoff levels for changes between measurements. Nevertheless, we have recently shown that counting new syndesmophytes between time points is even more sensitive than the SDC or SDD for depicting radiographic deterioration in patients with AS (15).
A potential disadvantage of the mSASSS is that the T-spine is not included, which is also true for other scoring systems (16). This is mainly for technical reasons such as superimposition of the lungs in plain radiographs. Therefore, the reliability of scoring the T-spine has remained insufficient to date (7). Thus, the most frequently affected spinal region has not become part of the scoring systems developed up to now. Because of this, many syndesmophytes may not have been detected in recent studies (17–20). This may in part explain the relatively low sensitivity to change of the mSASSS (15).
Recent data based on the mSASSS suggest that tumor necrosis factor blockers, although clinically very efficacious, do not inhibit radiographic progression in patients with AS (18–20). Because the mean radiographic change has been reported to be less than 1 syndesmophyte (mSASSS scores between 0.4 and 1.5 units over 2 years) (17, 21), the sensitivity to change of the mSASSS has been questioned.
Furthermore, the face and construct validity of the mSASSS may be criticized because a score of 1 contains a mixture of osteodestructive (erosions) and osteoproliferative changes (squaring and sclerosis). Recent studies have indicated that erosions account for less than 5% of all radiographic changes that develop over 2 years (15) in patients with AS. In addition, it was recently shown that the scoring of squaring in the cervical spine (C-spine) is problematic because several cervical vertebrae already appear squared by their natural anatomy, which gives rise to false-positive scores at this location (22).
The Outcome Measures in Rheumatology Clinical Trials (OMERACT) filter (23) is an instrument used to evaluate and compare different outcome methods for use in rheumatology. According to the OMERACT filter, 3 aspects (discrimination/sensitivity to change, truth, and feasibility) should be investigated before a preference between any proposed methods can be made.
The main objective of this study was to assess and quantify the additional information gained by inclusion of the lower T-spine in the assessment of radiographic progression observed in patients with AS. Furthermore, on that basis, we intended to develop and evaluate a new scoring tool that also takes into account the other problems discussed above.
PATIENTS AND METHODS
Patients were retrospectively selected based on the availability of radiographs of the lateral C-spine and lumbar spine (L-spine) at 2 time points: the first clinical presentation (baseline) and 2 years later (2-year followup). All radiographs were performed according to the standard protocol used in our hospital. All of the patients included in this study fulfilled the modified New York criteria for AS (24). Radiographs were performed between January 1999 and December 2003 as part of a routine outpatient clinic procedure. None of the patients were treated with biologic agents. No other criteria (level of disease activity, clinical or laboratory parameters) were used for selection.
After blinding of the radiographs for the patient's identity and the time point of performance, all images of the C-spine and L-spine were scored using the mSASSS, as recently described in detail (10, 15), by 2 experienced readers (JB, XB) in a blinded paired design (14). In addition, all visible vertebral edges (VEs) of the lower part of the T-spine were scored and separately documented.
Radiographic studies of AS always involve missing data and technical problems such as low quality of some images, overexposure or underexposure of films, or suboptimal positioning of the patients, leading to incomplete capturing of spinal segments and vertebrae (11, 15). In the present study, similar to recent proposals (11, 15), we excluded images of patients with greater than 3 VEs missing. In cases with ≤3 VEs missing, the missing VEs were replaced by the mean scores of the vertebrae of the same spinal segment. The lower part of the T-spine and the L-spine were handled as one spinal segment.
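The handling of missing VEs described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the segment labels, and the use of `None` for an unreadable VE are hypothetical, and the threshold of 3 missing VEs is applied here to the patient as a whole, which is one possible reading of the rule.

```python
from statistics import mean

MAX_MISSING = 3  # more than 3 missing VEs leads to exclusion (threshold from the text)

def impute_segment(scores):
    """Replace missing VE scores (None) with the mean of the readable VEs
    of the same spinal segment."""
    observed = [s for s in scores if s is not None]
    if not observed:
        raise ValueError("segment has no readable vertebral edges")
    fill = mean(observed)
    return [fill if s is None else s for s in scores]

def prepare_patient(segments):
    """segments maps a segment label to its list of VE scores.
    Returns None if the patient is excluded, otherwise the imputed scores.
    Assumption: missing VEs are counted over all segments of the patient."""
    missing = sum(s is None for scores in segments.values() for s in scores)
    if missing > MAX_MISSING:
        return None
    return {name: impute_segment(scores) for name, scores in segments.items()}

# Shortened toy example; the lower T-spine and the L-spine form one segment.
patient = {
    "cervical": [0, 2, None, 2],
    "thoracolumbar": [3, None, 3, 3],
}
prepared = prepare_patient(patient)
```

Treating the lower T-spine and the L-spine as a single key mirrors the statement that both were handled as one spinal segment.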
Development of the Radiographic Ankylosing Spondylitis Spinal Score.
When it became clear that the additional scores obtained from the lower T-spine improved the sensitivity to change of the mSASSS, we decided to develop a new scoring tool in order to further improve the validity of the method. Therefore, we changed 2 further aspects: 1) in order to confine the score to new bone formation, we excluded all scorings of erosions in all spinal segments (15), and 2) in order to avoid false-positive scores, we excluded the scorings for squaring in the C-spine. The main reason for the latter modification was a recent study showing that the scoring of squaring in the C-spine is not reliable, with the exception of C5 and C6 (22). Because squaring in the C-spine is an infrequent event (<1% of all scores), and because handling all vertebrae of a spinal segment in the same way is more feasible, we decided to omit the scoring of squaring for the entire C-spine; squaring is still scored in the T-spine and L-spine.
The new tool was named the Radiographic Ankylosing Spondylitis Spinal Score (RASSS) to stress the fact that this is no longer a mixed score of osteodestructive and osteoproliferative lesions, thereby distinguishing it from the mSASSS (Table 1). The final step was then to score all of the images a second time with this new score in order to compare it with the current gold standard (the mSASSS) and prove its feasibility.
Table 1. Comparison of the 2 scoring systems that were evaluated in this study: the mSASSS and the RASSS*
View of image/sites scored
mSASSS: Lateral/anterior vertebral edges
RASSS: Lateral/anterior vertebral edges
Assessed spinal segments
mSASSS: Lower edge of C2 to upper edge of T1; lower edge of T12 to upper edge of S1
RASSS: Lower edge of C2 to upper edge of T1; lower edge of T10 to upper edge of T12; lower edge of T12 to upper edge of S1
Range of scoring system
mSASSS: Erosion, squaring, sclerosis for both the cervical and lumbar spines
RASSS: No erosions scored, squaring only for the thoracic and lumbar spines, sclerosis scored for all sites available
* mSASSS = modified Stokes Ankylosing Spondylitis Spinal Score; RASSS = Radiographic Ankylosing Spondylitis Spinal Score; C = cervical; T = thoracic; S = sacral.
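The difference between the 2 systems at the level of a single VE can be illustrated with a short Python sketch. It assumes the usual mSASSS grading per VE (0 = normal; 1 = erosion, sclerosis, or squaring; 2 = syndesmophyte; 3 = bridging); the finding labels, segment labels, and function names are hypothetical and serve only to show the rule, not the readers' actual workflow.

```python
# Grade per finding, following the usual mSASSS convention (assumed here).
GRADE = {
    "normal": 0,
    "erosion": 1,
    "sclerosis": 1,
    "squaring": 1,
    "syndesmophyte": 2,
    "bridging": 3,
}

def msasss_ve(findings):
    """mSASSS grade of one cervical or lumbar VE: the highest-graded finding."""
    return max((GRADE[f] for f in findings), default=0)

def rasss_ve(findings, segment):
    """RASSS grade of one VE: erosions are never scored, and squaring is
    not scored in the cervical segment."""
    kept = set(findings) - {"erosion"}
    if segment == "cervical":
        kept -= {"squaring"}
    return max((GRADE[f] for f in kept), default=0)

print(msasss_ve({"erosion"}), rasss_ve({"erosion"}, "lumbar"))       # 1 0
print(msasss_ve({"squaring"}), rasss_ve({"squaring"}, "cervical"))   # 1 0
print(rasss_ve({"squaring"}, "thoracic"))                            # 1
```

Syndesmophytes and bridging are graded identically in both systems, so the 2 scores differ only for grade 1 findings and for the additional lower thoracic VEs.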
Assessments of clinical parameters (Bath Ankylosing Spondylitis Disease Activity Index, Bath Ankylosing Spondylitis Functional Index, and Bath Ankylosing Spondylitis Metrology Index [BASMI], which include the tests for anteroposterior [Schober] and lateral spinal mobility and standard laboratory parameters [C-reactive protein level, erythrocyte sedimentation rate]) were available for all patients at both time points. The results of the anteroposterior and lateral thoracolumbar mobility assessments were correlated with the status and change scores of the radiographic evaluations.
Wilcoxon's paired rank sum test was used to compare the readings of the 2 scoring systems between different time points. Pearson's correlation coefficient was used to measure the association between the radiographic data and the single clinical and laboratory parameters. To measure the variability between single readings of the change scores of the 2 readers, the interrater variance was estimated by means of analysis of variance. The intraclass correlation coefficients and their 95% confidence intervals (95% CIs) were calculated to compare the interrater variance with the variability between the total scores of the patients. Similar to a recent modification (15), the SDC (13) was calculated by taking into account the number of readings available for the calculation. This means that the calculation was based on 95% tolerance limits, ensuring that <5% of the changes greater than the SDC were due to the measurement error and/or the uncertainty in the readings.
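For orientation, the following Python sketch shows the SDC in its commonly cited form, 1.96 × SD of the between-reader differences in change scores, divided by √2 × √k, where k is the number of readers. This is an assumption made for illustration: the modified, tolerance limit–based calculation used in the present study is not reproduced here, and the function name and the toy data are hypothetical.

```python
import math
import statistics

def smallest_detectable_change(change_a, change_b, z=1.96):
    """SDC in its commonly cited form for 2 readers:
    z * SD(differences in change scores) / (sqrt(2) * sqrt(k)), with k = 2.
    Not the tolerance-limit variant described in the text."""
    diffs = [a - b for a, b in zip(change_a, change_b)]
    sd_diff = statistics.stdev(diffs)  # sample SD of between-reader differences
    k = 2  # number of readers
    return z * sd_diff / (math.sqrt(2) * math.sqrt(k))

# Hypothetical 2-year change scores of 5 patients from 2 readers.
reader_a = [0, 2, 1, 3, 0]
reader_b = [0, 1, 1, 4, 1]
sdc = smallest_detectable_change(reader_a, reader_b)
print(round(sdc, 2))
```

With 2 readers the denominator equals 2, so the SDC in this form is simply 0.98 times the SD of the between-reader differences in change scores.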
RESULTS
Additional information on status scores at baseline after inclusion of the lower part of the T-spine.
Altogether, 80 patients who had appropriate available radiographs were included in the study. The demographic, clinical, and radiographic data of the patients at baseline and at followup are shown in Table 2.
Table 2. Demographic, clinical, and radiographic descriptions of the 70 patients included in the assessment of the radiographic progression in this study*
The lower part of the T-spine was clearly visible and could be assessed in 70 patients (87.5%), whereas the remaining 10 patients (12.5%) had to be excluded from the analysis because fewer than 3 VEs were visible on their films.
The most cranial VE that could be detected was the lower edge of the ninth thoracic vertebra (T9). Therefore, the maximum information on radiographic progression that could have been obtained was on 6 additional VEs (lower edge of T9 through upper edge of T12). However, the lower edge of T9 was assessable in only 9 (12.9%) of 70 patients, and the upper edge of T10 was assessable in only 18 (25.7%) of 70 patients, leading to >3 missing sites in the T-spine in the majority of the patients. In contrast, all other VEs from the lower edge of T10 to the upper edge of T12 were visible in >50% of the patients. Their inclusion significantly added information to the total amount of radiographic damage detected, with a mean ± SD of 3.1 ± 0.4 additional VEs per patient, and allowed all 70 patients to be included in the evaluation. The inclusion of the additional VEs in the analysis increased the range of the scoring system from 0–72 units in the mSASSS to 0–84 units in the RASSS (Table 1).
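The increase in the scoring range follows directly from the number of VEs assessed. The short Python sketch below enumerates the VEs of each segment and multiplies by the maximum grade per VE, assumed here to be 3 as in the mSASSS; the helper function and variable names are hypothetical.

```python
def vertebral_edges(vertebrae):
    """List the VEs from the lower edge of the first vertebra
    to the upper edge of the last vertebra."""
    edges = [f"{vertebrae[0]} lower"]
    for v in vertebrae[1:-1]:
        edges += [f"{v} upper", f"{v} lower"]
    edges.append(f"{vertebrae[-1]} upper")
    return edges

MAX_GRADE_PER_VE = 3  # assumption: each VE is graded 0-3, as in the mSASSS

CERVICAL = vertebral_edges(["C2", "C3", "C4", "C5", "C6", "C7", "T1"])
LOWER_THORACIC = vertebral_edges(["T10", "T11", "T12"])
LUMBAR = vertebral_edges(["T12", "L1", "L2", "L3", "L4", "L5", "S1"])

msasss_max = MAX_GRADE_PER_VE * (len(CERVICAL) + len(LUMBAR))
rasss_max = MAX_GRADE_PER_VE * (len(CERVICAL) + len(LOWER_THORACIC) + len(LUMBAR))
print(msasss_max, rasss_max)  # 72 84
```

The 4 additional lower thoracic VEs (lower edge of T10, both edges of T11, upper edge of T12) account for the 12 extra units.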
Differences in the outcome of radiographic change after inclusion and exclusion of erosions and vertebral squaring in the C-spine of patients with AS.
Overall, 827 cervical VEs were available for analysis at baseline. Of those, a score of 1 was found in 16 VEs. In those VEs, squaring was identified in only 3 cases (0.4%). Furthermore, only 1% of the VEs showed deterioration from no damage to erosions after 2 years. Inclusion or exclusion of scorings for erosions and of scores for squaring in the C-spine did not change the overall score for radiographic deterioration: the mean ± SD RASSS change was 1.7 ± 3.1 units with the scorings and 1.6 ± 2.8 units without the scorings (P > 0.05).
Reliability of the readings for both scoring systems.
The interrater variances of the status and the change scores for both the mSASSS and the RASSS were very low, indicating excellent reliability for both scoring systems (Table 2). The low interrater variances also corresponded to very low SDC values of 1.1 for the mSASSS and 1.3 for the RASSS, suggesting that a progression of ≥2 units represents a relevant radiographic change. The detailed comparison of the 2 scoring systems, including data on the agreement/disagreement between readers, is shown in Table 3 and Figure 1.
Table 3. Detailed data on the reliability of the readings for both scoring systems*
Data are shown as the variance between patients and interrater variance with the corresponding intraclass correlation coefficients (ICCs), including agreement and disagreement between readers. mSASSS = modified Stokes Ankylosing Spondylitis Spinal Score; RASSS = Radiographic Ankylosing Spondylitis Spinal Score; 95% CI = 95% confidence interval.
Additional information on change scores of radiographic progression in the 2-year followup after inclusion of the lower part of the T-spine.
The radiographic progression after 2 years showed a mean ± SD change of 0.9 ± 2.5 units in the mSASSS and 1.6 ± 2.8 units in the RASSS (P < 0.001). When assessing each spinal region separately, the mean ± SD RASSS change was 0.5 ± 2.9 units in the C-spine alone, 0.4 ± 2.2 units in the L-spine alone, and 0.6 ± 3.3 units in the lower part of the T-spine alone within the 2-year followup period.
At followup, new syndesmophytes were depicted in mean ± SD 1.4 ± 2.9 VEs per patient when using the mSASSS, and in 0.6 ± 1.2 VEs per patient in the additionally analyzed lower edge of T10 to the upper edge of T12 (Figure 2).
In the analysis based on single VEs, the occurrence of AS-specific progression such as development of new syndesmophytes or progression from syndesmophytes to ankylosis occurred mean ± SD 0.04 ± 0.3 times per VE in the C-spine and L-spine and 0.02 ± 0.22 times per VE in the lower T-spine (P > 0.05 between segments).
Importantly, on the patient level, development of new syndesmophytes/ankylosis was seen in 15 (21.4%) of 70 patients (95% CI 13.1–32.4%) in the C-spine and L-spine and in 6 (8.6%) of 70 patients (95% CI 3.8–17.2%) in the T-spine alone; 4 patients showed such changes in all 3 spinal segments (C-spine, L-spine, and T-spine).
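The patient-level proportions and an approximate 95% CI can be recomputed with a few lines of Python. The sketch below uses the Wilson score interval as an assumption, because the method used for the published intervals is not stated; its limits may therefore differ slightly from the values given above. The function name is hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (approximate 95% CI)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Patients with new syndesmophytes/ankylosis among the 70 patients analyzed.
for label, k in (("C-spine/L-spine", 15), ("T-spine alone", 6)):
    low, high = wilson_interval(k, 70)
    print(f"{label}: {k}/70 = {k / 70:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

Adding the 6 patients with changes in the T-spine alone to the 15 patients with changes in the C-spine/L-spine gives 21 of 70 patients (30%) with definite progression.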
Feasibility of evaluation of changes in the lower part of the T-spine.
The mean ± SD time for the scoring of the images using the mSASSS was 91.6 ± 60.2 seconds per patient for the C-spine and L-spine, and 18.8 ± 10.1 seconds for the lower part of the T-spine. Therefore, the mean total time for scoring the films of a single patient was <2 minutes, with only a minor increase in time for the lower T-spine.
The radiation exposure (dose-area product) calculated for the patients was 616.94 cGy·cm², which remained below the limit (800 cGy·cm²) prescribed by the guidelines of the responsible authorities (Bundesamt für Strahlenschutz) (28).
Correlation of radiographic scores and clinical assessments.
The baseline values of the mSASSS and RASSS correlated significantly (r = 0.984, P < 0.001). Furthermore, there was a statistically significant correlation between the RASSS and the BASMI (r = 0.46), and also between the mSASSS and the BASMI (r = 0.49), at baseline (P < 0.001 for both). Regarding change scores, there was a statistically significant correlation between the mSASSS change and the RASSS change (r = 0.87, P < 0.001). However, there was no statistically significant correlation between the radiographic change scores and clinical or laboratory parameters (data not shown).
DISCUSSION
To our knowledge, this is the first study to examine and quantify the radiographic changes in the lower part of the T-spine using conventional radiographs of patients with AS. On this basis, and after taking into account recent publications on the subject, we developed a new tool for scoring radiographic progression in patients with AS that is purely based on new bone formation: the RASSS.
As shown in the present study, the lower part of the T-spine is visible in the vast majority of spinal radiographs obtained in daily routine in Germany. This is probably due to historical recommendations: some decades ago, Dihlmann taught about specifically assessing the region of the lower T-spine and the upper L-spine in patients with AS, which is still recommended by the German Society of Rheumatology (29). Since we cannot be sure that this is also the case in other radiologic departments in the world where the radiograph is often centered lower with tighter collimation so that only T12 is included on the image, our data suggest that it may be advisable to change the acquisition technique in order to be able to apply the new scoring system. The required technique is described in the Patients and Methods section. Generally, altering the beam and centering and opening the collimation requires only a minor change in practice, resulting in a slightly inferior projection of the lower L-spine without limiting AS-related changes for scoring.
In general, to be able to score with the RASSS, the segments T10 (lower edge) to T12 (upper edge), in addition to the C-spine and the L-spine, need to be assessed on the radiographs. Because spinal segments located more cranially are not clearly visible (15), VEs such as the lower edge of T9 and the upper edge of T10, although visible in some images, finally had to be excluded from the score. Whether it will be technically possible and statistically necessary to include more vertebrae of the T-spine will be the subject of future studies. In any case, at the end of the first part of this study, it seemed obvious that the lower part of the T-spine should be included in the assessment and the scoring of the lateral lower spine. On this basis, we believed that it was time to develop a new scoring system and not to modify the mSASSS a second time (10, 30). Therefore, we proposed to score radiographic progression in AS with the new scoring tool, the RASSS, which, as shown in this study, performs better than the mSASSS.
The general principle of scoring a mixture of osteoproliferative changes such as squaring, sclerosis, syndesmophyte formation, and ankylosis with osteodestructive changes such as erosions has no convincing face and content validity, since a development from an erosion to a syndesmophyte is a rare event, if it occurs at all. Theoretically, 2 scoring systems, one for osteoproliferative changes and one for osteodestruction, would be ideal. However, since erosions occur infrequently, accounting for less than 5% of all radiographic changes as shown only recently (15), it appeared straightforward to omit erosions from the score.
According to the data from a recent study in Korea (22), several vertebrae of the C-spine already appear squared by the nature of their anatomy. Since this was in accordance with our own subjective experience, we first decided to propose to only score the 5th and 6th cervical vertebrae in the RASSS without any further analysis of our data on this subject. Because we then calculated that squaring in the C-spine occurred in less than 1% of the VEs assessed, we finally decided to exclude all scorings of squaring in the C-spine from the final analysis. We also believe that this improves the feasibility and simplicity of the scoring system.
The status and change scores of the RASSS have been compared with the original mSASSS not only on a numerical basis of score units, but also based on the main aspects of the OMERACT filter (truth, discrimination/sensitivity to change, and feasibility) (23).
With respect to the aspect of truth, a major argument is that our results are very much in line with recent reports on the relative frequency of inflammatory and structural spinal lesions in patients with AS, as depicted by MRI, showing that spinal lesions mostly occur in the lower part of the T-spine (6–9). In the present study, when only the C-spine and the L-spine were scored, approximately 20% of the patients showed definite AS-related changes such as syndesmophytes and ankylosis, again similar to previous reports (15). This proportion increased to 30% of the patients when the lower part of the T-spine was added. Similar results were obtained when the analysis was performed on the basis of spinal segments: the mean change scores were higher in the lower part of the T-spine as compared with the changes in the rest of the spine.
Similar to the mSASSS, the RASSS is also mainly based on the most disease-specific changes in AS (15): syndesmophytes. Therefore, a change of ≥2 units in the RASSS represents a significant radiographic change. Importantly, this does not necessarily indicate the presence of new syndesmophytes, since 2 scores of a single RASSS unit would also add up to a score of 2. This score represents a relevant cutoff for the assessment of radiographic progression in AS patients after 2 years. On this basis, a deterioration of structural damage occurred in more than 80% of the patients in our cohort.
The truth aspect was also assessed by correlating the change scores of the scoring systems to clinical parameters. Similar to most previous studies, no significant correlations were found (11, 15, 18, 20), although another study suggested a closer link between radiographic damage and spinal mobility as measured by the BASMI (31). One possible reason for a partial disconnect is that physical function in AS is also independently determined by disease activity and not only radiographic damage of the spine (32).
Regarding the OMERACT filter aspect discrimination, the mean radiographic progression was significantly increased when using the RASSS as compared with the mSASSS. This might be related to the significantly higher ability of the RASSS to depict patients with development of new syndesmophytes, the most valid way to characterize cohorts of AS patients in terms of structural damage (15).
Finally, for the feasibility aspect, the addition of the lower part of the T-spine only marginally added time to the act of scoring in the RASSS as compared with the mSASSS. Overall, the additional seconds needed to score the lower part of the T-spine do not seem to matter much with respect to the gain in sensitivity to change. Furthermore, no additional time was needed to perform the radiographs of the lower part of the T-spine, since no extra radiographs were needed.
The fact that in other departments, performance of routine radiographs of the L-spine does not include the lower part of the T-spine may limit the ability to use the RASSS in those patients. However, in a setting of clinical studies, the inclusion of this part of the spine may be arranged as a part of the imaging protocol. It is worth mentioning that including the lower T-spine in the image may lead to a slightly inferior projection of the lower L-spine. However, this will not have a major influence on the results, and we think that this small sacrifice is worth it because of the gain in validity and sensitivity to change by using the RASSS.
A further limitation, not only of the present study but of all assessments of radiographic deterioration in the spine of AS patients, is that, for technical reasons, we are still not able to completely assess the lower half of the T-spine, although we are aware that the region T6–T9 is also frequently affected (6). Therefore, we do not think that this is the end of all efforts to improve the assessment of structural damage in AS, but we do believe that the step performed here with these data is the best we can currently do with the available technique.
In conclusion, the inclusion of the lower part of the T-spine significantly increases the sensitivity to change when scoring radiographic damage in AS because more syndesmophytes are potentially scored. The new tool, the RASSS, has better face and content validity than the mSASSS. Overall, it proved to be clearly superior for the assessment of structural damage in patients with AS. The RASSS should be further evaluated in clinical trials and cohort studies of patients with AS.
Dr. Braun had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study design. Baraliakos, Braun.
Acquisition of data. Baraliakos, Rudwaleit, Sieper.
Analysis and interpretation of data. Baraliakos, Listing, Braun.