Evaluation of the smallest detectable difference in outcome or process variables in ankylosing spondylitis




To evaluate the smallest detectable difference (SDD) of symptomatic outcome or process variables in ankylosing spondylitis (AS) patients from various countries.


Thirty consecutive AS patients with axial involvement were recruited from 1 center in each of 4 countries (Spain, Morocco, France, The Netherlands), for a total of 120 patients. Fourteen variables were studied in 6 domains: pain (3 variables), stiffness (1 variable), function (2 variables), spinal mobility (3 variables), patient global assessment (4 variables), and the domain of enthesiopathy (1 variable). All patients were evaluated twice within a 1-week period during which no clinical or therapeutic change occurred. Intracenter reliability was evaluated using the intraclass correlation coefficient (ICC). The SDD was determined using the Bland-Altman method.


Of the 14 variables evaluated in the 120 patients (82 men [68%], mean age 42 ± 12 years, mean disease duration 17 ± 13 years), only the SDD for occiput-to-wall distance differed significantly among centers. For the entire group, the SDD, expressed as a percentage of the range of the variable, varied from 10% (Mander enthesis index) to 39% (spinal pain at night last week). Intraobserver reliability was good (ICC > 0.80) except for morning stiffness and the modified Schober test (ICCs of 0.76 and 0.60, respectively).


This study suggests that the evaluation of AS patients is homogeneous and reliable across centers in different European and North African countries. Evaluation of the SDD of the symptomatic outcome or process variables is a starting point for determining the minimum clinically important difference, which in turn permits the results of clinical studies to be presented on an individual basis.


Ankylosing spondylitis (AS) is a chronic inflammatory disease of axial joints and, in particular, both sacroiliac joints. It is associated with extraarticular manifestations and is closely linked to HLA–B27 (1). Clinical presentation includes axial involvement, peripheral articular features, enthesiopathy, and extraarticular features. Patient evaluation and monitoring are specific to the clinical presentation.

The severity of extraarticular features, such as acute anterior uveitis, is usually evaluated by the frequency and severity of the events (2, 3).

The evaluation of the severity of peripheral articular features is similar to that in other inflammatory rheumatic diseases, such as rheumatoid arthritis. This evaluation includes the number of tender joints and the swollen joint count according to either the American College of Rheumatology or the European League Against Rheumatism recommendations (4, 5).

The technique of evaluating the severity of both axial involvement and enthesiopathy has been the focus of a recent working group on outcomes in AS, the Assessment in Ankylosing Spondylitis (ASAS) working group. This group has recently proposed a selection of important domains, and specific instruments within these domains, to be systematically evaluated and reported in clinical studies (6–8). Most of these instruments are continuous outcome variables. In clinical trials in general, results are reported as mean changes of outcome variables experienced by a group of patients. However, it has been proposed that the response of individual patients should also be reported (9, 10). The presentation of results on an individual basis offers several advantages: 1) it is more meaningful for the medical and nonmedical community; 2) it facilitates the comprehension of data from clinical trials; and 3) it might be applied to the calculation of sample size for clinical trials (11), assist in pharmacoeconomic analysis and in data analysis of the Cochrane Collaborative Project (12), and facilitate calculation of the number needed to treat as an approach to data analysis (13).

Presentation of results on an individual basis requires switching the continuous variables (for example, 0–100 mm change in pain visual analog scale [VAS]) into a dichotomous variable (for example, improvement in pain yes/no). Such a switch necessitates using a specific cutoff, permitting the differentiation of patients who present or do not present a clinically relevant change in the variable. The techniques permitting the determination of such a minimum clinically important difference (MCID) are numerous and were recently discussed during the last Outcome Measures in Rheumatology Clinical Trials meeting. Whatever the technique, the evaluation of MCID for a specific variable should take into account the noise related to its measurement error. Bland-Altman's limits of agreement method (14, 15) permits the evaluation of this noise due to measurement error. The results using this technique are generally presented as the smallest detectable difference (SDD) or minimum individual difference (16).
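The switch from a continuous to a dichotomous variable described above can be sketched as follows. This is a minimal illustration only: the function name, the patient values, and the 20-mm cutoff are hypothetical and are not taken from the study.

```python
def is_improved(baseline_mm: float, followup_mm: float, cutoff_mm: float) -> bool:
    """Return True when pain fell by at least the cutoff (lower VAS = less pain)."""
    return (baseline_mm - followup_mm) >= cutoff_mm


# Hypothetical patients: (baseline, follow-up) pain on a 0-100 mm VAS.
patients = [(70.0, 30.0), (55.0, 50.0), (40.0, 45.0)]

# Illustrative cutoff only; it is NOT a validated MCID or SDD.
CUTOFF_MM = 20.0

responders = [is_improved(b, f, CUTOFF_MM) for b, f in patients]
print(responders)  # [True, False, False]
```

Whatever cutoff is finally chosen, it should not be smaller than the measurement noise of the variable, which is the point made in the paragraph above.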

The results observed using this technique can be investigator-dependent or patient-dependent. For example, AS patients seem to have a greater disease severity in North Africa than in Western Europe (17). Moreover, the metrologic aspects of the collection of outcome variables can differ among countries, and thereby influence the results of the SDD.

Based on the above factors, we conducted a study on behalf of the ASAS group to evaluate the SDD of the ASAS-selected symptomatic outcome or process variables in 4 countries: France, Morocco, The Netherlands, and Spain.



Consecutive patients who fulfilled the modified New York criteria for ankylosing spondylitis (18), had differing levels of symptomatic activities, and were willing to participate in the study were recruited.


Centers in 4 cities (Cordoba, Spain; Maastricht, The Netherlands; Paris, France; and Rabat, Morocco) agreed to participate in the study. They were selected to include patients representing a large spectrum of activity and severity of AS in Europe and North Africa.

Outcome or process variables collected.

According to the ASAS proposal (6–8), 14 different outcome or process variables were evaluated in the following 6 domains: pain, stiffness, function, spinal mobility, patient global assessment, and entheses.

Domain of pain.

For this domain, spinal pain last week was assessed on a 100-mm VAS; spinal pain at night last week was assessed on a 100-mm VAS; and spinal pain was assessed according to the Food and Drug Administration (FDA) guidelines. The FDA spinal pain variable consists of a 0–4 Likert scale in which 0 = absence of pain during pressure, intensive percussion, and/or mobilization, with no spasm; 1 = minimal pain during pressure, intensive percussion, and/or mobilization, with no or minimal limitation of mobility; 2 = moderate pain during moderate pressure, percussion, or mobilization, with no or minimal limitation of mobility; 3 = moderate or severe pain during light pressure, percussion, or mobilization, with moderate or severe limitation of mobility; 4 = extreme and unbearable pain, even during minimal pressure or percussion, and practically no spinal mobility. The 4 areas evaluated are the cervical, thoracic, and lumbar spine and the sacroiliac joint. The scores obtained in each of the 4 areas are totaled; therefore, this variable ranges from 0 to 16 (19).

Domain of stiffness.

Only the duration, in minutes, of morning stiffness of the spine last week was evaluated for this domain.

Domain of function.

Two variables were evaluated in this domain: the Dougados Functional Index (DFI) and the Bath Ankylosing Spondylitis Functional Index (BASFI) during the last week. DFI focuses on 20 questions related to function, and each is scored 0 to 2 (20). This variable ranges from 0 to 40. BASFI includes 10 questions pertaining to function and is measured on a 100-mm VAS (21). The score is the mean of the results of the 10 questions and ranges from 0 to 100.
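The 2 scoring rules described above (a sum of 20 items for the DFI, a mean of 10 VAS answers for the BASFI) can be sketched as follows. The function names and the example answers are hypothetical; only the item counts and ranges come from the text.

```python
from statistics import mean


def dfi_score(items):
    """Dougados Functional Index: sum of 20 items, each scored 0 to 2 (range 0-40)."""
    if len(items) != 20 or any(not 0 <= x <= 2 for x in items):
        raise ValueError("DFI expects 20 items scored 0 to 2")
    return sum(items)


def basfi_score(vas_mm):
    """BASFI: mean of 10 answers, each on a 0-100 mm VAS (range 0-100)."""
    if len(vas_mm) != 10 or any(not 0 <= x <= 100 for x in vas_mm):
        raise ValueError("BASFI expects 10 answers between 0 and 100 mm")
    return mean(vas_mm)


# Hypothetical answers for one patient.
dfi = dfi_score([1] * 10 + [2] * 5 + [0] * 5)
basfi = basfi_score([30.0, 40.0, 50.0, 20.0, 60.0, 10.0, 70.0, 45.0, 35.0, 40.0])
print(dfi, basfi)  # 20 40.0
```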

Domain of spinal mobility.

Patients were examined for chest expansion, occiput-to-wall distance, and modified Schober test. All these variables were measured in centimeters. Chest expansion was measured with a centimeter tape measure at the level of the 4th intercostal space. Measurements were taken twice, at maximal inspiration and at maximal expiration. The chest expansion value was the greatest difference between the measurements at inspiration and expiration. Occiput-to-wall distance was measured twice, with a centimeter tape measure, between the occiput and the wall, with the patient standing erect, back against a straight wall, looking horizontally ahead. The occiput-to-wall distance was the highest value obtained. For the modified Schober test, 2 skin marks were made over the lumbosacral spine of a patient standing erect, one on the spinous process of the 5th lumbar vertebra and the other 10 cm above the lumbosacral junction. The patient then bent forward to the limit without flexing the knees, and the distance between the upper and lower marks was measured.

Domain of patient global assessment.

Four variables were collected: disease activity of the last week on VAS, the Bath Ankylosing Spondylitis Patient Global Score (BAS-G) during the last 6 months, BAS-G during the last week, and the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) during the last week. BAS-G was measured on a 100-mm VAS (22). BASDAI consists of 6 questions relating to individual domains of fatigue, spinal pain, joint pain, and symptoms, together with perception of pain relating to the entheses (i.e., tender bony sites around the body) and to aspects of morning stiffness (quantity and quality) (23). The response to each of these questions was measured on a 100-mm VAS. The BASDAI score is the mean of the results of the 6 questions and ranges from 0 to 100.


Domain of entheses.

The Mander enthesis index was also collected. This index scores pain from 0 to 3 at each of 30 entheses, giving a total that ranges from 0 to 90 (24).

Patient evaluation.

In each center, a single investigator performed all the evaluations. The investigators were experienced clinicians who routinely evaluate AS patients. Each patient was evaluated twice, with an interval of at least 1 day and no more than 1 week between evaluations for the subjective variables. Between the 2 evaluations, the patient had to be in stable condition, as defined by the physician's assessment, with no change in concomitant treatments.

Statistical analysis.

The statistical analysis was conducted in 3 successive steps. First, the demographic data and the baseline values of the outcome or process variables were compared among centers using the chi-square test for qualitative variables and analysis of variance for quantitative variables. Second, intraobserver reliability for the 14 outcome or process variables was assessed using 2 techniques: Bland-Altman plots, including the limits of agreement (data not shown) (14, 15); and random-effects intraclass correlation coefficients (ICCs) with 95% confidence intervals (95% CIs), using the formulas of Fleiss and Shrout (25).
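As an illustration of the ICC step, the sketch below computes a one-way random-effects ICC for single measurements from paired evaluations. The text does not state which random-effects form was used, so the one-way form is an assumption, and the function name and data are hypothetical. Confidence intervals are omitted.

```python
from statistics import mean


def icc_oneway(pairs):
    """One-way random-effects ICC for single measurements.

    `pairs` holds one row per patient, each row containing the repeated
    evaluations of that patient (2 values in a test-retest design).
    """
    n = len(pairs)       # number of patients
    k = len(pairs[0])    # evaluations per patient
    grand = mean(x for row in pairs for x in row)
    row_means = [mean(row) for row in pairs]

    # Between-patient mean square.
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    # Within-patient mean square (measurement error).
    msw = sum(
        (x - m) ** 2 for row, m in zip(pairs, row_means) for x in row
    ) / (n * (k - 1))

    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical paired evaluations (first visit, second visit).
example_pairs = [(10, 12), (20, 18), (30, 33), (40, 39)]
icc = icc_oneway(example_pairs)
print(round(icc, 2))
```

The ICC compares within-patient variability with between-patient variability, so a heterogeneous sample can yield a high ICC even when the absolute measurement error is large.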

The SDD of the 14 outcome or process variables was determined using the Bland-Altman 95% limits of agreement method (14, 15). In this method, the 95% limits of agreement are defined as dmean ± 1.96 × SD, where dmean is the mean of the differences between paired evaluations and SD is the standard deviation of those differences. Assuming no systematic bias (i.e., a mean difference of 0), the SDD is about 1.96 × SD. The 95% CIs of the upper and lower limits of agreement were also calculated (14, 15).
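The calculation of the limits of agreement and of the SDD can be sketched as follows. The function name and the paired values are hypothetical, and the sample standard deviation (n − 1 denominator) is an assumption. The confidence intervals of the limits are not computed here.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Return (d_mean, sd, lower_limit, upper_limit, sdd) for paired evaluations."""
    diffs = [b - a for a, b in zip(first, second)]
    d_mean = mean(diffs)          # mean of the paired differences
    sd = stdev(diffs)             # sample SD of the paired differences
    lower = d_mean - 1.96 * sd    # lower 95% limit of agreement
    upper = d_mean + 1.96 * sd    # upper 95% limit of agreement
    sdd = 1.96 * sd               # SDD, assuming no systematic bias
    return d_mean, sd, lower, upper, sdd


# Hypothetical VAS values (mm) at the first and second evaluation.
first_visit = [40, 55, 30, 62, 48]
second_visit = [45, 50, 33, 60, 52]

d_mean, sd, lower, upper, sdd = bland_altman(first_visit, second_visit)
print(f"mean difference = {d_mean:.1f}, SDD = {sdd:.1f}")
```

When the mean difference is not 0, the limits of agreement are no longer symmetric around 0, and the SDD then describes only the random component of the measurement error.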

The SDD was compared among centers with the modified Levene equal-variance test, using one-way analysis of variance on the differences between paired evaluations. The test was performed with the Number Cruncher Statistical System 2000 for Windows 95 (NCSS, Kaysville, UT). We used a 2-tailed formulation, and P values < 0.05 were considered statistically significant.
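Because the SDD is proportional to the SD of the paired differences, comparing the SDD among centers amounts to comparing the spread of those differences. The sketch below assumes that the modified Levene test is the median-based variant, i.e., a one-way analysis of variance on the absolute deviations from each center's median. The exact NCSS computation is not described in the text, and the function name and data are hypothetical. Only the F statistic and its degrees of freedom are returned.

```python
from statistics import mean, median


def modified_levene_f(groups):
    """Return (F, df_between, df_within) for a median-based Levene test."""
    # Absolute deviation of each value from the median of its own group.
    z = []
    for g in groups:
        center = median(g)
        z.append([abs(x - center) for x in g])

    n_total = sum(len(g) for g in z)
    k = len(z)
    grand = sum(sum(g) for g in z) / n_total
    group_means = [mean(g) for g in z]

    # One-way analysis of variance on the absolute deviations.
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(z, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(z, group_means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical paired differences in 2 centers with unequal spread.
center_a = [1, -1, 2, -2, 0]
center_b = [5, -5, 10, -10, 0]
f_stat, df1, df2 = modified_levene_f([center_a, center_b])
print(round(f_stat, 2), df1, df2)
```

Obtaining the P value requires the F distribution with these degrees of freedom. If SciPy is available, `scipy.stats.levene` with `center="median"` is one way to obtain both the statistic and the P value.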


Baseline characteristics.

Thirty patients were included in each center. Differences among centers were observed in most demographic characteristics (see Table 1): the percentage of men ranged from 53% (Paris) to 90% (Cordoba), the age (years) at study ranged from 35 (Rabat) to 51 (Maastricht), and age (years) at onset ranged from 21 (Cordoba) to 33 (Maastricht). The availability of HLA typing was different among centers, ranging from 100% (Cordoba) to 23% (Rabat). The structural severity defined by the presence of a bamboo spine was more pronounced in Morocco (33%) than in the 2 other centers in which this information was available (8% in Cordoba and 14% in Paris).

Table 1. Baseline characteristics of 120 ankylosing spondylitis patients included in 4 centers*

| Outcome variables | Cordoba | Maastricht | Paris | Rabat | Total |
| --- | --- | --- | --- | --- | --- |
| Men, n (%) | 27 (90) | 22 (73.3) | 16 (53.3) | 17 (56.7) | 82 (68.3) |
| Age, years | 44.6 ± 11.3 | 50.5 ± 8.7 | 35.9 ± 9.8 | 35.1 ± 9.0 | 41.6 ± 11.6 |
| Age at disease onset, years | 20.7 ± 5.7 | 32.6 ± 9.7 | 28.8 ± 7.0 | 23.8 ± 9.4 | 24.9 ± 9.1 |
| Disease duration, years | 24.0 ± 11.5 | 17.0 ± 10.1 | 12.2 ± 9.5 | 11.3 ± 8.9 | 17.1 ± 12.5 |
| Body mass index | 28.2 ± 5.5 | 24.2 ± 3.5 | 23.6 ± 3.6 | 23.0 ± 4.7 | 24.7 ± 4.8 |
| HLA–B27 not available, n | 0 | 10 | 4 | 23 | 37 |
| Bamboo spine not available, n | 6 | 30 | 2 | 0 | 38 |
| Spinal pain last week (VAS), 0–100 mm | 78.8 ± 32.7 | 37.5 ± 27.3 | 39.5 ± 33.0 | 33.3 ± 31.7 | 39.8 ± 31.4 |
| Spinal pain at night last week (VAS), 0–100 mm | 45.1 ± 37.2 | 31.4 ± 29.2 | 36.8 ± 36.1 | 33.4 ± 35.6 | 36.7 ± 34.6 |
| Spinal pain, FDA guidelines, 0–16 | 5.6 ± 3.8 | 1.5 ± 2.4 | 2.7 ± 2.9 | 2.7 ± 2.8 | 3.1 ± 3.3 |
| Morning stiffness duration, 0–180 minutes | 38.5 ± 50.6 | 10.3 ± 12.1 | 52.8 ± 66.4 | 33.5 ± 45.2 | 40.1 ± 53.4 |
| Dougados Functional Index, 0–40 | 17.0 ± 7.6 | 9.1 ± 6.3 | 10.8 ± 8.3 | 14.3 ± 9.6 | 12.8 ± 8.5 |
| BASFI, 0–100 mm | 47.5 ± 27.2 | 37.1 ± 22.9 | 30.1 ± 29.5 | 43.0 ± 29.8 | 39.4 ± 27.9 |
| Chest expansion, 0–12 cm | 4.0 ± 2.3 | 4.0 ± 1.6 | 6.7 ± 2.3 | 5.0 ± 2.7 | 5.0 ± 2.5 |
| Occiput-to-wall distance, 0–10 cm | 4.5 ± 6.3 | 4.1 ± 4.1 | 1.1 ± 2.6 | 4.9 ± 8.1 | 3.7 ± 5.8 |
| Modified Schober test, 0–10 cm | 2.6 ± 1.7 | 2.8 ± 1.4 | 3.3 ± 1.6 | 2.3 ± 1.7 | 2.8 ± 1.6 |
| Disease activity (VAS), 0–100 mm | 58.5 ± 32.1 | 36.0 ± 24.6 | 47.4 ± 30.9 | 49.9 ± 35.3 | 48.0 ± 31.6 |
| BAS-G last 6 months, 0–100 mm | 54.9 ± 32.7 | 38.3 ± 24.4 | 55.4 ± 30.7 | 56.1 ± 30.3 | 51.2 ± 30.3 |
| BAS-G last week, 0–100 mm | 51.8 ± 32.8 | 53.9 ± 20.7 | 46.7 ± 32.8 | 50.7 ± 36.6 | 50.8 ± 31.0 |
| BASDAI, 0–100 mm | 50.7 ± 28.0 | 36.3 ± 23.0 | 36.6 ± 29.5 | 34.6 ± 25.7 | 40.3 ± 27.2 |
| Mander Enthesis Index, 0–90 | 5.9 ± 6.4 | 10.3 ± 12.1 | 8.3 ± 9.2 | 14.8 ± 18.0 | 9.8 ± 12.5 |

* Data presented as mean ± SD unless otherwise noted. VAS = visual analog scale; FDA = Food and Drug Administration; BASFI = Bath Ankylosing Spondylitis Functional Index; BAS-G = Bath Ankylosing Spondylitis Patient Global Score; BASDAI = Bath Ankylosing Spondylitis Disease Activity Index.

Reliability and SDD.

ICCs observed in the centers were reasonable or high, with tolerably narrow 95% CIs, for most variables (Table 2). For the modified Schober test, ICCs were ≤0.60 with a wide 95% CI in all but 1 center. ICCs appeared to differ statistically among centers for morning stiffness, occiput-to-wall distance, and the modified Schober test (in each case, the 95% CI in 1 center did not overlap with that of the other 3 centers), and for BAS-G during the last week (the 95% CIs in 2 centers did not overlap). When all 120 patients were included, ICCs were nevertheless reasonable or high for all variables (ICC ≥ 0.76), except for the modified Schober test (ICC = 0.60).

Table 2. Intraclass correlation coefficients (95% confidence interval) of paired evaluations of 14 outcome or process variables in 120 ankylosing spondylitis patients included in 4 centers*

| Outcome variables | Cordoba | Maastricht | Paris | Rabat | Total |
| --- | --- | --- | --- | --- | --- |
| Spinal pain last week (VAS), 0–100 mm | 0.88 (0.76–0.94) | 0.93 (0.86–0.97) | 0.87 (0.75–0.94) | 0.78 (0.59–0.89) | 0.86 (0.81–0.90) |
| Spinal pain at night last week (VAS), 0–100 mm | 0.86 (0.74–0.93) | 0.92 (0.84–0.96) | 0.82 (0.66–0.91) | 0.75 (0.54–0.87) | 0.83 (0.77–0.88) |
| Spinal pain, FDA guidelines, 0–16 | 0.98 (0.95–0.99) | 0.88 (0.77–0.94) | 0.92 (0.84–0.96) | 0.73 (0.51–0.86) | 0.92 (0.89–0.94) |
| Morning stiffness duration, 0–180 minutes | 0.77 (0.56–0.89) | 0.87 (0.75–0.94) | 0.98 (0.95–0.99) | 0.75 (0.55–0.87) | 0.76 (0.67–0.83) |
| Dougados Functional Index, 0–40 | 0.87 (0.74–0.93) | 0.93 (0.87–0.97) | 0.86 (0.73–0.93) | 0.95 (0.90–0.98) | 0.93 (0.90–0.95) |
| BASFI, 0–100 mm | 0.88 (0.77–0.94) | 0.89 (0.78–0.95) | 0.97 (0.93–0.98) | 0.94 (0.87–0.97) | 0.93 (0.90–0.95) |
| Chest expansion, 0–12 cm | 0.85 (0.72–0.93) | 0.76 (0.56–0.88) | 0.89 (0.78–0.95) | 0.88 (0.76–0.94) | 0.88 (0.83–0.92) |
| Occiput-to-wall distance, 0–10 cm | 0.96 (0.91–0.98) | 0.93 (0.86–0.97) | 0.99 (0.97–0.99) | 1.0 (1.0–1.0) | 0.98 (0.97–0.98) |
| Modified Schober test, 0–10 cm | 0.50 (0.19–0.73) | 0.48 (0.15–0.71) | 0.94 (0.90–0.97) | 0.57 (0.27–0.77) | 0.60 (0.47–0.70) |
| Disease activity (VAS), 0–100 mm | 0.86 (0.74–0.93) | 0.86 (0.73–0.93) | 0.87 (0.75–0.94) | 0.68 (0.43–0.83) | 0.81 (0.74–0.87) |
| BAS-G last 6 months, 0–100 mm | 0.85 (0.72–0.93) | 0.79 (0.61–0.90) | 0.95 (0.89–0.97) | 0.82 (0.66–0.91) | 0.85 (0.72–0.93) |
| BAS-G last week, 0–100 mm | 0.91 (0.82–0.96) | 0.85 (0.72–0.93) | 0.93 (0.85–0.96) | 0.67 (0.41–0.83) | 0.91 (0.82–0.96) |
| BASDAI, 0–100 mm | 0.94 (0.88–0.97) | 0.94 (0.87–0.97) | 0.93 (0.86–0.97) | 0.93 (0.87–0.97) | 0.94 (0.91–0.95) |
| Mander Enthesis Index, 0–90 | 0.84 (0.69–0.92) | 0.96 (0.92–0.98) | 0.90 (0.81–0.95) | 0.95 (0.90–0.98) | 0.94 (0.91–0.96) |

* See Table 1 for abbreviations.

Smallest detectable difference.

Based on Bland-Altman's technique, Table 3 summarizes the results in terms of SDD for the 14 different variables. These results are presented for the entire group of 120 patients. The SDD, expressed in percentage of the range of the variable, ranged from 10% (Mander Enthesis Index) to 39% (spinal pain at night last week). Evaluation of intercenter differences reached statistical significance only for the SDD of the occiput-to-wall distance (P = 0.011), which appeared lower in the French and Moroccan centers than in the 2 others.

Table 3. Range of change values, smallest detectable difference, and 95% confidence interval (95% CI) of 95% limits of agreement of 14 outcome or process variables in 120 ankylosing spondylitis patients included in 4 centers*

| Outcome variables | Range of changes | Smallest detectable difference | 95% CI of 95% limits of agreement |
| --- | --- | --- | --- |
| Spinal pain last week (VAS), 0–100 mm | −65.5, 50 | 33.5 | −38.2, 39.4 |
| Spinal pain at night last week (VAS), 0–100 mm | −83, 55 | 38.7 | −44.6, 45.2 |
| Spinal pain, FDA guidelines, 0–16 | −6, 5 | 2.5 | −3.1, 2.7 |
| Morning stiffness duration, 0–180 minutes | −120, 150 | 54.7 | −62.2, 65.1 |
| Dougados Functional Index, 0–40 | −13, 16 | 6.9 | −7.6, 8.5 |
| BASFI, 0–100 mm | −37.2, 34.68 | 21.3 | −24.8, 24.7 |
| Chest expansion, 0–12 cm | −4, 3.5 | 2.4 | −2.6, 2.7 |
| Occiput-to-wall distance, 0–10 cm | −8, 4 | 2.5 | −2.8, 2.9 |
| Modified Schober test, 0–10 cm | −3.5, 11 | 3.3 | −3.5, 4.1 |
| Disease activity (VAS), 0–100 mm | −73, 56 | 38.4 | −45.7, 43.3 |
| BAS-G last 6 months, 0–100 mm | −43.3, 56 | 30.7 | −34.6, 36.6 |
| BAS-G last week, 0–100 mm | −74, 80 | 35.1 | −42.2, 39.3 |
| BASDAI, 0–100 mm | −46.9, 20.8 | 19.6 | −23.2, 22.3 |
| Mander Enthesis Index, 0–90 | −18, 21 | 8.8 | −9.5, 11.1 |

* See Table 1 for abbreviations.
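The percentages quoted in the text can be reproduced from Table 3 by dividing each SDD by the width of the scale of the variable. The sketch below applies this to the 2 extreme values; the function name is hypothetical, and the reading of "range of the variable" as the scale range (rather than the observed range of changes) is an assumption that matches the reported 10% and 39%.

```python
def sdd_percent_of_range(sdd, scale_min, scale_max):
    """Express an SDD as a percentage of the width of the measurement scale."""
    return 100.0 * sdd / (scale_max - scale_min)


# SDD values and scale ranges taken from Table 3.
mander_pct = sdd_percent_of_range(8.8, 0, 90)        # Mander Enthesis Index
night_pain_pct = sdd_percent_of_range(38.7, 0, 100)  # spinal pain at night (VAS)

print(round(mander_pct, 1), round(night_pain_pct, 1))  # 9.8 38.7
```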


This study confirms the existence of intercountry differences in the clinical presentation of AS (17). It permitted us to evaluate the reliability of 14 outcome or process variables in 120 patients included in 4 different centers. The use of Bland-Altman's 95% limits of agreement method also permitted us to propose a cutoff defining change for each of these outcome or process variables.

The reliability of these variables has been previously assessed in single centers (22, 23, 26–34). Although differing among studies, reliability has been reported as good for most of these variables, except spinal pain (27) and the duration of morning stiffness (27, 29). For chest expansion, results have been contradictory: it was reported as unreliable in the study of Viitanen et al (31) and reliable in other studies (29, 30, 33). The methodology used in these studies differed in 2 ways: the statistical methods, some of which were inappropriate for the purpose of establishing reliability; and the time interval between repeated measurements, which varied from less than 1 day (22, 30, 31) to more than 1 week (27).

In our study, ICCs varied with the center. However, a statistically significant difference between centers for ICC appeared for only 4 variables. Most outcome or process variables measured in the centers had satisfactory ICCs, even for chest expansion, the duration of morning stiffness, and spinal pain. For the modified Schober test, ICCs were low in all but 1 center. We also assessed reliability using Bland-Altman's 95% limits of agreement method. Few previous studies have used this method to assess the reliability of clinical outcomes in AS. With repeated measurements over a period of only 2 hours, Viitanen et al (34) reported that intraobserver reliability of the modified Schober test was good using this method.

Defining change on an individual basis has been recommended in clinical trials (9, 10). For a continuous variable, this requires defining a clinically relevant change as one that exceeds a certain cutoff. Different techniques have been proposed to define such a cutoff (16). SDD is a technique that takes measurement error into account (10, 35). Indeed, this is the first study providing SDD of clinical outcome or process variables in AS based on Bland-Altman's 95% limits of agreement method (14, 15). However, Pile et al (36), repeating measurements of the modified Schober test and chest expansion in 10 patients, reported that 90% of repeated measurements will vary by up to 1.5 cm and 1.2 cm, respectively. These values are close to the SDD observed in this study for both outcome variables.

The SDD was relatively high for all 14 variables in this study. Thus, although ICCs were high, individual variables appeared to be poorly reliable as judged by SDD. Consequently, a change smaller than the SDD observed in an individual patient for such a variable cannot be considered a real change.

One limitation of this study was that patients were evaluated twice within a short time period. Repeated evaluations over a longer period, e.g., 1 year, might result in different SDDs. However, over a short time period in which patients remain in stable condition between repeated evaluations, the changes observed are largely measurement error. Another limitation is that we did not define change across multiple domains, because patients may change in some outcome variables and not in others at the same time.

This study provided a cutoff in individual outcome variables of different domains in AS. Other studies are necessary to define a cutoff taking into account multiple domains.