Keywords:

  • Ankylosing spondylitis;
  • Responsiveness;
  • Discriminative capacity;
  • ASAS DC-ART core set;
  • Etanercept

Abstract

Objective

To investigate the responsiveness and discriminative capacity, and the relationship between both, of instruments selected for the disease-controlling antirheumatic therapy (DC-ART) core set by the Assessments in Ankylosing Spondylitis Working Group (ASAS).

Methods

Responsiveness and discriminative capacity of different measures reflecting disease activity and function, either included in the ASAS DC-ART core set or not, were evaluated in a randomized controlled clinical trial comparing etanercept with placebo in patients with ankylosing spondylitis. Guyatt's method was used as the primary analysis for responsiveness, and Student's t-test for discriminative capacity.

Results

At day 28 of therapy, almost all measures indicated moderate to large responsiveness in the etanercept group (Guyatt 0.60–3.11). Some scales of the Short Form 36 (general health, mental component summary, and role emotional), the modified Schober's test, and the Fatigue Severity Scale were not responsive. The results were similar if analyzed at day 112 of therapy. Peripheral joint counts, joint scores, and occiput-to-wall distance could not be evaluated due to a floor effect. In general, the relation between responsiveness and discriminative capacity was strong: Measures that demonstrated high responsiveness also showed high between-group t values.

Conclusion

Measures included in the ASAS DC-ART core set, except modified Schober's test, have good responsiveness and good discriminatory capacity. Some measures could not be evaluated due to a floor effect.


INTRODUCTION

There are many outcome measures available to evaluate patients with ankylosing spondylitis (AS). To develop internationally standardized endpoints for use in clinical trials and practice, the Assessments in Ankylosing Spondylitis Working Group (ASAS) was formed in 1995. Core sets were developed by the ASAS for the following 3 settings: disease-controlling antirheumatic therapy (DC-ART), symptom-modifying antirheumatic drugs (SMARD) and physical therapy, and clinical record keeping (1). The domains physical function, pain, spinal mobility, stiffness, and the patient's global assessment are included in all 3 settings. In addition, the domains peripheral joints and entheses, and acute phase reactants are added for the settings DC-ART and clinical record keeping; the domains radiographs (of spine and hips) and fatigue are added for the DC-ART setting. Selection of the specific instruments to be included in each domain was determined by consensus among ASAS members (2). The instruments selected for the DC-ART core set have not yet been validated with respect to the Outcome Measures in Rheumatology Clinical Trials (OMERACT) filter: truth, discrimination, and feasibility (3).

Recent studies have suggested that tumor necrosis factor α inhibiting (anti-TNFα) therapy may be promising for AS. Anti-TNFα therapy has been demonstrated to be efficacious in spondylitis (4, 5), and therefore the data set of a clinical trial with anti-TNFα therapy offers an opportunity to evaluate the ASAS DC-ART core set. To discriminate in trials between effective treatment and placebo, a measure should be responsive. Highly responsive measures are preferred because they facilitate the detection of improvement. However, responsiveness alone is not sufficient to ensure the detection of small, but potentially important, differences between effective treatment and placebo (discriminative capacity).

The purpose of this study was twofold. First, to investigate the responsiveness and discriminative capacity of the ASAS DC-ART core set in a trial of etanercept in AS (6). Second, to evaluate the relationship between responsiveness and discriminatory capacity of the measures included in the ASAS DC-ART core set.

PATIENTS AND METHODS

The clinical trial with anti-TNFα therapy

The study was a 4-month double-blind clinical trial that randomly assigned 40 patients with AS, as defined by the modified New York criteria (7), to the treatment or placebo group. All patients had an active spondylitis, defined by morning stiffness of more than 45 minutes, inflammatory back pain, and patient and physician global assessment of disease activity of moderate or higher. Patients were allowed to continue standard therapies for AS as long as they were taking a stable dosage. Patients received 25 mg of etanercept or placebo subcutaneously twice weekly for 4 months.

DC-ART core set measures

For the 7 DC-ART domains the following instruments were assessed.

Function:

The Bath Ankylosing Spondylitis Functional Index (BASFI) (8) and the Dougados Functional Index (DFI) (9).

Pain:

Visual analog scale (VAS) for spinal pain at night over the past week and a VAS for overall spinal pain over the past week.

Spinal mobility:

Chest expansion (10), modified Schober's test (11), and occiput-to-wall distance.

Patient global:

Assessment of patient's global well being over the past week, rated on a 5-point Likert scale.

Stiffness:

Duration of morning stiffness, expressed in minutes, experienced on the day preceding the visit.

Peripheral joints and entheses:

Number of swollen joints, counted in 42 diarthrodial joints.

Acute phase reactants:

The erythrocyte sedimentation rate (ESR) was measured as an acute phase reactant.

Other single measures and indices

Additional assessments of the peripheral joints and entheses were added.

Joint tenderness was counted (tender joint count) and tenderness and swelling scored (tender joint score and swollen joint score) in diarthrodial joints: 44 joints for tenderness evaluation and 42 joints for swelling evaluation (no hips in swollen joint score/count). These were rated on a 4-point scale: 0 = no swelling; 1 = mild (detectable synovial thickening without loss of bony contours); 2 = moderate (loss of distinct bony contours); and 3 = severe (bulging synovial proliferation with cystic characteristics).

Enthesopathy was scored by means of the modified enthesopathy index: uniform manual pressure is applied to the vertebral processes of C1–C2, C7–Th1, Th12–L1, L5–S1, the symphysis pubis, both greater trochanters, pelvic abductor origin, anterior superior border of the iliac crests, ischial tuberosities, insertions of Achilles tendons, and plantar fascia. Tenderness was scored on a 4-point scale (0 = no pain, 1 = mild tenderness, 2 = moderate tenderness, 3 = wince or withdrawal).

Physician global assessment was measured by means of a VAS of overall disease activity.

Fatigue was measured with the Fatigue Severity Scale (12) and quality of life was assessed with the Medical Outcomes Study Short Form Health Survey (SF-36) (13). The SF-36 has 8 scales: physical functioning, social functioning, role limitations due to a physical problem, role limitations due to an emotional problem, mental health, vitality, pain, and general health. There are 2 summary measures (physical and mental component summary) calculated from scores of the individual scales.

Statistical analysis

All analyses were based on intention to treat. Only 3 patients dropped out of the trial; a last-value-carried-forward approach was used for the data obtained from these patients.
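The last-value-carried-forward step can be sketched as follows. This is a minimal illustration, not the trial's actual data handling: the function name, the use of None for a missed visit, and the example scores are all assumptions.

```python
from typing import List, Optional


def last_value_carried_forward(visits: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing visit (None) with the most recent observed value.

    Visits before the first observation stay None, because there is
    nothing to carry forward yet.
    """
    filled: List[Optional[float]] = []
    last_observed: Optional[float] = None
    for value in visits:
        if value is not None:
            last_observed = value
        filled.append(last_observed)
    return filled


# Hypothetical patient who dropped out after the second visit (not trial data).
scores = [5.2, 4.1, None, None]
print(last_value_carried_forward(scores))  # [5.2, 4.1, 4.1, 4.1]
```

With this approach a dropout contributes the last recorded score to every later time point, so the change from baseline is frozen at its last observed value.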

The day 28 and day 112 data are reported to allow further exploration of trends in responsiveness and discriminatory power.

Three statistical methods were used to assess responsiveness: the standardized response mean (SRM), the effect size (ES), and the Guyatt method. The Guyatt method is viewed by some as the superior responsiveness statistic because it takes into account the variability of the placebo group (14). The Guyatt statistic was therefore used as the primary responsiveness statistic in this study, and the measures were ranked by it. For all responsiveness statistics, values of 0.20, 0.50, and 0.80 or greater have been advocated to represent small, moderate, and large responsiveness, respectively (15–17).
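The thresholds quoted above can be written as a small lookup. This is a sketch only: the function name and the label used for values under 0.20 are assumptions, since the text does not name that category.

```python
def responsiveness_category(value: float) -> str:
    """Map a responsiveness statistic (SRM, ES, or Guyatt) to a label.

    Cutoffs of 0.20, 0.50, and 0.80 follow the text. The label for values
    under 0.20 (including negative values, which indicate deterioration)
    is a hypothetical choice made for this sketch.
    """
    if value >= 0.80:
        return "large"
    if value >= 0.50:
        return "moderate"
    if value >= 0.20:
        return "small"
    return "below small"


# Guyatt values at day 28 taken from Table 2.
print(responsiveness_category(3.11))  # ESR: large
print(responsiveness_category(0.60))  # SF-36 mental health: moderate
print(responsiveness_category(0.40))  # Modified Schober: small
print(responsiveness_category(0.16))  # Fatigue Severity Scale: below small
```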

Guyatt method

The formula for Guyatt's responsiveness index (18) is $\Delta_x / \sqrt{2 \times \mathrm{MSE}_x}$, where $\Delta_x$ is the minimally clinically important change on the measure and $\mathrm{MSE}_x$ is the mean squared error of the measure obtained from an analysis of variance model that examines repeated observations of the measure in clinically stable subjects. Alternatively, if there are only 2 observations of the measure, the denominator is the standard deviation (SD) of the individual change scores in clinically stable patients (i.e., placebo-treated patients) (19). Here, Guyatt's index is calculated as the mean change of patients in the etanercept group divided by the SD of change of patients in the placebo group (20).
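As a worked illustration of the two-observation form described above, the following sketch divides the mean change in the treated group by the SD of change in the placebo group. The function name is an assumption; the input values are the ESR figures for day 28 from Table 2.

```python
def guyatt_index(mean_change_treated: float, sd_change_placebo: float) -> float:
    """Mean change in the treated group divided by the SD of change in placebo."""
    if sd_change_placebo <= 0:
        raise ValueError("SD of change in the placebo group must be positive")
    return mean_change_treated / sd_change_placebo


# ESR at day 28 (Table 2): mean change 25.35 with etanercept,
# SD of change 8.16 with placebo.
print(round(guyatt_index(25.35, 8.16), 2))  # 3.11
```

The result agrees with the Guyatt value of 3.11 listed for ESR in Table 2.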

Standardized response mean

The SRM is calculated as the mean change in scores divided by the SD of these changes.

Effect size

The ES is the difference between the mean baseline and followup scores on the measure, divided by the SD of the baseline scores.
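A minimal sketch of both statistics, computed from paired baseline and followup scores, is given below. The function names, the sign convention (lower scores are better, so improvement is baseline minus followup), and the four-patient data set are assumptions made for illustration only.

```python
from statistics import mean, stdev


def change_scores(baseline, followup):
    # Assumption: lower scores are better, so improvement = baseline - followup.
    return [b - f for b, f in zip(baseline, followup)]


def standardized_response_mean(baseline, followup):
    """Mean change divided by the SD of the individual changes."""
    changes = change_scores(baseline, followup)
    return mean(changes) / stdev(changes)


def effect_size(baseline, followup):
    """Mean change divided by the SD of the baseline scores."""
    changes = change_scores(baseline, followup)
    return mean(changes) / stdev(baseline)


# Hypothetical scores for four patients (not trial data).
baseline = [40.0, 50.0, 60.0, 30.0]
followup = [20.0, 30.0, 30.0, 20.0]
print(round(standardized_response_mean(baseline, followup), 2))  # 2.45
print(round(effect_size(baseline, followup), 2))  # 1.55
```

The same ratios can be checked against the summary values in Table 2: for ESR at day 28 in the treatment group, 25.35/16.31 gives an SRM of 1.55 and 25.35/23.09 gives an ES of 1.10.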

Assessment of floor effects

Floor effects may impair responsiveness because patients with very low baseline values cannot improve further. Histograms of the cross-sectional analysis were used to examine the presence of floor effects.
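The study inspected histograms for floor effects. A simple numeric companion to a histogram, sketched here under assumptions, is the fraction of patients whose baseline score already sits at the lowest possible value. The function name, the default floor of 0, and the example counts are hypothetical.

```python
def floor_fraction(baseline_scores, floor=0.0):
    """Fraction of patients whose baseline score is at (or below) the floor."""
    if not baseline_scores:
        raise ValueError("no baseline scores supplied")
    at_floor = sum(1 for score in baseline_scores if score <= floor)
    return at_floor / len(baseline_scores)


# Hypothetical swollen joint counts for ten patients (not trial data).
swollen_joint_counts = [0, 0, 0, 0, 2, 0, 1, 0, 0, 5]
print(floor_fraction(swollen_joint_counts))  # 0.7
```

A large fraction at the floor means most patients cannot improve on that measure, whatever the treatment.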

Discrimination

The independent unpaired Student's t-test values of each variable are reported for the discriminatory capacity. All t-tests were 2 sided with a significance level (α) of 0.05 resulting in a critical t value of 2.03.
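The between-group t value can be sketched from summary statistics alone. The code below assumes a pooled-variance Student's t-test applied to the change scores of each group; the function name is hypothetical, and the inputs are the ESR figures for day 28 from Table 2.

```python
import math

CRITICAL_T = 2.03  # two-sided critical value at alpha = 0.05, as stated in the text


def unpaired_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance Student's t statistic from group means, SDs, and sizes."""
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_variance * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error


# ESR at day 28 (Table 2): change scores in the etanercept and placebo groups.
t_value = unpaired_t(25.35, 16.31, 20, -0.85, 8.16, 20)
print(round(t_value, 2))      # 6.42
print(t_value > CRITICAL_T)   # True
```

The value obtained this way is consistent with the t of 6.42 listed for ESR in Table 2, which suggests the tabulated t values were computed on change scores.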

Relationship between responsiveness and discriminative capacity

To evaluate the relationship between responsiveness and discriminative capacity, the responsiveness statistics were plotted versus the t values. For the correlation between the primary responsiveness statistic (the Guyatt method) and the discriminatory capacity, the Spearman correlation coefficient was also calculated.
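The Spearman coefficient is the Pearson correlation of the ranks. A self-contained sketch follows; the helper names are assumptions, and the example uses only the first five rows of Table 2 (day 28), so its result is illustrative and is not the coefficient for the full set of measures.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# First five measures of Table 2 (day 28): Guyatt values and t values.
guyatt = [3.11, 2.49, 1.66, 1.64, 1.38]
t_values = [6.42, 6.63, 5.02, 3.43, 4.67]
print(round(spearman(guyatt, t_values), 2))  # 0.8
```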

RESULTS

Both the treatment and placebo groups included 20 patients (6). The groups were acceptably balanced with respect to important demographic and prognostic variables (Table 1). Some outcome measures demonstrated an extremely skewed distribution by histograms, so analyses for responsiveness and discriminatory capacity were not performed for these variables. The following measures were excluded from further analysis: swollen joint score, tender joint score, swollen joint count, tender joint count, modified enthesopathy index, and occiput-to-wall distance. However, in those patients with the potential for improvement in the above measures, a positive treatment effect of etanercept was seen (Tables 2 and 3).

Table 1. Baseline characteristics of patients

Characteristic | Etanercept (n = 20) | Placebo (n = 20)
Male, % | 65 | 90
White, % | 75 | 70
HLA–B27 positive, % | 95 | 90
Age, mean ± SD, years | 38 ± 10 | 39 ± 10
Disease duration, mean ± SD, years | 15 ± 10 | 12 ± 9

Table 2. Indices of responsiveness at day 28 of followup, for the treatment and placebo group, ordered by the Guyatt method*

Columns 2–6 refer to the treatment group (n = 20) and columns 7–11 to the placebo group (n = 20).

Measure | Treatment: mean change | Treatment: SD | Treatment: SD baseline | Treatment: SRM | Treatment: ES | Placebo: mean change | Placebo: SD | Placebo: SD baseline | Placebo: SRM | Placebo: ES | Guyatt | t (rank)
ESR | 25.35 | 16.31 | 23.09 | 1.55 | 1.10 | −0.85 | 8.16 | 16.33 | −0.10 | −0.05 | 3.11 | 6.42 (2)
Physician global assessment | 23.85 | 14.83 | 19.62 | 1.61 | 1.22 | −2.53 | 9.59 | 16.43 | −0.26 | −0.15 | 2.49 | 6.63 (1)
VAS nocturnal pain | 32.35 | 25.81 | 23.89 | 1.25 | 1.35 | −3.95 | 19.45 | 25.34 | −0.20 | −0.16 | 1.66 | 5.02 (3)
Patient global assessment | 1.05 | 0.83 | 0.73 | 1.27 | 1.44 | 0.25 | 0.64 | 0.69 | 0.39 | 0.36 | 1.64 | 3.43 (6)
VAS overall pain | 27.70 | 22.10 | 21.30 | 1.25 | 1.30 | −3.50 | 20.14 | 24.88 | −0.17 | −0.14 | 1.38 | 4.67 (5)
BASFI | 1.62 | 1.45 | 2.06 | 1.12 | 0.79 | −0.45 | 1.24 | 2.45 | −0.37 | −0.18 | 1.31 | 4.85 (7)
Morning stiffness | 60.00 | 56.99 | 50.56 | 1.05 | 1.19 | −3.00 | 49.83 | 70.69 | −0.06 | −0.04 | 1.20 | 3.72 (4)
Chest expansion | 0.60 | 0.79 | 1.63 | 0.76 | 0.37 | 0.14 | 0.60 | 1.65 | 0.22 | 0.08 | 0.99 | 2.08 (10)
SF-36: role physical | 34.98 | 42.42 | 22.16 | 0.82 | 1.58 | 8.75 | 36.52 | 43.13 | 0.24 | 0.20 | 0.96 | 2.10 (15)
SF-36: bodily pain | 20.95 | 15.81 | 15.20 | 1.33 | 1.38 | 0.00 | 23.04 | 21.84 | 0.00 | 0.00 | 0.91 | 3.35 (12)
SF-36: vitality | 10.23 | 12.35 | 18.83 | 0.83 | 0.54 | 1.75 | 11.39 | 16.75 | 0.15 | 0.10 | 0.90 | 2.26 (13)
SF-36: physical functioning | 12.00 | 18.02 | 18.52 | 0.67 | 0.65 | −2.25 | 14.09 | 23.37 | −0.16 | −0.10 | 0.85 | 2.79 (11)
SF-36: physical component summary | 6.91 | 8.50 | 8.20 | 0.81 | 0.84 | 0.53 | 8.14 | 10.09 | 0.06 | 0.05 | 0.85 | 2.42 (9)
Dougados functional index | 3.90 | 5.10 | 6.12 | 0.76 | 0.64 | 0.90 | 5.22 | 6.77 | 0.17 | 0.13 | 0.75 | 1.84 (8)
SF-36: social functioning | 13.10 | 16.92 | 25.22 | 0.77 | 0.52 | −4.38 | 19.57 | 25.74 | −0.22 | −0.17 | 0.67 | 3.02 (14)
SF-36: mental health | 7.20 | 16.50 | 19.88 | 0.44 | 0.36 | −0.80 | 11.94 | 15.71 | −0.07 | −0.05 | 0.60 | 1.76 (16)
SF-36: mental component summary | 3.44 | 9.16 | 12.97 | 0.38 | 0.27 | −0.62 | 7.63 | 9.24 | −0.08 | −0.07 | 0.45 | 1.52 (17)
SF-36: role emotional | 16.67 | 42.58 | 44.43 | 0.39 | 0.38 | 0.00 | 38.99 | 38.84 | 0.00 | 0.00 | 0.43 | 1.29 (20)
Modified Schober | 0.18 | 0.61 | 1.50 | 0.30 | 0.12 | 0.05 | 0.45 | 1.48 | 0.10 | 0.03 | 0.40 | 0.78 (18)
Fatigue Severity Scale | 0.23 | 1.03 | 1.55 | 0.22 | 0.15 | −0.35 | 1.39 | 1.52 | −0.25 | −0.23 | 0.16 | 1.49 (19)
SF-36: general health | −0.15 | 11.82 | 25.62 | −0.01 | −0.01 | −0.80 | 16.77 | 19.15 | −0.05 | −0.04 | −0.01 | 0.14 (21)
Modified enthesopathy index† | 3.4 | 4.92 | 8.43 | | | 0.9 | 2.51 | 7.88 | | | |
Occiput-to-wall distance† | 0.38 | 1.78 | 7.92 | | | −0.20 | 0.68 | 3.54 | | | |
Swollen joint count† | 0.60 | 2.01 | 7.12 | | | −0.10 | 1.59 | 4.45 | | | |
Swollen joint score† | 0.65 | 2.66 | 8.06 | | | 0.15 | 1.90 | 5.26 | | | |
Tender joint count† | 1.85 | 3.31 | 6.77 | | | −2.05 | 8.13 | 9.43 | | | |
Tender joint score† | 3.60 | 6.85 | 10.47 | | | −4.65 | 16.94 | 11.82 | | | |

* SD = standard deviation; SRM = standardized response mean; ES = effect size; ESR = erythrocyte sedimentation rate; VAS = visual analog scale; BASFI = Bath Ankylosing Spondylitis Functional Index; SF-36 = Short Form 36. t values are unpaired (i.e., between group); t values smaller than 2.03 are not statistically significant; the ranking order of the t values is given in parentheses. Minus (−) indicates deterioration.
† Due to a floor effect, responsiveness statistics and discriminatory capacity could not be calculated.

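Table 3. Indices of responsiveness at day 112 of followup, for the treatment group and placebo group, ordered by the Guyatt method*

Columns 2–6 refer to the treatment group (n = 20) and columns 7–11 to the placebo group (n = 20).

Measure | Treatment: mean change | Treatment: SD | Treatment: SD baseline | Treatment: SRM | Treatment: ES | Placebo: mean change | Placebo: SD | Placebo: SD baseline | Placebo: SRM | Placebo: ES | Guyatt | t (rank)
ESR | 23.60 | 21.66 | 23.09 | 1.09 | 1.02 | −1.25 | 6.26 | 16.33 | −0.20 | −0.08 | 3.77 | 4.43 (3)
Physician global assessment | 30.45 | 20.24 | 19.62 | 1.50 | 1.55 | −1.20 | 12.61 | 16.43 | −0.10 | −0.07 | 2.41 | 5.94 (1)
BASFI | 2.55 | 1.93 | 2.06 | 1.32 | 1.24 | −0.20 | 1.22 | 2.45 | −0.16 | −0.08 | 2.10 | 5.40 (2)
VAS nocturnal | 33.15 | 22.15 | 23.89 | 1.50 | 1.39 | 5.35 | 20.35 | 25.34 | 0.26 | 0.21 | 1.63 | 4.13 (4)
Patient global assessment | 1.05 | 0.94 | 0.73 | 1.12 | 1.44 | 0.35 | 0.67 | 0.69 | 0.52 | 0.51 | 1.57 | 2.70 (9)
SF-36: physical component summary | 10.93 | 10.38 | 8.20 | 1.05 | 1.33 | 3.22 | 7.72 | 10.09 | 0.42 | 0.32 | 1.42 | 2.67 (10)
SF-36: physical functioning | 21.80 | 22.55 | 18.52 | 0.97 | 1.18 | 0.50 | 16.46 | 23.37 | 0.03 | 0.02 | 1.32 | 3.41 (6)
Dougados | 6.65 | 6.18 | 6.12 | 1.08 | 1.09 | 1.05 | 5.04 | 6.77 | 0.21 | 0.16 | 1.32 | 3.14 (7)
VAS overall | 25.75 | 28.71 | 21.30 | 0.90 | 1.21 | 5.95 | 21.77 | 24.88 | 0.27 | 0.24 | 1.18 | 2.46 (13)
SF-36: bodily pain | 26.17 | 24.39 | 15.20 | 1.07 | 1.72 | 5.44 | 22.62 | 21.84 | 0.24 | 0.25 | 1.16 | 2.79 (8)
Morning stiffness | 60.30 | 75.97 | 50.56 | 0.79 | 1.19 | 17.63 | 55.12 | 70.69 | 0.32 | 0.25 | 1.09 | 2.03 (17)
SF-36: social function | 21.85 | 25.24 | 25.22 | 0.87 | 0.87 | −6.25 | 20.07 | 25.74 | −0.31 | −0.24 | 1.09 | 3.90 (5)
SF-36: role physical | 46.23 | 46.76 | 22.16 | 0.99 | 2.09 | 11.25 | 44.04 | 43.13 | 0.26 | 0.26 | 1.05 | 2.44 (14)
Chest expansion | 0.71 | 1.25 | 1.63 | 0.57 | 0.44 | 0.03 | 0.83 | 1.65 | 0.03 | 0.02 | 0.85 | 2.02 (18)
SF-36: vitality | 12.98 | 17.34 | 18.83 | 0.75 | 0.69 | −0.25 | 16.10 | 16.75 | −0.02 | −0.01 | 0.81 | 2.50 (11)
Fatigue Severity Scale (Krupp) | 0.67 | 1.30 | 1.55 | 0.52 | 0.43 | −0.22 | 0.93 | 1.52 | −0.24 | −0.14 | 0.72 | 2.50 (11)
SF-36: mental health | 9.20 | 16.73 | 19.88 | 0.55 | 0.46 | −2.20 | 14.71 | 15.71 | −0.15 | −0.14 | 0.63 | 2.29 (16)
SF-36: general health | 5.30 | 16.45 | 25.62 | 0.32 | 0.21 | 3.65 | 10.37 | 19.15 | 0.35 | 0.19 | 0.51 | 0.38 (21)
Modified Schober | 0.26 | 0.67 | 1.50 | 0.39 | 0.17 | −0.07 | 0.54 | 1.48 | −0.13 | −0.05 | 0.48 | 1.71 (20)
SF-36: mental component summary | 3.89 | 10.07 | 12.97 | 0.39 | 0.30 | −3.16 | 9.24 | 9.24 | −0.34 | −0.34 | 0.42 | 2.30 (15)
SF-36: role emotional | 18.34 | 41.15 | 44.43 | 0.45 | 0.41 | −8.33 | 46.99 | 38.84 | −0.18 | −0.21 | 0.39 | 1.91 (19)
Modified enthesopathy index† | 5.30 | 6.59 | 8.43 | | | 1.70 | 4.61 | 7.88 | | | |
Occiput-to-wall distance† | 1.00 | 2.29 | 7.92 | | | −0.70 | 2.22 | 3.54 | | | |
Swollen joint count† | 1.85 | 4.28 | 7.12 | | | −0.30 | 2.05 | 4.45 | | | |
Swollen joint score† | 2.15 | 5.14 | 8.06 | | | −0.50 | 3.07 | 5.26 | | | |
Tender joint count† | 4.10 | 5.76 | 6.77 | | | −1.25 | 6.70 | 9.43 | | | |
Tender joint score† | 6.15 | 9.69 | 10.47 | | | −3.15 | 13.56 | 11.82 | | | |

* See Table 2 for definitions.
† Due to a floor effect, responsiveness statistics and discriminatory capacity could not be calculated.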

At day 28, most remaining measures indicated moderate to large responsiveness (Guyatt 0.60–3.11), except several components of the SF-36 (general health, mental component summary, and role emotional), the modified Schober, and the Fatigue Severity Scale (Table 2, Figure 1). The results for day 112 were generally similar, with somewhat larger responsiveness for most measures (Guyatt 0.51–3.77), except some components of the SF-36 (mental component summary and role emotional) and the modified Schober test (Table 3, Figure 2). More measures demonstrated low responsiveness when results from all 3 statistical methods were considered (<0.50): chest expansion and SF-36 mental health for day 28; and chest expansion, Fatigue Severity Scale, and SF-36 mental health for day 112.

Figure 1. Comparison of responsiveness and discrimination performance of measures at day 28. See Table 2 for definitions.

Figure 2. Comparison of responsiveness and discrimination performance of measures at day 112. See Table 2 for definitions.

Concerning the ASAS core set (Table 4), it can be seen that for the domain function, both the BASFI and DFI demonstrated a large degree of responsiveness at day 112. At day 28, BASFI demonstrated a greater responsiveness than the DFI (large versus moderate, respectively). The BASFI also appeared to have a higher discriminative power than the DFI.

Table 4. Responsiveness statistics and unpaired t values for the DC-ART ASAS core set for the treatment group*

Domain | Instrument | Day 28: Guyatt | Day 28: SRM | Day 28: ES | Day 28: t | Day 112: Guyatt | Day 112: SRM | Day 112: ES | Day 112: t
Function | BASFI | 1.31 | 1.12 | 0.79 | 4.85 | 2.10 | 1.32 | 1.24 | 5.40
Function | Dougados Functional Index | 0.75 | 0.76 | 0.64 | 1.84 | 1.32 | 1.08 | 1.09 | 3.14
Pain | VAS nocturnal | 1.66 | 1.25 | 1.35 | 5.02 | 1.63 | 1.50 | 1.39 | 4.13
Pain | VAS overall | 1.38 | 1.25 | 1.30 | 4.67 | 1.18 | 0.90 | 1.21 | 2.46
Spinal mobility | Chest expansion | 0.99 | 0.76 | 0.37 | 2.08 | 0.85 | 0.57 | 0.44 | 2.02
Spinal mobility | Modified Schober | 0.40 | 0.30 | 0.13 | 0.78 | 0.48 | 0.39 | 0.17 | 1.71
Spinal mobility | Occiput-to-wall distance | 0.56 | 0.21 | 0.05 | 1.35 | 0.45 | 0.44 | 0.13 | 2.38
Patient global | VAS last week | 1.64 | 1.27 | 1.44 | 3.43 | 1.57 | 1.12 | 1.44 | 2.70
Stiffness | Morning stiffness | 1.20 | 1.05 | 1.19 | 3.72 | 1.09 | 0.79 | 1.19 | 2.03
Peripheral joints and entheses | Number of swollen joints† | | | | | | | |
Acute phase reactants | ESR | 3.11 | 1.55 | 1.10 | 6.42 | 3.77 | 1.09 | 1.02 | 4.43

* DC-ART = disease-controlling antirheumatic therapy; ASAS = Assessments in Ankylosing Spondylitis Working Group; SRM = standardized response mean; ES = effect size; BASFI = Bath Ankylosing Spondylitis Functional Index; VAS = visual analog scale; ESR = erythrocyte sedimentation rate. t values are unpaired (i.e., between group); t values smaller than 2.03 are not statistically significant.
† Due to a floor effect, responsiveness statistics and discriminatory capacity could not be calculated.

Both instruments in the domain pain showed excellent responsiveness statistics and a high discriminative power. The responsiveness for the instruments in the domain spinal mobility was only good for the outcome measure chest expansion. The modified Schober test had a small responsiveness, and occiput-to-wall distance was not analyzed due to a floor effect. In addition, both measures could not significantly discriminate between placebo and etanercept-treated patients. The domain patient global assessment had excellent responsiveness statistics and a high discriminative power. The domain stiffness was largely responsive and highly discriminative within the first month of treatment, although the discriminative capacity decreased by day 112 and was only barely significant. The number of swollen joints, the instrument for the domain peripheral joints and entheses, was susceptible to a floor effect, and so was not able to be evaluated for responsiveness. The instrument ESR for the final domain acute phase reactants has an excellent responsiveness and discriminatory capacity.

The results are presented at 2 time points: day 28 and day 112. It is noticeable that by day 28 some variables already show good responsiveness (Guyatt > 0.80) and discriminatory capacity (t > 2.03). By the end of the trial, these measures still have good responsiveness and discriminatory capacity. The following variables belong to this group: ESR, physician global assessment, BASFI, VAS nocturnal, patient global assessment, VAS overall, and 5 of the SF-36 scales (physical component summary, physical functioning, bodily pain, role physical, and vitality). When attention is paid to the rank order of the measures at the 2 time points, it can be seen that those measuring similar outcomes appear to aggregate by the end of the trial. For instance, although at day 28 the SF-36 vitality scale was superior to the Fatigue Severity Scale in responsiveness and discriminatory power, these differences were lessened by the end of the trial.
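The selection rule described above (Guyatt > 0.80 and t > 2.03 at both time points) can be sketched as a simple filter. The data structure and function name are assumptions, and only five measures from Tables 2 and 3 are included to keep the example short.

```python
GUYATT_CUTOFF = 0.80
T_CUTOFF = 2.03

# (Guyatt, t) at day 28 and at day 112, copied from Tables 2 and 3.
results = {
    "ESR": [(3.11, 6.42), (3.77, 4.43)],
    "BASFI": [(1.31, 4.85), (2.10, 5.40)],
    "Morning stiffness": [(1.20, 3.72), (1.09, 2.03)],
    "Chest expansion": [(0.99, 2.08), (0.85, 2.02)],
    "Dougados Functional Index": [(0.75, 1.84), (1.32, 3.14)],
}


def consistently_good(measures):
    """Names of measures exceeding both cutoffs at every time point."""
    return [
        name
        for name, timepoints in measures.items()
        if all(g > GUYATT_CUTOFF and t > T_CUTOFF for g, t in timepoints)
    ]


print(consistently_good(results))  # ['ESR', 'BASFI']
```

Morning stiffness drops out because its day 112 t value of 2.03 equals the cutoff rather than exceeding it, and chest expansion drops out because its day 112 t value is 2.02.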

The relationship between responsiveness statistics and between-group discrimination is seen in Figures 1 and 2. In general, there appeared to be a strong correlation between responsiveness and discriminatory capacity. The Spearman correlation coefficients for the Guyatt responsiveness statistics and t values are 0.92 and 0.85, respectively, at days 28 and 112.

DISCUSSION

In this study, we examined the responsiveness and discriminative power of the DC-ART core set. The first domain was function. Two instruments were selected for this domain: BASFI and DFI. In our study, although both the BASFI and the original 3-point DFI revealed excellent discriminative power and responsiveness, the BASFI appeared to be capable of detecting change earlier. Already at day 28 this measure demonstrated excellent responsiveness and discriminative capacity, as compared with the DFI. The differences between these 2 instruments in the context of the OMERACT filter have previously been reviewed by Ruof and Stucki (21). Based upon their review of the comparative usefulness of both indices, no definite preference could be identified, although a slight partiality existed for the BASFI because it has fewer questions. In addition to a review of the literature, Ruof et al (17) designed a model to compare responsiveness of the BASFI, DFI, and the Health Assessment Questionnaire modified for the spondylarthropathies (HAQ-S) (22), and concluded that the BASFI appeared to be more appropriate than either the DFI or HAQ-S. The 3 response categories in the DFI and HAQ-S are “yes, with no difficulty,” “yes, but with difficulty,” and “no.” The middle option covers a broad range of possible responses, reducing the scope for detecting change. Therefore, a modified DFI with a 5-point Likert scale instead of a 3-point one has been developed for use in studies (23). However, no definite choice could be made between BASFI and DFI with this modified DFI.

For the domains pain and patient global assessment, this study has shown that the selected instruments have a large responsiveness and discriminative capacity. Although the measure morning stiffness in the domain stiffness has a good responsiveness, the t value was barely statistically significant. This means that there is an effect in the treatment group (i.e., good responsiveness; Guyatt = 1.09), but that this effect creates a small, hardly significant contrast between the placebo and treatment group. Because the discriminative capacity is influenced by the number of patients, it is possible that the measure will have better discriminative capacity when evaluated in a larger number of patients.
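The dependence of the t value on the number of patients can be illustrated with the morning stiffness figures for day 112 from Table 3. The sketch below assumes equal group sizes and, purely hypothetically, that the same mean changes and SDs would be observed in a larger trial; the function name is an assumption.

```python
import math


def t_equal_groups(mean1, sd1, mean2, sd2, n_per_group):
    """Unpaired t statistic for two groups of equal size n_per_group."""
    standard_error = math.sqrt((sd1 ** 2 + sd2 ** 2) / n_per_group)
    return (mean1 - mean2) / standard_error


# Morning stiffness, day 112 (Table 3): change scores with etanercept and placebo.
t_20 = t_equal_groups(60.30, 75.97, 17.63, 55.12, 20)
# Hypothetical: identical means and SDs, but 40 patients per group.
t_40 = t_equal_groups(60.30, 75.97, 17.63, 55.12, 40)

print(round(t_20, 2))  # 2.03
print(f"{t_40:.1f}")   # 2.9
```

With the effect and its variability held fixed, doubling the group size multiplies t by the square root of 2, so a barely significant contrast at 20 patients per group would be clearly significant at 40 per group under these assumptions.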

For the domain spinal mobility, only chest expansion demonstrated a good responsiveness, but the discriminative capacity was not very good. Again, this may be explained by the small number of patients. It is necessary to recognize that the selected patient population is likely an important factor in the assessment of spinal mobility. There is a large chance that, given the mean disease duration of 13 years for patients in this trial, a number of the patients may have had some degree of spinal ossification, limiting their ability to improve in some of these outcomes. It must also be taken into account that for this domain, the period of followup might have been too short. Further work is needed to establish for which patients the measures of spinal mobility might be responsive (e.g., early disease) and which period of followup is necessary (e.g., ≥1 year).

For the domain peripheral joints and entheses, only the number of swollen joints was selected by ASAS. In this study, this measure was vulnerable to a floor effect and therefore no conclusions can be made about discriminative capacity and responsiveness. The modified enthesopathy index was not selected by ASAS, but also was unable to be evaluated because of a floor effect.

The last domain of the ASAS DC-ART core set, acute phase reactants (ESR), proved to be the most responsive and discriminative in this trial. It must be kept in mind that the results need to be interpreted in the context of this trial; this means that it concerns a selected study population of patients with a high disease activity in established disease, treated with anti-TNFα therapy.

Concerning other indices and measures studied in this trial, several issues can be noted. The measures for assessment of the peripheral joints and entheses all suffer from a floor effect: a large percentage of patients have no affected joints and/or entheses. These measures are context specific and their appropriateness will depend upon the population being investigated. The physician's global assessment measured by means of a VAS has an excellent responsiveness and discriminative capacity. From this point of view it may be an asset to the core set. The question is if the measure has an additional intrinsic value, as it is known that ankylosing spondylitis patients can accurately assess their own disease activity (24). Furthermore, it is also likely that physicians take into account other variables, such as ESR, when assessing the physician global, which consequently makes the physician global assessment a dependent measure.

The ASAS working group did not select an instrument for fatigue because not enough data on the performance of various instruments in AS were available. The one used in this study, the Fatigue Severity Scale, demonstrated a moderate responsiveness and good discriminative capacity. In addition, the vitality scale on the SF-36 also demonstrated good responsiveness and discriminatory capacity and, with respect to content, it is fairly comparable with the Fatigue Severity Scale. Therefore, this individual scale may be usable as a means of measuring fatigue.

The final measure was the SF-36. It turned out that the different scales of the SF-36 had varying degrees of responsiveness and discriminative power. It seems that the items focusing on physical function and pain were more responsive than the items dealing with mental health and emotions. For patients with AS, treatment with a TNF blocking agent seems to have more effect on pain and physical functioning than on mental health and emotions. This is consistent with the earlier described fact that after pain and stiffness, one of the most important complaints of patients with AS is disability (25). These aspects are already assessed by the other measures in the ASAS core set and, because the scales related to mental health and emotions demonstrate a small responsiveness, the SF-36 appears to contribute little in the setting of a clinical trial. However, it is possible that the SF-36 could be helpful in allowing for comparison of quality of life with that of patients with other diseases.

So far, the ASAS working group has only chosen single measures. However, the value of combined indices, such as the Bath Ankylosing Spondylitis Disease Activity Index (26) and the Bath Ankylosing Spondylitis Metrology Index (27), may warrant further investigation. Considering responsiveness, it could be expected that indices are more responsive than single measures, as combining reduces scatter.

Improvement/response criteria have already been developed in several areas within rheumatology, and recent improvement criteria for AS have been studied by Anderson et al (28). They developed criteria for improvement in AS based upon the 5 domains of the ASAS SMARD core set (function, pain, spinal mobility, patient global, and stiffness) by using outcome data from placebo-controlled clinical trials of nonsteroidal antiinflammatory drugs (NSAIDs). It was concluded that all the domains were appropriate except spinal mobility because of a lack of responsiveness of the mobility measures. Results from our study support their conclusions, except for the domain spinal mobility. This domain has 3 measures, chest expansion, modified Schober, and occiput-to-wall distance. Although it was not possible to do responsiveness analysis for the occiput-to-wall distance, we did demonstrate an acceptable responsiveness for chest expansion. However, the discriminative capacity was relatively low. In NSAID trials, the mobility measures are not responsive and in our study, chest expansion is responsive; this might be because it is less likely that spinal mobility is impacted by NSAID treatment than by TNF blocking therapy.

In this study we have shown a tight relationship between responsiveness and discriminative capacity. Most measures with good responsiveness also showed good discriminative capacity, and vice versa. In this particular case, this tight relationship is due to the large contrast between active intervention and placebo, and partly to the definition of the responsiveness statistic we have chosen, since the Guyatt effect size encompasses the variation observed in the placebo group. A responsive measurement will not be discriminative in all situations, however. Responsiveness statistics are considered measurement specific, and can be used across different studies; discrimination statistics depend on the responsiveness of a measurement plus context-specific factors, such as sample size, treatment contrast, variation in the control group, etc.

Differences in responsiveness, and especially discrimination, may have important implications for clinical trial design. The use of measures that are both responsive and discriminative increases the statistical power of a clinical trial.

This is the first time that the responsiveness and discriminative capacity of the ASAS DC-ART core set have been evaluated. The sample size of this study may limit the generalizability because only a small selection of patients with active longstanding disease has been investigated. On the other hand, finding these results in such a severely afflicted patient population adds to the validity of the results. It is important to realize that etanercept may specifically influence certain measures, and that other therapies may lead to different effects. Therefore, responsiveness and discriminative validity should also be assessed in trials with other treatments.

In summary, this study has confirmed responsiveness and discriminative capacity of all measures included in the ASAS DC-ART core set, with the exception of the domains spinal mobility (although the instrument chest expansion is responsive) and peripheral joints. In addition to measures of the ASAS DC-ART core set, other measures have been shown to be very responsive and discriminative. The most important were physician's global assessment of disease activity and the physical functioning and pain scales of the SF-36.

REFERENCES

  • 1
    Van der Heijde D, Bellamy N, Calin A, Dougados M, Khan MA, van der Linden S, Assessments in Ankylosing Spondylitis Working Group. Preliminary core sets for endpoints in ankylosing spondylitis. J Rheumatol 1997; 24: 2225–9.
  • 2
    Van der Heijde D, Calin A, Dougados M, Khan MA, van der Linden S, Bellamy N, Assessments in Ankylosing Spondylitis. Selection of instruments in the core set for DC-ART, SMARD, physical therapy, and clinical record keeping in ankylosing spondylitis: progress report of the ASAS Working Group. J Rheumatol 1999; 26: 951–4.
  • 3
    Boers M, Brooks P, Strand CV, Tugwell P. The OMERACT filter for Outcome Measures in Rheumatology. J Rheumatol 1998; 25: 198–9.
  • 4
    Baeten D, Kruithof E, Van den Bosch F, Demetter P, Van Damme N, Cuvelier C, et al. Immunomodulatory effects of anti-tumor necrosis factor alpha therapy on synovium in spondylarthropathy: histologic findings in eight patients from an open-label pilot study. Arthritis Rheum 2001; 44: 186–95.
  • 5
    Brandt J, Haibel H, Cornely D, Golder W, Gonzalez J, Reddig J, et al. Successful treatment of active ankylosing spondylitis with the anti-tumor necrosis factor alpha monoclonal antibody infliximab. Arthritis Rheum 2000; 43: 1346–52.
  • 6
    Gorman JD, Sack KE, Davis JC. Treatment of ankylosing spondylitis by inhibition of tumor necrosis factor alpha. N Engl J Med 2002; 346: 1349–56.
  • 7
    Van der Linden S, Valkenburg HA, Cats A. Evaluation of diagnostic criteria for ankylosing spondylitis: a proposal for modification of the New York criteria. Arthritis Rheum 1984; 27: 361–8.
  • 8
    Calin A, Garrett S, Whitelock H, Kennedy LG, O'Hea J, Mallorie P, et al. A new approach to defining functional ability in ankylosing spondylitis: the development of the Bath Ankylosing Spondylitis Functional Index. J Rheumatol 1994; 21: 2281–5.
  • 9
    Dougados M, Gueguen A, Nakache JP, Nguyen M, Mery C, Amor B. Evaluation of a functional index and an articular index in ankylosing spondylitis. J Rheumatol 1988; 15: 302–7.
  • 10
    Moll JM, Wright V. An objective clinical study of chest expansion. Ann Rheum Dis 1972; 31: 1–8.
  • 11
    Cash JM. Evaluation of the patient: history and physical examination. In: Klippel JH, editor. Primer on the rheumatic diseases. Atlanta: Arthritis Foundation; 1997. p. 92.
  • 12
    Krupp LB, LaRocca NG, Muir-Nash J, Steinberg AD. The Fatigue Severity Scale: application to patients with multiple sclerosis and systemic lupus erythematosus. Arch Neurol 1989; 46: 1121–3.
  • 13
    Ware JE Jr, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care 1992; 30: 473–83.
  • 14
    Deyo RA, Diehr P, Patrick DL. Reproducibility and responsiveness of health status measures: statistics and strategies for evaluation. Control Clin Trials 1991; 12(4 Suppl): 142S–158S.
  • 15
    Norman GR, Stratford P, Regehr G. Methodological problems in the retrospective computation of responsiveness to change: the lesson of Cronbach. J Clin Epidemiol 1997; 50: 869–79.
  • 16
    Stucki G, Liang MH, Fossel AH, Katz JN. Relative responsiveness of condition-specific and generic health status measures in degenerative lumbar spinal stenosis. J Clin Epidemiol 1995; 48: 1369–78.
  • 17
    Cohen J. Statistical power analysis for the behavioral sciences. New York: Academic Press; 1977.
  • 18
    Guyatt G, Walter S, Norman G. Measuring change over time: assessing the usefulness of evaluative instruments. J Chronic Dis 1987; 40: 171–8.
  • 19
    Husted JA, Cook RJ, Farewell VT, Gladman DD. Methods for assessing responsiveness: a critical review and recommendations. J Clin Epidemiol 2000; 53: 459–68.
  • 20
    Ruof J, Sangha O, Stucki G. Comparative responsiveness of 3 functional indices in ankylosing spondylitis. J Rheumatol 1999; 26: 1959–63.
  • 21
    Ruof J, Stucki G. Comparison of the Dougados Functional Index and the Bath Ankylosing Spondylitis Functional Index: a literature review. J Rheumatol 1999; 26: 955–60.
  • 22
    Daltroy LH, Larson MG, Roberts NW, Liang MH. A modification of the Health Assessment Questionnaire for the spondyloarthropathies. J Rheumatol 1990; 17: 946–50.
  • 23
    Spoorenberg A, van der Heijde D, de Klerk E, Dougados M, de Vlam K, Mielants H, et al. A comparative study of the usefulness of the Bath Ankylosing Spondylitis Functional Index and the Dougados Functional Index in the assessment of ankylosing spondylitis. J Rheumatol 1999; 26: 961–5.
  • 24
    Hidding A, van Santen M, De Klerk E, Gielen X, Boers M, Geenen R, et al. Comparison between self-report measures and clinical observations of functional disability in ankylosing spondylitis, rheumatoid arthritis and fibromyalgia. J Rheumatol 1994; 21: 818–23.
  • 25
    Calin A. The individual with ankylosing spondylitis: defining disease status and the impact of the illness. Br J Rheumatol 1995; 34: 663–72.
  • 26
    Garrett S, Jenkinson T, Kennedy LG, Whitelock H, Gaisford P, Calin A. A new approach to defining disease status in ankylosing spondylitis: the Bath Ankylosing Spondylitis Disease Activity Index. J Rheumatol 1994; 21: 2286–91.
  • 27
    Kennedy LG, Jenkinson TR, Mallorie PA, Whitelock HC, Garrett SL, Calin A. Ankylosing spondylitis: the correlation between a new metrology score and radiology. Br J Rheumatol 1995; 34: 767–70.
  • 28
    Anderson JJ, Baron G, van der Heijde D, Felson DT, Dougados M. Ankylosing spondylitis assessment group preliminary definition of short-term improvement in ankylosing spondylitis. Arthritis Rheum 2001; 44: 1876–86.