Keywords:

  • Ankylosing spondylitis;
  • Functional index;
  • Disease activity;
  • Reproducibility;
  • Responsiveness;
  • Scales

Abstract

Objectives

To determine the agreement of scores on the original visual analog scale (VAS) or Likert scale of the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI), Bath Ankylosing Spondylitis Functional Index (BASFI), and Dougados Functional Index (DFI) with scores on a numerical rating scale (NRS). To assess the reproducibility and responsiveness of the instruments with the original scale and NRS.

Methods

Five hundred thirty-six patients with ankylosing spondylitis from the Netherlands, Mexico, and Switzerland completed a questionnaire in which all questions from the BASDAI, BASFI, and DFI were presented twice in random order with an 11-point NRS and either a 10-cm VAS (BASDAI and BASFI) or a 5-point Likert scale (DFI). Agreement of scores using Bland-Altman plots and intraclass correlation coefficients (ICCs), reproducibility using ICCs, and responsiveness were assessed.

Results

Large variability between the scores on the original scales and the NRS was found in individual questions of all 3 questionnaires, although total scores showed ICCs of at least 0.88. Reproducibility of all answer modalities showed low ICCs in individual questions, but moderate to good ICCs in total scores (Dutch group 0.62–0.89; Mexican group 0.53–0.72). Moderate to large effects (0.48–1.04) were found in responsiveness scores in the 3 questionnaires. No major differences in reproducibility and responsiveness between the answer modalities were found.

Conclusion

Although large variability between the scores on the original answer scales and the NRS was observed, the BASDAI, BASFI, and DFI can be administered with an NRS, which does not show important differences compared with the original scales.


INTRODUCTION

The Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) (1), the Bath Ankylosing Spondylitis Functional Index (BASFI) (2), and the Dougados Functional Index (DFI) (3) are well established, widely used instruments to evaluate disease activity and functioning in patients with ankylosing spondylitis (AS). The BASDAI and BASFI are completed on a visual analog scale (VAS) and the DFI on a Likert scale.

The VAS is a commonly used scale consisting of a 100-mm horizontal line anchored with 2 extremes at either end. It has proven to be a valid and reliable measure for subjective feelings such as pain and function (4–8). Disadvantages of the VAS are that many patients experience difficulties in completing it and that it can only be administered in written form, which is a limitation for illiterate or visually impaired patients (9–15). Furthermore, there is a risk of measurement error: random errors can occur when measuring the distance to the mark on the line, and systematic errors can occur when reproducing the VAS questionnaires, because photocopying may alter the length of the line.

The Likert scale (or verbal rating scale) consists of several categories, most commonly 5 or 7 with adjectives representing degrees of, for instance, functional ability. Subjects mark the adjective that best describes their impairment. Advantages of the Likert scale are that it is easy to understand, simple to complete, and it can be administered in either a written or verbal form (11). Disadvantages are the potential discrepancy between the patient's feelings and the descriptions on the scale, the different interpretations that can be attributed to the adjectives of the scale, and the unequal intervals between the categories (4).

Another type of scale is the numerical rating scale (NRS). The NRS is usually an 11-, 21- or (rarely) 101-point scale, with numbers in boxes that are anchored with 2 extremes at either end. Subjects mark their answer by putting a cross through the appropriate number. The NRS is simple to complete and score, and can be administered in both written and verbal form (11).

Although no major differences in practical use of these answer modalities have been found, the NRS seems to be slightly preferred, since it is easy to complete and appropriate for all groups of patients (10, 11, 16, 17). Furthermore, the presumed high sensitivity of the VAS, because of its infinite number of possibilities for answers, has been disproven by Jensen et al, who showed that little information was lost when a 101-point NRS was transformed to an 11- or 21-point NRS (18). The NRS and the Likert scale, both ordinal scales, have inherent problems that do not differ from those of a VAS because in practice, clusters are formed on the scale, which limit the actual number of responses (12). Consequently, the VAS does not behave as a true continuous scale.

Instruments for research in rheumatology should be valid in all their aspects. To standardize the nomenclature of validity, the Outcome Measures in Rheumatoid Arthritis Clinical Trials (OMERACT) filter has been proposed (19). The 3 domains of the OMERACT filter are truth (validity), discrimination (reproducibility and responsiveness), and feasibility. One of the criteria for feasibility is the appropriateness of the answer scales used in questionnaires. Because some patients may experience difficulties with VAS or Likert scales, and because the NRS is slightly preferred in the literature, we decided to assess the discrimination and feasibility properties of the BASDAI, BASFI, and DFI with an NRS. Our first objective was to study the agreement of scores on the original scales of the BASDAI, BASFI, and DFI with scores on an NRS. Second, the reproducibility and responsiveness of the BASDAI, BASFI, and DFI on the original answer scale and on the NRS were assessed. Attention was paid to both the scores on single items and the total scores of these questionnaires. Single items were examined to provide more insight into the properties of single-item questionnaires, which are sometimes used to assess certain aspects of a disease. To enhance the generalizability of the study results, we decided to investigate all objectives across different languages and cultures, and in clinical trial patients and outpatients with varying disease duration and disease activity.

PATIENTS AND METHODS

Patients

A total of 536 patients with AS from the Netherlands (n = 182; 34%), Mexico (n = 166; 31%), and Switzerland (n = 188; 35%) entered the study. Table 1 shows an overview of the groups of patients that participated in the various parts of the study, as well as baseline characteristics of each group.

Table 1. Patients used for several aspects of the study and characteristics of the groups*

Group 1: Netherlands (spa therapy Austria), n = 40
Group 2: Netherlands (spa therapy Netherlands), n = 40
Group 3: Netherlands (spa therapy control), n = 40
Group 4: Netherlands (outpatients), n = 62
Group 5: Mexico (outpatients), n = 166
Group 6: Switzerland (members AS society), n = 188

                                    Group 1   Group 2   Group 3   Group 4   Group 5   Group 6
Comparison VAS or Likert with NRS   X         X         X         X         X         X
Responsiveness                      X         X         X
Reproducibility                                         X                   n = 38
% Male                              63        70        85        77        65        70
Age, years (SD)                     48 (10)   49 (9)    48 (10)   47 (11)   31 (12)   50 (12)
Disease duration, years (SD)        11 (6)    12 (5)    10 (6)    12 (7)    5 (7)     18 (10)
Duration of complaints, years (SD)  19 (10)   19 (9)    15 (8)    20 (9)    11 (10)   24 (11)

  • * X = Participated in this aspect of the study. VAS = visual analog scale; NRS = numerical rating scale.

From the Netherlands, 120 outpatients participating in a randomized controlled trial to assess the efficacy of spa therapy in patients with AS completed the questionnaires. The patients were randomly allocated to receive spa therapy in Austria (group 1; n = 40) or the Netherlands (group 2; n = 40), or to a control group (group 3; n = 40) that stayed at home and continued weekly group physical therapy. All patients completed the questionnaires twice at home: at baseline (2 weeks prior to the intervention), and 1 week after 3 consecutive weeks of spa therapy or after 3 weeks of weekly group physical therapy. A convenience sample of 62 Dutch patients from a secondary and a tertiary outpatient clinic completed the questionnaire in the hospital after their regular outpatient visit (group 4).

In Mexico, 166 consecutive outpatients from 2 referral centers completed the questionnaire in the hospital (group 5). At one of the centers, teenagers were also enrolled in the study. Thirty-eight patients from group 5, who visited the outpatient clinic again within the study period, completed the same questionnaire a second time with a mean interval of 4 weeks.

In Switzerland, a random sample of the members from the Swiss AS association was drawn. In total, 418 patients received the questionnaire at home, which was completed by 188 patients (group 6).

Instruments

The BASDAI (1) consists of 6 questions on fatigue, pain of the spine, pain and/or swelling of the peripheral joints, localized tenderness, and severity and duration of morning stiffness. The questions are answered on a 10-cm VAS, anchored with the labels “none” and “very severe” at either end of the first 5 questions, and with “0 hours” and “2 hours” in the question on duration of morning stiffness. The mean of the 2 questions on morning stiffness counts as one variable. The final score is defined by calculating the mean of the 5 items. Scores range from 0 (best) to 10 (worst).

The BASFI (2) contains 10 questions concerning activities of daily living and is scored on a 10-cm VAS with the anchors “easy” and “impossible” at either side. The mean of the items defines the final score, with scores ranging from 0 (best) to 10 (worst).

The DFI (3) consists of 20 Likert-formatted scales, and includes activities of daily living. Originally, the DFI was completed on a 3-point Likert scale and later modified to a 5-point Likert scale. In the present study, we have applied the 5-point Likert scale. For each item, possible scores are 0, 0.5, 1, 1.5, and 2. Total scores range from 0 (best) to 40 (worst), but to facilitate the comparability with scores of the other questionnaires, we have converted the scores to a 0–10 scale, by dividing the scores by 4.
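
The scoring rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function names are hypothetical, answers are assumed to be already expressed as numbers (0–10 per BASDAI and BASFI item, 0–2 per DFI item), and the BASDAI answers are assumed to be supplied in questionnaire order with the two morning stiffness questions last.

```python
from statistics import mean


def basdai_total(answers):
    """BASDAI total from six 0-10 answers in questionnaire order.

    Assumption: questions 5 and 6 are the severity and duration of
    morning stiffness; their mean counts as a single item.
    """
    if len(answers) != 6:
        raise ValueError("BASDAI has 6 questions")
    stiffness = mean(answers[4:6])
    # Final score: mean of the 5 resulting items, 0 (best) to 10 (worst).
    return mean(list(answers[:4]) + [stiffness])


def basfi_total(answers):
    """BASFI total: mean of ten 0-10 answers."""
    if len(answers) != 10:
        raise ValueError("BASFI has 10 questions")
    return mean(answers)


def dfi_total_0_10(answers):
    """DFI total rescaled to 0-10.

    Each of the 20 items scores 0, 0.5, 1, 1.5, or 2, so the raw sum
    ranges from 0 to 40; dividing by 4 gives the 0-10 scale used here.
    """
    if len(answers) != 20:
        raise ValueError("DFI has 20 questions")
    return sum(answers) / 4
```

For example, BASDAI answers of 2, 4, 6, 8, 3, and 5 give a morning stiffness item of 4 and a total score of 4.8.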

We used instruments that were translated and validated in all 3 languages by other researchers according to proposed guidelines (20). Some of these validation studies have been published (21, 22). Most of these instruments have been used extensively in clinical trials and epidemiologic studies in AS. All questionnaires are self-reported.

Procedure

All questions from the BASDAI, BASFI, and DFI with both the original answer modality and an 11-point NRS with figures in boxes were mixed up and presented in random order in the questionnaire. It was checked that the same question with the 2 different answer modalities did not appear on the same page. Instructions to complete the questionnaire were given on the first page. On the last page, the preference of the participants was assessed with the question: “Which of the three answer scales did you like best?” with the possibility to mark 1 of the 3 answer scales. Questionnaires were administered in Dutch (The Netherlands), Spanish (Mexico), and German (Switzerland).

Missing values were only dealt with in calculating the total scores of the questionnaires, according to Creusen et al (23). At most, 1 of 6 questions from the BASDAI, 2 of 10 from the BASFI, and 8 of 20 questions from the DFI were substituted with the patient's mean.
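
The substitution rule can be illustrated as follows. This is a hedged sketch only: the method of Creusen et al (23) is not spelled out in this article, so the code assumes that "the patient's mean" is the mean of that patient's answered items on the same questionnaire and that a total score is left uncalculated when too many items are missing. The names `MAX_MISSING` and `impute_with_patient_mean` are hypothetical.

```python
from statistics import mean

# Maximum number of substituted items per questionnaire, as stated in the text.
MAX_MISSING = {"BASDAI": 1, "BASFI": 2, "DFI": 8}


def impute_with_patient_mean(answers, instrument):
    """Replace missing answers (None) with the mean of the answered items.

    Returns the completed list, or None when more items are missing than
    the limit for this instrument allows (assumed handling).
    """
    observed = [a for a in answers if a is not None]
    n_missing = len(answers) - len(observed)
    if not observed or n_missing > MAX_MISSING[instrument]:
        return None
    fill = mean(observed)
    return [fill if a is None else a for a in answers]
```

With one missing BASDAI answer the gap is filled; with two, no total score would be calculated under these assumptions.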

Statistical analysis

To judge feasibility, Bland-Altman plots (24) were made to visualize the difference between the results on the original answer scale and on the NRS against the mean. Intraclass correlation coefficients (ICCs) were calculated to assess the concordance of the total scores on the different scales of each questionnaire. ICCs higher than 0.75 were considered relevant (25).
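
The quantities behind a Bland-Altman plot can be computed as in the sketch below. It assumes paired scores for the same question (or total score) on the original scale and on the NRS, both on a 0–10 range; the function name and the returned keys are illustrative choices, and plotting is left out.

```python
from statistics import mean, stdev


def bland_altman(original, nrs):
    """Return plot points and 95% limits of agreement for paired scores.

    Each point is (mean of the two scores, original minus NRS).
    The limits are bias +/- 1.96 x SD of the differences.
    """
    if len(original) != len(nrs):
        raise ValueError("scores must be paired")
    diffs = [o - n for o, n in zip(original, nrs)]
    means = [(o + n) / 2 for o, n in zip(original, nrs)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return {
        "points": list(zip(means, diffs)),
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
    }
```

A bias near zero with wide limits corresponds to random scatter between the two answer modalities, whereas a bias clearly different from zero indicates that one scale systematically scores higher or lower than the other.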

Data from the control group of the spa therapy trial (group 3) and the 38 patients from Mexico who completed the questionnaire twice were used for assessing reproducibility. ICCs were calculated for both individual questions and total scores of each instrument.
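
The article does not state which form of the ICC was used. As an illustration only, the sketch below computes the two-way random effects, absolute agreement, single-measure form, often written ICC(2,1), from a patients × occasions table of scores. Both the choice of this form and the function name `icc_2_1` are assumptions.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table with one row per patient and k columns.

    Assumed form: two-way random effects, absolute agreement, single
    measure. Columns are repeated administrations (or answer modalities).
    """
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 patients and 2 complete columns")
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    # Sums of squares of the two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Under this form, identical scores on both occasions give an ICC of 1, whereas a constant shift of 1 point between occasions for three patients scoring 1, 2, and 3 lowers the ICC to about 0.67, because absolute agreement penalizes systematic differences.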

Because response criteria for outcome in AS trials were not yet available, the results of the reproducibility tests were used to determine cut-off levels for improvement or worsening on an individual patient level, according to the 95% limits of agreement method described by Bland and Altman (24). This method helps to distinguish between true changes and variability due to measurement error. The smallest detectable difference (SDD) is calculated as 1.96 × SD(change), where SD(change) is the SD of the change in the total score of each instrument in the control group. Patients from both the intervention groups (n = 80) and the control group (n = 40) with a change score exceeding the positive value of the SDD were considered truly improved; those with a change score below the negative value of the SDD were considered worsened.
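
The SDD cut-off can be sketched as follows. The code assumes that a change score is calculated as baseline minus follow-up, so that positive values mean improvement (lower scores are better on all 3 instruments); the function names are hypothetical.

```python
from statistics import stdev


def smallest_detectable_difference(control_changes):
    """SDD = 1.96 x SD of the change scores in the stable control group."""
    return 1.96 * stdev(control_changes)


def classify_change(change, sdd):
    """Classify one patient's change score against the SDD.

    Assumption: change = baseline - follow-up, so positive = improvement.
    """
    if change > sdd:
        return "improved"
    if change < -sdd:
        return "worsened"
    return "unchanged"
```

A patient whose change falls within ±SDD cannot be distinguished from measurement error and is counted as neither improved nor worsened.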

Responsiveness or sensitivity to change was assessed with the data from both intervention groups of the spa therapy trial (n = 80) compared with the control group (n = 40). Because no consensus exists on which method is preferred, we determined the responsiveness with the effect size (26), the standardized response mean (27), and the method described by Guyatt et al (28). The results were interpreted according to Cohen's effect size index, in which 0.2 refers to a small change, 0.5 to a moderate change, and 0.8 or more to a large change (29). The effect size (ES) is calculated as the mean change after treatment compared with baseline, divided by the SD of the baseline scores (26). The standardized response mean (SRM) is similar to the ES, but with the SD of the change score as denominator (27). In the Guyatt method, the mean change score in the treatment group is divided by the SD of the change score of the control group (28). Consequently, a responsiveness score for the control group cannot be calculated with the Guyatt method.
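
The 3 responsiveness statistics differ only in their denominator, which the following sketch makes explicit. It assumes paired baseline and follow-up total scores per patient and defines change as baseline minus follow-up so that improvement is positive; all function names are illustrative.

```python
from statistics import mean, stdev


def change_scores(baseline, follow_up):
    """Per-patient change, baseline minus follow-up (positive = improvement)."""
    if len(baseline) != len(follow_up):
        raise ValueError("scores must be paired")
    return [b - f for b, f in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """ES: mean change divided by the SD of the baseline scores."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """SRM: mean change divided by the SD of the change scores."""
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(changes)


def guyatt_index(treat_base, treat_follow, control_base, control_follow):
    """Guyatt method: mean change in the treatment group divided by the
    SD of the change scores in the control group."""
    treated = change_scores(treat_base, treat_follow)
    controls = change_scores(control_base, control_follow)
    return mean(treated) / stdev(controls)
```

Because the ES and SRM use only one group's data, they can be computed for the control group as well; the Guyatt index needs both groups and therefore yields a single value for the intervention group.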

RESULTS

Feasibility

Of the 536 patients, 14 were unable to complete questions on a VAS, 2 patients had difficulties with the NRS, and 1 with the Likert scale. Forty-nine percent of the patients preferred to answer on a Likert scale, 38% on an NRS, 9% on a VAS, and 4% did not have a particular preference.

The results for the total scores on the questionnaires are presented in Table 2. For the BASDAI and BASFI, no major differences in scores between the VAS and NRS were found. However, consistently lower scores on the DFI were observed with the Likert scales compared with scores on the NRS.

Table 2. Results of total scores on different answer modalities for BASDAI, BASFI, and DFI, split per group*

          BASDAI                  BASFI                   DFI
          VAS         NRS         VAS         NRS         Likert      NRS
Group 1   4.7 (1.8)   4.6 (1.8)   4.9 (1.8)   5.0 (2.1)   2.9 (1.4)   3.7 (1.7)
Group 2   5.1 (2.0)   5.0 (2.0)   4.3 (2.0)   4.4 (2.0)   2.7 (1.3)   3.6 (1.8)
Group 3   4.5 (2.0)   4.2 (2.1)   4.2 (2.1)   4.2 (2.2)   2.8 (1.4)   3.4 (1.9)
Group 4   3.3 (1.8)   3.5 (1.7)   3.5 (1.9)   3.6 (2.1)   2.4 (1.4)   3.0 (1.7)
Group 5   4.2 (2.3)   4.1 (2.4)   3.6 (2.4)   3.6 (2.5)   2.4 (1.9)   2.7 (2.1)
Group 6   4.3 (2.2)   4.4 (2.3)   2.8 (2.0)   2.7 (2.1)   1.8 (1.4)   2.1 (1.6)

  • * Values are mean (SD). BASDAI = Bath Ankylosing Spondylitis Disease Activity Index; BASFI = Bath Ankylosing Spondylitis Functional Index; DFI = Dougados Functional Index; VAS = visual analog scale; NRS = numerical rating scale.

Bland-Altman plots of the difference plotted against the mean of every question answered by NRS and either a VAS or Likert scale showed a wide distribution in all questions (representative examples are shown in Figure 1). The DFI scored on a Likert scale gave lower scores compared with the NRS. This was not present for VAS scores compared with NRS scores. The greatest variability was found in the middle part of the scoring range. Total scores of each of the instruments showed better concordance between the original answer scales and the NRS; the ICCs calculated were 0.95 for the BASDAI, 0.97 for the BASFI, and 0.88 for the DFI.

Figure 1. Bland-Altman plots. The scores on the original answer scales minus the scores on the numerical rating scale, plotted against the mean of these scores (95% level of agreement). BASDAI = Bath Ankylosing Spondylitis Disease Activity Index; BASFI = Bath Ankylosing Spondylitis Functional Index; DFI = Dougados Functional Index. Each center of a flower and each petal represents 1 case. *Scores from the DFI were recoded to a 0–10 scale.

Reproducibility

To assess reproducibility of individual questions, question 5 of the BASDAI on the level of morning stiffness appeared twice with an NRS in the questionnaire with 2 pages in between. Although the ICC was 0.85, an SDD of 2.8 could be calculated. Reproducibility of the 3 questionnaires is shown in Table 3. Because consistently lower ICCs in the Mexican group compared with the Dutch group were found, we decided to present the results separately. Low to moderate ICCs were found for individual questions of all 3 questionnaires. Total scores showed higher ICCs, but remained low for all questionnaires in the Mexican group and for the BASDAI in the Dutch group. The differences between the answer modalities, however, were minor. Only in the Mexican group was reproducibility of the individual questions of the BASDAI clearly lower on the VAS compared with NRS. In group 3, lower ICCs were found in individual questions on the Likert scale compared with NRS on the DFI. There was no difference in reproducibility with respect to total scores.

Table 3. Reproducibility assessed with ICCs for individual questions and the total score on the BASDAI, BASFI, and DFI for the different answer modalities in Dutch and Mexican patients*

                        BASDAI                                BASFI                                 DFI
                        VAS                NRS                VAS                NRS                Likert             NRS
Group 3 (spa therapy controls)
  Individual questions  0.61 (0.43–0.72)   0.59 (0.41–0.69)   0.76 (0.54–0.88)   0.75 (0.57–0.84)   0.62 (0.41–0.84)   0.78 (0.57–0.90)
  Total scores          0.64               0.62               0.88               0.89               0.85               0.88
38 patients Mexico
  Individual questions  0.47 (0.29–0.57)   0.58 (0.36–0.75)   0.62 (0.34–0.82)   0.62 (0.38–0.87)   0.62 (0.34–0.83)   0.58 (0.20–0.85)
  Total scores          0.53               0.56               0.70               0.72               0.72               0.70

  • * ICCs = intraclass correlation coefficients; BASDAI = Bath Ankylosing Spondylitis Disease Activity Index; BASFI = Bath Ankylosing Spondylitis Functional Index; DFI = Dougados Functional Index; VAS = visual analog scale; NRS = numerical rating scale. See Table 2 for additional definitions.
  • Individual questions: values are the mean of ICCs (range).

To assess reproducibility dichotomously, the SDDs of each instrument on each scale were calculated (Table 4). No major differences between the original scales and the NRS were found for each instrument.

Table 4. SDD of the BASDAI, BASFI, and DFI on different answer modalities, and number of patients improved/worsened based on the corresponding SDD used as cut-off level in Dutch patients from the spa therapy trial*

                              BASDAI            BASFI             DFI
                              VAS      NRS      VAS      NRS      Likert   NRS
SDD                           3.35     3.57     2.08     1.98     1.65     1.86
Intervention group improved   9        8        15       17       10       10
Intervention group worsened   0        1        0        1        0        0
Control group improved        2        2        1        1        0        1
Control group worsened        1        1        2        2        2        2

  • * Absolute number of patients provided: intervention group n = 80, control group n = 40. SDD = smallest detectable difference. See Table 2 for additional definitions.

Responsiveness

Table 5 shows the scores of all patients taking part in the spa therapy trial before and after the intervention. Statistically significant improvement (P < 0.001) for the intervention group compared with baseline was found in all questionnaires using the Wilcoxon signed rank test.

Table 5. Results on BASDAI, BASFI, and DFI from the intervention groups and control group of the spa therapy trial before and after the intervention on different answer modalities*

Answer modality   Intervention group n = 80   Control group n = 40
BASDAI VAS
  Before          4.9 (1.9)                   4.5 (2.0)
  After           3.8 (2.2)                   4.1 (2.0)
BASDAI NRS
  Before          4.8 (1.9)                   4.2 (2.1)
  After           3.9 (2.3)                   4.0 (2.1)
BASFI VAS
  Before          4.6 (1.9)                   4.2 (2.1)
  After           3.6 (2.1)                   4.2 (2.2)
BASFI NRS
  Before          4.7 (2.0)                   4.2 (2.2)
  After           3.6 (2.2)                   4.2 (2.2)
DFI Likert
  Before          2.8 (1.3)                   2.8 (1.4)
  After           2.1 (1.4)                   2.7 (1.6)
DFI NRS
  Before          3.7 (1.8)                   3.4 (1.9)
  After           2.9 (1.9)                   3.5 (1.9)

  • * Values are mean (SD). See Table 2 for definitions.
  • P < 0.001 compared with baseline (intervention group, all questionnaires).

All questionnaires showed moderate to large responsiveness in the intervention group, and the control group remained reasonably stable (Table 6). Higher responsiveness scores were found with the method described by Guyatt. Responsiveness with the ES method showed the lowest scores. For the BASDAI, the responsiveness score was slightly higher on the VAS compared with NRS on all responsiveness methods. This was not observed in the BASFI. Except for the control group, no major differences in responsiveness on the Likert scale and NRS were found in the DFI.

Table 6. Responsiveness of BASDAI, BASFI, and DFI on different answer modalities calculated with 3 different responsiveness methods*

                                   BASDAI            BASFI             DFI
                                   VAS      NRS      VAS      NRS      Likert   NRS
SRM intervention group             0.60     0.51     0.76     0.74     0.95     0.85
ES intervention group              0.57     0.51     0.52     0.52     0.51     0.48
Guyatt method intervention group   0.64     0.54     0.93     1.04     0.81     0.89
SRM controls                       0.22     0.08     −0.01    −0.02    0.07     −0.13
ES controls                        0.19     0.07     0.00     −0.01    0.04     −0.07

  • * Positive changes imply improvement. SRM = standardized response mean; ES = effect size. See Table 2 for additional definitions.

DISCUSSION

In this study, 3 commonly used AS-specific questionnaires, administered with both their original answer modality and an NRS, were judged with respect to feasibility (appropriateness of the answer modalities) and discrimination (reproducibility and responsiveness) criteria of the OMERACT filter.

More patients were found to have difficulties in completing a VAS than either the Likert scale or NRS. Eighty-seven percent of the patients preferred to answer on either an NRS or a Likert scale, and only 9% on a VAS. A greater preference for the Likert scale was also described by Kremer et al in 57% of the patients studied (10).

Bland-Altman plots showed wide variation in the answers given to the same question on different scales (Figure 1). The variation was not solely due to the different answer modalities: the question that was answered twice on the NRS also showed substantial variability. The large differences in scores give the impression that some patients did not fully understand or properly read the anchors of the scales. From the Bland-Altman plots it can be deduced that the variability is random. The variability of the total scores of the questionnaires answered on different answer scales was smaller, as can be expected when different answers are aggregated into 1 score, but was still substantial. The ICCs of the total scores were relatively high, implying a high degree of concordance between scores on different answer modalities.

The reproducibility of individual questions and total scores of the BASDAI, BASFI, and DFI was much lower than expected (Table 3). Large differences in all questionnaires were found between the scores obtained from the outpatients in Mexico and the control group of the spa therapy trial in the Netherlands. It is arguable whether age, cultural aspects, or the level of education were the basis for these differences. In the Dutch group, the mean ICCs of both individual and total scores of the BASDAI were lower than the minimum of 0.75; in the Mexican group none of the ICCs from both individual and total scores of the 3 questionnaires reached the minimum of 0.75. Only the total scores of the BASFI and DFI in the Dutch group showed acceptable ICCs. However, no major differences between the ICCs with respect to different answer modalities were found for all questionnaires.

The lack of reproducibility found in individual questions can have major implications for single-item questionnaires. For instance, the 2 questions on pain and patient's global, both single-item questions and selected as specific outcome instruments in research in AS patients, may lack a sufficient degree of responsiveness on an individual patient level, due to low reproducibility (30). This lack of reproducibility merits further investigation.

Garrett et al reported a test–retest reliability with a Pearson's correlation of 0.93 for the BASDAI (1). Pearson's correlations were also published by Calin et al for the BASFI (r = 0.89) and DFI (r = 0.96) (2). Whereas the Pearson's correlation is a measure of association, the ICC gives information about the concordance of the results, which is the degree to which the same results are found in the same, stable subjects at repeated measurements. Consequently, the Pearson's correlation coefficients give higher results, which are difficult to interpret. The ICC is preferred in calculating reproducibility (28). Dougados et al reported an ICC of 0.86 for the DFI, which is similar to our result in the Dutch group (3).

Responsiveness was assessed with 3 different methods. Moderate to large effects were found in the intervention group with all 3 responsiveness methods, independent of the answer modality used. The method described by Guyatt showed higher responsiveness than either the SRM or ES method in all questionnaires, with exception of the DFI answered on the Likert scale. Ruof et al also found that the Guyatt method showed higher responsiveness scores in their comparative study of the BASFI and DFI, in which they also stated that the BASFI was more responsive than the DFI (31). The results of our study do not confirm the latter; the superiority of one of these questionnaires with respect to responsiveness appeared to be dependent on the responsiveness method applied.

Bolton and Wilkinson showed in their study that the responsiveness of measures was higher when using the NRS compared with VAS and Likert, although NRS and VAS were closely related (17). In the present study, minor differences between the scales were found, with only the BASDAI showing consistently higher scores on the VAS compared with the NRS. In general, the different scales seem to have reasonably similar properties with respect to responsiveness.

In conclusion, although major variability was found in individual questions between the original answer scales of the BASDAI, BASFI, and DFI and the NRS, total scores showed a high level of agreement. These results were found for all 3 countries, and for both clinical trial patients and outpatients. The variability between the scales may entirely be explained by the low reproducibility of individual questions found in each of these questionnaires. All 3 questionnaires showed good responsiveness on both the original scales and the NRS. The original answer modalities of the BASDAI, BASFI, and DFI can all be replaced by an NRS that maintains the properties of the original scales.

Acknowledgements

We thank the Swiss AS association for their cooperation.

REFERENCES

  • 1
    Garrett S, Jenkinson T, Kennedy LG, Whitelock H, Gaisford P, Calin A. A new approach to defining disease status in ankylosing spondylitis: the Bath Ankylosing Spondylitis Disease Activity Index. J Rheumatol 1994; 21: 2286–91.
  • 2
    Calin A, Garrett S, Whitelock H, Kennedy LG, O'Hea J, Mallorie P, et al. A new approach to defining functional ability in ankylosing spondylitis: the development of the Bath Ankylosing Spondylitis Functional Index. J Rheumatol 1994; 21: 2281–5.
  • 3
    Dougados M, Gueguen A, Nakache JP, Nguyen M, Mery C, Amor B. Evaluation of a functional index and an articular index in ankylosing spondylitis. J Rheumatol 1988; 15: 302–7.
  • 4
    Ohnhaus EE, Adler R. Methodological problems in the measurement of pain: a comparison between the verbal rating scale and the visual analogue scale. Pain 1975; 1: 379–84.
  • 5
    Scott J, Huskisson EC. Graphic representation of pain. Pain 1976; 2: 175–84.
  • 6
    Huskisson EC, Jones J, Scott PJ. Application of visual-analogue scales to the measurement of functional capacity. Rheumatol Rehabil 1976; 15: 185–7.
  • 7
    Scott PJ, Huskisson EC. Measurement of functional capacity with visual analogue scales. Rheumatol Rehabil 1977; 16: 257–9.
  • 8
    Price DD, McGrath PA, Rafii A, Buckingham B. The validation of visual analogue scales as ratio scale measures for chronic and experimental pain. Pain 1983; 17: 45–56.
  • 9
    Dixon JS, Bird HA. Reproducibility along a 10 cm vertical visual analogue scale. Ann Rheum Dis 1981; 40: 87–9.
  • 10
    Kremer E, Atkinson JH, Ignelzi RJ. Measurement of pain: patient preference does not confound pain measurement. Pain 1981; 10: 241–8.
  • 11
    Jensen MP, Karoly P, Braver S. The measurement of clinical pain intensity: a comparison of six methods. Pain 1986; 27: 117–26.
  • 12
    Bird HA, Dixon JS. The measurement of pain. Baillieres Clin Rheumatol 1987; 1: 71–89.
  • 13
    Guyatt GH, Townsend M, Berman LB, Keller JL. A comparison of Likert and visual analogue scales for measuring change in function. J Chronic Dis 1987; 40: 1129–33.
  • 14
    Ferraz MB, Quaresma MR, Aquino LR, Atra E, Tugwell P, Goldsmith CH. Reliability of pain scales in the assessment of literate and illiterate patients with rheumatoid arthritis. J Rheumatol 1990; 17: 1022–4.
  • 15
    Eyres S, van der Heijde D, Dougados M, Tennant A. The Visual Analogue Scale: deception on a “VASt” scale? Submitted for publication.
  • 16
    Downie WW, Leatham PA, Rhind VM, Wright V, Branco JA, Anderson JA. Studies with pain rating scales. Ann Rheum Dis 1978; 37: 378–81.
  • 17
    Bolton JE, Wilkinson RC. Responsiveness of pain scales: a comparison of three pain intensity measures in chiropractic patients. J Manipulative Physiol Ther 1998; 21: 1–7.
  • 18
    Jensen MP, Turner JA, Romano JM. What is the maximum number of levels needed in pain intensity measurement? Pain 1994; 58: 387–92.
  • 19
    Boers M, Brooks P, Strand CV, Tugwell P. The OMERACT filter for outcome measures in rheumatology [editorial]. J Rheumatol 1998; 25: 198–9.
  • 20
    Guillemin F, Bombardier C, Beaton D. Cross-cultural adaptation of health-related quality of life measures: literature review and proposed guidelines. J Clin Epidemiol 1993; 46: 1417–32.
  • 21
    Creemers MC, van't Hof MA, Franssen MJ, van de Putte LB, Gribnau FW, van Riel PL. A Dutch version of the functional index for ankylosing spondylitis: development and validation in a long-term study. Br J Rheumatol 1994; 33: 842–6.
  • 22
    Ruof J, Sangha O, Stucki G. Evaluation of the Bath Ankylosing Spondylitis Functional Index (BASFI) and Dougados Functional Index (D-FI). [German]. Z Rheumatol 1999; 58: 218–25.
  • 23
    Creusen E, van der Heijde D, Spoorenberg A, de Klerk E, van der Tempel H, Miedema H, et al. How to deal with missing answers in self-assessment questionnaires in ankylosing spondylitis: BASDAI-BASFI-ASFI? Submitted for publication.
  • 24
    Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986; 1: 307–10.
  • 25
    Streiner D, Norman G. Health measurement scales: a practical guide to their development and use. Oxford (UK): Oxford University Press; 1989.
  • 26
    Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care 1989; 27: S178–89.
  • 27
    Liang MH, Fossel AH, Larson MG. Comparisons of five health status instruments for orthopedic evaluation. Med Care 1990; 28: 632–42.
  • 28
    Guyatt G, Walter S, Norman G. Measuring change over time: assessing the usefulness of evaluative instruments. J Chronic Dis 1987; 40: 171–8.
  • 29
    Cohen J. Statistical power analysis for behavioral sciences. 2nd ed. Hillsdale (NJ): Lawrence Erlbaum Associates; 1988.
  • 30
    Van der Heijde D, Calin A, Dougados M, Khan MA, van der Linden S, Bellamy N. Selection of instruments in the core set for DC-ART, SMARD, physical therapy, and clinical record keeping in ankylosing spondylitis: progress report of the ASAS Working Group. J Rheumatol 1999; 26: 951–4.
  • 31
    Ruof J, Sangha O, Stucki G. Comparative responsiveness of 3 functional indices in ankylosing spondylitis. J Rheumatol 1999; 26: 1959–63.