Keywords:

  • Effect;
  • Sample size;
  • Rehabilitation;
  • WOMAC;
  • SF-36

Abstract

  1. Top of page
  2. Abstract
  3. INTRODUCTION
  4. PATIENTS AND METHODS
  5. RESULTS
  6. DISCUSSION
  7. Acknowledgements
  8. REFERENCES

Objective

To discuss the concepts of the minimal clinically important difference (MCID) and the smallest detectable difference (SDD) and to examine their relation to required sample sizes for future studies using concrete data of the condition-specific Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) and the generic Medical Outcomes Study 36-Item Short Form (SF-36) in patients with osteoarthritis of the lower extremities undergoing a comprehensive inpatient rehabilitation intervention.

Methods

SDD and MCID were determined in a prospective study of 122 patients before a comprehensive inpatient rehabilitation intervention and at the 3-month followup. MCID was assessed by the transition method. Required SDD and sample sizes were determined by applying normal approximation and taking into account the calculation of power.

Results

In the WOMAC sections the SDD and MCID ranged from 0.51 to 1.33 points (scale 0 to 10), and in the SF-36 sections the SDD and MCID ranged from 2.0 to 7.8 points (scale 0 to 100). Both questionnaires showed 2 moderately responsive sections that led to required sample sizes of 40 to 325 per treatment arm for a clinical study with unpaired data or total for paired followup data.

Conclusion

In rehabilitation intervention, effects larger than 12% of baseline score (6% of maximal score) can be attained and detected as MCID by the transition method in both the WOMAC and the SF-36. Effects of this size lead to reasonable sample sizes for future studies lying below n = 300. The same holds true for moderately responsive questionnaire sections with effect sizes higher than 0.25. When designing studies, assumed effects below the MCID may be detectable but are clinically meaningless.


INTRODUCTION

Comprehensive assessment of patients' health status is gaining in importance now that health care, with its expanding diversity of medical interventions, is becoming increasingly evidence-based. As the growing number of the elderly in industrial nations exerts additional pressure on the fiscal resources of health care systems, medical action within strict guidelines is in greater demand(1, 2). One of the key issues for evidence-based and cost-effective medicine is the detection and proof of intervention effects.

In patients with osteoarthritis (OA), information about the effectiveness of medication and joint replacement(3–7) is available, but information on rehabilitation interventions is sparse. The effects of rehabilitation intervention are considerably smaller than those of arthroplasty, which reduces disability substantially(6–10). Furthermore, small effects are more difficult to detect and require larger sample sizes, which makes the corresponding clinical studies harder to carry out.

Most importantly, the ability of an instrument to detect such a small difference (the so-called smallest detectable difference, SDD) is essential in order to quantify the minimal difference that patients and their physicians consider clinically important (the so-called minimal clinically important difference [MCID]).

Thus, in order to illuminate small effects in rehabilitation intervention, we need sensitive instruments. For the assessment of interventions in OA of the lower extremities, the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) is generally recommended as the most sensitive condition-specific instrument(4, 11–18). As a generic health status measure the Medical Outcomes Study 36-Item Short Form (SF-36) is now widely used and allows the effect of an OA intervention to be gauged in comparison with other interventions under various conditions(3, 19–21).

The objective of our study was to examine both the MCID, in contrast to the smallest statistically detectable difference (SDD), and the consequent implications for sample sizes in the assessment of comprehensive rehabilitation intervention in OA patients using the WOMAC and SF-36 instruments.

PATIENTS AND METHODS

Patients and data collection

Patients were recruited from the Zurzach Rheumatology and Rehabilitation Clinic in Switzerland. All patients with hip or knee OA who were consecutively referred to a comprehensive inpatient rehabilitation intervention by their family physician or their rheumatologist were invited to participate in the study by a letter that was sent to them 4 weeks prior to their entry into the clinic. On the day of their entry into the clinic, a physician performed the baseline interview and examination, which determined inclusion in or exclusion from the study. Patients were sent or given a set of questionnaires, including the WOMAC and the SF-36, on the day of entry into the clinic (baseline examination), on the day of discharge from the clinic, and again 3 months after their baseline examination.

According to the American College of Rheumatology (ACR) guidelines, inclusion criteria were (a) knee pain for more than 25 of the last 30 days, (b) morning stiffness of less than 30 minutes and crepitation in the knee, or (c) pain for more than 25 of the last 30 days, with osteophytes on x-rays of the knees indicating knee OA(22). Patients with hip OA were included when there was pain for more than 25 of the last 30 days and at least 2 of the following 3 criteria were present: erythrocyte sedimentation rate <20 mm/hour, osteophytes on x-rays, or obliteration of joint space(23). Patients were excluded if they did not fulfill the ACR criteria, had a history of medication abuse or nonadherence, had difficulty completing questionnaires, suffered from severe illness, or had undergone arthroplasty of the joint in question.

Patients included in the analysis filled out the questionnaires in accordance with the rules of the user's guide, which specifies completion of at least 4 of the 5 pain items, 1 of the 2 stiffness items, and 14 of the 17 function items in WOMAC(13). Furthermore, completion of the SF-36 required that both the physical component summary (PCS) and mental component summary (MCS) were calculable(20).

The comprehensive inpatient rehabilitation intervention consisted of a standardized program of passive and especially active physical therapy as well as a reduction in the use of nonsteroidal anti-inflammatory drugs (NSAIDs) as much as possible. The rehabilitation program concentrated on physical therapy and was supervised by physicians. Active kinesitherapies were performed both individually and in groups to strengthen and stretch the musculature, especially the quadriceps, as well as the passive structures in order to recreate regular joint mobility. Passive therapies included electrotherapies, hydrotherapies, thermotherapies such as cold or warm compresses, and massage. Instructions for relaxing strategies and consultations for preventive measures were additional elements of the rehabilitation program. Finally, each patient was instructed in an individual home rehabilitation program to be continued after discharge. The duration of the program varied between 3 and 4 weeks, depending on adaptation of the program to each individual patient's unique situation (severity of OA, comorbidity, etc.).

Measures.

The condition-specific WOMAC questionnaire is a multidimensional measure of pain, stiffness, and physical functional disability consisting of 24 items graded in a numerical rating scale ranging from 0 (“no symptoms”) to 10 (“extreme symptoms”)(4, 12–18). We selected the most responsive sections for OA pain and function(8, 13). With 36 items, the generic SF-36 calculates 8 multi-item scales—physical functioning, role physical, bodily pain, general health, vitality, social functioning, role emotional, and mental health—and 2 summary scales, the physical component summary (PCS) and the mental component summary (MCS)(20, 21). Each scale ranges from 0 (“extreme symptoms/poor health”) to 100 (“no symptoms/perfect health”). For the analysis, bodily pain, physical functioning, and PCS were selected.

The transition questionnaire was used to gather data from the patients about their current subjective health status in relation to the OA joint in terms of their general health. At the 3-month followup, patients had to compare their general health status with that of 3 months earlier, i.e., with that at baseline examination, using the assessment categories “much worse,” “slightly worse,” “equal,” “slightly better,” and “much better.”

Analyses

The changes in score from baseline to the 3-month followup were defined as effects. As responsiveness measures, the standardized response mean (SRM)(24) and the effect size (ES)(25) were used. The SRM is the mean change in score (effect) divided by the standard deviation of the individuals' changes in score. The ES is the mean change in score (effect) divided by the standard deviation of the baseline scores. For both coefficients, SRM and ES, a higher value indicates higher responsiveness.
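The two responsiveness coefficients defined above can be illustrated with a minimal Python sketch. The function names and the toy scores are hypothetical, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def effect_size(baseline, followup):
    """ES = mean change / SD of the baseline scores (sample SD assumed)."""
    changes = [f - b for b, f in zip(baseline, followup)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, followup):
    """SRM = mean change / SD of the individual changes (sample SD assumed)."""
    changes = [f - b for b, f in zip(baseline, followup)]
    return mean(changes) / stdev(changes)


# Hypothetical WOMAC-like scores (0 = no symptoms, 10 = extreme symptoms).
baseline = [4, 6, 5, 7, 3]
followup = [3, 6, 4, 5, 3]

es = effect_size(baseline, followup)
srm = standardized_response_mean(baseline, followup)
print(f"ES = {es:.2f}, SRM = {srm:.2f}")
```

Both values come out negative here because lower WOMAC scores mean improvement; Table 1 appears to report the coefficients as absolute values.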

Effects measured by WOMAC and SF-36 were related to the transition reply categories in order to assess MCID. This is an application of the transition method, which has been established and successfully used in different settings(26–30). We compared the mean scores of WOMAC and SF-36 and the score changes between baseline and 3-month followup within the different transition categories (“much better,” “slightly better,” etc.; see above). The mean score difference between the “equal” group and the “slightly better” group resulted in the MCID for improvement. The corresponding MCID for worsening was determined by the mean score difference between the “equal” group and the “slightly worse” group.
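As a rough illustration of the transition method described above, the following Python sketch groups score changes by transition category and takes the absolute difference of the group means. The function name and the individual patient values are hypothetical; the values were chosen only so that the group means equal those reported for WOMAC pain in Table 2.

```python
from statistics import mean


def mcid_by_transition(changes, categories):
    """Return the MCID for worsening and improvement from grouped mean changes."""
    groups = {}
    for change, category in zip(changes, categories):
        groups.setdefault(category, []).append(change)
    means = {category: mean(values) for category, values in groups.items()}
    return {
        # |effect("slightly worse") - effect("equal")|
        "worsening": abs(means["slightly worse"] - means["equal"]),
        # |effect("slightly better") - effect("equal")|
        "improvement": abs(means["slightly better"] - means["equal"]),
    }


# Hypothetical individual changes (followup minus baseline).
changes = [0.5, 0.4, -0.6, -0.7, -1.3, -1.5]
categories = [
    "slightly worse", "slightly worse",
    "equal", "equal",
    "slightly better", "slightly better",
]

result = mcid_by_transition(changes, categories)
print(result)
```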

When planning a future study, a small pilot study is often conducted in order to assess important parameters for the main study. Given the data of the pilot study, the SDD is the smallest effect that can be detected as significant by the chosen statistical method. The size of the SDD depends on the responsiveness of the measurement instrument and the sensitivity of the statistical method(31–34). For example, a parametric Student's t-test is able to detect much smaller score differences, changes, and effects than nonparametric tests such as the Wilcoxon rank-sum test. Conversely, more sensitive statistical models allow the use of smaller sample sizes. For example, a given effect will lead to smaller sample sizes when applying the t-test rather than the Wilcoxon test(33). Which statistical model and which corresponding test to choose is a question of the observed or expected distribution of the data.

Formulas

In data generated by “natural” processes, approximation by normal (Gauss) distribution is feasible when the sample size is large enough, i.e., n ≥ 30(31, 33, 34). In smaller numbers of patients, the t-distribution replaces the normal distribution(34). In both cases, SDD and sample size can be determined by the calculation rules of normal distribution as follows(31, 34–37):

General equation for the sample size (n).

Effects were measured as differences (d) between 2 groups. For example, d is the difference between the mean of the intervention group and the mean of the control group. In large samples (n ≥ 30), d can be considered normally distributed with mean μd and standard error SE(d). In particular, this holds when the scores of both groups are normally distributed, because the difference between 2 normally distributed variables is itself normally distributed. The null hypothesis is that there is no effect: μd = 0. The alternative hypothesis(31, 34) is that

  • $\dfrac{\mu_d}{SE(d)} = z_\alpha + z_\beta \qquad [0]$

The z-values are taken from the standard normal distribution (mean = 0, standard deviation = 1), where α = two-sided type I error (usually α = 0.05) and β = one-sided type II error; thus 1 – β = power (usually power = 0.8). In the case of n < 30, the z-values must be replaced by t-values from the t-distribution(34).

When comparing the difference (d) of 2 (effect) variables, the mean of the difference is equal to the difference of the 2 means by the commutative rule. By the rules of calculation with normally distributed variables, the variance of the difference is the sum of the variances of the 2 means: variance(d) = s²/n1 + s²/n2, when both effect variables have the same (or a comparable) “a priori” standard deviation, SD (denoted s), and n1, n2 are the sample sizes of the variables. In paired followup data, or when both the control and treatment groups have the same size, we can set n1 = n2 = n. Thus, SE(d) can be replaced by SE(d) = √(s²/n + s²/n) = √(2s²/n) in formula [0], resulting in the general equation for the sample size:

  • $n = 2\,(z_\alpha + z_\beta)^2 \left(\dfrac{SD}{\Delta}\right)^2 \qquad [1]$

where zα and zβ are the values of the standard normal distribution (mean = 0, standard deviation = 1) at the probabilities α and β, respectively.

α = two-sided type I error

β = one-sided type II error (thus, 1 – β = power)

Δ = mean effect, i.e., the mean score of the intervention group (or followup score) minus the mean score of the control group (or baseline score), which equals the mean of the differences μd

SD = standard deviation of the scores at baseline (a priori standard deviation)

In the case of followup studies, we have the same subjects in the control group (before the intervention) and in the intervention group (after the intervention). Therefore, n is the total required number of the sample.
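The general sample size equation can be sketched in Python as follows, assuming formula [1] has the form n = 2 (zα + zβ)² (SD/Δ)², as the definitions above imply. The function name is hypothetical, and rounding up to the next integer is an assumption that happens to reproduce the sample sizes listed in Table 3.

```python
from math import ceil
from statistics import NormalDist


def sample_size(delta, sd, alpha=0.05, power=0.80):
    """Assumed formula [1]: n = 2 * (z_alpha + z_beta)^2 * (SD / delta)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(power)           # one-sided type II error
    return ceil(2 * (z_alpha + z_beta) ** 2 * (sd / delta) ** 2)


# WOMAC pain: baseline SD 2.25 (Table 1), MCID 1.10 and 0.75 points (Table 2).
n_worsening = sample_size(delta=1.10, sd=2.25)
n_improvement = sample_size(delta=0.75, sd=2.25)
print(n_worsening, n_improvement)
```

For unpaired data the result is the number per treatment arm; for paired followup data it is the total number of patients.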

Conversely, given a sample size (n) and the a priori baseline standard deviation (SD), for example from a pilot study, the smallest statistically detectable difference (SDD = Δ) can be derived from formula [1]:

  • $SDD = \Delta = (z_\alpha + z_\beta)\, SD\, \sqrt{\dfrac{2}{n}} \qquad [1a]$
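Solving the sample size equation for Δ gives the SDD. The following Python sketch assumes the form SDD = (zα + zβ) · SD · √(2/n); the function name is hypothetical. With the pilot values for WOMAC pain (n = 122, baseline SD 2.25) it yields the 0.81 points listed in Table 3.

```python
from math import sqrt
from statistics import NormalDist


def smallest_detectable_difference(n, sd, alpha=0.05, power=0.80):
    """Assumed formula [1a]: SDD = (z_alpha + z_beta) * SD * sqrt(2 / n)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) * sd * sqrt(2 / n)


sdd_womac_pain = smallest_detectable_difference(n=122, sd=2.25)
sdd_sf36_pcs = smallest_detectable_difference(n=116, sd=7.7)
print(round(sdd_womac_pain, 2), round(sdd_sf36_pcs, 1))
```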
Determination of n by ES.

If we know the effect size (ES) from a pilot study, and we assume that in the control group of the main study the standard deviation is equal or comparable to the a priori standard deviation of the control group in the pilot study, we have (SD / Δ) = 1 / ES by the definition of ES. From formula [1] it follows that

  • $n = \dfrac{2\,(z_\alpha + z_\beta)^2}{ES^2} \qquad [2]$
Determination of n by SRM.

If we have paired observations and we know the variance of the differences, SDΔ² (from a pilot study), then we can replace the standard error of the mean difference by SE(d) = √(SDΔ²/n) in formula [0](32, 35):

  • $\dfrac{\mu_d}{\sqrt{SD_\Delta^2 / n}} = z_\alpha + z_\beta$

Because SRM is equal to μd / SDΔ (and μd = Δ), it follows from formula [0] that

  • $SRM \cdot \sqrt{n} = z_\alpha + z_\beta$
  • $n = \dfrac{(z_\alpha + z_\beta)^2}{SRM^2} \qquad [3]$
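The two responsiveness-based shortcuts can be sketched in Python, assuming formula [2] reads n = 2 (zα + zβ)² / ES² and formula [3] reads n = (zα + zβ)² / SRM². The function names are hypothetical, and rounding up is again an assumption. The example values for WOMAC pain (ES = 0.29, SRM = 0.34) come from Table 1.

```python
from math import ceil
from statistics import NormalDist


def z_sum_squared(alpha=0.05, power=0.80):
    """(z_alpha + z_beta)^2 for a two-sided alpha and a one-sided beta."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2


def n_from_es(es, alpha=0.05, power=0.80):
    """Assumed formula [2]: n per arm (unpaired) from the effect size."""
    return ceil(2 * z_sum_squared(alpha, power) / es ** 2)


def n_from_srm(srm, alpha=0.05, power=0.80):
    """Assumed formula [3]: total n (paired) from the standardized response mean."""
    return ceil(z_sum_squared(alpha, power) / srm ** 2)


print(n_from_es(0.29), n_from_srm(0.34))
```

Formula [3] lacks the factor 2 because the paired design works with the variance of the individual differences rather than the sum of two group variances.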

For the most commonly used type I and II errors, the expression (zα + zβ)² from the standard normal distribution can be replaced by

  • equation image
  • equation image
  • equation image
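The numerical value of (zα + zβ)² can be computed directly. The following Python sketch does so for three combinations of type I error and power; these combinations are an illustrative choice and not necessarily the ones the authors tabulated.

```python
from statistics import NormalDist


def z_factor(alpha, power):
    """(z_alpha + z_beta)^2 with a two-sided alpha and a one-sided beta."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2


# Illustrative (alpha, power) combinations; assumed, not taken from the source.
for alpha, power in [(0.05, 0.80), (0.05, 0.90), (0.01, 0.80)]:
    print(f"alpha = {alpha}, power = {power}: {z_factor(alpha, power):.2f}")
```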

RESULTS

Patients

Between February 1997 and September 1998, 142 patients with the diagnosis of OA of the hip or knee were included in the study according to ACR criteria. They completed the questionnaires correctly according to the rules of the WOMAC and the SF-36 at their entry into the clinic (baseline examination)(13, 20). None of the patients were scheduled to undergo arthroplasty in the near future; instead, they underwent a 3- to 4-week comprehensive inpatient rehabilitation.

Three months after the baseline examination, 122 patients were reexamined with complete WOMAC sets, and 116 were reexamined with both WOMAC and SF-36 questionnaire sets. Between the baseline examination and the 3-month followup, 2 patients had died (for reasons unrelated to their OA), 2 patients had undergone arthroplasty (after their clinic stay), 2 patients were excluded by the predetermined exclusion criteria (listed above), and 14 (WOMAC) and 20 (SF-36) patients returned incomplete forms according to the rules or refused to participate further.

The mean age of the study subjects was 65.1 years, 70.5% of the patients were female, 61.5% had knee OA, and 43 patients (35.2%) used NSAIDs or analgesics, or both, at baseline examination; most of them reduced or omitted these substances until the end of their rehabilitation stay. The level of disability of the patients in the study varied widely but was moderate on average (see Table 1, baseline scores: WOMAC global = 4.8, range 0–10). The patients who could not be included in the study or in the 3-month followup were a median of 5 years older than the study patients, but there was no difference between the groups with respect to sex or distribution of joints involved.

Table 1. Patients with hip or knee osteoarthritis, before and after inpatient rehabilitation*

                      Baseline        3-month followup   Effect (difference = 3-month followup − baseline)
                      Mean ± SD       Mean ± SD          Mean ± SD        ES      SRM
WOMAC (n = 122)
  Pain                4.83 ± 2.25     4.18 ± 2.37        −0.66 ± 1.96     0.29    0.34
  Stiffness           4.61 ± 2.67     4.58 ± 2.40        −0.03 ± 2.55     0.01    0.01
  Function            4.81 ± 2.18     4.33 ± 2.32        −0.47 ± 1.73     0.22    0.27
  Global              4.80 ± 2.09     4.32 ± 2.26        −0.47 ± 1.72     0.23    0.27
SF-36 (n = 116)
  Bodily pain         27.1 ± 16.5     37.5 ± 20.3        10.5 ± 23.0      0.63    0.45
  Physical function   37.5 ± 20.6     37.9 ± 22.1        0.4 ± 20.3       0.02    0.02
  PCS                 28.6 ± 7.7      30.9 ± 9.1         2.3 ± 8.0        0.30    0.29

* ES = effect size; SRM = standardized response mean; WOMAC = Western Ontario and McMaster Universities Osteoarthritis Index; SF-36 = Medical Outcomes Study 36-Item Short Form; PCS = physical component summary. WOMAC scale: 0 = no symptoms; 10 = extreme symptoms. SF-36 scale: 0 = extreme symptoms; 100 = no symptoms. ES = mean (effect) ÷ SD (baseline). SRM = mean (effect) ÷ SD (effect). Improvement if WOMAC effect < 0, SF-36 effect > 0.

Baseline scores, followup scores, and effects (Tables 1 and 2)

The scores of the examination at baseline (entry into the clinic) and at the 3-month followup are listed in Table 1. The effect is the difference between the two. The WOMAC baseline scores lie in the middle of the range (global score 4.80), indicating moderate illness and disability. Floor and ceiling effects were low (data not shown). WOMAC pain and function, SF-36 bodily pain, and PCS were the most responsive sections, with the highest SRM and ES, resulting in comparatively small sample sizes required for future studies (Table 3, columns ES and SRM). WOMAC stiffness and SF-36 physical function were not highly responsive, with SRM and ES near zero.

Table 2. Mean effects (3-month followup vs. baseline) within the groups defined by the answer to the “transition” query “health in general related to the OA joint 3 months ago”*

                      Effects (3-month followup − baseline) within transition groups   MCID
                      Slightly worse   Equal     Slightly better   MCID for worsening   MCID for improvement
WOMAC (n = 122)       n = 22           n = 42    n = 28
  Pain                0.45             −0.65     −1.40             1.10                 0.75
  Stiffness           0.64             0.13      −0.59             0.51                 0.72
  Function            0.78             −0.55     −1.22             1.33                 0.67
  Global              0.78             −0.51     −1.18             1.29                 0.67
SF-36 (n = 116)       n = 21           n = 40    n = 26
  Bodily pain         −0.9             6.4       14.2              7.2                  7.8
  Physical function   −6.4             −1.1      2.3               5.3                  3.3
  PCS                 −0.7             1.3       3.3               2.0                  2.0

* OA = osteoarthritis; MCID = minimal clinically important difference; WOMAC = Western Ontario and McMaster Universities Osteoarthritis Index; SF-36 = Medical Outcomes Study 36-Item Short Form; PCS = physical component summary. MCID for worsening = effect (“slightly worse”) − effect (“equal”): absolute value. MCID for improvement = effect (“slightly better”) − effect (“equal”): absolute value. Improvement if WOMAC effect < 0, SF-36 effect > 0.

Table 3. Smallest detectable difference (SDD) and sample sizes (n) given the data of a pilot study (Tables 1 and 2)*

Given the …           n of pilot study,   ES                    SRM       MCID for worsening,   MCID for improvement,
                      SD (baseline)                                       SD (baseline)         SD (baseline)
Using formula         1a                  2                     3         1                     1
Results in            SDD                 n per treatment arm   n total   n total               n total
WOMAC (n = 122)
  Pain                0.81                187                   68        66                    142
  Stiffness           0.96                >1000                 78543     431                   216
  Function            0.78                325                   108       43                    167
  Global              0.75                297                   108       42                    153
SF-36 (n = 116)
  Bodily pain         6.1                 40                    39        83                    71
  Physical function   7.6                 >1000                 197       238                   612
  PCS                 2.8                 175                   94        233                   233

* SD (baseline) = standard deviation of baseline scores (a priori standard deviation); ES = effect size; SRM = standardized response mean; MCID = minimal clinically important difference; WOMAC = Western Ontario and McMaster Universities Osteoarthritis Index; SF-36 = Medical Outcomes Study 36-Item Short Form; PCS = physical component summary. ES = mean (effect) ÷ SD (baseline). SRM = mean (effect) ÷ SD (effect). Formulas are given in the Methods section.

The differences of the mean effects between the groups of patients who replied “slightly worse” or “slightly better” and those who replied “equal” constitute the MCID for worsening and for improvement, respectively (Table 2). On average, and especially with the WOMAC scales, lower values for improvement resulted. This seems to indicate that improvement is easier to notice subjectively than worsening.

SDD and sample sizes for ES, SRM, MCID (Table 3)

The formulas described in the Methods section were applied practically to our data, which can be interpreted as coming from a previously conducted pilot study. That study produced the necessary figures for planning a future study. Inserting the n (122 for WOMAC and 116 for SF-36) and the baseline standard deviations of the pilot data, the resulting SDDs vary between 0.75 and 0.96 points for the WOMAC sections and between 6.1 and 7.6 for the SF-36 sections. The SF-36 physical component summary has a low baseline variation that gives a small SDD of 2.8 points.

Analogously, sample sizes can be determined given responsiveness data (ES or SRM) or MCID and baseline standard deviations from the pilot study data. The moderately responsive sections (WOMAC pain and function, SF-36 bodily pain) need a relatively low sample size (between 40 and 167), whereas the less responsive scales (WOMAC stiffness, SF-36 physical function and PCS) require large sample sizes that will be difficult to provide in a future study.

Illustration of sample size (n) by effect and baseline standard deviation (Figure 1)

For the WOMAC, the dependency of n on the effect (absolute change in score from baseline to 3-month followup) and on the baseline standard deviation is illustrated three-dimensionally in Figure 1. The depicted plane illustrates the minimally required sample size for detection of the given effect, assuming the given baseline standard deviation either per treatment arm for unpaired data or total for paired (followup) data. The n did not exceed 300 when there were given effects greater than 0.6 points and baseline standard deviations smaller than 2.6 from our data. These differences of 6% to 13% of maximal possible value (12% to 26% of baseline value) and the standard deviations of 20% to 26% of maximal possible value reflect most of the values found by the pilot study. Specifically, 0.6 and 0.7 WOMAC points (scale 0 to 10) and 6 and 7 SF-36 points (scale 0 to 100) are on the level of SDD and MCID in both scales.

Figure 1. Sample size (n) in dependency on the effect and the baseline standard deviation (WOMAC scale); n per treatment arm (unpaired data) or total for paired (followup) data.

Determination of sample size (n) by the effect size (ES) (Figure 2)

The size of n as a parabolic function of ES is illustrated in Figure 2 for pairwise data, assuming a type I error of α = 0.05 and a power = 0.8 using formula [2] in the Methods section. Below 0.25, required sample sizes exceed 250, but in cases of ES > 0.4, i.e., when the baseline standard deviation is less than 2.5-fold of the assumed effect, n will be smaller than 100. For ES > 0.7, the minimal required number of n = 30 is sufficient for detection of significant effects, and the parabolic function flattens more because the normal approximation must be replaced by the t-approximation(34).
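The thresholds quoted above can be checked by inverting the relation between n and ES. The following Python sketch assumes formula [2] in the form n = 2 (zα + zβ)² / ES² and solves it for the smallest ES detectable with a given n; the function name is hypothetical and the normal approximation is used throughout.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_es(n, alpha=0.05, power=0.80):
    """Smallest ES detectable with n, from n = 2 * (z_alpha + z_beta)^2 / ES^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return sqrt(2 / n) * (z_alpha + z_beta)


for n in (250, 100, 63):
    print(f"n = {n}: minimal detectable ES = {min_detectable_es(n):.2f}")
```

Under these assumptions, sample sizes of 250, 100, and 63 correspond to effect sizes of about 0.25, 0.40, and 0.50.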

Figure 2. Sample size (n) in dependency on the effect size (ES); n per treatment arm (unpaired data) or total for paired (followup) data.

DISCUSSION

Strategy and concepts

We have explained and illustrated the methodology and concrete results of the concepts of SDD, MCID, and sample size calculation by using the WOMAC and SF-36 data of OA patients who have undergone an inpatient rehabilitation intervention. These data can be considered a result of a previously conducted pilot study. Our aim was to build a bridge between the clinician and the epidemiologist/statistician, 2 professional groups with a long history of classic issues between them.

The clinical importance of effects for the assessment of MCID was quantified, by way of example, using the transition method(26–30). The determinations of SDD, MCID, and sample size are derived from the calculation rules for the normal distribution, which is assumed for large numbers of subjects (n > 30) whose data are dependent on “natural” processes(31, 34–37). The assumption of normally distributed effects and differences of effects results in simple equations for the user in clinical practice. The further concepts of study design, pilot and main study, and SDD and MCID will first be discussed and then applied to concrete data.

Study design

A study to describe the effect of an intervention can examine 2 independent groups of patients, one with the intervention and the other without (the control group). The conditions at baseline should be as similar as possible for both groups in order to avoid systematic bias. The required sample sizes for this design are twice as large as those resulting from paired, followup design(31). In the paired followup design, the control group can be created by the crossover design: First, one half of the patients will receive the intervention and the second half will not; then, after a “washout” period, the second half will be treated and the first half will not. This is one of the “gold standards” for drug studies, but when applying rehabilitation interventions this design is difficult to accomplish. When assessing the same patients, uncontrolled determination of the effects will be the result of followup studies. In this case, the formulas [1], [2] (using ES), and [3] (using SRM) can be applied.

Pilot study and main study–use of ES and SRM

When planning a study to prove the effect of an intervention, one wants to know how many patients have to be examined. To determine sample sizes, an estimate of the effect of the intervention and an estimate of the variance (standard deviation) of the data are needed. Simple estimation of these figures is vague and uncertain. These estimates can be based on the results found in the literature if studies with comparable conditions can be found. Literature or simple expectation can predict the size of the effect, but the estimation of the variance remains a problem. Ideally, a small pilot study should be performed to gain valid data under as similar conditions as possible to the future main study. To keep its realization easy, fast, and economical, a cross-sectional survey of baseline data of the future patient sample is adequate. In this case, only the baseline standard deviation (the so-called a priori standard deviation) will be known, and not the standard deviation of the single effects (differences of 2 health statuses), because the short pilot study has no followup data that will allow determination of the effect's standard deviation and by that the SRM. Therefore, we can use only the ES, which is the mean effect divided by the baseline standard deviation in formulas [1] and [2], and not the SRM in formula [3], which equals the mean effect divided by the standard deviation of the single effect. This will be most typical when planning future studies.

Interpretation of SDD and MCID

A priori effects assumed for future studies should be greater than or equal to MCID, representing clinically meaningful effects. An effect smaller than MCID may be measurable, but it will make no sense from the point of view of the patient, who will be unable to notice it. Thus, if MCID > SDD, the assumed effect will be the MCID and the sample size is sufficiently large. However, if SDD > MCID we need a greater sample size or a more sensitive statistical model that is able to detect smaller effects.
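The comparison of MCID and SDD described above can be written as a small planning check. This Python sketch assumes the SDD and sample size equations in the forms SDD = (zα + zβ) · SD · √(2/n) and n = 2 (zα + zβ)² (SD/Δ)²; the function name and the returned dictionary layout are hypothetical. The example uses the WOMAC pain values from Tables 1 and 2.

```python
from math import ceil, sqrt
from statistics import NormalDist


def plan_from_pilot(mcid, sd_baseline, n_pilot, alpha=0.05, power=0.80):
    """Compare MCID with the SDD of the pilot sample and size the main study."""
    z_sum = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    sdd = z_sum * sd_baseline * sqrt(2 / n_pilot)
    # Sample size needed to detect an effect as small as the MCID.
    n_required = ceil(2 * z_sum ** 2 * (sd_baseline / mcid) ** 2)
    return {
        "sdd": sdd,
        "mcid_detectable": mcid >= sdd,  # True: the pilot sample size suffices
        "n_required": n_required,
    }


worsening = plan_from_pilot(mcid=1.10, sd_baseline=2.25, n_pilot=122)
improvement = plan_from_pilot(mcid=0.75, sd_baseline=2.25, n_pilot=122)
print(worsening)
print(improvement)
```

In this example the MCID for worsening exceeds the SDD, so 122 patients are enough, whereas the MCID for improvement lies below the SDD and calls for a larger sample.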

Application on concrete data

In our example of 122 (WOMAC) and 116 (SF-36) inpatient rehabilitation patients, ES and MCID for worsening and MCID for improvement led to sample sizes between 40 and more than 1,000 for rehabilitation interventions. In followup studies with paired data, effect sizes greater than 0.25 led to realizable sample sizes smaller than 250. Above 0.5, the ES can be denoted as high because it requires an n of less than 63 (Figure 2). Except for WOMAC stiffness and SF-36 physical function, the required sample sizes did not exceed 325, although even this number may be more than is feasible for future studies in rehabilitation patients. Our data confirm that the WOMAC stiffness section is the least responsive scale(6–9), and the physical function scale of the generic SF-36 is not as sensitive as the function dimension of the condition-specific WOMAC(3, 8), despite the fact that pain is measured similarly by both instruments(6–9).

Regarding our data, one can assume an effect's standard deviation around or below the baseline standard deviation (this is true except for SF-36 bodily pain) and use the baseline standard deviation as a good estimate of that of the effect. In this case, the simple future crossover pilot study enables the use of much smaller sample sizes for paired followup data in rehabilitation patients (formulas [1] and [3]).

In the WOMAC sections, we would need effects between 0.8 and 1.0 points to be detectable statistically (SDD), assuming a sample size of 122 and the a priori standard deviations of our data. Assuming that the smallest MCID in the WOMAC scale ranging from 0 to 10 points will be 0.6 points (6% of maximal value, 12% of baseline value), the required sample size will be below 300 for future followup studies (Figure 1) and will, therefore, still be realizable. Concerning bodily pain and the PCS, these figures are also valid for the SF-36. Thus, after rehabilitation intervention with more than a hundred patients, SDD and MCID remain comparable.

Conclusion.

In rehabilitation intervention, effects larger than 12% of baseline score (6% of maximal score) can be attained and detected as MCID by the transition method in both the WOMAC and the SF-36. Effects of this size lead to reasonable sample sizes for future studies, lying below n = 300. The same holds true for moderately responsive questionnaire sections with effect sizes higher than 0.25. When designing studies, assumed effects below the MCID may be detectable but are clinically meaningless.

Acknowledgements

This study was supported by the Zurzach Rehabilitation Foundation SPA. We thank Stephan Mariacher, MD, and Susanne Lehmann for the planning, management, and implementation of the database and Robin Kyburg and Diane Fassett for editing the English-language manuscript.

REFERENCES

  1. Murray CJL, Lopez DL. Global mortality, disability, and the contribution of risk factors: global burden of disease study. Lancet 1997; 349: 1436–42.
  2. Lawrence RC, Hochberg MC, Kelsey JL, McDuffie FC, Medsger TA Jr, Felts WR. Estimates of the prevalence of selected arthritic and musculoskeletal disease in the United States. J Rheumatol 1989; 16: 427–41.
  3. Hawker G, Melfi C, Paul J, Green R, Bombardier C. Comparison of a generic (SF-36) and a disease specific (WOMAC) instrument in the measurement of outcomes after knee replacement surgery. J Rheumatol 1995; 22: 1193–6.
  4. Bellamy N, Buchanan WW. A preliminary evaluation of the dimensionality and clinical importance of pain and disability in osteoarthritis of the hip and knee. Clin Rheumatol 1986; 5: 231–41.
  5. Wright JG, Young NL. A comparison of different indices of responsiveness. J Clin Epidemiol 1997; 50: 239–46.
  6. Ruof J, Shanga O, Stucki G. Comparative responsiveness of 3 functional indices in ankylosing spondylitis. J Rheumatol 1999; 26: 1959–63.
  7. Laupacis A, Bourne R, Rorabeck C, Feeny D, Wong C, Tugwell P, et al. The effect of elective total hip replacement on health-related quality of life. J Bone Joint Surg Am 1993; 75: 1619–26.
  8. Jones CA, Voaklander DC, Johnston DWC, Suarez-Almazor ME. Health related quality of life outcomes after total hip and knee arthroplasties in a community based population. J Rheumatol 2000; 27: 1745–52.
  9. Brazier JE, Harper R, Munro J, Walters SJ, Snaith ML. Generic and condition-specific outcome measures for people with osteoarthritis of the knee. Rheumatology (Oxford) 1999; 38: 870–7.
  10. Dieppe P. Therapeutic targets in osteoarthritis. J Rheumatol 1995; 22 Suppl 43: 136–9.
  11. Bellamy N. Outcome measurement in osteoarthritis clinical trials. J Rheumatol 1995; 22 Suppl 43: 49–51.
  12. Lequesne M, Brandt KD, Bellamy N, Moskowitz R, Menkes CJ, Pelletier J-P. Guidelines for testing slow acting drugs in osteoarthritis. J Rheumatol 1994; 21 Suppl 41: 65–73.
  13. Bellamy N. WOMAC Osteoarthritis Index: a user's guide. London, Ontario, Canada: University of Western Ontario; 1995.
  • 14
    Bellamy N, Buchanan WW, Goldsmith CH, Campbell J, Stitt LW. Validation study of WOMAC: a health status instrument for measuring clinically important patient relevant outcomes to antirheumatic drug therapy in patients with osteoarthritis of the hip or knee. J Rheumatol 1988; 15: 183340.
  • 15
    Bellamy N. Pain assessment in osteoarthritis: experience with the WOMAC osteoarthritis index. Semin Arthritis Rheum 1989; 18 Suppl 2: 147.
  • 16
    Bellamy N, Buchanan WW, Goldsmith CH, Campbell J, Duku E. Signal measurement strategies: are they feasible and do they offer any advantage in outcome measurement in osteoarthritis? Arthritis Rheum 1990; 33: 73945.
  • 17
    Bellamy N, Kean WF, Buchanan WW, Gerecz-Simon E, Campbell J. Double blind randomized controlled trial of sodium meclofenamate (Meclomen) and diclofenac sodium (Voltaren): post validation reapplication of the WOMAC osteoarthritis index. J Rheumatol 1992; 19: 1539.
  • 18
    Stucki G, Meier D, Stucki S, Michel BA, Tyndall AG, Elke R, et al. Evaluation of a German version of the WOMAC (Western Ontario and McMaster Universities) osteoarthritis index. Z Rheumatol 1996; 55: 409.
  • 19
    Theiler R, Brooks P, Ghosh P. Clinical, biochemical and imaging methods of assessing osteoarthritis and clinical trials with agents claiming ‘chondromodulating’ activity. Osteoarthritis Cartilage 1994; 2: 123.
  • 20
    Ware JE, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptional framework and item selection. Med Care 1992; 30: 47383.
  • 21
    Stucki G, Liang MH, Phillips C, Katz JN. The Short Form-36 is preferable to the SIP as a generic health status measure in patients undergoing elective total hip arthroplasty. Arthritis Care Res 1995; 8: 17481.
  • 22
    Altman RD, Bloch DA, Bole GG Jr, Brandt KD, Cooke DV, Greenwald RA, et al. Development of clinical criteria for osteoarthritis. J Rheumatol 1987; 14 &lpar;special issue&rpar;: 36.
  • 23
    Altman R, Alarcón G, Appelrouth D, Bloch D, Borenstein D, Brandt K, et al. The American College of Rheumatology criteria for the classification of osteoarthritis of the hip. Arthritis Rheum 1991; 34: 50514.
  • 24
    Liang MH, Fossel AH, Larson MG. Comparisons of five health status instruments for orthopedic evaluation. Med Care 1990; 28: 63242.
  • 25
    Kazis LE, Anderson JJ, Meenan RF. Effect sizes for interpreting changes in health status. Med Care 1989; 27 Suppl 3: 17889.
  • 26
    Deyo RA, Diehr P, Patrick DL. Reproducibility and responsiveness of health status measures: statistics and strategies for evaluation. Control Clin Trials 1991; 12 Suppl 4: 142S158S.
  • 27
    Jaeschke R, Singer J, Guyatt GH. Measurement of health status: ascertaining the minimal clinically important difference. Control Clin Trials 1989; 10: 40715.
  • 28
    Stucki G, Liang MH, Fossel AH, Katz JN. Relative responsiveness of condition-specific and generic health status measures in degenerative lumbar spinal stenosis. J Clin Epidemiol 1995; 48: 136978.
  • 29
    Stucki G, Daltroy L, Liang MH, Lipson SJ, Fossel AH, Katz JN. Measurement properties of a self-administered outcome measure in lumbar spinal stenosis. Spine 1996; 21: 796803.
  • 1
    Angst F, Aeschlimann A, Michel BA, Stucki G. Minimal clinically important rehabilitation effects in patients with osteoarthritis of the lower extremities. J Rheumatol. In press.
  • 31
    Bland M. An introduction to medical statistics. Oxford (UK): Oxford Medical Publications; 1996.
  • 32
    Lasserre M, Boers M, van der Heijde D, Boonen A, Edmonds J, Saudan A, et al. Smallest detectable difference in radiological progression. J Rheumatol 1999; 26: 7319.
  • 33
    Nunnally JC, Bernstein IH. Psychometric theory. New York: McGraw-Hill; 1978.
  • 34
    Lachin JM. Introduction to sample size determination and power analysis for clinical trials. Control Clin Trials 1981; 2: 93113.
  • 35
    Guyatt G, Walter S, Norman G. Measuring change over time: assessing the usefulness of evaluative instruments. J Chronic Dis 1987; 40: 1718.
  • 36
    Kastenbaum MA, Hoel DG, Bowman KO. Sample size requirements: one-way analysis of variance. Biometrika 1970; 57: 42130.
  • 37
    Rosner B. Fundamentals of biostatistics. Boston: PWS-Kent; 1990.