Abstract

  1. Top of page
  2. Abstract
  3. INTRODUCTION
  4. PATIENTS AND METHODS
  5. RESULTS
  6. DISCUSSION
  7. AUTHOR CONTRIBUTIONS
  8. Acknowledgements
  9. REFERENCES
  10. Appendix A
  11. Supporting Information

Objective

To identify whether patients have symptomatic improvement 12 months following total hip replacement (THR) surgery.

Methods

The European Collaborative Database of Cost and Practice Patterns of Total Hip Replacement study consists of 1,327 patients receiving primary THR for osteoarthritis (OA) across 20 European orthopedic centers. The primary outcome was the difference in Western Ontario and McMaster Universities OA Index (WOMAC) score between preoperative and 12-month postoperative measurements. To classify whether patients responded to THR at 12 months, we used return to normal, Outcome Measures in Rheumatology Clinical Trials (OMERACT)–OA Research Society International (OARSI) criteria, minimum important difference (MID), and minimum clinically important difference. Exposures were age, sex, obesity, employment, educational attainment, American Society of Anesthesiologists status, and radiographs.

Results

On average, there was a large improvement in WOMAC scores 12 months after surgery, but whereas some patients improved, others got worse. The OMERACT-OARSI method classified 85.7% of patients as responders, MID 70.1%, and return to normal 64.1%. In general, each approach classified the same groups of patients as responding to THR. Based on total WOMAC score, patients who were younger, morbidly obese, employed, and better educated were more likely to respond to THR, but the effects were attenuated after adjustment for confounding, with only the effect of education remaining important.

Conclusion

The overall average response to THR was good, but ∼14–36% of patients did not improve, or were worse, 12 months postsurgery. Although the OMERACT-OARSI criteria were originally designed for use in clinical drug trials, they performed well in classifying patient response 12 months post-THR. Further research is required to understand the determinants of patient outcomes following THR.


INTRODUCTION

Total hip replacement (THR) is one of the best surgical interventions available. It has been shown to be highly cost effective and very good at relieving pain and disability as well as improving quality of life (1–4). It is also among the most frequently used elective surgical procedures (∼170,000/year in Germany [5], ∼70,000/year in England [6], and ∼500,000/year in the US [7]). Approximately 80% of THRs (94% in England) are done for people with osteoarthritis (OA) (6–8).

Until recently, research on the outcomes of THR centered on prosthesis survival, and this remains the main focus of most international hip-replacement registries (8). Techniques have improved and prosthesis survival is now very good (9). However, increasing emphasis is currently being placed on patient-reported outcome measures and patient satisfaction, which may not mirror the technical outcome of surgery (10). Recent studies in the UK (11), Canada (12), and Sweden (4) have all shown that an important minority of patients having THR are not satisfied with the outcome.

The patient-reported outcome measures most often used to assess the effects of THR include well-validated self-assessment measures of OA severity such as the Oxford Hip Score (13) or Western Ontario and McMaster Universities OA Index (WOMAC) (14), which assess pain, stiffness, and function. Most studies that report such data have treated the outcome measure as a continuous variable, and simply recorded the mean scores before and after surgery (1). These have shown a marked improvement in the mean score as a result of surgery, but have not revealed individual patient gains or losses in pain and function. Such data are of little help to surgeons or patients considering a THR. Knowing that on average a THR reduces the Oxford Hip Score or WOMAC score by x points says nothing about an individual's chances of improving. A complementary approach would be to dichotomize patients into those who improve as a result of surgery and those who do not, so that surgeons and patients would know that y percent of patients respond well to THR.

One of the difficulties in trying to decide whether an individual has or has not responded to THR is that disease severity at the time of surgery varies enormously (15), and that there are floor and ceiling effects in the patient-reported outcome measures used to assess severity (16). Ceiling effects occur when a large number of patients have the maximum score (i.e., most severe disease; these patients have the greatest potential to improve following surgery), whereas floor effects occur when a high percentage of patients have the minimum score (e.g., no pain or functional limitations).

From an examination of the literature, we have identified 4 broad approaches that can be used to assess whether an individual with OA is a responder or a nonresponder to an intervention: return to normal (17, 18), the Outcome Measures in Rheumatology Clinical Trials (OMERACT)–Osteoarthritis Research Society International (OARSI) responder criteria (19), the minimum important difference (MID) (20), and the minimum clinically important difference (MCID) (21–23). It was not possible to calculate the MCID in this study due to the lack of a suitable external anchoring question.

In this study, we have attempted to apply the 3 remaining ways of dichotomizing patients into responders and nonresponders to THR (return to normal, the OMERACT-OARSI criteria, and the MID) to the European Collaborative Database of Cost and Practice Patterns of Total Hip Replacement (EUROHIP) cohort data, which include WOMAC scores at baseline and 12 months after surgery (15). We examined similarities and differences in results between them, and compared them with conventional data that treat the outcome as a continuous variable.

PATIENTS AND METHODS

Patients.

The EUROHIP study consists of 1,327 consenting patients coming to primary THR for OA across 20 European orthopedic centers in 12 countries (15). Inclusion criteria included a diagnosis of hip OA, primary THR, and signed informed consent; exclusion criteria included causes of hip disease other than OA, severe mental illness or dementia, and patients unwilling/unable to take part in the study. Prior to surgery, patients completed questionnaires including information about age, sex, employment, education, and the WOMAC OA Index (14). Of the patients, 908 (68.4%) completed a similar postal questionnaire 12 months after surgery. Preoperative radiographs were obtained and the Kellgren/Lawrence (K/L) score was used to assess structural disease severity. The surgical teams recorded information on the patient's height and weight (from which body mass index [BMI] was calculated) and American Society of Anesthesiologists (ASA) status, which is a standard measure of fitness for surgery that was scored in this study from 1 to 4, where 1 = normal, healthy and 4 = life-threatening systemic disease.

The WOMAC Index, version 3.1, was used to assess the severity of symptoms. This consists of 24 items in 3 subscales: pain (5 items), stiffness (2 items), and physical function (17 items) (14). It is a reliable, valid, and responsive instrument for examining outcomes in patients with OA undergoing THR surgery (24–26). Missing data were treated as per the WOMAC user's handbook. The maximum possible scores for each subscale are 20 for pain, 8 for stiffness, and 68 for function. For each subscale, a normalized score was created, where 0 = no symptoms and 100 = extreme symptoms by summing up the total score of each subscale, multiplying it by 100, and dividing by the maximum score. A total score out of 96 was created by combining the 3 subscales, and then was converted into a normalized score out of 100.
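The normalization described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the example raw sums are hypothetical, and handling of missing items (which the study took from the WOMAC user's handbook) is not shown.

```python
# Maximum raw score per subscale, as stated in the text.
MAX_RAW = {"pain": 20, "stiffness": 8, "function": 68}


def normalize(raw_sum, max_score):
    """Rescale a raw sum to 0-100, where 0 = no symptoms and 100 = extreme symptoms."""
    if not 0 <= raw_sum <= max_score:
        raise ValueError("raw sum outside the possible range")
    return raw_sum * 100 / max_score


def womac_scores(raw):
    """raw: dict of raw subscale sums. Returns normalized subscale and total scores."""
    scores = {name: normalize(raw[name], MAX_RAW[name]) for name in MAX_RAW}
    total_raw = sum(raw[name] for name in MAX_RAW)
    # The total is the sum of the 3 subscales (maximum 96), rescaled to 0-100.
    scores["total"] = normalize(total_raw, sum(MAX_RAW.values()))
    return scores


# Hypothetical patient: raw sums of 11 (pain), 5 (stiffness), and 40 (function).
example = womac_scores({"pain": 11, "stiffness": 5, "function": 40})
```

Note that the normalized total is computed from the raw total out of 96, so it is not the simple average of the 3 normalized subscales.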

Statistical methods.

Stata, version 10.1, was used for all statistical analyses (Stata, College Station, TX). Exposure variables measured at baseline were age (<50 years, 50–69 years, or ≥70 years), sex, obesity (not obese [BMI <30 kg/m²], obese [BMI 30–39 kg/m²], or morbidly obese [BMI ≥40 kg/m²]), employment status (employed, retired, retired early, or other), education after leaving school (none, diploma or equivalent, degree, or postgraduate degree), ASA status (1, 2, 3, or 4), and K/L grade (0, 1, 2, 3, or 4) of the hip operated on. Two types of outcome variable were used for analysis: continuous and dichotomous outcomes.
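The age and obesity groupings above can be expressed as simple category functions. This is a sketch under stated assumptions: the function names are hypothetical, and BMI values between 39 and 40 are assigned to the obese group, which the published ranges (30–39 and ≥40) do not state explicitly.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2, from recorded weight and height."""
    return weight_kg / height_m ** 2


def obesity_group(bmi_value):
    """Map a BMI to the 3 obesity categories used as an exposure."""
    if bmi_value < 30:
        return "not obese"
    if bmi_value < 40:  # assumption: 39 < BMI < 40 counts as obese
        return "obese"
    return "morbidly obese"


def age_group(age_years):
    """Map age in years to the 3 age bands used as an exposure."""
    if age_years < 50:
        return "<50"
    if age_years < 70:
        return "50-69"
    return ">=70"
```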

The continuous outcome was the absolute change in WOMAC score, defined as the baseline score minus the score at 12-month followup. Absolute change was calculated for the combined total WOMAC score and the 3 WOMAC subscales (pain, stiffness, and function). The distribution of WOMAC scores was assessed to examine the assumption of normality. Because the distribution of the 12-month score was skewed to the right, the median and interquartile range were calculated for WOMAC scores at baseline and followup, and for the difference in scores. We calculated 95% confidence intervals (95% CIs) around the median WOMAC scores using bootstrapping, with bias-corrected and accelerated intervals. Univariable analyses were performed to explore whether the difference in median WOMAC scores was associated with each exposure by using a Kruskal-Wallis test (one-way analysis of variance [ANOVA] by ranks), the nonparametric equivalent to ANOVA.
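A rough sketch of the change score and a bootstrapped confidence interval for its median is shown below. It deliberately swaps in a plain percentile bootstrap, which is simpler than the bias-corrected and accelerated intervals used in the study, so its limits would not match the published ones exactly. The function names, the number of resamples, and the seed are illustrative assumptions.

```python
import random
import statistics


def change_scores(baseline, followup):
    """Baseline minus followup, so positive values mean fewer symptoms at followup."""
    return [b - f for b, f in zip(baseline, followup)]


def bootstrap_median_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the median (not the BCa interval used in the study)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = medians[int(alpha / 2 * n_boot)]
    upper = medians[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.median(values), lower, upper
```

The median of the individual change scores is generally not equal to the difference between the baseline and followup medians, which is why the change is computed per patient first.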

For the dichotomous outcomes, the following approaches were used to dichotomize patients into whether or not they responded to THR. A random-effects logistic regression model was fitted that controlled for evidence of clustering across countries. Univariable analyses were used to obtain crude odds ratios (ORs) that examined the association between whether or not a patient responded to THR and each of the exposure variables. Multivariable analyses were then fitted to obtain adjusted ORs. Wald's tests were used to explore linear trends.
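To make the odds ratios concrete, a crude odds ratio and a Wald-type 95% CI can be computed by hand from a 2 × 2 table. This sketch is a simplification and an assumption on our part: it ignores clustering by country, so it will differ somewhat from the random-effects logistic regression estimates reported in the study. The function name is hypothetical; the counts in the usage line are the return to normal counts for university degree (67 of 85) versus no further education (224 of 395) from Table 3.

```python
import math


def crude_odds_ratio(resp_a, total_a, resp_b, total_b, z=1.96):
    """Odds ratio of response in group A versus reference group B, with a Wald CI."""
    a, b = resp_a, total_a - resp_a  # responders / nonresponders in group A
    c, d = resp_b, total_b - resp_b  # responders / nonresponders in group B
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio (Woolf's method).
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


or_, lo, hi = crude_odds_ratio(67, 85, 224, 395)
```

The crude estimate comes out at roughly 2.8, close to but not identical with the univariable estimate in Table 3, as expected given that the model in the study also accounts for clustering.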

Return to normal.

A cutoff point is used to identify patients that have returned to normal function (17, 18), defined as being 2 SDs below the mean of the baseline WOMAC score of the EUROHIP cohort (Figure 1). Using the cutoff assumes that the group's baseline WOMAC score is normally distributed, so the distribution of baseline scores was examined for normality. In conjunction with the cutoff, the Relative Change Index was used to ensure that the degree of change in WOMAC score between baseline and 12-month followup was large enough to exceed that attributable to measurement error (chance). The Relative Change Index was calculated by dividing the magnitude of change by the standard error of the difference score (see Supplementary Figure 1, available in the online version of this article at http://www3.interscience.wiley.com/journal/77005015/home). Therefore, to be classified as returning to normal, a patient must have passed the cutoff and have had a reliable change score.
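The two conditions can be sketched as below. Several details are assumptions rather than the study's stated method: the standard error of the difference score is passed in as a parameter because its formula is given only in the supplementary figure, and the 1.96 threshold for a reliable change is a common convention that the text does not state. Function names are hypothetical.

```python
import statistics


def normal_cutoff(baseline_scores):
    """Cutoff lying 2 SDs below the mean baseline score of the cohort."""
    return statistics.mean(baseline_scores) - 2 * statistics.stdev(baseline_scores)


def returned_to_normal(baseline, followup, cutoff, se_diff, z_crit=1.96):
    """True if the followup score passes the cutoff AND the change is reliable.

    se_diff: standard error of the difference score (supplied by the caller).
    z_crit: assumed threshold for a reliable change; not given in the text.
    """
    change_index = (baseline - followup) / se_diff
    return followup <= cutoff and change_index > z_crit


# Hypothetical cohort of baseline scores and a hypothetical se_diff of 8.0.
cutoff = normal_cutoff([40, 50, 60, 70, 80])
```

A patient with a small change can sit below the cutoff without counting as returned to normal, for example someone who started at 30 and finished at 25.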

Figure 1. Histograms displaying the distribution of Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) scores at baseline, 12-month followup, and the difference.

OMERACT-OARSI responder criteria.

The OMERACT-OARSI criteria (19) classify a patient as a responder if there is high improvement in pain or in function of ≥50% and an absolute change of ≥20, or, if the patient does not fulfill this, improvement in ≥2 of the 3 following: pain of ≥20% and an absolute change of ≥10, function of ≥20% and an absolute change of ≥10, and patient's global assessment of ≥20% and an absolute change of ≥10.

The patient's global assessment was measured using the total WOMAC score. Absolute change was defined as the 12-month score minus the baseline score on a 0–100 interval scale, and relative change was defined as the percentage of change during the study, which was the 12-month score minus the baseline score, divided by the baseline score multiplied by 100. The outcome variable was then defined according to whether or not patients met the criteria.
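The decision rule can be written out as a short classifier. This is a sketch, not the authors' code: function names are hypothetical, improvement is expressed as baseline minus followup so that positive values mean fewer symptoms (the sign convention is our choice for readability), and a baseline score of 0 is treated as no relative improvement, which is an assumption.

```python
def improvement(baseline, followup):
    """Absolute (0-100 scale) and relative (%) improvement; positive = better."""
    absolute = baseline - followup
    # Assumption: relative change is undefined at baseline 0, so treat it as 0.
    relative = absolute / baseline * 100 if baseline > 0 else 0.0
    return absolute, relative


def meets(baseline, followup, min_relative, min_absolute):
    """True if both the relative and the absolute threshold are reached."""
    absolute, relative = improvement(baseline, followup)
    return relative >= min_relative and absolute >= min_absolute


def omeract_oarsi_responder(pain, function, global_assessment):
    """Each argument is a (baseline, followup) pair on a 0-100 scale."""
    # High improvement in pain or in function is sufficient on its own.
    if meets(*pain, 50, 20) or meets(*function, 50, 20):
        return True
    # Otherwise require moderate improvement in at least 2 of the 3 domains.
    moderate = [
        meets(*pain, 20, 10),
        meets(*function, 20, 10),
        meets(*global_assessment, 20, 10),
    ]
    return sum(moderate) >= 2
```

In this study the total WOMAC score stood in for the patient's global assessment, so the third pair would be the baseline and followup total scores.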

MID.

The MID was defined as the smallest change in WOMAC score between baseline and 12-month followup that would likely be important from the patient's or the clinician's perspective. It has been suggested that a one-half–SD change of the mean difference in scores may approximate an MID for some patient-reported outcome instruments, and that evidence from previous studies, physiologic arguments, and statistical theory shows a tendency to converge to the one-half–SD criterion as being meaningful to patients (20). However, evidence supporting any MID is needed to justify such estimates, and because no such evidence is available, we defined the MID as being within one-half of an SD of the mean difference in WOMAC score because this is a reasonable place to start as a meaningful difference (20). For the outcome variable, patients were dichotomized according to whether the difference in WOMAC score was greater than or equal to the MID.
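The text does not spell out the arithmetic behind the MID threshold, so the sketch below is explicitly an interpretation. It reads "within one-half of an SD of the mean difference" as the mean change minus one-half of the SD of the change scores; this reading is an assumption, not something the source confirms. The conventional half-SD value is also exposed for comparison. Function names are hypothetical.

```python
import statistics


def half_sd(changes):
    """One-half of the SD of the change scores (the conventional half-SD value)."""
    return 0.5 * statistics.stdev(changes)


def mid_threshold(changes):
    """ASSUMED reading of the text: mean change minus one-half SD of the change."""
    return statistics.mean(changes) - half_sd(changes)


def classify(changes, threshold):
    """True for each patient whose change score is at or above the threshold."""
    return [c >= threshold for c in changes]


# Hypothetical change scores (baseline minus followup).
changes = [10, 20, 30, 40, 50]
flags = classify(changes, mid_threshold(changes))
```

Passing `half_sd(changes)` as the threshold instead gives the alternative in which the MID is simply one-half of an SD, so the two readings can be compared on the same data.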

MCID.

The MID is known as the MCID when connected to clinical anchors (20), which come from questions asking about improvement or satisfaction with surgery at the time of followup (21–23). With a 75th percentile approach, the cut point is the one corresponding to the 75th percentile of scores for improvement in patients reporting an important improvement on the anchoring question (21–23). This would be calculated for the absolute difference in WOMAC score, and the binary outcome variable created according to whether or not a patient's difference in WOMAC score was greater than or equal to the MCID. Alternatively, a receiver operating characteristic curve analysis could be used, where the gold standard is whether or not a patient improved according to the anchoring question, and the cut point on the WOMAC score is the one that maximizes sensitivity and specificity. Unfortunately, we were unable to use the MCID approach in the EUROHIP study due to the lack of a suitable anchoring question.

RESULTS

Of the 1,327 patients in the study with preoperative data, 908 (68.4%) completed the 12-month followup questionnaire. A comparison of the baseline characteristics of patients who provided 1-year data (completers, n = 908) with those who did not return the followup questionnaire (noncompleters, n = 419) is shown in Table 1. Noncompleters had higher WOMAC pain (P = 0.009) and function (P < 0.001) scores. Both groups were similar with respect to most other baseline characteristics, except that noncompleters were more highly educated (P = 0.002) and had lower ASA scores (P = 0.012).

Table 1. Preoperative characteristics of those who returned 1-year followup data (completers), compared with those who did not (noncompleters)*

| Characteristic | Completers (n = 908) | Noncompleters (n = 419) |
| --- | --- | --- |
| WOMAC pain score | | |
| Mean ± SD | 54.5 ± 17.6 | 57.7 ± 18.1 |
| Median (IQR) | 55 (43–65) | 60 (45–70) |
| WOMAC stiffness score | | |
| Mean ± SD | 60.2 ± 20.2 | 61.1 ± 22.0 |
| Median (IQR) | 63 (50–75) | 63 (50–75) |
| WOMAC function score | | |
| Mean ± SD | 58.7 ± 16.3 | 63.8 ± 17.1 |
| Median (IQR) | 59 (49–69) | 65 (54–75) |
| Total WOMAC score | | |
| Mean ± SD | 57.9 ± 15.8 | 62.2 ± 16.4 |
| Median (IQR) | 58 (47–69) | 64 (53–74) |
| Age groups | | |
| <50 years | 77 (8.6) | 40 (9.9) |
| 50–69 years | 486 (54.3) | 217 (53.8) |
| ≥70 years | 332 (37.1) | 146 (36.2) |
| Sex | | |
| Men | 378 (43.8) | 181 (44.8) |
| Women | 485 (56.2) | 223 (55.2) |
| Obesity (BMI in kg/m²) | | |
| Not obese (<30) | 623 (74.5) | 306 (78.7) |
| Obese (30–39) | 202 (24.2) | 79 (20.3) |
| Morbidly obese (≥40) | 11 (1.3) | 4 (1.0) |
| Employment status | | |
| Employed | 217 (24.4) | 108 (26.8) |
| Retired | 522 (58.6) | 218 (54.1) |
| Retired early | 67 (7.5) | 38 (9.4) |
| Other | 85 (9.5) | 39 (9.7) |
| Education | | |
| Postgraduate degree | 28 (3.5) | 24 (6.8) |
| University degree | 93 (11.7) | 56 (15.8) |
| College diploma or equivalent | 249 (31.2) | 122 (34.4) |
| None | 428 (53.6) | 153 (43.1) |
| ASA status | | |
| 1 | 123 (15.5) | 86 (22.5) |
| 2 | 505 (63.7) | 214 (56.0) |
| 3 | 160 (20.2) | 77 (20.2) |
| 4 | 5 (0.6) | 5 (1.3) |
| K/L grade | | |
| 0 | 6 (0.7) | 0 (0.0) |
| 1 | 5 (0.6) | 1 (0.4) |
| 2 | 26 (3.2) | 6 (2.4) |
| 3 | 394 (49.2) | 116 (46.4) |
| 4 | 370 (46.2) | 127 (50.8) |

* Values are the number (percentage) unless otherwise indicated. WOMAC = Western Ontario and McMaster Universities Osteoarthritis Index; IQR = interquartile range; BMI = body mass index; ASA = American Society of Anesthesiologists; K/L = Kellgren/Lawrence.

When we examined the average WOMAC scores for the whole group before and after surgery, we found a large average improvement after THR. The median total preoperative score was 58.3 (95% CI 57.1, 59.5) and the median postoperative score was 15.6 (95% CI 14.1, 17.1); the median of the individual improvements was 35.9 (95% CI 33.6, 38.3). However, further examination shows that these average figures obscure the fact that some people got worse. Histograms of the distribution of WOMAC scores at baseline and 12-month followup, and of the difference in scores, are contained in Figure 1. Although at baseline subjects had normally distributed scores, by the 12-month followup the scores were positively skewed, suggesting an improvement in pain, stiffness, and function for the majority of the patients. Severe floor effects were apparent at followup, with 7.2% of patients having total WOMAC scores of zero, indicating no pain, stiffness, or functional limitations following surgery. However, the histograms of the difference in scores show that although some patients had improved at 12 months, others got worse. For example, the range of values for the difference in pain score goes from −45 (worse pain at 12 months) to +100 (less pain).

The WOMAC scores and exposure variables (risk factors) are shown in Table 2. There was some evidence that greater improvements were seen in women, morbidly obese people, employed people, and people with a K/L grade of 4. Repeating the analyses for each of the 3 WOMAC subscales separately (data not shown) found evidence that women and morbidly obese people had greater reductions in pain, and that people age <50 years, employed people, and those with a K/L grade of 4 had greater reductions in stiffness. Morbidly obese people and those with a K/L grade of 4 also received greater improvements in function.

Table 2. Summary of total WOMAC scores at baseline and 12-month followup, and the difference in scores*

| Exposure | No. (%) | Baseline score, median (95% CI) | Baseline score, IQR | 12-month followup score, median (95% CI) | 12-month followup score, IQR | Difference in scores, median (95% CI) | Difference in scores, IQR | Kruskal-Wallis test, P |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Overall | 908 (100) | 58.3 (57.1, 59.5) | 46.9–68.8 | 15.6 (14.1, 17.1) | 6.3–36.4 | 35.9 (33.6, 38.3) | 20.8–50.0 | |
| Age groups | | | | | | | | 0.21 |
| <50 years | 77 (8.6) | 56.3 (51.6, 60.9) | 46.9–63.5 | 10.4 (6.1, 14.8) | 2.1–20.8 | 42.2 (35.2, 49.2) | 25.0–52.1 | |
| 50–69 years | 486 (54.3) | 58.3 (56.8, 59.8) | 45.8–68.8 | 15.6 (13.7, 17.5) | 6.8–35.9 | 35.9 (32.9, 38.9) | 20.8–49.0 | |
| ≥70 years | 332 (37.1) | 60.4 (58.4, 62.4) | 50.0–70.8 | 18.8 (14.6, 22.9) | 7.3–39.6 | 34.4 (31.0, 37.7) | 19.8–50.2 | |
| Sex | | | | | | | | 0.06 |
| Men | 378 (43.8) | 55.2 (52.5, 57.9) | 43.8–65.6 | 14.6 (12.4, 16.8) | 6.3–34.4 | 33.3 (30.9, 35.8) | 19.8–48.2 | |
| Women | 485 (56.2) | 60.4 (59.2, 61.7) | 51.0–71.9 | 16.7 (13.8, 19.6) | 7.3–37.5 | 38.5 (35.3, 41.8) | 21.9–50.3 | |
| Obesity (BMI in kg/m²) | | | | | | | | 0.012 |
| Not obese (<30) | 623 (74.5) | 57.6 (56.0, 59.3) | 45.8–68.4 | 13.5 (11.9, 15.2) | 5.2–33.6 | 37.5 (35.0, 40.0) | 21.9–50.0 | |
| Obese (30–39) | 202 (24.2) | 61.5 (59.3, 63.6) | 51.1–72.9 | 22.9 (18.1, 27.7) | 11.5–46.9 | 33.3 (29.1, 37.6) | 17.7–49.3 | |
| Morbidly obese (≥40) | 11 (1.3) | 68.1 (61.4, 74.8) | 62.5–75.0 | 14.1 (9.4, 18.7) | 10.5–16.7 | 56.1 (46.7, 65.4) | 47.9–64.6 | |
| Employment status | | | | | | | | 0.05 |
| Employed | 217 (24.4) | 57.3 (54.8, 59.8) | 45.8–65.6 | 12.5 (9.9, 15.1) | 4.2–27.2 | 40.6 (35.8, 45.5) | 25.0–52.0 | |
| Retired | 522 (58.6) | 58.3 (56.6, 60.0) | 46.9–68.8 | 16.7 (13.8, 19.5) | 7.3–38.5 | 34.4 (31.9, 36.8) | 18.8–49.0 | |
| Retired early | 67 (7.5) | 64.6 (60.2, 69.0) | 54.7–75.0 | 25.0 (12.5, 37.5) | 12.5–55.7 | 34.4 (24.5, 44.3) | 13.5–52.6 | |
| Other | 85 (9.5) | 58.3 (55.6, 61.1) | 47.9–69.8 | 13.0 (8.2, 17.8) | 6.3–32.3 | 38.5 (33.9, 43.2) | 25.0–52.1 | |
| Education | | | | | | | | 0.2 |
| Postgraduate degree | 28 (3.5) | 52.1 (44.2, 60.0) | 34.4–67.7 | 5.2 (2.5, 8.0) | 2.1–11.5 | 43.8 (35.5, 52.0) | 28.1–50.0 | |
| University degree | 93 (11.7) | 54.2 (48.9, 59.5) | 41.7–63.5 | 10.4 (7.0, 13.9) | 3.1–19.8 | 37.5 (30.5, 44.5) | 29.2–50.0 | |
| College diploma or equivalent | 249 (31.2) | 55.2 (52.5, 57.9) | 44.8–65.6 | 13.8 (12.1, 15.5) | 6.3–27.1 | 37.5 (34.2, 40.8) | 21.9–49.0 | |
| None | 428 (53.6) | 60.4 (59.2, 61.7) | 50.0–70.8 | 19.8 (16.3, 23.3) | 8.3–42.7 | 34.4 (30.9, 37.8) | 19.8–50.0 | |
| ASA status | | | | | | | | 0.13 |
| 1 | 123 (15.5) | 55.2 (51.7, 58.7) | 43.8–62.5 | 10.4 (7.1, 13.7) | 3.1–20.8 | 40.6 (35.2, 46.0) | 28.1–53.1 | |
| 2 | 505 (63.7) | 58.3 (56.6, 60.1) | 45.8–68.8 | 14.6 (12.5, 16.6) | 6.3–36.5 | 35.4 (32.8, 38.1) | 19.8–49.3 | |
| 3 | 160 (20.2) | 64.6 (61.1, 68.0) | 54.7–75.0 | 26.3 (19.7, 32.9) | 14.6–52.1 | 33.3 (27.8, 38.9) | 17.7–49.3 | |
| 4 | 5 (0.6) | 50.0 (42.0, 58.0) | 50.0–52.1 | 16.7 (7.7, 25.6) | 11.5–16.7 | 38.5 (30.9, 46.2) | 33.3–40.6 | |
| K/L grade | | | | | | | | 0.03 |
| 0 | 6 (0.7) | 55.3 (37.6, 73.1) | 50.0–72.9 | 16.5 (−9.1, 42.1) | 12.5–50.0 | 35.9 (20.6, 51.3) | 26.0–41.3 | |
| 1 | 5 (0.6) | 63.5 (48.5, 78.6) | 54.7–75.1 | 41.1 (12.3, 69.9) | 17.7–55.2 | 34.0 (3.3, 64.6) | 8.3–48.5 | |
| 2 | 26 (3.2) | 58.9 (51.9, 65.8) | 42.7–64.6 | 19.8 (11.6, 27.9) | 7.3–31.8 | 29.7 (22.6, 36.8) | 18.8–42.7 | |
| 3 | 394 (49.2) | 58.3 (56.4, 60.3) | 46.9–66.7 | 16.7 (13.8, 19.5) | 7.3–40.6 | 34.4 (31.3, 37.4) | 18.8–47.9 | |
| 4 | 370 (46.2) | 59.4 (57.3, 61.4) | 47.9–70.8 | 14.2 (12.2, 16.2) | 5.2–34.4 | 38.5 (35.4, 41.7) | 26.0–52.1 | |

* 95% CI = 95% confidence interval. See Table 1 for additional definitions.

  • Kruskal-Wallis test: nonparametric test of the null hypothesis that each exposure group comes from identical populations with the same median.

  • P values indicate whether the difference in median WOMAC scores is associated with each exposure variable.

As a complementary approach, we dichotomized patients according to whether or not they responded to THR. Each methodologic approach suggests a different threshold to classify patients as responding at 12 months. For the return to normal method, patients with a reliable change score were considered returned to normal if their followup pain score was ≤18.9, their stiffness score was ≤19.9, their function score was ≤25.8, and their total WOMAC score was ≤26.1. The MID for the change between baseline and 12-month WOMAC scores was a difference in pain score of ≥24.5, in stiffness score of ≥18.4, in function score of ≥23.9, and in total WOMAC score of ≥24.2. Thresholds for the OMERACT-OARSI responder criteria were predefined as high improvement in pain or function of ≥50% and an absolute change of ≥20; and if the patient did not fulfill this, improvement in ≥2 of the following: pain of ≥20% and an absolute change of ≥10, function of ≥20% and an absolute change of ≥10, and patient's global assessment of ≥20% and an absolute change of ≥10.

The number of patients considered to have a response for each methodologic approach, broken down by exposure variables for the overall WOMAC score (data are available by pain, stiffness, and function scores, but not reported here), is shown in Table 3. The OMERACT-OARSI method classified the greatest percentage of people (85.7%) as having responded to surgery, the MID classified 70.1%, and return to normal classified 64.1%. There was evidence of variation across countries: patients in Hungary/Poland were less likely to respond to surgery compared with patients in other countries. For the total WOMAC score, younger people, the morbidly obese, those in employment, better-educated people, and those in ASA group 1 were more likely to respond. However, only the effect of education remained after adjustment for confounding in multivariable analyses. Evidence of these associations varied depending on the method used (Table 3), and, in general, associations were strongest using the return to normal approach.

Table 3. Proportion of patients considered to have an improvement/response 12 months after hip replacement surgery*

| Exposure | Total | Return to normal, no. (%) | Return to normal, univariable | Return to normal, multivariable | OMERACT-OARSI, no. (%) | OMERACT-OARSI, univariable | OMERACT-OARSI, multivariable | MID, no. (%) | MID, univariable | MID, multivariable |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Overall | 845 | 542 (64.1) | | | 724 (85.7) | | | 592 (70.1) | | |
| Age groups | | | | | | | | | | |
| <50 years | 74 | 57 (77.0) | 2.4 (1.3, 4.6) | 1.7 (0.6, 4.6) | 64 (86.5) | 1.1 (0.5, 2.3) | 0.7 (0.2, 2.5) | 57 (77.0) | 1.5 (0.8, 2.9) | 0.9 (0.3, 2.3) |
| 50–69 years | 468 | 306 (65.4) | 1.5 (1.1, 2.0) | 1.1 (0.7, 1.8) | 401 (85.7) | 1.1 (0.7, 1.7) | 0.9 (0.4, 1.7) | 326 (69.7) | 1.1 (0.8, 1.5) | 0.7 (0.4, 1.1) |
| ≥70 years | 293 | 172 (58.7) | 1.00 | 1.00 | 251 (85.7) | 1.00 | 1.00 | 201 (68.6) | 1.00 | 1.00 |
| Missing | 10 | 7 (70.0) | | | 8 (80.0) | | | 8 (80.0) | | |
| P linear trend | | | 0.002 | 0.42 | | 0.74 | 0.54 | | 0.21 | 0.28 |
| Sex | | | | | | | | | | |
| Men | 359 | 239 (66.6) | 1.00 | 1.00 | 305 (85.0) | 1.00 | 1.00 | 245 (68.2) | 1.00 | 1.00 |
| Women | 443 | 273 (61.6) | 0.8 (0.6, 1.1) | 1.0 (0.7, 1.5) | 382 (86.2) | 1.1 (0.7, 1.7) | 1.6 (0.9, 2.8) | 315 (71.1) | 1.2 (0.9, 1.6) | 1.2 (0.8, 1.8) |
| Missing | 43 | 30 (69.8) | | | 37 (86.0) | | | 32 (74.4) | | |
| Obesity (BMI in kg/m²) | | | | | | | | | | |
| Not obese (<30) | 577 | 386 (66.9) | 1.00 | 1.00 | 503 (87.2) | 1.00 | 1.00 | 412 (71.4) | 1.00 | 1.00 |
| Obese (30–39) | 191 | 102 (53.4) | 0.6 (0.4, 0.8) | 0.8 (0.5, 1.3) | 152 (79.6) | 0.6 (0.4, 0.9) | 0.9 (0.5, 1.6) | 124 (64.9) | 0.8 (0.5, 1.1) | 0.9 (0.6, 1.5) |
| Morbidly obese (≥40) | 10 | 8 (80.0) | 2.0 (0.4, 10.0) | 3.1 (0.3, 28.8) | 10 (100.0) | | | 9 (90.0) | | |
| Missing | 67 | 46 (68.7) | | | 59 (88.1) | | | 47 (70.1) | | |
| P linear trend | | | 0.018 | 0.66 | | 0.11 | 0.86 | | 0.38 | 0.95 |
| Employment status | | | | | | | | | | |
| Employed | 213 | 155 (72.8) | 1.7 (1.1, 2.4) | 0.8 (0.5, 1.5) | 192 (90.1) | 1.5 (0.9, 2.6) | 0.9 (0.4, 2.2) | 162 (76.1) | 1.5 (1.0, 2.1) | 1.3 (0.7, 2.3) |
| Retired | 475 | 291 (61.3) | 1.00 | 1.00 | 404 (85.1) | 1.00 | 1.00 | 323 (68.0) | 1.00 | 1.00 |
| Retired early | 64 | 32 (50.0) | 0.7 (0.4, 1.3) | 0.5 (0.2, 1.1) | 49 (76.6) | 0.7 (0.3, 1.3) | 0.6 (0.2, 1.5) | 39 (60.9) | 0.8 (0.5, 1.4) | 0.7 (0.3, 1.4) |
| Other | 78 | 55 (70.5) | 1.4 (0.8, 2.4) | 1.0 (0.5, 2.3) | 66 (84.6) | 0.9 (0.4, 1.8) | 0.8 (0.3, 2.5) | 59 (75.6) | 1.4 (0.8, 2.4) | 1.9 (0.8, 4.6) |
| Missing | 15 | 9 (60.0) | | | 13 (86.7) | | | 9 (60.0) | | |
| Education | | | | | | | | | | |
| Postgraduate degree | 27 | 23 (85.2) | 4.0 (1.3, 12.0) | 2.6 (0.8, 8.7) | 25 (92.6) | 2.7 (0.6, 12.1) | 3.7 (0.5, 29.8) | 21 (77.8) | 1.6 (0.6, 4.0) | 1.5 (0.5, 5.1) |
| University degree | 85 | 67 (78.8) | 2.7 (1.5, 4.9) | 2.9 (1.4, 5.9) | 83 (97.6) | 8.4 (2.0, 35.3) | 14.4 (1.8, 117.8) | 68 (80.0) | 1.8 (1.0, 3.2) | 1.8 (0.9, 3.5) |
| College diploma or equivalent | 238 | 174 (73.1) | 2.0 (1.4, 2.8) | 2.1 (1.3, 3.4) | 210 (88.2) | 1.6 (1.0, 2.6) | 1.7 (0.9, 3.3) | 171 (71.8) | 1.2 (0.8, 1.7) | 1.4 (0.9, 2.2) |
| None | 395 | 224 (56.7) | 1.00 | 1.00 | 326 (82.5) | 1.00 | 1.00 | 269 (68.1) | 1.00 | 1.00 |
| Missing | 100 | 54 (54.0) | | | 80 (80.0) | | | 63 (63.0) | | |
| P linear trend | | | <0.001 | <0.001 | | 0.001 | 0.002 | | 0.039 | 0.08 |
| ASA status | | | | | | | | | | |
| 1 | 117 | 94 (80.3) | 2.4 (1.4, 4.2) | 2.5 (1.2, 5.3) | 105 (89.7) | 1.4 (0.7, 2.8) | 1.2 (0.5, 3.1) | 92 (78.6) | 1.7 (1.0, 2.9) | 1.5 (0.8, 3.0) |
| 2 | 469 | 299 (63.8) | 1.00 | 1.00 | 404 (86.1) | 1.00 | 1.00 | 322 (68.7) | 1.00 | 1.00 |
| 3 | 148 | 72 (48.6) | 0.6 (0.4, 0.9) | 0.6 (0.4, 1.0) | 115 (77.7) | 0.7 (0.4, 1.1) | 0.8 (0.4, 1.6) | 96 (64.9) | 0.9 (0.6, 1.4) | 1.1 (0.7, 1.9) |
| 4 | 5 | 4 (80.0) | 2.4 (0.2, 27.7) | 2.2 (0.2, 29.4) | 5 (100.0) | | | 5 (100.0) | | |
| Missing | 106 | 73 (68.9) | | | 95 (89.6) | | | 77 (72.6) | | |
| P linear trend | | | <0.001 | 0.003 | | 0.07 | 0.69 | | 0.16 | 0.85 |
| K/L grade | | | | | | | | | | |
| 0 | 6 | 4 (66.7) | 0.9 (0.2, 5.3) | | 5 (83.3) | 0.6 (0.1, 5.2) | | 5 (83.3) | 1.4 (0.2, 12.6) | |
| 1 | 4 | 1 (25.0) | 0.1 (0.0, 1.4) | | 3 (75.0) | 0.3 (0.0, 3.1) | 0.0 (0.0, 2.2) | 2 (50.0) | 0.2 (0.0, 1.6) | 0.1 (0.0, 2.7) |
| 2 | 24 | 16 (66.7) | 0.9 (0.4, 2.2) | 0.8 (0.3, 2.6) | 21 (87.5) | 0.8 (0.2, 2.8) | 1.2 (0.2, 6.1) | 16 (66.7) | 0.5 (0.2, 1.3) | 0.4 (0.2, 1.3) |
| 3 | 365 | 224 (61.4) | 0.7 (0.5, 1.0) | 0.7 (0.5, 1.1) | 304 (83.3) | 0.6 (0.4, 0.9) | 0.6 (0.4, 1.2) | 239 (65.5) | 0.5 (0.4, 0.8) | 0.5 (0.4, 0.8) |
| 4 | 348 | 234 (67.2) | 1.00 | 1.00 | 309 (88.8) | 1.00 | 1.00 | 265 (76.1) | 1.00 | 1.00 |
| Missing | 98 | 63 (64.3) | | | 82 (83.7) | | | 65 (66.3) | | |
| P linear trend | | | 0.051 | 0.25 | | 0.042 | 0.26 | | 0.004 | 0.016 |

* Values are the odds ratio (95% confidence interval) unless otherwise indicated. OMERACT = Outcome Measures in Rheumatology Clinical Trials; OARSI = Osteoarthritis Research Society International; MID = minimum important difference. See Table 1 for additional definitions.

  • Multivariable analyses controlling for age, sex, obesity, employment status, education, ASA status, and K/L grade.

  • Parameter excluded from regression model because there were too few observations to estimate it.

In general, each of the different methodologic approaches classified the same patients as having a response to THR (Figure 2). For example, all 3 methods (OMERACT-OARSI, return to normal, and MID) identified the same 487 patients (58%) as responders, and they all identified the same core group of 116 patients (14%) as nonresponders.

Figure 2. Venn diagram describing the extent to which the 3 methods identify the same patients (n = 845) as having responded to treatment. OARSI = Osteoarthritis Research Society International; MID = minimum important difference.
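The overlap summarized in the Venn diagram amounts to counting patients on whom all methods agree. A minimal sketch follows; the function name and the toy flags are hypothetical, and only the counts in the final two lines (487, 116, and 845) come from the text.

```python
def agreement(flags_by_method):
    """flags_by_method: dict of method name -> list of bool, one per patient,
    in the same patient order. Counts patients on whom all methods agree."""
    methods = list(flags_by_method)
    n = len(flags_by_method[methods[0]])
    all_respond = sum(
        all(flags_by_method[m][i] for m in methods) for i in range(n)
    )
    none_respond = sum(
        not any(flags_by_method[m][i] for m in methods) for i in range(n)
    )
    return {
        "all_respond": all_respond,
        "none_respond": none_respond,
        "disagree": n - all_respond - none_respond,
    }


# Toy example with 4 hypothetical patients.
toy = agreement({
    "OMERACT-OARSI": [True, True, False, True],
    "return to normal": [True, False, False, False],
    "MID": [True, True, False, False],
})

# Percentages reported in the text, recomputed from the stated counts.
pct_all_respond = round(487 / 845 * 100)
pct_none_respond = round(116 / 845 * 100)
```

The remaining patients, those counted under "disagree", are the ones whose classification depends on which method is chosen.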

DISCUSSION

In agreement with previous work (1), this large cohort study of patients receiving primary THR for OA in European orthopedic centers demonstrates that 12 months after surgery, there is a large average improvement in patients' pain and disability. However, the data also show that although the majority of patients received symptomatic improvement, there was an important group of ∼14–36% of patients who experienced little or no improvement after surgery. Although some patients got better, others got worse. In general, each of the different methodologic approaches that we used to dichotomize patients identified the same patients as responders or nonresponders. The OMERACT-OARSI method classified the greatest number of patients as responders (85.7%), the MID method classified 70.1%, and return to normal classified 64.1%.

The strengths of this study include the relatively large number of patients involved and the use of validated outcome measures. The other major strength was the application of a variety of different, rigorous approaches to displaying and analyzing the data on outcomes 12 months after surgery. Another strength of this study was the multi-country nature of the cohort; however, this was also a limitation due to variations in the health care systems between, for example, Northern and Eastern Europe, because patients in Hungary and Poland were likely to have had more severe preoperative pain and function (15) and receive surgery at a later stage of disease progression. There were also limitations concerning the representativeness of the cohort because it came from self-selected centers rather than a random selection of orthopedic centers, and because we do not know how many patients were excluded from the study in each center or the reasons for their exclusion.

A potential limitation of prospective study designs such as this is the impact of response bias. Comparison of the baseline characteristics of those who did and did not complete the 12-month followup showed that noncompleters had higher pain and function scores preoperatively, and were also better educated and healthier. Others have suggested that preoperative pain and function are highly predictive of postoperative status (12). Although those with more severe arthritis symptoms reported comparable improvements with those with less severe disease, they did not reach similar postoperative levels; therefore, we may have overestimated the proportion of patients responding to THR due to response bias. Alternatively, the results of this study suggest that patients who were better educated and those with low ASA scores were more likely to have symptomatic improvement following THR. Because these groups of patients were less likely to complete the followup questionnaire, response bias could lead us to have underestimated the proportion of patients responding to surgery.

As described earlier, we were unable to calculate the MCID due to the inadequacy of the anchoring question in the data set. The most widely used anchoring questions within the literature ask patients about either their perceived improvement after surgery in order to define the minimum clinically important improvement, or, alternatively, anchoring questions are used to identify Patient Acceptable Symptom State cutoffs, such as “Taking into account all the activities you have during your daily life, your level of pain, and also your functional impairment, do you consider that your current state is satisfactory?” (21–23). It has also been suggested that rather than using 1 global anchor question, separate anchors should be used (20) for each subscale (pain, stiffness, and function), because, for example, patients may be satisfied with levels of pain, but not function.

Other studies have explored the issue of whether patients responded to THR (4, 16, 27, 28). An early US study by MacWilliam et al identified predictors of outcome in 1,500 patients receiving THR (27). They found that although the mean change in pain score rose rapidly initially, it tailed off by 6 months. A similar effect was seen for function, although this was more gradual. In line with our findings, the authors found that change scores approximated a normal distribution and indicated that 16% of patients reported no change or increased pain at 6 months, and that 24% of patients reported no change or decreased physical function at 6 months. A more recent study by Nilsdotter et al followed up a group of 198 patients receiving primary THR for OA in Halmstad, Sweden for 3.6 years (4). They identified nonresponders to THR surgery in 3 ways, with the following percentages not responding: 1) patients with the lowest quartile of WOMAC scores at followup (25%), 2) patients with an absolute improvement in WOMAC score of <20/100 (22%), and 3) OARSI criteria (9%). The results showed that on average, patients improved after surgery, as measured by a doubling in WOMAC score and improved Short Form 36 (SF-36) scores for all domains except general health, yet at the individual level some patients improved, whereas others did not.
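
The first 2 nonresponder definitions described above (lowest quartile of followup score, and absolute improvement of <20/100) are simple threshold rules, and a minimal sketch may make the distinction between them concrete. The sketch assumes a 0–100 WOMAC scale on which higher scores are better (consistent with improvement being reported as a doubling in score); the patient scores, function names, and the choice of quartile interpolation method are all hypothetical and are not taken from either study. The OARSI criteria are not sketched because their thresholds are not stated here.

```python
from statistics import quantiles

def lowest_quartile_nonresponders(followup):
    """Flag patients whose followup score is at or below the first quartile.

    Assumes higher scores are better, so the lowest quartile is the worst.
    """
    q1 = quantiles(followup, n=4, method="inclusive")[0]
    return [score <= q1 for score in followup]

def small_improvement_nonresponders(baseline, followup, threshold=20):
    """Flag patients whose absolute improvement is below `threshold` points."""
    return [(post - pre) < threshold for pre, post in zip(baseline, followup)]

# Hypothetical scores for 8 patients (not study data).
baseline = [30, 40, 25, 50, 35, 45, 20, 55]
followup = [80, 45, 70, 90, 40, 85, 75, 60]

by_quartile = lowest_quartile_nonresponders(followup)
by_change = small_improvement_nonresponders(baseline, followup)

for i, (q, c) in enumerate(zip(by_quartile, by_change), start=1):
    print(f"patient {i}: lowest quartile={q}, improvement <20={c}")
```

In this made-up example the last patient is flagged by the change rule but not by the quartile rule, which illustrates why different definitions can yield different nonresponder percentages in the same cohort.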

These studies found that there was a group of patients who did not improve after surgery, and their numbers of nonresponders were similar to those found in our study. These studies also show that at followup, patients still had worse WOMAC pain and function scores and SF-36 physical function scores (3, 28) than the reference group (age- and sex-matched controls), suggesting that patients receiving THR do not return to the same levels of pain and function as those of the general population.

The current large prospective cohort study has confirmed that although on average THR is successful, there is an important minority of ∼14–36% of patients who do not respond to treatment. What this study adds is a comparison of the various methods used to display the data and to dichotomize patients into responders and nonresponders. First, the use of graphical displays of the change in WOMAC scores (Figure 1) makes it clear that there was an important subgroup of patients who got worse rather than better after surgery. The importance of dichotomizing patients to find out who these patients are was outlined in the Introduction; it gives both clinicians and patients clear information on the overall chances of improvement, and can also be used to explore the determinants of good and bad outcomes in different individuals, leading, hopefully, to more individualized information and health care. A patient making a decision about surgery can then be fully informed of their own specific risks and benefits as part of informed patient-clinician decision making.

In general, each methodologic approach classified the same groups of patients as responding to THR. Although the OMERACT-OARSI set of responder criteria was originally designed for use in clinical drug trials in patients with OA, the criteria also appear to perform well in classifying patient response 12 months post-THR. However, 14.3% of people were nonresponders to surgery using these criteria, indicating that THR does not even achieve the amount of improvement expected of a drug in these patients. Ease of administration/recording is an additional factor to consider in the choice of outcome measure, and our data suggest that the performance of the OMERACT-OARSI criteria, their relationship to other outcome measures, and their relative clinical utility support their inclusion as a patient-reported outcome measure of choice following THR. When sequential assessments using a single measure have been obtained and the OMERACT-OARSI criteria cannot be estimated, the MID derived from ordinal scales provides a convenient alternative.

The reasons for someone being a nonresponder to THR are not apparent from this study, and there are many possibilities, such as the progression of joint disease elsewhere or the development of comorbidities. However, other data referred to in the Introduction (4, 11, 12) suggest that many people are dissatisfied with their hip surgery; therefore, the operation may not achieve the expected relief of pain and disability from the hip joint in a proportion of people.

In conclusion, it is clear from this work and the other reviewed data in the literature that there is a relatively large, important group of patients who do not respond to THR. Unfortunately, neither we nor others have yet been able to identify clear predictors of response or nonresponse to help us individualize advice about surgery, and considerable further research is required to explore the determinants of outcomes of THR surgery (27). This will be the focus of further work by the EUROHIP collaboration.

AUTHOR CONTRIBUTIONS

All authors were involved in drafting the article or revising it critically for important intellectual content, and all authors approved the final version to be submitted for publication. Dr. Judge had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study conception and design. Judge, Cooper, Williams, Dreinhoefer, Dieppe.

Acquisition of data. Cooper, Williams, Dreinhoefer, Dieppe.

Analysis and interpretation of data. Judge, Cooper, Dreinhoefer, Dieppe.

Acknowledgements

The authors wish to thank the many surgical teams and individuals who contributed to the success of this project. We would also like to thank the EUROHIP advisory committee: Hermann Brenner, Maxime Dougados, Klaus Hug, and Heiner Raspe.

REFERENCES

  1. Ethgen O, Bruyere O, Richy F, Dardennes C, Reginster JY. Health-related quality of life in total hip and total knee arthroplasty: a qualitative and systematic review of the literature. J Bone Joint Surg Am 2004; 86A: 963–74.
  2. Hawker GA, Badley EM, Croxford R, Coyte PC, Glazier RH, Guan J, et al. A population-based nested case-control study of the costs of hip and knee replacement surgery. Med Care 2009; 47: 732–41.
  3. Cushnaghan J, Coggon D, Reading I, Croft P, Byng P, Cox K, et al. Long-term outcome following total hip arthroplasty: a controlled longitudinal study. Arthritis Rheum 2007; 57: 1375–80.
  4. Nilsdotter AK, Petersson IF, Roos EM, Lohmander LS. Predictors of patient relevant outcome after total hip replacement for osteoarthritis: a prospective study. Ann Rheum Dis 2003; 62: 923–30.
  5. BQS quality report. 2008. URL: http://www.bqs-qualitaetsreport.de/.
  6. National Joint Registry. National Joint Registry for England and Wales. Fifth annual report. 2008. URL: http://www.njrcentre.org.uk/njrcentre/AbouttheNJR/Publicationsandreports/Annualreports/Archivedannualreports/tabid/87/Default.aspx.
  7. United States Bone and Joint Decade, American Academy of Orthopaedic Surgeons. The burden of musculoskeletal diseases in the United States. Rosemont (IL): American Academy of Orthopaedic Surgeons; 2008.
  8. Dieppe P, Dixon D, Horwood J, Pollard B, Johnston M, and the MOBILE Research Team. MOBILE and the provision of total joint replacement. J Health Serv Res Policy 2008; 13 Suppl: 56.
  9. Allami MK, Fender D, Khaw FM, Sandher DR, Esler C, Harper WM, et al. Outcome of Charnley total hip replacement across a single health region in England: the results at ten years from a regional arthroplasty register. J Bone Joint Surg Br 2006; 88: 1293–8.
  10. Darzi L. High quality care for all: NHS next stage review final report. Norwich (UK): NHS; 2008.
  11. Williams O, Fitzpatrick R, Hajat S, Reeves BC, Stimpson A, Morris RW, et al. Mortality, morbidity, and 1-year outcomes of primary elective total hip arthroplasty. J Arthroplasty 2002; 17: 165–71.
  12. Hawker GA. Who, when, and why total joint replacement surgery? The patient's perspective. Curr Opin Rheumatol 2006; 18: 526–30.
  13. Dawson J, Fitzpatrick R, Carr A, Murray D. Questionnaire on the perceptions of patients about total hip replacement. J Bone Joint Surg Br 1996; 78: 185–90.
  14. Bellamy N. WOMAC: a 20-year experiential review of a patient-centered self-reported health status questionnaire. J Rheumatol 2002; 29: 2473–6.
  15. Dieppe P, Judge A, Williams S, Ikwueke I, Guenther KP, Floeren M, et al. Variations in the pre-operative status of patients coming to primary hip replacement for osteoarthritis in European orthopaedic centres. BMC Musculoskelet Disord 2009; 10: 19.
  16. Quintana JM, Escobar A, Bilbao A, Arostegui I, Lafuente I, Vidaurreta I. Responsiveness and clinically important differences for the WOMAC and SF-36 after hip joint replacement. Osteoarthritis Cartilage 2005; 13: 1076–83.
  17. Jacobson NS, Roberts LJ, Berns SB, McGlinchey JB. Methods for defining and determining the clinical significance of treatment effects: description, application, and alternatives. J Consult Clin Psychol 1999; 67: 300–7.
  18. Jacobson NS, Truax P. Clinical significance: a statistical approach to defining meaningful change in psychotherapy research. J Consult Clin Psychol 1991; 59: 12–9.
  19. Pham T, van der Heijde D, Altman RD, Anderson JJ, Bellamy N, Hochberg M, et al. OMERACT-OARSI initiative: Osteoarthritis Research Society International set of responder criteria for osteoarthritis clinical trials revisited. Osteoarthritis Cartilage 2004; 12: 389–99.
  20. Revicki D, Hays RD, Cella D, Sloan J. Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol 2008; 61: 102–9.
  21. Kvien TK, Heiberg T, Hagen KB. Minimal clinically important improvement/difference (MCII/MCID) and patient acceptable symptom state (PASS): what do these concepts mean? [review]. Ann Rheum Dis 2007; 66 Suppl 3: iii40–1.
  22. Tubach F, Ravaud P, Baron G, Falissard B, Logeart I, Bellamy N, et al. Evaluation of clinically relevant changes in patient reported outcomes in knee and hip osteoarthritis: the minimal clinically important improvement. Ann Rheum Dis 2005; 64: 29–33.
  23. Tubach F, Ravaud P, Baron G, Falissard B, Logeart I, Bellamy N, et al. Evaluation of clinically relevant states in patient reported outcomes in knee and hip osteoarthritis: the patient acceptable symptom state. Ann Rheum Dis 2005; 64: 34–7.
  24. Bellamy N, Buchanan WW, Goldsmith CH, Campbell J, Stitt LW. Validation study of WOMAC: a health status instrument for measuring clinically important patient relevant outcomes to antirheumatic drug therapy in patients with osteoarthritis of the hip or knee. J Rheumatol 1988; 15: 1833–40.
  25. Nilsdotter AK, Roos EM, Westerlund JP, Roos HP, Lohmander LS. Comparative responsiveness of measures of pain and function after total hip replacement. Arthritis Rheum 2001; 45: 258–62.
  26. McConnell S, Kolopack P, Davis AM. The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC): a review of its utility and measurement properties. Arthritis Rheum 2001; 45: 453–61.
  27. MacWilliam CH, Yood MU, Verner JJ, McCarthy BD, Ward RE. Patient-related risk factors that predict poor outcome after total hip replacement. Health Serv Res 1996; 31: 623–38.
  28. Busija L, Osborne RH, Nilsdotter A, Buchbinder R, Roos EM. Magnitude and meaningfulness of change in SF-36 scores in four types of orthopedic surgery. Health Qual Life Outcomes 2008; 6: 55.

Appendix A

MEMBERS OF THE EUROHIP STUDY GROUP

The members of the EUROHIP study group are: Martin Krismer, Bernd Stoeckl (University of Orthopedic Surgery, Innsbruck, Austria); Karl Knahr, Oswald Pinggera (Orthopedic Spital Wien–Speising, Austria); Pekka Ylinen (Orton Orthopaedic Hospital, Helsinki, Finland); Moussa Hammadouche (Groupe Hospitalier Cochin, Paris, France); Christian Delaunay (Clinique De L'yette, Longjumeau, France); Philippe Chiron (Centre Hospitalier Ranguell, Toulouse, France); Wolfhart Puhl, Karsten Dreinhoefer, Markus Floeren, Sabrina Baumann, Dagmar Groeber-Graetz (University of Ulm [RKU], Ulm, Germany); Klaus-Peter Guenther, Stefan Fickert (Carl-Gustav Carus University, Dresden, Germany); Joachim Löhr, Alexander Katzer, Dietrich Klüber (ENDOClinic, Hamburg, Germany); Volker Ewerbeck, Peter Aldinger, Dominik Parsch (University of Heidelberg, Heidelberg, Germany); Wolfram Neumann, Ingmar Meinecke, Thomas Bittner (Otto von Guericke University, Magdeburg, Germany); Wilfried von Eiff, Conrad Middendorf (Center for Hospital Management [CKM], Munster, Germany); Hans-Peter Scharf, Peter Schraeder, Sabine Schmitt (University Clinic Mannheim, Mannheim, Germany); David Rowley (Ninewells Hospital and Medical School, Dundee, UK); Ian Learmonth (Avon Orthopaedic Centre, Bristol, UK); Paul Dieppe, Victoria Cavendish, Susan Williams (HSRC, University of Bristol, Bristol, UK); Peter Kellermann, Ildiko Fistzer (University of Szeged, Szeged, Hungary); Thorvaldur Ingvarsson (Akureyri University Hospital, Iceland); Paolo Gallinaro, Alessandro Masse (Universita degli Studi di Torino, Torino, Italy); Andrzej Gorecki, Maciek Ambroziak (Medical University of Warsaw, Warsaw, Poland); Eduardo Garcia-Cimbrelo (Hospital La Paz, Madrid, Spain); Anna Nilsdotter, Urban Benger (Helsingborg Hospital, Skane, Sweden); Christian Hellerfelt, Christer Olson (Lasarett Karlshamm, Sweden); Joerg Huber, Ivan Broger (Kantonalspital Aarau, Switzerland); Robert Theiler, Kurt Uehlinger, Angela Hett (Stadtspital Triemli, Zurich, Switzerland); and Til Stuermer (Harvard Medical School, Boston, Massachusetts).

Supporting Information

Additional Supporting Information may be found in the online version of this article.

Filename: ACR_20038_sm_suppappendix.doc
Size: 25K
Description: SUPPLEMENTAL APPENDIX A: THE RELATIVE CHANGE INDEX
