Long-term outcome following total hip arthroplasty: A controlled longitudinal study




To assess long-term outcome and predictors of prognosis following total hip arthroplasty (THA) for osteoarthritis (OA).


We studied 282 patients from 2 English health districts ∼8 years after THA, along with 295 controls selected from the general population. Baseline data were collected by interview and examination, on sex, age, comorbidity, body mass index (BMI), and Short Form 36 (SF-36) functional status, and preoperative radiographic severity of OA was graded. Functional status was reassessed at followup by postal questionnaire. Predictors of change in physical functioning were analyzed by linear regression.


Over followup, cases who had THA reported a median improvement of 10 points in SF-36 score for physical functioning, whereas in controls there was a median deterioration of 10 points (P < 0.0001). Mental health improved by a median of 12 points in both cases and controls. Change in physical functioning was significantly worse in women and at older ages among both cases and controls. In cases, Croft grade 5 OA was associated with a physical functioning score improvement 19.4 points (95% confidence interval 7.7, 31.2) greater than the improvement in grades 0–3, but BMI was unrelated to change in physical functioning.


Improvements in physical functioning following THA for OA are sustained in the long term and are greater in patients with more severe radiographic features preoperatively. We found no indication that patients who are overweight benefit less from THA, but further evidence is needed on the prognostic influence of more severe obesity.


Osteoarthritis (OA) of the hip is a major cause of pain and disability, and a serious public health problem (1). Total hip arthroplasty (THA) is a demonstrably effective treatment in the short term, but most studies have followed patients for only 6–12 months after surgery, and evidence on longer-term outcomes is less extensive (2). In the longest followup study of total hip replacement in OA that we know of, 219 patients were reviewed an average of 3.6 years after surgery (3). At this time point, patients were found to have significantly worse physical functioning when compared with controls; older age at surgery and preoperative pain appeared to predict a worse physical outcome. This study is in accord with others in suggesting that the greatest benefit of surgery on quality of life relates to pain and physical functioning, but that improvements may also be found in social functioning and mental health (2–13). The treatment appears to be effective at all ages (2, 5, 13–15), with greater benefits in patients with poorer preoperative health-related quality of life (2, 4, 16). Less is known, however, about other possible predictors of outcome, such as obesity (a known risk factor for hip OA) and preoperative radiographic severity. Further information on determinants of long-term prognosis might help to refine decisions on selection and priority of patients for surgery.

To find out more about long-term outcome after THA, we followed up participants from an earlier case–control study of hip OA after an interval of ∼8 years.


The starting point for our study was a set of 643 matched pairs of cases and controls who had participated in an earlier study (17). The case group comprised residents of 2 English health districts (Portsmouth and North Staffordshire), age ≥45 years, who because of primary hip OA were placed on the waiting list for THA over an 18-month period from 1993 to 1995. The controls were selected from the general population, and were matched to the cases for age, sex, and general practitioner.

At the time of the case–control study (baseline), cases and controls were interviewed using a structured questionnaire, which among other things collected information about sex; age; smoking habits; previous injury to the hips leading to consultation with a doctor; pain (on most days of the past month) in the hands, shoulders, and knees; and comorbidity from medically diagnosed diabetes, hypertension, and thyroid disease. It also included sections of the Short Form 36 (SF-36) questionnaire (18) relating to physical functioning, vitality, and mental health. In addition, the interviewer measured the subjects' height and weight (from which body mass index [BMI] was calculated) and examined the subjects' hands for the presence of Heberden's nodes. A single trained observer examined all subjects to minimize any variability in the scoring of Heberden's nodes. Each interphalangeal joint was clinically graded according to the presence and severity of bony swelling (0 = no bony swelling; 1 = possible bony swelling; 2 = definite bony swelling, not severe; 3 = severe bony swelling, no deformity; 4 = severe bony swelling with deformity). This method has previously been shown to have high reproducibility and to have strong concordance with radiographic evaluation of the hand joints (19). Radiographs of the hips of those that were listed for surgery were scored for severity of OA according to the Croft grading system (20). This assessment was made by a trained research assistant. Intraobserver concordance for Croft grade was tested in a subset of 50 radiographs and gave a kappa coefficient of 0.63.
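As an illustration of how an agreement statistic of this kind is computed, the sketch below implements the unweighted Cohen's kappa for two readings of the same radiographs. The source does not say whether the reported kappa was weighted or unweighted, so the unweighted form is an assumption, and the grades in `reading_1` and `reading_2` are hypothetical values made up for the example.

```python
from collections import Counter


def cohens_kappa(first, second):
    """Unweighted Cohen's kappa for two equally long sequences of grades."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(first)
    # Observed agreement: proportion of items graded identically both times.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement, from the marginal frequency of each grade.
    freq_first = Counter(first)
    freq_second = Counter(second)
    expected = sum(freq_first[g] * freq_second[g] for g in freq_first) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical repeat readings of 10 radiographs (not data from the study).
reading_1 = [4, 4, 5, 3, 4, 5, 4, 2, 4, 5]
reading_2 = [4, 4, 5, 4, 4, 4, 4, 2, 4, 5]
kappa = cohens_kappa(reading_1, reading_2)
print(round(kappa, 2))
```

Here the two readings agree on 8 of 10 radiographs, and chance agreement from the marginal frequencies is 0.42, so kappa is (0.80 − 0.42) / (1 − 0.42).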

During 2001–2004, we wrote to all of the original study group subjects who could be contacted, asking them to complete a self-administered questionnaire that sought confirmation of any hip replacement surgery (along with the year[s] in which this surgery had been carried out), and again included sections of the SF-36 questionnaire concerning physical functioning, vitality, and mental health. Where they could be obtained, we also reviewed the hospital records of the cases to abstract the details of their hip surgeries, including the type of prosthesis used and whether there had been any subsequent revision. Information regarding those participants who were lost to followup included demographic details, current address, and the name of the participant's general practitioner.

Statistical analysis was carried out with Stata version 9.2 software (Stata Corporation, College Station, TX). The SF-36 scores of cases and controls were compared using the Mann-Whitney U test. Baseline predictors of change in functional status over followup were explored by linear regression.
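For readers unfamiliar with the Mann-Whitney U test, the following is a minimal sketch of the statistic using midranks for ties and a tie-corrected normal approximation without continuity correction. It is not a reproduction of the Stata routine used in the study, and the change scores in `cases` and `controls` are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided P) from a normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = sorted(list(x) + list(y))
    ranks = {}       # value -> midrank
    tie_term = 0.0   # sum of t^3 - t over groups of tied values
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of 1-based ranks i+1..j
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(ranks[v] for v in x)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        raise ValueError("all observations are tied; the test is undefined")
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided


# Hypothetical changes in physical function score (not data from the study).
cases = [10, 30, -5, 20, 15, 25, 0, 35]
controls = [-10, 0, -30, -20, 5, -15, -25, -5]
u, p = mann_whitney_u(cases, controls)
print(u, p)
```

U counts the case–control pairs in which the case has the larger value, with tied pairs counted as one half, so a value near the maximum of n1 × n2 indicates that cases tend to score higher than controls.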


From the 643 matched pairs who were recruited at baseline, 333 (52%) of the cases and 305 (47%) of the controls were successfully contacted at followup. Among the 648 nonresponders, 260 (136 cases and 124 controls) had died, 254 (113 cases and 141 controls) had moved, and 134 (61 cases and 73 controls) declined to participate. Of the 333 cases followed up, 48 were excluded because their baseline interview was completed after their THA, as were an additional 3 because they had not undergone THA, leaving 282 cases for analysis. We also excluded from analysis 10 initial controls who reported receiving a THA during the followup period, leaving 295 controls for analysis. These 51 cases and 10 controls were no different from those included in the analysis, and removing them did not make any substantive difference to the findings of the study. The distribution of the original study subjects, those who were followed up, and those who were not according to various baseline characteristics is summarized in Table 1. Cases and controls who died during the followup period were statistically significantly older than cases and controls in the other categories. No other differences between the groups were observed.
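The counts in this paragraph can be reconciled with a few lines of arithmetic. The sketch below uses only the figures reported in the text; the variable and function names are ours.

```python
# Counts reported in the text (643 matched pairs recruited at baseline).
BASELINE = 643
contacted = {"cases": 333, "controls": 305}
lost = {
    "cases": {"died": 136, "moved": 113, "declined": 61},
    "controls": {"died": 124, "moved": 141, "declined": 73},
}
# Exclusions after contact: 48 cases interviewed after THA plus 3 without THA,
# and 10 controls who received a THA during followup.
excluded = {"cases": 48 + 3, "controls": 10}


def reconcile(group):
    """Return (nonresponders, analysed) after checking the totals add up."""
    nonresponders = sum(lost[group].values())
    if contacted[group] + nonresponders != BASELINE:
        raise ValueError(f"counts for {group} do not sum to baseline")
    analysed = contacted[group] - excluded[group]
    return nonresponders, analysed


summary = {group: reconcile(group) for group in contacted}
total_nonresponders = sum(n for n, _ in summary.values())
response_pct = {g: round(100 * contacted[g] / BASELINE) for g in contacted}
print(summary, total_nonresponders, response_pct)
```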

Table 1. Distribution of cases and controls and completeness of followup according to baseline characteristics*

Values are no. (%).

| Baseline characteristic | Cases: baseline (n = 643) | Cases: followup (n = 282) | Cases: no response (n = 61) | Cases: died (n = 136) | Cases: moved (n = 113) | Controls: baseline (n = 643) | Controls: followup (n = 295) | Controls: no response (n = 73) | Controls: died (n = 124) | Controls: moved (n = 141) |
|---|---|---|---|---|---|---|---|---|---|---|
| Sex | | | | | | | | | | |
| Male | 219 (34) | 98 (35) | 16 (26) | 55 (40) | 34 (30) | 219 (34) | 103 (35) | 28 (38) | 38 (31) | 47 (33) |
| Female | 424 (66) | 184 (65) | 45 (74) | 81 (60) | 79 (70) | 424 (66) | 192 (65) | 45 (62) | 86 (69) | 94 (67) |
| Age, years | | | | | | | | | | |
| <66.88 | 223 (35) | 121 (43) | 22 (36) | 20 (15) | 37 (33) | 216 (34) | 140 (47) | 28 (38) | 5 (4) | 40 (28) |
| ≥66.88 | 420 (65) | 161 (57) | 39 (64) | 116 (85) | 76 (67) | 427 (66) | 155 (53) | 45 (62) | 119 (96) | 101 (72) |
| BMI, kg/m²† | | | | | | | | | | |
| ≤24.5 | 192 (30) | 76 (27) | 21 (34) | 41 (31) | 32 (29) | 242 (38) | 103 (35) | 24 (33) | 52 (44) | 58 (42) |
| 24.6–28.0 | 214 (34) | 103 (37) | 17 (28) | 48 (36) | 31 (28) | 214 (34) | 105 (36) | 26 (36) | 36 (30) | 45 (33) |
| >28 | 228 (36) | 102 (36) | 23 (38) | 43 (33) | 47 (43) | 175 (28) | 83 (29) | 23 (32) | 31 (26) | 35 (25) |
| Radiographic grade‡ | | | | | | | | | | |
| 0–3 | 101 (16) | 43 (16) | 13 (22) | 21 (16) | 17 (15) | | | | | |
| 4 | 404 (65) | 185 (68) | 34 (58) | 80 (61) | 75 (68) | | | | | |
| 5 | 120 (19) | 46 (17) | 12 (20) | 31 (24) | 19 (17) | | | | | |

* BMI = body mass index. A total of 51 cases and 10 controls were excluded at followup. See text for details.
† BMI information was missing for 9 cases and 12 controls.
‡ Radiographic grade of hip listed for surgery. Information was missing for 18 cases.

Of the 282 cases included in the analysis, most (>85%) underwent THA within 2 years of the baseline interview, but in 13 cases the interval was longer (maximum 9 years). Followup of the case group was a mean interval of 8.5 years (median 8.4 years, range 7.1–9.9 years) from the baseline interview, and an average of ∼8 years after THA. The followup case group included 98 men and 184 women, with a mean age at followup of 76.3 years. The 295 controls (103 men and 192 women) were followed up at a mean of 8.1 years after baseline (median 8.2 years, range 6.5–9.7 years).

Hospital records were accessible for 177 (63%) of the cases. The prostheses used most frequently were the Charnley and the Muller-Furlong in 53 and 37 cases, respectively. Twelve arthroplasties were documented as requiring subsequent revision.

The SF-36 scores of cases are compared with those of controls at baseline and at followup for physical functioning, vitality, and mental health (Table 2). At baseline, the case group reported markedly worse physical functioning than the controls, whereas differences in vitality and mental health were small. By the time of followup, the physical functioning of the cases had improved while that of the controls had deteriorated, the net effect being a narrowing of the gap in scores between the 2 groups. The difference in the cases' change from baseline as compared with the controls' was highly statistically significant. In contrast, vitality deteriorated significantly more during followup in the case group than in the control group, and mental health improved to a similar extent in both groups.

Table 2. Median SF-36 scores in cases and controls at baseline and followup*

| SF-36 measure† | Cases, no. | Cases, median score (IQR) | Controls, no. | Controls, median score (IQR) | P‡ |
|---|---|---|---|---|---|
| Physical function | | | | | |
| Baseline (all subjects) | 592 | 15 (5, 30) | 633 | 75 (40, 90) | < 0.0001 |
| Baseline (subjects with followup) | 260 | 20 (5, 35) | 281 | 85 (55, 95) | < 0.0001 |
| Followup | 260 | 30 (10, 6) | 281 | 65 (28, 86) | < 0.0001 |
| Change from baseline in subjects who were followed up | 260 | 10 (−5, 30) | 281 | −10 (−28, 0) | < 0.0001 |
| Vitality | | | | | |
| Baseline (all subjects) | 592 | 60 (55, 70) | 633 | 60 (55, 65) | 0.56 |
| Baseline (subjects with followup) | 274 | 60 (55, 70) | 292 | 60 (55, 65) | 0.15 |
| Followup | 274 | 50 (35, 65) | 292 | 60 (45, 75) | < 0.0001 |
| Change from baseline in subjects who were followed up | 274 | −10 (−30, 5) | 292 | 0 (−20, 20) | < 0.0001 |
| Mental health | | | | | |
| Baseline (all subjects) | 592 | 64 (56, 68) | 633 | 64 (60, 68) | 0.012 |
| Baseline (subjects with followup) | 271 | 64 (60, 68) | 291 | 68 (60, 68) | 0.23 |
| Followup | 271 | 76 (64, 88) | 291 | 80 (64, 88) | 0.25 |
| Change from baseline in subjects who were followed up | 271 | 12 (−4, 24) | 291 | 12 (0, 24) | 0.86 |

* SF-36 = Short Form 36; IQR = interquartile range.
† Higher scores indicate better health. Data were incomplete in a few subjects.
‡ Difference between cases and controls, from Mann-Whitney U test.

Predictors of change in outcome over followup in cases and controls were analyzed in separate models because of the difference in their functional outcome (cases improved whereas controls declined), and to allow for the exploration of possible predictors of change in cases that were not relevant to controls (notably, radiographic grade). The characteristics at baseline that predicted change in the SF-36 physical function score of the controls are shown in Table 3. The effect estimates were derived from a linear regression model, which incorporated the baseline SF-36 physical function score in addition to all other risk factors. Better baseline physical functioning, female sex, older age, a BMI >28 kg/m², current smoking, hypertension, and diabetes were all associated with a greater decline in the physical function score during followup, and the associations were statistically significant for baseline physical functioning, female sex, older age, and hypertension.
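As a sketch of the model just described (the notation is ours and does not appear in the source), the change in physical function score for subject i can be written as a linear function of the baseline score, scaled to units of 10 points, and of indicator variables for the other baseline characteristics:

```latex
\Delta\mathrm{PF}_i
  = \mathrm{PF}_i^{\text{followup}} - \mathrm{PF}_i^{\text{baseline}}
  = \beta_0
    + \beta_1\,\frac{\mathrm{PF}_i^{\text{baseline}}}{10}
    + \sum_{k=2}^{K} \beta_k\, x_{ik}
    + \varepsilon_i
```

Under this formulation, \beta_1 is the mean change per 10 units of baseline score, and each \beta_k (k ≥ 2) is the mean difference in change between a category x_{ik} = 1 and its reference category, adjusted for all other terms in the model.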

Table 3. Predictors of change over followup in SF-36 physical function score of controls*

| Baseline characteristic | No. subjects | Mean change (95% CI)† |
|---|---|---|
| SF-36 physical function score‡ | 278 | −3.5 (−4.6, −2.4) |
| Sex | | |
| Female | 178 | −9.0 (−15.4, −2.7) |
| Age, years | | |
| ≥66.88 | 142 | −8.9 (−14.8, −2.9) |
| BMI, kg/m² | | |
| 24.6–28.0 | 98 | 1.7 (−5.2, 8.5) |
| >28 | 82 | −7.0 (−14.3, 0.3) |
| Smoking habit | | |
| Never smoked | 132 | Reference |
| Ex-smoker | 118 | −0.3 (−6.4, 5.8) |
| Current smoker | 28 | −4.9 (−14.5, 4.7) |
| Hypertension | | |
| Present | 81 | −9.9 (−16.2, −3.5) |
| Diabetes | | |
| Present | 9 | −4.6 (−20.2, 11.0) |
| Thyroid disease | | |
| Present | 15 | 3.1 (−9.2, 15.3) |

* SF-36 = Short Form 36; 95% CI = 95% confidence interval; BMI = body mass index. All estimates were derived from a single model and thus are mutually adjusted.
† Change in SF-36 physical function score relative to reference level of risk factor.
‡ Change per 10 units of score at baseline.

The results of similar analyses for the case group, which used a regression model that also incorporated various features of the cases' OA as assessed at baseline, are shown in Table 4. Similar to the controls, better baseline physical functioning, female sex, and older age were associated with a significantly worse change in physical functioning over followup. In addition, changes in physical functioning were significantly worse in cases with diabetes and in cases who had other painful joint sites at baseline. However, there was no adverse impact of having a higher BMI in the case group. After allowance for other variables, higher radiographic grade was a strong predictor of improvement in physical functioning. Thus, on average, grade 5 OA was associated with a physical function score increase that was 19.4 points (95% confidence interval 7.7, 31.2) greater than the increase in cases with OA grades 0–3. This was almost double the median improvement of 10 points in all cases over the followup period.

Table 4. Predictors of change over followup in SF-36 physical function score of cases*

| Baseline characteristic | No. subjects | Mean change (95% CI)† |
|---|---|---|
| SF-36 physical function score‡ | 249 | −4.8 (−6.5, −3.1) |
| Sex | | |
| Female | 160 | −9.8 (−17.1, −2.4) |
| Age, years | | |
| ≥66.88 | 136 | −10.1 (−16.9, −3.4) |
| BMI, kg/m² | | |
| 24.6–28.0 | 92 | 0.4 (−7.8, 8.7) |
| >28 | 92 | 0.5 (−7.8, 8.7) |
| Smoking habit | | |
| Never smoked | 117 | Reference |
| Ex-smoker | 107 | −0.1 (−7.0, 6.7) |
| Current smoker | 25 | −7.7 (−18.7, 3.3) |
| Hypertension | | |
| Present | 73 | 0.5 (−6.5, 7.6) |
| Diabetes | | |
| Present | 8 | −25.8 (−44.9, −6.8) |
| Thyroid disease | | |
| Present | 19 | 11.5 (−1.1, 24.1) |
| Radiographic grade of hip listed for surgery | | |
| 4 | 174 | 8.3 (−0.9, 17.4) |
| 5 | 37 | 19.4 (7.7, 31.2) |
| Previous injury to hip listed for surgery | | |
| Yes | 18 | −10.3 (−23.0, 2.5) |
| Heberden's nodes | | |
| Present | 157 | −0.7 (−7.4, 6.1) |
| No. painful joint sites in hands, shoulders, knees | | |
| 1 | 74 | −9.6 (−19.2, 0.04) |
| 2 | 57 | −13.7 (−24.0, −3.4) |
| 3 | 40 | −15.9 (−26.9, −4.9) |
| 4 | 24 | −18.4 (−31.4, −5.3) |
| 5 | 8 | −23.8 (−43.2, −4.4) |
| 6 | 4 | −22.1 (−49.1, 4.8) |

* SF-36 = Short Form 36; 95% CI = 95% confidence interval; BMI = body mass index. All estimates were derived from a single model and thus are mutually adjusted.
† Change in SF-36 physical function score relative to reference level of risk factor.
‡ Change per 10 units of score at baseline.


Our findings are consistent with a sustained beneficial impact on physical functioning following THA for OA, but we found no evidence for parallel improvement in vitality or mental health. Several of the factors predicting worse changes in physical functioning over followup were the same in cases and controls; in particular, better baseline physical functioning, female sex, and older age. In addition, changes in physical functioning were worse in cases who reported pain at other joint sites prior to surgery, and were markedly better in those with the highest preoperative radiographic grades of OA. BMI was not a predictor of outcome.

The study design that we used had several limitations. It was an observational investigation rather than a randomized controlled trial, and as such was liable to potentially confounding effects. In particular, it cannot necessarily be assumed that had the cases not undergone surgery, the cases' health-related quality of life would have been similar to that of the controls. In addition, information about quality of life was collected by interview at baseline, but by self-administered questionnaire at followup; this difference may to some extent have compromised the comparison of findings from the 2 time points. However, Picavet (21) has shown that the differences in response to postal health surveys versus home interviews are small. Also, these shortcomings are of less concern in relation to the identification of prognostic indicators within the case group and the assessment of the indicators' relative importance.

Another weakness was the incomplete followup of cases and controls. This was principally because of death and migration. Death rates may have been higher in subjects with worse physical functioning, whereas moving to a new address could be associated with either better or worse functioning (poorer functioning might militate against moving in some people, while leading others to enter care homes). In both cases and controls, subjects who died were statistically significantly older, but there were no differences in sex, BMI, or radiographic grade between those who were followed up and those who were not. Proportionately, there were rather more deaths and fewer changes of address among the cases who were lost to followup (22.0% versus 19.3%) than among the controls who were lost (19.0% versus 21.9%). This is a further reason for caution when comparing outcomes between the 2 groups. The frequent absence of information about surgical procedures was also limiting; we were not able to take account of the surgeon or type of prosthesis when examining indicators of prognosis. Information about the cases' disease states was limited by the use of generic measures instead of disease-specific ones. However, the measures used were applicable to both cases and controls, and thus allowed for some comparison between the 2 groups.

Against these weaknesses, however, must be set the strengths of our long followup interval and the size of our study sample, which gave good statistical power despite attritional losses. The followup interval ranged from 6.5 to 9.9 years, and was not itself found to be a predictor of outcome in either cases or controls. It was also an advantage that we collected parallel data on controls, which enabled us to assess whether prognostic indicators were specific to cases undergoing THA or whether they could be applied more generally. The original study had a matched-pair case–control design. For the main analysis presented, the case–control matching was not taken into account. However, a repeat analysis based on matched pairs with complete followup data alone (118 such pairs) gave very similar results to those presented here, albeit with less statistical power.

Even when allowance is made for possible confounding effects, the long-term improvement in the physical functioning of the cases is striking when set against the decline that occurred in controls. This observation is consistent with findings from other studies (2–13) and suggests that the benefits of THA are often substantial and sustained. In contrast, mental health improved less over followup, and no more in cases than in controls. Some earlier investigations have suggested that THA benefits mental health as well as physical health (3–5, 7, 8, 10–12). Perhaps the difference between these studies and our own lies in the mental health of our cases prior to surgery. It is notable that in our investigation the mental health of the case group at baseline was no different from that of the control group, despite the cases' far greater physical limitations (22). The vitality of the cases also differed little from that of the controls at baseline, but it tended to deteriorate more in the cases over followup, again in contrast to the improvement in physical functioning. The explanation for this finding remains unclear and requires further research.

Factors in the case group that predicted decline in physical functioning over followup included better baseline physical functioning, female sex, and older age. The relation to baseline physical functioning may simply reflect a smaller scope for improvement and greater opportunity for decline in those who started with a higher score. In the same way, controls with higher baseline scores still tended to have higher scores at followup, although to a lesser degree. These effects may also reflect an element of regression to the mean, although this has not been formally evaluated. Distinguishing between this statistical phenomenon and a true shift in outcome is complex; however, it seems unlikely that the size of the effects seen in this study is due solely to regression to the mean.
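To illustrate how regression to the mean alone can produce a negative association between baseline score and subsequent change, the sketch below simulates subjects whose true score does not change at all and whose two measurements differ only by independent error. All parameters (a true-score mean of 50, SD of 15, and measurement-error SD of 10) are arbitrary assumptions chosen for illustration, the 0–100 bounds of the SF-36 scale are ignored, and nothing here is estimated from the study data.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_slope(n=5000, true_sd=15.0, noise_sd=10.0, seed=0):
    """Slope of change on baseline when the true score never changes."""
    rng = random.Random(seed)
    baseline, change = [], []
    for _ in range(n):
        true_score = rng.gauss(50.0, true_sd)
        first = true_score + rng.gauss(0.0, noise_sd)   # baseline measurement
        second = true_score + rng.gauss(0.0, noise_sd)  # followup measurement
        baseline.append(first)
        change.append(second - first)
    return ols_slope(baseline, change)


slope = simulate_slope()
# Expected slope under these assumptions: -noise_var / (true_var + noise_var).
theoretical = -(10.0 ** 2) / (15.0 ** 2 + 10.0 ** 2)
print(round(slope, 3), round(theoretical, 3))
```

Under these assumptions the slope is negative even though no subject truly improves or declines, which is why a negative coefficient for baseline score cannot by itself be read as a real effect. The size of the artifact depends entirely on the assumed error variance.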

The greater improvement we found in male versus female THA cases accords with the observations of Rissanen et al (9). However, the fact that we found a similar differential in our control group suggests that it may not reflect a greater benefit from surgery for men. Smaller improvements in physical functioning after THA at older as compared with younger ages have also been reported previously (14, 15), but again the fact that older age was also associated with poorer outcome in controls suggests that the effect may not indicate differences in response to surgery. Overall, there seems to be good evidence that older cases often derive worthwhile benefit from THA (2, 5, 13).

The other baseline variables that predicted change in our cases' physical functioning were reports of pain in joints other than the hip and the radiographic severity of OA in the hip listed for surgery. It is likely that in many cases, the pain in other joints resulted from more generalized OA, and it is perhaps not surprising that such individuals had a worse prognosis.

The strong relation of outcome to radiographic severity was apparent even after adjustment for baseline physical functioning. Two other studies have explored outcomes of THA in relation to radiographic changes. In the US, Meding and colleagues followed 1,015 cases for up to 7 years following THA. There was no significant association between preoperative severity of OA and the Harris Hip Score at any stage during followup (23). However, no analyses were reported with change in hip functioning as the outcome variable. In Sweden, Nilsdotter et al assessed THA cases before and 1 year after their surgery (4). There was no difference in physical functioning at followup according to the preoperative radiographic grade of OA, but, as in our study, the improvement in functioning from baseline was greatest in cases with the most severe radiologic features. Nilsdotter et al (3) reported another case–control study with a mean followup of 3.6 years. Their findings were similar to ours in that cases had worse functioning than controls at followup and that older age predicted a worse outcome.

In contrast to preoperative radiographic grade, the BMI of cases showed no association with long-term change in physical functioning. This accords with studies done by Chan and Villar (24) and by Stickles et al (11) that found that improvements in quality of life after THA were similar in obese and nonobese patients. It could be that surgeons are judicious in their selection of overweight and obese patients for surgery, but these findings suggest that higher BMI, at least in the range up to 30 kg/m², should not be a contraindication to THA provided that the patient is sufficiently fit to undergo the short-term rigors of surgery. Our data do not allow any firm conclusions about the impact on outcome of more marked obesity because only 36 of the patients who were successfully followed up had a BMI >30 kg/m².

In conclusion, THA is well established as a treatment for advanced hip OA. Our findings add to the accumulating evidence that the benefits for physical functioning are sustained in the longer term, and they suggest that those benefits are greatest in the patients who have the most severe radiographic changes of OA before surgery. There may, therefore, be a case for giving higher priority to patients with severe radiographic changes. We have found no indication that patients who are overweight benefit less from THA, but further evidence is needed on the prognostic influence of more severe obesity.


Dr. Cooper had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study design. Cushnaghan, Coggon, Croft, Dieppe, Cooper.

Acquisition of data. Cushnaghan, Coggon, Byng, Cox, Cooper.

Analysis and interpretation of data. Cushnaghan, Coggon, Reading, Croft, Cox, Dieppe, Cooper.

Manuscript preparation. Cushnaghan, Coggon, Reading, Dieppe, Cooper.

Statistical analysis. Cushnaghan, Reading, Cooper.


We would like to thank Mrs. Gill Strange for preparing the manuscript.