To evaluate whether hospital volume and surgeon volume of total hip replacements (THRs) are associated with patient-reported functional status and satisfaction with surgery 3 years postoperatively.
We performed a population-based cohort study of a stratified random sample of Medicare beneficiaries who underwent elective primary or revision THR in Ohio, Pennsylvania, or Colorado in 1995. The primary outcomes were the self-reported Harris hip score and a validated scale measuring satisfaction with the results of surgery. Both outcomes were assessed 3 years postoperatively. Hospital volume was defined as the aggregate number of elective primary and revision THRs performed on Medicare beneficiaries in the hospital in 1995. High-volume hospitals were defined as those in which >100 such procedures are performed annually, and low-volume centers were defined as those in which ≤12 procedures (primary THR cohort) or ≤30 procedures (revision cohort) are performed annually.
In unadjusted analyses, patients who underwent surgery in low-volume centers had worse functional status 3 years following primary and revision THR compared with patients whose surgery was performed in higher-volume centers. Patients whose revision THR was performed by a low-volume surgeon also had worse function. After adjustment for sociodemographic and clinical variables, however, the association between higher hospital volume and better functional status following primary THR was weak and statistically nonsignificant, and no statistically significant or clinically important associations between hospital or surgeon volume and functional status following revision THR were observed. Patients who underwent elective primary THR in low-volume centers were more likely to be dissatisfied with the results of surgery compared with patients whose surgeries were performed in high-volume centers. Similarly, patients whose surgeons performed ≤12 procedures per year were more likely to be dissatisfied with the results of revision THR than were patients whose surgeons performed >12 procedures per year.
Hospital volume and surgeon volume have little effect on 3-year functional outcome following THR, after adjusting for patient sociodemographic and select clinical characteristics. However, satisfaction with primary THR is greater among patients who underwent surgery in high-volume centers, and satisfaction with revisions is greater among patients whose operations were performed by higher-volume surgeons. Referring clinicians should incorporate these findings into their discussion of referral choices with patients considering THR. Conclusions regarding the effect of volume on longevity of the implants must await longer-term followup studies. Finally, further research is warranted to better understand the association between hospital and surgeon procedure volume and patient satisfaction with surgery.
For a wide range of surgical procedures, higher hospital and surgeon volumes frequently are associated with lower rates of perioperative mortality and complications (1–10). In particular, the volume of elective primary and revision total hip replacements (THRs) performed in hospitals (hospital volume) and by surgeons (surgeon volume) appears to be inversely related to rates of mortality and perioperative complications (6–10). Most research on the effects of provider volume on surgical outcome has focused on mortality and complications occurring soon after the procedure (1–10). There has been little investigation of whether the potential advantages of having surgery in a high-volume center persist beyond the perioperative period. To our knowledge, there are no population-based studies of associations between hospital and surgeon volume and patients' functional status and satisfaction following surgery of any type.
Longer-term results are particularly germane when short-term problems are rare, as is the case for THR. Mortality in the first 90 days following THR is ∼1% (7). Small differences in the risk of death between high-volume and low-volume centers may be less important to some patients than the convenience of having surgery performed in a low-volume center (11). Because reduced pain and improved function are major reasons prompting patients to undergo lower-extremity joint arthroplasty (12, 13), differences in the extent of symptom relief and functional improvement may provide patients with a compelling rationale for selecting a particular hospital or surgeon. The goal of the analyses presented in this report was to determine whether hospital volume or surgeon volume influences functional status and satisfaction, as measured 3 years after primary and revision THR in a population-based cohort.
As described previously (7), we used Medicare claims data to identify all patients in the US Medicare population, ages 65 years or older, who underwent elective primary or revision THR during the calendar year 1995. To focus on elective procedures, we excluded patients with infection of the hip, metastatic cancer or bone cancer, conversion of hemiarthroplasty (or other hip surgery) to THR, and (for primary THR) fracture of the hip or femur. The positive predictive values of the algorithms used to identify primary and revision THR were 0.99 and 0.92, respectively (7).
Different sampling procedures were used to select primary THR and revision THR cohorts. For the primary THR cohort, we selected a stratified random sample of Medicare beneficiaries over age 65 years who had the procedure performed in a hospital in Ohio, Pennsylvania, or Colorado. Use of these states provided geographic variation and a satisfactory distribution of low-, middle-, and high-volume hospitals. We divided hospitals into strata according to the volume of primary THRs performed in the Medicare population at that hospital in 1995. Within each volume stratum we randomly selected hospitals, sampling with the probability of being selected proportional to hospital size (14). Finally, within each hospital we randomly selected patients who underwent elective primary THR in 1995. Patients could enter the cohort only once (even if they had undergone more than one hip replacement during the index year). Because revision procedures are performed less frequently than are primary THRs, we invited all patients who had undergone revision THR to participate in the study.
All eligible patients in the primary and revision THR cohorts who were still alive in 1998 were sent an introductory letter on Health Care Financing Administration (HCFA; recently renamed the Centers for Medicare & Medicaid Services) letterhead, in which they were assured that HCFA had approved the study protocol. We then sent letters inviting these patients to participate. The letter of invitation indicated that the study protocol would involve a review of the medical record from the patient's index THR. If patients failed to respond to the letter of invitation, they received a second and, if necessary, a third letter of invitation. HCFA protocols stipulated that if patients did not respond to any of 3 letters, they could not be contacted further. Patients who agreed to participate in the survey either received a questionnaire by mail or scheduled a telephone interview (whichever they preferred).
Medicare claims provided data on the volume of primary and revision THRs performed in 1995 by the surgeon and in the hospital. Claims also provided data on age, sex, and eligibility for Medicaid (a surrogate for low income).
Trained abstractors employed by peer review organizations in each of the 3 states performed the medical record reviews, using a standardized data abstraction form. The medical record data included preoperative weight and height, underlying arthritis diagnosis (osteoarthritis, rheumatoid arthritis, avascular necrosis), medical comorbidities (assessed using the medical record–based Charlson Comorbidity Index) (15), whether cement was used to implant the femoral and acetabular prosthetic components, the approach (e.g., anterior, posterior, trochanteric), and whether the patient had previously undergone hip, knee, or spine surgery (other than on the index hip). For revision surgery, the medical record was also scrutinized to ascertain whether there had been prior revisions of the index hip, whether a bone graft was used, and, if so, whether structural allograft was used. We developed a measure of the complexity of revision surgery by assigning a score, calculated as the number of prior revisions plus one point for use of allograft and another point for the use of structural allograft.
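The complexity score described above is a simple additive count. A minimal Python sketch follows; the function and parameter names are hypothetical, and it assumes that the structural allograft point is added on top of the allograft point, which the description implies but does not state explicitly.

```python
def revision_complexity_score(prior_revisions: int,
                              used_allograft: bool,
                              used_structural_allograft: bool) -> int:
    """Additive complexity score for a revision THR (hypothetical helper).

    Score = number of prior revisions of the index hip
            + 1 point if allograft was used
            + 1 further point if structural allograft was used.
    """
    if prior_revisions < 0:
        raise ValueError("prior_revisions must be non-negative")
    score = prior_revisions
    score += 1 if used_allograft else 0
    score += 1 if used_structural_allograft else 0
    return score
```

Under this assumption, a hip with two prior revisions that received structural allograft would score 4.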
The questionnaire included several validated measures of pain and functional status. The patient-administered version of the Harris hip score questionnaire is a validated 9-item measure of pain and functional status designed for patients with hip disorders (16, 17). The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) (18) has subscales measuring pain, stiffness, and functional limitation, and the subscales for pain and functional limitation were used in our survey. We utilized version 3.0 of the WOMAC, in which patients are asked to attribute their symptoms and functional limitations to problems in the hips and/or knees, thereby creating lower extremity–specific scales. The satisfaction scale comprises 4 individual questions. These questions address satisfaction with pain relief, ability to do house and yard work, ability to participate in recreational activities, and overall results of surgery. Each item has 4 responses (very satisfied, somewhat satisfied, somewhat dissatisfied, very dissatisfied), which are summed (19). The WOMAC, Harris hip, and satisfaction scores were converted to a 0–100 score, with 100 representing the best possible score.
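To illustrate how a summed 4-item scale can be placed on a 0–100 metric, the sketch below assumes each response is coded 1 (very dissatisfied) to 4 (very satisfied) and that the sum is rescaled linearly. The item coding, the linear transformation, and all names are assumptions made for illustration; the exact scoring procedure is the one given in reference 19.

```python
# Assumed coding of the 4 response options (not specified in the text).
RESPONSE_POINTS = {
    "very satisfied": 4,
    "somewhat satisfied": 3,
    "somewhat dissatisfied": 2,
    "very dissatisfied": 1,
}

def satisfaction_score(responses):
    """Sum 4 item responses and rescale linearly so that 100 is best."""
    if len(responses) != 4:
        raise ValueError("the satisfaction scale has exactly 4 items")
    total = sum(RESPONSE_POINTS[r] for r in responses)
    lowest, highest = 4 * 1, 4 * 4  # possible range of the raw sum
    return 100.0 * (total - lowest) / (highest - lowest)
```

With this coding, a patient who is very satisfied on two items and very dissatisfied on the other two lands exactly at 50.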
Patients also indicated the number of years of formal education they had completed, which we categorized as either less than a college education or at least some college education. Patients provided their current weight and height, which we used to calculate body mass index (BMI; weight in kilograms divided by height in meters squared). Because the medical record was missing more data than the survey on weight and height, we used the weight and height obtained in the patient survey. The correlation between the BMI calculated using medical record data and that calculated using survey data was 0.99. We also asked 4 questions about patients' preoperative functional status (including use of supportive devices, limp, stair climbing, and walking distance). We chose to ask about these particular activities because they can be recalled with reasonable accuracy (kappa scores in the range of 0.40–0.50 for the comparison of prospectively obtained versus 3-year recalled data) (20). The items were adapted from the Harris hip score evaluation system; weights were assigned as in the Harris score (16, 17), summed, and standardized to a 0–100 scale.
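The BMI definition used here (weight in kilograms divided by height in meters squared) can be written as a short helper. The function names are hypothetical; the >30 threshold is the one used in the tables.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI = weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2

def bmi_over_30(weight_kg: float, height_m: float) -> bool:
    """Flag patients with a BMI above 30."""
    return body_mass_index(weight_kg, height_m) > 30
```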
Hospital volume was defined as the total number of elective primary and revision THRs performed on Medicare beneficiaries in the hospital during 1995. We defined high volume as >100 procedures (primary plus revision) per year. Twenty-four percent of primary surgeries and 28% of revisions were performed in such high-volume hospitals. Because the distribution of the numbers of primary and revision THRs differed substantially, distinct cutoff values demarcating low-volume hospitals were required, and were defined using the following approach. First, we ranked patients who responded to our survey according to total hospital volume of primary plus revision THRs. We performed this ranking separately for the primary THR cohort and for the revision cohort. We then defined low-volume hospitals as those in which the lowest 25% of patients in the rank order underwent their surgeries. For the primary THR cohort, this corresponded to a volume of ≤12 primary plus revision cases (in the Medicare population) per year. For the revision THR cohort, this corresponded to a volume of ≤30 primary plus revision cases per year. Thus, low, middle, and high hospital volumes, respectively, were defined as 1–12, 13–100, and >100 cases of primary plus revision surgery annually for the primary THR cohort, and 1–30, 31–100, and >100 cases of primary plus revision surgery annually for the revision THR cohort. For both cohorts, low surgeon volume was defined as ≤12 primary plus revision cases per year in the Medicare population. Low-volume surgeons performed ∼50% of primary THRs and ∼40% of revision THRs. Of note, we used several other cut points for volume, with similar results.
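The volume strata defined above can be expressed as a small lookup. This is a sketch only: the cutoffs are those stated in the text, while the function names and the string labels for cohorts and strata are hypothetical.

```python
# Upper bound of the low-volume stratum, by cohort (annual Medicare cases).
LOW_HOSPITAL_CUTOFF = {"primary": 12, "revision": 30}
HIGH_HOSPITAL_CUTOFF = 100   # high volume is >100 cases per year
LOW_SURGEON_CUTOFF = 12      # low surgeon volume is <=12 cases per year

def hospital_volume_stratum(cohort: str, annual_cases: int) -> str:
    """Classify a hospital as 'low', 'middle', or 'high' volume."""
    if annual_cases < 1:
        raise ValueError("annual_cases must be at least 1")
    if annual_cases <= LOW_HOSPITAL_CUTOFF[cohort]:
        return "low"
    if annual_cases <= HIGH_HOSPITAL_CUTOFF:
        return "middle"
    return "high"

def is_low_volume_surgeon(annual_cases: int) -> bool:
    """Low surgeon volume uses the same cutoff in both cohorts."""
    return annual_cases <= LOW_SURGEON_CUTOFF
```

Note that a hospital performing 20 cases per year is a middle-volume center for the primary THR cohort but a low-volume center for the revision cohort.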
The principal outcome measures were the Harris hip score and the satisfaction score. We chose the Harris hip score (16) as the measure of functional status, because it is well accepted in the orthopedic community. It has been used extensively to assess the status of total hip arthroplasty, has been validated as a self-administered instrument (17), and is a more specifically targeted measure of hip status compared with other measures.
To simplify the presentation and interpretation of our results, we dichotomized each outcome. We dichotomized the Harris hip score at the lowest tenth percentile, corresponding to a Harris hip score of <55 in the primary THR cohort and <40 in the revision THR cohort (0 is the worst possible score, 100 is the best). We dichotomized the satisfaction score at a score of 50 on the 0–100 scale in both the primary and revision THR cohorts. Scores <50 indicate that the patient was dissatisfied. Other cut points led to similar results.
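The dichotomization rules above reduce to two threshold checks. The sketch below uses the cutoffs stated in the text; the function names and cohort labels are hypothetical.

```python
# Harris hip score cutoffs at the lowest tenth percentile, by cohort.
HARRIS_CUTOFF = {"primary": 55, "revision": 40}
SATISFACTION_CUTOFF = 50  # same cutoff in both cohorts

def poor_function(harris_score: float, cohort: str) -> bool:
    """True if the Harris hip score falls below the cohort-specific cutoff."""
    if not 0 <= harris_score <= 100:
        raise ValueError("Harris hip score must lie between 0 and 100")
    return harris_score < HARRIS_CUTOFF[cohort]

def dissatisfied(satisfaction_score: float) -> bool:
    """True if the satisfaction score is below 50 on the 0-100 scale."""
    if not 0 <= satisfaction_score <= 100:
        raise ValueError("satisfaction score must lie between 0 and 100")
    return satisfaction_score < SATISFACTION_CUTOFF
```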
We used claims data to evaluate differences in demographic and clinical characteristics and outcomes between patients who completed the survey and those who refused to participate or never responded to our invitation. The t-test and the chi-square test were used to evaluate the statistical significance of these differences.
In bivariate analyses, we first identified the patient characteristics associated with hospital volume, because such characteristics could potentially confound associations between volume and outcome. We next examined crude associations between select sociodemographic and clinical patient characteristics and the 2 principal outcomes (a Harris hip score in the lowest 10% and a satisfaction score <50). To estimate the crude and adjusted associations between volume and each outcome, we performed logistic regression analyses using generalized estimating equations (Proc Genmod; SAS Institute, Cary, NC) (21) to adjust variances for clustering among patients within hospitals.
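As a simplified illustration of what a crude odds ratio measures, the sketch below computes an OR and a Wald (Woolf) 95% CI from a 2×2 table. This is not the generalized estimating equations procedure used in the study: it ignores clustering of patients within hospitals, so its interval would generally be too narrow for clustered data. The function name is hypothetical and the counts are invented for illustration, not taken from the study.

```python
import math

def crude_odds_ratio(exposed_cases, exposed_noncases,
                     unexposed_cases, unexposed_noncases, z=1.96):
    """Return (OR, lower, upper) using the Woolf log-OR standard error."""
    cells = (exposed_cases, exposed_noncases,
             unexposed_cases, unexposed_noncases)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive")
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return (odds_ratio,
            math.exp(log_or - z * se_log_or),
            math.exp(log_or + z * se_log_or))

# Hypothetical counts (NOT study data): poor outcome in 20 of 100 patients
# in low-volume hospitals versus 10 of 100 in high-volume hospitals.
or_est, ci_low, ci_high = crude_odds_ratio(20, 80, 10, 90)
```

In this invented example the point estimate is 2.25, but the interval still spans 1, which mirrors how a sizeable crude OR can remain statistically nonsignificant.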
Multivariate modeling proceeded in 2 phases. The first set of models included hospital volume as the primary predictor plus all patient factors that were significant (P < 0.1) predictors of outcome in bivariate analyses. These analyses provided the effect of hospital volume on outcomes adjusted for patient factors. If the crude association between surgeon volume and outcomes reached an odds ratio (OR) ≥1.5 or a P value <0.1, we used a second set of multivariate models that included surgeon volume in addition to all of the predictors that were included in the first set of models.
We performed additional analyses to evaluate the sensitivity of our results to the particular manner in which we dichotomized the Harris hip score and to the choice of the Harris hip score as the primary outcome measure. Analyses were performed using a cutoff score of 70 on the Harris hip score (generally regarded as a poor score following primary THR) as the dependent variable rather than using the lowest decile. Twenty-one percent of patients who underwent a primary THR and 43% of those who underwent a revision had Harris hip scores <70. Next, we performed logistic regression modeling, using the lowest 10% of scores on the WOMAC pain and function scales (rather than the Harris hip score) as dependent variables. The lowest deciles corresponded to WOMAC pain scores of 55 for primary and 45 for revision THR, and WOMAC function scores of 50 for primary and 41 for revision THR. To examine whether the results were sensitive to the use of a continuous versus a dichotomous dependent variable, we performed multivariate linear regression analyses using the Harris hip score as a continuous dependent variable. These models included the same covariates as those included in the logistic models and were used to compute the adjusted mean Harris hip scores. Finally, we examined a range of different cut points for the categorization of hospital and surgeon volume. All analyses were performed with SAS version 8.2 software. P values less than 0.05 were considered significant.
Using the sampling procedures described above, we selected a cohort of 1,939 patients from among the 7,092 Medicare beneficiaries who underwent primary hip replacement surgery in Pennsylvania, Ohio, or Colorado during 1995. Among these 1,939 patients, 32 (1.7%) died during the 6–12-month period between the time we chose the sample and the time we contacted patients. Another 20 patients (1.0%) could not be located because of an incorrect address. Of the 1,887 surviving patients with accurate addresses, 519 (28%) never responded to the 3 letters of invitation, 338 (18%) refused to participate, and 1,030 (55%) agreed to participate. Of these 1,030 patients, 958 (93%) returned completed questionnaires. We performed analyses on the 926 subjects who had complete data on hospital and surgeon volume.
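The participant flow for the primary THR cohort can be checked with simple arithmetic. The sketch below uses only the counts reported above; the variable names are hypothetical.

```python
sampled = 1939        # patients selected for the primary THR cohort
died = 32             # died before contact
unlocatable = 20      # incorrect address
eligible = sampled - died - unlocatable   # surviving, accurate address

never_responded = 519
refused = 338
agreed = 1030
returned = 958        # completed questionnaires among those who agreed

def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)

# The three response categories account for every eligible patient.
accounted_for = never_responded + refused + agreed
response_pcts = (pct(never_responded, eligible),
                 pct(refused, eligible),
                 pct(agreed, eligible))
completion_pct = pct(returned, agreed)
```

The rounded percentages (28%, 18%, and 55%) sum to 101% because of rounding.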
All 1,568 patients whose revision THRs were performed in the 3 states during 1995, and who survived until 1998, were included in the revision THR cohort. Of these 1,568 patients, 21 (1.3%) died in the 6–12-month period between the time of sample selection and the time that patients were contacted, and 16 (1.0%) could not be located. Among the 1,531 survivors who had valid addresses, 605 (39%) never responded, 257 (17%) refused to participate, and 669 (44%) agreed to participate. Of these 669 patients, 595 (89%) returned completed questionnaires. We performed analyses on the 578 subjects who had complete data on hospital and surgeon volume.
We used Medicare claims data to compare select characteristics of patients who refused to participate or never responded to our invitation with those of patients who participated in the survey. In the primary THR cohort, patients who refused or never responded were older than subjects who participated (76.2 versus 74.1 years; P = 0.0001). Forty percent of eligible African-American patients participated, compared with 51.3% of white patients (P = 0.03). Thirty-four percent of patients who were eligible for Medicaid (indicating poverty-level income) completed the survey, compared with 51% of patients with higher incomes (P = 0.004). Distributions of responders and nonresponders did not differ with respect to hospital volume of THR, patient sex, and whether the patient had a dislocation, deep wound infection, or pulmonary embolus within the 3 years following surgery.
The patterns of response in the revision THR cohort were similar to those in the primary THR cohort, except that in the revision cohort, 44% of patients whose surgeries were performed in high-volume hospitals (>100 cases per year) completed the survey, compared with 40% and 33%, respectively, of those who underwent surgery in the middle-volume (31–100 cases) and lowest-volume (≤30 cases) centers (P for trend = 0.002).
In both the primary and revision THR cohorts, patients who underwent surgery in higher-volume hospitals were significantly more likely to have an income greater than $20,000 per year, to have attended college, and to have better recalled preoperative functional status compared with patients who underwent surgery in lower-volume hospitals (Table 1). In the primary THR cohort, patients whose surgeries were performed in higher-volume hospitals were younger than those who underwent the procedure in lower-volume centers. In the revision THR cohort, patients who underwent surgery in low-volume hospitals were more likely to have a BMI >30.
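|Characteristic||Primary THR, low hospital volume (n = 222)||Primary THR, medium hospital volume (n = 484)||Primary THR, high hospital volume (n = 220)||Revision THR, low hospital volume (n = 152)||Revision THR, medium hospital volume (n = 266)||Revision THR, high hospital volume (n = 160)|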
|Age ≥75 years||85 (38)||160 (33)||53 (24)†||52 (34)||88 (33)||61 (38)|
|Female||136 (61)||308 (64)||132 (60)||93 (61)||172 (65)||92 (58)|
|Income <$20,000/year||134 (62)||207 (43)||72 (33)||89 (60)||130 (49)||66 (43)†|
|Less than college education||180 (85)||375 (79)||162 (75)‡||127 (87)||214 (82)||108 (71)†|
|Osteoarthritis||194 (87)||418 (86)||191 (87)||NR||NR||NR|
|Rheumatoid arthritis||8 (4)||21 (4)||8 (4)||NR||NR||NR|
|Avascular necrosis||13 (6)||40 (8)||10 (5)||NR||NR||NR|
|Recalled preoperative function in lowest quartile||67 (32)||118 (26)||50 (23)‡||41 (29)||53 (21)||27 (18)‡|
|Any comorbidities||110 (50)||222 (46)||93 (42)||68 (45)||123 (46)||60 (38)|
|Body mass index >30||50 (23)||112 (24)||61 (29)||38 (26)||56 (22)||23 (15)‡|
|Prior orthopedic surgery||79 (36)||179 (37)||80 (36)||69 (45)||140 (53)||82 (51)|
In both the primary and revision THR cohorts, income less than $20,000 per year, less than a college education, BMI >30, and recalled preoperative function scores in the lowest quartile were all significantly associated with 3-year Harris hip scores in the lowest 10% (Table 2). Patients who previously underwent total joint replacement (other than the index joint) or spine surgery and female patients were significantly more likely than other patients to have poor function after primary THR, but not after revision THR.
|Outcome: lowest 10% Harris hip score|
|Characteristic||Primary THR, % of patients||Primary THR, crude OR (95% CI)||Primary THR, adjusted OR (95% CI)||Revision THR, % of patients||Revision THR, crude OR (95% CI)||Revision THR, adjusted OR (95% CI)|
|Hospital volume, low||13.1||1.78 (0.90–3.54)||1.29 (0.64–2.62)||13.6||1.83 (1.08–3.11)||0.90 (0.40–1.99)|
|Hospital volume, medium||10.0||1.32 (0.70–2.49)||1.14 (0.63–2.06)||9.6||1.22 (0.73–2.03)||0.94 (0.45–1.95)|
|Surgeon volume ≤12||11.2||1.26 (0.81–1.99)||–||12.8||1.63 (0.93–2.85)||1.54 (0.80–2.96)|
|Age ≥75 years||11.4||1.21 (0.72–2.02)||–||9.2||0.82 (0.47–1.45)||–|
|Female||12.4||2.00 (1.19–3.37)||–||11.3||1.44 (0.73–2.88)||–|
|Income <$20,000 annually||14.3||2.31 (1.55–3.44)||2.00 (1.32–3.02)||14.9||2.95 (1.65–5.27)||2.79 (1.56–5.01)|
|Less than college education||11.6||2.80 (1.26–6.23)||2.10 (0.97–4.55)||12.0||3.83 (1.23–11.9)||–|
|Rheumatoid arthritis||13.9||1.45 (0.61–3.45)||–||–||–||–|
|Avascular necrosis||16.9||1.90 (0.95–3.79)||–||–||–||–|
|Preoperative function in lowest quartile||15.0||1.92 (1.22–3.01)||1.57 (0.99–2.50)||17.9||2.47 (1.36–4.49)||2.26 (1.27–4.05)|
|Any comorbidities||12.4||1.33 (0.85–2.07)||–||11.6||1.53 (0.93–2.52)||–|
|Body mass index >30||13.7||1.61 (1.02–2.53)||–||15.0||1.98 (1.03–3.81)||–|
|Prior orthopedic surgery||13.9||1.85 (1.14–2.99)||1.92 (1.15–3.23)||9.1||0.79 (0.45–1.35)||–|
We did not observe any clinically important or statistically significant associations between surgical variables (e.g., approach, use of cement, and surgical complexity of the case) and Harris hip score 3 years following surgery.
In crude analyses, we found that low surgeon volume was not significantly associated with a Harris hip score in the lowest decile following primary THR. Patients who underwent primary THR in low-volume hospitals were more likely to have Harris hip scores in the lowest decile than were patients whose operations were performed in high-volume hospitals (>100 cases annually), although this association did not reach statistical significance (crude OR 1.78, 95% CI 0.90–3.54). Adjustment for income, education, recalled preoperative functional status, and prior hip, knee, and spine surgery weakened the association between hospital volume and the proportion of patients with a Harris hip score in the lowest 10%. The adjusted OR for lowest-volume versus highest-volume hospitals was 1.29 (95% CI 0.64–2.62, P for trend = 0.48).
In crude analyses of the revision THR cohort, patients who underwent surgery in the lowest-volume hospitals were significantly more likely to have a Harris hip score in the lowest decile, compared with patients whose surgeries were performed in the highest-volume hospitals (crude OR 1.83, 95% CI 1.08–3.11). Low surgeon volume was associated with 1.6-fold higher odds of having a Harris hip score in the lowest decile, although this association did not reach statistical significance (crude OR 1.63, 95% CI 0.93–2.85).
After adjustment for income and recalled preoperative function, the association between hospital volume and low Harris hip score became weaker, with the OR dropping from 1.83 to 1.26. When we added surgeon volume to the model, the increased risk of poor functional outcome in low-volume hospitals disappeared entirely (adjusted OR 0.90, 95% CI 0.40–1.99) (Table 2). Patients whose operation was performed by a low-volume surgeon had a 1.5-fold higher odds of having a low Harris hip score after adjustment for income, recalled preoperative function, and hospital volume. This association did not reach statistical significance (adjusted OR 1.54, 95% CI 0.80–2.96).
Female sex, income <$20,000, a diagnosis of avascular necrosis, and the presence of any comorbidity were significantly associated with dissatisfaction with the results of primary THR. Income <$20,000 and less than a college education were significantly associated with dissatisfaction with the results of revision surgery. None of the surgical variables, such as approach, use of cement, and complexity of the case, was associated with satisfaction with surgery (Table 3).
|Outcome: satisfaction score <50|
|Characteristic||Primary THR, % of patients||Primary THR, crude OR (95% CI)||Primary THR, adjusted OR (95% CI)||Revision THR, % of patients||Revision THR, crude OR (95% CI)||Revision THR, adjusted OR (95% CI)|
|Hospital volume, low||13.8||2.15 (1.20–3.85)||2.06 (1.15–3.69)||29.7||1.26 (0.80–1.97)||0.81 (0.44–1.48)|
|Hospital volume, medium||8.8||1.29 (0.74–2.26)||1.22 (0.70–1.13)||27.4||1.12 (0.77–1.63)||0.85 (0.54–1.33)|
|Surgeon volume ≤12||10.0||1.07 (0.68–1.68)||–||33.5||1.68 (1.14–2.46)||1.77 (1.11–2.82)|
|Age ≥75 years||10.6||1.20 (0.74–1.93)||–||29.9||1.21 (0.84–1.74)||–|
|Female||11.0||1.65 (1.02–2.66)||1.68 (1.03–2.75)||28.7||1.19 (0.82–1.74)||–|
|Income <$20,000 annually||11.8||1.63 (1.08–2.48)||–||33.0||1.77 (1.15–2.72)||–|
|Less than college education||9.9||1.31 (0.71–2.42)||–||30.5||2.43 (1.49–3.97)||2.26 (1.35–3.78)|
|Rheumatoid arthritis||10.8||1.16 (0.36–3.72)||–||–||–||–|
|Avascular necrosis||17.5||2.14 (1.08–4.24)||2.00 (1.01–3.98)||–||–||–|
|Preoperative function in lowest quartile||8.1||0.81 (0.48–1.37)||–||29.8||1.14 (0.75–1.73)||–|
|Any comorbidities||11.4||1.48 (1.00–2.19)||1.50 (1.00–2.24)||30.2||1.29 (0.93–1.79)||–|
|Body mass index >30||10.4||1.20 (0.72–2.01)||–||32.8||1.40 (0.91–2.15)||–|
|Prior orthopedic surgery||9.5||0.99 (0.61–1.61)||–||27.1||0.97 (0.71–1.33)||–|
In crude analyses, we did not observe a significant association between surgeon volume and satisfaction with the results of primary THR. However, patients who underwent primary THR in low-volume hospitals were more likely to have satisfaction scores <50, compared with patients whose THRs were performed in the highest-volume hospitals (crude OR 2.15, 95% CI 1.20–3.85). After adjustment for sex, the diagnosis of avascular necrosis, and the Charlson comorbidity index, patients who attended the lowest-volume hospitals remained more likely to have satisfaction scores <50 (adjusted OR 2.06, 95% CI 1.15–3.69, P for trend = 0.01).
In crude analyses, we did not observe a significant association between hospital volume and satisfaction with revision surgery. However, patients whose surgeons performed ≤12 THRs annually were significantly more likely to be dissatisfied with revision THR than were patients with higher-volume surgeons (crude OR 1.68, 95% CI 1.14–2.46). This association between low surgeon volume and dissatisfaction with revision surgery persisted after adjustment for education (adjusted OR 1.77, 95% CI 1.11–2.82).
Using multivariate models analogous to those described above, we observed no association between hospital volume and a Harris hip score <70 in patients who underwent primary or revision THR. Similarly, we found no clinically or statistically significant associations between hospital volume and the risk of having a WOMAC pain or function score in the lowest 10%, for either primary or revision THR.
The logistic model was chosen because ORs provide comparatively simple measures of effect. In linear regression models adjusted for the same covariates, the adjusted mean (95% CI) Harris hip scores in patients who underwent surgery in the low-, middle-, and high-volume hospitals were 80.0 (77.3–82.8), 83.2 (81.5–84.9), and 82.7 (80.4–85.0), respectively, for primary THR. Similarly, the adjusted mean (95% CI) Harris hip scores for revision THR in the low-, middle-, and high-volume strata were 66.5 (63.0–70.0), 70.8 (67.6–74.1), and 71.8 (68.1–75.4).
Higher hospital and surgeon volumes have been consistently associated with lower rates of adverse events following numerous surgical and medical procedures (1–10). With few exceptions (22), studies of the volume–outcome relationship have focused on a narrow set of outcomes: perioperative mortality and complications. The influence of hospital and surgeon volume on pain and physical functional status, which are critically important outcomes of THR from the patient's point of view (12), has received little prior attention. In the current study, we examined the effects of provider volume on these patient-centered outcomes in a population-based study conducted among Medicare beneficiaries. We observed that hospital and surgeon volume had little effect on functional status 3 years following primary and revision THR, after adjusting for socioeconomic and clinical variables. Patients who underwent primary THR in the lowest-volume hospitals were somewhat more likely to have a Harris hip score in the lowest 10% of the distribution (adjusted OR 1.29, 95% CI 0.64–2.62).
Patients who underwent primary THR in low-volume hospitals appeared to be less satisfied with the results of surgery compared with patients whose THRs were performed in high-volume hospitals, and this effect persisted after adjustment for patient sociodemographic and select clinical factors. Similarly, patients whose revision THR was performed by a surgeon whose annual volume was ≤12 cases were more likely to be dissatisfied, and this effect persisted after adjustment for clinical and sociodemographic factors as well as hospital volume. These effects of volume on satisfaction are somewhat puzzling, given the similar functional outcomes associated with high-volume and low-volume hospitals (as well as high-volume and low-volume surgeons). The correlation between satisfaction score and Harris hip score was in the range of 0.61 to 0.71 (analysis performed separately within the 3 volume strata for both primary and revision THR), indicating that the Harris hip score and the satisfaction score assess similar though distinct domains. It is possible that patients in high-volume hospitals responded positively to the more sophisticated facilities in these centers when they answered the questions about their satisfaction with the results of surgery. Satisfaction may also tap aspects of patient outcome that are not captured by standard measures of functional status and pain. The reasons that volume is more strongly associated with satisfaction than with functional limitation require further study.
The crude association between low hospital volume and worse functional outcome merits discussion. Patients with low levels of income and educational attainment and those with worse recalled preoperative functional status were more likely to have THR performed at low-volume hospitals (Table 1). Before adjusting for these factors, low hospital volume was associated with worse Harris hip scores at followup. However, adjustment abolished the volume effect (Table 2), indicating that education, income, and preoperative functional status confounded the association between volume and functional outcome. Further analysis (data not shown) confirmed that the key confounding variables were education and income, not preoperative functional status. Other studies have shown that markers of lower socioeconomic status, such as education and income, are associated with worse health outcome in the general population (23), following orthopedic surgeries (24, 25), and in other chronic conditions (26). Low preoperative function has also been associated with worse outcome of total knee and hip replacement (24). Thus, it appears that patients with worse prognostic features undergo surgery in low-volume hospitals. It is these prognostic features, not hospital volume per se, that account for the worse functional results obtained in the unadjusted analyses.
Our findings contrast to some extent with results from the one previous study that examined associations between hospital procedure volume and subsequent patient-reported functional status. Heck and colleagues (22) showed that patients undergoing total knee replacement in hospitals that performed >50 cases annually were more likely to have an improvement in functional status (from preoperatively to 2 years postoperatively) compared with patients who underwent surgery in lower-volume hospitals. The study by Heck et al differed from ours in several respects: it focused on knee (not hip) replacement, included a considerably smaller study group, was conducted in one state, assessed preoperative functional status prospectively, and defined a poor outcome as failure to improve in score rather than failure to achieve a specified threshold. Hence, the 2 studies are difficult to compare, underscoring the need for further work in this area.
The findings of our study should help patients, families, and referring physicians make the complex decision of whether to schedule THR in a high-volume or a low-volume center. Low-volume hospitals have higher rates of 90-day postoperative mortality and complications following primary and revision THR (7). However, the absolute difference in risk of death between high-volume and low-volume centers is <1% for primary THR and <2% for revisions (7). Similarly, the differences between high-volume and low-volume centers in dislocation rates are 2% for primary THRs and 6% for revisions (7). Some patients might prefer to accept the risks associated with undergoing THR in a low-volume hospital rather than travel to a high-volume center (11). The findings of the present analyses suggest that beyond the 90-day risk of perioperative complications, there is little additional disadvantage, in the form of worse functional results at 3 years, to having surgery in a low-volume center. Of course, the longevity of the prosthesis cannot be determined in a 3-year time frame. The important question of whether volume influences the failure rate of the prostheses over a longer (e.g., 5–10-year) time frame is not addressed by our survey. We are planning to follow up this cohort over a longer period of time in order to address these issues.
These findings also have implications for health care policy. The consistent evidence of better outcomes in high-volume centers has led payers to consider regionalizing care to high-volume hospitals. Such regionalization policies would, at least theoretically, save lives (1). Our data suggest that a regionalization policy would have little additional benefit after the perioperative period, at least in the 3-year interval considered in the present study. Our data also show that older, less educated, poorer, and more functionally disabled patients would be disproportionately affected by a policy that shifts patients out of low-volume hospitals. These findings argue against a blanket regionalization policy and suggest that the potential trade-offs between having THR in a low-volume or a high-volume center should be evaluated explicitly in decision analytic models.
From a methodologic standpoint, this study provides a template for research on other procedures such as cardiac and cancer surgery to determine whether the short-term advantages of high-volume hospitals documented for these procedures persist over a longer period and extend to a broader set of outcomes. We suggest that such a broad view of outcome is needed to fully evaluate the trade-offs between volume and outcome and to formulate policy.
Our study has several strengths. The sample was population-based. Outcomes were assessed with standardized, previously validated scales. The research team was not involved in the care of patients, which precluded observer bias. The analyses accounted for potential confounding by patient characteristics and were simultaneously adjusted for hospital and surgeon volume and for clustering of patients within hospitals.
The study also has important limitations. The response rates were 51% of all eligible patients who underwent primary THR (55% of patients agreed to participate, and 93% of these completed forms) and 39% of patients who underwent revision THR (44% agreed to participate, and 89% of these completed forms). Other large surveys of elderly patients with musculoskeletal disorders have also had modest response rates, on the order of 52–64% (22, 24, 27). Our response rates were low in part because HCFA regulations precluded our contacting patients who never responded to the letters of invitation (28% of the primary THR cohort and 39% of the revision cohort). This restriction was not imposed on the studies described above. Although elderly, poor, and African-American patients were less likely to participate in our survey, there were no significant differences in perioperative outcomes between survey respondents and patients who refused to participate or did not respond. The study was also limited to Medicare beneficiaries, setting a lower limit on age of 65 years. Although two-thirds of all patients who undergo THR are Medicare beneficiaries (28), these analyses should nevertheless be extended in the future to younger patients undergoing THR.
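The overall response rates quoted above follow from multiplying the proportion of patients who agreed to participate by the proportion of those who completed the forms. A minimal check of that arithmetic is sketched below; the function name is hypothetical, and the inputs are the percentages reported in the text.

```python
def overall_response_rate(agreed_fraction, completed_fraction):
    """Overall response rate = fraction agreeing x fraction completing."""
    return agreed_fraction * completed_fraction


# Percentages reported in the text for each cohort.
primary = overall_response_rate(0.55, 0.93)   # primary THR cohort
revision = overall_response_rate(0.44, 0.89)  # revision THR cohort

print(f"primary THR:  {primary:.0%}")   # prints "primary THR:  51%"
print(f"revision THR: {revision:.0%}")  # prints "revision THR: 39%"
```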
Preoperative function is the most important determinant of postoperative function (24), and our cross-sectional design precluded prospective preoperative assessment. However, we measured preoperative function using variables that are recalled with moderate accuracy (20, 29). The correlation between a functional limitations scale administered prospectively and one administered 3 months retrospectively has been shown to be 0.48 (29). Furthermore, preoperative status did not influence the association between volume and patient-centered outcomes, indicating that inaccurate recall is unlikely to bias the principal findings. An additional limitation is that our crude measure of surgical complexity did not capture many of the more subtle aspects of the complexity of revision THR. Finally, prosthesis failure leading to revision occurs rarely within 3 years, preventing evaluation of this important and costly outcome. Longer-term followup of this cohort is critical.
In conclusion, after accounting for socioeconomic status, the functional outcome of elective primary and revision THR 3 years postoperatively is similar in high-volume and low-volume centers. The well-documented favorable performance of high-volume centers in the perioperative period should not be assumed to extend to functional outcomes several years following THR. Nonetheless, satisfaction following primary THR is higher among patients attending high-volume centers, and satisfaction following revision surgery is higher among patients with high-volume surgeons. Further research is needed to confirm and explain these findings relating to volume and satisfaction, and to examine the effect of volume on longer-term functional outcomes of primary and revision total hip replacement.
We gratefully acknowledge Matthew H. Liang, MD, MPH, for helpful input throughout the project, Heema Kaul, BA, Allison Diamond, MPH, Joanna Case, BA, and Andrew Miner, BA, for invaluable assistance with data collection, and the many patients who participated in this research.