Abstract

Background:

The aims were to assess the evidence that individual hospitals had mortality rates in excess of the national average after abdominal aortic aneurysm (AAA) repair and to develop an effective method for monitoring mortality using local data.

Methods:

Hospital Episode Statistics identified patients undergoing elective infrarenal AAA repair. A technique was developed that compared individual hospital mortality rates with the mortality rate in the remainder of England. The strength of evidence that the death rate was less than elsewhere, and less than twice elsewhere, was quantified using a test of statistical significance. A moving average chart technique was devised using local data for mortality monitoring and comparison to the national average.

Results:

For 30 hospitals, the mortality rate was significantly greater than elsewhere, and in three hospitals it was demonstrably greater than twice that in the remainder of England. The moving average chart appeared to provide a useful technique for local mortality monitoring.

Conclusion:

Different mortality rates exist for AAA repair within England. Mortality can be monitored locally and compared with the national average. Copyright © 2008 British Journal of Surgery Society Ltd. Published by John Wiley & Sons, Ltd.


Introduction

In the UK, clinical governance evolved following the General Medical Council inquiry into mortality rates after paediatric cardiac surgery1, 2. However, there is no consensus as to how individual unit mortality rates should be evaluated; defining a high mortality rate for a particular procedure is difficult owing to significant variations in case mix3.

A recent study of elective abdominal aortic aneurysm (AAA) repair using discharge data from England showed that the national mean mortality rate was 7·4 per cent between 2001 and 20054. This value was higher than that reported in recent national trials, even after risk adjustment5. The English discharge data also demonstrated that mortality rates at individual hospitals varied widely and were related to the annual hospital volume.

In reference to cardiology and trauma services, the Institute for Public Policy Research suggested that the closure of less effective services at smaller local hospitals would improve survival rates6, as modern interventions are complex, expensive and could be provided at a smaller number of specialist hospitals. This policy appears to have the support of the present UK Government, with the Prime Minister stating recently that patients will have to travel further to access specialist care7 (which is contrary to previous proposals8–10).

Hospital league tables have become topical in the popular press, despite arousing strong feelings within the medical community. In the past, schemes such as Dr Foster11 and CHKS12 have ranked a hospital's performance on the basis of crude hospital-level mortality rates, although case-mix adjustment has been added more recently. These tables often present mortality figures for the hospital as a single entity, rather than examining individual specialties and procedures, beyond certain ‘index’ procedures such as fractured neck of femur. There is now a proposal to rank individual physician and surgeon mortality rates in league tables, as in cardiothoracic surgery.

The use of crude mortality figures is inappropriate because case mix and case selection play a pivotal role in the outcomes of medical interventions, with older patients and those with greater co-morbidity having higher mortality rates. As case mix varies widely between hospitals, any interpretation of league tables based on crude mortality rates will be inaccurate and unreliable. Ultimately, the publication of unadjusted data may drive hospitals and physicians to treat younger, fitter patients, and to deny treatment to patients who are elderly or have significant co-morbidity13. Alternatively, cases may be coded with inappropriate complexity so that the hospital appears better in the rankings, a behaviour that is widely recognized, especially in the USA where league tables have been published for some years13–16.

An alternative to league tables could be a system developed to assess the mortality rates at individual hospitals, and to suggest mechanisms for monitoring the mortality rate locally. The premise of this study was to assess mortality rates and to investigate whether hospitals could demonstrate positive evidence of safety, rather than merely an absence of evidence of danger. The study approached the question of clinical governance and mortality monitoring using elective AAA repair as an example. It used Hospital Episode Statistics (HES) data17 between 1 April 2000 and 31 March 2005, as well as local Patient Administration System (PAS) data from the authors' hospital, and built on the findings of a previous theoretical study18.

Methods

HES data for all consultant episodes in National Health Service (NHS) hospitals in England were acquired for the years 2000–2005. Data were extracted for elective infrarenal AAA repair using previously published methods and inclusion criteria4, 19. The data included details of the admitting hospital, patient age and sex, duration of hospital stay, timing of operations relative to date of admission, complications, and whether the patient was discharged alive or dead. The quality of this data set was assessed and validated4, 20.

Local data, obtained from the St George's Hospital PAS database using the same inclusion criteria, were used to validate the HES data and to construct a method for local mortality monitoring.

Safety charts: the quantification of safety or danger

Safety charts were constructed to show for each hospital the strength of evidence that the death rate was below a threshold equal to the national average and below a threshold equal to twice the national average. If there was statistically significant evidence that the death rate was below the threshold, this was described as evidence of safety. If the evidence was not statistically significant, there was insufficient evidence of safety. If there was statistically significant evidence that the death rate exceeded the threshold, this was described as evidence of danger. Insufficient evidence of safety or danger was considered unhelpful and not reported.

The mortality rate in each hospital was compared with the national mean mortality rate, after the exclusion of the data from that hospital (that is compared with the mortality rate elsewhere). This provided the relative risk (RR) of mortality at each hospital. The investigation was performed twice, comparing each hospital with the national average mortality rate (k = 1) and with twice the national average mortality rate (k = 2). The national average and twice the national average mortality rates were termed safety thresholds in this study. The risk ratios were calculated relative to the estimated risk in the comparison group for both investigations (k = 1 and k = 2), rather than to the safety threshold.

When the safety threshold was equal to the national average, the hospitals were arranged into two groups: hospitals with a RR less than 1 and those with a RR of 1 or more.

With the safety threshold set at twice the national average mortality, the data were arranged into three groups: hospitals with a RR less than 1, hospitals with a RR between 1 and 2, and hospitals with a RR greater than or equal to 2.

To assess the evidence for safety or danger of individual hospitals, mortality at each hospital was compared with the mortality elsewhere, that is national data excluding data from the individual hospital. The number of procedures was considered to be fixed both for the individual hospital and for all other hospitals combined, and the numbers of deaths to be independently binomially distributed. For a safety threshold equal to the national average, the joint distribution for the numbers of deaths was based on a common death rate using the maximum likelihood estimate, that is the combined mean national death rate.

The P value for an individual hospital diverging by as much as observed, or more, from the safety threshold was determined as the sum of probabilities from the joint distribution for all possible pairs of number of deaths for which the RR was as extreme, or more extreme, than observed. For a safety threshold equal to twice the mortality elsewhere (k = 2), the maximum likelihood estimate for a mortality rate exactly twice that elsewhere was used instead of the national mean.

For hospitals with death rates below (or above) the threshold value, the one-tailed P values indicated the strength of evidence of safety (or danger). P values were displayed on a scale of log10(odds) to distinguish small values that differed by orders of magnitude. Odds were used rather than P values to exploit the fact that log(odds) are equal to 0 for P = 0·5 and so evidence of safety and danger can be shown in different directions on the y-axis. P values were shown without adjustment for the multiplicity of testing implied by comparing mortality at each hospital with mortality elsewhere.

Log(odds) of 1·3, equivalent to one-tailed P = 0·050, was indicated by horizontal lines on the charts. Hospitals that lay outside the two lines generated a significant weight of evidence that their mortality rate was inconsistent with the threshold value, being either higher or lower. Hospitals that lay within the ‘control bands’ may still have had a RR of mortality greater than, or greater than twice, the national average, but there was insufficient evidence to be able to identify them as safe or unsafe. Overall, this technique provided three alternative states into which hospitals fell: evidence of safety, insufficient evidence of safety or danger, or evidence of danger.
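
To make the calculation concrete, a minimal sketch in Python of the exact test for the k = 1 threshold is shown below. This is illustrative only and not the authors' code: the function names, the example figures and the numpy/scipy dependencies are assumptions. It pools the two groups to estimate the common death rate, enumerates every possible pair of death counts, and sums the joint binomial probabilities of pairs with a relative risk at least as extreme as that observed; the log10(odds) transform gives the display scale used in the safety charts.

```python
# Illustrative sketch only (not the study's code): exact one-tailed safety test for k = 1.
import numpy as np
from scipy.stats import binom

def safety_test_k1(d_hosp, n_hosp, d_else, n_else):
    """One-tailed P value that the hospital's death rate diverges from the rate
    elsewhere by as much as observed, or more, treating deaths as binomial."""
    # For k = 1 the maximum likelihood estimate under H0 is the pooled death rate.
    p_hat = (d_hosp + d_else) / (n_hosp + n_else)

    x_hosp = np.arange(n_hosp + 1)[:, None]   # possible death counts at the hospital
    x_else = np.arange(n_else + 1)[None, :]   # possible death counts elsewhere
    joint = binom.pmf(x_hosp, n_hosp, p_hat) * binom.pmf(x_else, n_else, p_hat)

    # Cross-multiplication compares relative risks without dividing by zero:
    # RR(x_hosp, x_else) >= RR(d_hosp, d_else)  <=>  x_hosp * d_else >= d_hosp * x_else.
    if d_hosp * n_else >= d_else * n_hosp:     # observed rate at or above that elsewhere
        extreme = x_hosp * d_else >= d_hosp * x_else
    else:                                      # observed rate below that elsewhere
        extreme = x_hosp * d_else <= d_hosp * x_else
    return float(joint[extreme].sum())

def log10_odds(p):
    """Chart scale: log10(odds) is 0 at P = 0.5 and about -1.3 at P = 0.05."""
    return float(np.log10(p / (1.0 - p)))

# Hypothetical figures: 4 deaths in 40 local repairs versus 1110 in 15 000 elsewhere.
p_val = safety_test_k1(4, 40, 1110, 15000)
print(p_val, log10_odds(p_val))
```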

Maximum likelihood estimator for the safety test

Under the null hypothesis that the death rate is a constant, p, at the hospital of interest while it is exactly p/k elsewhere, the maximum likelihood estimate for p can be determined by standard methods. The numbers of procedures at the hospital, n2, and elsewhere, n1, were assumed to be fixed. The corresponding numbers of deaths, d2 and d1, were not assumed to be fixed. This is an important difference from the χ2 test and many other tests of a contingency table. A further assumption was that the deaths at the hospital of interest were independent of one another, and that the same applied elsewhere. Thus, the joint likelihood of d1 and d2 is the product of two binomial probabilities. The maximum likelihood estimate is obtained by differentiating the log(likelihood) with respect to p and setting the differential equal to 0. It is a solution of the quadratic equation a·p² + b·p + c = 0, where a = n1 + n2, b = −(d1·k + n1 + n2·k + d2) and c = (d1 + d2)·k.
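
A short sketch of this estimate (again illustrative rather than the authors' code) solves the quadratic directly; for k ≥ 1 the relevant solution is the smaller root, which lies between 0 and 1, and for k = 1 it reduces to the pooled death rate.

```python
# Illustrative sketch: maximum likelihood estimate of p under H0 (rate p at the hospital,
# p/k elsewhere), using the quadratic coefficients given in the text.
from math import sqrt

def mle_common_rate(d_else, n_else, d_hosp, n_hosp, k):
    a = n_else + n_hosp
    b = -(d_else * k + n_else + n_hosp * k + d_hosp)
    c = (d_else + d_hosp) * k
    return (-b - sqrt(b * b - 4 * a * c)) / (2 * a)   # smaller root, lying in (0, 1)

# For k = 1 this reduces to the pooled death rate (a quick check with made-up figures).
assert abs(mle_common_rate(1110, 15000, 4, 40, 1) - (1110 + 4) / (15000 + 40)) < 1e-9
print(mle_common_rate(1110, 15000, 4, 40, 2))   # close to twice the rate elsewhere
```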

Homogeneity of death rates across hospitals

The hypothesis that the death rate is the same at all hospitals is intrinsically unlikely and is irrelevant to demonstrating safety at an individual hospital. The χ2 test was used to test homogeneity. All hospitals were tested in a single test, and the death rate at each individual hospital was compared with the rate elsewhere using the Bonferroni correction to the χ2 test. A significant result of a test of a contingency table with fixed totals does not imply anything about the magnitude of the difference in death rate. The Bonferroni correction was also applied to the safety test using the maximum likelihood estimator. A significant result implied that there were hospitals where the death rate was above the threshold and could not be explained away by mere chance.
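
The homogeneity testing can be sketched as follows; the counts and layout are hypothetical, and scipy's chi2_contingency is used for both the overall table and the per-hospital two-by-two comparisons.

```python
# Illustrative sketch with made-up counts: chi-squared homogeneity test across hospitals,
# then each hospital against the rest with a Bonferroni-corrected significance level.
import numpy as np
from scipy.stats import chi2_contingency

deaths = np.array([4, 9, 2, 15, 1])       # deaths per hospital (hypothetical)
cases = np.array([60, 70, 35, 100, 40])   # elective repairs per hospital (hypothetical)

table = np.column_stack([deaths, cases - deaths])   # rows: hospitals; columns: died, survived
chi2, p_overall, dof, expected = chi2_contingency(table)
print("overall homogeneity P =", p_overall)

alpha = 0.05 / len(deaths)   # Bonferroni correction for testing every hospital
for i in range(len(deaths)):
    d_rest = deaths.sum() - deaths[i]
    n_rest = cases.sum() - cases[i]
    sub = np.array([[deaths[i], cases[i] - deaths[i]],
                    [d_rest, n_rest - d_rest]])
    _, p_i, _, _ = chi2_contingency(sub)
    print(f"hospital {i}: P = {p_i:.4f}, significant after correction: {p_i < alpha}")
```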

Quantification of the number of procedures required to have a chance of demonstrating safety

The minimum number of procedures to demonstrate safety at an individual hospital was calculated, and this required there to be no deaths. For a hospital with the national average death rate, the probability of no deaths in the minimum number of procedures was calculated, as were the numbers of procedures required for 90 and 95 per cent probability of demonstrating safety. For the corresponding numbers of procedures the probability that the 95 per cent confidence interval for the death rate exceeded the safety threshold was determined.
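
As a rough illustration of this calculation, the sketch below treats the threshold death rate as fixed rather than estimated from the data (a simplifying assumption); the probability of zero deaths in n procedures at rate r is then (1 − r)^n, which gives numbers very close to those reported in the Results.

```python
# Rough sketch, assuming the threshold rate is treated as fixed rather than estimated:
# with zero deaths the one-tailed P value against the threshold is simply (1 - rate)^n.
from math import ceil, log

national_rate = 0.074                 # national average mortality for elective AAA repair
threshold = 2 * national_rate         # k = 2 safety threshold

# Smallest number of procedures at which zero deaths gives P <= 0.05 against the threshold
# (the exact joint test described above gives 20, as reported in the Results).
n_min = ceil(log(0.05) / log(1 - threshold))
print(n_min)

# Probability that a hospital running at the national average has no deaths in 20 procedures.
print((1 - national_rate) ** 20)      # about 0.215, close to the 21.6 per cent reported
```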

Moving average chart

To address the issue of local mortality monitoring, a moving average chart using local PAS data displayed the moving average of 180 days of discharges as a continuous trace. The local average mortality for 5 years was shown as a horizontal line with upper and lower 95 per cent binomial confidence limits based on the average number of procedures in 180 days. The chart also showed the average mortality for other hospitals for the most recently available HES year.

Risk adjustment for age and sex was incorporated in the moving average. A logistic regression model based on 5 years of local PAS data was used to calculate the odds ratios for age and sex. These were used to determine the expected mortality corresponding to the age and sex case mix of each 180-day interval. The moving average mortality shown was the average mortality for 5 years multiplied by the ratio of the actual to expected mortality for the 180-day interval. The 5-year average mortality was not adjusted for case mix.
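
A minimal sketch of such a chart is given below. It is illustrative only: the file name, column names and libraries are assumptions rather than the authors' system. Odds ratios for age and sex come from a logistic regression on the local data, expected deaths are accumulated over each 180-day window of discharges, and the 5-year local rate is scaled by the observed-to-expected ratio.

```python
# Illustrative sketch only; the file and column names are hypothetical, not the authors' schema.
import pandas as pd
import statsmodels.api as sm
from scipy.stats import binom

pas = pd.read_csv("local_pas_elective_aaa.csv", parse_dates=["discharge_date"])
pas = pas.sort_values("discharge_date").set_index("discharge_date")
# assumed columns: died (0/1), age (years), male (0/1)

# Logistic regression on 5 years of local data: odds ratios for age and sex give an
# expected (case-mix specific) risk of death for every patient.
X = sm.add_constant(pas[["age", "male"]])
fit = sm.Logit(pas["died"], X).fit(disp=False)
pas["expected"] = fit.predict(X)

five_year_rate = pas["died"].mean()            # unadjusted 5-year local average

# Observed and expected deaths over a moving 180-day window of discharges.
obs_deaths = pas["died"].rolling("180D").sum()
exp_deaths = pas["expected"].rolling("180D").sum()
moving_average = five_year_rate * (obs_deaths / exp_deaths)   # 5-year rate scaled by O/E

# 95 per cent binomial limits for the 5-year rate, based on the average number of
# procedures in a 180-day window (the chart's upper and lower reference lines).
n_window = int(round(pas["died"].rolling("180D").count().mean()))
lower, upper = binom.interval(0.95, n_window, five_year_rate)
print(moving_average.tail())
print(lower / n_window, upper / n_window)
```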

Results

Between 1 April 2000 and 31 March 2005 there were 112 527 diagnoses or repairs of aortic aneurysms in the UK. Of these, 26 822 were infrarenal AAA operative repairs: 15 515 elective, 4845 urgent and 6462 ruptured. The mean mortality rate for elective repairs was 7·4 per cent.

Comparison with the national average mortality rate

With the safety threshold set equal to the national average (Fig. 1), there was evidence that 30 (7·3 per cent) of 410 hospitals had mortality rates significantly above the national average. Additionally, these hospitals tended to provide low annual volumes of surgery. A minimum of 39 elective AAA repairs was needed to assess safety, although the probability of being able to assess safety at this level was low.

Figure 1. Safety plot for in-hospital mortality after elective abdominal aortic aneurysm repair with the safety threshold set equal to the average elsewhere (k = 1). Hospitals above the upper dashed line had evidence of danger and those below the lower dashed line had evidence of safety; those falling between the two dashed lines had insufficient evidence of safety or danger. RR, relative risk

Most of the hospitals were grouped within the control bands, which meant that there was insufficient statistical evidence to state that the mortality rate was greater, or less, than the national average. A large number of hospitals had a RR greater than 1, despite lying within the control bands, that is the mortality rate was greater than that elsewhere, but it was not possible to show that these results were consistently greater than the national average.

Comparison with twice the national average mortality rate

There was evidence that the mortality rate for a number of hospitals was significantly less than twice the national average (Fig. 2). However, there were three hospitals for which there was sufficient evidence to state that the mortality rate was consistently more than twice the national average (mortality rate nearly 15 per cent). There was greater than a 95 per cent probability that the mortality rate at these hospitals was more than twice the national average, although the figures were not adjusted for co-morbidity.

Figure 2. Safety plot for in-hospital mortality after elective abdominal aortic aneurysm repair with the safety threshold set at twice the average elsewhere (k = 2). Hospitals above the upper dashed line had evidence of danger and those below the lower dashed line had evidence of safety; those falling between the two dashed lines had insufficient evidence of safety or danger. RR, relative risk

Twenty elective AAA repairs were needed to assess the safety of a hospital at a threshold of twice the national average mortality rate, with a probability of demonstrating safety of 21·6 per cent.

Moving average chart

A moving average chart was constructed using local PAS data to demonstrate local mortality monitoring (Fig. 3). The local mortality rate since 2002 was 3·7 per cent. The moving average over the last 180 days of discharges (1·6 per cent) remained within the confidence limits, despite fluctuations in mortality.

Figure 3. Moving average chart using Patient Administration System data from 2002 onwards. The bold horizontal line represents the national average mortality rate for 2000–2005 (7·4 per cent). The moving average (blue line) is the local average mortality rate over the last 180 days of discharges (1·6 per cent). The middle red line is the local average mortality rate over 5 years (3·7 per cent), with the upper and lower red lines representing the 95 per cent binomial confidence limits based on the average number of procedures in 180 days. Data were adjusted for age and sex, and included elective cases only

Tests of homogeneity of death rates across hospitals

An overall χ2 test for differences between hospitals gave P = 0·0000035. However, 55 per cent of cells had expected counts less than 5, so the precise P value was not reliable. Use of Fisher's exact test was not practical for this quantity of data. The three hospitals with the greatest evidence of danger had significantly high death rates after applying the Bonferroni correction to individual χ2 tests to allow for multiple testing of all hospitals; however, only for the least extreme of the three hospitals were all cell counts 5 or more. When the Bonferroni correction was applied to two-tailed P values from the safety test, which used the maximum likelihood estimate, the result at the most extreme hospital remained statistically significant by a small margin. The hypothesis of homogeneity was not supported by these data.

Quantification of the number of procedures needed to demonstrate safety

On the k = 2 safety plot, 20 procedures with no deaths was the minimum number required to demonstrate safety. For a hospital with the national average mortality rate, the probability of achieving this was 21·6 per cent. For a 90 or 95 per cent probability of demonstrating safety, 100 and 200 procedures respectively were required. The probability that conventional 95 per cent confidence limits excluded twice the national average was similar: for 20 procedures the binomial exact probability was 21·6 per cent, and for 200 procedures the usual normal-approximation 95 per cent limits had a 96·0 per cent probability of excluding twice the national average.

A minimum of 200 procedures was required for a probability greater than 95 per cent of demonstrating safety. This was an average of 40 procedures per annum, which was consistent with the volume thresholds identified in a recent meta-analysis and UK assessment of the relationship between hospital volume and outcome for elective AAA repair (40 and 32 procedures per annum respectively)4, 21.

Discussion

Using contemporary UK data, this study demonstrated how in-hospital death rates could be monitored and compared with the national average. The example of elective AAA repair was used, although the methods could be applied to any coded procedure, diagnosis or specialty. In this example, sufficient evidence was found to conclude that some hospitals had mortality rates that required further investigation.

The techniques described and developed here added further evidence to the concept that increasing surgical volume reduced mortality for high-risk procedures4, 21–24. They should not be interpreted alone, but used in conjunction with other measures of risk and statistical analyses, such as risk-adjusted mortality rates and mortality control charts25, 26.

If any test of safety is to be useful it must have well defined thresholds at which further investigations of results should be initiated. The techniques presented here were based on a belief that a local mortality rate exceeding twice the national average and reaching a 0·05 level of significance constituted a breach of the threshold, and therefore presented sufficient evidence to instigate investigation. Three centres met these criteria on the k = 2 safety plot.

Natural fluctuations in surgical death rates may exist owing to the heterogeneity of the patient population, and to external factors such as healthcare economics and the financial resources available within the NHS. Because of this possibility the investigation was repeated at two threshold values. However, it is unlikely that this would explain a 15 per cent mortality rate for elective AAA repairs, although rigorous local audit would be required before any definitive statement on individual unit mortality could be made. The use of such data, which were not risk adjusted, to make any decisions on local treatment protocols would be flawed without internal and external review.

Despite the generous allowance for natural variations included in the investigation, significant evidence remained that three hospitals in the UK had mortality rates greater than 14·8 per cent for elective AAA repairs. Thirty hospitals had mortality rates consistently greater than 7·4 per cent. Conversely, some hospitals performed a large volume of surgery and maintained mortality rates of less than 5 per cent per annum, and provided evidence of safety.

The moving average chart provided a way in which a clinician could convey information on local mortality rates to patients, and compare them directly with national data. The charts used the most current data available, whereas HES data are historical by the time of publication. This addresses directly two issues raised explicitly in the Bristol Inquiry1, 2: stating misleading risk estimates to patients, and failing to consider referring a patient to another hospital if local mortality rates are higher than elsewhere.

Identification of hospitals with mortality rates statistically above the national average raised the question of whether the hospital management, or physicians, were aware. Many hospitals do not have mortality monitoring groups26, 27 that can identify trends in mortality. In addition to the visual aid of the moving average chart, a cumulative summation26, 28, 29 technique, such as the cumulative risk-adjusted mortality chart28, should be used as an alert when the death rate within a particular department changes suddenly. Any hospital can set up these control charts using locally available PAS data. They allow real-time analysis of local mortality rates, along with long-term local average death rates, comparison with the national average or a fixed standard, and visual reference to alarm rates.
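
One simple variant, shown here only as a generic sketch and not as the specific cumulative risk-adjusted mortality method cited above, is a running total of observed minus expected deaths, which drifts upwards when mortality exceeds what the case mix predicts.

```python
# Generic sketch of a cumulative observed-minus-expected mortality trace; illustrative
# only, not the specific cumulative risk-adjusted mortality chart referenced in the text.
import pandas as pd

def cumulative_excess_deaths(died, expected_risk):
    """Running total of observed deaths minus expected deaths, in procedure order."""
    return (pd.Series(died) - pd.Series(expected_risk)).cumsum()

# Hypothetical example: three deaths in ten consecutive cases, each with 7 per cent risk.
trace = cumulative_excess_deaths([0, 0, 1, 0, 0, 1, 0, 0, 0, 1], [0.07] * 10)
print(trace.tolist())
```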

These data must be risk adjusted if the results are to be interpreted correctly16, 30. In this study only age and sex were used for risk adjustment in the moving average plot, and only elective admissions were considered, as these factors have the largest impact on mortality rates. However, as methods for mortality monitoring are developed, other systems that require further clinical data such as the Surgical Mortality Score30 are desirable.

No age or sex adjustment was used in the safety plots because, at present, no suitable adjustment technique using HES data has been described. This applies especially to the safety plot comparing mortality at a particular hospital with twice that elsewhere. Such adjustment would be possible using logistic regression rather than bivariate binomial analyses, but these techniques need development and validation before they are applied to such data sets.

As there was significant evidence of heterogeneity, the conclusion of non-random variation was forced (that is there was a true variation in the mortality rate between centres). It did not follow that the exceptions to random variation necessarily occurred at the extremes of death rates, as there could be a hospital with a modestly divergent death rate but with a sufficient number of procedures to cause the statistical exception. Additionally, in the present stage of knowledge any variation in death rate should be linked only cautiously with quality of care.

The minimum number of procedures required to comment on the in-hospital death rate of a hospital was assessed in order that the data might be interpreted meaningfully. The results of recent studies of the relationship between hospital annual volume and mortality for AAA repair4 suggested that all hospitals aiming to provide AAA services should be performing at least 32 elective AAA repairs per annum. This study added a further dimension to the argument that hospitals should be required to perform a minimum threshold volume of any named elective procedure18. Not only has increased volume been shown to improve outcome4, 21, 31–33 but it contributes to the accurate assessment of local inpatient death rates.

Dr Foster11, a commercial analyser of NHS data, creates league tables based on standardized mortality ratios (observed to expected mortality) at the level of the hospital for certain index procedures. A second company, CHKS Ltd12, provides analytical benchmarking to the NHS through the production of league tables based on risk-adjusted mortality rates, complication rates, duration of hospital stay and readmission rates. To present data in these ways is not particularly useful to clinicians or informative to patients and, in fact, may mislead patient decision-making. Data must be available by procedure for useful comparisons to be made. The techniques described here allow procedural level data to be compared with the average elsewhere and for mortality rates to be examined at individual hospitals while avoiding the significant pitfalls of league tables. This study has not named any hospitals directly or produced a one-dimensional league table, but it has revealed that there was evidence of safety for some hospitals but not for others.

References
