In the UK, clinical governance evolved following the General Medical Council inquiry into mortality rates after paediatric cardiac surgery1, 2. However, there is no consensus as to how individual unit mortality rates should be evaluated; defining a high mortality rate for a particular procedure is difficult owing to significant variations in case mix3.
A recent study of elective abdominal aortic aneurysm (AAA) repair using discharge data from England showed that the national mean mortality rate was 7·4 per cent between 2001 and 20054. This value was higher than that reported in recent national trials, even after risk adjustment5. The English discharge data also demonstrated that mortality rates at individual hospitals varied widely and were related to the annual hospital volume.
In reference to cardiology and trauma services, the Institute for Public Policy Research suggested that the closure of less effective services at smaller local hospitals would improve survival rates6, as modern interventions are complex, expensive and could be provided at a smaller number of specialist hospitals. This policy appears to have the support of the present UK Government, with the Prime Minister stating recently that patients will have to travel further to access specialist care7 (which is contrary to previous proposals8–10).
Hospital league tables have become topical in the popular press, despite arousing strong feelings within the medical community. In the past, schemes such as Dr Foster11 and CHKS12 have ranked a hospital's performance on the basis of crude hospital-level mortality rates, although case-mix adjustment has been added more recently. These tables often present mortality figures for the hospital as a single entity, rather than examining individual specialties and procedures, beyond certain ‘index’ procedures such as fractured neck of femur. There is now a proposal to rank individual physician and surgeon mortality rates in league tables, as in cardiothoracic surgery.
The use of crude mortality figures is inappropriate as case mix and case selection play a pivotal role in the outcomes of medical interventions, with older patients with greater co-morbidity having a higher mortality rate. As case mix varies widely between hospitals, any interpretation of league tables based on crude mortality rates will be inaccurate and unreliable. Ultimately, the publication of unadjusted data may drive hospitals and physicians to treat younger, fitter patients, and deny treatment to patients who are elderly or have significant co-morbidity13. Alternatively, cases may be coded with inappropriate complexity so that the hospital comes out better in the rankings, a behaviour that is widely recognized, especially in the USA where league tables have been published for some years13–16.
An alternative to league tables could be a system developed to assess the mortality rates at individual hospitals, and to suggest mechanisms for monitoring the mortality rate locally. The theoretical basis of this study was to assess mortality rates and to investigate whether hospitals could provide evidence of safety, rather than merely no evidence of danger. The study approached the question of clinical governance and mortality monitoring, using elective AAA repairs as an example. It used Hospital Episode Statistics (HES) data17 between 1 April 2000 and 31 March 2005, as well as local Patient Administration System (PAS) data from the authors' hospital, and built on the findings of a previous theoretical study18.
HES data for all consultant episodes in National Health Service (NHS) hospitals in England were acquired for the years 2000–2005. Data were extracted for elective infrarenal AAA repair using previously published methods and inclusion criteria4, 19. The data included details of the admitting hospital, patient age and sex, duration of hospital stay, timing of operations relative to date of admission, complications, and whether the patient was discharged alive or dead. The quality of this data set was assessed and validated4, 20.
Local data, obtained from St George's Hospital PAS database using the same inclusion criteria, were used to validate the HES data, and construct a method for local mortality monitoring.
Safety charts: the quantification of safety or danger
Safety charts were constructed to show for each hospital the strength of evidence that the death rate was below a threshold equal to the national average and below a threshold equal to twice the national average. If there was statistically significant evidence that the death rate was below the threshold, this was described as evidence of safety. If the evidence was not statistically significant, there was insufficient evidence of safety. If there was statistically significant evidence that the death rate exceeded the threshold, this was described as evidence of danger. Insufficient evidence of safety or danger was considered unhelpful and not reported.
The mortality rate in each hospital was compared with the national mean mortality rate, after the exclusion of the data from that hospital (that is compared with the mortality rate elsewhere). This provided the relative risk (RR) of mortality at each hospital. The investigation was performed twice, comparing each hospital with the national average mortality rate (k = 1) and with twice the national average mortality rate (k = 2). The national average and twice the national average mortality rates were termed safety thresholds in this study. The risk ratios were calculated relative to the estimated risk in the comparison group for both investigations (k = 1 and k = 2), rather than to the safety threshold.
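As a minimal illustration, the relative risk of mortality at one hospital versus the rate elsewhere can be sketched in Python (the function name and arguments are illustrative, not part of the study):

```python
def relative_risk(deaths_h, procs_h, deaths_all, procs_all):
    """Relative risk of death at one hospital versus all other hospitals.

    The hospital's own cases are first excluded from the national
    totals, so the comparison is with the mortality rate 'elsewhere',
    as described in the text.
    """
    deaths_elsewhere = deaths_all - deaths_h
    procs_elsewhere = procs_all - procs_h
    rate_hospital = deaths_h / procs_h
    rate_elsewhere = deaths_elsewhere / procs_elsewhere
    return rate_hospital / rate_elsewhere
```

For example, a hospital with two deaths in 100 repairs, against 74 deaths in the remaining 1000 repairs nationally, would have a relative risk of about 0·27.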
When the safety threshold was equal to the national average, the hospitals were arranged into two groups: hospitals with a RR less than 1 and those with a RR of 1 or more.
With the safety threshold set at twice the national average mortality, the data were arranged into three groups: hospitals with a RR less than 1, hospitals with a RR between 1 and 2, and hospitals with a RR greater than or equal to 2.
To assess the evidence for safety or danger of individual hospitals, mortality at each hospital was compared with the mortality elsewhere, that is national data excluding data from the individual hospital. The number of procedures was considered to be fixed both for the individual hospital and for all other hospitals combined, and the numbers of deaths to be independently binomially distributed. For a safety threshold equal to the national average, the joint distribution for the numbers of deaths was based on a common death rate using the maximum likelihood estimate, that is the combined mean national death rate.
The P value for an individual hospital diverging by as much as observed, or more, from the safety threshold was determined as the sum of probabilities from the joint distribution for all possible pairs of number of deaths for which the RR was as extreme, or more extreme, than observed. For a safety threshold equal to twice the mortality elsewhere (k = 2), the maximum likelihood estimate for a mortality rate exactly twice that elsewhere was used instead of the national mean.
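A simplified sketch of this exact calculation for the k = 1 case, where the maximum likelihood estimate reduces to the pooled national rate, is given below; the full study also handles k = 2, for which the estimate comes from the quadratic described later. Function names are illustrative:

```python
from math import comb

def binom_pmf(d, n, p):
    # Binomial probability of d deaths in n procedures at rate p
    return comb(n, d) * p**d * (1 - p) ** (n - d)

def exact_p_value(d1, n1, d2, n2):
    """One-tailed exact P value that the hospital of interest
    (d2 deaths in n2 procedures) diverges from the k = 1 threshold,
    i.e. a rate equal to that elsewhere (d1 deaths in n1 procedures).

    The joint probability of every (i, j) pair of death counts whose
    relative risk is as extreme as, or more extreme than, observed is
    summed under the null.  Cross-multiplication (j*d1 vs d2*i, the
    n's cancel) avoids dividing by zero when a count is zero.
    """
    p_hat = (d1 + d2) / (n1 + n2)      # pooled MLE under k = 1
    danger_side = d2 * n1 >= d1 * n2   # observed RR >= 1?
    total = 0.0
    for i in range(n1 + 1):            # deaths elsewhere
        pi = binom_pmf(i, n1, p_hat)
        for j in range(n2 + 1):        # deaths at the hospital
            as_extreme = j * d1 >= d2 * i if danger_side else j * d1 <= d2 * i
            if as_extreme:
                total += pi * binom_pmf(j, n2, p_hat)
    return total
```

The double loop is exact but O(n1 x n2), so for national denominators a vectorized or approximate implementation would be needed in practice.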
For hospitals with death rates below (or above) the threshold value, the one-tailed P values indicated the strength of evidence of safety (or danger). P values were displayed on a scale of log10(odds) to distinguish small values that differed by orders of magnitude. Odds were used rather than P values to exploit the fact that log(odds) are equal to 0 for P = 0·5 and so evidence of safety and danger can be shown in different directions on the y-axis. P values were shown without adjustment for the multiplicity of testing implied by comparing mortality at each hospital with mortality elsewhere.
Log(odds) of 1·3, equivalent to one-tailed P = 0·050, was indicated by horizontal lines on the charts. Hospitals that lay outside the two lines generated a significant weight of evidence that their mortality rate was inconsistent with the threshold value, being either higher or lower. Hospitals that lay within the ‘control bands’ may still have had a RR of mortality greater than, or greater than twice, the national average, but there was insufficient evidence to be able to identify them as safe or unsafe. Overall, this technique provided three alternative states into which hospitals fell: evidence of safety, insufficient evidence of safety or danger, or evidence of danger.
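The log10(odds) scale used on the y-axis can be sketched as follows; the sign convention for plotting safety upwards and danger downwards is an assumption of this sketch:

```python
from math import log10

def log10_odds(p_value, safe_side):
    """Map a one-tailed P value onto the chart's log10(odds) scale.

    log10((1 - P) / P) is 0 at P = 0.5 and about 1.3 at P = 0.05,
    matching the horizontal control lines described in the text.
    Flipping the sign for the danger side is a plotting convention
    assumed here.
    """
    magnitude = log10((1 - p_value) / p_value)
    return magnitude if safe_side else -magnitude
```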
Maximum likelihood estimator for the safety test
Under the null hypothesis that the death rate is a constant, p, at the hospital of interest while it is exactly p/k elsewhere, the maximum likelihood estimate for p can be determined by standard methods. The numbers of procedures at the hospital, n2, and elsewhere, n1, were assumed to be fixed. The corresponding numbers of deaths, d2 and d1, were not assumed to be fixed. This is an important difference from the χ2 test and many other tests of a contingency table. A further assumption was that the deaths at the hospital of interest were independent of one another, and that the same applied elsewhere. Thus, the joint likelihood of d1 and d2 is the product of two binomial probabilities. The maximum likelihood estimate is obtained by differentiating the log(likelihood) with respect to p and setting the differential equal to 0. The maximum likelihood estimate is a solution of the quadratic equation a·p2 + b·p + c = 0, where a = n1 + n2, b = − (d1·k + n1 + n2·k + d2) and c = (d1 + d2)·k.
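The quadratic above can be solved directly; a short Python sketch (illustrative function name) is:

```python
from math import sqrt

def mle_death_rate(d1, n1, d2, n2, k):
    """Maximum likelihood estimate of the death rate p at the hospital
    of interest, under the null hypothesis that the rate is p there
    and exactly p/k elsewhere.

    p solves a*p**2 + b*p + c = 0 with the coefficients given in the
    text; the smaller root is the valid probability in [0, 1].
    """
    a = n1 + n2
    b = -(d1 * k + n1 + n2 * k + d2)
    c = (d1 + d2) * k
    disc = sqrt(b * b - 4 * a * c)
    for p in ((-b - disc) / (2 * a), (-b + disc) / (2 * a)):
        if 0 <= p <= 1:
            return p
    raise ValueError("no root in [0, 1]")
```

As a check, with k = 1 the estimate reduces to the pooled death rate (d1 + d2)/(n1 + n2), as expected for a common rate.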
Homogeneity of death rates across hospitals
The hypothesis that the death rate is the same at all hospitals is intrinsically unlikely and is irrelevant to demonstrating safety at an individual hospital. The χ2 test was used to test homogeneity. All hospitals were tested in a single test, and the death rate at each individual hospital was compared with the rate elsewhere using the Bonferroni correction to the χ2 test. A significant result of a test of a contingency table with fixed totals does not imply anything about the magnitude of the difference in death rate. The Bonferroni correction was also applied to the safety test using the maximum likelihood estimator. A significant result implied that there were hospitals where the death rate was above the threshold and could not be explained away by chance alone.
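A per-hospital χ2 comparison with Bonferroni correction can be sketched without a statistical library, since for 1 degree of freedom P(X > x) = erfc(sqrt(x/2)). The functions below are illustrative:

```python
from math import erfc, sqrt

def chi2_p_value(d1, n1, d2, n2):
    """P value of Pearson's chi-squared test (1 d.f., no continuity
    correction) comparing deaths and survivors at one hospital
    (d2 of n2) with those elsewhere (d1 of n1)."""
    total = n1 + n2
    deaths = d1 + d2
    stat = 0.0
    for obs, n in ((d1, n1), (d2, n2)):
        exp_d = n * deaths / total            # expected deaths
        exp_s = n * (total - deaths) / total  # expected survivors
        stat += (obs - exp_d) ** 2 / exp_d
        stat += ((n - obs) - exp_s) ** 2 / exp_s
    return erfc(sqrt(stat / 2))

def bonferroni_significant(p, n_hospitals, alpha=0.05):
    # Each of the n_hospitals per-hospital tests is judged
    # at the corrected level alpha / n_hospitals
    return p < alpha / n_hospitals
```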
Quantification of the number of procedures required to have a chance of demonstrating safety
The minimum number of procedures to demonstrate safety at an individual hospital was calculated, and this required there to be no deaths. For a hospital with the national average death rate, the probability of no deaths in the minimum number of procedures was calculated, as were the numbers of procedures required for 90 and 95 per cent probability of demonstrating safety. For the corresponding numbers of procedures the probability that the 95 per cent confidence interval for the death rate exceeded the safety threshold was determined.
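A simplified version of this calculation, which compares against a fixed threshold rate rather than the exact mortality-elsewhere test used in the study, can be sketched as:

```python
from math import ceil, log

def min_procedures_for_safety(threshold_rate, alpha=0.05):
    """Smallest number of consecutive deathless procedures giving a
    one-tailed P value below alpha against a death rate fixed at the
    safety threshold: the least n with (1 - rate)**n <= alpha.

    This is a simplification; the study's exact test compares with
    the estimated rate elsewhere rather than a fixed rate.
    """
    return ceil(log(alpha) / log(1 - threshold_rate))

def prob_zero_deaths(true_rate, n):
    # Chance that a hospital whose true rate is 'true_rate' records
    # no deaths in n procedures, and so can demonstrate safety
    return (1 - true_rate) ** n
```

With the k = 2 threshold of 14·8 per cent this simplified test requires 19 deathless procedures, and a hospital operating at the national average of 7·4 per cent would achieve that run with probability of roughly 0·23, illustrating why far larger volumes are needed for a high probability of demonstrating safety.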
Moving average chart
To address the issue of local mortality monitoring, a moving average chart using local PAS data displayed the moving average of 180 days of discharges as a continuous trace. The local average mortality for 5 years was shown as a horizontal line with upper and lower 95 per cent binomial confidence limits based on the average number of procedures in 180 days. The chart also showed the average mortality for other hospitals for the most recently available HES year.
Risk adjustment for age and sex was incorporated in the moving average. A logistic regression model based on 5 years of local PAS data was used to calculate the odds ratios for age and sex. These were used to determine the expected mortality corresponding to the age and sex case mix of each 180-day interval. The moving average mortality shown was the average mortality for 5 years multiplied by the ratio of the actual to expected mortality for the 180-day interval. The 5-year average mortality was not adjusted for case mix.
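A minimal unadjusted version of the moving average, omitting the age and sex adjustment and using an illustrative in-memory stand-in for PAS data, can be sketched as:

```python
def moving_average_mortality(discharges, window_days=180):
    """180-day trailing moving average mortality.

    'discharges' is a list of (day_number, died) tuples sorted by
    day, a simplified stand-in for PAS discharge records.  Returns
    one (day, rate) point per discharge, averaging over the trailing
    window; no case-mix adjustment is applied in this sketch.
    """
    points = []
    for day, _ in discharges:
        window = [d for d in discharges if day - window_days < d[0] <= day]
        deaths = sum(died for _, died in window)
        points.append((day, deaths / len(window)))
    return points
```

The adjusted trace described above would then scale each windowed rate by the ratio of observed to expected mortality from the logistic regression model.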
Using contemporary UK data, this study demonstrated how in-hospital death rates could be monitored and compared with the national average. The example of elective AAA repair was used, although the methods could be applied to any coded procedure, diagnosis or specialty. In this example, sufficient evidence was found to conclude that some hospitals had mortality rates that required further investigation.
The techniques described and developed here added further evidence to the concept that increasing surgical volume reduced mortality for high-risk procedures4, 21–24. They should not be interpreted alone, but used in conjunction with other measures of risk and statistical analyses, such as risk-adjusted mortality rates and mortality control charts25, 26.
If any test of safety is to be useful it must have well defined thresholds at which further investigations of results should be initiated. The techniques presented here were based on a belief that a local mortality rate exceeding twice the national average and reaching a 0·05 level of significance constituted a breach of the threshold, and therefore presented sufficient evidence to instigate investigation. Three centres met these criteria on the k = 2 safety plot.
It was possible that natural fluctuations in surgical death rates existed owing to the heterogeneity of the patient population, and to external factors such as healthcare economics and the financial resources within the NHS. Because of this possibility the investigation was repeated at two threshold values. However, it is unlikely that this would explain a 15 per cent mortality rate for elective AAA repairs, although rigorous local audit would be required before any definitive statement on individual unit mortality could be made. The use of such data, which were not risk adjusted, to make any decisions on local treatment protocols would be flawed without internal and external review.
Despite the generous allowance for natural variations included in the investigation, significant evidence remained that three hospitals in the UK had mortality rates greater than 14·8 per cent for elective AAA repairs. Thirty hospitals had mortality rates consistently greater than 7·4 per cent. Conversely, some hospitals performed a large volume of surgery and maintained mortality rates of less than 5 per cent per annum, and provided evidence of safety.
The moving average chart provided a way in which a clinician could convey information on local mortality rates to patients, and compare them directly with national data. The charts used the most current data available, while HES are historical by the time of publication. This deals directly with two issues raised explicitly in the Bristol Inquiry1, 2, namely stating misleading risk estimates to patients, and a failure to consider referring a patient to another hospital if the local mortality rates are higher than elsewhere.
Identification of hospitals with mortality rates statistically above the national average raised the question of whether the hospital management, or physicians, were aware. Many hospitals do not have mortality monitoring groups26, 27 that can identify trends in mortality. In addition to the visual aid of the moving average chart, a cumulative summation26, 28, 29 technique, such as the cumulative risk-adjusted mortality chart28, should be used as an alert when the death rate within a particular department changes suddenly. Any hospital can set up these control charts using locally available PAS data. They allow real-time analysis of local mortality rates, along with long-term local average death rates, comparison with the national average or a fixed standard, and visual reference to alarm rates.
These data must be risk adjusted if the results are to be interpreted correctly16, 30. In this study only age and sex were used for risk adjustment in the moving average plot, and only elective admissions were considered, as these factors have the largest impact on mortality rates. However, as methods for mortality monitoring are developed, other systems that require further clinical data such as the Surgical Mortality Score30 are desirable.
No age or sex adjustment was used in the safety plots because, at present, no technique for such adjustment using HES data has been described. This was especially applicable to the safety plot comparing mortality at a particular hospital with twice that elsewhere. This adjustment would be possible using logistic regression rather than bivariate binomial analyses, but these techniques need development and validation before they are applied to such data sets.
As there was significant evidence of heterogeneity, the conclusion of non-random variation was forced (that is there was a true variation in the mortality rate between centres). It did not follow that the exceptions to random variation necessarily occurred at the extremes of death rates, as there could be a hospital with a modestly divergent death rate but with a sufficient number of procedures to cause the statistical exception. Additionally, in the present stage of knowledge any variation in death rate should be linked only cautiously with quality of care.
The minimum number of procedures required to comment on the in-hospital death rate of a hospital was assessed in order that the data might be interpreted meaningfully. The results of recent studies of the relationship between hospital annual volume and mortality for AAA repair4 suggested that all hospitals aiming to provide AAA services should be performing at least 32 elective AAA repairs per annum. This study added a further dimension to the argument that hospitals should be required to perform a minimum threshold volume of any named elective procedure18. Not only has increased volume been shown to improve outcome4, 21, 31–33 but it contributes to the accurate assessment of local inpatient death rates.
Dr Foster11, a commercial analyser of NHS data, creates league tables based on standardized mortality ratios (observed to expected mortality) at the level of the hospital for certain index procedures. A second company, CHKS Ltd12, provides analytical benchmarking to the NHS through the production of league tables based on risk-adjusted mortality rates, complication rates, duration of hospital stay and readmission rates. To present data in these ways is not particularly useful to clinicians or informative to patients and, in fact, may mislead patient decision-making. Data must be available by procedure for useful comparisons to be made. The techniques described here allow procedural level data to be compared with the average elsewhere and for mortality rates to be examined at individual hospitals while avoiding the significant pitfalls of league tables. This study has not named any hospitals directly or produced a one-dimensional league table, but it has revealed that there was evidence of safety for some hospitals but not for others.