‘Measurement for Improvement Not Judgement’ – the Case of Percutaneous Nephrolithotomy



If outcome data are to be measured, reported and used for improvement, they must be derived from the most accurate and robust source possible, their collection must engage clinicians, and they must be processed using sound methodology. We discuss this using the example of percutaneous nephrolithotomy (PCNL) in the UK.

From Where Should We Obtain Data?

The Hospital Episode Statistics (HES) database is a routine administrative dataset, capturing information on all English NHS hospital admissions and providing an overview of an individual patient's care [1]. It is collected prospectively by trained clinical coders. There are both advantages and disadvantages in data collection being non-clinician led. The accuracy of data reflects the accuracy of coding, which is improving but may vary in quality [2-4].

Unlike routinely captured administrative data, registry datasets are designed and submitted by clinicians to capture the information that is felt to be clinically relevant. For example, the BAUS PCNL registry records rates of stone clearance postoperatively, a level of detail that is not available in HES data [5]. Furthermore, the registry records information on stone burden and complexity, which is thought to be highly important for risk adjustment when comparing PCNL outcomes.

However, most registry data are submitted by surgeons voluntarily. Inevitably, this results in an incomplete dataset, leading to an inherent bias when attempting to compare the outcomes of different surgeons and centres. Comparison of data recorded in HES with those collected from the BAUS PCNL data registry demonstrates the relative incompleteness of the registry data [5, 6]. The registry data were extracted for a period of 20 months, during which 50 centres submitted data on 1028 procedures. By comparison, the HES data, over almost 5 years, captured 6783 procedures from 165 centres. There also remains a reasonable concern that ‘delayed outcomes’ (e.g. emergency hospital readmission) may not be reliably reported by clinicians, whereas HES data may be used to follow patients indefinitely through different hospital admissions.
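To make the comparison of the two sources concrete, the figures quoted above can be placed on a common per-month scale. The sketch below is a rough illustration only: the text gives the HES window as "almost 5 years", so the 60-month value is an assumption, and the two extracts may not cover the same calendar period or the same population.

```python
# Figures quoted in the text. The 60-month HES window is an ASSUMPTION:
# the text says only "almost 5 years".
registry = {"procedures": 1028, "centres": 50, "months": 20}
hes = {"procedures": 6783, "centres": 165, "months": 60}


def monthly_rate(source):
    """Procedures recorded per calendar month of data extraction."""
    return source["procedures"] / source["months"]


registry_rate = monthly_rate(registry)
hes_rate = monthly_rate(hes)

# Crude indicators of registry coverage relative to HES.
capture_ratio = registry_rate / hes_rate
centre_ratio = registry["centres"] / hes["centres"]

print(f"Registry: {registry_rate:.1f} procedures/month")
print(f"HES:      {hes_rate:.1f} procedures/month")
print(f"Registry rate as a share of HES rate: {capture_ratio:.0%}")
print(f"Registry centres as a share of HES centres: {centre_ratio:.0%}")
```

Under these assumptions the registry records procedures at a little under half the monthly rate seen in HES, from under a third as many centres. Because 60 months is an upper bound on the HES window, the true share is, if anything, likely to be lower.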

Therefore, for evaluating PCNL outcomes, a dataset linking HES data with registry data may represent the best way forward, as long as concerns persist about the completeness of registry data. The two data sources may be used in conjunction to corroborate findings, and the differing strengths of each data source may be used to address the other's limitations [6]. Moreover, the Society of Cardiothoracic Surgeons (SCTS) has endorsed the use of national registry data, enriched with HES data, for the reporting of surgeon-level outcomes [7].
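A minimal sketch of what such linkage might look like is given below. It assumes, purely for illustration, that both sources share a pseudonymised patient identifier and a procedure date; the field names and records are hypothetical and do not reflect the actual HES or BAUS registry schemas, and real linkage would need to handle inexact matches.

```python
from datetime import date

# Hypothetical, simplified records; field names are illustrative only.
registry_records = [
    {"patient_id": "A1", "procedure_date": date(2011, 3, 2), "stone_free": True},
    {"patient_id": "B2", "procedure_date": date(2011, 4, 9), "stone_free": False},
]
hes_records = [
    {"patient_id": "A1", "procedure_date": date(2011, 3, 2), "readmitted_30d": False},
    {"patient_id": "B2", "procedure_date": date(2011, 4, 9), "readmitted_30d": True},
    {"patient_id": "C3", "procedure_date": date(2011, 5, 1), "readmitted_30d": False},
]


def link(registry, hes):
    """Join the two sources on (patient_id, procedure_date).

    Returns the linked records plus the HES episodes with no registry
    entry, which give an indication of registry incompleteness.
    """
    by_key = {(r["patient_id"], r["procedure_date"]): r for r in registry}
    linked, hes_only = [], []
    for episode in hes:
        key = (episode["patient_id"], episode["procedure_date"])
        match = by_key.get(key)
        if match is None:
            hes_only.append(episode)
        else:
            # Clinical detail from the registry plus follow-up from HES.
            linked.append({**match, **episode})
    return linked, hes_only


linked, hes_only = link(registry_records, hes_records)
print(f"{len(linked)} linked procedures, {len(hes_only)} found in HES only")
```

Each linked record carries the clinically rich registry fields (here, stone clearance) alongside the delayed outcomes that HES can follow (here, readmission), while the unmatched HES episodes quantify what the registry is missing.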

What Is the Role of the Urologist?

For measurement of outcomes to engender improvement, the committed engagement of all clinicians involved is required. Specifically, clinicians are best placed to design the registry datasets and can drive improvement in the accuracy of clinical coding when HES data are used.

Sensitively engaging surgeons in the process also averts the risk of it feeling judgemental or punitive. However, surgeons are usually not best placed to tackle all of the methodological challenges inherent in ensuring that robust data are published. Dedicated methodological support is invaluable.

Attention to these concerns has been exemplified by the SCTS, who incorporated a team of database managers, an audit project manager and data analysts into their drive to report surgeon-level outcomes [7]. This infrastructure helps to ensure accurate and complete data and enables clinicians to engage with the process without being unduly burdened by it.

What Are the Methodological Pitfalls?

Volume and Power

A recent methodological study examined whether individual surgeons perform sufficient numbers of procedures to reliably allow identification of poor performance [8]. This found that where volumes are low, the chance of identifying outliers is limited. Recommendations from this study include using outcomes that are frequent, considering the hospital as the unit of reporting when numbers are low, and avoiding interpretation of no evidence of poor performance as evidence of acceptable performance.
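The relationship between volume and the chance of identifying an outlier can be illustrated with a simple power calculation. The sketch below is not the method used in the cited study [8]; it assumes an exact one-sided binomial test, and the benchmark rate (5%), the outlier's true rate (10%) and the significance level are hypothetical values chosen only for illustration.

```python
from math import comb


def upper_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def power_to_detect(n, benchmark_rate, true_rate, alpha=0.025):
    """Chance that an exact one-sided binomial test flags a surgeon as an outlier.

    n is the surgeon's case volume; benchmark_rate is the expected event
    rate; true_rate is the surgeon's actual (worse) event rate.
    """
    # Smallest event count whose tail probability under the benchmark
    # rate is at most alpha. threshold = n + 1 always satisfies this.
    for threshold in range(n + 2):
        if upper_tail(n, benchmark_rate, threshold) <= alpha:
            break
    return upper_tail(n, true_rate, threshold)


# Hypothetical scenario: benchmark event rate 5%, true rate 10%.
for volume in (20, 50, 200, 500):
    power = power_to_detect(volume, 0.05, 0.10)
    print(f"{volume:>4} cases: power = {power:.2f}")
```

In this hypothetical scenario a surgeon with double the benchmark event rate is unlikely to be flagged at a volume of 20 cases, but is very likely to be flagged at 500. This is the reasoning behind preferring frequent outcomes and, where numbers are low, the hospital as the unit of reporting.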

Volume is known to vary widely between centres in PCNL [6], which may mean that hospitals, rather than surgeons, should be compared to optimise statistical power. However, this is at odds with the wider prioritisation of transparency at the level of the individual surgeon. Resolving such conflicts between methodological and clinical priorities is challenging.

How to Handle Missing Data

Incompleteness of data is unavoidable. It is essential to adopt an approach to handling missing data that is robust from both statistical and governance perspectives; missing data must not be ignored [4]. Bridgewater et al. [4] recommend replacing missing outcome data with unfavourable values by default. This maximises the chance of detecting underperformance and potentially incentivises clinicians to ensure that data from their units are complete. However, this approach may be regarded as excessively severe, especially where data collection is not adequately supported or funded. Consensus among those being measured is essential when deciding on such strategies.
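The effect of replacing missing outcomes with unfavourable values can be seen in a small example. The sketch below uses hypothetical records and a hypothetical binary outcome (complication); it illustrates the general idea only and is not taken from Bridgewater et al. [4].

```python
# Hypothetical records: None marks a missing outcome.
cases = [
    {"surgeon": "S1", "complication": False},
    {"surgeon": "S1", "complication": None},
    {"surgeon": "S2", "complication": True},
    {"surgeon": "S2", "complication": False},
]


def complication_rate(records, surgeon, impute_unfavourable=True):
    """Complication rate for one surgeon.

    With impute_unfavourable=True, a missing outcome is counted as a
    complication; otherwise records with missing outcomes are dropped.
    """
    own = [r["complication"] for r in records if r["surgeon"] == surgeon]
    if impute_unfavourable:
        outcomes = [True if o is None else o for o in own]
    else:
        outcomes = [o for o in own if o is not None]
    return sum(outcomes) / len(outcomes) if outcomes else None


for surgeon in ("S1", "S2"):
    imputed = complication_rate(cases, surgeon)
    complete_case = complication_rate(cases, surgeon, impute_unfavourable=False)
    print(f"{surgeon}: imputed = {imputed:.2f}, complete-case = {complete_case:.2f}")
```

Here surgeon S1, with one missing outcome, moves from a complete-case rate of 0.00 to an imputed rate of 0.50, while S2, with complete data, is unaffected. The example shows both why the approach encourages complete submission and why it may be felt to be severe.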

How to Respond to Poor Performance

The SCTS identified four components to a constructive response to poor performance:

  • Analysis of the data for accuracy.
  • Analysis of the caseload to ensure that the risk stratification mechanism accurately reflects expected outcomes.
  • Analysis of institutional factors that may contribute to the divergence in clinical outcomes.
  • Analysis of the surgeon's performance.

The president of the SCTS contacts surgeons with unexpectedly poor outcomes, enabling them to discuss each of the above considerations. Importantly, outcomes are then published on the SCTS website [7]. The SCTS has assumed ownership of the process.


Openness and transparency must be pursued responsibly. Prematurely publishing inaccurately risk-adjusted data, or comparisons that lack validity or are unrepresentative, has been seen to be counterproductive [9].

Furthermore, where numbers of cases are relatively low, there is a real risk that reported differences in outcomes are either exaggerated or under-reported [8]. In some cases, it may be inappropriate to attempt surgeon-level analysis at all, due to a lack of statistical power in any comparison.
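One way to see why low numbers distort comparisons is to look at how wide the range of chance variation is at different volumes. The sketch below computes approximate 95% control limits around a benchmark rate, as used in funnel plots; this technique is introduced here for illustration and is not described in the sources cited. The 5% benchmark rate is hypothetical, and the normal approximation used is itself unreliable at very low volumes.

```python
from math import sqrt


def control_limits(benchmark_rate, n_cases, z=1.96):
    """Approximate 95% control limits for an observed rate at a given volume.

    Uses the normal approximation to the binomial, truncated to [0, 1].
    """
    half_width = z * sqrt(benchmark_rate * (1 - benchmark_rate) / n_cases)
    lower = max(0.0, benchmark_rate - half_width)
    upper = min(1.0, benchmark_rate + half_width)
    return lower, upper


# Hypothetical benchmark event rate of 5%.
for volume in (10, 50, 400):
    lower, upper = control_limits(0.05, volume)
    print(f"{volume:>3} cases: {lower:.3f} to {upper:.3f}")
```

At low volumes the limits span a wide range of observed rates, so an apparently large difference between two surgeons may reflect chance alone, and a genuinely poor performer may sit comfortably within the limits.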

Mechanisms for exploring the reasons for unexpectedly poor outcomes must be in place [2]. Strategies for managing data that show outliers, in terms of safety and effectiveness outcomes, need to be developed. The appropriate time for this is before publication and ideally before collection or processing of the data.

While measurement has the potential to engender improvement in outcomes, openness and transparency must be pursued carefully and responsibly.

Conflict of Interest

None declared.


Abbreviations

HES, Hospital Episode Statistics; PCNL, percutaneous nephrolithotomy; SCTS, Society of Cardiothoracic Surgeons.