External validation of the Codman score in colorectal surgery: a pragmatic tool to drive quality improvement

The simple six‐variable Codman score is a tool designed to reduce the complexity of contemporary risk‐adjusted postoperative mortality rate predictions. We sought to externally validate the Codman score in colorectal surgery.


INTRODUCTION
The Report on the Lancet Commission on Global Surgery recently highlighted six core measurable indicators essential to achieving the goal of universal access to safe, affordable surgical and anaesthetic care [1]. Although the indicators were rapidly taken up by practitioners, the data points from which to derive them were not defined, limiting comparability across time and settings. A convention of global experts took place to evaluate and explicitly define the indicators to improve comparability and support achievement of 2030 goals to improve access to safe affordable surgical and anaesthetic care globally.
One indicator, in-hospital perioperative mortality rate (POMR), was identified as crucial for monitoring progress towards this goal both in the original commission and following the convention. However, adverse events, including postoperative mortality, are an inevitable consequence of major surgery and some adverse events may be expected or even acceptable. Without case mix adjustment and taking the heterogeneity of surgical patients and procedures into account, hospitals that manage sicker patients appear to have worse outcomes [2]. Reporting POMR alone offers little by way of meaningful comparisons or identifying opportunities for quality improvement [3]. In the published update of the indicators, with respect to POMR the following statement was made: 'comparisons may become feasible at the intermediate and full level, when we agree that covariates for risk adjustment at the patient level should also be collected'. This paper aims to contribute to the work in identifying the covariates for risk adjustment.
External benchmarking, which allows direct inter-hospital performance comparisons, has been the cornerstone of quality improvement programmes in high income countries [4][5][6][7][8]. These comparisons exploit inter-hospital variation in risk-adjusted outcome estimates to identify centres performing significantly better or worse than their peers [9]. This information can be used to provide feedback to individual centres and to assist them with developing targeted and informed quality improvement initiatives [2,6]. Measurement alone may even improve outcomes, the so-called Hawthorne effect [2,8].
To date in the United States, the American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) remains the most robust risk-adjusted, reliable and accepted tool [9].
The ACS NSQIP has been shown to reduce both morbidity and mortality in enrolled hospitals, with initially worse performing hospitals having the largest improvements [5,7]. The programme requires the collection of over 150 variables, and a major limitation of the ACS NSQIP is that it is prohibitively expensive for many smaller rural US hospitals. The likelihood that it could be utilized by a hospital in a low-to-middle income country is low, yet this is arguably where the greatest variation in surgical outcomes exists and therefore also the greatest potential for quality improvement. Previously, our group described the derivation and validation of a simple six-variable index, the Codman score, designed to preoperatively identify adult general or vascular surgery patients at risk for a major complication [10]. In the original validation it demonstrated excellent discrimination (area under the receiver operating characteristic curve [AUC] 0.82, 95% CI 0.74-0.91) to predict in-hospital mortality in a prospective cohort in an academic hospital in South Africa. Applying the Youden index principle identified that a binary cut-off score of 7 had a sensitivity of 80.8% and specificity of 80.0% to predict a major complication. Developed to ensure minimal data collection burden, the score can be used for preoperative planning, including intensive care unit (ICU) allocation, and can be applied retrospectively to track ICU admission practices. Furthermore, a coefficient-based prediction rule has been derived based on the application of the score to the NSQIP essentials dataset, enabling surgical units around the world to use the Codman score for global benchmarking of risk-adjusted outcomes against the NSQIP consortium.
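As an illustration of the Youden index principle mentioned above, the following minimal Python sketch computes Youden's J (sensitivity + specificity - 1) and searches for the cut-off that maximises it. The function names are hypothetical, and the convention that a score at or above the cut-off predicts the event is an assumption made for illustration; this is not the procedure used in the original derivation.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


def best_cutoff(scores, outcomes):
    """Return (cutoff, J) for the cut-off that maximises Youden's J.

    Assumption: a score >= cutoff is treated as a positive prediction.
    `outcomes` holds truthy values for patients who had the event.
    """
    best = None
    pairs = list(zip(scores, outcomes))
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in pairs if s >= cutoff and y)
        fn = sum(1 for s, y in pairs if s < cutoff and y)
        tn = sum(1 for s, y in pairs if s < cutoff and not y)
        fp = sum(1 for s, y in pairs if s >= cutoff and not y)
        if tp + fn == 0 or tn + fp == 0:
            continue  # sensitivity or specificity undefined
        j = youden_index(tp / (tp + fn), tn / (tn + fp))
        if best is None or j > best[1]:
            best = (cutoff, j)
    return best


# J implied by the sensitivity and specificity reported for a cut-off of 7
j_reported = youden_index(0.808, 0.800)
```

With the reported sensitivity of 80.8% and specificity of 80.0%, J is approximately 0.61.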
While the Codman score performed well in a validation study with general surgical patients, it is unclear whether it would perform well in a subset of specialty patients. In NSQIP, the procedure-specific models for subspecialties required more variables than the core model for general surgery. The assumption is that procedure-specific outcomes and quality would require more granular data to assess. Our goal is to test this assumption by seeing whether the Codman score, developed from data for general surgical patients, could also perform well for subspecialty patients. In this study, we sought to externally validate the Codman score in colorectal surgery.

Data source
We conducted a retrospective cohort study of the ACS NSQIP database. In 2012, ACS NSQIP implemented a targeted colectomy module to collect additional preoperative, intra-operative and postoperative variables specific to colorectal resections. These data can be merged with the ACS NSQIP master file for procedure-specific data augmentation. Prospectively collected data from the most recent master ACS NSQIP data file from 2020, as well as the colectomy module of 2020, were merged for analysis and linked based on the unique, de-identified case ID.

What does this paper add to the literature?
We have externally validated the Codman Score (age, ASA, emergency status, degree of sepsis, functional status, preoperative blood transfusion) for colorectal surgery as a simple tool to reduce the complexity of contemporary risk-adjusted outcome predictions. We propose application of the Codman Score to drive quality improvement initiatives in colorectal surgery.

Exposure and outcome
Any patient who underwent a colorectal resection for any indication in 2020 was included in the analysis. A Codman score was assigned to each patient. The Codman score relies on six preoperative variables: age (≥65 years), functional status (partially or totally dependent), preoperative transfusions (≥4 packed cells in 72 h), emergency status, sepsis status (sepsis or septic shock) and the American Society of Anesthesiologists (ASA) score (≥3). The primary outcome was in-hospital mortality, and the secondary outcome was morbidity at 30 days (Figure 1). Associations between each component of the Codman score, the complete score and the outcomes of interest were tested using the chi-squared or Fisher's exact test. Logistic regression analyses were then performed individually using the Codman score and the ACS NSQIP mortality and morbidity algorithms as independent variables for the primary and secondary outcomes. The predictive discriminatory performance (AUC) of the Codman score was compared with that of these algorithms. When statistically significant differences in AUC were identified, a manual forward entry stepwise logistic regression analysis was performed to explore variables which may improve discriminatory performance. Furthermore, calibration was assessed using the Hosmer-Lemeshow goodness of fit (GOF) test. If a significant difference was identified, subgroup calibration analysis was performed. These comparisons were repeated in subgroups defined by age and comorbidities. The mortality and morbidity probability predictions of the Codman score algorithms were established and their correlation with their respective ACS NSQIP counterparts was calculated using a Spearman coefficient test. All statistical analyses were done in Stata 14.0.
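The six component definitions above can be sketched as binary flags derived from raw preoperative fields. The Python example below is illustrative only: the field names and category labels are hypothetical, and because the point weights assigned to each component are not stated in this section, the flags are deliberately not summed into a score.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # All field names and category labels are hypothetical, for illustration.
    age_years: int
    functional_status: str      # "independent", "partially dependent", "totally dependent"
    units_transfused_72h: int   # packed cells in the 72 h before surgery
    emergency: bool
    sepsis_status: str          # "none", "sepsis", "septic shock"
    asa_class: int


def codman_components(p: Patient) -> dict:
    """Return the six binary Codman components using the thresholds in the text."""
    return {
        "age_ge_65": p.age_years >= 65,
        "dependent": p.functional_status in {"partially dependent", "totally dependent"},
        "transfusion_ge_4": p.units_transfused_72h >= 4,
        "emergency": p.emergency,
        "sepsis_or_shock": p.sepsis_status in {"sepsis", "septic shock"},
        "asa_ge_3": p.asa_class >= 3,
    }


example = codman_components(
    Patient(age_years=70, functional_status="independent", units_transfused_72h=0,
            emergency=True, sepsis_status="sepsis", asa_class=2)
)
```

Keeping the components as named flags makes it straightforward to tabulate each against the outcomes before any weighting is applied.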

RESULTS
During the study period a total of 40 589 patients underwent a colorectal resection and were included in our analysis. The mean age of the cohort was 60.13 years (95% CI 60.12-60.14). A minimally in-  an ASA of greater than or equal to 3, 5980 (14.7%) were emergency cases, 3752 (9.2%) had evidence of sepsis or septic shock, 1069 (2.6%) had a blood transfusion ≤72 h prior to surgery and 1221 (3.0%) were partially or totally dependent. Patients were well distributed amongst the scores from 1 to 10.
A total of 883 (2.18%) patients died and 8081 (19.91%) experienced some morbidity at 30 days (Table 2). Each component of the score had a significant association with mortality and morbidity (P < 0.0001). Table 2 also demonstrates that, for every 1-point increase in Codman score, there are significantly increased odds of all adverse outcomes of interest. This is further demonstrated in Figure 2, which indicates that there is a stepwise increase in mortality and morbidity rates and an exponential increase in the odds of mortality for every 1-point increase in Codman score (P < 0.0001).
In the logistic regression analysis, the odds of in-hospital mortality ranged from 3.62 (95% CI 1. 18 (P < 0.05) (Figure 3). A manual stepwise forward entry regression analysis identified body mass index (BMI) and surgical approach (minimally invasive surgery, MIS) to be most significantly associated with morbidity. Adding these to the Codman score improved the morbidity AUC to 0.72 (95% CI 0.71-0.73), which was no longer significantly different from that of the NSQIP morbidity prediction (Figure 4).
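The AUC values compared here can be read as the probability that a randomly chosen patient with the outcome has a higher predicted risk than a randomly chosen patient without it, with ties counted as one half. A minimal Python sketch of that pairwise definition follows; it is an assumption-level illustration of the statistic and not the Stata routine used in the analysis.

```python
def auc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison.

    Counts, over all (event, non-event) pairs, how often the patient with
    the event has the higher score; ties contribute 0.5.
    """
    positives = [s for s, y in zip(scores, outcomes) if y]
    negatives = [s for s, y in zip(scores, outcomes) if not y]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

The double loop is quadratic in cohort size, which is acceptable for a sketch; a rank-based formulation would be preferable for large datasets.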
The GOF tests of calibration were almost perfect for both the morbidity and mortality NSQIP predictions, with a chi-squared probability of almost 1 (P > 0.05). The GOF tests of calibration for the Codman score were significantly different from 1, suggesting poor calibration; however, the subgroup analysis revealed non-significant differences of observed versus expected mortality and morbidity at Codman scores of less than 4 or greater than 7.
The discriminatory performance of both scores was decreased in those aged >80 years, those who had experienced weight loss prior to surgery and those taking steroids. However, the discriminatory performances remained comparable (P > 0.05). The discriminatory performance of the morbidity predictions was improved in both scores in those patients with BMI >40 and the NSQIP algorithm remained significantly better. The patterns of calibration were not changed in the subgroup analyses.
There was a highly significant Spearman correlation between the Codman score and ACS NSQIP mortality algorithms (rho coefficient 0.88, P < 0.0001). The Codman score and the ACS NSQIP morbidity algorithms were also significantly correlated (rho coefficient 0.75, When a Codman cut-off score of 7 was applied, 382/883 (43.3%) unexpected failures and 2224/39 706 (5.6%) unexpected successes in mortality outcome were identified for the morbidity and mortality conference as presented in Table 3.
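The way a cut-off flags cases for the morbidity and mortality conference can be sketched as follows. This Python example assumes that an unexpected failure is a death in a patient scored below the cut-off and an unexpected success is survival in a patient scored at or above it; how a score of exactly 7 is classified is an assumption here, and the function name is hypothetical.

```python
def unexpected_outcomes(scores, died, cutoff=7):
    """Count deaths in low-risk patients and survivals in high-risk patients.

    Assumption: score >= cutoff is treated as high risk.
    """
    deaths = survivors = 0
    unexpected_failures = unexpected_successes = 0
    for score, dead in zip(scores, died):
        high_risk = score >= cutoff
        if dead:
            deaths += 1
            if not high_risk:
                unexpected_failures += 1   # died despite a low score
        else:
            survivors += 1
            if high_risk:
                unexpected_successes += 1  # survived despite a high score
    return {
        "deaths": deaths,
        "survivors": survivors,
        "unexpected_failures": unexpected_failures,
        "unexpected_successes": unexpected_successes,
    }


# Proportions reported in the text, recomputed from the stated counts
pct_unexpected_failures = round(100 * 382 / 883, 1)
pct_unexpected_successes = round(100 * 2224 / 39706, 1)
```

The two recomputed percentages agree with the 43.3% and 5.6% given in the text, and the denominators (883 deaths and 39 706 survivors) sum to the 40 589 patients in the cohort.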

DISCUSSION
In a cohort of 40 589 colectomies, we have validated the Codman score as a pragmatic tool for external benchmarking that has predictive ability comparable to the ACS NSQIP mortality and morbidity prediction rules. In this retrospective analysis, 99.2% of patients had complete data for the Codman score, limited only by the ASA score, which was missing in 32 patients. Given the low data burden associated with the Codman score and its parsimonious design, this validated tool can be used prospectively in the most resource-limited settings to contribute to operative decision making, informed consent and ICU allocation. Retrospectively, it can be used to track risk-adjusted outcomes between providers and in the morbidity and mortality conference to drive quality improvement.
To date in the United States, the ACS NSQIP remains the most A coefficient-based prediction rule has been derived based on the application of the score to the NSQIP dataset and, based on a six-variable Codman score, the expected observed over expected ratio can be calculated per patient and therefore for cohorts of patients per provider, hospital or collaborative, as described in our paper [2], to look for outliers of the observed over expected ratio and identify opportunities for quality improvement.
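The observed over expected ratio described above can be sketched in Python as the number of observed deaths divided by the sum of predicted death probabilities within each group. The predicted probabilities are taken as given inputs, because the coefficient-based prediction rule itself is not reproduced in this section; the record layout and function name are hypothetical.

```python
from collections import defaultdict


def observed_over_expected(records):
    """Compute the O/E mortality ratio per provider.

    `records` is an iterable of (provider, predicted_probability, died)
    tuples; predicted_probability is assumed to come from a risk model
    such as the coefficient-based rule mentioned in the text.
    """
    observed = defaultdict(int)
    expected = defaultdict(float)
    for provider, probability, died in records:
        observed[provider] += int(bool(died))
        expected[provider] += probability
    # A ratio above 1 means more deaths than the model predicted.
    return {p: observed[p] / expected[p] for p in observed if expected[p] > 0}


ratios = observed_over_expected([
    ("A", 0.1, 0), ("A", 0.4, 1),   # 1 death observed, 0.5 expected
    ("B", 0.5, 0), ("B", 0.5, 1),   # 1 death observed, 1.0 expected
])
```

In this toy input, provider A has twice the expected deaths and provider B matches expectation; in practice, confidence intervals would be needed before labelling any unit an outlier.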
In the original derivation of the Codman score, a five-step methodology was used:
1. Development of a de novo surgical outcomes database modelled around ACS-NSQIP in a busy academic tertiary hospital in South Africa.
2. Use of the resultant data to identify all predictors of in-hospital death with more than 90% capture indicating a low data collection burden.  (Table 3). Arguably, these unexpected outcomes could just represent poor calibration of the score to this cohort; however, given our validation results, we propose this to be the signal to prompt further enquiry into these unexpected failures (and successes), stimulate quality improvement initiatives and add some objectivity to the morbidity and mortality conference.
Previous work has supported the fact that a parsimonious model, based on only a few variables, may provide enough discrimination to measure surgical outcomes in a risk-adjusted manner.
Rubinfeld et al. [12] found the c-statistic for mortality decreased only slightly from 0.907 using all variables to 0.902 using 10 variables and argue that only a few variables are required for predictive accuracy. Dimick et al. [6] found that limited models based on  This model has been tested against a wide variety of operations; however, for open colectomy (the only colorectal procedure to which it has been applied) the AUC was 0.86 for mortality and 0.66 for morbidity [15].
The Codman score, like other predictive scores, is less precise for  [16,17]. The advantage of the Codman score compared to these alternative models is the relative ease of data collection.
Recent work has moved towards the more minimalistic approach to predictive modelling used in the Codman score, with the most current NSQIP modified frailty index considering only five of the initially included 11 variables and demonstrating strong predictive ability for mortality and postoperative morbidity across surgical subspecialties [18]. This five-item modified frailty index was examined in a subset of patients undergoing colorectal surgery and was shown to be a valid prediction tool for postoperative outcomes, with strong agreement with the more complex 11-item modified frailty index (kappa 0.987) [19]. These studies support the proposition that such population-level prediction tools can be simplified and retain their utility and accuracy.
A significant weighting of the Codman score has been attributed to the ASA score. The American Society of Anesthetists (later the American Society of Anesthesiologists) first devised a classification system for patients in 1941. The initial intent of the scoring system was to have a common language for describing patients so that different anaesthesia providers could communicate with each other and it allowed for an easier way to record data. The original concept was not to create a tool for surgical outcomes prediction. More recently, however, it has been consistently shown to be one of the most powerful variables in surgical outcomes research. In our paper published in 2018, we proposed that it is frequently assumed that the most granular data and use of the largest number of variables for risk-adjusted predictions will increase accuracy [20]. However, this complexity is often at the expense of utility. In that paper, we used the single best predictor in surgical outcomes research, the ASA, to demonstrate that this is not the case. We concluded that one can simplify ASA into a three-category variable without losing any ability to predict outcomes. Similarly, in the original derivation of the Codman score, a binary ASA was equally predictive as the original ASA.
Our work must be interpreted within certain limitations. The retrospective nature of this work is most certainly a limitation. Given the low data burden of this score, a valuable next step in its validation would be a prospective interrogation of its precision by units already participating in the NSQIP programme. Validating an NSQIP-derived score within a procedure-targeted NSQIP dataset may not be the purest form of external validation, but it is the most practical first step. We hope this work encourages further external validation, although this will be limited by the lack of global standardization of datasets. We would like our study to encourage the inclusion of the variables of the Codman score in a minimal dataset to promote global standardization of general and colorectal databases and external benchmarking of risk-adjusted outcomes. This would certainly align well with the original work of Ernest Amory Codman, a courageous early 20th century champion for an 'end results system' to track hospital outcomes [21].

CONFLICT OF INTEREST STATEMENT
No conflicts of interest were declared by any of the authors.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

TABLE 3
Unexpected successes and failures identified by a Codman score of 7 or less.