A machine learning model to predict the need for conversion of operative approach in patients undergoing colectomy for neoplasm

Abstract

Background: Studies comparing conversion from laparoscopic to open approaches to colectomy have found an association between conversion and morbidity, mortality, and length of stay, suggesting that certain patients may benefit from an open approach “up-front.”

Aim: The objective of this study was to use machine learning algorithms to develop a model enabling the prediction of which patients are likely to require conversion.

Methods and Results: We used ACS NSQIP data to identify patients undergoing colectomy (2014-2019). We included patients undergoing elective colectomy for colorectal neoplasm via a minimally invasive approach or a converted approach. The outcome of interest was conversion. Variables were included in the model based on their correlation with conversion by logistic regression (p < .05). Two models were used: weighted logistic regression with regularization, and Random Forest classifier. The data were randomly split into training (70%) and test (30%) cohorts, and prediction performance was calculated. A total of 24 327 cases were included (17 028 training, 7299 test). When applied to the test cohort, the models had an accuracy of 0.675 (range 0.65-0.70) in predicting conversion; c-index ranged from 0.62 to 0.63. This machine learning model achieved a moderate area under the curve and a high negative predictive value, but a low positive predictive value; therefore, this model can predict (with 95% accuracy) whether a colectomy for neoplasm can be successfully completed using a minimally invasive approach.

Conclusion: This model can be used to reassure surgeons of the appropriateness of a minimally invasive approach when planning for an elective colectomy.


| INTRODUCTION
Colorectal surgery is a rapidly evolving field with a history of leadership and pioneering in the adoption of new technologies, most prominently minimally invasive surgery (MIS). Laparoscopic surgery is now considered the standard approach for most elective colectomies.[3] Intraoperative conversion from laparoscopic to open colectomy is variably reported in the literature as occurring in 5.2%-77% of cases.[4,5] The decision to convert may be for technical reasons, like the […] class III-IV; a history of smoking; and weight loss were at a higher risk for conversion.[14]

The identification of surgical patients at high risk for conversion may improve shared decision-making and patient selection for elective colectomies. In this study, we sought to identify patient-specific and disease-specific factors that can estimate the risk of conversion from laparoscopic to open colectomy among colorectal cancer patients undergoing an elective colectomy. We aimed to test the ability of two machine learning algorithms to predict conversion.

FIGURE 1 PRISMA flow diagram of cohort build. Post-hoc exclusion of cases missing T-stage data is not captured in this figure.

TABLE 1 Univariable logistic regression for operative conversion.

| Study design and cohort build
We performed a retrospective cohort study using the NSQIP dataset. The study cohort was built by matching records from the PUF with those from the TCF using a unique NSQIP identifier. Exclusion criteria included: patients undergoing surgery via a planned open approach, patients undergoing emergency or non-elective surgery, patients who were systemically unwell at the time of surgery (i.e., had disseminated cancer, were mechanically ventilated, or had sepsis or septic shock), patients who were undergoing colectomy for indications other than colonic neoplasm (included diagnoses can be found in Table S1), and patients undergoing significant additional procedures at the time of surgery (e.g., multivisceral resections, Table S2). A flow diagram of the cohort construction is shown in Figure 1.

The primary outcome of the study was any conversion from a minimally invasive approach to an open approach. Operative approach was dichotomized as either MIS or converted, with laparoscopic and robotic approaches, as well as laparoscopic and robotic approaches with open assist, all grouped under the umbrella MIS approach. The cohort was divided into two groups by approach (i.e., MIS and converted), and factors of interest were compared to identify significant predictors of conversion. Factors of interest included patient- and disease-specific variables.
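The cohort build described above (match two files on a shared identifier, apply exclusions, dichotomize the approach) can be illustrated with a minimal sketch. Every field name and category label below is a hypothetical placeholder rather than a real NSQIP variable, the rows are invented toy records, and only a subset of the exclusion criteria is shown.

```python
# Toy illustration only: field names and values are hypothetical, not NSQIP variables.
puf_rows = [
    {"case_id": 1, "elective": True, "disseminated_cancer": False, "sepsis": False},
    {"case_id": 2, "elective": True, "disseminated_cancer": False, "sepsis": False},
    {"case_id": 3, "elective": True, "disseminated_cancer": False, "sepsis": False},
    {"case_id": 4, "elective": False, "disseminated_cancer": False, "sepsis": False},
    {"case_id": 5, "elective": True, "disseminated_cancer": False, "sepsis": False},
]
tcf_rows = [
    {"case_id": 1, "approach": "laparoscopic", "indication": "neoplasm"},
    {"case_id": 2, "approach": "robotic_converted", "indication": "neoplasm"},
    {"case_id": 3, "approach": "open", "indication": "neoplasm"},
    {"case_id": 4, "approach": "laparoscopic", "indication": "neoplasm"},
]

# Hypothetical labels for approaches that ended as a conversion to open surgery.
CONVERTED = {"laparoscopic_converted", "robotic_converted"}


def build_cohort(puf, tcf):
    """Inner-join the two files on case_id, apply exclusions, flag conversions."""
    tcf_by_id = {row["case_id"]: row for row in tcf}
    cohort = []
    for general in puf:
        targeted = tcf_by_id.get(general["case_id"])
        if targeted is None:
            continue  # no matching targeted colectomy record
        if not general["elective"]:
            continue  # emergency or non-elective surgery
        if general["disseminated_cancer"] or general["sepsis"]:
            continue  # systemically unwell at the time of surgery
        if targeted["indication"] != "neoplasm":
            continue  # colectomy for another indication
        if targeted["approach"] == "open":
            continue  # planned open approach
        cohort.append({**general, **targeted,
                       "converted": targeted["approach"] in CONVERTED})
    return cohort


cohort = build_cohort(puf_rows, tcf_rows)
```

In this toy data, cases 3, 4, and 5 are dropped (planned open, non-elective, and unmatched, respectively), leaving one MIS case and one converted case.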

| Statistical analysis and machine learning model
The outcome was intraoperative conversion from MIS to open surgery. Candidate factors were screened by testing the association of each factor with the occurrence of conversion, using a logistic regression model. All factors with a p-value <.05 were used as candidate predictors to build the predictive models (i.e., univariable regression was used to select which factors to include in subsequent predictive models).
Two predictive models were used: (1) weighted logistic regression with regularization and (2) a random forest classifier. A grid search over randomly chosen combinations of parameters was set up to identify the best tuning parameters. Each parameter set in the grid was assessed, and the optimal parameters were selected according to the maximum area under the curve (AUC). A final model was obtained by retraining the algorithm on the entire development set using the optimal parameters selected, and the performance of this final model was assessed on the testing cohort.
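The tuning procedure described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' code: the data are synthetic, the parameter grids, the number of sampled combinations, the three-fold cross-validation, and the stratified split are all choices made for the example, and `class_weight="balanced"` is only one possible reading of "weighted".

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic, imbalanced stand-in data (NOT NSQIP): roughly 10% positive class.
X, y = make_classification(n_samples=800, n_features=10, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)

# 70/30 development/test split; stratification is an assumption of this sketch.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

# Hypothetical parameter grids for the two model families.
candidates = {
    "logistic_regression": (
        LogisticRegression(class_weight="balanced", max_iter=1000),
        {"C": [0.01, 0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        RandomForestClassifier(class_weight="balanced", random_state=0),
        {"n_estimators": [25, 50], "max_depth": [3, 5, None],
         "min_samples_leaf": [1, 5, 10]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Sample parameter combinations and score each by cross-validated AUC.
    # refit=True retrains the best configuration on the whole development set.
    search = RandomizedSearchCV(estimator, grid, n_iter=4, scoring="roc_auc",
                                cv=3, refit=True, random_state=0)
    search.fit(X_dev, y_dev)
    # Evaluate the refitted model once on the held-out test cohort.
    test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
    results[name] = {"best_params": search.best_params_, "test_auc": test_auc}
```

Selecting parameters by cross-validated AUC inside the development set keeps the test cohort untouched until the final evaluation, which is the property the paragraph above relies on.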
We measured each model's prediction performance by computing the C statistic, accuracy, sensitivity, and specificity. All analyses were performed using R 3.6.1 (glmnet package) and Python 3.9 (scikit-learn package).
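For reference, the threshold-based metrics named above all derive from the four cells of a 2x2 confusion matrix. The sketch below is a plain-Python illustration on invented labels (1 = converted, 0 = completed by MIS); the helper names are hypothetical, and the positive and negative predictive values are included because they are reported in the Results.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, PPV and NPV for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted and observed conversion
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # predicted and observed MIS
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # conversion predicted, none occurred
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # conversion missed
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


# Invented example: 2 conversions among 10 cases.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

Here the confusion matrix is TP = 1, FN = 1, FP = 2, TN = 6, giving an accuracy of 0.70, sensitivity of 0.50, specificity of 0.75, PPV of 1/3, and NPV of 6/7.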

| RESULTS
A total of 35 136 patients were included. In 10 771 patients, T stage data were missing; these cases were excluded post hoc. On univariable logistic regression analysis, age, body mass index (BMI), sex, dyspnea, diabetes, ASA class, wound class, hypertension, renal failure, chronic obstructive pulmonary disease (COPD), presence of ascites, T stage, weight loss, pneumonia, steroid use, and presence of a bleeding disorder were significant predictors of conversion (Table 1).
In order to develop an instrument to predict the occurrence of conversion, 24 327 patients were included (patients with missing

| DISCUSSION
This study presents two machine learning models to predict conversion of operative approach in colectomy (from laparoscopic to open), using a large international dataset collected through NSQIP. The two models demonstrated similar performance across metrics, suggesting that neither approach was superior in this context.
The performance analysis of these models demonstrates a moderate AUC, with a high NPV and a low PPV. This suggests that these models can provide a high degree of confidence as to whether a minimally invasive procedure can be successfully completed; however, other clinical decision measures must be used to accurately predict which patients might require conversion. Given the numerous potential benefits associated with an MIS approach to colectomy, the model described here may provide physicians with reassurance that an MIS approach is feasible in cases of uncertainty.
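The combination of a high NPV and a low PPV follows from how predictive values depend on the event rate as well as on sensitivity and specificity. The sketch below applies Bayes' rule to the sensitivity (0.48) and specificity (0.72) reported for the logistic regression model. The conversion rate of 0.07 is an assumption made for illustration, chosen because it is consistent with the reported predictive values; it is not a figure stated in the text.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity and event prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumed conversion rate (illustrative, not reported in the text).
ASSUMED_CONVERSION_RATE = 0.07

# Sensitivity and specificity as reported for the logistic regression model.
ppv, npv = predictive_values(0.48, 0.72, ASSUMED_CONVERSION_RATE)
# Under these inputs, PPV is about 0.11 and NPV about 0.95.
```

When the event is uncommon, even a modest false-positive rate produces far more false positives than true positives, which keeps the PPV low, while most negative predictions are correct simply because most cases are not converted.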
Though the goal of the study was to create a model that can reliably predict which patients undergoing colectomy for neoplasia will require conversion (i.e., a model with a high PPV), we instead created a model that reliably predicts which patients will not require conversion. This may, in reality, be more clinically useful because it provides reassurance to patients and surgeons who are favoring an MIS approach. In contrast, a model with a high PPV may dissuade patients and surgeons from attempting an MIS colectomy.[17][18][19] Although the models used in our study did not achieve high performance metrics, other, more sophisticated machine learning models (like neural network and support vector machine models) may hold promise; this is a potential future direction of study.
Our study was limited by the retrospective nature of its design, as well as its use of the NSQIP database. Though this database has been frequently used for outcomes-based research, it is imperfect and does not capture many of the granular details that might be useful in answering a research question like this one. For example, the NSQIP database does not capture the reason for operative conversion, nor the timing of operative conversion (i.e., at what point in the procedure conversion occurred). Prior research suggests that these factors can provide a more nuanced understanding of conversion by differentiating between "types of conversion." One of the major categories traditionally discussed is the a priori likely conversion, in which the surgeon suspects that a case may be difficult to perform via MIS and decides to convert after quickly surveying the state of the abdominal anatomy. This is in contrast to a conversion that occurs later in the case following an intraoperative event (e.g., an injury) that cannot be readily controlled using an MIS approach.[20] Similarly, the NSQIP database does not capture pertinent aspects of a patient's medical history, such as the presence of prior abdominal surgery, or the level and type of training the surgeons received. A more granular dataset providing such variables may have allowed our machine-learning model to predict these different types of conversion more accurately; this is an interesting direction for future research. A notable strength of this study is its large sample size and its novel approach to the question of whether or not operative conversion in cases of colectomy can be predicted in a meaningful way.

TABLE 3 Performance metrics for two machine learning models for the prediction of conversion of operative approach.

| CONCLUSION
The machine-learning model described here can be used to provide patients and surgeons with a high degree of reassurance that a colectomy can be successfully performed using an MIS approach; however, other clinical decision measures must be used to positively predict operative conversion.

2.1 | Data source and study population
Data were obtained through the American College of Surgeons National Surgical Quality Improvement Program (NSQIP), an international, validated quality improvement program that collects demographic, operative, and 30-day outcome data from participating hospitals in patients undergoing specific surgical procedures. Since 2014, in addition to the general Participant User File (PUF), NSQIP has collected additional clinically relevant data on patients undergoing colectomy procedures and compiled it in the Targeted Colectomy File (TCF). The target population for this study was patients undergoing a colectomy procedure for benign or malignant neoplasm of the colon between 2014 and 2019 (inclusive).
information were excluded): 17 028 (70%) in the training cohort used to create the predictive model, and 7299 (30%) in the test cohort used to validate the model using only available data. Patients with missing variables were excluded from analysis. Following analysis of association with conversion status in the training cohort, variables included in the machine-learning based models were age, BMI, sex, dyspnea, diabetes, ASA class, wound class, hypertension, renal failure, COPD, presence of ascites, T stage, weight loss, pneumonia, steroid use, and presence of a bleeding disorder. When applied to the test cohort, the models had an overall accuracy of 0.675 (range 0.65-0.70) for predicting conversion to open surgery, and c-index ranged from 0.62 to 0.63 (Figure 2). The logistic regression model and the random forest classifier model performed similarly, with c-indices of 0.63 and 0.62, respectively. The logistic regression model was slightly more accurate than the random forest classifier model (0.70 vs. 0.65) and had a higher specificity (0.72 vs. 0.65) but a lower sensitivity (0.48 vs. 0.53). The positive predictive values (PPV) for both models were low (0.11 and 0.10), but negative predictive values (NPV) for both models were high (0.95); see Table 3.

FIGURE 2 (A) Area under the curve (AUC) graph of the performance of the logistic regression machine learning model. Faded lines represent iterations of the predictive model (30 in total); the dark blue line represents the mean AUC of all iterations. Solid black line represents the reference y = x line. (B) Area under the curve (AUC) graph of the performance of the random forest classifier machine learning model. Solid black line represents the reference y = x line.
TABLE 2 Multivariable logistic regression for operative conversion.