Utilizing machine learning to predict unplanned cesarean delivery

To develop a comprehensive machine learning (ML) model predicting unplanned cesarean delivery (uCD) among singleton pregnancies based on features available at admission to labor.


| INTRODUCTION
Approximately 15% of vaginal delivery (VD) attempts in the USA eventually lead to an unplanned cesarean delivery (uCD). 1 The rates of uCD are even higher among some subgroups such as nulliparous women and those requiring labor induction, reaching reported rates of 21% 2,3 and 26%, 4 respectively. uCD during attempted VD is associated with higher morbidity and mortality rates than elective cesarean deliveries (CD), 5 including maternal trauma, hemorrhage, febrile morbidity, and other complications. 6,7 Precise prelabor identification of women at high risk of uCD could help reduce maternal and neonatal morbidity related to uCD, while providing reassurance to the majority of women, who are at low risk for uCD.
Seminal studies developed clinical tools for the prediction of uCD among specific subgroups of women at relatively high risk for uCD such as nulliparous women, 2 those undergoing labor induction, 8 obese women undergoing induction, 9 or primigravid women induced with prostaglandins. 10 Some of these studies also provided an online calculator 8,9 or a nomogram. 2,4 However, these models were valid only among specific subgroups, limiting their generalizability and wide clinical use and most were limited to term deliveries. 2,4,9 Other models used intrapartum assessment such as fetal heart rate rating, so cannot serve for counseling at admission to labor. 11 A machine learning (ML) approach can detect patterns derived from large, complex data 12 and may provide a better and more comprehensive prediction in obstetrics 13,14 and other disciplines. ML for health care has been shown to be clinically useful also in low-income countries, 15 and primary care settings. 16 A recent study using ML reported an area under the curve (AUC) of 0.817 for uCD prediction, supporting the potential of ML for robust generalizable prediction of uCD. 17 However, the developed model was based on neonatal biometry, which is not available at the time of admission to labor, was limited to term deliveries, and the authors did not provide a model for clinical use.
Therefore, in the current study we sought to develop a comprehensive ML model for the prediction of uCD based only on features available at the time of admission for labor, providing women and obstetricians with a practical calculator for routine clinical use.

| Patients
We conducted a retrospective cohort study. The cohort consisted of deliveries that occurred at a university-affiliated tertiary medical center between March 2011 and April 2021. Our medical center serves a large urban and rural area with a heterogeneous population, mostly of Middle Eastern and European origin. More than 11 000 deliveries occur in our institution every year. We analyzed the electronic health records of all women who delivered during the study period to identify all deliveries that met the inclusion criteria.
These included VD attempts of vertex singletons at 34+0 weeks or more of pregnancy. We included deliveries from 34 weeks to increase the generalizability of the model. Exclusion criteria were intrauterine fetal demise or feticide before admission for labor, as well as trial of labor after CD. These were excluded because they comprise a different population with unique features. The study protocol was approved by the Sheba Medical Center review board (7145-20-SMC, 30/09/2020).

| Data collection
We collected maternal baseline characteristics, obstetrical history, current pregnancy characteristics, fetal features, and features available at admission to labor.
Hypertensive disorders were defined according to the American College of Obstetricians and Gynecologists. 18 Diabetic disorders were defined as either pregestational diabetes, in accordance with the American Diabetes Association criteria, 19 or gestational diabetes mellitus, using the diagnostic thresholds established by Carpenter and Coustan. 20 When cervical ripening was required, either an intracervical Foley catheter or prostaglandin E2 was used at the discretion of the treating physician.

| Adjusted sonographic biometry according to gestational age
As fetal ultrasound biometry may change from the examination to the delivery, and due to the effect of fetal biometrical parameters on VD success rate, we adjusted the biometric parameters according to the gestational age at admission to labor. We used a validated method previously described in detail for adjusting the sonographic biometry parameters to gestational age at delivery (Appendix S1). 13

| Machine learning model and statistical analysis
Model development, validation, and testing are reported in adherence to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement. 21 An initial panel of 62 clinical characteristics was created.
Clinical characteristics were compared between the group of women who achieved VD and women who underwent uCD. Features with relative importance of more than 1% were selected to be part of the model. Additional feature selection was performed with the consideration of clinical knowledge.
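The importance-based selection rule described above can be sketched as follows. This is only an illustration under stated assumptions: the feature names and scores are hypothetical, and relative importance is taken here to mean each feature's share of the summed importance, which the text does not specify.

```python
def select_features(importances, threshold=0.01):
    """Keep features whose share of total importance exceeds `threshold`.

    `importances` maps feature name -> raw importance score.
    Assumption: "relative importance" = score / sum of all scores.
    """
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("total importance must be positive")
    relative = {name: value / total for name, value in importances.items()}
    ranked = sorted(relative.items(), key=lambda item: item[1], reverse=True)
    selected = [name for name, share in ranked if share > threshold]
    return selected, relative


# Hypothetical scores for illustration only (not study results).
raw_importance = {
    "cervical_dilatation": 40.0,
    "parity": 30.0,
    "maternal_height": 20.0,
    "adjusted_head_circumference": 9.5,
    "rare_flag": 0.5,
}
selected, relative = select_features(raw_importance)
print(selected)
```

In the study, this automated step was followed by a further selection informed by clinical knowledge, which a threshold alone cannot reproduce.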
The model building was performed using the training data set.
Four models (XGBoost, DRF, GBM, XRT) were examined to predict uCD using the selected features. Hyperparameter tuning of the models was performed using random search with cross-validation.
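Random search with cross-validation can be outlined in a few lines of plain Python. The sketch below is not the H2O procedure used in the study. The search space, the parameter names `max_depth` and `learn_rate`, and the scoring function `toy_score` are all hypothetical stand-ins for fitting a model on the training folds and scoring it on the held-out fold.

```python
import random


def k_fold_indices(n_rows, k, rng):
    """Shuffle row indices and deal them into k roughly equal folds."""
    indices = list(range(n_rows))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def random_search_cv(param_space, score_fn, n_rows, n_iter=10, k=5, seed=0):
    """Sample `n_iter` random configurations; keep the best mean CV score."""
    rng = random.Random(seed)
    folds = k_fold_indices(n_rows, k, rng)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Draw one value per hyperparameter, uniformly at random.
        params = {name: rng.choice(values) for name, values in param_space.items()}
        fold_scores = []
        for held_out, val_idx in enumerate(folds):
            train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
            fold_scores.append(score_fn(params, train_idx, val_idx))
        mean_score = sum(fold_scores) / len(fold_scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def toy_score(params, train_idx, val_idx):
    # Hypothetical stand-in for "fit on train_idx, return AUC on val_idx".
    return 1.0 - 0.02 * abs(params["max_depth"] - 6) - abs(params["learn_rate"] - 0.1)


space = {"max_depth": [3, 6, 9], "learn_rate": [0.01, 0.1, 0.3]}
best_params, best_score = random_search_cv(space, toy_score, n_rows=100, n_iter=20)
print(best_params, best_score)
```

Unlike a grid search, random search evaluates a fixed number of sampled configurations, so its cost does not grow with the size of the search space.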
The XGBoost model performed best, based on the receiver operating characteristic AUC results (Figure S1).
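For readers less familiar with the metric, the AUC equals the probability that a randomly chosen uCD case receives a higher predicted risk than a randomly chosen VD case. A minimal sketch of that definition follows. It uses a quadratic-time pairwise comparison for clarity (library implementations use rank-based formulas), and the labels and scores are made up.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = uCD, 0 = vaginal delivery (illustrative values only).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 3 of 4 positive/negative pairs are ranked correctly
```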

A SHAP (SHapley Additive exPlanations) summary was used to quantify the contribution of each feature to the model prediction. The SHAP plot combines feature importance with feature effect. The features are ordered according to their importance.
Further description of the ML model development can be found in Appendix S1.
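To make the idea behind SHAP concrete, the sketch below computes exact Shapley values by enumerating every feature coalition for a single prediction. This brute-force approach is feasible only for a handful of features and is not how SHAP is computed for tree models in practice. The risk function `toy_risk`, the feature names, and the baseline values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are set to their `baseline` value.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        x = {m: (instance[m] if m in coalition else baseline[m]) for m in names}
        return predict(x)

    phi = {}
    for name in names:
        others = [m for m in names if m != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {name}) - value(members))
        phi[name] = total
    return phi


def toy_risk(x):
    # Hypothetical linear risk score, not the study's model.
    return 0.05 + 0.10 * x["nulliparous"] + 0.004 * x["bmi"] - 0.01 * x["dilatation_cm"]


instance = {"nulliparous": 1, "bmi": 32.0, "dilatation_cm": 2.0}
baseline = {"nulliparous": 0, "bmi": 25.0, "dilatation_cm": 4.0}
phi = shapley_values(toy_risk, instance, baseline)
print(phi)
```

The contributions sum to the difference between the prediction for the instance and the prediction at the baseline, which is the additivity property that SHAP summary plots rely on.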
To evaluate the model's clinical discrimination and risk stratification across gestational ages and major clinical risk factors for uCD, we evaluated the model's positive predictive value (PPV) and negative predictive value (NPV) in the following subgroups: gestational age at admission to labor (≥34 to <37 weeks, ≥37 weeks), advanced maternal age (defined as ≥40 years), obesity (defined as body mass index [BMI; calculated as weight in kilograms divided by the square of height in meters] at admission ≥30), nulliparity, and the need for induction of labor.
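The subgroup metrics above follow directly from a confusion matrix. The sketch below shows the BMI formula as defined in the text and a PPV/NPV calculation restricted to an obesity subgroup. The records and the rule for flagging a case as high risk are invented for illustration.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def ppv_npv(labels, predicted_positive):
    """Return (PPV, NPV); a value is None when its denominator is zero."""
    tp = fp = tn = fn = 0
    for y, flagged in zip(labels, predicted_positive):
        if flagged and y == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    ppv = tp / (tp + fp) if (tp + fp) else None
    npv = tn / (tn + fn) if (tn + fn) else None
    return ppv, npv


# Hypothetical records: (weight_kg, height_m, uCD outcome, flagged as high risk).
records = [
    (90.0, 1.70, 1, True),
    (95.0, 1.65, 0, True),
    (88.0, 1.60, 0, False),
    (60.0, 1.70, 0, False),
]
obese = [r for r in records if bmi(r[0], r[1]) >= 30]
ppv, npv = ppv_npv([r[2] for r in obese], [r[3] for r in obese])
print(len(obese), ppv, npv)
```

Both values depend on the prevalence of uCD in the subgroup, which is why reporting them separately for high-risk subgroups is informative.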
Statistical analyses were performed using Python version 3.6.7.
The ML models were built using H2O package version 3.22.

| Ethics statement
The study protocol was approved by the institutional review board (#7145-20-SMC, 30/09/2020), which waived the need for informed consent.

| Demographics
During the study period 109 279 deliveries took place at the medical center. Based on the described inclusion and exclusion criteria, 73 667 deliveries remained in the final study cohort (Figure 1; Table S1).
Overall, 69 003 (93.7%) of the VD attempts resulted in vaginal delivery, and 4664 (6.3%) resulted in uCD. The characteristics of the VD and the uCD groups are presented in Table 1. Women in the VD group were younger and taller, and had a lower weight and BMI, compared with women who underwent uCD. The median number of previous VDs was higher in the VD group. Rates of diabetic and hypertensive disorders were lower in the VD group. The adjusted biometric measurements (estimated fetal weight, head circumference, biparietal diameter, and abdominal circumference) were lower in the VD group, whereas adjusted femur length and amniotic fluid index did not differ between groups. Gestational age at admission to labor was lower in the VD group, whereas cervical dilatation and effacement were higher in the VD group. Fetal station at admission to labor was lower in the pelvis in the VD group. The rate of spontaneous onset of labor was higher in the VD group.

FIGURE 1 Schematic flow chart of patient inclusion in the study.

FIGURE 2 Feature importance. *Adjusted for gestational age at time of admission for labor. BMI, body mass index (calculated as weight in kilograms divided by the square of height in meters).

FIGURE 3 The SHAP summary plot for the XGBoost model. SHAP, SHapley Additive exPlanations.
The training, validation, and test data sets comprised 48 084, 12 016, and 13 567 cases, respectively. The proportions of uCD were 6.0%, 5.9%, and 7.9% in the three data sets, respectively (Table S2). A comparison of demographics across the study data sets is tabulated in Table S3. The leading indication for uCD was concern for fetal distress, accounting for 53%, 52.6%, and 61.9% of uCD in the training, validation, and test data sets, respectively (Table S4).
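As a quick consistency check on the reported figures, the three data set sizes can be summed and the rounded uCD rates applied to each. The sketch uses only numbers stated in the text. Because the rates are rounded to one decimal place, the implied case count is expected to match the reported total only approximately.

```python
cohort_size = 73667
total_ucd = 4664
set_sizes = {"training": 48084, "validation": 12016, "test": 13567}
ucd_rates = {"training": 0.060, "validation": 0.059, "test": 0.079}

# The three data sets should partition the final cohort.
sizes_match = sum(set_sizes.values()) == cohort_size

# Share of the cohort in each data set.
shares = {name: size / cohort_size for name, size in set_sizes.items()}

# uCD cases implied by the rounded per-set rates.
implied_ucd = sum(set_sizes[name] * ucd_rates[name] for name in set_sizes)

print("sizes sum to cohort:", sizes_match)
for name in set_sizes:
    print("{}: {:.1%} of cohort".format(name, shares[name]))
print("implied uCD cases: {:.0f} (reported: {})".format(implied_ucd, total_ucd))
```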

| Unplanned cesarean delivery prediction model
The performance on the validation data set is presented in Table 2.
The performance among the test data set is presented in Table 3.
The AUC was 0.84 (95% CI 0.83-0.85) (Figure 4). Additionally, we observed the potential use of the model for stratifying all VD attempts into three distinct risk groups. Approximately 50% of deliveries were stratified to the lowest risk group, with a risk of 1% or less for uCD. In contrast, 1.16% of deliveries were stratified to the highest risk group, with a 65% risk of uCD. In between, women in risk centiles 51%-99% were at increasing risk for uCD, ranging between 5% and 40%.
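The centile-based stratification described above can be sketched as follows. The cut points at the 50th and 99th centiles are an assumption taken from the approximate group sizes in the text, and the toy scores and outcomes are invented. The sketch also reports lift, defined as the ratio of the response rate in a group to the average response rate.

```python
def stratify_by_centile(scores, labels, cut_points=(0.50, 0.99)):
    """Split cases into risk groups by centile of predicted risk.

    Returns one dict per non-empty group with its size, observed event
    rate, and lift (group rate divided by the overall rate).
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    bounds = [0] + [int(round(c * n)) for c in cut_points] + [n]
    overall = sum(labels) / n
    groups = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        members = order[lo:hi]
        if not members:
            continue
        rate = sum(labels[i] for i in members) / len(members)
        lift = rate / overall if overall else None
        groups.append({"n": len(members), "observed_rate": rate, "lift": lift})
    return groups


# Toy data: 100 cases with increasing predicted risk; the top 10 are events.
scores = [i / 100 for i in range(100)]
labels = [1 if i >= 90 else 0 for i in range(100)]
for group in stratify_by_centile(scores, labels):
    print(group)
```

A stratification like this is only as trustworthy as the calibration of the underlying probabilities, so observed rates per group should be checked on held-out data.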
A calibration plot of the test data set is presented in Figure S4.
Although the rate of uCD among the test data set (7.9%) was higher than the rate among the training and validation data sets (6.0% and 5.9%, respectively), the calculated probabilities were in agreement with observed frequencies of uCD.
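Agreement between calculated probabilities and observed frequencies is usually assessed by binning predictions and comparing the mean predicted risk with the observed event rate in each bin. The sketch below shows one common way to build such a table. The equal-width binning and the toy data are assumptions, not the method behind Figure S4.

```python
def calibration_table(probabilities, labels, n_bins=10):
    """Return (count, mean predicted risk, observed rate) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probabilities, labels):
        # Equal-width bins on [0, 1]; p == 1.0 falls into the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))
    rows = []
    for members in bins:
        if not members:
            continue
        mean_predicted = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        rows.append((len(members), mean_predicted, observed))
    return rows


# Toy predictions and outcomes (illustrative only).
probabilities = [0.05, 0.05, 0.05, 0.05, 0.95, 0.95]
labels = [0, 0, 0, 0, 1, 1]
for count, mean_predicted, observed in calibration_table(probabilities, labels):
    print(count, round(mean_predicted, 3), round(observed, 3))
```

In a well-calibrated model the two rates track each other across bins, even when the overall event rate differs between data sets.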

| DISCUSSION
We developed an ML model for the prediction of uCD that integrates maternal characteristics with fetal sonographic parameters and provides clinically useful stratification of the risk for uCD. We observed stable predictive performance across the validation and test data sets and satisfactory calibration, indicating that the risk estimates were reliable. The clinical efficacy in stratifying the risk of uCD was consistent from 34+0 to 42+0 weeks of pregnancy as well as across groups with high pretest probability for uCD. All features required for the model are available at the time of admission to labor. We used a validated method to adjust biometric parameters assessed up to 5 weeks before labor to the gestational age at admission to labor.
Two major hurdles have hindered previous attempts to develop generalizable models. The first was data availability. Our meticu-
Our model is not the first ML model for prediction of uCD. A pioneering study presented ML models incorporating varying parity, maternal weight, and modes of labor-onset categories. 17 However, the study's cohort was limited to term deliveries only, and the authors neither shared the model for others to use nor provided a web calculator or nomogram for clinical use. Furthermore, the model was based on neonatal biometry, which is not available at admission to labor, rather than on fetal ultrasound.
Our model is available for use free of charge at Birth AI.org.
Approximately 50% of deliveries were stratified to the lowest risk group at risk of 1% or less for uCD. In contrast, approximately 1% of deliveries were stratified to the highest risk group with 65% risk of uCD.
In between, women with risk centiles of 51%-99% were at increasing risk for uCD ranging between 5% and 40% calculated for each group separately. A high probability for uCD does not mandate elective CD, but rather serves for counseling women at time of admission to labor.
Our study is not devoid of limitations. The retrospective collection of data remains a significant limitation even in large and granular data sets.

AUTHOR CONTRIBUTIONS
AT and RM designed the study, defined the study cohorts, cap-

FUNDING INFORMATION
No external funding was used to conduct this study.

CONFLICT OF INTEREST
The Sheba Medical Center has submitted a provisional patent application to protect the intellectual property aspects of the reported machine learning model. Dr. Raanan Meyer, Dr. Avi Tsur, and Roni Eilenberg are listed as inventors in the patent application.

DATA AVAILABILITY STATEMENT
Research data are not shared.

Lift, the ratio of the target response divided by the average response.