Combination of deep learning and ensemble machine learning using intraoperative video images strongly predicts recovery of urinary continence after robot‐assisted radical prostatectomy

Abstract Background We recently reported the importance of deep learning (DL) of pelvic magnetic resonance imaging in predicting the degree of urinary incontinence (UI) following robot‐assisted radical prostatectomy (RARP). However, our results were limited because the prediction accuracy was approximately 70%. Aim To develop a more precise model, based on DL of intraoperative video images, that can inform patients about UI recovery after RARP. Methods and Results The study cohort comprised 101 patients with localized prostate cancer undergoing RARP. Three snapshots from intraoperative video recordings showing the pelvic cavity (before bladder neck incision, immediately after prostate removal, and after vesicourethral anastomosis) were evaluated together with pre‐ and intraoperative parameters. We evaluated the DL model combined with simple or ensemble machine learning (ML), and performance was assessed by the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. Of the 101 patients, 64 demonstrated "early continence" (using 0 or 1 safety pad at 3 months post‐RARP) and 37 demonstrated "late continence" (all others). The combination of DL and simple ML using intraoperative video snapshots with clinicopathological parameters showed moderate performance (AUC, 0.683–0.749) in predicting early recovery from UI after surgery. Furthermore, combining DL with an ensemble artificial neural network using intraoperative video snapshots had the highest performance (AUC, 0.882; sensitivity, 92.2%; specificity, 78.4%; overall accuracy, 85.3%) in predicting early recovery from post‐RARP incontinence, with similar results on internal validation. The addition of clinicopathological parameters had no additive effect in any analysis using DL with ensemble or simple ML.
Conclusion Our findings suggest that a DL algorithm using intraoperative video images is a reliable method for informing patients about the likely course of their recovery from UI after RARP, although it remains unclear whether our methods are reproducible for predicting long‐term UI and pad‐free continence.


| INTRODUCTION
Artificial intelligence (AI) is a branch of computer science that approximates human cognitive functions such as decision-making, problem-solving, detection, and classification using algorithms. 1 A newer machine learning (ML) technology called deep learning (DL) 2 is in increasing demand in the medical field. [3][4][5] Urology was one of the first areas in which AI was used for detecting medical devices, identifying images, assessing surgical skill, and predicting clinical effectiveness in complex urological procedures. 6 Prostate cancer (PC) affects a high percentage of men worldwide, and robot-assisted radical prostatectomy (RARP) is the standard of care for localized PC. However, post-prostatectomy urinary incontinence (PPUI) is a typical postoperative complication that significantly impairs the quality of life of patients with PC. We recently reported that DL with magnetic resonance imaging (MRI) is useful for predicting the severity of urinary incontinence (UI) after RARP. 7 Our results suggested that a DL algorithm using preoperative imaging might aid treatment selection, especially for patients with PC who wish to avoid long-term UI after RARP. However, the results were limited, as the prediction accuracy was only about 70%. 7 Recent studies suggest that the preservation of periprostatic structures by intraoperative surgical techniques, such as nerve-sparing (NS), bladder neck-preserving, and Retzius-sparing modalities, is associated with early recovery from UI after RARP. [8][9][10] Although such surgical procedures can be expected to improve early recovery from UI, it is difficult to objectively assess the relationship between surgical procedures and the preservation of anatomic structures, and whether these contribute to early recovery from UI after RARP.
We hypothesized that DL could objectively assess both the relationship between surgical techniques and the preservation of anatomic structures, and the relationship between those structures and early recovery from UI after RARP. In the present study, we aimed to develop a more accurate predictive model to inform patients with PC about the timing of UI recovery after RARP, using a DL model based on intraoperative video images.

| Patient selection
All experimental protocols were approved by the Institutional Review Board (IRB) of Fujita Health University School of Medicine (IRB no. HM19-257). All methods were performed in accordance with the relevant local guidelines and regulations. The purpose of the study was explained to the patients, and a website with additional information, including an opt-out option, was set up for the study. A database of 400 patients with PC treated from August 1, 2015 to July 31, 2019 was used, as in our recent study. 7 In 78 patients, the surgical videos had been deleted or lost; in 299 patients, the video records could be viewed, but the snapshots necessary for the analysis could not always be obtained (refer to Snapshots extraction from the intraoperative video).
Thus, we included 101 patients whose intraoperative video records were available. In addition, the video records of 30 additional patients undergoing surgery were prepared for internal validation.

| RARP surgery
RARP was performed by nine surgeons using the da Vinci Si or Xi system (Intuitive Surgical, Inc., Sunnyvale, CA, USA). NS surgery was performed according to the clinical stage and risk criteria, and bladder neck preservation was routinely included. Every patient underwent posterior and anterior reconstruction.

| Pre- and intraoperative risk parameters and the continence definition
Preoperative clinicopathological covariates were assessed, including age, body mass index (BMI), history of neoadjuvant androgen deprivation therapy (NADT), membranous urethral length (MUL), prostate volume (PV), continence status before RARP, serum prostate-specific antigen (PSA) level, Gleason score (GS sum), clinical stage, and risk criteria based on the risk stratification in the European Association of Urology guidelines. Intraoperative covariates included operator experience, total operation time, console time, nerve sparing (with or without), and bleeding volume. Surgeons with experience of more than 50 RARP cases were considered experts, whereas the others were considered nonexperts. Continence was evaluated using the Expanded Prostate Cancer Index Composite survey question: "How many pads per day did you usually use to control leakage during the last 4 weeks?" Patients who used no pads, with no urine leakage, or who used 1 safety pad for less than 20 mL at 3 months postoperatively were included in the "early continence" group, whereas the others were categorized into the "late continence" group.

| Snapshots extraction from the intraoperative video
Three snapshots from the intraoperative video recordings showing the pelvic cavity (before bladder neck incision, immediately after prostate removal, and after vesicourethral anastomosis) were extracted. Snapshot extraction was performed in accordance with the following principles to ensure reproducibility: (1) Anatomical structures near the pubic symphysis (prostatic apex) should be included in all images. (2) In "before bladder neck incision," the bladder neck and prostatic apex should be visible. (3) In "immediately after prostate removal," bladder neck preservation, nerve preservation, and the degree of bleeding should be visible. (4) In "after vesicourethral anastomosis," the anastomosis should be visible without tension on the urethra or bladder. Figure 1 shows representative examples of the three snapshots extracted from the intraoperative video records of the same patient.
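As a simple illustration of locating fixed timepoints within a recording, the chosen snapshot positions can be mapped to video frame indices; the timestamps and frame rate below are hypothetical placeholders, not study values.

```python
# Map surgical timepoints to video frame indices.
# Timestamps (seconds) and frame rate are hypothetical placeholders.
def frame_index(t_seconds: float, fps: float) -> int:
    """Return the frame index closest to time t_seconds."""
    return round(t_seconds * fps)

timepoints = {
    "before_bladder_neck_incision": 1200.0,
    "after_prostate_removal": 5400.0,
    "after_vesicourethral_anastomosis": 7800.0,
}
fps = 30.0
indices = {name: frame_index(t, fps) for name, t in timepoints.items()}
```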

| DL model
First, the given images were input into a convolutional neural network (CNN), a DL technique 2,11 with an excellent ability to classify images. 12 CNNs are applied in medical image processing and are widely used for lesion detection, differentiation, and prognostication. 7,13,14 We focused on DenseNet, a type of CNN. 15 By densely connecting the layers, information can be transmitted smoothly even in a very deep network, thereby improving processing performance. DenseNet has several variants with different numbers of network layers; in this study, we used DenseNet169 (with 169 layers).
A common method for using CNNs is to input data and directly obtain the desired results, such as the classification output. In this study, three images were treated as input images; therefore, a simple DenseNet could not be used. CNNs are also used as feature extractors, where the convolutional layer of the CNN is responsible for extracting various features from the input images. CNNs trained on a large number of images can extract general-purpose features from images, and some studies have used these features for other purposes. In our previous study, 7 a CNN was used to extract 4096 features from a single image and predict whether urinary continence was good or poor using ML.
In this study, three images were input into DenseNet169, and 1920 features were obtained for each image; overall, 5760 features were used for prediction. Dimensionality reduction was required because the number of features was large relative to the number of samples. Principal component analysis was used for this purpose, and the data were compressed into 20 principal components. These components served as the image features obtained from the CNN.
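A minimal sketch of this compression step (scikit-learn, with synthetic stand-in data rather than the study's features):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in for the extracted image features: 101 patients x 5760 features
# (3 snapshots x 1920 features each, as described in the text).
X = rng.normal(size=(101, 5760))

# With far more features than samples, PCA can retain at most
# n_samples - 1 components, so compressing to 20 is well within range.
pca = PCA(n_components=20)
X_compressed = pca.fit_transform(X)   # (101, 20) image features for ML
```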
Using the image features and clinical information described above, an ML method was used to predict whether recovery of urinary continence was early or late. Naïve Bayes, a support vector machine, a random forest, and an artificial neural network (ANN) were each used as the ML method. Prediction using only one ML method in this way is called the simple model. We also introduced an ensemble model, in which the outputs of the above four ML methods (the predicted probability of UI) and the 20 compressed features were input again into an ML model to predict UI.
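The ensemble construction described above can be sketched as follows (scikit-learn; the labels, features, and hyperparameters are synthetic placeholders, not the study's configuration). Note that fitting the base learners and the second-stage model on the same data, as in this toy example, leaks information; in practice, out-of-fold predictions would be used.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(101, 20))        # 20 compressed image features
y = rng.integers(0, 2, size=101)      # 1 = early, 0 = late continence

# The four base learners used in the simple (single-model) setting.
base_models = [
    GaussianNB(),
    SVC(probability=True, random_state=0),
    RandomForestClassifier(random_state=0),
    MLPClassifier(max_iter=1000, random_state=0),
]
# Each base model's predicted probability of early continence.
probs = [m.fit(X, y).predict_proba(X)[:, [1]] for m in base_models]

# Ensemble model: the four probabilities plus the 20 compressed features
# are fed into a second-stage ML model.
X_meta = np.hstack(probs + [X])       # shape (101, 24)
meta = MLPClassifier(max_iter=1000, random_state=0).fit(X_meta, y)
prediction = meta.predict(X_meta)
```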
Six methods (Methods 1-6) were defined according to the input information (images with or without clinical information) and the network configuration (simple or ensemble model). In this study, we compared the prediction performance of these methods as follows: Method 1, DL with three video images and simple ML;

| Statistical analyses
The EZR software (Saitama Medical Center, Jichi Medical University, Saitama, Japan) 16 was used for statistical analyses.

| RESULTS
Figure 3A,B shows the AUC and the accuracy of continence prediction. Intraoperative video snapshots combined with clinicopathological parameters achieved an AUC of 0.683-0.749 (Figure 3A and Table S1), while intraoperative video snapshots alone (Method 1) achieved an AUC of 0.641-0.701 (Figure 3B and Table S1), suggesting that combining intraoperative video images with clinicopathological parameters had an additive effect on PPUI prediction. We then attempted a combination of DL and ensemble ML using intraoperative video images (Methods 2 and 4).
We found that combining DL with ensemble ANN using intraoperative video snapshots (Method 2) had the highest performance, with an AUC of 0.882 (sensitivity, 92.2%; specificity, 78.4%; overall accuracy, 85.3%) for predicting early recovery from PPUI (Figure 3C and Table S1); the corresponding results with clinicopathological parameters are shown in Figure 3D and Table S1. The AUCs of the four ML algorithms according to the DL and ML methods are shown in Figure 4A. Ensemble ML involving clinicopathological parameters had no notable additive effect on performance compared with simple ML (Methods 5 and 6 in Figure 3). We finally performed an internal validation test using snapshot photographs extracted from the surgical videos of 30 recently operated patients. Although the combination of DL and simple ML using intraoperative video snapshots (Method 1) did not give excellent results, the combination of DL and ensemble ANN (Method 2) performed best in predicting early recovery from PPUI, with an AUC of 0.858 (Figure 4B). These results suggest that ensemble ML has the potential to improve accuracy using information from various methods.

FIGURE 3 ROC curves and accuracies of continence prediction using intraoperative video images and clinicopathological parameters analyzed by DL and simple ML or DL and ensemble ML. (A, B) Intraoperative video images with (A) or without (B) clinicopathological parameters were analyzed by DL and simple ML, as described in Figure 1. ROC analyses were performed three times, and representative curves are shown. (C, D) Intraoperative video images without (C) or with (D) clinicopathological parameters were analyzed by DL and ensemble ML, as described in Figure 1. ROC analyses were performed three times, and representative curves are shown. DL, deep learning; ML, machine learning; ROC, receiver operating characteristic.

Support vector machines classify data in a multidimensional space using hyperplanes, following a policy of structural risk minimization, whereas random forests and naïve Bayes rely on ensembles of decision trees and probabilistic reasoning, respectively. 25,26 In contrast, ANNs are aimed at solving problems that arise in various classification and pattern-recognition tasks. 27 The ANN was developed to mimic the neuronal architecture of the human brain and is trained to reflect weighted combinations of input variables in its results. 28,29 The greatest advantage of the ANN is its ability to efficiently approximate and analyze any nonlinear functional model. 26 Our study was an evaluation of intraoperative images, not a study to "exclude unfit patients for surgery," as focused on in previous studies using preoperative information. 7 However, our results indicate that both "anatomical" and "surgical" factors (surgical techniques) shown in the video are related to the risk of PPUI. Our results suggest that the information obtained from actual surgical images is more important for outcome prediction than the surgical procedure and performance per se. Among the limitations of this study, the early continence observed at 3 months with 0-1 pads has not been reproduced in men without pads or at the 6- or 12-month follow-up. In addition, external validation was not performed, so the objectivity of the data is not ensured at present.

| CONCLUSION
Our findings may be useful for individual counseling in clinical practice, using preoperative and intraoperative information to calculate the probability of UI after RARP. We expect that methods for identifying hotspots in intraoperative video information will be developed and that this DL model can be used as a surgical navigation and training tool to help avoid prolonged UI after RARP.

ACKNOWLEDGMENTS
We would like to thank Ms. Emi Bito at Fujita Cancer Center for her assistance in preparing and submitting this manuscript.