Value of the application of computed tomography‐based radiomics for preoperative prediction of unfavorable pathology in initial bladder cancer

Abstract Objectives To construct and validate unfavorable pathology (UFP) prediction models for patients with the first diagnosis of bladder cancer (initial BLCA) and to compare the comprehensive predictive performance of these models. Materials and Methods A total of 105 patients with initial BLCA were included and randomly enrolled into the training and testing cohorts in a 7:3 ratio. The clinical model was constructed using independent UFP‐risk factors determined by multivariate logistic regression (LR) analysis in the training cohort. Radiomics features were extracted from manually segmented regions of interest in computed tomography (CT) images. The optimal CT‐based radiomics features to predict UFP were determined by the optimal feature filter and the least absolute shrinkage and selection operator algorithm. The radiomics model consist with the optimal features was constructed by the best of the six machine learning filters. The clinic‐radiomics model combined the clinical and radiomics models via LR. The area under the curve (AUC), accuracy, sensitivity, specificity, positive and negative predictive value, calibration curve and decision curve analysis were used to evaluate the predictive performance of the models. Results Patients in the UFP group had a significantly older age (69.61 vs. 63.93 years, p = 0.034), lager tumor size (45.7% vs. 11.1%, p = 0.002) and higher neutrophil to lymphocyte ratio (NLR; 2.76 vs. 2.33, p = 0.017) than favorable pathologic group in the training cohort. Tumor size (OR, 6.02; 95% CI, 1.50–24.10; p = 0.011) and NLR (OR, 1.50; 95% CI, 1.05–2.16; p = 0.026) were identified as independent predictive factors for UFP, and the clinical model was constructed using these factors. The LR classifier with the best AUC (0.817, the testing cohorts) was used to construct the radiomics model based on the optimal radiomics features. Finally, the clinic‐radiomics model was developed by combining the clinical and radiomics models using LR. After comparison, the clinic‐radiomics model had the best performance in comprehensive predictive efficacy (accuracy = 0.750, AUC = 0.817, the testing cohorts) and clinical net benefit among UFP‐prediction models, while the clinical model (accuracy = 0.625, AUC = 0.742, the testing cohorts) was the worst. Conclusion Our study demonstrates that the clinic‐radiomics model exhibits the best predictive efficacy and clinical net benefit for predicting UFP in initial BLCA compared with the clinical and radiomics model. The integration of radiomics features significantly improves the comprehensive performance of the clinical model.


| INTRODUCTION
Bladder cancer (BLCA) is among the most commonly diagnosed malignant tumors of the urinary system, ranking as the tenth most prevalent cancer worldwide. 1 The latest statistics indicate that BLCA caused approximately 210,000 tumor-related deaths and affected 570,000 new individuals in 2020, posing a significant threat to public health. 1 The pathology features of initial BLCA play a crucial role in determining the appropriate treatment strategy and predicting prognosis. 2 Even though cystoscopic biopsies and transurethral resection of bladder tumor (TURBT) are standard methods for determining the histopathology of bladder tumors, up to 20%-53% of muscleinvasive bladder cancers (MIBCs) are misdiagnosed and 15% of tumors are inaccurately graded due to sampling errors, heterogeneous characteristics of the tumor, and the small size of the sample collected. [3][4][5][6][7] The standard of care for bladder tumors with unfavorable pathologic (UFP) features, such as high-grade muscular infiltration, is radical cystectomy, whereas the first choice of treatment for bladder tumors with favorable pathologic (FP) feature, such as low-grade nonmuscular infiltration, is TURBT. [8][9][10] Inaccurate assessment of pathologic features can lead to incorrect treatment planning, increases the risk of poor outcomes for patients and causing significant financial and emotional damage.
Computed tomography (CT) images are the most commonly used method for preoperative evaluation of patients with BLCA. The morphology and enhancement pattern of CT can reflect certain pathological features of the bladder tumor. However, due to the limited number of features that the human eye can recognize from thousands of pixels in CT images, the information that can be acquired is limited. In contrast, the use of computerrecognized imaging features can assist radiologists in making more accurate diagnoses. 11 With the advancement of medical and computer technology, radiomics has gradually gained importance in urologic tumors. [12][13][14] Radiomics is a new quantitative evaluation approach that involves converting digital medical images (including CT, magnetic resonance imaging (MRI), ultrasound, or positron emission tomography (PET) images) into highdimensional data and then extracting numerous image features through high-throughput computation. CT-based radiomics features of BLCA have been applied to predict the physiological information of tumors, such as preoperative histological grading and staging, [15][16][17] achieving satisfactory predictive accuracy. However, most studies have only constructed predictive nomogram models containing imaging features, without incorporating additional omics information.
This study aimed to construct and validate three prediction models for UFP in initial BLCA, including the clinical, radiomics, and clinic-radiomics models. The study further aimed to compare the comprehensive predictive performance of these models to identify the optimal model for aiding clinical treatment and prognostic assessment.

| Patient cohort
A total of 105 patients with initial BLCA diagnosed between 2017 and 2022 were included in the study and randomly enrolled into the training and testing cohorts, in a 7:3 ratio. Subjects were excluded according to the exclusion criteria: (1) missing clinical or imaging data; (2) incomplete clinical and imaging data at the time of initial diagnosis; (3) lack of pathological data after the CT scans First Affiliated Hospital of Nanchang University, Grant/Award Number: YFYPY202026 the clinic-radiomics model was developed by combining the clinical and radiomics models using LR. After comparison, the clinic-radiomics model had the best performance in comprehensive predictive efficacy (accuracy = 0.750, AUC = 0.817, the testing cohorts) and clinical net benefit among UFP-prediction models, while the clinical model (accuracy = 0.625, AUC = 0.742, the testing cohorts) was the worst.

Conclusion:
Our study demonstrates that the clinic-radiomics model exhibits the best predictive efficacy and clinical net benefit for predicting UFP in initial BLCA compared with the clinical and radiomics model. The integration of radiomics features significantly improves the comprehensive performance of the clinical model.

K E Y W O R D S
clinical model, initial bladder cancer, nomogram, radiomics, unfavorable pathology conducted at our center; (4) history of any significant interventions such as radiotherapy and biopsy prior to CT scan.

| Clinical research materials
Clinical-pathological data, including years, sex, tumor size, number of tumors, smoking history, gross hematuria, neutrophil to lymphocyte ratio (NLR), histological grade and pathological stage, was extracted from a prospectively managed medical database. UFP was defined by the presence of high-grade histology, BLCA with T stage ≥T2, BLCA with lymphatic metastasis (N1 stage) or BLCA with distant metastasis (M1 stage).

| Clinical model construction
Univariable and multivariable logistic regression (LR) analyses were conducted on clinical characteristics to identify the independent predictive factors of UFP in the training cohort. These independent factors were then used to construct the clinical model (clinical UFP-risk nomogram), and the clinical score was driven by Equation (1).

| Imaging research materials
CT images of the abdomen or pelvic cavity were acquired using a SOMATOM Definition CT scanner. Scan parameters were automatically modulated according to the following settings: tube voltage, 120 kV; tube current, 150 mAs; slice thickness, 5 mm; reconstruction interval, 1 mm; and slice gap, 1 mm.

| Region-of-interest (ROI) segmentation
The DARWIN scientific research platform (Beijing Yizhun Intelligent Technology Co., LTD., https://arxiv. org/abs/2009.00908) was utilized to delineate ROIs on the uploaded CT images. A radiologist (WD) with 5 years of experience manually segmented the 3D-ROI, and another senior radiologist (BF) with over 15 years of experience reviewed all segmentation results. Both radiologists were blinded to the tumors' pathological features and the patients' assigned groups. In cases where there was a disagreement over the tumor segmentation boundary, the tumor profile was discussed and determined by the highly experienced radiologist.

| Radiomics feature extraction and selection
After segmentation, 1781 radiomic features of lesions were extracted in the same platform using the Python Pyradiomics package (https://pyrad iomics.readt hedocs. io/en/lates t/). The radiomic features included first-order features, shape features, and texture features, such as graylevel co-occurrence matrix features, gray-level run-length matrix features, gray-level size zone matrix features, and gray-level dependence matrix features. Moreover, the radiomic features also included the first-order features and texture features obtained from the images processed using nine filter types (original, exponential, gradient, lbp, logarithm, square, square root, wavelet, and log-sigma). All radiomic features were normalized by Z-score.
To determine the most relevant features, we evaluated the linear correlation between each feature and the category labels using the optimal feature filter, which was determined by the sample variance F-value in the training cohort. Among the 1781 features, we filtered the 45 most relevant features were filtered. We then used least absolute shrinkage and selection operator regression to further select the best predictive features ( Figure 1A). Finally, we extracted the 16 texture features that were most strongly associated with unfavorable pathology (UFP) in initial BLCA ( Figure 1B).

| Radiomics model construction
Based on the optimal radiomics features, we constructed six machine learning models in the training cohort, including gradient augmented decision tree (GBDT), Knearest neighbor, LR, random forest (RF), support vector machine, and EXtreme gradient boosting (XGBOOT). To confirm the accuracy of the selected model, we employed the 10-fold cross-validation method. The classifier with the highest predictive performance was selected to construct the radiomics model, and the radiomics score was obtained using Equation (1).

| Clinic-radiomics model construction
We developed the clinic-radiomics model (clinicradiomics UFP-risk nomogram) by combining the clinical and radiomics models using LR, which enabled us to determine the risk score with Equation (1): The optimal radiomic features selection (B) for UFP of initial BLCA based on LASSO regression algorithm (A). BLCA, bladder cancer; UFP, unfavorable pathology; LASSO, least absolute shrinkage and selection operator.
Here, α 0 stands for the constant, α i represents the LR coefficients, and x i represents the value of chosen radiomics or clinical features. When constructing the clinic-radiomics model, x i denotes the score of each individual model.

| Model performance evaluation
To assess the comprehensive predictive performance of the models in both the training and testing cohorts, we quantified the area under the curve (AUC) of the receiver operating characteristic, as well as the accuracy (ACC), sensitivity (SEN), specificity (SPE), positive predictive value (PPV) and negative predictive value (NPV). We employed DeLong's test to compare the differences in AUC between the models. Calibration curves were employed to evaluate the calibration of the models, and decision curve analyses (DCA) were applied to gauge the clinical net benefit, which can indicate the clinical utility of the models. The workflow of this study was demonstrated in Figure 2.

| Statistical analysis
All categorical variables compared with Pearson's chisquared test were given in the form of numbers and percentages. The Student t-test was used to assess normally distributed continuous variables, which were reported as mean and standard deviation; for those with non-normal distribution, the Wilcoxon rank sum test was used, and their results were presented as median and interquartile range. SPSS 24.0 software (SPSS) and R software (version 4.1.0) were utilized for all statistical analyses and graphical representations. All tests were two-sided, and significance was set at p < 0.05.

| Clinic-pathological characteristics
Ultimately, 27 patients with PF and 46 patients with UFP were included to the training set, while 12 patients with PF and 20 patients with UFP were included in the testing set.

UFP-prediction models
After conducting univariable and multivariable LR analyses in the training cohort ( the clinical model (clinical UFP-risk nomogram) was constructed based on the risk factors ( Figure 3). The formula for the clinical model was provided in the supplementary material.
The prediction performances of six machine learning models were shown in Figure S1A,B. Comparison of the AUC in different models between the training and testing cohorts revealed falsely high predictive performance in the GBDT, RF, and XGBOOST classifiers in the training cohort. However, the LR classifier not only had good prediction performance in the training group (AUC = 0.885, ACC = 0.822, SEN = 0.852, SPE = 0.804, PPV = 0.719 and NPV = 0.902), but also had the best prediction performance (AUC = 0.817, ACC = 0.750, SEN = 0.700, SPE = 0.833, PPV = 0.875 and NPV = 0.625) than remaining classifiers in the testing cohort. Therefore, the LR classifier was chosen to construct the radiomic model. Figure S2A exhibited the distribution of the radiomics score for all cases, in which patients with UFP had a higher radiomics score than UP. In addition, as depicted in Figure S2B, the difference was statistically significant not only in the training cohort (p < 0.0001) but also in the testing cohort (p < 0.01) (Figure 3). The formula of radiomics model was provided in supplementary material.  Similarly, the clinic-radiomics model, known as the clinic-radiomics UFP-risk nomogram, was developed using LR analysis and by combining radiomics score with clinical score (Figure 4). To generate a UFP-risk score for each case, the expression for the clinic-radiomics score (outlined in the supplementary material) was utilized.

| Validation and comparison of the three UFP-prediction models
As shown in Figure 5A  models in both cohorts ( Figure 6A). Analysis of DCA curves in the training cohort ( Figure 6B, left) revealed that when the high-risk threshold was within a range of 0%-82.5%, using the radiomics model and clinic-radiomics model to identify pathological features and make therapeutic decisions had greater clinical benefit than the clinical model, "treat all" and "treat none" schemes. In contrast, the clinical benefit of the radiomics model was not superior to that of the clinical model in the testing cohort ( Figure 6B, right). Nevertheless, the clinic-radiomics model still provided the best outcome among the three UFP-prediction models.

| DISCUSSION
In recent years, radiomics has become a popular method for preoperative prediction of bladder tumor pathology, thanks to advances in medical technology, and has achieved remarkable success in that regard. 12,[17][18][19] Several researchers have applied CT, MRI, and ultrasound to radiomic studies for predicting histological grading and staging features of BLCA and have demonstrated good predictive efficacy. 20,21 However, in most studies, only predictive nomogram model that contained imaging features was constructed, and models for other histologies were not. Zhao et al. combined radiomic and preoperative clinical features of perihilar cholangiocarcinoma to form a clinic-radiomics nomogram for forecasting early recurrence, which proved to be the most effective (AUC = 0.868) compared to the single-element model. 22 Therefore, we hypothesized that combining multi-omics data could potentially further improve the performance of predictive models for BLCA. To test our hypothesis and assess the value of radiomics in predicting pathological features for initial BLCA by clinical characterizations, we constructed and validated three UFP-prediction models which relied on the optimal clinical and radiomic features, including a clinical model, radiomics model, and clinic-radiomics models. Overall, the clinic-radiomics model demonstrated the highest predictive efficacy and clinical net benefit among the three models, while the clinical model lagged. Thus, we concluded that radiomics could significantly improve the performance of the clinical model, and recommended that urologists use the clinic-radiomics model to examine the pathological features of patients with initial BLCA.
Clinical characteristics are widely used in the construction of nomogram models for preoperatively predicting the pathological features of various tumors; certainly, urologic oncology is no exception. 23,24 In this study, the clinical model of UFP for initial BLCA was constructed based on tumor size (OR, 6.02; 95% CI, 1.50-24.10; p = 0.011) and NLR (OR, 1.50; 95% CI, 1.05-2.16; p = 0.026). Several studies have pointed out that tumor size is significantly associated with adverse pathological features. Sylvester et al. and Fernandez-Gomez et al. have both identified the size of the bladder tumor as a prognostic factor. 25,26 In addition, the EAU and NCCN guidelines use a tumor size of 3 cm as the threshold for risk classification. 10, 27 Mariappan et al. found that the risk of incorrectly predicting highgrade BLCA as low-grade significantly decreased in larger tumors (>30 mm) (OR, 3.2; 95% CI, 1.0-9.7; p = 0.07). 28 Moreover, increasing evidence in recent years has highlighted that higher NLR correlates with lower BLCA pathology, as well as worse tumor outcomes. The study by Viers et al. evaluated the correlation between preoperative NLR and clinicopathological outcomes following radical cystectomy in 899 cases, and revealed that with a 1-unit increase of NLR, there was a 7% and 9% increased risk of UFP for extravesical tumor extension (p = 0.03) and lymph node involvement (p = 0.02), respectively. 29 Similarly, the work of Tang et al. suggested that a higher NLR signified a higher risk of high-grade and muscle-invasive disease. 30 Multiple studies have also corroborated the connection between NLR and oncologic outcome, and it has been utilized as a marker for risk evaluation. 31,32 The above demonstrates that the independent predictors used to construct the clinical model for UFP of initial BLCA are valid and reliable.
The present study revealed that the clinical model had AUC values of 0.777 and 0.742 in the training and testing cohorts, respectively, indicating its appropriateness for predicting UFP in patients with initial BLCA. Nevertheless, the radiomics model exhibited marginally better predictive efficacy in both the training group (AUC = 0.885 vs. 0.777) and the testing group (AUC = 0.817 vs. 0.742), although these differences did not reach statistical significance. However, compared to the clinic-radiomic model, the difference in AUC was statistically significant in the training group (0.777 vs. 0.895, p = 0.015). These results demonstrated that the integration of radiomics into the clinical model led to an increase in the predictive efficacy of the model. This was reflected in a statistically significant difference in AUC between the clinic-radiomic and clinic-only models in the training group (0.777 vs. 0.895, respectively). The predictive accuracy and net benefit of the model were also found to be significantly improved with the inclusion of radiomics, further reinforcing that radiomics can lead to greater accuracy and overall performance when used in combination with a clinical model.
Radiomics features can reveal small, subtle changes in medical images that are difficult to detect with the unaided eye. These changes can help to characterize and interpret changes in tumor biology, providing timely and accurate information for clinical purposes. 33 Among the F I G U R E 4 Clinic-radiomics model (clinic-radiomics UFP-risk nomogram) for predicting the risk of UFP in patients with initial BLCA. UFP, unfavorable pathologic; BLCA, bladder cancer.

F I G U R E 5
The predictive performance of three UFP-prediction models, including clinical model, radiomics model and clinic-radiomics model. ROC of three UFP-prediction models in the training cohort (A, left) and testing cohort (A, right). The predictive performance of three UFP-prediction models in ACC, SEN, SPE, PPV and NPV in the training cohort (B, left) and testing cohort (B, right). ACC, accuracy; AUC, area under the curve; NPV, negative predictive value; PPV, positive predictive value; SEN, sensitivity; SPE, specificity; UFP, unfavorable pathologic. radiomic features included in this study, local binary pattern (LBP) features account for the most significant proportion (12/16, 75%). The LBP filter can detect the image's microstructure by comparing local voxel gray levels and creatingbinary pattern codes, which has been successfully utilized in computer-aided diagnosis systems for various malignant tumors. 34,35 Machine learning classifiers are essential for optimizing the performance of radiomics models in terms of accuracy and diagnosis. Their inclusion allows for enhanced assessment of medical images and more reliable outcomes. 36 In this study, LR had the best classification performance (AUC, ACC, SEN, SPE, PPV, and NPV of 0.885, 0.822, 0.852, 0.804, 0.719, and 0.902, respectively) among six classifiers in both the training and testing cohort. LR is widely used in machine learning classifiers related to radiomics analysis because of its simplicity. The incorporation of regularization into the LR model helps to reduce the error between the predicted and actual values, while also moving the weights closer to their origins. This helps to prevent overfitting and enhances the model's generalization capabilities. 17 Radiomics offers numerous benefits which lead to the overall predictive performance of the clinic-radiomics model being much greater than that of the clinical model F I G U R E 6 Calibration curve of three UFP-prediction models in the training cohort (A, left) and testing cohort (A, right). DCA of three UFP-prediction models in the training cohort (B, left) and testing cohort (B, right). DCA, decision curve analyses; UFP, unfavorable pathologic.

T A B L E 3
Comparison of the AUC with the clinical model, the radiomics model, and the clinic-radiomics model. alone. These advantages allow for superior accuracy and greater overall performance when it comes to predicting outcomes. Although this study is the first to construct three models and systematically compare their comprehensive performance in predicting pathological features in initial BLCA, our study is not without limitations. First, this study is retrospective in nature, so there is potential for selective bias, and further prospective research should be conducted to confirm the results. Second, since the data was drawn from a single medical center, the sample size was relatively small, further limiting the scope of the study. Multicenter studies with large sample sizes are still needed to reduce regional differences and to evaluate the models. Third, some biological features or biomarkers that are valuable for guiding clinical diagnosis and treatment are not included in the model, such as genetic mutation data. Incorporating additional genetic information into clinical features and thus constructing more comprehensive clinic-radiomics predictive models may allow us to achieve higher predictive efficacy in future studies.

| CONCLUSION
Our study constructed three models for predicting UFP in patients with initial BLCA. The multi-omics model that combines CT based radiomics and clinical models had the best predictive efficacy and clinical net benefit for predicting UFP in initial BLCA. Radiomics contributed significantly to the improvement of the comprehensive performance of the clinical model.