CT‐based radiomics model to predict spread through air space in resectable lung cancer

Abstract Background Spread through air space (STAS) has been identified as a pathological pattern associated with lung cancer progression. Patients with STAS have a worse prognosis than patients without STAS. The objective of this study was to establish a radiomics model capable of predicting STAS before surgery, which can assist surgeons in selecting the most appropriate operation type for patients with STAS. Method A total of 537 eligible patients were retrospectively included in this study. Regions of interest (ROIs) were segmented manually on all CT images. From each segmented lesion, a total of 1688 features were extracted. The tumor size, maximum tumor diameter, and tumor type were also recorded. Spearman's correlation coefficient was used to assess the correlation and redundancy of features, and redundant features were removed using a threshold of 0.80. To reduce overfitting and avoid statistical bias, a dimension reduction process was applied to decrease the number of features. Finally, a radiomics model including 44 features was established to predict STAS. The performance of the model was evaluated using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, and the accuracy of the model was verified by 10‐fold cross‐validation. Results The incidence of STAS was 38.2% (205/537). The tumor type, maximum tumor diameter, and consolidation tumor ratio were significantly different between the STAS group and the non‐STAS group. The training group included 430 patients, and the test group consisted of 107 patients. The training group achieved an AUC of 0.825 (sensitivity, 0.875; specificity, 0.621; and accuracy, 0.749), and the test group had an AUC of 0.802 (sensitivity, 0.797; specificity, 0.688; and accuracy, 0.748). The 10‐fold cross‐validation had an AUC of 0.834.
Conclusion A CT‐based radiomics model can predict STAS effectively, which is of great importance for guiding the selection of operation type before surgery.


| INTRODUCTION
6,7 Therefore, it is critical to identify STAS precisely in order to choose an appropriate operation for patients with lung cancer.
Due to the widespread use of thin-section computed tomography (CT) and the increased detection rate of early-stage NSCLC, 8 STAS has gradually become known and understood in academia. 10,11 Nevertheless, the interpretation of CT images relies heavily on the diagnostic experience of doctors. Radiomics can help analyze medical images and transform them into quantifiable multidimensional data, which makes it possible to identify lesions objectively at both the macro and micro levels. As a result, radiomics helps reduce subjectivity in quantifying images and subsequently helps doctors to perform a more comprehensive qualitative analysis of tumor phenotype. 12 Numerous studies have indicated that radiomics exhibits high levels of stability and reproducibility. 13,14 This study aimed to create a radiomics model that could predict the presence of STAS in a retrospective cohort, so that the STAS status can be determined before surgery and can guide the choice of an appropriate operation type.

| Patient selection
We retrospectively collected clinicopathological data and contrast-enhanced thin-section CT images of 585 consecutive patients undergoing surgery for lung cancer at Tianjin Medical University Cancer Institute and Hospital. The exclusion criteria were as follows: (1) pathology was benign tumor (n = 5); (2) preoperative neoadjuvant treatment (n = 8); and (3) incomplete data (n = 35). Finally, the cohort comprised a total of 537 patients, consisting of 265 males and 272 females, with an average age of 60.7 years. The ethics committee of Tianjin Medical University Cancer Institute and Hospital granted approval for this study (bc2022082).

| CT image collection and segmentation
All enrolled patients completed contrast-enhanced thin-section CT scans, and all CT scans were performed with spiral CT scanners (Siemens SOMATOM Definition AS+ and Siemens SOMATOM Drive). The scan settings included a detector collimation width of 64 × 0.6 mm, a tube voltage of 120 kVp, and automatic adjustment of the tube current. The images were reconstructed with either a 1.5 mm slice thickness and 1.5 mm gap or a 1.5 mm slice thickness and 1.0 mm gap. The reconstruction matrix was 512 × 512 pixels. The Digital Imaging and Communications in Medicine (DICOM) images were retrieved from the Picture Archiving and Communication System (PACS) and subsequently imported into the open-source 3D Slicer software (version 4.11) for further analysis.
Two physicians (GJL and YR, with 4 years and 10 years of experience in chest radiology, respectively) assessed all the CT images in 3D Slicer to determine the tumor type (pure ground-glass opacity [pGGO], mixed GGO [mGGO], or solid nodule) in consensus and to segment the regions of interest (ROIs) manually on each layer of the CT images. Neither physician knew the pathological type or STAS status. The maximum tumor diameter (Tdmax) of all nodules and the diameter of the solid component of mGGO were also measured to calculate the consolidation tumor ratio (CTR). Interobserver agreement was assessed after segmentation, and intraclass correlation coefficients (ICCs) greater than 0.8 were considered acceptable for further analysis.
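As a minimal illustration of how the CTR can be derived from the two measurements above, the sketch below assumes the commonly used definition (maximum diameter of the solid component divided by Tdmax). The function name and the example diameters are hypothetical and are not taken from the study data.

```python
def consolidation_tumor_ratio(solid_diameter_mm: float, tdmax_mm: float) -> float:
    """Return CTR = solid-component diameter / maximum tumor diameter (Tdmax).

    Assumed definition: a pGGO has no solid component (CTR = 0) and a
    solid nodule is entirely consolidated (CTR = 1).
    """
    if tdmax_mm <= 0:
        raise ValueError("Tdmax must be positive")
    if not 0 <= solid_diameter_mm <= tdmax_mm:
        raise ValueError("solid diameter must lie between 0 and Tdmax")
    return solid_diameter_mm / tdmax_mm


# Hypothetical mGGO: 12 mm solid component inside a 20 mm nodule.
ctr = consolidation_tumor_ratio(12.0, 20.0)
print(f"CTR = {ctr:.2f}")
```

Under this definition, the 0.5 cutoff discussed later simply separates nodules whose solid component spans more than half of the maximum diameter from those where it does not.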

| Radiomics feature extraction
The CT images with ROIs were input into the open-source software Pyradiomics (https://pyradiomics.readthedocs.io/en/latest/index.html), and radiomic features such as textural, morphological, intensity, law, and wavelet features were automatically extracted. The texture features comprised the gray level co-occurrence matrix (GLCM), neighboring gray-tone difference matrix (NGTDM), gray level size zone matrix (GLSZM), gray level run length matrix (GLRLM), and gray level dependence matrix (GLDM). In total, 1688 features were extracted from each segmented lesion.

| Feature extraction and radiomics model building
Spearman's correlation coefficient was used to assess the association and redundancy of features, and redundant features were removed using a threshold of 0.80. The number of features in the dataset was reduced by applying a dimension reduction process to limit overfitting and avoid statistical bias. First, the Mann-Whitney U test was used to select features that were significantly associated with STAS, with a significance threshold of p < 0.05. Second, the interfeature correlation coefficient (R) was calculated for all possible feature pairs to further reduce the dimensionality of the dataset and avoid feature redundancy. The cutoff value for R indicating a strong correlation was set to 0.8. Within a strongly correlated feature pair, the one with a lower p-value would be dropped. Finally, the least absolute shrinkage and selection operator (LASSO) method was used to select the most significant features with nonzero coefficients, which were used to compute Rad-scores for all patients. The radiomics workflow is shown in Figure 1, and the selected features are shown in Figure 2.
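The pairwise redundancy step described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: it assumes that R is Spearman's coefficient, that the univariate p-values have already been computed (for example by the Mann-Whitney U test), and that pairs are visited greedily in insertion order. The function names, the `drop_lower_p` switch, and the toy data are hypothetical; the default follows the wording of the text for which member of a pair is dropped.

```python
from itertools import combinations


def ranks(values):
    """Average 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    denom = (vx * vy) ** 0.5
    return cov / denom if denom else 0.0


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def filter_redundant(features, p_values, alpha=0.05, cutoff=0.8, drop_lower_p=True):
    """Keep features with p < alpha, then drop one member of each pair with |R| > cutoff.

    features: dict name -> list of per-patient values
    p_values: dict name -> univariate p-value (assumed precomputed)
    drop_lower_p: which member of a correlated pair is removed (default as stated in the text)
    """
    kept = [name for name in features if p_values[name] < alpha]
    dropped = set()
    for a, b in combinations(kept, 2):
        if a in dropped or b in dropped:
            continue
        if abs(spearman(features[a], features[b])) > cutoff:
            low, high = sorted((a, b), key=lambda name: p_values[name])
            dropped.add(low if drop_lower_p else high)
    return [name for name in kept if name not in dropped]


# Toy data: f1 and f2 are perfectly rank-correlated; f4 fails the p < 0.05 screen.
toy = {"f1": [1, 2, 3, 4, 5], "f2": [2, 4, 6, 8, 11],
       "f3": [5, 1, 4, 2, 3], "f4": [3, 3, 1, 2, 5]}
toy_p = {"f1": 0.01, "f2": 0.03, "f3": 0.02, "f4": 0.50}
print(filter_redundant(toy, toy_p))
```

The surviving features would then be passed to LASSO, which is not reproduced here.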

| Histopathological evaluation
Two pathologists (SLN and GN, with more than 5 years and 15 years of experience in thoracic pathology, respectively), who were unaware of the clinical outcomes, evaluated the hematoxylin-eosin (HE)-stained tissue sections from all enrolled patients. STAS positivity was defined as tumor cells in air spaces outside the main tumor boundary. STAS positivity was characterized by three distinct morphological patterns: (1) micropapillary structures consisting of papillary structures without central fibrovascular cores; (2) tumor islands or solid nests composed of solid collections of tumor cells filling air spaces; and (3) scattered and discohesive single cells. 5 Any disagreement was settled through discussion until a consensus was achieved.
Representative cases of lung adenocarcinoma with STAS were selected from this study, and their HE-stained sections are shown in Figure 3.

| Statistical analysis
The accuracy of the model was assessed through 10-fold cross-validation. To evaluate the performance of the model, the area under the curve (AUC) of the receiver operating characteristic (ROC) curve was calculated.
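For readers less familiar with these metrics, the following sketch shows one standard way to compute the AUC (as the probability that a STAS-positive case receives a higher score than a STAS-negative case) together with sensitivity, specificity, and accuracy at a given cutoff. The function names, the cutoff, and the toy labels and scores are hypothetical; the study's own analysis was performed in R and SPSS.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(labels, scores, threshold):
    """Sensitivity, specificity, accuracy when score >= threshold is called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg, (tp + tn) / len(labels)


# Toy example: 1 = STAS-positive, 0 = STAS-negative (made-up scores).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
print(roc_auc(y_true, y_score))
print(confusion_metrics(y_true, y_score, threshold=0.5))
```

The pairwise formulation is quadratic in the number of cases, which is acceptable for a cohort of a few hundred patients.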

| Clinicopathological features
The clinicopathological features of patients in both the training and test groups are shown in Table 1. There were 205 (38.2%) patients with STAS in this study. The pulmonary nodule density, Tdmax, and CTR were different between the two groups.
The comparison of clinical data between patients with and without STAS in the training group is shown in Table 2. Patients were categorized into either the STAS group or the non-STAS group, depending on whether STAS was present or absent. The training group consisted of 430 patients with a median age of 61 years (IQR: 56-65), and the positive rate of STAS was 37.2% (160/430). No statistically significant differences were observed regarding age, sex, smoking history, gene mutation, or adenocarcinoma subtype between the two groups. There was a significant difference in the operation type (p = 0.012) and tumor location (p = 0.014) between the STAS and non-STAS groups. The patients with STAS had a higher likelihood of undergoing lobectomy. The majority of STAS patients were stage T1c (p = 0.024) and N0 (p < 0.001). In terms of invasive pathological behavior, the patients with STAS in the training group were significantly more likely to have visceral invasion.

| Rad-score building and selected features to predict STAS
Forty-four features with nonzero coefficients were chosen to establish the Rad-score using a LASSO logistic regression model, including 17 first-order features, seven GLCMs, two GLRLMs, 10 GLSZMs, two NGTDMs, four GLDMs, and two shape features. The details of the features and coefficients are shown in Table S1. The comparison of these features is shown in Table 3.
The Rad-score was calculated as $\text{Rad-score} = \text{intercept} + \sum_{i} \beta_i X_i$, where $X_i$ is the $i$th radiomics feature with a nonzero coefficient, $\beta_i$ is the coefficient of that feature, and the intercept is −0.853. In this study, the Rad-score was notably higher in the STAS group than in the non-STAS group (all p < 0.001, Table S2).
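The Rad-score is a linear combination of the selected features, which can be sketched as follows. Only the intercept (−0.853) comes from the text; the feature names, coefficients, and feature values below are hypothetical placeholders, since the actual 44 coefficients are listed in Table S1 and are not reproduced here.

```python
def rad_score(feature_values, coefficients, intercept=-0.853):
    """Rad-score = intercept + sum of coefficient * feature value.

    feature_values: dict name -> value for one patient
    coefficients:   dict name -> nonzero LASSO coefficient
    """
    missing = set(coefficients) - set(feature_values)
    if missing:
        raise KeyError(f"missing feature values: {sorted(missing)}")
    return intercept + sum(coefficients[name] * feature_values[name]
                           for name in coefficients)


# Hypothetical coefficients and one hypothetical patient (not from Table S1).
example_coefficients = {"firstorder_Mean": 0.40, "glcm_Contrast": -0.25, "shape_Sphericity": 0.10}
example_patient = {"firstorder_Mean": 1.5, "glcm_Contrast": 0.8, "shape_Sphericity": -2.0}
print(round(rad_score(example_patient, example_coefficients), 3))
```

Features that LASSO shrinks to zero contribute nothing to the sum, which is why only the nonzero-coefficient features need to be supplied.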

| Rad-score and clinical factors
We first calculated the correlation between the Rad-score and each clinical factor in Table 1 in the training group. Subsequently, we selected the significant factors from this analysis for further calculation in the test group. The violin plots are shown in Figure 4.
In the training group, some immunohistochemical markers, such as TTF1, P40, CK5/6, Napsin A, and Syn, exhibited a significant relationship with the Rad-score, but this correlation was not significant in the test group. Gene mutations and the solid component proportion were also correlated with the Rad-score in the training group; however, these relationships were not statistically significant in the test group (Figure S1), possibly due to the small sample size. In both the training and test groups, there was a significant correlation between the Rad-score and CTR, N stage, the proportion of lepidic components, and pleural invasion, with p values less than 0.05 for the nonparametric hypothesis tests. The correlation between the Rad-score and the cribriform pattern had a p value of 0.054, which fell just short of statistical significance but suggested a certain degree of correlation.
The radiomics model based on CT images not only had the ability to accurately identify STAS patients but also effectively distinguished patients with CTR values greater than 0.5 from those with values less than 0.5. Moreover, the model was able to detect patients with pleural invasion, showed a high level of accuracy in identifying patients with advanced N stage, and discerned differences in the proportion of lepidic components.

| Radiomics results
The study used 44 radiomics features, selected through the feature selection and model building process described earlier, to predict STAS. The coefficients of these features are shown in Figure 2. The model showed satisfactory performance in both the training and test groups: the AUC was 0.825 in the training group (sensitivity, 0.875; specificity, 0.621; and accuracy, 0.749) and 0.802 in the test group (sensitivity, 0.797; specificity, 0.688; and accuracy, 0.748), as shown in Figure 5. To validate the stability of the model, 10-fold cross-validation was conducted, and the AUC was 0.834 (sensitivity, 0.823; specificity, 0.748; and accuracy, 0.778), as shown in Figure 5C.
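To make the 10-fold cross-validation concrete, the sketch below partitions a cohort into 10 folds while keeping the STAS-positive proportion roughly equal across folds. It is an assumption-laden illustration: the text does not state whether the folds were stratified or which patients entered the cross-validation, so the whole-cohort counts (205 STAS-positive, 332 STAS-negative) are used only as an example, and the function name and seed are hypothetical.

```python
import random


def stratified_k_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across the folds.
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    return folds


# Example cohort: 205 STAS-positive (1) and 332 STAS-negative (0) patients.
cohort_labels = [1] * 205 + [0] * 332
folds = stratified_k_folds(cohort_labels, k=10)

for held_out in folds:
    train_idx = [i for fold in folds if fold is not held_out for i in fold]
    # A model would be fitted on train_idx and evaluated on held_out here.
    assert len(train_idx) + len(held_out) == len(cohort_labels)

print([len(fold) for fold in folds])
```

Each patient is held out exactly once, so the per-fold predictions can be pooled or averaged to obtain a single cross-validated AUC.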

| DISCUSSION
STAS was first described in 2015 and was defined as the presence of micropapillary clusters, solid nests, and/or single cells spreading within air spaces beyond the edge of the main tumor. 1,5 We are confident that STAS is not a random event caused by human factors when obtaining specimens but a real pathological risk factor that reflects the peripheral infiltration of tumor cells. 5,17 The positive rate of STAS in this study was 38.2%. 9,20 STAS is associated with lung cancer-specific death in lung neuroendocrine tumors and squamous cell carcinoma. 3,4 Dai et al. 7 studied the impact of STAS and tumor size on survival and found that in lung adenocarcinoma <3 cm, STAS-positive patients had an unfavorable prognosis. In the 2021 WHO classification of lung tumors, STAS was considered a histologic feature with prognostic significance. 21 A method that can predict STAS with good specificity and sensitivity is essential. Previous studies showed that the radiological factor CTR could preoperatively predict pathological invasive lung cancer with satisfactory specificity among clinical T1N0M0 peripheral patients. 22 In this study, we found a significant difference in the CTR between the STAS and non-STAS groups (p < 0.001). Furthermore, 95% (152/160) of the STAS patients had a CTR value greater than 0.5, and the results were consistent with the study by Ding and colleagues. 11 Moreover, a definite correlation between the Rad-score and CTR was observed, with a Wilcoxon test p-value < 0.01, which suggested a discernible association between STAS and CTR.
In the training group, more than half of the patients had solid nodules in the CT images (p < 0.001). Two studies reported a significant correlation between STAS and solid nodules, larger tumor size, micropapillary/solid pattern-predominant adenocarcinoma, visceral pleural invasion, and lymphovascular invasion. 23,24 In this study, we also found that pleural invasion, cribriform pattern, and visceral invasion were significantly different between patients with STAS and those without STAS in the training group.
The factors mentioned above are invasive characteristics of pulmonary nodules. In this study, pleural invasion and the cribriform pattern were found to have a certain correlation with the Rad-score, with respective p values of 0.041 and 0.054. This finding indicated that our model could effectively identify pleural invasion and, to a certain extent, identify patients with a cribriform pattern. In the field of organ transplantation, when dealing with donor lungs presenting with nodules on CT images, some CT manifestations could aid in estimating the invasive level of the nodules, such as nodule size, maximal CT value, lobulation sign, vessel abnormality, 25 and STAS. From this perspective, a multiparameter clinical and radiomics model to predict the invasive level of lung nodules is highly necessary.
The most common adenocarcinoma subtype in the STAS-positive group in our study was acinar-predominant adenocarcinoma, followed by lepidic-predominant adenocarcinoma and solid-predominant adenocarcinoma. Similar to previous studies, 17 the STAS-positive group had fewer EGFR mutations and more ALK mutations in our study. In summary, the radiological and pathological features of STAS suggest that it may be a potential factor in tumor invasion.
The extent of resection for small-sized pulmonary nodules has always been a topic of intense interest among thoracic surgeons. Lobectomy has long been the most commonly used treatment for early-stage NSCLC, but this practice has been reconsidered since the results of the JCOG0802/WJOG4607L study were published. 26 The clinical trial JCOG0802 investigated whether segmentectomy was noninferior to lobectomy. The 5-year overall survival rates for patients who underwent segmentectomy and lobectomy were 94.3% and 91.1%, respectively. The segmentectomy group showed consistently improved overall survival across all predefined subgroups. Based on these results, segmentectomy was recommended as the standard surgical procedure for patients with small-sized peripheral NSCLC. For STAS-positive patients, it is crucial to choose an optimal operation type to preserve more lung parenchyma and achieve a better prognosis. Whether lobectomy or sublobar resection is more suitable for these patients remains controversial. Two groups of researchers conducted separate studies comparing the prognosis of sublobar resection with that of lobectomy for stage IA lung cancer with STAS. Both studies concluded that STAS was an important adverse factor for sublobar resection. 16,27 Patients with STAS who underwent sublobar resection were found to have a heightened risk of locoregional recurrence, regardless of the margin-to-tumor ratio. 27 Kagimoto et al. 15 found that in patients with clinical stage IA lung adenocarcinoma and STAS, segmentectomy had a similar prognosis to lobectomy without an increased risk of locoregional recurrence. Because the optimal operation type is still controversial, prospective studies investigating which operation type is better for patients with STAS are necessary.
In our study, the radiomics approach showed promising results in predicting the presence of STAS in lung adenocarcinoma as well as in other histological types, such as squamous cell carcinoma, large cell lung cancer, small cell lung cancer, and neuroendocrine cancer. There were 537 patients included in the study, and both the training and test cohorts were of substantial size. Furthermore, patients with T4 and N2 stages were also included. While several independent studies have established radiomics models to predict STAS, the number of patients enrolled in those studies was relatively small. For instance, Bassi et al. 28 created a radiomics model to predict STAS using a diverse dataset. Their model demonstrated an accuracy of 0.66 ± 0.02 in internal validation and 0.78 in external validation. Jiang and colleagues 29 developed a radiomics model based on CT images to predict STAS in lung adenocarcinoma, achieving an AUC of 0.754. Only 19.5% of tumors were positive for STAS in Jiang's study, whereas the STAS incidence in this study was 38.2%. Additionally, Chen 30 and Han 17 also developed radiomics models to predict STAS in stage I lung adenocarcinoma, which showed good performance.
Nevertheless, the study had a few limitations. First, the data were collected retrospectively from a single institution, which may limit the generalizability of the results. To validate the reproducibility and accuracy of our radiomics model, a larger prospective dataset from multiple centers is warranted. Second, without follow-up data, it is difficult to evaluate the effect of STAS on prognosis and survival after lung cancer surgery. We plan to update the study with follow-up data in the future.

| CONCLUSION
In conclusion, a significant correlation was found between preoperative CT radiomics features and STAS. We successfully established a CT-based radiomics model to predict STAS.

Keywords: CT, lung cancer, operation type, radiomics, spread through air space

GONG et al.
The correlation between the Rad-score and clinical factors in both the training and test groups was assessed by the Kruskal-Wallis test or Wilcoxon test, as appropriate. Statistical significance was determined using the chi-square test or Fisher's exact test for categorical variables and the Mann-Whitney test for continuous variables. A p value below 0.05 was considered significant. The Rad-score and features were compared using Student's t-test, and the results are expressed as mean ± standard deviation (X ± S). All statistical analyses were conducted using R software (version 4.1.0) and IBM SPSS Statistics (version 26.0).

FIGURE 1 Radiomics workflow.

FIGURE 2 The selected radiomics features.

FIGURE 3 CT images of different tumor types with STAS and hematoxylin-eosin slides of STAS. (A) Mixed-density nodule (blue arrow); (B) pure ground-glass nodule (blue arrow); (C) solid nodule (blue arrow); (D-F) papillary clusters of tumor cells (arrows) in the alveolar space beyond the edge of the main tumor (×20).

FIGURE 5 ROC curves for the radiomics model to predict STAS in the training group (A) and the test group (B), and the 10-fold cross-validation for the model (C).
TABLE 1 Comparison of clinicopathological features between the training and test groups.

TABLE 2 Clinicopathological features in the training group.

TABLE 3 Comparison of radiomics features between the STAS and non-STAS groups.