Clinical and radiological predictors of epidermal growth factor receptor mutation in nonsmall cell lung cancer

Purpose: To determine the predictors of epidermal growth factor receptor (EGFR) mutation status in a group of patients with nonsmall cell lung cancer (NSCLC) by analyzing their clinical and radiological features. Materials and methods: Patients with NSCLC who underwent EGFR mutation detection between 2014 and 2017 were included. Clinical features and general imaging features were collected, and radiomic features were extracted from CT data using 3D Slicer software. Predictors of EGFR mutation status were selected by least absolute shrinkage and selection operator (LASSO) logistic regression analysis, and receiver operating characteristic (ROC) curves were drawn for each prediction model of EGFR mutation. Results: A total of 118 patients were enrolled in this study. The smoking index (P = 0.028), pleural retraction (P = 0.041), and three radiomic features were significantly associated with EGFR mutation status. The areas under the ROC curve (AUCs) for the prediction models of clinical features, general imaging features, and radiomic features were 0.284, 0.703, and 0.815, respectively, and the AUC for the combined prediction model of the three models was 0.894. Finally, a nomogram was established for individualized EGFR mutation prediction. Conclusions: Combining radiomic features with clinical features and general imaging features discriminates EGFR mutation status better than any group of features alone. Our study may help develop a noninvasive biomarker to identify EGFR mutation status by using a combination of the three groups of features.

because tissue samples may be small or have low DNA content, gene detection may be impossible, or incorrect detection results may be obtained. 3 Furthermore, because of tumor heterogeneity, an EGFR mutation present in the tumor may not be detected at the tissue biopsy site. [4][5][6] Some clinical studies have suggested that adenocarcinoma, nonsmoking status, female sex, and Asian race are predictors of EGFR mutations, [7][8][9] and other studies have shown that adenomatous hyperplasia, atypical adenomatous hyperplasia, adenocarcinoma in situ, and squamous dominant adenocarcinoma frequently carry EGFR mutations. [10][11][12][13][14][15] Although these results provide a reference for predicting the mutation status of lung cancer genes, powerful noninvasive predictive markers are still lacking. Radiomics refers to the extraction of sub-visual yet quantitative image features from radiological images with the intent of creating mineable databases. 16 Some features have even been shown to identify genomic alterations within tumor DNA, a field now called "radiogenomics". 17 These features can identify specific driver mutations and changes in biological pathways. Several recent studies have used radiomic features extracted from chest CT to predict EGFR mutation in NSCLC, [18][19][20][21] but most of them included only a few radiomic features in their analyses. [19][20][21] In addition, these studies [18][19][20] incorporated only some clinical features to improve the ability of the EGFR mutation prediction model and excluded general imaging features. Therefore, in this study, we aimed to use appropriate statistical methods to select informative features from a large number of radiomic features and to establish an EGFR mutation prediction model that combines radiomic features with general imaging features and clinical features.

2.A | Patient selection
A total of 1292 cases of NSCLC were collected from January 2014 to December 2017. The inclusion criteria were as follows: (1) detailed clinical data were available, including gender, age, smoking index (number of cigarettes per day × number of years of smoking), family history of lung cancer, pathological type, and pathological stage (classified according to the TNM classification system of the American Joint Committee on Cancer); (2) EGFR mutation status was clearly determined using the Amplification Refractory Mutation System (ARMS), and the tissue used for mutation detection was obtained from surgical excision specimens; and (3) standard unenhanced chest CT was performed within 2 months before the operation, on the same scanner under the same scanning conditions. The exclusion criteria were as follows: (1) chemotherapy or radiotherapy performed before the detection of EGFR gene mutation; (2)

Therefore, the features are concentrated in different frequency ranges within the tumor volume.
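The smoking index in the inclusion criteria is a simple product of daily consumption and duration. The sketch below computes it and adds a hypothetical check for the imaging-timing criterion; the function names are illustrative, and approximating "2 months" as 60 days is an assumption, not a rule stated by the authors.

```python
from datetime import date


def smoking_index(cigarettes_per_day: float, years_smoked: float) -> float:
    """Smoking index as defined in the inclusion criteria:
    cigarettes per day multiplied by years of smoking."""
    if cigarettes_per_day < 0 or years_smoked < 0:
        raise ValueError("smoking inputs must be non-negative")
    return cigarettes_per_day * years_smoked


def ct_within_window(ct_date: date, surgery_date: date, max_days: int = 60) -> bool:
    """Hypothetical check for 'CT within 2 months before the operation'.
    ASSUMPTION: 2 months is approximated as 60 days."""
    delta = (surgery_date - ct_date).days
    return 0 <= delta <= max_days  # CT must precede surgery, within the window


print(smoking_index(20, 30))  # 600
print(ct_within_window(date(2016, 3, 1), date(2016, 4, 15)))  # True (45 days)
```

A never-smoker has a smoking index of 0, so the index can be entered directly as a continuous predictor.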

2.C.2 | Stable radiomic feature selection
To obtain stable radiomic features, VOI segmentation and radiomic feature extraction were performed twice for each image, the intraclass correlation coefficient (ICC) was calculated for each radiomic feature, and features with ICC > 0.75 were considered stable.
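The stability filter can be illustrated with a small sketch. The text does not state which ICC form was used, so the code below assumes ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, absolute agreement, single measurement), treating the two segmentation/extraction runs as the two "raters". The feature names and values are hypothetical.

```python
from typing import Dict, List, Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.
    ratings[i][j] is the feature value for image i from extraction run j."""
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Partition the total sum of squares into images, runs, and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    denom = msr + (k - 1) * mse + k * (msc - mse) / n
    if denom == 0:
        raise ValueError("ICC undefined: feature has no variance")
    return (msr - mse) / denom


def stable_features(run1: Dict[str, List[float]], run2: Dict[str, List[float]],
                    threshold: float = 0.75) -> List[str]:
    """Keep features whose ICC between the two runs exceeds the threshold."""
    return [
        name for name in run1
        if icc_2_1(list(zip(run1[name], run2[name]))) > threshold
    ]


# Hypothetical values for four images extracted twice.
run1 = {"f_stable": [1.0, 2.0, 3.0, 4.0], "f_noisy": [1.0, 2.0, 3.0, 4.0]}
run2 = {"f_stable": [1.1, 2.0, 2.9, 4.0], "f_noisy": [4.0, 1.0, 3.0, 2.0]}
print(stable_features(run1, run2))  # ['f_stable']
```

In R, the psych package's ICC function reports several ICC forms side by side, so the form actually used should be stated when reporting the threshold.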

2.D | Selection of prediction factors and establishment of prediction model
Patients enrolled in our study were divided into a training cohort and a validation cohort. To develop a robust prediction model, we selected predictors with statistical methods suited to each group of features. For the clinical and general imaging features, we applied a backward step-down selection process in logistic regression analysis to select independent prediction factors. For the radiomics model, we used minimax concave penalty (MCP)-penalized LASSO regression analysis with tenfold cross-validation to select predictors; before this process, the radiomic features were normalized using the scale function in R software (version 3.5.2, http://www.R-project.org). A previous study showed that, for statistical analysis of high-dimensional data, MCP-penalized LASSO regression analysis can avoid overfitting and identify relevant variables for subsequent applications. 22 For the combined prediction model, multicollinearity may exist among the groups of features; rather than clustering or combining the radiomic features as in previous studies, 18,23 we normalized all features and performed MCP-penalized LASSO regression analysis on all factors to obtain independent predictors. All predictors were used to develop prediction models. ROC curves were plotted, and AUC values represented the predictive ability of the models. Finally, all meaningful predictors were used to build a combined prediction model, which was compared with the radiomic feature, clinical feature, and general imaging feature prediction models. The validation cohort was used to validate the discrimination ability of the prediction models.
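The two preprocessing and penalization ideas above can be sketched in plain Python. `scale` mirrors the default behavior of R's `scale()` (centering, then dividing by the sample standard deviation). The MCP functions show the penalty and its one-predictor solution ("firm thresholding") under squared-error loss, assuming a standardized predictor and the concavity parameter gamma = 3 (the default in the ncvreg package); this is an illustration of the penalty's behavior, not the full cross-validated logistic fit used in the study.

```python
import statistics
from typing import List


def scale(values: List[float]) -> List[float]:
    """Center and scale one feature like R's scale(): subtract the mean and
    divide by the sample SD (n - 1 denominator)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def soft_threshold(z: float, lam: float) -> float:
    """LASSO one-predictor solution: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def mcp_threshold(z: float, lam: float, gamma: float = 3.0) -> float:
    """MCP one-predictor solution (firm thresholding), assuming a standardized
    predictor and squared-error loss; requires gamma > 1."""
    if abs(z) <= gamma * lam:
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return z  # large effects are not shrunk, unlike the LASSO


def mcp_penalty(beta: float, lam: float, gamma: float = 3.0) -> float:
    """MCP penalty: LASSO-like near zero, constant once |beta| > gamma * lam."""
    b = abs(beta)
    if b <= gamma * lam:
        return lam * b - b * b / (2.0 * gamma)
    return gamma * lam * lam / 2.0


print(scale([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
print(soft_threshold(4.0, 1.0), mcp_threshold(4.0, 1.0))  # 3.0 4.0
```

The last line shows the design motivation: for a strong effect the LASSO still subtracts lambda, whereas MCP leaves the coefficient unbiased, while small effects are set to zero by both.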

2.E | Statistical analysis
Statistical analysis was performed using SPSS version 22.0 software (SPSS, Inc., IBM Company, Chicago, Illinois, USA) and R software.
Continuous variables were compared between the EGFR (+) group and the EGFR (-) group using the Mann-Whitney U test, and categorical variables were compared using Pearson's chi-square test, both in SPSS. ICC was calculated using the "psych" package in R. The "MASS" package was used for logistic regression in the clinical features group and the general imaging features group. LASSO regression analysis for radiomic feature selection and combined predictor selection was performed with the "ncvreg" package in R. ROC curves were built with the "pROC" and "ggplot2" packages in R. A nomogram was formulated using the "rms" package in R, and its performance was measured by the concordance index (C-index), which was calculated with the rcorrcens function in the "Hmisc" package in R. A larger C-index indicates more accurate prediction. Calibration curves were also plotted for the nomogram. P < 0.05 was considered statistically significant. The R code used is listed in the Appendix.

The pathological stages were as follows: stage I for 71 patients (60.2%), stage II for 21 patients (17.8%), and stage III for 26 patients (22.0%). There was no significant difference in age (P = 0.420), family history of lung cancer (P = 0.139), or pathological stage (P = 0.810) between the two groups. However, significant differences in gender (P = 0.022), pathological type (P < 0.001), and smoking index (P < 0.001) were found between the two groups (Table 1). As shown in Table 2, of the five general imaging features obtained from chest CT images, only pleural retraction differed significantly between the two groups (P = 0.003).

EGFR, epidermal growth factor receptor; GGO, ground glass opacity; OR, odds ratio; CI, confidence interval. *P-value was based on comparison between the EGFR mutation (+) group and the EGFR mutation (-) group.
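For a binary outcome such as EGFR mutation status, the C-index coincides with the ROC AUC, and both equal the Mann-Whitney U statistic of the mutant group divided by the number of mutant/wild-type pairs. The sketch below computes it by direct pairwise comparison; the scores are hypothetical, and this is a teaching implementation rather than the rcorrcens or pROC code used in the study.

```python
from typing import Sequence


def c_index_binary(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Concordance index for a binary outcome (equal to the ROC AUC).

    Every (mutant, wild-type) pair is compared: the pair is concordant if the
    mutant case has the higher predicted score; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Hypothetical predicted scores; label 1 = EGFR mutant, 0 = wild type.
print(c_index_binary([0.8, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

A value of 0.5 corresponds to random ranking, and values below 0.5 mean the score ranks wild-type cases above mutant cases more often than the reverse.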

3.C.2 | General imaging prediction model
In the training cohort, logistic regression analysis of the general imaging features revealed that GGO (p = 0.015) and pleural retraction (p = 0.041) were independent predictors of EGFR gene mutation. The ROC curve of the prediction model based on general imaging features (imaging_training) is shown in Fig. 4. The combination of these two predictors can significantly improve the prediction of EGFR mutation (imaging_training AUC = 0.703).

Fig. 6 shows that the predictive ability of the combined prediction model was better than that of any single prediction model. The AUC, 95% CI, and formula for calculating the score of each prediction model are shown in Table 3. No significant difference in AUC values was found between the training cohort and the validation cohort for any of the four prediction models.

The nomogram based on smoking index, pleural retraction, and three radiomic features performed well in predicting EGFR mutation. It is an intuitive individualized prediction model, and its predictive ability is supported by the C-index (0.894 and 0.92 for the training and validation cohorts, respectively) and the calibration curve.
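Each logistic prediction model turns its predictors into a score (the linear predictor, on the log-odds scale), which the nomogram then maps to a mutation probability. The sketch below shows that mapping. The coefficients, intercept, and the feature name `rad_feature_1` are hypothetical placeholders for illustration only; the fitted formulas are those reported in Table 3.

```python
import math
from typing import Dict


def risk_score(features: Dict[str, float], coefs: Dict[str, float],
               intercept: float) -> float:
    """Linear predictor of a logistic model: intercept + sum(beta_i * x_i)."""
    return intercept + sum(coefs[name] * features[name] for name in coefs)


def mutation_probability(score: float) -> float:
    """Convert the linear predictor (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-score))


# HYPOTHETICAL coefficients for illustration; not the fitted values.
COEFS = {"smoking_index": -0.002, "pleural_retraction": 1.1, "rad_feature_1": 0.8}
INTERCEPT = -0.3
patient = {"smoking_index": 400.0, "pleural_retraction": 1.0, "rad_feature_1": 0.5}

s = risk_score(patient, COEFS, INTERCEPT)
print(round(s, 3), round(mutation_probability(s), 3))  # 0.4 0.599
```

A nomogram is a graphical version of the same computation: each predictor's contribution to the score is drawn as a points axis, and the total points are read off against the predicted probability.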

| DISCUSSION
Limited by the small sample size, patients with EGFR exon 18, 19, 20, and 21 mutations were not analyzed separately in the present study. We hope that a large cohort of patients can be enrolled in future studies for further analysis.

| CONCLUSIONS
Smoking index, pleural retraction, and three radiomic features were identified as independent predictors of EGFR mutation status in NSCLC. Radiomic features are better predictors than general imaging features or clinical features. Our study may help develop a noninvasive biomarker to identify EGFR mutation status by using a combination of the three groups of features.

AUTHORS' CONTRIBUTIONS
Dang YT was responsible for project conceptualization, data analysis, writing of the manuscript, and all manuscript revisions. Wang RT and Qian K were responsible for patient data collection. Lu J was responsible for CT data collection. Zhang HX was responsible for statistical analysis. Zhang Y was responsible for project conceptualization, manuscript revisions, and editing of the manuscript. All authors read and approved the final manuscript.

CONFLICT OF INTERESTS
No conflict of interest exists.

ETHICS APPROVAL
All procedures performed in studies involving human participants were in accordance with the ethical standards of both institutional and national research committees and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards.

CONSENT TO PARTICIPATE
Informed consent was obtained from all individual participants included in the study.

CONSENT FOR PUBLICATION
Not applicable.

CODE AVAILABILITY
All R code used in this study is available in the Appendix.

DATA AVAILABILITY STATEMENT
The datasets used and analyzed during the current study are available from the corresponding author upon reasonable request.