Toward automatic prediction of EGFR mutation status in pulmonary adenocarcinoma with 3D deep learning

Abstract To develop a deep learning system based on 3D convolutional neural networks (CNNs), and to automatically predict EGFR‐mutant pulmonary adenocarcinoma in CT images. A dataset of 579 nodules with EGFR mutation status labels of mutant (Mut) or wild‐type (WT) was retrospectively analyzed. A deep learning system, namely 3D DenseNets, was developed to process 3D patches of nodules from CT data, and learn strong representations with supervised end‐to‐end training. The 3D DenseNets were trained with a training subset of 348 nodules and tuned with a development subset of 116 nodules. A strong data augmentation technique, mixup, was used for better generalization. We evaluated our model on a holdout subset of 115 nodules. An independent public dataset of 37 nodules from the cancer imaging archive (TCIA) was also used to test the generalization of our method. Conventional radiomics analysis was also performed for comparison. Our method achieved promising performance on predicting EGFR mutation status, with AUCs of 75.8% and 75.0% for our holdout test set and public test set, respectively. Moreover, strong relations were found between deep learning feature and conventional radiomics, while deep learning worked through an enhanced radiomics manner, that is, deep learned radiomics (DLR), in terms of robustness, compactness and expressiveness. The proposed deep learning system predicts EGFR‐mutant of lung adenocarcinomas in CT images noninvasively and automatically, indicating its potential to help clinical decision‐making by identifying eligible patients of pulmonary adenocarcinoma for EGFR‐targeted therapy.


| INTRODUCTION
Lung cancer is one of the leading causes of cancer-related death. 1 Non-small cell lung cancer (NSCLC) accounts for more than 80% lung cancer cases, where adenocarcinoma is the most common histological subtype. 2 With the advancements of genomics, several genomic alternations, such as oncogenic ALK rearrangements, 3 ROS1 rearrangement, 4 KRAS mutations, 5 and sensitizing EGFR mutations, 6 have been identified as predictive and prognostic markers for NSCLC. Targeted therapy with these molecular markers has become an essential part of precision medicine for lung cancer.
Small molecule tyrosine kinase inhibitors (TKIs) that target specific EGFR mutations have resulted in improved progression-free survival (PFS) and higher objective radiographic response rate in patients with EGFR mutations than standard chemotherapy. [7][8][9] Moreover, approximately 80% patients with EGFR-mutant lung cancer respond to EGFR TKIs therapy (at initial treatment). 10 However, the administration of EGFR TKIs (for example, gefitinib) on the patients without EGFR mutations, not just showed no effect but even resulted in worse PFS and unnecessary costs compared to platinum-based chemotherapy, 11 which highlights the importance of identifying eligible patients for the first-line TKI therapy.
Mutation profiling after biopsies or surgical resections has become a standard and informative medical procedure. However, the high cost and invasiveness of the approaches and repeating tumor sampling strongly limit the applicability of molecular testing. Besides, the poor DNA quality, intratumoral heterogeneity and long turnaround time raise much concern on the balance of costs and benefits. 12,13 Notwithstanding re-biopsy is feasible in clinical practice, it is an invasive operation and faces the same challenges voiced by previous studies. 14,15 These defects enormously limit the practicability of precision medicine at scale.
Alternatively, noninvasive markers are promising for predicting EGFR mutation. 16,17 Cancers with different genotypes drive specific biological processes involved in the development and progression of tumors, thus ultimately leading to different phenotypes. In other words, it is potentially feasible to predict the genotypes by identifying specific phenotypes. There have been studies investigating the phenotype-genotype associations to identify associated genomic changes, 18,19 based on tumor morphology on computed tomography (CT) 20,21 and magnetic resonance imaging (MRI). 22 Radiomics, 20 that encodes tumor phenotypes with innumerable quantitative features using predefined image analysis algorithms. In particular, previous studies have demonstrated that certain radiomic features are associated with EGFR mutations status, suggesting that those identified features may be driven by somatic mutations. 16,23 Although these studies have achieved impressive performances, especially when combined with clinical information, 17,23,24 conventional radiomics-based methods are born with three main challenges. Firstly, conventional radiomics methods require strict procedures, including detection, segmentation, feature extraction, selection etc, 25 which is tedious and time-consuming. Secondly, radiomic features are susceptible to the manual segmentation as well as CT scanning parameters, thus interobserver reproducibility analysis is necessary. Finally, hand-craft radiomic features are expressive but may be not enough for high-level tasks.
On the other hand, deep neural networks, or Deep Learning, have achieved remarkable success in several important problems of artificial intelligence (AI), for example, natural image classification 26 and human language translation. 27 Deep convolutional neural networks (CNNs), a family of neural networks, have shown incredible effectiveness in several tasks of natural image computer vision and medical image computing. 28,29 As powerful algorithms of representation learning, CNNs largely reduce the necessities of hand-craft feature engineering. Our previous study has proven the effectiveness and efficacy of deep learning in predicting the invasiveness of lung adenocarcinomas from CT images. 30 In this regard, we addressed the problem of CT-based EGFR mutation prediction by deep neural networks, to make our system automatic, robust and accurate.
In this study, we aimed to develop a deep learning system to predict the EGFR mutation status of lung adenocarcinoma based on CT images by integrating recent advances in deep supervised learning, such as dense connections 31 and mixup training, 32 to significantly reduce the empirical risks of overfitting. Our method is a labor-saving strategy without the requirement of precise nodule segmentation, and also expected to obtain more stable performance due to the enhanced nature of the employed learning algorithms.
in CT images noninvasively and automatically, indicating its potential to help clinical decision-making by identifying eligible patients of pulmonary adenocarcinoma for EGFR-targeted therapy.

| MATERIALS AND METHODS
We developed a novel analytic framework based on deep learning of the CT imaging phenotypes of lung adenocarcinoma to predict EGFR mutation status (see Figure 1). This retrospective study was approved by the Institutional Review Board of our institution (NO.20170103), which waived the requirement for patients' informed consent referring to the CIOMS guideline.

| Data collection
This study used local data collected in Huadong Hospital (HdH Dataset) and publicly available data selected from TCIA (TCIA Dataset). 33,34 In HdH Dataset, the inclusion criteria were (a) patient receiving thin-slice chest CT (0.75-1.5 mm) scan prior to biopsies or surgical treatment, (b) with pathology reports for the diagnosis of pulmonary adenocarcinoma, and (c) with detailed EGFR mutation testing reports. Only one malignant nodule was studied for each patient due to the availability of EGFR testing report. Among the 579 lung adenocarcinoma patients, 308 were EGFR-mutant (Mut) and 271 were EGFR wild-type (WT). The detailed characteristics of patients and lesions in HdH Dataset are presented in Table 1. CT acquisition parameters are described in Appendix S1. In the TCIA Dataset, 37 nodules were selected to validate the stability and generalization of the analytic framework. The inclusion criteria were (a) CT with slice thickness ≤1.5 mm (to avoid data inconsistency), (b) with EGFR mutations testing reports, (c) with pathology reports for the diagnosis of lung adenocarcinoma, (d) and lesions that could be certainly identified as the resected or biopsied lesions. The detailed information of the TCIA Dataset is described in Table S1.

| EGFR mutation profiling
Drug target-associated mutations on EGFR exons 18, 19, 20, and 21 were examined using a PCR-based amplification-refractory mutation system (ARMS) with Human EGFR Gene Mutations Fluorescence Polymerase Chain Reaction Diagnostic Kit (ADx-EG01, AmoyDx). Wild-type EGFR in this study referred to no mutation detected among those loci.

| Data annotation and pretreatment
Volumes of interest (VOIs) of the enrolled nodules in both datasets were manually delineated at voxel level by a radiologist F I G U R E 1 Overall pipeline for this study. A local CT dataset (HdH Dataset) and a public dataset selected from TCIA database (TCIA Dataset) of lung adenocarcinoma patients with EGFR mutation testing were used. Nodules were manually localized, segmented, and labelled as EGFR mutant (Mut)/wild-type (WT). For deep learning, 3D DenseNets were trained using the training subset. A strong data augmentation technique, mixup, was used for better regularization. Expressive representations, that is, deep learned radiomics (DLR), for the nodules were end-to-end learned during the training procedure. Meanwhile, conventional radiomics analysis following the common practice was carried out for performance comparison, and association study between the 3D deep leaning and conventional radiomics was performed by calculating the pairwise correlation coefficients with 4-year experience in chest CT interpretation using medical image processing and navigation software 3D Slicer (version 4.8.0, Brigham and Women's Hospital), and subsequently confirmed (modified or re-delineated) by another radiologist with 12-year experience in chest CT interpretation. DICOMformat images were imported into the software, subsequently the images with VOI information were exported into NII format for further analysis. Each segmented nodule was assigned a specific EGFR mutation label (Mut or WT) according to the corresponding EGFR mutation testing report.
Deep learning is known to require heavy tuning. Thus, we randomized the HdH Dataset into training, development, and test subsets, containing 60%, 20%, and 20% (only once) of each EGFR categories (Mut or WT), respectively. The training set was used for training the neural networks. The development set was used for tuning the hyperparameters, such as determining early stopping, discovering the most suitable neural architectures, and choosing the best model snapshots. Once we chose the best model using the development set, the holdout test set was used for fairly evaluating the performance of our method. To further validate the stability and generalization of our method, we evaluate our model on the independent TCIA Dataset, without any fine tuning.

| EGFR mutation status prediction with conventional radiomics
We investigated the performance of the conventional radiomics method using the two datasets. Following the common practice of radiomics method, 23 the radiomic features were extracted with numbers of image analysis algorithms. Specifically, 475 radiomics features, including 50 histogram features, 325 co-occurrence matrix features, and 100 run lengths matrix features, were automatically extracted using MATLAB 2016b for all delineated nodules in the two datasets. The detailed radiomics extraction methodology is described in Appendix S1.
Radiomic features require refined manual segmentation of nodules. Even so, many radiomic features are not stable due to different segmentation manners. To ensure the reproducibility of radiomics, 50 nodules in the HdH Dataset were randomly selected for independent segmentation by two radiologists. The 475 radiomics features of the 50 nodules were evaluated with interclass correlation coefficient (ICC) analysis using the "irr" package in R software (version 3.4.3). Features with an ICC > 0.8 were considered reliable. Finally, a total of 401 features were selected.
Logistic regression was used to model the relation between the radiomic features and EGFR status with the widely used scikit-learn library in Python. 17,35 Due to the simple hyperparameter setting of logistic regression, the training and development sets were merged into a "train-dev" set. Tenfold cross validation search was performed on the train-dev set with 1000 randomly sampled values of regularization term C. The trained model was used for scoring the HdH test Dataset and TCIA Dataset. We also tried heavy search for models and hypermeters with an automated machine learning tool (AutoML), 36 built upon widely used scikit-learn Python package 37 ; however it did not yield better performance. Note that the feature selection procedure to reduce the redundancy between the features had been included in the AutoML search procedure. Although there may exist a better model theoretically, the heavy search process implied conventional radiomic features are not strong enough representations.

| EGFR status prediction with 3D DenseNets
To take the advantage of deep representation learning, we designed a deep learning-based framework for analyzing the EGFR mutation status of nodules. Taking into account the characteristics of the used data, we have adopted the following principle to design the models: • 3D: CT images were presented in 3D, and the 3D views provided critical visual information from a practical point of radiology; • Parameter-efficient: the model should be compact and easy to train with limited data; • Data-efficient: our training method should lead to less overfitting; • Automatic: at inference stage, our model should be easy to use, given approximate locations of nodules, rather than the manual segmentation.
Following these principles, we adopted 3D DenseNets, which were derived from the powerful deep convolutional neural networks 2D DenseNets. 31 DenseNets have achieved great success in 2D natural images and medical images, 38 which elegantly reuse the features from lower layers, thus exploring high-level representations in an efficient way. Our 3D DenseNets used same notations but 3D convolutions, with a growth rate k = 16, a compression rate θ = 2, and a bottleneck B = 4. Batch Normalization 39 and Leaky ReLU 40 (α = 0.1) were used together as activation functions. The resulting neural networks contained only 0.55 M parameters, making the training compatible with limited data. We implemented the neural networks with Keras 2.1.5 41 and TensorFlow 1.4.0. 42 The detailed structure of the 3D DenseNets is depicted in Table 2.
The inputs of the proposed 3D DenseNets were cubic patches of 48 × 48 × 48 mm, generated by (pre-processed) chest CT scans and the coordinates c = [z,y,x] of the approximate mass centers of nodule. In practice, the coordinates can be marked by radiologists manually, or by automatic nodule detection systems. 43 Our method did not require the manually refined segmentation, which were usually done by the users (radiologists) manually. The preprocessing followed a "standard" procedure: the input patches were converted into Hounsfield units, followed by resizing the volumetric data into spacing of 1 × 1 × 1 mm by trilinear interpolation, clipping the voxel intensity into I HU ∈ [ − 1024,400], quantifying the density into grayscale, and transforming the values to I ∈ [−1,1) by the mapping I = ((I HU + 1024)∕(400 + 1024)) × 255 ∕128 − 1 . We further applied multiple data augmentation techniques to represent nodules in different views: • Moving the centers by small amounts in [−m, m] pixels in the three axes • Reordering the axes and rotation by 90° increments • Left-right flipping.

| Training for the deep learning models
During training, we applied the above data augmentation techniques with m = 8. These conventional techniques could effectively increase the training data size, yet not enough for a good training convergence. We further used a novel data augmentation technique: mixup 32 to stabilize the training. mixup showed remarkable improvements in state-of-the-art natural image classification neural networks. To our knowledge, this is the first comprehensive study introducing mixup into medical image computing.
The mixup produces extra informative training samples by the following easy-to-implement data augmentation routine. Given Beta( , ) ∈ [0,1], α is a predefined hyperparameter, and (x i , y i ) and (x j , y j ) are two input-label pairs from the training distribution, then extra training samples are obtained by the following equations: α defines the strength of mixup by controlling the beta distribution; when α = 0, mixup ceases to be effective. In our experiments, we set α = 0.9 for modest usage of mixup. As shown later in the Results section, mixup was critical for enabling the training of deep neural networks with limited data on this task.
We trained the 3D DenseNets with binary cross-entropy loss: where y is the ground truth of the input, and ŷ is the prediction given by the models. "Kaiming uniform" method 44  the best model of a single run was selected by the following heuristics: • initially, select five candidates with the highest AUCs on the development set; • then, choose one of the candidates with the lowest absolute difference of training and development AUCs.
Two hundred epochs were usually enough for a good convergence. In our experiments, the best model snapshot was generated at the end of epoch 131.

| Inference for the deep learning models
Representing nodules in multiple views not only increases the training data size, but also effectively reduces empirical variance of inference by the ensemble trick. In practice, given an input, we forwarded the well-trained neural network using the mentioned conventional data augmentation (m = 3) with 100 runs. Then, probability scores of EGFR Mut were obtained by averaging the multiple runs: We used the last-layer (the DLR layer in Table 1) 114-d outputs of the trained 3D DenseNet as the learned representations

T A B L E 3 Presentation of our
method and several previous studies in terms of methods, datasets, and resulting classification AUCs given nodules, that is, the deep learned radiomics (DLR). More robust DLR was obtained by averaging the feature outputs of multiple runs. Note the DLR and the probability scores could be obtained within a single neural network forward pass.

EGFR mutation status prediction
We compared the performance between deep learning and conventional radiomics on predicting EGFR mutation status of lung adenocarcinoma, which implied the associations of EGFR genotype and radiomic phenotype. The AUC, which is insensitive of class skews, was used for evaluation. As depicted in Table 3 and Figure 2C, the 3D DenseNets (with mixup, ensemble) outperformed our conventional radiomics-based method (P = 0.021, DeLong test 47 ). Several prior studies were also listed in Table 3. Since they used nonshared datasets independently, their results were only for reference only. On our holdout test set (HdH test Dataset) containing 115 patients and the TCIA Dataset of 37 patients, 3D DenseNets achieved AUCs of 75.8% and 75.0%, respectively. Considering the class-imbalance and threshold bias, we believed the threshold-free AUROC was the most appropriate metric. For reference, we applied a threshold that maximizes the accuracy to obtain the sensitivity, specificity,  Table S2. Notably, the data distribution of the TCIA Dataset differed much from that of the HdH Dataset in terms of patients and imaging; the comparable performances on the two datasets indicated the robustness of our method. More importantly, unlike conventional radiomics methods that require labor-intensive manual segmentation masks, our deep learning method was compatible with approximate locations of the nodules instead.

| Potential association between the conventional radiomics and deep learned radiomics
Radiomics analysis provides important medical insights. However, conventional radiomics requires manual segmentation, which is a tedious process in practice. Instead, our method based on 3D deep learning predicts the labels automatically with even better performance. To our knowledge, few prior studies compare these two methods together. Assumed that deep learning networks particularly learn the representation of radiomics with end-to-end training, we investigated the relation between the both, and discuss the reason why deep learning provided better discriminative performance on this problem. To this end, we analyzed 401 conventional radiomic features (ICC > 0.8), against 114 deep learned radiomics (DLR) features extracted from 3D DenseNets (see the "Inference for the deep learning models" section). As illustrated in Figure  2A,D, the unsupervised clusters of both the conventional radiomics and DLR matched much with the semantic labels (Mut/WT); in other words, the continuous regions on the black-grey bars shared numbers of similar features respectively. However, the cluster map of DLR had obviously shaper contrast than that of conventional radiomics, which indicated the DLR features are more representative and compact than the hand-craft features. It well explained the higher classification performance of DLR than conventional radiomics.
Moreover, we associated DLR and conventional radiomics, by visualizing the pairwise Pearson correlation coefficients on the HdH test Dataset with a cluster map in Figure 2B. Though the radiomic features and DLR features do not highly correlate (correlation coefficients range from −0.5 to +0.5), the cluster map indicated relatively high correlation (bright red or blue) between DLR and radiomics. It is worth noting that, most of the radiomic features were potentially correlated with at least one DLR feature, while certain DLR features showed low correlation with all the radiomic features, which potentially corresponded to the extra highlevel information with more hints on modeling the EGFR status. Considering that our DLR-based method showed better performance, it is reasonable to assume that the DLR features encoded automatically not only almost all information of the (highly correlated) conventional radiomics with even more compact representation (more informative with lower feature dimension), but also critical discriminative information not in the conventional radiomics.
To verify our assumption further, we used the 114 DLR features together with the 401 radiomics features to fit a cross validation searched logistic regression model as in our conventional radiomics analysis. An automatic machine learning tool (AutoML) 36 was also applied for searching a best-performing model. The resulting model achieved AUCs of 71.2% and 74.2% on the HdH test Dataset and TCIA Dataset, respectively, which performs poorer than using DLR alone. It implied that, combining the conventional radiomics and DLR naïvely could not boost the performance, due to the high correlation between the both. Note that the feature selection procedure to reduce the redundancy between the features had been included in the AutoML search procedure.

| Ensemble vs vanilla (no ensemble)
The inference-stage data augmentation by representing a nodule in multiple views, was not only reasonable from a medical perspective, but also made sense in our empirical results. As depicted in Table 4, this simple ensemble technique was beneficial to reducing the inference variance. The vanilla term referred to inference with a single forward. The ensemble inference produced a more stable performance, on the HdH Dataset (training, development, test) and TCIA Dataset, though the vanilla inference might produce even better results in some cases.

| Mixup vs no mixup
The mixup technique was critical for training our 3D DenseNets. As shown in Table 4, the models with mixup training significantly outperformed those without in all of our experiments. The models without mixup training follows the same model selection procedure. We explained the reason for these remarkable differences with the surprisingly strong regularization effects of mixup training. As demonstrated in Figure 3, the learning curves with mixup training were much smoother, and no severe overfitting was observed. On the contrary, the training without mixup was not stable; besides, there existed typical overfitting, making the model selection impractical. Considering the unreasonable effectiveness of mixup training, it could potentially be a standard technique in medical image computing in the future.

| The t-SNE visualization of the deep learned radiomics
To explore the manifold structure of the DLR intuitively, we visualized the DLR features on the HdH test Dataset using t-Distributed Stochastic Neighbor Embedding (t-SNE) 48 as illustrated in Figure 4. No distinct pattern was shown on the t-SNE visualization, which was reasonable considering the difficulty of using the phenotype information to predict the genotype expression. Even so, two clusters of EGFR Mut and EGFR WT could be found, see Figure 4. Most of the DLR features outside of the contours of the two clusters were assigned a score with high uncertainty by the 3D DenseNets. Nevertheless, the DLR features were shown to be meaningful and representative despite the difficulty of the task.

| DISCUSSION
This study developed a deep supervised learning approach to predict EGFR-mutant lung adenocarcinoma in CT images. Compared with conventional radiomics, the models are less prone to overfitting on the limited training data and show better prediction performance. The deep learning method is also labor-saving since it does not require precise segmentation of nodules. More importantly, we empirically find that the deep learning models learn more representative features than conventional radiomics, named deep learned radiomics (DLR), which are the keys to better analytic performances.
Our method outperformed previous studies and established a new methodology on this task. Using imaging information only, our deep 3D DenseNets achieved AUCs of 75.8% and 75.0% on the HdH test Dataset and independent public TCIA Dataset, respectively, which were better than the best of previous radiomics method (69%), and on par with their model combining additional clinical information (75%). We analyzed the associations between DLR and conventional radiomics, and empirically showed that DLR features were more representative and learned extra imaging information compared to conventional radiomics. Furthermore, we introduced the mixup training to our deep learning method, which suggested "unreasonable effectiveness" in regularizing the training of the deep neural networks with limited data.
Decoding image phenotypes using radiomics to predict tumor genotypes, called radio-genomics as well, shows promising performances and outcomes in precision medicine research. A few recent studies have investigated radiomics analysis to noninvasively predict EGFR mutation status. 16,23 For example, Rios et al 23 discriminate EGFR Mut and WT cases with an AUC of 69%, with an improvement on AUC (75%) when combining clinical information. Despite these impressive results, radiomics-based methods are limited with F I G U R E 3 The learning curves of the best models with mixup training and those without, in terms of binary cross-entropy loss. The losses on the HdH training, development (val) set and test Dataset were shown on the figures. "epochs" on the x-axis means the training consumes once the entire training set expressiveness of the hand-craft features; besides, the requirement of manual segmentation makes it difficult to apply in practical clinical contexts.
In contrast, our deep learning method is automatic in predicting the EGFR mutation status, which only requires an approximate location of nodule instead of labor-intensive voxel-wise segmentation. More importantly, the resulting DLR features are learned automatically with end-to-end training. In our association analysis, we found that some DLR features were not correlated with any radiomic features in our experiments, implying extra information extracted by our 3D DenseNets. Powered by dense connections 31 and mixup training, 32 our method achieved a state-of-the-art performance. Considering that sensitizing EGFR mutations are more likely found in Asian patients, 2 a selected subset from TCIA database 33 was used for testing the generality of our system. Our deep learning system showed robust performance on the public data set.
Radiomic features, defined by image analysis algorithms on numbers of visual characteristics, such as shape, volume, and intensity, are low-level features without high-level semantic information. However, these features are naturally "grey-box" interpretable. On the other hand, DLR features are semantically high-level thanks to the end-to-end learning, yet deep learning models are known to be black-box artificial intelligence lack of the desirable interpretability, especially in the medical contexts. Our association analysis not only provides insight on understanding the DLR features, but also approaches opening the black-box of deep learning in medical image computing by a grey-box model. However, it is hardly possible to identify the specific correlation between radiomics and DLR, we leave this for further exploration.
There were limitations in our study. Due to resource limitation, EGFR mutations were detected with ARMS-PCR covering specific loci, thus EGFR mutations were narrowly defined in this study. Further studies with mutations detected by second-generation sequencing are needed. Besides, information within the CT images is relative limited; integrating more available information of patients, such as clinicopathological facets, blood testing result, proteomics, and even the lifestyles, into the models can be beneficial to inferring the genotypes in multiple views. Recently, liquid biopsy, using blood as opposed to tumor samples for molecular analysis, was developed to identify EGFR mutations with promising results. 49,50 Taking advantage of these valuable information or integrating different promising approaches may improve the performance of predicting the EGFR mutation status. We consider it as a future direction. Moreover, the sample size of the independent validation was still small, and our deep leaning approach should be tested in larger cohorts. Also, this was a single-center study, our deep learning systems desire larger datasets with more diversity. Further improvement can be made to practically help scalable precision medicine; however, the methodology established by this study, could serve as a paradigm for future studies. Lastly, only thin-slice CT images were included in the current study to mitigate the radiomic feature variabilities. However, whether the deep learning method could perform better with different slice thickness CT images is worth further investigating. Actually, this is one of our ongoing research.
In conclusion, the proposed deep learning system predicts EGFR-mutant lung adenocarcinoma in CT images automatically and noninvasively with promising performance, indicating the potential to help clinical decision-making by identifying eligible patients of pulmonary adenocarcinoma for EGFR-targeted therapy. The association study between conventional radiomics and deep learned radiomics discusses the relation between the both, and approaches toward a greybox explanation methodology of black-box model. F I G U R E 4 t-SNE visualization of the deep learned radiomics in a 2D space on the HdH test Dataset. A, The t-SNE visualization scatter plot colored by the EGFR labels. Clusters can be found labeled by the ground truth. B, The t-SNE visualization scatter plot colored by the EGFR probability score predicted by the 3D DenseNets. The prediction scores are consistent with ground truth