Artificial Intelligence in Tumor Subregion Analysis Based on Medical Imaging: A Review

Medical imaging is widely used in cancer diagnosis and treatment, and artificial intelligence (AI) has achieved tremendous success in various tasks of medical image analysis. This paper reviews AI-based tumor subregion analysis in medical imaging. We summarize the latest AI-based methods for tumor subregion analysis and their applications. Specifically, we categorize the AI-based methods by training strategy: supervised and unsupervised. A detailed review of each category is presented, highlighting important contributions and achievements. Specific challenges and potential AI applications in tumor subregion analysis are discussed.


INTRODUCTION
In current clinical practice and research, a tumor is usually assumed to be homogeneous, or heterogeneous with a similar distribution throughout its entire volume [2,26,49,59,153]. Recent studies have shown that some tumor regions may be more biologically aggressive than others and may play a dominant role in disease progression [41,53,109]. Neglecting such tumor heterogeneity at various spatial and temporal scales can lead to failures in prognosis and treatment [53]. Medical imaging has been shown to reveal and quantify heterogeneity within tumors [54,116,118]. An individual tumor can then be divided into subregions based on the detected regional variations. Diagnosis, prognosis, and evaluation of treatment response can be performed separately in these subregions, and this approach has proved superior to a simple analysis of the whole tumor [6,38]. Therefore, accurate detection and analysis of tumor subregions is of great clinical and research interest.
Over the last few years, artificial intelligence (AI) has achieved tremendous success in various tasks in the field of medical imaging [23,35,33,34,52,57,86,93,94]. Many AI-based methods have been proposed to locate and analyse tumor subregions for a variety of imaging modalities and clinical tasks. In this study, we review the applications of supervised and unsupervised AI models in imaging-based tumor subregion analysis. With this survey, we aim to:

ARTIFICIAL INTELLIGENCE
AI is a field that seeks to enable machines to learn from experience, think like humans, and perform human-like tasks. Machine learning (ML) is a discipline within AI in which computers are trained to automatically improve their performance on specific tasks based on experience. Training methods in ML broadly fall into supervised, semi-supervised, and unsupervised strategies, each requiring progressively less human input. Within ML, deep learning (DL) employs multi-layer ("deep") networks of mathematical functions, originally inspired by the structure and function of the human brain, to learn a mapping from one representational domain to another (e.g., from photographs to the names of the objects they contain). Both supervised and unsupervised methods are commonly used in DL for medical image analysis.

Supervised learning
In supervised learning, an algorithm is designed to learn a mapping function f(·) from the input variable (x) to the output variable (Y), i.e., Y = f(x). The goal is to approximate the mapping function well enough that the output variable (Y) can be accurately predicted for new input data (x). The least absolute shrinkage and selection operator (Lasso), random forest (RF), support vector machine (SVM), and artificial neural networks (ANN) are widely used algorithms for determining the mapping function. The Lasso is a shrinkage and feature selection method for linear regression [138]. It minimizes the sum of squared errors plus a penalty proportional to the sum of the absolute values of the coefficients, which shrinks some coefficients exactly to zero. RF is an ensemble learning algorithm that combines the predictions of many decision trees, each trained on a bootstrap sample of the data using random subsets of features; aggregating these weaker learners reduces overfitting and yields a robust model [90]. The objective of SVM is to find a hyperplane in the n-dimensional feature space that maximizes the margin between different classes of data [19].
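A short sketch makes the Lasso objective concrete. The code below fits the Lasso by cyclic coordinate descent with soft-thresholding. This is one standard solver, but not necessarily the one used in the studies reviewed here. The sketch assumes the following:

- the penalized form (1/(2n))·‖y − Xβ‖² + λ·‖β‖₁;
- centered data with no intercept;
- illustrative function names.

```python
def soft_threshold(value: float, lam: float) -> float:
    """Shrink value toward zero by lam; values within [-lam, lam] become exactly 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X @ beta||^2 + lam * ||beta||_1.

    X: list of n rows, each with p feature values; y: list of n targets.
    No intercept is fitted, so y and the columns of X should be centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            # Closed-form update for beta_j with all other coefficients held fixed.
            beta[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return beta
```

Raising `lam` drives more coefficients exactly to zero, which is how the Lasso performs feature selection. Consider a toy problem in which y depends only on the first feature. With `lam=1.0`, the first coefficient is shrunk from 2.0 to 1.6 and the second is exactly zero.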
The multilayer perceptron (MLP) is a class of feedforward ANN in which the biological unit of the brain, the neuron, is modeled by the mathematical unit of a network node [126]. An MLP consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer. All nodes except the inputs employ nonlinear activation functions. MLPs are trained with a supervised learning technique called backpropagation, which updates the parameters of each node. The multilayer structure and nonlinear activations distinguish the MLP from a linear perceptron and allow it to separate data that are not linearly separable. Although MLPs have been successfully applied to practical problems in many fields, they must be carefully trained and thoughtfully deployed to avoid overfitting or failure to converge during training.
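A minimal example shows why the hidden layer matters. The sketch below evaluates a two-input MLP with one hidden layer of two ReLU units. Its weights are set by hand, following a well-known textbook construction, rather than learned by backpropagation. The network computes XOR, which no single linear perceptron can represent. For simplicity, the output unit here is linear, a common choice for regression outputs. The function names are illustrative.

```python
def relu(v: float) -> float:
    """Rectified linear unit, a common nonlinear activation."""
    return max(0.0, v)


def mlp_xor(x1: float, x2: float) -> float:
    """Two-input MLP: one hidden layer of two ReLU units, linear output unit.

    Weights are chosen by hand (not trained) to show that a nonlinear hidden
    layer can separate XOR, which is not linearly separable.
    """
    # Hidden layer: both units see x1 + x2; the second has bias -1.
    h1 = relu(x1 + x2)
    h2 = relu(x1 + x2 - 1.0)
    # Output layer: a weighted sum of the hidden activations.
    return h1 - 2.0 * h2
```

Evaluating `mlp_xor` on the four binary inputs returns 0, 1, 1, 0, the XOR truth table.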
Convolutional neural networks (CNN) have been widely applied in many tasks [5,50,55,82,137,166,167,170]. A typical CNN is composed of several layers performing distinct computational tasks, including convolution at various scales of resolution, max or other forms of pooling, and batch normalization. During training, dropout randomly sets a fraction of layer outputs to zero to reduce overfitting, and fully connected layers connect every unit of one layer to every unit of the next. Various architectures have been proposed to improve the performance of deep CNNs. U-Net adopts symmetrical encoding and decoding paths with skip connections between them and is widely used in medical image segmentation. The residual network (ResNet) architecture employs shortcut connections that add a block's input to its output. These shortcuts mitigate "vanishing" gradients during training and allow much deeper networks to be trained.
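The core CNN operations named above can be sketched in a few lines of plain Python on 2D lists. This is an illustrative sketch, not how any framework implements them:

- `conv2d_valid` performs the unpadded cross-correlation that deep learning libraries call convolution.
- `max_pool2d` takes the maximum over non-overlapping windows.
- `residual_block` shows a ResNet-style identity shortcut, with an arbitrary function `f` standing in for the block's learned layers.

```python
def conv2d_valid(image, kernel):
    """Unpadded ('valid') 2D cross-correlation, which CNN layers call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            # Weighted sum of the kernel-sized window whose top-left corner is (r, c).
            out[r][c] = sum(
                image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw)
            )
    return out


def max_pool2d(fmap, size=2):
    """Max pooling over non-overlapping size x size windows (stride = size)."""
    return [
        [max(fmap[r + i][c + j] for i in range(size) for j in range(size))
         for c in range(0, len(fmap[0]) - size + 1, size)]
        for r in range(0, len(fmap) - size + 1, size)
    ]


def residual_block(x, f):
    """Identity shortcut: output = f(x) + x, element-wise (shapes must match)."""
    fx = f(x)
    return [[a + b for a, b in zip(row_f, row_x)] for row_f, row_x in zip(fx, x)]
```

Because the shortcut adds `x` unchanged, the gradient of the block's output with respect to its input always contains an identity term. This is the intuition behind easier training of deep residual networks.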

Unsupervised learning
Supervised learning requires time-consuming and labor-intensive manual data annotation. In contrast, unsupervised techniques learn the distribution of the input data and divide samples into clusters without a labeled training dataset. Common unsupervised learning algorithms include the active contour model (ACM), hidden Markov random fields (HMRF), the K-means and expectation-maximization (EM) algorithms, principal component analysis (PCA), and hybrid hierarchical clustering. ACM segments objects in an image by evolving a curve under constraints derived from the image [18]. In an HMRF model, the underlying label field is a Markov random field whose states cannot be observed directly but can be estimated indirectly from the observed data [165]. The EM algorithm is an iterative method that finds a (local) maximum likelihood or maximum a posteriori (MAP) estimate of the parameters of a statistical model [30]. PCA is an orthogonal linear transformation that reduces the dimensionality of the input data while retaining as much of its variance as possible [152]. K-means identifies k centroids and assigns each data point to the nearest centroid, minimizing the sum of squared Euclidean distances between each point and its assigned centroid [76]. Hybrid hierarchical clustering combines the advantages of bottom-up and top-down hierarchical clustering, which makes it applicable to data sets of various sizes [24].
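The following sketch shows how EM estimates a statistical model without labels by fitting a one-dimensional Gaussian mixture, for example to voxel intensities. It is a generic textbook version. It is not the HMRF-EM variant used in the studies reviewed later, which additionally imposes spatial smoothness on the labels. The caller supplies the initial parameters, and the variance floor is our own assumption.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, n_iter=100):
    """Fit a 1D Gaussian mixture model by expectation-maximization.

    means, variances, weights: initial values, one per component.
    Returns updated (means, variances, weights) and per-point responsibilities.
    """
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each component generated each point.
        resp = []
        for x in data:
            p = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate each component from responsibility-weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps a component from collapsing
            weights[j] = nj / len(data)
    return means, variances, weights, resp
```

Assigning each voxel to the component with the largest responsibility yields a clustering. The responsibilities themselves act as soft subregion memberships.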

Supervised learning in tumor subregion analysis of medical images
Supervised learning has been widely used in tumor subregion analysis for identification of recurrence volume, prediction of outcomes including overall survival (OS) or progression-free survival (PFS), and subregion segmentation. Sixty-four papers related to supervised learning are included in this review. Abbreviations used in the tables: LOOCV, leave-one-out cross-validation; N/A, not available, indicating that the paper provides only the total number of samples.

Head and neck (HN)
CT and 18F-FDG PET are often used in staging, radiation therapy treatment planning, and evaluation of treatment response in patients with head and neck cancers [112,124]. PET provides functional and metabolic information at the molecular level, while CT reveals the precise anatomical position of the tumor. Table 1 lists selected studies that used supervised learning for tumor subregion analysis of head and neck images. Ding et al. investigated the clinicopathological characteristics of different supraglottic subregions and their correlation with prognosis in patients with squamous cell carcinoma [32]. Supraglottic squamous cell carcinomas were divided into four types by subregion: epiglottis, ventricular bands, aryepiglottic fold, and ventricle. A Cox proportional hazards model was used to generate a biomarker. Regional control, overall survival, and cancer-specific survival rates differed significantly among subregions. Patients with carcinoma of the epiglottis or ventricular bands had higher survival rates than those with disease in the aryepiglottic fold or ventricle. Beaumont et al. [14] developed a voxel-wise ML model to identify subregions with tumor recurrence and to predict their location from pre-treatment PET images. An RF model was trained with voxel-wise features. Voxel-wise analysis based on radiomic features and spatial location within the tumor was shown to be helpful in determining the location of recurrence. It can also provide guidance for tailoring chemoradiation therapy (CRT) through dose escalation within areas of radiation resistance.

Gliomas
Gliomas are the most common primary malignant brain tumors and can be classified by histopathologic features into two groups: high-grade gliomas (HGG) and low-grade gliomas (LGG). Magnetic resonance imaging (MRI) is the main imaging modality for noninvasive diagnosis of brain tumors because it provides high soft-tissue contrast [13]. Dividing gliomas into substructures plays an important role in glioma diagnosis, staging, monitoring, and treatment planning. Table 2 lists selected studies using supervised learning for tumor subregion analysis of gliomas.
Fiouznia et al. [42] developed a model to discriminate glioma tissue subregions based on multiparametric (mp) MRI. Based on histopathological results, subregions were categorized into active tumor (AT), infiltrative edema (IE), and normal tissue (NT). In that study, linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and SVM were applied to distinguish the three tissue types from one another based on features derived from the subregions. All three classifiers achieved high classification performance (AUC 90%) with the feature combination "CBV, MD, T2 ISO, FLAIR". This capability might be employed to locate tissue subregions prior to image-guided biopsy procedures. Some studies have further predicted OS or PFS based on tumor subregion analysis [87,174].
Zhou et al. [174] developed a framework to identify tumor subregions on pretreatment MRI of patients with glioblastoma (GBM), correlating the image-based spatial characteristics of the subregions with survival. Two datasets were included in this study. Habitat-based features were extracted from GBM subregions derived from intratumoral grouping and spatial mapping. These features were effective in predicting two survival groups (accuracies of 87.5% and 86.36%, respectively). Three classifiers were evaluated: SVM, k-nearest neighbors (KNN), and naïve Bayes. Their results showed that the spatial correlation features between signal-enhanced subregions can effectively predict survival group (P < 0.05 for all classifiers). GBM is further characterized by infiltrative growth at the cellular level, so it cannot be completely resected. Diffusion tensor imaging (DTI) has been shown to potentially detect tumor infiltration by reflecting microstructural destruction. To investigate the incremental prognostic value of infiltrative patterns over clinical factors and identify specific subregions that may be suitable for targeted therapy, Li et al. [87] [89] developed a model to predict IDH mutation status in GBM preoperatively based on multiregional radiomic features derived from mpMRI. The proposed model was tested on an independent validation cohort. IDH1 mutation was predicted by an RF model after Boruta [83] was used for feature selection. The tumor subregions were automatically segmented using a CNN [117]. The best model achieved an accuracy of 97%, an AUC of 0.96, and an F1-score of 0.84. The multi-region model built using features from all regions performed better than single-region models, and it performed best when age was combined with the all-region features.
The results showed that the proposed model based on multi-regional mpMRI features has the potential to detect IDH1 mutation status in GBM patients prior to surgery.

BraTS challenge
As mentioned above, glioma subregion segmentation may play an important role in future glioma diagnosis, staging, and treatment planning. Most of the research described here uses non-public or institutional datasets, making it difficult to compare methods or results with other published work. The BraTS challenge stands in contrast to these [10,9,11,12,103]. It provides pre-operative mpMRI scans sourced from multiple institutions to encourage, and evaluate the reproducibility of, state-of-the-art methods for glioma segmentation. The dataset includes images from four MR sequences: T1, T1-Gd, T2, and FLAIR. Voxels are labeled with four classes (0: healthy tissue, 1: necrosis and non-enhancing tumor, 2: edema, 4: enhancing tumor). For evaluation, the tumor is divided into three clinically relevant regions:

(1) the whole tumor (WT), comprising labels 1, 2, and 4;
(2) the tumor core (TC), comprising labels 1 and 4;
(3) the enhancing tumor (ET), comprising label 4.

... that can aggregate local detail information and 3D semantic context directly within the 3D convolutional layer [64]. Kamnitsas et al. developed a dual-pathway 3D CNN with 11 convolutional layers [75]. To cope with the computational burden of the 3D network, adjacent image patches were processed jointly in a single pass through the network during training. The training scheme also automatically adapted to the inherent class imbalance in the data. The dual-pathway architecture processed input images at multiple scales simultaneously to capture multi-scale context. A 3D fully connected conditional random field (CRF) was employed in post-processing and was shown to be effective in removing false positives. Havaei et al. developed a CNN with a two-pathway architecture that simultaneously extracts local and global contextual features [61]. They modeled local label dependencies with a cascaded CNN rather than a CRF.
This approach can significantly improve computational speed by using efficient convolution operations rather than CRFs. The attention mechanism has been highly successful in computer vision at large [51,62,141,142,150,161] and in medical image analysis specifically [1,110,120,136,162,171]. Building on this, Zhang et al. integrated attention gates into U-Net to build an Attention Gate Residual U-Net (AGResU-Net) for brain tumor segmentation [163]. Several attention gate units were added to the skip connections of U-Net to emphasize salient information while suppressing irrelevant and noisy feature responses. Table 4 lists the three top-performing studies from 2017 to 2019 with their results. Ensemble learning, cascade learning, and multi-scale operations are commonly added to CNNs to improve the accuracy of brain tumor subregion segmentation. In statistics and machine learning, ensemble learning combines several models to surpass the performance of any single constituent model and is commonly used to improve classification, prediction, and segmentation. Kamnitsas et al. [74] developed a framework (EMMA) that combines several DL models for robust segmentation. EMMA independently trained DeepMedic [75], FCN [96], and U-Net [125] and combined their segmentation predictions at test time. Myronenko et al. proposed a semantic segmentation CNN with an asymmetrically large encoder to segment tumor subregions [106]. A variational autoencoder (VAE) branch was added to reconstruct the input images jointly with segmentation, regularizing the shared encoder. Finally, they ensembled ten models trained from scratch to further improve performance. Zhao et al. [169] developed a self-ensemble U-Net that combines multi-scale predictions to boost accuracy with only a slight increase in memory consumption. They also averaged the outputs of all models in the final ensemble and averaged the predictions of overlapping patches to obtain a more accurate result.
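One simple way to combine predictions at test time, as in EMMA, is to average per-voxel class probabilities. The sketch below assumes an unweighted mean followed by argmax. The exact combination rule used by the cited frameworks may differ, and the function name is illustrative.

```python
def ensemble_average(prob_maps):
    """Average per-voxel class probabilities over models, then take the argmax.

    prob_maps: list over models; each model gives, for every voxel, a list of
    class probabilities. Returns one predicted class index per voxel.
    """
    n_models = len(prob_maps)
    n_voxels = len(prob_maps[0])
    n_classes = len(prob_maps[0][0])
    labels = []
    for v in range(n_voxels):
        mean_probs = [
            sum(model[v][c] for model in prob_maps) / n_models for c in range(n_classes)
        ]
        labels.append(max(range(n_classes), key=mean_probs.__getitem__))
    return labels


# Three hypothetical models, two voxels, two classes.
a = [[0.6, 0.4], [0.2, 0.8]]
b = [[0.3, 0.7], [0.4, 0.6]]
c = [[0.4, 0.6], [1.0, 0.0]]
print(ensemble_average([a, b, c]))  # [1, 0]
```

In the example, majority voting on the second voxel would choose class 1, because two of the three models prefer it. Averaging chooses class 0 instead, because the third model is highly confident. This is one reason probability averaging and voting can disagree.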
Cascade learning is a particular case of ensemble learning in which several models are chained in series, and the outputs of each model are used as inputs to the next. Wang et al. trained three networks for cascade learning [144]. Each had a similar structure consisting of a large encoder with dilated convolutions and a basic decoder. The WT was segmented first, and the bounding box of the result was used for TC segmentation. Finally, ET segmentation was performed within the bounding box of the TC segmentation. Each 3 × 3 × 3 convolution kernel was decomposed into 3 × 3 × 1 and 1 × 1 × 3 kernels to reduce the number of parameters and to handle anisotropic receptive fields. Jiang et al. [69] developed a two-stage cascaded U-Net to segment brain tumor subregions from coarse to fine. In the first stage, a U-Net predicts a coarse segmentation from the multi-modal MRI.
The coarse segmentation provides the rough locations of tumors and this is used to highlight contrast information. The coarse segmentation results are combined with the raw input images prior to input into a second U-Net with two decoder paths (one using a deconvolution, the other using trilinear interpolation) to generate a fine segmentation map. Zhou et al. [171] proposed an ensemble framework combining different networks to segment tumor subregions with more robust results. The proposed framework considered multi-scale information by segmenting three tumor subregions in cascade with a shared backbone weight and an attention block. Multi-scale and deeper networks may achieve better segmentation results because brain tumors have a highly heterogeneous appearance on MR images.
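The bounding-box step of a cascade like that of Wang et al. [144] can be sketched as follows. The mask from one stage defines a padded box, and only the cropped region is passed to the next network. For brevity, the sketch works on a 2D slice, whereas the cited work is 3D. The margin and function names are assumptions, and the networks themselves are omitted.

```python
def bounding_box(mask, margin=0):
    """Inclusive (r0, r1, c0, c1) box around True pixels of a 2D mask, padded by margin."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # the previous stage found nothing
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    r0 = max(rows[0] - margin, 0)
    r1 = min(rows[-1] + margin, len(mask) - 1)
    c0 = max(cols[0] - margin, 0)
    c1 = min(cols[-1] + margin, len(mask[0]) - 1)
    return r0, r1, c0, c1


def crop(image, box):
    r0, r1, c0, c1 = box
    return [row[c0:c1 + 1] for row in image[r0:r1 + 1]]


def cascade_crop(image, coarse_mask, margin=1):
    """Region the next cascade stage sees: the image inside the previous mask's padded box."""
    box = bounding_box(coarse_mask, margin)
    return None if box is None else crop(image, box)
```

In a WT → TC → ET cascade, `cascade_crop` would be called once between each pair of stages. Each network therefore works on a progressively smaller region, which also reduces class imbalance.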
McKinley et al. [101] proposed a U-Net-like network containing a DenseNet with dilated convolutions. The authors also introduced a new loss function, a generalization of binary cross-entropy, to handle label uncertainty. In another study, McKinley et al. [102] used a structure very similar to that of [101]. However, they replaced batch normalization with instance normalization and added a simple local attention mechanism between dilated dense blocks. This study also used more training data, which may have further improved the performance of the network. Isensee et al. made minor modifications to U-Net, replacing ReLU and batch normalization with leaky ReLU and instance normalization, to achieve competitive performance [68]. They also supplemented the training data with data from their own institution, achieving a 2% increase in Dice similarity coefficient (DSC) for the enhancing tumor on the training data.
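The BraTS evaluation regions and the DSC can be computed directly from the label maps. The sketch below uses flattened voxel lists and the BraTS label convention (0, 1, 2, 4). Returning 1.0 when both masks are empty is an assumption about how that edge case is scored.

```python
def brats_regions(labels):
    """Map BraTS voxel labels (0, 1, 2, 4) to the three evaluation regions."""
    return {
        "WT": [lab in (1, 2, 4) for lab in labels],  # whole tumor
        "TC": [lab in (1, 4) for lab in labels],     # tumor core
        "ET": [lab == 4 for lab in labels],          # enhancing tumor
    }


def dice(pred, truth):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for boolean masks."""
    inter = sum(p and t for p, t in zip(pred, truth))
    size = sum(pred) + sum(truth)
    return 1.0 if size == 0 else 2.0 * inter / size


def brats_dice(pred_labels, true_labels):
    pred, true = brats_regions(pred_labels), brats_regions(true_labels)
    return {region: dice(pred[region], true[region]) for region in pred}
```

For example, if a prediction mislabels one enhancing-tumor voxel as edema, the WT Dice is unaffected. The same error lowers both the TC and ET scores, which is why the three regions are reported separately.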
In the past three years, BraTS has also focused on prediction of OS. Table 5 lists the top three results for ... [119] extracted features from the segmented tumor regions and added patient age to the feature space. PCA was performed to normalize the training set. The feature-wise mean, standard deviation, and projection matrix (W) were computed and stored during the rescaling phase of the PCA. An RF regression model was trained on the normalized data. The feature vectors of the test set were normalized using the feature-wise mean and standard deviation derived from the training phase and then projected into the principal component space with W. The rescaled vectors were fed into the trained RF classifiers, and the final prediction was obtained by majority voting. Sun et al. [135] extracted 4526 features from the tumor lesion based on their previous segmentation results. Important features were selected using a decision tree and cross-validation. Finally, they trained an RF regression model to predict OS. MLP was another popular method for this task. Jungo et al. [72] computed 26 geometrical features from the segmented tumor regions and added age to complete the feature space. The four most important features were selected and fed into a fully connected neural network with one hidden layer and a linear activation function. Baid et al. [8] extracted features from segmented tumor regions and excluded highly correlated features using Spearman correlation. An MLP was trained with the variables that showed a statistically significant correlation with OS. He et al. [143] selected seven features as input to a fully connected neural network with two hidden layers. Linear regression models also achieved good results. Feng et al. [47] extracted image features and non-imaging clinical features to construct a linear regression model.
They used two-dimensional feature vectors to represent the non-imaging resection-status feature and to compensate for sparse resection-status data. A linear regression model was fitted to the training data after feature normalization. Weninger et al. [151] measured the volume of each subregion from the segmentation results. The volume information, the distance between the centroids of the tumor and the brain, and patient age were used as input to linear regression to predict OS. In addition to radiomic features, Wang et al. [149] considered biophysical modeling of tumor growth. They calculated the ratio of the second semi-axis lengths of the TC and the WT, defining a novel measure termed the relative invasiveness coefficient (RIC). Following feature selection, RIC, age, and radiomic features were fed into an ε-support vector regression. By incorporating RIC, the method achieved an accuracy of 0.56 in OS prediction.
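The train/test normalization described for [119] can be sketched as follows. The feature-wise mean, standard deviation, and projection are estimated on the training set only and reused unchanged for test vectors, so no information leaks from the test set. For brevity, the sketch keeps only the first principal axis, found by power iteration, and omits the RF regression. The function names are illustrative.

```python
import math


def fit_standardize_pca(train, n_iter=500):
    """Estimate feature-wise mean/std and the first principal axis on training rows only."""
    n, p = len(train), len(train[0])
    mean = [sum(row[j] for row in train) / n for j in range(p)]
    # Population standard deviation; a constant feature gets std 1.0 to avoid dividing by 0.
    std = [math.sqrt(sum((row[j] - mean[j]) ** 2 for row in train) / n) or 1.0
           for j in range(p)]
    z = [[(row[j] - mean[j]) / std[j] for j in range(p)] for row in train]
    cov = [[sum(r[a] * r[b] for r in z) / n for b in range(p)] for a in range(p)]
    # Power iteration converges to the leading eigenvector of the covariance
    # matrix (the first principal axis), provided the start is not orthogonal to it.
    w = [1.0] * p
    for _ in range(n_iter):
        w_new = [sum(cov[a][b] * w[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(v * v for v in w_new)) or 1.0
        w = [v / norm for v in w_new]
    return mean, std, w


def transform(rows, mean, std, w):
    """Standardize new rows with the stored training statistics, then project onto w."""
    return [sum((row[j] - mean[j]) / std[j] * w[j] for j in range(len(w))) for row in rows]
```

The design point is that `transform` never recomputes statistics. A test vector is rescaled with the training mean and standard deviation, exactly as described for the test set above.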

Unsupervised learning in tumor subregion analysis of medical images
Unsupervised learning has also been widely used in tumor subregion analysis of medical images when labeled training data are unavailable or ill-defined. The twenty-five papers employing unsupervised learning techniques listed in Table 6 mostly focus on OS and PFS prediction and identification of tumor recurrence. Several widely adopted unsupervised algorithms are used, including level set methods (LSM), thresholding, individual- and population-level clustering, and K-means.

Level Set Methods
Level set methods are commonly used for unsupervised segmentation. Cui et al. [28] developed and validated prognostic imaging biomarkers to predict the OS of GBM patients based on multi-region quantitative image analysis. Each tumor was semi-automatically delineated by the level set algorithm. The segmented lesion was then divided into several subregions using a hidden Markov random field (HMRF) model and the EM algorithm [165]. The biomarker was generated with LASSO to predict the OS of patients with GBM, and the model was tested on an independent cohort from the local institution. For the proposed method, the concordance index was 0.78 and the log-rank P value for OS stratification was 0.018. This outperformed conventional prognostic biomarkers such as age (concordance index: 0.57, P = 0.389) and tumor volume (concordance index: 0.59, P = 0.409). In a later study, Cui et al. [29] defined a high-risk volume (HRV) based on mpMRI images for predicting GBM survival and investigated its relationship and synergy with molecular characteristics. Each tumor was delineated by the level set algorithm, with manual correction for eight cases in which the algorithm failed. Patients with an unmethylated MGMT promoter and high HRV had significantly shorter OS (median 9.3 vs. 18.4 months, log-rank P = 0.002). This indicates that the volume of the high-risk intratumoral subregion identified on mpMRI can predict survival and complement genomic information.
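The concordance index is the main performance measure in several of the studies reviewed in this section, so a minimal sketch of Harrell's C-index for right-censored data may help. The sketch makes the following assumptions:

- a higher risk score means shorter predicted survival;
- tied risk scores count as half-concordant;
- for simplicity, pairs with tied observation times are skipped.

Library implementations may handle ties differently.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times: observed follow-up times; events: 1 if the event (e.g. death or
    progression) was observed, 0 if censored; risk_scores: higher = higher risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is usable only if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # the earlier event was assigned the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. On this scale, the reported 0.78 of the multi-region biomarker can be compared with 0.57 for age.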

Threshold-based Methods
Thresholding algorithms are also suitable for separating tumor subregions based on imaging characteristics. Lawrence et al. [104] investigated the three-month treatment response of newly diagnosed GBM on 11C-methionine positron emission tomography (MET-PET). They asked whether this response could predict prognosis better than baseline MET-PET or anatomic magnetic resonance imaging alone. A threshold of 1.5 times the mean cerebellar uptake was used to automatically segment the metabolic tumor volume (MTV). Persistent MTV at three months was defined as the overlap of the three-month MTV and the pre-treatment MTV. Cox proportional hazards regression was used for multivariate analysis of PFS and OS. Most patients (67%) with gross total resection (GTR) of newly diagnosed GBM had measurable postoperative MTV. The total and persistent MTV three months after CRT were predictors of PFS. The GTV-Gd at recurrence encompassed 97% of the persistent MET-PET subvolume, 71% of the baseline MTV, 54% of the baseline GTV-Gd, and 78% of the three-month MTV. The persistent MET-PET subvolume best predicted the location of tumor recurrence. Legot et al. [85] developed a framework to identify tumor subregions of head and neck squamous cell carcinoma (HNSCC) at high risk of recurrence on 18F-FDG PET images, so that these might be considered for CRT dose escalation. Follow-up 18F-FDG PET images were registered to baseline images using an automatic rigid registration algorithm based on mutual information. Seven metabolic tumor subregions were segmented on baseline images using fixed percentages of SUVmax. These were compared with two post-treatment subregions: local recurrence and residual metabolic activity. The overlap between metabolic tumor subregions derived from baseline and follow-up PET images was only moderate.
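Threshold-based segmentation and the overlap measures above reduce to simple voxel operations. The sketch assumes co-registered images flattened to voxel lists and a strict "greater than" cutoff at 1.5 times the mean cerebellar uptake. Coverage is defined as the fraction of a subvolume lying inside a reference volume, for example the GTV-Gd at recurrence. All names are illustrative.

```python
def threshold_mtv(uptake, cerebellar_mean, factor=1.5):
    """Boolean MTV mask: voxels whose uptake exceeds factor x mean cerebellar uptake."""
    cutoff = factor * cerebellar_mean
    return [v > cutoff for v in uptake]


def persistent_mtv(mask_baseline, mask_followup):
    """Persistent MTV: voxels inside both the baseline and the follow-up MTV."""
    return [a and b for a, b in zip(mask_baseline, mask_followup)]


def volume_ml(mask, voxel_volume_ml):
    return sum(mask) * voxel_volume_ml


def coverage(subvolume, reference):
    """Fraction of subvolume voxels that also lie inside the reference volume."""
    n = sum(subvolume)
    return sum(a and b for a, b in zip(subvolume, reference)) / n if n else float("nan")
```

Under these definitions, the reported 97% corresponds to `coverage(persistent, gtv_gd_at_recurrence)`, where both arguments are hypothetical masks of this form.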
Estrogen receptor (ER) status is a recognized molecular feature of breast cancer that is correlated with prognosis. Its early detection can significantly improve treatment efficacy by guiding the selection of targeted therapies [4]. Chaudhury et al. developed a framework to classify ER status by extracting textural kinetic features from peripheral and core tumor subregions [20]. The WT was segmented using automatic threshold selection [113] combined with morphological dilation and connected component analysis, and then divided into two subregions according to tumor geometry. The study adopted two feature selection methods, wrapper [79] and correlation-based feature subset selection (CFS) [56]. It also used three classifiers: naive Bayes [70], SVM [19,36], and decision tree [121]. Each feature selector was paired with each classifier, giving six combinations in total. The best classification accuracy approached 94%, indicating that subregion texture features can accurately classify ER status.

Individual- and population-level clustering
Individual- and population-level clustering assigns each pixel or voxel to a suitable cluster in order to divide a tumor into subregions. Once tumor subregions are obtained, their relationship with OS and PFS can be investigated.
Wu et al. used individual- and population-level clustering in three studies of tumor subregion analysis. In the first [154], they developed a robust tumor partitioning method to identify clinically relevant, high-risk subregions in lung cancer. The method divided the tumor into subregions through a two-stage clustering process. It first over-segmented each patient's tumor into superpixels via K-means clustering [76] on both PET and CT images, and then merged these superpixels into subregions via population-level hierarchical clustering [71]. High-risk subregions predicted OS and out-of-field progression (OFP) over the entire cohort with a C-index of 0.66-0.67. For patients with stage III disease, the C-index reached 0.75 (HR 3.93, log-rank P < 0.002) and 0.76 (HR 4.84, log-rank P < 0.002) for predicting OS and OFP, respectively. In contrast, the C-index was below 0.60 for traditional imaging markers. These results showed that the volume of the most metabolically active and heterogeneous solid component of the tumor predicted OS and OFP better than conventional imaging markers. In the second study, Wu et al. [156] developed an imaging biomarker to assess early treatment response and predict outcomes in oropharyngeal squamous cell carcinoma (OPSCC). Based on 18F-FDG PET and contrast CT imaging, the primary tumor and involved lymph nodes were divided into subregions by individual- and population-level clustering. The imaging biomarker was generated by the LASSO algorithm. The C-index was 0.72 for the training set and 0.66 for the validation set. This suggests that the biomarker can accurately predict disease progression and support better risk-adapted treatment. In a third study, investigating risk stratification in breast cancer [155], Wu et al. divided each tumor into multiple spatially segregated, phenotypically consistent subregions using individual- and population-level clustering. They then used a net strategy to construct an imaging biomarker from image features derived from the multiregional spatial interaction (MSI) matrix. The results showed that breast cancers may exhibit three intratumoral subregions with distinct perfusion characteristics. Tumor heterogeneity may also predict recurrence-free survival (RFS) independently of traditional predictors.
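The population-level step of the two-stage scheme can be sketched as bottom-up hierarchical clustering. The sketch assumes that each patient's tumor has already been over-segmented into superpixels and that each superpixel is summarized by a feature vector, such as its mean PET and CT intensity. Superpixels from all patients are then pooled and merged into a fixed number of subregions. Average linkage and the function name are our assumptions; the cited method [71] may use a different linkage.

```python
import math


def average_linkage_clustering(points, n_clusters):
    """Bottom-up (agglomerative) hierarchical clustering with average linkage.

    Naive O(n^3) sketch: repeatedly merges the two clusters whose members have
    the smallest mean pairwise Euclidean distance until n_clusters remain.
    Returns one cluster index per input point.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best_pair, best_dist = None, math.inf
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if d < best_dist:
                    best_pair, best_dist = (x, y), d
        x, y = best_pair
        clusters[x].extend(clusters[y])
        del clusters[y]  # y > x, so index x is unaffected

    labels = [0] * len(points)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels
```

Because the second stage pools superpixels across the cohort, a given subregion label means the same imaging phenotype in every patient. This is what allows subregion volumes to be compared as biomarkers.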
To predict PFS in patients with nasopharyngeal carcinoma (NPC), Xu et al. extracted subregion features via individual- and population-level clustering and generated a biomarker with LASSO [158]. Three subregions (S1, S2, S3) with distinct PET/CT imaging characteristics were obtained. The C-indices of the imaging biomarkers for S3 and the WT were 0.69 and 0.58, with log-rank P < 0.001 and P < 0.552, respectively. This indicates that S3 is superior to the WT in prognostic performance. In multivariate analysis, imaging biomarker S3 and American Joint Committee on Cancer (AJCC) stage III-IV were independent predictors of PFS (P = 0.011 and P = 0.042, respectively). When combined into a scoring system, imaging biomarker S3 and AJCC stage III-IV outperformed AJCC staging alone. The log-rank P values were < 0.0001 vs. 0.0002 for the primary cohort and < 0.0021 vs. 0.0277 for the validation cohort. These results demonstrate that PET/CT subregion radiomics can predict PFS in NPC and provide prognostic information that complements other established predictors.
Even et al. [37] designed a subregional analysis for non-small cell lung cancer (NSCLC) using multiparametric imaging. The multiparametric images were divided into subregions in two clustering steps. Each tumor was first divided into homogeneous subregions (i.e., supervoxels), which were then grouped into phenotypic clusters by hybrid hierarchical clustering [24]. Patients were clustered according to the absolute or relative volume of supervoxels. Hypoxia, FDG avidity, and an intermediate level of blood flow/blood volume indicated a high-risk tumor type with poorer survival (P = 0.035). These results provide evidence of the prognostic utility of subregion classification based on multiparametric imaging in NSCLC.

K-means
K-means is a popular unsupervised learning method that partitions samples into k clusters. Xie et al. developed a model to predict survival in patients with oesophageal squamous cell carcinoma (OSCC) prior to concurrent CRT [157]. Each patient's tumor region was divided into subregions by K-means clustering. Radiomic features were then extracted from these subregions to construct a biomarker with the LASSO algorithm to predict OS. Independent patient cohorts from another hospital were used to validate the model. Torheim et al. applied K-means to MRI of cervical cancer, dividing voxels into two clusters based on their relative signal increase (RSI) time series. Clusters of hypo-enhancing voxels were significantly correlated with locoregional recurrence (P = 0.048) [139]. Tumors with poor treatment response exhibited this characteristic in several regions, suggesting that these regions are potential candidates for targeted radiotherapy.
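K-means itself is short enough to sketch in full. The version below is Lloyd's algorithm with caller-supplied initial centroids. Practical implementations usually add smarter initialization, such as k-means++, and multiple restarts. The per-voxel feature vectors in the example are hypothetical.

```python
import math


def kmeans(points, init_centroids, n_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update.

    points: feature vectors (e.g. per-voxel intensities or RSI values);
    init_centroids: k starting centroids supplied by the caller.
    Returns (labels, centroids).
    """
    centroids = [list(c) for c in init_centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, new_labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
    return labels, centroids
```

Each iteration can only decrease the within-cluster sum of squared distances, so the loop terminates. However, it may stop at a local minimum that depends on the initial centroids.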
Franklin et al. developed a method to semi-automatically segment viable and non-viable tumor regions in colorectal cancer on DCE-MRI, compared these regions with the corresponding histological subregions, and analyzed the pharmacokinetic parameters extracted from them [48]. The WT was delineated manually, and four subregions were obtained automatically by PCA followed by K-means. These four subregions were then manually merged into two: viable and non-viable tumor. The DSC between the viable tumor subregions defined by imaging and by histology was 0.738, indicating consistency between viable tumor segmentation on pre-operative DCE-MRI and on post-operative histology. This technique may facilitate non-invasive assessment of treatment response in clinical practice.
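The DSC reported above is defined as DSC = 2|A ∩ B| / (|A| + |B|) for two binary masks A and B. A minimal sketch follows, assuming the imaging and histology masks have already been brought onto a common grid and are represented as sets of voxel coordinates (a representation chosen here for illustration).

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks.

    Masks are sets of voxel coordinates, e.g. (z, y, x) tuples.
    Returns 2|A & B| / (|A| + |B|); 1.0 means perfect overlap.
    """
    if not mask_a and not mask_b:
        return 1.0  # convention: two empty masks agree perfectly
    return 2 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))
```

For A = {(0, 0), (0, 1), (1, 0), (1, 1)} and B = {(0, 1), (1, 1), (2, 1)}, the masks share two voxels, so DSC = 4/7 ≈ 0.571.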

Others
Seow et al. [129] segmented the solid subregion of high-grade gliomas in MR images by active contour modeling (ACM). The relative area difference, $(S_{\mathrm{ACM}} - S_{\mathrm{manual}})/S_{\mathrm{ACM}}$, where $S_{\mathrm{ACM}}$ and $S_{\mathrm{manual}}$ are the areas segmented by ACM and manually, respectively, was 1.3. The algorithm produced segmentations in under twenty minutes, whereas manual segmentation required an hour, demonstrating its suitability for efficient segmentation of the solid enhancing regions in the glioma tumor core. Fan et al. developed a framework to assess intratumoral heterogeneity in breast cancer based on the decomposition of DCE-MR images [39]. The whole breast tumor was segmented by the fuzzy C-means (FCM) algorithm [159], and a convex analysis of mixtures (CAM) method was then used to differentiate heterogeneous regions. Imaging features extracted from these regions were used to predict prognosis and identify gene signatures. The results showed that tumor heterogeneity was negatively correlated with survival and the presence of cancer-related genetic markers of breast cancer. Wang et al. studied primary and secondary intrahepatic malignancies to determine whether an increase, during RT, in the tumor subvolume with elevated arterial perfusion can predict tumor progression after treatment [146]. Pretreatment tumor arterial perfusion was clustered into low-normal and elevated perfusion by global-initiated regularized local fuzzy clustering (GIRLFC) [148], and the tumor subvolumes with elevated arterial perfusion were extracted from the hepatic arterial perfusion images. Changes in these subvolumes and in tumor-averaged arterial perfusion from pretreatment baseline to mid-treatment were then evaluated as predictors of post-treatment tumor progression. The results showed that an increase in the intrahepatic subvolume with elevated arterial perfusion during RT may predict post-treatment tumor progression (AUC = 0.9).
Lucia et al. [99] developed a framework to evaluate the overlap between the initial high-uptake subvolume (V1) on baseline 18F-FDG PET/CT images and the metabolic relapse (V2) after chemoradiotherapy in locally advanced cervical cancer. CT images acquired at recurrence were registered to the baseline CT using the 3D Slicer Expert Automated Registration module [44], which obtained the deformation fields by optimizing the Mattes mutual information metric [100]; the corresponding PET images were then registered using the same deformation fields. The fuzzy locally adaptive Bayesian (FLAB) algorithm [58] was used to delineate V1 and V2 on the baseline and follow-up PET images, respectively. The overlaps between the baseline high-uptake subvolume and the recurrent metabolic volume were moderate to good (range (mean ± std)):
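Several of the methods above rely on fuzzy clustering, in which each voxel receives a degree of membership in every cluster rather than a single hard label: FCM [159] in Fan et al., and fuzzy approaches such as GIRLFC [148] and FLAB [58]. Below is a minimal sketch of standard fuzzy C-means with fuzzifier m = 2, offered as an illustration only; it is not GIRLFC or FLAB, which extend fuzzy clustering in ways not reproduced here, and the deterministic initialization and function name are assumptions.

```python
import math


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6):
    """Standard fuzzy C-means on a list of feature vectors.

    Returns (u, centers), where u[j][i] is the membership of point j in
    cluster i; every row of u sums to 1.
    """
    n, dims = len(points), len(points[0])
    # deterministic start: spread initial centers over the sorted points
    order = sorted(range(n), key=lambda j: points[j])
    centers = [list(points[order[(i * (n - 1)) // max(c - 1, 1)]])
               for i in range(c)]
    u = [[0.0] * c for _ in range(n)]
    for _ in range(max_iter):
        # membership update: u_ji = 1 / sum_k (d_ji / d_jk)^(2 / (m - 1))
        for j, p in enumerate(points):
            d = [math.dist(p, ctr) for ctr in centers]
            if 0.0 in d:
                # point lies exactly on a center: give it crisp membership
                hit = d.index(0.0)
                u[j] = [1.0 if i == hit else 0.0 for i in range(c)]
            else:
                u[j] = [1.0 / sum((d[i] / d[k]) ** (2.0 / (m - 1.0))
                                  for k in range(c))
                        for i in range(c)]
        # center update: mean of all points weighted by membership^m
        new_centers = []
        for i in range(c):
            w = [u[j][i] ** m for j in range(n)]
            total = sum(w)
            if total == 0.0:
                new_centers.append(centers[i])  # no support: keep old center
                continue
            new_centers.append([sum(wj * p[dim] for wj, p in zip(w, points)) / total
                                for dim in range(dims)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return u, centers
```

On the toy intensities [[1.0], [1.2], [0.9], [5.0], [5.3], [4.8]] with c = 2, the centers settle close to the two group means (about 1.03 and 5.03), and taking the largest membership of each voxel recovers the two groups. A threshold on the membership of the high-intensity cluster could, for example, define an "elevated" subvolume.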

PREVALENCE OF METHODS
We analyzed the percentage distribution of several attributes, including the region of interest (ROI), learning strategy (supervised/unsupervised), technique (deep learning/non-deep learning), and imaging modality (single/multi) (Figure 2). Brain and chest are the most studied regions of interest, with brain being the most studied overall, likely in part because the BraTS challenge provides public data as well as ground truth for the non-public data. Supervised learning accounts for 72% of the works reviewed, owing to the greater reliability and transparency of training when ground truth is available. Multi-modal studies account for 85% of all works, and single-modality studies for the remaining 15%. A non-deep-learning strategy is employed in 61% of the summarized studies.

SUMMARY AND OUTLOOK
AI methods from the field of computer vision have been widely adopted for several tasks in tumor subregion analysis. As reviewed here, the brain is the most commonly studied site, followed by the chest. Because brain tumor subregions are generally accompanied by ground-truth data, supervised learning methods are more commonly employed than unsupervised strategies for brain tumors. For other body sites, unsupervised methods are more popular owing to the lack of ground truth.
Currently, there is no universal image acquisition protocol for subregion analysis for any imaging modality in clinical practice. Images acquired at different sites and on different scanners may affect the performance of these models. To address this issue, the Quantitative Imaging Biomarkers Alliance (QIBA) [15] and the Quantitative Imaging Network (QIN) [73] have been working to formalize standard imaging protocols.
Sample sizes in the reviewed studies were small to intermediate (median (range): 230 (4-626)). Supervised learning requires a large training set to train a reliable model, and a large validation set is also essential for rigorously evaluating the proposed methods. Except for the BraTS studies, most of the studies reviewed here used institutional data and may lack generalizability. Many studies on tumor subregions demonstrate correlations with survival, treatment response, and recurrence. Validating these findings requires a significant investment of time in follow-up, especially in diseases with low overall mortality. Validation may also be confounded by adjuvant treatment during the follow-up period, complicating the analysis of any relationships that are discovered.
Deep learning (DL) has demonstrated clinical utility in many medical imaging tasks. At the time of writing, DL in tumor subregion analysis is used primarily for brain tumor subregion segmentation and only rarely for non-segmentation tasks or other body sites. Great potential therefore remains for DL applications in tumor subregion analysis. First, a CNN might be used to extract useful features automatically rather than relying on handcrafted features. Second, for clinical tasks in which manually annotated ground truth is difficult to obtain, an unsupervised CNN has been applied to the segmentation problem; for example, Zhou et al. proposed a deep image clustering model that assigns pixels to clusters by iteratively updating the cluster associations and cluster centers [173]. Third, a CNN could be used to generate radiomic signatures from tumor subregions for clinical applications such as OS prediction, treatment response prediction, and clinical risk stratification. To realize the full potential of DL in tumor subregion analysis, models must be trained on large datasets and validated externally across sites.

Summary and Discussion
GANs have been increasingly used in medical and biomedical imaging. As reviewed in this chapter, cGAN- and Cycle-GAN-based image synthesis is an emerging and active research field, with all of the reviewed studies published within the last few years. With developments in both artificial intelligence and computing hardware, more GAN-based methods are expected to facilitate the clinical workflow with novel applications. Compared with conventional model-based methods, GAN-based methods generalize more readily, since the same network architecture developed for one pair of image modalities can be applied to other pairs with minimal adjustment. This makes it easy to extend a similar methodology to image synthesis across a variety of imaging modalities. GAN-based methods generally outperform conventional methods, generating more realistic synthetic images with higher similarity to real images and better quantitative metrics. In practice, depending on the hardware, training a GAN-based model usually takes several hours to days; once trained, however, the model can be applied to new patients to generate synthetic images within seconds or minutes. Owing to these advantages, GAN-based methods have attracted great research and clinical interest in medical and biomedical imaging.
Although the reviewed literature shows the success of GAN-based image synthesis in various applications, some open questions remain to be answered in future studies. First, most of the reviewed studies require paired datasets for training, i.e., the source and target images must have pixel-to-pixel correspondence. This requirement makes it difficult to collect sufficient eligible datasets and demands highly accurate image registration. Compared with cGAN, Cycle-GAN has been shown to relax this requirement and train on unpaired datasets, which can help clinical applications enroll large numbers of patient datasets for training. However, even when Cycle-GAN yields better image quality than cGAN, its numerical performance may not improve significantly in some synthesis tasks because of residual mismatch between the synthetic image and the ground-truth target image.
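Why unpaired data suffice for Cycle-GAN can be seen from its cycle-consistency term: with generators G: X to Y and F: Y to X, the L1 loss E_x‖F(G(x)) − x‖₁ + E_y‖G(F(y)) − y‖₁ compares each image only with its own round trip, so no pixel-to-pixel correspondence between x and y is needed. The sketch below computes this term for toy "images" (flat lists of intensities) and hypothetical voxel-wise linear generators; real Cycle-GAN training uses neural networks and adds adversarial losses, both omitted here.

```python
def l1(a, b):
    """Mean absolute difference between two equal-length intensity lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(G, F, images_x, images_y):
    """Cycle-consistency term of Cycle-GAN (L1 form).

    G maps domain X -> Y and F maps Y -> X. images_x and images_y are
    unpaired: each image is compared only with its own round trip.
    """
    loss_x = sum(l1(F(G(x)), x) for x in images_x) / len(images_x)
    loss_y = sum(l1(G(F(y)), y) for y in images_y) / len(images_y)
    return loss_x + loss_y


# hypothetical toy generators acting voxel by voxel
def G(img):
    return [2.0 * v + 1.0 for v in img]


def F_inverse(img):  # exact inverse of G: zero cycle loss
    return [(v - 1.0) / 2.0 for v in img]


def F_biased(img):  # inverse plus an offset: nonzero cycle loss
    return [(v - 1.0) / 2.0 + 0.5 for v in img]
```

With source images [[0.0, 1.0, 2.0], [3.0, 4.0]] and a single, unrelated target image [[10.0, 20.0]], `F_inverse` gives a cycle loss of 0.0 and `F_biased` gives 1.5. A low cycle loss alone does not make the synthetic image match the true target voxel for voxel, which is consistent with the residual mismatch noted above.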
Second, although the merits of GAN-based methods have been demonstrated, their performance can be inconsistent when the input images differ drastically from the training datasets. In fact, unusual cases are generally excluded in most of the reviewed studies. Such cases, which do occur occasionally in the clinical setting, should therefore be handled with caution when GAN-based methods are used to generate synthetic images. For example, some patients have hip prostheses, which create severe artifacts on both CT and MR images. How including such cases in the training or testing datasets affects network performance is an important question that has not yet been studied. Other unusual cases that may occur across imaging modalities are also worth investigating, to name a few: implants of all kinds that introduce artifacts, obese patients whose scans have higher-than-average noise levels, and patients with anatomical abnormalities. In conclusion, research in image synthesis is still wide open, and the authors expect to see more activity in this domain in the years to come.