A machine learning‐based survival prediction model of high grade glioma by integration of clinical and dose‐volume histogram parameters

Abstract Purpose Glioma is the most common type of primary brain tumor in adults, and it causes significant morbidity and mortality, especially in high‐grade glioma (HGG) patients. The accurate prognostic prediction of HGG is vital and helpful for clinicians when developing therapeutic strategies. Therefore, we propose a machine learning‐based survival prediction model by analyzing clinical and dose‐volume histogram (DVH) parameters, to improve the performance of the risk model in HGG patients. Methods Eight clinical variables and 39 DVH parameters were extracted for each patient, who received radiotherapy for HGG with active follow‐up. Ninety‐five patients were randomly divided into training and testing cohorts, and we employed random survival forest (RSF), support vector machine (SVM), and Cox proportional hazards (CPHs) models to predict survival. Calibration plots, concordance indexes, and decision curve analyses were used to evaluate the calibration, discrimination, and clinical utility of these three models. Results The RSF model showed the best performance among the three models, with concordance indexes of 0.824 and 0.847 in the training and testing sets, respectively, followed by the SVM (0.792/0.823) and CPH (0.821/0.811) models. Specifically, in the RSF model, we identified age, gross tumor volume (GTV), grade, Karnofsky performance status (KPS), isocitrate dehydrogenase (IDH), and D99 as important variables associated with survival. The AUCs of the testing set were 92.4%, 87.7%, and 84.0% for 1‐, 2‐, and 3‐year survival, respectively. According to this model, HGG patients can be divided into high‐ and low‐risk groups. Conclusion The machine learning‐based RSF model integrating both clinical and DVH variables is an improved and useful tool for predicting the survival of HGG patients.


| INTRODUCTION
Glioma is the most common type of primary brain tumor in adults, representing more than 80% of malignant intracranial tumors. 1,2 The World Health Organization (WHO) pathologically classifies glioma into Grades I, II, III, and IV glioma according to histological features. 3 Among them, Grades III and IV are categorized as high-grade glioma (HGG), characterized by poorly differentiated and highly aggressive tumor cells and a poor prognosis. 3 Despite the standard treatment consisting of surgery followed by radiotherapy and temozolomide (TMZ) chemotherapy, 4,5 the morbidity and mortality of HGG are still very high. 1 It is therefore essential to accurately predict the prognosis of patients, to guide clinicians in making personalized treatment decisions and surveillance strategies for patients with different risk levels.
Researchers have made great efforts to understand individual risk profiles and develop survival prediction models. Previous studies have shown that tumor grade, age, Karnofsky performance status (KPS), extent of surgery, and other clinical data are important prognostic factors for HGG patients. 6 A few molecular markers, such as isocitrate dehydrogenase (IDH) 1 and 2 mutations, 1p/19q chromosomal codeletion, and O-6-methylguanine-DNA methyltransferase (MGMT) promoter methylation, have also been suggested to be useful in survival prediction. 7,8 Interestingly, recent studies have elucidated that dosimetric parameters can affect prognosis. For example, Yang et al. found that the dose-volume histogram (DVH) signature reflecting the planning score was an independent predictor of progression-free survival in locoregionally advanced nasopharyngeal carcinoma. 9 Rijkmans et al. evaluated the association of dosimetric parameters with tumor regression in local advanced rectal cancer and found that tumor volume is the most important predictive factor. 10 Considering the crucial role of radiotherapy in the treatment course, in this study, we will integrate clinical variables with DVH parameters to predict the overall survival (OS) of HGG patients.
To resolve this issue, different approaches can be used. The classical approach-the Cox proportional hazard (CPH) model-can be employed to identify clinical parameters and dosimetric parameters that significantly affect the outcome of interest. The CPH model is the most frequently used method in survival analysis because of its convenience. However, it assumes that the outcome is a linear combination of covariates, but patient data are diverse and complex and generally, they cannot be considered linear. 11,12 The random survival forest (RSF) model, one of the most widely used methods of machine learning, enables the detection of relationships from complex datasets. RSF is a flexible nonparametric treeensemble method for the analysis of right-censored survival data. 13 It builds hundreds of trees and outputs the results by voting. 14 In addition, it reduces variance and bias by using all variables collected and by automatically assessing nonlinear effects and complex interactions. 12 In addition to RSF, the support vector machine (SVM) is another supervised machine learning algorithm used for classification and regression. 15 It separates the data by constructing a hyperplane in a high-or infinite-dimensional space, and it can classify nonlinear data using the kernel trick. 16 Therefore, in this study, we propose three models, namely, the machine learning-based RSF and SVM models, and the classical CPH model, to identify predictors of survival and examine treatment outcomes in patients with HGG by integrating clinical and DVH parameters. In addition, we used the calibration plot, concordance index (c-index), and decision curve analyses to evaluate the calibration, discrimination, and clinical utility of these three models.

| Patients and treatment
in postoperative tissues. OS was defined as the number of months between the date of diagnosis and the date of death from any cause. For concurrent chemotherapy, TMZ was administered once daily (7 days/week) at a dose of 75 mg/ m 2 /day. For adjuvant chemotherapy, TMZ started 4 weeks after the completion of radiotherapy. Adjuvant TMZ was administered at a dose of 150 mg/m 2 during the first cycle (Days 1-5 per 28-day cycle) if no toxicity was observed. Then, TMZ was administered at a dose of 200 mg/m 2 from the second cycle onwards. 17 This project was approved by the Independent Ethics Committee of the Second Affiliated Hospital, Zhejiang University School of Medicine, and informed consent was obtained from all patients.

| IMRT planning and DVH feature extraction
All patients were immobilized in the supine position by a thermoplastic head fixation mask, followed by CT simulation. The generated CT images were fused with enhancement magnetic resonance imaging (MRI) images and delineated slice-by-slice. According to our protocol, gross tumor volume (GTV) was defined as the surgical cavity and any residual contrast-enhancing tumor on postcontrast T1-weighted MRI, ignoring any edematous region. 18 Clinical target volume (CTV) 1 was GTV plus a margin of 1 cm, while CTV2 was GTV plus a margin of 2 cm. In addition, both were modified to avoid barriers of spread (i.e., bone, falx cerebri, ventricles). Then, the corresponding planning target volume (PTV) 1 and PTV2 were obtained by expanding the CTVs by 3 mm. The standard fractionation scheme was a dose of 60 Gy delivered in 30 fractions (2 Gy per day from Monday to Friday, 6 weeks). The prescription dose was 60 Gy for CTV1 and 54 Gy for CTV2 by the use of the Eclipse 10.0 treatment planning system. 19 A 6-MV photon beam was provided by the Varian Trilogy LINAC with a multileaf collimator.
DVH features were collected from the initial treatment plans, including GTV, CTV1, CTV2, equivalent spherical diameter, minimal dose, maximal dose, mean dose, modal dose, STD, EUD, TCP, the dose that covered 1% to 99% of the CTV2 (D1-D99), and the percent volume of CTV2 that received the dose of 50 Gy to 65 Gy (V50-V65). Some of the DVH features for each patient are exported directly from DVH text files, and the rest of the features such as modal dose, STD, EUD, TCP, D1-D99 and V50-V65 of CTV2, were calculated from the DVH with a program written in MATLAB (MATLAB R2011b, Mathworks, Inc.). The modal dose is the most frequent dose in CTV2. STD is the standard deviation of the dose distributions in CTV2. EUD represents the equivalent uniform dose which leads to the same control probability as the nonuniform dose distribution. 20 Therefore, it can be used to compare the local control or radiobiological effect for different dose distributions. TCP is the EUD-based tumor control probability. 21 The EUD and TCP equations are as follows: where a is a specific parameter of the EUD model, the D i and v i data pairs are obtained from the differential DVH, the TCD 50 is the dose to achieve 50% tumor control, and γ 50 is a specific parameter to describe the slope of the dose response curve. In this study, TCD 50 and γ 50 were set to −10, 60, and 2, respectively. 20,22,23 All doses are described as the total dose per course.

| RSF model
To make the statistics easier to analyze, we transformed all the continuous variables into categorical variables by the maxstat package using R version 3.6.1 (R Foundation for Statistical Computing), which is a bioinformatics tool to determine the optimal cut point for one or multiple continuous variables. The included patients were randomly divided into training and testing cohorts. The data were sampled using random bootstrapping to generate a training dataset. To minimize classification error in the training data, the model randomly selected a subset of feature variables (m try ) to obtain the optimal result. The corresponding trees of the training set were repeatedly generated until the out-of-bag (OOB) error rate had stabilized. Then the RSF model selected the model with the lowest OOB error rate. On the basis of the RSF model, the predicted survival time and risk score of each patient was calculated. The optimal cutoff point of the corresponding patient risk scores with the smallest Kaplan-Meier log-rank was determined by the maxstat package in R software. We used the cutoff point to stratify patients into highand low-risk groups. Then, Kaplan-Meier survival curves of the high-and low-risk groups were generated using the survminer R package.

| SVM model
The SVM model was constructed based on the training dataset via the caret package in R software. Each included feature was ranked according to its predictive ability. To obtain the optimal number of features to build the model, a 5-fold cross-validation approach was applied in the training set, and the variable number was identified according to the crossvalidation accuracy. The established model was then used to predict the samples in the testing dataset.

| CPH model
In the CPH model, univariate and multivariate models were constructed to evaluate factors correlated with survival. Univariate and multivariate Cox regression analyses were conducted to generate hazard ratios (HRs) with confidence intervals (CIs). Then, we calculated the predicted 1-, 2-, and 3-year survival probabilities for each patient using the nomogram that was constructed on the basis of the CPH model.

ROC curve
Receiving operator characteristic (ROC) curves were used to assess the predictive accuracy of the different survival analysis models (CPH, SVM, and RSF models). The area under the ROC curve (AUC) was used as a performance metric for all the models. The ROC curve and AUC were obtained by using the timeROC package in R. The AUC was calculated to compare the discriminatory ability of the above models.

| Construction and evaluation of the nomogram model
The nomogram of each model was constructed via the rms package in R based on the selected features in the RSF, SVM, and CPH models. Calibration plots were constructed and the c-indexes were calculated to evaluate the predictive accuracy of the nomogram model. DCA, which compares benefit versus harm, was used to estimate whether the clinical utility of the prediction models would do more harm than good. The net benefits were quantified to determine the clinical utility of each model.

| Statistics
The Pearson χ 2 test was employed to investigate significant differences between the training set and testing set. DVH features were exported from the treatment planning system and calculated by MATLAB software. All statistical analyses and plotting were conducted using R version 3.6.1. A p value of <0.05 was considered statistically significant. All tests were two sided, and 95% CIs were used.

| Clinical and pathological characteristics of the included patients
A total of 95 patients were assessed as eligible for inclusion in this study by using the patient selection algorithm described in the Methods section ( Figure 1). The clinicopathological characteristics of these 95 patients are shown in Table 1. There were 26.3% patients (N = 25) with Grade III glioma and 73.7% patients (N = 70) with Grade IV glioma. Among them, 63.2% (N = 60) were younger than 60 years, and 36.5% (N = 35) were older than 60 years. Gender was almost evenly distributed, with 51.6% male patients and 48.4% female patients. In terms of the patients' functional abilities, which were evaluated by the KPS in this study, most (N = 84, 88.4%) had good functional status, with KPS scores of more than 60. Twenty-four percent of patients (N = 23) did not receive any adjuvant chemotherapy, 51.6% of patients (N = 49) received less than 6 cycles of adjuvant chemotherapy, and the remaining patients (N = 23, 24.2%) received more than 6 cycles of adjuvant chemotherapy. We also included pathological factors, such as MGMT expression, IDH1 R132H mutation, and Ki-67 expression. Specifically, MGMT expression in 82.1% of patients (N = 78) was negative, and most of the patients (N = 78, 82.1%) did not carry the IDH1 R132H mutation. Moreover, Ki-67 expression in 68.4% of patients (N = 65) was greater than 50%. The median OS was 38.3 months, ranging from 31.6 to 45.0 months.

| Characteristics of DVH features
For patients with HGG, the NCCN guidelines recommended surgery first, followed by radiotherapy and concurrent chemotherapy, and then adjuvant chemotherapy. The contouring and treatment planning are specifically described in the Methods and Materials section. Figure 2 depicts the contoured structures and dose planning. In this study, we extracted 39 DVH features from the initial treatment plans, including GTV, CTV1, CTV2, equivalent spherical diameter, minimal dose, maximal dose, mean dose, modal dose, STD, TCP, EUD, D1-D99, and V50-V65 of CTV2 (Table S1). Previous research illustrated the effects of DVH parameters on the biological outcomes of patients. To explore the association between DVH features and prognosis, we transformed these continuous variables into categorical variables to simplify the following statistical analyses (Table S1).

| Machine learning-based RSF model for predicting prognosis
The random forest method is a machine learning technique used for classification and regression, presenting several advantages over other predictive methods. Therefore, we used random forest to predict the prognostic factors of HGG patients. First, the entire dataset was split into two mutually exclusive datasets, with 60% assigned to the training set (N = 57) and 40% assigned to the testing set (N = 38). There were no statistically significant differences in terms of clinicopathological features or survival outcomes between the two sets (Table 1). Then, we obtained the optimal results with an m try value of 1, a nodesize value of 7, and an ntree value of 2000, which yielded a low OOB error rate of 21.11% in the training set ( Figure 3A and B). Finally, we identified six significant factors strongly associated with survival, whose minimal depth was lower than the threshold of 2.247. Of those, age was of the highest importance, followed by GTV, grade, KPS, IDH, and D99 ( Figure 3C and D).
Based on these variables, we generate our RSF model. Figure 4A and B shows the ROC curves of the training and testing sets at 1, 2, and 3 years . For 1-, 2-, and 3- (Table 2).
Based on the RSF model, we divided the patients into highand low-risk groups ( Figure 4C and D). The low-risk group had a longer OS than the high-risk group in both the training (HR = 9.075, 95% CI [3.603, 22.86], p < 0.0001) and testing sets (HR = 17.4394, 95% CI = 3.738-81.37, p < 0.0001). Overall, this machine-learning-based survival prediction model enables us to identify HGG patients who are at risk of poor outcomes.

| Machine learning-based SVM model for predicting prognosis
In addition to the RSF model, we also used another supervised machine learning-based model, the SVM model, to predict the prognosis of HGG. In the SVM model, a 77% prediction accuracy was achieved using a four-feature combination ( Figure 5A). Specifically, age, grade, GTV, and CTV1 were identified as prognostic predictors ( Figure 5B) Figure 5D) ( Table 2).

| Classical analysis of potential prognostic factors
To evaluate the performance of the machine learning-based models, we also used the classical multivariate CPH model to predict the survival of patients. As shown in Table 3, we found that age (p = 0.011), KPS (p = 0.017), grade (p = 0.017), and D90 (p = 0.049) were significantly correlated with OS. Specifically, older age (>60 years), higher grade (Grade IV), and high D90 dose (>5829.90 cGy) were related to poorer survival, with HRs of 2.902, 12.377, and 2.236, respectively, and a higher KPS score (>60) was related to improved survival  Figure 5B) ( Table 2).

| Assessment of the predictive capabilities of the RSF model
The performance of the model was verified by calibration and discrimination. The calibration plots matched well with the ideal 45-degree line, implying good consistency between the model predictions and actual observations of the 1-, 2-, and 3-year survival probabilities in the training and validation cohorts for the RSF (Figure S1), SVM ( Figure S2) and CPH models ( Figure S3). In terms of discrimination ability, to compare the performance of the three models, we calculated the c-index, which measures the concordance between the predicted risks and the actual survival, applied to both the training and testing sets. As shown in Table 4, the c-indexes of the training and testing sets were 0.824 and 0.847 for the RSF model, 0.792 and 0.823 for the SVM model, and 0.821 and 0.811 for the CPH model, implying the good performance of the RSF model ( Figure 6). Moreover, DCA calculates the net benefit to evaluate whether a model is clinically useful. The results in Figure 7 show that the RSF model offered the best clinical utility, with greater net benefits than the SVM and CPH models.
The results indicated that the RSF model has greater clinical utility than the SVM and CPH models in terms of survival prediction. Overall, the RSF model shows better performance and improvement over the SVM and CPH models.

| DISCUSSION
In this study, we used two machine learning-based models and a classical model to predict OS by analyzing patient clinical parameters and DVH parameters. The RSF model showed the best performance and improvement among the three models, with c-indexes of 0.824 and 0.847 for the training and testing sets, respectively, followed by the SVM model (0.792/0.823) and CPH model (0.821/0.811). RSF is one of the widely used methods of machine learning, and it bypasses the requirement of parametric or semiparametric constraints on the underlying distributions and automatically deals with high-level interactions and higher order terms in variables, to generate much more accurate predictions. 24 SVM is also a learning method for the classification of linear and nonlinear data. However, it was generally proposed for response prediction and was recently extended to survival prediction. 25 In our study, the RSF model also outperformed the SVM model in predicting survival. Additionally, the CPH model was not designed to predict outcomes but to infer the impact of variables on a survival curve, while machine learning is more suitable for making predictions. The predictive accuracy of the CPH model as calculated by the c-index was the lowest among the three models. In the RSF model, the AUCs were 92.4%, 87.7%, and 84.0% for predicting 1-, 2-, and 3-year survival, respectively, in the testing set, suggesting that the model had good predictive ability. Thus, we can divide patients into high-and low-risk groups to predict OS according to this model. These results demonstrate the possibility of identifying HGG patients who are at risk of having a poor outcome. This stratification of risk will help clinicians provide much more individualized interventions and strategies in cancer therapy. The higher accuracy and better performance of the RSF model make it highly valuable for predicting survival in HGG patients. In fact, a few researchers have expended great effort to develop a prognostic prediction model for this highly malignant cancer, proposing a survival prediction model for HGG based on MRI radiomic features combined with genetic and clinical risk factors. [26][27][28][29] Imaging features from fluorodeoxyglucosepositron emission tomography (FDG-PET) have also been used to predict survival in recurrent HGG. 30 This field has attracted increasing attention because of its noninvasive nature and easy access. 31 However, there are challenges in image analyses, including the lack of standardization in imaging acquisition, poor reproducibility, and complex quantitative features. 32 Hence, comparing results across institutions can be challenging, thereby limiting its clinical use. Therefore, we did not include the imaging features of HGG patients in this study. Instead, we used the DVH features combined with clinicopathological variables to construct a prediction model. Few studies make use of the DVH -based treatment plan, which is essential for HGG patients after surgery.
In the RSF model, feature selection identified age, GTV, grade, KPS, IDH, and D99 as the most important predictors of HGG. Among them, age, grade, and KPS are all wellknown prognostic factors and were unsurprisingly selected as significant parameters in the RSF model. 6 Regarding well-studied molecular parameters, such as IDH1 R132H mutation, 1p/19q chromosomal codeletion, and MGMT promoter methylation, genomic sequencing, fluorescence in situ hybridization and methylation-specific PCR are recommended for detection. 7,8 However, these detection methods can be cost-prohibitive and are not covered by basic medical insurance in China. Therefore, our oncology center used immunohistochemistry for IDH1 R132H mutation and MGMT expression. The 1p/19q chromosomal codeletion is not routinely detected, so we did not include it in this study. We identified the IDH1 R132H mutation as a significant variable in the RSF model but not in the SVM and CPH models, which indicates that the RSF model might be more appropriate and accurate to some extent. GMT expression was not identified in the three models. This may be because we used immunohistochemistry for its detection, which is neither accurate nor sensitive.
Furthermore, in the RSF model, GTV and D99 were identified as prognostic predictors. These results support the notion that dosimetric parameters can predict patient prognosis. GTV was defined as the surgical resection cavity plus any residual tumor. A higher GTV might indicate a larger F I G U R E 7 DCA curves for the clinical benefit and the corresponding scope of application of the three models in the training (A) and testing sets (B). The RSF model had greater net benefits than the SVM and CPH models in the testing set preoperative tumor volume. It is difficult to irradiate large tumors with sufficient doses due to the limited toxicity tolerance of adjacent normal tissues. In addition, tumor hypoxia is more pronounced in larger tumors and is associated with treatment failure due to decreased radiosensitivity. D99, also called the near-minimum absorbed dose, presents the dose that covers 99% of the target volume. The near-minimum absorbed doses, including D99 and D98, can be used to evaluate the uniformity of dose distribution 33 and are recommended when reporting treatment plans. 21,34 D99 is more accurate than the minimum absorbed dose, because the minimum absorbed dose is often located in a high-gradient region, making it highly sensitive to the dose calculation resolution and the accuracy of target volume delineation. In addition, D99 was a potential predictive parameter in the tumor control probability model and demonstrated a better correlation with clinical outcome. 35 We did not identify EUD or TCP, which are very important biophysical factors for predicting radiobiological effects and tumor control probability, as significant prognostic predictors in the RSF model. As biological optimization parameters, EUD and TCP are increasingly being applied to treatment plans, but their credibility is still questionable. The model parameters, such as TCD 50 and γ 50, are not constant and can vary according to tumor volume, intrinsic radiosensitivity, tumor heterogeneity, tumor hypoxia, and so on. In addition, the tumor is time varying during treatment, so these parameters might change over time in different regions and to different extents. As a result, it is difficult to predict survival using EUD and TCP models, and the results in this study at least confirmed the limitations and instability of these biophysical models. Tumor heterogeneity and radiobiological factors should be considered in EUD and TCP models in the future.
There are two major strengths of this study. First, there are many reports on the survival analysis of patients with HGG based on clinicopathological factors and data mining of imaging variables. However, only a limited number of studies have investigated the importance of DVH for survival. 36 In this study, clinical and DVH features were combined to construct a prognostic prediction model, and we found that radiation dose information can affect prognosis. Therefore, DVH parameters should be taken into consideration in future prognostic studies of HGG. Second, we compared the performance of different models in the survival prediction of HGG and demonstrated that the RSF model showed a higher performance with HGG patient data than the SVM and CPH models. The RSF model offers potential benefit to patients by stratifying their risk and guiding clinicians in developing much more individualized interventions and strategies. 12 Undoubtedly, some limitations also exist in this study. First, the sample size was relatively small, which led to higher AUCs for 1-and 2-year survival in the testing set than in the training set for the SVM and RSF models. 37 To compare the performance of different models and select the best one, we did not conduct repeated cross-validation, which is a useful alternative in machine learning for small sample size studies. 38 Second, this was a single institute-based study, without multi-institutional validation. Third, longer follow-up is needed, especially for Grade III glioma patients. Finally, molecular parameters detected by standard diagnostic methods and radiomics features should be included in future studies.
In summary, we identified age, GTV, grade, KPS, IDH, and D99 as important variables associated with survival in the RSF model. The machine learning-based RSF model, which integrates both clinical and DVH variables, is an improved and useful tool for predicting the survival of HGG patients. Additional multi-institutional studies with more patients are needed to assess the interactions between clinical and DVH features and prognosis.

ETHICS APPROVAL
This project was approved by the Ethical Committee of the Second Affiliated Hospital of Zhejiang University School of Medicine.

Consent to par ticipate
Informed consent to participate was obtained from all patients.

Consent for publication
Informed consent for publication was obtained from all authors.