Patient-based prediction algorithm of relapse after allo-HSCT for acute leukemia and its usefulness in the decision-making process using a machine learning approach

Abstract Although allogeneic hematopoietic stem cell transplantation (allo-HSCT) is a curative therapy for high-risk acute leukemia (AL), some patients still relapse. Since each patient simultaneously has many prognostic factors, constructing a patient-based prediction algorithm of relapse is difficult. The alternating decision tree (ADTree) is a successful classification method that combines decision trees with the predictive accuracy of boosting. It is a machine learning (ML) method and has the capacity to simultaneously analyze multiple factors. Using ADTree, we attempted to construct a prediction model of leukemia relapse within 1 year of transplantation. For the model built on the training data (n = 148), prediction accuracy, the AUC of ROC, and the κ-statistic value were 78.4%, 0.746, and 0.508, respectively. The false positive rate (FPR) of the relapse prediction was as low as 0.134. In an evaluation of the model with validation data (n = 69), prediction accuracy, AUC, and FPR of the relapse prediction were similar at 71.0%, 0.667, and 0.216, respectively. These results suggest that the model is generalized and highly accurate. Furthermore, the output of ADTree may visualize the branch point of treatment. For example, the selection of donor types resulted in different relapse predictions. Therefore, clinicians may change treatment options by referring to the model, thereby improving outcomes. The present results indicate that ML, such as ADTree, will contribute to the decision-making process in the diversified allo-HSCT field and be useful for preventing the relapse of leukemia.


| INTRODUCTION
Allogeneic hematopoietic stem cell transplantation (allo-HSCT) is an established therapy with a high cure rate for acute leukemia (AL). 1-3 However, many patients still relapse after allo-HSCT, and relapse and leukemia-associated complications are common causes of death. 3,4 Since salvage therapy is limited for these patients, their prognosis is very poor, with a probability of long-term survival of <20%. 5 Thus, strategies to prevent relapse after allo-HSCT are strongly needed.
Several pretransplant factors that may predict relapse after allo-HSCT have been identified, including patient background factors such as age, 6 the Refined Disease Risk Index (rDRI), 7 cytogenetic risk, 8 and the Hematopoietic Cell Transplantation-Comorbidity Index (HCT-CI). 9 Technical components of allo-HSCT, including conditioning regimens, 10,11 the selection of graft sources, 3,12,13 HLA discrepancies, 14,15 and other components, 16,17 are also associated with relapse after allo-HSCT. These prognostic factors have been evaluated with conventional statistical techniques, such as univariate and multivariate analyses, which are model (hypothesis)-driven techniques; they start with a model and assess whether the data fit the suggested model. 18 Although these techniques are popular and widely used in the analysis of medical records, they cannot simultaneously process multiple factors. Therefore, the complex network of multiple factors in each patient makes patient-based prediction of relapse difficult, even though such prediction would be useful in bedside decisions on the indication for, or the protocol of, allo-HSCT.
The application of artificial intelligence (AI) to medicine, particularly machine learning (ML), a type of AI, has recently been attracting increasing attention. Multiple factors may be simultaneously analyzed, and AI may be applied to the examination of complex medical records. Since ML has the capacity to analyze multiple factors, we herein attempted to generate robust and accurate prediction models of relapse after allo-HSCT, which may be a useful tool in the bedside decision-making process to select a transplant method for reducing the relapse of leukemia.

| Patients
This analysis was a retrospective, data mining, and supervised learning study that included 217 AL patients. They underwent their first allo-HSCT for AL at Niigata University Hospital (n = 148) or Nagaoka Red Cross Hospital (n = 69) between 1990 and 2016 and survived for more than 1 month after transplantation. The median follow-up of patients was 28.9 months (range 1.2-223.2 months). The diagnosis and classification of AL were based on the WHO classification criteria. 19,20 Among the 217 patients, 135 had acute myeloid leukemia (AML) and 82 had acute lymphoblastic leukemia (ALL). The median age of patients at allo-HSCT was 38 years (range 16-67 years). To compare the risk of relapse, patients were stratified based on rDRI. 7 (The definitions of rDRI and cytogenetic risk, excerpted from reference No. 7, are shown in Table S1.) According to rDRI, 14 (6.5%), 121 (55.8%), 51 (23.5%), and 31 (14.3%) patients were at low (LOW), intermediate (INT), high (HI), and very high risk (VH), respectively. Donors were related for 97 patients (44.7%) and unrelated for 120 (55.3%). Graft sources were peripheral blood stem cells (PBSC) for 47 patients (21.7%, including PBSCs from 22 haploidentical donors), bone marrow (BM) for 123 (56.7%), and cord blood (CB) for 47 (21.7%). Myeloablative conditioning was used for 169 patients (77.9%) and reduced-intensity conditioning for 48 (22.1%). Among the 22 patients with a haploidentical donor graft, thymoglobulin was used in 14 and post-transplant cyclophosphamide in eight as part of conditioning. None of the patients received T-cell-depleted grafts in the present study. The HCT-CI score was low (0, 1, 2) for 183 (84.3%) patients and high (≥3) for 34 (15.7%) (detailed information is shown in Table 1).
The present study was performed in accordance with the Japanese Ethical Guidelines for Medical and Health Research Involving Humans and approved by the Ethical Committee of our facilities.

| ML and ADTree
The alternating decision tree (ADTree), an ML method, is a successful classification method. ADTree combines decision trees with the predictive accuracy of boosting into a set of interpretable classification rules. Boosting determines the node-weighted score (NW) by repeating the sample classification at each node and calculating errors and classification confidence each time. Moreover, it repeatedly re-weights the training samples to focus on those that are most difficult to classify. 21 (The principle is described in more detail in reference No. 21.)
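The re-weighting idea can be illustrated with a minimal Python sketch. It follows a commonly described formulation of boosting for alternating decision trees (a node value equal to half the log-ratio of positive to negative sample weight, followed by an exponential weight update); it is not the WEKA implementation used in this study. The function names, the smoothing constant, and the six toy samples are all assumptions made for illustration.

```python
import math

def node_prediction_value(w_pos, w_neg, smoothing=1.0):
    """Half the log-ratio of positive to negative weight reaching a branch.

    The smoothing constant (assumed to be 1 here) avoids division by zero.
    """
    return 0.5 * math.log((w_pos + smoothing) / (w_neg + smoothing))

def reweight(weights, labels, scores):
    """Multiply each weight by exp(-y * r), so misclassified samples gain weight."""
    return [w * math.exp(-y * r) for w, y, r in zip(weights, labels, scores)]

# Toy data (hypothetical): +1 = relapse, -1 = no relapse.
labels = [+1, +1, -1, -1, -1, +1]
# Which samples satisfy one hypothetical binary split.
in_branch = [True, True, True, False, False, False]
weights = [1.0] * len(labels)

def branch_weight(flag, sign):
    """Total weight of samples with the given label on the given side of the split."""
    return sum(w for w, y, b in zip(weights, labels, in_branch)
               if b == flag and y == sign)

# Prediction values for the "yes" and "no" branches of the split.
a = node_prediction_value(branch_weight(True, +1), branch_weight(True, -1))
b = node_prediction_value(branch_weight(False, +1), branch_weight(False, -1))

# Each sample receives the value of the branch it falls into.
scores = [a if flag else b for flag in in_branch]
new_weights = reweight(weights, labels, scores)

print(f"a = {a:.3f}, b = {b:.3f}")
print([round(w, 3) for w in new_weights])
```

In this toy run, samples that the split scores in the wrong direction end up with weights above 1, so the next boosting round concentrates on them.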
Since ADTree learns from previous data and predicts future classifications or discriminations, we used this algorithm as the ML method in the present study. ADTree was performed using WEKA software (Ver. 3.9.1, Machine Learning Group at the University of Waikato, New Zealand, https://www.cs.waikato.ac.nz/ml/weka/index.html). The algorithm model was trained and tested using 10-fold cross-validation on the training data set (Niigata group) and validated again on the validation data set (Nagaoka group). The model was evaluated by the prediction accuracy and the area under the curve (AUC) of the receiver operating characteristic (ROC) analysis, which plots the true positive rate against the false positive rate (FPR, equal to 1 − specificity). Trees with between 6 and 11 nodes were analyzed, and we adopted the number of nodes showing the highest κ-statistic value (Table S2).
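For readers less familiar with these measures, the following Python sketch shows one way to compute the AUC and the true/false positive rates from classifier scores. It uses the rank interpretation of the AUC (the probability that a randomly chosen relapse case scores higher than a randomly chosen non-relapse case). The function names and the seven toy scores are hypothetical and are not study data; WEKA computes these quantities internally.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rates_at_threshold(labels, scores, threshold=0.0):
    """Return (TPR, FPR) when scores above the threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s > threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s > threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return tp / n_pos, fp / n_neg

# Toy example (hypothetical): 1 = relapse within 1 year, 0 = no relapse.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.8, 0.3, -0.2, 0.4, -0.1, -0.5, -0.9]

auc = roc_auc(labels, scores)
tpr, fpr = rates_at_threshold(labels, scores, threshold=0.0)
specificity = 1 - fpr  # FPR and specificity are complements
print(auc, tpr, fpr, specificity)
```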

| Other statistical analyses
Group comparisons for continuous or categorical variables were evaluated using the Mann-Whitney U test or Fisher's exact test, respectively. All times to events were computed from the date of transplantation. Overall survival (OS) was analyzed as the time until death or loss to follow-up with the Kaplan-Meier estimator. The cumulative incidence of relapse (CIR) was also evaluated. In the case of no remission after allo-HSCT, the time of relapse was defined as the time when blasts were detected in peripheral blood or BM. The cut-off value for age used in adjustments in the multivariate analysis was set to 40 years. 6 Univariate and multivariate analyses for CIR were performed using Fine and Gray models. Apart from ADTree, statistical analyses were performed using R statistical software version 3.4.3 (The R Foundation for Statistical Computing) and EZR (Saitama Medical Center, Jichi Medical University), which is a graphical user interface for R. 22 Differences were considered significant at P < 0.05 with a two-sided test.
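As an illustration of the Kaplan-Meier estimator mentioned above, a minimal Python sketch is given below. The study itself used R and EZR; this stand-alone function is only a teaching aid. It assumes that patients lost to follow-up are treated as censored observations, and the six follow-up times are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  : follow-up time for each patient (e.g., months from transplantation)
    events : True if the patient died at that time, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            # Survival drops by the fraction of at-risk patients who died at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        at_risk -= removed
    return curve

# Hypothetical follow-up data (months); False marks a censored patient.
times = [3, 5, 5, 8, 12, 12]
events = [True, False, True, True, False, False]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 4))
```

Censored patients do not lower the curve themselves, but they shrink the risk set used at later event times.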

| The model constructed with ADTree was generalized and highly accurate
Since most cases of relapse occurred within one year in this cohort, the presence or absence of relapse within one year of allo-HSCT was set as the learning target in ADTree to construct a prediction model. We selected seven factors for learning: age, diagnosis, rDRI, donor type, graft, the use of TBI, and the conditioning regimen, all of which are common prognostic factors that are known prior to transplantation. GVHD and the progression of chimerism, which occur after transplantation, were intentionally excluded from the analysis. The graphical output of ADTree from the training set (n = 148) is shown in Figure 2. The prediction accuracy, AUC of ROC, and κ-statistic value of this model were 78.4%, 0.746, and 0.508, respectively. Thirteen out of 97 patients who remained in remission in the first year were predicted to relapse (detailed results are shown in Table 4); therefore, the FPR of the relapse prediction was as low as 0.134. In an evaluation of the model with the validation set (n = 69), the prediction accuracy, AUC of ROC, and FPR of the relapse prediction were similar at 71.0%, 0.667, and 0.216, respectively. These results suggest that the model constructed with ADTree was generalized and highly accurate (Table 5, Table S2).
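The relationship among accuracy, FPR, and the κ-statistic can be checked with a short Python sketch. Only three of the inputs are taken from the text: n = 148, 97 patients in remission, and 13 false positives. The true-positive and false-negative counts (32 and 19) are an assumption, back-calculated from the reported accuracy of 78.4% (about 116 correct predictions of 148); they are not taken from Table 4. The function name is hypothetical.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Accuracy, TPR, FPR, and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    tpr = tp / (tp + fn)   # sensitivity
    fpr = fp / (fp + tn)   # 1 - specificity
    # Agreement expected by chance, from the row and column marginals.
    p_relapse = ((tp + fn) / n) * ((tp + fp) / n)
    p_no_relapse = ((fp + tn) / n) * ((fn + tn) / n)
    p_chance = p_relapse + p_no_relapse
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {"accuracy": accuracy, "tpr": tpr, "fpr": fpr, "kappa": kappa}

# fp = 13 and tn = 97 - 13 = 84 follow from the text.
# tp = 32 and fn = 19 are ASSUMED (back-calculated from 78.4% accuracy).
metrics = confusion_metrics(tp=32, fn=19, fp=13, tn=84)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts, the computed accuracy, FPR, and κ agree with the reported 78.4%, 0.134, and 0.508, which suggests that the three figures are mutually consistent.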

| The branch point of therapeutic options by referring to the model
Each score beside a node shows a prediction node weight (NW); NW < 0 means a lower relapse risk and NW > 0 a higher relapse risk. For example, at the donor type node (node 4), an unrelated donor showed a lower risk (NW, −0.249) than a related donor (NW, 0.313), and this result was consistent with that of the univariate analysis on CIR. The final judgment of the AL relapse prediction was made by summing the NWs of all the nodes through which a patient passed (NW sum). An NW sum > 0 predicted relapse and an NW sum < 0 predicted no relapse in this model. According to the model, if ALL patients with rDRI HI receive allo-HSCT using the RIC regimen and a related donor, the NW sum is −0.742 < 0, which predicts no relapse. In contrast, if the diagnosis is AML, the NW sum is 0.147 > 0, which predicts relapse within 1 year of allo-HSCT. However, in the case of an unrelated donor, the relapse prediction changes; the NW sum becomes −1.475 (age ≤ 40 years) or −0.469 (age > 40 years), both < 0, indicating no relapse (Figure 3).
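The NW-sum rule can be sketched in a few lines of Python. The sketch below contains only the two nodes whose weights are quoted in the text (the rDRI node: INT −0.565, otherwise 0.447; the donor type node: unrelated −0.249, related 0.313). The flat list of nodes, the root value of 0, and all function and field names are simplifying assumptions; the remaining nodes of Figure 2 are omitted, so the sums below do not reproduce the NW sums reported for Figure 3.

```python
# Partial, illustrative node list: weights are those quoted in the text;
# the structure (flat list, no nesting, root value 0) is an assumption.
NODES = [
    {"name": "rDRI is INT",
     "test": lambda p: p["rdri"] == "INT", "yes": -0.565, "no": 0.447},
    {"name": "donor is unrelated",
     "test": lambda p: p["donor"] == "unrelated", "yes": -0.249, "no": 0.313},
]

def nw_sum(patient, nodes=NODES, root=0.0):
    """Sum the node weights along the branches that the patient satisfies."""
    total = root
    for node in nodes:
        total += node["yes"] if node["test"](patient) else node["no"]
    return total

def predicts_relapse(patient, nodes=NODES):
    """NW sum > 0 predicts relapse; NW sum < 0 predicts no relapse."""
    return nw_sum(patient, nodes) > 0

# Two hypothetical patients who differ only in donor type.
hi_related = {"rdri": "HI", "donor": "related"}
hi_unrelated = {"rdri": "HI", "donor": "unrelated"}

shift = nw_sum(hi_unrelated) - nw_sum(hi_related)
print(round(nw_sum(hi_related), 3), round(nw_sum(hi_unrelated), 3), round(shift, 3))
```

In this partial tree, switching from a related to an unrelated donor lowers the NW sum by 0.562 at the donor node alone. In the full model, further nodes (such as age) also contribute, which is why the reported sums differ by age group.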

| DISCUSSION
Historically, AI and ML were initially developed for image and voice recognition and were subsequently applied to the analysis of large data sets, such as purchase records. 18 AI and ML are now expected to handle and analyze complex medical records. When clinical study reports were searched on PubMed using the keywords "machine learning", "diagnosis", and "prognosis", 18 reports were retrieved for 2000 and 185 for 2010. Between 2015 and 2018, approximately 1000 reports were retrieved for each year.
Some groups in the hematology field have also attempted to use AI and ML. Shouval et al analyzed the data of approximately 20 000 patients in the European Society for Blood and Marrow Transplantation with ADTree and succeeded in constructing a prediction model of early NRM after allo-HSCT. 23 They also evaluated the same data set with six other ML programs for comparison. All programs showed high predictability and versatility. 24 AI and ML have also been applied in the following fields: the morphological analysis of blood cells, 25 the identification of prognostic factors of ALL in childhood, 26 and the differential diagnosis of hematological diseases. 27
High CIR rates after allo-HSCT represent a clinical issue that needs to be resolved in adverse-risk AL. 3-5 To improve outcomes, attempts are being made to develop strategies that reduce the risk of relapse. Many technical options are now available for allo-HSCT. 3,10-17 Furthermore, with the establishment of a safer method for elderly patients, 28 the number of patients indicated for allo-HSCT has increased. Since the technique of allo-HSCT is very diversified and complex, some clinicians may have difficulties selecting treatment options that improve the outcomes of each patient.

The concept of this analysis is "Changing therapy options to avoid leukemia relapse according to model predictions". In the analysis prior to transplantation, we focused on which factors are modifiable (such as conditioning or the graft source) and which are fixed (including age, disease status, or diagnosis). Furthermore, a unique point in our analysis was the visualization of the branch point of treatment using ADTree (Figure 2). Although previous studies using AI or ML mainly aimed at discrimination and diagnosis, as described above, we herein attempted to construct patient-based treatment algorithms by applying AI. In high-risk AML, the branch point of therapeutic options from our simulation was the donor type (Figure 3). By referring to the prediction results of ADTree, clinicians may change treatment options, thereby improving outcomes. In the present results (Figure 2, at node 1), INT showed an NW of −0.565, whereas LOW together with HI and VH showed an NW of 0.447, indicating that LOW carried a higher relapse risk than INT, which was unexpected. LOW patients generally do not need to receive allo-HSCT at first remission. LOW patients who received allo-HSCT had failed first-line therapy and may have had a worse status than other patients; therefore, we speculated that ADTree judged LOW as carrying a higher relapse risk than INT. This result suggests that ADTree provides us with information and interactions that differ from existing knowledge.
Medical records contain very diverse information. Patients have different backgrounds that are generally not ideal for statistical analyses. Classical statistical techniques require "noise" to be removed from data when medical records are analyzed. Model (hypothesis)-driven statistical techniques have identified many prognostic factors, but have been unable to adjust for the individual factors of each patient, which are sometimes considered to be "noise". 18 These disadvantages have led to difficulties in the construction of "individualized transplantation therapy" for patients in clinical settings. One of the differences between the conventional method and ML is that the former focuses on proving "whether the hypothesis is true", whereas ML, such as ADTree, "attempts to explain previous data and predict the future". This difference is expected to be an advantage when using ML for the analysis of medical records. One limitation of the present study is that the volume of patient data for ADTree to learn from was relatively small. The larger the volume of training data, the greater the prediction accuracy; thus, with more data, it may be possible to construct a model that is more useful for the bedside decision-making process by clinicians.
FIGURE 2 Relapse prediction model; graphical output. Each score beside a node shows a prediction node weight (NW); NW < 0 means a lower relapse risk and NW > 0 a higher relapse risk. The final judgment of the AL relapse prediction was achieved by summing the NWs of all the nodes through which a patient passed (NW sum). An NW sum > 0 predicted relapse and an NW sum < 0 predicted no relapse in this model.
In this prediction model, the true-positive rate (TPR, also called the sensitivity) and the false-negative rate (FNR, the miss rate) for relapse were not satisfactory. However, the true-negative rate (TNR, also called the specificity) was high and the false-positive rate (FPR, the probability of a false alarm) for relapse was very low.
Furthermore, other factors, such as information on chromosomal or genetic abnormalities, the chemotherapy protocol, the HLA discrepancy rate, and posttransplant maintenance therapy (donor lymphocyte infusion [DLI], azacytidine, etc.), need to be incorporated to further develop the ADTree model. Planned DLI and targeted posttransplant therapy may be effective at preventing relapse, but this study did not include patients who received these therapies, so we could not evaluate their effects. The addition of social environments and educational history, which are complex factors, may also be beneficial. 29 The outcomes of allo-HSCT may vary among transplant centers. The present results suggest that ADTree is currently applicable to bedside decision-making in single institutions.
In conclusion, we attempted to generate robust and accurate prediction models of relapse after allo-HSCT that will contribute to preventing the relapse of leukemia. AI and ML, such as ADTree, may improve the decision-making process for therapy in the diversified allo-HSCT field. The usefulness of AI and ML is now being demonstrated, and further clinical applications are expected in the future.