Gene expression prognostic of early relapse risk in low‐risk B‐cell acute lymphoblastic leukaemia in children

Abstract ETV6::RUNX1 is the most common fusion gene in childhood acute lymphoblastic leukaemia (ALL) and is associated with favorable outcomes, especially in low‐risk children. However, as many as 10% of children relapse within 3 years, and such early relapses have poor survival. Identifying children at risk for early relapse is an important challenge. We interrogated data from 87 children with low‐risk ETV6::RUNX1‐positive B‐cell ALL and with available preserved bone marrow samples (discovery cohort). We profiled somatic point mutations in a panel of 559 genes and genome‐wide transcriptome and single‐nucleotide variants. We found high TIMD4 expression (> 85th‐percentile value) at diagnosis was the most important independent prognostic factor of early relapse (hazard ratio [HR] = 5.07 [1.76, 14.62]; p = 0.03). In an independent validation cohort of low‐risk ETV6::RUNX1‐positive B‐cell ALL (N = 68) high TIMD4 expression at diagnosis had an HR = 4.78 [1.07, 21.36] (p = 0.04) for early relapse. In another validation cohort including 78 children with low‐risk ETV6::RUNX1‐negative B‐cell ALL, high TIMD4 expression at diagnosis had an HR = 3.93 [1.31, 11.79] (p = 0.01). Our results suggest high TIMD4 expression at diagnosis in low‐risk B‐cell ALL in children might be associated with high risk for early relapse.


INTRODUCTION
In childhood acute lymphoblastic leukaemia (ALL), 'early relapse' is often defined as '< 3 years after diagnosis' and is more challenging to salvage compared to late relapse [1,2].After re-induction, these children may receive a haematopoietic cell transplant, but survival remains poor [2].Early and late relapses have different gene expression patterns and are two distinct types of disease progression [3,4].
We identified high TIMD4 expression at diagnosis as a prognostic factor for early relapse risk and validated our finding in an independent cohort of 68 children with low-risk ETV6::RUNX1-positive B-cell ALL and another independent cohort of 78 children with low-risk ETV6::RUNX1-negative B-cell ALL.A note on the inclusion of initial WBC count in risk-stratification criteria: The National Cancer Institute (NCI) standard risk is defined as age between 1 and 10 with WBC < 50 × 10E+9/L [23].WBC count, however, is reportedly not a prognostic factor in ETV6::RUNX1positive childhood ALL when treated on contemporary, more modern protocols [8].In contrast, WBC count is prognostic in subjects with hyper-diploidy [24].Accordingly, initial WBC count was not included as a criterion for low-risk in ETV6::RUNX1-positive childhood ALL in Total Therapy 16 and CCCG-ALL-2015 protocols [25,26].

Discovery cohort
This study was approved by the Academic Committee (IIT-NI2020001) and Ethics Review Committee (NI2020001-EC-

Detection of somatic point mutations
We did next-generation sequencing of 559 genes to identify somatic point mutations (Table S1).
Genomic DNA was extracted from bone marrow mononuclear cells (A) (B)

Transcriptome sequencing and gene-expression analyses
Bone marrow-derived mononuclear cells collected at diagnosis were lysed in TRIzol reagent (Invitrogen) and total RNA was extracted using the RNeasy MiniElute Cleanup Kit (QIAGEN, Hilden, Germany) according to the manufacturer's instructions.The RNA-sequencing library was constructed using the NEBNext Ultra RNA Library Prep Kit for Illumina (New England Biolabs).Sequencing was performed on Illumina NovaSeq 6000 generating 150 bp paired-end reads, which were then aligned to the reference human genome hg19 using STAR (v2.6.1d)[32].For each sample, the results of RNA sequencing were deemed usable only if at least half of the reads were mapped to opening reading frames.The expression level of a gene was calculated as 'fragments per kilobase of transcript per million fragments mapped' (FPKM).Fusion transcripts were detected by STAR-Fusion [33] after STAR analysis and then filtered by excluding: (1) mappings to repetitive regions; (2) fusions of non-protein-coding genes; (3) mappings outside of reading frames; (4) junctions not at exon-intron boundaries; (5) fusions between homologs; (6) < 2 pairs of breakpoint-spanning reads or < 2 reads covering the breakpoint; and (7) frame-shifts.

Genome-wide SNV analyses
Genomic DNA from bone marrow samples was genotyped using the Infinium Human Asian Screening Array-24 v1.0 (Illumina) gene chip, which covers 750,000 genomic loci.Genotyping read-outs were analyzed by GenomeStudio software (Illumina).All samples passed the quality control (call rate > 95%).SNV calls were excluded if Illumina GenTrain quality score was < 0.8 or minor allele frequency was ≤ 5%.
We focused on SNVs located in autosomes.A final set of 335,206 SNVs were identified.Genotypes (AA, AB and BB) of individual SNVs were first converted to numbers (0, 1 and 2) and then normalized and analyzed using principal component analysis as described [34].

Validation cohorts
To

Determination of cut-off value for TIMD4 expression level
The lowest cut-off percentile value for classifying TIMD4 expression into a binary (high vs. low) with p < 0.05 were identified using Discovery Cohort.This ideal cut-off value was then applied to the validation cohorts.

Statistics
Statistical analyses were done in the R computing language.Single-and multi-variable relapse risk analyses used the Cox proportional hazards model, treating non-relapse mortality as right-censoring.Multiple testing was adjusted by the Bonferroni method [39].The false discovery rate (FDR) was set to be < 0.1 when conducting initial screenings on point mutations in the sequenced gene panel and genome-wide gene expression levels.Otherwise, (adjusted) p < 0.05 was considered statistically significant.To exclude the possibility our conclusion was a false positive occurring purely by chance, we also performed permutation analyses on Discovery Cohort by scrambling gene expression levels of the children for whom RNA-sequencing data were available.
Concordance ('C-statistic') between raw TIMD4 expression value (measured in FPKM) and early relapse time was calculated using the 'coxph' function in R.

RESULTS
From July 2015 to September 2020, there were 208 children (age ≤10) with low-risk ETV6::RUNX1-positive ALL treated on CCCG-ALL-2015 at the IHCAMS.One hundred and ten subjects who had available preserved bone marrow samples were used as Discovery Cohort (Figure 1A).Discovery Cohort's subject co-variates are displayed in Table 1.There was no significant difference between Discovery Cohort and the other ETV6::RUNX1-positive ALL cases that had no preserved bone marrow samples (Table S2).Note, however, that early-relapse risk was 9% in subjects with available bone marrow samples and 4% in subjects without bone marrow samples (p = 0.16; Table S2).This discrepancy could be caused by attendant biases when physicians decided whose bone marrow samples to preserve.Three-and five-year survival rates of Discovery Cohort were 98% and 96%, respectively (Table 1).Children with early relapses had significantly worse survival than the other children (p < 0.0001; Figure 1B).Unlike early relapses, most of the late relapses were salvageable as indicated by their 5-year survival rate of 100% (Figure 1B).controlling the FDR at < 0.1 (Figure 2A).In 12 children with WHSC1 point mutation, the mutation was E1099K (Figure 2C).

Neither of the first two principal components of whole-genome
SNVs was correlated with early relapse (p > 0.05).
Twenty-three children's RNA-sequencing results did not meet the criteria for quality control (Methods; Figure 1A).We were able to collect RNA-sequencing data on 87 children, seven of whom had early  3B).Out of the 12 candidate genes, only TIMD4's expression level was independently associated with early relapse (HR = 5.07 [1.76, 14.62] per FPKM; p = 0.0026; adjusted p = 0.03; Figure 3C).
To assess the possibility that TIMD4 was a chance false-positive hit, we conducted permutation analyses on Discovery Cohort.Based on 1 × 10E+5 trials of in silico experiments in which we scrambled TIMD4 expression level values of the 87 children for whom RNA-sequencing data were available (Figure 3D), we found FDR to be 1.4 × 10E-4; that is, only 14 out of 1 × 10E+5 trials of permutation experiments falsely identified TIMD4 expression as associated with early relapse.
In Discovery Cohort, the C-statistic between raw, quantitative TIMD4 expression level at diagnosis and early relapse was 0.68 (standard error [se] = 0.10).85th-percentile value was the lowest cut-off percentile value for classifying TIMD4 expression into a binary (high vs. low) with p < 0.05 (Figure 3E).Children with high (top 15%) TIMD4 expression at diagnosis had an HR = 4.87 [1.09, 21.83] (p = 0.04) for early relapse compared to the other children (Figure 3F).
We also conducted a discordance analysis between MRD-testing and TIMD4 expression (Table 3).Three early relapses in Discovery Figure 4A; Table S3).Subject co-variates at diagnosis in Validation Cohort #1 are displayed in Table 1.There were seven early relapses.

DISCUSSION
Previous studies reported associations between ETV6, CDKN1B, CDKN2A/B, CCNC, BCL2L14 and NR3C1 deletions and relapse [11,12,15,17,18].They, however, focused on identifying molecular signatures of relapse after relapse has already been detected.In contrast, we wanted to identify high-risk molecular signatures that are already present at diagnosis.Leukaemia clones in early relapse usually originate from sub-clones that are already present at diagnosis, with new mutations being accumulated after treatment [20,40,41].The presence of these sub-clones at diagnosis indicates the underlying cause of early relapse is a very early event that is possibly amenable to early detection, perhaps as early as at diagnosis.Assessment of early relapse risk at the time of disease presentation might also guide treatment decisions for preventing early relapse [42,43].
Multi-omics analyses could discover clinically relevant prognostic factors that might escape single-omics analysis [44][45][46].In the present study, we conducted a multi-omics analysis of 87 children with low-risk ETV6::RUNX1-positive ALL from China.We identified TIMD4 overexpression at diagnosis as a prognostic factor for early relapse risk and verified its validity using 153 low-risk subjects (68 children with ETV6::RUNX1 and 85 without ETV6::RUNX1) from the US.

TIMD4 (T-cell immunoglobulin and mucin domain-containing molecule
4) is a cell-surface glycoprotein in the TIM family of genes [47].In normal subjects, TIMD4 is exclusively expressed in antigen-presenting cells such as macrophages and is an important mediator of immune tolerance [47].Nevertheless, in diseased states, TIMD4 expression might not be limited to antigen-presenting cells.In fact, abnormal expression of TIMD4 has been reported in diverse cancer cells including diffuse large B-cell lymphoma (DLBCL), non-small-cell lung cancer, glioma, renal cell carcinoma and histiocytic sarcoma and usually associated with adverse prognosis [48][49][50][51][52][53].Over-expression of TIMD4 in DLBCL cancer cells, lung cancer cells and ovarian cancerassociated macrophages could promote cancer cell growth through mechanisms such as the Wnt/beta-catenin pathway and the oxidative phosphorylation pathway [50,[52][53][54].TIMD4 expression in tumourassociated myeloid cells such as macrophages and dendritic cells promotes autophagy of dying cancer cells and leads to attenuated antigen presentation and increased immune tolerance [55].Recently, TIMD4 blockade has been proposed as a potential strategy to enhance the efficacy of CD8 + T cell-based immunotherapies in cancers such as melanoma and lung cancer [56,57].Nonetheless, the molecular mechanisms underlying TIMD4 in leukaemias remain largely unknown.
Although in each sample of Discovery Cohort at least half of the mononuclear cells were blasts (presumably leukaemia cells), we cannot be certain whether TIMD4 over-expression occurred in the leukaemia cells or tumour-associated myeloid cells.
Gene expression profiling is becoming ever more important in the diagnosis and prognosis of leukaemia [58][59][60][61][62].Our results suggest testing for high TIMD4 expression in children with low-risk ETV6::RUNX1positive B-cell ALL, and plausibly also in low-risk ETV6::RUNX1negative cases, identifies those at high risk for early relapse.Changing therapy in these children might decrease early relapse risk if such a change is proven effective.
Our study has limitations.First, the number of early-relapse cases in the discovery cohort was small, and this increased the risk of false discovery in multiple-variable regression analyses.Second, the relapse rate of ETV6::RUNX1-positive subjects in our study was higher than the numbers reported in DCOG ALL10 and NOPHO ALL2008, and this discrepancy in CIR might raise concerns regarding the generalisability of our results [63,64].As we described above, one plausible explanation for our discovery cohort's apparently higher CIR is that physicians selectively preserved bone marrow samples of subjects they deemed higher-risk, but this cannot be verified.Third, although we did perform cross-validation of our findings, we could only confirm our results in validation-cohort subjects whose bone marrow samples had been sequenced, but the physicians' decision to sequence these patients might have attendant biases.Fourth, more validation effort is needed on additional subjects treated on a broader set of protocols (beyond CCCG-ALL-2015, COG AALL0232, COG AALL0331 and Total Therapy 16); until then, we cannot conclude TIMD4 expression level is a useful biomarker for prognosis in low-risk childhood ALL.
Fifth, our gene expression analyses were done on bulk populations of mononuclear cells isolated from bone marrow samples, and we do not know whether TIMD4 over-expression occurred in the leukaemia cells or other cells in the bone marrow.Finally, we did not conduct functional studies on the TIMD4 gene, and its role in early relapse remains unclear.
Mononuclear cells were isolated by density-gradient centrifugation using Ficoll-Paque media (TBDscience), viably frozen in fetal bovine serum (Gibco) with 10% dimethyl sulfoxide (Sigma-Aldrich) and stored at -196 • C. Bulk sequencing was conducted on the isolated mononuclear cells (more details are provided below).
using the Universal Genomic DNA Kit (CoWin Biosciences) according to the manufacturer's instructions and quantified by agarose gel electrophoresis and Qubit 2.0 Fluorometer (Life Technologies).Genomic DNA was ≥ 0.1 μg per sample.DNA was fragmented by sonication to a size range of 180-280 bp.Fragmented DNA was end-polished, A-tailed and ligated with adapters to generate DNA libraries using NEB Next Ultra DNA Library Prep Kit (New England Biolabs).Exon-containing fragments were captured, proliferated and purified using Agilent Sure-Select XT Custom Kit (Agilent Technologies) and then quantified with Agilent Bioanalyzer 2100 (Agilent Technologies).Paired-end sequencing was performed on Illumina NovaSeq 6000 (Illumina).The average sequencing depth was 832× (range 510-1604).

F I G U R E 2
Spectra of point mutations in Discovery Cohort.(A) Recurrent point mutations (mutation rate ≥ 2%).(B) Co-occurrence and mutual exclusion of point mutations.p-values were based on pair-wise Fisher's exact test.(C) WHSC1 point mutations.

Cohort were preceded by high TIMD4 expression at diagnosis without TA B L E 3
Discordance analysis in Discovery Cohort: TIMD4 expression at diagnosis vs. measurable residual disease (MRD) on day 46 (during early consolidation).cells indicate the number of early relapses.a positive MRD test on day 46, while three were preceded by a positive MRD test on day 46 without high TIMD4 expression at diagnosis.One escaped early detection despite both MRD-testing during early consolidation and measurement of TIMD4 expression level at diagnosis.Therefore, in several children with early relapses, the results of TIMD4expression-testing were more accurate compared with MRD-testing in prognosing relapse.Because there were potential ascertainment biases in Discovery Cohort, it is crucial to validate our finding in an independent cohort of children with low-risk ETV6::RUNX1-positive B-cell ALL.We interrogated a cohort of 68 children from the US (Validation Cohort #1;
Clinical characteristics of children in the discovery and validation cohorts.
[35]subjects in Discovery Cohort had MRD-test data 19 and 46 days after therapy started[35].MRD was quantified by TA B L E 1 validate the effect of TIMD4 expression level on early-relapse risk, [37,38] on COG AALL0232 had WBC > 50 × 10E+9/L (median = 78.6 × 10E+9/L; range, 51.6-117.6 × 10E+9/L), a factor that is reportedly not prognostic in ETV6::RUNX1-positive childhood ALL treated on modern protocols[8].The other 54 subjects met both the age and WBC criteria for NCI standard risk[23].Forty-five (66%) subjects had MRD-test data on day 46, of which eight (18%) had positive MRD.The estimated positive-MRD rates on day 46 were statistically indistinguishable between Discovery Cohort and Validation Cohort #1 (p = 0.82).To further investigate if TIMD4 expression has similar effect on relapse risk when there is no ETV6::RUNX1 fusion gene, we used a cohort of 78 children with low-risk (based on subject characteristics at diagnosis according to CCCG-ALL-2015 criteria) ETV6::RUNX1negative B-cell ALL ('Validation Cohort #2′; Table1), of which 26 were treated on COG AALL0331 and 52 on Total Therapy 16[37,38].All the subjects in Validation Cohort #2 were low-risk at diagnosis according to the CCCG-ALL-2015 or Total Therapy 16 criteria, and all were NCI standard-risk.Fifty-one (65%) subjects had MRD-test data on day 46, of which five (10%) had positive MRD.The estimated positive-MRD rates on day 46 were statistically indistinguishable between Discovery Cohort and Validation Cohort #2 (p = 0.13).
Additional in-frame fusion genes detected by RNA-sequencing in Discovery Cohort.