Immature acute leukaemias: lessons from the haematopoietic roadmap

It is essential to relate the biology of acute leukaemia to normal blood cell development. In this review, we discuss how modern models of haematopoiesis might inform approaches to diagnosis and management of immature leukaemias, with a specific focus on T‐lymphoid and myeloid cases. In particular, we consider whether next‐generation analytical tools could provide new perspectives that could improve our understanding of immature blood cancer biology.


Acute leukaemia: a haematopoietic perspective
As acute leukaemia (AL) is caused by malignant proliferation of blood cells arrested at an immature stage of development, it is essential to relate leukaemia biology to normal haematopoietic processes. As well as providing a theoretical framework to understand oncogenesis, comparison with normal differentiation can provide rational avenues for tackling the pathological differentiation block. AL classification has traditionally been based on historical haematopoiesis models that included an early separation into myeloid and lymphoid lineages. This means that ALs are broadly categorised as either acute myeloid leukaemia (AML) or acute lymphoblastic leukaemia (ALL), with the latter being further subdivided by B- and T-lymphoid origin. In rare cases, ALs cannot be allocated to either lineage, so are classed as acute leukaemias of ambiguous lineage (ALAL). Modern haematopoietic models (Fig. 1) no longer include this early myeloid/lymphoid dichotomy, due to strong evidence that progenitors retain the potential to differentiate into several lineages much later than was initially realised. For example, various lymphoid cell types have been shown to have myeloid potential at even relatively advanced stages of differentiation in the bone marrow [1][2][3][4][5][6], and even the thymus [7,8].
Fig. 1 (legend fragment): [...] [130,131] whereas LT-HSCs contribute to differentiation upon transplantation or other severe stress [132]. CMP, common myeloid progenitor; ETP, early thymic progenitor; GMP, granulocyte-monocyte progenitor [...]
Given that the haematopoietic roadmap has evolved, do we need to update the leukaemia cartography too? In this review, we will first discuss the limitations of current diagnostic practices in the light of contemporary knowledge of blood cell development. We will next examine how mutational genotype could earmark AL subsets that develop from similar immature progenitors. Finally, we will consider whether new tools to analyse normal haematopoiesis could provide different perspectives on leukaemia ontogeny.

Current practical challenges
Although classification now increasingly incorporates analysis of specific molecular alterations in leukaemic blasts, AL categorisation in clinical practice is still effectively determined by multiparameter flow cytometry (MFC) assessment of immunophenotypic resemblance to normal haematopoietic precursors [9]. Almost all ALs are classed as either myeloid, B-lymphoid or T-lymphoid, based on analysis of expression of cytoplasmic or surface antigens that are associated with each lineage.
Two main sets of diagnostic reference criteria have been described. First, the European Group for Immunological Classification of Leukaemias (EGIL) proposed a scoring system based on a broad panel of markers to define myeloid, B- or T-lymphoid lineage (Table 1) [10]. This system is weighted by the lineage fidelity of each antigen, with fewer points attributed to markers that are known to be commonly aberrantly expressed (e.g. CD7). MFC positivity is defined by a single threshold for the proportion of blasts that stain positively in comparison with an isotype control (≥ 20% for surface antigens and ≥ 10% for cytoplasmic markers), meaning that the intensity of antigen expression can vary markedly between ALs that are scored as positive. In contrast to the relatively broad range of EGIL-defined antigens, World Health Organization (WHO) guidelines emphasise a small number of key lineage-defining markers, in particular MPO (myeloid), CD19 (B-lymphoid) and CD3 (T-lymphoid), with no recommended positivity threshold (Table 2) [9,11,12].
These classification systems provide a theoretically simple and practical workflow to diagnose most ALs with relative certainty. However, it might be expected that leukaemias arising from transformation of non-lineage-restricted haematopoietic precursors (Fig. 1) could have patterns of antigen expression that are not so easy to pigeonhole. In practice, a small but not insignificant minority (≤ 5%) of ALs cannot be clearly categorised as either myeloid or lymphoid leukaemia. These cases pose a clinical conundrum, as therapies for AML and ALL are very different. There is little consensus on how best to treat these cases, and patient outlook is typically poor [13][14][15].
The EGIL defines biphenotypic acute leukaemia (BAL) as a leukaemia in which a single blast population scores higher than 2 points for more than one lineage in its scoring system, and bilineal leukaemia as one in which markers are expressed on distinct blast populations [13]. In contrast, ALs that do not meet lineage categorisation criteria are termed acute leukaemia of ambiguous lineage (ALAL) by the WHO, with no distinction for biphenotypic or bilineal expression patterns [9,12,16]. The ALAL subgroup comprises acute undifferentiated leukaemias (AUL) that lack expression of any lineage-specific antigens, and mixed-phenotype acute leukaemias (MPAL) that express specific markers for more than one cellular lineage. Although AUL have no lineage-specific markers, they commonly express CD34, HLA-DR, CD38 and/or TdT and may therefore be difficult to distinguish from the diagnostic category of AML with minimal differentiation (equivalent to M0-AML FAB subgroup). While true AULs are extremely rare, MPALs comprise 2-5% of AL [17,18] and are further subdivided by cytogenetic abnormality, specifically t(9;22)/BCR-ABL1 and t(v;11q23)/KMT2A (previously MLL, see below). Most cases do not harbour these alterations, so are classed as MPAL not otherwise specified (NOS) [9]. MPALs are much more frequently T/myeloid or B/myeloid than B/T or B/T/myeloid, which are both vanishingly rare [19]. These relative incidences are in keeping with modern concepts of multilineage competency in late lymphoid precursors (T/myeloid and B/myeloid) [1][2][3][4][5][6][7][8], whereas the existence of progenitors with exclusively B-/T-lymphoid oligopotency has not been proven. Even in more 'straightforward' cases, categorisation by MFC is still affected by technical factors, observer interpretation and choice of diagnostic criteria.
For example, retrospective analyses of both paediatric and adult cohorts have revealed higher incidences of EGIL-categorised BAL than WHO-categorised MPAL, which is almost certainly due to the more stringent criteria of the latter system [18,20,21,22]. Choice of diagnostic guidelines may therefore influence treatment allocation and inclusion in clinical studies, in turn introducing a bias in prognostic and genetic studies and therapy evaluation. Diagnosis may be further influenced by technical aspects of immunophenotyping, including sample quality, permeabilisation procedure, antibody clones, compensation and gating strategies. The final interpretation of antigen expression can also be heterogeneous, with positivity for some markers (especially MPO) being more subjective than others. Efforts have been made to reduce this variability by standardisation of MFC protocols and reagent panels, quality control and file sharing between expert groups [10,23,24,25]. For example, the EuroFlow consortium has developed antibody panels [24] and standardised protocols [25] for diagnosis of haematological malignancies. More recently, these efforts have extended to development of automated gating strategies [26,27] that would further improve diagnostic reproducibility through more reliable and robust identification and characterisation of leukaemic blasts and normal blood cells. The AIEOP-BFM consortium has also recently provided recommendations for pragmatic thresholds for interpretation of immunophenotypic marker expression that included a consensus antigen panel designed to fulfil EGIL and WHO 2008/2016 requirements for AL subtyping [28]. Recent technological advances in cytometers, fluorochromes and informatics tools are also likely to allow better analysis of the phenotypic complexity of the leukaemic population, which should also lead to more robust diagnostic practice in the future.
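The EGIL scoring logic discussed in this section can be illustrated with a minimal Python sketch. It applies the positivity thresholds quoted above (≥ 20% for surface antigens, ≥ 10% for cytoplasmic markers) and the rule that BAL requires a score higher than 2 points for more than one lineage. The marker weights are an abridged, illustrative set assumed for demonstration only; they do not reproduce Table 1 and should be checked against the EGIL criteria [10]. All function and variable names are hypothetical.

```python
# Illustrative EGIL-style lineage scoring (a sketch, not a diagnostic tool).
# ASSUMPTION: the weights below are an abridged example set; verify them
# against the published EGIL table before any real use.

SURFACE_THRESHOLD = 20.0      # % positive blasts required for surface antigens
CYTOPLASMIC_THRESHOLD = 10.0  # % positive blasts required for cytoplasmic markers

# Markers measured in the cytoplasm (prefixed "cy" here by convention).
CYTOPLASMIC = {"cyMPO", "cyCD3", "cyCD79a", "cyCD22", "cyIgM"}

# lineage -> {marker: points}; weakly lineage-specific markers (e.g. CD7) score less.
EGIL_WEIGHTS = {
    "myeloid": {"cyMPO": 2.0, "CD13": 1.0, "CD33": 1.0, "CD117": 1.0, "CD14": 0.5},
    "B": {"cyCD79a": 2.0, "cyCD22": 2.0, "cyIgM": 2.0, "CD19": 1.0, "CD20": 1.0},
    "T": {"cyCD3": 2.0, "CD2": 1.0, "CD5": 1.0, "CD8": 1.0, "CD7": 0.5, "CD1a": 0.5},
}


def is_positive(marker, percent_positive):
    """Apply the single positivity threshold for the marker's compartment."""
    if marker in CYTOPLASMIC:
        return percent_positive >= CYTOPLASMIC_THRESHOLD
    return percent_positive >= SURFACE_THRESHOLD


def lineage_scores(percent_by_marker):
    """Sum the points of all positive markers for each lineage."""
    scores = {}
    for lineage, weights in EGIL_WEIGHTS.items():
        scores[lineage] = sum(
            (points for marker, points in weights.items()
             if marker in percent_by_marker
             and is_positive(marker, percent_by_marker[marker])),
            0.0,
        )
    return scores


def classify(percent_by_marker):
    """Return a lineage label; BAL if more than one lineage scores above 2."""
    scores = lineage_scores(percent_by_marker)
    assigned = sorted(lineage for lineage, score in scores.items() if score > 2.0)
    if len(assigned) > 1:
        return "BAL (" + "/".join(assigned) + ")"
    if len(assigned) == 1:
        return assigned[0]
    return "unassigned"


# Hypothetical blast population: % of blasts staining positive for each marker.
blasts = {"cyMPO": 15, "CD13": 60, "CD33": 5, "cyCD3": 40, "CD7": 90, "CD2": 35, "CD19": 2}
print(lineage_scores(blasts))
print(classify(blasts))
```

In this hypothetical case the blasts score 3.0 points for the myeloid lineage and 3.5 points for the T lineage, so the sketch reports a T/myeloid BAL. Because positivity is reduced to a single percentage threshold, the result says nothing about staining intensity, which is one of the limitations noted above.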

The molecular landscape of T/Myeloid leukaemias
Given that current MFC scoring approaches are not ideal for all, could assessment of other aspects of leukaemia biology aid diagnosis and treatment choice? ALs typically harbour mutations, deletions or translocations in factors that regulate normal blood cell development. Some of these molecules (usually transcription factors) have lineage-specific (or even precursor-specific) activities. We can therefore imagine that mutational genotype could earmark leukaemias with shared molecular mechanisms of dysregulation of differentiation processes, even in ALs that are phenotypically heterogeneous.
As we have seen, co-expression of T-lymphoid and myeloid antigens on AL blasts is not unusual. T/myeloid MPALs may be difficult to distinguish from early T-cell precursor (ETP)-ALL, which is defined phenotypically by absence of T-cell markers CD1a and CD8, weak or absent expression of CD5, and positivity for at least one haematopoietic stem cell and/or myeloid antigen (CD34, CD117, HLA-DR, CD13, CD33, CD11b, CD65). This subgroup was originally identified in paediatric cases by transcriptional proximity to normal murine ETPs, and frequent treatment resistance [29], and accounts for 12-15% of childhood T-ALL and 20-25% of adult cases [30,31]. In addition, the most phenotypically immature AML subgroup, M0-AML, often co-expresses lymphoid-associated antigens such as CD7 or TdT [32]. Distinction between these three diagnostic categories may therefore rely purely on determination of cytoplasmic CD3 and MPO positivity. To be provocative, is this immunophenotypic nit-picking useful? This is an important clinical question, as immature T-ALLs and AMLs are more chemoresistant and have poorer outcomes than more mature leukaemia subgroups [29,31,33,34,35]. By considering the normal haematopoietic framework (Fig. 1), we can imagine that these cases could all potentially arise from transformation of similar multi- or bipotent lympho-myeloid progenitors. Indeed, it has been proposed that T/myeloid MPAL, ETP-ALL and cases of M0-AML with evidence of T lineage identity should be defined as a single diagnostic entity of acute myeloid/T-lymphoblastic leukaemia (AMTL) [36]. This argument is supported by the shared repertoire of genetic alterations observed in these leukaemias, which has been extensively catalogued in recent years [17,30,31,37,38,39,40,41,42]. Selected examples of the molecular pathways affected by these mutations (shown diagrammatically in Fig. 2) will follow.
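The phenotypic definition of ETP-ALL given above is essentially a three-part rule, which can be written down as a short Python sketch. The text does not specify numerical cut-offs, so the sketch assumes that each marker has already been called as 'absent', 'weak' or 'positive'; it also assumes that only a 'positive' call counts towards the stem cell/myeloid criterion. The function name and input format are hypothetical.

```python
# Sketch of the ETP-ALL immunophenotype rule described in the text.
# ASSUMPTION: marker calls ('absent', 'weak', 'positive') are supplied by the
# user; no numerical thresholds are implied here.

STEM_OR_MYELOID = ("CD34", "CD117", "HLA-DR", "CD13", "CD33", "CD11b", "CD65")


def is_etp_phenotype(calls):
    """Return True if the marker calls match the ETP-ALL pattern.

    calls: dict mapping marker name -> 'absent', 'weak' or 'positive'.
    CD1a, CD8 and CD5 must be present in the dict (KeyError otherwise),
    so that an untested marker is never silently treated as negative.
    """
    lacks_cd1a_cd8 = calls["CD1a"] == "absent" and calls["CD8"] == "absent"
    cd5_weak_or_absent = calls["CD5"] in ("absent", "weak")
    has_stem_or_myeloid = any(calls.get(m) == "positive" for m in STEM_OR_MYELOID)
    return lacks_cd1a_cd8 and cd5_weak_or_absent and has_stem_or_myeloid


# Hypothetical examples.
etp_like = {"CD1a": "absent", "CD8": "absent", "CD5": "weak", "CD34": "positive"}
cortical_like = {"CD1a": "positive", "CD8": "positive", "CD5": "positive"}
print(is_etp_phenotype(etp_like))
print(is_etp_phenotype(cortical_like))
```

The sketch makes plain how little separates the categories discussed here: the rule never inspects cytoplasmic CD3 or MPO, which are the markers that would distinguish such a case from T/myeloid MPAL or M0-AML.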
As might be expected, mutations and deletions in transcription factors that normally regulate T-lymphoid and myeloid gene expression programmes are commonly found in these AL subgroups. Notably, paediatric T/Myeloid MPALs were reported to harbour alterations in transcription factor genes in 100% of cases in a recent study [17]. The high frequencies of RUNX1 inactivating mutations targeting RUNT and transactivation (TAD) domains seen in each of ETP-ALL [30,31,37,43], AML-M0 [41,42,44] and T/myeloid MPAL [17,38] reflect the key role that the core binding factor transcription factor complex plays at diverse stages of myeloid and lymphoid differentiation [45,46]. Similar patterns are observed for alterations in genes coding for ETV6, with most mutations located in the PNT and ETS domain coding regions [37,38,47,48,49], and WT1 mutations in the zinc finger and WT1 domains [17,40,42,44,49,50]. In line with its diverse and critical roles in T-cell development [51], GATA3 loss-of-function mutations segregate with immature cases of T-ALL [30,31] and are found in T/Myeloid, but not B/Myeloid MPAL [38]. Signalling molecule mutations are also frequent in these leukaemia subgroups. T/Myeloid MPALs and ETP-ALLs harbour particularly high rates of activating mutations in members of the IL7R-JAK-STAT phosphorylation cascade, causing marked effects on pathway activity and disease biology [52][53][54]. Kinase gene alterations that link to aggressive AML biology, such as FLT3 activation by either mutation or internal tandem duplication [37,42,55] and NRAS gain-of-function mutations (most frequently at codon 12), are also common in these cases [17,30,31,37,39,40,42,50,55,56,57,58,59,60]. This is in contrast to patterns of signalling alteration seen in more phenotypically mature leukaemias.
For example, NOTCH pathway hyperactivity due to either NOTCH1 activating mutations or FBXW7 inactivating perturbations is more common in cases of mature T-ALL, albeit high-throughput sequencing has revealed higher incidences of these mutations in ETP-ALL than were previously recognised [17,30,31,58,61]. Similarly, PI3K-AKT-mTOR signalling deregulation, often due to PTEN inactivating mutations that may link to altered leukaemia stem cell activity [62], is more frequent in mature T-ALL subgroups [63,64]. There is additional evidence of interplay between cellular signalling pathways and transcription factor action in AL. As an example, the activity of MEF2C, which is highly expressed in some cases of immature T-ALL [65], has been shown to be altered both directly and indirectly by MAPK and SIK kinases in a distinct subset of AML [66,67].
An association between lineage-ambiguous leukaemias and epigenetic dysfunction has long been established, since the initial descriptions of pathogenic translocations in the histone methyltransferase gene KMT2A (previously MLL, or mixed lineage leukaemia [68,69]), which are highly prevalent in infant AL [70]. As epigenetic factors are key regulators of normal haematopoiesis [71][72][73][74], it is to be expected that the molecular dysregulation caused by genetic alterations in these factors would be leukaemogenic. It is now evident that mutations and deletions in factors that chemically modify either DNA or histones are common in blood cancers [75][76][77][78] and that epigenetic alterations are linked to marked deregulation of gene expression in leukaemic cells [79][80][81].
Several studies have highlighted increased rates of Polycomb factor mutations and deletions in ETP-ALL compared with more mature T-ALL subgroups [30,31,57], and these are also found in T/Myeloid MPAL and M0-AML [17,30,38,39,42]. Specifically, these loss-of-function alterations usually occur in components of the polycomb repressive complex 2 (PRC2) that critically regulates developmental gene expression, including in the haematopoietic system [73,74], through methylation of a specific histone lysine residue (H3K27) that is normally associated with transcriptional repression [82][83][84][85]. In the context of T-ALL, loss of PRC2 function due to mutation has been linked to altered K27 methylation that correlates with increased NOTCH pathway activity in leukaemic cells [86]. In line with this, PRC2 has recently been shown to play a key role in the delineation of the chromatin landscape during normal thymopoiesis [87]. We recently showed that haploinsufficiency of core PRC2 components EZH2, EED and SUZ12 due to mutation or deletion was strongly linked to poor outcome in childhood AML [88]. Interestingly, PRC2-depleted AMLs in our series were more likely to show weak MPO staining and/or express CD7, suggesting that some of these cases may show at least some phenotypic proximity to T/Myeloid MPALs.
Factors that control the methylation state of DNA are also often subject to leukaemia-associated dysregulation. Mutations in the de novo DNA methylating factor DNMT3A have been detected across the T-lymphoid and myeloid leukaemic spectrum [17,38,39,42,49,50,57,89] and are associated with poor prognosis in both AML [90,91] and T-ALL [43,92]. DNMT3A alterations are linked to increasing age at disease presentation and probably arise on a background of clonal haematopoiesis in most cases [93,94]. In contrast, mutations in isocitrate dehydrogenase genes (IDH) 1 and 2, which affect DNA methylation through generation of the oncometabolite 2-hydroxyglutarate [95], are found across all age groups [38,39,40,42,50] and are more common in M0-AML than in more mature AML subtypes [42].
As these epigenetic factors act across the genome, alteration of their activity will result in widespread and pleiotropic effects on AL transcription. Furthermore, it is likely that the leukaemic epigenetic 'state' is also heavily influenced by cellular lineage and precursor type, in the absence and presence of somatic mutations. As an example, DNA methylation patterns were reported to be highly divergent between T/Myeloid and B/Myeloid MPAL, in some cases overlapping with more differentiated ALs [38]. While T/Myeloid MPALs showed higher levels of methylation in general, T-lymphoid ontogeny was reflected by hypomethylation of T-receptor signalling factor genes. Also in line with these concepts of global epigenetic dysregulation, recent intriguing data [96] have suggested that some cases of AML have epigenomic landscape changes reminiscent of the MYB regulatory complex reorganisation that is also found in T-ALL [97,98]. The genome-wide effects of epigenetic factor alterations are likely to be further affected by co-occurring mutations. Some clues may be seen in the case of PHF6, a chromatin-binding protein that modulates the recruitment of lineage-specific transcription factors and chromatin-remodelling agents [99]. PHF6 is widely mutated in AL, with a strong predominance in T-ALL, where loss-of-function mutations are found in 16% of paediatric and 38% of adult cases [100]. The majority of PHF6-mutated T-ALLs show immature surface phenotypes, and analysis of clonal evolution and mutation dynamics using diagnostic and relapse samples identified these mutations as early events in T-ALL development [100]. Mutations are also found in 3% of AML, but in contrast to T-ALL, a majority are acquired in subclonal populations in the later phase of the clinical course [101]. Approximately 16-55% of mixed-phenotype acute leukaemias also harbour these mutations [102], particularly in the very rare T-B-MPAL group [49].
No direct functional evidence has been found to demonstrate whether PHF6 mutations contribute to pathogenesis, albeit there have been reports that PHF6 affects lineage plasticity [103] and that it may regulate expression of the key lymphoid transcription factor LMO2 [104]. Murine PHF6 knockout models develop haematopoietic neoplasms with various phenotypes after a long latency, demonstrating a tumour-suppressive function that may be linked to reduced thresholds for oncogenic transformation and increased frequency of leukaemia-initiating cells [49]. This long latency also suggests that PHF6 loss is not sufficient to induce overt leukaemia, and that cooperative mutations are required for malignant transformation and may also influence leukaemia phenotype. In support of this concept, PHF6 mutations are seen alongside NOTCH1 alterations in T-ALL [100,105,106], but not in other AL types. We have also found that PHF6-mutated AL transcriptional profiles might segregate with NOTCH1 genotype [107], raising the possibility that co-occurring mutations affect leukaemia phenotype by altering lineage-specific gene expression programmes. It is to be hoped that ongoing research into the role of PHF6 in leukaemia and normal haematopoiesis will shed light on these complex effects.

New technologies, new perspectives?
As well as providing a thorough description of leukaemia-associated mutations and deletions, next-generation sequencing technologies have generated extensive information about the transcriptional and epigenetic landscape of AL, which can offer further useful insights into the molecular roadmap of immature acute leukaemias.
It is well established that gene expression profiling can be used to subcategorise cancers, including leukaemias, according to shared transcriptional signatures [108,109]. Given that AL-related mutations and deletions tend to recurrently converge on specific molecular pathways that are often linked to haematopoietic differentiation, it can be predicted that leukaemic gene expression profiles might cluster according to the underlying mechanism of oncogenesis. In line with this, several independent studies have shown that unsupervised hierarchical clustering of T-ALL transcriptional profiles reproducibly groups cases according to a limited number of expression signatures [65,110,111]. These clusters correlate closely with the stage of T-lymphoid differentiation arrest, and so typically identify a distinct immature/ETP-ALL subgroup. Other tools like gene set enrichment analysis (GSEA) compare enrichment for defined transcriptional signatures between different sample groups, and can therefore be used to measure the relative activity of specific molecular programmes, or to assess transcriptional resemblance to other cell types [112]. In the case of ETP-ALLs, GSEA has identified shared gene expression programmes with both normal haematopoietic stem cell (HSC) and immature myeloid precursors [37], thereby providing clues about the leukaemia cell of origin, and the potential molecular basis of the differentiation block. However, most traditional analytical techniques are hampered by several factors, including reliance on comparisons of predefined sample groups, and limited capacity to either account for transcriptional heterogeneity between individual leukaemias, or to interpret and resolve relationships between the subgroups that are identified.
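The idea behind the enrichment statistic used in GSEA can be shown with a minimal Python sketch. Genes are ranked by their differential expression between two sample groups, and a running sum increases whenever a gene belongs to the signature of interest and decreases otherwise; the enrichment score is the largest deviation of that running sum from zero. The version below is the simple unweighted (Kolmogorov-Smirnov-like) form, assumed here for clarity; it omits the correlation weighting and permutation-based significance testing used in practice. The gene ranking and signature are hypothetical.

```python
# Unweighted running-sum enrichment score (a simplified sketch of the GSEA idea).
# ASSUMPTION: equal step sizes for all hits; no weighting, no permutation test.


def enrichment_score(ranked_genes, gene_set):
    """Return the signed maximum deviation of the running sum from zero.

    ranked_genes: list of gene names, ordered from most up-regulated in
                  group A to most up-regulated in group B.
    gene_set:     set of gene names forming the signature of interest.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_misses = len(ranked_genes) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up on a signature gene, step down on any other gene.
        running += 1.0 / n_hits if is_hit else -1.0 / n_misses
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking (immature vs. mature T-ALL) and hypothetical signature.
ranking = ["CD34", "KIT", "MEF2C", "LYL1", "CD3E", "CD1A", "CD8A", "RAG1"]
stem_signature = {"CD34", "KIT", "LYL1"}
print(enrichment_score(ranking, stem_signature))
```

A positive score indicates that the signature genes are concentrated at the top of the ranking (here, in the hypothetical immature group), and a negative score indicates concentration at the bottom. Note that the ranking itself depends on the predefined sample groups, which is one of the limitations raised above.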
Once again, we should consider how analysis of normal haematopoiesis might inform our understanding of leukaemia. High-throughput single-cell sequencing approaches have yielded many important insights into normal blood cell development in recent years [113,114]. These technologies have stimulated the development of a range of new analytical and visualisation tools to help unravel the complex data they generate, and some of these methods have already been applied to AL data sets. One example is stochastic neighbour embedding (SNE). As with other dimensionality reduction techniques such as Principal Component Analysis (PCA), SNE reduces complex information to a small number of dimensions, and can therefore help to visualise clusters of biologically similar samples in large multidimensional data sets, in a way that generally agrees with dedicated clustering algorithms [115]. SNE has been used for single-cell transcriptomic analysis [116], but can also be applied to leukaemia bulk gene expression profiles [117][118][119][120][121][122]. For example, t-SNE was used together with predictive modelling to help discover novel subgroups of paediatric B-ALL linked by dysregulation of the B-lymphoid transcription factor PAX5 [120]. In the context of immature T/Myeloid leukaemias, an analysis of ETP-ALLs, MPALs and AMLs was reported to identify molecular subgroups distinguished by the presence of either FLT3 or PRC2 alterations, which were associated with differences in patient prognosis [119]. Emerging data have shown that t-SNE helped to identify a novel immature AL subset characterised by enhancer hijacking of the T-lymphoid regulator BCL11B [122]. Overall, these data suggest that SNE can help to reveal biological relationships between ALs that could provide novel insights into leukaemogenesis.
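The principle of dimensionality reduction mentioned above can be illustrated with PCA, the simpler of the techniques named. The following Python sketch finds the first principal component of a small expression matrix by power iteration and projects each sample onto it, so that a four-gene profile is summarised by a single coordinate. This is PCA rather than SNE, and is offered only as a toy example: the expression values are invented, the function name is hypothetical, and real analyses would use an established library.

```python
import math

# First principal component by power iteration (toy sketch of PCA, not SNE).
# ASSUMPTION: the expression values below are invented for illustration.


def first_principal_component(samples, n_iter=200):
    """Return (component, scores) for the direction of greatest variance.

    samples: list of equal-length lists (rows = samples, columns = genes).
    """
    n, d = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in samples]

    # Deterministic, non-uniform start vector so that it is unlikely to be
    # exactly orthogonal to the leading component.
    v = [float(j + 1) for j in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]

    for _ in range(n_iter):
        # Multiply by the (unnormalised) covariance matrix as X^T (X v).
        proj = [sum(row[j] * v[j] for j in range(d)) for row in centred]
        w = [sum(proj[i] * centred[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance in the data
        v = [x / norm for x in w]

    scores = [sum(row[j] * v[j] for j in range(d)) for row in centred]
    return v, scores


# Hypothetical profiles: columns are two 'lymphoid' and two 'myeloid' genes.
expression = [
    [9, 8, 1, 2], [8, 9, 2, 1], [9, 9, 1, 1],  # lymphoid-like samples
    [1, 2, 9, 8], [2, 1, 8, 9], [1, 1, 9, 9],  # myeloid-like samples
]
component, scores = first_principal_component(expression)
print([round(s, 2) for s in scores])
```

In this toy data set the two groups fall on opposite sides of zero along the first component. The sign of a principal component is arbitrary, so only the separation between groups, not its direction, is meaningful.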
Iterative Clustering and Guide Gene Selection (ICGS) provides another example of a tool that was initially developed for single-cell transcriptional analysis [123], but which might also be used for leukaemia profiling.
Briefly, ICGS relies on clustering samples using a limited number of 'guide genes' that are sufficient to define more extensive gene expression patterns, including lineage-related programmes. Serial clustering and refinement of these guide genes allow case grouping according to shared transcriptional modules, thereby identifying common expression signatures between sample subsets. Importantly, analysis of haematopoietic progenitors has shown that ICGS can infer ontogeny relationships between cellular developmental states [123]. We recently used ICGS to analyse the transcriptional profiles of a large cohort of ALs that included a high proportion of immature T-lymphoid and myeloid leukaemias [107]. In contrast to traditional clustering methods, we found that ICGS could reveal a continuum of haematopoietic differentiation between T-lymphoid and myeloid cases, with a significant subset of cases grouping together at the T-lymphoid/myeloid interface (Fig. 3). We used GSEA to show that these 'interface' acute leukaemias (IALs) shared gene expression programmes with a variety of multi- or bipotent myeloid and lymphoid haematopoietic precursor populations, including the most immature and rare CD34+CD1a-CD7- subset of human ETPs. Surprisingly, we found that IALs originally diagnosed as AML had transcriptional resemblance to lymphoid progenitor populations by GSEA, raising the possibility that these cases harbour latent lymphoid identity that cannot be detected by current diagnostic immunophenotyping panels. We further found that the expression of IAL transcriptional programmes correlated with poor prognosis in independent AML cohorts [124,125], suggesting that myeloid-directed therapies might not provide optimal treatment for these cases.
Of note, while ETP-ALLs and M0-AMLs were more likely to cluster at the lineage interface in our analyses, IALs were not limited to these cases, and somatic mutational genotypes showed only partial correlation with known alteration patterns in immature ALs. As well as transcriptional profiling, it is now possible to use high-throughput approaches to investigate many other biological parameters at single-cell resolution, thereby creating opportunities to perform a highly granular and holistic analysis of transcriptional, epigenetic and phenotypic relationships in both normal and leukaemia data sets. As an example, a recent study used CITE-seq (cellular indexing of transcriptomes and epitopes by sequencing [126]) and ATAC-seq (assay for transposase-accessible chromatin using sequencing [127]) to analyse normal haematopoietic precursors and MPALs at a single-cell level [128]. Crucially, integration of single-cell immunophenotypic, transcriptional and epigenetic data allowed generation of a haematopoietic roadmap without specific boundaries between cell populations, thereby recapitulating the normal continuum of differentiation. The latter was visualised by UMAP [129], a dimensionality reduction method similar to t-SNE, but which better preserves potential hierarchical structures within the data set, and which is also well suited for projection of distinct independently acquired data. By projecting MPAL profiles onto this roadmap, the authors could interrogate differences in transcriptional and epigenetic programmes between leukaemia cells and the corresponding normal precursor, allowing them to identify a specific disruption in RUNX1-directed gene expression in AL samples. This transcriptional signature was further correlated with poor outcomes in patients with AML, providing a direct link between this very specific molecular dysfunction and aggressive leukaemia biology [128].

Conclusions
Improvements in AL outcomes, particularly in children, have been one of the success stories of modern medicine. These advances have been underpinned by careful dissection of leukaemia molecular biology that has developed with reference to normal blood cell development. While current classification criteria allow many patients to receive effective treatments, new haematopoietic paradigms should lead us to consider whether these systems should be modified for ALs that seem to straddle the lineage spectrum.
In contrast to categorical approaches, we have seen that ICGS and other analytical methods for single-cell omics analysis can identify similarities and deviations from the normal haematopoietic continuum in human leukaemias, which in turn could be used to discover novel molecular pathology. We hope that this knowledge could inform AL diagnosis in practice, either by incorporation of new molecular assessments, or through refinement of MFC approaches based on the information generated by these novel tools. In the case of immature T/myeloid leukaemias, there is already persuasive evidence that rational treatment strategies might target common molecular alterations in cases that are currently categorised heterogeneously in clinical practice.
In summary, we believe that ongoing reference to an up-to-date haematopoietic roadmap will help to provide a better diagnostic and treatment journey for patients with acute leukaemia.