Genome-wide analysis of cytogenetic aberrations in ETV6/RUNX1-positive childhood acute lymphoblastic leukaemia


Correspondence: Kjeld Schmiegelow, Clinic for Paediatric and Youth Medicine, The Juliane Marie Centre, The University Hospital Rigshospitalet, Blegdamsvej 9, DK-2100 Copenhagen, Denmark.



The chromosomal translocation t(12;21) resulting in the ETV6/RUNX1 fusion gene is the most frequent structural cytogenetic abnormality among patients with childhood acute lymphoblastic leukaemia (ALL). We investigated 62 ETV6/RUNX1-positive childhood ALL patients by single nucleotide polymorphism array to explore acquired copy number alterations (CNAs) at diagnosis. The mean number of CNAs was 2·82 (range 0–14). Concordance with available G-band karyotyping and comparative genomic hybridization was 93%. Based on three major protein-protein complexes disrupted by these CNAs, patients could be categorized into four distinct subgroups, defined by different underlying biological mechanisms relevant to the aetiology of childhood ALL. When recurrent CNAs were evaluated by an oncogenetic tree analysis classifying their sequential order, the most common genetic aberrations (deletions of 6q, 9p, 13q and X, and gains of 10 and 21) seemed independent of each other. Finally, we identified the most common regions with recurrent gains and losses, which comprise microRNA clusters with known oncogenic or tumour-suppressive roles. The present study sheds further light on the genetic diversity of ETV6/RUNX1-positive childhood ALL, which may be important for understanding poor responses among this otherwise highly curable subset of ALL and lead to novel targeted treatment strategies.

Acute lymphoblastic leukaemia (ALL) is the most common cancer in children (Hjalgrim et al, 2003). In B-lineage childhood ALL the most frequent structural cytogenetic abnormality, present in 25% of the cases, is the chromosomal translocation t(12;21)(p13;q22) resulting in the ETV6/RUNX1 fusion gene (Shurtleff et al, 1995; Forestier et al, 2008). The event-free survival (EFS) of childhood ALL patients after first line therapy is approximately 80% (Schmiegelow et al, 2010); however, presence of the ETV6/RUNX1 fusion gene is generally associated with a more favourable prognosis, with an EFS of approximately 90% (Forestier et al, 2008). The ETV6/RUNX1 fusion gene is believed to represent an initiating event in the development of this childhood ALL subset. This initiating event, seen as early as in utero (Wiemels et al, 1999; Hjalgrim et al, 2002), most likely requires additional genetic changes in order to lead to leukaemia: ETV6/RUNX1 transcripts have been demonstrated at low levels in normal cord blood samples (Mori et al, 2002; Lausten-Thomsen et al, 2011), and not all ETV6/RUNX1-positive knock-in mice or carriers of ETV6/RUNX1-positive pre-leukaemic cells develop leukaemia (Fischer et al, 2005; Hong et al, 2008). Thus, the development of ETV6/RUNX1-positive leukaemia requires one or more secondary genetic changes, and previous studies have indeed shown additional genetic changes in the majority of ETV6/RUNX1-positive childhood ALL patients (Forestier et al, 2007; Mullighan et al, 2007; Kawamata et al, 2008; Lilljebjorn et al, 2007, 2010; Parker et al, 2008; Mullighan et al, 2008; van Delft et al, 2011).

In the present study, we applied high-resolution Affymetrix GeneChip® 500K profiling to 62 ETV6/RUNX1-positive childhood ALL patients to explore in detail copy number alterations (CNAs) throughout the genome. Furthermore, we investigated the inter-dependencies of the recurrent CNAs and involvement in protein-protein complexes among them, and correlated the CNAs with biologically relevant microRNAs (miRNAs) affected by the aberrations.

Methods and materials


The ETV6/RUNX1-positive childhood ALL patients (aged between 1 and 15 years at the time of diagnosis) from Denmark, Norway, Sweden and Iceland were diagnosed and enrolled in the Nordic Society for Paediatric Haematology and Oncology (NOPHO) treatment protocols (NOPHO ALL-92 or NOPHO ALL-2000) (Schmiegelow et al, 2010). A total of 133 patients were found to be positive for ETV6/RUNX1 by fluorescent in situ hybridization (FISH) and/or reverse transcription polymerase chain reaction (RT-PCR) by routine investigation from 1996–2007, and diagnostic tumour DNA samples were available for 99 of these patients. Ten samples were of poor quality and/or had an insufficient amount of DNA to be analysed and, of the remaining samples, 62 fulfilled the quality criteria (QC) for inclusion in the bioinformatics analyses (Table 1). The study was approved by The Capital Region of Denmark Committee on Biomedical Research Ethics and The Danish Data Protection Agency.

Table 1. Patient characteristics and copy number alterations

Male                              37 (59·7%)
Female                            25 (40·3%)
Age, years, median (range)        4·17 (1·30–15)
WBC, ×10⁹/l, median (range)       11·5 (0·80–110)
Risk group
  SR                              28 (45·2%)
  IR                              23 (37·1%)
  HR                              11 (17·7%)
Treatment protocol
  NOPHO ALL-92                    33 (53·2%)
  NOPHO ALL-2000                  29 (46·8%)
Induction failure                 0 (0%)
Resistant disease                 0 (0%)
Relapse                           4 (6·5%)
Death in remission                0 (0%)
Secondary malignancy              0 (0%)
Total events                      4 (6·5%)
CNA, mean (range)
  Gains > 1 Mb                    0·73 (0–8)
  Gains < 1 Mb                    0·02 (0–1)
  Losses > 1 Mb                   1·24 (0–8)
  Losses < 1 Mb                   0·85 (0–6)
  Total > 1 Mb                    1·97 (0–9)
  Total < 1 Mb                    0·85 (0–6)
  Total                           2·82 (0–14)

WBC, white blood cell count; SR, standard risk; IR, intermediate risk; HR, high risk; NOPHO, Nordic Society for Paediatric Haematology and Oncology; EFS, event-free survival; CNA, copy number alteration; Mb, megabase.

G-band karyotyping and comparative genomic hybridization

G-band karyotyping was performed as part of the routine investigation on short-term cultured leukaemic cells by standard techniques. As part of a separate study, high-resolution comparative genomic hybridization (CGH) was performed for a subset of the patients (Kristensen et al, 2003). G-band karyotyping and/or CGH data was available in 54 (87·1%) of the included cases (Table S1).

Single nucleotide polymorphism (SNP) array analysis

DNA from bone marrow or blood samples acquired at diagnosis was extracted and purified by sodium chloride and ethanol precipitation. For SNP array analysis the DNA was processed and hybridized to Affymetrix GeneChip® Mapping 500K array set according to the manufacturer's instructions (Affymetrix, Santa Clara, CA, USA). The Affymetrix GeneChip® Mapping 500K array set comprises two arrays and is processed by two assay kits differing only in the restriction enzyme used (Affymetrix GeneChip® Mapping 250K Nsp Assay Kit and Affymetrix GeneChip® Mapping 250K Sty Assay Kit). Briefly, 250 ng DNA was digested by either Nsp I or Sty I, followed by ligation of adaptors, allowing PCR amplification of fragments in sizes from 200 to 1100 bp. PCR products were fragmented and end-labelled with biotin. Samples were subsequently hybridized to the arrays. The arrays were washed and stained with phycoerythrin-conjugated streptavidin (SAPE) using the Affymetrix Fluidics Station® 450 and were scanned in the Affymetrix GeneChip® 2500 scanner to generate fluorescent images, as described in the Affymetrix GeneChip® protocol. Cell intensity files (CEL files) were generated in the GeneChip® operating software (GCOS) version 5.0.

Data processing

The signal intensity data were generated from the raw CEL files using the Affymetrix Power Tools (APT) Software Package. First, for each array a QC call rate was generated using the Dynamic Model algorithm (Di et al, 2005). According to the manufacturer's recommendations, only samples achieving call rates ≥93% were used for further analysis. For the 62 samples and 200 controls fulfilling QC, genotype calls were generated using the BRLMM algorithm. Allele-specific signals were extracted from median polish summarized and quantile normalized Perfect Match (PM) probe-intensity values. The R-GADA package (Pique-Regi et al, 2010) was employed to detect the recurrent CNAs. The package implements a segmentation algorithm to call CNAs based on genome alteration detection analysis (GADA). During the segmentation procedure the parameters controlling the trade-off between sensitivity and false discovery rate (FDR) were set to achieve high specificity at the cost of sensitivity. The minimum segment length required for classification as a CNA was 8 probes, in order to exclude detection of false alterations due to extreme outliers. The 200 controls were used to exclude alterations representing normal CNAs. Final estimation of copy number states was verified by visual examination.
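The filtering rules described above can be illustrated with a minimal Python sketch. It is not the APT or R-GADA pipeline used in the study. The `Segment` record, the function names and the rule for excluding normal CNAs (any positional overlap with a control segment of the same state) are assumptions made for illustration only; the text specifies only the 93% call-rate threshold and the 8-probe minimum.

```python
from dataclasses import dataclass

MIN_CALL_RATE = 0.93  # QC threshold stated in the text
MIN_PROBES = 8        # minimum segment length stated in the text


@dataclass(frozen=True)
class Segment:
    """Hypothetical record for one called segment (not an R-GADA type)."""
    sample: str
    chrom: str
    start: int
    end: int
    n_probes: int
    state: str  # "gain" or "loss"


def passes_qc(call_rate: float) -> bool:
    """Keep only arrays whose QC call rate is at least 93%."""
    return call_rate >= MIN_CALL_RATE


def overlaps(a: Segment, b: Segment) -> bool:
    """True if two segments share at least one position on one chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def filter_segments(segments, control_segments):
    """Drop short segments and segments resembling CNAs seen in controls.

    Assumption: a patient segment is treated as a normal variant if it
    overlaps any control segment of the same state. The study does not
    state its exact exclusion rule.
    """
    kept = []
    for seg in segments:
        if seg.n_probes < MIN_PROBES:
            continue  # too short; may be driven by outlier probes
        if any(c.state == seg.state and overlaps(seg, c)
               for c in control_segments):
            continue  # also present in controls; treated as normal CNA
        kept.append(seg)
    return kept
```

In this sketch a segment must pass both filters to be retained. A stricter overlap rule (for example a minimum reciprocal overlap) could be substituted in `overlaps` without changing the rest.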

Data visualization and biological correlations

All CNAs were displayed in Circos format according to Krzywinski et al (2009). Recurrent CNAs were defined as gains or deletions of material occurring in the same region in at least two patients and plotted as a heatmap (Fig S1). The recurrent CNAs were then analysed further with the CRAN R package Oncotree 0·31 to determine a tree model for oncogenesis (Desper et al, 1999; Szabo & Boucher, 2002).
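The definition of a recurrent CNA given above can be expressed as a short Python sketch. The input format, a list of (patient, region, kind) tuples with region labels such as chromosome arms, is an assumption for illustration; the study does not describe how regions were matched between patients.

```python
from collections import defaultdict


def recurrent_cnas(calls, min_patients=2):
    """Return CNAs seen in at least `min_patients` distinct patients.

    `calls` is an iterable of (patient_id, region, kind) tuples, where
    `kind` is "gain" or "loss". The region labels are hypothetical.
    """
    patients = defaultdict(set)
    for patient, region, kind in calls:
        # A set ensures that a patient is counted once per region and kind.
        patients[(region, kind)].add(patient)
    return {
        key: sorted(ids)
        for key, ids in patients.items()
        if len(ids) >= min_patients
    }
```

Gains and losses of the same region are counted separately here, so a region gained in one patient and lost in another is not recurrent under this sketch.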

To explore functional modules that are potentially affected by the CNAs, enrichment analysis on protein-protein interaction complexes was performed using custom-written Perl scripts. Only patients with losses of genetic material were included in this analysis, as these are likely to result in a direct loss of function and hence are most disruptive to the complexes. Using a curated collection of protein-protein complexes (Lage et al, 2007), we defined a set of complexes that categorized the patients into distinct subgroups defined by different underlying biological mechanisms. All combinations of 2–5 protein-protein complexes were tested to find a set of complexes representing the largest possible group of patients with the smallest possible overlap between different complexes. From the final collection of complex sets that best fulfilled the criteria, only those that comprised genes highly expressed in B lymphoblasts were selected and prioritized according to leukaemia-related annotations among BioAlma terms assigned to the complexes. The miRNA annotations were based on miRBase version 16 (Griffiths-Jones, 2010), while their experimentally validated target information was collected from the miRecords and miRWalk databases (Xiao et al, 2009; Dweep et al, 2011).
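The exhaustive search over combinations of 2–5 complexes can be sketched in Python as follows. This is not the authors' Perl implementation. The scoring, which ranks first by the number of patients covered and then by the fewest duplicate assignments, is one possible reading of "largest possible group of patients with the smallest possible overlap" and should be treated as an assumption, as should the function names and the input mapping.

```python
from itertools import combinations


def score(combo, members):
    """Score a set of complexes: (patients covered, minus overlap).

    `members` maps a complex name to the set of patients in whom a
    deletion affects that complex. Overlap is counted as the number of
    patient assignments beyond the first for each covered patient.
    """
    covered = set().union(*(members[c] for c in combo))
    assignments = sum(len(members[c]) for c in combo)
    overlap = assignments - len(covered)
    return (len(covered), -overlap)


def best_complex_sets(members, min_size=2, max_size=5):
    """Test every combination of min_size..max_size complexes."""
    names = sorted(members)
    best, best_score = [], None
    for k in range(min_size, min(max_size, len(names)) + 1):
        for combo in combinations(names, k):
            s = score(combo, members)
            if best_score is None or s > best_score:
                best, best_score = [combo], s
            elif s == best_score:
                best.append(combo)  # keep ties for later prioritization
    return best, best_score
```

Ties are returned together because the text describes a later prioritization step (expression in B lymphoblasts and leukaemia-related annotations) that is not modelled here.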


Results

Genomic profiling of ETV6/RUNX1-positive childhood ALL patients

The SNP array analysis revealed a total of 174 CNAs (30 of which were recurrent) among the 62 successfully analysed cases (whole chromosome deletions and amplifications were each counted as one cytogenetic change). Of these, 129 were deletions: 77 above 1 Mb (including two whole X chromosome deletions) and 52 focal deletions below 1 Mb. A total of 45 gains were detected, of which 44 were above 1 Mb (including 18 whole chromosome gains) and only one was a focal amplification below 1 Mb (Fig 1 and Table S1). Since the 500K arrays do not have any probes covering the p arms of chromosomes 13, 14, 15, 21 and 22, some large amplifications of these chromosomes may actually be trisomies. The mean number of CNAs per patient was 2·82 (range 0–14) and 13 cases (21%) did not display any CNAs (Table 1 and Table S1).

Figure 1.

The recurrent cytogenetic alterations in ETV6/RUNX1 patients plotted in Circos format (Krzywinski et al, 2009). The chromosomes are presented clockwise with the p arm starting from 0, followed by the centromere marked in red and by the q arm. The gains are shown in red, and the deletions in blue. The location of the probes from the 500K arrays is shown in green. The location of genes found to be important in other microarray based ALL studies is marked by green lines in the middle of the plot. Note: the alterations are stacked together; so one circle does not necessarily represent one patient.

Concordance between G-band karyotyping and/or CGH and the SNP array analysis data was 93%. Thus, in 51 of the 54 cases with available data, the SNP array results confirmed and/or added to G-band karyotyping and/or CGH (Table S1). As summarized in Fig 1, all chromosomes except chromosome 17 demonstrated CNAs above 1 Mb. If focal lesions are included, all chromosomes demonstrated alterations. The most common CNAs above 1 Mb involved deletions of 12p (39%), 6q (13%), 9p (10%), 11q (10%), 13q (10%) and 8p (8%), and amplifications of Xq (11%) (one female and six males), 21 (10%), 10 (5%) and Xp (5%) (two females and one male). Of the focal changes, the most common included deletions in 14q32·33 (21%), 7p14·1 (18%), 22q11·22 (13%), 14q11·2 (8%) and 7q34 (6%).

Biological correlations

Overall, neither individual aberrations nor heatmap clustering of the recurrent aberrations (Fig S1) correlated significantly with age or white blood cell count (WBC). In an attempt to classify the patients more distinctively, a systems biology approach was applied investigating the likely effects of losses of genetic material at the protein complex level. Grouping of patients in this way reflects more closely the clinical parameters and suggests different potential biological mechanisms underlying the leukaemogenesis (Data S1, Fig S2).

Next, the inter-dependency of the recurrent CNAs was investigated by an oncogenetic tree analysis to explore the sequential order of CNAs, as previously done by Lilljebjorn et al (2010). By this approach, deletions of 12p and 14q32 and gain of 21q were classified as early events, i.e. being close to the root and acting as new roots of their own subtrees (Fig 2). Importantly, even though deletion of 12p (the most common additional aberration) was close to the root, it did not seem to be a required precursor for other changes. Furthermore, the most common aberrations in ETV6/RUNX1-positive patients (deletions of 6q, 9p, 12p, 13q and X, and amplifications of chromosomes 10 and 21) (Forestier et al, 2007) seemed independent of each other, i.e. the occurrence of one did not change the likelihood of occurrence of the others. Still, 9p deletions seemed to precede 8p deletions, and chromosome X deletions were rooted in 8p deletion according to the oncogenetic branching model (Fig 2). The reported estimated false-positive error rate was 0·0007 and the false-negative error rate was 0·30.
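The notion of independence used above, that the occurrence of one aberration does not change the likelihood of another, can be illustrated with a simple co-occurrence comparison in Python. This sketch is not the Oncotree algorithm and does not reconstruct a tree. It only contrasts the observed joint frequency of two events with the frequency expected if they were independent. The function name and the per-patient sets of event labels are hypothetical.

```python
def cooccurrence(profiles, a, b):
    """Compare observed and expected joint frequency of events a and b.

    `profiles` is a list with one set of event labels per patient.
    Under independence, P(a and b) is close to P(a) * P(b).
    """
    n = len(profiles)
    if n == 0:
        raise ValueError("at least one patient profile is required")
    n_a = sum(1 for p in profiles if a in p)
    n_b = sum(1 for p in profiles if b in p)
    n_ab = sum(1 for p in profiles if a in p and b in p)
    return {
        "p_a": n_a / n,
        "p_b": n_b / n,
        "observed_joint": n_ab / n,
        "expected_joint": (n_a / n) * (n_b / n),
    }
```

With only 62 patients and event frequencies of around 10%, the expected number of co-occurrences is small, so such a comparison is descriptive and would need a formal test before any conclusion is drawn.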

Figure 2.

Oncogenetic tree. A probabilistic model of the sequential order of acquisition of the copy number alterations. The root of the tree represents the initial event, the t(12;21) chromosomal translocation. The numbers at each edge indicate the probabilities of transition along the given edge by the time of observation.

Finally, regions undergoing recurrent aberrations in the patients were also investigated for transcribed miRNAs. miRNAs can regulate protein-coding gene targets, thus they may play an oncogenic or tumour suppressive role, depending on their transcription levels, DNA methylation and CpG content of the surrounding regions, mutations and the genes they regulate (Data S2). The most frequently deleted regions, containing important miRNA clusters, were on 11q and 13q, while the most frequently amplified regions were on chromosomes 21 and X, comprising other miRNA clusters (Data S2 and Table S2).


Discussion

The present study sheds further light on the genetic diversity of ETV6/RUNX1-positive childhood ALL. Overall, the mean number of alterations per patient (2·82, range 0–14) is in agreement with previous studies of ETV6/RUNX1-positive childhood ALL patients (Forestier et al, 2007; Mullighan et al, 2007; Parker et al, 2008; Lilljebjorn et al, 2010; van Delft et al, 2011). Higher specificity was prioritized in an attempt to reduce the rate of false positive findings, at the cost of reduced sensitivity, especially with regard to detection of focal aberrations. Furthermore, due to the size of the study, some recurrent changes may have been missed if they did not occur in at least two patients. On the other hand, the strict criteria increased the reliability of the oncogenetic tree and systems biology analyses, and are furthermore supported by the high concordance with previous results of G-band karyotyping and CGH analyses. The larger alterations (deletions of 12p, 6q, 9p, 11q and 13q, and gains of chromosome 21 and Xq) occurred in at least 10% of the patients, supporting previous reports (Forestier et al, 2007; Mullighan et al, 2007; Kawamata et al, 2008; Lilljebjorn et al, 2007, 2010; Parker et al, 2008; Mullighan et al, 2008; van Delft et al, 2011). Moreover, gain of Xq is far more common in males than in females, which supports the previous findings of Lilljebjorn et al (2007, 2010) and could imply loss of X chromosome silencing of that specific region in females, leading to similar gene dosage effects in both sexes. The affected gene(s) remain to be determined.

Importantly, these non-random aberrations should be interpreted by their putative biological consequence. Here we proposed an alternative grouping of patients based on integrative analysis of recurrent CNAs and involved protein-protein interaction complexes, which may suggest different underlying biological mechanisms. This classification reflects that deletions of different genes, often from different chromosomes, may lead to disruption of the same complex resulting in a similar phenotype. In most cases the loss of a gene, evidenced through CNAs, leads to perturbation of interaction with another gene with high connectivity in the complex, and therefore is likely to have a functional effect.

To speculate on the relationship and order by which these CNAs occur, we used a branching oncogenetic tree model. The model identified deletions on 12p and 14q32 and gain of 21q as important early leukaemogenic events secondary to the ETV6/RUNX1 translocation. Independent events include the most common changes in ETV6/RUNX1-positive childhood ALL patients, such as deletions of 6q, 9p, 13q and X and amplification of chromosome 10, suggesting that these changes occur later in the development of the leukaemic clone. These implications need to be confirmed, if possible, in studies directly observing the sequential order of occurrence of CNAs. A recent study also demonstrated a similar model with data from 164 ETV6/RUNX1-positive childhood ALL patients (Lilljebjorn et al, 2010). The authors demonstrated independence between deletions on 6q, 9p and 12p, although with 6q and 9p closer to the root. Amplification of chromosome 21 was also far down a subtree rooted in the 6q deletion. However, the two models cannot be compared directly because the resolution of analysed aberrations differed between the two studies. Furthermore, Anderson et al (2011) have also recently performed an oncogenetic tree mapping; however, the study only explored pre-selected CNAs at the single-cell level in individual patients. Thus, their results and those from the present study are not directly comparable, but the nonlinear branching nature of the sequence of acquired CNAs is alike.

Finally, we associated recurrent CNAs with miRNAs transcribed in those regions in order to elucidate the consequences for gene regulation of the investigated miRNA targets. Here we observed deletions and amplifications in several chromosomal regions containing miRNAs known to be associated with tumourigenicity, leukaemia and regulation of apoptosis, among others (Data S2). Thus, miRNAs may play a crucial role in childhood ALL by targeting a number of proteins and affecting various functional pathways.

Even though many clinical trials classify ETV6/RUNX1-positive childhood ALL as one group, their cytogenetic diversity calls for refinement in the classification of these patients, which ultimately may lead to a better understanding of their natural history (including determination of pre- and postnatal hits) (Greaves, 2006), the clinical presentation and their specific treatment requirements. Based on CNAs in protein-protein complexes, we described four groups of patients. This analysis could be further refined in the future with a larger number of samples and higher density arrays.


Acknowledgements

First of all, we are very grateful to the patients who participated in the study and their referring physicians. We also thank Hanne Maage and Charlotte Scherling for very helpful technical assistance. Acknowledgements are also made to The RH Microarray Centre/KB4111 – Department of Clinical Biochemistry – The University Hospital Rigshospitalet, Copenhagen for providing technology consultation and laboratory resources. This study has received financial support from the Ministry of Health (Grant no. 2006-12103-250), The Novo Nordisk Foundation, The Danish Research Council for Health and Disease (Grant nos. 271-06-0278 and 271-08-0684) and The Danish Childhood Cancer Foundation. Kjeld Schmiegelow holds The Danish Childhood Cancer Foundation Professorship in Paediatric Oncology.

Author contributions

LB, AW and KS designed the study, interpreted data and drafted the manuscript. LB performed the SNP array analyses in collaboration with RB and FCN, and collected patient samples and clinical data. AW performed the data processing, visualization and biological correlations. TJ performed the miRNA analyses, interpreted miRNA data and provided critical input to the manuscript. MKA was responsible for the G-band karyotyping and CGH data, while OGJ, PSW, FW and BMF provided patient samples. RG and KS contributed to the data analyses and interpretation, together with critical input to the writing of the manuscript. All authors approved the submitted and final versions of the manuscript.

Conflict of interest

The authors have no competing interests.