Common gene expression signatures in t(8;21)- and inv(16)-acute myeloid leukaemia


Dr M. Satake, Institute of Development, Aging and Cancer, Tohoku University, Seiryo-machi 4-1, Aoba-ku, Sendai 980-8575, Japan.


Human acute myeloid leukaemia (AML) involving a core-binding factor (CBF) transcription factor is called CBF leukaemia. In these leukaemias, AML1 (RUNX1, PEBP2αB, CBFα2)-MTG8 (ETO) and CBFβ (PEBP2β)-MYH11 chimaeric proteins are generated by t(8;21) and inv(16) respectively. We analysed gene expression profiles of leukaemic cells by microarray, and selected genes whose expression appeared to be modulated in association with t(8;21) and inv(16). In a pair-wise comparison, 15% of t(8;21)-associated transcripts exhibited high or low expression in inv(16)-AML, and 26% of inv(16)-associated transcripts did so equivalently in t(8;21)-AML. These common elements in gene expression profiles between t(8;21)- and inv(16)-AML probably reflect the situation that AML1-MTG8 and CBFβ-MYH11 chimaeric proteins affect a common set of target genes in CBF leukaemic cells. On the other hand, 38% of t(8;21)-associated and 24% of inv(16)-associated transcripts were regulated in t(8;21)- and inv(16)-specific manners. These distinct features of t(8;21)- and inv(16)-associated genes correlate with the bimodular structures of the chimaeric proteins (CBF-related AML1 and CBFβ portions, and CBF-unrelated MTG8 and MYH11 portions).

Human acute myeloid leukaemia (AML) is a heterogeneous group of diseases. Diagnosis of AML-subtypes is of clinical importance, as they vary in their responsiveness to therapy and in prognosis. These subtypes are recognised by the morphology of leukaemic cells and classified using the French–American–British (FAB) system. Each FAB subtype corresponds to the differentiation blockage of leukaemic cells at a specific stage in a certain lineage. In addition, various types of chromosomal rearrangements are seen in AML, and a particular type of chromosomal translocation is sometimes associated with a FAB subtype. Among the genes affected by such rearrangements, those most frequently encountered in the clinic are the genes encoding the core-binding factor (CBF; PEBP2) transcription factor (Look, 1997; Speck & Gilliland, 2002). AML involving CBF is known as CBF leukaemia. The AML1 (RUNX1, PEBP2αB, CBFα2) and CBFβ (PEBP2β) proteins constitute the DNA-binding and non-DNA-binding subunits, respectively, of the heterodimeric CBF. Due to chromosomal rearrangements, the AML1-MTG8 (ETO) and CBFβ-MYH11 chimaeric genes are generated in t(8;21)-AML with an M2 subtype and in inv(16)-AML with an M4Eo subtype respectively (Miyoshi et al, 1991, 1993; Erickson et al, 1992; Liu et al, 1993).

Functional studies using reporter assays indicate that the AML1-MTG8 and CBFβ-MYH11 chimaeric polypeptides exhibit dominant negative activity against CBF-dependent transcription (Peterson & Zhang, 2004; Shigesada et al, 2004). Furthermore, when cDNAs of AML1-MTG8 and CBFβ-MYH11 are knocked into the respective Aml1 and Cbfβ loci, knocked-in heterozygotes exhibit phenotypes essentially similar to those seen in Aml1(−/−) and Cbfβ(−/−) mice (Castilla et al, 1996; Yergeau et al, 1997). These phenotypes do not include leukaemia but rather involve the failure to develop definitive-type haematopoiesis, in strong support of the hypothesis that the chimaeric proteins function as dominant negatives. This observation suggests that the two chimaeric proteins modulate expression of a common set of target genes in CBF leukaemic cells.

Alternatively, AML1-MTG8 and CBFβ-MYH11 proteins could also modulate gene expression in a specific fashion, as the amino acid sequence of the C-terminal portion of each chimaeric protein is not homologous to CBF or to that of the other chimaera. MTG8 belongs to the Nervy family of proteins (Kitabayashi et al, 1998) and reportedly interacts with various factors including transcriptional co-repressors (Peterson & Zhang, 2004). On the other hand, the MYH11 portion of the chimaera represents a rod domain seen in smooth muscle myosin heavy chain protein.

Several attempts to correlate karyotypic classification of AML with the gene expression profiles have been reported. Such studies have focused on t(8;21), t(15;17), inv(16), 11q23-alteration and normal karyotypes (Schoch et al, 2002; Debernardi et al, 2003; Kohlmann et al, 2003; Bullinger et al, 2004; Ross et al, 2004; Gutierrez et al, 2005). A recent report classified all manifestations of AML into 16 distinct subgroups based on gene expression profiles (Valk et al, 2004). Given that the AML1-MTG8 and CBFβ-MYH11 proteins are structurally bimodular, this study aimed to examine whether these chimaeric proteins regulate common as well as unique sets of targets in CBF leukaemia. To do so, we analysed gene expression profiles of AML clinical samples by microarray and found that several genes were commonly regulated in t(8;21)- and inv(16)-AML, and that others were regulated in t(8;21)-specific and inv(16)-specific manners. Below we describe our approach and discuss implications of our results.

Materials and methods

Patient samples

A total of 50 paediatric AML patients were enrolled in this study (Table SI). Among these, 45 were derived from 54 patients reported previously (nine of the 54 patients were excluded because their FAB subtype or karyotype information could not be obtained) (Yagi et al, 2003). The other five patients (U06, U09, U11, U20 and U21) were included in this study to increase the number of samples grouped as G2, G3 and G4 (see below). Four normal bone marrow samples enrolled in this study were also described previously (Yagi et al, 2003). This study was approved by the ethics committee of the National Cancer Centre and conducted according to tenets of the Declaration of Helsinki. Informed consent was obtained from each patient.

Microarray and statistical analysis

The microarray used in this study was Human Genome U95Av2 (Affymetrix, Santa Clara, CA, USA) containing 12 566 probe sets. For the 45 previously reported samples, the scanned image data obtained previously (Yagi et al, 2003) were re-used. Microarray analysis of the newly included samples was performed as described (Yagi et al, 2003). The analysis included preparation of mononuclear cells from bone marrow or peripheral blood, total RNA isolation, monitoring RNA integrity, preparation of biotin-labelled cRNA from total RNA, hybridisation to the microarray, and washing, staining and scanning of samples. Scanned image data were processed using Affymetrix Microarray Suite software version 5.0, and an expression value (signal) of each probe set was calculated and normalised, such that the mean of signal values in each experiment was 100, to adjust for minor differences between experiments. Statistical analyses and fold change calculations were performed using expression values that were log-transformed after the addition of 10. Hierarchical clustering analysis and matrix presentation were performed using cluster and tree view software.
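The scaling and transformation steps described above can be illustrated with a minimal sketch. It assumes that each array is scaled by a single multiplicative factor and that the logarithm is taken to base 2; the text specifies neither, and the function names and example signals are hypothetical.

```python
import math

def scale_to_mean(signals, target_mean=100.0):
    """Scale one array's signals so that their arithmetic mean equals target_mean."""
    factor = target_mean / (sum(signals) / len(signals))
    return [s * factor for s in signals]

def log_transform(signals, offset=10.0):
    """Add the offset (10 in the text), then take log2 (the base is an assumption)."""
    return [math.log2(s + offset) for s in signals]

def fold_change(test_logs, reference_logs):
    """Fold change computed from the difference of mean log2 values."""
    diff = (sum(test_logs) / len(test_logs)
            - sum(reference_logs) / len(reference_logs))
    return 2.0 ** diff

# Hypothetical raw signals for one array.
raw = [50.0, 150.0, 400.0]
scaled = scale_to_mean(raw)
logged = log_transform(scaled)
```

Adding a constant before the log transformation damps the influence of very low signals, for which ratios would otherwise be unstable.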

To validate selected genes, microarray data of adult AML reported by Valk et al (2004) were downloaded from the NCBI Gene Expression Omnibus database (GSE1159), and the data of 222 AML and eight normal samples were used after excluding data from 57 samples whose FAB subtype or karyotype information was not given. Promoter analysis, including prediction of a transcription initiation site, extraction of genomic sequence around the initiation site, and assignment of transcription factor-binding sites, was performed using the genomatixsuite software (Genomatix, Munich, Germany). To assign AML1-, POU4F1-, and HOXB2-binding sites, the V$AML1.01, V$BRN3.01, and V$HOXA9.01 matrices in the software were used respectively.

RT-PCR analysis

For semi-quantitative RT-PCR analysis, cDNA was prepared from 0·5 μg of total RNA, and one fiftieth of the cDNA was used as template for each PCR reaction. Forward and reverse primers were designed using genetyx mac 9.0/search primer software. Sequences of forward and reverse primers were: CD34, 5′-ATTTCCTGATGAATCGCCGC-3′ and 5′-GCCTTTCCCTGAGCCTCAGG-3′; CAV1, 5′-ACCTCAACGATGACGTGGTC-3′ and 5′-CAAGTTGATGCGGACATTGC-3′; CLIPR-59, 5′-GTCTTCGCACCAGCATCCCG-3′ and 5′-AGGTTTCTGATCCAGGGTTG-3′; and HOXA9, 5′-GCACCGCTTTTTCCGAGTG-3′ and 5′-GCGGTGTACCACCACCATC-3′. PCR using Amplitaq Gold (Applied Biosystems, Foster City, CA, USA) was performed using conditions appropriate for each transcript. PCR products were run on agarose gels.

Results


Selection of t(8;21)- and inv(16)-associated genes

A goal of this study was to extract genes whose expression was modulated in the presence of t(8;21) and inv(16) and to evaluate how many genes were similarly up or downregulated in t(8;21)- and inv(16)-AML. For this purpose, we used microarray gene expression data of 50 paediatric AML patients including eight cases of t(8;21) and seven cases of inv(16). Initially, unsupervised hierarchical clustering analysis was performed as shown in Fig 1, in which samples are identified together with their FAB-subtype. All eight t(8;21) samples and six of seven inv(16) samples (all except sample S12) were clustered as distinct groups, indicating that t(8;21)- and inv(16)-AML each constitute an independent AML subgroup. In addition, the proximity of the t(8;21) and inv(16) clusters suggests that there may be a common element in their gene expression.

Figure 1.

Unsupervised two-dimensional hierarchical clustering analysis of 50 acute myeloid leukaemia (AML) and four normal bone marrow samples. Expression data were used for 804 probe sets that met the following two criteria: their median expression in AML samples was twofold higher than that in normal bone marrow samples, and their expression in the fifth-highest AML sample differed more than fivefold from that in the fifth-lowest AML sample. Thus, these 804 probe sets represent those whose expression varied highly across the samples. Each row represents a respective probe set, and each column a respective sample. Relative expression levels normalised to the average for each probe set are indicated by colour, where red and green represent increased and decreased expression respectively. At the top, the group number (G), FAB-classification (M) and the identification of the AML sample are indicated. Relationships between samples are shown by dendrograms. It must be noted that clustering analyses were also performed using different sets of probes that were selected by varying the criteria. In most of those analyses, t(8;21) and inv(16) samples were clustered as distinct groups, and they often neighboured each other. Peculiarly, the inv(16) sample S12 always behaved as an outlier, both in the clustering analysis shown and in those not shown. The reason for this is not obvious, and we did not notice any particular clinical or karyotypic features of S12 other than AML-M4 with inv(16).
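The two probe-set criteria in the legend to Fig 1 can be written as a short filter. The following is only a sketch under stated assumptions: it operates on untransformed, positive signal values, it summarises the normal samples by their median, and it reads "twofold higher" as "at least twofold". The function and parameter names are hypothetical.

```python
from statistics import median

def is_variable_probe(aml_values, normal_values, rank=5,
                      median_ratio=2.0, spread_ratio=5.0):
    """Return True if a probe set passes both criteria of the Fig 1 legend."""
    ordered = sorted(aml_values)
    low = ordered[rank - 1]    # fifth-lowest AML value
    high = ordered[-rank]      # fifth-highest AML value
    # Criterion 1: median AML expression is twofold higher than normal (assumed >=).
    elevated = median(aml_values) >= median_ratio * median(normal_values)
    # Criterion 2: more than fivefold difference between the two ranked values.
    spread = high > spread_ratio * low
    return elevated and spread
```

Using the fifth-ranked values rather than the extremes makes the spread criterion less sensitive to a single outlying sample.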

To extract t(8;21)- and inv(16)-associated genes, we first categorised AML samples into five groups, G1–G5, according to FAB subtype and karyotype [G1, M2 with t(8;21); G2, other M2; G3, M4 with inv(16); G4, other M4; and G5, other FAB subtypes]. Sample numbers in G1, G2, G3, G4 and G5 were 8, 5, 7, 7 and 23 respectively. The method of gene extraction employed is shown schematically in Fig 2A. First, G1 was compared with G2 and also with a combined group of G2 + G4 + G5, and transcripts were extracted whose average expression in G1 was more than twofold higher with P < 0·01 according to Student's t-test. As both G1 and G2 belong to the same M2-subtype, transcripts extracted by comparing G1 and G2 are considered to represent those associated with the t(8;21) abnormality but not with the M2-subtype. Transcripts extracted by these two rounds of comparison were defined as t(8;21)-associated highly expressed transcripts. Similarly, inv(16)-associated highly expressed transcripts were extracted by two rounds of comparison between G3 and G4, and between G3 and the G2 + G4 + G5 group. t(8;21)- and inv(16)-associated low expression transcripts, whose average expression was more than twofold lower with P < 0·01, were also extracted. G3 samples were not used to extract t(8;21)-associated transcripts, and G1 samples were not used to extract inv(16)-associated transcripts. Thus, t(8;21)- and inv(16)-associated genes were selected independently of each other. As summarised in Fig 2B, 59 t(8;21)-associated highly expressed transcripts, 58 inv(16)-associated highly expressed transcripts, 15 t(8;21)-associated low expression transcripts, and 18 inv(16)-associated low expression transcripts were selected.
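The two-round extraction can be sketched for a single probe set as follows. This is an illustration rather than the software actually used: it assumes log2-transformed values, a pooled-variance (Student's) t-test with a two-sided P value, and a P value obtained by numerical integration of the t density so that only the standard library is needed. All function names are hypothetical.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided P value via Simpson integration of the density over [0, |t|]."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

def student_t_test(a, b):
    """Pooled-variance t-test; returns (t statistic, two-sided P value)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, two_sided_p(t, df)

def passes(test, ref, direction="high", min_fold=2.0, alpha=0.01):
    """One comparison on log2 values: more than twofold change and P < alpha."""
    log_ratio = mean(test) - mean(ref)
    if direction == "low":
        log_ratio = -log_ratio
    _, p = student_t_test(test, ref)
    return log_ratio > math.log2(min_fold) and p < alpha

def t821_associated(g1, g2, g4, g5, direction="high"):
    """Both rounds must pass: G1 vs G2, and G1 vs the pooled G2 + G4 + G5."""
    return (passes(g1, g2, direction)
            and passes(g1, g2 + g4 + g5, direction))
```

The inv(16)-associated transcripts would be obtained in the same way by substituting G3 for G1 and G4 for G2 in the first round. Note that G3 never enters the t(8;21) comparison, which is what makes the two selections independent.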

Figure 2.

(A) Schematic illustration of the gene extraction procedure. Test (G1 or G3) and reference (G2, G4 or G2 + G4 + G5) samples were subjected to pair-wise comparisons as indicated by the double-headed arrows. To extract t(8;21)-associated transcripts, comparisons were made between G1 and G2 and between G1 and G2 + G4 + G5. Genes extracted in both rounds of comparison were defined as t(8;21)-associated. Inv(16)-associated transcripts were extracted similarly. The criterion for gene extraction was that the average expression of test samples was more than twofold higher or lower than that of reference samples, with P < 0·01. (B) Venn diagram comparison of transcripts extracted by the method described in A. Indicated are the numbers of t(8;21)-associated highly expressed, inv(16)-associated highly expressed, t(8;21)-associated low expression and inv(16)-associated low expression transcripts, consisting of 59 (53 + 6), 58 (52 + 6), 15 (13 + 2) and 18 (16 + 2) respectively. (C) Modification of the extraction procedure. The starting material was the genes shown in panel B, and each criterion defined in the broken rectangles was applied to them. As a result, 17 common, 25 t(8;21)-specific and 15 inv(16)-specific highly expressed transcripts, and 6 common, 3 t(8;21)-specific and 3 inv(16)-specific low expression transcripts were reclassified. The numbers of transcripts shown in panels B and C represent numbers of probe sets.

Selection of commonly and specifically regulated genes

The gene extraction method shown in Fig 2A identified six highly expressed and two low expression transcripts associated with both t(8;21) and inv(16) (Fig 2B). A review of the extracted genes, however, suggested that the number of commonly regulated transcripts was probably underestimated. For example, 19 of the 58 inv(16)-associated highly expressed transcripts showed more than twofold greater expression not only in G3 but also in G1 (data not shown). This is not surprising, as the extraction procedure described above adopted the strict standard of P < 0·01. Therefore, we modified the gene extraction procedure as shown in Fig 2C with the goal of selecting transcripts both commonly and specifically up or downregulated in the presence of t(8;21)/inv(16). In the common group, transcripts showing more than a twofold increase or decrease in expression at P < 0·05 rather than P < 0·01 were added to the previously described six and two common transcripts. Thus, 17 and 6 transcripts were defined as commonly increased and decreased in expression respectively. In addition, transcripts whose average expression in G3 was more than 1·25-fold higher compared with G4 or G2 + G4 + G5 were excluded from the t(8;21)-associated highly expressed transcripts, and the remaining 25 transcripts were defined as t(8;21)-specific highly expressed transcripts. Using similar modifications, inv(16)-specific highly expressed transcripts (n = 15), t(8;21)-specific low expression transcripts (n = 3), and inv(16)-specific low expression transcripts (n = 3) were defined.
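The reclassification of a t(8;21)-associated highly expressed transcript can be sketched as a simple decision rule. This sketch assumes that the relaxed criterion (more than twofold, P < 0·05) must hold in both G3 comparisons; the text does not state this explicitly. The function name is hypothetical, and the P values in the usage example are invented for illustration because Table I reports ratios only.

```python
def reclassify_t821_high(g3_vs_g4, g3_vs_rest, p_g3_vs_g4, p_g3_vs_rest):
    """Reclassify a t(8;21)-associated highly expressed transcript (Fig 2C).

    g3_vs_g4 and g3_vs_rest are ratios of average expression (G3/G4 and
    G3/G2 + G4 + G5); p_* are the corresponding t-test P values.
    """
    also_high_in_inv16 = (g3_vs_g4 > 2.0 and g3_vs_rest > 2.0
                          and p_g3_vs_g4 < 0.05 and p_g3_vs_rest < 0.05)
    if also_high_in_inv16:
        return "common"
    # More than 1.25-fold higher in G3 in either comparison: excluded.
    if g3_vs_g4 > 1.25 or g3_vs_rest > 1.25:
        return "unclassified"
    return "t(8;21)-specific"

# Ratios from Table I; the P values are hypothetical placeholders.
cav1 = reclassify_t821_high(1.09, 0.99, 0.60, 0.90)
cd34 = reclassify_t821_high(3.27, 2.20, 0.02, 0.03)
```

Under this rule a transcript with a G3/G4 ratio of exactly 1·25 is retained as specific, in line with the threshold being "more than 1·25-fold".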

According to our criteria, eight of the 59 t(8;21)-associated highly expressed transcripts were also highly expressed in inv(16)-AML, and three of the 15 t(8;21)-associated low expression transcripts also showed low expression in inv(16)-AML. Similarly, 15 of the 58 inv(16)-associated highly expressed transcripts were also highly expressed in t(8;21)-AML, and five of the 18 inv(16)-associated low expression transcripts also showed low expression in t(8;21)-AML. Overall, 15% (11/74) of t(8;21)-associated transcripts exhibited similar expression in inv(16)-AML, and 26% (20/76) of inv(16)-associated transcripts did so in t(8;21)-AML. On the other hand, 38% [(25 + 3)/74] of t(8;21)-associated and 24% [(15 + 3)/76] of inv(16)-associated transcripts appeared to be regulated in t(8;21)- and inv(16)-specific manners respectively. In summary, these results indicate that there exists a significant number of commonly regulated transcripts in addition to those regulated specifically in either t(8;21)- or inv(16)-AML. Although genes whose expression is unique to either t(8;21)- or inv(16)-AML have been reported repeatedly, this is the first demonstration of gene expression signatures common to both AML-subtypes.
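The percentages quoted above follow directly from the transcript counts, as the short check below shows. The last two lines rest on an inference that is not stated in the text: the totals of 17 and 6 common transcripts are consistent with the six (high) and two (low) transcripts of Fig 2B being counted in both the t(8;21)- and the inv(16)-derived sets. The variable names are hypothetical.

```python
def percent(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)

t821_total = 59 + 15    # t(8;21)-associated: high + low = 74
inv16_total = 58 + 18   # inv(16)-associated: high + low = 76

shared_t821 = percent(8 + 3, t821_total)       # also regulated in inv(16)-AML
shared_inv16 = percent(15 + 5, inv16_total)    # also regulated in t(8;21)-AML
specific_t821 = percent(25 + 3, t821_total)
specific_inv16 = percent(15 + 3, inv16_total)

# Inclusion-exclusion over the two associated sets (an inference, see above).
common_high = 8 + 15 - 6
common_low = 3 + 5 - 2
```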

It must be noted that in some cases multiple probe sets were used for a specific gene. Taking redundancy into account, the numbers of individually selected genes were as follows: 15 common, 21 t(8;21)-specific and 13 inv(16)-specific genes as highly expressed; and 6 common, 3 t(8;21)-specific and 3 inv(16)-specific genes as low expressed. Table I details a list of these genes and includes the ratios of average expression values.

Table I.   Transcripts whose expression was modulated commonly to t(8;21)- and inv(16)-AML and specifically to each.

Ratio values highlighted by pink and blue represent more than twofold high and low expression respectively.

Group | Probe set | Public ID | Gene symbol | Gene name | Ratio G1/G2 | Ratio G1/G2 + 4 + 5 | Ratio G3/G4 | Ratio G3/G2 + 4 + 5
--- | --- | --- | --- | --- | --- | --- | --- | ---
High, common | 36650_at | D13639 | CCND2 | Cyclin D2 | 2·46 | 2·22 | 3·73 | 2·84
High, common | 38747_at | M81945 | CD34 | CD34 antigen | 7·07 | 6·10 | 3·27 | 2·20
High, common | 538_at | S53911 | CD34 | CD34 antigen | 4·24 | 3·89 | 2·82 | 2·02
High, common | 479_at | U53446 | DAB2 | Disabled homologue 2, mitogen-responsive phosphoprotein | 2·53 | 2·29 | 2·37 | 2·28
High, common | 37762_at | Y07909 | EMP1 | Epithelial membrane protein 1 | 2·35 | 2·20 | 5·16 | 3·65
High, common | 38052_at | M14539 | F13A1 | Coagulation factor XIII, A1 polypeptide | 2·72 | 2·75 | 3·32 | 3·47
High, common | 39070_at | U03057 | FSCN1 | Fascin homologue 1 | 3·06 | 4·15 | 6·25 | 2·94
High, common | 38833_at | X00457 | HLA-DPA1 | Major histocompatibility complex, class II, DP alpha 1 | 2·85 | 2·92 | 3·10 | 2·22
High, common | 38095_i_at | M83664 | HLA-DPB1 | Major histocompatibility complex, class II, DP beta 1 | 2·83 | 4·38 | 4·54 | 2·82
High, common | 38096_f_at | M83664 | HLA-DPB1 | Major histocompatibility complex, class II, DP beta 1 | 2·82 | 3·98 | 4·37 | 2·67
High, common | 32773_at | AA868382 | HLA-DQA2 | Major histocompatibility complex, class II, DQ alpha 2 | 2·25 | 2·28 | 2·43 | 2·21
High, common | 41723_s_at | M32578 | HLA-DRB1 | Major histocompatibility complex, class II, DR beta 1 | 4·17 | 3·55 | 3·11 | 2·47
High, common | 37749_at | D78611 | MEST | Mesoderm specific transcript homologue (mouse) | 2·11 | 2·23 | 6·06 | 4·10
High, common | 37283_at | X82209 | MN1 | Meningioma (disrupted in balanced translocation) 1 | 3·11 | 5·01 | 23·08 | 12·00
High, common | 35523_at | AF150241 | PGDS | Prostaglandin D2 synthase, haematopoietic | 4·11 | 9·33 | 5·22 | 3·13
High, common | 32905_s_at | M30038 | TPSAB1 | Tryptase alpha/beta 1 | 9·53 | 19·72 | 26·33 | 18·17
High, common | 32323_at | M63582 | TRH | Thyrotropin-releasing hormone | 21·71 | 41·54 | 4·55 | 2·75
High, t(8;21)-specific | 34512_at | J03853 | ADRA2C | Adrenergic, alpha-2C-, receptor | 3·45 | 2·55 | 1·00 | 0·98
High, t(8;21)-specific | 37543_at | D25304 | ARHGEF6 | Rac/Cdc42 guanine nucleotide exchange factor (GEF) 6 | 2·08 | 2·22 | 0·87 | 1·00
High, t(8;21)-specific | 41690_at | AL049471 | ARID5B | AT rich interactive domain 5B (MRF1-like) | 2·52 | 2·79 | 1·16 | 1·03
High, t(8;21)-specific | 36119_at | AF070648 | CAV1 | Caveolin 1, caveolae protein, 22 kDa | 18·57 | 15·99 | 1·09 | 0·99
High, t(8;21)-specific | 39038_at | AF093118 | FBLN5 | Fibulin 5 | 2·92 | 3·29 | 1·07 | 0·99
High, t(8;21)-specific | 37251_s_at | AF016004 | GPM6B | Glycoprotein M6B | 2·79 | 2·22 | 0·65 | 0·82
High, t(8;21)-specific | 32845_at | M85289 | HSPG2 | Heparan sulfate proteoglycan 2 (perlecan) | 4·04 | 2·53 | 1·10 | 1·05
High, t(8;21)-specific | 406_at | X53587 | ITGB4 | Integrin, beta 4 | 2·66 | 2·17 | 1·17 | 1·12
High, t(8;21)-specific | 33226_at | AB020683 | JMJD2B | Jumonji domain containing 2B | 3·26 | 2·74 | 0·58 | 0·74
High, t(8;21)-specific | 39237_at | U43784 | MAPKAPK3 | Mitogen-activated protein kinase-activated protein kinase 3 | 2·57 | 2·16 | 1·16 | 1·10
High, t(8;21)-specific | 38803_at | AF052142 | NCALD | Neurocalcin delta | 2·37 | 2·20 | 1·20 | 1·14
High, t(8;21)-specific | 40081_at | L26232 | PLTP | Phospholipid transfer protein | 2·07 | 2·02 | 1·03 | 1·07
High, t(8;21)-specific | 36980_at | U03105 | PNRC1 | Proline-rich nuclear receptor coactivator 1 | 2·38 | 2·08 | 1·03 | 1·22
High, t(8;21)-specific | 35939_s_at | L20433 | POU4F1 | POU domain, class 4, transcription factor 1 | 8·64 | 7·36 | 1·09 | 1·05
High, t(8;21)-specific | 35940_at | X64624 | POU4F1 | POU domain, class 4, transcription factor 1 | 120·05 | 80·04 | 1·05 | 0·93
High, t(8;21)-specific | 32176_at | AB011110 | RASA4 | RAS p21 protein activator 4 | 3·12 | 2·68 | 0·72 | 0·83
High, t(8;21)-specific | 35638_at | D43638 | RUNX1T1/MTG8 | Runt-related transcription factor 1/MTG8 | 11·12 | 9·39 | 0·95 | 0·89
High, t(8;21)-specific | 36192_at | D83777 | SCRN1 | Secernin 1 | 6·57 | 4·37 | 0·48 | 0·50
High, t(8;21)-specific | 41246_at | AL040655 | SERPINE2 | Serine (or cysteine) proteinase inhibitor, member 2 | 2·40 | 2·45 | 1·25 | 0·98
High, t(8;21)-specific | 38118_at | U73377 | SHC1 | SHC transforming protein 1 | 3·09 | 2·28 | 0·87 | 0·93
High, t(8;21)-specific | 38997_at | X96924 | SLC25A1 | Solute carrier family 25, member 1 | 3·98 | 2·92 | 0·80 | 0·91
High, t(8;21)-specific | 38998_g_at | X96924 | SLC25A1 | Solute carrier family 25, member 1 | 2·89 | 2·58 | 0·83 | 0·91
High, t(8;21)-specific | 39825_at | U25147 | SLC25A1 | Solute carrier family 25, member 1 | 2·58 | 2·18 | 0·87 | 0·96
High, t(8;21)-specific | 1953_at | AF024710 | VEGF | Vascular endothelial growth factor | 3·81 | 2·46 | 0·48 | 0·66
High, t(8;21)-specific | 36100_at | AF022375 | VEGF | Vascular endothelial growth factor | 2·49 | 2·14 | 0·64 | 0·80
High, inv(16)-specific | 34246_at | AA418437 | C6orf145 | Chromosome 6 open reading frame 145 | 0·91 | 0·95 | 2·66 | 2·09
High, inv(16)-specific | 33569_at | D50532 | CLEC10A | C-type lectin domain family 10, member A | 0·67 | 0·85 | 3·55 | 3·25
High, inv(16)-specific | 36095_at | N99340 | CLIPR-59 | CLIP-170-related protein | 1·14 | 1·10 | 5·43 | 4·88
High, inv(16)-specific | 40936_at | AI651806 | CRIM1 | Cysteine-rich motor neuron 1 | 0·46 | 0·84 | 4·05 | 3·29
High, inv(16)-specific | 37187_at | M36820 | CXCL2 | Chemokine (C-X-C motif) ligand 2 | 0·59 | 1·03 | 6·53 | 3·62
High, inv(16)-specific | 39610_at | X16665 | HOXB2 | Homeo box B2 | 0·61 | 0·72 | 4·73 | 3·17
High, inv(16)-specific | 2058_s_at | M35011 | ITGB5 | Integrin, beta 5 | 1·13 | 0·99 | 2·58 | 2·44
High, inv(16)-specific | 32582_at | X69292 | MYH11 | Myosin, heavy polypeptide 11, smooth muscle | 0·92 | 0·92 | 2·87 | 3·11
High, inv(16)-specific | 37407_s_at | AF013570 | MYH11 | Myosin, heavy polypeptide 11, smooth muscle | 1·05 | 1·08 | 3·57 | 4·06
High, inv(16)-specific | 31886_at | X55740 | NT5E | 5′-nucleotidase, ecto (CD73) | 1·06 | 1·04 | 4·09 | 3·93
High, inv(16)-specific | 38862_at | Y11215 | SCAP1 | src family associated phosphoprotein 1 | 0·74 | 1·04 | 2·07 | 2·14
High, inv(16)-specific | 2092_s_at | J04765 | SPP1 | Secreted phosphoprotein 1 (osteopontin) | 0·83 | 0·84 | 2·89 | 2·22
High, inv(16)-specific | 34342_s_at | AF052124 | SPP1 | Secreted phosphoprotein 1 (osteopontin) | 0·56 | 0·54 | 3·96 | 2·93
High, inv(16)-specific | 41531_at | AI445461 | TM4SF1 | Transmembrane 4 superfamily member 1 | 0·73 | 1·17 | 5·23 | 4·54
High, inv(16)-specific | 892_at | M90657 | TM4SF1 | Transmembrane 4 superfamily member 1 | 0·68 | 0·97 | 3·24 | 3·05
Low, common | 40585_at | D25538 | ADCY7 | Adenylate cyclase 7 | 0·39 | 0·30 | 0·39 | 0·48
Low, common | 37105_at | M16117 | CTSG | Cathepsin G | 0·07 | 0·26 | 0·03 | 0·14
Low, common | 41448_at | AC004080 | HOXA10 | Homeo box A10 | 0·24 | 0·34 | 0·33 | 0·47
Low, common | 873_at | M26679 | HOXA5 | Homeo box A5 | 0·40 | 0·35 | 0·18 | 0·32
Low, common | 37809_at | U41813 | HOXA9 | Homeo box A9 | 0·09 | 0·11 | 0·05 | 0·15
Low, common | 32193_at | AF030339 | PLXNC1 | Plexin C1 | 0·32 | 0·30 | 0·24 | 0·31
Low, t(8;21)-specific | 36802_at | M23197 | CD33 | CD33 antigen (gp67) | 0·46 | 0·50 | 0·89 | 0·92
Low, t(8;21)-specific | 39664_at | U28413 | ERCC8 | Excision repair deficiency, complementation group 8 | 0·37 | 0·50 | 1·32 | 0·95
Low, t(8;21)-specific | 1463_at | M93425 | PTPN12 | Protein tyrosine phosphatase, non-receptor type 12 | 0·43 | 0·47 | 0·87 | 1·08
Low, inv(16)-specific | 41175_at | L20298 | CBFB | Core-binding factor, beta subunit | 1·26 | 1·15 | 0·25 | 0·30
Low, inv(16)-specific | 35282_r_at | M33680 | CD81 | CD81 antigen | 2·11 | 1·07 | 0·26 | 0·29
Low, inv(16)-specific | 41396_at | AB006629 | CYLN2 | Cytoplasmic linker 2 | 1·05 | 1·01 | 0·50 | 0·35

Evaluation of selected genes

Expression data of selected genes were processed for matrix presentation, as shown in Fig 3. For this analysis, samples and probe sets were aligned in alphabetical and numerical order respectively, within each group. Each member of the G1 and G3 groups was clearly separated from the other groups in terms of the expression of selected genes. Also, the t(8;21)-specific, inv(16)-specific and commonly regulated genes showed patterns unique to the relevant groups, indicating that the method of gene extraction functioned effectively.

Figure 3.

 Matrix presentation of expression of 69 selected transcripts representing 57 high- and 12 low-expression examples in 50 AML and four normal bone marrow samples. Each row represents a respective gene, and each column is a respective patient. Relative levels of expression are indicated by colour, where red and green represent increased and decreased expression respectively. FAB classification is indicated at the top, and grouping of gene expression is on the left. Gene symbols are shown on the right.

We next investigated whether the expression evaluated by the microarray analysis reflected RNA levels. To do so, four representative transcripts were chosen: CD34 from the common highly expressed group, CAV1 from t(8;21)-specific highly expressed genes, CLIPR-59 from inv(16)-specific highly expressed genes, and HOXA9 from the common low expression group. cDNA was synthesised from RNA from 17 samples and processed for semi-quantitative RT-PCR (Fig 4). Overall the relative level of each transcript in a respective sample paralleled the relative expression values obtained by microarray analysis (see also the legend to Fig 4).

Figure 4.

Semi-quantitative RT-PCR. Relative amounts of CD34, CAV1, CLIPR-59 and HOXA9 transcripts were compared among the G1, G2, G3, G4 and G5 groups. At the top, the group number (G) and the identification of the AML samples are indicated. Microarray expression values (signal values) are also shown beneath the gel pictures. The G5 samples shown here appear to be rather exceptional cases in terms of their expression of CD34 and CAV1. Owing to insufficient RNA, samples other than those shown could not be processed for RT-PCR.

A control analysis was performed to evaluate the significance of the common gene expression signatures that were found in t(8;21)- and inv(16)-AML. This was done by examining how many genes might be selected as commonly regulated between the G2 and G4 subgroups. G2 and G4 were each used as test samples, and G1, G3 and G1 + G3 + G5 were used as reference samples. The gene extraction procedure used (Fig S1) was similar to that shown in Fig 2, and the selected genes are listed in Table SII. Only three transcripts (two genes) were selected as commonly highly expressed, and no transcript was extracted as commonly low expressed (5 G2-specific and 26 G4-specific highly expressed transcripts and 10 G2-specific and 11 G4-specific low expression transcripts were selected at the same time). The number of genes common to G2 and G4 was much smaller than that common to t(8;21) and inv(16) [23 transcripts (21 genes)]. This indicates, firstly, that very few common elements exist in the gene expression of the G2 and G4 subgroups and, secondly, that the 23 transcripts described above were not selected by chance but probably reflect common features of t(8;21)- and inv(16)-AML.

Validation of selected genes using a different set of microarray data

We next evaluated whether the selected genes were valid indicators of t(8;21)- and inv(16)-AML activities. To do so we employed another set of microarray data from AML patients reported by Valk et al (2004). Their data set consisted of 285 patients and contained information on FAB-classification and karyotype for each AML sample. We used the data of 222 of those patients and excluded the other 63 due to lack of FAB subtype and karyotype information. According to our classification, the numbers of AML samples were 20 in G1, 35 in G2, 14 in G3, 31 in G4 and 122 in G5.

Figure 5 shows a matrix presentation of the expression data. Each of the aforementioned six categories of genes again appeared to behave as a distinct cluster, suggesting that most genes selected were indeed modulated in the presence of t(8;21) and inv(16) activities. Genes selected as inv(16)-specific highly expressed, however, did not yield such reproducible results. Half appeared to be high in the G3 samples, whereas the other half did not. Although the reason for this discrepancy is not clear, the samples used in our study were from paediatric patients, whereas those in Valk et al (2004) were from adults. In the case of inv(16)-AML, factors regulating gene expression may differ between childhood and adulthood AML (see the legend to Fig 5 for a comparison of our analysis with that of Valk et al, 2004).

Figure 5.

Validation of selected genes using microarray data of adult acute myeloid leukaemia (AML) patients obtained using the Affymetrix Human Genome U133A microarray, provided by Valk et al (2004). The 222 AML samples consisted of 20 G1, 35 G2, 14 G3, 31 G4, and 122 G5 samples, and there were eight normal bone marrow samples including four fractionated CD34+ cell samples. Relative expression levels are indicated by colour, where red and green represent increased and decreased expression respectively. FAB-classification is at the top, and gene expression grouping is indicated on the left. Gene symbols are shown on the right. Valk et al (2004) reported the identification of t(8;21)- and inv(16)-unique transcripts in their Supplementary Tables M1 and I1. We compared their Tables with our Table I and evaluated the degree of overlap of selected genes. Among the 26 t(8;21)- and 28 inv(16)-unique genes reported by Valk et al (2004) (the original 40 and 40 probes shown in their Tables were reclassified to 26 and 28 genes respectively, after taking into consideration the redundancy of probes for a specific gene and the difference between their U133A and our U95Av2 microarrays), 10 and 6 were also included in our list of t(8;21)- and inv(16)-specific genes respectively. This suggests that a significant number of identical genes were selected by the two different analyses. In contrast, only one of the 26 and one of the 28 genes were contained in our list of genes common to both t(8;21) and inv(16). This indicates that our method for extracting common signatures worked efficiently.

Analysis of AML1-binding sites in a promoter region

Finally, we investigated whether the promoter region of a selected gene harbors AML1-binding sites. Only highly expressed genes were examined, as the number of low expression genes was relatively small. Putative AML1-binding sites were evaluated in a genomic region covering −2000 to +500 bp with respect to the predicted transcription initiation site, using genomatixsuite software. We determined the number of AML1-binding sites for a specific promoter and calculated the average ± standard deviation for highly expressed genes of the common, t(8;21)-specific and inv(16)-specific groups (Table II). As references, 44 and 46 probe sets were used whose expression values showed approximately the average and median respectively, of all probe sets on the microarray. As seen in Table II, a significant difference was not detected in the average number of AML1-binding sites between promoters of selected genes and those of the reference genes. POU4F1 and HOXB2, which encode transcription factors, were extracted as a t(8;21)-specific highly expressed gene and an inv(16)-specific highly expressed gene respectively. The numbers of POU4F1- and HOXB2-binding sites in t(8;21)-specific and inv(16)-specific highly expressed genes were the same as those seen in the reference genes. This result suggests that the number of putative AML1-, POU4F1- and HOXB2-binding sites in the promoter region cannot account for the expression levels of the extracted genes.
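The counting and summary steps of this analysis can be sketched as follows. The sketch is a deliberately simplified stand-in and not the procedure used here: instead of the weight-matrix search of the genomatixsuite software, it counts exact matches to the consensus core sequence TGTGGT on both strands, and it assumes that the SD is the sample standard deviation. All names are hypothetical.

```python
from statistics import mean, stdev

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def promoter_window(chrom_seq, tss_index, upstream=2000, downstream=500):
    """Slice -2000 to +500 bp around a 0-based TSS index on the forward strand."""
    start = max(0, tss_index - upstream)
    return chrom_seq[start:tss_index + downstream]

def count_sites(seq, motif="TGTGGT"):
    """Count exact motif matches on both strands (overlapping matches allowed)."""
    seq = seq.upper()
    total = 0
    # A set avoids double counting if the motif is its own reverse complement.
    for pattern in {motif, reverse_complement(motif)}:
        pos = seq.find(pattern)
        while pos != -1:
            total += 1
            pos = seq.find(pattern, pos + 1)
    return total

def summarise(counts):
    """Format per-gene site counts as 'mean ± SD', as in Table II."""
    return f"{mean(counts):.2f} ± {stdev(counts):.2f}"
```

An exact-match count will generally differ from a matrix-based prediction, which tolerates mismatches and scores each candidate site, so the numbers produced by this sketch are not comparable with those in Table II.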

Table II. Number of transcription factor-binding sites per gene.

Group                                     No. of genes   AML1          POU4F1        HOXB2
Commonly high expressed (A)               15             1·60 ± 1·66   3·07 ± 2·57   1·13 ± 0·88
t(8;21)-specifically high expressed (B)   17             1·24 ± 1·11   2·33 ± 2·47   1·56 ± 1·83
inv(16)-specifically high expressed (C)   10             1·10 ± 1·30   2·10 ± 2·02   2·10 ± 1·70
A + B + C                                 42             1·33 ± 1·39   2·57 ± 2·46   1·55 ± 1·59
Around average                            44             1·64 ± 3·03   3·50 ± 4·60   2·68 ± 4·40
Around median                             46             1·43 ± 1·33   4·37 ± 5·25   1·59 ± 2·00

Transcription factor-binding sites were predicted using the genomatixsuite software. The average number of sites per gene was calculated and is presented together with the SD for each group of genes.
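The text does not state which statistical test was applied to these values. As one illustrative possibility only, Welch's t statistic can be computed directly from the summary values (mean, SD, number of genes) in Table II; the function name `welch_t` is an assumption of this sketch.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom from summary data."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# AML1 sites: commonly high expressed genes (A) vs the "around average" reference.
t, df = welch_t(1.60, 1.66, 15, 1.64, 3.03, 44)
print(f"t = {t:.2f}, df = {df:.1f}")
```

For the AML1 column, the difference in means (0·04) is small relative to the combined standard error, so the statistic is close to zero, consistent with the absence of a detectable difference between selected and reference promoters.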


To compare gene expression between t(8;21)- and inv(16)-AML, AML samples were classified into groups G1–G5 and a pair-wise comparison was performed between these groups. Because of the combinations of test and reference samples used here, the genes identified are probably associated with t(8;21) and inv(16), but not with the lineage/stage specificity of leukaemic cells as exemplified by the FAB-classification. The existence of a gene expression signature characteristic of CBF leukaemia was reported previously (see Fig 4 in Ross et al, 2004), although a common element in t(8;21) and inv(16) is not immediately clear in that report. Our approach is unique in that the t(8;21)- and inv(16)-associated genes were selected independently of inv(16)- and t(8;21)-AML respectively, and commonly modulated genes were then selected by comparing t(8;21)- and inv(16)-associated genes. Thus, we demonstrate that t(8;21)- and inv(16)-AML exhibit significant overlap in gene expression signatures. This result indicates that AML1-MTG8 and CBFβ-MYH11 chimaeric proteins affect a common set of targets in leukaemic cells. In addition, our method identifies new genes not previously reported.

Detection of commonly regulated genes agrees with the notion that both AML1-MTG8 and CBFβ-MYH11 exert a similar dominant negative effect on wild-type CBF. Furthermore, our identification of specifically regulated genes is in line with the prediction that each chimaeric protein may also have a unique activity. Promoter analysis of selected genes, however, suggests that regulation is complex. The number of putative AML1-binding sites in a predicted promoter region did not differ significantly between selected and reference genes, regardless of whether the selected genes were common or specific highly expressed genes. This observation was also true for the POU4F1- and HOXB2-binding sites for t(8;21)- and inv(16)-specific highly expressed gene promoters respectively. Several mechanisms could explain the relationship between transcriptional activity of a chimaeric protein and regulation of gene expression. One is that these proteins regulate promoters through sites other than AML1-binding sites by interacting with other transcription factors and/or co-factors. For example, AML1-MTG8 interacts with p300/CBP and interferes with transcription mediated by an E-box transcription factor whose binding site is distinct from that of AML1 (Zhang et al, 2004). Alternatively, although both chimaeric proteins could target the promoters of the same set of genes, environmental factors, such as cell lineage or developmental stage, that influence the magnitude of gene expression may vary for each chimaera. A chimaeric protein may also target an unidentified molecule, which in turn could modulate expression of extracted genes.

Of the selected genes, MTG8 and MYH11 are of interest. The probe sets for these genes correspond to the MTG8 portion of the AML1-MTG8 chimaeric transcript and to the MYH11 portion of the CBFβ-MYH11 chimaeric transcript respectively. The fact that the probe sets for MTG8 and MYH11 were selected as t(8;21)- and inv(16)-specific highly expressed genes respectively, indicates that our gene extraction procedure worked efficiently. Expression of MTG8 and MYH11 has been identified repeatedly as the most characteristic signature of t(8;21) and inv(16) respectively, in other microarray analyses (Schoch et al, 2002; Debernardi et al, 2003; Kohlmann et al, 2003; Bullinger et al, 2004; Ross et al, 2004; Valk et al, 2004; Gutierrez et al, 2005). Immunophenotyping studies of t(8;21)- and inv(16)-leukaemic cells document CD34 and HLA-DR as specific markers that may reflect the immaturity of cells (Hurwitz et al, 1992; Osato et al, 1997). Identification of the corresponding transcripts as common highly expressed genes probably underlies this immunophenotype. High expression of CCND2 may also confer a growth advantage on leukaemic cells, although ectopic expression of CBFβ-MYH11 in 32Dcl3 and Ba/F3 cell lines has been reported to retard the G1 to S transition and stimulate expression of p21WAF1 (Cao et al, 1997; Lou et al, 2000). It is also noteworthy that many genes related to endothelial cells and signal transduction were extracted as t(8;21)-specific highly expressed genes. The endothelial group includes VEGF, FBLN5, and ITGB4, whereas the genes that encode signalling factors are VEGF, ADRA2C, SHC1, CAV1, RASA4, ARHGEF6 and MAPKAPK3. Also potentially critical are genes that were extremely highly expressed, including TRH, TPSAB1, MN1, POU4F1 and CAV1.

Among the low expression genes, HOXA5, HOXA9 and HOXA10 were categorised as common genes. Enhanced expression of HOXA9 is well established in many AML-subtypes including those with normal karyotypes (Lawrence et al, 1999; Debernardi et al, 2003; Bullinger et al, 2004; Gutierrez et al, 2005), and a NUP98-HOXA9 chimaera is generated in t(7;11)-AML (Nakamura et al, 1996). Nevertheless, the average level of HOXA9 transcripts in G1 and G3 was as low as 10% of that seen in other groups (evidenced by quantification of the results in Fig 4). Thus, an interesting possibility is that a pathway mediated by HOXA9 is not necessary or may even be antagonistic to the molecular mechanisms involved in t(8;21)- and inv(16)-AML. Low expression of CBFβ in G3 but not G1 may mean that the CBFβ-MYH11 protein negatively autoregulates expression of the wild type CBFβ allele.

In conclusion, we extracted and categorised t(8;21)/inv(16)-associated genes as t(8;21)-specific, inv(16)-specific, and genes common to both. This categorisation was enabled by a unique approach to gene extraction. While the presence of specific groups of selected genes correlated with the bimodular structures of the chimaeric proteins, it was notable that t(8;21)- and inv(16)-AML display a significant degree of overlap in their gene expression signatures.


This work was supported by the programme for Promotion of Fundamental Studies in Health Sciences of the Pharmaceuticals and Medical Devices Agency (PMDA), by a Grant-in-Aid for Third Term Comprehensive Control Research for Cancer from the Ministry of Health, Labour and Welfare, and by Grants-in-Aid from the Ministry of Education, Culture, Sports, Science and Technology, Japan. We are grateful to Ms. Sachiyo Mitani for technical assistance and Ms. Michika Kuji for secretarial assistance. M.S. is a participant in the 21st century COE program ‘Center for Innovative Therapeutic Development toward the Conquest of Signal Transduction Diseases’ at Tohoku University.