Integrative genomic characterization and a genomic staging system for gastrointestinal stromal tumors


  • Antti Ylipää MSc,

    1. Department of Pathology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
    2. Department of Signal Processing, Tampere University of Technology, Tampere, Finland
  • Kelly K. Hunt MD,

    1. Department of Surgical Oncology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
  • Jilong Yang MD, PhD,

    1. Department of Pathology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
    2. Department of Bone and Soft Tissue Tumor, Tianjin Medical University Cancer Institute and Hospital, Tianjin, China
  • Alexander J. F. Lazar MD, PhD,

    1. Department of Pathology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
    2. Department of Sarcoma Research Center, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
  • Keila E. Torres MD,

    1. Department of Surgical Oncology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
  • Dina C. Lev MD,

    1. Department of Sarcoma Research Center, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
    2. Department of Cancer Biology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
  • Matti Nykter PhD,

    1. Department of Signal Processing, Tampere University of Technology, Tampere, Finland
  • Raphael E. Pollock MD, PhD,

    1. Department of Surgical Oncology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
    2. Department of Sarcoma Research Center, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
  • Jonathan Trent MD, PhD,

    1. Department of Sarcoma Research Center, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
    2. Department of Sarcoma Medical Oncology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
  • Wei Zhang PhD

    Corresponding author
    1. Department of Pathology, The University of Texas M. D. Anderson Cancer Center, Houston, Texas
    • Department of Pathology, Unit 85, The University of Texas M. D. Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, TX 77030
    • Fax: (713) 792-5549



Gastrointestinal stromal tumors (GISTs) historically were grouped with leiomyosarcomas (LMSs) based on their morphologic similarities; however, recently, GIST was established unequivocally as a distinct type of sarcoma based on its molecular features and response to imatinib treatment.


To gain further insight into the genomic differences between GISTs and LMSs, the authors mapped gene copy number aberrations (CNAs) in 42 GISTs and 30 LMSs and integrated the results with gene expression profiles.


Distinct patterns of CNAs were revealed between GISTs and LMSs. Losses in 1p, 14q, 15q, and 22q were significantly more frequent in GISTs than in LMSs (P < .001); whereas losses in chromosomes 10 and 16 and gains in 1q, 14q, and 15q (P < .001) were more common in LMSs. By integrating CNAs with gene expression data and clinical information, the authors identified several clinically relevant CNAs that were prognostic of survival in patients with GIST. Furthermore, GISTs were categorized into 4 groups according to an accumulating pattern of genetic alterations. Many key cellular pathways were expressed differently in the 4 groups, and the patients in each group had increasingly worse prognoses as the extent of genomic alterations increased.


Based on the current findings, the authors proposed a new tumor-progression genetic staging system termed genomic instability stage to complement the current prognostic predictive system based on tumor size, mitotic index, and v-kit Hardy-Zuckerman 4 feline sarcoma viral oncogene homolog (KIT) mutation. Cancer 2011. © 2010 American Cancer Society.

Gastrointestinal stromal tumors (GISTs) previously were grouped with spindle cell and other soft-tissue sarcomas, including leiomyosarcoma (LMS).1 However, in recent years, GIST has emerged as a distinct mesenchymal tumor type that frequently is associated with a gain-of-function mutation in the v-kit Hardy-Zuckerman 4 feline sarcoma viral oncogene homolog (KIT) gene (80%-85% of GISTs) or the platelet-derived growth factor receptor-alpha (PDGFRA) gene (5%-7% of GISTs).1-3 The presence of these mutations allows for targeted therapy using imatinib (Gleevec, STI-571; Novartis Pharmaceuticals, Basel, Switzerland), which has demonstrated efficacy in 60% to 80% of patients with GISTs.4 Conversely, LMSs are not associated with KIT gene mutations or overexpression and do not benefit from imatinib therapy. The treatment of patients with LMS using contemporary cytotoxic chemotherapy has resulted in a 53% objective response rate, whereas patients with GIST who received traditional cytotoxic chemotherapy have not had a measurable response.4, 5 Although mutations in KIT and PDGFRA explain why 60% to 80% of patients with GIST initially benefit from imatinib, the duration of benefit that patients receive from this therapy remains considerably variable. Furthermore, although rarely, some patients with KIT exon 11 mutations are resistant to imatinib; and secondary mutations of KIT reportedly have occurred in patients who initially responded to imatinib therapy.6 Thus, robust and biologically relevant prognostic factors, especially those for predicting the survival of patients with GIST, still are needed.

Growing evidence indicates that the accumulation of specific genetic alterations ultimately leads to a highly unstable underlying genome in cancer development and progression.7 Although some recurrent changes in GIST and LMS genomes have been investigated before, the deficiencies of early measurement technologies and the small sample sizes that were used in early studies make it necessary to accumulate genomic information from additional samples and to create a more refined map of the recurrent aberrations. Toward this objective, we conducted a comprehensive, high-resolution, whole-genome array comparative genomic hybridization (aCGH) analysis to map the recurrent copy number aberrations (CNAs) in GISTs and LMSs. We also investigated the clinical relevance of our results in an integrative analysis of the CNAs, gene expression profiles, and patient survival information. The results from this study led us to propose a new tumor-progression genetic staging system termed genomic instability stage (GIS) to complement the current GIST staging system, which is based on tumor size, mitotic index (MI), and KIT mutation.


Primary Tumors and Pathologic Evaluation

In total, 72 primary tumors, including 42 GISTs and 30 LMSs, were acquired from surgical specimens from 1989 through 2005 at The University of Texas M. D. Anderson Cancer Center under an institutional review board–approved protocol. For transcriptome analysis, high-quality RNA was acquired from 32 GISTs and 25 LMSs. For genomic profiling, we used these samples as well as 15 additional samples (10 GISTs and 5 LMSs). The diagnoses were made on the basis of clinicopathologic evaluation and molecular marker studies. The clinical information is summarized in Table 1.

Table 1. Clinical Information on Patients With Gastrointestinal Stromal Tumors and Leiomyosarcomas

Category                                   Mean±SD or No. of Patients (%)

GIST, n=42
 Age, y^a                                  62±14
 Tumor size, cm                            12.7±8.8
 Sex
  Men                                      19 (45)
  Women                                    23 (55)
 Primary site
  Stomach                                  19 (45)
  Small bowel                              20 (48)
  Large bowel                              2 (5)
  Uterus                                   1 (2)
 Disease status
  Primary no metastasis                    19 (45)
  Primary with metastasis                  7 (17)
  Local recurrence without metastasis      6 (14)
  Recurrence metastasis                    10 (24)
 KIT status
  Wild type                                11 (27)
  Exon 11 mutation                         27 (64)
  Exon 9 mutation                          3 (7)
  No data                                  1 (2)
 Imatinib treatment
  Preoperative imatinib                    25 (60)
  No preoperative imatinib                 17 (40)
 Length of follow-up, mo^b                 61±44

LMS, n=30
 Age, y                                    57±12
 Tumor size, cm                            15.3±9.1
 Sex
  Men                                      7 (23)
  Women                                    23 (77)
 Primary site
  Uterus                                   9 (30)
  Retroperitoneal mass                     6 (20)
  Inferior vena cava                       5 (17)
  Other                                    10 (33)
 Length of follow-up, mo^b                 41±27

  • SD indicates standard deviation; GIST, gastrointestinal stromal tumor; LMS, leiomyosarcoma; KIT, v-kit Hardy-Zuckerman 4 feline sarcoma viral oncogene homolog.
  • a  Age refers to the patient age at which the sample was obtained.
  • b  Length of follow-up was calculated as the time between the first diagnosis and the last contact.

Array Experiments and Preprocessing

Genomic DNA from tumors and pooled normal tissue was isolated according to standard procedure. Labeled genomic DNA was hybridized to the Agilent Human Genome CGH Microarray 4x44 Kit according to the manufacturer's instructions (Agilent Technologies, Palo Alto, Calif). The data were extracted from microarrays with Agilent Feature Extraction software version 9.5 using the default settings and were analyzed further with MATLAB version R2007b (The MathWorks, Inc., Natick, Mass) and R statistical software (version 2.6.2; R Development Core Team, available at accessed August 26, 2010). Intensity values were lowess-normalized to compensate for common nonlinear biases. Ratios of normalized intensity values from tumor tissues and normal tissue were transformed to log2-space. Then, log-ratio data were subjected to a circular binary segmentation algorithm8 (R package DNAcopy; version 1.6.0) to reduce the effect of noise. The CGHcall algorithm9 (version 1.2.2 in R) was used to label the segments as lost, normal, or gained.
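As a rough illustration of the log-ratio step described above, the following Python sketch converts paired tumor and normal intensities into log2 ratios. The function name and the example intensities are hypothetical; lowess normalization, circular binary segmentation, and CGHcall are not reproduced here.

```python
import math

def log2_ratios(tumor, normal):
    """Return log2(tumor / normal) for paired, already-normalized probe intensities."""
    if len(tumor) != len(normal):
        raise ValueError("tumor and normal must have the same number of probes")
    return [math.log2(t / n) for t, n in zip(tumor, normal)]

# Hypothetical intensities for three probes: doubled, unchanged, and halved signal.
ratios = log2_ratios([200.0, 100.0, 50.0], [100.0, 100.0, 100.0])
print(ratios)  # [1.0, 0.0, -1.0]
```

In log2-space, a positive value points toward a gain and a negative value toward a loss, which is why the transform precedes segmentation and calling.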

Gene expression data were measured by using whole human genome oligo arrays with 44-K 60-mer probes (Agilent Technologies) with 500 ng of total RNA starting material according to the manufacturer's protocol. Arrays were scanned with the Agilent dual laser-based scanner. Features were extracted from arrays with Agilent Feature Extraction software (version 8.0). The expression data were quantile normalized. Both aCGH and gene expression data are available at accessed August 26, 2010.

Statistical Analyses

DNA sequences were classified as recurrently aberrated if the number of aberrations in individual samples exceeded a threshold of statistical significance, as estimated using a permutation test. The 95th percentile values were chosen as the threshold of significance. By using this procedure, we estimated that similar aberrations in at least 14 samples (33%) for GISTs and 12 samples (40%) for LMSs were required for a sequence to be called recurrently aberrant. Probe average recurrence (PAR) was used to quantify the aberration rate of a recurrently aberrated DNA segment. The PAR is calculated by averaging the aberration rate over the probes in a contiguous, recurrently aberrated DNA segment. Differences in aberration frequencies between GIST and LMS were tested independently with the Fisher exact test for each probe. To account for the resulting multiple comparisons problem, the level of significance in these tests was set to .001. Differential expression between sample sets was determined with the Wilcoxon rank-sum test with a threshold of .05. In finding the subgroups within the GIST samples, hierarchical clustering with inner squared linkage was applied. The most informative genes for clustering were selected by using a 2-tailed t test. In estimating patient survival curves, Kaplan-Meier survival estimators were applied. A Mantel-Cox test also was used to determine the statistical significance of the difference of these survival estimators. A significance threshold of .05 was selected for all survival tests. A hypergeometric distribution with a significance threshold of .05 was used in computing gene set enrichments.
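The recurrence threshold and the PAR can be sketched in Python as follows. This is only one plausible reading of the procedure: the text does not specify the permutation scheme, so shuffling probe positions independently within each sample is an assumption, as are the function names and the toy call matrix (rows are samples, columns are probes; -1 is a loss, 0 is normal, and +1 is a gain).

```python
import random

def recurrence_counts(calls, state):
    """Number of samples carrying `state` (-1 loss, +1 gain) at each probe."""
    n_probes = len(calls[0])
    return [sum(1 for sample in calls if sample[p] == state) for p in range(n_probes)]

def permutation_threshold(calls, state, n_perm=200, percentile=95, seed=0):
    """Percentile of per-probe recurrence counts after shuffling each sample's probes.

    Assumption: probes are permuted independently within every sample, which
    keeps each sample's aberration burden but removes any shared location.
    """
    rng = random.Random(seed)
    null_counts = []
    for _ in range(n_perm):
        shuffled = []
        for sample in calls:
            row = list(sample)
            rng.shuffle(row)
            shuffled.append(row)
        null_counts.extend(recurrence_counts(shuffled, state))
    null_counts.sort()
    idx = min(len(null_counts) - 1, int(len(null_counts) * percentile / 100))
    return null_counts[idx]

def probe_average_recurrence(calls, state, start, stop):
    """PAR (%): mean aberration rate over the probes in calls[:, start:stop]."""
    counts = recurrence_counts(calls, state)[start:stop]
    return 100.0 * sum(counts) / (len(counts) * len(calls))

# Toy data: 3 samples x 4 probes, with a shared loss over the first two probes.
calls = [
    [-1, -1, 0, 0],
    [-1,  0, 0, 0],
    [-1, -1, 0, 1],
]
loss_threshold = permutation_threshold(calls, state=-1)
loss_par = probe_average_recurrence(calls, state=-1, start=0, stop=2)
```

Shuffling individual probes ignores the segment structure of real aCGH data, so a production analysis would more likely permute whole segments.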


Gastrointestinal Stromal Tumors and Leiomyosarcomas Have Distinct Differences in Their Genomes

After performing comprehensive aCGH profiling experiments with 42 primary GISTs and 30 primary LMSs, we analyzed the recurrent CNAs in these tumors. Our analysis revealed several distinct loci throughout the genome that frequently were aberrant in GISTs (Fig. 1A) and in LMSs (Fig. 1B), similar to the reported data.10 Statistical comparisons of the 2 sarcoma genomes revealed that losses in chromosomes 1p, 14q, 15q, and 22q were significantly more frequent in GISTs than in LMSs (P < .001), whereas losses in chromosomes 10 and 16 were more common in LMSs (P < .001). We not only confirmed previous CNAs, such as loss of 1p, 14q, 15q, and 22q,10-21 but also demonstrated that the deletion of 22q was the most common recurrent deletion in GISTs (84% PAR): parts of 22q were deleted in >95% of GIST samples, a rate that was significantly higher than previously reported data.12, 21 In addition, although losses in 1p were common in both sarcoma types, many more and much larger deletions in 1p were observed in GISTs than in LMSs. In comparison, tumors from patients with LMS more frequently had gains in chromosomes 1q, 14q, and 15q (P < .001).
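The per-probe comparison of aberration frequencies between the 2 tumor types can be sketched with a 2-sided Fisher exact test written in plain Python. The counts in the example are hypothetical and are not taken from the study; only the group sizes (42 GISTs and 30 LMSs) come from the text.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

# Hypothetical probe: lost in 40 of 42 GISTs and in 5 of 30 LMSs.
p_value = fisher_exact_two_sided(40, 2, 5, 25)
is_significant = p_value < 0.001  # the per-probe significance level used in the text
```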

Figure 1.

These are skyline plots and heat maps of signature gene copy number aberrations for gastrointestinal stromal tumors (GISTs) and leiomyosarcomas (LMSs). The recurrence of copy number aberrations throughout the genomes is shown both in skyline plots and heat maps for (A) GIST and (B) LMS. The value on the y-axis of each skyline plot is the percentage of patients who had tumors that had gains (positive axis; red) or losses (negative axis; green) in corresponding genomic loci. The probes are aligned evenly in chromosome order on the x-axis. The dashed line indicates the threshold for a significant number of patients whose tumors share the same aberration (14 patients for GIST). The significance threshold is computed with the use of a permutation test. Recurrence rates that exceed this threshold are deemed significantly recurrent and are color-coded to emphasize the locations. Gray color represents a nonsignificant amount of aberration in the locus. The lower plots in A highlight chromosomes 1 and 15, both of which contain 3 critical segments. Critical segments are contiguous genomic regions of recurrent aberration that harbor at least 1 dosage-sensitive, survival-affecting gene. Details of these segments are listed in Table 2. Losses in 1p, 14q, 15q, and 22q are markedly more common in GIST; whereas losses in chromosomes 10 and 13 are more common in LMS. Gains in chromosomes 1, 14q, and 15q are defining features of LMS.

From the aberration profiles illustrated in Figure 1, we created a gene-level map of the recurrent CNAs in GISTs and LMSs. In total, 328 recurrently aberrant segments of DNA were identified in GISTs (202 gains and 126 losses), and 373 were identified in LMSs (194 gains and 179 losses) based on the PAR, which was defined as the average recurrence rate of the probes that were included in a segment. We matched CNAs with corresponding gene expression profiles and identified which genes had expression that was correlated significantly with gene dosage. Next, we investigated the effect of each dosage-sensitive gene on patient survival and identified which recurrent CNAs harbored at least 1 dosage-sensitive gene that was correlated significantly with patient survival (Table 2). These clinically relevant CNA segments and the putative target genes offer a promising starting point from which functional validations can be carried out in future studies.
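One way to screen a gene for dosage sensitivity is to compare its expression between samples that carry the aberration and samples that do not. The sketch below uses a Wilcoxon rank-sum test with a normal approximation. It is an assumption that this test was applied at this step, because the text names it only for differential expression between sample sets. The function name and expression values are hypothetical.

```python
import math

def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum P value (normal approximation, no tie correction)."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        for k in range(i, j):
            ranks[k] = (i + j + 1) / 2  # midrank of tied positions i+1..j
        i = j
    w = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical expression values for one gene.
expr_with_loss = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
expr_without_loss = [7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
dosage_sensitive = rank_sum_p(expr_with_loss, expr_without_loss) < 0.05
```

The normal approximation is coarse for very small groups; an exact test would be preferable there.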

Table 2. Copy Number Aberrations That Harbored at Least 1 Dosage-Sensitive Gene Associated With a Poor Prognosis in Patients With Gastrointestinal Stromal Tumors and Leiomyosarcomas

No.  Chromosome  Locus start, bp  Locus stop, bp  PAR, %^a  Band start  Band stop  Candidate Genes

GIST losses
1    14          19,365,081       106,329,840     65        q11.2       q32.33     OXA1L, PPP2R5E
3    15          48,206,991       84,693,976      52        q21.2       q25.3      AKAP13, C15orf5, GLCE, MESDC2, MTHFS, SENP8, SUHW4, USP3
4    1           106,334,457      114,964,386     49        p21.1       p13.2      AMPD1, AP4B1, BCAS2
5    15          19,109,154       33,140,043      48        q11.2       q14        APBA2, ARHGAP11A, KLF13
7    1           99,947,682       100,601,821     46        p21.2       p21.2      AGL, HIAT1

GIST gains
9    5           1,492,821        16,643,214      65        p15.33      p15.1      DNAH5, FBXL7
13   3           157,321,066      170,967,793     47        q25.31      q26.2      CCNL1, VEPH1
16   3           143,995,402      156,589,223     41        q23         q25.31     P2RY1, SMARCA3

LMS losses
1    14          76,889,274       77,394,297      46        q24.3       q24.3      C14orf133, C14orf156
2    14          72,470,216       73,993,089      46        q24.2       q24.3      C14orf168, PSEN1, ZADH1, ZFYVE1, ZNF410
3    15          48,417,864       48,815,590      46        q21.2       q21.2      SPPL2A, USP8
4    14          33,942,779       34,940,779      44        q13.1       q13.2      C14orf10, C14orf24
5    10          5,878,820        6,526,558       43        p15.1       p15.1      FBXO18, PFKFB3, RBM17
6    13          97,404,021       99,434,703      43        q32.2       q32.3      CLYBL, STK24, TM9SF2

LMS gains

  • bp Indicates base pair; PAR, probe average recurrence; GIST, gastrointestinal stromal tumor; q, long arm; OXA1L, oxidase (cytochrome c) assembly 1-like; PPP2R5E, protein phosphatase 2, regulatory; p, short arm; RWDD3, ring finger and WD repeat domain containing 3; AKAP13, A kinase (protein kinase A) anchor protein 13; C15orf5, chromosome 15 open reading frame 5; GLCE, D-glucuronyl C5-epimerase; MESDC2, mesoderm development candidate 2; MTHFS, 5,10-methenyltetrahydrofolate synthase (5-formyltetrahydrofolate cyclo-ligase); SENP8, small ubiquitin-like modifier protein/sentrin-specific peptidase family number 8; SUHW4, suppressor of hairy wing homolog 4; USP3, ubiquitin-specific peptidase 3; AMPD1, adenosine monophosphate deaminase 1; AP4B1, adaptor-related protein complex 4, beta 1 subunit; BCAS2, breast carcinoma amplified sequence 2; APBA2, amyloid beta (A4) precursor protein-binding, family A, member 2; ARHGAP11A, rho guanosine triphosphatase-activating protein 11A; KLF13, Kruppel-like factor 13; IGF1R, insulin-like growth factor 1 receptor; AGL, amylo-alpha-1, 6-glucosidase, 4-alpha-glucanotransferase; HIAT1, hippocampus abundant transcript 1; DDEF1, development and differentiation enhancing factor 1; DNAH5, dynein, axonemal, heavy chain 5; FBXL7, F-box and leucine-rich repeat protein 7; HMGCLL1, 3-hydroxymethyl-3-methylglutaryl-coenzyme A lyase-like 1; VPS41, vacuolar protein-sorting 41 homolog; TDRD3, tudor domain containing 3; CCNL1, cyclin L1; VEPH1, ventricular zone-expressed PH domain homolog 1; MAP1B, microtubule-associated protein 1B; C13orf7, chromosome 13 open reading frame 7; P2RY1, purinergic receptor P2Y, G-protein coupled; LMS, leiomyosarcoma; SMARCA3, switch/sucrose nonfermentable-related, matrix-associated, actin-dependent regulator of chromatin subfamily A member 3; C14orf133, chromosome 14 open reading frame 133; C14orf156, chromosome 14 open reading frame 156; C14orf168, chromosome 14 open reading frame 168; PSEN1, presenilin 1; ZADH1, zinc-binding alcohol dehydrogenase, domain containing 1; ZFYVE1, zinc finger, FYVE domain containing 1 (FYVE indicates the 4 cysteine-rich proteins: Fab1, YOTB, the vesicle transport protein Vac 1, and the early endosome antigen protein EEA1); ZNF410, zinc finger protein 410; SPPL2A, signal peptide peptidase-like 2A; USP8, ubiquitin-specific peptidase 8; C14orf10, chromosome 14 open reading frame 10; C14orf24, chromosome 14 open reading frame 24; FBXO18, F-box protein, helicase, 18; PFKFB3, 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 3; RBM17, RNA binding motif protein 17; CLYBL, citrate lyase beta like; STK24, serine/threonine kinase 24; TM9SF2, transmembrane 9 superfamily member 2; CDH12, cadherin 12, type 2 (N-cadherin 2); FBXL4, F-box and leucine-rich repeat protein 4; MEF2A, myocyte enhancer factor 2A.
  • a  The PAR is the mean aberration rate of all the probes that belong to that segment.

Genomic Instability Stage May Be a Valuable Prognostic System for Gastrointestinal Stromal Tumors

We performed a cluster analysis in an attempt to identify clinically relevant subgroups that were defined by chromosome-level CNAs. In contrast to LMS, which did not cluster well into clear genomic subtypes, GIST aberration profiles revealed 4 distinct groups with various degrees of genetic alterations (n1 = 12, n2 = 8, n3 = 12, n4 = 10) (Fig. 2A). A survival analysis of these groups revealed that patient survival is increasingly worse with the presence of more and more genomic aberrations (Fig. 2B). All 4 groups feature partial losses in distal 1p, 19, and 22q, which suggests that these deletions must be early events in GIST development. The defining chromosome-scale difference between Group 1 (with the fewest aberrations) and Group 2 (with slightly more aberrations than Group 1) is the added deletion of chromosome 14q in Group 2. Patients with tumors classified into Group 1 or 2 have significantly longer survival (Fig. 2C) than patients with Group 3 or Group 4 tumors (which feature more aberrations than the tumors in Groups 1 and 2). Group 3 harbors the same aberrations that characterize Groups 1 and 2 but also has additional deletions of chromosome 15q and the proximal part of chromosome 1p. Tumors in Group 4 are distinguished from tumors in Group 3 by the additional loss of chromosome 10. Although Group 4 retains the characteristics of the first 3 groups, it also contains a more diverse set of tumors, which is apparent in the more heterogeneous pattern of CNAs compared with the other 3 groups. This also is reflected in the survival estimate, which falls between the first 2 groups and the third group.
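The group-wise survival comparison can be sketched with a Kaplan-Meier estimator and a 2-group log-rank (Mantel-Cox) test in plain Python. The follow-up times and event indicators below are hypothetical, and the function names are illustrative only; an event flag of 1 marks a death and 0 marks censoring.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

def logrank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank P value (chi-square with 1 degree of freedom)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    obs_minus_exp = 0.0
    var = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / var
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical follow-up in months.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
p_groups = logrank_p([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5)
```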

Figure 2.

Subtypes of gastrointestinal stromal tumors (GISTs) are illustrated according to chromosomal aberrations. (A) Hierarchical clustering of copy number data reveals 4 subtypes of GIST (genomic instability stage 1 [GIS1] through GIS4). Green indicates losses, and red indicates gains. Locations of the key chromosomes are indicated on the top of the heat map. On the right, the 4 groups identified by hierarchical clustering are indicated by blue (GIS1; n = 12), light blue (GIS2; n = 8), red (GIS3; n = 12), and dark red (GIS4; n = 10). The same colors highlight the most distinct aberrations in each group that are visible in the heat map, in which the main characteristics of Group 1 are the losses of distal 1p and 22q; Group 2 has losses of distal 1p, 22q, and an additional loss of chromosome 14q; Group 3 has losses of 1p, 22q, and 14q and also features a loss of 15q; and Group 4, although more heterogeneous, is characterized by losses of 1p, 22q, 14q, and 15q and loss of chromosome 10. Black boxes on the right side of the heat map illustrate some of the known survival-affecting characteristics of individual tumors. WT indicates wild type. (B) Kaplan-Meier survival estimates are shown for each group. (C) Patients with late-stage GIST (GIS3 and GIS4; n = 22) hypothetically have a significantly worse prognosis (P = .006) than patients with early stage GIST (GIS1 and GIS2; n = 20). (D) Although v-kit Hardy-Zuckerman 4 feline sarcoma viral oncogene homolog (KIT) exon 9 and 11 mutations and imatinib treatment significantly affect survival, they are distributed among all 4 groups and, thus, do not affect the survival estimate for any 1 group significantly more than another. (E) This box plot illustrates the prevalence of aberrations for each GIST subtype and chromosome. The height of each box was computed as the square of the amount of aberrant probes divided by the total amount of probes in the chromosome. 
The plot emphasizes both the relative size of aberrations with respect to the length of the chromosome and the total amount of aberrated loci. This procedure is needed to illustrate the importance of both completely aberrant, small chromosomes and partially aberrant, large chromosomes in the same scale. The plot illustrates the sequential nature of chromosome-scale aberrations in the 4 GIST subtypes, which is revealed best in the highlighted chromosomes and also is visible, for example, in the increased amplification of chromosomes 3, 4, 5, and 6. The overall lower prevalence of aberration in GIS4 is explained by the greater genomic heterogeneity within that group.
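The box-height measure in panel E can be read as the squared count of aberrant probes divided by the number of probes on the chromosome, which equals the aberrant count multiplied by the aberrant fraction. The short Python sketch below follows that reading, which is an interpretation of the caption; the function name and probe counts are hypothetical.

```python
def box_height(n_aberrant, n_total):
    """Squared aberrant-probe count over total probes: count x fraction."""
    return n_aberrant ** 2 / n_total

# A small, fully aberrant chromosome and a large, half-aberrant chromosome
# land on the same scale under this measure.
small_full = box_height(100, 100)
large_partial = box_height(200, 400)
print(small_full, large_partial)  # 100.0 100.0
```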

These results lead us to propose a new tumor-progression genetic staging system termed genomic instability stage (GIS) to complement the current prognostic staging system for GIST based on tumor size, MI, and KIT mutation. Although we did not have MI information for all patients in the study, we did have sequencing data on KIT and PDGFRA gene mutations (Fig. 2A). This allowed us to investigate the relation between the mutation of these genes, especially KIT, and genomic instability manifested by the accumulation of CNAs. The high mutation rate of KIT exon 11 in Groups 1 and 2 suggested that KIT mutation is an early event in GIST, which is consistent with its role as a driver oncogene. Increased KIT mutation frequency was observed in Groups 3 and 4, and this observation was consistent with reports that secondary mutations of KIT occur at later stages in GIST progression.6 Imatinib-treated patients who had mutations in KIT exon 11 survived significantly longer (in a group-independent manner) than patients who had the same mutation but did not receive imatinib (P = .002) (Fig. 2D), suggesting that the differences in genomic survival estimators were not affected significantly by either imatinib treatment or KIT mutations. Furthermore, other common risk-assessment and clinical parameters, such as patient age, tumor size, sex, primary site, and the presence of metastases, were not correlated significantly with the groups. Because of the lack of data on MI, we were unable to fully compare the existing risk-assessment system4 with the genomic stages. Further prospective characterization of these genomic profiles, coupled with full risk assessment (based on 2007 National Comprehensive Cancer Network guidelines) and clinical outcome, is needed to validate this proposed model.

The incremental occurrence of the observed CNA patterns, increasing KIT mutations, and independence from clinical parameters other than survival suggested that the 4 GIS stages reflect the progressive accumulation of chromosome-scale genetic abnormalities during GIST progression. We further confirmed the sequential nature of the chromosome-scale events by determining which aberrations are the most prominent in each stage. That analysis clearly revealed that losses of 1p, 14q, 15q, 19, and 22q are the most distinct events in the 4 groups, although many smaller scale events also may play a critical role in GIST progression (Fig. 2E). Notably, the prominence of less aberrated chromosomes also increased from the first stage to the third, as observed in the amplification of chromosomes 3, 4, 5, and 6. In addition, the analysis revealed that dosage-sensitive genes in critical segments also changed their expression in a corresponding manner between the hypothesized GIS stages (Fig. 3A-D). From the gene expression rates, we could clearly observe copy number changes in the chromosomes that harbored the genes, such as loss of 15q for the A kinase (protein kinase A) anchor protein 13 (AKAP13) gene and the chromosome 15 open reading frame 5 (C15orf5) gene, loss of 14q for the oxidase (cytochrome c) assembly 1-like (OXA1L) gene, and gains in 3q for the switch/sucrose nonfermentable-related, matrix-associated, actin-dependent regulator of chromatin subfamily A member 3 (SMARCA3) gene. Because these are genes that significantly affect survival, these possible target genes also ultimately may be responsible for the worse outcome observed for patients in Groups 3 and 4.

Figure 3.

Sequential chromosomal aberrations were correlated with gene expression in survival-critical genes. (A,B) The stepwise loss of chromosome 15q, where the A kinase (protein kinase A) anchor protein 13 (AKAP13) and chromosome 15 open reading frame 5 (C15orf5) genes are located, also is reflected in the lower median gene expression in genomic instability stage 3 (GIS3) and GIS4. (C) The oxidase (cytochrome c) assembly 1-like gene (OXA1L) exhibits a clear gene-dosage effect according to loss of chromosome 14q in GIS2 through GIS4. (D) Although amplifications affected gene expression less than deletions, expression of the switch/sucrose nonfermentable-related, matrix-associated, actin-dependent regulator of chromatin subfamily A member 3 (SMARCA3) gene has a pattern similar to that of chromosome 3 in Figure 2E. All 4 genes are aberrated recurrently and affect survival significantly.

Different Cellular Pathways Are Altered in Gastrointestinal Stromal Tumors With Different Genomic Instability Stages

The genes that were expressed differently between 2 adjacent GIS groups were used in an enrichment analysis with the objective of finding the biologic processes that were altered significantly during the progression from 1 stage to the next. We used the list of biologic processes in the Gene Ontology database22 as our reference. Differences in genome level translate into several distinct cancer-related processes at the transcriptome level (Fig. 4). The changes from GIS1 to GIS2 impaired mainly the apoptotic, DNA-repair, and damage-response pathways; whereas the progression from GIS2 to GIS3 affected the mitotic, cell cycle, and growth pathways. The final transition from GIS3 to GIS4 had substantially more differences in gene expression, most notably in the cell-cell adhesion and chromosomal organization pathways.
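The enrichment of a Gene Ontology term among differentially expressed genes can be sketched as an upper-tail hypergeometric probability. The function name and the gene counts in the example are hypothetical; only the use of a hypergeometric distribution with a .05 threshold comes from the statistical methods.

```python
from math import comb

def enrichment_p(n_universe, n_in_term, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_universe genes, of which n_in_term are annotated to the term."""
    upper = min(n_in_term, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_universe - n_in_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 10 genes measured, 5 annotated to a term,
# 5 differentially expressed, and all 5 of those annotated to the term.
p_term = enrichment_p(10, 5, 5, 5)
enriched = p_term < 0.05
```

Because many terms are tested at once, a multiple-comparison adjustment would usually be considered alongside the raw threshold.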

Figure 4.

Key changes in genome level confer changes in transcriptome level. The most prominent copy number changes that characterize the 4 stages are shown on the top of the figure. These and smaller scale aberrations convey various tumor-promoting properties to the cell by disrupting key biochemical pathways and biologic processes. The transition from genomic instability stage 1 (GIS1) to GIS2 is characterized by altered antiapoptotic processes and DNA repair as well as chromosomal organization and regulation of transcription. Mitotic, cell cycle, and growth terms were abundant in the transition from GIS2 to GIS3. The list of altered processes in the last transition was longer and more diverse than the previous list, probably because Group 4 was the most heterogeneous. Chromosomal changes, growth, and cell-cell adhesion were among the most relevant cancer-related processes.


Recent progress in cancer genomics, highlighted by the advancement of the Cancer Genome Atlas program, has demonstrated that comprehensive genomic characterization of a large number of cancer samples is highly valuable for fully understanding the molecular basis of human cancer and for classifying cancer into clinically meaningful subtypes.23, 24 In the current study, we conducted an integrated analysis of high-resolution genomic maps, gene expression data, and clinical information on GISTs and LMSs. Our genomic analysis provided further evidence that GISTs are distinct from LMSs at the genomic level and pointed out the exact chromosomal locations of the greatest difference and similarity. However, it is most noteworthy that our analysis provides a genomic view of GIST progression and demonstrates that staging by using a specific genomic alteration may offer a clinically meaningful system for predicting the prognosis for patients with GIST, even in those who receive imatinib therapy.

Although several previous studies have profiled genomic alterations in GISTs and LMSs using different generations of technologies and relatively small sample cohorts, a key aspect of the current analysis is the correlation of genomic alterations with gene expression data and clinical information. By using this integrative approach, we were able to pinpoint clinically relevant CNAs from the vast number of biologically irrelevant aberrations. Whereas simple mapping of recurrently aberrant genes can yield hundreds or thousands of clinically irrelevant passenger genes, the clinically relevant genomic segments (critical segments) that we have uncovered provide a reasonable number of putative targets for future validation studies.

Our integrated analysis also led us to a new appreciation for the genetic basis of the progression of GISTs. Pattern recognition analysis of the genomic alterations revealed that there is an obvious incremental accumulation of gene copy number alterations in GIST. Consequently, we have proposed a new tumor-progression genetic staging system (Genomic Instability Staging or GIS) to complement the standard tumor site, size, and proliferation risk-assessment system.4 According to the GIS staging system, deletions of distal 1p, 19, and 22q are the likely keys to early chromosome-scale events that may have triggered the transformation from normal tissue to GIS1 tumor. Whether these events occur before or after KIT mutation is not apparent from our data, because KIT mutation is a high-frequency event in every GIS stage. The most distinct event that follows these deletions is the deletion of 14q, which can be observed clearly as the defining feature in GIS2. Further key deletions of proximal 1p and 15q mark GIS3 disease. Loss of chromosome 10, which also has been associated with late stage in many solid tumors,25 defines the final stage, GIS4. Our GIS groups are consistent with previously reported data21, 26 but provide more specific information on the key aberrant events. The lack of significant differences in KIT and PDGFRA mutation status and in the response to imatinib for different GIS groups indicates that the GIS system may have independent prognostic value for patients with GISTs.

Our pathway analysis provides additional insight into the process of tumorigenesis in which early stage GISTs (GIS1 and GIS2) evade apoptosis, intermediate-stage GISTs (GIS2 and GIS3) undergo accelerated proliferation, and late-stage GISTs (GIS3 and GIS4) lose their dependence on cell adhesion, allowing invasion and metastasis. These different key pathway aberrations in different GIS groups validate the accumulative progressive character of GISTs. We believe that these findings are compelling; however, functionally confirming them would require a much larger study. We also must point out that, although we did not observe similar findings for LMSs, this may mean only that LMS is a more heterogeneous disease, and a larger sample size would be needed to reveal key signatures that underlie disease progression and prognosis in patients with LMS.


We thank Drs. Bogdan Czerniak, Jean-Pierre Issa, and Janet Bruner for their critical review of this article and valuable comments. In addition, we thank David Cogdell and Limei Hu for performing the microarray experiments and Drs. Robert Benjamin, Olli Yli-Harja, and Ilya Shmulevich for their significant contribution to the experimental design and interpretation of the results. We also thank Ms. Tamara Locke of the Department of Scientific Publications at The University of Texas M. D. Anderson Cancer Center for editing this article.


Supported by National Institutes of Health (NIH) grant R01 CA098570 (to W.Z.), an NIH Career Development Award (to J.T.), a Commonwealth Foundation for Cancer Research grant (to W.Z. and J.T.), Academy of Finland Projects 213462 and 122973 (to A.Y. and M.N.), and the National Natural Science Foundation of China (30901715/C171002; to J.Y.). This research is supported in part by the NIH through The University of Texas M. D. Anderson Cancer Center Support Grant CA016672.