Construction of oncogenetic tree models reveals multiple pathways of oral cancer progression

Oral cancer develops and progresses by accumulation of genetic alterations. The interrelationship between these alterations and their sequence of occurrence in oral cancers has not been thoroughly understood. In the present study, we applied oncogenetic tree models to comparative genomic hybridization (CGH) data of 97 primary oral cancers to identify pathways of progression. CGH revealed the most frequent gains on chromosomes 8q (72.4%) and 9q (41.2%) and frequent losses on 3p (49.5%) and 8p (47.5%). Both mixture and distance‐based tree models suggested multiple progression pathways and identified +8q as an early event. The mixture model suggested two independent pathways namely a major pathway with −8p and a less frequent pathway with +9q. The distance‐based tree identified three progression pathways, one characterized by −8p, another by −3p and the third by alterations +11q and +7p. Differences were observed in cytogenetic pathways of node‐positive and node‐negative oral cancers. Node‐positive cancers were characterized by more non‐random aberrations (n = 11) and progressed via −8p or −3p. On the other hand, node‐negative cancers involved fewer non‐random alterations (n = 6) and progressed along −3p. In summary, the tree models for oral cancers provided novel information about the interactions between genetic alterations and predicted their probable order of occurrence. © 2009 UICC

Oral squamous cell carcinomas (OSCC), like all solid tumors, are characterized by multiple chromosomal alterations and are genetically complex. 1 Dependencies between the numerous genetic alterations lead to observed karyotypic complexity, which results in the distinct biological behavior of oral cancers. 2 For example, node-positive OSCC are biologically aggressive and have poor prognosis when compared with the node-negative OSCC. 3 This indicates that different genetic pathways of progression exist in oral cancers, leading to the molecular subtypes with distinct clinical outcomes. Hence, it is necessary to identify the genetic alterations and the interactions between them that form multiple progression pathways. This approach may aid in better understanding the biology of oral carcinomas.
Comparative genomic hybridization (CGH), a genome-wide profiling technique, has revealed a non-random pattern of genetic alterations in oral cancers. [4][5][6][7][8][9] An early study suggested that genomic alterations in OSCC may be more uniform than those of other solid tumors, 5 but a more recent study demonstrated that the initiation and progression of oral cancers involve divergent pathways. 9 Because that study used frequency analysis, it could not evaluate interactions between the alterations and provided limited information on genetic pathways. Thus, it is desirable to use additional statistical methods that account for interactions and can estimate genetic pathways of cancer progression from CGH data.
Cancer progression has been described by mathematical models, such as oncogenetic tree models. [10][11][12] Tree models are more flexible than linear models of progression proposed earlier, 13 because trees can represent multiple pathways simultaneously. Oncogenetic tree models have been constructed for renal cell carcinomas, bladder cancers, head and neck cancers, nasopharyngeal cancers, prostate cancer, B-cell lymphomas and meningiomas. [14][15][16][17][18][19][20] For each cancer type, the tree models have identified multiple progression pathways and revealed different subtypes characterized by combinations of alterations.
To date, the pathogenetic pathways followed by oral cancer have not been thoroughly investigated. Elucidating the divergent routes in oral carcinomas could provide information about molecular subtypes, which might support treatment decisions. With this long-term goal in mind, in the current study, we constructed oncogenetic tree models based on CGH data of 97 primary oral cancers. Both mixture models and distance-based tree models were used to analyze genetic alterations in OSCC. Distance-based tree models were also constructed separately for node-positive and node-negative OSCC.

Study population
Tumor tissues were collected from 97 oral cancer patients who underwent surgical resection at the Tata Memorial Hospital (TMH), Mumbai. Tissue collection and the entire study protocol were approved by the Institutional Review Board at TMH. Informed consent was obtained from the patients. None of the patients had received chemotherapy or radiation therapy before surgery. After microdissection of tissues, the pathologist confirmed that each tissue sample had 60% tumor cell content. All the tumor samples were graded and staged according to the WHO classification and the TNM and AJCC 2002 classifications of tumors, respectively. Clinico-pathological data of the cases are summarized in Table I. The study group consisted of 75 males and 22 females with a median age of 52 years, ranging from 23 to 77 years. There were more node-positive tumors (n = 54) than node-negative tumors (n = 43).

CGH analysis
CGH was performed as described previously using the direct labeling method with fluorochrome-labeled dUTPs. 16,21 Using the standard nick translation method, tumor DNA was labeled with Fluorescein-12-dUTP and normal DNA was labeled with Texas Red-5-dUTP (NEN, Boston, MA). Equal quantities (2 µg each) of tumor and normal DNA were mixed with 10 µg of unlabeled human Cot-1 DNA (Invitrogen) and dissolved in 10 µl of hybridization buffer. The denatured probes in the mixture were hybridized onto the normal metaphase spreads (Vysis, Touhy, Des Plaines, IL) at 37°C for 48 hr. After post-hybridization washes, the slides were counterstained with DAPI (Vectashield, Burlingame, CA) for chromosome identification before visualization by the fluorescence microscope (Zeiss Axioscope, Jena, Germany). The images were analyzed by a digital image analysis system (Metasystems, Germany). The average ratio of green to red fluorescence intensity was calculated for each chromosome. The thresholds were set at 1.25 and 0.75 to determine copy number alterations (CNAs), i.e., gains and losses, respectively.
The thresholds 1.25 and 0.75 are commonly, though not universally, used in CGH. The thresholds were proposed in the original article on CGH 21 and further evaluated in early studies. 22,23 The study of Jeuken et al. 24 showed that using thresholds of 1.2 and 0.8 did not make much difference in practice. Among the 6 previous studies on CGH of oral cancer that we found, 4-9 some used 1.25/0.75 thresholds; others used 1.2/0.8. As far as we could determine, none of these 6 studies analyzed their data using more than one pair of thresholds.
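The calling rule described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names (`call_cna`, `call_profile`) and the example ratios are hypothetical, and treating a ratio exactly equal to a threshold as "no alteration" is an assumption, since the text does not say how ties are handled.

```python
# Minimal sketch of threshold-based CNA calling from CGH ratios.
# Function names and example ratios are illustrative assumptions,
# not part of the software used in the study.

GAIN_THRESHOLD = 1.25  # green:red ratio above this is called a gain
LOSS_THRESHOLD = 0.75  # green:red ratio below this is called a loss


def call_cna(ratio, gain=GAIN_THRESHOLD, loss=LOSS_THRESHOLD):
    """Return '+' for a gain, '-' for a loss, '' for no alteration."""
    if ratio > gain:
        return "+"
    if ratio < loss:
        return "-"
    return ""


def call_profile(ratios, gain=GAIN_THRESHOLD, loss=LOSS_THRESHOLD):
    """Map arm -> ratio to a sorted list of signed events such as '+8q'."""
    events = []
    for arm, ratio in ratios.items():
        sign = call_cna(ratio, gain, loss)
        if sign:
            events.append(sign + arm)
    return sorted(events)


# Hypothetical average ratios for one tumor (not data from the study).
example = {"8q": 1.40, "3p": 0.70, "9q": 1.22, "5p": 1.00}
print(call_profile(example))              # thresholds 1.25 / 0.75
print(call_profile(example, 1.20, 0.80))  # thresholds 1.20 / 0.80
```

In this toy profile only the borderline 9q ratio changes its call between the two threshold pairs, which is the kind of sensitivity a two-threshold comparison would expose.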

Fluorescence in situ hybridization
Interphase FISH was performed on archival oral cancer samples with known CGH results to evaluate the CGH-predicted gains of 11q13 and 8q24.3. Dual-color FISH was performed on 4 µm sections of archival OSCC samples (n = 28) and the corresponding normal oral mucosal tissues (n = 3). Centromere-specific BAC clones were Cy-3 (red) labeled and region-specific BAC clones were FITC (green) labeled using the standard nick translation method. All BAC clones were obtained from CHORI (BACPAC Resource Center, Children's Hospital and Research Centre, Oakland, USA). The specific BAC clones we used were RP11-642A1 for region 8q24.3, RP11-73M19 for chromosome 8 centromere, RP11-149G19 for 11q13 region and RP11-135H8 for chromosome 11 centromere.
Sections on slides were first deparaffinized in xylene and then treated using the commercially available Vysis pre-treatment kit (Vysis). Briefly, this involved treating the sections for 30 min in 1 M sodium thiocyanate, digesting for 20 min with protease at 0.5 mg/ml in 0.01 N HCl and fixing in 10% neutral buffered formalin. For FISH experiments, labeled probes were added to a slide in a hybridization solution containing 50% deionized formamide, 10% dextran sulphate, 2X SSC, 2 mg salmon sperm DNA and 10 mg Cot-1 DNA. The slides and probe DNA were denatured at 75°C for 10 min and hybridized overnight in a humidified chamber at 37°C. Subsequently, the slides were subjected to post-hybridization washes in 50% deionized formamide/2X SSC at 42°C for 10 min and three times in 2X SSC at 42°C for 5 min. Interphase nuclei were counterstained with 4,6-diamidino-2-phenylindole (DAPI).
For evaluation of the experiments, hybridization signals from 100 non-overlapping interphase cell nuclei of each tumor sample were counted using a fluorescence microscope. A copy number gain was scored, if the average number of signals per nucleus was greater than or equal to three (3).

Construction of oncogenetic trees
Two types of oncogenetic tree models were used to describe the occurrence of genetic alterations during the progression of oral cancer. Distance-based trees were constructed using the software oncotrees 10,11 (http://www.ncbi.nlm.nih.gov/CBCresearch/Schaffer/cgh.html) and mixture models of trees were generated by the Mtreemix software package (http://mtreemix.bioinf.mpi-sb.mpg.de). 25 Chromosomal aberrations from CGH data were used as input for the tree modeling procedures. Abnormalities on 1p, 16p, 22q and Y chromosome arms were excluded from analysis, because these chromosome regions are guanine/cytosine-rich and are known to yield false positives. For construction of tree models, the CGH profile was recorded as presence/absence of a gain and presence/absence of a loss on a chromosome arm. We first tried using more precise single-digit bands (data not shown), but this representation confounded the spatial relationships between bands on the same chromosome arm with the desired temporal relationships between copy number aberrations.
Trees were constructed from CNA events selected as non-random by the method of Brodeur et al., 26 which is implemented within oncotrees. The method of Brodeur et al. requires a prior distribution for CNA occurrences. For this purpose, we initially assumed that the probabilities of a gain or loss on a chromosome arm are equal and proportional to the chromosome arm size; we used arm sizes from Morton. 27 However, it is known that gains are more common than losses in OSCC 4,5,7-9 and in the present data set, gains were approximately twice as frequent as losses. Therefore, we also used a 2:1 skewed prior distribution in addition to the balanced prior. To reduce the number of events selected as non-random, we used a threshold of 99th percentile in the test statistic of Brodeur et al. The events selected as non-random scored above the 99th percentile using both the balanced and the 2:1 skewed prior distributions. The selection of non-random events was redone for each tumor subset considered.
In the oncogenetic tree model, the root corresponds to the normal state of the cell. Vertices of the tree represent genetic alterations (events) and edges between vertices represent statistical dependencies between events. Each vertex is associated with the probability that the corresponding event will occur if the preceding event in the tree has already occurred. Thus, aberrations that are placed close to the root of the tree are estimated to occur early in tumor development, whereas those at longer distances are estimated to occur late in tumor progression. Correlated events tend to cluster in subtrees. In the tree models, the genetic events are assumed to be irreversible.
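The tree model described above can be made concrete with a small sketch. It assumes the standard reading of a single oncogenetic tree: an event can only occur once its parent has occurred, and each event then occurs independently with its conditional probability. The four-event tree and all probabilities below are placeholders chosen for illustration, not estimates from this study, and the function names are hypothetical.

```python
# Minimal sketch of a single oncogenetic tree, assuming each event occurs
# independently with its edge probability once its parent has occurred.
# The tree shape and all probabilities are illustrative placeholders.
from itertools import combinations

PARENT = {"+8q": "root", "-8p": "+8q", "+9q": "+8q", "-3p": "-8p"}
COND_PROB = {"+8q": 0.7, "-8p": 0.8, "+9q": 0.5, "-3p": 0.6}


def pattern_probability(events, parent=PARENT, cond_prob=COND_PROB):
    """Probability that exactly the given set of events is observed."""
    present = set(events)
    probability = 1.0
    for event, par in parent.items():
        parent_occurred = par == "root" or par in present
        if event in present:
            if not parent_occurred:
                return 0.0  # pattern skips an ancestor: impossible in this tree
            probability *= cond_prob[event]
        elif parent_occurred:
            probability *= 1.0 - cond_prob[event]
    return probability


def total_probability(parent=PARENT, cond_prob=COND_PROB):
    """Sum over all event subsets; should equal 1 for a valid tree."""
    events = list(parent)
    return sum(
        pattern_probability(subset, parent, cond_prob)
        for size in range(len(events) + 1)
        for subset in combinations(events, size)
    )


print(pattern_probability(["+8q"]))         # only the early event
print(pattern_probability(["+8q", "-8p"]))  # one step along a pathway
print(pattern_probability(["-8p"]))         # 0.0: parent +8q is missing
print(total_probability())
```

Patterns that skip an ancestor receive probability zero under a single tree, which is one motivation for adding a star-shaped noise component in the mixture models.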

Mixtures of oncogenetic trees
The oncogenetic trees mixture model 12,18,25 consists of several weighted components, each of which is an oncogenetic tree as described above. Usage of several tree components allows more flexible modeling of oncogenetic pathways than using a single tree. We fixed one tree component with a star topology and uniform transition probabilities. This tree models events as independent and uniform and thus serves as a null model. The fraction of samples assigned to this tree does not show apparent branching-like correlations. By contrast, the structure and parameters of the other tree components are learned from the data and they reveal pronounced dependencies among the cytogenetic alterations. The star component captures the possibility that gains and losses occur at random with no dependencies. The mixture model allows the addition of one or more non-star trees to model the dependencies between copy number aberrations. Using more non-star trees may fit the data better, but is susceptible to overfitting. We used a modified Bayesian Information Criterion (BIC) to trade off the increasing complexity of the model with the improved fit to the data as the number of tree components increases. 28 Using this model selection technique, one additional non-star tree gave the best BIC score.
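The trade-off between fit and complexity can be illustrated with the textbook BIC. This sketch is not the modified criterion used in the study (its exact form is given in reference 28, not in this text); it uses the standard definition, BIC = -2 ln L + k ln n, with the convention that lower is better. The log-likelihoods and parameter counts are invented placeholders.

```python
# Sketch of model selection with the standard (unmodified) BIC.
# All log-likelihoods and parameter counts below are hypothetical.
import math


def bic(log_likelihood, n_params, n_samples):
    """Standard Bayesian Information Criterion; lower values are preferred."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)


def best_by_bic(candidates, n_samples):
    """candidates maps a model name to (log_likelihood, n_params)."""
    return min(candidates, key=lambda name: bic(*candidates[name], n_samples))


# Placeholder candidates: star only, star + 1 tree, star + 2 trees.
candidates = {
    "star only": (-620.0, 1),
    "star + 1 tree": (-560.0, 14),
    "star + 2 trees": (-550.0, 27),
}
for name, (loglik, k) in candidates.items():
    print(name, round(bic(loglik, k, 97), 1))
print("selected:", best_by_bic(candidates, 97))
```

With these placeholder numbers the second tree improves the likelihood too little to offset its extra parameters, so the two-component model is selected.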

Distance-based trees
The distance-based trees were constructed as described in previous studies. 16,29 For the distance-based trees, we used the Fitch 30 and Neighbor 31 programs of PHYLIP 32 to fit the distances to a tree. In all cases, the Neighbor program gave a better fit, which means the following. Let $I_{ij}$ be the input distance between CNAs $i$ and $j$ and let $T_{ij}$ be the distance implied by the output tree. We have two choices for the matrix $T$, one from the Fitch program and the other from the Neighbor program. We define a matrix of differences $D_{ij} = |I_{ij} - T_{ij}|$; one choice of $T$ "fits" better than another if it gives a matrix $D$ that has smaller entries. The total size of $D$ can be measured by standard matrix norms. When we do so, for these trees, the tree distances $T$ from Neighbor always give a better fit than those from Fitch.
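The comparison of fits can be sketched as follows. The text says only that "standard matrix norms" were used, so the choice of the Frobenius norm here is an assumption, as are the function names and the small example matrices (which are not PHYLIP output).

```python
# Sketch: compare how well two tree-implied distance matrices fit the
# input distances, using the Frobenius norm of D = |I - T|.
# The norm choice and the example matrices are assumptions.
import math


def difference_matrix(input_dist, tree_dist):
    """Entrywise absolute difference D_ij = |I_ij - T_ij|."""
    return [
        [abs(i_val - t_val) for i_val, t_val in zip(i_row, t_row)]
        for i_row, t_row in zip(input_dist, tree_dist)
    ]


def frobenius_norm(matrix):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(value * value for row in matrix for value in row))


def better_fit(input_dist, tree_a, tree_b):
    """Return 'a' or 'b' for the tree whose distances are closer to the input."""
    norm_a = frobenius_norm(difference_matrix(input_dist, tree_a))
    norm_b = frobenius_norm(difference_matrix(input_dist, tree_b))
    return "a" if norm_a <= norm_b else "b"


# Hypothetical 3 x 3 example.
I = [[0.0, 2.0, 3.0], [2.0, 0.0, 4.0], [3.0, 4.0, 0.0]]
T_a = [[0.0, 2.0, 3.5], [2.0, 0.0, 4.0], [3.5, 4.0, 0.0]]
T_b = [[0.0, 1.0, 3.0], [1.0, 0.0, 5.0], [3.0, 5.0, 0.0]]
print(better_fit(I, T_a, T_b))
```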
To compute bootstrap confidence levels for each split in the distance-based trees, we used a 3-step procedure. First, 100 bootstrap samples were generated and their associated distance matrices were computed using the bootstrapping module of oncotrees. Second, we used the Neighbor program to fit each bootstrap sample to a tree. Third, we used the Consense program from PHYLIP to compute the number of times each split in the tree occurred perfectly. This method counts positively only those trees in which a split into subtrees occurred exactly as in the original distance-based tree; therefore, one expects confidence levels to be fairly low when the split involves more than 2 or 3 events.
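The exact-match counting in the third step can be sketched as below. This is not the Consense program; it is a simplified illustration in which each tree is represented by the set of event groups (subtrees) it contains, and a split is counted only when an identical group appears in a bootstrap tree. The function name and the toy trees are hypothetical.

```python
# Sketch of exact-match bootstrap support for splits (subtrees).
# Each tree is represented as a set of frozensets, one per subtree,
# listing the events in that subtree. The toy trees are hypothetical.


def split_support(reference_splits, bootstrap_trees):
    """Percentage of bootstrap trees containing each reference split exactly."""
    n_trees = len(bootstrap_trees)
    if n_trees == 0:
        raise ValueError("need at least one bootstrap tree")
    support = {}
    for split in reference_splits:
        hits = sum(1 for tree in bootstrap_trees if split in tree)
        support[split] = 100.0 * hits / n_trees
    return support


reference = [frozenset({"-8p", "-3p"}), frozenset({"+11q", "+7p"})]
bootstrap = [
    {frozenset({"-8p", "-3p"}), frozenset({"+11q", "+7p"})},
    {frozenset({"-8p", "-3p"})},
    {frozenset({"+11q", "+7p", "+20q"})},  # near miss: not counted
    {frozenset({"-8p", "-3p"}), frozenset({"+11q", "+7p"})},
]
support = split_support(reference, bootstrap)
for split, percent in support.items():
    print(sorted(split), percent)
```

The third toy tree shows why support drops for larger groups: a subtree that differs by a single event contributes nothing to the count.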

Tree visualization
Mixture models are drawn with Graphviz, 33 and distance-based trees are drawn with TreeView. 34 The relevant distance in the visualization of distance-based trees is the horizontal distance from the root node to the node representing a copy number aberration or the horizontal distance between 2 copy number aberrations. Vertical edges and distance are used to spread out the tree for easier visualization.

Genetic progression scores
The genetic progression score (GPS) of an observed tumor sample is defined as the expected waiting time of the mutational pattern of the tumor in the timed oncogenetic trees mixture model. 18 The absolute values of this quantity are only meaningful if information on the true age of each tumor is available. However, even without scaling, the expected waiting time provides a useful measure of genetic progression that is based on the tree model. Unlike simple counting of alterations, the GPS accounts for dependencies between events. The GPS distributions of node-positive and node-negative patients were compared using the Wilcoxon rank-sum test.

Time of occurrence
A simpler method to infer the relative early and late occurrence of the events is called time of occurrence (TO) analysis. 10,35 One computes, for each event, how many other events occur in all tumors that have the event. The general concept of TO analysis is that an event A occurs before an event B if the number of events co-occurring with A is smaller than with B. To compare the number of co-occurring events for A and B, one could use as the test statistic the average, the median, or the mode. Desper et al. 10 recommended using the average. Höglund et al., 35 who coined the term "TO analysis", recommended using the mode. Unfortunately, in our data set, many CNAs have a mode of "9" in the distribution of the number of co-occurring aberrations, so the mode is not a very useful test statistic. Therefore, we followed the recommendation of Desper et al. 10 and used the average number of co-occurring events to propose an order of events.
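The averaging variant of TO analysis is simple enough to sketch directly. The function names and the three toy tumors are hypothetical; each tumor is represented as the set of non-random events it carries, and events with a smaller average number of co-occurring events are proposed as earlier.

```python
# Sketch of time-of-occurrence (TO) analysis using the average number of
# co-occurring events. The toy tumors below are hypothetical.
from statistics import mean


def time_of_occurrence(tumors):
    """Map each event to the mean number of other events in tumors with it."""
    events = set().union(*tumors)
    scores = {}
    for event in events:
        co_counts = [len(tumor) - 1 for tumor in tumors if event in tumor]
        scores[event] = mean(co_counts)
    return scores


def proposed_order(tumors):
    """Events sorted from proposed-early to proposed-late (ties by name)."""
    scores = time_of_occurrence(tumors)
    return sorted(scores, key=lambda event: (scores[event], event))


tumors = [
    {"+8q"},
    {"+8q", "-8p"},
    {"+8q", "-8p", "-3p"},
]
print(time_of_occurrence(tumors))
print(proposed_order(tumors))
```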

Confirmation of some CGH results by FISH
To confirm some of the CNAs detected by CGH, we performed FISH on 28 samples to evaluate the regions 11q13 and 8q24 that are frequently gained (Fig. 1). For samples where we could obtain FISH signals, the concordance rates were good (81% for 11q13 and 88% for 8q24.3; Table II). For 11q13, the 5 discrepancies comprised 2 cases in which the tissue was non-informative and 3 cases in which FISH detected a gain that was not found by CGH. For 8q24, the 3 discrepancies comprised 2 non-informative tissues and one case in which FISH detected a gain that was not found by CGH.

Construction of oncogenetic trees for oral cancer progression
The method of Brodeur et al. (see Material and Methods) selected 12 CNAs that occurred more frequently than would be expected at random: +5p, +7p, +8q, +9q, +11q, +17p, +18p, +20p, +20q, −3p, −8p and −18q. Here, a plus (+) symbol indicates the gain of a chromosomal region and a minus (−) represents a loss. Oncogenetic and distance-based trees were constructed using these 12 events. Both models represent the apparent multi-step and multi-pathway process of oral carcinogenesis (Figs. 2 and 3).

Oncogenetic trees mixture model
We estimated oncogenetic trees mixture models consisting of a star component and a non-star tree component to obtain a concise description of the genetic development of oral cancers (Fig. 2). A third of the tumors can be explained by the non-star tree component. In this branching tree, the root vertex corresponds to the normal oral keratinocyte, whereas the other vertices represent the CNAs of interest. Because the event +8q is the only direct successor of the root, it is predicted to be the initial event. Once this event occurs, the occurrence of subsequent events becomes much more likely. After +8q, the branching tree displays two independent pathways, one consisting only of event +9q, the other comprising the 10 remaining events and starting with −8p followed by −3p. The latter pathway was predicted to further branch into two pathways beginning with −18q and +7p, respectively. After the initial +8q event, the large 10-event sub-branching is more likely to develop than the +9q pathway, the likelihood ratio being 0.79:0.51 = 1.55.

Distance-based trees
In the distance-based tree model, the time to occurrence is proportional to the root-leaf horizontal distance (Fig. 3). According to this model, +8q was an early event. After the occurrence of +8q, the distance-based tree classified other events into 2 or 3 clusters. One cluster is marked by −8p and −3p (comprising −8p, −18q and +18p; and −3p, +9q and +17p); it might be split into two sub-clusters of three events each, but the bootstrap confidence for the split into two sub-clusters is low (14%). The other cluster is marked by events +11q and +7p (comprising +11q, +7p, +20p, +20q and +5p). These two clusters suggest OSCC genetic subtypes.
Though similarities were observed among the tree models, there were inconsistencies in terms of whether (i) −3p is part of the −8p pathway and (ii) +9q depends on −3p or not.

Construction of oncogenetic trees for node-negative and node-positive oral cancers
In a univariate analysis comparing node-positive and node-negative tumors, four events showed significant association with node-positive status by a one-sided Fisher's exact test: −8p (p < 0.008), +7p (p < 0.01), −18q (p < 0.04) and +9q (p < 0.04). These associations do not remain significant after correcting for multiple testing using the false discovery rate (FDR) method. 36 Using the FDR method, −8p (adjusted p < 0.06) and +7p (adjusted p < 0.06) approached statistical significance. Thus, associations of single genetic imbalances with nodal status are not significant after correcting for multiple testing. However, this finding does not preclude the possibility that associations of 2 or more imbalances with nodal status might be statistically significant. To test this hypothesis, we focused on multivariate analyses, which could shed more light on the genomic differences between node-positive and node-negative OSCC.
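The multiple-testing correction can be illustrated with a short sketch. The text cites "the FDR method" without naming the procedure, so the Benjamini-Hochberg step-up adjustment used here is an assumption. The worked example is also hypothetical: it treats the four reported upper bounds as if they were exact p-values and assumes 12 tests in total, with the eight unreported p-values set to 1.0 as placeholders.

```python
# Sketch of Benjamini-Hochberg adjusted p-values (assumed FDR procedure).
# The example treats reported bounds as exact values and pads the list
# with placeholder p-values of 1.0; it is not a reanalysis of the data.


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


labels = ["-8p", "+7p", "-18q", "+9q"]
pvals = [0.008, 0.01, 0.04, 0.04] + [1.0] * 8  # last eight are placeholders
adjusted = benjamini_hochberg(pvals)
for label, value in zip(labels, adjusted):
    print(label, round(value, 3))
```

Under these assumptions the two smallest p-values both adjust to about 0.06, in line with the adjusted values reported above, while the other two adjust to about 0.12.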
Separate distance-based tree models were constructed to understand the progression pathways in node-positive versus node-negative oral cancers. More events (n = 11) were selected as non-random in node-positive OSCC than in node-negative OSCC (n = 6) (Fig. 4), consistent with the hypothesis that node-positive tumors are more advanced. For both sets of tumors, the non-random events were chosen systematically by the established method of Brodeur et al. (1982), although we raised the cutoff to the 99th percentile instead of the 95th. Being more stringent is not helpful. If we are less stringent (e.g., 95th percentile), then the number of events selected as non-random grows and the tree models become unwieldy.
According to Figure 4, the node-positive cancers may be classified into two main groups: one group includes +9q, −8p, +18p and −18q, whereas the other group includes +7p, +11q, +20p, +20q and +5p. Events +8q and −3p are early events that do not fit clearly into either of these 2 groups. The splitting of the 2 large groups has only moderate bootstrap support (15%), whereas the separation of +8q and −3p is more pronounced (63% and 25%, respectively). The event +9q, which is suggested as an important indicator of progression by the mixture tree model on all tumors, is selected as non-random for the node-positive tumors, but not for the node-negative tumors. The numbers of tumors in each subset were considered too small for mixture tree analysis. Further, the distance-based tree models predicted progression of node-negative OSCC by two main classes (Fig. 4, bootstrap confidence 69%). One class included tumors that did not progress much beyond the aberrations on chromosome 8. The other class had 2 subgroups for which the confidence in the split is 64%: subgroup 1, which included tumors that progressed with −3p, and subgroup 2, which progressed with +11q. The events in both these clusters were not placed near to each other. The alterations +7p, −18q, +20p and +5p were not selected for the node-negative OSCC. Overall, the distance-based tree model indicates that this subtype was karyotypically less complex as compared with the node-positive OSCC.

Genetic progression score
For each observed tumor, the GPS measures the level of genetic progression in the oncogenetic tree model. Node-positive cancers had progressed significantly further along the tree model than node-negative cancers (median GPS of 0.87 vs. 0.59, p = 0.0016; Fig. 5).

Time of occurrence
TO statistics for the non-random events are shown in Table III. The average number of imbalances in tumors harboring −8p was less than the average for −3p, which indicates that −8p would occur before −3p. The values for all alterations except +18p were consistent with the predictions of the tree analysis. The exception might be due to the lower frequency of +18p in our study (13.4%).
FIGURE 2 - Displayed is the non-star tree component of the oncogenetic trees mixture model with a total of 2 components. The first component, which is not displayed, has the star topology, i.e., there is an edge from the root "null" to each event and the same conditional probability for each edge. By contrast, the topology of the displayed tree has been estimated from the data. Edges in this tree represent statistical dependencies. The edges are labeled with the maximum likelihood point estimate of the conditional probability of the event at the head of the edge given that the event at the tail has already occurred (1st line), the 95% confidence interval of this estimate (2nd line) and the count of this edge in 1000 trees obtained from bootstrapping (3rd line). Counts close to 1000 indicate high confidence in the presence of the edge.

Discussion
The non-random CGH pattern of oral cancers detected in our study was generally consistent with previous studies, 4-9 although some of the losses reported as common vary from study to study; this variation may be due to differences between the populations sampled. 37 Among the observed alterations, the gain of 8q and loss of 3p have been reported to be the early events in oral carcinogenesis. 38,39 Besides these alterations, the sequence of other aberrations in oral tumorigenesis is not known. Elucidating progression pathways from complex CGH data requires statistical modeling techniques such as oncogenetic trees. The current study is the first to apply tree models to CGH data obtained from a reasonably large number (n = 97) of primary oral cancers. Among the previous CGH studies of oral cancer, the largest number of tumors in one study was 35. 9 The branching mixture models and distance-based tree models identified divergent pathways of progression in oral cancers. There were similarities and differences in the sequence of alterations revealed by them. The common findings led to the following inferences: (i) +8q is located near the root, hence it is an early event in cellular transformation of oral cancers; (ii) at least three subtrees emerge subsequent to the occurrence of +8q, which indicates that there are divergent pathways of progression in oral carcinomas; (iii) alterations +5p, +17p and +18p are late events owing to their long distance from the root; (iv) a close relationship exists between +7p and +11q, and between +20p and +20q, hence these are present in the same subcluster. Some inconsistencies were observed mainly in the relative ordering of −3p and +9q alterations, namely whether −3p is part of the −8p pathway or not and, with respect to the placement of +9q, whether it is part of the −3p pathway or independent of the other events. Part of these inconsistencies may result from the fact that only about a third of the data can be mapped to the oncogenetic tree component displayed in Figure 2.
FIGURE 4 - Distance-based trees for node-positive (LHS) and node-negative oral cancers (RHS). Distinct pathways of progression are present in both subtypes. The −8p pathway with its subtree is more prevalent in node-positive than in node-negative oral cancers. When they could be computed in the Consense program, bootstrap confidence values are shown for each split into subtrees.
FIGURE 5 - GPS for the node-negative and the node-positive oral cancers. Displayed is a histogram of the GPS of all node-positive (dark bars) and all node-negative (light bars) tumors. The GPS distribution of node-positive tumors is shifted to the right relative to the node-negative tumors, indicating that, on average, node-positive tumors have progressed further along the underlying oncogenetic tree (Fig. 2).
The average number of genetic alterations in OSCC with the given alterations.
Comparing the current tree models of oral cancers with the tree analysis of the head and neck cancers, we found that some of the alterations (−3p, +8q, +5p, +17p and +18p) were selected as non-random events for both the present oral cancer data set and an early collection of head and neck cancer. 16 Also, the prediction of occurrence of +8q and +18p suggested for the subset of head and neck cancers coincided with the progression pathways identified by our study. Thus, +8q appears to be an early event in the development of oral cancers. Activation of the oncogene c-MYC on chromosome region 8q24 has frequently been implicated in oral carcinogenesis, but some studies have suggested other genes on 8q. 38 After the occurrence of +8q, the losses of 8p and 3p were predicted as subsequent early events during progression of OSCC. This novel finding seems to be in contrast to previous studies, which have reported that −3p is an early event 39 or explicitly predicted that the loss of 3p precedes the loss of 8p in both oral premalignant and malignant lesions. 40 Both of these studies were based on loss of heterozygosity (LOH) analysis using specific microsatellite markers. For the present data set, the TO analysis confirmed the oncogenetic tree predictions that −8p tends to occur before −3p (Table III). The different predictions could be due to differences between LOH and CGH or due to differences in the study populations.
Among the 3 subclusters identified by the distance-based tree, +7p and +11q appeared to form a distinct pathway comprising only chromosomal gains (+11q, +7p, +20p, +20q and +5p). Genes at chromosomal regions 7p12 and 11q13 are known to affect the biological characteristics of OSCC. For example, activation of the oncogene EGFR on 7p12 has been implicated in nodal metastasis of OSCC, 41,42 whereas the oncogenes CCND1 and EMS1 on 11q13 have been associated with high proliferation, invasiveness and poor prognosis of HNSCC. 43,44 Because the gains of 7p and 11q contribute to the aggressive behaviour of OSCC, this pathway may also have prognostic importance.
It is well established that node-positive OSCC have poor prognosis as compared with node-negative OSCC. This may be due to the differences in the underlying genetic alterations between the two subtypes. The GPS analysis suggests that node-positive OSCC have progressed further along the tree and they tend to be of a later genetic progression stage. On the other hand, the distance-based trees that were constructed for each subgroup separately indicate differences in the number and type of genetic events that occur in each subtype. Hence, we have found indications for node-positive OSCC as both late stage tumors in a unified progression model and as resulting from alternative progression routes.
Higher numbers of genetic alterations and higher GPS suggest increased karyotypic complexity and genetic instability in the node-positive OSCC. Furthermore, the pathway of −8p and its subtree as well as the alterations −18q, +7p and +9q were more prominent in the node-positive than in the node-negative OSCC. The loss of chromosomes 8p and 18q and gain of chromosome 7p have been reported to contribute to metastasis and poor prognosis of HNSCC. 42,45,46 Taken together, the findings of the oncogenetic tree analyses indicate that the initial genomic event is +8q, from which at least three karyotypic pathways emerged. Thus, the events after +8q are not completely random. This implies that the process of oral carcinogenesis is not the accumulation of genetic alterations in an unordered fashion. Rather, alterations occur preferentially in certain orders that may define tumor subtypes such as the node-positive and node-negative oral cancers. The tree models have provided novel information about the sequence of alterations and the non-linear pathways they form. Future efforts should attempt to identify the genes present at the altered chromosomal regions identified by the tree analysis, which might help elucidate the pathogenesis of oral cancers.