The early evolutionary landscape of osteosarcoma provides clues for targeted treatment strategies

Abstract Osteosarcomas are aggressive primary tumors of bone that are typically detected in locally advanced stages; however, which genetic mutations drive the cancer before its clinical detection remain unknown. To identify these events, we performed longitudinal genome‐sequencing analysis of 12 patients with metastatic or refractory osteosarcoma. Phylogenetic and molecular clock analyses were carried out next to identify actionable mutations, and these were validated by integrating data from additional 153 osteosarcomas and pre‐existing functional evidence from mouse PDX models. We found that the earliest and thus clinically most promising mutations affect the cell cycle G1 transition, which is guarded by cyclins D3, E1, and cyclin‐dependent kinases 2, 4, and 6. Cell cycle G1 alterations originate no more than a year before the primary tumor is clinically detected and occur in >90% and 50% of patients of the discovery and validation cohorts, respectively. In comparison, other cancer driver mutations could be acquired at any evolutionary stage and often do not become pervasive. Consequently, our data support that the repertoire of actionable mutations present in every osteosarcoma cell is largely limited to cell cycle G1 mutations. Since they occur in mutually exclusive combinations favoring either CDK2 or CDK4/6 pathway activation, we propose a new genomically‐based algorithm to direct patients to correct clinical trial options. © 2021 The Authors. The Journal of Pathology published by John Wiley & Sons, Ltd. on behalf of The Pathological Society of Great Britain and Ireland.


Introduction
Osteosarcoma is the most common primary malignant tumor of bone that generally develops in children and adolescents, but there is a large population of patients presenting the disease in the last decades of life [1]. The osteosarcoma's cell of origin is currently unknown, but it is speculated that these tumors originate from early osteoblastic progenitors [2] that might be vulnerable to chromoanagenesis [3]. Should this happen, most cells will die in apoptosis. However, in rare cases, chromoanagenesis can generate TP53 mutations that allow cells with rearranged genomes to enter mitosis and eventually escape programmed cell death [4]. However, osteosarcomas can also develop without TP53 mutations and recent studies suggested that its mutations occur across a much broader range of genes (67 according to one [5]), questioning how such genetic diversity contributes to the development of a single cancer entity. In this study, we attempted to tackle the above conundrum by analyzing longitudinal sequencing data from matched primary and recurrent tumors of 12 patients. Phylogenetic and molecular clock analyses were then performed to estimate the timing of different cancer driver mutations and we explored which of the initiating ones can be utilized as therapeutic targets. These data provide an understanding of the evolutionary trajectories underpinning the development of osteosarcoma and highlight cell cycle G1 mutations as particularly promising early actionable mutations that become clonally pervasive in metastatic osteosarcomas.

Patients and samples
The study was approved at the University Hospital Basel, following the approval of the ethical committee for mutational analysis of anonymized samples ('Ethikkommission beider Basel' ref 274 /12). Informed consent was obtained from all 12 patients. All tumor samples were evaluated by an experienced bone pathologist to confirm the diagnosis. Tumor histology and all other clinically relevant characteristics were retrieved at this stage and are depicted in supplementary material, Table S1.

Validation cohorts
The validation cohort comprised two independent mutation datasets: the International Cancer Genome Consortium dataset consisted of mainly untreated primary osteosarcomas (n = 112) [5], whereas the INFORM dataset included lung metastases (n = 41) [6]. The mutation data used in our study had already been processed and therefore no additional filtering was needed.

Sequencing data availability
Illumina sequencing data have been deposited in the European Nucleotide Archive under the study accession number EGAS00001005199.

Fusion-gene identification from RNA sequencing data
In patients P6 and P7, RNA sequencing libraries were prepared from tumor DNA using the TruSeq RNA Sample Preparation Kit v2 (Illumina). Total RNA was extracted from fresh-frozen tumor tissue and mRNA was then purified from 1 μg of total RNA using oligo(dT) beads. Paired-end sequencing was performed on the Illumina HiSeq 2500 sequencers in rapid run mode according to the manufacturer's protocol using the TruSeq SBS Kit (ver. 3). Sequencing reads were mapped onto the GRCh37 human reference genome using STAR or HISAT2 (http://daehwankimlab.github. io/hisat2/). ChimeraScan (https://bioconductor.org/ packages/release/bioc/html/chimera.html), defuse (https://github.com/amcpherson/defuse), and Fusion-Catcher (https://github.com/ndaniel/fusioncatcher) algorithms were used to detect chimeric RNA transcripts from fastq data. Predicted fusions were filtered out based on the presence of chimeric reads, which were 'blasted' against the human transcriptome to exclude further ambiguity concerning the involved partners.

Semiconductor sequencing
The ion torrent sequencing (Thermo Fischer Scientific, Waltham, MA, USA) approach was taken for technical replication of selected somatic variants. Used in conjunction with the AmpliSeq library construction kit, the Ion AmpliSeq comprehensive cancer panel was used to capture exons of 409 genes from the Cancer Gene Census database and the resulting libraries were run on the Ion PGM sequencers. Raw reads were processed using the Ion reporter software with standard settings.
The early evolutionary landscape of osteosarcoma 557

Homologous recombination deficiency (HRD) testing
Homologous recombination repair deficiency (HRRD) was classified based on mutational signature analysis that was carried out using EMu software (https:// github.com/andrej-fischer/EMu). In addition, the degree of genomic scarring indicative of HRRD was determined as the sum of the degree of loss of heterozygosity (LOH) [7], telomeric allelic imbalance (TAI) [8], and large-scale transitions (LSTs) [9]. Consequently, the HRD-LOH score counts the number of LOH regions longer than 15 Mb. HRD-TAI similarly determines the prevalence of more than 11-Mb-long allele-imbalanced regions that extend to sub-telomeres but do not cross the centromere. Finally, the HRD-LST score is defined as the number of break points between regions longer than 10 Mb but after filtering out regions shorter than 3 Mb and adjusting for sample ploidy. Osteosarcomas with a summary score of 42 or higher were deemed positive. XY chromosomes were omitted from the analysis.

Phylogenetic analysis
We built phylogenetic trees from the SNV data using PAUP software (http://phylosolutions.com/paup-test/). First, we converted each variant set into a binary matrix, where the rows related to a particular tumor or the normal sample and the columns related to a specific variant. The binary encoding (0/1) designated the absence or presence of a variant. A nexus file was used to specify the parsimony parameters needed for the tree construction along with the variant matrix. The following functions and parameters were used: (1) the outgroup function was used to root all resulting trees to the normal sample (effectively, a column on the mutation matrix containing only zeros); (2) the hsearch function was used to perform a heuristic search of 10 000 000 trees from the given tree space, with 1000 of the shortest trees output for the main analysis; (3) the bootstrap function was used to perform a sub-sampling procedure 10 000 times that involved randomly selecting a set of mutations from the binary matrix (with replacement), with the proportion of each branch instance reported in a log file; and (4) the allTrees function was used in all cases because fewer than ten samples were present. This made it possible to perform an extended 'brute-force' run to acquire shortest tree(s) from the total search space, at the expense of computational time. The homoplasy index for the most parsimonious tree in a given set was automatically calculated and output to the PAUP log file. To obtain the shortest, and thus most parsimonious, tree, an Rscript using the ape package was used to input the .tre file.
Fluorescence in situ hybridization CDK4 amplifications were determined using fluorescence in situ hybridization, which was carried out on formalin-fixed, paraffin-embedded slides using the standard procedure specified by the probe manufacturer (ZytoLight ® SPEC CDK4/CEN 12 Dual Color; supplier: LabForce AG, Muttenz, Switzerland). Hematoxylin and eosin staining was carried out in parallel for comparative purposes.

Statistical analyses
Statistical analyses were performed in R. Unless otherwise stated, all statistical comparisons of two distributions used (wherever possible paired) Welch's t-test, and 95% confidence intervals (CIs) of means were determined using the ci function of the gmodels package in R. Contingency tables were analyzed using Fisher's exact test.

Results
The mutation landscape of primary and recurring osteosarcomas We performed paired exome and low-coverage genome sequencing of chemotherapy-naive osteosarcoma primaries (P) and local and/or distal recurrences (REC) in ten patients. In two additional patients, primary tumors before and after neoadjuvant chemotherapy were sequenced, but recurring tumors were not available for the analysis. In total, our discovery set comprised 36 tumor exome-genome pairs, which were sequenced to a median depth of 156X (range 87-420X) and 20X (range 5-77X), respectively. In each patient, additional exomes and genomes were sequenced using bloodderived leukocyte DNA to the equivalent median depth of paired tumor samples. For the discovery of somatic mutations, we restricted our analysis to protein-coding regions. Primary tumors acquired a median of 57 single-nucleotide variants (SNVs, range 6-144, 95% CI = 26.8-88.4) and 10 indels (range 0-21, 95% CI = 6.91-13.5), whereas recurrent tumors samples had $50% more of each type of mutation ( Figure 1). Without exceptions, the C:G>T:A changes were the most common changes followed by C:G>A:T and T:A>C:G changes. To prioritize these mutations for further investigation, we excluded all SNVs with moderate or benign functional effects (i.e. SIFT > 0.05, Poly-Phen2 < 0.7, Mutation Taster < 0.7, GERP++ < 0, CADD < 10, and PhyloP VT/PL < 0) but kept all protein truncating and splice-site mutations. We then inspected all variants in the Integrated Genome Viewer to exclude any variant of evidently poor quality and merged the resulting list of variants with copy-number alterations (gains of 5+ copies or losses of both copies) that were identified simultaneously. Eventually, mutations in 29 cancer driver genes remained, of which about one quarter were functionally relevant for cell cycle G1 progression.
Cell cycle G1 progression normally occurs when favorable conditions are met and is dependent on the activation of cyclin-dependent kinases (CDK2, 4, 6) by their regulatory units called cyclins. When the cell is ready to divide, Rb protein is phosphorylated to pRb by CDKs, leading to the inactivation of Rb. This process 558 M Kovac, B Ameline, S Ribi et al allows a cell to enter into the cell cycle S state that is normally repressed by p16, which is encoded by CDKN2A.
To replicate our findings, we analyzed public mutations data from two previous studies [5,6], yielding a cell cycle G1 mutation prevalence that was significantly lower than that observed in the discovery set of tumors (P: n = 58/112, 51.7%; REC: n = 20/41, 48.7%). We did not find any specific attribute that would explain the difference, although we initially thought that a bias for adult osteosarcomas in Behjati et al's cohort [5] could be the case. However, we still confirmed the mutual exclusivity of CCND3 and CCNE1 amplifications in these patients, as well as the mutual exclusivity of CDK4 amplifications and loss-of-function mutations in RB1 ( Figure 2C). The cumulative prevalence of CCNE1, CCND3, and CDK4 amplifications in tumors with active CDKs was 60.2% (n = 47/78), which translates into 30.7% (n = 47/153) of all cases.

Phylogenetic analysis confirms early acquisition of cell cycle G1 mutations
Cell cycle G1 mutations were chosen for more detailed analyses based on their known therapeutic potential, good population prevalence, and almost universal pervasiveness across recurrent tumor manifestations. To determine if other mutations would follow the same pattern, we constructed maximum-parsimony trees and inspected their topologies ( Figure 3 and supplementary material, Table S3). Using somatic SNV data, we observed that about half of the trees were characterized by shorter trunks (variants ubiquitous across different tumors) with comparatively longer branches, thus appearing 'apple tree'shaped. Mutations affecting cell cycle G1 were located on the trunks, whereas somatic mutations affecting other pathways occurred equally on trunks, branches, and leaves. For example, the NRAS p.I24N mutation (P3), which is a well-proven gain-of-function mutation, was detected only in a single metastasis that was phylogenetically ancestral to the primary tumor and to another metastasis, neither of which had it. Similarly, amplifications of MYC, BRCA2, and a co-amplification of the region PDGFRA-KIT occurred mostly in recurring tumors (P1, P2, P4, P6, P7, P10), whereas JUN and APC mutations occurred in a single osteosarcoma primary (P1). Finally, the most interesting novel somatic mutations that occurred on trunks and branches were activating mutations of the RAS GTPases KRAS and HRAS that predispose to non-ossifying fibromas [11] and malignant giant cell tumors of the bone [12,13] but have not been reported in osteosarcoma thus far. The early evolutionary landscape of osteosarcoma 559 We therefore hypothesized that the selective benefits of mutations occurring on branches and leaves were comparably weaker for the mutations occurring on trunks and searched for evidence by examining the relative ratio of non-synonymous to synonymous mutations at the corresponding trunks and branches (Figure 3 and supplementary material, Table S4). The nonsynonymous to synonymous mutation ratio is normally used to estimate the balance between neutral mutations, purifying selection, and beneficial mutations, such that values that are significantly above 1 were unlikely to occur without at least some of the mutations being advantageous. This indeed proved to be the case for most mutations occurring on a trunk, whereas the reduction in positive values on branches and leaves indicated weakened selection pressure.

Chromothripsis and staircase amplifications generate cell cycle G1 mutations
Primary osteosarcomas typically had $30% of the genome affected by copy-number alterations (CNAs), which was $14% less than in matched recurring tumors (P: 95% CI = 21-40%; REC: 95% CI = 35-53%). Most CNAs were either single copy gains or losses but about $1% of the cancer genomes underwent complex reorganization ( Figure 4A), resulting in chromothripsis and centromere/telomere-driven staircase amplifications. Chromothripsis (CT) events were larger and affected $389 genes (95% CI = 0-783 genes) across a 32 Mb chromosomal sequence (95% CI = 4.7-94.1 Mb). Staircase amplifications (SAs) were comparably smaller, affecting either $35 genes within the $3.5 Mb sequence flanking the ends of the The box represents the upper and lower 25% quantile; the central line represents the median value. P values were calculated using paired two-sample t-tests and are shown above the plots. Mean and median values are shown below the plots. (C) Mutual exclusivity of cell cycle G1 mutations was validated using data from two independent studies, comprising both primary (ICGC study [5]) and metastatic osteosarcomas (INFORM study [6]). The proportion of tumors harboring mutations and the individual mutation frequencies are shown as stacked bars. The mutual exclusivity of cell cycle G1 mutations is shown below and was tested according to Canisius et al [10]. The corresponding p and q values are shown below and were calculated under the assumption of a false discovery rate of 0.01.  . Focal amplifications were slightly larger (P: mean = 28 genes, 95% CI = 13.2-36.5; REC: mean = 32.9 genes, 95% CI = 23-42) and showed a peculiar tendency to undergo further breakage-fusion cycles as cancers evolved. For example, we describe JUN amplification that originated as a simple reciprocal translocation t(1:10) that underwent further breaking and rejoining ( Figure 4B, patient P1) as the primary tumor evolved (P1). Other cases of progressively increased complexity of focal CNAs were noted in USP14 (P1) and FGFR1 (P2), which are both involved in cisplatin resistance [15]; HRAS oncogene (P2); SOX2 transcription factor (P5); and THOC1 (P1), encoding a proapoptotic nuclear matrix protein that induces cell cycle G2 arrest after interacting with Rb [16].
Osteosarcomas acquire cell cycle G1 mutations no longer than a year before the primary tumor is biopsied Pairwise comparison of Jaccard similarities derived from CNA data yielded a median value of 0.74 (95% CI = 0.68-0.77; supplementary material, Table S5), suggesting that recurring osteosarcomas developed relatively stable genomes and that most high-level rearrangements must have existed before the primary tumors were clinically detected. To estimate when these mutations occurred, we used an approximation timing method that was recently applied to similar data from Ewing's sarcomas [17,18]. In each patient, we first recovered molecular clock signature AC1 (Figure 3) from the primary tumor data; confirmed the steadiness of its increase across sequential tumor samples; and then moved backwards to estimate the time when the tumor emerged. This turned out to be no later than a year prior to the initial biopsy (range 6.9-9.6 months), whereas the divergence of the metastatic cancer cells occurred within 5 months (range 0.6-5.8 months) from the estimated time of origin.
Taking advantage of precomputed 96-channel SNV mutation signatures, we then searched for 'genomic scars' to identify additional biological processes that could be used therapeutically. We specifically recovered aging (AC1), microsatellite-instability (AC6), and UV exposure signature (AC7), as well as the homologous recombination repair deficiency (HRRD) signature (AC3). The AC3 signature was specifically of interest to our group because it was detected only in four tumors (n = 4/12, 33%; P1, P2, and P10), which was fewer than what would be expected from our previous data [19]. To make the present results comparable, we extended our analysis by calculating the level of loss of heterozygosity (LOH-HDR) [7], telomeric allelic imbalances (TAIs) [9], and large-scale transitions (LSTs) [8]. Having done so, a higher prevalence of positive patients was achieved, although it varied greatly. For example, testing for LSTs identified 83% (n = 10/12) of patients as HRRD-positive, whereas testing for LOH-HDRs   Table S6). However, when all test scores were combined (https://www.accessdata.fda.gov/cdrh_docs/ pdf19/P190014B.pdf), only three patients with BRCA1/ BRCA2 mutations remained in the list, plus there was one additional case related to a pathogenic mutation in PALB2. To illustrate this case, we outline the clinical course of an 8-year-old patient in whom genetically simple primary osteosarcoma recurred as highly rearranged HRRD-positive metastases after acquiring PALB2 deletions (supplementary material, Figure S3).

Discussion
We report that the earliest and thus clinically most promising mutations in osteosarcoma affect the cell cycle G1 transition, which is guarded by cyclins D3, E1, and cyclin-dependent kinases 2, 4, and 6. Cell cycle G1 alterations originate no more than a year before the primary tumor is clinically detected and occur in a clonally dominant fashion in more than 90% and 50% of patients of the discovery and validation cohorts, respectively. Our findings suggest that CDK inhibitors might be suitable as an add-on therapy in osteosarcoma whilst the established treatment protocols remain the backbone therapy for most patients. However, CDK inhibitors are not yet approved for the treatment of bone and soft tissue sarcomas, although the first clinical trial is underway in chordoma (ClinicalTrials.gov Identifier: NCT03110744), another highly malignant tumor of bone. Outside this study, a dramatic response to palbociclib has been reported in a 62-year-old patient with advanced refractory CDKN2A-deficient chordoma who had been surgically treated for seven recurring tumors over the course of 20 years (manuscript under review, personal communication 23 March 2021 from Dr Mrinal Gounder, Department of Medicine, Memorial Sloan Kettering Cancer Center and Weill Cornell Medical College, New York, USA). Since chordomas are known to acquire nearly identical mutation landscapes to osteosarcomas [20], it is plausible that a similar treatment effect of CDK inhibition could be achieved in patients with osteosarcoma.
CDK genes are well expressed in human osteosarcoma tissues and their overexpression correlates with metastatic spread and poor prognosis. In a recent study, Sayles et al used palbociclib treatment on patientderived osteosarcoma xenografts with CDK4 amplifications to show that CDK4/6 inhibition interferes with cell cycle progression, induces cell senescence with apoptosis, and decreases the pRb levels [21]. The interaction between CDK4 and Rb is critical for CDK4 inhibitors to work since tumors deficient in Rb are generally The early evolutionary landscape of osteosarcoma 563 insensitive to CDK4 inhibition. Luckily, RB1 and CDK4 mutations seem to be mutually exclusive in osteosarcomas and thus CDK4/6 inhibition may be applied without fear of reverting mutations, although they could occur as a response to multimodal chemotherapy. Given that chemotherapy is the current standard line of treatment in osteosarcoma, it is important to add that CDK4 inhibition acts antagonistically to it and therefore strategies positioning CDK4 inhibitors between chemotherapy cycles are being developed to prevent survival of residual tumor cells that escape current chemotherapy regimens. In addition to CCND3 amplifications, amplifications of CCNE1 are common in osteosarcoma as well and the effect of CDK2 inhibition on tumor growth has been demonstrated by dinaciclib (SCH 727965) using patientderived xenografts [21]. However, these experiments showed that the current line of CDK2 inhibitors is not entirely specific to CDK2 and therefore more selective inhibitors need to be developed. Targeting CDK2 should therefore be considered with caution but has promise of becoming a therapeutic avenue that might offset tumor resistance to CDK4 inhibition, which CCNE1 amplifications normally confer to [22].
Cell cycle G1 mutations are typical early mutations but other cancer drivers can occur at any evolutionary time. An interesting finding that falls into this category is mutations that disrupt the ability of a cell to repair DNA damage through homologous recombination (HRR), and we specifically focus on BRCA2 and PALB2 mutations because of their association with PARP inhibition sensitivity (PARPi). We note, however, that without in vitro experiments the HRR deficiency (HRRD) trait is notoriously difficult to determine and therefore in silico tests should be interpreted with caution. For example, our group was the first to describe HRRD in more than 80% of 121 osteosarcomas and our studies are broadly consistent in that HRRD exists in tumors that harbor chromosomal 13 deletions, either as bi-chromosomal deletions that affect RB1 and BRCA2 genes equally or as singlecopy whole-arm deletions of chromosome 13 that are complemented by focal deletions in BRCA2. However, our present and past studies do not yield a similar prevalence of HRRD, due to the use of a more stringent methodology which has been increasingly updated over the last 5 years.
Taken together, our study represents one of the first longitudinal portraits of the osteosarcoma evolution and therefore only a few other studies exist with which to compare our results. In a study analogous to ours, Wang et al [23] performed multi-region sequencing of primary and recurring osteosarcomas of ten patients and used phylogenetic analysis to determine intra-and inter-tumoral heterogeneity. These data confirm that as time progresses, cancer cells acquire more and more mutations with higher antigen load and improved immunogenicity. In both studies, primary tumors were mainly driven by TP53 mutations, whereas additional drivers such as BRCA2 mutations occurred mostly in metastatic cells. Metastasis-specific mutations in osteosarcoma are also described in a study by Suehara et al [24], with which our findings are mostly consistent. Both studies describe cancer drivers at similar frequencies and similar tumor manifestations (for example, BRCA2 mutations in metastases), but Suehara et al suggest that co-amplifications of the regions PDGFRA-KIT and VEGFA-CCND3 are mutually exclusive, which our data contradict. Figure 6. Decision tree for application of CDK and PARP inhibitors. (A) The mutation status of CDK2/4/6, cyclin E1/D3, RB1, and CDKN2A is determined from sequencing data or by probe-based assessment techniques (arrays, multiplex ligation-dependent probe amplification, fluorescence in situ hybridization analysis). The tumor's response to CDK4/6 inhibition then depends on fulfilling two conditions: (1) CCND3 and CDK4/6 amplifications are present and (2) there are no mutations in RB1 and CCNE1. This is because CCNE1 amplifications confer CDK4 inhibition resistance by having Rb phosphorylated via the CDK2 pathway and the loss of Rb disrupts CDK4 signal transmission. In the same vein, the success of CDK2 inhibition relies on having the status of cyclin E1 as amplified but Rb status as wild type. (B) The analysis of HRRD follows the Myriad Genetics guidelines: a summary score is calculated from BRCA1/BRCA2 mutation status and three independent test scores -LST, LOH-HDR, and TAI. Tumors with cut-off scores higher than 42 are deemed positive and should respond to PARP inhibition.

M Kovac, B Ameline, S Ribi et al
Circling back to CDK inhibition, we would like to make one final note. Cell cycle G1 mutations are one of the few targets that are both pervasive and targetable, and activating mutations can be detected in diagnostic biopsies with techniques such as fluorescence in situ hybridization (amplifications of CDK4, CCND3, CCNE1), immunohistochemistry (p16 loss), next-generation sequencing, and multiplex-ligation probe-dependent amplification. Assuming that tumor biopsies of the primary osteosarcomas are available, we propose a simple decision tree ( Figure 6) to outline the diagnostic algorithm. If primary tumors are not available, the analysis of recurring tumor manifestations should offer an equivalent level of information for clinical staff to decide on an appropriate course of action, as well as providing knowledge about alternative targets such as HRRD mutations.