The molecular landscape of well differentiated retroperitoneal liposarcoma

Well differentiated liposarcoma (WD‐LPS) is a relatively rare tumour, with fewer than 50 cases occurring per year in the UK. These tumours are both chemotherapy‐ and radiotherapy‐resistant and present a significant treatment challenge requiring radical surgery. Little is known of the molecular landscape of these tumours and no current targets for molecular therapy exist. We aimed to carry out a comprehensive molecular characterisation of WD‐LPS via whole genome sequencing, RNA sequencing, and methylation array analysis. A recurrent mutation within exon 1 of FOXD4L3 was observed (chr9:70,918,189A>T; c.322A>T; p.Lys108Ter). Recurrent mutations were also observed in Wnt signalling, immunity, DNA repair, and hypoxia‐associated genes. Recurrent amplification of HGMA2 was observed, although this was in fact part of a general amplification of the region around this gene. Recurrent gene fusions in HGMA2, SDHA, TSPAN31, and MDM2 were also observed as well as consistent rearrangements between chromosome 6 and chromosome 12. Our study has demonstrated a recurrent mutation within FOXD4L3, which shows evidence of interaction with the PAX pathway to promote tumourigenesis. © 2021 The Authors. The Journal of Pathology published by John Wiley & Sons, Ltd. on behalf of The Pathological Society of Great Britain and Ireland.


Introduction
Retroperitoneal liposarcoma (RPLS) is the most common type of retroperitoneal sarcoma (RPS), making up around half of all RPSs [1]. RPLS typically presents incidentally on cross-sectional imaging as a large homogeneous retroperitoneal mass [2]. As these tumours become larger, the risk of their transformation from well differentiated (WD) to dedifferentiated (DD) increases. The molecular events that initiate WD-LPS and lead to dedifferentiation of well differentiated liposarcoma are currently unknown. Molecular changes seen in the various types of RPLS include coordinated amplification of MDM2 and CDK4 [3]; giant ring chromosomes [4][5][6]; amplification of CDK4, HMGA2, and TSPAN31; and the FUS-DDIT3 (solely in myxoid liposarcoma) fusion oncogene, seen in a subset of liposarcomas caused by t(12;16) or t (12;22) translocation [7,8]. To date, there have been few comprehensive studies on the molecular genetics of well differentiated RPLS [9]. The key feature is amplification of the MDM2 gene, which has been seen almost ubiquitously in WD-RPLS and DD-RPLS at rates approaching 100% [10]. In an exome sequencing study, Kanojia et al [11] found consistent amplification, as previously described, in MDM2 and HMGA2. The most frequently mutated gene in their cohort was in PLEC (27% of samples), which codes for plectin, a cytoskeleton protein. Amin-Mansour et al [12] studied exome sequencing trios of normal, well differentiated, and dedifferentiated tissue regions in patients with liposarcoma, finding no common driver but verifying the recurrent amplification of CDK4.
However, the lack of a comprehensive integrated analysis means that therapeutic options for treatment of this tumour are limited as the molecular pathways that are potentially dysregulated in liposarcoma have not been identified with certainty. We therefore carried out a comprehensive pilot multi-platform molecular characterisation, utilising whole genome sequencing, RNA sequencing, and methylation arrays of well differentiated retroperitoneal liposarcoma. a retroperitoneal mass. Immediately after the specimen had been removed from the patient, it was conveyed to the Queen Elizabeth Hospital Birmingham (QEHB) Histopathology laboratories, where a representative sample of liposarcoma and associated normal fat was taken as far away as possible from the tumour, and immediately flash frozen in liquid nitrogen and then stored at À80 C. Where no normal fat was available, normal adjacent organ (i.e. kidney/colon/pancreas, etc.) was sampled. Where no frozen material was available to match with the specimen, formalin-fixed, paraffinembedded (FFPE) tissue blocks were retrieved from the histopathology archives of the QEHB.
Ethical approval was obtained via the University of Birmingham Human Biomaterials Resource Centre Biobanking ethics (Ref 09/H1010/75). All experiments were performed in accordance with relevant guidelines and regulations, and informed consent was obtained from all participants.
DNA/RNA extraction DNA extraction was performed by proteinase K digestion. A 5-mm 3 piece of tissue was immersed in buffer ATL overnight with 20 mg of proteinase K. The resulting lysate was processed using the Qiagen DNeasy Blood & Tissue kit (Qiagen, Manchester, UK). Extracted DNA was eluted into buffer AL and was quantified using Nanodrop spectrophotometry for purity and Qubit (Thermo Fisher Scientific, Loughborough, UK) fluorimetry for DNA concentration.
Because of limited volumes of tissue, RNA extraction was performed on formalin-fixed, paraffin-embedded sections. Three 10 μm FFPE sections were cut using a microtome, then processed using the Purelink FFPE RNA extraction kit (Invitrogen, Renfrew, UK) according to the manufacturer's instructions. Extracted RNA was quantified for purity using Nanodrop spectrophotometry and for quantity using Qubit fluorimetry (Thermo Fisher Scientific, Loughborough, UK).

Whole genome sequencing (WGS) and targeted sequencing
Shotgun WGS was performed using 500 ng of paired tumour-normal DNA by the Illumina R&D group at the Illumina Great Chesterford facility. DNA was libraryprepared for sequencing using the Truseq PCR free library preparation kit according to the manufacturer's guidelines. For six samples (associated normal), no fresh frozen material was available and therefore a PCR-based WGS library preparation was used (Illumina FFPE) using 1 μg of DNA. Sequencing was performed on an Illumina HiSeq 2500, pooling one sample across four lanes using the v4 chemistry. Sequenced reads were aligned to the GrCh37 reference genome using the Isaac aligner [13]. Tumour-normal single nucleotide variant (SNV) and indel calling was performed using the Strelka variant caller [14]. Copy number variant calling was performed in Canvas in tumour-normal mode, and where no control normal was available, a panel of normals was used. Structural variant calling was performed using Manta.
In order to resolve complex variants and paralogues, targeted nanopore sequencing of the FOXD4L3 region was carried out. Custom primers were designed with Primer3 to target the entire coding sequence of FOXD4L3 in one amplicon. Long-range PCR (conditions available on request) was performed and PCR amplicons were cleaned and ligated using an Oxford Nanopore ligation sequencing kit (LSK-109; Oxford Nanopore, Oxford, UK). Library was then loaded onto an ONT MinIon 9.5.1 flow cell and run for 36 h. Data were basecalled using Guppy and variant calling was performed by MiniMap2. No polishing was performed.
Whole genome sequencing data for the well differentiated liposarcoma line 93T449 were downloaded from the NCBI SRA archive. Downloaded data were aligned to the GRCh38 reference genome using MiniMap2; then variant calling was performed with FreeBayes and copy number calling with Canvas as previously described.

RNA sequencing
Whole transcriptome sequencing using 1 μg of total RNA was performed by the Shenzen Research Laboratories of Beijing Genomics Inc using the Illumina TruSeq RNA Sample preparation kit v2. RNA was DNAasedigested, fragmented, and cDNA was synthesised using SuperScript III (Invitrogen). cDNA was then endrepaired, A-tailed, adapter-ligated, PCR-amplified, and purified. After library quantification, this was treated with duplex specific nuclease (Evrogen, Cambridge, UK) and purified. The final libraries were sequenced on an Illumina HiSeq 2000 to an average of 35 million reads with a read length of 90 bp.
Sequenced reads were quality-trimmed using Trimmomatic and aligned with the hg38 reference genome using STAR [15]. Absolute gene expression was determined using featureCounts [16] and differential expression was determined using the GSA module of Partek Flow [17,18]. Log normalised FPKM values were submitted to the CIBERSORT immune infiltrate pipeline [19] for deconvolution of immune cell types and were used to calculate the coordinate immune response cluster (CIRC) score [20].
RNA fusion analysis was carried out using the fusion analysis pipeline of STAR-FUSION [21].

DNA methylation arrays
Analysis of the methylome was achieved using Illumina HumanMethylation450 arrays at the University of Birmingham. One microgram of DNA was bisulphiteconverted using the Zymo EZ-DNA methylation kit using the custom Illumina microarray protocol. This was then hybridised to Illumina HumanMethylation450 microarrays according to the manufacturer's protocol. Hybridised arrays were washed and stained, and then immediately scanned using an Illumina iScan microarray scanner. Extracted microarray intensities were normalised using Molecular landscape of well differentiated retroperitoneal liposarcoma 133 the Bioconductor/ChAMP pipeline [22] and individual beta values were exported. Differential methylation was quantified using eBayes/limma and differentially methylated regions were ascertained using DMRhunter.

Further bioinformatics analysis
Identification of recurrent somatic mutations was carried out as follows. Firstly, exported subtracted tumournormal variant calls from Strelka were transformed from Variant Call Format (VCF) files to MAF (Mutation Annotation Format) files using vcf2maf. In patients where no matched normal was available, tumour VCF files were filtered using dbSNP 1000 Genomes Project and GnoMAD such that all SNPs with a minor allele frequency of greater than 0.01 were discarded. MAF files were then concatenated together in order for analysis in MutSigCV [23]. Analysis of recurrent mutations in the cohort was carried out using MutSigCV using standard settings. All recurrent mutations were ranked and sorted by P value. Identification of recurrent structural variations was carried out using the Manta [24] structural variant caller (https://github.com/Illumina/manta). The package was run with standard parameters on tumour BAM files. Subsequent output was then filtered to produce Circos (RCircos) plots by taking a subset of the data from Manta and limiting it to interchromosomal inversions and translocations.
Copy number alterations were called using the Illumina Canvas [25] (https://github.com/Illumina/canvas) copy number variant caller. The package was run with standard parameters on tumour BAM files. Regions were output as standard VCF files and then filtered using a custom script to preserve high-quality copy number calls (quality score > 30) on regions greater than 5 kilobases in length (as this was thought to be more functionally relevant). Copy number calls were then imported into GREVE [26] (http://www.well.ox.ac.uk/ GREVE/) with genome-wide plots and chromosome level significance plots output.

Validation of findings
Sanger sequencing to validate mutational findings was carried out using standard methodology. Primers flanking the region of interest were designed using ExonPrimer with a T m of 60 C. Primer specificity was verified using UCSC in silico PCR and primers were optimised with gradient PCR before use. 10 ng of tumour DNA underwent PCR using the Qiagen Multiplex kit using primers against FOXD4L3 (supplementary material, Table S1). PCR products were cleaned with ExoSAP-IT and underwent a BigDye sequencing reaction under standard conditions. Sequencing products were run on an ABI 3700 sequencer.
Nanopore sequencing was carried out via long-range PCR of the entire FOXD4L3 gene in one amplicon. A master mix was prepared containing template DNA, 2 μl of forward and reverse primers at 0.4 μM, 25 μl of LongAmp Taq 2X Master Mix (New England Biolabs, Ipswich, MA, USA), and nuclease-free water up to a total of 50 μl. Thirty cycles of long-range PCR (with a 2.5 min extension time at 65 C) were performed and the product underwent repair and end prep using the Oxford Nanopore LSK-109 sequencing protocol. Samples were indexed with the Rapid Barcoding Kit and then loaded and sequenced on an R9.4.1 MinIon flow cell for 24 h. Raw sequence FAST5 files were demultiplexed and converted to FASTQ using the Guppy Basecaller. Called FASTQ files underwent two rounds of polishing with Canu, followed by a round of polishing with Medaka. Sequence data were aligned using mini-map2 to the GRCh38 genome followed by variant calling using FreeBayes (command line freebayes - Raw allele counts at the desired sites were output with the samtools command.

Results
A total of eight WD-LPS samples underwent whole genome sequencing of three tumour-normal pairs and five tumour-only samples (due to lack of available paired normal tissue) to an average of 85x read depth, RNA transcriptomic sequencing of tumours only (average 35 million reads per sample), and analysis on the HumanMethylation450 array system. The characteristics of the patients from which tumour samples were obtained are shown in Table 1. Six out of eight patients had died at the time of publication: two from recurrence of a dedifferentiated liposarcoma and four from other causes.

Whole genome sequencing reveals recurrent mutations in FOXD4L3
The mutational rate in all tumours was between 1.65 and 1.8 mutations per megabase, making WD-LPS a relatively infrequently mutated tumour to others studied in the TCGA (supplementary material, Tables S2 and S3). Analysis of recurrent mutations using MutSigCV2 within the samples demonstrated significant mutations (Table 2 and Figure 1) in SPRN, FOXD4L3, ADH1C, and KRTAP2-2. Pathway analysis of these gene mutations determined that only FOXD4L3 was likely to be relevant in a cancer context, due to it being a Forkhead transcription factor-associated gene. FOXD4L3 was predicted by the STRINGS database (supplementary material, Table S4) to interact with PAX3, PAX7, T, and TBX19 transcription factors, all involved in cell differentiation and development, as well as being implicated in alveolar rhabdomyosarcoma [27], biphenotypic sinonasal sarcoma, and others. We also analysed recurrently mutated genes within the Cancer Gene Census and those previously reported in liposarcoma [11] using the Illumina Encore pipeline (supplementary material, Figure S1) including immune evasion (HLA-A, 5/8 samples), Wnt signalling (FAT1;4/8 samples), double strand   We therefore studied FOXD4L3 in more detail and found a recurrent mutation within exon 1 of FOXD4L3 (chr9:70,918,189A>T; c.322A>T; p.Lys108Ter) which resulted in a stop gain. This mutation was present in 7/8 samples and was verified by bidirectional Sanger sequencing of tumour samples; however, this mutation was also present in two normal samples (patients P001224 and P001207) but not in a further normal sample (P001193). The mutation seen in the normal tissue as determined by NGS was not seen on Sanger sequencing. Further investigation revealed that the 'normal' tissue was in fact 'normal' fat immediately adjacent to the tumour and probably represented inadvertent sampling of the leading edge of the WD-LPS sample. The mutation was just upstream of the Forkhead box domain of the protein (supplementary material, Figure S2), suggesting that this mutation may have functional consequences due to loss of this protein binding domain and is expressed in tissue (supplementary material, Figure S3).
Because of the incongruity between the whole genome sequencing and the Sanger sequencing results, we decided to carry out further investigation. FOXD4L3 is a paralogue of FOXD4 [which has multiple (six) paralogues], with minimal sequence variation between them. We therefore designed long-range PCR primers to flank the entirety of FOXD4L3 and performed long-range nanopore amplicon sequencing on a cohort of 24 FFPE samples (consisting of 12 tumour samples plus 12 adjacent normal tissues) plus the 3TL3 liposarcoma cell line. All samples had a mutation within FOXD4L3. We found that the variant allele frequency (VAF) of the A>T transversion was significantly different between tumour and normal tissue, with the tumour having a median VAF of the T allele of 0.43 [interquartile range (IQR) 0.39-0.45] and the normal tissue having a median VAF of 0.21 (IQR 0.16-0.22, Wilcoxon rank p < 0.001). Investigation of the 93T449 well differentiated liposarcoma cell line revealed a copy number (CN) = 3 of the regions surrounding the FOXD4L3 gene which was replicated in the whole genome-sequenced samples, suggesting a complex mutation and copy number event within FOXD4L3 giving the differences in allele frequency.

Copy number analysis of WGS confirms recurrent gains in MDM2 as part of a larger amplification
Copy number variant analysis of the tumour set (Table 3) demonstrated the recurrent copy number gain in HGMA2, the binding partner of MDM2 [28], in six out of eight cases, with all cases having amplification within the 12q15 region to some extent. Recurrent amplification was seen (Figure 2) throughout the long arm of chr12q13.13-q24.33, demonstrating that it is not necessarily restricted to the MDM2/HGMA2 amplification previously described. Analysis of structural variation (SV) revealed a complex pattern (supplementary material, Figure S4) of small-scale chromosomal translocation across the tumour types. Rearrangement of both between and within chromosomes 6 and 12 was seen (supplementary material, Figure S5), consistent which previous cytogenetic observation of WD-LPS.
Analysis of alternative splicing was carried out using STAR (supplementary material, Table S6), with FN1 (fibronectin 1) being the top ranked alternatively spliced gene (normalised counts tumour = 10.28, normal = 0.41; p = 4.52 Â 10 À9 , Q = 8.25 Â 10 À6 ). Pathway analysis demonstrated that the focal adhesion pathway was CIBERSORT analysis of genes associated with immune cell infiltration was carried out and showed that liposarcoma has variable immune cell infiltration, with all samples having some degree of T-cell infiltration, suggesting some degree of recognition of the tumour by the immune system (supplementary material, Table S7).

Gene fusion analysis demonstrates a recurrent fusion in SDHA which causes loss of activity
Gene fusion prediction ( Table 4) via analysis of WGS data demonstrated recurrent fusion events in HGMA2, SDHA, TSPAN31, MDM2, and WIF1. HGMA2 [28], TSPAN31, and MDM2 [9] amplification/fusion have been well described in liposarcoma; however, SDHA (succinate dehydrogenase complex flavoprotein subunit A) has not previously been identified, with 11 fusions in four samples in this dataset. In order to validate this, we then carried out RNA fusion analysis using a targeted RNA panel analysis on a separate set of 24 FFPE samples (12 tumour/12 associated normal) as previously described. This validated (supplementary material, Table S8) a recurrent fusion in SDHA in 8/12 samples, between exon 12 of SDHA and exon 8 of SLC9A3.

Methylation analysis shows dysregulation of metastasis suppression
We then carried out methylome analysis using the Illumina HumanMethylation450 array. The top differentially methylated gene (supplementary material, Figure S6 and Table S9) was ANKAR (cg06479433, % meth tumour = 95.0%, % meth normal = 74.1%, p adj = 2.59 Â 10 À10 , BF = 19.01). Analysis of differentially methylated regions demonstrated six differentially methylated regions (DMRs) that met a genome-wide significance level of 1 Â 10 À6 ( Table 5). The most significant DMR was TIMP2  Molecular landscape of well differentiated retroperitoneal liposarcoma 137 (tissue metalloproteinase 2), which acts as a metastasis suppressor by suppression of the Wnt/β-catenin pathway [29]. We observed an average increase in methylation across the region of 35%, which would suggest that TIMP2 becomes inactivated, thus dysregulating Wnt. Two DMRs were in CpG islands approximately 5 kb upstream of FOXC1 (a Forkhead box gene of unknown function, implicated in both breast [30] and small cell lung cancer [31]) and HOXD3 (a homeobox gene responsible for growth and differentiation [32]). Gene set enrichment analysis of the differentially methylated positions (DMPs) was carried out, which demonstrated significant enrichment of gene sets associated with cancer, with the top ranked gene set being 'FARMER_BREAST_CANCER_CLUSTER_5', AUC = 0.84, p = 2.94 Â 10 À7 .

Discussion
We have demonstrated a recurrent loss-of-function mutation in the FOXD4L3 gene, a Forkhead box gene. This is in contrast to other studies [11], which suffered from poor coverage of FOXD4L3 due to a mixed sample set. A significant weakness of our study is the lack of paired normal tissues for all samples, which may cause low frequency or rare mutant alleles to not be detected due to germline variation; however, we were limited by available tumour material because of the site and size of the tumour.
The FOXD4L3 gene is within a complex region of an ancestral duplication site [33] in chromosome 9 that is postulated to have occurred early in hominid evolution. Although the sequence similarity is high [34] between the different paralogues of FOXD4 (approximately >95%) the principal difference between paralogues is the presence of an indel leading to a frameshift, either a 1-bp deletion at codon 292 or a 53-bp deletion at codon 363, causing alternate protein products to be produced. We observed a stop gain mutation at codon 260, well upstream of the previously described deletions, within the Forkhead binding domain, supported by Sanger sequencing primers specific for FOXD4L3. This mutation has been observed in dbSNP 147 (rs7021123) from ExAC data, but at extremely low frequency (MAF = 0.00825) in one subject only. We did not observe any RNA sequencing reads that aligned to FOXD4L3, despite it being expressed in normal fat on the Illumina Human BodyMap 2.0 database. Typically, a premature stop codon would cause nonsense mediated decay (NMD) of the product after the initial round of translation but as FOXD4L3 is a single exon gene, we cannot with certainty point to NMD as the mechanism here.
We successfully demonstrated the previously identified [11] recurrent copy number change in HGMA2, the binding partner of MDM2, as part of amplification of a larger region on the long arm of chromosome 12 rather than an effect specific to HGMA2.
Our matched RNAseq analysis showed dysregulation in both the PI3K-AKT and the MAPK pathways, as well as genes associated with the transition away from fat to a 'tumour' phenotype. Methylation analysis of these tumours demonstrated activation of proapoptotic genes and genes associated with metastasis. The recurrent fusion within SDHA is also of interest, as similar genes (SDHB in paraganglioma [35] and SDHC in gastrointestinal stromal tumours [36]) are implicated in tumourigenesis.
Further study and functional models are required to understand the effect of changes in these genes and their relevance in WD-LPS as well as the consequences and effects of FOXD4L3 mutation in association with WD-LPS. However, multiple potential target pathways for therapy have now been identified.  Figure S1. Bar chart of the frequency of mutations from whole genome sequencing data for genes taken from the Cancer Gene Census and from previous literature on liposarcoma