Novel COL4A1‐VEGFD gene fusion in myofibroma

Abstract Myofibroma is a benign pericytic tumour affecting young children. The presence of multicentric myofibromas defines infantile myofibromatosis (IMF), which is a life‐threatening condition when associated with visceral involvement. The disease pathophysiology remains poorly characterized. In this study, we performed deep RNA sequencing on eight myofibroma samples, including two from patients with IMF. We identified five different in‐frame gene fusions in six patients, including three previously described fusion transcripts, SRF‐CITED1, SRF‐ICA1L and MTCH2‐FNBP4, and a fusion of unknown significance, FN1‐TIMP1. We found a novel COL4A1‐VEGFD gene fusion in two cases, one of which also carried a PDGFRB mutation. We observed a robust expression of VEGFD by immunofluorescence on the corresponding tumour sections. Finally, we showed that the COL4A1‐VEGFD chimeric protein was processed to mature VEGFD growth factor by proteases, such as the FURIN proprotein convertase. In conclusion, our results unravel a new recurrent gene fusion that leads to VEGFD production under the control of the COL4A1 gene promoter in myofibroma. This fusion is highly reminiscent of the COL1A1‐PDGFB oncogene associated with dermatofibrosarcoma protuberans. This work has implications for the diagnosis and, possibly, the treatment of a subset of myofibromas.

showed promising results in three case reports. [6][7][8] The gene alterations that drive the development of PDGFRB wild-type tumours remain unclear.
Although gene fusions are frequent and potent oncogenic drivers in soft-tissue neoplasia, 9,10 little is known regarding gene fusions in myofibroma. To date, only a few cases of myofibroma have been analysed, leading to the description of translocations involving the SRF gene, which encodes serum response factor. SRF is fused to various 3' partner genes in soft-tissue tumours, including RELA in a subset of cellular variants of myofibroma and ICA1L in cellular myoid neoplasms. [11][12][13] The goal of this study was to perform RNA sequencing of myofibroma samples to gain insight into the genetic basis of these tumours. We confirmed the presence of SRF fusion transcripts. More importantly, we unravelled a new recurrent fusion gene that leads to production of the growth factor VEGFD under the control of the COL4A1 gene promoter. This result has potential implications for the diagnosis and treatment of myofibromas.

| Study design
This study was approved by the medical ethics review board of the University of Louvain. We obtained archived fresh frozen samples from eight patients diagnosed with sporadic myofibroma or IMF according to the WHO classification. Some patients have already been described in a previous study. 3

| RNA sequencing, fusion and PDGFRB variant calling
Total RNA was extracted from fresh-frozen tumour samples or cryomold samples (P46, P113) using TriPure reagent (Roche, Switzerland). Paired-end RNA sequencing was performed using Illumina TruSeq Stranded mRNA libraries. The cryomold samples (P46 and P113) did not pass RNA quality thresholds and were instead analysed using Illumina TruSeq Exome mRNA libraries. Sequencing generated at least 40 million paired-end reads of 150 nucleotides per specimen (Macrogen, South Korea). After a quality check using FastQC, 14 reads (fastq files) were aligned to the GRCh37 reference genome with the STAR aligner (version 2.7.2b) using two-pass mode and parameters optimized to collect chimeric junctions, as described. 15 Aligned sequences were then analysed for fusion calling using two different methods: STAR-Fusion version 1.8 16 and FusionCatcher version 1.20. 17 The latter generated far more fusion candidates. We retained fusions predicted by both methods for further analysis.
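The consensus step described above, keeping only fusions reported by both callers, can be sketched as a set intersection. This is a minimal illustration and not the authors' pipeline: the function names, the (5' gene, 3' gene) tuple representation and the placeholder gene names are all assumptions, and parsing of the callers' actual output files is left out.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical representation of a fusion call: (5' partner, 3' partner).
Fusion = Tuple[str, str]


def normalise(calls: Iterable[Fusion]) -> Set[Fusion]:
    """Strip whitespace and upper-case gene symbols so call sets are comparable."""
    return {(five.strip().upper(), three.strip().upper()) for five, three in calls}


def consensus_fusions(calls_a: Iterable[Fusion], calls_b: Iterable[Fusion]) -> List[Fusion]:
    """Return the fusions present in both call sets, preserving 5'->3' orientation."""
    return sorted(normalise(calls_a) & normalise(calls_b))


# Toy inputs with placeholder gene names (not real caller output).
caller_1 = [("GENEA", "GENEB"), ("GENEC", "GENED")]
caller_2 = [("genea", "GENEB "), ("GENEE", "GENEF"), ("GENED", "GENEC")]

shared = consensus_fusions(caller_1, caller_2)
print(shared)  # [('GENEA', 'GENEB')]
```

In this sketch orientation matters: the reciprocal pair GENED/GENEC is not treated as the same event as GENEC/GENED. Whether reciprocal calls should be merged is a design choice that the text does not specify.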
Variant calling was performed according to GATK best practices. RNA-seq quantification was performed using Kallisto, 18 followed by the Bioconductor packages tximport 19 and DESeq2. 20 To perform gene set enrichment analysis, we used the Bioconductor package limma 21 with significantly differentially expressed genes (p-adj <0.05) from DESeq2 analysis and the GSEA software 22 with the whole expression data set.

| Molecular validation of gene fusions
We selected the most promising predicted fusions, based on bioinformatic criteria (mapping quality indices) and biological relevance (in-frame fusions involving protein-coding genes), for molecular validation. We confirmed the presence of the gene fusions by amplifying the predicted breakpoint junction by polymerase chain reaction (PCR) after reverse transcription of tumour RNA (see Table S2 for the complete list of oligonucleotides).

| Plasmids and site-directed mutagenesis
The COL4A1-VEGFDssts mutant was produced by introducing the mutations corresponding to VEGFD:p.R85S and p.R88S by site-directed mutagenesis, according to the QuikChange II XL kit protocol (Agilent). The COL4A1-VEGFDiiss mutant was produced by introducing the mutations corresponding to VEGFD:p.R204S and p.R205S. We verified every construct by sequencing.

| Histological and immunofluorescence analysis
Cryomold tumour samples were cryosectioned in 5 µm-thick sections and immediately mounted on slides. The sections were post-fixed with 4% paraformaldehyde and stained either with haematoxylin and eosin (HE) or for immunofluorescence (IF), as described. 23 Tissue sections were heated for 10 min in 10 mM sodium citrate pH 6.0 for antigen retrieval. Sections were permeabilized for 5 min in 0.3% Triton X-100 PBS solution before blocking for 1 h in 0.3% milk, 10% bovine serum albumin and 0.3% Triton X-100 in PBS. Primary antibodies were monoclonal rabbit anti-VEGFD inverted fluorescence microscope (Zeiss). HE slides were scanned using an Oyster imaging system (3DHistech).

| Characterization of novel gene fusions in myofibroma
We performed RNA sequencing on tumour samples from eight patients. We had previously analysed four of them by targeted sequencing of the PDGFRB locus. 3 All patients were children. Two presented the multicentric form of the disease (IMF). Table 1 summarizes the patient clinical characteristics and RNA sequencing results (see also Table S1 for details). Variant calling on RNA sequencing data indicated a PDGFRB mutation in two patients, which had been previously reported. In addition, we detected five different in-frame gene fusions in six patients. We validated four gene fusions by PCR amplification of the predicted breakpoint junction from tumour cDNA: COL4A1-VEGFD, SRF-ICA1L, SRF-CITED1 and MTCH2-FNBP4 (Figure 1A and Figure S1). The two SRF fusions have been recently reported in myoid neoplasms related to myofibroma, 12,13 with slightly different exon junctions (Figure S1). MTCH2-FNBP4 was previously described in a sample of breast cancer, without evidence of recurrence in large-scale studies. 26 Most of the MTCH2 coding sequence (until exon 12) and exons 7 to 17 of FNBP4 composed the resulting fusion gene (Figure S1). Finally, our fusion calling pipeline also identified a FN1-TIMP1 gene fusion of unknown significance in patient P111. The predicted TIMP1 transcript involved in the fusion was non-canonical, including exon 5 and part of intron 5.
We focused on the COL4A1-VEGFD gene fusion because it was novel, potentially oncogenic and present in two patients. RNA sequencing results indicated that the breakpoints were located in intron 17-18 of COL4A1 and intron 1-2 of VEGFD and generated the same fusion transcript in the two patients. We amplified and cloned the full COL4A1-VEGFD open reading frame from patient P38 tumour cDNA. We confirmed the predicted junction by Sanger sequencing (Figure 1B). The predicted fusion polypeptide was 643 amino acids long, comprising the first 319 residues of COL4A1 and residues 31 to 354 of VEGFD.
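The stated length follows from the residue ranges: 319 residues of COL4A1 plus VEGFD residues 31 to 354 inclusive (324 residues) gives 643. The short sketch below checks this arithmetic and converts a VEGFD residue number into a position on the fusion polypeptide. The constant and function names are hypothetical, and the coordinate conversion assumes a seamless junction, with no residues inserted or lost at the breakpoint.

```python
COL4A1_RESIDUES = 319  # N-terminal COL4A1 residues retained in the fusion
VEGFD_FIRST = 31       # first VEGFD residue retained in the fusion
VEGFD_LAST = 354       # last VEGFD residue retained in the fusion


def fusion_length() -> int:
    """Length of the fusion polypeptide; the VEGFD range is inclusive."""
    return COL4A1_RESIDUES + (VEGFD_LAST - VEGFD_FIRST + 1)


def vegfd_to_fusion_position(vegfd_residue: int) -> int:
    """Map a VEGFD residue number to its 1-based position in the fusion.

    Assumes a seamless junction (an illustrative assumption, see the text above).
    """
    if not VEGFD_FIRST <= vegfd_residue <= VEGFD_LAST:
        raise ValueError(f"VEGFD residue {vegfd_residue} is not part of the fusion")
    return COL4A1_RESIDUES + (vegfd_residue - VEGFD_FIRST + 1)


print(fusion_length())                # 643
print(vegfd_to_fusion_position(31))   # 320, first residue after the junction
print(vegfd_to_fusion_position(354))  # 643, last residue of the fusion
```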
Quantitative analysis of RNA sequencing data showed that VEGFD was expressed only in myofibroma samples carrying the corresponding fusions (Figure 1C). The data also confirmed the homogeneous expression of VEGFD receptors KDR (VEGFR2) and FLT4 (VEGFR3), as well as FURIN, a proprotein convertase that processes VEGFD. 27,28 We performed pathway enrichment analyses on the gene expression results, revealing the presence of VEGF signalling as well as angiogenesis signatures in the samples bearing the COL4A1-VEGFD gene fusion, as illustrated in Figure 1D. Furthermore, the fusion was associated with a specific transcriptional profile relative to the other samples assessed (Figure S2).

| The COL4A1-VEGFD gene fusion leads to expression of mature VEGFD
To confirm the expression of the fusion protein, we analysed tumour sections by immunofluorescence. The haematoxylin-eosin-stained sections of the P38 myofibroma (Figure 2) showed classical tumoural architecture with a central haemangiopericytoma-like vascular pattern and fascicles of myofibroblasts at the periphery. 29 We validated the anti-VEGFD primary antibody on transiently transfected COS-1 cells (Figure S3). The stained tissue sections demonstrated that the expression of VEGFD was robust in the P38 tumour sample, bearing the COL4A1-VEGFD gene fusion, compared to patient P46, used as negative control (Figure 2C,D) and pericytes) demonstrated that tumour cells expressed the two proteins (Figure 2E).

| DISCUSSION
We identified a novel fusion transcript associating COL4A1 and To our knowledge, this is the first demonstration that VEGFD is a proto-oncogene subject to somatic gene alteration. Nevertheless, VEGFD has previously been shown to play a role in solid tumour growth, intra-tumoural angiogenesis, lymphangiogenesis and metastatic spread. 34,35 The analyses of the transcriptomic profiles of our samples evidenced a robust enrichment in multiple angiogenic pathways, supporting the production of bioactive VEGFD by these tumours. The fusion may also lead to reduced expression of the arresten protein, which is produced by proteolytic cleavage of the NC1 In conclusion, we identified a novel COL4A1-VEGFD fusion transcript as a recurrent genetic event. The COL4A1-VEGFD fusion leads to production of mature VEGFD after proteolytic processing, which may act as an autocrine growth factor for tumour cells. These findings shed light on a novel pathogenic mechanism of myofibroma development, suggesting opportunities for targeted therapies.

ACKNOWLEDGEMENTS
We thank Liming Pei for depositing the mPCSK3-pcDNA3.1 plasmid in Addgene. We thank Pascal Brouillard for technical support and Nisha Limaye for critical review of the manuscript. GD and BB are the

CONFLICT OF INTEREST
The authors have no potential conflict of interest to disclose.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.