Development and validation of an RNA sequencing panel for gene fusions in soft tissue sarcoma

Abstract
Gene fusions are one of the most common genomic alterations in soft tissue sarcomas (STS), which comprise more than 70 subtypes. In this study, a custom-designed RNA sequencing panel including 67 genes was developed and validated to identify gene fusions in STS. In total, 92 STS samples were analyzed using the RNA panel and 95.7% (88/92) passed all the quality control parameters. Fusion transcripts were detected in 60.2% (53/88) of samples, including three novel fusions (MEG3–PLAG1, SH3BP1–NTRK1, and RPSAP52–HMGA2). The panel demonstrated excellent analytic accuracy, with 93.9% sensitivity and 100% specificity. The intra-assay, inter-assay, and personnel consistencies were all 100.0% in four samples with three replicates. In addition, different variants of EWSR1–FLI1, COL1A1–PDGFB, NAB2–STAT6, and SS18–SSX were identified in the corresponding subtypes of STS. When histological and molecular diagnoses were combined, the preliminary histology-based classification was changed in 14.8% (13/88) of patients. Collectively, the RNA panel developed in our study shows excellent performance on RNA from formalin-fixed, paraffin-embedded (FFPE) samples and can complement DNA-based assays, thereby facilitating precise diagnosis and novel fusion detection.


| INTRODUCTION
Soft tissue sarcoma (STS), which comprises more than 70 subtypes, is a group of highly heterogeneous tumors characterized by local invasion, invasive or destructive growth, high recurrence, and distant metastasis. [1][2][3] Some subtypes have high pathological similarity and are difficult to distinguish. Approximately 30% of STS harbor specific gene fusions, and NCCN guidelines recommend 44 specific fusion genes as diagnostic markers. [4] Gene fusions arise from breakage and rejoining between chromosomes, or from intra-chromosomal rearrangements with deletions, insertions, inversions, duplications, or altered transcription. [5] They represent an important class of somatic alterations and often act as drivers of tumorigenesis and progression. [6,7] Detection of defined gene fusions in STS can guide classification and targeted therapy. [8] In addition, identification of novel gene fusions can provide insights into the mechanism of tumorigenesis, allow subclassification of histologically similar STS, and may yield useful biomarkers for disease progression and treatment. [9] The traditional methods for detecting gene fusions are immunohistochemistry (IHC) and fluorescence in situ hybridization (FISH). However, both are limited in their ability to identify multiple fused genes simultaneously. [10][11][12] The emergence of next-generation sequencing (NGS) technology and modern computational tools allows multiple fused genes to be identified in parallel. [13] Many large DNA-based NGS (DNA-NGS) panels have been developed to detect single-nucleotide variants (SNVs), insertions/deletions, and fused genes. [14] However, breakpoints are unpredictable, so DNA-based fusion detection requires a large number of probes to cover wide genomic regions. Moreover, rare ALK fusions detected using DNA-NGS may not be transcribed and translated into the functional fusion proteins that are frequently sensitive to targeted therapies. [15] By contrast, RNA sequencing has been widely used to detect gene fusions without prior knowledge of the partner sequence or specific breakpoints in cancer cell lines, fresh frozen tissues, and FFPE samples. [16][17][18][19] In this study, we present the development and clinical validation of an RNA sequencing panel for gene fusions in STS.

| Panel design
In total, 67 genes were selected based on the presence of clinically relevant fusions or oncogenic isoforms in STS. The genes targeted in this panel are listed in Table S1. Target-specific probes covering full exons were custom designed to identify known and potential novel fusion transcripts associated with 26 cancer genes. To identify potential fusions involving 41 partner genes, multiple probes were designed to cover partial exons of the partner genes, such as ETV1, ETV4, and CIC, which also enabled assessment of gene expression. Gene-specific primers were designed proximal to the exon-exon junctions involved in the fusions. The probes (Roche) were typically 120 bp in length.

| RNA extraction and quality control
All included samples were pathologically assessed before RNA extraction, and the proportion of tumor cells was ≥20% in all but one case (Table S2). A minimum of 10 unstained slides from FFPE tissue were obtained for each sample and reviewed by a pathologist, who had given a preliminary diagnosis combined with IHC and FISH results.

No obvious adapter contamination was detected in the final library using the Agilent 4200 TapeStation system, and the main peak was between 300 and 500 bp. After quantification, NGS was performed on an Illumina NovaSeq 6000 instrument (Illumina).

KEYWORDS
fluorescence in situ hybridization, gene fusion, immunohistochemistry, next-generation sequencing, soft tissue sarcoma

| Fusion detection
A custom pipeline was developed to perform read alignment, fusion detection, and QC on the RNA sequencing data. The software fastp (v.2.20.0) was used for adapter trimming. STAR (v2.7.6a) was used to align reads to the reference genome (UCSC hg19/GRCh37). STAR-Fusion (v1.9.1) was applied to identify the primary fused genes. To avoid false-positive fusion results, we used FusionInspector in 'inspect' mode to re-score and filter the predicted fusions with the following parameters: min_junction_reads 1, min_novel_junction_support 3, min_spanning_frags_only 5, max_promiscuity 10, only_fusion_reads, fusion_contigs_only. We then applied a tier-based filter strategy: if a fusion pair was annotated in the COSMIC or ChimerKB database, the cutoff was fusion fragments per million (FFPM) > 0.02 and support reads > 2; otherwise, the cutoff was FFPM > 0.07 and support reads > 8. Fusion expression was calculated as FFPM using the raw data from the RNA panel, and the average number of reads in this cohort was 12,353,247. The "copies/ng" of the fusion transcript was calculated using droplet digital polymerase chain reaction (ddPCR). The primers used for RT-PCR are listed in Table S3. Fusion product sequences obtained by DNA-NGS or qRT-PCR were then checked to confirm that they aligned with the sequence predicted by the RNA panel.
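The tier-based filter described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' pipeline code: the class and function names are hypothetical, and FFPM is assumed here to be the number of fusion-supporting fragments normalized per million total reads.

```python
from dataclasses import dataclass

# Cutoffs quoted in the text: (minimum FFPM, minimum support reads), both strict ">".
KNOWN_CUTOFF = (0.02, 2)   # fusion pair annotated in COSMIC or ChimerKB
NOVEL_CUTOFF = (0.07, 8)   # all other fusion pairs


@dataclass
class FusionCall:
    """Hypothetical record for one candidate fusion (names are illustrative)."""
    name: str
    support_reads: int   # fragments supporting the fusion
    total_reads: int     # total reads sequenced for the sample
    in_known_db: bool    # True if annotated in COSMIC or ChimerKB


def ffpm(support_reads: int, total_reads: int) -> float:
    """Fusion fragments per million total reads (assumed normalization)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return support_reads / total_reads * 1_000_000


def passes_tier_filter(call: FusionCall) -> bool:
    """Apply the looser cutoff to database-annotated pairs, the stricter one otherwise."""
    min_ffpm, min_reads = KNOWN_CUTOFF if call.in_known_db else NOVEL_CUTOFF
    return (ffpm(call.support_reads, call.total_reads) > min_ffpm
            and call.support_reads > min_reads)
```

At the cohort's average depth of 12,353,247 reads, three supporting fragments correspond to an FFPM of roughly 0.24, so under these assumptions the read-count threshold, rather than the FFPM threshold, is usually the limiting criterion for low-level calls.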

| Detection limit
The complete workflow, from total RNA isolation to the reporting of data analysis for gene fusion detection, is illustrated in Figure 1A. The RNA panel was designed to detect gene-specific fusions in STS.
To determine how tumor content affected fusion detection, serial dilution experiments were performed. A standard RNA sample with an ETV6-NTRK3 fusion was serially diluted with RNA from a normal control to generate various levels, from 50 copies/100 ng to 500 copies/100 ng. The ETV6-NTRK3 fusion was identified in all replicates across all dilutions (Figure 1B). Fusions in standard FFPE samples (50 copies) and cell line samples (50 copies) were then stably detected in five replicates. Notably, the FFPM values supporting ETV6-NTRK3 and EWSR1-FLI1 differed greatly (Figure 1C). In addition, the interfering substances ethanol (1% V/V) and proteinase K (0.08 mg/ml) showed no effect on the workflow (Table S4). Finally, the samples were subjected to library construction, sequencing, and bioinformatics analysis.
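The arithmetic behind such a dilution series can be sketched as follows. The stock concentration of the fusion-positive standard is not given in the text, so the value used in the example is purely hypothetical, as is the function name.

```python
def dilution_mix(stock_copies_per_ng: float,
                 target_copies_per_100ng: float,
                 total_ng: float = 100.0) -> tuple[float, float]:
    """Return (ng of fusion-positive RNA, ng of normal-control RNA) for one level.

    stock_copies_per_ng is the fusion copy number per ng of the undiluted
    standard (hypothetical input; not reported in the text).
    """
    if stock_copies_per_ng <= 0:
        raise ValueError("stock concentration must be positive")
    target_per_ng = target_copies_per_100ng / 100.0
    if target_per_ng > stock_copies_per_ng:
        raise ValueError("target exceeds the stock concentration")
    positive_ng = total_ng * target_per_ng / stock_copies_per_ng
    return positive_ng, total_ng - positive_ng


# Example with a hypothetical stock of 50 copies/ng:
# the two ends of the series quoted in the text (500 and 50 copies/100 ng).
high = dilution_mix(50.0, 500.0)
low = dilution_mix(50.0, 50.0)
```

With this assumed stock, the highest level needs 10 ng of standard in 90 ng of normal RNA, and the lowest level needs 1 ng in 99 ng.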

| Validation based on clinical samples
To assess the clinical utility of the RNA panel, five fusion-positive, 28 FISH-positive, and 59 FISH-negative or FISH-untested clinical samples were selected. In our cohort, Ewing sarcoma, malignant small round cell tumors, and dermatofibrosarcoma protuberans (DFSP) were the three most common tumor types, accounting for 10%, 10%, and 5%, respectively (Table S2). The proportion of other subtypes ranged from 1% for lipofibromatosis-like neural tumor to 4% for inflammatory myofibroblastic tumor [...] (Figure 2B). Notably, novel gene fusions were identified, including MEG3-PLAG1, SH3BP1-NTRK1, and RPSAP52-HMGA2, which were confirmed by qRT-PCR (Figure S1).

DNA-NGS was used to investigate the inconsistencies between the FISH assay and the RNA fusion panel. In two samples, no fusion was detected using the RNA fusion panel, whereas an EWSR1 (intron)-ZNF444 fusion was identified in one sample by DNA-NGS. Additionally, in another sample, the DNA-NGS assay identified an EWSR1-DCTIN2 (intron) fusion but not the EWSR1-DDIT3 fusion detected using the RNA fusion panel. Rare fusions detected using DNA sequencing assays may not be transcribed and translated into the functional fusion proteins targeted by matched therapies. [15] Interestingly, we identified a FUS-TFCP2 fusion in one sample that had initially been diagnosed as inflammatory myofibroblastic tumor because of ALK disruption on FISH. However, we did not find any ALK fusion in this sample by DNA-NGS.

The coefficient of variation (CV) of FFPM was largest for personnel consistency, followed by inter-assay consistency, and smallest for intra-assay consistency. [...] -1 and A-2), in which A-1 contained three replicates (A-1-1, A-1-2, and A-1-3). Examiner B carried out only one sequencing run.
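For readers who want to reproduce this kind of consistency comparison, the CV of FFPM across replicates can be computed as in the minimal sketch below. The text does not state which standard deviation estimator was used, so the sample standard deviation is an assumption here, and the replicate values are hypothetical rather than data from the study.

```python
from statistics import mean, stdev


def coefficient_of_variation(values: list[float]) -> float:
    """CV in percent: sample standard deviation divided by the mean, times 100."""
    if len(values) < 2:
        raise ValueError("need at least two replicates")
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; CV is undefined")
    return stdev(values) / m * 100


# Hypothetical FFPM values for three intra-assay replicates of one fusion
# (illustrative numbers only).
intra_assay_ffpm = [1.00, 1.05, 0.95]
intra_assay_cv = coefficient_of_variation(intra_assay_ffpm)
```

Computing the same statistic separately for the intra-assay, inter-assay, and personnel replicate groups of each sample allows the three sources of variability to be ranked.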

| High precision in detecting fusions by the
These findings indicated that FFPE samples could be used for RNA-NGS to detect fusion genes and that our RNA panel identified gene fusions stably. EWSR1 most frequently fused to exon 7 of FLI1, exon 9 of ERG, and exon 10 of ETV1 (Figure 3A). [...] in one patient, respectively (Figure 3D). Interestingly, an SS18 exon 5-SSX1 exon 9 fusion coexisted in one patient harboring an SS18 exon 10-SSX1 exon 7 fusion.

| Sarcoma harboring novel fusions
The advantage of RNA-NGS over capture-based DNA sequencing assays is that RNA-NGS can recognize all fusion transcripts without the need to design probes covering the breakpoint regions. In our study, three cases were found to have novel gene fusions. In two of these fusions, the novel gene was the partner of a gene known to be recurrently involved in the specific tumor type. The third novel fusion, however, involved two rare genes. These three cases are summarized as follows:

| Application of RNA-NGS data for the integrated histological and molecular classification
In combination with histological and molecular diagnosis, the preliminary histology-based classification was changed in 14.8% (13/88) of patients.

| DISCUSSION
As an increasing number of clinically significant fusions are being [...] which indicated a great advantage of this assay in detecting unknown fusions. Second, some uncommon fusions detected using DNA-based assays generated no aberrant transcripts or proteins and therefore conferred no response to targeted therapies. [15] [...] Furthermore, there were also significant differences in gene expression associated with anatomic localization and NAB2-STAT6 gene fusion variants. [34] Therefore, identification of distinct NAB2-STAT6 gene fusion variants may establish a potential molecular biologic basis for clinicopathologic differences in SFTs.
Interestingly, we also identified four intron-exon gene fusions in our cohort, including one ETV6-NTRK3 in congenital fibrosarcoma, two FUS-CREB3L2 in low-grade fibromyxoid sarcoma, and one EML4-ALK fusion in inflammatory myofibroblastic tumor.
Exon-exon gene fusions also coexisted in all of these patients. The reason for intron-exon gene fusion was intron retention [...] These findings illustrated the high impact of molecular markers on future sarcoma classification, and RNA panel-based NGS may become an important tool for sarcoma diagnosis through timely and reliable molecular profiling.
Several limitations of this assay should be noted. First, degradation in FFPE samples led to low-quality RNA, which seriously affected fusion detection performance and increased the risk of false-negative results. Second, the RNA panel cannot detect fusions in which both genes are absent from our panel, so it may miss some fusions that play an important role in tumorigenesis. To better serve clinical diagnosis, we will continue to upgrade our panel by incorporating newly discovered fusion genes.
In summary, we established and validated a sarcoma-tailored 67-gene RNA panel to improve the molecular diagnosis of STS. This panel can detect both known and unknown gene fusions, thereby serving as a good supplement to DNA-based assays. Moreover, the method is applicable to routinely processed FFPE tissue samples and has great potential for facilitating integrated STS diagnostics and molecular classification.

ACKNOWLEDGMENTS
None.

DISCLOSURE
The authors declare that they have no conflict of interest.

AUTHOR CONTRIBUTIONS
WMH, LY, and XKZ were responsible for study conception and original draft preparation. YN, DCH, and ZCW participated in sample collection and investigation. XML, YL, CZ, WLD, MQT, and RD analyzed and interpreted the data. CS, JML, and XZ reviewed and edited the manuscript. All the authors read and approved the final manuscript.

DATA AVAILABILITY STATEMENT
The data sets used and/or analyzed during the current study are available from the corresponding authors on reasonable request.