Fusion sequencing via terminator‐assisted synthesis (FTAS‐seq) identifies TMPRSS2 fusion partners in prostate cancer

Genetic rearrangements that fuse an androgen‐regulated promoter area with a protein‐coding portion of an originally androgen‐unaffected gene are frequent in prostate cancer, with the fusion between transmembrane serine protease 2 (TMPRSS2) and ETS transcription factor ERG (ERG) (TMPRSS2‐ERG fusion) being the most prevalent. Conventional hybridization‐ or amplification‐based methods can test for the presence of expected gene fusions, but the exploratory analysis of currently unknown fusion partners is often cost‐prohibitive. Here, we developed an innovative next‐generation sequencing (NGS)‐based approach for gene fusion analysis termed fusion sequencing via terminator‐assisted synthesis (FTAS‐seq). FTAS‐seq can be used to enrich the gene of interest while simultaneously profiling the whole spectrum of its 3′‐terminal fusion partners. Using this novel semi‐targeted RNA‐sequencing technique, we were able to identify 11 previously uncharacterized TMPRSS2 fusion partners and capture a range of TMPRSS2‐ERG isoforms. We tested the performance of FTAS‐seq with well‐characterized prostate cancer cell lines and utilized the technique for the analysis of patient RNA samples. FTAS‐seq chemistry combined with appropriate primer panels holds great potential as a tool for biomarker discovery that can support the development of personalized cancer therapies.


Introduction
Clinical heterogeneity of prostate cancer (PCa) likely reflects the diversity of the underlying molecular landscape and raises challenges for disease management [1]. Gene fusions are commonly detectable in PCa, with the most frequent type being structural rearrangements between an androgen-regulated gene and a gene of the ETS transcription factor family [2,3]. Typically, fusion events juxtapose an androgen-regulated 5′-UTR with the 3′ protein-coding parts of oncogenic ETS family genes, resulting in their overexpression. The Cancer Genome Atlas (TCGA) revealed that 53% of patients with PCa had fusions involving ETS family genes [4]. In particular, the TMPRSS2-ERG (TMERG) fusion is the most prevalent somatic fusion event in PCa. The frequency of TMERG varies among populations: the highest frequency (> 50%) was reported in Caucasians, followed by African American (20-30%) and Asian men (< 20%) [5]. The association of TMERG with PCa clinical outcomes remains inconclusive. Some studies have reported a role of ERG fusion in PCa cellular growth, tumor progression, and bone metastasis development [6,7], while others found that the prognoses of patients with fusion-bearing tumors did not differ significantly from those of patients without [4].
Cancer-specific fusion transcripts are mainly the result of chromosomal rearrangements, such as translocation, deletion, or isochromosome formation [8]. Chimeric mRNAs can also form because of RNA polymerase read-through between neighboring genes encoded on the same DNA strand or trans-splicing of pre-mRNA [9,10]. Transcription-induced gene fusions are frequently detected in normal tissues [11], which adds a level of molecular complexity to the search for disease-related biomarkers. Nevertheless, in some instances, nonchromosomal fusions, such as SLC45A3-ELK4, may also be involved in pathological processes [12]. Interestingly, different regions of a single tumor may contain distinct patterns of fusion isoforms, suggesting that particular gene fusions may emerge independently in different regions of a single organ, as is the case with the 17 reported isoforms of TMERG in PCa [13,14]. The prognostic significance of the individual TMERG fusion variants is still to be determined.
Fluorescence in situ hybridization (FISH), immunohistochemistry (IHC), and reverse transcription-polymerase chain reaction (RT-PCR) are predominantly used for the detection of fusion genes in clinical settings. Although sensitive, these methods typically test for the presence of already known fusion genes, often leading to false negative results attributed to novel or nontested gene fusions or isoforms. Moreover, low throughput and limited resolution severely limit the diagnostic scope [15]. These weaknesses can be overcome by the application of next-generation sequencing (NGS), which has already helped to elucidate more than 90% of known fusion genes [16]. However, high-confidence detection of fusion genes from whole genome or whole transcriptome sequencing data requires extremely deep sequencing, which can be cost-prohibitive. A minimum of 1000× coverage is necessary to detect low-abundance fusion transcripts by RNA sequencing [8,17]. Enrichment for target sequences via hybridization capture or PCR can improve the detection sensitivity of NGS-based assays. Targeted RNA sequencing employing hybridization probes was shown to increase the diagnostic rate of gene fusion detection from 63% to 76% compared with FISH and RT-PCR [18]. Among PCR-based enrichment techniques, anchored PCR chemistry retains the ability to detect previously uncharacterized fusion partners by using a single gene-specific primer ('anchor') and a universal primer embedded in NGS platform-specific adapters [19]. While both approaches demonstrate high sensitivity and specificity, they involve complex and lengthy sample manipulations that lead to longer turnaround times compared to conventional methods.
Here, we describe a new NGS-based assay to profile the diversity of 3′ fusion partners of target genes by extending a single target-specific primer and randomly terminating extension products with oligonucleotide-tethered chain terminators bearing sequencing adapters covalently attached to their nucleobases. This reaction simultaneously produces DNA fragments suitable for short-read sequencing and labels the resulting molecules with a sequencing adapter. We termed this technique fusion sequencing via terminator-assisted synthesis or FTAS-seq. We demonstrated the detection of expected and previously uncharacterized TMERG isoforms in the NCI-H660 cell line and the correct identification of fusion-negative cases. Moreover, we screened for TMPRSS2 3′ fusion partners in prostate tissue samples from PCa patients and discovered 11 new partner genes, as well as a rich diversity of TMERG isoforms.

Source and culture of prostate cancer cell lines
LNCaP clone FGC (ATCC: CRL-1740™, RRID: CVCL_1379) and NCI-H660 (ATCC: CRL-5813™, RRID: CVCL_1576) cell lines were obtained from ATCC. Both cell lines were authenticated by short tandem repeat (STR) profiling within 3 years before the study and regularly tested for the presence of mycoplasma.
LNCaP cells were cultured in Roswell Park Memorial Institute (RPMI)-1640 medium supplemented with 10% fetal bovine serum. NCI-H660 cells were cultured in RPMI-1640 supplemented with 5% fetal bovine serum, 5 µg·mL⁻¹ of insulin, 10 nM of hydrocortisone, 10 µg·mL⁻¹ of transferrin, 30 nM of sodium selenite, 10 nM of β-estradiol, and 2 mM of L-glutamine. Cells were cultured according to the standard mammalian tissue …
Biochemical recurrence (BCR) was defined as postoperative PSA levels of 0.2 ng·mL⁻¹ and above. Full follow-up data were available for 52 patients with a mean follow-up of 3.3 years. Clinico-pathological and molecular characteristics of the study subsets are provided in Table S1.
Total RNA from tissue was purified using TRIzol™ reagent (Thermo Fisher Scientific) according to the protocol provided by the manufacturer. A NanoDrop 2000 Spectrophotometer (Thermo Fisher Scientific) was used to determine RNA concentration and purity.
Total RNA was purified between January and March 2018 and stored at −80°C until use. Before the manipulations carried out in this study, RNA quality was assessed with the Agilent 2100 Bioanalyzer system (Agilent Technologies) using the RNA 6000 Pico Kit.

Gene expression analysis by RT-qPCR
TMPRSS2-ERG and GAPDH expression levels in cell lines and patient samples were evaluated by reverse transcription-quantitative PCR (RT-qPCR) analysis. TMPRSS2-ERG fusion transcripts were amplified with primers specific to TMPRSS2 exon 1 (5′-CGCGAGCTAAGCAGGAG-3′) and ERG exon 2 (5′-GTCCATAGTCGCTGGAGGAG-3′). The GAPDH transcript was amplified with the primer set 5′-ATTCCATGGCACCGTCAAG-3′ and 5′-TTTGGAGGGATCTCGCTCC-3′. All primers used in this study were synthesized by Metabion International AG. RT-qPCR reaction mixtures were prepared with 100 ng of total RNA using the Power SYBR™ Green RNA-to-CT™ 1-Step Kit (Thermo Fisher Scientific), and each sample was analyzed in duplicate. Cycling was performed following the manufacturer's recommendations. Negative controls without reverse transcriptase were prepared for each sample, and multiple no-template controls were included in each run. Gene expression analysis was performed on the QuantStudio™ 7 Flex Real-Time PCR System using QuantStudio™ Real-Time PCR Software v1.7.1 (Thermo Fisher Scientific).
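The relative abundance of a fusion transcript and a reference gene measured in duplicate wells is commonly expressed through the difference in threshold cycles (ΔCt). The sketch below illustrates that arithmetic only; it is not the analysis pipeline of this study. It assumes perfect doubling per cycle (an efficiency factor of 2.0), and the Ct values are hypothetical placeholders. Under this assumption, a ΔCt of about 10 cycles corresponds to a roughly 1000-fold difference.

```python
from statistics import mean


def fold_difference(ct_target, ct_reference, efficiency=2.0):
    """Return how many times less abundant the target is than the reference.

    ct_target, ct_reference: Ct values from replicate wells.
    efficiency: amplification factor per cycle; 2.0 (perfect doubling)
    is an assumption, not a measured value.
    """
    delta_ct = mean(ct_target) - mean(ct_reference)
    return efficiency ** delta_ct


# Hypothetical duplicate Ct values, for illustration only.
tmerg_ct = [28.25, 27.75]
gapdh_ct = [18.5, 17.5]

ratio = fold_difference(tmerg_ct, gapdh_ct)
print(f"Fusion transcript is ~{ratio:.0f}-fold less abundant than the reference")
```

If the measured amplification efficiency deviates from doubling, the `efficiency` argument can be lowered accordingly (for example, 1.9), which reduces the estimated fold difference for the same ΔCt.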

cDNA synthesis and purification
cDNA was synthesized from total RNA of LNCaP cells, NCI-H660 cells, and clinical samples and used for semi-targeted RNA-seq library preparation, as well as for fusion breakpoint validation experiments. Reverse transcription was performed with 500 ng of total RNA using SuperScript™ IV VILO™ Master Mix (Thermo Fisher Scientific) followed by RNA strand hydrolysis by Escherichia coli RNase H (Thermo Fisher Scientific). cDNA was then purified using Dynabeads Cleanup Beads (Thermo Fisher Scientific). RT reaction mixtures (20 µL) were mixed with 10 µL of nuclease-free water, 60 µL of magnetic beads, and 60 µL of 96% ethanol, and incubated at room temperature for 10 min. Samples were then placed in the magnetic rack and the supernatant was removed. Beads were washed with 85% ethanol while keeping the tubes in the magnetic rack. To elute cDNA, beads were resuspended in 10 µL of nuclease-free water and incubated at 65°C for 5 min.
The oligonucleotide-modified ddC ON

Semi-targeted RNA-seq library preparation
To prepare semi-targeted RNA-seq libraries from cDNA inputs, we employed the Thermo Sequenase enzyme (Thermo Fisher Scientific), which can extend a primer and incorporate oligonucleotide-tethered dideoxynucleotides (OTDDNs). Each sample was analyzed in duplicate. Primer extension reaction mixtures contained 8 µL of purified cDNA (corresponding to ~500 ng of initial total RNA), 1× Thermo Sequenase Reaction Buffer, 0.05 µM of TMPRSS2 exon 1 specific primer 5′-TAGGCGCGAGCTAAGCAGGAG-3′, 1 µM of dATP and dGTP each, 0.9 µM of dTTP and dCTP each, 0.1 µM of ddCᴼᴺTP and ddUᴼᴺTP each, 40 U of Thermo Sequenase, and nuclease-free water up to 20 µL final volume. Second strand synthesis conditions were as follows: denaturation at 95°C for 4 min, followed by 15 cycles of denaturation at 95°C for 1 min, annealing at 63°C for 30 s, extension at 72°C for 1 min, and final extension at 72°C for 5 min. Primer extension products were then purified following the conditions described for cDNA purification, except that the elution volume was increased to 22 µL of nuclease-free water.
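The terminator-to-dNTP ratio in the primer extension mixture sets how often extension stops, and therefore the fragment length. The following sketch shows a simple geometric model of that relationship. It rests on assumptions that are not part of the protocol: the relative incorporation efficiency of a tethered terminator versus its natural dNTP (`relative_efficiency`) and the fraction of template positions at which a C or U terminator can compete (`fraction_terminable`) are both illustrative parameters. The output should therefore be read as a trend, not as a prediction of the actual library size.

```python
def termination_probability(terminator_um, dntp_um, relative_efficiency=1.0):
    """Chance that extension stops at a position where a terminator competes.

    relative_efficiency is a hypothetical factor describing how readily the
    polymerase accepts the terminator compared with the natural dNTP.
    """
    weighted = terminator_um * relative_efficiency
    return weighted / (weighted + dntp_um)


def mean_extension_length(p_stop, fraction_terminable=0.5):
    """Mean number of nucleotides added before termination (geometric model).

    fraction_terminable is the assumed share of positions where a C or U
    terminator can be incorporated (0.5 for a base-balanced template).
    """
    return 1.0 / (p_stop * fraction_terminable)


# Concentrations from the reaction mixture: 0.1 uM terminator vs 0.9 uM dNTP.
p_equal = termination_probability(0.1, 0.9)
p_tenfold_lower = termination_probability(0.1, 0.9, relative_efficiency=0.1)

print(f"equal efficiency: p = {p_equal:.3f}, "
      f"mean length = {mean_extension_length(p_equal):.0f} nt")
print(f"10-fold lower efficiency: p = {p_tenfold_lower:.3f}, "
      f"mean length = {mean_extension_length(p_tenfold_lower):.0f} nt")
```

In this model, lowering the terminator concentration or the incorporation efficiency lengthens the extension products, which is the lever one would adjust to shift the fragment size distribution.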
To amplify NGS libraries and to increase target detection specificity, indexing nested-PCR was performed. Fragments were amplified with an indexing primer set in which one primer was specific to the TMPRSS2 target sequence: i5 primer 5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTGGAGGCGGAGGGCGAGGG-3′, and i7 primer 5′-CAAGCAGAAGACGGCATACGAGAT[8nt_index]GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3′. The indexing nested-PCR reaction mixture contained 20 µL of purified primer extension products, 1× Invitrogen™ Collibri™ Library Amplification Master Mix (Thermo Fisher Scientific), 1 µM of indexing primers each, 20 U of 3′-5′ exonuclease-deficient Phusion polymerase, and nuclease-free water up to 50 µL final volume. Amplification conditions were as follows: denaturation at 98°C for 30 s, followed by 20 cycles of denaturation at 98°C for 10 s, annealing at 60°C for 30 s, extension at 72°C for 1 min, and final extension at 72°C for 1 min. PCR products were then purified using Dynabeads Cleanup Beads following a double binding protocol. Samples (50 µL) were mixed with 40 µL of magnetic bead suspension and incubated at room temperature for 5 min. Samples were then placed in the magnetic rack, the supernatant was removed, and beads were resuspended in 50 µL of nuclease-free water. For the second binding, samples were mixed with 45 µL of fresh bead suspension and incubated at room temperature for 5 min. After incubation, samples were placed in the magnetic rack, the supernatant was removed, and beads were washed with 85% ethanol. Beads were then resuspended in 22 µL of nuclease-free water and incubated at room temperature for 1 min to elute final libraries. Fragment size distribution and concentration were evaluated with the Agilent Fragment Analyzer system (Agilent Technologies) using the HS NGS Fragment Kit.
To generate enough material for sequencing, samples were reamplified with Invitrogen Collibri Library Amplification Master Mix with Primer Mix (Thermo Fisher Scientific) in a 50 µL reaction for 3-7 cycles according to the manufacturer's protocol. Final libraries were purified using Dynabeads Cleanup Beads following the double binding protocol and analyzed with the Agilent Fragment Analyzer system (Agilent Technologies) using the HS NGS Fragment Kit. Finally, libraries were quantified with the Invitrogen Collibri Library Quantification Kit (Thermo Fisher Scientific).
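The i7 indexing primer differs between samples only in its 8-nt index. A minimal sketch of assembling per-sample i7 primers from the sequence given above is shown here; the function name and the example index are hypothetical, and the index set actually used in the study is not specified in this section.

```python
# Fixed parts of the i7 primer as given in the text; {index} marks the 8-nt slot.
I7_TEMPLATE = (
    "CAAGCAGAAGACGGCATACGAGAT"
    "{index}"
    "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"
)


def build_i7_primer(index):
    """Insert a sample-specific 8-nt index into the i7 primer sequence."""
    index = index.upper()
    if len(index) != 8 or set(index) - set("ACGT"):
        raise ValueError("index must be exactly 8 nucleotides of A, C, G, T")
    return I7_TEMPLATE.format(index=index)


# Hypothetical index, for illustration only.
primer = build_i7_primer("acgtacgt")
print(primer, len(primer))
```

Validating the index length and alphabet before ordering oligonucleotides guards against a shifted index read, which would otherwise surface only at demultiplexing.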

Fusion breakpoint validation by Sanger sequencing
To validate breakpoint sequences, patient samples were reverse transcribed and fused fragments were amplified using the TMPRSS2 exon 1 specific primer 5′-TAGGCGCGAGCTAAGCAG-3′ and a reverse primer specific to the 3′ fusion partner (Table S2). Amplification mixtures contained 8 µL of purified cDNA (corresponding to ~500 ng of initial total RNA), 1× Phusion Hot Start II High-Fidelity PCR Master Mix (Thermo Fisher Scientific), 0.5 µM of forward and reverse primers each, and nuclease-free water up to 20 µL final volume. PCR conditions were as follows: denaturation at 98°C for 30 s, followed by 35 cycles of denaturation at 98°C for 10 s, annealing at 59-62°C (depending on the primer Tm) for 30 s, extension at 72°C for 15 s, and final extension at 72°C for 5 min. Amplified fusion isoforms were separated by agarose gel electrophoresis on high-resolution 2% MetaPhor™ Agarose (Lonza, Basel, Switzerland), and DNA was extracted using the GeneJET Gel Extraction and DNA Cleanup Micro Kit (Thermo Fisher Scientific). Purified PCR fragments were cloned using the CloneJET™ PCR Cloning Kit (Thermo Fisher Scientific), and ligation mixtures were then used for E. coli DH10B transformation. Plasmids were purified with the GeneJET Plasmid Miniprep Kit (Thermo Fisher Scientific) and sequenced with the BigDye™ Terminator v3.1 Cycle Sequencing Kit on a 3130xL Genetic Analyzer system (Thermo Fisher Scientific). Obtained sequences were analyzed using Vector NTI Advance Software v11.5.4 (Thermo Fisher Scientific).

FTAS-seq captures unknown RNA 3′-terminal sequences near a defined target site
To analyze TMPRSS2 3′ fusion partner sequences, we developed FTAS-seq (Fig. 1A), which utilizes a nucleotide-mediated adapter addition technology for the preparation of sequencing-ready molecules by the extension of a single site-specific primer. First, total RNA was reverse transcribed using random and oligo(dT) primers, and RNase H was then used to hydrolyze the RNA strand of the RNA-cDNA hybrid. We designed a primer specific to TMPRSS2 (RefSeq NM_005656) exon 1 to target the 5′ end of the transcript. The semi-targeted approach requires knowing only a small fragment of the target gene to design a primer and enables the detection of adjacent 3′-terminal regions, in our case ERG sequences (RefSeq NM_004449.4), without a priori knowledge. Second cDNA strand synthesis was initiated from the target-specific primer and utilized a DNA polymerase that is able to incorporate dideoxynucleotides. The base-modified dideoxynucleotides contained a synthetic oligonucleotide, which served as a universal priming site to amplify labeled cDNA fragments. Next, a TMPRSS2 exon 1-specific primer, which hybridizes closer to the fusion breakpoint and contains a full-length Illumina P5 adapter, was used in the library amplification step to increase TMERG detection specificity and to add Illumina P5 adapters to amplified fragments. A full-length P7 adapter was introduced through indexing primers complementary to the OTDDN oligonucleotide. After amplification, short fragments were removed by size-selection purification (Fig. 1B), and libraries were subjected to standard Illumina paired-end sequencing. The forward sequencing read (R1) contained the 5′ end sequences of fusion transcripts starting from TMPRSS2 exon 1, while the reverse read (R2) contained ERG sequences starting from random positions.
The use of click chemistry as a means of adapter addition was previously reported [29]. Here, we executed click reactions to generate oligonucleotide-modified dideoxynucleotides prior to their incorporation into the nascent DNA strand (Fig. 1C). We obtained conjugates of correct mass and 98% purity with > 30% yield. An essential requirement for these compounds is the compatibility of the unnatural linker with DNA polymerases to enable the use of the attached oligonucleotide as a priming site. We optimized the structure of the linker and identified polymerases able to use OTDDNs as substrates, as well as polymerases able to perform read-through [30]. These findings pave the way for straightforward DNA labeling by any desired oligonucleotide irrespective of the sequence context of the template.

FTAS-seq detects the expected TMERG isoforms in a prostate cancer cell line
To assess whether the semi-targeted RNA-seq approach is sufficiently specific and sensitive to detect rare fusion events in the human transcriptome, we prepared FTAS-seq libraries from total RNA purified from prostate cancer cell lines. We selected the NCI-H660 cell line as a TMERG-positive control [13,31] and LNCaP as a negative control [31]. First, we evaluated TMERG expression levels by RT-qPCR and observed that TMERG was ~1000-fold less abundant than the housekeeping GAPDH transcript in NCI-H660 cells (Fig. 2A). Although we obtained a TMERG signal for LNCaP cells as well, melt curve analysis indicated that these were likely nonspecific amplification products. Afterwards, we sequenced FTAS-seq libraries prepared from 0.5 µg of the same total RNA. On average, 73.14 ± 0.34% of the obtained reads mapped to hg19 in NCI-H660 samples. To detect gene fusions from semi-targeted RNA-seq data, we employed the command-line tool Arriba, which evaluates the reliability of each fusion according to an internal confidence scoring algorithm [27]. We considered fusion isoforms as putative true variants if they met the Arriba criteria and were detected in both technical replicates. Five TMERG fusions were found in NCI-H660 samples (Fig. 2B,C), of which three were previously reported [13]. In this study, we named fusion transcripts by exon numbering according to the TMPRSS2 and ERG transcript structures. In NCI-H660 libraries, we also identified T1-E IIIa and T2-E5 fusion transcripts, which had not been validated in the NCI-H660 transcriptome before, although they were reported in PCa patient samples [13]. All identified TMERG variants form between the corresponding TMPRSS2 and ERG exons, except T1-E IIIa, which is identical to T1/E_IIIc_4 described by Clark et al. [13] and contains a partial ERG intron 3 sequence followed by exon 4.
To assess whether FTAS-seq data are affected by technical noise and whether the method correctly identifies TMERG-negative cases, we prepared FTAS-seq libraries from total RNA extracted from LNCaP cells. On average, 81.32 ± 0.68% of the obtained reads mapped to hg19. The majority of reads in LNCaP samples mapped to the wild-type TMPRSS2 gene and no chimeric transcripts were detected. Experiments with PCa cell lines indicate that FTAS-seq is a specific method for the detection of rare TMERG transcripts, which enables distinguishing fusion-positive and fusion-negative cases.
Fig. 2. … [27]. E IIIa is an ERG intronic sequence after exon 3. (C) The structure of T1-E4 and T2-E4 fusion transcripts detected in NCI-H660 RNA. The visualization was created using the Arriba R script arriba_draw_fusions.R.
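The replicate-concordance rule described above (a call must pass the fusion caller's criteria and appear in both technical replicates) can be sketched as a set intersection over Arriba-style output tables. This is an illustration under stated assumptions, not the study's pipeline: the confidence threshold (`medium` or `high`) is assumed, the example rows use placeholder coordinates and a made-up partner name (`GENEX`), and only a subset of the columns of an Arriba `fusions.tsv` file is reproduced.

```python
import csv
import io

# Assumed acceptance threshold; the exact criteria applied are not stated here.
ACCEPTED = {"medium", "high"}


def load_calls(tsv_text):
    """Parse Arriba-style TSV text into a set of fusion keys."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {
        (row["#gene1"], row["gene2"], row["breakpoint1"], row["breakpoint2"])
        for row in reader
        if row["confidence"] in ACCEPTED
    }


def reproducible_fusions(replicate1_tsv, replicate2_tsv):
    """Keep only fusions that pass the threshold in both replicates."""
    return load_calls(replicate1_tsv) & load_calls(replicate2_tsv)


def make_tsv(rows):
    """Build a small in-memory table with the columns used above."""
    header = ["#gene1", "gene2", "breakpoint1", "breakpoint2", "confidence"]
    return "\n".join("\t".join(fields) for fields in [header] + rows)


# Placeholder calls; coordinates and GENEX are invented for illustration.
rep1 = make_tsv([
    ["TMPRSS2", "ERG", "21:1000", "21:2000", "high"],
    ["TMPRSS2", "ERG", "21:1000", "21:3000", "low"],
    ["TMPRSS2", "GENEX", "21:1000", "5:4000", "medium"],
])
rep2 = make_tsv([
    ["TMPRSS2", "ERG", "21:1000", "21:2000", "medium"],
    ["TMPRSS2", "ERG", "21:1000", "21:3000", "high"],
])

for call in sorted(reproducible_fusions(rep1, rep2)):
    print(call)
```

Keying each call on both gene names and both breakpoints treats isoforms with different junctions as separate events, which matters when several TMERG isoforms coexist in one sample.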

The analysis and validation of TMPRSS2 gene fusions in PCa tumor samples
Following successful proof-of-principle experiments in cell lines, we applied the semi-targeted RNA-seq approach to analyze the diversity of TMPRSS2 fusion partners in PCa samples. The study cohort contained TMERG-positive and TMERG-negative cases as assessed by conventional PCR-based methods (Table S1). After sequencing, the obtained reads were mapped to the hg19 reference genome; the alignment rate varied from 22.90% to 89.04% and correlated with the quality of the RNA used for library preparation (RIN 2.2-8.2). The analysis of NGS data also showed that target detection specificity and the number of contaminating non-TMPRSS2 reads are related to the integrity of the RNA input. Importantly, quality differences did not force us to exclude samples from the analysis: meaningful information was retrieved in all cases. Our results show that the semi-targeted RNA-seq approach may be applied to analyze even highly degraded RNA. Nevertheless, a separate validation is needed to verify the compatibility with FFPE samples, as FFPE-derived RNA tends to be not only fragmented but also chemically modified. Across the total cohort, FTAS-seq detected chimeric transcripts in 43 samples (77%), of which 38 (70%) contained TMERG fusion transcripts (Table S3). In comparison, 39 samples (72%) were previously reported as TMERG-positive by RT-qPCR and contained at least one of the variants T1-E4, T2-E4, or T1-E2. To assess the overall concordance of FTAS-seq and RT-qPCR, we compared the number of fusion isoforms detected by both methods. FTAS-seq correctly detected TMERG transcripts in 32 (82%) samples, and nine isoforms were missed in 7 (18%) samples. Discordant samples were then subjected to Sanger sequencing to validate their TMERG status. The results showed that three isoforms were indeed missed by FTAS-seq, likely due to their low expression levels and/or interference from the wild-type TMPRSS2 transcript originating from nontumor cells. The six other isoforms were detected neither by FTAS-seq nor by Sanger sequencing, indicating false-positive RT-qPCR results.
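The concordance figures quoted above follow from simple proportions over the 39 RT-qPCR-positive samples. The short sketch below reproduces that arithmetic and the split of the nine discordant isoforms; the variable names are illustrative and the counts are taken directly from the text.

```python
def percent(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


rt_qpcr_positive = 39          # samples TMERG-positive by RT-qPCR
concordant = 32                # samples where FTAS-seq detected TMERG transcripts
discordant = rt_qpcr_positive - concordant

missed_by_ftas = 3             # isoforms confirmed by Sanger sequencing
rt_qpcr_false_positive = 6     # isoforms not confirmed by Sanger sequencing
discordant_isoforms = missed_by_ftas + rt_qpcr_false_positive

print(f"concordant samples: {concordant} ({percent(concordant, rt_qpcr_positive)}%)")
print(f"discordant samples: {discordant} ({percent(discordant, rt_qpcr_positive)}%)")
print(f"discordant isoforms: {discordant_isoforms}")
```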

Novel 3′ fusion partners of TMPRSS2
The semi-targeted RNA-seq approach enabled us to analyze the variety of 3′ fusion partners associated with the TMPRSS2 gene. Across 54 PCa samples, we detected 11 novel TMPRSS2 fusion partners (Linc00114, PPP3CA, AMACR, CASZ1, SIM2, TTC18, FGFR2, OPTN, C1orf61, TBXAS1, RERE) bearing 21 different breakpoint sequences (Fig. 3, Table S4). All novel variants were detected in individual samples, except for AMACR fusions, which were found at a higher frequency. Eight (15%) samples contained TMPRSS2-AMACR breakpoints, which form between TMPRSS2 exon 1 or 5 and AMACR exon 2. The variety of chimeric transcripts showed that fusions may contain junctions of different structures. For example, 18 out of 21 new variants contained exon-exon junctions, one variant (T5-S VI) consisted of a TMPRSS2 exon fused with the inverted sequence of SIM2 intron 6, and two variants consisted of TMPRSS2 exons fused with sequences of the noncoding RNA Linc00114. FTAS-seq analysis indicates that PCa patient samples exhibit a wide diversity of TMERG and other chimeric transcripts, which cannot be detected by RT-qPCR without prior knowledge about both partner genes.
To verify novel transcripts, we subjected samples to Sanger sequencing. PPP3CA, AMACR, TTC18, Linc00114, FGFR2, TBXAS1, and RERE were identified as valid TMPRSS2 3′-terminal fusion partners (Fig. 3). We were not able to analyze CASZ1, SIM2, and C1orf61 fusion isoforms due to an insufficient amount of input RNA. During validation, we confirmed a large proportion of the gene fusions detected by FTAS-seq and found additional, previously uncaptured isoforms. Sanger sequencing revealed 15 fusion variants that were not found by FTAS-seq and could not have been detected by RT-qPCR targeting T1-E4. Our results show that the majority of novel fusion transcripts identified by semi-targeted RNA-seq, although rare, are true variants characteristic of PCa (Table S3).

Bioinformatic prediction of actionable gene fusions
Employing semi-targeted RNA-seq, we identified a great structural variety of fusion transcripts in PCa patient samples. Using bioinformatic tools, we attempted to predict the functionality of the resulting chimeric proteins. Some fusion events, such as T3-P2, T5-A2, T2-TTC23, and T2-TTC24, do not alter reading frames. These chimeras may encode functional protein domains and may retain the biological functions of the 3′ fusion partner. The androgen-regulated TMPRSS2 promoter leads to overexpression of these proteins in prostate tissue. Other fusion transcripts, for example, T1-A2 and T1-F2, have an unclear reading frame, so their effect at the protein level is uncertain. Such ambiguities might be caused by the inability to predict the expressed isoform of a 3′ fusion partner that can have an alternative reading frame. Many isoforms were marked as unclear in our dataset, indicating that complementary techniques, such as long-read sequencing, are needed to resolve such ambiguities. The third category of fusions leads to the loss of the reading frame. If such rearrangements occur at the genomic level, this might indicate the loss of function of the corresponding genes. Alternatively, these fusions may function as noncoding RNA. Bioinformatic predictions can assist in understanding the clinical importance and pathology of fusion transcripts in PCa, although more comprehensive experimental validation is needed to verify the anticipated effects.
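The three categories described above (frame retained, frame lost, frame unclear) can be illustrated with a simplified rule based on codon phase at the junction. The sketch below is an assumption-laden illustration and not the algorithm of the bioinformatic tools used in this study: it takes the number of coding nucleotides upstream of the breakpoint in each partner as given, returns "unclear" whenever either value is unknown (for example, when the expressed isoform of the 3′ partner cannot be determined), and uses hypothetical numbers.

```python
from typing import Optional


def classify_frame(cds_bases_5p: Optional[int], cds_offset_3p: Optional[int]) -> str:
    """Classify a fusion junction by reading-frame compatibility.

    cds_bases_5p: coding nucleotides of the 5' partner retained upstream of
        the junction, or None if unknown.
    cds_offset_3p: coding nucleotides of the 3' partner that lie upstream of
        the junction in its own transcript, or None if the isoform is unknown.
    """
    if cds_bases_5p is None or cds_offset_3p is None:
        return "unclear"
    # The frame is preserved when both partners are in the same codon phase.
    if (cds_bases_5p - cds_offset_3p) % 3 == 0:
        return "in-frame"
    return "out-of-frame"


# Hypothetical junctions, for illustration only.
examples = {
    "phase matches": (150, 300),
    "phase shifted by one": (151, 300),
    "3' isoform unknown": (150, None),
}
for label, (five_prime, three_prime) in examples.items():
    print(f"{label}: {classify_frame(five_prime, three_prime)}")
```

A fuller treatment would also need to handle junctions in untranslated regions and premature stop codons introduced at the breakpoint, which this phase check deliberately ignores.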

Discussion
In the era of precision oncology, the detection of fusion genes that often play driver roles in tumorigenesis [32] is critical to improve diagnosis and personalize treatment. Moreover, recent reports indicate that even similar clinical phenotypes might substantially differ at the molecular level, implying different drug targets; thus, understanding molecular features is crucial for precise cancer therapy [33]. Here, we have demonstrated the concept of semi-targeted RNA sequencing and identified a plethora of TMPRSS2 3′-terminal fusion partners in PCa. We have previously shown that the introduction of sequencing adapters via enzymatic incorporation of base-modified dideoxynucleotides improves NGS library preparation [30,34]. This work further expands the applicability of OTDDNs and suggests a semi-targeted sequencing technique for transcriptomic analyses.
Although long-read sequencing technologies can detect structural variations in DNA and splice isoforms in RNA [35], reliable and cost-effective approaches employing short-read sequencers are still of interest given their accuracy, wide adoption, and support by a broad range of data analysis tools. FTAS-seq developed in this work offers similar advantages to anchored PCR [19], including agnostic detection of fusion partners by targeting only one known gene. In addition, FTAS-seq provides a substantially simpler workflow and compatibility with highly degraded RNA that is frequently obtained from FFPE-derived biosamples.
The fusion of TMPRSS2 with ERG is by far the single most common fusion gene found in solid tumors [2,36]. Due to its prevalence and the availability of fusion-positive cell line models, the relevance of TMERG for the pathogenesis of PCa has been widely studied. Remarkably, these analyses revealed either positive, negative, or no association between the presence of TMERG mRNA and the clinical significance, progression, or aggressiveness of PCa [37]. Limitations of conventional gene fusion detection techniques may be at least in part responsible for these discrepancies and the ability to profile the whole spectrum of TMPRSS2 fusion partners may greatly facilitate the elucidation of their clinical significance.
High-throughput sequencing increased the number of known gene fusions to more than 30 000 [38], with approximately 10 000 fusion transcripts identified in normal tissues [11], which raises the question of whether newly detected fusions are clinically important or are random events [39]. The analysis of 2727 candidate fusion genes reported in PCa revealed that most genes (76%) fuse to a single partner, while genes that are likely oncogenic drivers fuse to multiple partners, e.g., the TMPRSS2 gene was associated with 23 partners. Interestingly, most fusion partners of TMPRSS2 (65%) were found to be located on different chromosomes [40]. Likewise, our study identified 11 new TMPRSS2 fusion partners, nine (82%) of which are from different chromosomes. Most of the identified fusions were detected in individual samples, while TMPRSS2-AMACR transcripts were detected in eight (15%) cases. Alpha-methylacyl-CoA racemase (AMACR) is an enzyme involved in bile acid biosynthesis and peroxisomal beta-oxidation of branched-chain fatty acids [41]. Previous studies reported the overexpression of AMACR at both protein and mRNA levels in cancerous prostatic tissues [42]. Notably, the chimeric TMPRSS2-AMACR transcript might be a result of trans-splicing between TMPRSS2 and AMACR pre-mRNAs, as both transcripts are abundant in PCa. To determine the origin of this transcript, additional analysis at the DNA level is needed.
In-frame fusion transcripts between two genes are translated into fusion proteins that may act as potent oncogenic drivers [43]. In this study, we identified in-frame fusions between TMPRSS2 and AMACR, PPP3CA, and TTC18. PPP3CA encodes the catalytic subunit A of the calcium-dependent protein phosphatase calcineurin. Previous studies indicated a protumorigenic role of calcineurin signaling in prostate and other types of cancers [44]. TTC18 encodes cilia- and flagella-associated protein 70, which is a regulator protein of the outer dynein arms strongly expressed in the human testis. Loss of TTC18 function was found to be responsible for multiple morphological abnormalities of the sperm flagella [45]. In our study, in-frame isoforms of TMPRSS2-TTC18 retain only a small fraction of TTC18 exons, which likely do not exhibit their biological function; thus, the rearrangement probably leads to the knockout of TTC18. Collectively, our findings open doors for the consideration and further exploration of additional molecular mechanisms of PCa progression.
We were not able to validate TMPRSS2 fusions with CASZ1, SIM2, and C1orf61 due to the insufficient amount of RNA; however, all these genes are known to play roles in pathologic conditions, including cancers. CASZ1 encodes a zinc finger transcription factor whose dysregulated expression was linked to the pathobiology of neuroblastoma [46], glioma [47], and hepatocellular carcinoma, and aberrant fusion transcripts of CASZ1 were reported in colorectal and bladder cancers [48]. To date, no reports link CASZ1 to PCa. SIM2 encodes proteins belonging to a family of transcriptional repressors, which are known to be involved in the pathogenesis of solid tumors, including PCa. SIM2 was previously found to be upregulated in PCa and proposed as a biomarker of aggressive disease [49,50]. C1orf61, or CROC-4, encodes a transcriptional activator of the c-fos proto-oncogene promoter [51]. C1orf61 was reported to act as a tumor activator and promote metastasis in human hepatocellular carcinoma [52]. This study indicates for the first time the potential role of CASZ1 and C1orf61 in PCa and reports the fusions of these genes with TMPRSS2.
There are a few alternative NGS-based techniques for fusion detection available in the market based either on PCR, semi-targeted PCR, or hybridization capture. The published benchmarking results indicate that all three approaches generate high-quality results. PCR-based methods show the lowest limit of detection, while semi-targeted PCR and hybridization capture techniques are superior for the discovery of uncommon or novel fusion partners [53]. Notably, long-read sequencing, currently offered by Oxford Nanopore Technologies [54] and Pacific Biosciences [55], can resolve multi-exon isoforms and accurately detect fusions as reads span the full length of transcripts, however currently these technologies are still more expensive than short-read sequencing. Moreover, there are few computational tools for the detection of structural variation in long-read data and these tools typically can characterize only a subset of structural variants [56]. In principle, FTAS-seq can be adapted for long-read library preparation by changing the adapter sequences and adjusting the fragment length by reducing the OTDDN to dNTP ratio at the second cDNA strand synthesis step (see Section 2). This might be useful to capture breakpoints that occur at a longer distance from the target site than can be reliably captured by our technique using short reads. The main technical characteristics of FTAS-seq and other technologies for fusion detection are summarized in Table S5. Notably, FTAS-seq provides the fastest sample preparation workflow given that OTDDN incorporation eliminates the need for cDNA fragmentation and adapter ligation steps. Technologically, FusionPlex and QIAseq RNAscan are the most similar products to the FTAS-seq developed in this study. In terms of performance, the FusionPlex assay is known to be susceptible to low-quality inputs while QIAseq RNAscan panels were shown to generate many false positive calls [57]. 
This indicates that although semi-targeted amplification is a very attractive approach for fusion analysis, currently available products are suboptimal.

Conclusion
The profiling of RNA fusions expands the therapeutic landscape of cancerous tumors. The technique developed in this work allows simple and cost-effective high-throughput profiling of 3′-terminal fusion partners of any cancer-related genes of interest. The developed technique was 100% specific and showed a sensitivity of 95% as validated by RT-qPCR. Further development of FTAS-seq might include the exploration of multiplexing options (with the possibility to design a panel) and tailoring the protocol for lower RNA input amounts. Profiling of 5′-terminal fusion partners is also possible through the intermediate double-stranded cDNA synthesis step. In general, this novel method holds great potential to improve clinically relevant gene fusion detection for higher applicability of modern cancer therapies targeting ALK, ROS1, RET, and other gene fusions [58]. Besides, clinically relevant isoforms might contribute to the molecular biomarker toolbox that can be further explored for noninvasive detection opportunities [59].

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article.
Fig. S1. The scheme of oligonucleotide-tethered dideoxynucleotide (OTDDN) synthesis.
Fig. S2. The flowchart of NGS data analysis to detect fusion transcripts.
Table S1. Clinical-pathological characteristics of the study cohort.
Table S2. Reverse primers used in end-point PCR to validate novel breakpoints.
Table S3. The variety of fusion transcripts detected in individual patient samples by FTAS-seq, RT-qPCR and Sanger sequencing.
Table S4. Fusion breakpoint sequences detected by FTAS-seq.
Table S5. The comparison of the main technical characteristics of the RNA fusion detection methods.