Transcriptome sequencing of archived lymphoma specimens is feasible and clinically relevant using exome capture technology

Formalin‐fixed, paraffin‐embedded (FFPE) specimens are an underutilized resource in medical research, particularly in the setting of transcriptome sequencing, as RNA from these samples is often degraded. We took advantage of an exome capture‐based RNA‐sequencing protocol to explore global gene expression in paired fresh–frozen (FF) and FFPE samples from 16 diffuse large B‐cell lymphoma (DLBCL) patients. While FFPE samples generated fewer mapped reads compared to their FF counterparts, these reads captured the same library complexity and had a similar number of genes expressed on average. Furthermore, gene expression demonstrated a high correlation when comparing housekeeping genes only or across the entire transcriptome (r = 0.99 for both comparisons). Differences in gene expression were primarily seen in lowly expressed genes and genes with small or large coding sequences. Using cell‐of‐origin classifiers and clinically relevant gene expression signatures for DLBCL, FF, and FFPE samples from the same biopsy paired nearly perfectly in clustering analysis. This was further confirmed in a validation cohort of 50 FFPE DLBCL samples. In summary, we found the biological differences between tumors to be far greater than artifacts created as a result of degraded RNA. We conclude that exome capture transcriptome sequencing data from archival samples can confidently be used for cell‐of‐origin classification of DLBCL samples.

Recently, a novel RNA capture approach was developed for profiling samples from degraded tissue using next-generation sequencing. 2 This method takes advantage of exon-targeting probes to enrich for coding RNA fragments and covers as much as 97% of the coding genome with high read mappability (94%-96%) from degraded RNA, 3 providing an attractive option for investigating FFPE samples.
Diffuse large B-cell lymphoma (DLBCL) is the most common lymphoid malignancy in adults and accounts for about 35% of all non-Hodgkin lymphomas. 4 The disease originates from mature B-cells and, based on gene expression profiling (GEP), is subclassified in to clinically relevant subtypes, that is, germinal center B-cell (GC), activated B-cell (ABC), and an unclassified subtype, with distinct outcomes. [5][6][7][8][9] In clinical settings, various alternative approaches have been developed that are primarily based on immunohistochemistry to subclassify DLBCL patients as defined by GEP. Among these, one of the most commonly employed algorithms, proposed by Hans et al., uses staining of CD10, BCL6, and MUM1 and subsequently subgroups the patients into GC or non-GC (NGC) subgroups. 10 More recently, the lymphoma/leukemia molecular profiling project (LLMPP) developed a digital 20 gene expression-based test, the Lymph2Cx assay using NanoString technology, which has been shown to assign robustly DLBCL subtypes in FFPE tissue. 11,12 Although a number of studies have investigated different RNAsequencing (RNA-seq) protocols for degraded and low-quantity samples, 3,[13][14][15][16][17][18][19][20] there is none that, to the best of our knowledge, has applied the coding RNA enrichment assay and investigated clinically relevant gene expression signatures to assess performance in paired FFPE/fresh-frozen (FF) routine samples. Therefore

| Bioinformatics
Raw sequencing reads were aligned using the nf-core/RNA-seq (1.0), a pipeline written in Nextflow. 22 Briefly, raw reads were adapter trimmed with the help of Trim Galore (0.5.0) 23 using standard parameters and aligned to the reference genome (hg19) using STAR (2.6.1). 24 Duplicate reads were estimated with Picard's MarkDuplicates (2.18.14) 25 and Dupradar (1.8.0). 26 Quality of reads was determined using FastQC (0.11.7) 27 and RSeQC (2.6.4) 28 and quality metrics were summarized with the help of MultiQC (1.6). 29 Counts were retrieved with FeatureCounts 30 and counts for protein-coding genes were normalized with DESeq. 31 Counts were transformed using variance stabilizing transformation 32 or regularized logarithmic transformation 31 for visual representation. Transcript lengths were retrieved from the Ensembl database. 33 Downstream statistical analysis was performed, and all images were generated, in R (3.4.4).

| Assessing cell-of-origin using gene expression signatures and the NanoString assay
We used three different published signatures to assess our dataset.  Table 3S) was employed for the same purpose. 35 In addition to these signatures, the Lymph2Cx assay and Research Use Only version of the NanoString Lymphoma Subtyping Test (NanoString Technologies, Seattle, WA, USA) was utilized to classify COO subtype for all FFPE samples according to manufacturer's protocol. 11,36 Briefly, gene expression data from five housekeeping genes were initially analyzed to assess input RNA quality, while gene expression results from the gene panel included in the Lymphoma Subtyping Test were applied to create a linear predictor score to classify cases as ABC, GC, or unclassified subtypes (Supporting Information Table 1S).

| RNA isolation and integrity
To evaluate the performance of the Illumina TruSeq RNA Exome protocol in degraded samples, we performed RNA-seq on matched FF and FFPE samples from the same biopsy in 16 DLBCL cases. Sample integrity was assessed prior to library preparation using the DV 200 metric. We confirmed that FFPE samples displayed significantly lower RNA integrity compared to their FF counterparts (median DV 200 value of 39% vs. 83%, p < 0.001, Figure 1A and Table 4S). Furthermore, we found no correlation between storage time and RNA integrity for the FFPE samples ( Figure 1B). For two cases, the FFPE samples did not meet the recommended RNA quality measurements for the RNA Exome protocol (DV 200 ≥ 30%); these cases had instead DV 200 values of 21% (sample FFPE_15) and 27% (sample FFPE_4). Post alignment quality control assessment did, however, indicate sufficient quality for further downstream analysis and these samples were included in the study for comparative purposes and are highlighted in the analyses and further discussed below. All samples for the validation cohort displayed a DV 200 value > 30%.  Figure 2A and Table 4S). We observed differences in proportion of all mapped reads (97.4% ± 0.5 vs. 96.7% ± 0.6, average ± SD, p < 0.01) and in reads mapping to coding exon sequences (96.6% ± 0.7 vs. 94.7% ± 0.8, average ± SD, p < 0.001) when comparing FF to FFPE tissue, respectively (Figures 2B, C), values highly similar to previously published data. 3 All samples displayed high base sequence quality indicated by the Phred quality score with no difference between FF and FFPE samples ( Figure 1S).  Figure 2D). Although the two aforementioned samples with RNA integrity values lower than recommended (marked in red in Figure 2) displayed less than the average percent mapped reads as well as percent reads mapping to coding sequences (Figures 2B, C), they were not found to deviate from other FFPE samples when calculating the proportion of uniquely mapped reads ( Figure 2D) and were thus included for further analyses. The sequencing read metrics are summarized in Table 3S. Finally, in order to investigate potential batch effects among the different batches of library preparations and sequencing runs, we evaluated all samples using principal component analysis and noted no difference in global expression from the different sequencing runs ( Figure 2S).

| Transcript coverage and expression
We compared the variation in gene body coverage in FF versus FFPE tissue measured as number of reads for each coding base along the length of each gene in 5 0 -3 0 direction. We detected a highly similar  Figure 3D).
In order to get a view of gene size distribution in our sequencing libraries, we plotted the gene density of all expressed genes as a function of gene length. Among these genes, 95% displayed a length of 750-4250 coding bases with the most frequent gene length found around 1600 bases ( Figure 4A). We then plotted mean normalized expression of uniquely mapped coding transcripts for both FF and FFPE samples as a function of gene length. Here, genes within the mentioned range had uniformly high expression and displayed the low relative differences in expression when comparing tissue types and low variance within each tissue type ( Figure 4A). In contrast, we observed notable differences for small and large genes in gene expression when comparing FF and FFPE samples outside the mentioned size range. Based on these findings, we excluded low expressing genes defined as having fewer than 10 mapped reads on average for the dataset or genes having extremely small or large size as described above and found a notable reduction in variability in the number of genes expressed when comparing FF (11.7 K, range 11.0-12.5 K) and   (Table 1).

| Evaluation of COO and gene signatures
Next, we applied hierarchical clustering and previously published gene signatures that have been shown to classify DLBCL samples into biologically and clinically relevant subgroups to investigate similarity in expression within paired samples and across patients. 8, 35 We first employed a gene set published by Barrans et al. based on data from the LLMPP. 8,11 Here, all FFPE samples paired together with their FF counterparts from the same biopsy ( Figure 6A). We repeated the same exercise, this time using the B-cell-associated gene signature defined by Dybkaer et al. 35 (Figure 6B). Similar to the Barrans et al.

| NanoString assay
All samples were analyzed using the Lymph2Cx assay to determine COO. One sample (FFPE_15, DV200 value of 21%) returned a borderline quality score but was included in the analysis. Altogether, data from the NanoString assay correlated very strongly with the findings from RNA-seq analysis using the Reddy et al. algorithm for COO classification 34 (Table 1), with the only exception being Sample 3 which was classified as a GC subtype using RNA-seq data, while the NanoString assay identified the same sample as belonging to the unclassified subtype.  (Table 5S). This was followed by the comparison between RNA-seq and the NanoString assay, which showed 44/50 (88%) concordance, while the Hans and NanoString algorithms displayed 41/50 (82%) concordance (Table 5S).

| DISCUSSION
The primary objective of this study was to assess the performance of the Illumina Truseq RNA Exome protocol for the analysis of global gene expression from degraded RNA in archival samples. To this end, we performed RNA-seq on paired FF and FFPE samples from the same biopsy in 16 DLBCL samples. As expected, RNA extracted from FFPE tissue displayed significantly lower sample integrity when compared to RNA isolated from FF tissue. However, there was no correlation between storage time and FFPE sample RNA integrity. The latter observation is also in line with previous findings that for FFPE samples stored at room temperature, there is a significant and rapid decline in RNA quality within the first 6 months after which time the samples become stable. 38 The median time to sample isolation for our FFPE cohort in this study was 6 years (ranging up to 11 years) implying that sample age is not necessarily a limiting factor when choosing cases for analysis.
Mapped read metrics for the paired samples revealed that FF samples had longer average read lengths after adapter trimming and more frequently mapped to coding sequences. In addition, for several samples, sequencing data from FF tissue generated significantly higher number of total reads (Table 4S). Having said that, the FFPE tissue samples averaged 27.5 million mapped reads with 94.7% mappability to coding sequence and showed high base quality. Given that RNAseq experiments using intact RNA to study global gene expression often make use of ribosomal RNA depletion protocols with far lower T A B L E 1 Cell-of-origin subtype classification of the formalin-fixed, paraffin-embedded (FFPE) diffuse large B-cell lymphoma (DLBCL) samples using the Hans, 10 Reddy et al. 34 (RNA-seq classification) and NanoString algorithms 11 mappability to the coding exome (commonly around 60% of reads) and a typical targeted sequencing depth of around 30-40 million reads, the sequencing depth obtained in our experiments with FFPE RNA appears at the very least to be on a comparable level. Additionally, in our experiments, among the reads that did map to coding sequences, no difference was detected in the proportion of uniquely mapped reads when comparing the two tissue types, suggesting that library complexity is not diminished in FFPE tissue. This was also true for the two samples having RNA integrity values below the recommended threshold for the protocol (DV 200 > 30%). We next explored the read distribution along the length of all expressed genes in our dataset and found a striking similarity in the two tissue types.
We also noted that mapped reads were predominantly distributed in the second and third quartiles when compared to the 5 0 or 3 0 regions ( Figure 3A) as previously reported. 3,15,18 We detected on average a similar number of expressed genes for both tissue types  Figure 3D and Table 4S).
In order to explore further differences in gene expression between the tissue types, we first investigated the frequency at which expressed genes of various lengths are detected. In our dataset, 95% of all expressed genes displayed a length of 750-4250 coding bases.
Within this range, we noted only marginal inter-and intra-tissue variability in gene expression while genes on either side of this size interval instead showed significant differences both within and across tissue types. It could be argued that genes with extremely large coding sequences might be more prone to, and that small genes with relatively low expression are more sensitive to, degradation, in particular in FFPE tissue. Interestingly, in both these instances, we noted a higher expression in FFPE samples compared to FF counterparts, which points towards technical artifacts as the main reason for the variation seen in the dataset. However, to explore fully the underlying cause of the observed variation for these genes would be beyond the scope of this study. Regardless and based on our data, it is difficult to estimate accurately true expression for genes that are either very small or large in their coding sequence and these genes should therefore be either removed from subsequent analysis or at the very least be interpreted with caution. Nonetheless and despite these apparent differences between tissue types, we found in unfiltered data a nearperfect correlation in expression levels when comparing housekeeping genes alone or investigating all expressed transcripts indicating that global similarities in expression greatly outweigh the tissue-specific differences.
Ultimately, the best measure of FFPE RNA-seq performance is how well the data generated can identify known biological differences in samples in comparison to data obtained from FF tissue. This is one area in which previous research evaluating RNA-seq in FFPE tissue is lacking. 3,[13][14][15][16][17][18][19][20] In the present study, we first applied the classifier developed by Reddy et al. 34 Figure 6A). Most notably, all FFPE samples paired with their FF counterparts underscoring that the biological differences between samples appear to be far greater than artifacts created as a result of RNA degradation ( Figure 6A). We also used our dataset to test a refined DLBCL classification system, recently proposed by Dybkaer et al. 35 which is based on subset-specific BAGS for naive, centrocyte, centroblast, memory, and plasmablast B cells. The authors found that BAGS assignment was significantly associated with overall survival and progression-free survival within the GC subclass, where the centrocyte subtype had a superior prognosis compared with the centroblast subtype. 35 Here, samples within four pairs classified differently when comparing FF to FFPE tissue. However, in clustering analysis, three of the four FFPE samples paired correctly with their corresponding FF counterpart, while the final sample was found in the nearest branch. All FFPE cases displayed a highly similar gene expression pattern compared to their FF counterparts and, although classified differently, did not cluster with other samples of the same BAGS classification. This is an indication that the paired samples are still biologically highly similar, and that the classification mismatch is more likely due to the algorithm which in our study only contains 180 genes from the original list of 223 genes ( Figure 6B). Finally, we also investigated a validation cohort of 50 FFPE DLBCL samples where we compared COO classification using the Hans, RNA-seq, and Nanostring algorithms. Overall, the three different methods were highly concordant again confirming the feasibility of using capturebased RNA-sequencing for gene expression analysis.
In conclusion, our data suggest that archived routine FFPE samples can reliably be used for differential expression analysis using the Illumina Truseq RNA Exome protocol and RNA-seq. In addition, we recommend gene expression data to be filtered for both lowly expressed genes as well as those having extremely low or high number of coding bases as it may be difficult to accurately estimate true expression for these genes. Having said that, these recommendations do not solely apply for RNA-seq data generated from FFPE tissue but are also true for FF samples.

ACKNOWLEDGMENTS
The computations were performed on resources provided by Swed-

DATA AVAILABILITY STATEMENT
RNA-seq data is available upon request.