Deep sequencing of liver explant transcriptomes reveals extensive expression from integrated hepatitis B virus DNA

Hepatitis B virus (HBV) is a major cause of hepatocellular carcinoma (HCC). Integration of HBV DNA into the human genome may contribute to oncogenesis and to the production of the hepatitis B surface antigen (HBsAg). Whether integrations contribute to HBsAg levels in the blood is poorly known. Here, we characterize the HBV RNA profile of HBV integrations in liver tissue in patients with chronic HBV infection, with or without concurrent hepatitis D infection, by transcriptome deep sequencing. Transcriptomes were determined in liver tissue by deep sequencing providing 200 million reads per sample. Integration points were identified using a bioinformatic pipeline. Explanted liver tissue from five patients with end‐stage liver disease caused by HBV or HBV/HDV was studied along with publicly available transcriptomes from 21 patients. Almost all HBV RNA profiles were devoid of reads in the core and the 3′ redundancy (nt 1830‐1927) regions, and contained a large number of chimeric viral/human reads. Hence, HBV transcripts from integrated HBV DNA rather than from covalently closed circular HBV DNA (cccDNA) predominated in late‐stage HBV infection, in particular in cases with hepatitis D virus co‐infection. The findings support the suggestion that integrated HBV DNA can be a significant source of HBsAg in humans.


| BACKGROUND
Chronic infection with hepatitis B virus (HBV) is a major cause of hepatocellular carcinoma (HCC) 1 due to inflammation, formation of oncogenic viral proteins and integration of HBV DNA into the human genome. 2,3 The potential role of HBV DNA integrations for the development of HCC has been addressed in many studies. [4][5][6][7] Integrations are randomly distributed in all human chromosomes, 8 but some locations are reportedly more frequent in cancer tissue. 9,10 HBsAg expression from integrations has been proposed to be of importance for maintaining high levels of the hepatitis B surface antigen (HBsAg) in the blood. 13 This possibility is supported by the absence of HBsAg decline during prolonged antiviral therapy, by remaining HBsAg levels when HBV DNA is suppressed to low levels during the natural course of infection [14][15][16][17] and by high levels of hepatitis delta virus (HDV) with HBsAg in its envelope, despite low HBV replication. 18,19 If integrated HBV DNA were expressed, the transcripts would differ from HBV RNA derived from covalently closed circular DNA (cccDNA), which is the template for HBV replication. Transcripts from cccDNA are typically ≈3300 nt long (excluding the poly-A tail) and contain a 3′ RNA 'redundancy' with the 3′ end at nucleotide (nt) 1927.
By contrast, integration-derived RNA lacks the 5′ part that encodes the core antigen, because there is no upstream promoter to initiate this transcription, 11 and ends at or slightly upstream of nt 1830. 4 For the present study, we utilized these predicted differences to assess the degree of expression from HBV integrations. For this purpose, we used transcriptome deep sequencing, 20 but with greater depth and longer read length than in standard RNA assays. The RNA purification comprised a ribosomal RNA depletion step, and cDNA synthesis was performed with hexamer primers, a strategy intended to increase the likelihood of detecting reads that contain junctions between viral and human RNA as compared with poly-A enrichment. In addition, a sensitive bioinformatics pipeline was developed and applied both to transcriptomes obtained from explanted liver tissues and to publicly available RNA data from patients with HBV-related HCC (from the International Cancer Genome Consortium, ICGC). The results imply that integrations are frequently expressed in HBV-infected human liver.

| Patients and liver tissue samples
Liver explants from five patients who underwent liver transplantation for HBV-related chronic liver disease were investigated, including two with HDV co-infection. Patients were selected to represent end-stage disease of chronic HBV infection, including cases with HDV co-infection, and included all liver transplanted HBV patients at our centre. All had liver cirrhosis and two had HCC.
Liver tissue was obtained directly after surgery and without delay split into multiple pieces that were stored in 1.5 mL tubes at −70°C until analysed. For the analyses in the present study, we used slices from the frozen tissue pieces that were approximately 5 µm thick with an area of 1 cm².

| Quantification of HBV DNA, HBsAg and HDV RNA in serum and liver tissue
HBV DNA and HBsAg levels in serum were quantified by Cobas TaqMan or Cobas 6800 (Roche Diagnostics) and by the Architect assay (Abbott), respectively. HDV RNA levels in serum were quantified by real-time PCR using primers HDV_F, GGATGCCCAGGTCGGAC and HDV_R, CCTCTTCGGGTCGGCAT, an MGB (minor groove binding) probe with a FAM fluorophore, ATCTCCACCTCCYCG, and a serial dilution of a plasmid carrying the target region as quantification standard.

| RNA extraction and library preparation
The liver tissue was homogenized as previously described. 21 The RNA was extracted using the RNeasy Mini Kit from Qiagen, and was, after confirmation of sufficient RNA integrity by TapeStation (Agilent Technologies Inc) analysis, processed by Eurofins/GATC for RNA-seq. Before library preparation of RNA, rRNA depletion was performed in order to enrich mRNA and other non-rRNA species.
For one library from patient 1, mRNA was also enriched for poly-A transcripts for comparison with rRNA depletion.
Prior to strand specific paired-ends library preparation (TruSeq Stranded Total RNA Library Prep Kit, Illumina Inc), the extracted RNA was fragmented using sonication into approximately 350 nt long fragments and converted to cDNA using random primers. No further size selection was made prior to sequencing. All RNA extracted, and all products from the library preparation were sequenced in all samples. Samples from patients 1, 2 and 4 yielded four, two and two libraries, respectively, and the sequence reads from these libraries were later combined in bioinformatics analysis.
Samples from patients 3 and 5 resulted in one library each. Transcriptome data were analysed using CLC Genomics Workbench (Qiagen) to (i) determine HBV read coverage and human gene expression, (ii) identify viral/human junction points and (iii) generate graphic profiles for the HBV RNA distributions. A customized bioinformatics pipeline for detection of all HBV reads and HBV/human fusion reads was developed and applied. After trimming and quality analysis of the reads, reads that mapped to an HBV reference genome were identified using the Burrows-Wheeler Aligner. 22 Reads only partly mapping to HBV were detected using a soft-clip script and were aligned to the human reference genome (hg19) using BLAT. 23 In addition, paired-end reads with one read mapping to HBV in its entirety and a paired mate not mapping to HBV were extracted and also mapped to hg19 using BLAT. HBV reads with the same junction points and pair mates less than 400 base pairs (bp) from each other in the human genome were considered to represent the same unique integration. RNA splicing of HBV reads was detected using a STAR-based script, 24 and by manual inspection of read mappings.
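The two pipeline steps that are easiest to misread, detection of partly mapping (soft-clipped) reads and grouping of junctions into unique integrations, can be illustrated with a minimal Python sketch. This is not the authors' script: the function names, the 20-nt minimum clip length and the single-linkage chaining of human positions are assumptions made for illustration; only the 400 bp threshold is taken from the text above.

```python
import re
from collections import defaultdict

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar, min_clip=20):
    """Return (left, right) soft-clip lengths of an alignment to HBV.

    Clips shorter than min_clip (an assumed cut-off) are reported as 0,
    since very short clipped parts cannot be placed in the human genome.
    """
    # Hard clips (H) carry no sequence, so they are ignored here.
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar) if op != "H"]
    if not ops:
        return (0, 0)
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return (left if left >= min_clip else 0, right if right >= min_clip else 0)


def group_integrations(junctions, max_dist=400):
    """Group junction observations into putative unique integrations.

    junctions: iterable of (hbv_pos, chrom, human_pos) tuples.
    Observations sharing the HBV junction position and chromosome are
    chained together while consecutive human positions are less than
    max_dist bp apart (single-linkage chaining is an assumption).
    """
    by_key = defaultdict(list)
    for hbv_pos, chrom, human_pos in junctions:
        by_key[(hbv_pos, chrom)].append(human_pos)

    groups = []
    for (hbv_pos, chrom), positions in sorted(by_key.items()):
        positions.sort()
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] < max_dist:
                current.append(pos)
            else:
                groups.append((hbv_pos, chrom, current))
                current = [pos]
        groups.append((hbv_pos, chrom, current))
    return groups
```

In such a sketch, a read with CIGAR `60M40S` would be flagged as having a 40-nt clipped 3′ part that is a candidate for alignment to hg19, whereas a fully matching read (`100M`) would not.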

| Assessment of the proportion of putative integration-derived RNA
This estimation was based on the assumptions that all HBV RNA contains the X region and that integration-derived RNA does not contain the core region. It also assumes that precore RNA is much rarer than core RNA and that preS1 and X RNA are much rarer than preS2 RNA. Thus, the proportion of RNA that was integration-derived was calculated as follows: (average coverage of reads in X − average coverage of reads in core) / average coverage of reads in X.
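As a worked illustration of this calculation, the following minimal Python sketch computes the proportion from two average coverage values. The function name and the clamping of the result to the interval 0 to 1 are assumptions added here; the formula itself is the one stated above.

```python
def integration_derived_fraction(cov_x, cov_core):
    """Estimate the fraction of HBV RNA that is integration-derived.

    cov_x:    average read coverage in the X region
    cov_core: average read coverage in the core region
    Implements (cov_x - cov_core) / cov_x. Clamping to [0, 1] is an
    assumption made here to handle samples where core coverage
    exceeds X coverage because of sampling noise.
    """
    if cov_x <= 0:
        raise ValueError("X-region coverage must be positive")
    fraction = (cov_x - cov_core) / cov_x
    return min(1.0, max(0.0, fraction))


# Hypothetical coverage values: ten times greater coverage in X than in core.
print(integration_derived_fraction(1000.0, 100.0))
```

With ten times greater coverage in X than in core, as in this hypothetical example, the estimate is 90%.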

| Analysis of publicly available data in the ICGC database
To expand the assessment of HBV transcriptome profiles, we analysed sequences retrieved from the LIRI-JP collection in the International Cancer Genome Consortium (ICGC) database (https://dcc.icgc.org/projects/LIRI-JP), which contains RNA data from Illumina sequencing of liver tissue samples from patients with HCC.
We retrieved RNA data from 21 patients that according to available metadata had chronic HBV infection (with tumour and non-tumour samples taken at time for resection or explantation) of which 37 samples were HBV RNA positive. The RNA sequences were analysed with the same bioinformatics pipeline as those from the liver explant patients.

| Ethics
The study was approved by the Regional Ethical Review Board in Gothenburg (registration number 835-17), and the patients gave informed oral and written consent to participate. This study is con-

| RESULTS
The two sets of HBV RNA sequence data-from explant tissue and from the ICGC database-were processed applying the same bioinformatics strategy that extracted HBV reads to obtain RNA profiles and to identify viral-human fusions.

| HBV RNA profiles
The HBV transcriptome profiles are shown in Figure 1 (explant liver tissue) and Figure 2 (ICGC database sequences). Patient 1 had relatively high HBV DNA levels in serum (5.55 log IU/mL) and a transcriptome profile with moderate coverage of the core region, and ten times greater coverage in the S and X regions, suggesting that more than 90% of the S and X RNA derived from integrations (Table 1).
Patient 2 had a low HBV DNA level in serum (2.09 log IU/mL) as a result of tenofovir treatment for four months. The total number of HBV reads was low (2108 reads) in non-tumour tissue, with coverage in core that was similar to that in S and X regions, indicating that RNA originated from cccDNA. By contrast, tumour tissue showed a high HBV coverage (316 033 reads) and a profile with a 1000-fold greater depth in S and X compared with the core region, indicating absence of HBV replication in the tumour, but abundant transcription of S and X RNA, most likely from integrated HBV DNA. In theory, all RNA species from cccDNA should end at a common polyadenylation site at nt 1927. By contrast, RNA from integrated HBV DNA (preS1, preS2 or X mRNA) should extend no further than nt 1830, but could be shorter if the genome has been truncated during integration. Shorter transcripts can also be generated from both cccDNA and integrations if an upstream polyadenylation signal is used. 25,26 As shown in Figure 1, transcripts that extended beyond nt 1830, indicating that cccDNA was the source, were found in patient 1 (the only patient with high HBV DNA in serum) and in non-tumour tissue from patient 2, that is, only in samples that also contained significant amounts of core RNA.
FIGURE 2 (A-E) A shows a merged profile based on all HBV RNA reads in the 21 patients of the ICGC data set mapped to the HBV reference genome. B-E show four RNA profiles from two ICGC cases (tumour and non-tumour). The profiles in blue show the HBV RNA read coverage (max coverage left of the graphs). The bars above each profile show HBV genomic positions of each integration point, and the height of the bar represents the number of HBV/human fusion reads. The graphs have different Y-axis scales.
The profiles of the HBV transcriptome in the ICGC data set were similar to those observed in explant tissue from patients.

| Fusion reads
A more direct way to demonstrate expression of integrated HBV DNA is to identify fusion reads consisting of both viral and host RNA. As shown in Figure 1, fusion reads were observed in all patients, but not all samples. More than 99% of fusion reads were composed of a 5′ HBV part and a 3′ human part. The number of fusion reads differed markedly between the samples (details in Table S1), but the proportion of HBV reads that were fusion-derived was similar (range 0%-7%). The number of unique HBV/human fusion reads (with ≥2 reads coverage) ranged between 0 and 37. The tumour tissue in the sample with HBV-induced HCC (patient 2) contained a large number of fusion reads with the same junction point.
Overall, fusion reads were very frequent in the ICGC data set (range 0%-18% of all HBV reads), and almost all had an HBV 5′ part ending in the region 1750-1830 followed by a human sequence. All ICGC sample integrations are presented in Table S2. In addition to the integrations previously reported in the ICGC data set, 7 our analysis detected many new unique HBV integrations and the total number of fusion reads was also higher.

| Expression of human genes adjacent to HBV integrations
To explore the potential impact of HBV integration on the expression of human genes, we compared the human RNA data from our samples with published mRNA data for liver tissue from healthy individuals devoid of HBV infection. 27 Most integrations were found in introns or intergenic areas and had no significant impact on the human gene expression; 12% were found in exons (details in Table S1).

| Reads representing HBV splicing or recombination
Spliced HBV RNA forms were found in six out of seven explant samples but represented less than 1% of the total HBV RNA. The most common spliced RNA showed ends joining between nt 2067 and nt 489, and was found in >2000 reads in the tumour sample from patient 2. The second most common was nt 2985-489, found in 55 reads in patient 1. In the ICGC data set, HBV splicing was not abundant and did not seem to affect the transcriptome profiles.
Out of the >1 500 000 HBV reads in the ICGC data, only 306 reads represented splicing at previously reported donor/acceptor HBV splice sites, and the most common was nt 458-489 joining (28 reads in total). RNA reads suggesting recombination or deletions of (presumably integrated) HBV DNA rather than splicing were present in several samples both in the liver explants and in the ICGC data set, and the latter had a total of 47 178 reads indicating such events, representing 3% of all HBV reads.

| Hepatitis D virus RNA
Analysis of HDV RNA reads in liver tissue was performed on the

| DISCUSSION
Three findings in this study argue that a large proportion of the HBV RNA in liver tissue, and indirectly much of the HBsAg in serum, likely originates from integrated HBV DNA. First, most samples had a transcriptome profile with a much lower number of reads in the core region than in the S regions, indicative of a linear template that lacks a promoter upstream of the core gene. 29 Second, reads representing the 3′ 'redundancy' between nt 1830 and 1927 were rare, in agreement with the expected absence of this part in RNA from integrated HBV DNA. Third, we detected a large number of fusion reads, that is, RNA with a 5′ viral part followed by a 3′ human part, almost all located near or upstream of nt 1830.
By comparing the number of reads mapping to the core and to the S region, we estimated that in most cases more than 90% of the HBV RNA was derived from integrations. An alternative calculation, based on the number of fusion reads, estimated that 10%-70% of all HBV RNA was derived from integrations. The latter estimate was obtained from the finding that fusion reads constituted 1%-7% of all the HBV reads and the assumption (based on read length and genome size) that fusion reads should represent 10% of all HBV RNA from integrations. The lower proportion of integration-derived RNA indicated by the frequency of fusion reads might to some extent be explained by polyadenylation of some of the integration-derived RNA at an upstream human poly-A signal, because such transcripts would not be identified as fusion reads originating from integrations. Another possible explanation is that either the HBV or the human part of some fusion reads was too short to be mapped with BLAT to the corresponding reference genome, so that these reads were not counted as representing an HBV integration. Despite these differences, both analyses of RNA read counts indicate that expression of integrated DNA was significant and likely accounted for most of the HBV RNA.
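The arithmetic behind the fusion-based estimate can be made explicit with a short Python sketch. The function name is hypothetical; the 10% expected share of fusion reads among integration-derived reads and the 1%-7% observed range are the values given in the text.

```python
def integration_fraction_from_fusions(fusion_fraction, expected_fusion_share=0.10):
    """Estimate the fraction of HBV RNA that is integration-derived.

    fusion_fraction:       observed fusion reads / all HBV reads
    expected_fusion_share: assumed share of reads from integration-derived
                           RNA that span the viral/human junction (10% in
                           the text, based on read length and genome size)
    """
    if not 0 < expected_fusion_share <= 1:
        raise ValueError("expected_fusion_share must be in (0, 1]")
    # Cap at 1.0: no more than all HBV RNA can be integration-derived.
    return min(1.0, fusion_fraction / expected_fusion_share)


# Observed range of fusion-read fractions reported above: 1% to 7%.
for observed in (0.01, 0.07):
    estimate = integration_fraction_from_fusions(observed)
    print(f"{observed:.0%} fusion reads -> about {estimate:.0%} integration-derived")
```

Dividing the observed fusion fraction by the expected share reproduces the 10%-70% range; the estimate scales inversely with the assumed share, so it is sensitive to that assumption.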
In a previous study in chimpanzees treated with an RNAi drug, similar HBV RNA profiles with few reads mapping in the core region were observed in three HBeAg-negative animals, and in one of these animals host-viral fusions were shown to be frequent by using single-molecule real-time sequencing. 12 Our observations corroborate these results in humans and in a larger number of individuals. A recent study analysed liver tissue (non-tumour and tumour) from five patients, mainly by targeted amplification of different HBV transcripts followed by traditional sequencing, and found that HBV RNA from cccDNA was overall in the minority or lacking. 19 RNA-seq was applied to four cases in that study, but, as in previous investigations of the ICGC data set, HBV transcriptome profiles were not presented.
In contrast, our study provides well-supported RNA profiles cover- were also on antiviral treatment and had no detectable HBV DNA in serum, but had relatively high levels of HDV RNA and HBsAg. In these cases, the transcriptome profiles almost completely lacked core region reads, suggesting that essentially all of the HBV RNA was integration-derived. The findings in patients 4 and 5, with high HDV RNA levels in serum and many HDV RNA reads in liver tissue in the absence of HBV core reads, suggest that HDV might replicate in hepatocytes that lack cccDNA and HBV replication, solely relying on HBsAg from HBV DNA integrations. This has to our knowledge not been observed in any previous clinical study, but is supported by in vitro data, 32 and experimental infection of humanized mice with HDV. 33 The potential oncogenic effect of HBV integrations was not the focus of this study. Notably, however, the tumour tissue from patient 2 contained 34 unique HBV integrations, of which a few predominated, probably as the result of mono/oligoclonal expansion. 34 Also, the HBV integration with the second highest coverage was found in TERT, an oncogene previously associated with HBV-induced HCC. 35,36 The expression of TERT was 1000-fold that in healthy liver tissue. Interestingly, the tumour in patient 5 showed essentially no HBV reads, suggesting that in this case oncogenesis occurred independently of HBV integration, probably mainly as a result of HDV-induced inflammation.
Previous studies have described the presence of spliced HBV RNA in liver tissue and serum, 37 and some have reported associations between spliced forms and clinical stage 38,39 or interferon treatment. 40 Our pipeline searched for reads containing a splicing point or read pairs indicative of splicing using known splice donor and acceptor sequences, including previously described HBV splicing sites. Overall, we observed a low number of reads representing spliced HBV RNA forms (<1% in the explants, <0.1% in the ICGC data set). Although this seems to indicate that spliced HBV transcripts are rare, they might still have significant function and effects. Notably, the greatest number of spliced HBV RNA reads was found in tumour tissue, in accordance with the more frequent finding of spliced variants in serum samples from patients with HCC. 38 We also specifically searched for a spliced chimeric RNA derived from an HBV integration in the human CCNA2 gene, which has been suggested to have an oncogenic role. 41 The spliced transcript in the CCNA2 gene was not detected in any of our samples, but non-spliced RNA from an integration in CCNA2 was detected in one tumour sample in the ICGC data set. Several samples contained reads with rearranged HBV RNA sequences that were not at previously reported splicing sites, but likely represent recombined integrations as previously observed. 21 The analysis of data from tumour and/or non-tumour samples from the 21 patients in the ICGC database showed transcriptome profiles that were very similar to those obtained in the liver explants, with few reads in the core region or beyond the typical fusion site near nt 1830. 4 In most of the ICGC cases, the HBV/human fusions were located at positions between nt 1750 and 1830 in the HBV genome. However, in one non-tumour case the predominant fusion point was at position 454, and this was strikingly reflected in the HBV transcriptome profile, as shown in Figure 2.
In summary, these results support that integrated HBV DNA could be an important source of HBsAg in patients with late-stage chronic HBV infection, and possibly even more so in patients with HDV infection. 12,13 Since HBsAg negativity is considered a requisite for cure, future HBV treatment may have to target HBsAg production from cccDNA as well as from HBV integrations. 30 Further studies on the abundance, histological distribution and clinical importance of HBV integrations are warranted in all phases of infection.

ACKNOWLEDGEMENTS
The authors would like to thank the clinical contributors and the data producers of the International Cancer Genome Consortium who have provided data for the LIRI-JP data set. 28 ICGC data were used in accordance with ICGC guidelines (https://icgc.org/icgc/goals-structure-policies-guidelines).

CONFLICT OF INTEREST
None of the authors has any conflict of interest.