Insertional oncogenesis by HPV70 revealed by multiple genomic analyses in a clinically HPV‐negative cervical cancer

Abstract Cervical carcinogenesis, the second leading cause of cancer death in women worldwide, is caused by multiple types of human papillomaviruses (HPVs). To investigate a possible role for HPV in a cervical carcinoma that was HPV‐negative by PCR testing, we performed HPV DNA hybridization capture plus massively parallel sequencing. This detected a subgenomic, URR‐E6‐E7‐E1 segment of HPV70 DNA, a type not generally associated with cervical cancer, inserted in an intron of the B‐cell lymphoma/leukemia 11B (BCL11B) gene in the human genome. Long range DNA sequencing confirmed the virus and flanking BCL11B DNA structures including both insertion junctions. Global transcriptomic analysis detected multiple, alternatively spliced, HPV70‐BCL11B, fusion transcripts with fused open reading frames. The insertion and fusion transcripts were present in an intraepithelial precursor phase of tumorigenesis. These results suggest oncogenicity of HPV70, identify novel BCL11B variants with potential oncogenic implications, and underscore the advantages of thorough genomic analyses to elucidate insights into HPV‐associated tumorigenesis.


| INTRODUCTION
Infections with human papillomavirus (HPV) types that are classified as high-risk by epidemiological criteria (hrHPVs) underlie the vast majority of invasive cervical carcinomas (ICCs). 1 ICCs are responsible for 4.5% of cancer deaths in women worldwide and comprise a substantial fraction of the estimated 12% of cancers worldwide caused by various viruses. 2 Multiple large studies showed hrHPV types to be present in at least 90% of ICCs using established, PCR-or hybridizationbased methods for viral DNA detection. 3,4 HPV types categorized as group 1 known carcinogens by the International Agency for Research on Cancer working group are phylogenetically classified within the high-risk Alphapapillomavirus-7 and Alphapapillomavirus-9 clades of the Alphapapillomavirus genus and include the HPV types HPV16, HPV18, HPV31, HPV33, HPV35, HPV39, HPV45, HPV51, HPV52, HPV56, HPV58, and HPV59, with HPV16 and HPV18 being the most frequently detected types in ICC. 5 Despite also belonging to the species Alphapapillomavirus 7 and Alphapapillomavirus 9, other less frequent HPV types also belonging to these clades (HPV26, HPV30, HPV34, HPV53, HPV66, HPV67, HPV69, HPV70, HPV73, HPV82, and HPV85), do not fulfill the rigorous epidemiological criteria required to be categorized as group 1 carcinogens. 5 At least 40 HPV types are known to infect the urogenital tract. 1,6,7 Up to~10% of invasive cervical cancers are currently considered HPV-negative. 3,4 This might be because of tissue sampling errors, the portion of the HPV genome targeted by the assay was deleted, or the tumor is associated with a low prevalence HPV type. 8 An additional formal possibility is that some small fraction has a non-HPV etiology.
Population screening for women for cervical cancer historically has been accomplished by cytology (Papanicolaou test), but multiple clinical trials have provided evidence of improved sensitivity of hrHPV testing for detection of cervical cancer precursor lesions compared to cytology alone over multiple rounds of screening and follow-up. [9][10][11][12][13] The HPV types included in the screening assays are based on epidemiological prevalence data that balance sensitivity and specificity so as not to overtriage women to secondary screening such as colposcopy, and they currently include at most the 14 hrHPV types. Exclusion of less prevalent types from HPV screening reduces possible iatrogenic morbidity from subsequent colposcopy and excisional procedures following detection of viral types of uncertain pathological significance, and also reduces the costs involved in more extensive testing. 14 Unlike population-based screening, diagnostic HPV testing of identified preinvasive and invasive cervical lesions is applicable to research driven assays.
Although at least half of sexually active women incur a genital tract HPV infection during their lifetime, 15  HPV infection proceeds with the~8 kbp viral DNA genome replicating as a circular episome initially in the basal epithelial cell layer. A key step in HPV-associated tumorigenesis is integration of viral DNA into the human genome. [16][17][18][19] At least part of the viral genome is inserted into human DNA in most HPV-induced ICCs, and the integration process is thought to be mediated by host DNA repair mechanisms. Integration causes stable association of the virally encoded oncogenes with a host cell, triggers human genome rearrangements, and drives expression of the human oncogenes that flank the sites of integration. [20][21][22][23][24][25] In most ICCs, the HPV genome is inserted near one allele of a human, dominant-acting oncogene, and many human oncogenes have been identified as recurring HPV insertion sites in tumors. 22,[26][27][28][29][30] The presence of viral DNA adjacent to such human oncogenes is thought to be a consequence of integration occurring at a stochastic position within the human genome, followed by selectively advantageous, clonal proliferation of cells that suffered an integration near such oncogenes. Insertional oncogenesis by activation of flanking oncogenes is a long studied and thoroughly established mechanism of tumorigenesis by various retroviruses in experimental and naturally occurring animal systems in which insertion of viral DNA containing strong transcriptional elements, particularly enhancers, alters the transcription of the flanking gene. [31][32][33][34][35] This is also the causal mechanism in the tumors that occurred during gene therapy studies in humans. 36,37 HPV activation of nearby oncogene transcription may entail a variety of mechanisms including fusion transcripts. These initiate at viral promoters and proceed into the host genes. Viral enhancers may also activate flanking oncogene promoters. [38][39][40] Approaches to detect integrated viral DNA have included RNA sequencing (RNAseq) to detect fusion transcripts, and DNA sequencing strategies including PCR or hybridization capture to identify the junctions between inserted HPV and human genome DNA. [20][21][22][23] However, in most instances only one of the two junctions between the linear HPV DNA fragment and human DNA was identified, and structural characterization of the inserted viral DNA was incomplete or inferred. Here we describe the use of a multiple HPV type, hybridization capture approach yielding sufficient DNA recovery and massively parallel sequencing depth to identify both junctions of an  49 Junction fragments were computationally identified using a combination of programs; Delly 50 and SplazerS, 51 both of which specify a combined read pair discordancy and split read analysis.

| Long sequencing reads using the Oxford Nanopore Technologies MinION
Tumor-derived gDNA was purified and concentrated using the genomic DNA Clean and Concentrator-10 (gDCC-10) kit following manufacturer instructions (Zymo Research); concentration was assessed by Qubit and quality was assessed by Nanodrop (Thermo Fisher Scientific). The purified gDNA was then sheared by g-TUBE (Covaris, Woburn, MA) to 10 kb. Genomic libraries were prepared using the Oxford Nanopore 1D ligation library prep kit SQK-LSK108 following manufacturer instructions. Two independently prepared tumor DNA libraries were loaded onto two R9.4 flow cells and sequenced using a MinION Mk1b device (Oxford Nanopore Technologies [ONT]) using the standard 48-h scripts. Postsequencing base calling and FASTQ extraction was performed using Albacore v2.0 (ONT), whereby raw voltage channel data is translated into canonical nucleotides. All libraries and quality filtered pass reads (Q ≥ 7) were used for the subsequent analysis. Library-specific adaptors were trimmed and possible internal adaptors split using Porechop 41 with default parameters for adaptor identification (90% identity), end trimming (75% identity), and internal splitting (85% identity). Sequence reads were aligned to our custom combined reference genome using Ngmlr, 52 a structural variant caller that uses a structural variant aware k-mer search to approximate alignments followed by a banded Smith-Waterman final alignment, along with a convex gap cost model to account for higher sequencing error frequencies associated with long reads. Structural variants were called using Sniffles 52 with parameter adjustment for expected low coverage.
2.4 | Detection of chromosome structural alterations, aneuploidy, and copy number variation by whole genome sequencing and single nucleotide polymorphism analysis After sequencing, paired end reads were aligned to our custom human (GRCh37/hg19) plus HPV70 reference genome using BWA. 49 Files were mapped in BAM format, sorted using SAMtools, 53 and duplicates removed using Picard. 54 Genomic variant detection was accomplished for single nucleotide polymorphisms (SNPs)/small insertions and deletions (indels) using GATK v3.8, 55 structural variant (SV) detection using Delly v0.7.3, 50 and copy number variants detection using control-FREEC. 56 Following genomic variant detection, variants were annotated using the software ANNOVAR. 57 Tumor genomic copy number was also assessed using the Genome-Wide Human SNP array 6.0 (Thermo Fisher Scientific). DNA prehandling and array hybridization were performed according to the manufacturer's instructions (Affymetrix, Santa Clara, CA) and scanned in an Affymetrix GeneChip Scanner 3000. Data analysis and visualization was performed in Chromosomal Analysis Suite v4.0 (ChAS, Thermo Fisher Scientific) with a threshold of minimum 100 kb and 50 markers using NCBI build GRCh37/hg19.

| Global transcriptomic profiling of tumor RNA
Total RNA (2 μg) isolated from six, 10 μm, tumor sections with an RNA integrity number of 9.3 (Agilent 1000) and subjected to oligoT magnetic bead enrichment. Libraries were prepared following the NEB standard protocol as follows: cDNA was synthesized using random hexamer primers and M-MuLV RT (RNaseH-) followed by second strand synthesis with DNA polymerase I and RNaseH. The double stranded cDNA was purified using AMPure XP beads (Beckman Coulter), end tail repaired, adaptor ligated, size-selected, and PCR ampli- were aligned to the reference using STAR v2.5. 42 HTSeq v0.6.1 59 was used to count the read numbers mapped of each gene. As there was no tumor-free matched control for this patient, differential expression analysis was not performed. As our main interest was in potential fusion transcripts between BCL11B and HPV70, STAR-Fusion 0.8.0 60 was used for the detection of fusion transcripts.    Defined human papillomavirus oncogenes are depicted in red, other early genes in green, late genes in yellow, and the upstream regulatory region (URR) in blue. The inner circle shows a histogram of sequencing coverage ranging from of 0 to 4200 reads. All reads corresponded to the segment from genome positions 6906 to 2380 containing the URR E6 and E7, with no reads detected in the remaining half of the viral genome BCL11B were transcribed, and whether HPV70-BCL11B fusion transcripts were present. Therefore, RNAseq was performed on tumor RNA, and reads were aligned with the HPV70 segment-containing BCL11B gene ( Figure 3A). HPV70 transcripts derived almost entirely from a segment encompassing E6, E7, and E1 precisely up to the standard E1^E4 5 0 splice site (5 0 ss; Figure 3B). E6 and E7 are normally translated from transcripts containing both open reading frames (ORFs), 68 and about 40% of the transcripts in this tumor were consistent with excision of the standard HPV E6* intron ( Figure 3B).

| Validation of HPV DNA integration junctions and HPV70-BCL11B fusion transcripts
However, only few of the transcripts included the AUG start codon for E6 ( Figure 3C), indicating that only a fraction of transcripts could contain the full-length E6 or E6* ORFs, and those that did had very Such splice junctions from HPV70 to BCL11B ( Figure 3D) were identified in RNAseq split reads (Supporting Information Figure S2).
Most of these joined the viral E1^E4 5 0 ss to the 3 0 ss for BCL11B exon 4, showing that viral-host fusion transcripts were synthesized in this tumor. In addition, about 4% of E1^E4 5 0 ss's were joined to a cryptic BCL11B exon, here termed exon 3B, between the viral insertion and exon 4 ( Figure 3D). Cryptic exon 3B was spliced to exon 4 downstream, and both the 5 0 and 3 0 intronic sequences immediately flanking the cryptic exon had standard splice signals (Supporting Information Figure S3). Alignment of the RNAseq reads with a normal F I G U R E 3 RNAseq analysis of HPV70 and BCL11B transcripts in the tumor. A, Genetic diagram of the BCL11B gene allele and the HPV70 DNA insert, with exons and HPV70 are drawn larger than scale of the full-length BCL11B gene segment. The 5 0 to 3 0 transcriptional orientation of both BCL11B and HPV70 is from left to right. Unspliced primary transcripts inferred from the 150 nucleotide RNAseq reads are shown below the genomic DNA with introns depicted as dotted segments. Positions of key splicing components are labeled, specifically the E1^E4 5 0 ss, the BCL11B exon 4 3 0 ss, and the E6* intron. ORFs encoded by the spliced mRNAs that were supported by split reads are diagramed below the primary transcripts. The top two are HPV70-BCL11B fusions, while the bottom three are BCL11B transcripts. HPV70 sequences are in blue, and BCL11B sequences are in green with variably spliced exon 3B in green outline and white center. The alternative ORF of exon 4 that is read from exon 3B-containing transcripts (ORF 2) is shown in orange. B, RNAseq coverage of the HPV70 segment showing that virtually all reads located to the region encompassing E6, E7 and the portion of E1 up to the E1^E4 5 0 ss. A decreased number of reads was present over the E6* intron indicating removal of that intron in a fraction of the transcripts. Colored vertical bars are nucleotides that differ from the HPV70 reference sequence. C, Enlarged view of RNAseq reads near the start codon for HPV70 E6 gene ORF showing the limited number of reads near the ATG start codon. The E6 ORF from the ATG start codon is shown in red at the bottom. D, Sashimi plot showing split read support for splice junctions. The diagram below the plot shows the positions of HPV70 DNA (blue) and the BCL11B exons (green), and represents the custom-generated genomic segment containing the HPV70 DNA insertion that was used to obtain the plot. E, Sashimi plot of BCL11B exon splice junctions obtained using the BCL11B genomic segment without the HPV70 DNA insertion demonstrating the inclusion of alternative transcripts with exon 3B in this tumor, including split reads supporting splicing from exon 2 to exon 3B. F, The amino acid sequence encoded by the HPV70 E1-BCL11B exon 4 fusion ORF. HPV70 E1 encoded amino acids are highlighted in blue. The six C2H2 zinc fingers within the exon 4 encoded segment are highlighted in yellow with the cysteine residues in purple and the histidines in red. Sixty-six split reads supported the E1 to exon 4 splice junction, five of which are presented in Supporting Information Figure S2. G, The amino acid sequence encoded by the HPV70 E1 to BCL11B exon 3B and exon 4 fusion ORF. HPV70 E1 encoded amino acids are highlighted in blue. Exon 3B encoded amino acids are highlighted in gray. Exon 4 ORF2 encoded amino acids are shown in orange. The nucleotide sequence of exon 3B including intronic splicing signals along with the four split reads that support E1 to exon 3B splicing are shown in Supporting Information Figure S3. H, Amino acid sequence of BCL11B encoded by exons 1, 2, 3, and 4 is shown for comparison. Exon 1 sequences are highlighted in green. Exon 2 sequences are highlighted in purple. Exon 3 sequences are highlighted in blue. The six zinc fingers encoded in exon 4 are highlighted as in panel (F). I, Inferred amino acid sequence encoded by BCL11B mRNA with alternative exon 3B. Segment colors match panels (G) and (H). The two underlined amino acids (proline and glutamate) at the beginning of exon 2 in panels (H) and (I) show amino acids that are excluded by a known alternative splice junction between exons 1 and 2 of BCL11B BCL11B allele sequence without HPV70 identified all the standard splice junctions of the host gene, including variable inclusion of exon 3 ( Figure 3E), indicating expression of normal BCL11B transcripts in the tumor as well, possibly from normal BCL11B alleles. In addition, splicing of a variant BCL11B transcript was identified in which the cryptic exon 3B was present between exons 2 and 4 ( Figure 3E).
Transcripts with the most frequently detected splice junction (E1^E4 5 0 ss to 3 0 ss exon 4) encoded a fusion protein with the first five amino acids of HPV70 E1 fused to the portion of BCL11B encoding all six of its zinc fingers ( Figure 3F). The E1^E4 splice to cryptic exon 3B joined the first five HPV70 E1 amino acids in frame to an ORF that completely spanned exon 3B, which in turn was spliced to exon 4 ( Figure 3G). However, the splice junction from exon 3B to exon 4 caused exon 4 to be read in a different ORF than normal and encompassing 66 amino acids and no zinc fingers ( Figure 3G). Similarly, while transcripts encompassing the standard BCL11B exons 1, 2, 3, and 4 encoded the standard six Zn-finger protein ( Figure 3H), splicing of BCL11B transcripts from exon 2 to exon 3B fused the ORFs in frame, but the junction from exon 3B to exon 4 caused the same frameshift as in the viral fusion transcripts ( Figure 3I). In summary, the HPV70 DNA insertion resulted mainly in HPV70-BCL11B fusion transcripts encoding an unusual, N-terminally truncated form of the BCL11B oncogene with five HPV70 E1 amino acids at the N-terminus ( Figure 3F). In addition, less abundant transcripts containing cryptic exon 3B that causes exon 4 of BCL11B to be read in a different ORF were detected with either viral or BCL11B encoded amino acids at the N-termini ( Figure 3G,I).
To verify that the HPV70-BCL11B fusion RNAs were present, RT-PCR was performed on tumor RNA using one primer in virus sequences and the other in exon 4 ( Figure 4A). This confirmed the presence of the fusion transcripts ( Figure 4B). In addition to RNA from the tumor, a formalin fixed, paraffin embedded tissue fragment was obtained from a histopathology specimen from the same patient archived 3 years before tumor resection.
The patient was diagnosed with CIN3 at that time, but was lost to followup. Both RNA and DNA were both prepared from the archived CIN3 sample. RT-PCR detected the presence of the fusion transcripts at both the tumor and the CIN3 stage ( Figure 4B). Veracities of the RT-PCR products were confirmed by sequencing of excised bands from the tumor RT-PCR. The more prominent, faster migrating band confirmed splicing from HPV70 E1 to BCL11B exon 4 (Supporting Information Figure S4). The faint upper band was also recovered from the tumor RT-PCR and sequenced thus confirming the structure of HPV70 E1 spliced to exon 3B, and exon 3B spliced to exon 4 of BCL11B in the same RNA (Supporting Information Figure S4). Presence of the 3980 bp HPV70 DNA insertion into BCL11B was also confirmed by PCR testing for the HPV70 L1 junction in the CIN3 biopsy as well as in the tumor ( Figure 4C). Thus, the HPV70 DNA insertion and fusion transcripts were present in the lesion by the time of CIN3 diagnosis as well as in the eventual tumor.

| DISCUSSION
Application of the HC + NGS approach to a cervical tumor identified the presence of HPV70 DNA and discerned the precise fraction of the viral genome present in the tumor. It also revealed that the viral DNA was integrated into one allele of BCL11B in the human genome, a gene that has been previously implicated in human tumorigenesis. 44 HPV70 was previously detected in ICCs and oral cancers, 69,70 and BCL11B was previously identified as a site of HPV16 integration in at least two cervical cancers. 43,62 Tumorigenesis by insertion of viral DNA near oncogenes is a thoroughly established oncogenic mechanism. [31][32][33][34][35] The frequent detection of HPV DNA near oncogenes in virally induced human tumors indicates that insertional oncogenesis is a key mechanism in HPV-induced tumors, and that it may act in concert with HPV-encoded oncogenes, perhaps even replacing them in some instances. 40 Detection of HPV70 DNA integrated into BCL11B and transcriptional activation of it, along with HPV E7 transcripts, strongly implicate HPV70 in this tumor.
Hybridization capture plus massively parallel sequencing has been used and proven to be reliable for identifying human genome integration sites of HPV DNA in tumors. 20,21,24,46 As demonstrated here, this  Figure S4 based HPV analyses of tumors have suggested roles for non-hrHPVs in oncogenesis in a minority of tumors broader HPV type screening would create substantial burdens. 1,8,[83][84][85] Nonetheless, it is important to be aware that screening based solely on hrHPV detection may miss a small fraction of at risk individuals such as the patient described here, where the L1 gene PCR target may be deleted or may lack sufficient sequence homology for recognition. Expanded PCR specificity for HPV analysis is a subject of current interest. 86 HPV typing of ICCs currently offers little or no direct benefit to