Sequence variation and physical state of human papillomavirus type 16 cervical cancer isolates from Australia and New Caledonia
Article first published online: 8 NOV 2001
Copyright © 2001 Wiley-Liss, Inc.
International Journal of Cancer
Volume 97, Issue 6, pages 868–874, 20 February 2002
How to Cite
Watts, K. J., Thompson, C. H., Cossart, Y. E. and Rose, B. R. (2002), Sequence variation and physical state of human papillomavirus type 16 cervical cancer isolates from Australia and New Caledonia. Int. J. Cancer, 97: 868–874. doi: 10.1002/ijc.10103
- Issue published online: 29 JAN 2002
- Manuscript Accepted: 31 AUG 2001
- Manuscript Revised: 21 AUG 2001
- Manuscript Received: 28 MAY 2001
- human papillomavirus 16;
- sequence variation;
- physical state
Sequence diversity over 2600 nucleotides of the upstream regulatory region (URR) and the E6 and E2/E4 genes of 34 human papillomavirus (HPV)16 cervical cancer isolates from Australia and New Caledonia was investigated. One 81 base duplication, 41 single base substitutions and 1 single base insertion were identified in the URRs. Some of these changes are reported here for the first time. Several of the 19 changes impacting transcription factor binding sites had the potential to alter promoter activity. Twenty-eight (82%) of the variants belonged to the European lineage, 4 (12%) were Asian and 2 (6%) were Asian-American. Eighteen of 27 (67%) isolates where the E6 gene was examined contained amino acid substitutions. Of 13 isolates sequenced with intact E2 genes, 12 (92%) contained amino acid substitutions in the E2 protein and 3 (23%) amino acid substitutions in the overlapping E4 protein. Some of the changes in E6 and E2 may alter immunological epitopes or protein function. The physical state of HPV DNA was assessed by Southern hybridization and PCR for an intact E2 gene. Overall, 11 of 25 isolates contained only integrated HPV DNA, 10 only episomal HPV DNA and 4 both integrated and episomal DNA. No particular patterns of variation in the URR, E6 or E2/E4 genes predicted physical state. This investigation represents one of the most comprehensive studies of its kind and fills an important gap in global sequence data. © 2001 Wiley-Liss, Inc.
Human papillomavirus (HPV) type 16 is the most common cause of cervical cancer globally, yet the majority of women infected with this virus never develop severe cervical dysplasia or malignancy. Epidemiological evidence now suggests that sequence diversity in critical regions of the HPV16 genome such as the upstream regulatory region (URR), the E6 and the E2 genes may, in part, explain variations in the oncogenic potential of the virus.1, 2 Variants have been found to cluster along ethnic and geographic lines3, 4 suggesting that the marked discrepancies in cervical cancer rates from different countries may be related to the variable oncogenic potential of these variants, as well as to geographic differences in transmission patterns and exposure to co-factors.
Data from many centres worldwide have shown substantial sequence variation within transcription factor binding sites in the HPV16 URR. Noteworthy are changes within Yin Yang 1 (YY1) sites that de-repress the E6 promoter, potentially allowing malignant conversion without integration.1 The E6 regions of many HPV16 isolates carry amino acid changes in functional or antigenic domains that may also have biological implications. Differences in the functional activities of naturally occurring E6 variants have been demonstrated in vitro.5 A nucleotide change in E6 at nt 350, leading to a potentially important antigenic change in the encoded amino acid (L83V), has been investigated for its association with persistent cervical infection and progression of disease, but results have been conflicting.6 Less is known of the nature and extent of sequence variation within HPV16 E2, but some identified variants have the potential for up- or downregulating E6/E7 expression or altering host immune response.7, 8
Although upregulation of E6/E7 expression mediated by integration is pivotal to theories of HPV-associated malignant conversion, approximately one-third of HPV16-positive cervical cancers have been found to carry the virus exclusively in episomal form.9 It is possible that sequence changes within E2 or E6 may provide a mechanism, in addition to that associated with URR variation, by which the virus can achieve malignant conversion without integration. The relationship between E2 and E6 sequence variation and physical state, however, remains poorly defined.
The study described in this report investigated sequence variation over 2600 bases comprising the URR, E6 and E2 genes (and the E4 gene contained within E2) of 34 HPV16 positive cervical cancer isolates from Australia and New Caledonia. The findings were related to the physical state of the virus in the tumour cells, assessed by Southern hybridization and polymerase chain reaction (PCR) for an intact E2 gene.
MATERIAL AND METHODS
Investigations were undertaken on HPV16 isolates from 34 women (26 Australian residents and 8 from New Caledonia) who underwent primary treatment for cervical cancer at Royal Prince Alfred Hospital, Sydney, Australia, between 1990 and 1996. Of the 26 Australian residents, 20 were born in Australia, 4 in Europe (1 in Austria and 3 in England), 1 in Asia (Korea) and 1 had an unknown place of birth. Selection was based solely on the availability of material for the investigations. The mean age of the patients was 48 years (range 27–69).
Nucleic acids were extracted from fresh tumour samples using standard phenol/chloroform procedures. Nucleic acids extracted from HPV16-positive CaSki cells10 were used as positive controls in the PCRs.
Polymerase chain reaction
Polymerase chain reactions (PCRs) for the URR, E6 and E2/E4 genes were carried out as published by May et al.,11 Wheeler et al.12 and Das et al.,13 respectively. The controls used and the precautions taken to minimize the possibility of cross-contamination of specimens at all stages of processing were as previously published.14 Five microliters of each PCR product was electrophoresed in a 2% Nu Sieve™ agarose (FMC Bioproducts) gel. The DNA bands were visualized by staining with 1 μg/ml ethidium bromide and photographed under UV transillumination.
Nucleotide changes were determined by direct sequence analysis of pooled PCR products purified by polyethylene glycol precipitation using our modifications14 of the methods published by Lis.15 Sequence analysis was carried out using ABI PRISM® BigDye™ Terminator Cycle Sequencing chemistry using forward, reverse and/or internal primers (URR nt 7703–7685, 5′ CCTAACAGCGGTATGTAAG and URR nt 7666–7684, 5′ AATCACTATGCGCCAACGC). After resolution of data on an ABI PRISM® 377 sequencer, sequences were aligned and differences determined using Pileup (GCG) and Pretty (GCG) software. The HPV16 prototype strain, HPV16R,16, 17 provided an arbitrary reference. The presence of nucleotide changes was confirmed by repeat sequence analysis using different PCR products from the same samples.
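The comparison against the prototype can be illustrated with a minimal sketch. It assumes two sequences that are already aligned, gap-free and of equal length (the study used the GCG Pileup and Pretty programs for alignment); the function name, the toy sequences and the starting coordinate are hypothetical and are not taken from the study.

```python
def list_substitutions(reference, isolate, start_nt):
    """Return (position, reference_base, isolate_base) for each mismatch.

    Assumes both sequences are pre-aligned, gap-free and equal in length.
    start_nt is the genome coordinate of the first base (hypothetical here).
    """
    if len(reference) != len(isolate):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    for offset, (ref_base, iso_base) in enumerate(zip(reference, isolate)):
        if ref_base != iso_base:
            changes.append((start_nt + offset, ref_base, iso_base))
    return changes


# Toy 12-base fragments, invented for illustration only.
prototype = "ACGTTGCAAGCT"
variant = "ACGTGGCAAGTT"
print(list_substitutions(prototype, variant, start_nt=7500))
```

Insertions and duplications, such as the 81 base duplication reported for one isolate, would need a gap-aware alignment and are outside the scope of this sketch.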
E2 PCR for determination of the physical state
Because integration of the HPV genome into cellular DNA frequently disrupts E2, the physical state of the virus was investigated by PCR for the entire E2 gene as described by Das et al.13 The specificity of the E2 PCR products was confirmed either by sequencing or by cleavage with Hinc II to produce bands of 662 bp and 476 bp.
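The Hinc II check can be sketched as a simple in silico digest. Hinc II recognizes GTYRAC and leaves blunt ends; the toy amplicon below is invented, and the total product length of 1138 bp is inferred here from the two reported band sizes rather than stated in the text.

```python
import re

# Hinc II recognises GTYRAC (Y = C or T, R = A or G) and cuts between Y and R.
HINC_II_SITE = re.compile(r"GT[CT][AG]AC")
CUT_OFFSET = 3  # blunt cut after the third base of the site


def digest_fragment_sizes(sequence):
    """Return fragment lengths after complete digestion of a linear molecule."""
    cut_points = [m.start() + CUT_OFFSET for m in HINC_II_SITE.finditer(sequence)]
    boundaries = [0] + cut_points + [len(sequence)]
    return [end - start for start, end in zip(boundaries, boundaries[1:])]


# Toy amplicon, invented for illustration: a single GTTAAC site in 20 bases.
toy_amplicon = "AAAAAAGTTAACCCCCCCCC"
print(digest_fragment_sizes(toy_amplicon))

# Two bands imply a single site; their sum gives the implied product length.
print(662 + 476)
```

A product with no site, or with more than one, would give a different band pattern, which is what makes the digest a useful specificity check.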
Identification of E2 breakpoint regions
For isolates yielding negative results for PCR of the entire E2 gene, the general region of the breakpoint was determined by PCR of (a) the 3′ region of E2 (PCR E2-1) and (b) the 5′ region of E2 (PCR E2-2) using the following primers: PCR E2-1: nt 3028–3040 5′ GTGGACATTACAAGACGTTAGCC and nt 3873–3854 5′ GGATGCAGTATCAAGATTTG; PCR E2-2: nt 2735–2753 5′ AGGACGAGGACAAGGAAAA and nt 3341–3360 5′ CTGCTAAACACAGATGTAGG. The 50 μl PCRs contained 250 μM dNTPs (E2-1, E2-2); 3.5 mM (E2-1) or 2.5 mM (E2-2) MgCl2; and 250 to 500 ng of nucleic acids as template (E2-1, E2-2).
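The logic of the two partial E2 PCRs can be sketched as follows. The product lengths assume that the outermost primer coordinates quoted above delimit each amplicon, and the decision rule is a hypothetical reading of how the two results might be combined, not a procedure stated by the authors.

```python
def amplicon_length(first_nt, last_nt):
    """Length of a product spanning genome positions first_nt..last_nt inclusive."""
    return abs(last_nt - first_nt) + 1


# Outermost primer coordinates as quoted in the text.
E2_1_SPAN = (3028, 3873)  # PCR E2-1, 3' region of E2
E2_2_SPAN = (2735, 3360)  # PCR E2-2, 5' region of E2


def breakpoint_region(e2_1_positive, e2_2_positive):
    """Hypothetical rule for isolates negative in the full-length E2 PCR."""
    if e2_1_positive and e2_2_positive:
        return "both regions amplify: E2 present but discontinuous"
    if e2_2_positive:
        return "3' region disrupted"
    if e2_1_positive:
        return "5' region disrupted"
    return "neither region amplifies: E2 largely absent"


print(amplicon_length(*E2_1_SPAN), amplicon_length(*E2_2_SPAN))
print(breakpoint_region(e2_1_positive=False, e2_2_positive=True))
```

Because the two products overlap between nt 3028 and nt 3360, a break inside the overlap would abolish both reactions and could not be told apart from loss of the whole gene by this rule alone.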
Amino acid analysis
Nucleotide variations leading to amino acid changes in the E6, E2 and E4 proteins were determined using Translate and Eclustalw software. The effect of amino acid changes on hydrophilicity and the antigenic index of the E6 protein was examined by the methods of Kyte and Doolittle18 and Jameson and Wolf19 using PeptideStructure software.
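How a single substitution shifts a hydropathy profile can be shown with a short sketch based on the Kyte and Doolittle scale. The window length of 7, the function names and the toy peptide are assumptions made for illustration; the PeptideStructure settings used in the study are not given, and the Jameson and Wolf antigenic index is not reproduced here.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def windowed_hydropathy(peptide, window=7):
    """Mean hydropathy of every full window along the peptide."""
    scores = [KD[aa] for aa in peptide]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def substitution_shift(peptide, index, new_residue, window=7):
    """Largest absolute change in windowed hydropathy from one substitution."""
    mutant = peptide[:index] + new_residue + peptide[index + 1:]
    before = windowed_hydropathy(peptide, window)
    after = windowed_hydropathy(mutant, window)
    return max(abs(a - b) for a, b in zip(after, before))


toy_peptide = "MKTLLDAEGRS"  # invented sequence, not HPV16 E6
print(round(substitution_shift(toy_peptide, 3, "V"), 3))  # conservative Leu to Val
print(round(substitution_shift(toy_peptide, 3, "R"), 3))  # non-conservative Leu to Arg
```

On this scale a leucine to valine change barely moves the profile, whereas a swap between a hydrophobic and a charged residue produces a large shift.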
Southern hybridization for assessment of physical state
Ten micrograms of tumour nucleic acids were digested with Bgl II (no restriction sites in HPV16) and BamH I (single restriction site in HPV16) and electrophoresed on a 0.7% agarose gel. The DNA was transferred to Genescreen Plus® membrane (NEN™ Life Science Products) and high stringency hybridization was carried out using the entire HPV16 prototype genome (labeled with 32P-dCTP to a specific activity of 1 × 10^8 cpm/μg) as described previously.20
RESULTS
One large scale change (an 81 bp duplication at the 5′ end of the enhancer, nt 7453–7534 of the O2 URR), 41 single nucleotide substitutions at 40 positions and 1 single base insertion were identified in the 34 URRs examined. The number of changes in individual isolates ranged from 1 to 15 (Fig. 1), representing 4.5% variability over approximately 900 nt of URR sequenced. Some of the changes are believed to be reported here for the first time. The majority of the isolates (28 of 34, 82%) were characteristic of the European lineage; 4 (12%) were Asian and 2 (6%) were Asian-American lineage variants. The 8 New Caledonian women harbored only European lineage variants. Nineteen of the 42 single base changes impacted known transcription factor or hormone binding sites. GenBank accession numbers for the URR variants are AF026034 and AF404668–AF404691.
Twenty nucleotide changes at 19 positions (3.3% variation over 540 nt) were identified in the E6 region of the 27 isolates containing sufficient material for these investigations (Fig. 2). Six isolates (all European lineage) contained the prototype E6 sequence and another 5 differed from the prototype only by containing the T to G change at nt 350 (European subclass variants 350T and 350G, respectively). Overall, 11 isolates (41%) contained the T to G change at nt 350. Twelve (60%) of the nucleotide variations resulted in amino acid substitutions (Fig. 3); none of these impacted the 4 zinc binding motifs (CXXC) of the encoded protein (Fig. 4). No frameshift mutations were observed and no nucleotide changes conferred a premature stop codon. Substitutions at amino acid positions 27, 44, 87 and 92 resulted in large changes in hydrophilicity; the change at position 14 resulted in a large decrease in antigenic index, whereas that at position 27 substantially increased the antigenic index. GenBank accession numbers for the E6 variants are AF404692–AF404706.
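The distinction drawn above between silent changes, amino acid substitutions and premature stop codons can be sketched with the standard genetic code. The codons used in the example are hypothetical, chosen only to reproduce a leucine to valine change of the L83V type; they are not taken from the HPV16 E6 sequence.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_change(codon, position_in_codon, new_base):
    """Return (original residue, new residue, kind) for a single-base change."""
    mutant = codon[:position_in_codon] + new_base + codon[position_in_codon + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        kind = "synonymous"
    elif after == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return before, after, kind


# Hypothetical codons for illustration only.
print(classify_change("TTG", 0, "G"))  # Leu to Val, an amino acid substitution
print(classify_change("CTG", 2, "A"))  # Leu to Leu, a silent change
print(classify_change("TGG", 2, "A"))  # Trp to stop, a premature stop codon
```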
Sequence data for E2 and E4 were obtained for 13 of the 19 isolates shown to have an intact E2 gene (Fig. 2). Thirty variant nucleotides were identified over approximately 1080 nt sequenced, corresponding to 2.7% variation. Only 1 isolate was identical in sequence to the HPV16 prototype. The C to T substitution at nt 3410 was the most common alteration, being present in 10 of the 13 isolates (77%). In the E2 gene, 12 nucleotide changes were identified in the transactivation domain, 5 in the hinge region and 13 in the DNA binding region (Fig. 4). Twenty-one of these 30 E2 changes resulted in amino acid substitutions (Fig. 3). No frameshift mutations were identified and no nucleotide changes conferred a premature stop codon. In the E4 gene, 14 nucleotide changes resulting in 5 amino acid substitutions were identified (Fig. 3); but 3 of these were observed only in the Asian-American variant K4. The amino acid change in the Asian isolate G1 was located toward the carboxyl terminal of E4 (codon 70) and created a new start codon. GenBank accession numbers for the E2/E4 variants are AF407214–AF407221.
Physical state analysis and E2 gene disruption
Using combined data from Southern blot hybridization and PCR for an intact E2 ORF, the physical state of viral DNA was determined for 25 of the 34 isolates. Eleven cancers contained only integrated HPV sequences, and 3 of these (H3, K3 and M2) contained the full-length E2 gene; 10 cancers contained only episomal viral DNA with a full-length E2; and 4 had both integrated and episomal DNA. Of the 8 isolates classified as integrated on the basis of E2 disruption, 2 lacked the entire E2 ORF, 1 contained the entire E2 ORF discontinuously, 1 was disrupted in the 5′ region and 4 were disrupted in the 3′ region. No pattern of variation in the URR, E2 or E6 genes predicted episomal vs. integrated status.
DISCUSSION
Our study of isolates from Australia and the South Pacific fills one of the few remaining gaps in global HPV16 sequence data. Furthermore, by examining sequence diversity across approximately 2,600 bases of a substantial series of cancer isolates in relation to physical state, this represents one of the most comprehensive studies of its kind. It has been suggested that nucleotide alterations in 1 region of HPV16 are associated with changes in other regions of the genome.3 In our study, the sequences of the E6 and E2 genes were relatively lineage specific and there was no evidence of genetic recombination. As observed previously in our investigation of Chinese cervical cancer isolates,21 however, changes in the URR did not always predict those in E6 or E2.
The observation that the majority of Australian isolates belonged to the European lineage is consistent with the colonization of this continent during the 19th and early 20th centuries with migrants of predominantly Anglo-Celtic extraction. The proportion of Asian lineage variants (4 of 34), however, was larger than reported from all regions except South East Asia itself (9 of 35).4 This figure was in sharp contrast to the very low proportion (2 of 372) of this lineage reported by the same authors from all other non-Asian continents and may reflect the influx into Australia of large numbers of migrants from Asia over the past 30 years. The only Asian-born woman in the study was not infected with an Asian lineage variant. The proportion of Asian-American lineage variants in this survey (2 of 34) was virtually identical to that found by Yamada et al.4 in South East Asia. Overall, these comparisons indicate that the spectrum of oncogenic HPV16 variants circulating in Australia is most similar to that seen in our nearest Asian neighbors. Although all 8 New Caledonian isolates were of European lineage, the sample is too small to exclude the presence of other lineage variants in the tumors of New Caledonian women.
Although our isolates fell into patterns characteristic of other geographic regions, some of the nucleotide changes identified have not been previously reported. Our recent functional analysis of 6 of these variant URRs indicated that some nucleotide changes may be biologically relevant, in terms of the transforming ability of episomal-only virus.22 Of particular interest was the novel variation in the PSM of the entirely episomal Asian-American variant K4. This A to C change at nt 7894 prevented the binding of CCAAT displacement protein (CDP) and caused de-repression of the E6/E7 promoter. By analogy with the effect of mutations within Yin Yang 1 (YY1) binding sites,1 this may represent a new strategy for malignant conversion in the absence of integration. We also showed that changes in the YY1 site at nt 7792 and the octamer-1/papilloma enhancer binding factor 1 (Oct-1/PEF-1) site at 7676 affected promoter activity.
A substantial proportion of the changes found in E6 were located in the amino half of the protein that represents the binding region for the cellular protein E6-AP and has importance in both cell mediated and humoral host immune response.23, 24 Our observation that changes in codons 14 and 27 substantially altered the E6 antigenic index (large decrease and increase in antigenicity, respectively) and also potentially affected interactions with E6-AP or E6-BP, was of particular interest. The most common E6 variation, a G for T at nt 350 (L83V), was found in about 40% of our isolates—a frequency comparable to that reported in isolates from Europe,25 but a markedly higher frequency than the 10% found in our previous survey of Chinese isolates.21 The biological significance of 350G remains unclear, being reported as a ‘high-risk’ factor for cancer in some populations,26 but as ‘low risk’ in others,27 and as ‘no risk’ at all in still others.25 These conflicting findings may relate to genetic differences in the different populations, specifically a polymorphism in the p53 gene at codon 72 that results in either an arginine or proline that may alter binding of the variant E6 proteins. The distribution of this polymorphism in the Australian population warrants investigation.
Overall, our isolates carried changes broadly distributed across the E2 gene, but the nature and extent of E2 variation within the Asian-American variant was markedly different from that seen in the other lineages. Engineered E2 mutants, particularly of the transactivation region, increase immortalization capacity28 and transactivation activity.29 There have, however, been few investigations of the functional significance of natural E2 variants, apart from a study carried out by Veress et al.8 that showed that the transcriptional transactivation potentials of European and Asian-American E2 variants were similar. Recently, it was shown that there was a significant association between the 3684C-A variant and high-grade cervical squamous intraepithelial lesions, suggesting that this variant could be important in mediating progressive disease.6 This change was found in only 4 of 13 cancer isolates analyzed in our series.
Our assessment of the relationship between E2 and E6 sequence variation and physical state was of particular interest. On current evidence, changes in critical transcription factor binding sites in the URR such as YY1 and PSM are unlikely to account for all cases of episomal carriage of cancer isolates, even allowing for the possibility that some changes identified may impact uncharacterized transcription factor binding sites of major functional significance. Furthermore, there has been preliminary evidence in Asian-American isolates of an association between E2 variation, episomal state and high viral copy number.7 The Asian-American isolate in our study analyzed for physical state was found to be carried episomally. Nonetheless, there is strong evidence from functional studies that upregulation of E6/E7 expression in this isolate was due to point mutations within the URR: 1 in the PSM22 and another in an unidentified binding site at nt 7729.30 There were no obvious associations between sequence changes in E2 or E6 and physical state in this series. Eight of the 11 isolates containing only integrated viral DNA were disrupted in E2, a proportion similar to that reported by Kalantari et al.;31 but interestingly, half of the cases were integrated at the 3′ end of E2, whereas in the earlier report the breakpoints predominantly occurred in the 5′ region. Further insight into the biological significance of the site of integration within E2 (or E1) will depend on a precise definition of breakpoints followed by a determination of the frequency of variation at these locations in low grade cervical lesions with well-defined clinical associations. The proportion of integrated isolates in our study was lower, and the proportion of episomal isolates higher, than expected from published data. We believe this is likely to reflect technical factors because the 9 cancers where physical state could not be confirmed were negative both by E2 PCR and Southern hybridization, suggesting that they probably contained viral DNA integrated at a very low copy number. If this were the case, the proportion of isolates with integrated DNA would have risen to 24 of 34, consistent with most other published data.9
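The arithmetic behind the projected figure can be laid out in a few lines. The counts are those quoted in the text; counting isolates with both integrated and episomal DNA as carrying integrated DNA is an assumption made here to reach the stated total of 24, and the variable names are illustrative.

```python
# Counts quoted in the text.
TOTAL_ISOLATES = 34
physical_state = {
    "integrated only": 11,
    "episomal only": 10,
    "integrated and episomal": 4,
}

assessed = sum(physical_state.values())
undetermined = TOTAL_ISOLATES - assessed

# Assumption: mixed isolates count as carrying integrated DNA.
observed_integrated = (physical_state["integrated only"]
                       + physical_state["integrated and episomal"])
# Scenario from the text: every undetermined isolate carries integrated DNA.
projected_integrated = observed_integrated + undetermined

print(f"assessed: {assessed}, undetermined: {undetermined}")
print(f"integrated, observed: {observed_integrated} of {assessed}")
print(f"integrated, projected: {projected_integrated} of {TOTAL_ISOLATES}")
```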
In our study no amino acid changes were identified in the amino terminus of the E4 protein that associates with tonofilaments, resulting in disruption of the cytokeratin filament network. These findings are similar to those reported by Eriksson et al.,32 who suggested that this region of the E4 protein is intolerant of amino acid changes, in contrast to the corresponding region of E2. There are, however, almost no data on the potential implications of sequence variation in the E4 protein.
Assessment of the biologic significance of intratypic sequence variation in papillomaviruses has been complicated by marked differences in the nature and extent of variations in different geographic regions of the world. Functional analyses of individual sequence variants will provide further useful information, but clarification of the importance of sequence variation may ultimately depend on large scale studies comparing the characteristics of HPV16 isolates that have regressed spontaneously with those that have progressed to cancer.
We wish to thank Prof. M. Tattersall from the Department of Cancer Medicine at the University of Sydney and Drs. C. Dalrymple, J. Carter and members of the Departments of Gynaecological Oncology and Anatomical Pathology at King George V/Royal Prince Alfred Hospital, Sydney, for their continued support.
- 14. Prevalence and distribution of human papillomavirus type-16 DNA in pelvic lymph nodes of patients with cervical cancer and in women with no history of cervical abnormality. Int J Cancer 1992;53:1–4.
- 17. Human papillomaviruses. Los Alamos: Los Alamos National Laboratory, 1995. III 47–57.