Molecular variants of HPV-16 associated with cervical cancer in Indian population


  • Conflict of Interest: A. Peedicayil has a potential personal conflict of interest as he attended a meeting organized by MSD pharmaceuticals in New Delhi, India, in April 2008 on “Diseases caused by HPV.”


Human papillomavirus (HPV) is a causative factor in the etiology of cervical cancer, with HPV-16 being the most prevalent genotype associated with it. Intratype variations in the oncogenic E6/E7 and capsid L1 proteins of HPV-16, besides being of phylogenetic importance, are associated with the risk of viral persistence and progression. The objective of this multicentric study was to identify the HPV-16 E6, E7 and L1 variants prevalent in India and their possible biological effects. Squamous cell cervical cancer biopsies were collected from 6 centres in India and examined for the presence of HPV-16. Variants of HPV-16 were characterized by full-length sequence analysis of the L1, E6 and E7 genes in 412 samples. The distribution of variants was similar across the centres/regions, with the European variant E350G being the most prevalent (58%), followed by the Asian American variant (11.4%). Fifty-six changes were seen in the E6 region, 31 of them nonsynonymous; the most frequent were L83V (72.3%), Q14H (13.1%) and H78Y (12.1%). Twenty-nine alterations were seen in the E7 region, 12 of them nonsynonymous; the most frequent was F57V (9%). The L1 region showed 204 changes, of which 67 were nonsynonymous; the most frequent were 448insS (100%), 465delD (100%), H228D (94%) and T292A (85%). The identified variants, some new and some already reported, can disrupt pentamer formation, transcriptional regulation of the virus, L1 protein interface interactions, B and T cell epitopes and p53 degradation; their distribution is therefore important for the development of HPV diagnostics and vaccines, and for therapeutic purposes. © 2009 UICC

Cervical cancer is the second most common cancer worldwide.1 In India it is the most common cancer amongst women, accounting for 130,000 new cases and more than 70,000 deaths annually. Persistent infection with high-risk human papillomavirus (HPV) is the main aetiological factor in the development of cervical cancer and may depend on HPV genotypes and variants.2 HPV-16 is the most prevalent genotype associated with cervical carcinomas globally, as well as in Indian women.3–5 This could be because of differences in HPV-16 variant frequency, with certain variants conferring greater oncogenicity.6, 7 The molecular variants or lineages differ from the prototype by no more than 2% in nucleotide sequence in the coding regions and 5% in the noncoding regions of the viral genome.8 Nucleotide sequence comparisons have shown that HPV-16 has evolved along 5 major phylogenetic branches, i.e., European, Asian, Asian American, African-1 and African-2.9 Intratypic variation of HPV-16 has been shown to be an important predictor of progression to clinically relevant cervical lesions.10 In a cohort study of young women, 16 different HPV variants were found, one of which persisted over time, whereas the other variants were detected only transiently.11 Considerable intratypic diversity of HPV-16 has also been reported by other studies based on DNA sequencing of cervical tumors.12–14 The European variant is the most widely distributed worldwide, except in Africa. Furthermore, variation studies of the LCR, L1, L2 and E6 genes of HPV-16 indicate that recombination between variants is rare or nonexistent.9, 14

The major (L1) and minor (L2) capsid proteins make up 83% and 17% of the viral coat, respectively, and L1 is thus present during the initial infection. Because L1 can assemble into virus-like particles (VLPs), which in turn elicit an effective immune response, it has been used as an ideal target for prophylactic HPV vaccines. Only a few of the nucleotide differences found in HPV variants result in amino acid changes, and alterations that may interfere with the structural, functional or antigenic properties of specific viral proteins are important. For instance, variant 114K of HPV-16 assembles into VLPs in a heterologous expression system, whereas the reference (prototype) clone of HPV-16 does not.15 This difference has been attributed to a single amino acid change, H228D, in L1. Other changes in the L1/L2 region may be important for discriminating between the infectious potential of different variants and for defining epitopes relevant to vaccine design.

The oncoproteins E6 and E7 play an important role in the transformation of the host cell in HPV-induced cervical carcinogenesis.16 The E6 protein, a transcriptional transactivator, targets p53 for degradation, eventually accelerating cell proliferation. E7 binds to and inactivates pRb and thus triggers cell cycle progression. Specific variations in the HPV-16 E6 protein have also been reported to interfere with the cytotoxic T-cell immune response.17 HPV-16 sequence variations in cervical cancer have been reported from some countries.18–20 However, only limited data are available in the literature from India: the data are regional, sample sizes are small and full-length sequencing has not been done. Comprehensive data integrating various regions are lacking.12, 18, 21, 22 In this study, we present the HPV-16 L1, E6 and E7 variants observed in cervical neoplasia from 6 different regions of India, i.e., Delhi, Kolkata, Mumbai, Bangalore, Thiruvananthapuram and Vellore. The novel variants seen in the E6 region are Y70N, H78C, R135G, I26U and S142L, and those in the E7 region are V90C, C91A, Q96H, H51Y, S95L and 844delT, the last leading to a frameshift. A large number of the variants detected in the L1 region have not been reported earlier. The study presents a bioinformatics-based analysis of the observed variants, with emphasis on the possible biological implications of the identified sequence changes.

Material and methods

Sample collection

This was a multicentric study. Cervical cancer biopsies from 580 HPV-positive, untreated cases of cervical cancer were collected from 6 different regions of India, i.e., from AIIMS, Delhi; CFI, Kolkata, which also covered the North-East population; KMIO, Bangalore; TMC/ACTREC, Mumbai; RCI/RGCB, Thiruvananthapuram and CMC, Vellore. The samples were frozen immediately and stored at −70°C until use. Ethics committee clearance was obtained from the respective institutes, and informed consent was taken from each patient. Only histopathologically confirmed samples of squamous cell carcinoma of the cervix were included in the study.

DNA extraction and HPV typing

A uniform protocol was adopted at all the centres for DNA extraction from the cervical tumor tissues, using the QIAamp DNA Kit (QIAGEN, Germany); the DNA was purified on QIAamp spin columns as described by the manufacturer. The integrity of the DNA was checked by polymerase chain reaction (PCR) using beta-globin primers (540 bp product), and only beta-globin-positive samples were taken for further analysis. The samples were tested for the presence of HPV DNA by PCR using the consensus MY09/11 primers, which give a 450 bp product. HPV-positive samples were then tested for HPV-16 by PCR using type-specific primers (210 bp product) and/or by HPV genotyping. Genotyping was done for all the samples by Xcyton, Bangalore, using Xcytonscreen HPV, a commercial DNA macro-chip kit based on amplification with the PGMY09/11 consensus primers and genotyping by reverse dot blot assay. Only samples positive for HPV-16 were taken for variant analysis.
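The screening steps above act as a chain of filters, each applied only to samples that passed the previous one. The following minimal sketch expresses that order; the `Sample` record and `select_for_variant_analysis` function are hypothetical, and folding the type-specific PCR and reverse dot blot results into a single `hpv16` flag is a simplifying assumption rather than the study's actual decision rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """Hypothetical per-biopsy record of screening results (True = positive)."""
    sample_id: str
    beta_globin: bool    # beta-globin PCR, 540 bp: DNA integrity control
    hpv_consensus: bool  # MY09/11 consensus PCR, 450 bp: HPV DNA present
    hpv16: bool          # assumed: HPV-16 type-specific PCR and/or reverse dot blot


def select_for_variant_analysis(samples: List[Sample]) -> List[str]:
    """Return IDs of samples that pass every screening step, in protocol order."""
    selected = []
    for s in samples:
        if not s.beta_globin:
            continue  # DNA not amplifiable: excluded
        if not s.hpv_consensus:
            continue  # no HPV DNA detected
        if not s.hpv16:
            continue  # HPV-positive but not HPV-16
        selected.append(s.sample_id)
    return selected


if __name__ == "__main__":
    demo = [
        Sample("S1", beta_globin=True, hpv_consensus=True, hpv16=True),
        Sample("S2", beta_globin=False, hpv_consensus=True, hpv16=True),
        Sample("S3", beta_globin=True, hpv_consensus=True, hpv16=False),
    ]
    print(select_for_variant_analysis(demo))  # ['S1']
```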

Molecular variant analysis by PCR and sequencing

HPV-16-positive samples from all the centres were subjected to full-length sequencing of L1, E6 and E7 using double-pass sequencing at LabIndia, Gurgaon. Not all samples could be sequenced, and data are shown only for those samples that could be sequenced fully. The E6 (477 bp) and E7 (296 bp) regions of HPV-16-positive samples were amplified by PCR, and the PCR products were sequenced after ExoSAP treatment. Two primer pairs were used, one flanking the extreme ends of the 2 genes and the second within the genes. The E6/E7 forward primer was 5′-AAACTAAGGGCGTAACCGAA-3′ and the E6/E7 reverse primer was 5′-CTTCCCCATTGGTACCTGCAG-3′. The second-set forward primer was 5′-TGGAATCCATATGCTGTATGT-3′ and the reverse primer was 5′-TCCATGCATGATTACAGCTGGGTT-3′. The PCR amplicons were purified using a commercial gel elution kit from Amersham. Twenty-five nanograms of the purified amplified DNA fragment was sequenced using the BigDye Terminator v3.1 Cycle Sequencing Kit and an automated sequencer (Perkin Elmer ABI Prism 310, Applied Biosystems). The primers used for amplification were also used for sequencing.

To amplify and sequence HPV-16 L1 (1595 bp), three primer pairs were used, yielding L1 as three overlapping fragments. The primer pairs were L1-forward CCAAGCTCCTTCATTAATTCCTATAG and L1-reverse TTTACAAGCACATACAAGCACATACA; L1-1-482F GGCATTAGTGGCCATCCTT and L1-1-642R AGTGTTCCCCTATAGGTGGTTT; and L1-2-1023F GTGGTTCTATGGTTACCTCTGAT and L1-2-1191R GAAGTAGATATGGCAGCACATAA. Each of the 3 fragments was sequenced, and the 3 overlapping fragment sequences were compiled into a single contiguous sequence representing the full-length L1.

The software used was DNA Baser for single and batch DNA sequence assembly, BioEdit for editing nucleotide and amino acid sequence alignments, and PyMOL, an open-source tool, for mapping mutations onto the protein 3D structure. To search for nucleotide mutations, the L1 sequences were aligned with the consensus HPV L1 sequence retrieved from NCBI using the multiple sequence alignment (MSA) algorithm implemented in BioEdit. To determine whether these mutations changed the amino acid sequence of the L1 protein, the L1 sequences were translated in silico, and the resulting amino acid sequences were aligned, as before, with the consensus L1 amino acid sequence retrieved from NCBI GenBank. The MSA allowed the amino acid changes to be identified, and the mutations were classified as synonymous (no change in the encoded amino acid) or nonsynonymous (amino acid substitution). Finally, to predict whether the identified amino acid changes affected the 3-dimensional structure of the protein, the mutant amino acid positions were mapped onto the 3D structure of the L1 protein using PyMOL and Swiss-PdbViewer.
Phylogenetic analysis was done by multiple sequence alignment with DNASTAR.
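The in silico translation and synonymous/nonsynonymous classification described above can be illustrated with a short sketch. It is not the BioEdit workflow the study used; it simply applies the standard genetic code to a single-base substitution in an in-frame coding sequence. The function name `classify_substitution` and the toy sequence are hypothetical.

```python
# Standard genetic code: codons enumerated with each base in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_substitution(ref_cds: str, pos: int, alt: str) -> tuple:
    """Classify a single-base substitution in a coding sequence.

    ref_cds: reference coding sequence, in frame from the ATG.
    pos:     1-based position of the substituted base within ref_cds.
    alt:     the variant base.
    Returns ("synonymous" | "nonsynonymous", label); the label is reference
    residue, codon number, variant residue ("*" marks a stop codon).
    """
    ref_cds, alt = ref_cds.upper(), alt.upper()
    idx = pos - 1
    if ref_cds[idx] == alt:
        raise ValueError("variant base is identical to the reference base")
    codon_start = idx - idx % 3  # first base of the affected codon
    ref_codon = ref_cds[codon_start:codon_start + 3]
    offset = idx - codon_start   # 0, 1 or 2 within the codon
    var_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, var_aa = CODON_TABLE[ref_codon], CODON_TABLE[var_codon]
    kind = "synonymous" if ref_aa == var_aa else "nonsynonymous"
    return kind, f"{ref_aa}{codon_start // 3 + 1}{var_aa}"


if __name__ == "__main__":
    toy = "ATGCTTGAA"  # hypothetical ORF: Met-Leu-Glu
    print(classify_substitution(toy, 4, "G"))  # ('nonsynonymous', 'L2V')
    print(classify_substitution(toy, 6, "C"))  # ('synonymous', 'L2L')
```

Changing only the affected codon, rather than re-translating the whole sequence, keeps the comparison local; a stop-gain such as K335X in Table VI would appear here as a label ending in `*`.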

The same strategy was used for the E6-E7 region, except that only 2 fragments were generated, as only two primer pairs were used to amplify the full-length E6-E7 region. The sequences were then compared against the reference sequence (GenBank K02718) using NCBI BLAST17 and classified into their respective variant classes.16 The lineage-defining nucleotide changes were: European prototype (Ep), 350T; E-G350, T350G; E-G131, A131G and T350G; Asian (As), T178G; Asian American (AA), G145T, T286A, A289G, C335T and T350G; African-1 (Af-1), G132C, C143G, G145T, T286A, A289G and C335T; African-2 (Af-2), T109C, G132T, C143G, G145T, T286A, A289G, C335T and G403G.
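Assigning a sample to a lineage from its E6 changes amounts to checking which signature set is fully present. The sketch below uses the signatures listed above for the lineages detected in this study (the African lineages are omitted, an assumption made for brevity). Preferring the longest matching signature and calling a sample Ep when no signature matches are illustrative rules, not necessarily the exact procedure the authors followed; `assign_lineage` is a hypothetical name.

```python
# Lineage-defining E6 nucleotide changes, as listed in the Methods.
# African lineages (not detected in this study) are left out of this sketch.
SIGNATURES = {
    "E-G350": {"T350G"},
    "E-G131": {"A131G", "T350G"},
    "As": {"T178G"},
    "AA": {"G145T", "T286A", "A289G", "C335T", "T350G"},
}


def assign_lineage(changes):
    """Assign an HPV-16 lineage from a collection of E6 nucleotide changes.

    A lineage matches when its complete signature is present. Among matches,
    the longest (most specific) signature wins. Samples carrying none of the
    signatures are called European prototype (Ep, 350T). Equally specific
    matches are reported as ambiguous rather than guessed.
    """
    observed = set(changes)
    matched = [name for name, sig in SIGNATURES.items() if sig <= observed]
    if not matched:
        return "Ep"
    longest = max(len(SIGNATURES[name]) for name in matched)
    best = sorted(name for name in matched if len(SIGNATURES[name]) == longest)
    return best[0] if len(best) == 1 else "ambiguous: " + ", ".join(best)


if __name__ == "__main__":
    print(assign_lineage({"A131G", "T350G"}))
    print(assign_lineage({"G145T", "T286A", "A289G", "C335T", "T350G"}))
```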


Characteristics of the study population

This multicentric study recruited 580 HPV-positive subjects with squamous cell carcinoma of the cervix from 6 different areas of India. A total of 445 of these cervical cancer samples tested positive for HPV-16. The mean age of these patients was 50.4 years (range 25–87 years). Delhi contributed 84 HPV-16-positive samples (mean age 47.8 years, range 30–75 years); Mumbai, 97 (mean age 53.0 years, range 28–80 years); Kolkata, 56 (mean age 49.7 years, range 27–70 years); Bangalore, 77 (mean age 48.4 years, range 25–80 years); Thiruvananthapuram, 64 (mean age 56.2 years, range 28–87 years); and Vellore, 67 (mean age 50.2 years, range 33–82 years).

Distribution of HPV 16 variants into different phylogenetic lineages

Analysis of mutations in the HPV-16 E6/E7 region identified molecular variants of HPV-16 belonging to the different branches of geographical and phylogenetic relatedness. Of the 445 HPV-16-positive samples, only 412 could be sequenced. The study shows a predominance of the European (E) lineage among Indian HPV-16 isolates (Table I). The most prevalent variant in all 6 regions was E350G, seen in 239 of 412 samples (58%) and ranging from 39.3% (Delhi) to 67.2% (Vellore). It was followed by the European prototype Ep (350T), seen in 109 of 412 samples (26.4%) and ranging from 17.7% (Mumbai) to 36.9% (Delhi). The European E-G131 variant was seen in 2.4% of samples. After the European class, the next most prevalent variant at all the centres was the Asian American (11.4%), ranging from 7.2% (Bangalore) to 16.4% (Mumbai). The Asian variant was seen in 1.7% of the samples, from Delhi, Bangalore and Thiruvananthapuram only. The African variants were not seen in any of the samples. There was no correlation between age and the number of variants per case, as a large number of variants were seen even in women above 50 years of age.

Table I. Distribution of HPV-16 Variants Into Phylogenetic Lineages
  1. Distribution of HPV 16 variants into phylogenetic lineages in cervical cancer. “n” denotes the number of samples taken from each centre. “Ep” is European prototype, “E” is European, “As” is Asian and “AA” represents Asian American variant.

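Variant | Delhi (n = 84) | Kolkata (n = 54) | Mumbai (n = 79) | Bangalore (n = 69) | Vellore (n = 67) | Thiruvananthapuram (n = 59) | Total (n = 412)
Ep | 31 (36.9%) | 15 (27.8%) | 14 (17.7%) | 20 (29%) | 18 (26.8%) | 11 (18.6%) | 109 (26.4%)
E-G131 | 9 (10.7%) | 0 | 1 (1.3%) | 0 | 0 | 0 | 10 (2.4%)
E350G | 33 (39.3%) | 31 (57.4%) | 51 (64.6%) | 42 (60.8%) | 45 (67.2%) | 37 (62.7%) | 239 (58%)
As | 2 (2.4%) | 0 | 0 | 2 (2.9%) | 0 | 3 (5.1%) | 7 (1.7%)
AA | 9 (10.7%) | 8 (14.8%) | 13 (16.4%) | 5 (7.2%) | 4 (6.0%) | 8 (13.6%) | 47 (11.4%)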
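The overall percentages in Table I are simple proportions of the 412 sequenced samples. As a quick arithmetic check, the sketch below recomputes them from the lineage totals in the last column of the table; the dictionary keys and the function name are illustrative only.

```python
# Lineage totals from the last column of Table I (412 sequenced samples).
LINEAGE_COUNTS = {"Ep": 109, "E-G131": 10, "E350G": 239, "As": 7, "AA": 47}


def lineage_percentages(counts: dict) -> dict:
    """Share of samples in each lineage, in percent, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    print(sum(LINEAGE_COUNTS.values()))  # 412
    print(lineage_percentages(LINEAGE_COUNTS))
```

The five lineage counts sum to the 412 sequenced samples, and the recomputed shares for E350G (58.0%), AA (11.4%), E-G131 (2.4%) and As (1.7%) agree with the figures given in the text.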

Sequence variations of HPV-16 E6 in cervical cancers

Although detailed bioinformatic analysis of E6/E7 was not possible because no good crystal structures of these proteins are available, Table II lists the nucleotide sequence alterations relative to the full-length HPV-16 E6 found in 412 HPV-16-positive squamous cell cervical carcinomas from the 6 regions. The largest number of E6 variants (25) was seen in samples from Mumbai and the smallest (12) in samples from Bangalore. Samples from Kolkata showed 14 variants, Delhi 15, Thiruvananthapuram 16 and Vellore 13.

Table II. HPV16 E6 Sequence Variations in Cervical Cancer
Amino acid R10GQ14HL19M D25ND25EI26V I27LI27RE29KE29QK34NR48W       F69LY70NS71C Y76HH78YH78C L83V
  1. Variant positions in HPV 16 E6 from the six centres. The nucleotide positions of E6 at which alterations were observed are shown in the second row. The top row shows the amino acid changes, if any. The numbers in brackets represent the number of samples sequenced from each centre.

Nucleotide position111131145158169176178179181182183188188205245254265267279286289293308311315320329335336342350
Delhi (84) 914 112          1 10101  1  10  53
Kolkata (54)  8  1  1   1      88  3   82 39
Mumbai (79)11152 1   1         1513   21 15 165
Bangalore (69)  5   2            55     15  47
Thiruvananthapuram (59)  8   31   1   11 188      8  45
Vellore (67)  4       2 111    44 1    4  49
Total (412)11054213711121211111150481133115021298
Amino acidL100SE113DH126YG130SR135G  C140RC140V  R141K R141SS142TS142L  S143I      
Nucleotide position402442479491506507508521521522522524525526527528529530531532532532536537562      
Delhi (84)     1      1        10         
Kolkata (54)    1    1           6 1       
Mumbai (79)  2    61 11    112219  1      
Bangalore (69)   1  1        1     31        
Thiruvananthapuram (59) 3       8   11      2         
Vellore (67)1  1                 4         
Total (412)13221116191111111122134111      

A total of 56 different variants were seen in the samples from the 6 centres. Of these, 31 were nonsynonymous and the remainder synonymous relative to the HPV-16 E6 prototype sequence. Nine deletions and 2 insertions were seen. The most prevalent variant was T350G, which leads to the amino acid change L83V and was seen in 72.3% of the samples: in 63% of samples from Delhi, 82% from Mumbai, 72% from Kolkata, 68% from Bangalore, 76% from Thiruvananthapuram and 73% from Vellore. The other nonsynonymous changes seen at all the centres were G145T, which leads to the amino acid change Q14H and was seen in 13.1% of the samples, and C335T, which leads to H78Y and was seen in 12.1% of the samples. The remaining nonsynonymous changes observed were R10Q, L19N, D25N, D25E, I26V, I27N, I27R, E29Q, E29K, K34N, R48W, F69L, Y70N, S71C, Y76H, H78C, L100S, E113D, H126Y, G130S, R135G, C140R, C140V, R141K, R141S, S142T, S142L and S143I, which belong to the European or Asian American lineage. The amino acid changes observed are listed in Table III along with their biological relevance. The frequent synonymous changes were T286A (12.1%), A289G (11.7%) and A532G (8.3%), some of which were common to different centres. The novel variants seen in the E6 region are Y70N, H78C, R135G, I26U and S142L. A number of samples from all the centres showed covariation in the E6, E7 and L1 regions.
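Nucleotide changes such as T350G are named by genome position, whereas the protein changes such as L83V use codon numbers. A small sketch of the conversion follows. The ORF start coordinates in `ORF_START` are an assumption (the first base of each ATG in HPV-16 genome numbering); they were chosen because they reproduce the pairings in Tables III, V and VI, for example T350G to codon 83 and G145T to codon 14. The function names are hypothetical.

```python
# Assumed first nucleotide (A of the ATG) of each ORF in HPV-16 genome coordinates.
# These values are an assumption chosen to reproduce the numbering in Tables III, V, VI.
ORF_START = {"E6": 104, "E7": 562, "L1": 5559}


def codon_number(genome_pos: int, orf: str) -> int:
    """1-based codon (amino acid) number that contains a genome position."""
    offset = genome_pos - ORF_START[orf]
    if offset < 0:
        raise ValueError("position lies upstream of the ORF start")
    return offset // 3 + 1


def codon_position(genome_pos: int, orf: str) -> int:
    """Position (1, 2 or 3) of the nucleotide within its codon."""
    offset = genome_pos - ORF_START[orf]
    if offset < 0:
        raise ValueError("position lies upstream of the ORF start")
    return offset % 3 + 1


if __name__ == "__main__":
    for pos, orf in [(350, "E6"), (145, "E6"), (335, "E6"), (647, "E7"), (6432, "L1")]:
        print(orf, pos, "-> codon", codon_number(pos, orf),
              "position", codon_position(pos, orf))
```

The codon position is a useful consistency check: T350G falls at the first base of codon 83, where a T to G change turns a TTx leucine codon into a GTx valine codon (L83V), and G145T falls at the third base of codon 14, turning CAG (Gln) into CAT (His), as in Q14H.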

Table III. Summary of HPV-16 E6 Sequence Variations from all the Centres
Sequences analyzed at gene and protein levels
S.No. | Genome | Protein | Sample % | Biological function(s) affected
  1. Summary of the frequency distribution of HPV 16 E6 variants from all the centres. Sequence analysis at the gene and protein level is shown, along with the biological functions affected.

1 | A131G | R10G | 2.5 | p53 binding & degradation, B & T cell epitope
2 | G145T | Q14H | 13.1 | p53 binding & degradation, B & T cell epitope
3 | C158A | L19N | 0.5 | B & T cell epitope
4 | G176A | D25N | 0.7 | T cell epitope
5 | T178G | D25E | 1.7 | T cell epitope
6 | T179G | I26V | 0.24 | T cell epitope
7 | A182T | I27L | 0.24 | T cell epitope
8 | T183G | I27R | 0.5 | T cell epitope
9 | G188C | E29Q | 0.5 | T cell epitope
10 | G188A | E29K | 0.24 | T cell epitope
11 | G205T | K34N | 0.24 | Zn binding domain
12 | C245T | R48W | 0.24 | T cell epitope
13 | T308C | F69L | 0.24 | T cell epitope, E6 transcriptional transactivation
15 | C315G | S71C | 0.7 | T cell epitope
16 | T329C | Y76H | 0.24 | p53 binding & degradation, T cell epitope
17 | C335T | H78Y | 12.1 | p53 binding & degradation, T cell epitope, E6 transcriptional transactivation
18 | A336G | H78C | 0.5 | E6 transcriptional transactivation, T cell epitope
19 | T350G | L83V | 72.3 | E6 transcriptional transactivation, T cell epitope, p53 degradation
20 | T402C | L100S | 0.24 | E6 transcriptional transactivation, T & B cell epitope
21 | A442C | E113D | 0.7 | E6AP binding, p53 binding and degradation
22 | C479T | H126Y | 0.5 | E6AP binding, p53 binding and degradation, T cell epitope, E6 transcriptional transactivation
23 | G491A | G130S | 0.5 | E6AP binding, p53 binding and degradation, T cell epitope
24 | C506G | R135G | 0.24 | E6 transcriptional transactivation, p53 binding and degradation, B & T cell epitope
25 | T521G | C140R | 1.45 | E6 transcriptional transactivation, p53 binding and degradation, B & T cell epitope
26 | T521G | C140V | 0.24 | E6 transcriptional transactivation, p53 binding and degradation, B & T cell epitope
27 | 524insA | R141K | 0.24 | E6 transcriptional transactivation, p53 binding and degradation, B & T cell epitope
28 | A526T | R141S | 0.24 | E6 transcriptional transactivation, p53 binding and degradation, B & T cell epitope
29 | T527A | S142T | 0.24 | p53 binding and degradation, B cell epitope
30 | C528T | S142L | 0.24 | p53 binding and degradation, B cell epitope
31 | C531T | S143I | 0.5 | p53 binding and degradation, B cell epitope

Sequence variations in HPV-16 E7 in cervical cancers

E7 appeared to be better conserved than E6, as has been reported earlier. E7 sequencing data from 412 samples from the different centres showed a total of 29 variants (Table IV). Samples from Delhi and Mumbai showed 7 variants each, Kolkata 10, Bangalore 9, Thiruvananthapuram 8 and Vellore the maximum of 14. The common variants detected at all the centres were T789C (11.9%) and T795G (11.9%), but neither led to an amino acid change. The most prevalent nonsynonymous variant, T732C, which leads to the amino acid change F57V, was seen in 9% of the samples. The amino acid changes observed in the E7 region, along with their biological implications based on the literature, are listed in Table V. The novel variants seen in the E7 region are V90C, C91A, Q96H, H51Y, S95L and 844delT; the deletion leads to a frameshift.
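Whether an insertion or deletion shifts the reading frame depends only on whether its length is a multiple of three. The sketch below contrasts a single-base deletion, of the kind listed for E7 (e.g., 844delT), with a three-base deletion that removes one residue but keeps the frame, as with the in-frame L1 changes 448insS and 465delD. The toy sequence and the helper names (`translate`, `delete`, `indel_effect`) are hypothetical and not taken from the HPV-16 genome.

```python
# Standard genetic code: codons enumerated with each base in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate in frame 1; a trailing incomplete codon is ignored."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def delete(seq: str, start: int, length: int) -> str:
    """Remove `length` bases starting at 1-based position `start`."""
    return seq[:start - 1] + seq[start - 1 + length:]


def indel_effect(length: int) -> str:
    """Indels whose length is a multiple of 3 keep the reading frame."""
    return "in-frame" if length % 3 == 0 else "frameshift"


if __name__ == "__main__":
    ref = "ATGGCTAAAGGTTGA"  # hypothetical toy ORF: Met-Ala-Lys-Gly-stop
    print(translate(ref))                                  # MAKG*
    print(translate(delete(ref, 5, 1)), indel_effect(1))   # MVKV frameshift
    print(translate(delete(ref, 7, 3)), indel_effect(3))   # MAG* in-frame
```

After the single-base deletion every downstream codon is read differently and the original stop codon is lost from the frame, whereas the three-base deletion removes only the lysine and leaves the rest of the protein intact.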

Table IV. HPV16 E7 Sequence Variations in Cervical Cancer
Amino acid T20SL28FN29S    A50TH51Y F57VFS FS    G88RFSI89FV90C,C91AFS FSS95LQ98HFS,K97N,P98H
  1. Distribution of HPV 16 E7 variant positions from the six centres. The nucleotide positions of E7 at which changes were observed are shown in the second row. The top row shows the amino acid changes, if any. The numbers in brackets represent the number of samples sequenced from each centre.

Nucleotide position585619645647663666668687709712730732746746747756789795823826826829842843844846849852854
Delhi (84)   21      9 1  992 1  1 2   
Kolkata (54)     2 1  16    881     8 11 
Mumbai (79) 1         121   1515   1  1    
Bangalore (69)   2       3  2 55    5242   
Thiruvananthapuram (59)  33  1    3    88     3 3   
Vellore (67)11      11 4  1144 14   1  21
Total (412)123712111113711314949315156147131
Table V. Summary of HPV-16 E7 Sequence Variations from all the Centres
Sequences analyzed at gene and protein level
S.No. | Genome | Protein | Sample % | Biological function(s) affected
  1. Summary of the percentage distribution of HPV 16 E7 variants from all the centres. Sequence analysis at the gene and protein level is shown. The biological functions affected are listed.

1 | A619T | T20S | 0.5 | DNA synthesis, E2F-pRb dissociation, pRb binding, NLS
2 | A645C | L28F | 0.7 | DNA synthesis, E2F-pRb dissociation, pRb binding, NLS
3 | A647G | N29S | 1.7 | DNA synthesis, E2F-pRb dissociation, pRb binding, NLS
4 | G709A | A50T | 0.24 | Metal binding domain, pRb binding, E2F-pRb dissociation
5 | C712T | H51Y | 0.24 | Metal binding domain, pRb binding, E2F-pRb dissociation
6 | T732C | F57V | 9.0 | Metal binding domain, pRb binding, E2F-pRb dissociation
7 | 746delA | Frameshift | 0.24 | 
8 | 747delG | Frameshift | 0.7 | 
9 | G823A | G88R | 1.7 | Metal binding domain, E2F-pRb dissociation
10 | A826T | I89F | 1.2 | Metal binding domain, E2F-pRb dissociation
11 | 826delA | Frameshift | 0.24 | 
12 | 829delG | V90C, C91A | 0.24 | Metal binding domain, E2F-pRb dissociation
13 | 842delG | Frameshift | 1.2 | 
14 | 844delT | Frameshift | 3.4 | 
15 | T846C | S95L | 1.7 | Metal binding domain, E2F-pRb dissociation
16 | G849C | Q96H | 0.24 | Transformation, metal binding domain, mediates association with multiple cellular host proteins
17 | 852delA | Frameshift | 0.7 | 

HPV-16 L1 sequence variations

The L1 region showed a total of 204 variants, 67 of them nonsynonymous. Most of the observed variants have not been reported earlier. The largest number of L1 variants (84) was seen in samples from Kolkata (Table VI). Samples from Mumbai and Bangalore showed 56 variants each, Delhi 58, Thiruvananthapuram 45 and Vellore 62. The change C6240G at the genome level (H228D at the protein level) was seen in 100% of the samples from all the centres except Delhi, where it was seen in 63% of the samples. This residue lies in the site for viral assembly and is also a B cell epitope. An insertion of ATC at C6901 at the genome level led to the insertion of a serine at amino acid position 448; this change was seen in all the samples from all the centres. A deletion of GAT at nucleotide 6590 led to the deletion of aspartate at amino acid 465 and was seen in 100% of the samples. This insertion and deletion have not been reported earlier. Other frequent variants detected at all the centres were C5562G (Q2E at the protein level), ranging from 9 to 19%; C5682T (H102Y), 8 to 19%; C6163A (T202N), 9 to 21%; A6178C (N207T), 5 to 16%; A6693C (T379P), 8 to 21%; and G7058T (L500F), 8 to 18%. Other frequent changes that did not lead to any amino acid change were T5909C (8–19%), A6023T (1–9%), T6245C (8–21%), A6314G (3–7%), C6539A (1–7%), C6557T (8–21%), A6665C (7–22%), G6719A (8–21%), C6852T (10–19%), C6863T (8–18%), C6968T (8–18%), G6992A (2–20%) and G7058A (2–20%). The variants observed in the L1 region are shown in Table VI along with their biological implications.
The novel variants observed at the protein level in L1 are Q2E, T4P, C13W, E15G, Y21C, F24S, M27K, N58H, V91I, H102Y, T155A, T202N, N203I, A205T, N207T, H228D, D235Y, S265L, R277L, T292A, N296T, D300E, N311T, A313S, S315L, N316H, N334K, K335X, P336T, P336L, L338I and V359D, together with C5600A at the nucleotide level.

Table VI. HPV16 L1 Sequence Variations in Cervical Cancer
S.No.VariantsDelhiMumbaiKolkataBangaloreThiruvananthapuramVelloreTotal %Biological function(s) affected
GenomeProteinn = 69n = 60n = 42n = 53n = 35n = 59n = 318
  1. Summary of the distribution of HPV 16 L1 sequence variations at the genome and protein level from all the centres, along with their biological functions affected.

2A5568CT4P   1.9%  0.31% 
35595insC    3.8%  0.63% 
4T5597GC13W    2.85% 0.47% 
5T5597A    3.8%  0.63% 
6T5598GY14D    2.85% 0.47% 
7C5600AY14X 3.3% 1.9%  0.86%Nonsense
8A5602GE15G   1.9%  0.31% 
95605delA     2.85% 0.47% 
10G5607AD17N    2.85% 0.47% 
11C5609AD17E 3.3%    5.5% 
12A5620GY21C1.45% 9.5%  1.7%2.1% 
13T5629CF24S  2.4%   0.4% 
14G5636A    1.9%  0.31% 
15T5638AM27K  2.4%   0.4% 
165640delT     5.71% 0.95% 
17T5673C 1.45%3.3%   3.4%6.3% 
18T5681C 1.45%     0.24% 
19A5687G      1.7%0.28% 
20G5696A 8.7%18.3%9.5%7.5%11.4%8.5%10.6% 
21A5702C      1.7%0.28% 
22A5730CN58H   1.9%  0.31%B cell epitope
23C5756G   2.4%   0.4% 
24A5795G 1.45%     0.24% 
25A5797CK80T  2.4%   0.4%B cell epitope
26T5801G    1.9%  0.31% 
27A5813G 1.45%     0.24% 
28A5816T 1.45% 2.4%   0.63% 
29G5829AV91I  9.5%   1.6%B cell epitope
30A5834G 1.45%   8.57%1.7%1.0% 
31A5847C    3.8%  0.6% 
32T5855C  1.7%2.4%   0.67% 
33C5862TH102Y8.7%18.3%19%7.5%11.42%8.5%12.2%T & B cell epitope
34G5871TD105Y1.45%     0.24%B cell epitope
35T5906C    1.9%  0.31% 
36T5909C 8.7%18.3%19%7.5%11.42%8.5%10.8% 
37C5967T 1.45%     0.24% 
38T5999G    1.9%  0.31% 
396020delC  1.7%    0.28% 
40A6021GT155A 1.7%    0.28%L1-L1-interface
416022delC      5.1%0.85% 
42A6023T 1.45%3.3%7%9.4% 1.7%3.8% 
43A6023C      1.7%0.28% 
44G6024AE156K 1.7%    0.28%Assembly
456028delA   2.4%1.9%  0.7% 
466049insA   2.4%   0.4% 
47G6059A      1.7%0.28% 
48A6068G 1.45%  1.9% 1.7%0.83% 
496087delT      1.7%0.28% 
506092delA   4.8%   0.8% 
516105delT  3.3%2.4%   0.95% 
52C6163AT202N7.2%15%21.4%7.5%11.42%8.5%13.9%B cell epitope & assembly
53A6166TN203I   1.9%  0.31%B cell epitope
54G6171AA205T 1.7%7%1.9%2.85% 2.2%B cell epitope
55A6177C    1.9%  0.31% 
56A6178CN207T7.2%13.4%16.7%3.8%8.57%8.5%9.7%B cell epitope
57A6182C 1.45%     0.24% 
58A6200G    1.9%  0.31% 
59T6230C     2.85% 0.47% 
61T6245C 7.2%18.3%21.4%7.5%11.42%8.5%5.3% 
626253delG    1.9%  0.31% 
63G6261TD235Y 1.7%    0.28%B cell epitope
64T6269C      1.7%0.28% 
65G6278A 1.45%     0.24% 
666283insA    1.9%  0.31% 
67A6293CE245D   1.9%  0.31% 
68G6302A    1.9%  0.31% 
69A6314G  5%7%3.8%5.71%1.7%5.2% 
70T6317A      1.7%0.28% 
71C6352TS265L   1.9%  0.31% 
72C6365T 1.45%1.7%    0.5% 
73C6388TR277L4.3%1.7%    1.0%B cell epitope
74A6389G 1.45%15%   1.7%2.7% 
75A6432GT292A56.5%95%88%83%94.28%93.2%85%L1-L1 interface,B cell epitope
76A6445CN296T  2.4%1.9%  0.7%B cell epitope
77A6452G 1.45%   8.5%1.7%1.9% 
78A6458GD300E    5.71% 0.95%B cell epitope
79T6482C      1.7%0.28% 
80A6490CN311T   1.9%  0.31%B cell epitope
81G6495TA313S1.45%1.7%    0.28%B cell epitope
82C6502TS315L   1.9%  0.31%Assembly, B cell epitope
83A6504CN316H   1.9%  0.31%B cell epitope
84A6518G 1.45%     0.24% 
85C6539A 1.45%5%7%1.9% 1.7%2.8% 
86T6553C    1.9%  0.31% 
876554delA  1.7%    0.28% 
88A6554C      1.7%0.28% 
89C6557T 8.7%18.3%21.4%7.5%11.42%8.5%12.6% 
90A6559T   2.4%   0.4% 
91T6560AN334K   1.9%2.85% 0.78%B cell epitope
926560delT   2.4%   0.4% 
93A6561T    1.9%2.85% 0.78% 
946561delA   2.4%   0.4% 
956563delAK335X 1.7% 1.9%2.85%1.7%1.3%Nonsense, B cell epitope
96C6564AP336T  2.4%1.9%  0.71% 
976564delC    1.9%  0.31% 
98C6565TP336L   1.9%  0.31% 
99T6573AL338I1.45%     0.24% 
100A6581C    1.9%  0.31% 
101A6581T    1.9% 3.4%0.9% 
1026606insT   2.4%   0.4% 
103T6608C    1.9%  0.31% 
1046626insT   2.4%   0.4% 
105T6634AV359D1.45%     0.24%B cell epitope
1066638delT   2.4%   0.4% 
107G6652CS365T  2.4%   0.4%B cell epitope
1086654delA     2.85% 0.47% 
109A6656G      1.7%0.28% 
1106658delA    1.9%  0.31% 
1116663insA     2.85% 0.47% 
112A6665C 14.5%10%7%11.3%22.9%17%13.8% 
113A6668G      1.7%0.28% 
1146669delT   2.4%   0.4% 
115A6693CT379P7.2%16.7%21.4%7.5%11.42%10.2%12.4%B cell epitope
1166706delA  1.7%    0.28% 
117G6719A 7.2%18.3%21.4%7.5%11.42%8.5%12.4% 
118C6726T      1.7%0.28% 
119G6737T    1.9%  0.31% 
120A6779T 1.45% 7%   1.4% 
121A6801TT415S8.7%11.7%14.3%3.8%5.71%6.8%8.5%B Cell epitope
122T6822CS422P     1.7%0.28%B Cell epitope
123C6824T   2.4%   0.4% 
124G6836A      1.7%0.28% 
125C6852T 8.7%18.3%19%15%11.42%13.6%14.3% 
1266857delA   2.4%   0.4% 
127T6860C  1.7%   1.7%0.56% 
128C6863T 7.2%18.3%16.7%7.5%11.42%8.5%11.6% 
1296871delG   2.4%   0.4% 
130G6879AE441K1.45%1.7%    0.51%B Cell epitope
131T6887G 4.3%     0.71% 
132A6891C     2.85% 0.47%B Cell epitope
133G6893A      1.7%0.28% 
1346901insAC448insS100%100%100%100%100%100%100%B Cell epitope
1356909delA   2.4%   0.4% 
136T6911A    1.9%  0.31% 
137T6914C    1.9%  0.31% 
1386918delC   2.4%   0.4% 
1396934delC   2.4%   0.4% 
140A6935T  3.3%    0.55% 
141G6936A 1.45%     0.24% 
142A6938C      1.7%0.28% 
143G6945AE463K 3.3%    0.55%B Cell epitope
144A6947CE463D     1.7%0.28% 
145A6947G  1.7% 3.8% 6.8%2.05% 
1466950delGT465delD100%100%100%100%100%100%100%B Cell epitope
147C6957TL467F     1.7%0.28% 
148A6961CK468T   5.7%  0.95% 
149T6966C   2.4%   0.4% 
150C6968T 7.2%18.3%16.7%7.5%11.42%8.5%11.6% 
151C6969T   2.4%   0.4% 
152C6970AT471N   5.7%  0.95% 
153T6971G     2.85% 0.47% 
154T6972C   2.4%   0.4% 
155G6978A   2.4%   0.4% 
156A6979G     2.85% 0.47% 
157A6983C  1.7%2.4%   0.7% 
1586988delT   2.4%   0.4% 
159A6989G      1.7%0.28% 
1606989delA   2.4%   0.4% 
1616990delA   2.4%   0.4% 
162G6992A 7.2%20%23.8%7.5%11.42%8.5%13.06% 
163A6994G   2.4%   0.4% 
164A6997GK480R   5.7%  0.95% 
165T6999G   2.4%   0.4% 
166G7008CD484H2.9%5%  2.85% 1.79%B Cell epitope
167G7008A   2.4%   0.4% 
168C7011AL485I    5.71% 0.95%B Cell epitope
1697011delC   2.4%   0.4% 
170C7017A   2.4%   0.4% 
171T7025G  1.7%    0.28% 
1727027delT   2.4%   0.4% 
173A7028G    1.9%  0.31% 
174G7033CR492P  2.4%   0.4% 
1757049delA   2.4%   0.4% 
176G7058A  3.3%2.4%9.4%20%3.4%6.4% 
178G7062C   2.4%   0.4% 
179A7073T   2.4%   0.4% 
1807075delT   2.4%   0.4% 
181T7076C 1.45%1.7% 3.8% 1.7%1.44% 
1827076delT   2.4%   0.4% 
1837077delA   2.4%   0.4% 
184G7084AG509E   1.9%  0.31% 
185A7087CK510T 1.7%    0.28%NLS
186G7090AR511Q4.3%5%2.4%   1.95%NLS
187A7093CK512T   1.9%  0.31%NLS
188A7094G 2.9%     0.48% 
1897098delA   2.4%   0.4% 
190C7106A   2.4%   0.4% 
191T7110AS518T    2.85% 0.47% 
192C7117AT520N     1.7%0.28% 
1937120insTC     2.85% 0.47% 
194T7130A      1.7%0.28% 
195T7130G   4.8%   0.8% 
196A7132CK525T  2.4%   0.4%NLS
1977135delG   2.4%   0.4% 
198C7143A      1.7%0.28% 
199G7144C   2.4%   0.4% 
200T7145G      1.7%0.28% 
201A7147CK530T     1.7%0.28%NLS
202T7150AL531Q 1.7%    0.28% 
203A7154G  1.7%    0.28% 
204A7154CX532Y 1.7%    0.28% 

Figure 1 shows the positions of the various L1 mutations on the three-dimensional structure of the protein (Protein Data Bank ID: 1DZL). Table VII lists the variant positions that were mapped onto the L1-L1 interface of the L1 pentamer structure (PDB ID: 2R5H). Only nonconserved amino acid substitutions are shown, since conserved changes are considered highly unlikely to disrupt the protein interface. Table VIII lists the nonconserved amino acid residues that map onto experimentally validated linear and conformational HPV-16 L1 epitopes.

Figure 1. Positions of the various L1 variants on the three-dimensional structure of the protein (PDB ID: 1DZL).

Table VII. List of Positions of Variants that were Mapped Onto the L1-L1 Interface from the L1 Pentamer Structure (PDB ID: 2R5H)
SL no. | Amino acid residues in the L1-L1 interface in the pentamer | Amino acid positions of the variants in L1
3 | T155-A160 | T155A, E156K
Note: The amino acid residues involved in the L1-L1 interface and the variants that map onto these residues. Nonconserved amino acid residues predicted to disrupt L1 pentamer formation are given in boldface.

Table VIII. Nonconserved Amino Acid Residues that Map Onto Linear and Conformational HPV16 L1 Epitopes
SL no. | L1 epitope along with defining residue positions | Amino acid positions of the variants
4 | KGSPCTNVAVNPGDCPPLEL (197-216) | T202N, N203I, A205T, N207I
6 | VHTGFGAMDFTTLQ (227-241) | H228D, D235Y
9 | YIKGSGSTANLASSNYFPTP (302-321) | N311T, A313S, S315L, N316H
10 | YFPTPSGSMVTSDAQIFNKP (317-336) | N334K, K335, P336T, P336I
Note: Experimentally validated epitopes derived from HPV16 L1, with their sequences and amino acid positions. The epitopes marked in boldface are likely to be affected by the amino acid variants found in this study.

Biological implications of the identified variants: in silico analysis

Theoretical translation of the HPV16 L1, E6 and E7 gene sequences allowed us to distinguish nucleotide changes (point mutations, deletions and insertions) that altered the amino acid sequence of these proteins from silent mutations/polymorphisms that did not affect protein primary structure. Although the biological significance of silent mutations is difficult to assess in silico, predicting the effects of amino acid changes on protein structure and function is more meaningful and accurate.
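A minimal Python sketch of the theoretical translation step is given here for illustration. The ORF start coordinates (104 for E6, 562 for E7, 5559 for L1) are our assumption, chosen because they reproduce the nucleotide/amino acid pairings reported in this study, for example T350G/L83V in E6 and G6879A/E441K in L1. The codons in the usage lines are illustrative and are not taken from the HPV16 prototype sequence, and insertions and deletions are not handled.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumed first nucleotide of each ORF (HPV16 reference numbering).
ORF_START = {"E6": 104, "E7": 562, "L1": 5559}


def locate(gene, nt_position):
    """Map a genome coordinate to (codon number, position 1-3 within the codon)."""
    offset = nt_position - ORF_START[gene]
    if offset < 0:
        raise ValueError(f"{nt_position} lies upstream of the {gene} start codon")
    return offset // 3 + 1, offset % 3 + 1


def classify(ref_codon, codon_pos, alt_base):
    """Translate the reference and mutated codons and label the substitution."""
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[: codon_pos - 1] + alt_base.upper() + ref_codon[codon_pos:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return ref_aa, alt_aa, kind


if __name__ == "__main__":
    print(locate("E6", 350))   # (83, 1): first base of codon 83 (L83V)
    print(locate("L1", 6879))  # (441, 1): first base of codon 441 (E441K)
    # Illustrative codons only; a real analysis reads them from the reference sequence.
    print(classify("AAA", 2, "C"))  # ('K', 'T', 'nonsynonymous')
    print(classify("CTT", 3, "C"))  # ('L', 'L', 'synonymous')
```

In practice, the reference codon passed to `classify` would be read from the prototype sequence at the codon number returned by `locate`.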

Because a high-resolution X-ray crystal structure of the HPV16 L1 protein is available in the Protein Data Bank (PDB ID: 1DZL), we used structural information to guide our predictions. This structure contains coordinates for residues 46 to the C-terminus of L1, which allowed us to map most of the amino acid changes identified in this study onto the 3D structure.

To assess the effect of these point mutations on protein function, 2 aspects were considered. First, we checked whether any of the mutations could disrupt pentamer formation by HPV L1, which is required for viral propagation in vitro. For this, we identified the interface residues taking part in L1-L1 protein-protein interaction and then selected, from our list of L1 amino acid mutations, those located at these interface positions. The potential of these mutations to disrupt the L1-L1 interaction was assessed from the physico-chemical nature of the amino acid side-chains. For example, a hydrophobic to polar change (nonconserved) was considered more likely to disrupt interactions than a hydrophobic to hydrophobic change (conserved). In this way, we identified a subset of mutations more likely to disrupt L1 pentamer formation (Tables VI and VII, Fig. 1).
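The conserved/nonconserved filter described above can be sketched in Python as follows. The amino acid grouping is a simplified assumption of ours (hydrophobic, polar, positive, negative, and a special group for G and P), not the exact scheme used in this study; the interface range is the T155-A160 segment listed in Table VII. Letters outside the grouping, such as X for a stop codon, are treated as a class of their own.

```python
import re

# Simplified side-chain classes; an illustrative assumption, not the study's scheme.
GROUPS = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQY",
    "positive": "KRH",
    "negative": "DE",
    "special": "GP",
}
CLASS_OF = {aa: name for name, members in GROUPS.items() for aa in members}

# Matches simple substitutions such as "T155A"; indels like "448insS" do not match.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse(variant):
    """Split 'T155A' into ('T', 155, 'A'); return None for other notations."""
    match = SUBSTITUTION.match(variant)
    if match is None:
        return None
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def is_nonconserved(ref, alt):
    """True when the two residues fall in different side-chain classes."""
    # Letters outside the grouping (e.g. 'X' for a stop codon) form their own class.
    return CLASS_OF.get(ref, ref) != CLASS_OF.get(alt, alt)


def interface_hits(variants, interface_ranges):
    """Keep nonconserved substitutions whose position lies in an interface range."""
    hits = []
    for variant in variants:
        parsed = parse(variant)
        if parsed is None:
            continue
        ref, pos, alt = parsed
        in_interface = any(lo <= pos <= hi for lo, hi in interface_ranges)
        if in_interface and is_nonconserved(ref, alt):
            hits.append(variant)
    return hits


if __name__ == "__main__":
    variants = ["T155A", "E156K", "H228D", "T292A", "448insS"]
    print(interface_hits(variants, [(155, 160)]))  # ['T155A', 'E156K']
    print(is_nonconserved("L", "V"))  # False: both hydrophobic
```

Under this grouping, a change such as L83V would count as conserved, which illustrates how strongly the outcome depends on the chosen classification.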

Second, to assess the possible impact of HPV16 L1 variants on vaccine efficacy, we investigated whether any of the identified variants could affect L1 epitopes. We compiled a list of experimentally identified linear and conformational epitopes of HPV16 L1 and mapped all the nonconservative changes onto the epitope sequences, assuming that nonconserved amino acid changes are significantly more likely than conserved changes to affect epitope-receptor and epitope-HLA interactions (Tables VI and VIII, Fig. 1).
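The epitope-mapping step can be sketched in Python with the four linear L1 epitopes listed in Table VIII. The function assumes that the input has already been restricted to nonconservative substitutions; as a sanity check, it confirms that the reference residue of each variant matches the epitope sequence at that position. Computing each epitope's end from its sequence length is an implementation choice of this sketch, and entries without a substituted residue (such as K335) are skipped.

```python
import re

# Linear L1 epitopes from Table VIII: (epitope number, first residue, sequence).
EPITOPES = [
    ("4", 197, "KGSPCTNVAVNPGDCPPLEL"),
    ("6", 227, "VHTGFGAMDFTTLQ"),
    ("9", 302, "YIKGSGSTANLASSNYFPTP"),
    ("10", 317, "YFPTPSGSMVTSDAQIFNKP"),
]

SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def map_to_epitopes(variants, epitopes=EPITOPES):
    """Return {epitope number: [variants whose position falls inside it]}."""
    hits = {label: [] for label, _, _ in epitopes}
    for variant in variants:
        match = SUBSTITUTION.match(variant)
        if match is None:
            continue  # skip indels and incomplete entries such as "K335"
        ref, pos = match.group(1), int(match.group(2))
        for label, start, seq in epitopes:
            end = start + len(seq) - 1  # end derived from the sequence length
            if start <= pos <= end:
                if seq[pos - start] != ref:
                    raise ValueError(f"{variant}: residue mismatch in epitope {label}")
                hits[label].append(variant)
    return hits


if __name__ == "__main__":
    observed = ["T202N", "A205T", "H228D", "D235Y", "N316H", "N334K", "T292A", "K335"]
    for label, found in map_to_epitopes(observed).items():
        print(label, found)
```

Because epitopes 9 and 10 overlap at residues 317-321, a variant in that stretch would be reported under both, which matches how overlapping epitopes are counted in a table of this kind.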


Epidemiological data suggest that variants of the same HPV type are biologically distinct and may confer different pathological risks.23 To evaluate the health impact of HPV infections, and to design HPV vaccines, it is necessary to know the distribution of oncogenic HPV variants in different geographical regions and populations. Very few studies on the molecular epidemiology of HPV16 isolates from India are available in the literature. We therefore characterized HPV16 molecular variants by sequencing the L1, E6 and E7 genes. This is the largest study to date to evaluate the association between HPV-16 variants and cervical cancer in the Indian context. The few available reports on HPV16 variants in India12, 22 show a predominance of the European lineage, with the E350G variant being the most prevalent. In agreement with these earlier reports, our findings show predominantly European variants, with the E350G variant detected more frequently than the prototype E350T. Other variant classes were not significantly represented in our samples, except for the Asian American class (11.4%). A study of French women has also shown increased cervical disease progression with the HPV16 E350G variant.24 Similarly, Londesborough et al,25 studying genetic variability in the HPV-16 E6 gene in a cohort of women from England, described a T to G substitution at nucleotide 350 that was predominantly associated with viral persistence and risk of cervical neoplasia. We observed differences in the percentage distribution of the variants across regions; this may be associated with ethnicity, given the extensive admixture of ethnic groups in the Indian population.

The E6 and E7 proteins of HPV-16 are important for several viral properties, such as replication and transcription of viral DNA, interaction with the cytoskeletal network, immortalization and transformation.26 Sequence variants in one or more of these proteins may alter biological function and thus affect the clinical outcome. The E6 and E7 viral proteins interact with a wide range of cellular proteins, although they are best known for their ability to bind to and inactivate p53 and pRb, respectively. Studies have revealed strong intergene sequence covariation within individual HPV-16 isolates.27–30 We observed a number of variations in the E6 region. Many of these were at sites responsible for p53 binding and degradation or for E6 transcriptional transactivation, and some were at B and/or T cell epitope sites. The functional significance of E6 polymorphism and the carcinogenic potential of E6 have been addressed by in vitro and in vivo studies.28 Differences between the E6 prototype and variants, and among variants, were observed in assays of p53 degradation, Bax degradation and inhibition of p53 transactivation. The HPV16 E6 variant E350G (L83V) degraded Bax and bound E6-binding protein more efficiently, indicating that amino acid variations in HPV-16 E6 can alter protein activities important for its carcinogenic potential.28 The oncogenicity of the L83V variant has been shown to vary geographically, possibly because of genetic differences between populations or because it requires the simultaneous presence of other co-mutations.29–32 In our study, L83V was the most prevalent variant, and some samples carried up to 13 covariants involving the E6, E7 and L1 genes. Another possible parameter linking HPV16 E6 variants and oncogenicity is the cellular immune response, particularly HLA distribution. 
It is known that the presentation of viral peptides to T cells in the context of HLA class I and II molecules is influenced by genetic polymorphism of both HPV and HLA.27, 32 An HPV-16 E6 variant has been described with a nucleotide substitution A131G and a consequent amino acid change R10G that alters an HLA-B7 peptide-binding epitope.33 In our study, we found this change only in samples from Delhi and Mumbai. This alteration influences immune recognition of HPV-16-infected cells.

Differences in the functional activity of naturally occurring HPV-16 E6 variants have been demonstrated.34, 35 The biological activity of the HPV-16 E6 protein and its ability to induce p53 degradation in vitro are directly correlated,35 and both have been shown to be affected by a mutation in the E6 ORF leading to the amino acid change N58S.19 However, this change was not seen in any of our samples. The E6 changes seen in samples from southern India (Bangalore, Thiruvananthapuram and Vellore), such as D25E, H78Y, E113D and F69L, are similar to those reported by Wu et al2 and Pande et al.12 However, the samples from Mumbai and Kolkata showed D25N instead of D25E. The variant S142T, reported as novel by Pande et al,12 was seen in one of our samples. Other identified functions of the E6 oncoprotein are cell immortalization, antiapoptotic effects, chromosomal destabilization, telomerase activation and perhaps blockage of interferon functions.36 Thus, E6 is multifunctional, and the observed variants can affect one or more of these functions.

The identified functions of E7 include cell immortalization, activation of cyclins E and A, inhibition of cyclin-dependent kinase inhibitors, enhancement of foreign DNA integration and mutagenicity.36 Several groups investigating different geographical populations around the world have suggested that the E7 oncoprotein is highly conserved in vivo.37, 38 In our study, too, E7 was better conserved than E6. However, studies from Japan, Korea, Indonesia and China have reported a high frequency of E7 gene mutations. In a Korean study,39 the prototype was detected in only 10% of invasive cervical carcinomas, whereas the variant N29S (A647G) was present in 70%. This variant was seen in some of our samples. A specific Javanese variant, G666A in E7 and C6826T in L1, was found in 73% of Indonesian samples.19 The G666A E7 variant was seen in two of our samples. The most prevalent nonsynonymous E7 variant in our study was F57V, which was seen in all regions. The N29S variant and the silent S95L variant observed in some of our samples have also been reported by Wu et al.2, 13 However, unlike Wu et al,2, 13 we did not find the change at nucleotide 749 of E7 in any of our samples. The variants G666A, T789C and T795G were seen in our samples, as reported by another Indian study.12 Pande et al12 reported the E7 variant A647G to be absent in India; however, we detected this variant in one of our samples. There are a few reports on the differential oncogenic potential associated with certain variants.35, 40 Immunoresistance has been shown to correlate with the point mutation N53S in the immunodominant epitope RAHYNIVTF (aa 49-57) of the HPV16 E7 oncoprotein.41 In our study, we observed variants in this immunodominant E7 epitope as well as in other epitopes. 
Thus, mutations in tumor antigens can allow malignant cells to escape immune recognition and should be considered in the development of cancer immunotherapy.

Changes in the L1/L2 regions of the HPV genome may be important in discriminating the infectious potential of different variants, as well as in defining epitopes relevant to vaccine design. We did not detect any of the L1 changes reported by Wu et al.2, 13 This may be because their samples were mostly Asian variants, whereas ours were mostly European variants. Pande et al12 also reported A6667C and A6691G changes in the L1 gene. We saw similar variants in our study, but not the T6862C variant reported by them, even though their samples were also essentially European variants like ours.

VLPs made from HPV16 L1 are vaccine candidates against genital HPV infection. Amino acid changes in HPV16 L1 can affect the efficiency with which L1 proteins self-assemble into VLPs and can also affect VLP yield.42, 43 The L1 C-terminus is the most variable region, and C-terminal deletions that include any residues of helix h5, which anchors the C-terminal arm, make L1 extremely protease-sensitive, indicating that an ordered h5 is important for stability.44 Unlike the largely basic C-terminal segment, the N-terminal region of L1 is relatively well conserved, as also observed in our study. The N-terminal segment of L1 determines the size of assembled particles, as it lies at interpentamer contacts. However, mutations in the amino acid region 83 to 97 seem to affect the level of expression of the L1 protein.45 All our samples showed a His to Asp change at position 228, in agreement with the results of Kirnbauer et al.15

It has been shown that naturally occurring antibodies against L1, the major capsid protein, are directed to conformational epitopes. Mutations in L1 can lead to conformational changes within epitopes and affect binding affinities for neutralizing antibodies.45–48 L1 mutations affecting viral assembly into VLPs have been shown to escape dendritic cell-dependent innate immunity in cervical cancer.47 The V5 site is a dominant neutralizing epitope on HPV16 L1. We observed several variations in L1 at sites that are B and T cell epitopes, and some of these may be CTL escape mutants. HPV16 genotype variants differ in their binding affinity for neutralizing mouse MAbs raised against HPV16 L1 VLPs. A study of assembled L1 proteins carrying an A266T or a C428G mutation showed that the A266T mutation reduced MAb binding by half compared with wild type. Retention of C-terminal residues 428-483 is critical for the binding of conformation-specific MAbs, and we observed a number of variants in this region. HPV16 is also detected immunochemically, using antibodies against the major capsid protein L1, in routinely stained gynaecological smears of exfoliated cervical cells. If variants alter the epitope recognized by the antibody, L1 may go undetected, thereby affecting HPV detection.

Very few reports are available on L1 variants, and most of the L1 changes observed by us have not been reported earlier. We were unable to find an up-to-date web database of HPV16 variants/mutations reported globally; hence, such a database needs to be developed.


The aim of this study was to establish the distribution of HPV16 L1, E6 and E7 variants in invasive squamous cell carcinoma of the cervix in Indian women. We report differences in the distribution of HPV16 intratypic variants across regions of India. Bioinformatic analysis of the data and a literature survey have yielded a select list of HPV16 variants likely to affect virus assembly, immune responses, pathogenicity, p53 degradation and transcriptional regulation of the virus. These variants may play a role in the development of invasive cervical carcinoma and may prove important for diagnostics and for HPV16 vaccine design. Further in-depth studies are needed to determine the clinical and biological effects of these variants.


We thank Department of Biotechnology, Government of India for financial support. Dr. M. Siddiqi would like to thank Drs. P.S. Basu, CNCI, N.R. Mondal, CCWH, P.S. Chakrabarty, Calcutta Medical College, Sutapa Biswas, CFI, Kolkata, A.C. Kataki and Debarata Barman, BBCRI, Guwahati for their contributions to the study.