An in-silico study of alphaherpesviruses ICP0 genes: Positive selection or strong mutational GC-pressure?

Authors

  • Vladislav Victorovich Khrustalev,

    Corresponding author
    1. Department of General Chemistry, Belarussian State Medical University, Minsk, Belarus
    • 7-24 Communisticheskaya Street, Minsk 220029, Belarus. Tel/Fax: 80172845957

  • Eugene Victorovich Barkovsky

    1. Department of General Chemistry, Belarussian State Medical University, Minsk, Belarus
    Current affiliation:
    1. 57-100 Chervyakova Street, Minsk 220022, Belarus

Abstract

The purpose of our work was to analyze how strong mutational GC-pressure influences the ratio between nonsynonymous (DN) and synonymous (DS) distances (DN/DS ratio). As material we used the genes coding for ICP0 from five completely sequenced genomes of simplexviruses. DN/DS ratio, total GC-content (G + C), and GC-content in first, second, and third codon positions (1GC, 2GC, and 3GC, respectively) were calculated separately for exon 2, the nonconserved part of exon 3, and the conserved part of exon 3 of ICP0 genes. The results showed that DN exceeds DS only in the conserved part of exon 3 of ICP0 genes from cercopithecine herpesvirus 2 and cercopithecine herpesvirus 16. However, the cause of this result (DN/DS = 2.54) is GC-pressure acting on coding districts with 3GC = 99% rather than the biological process called positive selection. Only in these two viruses has 3GC reached 99% in the conserved part of ICP0 exon 3 because of strong GC-pressure, so nucleotide substitutions that increase the GC-content practically cannot occur in third codon positions, where most substitutions are synonymous. In this case, GC-pressure has a substrate for nucleotide substitutions only in first and second codon positions, where most substitutions are nonsynonymous.

© 2008 IUBMB. IUBMB Life, 60(7): 456–460, 2008

INTRODUCTION

According to the descriptions of methods calculating the synonymous (DS) and nonsynonymous (DN) evolutionary distances, the difference (or ratio) between DS and DN is the indicator of the kind of natural selection (1). If DS > DN, then selection is negative; if DS = DN, then there is no selection; if DN > DS, then selection is positive.
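The conventional reading described above can be written as a small decision rule. This is only an illustrative sketch: the function name and the tolerance `tol` used to treat DN and DS as equal are assumptions, not part of any cited method, and the rest of this paper argues that the "positive" call can be misleading under strong GC-pressure.

```python
def naive_selection_call(dn: float, ds: float, tol: float = 1e-9) -> str:
    """Apply the conventional reading of DN versus DS.

    dn, ds: nonsynonymous and synonymous distances for one pair of sequences.
    tol: hypothetical tolerance below which DN and DS are treated as equal.
    """
    if abs(dn - ds) <= tol:
        return "no selection"
    # DN > DS is conventionally read as positive selection, DS > DN as negative.
    return "positive" if dn > ds else "negative"
```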

In this work we showed that DN may be more than DS not because of a lower rate of fixation of synonymous nucleotide substitutions (due to positive selection), but because of a dramatically decreased probability of occurrence of synonymous substitutions (due to strong mutational GC-pressure acting on coding districts with 3GC = 99%).

The gene coding for ICP0 (infected cell protein 0) is one of the most intensively studied genes of alphaherpesviruses. The product of this gene (one of the immediate early viral proteins) is known to have a function of ubiquitin ligase E3 (2). This gene is involved in the reactivation process. Its way of action (3) and even its effect (4) are still the subjects of dispute.

Although the studies describing the different aspects of ICP0 functions are numerous, a sufficiently wide investigation of the phylogeny of this gene has not yet been performed. An interesting but not well-described fact is the existence of two introns in the ICP0 genes from simplexviruses (5). Such an intron–exon eukaryotic-like gene structure might have arisen in ICP0 genes from simplexviruses relatively recently, because there are no introns in any of the homologous genes from varicelloviruses. But it is also possible that ICP0 genes from varicelloviruses lost their intron–exon structure, while it was retained in simplexviruses.

The total level of guanine and cytosine (G + C) in the genomes of simplexviruses is very high. The minimal total G + C level for simplexviruses is 68% (for HSV1). The genes coding for ICP0 also have dramatically increased G + C levels. The high G + C level in the gene or genome is the sign of mutational GC-pressure. GC-pressure is an imbalance of mutational processes when AT to GC nucleotide substitutions occur with higher rates than GC to AT ones (6). More valid evidence of GC-pressure is the situation when 3GC (GC-content in third codon positions, where most substitutions are synonymous) is much higher than 1GC and 2GC (6). Mutational pressure is the resulting “vector” of different mutational and repair processes causing nucleotide substitutions. In dsDNA genomes, in which coding districts are situated on both strands, there are only two possible results of these processes: AT-pressure or GC-pressure.

Our results showed that DN is higher than DS only in exon 3 part B of ICP0 genes from cercopithecine herpesvirus 2 (CeHV2) and cercopithecine herpesvirus 16 (CeHV16), where 3GC level is 99%. Is it the case of positive selection as the biological process? Can the strong mutational pressure lead to the same DN > DS situation? These are the questions we have discussed in our work.

EXPERIMENTAL PROCEDURES

We have used as the material the nucleotide sequences of genes coding for ICP0 from the completely sequenced genomes of the following alphaherpesviruses. Vernacular names of the viruses are given in brackets. From simplexvirus genus: Human herpesvirus 1 (Herpes simplex virus type 1) [NC_001806], Human herpesvirus 2 (Herpes simplex virus type 2) [NC_001798], Cercopithecine herpesvirus 1 (B virus) [NC_004812], Cercopithecine herpesvirus 2 (Simian agent A) [NC_006560], Cercopithecine herpesvirus 16 (Herpesvirus papio 2) [NC_007653]; from varicellovirus genus: Suid herpesvirus 1 (Pseudorabies virus) [NC_006151], Bovine herpesvirus 1 (Infectious bovine rhinotracheitis virus) [NC_001847], Bovine herpesvirus 5 (Bovine encephalitis herpesvirus) [NC_005261], Equid herpesvirus 1 (Equine abortion virus) [NC_001491], Equid herpesvirus 4 (Equine rhinopneumonitis virus) [NC_001844], Cercopithecine herpesvirus 9 (Simian varicella virus) [NC_002686], Human herpesvirus 3 (Varicella-zoster virus) [NC_001348].

Genes coding for ICP0 are present in the genomes of simplex- and varicelloviruses, but they are absent in species from mardivirus genus. As you can see in Fig. 1, there are three exons in simplexvirus ICP0 genes (5). The translated part of exon 1 is too short for any phylogenetic method to be applied to it (just 16 codons in CeHV1, CeHV2, and CeHV16). So, in our work we used nucleotide sequences of exon 2 (222 codons in HSV1 ICP0 gene) and the translated part of exon 3 (535 codons in HSV1 ICP0 gene).

Figure 1.

Scale diagram of HSV1 ICP0 gene. The position of the start codon (ATG) is indicated to show the border between nontranslated and translated parts of exon 1. The position of the stop codon (TAA) is indicated to show the border between translated and nontranslated parts of exon 3.

After the analysis of aligned amino acid sequences coded by exon 3 from simplexvirus ICP0 genes we cut each of them into two parts (part A and part B) according to the degree of similarity (see Fig. 1). Part A of exon 3 demonstrates a very low level of similarity while there is a conserved amino acid motif coded by exon 3 part B of all simplexviruses. For HSV1 ICP0 gene the length of exon 3 part A is 363 codons and the length of exon 3 part B is 172 codons (see Fig. 1). This alignment was performed using PAM protein weight matrix (1).

Total GC-content, 1GC, 2GC, and 3GC in exon 2, exon 3 part A, and exon 3 part B of ICP0 genes have been calculated with the help of our own algorithm VVK 3.4 (7), which can be found on our web page: www.barkovsky.hotmail.ru. This algorithm calculates the codon, amino acid, and nucleotide usage (including G + C, 1GC, 2GC, and 3GC), although its main function is the counting of nucleotide substitutions rates between two previously aligned sequences.
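The quantities named above (G + C, 1GC, 2GC, and 3GC) can be illustrated with a minimal sketch. This is not the VVK 3.4 algorithm; the function name is hypothetical, and the sketch assumes an in-frame, gap-free coding sequence containing only A, C, G, and T (or U).

```python
def gc_by_codon_position(cds: str) -> dict:
    """Return total GC-content and GC-content in each codon position.

    cds: in-frame coding sequence without gaps (assumption of this sketch).
    Trailing bases that do not form a complete codon are ignored.
    """
    seq = cds.upper().replace("U", "T")
    seq = seq[: len(seq) - len(seq) % 3]  # keep complete codons only
    if not seq:
        raise ValueError("sequence must contain at least one complete codon")

    def gc_fraction(bases: str) -> float:
        return sum(b in "GC" for b in bases) / len(bases)

    return {
        "G+C": gc_fraction(seq),
        "1GC": gc_fraction(seq[0::3]),  # first codon positions
        "2GC": gc_fraction(seq[1::3]),  # second codon positions
        "3GC": gc_fraction(seq[2::3]),  # third codon positions
    }
```

Values are returned as fractions (for example 0.99 rather than 99%), matching the way they are tabulated in Tables 1 to 3.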

DN and DS have been calculated separately in exon 2, exon 3 part A, and exon 3 part B of ICP0 genes by using the Kumar method (1). We used the special software MEGA4 (8) for making alignments (using PAM protein weight matrix) and calculating DN, DS, and the ratio between DN and DS (DN/DS).

We have performed NCBI-BLAST analysis on two highly conserved amino acid motifs, one of which is coded by exon 2 and the other one is coded by exon 3 part B.

RESULTS

We have counted total G + C content and also GC-content in every codon position (1GC, 2GC, 3GC) for each of the selected parts of ICP0 genes. We have done this experiment to test the following hypothesis: in ancient genes with G + C > 0.6 the rule 3GC > 1GC > 2GC should work correctly. The illustration of the mentioned rule can be found elsewhere (1), including our original work (7) written in Russian (hyperlink is http://www.bsmu.by/bmm/04.2006/39.html).

When we use the word “ancient” as the characteristic of a gene or its part, we want to highlight that this gene has homologues in two or more domains of life. Most viral genes do have such homologues. It means that those “ancient” genes had been inserted in viral genomes from the host genomes (captured by viruses) somewhere in time. Some viral genes showing no recognizable similarity to host genes might occur due to deletions and insertions of different parts of coding districts accompanied by numerous single-nucleotide substitutions fixed by selection or genetic drift. Even when there are no obvious host homologues of the viral gene, but the rules of the 1GC, 2GC, and 3GC dependences on total GC-content (1, 7) are obeyed, we can hypothesize that the “primal material” for the evolution of this gene has been ancient. If the “material” for viral coding district was the set of tandem repeats or noncoding host DNA, or the predecessor of this gene has undergone the frameshift mutation, the rules of 1GC, 2GC, and 3GC dependences on total GC-content would rarely be obeyed.

In all ICP0 genes from five simplexviruses, the rule 3GC > 1GC > 2GC is obeyed in exon 2 and in exon 3 part B (in the conserved part of exon 3). To find out whether they have eukaryotic homologues, we performed the BLAST search.

A highly conserved Zn-finger domain is coded by exon 2 of ICP0 genes. There are many eukaryotic proteins possessing such Zn-finger domains (9). But the closest eukaryotic neighbor of the one from simplex- and varicelloviruses is the RING Zn-finger protein 24 (RNF 24). Figure 2 shows the amino acid alignment of Zn-finger domains from ICP0 (belonging to alphaherpesviruses) and RNF 24 (belonging to several eukaryotic organisms, from Homo sapiens to Arabidopsis thaliana).

Figure 2.

Amino acid alignment of Zn-finger domain from ICP0 simplex- (exon 2) and varicelloviruses and eukaryotic RING Zn-finger protein 24 (RNF 24). Conserved amino acid residues are written in bold underlined type.

The proteins with Zn-finger domains are involved in many processes, including oncogenesis, signal transduction, development, and also can function as ubiquitin ligase E3 (9). But the concrete function of viral ICP0 is still the subject of dispute. Some authors consider that the function of HSV1 ubiquitin ligase E3 is associated with the conserved motif from exon 3 part B but not with Zn-finger domain (2).

This motif from exon 3 part B is the conserved one, but it is somewhat larger than it was described previously (2) (see Fig. 3). The full conserved region, according to the results of BLAST-analysis performed by us, has the homologous motif, which can be found in the eukaryotic translation initiation factor 4 gamma 1 (EIF4G1).

Figure 3.

Amino acid alignment of a highly conserved motif from exon 3 of simplexvirus ICP0 genes (residues 606–625 in HSV1 ICP0 protein) and the eukaryotic translation initiation factor 4 gamma 1 (EIF4G1). Conserved amino acid residues are written in bold underlined type.

The levels of 1GC, 2GC, and 3GC are very close to each other in exon 3 part A. These districts are still coding because of the open reading frames, despite the fact that there are numerous tandem repeats inside them.
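The rule 3GC > 1GC > 2GC can be checked directly against the GC-content values reported in Tables 1, 2, and 3. The sketch below copies the (1GC, 2GC, 3GC) values from those tables; the dictionary layout and function names are illustrative assumptions.

```python
# (1GC, 2GC, 3GC) values copied from Tables 1-3 of this paper.
GC_BY_DISTRICT = {
    "exon 2": {
        "HSV1": (0.67, 0.52, 0.91), "HSV2": (0.67, 0.59, 0.91),
        "CeHV1": (0.71, 0.51, 0.98), "CeHV2": (0.72, 0.53, 0.99),
        "CeHV16": (0.73, 0.53, 0.99),
    },
    "exon 3 part A": {
        "HSV1": (0.75, 0.80, 0.85), "HSV2": (0.74, 0.83, 0.87),
        "CeHV1": (0.91, 0.80, 0.85), "CeHV2": (0.85, 0.79, 0.85),
        "CeHV16": (0.87, 0.77, 0.81),
    },
    "exon 3 part B": {
        "HSV1": (0.64, 0.58, 0.91), "HSV2": (0.73, 0.59, 0.96),
        "CeHV1": (0.72, 0.62, 0.97), "CeHV2": (0.72, 0.59, 0.99),
        "CeHV16": (0.76, 0.60, 0.99),
    },
}


def obeys_rule(gc1: float, gc2: float, gc3: float) -> bool:
    """True when the strict ordering 3GC > 1GC > 2GC holds."""
    return gc3 > gc1 > gc2


def rule_summary(data: dict) -> dict:
    """Map each gene district to {virus: rule obeyed?}."""
    return {
        district: {virus: obeys_rule(*gc) for virus, gc in viruses.items()}
        for district, viruses in data.items()
    }
```

With these values the strict ordering holds for every virus in exon 2 and in exon 3 part B, and for none of them in exon 3 part A.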

Our next step was to calculate the DN/DS ratio (Kumar method) for three districts of simplexvirus ICP0 genes. They are: 1) sequences of exon 2, 2) sequences of exon 3 part A (nonconserved part of exon 3), and 3) sequences of exon 3 part B (conserved part of exon 3). All our results (DN/DS ratios and G + C, 1GC, 2GC, and 3GC) are presented in Tables 1 (for exon 2), 2 (for exon 3 part A), and 3 (for exon 3 part B).

Table 1. DN/DS ratios (Kumar method, pairwise deletion) in exon 2 from Simplexvirus ICP0 homologues, and their GC-content

Virus name   HSV1   HSV2   CeHV1   CeHV2   G+C    1GC    2GC    3GC
HSV1         -      -      -       -       0.70   0.67   0.52   0.91
HSV2         0.47   -      -       -       0.72   0.67   0.59   0.91
CeHV1        0.50   0.35   -       -       0.73   0.71   0.51   0.98
CeHV2        0.43   0.46   0.69    -       0.75   0.72   0.53   0.99
CeHV16       0.33   0.36   0.73    0.71    0.75   0.73   0.53   0.99

Table 2. DN/DS ratios (Kumar method, pairwise deletion) in exon 3 part A (nonconserved part of exon 3) from ICP0 Simplexvirus homologues, and their GC-content

Virus name   HSV1   HSV2   CeHV1   CeHV2   G+C    1GC    2GC    3GC
HSV1         -      -      -       -       0.80   0.75   0.80   0.85
HSV2         0.76   -      -       -       0.82   0.74   0.83   0.87
CeHV1        0.69   0.52   -       -       0.85   0.91   0.80   0.85
CeHV2        0.80   0.65   0.70    -       0.83   0.85   0.79   0.85
CeHV16       0.77   0.65   0.54    0.47    0.82   0.87   0.77   0.81

Table 3. DN/DS ratios (Kumar method, pairwise deletion) in exon 3 part B (conserved part of exon 3) from ICP0 Simplexvirus homologues, and their GC-content

Virus name   HSV1   HSV2   CeHV1   CeHV2   G+C    1GC    2GC    3GC
HSV1         -      -      -       -       0.71   0.64   0.58   0.91
HSV2         0.45   -      -       -       0.76   0.73   0.59   0.96
CeHV1        0.43   0.52   -       -       0.77   0.72   0.62   0.97
CeHV2        0.66   0.70   0.56    -       0.76   0.72   0.59   0.99
CeHV16       0.52   0.62   0.41    2.54    0.79   0.76   0.60   0.99

DN is higher than DS only in exon 3 part B from CeHV2 and CeHV16, where 3GC is 99% (see Table 3). Statistical significance (P < 0.05) of the difference between DN and DS for exon 3 part B from CeHV2 and CeHV16 has been confirmed by codon-based Z-test of positive selection. The variance of the difference was computed using the bootstrap method (1,000 replicates).
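The general shape of such a test can be sketched as follows, assuming that the statistic is the observed difference DN − DS divided by the bootstrap standard deviation of that difference, and that the P value is one-sided under a normal approximation. The function name and input layout are hypothetical, and MEGA4 may differ in detail.

```python
from statistics import NormalDist, variance


def codon_z_test(dn: float, ds: float, bootstrap_diffs: list) -> tuple:
    """One-sided Z-test of the hypothesis DN > DS.

    dn, ds: distances estimated from the original alignment.
    bootstrap_diffs: DN - DS values recomputed on bootstrap replicates
        (for example 1,000 resampled alignments); supplied by the caller.
    Returns (Z, one-sided P value).
    """
    var_diff = variance(bootstrap_diffs)  # sample variance of the difference
    if var_diff == 0:
        raise ValueError("bootstrap differences show no variance")
    z = (dn - ds) / var_diff ** 0.5
    p_one_sided = 1.0 - NormalDist().cdf(z)  # upper tail of the standard normal
    return z, p_one_sided
```

A small P value here says only that DN is significantly larger than DS; it does not by itself say why.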

Is this the kind of case that the authors of the “DN/DS > 1” concept mean by the term “positive selection”?

DISCUSSION

Table 3 shows that the 3GC in exon 3 part B of ICP0 from cercopithecine herpesvirus 2 (CeHV2) and cercopithecine herpesvirus 16 (CeHV16) is 99%. It means that the nucleotide substitution from A:T to G:C base pair practically cannot occur in third codon positions of these gene districts. Most nucleotide substitutions in third codon positions are synonymous, and most of the possible synonymous substitutions are in third codon positions (10). Thus, nearly half of all possible synonymous substitutions simply cannot occur in the studied gene districts.

All substitutions in second and most of them in first codon positions are nonsynonymous (10). In exon 3 part B of ICP0 the 1GC for CeHV2 and CeHV16 is 72 and 76%, respectively. The 2GC is 59 and 60%, respectively. It is clear that there is no “block” for AT to GC substitutions in first and second codon positions, and so nonsynonymous substitutions can occur with no limitations.
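The statements about codon positions can be illustrated by enumerating every single-nucleotide change in the sense codons of the standard genetic code. This is a sketch under stated assumptions and not the procedure of reference (10): it uses the standard code, gives every sense codon and every base change equal weight, and counts changes to a stop codon as nonsynonymous.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_fraction_by_position() -> dict:
    """Fraction of single-base changes that are synonymous, per codon position."""
    counts = {1: [0, 0], 2: [0, 0], 3: [0, 0]}  # position -> [synonymous, total]
    for codon, amino_acid in CODE.items():
        if amino_acid == "*":
            continue  # start only from sense codons
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                counts[pos + 1][1] += 1
                if CODE[mutant] == amino_acid:
                    counts[pos + 1][0] += 1
    return {pos: syn / total for pos, (syn, total) in counts.items()}
```

Under these assumptions no change in the second position is synonymous, only a few in the first position are, and the majority in the third position are.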

Coming back to our case of “positive selection”, one can ask why the level of 3GC reached 99% and has not decreased yet. The level of 3GC = 99% can be stabilized in relatively neutral third codon positions only if the process of AT to GC substitutions is as powerful as the process of GC to AT substitutions (10). Even though the substrate for AT to GC substitutions in third codon positions is only about 1%, they are taking place (more precisely, being fixed) as frequently as the GC to AT substitutions, for which the substrate is 99%.
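This balance can be made explicit with a simple two-state model, which is an assumption of this sketch rather than a model taken from the cited references. Let u be the per-site rate of AT to GC substitutions and v the per-site rate of GC to AT substitutions at neutral third codon positions, and let f be the GC-content there. The two flows are equal when u(1 − f) = vf, which gives f = u / (u + v) and u / v = f / (1 − f).

```python
def equilibrium_gc(u_at_to_gc: float, v_gc_to_at: float) -> float:
    """Equilibrium GC-content f of neutral sites, from u * (1 - f) = v * f."""
    return u_at_to_gc / (u_at_to_gc + v_gc_to_at)


def rate_ratio_for_gc(gc_fraction: float) -> float:
    """Ratio u / v of per-site rates needed to hold GC-content at gc_fraction."""
    if not 0.0 <= gc_fraction < 1.0:
        raise ValueError("gc_fraction must be in [0, 1)")
    return gc_fraction / (1.0 - gc_fraction)
```

In this model, holding 3GC at 0.99 requires the per-site AT to GC rate to be roughly 99 times the per-site GC to AT rate, which is one way to quantify how strong the GC-pressure must be.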

In the situation when AT to GC substitutions occur (and so are fixed) more frequently than GC to AT substitutions, one can say that the given gene (or genome, or the part of the gene) is under the influence of mutational GC-pressure (6). If there is mutational GC-pressure in the gene, one can expect that it will increase the GC-content in third codon positions to much higher levels and much faster than in first or second positions (10).

In third codon positions of exon 3 part B of ICP0 from CeHV2 and CeHV16, GC to AT substitutions are rare because of GC-pressure, while AT to GC substitutions practically cannot occur because third codon positions are already full of G and C.

Once again, DS is lower than DN in exon 3 part B of ICP0 from CeHV2 and CeHV16 because synonymous substitutions cannot occur as frequently as nonsynonymous ones. This is the result of strong mutational GC-pressure and relatively weak negative selection.

In ICP0 exon 2 from CeHV2 and CeHV16, the level of 3GC is also 99%, and in the one from CeHV1 it is 98%. Despite this, DS is more than DN in these gene districts (see Table 1). However, DN/DS ratios between the above-mentioned three gene districts are not less than 0.69. Between these three gene districts and the homologous ones from HSV1 and HSV2, and also between the latter two districts, DN/DS ratios are not more than 0.50. Probably, these results are due to the less dramatically GC-enriched third codon positions of HSV1 and HSV2 ICP0 exon 2 (91% for both).

Strong mutational GC-pressure exists in exon 2 just like in exon 3 part B of ICP0 genes, but DN is more than DS only in exon 3 part B from CeHV2 and CeHV16. One should remember that exon 2 is coding for Zn-finger domain that is thought to play an important role in the life cycle of these viruses. Once again, this is the only part of ICP0 genes conserved in both simplex- and varicelloviruses. These data make us hypothesize that the negative selection should be much stronger for exon 2 than for exon 3 part B. Indeed, in exon 3 part B more nonsynonymous substitutions caused by GC-pressure (especially in first codon positions) have been fixed, probably, because of their neutrality. As you can see in Table 3, the 1GC is 0.76 for CeHV16, while for CeHV2 it is just 0.72. This is the evidence of the fixing of AT to GC nonsynonymous mutations in exon 3 part B in the lineage from CeHV2 to CeHV16. For exon 2 the difference between 1GC levels in the same lineage is much lower (see Table 1).

The total GC-content in exon 3 part A is even higher than in part B and in exon 2 (see Table 2). Despite such GC-enrichment (80–85%), DN/DS ratios are not more than 0.80 for exon 3 part A. The cause of it may be the presence of numerous tandem repeats. Those repeats occurred in every virus independently and have different sequences of repeated subunits. We should also say that the levels of 1GC, 2GC, and 3GC are close to each other and the rule 3GC > 1GC > 2GC is not obeyed in exon 3 part A. So, relatively low DN/DS ratios might be the result of the absence of the “block” for substitutions exclusively in third codon positions.

In our work we have shown that the DN > DS inequality may sometimes indicate strong mutational pressure rather than positive selection as a biological process.
