Ostreococcus tauri virus (OtV-1) is a large double-stranded DNA virus and a prospective member of the family Phycodnaviridae, genus Prasinovirus. OtV-1 infects the unicellular marine green alga O. tauri, the smallest known free-living eukaryote. Here we present the 191 761 base pair genome sequence of OtV-1, which has 232 putative protein-encoding and 4 tRNA-encoding genes. Approximately 31% of the viral gene products are similar to proteins of known function in public databases. These include a variety of unexpected genes, for example, a PhoH-like protein, an N-myristoyltransferase, a 3-dehydroquinate synthase, a number of glycosyltransferases and methyltransferases, a prolyl 4-hydroxylase, a 6-phosphofructokinase and a total of eight capsid proteins. A total of 11 predicted genes share homology with genes found in the genome of the Ostreococcus host. In addition, an intein was identified in the DNA polymerase gene of OtV-1; this is the first report of an intein in the genome of a virus that infects O. tauri. Fifteen core genes common to nucleo-cytoplasmic large DNA virus (NCLDV) genomes were identified in the OtV-1 genome. These new sequence data may help to redefine the classification of the core genes of these viruses and shed new light on their evolutionary history.
The unicellular green alga Ostreococcus tauri, a marine prasinophyte belonging to a group of organisms generically known as the photosynthetic picoeukaryotes, is the smallest free-living eukaryote described to date (Chrétiennot-Dinet et al., 1995; Courties et al., 1998). The O. tauri cell is less than 1 μm in diameter, has a naked plasma membrane with no cell wall, and lacks scales and flagella. Ostreococcus tauri has been isolated from different geographic regions and from a range of depths (Diez et al., 2001; Worden et al., 2004). Small-subunit rRNA sequences from Ostreococcus spp. cultures and environmental samples cluster into four different clades that are likely distinct enough to represent different species (Guillou et al., 2004; Rodriguez et al., 2005). Genetic distances between isolates do not appear to reflect geographic origin; instead, their genetic divergence is thought to be driven by light and nutrient gradients that give rise to physiological differences between surface and deep isolates (Rodriguez et al., 2005; Cardol et al., 2008; Six et al., 2008).
Data on the isolation, characterization and genome sequencing of a virus that infects O. tauri (OtV5) were recently reported (Derelle et al., 2008). This work revealed a 186 234 base pair dsDNA virus, isolated by plaque assay, that is morphologically similar to the Micromonas pusilla viruses (Mayer and Taylor, 1979; Waters and Chan, 1982; Cottrell and Suttle, 1995a). Taxonomic analysis classified OtV5 as a prasinovirus in the algal virus family Phycodnaviridae. The Phycodnaviridae are a genetically diverse (Dunigan et al., 2006) yet morphologically similar family of large icosahedral viruses with dsDNA genomes ranging from 154 to 560 kbp that infect marine or freshwater eukaryotic algae (Van Etten et al., 2002; Schroeder et al., 2009; Wilson et al., 2009). Members of the Phycodnaviridae are currently grouped into six genera, named loosely after the hosts they infect: Chlorovirus, Coccolithovirus, Prasinovirus, Prymnesiovirus, Phaeovirus and Raphidovirus (Wilson et al., 2005a), although a recently isolated large virus infecting Aureococcus anophagefferens (the causative agent of the New York/New Jersey brown tides) remains unclassified (Rowe et al., 2008). The hosts of the Phycodnaviridae are O2-evolving eukaryotic algae, whose role in global carbon and nutrient cycling makes them arguably the most important microorganisms for maintaining life on Earth. It is therefore perhaps surprising that only a few hundred (at most) phycodnaviruses have been isolated, fewer than a dozen have been sequenced and only one, PBCV-1 infecting Chlorella, has been studied in any depth (Van Etten, 2003).
Within the Phycodnaviridae, full genome sequences have been obtained from representatives of the Chlorovirus, Coccolithovirus and Phaeovirus genera (Dunigan et al., 2006), and evolutionary analysis of these genomes places them within a major monophyletic assemblage of large eukaryotic DNA viruses termed the nucleo-cytoplasmic large DNA viruses (NCLDVs) (Iyer et al., 2001; 2006). This clade comprises five families: Poxviridae, Iridoviridae, Asfarviridae, Phycodnaviridae and Mimiviridae. The NCLDV concept is significant for two reasons. First, as the name suggests, it implies a likely propagation mechanism for members of the Phycodnaviridae in which replication takes place in the nucleus and/or cytoplasm of their algal hosts. NCLDVs typically depend little on the host replication machinery, which may point to a role for the large number of genes of unknown function observed in the large phycodnavirus genomes; for example, 86% of the genes in the coccolithovirus isolate EhV-86 have no database homologues (Wilson et al., 2005b). Second, phycodnaviruses are likely to have an ancient lineage. A phylogeny of the NCLDVs constructed by cladistic analysis indicates that the major families diverged before the divergence of the major eukaryotic lineages 2–3 billion years ago (Iyer et al., 2006). The finding that only 14 genes are shared among a pool of c. 1000 genes from three genomes representing different Phycodnaviridae genera (Allen et al., 2006) supports the idea that a long time has elapsed since the Chlorovirus, Coccolithovirus and Phaeovirus genera diverged.
Other than a report of a transient Ostreococcus-like bloom off the eastern coast of the USA, whose rapid decline was thought to be caused partially by viruses (O'Kelly et al., 2003), there are no reports on the ecological role of Ostreococcus-specific viruses. Indeed, of the six Phycodnaviridae genera, the prasinoviruses and viruses of photosynthetic picoeukaryotes are the least represented in the literature, despite their global and ecological importance. Here we present the complete genome of a second O. tauri-specific virus (OtV-1) and compare it with the recently sequenced OtV5 (Derelle et al., 2008).
Results and discussion
The lytic virus OtV-1 was isolated by plaque assay with host O. tauri strain OTH 95, using surface seawater collected on 6 September 2006 from station L4 in the Western English Channel (50°15′N, 04°13′W). Although characterization of the virus is not the focus of this paper, a brief description helps set the scene for the genomic characterization (data not shown unless indicated). OtV-1 readily formed plaques approximately 1 mm in diameter on a lawn of O. tauri strain OTH 95 within 3–5 days. An exponentially growing culture of O. tauri strain OTH 95 lysed within 3 days of adding OtV-1 at a multiplicity of infection of 1. Morphologically, the virus has icosahedral symmetry and a diameter of approximately 100–120 nm (Fig. 1). A limited host range analysis suggests that OtV-1 infects only O. tauri strain OTH 95 (Table S1).
General description of the OtV-1 genome
Sequence analysis of the OtV-1 genome (GenBank Accession No. FN386611) revealed a linear genome of 191 761 bp (Fig. 2). General features of the OtV-1 genome sequence include: (i) a nucleotide composition of 45.36% G+C; (ii) a total of 232 predicted coding sequences (CDSs), defined as a start codon followed by at least 65 additional codons before a stop codon; (iii) CDSs distributed roughly equally on both strands (53% on the positive strand and 47% on the negative strand); (iv) an average gene length of 750 bp; and (v) a coding density of 1.199 genes per kbp. Although no major repeat regions were identified, dot-plot analysis (Lowe and Eddy, 1997) comparing the genome with itself identified the location and orientation of three small but distinct families of repeats (designated A, B and C) (Fig. S1). All three repeat families occur in coding regions. Family A repeats fall within a 1260 bp segment (62 930–64 290 bp) that corresponds to a GC-rich region (average 63.5% G+C) within a predicted putative haemagglutinin protein product (OtV1_073). Family B repeats occur between 93 800 and 96 600 bp and Family C repeats between 99 090 and 100 850 bp; both regions fall within the same putative CDS, which encodes a putative viral A-type inclusion body protein (OtV1_115). All three repetitive regions consist of direct repeats.
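The CDS definition above (a start codon followed by at least 65 further codons before a stop) can be expressed as a simple six-frame scan. The sketch below is illustrative only and is not the gene-calling pipeline used for OtV-1; it assumes ATG is the only start codon, uses the standard stop codons, keeps only the longest ORF per stop (the first in-frame ATG), and the function names `find_orfs` and `gc_content` are hypothetical.

```python
"""Illustrative six-frame ORF scan using the minimum-length rule quoted in the text."""

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in seq."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def _scan_strand(seq: str, min_extra_codons: int) -> list:
    """Scan the three frames of one strand; return (start, end) with end just after the stop."""
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG after the previous stop
            elif start is not None and codon in STOP_CODONS:
                extra_codons = (i - start) // 3 - 1  # codons after the ATG, stop excluded
                if extra_codons >= min_extra_codons:
                    hits.append((start, i + 3))
                start = None
    return hits


def find_orfs(genome: str, min_extra_codons: int = 65) -> list:
    """Return (strand, start, end) tuples in forward-strand, 0-based half-open coordinates."""
    genome = genome.upper()
    n = len(genome)
    orfs = [("+", s, e) for s, e in _scan_strand(genome, min_extra_codons)]
    for s, e in _scan_strand(reverse_complement(genome), min_extra_codons):
        orfs.append(("-", n - e, n - s))  # map the reverse-strand hit back to forward coordinates
    return sorted(orfs, key=lambda orf: (orf[1], orf[2]))
```

With a genome loaded as a string, `find_orfs(genome)` lists candidate ORFs under this rule; a real gene caller would also consider alternative start codons and overlapping ORFs.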
Comparison of the OtV-1 and OtV5 genomes revealed a high level of colinearity (Fig. 3). In addition, 203 of the 232 CDSs in OtV-1 (87.5%) have orthologues in OtV5, leaving only 29 CDSs unique to OtV-1 (Table 1). The general genome features of OtV5 are also similar to those of OtV-1: OtV5 has a comparable size (186 234 bp) and G+C content (45.29%), though its slightly smaller mean gene length (702 bp) is accompanied by a larger number of predicted genes (268).
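The paper does not state how orthologues were assigned. A common approach, sketched here purely as an assumption, is reciprocal best hits (RBH): gene a in genome A and gene b in genome B are called orthologues when each is the other's top-scoring hit. The hit tables, gene identifiers (`a1`, `b1`, ...) and function names below are hypothetical; in practice the scores would come from all-against-all blastp searches.

```python
from typing import Dict, List, Tuple

# query gene -> list of (subject gene, bit score); hypothetical data structure
HitTable = Dict[str, List[Tuple[str, float]]]


def best_hits(hits: HitTable) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (queries with no hits are skipped)."""
    return {query: max(subjects, key=lambda hit: hit[1])[0]
            for query, subjects in hits.items() if subjects}


def reciprocal_best_hits(a_vs_b: HitTable, b_vs_a: HitTable) -> Dict[str, str]:
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {a: b for a, b in best_ab.items() if best_ba.get(b) == a}


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, e.g. 203 of 232 CDSs."""
    return round(100.0 * part / whole, 1)


a_vs_b = {"a1": [("b1", 500.0), ("b9", 80.0)],
          "a2": [("b2", 300.0)],
          "a3": [("b2", 120.0)],   # b2's best hit is a2, so a3 is not an RBH
          "a4": []}                # no hit at all: unique to genome A
b_vs_a = {"b1": [("a1", 495.0)],
          "b2": [("a2", 310.0), ("a3", 118.0)]}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # {'a1': 'b1', 'a2': 'b2'}
print(percent(203, 232))                     # 87.5
```

Genes left without a reciprocal partner (here `a3` and `a4`) correspond to the "unique" category; RBH is conservative and can miss paralogous or fast-evolving orthologues.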
Table 1. Summary of OtV-1 genes that have no orthologues in OtV5.
Top significant hit
Unnamed protein product
mRNA capping enzyme
ATP-dependent protease proteolytic subunit
Emiliania huxleyi virus-86
Identity of putative CDSs
Of the 232 CDSs identified in the OtV-1 genome, 162 (69%) encode putative proteins to which no function has been attributed (Table S2). Of these, 32 share close similarity with hypothetical proteins in databases, so 130 (56%) of the CDSs do not correspond to any known protein, highlighting a potentially high degree of functional novelty in this virus. The predicted functions of the remaining 73 CDSs (31%), which have homology to known proteins, were split into nine functional groups (Table 2): 10 CDSs could be assigned putative functions in DNA replication, recombination and repair; 6 in nucleotide metabolism and transport; 7 in transcription; 19 in protein and lipid synthesis, modification and degradation; 1 in signalling; 4 in DNA methylation; 11 in sugar metabolism; 8 encode capsid proteins; and a further 7 have miscellaneous functions. OtV-1 is also predicted to encode four tRNAs: tRNA-Ile, tRNA-Gln, tRNA-Asn and a pseudo-tRNA. All four tRNA genes contain introns and are clustered in a region of the OtV-1 genome between 145 506 bp and 150 490 bp (Table S3).
Table 2. Predicted proteins encoded by the OtV-1 genome grouped by function.
DNA, RNA, replication, recombination and repair
Protein and lipid synthesis/modification
Prolyl 4-hydroxylase alpha-subunit
ATP-dependent DNA ligase
DNA topoisomerase I
Ubiquitin C-terminal hydrolase
DNA topoisomerase II
ATP-dependent protease proteolytic subunit
33 kDa in vitro translation peptide
33 kDa in vitro translation peptide
Aminotransferase family protein
VV A18 Helicase
Nucleotide transport and metabolism
Ribonucleotide reductase, large subunit
Ribonucleotide reductase, small subunit
TPR domain-containing protein
ATPase (VV A32 virion packaging ATPase)
Oxo-acyl carrier protein dehydrogenase
FAD-dependent thymidylate synthase
RNA transcription factor TFIIB
RNA transcription factor TFIIS
Capsid protein 1
SWI/SNF helicase domain protein
Capsid protein 2
TATA-box binding protein
Capsid protein 3
mRNA capping enzyme
Capsid protein 4
mRNA capping enzyme
Capsid protein 5
Sugar manipulation enzymes
Capsid protein 6
Capsid protein 7
Capsid protein 8
Virus inclusion body
Rhodanese domain-containing protein
ABC1 domain protein
Hep_Hag haemagglutinin family protein
Phosphate starvation inducible protein
Serine/Threonine protein kinase
Comparison of OtV-1 with other NCLDVs
OtV-1 is smaller than most other phycodnaviruses and sequenced NCLDVs (Table 3). Whole-genome comparisons of NCLDVs (Iyer et al., 2001; 2006) found nine genes (Group I) that are shared by genomes from all NCLDV families. A further eight genes (Group II) were found in all five NCLDV families but were missing in one or more lineages within those families. Finally, 14 Group III genes were conserved in three of the five NCLDV families, and Group IV genes were conserved in two families. A total of 15 Group I, II and III core genes was identified in the OtV-1 genome, and the presence or absence of these genes in five phycodnaviruses was subsequently determined (Table 3). Fewer CDSs, including fewer of the conserved NCLDV core genes, were found in the OtV-1 genome than in the majority of NCLDVs sequenced to date, with the exception of the recently sequenced phaeovirus FsV-158 (Schroeder et al., 2009) (Table 3). Homologues were identified for only 5 of the 9 Group I genes, 6 of the 8 Group II genes and 4 of the 14 Group III genes (Table 3).
Table 3. Presence of NCLDV core genes (Groups I, II and III) in different NCLDV genomes.
A high proportion of the OtV-1 CDSs bear close similarity to bacterial-type genes. It has been hypothesized that NCLDV genomes undergo successive accretions of bacterial genes, perhaps via bacteria-feeding hosts (Filee et al., 2007). The recent sequencing of the complete genomes of the phaeovirus FsV-158 (Schroeder et al., 2009) and the prasinoviruses OtV5 (Derelle et al., 2008) and OtV-1 (this study) allows us to review the current identification and classification of groups of core NCLDV genes. The genes for the vaccinia virus (VV) D5-type ATPase, thiol oxidoreductase, VV D6R-type helicase and VLTF2-like transcription factor are absent from the OtV-1 genome, nor were they reported in the OtV5 genome. In addition, the FsV-158 genome lacks a VV D6R-type helicase. These findings have implications for the present classification of core genes: the Group I core genes absent from OtV-1 could now be regarded as Group II rather than Group I core NCLDV genes. Conversely, the presence of genes for proliferating cell nuclear antigen (PCNA) and the large and small subunits of ribonucleotide reductase in all sequenced genomes of members of the Phycodnaviridae could justify reclassifying these genes from Group II to Group I.
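The reclassification argument above can be made concrete with a simplified rule: a gene is Group I if present in every lineage of every NCLDV family, Group II if present in all families but missing from some lineage, Group III if present in three or four families, and Group IV if present in two. This is a sketch of the grouping logic described in the text, not the procedure of Iyer et al.; treating "three of the five families" as "three or four" is an assumption, and the presence table and gene names (`gene_a`, `gene_b`, `gene_c`) are toy data, not the contents of Table 3.

```python
from typing import Dict, Set

# family -> lineage -> set of core genes present (toy structure)
Presence = Dict[str, Dict[str, Set[str]]]

FAMILIES = ["Poxviridae", "Iridoviridae", "Asfarviridae", "Phycodnaviridae", "Mimiviridae"]


def core_group(gene: str, presence: Presence) -> str:
    """Assign a core-gene group using the simplified rule described above."""
    n_families = sum(
        any(gene in genes for genes in lineages.values())
        for lineages in presence.values()
    )
    if n_families == len(presence):
        in_every_lineage = all(
            gene in genes
            for lineages in presence.values()
            for genes in lineages.values()
        )
        return "I" if in_every_lineage else "II"
    if n_families >= 3:
        return "III"
    if n_families == 2:
        return "IV"
    return "none"


def toy_table(missing_in_prasinovirus: Set[str]) -> Presence:
    """Every lineage carries gene_a and gene_b, except genes removed from the prasinovirus lineage."""
    shared = {"gene_a", "gene_b"}
    table: Presence = {family: {"lineage_1": set(shared)} for family in FAMILIES}
    table["Phycodnaviridae"] = {
        "Chlorovirus": set(shared),
        "Prasinovirus": shared - missing_in_prasinovirus,
    }
    return table


print(core_group("gene_a", toy_table(set())))       # I
print(core_group("gene_a", toy_table({"gene_a"})))  # II: one lineage lacks it
```

The second call mirrors the situation discussed in the text: a single lineage lacking a gene is enough to move it from Group I to Group II under this rule.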
OtV-1 DNA polymerase gene
A putative DNA polymerase gene (OtV1_208) was identified by blastp searches, which revealed close similarity to the DNA polymerase genes of other phycodnaviruses and of more distantly related species (data not shown). Phylogenetic analysis of highly conserved regions (Chen and Suttle, 1995) common to viral DNA polymerases (Fig. S2) showed that most algal viruses are more closely related to each other than to the other groups of viruses analysed. Within this group, OtV-1 clusters with strong bootstrap support with OtV5, OtV-2 (W.H. Wilson and K.D. Weynberg, unpublished) and the M. pusilla virus. This microalgal cluster is distinct from viruses infecting macroalgae (EsV-1, FsV and FirrV-1) and from Chlorella viruses (PBCV-1 and NY-2A) (Fig. S2). Together with the basic characteristics of this virus described above, these results suggest that OtV-1 should be assigned to the family Phycodnaviridae.
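Bootstrap support, as quoted for these clusters, is obtained by resampling alignment columns with replacement, rebuilding the tree for each replicate and counting how often a grouping recurs. The sketch below shows that resampling loop on a toy alignment; as a simplifying assumption it replaces full tree building with the first join UPGMA would make (the pair with the smallest p-distance), so it is not the phylogenetic method used for Fig. S2. Taxon names, sequences and function names are hypothetical.

```python
import random
from itertools import combinations
from typing import Dict, Tuple

Alignment = Dict[str, str]  # taxon name -> aligned sequence (all the same length)


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def resample_columns(aln: Alignment, rng: random.Random) -> Alignment:
    """Draw alignment columns with replacement to build one bootstrap replicate."""
    length = len(next(iter(aln.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in aln.items()}


def closest_pair(aln: Alignment) -> Tuple[str, str]:
    """Pair with the smallest p-distance, i.e. the first join UPGMA would make."""
    return min(combinations(sorted(aln), 2),
               key=lambda pair: p_distance(aln[pair[0]], aln[pair[1]]))


def bootstrap_support(aln: Alignment, pair: Tuple[str, str],
                      replicates: int = 1000, seed: int = 1) -> float:
    """Fraction of bootstrap replicates in which `pair` is the closest pair."""
    rng = random.Random(seed)
    target = tuple(sorted(pair))
    hits = sum(closest_pair(resample_columns(aln, rng)) == target
               for _ in range(replicates))
    return hits / replicates


toy = {"x": "ACGTACGTAC", "y": "ACGTACGTAT", "z": "TTTTGGGTCC"}
print(bootstrap_support(toy, ("x", "y")))
```

A full analysis would replace `closest_pair` with complete tree reconstruction and count every clade, but the resample-rebuild-count loop is the same.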
The highly conserved motif I of the polymerase domain, YGDTDS, which includes a catalytic-site residue, is interrupted at its centre (YGD/TDS) by a 329-amino-acid insertion (Fig. 4). This location, termed the Pol-c intein integration point, also harbours inteins in the type B DNA polymerases of several archaea (YAD/TDG) (Niehaus et al., 1997; Waters et al., 2003), in the recently described type B polymerase of mimivirus, a huge dsDNA virus infecting amoebae (Raoult et al., 2004), and in the type B polymerase of the raphidovirus HaV (YGD/TDS) (Nagasaki et al., 2005). The Intein Database and Registry (InBase) was used to confirm this insertion as an intein: the intervening sequence includes all the motifs of intein protein-splicing domains (Fig. 4). This is the first intein identified in an O. tauri virus.
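Locating an intein at this site amounts to finding the split motif YGD/TDS and checking the insert for intein hallmarks. The sketch below uses a toy sequence, not the OtV-1 polymerase; it assumes the first YGD...TDS pair marks the site and checks three widely used hallmarks (Cys or Ser at intein position 1, His-Asn at the intein C-terminus, and Cys, Ser or Thr at the first C-extein position), which is far cruder than the InBase motif analysis used here. The function names are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple


def locate_insert(protein: str, n_flank: str = "YGD",
                  c_flank: str = "TDS") -> Optional[Tuple[int, int]]:
    """Return the (start, end) span inserted between the motif halves, or None.

    None means the motif is uninterrupted or the flanks were not found.
    """
    if n_flank + c_flank in protein:
        return None
    match = re.search(re.escape(n_flank) + r"(.+?)" + re.escape(c_flank), protein)
    return match.span(1) if match else None


def intein_hallmarks(insert: str, c_extein_first: str) -> Dict[str, bool]:
    """Check residues commonly required for protein splicing."""
    return {
        "N-terminal Cys/Ser": insert[0] in "CS",
        "C-terminal His-Asn": insert.endswith("HN"),
        "C-extein +1 Cys/Ser/Thr": c_extein_first in "CST",
    }


# Toy polymerase fragment with a 329-residue insert (not the real OtV-1 sequence)
toy = "MKL" + "YGD" + "C" + "A" * 326 + "HN" + "TDS" + "RR"
start, end = locate_insert(toy)
print(end - start)  # 329
print(intein_hallmarks(toy[start:end], toy[end]))
```

In the YGD/TDS case the first C-extein residue is the Thr of TDS, which satisfies the third check.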
No intein was observed in the DNA polymerase of OtV5. Perhaps more surprising is the absence of any intein in a more extensive recent study of nearly 30 Ostreococcus virus isolates (Bellec et al., 2009). Nevertheless, the OtV-1 intein includes all of the motifs characterizing a functional autocatalytic self-splicing domain (Perler, 2002), including the residues that provide the nucleophilic groups in the self-splicing reactions (Pietrokovski, 1998a; Perler, 2002) (Fig. 4). All of the identified protein-splicing domains are at typical positions within the intein, including the signature dodecapeptide motif (EN1, Fig. 4) found in the LAGLIDADG family of endonucleases (Chevalier and Stoddard, 2001). Together with three additional conserved regions (EN2, EN3 and EN4, Fig. 4), this suggests that the OtV-1 DNA polymerase intein encodes an active homing endonuclease.
Phylogenetic analysis of the OtV-1 intein, inteins from the InBase Registry and DNA polymerase I motif C inteins from viral and archaeal isolates resulted in three distinct clusters (Fig. 5). Inteins from OtV-1 and phycodnavirus-like DNA pol genes formed one cluster, while inteins from halophilic and thermophilic archaeal isolates each formed monophyletic clusters. OtV-1 is now the fourth dsDNA virus described to exhibit an intein.
Genes unique to the OtV-1 genome
Among the 29 genes unique to OtV-1, i.e. those not found in OtV5 (Table 1), are genes encoding a potential exonuclease (OtV1_005) and a DNA cytosine-methyltransferase (OtV1_055). The cytosine-methyltransferase shares ∼40% amino acid similarity with the equivalent protein in PBCV-1. With two additional adenine-specific methyltransferases and one further cytosine-specific methyltransferase, the OtV-1 genome encodes four DNA methyltransferases in total (Table 2). The level of methylation of the OtV-1 genome is unknown. These DNA methyltransferases lack obvious companion site-specific endonucleases within the OtV-1 genome.
A total of 11 CDSs in the OtV-1 genome share close similarity to genes found in the host genus, Ostreococcus (Table 4). High identity scores were seen for some of these, including genes encoding a putative dTDP-d-glucose 4,6-dehydratase (OtV1_042), a proline dehydrogenase (OtV1_189) and a DNA topoisomerase II (OtV1_212) (Table 4). The OtV-1 genome also encodes a ribonucleotide reductase small subunit (OtV1_151), which shares 62% identity with the equivalent gene in Ostreococcus lucimarinus. One of the two mRNA capping enzymes encoded by OtV-1 (OtV1_087) shares 26% identity with an mRNA capping enzyme encoded by both O. lucimarinus and O. tauri. The second mRNA capping enzyme (OtV1_156) shares 30% identity with that encoded by the chlorovirus PBCV-1 and may therefore be more closely related to the ancestral NCLDV mRNA capping enzyme. In addition, a fibronectin-like binding protein (OtV1_203) shares ∼40% amino acid identity with a gene in both O. lucimarinus and O. tauri. Four of the 11 host-like genes are of unknown function.
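Identity figures such as the 62%, 26% and 30% quoted above depend on how gapped columns are counted. The sketch below computes percent identity from a pair of pre-aligned sequences in two common ways: over ungapped columns only, or over the full alignment length, the latter being closer to how BLAST reports identities. The toy alignment and the function name are illustrative assumptions; they are not the alignments behind Table 4.

```python
def percent_identity(aln_a: str, aln_b: str, include_gap_columns: bool = False) -> float:
    """Percent identity of two aligned sequences ('-' marks a gap).

    include_gap_columns=False divides by columns where neither sequence has a gap;
    include_gap_columns=True divides by the full alignment length.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    columns = list(zip(aln_a, aln_b))
    if not include_gap_columns:
        columns = [(x, y) for x, y in columns if x != "-" and y != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * identical / len(columns)


print(percent_identity("MKV-LA", "MKIGLA"))                                      # 80.0
print(round(percent_identity("MKV-LA", "MKIGLA", include_gap_columns=True), 1))  # 66.7
```

The same pair of sequences gives 80% or 67% identity depending on the convention, so identity values are best compared only when computed the same way.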
Table 4. Summary of putative CDSs with database homology to host genes.
Phylogenetic analysis of the aligned amino acid sequences of the active sites of DNA topoisomerase II genes from several viruses and eukaryotes, including Ostreococcus spp., revealed that the O. tauri viruses form a distinct cluster with strong bootstrap support (Fig. S3). The eukaryotic algal sequences, including the host Ostreococcus genes, form a cluster distinct from all of the virus sequences (Fig. S3). This phylogeny suggests that the DNA topoisomerase II gene arose in viruses in the distant evolutionary past, possibly before the divergence of the green algal lineage.
The OtV-1 genome encodes eight putative capsid proteins (Table 2 and Table S4). Aside from their top blastp hits to OtV5, five of the capsid proteins are most similar to Pyramimonas orientalis virus PoV-1, two to Heterosigma akashiwo virus HaV-1 and one to a capsid protein encoded by PBCV NY-2A. Phylogenetic analysis of these proteins with other putative phycodnavirus major capsid protein (MCP) sequences showed that OtV-1 capsid protein 6 clusters with the phycodnavirus MCP sequences (Fig. 4), indicating that capsid protein 6 is likely to be the MCP of OtV-1, although this requires experimental confirmation.
DNA replication and repair-associated proteins
The OtV-1 genome encodes 10 proteins that are involved in DNA replication, recombination or repair (Table 2), including both types of DNA topoisomerase, i.e. types I and II. Topoisomerases introduce temporary single- or double-stranded breaks in DNA, which help to resolve problems associated with DNA topology during replication, transcription and recombination (Champoux, 2001). Type I topoisomerases (ATP-independent) pass one strand of the DNA through a break in the opposite strand, whereas type II topoisomerases are adenosine triphosphatases (ATPases) that introduce a double-stranded break (Roca, 1995). With the notable exception of the Poxviridae, many dsDNA viruses (including NCLDVs and phages) encode their own type II DNA topoisomerase. Accordingly, OtV-1 has a CDS (OtV1_212) of 1072 amino acids with a predicted molecular mass of 121 kDa, which shares 43% amino acid identity with several eukaryotic topoisomerases. The OtV-1 protein is similar in size to the enzyme of the chlorovirus PBCV-1, which is one of the smallest type II topoisomerases, with a molecular mass of 120 kDa compared with 160–180 kDa in most eukaryotes (Lavrukhin et al., 2000). Moreover, the PBCV-1 enzyme cleaves dsDNA approximately 30 times faster than human type II DNA topoisomerase (Fortune et al., 2001). The smallest known type II topoisomerase, a polypeptide of 1058 amino acids, is found in the Chlorella Pbi virus CVM-1, and cleaves DNA 50-fold faster than human type II topoisomerase (Dickey and Osheroff, 2005).
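The molecular masses quoted above follow from the amino acid sequence: the predicted mass is the sum of the average residue masses plus one water molecule for the free termini. The sketch below uses standard average residue masses and also shows the common rule of thumb of about 110 Da per residue, which for 1072 residues gives roughly 118 kDa, in the same range as the 121 kDa predicted for OtV1_212 (the exact value depends on amino acid composition). The function names are hypothetical.

```python
# Average residue masses in daltons (residue = free amino acid minus one water)
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01528  # added once for the N- and C-termini


def protein_mass_da(sequence: str) -> float:
    """Average molecular mass of an unmodified polypeptide, in daltons."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER_MASS


def rough_mass_kda(n_residues: int, mean_residue_mass: float = 110.0) -> float:
    """Rule-of-thumb mass estimate from length alone, in kilodaltons."""
    return n_residues * mean_residue_mass / 1000.0


print(round(protein_mass_da("GG"), 2))  # 132.12 (glycylglycine)
print(round(rough_mass_kda(1072), 1))   # 117.9
```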
The wide distribution of DNA topoisomerase II, from the chloroviruses to EhV-86, the mimivirus APMV and now all of the OtV genomes sequenced to date (Derelle et al., 2008; this study), suggests that this enzyme plays a key functional role. The high DNA cleavage activity of the PBCV-1 DNA topoisomerase may indicate a role in viral recombination (Dickey et al., 2005). Interestingly, the OtV-1 type II DNA topoisomerase shows 50% amino acid identity to the host O. tauri enzyme and 45% identity to the Ostreococcus lucimarinus orthologue (E-value = 0; Table 4). Phylogenetic analysis of DNA topoisomerase II sequences from a number of eukaryotic algae and NCLDVs shows that the host algae form clusters distinct from the algal viruses (Fig. S3). It is therefore possible that this gene was acquired by horizontal gene transfer, although the direction of transfer remains unclear. The clustering of algal homologues into two distinct clades suggests that this was not a recent event, and that the topoisomerase gene transferred from host to virus, or vice versa, in the distant evolutionary past.
The OtV-1 genome also encodes a protein that resembles proliferating cell nuclear antigen (PCNA) (OtV1_107). PCNA interacts with proteins involved not only in DNA replication but also in DNA repair and post-replicative processing, such as DNA methyltransferases and DNA transposases (Warbrick, 2000). As the OtV-1 genome encodes proteins involved in both DNA repair and DNA methylation, this raises the question of whether the viral PCNA also interacts with the encoded methyltransferases.
Nucleotide metabolism-associated proteins
Because of their large genome size, NCLDVs usually encode several deoxynucleotide synthesis enzymes to ensure a sufficient supply of nucleotides for their replication. Viral DNA synthesis most likely requires higher concentrations of dNTPs than the host can supply, so large quantities of dNTPs must be synthesized de novo by both viral and host-encoded proteins. Indeed, OtV-1 encodes six CDSs related to nucleotide metabolism (Table 2): both ribonucleotide reductase subunits (OtV1_133 and OtV1_151), a VV A32 ATPase (OtV1_092), thymidine kinase (OtV1_196), dUTPase (OtV1_199) and an FAD-dependent thymidylate synthase (OtV1_056). Thymidine kinase is a universal enzyme that catalyses the ATP-dependent phosphorylation of thymidine. Enzymes involved in DNA precursor metabolism are known to be important in other members of the Phycodnaviridae, most notably the extensively characterized chloroviruses (Van Etten, 2003). For example, in PBCV-1-infected cells the DNA concentration has been reported to increase 4- to 10-fold by 4 h after infection owing to viral DNA synthesis (Van Etten et al., 1984; Van Etten, 2002). The enzyme dUTPase hydrolyses dUTP to dUMP and pyrophosphate. dUMP is the substrate of thymidylate synthase and is required for de novo synthesis of thymidylate (dTMP), an essential DNA precursor. dUTPase also prevents the incorporation of deoxyuridine into newly synthesized DNA. The dUTPase gene is ubiquitous in eukaryotes, eubacteria and archaea and is also found in a number of retroviruses and DNA viruses, where viral dUTPases may help control local dUTP levels to enhance viral replication (Baldo and McClure, 1999).
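The scale of the nucleotide demand can be illustrated with simple arithmetic: each new copy of the 191 761 bp double-stranded genome consumes two nucleotides per base pair, split according to the 45.36% G+C content, and because A pairs with T in duplex DNA, dATP and dTTP are consumed in equal amounts. The sketch below computes these per-copy figures from the genome length and G+C content given in the text; the burst size in the last line is a purely hypothetical value, since no burst size is reported here, and the function names are likewise illustrative.

```python
GENOME_BP = 191_761   # OtV-1 genome length (bp), from the text
GC_FRACTION = 0.4536  # OtV-1 G+C content, from the text


def dntps_per_genome_copy(genome_bp: int = GENOME_BP, gc: float = GC_FRACTION) -> dict:
    """Approximate dNTP consumption for one double-stranded genome copy."""
    total = 2 * genome_bp             # two nucleotides per base pair
    gc_total = round(total * gc)      # dGTP + dCTP
    at_total = total - gc_total       # dATP + dTTP
    return {
        "total": total,
        "dGTP": gc_total / 2, "dCTP": gc_total / 2,
        "dATP": at_total / 2, "dTTP": at_total / 2,  # A pairs with T, so equal
    }


def dntps_for_progeny(burst_size: int, genome_bp: int = GENOME_BP) -> int:
    """Total dNTPs needed to replicate `burst_size` genome copies."""
    return 2 * genome_bp * burst_size


per_copy = dntps_per_genome_copy()
print(per_copy["total"], per_copy["dTTP"])  # 383522 104778.0
print(dntps_for_progeny(100))               # hypothetical burst size of 100
```

Roughly 105 000 dTTP molecules per genome copy helps explain why thymidylate-producing enzymes (thymidine kinase, dUTPase and ThyX) feature among the viral gene products.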
Like PBCV-1, OtV-1 lacks a canonical thymidylate synthase A (ThyA). Instead, it encodes a member of a new family of flavin-dependent thymidylate synthases called ThyX (Myllykallio et al., 2002) (OtV1_056; Table 2). This enzyme is also found in chloroviruses but has not been reported in any other member of the Phycodnaviridae. Thymidylate (dTMP), an essential DNA precursor, can be synthesized by two pathways, each using a different thymidylate synthase, ThyA or ThyX (Myllykallio et al., 2002). Both enzymes convert dUMP to dTMP by methylation, but they share no sequence identity or structural similarity. Although both ThyA and ThyX depend on methylenetetrahydrofolate for activity, their reductive mechanisms differ markedly: only ThyX uses FAD as a cofactor, and ThyX proteins share a highly conserved sequence motif, RHRX7S (the ThyX motif). ThyX proteins have been found in approximately 25% of sequenced archaeal and bacterial genomes (Myllykallio et al., 2002), including many pathogenic bacteria (Liu and Yang, 2004), and in a number of bacteriophages (Bhattacharya et al., 2008), but in only certain double-stranded DNA viruses (Graziani et al., 2004; 2006). Numerous ThyA homologues from viral sources have been analysed, but equivalent data have not been reported for viral ThyX proteins to date. Biochemical studies indicate that ThyX proteins bind FAD and act as NAD(P)H oxidases in the presence of bound dUMP. This suggests that ThyX activity during viral infection depends on FAD supplied by the host.
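The ThyX signature RHRX7S (Arg-His-Arg, any seven residues, then Ser) maps directly onto a regular expression, so candidate ThyX proteins can be flagged with a short pattern scan. The sketch below is an illustrative screen under that reading of the motif, applied to a toy sequence; a motif hit alone would not establish ThyX function, and the function name is hypothetical.

```python
import re

# RHR, then exactly seven residues of any kind, then S; the lookahead reports overlapping hits
THYX_MOTIF = re.compile(r"(?=(RHR.{7}S))")


def find_thyx_motifs(protein: str) -> list:
    """Return (0-based position, matched motif) for each ThyX-like motif."""
    return [(m.start(), m.group(1)) for m in THYX_MOTIF.finditer(protein.upper())]


toy = "MAAA" + "RHR" + "A" * 7 + "S" + "GG"
print(find_thyx_motifs(toy))  # [(4, 'RHRAAAAAAAS')]
```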
OtV-1 encodes at least two putative transcription factor-like elements: TFIIB (OtV1_153) and TFIIS (OtV1_024). OtV-1 also encodes two proteins involved in creating an mRNA cap structure, both of which are mRNA capping guanylyltransferases (OtV1_087 and OtV1_156). OtV1_156 has no homologue in OtV5, and its closest match (30% amino acid identity) is the Chlorella virus protein PBCV-NY2A_B148R (Fitzgerald et al., 2007a). The first two steps in the capping of cellular mRNAs are catalysed by RNA triphosphatase and RNA guanylyltransferase. The mRNA cap structure is a defining feature of eukaryotic mRNA and is required for mRNA stability and efficient translation. Expression of mRNA capping enzymes has been reported during the immediate early phase of chlorovirus infection (onset at 10 min post infection) (Kawasaki et al., 2004).
OtV-1 does not encode any RNA polymerase genes, consistent with the previously sequenced genomes of the PBCV-1 and EsV-1 lineages (Van Etten, 2003; Wilson et al., 2009). In contrast, the genome of the coccolithovirus EhV-86 encodes five RNA polymerase subunits. The smaller phycodnaviruses (chloroviruses and phaeoviruses), as well as the prasinoviruses described in this study, most likely underwent lineage-specific gene losses that eliminated a host-independent transcription apparatus. This would represent a major change in the propagation strategy of the virus: through the loss of its RNA polymerase subunits, it would become dependent on the nucleus for transcription (Allen et al., 2006).
OtV-1 encodes a putative TATA-box binding protein (OtV1_140). TATA-binding protein is a transcription factor that binds specifically to the TATA box, a sequence usually found 25–30 bases upstream of the transcription start site in some eukaryotic gene promoters, and helps to position RNA polymerase II over the transcription start site. Since no recognizable RNA polymerase or RNA polymerase components have been detected in the OtV-1 genome, the viral DNA may be targeted to the nucleus, where host RNA polymerase(s) could initiate viral transcription, possibly in conjunction with virus-encoded transcription factors.
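The positional rule above (a TATA box roughly 25–30 bases upstream of the transcription start site) can be turned into a simple scan for the widely used TATAWAW consensus (W = A or T). This sketch rests on several assumptions: OtV-1 promoters have not been characterized here, the consensus and the 25–30 base window are simplifications, and the sequence and function name are hypothetical.

```python
import re

# TATAWAW consensus (W = A or T); the lookahead allows overlapping matches
TATA_BOX = re.compile(r"(?=(TATA[AT]A[AT]))")


def tata_candidates(seq: str, tss: int, min_up: int = 25, max_up: int = 30) -> list:
    """Return (position, motif, bases upstream) for boxes starting min_up..max_up bases before tss.

    tss is the 0-based index of the transcription start site in seq.
    """
    hits = []
    for m in TATA_BOX.finditer(seq.upper()):
        upstream = tss - m.start()
        if min_up <= upstream <= max_up:
            hits.append((m.start(), m.group(1), upstream))
    return hits


promoter = "G" * 10 + "TATAAAA" + "G" * 21 + "ACGTC"  # toy sequence, TSS at index 38
print(tata_candidates(promoter, tss=38))  # [(10, 'TATAAAA', 28)]
```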
The OtV-1 genome also encodes an RNase III enzyme (OtV1_125) and an RNase H (OtV1_168) that are presumably involved in processing viral mRNAs and/or tRNAs. It also encodes a SWI/SNF family helicase (OtV1_129); proteins of this family have been implicated in chromatin remodelling (Kim and Clark, 2002).
Sugar manipulation enzymes
The OtV-1 genome encodes several proteins with high identity to enzymes that manipulate sugars, synthesize polysaccharides or transfer sugars to proteins. Following the advent of ‘-omic’ approaches, i.e. genomics, transcriptomics and proteomics, as important biological research areas, the field of ‘glycomics’ is now emerging. The ‘glycome’ is defined as all the sugars a biological entity makes, including glycans attached to proteins, lipids or DNA (Hirabayashi, 2004). Viruses can modify the glycome by two distinct mechanisms: some affect the expression of host glycosyltransferases, whereas others express their own glycosyltransferases.
Glycosyltransferases are a complex group of enzymes involved in the biosynthesis of disaccharides, oligosaccharides and polysaccharides; they take part in the post-translational modification of proteins (N- and O-glycosylation) and in the synthesis of lipopolysaccharides found in high-molecular-weight cross-linked periplasmic or capsular material (Lovering et al., 2007). Glycosyltransferase-encoding genes have been reported in bacteriophages, baculoviruses, poxviruses, herpesviruses and phycodnaviruses (Markine-Goriaynoff et al., 2004). Within the phycodnaviruses, only two virus genera had previously been identified as encoding their own glycosyltransferases (Van Etten, 2003). The chlorovirus ATCV-1 encodes six glycosyltransferases (Fitzgerald et al., 2007b), while PBCV-1 encodes five putative glycosyltransferases (Zhang et al., 2007). Here we identified at least six putative glycosyltransferase genes in the OtV-1 genome: four from family 1 and one each from families 2 and 25 (Table 2). Viral proteins are typically glycosylated by host-encoded glycosyltransferases located in the endoplasmic reticulum (ER) and Golgi apparatus (Knipe, 1996), but the PBCV-1 MCP Vp54 is glycosylated by the virus itself (Wang et al., 1993). No transmembrane domains or ER/Golgi signal peptides were detected in the OtV-1 glycosyltransferases, suggesting that these enzymes function in the cytoplasm. The glycosyltransferases encoded by OtV-1 may therefore play a role similar to those of PBCV-1. Virus-encoded glycosyltransferases may be crucial for glycosylation of the OtV-1 capsid, particularly as the OtV-1 genome contains eight putative capsid protein genes. Further characterization is required to test this hypothesis.
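The statement that no transmembrane domains were detected rests on hydropathy-based prediction. The paper does not name the tool used; as an assumption, the sketch below shows the classic Kyte and Doolittle (1982) approach of averaging hydropathy over a 19-residue window and flagging windows above about 1.6 as possible membrane-spanning segments. It does not detect signal peptides, dedicated predictors such as TMHMM are more reliable, and the toy sequences and function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list:
    """Mean hydropathy of each full-length window, indexed by window start."""
    protein = protein.upper()
    return [sum(KD_SCALE[aa] for aa in protein[i:i + window]) / window
            for i in range(len(protein) - window + 1)]


def candidate_tm_windows(protein: str, window: int = 19, threshold: float = 1.6) -> list:
    """Start positions of windows whose mean hydropathy exceeds the threshold."""
    return [i for i, score in enumerate(hydropathy_profile(protein, window))
            if score > threshold]


soluble = "DEKR" * 10
membrane_like = "D" * 10 + "L" * 20 + "D" * 10
print(candidate_tm_windows(soluble))                 # []
print(len(candidate_tm_windows(membrane_like)) > 0)  # True
```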
OtV-1 also encodes a GDP-D-mannose dehydratase (GMD) (OtV1_012), an enzyme that catalyses the first step of the biosynthetic pathways leading to several deoxyhexoses, e.g. GDP-L-fucose and GDP-D-rhamnose, in both prokaryotes and eukaryotes. GMD sequences are well conserved, with three catalytic residues (Ser/Thr, Tyr and Lys) and an N-terminal glycine-rich motif involved in cofactor binding. GMD is widely represented across taxa and has recently been identified in three chloroviruses (Tonetti et al., 2003; Markine-Goriaynoff et al., 2004; Fitzgerald et al., 2007a), OtV5 (Derelle et al., 2008) and the Prochlorococcus phage P-SSM2 (Sullivan et al., 2005). GDP-L-fucose is the donor substrate for fucosyltransferases. Fucose is found in many glycoconjugates of prokaryotes and eukaryotes, where it often plays a fundamental role in cell–cell adhesion and recognition (Luther and Haltiwanger, 2009).
OtV-1 also possesses a 6-phosphofructokinase (OtV1_172), a key glycolytic enzyme. This enzyme phosphorylates fructose 6-phosphate to fructose 1,6-bisphosphate, using ATP as the phosphoryl donor, an irreversible step that commits the glycolytic process to completion. OtV-1 is only the second virus, after OtV5, reported to encode this enzyme. As glycolysis breaks down sugars to release energy, a virus-encoded 6-phosphofructokinase may allow the virus to control this important reaction during infection.
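Written as a balanced reaction, the step described above is shown below. This is the standard stoichiometry of ATP-dependent 6-phosphofructokinase, included as a reference sketch; the cofactor specificity of the OtV-1 enzyme itself has not been determined experimentally.

```latex
\[
\text{fructose 6-phosphate} + \text{ATP}
\;\xrightarrow{\;\text{6-phosphofructokinase}\;}\;
\text{fructose 1,6-bisphosphate} + \text{ADP}
\]
```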
Protein, amino acid and lipid synthesis, modification and degradation
The inability of viruses to perform protein synthesis independently of the host is one of the features that distinguish these biological entities from cellular or ‘living’ organisms. However, like OtV5, OtV-1 contains a gene (OtV1_036) encoding acetolactate synthase (ALS), a thiamine diphosphate-dependent enzyme that catalyses the first common step in the biosynthesis of the branched-chain amino acids leucine, isoleucine and valine. ALS is found in plants, fungi and bacteria. Interestingly, the OtV-1 gene appears to have been acquired from a bacterial source, since all significant blast hits are to the equivalent gene in different bacterial species, with an average amino acid identity of 33% and E-values of around 1e-83.
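For reference, the reactions catalysed by ALS can be summarised as follows. This is the textbook stoichiometry of the enzyme class (ThDP denotes thiamine diphosphate), not an activity measured for the OtV-1 protein.

```latex
\begin{align*}
2\,\text{pyruvate} &\xrightarrow{\text{ALS, ThDP}} \text{2-acetolactate} + \text{CO}_2
  && \text{(towards valine and leucine)}\\
\text{pyruvate} + \text{2-oxobutanoate} &\xrightarrow{\text{ALS, ThDP}} \text{2-aceto-2-hydroxybutanoate} + \text{CO}_2
  && \text{(towards isoleucine)}
\end{align*}
```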
A putative proline dehydrogenase gene (OtV1_189) was also identified in the OtV-1 genome. This enzyme participates in arginine and proline metabolism. Interestingly, this gene may have originated from the host, since the viral gene shares 49% identity with the O. tauri host copy (Table 3). Acquiring genes encoding ALS and proline dehydrogenase may benefit the virus by supporting amino acid metabolism during replication. OtV-1 also encodes a putative asparagine synthase (glutamine-hydrolysing) (OtV1_028), an enzyme that generates asparagine from aspartate. This enzyme is also found in the dsDNA viruses OtV5 (Derelle et al., 2008) and APMV mimivirus (Raoult et al., 2004). 3-Dehydroquinate synthase catalyses the formation of 3-dehydroquinate and orthophosphate from 3-deoxy-D-arabino-heptulosonate 7-phosphate (Bender et al., 1989). This reaction is part of the shikimate pathway, which is involved in the biosynthesis of aromatic amino acids. The CDS OtV1_038 encodes a putative 3-dehydroquinate synthase. The 3-dehydroquinate synthase domain occurs on its own in various bacterial 3-dehydroquinate synthases and also as one domain of the pentafunctional AROM polypeptide. The OtV-1 CDS shares 36% identity with the equivalent gene in the bacterium Magnetospirillum gryphiswaldense (Table S2). As this enzyme is widespread among bacteria, a bacterial origin for the OtV-1 gene would be unsurprising.
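The reactions attributed above to the putative OtV-1 enzymes can be summarised as follows. These are the general reactions of each enzyme class as commonly written (DAHP abbreviates 3-deoxy-D-arabino-heptulosonate 7-phosphate; the proline dehydrogenase electron acceptor is left generic); the activities of the viral proteins themselves remain to be confirmed.

```latex
\begin{align*}
\text{L-proline} + \text{acceptor} &\longrightarrow \text{1-pyrroline-5-carboxylate} + \text{reduced acceptor}
  && \text{(proline dehydrogenase)}\\
\text{L-aspartate} + \text{L-glutamine} + \text{ATP} + \text{H}_2\text{O} &\longrightarrow \text{L-asparagine} + \text{L-glutamate} + \text{AMP} + \text{PP}_i
  && \text{(asparagine synthase)}\\
\text{DAHP} &\longrightarrow \text{3-dehydroquinate} + \text{P}_i
  && \text{(3-dehydroquinate synthase)}
\end{align*}
```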
OtV-1 encodes a putative FtsH metalloendopeptidase (OtV1_014) that has not been reported in other characterized eukaryotic algal virus genomes, with the exception of OtV5 (Derelle et al., 2008). FtsH is a membrane-bound ATP-dependent Zn²⁺ protease (Karata et al., 1999). This cell division protein is present in bacteria and plants, and the OtV-1 CDS closely resembles marine bacterial genes, notably those of Prochlorococcus (data not shown). Interestingly, this enzyme is involved in the early stages of photosystem II repair in the cyanobacterium Synechocystis sp. PCC6803 (Silva et al., 2003). We postulate that OtV-1 uses this enzyme for a function similar to that of its cyanobacterial homologue, helping to keep the host chloroplast functioning normally during infection.
A putative N-myristoyltransferase (OtV1_086) is also present in the OtV-1 genome. Covalent modification with fatty acids is now a well-established feature of several cellular and viral polypeptides (Cross, 1987). Two common fatty acid modifications are palmitoylation and myristoylation. Myristoylation occurs predominantly cotranslationally and involves addition of the 14-carbon saturated fatty acid myristic acid, usually via an amide bond to an N-terminal glycine residue (Wilcox et al., 1987). Although myristoylated proteins are known to be widespread both in eukaryotic cell membranes and in viruses (Grand, 1989), this is the first report of a putative N-myristoyltransferase in a member of the Phycodnaviridae. However, a similar protein is encoded by another NCLDV member, APMV mimivirus (Raoult et al., 2004). This enzyme has a specific requirement for myristoyl-CoA (Towler and Glaser, 1986), which is produced from myristate by acyl-CoA synthetase in the presence of ATP. N-myristoyltransferase activity is associated with cytoplasmic and membrane fractions in both yeast and mammalian cells (Towler and Glaser, 1986). Myristoylation allows essentially cytoplasmic proteins or enzymes to become membrane-bound, thus locating them at their site of action. It is therefore possible that myristoylation mediates protein–protein interactions within the virus, such as in capsid assembly. The putative OtV-1 gene shares approximately 36% identity with homologues from several eukaryotic organisms.
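The two-step route to a myristoylated protein described above can be written as the following reactions. They are shown generically (Gly–protein denotes a protein with an N-terminal glycine); the substrate specificity of the putative OtV-1 enzyme is not known.

```latex
\begin{align*}
\text{myristate} + \text{CoA} + \text{ATP} &\longrightarrow \text{myristoyl-CoA} + \text{AMP} + \text{PP}_i
  && \text{(acyl-CoA synthetase)}\\
\text{myristoyl-CoA} + \text{Gly--protein} &\longrightarrow \text{\textit{N}-myristoyl-Gly--protein} + \text{CoA}
  && \text{(\textit{N}-myristoyltransferase)}
\end{align*}
```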
A putative prolyl 4-hydroxylase (OtV1_111) was also found in OtV-1. This procollagen-modifying enzyme is key to the biosynthesis of collagens, a family of extracellular matrix proteins in higher organisms (Eriksson et al., 1999). It has been reported in a small number of viruses, namely PBCV-1 (Eriksson et al., 1999), APMV mimivirus (Raoult et al., 2004) and OtV5 (Derelle et al., 2008). It was previously thought that 4-hydroxyproline was restricted to certain plant and animal proteins. In all collagens and collagen-like proteins of animals, 4-hydroxyproline residues stabilize the triple-helical structure (Prockop, 1995). The functions of these residues in plant proteins are less well characterized but are also likely to involve structural stabilization. The role of 4-hydroxyproline residues in viral proteins is likely to be similar, but further work is needed to elucidate it. Presumably, OtV-1 uses this enzyme to produce structural components during its propagation.
The presence of a putative 2OG-Fe(II) oxygenase (OtV1_170) indicates that the virus may encode additional enzymes for modifying collagen-like proteins. 2OG-Fe(II) oxygenases form a class of enzymes that are widespread in eukaryotes and bacteria and catalyse a variety of reactions, typically the oxidation of an organic substrate using molecular oxygen (Prescott, 1993). An extensively characterized reaction of this class is the hydroxylation of proline and lysine side chains in collagen (Aravind and Koonin, 2001). Together, the putative prolyl 4-hydroxylase and 2OG-Fe(II) oxygenase suggest that OtV-1 may encode its own enzymes for stabilizing complex structures, such as carbohydrate units, during its replication.
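The typical hydroxylation catalysed by 2OG-Fe(II) oxygenases, and its prolyl 4-hydroxylase instance, can be written as below. This is the textbook scheme for the enzyme class, shown for reference rather than as a characterised activity of OtV1_170 or OtV1_111.

```latex
\begin{align*}
\text{R--H} + \text{2-oxoglutarate} + \text{O}_2 &\xrightarrow{\text{Fe(II)}} \text{R--OH} + \text{succinate} + \text{CO}_2\\
\text{peptidyl-proline} + \text{2-oxoglutarate} + \text{O}_2 &\xrightarrow{\text{Fe(II)}} \text{peptidyl-\textit{trans}-4-hydroxyproline} + \text{succinate} + \text{CO}_2
\end{align*}
```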
Restriction modification enzymes
Although the level of methylation of the OtV-1 genome is unknown, OtV-1 encodes four methyltransferases (Table 2): one DNA methyltransferase predicted to methylate cytosine, two DNA methyltransferases predicted to methylate adenine and a putative FkbM methyltransferase. Methyltransferases have already been reported in the genomes of Chlorella viruses (Zhang et al., 1998; Fitzgerald et al., 2007a); the chlorovirus PBCV-1, for example, encodes three cytosine and two adenine methyltransferases (Nelson et al., 1998). Although the exact biological function of virus-encoded methyltransferases is unknown, methylation of the virus genome probably protects viral DNA from endonucleolytic attack (Essani et al., 1987). In bacteria, restriction modification systems confer resistance to foreign DNA, including viral DNA. A direct correlation has been reported between increasing 5-methylcytosine (m5C) concentrations in virus genomes and the sensitivity of virus replication to the cytidine methylation inhibitor 5-azacytidine (Burbank et al., 1990). The OtV-1 FkbM methyltransferase (OtV1_015) shows some similarity to enzymes from several bacterial species. Besides OtV-1, this enzyme is encoded by just three chlorovirus types (Fitzgerald et al., 2007a,b), OtV5 (Derelle et al., 2008) and the Prochlorococcus phage P-SSM2 (Sullivan et al., 2005).
A gene (OtV1_030) encoding a PhoH family protein is also present in OtV-1. PhoH is a cytoplasmic protein and predicted ATPase that is induced by phosphate starvation in various bacteria (Wanner, 1993; 1996). Phosphorus (P) is an essential macronutrient for the growth and development of living organisms. It is a constituent of key molecules such as ATP, nucleic acids and phospholipids and, as phosphate, pyrophosphate, ATP, ADP or AMP, plays a crucial role in energy transfer, metabolic regulation and protein activation. P is one of the key macronutrients potentially limiting phytoplankton growth, so it is not surprising that phytoplankton have evolved various adaptive responses to growth under limited P availability (Cembella et al., 1984). These biochemical and metabolic adaptations involve changes that increase the availability of endogenous and exogenous inorganic phosphate. The presence of a phosphate starvation-inducible protein in the OtV-1 genome is of interest, since the virus may use this protein to increase the phosphate concentration in the host cell and so facilitate viral replication. The OtV-1 gene may have originated from a bacterial source, as it shares 41% identity with a similar protein encoded by the bacterium Thermoanaerobacter pseudethanolicus (Table S2) and comparable identity with several other bacterial PhoH family proteins. The previous identification of a putative phosphate repressible phosphate permease (PPRPP) gene in the coccolithovirus EhV-86 (Wilson et al., 2005b) reinforces the idea that P availability is critical for viral replication.
Echinonectin is a dimeric galactosyl-binding lectin, with a marked specificity for galactose, that is believed to play a role in embryonic cell–extracellular matrix adhesion in sea urchin eggs and embryos (Alliegro and Alliegro, 2007). The presence of a putative gene (OtV1_113) encoding this protein suggests that it may act as an adhesion protein, as it is believed to anchor cells in developing echinoderm embryos. This protein may also be related in function to the sugar-manipulating enzymes encoded by the OtV-1 genome.
Despite the obvious ubiquity and abundance of their hosts, relatively little research has been conducted on prasinoviruses (a generic term for viruses that infect the Prasinophyceae), and only a few isolates have been described. This is perhaps ironic, given that the first reported virus isolate belonging to the Phycodnaviridae was a prasinovirus isolated on M. pusilla (Mayer and Taylor, 1979). Research has focused largely on the ecological role and genetic diversity of prasinoviruses (Cottrell and Suttle, 1991; 1995b; Sahlsten, 1998; Sahlsten and Karlson, 1998; Zingone et al., 1999; 2006), and little has been reported on their molecular characterization. Micromonas pusilla-specific viruses can lyse up to 25% of the Micromonas population daily (Evans et al., 2003). However, high host growth rates coupled with high diversity of both host (Worden, 2006) and virus (Chen et al., 1996) allow the two to propagate in stable coexistence (Cottrell and Suttle, 1995a), in contrast to the bloom-and-bust scenario observed with coccolithovirus infection of Emiliania huxleyi blooms (Bratbak et al., 1993; Jacquet et al., 2002; Wilson et al., 2002). Although approximately 5.5 kbp larger than that of OtV5, the OtV-1 genome is still at the small end of the size range of sequenced Phycodnaviridae genomes (Wilson and Schroeder, 2009). To date, 10 Phycodnaviridae genomes have been sequenced, ranging from FsV-158 at 154 kbp (Schroeder et al., 2009) to EhV-86 at 407 kbp (Wilson et al., 2005b). Larger algal virus genomes are known to exist, such as that of the virus infecting the marine microalga Pyramimonas orientalis, which is approximately 560 kbp (Sandaa et al., 2001).
Sequencing of the OtV-1 genome has revealed some surprising features. Aside from the core genes previously reported in members of the NCLDV group, the OtV-1 genome contains several putative genes that have either never been described in a virus or have been reported in only a few viruses. Functional characterization of the genes reported in this study should give clearer insight into how these viruses operate during their replication cycle. As more prasinovirus genomes are sequenced, further genes with novel functions are likely to be discovered. In addition, genome sequence information will provide a starting point for post-genomic tools to explore the ecological role of this important group of viruses.
Plaque-forming virus OtV-1 was isolated from surface seawater collected on 06/09/2006 at the L4 sampling station in the Western English Channel (50°15′N, 04°13′W). The OtV-1 host, O. tauri strain OTH 95, was grown in Keller (K) medium (Keller et al., 1987) at 20°C under a 16:8 h light : dark cycle at an irradiance of 100 μmol photons m⁻² s⁻¹ in a Sanyo MLR-350 incubator. OtV-1 was purified by plaque assay, as described by Schroeder and colleagues (2002), to obtain clonal virus stocks. Briefly, virus lysate was obtained by adding 100 μl of concentrated seawater to exponentially growing O. tauri. Once clearing of the host culture was observed, the lysate was passed through a 0.2 μm filter (Durapore, Millipore). The 1.5% (w/v) electrophoresis-grade agarose plates were prepared by mixing sterile 7.5% (w/v) agarose in distilled water with 30 kDa-filtered, autoclaved seawater while both solutions were at about 70°C. The combined 1.5% (w/v) agarose seawater mixture was left to cool to 55°C, after which the necessary K medium nutrients were added and the solution was poured into Petri dishes and allowed to set at room temperature.
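Reaching 1.5% (w/v) from a 7.5% (w/v) stock is a five-fold dilution, i.e. one volume of agarose stock to four volumes of seawater. The sketch below shows that C1V1 = C2V2 arithmetic; the 100 ml plate-mix volume is a hypothetical example, and the small volume of K medium nutrients added afterwards is assumed to be negligible.

```python
def stock_volume(final_conc, stock_conc, final_volume):
    """Volume of stock needed to reach final_conc in final_volume (C1 * V1 = C2 * V2)."""
    if not 0 < final_conc <= stock_conc:
        raise ValueError("final concentration must be positive and not exceed the stock")
    return final_conc * final_volume / stock_conc


# Hypothetical 100 ml batch of 1.5% (w/v) bottom agar from a 7.5% (w/v) stock.
# Assumes the later K medium nutrient additions do not change the volume appreciably.
agarose_ml = stock_volume(1.5, 7.5, 100.0)
seawater_ml = 100.0 - agarose_ml
print(f"{agarose_ml:g} ml 7.5% agarose + {seawater_ml:g} ml seawater")
```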
Exponentially growing phytoplankton cells were harvested by centrifugation at 5000 g for 5 min at 4°C and resuspended in the appropriate medium at 50× concentration. The resuspended cells were then mixed with 100 μl of each 10-fold dilution of the virus stock in sterile medium, from 10⁻² to 10⁻⁸, and incubated at the appropriate temperature under constant illumination for 2 h to allow virus adsorption. The virus–host suspension was mixed with 3 ml of molten 0.4% (w/v) electrophoresis-grade agarose (40°C) made with K medium and poured onto the 1.5% (w/v) agarose bottom plates. The plates were then placed in plastic bags and transferred to the incubator under the appropriate temperature and light conditions. Plates were monitored daily until clear plaques were visible. Single plaques were picked from the plate using sterile pipette tips, resuspended in 0.5 ml K medium and, after overnight incubation at 4°C, used for subsequent inoculations. After three rounds of single-plaque purification, viruses were produced by infecting liquid cultures of O. tauri in mid-exponential phase (∼1 × 10⁷ cells ml⁻¹).
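The same ten-fold series (10⁻² to 10⁻⁸, 100 μl plated) can also be used to estimate a stock titre from a plate with countable plaques. The text above uses the plates only for purification, so the titre calculation below is a standard sketch added for illustration, and the plaque count is hypothetical.

```python
def dilution_series(first_exponent=-2, last_exponent=-8):
    """Ten-fold dilution factors from 10**first_exponent down to 10**last_exponent."""
    return [10.0 ** e for e in range(first_exponent, last_exponent - 1, -1)]


def titre_pfu_per_ml(plaques, dilution, plated_volume_ml=0.1):
    """Plaque-forming units per ml of undiluted stock.

    plaques: plaques counted on one plate
    dilution: dilution factor of the plated virus (e.g. 1e-6)
    plated_volume_ml: volume of diluted virus mixed with the cells (100 ul above)
    """
    if plaques < 0 or dilution <= 0 or plated_volume_ml <= 0:
        raise ValueError("count must be non-negative; dilution and volume positive")
    return plaques / (dilution * plated_volume_ml)


series = dilution_series()
# Hypothetical count: 30 plaques on the 10^-6 plate.
print(f"{titre_pfu_per_ml(30, 1e-6):.1e} PFU per ml")
```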
DNA preparation and sequencing
To prepare large quantities of virus for genome sequencing, 10 l volumes of exponentially growing O. tauri culture were inoculated with 10 ml of OtV-1 at a multiplicity of infection (MOI) of one. Lysed cultures were passed sequentially through 0.8 and 0.2 μm filters to remove large cellular debris. Virus filtrates were concentrated by ultrafiltration to ∼50 ml using a Quixstand benchtop system and hollow fibre cartridges with a 30 000 nominal molecular weight cut-off (NMWC) (GE Healthcare Amersham Biosciences). The 50 ml concentrate was then further concentrated to ∼10 ml using the Mid-Gee benchtop system and a hollow fibre cartridge with a 30 000 NMWC (GE Healthcare Amersham Biosciences). Aliquots (3 ml) of the concentrated OtV-1 lysate were adjusted with CsCl to densities of 1.1, 1.2, 1.3 and 1.4 g ml⁻¹, and gradients from 1.1 to 1.4 g ml⁻¹ were formed by ultracentrifugation at 25 000 g at 22°C for 2 h in an SW40 Ti Beckman rotor. Virus bands were removed with a syringe and dialysed against 4 × 1 l volumes of filtered seawater.
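An MOI of one with a 10 ml inoculum into 10 l of culture implies an inoculum titre 1000-fold higher than the host cell density. The sketch below makes that arithmetic explicit; the host density of ∼1 × 10⁷ cells ml⁻¹ is an assumption borrowed from the mid-exponential density quoted for the plaque work, since the density of the large cultures is not stated.

```python
def required_titre(moi, host_cells_per_ml, culture_ml, inoculum_ml):
    """Infectious units per ml needed in the inoculum to reach the target MOI."""
    total_cells = host_cells_per_ml * culture_ml
    return moi * total_cells / inoculum_ml


# Assumed host density (~1e7 cells/ml); culture and inoculum volumes from the text.
titre = required_titre(moi=1, host_cells_per_ml=1e7, culture_ml=10_000, inoculum_ml=10)
print(f"{titre:.1e} infectious units per ml")
```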
Complete sequencing of the OtV-1 genome was performed at the Advanced Genomics Facility based at the University of Liverpool, UK.
Sequence assembly and finishing
The sample was sequenced to an estimated 10× depth using 454 GS-FLX sequencing technology. The reads were assembled de novo with 454's Newbler assembler, version 1.1.03.24. The resulting contigs were then screened by coverage depth to filter out low-coverage algal host contamination from the viral DNA. The remaining contigs were ordered and oriented with respect to the reference sequence, the Ostreococcus virus OsV5 complete genome (NC_010191), using MUMmer 3.2 (Kurtz et al., 2004). Primers were designed to fill gaps between contigs, and the resulting Sanger sequence data were merged with the 454-generated contigs to form the complete genome sequence. Putative open reading frames (ORFs) were then identified de novo using glimmer3 with the provided g3-iterated.csh script (Delcher et al., 2007). Identified ORFs were compared with the reference sequence by blast to detect ORFs that may have been split by frameshift errors introduced during 454 sequencing. Where found, the relevant CDS regions were joined and annotated accordingly. Coding regions of 65 amino acids or shorter were excluded from subsequent analysis. All remaining coding regions were then given a preliminary annotation based on blast similarity searches against the reference. tRNA genes were identified with tRNAscan-SE version 1.23 (Lowe and Eddy, 1997). Annotation was then checked and supplemented manually in Artemis (release 11) (Rutherford et al., 2000).
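The coverage screen described above separates high-depth viral contigs from low-depth host contigs. A minimal sketch follows, assuming a single fixed depth cut-off; the `Contig` record, the contig values and the 5× threshold are all hypothetical, since the threshold actually used is not reported.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length: int        # bp
    mean_depth: float  # average read coverage across the contig


def split_by_coverage(contigs, min_depth):
    """Partition contigs into (putative viral, putative host) by mean read depth."""
    viral = [c for c in contigs if c.mean_depth >= min_depth]
    host = [c for c in contigs if c.mean_depth < min_depth]
    return viral, host


# Hypothetical contigs and threshold, for illustration only.
contigs = [
    Contig("contig00001", 48_210, 11.8),
    Contig("contig00002", 1_530, 1.2),
    Contig("contig00003", 63_977, 10.4),
    Contig("contig00004", 890, 0.9),
]
viral, host = split_by_coverage(contigs, min_depth=5.0)
print([c.name for c in viral])
```

A fixed cut-off is the simplest choice; in practice it would be chosen by inspecting the depth distribution, where viral and host contigs should form separate peaks.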
The whole-genome sequence of OtV-1 was analysed and annotated using Artemis (release 11) (Rutherford et al., 2000), with putative CDSs generated from predicted ORFs, correlation scores for each potential reading frame, GC content and codon usage indices. Similarities of putative CDSs were detected using blast, and any homologous sequences were recorded. Coding sequences were assigned putative functions and colour coded by function. A putative protein-coding region or ORF was defined as a continuous stretch of DNA that translates into a polypeptide initiated by an ATG start codon and extending for 65 or more additional codons. Each identified ORF was used to search for homologues with the protein–protein blast (blastp) program (Altschul et al., 1990).
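The ORF definition above (ATG start, at least 65 further codons before an in-frame stop) can be expressed as a simple six-frame scan. This is a minimal illustrative sketch, not glimmer3 or Artemis: it reports the first ATG in each open stretch, and reverse-strand coordinates refer to the reverse-complement sequence.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_extra_codons=65):
    """Return (strand, start, end) for ATG-initiated ORFs closed by a stop codon.

    Coordinates are 0-based and half-open on the scanned strand; end includes the stop.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame ATG opens a candidate ORF
                elif start is not None and codon in STOPS:
                    extra_codons = (i - start) // 3 - 1  # codons after ATG, stop excluded
                    if extra_codons >= min_extra_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs


# Synthetic example: ATG + 65 codons + stop is exactly long enough to be kept.
example = "ATG" + "GCT" * 65 + "TAA"
print(find_orfs(example))
```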
Comparative genomic analysis was conducted between the genomic data obtained in this study and similar virus genomes in the database using the software program Artemis Comparison Tool (act) (Carver et al., 2005). Inteins were analysed using the NEB InBase database (http://www.neb.com/neb/inteins.html).
Nucleotide sequence accession number
The OtV-1 sequence has been deposited in the GenBank database (Accession No. FN386611).
Analysis of repeat regions
To identify repetitive sequences within the OtV-1 genome, a dot-plot analysis was performed using LBDotView version 1.0 (Huang and Zhang, 2004). This tool compares one genome on the x-axis with a second genome on the y-axis. Here, the OtV-1 genome was compared against itself, so that the position and orientation of repeated sequences could be identified in the plot.
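A self dot plot marks every pair of positions whose sequence words match; direct repeats appear as off-diagonal lines parallel to the main diagonal, and inverted repeats as lines matching the reverse complement. The sketch below is a generic exact k-mer word-match version, not LBDotView's own algorithm; the word size `k` and the example sequences are illustrative assumptions.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def self_dotplot(seq, k=12):
    """Word-match dot plot of a sequence against itself.

    Returns (direct, inverted) sets of (x, y) points: the k-mer at y equals the
    k-mer at x (direct, main diagonal omitted) or its reverse complement (inverted).
    """
    seq = seq.upper()
    index = defaultdict(list)  # k-mer -> list of start positions
    for y in range(len(seq) - k + 1):
        index[seq[y:y + k]].append(y)
    direct, inverted = set(), set()
    for x in range(len(seq) - k + 1):
        word = seq[x:x + k]
        for y in index.get(word, ()):
            if y != x:
                direct.add((x, y))
        rc = word.translate(COMPLEMENT)[::-1]
        for y in index.get(rc, ()):
            inverted.add((x, y))
    return direct, inverted


repeat = "ACGGTCAAGT"
direct_example = repeat + "TTTTT" + repeat
inverted_example = repeat + "TTTTT" + repeat.translate(COMPLEMENT)[::-1]
direct, _ = self_dotplot(direct_example, k=8)
_, inverted = self_dotplot(inverted_example, k=8)
print(sorted(direct)[:3], sorted(inverted)[:3])
```

Exact word matching keeps the sketch short; tools aimed at whole genomes usually add windowed or mismatch-tolerant scoring to pick up diverged repeats.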
Whole and partial gene sequences were aligned using ClustalW with default settings (Thompson et al., 1994). Phylogenetic analyses of the alignments were performed with the neighbour-joining algorithm in phylip version 3.68 (Felsenstein, 2005), and trees were viewed using TreeView version 1.6 (Page, 1998).
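For readers unfamiliar with neighbour joining, the sketch below implements the textbook algorithm on a distance matrix and returns a Newick string. It is an illustration, not phylip's implementation: computing distances from the alignment is not shown, ties are broken by taking the first pair found, and the five-taxon matrix is a small textbook example rather than OtV-1 data.

```python
def neighbour_joining(names, dist):
    """Build an unrooted neighbour-joining tree and return it in Newick format.

    names: taxon labels; dist: symmetric matrix of pairwise distances (list of lists).
    """
    if len(names) < 3:
        raise ValueError("neighbour joining needs at least three taxa")
    nodes = list(names)
    d = [list(row) for row in dist]  # work on a copy
    while len(nodes) > 3:
        n = len(nodes)
        totals = [sum(row) for row in d]
        # Pick the pair minimising Q(i, j) = (n - 2) d(i, j) - R_i - R_j.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to their new parent node u.
        li = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        parent = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from u to each remaining node k.
        to_parent = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[r][c] for c in keep] + [to_parent[pos]] for pos, r in enumerate(keep)]
        d.append(to_parent + [0.0])
        nodes = [nodes[k] for k in keep] + [parent]
    # Join the last three nodes at a single internal node.
    x, y, z = nodes
    lx = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    ly = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    lz = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"


names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbour_joining(names, dist)
print(tree)
```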
Phylogenetic analyses were performed with the amino acid sequences of the DNA polymerase gene and intein. Phylogenetic trees were calculated from confidently aligned regions of homologous proteins using the phylip v3.68 package (Felsenstein, 2005). DNA polymerase sequences from representative viruses, and from several environmental samples very likely to derive from related viruses, were multiply aligned with ClustalW (Thompson et al., 1994). This identified 10 ungapped conserved motif regions, totalling 198 amino acids, in each sequence. The same analysis of full-length viral DNA polymerases identified 10 regions, totalling 216 amino acids, in each sequence. These regions partially overlapped the 10 regions mentioned above but were distributed across the entire sequences and could be identified in more virus families.
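Extracting ungapped regions from a multiple alignment and concatenating them for tree building can be sketched as below. This covers only the gap-free criterion; the judgement of which regions are "confidently aligned" and conserved is not modelled, and the `min_length` parameter and toy alignment are hypothetical.

```python
def ungapped_blocks(alignment, min_length=1, gap="-"):
    """Return (start, end) column ranges (0-based, half-open) where no sequence has a gap.

    alignment: list of equal-length aligned strings.
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("aligned sequences must have equal length")
    blocks, start = [], None
    for col in range(length):
        gap_free = all(s[col] != gap for s in alignment)
        if gap_free and start is None:
            start = col
        elif not gap_free and start is not None:
            if col - start >= min_length:
                blocks.append((start, col))
            start = None
    if start is not None and length - start >= min_length:
        blocks.append((start, length))
    return blocks


def concatenate_blocks(alignment, blocks):
    """Concatenate the selected columns of each sequence into one gap-free string."""
    return ["".join(s[a:b] for a, b in blocks) for s in alignment]


# Toy protein alignment (hypothetical).
aln = ["MKV-LLAGT", "MKVA-LAGS", "MRVALLA-T"]
blocks = ungapped_blocks(aln, min_length=2)
print(blocks, concatenate_blocks(aln, blocks))
```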
The designations and GenBank accession numbers of DNA polymerase sequences used for phylogenetic analysis were as follows [scientific name, with abbreviation in parentheses, followed by the database accession number (referring to the National Center for Biotechnology Information database)]: Alcephaline herpesvirus, AAK00812; Autographa californica nuclear polyhedrosis baculovirus (AcNPV), P18131; Bombyx mori nucleopolyhedral virus (BmNPV), NP047469; Chilo iridescent virus (CIV), AF303741; Chrysochromulina brevifilum virus PW1 (CbV-PW1), AAB49739; Chrysochromulina ericina virus (CeV01), ABU23716; Ectocarpus siliculosus virus (EsV-1), NP_077578; Emiliania huxleyi virus 86 (EhV-86), YP_293784; Equid herpesvirus, YP053063; Feldmannia species virus (FsV), AAB67116; Feldmannia irregularis virus-1 (FirrV-1), AY225133.1; Fowlpox virus (FowPV), NP_039057; Heterosigma akashiwo virus (HaV), AB194136.1; Human herpesvirus, CAA28464; Lymantria dispar virus (LdNPV), AAC70269; Micromonas pusilla virus SP1 (MpV-SP1), AAB66713; M. pusilla virus SG1 (MpV-SG1), AAB49746; M. pusilla virus PL1 (MpV-PL1), AAB49747; Acanthamoeba polyphaga mimivirus (Mimivirus), YP_142676; Molluscum contagiosum virus (MOCV), NP043990; Ostreococcus tauri virus (OtV5), EU304328; O. tauri virus-1 (OtV-1), FN386611; Paramecium bursaria Chlorella virus (PBCV-1), AAC00532; Paramecium bursaria Chlorella virus NY-2A (PBCV NY-2A), AAA88827; Phaeocystis globosa virus 1 (PgV), ABD65727; Pyramimonas orientalis virus (PoV01), ABU23717; Vaccinia virus (VACV), A24878; and O. tauri virus-2 (OtV-2) (W.H. Wilson and K.D. Weynberg, unpublished).
Intein sequence motifs of the protein splicing domain were identified according to the method of Pietrokovski (1998b). These six motifs contained a total of 75 conserved positions. All six motifs were present in every intein except motif N4, which was missing or divergent in some inteins. The Drosophila melanogaster hedgehog protein HINT domain was used as an outgroup for the inteins.
Apart from the hedgehog HINT outgroup and the M. jannaschii TFIIB intein, the intein sequences used for phylogenetic analysis were all inserted into type B DNA polymerases. The organisms' scientific names, with intein names in parentheses, and database accession numbers (referring to the National Center for Biotechnology Information database) were as follows: D. melanogaster hedgehog HINT domain (HH DROME), Q02936; Haloarcula marismortui ATCC 43049 (Hma), YP_136425.1; Acanthamoeba polyphaga mimivirus (APMV), AAV50591; Chrysochromulina ericina virus (CeV), A7U6F1; Heterosigma akashiwo virus (HaV01), BAE06251; Methanococcus jannaschii (Mja Pol-1), Q58295; M. jannaschii (Mja Pol-2), Q58295; M. jannaschii (Mja TFIIB), F64397; Pyrococcus sp. GBD Pol (Psp-GBD Pol), AAA67132.1; Thermococcus aggregans (Tag Pol-1), CAA73475.1; T. aggregans (Tag Pol-2), CAA73475.1; T. aggregans (Tag Pol-3), CAA73475.1; Thermococcus fumicolans (Tfu Pol-1), P74918; T. fumicolans (Tfu Pol-2), P74918; T. hydrothermalis (Thy Pol-2), CAC18555.1; Thermococcus kodakaraensis KOD1 (Tko Pol-1), S71551; T. kodakaraensis KOD1 (Tko Pol-2), S71551; Thermococcus litoralis (Tli Pol-1), S42459; T. litoralis (Tli Pol-2), S42459; Thermococcus peptonophilus strain SM2 (Tpe Pol), E13953; Thermococcus sp. strain GE8 (Tsp-GE8 Pol-1), CAC12850.1; Thermococcus sp. strain GE8 (Tsp-GE8 Pol-2), Q9HH84; Thermococcus sp. strain GT (Tsp-GT Pol-2), ABD14869; and the OtV-1 intein presented here.
The DNA topoisomerase II sequences used for phylogenetic analysis were as follows: Paramecium bursaria chlorella virus AR158, YP_001498788.1; Paramecium bursaria chlorella virus NY-2A, YP_001497977.1; Acanthamoeba polyphaga mimivirus, YP_142834.1; Paramecium bursaria chlorella virus-1, NP_048939.1; Chlorella virus Marburg 1, AA495770.1; Paramecium bursaria chlorella virus MT325, ABT14100.1; Paramecium bursaria chlorella virus FR483, YP_001426181.1; Chilo iridescent virus, NP_149508.1; Chlamydomonas reinhardtii, XP_001700298.1; Emiliania huxleyi virus-86, YP_294202.1; African swine fever virus, P34203.1; Ostreococcus tauri, CAL56339; Ostreococcus lucimarinus, ABO99175; Phaeodactylum tricornutum, XP_002181228; Physcomitrella patens, XP_001769043.1; Thalassiosira pseudonana, XP_002292682.1; and the OtV-1 DNA topoisomerase II sequence from this study.
This research was supported by a standard PhD studentship (ref. NER/S/A/2005/13204) awarded to W.H.W. and D.J.S. and a small projects grant (MGF196) awarded to M.J.A. from the Natural Environment Research Council (NERC). We would like to acknowledge technical help from Margaret Hughes and Neil Hall at the NERC-funded Advanced Genomics Facility at the University of Liverpool.