Recombination and microdiversity in coastal marine cyanophages


*For correspondence. E-mail; Tel. (+1) 401 2543311; Fax (+1) 401 2543310.


Genetic exchange is an important process in bacteriophage evolution. Here, we examine the role of homologous recombination in the divergence of closely related cyanophage isolates from natural marine populations. Four core-viral genes (coliphage T4 homologues g20, g23, g43 and a putative tail fibre gene) and four viral-encoded bacterial-derived genes (psbA, psbD, cobS and phoH) were analysed for 60 cyanophage isolates belonging to five Rhode Island Myovirus (RIM) strains. Phylogenetic analysis of the 60 concatenated sequences revealed well-resolved sequence clusters corresponding to the RIM strain designations. Viral isolates within a strain shared an average nucleotide identity of 99.3–99.8%. Nevertheless, extensive microdiversity was observed within each cyanophage strain; only three of the 60 isolates shared the same nucleotide haplotype. Microdiversity was generated by point mutations, homologous recombination within a strain, and intragenic recombination between RIM strains. Intragenic recombination events between distinct RIM strains were detected most often in host-derived photosystem II psbA and psbD genes, but were also identified in some major capsid protein g23 genes. Within a strain, more variability was observed at the psbA locus than at any of the other seven loci. Although most of the microdiversity within a strain was neutral, some amino acid substitutions were identified, and thus microdiversity within strains has the potential to influence the population dynamics of viral–host interactions.


Horizontal gene transfer and recombination are common evolutionary processes in microbial communities and contribute to generating the tremendous genetic diversity that exists within marine bacterial and viral communities (Hendrix et al., 1999; Fraser et al., 2007). Metagenomic studies of ocean environments reveal thousands of co-occurring viral types, many of which encode bacterial-derived genes important for cellular and metabolic functions (Breitbart et al., 2002; Dinsdale et al., 2008; Williamson et al., 2008). Viral communities thus appear to serve as reservoirs of genetic information that can be transferred to their hosts. In this manner, viruses have directly contributed to the genomic diversity and evolution of their hosts (Ochman et al., 2000; Brussow et al., 2004). Likewise, viral diversity can in part be attributed to the exchange of genetic information between viruses and their hosts (Williamson et al., 2008). Over shorter time scales, viral–host interactions can generate and maintain diversity within host and viral populations by exerting selective pressure on the host for viral resistance and on the virus to overcome host resistance (Andersson and Banfield, 2008; Wilmes et al., 2009).

Marine cyanophages capable of infecting cyanobacteria Prochlorococcus and Synechococcus are frequently found in marine metagenomic libraries (Angly et al., 2006; Bench et al., 2007; Williamson et al., 2008) and several cyanophage genomes have been fully sequenced (Mann et al., 2005; Sullivan et al., 2005; Weigele et al., 2007). These genomes encode numerous bacterial-derived genes, including host-derived genes for photosynthesis, which are transcribed and translated during viral infection (Lindell et al., 2005; Clokie et al., 2006; Sharon et al., 2007). There is evidence that the photosynthesis genes are continuing to be exchanged via homologous recombination among phages and perhaps between phages and their hosts (Lindell et al., 2004; Zeidner et al., 2005; Sullivan et al., 2006; Sharon et al., 2007). The diversity of cyanophages has typically been assessed using single genetic markers such as the viral-encoded host-derived photosystem II psbA gene (Sullivan et al., 2006; Sharon et al., 2007; Chenard and Suttle, 2008; Sandaa et al., 2008) or the core-viral g20 portal protein gene (Zhong et al., 2002; Marston and Sallee, 2003; Short and Suttle, 2005; Sullivan et al., 2008). These markers reveal extensive genetic diversity among cyanophage isolates and within natural cyanophage communities.

In Narragansett Bay, Rhode Island, cyanomyophages that infect Synechococcus spp. are abundant and diverse (Marston and Sallee, 2003). Over 55 cyanomyoviral strains (designated Rhode Island Myovirus or RIM strains) with distinct g20 portal gene sequences (> 1.0% sequence divergence) have been characterized (Marston, 2008). Cyanophages with identical g20 gene sequences can be repeatedly isolated from Narragansett Bay water samples collected at different locations in the Bay and in water samples collected years apart (Marston, 2008). While g20 sequences have been useful in characterizing the overall diversity of cyanomyoviral communities, they cannot be used to predict the outcomes of viral–host interactions (Stoddard et al., 2007) or host range (Sullivan et al., 2008). Indeed, it is not known if g20 sequences are indicative of sequence similarities throughout the rest of the genome. If recombination occurs frequently within viral communities, isolates with the same g20 gene sequence may carry divergent alleles at other loci.

In this study, we used a multilocus sequence analysis approach to examine the divergence and extent of recombination among closely related cyanophages isolated from Rhode Island's coastal waters. Four core-viral genes (coliphage T4 homologues g20, portal protein; g43, DNA polymerase; g23, major capsid protein; and cyanophage Syn9_g101, putative tail fibre protein) and four viral-encoded bacterial-derived genes (psbA, photosystem II D1 protein; psbD, photosystem II D2 protein; cobS, a putative porphyrin biosynthetic protein; and phoH, a phosphate-starvation inducible protein) were analysed from 60 cyanophage isolates belonging to five RIM strains. These five strains co-occur in Rhode Island coastal waters, have broad overlapping host ranges, have been found in water samples collected years apart, and often account for up to 20% of the isolates in a sample (Marston and Sallee, 2003; Marston, 2008). We examined the patterns of gene diversity at each locus and tested for evidence of recombination within and between strains.


Primer design and sequencing

For this study, primer sets were designed for segments of three core-viral genes (coliphage T4 homologues g23, g43 and cyanophage Syn9_g101) and two bacterial-derived genes (cobS and phoH) (Table 1). Sets of primers for g23, g43, cobS and phoH amplified genes from all 60 RIM isolates included in this study. The PCR primers that were developed to target a ∼900 nt segment of the cyanophage Syn9_g101 putative tail fibre gene worked for all isolates belonging to the RIM2, RIM24, RIM49 and RIM50 strains, but did not work for any of the RIM17 isolates. This is not unexpected since homologous Syn9_g101 regions are not observed in all of the sequenced cyanophage genomes and tail fibre sequences can be highly variable (Desplats and Krisch, 2003). Segments of the g20, psbA and psbD genes were obtained from all isolates using previously published primer sets (Table 1). However, when these same primers were used in the sequencing reactions, we did not obtain a readable sequence for the psbA gene of RIM24_R1_1201 and we were only able to directly sequence the psbA region of RIM50 isolates in the forward direction. To determine if single nucleotide polymorphisms (SNPs) observed in gene sequences could have been due to sequencing error, the original viral lysates from isolates containing SNPs were used in new PCR amplifications and the products were sequenced. We reanalysed approximately 10% of the 475 isolate/gene combinations and in all cases the SNPs were present in both the original and second sequences, indicating that they were not due to sequencing error.

Table 1.  Gene function, origin and primer sequences.
| Gene | Function | Origin | Primer sequences | Amplicon size^a (bp) | Reference |
| --- | --- | --- | --- | --- | --- |
| g20 | Portal protein | Core Viral | For: 5′-GTAGWATWTTYTAYATTGAYGTWGG-3′ | 594 | Sullivan et al. (2006) |
| g23 | Major capsid protein | Core Viral | For: 5′-ACWGGWCTKATYTTCGCAATG-3′ | 568 | This study |
| g43 | DNA polymerase | Core Viral | For: 5′-GCWGGTGCWTATGTHAARGAACC-3′ | 476 | This study |
| Syn9_g101 | Putative tail fibre | Core Viral | For: 5′-GGTGGTAMATTAACTSTTGATACTG-3′ | 993 | This study |
| psbA | Photosystem II D1 protein | Cyanobacteria | For: 5′-GTNGAYATHGAYGGNATHMGNGARCC-3′ | 844 | Zeidner et al. (2003) |
| psbD | Photosystem II D2 protein | Cyanobacteria | For a: 5′-TTYGTNTTYRTNGGNTGGAGYGG-3′ | 827 | Sullivan et al. (2006) |
| cobS | Putative porphyrin biosynthetic protein | Bacteria | For: 5′-BACYGTWTGGCACAAYGG-3′; Rev: 5′-CTTRGTNTCMTCATCRAARCG-3′ | 523 | This study |
| phoH | Phosphate-starvation inducible protein | Bacteria | For: 5′-GARATYGGDTTCYTDCCTGG-3′; Rev: 5′-ACWARWCCAGADCKWACRATRTC-3′ | 425 | This study |

a. Amplicon size in cyanophage Syn9 (DQ149023) and RIM49 isolates. Includes primer sequences.


Genetic diversity

Isolates were initially assigned to an RIM strain based on g20 portal protein nucleotide sequences. Isolates belonging to the same RIM strain had < 0.5% nucleotide sequence divergence in g20 genes (Table 2), while the g20 sequence divergence between strains ranged from just over 1.0% to 50% (Marston and Sallee, 2003). The mean nucleotide sequence divergence was also low (< 1.0%) among isolates of the same strain for the other three core-viral genes (g23, g43 and Syn9_g101) as well as two of the bacterial-derived genes (cobS and phoH) (Table 2). In contrast to these six loci, more genetic variability was observed at the host-derived photosystem II psbA and psbD loci. At these loci, the mean sequence divergence within a strain ranged from 0.08% to over 3.0% (Table 2). The psbA gene is particularly variable, with pairwise sequence divergences between some RIM2 isolates ranging up to 14%. The diversity present at the psbD locus is not as extensive as that observed at the psbA locus, but among isolates of RIM17 and RIM50, psbD pairwise sequence divergences of over 4.0% were observed (Table 2). No insertions/deletions (indels) were observed in any of the alleles within a strain, although indels were present in gene alignments containing different strains.

Table 2.  Divergence of four core-viral and four bacterial-derived genes among isolates belonging to five Rhode Island Myovirus (RIM) marine cyanophage strains.
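Core-viral genes: g20, g23, g43 and Syn9_g101. Viral-encoded bacterial-derived genes: psbA, psbD, cobS and phoH.

| Viral strain | g20 | g23 | g43 | Syn9_g101 | psbA | psbD | cobS | phoH | Concatenated |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RIM2 (n = 34)^a | | | | | | | | | |
| Mean seq. divergence^b (%) | 0.03 | 0.61 | 0. | | | | | | |
| Max. divergence^c (%) | 0.37 | 4.14 | 0.46 | 0.24 | 14.1 | 0.26 | 0.41 | 0.79 | 2.45 |
| Variable/total sites (nt) | 2/543 | 23/531 | 3/438 | 2/840 | 153/789 | 4/780 | 3/483 | 5/381 | 195/4785 |
| RIM17 (n = 5) | | | | | | | | | |
| Mean seq. divergence (%) | 0.00 | 0.48 | 0.00 | na^d | 0.82 | 1.79 | 0.00 | 0.94 | 0.67 |
| Max. divergence (%) | 0.00 | 0.92 | 0.00 | | 1.27 | 4.49 | 0.00 | 1.57 | 1.11 |
| Variable/total sites (nt) | 0/546 | 5/543 | 0/432 | | 13/789 | 35/780 | 0/498 | 6/381 | 59/3969 |
| RIM24 (n = 5) | | | | | | | | | |
| Mean seq. divergence (%) | 0.00 | 0.45 | 0.09 | 0.00 | 0.36 | 0. | | | |
| Max. divergence (%) | 0.00 | 0.76 | 0.23 | 0.00 | 0.51 | 0.64 | 0.21 | 0.00 | 0.25 |
| Variable/total sites (nt) | 0/543 | 4/528 | 1/438 | 0/891 | 5/789 | 5/780 | 1/483 | 0/381 | 16/4833 |
| RIM49/Syn9 (n = 8) | | | | | | | | | |
| Mean seq. divergence (%) | | | | | | | | | |
| Max. divergence (%) | | | | | | | | | |
| Variable/total sites (nt) | 0/543 | 0/525 | 0/438 | 5/861 | 39/789 | 3/780 | 0/483 | 0/381 | 47/4800 |
| RIM50 (n = 9) | | | | | | | | | |
| Mean seq. divergence (%) | 0.10 | 0.37 | 0.00 | 0.27 | 1.96 | 1.28 | 0.51 | 0.00 | 0.67 |
| Max. divergence (%) | | | | | | | | | |
| Variable/total sites (nt) | 1/543 | 6/540 | 0/438 | 7/897 | 46/750 | 45/780 | 11/483 | 0/381 | 116/4812 |
| Combined (n = 61) | | | | | | | | | |
| Divergence btw strains^e (%) | 15.8–54.5 | 17.5–59.5 | 15.9–31.3 | 36.7–57.2 | 12.5–17.9 | 12.82–20.64 | 20.9–45.2 | 16.8–47.3 | 21.4–35.2 |
| Variable/total sites (nt) | 253/546 | 295/549 | 177/444 | 484/852 | 224/789 | 241/780 | 223/498 | 174/381 | 2065/4899 |

a. n, number of isolates belonging to each strain included in the analysis.
b. Mean nucleotide sequence divergence of all pairwise comparisons among isolates.
c. Maximum pairwise sequence divergence observed between any two isolates belonging to that strain.
d. Sequence information not available.
e. Range of mean nucleotide sequence divergences between strains.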

Despite the high levels of divergence observed in the psbA and psbD genes, using the concatenated sequences of all eight genes (seven genes for RIM17 isolates), the overall mean sequence divergence of isolates belonging to the same strain was < 0.70% (Table 2). In contrast, isolates belonging to different RIM strains had sequence divergences of over 21% (Table 2) and shared no alleles in common at these eight loci. A phylogeny of the 61 isolates revealed well-resolved clusters that correspond with our strain designations (Fig. 1A). Thus g20 sequences can reliably be used to assign isolates to a strain. Nevertheless, within a strain, extensive microdiversity was observed (Fig. 1B). Each of the 34 RIM2 isolates had a unique nucleotide haplotype, differing from other RIM2 isolates by 1–117 nucleotides out of the 4785 nucleotides in the concatenated alignment (Table 3). All isolates belonging to RIM17, RIM24, or RIM49 strains also had unique nucleotide haplotypes. Only in the RIM50 strain were isolates with identical nucleotide haplotypes across these eight loci observed.

Figure 1.

Phylogenetic trees constructed by minimum evolution using the concatenated nucleotide sequence alignment of four core-viral (g20, g23, g43 and Syn9_g101) and four bacterial-derived (psbA, psbD, cobS and phoH) genes. A total of 4899 nucleotide positions were included in the analysis. Bootstrap values higher than 80% are indicated at the nodes.
A. Radial tree showing well-resolved clusters among the 60 RIM isolates and cyanophage Syn9. Each cluster corresponds to one of the five RIM strains. Syn9 clusters with RIM49 isolates.
B. Radial tree of the 34 RIM2 isolates. Extensive microdiversity is present within a strain. Each RIM2 isolate has a unique nucleotide haplotype. The five RIM2 isolates with detectable intragenic recombination events (Table 4) are marked with asterisks.

Table 3.  Allelic profiles and haplotypes of the 34 isolates belonging to the RIM2 strain.
| RIM2 isolate | Allelic profile^a (nt) | Haplotypes^b |
| --- | --- | --- |

a. Alleles were arbitrarily numbered starting with 1. Alleles in bold represent those that lead to a change in amino acid sequence.
b. Nucleotide and amino acid haplotypes based on concatenated sequences.


Allelic divergence and intragenic recombination

Within a strain, multiple alleles were frequently observed at each of the eight loci (Table 3). In general, two patterns of allelic divergence within a strain were observed. The first pattern was the most common, with alleles at a locus differing from one another by 1–6 isolated SNPs. The second pattern was the presence of multiple polymorphisms (i.e. 5–80 SNPs) clustered together and flanked by highly conserved sequences on either side (Fig. 2). This second pattern was observed most often in the psbA and psbD nucleotide alignments of isolates belonging to the same strain, and we also observed it in the g23 gene of RIM2 isolates and the cobS gene of RIM50 isolates. The second pattern is consistent with homologous recombination between strains which can lead to clustered polymorphisms when short segments of a gene are replaced with a homologous segment from a related but distinct strain (Casjens, 2005).
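The second, clustered pattern can be screened for with a simple pass over a pairwise alignment. The Python sketch below only illustrates the idea; it is not the visual inspection or the RDP3 analysis used in this study. The function names and the `max_gap` and `min_snps` thresholds are assumptions chosen for the example, and the toy sequences are invented.

```python
from typing import List, Tuple


def snp_positions(ref: str, other: str) -> List[int]:
    """Return 0-based alignment columns at which two aligned sequences differ."""
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(ref, other)) if a != b]


def snp_clusters(
    positions: List[int], max_gap: int = 30, min_snps: int = 5
) -> List[Tuple[int, int, int]]:
    """Group SNPs that lie within max_gap columns of the previous SNP.

    Returns (first_column, last_column, snp_count) for every group that
    contains at least min_snps SNPs. Isolated SNPs are ignored.
    """
    clusters = []
    run: List[int] = []
    for pos in positions:
        # A large gap closes the current run of neighbouring SNPs.
        if run and pos - run[-1] > max_gap:
            if len(run) >= min_snps:
                clusters.append((run[0], run[-1], len(run)))
            run = []
        run.append(pos)
    if len(run) >= min_snps:
        clusters.append((run[0], run[-1], len(run)))
    return clusters


# Toy alignment (invented): two isolated SNPs and one block of five SNPs.
reference = "A" * 100
variant = list(reference)
for column in (5, 40, 42, 44, 46, 48, 95):
    variant[column] = "G"
variant = "".join(variant)

print(snp_clusters(snp_positions(reference, variant), max_gap=10))
# [(40, 48, 5)]
```

A block flagged this way is only a candidate; whether it reflects recombination still has to be tested statistically against potential donor sequences.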

Figure 2.

Pattern of single nucleotide polymorphisms (SNPs) in a psbA nucleotide sequence alignment indicative of an intragenic recombination event. Grey horizontal lines represent the 759 base pair psbA nucleotide alignment. Black bars indicate SNPs relative to the RIM2_R37_907 sequence. The psbA alleles of RIM2_R37_907 and RIM2_R9_906 differ by 72 SNPs in the middle of the gene (underlined in black). The psbA sequence of RIM2_R9_906 in this region is identical to the psbA sequence of RIM47_R312_1207, suggestive of intragenic homologous recombination. This recombination event was detected by multiple methods in RDP3 (Table 4). RIM2_R37_907 and RIM2_R9_906 have identical nucleotide sequences at the beginning and end of the sequenced psbA region and also have identical nucleotide sequences at each of the other seven loci included in this study. Asterisks indicate SNPs that lead to amino acid changes.

To test if the clustered regions of polymorphisms were due to homologous recombination among distinct (> 1% sequence divergence) co-occurring RIM strains, we used multiple statistical methods (Martin et al., 2005a) to identify recombination events between RIM cyanophages isolated from Narragansett Bay. For this study we focused on recombination events that contributed to the microdiversity observed within a strain; putative recombination events that were shared by all isolates within a strain are not reported here. We detected intragenic recombination between RIM strains in psbA, psbD and g23 genes. In all, seven independent intragenic recombination events were identified by multiple methods implemented in RDP3 (Table 4). It is likely that we underestimated the number of recombination events between strains. For example, short regions of heterogeneity (e.g. clusters of 5–10 SNPs) were observed among some isolates of a strain; however, the ability to detect recombination depends on the number of polymorphic sites (Posada et al., 2002) and the statistical methods implemented in the RDP3 program did not detect recombination in any of these regions.

Table 4.  Intragenic recombination events.
| Gene locus | Isolate^a | Potential donor | Breakpoints^b | Detection methods^c |
| --- | --- | --- | --- | --- |
| g23 | RIM2_R3_906 | RIM12_R42_101 | 200–318 | Maxchi, Chimaera, SiScan |
| psbA | RIM2_R1_999 | RIM24_R10_107 | 181–753 | GENECONV, Bootscan, Maxchi, Chimaera, SiScan |
| psbA | RIM2_R9_906 | RIM47_R312_1207 | 112–639 | RDP, GENECONV, Bootscan, Maxchi, Chimaera, SiScan |
| psbA | RIM49_N9_907 | unknown | 513–753 | GENECONV, Bootscan, Maxchi, Chimaera, SiScan |
| psbA | RIM50_R5_704 | RIM15_R1_301 | 588–739 | RDP, GENECONV, Bootscan, Maxchi, Chimaera, SiScan |
| psbD | RIM17_R501_906 | RIM6_R12_100621–753 | | GENECONV, Bootscan, Maxchi, Chimaera, SiScan |
| psbD | RIM50_R107_107 | unknown | 467–777 | GENECONV, Bootscan, Maxchi, Chimaera, SiScan |

a. Isolates containing the same or similar recombination events are grouped together.
b. Numbers represent nucleotide positions in the sequence alignment.
c. Methods as implemented in the RDP3Beta34 program (Martin et al., 2005a).

The majority (> 90%) of the SNPs within a strain did not result in amino acid changes. Nevertheless, the few SNPs that did lead to amino acid changes were often in genes that could potentially influence viral–host interactions (i.e. the major capsid g23 gene, the putative tail fibre Syn9_g101 gene, and the photosystem II psbA gene) (Table 3). Both point mutations and intragenic recombination led to changes in amino acid sequences. For instance, in the major capsid protein, the different amino acid haplotypes in RIM2 isolates were due to intragenic recombination between RIM2 and RIM12 (Table 4), while in RIM17 and RIM50 strains the different major capsid amino acid haplotypes were the result of isolated SNPs. Although we did observe differences in host range among isolates of the same strain, these host range differences did not correlate with the amino acid changes observed in the region of the putative tail fibre gene we sequenced. We are continuing to examine other regions of tail fibre genes to identify SNPs that might be responsible for host-range changes within a strain.

Homologous recombination within a strain

In addition to homologous intragenic recombination events between distinct strains, it is also possible that homologous recombination could occur within a strain (e.g. between two RIM2 isolates). A method commonly used in multilocus sequence typing (MLST) studies of bacterial populations was used to assess recombination within a strain (Maynard Smith et al., 1993; Maiden, 2006). Multilocus linkage disequilibrium was analysed using I_A (the index of association) based on the allelic profile of the eight genes (Table 3) (Maynard Smith et al., 1993). Each RIM strain was analysed independently because no alleles were shared among isolates of different strains. If isolates within a strain diversify primarily by the accumulation of mutations, then two closely related isolates should share many identical alleles at different loci. This leads to a strong association among alleles, or linkage disequilibrium. If recombination occurs frequently, the alleles at different loci will be randomly associated and the expected value of I_A is zero (Maynard Smith et al., 1993). For each of the five RIM strains, the I_A value was not significantly different from zero, indicating that there is no evidence of linkage disequilibrium. The same results were obtained even when psbA alleles were excluded from the analyses. These results suggest that recombination occurs among isolates belonging to the same RIM strain and has led to the random assortment of alleles within a strain.
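To make the index of association concrete, the Python sketch below computes I_A = V_O / V_E − 1 from allelic profiles, where V_O is the observed variance of the number of loci at which two isolates differ and V_E is the variance expected if loci are independent. It assumes that per-locus diversity is estimated as the fraction of isolate pairs that differ at that locus, and that significance is judged by shuffling alleles within each locus; the text does not state which estimator or test was used, so both are assumptions. The function names and the toy profiles are hypothetical.

```python
import random
from itertools import combinations
from typing import List, Sequence


def index_of_association(profiles: Sequence[Sequence[int]]) -> float:
    """Index of association, I_A = V_O / V_E - 1, for a set of allelic profiles.

    Each profile lists one allele number per locus for one isolate.
    """
    pairs = list(combinations(profiles, 2))
    if not pairs:
        raise ValueError("at least two isolates are required")
    n_loci = len(profiles[0])
    n_pairs = len(pairs)
    mismatches: List[int] = []      # loci differing, one value per pair
    differing = [0] * n_loci        # pairs differing, one value per locus
    for a, b in pairs:
        diffs = [int(x != y) for x, y in zip(a, b)]
        mismatches.append(sum(diffs))
        for locus, d in enumerate(diffs):
            differing[locus] += d
    mean_k = sum(mismatches) / n_pairs
    v_obs = sum((k - mean_k) ** 2 for k in mismatches) / n_pairs
    # Expected variance under independence: sum over loci of h_j * (1 - h_j).
    v_exp = sum((m / n_pairs) * (1 - m / n_pairs) for m in differing)
    if v_exp == 0:
        raise ValueError("no variable loci; I_A is undefined")
    return v_obs / v_exp - 1


def permutation_p_value(
    profiles: Sequence[Sequence[int]], n_perm: int = 1000, seed: int = 0
) -> float:
    """One-sided p-value for I_A, shuffling alleles independently at each locus."""
    rng = random.Random(seed)
    observed = index_of_association(profiles)
    columns = [list(col) for col in zip(*profiles)]
    hits = 0
    for _ in range(n_perm):
        shuffled = [rng.sample(col, len(col)) for col in columns]
        if index_of_association(list(zip(*shuffled))) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy profiles (invented): two loci, four isolates.
linked = [(1, 1), (1, 1), (2, 2), (2, 2)]   # alleles always travel together
mixed = [(1, 1), (1, 2), (2, 1), (2, 2)]    # every combination present

print(round(index_of_association(linked), 3))  # 1.0
print(round(index_of_association(mixed), 3))   # -0.5
```

In the toy data, complete linkage gives a positive I_A, whereas the fully reassorted profiles give a value at or below zero; with so few isolates the estimate can fall below zero, which is why a permutation test is used rather than the raw value alone.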

Genome size variation

To determine if non-homologous recombination might contribute to microdiversity within a strain, pulsed-field gel electrophoresis was used to estimate the genome size of selected isolates belonging to each strain. Within the limits of this assay (∼5 kb), no differences in genome size were observed among any of the isolates belonging to the same strain. The genome sizes of isolates of each strain were: 172 kb for RIM2, 240 kb for RIM17, 176 kb for RIM24 and RIM50, and 178 kb for RIM49.


Cyanophage genome evolution has been shaped both by non-homologous recombination events, which have led to differences in genome size and gene content (Mann et al., 2005; Sullivan et al., 2005; Weigele et al., 2007), and by homologous recombination of gene segments among phages and between phages and their host (Zeidner et al., 2005; Sullivan et al., 2006). Here we show that homologous recombination is also important in generating and maintaining microdiversity among closely related cyanophage isolates, but does not obscure the assignment of isolates to discrete strains. We initially assigned cyanophage isolates to an RIM strain based on g20 nucleotide sequences where the mean nucleotide sequence divergence of isolates belonging to the same strain was < 0.5% and the divergences between strains ranged from just over 1.0% to 50% (Marston and Sallee, 2003). If recombination between distantly related cyanophage strains occurs frequently, we expected that isolates with identical or nearly identical g20 portal genes would carry divergent alleles at all or most of the other loci. This is not what we observed. Rather, sequence similarities at the g20 locus were largely indicative of sequence similarity at the other seven loci, despite the fact that these loci are dispersed across the genome and consist of both core-viral and bacterial-derived genes. Combining the sequences of all eight genes, isolates of the same RIM strain shared a mean nucleotide identity of 99.3–99.8% and phylogenetic analysis of concatenated sequences revealed well-resolved sequence clusters corresponding to our ‘strain’ designations. In fact, one of our isolates, RIM49_R12_907, was nearly identical to cyanophage Syn9 at these eight loci, differing by only 2 out of 4800 nucleotides. Syn9 was isolated from Woods Hole Harbor, MA, USA, 17 years before RIM49_R12_907 was isolated (Sullivan et al., 2003; Weigele et al., 2007). We also identified RIM50 isolates in water samples collected 4 years apart that had identical nucleotide sequences at all eight loci. These results suggest that there can be stability in cyanophage genotypes over time.

Despite well-resolved sequence clusters, extensive microdiversity was observed among closely related isolates within RIM strains. This microdiversity appears to be generated by point mutations, homologous recombination among isolates within a strain, and less frequently by intragenic recombination between RIM strains. The cyanophages included in this study are T4-like bacteriophages. Homologous recombination occurs frequently in T4 and usually involves single genes or segments of genes, rather than larger modular cassettes (Mosig, 1998; Mosig et al., 2001; Miller et al., 2003). Homologues of the key T4 proteins involved in homologous recombination have been found in cyanophage genomes (Mann et al., 2005; Sullivan et al., 2005; Weigele et al., 2007), leading Mann and colleagues (2005) to suggest that recombination may also be important in the life cycle of cyanophages.

One source of microdiversity within a cyanophage strain appears to be homologous recombination among isolates of the same strain. Although homologous recombination among closely related isolates (< 1.0% nucleotide sequence divergence) can be difficult to detect (Posada et al., 2002), one method to assess recombination in this case is to look for the presence of linkage disequilibrium (Maynard Smith et al., 1993; Spratt et al., 2001). In microbial populations, low rates of recombination can lead to a clonal population structure where there is a non-random association between alleles at different loci (linkage disequilibrium) (Spratt et al., 2001; Fraser et al., 2007). When we tested for evidence of linkage disequilibrium within each of the five RIM strains, none was found. Although multiple isolates within a strain often share the same allele at a given locus, none of the isolates belonging to the RIM2, RIM17, RIM24 and RIM49 strains share exactly the same combination of alleles across the eight loci analysed in this study. Each isolate has a unique nucleotide haplotype. These results suggest that recombination among closely related isolates of the same RIM strain has led to the shuffling of genome segments and the random association of alleles at different loci. An alternative explanation is that back mutations or independent acquisitions of exactly the same mutation have led to the appearance of identical alleles in different clonal lineages. However, it is unlikely that back mutation or independent mutations alone would lead to the observed random assortment of alleles within a strain (Spratt et al., 2001). The rate of homologous recombination in bacteriophage and microbial populations increases as sequence divergence between recipient and donor genomes decreases, making recombination more likely among closely related individuals (Hendrix et al., 2003; Fraser et al., 2007; 2009). It has been suggested that homologous recombination among closely related isolates allows for novel sequence variants to be distributed relatively quickly through a population and creates new combinations of alleles on which selection can act (Hendrix, 2002; Wilmes et al., 2009).

Intragenic recombination between distantly related RIM strains was also responsible for creating some of the diversity observed within RIM strains. These events resulted in short, highly polymorphic segments in a few g23, psbA and psbD alleles. In most cases, the donor of the replacement sequence could be identified as one of the other co-occurring RIM strains, with the caveat that the sequence could have been obtained independently by both isolates from a third source not included in the analysis, for instance from another phage, prophage, or host genome. Intragenic recombination among cyanophages and between cyanophages and their hosts has previously been reported for photosystem II psbA and psbD genes (Lindell et al., 2005; Zeidner et al., 2005; Sullivan et al., 2006; Sharon et al., 2007) and in this study, six of the seven identified intragenic recombination events occurred at the psbA and psbD loci. In one case, two RIM2 isolates with identical nucleotide sequences at each of the other seven loci had psbA alleles that differed by 72 nucleotides due to an intragenic recombination event (Fig. 2). Interestingly, most of the nucleotide diversity observed at the psbA and psbD loci was neutral and did not change the amino acid sequence. This is suggestive of strong purifying selection, which has previously been reported (Lindell et al., 2004; Zeidner et al., 2005). Overall, these results suggest that intragenic homologous recombination between RIM strains occurs more often at the psbA and psbD loci, but can also occur at other locations in the genome.

The acquisition of new genetic material via non-homologous (illegitimate) recombination has facilitated the long-term adaptation of bacteriophages to new hosts and environments (Desplats and Krisch, 2003) and is a prominent feature of bacteriophage evolution (Hendrix et al., 1999; 2003). In some marine bacterial populations, even closely related isolates with < 1% divergence of rRNA genes can exhibit large differences in genome size due to the insertions and deletions of large DNA fragments (Thompson et al., 2005). In this study, we did not detect any genome size differences among isolates belonging to the same strain, although non-homologous recombination has likely contributed to the observed differences in the genome sizes among RIM strains (ranging from 172 kb for RIM2 isolates to 240 kb for RIM17 isolates).

Microdiversity among closely related viral isolates may be important in determining the outcome of viral–host interactions and could influence overall rates of infection and viral-induced host mortality (Wilmes et al., 2009). We have shown that cyanobacteria can readily be selected for resistance to cyanophage isolates (Stoddard et al., 2007) and that cyanophages capable of overcoming host resistance can be isolated (Marston, 2008). Other experimental studies using T4 bacteriophages found that host specificities could be altered either by homologous recombination of short segments of the tail fibre gene or by point mutations leading to single amino acid changes in the tail fibre gene (Tetart et al., 1998). In this study, we observed single point mutations in the putative tail fibre gene that led to amino acid changes in isolates of RIM2, RIM49 and RIM50 strains, although these changes did not correlate with changes in host range. Amino acid substitutions within strains were also observed several times in g23, psbA and psbD genes. The functional significance of these changes is not known. Further studies will be necessary to determine how the extensive microdiversity present within a cyanophage strain influences the overall population dynamics of viral–host interactions.

Experimental procedures

Phage isolates

The 60 cyanophage isolates used in this study were obtained from surface seawater samples collected between 1999 and 2008 from one of three locations along the coast of Rhode Island, USA: Roger Williams University, Bristol, RI; Colt State Park, Bristol, RI; or Brenton Point State Park, Newport, RI. Cyanophages were isolated via extinction-dilution enrichment using four marine strains of Synechococcus spp., WH 7803, WH 8012, WH 8018 and WH 8101, as previously described (Marston and Sallee, 2003; Marston, 2008). Following plaque-purification, the isolates were characterized using PCR-RFLP profiles of the g20 portal protein gene (Marston and Sallee, 2003). Assignments of isolates to an RIM strain were based initially on banding pattern similarities of the g20 gene and later confirmed by sequence analysis of a segment of the g20 gene as previously described (Marston and Sallee, 2003). After being assigned to an RIM strain, the isolates were given a designation based on the location from which they were isolated (R, RWU; C, Colt State Park; N, Newport), the number of the extinction dilution well from which they were obtained, and the month followed by the last two digits of the year of their isolation (e.g. RIM2_R18_908). Host ranges of isolates were determined using Synechococcus strains WH 7803, WH 8012, WH 8018, WH 8101 and WH 8113 as previously described (Marston and Sallee, 2003). Cyanophage Syn9 is genetically similar to RIM49 isolates and thus was also included in the analyses. Syn9 was isolated from Woods Hole Harbor, MA, USA in 1990 (Sullivan et al., 2008) and its genome has been completely sequenced (Weigele et al., 2007).
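The isolate naming convention described above can be decoded mechanically. The following Python sketch is one possible reading of it; the `Isolate` type, the `parse_isolate` function and the rule that two-digit years of 90 or above belong to the 1900s are assumptions made for illustration, chosen to fit the 1999 to 2008 sampling period.

```python
import re
from typing import NamedTuple

# Location codes as given in the text (R, RWU; C, Colt State Park; N, Newport).
SITES = {"R": "RWU", "C": "Colt State Park", "N": "Newport"}

# strain _ site letter + well number _ month (1-2 digits) + year (2 digits)
NAME_PATTERN = re.compile(r"^(RIM\d+)_([RCN])(\d+)_(\d{1,2})(\d{2})$")


class Isolate(NamedTuple):
    strain: str
    site: str
    well: int
    month: int
    year: int


def parse_isolate(name: str) -> Isolate:
    """Split a designation such as RIM2_R18_908 into its components."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised isolate designation: {name!r}")
    strain, site, well, month, two_digit_year = match.groups()
    month_number = int(month)
    if not 1 <= month_number <= 12:
        raise ValueError(f"invalid month in {name!r}")
    yy = int(two_digit_year)
    # Assumption: samples span 1999-2008, so 90-99 map to the 1900s.
    year = 1900 + yy if yy >= 90 else 2000 + yy
    return Isolate(strain, SITES[site], int(well), month_number, year)


print(parse_isolate("RIM2_R18_908"))
# Isolate(strain='RIM2', site='RWU', well=18, month=9, year=2008)
```

Under this reading, RIM2_R1_999 was isolated in September 1999 and RIM47_R312_1207 in December 2007.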

Primer design, PCR amplification and sequencing

Using the four full-length cyanophage genomes publicly available in 2007 (Sullivan et al., 2003; Mann et al., 2005; Weigele et al., 2007), PCR primers were designed for three core-viral genes (coliphage T4 homologues g23, major capsid protein and g43, DNA polymerase; as well as cyanophage Syn9_g101, a putative tail fibre protein) and two bacterial-derived genes often found in cyanophage genomes (cobS, a putative porphyrin biosynthetic protein and phoH, a phosphate-starvation inducible protein) (Table 1). PCR primer sequences were obtained from the literature for the viral g20 portal protein gene and two host-derived genes involved in photosynthesis: psbA, photosystem II D1 gene and psbD, photosystem II D2 gene (Table 1) (Zeidner et al., 2003; Sullivan et al., 2006). For the g23, g43, Syn9_g101, cobS and phoH genes, the reaction mix contained Taq reaction buffer (Invitrogen), 1.9 mM MgCl2, 100 μM dNTPs (Invitrogen), 0.1 μM of each primer, 1.25 units Taq DNA polymerase (Invitrogen) and 1.5 μl of viral lysate as the DNA template in a 50 μl reaction volume. For each set of reactions, a control sample containing all reagents but lacking template DNA was included. For all PCR amplifications, the following cycling parameters were used: initial denaturation for 3 min at 94°C, 34 cycles of 94°C for 45 s, 50–60°C for 1 min, 72°C for 1 min, and a final 10 min extension at 72°C. The annealing temperatures for the g23, g43, Syn9_g101, cobS and phoH primer sets fell between 50°C and 60°C; however, the optimal annealing temperatures varied with the specific viral isolate and primer set and were empirically determined for each viral isolate/primer set pair. For some pairs, the annealing temperature was raised to eliminate non-specific bands, while for other pairs it was necessary to lower the annealing temperature to obtain a PCR product. For the g20, psbA and psbD genes, PCR reaction mixes and cycling parameters were followed as previously described (Sullivan et al., 2006). 
PCR products from two or three replicate reactions were pooled and then purified using a MoBio Ultraclean PCR clean-up kit. Purified PCR products were directly sequenced in both directions using the same primers as in the initial reaction. When SNPs were observed in an isolate, the results were confirmed by returning to the original viral lysate, amplifying the gene in two separate PCR reactions, combining the products, and sequencing the gene again. For 12 of the RIM2 isolates containing SNPs, an additional plaque purification step was performed and the new viral lysates were then used in PCR amplifications as described above.

Sequence analysis

Forward and reverse sequences were assembled and manually corrected using the sequence chromatograms. Primer sequences were removed from all sequences. Sequences were aligned using MEGA version 4 (Tamura et al., 2007). Nucleotide sequence alignments were based on amino acid alignments of the same region. The sequences of all eight genes were concatenated to produce an in-frame sequence. The nucleotide alignment of concatenated sequences was used to construct phylogenetic trees using minimum evolution as implemented in MEGA4. To compare relative support of the branches, 500 bootstrap replications were performed in MEGA4. Sequence divergences were calculated using the pairwise deletion option in MEGA4, with p distances for comparisons of isolates within a strain and a K2P model of DNA substitution for comparisons among strains. Nucleotide diversity indices as well as the numbers of synonymous and non-synonymous substitutions were calculated using DnaSP v4.9 (Rozas et al., 2003).
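The two divergence measures can be illustrated with a short Python sketch. This is not the MEGA4 implementation; it is a minimal example that assumes two pre-aligned sequences of equal length, treats any site with a gap or ambiguous base in either sequence as missing (pairwise deletion), and uses the standard K2P formula d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions. The function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def count_differences(seq1, seq2):
    """Return (compared sites, transitions, transversions) for two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped,
    which mimics pairwise deletion.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    n = ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            continue
        n += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if (a in PURINES) == (b in PURINES):
            ts += 1
        else:
            tv += 1
    if n == 0:
        raise ValueError("no comparable sites")
    return n, ts, tv


def p_distance(seq1, seq2):
    """Proportion of compared sites that differ."""
    n, ts, tv = count_differences(seq1, seq2)
    return (ts + tv) / n


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance."""
    n, ts, tv = count_differences(seq1, seq2)
    p, q = ts / n, tv / n
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

For very similar sequences, such as isolates within a strain, the two measures are nearly identical; the K2P correction matters more as divergence increases, which is consistent with its use for comparisons among strains.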

Intragenic recombination was investigated by aligning genes from isolates included in this study with genes from other RIM isolates (i.e. RIM1–RIM50) (Marston and Sallee, 2003; Marston, 2008). Single-nucleotide polymorphism patterns among isolates were visually inspected for potential intragenic homologous recombination events. Multiple approaches as implemented in the RDP3Beta34 program (Martin et al., 2005a) were then used to statistically confirm recombination events. Several methods were used because each has its own limitations: methods vary in their ability to detect recombination events depending on the amount of recombination and the levels of divergence among individuals (Posada et al., 2002). The following algorithms in RDP3 were used at the default settings: RDP (Martin and Rybicki, 2000), GENECONV (Padidam et al., 1999), Chimaera (Posada and Crandall, 2001), MaxChi (Maynard Smith, 1992), BootScan (Martin et al., 2005b) and SiScan (Gibbs et al., 2000). We only considered recombination events where one of the parents belonged to the same strain as the putative recombinant. Only recombination events detected by three methods with a statistical significance of P = 0.01 after Bonferroni correction for multiple comparisons are reported. For the RDP3 analyses, each gene (i.e. psbA, psbD, etc.) with potential recombination events was considered separately.
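The reporting criterion can be sketched in Python as follows. RDP3 applies the Bonferroni correction internally, so this example only illustrates the logic; it is not part of the RDP3 program. It assumes that "detected by three methods" means at least three methods, that the threshold is applied as corrected P ≤ 0.01, and that the uncorrected P values and the number of comparisons are available. The function names and the example values are hypothetical.

```python
def bonferroni(p_value, n_comparisons):
    """Bonferroni-corrected P value, capped at 1."""
    return min(1.0, p_value * n_comparisons)


def supporting_methods(raw_p_values, n_comparisons, alpha=0.01):
    """Names of the methods whose corrected P value is at or below alpha.

    raw_p_values maps a method name to its uncorrected P value for one event.
    """
    return sorted(
        method
        for method, p in raw_p_values.items()
        if bonferroni(p, n_comparisons) <= alpha
    )


def is_reported(raw_p_values, n_comparisons, alpha=0.01, min_methods=3):
    """True if enough methods support the event after correction."""
    return len(supporting_methods(raw_p_values, n_comparisons, alpha)) >= min_methods


# Illustrative, made-up P values for a single putative event.
example_event = {"RDP": 1e-6, "GENECONV": 2e-5, "MaxChi": 1e-4, "Chimaera": 0.5}
```

With these made-up values and 50 comparisons, three methods remain significant and the event would be retained; with 200 comparisons, MaxChi no longer passes and the event would be dropped.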

Following a scheme commonly used in multilocus sequence typing studies of bacteria, for each gene, an allele number was arbitrarily assigned to each sequence variant (Maiden, 2006). By combining the allele numbers at each of the eight loci, each isolate was given an allelic profile, which corresponds to that isolate's nucleotide haplotype. Isolates with the same haplotype have identical sequences at all eight loci. Multilocus linkage disequilibrium was assessed using these allelic profiles. The index of association (IA) (Maynard Smith et al., 1993) was calculated for each strain. The statistical significance of the value was based on a comparison of values estimated from 1000 randomized datasets under a null hypothesis of no linkage disequilibrium.
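The allele numbering and the linkage disequilibrium test can be sketched in Python. This is an illustrative example rather than the software used in the study. It assumes the common formulation I_A = V_O/V_E - 1, where V_O is the variance of the number of mismatched loci over all pairs of isolates and V_E = Σ h_j(1 - h_j), with h_j = 1 - Σ p_i² the gene diversity at locus j (no small-sample correction is applied here). The null distribution is generated by shuffling alleles among isolates independently at each locus. All function names are hypothetical.

```python
import itertools
import random


def assign_alleles(sequences):
    """Give each distinct sequence at one locus an arbitrary allele number (1, 2, ...)."""
    numbers = {}
    return [numbers.setdefault(seq, len(numbers) + 1) for seq in sequences]


def allelic_profiles(loci):
    """loci: one list of sequences per locus, all in the same isolate order.

    Returns one tuple of allele numbers (the haplotype) per isolate.
    """
    return list(zip(*(assign_alleles(seqs) for seqs in loci)))


def index_of_association(profiles):
    """I_A = V_O / V_E - 1 for a list of allelic profiles."""
    n = len(profiles)
    n_loci = len(profiles[0])
    # Number of mismatched loci for every pair of isolates.
    k = [sum(a != b for a, b in zip(p, q))
         for p, q in itertools.combinations(profiles, 2)]
    mean_k = sum(k) / len(k)
    v_obs = sum((x - mean_k) ** 2 for x in k) / len(k)
    v_exp = 0.0
    for j in range(n_loci):
        counts = {}
        for profile in profiles:
            counts[profile[j]] = counts.get(profile[j], 0) + 1
        h = 1.0 - sum((c / n) ** 2 for c in counts.values())
        v_exp += h * (1.0 - h)
    if v_exp == 0:
        raise ValueError("no polymorphic loci")
    return v_obs / v_exp - 1.0


def permutation_test(profiles, n_random=1000, seed=0):
    """Return (observed I_A, fraction of randomized datasets with I_A >= observed)."""
    rng = random.Random(seed)
    observed = index_of_association(profiles)
    columns = [list(col) for col in zip(*profiles)]
    hits = 0
    for _ in range(n_random):
        # Shuffle each locus independently to break associations among loci.
        shuffled = [rng.sample(col, len(col)) for col in columns]
        if index_of_association(list(zip(*shuffled))) >= observed:
            hits += 1
    return observed, hits / n_random
```

An I_A near zero is expected when alleles at different loci are randomly associated, as under frequent recombination; a significantly positive value indicates linkage disequilibrium.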

Pulsed-field gel electrophoresis (PFGE)

Before plugs were cast for PFGE, new viral lysates were obtained by incubating cyanophage isolates on host cells. The lysates were passed through 0.45 μm filters (Fisher Scientific) and then concentrated using 100 kDa Amicon filters (Fisher Scientific). The concentrated lysates were cast into plug molds using equal volumes of molten InCert agarose (Bio-Rad) and lysate as previously described (Wommack et al., 1999). Following an overnight digestion with Proteinase K in 1% SDS – 250 mM EDTA, plugs were washed several times with a 10 mM Trizma – 1 mM EDTA solution and stored at 4°C in a 20 mM Trizma – 50 mM EDTA solution (Wommack et al., 1999). The samples were run on a 1% PFGE certified agarose gel in 0.5× TBE gel buffer using a Bio-Rad DR-II contour-clamped homogeneous electric field cell (Bio-Rad) electrophoresis unit with a 2–20 s switch time for 22 h at 14°C. Molecular size standards (New England Biolabs) were included on the gels. Gels were visualized using 1× SYBR Gold stain (Invitrogen) and band sizes were analysed using a Kodak Gel-Logic System.

Nucleotide sequence accession numbers

The sequence data have been submitted to the DDBJ/EMBL/GenBank databases under accession numbers FJ874674–FJ874733 and GQ283909–GQ284322.


This research was supported by NSF Biological Oceanography grant OCE-0314523, by the RWU Foundation to Promote Research, and by RI-INBRE Grant No. P20RR016457 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH). The contents are solely the responsibility of the authors and do not necessarily represent the official views of NCRR or NIH. We thank J. Bliss, S. Taylor, D. Perley, H. Kunkel, R. Baxley and C. Emmons for assistance in the laboratory, D. Rand for helpful discussions, and J. Martiny for comments on a previous version of the manuscript.