Rapid spread of male-killing Wolbachia in the butterfly Hypolimnas bolina


Anne Duplouy, School of Biological Sciences, The University of Queensland, 4072 Brisbane, Qld, Australia. Tel.: +617 336 52471; fax: +617 336 51655; e-mail: uqaduplo@uq.edu.au


Reproductive parasites such as Wolbachia can spread through uninfected host populations by increasing the relative fitness of the infected maternal lineage. However, empirical estimates of how fast this process occurs are limited. Here we use nucleotide sequences of male-killing Wolbachia bacteria and co-inherited mitochondria to address this issue in the island butterfly Hypolimnas bolina. We show that infected specimens scattered throughout the species range harbour the same Wolbachia and mitochondrial DNA, as inferred from 6337 bp of the bacterial genome and 2985 bp of the mitochondrial genome, suggesting that this strain of Wolbachia spread across the South Pacific Islands within the last 3000 years at most, and probably much more recently.


Maternally inherited symbionts have evolved a number of strategies to invade host populations, often involving the distortion of sex ratio toward the female sex. Whilst mathematical models allow us to predict the speed of spread of sex ratio distorters through host populations, empirical estimates of the rate of spread are limited. Here we use sequence data of male-killing Wolbachia bacteria and co-inherited mitochondria to address this issue in the butterfly Hypolimnas bolina. Wolbachia infections in this species have attracted recent attention for several reasons. First, the bacteria can achieve extremely high prevalence, producing strongly female-biased population sex ratios and disturbing the host mating system (Charlat et al., 2007a). Second, prevalence varies strongly among island populations within the South Pacific region (Charlat et al., 2005, 2006), making this system ideal for investigating the underlying causes, and assessing the consequences, of this variation through a comparative approach. Third, the spread of a host suppressor of male-killing activity has been observed in some populations, suggesting that a dynamic evolutionary arms race is at play (Hornett et al., 2006; Charlat et al., 2007b).

Recent studies based on sequencing of the CO1 mtDNA locus have suggested that one particular male-killing strain, namely wBol1a, has recently spread in this species in spite of the highly fragmented habitat within the South Pacific region. The mitochondria associated with the wBol1a infection show absolutely no variation at the CO1 locus (Charlat et al., 2009). In addition, this CO1 mitotype is highly divergent from the other Hypolimnas bolina mitotypes, and groups in phylogenies with mitotypes obtained from other species of the same genus, suggesting that the infection was recently introduced through hybridization and introgression (Charlat et al., 2009).

A nearly perfect linkage between the wBol1a infection and this divergent mitochondrial haplotype was observed in all populations surveyed, suggesting very efficient vertical transmission of this infection (Charlat et al., 2009). However, the prevalence of the wBol1a strain varies strongly with geography, ranging from complete absence to near fixation on neighbouring islands (Charlat et al., 2005). One possible explanation for this pattern is that each island is at a different time step of a highly dynamic process. Alternatively, different subtypes of wBol1a might have followed different evolutionary trajectories in the different island populations, resulting in different prevalence equilibria.

To distinguish between these hypotheses, and to better assess the speed and geographical route of the wBol1a spread, we sequenced 2985 bp of the mitochondrial genome associated with wBol1a, and 6337 bp of the bacterial genome, from specimens collected throughout the species range. Surprisingly, not a single polymorphic site was detected in either the mtDNA or the bacterial loci. These results suggest that the variation in wBol1a prevalence among islands likely reflects different stages of infection spread rather than evolved differences in Wolbachia biology that would produce differences in equilibrium prevalence. They also suggest that the spread of wBol1a has occurred within the last 3000 years, despite the high level of fragmentation of its habitat in the South Pacific region.

Materials and methods


The geographical origin of the butterflies used in the present study is given in Table 1. Specimens from the South Pacific region were taken from previous field collecting events (Charlat et al., 2005; Hornett et al., 2006), while commercial breeders from the UK provided the specimens from South East Asia. Infection status was determined by PCR assays as previously described (Charlat et al., 2009). In brief, the presence of wBol1 infection was first assessed using the 81F/522R wsp primer pair (Zhou et al., 1998). Positive specimens were then subjected to amplification of the Wolbachia phage locus Gp1b, for which only the wBol1a sub-strain yields a large amplicon (> 1000 bp). A subset of the wBol1a specimens was further investigated.

Table 1.   Geographic origin of the butterfly specimens for each locus sequenced in the present study.
Columns: Locus, followed by geographic origin: Philippines, Thailand, Efate (43%), Savaii (100%), Aneytium (24%), Ile des Pins (83%), Moorea (83%), Tahiti (96%), Rurutu (69%), Total.
Note: All specimens are infected with Wolbachia strain wBol1a. The prevalence of the male-killing Wolbachia (wBol1a + wBol1b) is given in parentheses when available (Charlat et al., 2006).


Host primers

Conserved mitochondrial primers were chosen from the literature, a subset of which produced successful PCR amplification and sequencing, covering a total of 2985 bp (Table S1). Host sequences of the Wingless locus were obtained following amplification with primers Wgl_F and Wgl_R2 (Brower, 2000). All Wingless sequences were identical (GenBank accession number FJ842515), consistent with breeding and morphological evidence that the specimens belong to a single biological species (Charlat et al., 2009).

Bacterial primers

Three complementary approaches were used to design primers for the amplification of Wolbachia DNA, each based on the close relatedness of wBol1a with the wPip Wolbachia strain from Culex pipiens, inferred from wsp and MLST sequences (Charlat et al., 2009). First, we used all primer pairs designed by Duron et al. (2006) that yielded polymorphic amplification in wPip. Second, we selectively amplified several ankyrin loci, based on previously observed polymorphisms in C. pipiens (Sinkins et al., 2005). Finally, we used the program Tandem Repeats Finder (Benson, 1999) to screen for repeat elements in the wPip genome (Klasson et al., 2008) and designed eight primer pairs on this basis. As detailed in Table S1, 11 loci were successfully amplified: 5 phage loci, 1 ankyrin locus and 5 repeat element regions.

PCR conditions

Amplification of each locus was attempted on at least three wBol1a-infected butterflies, collected from at least three different localities within the South Pacific region. Specific PCR conditions for each primer pair are shown in Table S1.


PCR products were purified and the forward strand directly sequenced. Any unique sequence was sequenced again using a reverse primer to eliminate errors. Sequences were aligned and their nucleotide variability assessed using Sequencher 4.5. Sequences from each locus were compared by BLAST against the NCBI database, which confirmed that the primers amplified the expected sequences. Sequences were deposited in GenBank under accession numbers FJ842501 to FJ842514.

Assessing the maximum age of the sweep

We used fourfold and twofold synonymous sites from protein coding mitochondrial sequences to assess the maximum age of the selective sweep. Nucleotide sequences were blasted against the mitochondrial proteome of the silkworm Bombyx mori (Yukuhiro et al., 2002) using blastx to determine the open reading frame and translation frame. The number of fourfold and twofold synonymous sites in each coding region was determined using the Codon Usage program (Stothard, 2000). The maximum age of the sweep was then assessed following Rich et al. (1998). This method estimates a 95% confidence interval for the time elapsed since a sweep event using synonymous sites from protein coding sequences. The method assumes (i) that selection only occurs on variation at the protein level, so that no selection directly affects polymorphism at fourfold synonymous sites and (ii) that the phylogeny of the haplotypes is star shaped, which is the case if population size is large and the sweep is recent.
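The counting of synonymous sites can be illustrated with a minimal Python sketch. It is not the Codon Usage program cited above, and it makes simplifying assumptions: codons are read in frame under the invertebrate mitochondrial genetic code, only third codon positions are considered, and a third position is classed as fourfold when all four bases encode the same amino acid and as twofold otherwise. The function name `count_degenerate_sites` is hypothetical.

```python
# Codon prefixes whose third position is fourfold degenerate under the
# invertebrate mitochondrial code (AGN is all Ser in this code).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG", "AG"}
STOP_CODONS = {"TAA", "TAG"}  # stops in the invertebrate mitochondrial code


def count_degenerate_sites(cds):
    """Return (fourfold, twofold) counts of third-position sites in an in-frame CDS."""
    cds = cds.upper()
    fourfold = twofold = 0
    for start in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[start:start + 3]
        # Skip stop codons and codons with ambiguous bases or gaps.
        if codon in STOP_CODONS or set(codon) - set("ACGT"):
            continue
        if codon[:2] in FOURFOLD_PREFIXES:
            fourfold += 1
        else:
            # Every remaining third position is twofold in this code
            # (e.g. ATA/ATG = Met, TGA/TGG = Trp).
            twofold += 1
    return fourfold, twofold


# Toy example: GCT (Ala, fourfold), ATG (Met, twofold), TTA (Leu, twofold).
example_counts = count_degenerate_sites("GCTATGTTA")
print(example_counts)
```

Summing such counts over all sequenced specimens gives site totals of the kind used in the age estimate, under the star-phylogeny assumption.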


We sequenced 6337 bp of the bacterial genome and 2985 bp of the mitochondrial genome from wBol1a-infected butterflies collected throughout the species range. Not a single polymorphic site was detected. Notably, this lack of genetic diversity in the mitochondrial DNA is restricted to wBol1a-infected H. bolina. As previously shown, uninfected, wBol1b-infected and wBol2-infected butterflies harbour polymorphic mitochondria, with a raw polymorphism index of 0.018 (Charlat et al., 2009).

We followed the method of Rich et al. (1998) to estimate a 95% confidence interval for the time elapsed since the spread of wBol1a using fourfold synonymous sites from protein coding sequences. Recent estimates suggest that Wolbachia substitution rates are two orders of magnitude lower than those of mitochondria (Raychoudhury et al., 2009). In other words, mtDNA data can provide a much more accurate estimate of the age of the sweep. For this reason, we restrict this analysis to mitochondrial protein coding genes. There are a total of 2145 fourfold synonymous sites in our data set, with no polymorphism observed. In addition, no polymorphism was detected at the CO1 locus in a sample of 85 other wBol1a-infected specimens from a previous analysis (Charlat et al., 2009). The CO1 region used by Charlat et al. (2009) includes 106 fourfold synonymous sites, so that the combined dataset includes a total of 11 155 fourfold synonymous sites without polymorphism (85 × 106 = 9010 from Charlat et al., 2009, plus 2145 from the present study). Assuming a substitution rate of 57 × 10⁻⁹ substitutions/silent site/year (Tamura, 1992), this dates the sweep between 0 and 4700 years before present (95% confidence interval). A more accurate estimate can be obtained by including the twofold synonymous sites. Following the estimate from Jiggins (2003), we consider the substitution rate at twofold degenerate sites to be 32% that of fourfold sites; that is, 18 × 10⁻⁹ substitutions/twofold silent site/year. Using the same approach as above, the combined dataset includes a total of 11 155 fourfold synonymous sites and 20 288 twofold synonymous sites, which dates the sweep between 0 and 3000 years before present (95% confidence interval).
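The bounds quoted above can be reproduced with a short calculation. The Python sketch below is one reading of the approach rather than the original analysis code. It assumes a star phylogeny and Poisson-distributed substitutions, so that the probability of observing no polymorphism after t years is exp(−t Σ μ_k n_k), where μ_k is the rate and n_k the number of sites in class k summed over sequences. The upper 95% bound is the value of t at which this probability falls to 0.05. The function name and argument layout are illustrative.

```python
import math


def max_sweep_age(site_classes, alpha=0.05):
    """Upper bound (in years) on sweep age when no polymorphism is observed.

    site_classes: iterable of (substitutions per site per year,
                               number of sites summed over all sequences).
    alpha: probability of seeing zero substitutions at the bound (0.05 -> 95%).
    """
    # Expected number of substitutions accumulated per year across all sites.
    expected_per_year = sum(rate * n_sites for rate, n_sites in site_classes)
    # Solve exp(-expected_per_year * t) = alpha for t.
    return -math.log(alpha) / expected_per_year


FOURFOLD = (57e-9, 11155)  # rate and site count quoted in the text
TWOFOLD = (18e-9, 20288)   # 32% of the fourfold rate, rounded as in the text

age_fourfold = max_sweep_age([FOURFOLD])
age_combined = max_sweep_age([FOURFOLD, TWOFOLD])
print(f"fourfold sites only: {age_fourfold:.0f} years")
print(f"fourfold + twofold:  {age_combined:.0f} years")
```

Under these assumptions the two bounds come out at roughly 4700 and 3000 years, matching the intervals reported in the text.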


Despite extensive sequencing of bacterial and mitochondrial loci from samples across the species range, we found no polymorphism associated with the wBol1a infection in H. bolina. This lack of diversity is in sharp contrast with the high mtDNA diversity seen in non-wBol1a cytoplasmic lineages (Charlat et al., 2009), suggesting it is the result of a recent selective sweep associated with the spread of the infection, rather than a population bottleneck which would have affected equally the entire species.

The H. bolina case parallels those of the butterflies Acraea encedon and A. encedana (Jiggins, 2003) but differs markedly from other host/male-killer associations, such as those in Adalia bipunctata and A. decempunctata, where extensive variation was observed in male-killer-associated mtDNA (v d Schulenburg et al., 2002). From these comparisons, we note a plausible relationship between the age of infection and the efficiency of vertical transmission: virtually perfect transmission is only seen in the two systems where infection appears to be recent. More data points would obviously be necessary to further test this hypothesis.

One striking feature of the H. bolina/wBol1a association is the pronounced variation in infection prevalence among island populations. Two hypotheses can be proposed to account for this pattern. First, the infection might be at equilibrium prevalence, but some critical biological parameters, such as transmission efficiency, indirect benefits of male-killing through fitness compensation, or direct effects of infection on female fitness, might vary among islands. Alternatively, the wBol1a infection might not have reached its equilibrium prevalence in all populations, as suggested by Hornett et al. (2009). Our results give support to the latter hypothesis. The observation that no genetic variation can be detected among Wolbachia isolates from very distant locations makes it unlikely that distinct islands carry bacteria with pronounced biological differences. It remains possible, however, that undetected but critical Wolbachia variation, host variation, or environmental factors, do produce deterministic differences in equilibrium prevalence among islands.

We used fourfold and twofold synonymous sites of mitochondrial protein coding genes to estimate that wBol1a must have spread throughout the H. bolina species range within the last 3000 years (95% confidence interval). We can use this time estimate to assess the strength of selection underlying the spread of the male-killer. Let p_t and q_t denote the frequencies of infected and uninfected females, respectively, at generation t, with q_t = 1 − p_t. Let the uninfected maternal lineage have a fitness of 1 and ωi denote the relative fitness of the infected maternal lineage. Assume perfect maternal transmission of the infection (consistent with empirical data from Charlat et al., 2009). The frequencies of infected and uninfected females at generation 1 can be expressed as functions of fitness and frequencies at generation 0:

$$p_1 = \frac{\omega_i\, p_0}{\omega_i\, p_0 + q_0}, \qquad q_1 = \frac{q_0}{\omega_i\, p_0 + q_0}$$

One can express the relative proportion of infected to uninfected females at generation 1 as a function of this proportion at generation 0:

$$\frac{p_1}{q_1} = \omega_i\, \frac{p_0}{q_0}$$

More generally, this proportion at generation t can be written as follows:

$$\frac{p_t}{q_t} = \omega_i^{\,t}\, \frac{p_0}{q_0}$$

This allows us to express ωi as a function of time and frequencies at generations 0 and t:

$$\omega_i = \left( \frac{p_t\, q_0}{q_t\, p_0} \right)^{1/t}$$

Using our maximum estimate of 3000 years since the start of the spread of wBol1a, a global infection frequency of 0.5 in modern populations and assuming, conservatively, that the initial infection frequency was 1/10⁶ and that H. bolina has 10 generations per year, we estimate that ωi ≈ 1.0005. Thus, a tiny increase in fitness provided by male-killing to its maternal lineage is sufficient to explain the observed pattern.
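This estimate can be checked numerically. The Python sketch below assumes, as in the text, perfect maternal transmission and a constant relative fitness, so that the odds of infection p/q are multiplied by ωi each generation. The function name `relative_fitness` is illustrative.

```python
def relative_fitness(p0, pt, generations):
    """Relative fitness of the infected lineage implied by a change in frequency.

    Assumes the infected:uninfected odds grow by a constant factor per
    generation, so omega = (odds_t / odds_0) ** (1 / generations).
    """
    odds_start = p0 / (1.0 - p0)
    odds_end = pt / (1.0 - pt)
    return (odds_end / odds_start) ** (1.0 / generations)


YEARS = 3000               # maximum age of the sweep from the text
GENERATIONS_PER_YEAR = 10  # value assumed in the text

omega = relative_fitness(p0=1e-6, pt=0.5, generations=YEARS * GENERATIONS_PER_YEAR)
print(f"omega_i = {omega:.5f}")
```

With these inputs the result is about 1.0005, in line with the value given above.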

The ‘brake’ on the spread of an advantageous trait provided by the geographic island structure of the species makes this very low level of fitness benefit an under-estimate of the true value. However, we would note that even a more rapid spread through panmictic populations can be explained by small increases in fitness associated with infection. Figure 1 is derived from the above formulae and illustrates that the estimated ωi is less than 1.002 for most of the plausible values for the duration of the male-killer spread. In fact, the estimated ωi increases above 1.01 only if the spread occurred within less than 100 years, that is, 1000 generations. Moreover, we note that the estimated ωi is robust to variation in initial or present frequency of the male-killer. This analysis illustrates that tiny fitness benefits can produce rapid spread. In other words, very small (and virtually undetectable) positive fitness effects of male-killing might be sufficient to explain rapid male-killing spread across a species range, notwithstanding the impediment produced by strongly structured populations.
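The thresholds discussed here can be explored by inverting the same relation, solving for the duration of spread given a value of ωi. The Python sketch below uses the start and end frequencies from the worked estimate (1/10⁶ and 0.5) and 10 generations per year. These are only one of several possible parameter sets, so the exact crossover points differ from those read off the figure. The function name is illustrative.

```python
import math


def generations_to_spread(omega, p0, pt):
    """Generations needed to go from frequency p0 to pt at relative fitness omega.

    Follows from odds_t = omega ** t * odds_0, i.e.
    t = ln(odds_t / odds_0) / ln(omega).
    """
    log_odds_change = math.log(pt / (1.0 - pt)) - math.log(p0 / (1.0 - p0))
    return log_odds_change / math.log(omega)


GENERATIONS_PER_YEAR = 10  # value assumed in the text

# Duration of spread (in years) implied by each candidate fitness value.
years_by_omega = {
    omega: generations_to_spread(omega, p0=1e-6, pt=0.5) / GENERATIONS_PER_YEAR
    for omega in (1.0005, 1.002, 1.01)
}
for omega, years in years_by_omega.items():
    print(f"omega_i = {omega}: about {years:.0f} years")
```

Under these assumed frequencies, a fitness of 1.01 corresponds to a spread on the order of a hundred years and 1.002 to several hundred years, which is consistent with the qualitative point that small fitness benefits suffice.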

Figure 1.

 Estimated relative fitness of the infected maternal lineage (ωi) as a function of the actual duration of the spread. ωi is estimated throughout the 95% confidence interval of the duration of the male-killer spread (0–3000 years, that is, 0–30 000 generations), for four different sets of initial and current male-killer frequency.


We would like to thank S. Bourlat and D. Shoemaker for their advice on mitochondrial primer design; I. Iturbe-Ormaetxe, M. Woolfit, M. Riegler, E. Hornett and S. Geange for constructive comments on the manuscript; C. Vermenot, S. Boyer and M.J. McIlroy for appreciated help with butterfly collections; and L. Duret, J. Engelstädter and F. Vavre for help and advice on estimation of sweep duration and selective coefficient. We are grateful to the NSF (grant-0416268), NERC (grant-NE/B503292/1), the Australian Research Council (DP0772992), the University of Queensland (UQCS and UQIRTA), and CNRS (grant ATIP - Avenir) for provision of the funds.