Ancestral state reconstruction analysis of hymenopteran sex determination mechanisms


Mark K. Asplen, Department of Entomology, University of Minnesota, 219 Hodson Hall, 1980 Folwell Ave., St. Paul, MN 55108, USA.
Tel.: +1 612 624 3715; fax: +1 612 625 5299; e-mail:


We provide the first phylogenetic evidence supporting complementary sex determination (CSD) as the ancestral mechanism for haplodiploidy in the Hymenoptera. It is currently not possible, however, to distinguish the evolutionary polarity of single-locus (sl) CSD and multiple-locus (ml) CSD given the available data. In this light, we discuss the seemingly maladaptive hypothesis of ml-CSD ancestry, suggesting that collapse from ml-CSD to sl-CSD should remain a viable evolutionary hypothesis based on (i) likely weakening of frequency-dependent selection on sex alleles under ml-CSD and (ii) recent findings with respect to the evolutionary novelty of the complementary sex determiner gene in honeybees. Our findings help provide a phylogenetically informed blueprint for future sampling of sex determination mechanisms in the Hymenoptera, as they yield hypotheses for many unsampled or ambiguous taxa and highlight taxa whose further sampling will influence reconstruction of the evolutionary polarity of sex determination mechanisms in major clades.


In organisms with arrhenotokous haplodiploidy (arrhenotoky), unfertilized eggs develop into haploid males, whereas syngamy permits diploid female development. This genetic system is both ancestral and pervasive in the Hymenoptera (Heimpel & de Boer, 2008) and has been viewed as important to many evolutionary milestones in the order, such as eusociality, adaptive sex allocation, and the secondary transition to female parthenogenesis (thelytoky).

Chromosome number itself, however, does not appear to be the causal mechanism of haplodiploidy in many (if not all) hymenopterans. After early experiments demonstrated significant levels of diploid male production (DMP) under inbreeding in the parasitoid Habrobracon hebetor, Whiting (1943) proposed complementary sex determination (CSD) as a mechanism for hymenopteran arrhenotoky. Under CSD, heterozygosity in the diploid state at one highly polymorphic sex locus (single-locus or sl-CSD; as originally described by Whiting), or at one or more of several such loci (multiple-locus or ml-CSD; Crozier, 1971), leads to female development. Males, however, can arise from either (i) hemizygosity in haploids as in standard arrhenotoky or (ii) homozygosity at all sex loci in diploids. The CSD phenotype has been confirmed in over 60 hymenopteran species (van Wilgenburg et al., 2006) and, in the honeybee, the gene responsible for sl-CSD has been cloned and sequenced (Beye et al., 2003).
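
The CSD rule described above can be written as a short decision function. The following is only an illustrative sketch: the function name, the genotype representation (one tuple of allele labels per sex locus) and the returned labels are assumptions made for this example and are not taken from any of the cited studies.

```python
from typing import Sequence, Tuple

# Hypothetical representation: one tuple of allele labels per sex locus,
# of length 1 for a haploid individual and length 2 for a diploid one.
Locus = Tuple[str, ...]


def csd_sex(genotype: Sequence[Locus]) -> str:
    """Return the sex predicted under CSD for one or more sex loci."""
    ploidy = len(genotype[0])
    if ploidy not in (1, 2) or any(len(locus) != ploidy for locus in genotype):
        raise ValueError("every locus must carry one (haploid) or two (diploid) alleles")
    if ploidy == 1:
        # Unfertilized egg: hemizygous at every sex locus.
        return "haploid male"
    # Diploid: heterozygosity at any single sex locus gives a female.
    if any(a != b for a, b in genotype):
        return "female"
    # Diploid and homozygous at all sex loci.
    return "diploid male"
```

With a single locus this reduces to sl-CSD; with several loci, a diploid individual is male only if every locus is homozygous, which is the ml-CSD case.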

An important correlate with CSD is that diploid males nearly always suffer fitness costs due to increased developmental mortality, decreased longevity, and/or sterility (Heimpel & de Boer, 2008; but see Cowan & Stahlhut, 2004). This diploid male disadvantage could have profound negative impacts for hymenopteran populations exposed to genetic bottlenecks, as it leads to a special case of inbreeding depression (Zayed & Packer, 2005). Given this fitness cost, it is perhaps not surprising that CSD is not universally exhibited in the Hymenoptera; in fact, both sl-CSD and ml-CSD have been reasonably falsified in several species (van Wilgenburg et al., 2006). The mechanism for arrhenotoky in these hymenopterans remains generally unknown, although genomic imprinting has been implicated for the chalcidoid Nasonia vitripennis (Beukeboom et al., 2007).

The fact that all major subgroups of the Hymenoptera (Symphyta, Parasitica, Aculeata) contain CSD-bearing species has been used to support the argument that CSD is the ancestral mode of hymenopteran sex determination (Cook, 1993b; Heimpel & de Boer, 2008). Furthermore, ml-CSD is believed to have evolved from sl-CSD as an adaptation against DMP, as the risk of homozygosity at all loci decreases with each additional sex locus (Crozier, 1971; de Boer et al., 2008; Heimpel & de Boer, 2008). No effort to date, however, has been made to test these hypotheses using phylogenetic methods. Here, we use maximum parsimony to reconstruct the ancestral state for sex determination in the order. Our goals are three-fold: (i) to test formally the hypotheses of CSD and sl-CSD ancestry in various hymenopteran clades, (ii) to develop hypotheses for untested or ambiguous taxa with respect to the presence or absence of CSD and sl-CSD, and (iii) to add a phylogenetic focus to future sampling schemes by highlighting unsampled taxa that could add resolution to future ancestral state reconstructions.

Materials and methods

Inference of sex determination mechanisms

The sex determination mechanisms of 97 hymenopteran species were coded for presence or absence of (i) CSD (whether sl-CSD or ml-CSD) and (ii) sl-CSD (Table S1). van Wilgenburg et al. (2006) developed the following ranking system to determine the strength of evidence for a given species: 1 – highly male-biased sex ratios in the laboratory or field without inbreeding studies, 2 – verification of male diploidy, 3 – an increase in sex ratio (i.e. proportion male) in the progeny of inbred crosses that matches CSD model predictions, 4 – a combination of 2 and 3, and 5 – molecular identification of the sex locus. Although we utilized this ranking system as a guide for coding sex determination, our criteria for sl-CSD presence differ significantly. Here, coding of sl-CSD (+/+ in Table S1) was restricted to species with a confidence code of 4 or 5, whereas species with a confidence code of 2 or 3 were coded as having CSD, but were considered uncertain for sl-CSD (+/? in Table S1). Species lacking sl-CSD can either possess ml-CSD (+/− in Table S1) or lack CSD altogether (−/− in Table S1). Absence of CSD in our coding system requires one of two lines of evidence: (i) a lack of sex ratio elevation and/or DMP after prolonged inbreeding studies (e.g. Cook, 1993a) or (ii) demonstration of thelytoky via gamete duplication, which fixes homozygosity in resulting females (Stouthamer & Kazmer, 1994). If such evidence was not present for a given species, yet sl-CSD was rejected, it was considered uncertain for CSD (?/− in Table S1).
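
The coding rules above can be summarized as a small function. This is a minimal sketch rather than the procedure actually used to build Table S1: the parameter names are hypothetical, and because the text does not state how species with a confidence code of 1 were treated, the sketch leaves them uncoded.

```python
from typing import Optional


def code_species(confidence: Optional[int] = None,
                 sl_csd_rejected: bool = False,
                 ml_csd_supported: bool = False,
                 csd_absence_shown: bool = False) -> Optional[str]:
    """Return a 'CSD/sl-CSD' code such as '+/+' following the rules in the text.

    confidence        -- van Wilgenburg et al. (2006) rank for sl-CSD evidence
    sl_csd_rejected   -- sl-CSD has been rejected for the species
    ml_csd_supported  -- evidence supports ml-CSD
    csd_absence_shown -- no DMP or sex ratio increase after prolonged
                         inbreeding, or thelytoky via gamete duplication
    """
    if sl_csd_rejected:
        if ml_csd_supported:
            return "+/-"   # CSD present, but not single-locus
        if csd_absence_shown:
            return "-/-"   # CSD absent altogether
        return "?/-"       # sl-CSD rejected, CSD itself uncertain
    if confidence in (4, 5):
        return "+/+"       # sl-CSD accepted
    if confidence in (2, 3):
        return "+/?"       # CSD present, sl-CSD uncertain
    return None            # not specified in the text (assumption: uncoded)
```

For example, a species with verified male diploidy only (confidence code 2) would be coded `+/?`, whereas a species in which sl-CSD was rejected without further evidence would be coded `?/-`.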


A composite phylogeny for the Hymenoptera was compiled following the basic structure of Whitfield (1998) and Dowton & Austin (2001). Sources for phylogenetic relationships within subgroups are as follows: Vilhelmsen, 2001 (Symphyta); Belshaw et al., 1998; Michel-Salzat & Whitfield, 2004 (Braconidae); Wagener et al., 2006 (Ichneumonidae); Brothers, 1999 (Aculeata); Hines et al., 2007 (Vespidae); Brady et al., 2006; S.G. Brady, personal communication (Formicidae); Cameron & Mardulyn, 2001; Michel-Salzat et al., 2004; Danforth et al., 2006; Rasmussen & Cameron, 2007; C. Rasmussen, personal communication (Apoidea); Ronquist, 1999 (Cynipoidea); and Gibson et al., 1999; Gauthier et al., 2000; Desjardins et al., 2007 (Chalcidoidea). It should be noted that traditional vespoid relationships have recently been called into question (Pilgrim et al., 2008), although the new findings do not qualitatively change our results. In addition to the taxa for which CSD/sl-CSD is supported or rejected, we included major lineages for which the sex determination mechanism has not been examined for purposes of hypothesis generation. Uncertain relationships in the phylogeny are represented by polytomies, which were treated as ‘soft’ during ancestral state reconstruction analyses.

Ancestral state reconstructions

Two separate ancestral state reconstructions were performed in Mesquite v. 2.01 (Maddison & Maddison, 2007), each using unordered parsimony: one including all species for which CSD can be coded as present or absent (two states), and one in which presence/absence could be verified for sl-CSD (two states). To test whether our reconstructions differed from random expectation, we generated random distributions for the number of evolutionary steps for both the CSD and sl-CSD reconstructions (Maddison & Slatkin, 1991). For each reconstruction, 5000 random trees were generated using the following commands in Mesquite: Taxa & Trees→Make new trees block from→Randomly modify current tree→Reshuffle terminal taxa. This technique preserves the topology of the original tree but randomly shuffles the terminal taxa and their character states for each replicate. Distributions of the number of character steps for each replicate were then developed, to which the original reconstruction step counts were compared at both the 0.05 and 0.01 significance levels (one-tailed tests).
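
The logic of this randomization test can be sketched outside Mesquite as follows. The example is an illustration only and makes several assumptions: it uses a small hypothetical, fully bifurcating tree written as nested tuples (the actual composite phylogeny contains polytomies), it counts steps with Fitch parsimony for a two-state unordered character, and it takes the one-tailed P-value to be the proportion of replicates with as few or fewer steps than observed. All function names are hypothetical.

```python
import random


def fitch_steps(tree, states):
    """Minimum number of state changes on a bifurcating tree (Fitch parsimony)."""
    def visit(node):
        if isinstance(node, str):            # terminal taxon
            return {states[node]}, 0
        left, right = node
        lset, lsteps = visit(left)
        rset, rsteps = visit(right)
        shared = lset & rset
        if shared:                           # no change needed at this node
            return shared, lsteps + rsteps
        return lset | rset, lsteps + rsteps + 1
    return visit(tree)[1]


def tip_names(tree):
    """List the terminal taxa of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return tip_names(tree[0]) + tip_names(tree[1])


def reshuffle_test(tree, states, replicates=5000, seed=0):
    """Shuffle character states among tips and compare step counts."""
    rng = random.Random(seed)
    names = tip_names(tree)
    values = [states[name] for name in names]
    observed = fitch_steps(tree, states)
    as_few = 0
    for _ in range(replicates):
        rng.shuffle(values)                  # topology is left untouched
        if fitch_steps(tree, dict(zip(names, values))) <= observed:
            as_few += 1
    return observed, as_few / replicates


# Hypothetical eight-taxon example: state 1 confined to one clade.
toy_tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
toy_states = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 0, "f": 0, "g": 0, "h": 0}
observed_steps, p_value = reshuffle_test(toy_tree, toy_states)
```

A character that is strongly clustered on the tree, as in the toy data, requires few steps relative to most reshuffled replicates, which is the pattern the test is designed to detect.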

To examine the robustness of our results to alternative tree topologies, we used the following command chain in Mesquite to generate a series of 100 randomly resolved trees for each of the two reconstructions: Make new trees block from→Randomly modify current tree→Randomly resolve polytomies. Ancestral state reconstruction analyses were then performed on each of the created trees and their results compared to the most parsimonious reconstructions on the polytomous trees for both CSD and sl-CSD.
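
One simple way to resolve polytomies at random is sketched below. This is an assumption-laden illustration, not a description of Mesquite's internal algorithm: trees are hypothetical nested tuples, and each polytomy is resolved by repeatedly joining two randomly chosen child lineages until only two remain, a scheme that need not sample all possible resolutions with equal probability.

```python
import random


def resolve_polytomies(tree, rng):
    """Return a fully bifurcating version of a nested-tuple tree."""
    if isinstance(tree, str):                # terminal taxon
        return tree
    children = [resolve_polytomies(child, rng) for child in tree]
    while len(children) > 2:
        # Pick two child lineages at random and join them as a new clade.
        i, j = sorted(rng.sample(range(len(children)), 2))
        right = children.pop(j)
        left = children.pop(i)
        children.append((left, right))
    return tuple(children)


def is_binary(tree):
    """True if every internal node has exactly two descendants."""
    if isinstance(tree, str):
        return True
    return len(tree) == 2 and all(is_binary(child) for child in tree)


def tip_names(tree):
    """List the terminal taxa of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in tip_names(child)]


# Hypothetical tree with two polytomies; 100 random resolutions.
rng = random.Random(1)
polytomous = ("a", "b", "c", ("d", "e", "f", "g"))
resolved_trees = [resolve_polytomies(polytomous, rng) for _ in range(100)]
```

Each resolved tree can then be passed to an ancestral state reconstruction and the result compared with the reconstruction on the polytomous tree.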

Results and discussion

Ancestral state reconstruction of CSD and sl-CSD

The most parsimonious ancestral state reconstruction supports CSD ancestry in nearly all tested hymenopteran clades, including the order itself, with it having been lost independently at least three times (Fig. 1). This number of evolutionary transitions is significantly fewer than expected had CSD evolved randomly (mean of 9.42 steps; median of 10 steps; one-sided P < 0.01). All tested members of Cynipoidea and Chalcidoidea lack CSD, although it is unclear if CSD absence is a synapomorphy of the clade containing these two superfamilies. This is because an unresolved trichotomy is formed by Proctotrupoidea (an untested taxon), Cynipoidea, and Chalcidoidea + Platygastroidea (an uncertain superfamily with respect to CSD). If this is resolved with a basal placement of Proctotrupoidea, then CSD presence or absence are equally parsimonious reconstructed ancestral states for both Proctotrupoidea and the clade as a whole (Fig. 1). CSD has also been lost at least once in both the Aculeata (in the bethylid Goniozus nephantidis) (Cook, 1993a) and the Ichneumonoidea (in the braconid Cotesia flavipes) (Niyibigira et al., 2004).

Figure 1.

Most parsimonious character mapping of CSD presence and absence on a composite phylogeny of the Hymenoptera. The character states of terminal taxa are classified for CSD (left rectangle) and sl-CSD (right rectangle) (see text for coding details). Any character states (denoted by branch colour) assigned to these taxa represent the most parsimonious hypotheses given the character state distribution of known species. In cases where relationships change from the most parsimonious reconstruction after random resolution of polytomies, branches are multi-coloured to represent all possible, appropriate outcomes. Vertical bars delineate taxonomic groups.

Three untested apocritan taxa differed with respect to their reconstructed ancestral state of CSD presence or absence following random polytomy resolution: Ceraphronoidea, Evanioidea and Trigonalyoidea. In all cases, reconstructed states on randomly resolved trees for these taxa were either CSD presence or equivocality (Fig. 1). All other reconstructed states were robust to changes in tree topology.

Ancestral reconstruction analysis of sl-CSD, on the other hand, yields an ambiguous evolutionary pattern (Fig. 2). Here, absence of sl-CSD is reconstructed as the ancestral state, with sl-CSD having been gained independently at least seven times, significantly fewer transitions than expected had sl-CSD evolved randomly (mean of 15.25 steps; median of 15 steps; one-sided P < 0.01). sl-CSD absence is reconstructed as a shared ancestral state of Symphyta, Ichneumonoidea, the basal Aculeata (Chrysidoidea), and the clade containing Chalcidoidea, Cynipoidea, Platygastroidea and Proctotrupoidea. A single synapomorphic gain of sl-CSD is reconstructed for the Ichneumonidae, with additional evolutionary steps occurring in the Tenthredinoidea (at least one gain), Braconidae (at least three gains) and Aculeata (at least two steps). Unlike the CSD ancestral state reconstruction, however, the reconstruction for sl-CSD was far less robust to randomized polytomy resolution. Using different resolutions, the reconstructed states of the order and many internal clades can be either sl-CSD absence or equivocality, whereas the apid clade containing Euglossa and Eulaema can be reconstructed as either equivocal or sl-CSD presence (Fig. 2).

Figure 2.

 Most parsimonious character mapping of sl-CSD presence and absence on a composite phylogeny of the Hymenoptera. Graphical representations follow from Fig. 1.

Even if the sl-CSD reconstruction were robust, coding the presence or absence of sl-CSD as a dichotomous trait is problematic, because the absence of sl-CSD need not arise from the same mechanism. It can result either from (i) ml-CSD or (ii) absence of CSD altogether. This issue is most clearly illustrated by the braconid genus Cotesia. Here, sl-CSD absence is known from two species (C. flavipes and Cotesia vestalis), with sl-CSD having been gained in Cotesia glomerata (Zhou et al., 2006) (Figs 1 and 2). However, while the analysis assumes that sl-CSD absence in these two species is a homologous trait resulting from shared common ancestry, C. flavipes lacks CSD altogether whereas C. vestalis possesses ml-CSD (Niyibigira et al., 2004; de Boer et al., 2008) (Figs 1 and 2). As the developmental basis of species lacking CSD is not yet clearly understood, there is currently no formal way to assess the homology (and thus any order of evolution) of these forms of sl-CSD absence.

In light of this issue, a multi-state approach for the ancestral state reconstruction of sl-CSD that incorporates the three known mechanisms in Hymenoptera (sl-CSD, ml-CSD and CSD absence) is desirable. This is problematic as well, however, as it is unclear whether (i) many species lacking sl-CSD also lack CSD and (ii) the majority of species with CSD possess sl- or ml-CSD (Figs 1 and 2; Table S1). In species lacking sl-CSD, ml-CSD has to date been supported in only two species and rejected in 10, although eight of the rejections lie in two closely related clades (Chalcidoidea and Cynipoidea). This indicates a strong potential for phylogenetic bias in our current sampling of CSD absence, making it impossible to assess whether uncertain species in other clades (e.g. Ichneumonoidea and Aculeata) would be more likely to possess ml-CSD or lack CSD altogether. The relatively weak evidence (confidence codes < 4) for sl-CSD in the majority of CSD-bearing species also makes it difficult to assess the relative frequencies of sl-CSD and ml-CSD across the Hymenoptera (especially in the social aculeates, where prolonged inbreeding studies are more technically challenging). Until more experiments able to reject ml-CSD (e.g. Cook, 1993b; de Boer et al., 2008) have been performed throughout the order, we conclude that insufficient data exist to test the widely held view of sl-CSD ancestry in hymenopterans.

Could sl-CSD evolve from ml-CSD?

In addition to the associated fitness benefits linked with lowered risks of DMP, an evolutionary transition from sl-CSD to ml-CSD via duplication of the sex locus is thought to be particularly parsimonious, as gene duplication is common in nature (Prince & Pickett, 2002; de Boer et al., 2008). The alternative hypothesis of ml-CSD ancestry is considered implausible as frequency-dependent selection should impede fixation of a sex locus through the advantage rare alleles confer in lowering the risk of DMP (Hedrick et al., 2006). We propose caution in rejecting ml-CSD ancestry on these grounds, however, as the strength of frequency-dependent selection on a given locus may be considerably weaker under ml-CSD than under sl-CSD. This is because high allelic diversity at other sex loci may lower the risk of DMP sufficiently to offset the advantage of rare alleles at a locus with low allelic diversity that is at risk of becoming fixed. We are currently developing a frequency-dependent selection model for a two-locus CSD system to assess the influence of allelic diversity at each locus on sex locus fixation.
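
The intuition behind this argument can be illustrated with a simple calculation, which is not the selection model referred to above. The sketch assumes random mating, independent (unlinked) sex loci and equal allele frequencies in both sexes, so that the expected fraction of diploid offspring developing as males is the product, across loci, of the expected homozygosity at each locus. The function names and allele frequencies are hypothetical.

```python
from math import prod


def expected_homozygosity(freqs):
    """Probability that a diploid is homozygous at one locus under random mating."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return sum(p * p for p in freqs)


def diploid_male_fraction(loci):
    """Expected fraction of diploids that are homozygous at every sex locus."""
    return prod(expected_homozygosity(freqs) for freqs in loci)


diverse = [0.1] * 10                 # ten equally frequent alleles
two_alleles = [0.5, 0.5]             # low allelic diversity
nearly_fixed = [0.99, 0.01]          # one allele close to fixation

single_locus = diploid_male_fraction([diverse])
two_locus = diploid_male_fraction([two_alleles, diverse])
near_fixation = diploid_male_fraction([nearly_fixed, diverse])
```

Under these assumptions, near-fixation at the low-diversity locus raises the expected diploid male fraction only towards the level already set by the diverse locus (one in ten in this example), which suggests why the advantage of a rare allele at the low-diversity locus may be small when another sex locus is highly polymorphic.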

An additional indication for the possibility of sl-CSD evolution from ml-CSD comes from a recent study on honeybees. Hasselmann et al. (2008) suggested that the csd gene evolved via gene duplication from an upstream gene feminizer (fem) on the honeybee sex locus. The ancestral fem shows relatively low evolutionary divergence within the Hymenoptera and appears equivalent in function to the sex determination gene transformer (tra) of Drosophila. Furthermore, the authors suggest that regulation of CSD via the csd gene is unique to honeybees, and that the multi-allelic sex determiner in other hymenopterans with CSD is as yet unknown. If csd-regulated CSD is indeed a synapomorphy of Apis, and CSD was ancestral to this lineage (Fig. 1), then ml-CSD would have (i) been present prior to honeybee diversification (csd + one or more ancestral CSD genes) and (ii) then evolved into the sl-CSD phenotype (regulated by csd alone) possessed by modern Apis species. This evolutionary scenario is more parsimonious than the alternative hypothesis that the CSD phenotype was absent in the ancestor to honeybees and subsequently re-gained via the evolution of csd, and thus provides support for the possibility of sl-CSD evolving from ml-CSD.

We would like to stress that these lines of reasoning are not meant to argue that ml-CSD is necessarily ancestral in the Hymenoptera as a whole. Rather, we introduce them to point out that there are reasons to suspect that ml-CSD can collapse (and has collapsed) to sl-CSD during the course of hymenopteran evolution. The selection away from sl-CSD is likely strong, as has been stressed by numerous authors (e.g. Crozier, 1971; Stouthamer et al., 1992; Zayed & Packer, 2005; Heimpel & de Boer, 2008). This could lead to scenarios in which sl-CSD evolves into ml-CSD via gene duplication within a lineage, but all but one of these loci then become fixed within populations due to relaxed frequency-dependent selection. If these populations become reproductively isolated, they could form the basis of species with sl-CSD.

Hypotheses generated by the reconstruction

The CSD ancestral state reconstruction (Fig. 1) produces testable hypotheses for ml-CSD presence or absence in several species lacking sl-CSD. Absence of CSD is predicted for all species in Chalcidoidea, Cynipoidea and Platygastroidea. Therefore, the following species for which sl-CSD has been refuted should also lack ml-CSD: Leptopilina boulardi and Leptopilina heterotoma (Cynipoidea: Eucoilidae), Telenomus fariae (Platygastroidea: Scelionidae), Muscidifurax raptor and Muscidifurax zaraptor (Chalcidoidea: Pteromalidae), Dinarmus vagabundus (Pteromalidae) and Melittobia chalybii and Melittobia sp. (Chalcidoidea: Eulophidae) (van Wilgenburg et al., 2006). Alternatively, species for which presence of CSD is reconstructed despite the absence of sl-CSD are hypothesized to possess ml-CSD. These include the ant Cardiocondyla obscurior and three braconids (Asobara tabida, Alysia manducator and Heterospilus prosopidis) (van Wilgenburg et al., 2006). The discovery of a single diploid male after seven generations of inbreeding in C. obscurior may tentatively support ml-CSD in this species (Schrempf et al., 2006), but more tests are needed.

The ancestral state reconstruction analysis of CSD also reveals sex determination hypotheses for several major hymenopteran taxa that have yet to be examined (Fig. 1). All untested Symphyta and Stephanoidea are expected to possess CSD, as are the aculeate families Mutillidae, Tiphiidae, Pompilidae and Scoliidae. Testing for CSD in these taxa will either strengthen or weaken the CSD reconstruction results outlined here. In addition, species in Symphyta, Ceraphronoidea, Evanioidea, Proctotrupoidea and Trigonalyoidea should receive special attention with regard to sex determination experiments given their phylogenetic placement (Fig. 1). This is especially true for Symphyta, for verification of CSD in untested, basal superfamilies would greatly increase the power of ancestral state reconstructions deep in the hymenopteran phylogeny.

Implications for the evolution of insect sex determination mechanisms

Complementary sex determination presence at the origin of hymenopteran arrhenotoky could have important ramifications for theories regarding the evolution of this alternative genetic system. Although many models and hypotheses propose a prominent role for sib-mating and sib-competition in favouring the transition from diplodiploidy to arrhenotoky (Borgia, 1980; Smith, 2000; Normark, 2004, 2006), the costs of DMP under inbreeding with CSD suggest that these may not explain the origin of arrhenotoky in the Hymenoptera (Heimpel & de Boer, 2008). Adequate assessment of the importance of hymenopteran CSD ancestry to models of arrhenotoky, however, requires improved understanding of two factors. The first concerns whether ancestral hymenopteran CSD involved one or more loci. If it was sl-CSD, the high risks of DMP with sib-mating would likely preclude high levels of inbreeding in ancestral hymenopterans. If, on the other hand, ml-CSD was the ancestral form of hymenopteran arrhenotoky, the inconsistency between CSD ancestry and inbreeding-invoking models of arrhenotoky evolution could be far less stark, as multiple loci can greatly depress DMP.

The second factor relates to arrhenotoky in nonhymenopteran insects. Arrhenotoky is known from several taxa outside of the Hymenoptera, such as thrips (Thysanoptera), whiteflies, and some scale insects and bark beetles (Byrne & Bellows, 1990; Jordal et al., 2000; Heimpel & de Boer, 2008). Virtually nothing is known, however, about the genetic basis of arrhenotoky outside of the Hymenoptera. In the thysanopteran Franklinothrips vespiformis, the B-strain of Wolbachia was found to mediate thelytoky (Arakaki et al., 2001). As Wolbachia-generated thelytoky in all examined hymenopterans is accomplished via gamete duplication, which rules out CSD presence (see above), this finding may indicate absence of CSD in at least one thrips species. It is important to note, however, that it is unclear (i) whether diploidy is restored in similar ways by different Wolbachia strains and (ii) whether F. vespiformis is representative of other thrips species. The haplodiploid bark beetles all appear to engage in inbreeding (Jordal et al., 2000), making at least sl-CSD unlikely in this group. Clearly, however, greater research efforts into the genetic bases of sex determination in nonhymenopteran, arrhenotokous species are needed to determine whether the Hymenoptera are an exception to normal evolutionary trends of arrhenotoky.


The current evidence suggests that CSD is the ancestral sex determination mechanism in the Hymenoptera. Whether one or more loci were responsible in the ancestral form of CSD, however, cannot yet be determined. Several studies are needed to address this: (i) testing for ml-CSD in many species currently known to lack sl-CSD, (ii) more rigorous testing of sex determination models in CSD-bearing taxa for which sl-CSD is uncertain (e.g. the social Aculeata), and (iii) expansion of taxon sampling with respect to sex determination mechanisms, with special emphasis given to Symphyta, basal Apocrita, and Proctotrupoidea. Such data would permit three-state (sl-CSD, ml-CSD and CSD absence) analyses to test hypotheses of sl- or ml-CSD ancestry. We believe such studies are necessary, and caution against assuming that collapse of ml-CSD to sl-CSD is an implausible evolutionary scenario. Finally, although we have generated testable sex determination hypotheses for taxa in several major hymenopteran subgroups, it is also important that sampling continue in groups represented by relatively few genera (e.g. Ichneumonidae) and in large genera represented by few species (e.g. Cotesia) to ensure that intra-taxon variation is sufficiently captured. Data of all these types are needed to further our understanding of the evolutionary origins of arrhenotoky in the Hymenoptera.


We thank Sean Brady (ants) and Claus Rasmussen (stingless and orchid bees) for sharing their systematic expertise, Roger Blahnik, Patina Mendez and Alex Wild for advice regarding phylogenetic methods in the Mesquite program, Yoshi Yamada for sharing sex determination data for Echthrodelphax fairchildii, and Jeremy Chacón, Christine Dieckhoff, Karl Gruber, Virginia Howick, Nick Mills, Emily Mohl, Zeynep Sezen, Ruth Shaw, Peter Tiffin, and two anonymous reviewers for insightful discussions regarding the data set and previous manuscript drafts. This project was funded in part by NSF DEB grants number 03445829 to JBW and 0344131 to GEH.