Hybrid zones act as genomic sieves. Although globally advantageous alleles will spread throughout the zone and neutral alleles can be freely exchanged between species, introgression will be restricted for genes that contribute to reproductive barriers or local adaptation. Seminal fluid proteins (SFPs) are known to contribute to reproductive barriers in insects and have been proposed as candidate barrier genes in the hybridizing field crickets Gryllus pennsylvanicus and Gryllus firmus. Here, we have used 125 single nucleotide polymorphisms to characterize patterns of differential introgression and to identify genes that may contribute to prezygotic barriers between these species. Using a transcriptome scan of the male cricket accessory gland (the site of SFP synthesis), we identified genes with major allele frequency differences between the species. We then compared patterns of introgression for genes encoding SFPs with patterns for genes expressed in the same tissue that do not encode SFPs. We find no evidence that SFPs have reduced gene exchange across the cricket hybrid zone. However, a number of genes exhibit dramatically reduced introgression, and many of these genes encode proteins with functional roles consistent with known barriers.
The genomes of recently diverged species are mosaics. Shared polymorphisms will characterize many regions, whereas other regions will have diverged in allele frequency as a result of random lineage sorting or natural and sexual selection (Harrison 1991; Wu 2001; Turner et al. 2005; Nosil et al. 2009). Some divergent regions may harbor genes contributing to intrinsic barriers between species (or speciation phenotypes; Shaw and Mullen 2011). Identifying such genes and characterizing their distribution in the genome and their interaction with other genes (or gene products) provides insights into the mechanisms that lead to divergence and ultimately to speciation (Coyne and Orr 2004; Noor and Feder 2006; Nosil and Schluter 2011). With increasingly efficient sequencing technologies, we now have the capacity to identify divergent genes across the genome in organisms that reflect different modes of speciation and different stages of divergence (Noor and Feder 2006; Harrison 2010; Butlin et al. 2012; Nosil and Feder 2012).
Allopatric divergence and subsequent secondary contact of recently diverged species provide a unique opportunity to identify and characterize genes that contribute to reproductive barriers. In zones of secondary contact, hybridization and recombination over many generations shuffle divergent genomes and selection sorts the recombined genotypes (Barton and Hewitt 1985; Hewitt 1988; Harrison 1990; Payseur 2010). Alleles at loci that are equally fit in either genomic background will be easily exchanged between species. In contrast, alleles at loci that contribute to barriers between species or to local adaptation will have limited introgression and relatively steep clines across a gradient of hybridization (i.e., across a geographic transect or as a function of hybrid index; Barton and Hewitt 1985; Szymura and Barton 1986; Gompert and Buerkle 2009). Restricted introgression can result from exogenous selection across ecotones (Endler 1977) or in heterogeneous habitat (e.g., in mosaic hybrid zones; Rand and Harrison 1989). Alternatively, prezygotic barriers that are independent of the environment (e.g., assortative mating or postmating prezygotic barriers) or endogenous selection against hybrids (Key 1968; Barton and Hewitt 1985) can limit gene exchange. In fact, all of these barriers may be important in limiting gene exchange, but relating pattern of variation and mode of selection is not necessarily straightforward (Bierne et al. 2011). Estimates of differential introgression using large multilocus data sets are still rare (but see Lexer et al. 2010; Gompert et al. 2012a; Janousek et al. 2012; Hamilton et al. 2013), and in most of the cases examined thus far, postzygotic barriers appear to be most important in restricting gene flow.
Here, we use estimates of differential introgression for a large number of single nucleotide polymorphisms (SNPs) to identify genes that remain differentiated across a mosaic hybrid zone between the field crickets, Gryllus pennsylvanicus and Gryllus firmus. The field crickets diverged approximately 200,000 years ago (Willett et al. 1997; Maroja et al. 2009a) and hybridize in a zone composed of interspersed parental and mixed populations distributed along the eastern edge of the Appalachian mountains (Harrison and Arnold 1982). Multiple prezygotic barriers limit gene exchange between species, including habitat isolation (Harrison 1986; Rand and Harrison 1989; Harrison and Bogdanowicz 1997; Ross and Harrison 2002; Larson et al. 2013), positive assortative mating (Maroja et al. 2009b), and asymmetrical postmating prezygotic barriers (Harrison 1983). Crosses between G. firmus females and G. pennsylvanicus males have reduced egg laying (Maroja et al. 2009b; Larson et al. 2012a), and sperm do not successfully fuse with eggs (Larson et al. 2012b). In contrast, the reciprocal cross (G. pennsylvanicus female and G. firmus male) produces viable and fertile offspring that are capable of forming F2s or backcrossing to either parent (Harrison 1983; E. L. Larson and R. G. Harrison unpubl. data). Within the hybrid zone F1 hybrids are rare, but multigeneration backcrosses are common, indicating that there is on-going gene flow despite prezygotic barriers (Harrison 1986; Harrison and Bogdanowicz 1997; Maroja et al. 2009a; Larson et al. 2013).
Previous attempts to identify divergent genes between the two cricket species have met with little success (e.g., Harrison and Bogdanowicz 1997; Broughton and Harrison 2003). However, analysis of genes encoding seminal fluid proteins (SFPs) has not only shown such genes to be rapidly evolving, but several SFP loci show major allele frequency differences between the species (Andrés et al. 2006, 2008; Maroja et al. 2009a). Seminal fluid proteins are synthesized in the male accessory gland and transferred to females as part of the ejaculate. They are known to mediate critical aspects of fertilization in insects and have been proposed as candidate barrier genes contributing to the asymmetric fertilization incompatibility between G. firmus females and G. pennsylvanicus males. We used transcriptome scans of the male accessory gland to identify genes that have major allele frequency differences between species. We then compared patterns of introgression for genes encoding SFPs with patterns for genes expressed in the male accessory gland that do not encode SFPs. We find no evidence that SFPs have reduced gene exchange across the cricket hybrid zone. However, a number of other genes expressed in the male accessory gland exhibit dramatically reduced introgression, and many of these genes encode proteins with functional roles consistent with known barriers.
We assayed 232 SNPs from 181 genes (mean length 434 bp, min. coverage of 20×) expressed in the male cricket accessory gland. Single nucleotide polymorphisms were identified from 454 and Illumina sequencing of two focal populations of each species (Andrés et al. 2013). Twenty-two of these SNPs are known to occur in genes that encode SFPs. These genes were identified by combining analysis of G. firmus male accessory gland expressed sequence tags (ESTs) with proteomic analysis of spermatophore contents (Andrés et al. 2006, 2008). For introgression analysis, we selected SNPs that were bi-allelic and had large frequency differences between species.
To validate putative SNPs and define allele frequencies in “pure” species populations, we genotyped 71 crickets from three allopatric populations of each species (11–12 individuals per population) and nine crickets from a single mixed population that was outside of our focal study area (Table S1). To estimate introgression, we genotyped 301 crickets from 36 localities within a small region of the hybrid zone in Pennsylvania (Table S2). We extracted genomic DNA from single adult femurs using the DNeasy Tissue Kit (QIAGEN Inc., Valencia, CA) and diluted the DNA to 10 ng/μL.
We designed nine multiplexed assays targeting 232 SNPs in 181 genes (210 SNPs from the accessory gland transcriptome and 22 SNPs from 15 SFPs) using the Sequenom Mass-ARRAY platform (Sequenom Inc., San Diego, CA; Dryad doi:10.5061/dryad.164dg). Assays were designed using the MassARRAY Assay Design Software. For 34 genes, assays were designed to encompass two or more SNPs to confirm that our genotyping assays had consistent genotypes within a gene. For the remaining loci, one SNP per gene was selected for genotyping. Reactions were performed using iPLEX Gold chemistry at the Cornell Life Sciences Core Laboratories Center for Genomics (Ithaca, NY). Single nucleotide polymorphism genotypes were called using the Sequenom MassARRAY Typer version 4.0 Analysis software and checked by eye. Assays that had poor amplification or peak resolution in our test panel of crickets were excluded from further analyses.
ADMIXTURE AND GENOMIC CLINE ANALYSIS
We quantified admixture and estimated genomic clines using the R-package Introgress (Gompert and Buerkle 2009, 2010). We first estimated the parental allele frequencies for each SNP with high-quality amplification and genotype clustering in our allopatric populations using the function “prepare.data,” and calculated the interspecific differentiation index (D) as the absolute value of the allele frequency difference between the two species (Andrés et al. 2013). For our analyses, we selected only SNPs for which D ≥ 0.80 in comparisons of allopatric populations of the parental species; these represented 125 markers. We then quantified the ancestry of each cricket from the hybrid zone by estimating a hybrid index, which is an average of the genome-wide admixture for a given individual, calculated as the proportion of alleles at all 125 markers that are inherited from G. firmus. Interspecific heterozygosity (the proportion of an individual's genome with alleles inherited from both parental populations) was estimated using the function “calc.intsp.het” and compared to the hybrid index to infer each individual’s hybrid class. Following Milne and Abbott (2008), crickets were defined as F1 individuals if they had an interspecific heterozygosity ≥85% and a hybrid index of 0.5, multigeneration hybrids if they had an interspecific heterozygosity <85% and a hybrid index between 0.25 and 0.75, or backcross individuals if they had an interspecific heterozygosity <85% and a hybrid index ≤0.25 (backcross into G. pennsylvanicus) or ≥0.75 (backcross into G. firmus).
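The quantities defined above can be illustrated with a minimal Python sketch. It is not the Introgress implementation: it assumes markers with fixed differences and no missing data, so the hybrid index reduces to a simple allele count, whereas Introgress estimates it while accounting for parental allele frequencies. The function names, the 0/1/2 genotype coding (number of G. firmus alleles), and the tolerance `f1_tol` around a hybrid index of 0.5 are all hypothetical choices made for illustration.

```python
from typing import Sequence


def differentiation_index(freq_firmus: float, freq_pennsylvanicus: float) -> float:
    """D: absolute allele frequency difference between the two species."""
    return abs(freq_firmus - freq_pennsylvanicus)


def hybrid_index(genotypes: Sequence[int]) -> float:
    """Proportion of alleles inherited from G. firmus.

    Each genotype is the number of G. firmus alleles (0, 1, or 2) at one
    marker. Assumes fixed differences and no missing genotypes.
    """
    return sum(genotypes) / (2 * len(genotypes))


def interspecific_heterozygosity(genotypes: Sequence[int]) -> float:
    """Proportion of markers carrying one allele from each species."""
    return sum(1 for g in genotypes if g == 1) / len(genotypes)


def hybrid_class(h: float, het: float, f1_tol: float = 0.05) -> str:
    """Assign a hybrid class from the thresholds stated in the text.

    f1_tol is an assumed tolerance around a hybrid index of 0.5; the text
    gives no tolerance. Combinations the text does not cover are returned
    as "unclassified".
    """
    if het >= 0.85:
        return "F1" if abs(h - 0.5) <= f1_tol else "unclassified"
    if h <= 0.25:
        return "backcross into G. pennsylvanicus"
    if h >= 0.75:
        return "backcross into G. firmus"
    return "multigeneration hybrid"


# Example: an individual heterozygous at every one of 10 markers.
f1_like = [1] * 10
example_class = hybrid_class(hybrid_index(f1_like),
                             interspecific_heterozygosity(f1_like))
```

Under these thresholds, an individual with a hybrid index of 0 or 1 and no heterozygous markers falls into a backcross class, because the stated rules define no separate parental class.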
We constructed genomic clines using multinomial regression to predict, based on the hybrid index and interspecific heterozygosity, the probability for each marker of observing each of the three possible genotypes (PP: homozygous G. pennsylvanicus, PF: heterozygous, and FF: homozygous G. firmus). We compared the likelihoods of the regression model to a neutral model of introgression to identify markers that do not conform to expectations of neutral introgression. Our neutral model was constructed using 2000 parametric simulations based on the observed genotype frequencies (Gompert and Buerkle 2009). For all analyses, estimates of the hybrid index and interspecific heterozygosity were calculated taking into account allele frequencies in our parental populations, and significance thresholds were adjusted using the false discovery rate procedure (Benjamini and Hochberg 1995). The Introgress output summarizes deviations from neutrality as either (1) excess or deficiency of one homozygote class (e.g., PP+ PF+ FF−), which is consistent with directional selection; (2) excess of heterozygotes (e.g., PP− PF+ FF−), which is consistent with overdominance; and (3) deficiency of heterozygotes (e.g., PP+ PF− FF+), which is consistent with underdominance, disruptive selection, or assortative mating. Evidence for an excess or deficiency of homozygous and heterozygous genotypes at a given locus is based on the proportion of neutral simulations that yield a model with higher total probability of a given genotype than the model based on observed data (Gompert and Buerkle 2010). Deviation categories were assigned based on the Introgress deviation category output, visual inspection of cline shape, and observed genotype classes (see Macholan et al. 2011).
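The false discovery rate adjustment mentioned above can be sketched as the Benjamini and Hochberg (1995) step-up procedure applied to a list of per-marker P-values. This is a generic illustration rather than the code used in the analysis; the function name and the level `alpha = 0.05` are assumptions, since the text does not state the threshold used.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each P-value, whether it is significant under BH control.

    Step-up rule: find the largest rank k (1-based, ascending P-values)
    with p_(k) <= (k / m) * alpha, then reject all hypotheses ranked <= k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank whose P-value falls at or below its BH threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical P-values for four markers (illustrative values only).
flags = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

Because the rule is step-up, a marker whose P-value exceeds its own rank threshold can still be declared significant when a higher-ranked marker meets its threshold.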
Of the initial 232 putative SNPs, 208 SNPs (89.7%) were successfully amplified in ≥95% of individuals from our test panel, and the majority of these (166 SNPs) had an amplification success ≥99% (Table S3). One hundred and eighty-five SNPs (155 genes) were both successfully amplified and had clear genotype clusters; of these 146 SNPs (125 genes) had D ≥ 0.80 between our six allopatric populations and 54 SNPs (46 genes) had fixed differences between species (D = 1.0; Table S4). We restricted the remaining analyses to a single marker per gene, resulting in 125 genes included in our hybrid zone analyses (see Table S5 for gene annotation). Overall, SNPs had very high genotyping success for all populations: allopatric G. firmus 99.95% (4500 possible genotypes), allopatric G. pennsylvanicus 99.95% (4375 possible genotypes), our test mixed population 99.82% (1125 possible genotypes), and the hybrid zone 99.63% (37,625 possible genotypes).
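The numbers of possible genotypes reported above are the product of individuals and markers, which gives a quick consistency check against the sample sizes in the Methods. The short sketch below back-calculates individuals per group from the reported totals; the per-species sizes it produces are derived by this arithmetic and are not stated directly in the text.

```python
N_MARKERS = 125  # one marker per gene in the hybrid zone analyses

# Possible genotypes per group, as reported in the text.
possible_genotypes = {
    "allopatric G. firmus": 4500,
    "allopatric G. pennsylvanicus": 4375,
    "mixed test population": 1125,
    "hybrid zone": 37625,
}

# Every total should be an exact multiple of the number of markers.
assert all(total % N_MARKERS == 0 for total in possible_genotypes.values())

# Individuals per group = possible genotypes / markers.
individuals = {group: total // N_MARKERS
               for group, total in possible_genotypes.items()}

allopatric_total = (individuals["allopatric G. firmus"]
                    + individuals["allopatric G. pennsylvanicus"])
```

The derived totals match the 71 allopatric crickets, nine crickets from the mixed population, and 301 hybrid zone crickets described in the Methods.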
CLASSIFYING CRICKETS IN THE HYBRID ZONE
The majority (∼93%) of individuals within the hybrid zone had low interspecific heterozygosity (heterozygous for <20% of markers) and the hybrid zone had a distinctly bimodal distribution of hybrid indices (Fig. 1). Only one cricket was considered to be an F1 hybrid, eight crickets were identified as multigeneration hybrids (F2, F3, or F4 individuals), and the remaining crickets were classified as either backcrosses into G. pennsylvanicus or G. firmus. Collections from the majority of localities were either predominantly G. pennsylvanicus or G. firmus, and all of these populations, except AK, also contained backcrossed individuals (Fig. 2). There were 11 localities that contained both parental types, but in only seven localities were there multigeneration hybrids.
GENOMIC CLINE ANALYSES
The extent of introgression varied greatly among the 125 markers. For some markers heterozygotes were rare, whereas at other loci, alleles of one species were found in crickets with hybrid indices characteristic of the other species (Fig. 3A). Forty-two markers (33.6%) did not show patterns that deviated significantly from neutral expectations. We categorized the remaining 83 markers as deviating from neutral expectations based on the excess (+) or deficit (−) of homozygous and heterozygous genotypes and overall cline shape (Table S6). Thirty-five markers (28%) had patterns of introgression consistent with positive directional selection for the introgressed allele. These were either loci that had an excess of G. pennsylvanicus alleles (PP+ and PF+) in crickets with a G. firmus genomic background (25 markers) or loci that had an excess of G. firmus alleles (FF+ and PF+) in crickets with a G. pennsylvanicus background (10 markers). Twelve markers (9.6%) had an excess of heterozygotes, consistent with overdominance or heterozygote advantage. Thirty-six markers (28.8%) had fewer heterozygotes than expected, and an excess of one or both homozygous genotype classes, consistent with underdominance, disruptive selection, or assortative mating. Ten of these markers (8%) had a significant excess of both homozygous genotype classes. A summary of the functional annotation groups for the 36 markers with a deficit of heterozygotes is provided in Table 1. Figure 3B provides a summary of all 125 genomic clines, and individual genomic clines for each marker are shown in Figure S1.
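The category counts above can be tallied to confirm that they partition the 125 markers and to recover the reported percentages. The sketch below uses only counts given in the text; the dictionary keys are shorthand labels chosen for illustration.

```python
TOTAL_MARKERS = 125

# Marker counts per introgression pattern, as reported in the text.
category_counts = {
    "neutral": 42,
    "positive directional selection": 35,
    "heterozygote excess": 12,
    "heterozygote deficit": 36,
}


def percent(count: int, total: int = TOTAL_MARKERS) -> float:
    """Percentage of all markers, rounded to one decimal place."""
    return round(100 * count / total, 1)


percentages = {name: percent(n) for name, n in category_counts.items()}

# Markers deviating from neutral expectations.
deviating = TOTAL_MARKERS - category_counts["neutral"]

# Markers with neutral or elevated introgression (no heterozygote deficit).
not_reduced = TOTAL_MARKERS - category_counts["heterozygote deficit"]
not_reduced_percent = percent(not_reduced)
```

The last value shows that 89 of the 125 markers (71.2%) fall outside the heterozygote-deficit class.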
Table 1. Summary of the functional annotation of proteins encoded by the 36 genes that exhibit reduced introgression
Cytoskeleton related proteins
COMPARISON WITH SFPs
We successfully genotyped putative SNPs in 13 of 15 genes known (through proteomic analyses; Andrés et al. 2008) to encode SFPs (Table S7). Four of these were monomorphic, and of the remaining nine SNPs, five had D < 0.80. For the four markers with D ≥ 0.80, AG-0148P had a pattern consistent with positive directional selection (G. pennsylvanicus allele advantageous); AG-0203P had an excess of heterozygotes consistent with overdominance; and AG-0383F and AG-0501F did not deviate significantly from neutral expectations.
PATTERNS OF DIFFERENTIAL INTROGRESSION REVEAL CANDIDATE BARRIER GENES
Patterns of variation across the cricket hybrid zone differ dramatically among the 125 marker loci, ranging from no introgression (complete absence of heterozygotes) to elevated rates of introgression best explained by positive directional selection. Markers for which introgression is significantly reduced relative to neutral expectations are the focus for studies of speciation. Patterns of variation for these markers may reflect divergent or disruptive selection, positive assortative mating, or heterozygote disadvantage, all of which limit gene flow between populations. To examine patterns of gene exchange within the hybrid zone, we selected SNP markers that had large allele frequency differences (D ≥ 0.80) between allopatric populations of the two species. Therefore, differential introgression, rather than random lineage sorting, is the best explanation for differences among loci in cline shape and position. Choice of SNPs with large frequency differences between the species could decrease our ability to detect significant deviations from neutral introgression (see Gompert and Buerkle 2009; Payseur 2010), but in fact >70% of our SNPs either introgress at the rate expected given the overall divergence of these two lineages or have elevated introgression rates. Similar patterns have been seen in other hybrid zones for which there are a large number of markers that show major allele frequency differences between allopatric populations (Teeter et al. 2008; Gompert et al. 2012a; Luttikhuizen et al. 2012). In these cases, many of the genes with the highest FST values between allopatric populations did not exhibit reduced introgression in the hybrid zone.
Field crickets have large effective population sizes, are abundant throughout areas of suitable habitat, and exhibit little population structure, so neither genetic drift nor underlying population structure is likely to explain heterogeneity in patterns of introgression. Sex-linked markers might have patterns of introgression that differ from autosomal markers because of sex-biased dispersal. In field crickets, females are XX and males are XO, so there are no Y chromosome markers. Furthermore, there is no evidence for dispersal differences between males and females. The hybridizing field crickets are usually flightless and disperse by walking; during years of high abundance both sexes can develop long hind wings and are capable of flight (Harrison 1979, 1980). Finally, stochasticity in sampling the distribution of hybrid indices may lead to deviations in cline shapes (Macholan et al. 2011; Gompert et al. 2012b), but a striking deficit of heterozygous genotypes is much less likely to be misinterpreted. As was noted by Dufková et al. (2011), careful marker selection and two-dimensional sampling of hybrid zones, as we have done here, make differential introgression a reliable and useful tool for inferring selection on different genomic regions in natural populations.
Functional annotation of genes with restricted introgression reveals several interesting classes of proteins that may play a role in barriers between G. pennsylvanicus and G. firmus (Table 1). The most abundant class of proteins we observed was cytoskeletal proteins that bind to actin or tubulin. Cytoskeletal proteins are involved in several steps of fertilization, including sperm capacitation, the acrosome reaction, sperm–egg fusion, and male and female pronuclei fusion (Fenichel and Durand-Clement 1998; Dvorakova et al. 2005; Sun and Schatten 2006; Sosnik et al. 2009). Divergence in proteins that mediate sperm–egg binding and fusion may be involved in the failure of G. pennsylvanicus sperm to enter G. firmus eggs (Larson et al. 2012b). The second most abundant class of proteins was growth-stimulating proteins. These proteins could be involved in differences in body size and ovipositor length between G. pennsylvanicus and G. firmus. Both body size and ovipositor length may play a role in ecological adaptation to different environments (Rand and Harrison 1989, 2006), and body size has been implicated in female mate choice in G. firmus (N. Saleh, E. L. Larson, and R. G. Harrison, unpubl. data). Finally, we identified one gene with reduced introgression within the hybrid zone that has a functional class similar to previously described SFPs (peptidase inhibitor), but the product of this gene has not been identified in proteomic analyses of the male ejaculate.
Thus, functional annotation suggests several possible roles for protein products of genes that show restricted introgression. Without a linkage map, it is impossible to know the genomic distribution of genes with restricted introgression or how many are direct targets of selection. It is unlikely that all markers with restricted introgression reside within a single genomic region, but information on the number and size of such regions must await a linkage map for Gryllus. Regardless of genomic location, many of these genes show very limited introgression and may mark genome regions that are important for local adaptation and/or reproductive isolation.
NO EVIDENCE FOR REDUCED INTROGRESSION OF SFPs
Genes encoding SFPs did not have restricted introgression relative to other genes expressed in the male accessory gland. Seminal fluid proteins mediate critical steps in fertilization (reviewed in Wolfner 2009) and often have elevated divergence between closely related taxa (Swanson and Vacquier 2002; Clark et al. 2006; Turner and Hoekstra 2008). There has been rapid evolution of SFPs between G. pennsylvanicus and G. firmus (Andrés et al. 2006, 2008), and these genes have been proposed as candidate barrier genes contributing to the fertilization incompatibility between G. firmus females and G. pennsylvanicus males (Andrés et al. 2006, 2008; Braswell et al. 2006; Maroja et al. 2009a). Two cricket SFP genes in particular, AG-0005F and AG-0334P, encode proteins with radical amino acid substitutions between species and near zero migration rates between allopatric populations (Andrés et al. 2008; Maroja et al. 2009a). However, in the much larger sample of individuals and populations included in the current analysis, average allele frequency differences for AG-0005F and AG-0334P were <0.8. Indeed, most genes encoding SFPs did not have sufficient allele frequency differences between species to meet our criterion for genomic cline analysis. Of the four SFP markers included in our introgression analysis (D ≥ 0.80), none showed evidence of restricted introgression. The recent literature provides support for the notion that genes encoding reproductive proteins, including SFPs, are among the most rapidly evolving genes in the genome, and it is often suggested that elevated divergence may contribute to reproductive isolation. However, there is evidence for heterogeneity in the evolutionary rate of genes encoding SFPs, and although some may be rapidly evolving, a significant fraction may also be evolutionarily constrained (Findlay et al. 2008; Dean et al. 2009; Walters and Harrison 2011; Andrés et al. 2013; Good et al. 2013).
MAINTENANCE OF SPECIES BOUNDARIES DESPITE GENE FLOW
A consequence of extensive variation in the introgression of individual markers is variation in the genomic composition of individuals throughout the hybrid zone. The majority of the crickets within the hybrid zone appear to be multigeneration backcrosses, but we see few F1 or multigeneration hybrids, and therefore strong barriers must limit gene exchange. Prezygotic barriers between the two cricket species are well documented. Crickets are associated with different habitats; in Pennsylvania, G. pennsylvanicus occupies areas with natural vegetation and G. firmus occupies more disturbed habitat. There is also evidence of assortative mating (Maroja et al. 2009b) and barriers to fertilization between G. firmus females and G. pennsylvanicus males (Larson et al. 2012b), but there is no evidence of significant postzygotic barriers. Thus, disruptive selection or assortative mating likely maintains linkage disequilibrium within the hybrid zone and explains the patterns of reduced gene flow at many individual loci.
For the majority of markers, introgression is asymmetrical, with G. pennsylvanicus alleles more often found in crickets that have predominantly G. firmus genomic backgrounds (Figs. 1, 3A). The fertilization incompatibility results in asymmetrical introgression of mitochondrial DNA; all hybrids carry G. pennsylvanicus mtDNA and backcrossing can only lead to movement of G. pennsylvanicus mtDNA haplotypes into G. firmus (Harrison et al. 1987; Harrison and Bogdanowicz 1997). F1 hybrid males are unable to fertilize G. firmus females, but hybrid females can backcross with either parent (E. L. Larson and R. G. Harrison, unpubl. data), and mate-choice trials suggest that hybrid females preferentially mate with G. firmus males (Maroja et al. 2009b). Given these observations, alleles from either species should be incorporated into the genomic background of the other, perhaps at a slightly higher rate into G. pennsylvanicus given that both male and female hybrids can backcross with this species. Asymmetric introgression is therefore more likely due to the northward expansion and high population densities of G. firmus crickets (Larson et al. 2013). Gryllus pennsylvanicus appears to be less abundant in areas of contact, and as a result, F1 hybrids are likely to backcross into G. firmus. The observed asymmetry may also represent movement of the hybrid zone boundary, leaving a trail of neutral or weakly selected markers in its wake (Teeter et al. 2008, 2010; Macholan et al. 2011).
Gryllus pennsylvanicus and G. firmus are very recently diverged, yet they are already at a stage in divergence where they are clearly distinguishable, with different morphology, ecology, and behavior. The bimodality we see in the distribution of hybrid indices demonstrates that they are on distinct evolutionary trajectories. Multiple prezygotic barriers isolate these lineages, but these barriers are incomplete and at some loci there is ongoing gene flow between species. The semipermeability of species boundaries allows some alleles to pass freely between species, while restricting those that contribute to reproductive barriers. By identifying genes that have restricted introgression across the hybrid zone, we target genes that likely contribute to prezygotic barriers between these species. A number of these genes have functional annotation consistent with known barriers or morphological differences. However, we found no evidence that previously proposed candidate barrier genes, SFPs, contribute to reducing gene exchange within the hybrid zone.
The authors thank E. R. Bondra for assistance with DNA extractions and L. Cote at the Cornell Core Laboratories Center for Sequenom genotyping. The authors also thank members of the Harrison lab, J. M. Good, R. R. Bracewell, and two anonymous reviewers for helpful comments on previous versions of this manuscript. This work was supported by a National Science Foundation grant DEB-0639904 to RGH and an AAUW American Fellowship to ELL.