Structure of a mosaic hybrid zone between the field crickets Gryllus firmus and G. pennsylvanicus

Hybrid zones provide insight into the nature of species boundaries and the evolution of barriers to gene exchange. Characterizing multiple regions within hybrid zones is essential for understanding both their history and current dynamics. Here, we describe a previously uncharacterized region of a well-studied hybrid zone between two species of field crickets, Gryllus pennsylvanicus and G. firmus. We use a combination of mitochondrial DNA sequencing, morphological data, and modeling of environmental variables to identify the ecological factors structuring the hybrid zone and define patterns of hybridization and introgression. We find an association between species distribution and natural habitat; Gryllus pennsylvanicus occupies natural habitat along forest edges and natural clearings, whereas G. firmus occupies more disturbed areas in agricultural and suburban environments. Hybridization and introgression occur across patch boundaries; there is evidence of substantial admixture both in morphological characters and mtDNA, over a broad geographic area. Nonetheless, the distribution of morphological types is bimodal. Given that F1 hybrids are viable and fertile in the lab, this suggests that strong pre-zygotic barriers are operating in this portion of the hybrid zone.


Introduction
Hybrid zones have been described as 'natural laboratories for evolutionary studies' (Hewitt 1988; Barton and Hewitt 1989) and 'windows on evolutionary process' (Harrison 1990(Harrison , 1993. They are places where diverged lineages meet and interact, providing insight into the genetic architecture of speciation and the evolutionary forces that shape divergence (Barton and Hewitt 1985;Harrison 1990;Payseur 2010). Hybrid zone studies have demonstrated that species boundaries are semipermeable and that permeability varies across the genome (Key 1968; Barton and Hewitt 1981;Harrison 1986Harrison , 1990Wu 2001;Nosil et al. 2009;Payseur 2010). Genomic regions that contain alleles contributing to reproductive isolation will either prevent the formation of hybrids or decrease hybrid viability or fertility, restricting introgression. Recombination over multiple generations breaks up parental genomes and some segments of the genome can then be freely exchanged between species. From studies of differential introgression, we can identify genomic regions that are under selection, estimate the strength of that selection, and ultimately link some of those regions to reproductive barriers (Harrison 1990;Payseur 2010).
Many hybrid zones may be tension zones, in which parental types persist because of selection against hybrid genotypes, independent of environment (Barton and Hewitt 1985). However, hybrid zones may also be maintained by environmental selection favoring different parental forms in different ecological settings. In the latter case, hybrid zones can be clinal, with parental forms favored on either side of an ecotone and hybridization occurring in the center (Endler 1977). Alternatively, hybrid zones can be mosaic, with parental forms patchily distributed across heterogeneous habitat and hybridization occurring across patch boundaries or in intermediate habitats (Harrison 1986;Harrison and Rand 1989;Rand and Harrison 1989;Ross and Harrison 2002; but see Searle 1993). In a heterogeneous landscape, hybrid zones may exhibit a mix of different dynamics (e.g., Bombina hybrid zone; Szymura and Barton 1991;Vines et al. 2003;Yanchukov et al. 2006), and as a consequence, reproductive barriers and patterns of introgression may vary geographically (Teeter et al. 2010). Therefore, characterizing multiple transects or regions is essential for understanding both hybrid zone history and hybrid zone dynamics.
Here, we describe a previously uncharacterized region of a well-studied hybrid zone between two species of North American field crickets, Gryllus pennsylvanicus and G. firmus. The hybrid zone has been carefully characterized in Virginia and Connecticut, and reproductive barriers are known to vary between these two regions. This study examines patterns of variation in Pennsylvania, compares these patterns with those seen elsewhere, and investigates what ecological factors maintain the structure of the hybrid zone.

Field cricket hybrid zone
The known hybrid zone between G. pennsylvanicus and G. firmus stretches from southern Connecticut to Virginia along the eastern slopes of the Appalachian, Blue Ridge, and Northern Highland Mountains (Harrison and Arnold 1982) (Fig. 1). The glacial history of the northeastern United States and the distribution of Gryllus mitochondrial DNA (mtDNA) haplotypes provide strong evidence that the hybrid zone formed as a result of secondary contact between lineages that diverged in allopatry (Harrison et al. 1987;Willett et al. 1997;Maroja et al. 2009a). Gryllus pennsylvanicus extends west from the Appalachian and Blue Ridge Mountains and through the mountains to the south. Gryllus firmus, also known as the beach cricket, occurs to the east of the Appalachian Mountains throughout the Piedmont, coastal plain, and along beaches south into Florida (Alexander 1957(Alexander , 1968Harrison and Arnold 1982). Both species occupy grassy, disturbed habitats and can be found under rocks, debris or clumps of vegetation. Both species are univoltine in the north; females lay eggs in the soil, eggs diapause over the winter, hatch in the spring, and adults emerge in late summer or early fall (Fulton 1952). In the south, G. firmus is multivoltine and females lay both diapause and non-diapause eggs (Fulton 1952;Alexander 1968;Walker 1980).
The two cricket species diverged about 200,000 ya (Willett et al. 1997;Maroja et al. 2009a). They are very similar morphologically and were considered part of a single variable species until the 1950s (Fulton 1952;Alexander 1957). Gryllus pennsylvanicus is, on average, a smaller cricket, with darker tegmina (modified leathery front wing) and a relatively shorter ovipositor (Fig. 2). Calling songs of the two species (used to attract females at long distances) are very similar, but have slightly different pulse and chirp rates; courtship songs (used to initiate mating) are identical (Alexander 1957;Doherty and Storz 1992). Despite these similarities, there is evidence of  behavioral isolation, with females of both species reluctant to mate with heterospecific males in the laboratory (Maroja et al. 2009b). The crickets are also isolated by postmating pre-zygotic barriers in one direction; G. firmus females mated with G. pennsylvanicus males lay few eggs, none of which are fertilized (Harrison 1983;Maroja et al. 2009b;Larson et al. 2012).
In Virginia, the two species are temporally isolated because of differences in development time, with G. firmus adults emerging later in the fall (Harrison and Arnold 1982;Harrison 1985). In Connecticut, adults emerge synchronously, but are associated with different soil types; Gryllus firmus on sandy soils and G. pennsylvanicus on loamy soils (Harrison 1986;Rand and Harrison 1989;Ross and Harrison 2002). What maintains this soil association remains unclear; females of both species prefer to oviposit in loamy soils in the laboratory (Ross 2000) and the viability of overwintering diapause eggs appears to be independent of soil type (Ross and Harrison 2006).
Although multiple barriers isolate these species and very few F 1 hybrids are present within the hybrid zone, evidence for introgression is clear (Harrison and Bogdanowicz 1997;Ross and Harrison 2002). Some barriers appear to be consistent across different regions of the hybrid zone (assortative mating, one-way fertilization incompatibility), whereas other barriers vary from one ecological setting to another (development time, soil association). Documenting patterns of species distributions and their ecological context build the foundation for comparing patterns of introgression among different regions of the hybrid zone. Here, we describe patterns of variation in a previously uncharacterized region of the hybrid zone in south-central Pennsylvania.

Cricket sampling
In the fall of 2010, we collected 104 crickets from nine localities in the northeastern range of G. firmus and G. pennsylvanicus. To these, we added 26 crickets from four localities collected by Maroja et al. (2009a). We also used mitochondrial DNA (mtDNA) sequence data for 98 crickets from 28 localities described in Willett et al. (1997) and Maroja et al. (2009a) (Table S1). These samples provide a broad geographic context for analyzing the distribution of cricket mtDNA haplotypes.
In late summer/fall of 2008 and 2010, we collected 877 crickets from 88 localities within a small region of the hybrid zone in south-central Pennsylvania (Table 1). Collection localities span the transition from the Appalachian Mountains into the Great Appalachian Valley. In this region, the Appalachians form a series of continuous ridges and intervening valleys that can range in elevation from 100 m to 650 m above sea level (elevation can change from 250 m to 570 m over only 1.8 km). The ridges are broken by several narrow and dramatic gaps where the Susquehanna River and other small waterways cross the mountains. The Great Appalachian valley is an extended chain of lowlands bounded by the Appalachian Mountains to the west and the Blue Ridge Mountains and Northern Highlands to the east, and includes the Shenandoah Valley in northern Virginia. To the east of the Blue Ridge Mountains are the lowlands of the Piedmont and the coastal beaches. There is a large gap in the eastern ridge of mountains between the Reading Prong (near collecting locality I) and South Mountain (collecting locality AJ) through which the Susquehanna River passes, connecting the Great Appalachian Valley with the Piedmont region.
In general, the mountains are heavily forested (poor cricket habitat), but have some natural clearings and are dissected by roadways and water gaps. The mountain valleys are typically moderately populated farmland, whereas the relevant portion of the Great Appalachian valley (Lehigh, Lebanon, and Cumberland valleys) is relatively heavily populated, and primarily agricultural or suburban.
Crickets were collected by hand, and maintained in plastic containers with food (cat food and rabbit food), water vials, and shelter (paper towels) prior to being frozen at À80°C. The majority of crickets were collected as adults, but in some cases, crickets were collected as late instar nymphs. Nymphs were allowed to mature in the laboratory before freezing.

Mitochondrial DNA
We sequenced the mtDNA gene cytochrome oxidase I (COI) for a total of 130 crickets from 13 localities across the hybrid zone and 119 crickets from 31 localities within the Pennsylvania hybrid zone. We isolated genomic DNA from a single femur using the DNeasy tissue kit (Qiagen, Valencia, CA). Locus specific primers were used to amplify a 1.9-kb fragment of the mitochondrial DNA gene cytochrome oxidase I (COI), the adjacent tRNA, and a portion of cytochrome oxidase II (COII): G. veletis COI F (102) (5′-ACCCCCATCATTAACCCTTTTA -3′) (Maroja et al. 2009a) and Eva/3372 (1885) (5′-GAGA-CCATTACTTGCTTTCAGTCATCT -3′) (Simon et al. 1994). A set of internal primers were designed from G. firmus mtDNA sequence for samples with shorter sequence length: cricketCOI.595 (5′-ATTTACGGTTG GAATAGATGTTGATACCC -3′) and cricketCOI.1270 (5′-GAAGCTTAAATTCATCGCACTTTTCTG -3′). DNA was amplified using polymerase chain reaction (PCR) in 10 lL reactions containing: 1 lL genomic DNA, 2 mmol/ ª 2013 The Authors. Published by Blackwell Publishing Ltd. L MgCl 2 , 0.2 mmol/L dNTPs, 0.2 lmol/L of each forward and reverse primer, and 0.1 lL (0.5 U) Platinum Taq polymerase (Invitrogen, San Diego, CA) in 1 9 PCR buffer (20 mmol/L Tris-HCL, pH 8.4, 50 mmol/L KCl). PCR was performed using an initial denaturation of 95°C for 2 min followed by a touchdown protocol of 35 cycles of 95°C for 50 s, 65-55°C for 1 min (the annealing temperature decreased by 1°C each cycle for the first 10 cycles) and 72°C for 1 min. Samples were enzymatically cleaned with EXOSAP (Invitrogen, San Diego, CA), sequenced in both directions with Big Dye chemistry, and analyzed on an ABI 3730 automated sequencer at the Cornell University Life Sciences Core Laboratories Center for Genomics. All sequences have been deposited in GenBank (KC488896 -KC489085).
The resulting chromatograms were base called using the phred-phrap algorithm and assembled in CodonCode Aligner software (CodonCode Corp, Dedham, MA). All assembled sequences were trimmed and visually inspected. We included an additional 28 sequences from Willett et al. (1997) and 70 sequences from Maroja et al. (2009a) of mtDNA COI and constructed a phylogeny using MrBayes version 3.1.2 (Huelsenbeck and Ronquist 2001). To select the optimal substitution model, we used hierarchical likelihood ratio tests implemented in jModeltest version 0.1.1 (Posada 2008). To generate trees, we used the general time reversible model with invariant sites, gamma rates, and default priors (GTR + I + G), allowing the rate at each site to change over evolutionary history. *Site located in Maryland. Gp = Gryllus pennsylvanicus, Gf = G. firmus, Admix = hybrid or backcross crickets, N = total number of crickets collected at each locality, N † = number of crickets genotyped for mitochondrial DNA, Lat = latitude, Long = longitude, Ele = elevation measured in meters above sea level. Habitat includes both the type of substrate crickets that were collected from under (e.g., rocks, trash, woodpiles, etc.) and the type of surrounding habitat (e.g., forest, field, etc.). Bold font indicates sites in which long-winged crickets were found.
We ran searches for ten million generations, sampling every 2,000 generations and discarded trees from the first 4,000,000 generations (burn-in time). We constructed a 50% majority-rule consensus tree from the remaining trees. The phylogenetic tree was rooted using three G. rubens sequences from Maroja et al. (2009a). MrBayes was run using the resources of the Cornell University Computational Biology Service Unit. We used the Sequenom MassARRAY platform to genotype crickets for 12 mtDNA single nucleotide polymorphisms (SNPs). We collected 301 crickets across 46 sampling localities in Pennsylvania and 81 crickets from seven localities across the hybrid zone. A total of 66 crickets had previously been sequenced for the entire mtDNA COI gene (Table 1), and were used to validate our genotyping results. We assayed five SNPs in the mtDNA COI gene (site numbers 796, 952, 1036796, 952, , 1204796, 952, , 1382796, 952, from Willett et al. 1997) and seven mtDNA SNPs identified in Andr es et al. (2013). Multiplexed site-specific primers were used to amplify target DNA, followed by a single base extension of a primer immediately adjacent to the target SNP. The resulting product was analyzed using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF) and the mass difference of each possible SNP nucleotide allowed unambiguous genotyping. Assays were designed using the MassARRAY Assay Design version 4.0 (Sequenom, San Diego, CA, USA). Amplicon primer sequences, amplicon length, annealing temperature, extend primer sequence, and target SNP for each assay are listed in Table S2. Reactions were performed using iPLEX Gold chemistry at the Cornell Life Sciences Core Laboratories Center for Genomics and SNP genotypes were called using the Sequenom MassARRAY Typer Analysis version 4.0. All genotypes have been deposited in the Dryad Repository (http://dx. doi.org/10.5061/dryad.rr387). We constructed a phylogeny of the mtDNA SNPs using MrBayes as described above (generations: 10,000,000; sampling: 2,000; burn-in: 4,000,000; 50% majority-rule consensus).

Morphological measurements
We characterized morphological traits that distinguish G. firmus and G. pennsylvanicus for crickets from nine allopatric populations (G. firmus: GUI, MAY, MOT, TOM, MET; G. pennsylvanicus: ITH, NBL, SCR, SCO) and from all collecting localities in our focal study area in Pennsylvania. We measured three morphological characters for each cricket: body length, femur length, and pronotal width. In addition, we measured the tegmina color in male crickets and the ovipositor length in female crickets (Fig. 2). Body length, femur length, and pronotum width all reflect overall size differences (G. firmus is typically lar-ger than G. pennsylvanicus). Tegmina color (males) is lighter and ovipositor length (females) is greater in G. firmus. Ovipositor length is the character that most clearly differentiates the two species (Harrison and Arnold 1982;Harrison 1986). We also recorded the presence/absence of fully developed long hind-wings on both males and females; a polymorphic trait that affects flight ability. All size measurements were made to the nearest 0.1 mm using a pair of vernier calipers.
To measure the color of the male tegmina, we used a USB4000 spectrophotometer with a PX-2 pulsed xenon lamp (Ocean Optics, Dunedin, FL) to capture spectral reflectance. The probe was mounted in a metal stand at a 90 o angle 0.7 mm from the surface of the tegmina. For each male, we recorded and averaged spectral reflectance for three points near the center of the tegmina. We used the program SpectraSuite version 2.0 (Ocean Optics) to capture the wavelength readings. All spectral measurements were recorded as the percentage of reflected light relative to a Spectralon white standard (Ocean Optics). We restricted our analyses to wavelengths of 300-700 nm and used a segmental classification method to quantitate three aspects of color: brightness, chroma, and hue (Endler 1990) using the software program CLR (Montgomery 2008). We calculated total brightness (B) as R 300-700 , the summed reflectance from 300 nm to 700 nm. We also divided our reflectance data into four bins of 100 nm each, calculated the total brightness for each bin (B r = 600-700, B y = 500-600, B g = 400-500, and B b = 300-400) and then calculated chroma: √(B r ÀB g ) 2 + (B y ÀB b ) 2 and hue: arctan

Analysis of morphological data
We used principal component analysis (PCA) to explore variation in morphological data. We performed separate PCAs for male body size, tegmina color (brightness, chroma, and hue), and female body size using singular value decomposition of the scaled and centered morphological data with the function "prcomp" in R (R Core Development Team 2010). We performed a one-way analysis of variance (ANOVA) for each morphological trait (ovipositor length and the first principal components of male body size, male tegmina color, and female body size) to test for differences between allopatric G. firmus and G. pennsylvanicus populations. We used a linear discriminant analysis to determine how well each of these morphological traits classifies allopatric crickets. ANOVA and linear discriminant analyses were performed in R using the packages "stats" (R Core Development Team 2010) and "MASS" (Venables and Ripley 2002).
To identify crickets from within the Pennsylvania hybrid zone as G. firmus, G. pennsylvanicus, or admixed individuals, we used a fuzzy c-means clustering algorithm based on morphological traits that distinguish the two species in allopatric populations (male body size, tegmina color, female body size, and ovipositor length; see above analyses). We clustered individual crickets into these three classes using the "fanny" function from the R package "cluster" (Maechler et al. 2002). In contrast to hard clustering algorithms (such as K-means) in which data elements are divided into distinct clusters, fuzzy clustering allows data elements to belong to more than one cluster and assigns a corresponding set of membership levels (or membership coefficients). Because we were interested in identifying morphological clusters for the two parental species and admixed individuals, we assumed that there were two clusters (k = 2), which divided individuals into two morphological clusters (membership coefficients ! 0.1 or 0.9) and a third "fuzzy" or admixed cluster (membership coefficients < 0.1 and >0.9). The membership exponent (r) determines the degree of "fuzziness" in cluster assignment. A value of r = 1.0 assigns each data element to a single cluster and is equivalent to a classic K-means clustering, while values of r > 1.0 become increasingly fuzzy until all individuals are equally distributed among the k clusters (i.e., all belong to single "fuzzy" cluster). The proportion of individuals classified as admixed depends on the choice of r, but there is no clear rule for selecting r values. We used an approach, similar to other ecological studies (Schaefer and Wilson 2002;Gompert et al. 2010), of conducting separate fuzzy c-means clustering analyses for male morphology (male body size and tegmina color) and female morphology (female body size and ovipositor length) using r values ranging from 1.0 to 2.5. Values of r > 1.75 for male traits and >2.0 for female traits classified nearly all individuals as fuzzy. Therefore, we report cluster assignments using three values of r for male traits (1.25, 1.5, 1.75) and three values of r for female traits (1.5, 1.75, 2.0), all using k = 2 ( Table 2). Values of r = 1.25 for males and r = 1.75 for females delineate cluster membership in a manner, which is consistent with classic Kmeans clustering algorithms and morphological indices that have been applied to these crickets (Harrison and Arnold 1982;Harrison 1986;Rand and Harrison 1989;Harrison and Bogdanowicz 1997) and were used for all further analyses.

Environmental predictors of species distributions
We assessed 12 environmental variables for each of our 88 sampling sites at a 1-km scale for all variables. We calculated percent natural vegetation cover based on a 30-m resolution land-cover raster from circa 2005 (Homer et al. 2007). We considered urban, pasture, agriculture, silviculture, and recreational (e.g., golf-courses) landcover types as non-natural. We calculated topographic complexity using the raster calculator feature in ArcGIS 9.3.1 (ESRI 2009), where each elevation pixel was assigned the variance of the neighbor pixels (Huaxing 2008). This metric provides significant information on habitat heterogeneity and microclimate turnover. We assessed physical soil characteristics for each sampling location (i.e., maxi-mum% sand, silt, clay, and organic matter) using spatial data made available by Soil Data Mart (Soil Survey Staff 2012). We also recorded vegetation density (USGS and FAO 2000), latitude, elevation (Jarvis et al. 2009), human footprint (Sanderson et al. 2002), annual temperature (Bio 1; Hijmans et al. 2005), and annual precipitation (Bio 12; Hijmans et al. 2005).
We analyzed our data using simple linear regressions (standard least squares). We used this univariate approach to test the relationship of each explanatory variable with ovipositor length or morphological clustering membership coefficient. We then used model selection tests including all environmental variables and their interactions to find the combinations of variables that best explained ovipositor length and cluster membership. Competing models were ranked based on Akaike Information Criterion (AICc), and we reported the model with the highest goodness-of-fit for each run. Linear regression and automated model selection were conducted using JMP 10.0 (SAS 2012).

Mitochondrial DNA
Phylogenetic analysis of the mtDNA COI gene produced a tree with four major groups, each group composed of conspecific crickets found primarily in circumscribed geographic areas: (1) northern G. pennsylvanicus, (2) southern G. pennsylvanicus, (3) northern G. firmus, and (4) southern G. firmus (Fig. S1). These four groups correspond to the haplotype groups identified by Willett et al. (1997) and Maroja et al. (2009a). Analysis of seven mtDNA SNPs identified the same four major haplotype groups, and two of these SNPs (Table S2, SNPs 448 and 554) were shared among the majority of G. firmus crickets (Fig. 3A). The proportion of crickets belonging to each mtDNA group for each sampling locality across the eastern United States and across our focal study area in Pennsylvania are presented in Fig. 3.

Morphological data
The first principal component of male body size explained the majority of variation in body size among male crickets (87.7% AE 1.62) and all three measurements of male body size (body length, femur length, and pronotum width) had positive loadings on PC1. We used PC1 to represent male body size for all further analyses and we refer to this component as "male body size." We found significant differences in male body size between G. firmus   Figure 3. Mitochondrial DNA phylogeny and haplotype distribution for Gryllus pennsylvanicus and G. firmus. Yellow and orange represent northern and southern G. pennsylvanicus haplotypes and blue and green represent northern and southern G. firmus haplotypes, respectively. Each pie shows the proportion of crickets belonging to a mtDNA group and letters refer to the location (details in Table 1 and Table S1). a) Bayesian 50% majority-rule consensus tree of five mtDNA SNPs from cytochrome oxidase I (site numbers 796, 952, 1036, 1204, 1382 from Willett et al. 1997) and seven mtDNA SNPs identified in Andr es et al. (2013) ( Table S2). Values on the branches correspond to the Bayesian posterior probabilities. The tree includes 81 crickets from 7 localities across the hybrid zone (Table S1) and 301 crickets from 46 sampling localities within the Pennsylvania hybrid zone. b) Distribution of mtDNA haplotypes across the hybrid zone. The rectangle highlights the location of the Pennsylvania study area. Shading indicates elevation in meters above sea level (m) with higher elevations represented by darker shades of gray. c) Detailed map of mtDNA haplotypes within Pennsylvania.
Within the Pennsylvania hybrid zone, the majority of crickets were classified as either G. firmus or G. pennsylvanicus based on fuzzy c-means clustering (membership coefficients ! 0.90) (Fig. 5A). Values of r ranging from 1.25 to 1.75 and 1.50 to 2.0 for males and females, respectively, classified approximately 15-50% of crickets as intermediate between the two parental types ( Table 2). The distribution of cluster membership coefficients was bimodal for both males and females, with slightly more males classified as intermediate and an overall greater number of crickets classified as G. pennsylvanicus (Fig. 5B) (see Fig. S2 for the distribution at each sampling locality). The proportion of individuals at each sampling locality classified as G. pennsylvanicus (membership coefficient 0.10), G. firmus (membership coefficient ! 0.90), or intermediate (membership coefficient >0.10 and <0.90) is depicted in Fig. 6. We found 2 collecting localities that were pure G. pennsylvanicus and 17 that were pure G. firmus. Gryllus firmus localities were more common in the Great Appalachian Valley between the Appalachians and the Blue Ridge, whereas G. pennsylvanicus were located mostly in the northeastern corner of our study area and in the large gap between the Reading Prong of the Northern Highlands and the Blue Ridge  haplotypes) at 32 localities that could overall be characterized as predominantly G. firmus (14) or G. pennsylvanicus (18). There were 36 localities that contained both parental types and admixed individuals (Table 1).
Sixty-one of the 877 crickets we collected were longwing crickets. These crickets were found at 20 localities of the 88 total localities sampled (Table 1). At most localities, we collected only one or two long-winged crickets (5 -15%), but at three localities (CH, BV, BR), nearly a third of crickets were long-winged and at BG 77% of crickets were long-winged. The majority of the longwinged crickets were G. firmus (based on morphological clustering) (Fig. 7).

Simple linear regressions
We found that natural vegetation, latitude, vegetation density, annual temperature, and annual rainfall best predicted ovipositor length in the Pennsylvania hybrid zone (Table 3, Fig. 8). Percent sand was only a marginally significant predictor of ovipositor length. Likewise, the same environmental variables were the best predictors of the morphological clustering membership coefficients: latitude, natural vegetation, annual temperature, annual rainfall, and vegetation density (Table 3). Percent sand and silt were only a marginally significant predictor of morphological cluster. The following environmental variables did not significantly influence either ovipositor length or cluster membership when considered in simple linear regressions: elevation, human footprint, topographic complexity, and soil physical characteristics for ovipositor (silt, clay, and organic content) and morphological cluster (clay, organic content).

Model selection: all possible models
When looking at all environmental factors simultaneously, natural vegetation and latitude best explained ovipositor length and morphological cluster membership coefficients (Table S5). The best model explaining ovipositor length included latitude, natural vegetation, and vegetation density as negative predictors, and also the interactions of natural vegetation with both latitude and vegetation density (Table 4). The best model explaining the cluster membership included latitude, natural vegetation, and organic matter as negative predictors, and also the interactions of natural vegetation with both latitude and organic matter (Table 4).

Discussion
The field cricket hybrid zone in Pennsylvania is a mosaic of genetically and morphologically distinct populations. The distribution of both genetic and morphological types is highly heterogeneous and cannot be explained as a function of distance across the hybrid zone; there is no clear clinal pattern of variation for any trait, but rather a patchwork of populations. This is similar to the patterns seen in other regions of the hybrid zone in Virginia (Harrison and Arnold 1982) and Connecticut (Harrison 1986;Harrison and Rand 1989;Rand and Harrison 1989) ( Table 5). Mosaic hybrid zones occur when the ecological settings and/or geography in the area of overlap are heterogeneous or complex, and species distributions are determined by environmental selection (e.g., Harrison 1986;Howard 1986;Harrison and Rand 1989;Bridle et al. 2001;Ross and Harrison 2002;Bierne et al. 2003;Vines et al. 2003;Ross et al. 2008). In Pennsylvania, we find an association between species distribution and natural habitat; G. pennsylvanicus occupies natural habitat along forest edges and natural clearings, whereas G. firmus occupies more disturbed areas in agricultural and suburban environments. Hybridization and introgression occur across patch boundaries; there is evidence of substantial  Table 1. admixture both in morphological characters and mtDNA, over a broad geographic area.
Broad scale distribution of G. firmus and G. pennsylvanicus: morphology and mtDNA Our sampling of G. firmus and G. pennsylvanicus revealed the same four major mtDNA haplotype groups that were found by Willett et al. (1997) and Maroja et al. (2009a). Gryllus pennsylvanicus consists of two major clades (northern and southern) and G. firmus has a distinct southern clade, and a northern group that is distinguishable from the other three well-supported clades (Fig. S1). Given that we find all four mtDNA haplotypes at three collecting localities in central Pennsylvania (BB, AR, BK), this region appears to represent the geographic divide between northern and southern groups of each species (Fig. 3). The hybrid zone appears to extend further north than previously described (Harrison and Arnold 1982;Harrison 1986;Maroja et al. 2009a). We found crickets in Rhode Island and Massachusetts that have introgressed mtDNA and the Massachusetts locality contains both parental types. To the north, we found only pure G. pennsylvanicus populations. Thus, Massachusetts may represent the northern range limit of G. firmus. The relationship among the mtDNA groups has not been resolved with sequence data from the mtDNA COI gene. Willett et al. (1997) found five equally parsimonious tree topologies for these groups and in all cases, either northern or southern G. pennsylvanicus clades were the basal group, with the southern G. pennsylvanicus clade having the greatest haplotype diversity. Additional genotyping of seven mtDNA SNPs identified two nucleotide positions in the ATPase6 and COIII genes that are shared between northern and southern G. firmus (Fig. 3A). This suggests that an ancestral cricket lineage split into two daughter lineages, one that became either northern or southern G. pennsylvanicus and a second that split into the other G. pennsylvanicus clade and G. firmus. Gryllus firmus has subsequently diverged into northern and southern groups.

Patches of natural habitat contribute to the mosaicism of the Pennsylvania hybrid zone
Throughout their ranges, both G. pennsylvanicus and G. firmus can be found in disturbed habitats along roadsides, in fields and pastures and around human settlement. Yet, in the Pennsylvania hybrid zone, we see an association between species distribution and natural habitat; Gryllus pennsylvanicus occupies natural habitat along forest edges and clearings, while G. firmus occupies disturbed habitat near human settlement and agriculture (Table 4, Fig. 8). In the Pennsylvania study area, there is more natural habitat in the Appalachian Mountains to the north, which can explain why we also see a correlation between the distribution of G. pennsylvanicus and higher latitudes, greater vegetation density, lower temperatures, and more rainfall. However, we also find G. pennsylvanicus crickets in patches of natural habitat further south along the Blue Ridge Mountains (e.g., AJ, AK, AN, BI) and near rivers, lakes, and parks in the large gap between the Blue Ridge Mountains and the Reading Prong of the Northern Highlands (e.g., C, D, H, L, Z, Y). Likewise, G. firmus occurs further west into the Appalachian Mountains than earlier surveys of the hybrid zone suggested (Harrison and Arnold 1982;Maroja et al. 2009a).
We found a high proportion of crickets with both G. firmus morphology and mtDNA haplotypes in the Great Appalachian Valley and the small valleys within the Appalachians. Gryllus firmus likely expanded north through these corridors and crossed the steep mountain ridges along roadways through natural water and wind gaps (CG, CH, CD, CE, BT). In some areas, it appears that human disturbance has facilitated the persistence of Given that both cricket species seem well adapted to disturbed areas, it is unlikely that either performance in or preference for disturbed habitat restricts the distribution of G. pennsylvanicus. It is more likely that G. firmus is either less well suited for the habitat characteristic of G. pennsylvanicus' range or that G. firmus is particularly well suited for disturbed habitat and is a better colonizer. Typically, field crickets have short hind-wings and are incapable of flight. Daily movements such as feeding, reproduction, and predator avoidance are accomplished by walking. But in both species, individuals can be found with long hind-wings and fully developed flight muscles, capable of long-distance dispersal (Alexander 1968). The development of long-winged morphs is triggered by environmental variables such as temperature, population density, and resource availability (Harrison 1980). However, G. firmus and G. pennsylvanicus have both intra-and interspecific variation in their proportions of long-winged individuals that can be explained in part by genetic differences (Harrison 1979). Gryllus pennsylvanicus has on average of 4% long-winged individuals (Alexander 1968;Harrison 1979), whereas the frequency for G. firmus has been reported to be as high as 10-30% in some populations (Veazey et al. 1976;Harrison 1979). Wing dimorphism is thought to be particularly prevalent in G. firmus because its natural habitat is often highly disturbed and ephemeral (e.g., sand dunes, beach grass, and under shoreline debris) and may necessitate frequent dispersal among habitat patches (Roff 1990). Indeed, approximately 82% of the long-wing crickets identified in this study were G. firmus, whereas only 3% were G. pennsylvanicus (the remainder had fuzzy morphological membership coefficients) (Fig. 7). Wing dimorphism averaged 5-10% in G. firmus populations, but ranged as high as 30-75% at some collecting localities. In all cases, long-winged crickets were found at localities with high population densities and in disturbed habitats (Table 1). This suggests that G. firmus is both capable of thriving in disturbed habitats and may have a greater propensity for long-distance dispersal. The association of G. pennsylvanicus with natural habitat and G. firmus with disturbed habitat is likely to be a major factor contributing to the mosaic structure of the hybrid zone in Pennsylvania. However, the environmental variables associated with hybrid zone structure are very different between Pennsylvania and other regions of the hybrid zone (Table 5). In Connecticut, soils vary over very short distances from loam to sand, and there is a clear association between species distributions and soil type; G. firmus occur on sandy soils and G. pennsylvanicus on loam (Harrison 1986;Rand and Harrison 1989;Ross and Harrison 2002). This may reflect adaptive differences Table 5. Summary of the ecological barriers to gene exchange for three well-characterized regions of the hybrid zone between the field crickets, Gryllus pennsylvanicus and G. firmus. Included is the spatial scale each region was sampled (Scale), the structure of the hybrid zone at that spatial scale (Structure), and the ecological barriers identified as contributing to the hybrid zone structure (Ecological barrier).

Region
Scale Structure Ecological barrier Citations  in ovipositor length (G. firmus has a relatively longer ovipositor) for egg placement in different soil types (Masaki 1979;Bradford et al. 1993). In contrast, Pennsylvania soils are predominantly clay ( ! 20% clay) and we saw no correlation between soil properties and species distributions (Table 3). In Virginia, there is also no association between species distribution and soil type. Instead, elevation and temperature appear to contribute to hybrid zone structure; G. pennsylvanicus occupies high elevation sites in the Appalachian Mountains, while G. firmus is primarily in the lowlands. This is likely driven by differences in development time. Gryllus firmus from Virginia develop more slowly than G. pennsylvanicus (both in the field and in the lab) resulting in offset adult emergence (Harrison 1985). There are likely climatic life cycle shifts in G. firmus; southern crickets develop quickly and have multiple generations per year, but in mid-latitudes (Virginia), development may slow to accommodate only one generation per year and at even higher latitudes (Connecticut), shorter growing seasons may again favor faster development rate (Fulton 1952;Alexander 1968;Walker 1980;Harrison 1985). In both Connecticut and Pennsylvania, crickets appear to emerge synchronously, and there is no evidence that temporal isolation contributes to hybrid zone structure.

Patterns of admixture suggest strong pre-zygotic barriers
In mosaic hybrid zones, the patchy distribution of parental types results in extensive contact throughout the zone. Hybridization and introgression occur across patch boundaries or in intermediate habitats.
In the Pennsylvania hybrid zone, there is a patchy distribution of natural and disturbed habitat and a corresponding distribution of G. pennsylvanicus and G. firmus. There are numerous opportunities for contact in areas where there are transitions in patch type: along mountains slopes, intersecting roadways, and near encroaching human development. Despite these opportunities for hybridization, we found that the majority of collecting sites are predominantly composed of a single parental type and a few individuals with intermediate morphologies that may be admixed (most likely backcrosses) (Fig. 6). Indeed, many of the crickets from sites with intermediate G. firmus morphologies had G. pennsylvanicus mtDNA haplotypes, suggesting that morphology is a good indicator of admixture. Each of these individual populations has an L-shaped distribution of morphological cluster membership (Fig. S2), but the combination of these predominantly G. firmus and G. pennsylvanicus populations results in an overall bimodal distribution within the hybrid zone (Fig. 5B). A few collecting localities contained both parental types, and a number of sites appeared to be mixed (containing both parental types and morphologically intermediate individuals). The topographic complexity of the region may also explain why the hybrid zone appears broader across the central Appalachian Mountains than early surveys of the hybrid zone suggested (Harrison and Arnold 1982;Willett et al. 1997;Maroja et al. 2009a). The sharp transitions between forested mountains and populated valleys increase the patchiness of natural habitat and could increase the extent of hybridization. In addition, increased human disturbance as a result of suburban expansion, agriculture, and resource extraction is likely expanding the area of contact by increasing suitable habitat for G. firmus. Contact in some of these areas may even be very recent. For instance, the occurrence of G. firmus along the Pennsylvania turnpike (CG, CH, CD) in relatively discrete locations suggests that G. firmus may have only begun occupying these high elevation sites in recent decades.
Although we found evidence of substantial admixture both in morphological characters and mtDNA over a broad geographic area, the two species remain distinct. Most admixed individuals are morphologically like one or the other parental type (Fig. 5B), and there are few intermediate individuals. Given that F 1 hybrids are viable and fertile in the lab, this suggests that strong pre-zygotic barriers are operating in this portion of the hybrid zone, a pattern consistent with characterizations of other regions of the hybrid zone in Virginia (Harrison and Arnold 1982) and Connecticut (Harrison 1986;Harrison and Bogdanowicz 1997). Barriers involved in behavioral isolation (Harrison 1986;Harrison and Rand 1989;Maroja et al. 2009b) and post-mating barriers that prevent fertilization (Harrison 1983;Larson et al. 2012) appear to be consistent across these different regions of the hybrid zone. In contrast, the ecological barriers that likely contribute to the hybrid zone's mosaic structure appear to vary between geographic regions (Table 5). In Connecticut, crickets are associated with different soil types, whereas in Virginia, crickets occur at different elevations and are temporally isolated due to differences in development time. In Pennsylvania, we found that the extent of natural habitat best explains the distribution of the two cricket species. This variation can have important consequences for patterns of introgression among different regions of the hybrid zone; genes involved in ecological isolation may vary in the extent of introgression in different environmental contexts (Harrison 1990;Payseur 2010). Interpreting patterns of variable introgression requires a clear understanding of the environmental context of species interactions (Nolte et al. 2009;Teeter et al. 2010;Macholan et al. 2011;Janousek et al. 2012). Species boundaries have been described as semipermeable, and this permeability varies not only across different genomic ª 2013 The Authors. Published by Blackwell Publishing Ltd.
regions but also among different geographic areas and ecological contexts . Characterizing multiple regions within a hybrid zone is therefore critical for understanding hybrid zone dynamics, and gaining insights into the nature of species boundaries.

Supporting Information
Additional Supporting Information may be found in the online version of this article: Figure S1. Bayesian 50% majority-rule consensus tree of the mtDNA gene cytochrome oxidase I sequence data. Yellow and orange represent northern and southern G. pennsylvanicus haplotypes and blue and green represent northern and southern G. firmus haplotypes, respectively. Values on the branches correspond to the Bayesian posterior probabilities. The tree includes all samples from Willett et al. (1997) and Maroja et al. (2009a), plus an additional 130 crickets from 13 localities across the hybrid zone (Table S1) and 119 crickets from 31 localities within the Pennsylvania hybrid zone (Table 1). Figure S2. Distribution of morphological clustering membership coefficients for males (r = 1.25) and females (r = 1.5) for each collecting locality. Dissimilarities in morphological traits between individuals (males: body size and tegmina color; females: body size and ovipositor length) are calculated using squared Euclidean distances (fuzzy c-means). Collecting localities are ranked by the average clustering index, with Gryllus firmus localities at the top-left and G. pennsylvanicus localities at the bottom-right.