The evolution of Hessian fly from the Old World to the New World: Evidence from molecular markers

Eighteen polymorphic microsatellite loci and 11 single‐nucleotide polymorphisms were genotyped in 1 095 individual Hessian fly specimens representing 23 populations from North America, southern Europe, and southwest Asia. The genotypes were used to assess genetic diversity and interrelationship of Hessian fly populations. While phylogenetic analysis indicates that the American populations most similar to Eurasian populations come from the east coast of the United States, genetic distance is least between (Alabama and California) and (Kazakhstan and Spain). Allelic diversity and frequency vary across North America, but they are not correlated with distance from the historically documented point of introduction in New York City or with temperature or precipitation. Instead, the greatest allelic diversity mostly occurs in areas with Mediterranean climates. The microsatellite data indicate a general deficiency for heterozygotes in Hessian fly. The North American population structure is consistent with multiple introductions, isolation by distance, and human‐abetted dispersal by bulk transport of puparia in infested straw or on harvesting equipment.


Introduction
Hessian fly (Mayetiola destructor L.), a devastating insect pest of wheat (Triticum aestivum L. emend. Thell. and related species), is native to the Near East (Stuart et al., 2008) and has accompanied its host to wheatgrowing regions around the world. In North America, Hessian fly was first noticed in Brooklyn, New York City, New York, in 1777 (Pauly, 2002;Porter et al., 2009), but the number, location, source, and role of introductions remain poorly understood in relation to the current genetic diversity of Hessian fly. Furthermore, Hessian fly now occurs in environments very different from the Middle East, the supposed center of origin, and it is therefore likely that Hessian fly is continuing to evolve in Correspondence: Brandon J. Schemerhorn, USDA-ARS, Department of Entomology, Purdue University, West Lafayette, IN 47906, USA. Tel: +1 765 494 7494; fax: +1 765 494 5105; email: bschemer@purdue.edu response to these environments, including climate and human mediated changes (insecticides, host resistance, and planting schedules). Hessian fly is an increasing threat as wheat is increasingly grown without tillage, since the puparia overwinter in the stubble of the preceding year's crop. Furthermore, virulence has rapidly arisen and spread in response to the resistance genes deployed in wheat (Cambron et al., 2010). The current population structure and gene dispersal in Hessian fly are informative about its evolutionary history and its ability to overcome currently deployed resistance genes in wheat, since a high level of intraspecific genetic diversity allows a rapid response to selection.
Two previous molecular surveys of Hessian fly have employed mitochondrial 12S rRNA haplotype (Johnson et al., 2004) and the first intron of the Hessian fly white locus, a nuclear gene (Johnson et al., 2011). The 1st study found 3 of the 7 worldwide mitochondrial haplotypes in North America, with haplotypes 2 and 3 most frequent in the northeastern and north-central United States and adjacent Canada, and haplotype 1 elsewhere. Only 11 of 23 population samples were monomorphic for haplotype, and a population near Peoria, Illinois, had all 3. The most similar Old World population was from Spain, although haplotype 2 also occurred in Kazakhstan. The second study identified 33 alleles over 26 variant nucleotide positions in white, with 4-10 alleles in each North American sample, a high frequency of endemic alleles in North America, and the most sharing of alleles between North America and Spain. The current report extends these two investigations to a considerably larger number of loci and individuals, using microsatellites and single-nucleotide polymorphisms (SNPs).
Microsatellites (SSRs) and SNPs have been used successfully to characterize populations in insects (Behura, 2006;Morton et al., 2011). Microsatellites have the advantages of rapid evolution, frequently multiple alleles, and likely selective neutrality (Li et al., 2000), but with these advantages come a high likelihood of evolutionary reversals (homoplasy) in allele size (van Zijll de Jong et al., 2011) and problems in scoring the alleles accurately (Selkoe & Toonen, 2006). Single-nucleotide polymorphisms require prior knowledge of sequence to design the SNP primer and assure its uniqueness in the genome (Doron & Shweiki, 2011). While any individual SNP can have at most 4 alleles, the number of SNPs in an entire genetic locus is limited only by the number of polymorphic nucleotide positions in its sequence, so long as primers can be designed to terminate adjacent to each one. Because a SNP in the 3rd position of many codons will not affect the resulting protein sequence, SNPs expectedly are more frequent than microsatellites within coding regions, which facilitates very specific targeting of genes in relation to population and fitness measures (Greenwood et al., 2002). Microsatellites and SNPs might be the most effective kinds of markers to differentiate populations that are experiencing high gene flow (Narum et al., 2008).
Our previous work has identified 94 microsatellite loci in Hessian fly (Schemerhorn et al., 2008(Schemerhorn et al., , 2009, and there is enough available BAC-end sequence and gene sequence in Hessian fly to search for SNPs. Therefore, we used both types of markers to analyze population structure and linkage disequilibrium from diverse Hessian fly samples, including many of the same populations sampled by Johnson et al. (2004Johnson et al. ( , 2011. These samples come primarily from the United States and Canada, but also include single populations from New Zealand, Spain, Kazakhstan, Syria, and Israel. This is the first report of the development and use of SNP markers in the evolutionary analysis of Hessian fly.

Sampling and DNA sample collection
In total, 1 095 individuals of Hessian fly, representing 23 populations, were collected from wheat fields in 7 countries: the United States, Canada, Spain, Israel, Syria, Kazakhstan, and New Zealand. Populations were sampled from two provinces in Canada (Ontario and Manitoba) and 16 states in the United States (Delaware, Virginia, North Carolina, South Carolina, Alabama, Mississippi, Louisiana, Texas, Oklahoma, Kansas, Illinois, North Dakota, Montana, Idaho, Oregon, and California). Individual flies were stored in 100% ethanol before extraction of genomic DNA according to a previously described protocol (Schemerhorn et al., 2008(Schemerhorn et al., , 2009.

SNP discovery
Primer sequences were selected from our BAC-end sequences and from GenBank accessions, mostly ESTs, that are conserved as amino-acid sequences versus the GenBank protein database, nr. While the optimum primer length was 20 nucleotides, the range in length was 18-26 nucleotides. The accepted primers had T m near 60°C (the minimum was 55°C), GC content about 50%, and product size between 350 and 1100 nucleotides.
Initially, 50 primer pairs were derived from chromosomal regions found by blast search to be conserved among Hessian fly, the orange wheat blossom midge (Sitodiplosis mosellana [Gehin]), and the Asian rice gall midge (Orseolia oryzae Wood-Mason). These primer pairs were screened against 24 individual flies, using different PCR conditions. A standard PCR condition was then chosen: a 25-μL final reaction volume containing 1× highfidelity PCR buffer, 0.2 mmol/L dNTP, 2 mmol/L MgSO 4 , 0.3 μmol/L primers, and 1.0 unit Invitrogen Platinum TM (Life Technologies, Grand Island, NY, USA) high-fidelity Taq polymerase, with initial denaturation at 94°C for 90 sec, then 35 cycles of 94°C for 30 sec, 52-58°C depending on the primer as shown in Table 2 for 30 sec, and 68°C for 60 sec, then extension at 68°C for 10 min and holding at 4°C. There were 22 primer pairs that were eliminated because they produced multiple bands or did not yield enough product under these standard conditions. The Hessian fly BAC library was screened to find clones that contain the primer-amplified regions, and the positive BACs were physically mapped on chromosomes by FISH as shown in Fig. 2 (Schemerhorn et al., 2009). Autosomal markers, which necessarily mapped to the A1 and A2 chromosomes, could be used in both sexes. Sex-linked markers, which belong to the X1 and X2 sex chromosomes, could be used only for female individuals.
To identify SNPs, 96 female flies were used from a subset of 13 of the 23 total populations: New Zealand, Kazakhstan, Syria, Israel, Ontario, Manitoba, Delaware, North Carolina, Florida, Alabama, Texas, Idaho, and Ore-gon. High-quality DNA was extracted from each fly and suspended in water prior to PCR, using the 28 primer pairs that amplified specific bands. The PCR product was freed of unincorporated nucleotides and very short fragments using the PureLink TM 96 PCR Purification Kit (K3100-96, Promega [Madison, WI, USA]). The 28 plates with purified amplicons were sequenced in the Purdue Genomics Center, and the 2 688 sequences thus obtained were analyzed with Consed to identify SNPs. The utilized SNPs came from 23 genic sequences, which have been deposited in GenBank at accessions KJ917309 through KJ917332 (Table 2).

Primer development and PCR conditions
Using Primer3 (Rozen & Skaletsky, 2000) on the sequences obtained above, primers were chosen to amplify a 100-300 bp product that contained 1 or 2 SNPs. The PCR primers were designed to anneal at about 65°C for greater target specificity. Each PCR reaction was performed in a 25 μL volume containing 0.2 mmol/L dNTP mix, 0.3 μmol/L of each primer, 1× reaction buffer, and 0.2U Taq DNA polymerase (Promega). Amplification was conducted with denaturation at 94°C for 90 sec, 35 cycles of denaturation at 94°C (30 sec), annealing temperature as shown in Table 2 (30 sec), 72°C (60 sec), final extension   at 72°C for 10 min, and holding at 4°C. The amplified DNA products were visualized using ethidium bromide under UV light after electrophoresis in a 2.5% agarose gel. Six microliters of mixed PCR products were purified with 2.4 μL ExoSAP-IT (US Biochemicals #78202) at 37°C for 15 min, then incubated at 80°C for 15 min and held at 4°C (Jiang et al., 2002).

Single-base extension and SNP verification
A third, interior, interrogation primer was chosen to end immediately 5 to a SNP within the amplified products. The third primer was used with the PCR product for single-base extension, since the PCR provided enough template to allow detection of the added base. The interior primers differed in length (20-81 bp) and annealing temperature (69-79 ºC), to allow multiplexed detection with the CEQ (Jiang et al., 2002;Sanchez & Endicott, 2006).
The 11 multiplexed SNP assays, together with the GenomeLab TM DNA size-standard kit 80, were subjected to capillary electrophoresis on a CEQ8000 Genetic Analysis System (Beckman Coulter, Fig. 4) using its SNP detection program. Subsequent analysis involved all 23 populations.

Scoring microsatellite alleles
The tabulated microsatellite data were examined for range, duplicate decimal points, equal consecutive values within rows, and equal consecutive values between individuals within the same sample, to detect errors. For each primer pair and individual, the measured peaks were counted into bins that were 0.1 nucleotide wide.
To score the microsatellite genotypes, the highest counted bin was identified, and all bins within f bins from the peak bin were classified as belonging to the peak bin. The next highest counted bin was then identified, and nearby bins similarly were assigned to it. This procedure was repeated until all bins with nonzero counts had been classified. The list of highest counted bins provided the called allele sizes, that is, allele size = bin index/10. This custom program read through all the data, compared each reading with the called allele sizes for its primer pair, and replaced the reading with the closest called allele size. There was no attempt to map allele sizes to a linear scale of integers, since electrophoretic mobility depends not only on the allele size but also its GC content and sequence, and since size does not matter once the different alleles have been distinguished. The value of f was varied from 6 to 18, that is, 0.6 to 1.8 nucleotides, and the finally chosen value of f was 0.8 for dinucleotide repeats, 0.9 for trinucleotide repeats and 1.5 for tetranucleotide repeats.

Data analysis
In total, 63 510 variants were scored from 18 microsatellite and 11 SNP markers from 23 populations. Each population sample contained from 8 to 56 individuals; most exceeded 48. All the microsatellite data were checked with Micro-Checker (van Oosterhout et al., 2004) for evidence of null alleles and scoring errors. The biascorrected F st distance and bias-corrected Nei's standard genetic distance (D ST ) were calculated, and phylogenetic trees were constructed by neighbor-joining or UPGMA in POPTREE2 (Takezaki et al., 2010). The microsatellites and SNP data were analyzed separately in Arlequin 3.1 (Excoffier et al., 2005) for linkage disequilibrium, Hardy-Weinberg equilibrium, pairwise population F st values, and Fisher's exact test of population differentiation. Conformity to Hardy-Weinberg equilibrium was also analyzed using GenePop (Raymond & Rousset, 1995;Rousset, Fig. 3 DNA sequence surrounding named SNP loci. All primer sequences are bolded. The PCR primers are also underlined, and each SNP primer overlies a green arrow. Each SNP primer name ends with the total primer length, including the bolded nucleotides plus any poly A tail at the 5 end. Each SNP itself is boxed in red. 2008), and the output was subjected to t-tests in R (R Development Core Team, 2008).
Because the accessions were identified only by state, province, or country of collection, it was not possible to perform a highly accurate analysis of genetic variation by geographic distance. Instead, proxy locations were chosen from each state, using the county reported by Johnson et al. (2004, Fig . 1) where available. Otherwise, the location was arbitrarily chosen to be near an agricultural university or within a wheatgrowing region as identified from Figure 1.2 in http:// iwheat.org/book/export/html/2 (accessed May 28, 2014) and refined by inspection of satellite imagery in Google Maps (https://maps.google.com), which provided geographic coordinates in decimal degrees. These locations appear in Table S1. A custom Perl script calculated Pearson's correlation coefficient r for individual allele frequencies versus distance from New York City, mean January temperature, mean July temperature, and total rainfall over June, July, and August, factors that can reasonably be expected to affect the survival of puparia and timely emergence of adults. Great-circle distance was calculated between each pair of proxy localities using the spherical law of cosines on the latitude and longitude (http://www.movable-type.co.uk/scripts/latlong.html, accessed May 29, 2014). Meteorological data were obtained from http://www.weatherbase.com (accessed May 29, 2014) and http://www.usclimatedata.com (also accessed May 29, 2014). The correlation coefficient was calculated only for the 97 alleles that exceeded a frequency of 0.15 in at least one North American population. The significance of these correlations was evaluated by recalculation of r for 10 000 random permutations of the distance, temperature, or rainfall, generated using the Mersenne Twister (Perl module Math::Random::MT), for each allele. An estimated P value was calculated from the number of absolute values of recalculated r that exceeded the absolute value of r calculated from the observed frequencies.
Thus, expectedly 4.85 alleles would appear significantly correlated at P = 0.05, and 0.97 alleles would appear correlated at P = 0.01, if in fact allele frequency was uncorrelated with other information. A perl script formally evaluated this multiple testing effect by applying the procedure of Benjamini and Hochberg (1995) to each set of 97 P values.
Genetic distance was calculated as Nei's D A (Takezaki & Nei, 1996). The square matrix of genetic distances was compared to the square matrix of geographic distances using the Mantel test as function mantel.rtest in R package ade4 (Dray et al., 2007). This function estimated a P value by evaluating r over 9 999 random permutations of the geographic distances.

Results
Eighteen polymorphic microsatellites and 11 polymorphic SNP markers were genotyped in 1095 individuals of Mayetiola destructor representing 23 different local populations from 7 countries. The automatic binning and scoring of microsatellites succeeded, although flanking the common alleles there were rare alleles that might have represented misscored instances of the common allele, due to variation in the reported peak size from the CEQ sequencer. All called alleles were included without further consolidation in the following statistical analysis. The only direct evidence for null alleles was from 2 apparently homozygous individuals in Kazakhstan, 1 that failed to amplify a product for HF-14 and another that failed to amplify for HF-124.
Overall 18 microsatellite loci, 273 alleles were recognized: 52 restricted to North America, 38 restricted to Eurasia, 6 restricted to New Zealand, 67 shared between Eurasia and North America, 8 shared between North America and New Zealand, 3 shared between Eurasia and New Zealand, and 99 shared worldwide. Few of the geographically restricted alleles reached high frequency; only 8 of 52 exceeded 0.15 in any North American population, and only 2 of 38 exceeded 0.15 in any Eurasian population. No endemic allele reached a frequency of 0.15 in New Zealand. The number of endemic alleles varied from 9 in Spain to 0 in Louisiana, South Carolina, North Carolina, and Missouri, but the last 2 samples each consisted of only 8 flies, versus about 50 for the others. Israel (8 endemic alleles) and Kazakhstan (7) were also rich in endemic alleles, whereas Syria had only 3. California (4), Ontario (4), Kansas (3), and Oklahoma (3) were the only North American populations with more than 2 endemic alleles.
Within North American samples of ca. 50 individuals, the total number of shared and endemic microsatellite alleles over all 18 loci ranged from 64 (3.56 per locus) in Montana to 137 (7.61 per locus) in California; there were 55 alleles from 8 individuals in North Carolina. Alabama, Oklahoma, and South Carolina also had 100 or more microsatellite alleles. Pearson's correlation coefficient r was −0.015 for allele count versus distance from New York City, indicating no trend up or down on a continental scale. The number of individual alleles with apparently significant correlation slightly exceeded the number (4.85 or 0.97) expected from 97 tests: 6 at 0.05 including 1 at 0.01 for distance from New York City, 8 at 0.05 including 2 at 0.01 for mean January temperature, 9 at 0.05 including 4 at 0.01 for mean July temperature, and 3 at 0.05 including 2 at 0.01 for summer rainfall (Table S2). Thus, few if any allele frequencies trended with these variables, although the highest r values reached 0.7. When multiple testing is considered by using the Benjamini-Hochberg procedure, the maximum minimal P value that is significant at P = 0.05 is 0.05/97 = 0.000515, and all the minimum estimated P values exceeded this: 0.0020 for distance from New York City, 0.0014 for January temperature, 0.0055 for July temperature, and 0.0029 for summer rainfall (Table S2). Thus, no individual alleles were significantly correlated to distance, January or July temperature, or summer rainfall, when each allele was treated as varying independently of all others.
The Mantel test provided a global correlation of genetic distance with geographic distance within North America. The resulting correlation was low but statistically significant (r = 0.281, P = 0.0077 based on 9 999 replications). Thus, remote populations tended to be relatively distant genetically, and the California population was a major contributor to this correlation because of its maximal number of alleles and geographic remoteness.
On a worldwide scale for microsatellites, the mean genetic distance (Nei's D A ) from North American populations to Eurasia was 0.332 for Kazakhstan, 0.383 for Spain, 0.422 for Syria, and 0.519 for Israel. These and other genetic distances appear in Table S3. California was genetically the closest North American population to Kazakhstan, Syria, and Israel, and it was 3rd closest behind Alabama and Kansas to Spain. On the supposition that a northern European population might be close to an intermediate of Spain and Kazakhstan, genetic distances were calculated from all the populations to a hypothetical population having the mean of the allele frequencies from Spain and Kazakhstan. California is the North American population closest to this hypothetical population, at D A = 0.240 versus 0.255 for Kazakhstan and 0.328 for Spain. Manitoba, Montana, and Mississippi are consistently the farthest from the Eurasian populations.
New Zealand is closest to 5 North American populations, Alabama (D A = 0.307), Louisiana, California, South Carolina, and Delaware, followed by Spain (D A = 0.346) and Kazakhstan (0.368). Syria is closest to Israel (0.217), then to California (0.323) and Kazakhstan (0.344). Israel is much farther from California (0.420) than Syria is, and Israel is even farther from all the other populations.
Allelic and genotypic diversity for the polymorphic loci from each of the sampled populations are summarized in Table 3. Where at least 50 gene copies had been examined, mean microsatellite allelic richness within populations ranged from 3.56 in Montana to 7.61 in California, and mean SNP allelic richness ranged from 1.46 in Virginia, North Dakota, and Illinois, to 1.91 in California, Alabama, and Spain. All the populations except Mississippi  Table 4), and each combination of locus and population (Benjamini-Hochberg filtered significances appear in Table 4). Over the 18 microsatellite loci in Table 4, there were 301 combinations of North American population (excluding North Carolina) and polymorphic locus. By the Benjamini-Hochberg multiple testing criterion, the maximum P values were 0.03239 for 0.05 false discovery rate (FDR) and 0.005216 for 0.01 FDR. There were respectively 195 and 157 locus-populations that met these limiting P values. Over the 11 SNP loci in Table 4, there were 120 combinations of North American population and polymorphic locus. By Benjamini-Hochberg, the maximum P values were 0.01417 for 0.05 FDR and 0.00208 for 0.01 FDR. There were respectively 33 and 25 locus-populations that satisfied these 2 limits.
Inclusion of the Old World populations raised the number of locus-populations to 391 for microsatellites and Table 4 Hardy-Weinberg equilibrium P value and heterozygosity.  (14) (80) (18) (10) (10) (10) (10) (21) (14) (5) (7) (13) (10) (13) (7) (4) (7) (13) (15)  159 for SNPs while scarcely changing the limiting P values to 0.03274 and 0.005269 for microsatellites and 0.01541 and 0.00214 for SNPs. The numbers of locuspopulations below these limits were respectively 256 and 206 for microsatellites and 49 and 33 for SNPs. Therefore, 53% of the microsatellite locus-populations deviated from Hardy-Weinberg genotype frequencies at 0.01 significance, even after accounting for multiple testing, as did 21% of the SNP locus-populations. The mean expected and observed heterozygosities in Fig. 3 clearly demonstrate that heterozygote deficiency was more frequent than heterozygote excess, that the expected heterogyosity was higher for microsatellites than for SNPs, and that SNPs deviated less from the expected genotype frequencies. California had the greatest number of deviating microsatellite loci (17) and tied with Manitoba, Canada, for the greatest number of deviating SNP loci (6), whereas South Carolina and Oklahoma had the fewest deviating microsatellite loci (7). Values of F ST and D ST are given for each locuspopulation in Table 5. These pairwise distance values were used to construct 2 phenetic trees (Figs. 5 and 6). Bootstrapping did not strongly support these trees over similar alternatives, and thus only limited conclusions can be drawn from them. From an unrooted neighborjoining dendrogram of 23 Hessian fly populations based on the corrected F ST statistic (Fig. 5), the North American populations are more similar to the Spanish population than to populations from the Middle East, and there are evident similarities of some geographically close populations, for example, Idaho and Oregon, and Alabama, and Mississippi. However, there are also juxtapositions of geographically remote populations, such as Montana with Alabama and Mississippi, Delaware with Idaho and Oregon, and California with Illinois. Also, the geographically close populations from Louisiana and Mississippi were not close together in the tree. The 3 American populations most similar to Spain's are from the east coast (Fig. 6), in contrast to the genetic distance D A , which places California and Alabama closer to Spain and Kazakhstan. This tree is consistent with diversification around the Mediterranean Sea, followed by probably independent introduction to New Zealand and North America from Spain. The Israeli population is the most different from the rest, which implies that it shares the most remote common ancestor with the remaining populations.

Locus
A second unrooted neighbor-joining dendrogram of the same 23 populations (Fig. 6) was produced using the corrected Nei's standard genetic distance. Although this diagram is presented differently from Figure 5, the 2 trees are topologically identical; the same populations occupy the same relative positions within each. In Figure 6, the branch lengths indicate a collective similarity of North American populations to those of the rest of the world, with Israel standing farthest away. Although this is an unrooted tree, an ancestral node can be imagined between Spain and Syria, with progressive changes as Hessian fly spread or was transported across North America. Again, 3 populations from the U.S. East Coast are the most similar to those of the Old World, and there are several pairs of geographically close populations. Therefore, F ST and D ST conveyed the same phylogenetic signal in the sampled populations.

Discussion
Many of the microsatellite alleles differed in length from some base length plus a multiple of the motif length. While this could indicate a problem with the automatic scoring method, there is published evidence of frequent length variation near but outside of the tandem repeats in microsatellite loci (Hale et al., 2004;Mathimaran et al., 2008). Also, microsatellites tend to cluster; a 3-base repeat might lie close to a 2-base repeat, allowing length changes of 1, 5, 7, etc. bases (Kofler et al., 2008). Old microsatellite loci can break down over evolutionary timescales by accumulating point mutations and single-base indels, but these regions remain liable to gain and loss of whole repeats.
The observed microsatellite heterozygote frequency was generally less than the expected heterozygote frequency at Hardy-Weinberg equilibrium, and the deficiency was more pronounced for microsatellites than for SNPs. This situation could occur for technical, statistical, or biological reasons that are not mutually exclusive (Selcoe & Toonen, 2006). The technical reasons include amplification failure of one or the other allele in heterozygotes, and misscoring of doublets as stutter bands, either of which could appear as too-few heterozygotes. However, a test for large-allele dropout was negative: large alleles were as likely to be amplified as short alleles. In addition, if too many alleles are recognized in relation to the number actually present, the lower individual allele frequencies would lead to spuriously high expected heterozygosity. The statistical issue is the small number of copies of each allele when ca. 10 alleles exist in 50 or so individuals, as in a typical locus-population. This affects the microsatellites more than the SNPs simply because the former have more alleles. The possible biological reasons include some low frequency of null alleles, the Wahlund effect (sampling at a greater scale than the true population size), assortative mating of similar genotypes, and negative heterosis, among others. A run of Micro-Checker Table 5 Values of genetic distance by population pairs.  Table 4.  (van Oosterhout et al., 2004) indicated that null alleles or random amplification failure could have occurred, and the population from Kazakhstan had 2 apparent homozygotes for null alleles. The greater heterozygote deficiency for microsatellites than for SNPs suggests enough of a scoring problem for the microsatellites to affect estimates of inbreeding. Since SNPs also frequently exhibited heterozygote deficiency, in the absence of stutter or differential amplification, there probably is also a biological reason for the heterozygote deficiency. One plausible reason is limited dispersal of flies within populations, which would increase the frequency of matings between related individuals. The ca. 3-d lifespan of adult females (Harris & Rose, 1991) accords with very limited natural dispersal, as does the strong tendency of the males to fly within or less than 30 cm above the wheat canopy, where they are unlikely to be carried far in the wind (Anderson et al., 2012). The microsatellite and SNP findings corroborate the evidence from 12S mitochondrial rRNA (Johnson et al., 2004) and the white locus (Johnson et al., 2011) that North American Hessian fly populations are genetically polymorphic, as if the species does not tolerate inbreeding and there is a minimum founding population size necessary for successful establishment. It appears unlikely that a single gravid female could found a new population. On the other hand, a single infested bale of wheat straw could easily harbor hundreds or more puparia, allowing human transport of an entire population's diversity to a new area. The dry puparia remain viable for up to 4 years at ambient temperature in Kansas (McColloch, 1919), abetting their movement with human commerce. Thus, the widespread polymorphism in Hessian fly likely results from human-abetted bulk transport of populations as puparia rather than ongoing long-range dispersal of individual adult flies.
The evidence from microsatellites, SNPs, and white all indicate that new alleles continue to evolve in North America, and that more variation was introduced than any one of the 4 sampled Eurasian populations can account for. There are surprisingly few loci correlated with the seemingly relevant climatic variables of January temperature, July temperature, and summer rainfall, although favorable temperature and wetting are required to trigger pupation and eclosure (McColloch, 1919).
The historical legend (Porter et al., 2009) is that Hessian mercenaries inadvertently brought Hessian fly to the United States in wheat straw for their bedding or horses during the Revolutionary War, since the 1st infestation was recorded in 1777 in Brooklyn, New York, near the Flatbush community where Hessian mercenaries had encamped in 1776. The Hessians came from Germany, not Spain, but there were no sampled populations from northern or central Europe in this study. Johnson et al. (2011) noted that Hessian fly is a minor pest of wheat in northern Europe and therefore is not currently investigated there, making it infeasible to obtain samples of German or English populations from local researchers. Thus, it is not known if northern European populations would more closely match North American populations than the Spanish or Kazakh populations do. While the introduction with Hessian mercenaries may well have occurred, other introductions at other times and locations are also likely. Since Alabama, Kansas, and California are the North American populations genetically closest to Spain for microsatellite frequencies, the hypothesis that the Spanish brought infested straw with their horses to Florida and Mexico merits consideration. The joint presence of white alleles 2, 3, and 9 in Spain and Alabama (Johnson et al., 2011) accords with this. The genetic similarity of Spain and California could also reflect independent adaptation to a Mediterranean climate, although there is no strong trend in the frequencies of common alleles across North America in relation to summer rainfall, January temperature, or July temperature. The high genetic diversity in California might also reflect the 5 major types of wheat, with potentially distinct resistance profiles, that are grown there (http://www.californiawheat.org/industry/classesof-us-wheat/, accessed May 9, 2014), unlike the broad swaths of single wheat types encountered in the eastern and central United States. Furthermore, Hessian fly can survive and reproduce at low levels on various wild species in the Triticeae and Bromeae (Anderson et al., 2012), so these species also select the Hessian fly gene pool; divergent selection on multiple hosts might favor increased polymorphism within populations.
The Israeli population is fixed for a distinctive mitochondrial haplotype (Johnson et al., 2004) and has only 2 alleles at white where both are also found in neighboring Syria (Johnson et al., 2011). It is also relatively isolated from the other populations in the trees drawn from the microsatellite and SNP frequencies, and it has the second-highest number of endemic microsatellite alleles. In contrast, its microsatellite genetic distance from Syria is only 0.217, typical of divergence among North American populations. Johnson et al. (2011) noted that the nucleotide divergence of 12S rDNA sequence between Israel and other Hessian fly populations was typical of an interspecific difference. The status of the Israeli population merits investigation of hybrid fertility and mating preference in relation to Syria, since there seems to be a biological impediment to gene exchange.
Allelic richness and mean heterozygosity did not decrease westward or southward from the historical point of introduction in New York City. Thus, the spread of the Hessian fly across North America did not involve a successive extraction of progressively less diverse populations over time by founder effect, with the possible exception of the coldest parts of its introduced range in Montana and Manitoba.
The Hessian fly has caused more yield and economic loss in North America than in its native range, possibly because it has escaped from natural enemies that had coevolved with it in the Middle East (Perkins & Holochuck, 1993). Apart from California, North American climates differ from the Middle East, especially in winter temperature and the distribution of rainfall during the year. The Hessian fly has been able to adapt to various climates, but it is not clear if this is because of evolution since introduction, evolution in Europe before introduction, or broad temperature tolerance in all native populations. It is also possible that the fly had become more virulent in Europe, and that the introduction to North America involved an uncommonly virulent strain. The relatively recent economic damage due to Hessian fly in Spain (Del Moral et al., 1994) and Morocco (Amri et al., 1992) might reflect greater virulence there than in the Middle East, although it is also possible that Spain is already outside the range of the Hessian fly's coevolved predators, or that Spanish wheat culture has become less defensive against it. Also, while it is generally assumed that the Hessian fly evolved to its modern form with the domestication of wheat in the Fertile Crescent, an alternative hypothesis merits consideration: that the ancestral Hessian fly had a major shift in host preference from some other member of the Triticeae or Bromeae (e.g., Elytrigia repens [L.] Desv. ex Nevski or Elymus caninus L.) in northern Europe after wheat was first cultivated there, and it escaped the most effective of its native diseases or predators as it moved southward and overseas. In this case, Syria and especially Israel would be the peripheral isolates, and the Hessian fly would be an unimportant pest in its truly native range. Such a hypothesis would require the Hessian fly to have evolved very recently, after humans first brought wheat to northern Europe.
In conclusion, microsatellite and SNP evidence corroborate earlier evidence from 12S mitochondrial RNA and sequence variants at the white locus, that the Hessian fly was introduced multiple times in multiple places to North America from multiple sources in Europe, and that it has spread by bulk transport of puparia, most likely in infested wheat straw in human commerce. Although all sampled North American populations are genetically polymorphic, their allele frequencies conform to a general isolation-by-distance model without general trends in allelic diversity or frequency over distance or along temperature and precipitation gradients. The high genetic polymorphism within populations raises the potential of Hessian fly to evolve virulence to newly deployed resistance genes in wheat.

Supporting Information
Additional Supporting Information may be found in the online version of this article at the publisher's web-site:

Table S1
Proxy locations and meteorological data for geographic analysis of allele frequencies.
Table S2 Pearson correlation coefficients and their P values for allele frequency versus distance from New York City, mean January temperature, mean July temperature, and June-August rainfall. Table S3 Genetic distances (Nei's D A ) among sampled Hessian fly populations.