Phylogeography of the highly invasive sugar beet nematode, Heterodera schachtii (Schmidt, 1871), based on microsatellites

Abstract Plant‐parasitic nematodes (PPNs) threaten crop production worldwide. Yet few studies have examined their intraspecific genetic diversity or patterns of invasion, critical data for managing the spread of these cryptic pests. The sugar beet nematode Heterodera schachtii, a global invader that parasitizes over 200 plant species, represents a model for addressing important questions about the invasion genetics of PPNs. Here, a phylogeographic study using 15 microsatellite markers was conducted on 231 H. schachtii individuals sampled from four continents, and invasion history was reconstructed through an approximate Bayesian computation approach, with emphasis on the origin of newly discovered populations in Korea. Multiple analyses confirmed the existence of cryptic lineages within this species, with the Korean populations comprising one group (group 1) and the populations from Europe, Australia, North America, and western Asia comprising another (group 2). No multilocus genotypes were shared between the two groups, and large genetic distance was inferred between them. Population subdivision was also revealed among the populations of group 2 in both population comparison and STRUCTURE analyses, mostly due to different divergent times between invasive and source populations. The Korean populations showed substantial genetic homogeneity and likely originated from a single invasion event. However, none of the other studied populations were implicated as the source. Further studies with additional populations are needed to better describe the distribution of the potential source population for the East Asian lineage.

Considering the significance that H. schachtii has for the economy and ecology, there is an urgent need to develop more effective management of this pest (Subbotin et al., 2010). Understanding its genetic diversity and structure and deciphering its invasion routes are central to monitoring and controlling its spread. Indeed, numerous studies have been conducted over the last few decades to evaluate the genetic diversity of H. schachtii populations, using diverse markers and approaches including PCR-RAPD (Ambrogioni & Irdani, 2001;Amin, Subbotin, & Moens, 2003), AFLP (Madani et al., 2007), the ITS-rRNA gene (Maafi, Subbotin, & Moens, 2003;Madani, Vovlas, Castillo, Subbotin, & Moens, 2004), and microsatellites (Plantard & Porte, 2004). However, previous studies had limited spatial coverage, used low-resolution genetic markers, or employed inadequate statistical methods as discussed by Estoup and Guillemaud (2010). Genetic patterns of H. schachtii have never been fully described on a broad scale, and the discovery of new populations in Korea underscores the need for a better understanding of this species' movement and population connectivity.
In this study, we address fundamental questions on the genetics and past movements of H. schachtii using an integrative phylogeographic approach. Phylogeography has for decades been a frequently applied tool for investigating evolutionary scenarios in invasive species (Bock et al., 2015;Karn & Jasieniuk, 2017). It has been successfully used to indicate potential source populations, reconstruct invasion routes, and test multiple introduction events in diverse species including blue mussels (Mathiesen et al., 2017), fire ants (Ascunce et al., 2011), and parasites (Criscione, Poulin, & Blouin, 2005). These studies have provided essential knowledge about the evolutionary dynamics of invasive species otherwise difficult or impossible to obtain by other means (Gotzek, Axen, Suarez, Helms Cahan, & Shoemaker, 2015).
Here, we analyzed specimens from four different continents including the recently discovered Korean populations, covering both assumed source and invasive populations, and then genotyped each of the 231 individuals using 15 newly developed microsatellite markers (Kim et al., 2016). Compared with other markers (e.g., ITS-rRNA F I G U R E 1 Global map showing the sampling locations of Heterodera schachtii. Populations are labeled with abbreviations as given in Table 1. The insert map indicates the global distribution of H. schachtii according to the records at www.cabi.org (accessed on June 2017) and COI) or fingerprinting strategies (e.g., PCR-RAPD), microsatellites generally have high mutation rates that result in high levels of polymorphism (Pedreschi, Kelly-Quinn, Caffrey, O'Grady, & Mariani, 2014;Selkoe & Toonen, 2006), making them ideal for resolving finescale genetic structure, detecting recent changes in population demography and connectivity (Selkoe & Toonen, 2006), and inferring invasion routes (Mallez et al., 2013). We applied multiple data analysis methods including a new model-based approach, approximate Bayesian computation (ABC). ABC has been successfully applied in invasion genetics studies Guillemaud, Beaumont, Ciosi, Cornuet, & Estoup, 2010). It allows parameter estimation of complex evolutionary scenarios for which likelihoods are different or practically impossible to infer (e.g., Lombaert et al., 2011;Goss et al., 2014). Using these approaches, we aimed to (a) compare genetic diversity of H. schachtii populations from different continents; (b) assess their population structure and differentiation; and (c) compare evolutionary scenarios for the sampled populations using ABC.

| Sample collection and DNA preparation
From 2010 to 2015, we gathered a total of 231 individuals of H. schachtii from 12 populations in eight countries of Asia, Europe, North America, and Oceania (Table 1 and Figure 1). Cysts were sampled from roots of sugar beet and Chinese cabbage and then fixed in molecular-grade pure ethanol (99.9%) until genomic DNA extraction. Specimens were identified based on morphology and then validated by sequence comparison of ITS1-5.8S-ITS2 region among six Heterodera species (for details see "Species identity of Heterodera schachtii" in the Supporting information Appendix S1). As suggested by Plantard and Porte (2004), only one second-stage larva per cyst was used to avoid genotyping closely allied individuals (in this species, larvae of the same cyst share at least the same mother). Total genomic DNA was extracted from single individuals using a lysis buffer containing 0.2 M NaCl, 0.2 M Tris-HCl (pH 8.0), 1% (v/v) βmercaptoethanol, and 800 μg/ml proteinase-K (Holterman et al., 2006), and then amplified using a whole genome amplification kit (GenomePlex Complete Whole Genome Amplification Kit; Sigma, St. Louis, MO, USA) to guarantee a sufficient quantity of DNA for subsequent microsatellite genotyping.

| PCR and genotyping
Fifteen polymorphic microsatellite markers developed by Kim et al. (2016) were amplified for each sample. The PCR conditions were as follows: each 20 μl reaction contained 13.3 μl of distilled water, 2 μl of 10× Ex Taq buffer (Mg2 + free), 0.8 μl of MgCl 2 , 1.6 μl of dNTP mixture, 0.8 μl of each primer (10 pmole), 0.2 μl TaKaRa Ex Taq (Takara Bio, Shiga, Japan), and 0.5 μl template DNA, with 1 cycle of initial denaturation at 95°C for 1 min, followed by 40 cycles of denatura-  (Weir, 1996). The ENA (excluding null alleles) method, suggested by Chapuis and Estoup (2007), was applied to correct for the positive bias on To estimate the influence of the existence of null alleles on H. schachtii, we repeated analyses again, excluding the two loci that had a null allele frequency >0.19. This value is a threshold over which underestimation of expected heterozygosity due to null alleles becomes significant (Chapuis et al., 2008). The following phylogenetic and population structure analyses were also performed on both data sets (15 loci and 13 loci). For each analysis, multiple and model-free approaches were used when possible to minimize the influence of null alleles.

| Genetic cluster analysis
Genetic clusters within H. schachtii were first characterized with a Bayesian algorithm implemented in STRUCTURE version 2.3 (Pritchard, Stephens, & Donnelly, 2000). For each number of population clusters K (from 1-12), ten replicates were run using the admixture model, correlated allele frequencies and the prior population information with 100,000 Markov chain Monte Carlo (MCMC) iterations after a burn-in of 10,000 steps. Structure Harvester version 0.6.92 (Earl & vonHoldt, 2012) was applied to visualize the STRUCTURE output and to choose the optimal value of K based on the Delta K method (Evanno, Regnaut, & Goudet, 2005). The 10 independent replicates for each K were merged using CLUMPP version 1.1.2 (Jakobsson & Rosenberg, 2007), and the final plots were generated using DISTRUCT version 1.1 (Rosenberg, 2004). We further carried out a discriminant analysis of principal components (DAPC) in the R package adegenet version 2.0.1 (Jombart, 2008) to validate results obtained from STRUCTURE. DAPC is a multivariate and model-free approach for finding the principal components that explain the most among-group variation while minimizing within-group variation (Jombart, Devillard, & Balloux, 2010). To investigate possible hierarchical structuring, additional analyses were performed for each method after excluding Korean populations. Finally, a hierarchical analysis of molecular variance (AMOVA) (Excoffier, Smouse, & Quattro, 1992) was conducted based on 10,000 permutations in ARLEQUIN 3.5 (Excoffier & Lischer, 2010) to calculate the partitioning of genetic variation based on the grouping suggested above.

| Evolutionary scenarios estimated using ABC
Competing evolutionary scenarios among H. schachtii population were compared using an ABC approach in DIYABC version 2.1.0 (Cornuet et al., 2014). Considering little information is available about the international origin and dispersal of H. schachtii, reconstructing evolutionary relationships among all sampled populations is unrealistic, and so in our analysis, we focused on the putative origins of the newly documented Korean populations. To simplify scenario comparison, we applied a two-step procedure. The first step was to reconstruct a basic evolutionary scenario without the Korean populations. Three pooled populations were considered in this step, based on the three STRUCTURE-defined groups: NE, TU (TU_AK and TU_NI), and CORE (UK, US_CA, US_OR, AU, GE, and IR). Pooling populations with shared history is an efficient way to reduce the complexity of candidate scenarios and has been used in numerous other studies (see Pedreschi et al., 2014;Jeffries et al., 2017). Six evolutionary scenarios that considered population history events including divergence, admixture, and population size variation were evaluated. The CORE population was assumed to be the ancestral population in each scenario considering its broad distribution among continents. The second step was to reconstruct the origin of the Korean population based on the scenario with the highest support value in the first step. Four scenarios were compared without considering mixture events, because the Korean populations were found to be monophyletic. To integrate the effects of an unsampled source population on DIYABC inferences (see Lombaert et al., 2011), a ghost population was assumed in scenario 2-4 to represent an unsampled source population for the Korean group.
For both steps, one million simulated data sets were run for each scenario to build a reference table under a generalized stepwise mutation model with mean mutation rate ranging from 1.00E-004 to 1.00E-3 and uniform prior distribution. We measured the similarity between real and simulated data sets using summary statistics both within and between populations as suggested in the DIYABC manual, including the mean number of alleles, mean genetic diversity and mean size variance for single population, and pairwise F ST , classification index, and (dμ) 2 distance for pairs of populations. A pre-evaluation step based on a principal components analysis (PCA) was performed to ensure that at least one combination of scenarios and priors can produce simulated data sets that are close enough to the observed data set.
The relative posterior probabilities of the competing scenarios were estimated by a logistic regression analyses on the 1% of simulated data sets closest to the observed data set (Cornuet, Ravigné, & Estoup, 2010). The model with the highest posterior probability and for which the 95% confidence interval (CI) did not overlap with the CIs of other models was considered the best model. A model checking computation was then applied to evaluate the goodness of fit of the most highly supported scenario using PCA, in which the observations were the simulated data sets and the variables were the summary statistics including all single-population and two-population statistics. The "Confidence in scenario choice" function was used to estimate the proportion of type 1 and type 2 errors based on 1,000 pseudo-observed data sets. Small type II errors provide good confidence in the selected scenario even if the type I errors are large (Bermond et al., 2012).

| Null alleles and their influence on results
Null alleles were prevalent among populations, especially in Iran (0.200) and two Turkey populations (0.373 and 0.400 for TU_AK and TU_NI, respectively) (Supporting information Figure S1 in Appendix S1). The mean null allele frequency over all populations and loci was 0.100, with frequencies averaged over loci ranging from 0.013 to 0.251. DNA samples that failed to yield products at some loci could nevertheless be amplified successfully at other loci, a pattern implying the existence of null alleles for most loci and populations (Chapuis et al., 2008). Deviation from HWE was observed for each locus (Supporting information Table S1) and also for most statistics of each locus in each population (Supporting information Figure S2 in Appendix S1). Two loci were observed with null allele frequencies over 0.19 (HS005 and HS033), and so analyses were repeated excluding them. Since the analyses using all 15 loci and yielded similar results (in genetic diversity, phylogeny, and population structure) as the analyses using 13 loci, the following results are presented based on all 15 loci. Results using 13 loci are provided in the Supporting information Appendix S1.

| Genetic variation within and among populations
All fifteen loci were highly polymorphic but had different degrees of diversity, varying from 5 alleles in both HS021 and HS033 to 14 alleles in HS037, with a mean value of 7.73 across all loci (Table 2).
For each locus, the observed and expected heterozygosity ranged   (Figure 2).

| Phylogenetic relationships
The individual NJ tree (Figure 3a) revealed two divergent clusters with a clear geographic pattern, corresponding to groups 1 and 2 defined above. No subgroups were discovered among three Korean populations in group 1, while for group 2, most individuals in NE and IR assembled closely together as a subclade. The population NJ tree also revealed a clear distinction between Korean populations and all non-Korean populations, with a long branch connecting these two clades ( Figure 3b). The MSN generated similar results, as two divergent lineages appeared which separated Korean MLGs from the rest (the thin line between groups represents large genetic distance in Figure 4a). The median genetic distance among MLGs of groups 1 and 2 was 0.168 and 0.284, respectively, while the distance between them was up to 0.478 (Figure 4b).

| Population structure
For the STRUCTURE analysis, ΔK indicated the best grouping occurred at K = 2 (Supporting information Figure S4a in Appendix S1), which suggests the three Korean populations (green cluster) had a distinct genetic composition from all other populations (pink cluster) Based on the population clustering suggested above (groups 1 and 2), AMOVA analysis revealed that most genetic variation came from variation within populations (73.84%, p < 0.0001), followed by variation among groups (23.02%, p = 0.0045) and variation among populations within groups (3.13%, p < 0.0001) ( Table 3).
F I G U R E 4 (a) Minimal spanning network displaying the topology among all multilocus genotypes (MLGs) of 15 microsatellite loci based on Bruvo's distance; (b) boxplots of Bruvo's distances (y-axis) among MLGs of groups 1 (intra-group 1) and 2 (intra-group 2) and between MLGs of groups 1 and 2 (between groups) F I G U R E 5 Clustering analysis of Heterodera schachtii genotypes with admixture proportions for K from 2 to 10 in STRUCTURE. Each sample is indicated by a narrow bar divided into K colored segments showing the individual's mean membership coefficient for each of the clusters K. Population locations and the continent they belong to are also indicated at the bottom and top of the bar plots, respectively

| Evolutionary scenario reconstruction
The model pre-evaluation using PCA illustrated that the observed data fell within the cluster of simulated data points based on the scenarios in each step (Supporting information Figure S5a Figure S5c in Appendix S1). The next most highly supported scenario was Scenario 1-6 (PP = 0.250), and its 95% CI (0.230-0.269) had no overlap with Scenario 1-4. Under Scenario 1-4, TU and NE populations successively originated from the CORE population, each passing through a bottleneck stage ( Figure 7a). In the assessment of confidence in scenario choice, the type I error was 0.311 and the mean type II error was 0.097 (ranging from 0.046 to 0.227). This result was corroborated by the PCA, in which the observed data set was closely matched by many simulated data points from the posterior sample of Scenario 1-4, indicating that the simulations produced data sets similar to the observed data (Supporting information Figure S5e in Appendix S1).
In step two, Scenario 2-4 was the most highly supported scenario, with PP = 0.986 (CI: 0.976-0.996) and no overlap with the other scenarios (Supporting information Figure S5d in Appendix S1).
In this scenario, a ghost population split with the CORE population early in time, the TU and NE populations were later derived from the CORE separately (Figure 7b), and the Korean population originated from the ghost population recently. A bottleneck effect is assumed for each colonizing event. In the "confidence in scenario choice" analysis, the type I error was 0.225 and the mean type II error was 0.112 (min = 0.110 and max = 0.133), showing high statistical power for distinguishing between the alternative scenarios. In the PCA simulations, the observed data closely resembled the posterior sample of Scenario 2-4 (Supporting information Figure S5f in Appendix S1).

| D ISCUSS I ON
Biological invasion represents an important area of research in global change biology. About half a million species have been translocated due to anthropogenic activities, causing far-reaching ecological and economic impacts around the world (Pimentel et al., 2001). Plantparasitic nematodes, despite their capacity for huge economic damage, are still overlooked in evolutionary studies, and there is little information about their invasion biology (Wielgoss, Taraschewski, Meyer, & Wirth, 2008). In this study, we assessed the phylogeographic patterns of the highly invasive nematode H. schachtii from four continents using 15 polymorphic microsatellites. Unique features with regard to null alleles were observed in the data set, and unexpected patterns of population diversity and structure, as well as the most likely evolutionary scenarios regarding the origin of Korean populations, were inferred using multiple approaches.

| High prevalence of null alleles
The microsatellite loci of H. schachtii in this study are highly prone to null alleles, with a mean frequency up to 10%. Null alleles are alleles that fail to be genotyped due to mutations on primer-binding sites (Brookfield, 1996)  can obscure the true relationships between different genotypes and cause serious problems in parentage analysis (Huang et al., 2016).
However, its impact on phylogeographic analysis has not been fully Appendix S1) showed similar values.
We also performed the analyses after excluding the two loci whose null allele frequency was higher than 0.19. The expected TA B L E 3 Analysis of molecular variance (AMOVA) for Heterodera schachtii samples based on groups 1 and 2 Step 1: comparison of scenarios 1-1 to 1-6 when the CORE, NE, and TU are considered; (b) Step 2: based on the best-supported model in step 1, comparison of scenarios 2-1 to 2-4 to test the origin of Korean populations (KR). The best-supported model in each step is highlighted in blue and orange, respectively. Scenarios are shown together with their respective phylogenetic tree and relative times of divergence (0 indicates the present day and t on top of the bar indicates the oldest split; BD: the duration of the bottleneck period during introduction; note that times are not to scale) heterozygosity values based on just 13 loci were generally lower than the values using all 15 loci, both for individual and overall populations (Supporting information Table S2). In the phylogenetic and grouping analyses using 13 loci, all the results including the individual and population NJ trees (Supporting information Figure   S7 in Appendix S1), MSN tree (Supporting information Figure S8 in Appendix S1), and DAPC clustering (Supporting information Figure   S9 in Appendix S1) were similar to the analyses based on 15 loci. This indicates that phylogeographic patterns in this study are robust and not significantly biased by the high frequency of null alleles.

| Deep genetic divergence
The most striking pattern uncovered in this study is the high divergence of the three Korean populations (group 1) from the other populations worldwide (group 2) in the phylogenetic and genetic clustering analyses. Large genetic distance was estimated between groups 1 and 2: The median Bruvo's distance of MLGs between groups was nearly 3-and 1.5-fold larger than the intra-group distances ( Figure 4b). All these lines of evidence indicated that the Korean populations represent a cryptic lineage of H. schachtii.
Although many previous studies using various approaches have been conducted on this species (e.g., Plantard & Porte, 2004;Madani et al., 2007)

| Evolutionary scenario reconstruction
H. schachtii is considered native to Europe with observational data from most beet-farming areas, but it has now attained a global distribution (Cooke, 1992). Our group 2 displays a wide distribution across West Asia, Europe, North America, and Oceania ( Figure 5). the lack of precise knowledge of the relationships and historical movements among the studied populations, it is not currently feasible to pinpoint the location of the ghost population. Note that to simplify possible candidate scenarios, we did not resolve relationships among the CORE populations. In fact, the populations of the CORE have relatively low levels of differentiation, making it unsafe to reconstruct a "true" population tree to estimate their introduction routes . Additional sampling will be necessary to better characterize the origin of the Korean lineage and elucidate the finer details of global evolutionary patterns in H. schachtii. Nevertheless, this study provides an important phylogeographic perspective on H. schachtii that can help to monitor its spread and track further invasion events in East Asia.

| CON CLUS ION
Biological invasions and their underlying mechanisms have become increasingly better understood by the application of an evolutionary perspective (Chown et al., 2015;Lee, 2002;Lee & Gelembiuk, 2008). However, current research on nematode invasions proceeds slowly, with few studies focusing on intraspecific diversity, especially on a global scale (Wielgoss et al., 2008).
Understanding the genetic patterns of PPNs and reconstructing their evolutionary scenarios are timely problems. Based on polymorphic microsatellites and multiple analytical approaches (e.g., the model-based ABC approach), we assessed the genetic differentiation and structure of the highly invasive nematode H. schachtii from four continents and discussed the most likely evolutionary scenarios behind these changes. These data are an essential foundation for creating strategies for better controlling and preventing the spread of H. schachtii and also represent a good model system for studying important questions in the invasion genetics of nematodes (Subbotin et al., 2010).

DATA A R C H I V I N G S TAT E M E N T
Genbank accessions of the fifteen microsatellite loci used in this study are KT716061-KT716075, and the microsatellite genotyping data for Heterodera schachtii are available in the Supporting information Appendix S2.

ACK N OWLED G M ENTS
We thank the editors and reviewers for their constructive comments. We also thank the colleagues below for their generous

CO N FLI C T O F I NTE R E S T
The authors declare no competing commercial interests.