Tournament ABC analysis of the western Palaearctic population history of an oak gall wasp, Synergus umbraculus

Approximate Bayesian computation (ABC) is a powerful and widely used approach in inference of population history. However, the computational effort required to discriminate among alternative historical scenarios often limits the set that is compared to those considered more likely a priori. While often justifiable, this approach will fail to consider unexpected but well‐supported population histories. We used a hierarchical tournament approach, in which subsets of scenarios are compared in a first round of ABC analyses and the winners are compared in a second analysis, to reconstruct the population history of an oak gall wasp, Synergus umbraculus (Hymenoptera, Cynipidae) across the Western Palaearctic. We used 4,233 bp of sequence data across seven loci to explore the relationships between four putative Pleistocene refuge populations in Iberia, Italy, the Balkans and Western Asia. We compared support for 148 alternative scenarios in eight pools, each pool comprising all possible rearrangements of four populations over a given topology of relationships, with or without founding of one population by admixture and with or without an unsampled “ghost” population. We found very little support for the directional “out of the east” scenario previously inferred for other gall wasp community members. Instead, the best‐supported models identified Iberia as the first‐regional population to diverge from the others in the late Pleistocene, followed by divergence between the Balkans and Western Asia, and founding of the Italian population through late Pleistocene admixture from Iberia and the Balkans. We compare these results with what is known for other members of the oak gall community, and consider the strengths and weaknesses of using a tournament approach to explore phylogeographic model space.

Where the number of population scenarios to be compared is small, support for all of them can be compared directly (e.g., Hearn, Stone, Nicholls, Barton, & Lohse, 2014). However, it remains challenging to compare the large numbers of alternative possible scenarios that exist for even small numbers of populations when incorporating variation in population size, the topology and timing of population splits, and patterns of gene flow or population admixture (Bertorelle et al., 2010;Lombaert et al., 2014;Pelletier & Carstens, 2014). The challenge arises because of both the number of historical scenarios that can potentially be compared, and the numbers of simulations necessary to adequately explore prior distributions in parameter-rich models. Generation of a single very large ABC reference table for the entire set of scenarios may be difficult (c.f. Lombaert et al., 2014) or impracticable. One solution is to limit the set of compared models a priori, based on knowledge of the biological system (e.g., Boehm et al., 2013;Jacquet et al., 2015;Smith, Lohse, Etges, & Ritchie, 2012) or the specific hypotheses being compared (e.g., Fagundes et al., 2007;Hickerson & Meyer, 2008). However, this approach necessarily prevents detection of counterintuitive, but true scenarios that would be revealed by a more complete exploration of model space and it is hence vulnerable to preconception bias (Jackson, Morales, Carstens, & O'Meara, 2017;Thom e & Carstens, 2016). This a key issue at the heart of debates over model-based methods in inference of population history Templeton, 2009). A second alternative is to use a stepwise procedure, in which a subset of scenarios from initial analyses is included in one or more later rounds of analysis (e.g., Konecny et al., 2013;Lombaert et al., 2014;Pelletier & Carstens, 2014). The hierarchical tournament approach is a specific version of such a stepwise procedure, in which a set of scenarios is first divided into several non-nested pools. A first round of ABC is applied to each pool to identify the best-supported model(s). A second round then compares support across the pool-winning scenarios. Such an approach allows identification of a best overall model, or estimation of shared model parameters from a joint posterior distribution (e.g., see Barr es et al., 2012;Kerdelhu e, Boivin, & Burban, 2014;Wei et al., 2015).
Here, we use a tournament approach to infer the relationships between putative Western Palaearctic glacial refugia for an oakfeeding insect, Synergus umbraculus. We compare support for 148 alternative scenarios in eight non-nested scenario pools.
Many studies have addressed the influence of Quaternary glacial cycles on patterns of intraspecific genetic diversity (e.g., Connord, Gurevitch, & Fady, 2012;Hewitt, 2004). During periods of glacial advance, most temperate western Palaearctic species were confined to southern refugia in one or more of Iberia (AE northwestern Africa), Italy, the Balkans, Turkey, the Levant, Iran and the Caucasus (Hewitt, 1999(Hewitt, , 2004. Prolonged reproductive isolation resulted in genetic differentiation between refugial populations, which many studies have exploited to identify the source population(s) for postglacial range expansion during the Holocene (Hewitt, 1999;Schmitt, 2007) or Anthropocene (e.g., Nicholls et al., 2010b). However, much less is known about the older range expansions underlying the wide longitudinal distributions of many western Palaearctic taxa (Flanders et al., 2009;Hewitt, 1999Hewitt, , 2004Krehenwinkel et al., 2016;Stone et al., 2012;Taberlet, Fumagalli, Wust-Saucy, & Cosson, 1998). Very few studies have explored relationships between putative southern refugia across the region in a model-based framework. Likelihoodbased analyses to date have faced methodological constraints limiting comparison to three refugial regions Lohse, Barton, Melika, & Stone, 2012;Lohse, Sharanowski, & Stone, 2010).
The simulation-based framework of ABC allows exploration of more complex models, which we here apply to analysis of relationships between populations in Iberia, Italy, the Balkans and Western Asia.
These gall inducers are highly specific to particular oak taxa . A second group of closely related wasps (Cynipidae, tribe Synergini), which includes Synergus umbraculus, are termed inquilines, and feed as herbivores within galls induced by other gall wasps (Ronquist, 1994). Because they are associated with specific host galls and can cause the death of the gall inducer, in some ways inquilines resemble parasitoids (Ronquist, 1994;Sanver & Hawkins, 2000). Like their host gall inducers, inquiline gall wasps are also associated with specific oak taxa ( Acs et al., 2010). Both inducers and inquilines are attacked by parasitoids, which make-up the third trophic group (Bailey et al., 2009;Stone, Sch€ onrogge, Atkinson, Bellido, & Pujade-Villar, 2002). Least is known about the inquiline trophic level, in part because their taxonomy is currently only resolvable using DNA sequence barcodes ( Acs et al., 2010).
Multiple studies have addressed the Western Palaearctic phylogeography of the gall inducers Hearn et al., 2014;Mutun, 2010Mutun, , 2016aMutun & Atay, 2015;Nicholls, Challis, Mutun, & Stone, 2012;Rokas, Atkinson, Brown, West, & Stone, 2001;Rokas, Atkinson, Webster, Cs oka, & Stone, 2003;Stone & Sunnucks, 1993;Stone et al., 2007) and parasitoids (Hayward & Stone, 2006;Lohse et al., 2010Lohse et al., , 2012Nicholls et al., 2010aNicholls et al., , 2010b. Both groups show genetic structure compatible with Pleistocene refugia in the west (Iberia), centre (Italy and the Balkans) and east (Turkey and Iran) that broadly parallel those for deciduous oaks (Petit et al., 2003). Most species in both groups sampled to date show an east-to-west decline in population genetic diversity and are inferred to have spread westwards into Europe from Asia during or before the Pleistocene (the "Out of Anatolia" hypothesis; Rokas et al., 2003;Stone et al., 2009Stone et al., , 2012; see also Connord et al., 2012). The only known gall wasp exception to this pattern is an inducing species, Biorhiza pallida, whose population history is inferred to have involved an initial deep divergence between Iberia and the Balkans + Iran, followed by migration from Iran to Iberia that bypassed central Europe , possibly via North Africa. Three parasitoid species show a contrasting west-to-east "Out of Iberia" pattern for mitochondrial sequence . This raises the question of how general the "Out of Anatolia" pattern is for all three trophic elements of this community .

Synergus umbraculus is found throughout the temperate Western
Palaearctic, wherever suitable host galls are found. A previous analysis of mitochondrial sequence data for populations from Morocco to Iran (Bihari et al., 2011) found greatest haplotype diversity in Hungary, and hypothesized that the Balkan region acted as a Pleistocene refuge for this species. However, sampling in this study was biased towards central Europe, and inferences from it are subject to the caveats inherent in using any single locus to infer population history (Degnan & Rosenberg, 2006, 2009Hurst & Jiggins, 2005). An alternative is that the high-genetic diversity in Hungary is the result (wholly or in part) of gene flow into this region from other centres of genetic diversity to the east and/or west (Hewitt, 1999). This can result in higher genetic diversity in nonrefuge regions than in contributing refugia (e.g., Comps, G€ om€ ory, Letouzey, Thi ebaut, & Petit, 2001). Here, we revisit the phylogeography of S. umbraculus, using a substantially enlarged sequence data set (including six nuclear loci) spanning the longitudinal distribution of the species and combining two different approaches.
First we reconstruct relationships between genetically distinct genotype pools identified using STRUCTURE (Pritchard, Stephens, & Donnelly, 2000). This allows inference of relationships between genetic units that may be shared across geographic regions, and whose divergence from each other could predate their current geographic distributions.
We then use tournament ABC to infer relationships between populations in four regions identified by previous research on oaks and associated insects as putative Pleistocene glacial refugia. Our set of spatial scenarios includes both "Out of the East" and "Out of the West" directional models, and an equivalent of the contrasting pattern found for the gall wasp Biorhiza pallida. To allow the detection of counterintuitive but nevertheless well-supported scenarios, these models are compared to all alternative arrangements of populations in eight pools of topologically discrete (i.e., non-nested) scenarios.

| Sample selection
We sampled 98 individuals from 11 countries, spanning the full western Palaearctic distribution of Synergus umbraculus from Morocco to Iran, including putative western Palaearctic refugia (Iberia, Italy, the Balkans, Turkey, Lebanon and Iran; Figure 1,

| Molecular markers
We generated sequence data for seven markers, comprising fragments of mitochondrial cytb and six nuclear genes, totalling 4,233 base pairs (bp). One of the nuclear markers (ITS2) has previously been used in analyses of cynipid phylogeography , while five EPIC (exon-primed, intron-spanning) markers for LWopsin, Ran, RpS4, RpS8 and RpS23 (Lohse, Sharanowski, Blaxter, Nicholls, & Stone, 2011;Stone et al., 2009) are here used in this way for the first time.
Sample sizes for each region are given in Tables 1 and S2. GenBank accession nos for all new sequences are provided for each specimen in Table S1 (JQ752244-JQ752264 for cytb, JQ752265-JQ752405 for ITS2, JQ752406-JQ752543 for LWopsin, JQ752544-JQ752668 for Ran, JQ752669-JQ752779 for RpS4, JQ752780-JQ752916 for RpS8 and JQ752917-JQ753049 for RpS23). Male oak gall wasps are haploid , and where possible we selected male specimens to minimize difficulties associated with determining the allelic phase of heterozygous diploid females.
Sequence generation for the specimens in Bihari et al. (2011) used the original DNA extractions. For the 19 additional specimens, DNA was extracted following the chelex-based procedure described in Nicholls et al. (2010a). A 697-bp fragment of the mitochondrial cytb gene was amplified using the primers CP1B (5 0 -AATTTTG GATCTCTTTTAGG-3 0 ) and CP2C (5 0 -ATAACTCCTCCTAATTTAT-3 0 ) designed specifically for S. umbraculus (the CP2C primer sequence given here is corrected from an error in Bihari et al., 2011). We amplified six nuclear loci. A 572-bp fragment of the internal transcribed spacer region (ITS2) of the nuclear ribosomal array was amplified using primers ITS2f (5 0 -TGTGAACTGCAGGACACATG-3 0 ) and ITS2r (5 0 -AATGCTTAAATTTAGGGGGTA-3 0 ) (Campbell, Steffen-Campbell, & Werren, 1993  PCRs were performed in 20 ll reactions using conditions given in File S1. PCR products were cleaned using a shrimp alkaline phosphatase and exonuclease I protocol, and sequenced directly on an Lebanon and Iran), without incorporating sampling location information, for K = 1-10 for 2 million generations after a burn-in of 500,000 generations. Convergence in estimated parameter values was confirmed for ten independent (and highly concordant) runs for each value of K. STRUCTURE outputs were processed using STRUCTURE HARVESTER (Earl and von Holdt 2012), and we identified the best-supported K value both by calculating the likelihood of each K value (following Pritchard et al., 2000), and by calculating DK (following Evanno, Regnaut, & Goudet, 2005). We compared the support for models with/without admixture, and with/without correlated allele frequencies (Falush, Stephens, & Pritchard, 2003) using Bayes T A B L E 1 Genetic variability for nuclear and mitochondrial sequence markers across regional populations Sample size refers to the number of sequenced gene copies obtained. All diversities were calculated in ARLEQUIN and are corrected for between-sample variation in sample size. Values for nuclear loci are means AE 1 SE across six loci. Full summaries of diversity by locus are given in Table S2.
F I G U R E 1 STRUCTURE analysis of Synergus umbraculus. (a) STRUCTURE results for K = 3. (b) STRUCTURE analysis for K = 7. In each of (a) and (b), individuals are allocated to the genotype cluster for which they have the highest STRUCTURE assignment probability (given for each individual in Table S4), while the relative numbers of individuals sampled at each site (n = 1-9, total n = 98) are shown by pie chart size. Numbers correspond to the site locations given in Table S1. Insets in each map show the average STRUCTURE genotype cluster allocation for each native range individual over 10 replicates as a thin vertical line, partitioned into coloured segments that represent estimated membership coefficients in each genetic cluster. Individuals are grouped according to geographical region as labelled below the figure (Tur = Turkey, L = Lebanon) STONE ET AL.
| 6689 factors, following Kass and Raftery (1995). Assignment probabilities (Q-values) of individuals to populations were averaged over replicate analyses using CLUMPP (Jakobsson & Rosenberg, 2007) and represented graphically using DISTRUCT (Rosenberg, 2004). Genetic diversity measures per locus and pairwise F ST between the five putative refugia were calculated using ARLEQUIN 3.5.22 (Excoffier, Laval, & Schneider, 2005). Significance values for pairwise F ST values were estimated using the permutation procedure in ARLEQUIN, with 10,100 permutations. All molecular diversity measures were corrected for sample size variation in ARLEQUIN.
We assessed the phylogenetic relationships between STRUCTURE genotype pools using the hierarchical *BEAST model implemented in BEAST version 1.6.1 (Heled & Drummond, 2010). *BEAST uses Bayesian MCMC inference to co-estimate the species tree and its embedded gene trees simultaneously, under a multispecies coalescent model that incorporates incomplete lineage sorting (Heled & Drummond, 2010). The *BEAST model does not incorporate admixture, which was supported for our data in STRUCTURE analyses. Because most individuals were assigned to one genotype pool with high probability in the value of K chosen for the *BEAST analysis, we nevertheless consider this approach likely tp provides a useful indication of population relationships. In selecting appropriate models of sequence evolution, we initially chose the most complex models possible given the substitution types present in each gene or partition. The cytb data were partitioned by codon position, while low levels of sequence variation made this unnecessary for the nuclear loci. The initial most complex models were GTR + I + G for opsin, RpS8 and cytb third positions, and HKY + I + G for all other genes and partitions. Support for these and less parameter-rich models was assessed using Bayes factors, estimated as twice the difference in the natural logarithm of the harmonic mean of model likelihoods of each model (2ΔlnHML) (Kass & Raftery, 1995). As the harmonic mean estimator is unstable and can vary considerably among MCMC samples (Ronquist & Deans, 2010), we used a more conservative cut-off than Kass and Raftery (1995), taking >20 to indicate strong support for the model with higher likelihood (as in . The resulting best overall substitution models describing the data were GTR + I for opsin and RpS8, GTR + G for cyt b third positions, HKY + I for cyt b first and second positions and RpS4, HKY + G for Ran, and HKY for ITS2 and RpS23. We then used these models to assess Bayes' factor support for alternative clock models (strict or relaxed, Drummond, Ho, Phillips, & Rambaut, 2006) for each locus, resulting in use of an uncorrelated lognormal relaxed clock for cyt b and a strict clock for all nuclear loci. We then used the same approach to assess support for alternative species tree priors (Yule vs. birth/death) and population size models (piecewise linear, piecewise linear with constant root, or piecewise constant). Bayes factor comparison supported a Yule tree prior and piecewise linear with constant root population size model. This model allows linear population size changes over a branch with the constraint (not imposed in our ABC analyses below) that the summed population size of two diverging lineages at their divergence time equals the population size of the ancestral lineage (Heled & Drummond, 2010). For this final model, we conducted two independent runs of 2 9 10 8 generations, sampled every 2.5 9 10 4 generations, with parameter convergence and effective sample sizes confirmed using TRACER version 1.5 (Rambaut & Drummond, 2009). The final 5 9 10 7 generations from each run were used to assess tree topology and node support. To allow comparison with previous work on the gall wasp system and the ABC analyses below, we apply the calibration rate of 1.15% per lineage per million years for cyt b (1.15 9 10 À8 ; Brower, 1994) with two generations per year.

| ABC inference of population history
We inferred relationships between sets of individuals using approximate Bayesian computation (ABC) incorporated in the software DIYABC (version 1.0.4.46, Cornuet et al., 2008Cornuet et al., , 2010 Figure 2. This approach allows inference of the relationships between population units that may be shared across geographic regions, and whose divergence could predate the current geographic distribution of populations. Individuals were assigned to the genotype pool with the highest allocation probability (Q value). For the relevant analysis, all but two of 98 individuals were assigned to a single population with Q ≥ 0.9, and all but seven with a Q ≥ 0.95 (Table S4). t2 F I G U R E 2 Seven DIYABC scenario topologies compared for the three populations identified by the STRUCTURE K = 3 analysis. Branch colours indicate discrete population size parameters in the model, and t1, t2 and tanc represent splitting times or times of population size change. In topologies (e)-(g), one population is founded by admixture in proportions r and 1 À r from the other two, and the model incorporates a population size change after founding of the admixed population Second, to understand the history of relationships between putative refugia, we reanalysed the data, allocating individuals to four geographic regional populations-Iberia, Italy, the Balkans and the East (Turkey, Lebanon and Iran combined). Italy was considered separately in this analysis given its demonstrated role as a Pleistocene glacial refuge for the oak hosts of the galls attacked by S. umbraculus (Petit et al., 2003). Turkey, Lebanon and Iran were combined into a single eastern population given the results of our STRUCTURE analysis (following Lombaert et al., 2014), and to make the number of alternative possible population topologies manageable given our hierarchical tournament approach. We return to this decision in the In each scenario, every population was allowed a separate population size. Changes in population size were assumed to be instantaneous and to occur during population splitting events.
Because it is computationally unwieldy to compare all 148 scenarios in a single set, we used a hierarchical two-stage "tournament" approach to simplify scenario comparison (Barr es et al., 2012;Kerdelhu e et al., 2014). We first identified the best-supported scenario in each of the eight model pools, and then compared pool-winning scenarios in a further analysis. This approach assumes that ranking of scenarios in ABC is transitive, that is if we can show that scenario A is better than B, and also that scenario B is better than C, then we can infer that A is also better than C without having to compare these two directly. Because all scenarios were compared relative to a single set of observed values in the same summary statistic space, this assumption is sound. This hierarchical approach and our pooling strategy allows full exploration of model space for a given topology of scenario within each pool, and also allows topologically similar models in different pools to achieve highest posterior probability against alternatives without competing directly, a potential problem where many similar models are being compared (see Pelletier & Carstens, 2014). It also ensures that the best scenario overall will always be compared with the second best overall-either as two scenarios in the same pool or as the winners of two different pools.
Genetic data for each scenario were simulated in DIYABC for 2 million generations using sample sizes and sequence lengths identical to those sampled (Table 1) Brower, 1994), nuclear locus mutation rates (excluding gap characters) ranged from 3.1 9 10 À9 (ITS2) to 7.9 9 10 À9 (RPS8). To incorporate uncertainty in absolute and relative mutation rates, the six nuclear loci were assumed to show gamma distributed variation in mutation rates (shape parameter = 2) with a mean drawn in the range 10 À10 to 10 À8 . Because of uncertainty in assumed mutation rates (Ho et al. 2011), we emphasize the relative (rather than absolute) age of events. The 1:3 ratio of effective population size in mitochondrial and autosomal loci in haplodiploid organisms was incorporated in DIYABC by coding the nuclear loci as X-linked. In the absence of empirical information, the sex ratio was assumed to be equal. Our analyses assumed an HKY + I + G model of sequence evolution for both nuclear loci and cytb, and we estimated the shape parameter of the gamma distribu- The posterior probability of each scenario in a given pool was estimated using weighted polychotomous logistic regression on the 1% of simulated data sets closest to the observed data (Cornuet et al., 2008(Cornuet et al., , 2010. These posterior probabilities are equivalent to approximate marginal likelihoods (Boehm et al., 2013), allowing support for alternative models in any given set to be compared using Bayes factors (Kass & Raftery, 1995). Support for the strongest scenario was categorized as weak, substantial or strong for Bayes factors in the ranges 1-3.1, 3.2-10 and 10-100 following Kass and Raftery (1995 Posterior parameter distributions for each pool-winning scenario were estimated using local linear regression on the closest 1% of accepted data sets, and suitability of our prior limits was checked by confirmation that posterior parameter distributions for each scenario produced a unimodal peak within selected prior bounds. Parameter estimates are presented as median values (with 95% confidence intervals in brackets). We assessed the goodness of fit of a scenario following Cornuet et al. (2010) by locating observed summary statistic values within distributions generated using 10,000 simulations for that scenario with parameter values drawn from the posterior distribution. For analyses of the population groupings identified using STRUCTURE, we used the second set of summary statistics for this step, while for the geographic analyses, we used the same summary statistics used in generation of the reference table, for reasons described above. For the winning scenarios, we estimated precision in parameter estimation (Cornuet et al., 2010) by computing the relative median of the absolute error for 500 pseudo-observed data sets using values drawn from priors. Relative median of the absolute error is the 50% quantile (over pseudo-observed data sets) of the absolute value of the difference between the median value of the posterior distribution sample (in each data set) and the true value, divided by the true value (Cornuet et al., 2010).

| Patterns of genetic diversity in S. umbraculus
All six nuclear markers showed informative sequence variation, with allelic richness ranging from seven to 23 in 121-150 sampled gene copies (Table S3), and with nucleotide diversities around 15% of that seen in mitochondrial cytb (Table 1). All seven loci showed strong longitudinal patterning, with alleles that were specific to one of Iberia, Italy, the Balkans, Turkey or Iran (Table S3). Pairwise F ST values between putative refugial regions were significant for all pairs at p < .002 (Table 2), with lowest divergence between Turkey and Iran (F ST = 0.093), and greatest divergence between Iberia and Turkey (F ST = 0.401). Nuclear and mitochondrial nucleotide diversities and nuclear allelic diversity were all highest in the Balkans (Table 1) Comparison of analyses for K values from 3 to 7 (Fig. S1) shows that all share (i) east-west divergence, but also genotype pools that span two or more putative refugia, and (ii) presence of multiple genotype pools in central Europe (Italy, the Balkans) that are shared with regions to the west or east. The greatest diversity of genotype clusters for any single region was found in the Balkans, with three of three genotype clusters represented in the K = 3 analysis and four of seven genotype clusters for the K = 7 analysis ( Figure 1).

| *BEAST inference of phylogenetic relationships between the K = 7 STRUCTURE populations
Phylogenetic analysis of the K = 7 STRUCTURE populations ( Figure 5) shows a deep divide between primarily eastern (clusters 5,6,7) and western (clusters 1,2) lineage groups. The cluster restricted to Italy wholly or predominantly in the West (clusters 1, 2) or the East (clusters 6, 7). Assuming a mitochondrial mutation rate of 1.15% per lineage per million years (Brower, 1994), the median divergence time (and 95% confidence limits) between the eastern and western lineages is 750 (310-1,775) thousand years (ky) ago.

| ABC inference of relationships between STRUCTURE populations
Comparison of the seven scenarios for the K = 3 STRUCTURE populations yielded very similar results for each of the two summary statistic approaches used (posterior probabilities and error rates of all seven scenarios for each approach are given in Table S5a). In each case, the most strongly supported scenario was a three-way poly-  (Table S5a).
Posterior predictive simulations for the polytomy scenario provided a good fit to observed summary statistics, regardless of the set of summary statistics used in scenario choice and assessment of model fit (Table S5b; Fig. S2).   Table S5d). The winning scenario had substantial or strong Bayes factor support and nonoverlapping 95% posterior probability limits over the second-ranked model in seven of eight pools (Table 3). The exception was pool G, in which Bayes factor support for winning scenario 119 over the closest rival scenario 110 was weak (Bayes F I G U R E 4 Results of STRUCTURE analyses for the native range samples of Synergus umbraculus. (a) Mean estimated Ln probability of the data for K values of 1-10, calculated over 10 replicates using CLUMPP (Jakobsson & Rosenberg, 2007). The most strongly supported K value following Pritchard et al. (2000) is 7. (b) A plot of DK as a function of K, following Evanno et al. (2005), showing strongest support for K = 3. Both plots were generated using STRUCTURE HARVESTER (Earl and von Holt 2012)

IB IT BA TU IR
F I G U R E 5 Relationships inferred between the genotype clusters identified in the K = 7 STRUCTURE analysis using *BEAST. Colours associated with each cluster match those in Figure 1b. The values above branches are node posterior probabilities, and numbers below are median node ages in millions of years (with 95% highest posterior bounds) assuming a calibration rate of 1.15% per lineage per million years for cyt b (Brower, 1994). Filled squares at right show the geographic distributions of each genotype cluster factor 2.5). We therefore carried out parallel ABC analyses across pool winners using either scenario 119 or 110 as the winner of pool G. This had no impact on the overall winning scenarios in the second round of ABC analysis. Analysis of pseudo-observed data sets (PODs) showed false positive rates (Type 2 errors) for the pool-winning scenarios to be ≤10% in >91% of cases (Table S5d;    Posterior probabilities are based on 2 million simulations per scenario, using weighted polychotomous logistic regression on the 1% of simulated data sets closest to the observed data (Cornuet et al., 2008(Cornuet et al., , 2010. | 6695 (0.0002-0.007) and 0.012 (0.001-0.0231), respectively. The strongest support in Pool A was for scenario 10 ( Figure 6a, Table 3; PP = 0.75), in which the oldest divergence is between Iberia and the other three regions, followed by divergence of the East, and recent divergence between Italy and the Balkans. False positive rates for scenario 10 for pseudo-observed data sets (PODs) simulated under either an "Out of the East" or "Out of the West" scenario were low (4% and 6.5% respectively), and we thus reject both scenarios for S. umbraculus.

| Identification of the best scenarios overall
Among the scenarios involving only the four sampled populations  (Table 4).
The third supported scenario (139, Figure 6h), with a posterior probability of 0.20 (0.12-0.27), also involves initial divergence of Iberia from other sampled populations, but involves founding of the Balkans by admixture from Italy and an unsampled population.
Despite differences in topology and the presence and direction of gene flow between Italy and the Balkans, all of the pool-winning models show highly concordant median estimates for the timing of Iberian divergence from the other populations (190-374 ky, parameter tanc in Table 4), and the timing of gene flow or divergence between Italy and the Balkans (5-18 kya, parameter t1 in Table 4).  (Table S5e). Both scenarios underestimated pairwise F ST for nuclear markers between the Balkan and Eastern populations, and overestimated single-and two-population mitochondrial sequence diversity measures involving the Balkans (Table S5e). Precision in ABC estimation of parameters for each of the two best models, expressed as the relative medians of the absolute error (RMAE), was low enough to indicate significant signal for them in the data (Table S5f) and almost all that are associated with temperate habitats show strong longitudinal genetic structure associated with southern Pleistocene refugia (Connord et al., 2012;Hewitt, 2004;Schmitt, 2007;Taberlet et al., 1998). Synergus umbraculus is no exception, with STRUC-TURE analyses for K values from 3 to 7 all showing substantial longitudinal structure. STRUCTURE genotype pools in S. umbraculus are often shared between putative refugia (Figures 1 and 5), and Central European regional populations involve contributions from east and west.

|
These patterns are compatible with divergence of genotype pools before occupation of current geographic distributions, or divergence during occupation of refugia followed by dispersal between them, or both of these. The K = 3 ABC analysis showed a dominant signature of divergence between three genotype pools, with no evidence for founding of one by admixture from the other two. The *BEAST model used to infer relationships between the K = 7 genotype pools lacked admixture, and exploration of relationships between these populations (e.g., using ABC) may yet reveal it. However, the longitudinal distribution and divergence history of genotype pools for K values 3 and 7 make it highly likely that admixture has been important in the founding and development of regional populations.
The longitudinal patterns in S. umbraculus parallel those in other members of the oak gall wasp community in showing strong genetic differentiation between regions of southern Europe thought to have acted as glacial refugia for oaks (Petit et al., 2003) and associated organisms Hayward & Stone, 2006;Lohse et al., 2010Lohse et al., , 2012Nicholls et al., 2010aNicholls et al., , 2010bRokas et al., 2001Rokas et al., , 2003Stone et al., 2007Stone et al., , 2012. The Western Palaearctic oak gall wasp fauna has many regionally endemic lineages, which are thought to have diversified originally in Western Asia 5-7 million years ago , before spreading westwards into Europe. Most members of the oak gall community surveyed to date show east-west declines in genetic diversity that are concordant with such an "Out of the East" model Mutun, 2016a;Mutun & Atay, 2015;Nicholls et al., 2012;Rokas et al., 2003;Stone et al., 2012).
This pattern is also shared by many other taxa (e.g., Connord et al., 2012;Rossiter, Benda, Dietz, Zhang, & Jones, 2007;Wahlberg & Saccheri, 2007). A small number of species in this community show contrasting patterns. Several parasitoid species show highest genetic diversity in Iberia-a pattern compatible with an "Out of the West" model , while one gall wasp (Biorhiza pallida, which is a host gall for Synergus umbraculus) shows a deep split between Iberia and regions further east Rokas et al., 2001). Synergus umbraculus shows a similar pattern to B. pallida in this regard, and in contrast to other community members, intraspecific genetic diversity for S. umbraculus was highest in the Balkans. This parallels patterns seen in some other taxa (e.g., Michaux, Libois, Paradis, & Filippucci, 2004;Ursenbacher et al., 2008), and confirms previous analysis of mitochondrial haplotypes by Bihari et al. (2011).
Model-based phylogeographic analysis has only been applied to a subset of oak gall inhabitants-four parasitoid species (Lohse et al., 2010;Lohse, Sharanowski, et al., 2011) and one gall wasp . Methodological limitations associated with use of a full likelihood framework limited these analyses to consideration of only three putative refuge populations (Iberia, the Balkans, and Iran). The parasitoids showed patterns generally consistent with the "Out of the East" range expansion, over a range of timescales. In contrast, the gall wasp Biorhiza pallida showed divergence between Iberia and populations in the Balkans and Iran ca. 150 kya, with subsequent gene flow from Iran to Iberia ca. 40 kya without passage through the Balkans. This seems implausible until we consider that North Africa had more extensive oak forests over the timescale considered (Lamb, Eicher, & Switsur, 1989;M edail & Diadema, 2009), and so could have provided an alternative, non-European, route for eastwest dispersal .
Our analysis differs from these in including four rather then three putative refuge regions (with or without an additional unsampled fifth population), and as far as we know it is the first to do so in a fully explored, model-based framework.  (Comps et al., 2001;Hewitt, 1999Hewitt, , 2004Petit et al., 2003;Stone & Sunnucks, 1993;Stone et al., 2012). However, we did find support (albeit weaker) for a scenario (139) in which the Balkan population was founded by admixture from Italy and an unsampled population, and founding of the Balkan population by admixture was inferred in two other pool-winning scenarios (even though these received little support in the second round of ABC analysis) (Pools B and G; Figure 6b,g). The *BEAST analysis of the K = 7 STRUCTURE populations (Figures 1b and 5) suggests that each of Italy and the Balkans contain contributions from an older more local genotype pool (3 and 4, respectively), with contributions from additional, more recently diverging genotype pools to east and west. Admixture is the most likely explanation for three individuals identified in the K = 7 STRUCTURE analysis as having highest probabilities of membership in primarily Iberian (2) or Eastern (7) genotype pools (Figure 1b). It remains possible that comparison of more complex scenarios, not included in our model set, could identify significant signatures of admixture in both Italy and the Balkans.
One further aspect of our analysis requiring consideration is our decision to focus on four regions in our geographic analysis. Selection of four regions was motivated by our requirement for full exploration of model space, and guided by previous work on the phylogeography of oaks and associated insects. However, the K = 7 STRUCTURE analysis suggests that both Iberia and the East comprise more than one population unit. Previous work in Turkey and Iran on a range of taxa (Bilgin, 2011;Davis, 1971) including oak gall wasps (Mutun, 2016b;Rokas et al., 2003)

| Pros and cons of tournament ABC as an approach to phylogeographic model selection
Model-based inference of regional population history requires at least two major choices: which individuals to group into population units, and which population demographic scenarios to compare. We grouped individuals in two ways-into genetically coherent populations identified by the K = 3 STRUCTURE analysis, and into four geographic populations corresponding to putative glacial refugia through the Pleistocene. For each set of groupings, our demographic scenarios included all topologies for sampled populations and one unsampled population, with founding of (at most) one sampled population by admixture from two others. For the K = 3 STRUCTURE groupings, this resulted in seven scenarios, while for the four regions this resulted in 148 scenarios. As discussed above, scenario choice over the two sampling regimes produced plausible historical hypotheses.
While the 148 geographic population models are a large set, they are simple models, lacking more complex patterns of changes in population size or directional gene flow between established populations, as explored, for example by Fagundes et al. (2007) for emergence of human populations from Africa, and Pelletier and Carstens (2014) for a North American salamander. All models are wrong to some extent (Box, 1979;Hickerson et al., 2013;Nielsen & Beaumont, 2009;Thom e & Carstens, 2016), and the best model from such a set should probably be considered a starting point for exploration of more parameter-rich and realistic models. Given the complex refugial history of the Palaearctic through the Pleistocene, it is possible that the population history of S. umbraculus involves repeated cycles of isolation and gene flow between refugia-a possibility explored models by Jesus, Wilkins, Solferini, and Wakeley (2006). Nevertheless, concordance in key events across pool-winning scenarios, good fit of posterior predictive simulations to observed summary statistic values, high precision/low bias for important parameter estimates, and low type 2 error rates in model choice all suggest that the winning models capture important aspects of the true underlying population history.
Hierarchical approaches, adopted to varying degrees by previous studies (Barr es et al., 2012;Kerdelhu e et al., 2014;Wei et al., 2015), have some obvious drawbacks and advantages. One perceived drawback of including apparently implausible scenarios is that analysis is prolonged. It could be considered more efficient to exclude implausible population topologies or parameter combinations a priori (for a well-argued example, see Barr es et al., 2012), and to focus on more parameter-rich comparison of plausible models. However, apparently implausible scenarios will sometimes be true, and if they are not included, support for them can never be detected. The example A second drawback is that without a single reference table for all 148 scenarios we could not estimate posterior probability support and error rates over the full ABC experiment. We were also not able to explore the consequences of allocating scenarios to pools in different combinations on overall scenario choice (we know of no way to combine prior simulations for scenarios in DIYABC reference tables into new combinations). Because support for a scenario is conditional on the set of models being compared, it is possible that a model ranked second in one scenario pool could provide a better fit to observed data than the winning scenario in another pool. Were this true, the pool winners would not be the best set of scenarios given the data, and parameters estimated jointly over pool winners would not be the best estimates for the whole scenario set. However, these issues should not influence identification of the individual best scenario over the full set. This is because this model will always be compared directly with the second best model overall either within its own pool in the first round of the tournament, or as the winner of another pool in the second stage of the analysis. Given the high concordance across pool winners in major aspects of population history inferred for S. umbraculus, we consider our inferences robust. However, ranking of models over the full set remains a problematic issue in a hierarchical model selection strategy. Alternative ABC approaches (e.g., Jackson et al., 2017;Pelletier & Carstens, 2014) do allow direct comparison of similarly large numbers of scenarios, circumventing these drawbacks. Alternative approaches to ABC model selection, including random forest algorithms (Pudlo et al. 2016;Smith et al., 2017) and approximate likelihood methods (Jackson et al., 2017), can also compare large numbers of models and require significantly less computing effort to identify best-supported models across the full set than classic ABC approaches. Likelihood-based approaches avoid the simulation step and can provide unambiguous ranking of competing models (e.g., Lohse et al., 2010Lohse et al., , 2012Lohse and Frantz 2014) but are currently restricted in the complexity of scenarios they can compare.
Our tournament approach combines the ability to incorporate model complexity inherent in ABC with an analytical structure that reduces the number of models compared at each step, and avoids the crowding of model space that can occur when large numbers of similar models are compared directly (Pelletier & Carstens, 2014).
Tournament approaches may be useful where large numbers of populations and parameters increase the challenge of fully exploring model space, with the aim of minimizing preconception bias in scenario choice.

ACKNOWLEDG EMENTS
This work was funded by UK NERC grants NE/E014453/1 and NE/ J010499 to G Stone, and OTKA grant T049183 to Z. P enzes. We thank Dr. Peter Bihari for supplying DNA extractions for Synergus umbraculus samples using in his analysis and for providing the cytochrome b sequences for these individuals, and thank Dr. Zoltan Acs for additional samples. We thank Bryan Carstens and three anonymous referees for their constructive comments during the review process.