Hidden histories of gene flow in highland birds revealed with genomic markers

Genomic studies are revealing that divergence and speciation are marked by gene flow, but it is not clear whether gene flow has played a prominent role during the generation of biodiversity in species‐rich regions of the world where vicariance is assumed to be the principal mode by which new species form. We revisit a well‐studied organismal system in the Mexican Highlands, Aphelocoma jays, to test for gene flow among Mexican sierras. Prior results from mitochondrial DNA (mtDNA) largely conformed to the standard model of allopatric divergence, although there was also evidence for more obscure histories of gene flow in a small sample of nuclear markers. We tested for these ‘hidden histories’ using genomic markers known as ultraconserved elements (UCEs) in concert with phylogenies, clustering algorithms and newer introgression tests specifically designed to detect ancient gene flow (e.g. ABBA/BABA tests). Results based on 4303 UCE loci and 2500 informative SNPs are consistent with varying degrees of gene flow among highland areas. In some cases, gene flow has been extensive and recent (although perhaps not ongoing today), whereas in other cases there is only a trace signature of ancient gene flow among species that diverged as long as 5 million years ago. These results show how a species complex thought to be a model for vicariance can reveal a more reticulate history when a broader portion of the genome is queried. As more organisms are studied with genomic data, we predict that speciation‐with‐bouts‐of‐gene‐flow will turn out to be a common mode of speciation.


Introduction
Genomic studies of species formation are increasingly revealing that divergence and speciation are marked by gene flow (Rheindt & Edwards 2011;Fontaine et al. 2015;Mallet et al. 2015;Payseur & Rieseberg 2016). This new appreciation for the role of gene flow in diversification departs from a more standard view of lineages splitting through clean, bifurcating histories that sunder through vicariance, never to cross paths again. It is no coincidence that our appreciation for the role of gene flow during diversification has grown with our ability to collect more genomic data (Abbott et al. 2016). Highthroughput sequencing now allows for broad genomic investigation, even at the level of whole genomes, making it possible to reconstruct species histories in situations where gene histories disagree (Jarvis et al. 2014), and specifically in cases of reticulate evolution resulting from gene flow (Yu et al. 2014). Whole-genome studies of recent speciation between sister species show that gene flow can be ongoing through the process of species formation (Ellegren et al. 2011;Osborne et al. 2016), with fixed species differences sometimes occurring in only a fraction of the genome affecting aspects of the phenotype (Poelstra et al. 2014).
While these recent studies have heightened our awareness of the role of gene flow in speciation, less clear is whether gene flow plays a prominent role in the generation of biodiversity in species-rich regions of the world where allopatry and vicariance are assumed to be the principal modes by which new species are formed (Haffer 1969). In the Mexican Highlands, for example, diversity and endemism have often been attributed solely to vicariance caused by dynamic geological activity, mountain formation and habitat fragmentation (Halffter 1987;Sullivan et al. 2000;Castoe et al. 2009;Bryson et al. 2011;McCormack et al. 2011). Alternative models, however, posit a more active role for recent and ancient gene flow in species evolution (Velo-Ant on et al. 2013;Mastretta-Yanes et al. 2015). In considering alternative models, we draw a subtle, but important contrast between the classic divergence-withgene-flow model of speciation (Nosil 2008;Pinho & Hey 2010), which implies continuous gene flow and focuses on the role of local adaptation in divergence, and what we call divergence-with-bouts-of-gene-flow, in which initial vicariance is crucial, gene flow is inconsistent through time (possibly involving nonsister taxa), and the result is a messy divergence process in which selection may or may not play a prominent role.
We revisit a well-studied system, jays in the genus Aphelocoma, to test for patterns of recent and ancient gene flow among lineages in the Mexican Highlands. We focus on four lineages in different mountain ranges (Fig. 1). One of these lineages is considered a separate species, the Transvolcanic Jay (Aphelocoma ultramarina), and is sister to the other three, which together comprise allopatric lineages of the Mexican Jay (Aphelocoma wollweberi). These lineages diverged between 1 and 5 million years ago (McCormack et al. 2011). Prior conclusions from studies relying mostly on mitochondrial DNA (mtDNA) conformed to the standard model of allopatric divergence and speciation, and showed little evidence for reticulate processes or gene flow among hundreds of individuals sampled throughout Mexico (McCormack et al. 2008b(McCormack et al. , 2011. Several lines of evidence suggest, however, that despite the bifurcating and monophyletic patterns observed in mtDNA, there is a more obscure history of gene flow among the four lineages. First, despite no mtDNA haplotype sharing, results from 14 microsatellites (McCormack et al. 2008b) indicated a broad cline of nuclear gene flow between two of the A. wollweberi lineages in the Sierra Madre Oriental (East and Central; Fig. 1). Second, a prior phenotypic study of thousands of museum specimens suggested contact zones within A. wollweberi and between A. wollweberi and A. ultramarina (Pitelka 1951;Peterson 1991 perfectly to different genetic clusters, detailed inspection on a locus-by-locus basis revealed a distinctive macromutation in one microsatellite locus whose geographic distribution suggested ancient gene flow between the two species (McCormack & Venkatraman 2013).
Our goal was to test for patterns of recent and ancient gene flow among the four lineages of Aphelocoma jays using a broad genomic subsample. We collected data from thousands of nuclear genomic regions using sequence capture of ultraconserved elements (UCEs). We were also able to assemble nearly complete mtDNA genomes from off-target reads. We compared the topology of mtDNA and nuclear trees to assess whether there was well-supported conflict between them, which often reflects gene flow (Carling & Brumfield 2008;Toews & Brelsford 2012;Peters et al. 2014) or gene flow coupled with selection (Ribeiro et al. 2011;Pavlova et al. 2013). After finding disagreement between the mtDNA and nuclear trees, we used single nucleotide polymorphisms (SNPs) to conduct population assignment tests, visualize discordant histories on a species tree, and test geographic hypothesis for gene flow with the ABBA/ BABA method (Green et al. 2010;Durand et al. 2011).

Sampling and DNA extraction
We sampled 20 Aphelocoma wollweberi and four Aphelocoma ultramarina, using blood or tissue from 14 localities chosen to best represent the breadth of their distribution in the Mexican Highlands ( Fig. 1 and Table 1). We complemented these samples with two samples from the closely related Western Scrub-Jay (Aphelocoma californica), which we used as outgroups for both the phylogenies and the ABBA/BABA tests. We extracted genomic DNA from tissue or blood using a Qiagen DNeasy Blood and Tissue extraction kit. We visualized extractions on an agarose gel to ensure fragments were >200 bp, and we quantified the resulting doublestranded DNA using a Qubit fluorometer (Life Technologies, Inc.).

Sequence capture and next-generation sequencing
To collect a representative subsample of the genome, we used sequence capture of ultraconserved elements (Bejerano et al. 2004). This method uses 180-bp synthetic baits to capture DNA data from thousands of orthologous loci identified from alignments of the chicken and Anolis genomes (Faircloth et al. 2012). Following capture, sequencing and assembly, variable base positions are mostly found in the regions flanking the UCEs. This method has proven useful for solving phylogenetic questions at deep timescales , as well as more recent questions at phylogeographic scales (Smith et al. 2014;McCormack et al. 2015;Manthey et al. 2016).
We followed the protocol for library preparation and UCE enrichment from Faircloth et al. (2012) which is available at ultraconserved.org. Briefly, we sheared 100 lL of DNA at a 20 ng/lL concentration to a size distribution with a peak between 400 and 600 bp using a Bioruptor ultrasonicator (Diagenode). We prepared genomic libraries for each sample for the Illumina platform with a KAPA LTP library preparation kit, attaching custom indexing tags (Faircloth & Glenn 2012) to DNA from each sample to allow pooling during the capture phase. We enriched three pools of eight samples each for 5060 UCE loci using a set of synthetic RNA probes (MYbaits_Tetrapods-UCE-5K kit; Mycroarray). After enrichment and recovery PCR, we verified that library size range was between 400 and 600 bp using a Bioanalyzer (Agilent Technologies). We quantified the enriched pools using qPCR and combined them in equimolar ratios prior to 250-bp paired-end sequencing on one run of an Illumina MiSeq.

Bioinformatic processing, assembly and alignment of UCEs
We followed the PHYLUCE (Faircloth 2015) pipeline to perform quality control on the demultiplexed reads, as well as to assemble UCE loci from the reads, and to align UCE loci for phylogenetic analysis. As part of this pipeline, we trimmed reads of adapter contamination and low-quality bases with Illumiprocessor (Faircloth 2013) and Trimmomatic (Bolger et al. 2014). We assembled reads into contigs using the PHYLUCE script assem-blo_abyss, in concert with ABySS (Simpson et al. 2009), using kmer=51. We identified assembled contigs matching UCE loci by aligning each contig to the original UCE probe sequences with the PHYLUCE script match_-contigs_to_probes, which uses LASTZ (Harris 2007). Finally, we aligned UCE loci using MAFFT v7.13 (Katoh & Standley 2013) by running the PHYLUCE script seq-cap_align_2, using the default trimming algorithm.

Mitochondrial genome assembly
Mitochondrial DNA reads are abundant in off-target sequences resulting from sequence capture experiments (Picardi & Pesole 2012;do Amaral et al. 2015). We identified and assembled mtDNA genomes from off-target, trimmed Illumina reads. The complete mtDNA genome of the Rook (Corvus frugilegus) served as a reference (GenBank accession NC_002069) because this was the  (Chevreux et al. 1999) that takes a baiting and iterative mapping approach for assembly. We annotated the assembled mtDNA regions de novo with the MITOchondrial genome annotation Server (MITOS) (Bernt et al. 2013). To facilitate alignment, we selected and aligned only the 13 protein-coding mtDNA regions with MAFFT and built a partitioned, concatenated matrix.

Phylogenies from concatenated data
We created a concatenated data matrix of UCE loci including outgroups. We allowed some missing data, dropping a UCE locus only if it was absent from more than 25% of the samples (i.e. more than six individuals). We analysed this unpartitioned, concatenated data matrix with RAXML v8.0.19 (Stamatakis 2014) using the GTRGAMMA model of evolution by searching for the best-scoring maximum-likelihood (ML) tree, performing 1000 bootstrap searches and reconciling the best ML tree with the bootstrap replicates. We followed the same ML tree search and bootstrapping strategy to analyse a concatenated matrix composed of protein-coding mtDNA loci partitioned according to locus.

Calling and analysing SNPs
In addition to tree-based approaches, we also wanted to employ methods capable of identifying individuals and populations with evidence for introgression, which is facilitated by SNP data. We called SNPs using the assembled UCE contigs of the ingroup individual with the highest number of recovered UCEs as a reference. We indexed and mapped the trimmed Illumina reads to the reference assembly with the BWA-MEM algorithm, which is suitable for reads ranging from 70 bp to 1 Mbp (Li 2013 (Bouckaert 2010), which is expected to show fuzziness in parts of the tree that conflict due to gene flow or other causes. We also analysed the SNP data in STRUCTURE v2.3.4 (Pritchard et al. 2000) to assign individuals to a predetermined number of genetic groups using their multilocus genetic profiles. We ran simulations under the admixture model to detect signals of gene flow among lineages. We assumed an admixture model and correlated allele frequencies. We ran the data assuming different numbers of genetic clusters (K = 2 to K = 8) for one million generations and five replicates. These preliminary analyses suggested that K < 6 was much more likely than K > 6 scenarios. This led us to increase our computing effort, focusing on K = 2 to K = 6, increasing the number of generations to 10 million with 10 replicates for each value of K. Outgroup (O4) for all tests was a Western Scrub-Jay (Aphelocoma californica); hypotheses correspond to geneflow scenarios on the map in Fig. 1; nABBA, number of ABBA patterns; nBABA, number of BABA patterns.

Introgression tests
To detect potential past hybridization, we used the ABBA/BABA test (Green et al. 2010;Durand et al. 2011). In a phylogeny with topology (((P1,P2))P3)O4), where an allele B arises via mutation from allele A in the common ancestor of P1, P2 and P3 (where O4 is the outgroup), it is expectedin the absence of other evolutionary forcesthat stochastic lineage sorting will produce equal numbers of (( (A,B))B)A) and (((B,A))B)A) patterns in the descendent lineages. An excess of ABBA patterns is expected when there has been gene flow between nonsister lineages P2 and P3, provided that P1 and P3 are not also exchanging genes. Such asymmetries in the frequency of shared, derived variants are detectable using Patterson's D-statistic (Green et al. 2010;Durand et al. 2011). The ABBA/BABA test has been applied to SNP data derived from whole genomes (Green et al. 2010), RADseq protocols (Eaton & Ree 2013;Rheindt et al. 2014;Streicher et al. 2014) and exon capture data (Heliconius Genome Consortium 2012), but thus far not to SNPs mined from UCEs. We performed 12 comparisons to test different hypotheses for gene flow (letters in Fig. 1). For each comparison, we assumed a tree topology (((P1,P2))P3) O4), where P represents ingroup taxa and O the outgroup, in this case a Western Scrub-Jay (FMNH 334087). The ABBA/BABA method tests for evidence of gene flow between P3 and either P1 or P2 (traditionally P2 is designated as the lineage hypothesized to be exchanging genes with P3). We used the mtDNA phylogeny as a guide to identify P1, P2 and P3, making an assumption that this topology reflects the original history of divergence per McCormack et al. (2011), with later gene flow resulting from secondary contact. Thus, we made sure P1 and P2 were more closely related to one another than either was to P3. We chose P2 to be the lineage hypothesized to have gene flow with P3, which, if true, would result in more ABBA patterns relative to BABA patterns as indicated by a positive Patterson's D-statistic (Green et al. 2010;Durand et al. 2011). We tested each of the six hypotheses depicted in Fig. 1 using two different sets of individuals to ensure that results were not driven by outlier data at the individual level. We tested two null hypotheses where P1, P2 and P3 were all drawn from the same lineage, expecting no differences in ABBA vs. BABA patterns.
The D-statistic was originally applied to continuous genomewide data (Green et al. 2010). To compute the standard error, Green et al. (2010) took a block jackknife approach where the resampling block size was larger than the extent of linkage disequilibrium. Our data set comprises UCEs located on average 188 150 AE 12 485 bp apart in the chicken genome (Faircloth et al. 2012).
Given this distance, it is possible that jackknifing among UCE loci will overcome the extent of linkage disequilibrium. Because we mapped against a reduced reference genome that we created by concatenating UCE loci, we used a block size of 500 bp (a number near the average size of UCE loci, see Results) to resample among loci. Additionally, to test the effect of block size on the outcome of each test, we resampled with 200 and 1000 bp block sizes.
To isolate SNPs for use in the ABBA/BABA tests, we mapped the trimmed Illumina reads of each sample to the outgroup with BWA. After removing PCR duplicates from the mapped BAM files with Picard, we followed the GATK pipeline as above and added a base recalibration step. Given that there is not a SNP database available for these species, we followed the base recalibration procedure suggested by the GATK best practices for nonmodel organisms. This consists of performing an initial round of SNP calling on the original uncalibrated data, selecting the SNPs with the highest confidence (a minimum emission and call quality of 40) and using these as the database of known SNPs. We executed four rounds of base recalibration on the original data. We used the recalibrated, mapped read files as input to the program ANGSD v0.70 (Korneliussen et al. 2014), which makes use of genotype likelihood estimates rather than SNP calls, counts ABBA/BABA variants from the mapped data, and estimates the Dstatistic. For each individual, we only considered bases with a minimum map score and quality of 30. We executed the jackknife procedures with the jackknife.R script included in ANGSD to obtain standard deviations and Z-scores for the D-statistic. We then converted Z-scores to their equivalent P-values.

UCE and mtDNA trees from concatenated data
On average, we obtained around 1.5 million reads per sample (range: 834 774 to 2 200 836 reads) after initial quality control and adapter trimming (Table 1). We assembled an average of 336 000 contigs per sample with an N50 between 586 and 764 bp. We recovered an average of 4303 UCE contigs per sample (Table 1). A total of 4188 contigs were present in at least 20 ingroup and outgroup individuals, forming an aligned matrix of 2 205 383 bp.
We assembled the mtDNA genomes of 20 of the individuals from between 2625 and 44 213 reads at an average read depth of 1529. For the four samples that were derived from blood (02N9800 and 02N9802 from site 3, and 05J0230 and 04J0128 from sites 8 and 9), we obtained a low number of mtDNA reads per sample (range: 160-341), resulting in low read depth and many ambiguous base calls. Despite the low number of mtDNA reads, coverage of nuclear UCE loci enriched from these blood samples was high and on a par with tissue-derived samples. Thus, we removed these four individuals from the mtDNA analysis, but retained them in the UCE analysis. The final, aligned mtDNA matrix contained 11 352 bp comprising 13 concatenated, protein-coding regions.
The ML phylogeny of the mtDNA genomes (Fig. 2B) showed the same topology as that described in previous studies using one or two mtDNA regions (McCormack et al. 2008b(McCormack et al. , 2011. There was a deep split between Transvolcanic Jays and Mexican Jays. Within Transvolcanic Jays, the subspecies in the east (Aphelocoma ultramarina ultramarina) and west (A. ultramarina colimae) were reciprocally monophyletic. Within Mexican Jays, the East, Central and West groups were each monophyletic, with the East group sister to the Central + West groups. Within the West group, samples from the north (A. w. arizonae) and south (A. w. wollweberi) in the Sierra Madre Occidental showed reciprocal monophyly. As in previous studies, two divergent mtDNA clades were present even within A. w. arizonae (McCormack et al. 2008a).
The ML phylogeny from concatenated UCE data ( Fig. 2A) differed from the mtDNA phylogeny by showing less evidence for monophyly of the previously described clades. Where both phylogenies agreed was in the deep split between Mexican and Transvolcanic Jays, the reciprocal monophyly of the two groups within Transvolcanic Jays, and the monophyly of the East lineage of Mexican Jays. However, the UCE tree did not place the East group of Mexican Jays sister to all other Mexican Jay groups, but rather recovered the East group as sister only to the Central group, which itself was not monophyletic. The West group was also not monophyletic, contrasting with the mtDNA tree. Rather, some West individuals appeared more closely related to the Central group. The central Sierra Madre Occidental population within the West lineage (site 3 in Fig. 1) also showed conflicting phylogenetic affinities, with the UCE tree supporting a sister relationship with northern populations and the mtDNA tree supporting a sister relationship with southern populations. The long terminal tips in the UCE phylogeny are likely an artefact of singletons, possibly caused by sequencing error, that are interpreted by the phylogenetic program as derived variants for terminal taxa.

SNPs, STRUCTURE and SNAPP results
We used FMNH 393742, which produced 4418 UCE loci, as our reference for mapping and SNP calling. We mapped between 22 and 31% of the trimmed reads to The SNAPP species tree obtained from 2501 SNPs was similar to the UCE tree generated from concatenated data, but the cloudogram revealed more about the conflict within the genetic data for those individuals that differed in their placement between the mtDNA and UCE trees (Fig. 3A). For example, certain West individuals appear to have conflicting genetic signal across loci and affinities to both the West and Central groups in the SNAPP tree (black arrow in Fig. 3A). The distribution of posterior trees was approximately evenly split between these two evolutionary scenarios. In contrast, there was no observable conflict in the SNAPP tree for two other cases where the mtDNA and UCE trees differed. A sister relationship between the East and Central groups was strongly supported by the SNAPP tree. Likewise, individuals from site 3 within the West group were unanimously supported as being part of the northern lineage.
Results from STRUCTURE were largely consistent with the SNAPP species tree, but provided a more nuanced understanding of the genetic affinities of individuals (Fig. 3B). STRUCTURE runs of K = 4 have the highest likelihood and identified genetic clusters corresponding to the Transvolcanic Jay and three groups within Mexican Jays. These three groups did not conform perfectly to the three geographic groups West, East and Central. Instead, STRUCTURE detected two genetic groups within the West (north and south) and did not split the East and Central groups. West individuals (site 6) that appeared to share ancestry with the Central group in the UCE and SNAPP trees showed mixed assignment in the STRUCTURE plots. Within the West group, individuals from site 3 showed signs of mixed assignment between north and south lineages.

Introgression tests
Results of the ABBA/BABA tests (Table 2) also suggest introgression between lineages based on 630 SNPs with patterns appropriate for testing. The null test where all individuals were drawn from the same lineage produced results with negligible deviations from the expected 50/50 ratio of ABBA/BABA patterns and the lowest Z-scores of any tests. The closest test to a positive control (Hypothesis C) was between the East and  Bootstrap support is shown for nodes >75%. Tip labels are museum numbers followed by site numbers relating to map inset. The black arrows denote two individuals at site 6 with evolutionary affinities to both West and Central lineages. (B) STRUCTURE plot generated using 2501 informative SNPs gathered from UCE loci. Colours represent assignment to four genetic clusters: West (north), dark blue; West (south), light blue; Central + East, green; Transvolcanic, orange.
Central lineages, where several lines of evidence point to introgression. This comparison produced the highest Z-scores. We observed the next highest Z-scores between the West and Central lineages (Hypothesis B), where the STRUCTURE plots and SNAPP trees suggested introgression. In the two tests for introgression between the most divergent lineages, Mexican and Transvolcanic Jays (Hypotheses D and E), we detected slightly elevated ABBA patterns, but they were only significant in one of four tests. Finally, in the case of Hypothesis A, where gene flow has never been hypothesized, there was an excess of BABA patterns (P-value > 0.95), which not only suggests little gene flow between the tested lineages, but that there was relatively more gene flow between the nonfocal lineages.

Discussion
Our current understanding of species limits, taxonomy and the speciation process is based largely on estimates of genetic patterns and processes inferred from one or a few markers. This is changing. New studies that sequence more individuals, use more markers and include markers from a broader portion of the genome are revealing complex portraits of speciation that have previously been overlooked (Cui et al. 2013;Mallet et al. 2015;Li et al. 2016;Wen et al. 2016). Aphelocoma jays have been a case study for speciation (Pitelka 1951;Peterson 1992;McCormack et al. 2008b), niche evolution (Rice et al. 2003;McCormack et al. 2010) and the role of isolation and Earth history events like glacial cycles and mountain formation in the generation of biodiversity (McCormack et al. 2011). Most conclusions from these studies were based on mtDNA data, where monophyletic gene trees and neatly bifurcating phylogenies lent themselves to an interpretation of species formation focused exclusively on vicariance. Where nuclear data have supplemented mtDNA, hints of gene flow have emerged that were not apparent in mtDNA alone (McCormack et al. 2008b;McCormack & Venkatraman 2013). In this study, by sampling broadly from the genome, a more complex story of speciation in Aphelocoma jays emerges, one where recent (although perhaps not current) gene flow among mountain ranges has played an important role, and one where the footprint of more ancient gene flow is present, although scarcely detectable.
The biggest difference between the story recorded in the mtDNA genome and that recorded in the nuclear genome is among the relationships of the three major genetic lineages of the Mexican Jay, A. wollweberi, which we here call East, West and Central. The mtDNA phylogeny we built from all protein-coding genes agrees with prior work (McCormack et al. 2008b). This phylogeny shows East sister to, and quite divergent from, West+Central (Fig. 2B). The UCE data, meanwhile, strongly suggest introgression between the East and Central lineages along the Sierra Madre Oriental. East and Central are sister lineages in the concatenated tree (Fig. 2) and the species tree (Fig. 3A). Introgression tests between East and Central (Hypothesis C) produced the highest Z-scores of any of the tested geneflow scenarios (Table 2). In a sense, this is the closest scenario we have to a positive control for the introgression tests, as prior work using microsatellites (McCormack et al. 2008b) and phenotype (Pitelka 1951) also both indicate gene flow between East and Central.
Given the strong evidence for nuclear introgression between East and Central, it is striking that there is no evidence for mtDNA mixing. As noted before, East and Central are monophyletic in the mtDNA phylogeny and are not even sister lineages. In a prior study, there was no evidence for a single mtDNA haplotype shared among 48 Central and 270 East individuals sampled (McCormack et al. 2008b). What does this lack of mtDNA mixing say about whether gene flow between East and Central is ongoing today? For one, it suggests there is little long-distance female dispersal. Otherwise, one would expect to find the occasional mismatched haplotype of a female that dispersed from one geographic area to the other. Rather, our results suggest that if introgression occurs today, it must happen across a very narrow cline, similar to the two hybridizing Western Scrub-Jay lineages that meet in the mountains of western Nevada (Gowen et al. 2014). In this scenario, mtDNA introgression is restricted by the effects of Haldane's rule, in which hybrid females are more likely to be sterile and therefore mtDNA is less likely to cross lineage boundaries (Carling & Brumfield 2008;Gowen et al. 2014). Another possibility is that gene flow might have stopped sufficiently long ago that introgressed mtDNA haplotypes were lost from populations due to genetic drift, while introgressed nuclear alleles persisted due to their larger effective population size.
While gene flow between East and Central was expected based on prior research, our observation of relatively high admixture between West and Central was unexpected. Neither the prior microsatellite analysis nor phenotypic work suggested a high degree of mixing between the West and Central lineages (McCormack et al. 2008b). However, the species tree, assignment tests and introgression tests all agree that the two individuals from site 6 share ancestry between West and Central. These same individuals were included in the microsatellite study of McCormack et al. (2008b), where they grouped with other West individuals. The mtDNA haplotypes of individuals from site 6 were either shared with other West individuals or were most closely related to other West haplotypes. What these results show is that genomic data can help detect gene flow where even prior nuclear data did not. In this case, we increased the amount of nuclear data from 13 microsatellite loci to thousands of UCE loci and UCEderived SNPs. Retrospectively, we discovered that a connection between West and Central had been hypothesized over 100 years ago based on careful phenotypic study (Nelson 1899). There is a certain degree of irony in the idea that painstaking phenotypic studies carried out long before the discovery that DNA is the hereditary material might, in fact, provide a wealth of hypotheses regarding the subtle influence of introgression on the appearance of populations. That phenotype is, of course, likely to be under the control of numerous nuclear genetic markers whose introgression occurs in a similar manner to the markers we queried.
Turning from more recent divergence among lineages within the Mexican Jay to the oldest split between Mexican Jays and Transvolcanic Jays, our results show a weak, but consistent signal of introgression. According to the phylogenies we inferred from both nuclear and mtDNA, there is no evidence for gene flow. The two species are reciprocally monophyletic and separated by long branches. Yet, the two introgression tests between the West lineage of Mexican Jays and a nearby Transvolcanic Jay population (Hypothesis D) both showed elevated ABBA patterns (although not statistically significant). Between the Central lineage of Mexican Jays and a nearby Transvolcanic population (Hypothesis E), ABBA patterns were also elevated, and one of the two tests was significant. The significant test amounted to an excess of 100 ABBA SNPs. For comparison, in the tests between East and Central, where gene flow was apparent from multiple lines of evidence, ABBA SNPs were elevated by almost three times as much.
If the above can be taken as suggestive evidence for gene flow between Mexican Jays and Transvolcanic Jays, that gene flow appears to be ancient based on reciprocal monophyly in the phylogenies and clear differences in genetic clustering. Ancient gene flow was the same conclusion drawn by a study that examined a macromutation in a single microsatellite locus (McCormack & Venkatraman 2013). Mexican Jays usually have very short alleles at this microsatellite locus, whereas Transvolcanic Jays were nearly fixed for alleles almost four times as long. The only place where Mexican Jays had long alleles was near the range border with Transvolcanic Jays, and there was a decreasing clinal pattern of the long allele moving away from the range border. However, even Mexican Jays with two long 'Transvolcanic' copies of the allele did not differ from other Mexican Jays in appearance or in their mtDNA type. This scenario mirrors, in some respects, the situation with modern humans and Neanderthals, which show no trace of introgression in their phenotypes or mtDNA, only in a small percentage of their nuclear genomes (Green et al. 2010).
Ultraconserved elements have been used previously to infer evolutionary history at both deep and shallow timescales (Faircloth et al. 2012;Smith et al. 2014;Manthey et al. 2016). However, the basic idea of enriching reads from around highly conserved anchor regions often leads to the question of whether enough variability will be sequenced from the flanking DNA, especially when these data are used to investigate questions at recent timescales. This problem can be compounded when analyses are limited to those that use individual gene trees because some UCE loci lack informative variation. This can produce nonbifurcating gene trees and add noise that can mislead certain types of analysis (O'Neill et al. 2013;Xi et al. 2015;Manthey et al. 2016;Meiklejohn et al. 2016).
One way to alleviate this perceived 'gene-tree problem' is to expand the phylogenetic toolkit to include methods that use individual gene trees as well as those that use concatenated data sets. Using gene-tree methods alongside concatenation has, in fact, always been a part of the UCE analytical pipeline, despite recent misrepresentations (Gatesy & Springer 2014). Another way to alleviate the 'gene-tree problem' is to gather the informative SNPs from UCE loci and use these variant sites in appropriate analyses (McCormack et al. 2015;Manthey et al. 2016). In this study, we were able to gather 2501 informative SNPs from our UCE loci for a species-tree analysis and an average of 630 SNPs with appropriate patterns for ABBA/BABA introgression tests. While far below the number typically culled from whole genomes for such analyses, this number is on a par with the number of SNPs generated by other reduced-representation methods. For example, Manthey et al. (2016) used RADseq to generate 2960 SNPs for species-tree analysis, and Rheindt et al. (2014) used RADseq to generate 954 SNPs for species-tree analysis and 889 SNPs for ABBA/BABA tests.
This study shows how a species complex previously thought to be a model for isolation and vicariance can reveal a more reticulate history when a broader portion of the genome is queried. What remains to be seen is whether 'hidden histories' will reveal themselves in other vertebrate cases as well, especially species with even lower dispersal, like reptiles and amphibians, which show strong phylogenetic structure across the Mexican Highlands in mtDNA (Bryson et al. 2011(Bryson et al. , 2012Rovito et al. 2013). Our results reinforce the growing view that mtDNA gene trees, while useful in many ways, do not accurately represent the rich histories of species and populations, which often involve gene flow between sister and nonsister lineages. As more speciation scenarios are studied with genomic data, we expect that the only thing we will discover that is especially unique about the complex evolutionary story involving, for example, humans and Neanderthals is that it was the first to be studied carefully with genomes. Speciation-with-bouts-of-gene-flow might turn out to be the most common mode of speciation.