Assessing models of speciation under different biogeographic scenarios; an empirical study using multi‐locus and RNA‐seq analyses

Abstract Evolutionary biology often seeks to decipher the drivers of speciation, and much debate persists over the relative importance of isolation and gene flow in the formation of new species. Genetic studies of closely related species can assess if gene flow was present during speciation, because signatures of past introgression often persist in the genome. We test hypotheses on which mechanisms of speciation drove diversity among three distinct lineages of desert tortoise in the genus Gopherus. These lineages offer a powerful system to study speciation, because different biogeographic patterns (physical vs. ecological segregation) are observed at opposing ends of their distributions. We use 82 samples collected from 38 sites, representing the entire species' distribution and generate sequence data for mtDNA and four nuclear loci. A multilocus phylogenetic analysis in *BEAST estimates the species tree. RNA‐seq data yield 20,126 synonymous variants from 7665 contigs from two individuals of each of the three lineages. Analyses of these data using the demographic inference package ∂a∂i serve to test the null hypothesis of no gene flow during divergence. The best‐fit demographic model for the three taxa is concordant with the *BEAST species tree, and the ∂a∂i analysis does not indicate gene flow among any of the three lineages during their divergence. These analyses suggest that divergence among the lineages occurred in the absence of gene flow and in this scenario the genetic signature of ecological isolation (parapatric model) cannot be differentiated from geographic isolation (allopatric model).


Introduction
Geography, gene flow, and time strongly influence speciation, but the relative importance of these mechanisms can be difficult to quantify (Via 2009;Pinho and Hey 2010).
Competing species concepts often differ fundamentally in the contribution of gene flow to the process of speciation. Recently diverged taxa facilitate studying the influence of gene flow on speciation because the signature of past introgression may persist in the genome (Pinho and Hey 2010).
It is difficult to test empirically for signatures of past introgression in natural systems due to pervasive genetic drift, genetic draft and variation in coalescence times (Hudson and Turelli 2003). Differences among gene genealogies may arise from differences in male/female dispersal, assortative mating and differential selection (Coyne and Orr 2004). Such processes result in discordance among gene trees of recently diverged species (Pollard et al. 2006;Degnan and Rosenberg 2009;Zhang 2011) and it is difficult to discriminate patterns of lineage sorting from patterns of past introgression because they have similar genetic signatures (McCormack et al. 2009;Payseur 2010;Pinho and Hey 2010). The competing explanations of discordance between gene trees render the study of the contribution of gene flow to speciation in natural systems challenging, although the more loci examined throughout the genome, the more likely a clear phylogenetic signal can be distinguished (Leach e and Rannala 2011).
Recent advances in biotechnology and biostatistics enable investigations into speciation. New molecular technologies and multi-locus genomic methods can fuse evolutionary history within an ecological context (Brito and Edwards 2009). Genomic approaches also allow for simultaneous exploration of differences in introgression among different parts of the genome (Payseur et al. 2004;Geraldes et al. 2006;Teeter et al. 2008;Melo-Ferreira et al. 2009). This integration of population genetic and phylogenetic perspectives promotes the creation of meaningful species trees (Degnan and Rosenberg 2009).
Explorations into the relative importance of divergence and gene flow require identifiable patterns of speciation, such as cases in which two recently diverged populations come into secondary contact. Ecotones between two distinct habitats facilitate the testing of hypotheses on patterns of divergence. In this situation, hybridization may occur and it can be an important part of the evolutionary process and an essential component in species' ability to adapt to a changing environment (Barton and Hewitt 1989;Arnold 2007;Payseur 2010).
Desert tortoises (Gopherus sp.) lend themselves well to testing for the drivers of speciation and the roles played by ecology because they are recently diverged and wideranging in multiple biomes (Fig. 1). Phylogenetic reconstruction of mtDNA haplotypes suggest a trichotomy of similarly, divergent matrilines that strongly associate with geography (Edwards et al. 2012(Edwards et al. , 2015b. Previous estimates of mtDNA divergence time between lineages of desert tortoise have been fairly consistent at 5-6 Ma (Avise et al. 1992;Lamb and Lydeard 1994;McLuckie et al. 1999;Edwards 2003). Importantly, regions of overlap occur between the distributions of divergent lineages. At these sites, hybridization is ongoing and there may be signals of past introgression (McLuckie et al. 1999;Edwards et al. 2010). Ecotones define the distribution of divergent lineages and selection appears to maintain taxonomic boundaries where they come into contact (Edwards et al. 2015a(Edwards et al. , 2015b. Furthermore, the three lineages in this system allow obtainment of a consensus among multiple gene genealogies, as there is greater potential to converge on an incorrect species tree when four or more taxa exhibit discordance among gene trees (Degnan and Rosenberg 2009).
Gopherus agassizii (Agassiz's desert tortoise) and G. morafkai (Morafka's desert tortoise) are a classic example of allopatric speciation resulting from geographic isolation by the Colorado River (Lamb et al. 1989;Avise et al. 1992;McLuckie et al. 1999;Murphy et al. 2011). The former species occurs primarily west and north of the Colorado River in the Mojave Desert and G. morafkai ranges solely south and east of the river in the Sonoran Desert. A small population of G. agassizii occurs on the east side of the Colorado River in the territory of G. morafkai (McLuckie et al. 1999). Edwards et al. (2015) used microsatellite and mtDNA genotypic data and performed habitat suitability modeling to characterize this secondary contact zone in northwestern Arizona. The distribution of each species strongly correlated with topographic, climatic, and vegetative variables. A relatively small number of hybrid individuals, most of which were identified as F 2 or backcrossed individuals, lived in ecotonal areas only. A limited distance of penetration from either parental or hybrid genotype class occurred across the contact zone. Ecological niche partitioning apparently maintains the species via a geographical selection-gradient.
In the southern part of the range of G. morafkai, parapatry may explain the formation of genetically and geographically distinct "Sonoran" and "Sinaloan" lineages (Edwards et al. 2012). The Sonoran genotype has a large distribution throughout desertscrub in Sonora, Mexico, and Arizona, USA. In contrast, the southern, Sinaloan lineage occurs solely in tropical deciduous forest and thornscrub environments ( Fig. 1; Edwards et al. 2015b). The lineages occur sympatrically in a narrow ecotone between the two habitats and limited hybridization has been detected (Edwards et al. 2015b). No geographic barrier limits gene flow. Although this pattern implicates a parapatric model of speciation, these desert and tropical environments likely expanded and contracted many times during the Pleistocene (Van Devender 2000; Riddle and Hafner 2006); this dynamic system has undoubtedly influenced speciation of the biota. Edwards et al. (2015b) performed a clinal analysis of the zone of overlap between the Sonoran and Sinaloan lineages using microsatellite and mtDNA data. A bimodal distribution of genotypes with a strong coincidence of slope and concordance of center between clines supported the hypothesis of secondary contact (Endler 1977;Barton and Hewitt 1985). The current contact zone between the Sonoran and Sinaloan lineages appeared to result from secondary contact after periods of isolation in Pleistocene refugia. Hybrid zones between parapatric taxa typified repeated population contractions into refugia followed by expansions during climate oscillations (Hewitt 1996). Edwards et al. (2015b) suggested that the shifting ecotone between tropical deciduous forest and Sonoran desertscrub likely acted as an ephemeral boundary providing recurring opportunities for interbreeding, which may have reinforced niche segregation in each lineage of tortoise. They characterized the Sonoran and Sinaloan lineages of G. morafkai as having independent evolutionary trajectories despite incomplete reproductive isolation.
The underlying population structure of an organism is critical to making inferences about the rate and patterns of speciation. Within each of the G. agassizii (hereto referred to as the "Mojave" lineage) and the G. morafkai Sonoran and Sinaloan lineages, gene flow is geographically extensive and there is genetic structure with isolation by distance (IBD: Edwards et al. 2004;Murphy et al. 2007;Hagerty et al. 2011;Edwards et al. 2015aEdwards et al. , 2015b. All three lineages appear to have experienced population expansions since the Last Glacial Maximum (LGM) based on mtDNA analysis; star phylogenies are observed at the tips of each of the long, matrilineal branches (Edwards 2003;Edwards et al. 2015b). Populations within each lineage have low estimates of genetic differentiation: F ST = 0.06 for G. agassizii; F ST = 0.05 for G. morafkai Sonoran; and F ST = 0.09 for G. morafkai Sinaloan (Edwards and Harrison 2014;T. Edwards unpubl. data). This suggests that any sampling  Table 1. DNA samples obtained from locations marked with a circle; RNA samples obtained from sites marked with a triangle. Hybrid zones where lineages come into contact represented by circles with split colors. Habitat distribution estimated from (Brown and Lowe 1980) and by digitizing published maps in B urquez et al. (1999) and Felger et al. (2012). Desert tortoise range limit from Germano et al. (1994). location within a lineage should contain roughly 90-95% of the genetic diversity of that lineage. At contact zones between lineages, if no species boundary occurs, then gene introgression should follow an IBD model. In contrast, where taxonomic boundaries are maintained at contact zones, then the rate of introgression for alleles that are under strong selection should be near zero in one or both directions (Payseur 2010).
Herein, we test hypotheses regarding biogeography and its influence on drivers of speciation. These involve comparisons of the well-established allopatric model of speciation observed between the Sonoran/Mojave lineages with predictions of the parapatric model between Sonoran/Sinaloan lineages. Under a parapatric model of speciation we expect that a signature of past introgression may persist in the genome because divergence occurred with the potential for gene flow, whereas under an allopatric model there would be no opportunity for introgression during divergence. The crux of this study system is that within this trichotomy of recently diverged taxa, the  Mojave/Sonoran allopatric model provides a base line for comparison with the Sonoran/Sinaloan parapatric model. This helps reduce the background noise caused by incomplete lineage sorting. Because hybridization currently occurs among the lineages, it would be assumed that any past events might leave a genetic signature. For the analyses, we use mitochondrial DNA (mtDNA), nuclear loci (nDNA), and RNA-seq data.

Hypotheses
We test hypotheses that the three lineages of desert tortoises experienced different mechanisms of speciation. Hypothesis H-MS i (Mojave/Sonoran isolation) assumes that the Mojave and Sonoran lineages experienced divergence in isolation. Alternatively, H-MS gf (Mojave/Sonoran gene flow) involves divergence with gene flow. Hypothesis H-SS gf (Sonoran/Sinaloan gene flow) assumes that the Sonoran and Sinaloan lineages experienced parapatric speciation without isolation and in the presence of gene flow. Alternatively, H-SS gf would also apply if these lineages diverged with cycling periods of isolation in refugia followed by secondary contact and with repeated periods of introgression. Hypothesis H-SS i (Sonoran/Sinaloan isolation) assumes that these lineages diverged without introgression during a single event of isolation (physical or ecological) followed by secondary contact (see supporting information; Tables S1 and S2).

Ethics statement
The University of Arizona Institutional Care and Use Committee (IACUC) approved all handling protocols (IACUC Control no. 09-138).

Samples
Phylogenetic analyses employed 82 DNA samples collected from 38 sites previously used in other studies (Edwards et al. 2004(Edwards et al. , 2010(Edwards et al. , 2015a(Edwards et al. , 2015bMurphy et al. 2007). These represented samples from across the range of the desert tortoises (Table 1; Fig. 1). In addition, we obtained two samples of G. berlandieri from a private collection and two samples of G. flavomarginatus collected in Durango, Mexico (Morafka et al. 1994). The RNA-seq data gathering used nine individuals in three flowcell lanes on the Illumina HiSeq platform. We dedicated two lanes to high-coverage sequencing of three individuals, one for each of the following lineages: a captive individual of G. agassizii in Arizona that originated in California (Moj_A haplotype); a captive individual of Sonoran G. morafkai from Arizona; and a wild-caught Sinaloan G. morafkai obtained from just outside of Alamos, Sonora Mexico (Rancho Las Cabras; RLC). Raw data were used to assemble reference transcriptomes. The third flowcell lane was used to generate low-coverage RNA-seq reads from six samples that were then mapped to the reference assemblies to assess diversity within and among the three lineages. For these six samples, we hand-captured and collected samples from wild desert tortoises from the following three sites in 2013: two Sinaloan G. morafkai from the Reserva La Sierrita, Sierra de Alamos, Sonora, Mexico ( For RNA sample collection, we obtained <1 mL whole blood via brachial or subcarapacial venipuncture and mixed it with a greater than 50% volume RNA lysis/binding buffer from the Ambion RNAqueous kit (Life Technologies, Thermo Fisher Scientific Inc., Grand Island, NY). Samples were immediately put on ice and then transferred to liquid nitrogen for storage within 4 h of collection. All RNA samples were verified as being of the assumed lineage (and not hybrids) with subsequent DNA analyses using 25 microsatellite loci and mtDNA (Edwards and Berry 2013).

DNA sequencing
We sequenced a 1109 base pair (bp) portion of mitochondrial DNA (ND3, tRNA arg , ND4L, and part of ND4) following Edwards (2003) and Murphy et al. (2007). Some individuals sequenced for this locus had been used in previous studies (Murphy et al. 2007;Edwards et al. 2012Edwards et al. , 2015aEdwards et al. , 2015b, including the same fragment for G. flavomarginatus . We optimized PCR conditions for six nuclear loci: BDNF, R35, and four uncharacterized loci identified by Thomson et al. (2008) derived from BAC libraries (TB02, TB07, TB53, and TB95). For PCR amplification of these loci, we used primer pairs as described by Leach e and McGuire (2006) and Thomson et al. (2008), except for R35, where we used a GenBank sequence (accession number AY434646) to design primers with OLIGO PRIMER ANALYSIS 6.68 (Molecular Biology Insights, Inc., Colorado Springs, CO) as follows: R35EX1_GOPH CACATACTGAATTTCCAGG, and R35EX2_GOPH GGACCTTTAAGTCATACAC.
We assessed optimal PCR conditions for each primer pair under 72 possible conditions by varying temperature  (Table 2). PCR began with an initial 3 min denaturation at 94°C, followed by 35 cycles of 30 s at 94°C, 30 s at the locus-specific annealing temperature (Table 2), and 90 s at 72°C, followed by 3 min incubation at 72°C. We submitted PCR product to the University of Arizona Genetics Core for DNA sequencing in both forward and reverse directions and followed standard protocols for the 3730XL DNA Analyzer (Applied Biosystems, Foster City, CA). We used CLC DNA WORKBENCH 5.7.1 (CLC bio, Denmark) to visually align sequences and DnaSP 5.10.01 (Librado and Rozas 2009) to build fasta files and generate general descriptive statistics. We used PHASE (Stephens and Donnelly 2003) for haplotype reconstruction of diploid loci.

Phylogenetic analysis
We reconstructed unrooted haplotype networks of nuclear loci to visualize relationships among lineages using BEAST 2.1.2 . Substitution models were selected using MrModeltest 2.3 (Nylander 2004); all loci fit a HKY, gamma distribution with four discrete rate categories except TB07, which fit the GTR gamma distribution. BEAST analyses used a relaxed, log-normal clock and the tree was calibrated using a Yule model (Drummond et al. 2006). We ran the Markov chain Monte Carlo (MCMC) for 500,000,000 generations sampling every 5000, with a burnin of 10%. We viewed results in TRACER 1.6.0 (Rambaut et al. 2003(Rambaut et al. -2013 to ensure that the MCMC chains mixed well after the burnin and that ESS values were adequate (>100). We assessed patterns of haplotype diversity by grouping samples by species (G. flavomarginatus, G. berlandieri, G. agassizii, G. morafkai), by lineages within G. morafkai (Sonoran and Sinaloan) and by geographic regions where mtDNA differentiation had been previously observed (Murphy et al. 2007;Edwards et al. 2015aEdwards et al. , 2015b (Table S3).
For mitochondrial lineage reconstructions, we performed the analysis in BEAST as described above using G. flavomarginatus as the outgroup taxon to enable construction of a rooted tree. To establish estimates of time to most recent common ancestor (TMRCA) for the mtDNA locus only, we set the prior for our Bayesian analysis in BEAST for divergence time between G. agassizii and G. morafkai (Sonoran) lineages to 5.9 AE 0.5 Ma based on Edwards (2003). In addition, we used PAUP* 4.0b10 (Swofford 2002) to reconstruct maternal genealogies using both likelihood and parsimony optimality criterion searches to generate tree topologies. We compared these topologies with that derived from Bayesian analysis executed with BEAST. Analyses used unique haplotypes and all characters received equal weight. We performed a heuristic search with 100,000 random addition replicates. Support for inferred relationships was estimated using 10,000 nonparametric bootstrap replicates. We performed maximum likelihood analysis using the HKY model of nucleotide evolution.
We also performed maximum-likelihood estimates using branch models of CODEML in PAML 4 (Yang 2007) to determine the mean selection pressures on different branches of the mtDNA tree. This method compared the ratio d N /d S , termed x, where x < 1 indicated purifying selection, x = 1 indicated neutral selection, and x > 1 indicated adaptive selection. First, we calculated x under a one-ratio model in which the same ratio occurred across the tree. Next, we estimated an independent x value for each branch under the free-ratio model.
We used the *BEAST model (Heled and Drummond 2010) for species tree estimation in BEAST using mtDNA and four of the nuclear loci (TB02, TB07, R35 and BDNF). *BEAST analyses used multilocus data and the multispecies coalescent approach to infer species trees. We assigned individuals to putative species/lineages, which was difficult for individuals of G. morafkai that occurred along the thornscrub/desertscrub ecotone zone (Edwards et al. 2015b). We defined individuals with ques- tionable genotypes based on their mtDNA haplotype as either Sinaloan or Sonoran based on Edwards et al. (2015b). We used the HKY with four gamma distributed rate categories for all loci except TB07 which we applied the GTR with four gamma distributed rate categories. We used the Yule Process prior and did not set a coalescent prior. We set the population size function to linear with constant root (appropriate when the real population size dynamics tend to be continuous; Heled and Drummond 2008) and ran the MCMC for 500,000,000 generations with a 10% burnin for both strict and relaxed log normal clocks. We viewed results in Tracer; both runs achieved stationary MCMC distributions and effective sample size (ESS) values > 200. We used TreeAnnotator 1.7.5 (Rambaut and Drummond 2002Drummond -2012 to select the Maximum Clade Credibility tree that had the highest product of the posterior probabilities of all its nodes from the BEAST analysis, and FigTree 1.4.0 (Rambaut 2006(Rambaut -2012 to visualize the tree. We performed a qualitative analysis on consensus trees directly from the *BEAST trees file using DensiTree 2.2.1 (Bouckaert and Heled 2014).

Next-generation sequencing
We isolated total RNA from whole blood using standard protocols for the Qiagen RNeasy Kit (Qiagen, Valencia, CA). We quantified recovered RNA using a RiboGreen TBS-380 Flourometer (Turner BioSystems, Sunnyvale, CA) and assessed sample quality with an Advanced Analytics Fragment Analyzer using the High Sensitivity RNA Kit (Advanced Analytics, Ames, IA). We used the Illumina TruSeq RNA kit (Illumina, Inc., San Diego, CA) to build the cDNA library from total RNA. The kit targeted polyadenylated mRNA for second strand cDNA synthesis and size-selected via enzyme-mediated fragmentation. While building the cDNA library, we applied unique tags to each individual sample. We quantified each cDNA library using a Kapa Biosystems qPCR Kit (Kapa Biosystems, Inc., Wilmington, MA) specific to the Illumina Adapter sequence. Next, we pooled samples in equimolar concentrations for the final cDNA library. We ran the three high coverage individuals representing each of the three tortoise lineages on two flowcell lanes using an Illumina HiSeq 2000 and we ran the six lower-coverage individuals on a single flowcell lane using the Illumina HiSeq 2500 platform. All next-generation sequencing protocols were performed at the University of Arizona Genetics Core following standard protocols.

Transcriptome assembly
For each library, we processed raw reads to remove sequencing adaptors, trimmed for quality score (Q > 28) and length-filtered (>37 bp) using TRIMMOMATIC 0.32 (Bolger et al. 2014). We used reads from the three highcoverage samples (Mojave, Sonoran and Sinaloan) to create de novo transcriptome assemblies for each species/lineage, as well as a combined assembly consisting of reads from all three libraries to be used as a reference. We assembled transcript contigs using TRINITY (Grabherr et al. 2011;Haas et al. 2013) with default settings. As de novo transcriptome assemblies often consist of many thousands of possibly chimeric contigs that lack clear gene content (Cahais et al. 2012), we further filtered the TRINITY output for contigs with single gene annotations.
To accomplish this, we treated the TRINITY contigs as a query in a BLASTX search of mouse and chicken proteins from UniProt (Magrane and UniProt Consortium 2011) with an E-value cutoff of 1e-6. We then selected contigs containing unique BLAST hits to incorporate into a reference transcriptome for downstream analyses. We followed a slightly modified protocol of De Wit et al. (2012) for mapping and variant detection. We performed the analysis using the six low-coverage RNA-seq samples and mapped these to the reference transcriptome. We used Burroughs Wheeler Aligner 0.6.1 (Li and Durbin 2009) to generate Sequence/Alignment Map (SAM) files. We performed several trials to assess parameter sets and settled on using default parameters with assumed offset of 33, allowed for 0.005 differences between reference and query (-n), and allowed up to five differences in the seed (-k) to achieve > 67% of reads for each individual mapped to the reference transcriptome. We then converted SAM to BAM file format and removed duplicates using SamTools 0.1.18 ). Next, we merged all six individuals together to create a single BAM file and then followed recommendations in the De Wit et al.  2012) to build a Gaussian mixture model to be able to accurately distinguish true variant sites from false positives. Subsequently, we parsed out only the variant sites for which we have genotype information for all individuals with a Phred quality score cutoff of 20. We used these data for all downstream analysis.
We performed Principal Components Analysis as an initial assessment of these data using SMARTPCA in EIGENSOFT 5.0.2 (Patterson et al. 2006). We then annotated these data using TransDecoder (Haas et al. 2013) ª 2016 The Authors. Ecology and Evolution published by John Wiley & Sons Ltd. and identified candidate coding regions within transcript sequences. TransDecoder generated a gff3 file which we then converted to a GTF (general transfer format) file using GFFREAD in CUFFLINKS 2.2.1 (Trapnell et al. 2010). Next, we used SNPdat 1.0.5 (Doran and Creevey 2013) to generate a GTF file annotating gene sequences shared by all six individuals.

Demographic inference using @a@i
We used @a@i 1.6.3 (Gutenkunst et al. 2009) to fit demographic models to our six transcriptome samples (see supporting information; Tables S1 and S2). In addition, the @a@i analysis generated a species tree for the three taxa independent of the multi-locus Bayesian analysis (Table S2). We built the folded joint allele frequency spectrum using only variants that SNPdat unambiguously called as synonymous and that were successfully genotyped in all six individuals. To choose between our final 5-and 6-parameter models, we performed a likelihood ratio test (LRT) with a statistical correction for linkage based on the Godambe information matrix (Coffman, Hsieh, Gravel, & Gutenkunst, pers comm). Parameter confidence intervals were estimated by 100 conventional bootstraps over contigs and calculated as h AE 1:96 Â rðhÞ, where h was the parameter of interest and r was the standard deviation of the bootstrap results. To convert parameters from population genetics units (scaled by ancestral population size N a ) to physical units, we assumed a divergence time between G. agassizii and Sonoran G. morafkai lineages of 5.9 Ma (Edwards 2003) and a generation time of 25 years (USFWS 1994). To convert bootstrap parameter values, we used our maximum composite-likelihood parameter values and the relation h = 4N a lL to estimate the quantity lL and then applied this value all bootstrap samples.

Results
We obtained usable sequences for the mtDNA and four of the six nuclear loci (BDNF, R35, TB02 and TB07; Table 2). Loci TB53 and TB95 generated electropherograms with excessive noise on repeated attempts, and thus they were excluded. In phasing the nuclear DNA data, all samples for BDNF had a minimum pair probability of 0.996 and all were selected for downstream analyses. For R35, all but two samples were selected with a minimum pair probability of 0.942. For TB02, all samples were accepted; three pairs fell below a probability of 0.922 but were still included. None of these samples had unique haplotypes that were not represented in another sample. For TB07, 51 samples had 100% pair probabilities and were selected for analysis; 18 samples had pair probabilities close to 0.50 between two pairs of haplotypes (13 samples with the same two possible pairs of haplotypes involved G. agassizii or Sonoran-type G. morafkai; five samples of G. morafkai collected in Mexico shared the same two ambiguous pairs). Both examples represented the same ambiguity in the same SNP and each of the four haplotypes among the ambiguous pairs was represented in the other 51 samples. For this locus we choose the pair with the highest P value to use for downstream analyses.
Among the nuclear loci, haplotypes were both lineagespecific and found globally across species of Gopherus (Table S3, Fig. 2). Haplotype diversity, nucleotide polymorphism, and nucleotide diversity estimates varied across loci and among-sample groupings and did not suggest strong trends. For example, haplotype diversity was greater in G. agassizii then G. morafkai at some loci but not others, and Tajima's D was positive for some loci and negative for others (Table S4).

Phylogenetic analysis
The matrilineal genealogy (mtDNA tree) had strong support across analyses for distinct Mojave/Sonoran/Sinaloan matrilines. Notwithstanding, the geographically distant Mojavian and Sinaloan matrilines were resolved as sister taxa, and together they were the sister group to the intervening Sonoran matriline ( Fig. 2A). The multilocus analyses exhibited the same topology for both strict and relaxed log normal clocks with species relationships consistent but with the important expectation that the adjacent Sinaloan and Sonoran matrilines clustered together and formed the sister group of the Mojave. Estimated TMRCAs exhibited wide standard deviations (Fig. 3). Qualitative analysis using DensiTree suggested strong concordance among iterations for the resulting tree topology but with less precision around depth of nodes (Fig. 3B).
For tests of selection on the mtDNA locus, our estimations of x employed different models for branches of the tree (Table 3). First, assuming a uniform x for all branches of the five species/lineages of Gopherus, x was estimated to be 0.6192, which was significantly less than one (P < 0.05). This result suggested that this gene was under overall strong purifying selection. Next, we applied a model in which every branch had its own x. This model was significantly more effective than oneratio model (P < 0.05; Table 3), suggesting that x varied among the different lineages. The value of x on the branch leading to SIN (Sinaloan lineage haplogroup) was estimated to be larger than one (1.1), indicating that positive selection may have affected this lineage (Table 3).

Network analysis
Our network reconstructions of alleles (Fig. 2) used midpoint rooting of the greatest distance, and not outgroup analysis, to present tree-like associations. As such, the terminals are alleles, and not individuals. Allelic associations of individuals were listed in Table S3. In one locus, BDNF, all species of tortoises shared most alleles. In contrast, unique alleles were constrained to species of desert tortoises in R35, TB02 and TB07, although a few alterna-tive and presumably primitive alleles occurred in more than one species.

Transcriptome assembly
We assembled 111,635,751 trimmed reads from whole blood total RNA into a combined G. agassizii and G. morafkai assembly that contained 235,412 contigs ( Table 4). The blast-filtered combined assembly contained 40,341 transcripts with a contig N50 of 3010 bp and a mean contig length of 1957 bp. After aligning the six individuals against the combined assembly and identifying the variant alleles, we characterized 95,220 polymorphic sites for which we have genotype information for all individuals. The PCA assessment showed extremely strong clustering of individuals within each lineage and relatively equidistant differentiation among lineages (Fig. 4).

Demographic modeling
We used the allele frequency spectrum (AFS)-based inference tool @a@i (Gutenkunst et al. 2009) to infer the joint demographic history of the three lineages of desert tortoise from our transcriptomes. Because AFS-based demographic inference was shown to be sensitive to genotyping errors (Gutenkunst et al. 2009) and selection (Williamson et al. 2005), we considered only synonymous variants successfully called in all six individuals, yielding an AFS with 20,126 synonymous variants from 7665 contigs.
To guide development of three-population models, we first considered simpler two-population models. Initial two-population models without gene flow (modeling allopatric speciation) consistently yielded a larger effective population size for the Sonoran population than the others and a more recent divergence between the Sinaloan and Sonoran populations than between either of those populations and the Mojave (Table S1). Consistent with this result, the best-fitting three population models involved recently diverged Sinaloan and Sonoran populations (Table S2). This result provided independent support for the *BEAST species tree. When we added gene flow (H-MS gf , H-SS gf ) into the models, either as continuous flow, such as through parapatric speciation with continuous contact, or delayed flow, such as resulting from cycling periods in refugia followed by introgression during secondary contact, the maximum composite-likelihood estimates for the gene flow parameter converged to zero (Table S1). Thus, we found no evidence of gene flow between any pair of the three populations and, thus, rejected hypotheses H-MS gf and H-SS gf .
Based on our two-population analyses, we considered three-population models in which the Sinaloan and Sonoran populations were sisters and there was no gene flow between them. Figure 5 showed the two best-fit models among those we tested (composite log-likelihood: À887 for the 6-parameter model vs. À909 for the 5-parameter model). While the two models had the same tree topology, the 6-parameter model had an additional free parameter for the effective population size of the contemporary Mojave population. Qualitatively, these models produced similar allele frequency spectra and residuals when compared with the data (Fig. 5C,D), but the 6parameter was preferred in a composite likelihood ratio test (adjusted likelihood ratio 9.242, P = 0.0024, chisquared test with df = 1). To estimate parameter uncertainty while accounting for linkage among variants, we used conventional bootstrapping. Table 5 showed the confidence intervals for the parameters of our 6-parameter demographic model. This best-fit model suggested that the Mojave and Sinaloan populations have similar effective sizes (128,000 and 150,000 individuals, respectively), but the effective size of the Sonoran population is much larger (600,084 individuals). The two divergence times in our model are also similar (Table 5), suggesting a trichotomy among these populations.

Phylogenetic relationships
The multilocus Bayesian species tree depicts genetically distinct Sonoran, Sinaloan, and Mojave lineages of desert tortoise. This tree depicts Sonoran and Sinaloan tortoises as sister lineages (Fig. 3) nested as the sister group of the Mojave lineage, which is expected given their geographic distributions (Fig. 1). While the topology of the species tree is robust, the branch lengths are not likely representative of true divergence times. The branch lengths may be distorted by differences in mean selection pressure among branches, as the PAML analysis indicated for the mtDNA locus. Because we cannot estimate the actual mutation rate of each locus, we must rely on the wellestablished date of the Bouse inundation that caused the vicariant divergence of G. agassizii and G. morafkai. This inundation now forms the Colorado River boundary between the species (Avise et al. 1992). Notwithstanding, mtDNA mutation rates based on this geological event are inconsistent with fossil records of divergence among other congeners. The molecular estimates of divergence within Gopherus may be too recent (Avise et al. 1992;Bramble and Hutchison 2014). Until a recalibration of the existing molecular clock is performed using distantly related groups, we consider our projected evolutionary rates for desert tortoises (Figs 2 and 3) to be conservative.

Patterns of divergence
The RNA-seq analyses yield two results that test our hypotheses. First, the best-fit model in the @a@i analysis for the relationship among the three taxa ( Fig. 5 and Table S2) is concordant with the *BEAST species tree ( Fig. 3) but more clearly elucidates the relative divergence times among the three lineages. The @a@i result suggests that the Sonoran/Sinaloan split occurred only a short time after (or simultaneous with) the divergence of G. agassizii (Table 5). Thus, the three lineages form a trichotomy with relatively equal times of divergence from each other (Fig. 5).
The @a@i analysis also finds no evidence of gene flow during divergence of the Sonoran/Sinaloan lineages. Despite strong biogeographic evidence that the current contact zone between the Sonoran and Sinaloan lineages is a result of secondary contact after periods of isolation in Pleistocene refugia (Edwards et al. 2015b) the rejection of hypothesis H-SS gf suggests that divergence in parapatry and/or periods of secondary contact did not result in significant introgression between lineages. The best fit model for Sonoran/Sinaloan divergence (e.g. no migration) does not differ from that obtained for the allopatric divergence of Sonoran/Mojave. Thus we cannot reject hypothesis H-SS i that Sonoran and Sinaloan lineages diverged without significant introgression and that isolation (geographical or ecological) is responsible for the divergence. Secondary contact followed the isolation, as is currently observed.
Lineages of desert tortoises have similar (and simultaneous) processes of speciation. Either geography (geographic distance and physical barriers) or selection through ecological niche segregation should drive divergence via a reduction in gene flow, and both drivers may occur simultaneously (Endler 1977;Cooke et al. 2014). Gopherus agassizii appears to have diverged first as a result of allopatry. The secondary contact zone in northwestern Arizona shows that G. agassizii adapted over time to the unique environmental conditions of the Mojave Desert (Edwards et al. 2015a). Differential adaptation may occur in allopatry and, thus, ecological speciation does not necessarily require sympatry (Bernardi 2013).
Geographic isolation may also explain speciation of the Sonoran and Sinaloan lineages and an allopatric 'speciation pump' (April et al. 2013)  long-term isolation. One alternative explanation, and one that better fits the geographical history of the region, is that the Sonoran and Sinaloan lineages first diverged into distinct ecotypes under a parapatric model of speciation during the Neogene Period. This scenario requires isolation in ephemeral Pleistocene refugia after the lineages differentiated (Fisher-Reid et al. 2013). Our results fail to find a genetic signature of ecological isolation (parapatric model) and in doing so this scenario cannot be differentiated from the allopatric model.
A growing number of empirical examples suggest that speciation can occur without spatial separation, particularly in the case of ecologically driven selection (Rundle and Nosil 2005;Pinho and Hey 2010;Smadja and Butlin 2011). Our assumption that signatures of ancient admixture between Sonoran and Sinaloan lineages would be identifiable relies on two conditions: (1) the likelihood of recurring biogeographic proximity during their evolution and (2) observations of contemporary hybridization. However, it may be that there are unique circumstances under which a signature of past gene flow will remain in the genome and these conditions were not met during the parapatric divergence of Sonoran and Sinaloan lineages of G. morafkai. Selection tends to favor divergence in the presence of gene flow only when a few traits or genes are involved, or when extensive pleiotropy exists (Smadja and Butlin 2011). Detection of past signatures of neutral introgression requires sufficient time for advantageous alleles to attain high frequency or fixation (e.g., time to fixation). In addition, the strength and timing of gene flow influences the likelihood of speciation (Kisel and Barraclough 2010;Pinho and Hey 2010;Smadja and Butlin 2011). Thus, even when divergence between sympatric taxa occurs, signals of past introgression and any remaining genetic signature may not remain or might constitute only a very minor portion of the existing genome (Mendez et al. 2012).
Our RNA-seq analyses involve six samples only and these are limited to discrete populations. However, the analyses include a massive amount of independent gene sequences and these allow for the high resolution of evolutionary patterns. Our sampling strategy minimizes geographic bias via equidistant sampling, while maximizing the opportunity to detect introgression in the genome. Short-read technology ensures that the number of loci does not limit our analysis (Table 4) and we present a high level of resolution beyond what could be inferred through traditional analyses. Importantly, analyses of the RNA-seq data effectively test our hypotheses, and we are confident that these results reflect the evolutionary history of the desert tortoise.
Inferences based on large numbers of gene sequences and few individuals have been shown to be robust for inference of population history (Wang and Hey 2010;Lohse et al. 2011;Jones et al. 2012;Hearn et al. 2014). Robinson et al. (2014) used simulations to test the ability of @a@i to differentiate between models of population divergence with and without gene flow. In particular, their simulations included cases very similar to our study, with two individuals sampled per population, 13,000-18,000 SNPs analyzed, and a divergence time of T = 0.25. They found that @a@i confidently distinguished between models with zero or moderate migration between two populations (median Akaike weight for the true model at least 0.9). These results suggest that if substantial gene flow had occurred between lineages during the evolution of desert tortoises, we would detect it.
For species of conservation concern, like desert tortoises, it is necessary to sample only a small amount of tissue using non-lethal methods. Thus, whole blood provides the most obtainable source of DNA for our study (Meitern et al. 2014). It is difficult to assess whether the number of putative transcripts in our analysis or the number of polymorphic sites characterized meets expectations because this study focuses on a nonmodel organism. Our assembly obtains almost twice as many functionally annotated transcripts than obtained from whole blood RNA-seq of greenfinch (Carduelis chloris), another nonmodel organism (Meitern et al. 2014). The assembly of de novo genomes has multiple challenges (McGettigan 2013) and we expect a high likelihood of generating technical artifacts. We address this concern through careful screening (UniProt queries) and the implementation of very conservative filtering. Regardless, these data may contain a low proportion of sequencing and assembly errors. Thus, our data appear more than adequate for the purpose of phylogenetic reconstruction. Table 5. Maximum composite likelihood parameter estimates and confidence intervals for the best-fit 6-parameter demographic model in Figure 5B. Parameters are effective population sizes (N) in individuals and times of divergence (T) in years.

Supporting Information
Additional Supporting Information may be found in the online version of this article: Table S1. Models evaluated during demographic inference using @a@i 1.6.3 (Gutenkunst et al. 2009). Table S2. Inference of species tree topology among the three tortoise populations.