Candidate‐species delimitation in Desmognathus salamanders reveals gene flow across lineage boundaries, confounding phylogenetic estimation and clarifying hybrid zones

Abstract Dusky Salamanders (genus Desmognathus) currently comprise only 22 described, extant species. However, recent mitochondrial and nuclear estimates indicate the presence of up to 49 candidate species based on ecogeographic sampling. Previous studies also suggest a complex history of hybridization between these lineages. Studies in other groups suggest that disregarding admixture may affect both phylogenetic inference and clustering‐based species delimitation. With a dataset comprising 233 Anchored Hybrid Enrichment (AHE) loci sequenced for 896 Desmognathus specimens from all 49 candidate species, we test three hypotheses regarding (i) species‐level diversity, (ii) hybridization and admixture, and (iii) misleading phylogenetic inference. Using phylogenetic and population‐clustering analyses considering gene flow, we find support for at least 47 candidate species in the phylogenomic dataset, some of which are newly characterized here while others represent combinations of previously named lineages that are collapsed in the current dataset. Within these, we observe significant phylogeographic structure, with up to 64 total geographic genetic lineages, many of which hybridize either narrowly at contact zones or extensively across ecological gradients. We find strong support for both recent admixture between terminal lineages and ancient hybridization across internal branches. This signal appears to distort concatenated phylogenetic inference, wherein more heavily admixed terminal specimens occupy apparently artifactual early‐diverging topological positions, occasionally to the extent of forming false clades of intermediate hybrids. Additional geographic and genetic sampling and more robust computational approaches will be needed to clarify taxonomy, and to reconstruct a network topology to display evolutionary relationships in a manner that is consistent with their complex history of reticulation.


| INTRODUC TI ON
Gene flow across the boundaries of even distantly related species is now recognized as a common occurrence in many groups at both deep and recent timescales Larson et al., 2014;Nosil, 2008;Schield et al., 2019). These processes have numerous downstream effects, confounding our ability to infer bifurcating phylogenies (Leaché et al., 2014) and revealing that an evolutionary network is, therefore, a more accurate topology for many groups . Reticulations, rather than bifurcations, are consequently a common feature of the evolutionary relationships of many taxa (Blair & Ané, 2020). Similarly, genetic quantification of species boundaries now increasingly recognizes the likelihood of admixture between "completed" species and the possible existence of hybrid populations with distinct patterns of genomic ancestry (Chan et al., 2017).
However, several related challenges complicate accurate inference of these evolutionary processes at scale. First, species boundaries must be established to determine when and where introgression has occurred . Species limits are best represented as continuums of divergence rather than discrete boundaries; instances of hybridization may, therefore, represent fuzzy empirical outcomes in many cases (Chan et al., 2022).
Second, the signal for both recent and ancient gene flow may be unequally distributed within the genome and among taxa (Weisrock & Larson, 2006). In the most extreme cases, evidence may be erased from the nuclear genome by selection or drift, potentially leaving only captured mitochondrial haplotypes as evidence (Toews & Brelsford, 2012). Third, existing methods are highly constrained in their ability to estimate even moderately complex networks (Pardi & Scornavacca, 2015). Most commonly used algorithms are limited to level-1 networks (defined as those not sharing any edges between reticulations) and cannot estimate multiple hybridization events that intersect or share branches between them (Allman et al., 2019).
These conundrums are all evident in the plethodontid salamander genus Desmognathus (Figure 1). Of the 22 described species, several were morphologically cryptic and discovered only recently using molecular data (e.g., Camp et al., 2002). Many of the remaining morphospecies were discovered through further mitochondrial sequencing to represent polyphyletic assemblages , with at least 45 mitochondrial lineages (Beamer & Lamb, 2020). Subsequent analyses (Pyron et al., 2020) supported the distinctiveness of at least 49 "mito-nuclear candidate species" defined by ecogeographically monophyletic mitochondrial haplotypes and corroborated by genomic loci, revealing a complex history of reticulation involving both extant and ancestral lineages. However, these candidate species are based only on geography and phylogenetic or limited network analyses, and most have not yet been subjected to explicit delimitation analyses with population-level sampling.
Consequently, spatial boundaries and degrees of genealogical exclusivity are still undescribed for most lineages.
Additionally, the distribution and strength of the signal in the nuclear genome for the numerous known reticulation events has not been quantified. Some instances of nuclear admixture and mitochondrial capture are well known (Mead et al., 2001;Tilley et al., 2013). Others, such as a deep-time reticulation involving the lineage ancestral to aeneus + imitator, were unexpected and not reflected in present-day mitochondrial patterns (Pyron et al., 2020). Other clear instances of mitochondrial capture, such as fuscus E with haplotypes from auriculatus C (Beamer & Lamb, 2020), were not recovered by the preliminary network analyses. Therefore, a population-level genomic assessment of species boundaries and admixture combined with known instances of mitochondrial capture is needed to quantify an accurate species delimitation model and an estimate of quantifiable hybridization events.
Finally, it is relatively unknown what effects both shallow and deep-time reticulation events will have on estimation of both networks and phylogenies (Folk et al., 2016;Kutschera et al., 2014).
While some early estimates suggested that species-tree methods might be robust to modest amounts of gene flow (Leaché et al., 2014), many are now known to be inconsistent under these conditions , with particularly strong effects for rapid radiations (Jiao et al., 2020). Empirical descriptions of the

T A X O N O M Y C L A S S I F I C A T I O N
Biogeography; Evolutionary ecology; Taxonomy F I G U R E 1 A specimen (RAP0890/NPS-GRSM-196373) from the ocoee "A" lineage of the Balsam clade (see below), exhibiting the unusual erythristic pigmentation seen in some populations of this candidate species on Cataloochee Balsam in the Great Smoky Mountains. The Balsam clade is characterized by possessing fossil mitochondria from an ancient instance of 'ghost' admixture (Lawson et al., 2018;Pyron et al., 2020;Zhang et al., 2019); such complex patterns are common in Desmognathus. Photo courtesy of T.W. Pierson effects gene flow may have on topological estimation and lineagebased species delimitation are rare (Eckert & Carstens, 2008;McVay et al., 2017). Recent empirical work at the level of species and hybrid populations suggests a relatively straightforward effect: that admixed individuals often occupy artifactual positions on phylogenetic topologies (Chan et al., 2020). They create ladder-like "grades" of intermediate topological position between the various parental lineages in relative proportion to their ratios of hybrid ancestry, which "attract" closely related non-hybrids (Dolinay et al., 2021). As the history of Desmognathus is characterized by extensive cross-lineage gene flow (Pyron et al., 2020), we can, thus, quantify how these processes affect estimates of phylogeny.
We use an expanded population-level phylogenomic dataset to answer three primary questions in Desmognathus with broader relevance for understanding species delimitation across the phylogeography-phylogenetics continuum (Edwards et al., 2016).
First, of the 49 mito-nuclear candidate species, how many are supported by population-genetic evidence from clustering methods that account for admixture between lineages (Frichot et al., 2014)?
We anticipate that some candidate lineages may be collapsed, while other widespread lineages may contain significant phylogeographic structure that went undetected in previous analyses.
Second, which of these candidate species or phylogeographic lineages show evidence of hybridization across the nuclear and mitochondrial genomes, and what is the spatial extent of present-day hybrid zones (Burbrink et al., 2021;Szymura & Barton, 1986)? The existence, location, and width of these geographic admixture zones may vary significantly among species pairs and loci (Barton, 1983;Dufresnes et al., 2020). Nonetheless, we anticipate geographic localization of heavily admixed individuals to coincide with the location of mito-nuclear candidate species boundaries, aligned with physiographic features associated with climatic refugia and speciation in salamanders (Highton, 1995;Kozak & Wiens, 2006). Third, how does the existence of recent admixture events between mito-nuclear candidate species influence topological reconstructions? The impact of heavily admixed genomes on the terminal placement of individuals and their effect on the resolution of specieslevel clades is now known to be significant in many cases (Chan et al., 2022;Dolinay et al., 2021). Given the prevalence of hybrid individuals in our sampled populations of Desmognathus, we hypothesize that at least some of the phylogenetic structure detected in mito-nuclear candidate species by previous analyses may have been influenced by gene trees from these admixed terminal specimens, resulting in artifactual "clades" interpreted as meaningful units.

| Specimen sampling
Our previous studies (Beamer & Lamb, 2020;Pyron et al., 2020) included either a large number of samples (536) for a few mitochondrial genes (ND2, tRNAs, and COI), or a smaller number of individuals (161) for a larger number of loci (381 AHE genes). Based on our knowledge of the likely geographic extent of the 49 delimited mito-nuclear candidate species and their potential hybrid zones informed by previous research (e.g., Tilley, 2016;Tilley et al., 2013), we expanded this sampling in the current dataset. We increased the representation of nearly all lineages, with 896 specimens nearly doubling the largest previous study, ranging from 1 to 92 individuals per clade (mean = 19) from 732 distinct sites in 18 states in the eastern United States (Figure 2). This includes nearly every known geographic population segment of Desmognathus, excepting a few marginal populations that are presumed extirpated. We also  sequenced two formalin-fixed, fluid-preserved museum specimens (see Pyron et al., 2022) and only included these in a subset of analyses. Consequently, our primary sampling consisted of 894 specimens representing 49 mito-nuclear candidate species. Based on geography or other preliminary analysis of mitochondrial or nuclear data, we assigned each of the 896 individuals to the 49 previously delimited groups, whereas we reassigned group membership of some individuals prior to the main clustering and admixture tests (see below).
We performed part of the analyses on the GWU HPC Pegasus cluster (MacLachlan et al., 2020).

| Anchored Hybrid Enrichment data
Data were generated using the Anchored Hybrid Enrichment (AHE) approach (Lemmon et al., 2012) as described in Hime et al.
(2020) using the "Desmognathus version 2.0" probe set from Pyron et al. (2020). Allele phase was determined using the calling procedure described in Pyron et al. (2016). Sequencing and assembly proceeded in two batches, the first containing 810 samples and yielding 245 loci, the second containing 94 samples and yielding 316 loci. Homologous sets of loci between the two sets were determined by assembling their consensus sequences. Loci present in both assemblies were then merged via re-alignment using mafft version 7.475 (Katoh & Standley, 2013) with the FFT-NS-i algorithm (maximum 1000 iterations). To ensure quality of base calls for downstream analysis prior to alignment, the data were trimmed for quality by removing alleles with >80% missing data or >5% ambiguities. Individuals with >50% missing data or only a single allele called per locus were retained for gene-tree analyses but discarded for SNP-based population-genetic inferences. Eight individuals were removed due to failure or contamination, and the two fluid-preserved specimens were excluded from the primary analyses and evaluated separately (see below). The merged set of orthologous loci contained 894 individuals with data from up to 233 loci ranging from 1025 to 5234 bp, totaling up to 563,656 bp, resulting in a matrix which was 86.4% complete.
Two of the individuals included in the primary sequence capture protocol described above were from formalin-fixed, fluidpreserved specimens collected by Richard D. Highton in 1971 (USNM 468094-5; Desmognathus auriculatus, FL: Marion) from a population (Silver Glen Springs, Ocala National Forest) which is now believed to be extirpated (see Dodd, 1998

| Analytical strategy and computational constraints
Our ability to unravel the complexity of Desmognathus relationships fully is confounded by several factors. We know from previous analyses (Pyron et al., 2020) that there are multiple reticulations in the phylogeny of the group, both ancestral (i.e., involving internal branches) and recent (among terminal species). There are also several apparent instances of hybridization revealed by mito-nuclear discordance (Beamer & Lamb, 2020)  may not be identifiable (Pardi & Scornavacca, 2015).
Similarly, population-genetic methods designed to estimate or test explicit demographic models incorporating complex evolutionary dynamics are often heavily constrained in their ability to handle more than a few populations or terminal species (Excoffier et al., 2013;Gutenkunst et al., 2009;Hey, 2010;Jackson et al., 2017).
Additionally, those methods often perform best with large numbers (i.e., thousands) of independent loci, whereas our sampling is limited to 233, despite our long total alignment. We also lack a reference genome to pinpoint significant patterns of chromosomal admixture and genomic differentiation (Gante et al., 2016;Li et al., 2019).
Consequently, computational and data constraints prevent the ideal outcome of simultaneous inference of an adequately complex network and sufficiently parameterized species-delimitation model. For these reasons, we apply a series of simple but robust procedures to approximate this idealized estimate of the evolutionary history of Desmognathus, taking care to highlight potential areas of continuing uncertainty and foci for future research.
Integrating all these analyses in the context of delimiting terminal taxa requires some care. While we are overall very cautious in interpreting phylogenetic topologies given the apparent prevalence of gene flow and hypothesized impacts thereon, we treat reciprocal monophyly of geographically distinct clades as the clearest evidence for valid candidate species. We previously recognized 49 of these (Pyron et al., 2020). Based on our phylogenetic analyses (see below),

| Phylogenetic inference
We estimated 233 individual gene trees using IQ-TREE v2.1.3 (Minh, Schmidt, et al., 2020) with optimal models selected using ModelFinder (Kalyaanamoorthy et al., 2017) and support estimated using 1000 ultrafast bootstraps (Hoang et al., 2017) and the SHL-aLRT branch statistic (Anisimova et al., 2011). We then estimated a concatenated phylogeny using partitioned models (Chernomor et al., 2016) under the optimal merging strategy for the 233 loci combined, also with UFBoot and SHL-aLRT support values. For this topology, we finally estimated gene-and site-concordance factors (gCF/sCF) from the individual locus alignments . We conducted five runs and used the best tree as the starting point for a final analysis.
We initially evaluated estimating a species tree under the assumptions of the multi-species coalescent (MSC) model assuming incomplete lineage-sorting (ILS) as the primary driver of gene-tree discordance, in the program ASTRAL-III v5.7.7 (Zhang et al., 2018), which has shown overall high accuracy in simulation (Chou et al., 2015). However, numerous recent authors have cast doubt on the accuracy of these methods in the face of extensive gene flow (Jiao et al., 2020;Leaché et al., 2014;, as we observe in our dataset. Preliminary analyses of this dataset using all specimens and loci yielded anomalous topologies with low support that were also incongruent with species-tree results from our previous study sampling many of the same specimens and loci (Pyron et al., 2020). As we were unable to address the potential confounding effects of ILS and gene flow on MSC-based species trees, we proceeded with the concatenated and gene-tree estimates alone.

| Clustering and admixture analyses
The initial naming of Desmognathus clades was primarily qualitative , giving a letter designation to monophyletic sublineages of existing morphospecies using a tree-based procedure (Wiens & Penkrot, 2002). Later researchers attempted to formalize this nomenclature with systematic 'ecodrainage' sampling to ensure that all relevant potential lineages and geographic genetic segments were sampled (Beamer & Lamb, 2020). Our subsequent designation of 49 mito-nuclear candidate species was based primarily on qualitative geographic and topological assignment to these clades with relatively limited sampling of populations (Pyron et al., 2020). Thus, the population-level validity of these taxa remains unknown.
To provide a robust quantitative basis for future speciesdelimitation analyses based on integrative datasets, including morphology, ecology, etc., we performed several clustering and admixture analyses to assign individuals to quantitatively identified candidate species. For manageability, we first divided the concatenated topology into 12 groups of mito-nuclear candidate species as defined in our previous analyses ( Figure 3; Table 1). We extracted SNPs with <20% missing data from each locus, removed singletons (Linck & Battey, 2019), and concatenated them into cladespecific matrices. We first visualized nucleotide diversity relative to apparent genetic clusters using a PCA plot of the SNP matrix for those specimens (Dray & Dufour, 2007) in the R package 'adegenet' (Jombart, 2008), with individuals coded by their previously assigned mito-nuclear candidate species.
We initially evaluated the use of DAPC (Jombart et al., 2010) to identify statistical clusters. However, the results were identical to our clade-level admixture analyses (see below) in all but three cases; aeneus and fuscus where DAPC estimated additional phylogeographic structure, and marmoratus G, where the two disagreed on hybrid assignment to a parental species. We believe these minor differences to be a result of DAPC's failure to account for gene flow, which previous authors suggest overestimates diversity and mis-specifies hybrids (Chan et al., 2017), and we therefore do not present these results. We also initially investigated the 'snapclust' algorithm (Beugin et al., 2018), which identifies the number of K clusters in Hardy-Weinberg Equilibrium, and can identify F 1 and F 2 hybrids when K = 2. However, preliminary results were apparently anomalous for many clades, likely due to the strong violation of the Hardy-Weinberg assumption of no migration, and we did not pursue this approach further.
Our final aim is to estimate the prevalence and location of hybrid individuals, and the overall degree and spatial extent of genomic admixture between candidate species. We approached this with both individual-and taxon-based approaches. First, we used the 'sNMF' algorithm in the R package 'LEA' to estimate individual ancestry coefficients for each specimen (Frichot et al., 2014). For each clade or set of comparisons, we first optimized the regularization parameter α for values spanning several orders of magnitude: 1, 5, 10, 50, 100, 500, and 1000. Frichot et al. (2014) initially tested values up to 10,000 but found that values above 1000 were generally discarded for most datasets by the cross-entropy criterion. We selected the value of α that minimized median cross-entropy across 100 replicates. Using the optimal value of α for each clade, we then estimated ancestry using the values of K derived from the clustering analyses for that clade (see above) as well as determining the optimal value of K minimizing median cross-entropy across 100 replicates, if these differed. In a few cases where an elbow did not form, we selected the lowest value of K representing a significant improvement in cross-entropy using the 'notch' test of the boxplots (i.e., overlapping 95% SE of the median).

| Hybridization and topology
Recent authors (Chan et al., 2022;Dolinay et al., 2021) have provided verbal models and preliminary empirical evidence for an intuitive process: the presence of hybrid individuals in a phylogenetic analysis of phylogeographic datasets seems to create artifactual topologies. Specifically, hybrids between two parental populations seem to form ladder-like grades between them in proportion to their individual ancestry from each. They may also attract other hybrid individuals with similar admixture profiles, creating false clades that appear to represent real, distinct evolutionary lineages that are merely statistical clusters of hybrids. Such profiles can result from a wide variety of unrelated processes (Lawson et al., 2018). We noted the possible presence of these artifacts in our preliminary analyses.
Specifically, we noticed highly imbalanced clades with hybrid specimens in early-diverging positions.
We assess the presence of these potential artifacts in two ways. First, we tested for a relationship between the gCF supporting the placement of each terminal specimen and the largest individual ancestry coefficient for that specimen. Presumably, highly admixed specimens cannot, by definition, be supported by high gCF. This approach duplicates some gCF values for sister pairs of specimens, but our null hypothesis is that hybrids (i.e., highadmixture specimens) will cluster together, which would preserve the expected pattern. Second, we tested whether the imbalance of the subtrees for each of the 47 mito-nuclear candidate species (except marmoratus G which only had a single specimen) was related to the amount of hybridization within that clade. We measured imbalance as Colless' I, normalized proportional to distinguishable arrangements, rather than a Yule process which is likely inappropriate at the phylogeographic level (Blum & François, 2006). We ocoee D2 Figure 19 monticola A/C - Figure 18 monticola B - Figure 18 ocoee F/G/H - Figure 19 apalachicolae apalachicolae A1 Figure 19 apalachicolae A2 Figure 19 ocoee E ocoee E1 Figure 19 ocoee E2 Figure 19 Balsam ocoee A - Figure 20 ocoee B - Figure 20 conanti conanti A conanti A1 Figure 22 conanti A2 Figure 22 santeetlah - Figure 22 conanti E - Figure 21 'gamma' - Figure 22 conanti B/C/D conanti B/D Figure 21 conanti C Figure 21 'beta' - Figure 22 conanti F - Figure 22 carolinensis - Figure 22 Notes: These taxa and lineages circumscribe the genetic diversity of all known, extant population segments within the genus. For the history of these naming conventions and previous mitochondrial and nuclear estimates of these candidate species, see Beamer and Lamb (2020), Kozak et al. (2005) and Pyron et al. (2020).

TA B L E 1 (Continued)
that the entire lineage be composed of identical hybrids, which is unlikely; all clades contained at least some pure parentals, making this an appropriate measure.

| Phylogenetic inference
The concatenated estimate of the 894-taxon dataset is generally similar to recent results (Beamer & Lamb, 2020;Kozak et al., 2005;Pyron et al., 2020;Weaver et al., 2020), with no major topological novelties or newly discovered clades (Figures 3-23). The one remarkable difference is the placement of carolinensis, which is nested within the conanti species group (Figure 3a,b), a placement that has not been recovered in previous studies. This placement is strongly supported (~100%) by BS and SHL, but weakly supported by gCF (~0%) and sCF (~33%), suggesting strongly conflicting signal among genes (Minh, Hahn, et al., 2020). This is potentially related to ILS, hybridization, or gene-tree error, though we discard the latter explanation given the length and informativeness of our loci. It is difficult to untangle ILS from gene flow (Wang et al., 2018;Zhou et al., 2017), but given the prevalence of admixture, we are confident that gene flow is chief among the patterns driving topological variation across the many recent estimates. This hypothesis corroborates our previous results (Pyron et al., 2020).
Cross-referencing the topology with our admixture estimates (see below) suggests that numerous hybrid individuals occupy intermediate positions as an artifactual consequence of their admixed ancestry, which cannot be resolved adequately by concatenation or speciestree analysis. We estimate that 169 of the 894 specimens have >20% ancestry from a secondary phylogeographic lineage or candidate species ( Figure 1a). Chief among these include specimens of the Pisgah  (Table 1). Overall, none of the 49 mito-nuclear candidate species appear to be artifactual clades composed entirely of hybrids. However, a group of four heavily admixed specimens assigned to conanti F from southwestern North Carolina do appear to form such a cluster, estimated as the sister lineage to carolinensis in the concatenated tree ( Figure 22). Additionally, the marmoratus G lineage from the Pisgah clade may also represent such a hybrid ( Figure 9). Colless' I normalized by the PDA model ( Figure 3d). Thus, clades with more admixed individuals are more imbalanced on average.

| Candidate species and phylogeographic lineages
There are four major differences between our results and previous studies ( with more limited sampling (Beamer & Lamb, 2020;Means et al., 2017;Pyron et al., 2020). We, therefore, recognize it as valentinei B, a newly delimited mito-nuclear candidate species.
Third, the population-level analyses estimate auriculatus B & C as a single cluster; we collapse them to auriculatus B/C. Finally, while marmoratus G is estimated as a hybrid originating from multiple parental lineages, it possesses a unique mitochondrial haplotype (Beamer & Lamb, 2020) and topological position, and we, therefore, continue to recognize it as a provisionally distinct candidate species.
Our overall analyses, thus, support 47 distinct candidate species based on population-level analysis of phylogenomic data (Table 1).
Despite the stability of candidate species delimited from 2005 (Beamer & Lamb, 2020;Kozak et al., 2005;Pyron et al., 2020) to the present analyses, many exhibit admixture with other lineages, contain significant phylogeographic structuring, and show spatial genetic clines of gene flow between geographic clusters. Indeed, we estimate as many as 30 phylogeographic lineages within 13 of the 47 candidate species. Most of these exhibit admixture with respect to geographically,

| Seepage clade
As in previous studies, aeneus and imitator are reciprocally monophyletic species, which form either the successive outgroups to all other Desmognathus excluding the Pigmy (Beamer & Lamb, 2020;Kozak et al., 2005) or the Pigmy and Nantahala clades alone (Pyron et al., 2020), as here ( Figure 3). Our previous species-tree and network analyses of a smaller AHE dataset provided evidence that aeneus and imitator are, in fact, sister lineages (Pyron et al., 2020), but that a deep-time reticulation involving the stem lineage of the fuscus and conanti-group species is responsible for the poorly supported, non-sister topologies recovered here and in previous mitochondrial and concatenated nuclear estimates (e.g., Weaver et al., 2020). The sNMF analyses yielded K = 5, recovering no geographic population structure within imitator, but four phylogeographic lineages within aeneus (Figure 7). This extensive intraspecific diversity was noted in previous analyses (Beamer & Lamb, 2020;Pyron et al., 2020), and corresponds roughly to different mountain segments in the southern Blue Ridge, with one lineage comprising Piedmont populations. The extensive parapatry and admixture of these lineages suggests a complex phylogeographic history that deserves further scrutiny.

| Pisgah clade
The Pisgah clade (see Jones & Weisrock, 2018;Pyron et al., 2020) represents a complex scenario of introgression and diversification that strains our definitions of phylogeographic lineages and candidate species (Figures 8 and 9). The complexity stems from three in- them. This region is also the contact zone between carolinensis and orestes (Tilley & Mahoney, 1996).
We also observe in the Pisgah clade a phenomenon that occurs here across the genus, where some specimens exhibit "streaks" of minor ancestry (<10%) from several other lineages that stack on the Additional genomic and geographic sampling is needed to clarify the origin and status of this complex radiation.

| Appalachian clade
Our results (Figures 10 and 11 individuals occupying more early-diverging topological positions. We observe significant admixture between ochrophaeus and orestes B, and between orestes B and orestes A/C, as in previous studies (Mead et al., 2001).  specimen of welteri from extreme southeastern Kentucky is estimated to have a small amount (~16%) of individual ancestry from abditus, but we do not consider this significant evidence of admixture between them at present. While we treat both candidate species together in this analysis, they are not sister lineages in the concatenated phylogenetic estimate presented here, but rather successive divergences (see also Weaver et al., 2020). However, they have been estimated as sister lineages in previous concatenated analyses, and in exploratory network analyses (Pyron et al., 2020;unpubl. data). In contrast, mitochondrial phylogenies estimate widely separated positions for the two taxa (Beamer & Lamb, 2020;Pyron et al., 2020). Current sampling is evidently inadequate for a complete resolution of these relationships and patterns of potential genetic exchange.

F I G U R E 1 5
Lowland fuscus clade (auriculatus A, B, and C; fuscus C and D) with branches and PCA (a) colored by mito-nuclear candidate species, along with barplot and map of estimated individual ancestry coefficients (c) focused on auriculatus A and B/C. The dotted red outline on the map highlights the formalin-fixed specimens (USNM 468094-5). That pie chart shows the mean of their estimated individual ancestry coefficients from the two sNMF runs on subsetted SNP matrices. They are thus not, strictly speaking, equivalent, but the full and reduced analyses all recovered the same six lineages, and we, therefore, present them here for visual comparison. The full results for these specimens (see below), including the reduced phylogeny, are given in the SI   (Figures 13 and 14).   (Figures 3, 15, 16). Selection of K using minimum crossentropy did not form an elbow, but yielded K = 6 by the notch test. auriculatus A and fuscus C in a previous phylogenetic network analysis (Pyron et al., 2020), and mirrored by results from the formalinfixed historical specimens (see below), which are included on this plot ( Figure 15) for visualization purposes.

| Ouachita clade
As in recent concatenated, species-tree, and network analyses, brimleyorum, valentinei, and valentinei B form a monophyletic group (Pyron et al., 2020), unlike recent mitochondrial analyses in which valentinei +valentinei B and brimleyorum are the successive outgroups to the conanti species group (Beamer & Lamb, 2020;Pyron et al., 2020). Here, as in previous concatenated analyses (Pyron et al., 2020), the group is the sister lineage of the conanti +ocoee groups and their associated candidate species. In contrast, previous species-tree and network analyses estimated it as the sister lineage to the ocoee group alone (Pyron et al., 2020). While it is possible that ILS explains this variation completely, the dramatically differing mitochondrial and nuclear concatenated, species-tree, and network

| ocoee clade
The ocoee species group estimated here includes apalachicolae, monticola A/C & B, and ocoee D, E, and F/G/H, as in most previous mitochondrial (excluding ocoee D) and nuclear concatenated, species tree, and network analyses (Beamer & Lamb, 2020;Kozak et al., 2005;Pyron et al., 2020). Each candidate species is reciprocally monophyletic as previously defined, even with the substantially   (Figures 18 and 19).
Selection of K using minimum cross-entropy did not form an elbow, but yielded K = 9 by the notch test. Furthermore, comparisons of these patterns to the topology further suggest the impact of hybridization on ladder-like grades in the concatenated estimate. most other phylogenies. This mitochondrial lineage (ocoee "C") was not sampled in our previous work (Pyron et al., 2020), and we did not treat it as distinct here due to its limited sampling and close relationship with ocoee D in preliminary analyses. Another specimen of ocoee D1 also contains ~20% ancestry from ocoee E2, a phylogeographic lineage of ocoee E (see below), and several have streaks. and none of the sampled specimens have significant ancestry from any other single lineage. As with the monticola B and ocoee D specimens described above, one ocoee F/G/H has a ~40% "streak" of non-F/G/H ancestry. The geographically expanded apalachicolae (Beamer & Lamb, 2008) consists of two disjunct phylogeographic lineages, apalachicolae A1 comprising the originally described Coastal Plains populations in Alabama, Florida, and Georgia (see Means & Karlin, 1989), and apalachicolae A2 in the Blue Ridge foothills of north-central Georgia.

| Balsam clade
The enigmatic clade of ocoee A & B is estimated here as the sister lineage of the conanti species group (Figure 3), as in previous speciestree and some network analyses (TreeMix) based on a smaller AHE dataset (Pyron et al., 2020). This is in contrast to previous concatenated and other network (SNAq) analyses that estimate it as the sister lineage of the ocoee group, and mitochondrial estimates placing ocoee A-D as the sister lineage to all conanti, fuscus, and ocoee-group species (Beamer & Lamb, 2020;Kozak et al., 2005;Pyron et al., 2020). The mitochondrial versus nuclear placements and the result of the previous TreeMix analysis strongly support a scenario of ghost admixture (Racimo et al., 2015;Zhang et al., 2019) from an earlydiverging, likely extinct lineage of Desmognathus (Pyron et al., 2020).
The two candidate species are reciprocally monophyletic (K = 2) without any evidence of significant admixture ( Figure 20). However, the ocoee B lineage is also implicated in significant allele sharing with some conanti-group species (see results below). While we do not directly compare those candidate species here to estimate individual ancestry coefficients, the Dsuite results suggest an additional facet of complex history for this group to be examined in the future.

| conanti clade
Finally, the conanti species group (conanti A-F, 'beta,' 'gamma,' and santeetlah; possibly allied with carolinensis) represents perhaps the most complex and challenging set of candidate species and phylogeographic lineages. This group was first addressed in detail, in part, by Tilley et al. (2013), who referred to some of the lineages as "innominate forms" possibly representing "failed species." Here, we estimate both geographic and genealogical coherence of each candidate species and associated phylogeographic lineages, bolstering previous conclusions that at least some of them may, in fact, represent "good" species (Pyron et al., 2020), albeit with complex patterns of ancestral or recent contact and hybridization. Selection of K using minimum cross-entropy did not form an elbow, but yielded K = 10 by the notch test. with ~100% conanti F ancestry occurs sympatrically with specimens exhibiting ~100% conanti A2 ancestry, suggesting that these lineages can nonetheless maintain genetic distinctiveness in close geographic proximity.

| Admixture and reticulation
The results from Dsuite ( Figure 23 First is the complex interplay within and between candidate species from ocoee and conanti lineages, part of which was captured above in our admixture analyses. Within the conanti group, most of the admixture patterns described previously are captured in the Dsuite analyses; admixture between 'beta' and conanti F (64%) being supported most strongly, but with conanti B/C/D, 'gamma,' santeetlah, and conanti A also being implicated.
A second pattern that has not been estimated previously is gene flow to the conanti species group from the Balsam clade (ocoee A & B), most strongly from ocoee B. Given the partial symmetricity of the matrix, the reverse scenario is also possible, with ocoee B receiving alleles from one or more conanti-group lineages. This is likely related to the ghost admixture scenario estimated in a previous network analysis in TreeMix (Pyron et al., 2020).
Third, several species in the ocoee clade (apalachicolae and ocoee D, E, and F/G/H) are all implicated in contributing alleles to species in the Balsam and conanti clades. As this result is not significantly symmetric, the directionality may be more meaningful. This partially corroborates previous network analyses in SNAq that estimated conanti B/C/D arising as the result of a hybrid speciation event between ocoee F/G/H and 'gamma' (Pyron et al., 2020).
A fourth and related pattern of particular importance involves fuscus C as a recipient branch from conanti F and 'beta.' Many populations of fuscus C share mitochondrial haplotypes with carolinensis (Beamer & Lamb, 2020;Kozak et al., 2005;Pyron et al., 2020), here estimated as the sister lineage of conanti F, but the alleles shared between them are not significant in the Dsuite analysis.
However, our previous TreeMix analysis of a smaller AHE dataset estimated a three-way reticulation among carolinensis, conanti F, and fuscus C, which is, thus, partially corroborated here. We note again, however, that not all significant lineages are necessarily donors, based on the potential artifacts described above from branch correlations. In the population-level admixture analyses, we considered the fuscus, conanti, and ocoee species groups separately; future analyses will need to analyze them jointly to unravel these patterns.
A fifth more isolated instance of allele sharing is estimated between fuscus E and planiceps, a finding which also occurs in the admixture analyses (see above; Figure 14; Tilley et al., 2008). Sixth, orestes B & AC share a 26% excess of alleles, a pattern clearly established in the admixture analyses ( Figure 11) and previous analyses (Mead et al., 2001). Seventh, marmoratus E/H and quadramaculatus D are estimated to share excess alleles, which is reflected at low frequencies in our admixture analyses ( Figure 8) and potentially related to the sharing of mitochondrial haplotypes between marmoratus E/H and quadramaculatus E . Finally, the eighth major event is significant gene flow between quadramaculatus E & G (44% excess alleles), which is strongly corroborated in the admixture analyses ( Figure 9).

| Formalin-fixed sequencing
As noted above, extraction and library preparation for the two fluid- and 468095 (25%), the existence of which was hypothesized in our previous study (Pyron et al., 2020). However, we also noted long terminal branches for these specimens in the concatenated ML phylogenetic estimate (see SI) and a large number of unique, apparently homozygous SNP calls in these specimens (Pyron et al., 2022), potentially driven by DNA degradation from fixatives or preservatives (O'Connell et al., 2021). Thus, the branch lengths and "fuscus C" ancestry may be noise arising from template or sequencing errors misleading the sNMF algorithm. Future analyses can test the hypothesis of DNA damage (or other processes such as sequencing or assembly error) by attempting to sequence more loci to greater depth while accounting for DNA degradation (Ginolhac et al., 2011). We did not attempt this here due to the limited extent of capture success for these specimens. Our estimates of candidate species, most or all of which likely merit taxonomic recognition as distinct species, appear to be stable. In contrast, many of these lineages contain extensive genetic structure across geographic space apparently resulting from complex phylogeographic histories, future study of which will likely yield rich insights into speciation processes in the group. As noted above, we refer to distinct population clusters estimated by the admixture analyses, but which intergrade extensively across their contact zones and do not form reciprocally monophyletic groups in the phylogeny, as phylogeographic lineages.

| DISCUSS ION
Crucially, a series of recent empirical studies have highlighted a potentially artifactual process by which species delimitation methods that do not account for gene flow may erroneously inflate estimates by identifying hybrid populations as distinct taxa (Chan et al., 2017(Chan et al., , 2020(Chan et al., , 2022Dolinay et al., 2021). Admixed populations resulting from spatiotemporally proximate hybridization events may consequently produce individuals with similar allele frequencies, distinct from either parental population, that are therefore grouped by clustering and phylogenetic analyses. These populations may even possess distinct mitochondrial haplotypes, not because they are truly distinct evolutionary lineages, but which were instead captured asymmetrically from other lineages during introgression events (Mastrantonio et al., 2016).
We were initially concerned that detailed analyses might reveal such an artifactual origin of many previously delimited candidate species. However, the effect seems limited to two primary instances. The first is marmoratus "G," which occupies an intermediate topological position between quadramaculatus G and marmoratus C + quadramaculatus C, while possessing a genome apparently composed of contributions from all Pisgah lineages except quadramaculatus E1, a small plurality of which is from marmoratus C. Similarly, populations of marmoratus "G" have mitochondrial haplotypes closely related to those of quadramaculatus G, suggesting that they were captured during a past introgression event involving quadramaculatus G and marmoratus C, at a minimum. Consequently, marmoratus "G" may not, in fact, represent a distinct candidate species.
Second, a cluster of four individuals previously assigned to conanti F form a clade occupying a distinct topological position as the sister lineage to carolinensis, rather than nesting within the other sampled specimens of conanti F. Accordingly, the admixture analyses reveal them to have hybrid ancestry consisting primarily of conanti F and carolinensis, but also potentially conanti A, E, and 'beta.' While admixture between genealogically distinct parental populations does not appear to significantly confound delimitation of candidate species in our analyses, it does seem to exert a strong influence on phylogenetic topologies (Degnan, 2018). An extension of the artifacts first reported by Chan et al. (2020Chan et al. ( , 2022 and demonstrated by Dolinay et al. (2021) suggests that hybrid individuals will form ladder-like grades of intermediate topological position between "pure" parental populations in rough proportion to their degree of ancestry from each parent. This effect is corroborated strongly in our analyses, where the topological imbalance of candidate species is significantly related to their degree of admixture. In nearly all clades, highly admixed individuals occupy early-diverging positions in their primary candidate species as they are seemingly pulled toward their secondary candidate species in the tree. This is particularly notable in orestes, conanti, carolinensis, fuscus, and ocoee.
This is essentially the originating process at the population level which ultimately produces the species-delimitation artifacts originally described by Chan et al. (2020Chan et al. ( , 2022; sampling more highly admixed individuals from the clades mentioned would likely end up producing entire false clades in estimated trees. Finally, these same processes appear to be influencing topological estimation of deeper nodes, with ancestral relationships confounded by deep-time reticulation as described by numerous recent authors (Burbrink & Gehara, 2018;Knowles et al., 2018;MacGuigan & Near, 2019). Our previous network-based analyses resolved several instances of gene-tree congruence between mitochondrial and nuclear concatenated and species-tree analyses that apparently resulted from ancestral hybridization (Pyron et al., 2020 sketch of these dynamics in Desmognathus, although more data will be needed to provide more definitive explanations for many of the complex patterns described above. Additional questions of interest at the nexus of phylogeny and phylogeography (Edwards et al., 2016) include distinguishing between primary versus secondary contact (Feder et al., 2013), ancient versus recent admixture (McTavish & Hillis, 2014), and ILS versus introgression (Schaefer et al., 2021;Wang et al., 2018;Zhou et al., 2017). All of these will be crucial in future analyses to better understand Desmognathus relationships, speciation processes, and phylogeographic histories.

| Challenges for understanding species limits
Species delimitation is a challenge that cannot be easily settled by computational algorithms alone (Carstens et al., 2013;Padial et al., 2010;Sukumaran & Knowles, 2017). The interplay of stochastic coalescent variation (Knowles & Carstens, 2007), introgression (Martin et al., 2013), and spatial and ecological barriers to gene flow (Burbrink et al., 2021) can yield strikingly complex scenarios in the "gray zone" of speciation (Matute & Cooper, 2021;de Queiroz, 2007). Accordingly, a range of these scenarios is observed here in Desmognathus. Many if not most candidate species are topologically, genetically, and geographically cohesive. As such, they are spatiotemporally distinct ontological individuals in the evolutionary sense (Ghiselin, 1974;Hull, 1976), and most will likely be recognized as distinct species in future taxonomic revisions. In contrast, we highlight three major groups of candidate species described above presenting additional challenges for interpretation.
The first is the Pisgah clade of marmoratus and quadramaculatus sublineages. Our results suggest complex dynamics potentially indicating a "network radiation" (Kozak et al., 2021) consisting of speciation by hybridization (Mavárez et al., 2006) resulting from genomic processes (Abbott et al., 2013). Regardless, the six lineages estimated here by clustering admixture analyses are reciprocally monophyletic, morphologically diagnosable in some cases (marmoratus vs. quadramaculatus), and appear to have clear genetic and geographic boundaries as candidate species, albeit permeable ones . Therefore, we continue to recognize them as candidate species (with quadramaculatus E containing two phylogeographic lineages), but with marmoratus G having perhaps the strongest evidence for being of hybrid origin, and therefore potentially being considered conspecific with marmoratus C. However, the difficulty of treating multiple morphologically similar but distantly related candidate species that hybridize in parapatry without any apparent geographic, ecological, or microhabitat partitioning (e.g., quadramaculatus C, E, and G) is a thorny conceptual challenge.
A similar situation arises in the ocoee clade, with respect to apalachicolae and ocoee D, E, and F/G/H. These candidate species and their constituent phylogeographic lineages each exhibit signatures of geographically distinct genomic ancestry, while simultaneously exhibiting admixture across broad parapatric contact zones in the southern Blue Ridge mountains. While some peripheral, Piedmont, and Coastal Plains populations exhibit more distinctive phenotypes (Means & Karlin, 1989;Valentine, 1961), the Blue Ridge populations are essentially indistinguishable phenotypically. Given the spatial and genetic complexity of historical evolutionary relationships in this clade, additional genetic and population sampling will be required to establish robust species limits in the group. We continue to treat the candidate species as distinct, with phylogeographic lineages in ocoee D and E. Considering the extensive admixture of apalachicolae A2 with ocoee E2 and the exceptional geographic separation of apalachicolae A1, we consider apalachicolae A1 alone to represent the species Desmognathus apalachicolae (Means & Karlin, 1989) as originally described in the Coastal Plain of Georgia, Alabama, and Florida.
This reverses the earlier conclusions of Beamer and Lamb (2008) based solely on mitochondrial data. Thus, apalachicolae A2 likely represents either an additional phylogeographic lineage of ocoee E (which would then be paraphyletic, albeit in a topology we suspect to exhibit significant artifacts arising from gene flow), or possibly an additional distinct candidate species endemic to the foothills of the Blue Ridge in Georgia.
A third major instance of these patterns occurs with the conanti species group (Figures 21 and 22 (Pyron et al., 2020) and mitochondrial and allozyme data (Tilley et al., 2013).
In contrast to the ocoee group described above but more similar to the Pisgah clade, the conanti-group candidate species in the southern Blue Ridge exhibit several distinct phenotypes, including a smaller "mountain dusky" morphology with a round tail ('beta,' carolinensis, and some conanti A and F populations), a larger "dusky" morphology with a laterally compressed tail ('gamma,' conanti B/C/D), and the distinctively colored santeetlah (Petranka, 2010;Pope, 1924;Tilley, 1981;Tilley et al., 2013). Determining the precise evolutionary history of these populations and their interactions will require additional genomic and population-level sampling. Some empirical and theoretical research has even suggested that, through a series of complex demographic, genomic, and population-genetic processes, such hybrid populations as those observed here may actually serve to filter gene flow between species (Martinsen et al., 2001). This may lead to stable hybrid zones that promote adaptive introgression and prevent lineage collapse (Barth et al., 2020); studying these dynamics (selection in particular) may be revealing for Desmognathus as in other salamanders (Alexandrino et al., 2005;Johnson et al., 2015).
Precise definitions of species limits and descriptions of potentially new species remains a challenge for future studies, although most candidate species demonstrate substantial geographic, genetic, and morphological distinctiveness. As noted above, the Dsuite analyses, previous network analyses (Pyron et al., 2020), and the varying placements of ocoee A & B and carolinensis across studies suggest cross-clade interactions among the conanti, fuscus, and ocoee groups that were not captured by the design of our clade-specific admixture analyses. There are also additional apparent instances of mitochondrial genome capture (e.g., between auriculatus C and fuscus E, between fuscus B and D, and between the stem lineages of the Nantahala and Pisgah clades) that have yet to be estimated in any network or admixture analyses. There are numerous other putative instances of hybridization between distantly related but sympatric or parapatric species pairs (see reviews in Beamer & Lamb, 2020;Tilley, 2016). However, most of these typically occur at low frequency and are generally based on allozyme analyses which are susceptible to electromorphic homoplasy and consequently may not represent real introgression in some cases (Henriques et al., 2016).

| Historical specimens of extirpated populations
In the sNMF admixture analyses, both fluid-preserved specimens were estimated to have a substantial amount (~20%-40%) of hybrid ancestry from some fuscus C sublineages, along with one of the modern samples ( Figure 15). Our previous phylogenetic network analysis (Pyron et al., 2020) using PhyloNetworks  actually estimated a sister relationship between auriculatus A and fuscus C, with the ancestor of the pair receiving 32% of its ancestry from carolinensis. We noted that this relationship was not reflected in our other phylogenetic analyses at the time, nor is it estimated here among our recent specimens in terms of topology, clustering, or admixture, or by any mito-nuclear discordance.
However, we also noted that the extinction of numerous auriculatus A populations (such as the former fuscus subspecies "carri") may have limited our ability to recover this signal (Pyron et al., 2022).
Therefore, the signal of fuscus C ancestry in extirpated peninsular populations of auriculatus A may indeed be a real pattern reflecting historical evolutionary relationships and hybridization. Alternatively, it may reflect the known high error rate of sequencing for formalinfixed specimens (Hykin et al., 2015;Oh et al., 2015), or poor signal from our small dataset. One strategy may be to combine multiple extractions to increase input DNA into capture reactions and sequence available fragments at greater depth. Hopefully, future improvements in extraction and sequencing technologies will increase efficiency and reduce error rate for fluid-preserved specimens and shed additional light on the genomics of these enigmatic extinctions.

| CON CLUS IONS
Our aim in this study was to (i) evaluate the distinctiveness of the 49 previously defined mito-nuclear candidate species of Desmognathus or the presence of any new such groups, (ii) examine these putative taxa for additional phylogeographic lineages and the existence and extent of hybrid zones, and (iii) assess the impact of introgression on the reconstruction of bifurcating phylogenetic topologies. We find that previous estimates have converged on a roughly stable estimate of species-level diversity in the genus and corroborate the existence of 47 candidate species. Many of these candidate species exhibit extensive admixture with each other along their geographic margins, and in many cases with non-sister or even distantly related clades.
Similarly, many candidate species contain significant geographic ge-

ACK N OWLED G M ENTS
For advice on reticulation and phylogenetic networks, we thank E.

CO N FLI C T O F I NTE R E S T
The authors declare that they have no conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
All alignments, code, and results described in this paper are available in the DataDryad repository https://doi.org/10.5061/dryad.f4qrf j6x8. This includes the AHE assemblies and SRA accessions, specimen data and localities, custom R scripts for analysis, and the phylogenetic, clustering, and admixture results. The raw reads for the AHE assemblies are available in SRA PRJNA776902, PRJNA778238, PRJNA778269, PRJNA778013, and PRJNA777585.