Opening the door to greater phylogeographic inference in Southeast Asia: Comparative genomic study of five codistributed rainforest bird species using target capture and historical DNA

Abstract Indochina and Sundaland are biologically diverse, interconnected regions of Southeast Asia with complex geographic histories. Few studies have examined phylogeography of bird species that span the two regions because of inadequate population sampling. To determine how geographic barriers/events and disparate dispersal potential have influenced the population structure, gene flow, and demographics of species that occupy the entire area, we studied five largely codistributed rainforest bird species: Arachnothera longirostra, Irena puella, Brachypodius atriceps, Niltava grandis, and Stachyris nigriceps. We accomplished relatively thorough sampling and data collection by sequencing ultraconserved elements (UCEs) using DNA extracted from modern and older (historical) specimens. We obtained a genome‐wide set of 753–4,501 variable loci and 3,919–18,472 single nucleotide polymorphisms. The formation of major within‐species lineages occurred within a similar span of time (0.5–1.5 mya). Major patterns in population genetic structure are largely consistent with the dispersal potential and habitat requirements of the study species. A population break across the Isthmus of Kra was shared only by the two hill/submontane insectivores (N. grandis and S. nigriceps). Across Sundaland, there is little structure in B. atriceps, which is a eurytopic and partially frugivorous species that often utilizes forest edges. Two other eurytopic species, A. longirostra and I. puella, possess highly divergent populations in peripheral Sunda Islands (Java and/or Palawan) and India. These species probably possess intermediate dispersal abilities that allowed them to colonize new areas, and then remained largely isolated subsequently. We also observed an east–west break in Indochina that was shared by B. atriceps and S. nigriceps, species with very different habitat requirements and dispersal potential. By analyzing high‐throughput DNA data, our study provides an unprecedented comparative perspective on the process of avian population divergence across Southeast Asia, a process that is determined by geography, species characteristics, and the stochastic nature of dispersal and vicariance events.


| INTRODUC TI ON
Avian biogeography in continental Southeast Asia, an area including the mainland and continental islands, has a long history of study (Deignan, 1945;Smythies, 1953Smythies, , 1960Wells, 2007). This base of knowledge has been augmented in the last 15 years by a steady stream of molecular phylogenetic reconstructions that have identified a complex pattern of colonization into, out of, and within the region (e.g., Moyle, Andersen, Oliveros, Steinheimer, & Reddy, 2012;Oliveros, Field, et al., 2019;Wang, Kimball, Braun, Liang, & Zhang, 2017), and which have substantially improved Southeast Asian bird classification (Cai et al., 2019;Cibois et al., 2018;Cibois, Kalyakin, Han, & Pasquet, 2002;Fuchs, Pasquet, Couloux, Fjeldså, & Bowie, 2009;Lim et al., 2019;Moyle et al., 2012;Sangster, Alström, Forsmark, & Olsson, 2010;Zhang et al., 2016). However, phylogenetic studies are imprecise when it comes to identifying the drivers of diversification and extinction, such as changes in gene flow and population sizes, because they span large temporal and spatial scales. On the other hand, phylogeographic investigations within species and species groups provide a better understanding of proximate mechanisms of avian diversification, spatial structuring of genetic diversity, and even adaptive variation within species (Rissler, 2016). Focusing genetic sampling within species and on recent evolutionary history also produces better data for resolving the effects of environmental, geographic, and geological changes on populations. Unlike phylogenetic studies, however, phylogeographic studies of Southeast Asian birds are still relatively rare and usually limited in geographic scope, largely because of inadequate availability of population samples for comparison. The general lack of geographically comprehensive datasets from multiple taxonomic groups (e.g., plants, insects, mammals) has hindered our ability to compare and synthesize information into a complete picture of conditions and factors that have shaped population structure in this important tropical region.
Almost all avian phylogeographic research in Southeast Asia has relied on mitochondrial DNA (mtDNA) comparisons (except Garg et al., 2016;Gwee et al., 2019;Lim et al., 2017;Manthey et al., 2017). Some studies have also compared a small number of Sangersequenced nuclear genes, but these rarely provide much information at the population level. Although mtDNA has many strengths, such as simple maternal inheritance and a rapid rate of evolution (Tamashiro et al., 2019;Zink & Barrowclough, 2008), it can yield inaccurate phylogeographic inferences, as has recently been demonstrated by three genomic studies of Bornean bird populations (Campillo, Oliveros, Sheldon, & Moyle, 2018;Lim et al., 2017;Manthey et al., 2017) that revisited earlier mtDNA studies (Gawin et al., 2014;Moyle et al., 2011). More importantly, as a single locus, mtDNA cannot provide much insight into key population genetic parameters, such as gene flow, genetic admixture within individuals, and timing of population (vs. gene) divergence (Ballard & Whitlock, 2004). The increased application of strategies for obtaining highly multilocus genomic datasets promises to yield more accurate and detailed phylogeographic inference from Southeast Asian birds.
Another reason mtDNA has featured prominently in phylogeographic studies is that the data are relatively easy to obtain from traditional museum specimens (Payne & Sorenson, 2003). Inclusion of these "historical" samples improves geographic coverage over the reliance solely on newly collected specimens. However, nowadays historical specimens can also provide comprehensive genomic data (Bi et al., 2013). The acquisition of data from historical specimens requires that such specimens exist for given populations and yield DNA of adequate quality, which is often not the case.
Historical specimens of birds from Southeast Asia generally represent only a small portion of a species' distributions, are old, and poorly documented, and their DNA is often severely degraded and geography, species characteristics, and the stochastic nature of dispersal and vicariance events.

K E Y W O R D S
Indochina, Isthmus of Kra, population genetics, rainforest birds, Sundaland, ultraconserved elements may require substantial analytical modification (Lim & Braun, 2016).
Nevertheless, the extraction of genome-scale data from traditional specimens may permit more extensive phylogeographic inference for some species in the region.
Here, we present a phylogeographic comparison of five bird species codistributed across two Southeast Asian biogeographic subregions: Indochina, that is, easternmost India, Myanmar, Thailand, Cambodia, Laos, Vietnam, and westernmost China; and Sundaland, the Sunda continental shelf and its constituent lands, including the Malay Peninsula, Borneo, Sumatra, Java, and Palawan ( Figure 1). The five species, representing five passerine families, are little spiderhunter Arachnothera longirostra (Nectariniidae), Asian fairy-bluebird Irena puella (Irenidae), black-headed bulbul Brachypodius atriceps (Pycnonotidae), large niltava Niltava grandis (Muscicapidae), and gray-throated babbler Stachyris nigriceps (Timaliidae). These species were selected because they (a) are widespread and largely codistributed in continental Southeast Asia, and are usually considered single species, as opposed to groups comprising allospecies (the Palawan population of Irena is sometimes an exception); (b) are common and thus well represented in collections, a necessity for historical sampling; and (c) represent distinct ecological types and thus vary in dispersal potential and potentially genetic differentiation across space. Arachnothera longirostra, I. puella, and B. atriceps are eurytopic nectarivorous/insectivorous or frugivorous/insectivorous species that range widely among habitats and in elevation (Sheldon, Moyle, & Kennard, 2001;Wells, 2007). Stachyris nigriceps and N. grandis are insectivores inhabiting hill and submontane forest. As such, their dispersal potential is expected to be more habitat-restricted than the first three species, a feature that has been linked with greater geographic structuring (Burney & Brumfield, 2009;Chua et al., 2017). Examination of intraspecific diversity in these five species provides a suite of examples of how species with varied ecologies have responded to the sea-level and habitat changes that have occurred in Southeast Asia during the cyclic global glaciation events of the Pleistocene Woodruff & Turner, 2009).
We assayed the genetic diversity of the study species using DNA sequences linked to thousands of ultraconserved elements (UCEs; Faircloth et al., 2012). This approach has proven useful for generating data from historical as well as modern specimens (Lim & Braun, 2016;McCormack, Tsai, & Faircloth, 2016;Ruane & Austin, 2017) and for both phylogenetic Oliveros, Field, et al., 2019) and population-level studies (Harvey, Aleixo, Ribas, & Brumfield, 2017;Smith, Harvey, Faircloth, Glenn, & Brumfield, 2014). We used these data to estimate population genetic structure in each species as well as population histories including divergence times and the incidence of past gene flow between populations. Our primary goals were (a) to evaluate the degree to which patterns of differentiation are concordant or discordant across species with diverse ecologies, and (b) to assess whether population genetic structure or demographic events were associated with F I G U R E 1 Relief map of the study region (darker = high elevation) and names of various geographic places (a). Localities of samples used (red boxes) and distributions of the five study species (b-f). For each study species, subspecies ranges are delineated based mainly on textual descriptions in Dickinson and Christidis (2014) Woodruff, 2003;Woodruff & Turner, 2009), (b) the biogeographic disjunction resulting in an east-west divide in central Indochina (Fuchs et al., 2015;Manawatthana et al., 2017;Reddy & Moyle, 2011), (c) the repeated island isolation and connection in Sundaland in the Pleistocene (Lim et al., 2017;Sheldon et al., 2015), and (d) the habitat requirements and corresponding dispersal potential of individual species (Chua et al., 2017;Lim et al., 2017). We discuss the potential importance of these drivers of evolution in Southeast Asian and identify novel patterns and processes in need of further study.

| Sampling, laboratory work, and data generation
Details on sample collection, molecular laboratory work, and generation of DNA sequence data were described in Lim and Braun (2016). Briefly, we obtained tissue samples (through museum loans or collecting) of 194 individuals belonging to the five study species (Arachnothera longirostra, Irena puella, Brachypodius atriceps, Niltava grandis, and Stachyris nigriceps) and nine outgroup taxa (Table 1). To obtain an appropriate distribution of specimens for comparison, we partitioned the study region into 28 subregions and attempted to obtain, given availability of existing specimens, an even sampling across them ( Figure S1). These areas were delineated based largely on published bird distributions or areas of endemism (King, Woodcock, & Dickinson, 1975;Reddy, 2005;Robson, 2005). DNA was extracted from toe pads of historical specimens using phenol and chloroform or from fresh blood or tissue samples using Qiagen DNeasy Blood and Tissue Kits or an Autogen Gene Prep machine. DNA extracts were ligated with dual-indexed Illumina adapters and enriched for ultraconserved element (UCE) loci using the 5472 120-mer myBaits tetrapod capture probe set from MYcroarray, Inc., now Arbor Biosciences (Faircloth et al., 2012 . longirostra, 41 I. puella, 25 N. grandis, 30 P. atriceps, and 31 S. nigriceps (and up to two outgroup samples per species; Table 1). Of the successfully sequenced ingroup samples, 78.1% were historical, that is, from toe pads obtained from bird study skins that were collected between 1873 and 1986. The remaining samples were derived from muscle or blood collected fresh in the field (either frozen or in preservative) and archived in museum genetic resource collections.

| Bioinformatics processing and data analysis
The bioinformatics workflow to generate genotype data for each species group was to: (a) build a pseudoreference genome for each species using contigs from a subset of individuals (10-15 individuals per species) using Phyluce version 1 and Geneious version 7.0.6 (Faircloth, 2016) (Lim & Braun, 2016). Using map-Damage version 2.0, we observed characteristic postmortem damage in the DNA of historical samples, which includes higher rates of C →T and G →A misincorporations at the 5′ and 3′ ends of reads, respectively, due to increased deamination of C residues along single-stranded overhangs (Jonsson, Ginolhac, Schubert, Johnson, & Orlando, 2013;Parks & Lambert, 2015). In addition to stringent SNP and genotype filtering, we overcame these issues using soft-clipping. Specifically, we used very sensitive local alignment that allowed for soft-clipping of ends of reads during Bowtie read mapping (see details in Lim & Braun, 2016). This option removed most of the C → T or G → A errors because they tend to occur near the beginning of each read (Gilbert et al., 2003). Finally, we output data in variant call format (VCF) files that were then subset and parsed to conduct a variety of downstream analyses.

| Phylogenetic network of concatenated SNPs
We converted the VCF file for each species into a multisequence alignment file in fasta format (length: 1,466-10,062 bp), concatenating all the SNPs using PGDSpider version 2.1 (Lischer & Excoffier, 2012). For each individual, heterozygous sites were collapsed into IUPAC ambiguity codes and indel variants were ignored (i.e., not converted into sequence data). We then trimmed the data to generate data matrices that were 80% full (i.e., no alignment columns had more than 20% unknown bases, N) using a custom script (prune_Q _pub.py, Note: Modern genetic samples (preserved frozen tissues or blood) are labeled as "modern" under year collected. If the date of collection is unknown, it is labeled as "?". Region = sampling region indicated in Figure S1. Cluster = genetic cluster defined in Figure 2. Labels of samples match those found in tip labels of phylogenetic networks in Figure 2.
TA B L E 1 (Continued) see Data Availability). Next, we used jmodeltest version 2.1.3 to find the best substitution model for each alignment using these options: five substitution schemes, unequal base frequencies and no rate variation among sites, and tree search strategy = best (Posada, 2008).
The best models were selected based on the Bayesian information criterion (BIC). Following this step, we used the jmodeltest substitution models and sequence alignments in SplitsTree version 4.12.6 to calculate pairwise genetic distances, which were in turn used to generate phylogenetic networks with the NeighborNet algorithm (Huson & Bryant, 2006). Each NeighborNet network is a collection of splits, with each split representing one bipartition of the taxa. To simplify the networks and reduce the number of branches, we filtered splits by removing any split whose weight fell below a given threshold (5 × 10 −4 , Table 2).

| Structure analysis
To simultaneously identify genetic cluster membership and genetic admixture of individuals, we analyzed the independent-SNPs data with the MCMC-based program Structure version 2.3.4 (Pritchard, Stephens, & Donnelly, 2000). For each species, we ran Structure using the admixture model with correlated allele frequencies and by specifying K (number of genetic clusters) from 1 to 8. For each K value, we conducted 10 independent runs, setting burn-in and number of MCMC steps to 100,000 each.
We determined an optimal K value for each species using the ΔK method described by Evanno, Regnaut, and Goudet (2005), which was implemented in the Structure Harvester version 0.6.94 web server (Earl & vonHoldt, 2012). Because of complexities in real-world population structure (e.g., hierarchical structure) and assumptions made by

| Population branching and delimitation analyses
We ran SNAPP species tree analysis in BEAST version 2.3.3 to infer branching patterns of populations in each species using the independent-SNPs datasets and populations delineated in DAPC analyses (Bryant, Bouckaert, Felsenstein, Rosenberg, & RoyChoudhury, 2012). We ran two analyses for each of the five study species. One of the analyses included only ingroup samples; the other had an outgroup taxon included (except for Stachyris, for which we lacked outgroup data). We used the resulting topologies to guide population lumping in BFD* analyses and tree building for G-PhoCS analyses (see below). We ran each SNAPP analysis for at least one million generations, sampling every 1,000 generations. The first 10% of the MCMC chains were discarded as burn-in, and log-normal distributions were used for lambda and rate priors.
We conducted Bayes factor delimitation of populations using BFD* implemented in BEAST version 2.3.3 to evaluate support for alternative sample assignment schemes in which populations are increasingly lumped together Leache, Fujita, Minin, & Bouckaert, 2014).

| Estimation of demographic parameters and divergence time
To jointly estimate the demographic parameters related to the di- For each species' G-PhoCS analyses, the underlying guide tree (population divergence) topology was derived from the results of SNAPP analyses. We used population divergence models in which bidirectional gene flow between pairs of modern, geographically adjacent populations was allowed (migration model) and diffuse gamma priors were set for all demographic parameters. The shape (α) and scale (β) parameters that define gamma prior distributions for each of the demographic parameters were as follows: α = 1 and β = 5,000 for both θ and τ, and α = 1 and β = 0.01 for migration rates.
For each species, we conducted some initial MCMC runs with auto-fine-tuning to evaluate run speed, convergence, and mixing. To To obtain demographic parameters in absolute units (e.g., number of individuals and number of years) rather than in substitution-rate-scaled units, we estimated an approximate substitution rate for the UCE loci.
To do this, we used the time-calibrated phylogenetic tree of oscine songbirds by Moyle et al. (2016). This tree is based on data generated using the same UCE probe set (Mycroarray myBaits Tetrapod UCE 5K)

F I G U R E 3
Results of Structure analysis for each of the five study species, based on the optimal K values (  (Saether et al., 2007). This estimate was used to convert the following mutation-rate-scaled demographic parameters produced

| RE SULTS
We obtained an average of 6.88 million (SD = 4.86 million) qualitytrimmed and filtered reads per individual in our final dataset (N = 162).
After SNP and genotype filtering, the total number of SNPs per species ranged from 3,919 to 18,472, and the number of variable UCE loci ranged from 753 to 4,051 (  (Jonsson et al., 2013). However, we were able to significantly reduce its impact on genotype accuracy by trimming read ends during read mapping (local alignment, see Section 2). Our analyses showed that this approach reduced C →T or G →A substitutions near the beginning of each read from an initial 10% to 25% down to 1%-2%, a rate similar to that of other types of nucleotide substitutions in the untrimmed portions of the reads (Lim & Braun, 2016; Figure S9).

| Phylogenetic networks
The number of SNPs used to construct phylogenetic networks ranged

| Population structure and differentiation
Principal components explain a substantial amount of population genetic variation in each species, with eigenvalues of the first two PCs accounting for 24.0% (B. atriceps) to 53.6% (A. longirostra) of the total variance ( Table 2) (Table 1 and Figure S2, no. 33) occurs along axis one at a location intermediate between the Indochinese and the Borneo/Sumatra cluster. In

| Population history and demography
SNAPP species trees revealed the population branching history in each species, the details of which are described below in conjunction with the demographic results (see also Figure 4). BFD* analyses most strongly support (i.e., have the least negative marginal likelihood for) the most-split population assignment model in each species (Table 4). Within a species, the Bayes factor (2 × log e BF) of the best model differs by at least 40 from other models, indicating "very strong" support (Leache et al., 2014). In addition, Bayes factors indicated that incremental lumping of populations based on SNAPP tree topology resulted in increasingly poor models, thus supporting the hierarchical relationships in the SNAPP trees ( substitutions/year (SD = 3.14 × 10 −5 ) for our UCE dataset ( Figure S5).
Based on inspection of MCMC trace plots with Tracer, all G-PhoCS runs converged and displayed appropriate mixing after about 10%-35% of the total run length, which we removed as burn-in for all parameter estimates. In general, estimates of θ(= 4N e μ, population mutation rate) for modern (vs. inferred ancestral) populations tended to have higher ESS values compared to estimates of other parameters (e.g., migration rates, Table S1). The G-PhoCS dataset of each species either included all the loci that passed filtering (up to 3,640) or a smaller subset (≥2,200 loci) because of computer memory issues with the program (Table 5).
In the study species, the oldest divergence time between two populations is c. 1 myr or more ( Figure 4).  Table 5.

| D ISCUSS I ON
We conducted sequence capture and high-throughput sequencing of historical and modern DNA samples from populations of five codistributed species of Southeast Asian rainforest birds: Arachnothera longirostra, Irena puella, Brachypodius atriceps, Niltava grandis, and Stachyris nigriceps. Because of the large amount of data and broad sampling, the analysis provides an unprecedented comparative perspective on the range-wide genetic structure and population history of these species.
Overall, we discovered population patterns that were expected based on well-known geographic structures (e.g., the Isthmus of Kra), some that were consistent with habitat requirements of the species (e.g., structuring in eurytopic vs. stenotopic species). Sequencing of samples from areas that are previously unstudied also yields novel insights (e.g., India populations of both A. longirostra and I. puella are nested within, but strongly differentiated from, their respective Indochinese populations). We combined coalescent analyses and large amounts of data to produce highly resolved estimates of population demography and divergence parameters. These estimates support earlier findings that are largely based on mtDNA data.

| Impact of marker choice
Marker choice and the fit of data to evolutionary models used for analysis can impact results and inferences. The markers selected here, sequences linked to ultraconserved genomic regions, are useful because the same regions can be sequenced across taxa, reducing the systematic biases across species that are introduced by the interaction of sequence assembly algorithms with genetic diversity in datasets of nonoverlapping markers (Harvey et al., 2015). However, the conservation of UCEs is likely associated with strong purifying selection (Katzman et al., 2007), which might impact diversity at nearby variable sites used for analyses. If the impacts of purifying selection on these datasets result in a poor fit to evolutionary models used for analysis, which often assume neutrality, it may bias estimates within or across species. Accumulating evidence suggests, however, that few if any regions of the genome are free from selection (Cariou, Duret, & Charlat, 2016;Kern & Hahn, 2018). Thus, any genomic data might be subject to some degree of model misspecification. In fact, UCEs provide a better fit to some evolutionary models that assume neutrality than other markers, such as exons (Reddy et al., 2017). In addition, comparisons across species are likely reliable if all of the datasets are similarly impacted by selection. For example, both Tajima's D and Fu and Li's D are negative in all populations for which we had large samples (Table 5).
Together, these results suggest that inferences within and across species from markers linked to UCEs are likely to be no more biased than those from other classes of markers or genomic regions.

| Correspondence with hypothesized drivers of population structure in Southeast Asia
Several geographic features, events, and forces are thought to have played an important role in the diversification of Southeast Asian organisms. The effect of the most important of these on the population structure of the five target species is summarized here.

| The Isthmus of Kra as a population barrier
The north-south split that separates Indochinese and Sundaic avifaunas (and other groups of organisms) has intrigued biogeographers for generations (Wallace, 1876). The traditional view of this division is that the Isthmus of Kra (usually identified as the line connecting Chumphon and Ranong, Thailand, c. 10.5°N99°E), or the region just north of it, forms the interface between Indochinese and Sundaic biogeographic regions (Holt et al., 2013). This transition was thought to result from ancient marine transgressions (vicariance) at the Isthmus and/or a change in vegetation Woodruff, 2003). The former hypothesis has subsequently been rejected by Woodruff and Turner (2009); the latter is still viewed as a potentially important force in separating rainforest plant and animal species north and south (Baltzer & Davies, 2012

TA B L E 4
Results of BFD* analyses Figure 1) and from seasonal rainforest to mixed deciduous forest further north. It is driven mainly by variation in rainfall ( Figure S6A; van Steenis, 1950).

| The east-west transition in central Indochina
Two of the target species, S. nigriceps and B. atriceps, exhibit a distinct east-to-west genetic break in Indochina (Figure 2). In S. nigriceps, the east-west divide is especially well resolved spatially.
It occurs in the northern portion of the Tenasserim Range (the mountains that separates the majority of Myanmar from Thailand) TA B L E 5 Number of loci, and alignment length, and population genetics metrics of UCE loci used in G-PhoCS analyses  (Fu & Li, 1993;Tajima, 1989). or the Salween River, which runs along the western side of the Tenasserim Range into the Andaman Sea. In B. atriceps, the two northernmost individuals in the western Indochina group were collected in eastern Bangladesh and, therefore, the location of the split between the eastern and western groups is not as clear as in S. nigriceps. We suggest it may lie in the Tenasserim Range or the dry Irrawaddy Plains of central Myanmar.
The east-west genetic divide exhibited by S. nigriceps in Indochina was identified in the past based on species or subspecies distributions of other taxa (Deignan, 1945;Smythies, 1953). The

| Island connection and disconnection in Sundaland
The Greater Sunda Islands and the mainland experienced repeated connection and disconnection during the Pleistocene because of eustatic sea-level changes associated with periodic global glaciation events (Whitmore, 1981). Corresponding to these climatic events, habitat type, position, and coverage changed dynamically both on and between the islands (Cannon, Morley, & Bush, 2009;Morley, 2012).
These dynamics are known to have influenced rainforest bird populations in variable ways, resulting in (a) little structure among some Sundaic populations because of gene flow during land-bridge connections, (b) substantial structure among some island populations because of presumed habitat barriers (forests with relatively open canopy) across land bridges or dispersal limitation, (c) substantial population structure within Borneo because of early Pleistocene isolation followed by presumed more recent dispersal and secondary contact, and (d) substantial variation among and within islands because of paraphyly or pre-Pleistocene divergence among populations/species (Lim et al., 2017Moyle et al., 2011;Sheldon et al., 2009).
Our current study finds some concordance between population structure in Sundaland, and dispersal ability and habitat requirements of species. Across Sundaland, B. atriceps exhibits the least amount of structure. It is a partially frugivorous species that often forages along forest edges and may thus be more vagile than the other study species (Fishpool & Tobias, 2005). The lack of genetic distinctiveness is true even for the sole individual representing the baweanus species, which is a gray-morph bird (USNM 181552; no. 12, Figure S2) collected from Bawean Island in the Java Sea. The Bawean Island lies on the Sunda Shelf and was connected to other Sunda landmasses when sea level was approximately 50 m below current level (Voris, 2000). The other two eurytopic species (A. longirostra and I. puella) show more pronounced population structure in Sundaland, with peripheral Sunda islands (Palawan and/or Java) containing populations that diverged the earliest (Figure 4).

| Implications of divergence times and demographic histories
The timing of the deepest splits within our study species ranged from c. 1 to 1.5 mya (Figure 4). Such large within-species divergence values are not unusual in the tropics (Smith, Seeholzer, Harvey, Cuervo, & Brumfield, 2017), including Southeast Asia.
For example, within the bulbul species Iole propinqua (eastern Indochina) and I. viridescens (western Indochina), the deepest estimated population divergence times are 0.9 and 1.7 mya, respectively (Manawatthana et al., 2017). Similarly, between subspecies of Alophoixus ochraceus in the Thai-Malay peninsula and eastern Thailand/Vietnam, the divergence time is estimated to be 1.2 mya (Fuchs et al., 2015). Leonard et al. (2015) (Esselstyn et al., 2010;Lim et al., 2014). Palawan came to its current location northeast of Borneo c. 10 mya (Hall, 2009) but likely experienced periodic rainforest contraction, even as recently as 21 kya (Wurster et al., 2010), which would have affected population demography of its rainforest species. Lim et al. (2014) reviewed mtDNA divergence levels between pairs of avian sister taxa found on northern Borneo and Palawan, and they fell mainly into two groups: deep (7.9%-14%) and shallow (0.3%-0.9%  (Fuchs et al., 2015). A corresponding split in the Pycnonotus melanicterus bulbul species group, however, is dated at only c. 0.4 mya .
The east-west divide of B. atriceps is deep (1.4 mya), despite a low relative measure of population differentiation (F ST = 0.061).
This disparity may be caused by this species' unusually large N e , which would have slowed the rate of genetic drift. Using mtDNA data, Dejtaradol et al. (2016, Figure 3d) estimated the same split to be around 3 mya, but this assumes their northern lineage is equivalent to our eastern Indochina lineage. Within Sundaland, the ge-

| Future prospects for Southeast Asian phylogeography
Because of poor sampling in Southeast Asia, species-level studies usually present incomplete pictures of population structure and history across the region. To ameliorate this problem, researchers have turned to comparing mtDNA from historical museum specimens, but mtDNA comparisons can yield incorrect measures of genealogical relationships and they generally provide limited population genetic information (Funk & Omland, 2003). By applying next-generation sequencing techniques to historical DNA extracted from museum specimens, as well as modern samples, our study represents a breakthrough for range-wide phylogeographic studies in Southeast Asia. We were able to substantially improve geographic sampling and sampling of the genome. However, we still have a long way to go to achieve an in-depth understanding of Southeast Asian phylogeographic history. Facilities and expertise for conducting high-throughput DNA sequencing on historical samples are currently limited, and the cost is likely prohibitive for many researchers (Bi et al., 2013). Moreover, although we managed to use historical specimens in our study, the quality of DNA from these samples is naturally inferior to that from freshly collected samples. Historical museum specimens produce sequences that are often fragmentary and require substantial error correction. Further, although existing museum specimens improve geographic coverage, they may still represent patchy or biased samples of species' distributions. We believe strongly that extensive modern sampling is required, sampling that preserves as much information from the specimens as possible (e.g., skeletal and muscular structure, stomach contents, parasites, microbiomes, RNA molecules, proteins, soft parts) and which may be applied to studies of diet, parasitology, toxicology, epidemiology, etc., as well as helping to solve phylogeographic issues (Webster, 2017). Modern collections are essential not only for the study of evolution and systematics, they also become important snapshots in time as Southeast Asia experiences unprecedented changes to its natural environment. History (Kristof Zyskowski). We are grateful to three anonymous reviewers for providing helpful comments on the manuscript.

CO N FLI C T O F I NTE R E S T
None declared.