Evaluating hybridization capture with RAD probes as a tool for museum genomics with historical bird specimens

Abstract Laboratory techniques for high‐throughput sequencing have enhanced our ability to generate DNA sequence data from millions of natural history specimens collected prior to the molecular era, but remain poorly tested at shallower evolutionary time scales. Hybridization capture using restriction site‐associated DNA probes (hyRAD) is a recently developed method for population genomics with museum specimens. The hyRAD method employs fragments produced in a restriction site‐associated double digestion as the basis for probes that capture orthologous loci in samples of interest. While promising in that it does not require a reference genome, hyRAD has yet to be applied across study systems in independent laboratories. Here, we provide an independent assessment of the effectiveness of hyRAD on both fresh avian tissue and dried tissue from museum specimens up to 140 years old and investigate how variable quantities of input DNA affect sequencing, assembly, and population genetic inference. We present a modified bench protocol and bioinformatics pipeline, including three steps for detection and removal of microbial and mitochondrial DNA contaminants. We confirm that hyRAD is an effective tool for sampling thousands of orthologous SNPs from historic museum specimens to describe phylogeographic patterns. We find that modern DNA performs significantly better than historical DNA better during sequencing but that assembly performance is largely equivalent. We also find that the quantity of input DNA predicts %GC content of assembled contiguous sequences, suggesting PCR bias. We caution against sampling schemes that include taxonomic or geographic autocorrelation across modern and historic samples.

Although high-throughput sequencing has already proved widely useful for incorporating museum specimens into phylogenomic studies (Besnard et al., 2015;Burbano et al., 2010;McCormack et al., 2012), its application for collecting genome-wide markers at the population level has lagged behind its use for addressing questions at deeper evolutionary time scales due to limitations in the most commonly employed library preparation methods for reduced-representation Illumina sequencing (Suchan et al., 2016). The limitations of historic museum samples include their high degree of fragmentation and low concentration of long DNA fragments, which reduces the amount of flanking sequence that can be captured using ultraconserved element probes  and lowers the likelihood that multiple restriction digest recognition sequences are retained in a given DNA fragment (Baird et al., 2008;Peterson, Weber, Kay, Fisher, & Hoekstra, 2012). Only in the past few years have library preparation protocols suitable for population genomics become available (Bi et al., 2013;Jones & Good, 2016;McCormack, Tsai, & Faircloth, 2016), but their recent proliferation has meant that few have yet to be applied to multiple study systems in independent laboratories (McCormack et al., 2016).
As a result, our understanding of the efficacy and biases of different approaches to reduced-representation genome sequencing from degraded DNA remains incomplete relative to either Sanger sequencing (Soltis & Soltis, 1993;Wandeler et al., 2007) or high-coverage, singlesample whole genome sequencing (Poinar et al., 2006).
One promising but under-tested approach to museum genomics suitable for population-level studies is hybridization capture of restriction site-associated DNA (RAD) probes (hyRAD) (Suchan et al., 2016). Briefly summarized, the hyRAD method uses fragments produced by a double digest RAD (ddRAD) protocol (Peterson et al., 2012;Suchan et al., 2016) as the basis for biotinylated probes that capture orthologous loci in other samples, allowing them to be enriched and indexed for pooled Illumina sequencing. Although the method requires a high molecular weight DNA sample to produce the probe set, hyRAD offers advantages over other targeted capture methods in requiring no prior knowledge of the organism's genome, such as transcriptome data or pre-existing sequences for probe design (Bi et al., 2013;McCormack et al., 2016). Additionally, because hyRAD relies on hybridization capture of orthologous regions across samples rather than retained restriction-site recognition sequences, the method mitigates the concerns of allelic dropout due to polymorphisms at restriction sites with increasing phylogenetic distance intrinsic to other RAD-based protocols (Gautier et al., 2013).
In their original paper, Suchan et al. (2016)  (Lycaena helle) and grasshopper (Oedaleus decorus). They discussed the impact of library preparation, sample type, and bioinformatics pipeline on the number of SNPs produced. Here, we provide an independent assessment of the effectiveness of hyRAD using both fresh avian tissues and dried tissue taken from museum specimens up to 140 years old. We present a modified version of the hyRAD protocol aimed at increasing efficiency and minimizing reagent use and employ a custom bioinformatics pipeline with steps for detecting and removing microbial contamination in raw reads, contiguous sequences, and SNPs. We utilize hyRAD data to describe phylogeographic patterns in a New Guinea forest kingfisher (Syma torotoro) and we expand the available description of hyRAD's performance by investigating how variable input DNA affects sequencing, assembly, and population genetic inferences.

| Study species, sampling, and DNA extraction
A major promise of museum genomics is the ability to conduct population-level studies in regions that are too logistically difficult to be amenable to broad modern sampling programs. The island of New Guinea is an apt example of this scenario, with poorly known biodiversity, large historical collections, rugged terrain, and ongoing political instability (Mack & Dumbacher, 2007;Pratt & Beehler, 2014). Phylogeographic research in New Guinea has been limited (Deiner, Lemmon, Mack, Fleischer, & Dumbacher, 2011;Dumbacher & Fleischer, 2001), especially in the species inhabiting the island's ring of lowland tropical rainforest. To evaluate the efficacy of for use in a broader study of the phylogeography of lowland new Guinea, we sampled 21 individuals of forest interior resident S. torotoro (Yellowbilled Kingfisher), representing five named subspecies and the breadth of the species' range on the island of New Guinea (Table 1).
For seven individuals, we extracted whole-genomic DNA from fresh tissue using a DNeasy tissue extraction kit (Qiagen, Valencia, CA, USA) following the manufacturer's protocol. For the remaining 14 individuals, we extracted DNA from the toe-pads of museum study skins in a dedicated ancient DNA laboratory at the California Academy of Sciences using a phenol-chloroform and centrifugal dialysis method described elsewhere (Dumbacher & Fleischer, 2001). No modern DNA or post-PCR products are handled in this laboratory, which is located on a separate floor from the main genetics facility.

| Library preparation, hybridization capture experiments, and sequencing
We prepared samples for reduced-representation whole genome sequencing using a modified version (Hanna & Sellas, 2016) of Suchan et al.'s (2016) hyRAD method aimed at increasing efficiency Beverly, MA, USA) with a target peak at 270 bp and "tight" size selection range. We ran 16 cycles of real-time polymerase chain reaction (RT-PCR) and purified products by gel excision and a Zymoclean Gel Recovery Kit (Zymo Research). We preserved one aliquot of this product for sequencing while performing an additional MluCl/SphI double digest to de-adapterize a second aliquot. We labeled this deadapterized aliquot with biotin-14-dATP, using a BioNick DNA Labeling System (Thermofisher Scientific).
To produce whole genome libraries, we sheared high molecular weight DNA from modern tissue samples to ~400 bp using a M-220 Focused-ultrasonicator (Covaris). DNA from museum specimen toepads was already fragmented as a product of natural degradation associated with the age of the samples and was therefore left untreated. We then bound the probes to the beads by mixing and incubating at room temperature for 30 min. After performing three 500 μl washes of the bead-probe mixture using a prewarmed buffer containing 10% SDS with 0.1× SSC, we concentrated our final pooled libraries in 30 μl 10 mm Tris-HCl, 0.05% Tween-20 (pH 8-8.5). We next amplified these libraries using RT-PCR for 9-2 cycles, cleaned using 1.2× Ampure XP beads, and quantified using Qubit. We sent a single final pool with equimolar amounts of all three hybridized pools to University of California Bekeley's QB3 Vincent J. Coates Genomics Sequencing Laboratory (hereafter called "QB3") for sequencing with 100-bp paired-end sequence reads on a single lane of an Illumina HiSeq 4000.

| Sequence read quality control, assembly, and alignment
To clean and quality filter reads, assemble reads into contigs, align sequences across samples, and map reads to merged alignments for SNP discovery, we used a custom pipeline combining in-house R scripts as well as pre-existing genomics tools and wrapper scripts from QB3's two de novo targeted capture bioinformatics pipelines (https://github. com/CGRL-QB3-UCBerkeley; "denovoTargetCapturePopGen" and "denovoTargetCapturePhylogenomics"). We present our full pipeline online as both a tutorial and a list of shell commands (https://github. com/elinck/hyRAD/) ( Figure 1).
We first processed reads from our probe library with pyRAD version 2.17 (Eaton, 2014) to create a pseudo-reference genome to use as the basis for aligning sequences from samples in our hybridization capture reactions. After quality-filtering reads and trimming adapter contamination, pyRAD used the vsearch algorithm (Rognes, Flouri, Nichols, Quince, & Mahé, 2016) to cluster reads into loci within samples, cluster loci into stacks between samples, and aligned putatively orthologous loci using MUSCLE version 3.8 (Edgar, 2004). We implemented strict adapter filtering, retained reads longer than 70 bp after trimming, set a minimum sequence identity threshold of 97% for clustering, and kept four sites per cluster with a Phred Quality Score <20.
(This strict identity threshold was selected to generate an estimate of the total number of fragments in our probe library.) We removed repetitive genomic regions and paralogs with the NCBI BLAST+ version 2.4 tool BLASTn (Boratyn et al., 2013) by aligning the output file of assembled clusters against itself and retained only cluster sequences that aligned uniquely to themselves using an e-value of 0.00001.

| SNP discovery
Traditional SNP calling algorithms based on allele counting and quality scores are characterized by high degrees of uncertainty with lowcoverage sequence data (Korneliussen, Albrechtsen, & Nielsen, 2014).
We incorporated uncertainty into genotype estimation by calling SNPs and estimating allele frequencies using an empirical Bayesian framework implemented in the software ANGSD version 0.913 (http:// www.popgen.dk/angsd/index.php/ANGSD). ANGSD uses the likelihood of all 10 possible genotypic configurations for each site passing quality filters in all individuals to estimate a site frequency spectrum, which is then used as a prior to estimate the posterior probabilities for all possible allele frequencies at each site in each sample. Using these estimates, we called SNPs with a 95% probability of being variable and a minimum minor allele frequency of 5%.

| Contamination control and data filtering
In order to identify if any contigs in our assemblies represented offtarget mtDNA captures, we performed a BLAST+ (Boratyn et al., 2013) nucleotide search with each of our assemblies as a query against a database of the full mitochondrial genome of S. torotoro relative Halcyon santcus. We then removed all matching contigs from each sample's assembly fasta with in-house R scripts ("excerptcontigIDs.R" and "cutcontigsbatch.R") and used these mtDNA-free sequences for all subsequent assembly performance calculations. To prepare our sequence alignment in.sam/.bam format for SNP calling, we followed Bi et al. (2013)  Because our SNP matrix was derived independently from our assemblies, we performed a second step of contamination filtering by removing SNPs originating from read alignments to regions of exogenous microbial DNA and/or mtDNA present in the original probe sample RAD library. We used our full pseudo-reference genome as the query in a search of the entire BLAST+ (Boratyn et al., 2013) nucleotide database and our H. sanctus mtDNA BLAST+ database. We then used Henderson and Hanna's (2016) "GItaxidIsVert.py" script to identify sequences that were potentially microbial in origin and performed a second BLAST+ search with this subset to further select only the subset of contigs that had their best or only alignment with nonvertebrate reference genomes. To exclude such sequences as well as those aligning with mtDNA sequence, we used the vcftools version 0.1.11 "-not-chr" flag and removed indels in the same step with the "-remove-indels" flag.
We estimated independent empirical gene coverage and site depth distributions using QB3's denovoTargetCapturePopGen "9-preFiltering" script and used these distributions as input to the QB3 "10-SNPcleaner" script. Run separately for modern and historical samples, this script removed all sites with coverage below 6×, sites missing in more than half of our samples, and sites with variant identity biases associated with quality score, mapping quality, or distance of alleles from the ends of reads. Because hydrolytic deamination of cytosine (C) to uracil (U) residues is the most common form postmortem nucleotide damage present in historic museum specimens, which may result in misincorporation of thymines (Ts) instead of uracil during PCR amplification and bias population genetic inference, this script also eliminated all C to T and G to A SNPs (Axelsson, Willerslev, Gilbert, & Nielsen, 2008;Briggs et al., 2007;Hofreiter, Jaenicke, Serre, von Haeseler, & Pääbo, 2001). Finally, we used the BEDtools version 2.26.0 "intersect" function (Quinlan & Hall, 2010) to retain only the sites that passed all filters for both historic and modern specimens.

| Statistical analyses
To assess the differences in sequencing and assembly performance between modern and historical samples, we implemented Wright's two-sample t tests in the R (R Core Team 2016). We evaluated differences between groups in the mean percentage of duplicate reads, the mean number of on-target contigs, the mean length of on-target contigs, and the percentage of sequenced reads successfully mapping to our pseudo-reference genome. Because some commonly used polymerases bias against amplification of targeted DNA in favor of the GC-rich microbial contamination common to extracts from museum specimens (Dabney & Meyer, 2012), we assessed differences in GC content present in assembled on-target contigs. In order to determine whether sample age, initial sample DNA concentration, or sequencing effort were significant predictors of %GC content among historical samples and mean number or length of on-target contigs, we used simple linear regression. We used stepwise model selection with corrected Akaike information criterion scores to determine best-fit models and did not include interaction terms to avoid overparameterization given our small sample size.

| Population genetic clustering and discriminant analysis of principal components
Although accurate estimation of population genetic structure in S. torotoro was not the primary goal of our study, we were nonetheless interested in assessing hyRAD's ability to produce biologically meaningful results by testing whether our data reflected the signature of phylogeographic processes such as isolation by distance (IBD) and vicariance, rather than the signature of DNA degradation, contamination, or other artifacts of library preparation and sequencing. We implemented k-means clustering and discriminant analysis of principal components (DAPC) in the R package adegenet (Jombart, 2008), using 100% complete data matrix (1,690 SNPs) to avoid biasing inferences with nonrandom patterns of missing data. We retained all principal component (PC) axes for k-means clustering and inspected both population assignments and change in Bayesian information criterion (BIC) scores across multiple values of K to select an optimal partitioning scheme. To maximize among-population variation and calculate ancestral population membership probabilities for each sample, we performed DAPC on the first six PCs using two discriminant axes. We then chose to retain these six PCs to optimize the a-score value for our data, which is the difference between the proportion of successful reassignment of the analysis and values obtained using random groups. However, because the change in BIC scores failed to clearly indicate any "true" value of K, we repeated our analysis with clustering assignments for K values of 1-8. To explore correlations between our three retained PCs and variables expected to differ between modern and historic samples (specificity, sensitivity, fold enrichment, age, initial concentration), we again performed simple linear regressions. Finally, to test for patterns of IBD across our samples, we performed a Mantel test among all individuals based on 999 simulated replicates using the R package ade4 (Thioulouse & Dray, 2007).

| Hybridization capture experiments and sequencing
We obtained a total of 397 million sequence reads for the probe and hybridization capture libraries, successfully demultiplexing 20/21 samples, with one sample failing due to bar code error. The total reads per sample ranged from 1.6 million to 30.7 million, and the average number of reads per sample did not vary significantly between modern and historical samples (t = −0.946, df = 14.6, p = .359). Of the original 397 million reads, 19.8% passed initial Illumina quality filters, contamination checks, adapter trimming, and removal of PCR duplicates (Table 1). The resulting number of cleaned reads per sample ranged from approximately 16,000 to 3.1 million, with an average count of 766,662, and significantly fewer reads for historic samples (t = −3.185, df = 13.088, p = .007).
Most reads lost to quality control were PCR duplicates, with a range of 50.9%-91.1% duplicate reads per sample. 2,455 reads were removed as E. coli contamination from 12 of 20 individuals (range 1-2,318 reads per individual) ( Table 2). The average depth of read coverage per sample, calculated as the read depth per base averaged across the length of the pseudo-reference genome, ranged from 7.6 to 26.4, and was significantly lower in historic samples (t = −3.754, df = 12.632, p = .002).

| Assembly and alignment results
Assembly of our probe library in pyRAD resulted in a total of 554,048 contigs and 61.9 million nucleotides (nt), which was reduced to 160,014 unique contigs and 16.1 million nt after excluding repetitive regions. We captured orthologous loci from all successfully sequenced samples, and after merging the probe library with flanking regions from assemblies of other modern tissue samples our extended pseudo-reference genome contained 29,297 loci. The number of contigs per sample that was orthologous to our probe library ranged from 55 to 23,155 and was significantly higher in modern samples (Table 1; Figure 2). Across all samples, we discarded 55%-92% of the total number of assembled clusters as off-target loci, losing significantly more from historic samples (t = 6.6035, df = 16.889, p < .001). We removed an additional eight contigs from the modern samples and 53 contigs from the historic samples due to their mitochondrial origin (Table 2). Mean contig size ranged from 401 to 611 nt and did not differ significantly between modern and historic samples. However, the historic samples had significantly fewer contigs exceeding 1 knt in length (t = -3.181, df = 5.004, p = .025). The percentage of reads passing quality filters that successfully mapped , and average coverage (t = −6.248, df = 7.555, p < .001) in modern samples. We recovered a significantly higher total number of ontarget loci in modern samples (t = −12.239, df = 5.221, p < .001), but significantly higher mean percent GC content in historic samples (t = 6.997, df = 13.368, p < .001). We observed no significant differences between modern and historic samples for mean contig size or mean percentage of duplicate reads to the pseudo-reference genome (also known as specificity) ranged from 51.8% to 57.7% and was significantly higher on average for modern samples (Figure 2). Additionally, %GC content was significantly higher in historic than modern samples (Figure 2). Among historic samples, the number of cleaned reads was a significant predictor of the number of captured loci and sample input DNA quantity was a significant predictor of %GC content in on-target assembled contigs (Figure 3).

| SNP discovery and filtering
We identified two contigs of mitochondrial origin and eight contigs of potential nonvertebrate origin in our pseudo-reference genome and excluded all sites from these contigs in our alignment prior to SNP calling (  (Figure 4).  (Table 3).

| Population genetic clustering and discriminant analysis of principal components
F I G U R E 3 Among historic samples, the number of trimmed reads was a significant predictor of the number of captured loci (R 2 = .872, p < .001) and the initial sample DNA input quantity was a significant predictor of percent GC content in on-target assembled contigs (R 2 = .370, p = .016)

| Hybridization-RAD is an effective tool for sampling thousands of orthologous SNPs from historic museum specimens
In their description of hyRAD, Suchan et al. (2016) suggest that it allows "sequencing of orthologous loci even from highly degraded DNA samples" and can be used to retrieve sequence data using museum samples up to 100 years old.
Our hybridization capture experiments in an independent laboratory with an independent study organism (S. torotoro) largely support   T A B L E 3 P values for simple linear regression between PC axes and sample variables Suchan et al. (2016) included historical museum specimens up to 58 years old in their validation experiment and up to 100 years old in their pilot study. We successfully captured 55 on-target loci, including four contigs exceeding 1 kb in length, from a specimen collected in 1877, and as many as 508 loci from specimens collected from 1896 to 1934 (Table 1). Moreover, and contrary to similar analyses by Suchan et al. (2016) and McCormack et al. (2016), we found no significant linear relationship between age and assembly or sequencing success metrics across historical samples. While this is possibly an artifact of small sample size relative to both previous studies, it may also suggest a relatively shallow trend of degradation during a period when many bird specimens in natural history museums were originally collected.
This is an encouraging result for researchers looking to make use of these specimens as a genomic resource.

| Variation in sequencing and assembly performance across sample types reduces efficiency
Our Encouragingly, there were no significant differences between the two sample populations for the overall percentage of duplicate reads.
However, we wish to highlight the high percentage of duplicate reads present in all samples (50.9%-91.1%). This may be due to combination of the relatively high number of amplification cycles used to amplify libraries with low input DNA prior to pooling. While high duplicate read percentages have also been reported in RADseq studies with fresh tissue (Andrews et al., 2016), this inefficiency is important to consider when working with valuable, low-quality historical samples, as increased sequencing effort may be required to generate sufficient read depth for variant detection and accurate assembly of contiguous sequences.
Lastly, following assembly, the total number of captured on-target loci was higher among modern samples, likely reflecting higher copy number and limited degradation of DNA from fresh tissues. In contrast, mean contig length did not vary significantly among samples, indicating similar assembly performance relative to the amount of high-quality data for each sample type. While we believe this result will prove robust to different assembly methods, we encourage future studies to explore their influence on resulting assemblies and downstream analyses.

| Input DNA quantity predicts GC content, suggesting PCR bias
Although inferences are similarly limited by sample size, our regression analyses largely failed to reveal significant correlations between input DNA/sequencing variables and assembly performance except in two comparisons ( Figure 2). First, an increase in the number of filtered reads was positively correlated with the total number of assembled on-target contigs, which matches a standard expectation of increased recovery with greater sampling of a genomic library with an uneven distribution of fragments, suggested for our libraries by the high levels of duplication (Table 1). Second, the initial quantity of input DNA in a sample was negatively correlated with %GC content in resulting assemblies, for example, the two samples in our study that with the lowest input DNA quantity also the highest percentage of GC content across all assembled contigs (Table 1). This finding may reflect biased PCR enrichment of GC-rich exogenous microbial contamination in samples with low initial input DNA quantity (Dabney & Meyer, 2012) and explain the significantly higher GC content of historic samples overall. While our SNP calling pipeline and data filtering removed sites potentially originating from nonvertebrate sequences (although see further discussion below), we nonetheless recommend researchers interested in applying hyRAD to historical specimens heed the recommendations of recent empirical studies (Gamba et al., 2016) to select an extraction protocol suited to degraded DNA and maximize input tissue quantity whenever possible.

| Geographic and/or taxonomic autocorrelation with input DNA type is potentially problematic with hyRAD studies
We implemented rigorous and conservative laboratory and bioinformatic protocols to reduce the influence of exogenous DNA contamination and postmortem DNA degradation, the results of which we summarize as a reference for future studies in Table 2. Despite these precautions, repeated DAPC with different parameters failed to change a basic pattern where all modern DNA samples (n = 6) clustered until the chosen K value of ancestral populations was seven or more ( Figure 5). Unfortunately, as the close geographic proximity of modern DNA samples would also lead to this pattern, we could not easily determine from our sampling scheme if this pattern reflected biological reality or whether factors correlated with sample type, such as undetected microbial contamination, DNA degradation, and/or library amplification artifacts, were affecting population genetic inference. We first attempted to disentangle its potential drivers during PCA and DAPC analyses by examining histograms of %GC content per read for anomalous distributions, but failed to detect a significant second peak indicative of contamination with exogenous GC-rich microbes. We next performed linear regressions with specificity, sensitivity, fold enrichment, specimen age, and input DNA quantity as predictors for each of our first three retained PCs. From these regression analyses, we found significant correlations of all variables with PC1, but no other PCs (Table 3). While these results are consistent with the possibility that biased sequencing performance affected population genetic inference, the exact mechanism responsible remains unclear. To avoid artifacts related to the separate treatment of different sample types, we suggest randomizing individuals from both modern and historic DNA sources throughout library preparation, hybridization capture, and sequencing. Additionally, we suggest researchers intending to use hyRAD in studies with both modern and historic tissue attempt to avoid geographic and taxonomic autocorrelation wherever possible and include a control in their sampling scheme to indicate potential DNA input quality problems.

| The phylogeography of S. torotoro reflects biogeographic barriers in New Guinea
Our DAPC results are consistent with previous studies of lowland avian phylogeography in New Guinea, and we interpret them as independent confirmation the ability of hyRAD to reveal biologically meaningful patterns. K-means clusters recovered for three ancestral populations reflect broad trends in codistributed taxa and expected patterns of genetic variation given geographic barriers and the geologic history of New Guinea ( Figure 5) (Deiner et al., 2011;Dumbacher & Fleischer, 2001). Although IBD across all samples was not significant (see Section "3" for details), the initial division of samples into mainland eastern and western clusters is consistent with both the primary latitudinal axis of the island and the barriers to gene flow in lowland forest taxa presented by the Bewani Mountains and Trans-Fly savannah region (Deiner et al., 2011;Mack & Dumbacher, 2007).
Inclusion of Normanby Island subspecies S. t. ochracea in the western New Guinea cluster is consistent with previous studies that have reported genetic similarity between other taxa in far eastern and far western New Guinea, such as birds of the genus Pitohui (Dumbacher & Fleischer, 2001). Samples from southwest New Guinea clustered with those from the Aru Islands, suggesting shared ancestry among these currently allopatric populations. This is potentially explained by both the linkage of these landmasses during the Pleistocene via the Sahul Shelf (Voris, 2000) and the subsequent emergence of previously identified barriers to avian gene flow to the north, east, and west in the form of the Central Ranges, the Trans-Fly Savannah, and Aetna Bay, respectively (Dumbacher & Fleischer, 2001;Deiner et al., 2011;.) Our analysis reveals broad similarities between the phylogeography of S. torotoro and the codistributed lowland bird species Colluricincla megarhyncha (Deiner et al., 2011), albeit with lower resolution due to the inherent limitations of our sampling. We believe that future studies of resident lowland forest species with similar ranges that use hyRAD or other means of capturing nuclear DNA markers will continue to aid in building a cohesive picture of the comparative phylogeography of this biodiverse region.

ACKNOWLEDGMENTS
This material is based upon work supported by the National Science OD018174. In PNG, we thank the Institute of Biological Research, the National Museum and Art Gallery, the National Research Institute, and the Department of Conservation. We add special thanks to the many New Guineans who provided field and logistical assistance, especially Bulisa Iova and tribal leaders of the villages that participated in and allowed our work.

CONFLICT OF INTEREST
None declared.

AUTHOR CONTRIBUTIONS
EBL and JPD designed the research; AS and ZH developed the wet-lab protocol; EBL and AS performed molecular work; EBL and ZH developed the bioinformatics pipeline and analyzed data; EBL wrote the manuscript with editorial contributions from all authors.