A molecular phylogeny of historical and contemporary specimens of an under‐studied micro‐invertebrate group

Abstract Resolution of relationships at lower taxonomic levels is crucial for answering many evolutionary questions, and as such, sufficiently varied species representation is vital. This latter goal is not always achievable with relatively fresh samples. To alleviate the difficulties in procuring rarer taxa, we have seen increasing utilization of historical specimens in building molecular phylogenies using high throughput sequencing. This effort, however, has mainly focused on large‐bodied or well‐studied groups, with small‐bodied and under‐studied taxa under‐prioritized. Here, we utilize both historical and contemporary specimens, to increase the resolution of phylogenetic relationships among a group of under‐studied and small‐bodied metazoans, namely, cheilostome bryozoans. In this study, we pioneer the sequencing of air‐dried cheilostomes, utilizing a recently developed library preparation method for low DNA input. We evaluate a de novo mitogenome assembly and two iterative methods, using the sequenced target specimen as a reference for mapping, for our sequences. In doing so, we present mitochondrial and ribosomal RNA sequences of 43 cheilostomes representing 37 species, including 14 from historical samples ranging from 50 to 149 years old. The inferred phylogenetic relationships of these samples, analyzed together with publicly available sequence data, are shown in a statistically well‐supported 65 taxa and 17 genes cheilostome tree, which is also the most broadly sampled and largest to date. The robust phylogenetic placement of historical samples whose contemporary conspecifics and/or congenerics have been sequenced verifies the appropriateness of our workflow and gives confidence in the phylogenetic placement of those historical samples for which there are no close relatives sequenced. The success of our workflow is highlighted by the circularization of a total of 27 mitogenomes, seven from historical cheilostome samples. Our study highlights the potential of utilizing DNA from micro‐invertebrate specimens stored in natural history collections for resolving phylogenetic relationships among species.


| INTRODUC TI ON
Robust phylogenetic hypotheses are crucial to understanding many biological processes, ranging from those contributing to population history to those creating macroevolutionary patterns. The development of methods for phylogenetic estimation and high throughput sequencing (HTS) have, in combination, improved our understanding of the relationships among extant organisms. These advances are frequently applied to the estimation of relationships among higher taxa, for example, distantly related animal phyla (Laumer et al., 2019;Simion et al., 2017), families of flowering plants (Léveillé-Bourret et al., 2017), and orders of birds (Jarvis et al., 2014). While these deeper branches have received significant attention, those at the leaves (e.g., genera and species) often remain largely unresolved, especially for taxa that are under-studied. For the resolution of relationships at lower taxonomic levels, crucial as a backbone for answering many evolutionary questions, a rich and broad species representation is vital.
Traditionally, researchers have sequenced relatively freshly collected specimens. However, there is an increasing utilization of historical specimens for molecular phylogenetic reconstruction in the absence of fresh tissue (e.g., Jarvis et al., 2014). Historical specimens from museum or other institutional collections are valuable sources of information representing organisms that may be difficult or impossible to sample in contemporary populations (Holmes et al., 2016).
Alternatively, complete sequences of target regions from related species and/or genera (Anmarkrud & Lifjeld, 2017;Billerman & Walsh, 2019) are used to design baits or probes (Derkarabetian et al., 2019;Ruane & Austin, 2017). While these approaches are excellent for groups whose sequences are relatively well-understood, they are unfeasible for clades with low inter-genus sequence conservation, or for those lacking sequence data from closely related reference organisms, such as those we will present here.
Here, we use historical and contemporary material from lesser-studied, small-bodied organisms for the purpose of reconstructing robustly supported phylogenetic relationships using molecular data. Our target organisms are cheilostomes, the most species-rich order of the phylum Bryozoa, with ca. 6,500 described extant species, representing about 80% of the living species diversity of the phylum (Bock & Gordon, 2013). Cheilostomes are lightly to heavily calcified, sessile, colonial metazoans common in benthic marine habitats. Most species are encrusting, while fewer are erect, and most are small (colony size c. 1 cm 2 , module size c. 500 μm × 200 μm), and live on hard substrates that may be overgrown by other fouling organisms (including other bryozoans, hydroids, foraminifera, and tube worms).
Systematic relationships among cheilostome bryozoans remain largely based on morphological characters (Bock & Gordon, 2013) with molecular phylogenies being restricted to recently collected, ethanol-preserved samples where genetic data were obtained using PCR-based methods (Fuchs et al., 2009;Knight et al., 2011;Waeschenbach et al., 2012), or more recently, by a HTS genome-skimming approach . While these studies have improved our understanding of the phylogenetic relationships among cheilostomes, many key taxa, potentially filling important phylogenetic positions, are hard to procure and remain unfeatured in sequencing projects.
Contributing to the advancement of historical DNA methods (Billerman & Walsh, 2019;Knapp & Hofreiter, 2010), we pioneer the sequencing of air-dried bryozoan specimens (i.e., never preserved in ethanol) that have been stored up to 150 years since collection.
To do so, we employ a recently developed DNA library preparation method (SMARTer ThruPLEX, Takara) to amplify and sequence historical samples with low DNA concentrations. We bypass the need for primers/probes, a clear advantage for cheilostomes which are known to have low inter-genus sequence conservation Waeschenbach et al., 2012). We use the de novo mitogenome assembly from the target colony itself as a reference for iterative mapping so as to avoid difficult assumptions in reference selection especially in the cases where no conspecifics have been sequenced before. In doing so, we generate mitochondrial and ribosomal RNA sequences of 43 cheilostome colonies representing 37 species and 30 genera, with 14 of these from historical samples ranging from 50 to 149 years old (Table S1). As a derivative of the demonstration of our workflow, we present a well-supported cheilostome tree using 65 ingroup taxa and 17 genes, the largest and taxonomically most broadly sampled cheilostome phylogeny to date. Finally, the success of the methodology we employed is highlighted by the circularization of 27 cheilostome mitochondrial genomes, seven of which were from historical samples. This approach emphasizes the potential for analyzing DNA from micro-invertebrate samples stored in natural history collections, especially for phylogenetic reconstruction of many hitherto inaccessible cheilostome genomes.  Twenty-one dried historical cheilostome samples and 29 recently   collected samples, of which 26 were ethanol-preserved and three dried, were targeted (Table 1 and Table S1). We selected the historical samples to represent a spread of collection dates while including those that are already phylogenetically resolved (i.e., their phylogenetic placements are known and/or we have available contemporary samples of their conspecifics or congenerics) and those that are currently enigmatic. Recently, collected samples were selected for their potential for phylogenetic verification of the historical specimens.

| MATERIAL S AND ME THODS
Each colony was subsampled for DNA isolation and scanning electron microscopy (SEM), using a Hitachi TM4040PLus. For microscopy, we bleached subsamples in diluted household bleach for a few hours to overnight, removing soft tissues in order to document skeletal morphology. SEMs of dried samples were taken both pre-and post-bleaching. All physical vouchers necessary for taxonomic verification are stored at the Natural History Museum of Oslo, University of Oslo, and SEMs are available in the Online Supporting Information (SI) as SEM cards familiar to bryozoologists. Additional sample metadata are reported in Table S1.

| DNA isolation, sequencing, assembly, and annotation
DNA from the 21 historical specimens was isolated in a laboratory designed for handling samples with low DNA concentrations (sensi-lab, NHM, Oslo). DNA extractions were performed inside a hood equipped with UV lights and all equipment was bleached and UV-sterilized prior to use. Samples were vortexed twice in nuclease-free H 2 O, air-dried, then subject to UV for 10 min to minimize possible surface contaminants. These treated samples were subsequently crushed with a stainless-steel mortar and pestle (Gondek et al., 2018). To minimize the risk of sample cross-contamination, each historical sample was processed separately prior to DNA isolation. The 29 contemporary samples were isolated in a standard laboratory without the aforementioned surface decontamination and crushing steps.
Genomic DNA for all 50 specimens was isolated using the DNeasy Blood and Tissue kit (QIAGEN). Colonies were homogenized in lysis buffer, using a pestle, in the presence of proteinase-K prior to a 16-hr 56°C incubation period. An on-column RNase A step was additionally performed to obtain RNA-free genomic DNA. DNA was eluted in pre-heated 60°C Tris-Cl buffer (10 mM) and incubated at 37°C for 10 min. Recovered DNA was quantified prior to library preparation using a Qubit 2.0 fluorometer (ThermoFisher).
Sequencing libraries were prepared with the standard KAPA HyperPrep kit (Roche) by the Norwegian Sequencing Centre (NSC) for DNA samples >15 ng. For those with <15 ng DNA, the SMARTer ® ThruPLEX ® DNA-Seq Kit (Takara Bio Inc) was used in the sensi-lab at NHM Oslo (Table S1). The contemporary DNA samples were fragmented to a 350 bp insert size, while no such step was performed on the historical samples with already short DNA length, giving a variable insert size (Table S2). Unique dual indexes (UDIs) were used once per library and any unligated adapter removed. Libraries constructed from the KAPA and SMART preparation methods were each pooled, then separately loaded on two independent flow cells prior to multiplex sequencing with independent runs. Genomic DNA up to 150 bp in read length was pair-end (PE) sequenced on an Illumina HiSeq4000 at the NSC. A blank control, taken during the DNA extraction of the historical samples, was also sequenced (library preparation: SMARTer ® ThruPLEX ® DNA-Seq Kit). Illumina HiSeq reads were quality checked using FastQC v.0.11.8 (Andrews, 2010), then quality-and adapter-trimmed using TrimGalore v0.4.4 with a Phred score cutoff of 30 (Krueger, 2015). Trimmed reads were de novo assembled with SPAdes 3.11.1 (Bankevich et al., 2012) using k-mers of 21,33, 55, 77, 99, and 127. The mitogenome and rRNA operon of each sample were identified separately with blastn (Altschul et al., 1990) using blast + against a database constructed from broadly sampled cheilostome sequences already deposited in NCBI (Table S7). An E-value of 1.00e-185 and maximum target sequence of 1 were used to filter any blast hits of non-cheilostome origin.
We use three current assembly methods and compare their potential for the recovery of mitogenome sequences from our data and subsequent phylogenetic inference of our historical samples. For each historical sample, the SPAdes de novo assembled mitogenome sequence was used as the reference (seed) input for its own iterative mapped assembly using a relatively new but already widely-used method, GetOrganelle (Jin et al., 2020) and another commonly used method, NOVOplasty 3.7 (Dierckxsens et al., 2016), both under default settings. In addition, a mitogenome sequence from a taxon phylogenetically related to the historical sample in question was also provided for a second iterative assembly ( Figure S1). Both iterative methods of GetOrganelle and NOVOplasty work by mapping sequencing reads to a reference, or seed, before de novo extension of contig ends. Circularization (or closure) of each mitochondrial genome was confirmed using blast2 (Altschul et al., 1990), with the same sequence used as query and reference to validate overlap of contig ends. The overlapping mitochondrial sequence region was subsequently trimmed manually.

| Annotation and alignments
Mitogenomes from the three separate assembly methods, for each of the samples, were annotated with Mitos2 using a metazoan reference (RefSeq 89) and the invertebrate genetic code (Bernt et al., 2013) to identify two rRNA genes (rrnL and rrnS) and 13 protein coding genes (atp6, atp8, cox1, cox2, cox3, cob, nad1, nad2, nad3, nad4, nad4l, nad5, and nad6). In addition, two rRNA operon genes (ssu (18s) and lsu (28s)) were identified and annotated using RNAmmer (Lagesen et al., 2007). Further, mitogenes and rRNA operons from 27 bryozoan taxa (Table S6), obtained from NCBI, selected for their potential to verify the success of our workflow for our historical specimens, were aligned with our samples to compile TA B L E 1 Samples generated and analyzed in this study to remove any uncertain characters .
Ambiguously aligned characters were removed from each alignment using Gblocks (Talavera & Castresana, 2007) with least stringent parameters. The single-gene alignments were concatenated to a supermatrix using the catfasta2phyml perl script (Nylander, 2010). An initial mitochondrial supermatrix, consisting of up to five assemblies per sample ( Figure S1), was used to compare and evaluate differences among the three assembly methods via a phylogenetic analysis (see next section). Using only a single assembly method per taxon from this initial supermatrix of historical samples, we then created a downstream final supermatrix. Each sample (i.e., those presented in Table 1) was represented only once in the final supermatrix to which the two rRNA operon genes from the SPAdes de novo assembly were added (Figure 1). The alignments (both masked and unmasked) are available through Dryad (https://doi.org/10.5061/dryad.bk3j9 kd7m).

| Phylogenetic reconstruction
Maximum likelihood (ML) phylogenetic analyses were carried out for each single-gene alignment using the "AUTO" parameter in RAxML v8.0.26 (Stamatakis, 2006) to establish the evolutionary model with the best fit. The general time reversible (GTR + G) was the preferred model for the four rRNA genes (18s, 28s, rrnS, and rrnL), and MtZoa + G for all 13 protein coding genes. The concatenated datasets, divided into rRNA and protein gene partitions, each with its own separate gamma distribution were analyzed using RAxML.
The topology with the highest likelihood score of 100 heuristic searches was chosen. Bootstrap values were calculated from 500 pseudoreplicates.

| RE SULTS
We successfully sequenced and assembled 43 cheilostome colonies representing 37 species from 30 genera ( Note: Genus and species names are given, followed BLEED (numbers) which are numerical tags for the specimens. BLEED stands for Bryozoan Lab for Ecology, Evolution and Development.
Year refers to collection year. An * succeeding the year of collection indicates an air-dried historical sample, defined here as 50 years or older. Multiple historical samples have "<1970" indicated because the collection year was unstated. However, taxonomic identifications were made in 1970 for these specimens, implying that they must have been collected then, or earlier. Abbreviations for countries (approximate locations where the sample was collected) are as follows: NZ = New Zealand, SWE = Sweden, AUS = Australia, NOR = Norway, IT = Italy and UK = United Kingdom. The columns "rRNA" and "Mt" represent the number of rRNA and mitochondrial genes annotated and used in phylogenetic inference, with a maximum of 2 and 15, respectively. The size of the mitogenome, in base pairs (bp), is only shown if closed/circularized, otherwise the cell is labeled as "NO." For an expanded overview of metadata, see Table S1.

*Parasmittina jeffreysi BLEED1202 (1901)
with the utilized blastn parameters (see methods and Table S3). And for one sample, the assembly provided a limited contig number with no cheilostome target identifiable. The negative control provided 3.6 million reads that upon assembly gave 50 contigs >500 bp, with a 1,561 bp maximum. All assembled contigs from the negative control were attributed to contamination from Canis familiaris and Homo sapiens. No contigs from the negative control were attributed to cheilostome contaminants.
The three assembly methods (GetOrganelle, NOVOplasty, and SPAdes) yielded assemblies that resulted in a high level of phylogenetic congruence with one another, where the nodes subtending each sample formed fully supported monophylies ( Figure S1). This result is independent of the initial seed sequence (i.e., the sample itself or a conspecific or another closely related taxon) used for GetOrganelle and NOVOplasty, the two iterative methods. The only discrepancy was that of Porella compressa (BLEED 1,188) assembled using NOVOplasty with P. concinna (BLEED 579) as seed, where BLEED 1,188 grouped with the BLEED 579, rather than itself. Note that the NOVOplasty assembly of BLEED 1,188 using BLEED 579 was not used in downstream analyses that resulted in Figure 1, as only the single most complete assembly for each sample was used for the final phylogenetic reconstruction. The genes that are recovered using different assembly methods varied for multiple specimens. The de novo method using SPAdes recovered the most mitochondrial genes per sample on average ( Figure S1 and Table   S5). This was followed by the iterative method of NOVOplasty, with its results independent of seed input. Lastly, GetOrganelle consistently recovered the fewest mitochondrial genes per sample.
Additionally, GetOrganelle failed to assemble in approximately 40% of cases.
Given the phylogenetic congruence of the assembly methods ( Figure S1), we concatenated genes from these separate assemblies to form a final supermatrix with each sample represented only once (see methods and Table 1) to infer the phylogeny presented in Figure 1. Specifically, the input for each specimen to this supermatrix utilized the assembly method that gave the highest number of annotated genes. With an equal number of recovered genes between assemblies for a given sample, prioritization was as follows ( Figure S1; star indicates assembly utilized in Figure 1): the de novo assembly of SPAdes (9 taxa), before that of NOVOplasty utilizing a SPAdes seed (3 taxa), and lastly NOVOplasty using a seed from a close relative (2 taxa). No assembly from GetOrganelle was used in the final supermatrix. Sequence length, and % missing characters, for each taxon and gene utilized in the main phylogeny (Figure 1) are presented as supporting data (Table S4) In total, mitochondrial genomes of 27 of 43 samples generated from this study were circularized with a size range of 13,700-19,704 bp (Table 1). Of these, seven were from the 14 historical samples, with the oldest being that of Parasmittina jeffreysi (BLEED 1,202, 14,260 bp, collected in 1901; Figure 2).

| D ISCUSS I ON
Continued explorative expeditions and the discovery of yet unknown organisms will provide optimal organic material for DNA F I G U R E 1 The inferred phylogeny of cheilostomes based on 17 genes from historical and recently sampled material. Maximum likelihood topology of 65 cheilostome ingroup taxa and 5 ctenostome outgroup taxa with 8,324 nucleotide and amino acid characters inferred using RAxML (100 heuristic searches and bootstrap of 500 pseudoreplicates). The numbers on the internal nodes are ML bootstrap values (BS from RAxML) followed by posterior probabilities (PP from MrBayes). Circles indicate 100 BP and 1.00 PP. Only BS > 50 and PP > 0.95 are shown, dash indicates values below this. * indicates taxa generated in this study. Bold font indicates historical samples with sampling year in brackets. Clade X (blue), Clade Y (red), and Clade X (brown) discussed in the supporting text are highlighted. See Table S3 for genes available  and Table S4 for % missing characters for each taxon sequencing, and subsequent analyses of extant species. However, natural history collections are unique sources that can provide historical material for organisms that may be difficult or impossible to sample from contemporary populations. Experimentation with laboratory techniques and bioinformatic tools of such historical samples can expand the information space for our general understanding of the biology, including the genetics and phylogenetics, of both welland under-studied species. In this study, we expanded on our knowledge of a small-bodied and under-studied phylum, by applying HTS and a combination of de novo and iterative assembly methods. We demonstrated that historical samples can be used to boost inference of the phylogenetic relationships among a micro-invertebrate group that is challenging to work with, namely bryozoans.

| Overcoming challenges of sequencing understudied groups
While the cheilostome molecular tree is very far from complete (Fuchs et al., 2009;Knight et al., 2011;Waeschenbach et al., 2012), we have contributed to this community endeavor by sequencing a number of species that have never been sequenced before and by demonstrating that it is possible, without much extra effort, to extract sequence data of many genes from old, air-dried samples.
We utilized de novo and iterative methods to assemble our cheilostome mitochondrial sequence data to overcome the challenges of reference-based assemblies, due to the high degree of observed sequence variability among cheilostomes. This is especially important because systematic relationships remain largely based on morphological characters for cheilostomes (Bock & Gordon, 2013) which we know can sometimes be misleading ; hence, reference choice is fraught with difficulties. By using de novo assembled sequences from the target colony itself as a reference for iterative mapping, we circumvent the difficulties of reference choice. To validate our approach, we also explored the robustness of the phylogenetic placement by using both nominal conspecifics and congenerics as references where available ( Figure   S1). Encouragingly, we find that historical samples can be used as their own reference for iterative mapping assemblies, at least for the samples we studied. Phylogenetic comparison of these assemblies consistently demonstrated fully supported monophylies, independent of method and the supplied reference. While SPAdes (de novo), in general, recovered the most complete assemblies and mitochondrial genes, the output of NOVOplasty (iterative) was almost comparable, and better in some cases. We propose that the iterative method employed in NOVOplasty could supplement that of the de novo method, beckoning integration into mitogenome assembly pipelines, and removing the necessity for   (Freudenthal et al., 2019;Oliveira et al., 2020) and combined for plastid genome construction (Ma & Lu, 2019;Tan et al., 2019;Yan et al., 2019). Results from these previous comparative analyses, in contrast to our own, indicate GetOrganelle may be more suited to chloroplast assemblies. Finally, the workflow we describe in the present study proved suitable for circularizing mitochondrial genomes (27 samples, ranging from 13,700 to 19,704 bp; Table 1), including from historical samples >100 years old (Figure 2). While detailed mitogenome comparisons are beyond the scope of this current study, they are now available for further genome evolution research.

| Phylogenetic inference, missing data and taxon sampling
The phylogeny presented here is the most broadly taxonomically sampled and resolved cheilostome tree to date. The statistical support of the majority of our branches (Figure 1) Table S4 and Figure 1), indicating that the available data are phylogenetically informative. A broad taxon sample combined with a relatively low proportion of total missing characters in the final supermatrix (27.6%, see Table S4) has contributed to the high degree of phylogenetic resolution observed (Wiens & Morrill, 2011).
The recent introduction of HTS and genome-skimming to bryozoology has brought overdue support to phylogenetic relationships, but this approach is not without its challenges . Any given cheilostome bryozoan colony (sample) usually lives in close proximity with other fouling/encrusting organisms, including other bryozoan species, so contamination is always a concern when isolating and amplifying target DNA, independent of sample age. We alleviated this challenge by using phenotypic information from our physical and SEM vouchers (SI). Further, we employ a bioinformatic pipeline with robust filtering steps to eliminate sequences from non-target contaminants . Lastly, inclusion of contemporary conspecifics and congenerics in our study confirm phylogenetic placement of the targeted bryozoans (Figure 1). Microporidae (Gray, 1848;Jullien, 1888). Secondly, the Myriapora cox1 gene has previously been amplified utilizing barcoding primers (Cahill et al., 2017). We can therefore confirm the placement of Myriapora (BLEED1197) in the main phylogeny ( Figure 1) based on 17 genes, by showing its cox1 affinity to that of Myriapora cox1 from NCBI ( Figure S2). Lastly, the position of Tessaradoma (BLEED1183), whose taxonomic identity is confirmed by its SEM voucher, as sister to Microporella remains a working hypothesis until increased taxon sampling can either confirm or reject this position. We emphasize that nothing can substitute for the continued accumulation of broadly taxonomically sampled sequence and vouchered morphological data for aiding phylogenetic understanding. With the exception of Tessaradoma (see above), the confirmed phylogenetic placement of all of our 14 historical samples supports that any effect of possible non-target contaminant in the tree (Figure 1) can be excluded. In our study, the negative control provided only a limited read number that upon assembly gave few and short contigs attributed to mammal DNA, which were filtered out during the assembly pipeline. Importantly, no cheilostome DNA was observed in the negative control.  (Short et al., 2018;Yeates et al., 2016). Many more such collections exist throughout the world and our relatively simple workflow widens the possibilities of extracting useful sequence information from many more under-studied phyla and small-bodied invertebrates, beyond cheilostome bryozoans. This work illustrates the importance of natural history collections, and the need to store and maintain these for future generations and technological advances.

ACK N OWLED G EM ENTS
We thank two anonymous reviewers for their constructive com-

CO N FLI C T O F I NTE R E S T
None declared.

DATA AVAILABILITY STATEMENT
Sequences generated and used in this study have been submitted to