Untangling taxonomy: a DNA barcode reference library for Canadian spiders

Approximately 1460 species of spiders have been reported from Canada, 3% of the global fauna. This study provides a DNA barcode reference library for 1018 of these species based upon the analysis of more than 30 000 specimens. The sequence results show a clear barcode gap in most cases with a mean intraspecific divergence of 0.78% vs. a minimum nearest‐neighbour (NN) distance averaging 7.85%. The sequences were assigned to 1359 Barcode index numbers (BINs) with 1344 of these BINs composed of specimens belonging to a single currently recognized species. There was a perfect correspondence between BIN membership and a known species in 795 cases, while another 197 species were assigned to two or more BINs (556 in total). A few other species (26) were involved in BIN merges or in a combination of merges and splits. There was only a weak relationship between the number of specimens analysed for a species and its BIN count. However, three species were clear outliers with their specimens being placed in 11–22 BINs. Although all BIN splits need further study to clarify the taxonomic status of the entities involved, DNA barcodes discriminated 98% of the 1018 species. The present survey conservatively revealed 16 species new to science, 52 species new to Canada and major range extensions for 426 species. However, if most BIN splits detected in this study reflect cryptic taxa, the true species count for Canadian spiders could be 30–50% higher than currently recognized.


Introduction
With 45 000 known species in 114 families, spiders are one of the most diverse orders of arthropods, a fact which stimulated the development of an online taxonomic catalogue (World Spider Catalog 2014). It now tracks the description of every new species, providing an authoritative resource for valid names. However, it does not simplify their identification. In common with other groups of arthropods, many species of spiders are difficult to identify, especially the juveniles that often dominate collections from seasonal environments (Foelix 2011). In fact, because most keys are only effective for adults and often just for males, many individuals cannot be identified to a species level through morphological analysis. Because of this difficulty, biotic surveys frequently neglect spiders, despite their diversity and important role as predators in terrestrial ecosystems (Freitas et al. 2013).
Studies over the past decade have revealed the efficacy of DNA barcoding, the analysis of sequence diversity in the 5′ region of the cytochrome c oxidase I gene, as a tool for the identification of animal species (Hebert et al. 2003). Insects have been the target for much of this work, especially members of the order Lepidoptera. These analyses have shown that 95-100% of the species in regional faunas can be discriminated (Hajibabaei et al. 2006; Hebert et al. 2009; Hausmann et al. 2011). Recent work has tested the impact on barcode resolution of expanding from a regional to continental scale. Barcodes discriminated all 75 Australian species in the family Sphingidae regardless of their collection site (Rougerie et al. 2014), while 1000 species of Lepidoptera showed only a small reduction in identification success in comparisons involving Austria and Finland (Huemer et al. 2014). Further studies have confirmed that DNA barcoding is effective in other insect orders. For example, the analysis of 1500 Finnish and 3500 German species of Coleoptera revealed 99% success in their identification (Hendrich et al. 2014; Pentinsaari et al. 2014).
The effectiveness of DNA barcoding as a tool for the identification of spiders has seen less evaluation. Barrett & Hebert (2005) found that DNA barcoding discriminated all 168 species of Canadian spiders that they examined, but their sample sizes were small. A subsequent study that analysed nearly 3000 specimens from a site in the Canadian Arctic indicated that each of the 198 species at this locality possessed a diagnostic array of barcode sequences (Blagoev et al. 2013). Similar results were obtained for 63 species of Alaskan spiders (Slowik & Blagoev 2012) and 361 species of Canadian spiders (Blagoev et al. 2009). Although these studies have established that DNA barcodes are very effective in identifying known species, they also revealed possible overlooked species, as evidenced by many cases of deep intraspecific divergence (Barrett & Hebert 2005; Blagoev et al. 2009). The spiders in other regions have seen less investigation. Astrin et al. (2006) reported the strong performance of DNA barcoding in identifying 52 European species in the family Pholcidae, while Miller et al. (2013a) found that barcodes discriminated 31 species in the Netherlands.
Candek and Kuntner (2014) established that the effectiveness of DNA barcodes for identification was not impeded by geographic variation in a study on 20 species of spiders whose ranges span Europe and North America. These overall results have led to consensus that a comprehensive DNA barcode reference library will advance spider biology in important ways. For example, Miller et al. (2013b) showed how it could provide new insights into the parasitoids which attack spiders. A well-parameterized library will also facilitate the use of next-generation sequencing platforms to ascertain the species composition of mass collections (Hajibabaei et al. 2011; Yu et al. 2012; Ji et al. 2013). Because the latter advance will enable the inclusion of spiders in large-scale biotic surveys, while the former will reveal new details on their ecological interactions (Wirta et al. 2014), there is a need for further library construction.
This study reports progress towards the assembly of a DNA barcode reference library with 10× or higher coverage for all Canadian spiders. It represents a major taxonomic and geographic extension of earlier work which targeted a limited number of species and only examined them at a few localities. The goal of 10× coverage was established to permit the analysis of intraspecific variation in barcode sequences. Prior taxonomic studies have indicated the magnitude of the challenge: about 1460 species of spiders are known from Canada (Dondale 1979; Bennett 1999; Paquin et al. 2010; Dupérré 2013). These taxa include representatives from two of the three suborders and from 46 of the 114 families of spiders. Although this study does not deliver a complete reference library for Canadian spiders, it does provide an average of 20× coverage for two-thirds of the fauna, including nearly all common species. As a result, it creates a highly functional identification system for this biotic assemblage. The study has also provided a better understanding of the importance of cryptic diversity in Canadian spiders by ascertaining the incidence of species which are assigned to two or more BINs (Ratnasingham & Hebert 2013) and presents an inexpensive pathway that can speed the registration of new species.

Specimen collections
Work began with the collection of spiders from sites across Canada. In contrast to many other groups of terrestrial arthropods where museum specimens are held dry on pins allowing barcode recovery for several decades, spiders are typically stored in 70-80% alcohol at room temperature, conditions which lead to rapid DNA degradation (Vink et al. 2005). As a result, barcode recovery is difficult from specimens more than 5 years old, especially small-bodied species (Miller et al. 2013a). Because most of its spiders were collected prior to 2000, the Canadian National Collection of Insects, Arachnids and Nematodes (Ottawa) contributed few records to this study. However, collections at the Royal BC Museum (Victoria) and the Lyman Entomological Museum (Montreal) provided 8.3% and 0.7% of the specimens examined, reflecting their recent involvement in large-scale surveys. The remaining specimens (91%) were acquired through sampling programs led by the Biodiversity Institute of Ontario that employed diverse methods including hand collecting, sieving, sweep netting and trapping (Malaise, pan, pitfall, sticky) at sites across Canada (Fig. 1).

Specimen identifications
All specimens were identified or verified by GAB through the genitalic examination of all adults for each species when they were available. Determination often occurred following barcode analysis, once the specimen was assigned a BIN (see below). In this way, DNA barcoding not only identifies specimens that already exist in the reference library, but it also limits the downstream work of specialists by sorting the remaining specimens into operational taxonomic units that require examination (deWaard et al. 2009). Furthermore, it often guides the specialist in the determination, providing higher taxonomic assignments and phylogenetic affinities. Morphological identifications were based on a comprehensive set of reference publications on the taxonomy of North American spiders with species nomenclature following the World Spider Catalog (2014). Collection data and a taxonomic assignment for each specimen and often a photograph are available in the Barcode of Life Data Systems (BOLD, www.boldsystems.org) (Ratnasingham & Hebert 2007) in the public data set, 'Spiders of Canada' at DS-SOC2014 (doi:10.5883/DS-SOC2014).

DNA sequence analysis
A total of 30 679 specimens were sequenced (Table 1). The interval between collection and sequencing ranged from a few days to 35 years, but most specimens were analysed within 2 years (x̄ = 414 days, σ = 377). Tissue lysis, DNA extraction, PCR amplification, cycle sequencing and sequence analysis were performed at the Canadian Centre for DNA Barcoding (CCDB; www.ccdb.ca) employing standard protocols (Ivanova et al. 2006; deWaard et al. 2008; Hebert et al. 2013). For most samples, the primer cocktails of C_LepFolF and C_LepFolR (Folmer et al. 1994; Hebert et al. 2004) were used for both PCR and bidirectional sequencing of the barcode region. The residual DNA extracts are stored in the DNA archive at the CCDB where they are available for additional study. Sequences, electropherograms and primer details for each specimen are accessible on BOLD (doi:10.5883/DS-SOC2014) and GenBank (see Table S1, Supporting information for accession numbers).

Barcode index numbers
After their upload to BOLD, all sequences >500 bp were assigned to a BIN under an interim taxonomic system which aggregates similar barcode sequences (Ratnasingham & Hebert 2013). BOLD currently contains over 375 000 BINs (December 2014) generated using the Refined Single Linkage (RESL) algorithm, which employs a three-phased analysis to reach decisions on the number of BINs (= OTUs) in the overall sequence data set on BOLD (Ratnasingham & Hebert 2013). In contrast to some other approaches employed for OTU designation, such as Automatic Barcode Gap Discovery (Puillandre et al. 2012), its outcome is deterministic. It is also much faster than other approaches, such as the generalized mixed Yule-coalescent model (Pons et al. 2006; Fujisawa & Barraclough 2013), a critical requirement for the analysis of large data sets (see Ratnasingham & Hebert 2013 for algorithm details and comparisons).
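The general idea of aggregating similar sequences into OTUs can be illustrated with plain single-linkage clustering on a pairwise distance matrix. The sketch below is not RESL, whose additional phases are described in Ratnasingham & Hebert (2013); the function name `single_linkage_otus`, the toy distances (in %) and the 2% threshold are all assumptions chosen only for illustration.

```python
def single_linkage_otus(dist, threshold):
    """Cluster items so that two items share a cluster whenever a chain of
    pairwise distances <= threshold connects them (single linkage).
    `dist` is a symmetric matrix given as a list of lists."""
    n = len(dist)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i):
        # Follow parent links to the cluster root, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i  # merge the two clusters

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical pairwise divergences (%) among four sequences.
toy = [
    [0.0, 1.0, 1.8, 9.0],
    [1.0, 0.0, 2.5, 9.5],
    [1.8, 2.5, 0.0, 8.7],
    [9.0, 9.5, 8.7, 0.0],
]
otus = single_linkage_otus(toy, threshold=2.0)
print(otus)
```

Sequences 1 and 2 differ by 2.5% yet fall in the same cluster because both lie within 2% of sequence 0. This chaining behaviour is a known property of single linkage, and the same input always yields the same clusters.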
Although past studies have not employed postprocessing of BIN results, we assessed the robustness of each of the 1359 BINs detected in this study in the following fashion. The overall sequence alignment was exported from BOLD and partitioned phylogenetically into two alignments of approximately equal size, each including all members for a particular set of families. An NJ tree for each alignment was generated in MEGA5 (Tamura et al. 2011) using the K2P distance model, and bootstrap values for each node were determined based on 100 replicates. With BINs mapped on the NJ tree (using 'groups' in MEGA5), each BIN was assigned the bootstrap value for the terminal branch leading to its members. In the few cases where a BIN was paraphyletic, a value of N/A was recorded. BINs with bootstrap values greater than 60% were viewed as well supported, while those with lower values were not. Although this postprocessing was particularly directed towards evaluating the significance of the species involved in BIN splits, all species were analysed.
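For readers unfamiliar with the K2P (Kimura two-parameter) model named above, the following minimal sketch computes the distance for one pair of aligned sequences as d = -0.5 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. It is a from-scratch illustration, not the MEGA5 implementation; the function name `k2p_distance` and the handling of gaps and ambiguous bases (positions skipped) are assumptions.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.
    Positions where either sequence is not A, C, G or T are skipped
    (an assumption; other tools may treat gaps differently)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError when the sequences are too divergent
    # for the model (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# One transition (C -> T) in ten aligned positions.
d = k2p_distance("ACGTACGTAC", "ACGTACGTAT")
print(f"K2P distance: {d * 100:.2f}%")
```

Because the model corrects for multiple substitutions at the same site, the K2P distance is slightly larger than the raw proportion of differing sites.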

Geographic variation in sequences
Analysis of geographic variation was restricted to the 339 species that were represented by at least 10 specimens collected from sites at least 1000 km apart based on the shortest path connecting all sampling coordinates using the minimum spanning tree algorithm (Prim 1957). The correlation between the distance matrices for geographical distance and sequence divergence was determined for each species using the Mantel test (Mantel 1967). Species lacking a significant correlation (P > 0.05) were excluded from subsequent analysis. The slope of the relationship between sequence divergence and geographic distance per kilometre was calculated separately for each species with a significant correlation, and this was used to estimate the rise in sequence divergence per 1000 km. For example, if the slope of the relationship between sequence divergence and distance was 0.002% per kilometre, this would indicate a 2% increase in sequence divergence per 1000 km.
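The two steps described above, a Mantel test followed by conversion of the slope to divergence per 1000 km, can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the software used in the study: the function names, the one-sided permutation scheme, the 999 permutations and the five hypothetical collection sites are all invented for the example. The toy data are built so that divergence rises by exactly 0.002% per kilometre, matching the worked example in the text.

```python
import random
from itertools import combinations


def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(geo, gen, permutations=999, seed=0):
    """One-sided Mantel test: correlation between two distance matrices,
    with a P-value from permuting the specimen order of one matrix."""
    n = len(geo)
    pairs = list(combinations(range(n), 2))
    geo_vec = [geo[i][j] for i, j in pairs]
    observed = _pearson(geo_vec, [gen[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [gen[order[i]][order[j]] for i, j in pairs]
        # Small tolerance guards against floating-point ties.
        if _pearson(geo_vec, permuted) >= observed - 1e-12:
            hits += 1
    return observed, hits / (permutations + 1)


def slope_per_1000km(geo, gen):
    """Least-squares slope of divergence (%) on distance (km), times 1000."""
    pairs = list(combinations(range(len(geo)), 2))
    x = [geo[i][j] for i, j in pairs]
    y = [gen[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return 1000 * sxy / sxx


# Hypothetical specimens collected along a transect (positions in km).
sites_km = [0, 500, 1000, 1500, 2000]
geo = [[abs(a - b) for b in sites_km] for a in sites_km]
gen = [[0.002 * abs(a - b) for b in sites_km] for a in sites_km]

r, p = mantel(geo, gen)
rate = slope_per_1000km(geo, gen)
print(f"Mantel r = {r:.3f}, P = {p:.3f}, divergence per 1000 km = {rate:.2f}%")
```

Permuting specimens rather than individual matrix cells preserves the dependence among distances that share a specimen, which is why an ordinary regression P-value is not used here.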

Sequence analysis
Barcode sequences were recovered from 89% (27 269/30 679) of the specimens that were analysed (Fig. S1, Supporting information). Although the primer sets were effective across the order, there was variation in amplification success among families (χ² = 248.4; P < 0.001) with relatively low recovery from species in two families (Cybaeidae, Pholcidae). There was no evidence of nuclear copies of mitochondrial DNA (NUMTs), and the incidence of nontarget sequences from Proteobacteria, such as Wolbachia, was trivial as noted previously in spiders (Smith et al. 2012). Largely because of differing success in specimen acquisition, taxon coverage varied among the 46 families of spiders (Table 1). Six families lack records, but they collectively include just seven species, while the other 40 possess coverage ranging from 33 to 100% (x̄ = 67%) of their Canadian species.
The 27 269 sequences comprised 16 622 distinct haplotypes; most haplotypes were only represented once, but 3441 were recovered from multiple individuals (range = 2-91). Intraspecific divergences averaged 0.78%, while mean distance to the NN taxon was 10-fold higher, averaging 7.85% (Table 2). As a consequence, there was a clear barcode gap for most species (Fig. 2).
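The barcode gap compares, for each species, the divergence among its own members with the distance to the closest member of any other species. A minimal sketch of that comparison is given below; the function name `barcode_gap`, the toy distance matrix (in %) and the labels "A" and "B" are assumptions for illustration and do not come from the study's data.

```python
def barcode_gap(dist, species):
    """For each species return (mean intraspecific distance, NN distance).
    `dist` is a symmetric matrix (list of lists) and `species` gives the
    label of each row. The mean is None for singletons, and the NN
    distance is None when only one species is present."""
    n = len(species)
    summary = {}
    for sp in sorted(set(species)):
        members = [i for i in range(n) if species[i] == sp]
        others = [i for i in range(n) if species[i] != sp]
        intra = [dist[i][j] for i in members for j in members if i < j]
        inter = [dist[i][j] for i in members for j in others]
        mean_intra = sum(intra) / len(intra) if intra else None
        nn = min(inter) if inter else None
        summary[sp] = (mean_intra, nn)
    return summary


# Hypothetical divergences (%) among four specimens of two species.
dist = [
    [0.0, 0.5, 7.0, 8.0],
    [0.5, 0.0, 7.5, 9.0],
    [7.0, 7.5, 0.0, 1.0],
    [8.0, 9.0, 1.0, 0.0],
]
labels = ["A", "A", "B", "B"]
gap = barcode_gap(dist, labels)
for sp, (intra, nn) in gap.items():
    print(sp, intra, nn)
```

A species shows a barcode gap when its NN distance exceeds its intraspecific divergence, as both toy species do here.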

BIN analysis
The BIN count was substantially higher than the species count (1359 vs. 1018), but there was only a weak relationship (R² = 0.04, P < 0.001) between the BIN count for a species and the number of specimens analysed (Fig. 3), after the three outlier species were removed. The BIN/species ratio for the 40 families varied from 0.93 to 2.74. Although it is well established that the proportion of OTUs that cannot be assigned to a named species increases for smaller-bodied taxa, there was no evidence that the B/S ratio was linked to size as small-bodied families showed a ratio similar to that for large-bodied families (Fig. 4a). The B/S ratio was significantly higher for families whose component species had broad vs. narrow distributions, but the coefficient of determination was just 16% (Fig. 4b).
There was a perfect correspondence between the specimens assigned to 795 of the BINs and the members of a particular species. The remainder of the BINs included a few cases where members of two or more species shared a BIN (merge) and many more cases where the members of a species were assigned to more than one BIN (split).
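The categories used here (perfect correspondence, split, merge, and a combination of both) follow mechanically from the list of species and BIN assignments. The sketch below shows one way to derive them; the function name `classify_concordance`, the category labels and the placeholder species and BIN identifiers are assumptions made for the example.

```python
from collections import defaultdict


def classify_concordance(records):
    """Classify each species from (species, bin_id) specimen records as
    'match'   - one BIN holding only this species,
    'split'   - two or more BINs, none shared with another species,
    'merge'   - one BIN shared with at least one other species,
    'mixture' - two or more BINs, at least one of them shared."""
    bins_of = defaultdict(set)     # species -> BINs holding its specimens
    species_in = defaultdict(set)  # BIN -> species found in it
    for sp, bin_id in records:
        bins_of[sp].add(bin_id)
        species_in[bin_id].add(sp)

    result = {}
    for sp, bins in bins_of.items():
        is_split = len(bins) > 1
        is_merged = any(len(species_in[b]) > 1 for b in bins)
        if is_split and is_merged:
            result[sp] = "mixture"
        elif is_split:
            result[sp] = "split"
        elif is_merged:
            result[sp] = "merge"
        else:
            result[sp] = "match"
    return result


# Placeholder identifiers, not real BOLD BINs or species names.
records = [
    ("sp1", "BIN_a"), ("sp1", "BIN_a"),
    ("sp2", "BIN_b"), ("sp2", "BIN_c"),
    ("sp3", "BIN_d"), ("sp4", "BIN_d"),
    ("sp5", "BIN_e"), ("sp5", "BIN_f"), ("sp6", "BIN_f"),
]
categories = classify_concordance(records)
print(categories)
```

Classifying per species rather than per BIN mirrors the way the counts are reported in the text, where splits and merges are tallied as numbers of species.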

BIN merges
BIN merges were uncommon as just 15 BINs included representatives of two or more species. These cases involved 34 species, all morphologically similar congeners. Twelve of the 15 BINs included a pair of species, two others a triad, while the final case involved four species. Barcode divergences (ranging from 0.46% to 1.30%) permitted discrimination of the species in six of these pairs and one triad (Table 3). Species in the other six pairs and the other triad could not be discriminated because their component taxa shared barcodes. Three of the four species in the tetrad could be separated, but two species shared barcodes. As a consequence, 998 of the 1018 species (98%) possessed a unique array of barcode sequences permitting their identification, while the others could be assigned to a species pair or, in one case, a triplet.

BIN splits
BIN splits were far more common than merges, being detected in 197 species (Table S2, Supporting information). Maximum sequence divergences between the BINs assigned to a currently recognized species averaged 4.32%, but ranged from 1.24% to 12.11%. Many species represented by multiple specimens were placed in either a single BIN or a few, but three (Grammonota angusta Dondale 1959, Tetragnatha versicolor and Evarcha hoyi) were placed in 11–22 BINs. There was a relationship between the BIN count for a species and the number of specimens analysed for species with a split (Fig. 6b), but the coefficient of determination was not strong (R² = 0.19). Examination of the three outlier species revealed large disparities in the abundance of their component BINs, often coupled with differing distributions as discussed below.
Grammonota angusta. Described in 1959 from Nova Scotia, this Nearctic species is widely distributed in Canada (Paquin et al. 2010) and has no recognized synonym (Fig. 7a). Two of its 22 BINs were common (100, 252 records), while three were of intermediate abundance, being represented by 9-11 specimens. The other 16 BINs possessed five or fewer records, including eight represented by just a single specimen. The most common BIN had the broadest distribution, occurring in all eastern provinces as well as in Manitoba, Saskatchewan and Alberta. By contrast, the remaining BINs were restricted to eastern Canada, none occurring west of Ontario. Small sample sizes made it difficult to determine the ranges of these uncommon BINs, but several were collected from distant locales (e.g. Nova Scotia & Ontario). Inspection of all sequences indicated that most substitutions were confined to third codon positions, the pattern of variation expected if the variants represent the authentic mitochondrial gene rather than a NUMT. Twelve of the 14 BINs in G. angusta represented by two or more specimens possessed high bootstrap values.
Tetragnatha versicolor. Reported from Central America to Arctic Canada, T. versicolor is one of the most widespread species of Tetragnathidae (Dondale et al. 2003) (Fig. 7b). Because of its morphological variability, including differences in genitalic morphology (Levi 1981), T. versicolor is very likely to represent a species complex. As seven species have been assigned synonymy with it (World Spider Catalog 2014), some of the BINs detected in this study may represent named taxa. Two of the 20 BINs in T. versicolor were common (75, 125 records), while six others were represented by 10-53 specimens. The other 12 BINs were represented by five or fewer specimens. Bootstrap analysis indicated that 12 of the 14 BINs with two or more members had high support. Although some BINs represented by large numbers of specimens were restricted to either western (BOLD:AAB8024) or eastern (BOLD:AAN6690) Canada, several others had transcontinental distributions, meaning that several BINs of T. versicolor were present in most provinces.
Evarcha hoyi. Occurring across southern Canada, E. hoyi has at least one and possibly several synonyms (Marusik & Logunov 1998; World Spider Catalog 2014) (Fig. 7c). Two of its 11 BINs were common (18 records each), while the others were represented by just 1-4 specimens. One of the common BINs was restricted to British Columbia, while the other was collected at sites across Canada. Three of the four BINs represented by several specimens were only collected in one province (two in British Columbia, one in New Brunswick), but the other was found in both Ontario and New Brunswick. Six of the seven BINs in E. hoyi with multiple specimens possessed strong bootstrap support.

Geographic variation in barcode sequences
Although nearly 28 000 barcode records were obtained, many of the species and BINs were represented by too few specimens or were collected from too few localities to permit analysis of the relationship between geographic separation and barcode sequence divergence. Analysis examined the 339 species represented by 10 or more specimens collected from sites at least 1000 km apart. Among these species, 188 did not show a significant correlation between geographical distance and sequence divergence, while the other 151 did (P < 0.05). The latter taxa included 103 species assigned to a single BIN and 48 placed in two or more BINs (Fig. 8). The median increase in sequence divergence in these 151 species was just 0.14% per 1000 km. Only eight species showed more than 1% sequence divergence per 1000 km, and each may be a species complex as their component specimens were assigned to two or more BINs.

Species diversity
The generation of a barcode reference library for Canadian spiders revealed 16 undescribed species as evidenced by their marked sequence and morphological divergence from known members of their genus. One of these species has now been described (Alopecosa koponeni, Blagoev & Dondale 2014), and descriptions are in progress for the others. As well, 52 species were newly recorded for Canada, generating first records for seven genera and one family (Table 4). Although most of these species were initially recognized through morphology, six species were only revealed when it was noted that their barcodes matched Eurasian specimens. A total of 426 new species records were added for provincial or territorial faunas (Fig. 9), in one case (Prince Edward Island) doubling the species count. In fact, the barcode-determined species occurrence records tripled the publicly available data on Canadian spiders (GBIF, accessed November 2014), illustrating the value of data releases based on DNA barcodes (Fernandez-Triana et al. 2014).
Results for the genus Neriene (Linyphiidae) illustrate the way in which DNA barcodes can reveal overlooked taxonomic diversity. Morphological study indicated the presence of all six species of Neriene reported from Canada as well as a single specimen divergent from any known taxon, a fact reinforced by its distinct barcode (Fig. 10). Specimens in five of the known species were assigned to a single BIN, but those of N. clathrata (Sundevall 1830) were placed in two BINs with a minimum divergence of 9.69%: one represented by six specimens from Ontario and the other by 19 from several provinces. N. clathrata is broadly distributed in Canada (Paquin et al. 2010), but was described from Europe so its range is Holarctic. The individuals in the BIN collected at sites across Canada (BOLD:AAA8358) showed close barcode similarity to European specimens, while those in the other BIN, BOLD:AAB7327, showed marked sequence divergence (Fig. 10). Morphological study (Fig. S2, Supporting information) indicated that males of the widely distributed BIN show close similarity to European N. clathrata (Nentwig et al. 2014), while those of the other BIN resemble the male of Neriene waldea Chamberlin & Ivie 1943, a species synonymized with N. clathrata nearly 50 years ago (van Helsdingen 1969).

Automated taxonomic placements of BINs
Although some specimens in this study could not be assigned to a known species, each of the 1359 BINs was correct family (Table 5). Most (94%) assignments were also correct for NN distances from 10 to 15%, but BINs with higher NN distances were often misassigned.

Discussion
The present study has assembled DNA barcode sequences for 1018 species of spiders, two-thirds of the Canadian fauna. Because 98% of these species possessed a diagnostic array of sequences, the present DNA barcode reference library is highly effective in identifying Canadian spiders. In those few cases where it fails to deliver a species-level identification, it assigns a specimen to a few closely related species. The development of this barcode library has codified the expertise of the leading taxonomists working on Canadian spiders in a format that will allow more researchers to explore the diversity of this group.
Fig. 8 Increase in sequence divergence in the COI barcode region with geographic distance for species represented by >10 specimens and collected at least 1000 km apart as measured by a minimum spanning tree (see Materials & Methods for details). This analysis only plots the 151 species that showed a significant (P < 0.05) positive relationship between sequence divergence and geographic distance. Axes: breadth of sampling (shortest path between sampling sites) vs. % divergence per 1000 km.
This study revealed that nearest-neighbour distances were 10× higher for Canadian spiders than intraspecific divergences (7.85% vs. 0.78%). These measures of variation presume that the current taxonomic system is valid and that all specimens have been correctly identified, requirements impossible to fully satisfy without further study. The few species lacking barcode divergence need to be investigated to validate their reproductive isolation, but there is a greater need to understand the factors responsible for the many cases of deep intraspecific sequence divergence which led 16.5% of the species to be placed into two or more BINs. Because of the strong correspondence between BINs and traditionally recognized species (Ratnasingham & Hebert 2013; Huemer et al. 2014; Pentinsaari et al. 2014; Zahiri et al. 2014), many of these BIN splits may represent species overlooked by the current taxonomic system, but other explanations are possible. The incidence of species with BIN splits is higher than that reported in prior work on other groups of arthropods (e.g. noctuoid Lepidoptera, 9.1% (Zahiri et al. 2014); Coleoptera, 5.0-5.7% (Hendrich et al. 2014; Pentinsaari et al. 2014)), but reinforces earlier evidence for the prevalence of deep intraspecific divergences in spiders (Barrett & Hebert 2005; Blagoev et al. 2009, 2013). In fact, maximum sequence divergence exceeded 4% in 95 of the 876 species represented by multiple individuals and more than 2% in 322 of these taxa.
Fig. 10 Neighbour-joining tree for six named and one unnamed species of Neriene from Canada. As N. clathrata was assigned to two BINs, sequences from 23 European specimens were included for comparison. Frontinella communis is a closely related outgroup.
The present study revealed three species with exceptional BIN diversity. For example, specimens of Tetragnatha versicolor possessed up to 9.69% divergence and were assigned to 20 BINs which showed evidence of differing geographic distributions. Because three to five lineages co-occur at many sites across Canada, tests for their reproductive isolation in sympatry should be possible. Although no other studies have aimed to quantify the incidence of terrestrial arthropods with high BIN diversity, other cases have been detected. For example, one species of Canadian noctuoid among 1531 species was assigned to 16 BINs (Zahiri et al. 2014).
The present study revealed 16 undescribed species with clear barcode and morphological divergence from any known species, but they may only represent the tip of a taxonomic iceberg. Future studies need to examine the many species of Canadian spiders assigned to more than one BIN to ascertain if most of these cases represent a cryptic species complex. These follow-up studies could involve more detailed morphological investigation, and perhaps the inclusion of one or more nuclear markers to test for reproductive isolation. In addition, these analyses should exclude alternative explanations for the sequence divergence patterns, such as geographic structure, biased variation induced by maternally transmitted parasites (Kvie et al. 2013), heteroplasmy (Frey & Frey 2004) or coamplification of pseudogenes (Song et al. 2008).
If these follow-up studies similarly corroborate the barcode results (e.g. Neriene clathrata, Fig. S2, Supporting information), this would imply that 30% of the fauna has been overlooked. Moreover, because half of the species (536 of 1018) examined in this study were represented by fewer than 10 specimens, it is likely that many barcode splits await detection. Expanded sample sizes might well reveal that the current species count for Canadian spiders is a 50% underestimate.
At least 40 000 species of spiders are thought to await description, a task whose completion has been estimated to cost $500M and to require another half century (Platnick & Raven 2012). By suggesting that the current taxonomic system overlooks many closely related species, the present study implies that the actual number of spiders awaiting description is larger, potentially raising costs and time. However, by showing the close correspondence between BINs and species, this study has revealed a pathway that can speed the registration of new species and lower the cost of such work. The Canadian spider fauna is better known and less diverse than that of many other regions, but we expect this pathway to be universal. There is no evidence to suggest that the results would be markedly different in more diverse areas, or regions with fragmented and complex habitats, but this remains to be tested. The present results indicate that nearly 80% of the sequence clusters recognized through BIN analysis coincide with recognized species. Moreover, the present study shows that new BINs can gain automated taxonomic placements, as evidenced by the accuracy of family assignments in cases where NN divergences were less than 10%. As the barcode reference library gains parameterization, fewer species will exceed this threshold, and more newly encountered BINs will also be deeply embedded in clades composed of species belonging to a particular subfamily or a genus, allowing the automation of more precise taxonomic placements. Given the accelerating pace of species loss, the need for speed in mapping species distributions and registering species diversity is obvious. The synoptic approach to species diversity provided by BINs can hasten biodiversity evaluations at precisely the time when this capacity is critically needed.

Acknowledgements
ica Young) who played a critical role in making the collections, in sorting samples and in carrying out sequence and data analyses.
Finally, we wish to thank the three anonymous reviewers whose comments greatly improved our manuscript.

Ethics statement
All animal work was conducted according to relevant national and international guidelines. No specific permissions are required to work with invertebrates in Canada. For field collection, no specific permissions are required for the collection of spiders from public areas.

Data accessibility
All collection and sequence data are available on BOLD (www.boldsystems.org) (Ratnasingham & Hebert 2007) in the public data set DS-SOC2014 (doi:10.5883/DS-SOC2014). The distance matrices for Fig. 8, the COI sequence alignment and the BOLD neighbour-joining tree (as both PDF and Newick files) have all been uploaded to DRYAD (doi:10.5061/dryad.21364). The sequence and primer data are also available on GenBank (see Table S1, Supporting information for accession numbers).

Supporting Information
Additional Supporting Information may be found in the online version of this article:
Fig. S1 BOLD neighbour-joining tree (K2P) for all 27 269 COI sequences >500 bp from spiders collected in Canada.