Springtail phylogeography highlights biosecurity risks of repeated invasions and intraregional transfers among remote islands

Abstract Human‐mediated transport of species outside their natural range is a rapidly growing threat to biodiversity, particularly for island ecosystems that have evolved in isolation. The genetic structure underpinning island populations will largely determine their response to increased transport and thus help to inform biosecurity management. However, this information is severely lacking for some groups, such as the soil fauna. We therefore analysed the phylogeographic structure of an indigenous and an invasive springtail species (Collembola: Poduromorpha), each distributed across multiple remote sub‐Antarctic islands, where human activity is currently intensifying. For both species, we generated a genome‐wide SNP data set and additionally analysed all available COI barcodes. Genetic differentiation in the indigenous springtail Tullbergia bisetosa is substantial among (and, to a lesser degree, within) islands, reflecting low dispersal and historic population fragmentation, while COI patterns reveal ancestral signatures of postglacial recolonization. This pronounced geographic structure demonstrates the key role of allopatric divergence in shaping the region's diversity and highlights the vulnerability of indigenous populations to genetic homogenization via human transport. For the invasive species Hypogastrura viatica, nuclear genetic structure is much less apparent, particularly for islands linked by regular shipping, while diverged COI haplotypes indicate multiple independent introductions to each island. Thus, human transport has likely facilitated this species’ persistence since its initial colonization, through the ongoing introduction and inter‐island spread of genetic variation. These findings highlight the different evolutionary consequences of human transport for indigenous and invasive soil species. Crucially, both outcomes demonstrate the need for improved intraregional biosecurity among remote island systems, where the policy focus to date has been on external introductions.


DArT-Seq SNP library preparation and quality filtering
Genomic DNA extraction and sequencing of the springtails Hypogastrura viatica and Tullbergia bisetosa was carried out at Diversity Arrays Technology (DArT) in Canberra, Australia. DArT employ DArT-SeqTM sequencing technology, which is a reduced representation next-generation sequencing method similar to double-digest restriction associated DNA sequencing (ddRAD). Following preliminary testing, the enzyme combination PtsI -HpaII was chosen for both springtail species for the double-digestion of genomic DNA. Digestion/ligation reactions (see method described in Kilian et al., 2012) were then used to ligate specific adaptors to both restriction enzyme overhangs. These adaptors facilitate sample identification (using unique barcodes) and Illumina sequencing, in a manner similar to that described in Elshire et al. (2011). Fragments were amplified in a PCR using the following conditions: 1 min at 94°C for denaturation; 30 cycles each consisting of 20 s at 94°C, 30 s at 58°C, 45 s at 72°C, followed by a final extension of 7 min at 72°C. Equimolar amounts of amplified product for each sample were then pooled and sequenced using an Illumina Hiseq2500.
Raw Illumina data were processed using a proprietary DArT pipeline. Sequences were processed to filter out bad quality reads (minimum read Phred score of 10), with more stringent filters applied to barcode regions of the fragments (minimum Phred score of 30) to ensure sequences would be assigned to samples reliably. Fragments were trimmed to 69 bp, and a second proprietary DArT pipeline (DArTsoft14) was used to call candidate SNP markers and conduct initial filtering. During this phase, SNPs were filtered for >90% reproducibility (informed by technical sample replicates), 20% call rate and a minimum read depth of 5.
The raw SNP data files generated by DArT for each species were imported as genlight objects into R v.3.4.3 for additional quality filtering using the 'dartR' package v.1.0.5 (Gruber et al., 2018). While the DArTsoft14 pipeline ensured that SNPs had a call rate of at least 20% (i.e. were present in at least 20% of individuals), call rates in the genomic literature are typically higher, ranging from 50% to 100%. There is growing evidence that less stringent filtering (lower call rates) provides greater phylogenetic resolution despite incurring more missing data (e.g. Díaz-Arce et al., 2016;Wagner et al., 2013), therefore, we chose to filter our SNPs to a moderate call rate of 80%. Secondary reads (those called from the same fragment as another SNP) were excluded to minimize genetic linkage, while further screening ensured that the minor allele frequency was ≥2% globally (across all geographic sites) or ≥20% locally (within a site). Following this, individuals with ≥50% missing data were removed and any loci that became monomorphic as a result were also removed. The SNP dataset sizes at each filtering step are provided in Table S4.

Table S12
Number of SNPs (out of 7,275) with fixed allelic differences for each pair of populations of Hypogastrura viatica. None of the values were found to be statistically significant, based on a false positive rate determined by 5,000 simulations.

Figure S1
PCoA for Tullbergia bisetosa based on a conservative dataset of 4,914 SNPs (excluding loci putatively under selection or violating Hardy Weinberg Equilibrium: see main text). The first two principal coordinate axes explain 44.7% of genetic variation. Individuals are coloured by population as shown.

Figure S2
PCoA for Hypogastrura viatica based on a conservative dataset of 5,281 SNPs (excluding loci putatively under selection or violating Hardy Weinberg Equilibrium: see main text). The first two principal coordinate axes explain 12.5% of genetic variation. Individuals are coloured by population as shown.

Figure S3
3D PCA for Tullbergia bisetosa based on 5,680 SNPs. The three axes shown explain 49.1% of genetic variation. Individuals are coloured by population. Figure S4 fastSTRUCTURE plot of K=7 identified clusters for Tullbergia bisetosa, based on 5,680 SNPs. Each vertical bar shows the degree of membership of an individual to one of seven clusters, represented by colours. The geographic source populations for individuals are indicated below the plot.

Figure S5
3D PCA for Hypogastrura viatica based on 7,275 SNPs. The three axes shown explain 14.1% of genetic variation. Individuals are coloured by population. Figure S6 fastSTRUCTURE plot of K=3 identified clusters for Hypogastrura viatica, based on 7,275 SNPs. Each vertical bar shows the degree of membership of an individual to one of three clusters, represented by colours. The geographic source populations for individuals are indicated below the plot.