Insights into invasive species from whole-genome resequencing

Studies of invasive species can simultaneously inform management strategies and quantify rapid evolution in the wild. The role of genomics in invasion science is increasingly recognised, and the growing availability of reference genomes for invasive species is paving the way for whole-genome resequencing studies in a wide range of systems. Here, we survey the literature to assess the application of whole-genome resequencing data in invasion biology. For some applications, such as the reconstruction of invasion routes in time and space, sequencing the whole genome of many individuals can increase the accuracy of existing methods. In other cases, population genomic approaches such as haplotype analysis can permit entirely new questions to be addressed and new technologies applied. To date, whole-genome resequencing has only been used in a handful of invasive systems, but these studies have confirmed the importance of processes such as balancing selection and hybridization in allowing invasive species to reuse existing adaptations and rapidly overcome the challenges of a foreign ecosystem. The use of genomic data does not constitute a paradigm shift per se, but by leveraging new theory, tools, and technologies, population genomics can provide unprecedented insight into basic and applied aspects of invasion science.


| INTRODUCTION
As an unintended consequence of global commerce and climate change, biodiversity is being redistributed at an unprecedented rate (Ding et al., 2008; Muirhead et al., 2015; Ricciardi, 2007; Sardain et al., 2019; Seebens et al., 2015). Many introductions fail to form viable populations on foreign soil, but those that go on to establish and spread (invasive species) are a dominant cause of biodiversity declines and a major threat to global food security (Clavero et al., 2009; Clavero & García-Berthou, 2005; Maxwell et al., 2016; Oerke, 2006). Invasion biology is an interdisciplinary field that aims to understand the transport, establishment and spread of invasive species and to inform management strategies that mitigate their impact.
Invasion genetics has proven to be an essential part of this effort (Barrett, 2015).
From a purely biological perspective, invasion events are ideal natural experiments that enable the observation of rapid adaptation, parallel evolution, inter- and intraspecific hybridization, and speciation in the wild (Lee, 2002; Ottenburghs, 2021; Prentis et al., 2008; Vallejo-Marin & Hiscock, 2016). Quantifying such phenomena in invasive species will inevitably shed light on the factors that facilitate their transport, establishment and spread. A cursory examination of previous issues of Molecular Ecology will attest to the long history of invasion geneticists working on both pure and applied aspects of invasion biology.
High-throughput sequencing is now broadly recognised as an important tool for monitoring, managing and mitigating the impact of invasive species (Chown et al., 2015; Hamelin & Roe, 2020; Neafsey et al., 2021; Rius et al., 2015; Sherpa & Després, 2021; Tay & Gordon, 2019). As a result, there has been a recent increase in the availability of reference genomes for invasive species, laying the groundwork for population resequencing projects (Martin et al., 2019; McCartney et al., 2019). Bioinformatics literacy continues to increase among researchers, as does recognition of the value of DNA sequence data in invasion science. Combined with the continued decline in the cost of sequencing, this suggests that whole-genome resequencing (WGR) will become a crucial tool in invasion genetics.
What can population genomics add to the field of invasion genetics? The decision to sequence multiple whole genomes is not straightforward. Progress in invasion genetics is limited by a lack of manipulative experiments at least as much as it is limited by a lack of genome sequence data (Bock et al., 2015). Additionally, WGR is currently a nontrivial cost in many systems, and reduced-representation sequencing (RRS) is in many cases sufficient to address key questions in invasion genetics (see Box 1). Here, we review the extent to which WGR has been adopted in the field of invasion genetics, assessing its existing and potential impact beyond other sequencing technologies (e.g., transcriptomics, RRS or comparative genomics of single reference genomes). To achieve this, we assessed 1614 publications that appeared in a Web of Science search using the term "invasive species" ¬ "genom*" "cancer". By combining the results of this search with recent preprints at the time of writing, a total of 32 studies that used WGR to study invasive species were identified (Table S1).
We highlight key case studies from this list, summarise theoretical considerations relevant to the population genomics of invasive species, and highlight newly developed technologies and analyses that enable novel insights through the use of WGR. Population genomic studies of native species evolving in response to invasive species, and studies of pathogens, are outside the scope of this review. It should also be noted that differences in terminology between disciplines may limit the studies we identify here. For example, studies of invasive plants that exclusively use the term "weed" rather than "invasive species" will not have been detected in our search, although we included some examples (e.g., Kreiner et al., 2019). Although WGR has been used to study invasive species from all kingdoms, the phylum Arthropoda represents the majority of the studies we identified (see Table S1). This taxonomic bias is probably the result of the early uptake of WGR in study systems with established genomic resources and does not reflect the important contribution that botanists have made to invasion genetics (Barrett, 2015). The research themes we identify below are not specific to any taxon, but rather represent the key opportunities that WGR can offer as it becomes an increasingly common tool.
We discuss five broad themes in invasion biology research that have involved, or could benefit from, high-throughput sequencing.
These are: the role of preinvasion adaptation in enabling subsequent spread (Part 2); tools to reconstruct invasion routes in space (Part 3); demographic inference to reconstruct the timing of invasion events, which also sheds light on the role of population bottlenecks during invasion (Part 4); in situ adaptation following introduction to novel bioregions (Part 5); and the role of hybridization and introgression during invasion (Part 6), which brings together all of the preceding themes. These key themes reflect the focus of existing WGR articles in invasion biology and span the temporal range of the invasion sequence (Figure 1).

| Time-series data can distinguish between species destined for invasion and those that adapt in situ
One of the oldest postulations in the field of invasion biology is that some species are "predisposed" to invasiveness (Baker, 1965). In other words, some traits that facilitate spread in a new environment are not adaptations that arise following colonization (see Part 5), but instead exist in native populations prior to transport (Figure 1). For example, Kreiner et al. (2019) used WGR data to trace the evolutionary history of glyphosate resistance, which facilitates agricultural invasion success in introduced Canadian populations of Amaranthus tuberculatus (common waterhemp). A combination of demographic modelling (see Parts 3 and 4) and scans for selective sweeps demonstrated that populations in Essex County were introduced from native populations in the midwestern United States, and carried glyphosate resistance alleles that had been selected for in the native range. Therefore, adaptation had occurred prior to the invasion event. By contrast, populations sampled at a different Canadian location (Walpole Island) showed a separate demographic history and clear evidence of recent selective sweeps in the invasive range that conferred glyphosate resistance via different alleles. In this case study, glyphosate resistance results from copy number variation, so unique structural variants and extended haplotype homozygosity were required to distinguish "preadapted" populations from those that adapted in situ; such analyses are only possible with WGR data (see Box 3).
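Extended haplotype homozygosity (EHH; Sabeti et al., 2002) is the probability that two randomly chosen chromosomes carrying a given core allele are identical over the whole interval from the core site to a focal site. The toy sketch below computes it from phased haplotypes; the encoding (one string of 0/1 characters per haplotype) and the function name `ehh` are our own illustrative assumptions, identity by state stands in for identity by descent, and this is not the pipeline used by Kreiner et al. (2019).

```python
from collections import Counter
from math import comb
from typing import List


def ehh(haplotypes: List[str], core: int, allele: str, end: int) -> float:
    """EHH between a core SNP and a focal SNP for carriers of one core allele.

    haplotypes: phased haplotypes, one string of '0'/'1' per chromosome
                (hypothetical encoding chosen for illustration).
    core:       index of the core SNP; allele: core allele to condition on.
    end:        index of the focal SNP (left or right of the core).
    """
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return float("nan")  # EHH is undefined with fewer than two carriers
    lo, hi = sorted((core, end))
    # Group carriers by their extended haplotype spanning core..end
    counts = Counter(h[lo:hi + 1] for h in carriers)
    # Fraction of carrier pairs that share an identical extended haplotype
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Toy example: EHH decays with distance as recombination breaks up the haplotype.
haps = ["0000", "0001", "0011", "0111", "1010", "1100"]
decay = [ehh(haps, core=0, allele="0", end=j) for j in range(4)]
```

Under a selective sweep, EHH around the favoured allele decays more slowly with distance than expected under neutrality, which is the signature that haplotype-based selection scans look for.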
In a second case study, the kudzu bug (Megacopta cribraria) was introduced from Asia to North America, where it initially grew on kudzu but within 9 months of detection had begun exploiting soy crops. Studies in the native range had shown that the genotype of the symbiont Candidatus Ishikawaella capsulata ("Ishikawaella" hereafter) mediates M. cribraria's ability to grow on soy (Hosokawa et al., 2006). Therefore, it was initially unclear whether the time between introduction and the switch to soy in the introduced range was due to ecological factors, or to the time taken for selection to act on founding Ishikawaella genotypes. Brown et al. (2014) inferred the evolution of Ishikawaella by sequencing its genome at various geographic locations, including the founding population in the year it was first detected, and at different time points since. Their analysis revealed that the founding population closely resembled native Japanese Ishikawaella samples known to enable growth on soy, with little evidence of allele frequency changes during invasion. This suggests that M. cribraria and its symbiont arrived in the US already able to spread on soy plantations; the switch to soy was not a consequence of post-invasion adaptation. This study exemplifies the significant and under-exploited benefits of WGR time-series data as a means of tracking genome-wide shifts in allele frequency during invasion. Of course, this second example is far from the biological reality of most study systems; the Ishikawaella genome is just 750 kbp and therefore at least an order of magnitude more affordable to sequence at a population scale than the genomes of most invasive species.

FIGURE 1 Through anthropogenic dispersal, many species are increasingly being transported to new bioregions. Only some of these species are able to establish a viable population in the foreign environment and, of these, a smaller fraction will grow exponentially to become 'invasive'. The time between initial colonization and rapid population growth is known as the lag phase (Sakai et al., 2001). The simplicity of this model of biological invasion, known as the invasion sequence, means that its components can be parameterised in terms of ecology and evolutionary biology (Kolar & Lodge, 2001; Lodge, 1993; Sakai et al., 2001). For example, it is critical to understand the eco-evolutionary dynamics that underlie the transition from establishment to spread, or the traits that allow some species but not others to establish small yet viable populations. Across the temporal extent of the invasion sequence, we discuss five research themes in invasion genetics that whole-genome resequencing can shed light on. These research themes are interrelated. For example, anthropogenic transport may increase opportunities for hybridization, which may then decrease the deleterious effects of a population bottleneck or increase the efficiency of adaptation.

BOX 1 Sequencing strategies in population genomics
Whole-genome resequencing is one of several sequencing technologies that can be used for population genomic analysis. Currently the most common family of technologies for WGR is short-read sequencing, where reads are aligned to an already-available reference genome. The per-base error rate is approximately 0.31% for Illumina reads (Schirmer et al., 2016). Therefore, if rare variants (those that occur in few individuals, and which differ from the reference genome) need to be identified with high accuracy, high sequence depth may be required. This may often be the case when using demographic inference to estimate the timing of an invasion event (see Part 4.1).
With limited resources, there is a trade-off between the number of individuals sequenced and the sequence depth. Depending on the biological question, larger individual sample sizes may be more valuable than high read depth (for detail on optimising sample sizes see Buerkle & Gompert, 2013; Fumagalli, 2013; Lou et al., 2021). For example, when inferring the geographic source of an invasive population, many reference individuals are required to achieve satisfactory geographic coverage (see Part 3). In some cases, such as when analysing historical museum or herbarium samples, low sequencing depth may be unavoidable (McGaughran, 2020), necessitating analytical pipelines designed to account for this limitation (e.g., Korneliussen et al., 2014). Linked-read technologies such as haplotagging allow variants to be imputed with high accuracy, which means that many individuals can be sequenced at the cost of low-depth sequencing with less of a compromise in effective read depth (Meier et al., 2021; see Box 3). However, the optimal read depth will depend on the focus of the study, as imputation comes at the cost of detecting rare alleles.
One cost-reducing alternative WGR strategy is Pool-Seq (see Hivert et al., 2018). If a given analysis requires allele frequencies from separate populations (e.g., detecting directional or balancing selection; see Figure 2), genomic DNA from many individuals of the same population can be pooled in equimolar proportion and sequenced together. The concept of Pool-Seq is shown below, where read colours correspond to three different (but unlabelled) individuals sequenced together. Pool-Seq may be largely outdated now that methods for individually barcoding large numbers of individuals for sequencing have become more affordable.
Reduced-representation sequencing is an alternative to the WGR strategies discussed above, which has been used successfully to study many evolutionary phenomena relevant to invasive species (see Andrews et al., 2016; Deschamps et al., 2012). Reduced-representation technologies produce sequence data from a small fraction of the genome, at restriction endonuclease cut sites (shown in grey below). This can be an affordable alternative to WGR, especially where many individual samples are required, where the species of interest has a large genome, or when a reference genome is unavailable. RRS (and other approaches that do not use whole-genome sequence data, such as transcriptomics) has been particularly useful when reconstructing invasion routes and inferring the demographic history of biological invasions (e.g., Gibson et al., 2020; Schmidt et al., 2020; Vallejo-Marin et al., 2021).
However, RRS data are less likely than WGR data to identify low-frequency alleles with shallow coalescence times, and consequently may not provide sufficient resolution to estimate the timing of recent biological invasions (Part 4.1). In contrast to WGR, RRS does not allow long haplotype data to be used (see Box 3) and has limited chromosomal resolution when using forward-genetic approaches to detect loci subject to selection (Parts 2 and 5) or adaptive introgression (Part 6). While phased WGR data (see Box 3) are most suitable for detecting the complete range of structural variants, RRS can be used to detect a small subset of copy number variants (CNVs; though with a greater potential for genotyping error) and large-scale structural variants (e.g., Dorant et al., 2020; Huang et al., 2020). In addition, RRS library preparation can be more time-consuming (and therefore more expensive) than library preparation for WGR.
In summary, we remain at the early phase of the era of WGR; for species with very large genomes, even low-depth sequencing currently remains infeasible. Therefore, RRS and transcriptomics will continue to make useful contributions to the field in some systems.
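To make the depth trade-off discussed in Box 1 concrete, the sketch below applies a deliberately naive calling rule (call a variant when at least `min_alt_reads` reads carry the alternative base) and reports two binomial probabilities: the chance of detecting a true heterozygote, and the chance that sequencing errors alone trigger a call at a homozygous site. It assumes the ~0.31% per-base error rate cited above, that a heterozygote's two alleles are sampled with probability 0.5 each, and that every error produces the same alternative base (which overstates false positives). The function names are hypothetical, and real variant callers use far richer likelihood models.

```python
from math import comb

ILLUMINA_ERROR_RATE = 0.0031  # per-base error rate cited in Box 1 (Schirmer et al., 2016)


def prob_at_least(m: int, depth: int, p: float) -> float:
    """P(X >= m) for X ~ Binomial(depth, p)."""
    return sum(comb(depth, k) * p**k * (1 - p) ** (depth - k) for k in range(m, depth + 1))


def depth_tradeoff(depth: int, min_alt_reads: int = 2,
                   error_rate: float = ILLUMINA_ERROR_RATE):
    """(sensitivity, per-site false-positive rate) for the naive calling rule.

    Assumptions: a heterozygote's reads carry the alternative allele with
    probability 0.5; at a homozygous site every error yields the same
    alternative base (an upper-bound simplification).
    """
    sensitivity = prob_at_least(min_alt_reads, depth, 0.5)
    false_positive = prob_at_least(min_alt_reads, depth, error_rate)
    return sensitivity, false_positive


if __name__ == "__main__":
    for d in (2, 5, 10, 20, 30):
        sens, fp = depth_tradeoff(d)
        print(f"depth {d:>2}: sensitivity {sens:.3f}, false-positive rate {fp:.2e}")
```

Sensitivity rises quickly with depth, while even a small per-site false-positive rate accumulates over millions of sites. This is why rare variants carried by one or a few individuals are the case where Box 1 suggests greater depth may be required.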
Time-series WGR sampling has rarely been applied in studies of invasive plants or animals (but see Valencia-Montoya et al., 2020), no doubt because population-level WGR remains prohibitively expensive in many systems. However, the approach is well suited to WGR data. Time-series studies do not require larger sample sizes than those employed in other population genomic studies of invasive species (e.g., Kreiner et al., 2019; You et al., 2020), and they are fully compatible with approaches such as Pool-Seq that make large WGR sample sizes more affordable. By collecting and storing tissue soon after introduction, and at regular intervals over the course of invasive spread, WGR can be a powerful tool with which to determine the relative influence of pre- and post-introduction adaptation (see Pavinato et al., 2021). Although RRS can also be informative when collected over a time series (e.g., Vandepitte et al., 2014), allele frequencies will go unmeasured at the vast majority of loci, potentially leading to an incomplete picture of the adaptive and demographic history of the invasion. WGR offers maximum sensitivity to detect any alleles that have been subject to selection in the invasive range, and to trace the origin of the haplotypes on which such alleles arise.
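A simple way to ask whether allele frequencies have shifted more than drift alone would predict between two sampling points is to compare each change with its Wright-Fisher expectation. For a locus starting at frequency p0, drift over t generations in a population of effective size Ne gives Var(p_t − p0) = p0(1 − p0)[1 − (1 − 1/(2Ne))^t]. The sketch below assumes Ne and t are known, ignores sampling and Pool-Seq noise in the frequency estimates, and flags loci by a z-score threshold; the function names and the default threshold are hypothetical. Published time-series methods model these extra sources of variance explicitly.

```python
import math
from typing import List, Sequence


def drift_variance(p0: float, ne: float, t: int) -> float:
    """Expected variance of allele-frequency change after t generations of
    Wright-Fisher drift with effective size ne, starting from frequency p0."""
    return p0 * (1 - p0) * (1 - (1 - 1 / (2 * ne)) ** t)


def flag_outliers(freqs_t0: Sequence[float], freqs_t1: Sequence[float],
                  ne: float, t: int, z: float = 3.0) -> List[int]:
    """Indices of loci whose frequency change exceeds z drift standard deviations.

    Ignores sampling variance in the frequency estimates (an assumption).
    """
    flagged = []
    for i, (p0, p1) in enumerate(zip(freqs_t0, freqs_t1)):
        var = drift_variance(p0, ne, t)
        if var == 0:  # locus fixed or lost at t0: drift cannot move it
            continue
        if abs(p1 - p0) / math.sqrt(var) > z:
            flagged.append(i)
    return flagged


# Hypothetical frequencies at founding and 10 generations later (Ne assumed 1000).
founding = [0.50, 0.50, 0.20, 0.05]
later = [0.51, 0.90, 0.22, 0.04]
candidates = flag_outliers(founding, later, ne=1000, t=10)
```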

| Balancing selection maintains adaptive diversity that can facilitate invasion
Unlike the example above, pre-existing adaptations that increase invasive spread in the invaded range can sometimes experience a different selective regime in the native range. Lee and Gelembiuk (2008) argue that fluctuating selection pressures in the native range (e.g., through regular disturbance events) can maintain either genetic variation or phenotypic plasticity, which can then be acted on by positive selection in novel environments. This prediction has been tested in the copepod species complex Eurytemora affinis (Stern & Lee, 2020).

FIGURE 2 Stern and Lee (2020) show that balancing selection in native saltwater populations of the Eurytemora affinis species complex acted on the same loci that were repeatedly subject to directional selection in the invasive freshwater range. (a) Scan for the footprint of directional selection in the invasive range using Bayescan 3 (Foll et al., 2014). Points in blue show a signature of directional selection. (b) Signatures of balancing selection were quantified in four native populations using the summary statistic β(2) (Siewert & Voight, 2020). Points in blue are within the upper first percentile of β(2) scores calculated within each population. Each colour corresponds to a population; populations with similar colours (light/dark blue and orange) belong to the same clade. The chromosomal coordinates of genes are highlighted in grey in both panels, with candidate genes in the NHA family labelled. Panels (a) and (b) are adapted from Stern and Lee (2020). (c) Individual of the Eurytemora affinis species complex collected from the Columbia River estuary, photographed by Carol Eunmi Lee.
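Panel (a) of Figure 2 relies on a Bayesian outlier method. As a much simpler stand-in (explicitly not BayeScan), the sketch below ranks SNPs by per-locus Hudson's F_ST between a native and an invasive population, F_ST = (p1 − p2)² / [p1(1 − p2) + p2(1 − p1)], treating allele frequencies as known without sampling error, and returns the top fraction as candidate loci. The function names and the 1% cut-off are illustrative assumptions; empirical outlier ranks like these are sensitive to demography (see Box 2) and do not by themselves demonstrate selection.

```python
import math
from typing import List, Sequence


def hudson_fst(p1: float, p2: float) -> float:
    """Per-locus Hudson's F_ST from population allele frequencies (no
    sample-size correction). Returns NaN for loci monomorphic for the same allele."""
    numerator = (p1 - p2) ** 2
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    return numerator / denominator if denominator > 0 else float("nan")


def outlier_indices(p_native: Sequence[float], p_invasive: Sequence[float],
                    top_fraction: float = 0.01) -> List[int]:
    """Indices of the loci in the upper `top_fraction` of the F_ST distribution."""
    scores = [(i, hudson_fst(a, b)) for i, (a, b) in enumerate(zip(p_native, p_invasive))]
    scores = [(i, s) for i, s in scores if not math.isnan(s)]
    if not scores:
        return []
    k = max(1, round(top_fraction * len(scores)))  # keep at least one locus
    ranked = sorted(scores, key=lambda item: item[1], reverse=True)
    return sorted(i for i, _ in ranked[:k])
```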

| A phylogeographic perspective on invasive species and invasive genes
A useful application of population genetics is to quantify patterns of isolation-by-distance to identify genetically distinct management units, infer the source population(s) of invasive species, and estimate the timing of introduction(s) (Cristescu, 2015). Both phylogenetic and assignment-based population genetic methods, typically using mitochondrial DNA or microsatellite data from native and invasive populations, have been applied to hundreds of invasive species over several decades (reviewed by . The same methods can be used with WGR data, which can add resolution, especially in systems with little population structure (e.g., recent introductions or high interpopulation connectivity). For example, WGR data were used to show that the cosmopolitan crop pest Plutella xylostella (diamondback moth) most likely originated in South America (You et al., 2020). In complex invasion scenarios involving admixture (see Part 6), WGR data can also be used to infer the contribution of different source populations to the invasive gene pool. For example, WGR data were used to infer the source populations of specific alleles that facilitated the successful colonization of Canadian agricultural landscapes by A. tuberculatus .

| Making the most of genomic data
Population genomic data are best suited to analytical tools designed to work efficiently with large data sets and make the most of the available information. To this end, a number of new analytical approaches have been developed to infer the geographic origin of a genomic sample using continuous spatial models (e.g., Battey et al., 2020; Guillot et al., 2016). Because they are computationally efficient, such methods can also be used to estimate the geographic origin of a sample in chromosomal windows. This feature is particularly useful when tracing the geographic origin of a candidate locus (e.g., a haplotype containing a pesticide resistance gene or a QTL known to be associated with invasiveness) or when investigating the contribution of different source populations across the genome.

FIGURE 3 Battey et al. (2020) developed machine learning software, Locator, which can be used to estimate the geographic location of a genetic sample. This and other methods for spatial inference are promising for invasion genetics: the geographic origin of an individual, as well as the spatial origin of a particular locus of interest, can be estimated with WGR data. However, large individual sample sizes in the native range may be required, depending on the level of population structure. (a) Estimated geographic locations for 153 Anopheles gambiae/coluzzii samples in sub-Saharan Africa using a total of 612 training samples and a 2 Mbp window size. Each inferred sample location (geographic centroid of per-window estimates) is a blue point connected by a line to the true location of the sample (black point). (b) Geographic error (distance between estimated and true location) and number of training samples per location, for locations in (a). Here, a relatively large number of individuals (>50) is required to achieve sufficient accuracy and precision at a spatial scale that is likely to be useful in practice.

BOX 2 The demography of biological invasion
Demographic bottlenecks (in which the census population size decreases and then increases again) are expected to occur under simple models of biological invasion and may be common among invasive species. The consequences of demographic bottlenecks for population genetic models are well established and far-reaching (reviewed by Gattepaille et al., 2013). Changes in population size shift the proportion of alleles that are rare. Therefore, statistics that summarise the site frequency spectrum, such as Tajima's D, will reflect the magnitude of the bottleneck and the number of generations since the invasion, and will show greater than expected variance among loci (Stajich & Hahn, 2005). After a sufficiently extreme bottleneck, all lineages will coalesce to form a star-like genealogy and will eventually produce an excess of rare variants; however, if the bottleneck is more moderate and multiple lineages persist, an excess of intermediate-frequency variants may be observed (Depaulis et al., 2003). These scenarios are depicted below (adapted from Gattepaille et al., 2013).
Various ramifications become apparent when thinking about simple invasion scenarios not just in time, but in space. For example, during invasion, individuals at the range edge are presumably more likely to disperse into unexploited habitat than those at the centre of the metapopulation. Over time, this would create strong drift at the leading edge of an expanding population. As a consequence, a subset of low-frequency mutations that arise at the range edge will propagate over space and reach high frequencies simply as a consequence of population expansion (Edmonds et al., 2004). This phenomenon is known as genetic surfing (or "allele surfing"), and is more likely to occur in small populations that rapidly expand in range and size, which could include many invasive species (Klopfstein et al., 2006). Below, a mutation at the expanding range edge spreads to high frequency over a large area through genetic surfing (adapted from Foutel-Rodier & Etheridge, 2020).
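Tajima's D, mentioned in Box 2, contrasts two estimates of the population mutation rate: nucleotide diversity π (mean pairwise differences) and Watterson's estimator S/a1 (based on the number of segregating sites S). An excess of rare variants pushes D below zero; an excess of intermediate-frequency variants pushes it above zero. The sketch below implements the standard formula from per-site derived-allele counts in a sample of n sequences; the input format and function name are our own choices, and it assumes unfolded counts with no missing data.

```python
import math
from typing import Sequence


def tajimas_d(derived_counts: Sequence[int], n: int) -> float:
    """Tajima's D from derived-allele counts (one per site) in n sampled sequences."""
    seg = [k for k in derived_counts if 0 < k < n]  # keep segregating sites only
    s = len(seg)
    if s == 0:
        return float("nan")  # D is undefined without segregating sites
    # pi: expected pairwise differences, summed over sites (k(n-k) / C(n, 2) per site)
    pi = sum(2 * k * (n - k) / (n * (n - 1)) for k in seg)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical spectra for n = 10 sequences:
rare_excess = tajimas_d([1] * 20, 10)          # star-like genealogy: D < 0
intermediate_excess = tajimas_d([5] * 20, 10)  # moderate bottleneck: D > 0
```

In practice D would be computed in genomic windows, and its distribution across windows compared with expectations simulated under the inferred demographic history, since Box 2 notes that bottlenecks inflate its variance among loci.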
For example, Locator (Battey et al., 2020) has been used to identify the geographic origin of Anopheles samples (Figure 3). Locator is one of several new analytical tools in population genomics that make use of machine learning (reviewed by Schrider & Kern, 2018; see also Flagel et al., 2019). To date, modern tools for geographic inference have been taken up more readily in other fields (e.g., forensic investigations into illegal wildlife trade), though they hold great potential in invasion science as a means of biomonitoring, for instance to determine at a fine spatial scale the origin of invasive species intercepted at ports.
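Figure 3 summarises each sample's location as the geographic centroid of per-window estimates and reports error as the distance from the true location. The sketch below shows one way to do both calculations; it is not Locator's code. Averaging unit vectors on a sphere for the centroid and using the haversine formula with a mean Earth radius are our own choices, and the per-window estimates in the example are hypothetical.

```python
import math
from typing import Iterable, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees
EARTH_RADIUS_KM = 6371.0088   # mean Earth radius


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def spherical_centroid(points: Iterable[LatLon]) -> LatLon:
    """Centroid of points on a sphere via the mean of their 3D unit vectors."""
    x = y = z = 0.0
    for lat, lon in points:
        la, lo = math.radians(lat), math.radians(lon)
        x += math.cos(la) * math.cos(lo)
        y += math.cos(la) * math.sin(lo)
        z += math.sin(la)
    return math.degrees(math.atan2(z, math.hypot(x, y))), math.degrees(math.atan2(y, x))


# Hypothetical per-window location estimates for one sample, and its true location.
window_estimates = [(-1.1, 36.7), (-1.4, 37.0), (-0.9, 36.5), (-1.3, 36.9)]
estimate = spherical_centroid(window_estimates)
error_km = haversine_km(estimate, (-1.29, 36.82))
```

The vector mean avoids the distortions of averaging raw longitudes near the antimeridian, which matters little for the African sampling in Figure 3 but could for global invasive ranges.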
Whole-genome resequencing increases our ability to infer the geographic origin of invasive populations. For relatively simple invasion scenarios, however, this gain in precision over other sequencing technologies may be insignificant, because RRS may record enough information per sample to reach the same conclusion as WGR data (see Box 1). For example, analysis of RRS data was used to quantify ongoing migration rates of Aedes albopictus in Australia, and to identify the source of the incursions using Locator (Schmidt et al., 2021). This example is relevant because geographic coverage is more important than genomic coverage when reconstructing invasion routes. Therefore, unless WGR is simultaneously being used to address a different research question (e.g., quantifying admixture during invasion or inferring the timing of an invasion event), a larger individual sample size with reduced genomic representation may often be superior to a smaller sample size of WGR samples. However, in many cases, inferring the time at which an invasion event occurred is of equal importance to inferring its geographic origin, necessitating a different sampling strategy (see Part 4.1). WGR will probably add most value to geographic inference where there is a weak signature of population structure, or in studies of admixed invasive populations aiming to infer the geographic origin of specific loci (see Part 6).

| INFERRING DEMOGRAPHIC CHANGE DURING INVASION AND QUANTIFYING ITS IMPACT ON INVASION SUCCESS
In the simplest model of a successful invasion, some small fraction of a native population is transported to a new environment where a viable population forms and increases in size over time. This leads to a demographic bottleneck (a decrease in population size followed by an increase), which can have wide-ranging implications for both practical and theoretical aspects of invasion genetics (see Box 2).
From a pragmatic perspective, the extreme demographic dynamics of colonization can be used to reconstruct invasion events in time.
From a biological perspective, invasion geneticists are tasked with explaining how so many invasive species form viable populations, let alone dominate foreign ecosystems, in the face of a population bottleneck that reduces genetic diversity and increases the risk of inbreeding depression (Estoup et al., 2016). We discuss these two perspectives separately.
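How much diversity a founding bottleneck removes can be illustrated with the classical drift expectation for an idealised diploid Wright-Fisher population: each generation retains a fraction 1 − 1/(2N) of heterozygosity, so H_t = H_0 ∏ (1 − 1/(2N_i)) over the sequence of effective sizes N_i. The sketch below applies this to a hypothetical size trajectory; it ignores mutation, migration, selection and multiple introductions, all of which matter in real invasions.

```python
from typing import Iterable


def retained_heterozygosity(sizes: Iterable[float], h0: float = 1.0) -> float:
    """Expected heterozygosity after drift through per-generation effective sizes.

    Uses H_t = H_0 * prod_i (1 - 1/(2 N_i)) for an idealised diploid
    Wright-Fisher population; sizes are effective, not census, sizes.
    """
    h = h0
    for n in sizes:
        if n <= 0:
            raise ValueError("population sizes must be positive")
        h *= 1 - 1 / (2 * n)
    return h


# Hypothetical invasion: 3 generations at Ne = 5, then rapid growth.
bottleneck_then_growth = [5, 5, 5, 50, 500, 5000]
fraction_retained = retained_heterozygosity(bottleneck_then_growth)
```

Because the loss compounds most strongly in the smallest generations, a brief but severe bottleneck followed by rapid growth can retain much of the ancestral diversity, which is one reason the severity of genetic bottlenecks in invasive species varies so widely.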

| Population bottlenecks as a timestamp in the genome
Population bottlenecks, detected using demographic inference methods, can help to infer the number and timing of independent colonization events. For example, a dramatic population bottleneck was detected in one invasive lineage of A. tuberculatus, consistent with its colonization of North American agricultural landscapes . In a separate study, there was a clear signature of population bottleneck in the invasive fall webworm (Hyphantria cunea), although this predated introduction to China in 1979 (Wu et al., 2019). In contrast, there was no signal of a population bottleneck in invasive North American populations of the common carp, which instead shared a similar demographic history to putative Genetic surfing creates fitness costs at the range edge (reviewed by Angert et al., 2020) in two ways. First, at least in one dimensional simulations, surfing causes genetic diversity to decline over space away from the population centre (Hallatschek & Nelson, 2008).
Second, deleterious mutations can surf on the wave of advance to reach high frequencies over a large range (Peischl et al., 2013;Travis et al., 2007)-a phenomenon known as expansion load (Peischl & Excoffier, 2015). These costs make the success of invasive populations seem even more paradoxical (Estoup et al., 2016). However, a number of solutions have been proposed to the cost of range expansion. First, under some coniditions, long range dispersal can ameliorate the loss of genetic diversity through surfing (Paulose & Hallatschek, 2020). Second, the spatial sorting of dispersal traits that results from superior dispersers finding mates more often at the range edge (Shine et al., 2011) can rescue populations from expansion load (Peischl & Gilbert, 2020).
Genetic surfing can also create geographic clines in allele frequency in the direction of range expansion (Klopfstein et al., 2006), clusters of low genetic diversity, and sweeps of random loci in different regions of the metapopulation (Hallatschek et al., 2007). These allele frequency patterns may be falsely interpreted as a footprint of selection (Excoffier & Ray, 2008).
Thus, when using genomic data to detect post introduction adaptation in an invasive population known to have undergone a population bottleneck, modelling approaches should be used to rule out potentially confounding demographic and spatial effects (e.g., Currat et al., 2006). Moreover, the model of expansion load of Peischl and Excoffier (2015) provides clear expectations in terms of the shape of the site frequency spectrum at the range front, which may be validated or rejected using WGR data from invasive populations.
BOX 3 Maximising the advantage of whole-genome resequencing with haplotype data All sequencing technologies allow allele frequencies to be measured. One of the key advantages of WGR over other technologies is the opportunity to exploit additional information, such as the haplotypes on which physically linked alleles are coinherited. Haplotype data enable the use of several powerful analytical methods (reviewed by Leitwein et al., 2020) that are relevant to invasion genomics.
Because recombination and mutation reconfigure haplotypes over time, the size and frequency of haplotypes convey evolutionary information-a phenomenon that Moorjani et al. (2016) refer to as the "recombination clock". For example, a haplotype on which a beneficial allele arises is swept to fixation faster than recombination can break it down to its expected size under neutrality. Therefore a signature of selection is left by unusually large stretches of haplotype homozygosity (i.e., linkage extends further from the selected locus than expected), and by the unexpectedly high frequency of a core haplotype (Sabeti et al., 2002). This is the basis for tests of extended haplotype homozygosity, used to scan the genome for signatures of selection (see Parts 2 and 5). Haplotype data are also a powerful means of reconstructing population size change through time, because demographic shifts cause distinct size changes in tracts of identity by descent (Harris & Nielsen, 2013). By analysing long haplotypes identical by descent (that have not yet been broken down by recombination), Browning and Browning (2015) were able to accurately reconstruct changes in human population size in the recent past (4-50 generations before present). This approach holds great potential for invasion genetics, where it is often difficult to reconstruct recent demography (see Part 4.1).
Haplotype data show most promise in recently admixed populations (see Part 6). Any analysis of hybridization using haplotype data will require the ancestry of an introgressed haplotype ("ancestry tract" or "ancestry block") to be inferred (for a review of approaches to ancestry assignment see Leitwein et al., 2020). Duranton et al. (2019) studied the introgression of Atlantic sea bass (Dicentrarchus labrax) into Mediterranean populations of the same species. By modelling the diffusion of introgressed haplotypes through space (by gene flow) as they are broken down over time (by recombination), the average per-generation dispersal distance could be estimated.
This approach is likely to be useful for reconstructing the spatial extent of introgression in invasive species (see Parts 3 and 6). Finally, adaptive introgression can be accurately detected using haplotype data (see Shchur et al., 2020). In summary, haplotype data open many possibilities in invasion genetics research, representing one of the key advantages of using WGR to study invasive species.
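Because recombination breaks ancestry tracts down over time, tract lengths can be converted into a simple estimate of admixture timing. The sketch below assumes a single pulse of admixture T generations ago, with a fraction m of ancestry contributed by the focal source. It uses a common Markovian approximation in which tracts of that ancestry have exponentially distributed lengths with rate (1 - m)T per Morgan, so that T ≈ 1 / ((1 - m) L), where L is the mean tract length in Morgans. Real analyses must also handle tracts truncated by chromosome ends and short tracts that ancestry callers miss. The function below is a hypothetical illustration only.

```python
from statistics import mean
from typing import Sequence


def admixture_time(tract_lengths_cm: Sequence[float], m: float) -> float:
    """Generations since a single admixture pulse, from focal-ancestry tract lengths.

    Assumes tracts are exponentially distributed with rate (1 - m) * T per
    Morgan, so the maximum-likelihood rate is 1 / (mean length in Morgans).
    """
    if not 0.0 <= m < 1.0:
        raise ValueError("admixture proportion m must be in [0, 1)")
    mean_morgans = mean(tract_lengths_cm) / 100.0  # centiMorgans -> Morgans
    return 1.0 / ((1.0 - m) * mean_morgans)


# Hypothetical tracts with a mean of 10 cM and 20% focal ancestry.
print(round(admixture_time([5.0, 10.0, 15.0], m=0.2), 2))  # 12.5
```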
However, haplotype information cannot be extracted directly from WGR data generated using short reads. Therefore, until long-read sequencing becomes scalable, direct or indirect methods for inferring gametic phase (i.e., which alleles co-occur on each of the two homologous chromosomes, in the case of diploids) are needed to leverage haplotype information from WGR data. Indirect (statistical) phasing methods can be applied to whole-genome data sets obtained with short-read sequencing technology (reviewed by Rhee et al., 2016). The accuracy of these statistical methods depends on factors such as the number of samples and the density of polymorphic sites (Browning & Browning, 2007), and phasing errors can affect downstream biological interpretation. Direct phasing methods, on the other hand, record chromosomal haplotypes during the generation of sequence data. Linked-read sequencing is a newly developed family of direct phasing technologies that produces fewer phasing errors than indirect statistical approaches (Amini et al., 2014; Choi et al., 2018).
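Phasing accuracy is commonly summarised by the switch error rate: the fraction of adjacent pairs of heterozygous sites at which the inferred phase flips relative to the truth. A minimal sketch follows. It assumes that the true and the inferred haplotypes are each given as the allele (0 or 1) carried on one chromosome copy at each heterozygous site, in order along the chromosome.

```python
from typing import Sequence


def switch_error_rate(true_hap: Sequence[int], inferred_hap: Sequence[int]) -> float:
    """Fraction of adjacent heterozygous-site pairs with a phase switch.

    Only relative phase matters, so an inferred haplotype that is the exact
    complement of the truth has no switch errors.
    """
    if len(true_hap) != len(inferred_hap) or len(true_hap) < 2:
        raise ValueError("need two equal-length haplotypes with >= 2 het sites")
    # At each het site, record whether the inferred allele matches the truth.
    agree = [t == i for t, i in zip(true_hap, inferred_hap)]
    # A switch error occurs wherever agreement changes between adjacent sites.
    switches = sum(a != b for a, b in zip(agree, agree[1:]))
    return switches / (len(agree) - 1)


print(switch_error_rate([0, 1, 1, 0, 1], [0, 1, 0, 1, 0]))  # 0.25
print(switch_error_rate([0, 1, 1], [1, 0, 0]))              # 0.0
```

A single switch in the middle of a long haplotype splits it into two chimeric halves. This is one way phasing errors can bias haplotype-length statistics such as EHH or identity-by-descent tract lengths.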
Though linked-read sequencing approaches show great promise in population genomics (e.g., Lutgen et al., 2020), many platforms are currently prohibitively expensive. One notable exception is haplotagging, a recent low-cost linked-read sequencing method (Meier et al., 2021).
Through haplotagging, kilobase-length DNA fragments are tagged with bead-specific barcodes as they wrap around microbeads in solution. Standard short-read sequencing can then proceed, with long-range haplotype information retained in the barcodes. The method also allows individuals to be sequenced at lower depth, because missing genotypes can be imputed using haplotype information, and it makes structural variants easier to identify.
ancestral populations in Europe (Yuan et al., 2018). A criticism of the latter two studies is that both used the sequentially Markov coalescent (SMC), which is most appropriate for inferring demographic change in the more distant past. Methods that use the site frequency spectrum, on the other hand, tend to perform better when inferring recent demographic change. A further complication is that invasions often involve sequential introductions and admixture among differentiated source populations or even species, making their origins much harder to identify. Post-introduction adaptation may add further noise to the genome-wide signature of a population bottleneck, because selection alters coalescence times at loci in linkage disequilibrium with selected sites (see Charlesworth, 2009). Nonetheless, the timing and magnitude of admixture can be inferred by analysing the size distribution of haplotypes, information that is not measurable using reduced-representation approaches (Harris & Nielsen, 2013; see Box 3). In particular, Bayesian estimates of complex demographic parameters can be improved by using whole-genome sequence data rather than RRS, largely because of the information added by haplotype statistics (Smith & Flaxman, 2020).
Whole-genome sequence data should, in general, enable more accurate demographic inference. This is because demographic inference in the recent past relies on the shallow coalescence times of low-frequency alleles. Provided that sequencing depth is sufficient to call low-frequency alleles accurately, WGR data are therefore more likely than RRS to capture the rare alleles needed to time recent bottlenecks (Hahn, 2019). A study by Puckett et al. (2020) illustrates this limitation of RRS. Using RRS, the authors set out to test the hypothesis that a 1768 shipwreck introduced the brown rat (Rattus norvegicus) to the Faroe Islands. Although three introduction events could be inferred, their timing could not be estimated because of a lack of rare alleles, which had been removed by the very bottlenecks the authors were attempting to date. Because whole-genome data capture many more rare alleles, WGR is likely to give better resolution in cases such as this.
In summary, estimating the timing of recent and extreme changes in population size may in some cases be an intractable problem, but WGR data increase the chance of success. In simpler demographic scenarios, more affordable approaches, such as transcriptome sequencing and microsatellite markers, have shown considerable success in reconstructing invasion histories (e.g., Fontaine et al., 2021; Popovic et al., 2020). While these remain satisfactory strategies in some contexts, WGR data can increase resolution and will be especially useful for dating recent or complex invasions.

| Population bottlenecks as a paradox to resolve
A central aim in invasion genetics is to understand the general impact of genetic drift on invasion success beyond individual case studies (Bock et al., 2015). Debates about the role of genetic drift in invasion success are as old as the field of invasion genetics itself (Baker & Stebbins, 1965; Barrett, 2015). Even today, researchers seek to resolve the "paradox of biological invasion" by explaining how invasive populations rapidly adapt to new environments despite a loss of genetic diversity, a reduction in the efficiency of natural selection and an increased risk of inbreeding depression (Estoup et al., 2016).
Additionally, if an introduced population can overcome (or avoid) these challenges, it must then somehow endure expansion load (see Box 2). WGR data are well suited to examining this apparent paradox.
First, WGR can be used to reliably test whether an introduced population has experienced a population bottleneck, and to distinguish among different demographic scenarios that explain the observed level of genetic diversity (Smith & Flaxman, 2020; Welles & Dlugosch, 2018). For example, genetic diversity was higher in invasive populations of P. xylostella than in their native range, despite clear evidence of a population bottleneck, apparently as a result of admixture among introduced populations (You et al., 2020).
Second, WGR allows genetic diversity to be measured across contiguous loci, so that nucleotide diversity can be associated with other factors that covary along chromosomes at a much finer scale than RRS allows. Comeault et al. (2020) showed that, although […] also an understanding of the association (or lack thereof) of that diversity with fitness and fitness-associated traits in the invaded range (see Davidson et al., 2011; Szulkin et al., 2010).
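To make the bottleneck test concrete: a recent bottleneck removes rare alleles, so it tends to push statistics of the site frequency spectrum, such as Tajima's D, above zero, whereas population expansion pushes them below zero. The sketch below computes Tajima's D from an unfolded site frequency spectrum. It assumes that the ancestral state of each SNP is known and that all sites are genotyped in the same n chromosomes. Selection and population structure also shift D, so a value like this is only one input to model-based demographic inference.

```python
import math
from typing import Sequence


def tajimas_d(sfs: Sequence[float], n: int) -> float:
    """Tajima's D from an unfolded site frequency spectrum.

    sfs[i - 1] is the number of segregating sites at which the derived allele
    is carried by i of the n sampled chromosomes (i = 1 .. n - 1).
    """
    if len(sfs) != n - 1:
        raise ValueError("sfs must have n - 1 entries")
    s = sum(sfs)  # number of segregating sites
    # Mean pairwise differences (theta_pi) computed from the SFS.
    pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / (n * (n - 1) / 2)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    # Compare theta_pi with Watterson's theta (s / a1), scaled by its standard error.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


n = 10
neutral = [10 / i for i in range(1, n)]  # expected neutral SFS for theta = 10
rare_depleted = [0.0] + neutral[1:]      # crude stand-in for a bottleneck: no singletons
print(abs(tajimas_d(neutral, n)) < 1e-9)  # True
print(tajimas_d(rare_depleted, n) > 0)    # True
```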

| Approaches to measuring post introduction adaptation
The study of rapid adaptation is a major research theme in population genomics, so applying its tools and technologies to invasion biology represents an exciting opportunity. Although there are relatively few examples to date, approaches such as genome-wide scans for selection and association mapping with traits of interest are so well established that we expect the study of in situ adaptation during invasion to benefit substantially from genomics over the coming decade. In terms of the invasion sequence, adaptive change in the invaded range can either end the lag phase by facilitating spread or accelerate spread in an already invasive population (Prentis et al., 2008) (Figure 1). Although in situ adaptation is thought to play an important role in invasion success, it has often been difficult to quantify the contribution of an adaptive trait to the rate of spread (Bock et al., 2015). In situ adaptation may also occur long after spread and therefore may not contribute to the initial success of an invasion. Regardless, identifying adaptation in the invasive range not only contributes to a general understanding of biological invasion but also provides information for integrated management strategies.

| Forward-genetics in the wild
"Top-down" forward-genetic approaches start with a particular trait and dissect its genetic basis. There are few examples of this approach using WGR data in invasion biology. Forward-genetic approaches have historically involved QTL mapping in invasive species reared under controlled conditions. However, if an invasiveness trait can be scored in wild individuals, genome-wide association studies can be carried out with WGR data. For example, the genetic basis of wing length was investigated with admixture mapping using WGR from field-collected samples of introduced honey bees (Apis mellifera), though no major-effect loci were identified (Calfee et al., 2020). In another WGR study, of Aedes aegypti sampled across its native sub-Saharan range, a handful of major-effect loci underlying preference for human odour were identified (Rose et al., 2020). Although the latter study was conducted within the native range of A. aegypti, preference for human hosts appears to contribute to the species' spread into urban habitats.

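At its simplest, a genome-wide association study regresses the trait on genotype at each SNP in turn. The sketch below is a hypothetical helper, not any published pipeline. It fits that single-SNP linear model by least squares, with genotypes coded as dosages (0, 1 or 2 copies of one allele). It deliberately omits what real studies of wild samples need: covariates correcting for population structure and relatedness (for example, via mixed models), and a multiple-testing correction across the genome.

```python
import math
from typing import Sequence, Tuple


def snp_association(dosages: Sequence[float], phenotypes: Sequence[float]) -> Tuple[float, float]:
    """Least-squares effect size (beta) and t statistic for one SNP."""
    n = len(dosages)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need at least three paired observations")
    mx = sum(dosages) / n
    my = sum(phenotypes) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotypes))
    beta = sxy / sxx
    # Residual variance with n - 2 degrees of freedom (intercept + slope).
    rss = sum((y - my - beta * (x - mx)) ** 2 for x, y in zip(dosages, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    t = beta / se if se > 0 else math.copysign(math.inf, beta)
    return beta, t


beta, t = snp_association([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.1, 1.9, 3.0, 3.2])
print(round(beta, 3), t > 10)  # 1.0 True
```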
| Scans for the genomic signature of selection
WGR has more commonly contributed to reverse-genetic approaches, where whole-genome scans are used to identify loci that have been subject to selection without prior knowledge of the traits involved. In this way, inferences can be made about the genetic basis and evolutionary history of adaptation even when the ecology and life history of an invasive species are poorly understood. There are various ways to identify the signature of natural selection from genomic data sets. When studying a single invasive population, the footprint of a selective sweep can be identified from the site frequency spectrum of genetic variation (DeGiorgio et al., 2016; Nielsen et al., 2005). Alternatively, comparisons between populations (e.g., between different timepoints during invasion or between native and invasive populations) can be used to identify regions of high divergence, using summary statistics such as FST or the population branch statistic (Yi et al., 2010). While traditional selection scans often assume independence among loci, analyses developed specifically for WGR data can account for linkage among sites to detect the signatures of different evolutionary forces. This approach has been successfully used to identify adaptive loci following the colonization of novel environments (Louis et al., 2020).
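The population branch statistic combines three pairwise FST values into the length of the branch leading to one focal population. For instance, an invasive population (A) can be compared against its native source (B) and a more distant outgroup (C). Following Yi et al. (2010), each FST is converted to a divergence time, T = -ln(1 - FST), and PBS_A = (T_AB + T_AC - T_BC) / 2. The sketch below assumes Hudson's FST, one common choice, calculated from known population allele frequencies over a window of SNPs as a ratio of sums. It ignores the sample-size corrections an empirical estimator would include.

```python
import math
from typing import Sequence


def hudson_fst(p1: Sequence[float], p2: Sequence[float]) -> float:
    """Hudson's FST for a window: sum of (p1 - p2)^2 over summed between-population diversity."""
    num = sum((a - b) ** 2 for a, b in zip(p1, p2))
    den = sum(a * (1 - b) + b * (1 - a) for a, b in zip(p1, p2))
    return num / den


def pbs(pa: Sequence[float], pb: Sequence[float], pc: Sequence[float]) -> float:
    """Population branch statistic for population A (after Yi et al., 2010)."""
    def t(x: Sequence[float], y: Sequence[float]) -> float:
        # Convert FST to a divergence time; undefined if FST == 1.
        return -math.log(1 - hudson_fst(x, y))
    return (t(pa, pb) + t(pa, pc) - t(pb, pc)) / 2


invasive = [0.9, 0.8]  # hypothetical allele frequencies at two SNPs
native = [0.3, 0.2]
outgroup = [0.3, 0.2]
print(round(pbs(invasive, native, outgroup), 3))
```

Windows with unusually high PBS for the invasive population, relative to the genome-wide distribution, are candidates for selection after introduction, subject to the confounding effects of demography and recombination.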
Another rarely exploited approach to measuring adaptation in invasive species is the use of sequence data collected as a time series, analogous to "evolve-and-resequence" experiments carried out in laboratory populations (Long et al., 2015; Schlötterer et al., 2015). This approach also provides a framework in which to identify allele frequency shifts resulting from simple or polygenic adaptation, although detecting polygenic adaptation requires a fairly stringent set of assumptions, such as no migration (see Buffalo & Coop, 2020; Dehasque et al., 2020; Otte & Schlötterer, 2020).
Where samples are not readily available from early timepoints in the invasion, historical museum or herbarium samples can be used to infer past allele frequencies (Bi et al., 2019;McGaughran, 2020). Notably, a null result using this approach can support a hypothesis of preinvasion adaptation.
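For time-series data, the basic question is whether an allele's frequency change between two sampling points is larger than genetic drift alone would produce. A minimal sketch follows. It assumes a Wright-Fisher population with constant effective size Ne, no migration, and population allele frequencies measured without sampling error. Under these assumptions, the variance of the frequency after t generations is approximately p0(1 - p0)[1 - (1 - 1/(2Ne))^t], which gives a normal-approximation z-score. Real analyses must add the sampling variance of each frequency estimate and usually estimate Ne from genome-wide frequency change, so this function is illustrative only.

```python
import math
from typing import Tuple


def drift_test(p0: float, pt: float, t: int, ne: float) -> Tuple[float, float]:
    """z-score and two-sided p-value for a frequency change p0 -> pt over t generations.

    Uses the Wright-Fisher drift variance p0(1 - p0)[1 - (1 - 1/(2Ne))^t]
    and a normal approximation, which is poor near the boundaries 0 and 1.
    """
    if not 0 < p0 < 1:
        raise ValueError("starting frequency must be polymorphic")
    var = p0 * (1 - p0) * (1 - (1 - 1 / (2 * ne)) ** t)
    z = (pt - p0) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value


# Hypothetical shift from 0.5 to 0.7 over 10 generations with Ne = 1000.
z, p = drift_test(0.5, 0.7, t=10, ne=1000)
print(round(z, 2), p < 1e-6)
```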
The approaches mentioned above can be used with SNPs, transposable elements or structural variants, which are readily detectable using WGR data. Especially in invasive species with very large genomes, RRS data are also an important tool with which to measure the frequency of large structural variants and a subset of copy number variants (CNVs), as well as the signature of linked selection from a genome-wide sample of SNPs (e.g., Endersby-Harshman et al., 2020; Huang et al., 2020). By contrast, WGR enables the identification of both small- and large-scale structural variants segregating in invasive populations (Bertolotti et al., 2020) and allows specific loci subject to selection to be identified with greater precision. Moreover, WGR data can be used to infer the action of selection using information that is entirely unmeasurable with RRS data, namely long haplotypes (see Box 3). […] for association with invasiveness in 16 invasive and six native populations of Drosophila suzukii identified SNPs in two genes associated with independent invasion routes (Olazcuaga et al., 2020).
Using a similar approach that controlled for population structure, genome scans across the global distribution of P. xylostella identified three potentially novel insecticide resistance alleles (You et al., 2020), and signatures of positive selection were associated with sugar receptor genes in Hyphantria cunea (fall webworm) (Wu et al., 2019). Selective sweeps identified exclusively in invasive populations of monkeyflower (Mimulus guttatus) suggest that selection acted on genes associated with flowering time, abiotic stress and biotic stress (Puzey & Vallejo-Marin, 2014). Other studies have used WGR data to identify structural variants and transposable elements and to investigate their effect on fitness in invasive populations. For example, again in D. suzukii, 15 putatively adaptive transposable elements were identified, one of which was 399 bp from a SNP previously associated with invasion success in this species (Mérel et al., 2020). In A. tuberculatus, copy number variation in the gene EPSPS correlated with a glyphosate resistance phenotype, and a selective sweep around EPSPS amplifications was confirmed by a signature of extended haplotype homozygosity. This exemplifies the use of WGR to identify an otherwise invisible dimension of genetic variation. Identifying genes subject to selection during invasion is an important first step, but associating those genes with fitness-associated traits is still key to understanding the effect of post-introduction adaptation on invasive spread.
It has long been recognised that genome scans for selection need to account for background genomic processes that can lead to false positives for adaptive loci. These include genetic drift caused by demographic change and selective processes such as background selection. In some cases, the peculiar biology of invasive species makes them especially prone to such problems, as genetic bottlenecks can produce signatures of reduced variation similar to those caused by selection (see Box 2). Furthermore, any summary statistic capturing the coalescent process will be influenced by variation in recombination rate (c) and changes in effective population size (Ne) (Barton & Etheridge, 2004; Booker et al., 2020; Brandvain & Wright, 2016). Both Ne and c can be estimated empirically with WGR data. Changes in Ne can be inferred using demographic inference methods (see Part 4.1), while recombination rate variation along the genome can be estimated by constructing a linkage map or from phased WGR data (Chan et al., 2012). User-friendly forward-simulation tools, such as SLiM, can be used to explore the expected distribution of summary statistics under various combinations of Ne and c. Tests for selection that explicitly incorporate demography and recombination can also be used (e.g., Luqman et al., 2020). Therefore, despite the confounding effects of recombination rate variation and demographic history on the summary statistics used in genome scans, it is becoming increasingly tractable to identify and account for these effects.
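The confounding of Ne and c can be seen in Sved's (1971) classic approximation for linkage disequilibrium between two neutral loci at equilibrium, E[r²] ≈ 1/(1 + 4 Ne c). A commonly used 1/n term is often added for a finite sample of size n. The sketch below is an approximation for illustration, not a substitute for simulation. It shows that halving Ne while doubling c leaves the expectation unchanged.

```python
def expected_r2(ne: float, c: float, n_sampled: int = 0) -> float:
    """Approximate equilibrium E[r^2] between two neutral loci.

    Uses E[r^2] ~ 1 / (1 + 4 * Ne * c); if n_sampled > 0, adds the
    commonly used 1/n term for finite sample size.
    """
    r2 = 1.0 / (1.0 + 4.0 * ne * c)
    return r2 + 1.0 / n_sampled if n_sampled > 0 else r2


# Halving Ne while doubling c leaves the expectation unchanged.
print(expected_r2(1000, 1e-4), expected_r2(500, 2e-4))
```

Because only the product Ne·c enters, a genomic region with low recombination can mimic a locally reduced effective population size. This is why scans benefit from independent estimates of c.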

| Mapping introgression during invasion: Old ideas, new tools
Hybridization within and between species has long been recognised as a potentially important process mediating invasion success (Bock et al., 2015). This is now a more central theme in invasion genetics than ever, perhaps because genomic data have made it possible to quantify hybridization in a wider variety of systems (Grabenstein & Taylor, 2018; McFarlane & Pemberton, 2019; Todesco et al., 2016; Viard et al., 2020). A wealth of newly developed tools make specific use of WGR data to detect adaptive introgression in particular, which may shed light on pre- and post-invasion adaptation (see Parts 2 and 5; Gower et al., 2021; Malinsky, 2019; Setter et al., 2020; Svedberg et al., 2021). Similarly, newly developed models allow spatial patterns of neutral introgression from invasive into native populations to be used to reconstruct invasion events in space and time (see Parts 3 and 4.1, and Quilodrán et al., 2020). Thus, by quantifying hybridization, WGR can bring together many disparate themes in invasion genetics.

| Intraspecific hybridization
As discussed in Part 4.2, a longstanding challenge in invasion biology is to explain how invasive populations overcome or avoid the deleterious consequences of a demographic bottleneck. One solution is seen where invasions involve admixture among multiple genetically differentiated source populations (Cristescu, 2015). For example, population genomic studies of the invasive fungus Cryphonectria parasitica (the cause of chestnut blight) and the fall armyworm (Spodoptera frugiperda) show that gene flow among invasive lineages maintains genetic diversity (Demené et al., 2019; Tay et al., 2020; Yainna et al., 2020). Admixture not only alleviates the effects of inbreeding depression but can also sort adaptive alleles into beneficial combinations. This may often explain the "bridgehead effect", where an initially successful invasion acts as a source of colonists for subsequent invasions (Lombaert et al., 2010). Rispe et al. (2020) […] suggests the former plays a more important role in invasion success (Bertelsmeier & Keller, 2018).
Strong genomic evidence for the sorting of adaptive alleles following the hybridization of genetically differentiated populations has also come from replicate studies of geographic clines. In introduced Australian and North American populations of D. melanogaster, an FST outlier scan was used to identify polymorphisms underlying parallel latitudinal clines on both continents (Bergland et al., 2016).
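A simple calculation shows why admixture among differentiated invasive lineages can restore diversity. Assume biallelic loci, random mating after admixture and known allele frequencies in each source. The admixed population's allele frequency is then the weighted mean of the source frequencies. Because expected heterozygosity 2p(1 - p) is concave in p, the admixed heterozygosity is at least the weighted mean of the sources' heterozygosities. The helpers below are hypothetical and ignore drift and selection after admixture.

```python
from typing import List, Sequence


def expected_het(p: float) -> float:
    """Expected heterozygosity at a biallelic locus under Hardy-Weinberg proportions."""
    return 2 * p * (1 - p)


def mean_het(freqs: Sequence[float]) -> float:
    """Average expected heterozygosity across loci."""
    return sum(expected_het(p) for p in freqs) / len(freqs)


def admixed_freqs(p1: Sequence[float], p2: Sequence[float], w: float) -> List[float]:
    """Allele frequencies after mixing a fraction w of source 1 with 1 - w of source 2."""
    return [w * a + (1 - w) * b for a, b in zip(p1, p2)]


source1 = [0.1, 0.0, 0.5]  # hypothetical frequencies in two divergent lineages
source2 = [0.9, 0.6, 0.5]
mixed = admixed_freqs(source1, source2, w=0.5)
print(round(mean_het(source1), 3), round(mean_het(source2), 3), round(mean_het(mixed), 3))
# 0.227 0.387 0.473
```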

| Interspecific hybridization
Interspecific hybridization is now recognised as being reasonably frequent, occurring in some 10% of animal and 25%-30% of plant species (Mallet, 2005;Rieseberg et al., 2006). Increasingly, through unintended anthropogenic dispersal, pairs of species are interacting for the first time since they last shared a common ancestor (Grabenstein & Taylor, 2018;McFarlane & Pemberton, 2019;Muirhead et al., 2015;Seebens et al., 2015). This opens the possibility of adaptive gene exchange that could contribute to invasion success in much the same way it does within species (Hovick & Whitney, 2014). However, interspecific hybridization will more commonly produce unfit offspring compared to intraspecific admixture, as reproductive incompatibilities are more likely to have accumulated. The trade-off between the cost of hybridization and the benefit of adaptive introgression creates ideal conditions for the study of speciation.
One example of interspecific introgression during invasion is the Iberian hare, Lepus granatensis, which replaced the Arctic species L. timidus, now extinct in Iberia, in the north of its range. IL12B, a gene implicated in the inflammatory process and immune response to viruses in rabbits, underwent adaptive introgression from L. timidus into L. granatensis, potentially contributing to its northern range expansion following the last glacial maximum (Seixas et al., 2018). Similarly, some introduced populations of the three-spined stickleback (Gasterosteus aculeatus) have higher genetic diversity as a result of introgression from G. nipponicus (Yoshida et al., 2016).
In some cases, hybridization might contribute to increased fitness of native species. For example, the crop pests Helicoverpa armigera and H. zea developed strong prezygotic barriers to hybridization following more than one million years of divergence in allopatry (Laster & Sheng, 1995) […] (Dennenmoser et al., 2017, 2019; Hessenauer et al., 2020; Wang et al., 2020). Notably, interspecific introgression among Ophiostoma species increased genetic diversity and was associated with individual growth rate (Hessenauer et al., 2020). Moreover, introgression on chromosome 1 was positively associated with virulence, apparently as a consequence of adaptive introgression (Hessenauer et al., 2020). These observations suggest that interspecific hybridization can create novel combinations of adaptive variants that enhance spread, in addition to mitigating the negative impacts of population bottlenecks by maintaining genetic diversity.
Regardless of whether it increases the spread of invasive species, interspecific hybridization can threaten local biodiversity as a result of genetic swamping (where native genotypes are replaced by hybrids) or demographic swamping (where native population growth rates are reduced via outbreeding depression) (Todesco et al., 2016).
Genetic swamping is both a potentially cryptic mode of extinction (Todesco et al., 2016) and a mechanism by which genetic material from introduced domesticated species can dominate populations of wild relatives (Haygood et al., 2003). The high resolution of WGR means that it can be a powerful tool for monitoring and quantifying genetic swamping.
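One simple way to quantify genetic swamping from WGR data is a hybrid index: the proportion of an individual's alleles that derive from the introduced species, at markers diagnostic for the two parental species. The sketch assumes that such diagnostic (fixed-difference) markers have already been identified. Genotypes are coded as the number of introduced-species alleles (0, 1 or 2), with None for missing calls. In practice, likelihood-based estimators that can also use non-diagnostic markers are preferable.

```python
from typing import Optional, Sequence


def hybrid_index(genotypes: Sequence[Optional[int]]) -> float:
    """Proportion of alleles from the introduced species across diagnostic markers.

    genotypes: copies (0, 1 or 2) of the introduced-species allele per marker;
    None marks a missing genotype and is skipped.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    return sum(called) / (2 * len(called))


print(hybrid_index([0, 1, 2, None]))  # 0.5
print(hybrid_index([0, 0, 1, 0]))     # 0.125
```

Tracking the distribution of this index across native populations over time would indicate whether introduced-species ancestry is spreading.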

| CONCLUSION
Many of the key questions in invasion genetics highlighted by Bock et al. (2015) remain unanswered, though our ability to obtain and interpret genome sequence data has matured substantially in the past five years. Though WGR data are certainly not a singular solution to outstanding questions about biological invasion, we are increasingly appreciating their potential; of the studies we assessed, over two thirds were published within the past two years (Table S1).
Whilst some research questions have more to gain from WGR than others (e.g., quantifying hybridization vs. spatial inference), appropriately designed population genomics studies can address multiple questions about invasions simultaneously. Indeed, almost without exception, the examples we have highlighted addressed hypotheses from many areas of interest.
Based on existing genomic data, processes that maintain adaptive genetic diversity (i.e., balancing selection, admixture and adaptive interspecific introgression) are often key to the success of invasive populations (e.g., Calfee et al., 2020;Hessenauer et al., 2020;Kreiner et al., 2019;Stern & Lee, 2020;Valencia-Montoya et al., 2020;Yainna et al., 2020). In other words, standing genetic diversity that has already been shaped by natural selection is often repurposed to rapidly overcome adaptive challenges; invasive populations do not have time to reinvent the wheel. This is not a new observation. Several authors have argued that the same "combinatorial" evolutionary processes known to facilitate major ecological transitions and adaptive radiations can also enable biological invasion (Hegarty, 2012;Marques et al., 2019;Prentis et al., 2008;Rieseberg et al., 2003). Genomic data have revealed the frequency of this phenomenon among invasive species. It is now clear that the "paradox" of biological invasion is often explained not only by the fact that many invasive populations avoid the negative effects of demographic bottlenecks, but also because they avoid the need for de novo mutation followed by in situ adaptation (Estoup et al., 2016). This implies a genic view of biological invasion in which the primary aim of management strategies should be to minimise the spread of alleles known to confer invasion success through introduced populations, and potentially through reproductively compatible native populations.
In the studies we surveyed, the most substantial individual contributions to our understanding of invasive species did not come from the largest data sets, but from studies that associated phenotypic or spatial information with genomic data in a hypothesis-driven design that incorporated appropriate modelling (e.g., Calfee et al., 2020; Olazcuaga et al., 2020; Stern & Lee, 2020). Future genomic studies will contribute considerably to our understanding of pre- and post-introduction adaptation if they adopt such an approach. Useful inferences will also require the development of tailored nonequilibrium models that can incorporate the complex demographic history of invasive populations. In particular, understanding the evolutionary history of loci that contribute to adaptive spread is an exciting area.
This will be easiest when studying recent invasions with samples taken over a time series. The second, more challenging step will be to quantify the marginal contribution of positively selected loci to invasive spread. This task will be made easier by using forward-genetic approaches to dissect the basis of fitness-associated invasiveness traits, thereby allowing a more direct connection with the ecological basis of adaptive success. Specifically, identifying the source populations contributing alleles at QTL for invasiveness traits, and how allele frequencies at those QTL change over the course of an invasion, will connect invasion success to evolutionary processes such as population bottlenecks and hybridization. Together, these approaches can be used to test whether the "combinatorial" view of invasion success holds up as a general trend.
Given the frequency of potentially cryptic gene flow during invasion, alongside the declining cost of sequencing, we anticipate that whole-genome sequence data will become a standard approach for monitoring the ongoing global redistribution of biodiversity.
Comprehensive genomic data sets will eventually allow invasion events to be consistently reconstructed at a resolution that is useful for informing management plans and they will put us in a better position to quantify the contribution of specific mechanisms to overall invasion success.

ACKNOWLEDGEMENTS
We thank Cynthia Riginos, Iva Popovic and Sinan-Saleh Kassam for their valuable input during early discussions on this topic. All members of the Insect Evolutionary Genomics Group at the University of Cambridge provided useful feedback, especially Joana Meier and Gabriela Montejo-Kovacevich. Three anonymous reviewers and the Subject Editor provided thoughtful feedback that improved the quality of this review. We are grateful to CJ Battey, Erin Calfee and David Stern for promptly providing code and data in a tidy and user-friendly format for reproduction (Figures 2-4). We are also grateful to Carol Eunmi Lee and James Dorey, who provided photographs for use in

AUTHOR CONTRIBUTIONS
All authors contributed equally to writing the final version of this review.

DATA AVAILABILITY STATEMENT
The list of 32 studies identified in our literature review has been uploaded as Supporting Information (Table S1).