Cryptic genetic variation in an inbreeding and cosmopolitan pest, Xylosandrus crassiusculus, revealed using ddRADseq

Abstract Each year new exotic species are transported across the world through global commerce, causing considerable economic and ecological damage. An important component of managing invasion pathways is to identify source populations. Some of the most widespread exotic species are haplodiploid ambrosia beetles. The ability to mate with siblings (inbreed) and their transportable food source (symbiotic fungus) have enabled them to colonize most of the world and become pests of plant nurseries, lumber, and forests. One of the fastest spreading ambrosia beetles is Xylosandrus crassiusculus. In order to discover the source populations of this globally invasive species, track its movement around the world, and test biogeographical scenarios, we combined restriction site‐associated DNA sequencing (RADseq) with comprehensive sampling across the species native and introduced range. From 1,365 genotyped SNP loci across 198 individuals, we determined that in its native range, X. crassiusculus is comprised of a population in Southeast Asia that includes mainland China, Thailand, and Taiwan, and a second island population in Japan. North America and Central America were colonized from the island populations, while Africa and Oceania were colonized from the mainland Asia, and Hawaii was colonized by both populations. Populations of X. crassiusculus in North America were genetically diverse and highly structured, suggesting (1) numerous, repeated introductions; (2) introduction of a large founding population; or (3) both scenarios with higher than expected outcrossing. X. crassiusculus, other wood‐boring insects, and indeed many other pests with unusual genetic structure continue to spread around the world. We show that contemporary genetic methods offer a powerful tool for understanding and preventing pathways of future biosecurity threats.

Our ability to infer the history of populations used to be restricted by the availability of molecular markers, which were historically difficult and expensive to develop (Avise, 2012). However, with the advent of high-throughput sequencing and genotype-by-sequencing strategies such as restriction site-associated DNA sequencing (RADseq), marker discovery and genotyping have proliferated. The ease and accessibility of RADseq have facilitated studies of ecological and evolutionary genetics in nonmodel organisms (Andrews, Good, Miller, Luikart, & Hohenlohe, 2016;Narum, Buerkle, Davey, Miller, & Hohenlohe, 2013) including exotic species with unknown population histories (Garnas et al., 2016;Hasselmann, Ferretti, & Zayed, 2015).
The Xyleborini ambrosia beetles (Coleoptera: Curculionidae: Scolytinae) are some of the most widespread invasive insects (Haack, 2011). Their success has been facilitated by two features: the farming of a fungal symbiont as a food source and their haplodiploid reproductive strategy, characterized by high inbreeding (Jordal, Beaver, & Kirkendall, 2001). Increasingly, these exotic ambrosia beetles are becoming pests of plant nurseries, stored lumber, and forests (Hulcr & Dunn, 2011). Despite their importance, the population structure of globally distributed Xyleborini beetles has rarely been studied (Gohli, Selvarajah, Kirkendall, & Jordal, 2016), and the effect of their unique biology on population genetic structure remains unknown (Kirkendall, Biedermann, & Jordal, 2015).
Many extraordinarily invasive insects possess extraordinary genetic and biological features. For example, haplodiploidy and/or female-biased rapid reproduction is disproportionately represented among globally invasive pests (Ehrlich, 1986). Examples include sap sucking Heteroptera with female-biased reproduction and nonassortative mating (Liu et al., 2007) or ants with haplodiploidy and inbreeding which allows incipient populations to prosper even in the absence of genetic variability (Holway, Suarez, & Case, 1998).
An example of an invasive group that combines several unusual genetic features is the Xyleborini ambrosia beetles, a group of haplodiploid, highly inbred fungus farming wood borers. Perhaps the most widely spread Xyleborini species is the Asian granulated ambrosia beetle Xylosandrus crassiusculus (Flechtmann & Atkinson, 2016). This species is assumed to have originated in Asia and has been introduced throughout the tropics and subtropics likely via human-mediated means. Within the last several decades, it has reached all tropical and subtropical areas of the world, and it often becomes the dominant ambrosia beetle species in each newly colonized area. In the United States, for example, it was first recorded only 42 years ago and quickly became one of the most commonly collected ambrosia beetles in eastern North America (Miller & Rabaglia, 2009). Not only does it dominate natural communities, but it is also becoming a serious pest in the nursery industry. The beetle is highly attracted to volatiles from stressed and weakened trees and is able to precisely locate such trees in a nursery (Ranger, Reding, Persad, & Herms, 2010). Minor stresses, such as temporary flooding or late frost, used to be of little concern in nurseries because these trees typically recovered. Now, with X. crassiusculus ubiquitous in the landscape, stressed trees are rapidly colonized and killed (Ranger, Schultz, Frank, Chong, & Reding, 2015), and nurseries are forced to adapt their management or assume significant losses.
Despite the impressive speed of the X. crassiusculus global invasion, nothing is known about its population structure and spread. The number of introductions and their origins, the level of inbreeding, and any possible cryptic diversity remain unknown. The only available data on the population genetics of X. crassiusculus are from Dole et al. (Dole, Jordal, & Cognato, 2010), who used cytochrome oxidase I mtDNA sequencing (the DNA "barcode") to infer that the population in the United States is clonal, while genetic distances between other populations are large and analogous to distances observed in interspecific comparisons.
In order to discover the source populations of the invasive X. crassiusculus and track its human-assisted movement around the world, we used RADseq. The method combines high-throughput genomic marker discovery and genotyping, which is valuable in a system with few molecular markers. The RADseq approach is combined here with a comprehensive sample of the world's populations of X. crassiusculus, which enabled us to test, for the first time, specific hypotheses on the global spread of an inbreeding pest: Is there a single source population or multiple sources? Is the intraspecific genetic divergence of such rapidly spreading species low in invaded regions (suggesting a series of single introductions) or high (suggesting the global movement of multiple populations)? And, given that this is a highly inbred and haplodiploid species, will RADseq uncover population genetic structure expected from studies of regularly reproducing invasive species?

| Specimens and sample preparation
We sampled 198 female X. crassiusculus beetles from the species' native and introduced range for genotyping-by-sequencing (Table 1, Figure 1). All specimens had been stored at −80°C in 95% EtOH in the University of Florida Forest Entomology Collection at the School of Forest Resources and Conservation. Only females were used because males are rare in species with the haplodiploid system. In order to minimize nontarget species DNA carryover, all specimens were surface washed by vortexing in a PBS-Tween solution. Additionally, each beetle was dissected to remove the digestive tract, mycangium (the fungus storing organ), and the spermatheca (a female organ that stores sperm). DNA was then extracted from the remaining beetle muscle tissue using a Qiagen DNeasy Blood and Tissue Kit per the manufacturer's instructions. DNA's quantity and quality were assessed using semiquantitative gel electrophoresis. After dissection and DNA extraction, no voucher material remained.

| ddRAD sequencing
Specimens with at least 160 ng of nondegraded DNA were prepared for double-digest restriction site-associated sequencing (ddRADseq Peterson, Weber, Kay, Fisher, & Hoekstra, 2012). Following (Parchman et al., 2012), DNA was digested with the rare EcoRI and common MseI restriction enzymes resulting in fragments with sticky ends. Illumina sequencing adaptors and custom barcodes (Parchman et al., 2012) were ligated to the EcoRI terminus sticky end with a common adaptor ligated to the MseI terminus of the digestion product, and then the adaptor-ligated fragments were PCR amplified using primers for Illumina sequencing adaptors. Each adaptor+barcode was composed of an Illumina adaptor, a unique barcode between 8 and 14 bases in length that varied by at least four bases, and the restriction site. Amplified digestion-ligation products were visually inspected using gel electrophoresis and then pooled in equal volumes for sequencing. Gel-based size selection for 200-500 bp fragments and sequencing library purification using Agencourt AMPure XP beads (Beckman Coulter) were performed at the UF ICBR coresequencing laboratory (http://www.biotech.ufl.edu/). The sizeselected and purified sequencing library was further pooled with a similarly constructed library consisting of 96 bar-coded samples and then loaded on the Illumina NextSeq500 high-output flow cell for a shared 150-cycle sequencing run. Unfiltered demultiplexed sequences were deposited in NCBI GenBank Short Read Archive (Accessions: SRR4181068-265).

| Read processing and genotyping
Illumina adaptor sequences were automatically trimmed from all reads by Illumina's BaseSpace software. All subsequent steps were conducted in the UF High-Performance Computing instance of GALAXY (Blankenberg et al., 2001;Giardine et al., 2005;Goecks, Nekrutenko, & Taylor, 2010). Trimmed reads were quality-filtered using the FASTX (http://hannonlab.cshl.edu/fastx_toolkit/) toolkit for a Phred score of 20 across 90% of the read. Quality-filtered reads were demultiplexed using the FASTX Toolkit "barcode splitter." Barcodes and restriction site nucleotides were then trimmed from the beginning of the read using "FASTX trimmer" from the FASTX Toolkit. Due to variation in read length, all reads were trimmed to 71 bases. This maximized the number of reads recovered per individual because downstream processing required that all reads be the same length per individual.
T A B L E 1 Sample location and sample sizes with the origin of populations at each location, the first reference for establishing origin, and pest status pl script, RAD tags were first assembled into de novo loci per individual in "ustacks"; from these loci, a catalog of all possible loci was assembled in "cstacks"; and then, individuals were compared to the catalog in "sstacks". Optimum parameters for loci and catalog assembly were Single-nucleotide polymorphism (SNP) genotypes were generated from the output of denovo_map.pl using the "populations" module of STACKS. In STACKS "populations," a locus was retained if it was shared by 90% of the 198 individuals sequenced (-r 0.9), there was a minimum of nine reads per individual at a locus (-m 9), and a minimum allele frequency of 0.05 per locus (-a 0.05). The additional parameter that only a single SNP should be called per a read was specified using -write_single_snp. We reduced the potential confounding effect of the fungal sequence carryover in two ways: by removing the mycangium as described above and bioinformatically. Catalog loci were compared to a draft genome of Ambrosiella roeperi (the specific mutualist of X. crassiusculus; D Vanderpool and JP McCutcheon, unpublished). Loci that aligned to the fungus genome with 99% similarity across the entire read were excluded from SNP calling by specifying them as blacklisted loci (-B) in STACKS "populations."

| Population structure and history
Data transformation and treatment of missing genotypes were performed in R version 3.2.1 (R Core Team 2015) using the function tab from the package adegenet (Jombart, 2008). Genotypes were standardized and transformed into relative allele frequencies. Missing alleles (4.46%; Figure 2) were replaced by the mean locus-specific allele frequency. This treatment of missing data is recommended for downstream multivariate analyses. Any biases introduced are minimal given a low number of missing genotypes and lack of systematic missing data at specific loci or for specific individuals. Basic population and summary statistics were generated from STACKS "populations" sumstats.summary output, using the adegenet summary function and the hierfstat basic.stats function. Observed heterozygosity and inbreeding index F IS were used to assess the extent of inbreeding as indicated by reduced heterozygosity. Additionally, the proportion of shared alleles between individuals was calculated for the detection of clones. A Euclidian distance matrix was generated with stats dist for downstream multivariate analyses. Absolute genetic distance between locations was calculated in adegenet using dist.genpop and was visualized as a neighbor-joining tree using ape nj.
Hierarchical clustering was used to associate nonnative beetles with native beetle populations and to determine the relationships between and among groups. This clustering method was deemed more suitable than nonhierarchical clustering because population substructure is likely when sampling across a global metapopulation. All hierarchical clustering was performed using stats hclust.  (Pritchard, Stephens, & Donnelly, 2000). STRUCTURE was run with a burn-in of 1,00,000 generations with 1,50,000 subsequent generations in quintuplicate for K = 1 through K = 16, allowing for admixture. Change in log-likelihood scores between K (Evanno, Regnaut, & Goudet, 2005) was used to determine the most probable number of clusters with STRUCTURE HARVESTER (Earl & vonHoldt, 2012).
Genetic differentiation between clusters (determined here) and locations was tested by hierarchical analysis of molecular variance (AMOVA) in poppr.amova {poppr}. The extent of genetic differentiation was also estimated using phi-statistics within poppr.amova.
The significance of variance components, such as Φ CT , differentiation between clusters; Φ SC, differentiation among locations within clusters; and Φ ST , differentiation among locations, was assessed using has no model assumptions (Jombart, Devillard, & Balloux, 2010).
To determine the number of principal components to retain for the analysis, the a-score, which is the difference between observed discrimination and random discrimination, was calculated using adegenet a.score. Additional assessment of the optimum number of PCs to retain for the DAPC was performed through cross-validation by subsetting the data into training and validation sets using xvalDapc. After the number of principal components was determined, the DAPC was performed using for a specified number of clusters (hierarchical and location) using the function dapc, and the probability of individual membership to each cluster was visualized using compoplot.

| Sequencing and genotyping
Using ddRADseq, we successfully sequenced over 11 billion bases of DNA for SNP discovery and genotyping in X. crassiusculus. An aver- Average homozygosity (0.99 ± 0.02) and inbreeding coefficient F IS (0.84) were high for the 1,365 genotyped SNPs (variant sites) and at most geographic locations (Table 2) using all three population summary statistics programs: STACKS "populations," adegenet, and hierfstat. Estimates of these indices may be impacted by locus assembly parameters, such as overmerging of loci during assembly. However, heterozygote genotypes remained rare (Table 3). Additionally, more stringent mismatch parameters were tested for locus discovery/assembly and resulted in significant reduction in shared loci with no change in population summary statistics.

| Population structure and history
In its native East Asia, X. crassiusculus is comprised of at least two separate populations, sampled here: one in Southeast Asia including the mainland China, Thailand, and Taiwan, and one on most Japanese islands (Figure 2). Okinawa and Taiwan contain individuals from each population. The rest of the world was likely colonized from these two populations or a similarly closely related unsampled population: North America and Central America from the Japanese populations, Africa and Oceania from the mainland. Hawaii has been colonized by both populations. This is indicated foremost by hierarchal clustering where two highly divergent and very well supported (p < .001) primary clusters are observed ( Figure 3) and secondarily by Bayesian clustering (K = 2, ∆K = 118021.52, Table 4). These two most inclusive clusters, hereafter referred to as cluster 1 and cluster 2, encompassed all 198 individuals. Both clusters included individuals from the beetles' native range. Cluster 1 contained exclusively nativerange beetles from China and Southeast Asia, and cluster 2 primarily contained native-range beetles from Japan suggesting cluster origin, respectively. Cluster 1 also included all nonnative-range individuals from Taiwan, Southeast Asia, Africa, and five individuals from Hawaii, while cluster 2 included all introduced individuals from the continental United States, two individuals from Hawaii, and all individuals from Honduras. The genetic divergence between these two clusters is the largest source of genetic variation (61.2%, p < .01; Table 5) (Table 6).
At most locations, heterozygosity was low (<0.1; Table 2) as heterozygous genotypes were rare. The proportion of shared alleles was high (>0.9) except within the Taiwan, Okinawa, and Hawaii populations, which all had the lowest proportion of shared alleles (<0.8), in addition to large numbers of SNPs, and the highest gene diversities (>0.2). Some locations such as Maryland also had large numbers of SNPs, but gene diversity was low and the proportion of shared alleles high (>0.9). Inbreeding, as indicated by the inbreeding coefficient F IS , was very variable, ranging from −0.5 to 1.0 (Table 2). While inbreeding is clearly evident given the scarcity of heterozygous genotypes and often high inbreeding coefficients, outbreeding is also common at some locations as evidenced by the proportion of shared alleles, the higher than expected variation in inbreeding coefficients and even low F IS . In several populations, outbreeding appears to occur regularly (Madagascar, Aichi, and Honduras). These populations are characterized by negative F IS values, low numbers of polymorphic sites (<3), and high proportion of heterozygous sites relative to SNPs (Table 2).
However, given the few number of polymorphic loci at these locations, it is possible that the negative inbreeding indices reflect the high proportion of heterozygous polymorphic loci. For example, the single polymorphic loci in the population sample from Madagascar are heterozygous (Table 3).

| DISCUSSION
Populations of X. crassiusculus in invaded regions display significant population heterogeneity, particularly in the well-sampled North America. It is unclear whether this is a result of a few independent T A B L E 3 Genotype and heterozygous genotype frequencies at each location and averaged per individual introductions. However, this is unlikely, as in that case, we would expect to observe several distinct, clonal, or near-clonal populations in North America. The more likely explanations include (1) numerous, repeated introductions that blur between-group differences, or (2) introduction of a large founding population that retained its genetic variation, or (3) any of the above scenarios with a more than expected degree of outcrossing. All three scenarios are also mutually compatible.
Human-mediated expansion is the most likely route of introduction for X. crassiusculus especially at most U.S. locations given the documented continuous invasion of nonnative Xyleborini species through coastal areas (Rassati et al., 2016) and due to the fact that the species was simply absent in most of the sampled areas outside of Asia just a few decades ago. However, it is also possible that some expansion within the Old World has occurred without human-aid (Gohli et al., 2016). Historic dispersal is possible in X. crassiusculus where genetic divergence is high between nearby locations, for example, in Madagascar which is highly differentiated from all other African and Asian locations (Table 6).
It has been traditionally assumed that Xyleborini ambrosia beetles are functionally clonal, because haploid, flightless, and blind males are confined to their native galleries and unlikely to facilitate any gene flow between families. This has been revised by recent observations suggesting that males actually routinely crawl out of their native galleries in search for additional mating opportunities on the same tree (Peer & Taborsky, 2004). Indeed, at all of our locations, there are individuals with estimated inbreeding indices lower than expected for a completely inbred species, and this observation is driven by a few heterozygous genotypes. In fact, in a few locations, heterozygosity is uncharacteristically high for an inbreeding organism. These findings are contrary the traditional assumption of complete inbreeding in this species and call for reassessment of these assumptions in other putatively inbred invasive species.
The robust differentiation between the Japanese and mainland Asia beetle lineages might indicate cryptic speciation. High intraspecific genetic differences in X. crassiusculus have been observed among a few individuals sequenced at the COI locus (Dole et al., 2010). However, testing for cryptic speciation is beyond the scope of this study, and it is not even clear whether organisms with predominant inbreeding conform to any of the standard species concepts (see Fontaneto et al. (2007) for discussion). The significant substructure within the mainland lineage (cluster 1) might indicate that these are older populations that began to diverge earlier, or that the Japanese populations have not been sufficiently sampled.
Unsampled native-range populations could also be source populations. Given the high level of differentiation between lineages observed here and by others (Dole et al., 2010), we hypothesize that any unsampled populations would cluster within these two lineages. This can be tested for in the future using approximate Bayesian computation, which enables tests of multiple introduction scenarios, such as historic and recent, local expansion in introduced ranges, and admixture.
Xylosandrus crassiusculus continues to spread around the world.
Even during the preparation of this manuscript, a new expanding population has been reported from South America (Flechtmann & Atkinson, 2016). It is not the only insect with a rapidly expanding range. There are dozens of bark and ambrosia species, and thousands of species from other groups that are being introduced by human transport to nonnative locations. Communities of wood-boring insects around the world are beginning to resemble one another, and differences between regional assemblages fade (Brockerhoff, Bain, Kimberley, & Knížek, 2011;Haack, Rabaglia, & Peña, 2013). Global biotic homogenization is causing not only economic and ecological damage via the introductions of invasive species, but also the loss of biogeographic history.
Increasingly, variation inside organisms' genomes is the only evidence of the past geographical diversity of animals and plants on the planet.
Here, we show that affordable and effective analytical approaches are now available to disentangle the complex global structure of tramp species, even those with unusual reproduction strategies and even in the absence of any previous population or genome data.

CONFLICT OF INTEREST
None declared.

AUTHOR CONTRIBUTIONS
Caroline Storer contributed sample acquisition, specimen preparation, sequencing, analyses, writing, and interpretation. Adam Payton contributed to project design, analyses, writing, and interpretation.
Stuart McDaniel contributed to project design, data acquisition, and interpretation. Bjarte Jordal contributed to sample acquisition and interpretation. Jiri Hulcr contributed to project design, sample and data acquisition, interpretation, and writing.

DATA ACCESSIBILITY
Unfiltered barcode-split ddRADseq reads with quality scores for each individual have been deposited in the GenBank Short Read Archive (SRA), accession numbers SRR4181068-265 under BioProject