An endangered flightless grasshopper with strong genetic structure maintains population genetic variation despite extensive habitat loss

Abstract Conservation research is dominated by vertebrate examples but the shorter generation times and high local population sizes of invertebrates may lead to very different management strategies, particularly for species with low movement rates. Here we investigate the genetic structure of an endangered flightless grasshopper, Keyacris scurra, which was used in classical evolutionary studies in the 1960s. It had a wide distribution across New South Wales (NSW) and Victoria in pre‐European times but has now become threatened because of land clearing for agriculture and other activities. We revisited remnant sites of K. scurra, with populations now restricted to only one area in Victoria and a few small patches in NSW and the Australian Capital Territory (ACT). Using DArtseq to generate SNP markers as well as mtDNA sequence data, we show that the remaining Victorian populations in an isolated valley are genetically distinct from the NSW populations and that all populations tend to be genetically unique, with large F ST values up to 0.8 being detected for the SNP datasets. We also find that, with one notable exception, the NSW/ACT populations separate genetically into previously described chromosomal races (2n = 15 vs. 2n = 17). Isolation by distance was detected across both the SNP and mtDNA datasets, and there was substantial differentiation within chromosomal races. Genetic diversity as measured by heterozygosity was not correlated with the size of remaining habitat where the populations were found, with high variation present in some remnant cemetery sites. However, inbreeding correlated negatively with estimated habitat size at 25–500 m patch radius. These findings emphasize the importance of small habitat areas in conserving genetic variation in such species with low mobility, and they highlight populations suitable for future translocation efforts.


| INTRODUC TI ON
As with other animals, terrestrial invertebrates are increasingly being threatened by habitat destruction, climate change, invasive species, pesticides, and other threats connected to human activities (Black & Vaughan, 2009;Hafernik, 1992;Wagner & Van Driesche, 2010). Thus, terrestrial invertebrate population declines and extinction rates over the last few 100 years can match those of vertebrates and vascular plants (Harvey et al., 2020;Leidner & Neel, 2011;Thomas & Morris, 1994). Despite this rate of decline and the role of threatened invertebrates in essential ecosystem services such as pollination (Kim, 1993;Wagner & Van Driesche, 2010), there is still only a limited focus on their conservation around the world, including in Australia (Sands, 2018).
Part of the problem resides in taxonomic issues, with many species undescribed and/or lacking basic taxonomic information (Hochkirch, 2016;Kim, 1993;New & Sands, 2004), leading to the risk that some species may face extinction even before they are known. Yet in Australia, many threatened invertebrates represent unique evolutionary lineages that form an important component of biodiversity (Cranston, 2010).
Although genetic data are critical in informing conservation strategies, helping to resolve taxonomic issues, defining patterns of connectedness across populations, and assessing the adaptive capacity of populations to future environmental changes, very little genetic data exist for threatened terrestrial invertebrate species.
There are so far relatively few attempts to integrate modern genomic approaches based on genome-wide SNPs or genome resequencing into invertebrate conservation efforts (e.g., Chen et al., 2017;Dupuis et al., 2020). These approaches can provide very detailed information on patterns of gene flow, hybridization, and evolutionary potential in threatened species that can guide management actions (Allendorf et al., 2010).
Here we provide SNP and mtDNA-based analysis of populations of an endangered morabine grasshopper, Keyacris scurra (formerly known as Moraba scurra). Morabines represent a unique group of Australian flightless grasshoppers, with a characteristic matchstick-like appearance. The morabines consist of ~250 species and 41 genera found across Australia on a range of plant types including grasses, trees, and shrubs (Blackith & Blackith, 1969;Key, 1977). Keyacris scurra is one of the better-known morabines.
The genus Keyacris was named after the entomologist Ken Key (Day & Rentz, 2004) and studied by the Australian geneticist and evolutionary biologist Michael White (White, 1956;White et al., 1963).
The species was used in pioneering work on adaptive genetic polymorphisms in collaboration with the American evolutionary biologist Richard Lewontin (e.g., Lewontin & White, 1960;White et al., 1963) which led to an ongoing debate about population processes affecting chromosomal polymorphisms and particularly the existence of adaptive landscapes (reviewed in Grodwohl, 2017).
The species has two chromosomal races, defined through chromosome preparations from male genitalia as either 2n = 15 (seven pairs of autosomes and a small acrocentric X-chromosome) or 2n = 17 (with one of the autosomes split into two acrocentric chromosomes) (White, 1956). Populations can also differ in the frequency of chromosomal arrangements that represent inversion polymorphisms (White, 1956).
The species was found in northeastern Victoria in the wheat/ grazing belt and in the wheat/grazing belt of eastern New South Wales (NSW) as far north as Orange. White (1956) noted that K. scurra was already threatened when he indicated that they consist of "relatively minute 'islands' in the general area within which the species occurs." Most of these "ecological islands" studied by White were places which had escaped agricultural intensification and regular grazing, such as small rural cemeteries, small reserves, and railway cuttings (Rowell & Crawford, 1995).
The species appears confined to habitats of a special type in which the tall perennial grass, Themeda triandra, usually predominates. This once dominant grass is removed by cropping and is grazing sensitive, and it now only dominates relict areas, which are often also refuges for other similarly sensitive plant species (Dorrough & Scroggie, 2008), including many daisies that K. scurra requires for food (White, 1956). Suitable habitats occur in grassland, savannah woodland, and on the ecotones between the latter habitats and both "dry" and "wet" sclerophyll forest. Keyacris scurra is an overwintering species, hatching in summer and with a univoltine life cycle. The species is unfortunately found within one of the most modified regions of Australia (Glanznig, 1995) where very little remnant habitat remains. The species has very limited dispersal ability due to its flightless habit. The main threat likely remains the management of vegetation (e.g., cemeteries are now managed by repeated mowing close to ground level which destroys the habitat of K. scurra).
Here we use genome-wide SNP markers to assess the genetic structure of K. scurra, establish patterns of genetic variation in remnant populations, and consider associations between genetic variation and the extent of remaining suitable habitat. We resample areas where the species was previously found as well as new areas that appear to have suitable vegetation. Our results show a high level of genetic differentiation across regions even when these are relatively close together, evidence for genetic isolation by distance in both nuclear and mtDNA markers, evidence of inbreeding in some populations, and some genetic differentiation patterns unrelated to the chromosomal constitution of populations. We show that genetic diversity varies among populations, and we test whether this variability as well as inbreeding is related to the extent of habitat available in the proximity of the sampling sites. We find that even very small habitat patches may support populations with valuable genetic diversity, although increased inbreeding seems to relate to smaller habitat size.

| Sampling sites
Samples of K. scurra were collected from 17 locations from NSW and Victoria in 2019 for molecular work following an extensive survey to map the current distribution of this species (Figure 1). These samples had been collected prior to the listing of the species as "endangered" in NSW (https://www.envir onment.nsw. gov. au/resou rces/threa tened speci es/deter minat ions/C AMKe ysMat chsti ckGra sshop perES PD.pdf) and with the approval of the Department of the Environment, Land, Water and Planning in Victoria following the rediscovery of the species from Omeo.
Only a few individuals were collected from the smaller populations (particularly at Bungonia, Gundagai South Cemetery, Windellama North) (Table 1). Two sites (Windellama and Bungendore) had previously been subjected to a deliberate translocation by White (1957). Grasshoppers were collected with aspirators across an area of >20 m 2 and preserved individually in 100% ethanol in Eppendorf tubes. They were then brought back to the laboratory for DNA processing.

| CO1 PCR and sequencing
A total of 59 individuals were screened from the ACT and NSW (13 populations, 43 individuals) as well as Victoria (3 populations, 16 individuals) with all regions in Table 1 represented. DNA was extracted using a Chelex ® 100 Resin (Bio-Rad Laboratories, Hercules, CA) method on the upper half of a grasshopper hind limb. Tissue was crushed with 2 × 3 mm glass beads and 200 µl of 5% (w/v) Chelex ® 100 suspension using a mixer mill. Extractions were incubated for 2 hr at 60°C with 5 µl proteinase K (20 mg/ml) (Roche Diagnostics Australia, Pty Limited) and heated at 90°C for 10 min. Prior to polymerase chain reaction (PCR) amplification, extractions were spun at 14,000 rpm for 2 min, and DNA in solution was removed from just above the Chelex ® resin.
The PCR amplification profile for COI consisted of an initial denaturing step at 95°C for 4 min (1 cycle), 40 cycles of denaturation F I G U R E 1 Map of sites surveyed for molecular variation. These sites encompass most of the current known fragmented distribution of Keyacris scurra. Singletons from two additional sites were included in the molecular survey: a site close to Gundagai ("Gundagai Cemetery") and a site close to Windellama ("Windellama North"). Chromosomal races based on White (1956)

| DArT-Seq™ processing
A high-throughput genotyping method using the DArT-Seq™ technology at Diversity Arrays Pty Ltd was employed. Here, complexity reduction is used to enrich nuclear genome representations with active genes and low copy sequences through combinations of restriction enzymes and reduction methods (https://www.diver sitya rrays. com/techn ology -and-resou rces/darts eq/ and Kilian et al., 2012).
Implicit fragment size selection and next-generation sequencing of representations are subsequently performed with HiSeq2000 (Illumina) Kilian et al., 2012 with VCFtools (Danecek et al., 2011). In all cases, only loci with <5% missing data across those individuals included within a dataset were were not available, see Table 1) but using the SNPs identified from this filtering process, we then computed heterozygosity for all individuals. Following the recommendation in Schmidt et al. (2020), we also computed heterozygosity based on all nucleotides (i.e., including all polymorphic and monomorphic nucleotides).

| Heterozygosity
In characterizing individual heterozygosity, we derived three estimates; heterozygosity was computed based on either one SNP per (80 bp) locus randomly selected from the datasets, all SNPs across sequences, or all nucleotides sequenced including those that were monomorphic. In the first approach, there were 2060 SNPs retained from 13,518 SNPs identified at MAC = 1 for the heterozygosity estimate, whereas when all SNPs were retained, there were 10,618 SNPs that contributed.
All datasets were passed to the R "adegenet" package as genind objects for further calculations of heterozygosity and other statistics. Sites with only one individual were excluded from population measures but were included in individual heterozygosity assessments. The "Hs" function from "adegenet" was used to calculate expected heterozygosity for populations.

| Population structure analyses
The "all individuals" and "MAC = 3" dataset was used as the basis of a run of the program STRUCTURE in its ADMIXTURE mode (Pritchard et al., 2000). In these analyses, we did not want to bias toward variable regions and therefore we filtered by randomly selecting only one SNP per sequence. Following inference of lambda, MCMC runs were completed with a burn-in of 10,000 and a further 100,000 repeats for parameter inference. K values between K = 1 and K = 10 were investigated, with ten runs per value of K and K = 10 set as the maximum based on available computing resources.
Results were passed to the program CLUMPAK for collation and summarizing and evaluated according to various K-inference procedures to determine optimal settings (Kopelman et al., 2015), with K = 6 being selected by the modified Evanno method. A further run was conducted with the "even populations" and "MAC = 3" dataset, under the same conditions and a separate lambda inference.
Datasets with all individuals were passed through PCoA, PCA, and DAPC multivariate analyses (via "ade4" [Dray & Dufour, 2007] and "adegenet" [Jombart, 2008]). For principal component analysis, missing data were handled wherever possible by interpolation with the mean of the sampling location where the sample with missing data was found. The same principle was applied for the DAPC analyses.

| AMOVA
A hierarchical analysis of molecular variance (AMOVA) was conducted on the SNP dataset using the "pegas" amova function im- Pool, Burra). 1,000 permutations were conducted to test for significance across the differing levels.

| F ST and isolation by distance based on SNPs
Pairwise F ST s for each population were calculated via the "pairwise.
WCfst" function in the R package "hierfstat" (Goudet, 2005). These were converted to a distance measure, F ST (1 − F ST ) and compared to geographic distance between sites (in km) to check for isolation by distance. Mantel tests comparing distance matrices and genetic distance were also undertaken in "adegenet."

| Phylogenetic analysis
To provide another approach to look at associations among populations, phylogenetic relationships among the populations were established using dartR, version 1.1.6 (Gruber et al., 2018) within the R programming environment (version 4.0.3, R Development Core Team, 2018). For DArTSeq™ nuclear data, we concatenated (a) sequence fragments (trimmed sequence tags with SNPs) and (b) SNPs only across loci for individuals where heterozygous positions were replaced by the standard ambiguity codes, and exported these as FASTA files. There were 5,608 SNPs retained after the Diversity Arrays filtering used prior to the phylogenetic analysis. Sequences were aligned and checked in MEGA, version 5.2 (Tamura et al., 2011), and SNP data were then converted to a NEXUS file format using the "ape" R package, version 5.4 (Paradis & Schliep, 2019). The FASTA file and NEXUS file were imported into CIPRES Science Gateway, version 3.3 (https:// www.phylo.org/) (Miller et al., 2010) for maximum-likelihood (ML) and Bayesian inference (BI) phylogenetic analyses, respectively.
For ML, we used the program RAxML, version 8.2.12 (Stamatakis, 2014) on the "sequence fragments," that is, variant plus invariant data, for improved branch length and topological accuracy in phylogenetic trees (Leaché et al., 2015). We assessed support for  ac.nz) and the minimum spanning network option. This program was considered appropriate because we had no sites with missing data.

| mtDNA analysis
Genetic and geographic distance matrices were created using the average number of base pair differences and latitude and longitude coordinates respectively for all pairwise population comparisons. A relationship among distance measures was investigated using Mantel tests performed with the "mantel.randtest" function in R package "ade4" (Dray & Dufour, 2007) with 1,000,000 permutations. Nuclear genetic distance was also compared to mtDNA distance with nuclear distance calculated as the Euclidean coancestry coefficient.

| Vegetation analysis
We assessed whether genetic variation is related to the area of available habitat, measured at different scales. Such an association might be expected given that K. scurra is closely associated with relict patches of Themeda grasslands through its basic requirements for food and shelter (White, 1956). To analyze the extent of available suitable habitat at each collection point, we compared between the raster of the

| Population variation
The three measures of heterozygosity variation we estimated for populations based on individuals were all highly correlated across populations (r > 0.92). We therefore ran all further analyses with one SNP retained per sequence with all individuals considered, which maximizes the retention of variation in populations where sample sizes are small, but nevertheless allows all individuals from a population to be used in computing heterozygosity ( Figure 2 and Table 1).
There was a significant difference in heterozygosity among populations when heterozygosity of individuals was compared (p < 0.001, Figure 2). Observed population heterozygosity (H o ) varied from expected heterozygosity (H e ) in some cases as reflected by Cooma and Bungonia had particularly low levels of heterozygosity ( Figure 2). The low level of heterozygosity at Bungonia may partly reflect inbreeding given the large F IS although the sample size for this population was small (Table 1) (Figure 2). F I G U R E 2 Box plots for individual heterozygosity by location (for selected SNPs, all individuals prefiltered to MAC = 1 from data where population with an even number of individuals were sampled, but then computed for all individuals from a population). Note that populations are ordered to match the STRUCTURE analysis below. Gundagai Cemetery and Windellama North are not included here because they are represented by singletons Overall, these patterns suggest a range of genetic variability levels in K. scurra populations and high levels of variation even in some populations where suitable habitat appears limited.

| Vegetation associations
We found a significant negative relationship between habitat area (buffered at 25 and 50 m) and F IS (r = −0.54 and 0.57, p < 0.05), suggesting that populations in smaller habitat patches are more inbred (Figure 3, right column). This relationship was stronger when smaller radii were used to define available habitat. Relationships with observed heterozygosity were not significant (Figure 3, left column). Additional differentiation among sites was evident as K values were increased ( Figure 6). An AMOVA (Table 2) indicated significant effects of regions and sites: 32% of variance is found within sites, 17% between sites within regions, and 51% between regions.

F I G U R E 3
Association between a likelihood model of intact native grassland (S. J. Sinclair and M. D. White, unpublished)

| Phylogenetic analysis
The phylogenetic analysis confirmed the uniqueness of the populations (Supplementary Information). Both the Bayesian tree (Figure 9) F I G U R E 4 Variation in the COI gene sequence across Keyacris scurra as depicted by a network diagram. The numbers of nucleotide changes are indicated in brackets. The size of the colored areas reflects the number of haplotypes, and branch lengths reflect the number of nucleotide changes. COI, cytochrome oxidase subunit 1 F I G U R E 5 STRUCTURE plots for (a) all individuals and (b) even populations at K = 6 (best supported by modified Evanno method) and the ML tree ( Figure 10) showed that the individuals clustered into their collection sites. This included collection sites where the DAPC analyses did not clearly separate individuals into the collection sites, such as Kambah Pool and Burra, and also Windellama, Bungonia, and Gundary. Overall, the structure produced by these unrooted trees was consistent regardless of whether a ML analysis or a Bayesian analysis was used (Figures 9 and 10) with reasonable support for the clusters that were identified.   Figure 11). A Mantel test indicated a significant association between geographic and genetic distance (r = 0.7660, p < 0.001, 1,000,000 permutations) consistent with the IBD regression analysis.

| IBD analysis
We also ran an IBD analysis on the mtDNA data by comparing the number of nucleotide differences between populations. A Mantel test on the mtDNA data indicated a significant association between geographic distance and nucleotide differences (r = 0.876, p < 0.001) in agreement with the nuclear comparison. A Mantel test also indicated a positive association between the nuclear differences among populations and the mtDNA differences (r = 0.693, p < 0.001).

| D ISCUSS I ON
Keyacris scurra is an endangered species persisting for many decades in small areas where suitable habitat has remained. The grasshoppers from Windellama and Gundagai South cemeteries were sampled from suitable habitat covering only a few hectares which are surrounded by farmland. Keyacris scurra has persisted at these small sites since the 1950s and 1960s. Samples from both these sites show high levels of genomic variability relative to other samples, which suggests that there has been limited loss of genetic variation through genetic drift to date from these isolated sites. On the other hand, K. scurra has been lost from many other small remnant areas where they were recorded in the 1950s and 1960s, most likely through inappropriate site management. For instance, White et al. (1963) performed evolutionary studies on the cemetery site at Murrumbateman in the ACT, where we failed to find the grasshopper despite multiple attempts to locate them there. We also visited many cemeteries in Victoria where K. scurra had been present in the 1950s (White, 1956), but specimens could not be found. In these areas, we found that T. triandra grassland has often persisted, but we believe that site management has removed the specific habitat elements required for K. scurra to persist, either via the exclusion of daisies through overgrowth of Themeda (Stuwe & Parsons, 1977) or via structural modification of the grass sward by regular mowing to keep cemeteries neat (Clayden et al., 2018). indicate how well insect populations can survive in fragments as long as suitable habitat is available (Tscharntke et al., 2002).
The populations at Windellama and Bungendore had previously been subjected to a deliberate translocation by White (1957)  F I G U R E 9 Bayesian tree representing individual grasshoppers generated from SNP data with probability support values shown at branch nodes. Note the tight clustering by site location hybrid populations that are genetically distinct from parental populations as noted for the threatened field cricket, Gryllus campestris (Witzenberger & Hochkirch, 2008) and adders (Madsen et al., 1999).
Here we find that the two populations map in multivariate space with nearby populations (Figure 7) which were 19+ km away (Figure 1 Whiteley et al., 2015); it can be useful where there is strong evidence of a decline in genetic diversity and has been proposed as a useful approach for some threatened Australian insects (Roitman et al., 2017). However, with genetic variation persisting so far even in small areas, there is likely to be limited benefit from such an exercise. Instead, we suspect that it is important to maintain the remaining variation across the range of the species given that there is very strong genetic differentiation among the populations. The The value of small reserves in preserving invertebrates (Hafernik, 1992) and plant biodiversity (Kendal et al., 2017)  The substantial genetic distances separating populations raise the issue of how to conserve diversity within the species. Clearly at this stage, genetic uniqueness of populations is not associated with a loss of genetic diversity as is the case of marsupials and some other invertebrates (Weeks et al., 2016). It is important to conserve current levels of diversity across the landscape, and the genomic data suggest that this can be achieved with relatively small areas.
Increasing the number of fragments also helps protect against fires and other catastrophes that threaten Australia's insect species more generally (Sands, 2018), and provides nearby populations for future translocation efforts. The recreation of vegetation dominated by Themeda and a range of daisies is tractable if the high costs of seed can be overcome (Gibson-Roy & Delpratt, 2015), so that the strategic creation of insurance populations of K. scurra is likely possible.
Habitat corridors may have limited benefit for this species given that it can persist in small areas although its inability to fly means that any movement between nearby fragments will be rare.
Beyond their conservation merit, the ability to create populations may also permit studies of fundamental biological questions.
Following on from Michael White's early work with the benefit of modern molecular tools, there are opportunities to further understand the evolutionary dynamics of K. scurra populations and reconsider some key evolutionary questions that were previously considered in this system. Early work by White argued that chromosomal rearrangements which could easily be scored in this grasshop- per represented examples of heterozygote advantage and adaptive fitness interactions among chromosomal forms (White, 1957;White et al., 1963), which were interpreted as chromosomal forms being at different fitness peaks in an adaptive landscape (Lewontin & White, 1960). This was queried by others who argued for the importance of weak inbreeding (Allard & Wehrhahn, 1964) and changes in the selective advantage of different chromosomal arrangements across time (Colgan & Cheney, 1980) in accounting for patterns in these arrangements. By establishing populations with different combinations of chromosomal rearrangements from the same or different populations along climate gradients where the species occurs, and tracking changes in both the frequency of the rearrangements and their genomic content, it should be possible to gain insights into the extent to which rearrangements lock up adaptive genetic combinations, enhance or retard rates of evolutionary change, and change in fitness as a consequence of environmental variation. Such issues continue to be debated in the literature where Drosophila inversions in particular are regarded as important in climate adaptation (Kapun et al., 2016;Rane et al., 2015). Efforts to pursue these questions with Keyacris scurra would be greatly enhanced by developing an assembled and annotated genome of this and related morabine species.

This research was supported by the Australian Research Council
Discovery Grant DP190100990. We thank Craig Moritz and Ted Deveson for support during the fieldwork and Nick Bell for formatting assistance.

CO N FLI C T O F I NTE R E S T
None declared.

DATA AVA I L A B I L I T Y S TAT E M E N T
Aligned.bam sequence files for Keyacris scurra are available through https://www.ncbi.nlm.nih.gov/biopr oject/ 702007.