Generating continuous maps of genetic diversity using moving windows

Genetic diversity plays a key role in maintaining population viability by preventing inbreeding depression and providing the building blocks for adaptation. Understanding how genetic diversity varies across space is, therefore, of key interest in conservation and population genetics. Here, we introduce wingen, an r package for calculating continuous maps of genetic diversity, including nucleotide diversity, allelic richness, and heterozygosity, from standard genotypic and spatial data using a spatial moving window approach. We provide functions to account for variation in sample size across space using rarefaction, to create kriging‐interpolated maps of genetic diversity, and to mask any areas that are outside the area of interest. Tests with simulated and empirical datasets demonstrate that wingen can successfully capture variation in genetic diversity across landscapes from both reduced‐representation and whole genome sequencing datasets. For reduced‐representation datasets, wingen's functions can be run easily on a standard laptop computer, and we provide options for parallelization to increase the efficiency of running larger whole genome datasets. wingen provides novel and computationally tractable tools for creating informative maps of genetic diversity with applications for conservation prioritization as well as population and landscape genetic analyses.

of single individuals from different localities. In doing so, these approaches can take full advantage of the greater spatial coverage afforded by individual-based sampling if they can apply methodologies to estimate genetic diversity that do not require identifying populations or genetic clusters (Wang & Bradburd, 2014). However, tools to do so remain limited (e.g. Miller, 2005;Petkova et al., 2016;Shirk & Cushman, 2011;Vandergast et al., 2011). Here, we present a new method for constructing continuous maps of genetic diversity from individual-based sampling using a spatial moving window approach. This method calculates genetic diversity from samples that fall within a two-dimensional window of user-specified size around each cell of a landscape ( Figure 1) and is implemented in our new r package, wingen. Samples within the window around each cell are treated as a group-or neighbourhood-for estimating genetic diversity. The output is a raster map of genetic diversity in which each cell is assigned the value of the genetic diversity of the neighbourhood surrounding it.
Creating continuous maps of genetic diversity is particularly challenging because sample size often varies across space, which can bias estimates. Larger sample sizes are expected to capture more alleles, which can result in inflated estimates of genetic diversity in areas with more samples (Kalinowski, 2004). Because neighbourhoods can contain different numbers of samples, wingen provides options for rarefaction to control for the effects of sample size on measures of genetic diversity. We also include options to create interpolated surfaces of genetic diversity from the moving window rasters. To do this, wingen uses kriging, which estimates values at unknown points using a spatial model (i.e. a variogram). Kriging interpolation helps smooth out discontinuities created by the moving window and reduces noise caused by rarefaction, allowing for clearer visualization of spatial patterns of genetic diversity. Kriging also allows for better prediction accuracy by incorporating a spatially explicit model (Oliver & Webster, 1990). Finally, both the original moving window map and the kriged surfaces can be masked to exclude regions that were not sampled, were undersampled, or fall outside the region of interest. Altogether, wingen provides the tools to create informative maps of genetic diversity that can be used for population genetic analyses and to inform conservation decisions.

| PACK AG E DE SCRIP TI ON
wingen consists of three main functions: window_gd to create moving window maps of genetic diversity, krig_gd to spatially interpolate the moving window maps and mask_gd to mask the moving window and interpolated maps.

| Moving window maps with window_gd
The core function in wingen is window_gd, which generates a continuous map of genetic diversity using a moving window approach.
For each cell on a raster of the study area, window_gd calculates genetic diversity using the individuals within the window surrounding that focal cell ( Figure 1). The window size is user-defined and ideally should be set with some knowledge of the study system in mind (e.g. the dispersal kernel of the study organism). We provide a function, preview_gd, which allows users to explore the effects of different window_gd parameters on sample count in the moving window analysis.
The metrics calculated by window_gd are nucleotide diversity ( ), observed heterozygosity (H O ), and allelic richness (A R ). wingen includes two methods for calculating allelic richness. The first uses the allelic.richness function from the hierfstat package (Goudet, 2005).
The second uses a custom function that only applies to biallelic data but is computationally faster because it calculates the statistic using a small genotype dosage matrix. Both methods calculate allelic richness as the number of alleles per locus. Genetic diversity statistics F I G U R E 1 Schematic of moving window calculations and the outputs produced by the window_gd function: genetic diversity and sample count rasters. The number of individuals within the window used for the calculation is counted and assigned to the focal cell on the sample count raster.

Sample Count
Window are sensitive to sample size (Kalinowski, 2004), which users can address by using the option for rarefaction built into the function (Supporting Information).
The basic inputs to window_gd are a variant call format (VCF) file (a standard format for storing DNA polymorphism data, imported using the vcfR package; Knaus & Grünwald, 2017), sample coordinates and a raster of the study area. The output of window_gd is two moving window raster layers of (1) genetic diversity and (2) the number of samples used to calculate the genetic diversity statistic for each cell (hereafter, the sample count layer; Figure 1). The sample count layer is useful for examining the effect of sample size on genetic diversity.

| Spatial interpolation with krig_gd
To produce interpolated maps of genetic diversity, we provide the function krig_gd, which implements the autoKrige function from the automap r package (Hiemstra et al., 2008) which uses an automatically fitted variogram to perform kriging on the moving window raster produced by window_gd. The use of a spatially-explicit model, the fitted variogram, for interpolation allows for greater prediction accuracy. The inputs for this function are (1) a raster produced by window_gd and (2) a raster layer across which interpolation will be performed.

| Masking with mask_gd
The final main function in wingen is mask_gd, which takes the raster layer output from window_gd or krig_gd and masks cells in the genetic diversity layer based on a second raster or spatial object.
When using a raster for masking, a user-specified threshold is used to mask any areas that fall above or below that threshold. For example, the sample count layer can be used to mask regions where sample counts within neighbourhoods are low.

| Simulation example
We validated wingen using a landscape genomic dataset simu- We tested wingen on three datasets: the full dataset (142,129 SNPs and 1688 individuals), and two randomly subsampled datasets.
The subsampled datasets contained 10,000 and 100,000 randomly selected SNPs to approximate the size of reduced-representation Based on these tests, we set the window dimensions to 7 × 7 cells and aggregated the original raster by a factor of three to maximize the continuity and resolution of the map; we then generated moving window rasters, with and without rarefaction, for all three diversity metrics ( , A R , and H O ) for each dataset. We performed rarefaction with a minimum of two samples and using five iterations. As a demonstration of the entire workflow, we kriged the resulting raster of for the RADseq-type dataset with 200 individuals using krig_gd and masked the kriged raster based on the species range to exclude unsampled areas (Figure 2).
The resulting maps demonstrate that wingen successfully captures spatial variation in genetic diversity based on the underlying landscape rasters, the carrying capacity and movement surfaces ( Figure 3). We observed higher diversity in areas of the landscape with higher carrying capacity (i.e. larger population sizes) and better connectivity (i.e. more gene flow) across all statistics, for both dataset sizes, and before and after rarefaction (Figure 3, Figures S1 and S2). We did not observe large differences between the results from the WGS-type and the RADseq-type dataset for any of the statistics or either sample size ( Figures S1-S4). This is likely because of the highly simplified genomic architecture that we simulated (i.e. no selection and no linkage); empirical systems with more complex architectures may exhibit greater differences based on sample size and sampling distribution.
The allelic richness maps generated using the simulated datasets demonstrate why rarefaction can be important ( Figures S3 and S4).
Rarefied maps were consistent across all dataset sizes, while nonrarified maps for both WGS-type and RADseq-type datasets had lower allelic richness across the range compared to the full dataset due to its larger sample sizes ( Figures S3 and S4). The considerable differences in allelic richness between the full dataset with and without rarefaction can be attributed to how allelic richness is measured: with rarefaction, average allelic richness is calculated from a sample of fixed size across the landscape, whereas without rarefaction, allelic richness is calculated using all of the samples contained within the window, regardless of sample size. Otherwise, rarefaction did not have a substantial effect. The rarified maps appear slightly rougher than the non-rarified maps because the rarefaction approach involves drawing random samples for each cell, resulting in slightly more variation in the genetic diversity values from cell to cell.
One concern in moving window estimations is edge effects, in which estimates are less reliable along the edges of rasters due to windows falling off the edge of the map. This could lead to underestimates of genetic diversity along edges, where incomplete windows (which contain fewer cells than complete windows) could result in smaller sample sizes. In our simulations, there were no apparent edge effects (Figures S1-S4); nonetheless, for users concerned about potential edge effects, we provide an option in window_gd to crop the edges of the raster where the windows are incomplete.

| Empirical example
To demonstrate how wingen performs with empirical data, we used

| Computational time
For the simulated example dataset, moving window calculations of and H O for both sample sizes took under 1 min for the RADseq-type datasets on a single 3.4 GHz processor and under 3 min for the WGStype datasets parallelized across ten 3.4 GHz processors (Supporting Information; Figure S7). Calculations of A R took less than 10 min for the RADseq-type datasets and under 15 min for the WGS-type datasets ( Figure S8). For the S. occidentalis empirical example, it took approximately 30 s to calculate and H O and approximately 2.5 min to calculate A R . An extended runtime analysis to evaluate the effects of key window_gd parameters on computational time showed that runtime scales linearly with window size and raster size (Supporting Information; Figure S8). Overall, increasing sample sizes and numbers of SNPs predictably increased computational time, and allelic richness was the most computationally costly diversity metric.

| CON CLUS IONS
In order to conserve genetic diversity, we must first understand how it varies across space (Wagner & Fortin, 2013). Here, we introduce F I G U R E 2 Moving window maps of nucleotide diversity ( ) for the 200 sample RADseq-type dataset (10,000 SNPs) with varying levels of aggregation and window size, demonstrating how varying these parameters affects the continuity and spatial resolution of the resulting maps. Raster cells where the sample size within the window around that cell is less than two are assigned NA values and appear blank on the map. The grey background map represents the complete range of the simulated species.   (Petkova et al., 2016).
Generating maps of H O , and A R for biallelic datasets using wingen is computationally feasible on a standard laptop computer for a typical reduced-representation dataset. For larger datasets and for calculating allelic richness for non-biallelic data, parallelization can be used to decrease computational time. As would be expected, larger datasets covering large, higher resolution landscapes with large moving window sizes will take longer to run. The number of iterations used for rarefaction and the rarefaction sample size will also increase computational time, although these can be managed by users to suit their needs.
Users should explore different settings for window size, raster resolution, minimum sample number, and rarefaction parameters to determine the sensitivity of their results to these parameters. Users should also be mindful about how their sampling strategies affect their results and make use of the rarefaction and masking functions as needed. Our simulated dataset analyses demonstrated that and H O are the least sensitive to dataset size and rarefaction compared to A R .
The wingen package provides an effective way of summarizing genetic diversity across space using spatial moving windows and provides the added functionality of rarefaction and masking to help address sample size effects when calculating genetic diversity. The inputs to wingen are all standard genetic and spatial data types, and the raster layers of genetic diversity that wingen produces are in a common spatial data format that can be analysed on their own or used in downstream population genetic analyses. Overall, we demonstrate that wingen is able to capture real spatial patterns in genetic diversity caused by underlying variation in gene flow and population size.