Population genomics of the Anthropocene: urbanization is negatively associated with genome‐wide variation in white‐footed mouse populations

Abstract Urbanization results in pervasive habitat fragmentation and reduces standing genetic variation through bottlenecks and drift. Loss of genomewide variation may ultimately reduce the evolutionary potential of animal populations experiencing rapidly changing conditions. In this study, we examined genomewide variation among 23 white‐footed mouse (Peromyscus leucopus) populations sampled along an urbanization gradient in the New York City metropolitan area. Genomewide variation was estimated as a proxy for evolutionary potential using more than 10 000 single nucleotide polymorphism (SNP) markers generated by ddRAD‐Seq. We found that genomewide variation is inversely related to urbanization as measured by percent impervious surface cover, and to a lesser extent, human population density. We also report that urbanization results in enhanced genomewide differentiation between populations in cities. There was no pattern of isolation by distance among these populations, but an isolation by resistance model based on impervious surface significantly explained patterns of genetic differentiation. Isolation by environment modeling also indicated that urban populations deviate much more strongly from global allele frequencies than suburban or rural populations. This study is the first to examine loss of genomewide SNP variation along an urban‐to‐rural gradient and quantify urbanization as a driver of population genomic patterns.


Introduction
Humans exert an outsized influence on ecosystems (Vitousek et al. 1997) in the Anthropocene. This era began sometime between the late Pleistocene (Ellis et al. 2013;Ruddiman et al. 2015) and industrialization in the last few centuries (Steffen et al. 2007), but is always characterized by a global increase in human influence on biological and geochemical processes. Rapid urbanization is a key characteristic of the contemporary Anthropocene. The proportion of humans in cities increased from 16% to 50% in the last century and is projected to reach 70% by 2050 (Heilig 2011). Urban land conversion may occur at an even faster rate than population growth, thus resulting in accelerating encroachment of cities on reservoirs of biodiversity (Seto et al. 2012). Urban areas become ecologically homogeneous  due to loss of vulnerable species that in turn enhances the probability of abrupt state shifts (Barnosky et al. 2012).
Most species are 'urban avoiders' that do not persist after urbanization, but 'urban adapters' and 'exploiters' are facultative or obligate users, respectively, of human-dominated habitats (Blair 2001;McKinney 2002). Although Blair's (2001) conception of urban adapters did not explicitly include evolution after urbanization, local adaptation may have enhanced the ability of urban adapters to exploit human subsidies. Urban habitats are 'novel ecosystems' composed of unique ecological communities and processes (Hobbs et al. 2006(Hobbs et al. , 2009, and thus, many species likely face new, strong selection pressures from urbanization. Humans have undoubtedly altered the evolutionary trajectory of crop, pest, and disease species (Palumbi 2001), but conclusive cases of human-driven evolution in organisms that are not human commensals or pathogens have been difficult to identify (Meril€ a and Hendry 2014) outside of a few recent examples (Donihue and Lambert 2014).
Urbanization results in severe habitat fragmentation (Zipperer et al. 2012), a process that reduces genetic variation among animal populations (Keyghobadi 2007;Rivera-Ort ız et al. 2014). Loss of variation may in turn reduce the evolutionary potential of populations experiencing rapidly changing ecological conditions (Etterson and Shaw 2001;Hoffmann and Sgr o 2011;Oakley 2013). Definitive evidence of adaptive evolution from standing variation in wild populations is still relatively scarce (Colosimo et al. 2005;Renaut et al. 2011;Domingues et al. 2012), but new userfriendly approaches to generating large population genomic datasets have improved prospects for quantifying and analyzing genetic variation in the wild (Narum et al. 2013). Evolution can proceed at the nucleotide level through new mutations or changes in frequency of standing genetic variants (Orr 2005), but it is likely that standing variation is more important in cases of rapid evolution (Barrett and Schluter 2008). Results from laboratory experiments (Teot onio et al. 2009;Burke et al. 2010), ancestral variation in model organisms (Rockman 2008;Scarcelli and Kover 2009), humans (Pritchard et al. 2010), artificial selection in crops (Gibson and Dworkin 2004), viral pathogens (Pennings 2012), and invasive species (Prentis et al. 2008) all support this association.
Predicting the loci, genetic architecture, and additive effects involved in adaptive responses to urbanization will be difficult in most cases, but estimating genomewide variation as a general proxy for evolutionary potential is a useful alternative. Measures of allelic diversity and heterozygosity from genomewide markers have the advantages of accounting for traits influenced by many loci of small effect, and for predicting how much variation will be available for future adaptive responses to unknown selection pressures (Harrisson et al. 2014). One criticism of using genomewide single nucleotide polymorphism (SNP) datasets to estimate evolutionary potential is that many loci are not associated with functional genomic regions. However, average allele frequency divergence predicts the most extreme F ST outliers, and the geographic structure of neutral and selected alleles is nearly identical, in humans (Coop et al. 2009). Genomewide SNPs have also been successfully used to predict phenotypic improvement through 'genomic selection' methods in artificially selected species such as livestock (Meuwissen et al. 2013). The increased accessibility of genomewide markers for nonmodel organisms thus provides many opportunities for measuring and predicting evolutionary potential in an urbanizing world (Harrisson et al. 2014). Here, we examine genomewide variation in white-footed mouse populations sampled along an urbanto-rural gradient and robustly quantify urbanization as a driver of population genomic patterns.
Previous microsatellite-based analyses on this system showed substantial genetic structure between white-footed mouse populations in NYC's forest fragments (Munshi-South and Kharchenko 2010). Park area, age, or the extent of habitat within parks did not explain levels of genetic variation among these populations (Munshi-South and Nagy 2014), but these studies focused exclusively on forest fragments within NYC that were highly isolated by surrounding urbanization. In a separate analysis, we identified SNPs from transcriptomes sequenced from urban and rural populations, and found that population structure was greater among the urban than the rural populations (Harris et al. 2015a). This analysis was limited to only six sampling sites, however, and SNPs from coding regions may often deviate from neutral expectations. In this study, we expand these investigations using large, genomewide SNP datasets, and include populations sampled along an urban-to-rural gradient spanning 142 km from the urban core to extensive, rural protected areas. We specifically used a doubledigest RADseq protocol ) to generate over 10 000 SNPs for analyzing the population genomics of white-footed mice.
Land use transformation and anthropogenic barriers may ultimately reduce gene flow between populations (Epps et al. 2007;Balkenhol and Waits 2009;Jha 2015), leading to greater genetic structuring and loss of genomewide variation in urbanized areas. Several statistical approaches have recently been developed to investigate the influence of landscapes on genetic structure, although they vary in ability to distinguish between isolation by distance (IBD) and ecological factors. Isolation-by-resistance (IBR) models examine the statistical association between genetic and 'resistance' distances, where the latter represent probabilities that individuals disperse between populations given the landscape 'friction' to dispersal (McRae 2006). The friction values are inferred from empirical movement data or optimized using model selection approaches. One drawback of IBR is that resistance distances are calculated across all paths that individuals may take between populations, and thus are not truly independent from IBD. Isolation-byenvironment (IBE) processes, in contrast, are characterized by positive correlations between environmental differences around sites and genetic distances that are independent of the effects of geographic distance (Wang and Bradburd 2014). Both IBR and IBE patterns are influenced by the same biological processes that limit migration across landscapes (such as costs of movement or selection against dispersing genotypes), but IBE patterns are defined by their independence from IBD. IBR and IBE also differ in that the former is focused primarily on the influence of the landscape between sites or individuals, whereas the latter is concerned with the influence of the environment in and around sites.
We previously reported that gene flow between populations of white-footed mice could be explained by IBR models based on patterns of urban vegetation cover in NYC (Munshi-South 2012). Here, we investigate IBR in a much broader range of landscape conditions along an urban-torural gradient. In our earlier work, we used partial Mantel tests and a causal modeling approach to identify ecological distances between populations (based on vegetation cover) that best explained gene flow after factoring out IBD (Cushman and Landguth 2010). While this approach can be successful for landscapes composed of high-contrast cover types (such as NYC), several authors have argued that partial Mantel tests have low statistical power and are prone to false positives (Legendre and Fortin 2010;Graves et al. 2013). Here, we use a partial Mantel test for our IBR model, as well as a new statistical IBE approach to quantify the relative contributions of urbanization and IBD to genetic differentiation between white-footed mouse populations (Bradburd et al. 2013). Specifically, we model the strength of covariance in allele frequencies between populations as a function of IBD and IBE due to urbanization of the landscape.
In this study, we test the following interrelated predictions about the population genomics of white-footed mice (Peromyscus leucopus Rafinesque) in the New York City (NYC) metropolitan area: 1) Genomewide variation within populations is inversely related to urbanization of the surrounding landscape. 2) Urbanization of the landscape results in greater genomic structure and differentiation between populations. 3) 'Resistance distances' and 'ecological distances' resulting from urbanization are better predictors of genetic differentiation than geographic distances between populations.

Study species and sampling sites
White-footed mice are one of the most widespread and abundant small mammals in eastern North America, and occupy a broad range of forest, meadow, and secondary growth habitats. They have served as model systems for population ecology for decades (Vessey and Vessey 2007;Brunner et al. 2013) because of their ubiquity and easy trappability. Peromyscus spp. more broadly have emerged as model systems for the genomics of adaptation (Bedford and Hoekstra 2015), and the first reference genomes are currently being assembled (Kenney-Hunt et al. 2014).
For this study, we sampled white-footed mice from 23 sites in the NYC metropolitan area (Table 1). The 12 sites within NYC limits were the same as in previous studies, At each site, we trapped white-footed mice over a period of one to three nights using two or more 7 x 7 grids of Sherman live traps (9″93″93″). Traps within grids were placed 15 m apart, and grids were located several hundred meters apart to avoid trapping close relatives. We collected ear punches or tail clippings from up to 25 mice at each site, and stored the tissue in 80% EtOH before transfer to a -20 C freezer in the laboratory. Animal handling procedures were approved by Fordham University's Institutional Animal Care and Use Committee (Protocol No. JMS-13-03).
We chose sites that qualitatively represented typical urban (n = 12 sites), suburban (n = 4), and rural (n = 7) sites in the NYC metropolitan area based on the levels of development (Fig. 1). However, suburban counties adjacent to NYC have higher human population densities than major cities in other parts of North America. To facilitate comparisons with other urban areas, we calculated the size of the human population and percent impervious surface in geographic buffers around each site as proxies for relative urbanization (Fig. 1B).
We first incorporated GPS coordinates for our study sites into base layers (2010 TIGER/Line â Shapefiles-County or States and equivalent) available from the US Census Bureau. Buffers were created in ArcGIS v10.1 (ESRI, Redlands, CA, USA) around each site's GPS coordinates with radii of 500 m, 1000 m, 1500 m, and 2000 m. These buffers were chosen because they are relevant to the typical lifetime dispersal distance of many white-footed mice; for example, no mice over a 40-year study of one woodlot dispersed to the nearest neighboring woodlot <1.5 km away (Vessey and Vessey 2007). To determine human population size inside each buffer, we used US Census Blocks as these provide the smallest geographic unit with 100% census data. We first calculated the area of each census block and then intersected the census block and buffer layers. We interpolated human population size within each buffer based on the percentage of area for each census block that intercepted each buffer ([area of census block within buffer/area of census block] 9 population of census block). We then summed all of the interpolated population sizes that fell within each buffer.
To determine percent impervious surface within each study site buffer, we used the 2011 Percent Developed Imperviousness layer from USGS National Land Cover Data (Xian et al. 2011). We then calculated zonal statistics to measure the geometry of the raster file and summarize the cell values of the raster that fell within each buffer.  Table 1. (B) Scatterplot of percent impervious surface cover vs. human population size for the 23 sampling sites. Both variables were log-transformed to improve linearity, and were measured using 2-km buffers around study sites as described in the text. Red dots represent urban sites, green dots represent suburban sites, and black dots represent rural sites.
From these data, we calculated the average percent impervious surface within each landscape buffer for each site.
These estimates of impervious surface and human population size could be highly correlated at the different spatial buffers, so we calculated Pearson correlation coefficients between all pairs of values (Table S1). To avoid including redundant information in downstream analyses, we removed any variables that exhibited r > 0.90 in pairwise comparisons with the other variables. The 1-km and 1.5km buffers were removed because they were highly correlated with other estimates, and percent impervious surface and human population size estimated for the 500-m and 2-km buffers were retained.

ddRADSeq, SNP genotyping, and population genomic statistics
We generated SNP genotypes for 233 individuals using a double-digest RADseq protocol . We aimed to obtain genotypes from 10 individuals from each site, but sample dropout due to poor DNA yield during library preparation resulted in variability in the sample size for each site (Table 1). In brief, we extracted DNA from tail snips or ear punches using Qiagen DNEasy tissue kits (Qiagen, Valencia, CA, USA) with an RNAse treatment, and then digested 500-1000 ng of DNA using the enzyme combination of SphI-HF and MluCI for 1 h following the manufacturer's instructions (New England Biolabs, Ipswich, MA, USA). A Qubit 2.0 fluorometer (Life Technologies, Norwalk, CT, USA) was used to quantify DNA concentration at several steps in the library preparation. Next, digested DNA was cleaned with 1.5X AMPure XP magnetic beads (Beckman-Coulter, Brea, CA, USA) and custom inline 'flex' barcodes and P1/P2 adapters were ligated to 200-400 ng of digested DNA for each sample . Up to 48 individual libraries with unique barcodes were then pooled in equimolar amounts and cleaned with 1.5X AMPure XP beads. Next we selected DNA fragments of known sizes (376-412 bp) from the pooled libraries using a Pippin Prep (Sage Science, Beverly, MA, USA). We then conducted multiple PCR amplifications using 20 ng of size-selected DNA and Phusion High-fidelity PCR reagents with manufacturers' PCR conditions (New England Biolabs). This PCR step added a second, unique index sequence and Illumina sequencing primers to the pooled libraries so each individual sample contained a unique combination of the in-line barcode and index. We then pooled the PCRs for each size-selected library, cleaned the pools using 1.5X AMPure XP beads, and checked the libraries using an Agilent BioAnalyzer (Agilent Technologies, Santa Clara, CA, USA) for DNA concentration and the correct distribution of fragment size. The libraries were sequenced using three lanes of Illumina HiSeq 2000 2 9 100 bp paired-end sequencing at the New York University Center for Genomics and Systems Biology (New York, NY, USA).
As an initial check on the quality of our Illumina sequence data, we analyzed the raw reads in FastQC (Andrews 2010). Subsequent demultiplexing, quality filtering, and de novo SNP calling were conducted using the Stacks 1.21 pipeline (Catchen et al. 2013). First, we used the process_radtags script to filter out low-quality reads and demultiplex the remaining reads according to their unique combination of in-line barcode and index. Based on FastQC results, we trimmed all reads to 96 bp to remove poor-quality base calls at the ends of reads. Next, we concatenated the single-and paired-end reads for each individual into one fastq file because the two paired reads did not overlap and were not aligned to a reference genome. To identify RAD loci and call SNPs, we used the den-ovo_map.pl script in Stacks with default settings except for the minimum number of identical reads required to create a 'stack' (m = 7), number of mismatches allowed between RAD loci for a single individual (M = 3), and the number of mismatches allowed when building the catalog (n = 2). After building the initial catalog of loci, we used the populations script in Stacks to filter loci for those that occurred in at least 22/23 sampling sites (P = 22) and in at least 50% of individuals at each site (r = 0.5) to a minimum coverage of 7X (m = 7). These parameter values imply that each base pair position was sequenced at least 668X across the entire dataset. Stacks calculates the likelihood of the two most common genotypes at each site given a maximumlikelihood estimate of the sequencing error rate, and then uses a likelihood ratio test to identify the most likely genotype at that position. Full details of the models underlying Stacks are given in Catchen et al. (2013). The populations script also produced genotype output in multiple formats (i.e., Genepop, Structure) with one randomly selected SNP from each locus (-write_random_SNP), and generated summary statistics such as observed heterozygosity, nucleotide diversity, and pairwise F ST between all populations. These summary statistics were calculated for the sites that were polymorphic in at least one population (i.e., variant sites; Table 2), as well as for all the sites (Table S2). We used summary statistics for the variant sites in downstream analyses, although the two estimates were typically highly correlated (i.e., r < 0.99).
We loaded the RAD loci and individual data from Stacks into a MySQL database and visualized the output using the Stacks webserver. Based on the results of the first pipeline run, we removed 26 individuals because their small number of reads resulted in very small SNP datasets and excessive missing genotypes compared to other samples. Highly related individuals in our dataset could also bias downstream analyses. We avoided relatives at our urban sites using relatedness values from previous microsatellite studies of the same individuals (Munshi-South and Kharchenko 2010). For the SNP dataset, we identified highly related individuals by calculating kinship coefficients in the software package KING (Manichaikul et al. 2010). Kinship analysis identified 16 pairs of individuals that were related at the half-sib level or greater. We removed one of the pair members from the dataset, resulting in a final dataset comprised of 191 of the original 233 individuals. We then reran the Stacks pipeline on the screened dataset of 191 individuals.
Besides the filters applied by Stacks and the removal of highly related individuals, we also omitted outlier loci detected using the approach in BayeScan 2.1 (Foll and Gaggiotti 2008). Operating with default parameters and a false discovery rate <0.01, we identified 200 outliers using BayeScan. Five of these outliers showed a signature of positive selection, whereas 195 showed signatures of balancing selection. We omitted these outliers from the dataset because they did not fit assumptions of neutrality inherent to our downstream analyses.

Modeling genomewide variation within populations
We modeled four measures of genomewide variation within populations against our two urbanization proxies to test the hypothesis that urbanization is associated with reduced genetic variation. Summary statistics for each population included observed heterozygosity (H O ), nucleotide diversity (p), number of private loci, and the percent of polymorphic loci. We examined genetic (dependent) variables using seven candidate general linear models (GLMs) consisting of combinations of human population size and percent impervious surface estimated using different geographic buffers as described above (500 m and 2000 m): an intercept-only model, four univariate models with one variable measured at one buffer size, and two bivariate models including both human population and impervious surface estimated for the same buffer size. Human population size and percent impervious surface were ln-transformed prior to analysis to improve normality. If a model performed substantially better than the intercept-only model, then we interpreted that result as evidence of a statistical effect of the urbanization variable(s) on genetic diversity within white-footed mouse populations. We calculated maximum-likelihood estimates of model parameters for each model, and then ranked models using values of AICc, the corrected Akaike's information criterion (Burnham and Anderson 2002). The relative quality of models was further assessed based on DAICc, and the relative weight (w i ) of each model. For the best GLMs, we examined the statistical significance of the urbanization coefficients. We also examined scatterplots (Fig. 2) and fitted regression lines to confirm that the model explained variation in the genetic parameter of interest. We conducted all statistical analyses in R 3.2.1 (R Development Core Team 2008), and used GLM regression in the AICcmodavg package for model selection (Mazerolle 2015).

Genetic structure and population differentiation
To investigate whether urbanization results in greater genetic differentiation between populations, we used discriminant analysis of principal components (DAPC) in the R package adegenet (Jombart and Ahmed 2011) to identify evolutionary clusters among 191 white-footed mice. DAPC first reduces total genetic variation (i.e., variance in allele frequencies) into principal components, and then identifies discriminant functions that maximize differences between clusters while minimizing variation within clusters. We used cross-validation in adegenet to identify the optimal number of principal components. This procedure uses a randomly generated training set and validation set of individuals to identify the optimal number of principal components that accurately predict group membership without overfitting. To visualize clusters, we used DAPC  Table 1. scatterplots and barplots (compoplot command in adegenet) to visualize membership of individuals in different clusters. Genomewide SNP datasets often have power to discriminate between all groups, and results from analyses such as DAPC may reflect hierarchical structure. Thus, we ran DAPC on the full dataset as well as subsets of sampling sites to investigate hierarchical structure.
We also used the model-based evolutionary clustering approaches in fastSTRUCTURE (Raj et al. 2014) and ADMIXTURE (Alexander et al. 2009). fastSTRUCTURE uses approximate inference of the Bayesian model in the original STRUCTURE (J. Pritchard et al. 2000) whereas ADMIXTURE computes maximum-likelihood estimates of parameters to estimate the most likely number of evolutionary clusters, K. We ran fastSTRUCTURE on our data for each value of K from 1 to 23 using the standard model with a simple prior. A fastSTRUCTURE script (chooseK.py) also calculates heuristic scores for detecting the range of the most likely values of K. After identifying likely values of K, we reran fastSTRUCTURE with the more computationally demanding logistic prior model on a smaller subset of K values. The most likely values of K were determined in ADMIXTURE using its cross-validation procedure (Alexander and Lange 2011). To compare clustering results from fastSTRUCTURE and ADMIXTURE, we used the CLUMPAK (Cluster Markov Packager Across K) web server (Kopelman et al. 2015) to align and visualize bar plots for both programs at multiple values of K.

Landscape genomics
To investigate IBD, IBR, and IBE between white-footed mouse populations, we first tested for IBD using a Mantel test in the ecodist package in R. The data were bootstrapped 10 000 times to generate 95% confidence intervals for the Mantel P value. We then used the BEDASSLE package (Bradburd 2014) in R to estimate the relative contributions of IBD and IBE to genetic differentiation between the sampling sites. We computed allele counts and sample sizes for each population using the-counts function in VCFtools 0.1.12b (Danecek et al. 2011). We examined the following IBE models using BEDASSLE: (i) a simple binary matrix indicating whether the population was located in NYC or outside the city; and (ii) a pairwise matrix of 'resistance distances' calculated between populations using the IBR approach in Circuitscape 4.0 (McRae and Beier 2007).
Circuitscape calculates a pairwise matrix of resistance distances between populations based on the ability of a simulated electrical current to flow between adjacent landscape cells connected by resistors with user-defined resistance values. We used the 2011 Percent Impervious Surface layer from USGS National Landcover Data (Xian et al. 2011), and set the resistance value for each 30-m cell as equivalent to its percent impervious surface unless the cell exceeded 70% impervious surface. For cells exceeding 70%, we set the resistance level to 100; in other words, any cell with >70% impervious surface was assumed to be 100X more resistant to migration than a cell with 1% impervious surface. The 70% cutoff for relatively high, quasi-barrier resistance was based on results of an earlier IBR analysis we conducted in NYC (Munshi-South 2012). To calculate resistance distances between all populations, we ran Circuitscape in pairwise mode with raster cells connected to all eight neighboring cells. The analysis also produced a cumulative current map to visualize hypothesized migration between all populations. As an initial check on the success of the resistance distances at explaining variation in pairwise F ST , we conducted a partial Mantel test with 10 000 bootstraps in ecodist that factored out the effects of Euclidean geographic distance. We then used these resistance distances for the IBE model in BEDASSLE.
BEDASSLE uses a Markov Chain Monte Carlo (MCMC) approach and includes several graphing functions for evaluating success of the MCMC posterior parameter estimation. To confirm adequate mixing and convergence of chains, we examined traces and marginal distributions for all parameters. We also examined acceptance rates for MCMC parameter estimation, and when these rates were too high or low, we adjusted the tuning parameters and reran the analysis for 5-10 million steps. For the final runs, we calculated the median and 95% credible intervals for the a E : a D ratio after discarding the first 20% as burn-in. This ratio represents the effect size of ecological distance relative to geographic distance. The simple BEDASSLE model assumes identical variance of allele frequencies about the global mean allele frequency. However, populations deviate from the global mean for a number of demographic reasons (i.e., bottlenecks, inbreeding), and outlier populations can have a strong influence on posterior distributions. To account for this variation, BEDASSLE includes a beta-binomial model that estimates an additional parameter, Φ K , that measures the strength of drift and lack of fit of each population to the model. To address population history, we also ran the beta-binomial model for the two IBE scenarios above. For these models, we examined the posterior distributions of the Φ K values (recalculated as F K = 1/1 + Φ K ) to identify outlier populations.
To test the relative fit of the models to our data, we generated 1000 posterior predictive samples for each model and compared the simulated data to our observed data in BEDASSLE. BEDASSLE includes a function that randomly draws parameter values from the posterior distributions and simulates new datasets. These simulated datasets are then used to calculate F ST between all pairs of populations. Simulated F ST values can then be compared to the observed F ST values to examine the models' ability to describe patterns in the real data.
Results ddRADSeq dataset and summary population genomic statistics Three lanes of Illumina HiSeq 2 9 100 PE sequencing produced a total of 1.16 billion reads, of which 936.64 million reads (80.9%) passed initial quality filters. Reads were filtered out due to ambiguous barcode sequences (56.3%), low-quality scores (40.9%), and ambiguous RAD tags (2.8%). The catalog generated by STACKS included 880 898 RAD loci with at least 7X coverage, but this number was reduced to 14 930 after requiring loci to be present in 22/23 populations and at least 50% of individuals within each population. All populations were well-represented in the final dataset, with averages of 4 to 10 individuals genotyped per locus across the 23 populations (median = 6.6; Table 2).
For all loci that were polymorphic in at least one population, the major allele frequency (P) ranged from 0.894 to 0.957, and the average observed heterozygosity from 0.054 to 0.181 (Table 2). If all invariant positions are included, then the major allele frequency exceeded 0.999 in all populations and the heterozygosity ranged only from 0.001 to 0.002 (Table S2). Nearly all of the populations located within NYC exhibited lower genetic variation than suburban or rural populations as measured by the percentage of polymorphic sites, numbers of private alleles, major allele frequency, observed heterozygosity, and nucleotide diversity (p). The only exceptions to these trends were relatively high diversity values for the Pelham Bay (PB) population in NYC, and low values for suburban Saxon Woods (SW), the rural Cary Institute (CIE) and Minnewaska Reserve (MR; Table 2). Excluding those four outliers, heterozygosity ranged from 0.054 to 0.117 in the urban populations and 0.158 to 0.181 in the suburban and rural populations. Nucleotide diversity similarly ranged from 0.066 to 0.116 in NYC and 0.138 to 0.15 in the suburban and rural populations.

Modeling urbanization and genomewide variation within populations
Sampling sites could be distinguished by their combinations of human population size and percent impervious surface (Fig. 1B). The rural sites all clustered at very low values for both urbanization variables. Urban and rural sites exhibited little overlap in the scatterplot, with the exception of Fort Tilden (FT), Jamaica Bay (JB), and Pelham Bay in NYC. Figure 1B shows the relationship between the two urbanization variables measured for a 2-km buffer around sampling sites, but results were qualitatively similar for the 500-m buffer.
Model selection based on AIC c confirmed that nearly all GLMs with one or more urbanization variables described variation in genomic diversity better than intercept-only models, except for human population size estimated at a 500-m buffer (Table 3). Impervious surface cover, Table 3. Results of selection among general linear models describing the influence of percent impervious surface cover (imprv) and human population size (pop) at 500-m and 2-km buffer sizes on a) observed heterozygosity, b) nucleotide diversity, c) the number of private alleles within each population, and d) the percentage of polymorphic loci within each population. The best models were chosen based on the second-order Akaike's information criterion (AIC c ), the rate of change in AIC c (D i ), and the Akaike weights (w i ).
estimated at the 2-km buffer, was the highest ranked model for all four genetic variables, with the second best models all including both impervious surface cover and human population size at the 2-km buffer. The Akaike weights for the best models were all equal to or >0.55, and the delta AICc for moving from the second to third model were all >3.0. Heterozygosity was negatively associated with percent impervious surface, with the exception of the outlier populations mentioned above (i.e., rural populations with low genetic diversity; Fig. 2A). Nucleotide diversity was also negatively associated with percent impervious surface, and exhibited the same outlier populations as the heterozygosity scatterplot (Fig. 2B). The number of private alleles and percent polymorphic loci were negatively associated with percent impervious surface (Fig. 2C,D). All urbanization coefficients from the top GLMs were also significant at P < 0.05, with the coefficients for nucleotide diversity and number of private alleles significant at P < 0.001.

Genetic structure and population differentiation
Pairwise F ST calculated using STACKS ranged from a low of 0.033 between a rural (MH) and suburban (LCC) population, and a high of 0.145 between a suburban (MRG) and urban (RR) population. Most values were between 0.05 and 0.10. Some NYC populations exhibited many F ST > 0.10, particularly RR and JB, as did one suburban population (MRG). A simple Mantel test revealed no significant IBD (Mantel r = 0.041, 95% CI = À0.046-0.124, P = 0.32). Cross-validation identified n = 23 as the optimal number of principal components to retain for DAPC analysis ( Figure S1). The first two discriminant functions distinguished two isolated NYC sites, Jamaica Bay (JB) and Fort Tilden ( (Fig. 3A). This clustering largely recapitulates the spatial orientation of these populations along a north-south axis. One exception is the most urban population, Central Park (CP), which does not occur where it would be expected based on geography. This placement indicates that Central Park is one of the most isolated and unique NYC populations, along with Fort Tilden and Jamaica Bay. The DAPC compoplot (i.e., a barplot of membership probability) indicated that nearly all individuals could be assigned to their sampling site with high probability ( Figure S2).
To clarify relationships between major clusters of sampling sites, we reran the DAPC analysis on two subsets: all populations on the mainland and Manhattan, and all populations located on Long Island. Twelve principal components were retained for the mainland-Manhattan analysis, and the scatterplot of the first two discriminant functions revealed a major, central cluster that recapitulated the geography of the populations. However, three isolated, urban populations were distinct from this cluster (Fig. 3B): Central Park (CP), Inwood Hill Park (IP), and Van Cortland Park (VC). The Long Island analysis retained seven principal components, and confirmed that JB and FT are highly distinct populations, as well as the Ridgewood Reservoir (RR; Figure S3).
Both the heuristic analysis in fastSTRUCTURE (using the simple prior) and cross-validation in ADMIXTURE identified K = 2 as the most likely number of evolutionary clusters among the 23 white-footed mouse populations we sampled. CLUMPAK confirmed that individual assignment to the two clusters was highly correlated across the two methods (r = 0.93; Fig. 4A,B). One cluster (blue in Fig. 4A,B) contained all the NYC populations on Long Island, the two populations on Manhattan (CP and IP), and three other suburban/rural populations with atypically low genetic variation (CIE, MR, and SW). All other populations were assigned to the other cluster (orange in Fig. 4A,B), except for WW located in rural Long Island which was an admixture of the two clusters in almost equal proportions. When we reran the fastSTRUCTURE analysis for K = 1-4 using the more accurate logistic prior, the heuristic analysis identified K = 3 as the upper bound on the likely number of evolutionary clusters. Cross-validation in ADMIXTURE also identified K = 3-5 as only slightly worse than K = 2 ( Figure S4), so we created barplots for these numbers of evolutionary clusters. For K = 3, the blue cluster of urban Long Island, Manhattan, and outlier rural populations was maintained, but the other cluster was split into two largely based on location east or west of a northsouth axis (Fig. 4C). The fourth cluster in the K = 4 ADMIXTURE analysis included a new cluster (Fig. 4D) with the two Manhattan populations, a Bronx population (NYBG), and a few distant populations adjacent to or west of the Hudson River (CFP and HIP). The additional cluster for K = 5 included two suburban populations in relative proximity (CPV and LCC; Fig. 4E).

Landscape genomics
For the simple BEDASSLE model examining presence or absence in the city, the median a E: a D ratio was 7.14 and the 95% credible set was 6.90 to 7.36. The interpretation of this result is that being located in NYC has an impact of approximately 7 km of extra pairwise geographic distance on genetic differentiation. Comparison of posterior predictive samples for the simple and beta-binomial model con-firms that the latter is a much better fit to the data (Fig. 5A,B). For the beta-binomial model, the median a E: a D ratio was 0.006 and the 95% credible set was 0.0002 to 0.038. This model indicates that presence in NYC has virtually no additional impact over geographic distance on genetic differentiation. However, values of F k were elevated   Table 1. for most of the urban populations (range in medians from 0.04-0.25; mean = 0.12) relative to the suburban (range = 0.01-0.06; mean = 0.03) and rural populations (range = 0.001-0.11; mean = 0.03), with the exception of two rural populations that also had high values (CIE and MR; Fig. 5D). These F k values indicate that the urban populations deviated substantially more from global allele frequency estimates than the suburban and rural populations.
Pairwise resistance distances from the IBR model (Figure S5) were significantly associated with F ST , even after factoring out the effects of geographic distance (partial Mantel r = 0.521, 95% confidence interval = 0.438-0.621, P < 0.0001; Table S3). The simple IBE model using resistance distances did not converge after trying many different combinations of prior values. The beta-binomial model did converge, producing a median a E: a D ratio of 0.0001 and  Table 1 and are ordered geographically in the bar plots. The first five sites are on Long Island (AP-WW), the next two on Manhattan (CP and IP), the next 10 east of the Hudson River (VC-MH, organized from North to South), and the last three west of the Hudson River (HP-MR).
95% negligible impact on genetic differentiation compared to geographic distances, which contradicts the IBR results. Posterior predictive sampling indicated that this beta-binomial model performed moderately well, but systematically underpredicted observed F ST < 0.05 (Fig. 5C).

Discussion
This study is the first to our knowledge to examine the influence of urbanization on genomewide SNP variation in a city-dwelling species. We found that populations of white-footed mice along an urban-to-rural gradient exhibit a negative correlation between genomic variation and urbanization in and around their habitat. Populations within NYC also exhibited greater genetic differentiation from one another than pairs of populations in rural areas. IBR and IBE models based on urbanization explained a greater proportion of pairwise population differentiation overall than IBD by some metrics. NYC populations deviated more strongly from global mean allele frequencies than rural populations, indicating that urbanization has substantially altered the evolutionary trajectories of urban wildlife (Donihue and Lambert 2014).
Many recent studies have documented reduced migration and loss of heterozygosity at microsatellite loci among populations in urbanized landscapes (Gortat et al. 2014;Barr et al. 2015;Jha 2015). However, for logistical reasons the most variable microsatellite loci are often chosen for population genetic analysis, and thus may not represent unbiased samples of genomewide diversity (V€ ali et al. 2008). Heterozygosity measured using microsatellites and traits related to fitness are also often weakly correlated, even though there is a publication bias toward reporting only high heterozygosity-fitness correlations (Chapman et al. 2009). New approaches such as ddRAD-Seq used here produce genomewide SNP markers that are more appropriate for assessing genomewide variation in relation to ecological factors such as urbanization.
Population genomic variation is a key to understanding and predicting evolutionary responses to environmental transformation. While urbanization may not directly cause extinctions of most species, it is likely to decrease the evo- lutionary potential of populations. Results from native species in cities are few. Urban bobcats maintained variation at immune-linked loci due to balancing selection from disease pressures, even after a population bottleneck and population subdivision by freeways (Serieys et al. 2015). Blackbirds colonizing cities also exhibited polymorphisms in a candidate behavioral gene that was strongly associated with urban habitats (Mueller et al. 2013), indicating that functional variation is important for the evolutionary success of urban wildlife. We previously identified candidate genes that may be under selection in urban white-footed mice (Harris et al. 2013), but our results here indicate that these populations have lost as much as half of their genetic diversity compared to nearby rural populations. While candidate genes of large effect may be relevant to understanding some responses to landscape change, screening genomewide variation offers many advantages for measuring evolutionary potential. Most adaptive traits are polygenic and influenced by many loci of small effect. Information on genomewide variation will also capture cryptic variation and quantify the amount of standing variation available for responses to future change (Harrisson et al. 2014). Loss of standing variation will thus make it less likely that urban populations will be able to adapt to local conditions, or to global phenomena such as climate change (Franks et al. 2014). Mutation is likely too slow of a process for rapid recovery of evolutionary potential in fragmented urban populations. However, increasing connectivity in cities would immediately boost genetic variation if migration between genetically differentiated populations could be re-established.

Modeling urbanization and genomewide diversity
We found that percent impervious surface cover, and to a lesser extent human population size, estimated at 2-km buffers around study sites were highly correlated with levels of genomewide diversity. Gradient studies have predominated in urban ecology, but terms such as 'urban' and 'suburban' have been used in many different contexts without standardization (Magle et al. 2012). Study sites are often defined based on subjective criteria, or chosen simply to represent a linear geographic gradient regardless of the actual pattern of urbanization (Ramalho and Hobbs 2012). We advocate that future landscape genetics studies report percent impervious surface cover in and around study sites to facilitate comparisons between landscapes and species. Impervious surface and human population size are highly correlated, but humans may be present in large numbers outside of cities for recreation in protected areas (Monz et al. 2013). Impervious surface can be readily used to track urbanization over time, and has relevance to both terrestrial and aquatic systems (Walsh et al. 2005). Extensive impervious surface in the form of roads, parking lots, and buildings characterizes all cities (Nowak and Greenfield 2012). Roads in particular have well-characterized negative impacts on the connectivity of wildlife populations (Balkenhol and Waits 2009;Ben ıtez-L opez et al. 2010). Nearly 70% of global forest cover is now fragmented, and areas subject to urbanization and high-intensity agriculture are most severely affected (Haddad et al. 2015). Reporting measures of impervious surface cover will provide much needed standardization. We found that estimates for 2-km buffers were much better than 500-m buffers, although the spatial effects will likely vary for different species. In this case, 500 m was likely too small of a buffer to capture the extent of urbanization's influence on study sites, whereas 2 km was near the maximum buffer size we could use around many of our study sites and still retain statistical independence from other study sites.

Population structure and differentiation
Urban and rural populations were differentiated from each other, with many F ST > 0.10. Some populations within NYC had pairwise F ST as high as urban-rural pairs that were much more distant, indicating that isolation within the city is quite high. Higher F ST values reported here are similar to those reported for endangered beach mice (Austin et al. 2015) and Channel Island deer mice (Ozer et al. 2011) that recently experienced strong genetic drift from extirpations and translocations. The lack of IBD reported here, but moderate-to-strong genetic population differentiation, suggests that dispersal is limited by barriers and high landscape resistance rather than geographic distance. Thus, evolutionary clustering and IBE (Sexton et al. 2014;Wang and Bradburd 2014) are more appropriate models for understanding patterns of genomewide diversity among these populations. Discriminant analysis of principal components, and two model-based clustering analyses, sorted individuals into 23 clusters that largely recapitulated geographic patterns. We identified one major split between Long Island and mainland populations, which historical demographic modeling indicated is likely related to glacial retreat and ecological succession in the region (Harris et al. 2015b). The other striking pattern was that several urban populations were outliers in their genetic divergence from the major clusters of sampling sites. Previous clustering analyses using microsatellites also identified most NYC sites as distinct populations (Munshi-South and Kharchenko 2010). SNPs evolve more slowly than microsatellites, and there is likely still considerable ancestral variation in isolated urban populations. These NYC populations may also not have reached linkage and Hardy-Weinberg equilibrium at many SNP loci because not enough generations have elapsed since isolation, or the populations have not reached mutation-migration-drift equilibrium.

Landscape genomics
Our IBR and IBE analyses produced mixed results. Overall, IBR and IBE modeling indicated that urbanization drives genetic differentiation to a greater degree than geographic distance alone. Although the relative influence of IBE to IBD was modest in the BEDASSLE analyses, urban populations deviated to a much greater degree from global allele frequencies than suburban or rural populations. The most likely explanation for this deviation is substantial drift due to inbreeding or bottlenecks in isolated urban habitats. This scenario generally conforms to the strong structure observed among some of the urban populations. However, other factors that caused these populations to deviate from the BEDASSLE model cannot be ruled out, such as unsampled environmental variables (Bradburd et al. 2013).
IBR modeling in Circuitscape confirmed that variation in percent impervious surface is highly associated with variation in pairwise F ST , even after factoring out IBD. Partial Mantel tests have several known issues with false positives (Legendre and Fortin 2010), but we previously found that vegetation cover (typically the inverse of impervious surface in NYC) successfully described migration between populations using similar approaches (Munshi-South 2012). The IBE models did not identify the same clear association between impervious surface and genetic differentiation. As above, the strong deviation of urban populations from the global mean allele frequencies indicated that the IBE model may have underperformed. Resistance distances from Circuitscape are integrated over all possible paths between points on the landscape, and thus, BEDASSLE also counts geographic distance twice with unknown consequences when using resistance distances (G. Bradburd, personal communication).
IBE is an active area of inquiry that holds great promise for understanding the processes that generate genetic variation. However, currently available approaches are relatively new, and several caveats apply to their application (Wang and Bradburd 2014). This study is the first use of BEDAS-SLE to model the relationship between genetic differentiation and human modification of the environment. In addition to the model adequacy issues raised above, some of the populations analyzed here may not have reached migration-drift equilibrium, or were dominated by idiosyncratic environmental processes. Previous BEDAS-SLE analyses that detected stronger IBE patterns were conducted over much larger geographic scales among much more strongly differentiated populations, such as sky island birds (Manthey and Moyle 2015), lizards occupying a SE Asian archipelago separated by deep ocean trenches (Barley et al. 2015), and a widespread bird occupying much of the Amazon basin (Harvey and Brumfield 2015). These results suggest that the IBE model in BEDASSLE may currently be best suited to populations that are deeply diverged in both time and space.

Conclusions & future research
The results presented here demonstrate for the first time that urbanization is associated with a pervasive reduction in genomewide variation among animal populations. Peromyscus spp. Are increasingly important models for investigating natural variation (Bedford and Hoekstra 2015). Given unchecked urbanization, particularly in the eastern United States, it is likely that many white-footed mouse populations in metropolitan areas have experienced similar declines in standing genetic variation. This study examined only a single urbanization gradient. Replicate analyses on white-footed mice in other metropolitan areas, and studies on additional taxa that may be isolated in urban fragments, are necessary to robustly establish a pattern of declining genomewide variation in urbanizing landscapes.
Preliminary evidence indicates that some loci have experienced selective sweeps in urban white-footed mice (Harris et al. 2013), but the relative impacts of urbanization on historical demography and natural selection have not been fully disentangled for these populations. Using an expanded transcriptome dataset (Harris et al. 2015a), we recently identified signatures of selection in NYC populations using an approach that accounts for historical demographic patterns (S. E. Harris & J. Munshi-South, unpublished manuscript). We identified dozens of candidate loci under selection that are associated with metabolic and immune processes. These patterns may reflect changes in diet and biotic pressures (i.e., disease and inflammation) in cities. Thus, the loss of genomewide diversity documented here does not necessarily preclude local adaptation to highly altered, stressful urban environments.
Evolutionary potential of urban populations could be improved by restoring connectivity between urban forest patches. Enhanced gene flow between urban forests, as well as between urban and suburban areas, would increase overall genomewide diversity (although may simultaneously break up local adaptation). Habitat networks that promote gene flow in cities can be constructed from even small gardens and green spaces not explicitly dedicated to biodiversity (Goddard et al. 2010;Vergnes et al. 2012). Microevolutionary processes are rarely integrated into urban conservation due to their perceived complex nature and variation between taxa, but could be useful metrics for assessing landscape connectivity (Stockwell et al. 2003;Kinnison et al. 2007). Converting 'gray' infrastructure into 'green' networks could also simultaneously address conser-vation goals while delivering co-benefits to humans in the form of biodiversity experiences (Tanner et al. 2014). A rich literature indicates that humans benefit in myriad ways from access to nature (Fuller et al. 2007), but integrative approaches to urban wildlife conservation are a major area for future growth (Shwartz et al. 2014).

Supporting Information
Additional Supporting Information may be found in the online version of this article: Table S1. Correlation coefficients calculated between % impervious surface and human population sizes calculated at buffers around study sites of 500 m, 1 km, 1.5 km, and 2 km. Table S2. Summary population genomic statistics calculated for all nucleotide positions (variant and fixed). Table S3. Matrix of pairwise F ST (above diagonal) and great-circle geographic distance (km; below diagonal) calculated between all pairs of 23 populations. Figure S1. Cross-validation (i.e., a-score optimization) to identify the optimal number of principal components to retain for DAPC without overfitting4. Figure S2. Compoplot/bar plot result from DAPC analysis on all 23 populations. Figure S3. Scatterplot of first two discriminant functions from DAPC for populations on Long Island. Figure S4. Cross-validation of results from ADMIXTURE for K = 1 -12. Figure S5. Cumulative current map from isolation by resistance (IBR) modeling in Circuitscape. Lighter areas represent landscape cells with higher cumulative predicted current (i.e., higher movement).