Predicting the strength of urban-rural clines in a Mendelian polymorphism along a latitudinal gradient

Cities are emerging as models for addressing the fundamental question of whether populations evolve in parallel to similar environments. Here, we examine the environmental factors that drive parallel evolutionary urban-rural clines in a Mendelian trait — the cyanogenic antiherbivore defense of white clover (Trifolium repens). We sampled over 700 urban and rural clover populations across 16 cities along a latitudinal transect in eastern North America. In each population, we quantified the frequency of genotypes that produce hydrogen cyanide (HCN), and in a subset of the cities we estimated the frequency of the alleles at the two genes (CYP79D15 and Li) that epistatically interact to produce HCN. We then tested the hypothesis that winter environmental conditions cause the evolution of clines in HCN by comparing the strength of clines among cities located along a gradient of winter temperatures and frost exposure. Overall, half of the cities exhibited urban-rural clines in the frequency of HCN, whereby urban populations evolved lower HCN frequencies. The weakest clines in HCN occurred in cities with the lowest temperatures but greatest snowfall, supporting the hypothesis that snow buffers plants against winter frost and constrains the formation of clines. By contrast, the strongest clines occurred in the warmest cities where snow and frost are rare, suggesting that alternative selective agents are maintaining clines in warmer cities. Additionally, some clines were driven by evolution at only CYP79D15, consistent with stronger and more consistent selection on this locus than on Li. Together, our results demonstrate that both the agents and targets of selection vary across cities and highlight urban environments as large-scale models for disentangling the causes of parallel evolution in nature. Impact Summary Understanding whether independent populations evolve in the same way (i.e., in parallel) when subject to similar environments remains an important problem in evolutionary biology. Urban environments are a model for addressing the extent of parallel evolution in nature due to their convergent environments (e.g. heat islands, pollution, fragmentation), such that two distant cities are often more similar to one another than either is to nearby nonurban habitats. In this paper, we used white clover (Trifolium repens) as a model to study the drivers of parallel evolution in response to urbanization. We collected >11,000 plants from urban and rural habitats across 16 cities in eastern North America to examine how cities influence the evolution of a Mendelian polymorphism for an antiherbivore defense trait – hydrogen cyanide (HCN). This trait had previously been shown to exhibit adaptive evolution to winter temperature gradients at continental scales. Here we tested the hypothesis that winter environmental conditions cause changes in the frequency of HCN between urban and rural habitats. We found that half of all cities had lower frequency of HCN producing genotypes relative to rural habitats, demonstrating that cities drive parallel losses of HCN in eastern North America. We then used environmental data to understand why cities vary in the extent to which they drive reduction in HCN frequencies. The warmest cities showed the greatest reductions in HCN frequencies in urban habitats, while colder, snowier cities showed little change in HCN between urban and rural habitats. This suggests that snow weakens the strength of natural selection against HCN in cities. However, it additionally suggests alternative ecological or evolutionary mechanisms drive the strong differences in HCN between urban and rural habitats in the warmest cities. Overall, our work highlights urban environments as powerful, large-scale models for disentangling the causes of parallel and non-parallel evolution in nature.

populations evolved lower HCN frequencies. The weakest clines in HCN occurred in cities with 48 the lowest temperatures but greatest snowfall, supporting the hypothesis that snow buffers plants 49 against winter frost and constrains the formation of clines. By contrast, the strongest clines 50 occurred in the warmest cities where snow and frost are rare, suggesting that alternative selective 51 agents are maintaining clines in warmer cities. Additionally, some clines were driven by 52 evolution at only CYP79D15, consistent with stronger and more consistent selection on this locus 53 than on Li. Together, our results demonstrate that both the agents and targets of selection vary 54 across cities and highlight urban environments as large-scale models for disentangling the causes 55 of parallel evolution in nature. 56

Introduction 80
The extent to which populations adapt in parallel to similar environmental conditions remains a 81 fundamental problem in evolutionary biology ( correlational data suggest that reduced urban snow cover has led to the observed colder winter 113 ground temperatures in some cities relative to rural areas, which drives selection to reduce HCN 114 frequencies in cities . The absence of a cline in one of the four previously 115 studied cities was explained by high urban snow depth in both urban and rural locations, which 116 was hypothesized to insulate plants against the damaging effects of frost  Here, we combine sampling of over 11,000 white clover plants from 16 cities with 149 publicly available climate data to assess the environmental drivers underlying the evolution of 150 latitudinal and urban-rural clines in cyanogenesis across multiple cities in eastern North 151 America. We begin by assessing the environmental predictors of HCN frequencies along the 152 latitudinal gradient by asking: (1) What regional environmental factors predict mean HCN 153 frequencies within populations? Consistent with previous work (Daday 1954a, 1965 and Olsen 2012, 2013), we expected to observe lower HCN frequencies at more northern 155 latitudes due to lower winter temperatures. We then examine urban-rural clines in HCN across 156 16 cities to address the following questions: (2) How common are urban-rural clines among large 157 (> 2 million people) cities in eastern North America? (3) What regional environmental factors 158 predict the strength of clines in cyanogenesis? We predicted that we would observe the weakest 159 clines in cities with high minimum winter temperature (i.e., warm cities) and also in those with  (Table 1) using a standard CTAB-chloroform extraction method 264 (Agrawal et al. 2012). We chose these cities because they spanned the range of latitudes included 265 in our study, thus reducing potential impact of geographical variation on haplotype frequencies. 266 We included cities that varied in the presence (5 cities) and absence (2 cities) of clines in HCN. 267 We only extracted DNA from plants that were homozygous recessive at both loci (i.e. acac lili) 268 because these plants have at least one deletion haplotype at each locus. Each plant was amplified 269 with 6 different primer pairs (3 for each locus), designed to assay the approximate size of the 270 genomic deletion at each locus based on the presence/absence of PCR products (Kooyers and 271 Olsen 2014). Note that larger deletions are masked by smaller deletions when resolving 272 haplotypes on a gel, preventing us from estimating the true frequency of each deletion; we 273 therefore rely only the presence/absence of deletions in our analyses (see "Statistical analyses" 274 below). Using this approach, we were able to assign 88% and 78% of samples to previously 275 described Li (n = 4) and Ac (n = 2) deletion haplotypes, respectively. The remaining individuals 276 were either newly discovered haplotypes or individuals with intact Li and Ac genes, the latter 277 indicating either false negative phenotyping or false positive haplotyping assays (or the presence 278 of a silencer modifier locus). We focus our results on the haplotypes that aligned with those 279 previously described by Kooyers and Olsen (2014)  Administration's National Centers for Environmental Information 301 (https://www.ncdc.noaa.gov/cdo-web/datatools/selectlocation). Importantly, these are regional 302 environmental data obtained from a single weather station for each city located at the nearest 303 international airport; thus, these data represent city-level abiotic conditions and not the 304 conditions extracted for each population. 305 Some filtering and processing of the environmental data was required prior to 306 downstream analyses. First, we took the mean MWT (°C), MST (°C), AI, and aPET across all 307 populations within a city to estimate the city-level minimum winter temperature, maximum 308 summer temperature, aridity, and annual potential evapotranspiration, respectively. Next, we 309 calculated an alternative measure of aridity, the soil moisture deficit (SMD), as the difference 310 between Precip and mPET. This measure of aridity can more reliably estimate regional aridity in 311 cases where both precipitation and PET are low (Thompson et al. 2014). We calculated SMD for 312 all months from May to September to represent the plant growing season, and again took the 313 mean of these measurements across all populations as our measure of city-level SMD. Finally, 314 we used NOAA's weather station data to estimate regional snow depth (cm), snowfall (cm), and 315 the number of days below zero with no snow cover, a measure of frost exposure that has been 316 previously associated with urban-rural clines in HCN ( ). To retrieve these 317 estimates, we first filtered the weather data to remove missing data and only included 318 observations from January and February as these are the coldest winter months in eastern North 319 America. We additionally removed observations from years where data was unavailable for both 320 January and February and eliminated months with fewer than 10 days of weather data. Following 321 filtering, we retained 31,005 weather observations, with a mean of 1937 observations per city 322 (Table S1). From these data, we calculated the mean snow depth, mean snowfall, and the number 323 of days below 0 °C with no snow cover (i.e. snow depth of 0 cm) and took the mean of these 324 measurements across all years as our estimates of regional winter conditions. 325 326

Statistical analyses 327 328
For brevity, we only briefly describe the statistical procedures used throughout the paper; a 329 detailed description of all analyses can be found in the supplementary materials (text S1: 330 "Detailed statistical analyses"). We first tested whether, on average, cities varied in mean HCN 331 frequencies and whether urbanization influenced HCN frequencies. To do this, we fit an 332 ANOVA using type III SS with within-population HCN frequencies as the response variable and 333 city, standardized distance to the urban center and the city × distance interaction as predictors. 334 We used distance to the urban center as a measure of urbanization as this is highly correlated 335 Upon confirming that cities varied significantly in the strength of urban-rural phenotypic 355 clines using ANOVA, we used a similar approach to that described above to examine the 356 environmental predictors of the strength of urban-rural phenotypic clines in HCN. For each city, 357 we first fit a linear regression with the proportion of cyanogenic plants within each population as 358 the response variable and standardized distance to the urban center as the sole predictor. Note 359 that cities that showed significant changes in HCN frequency with distance (Table 1)  MWT, and MST, all of which were highly correlated and on their own significantly predicted 366 variation in the strength of clines (see text S1). Cities with low values along PC1 slope experience 367 little snow, higher minimum winter temperature, and higher maximum summer temperature. 368 We explored whether clines were present at each of the two loci underlying HCN. This 369 was done by fitting linear models in which the allele frequencies for the Ac and Li loci were 370 treated individually as a response variable in separate analyses, which was regressed against 371 standardized distance to the urban core as a predictor. Finally, to examine variation in deletion 372 haplotypes across urban and rural habitats, we used the raw counts of deletion haplotypes at each 373 locus to calculate Simpson's diversity index for deletions in urban and rural habitats for each 374 city, which accounts for both presence/absence and abundance of haplotypes in each population 375 when estimating diversity. We fit Simpson's diversity index as the response variable in a linear 376 model with habitat type (i.e. urban vs. rural) as the sole predictor such that a significant effect of 377 habitat suggested differences in deletion haplotype diversity among urban and rural habitats. All 378 analyses were performed in R v. 3.6.0 (R Core Team 2019  Fig. 1, Fig. 2), with the highest frequencies at 386 the most southern latitudes (Fig. S2). The number of days < 0°C with no snow cover and PC1 HCN 387 together accounted for 94.1% of the variation in mean HCN frequencies among cities (Table S4). 388 Specifically, HCN frequencies decreased by 1.5% for every additional day below 0 °C with no 389 snow cover (β = −0.015, ‫ݖ‬ = 8.77, P < 0.001, Table S5, Fig. 3a), and by 6.4% for every unit 390 increase along PC1 HCN (β = −0.064, ‫ݖ‬ = 7.0, P < 0.001, Table S5, Fig. 3b), suggesting HCN 391 frequencies decrease in colder, wetter environments that get more snow. Annual aridity index 392 was not significant predictor of mean HCN frequencies in our model (P = 0.36, Table S5) Fig. 4), implying that the strongest clines occurred in the warmest environments and the 407 weakest clines occurred in regions of low temperature and high snowfall/depth. 408

409
Clines at loci underlying HCN and deletion haplotype frequencies 410

411
Of the 16 cities surveyed in this study, we assayed the genotype at the two underlying genes in 412 nine cities, six of which showed significant clines in HCN (Table 1). Of the six cities with 413 significant clines, three (Atlanta, Jacksonville, Toronto) showed significant linear clines at both 414 Ac and Li, three (New York, Norfolk, Washington) showed significant linear clines only at Ac, 415 whereas no cities had a significant linear cline only at Li. Significant clines at Ac and Li were 416 always in the same direction as clines in HCN (i.e., lower frequencies of the dominant alleles Ac 417 and Li in urban populations). None of the three cities that lacked a cline in HCN showed a cline 418 at either underlying gene. 419 All deletion haplotypes at Ac and Li identified previously in this system were found in 420 each city, and their relative frequencies did not vary for either locus between urban and rural 421 populations ( fig. S3 and S4). Based on this earlier work, we predicted the weakest clines in cities with high mean winter 489 temperature or high snowfall due to the absence of frost-mediated selection against HCN in these 490 cities. Consistent with this prediction, cities with high snowfall (i.e. high values along PC1 slope ) 491 had the weakest clines, potentially due to snow buffering plants from frigid temperatures and 492 weakening frost-mediated selection against HCN-producing genotypes. However, the strongest 493 clines occurred in the warmest cities (i.e. low values along PC1 slope ), contrary to our predictions. 494 Importantly, this provides no information about whether lower winter temperatures in cities is an 495 important selective agent; urban frost may still be important in frost-prone cities with shallower 496 urban-rural gradients in snow depth, as suggested by currently available data (Thompson et al. 497 2016). However, this does suggest that alternative mechanisms must drive the evolution of clines 498 in warmer cities where frost is uncommon. Indeed Atlanta, which gets little snow (mean snowfall 499 = 0.07 cm/year) and is relatively warm throughout the winter months (average minimum winter 500 temperature = −0.43 °C) contained the strongest phenotypic cline in HCN observed in any city to 501 date (β = 0.36, Table 1). 502 The stronger clines in warmer cities suggests that regional temperature modulates the 503 strength of selection along urban-rural clines in some cities. Given that cyanogenesis functions as biotic interactions more generally, in these cities is needed. Alternatively, recent experimental 513 data suggests a cost to producing the metabolic components of HCN in stressful environments, 514 especially for cyanogenic glucosides (Kooyers et al. 2018). If environmental stressors are 515 stronger in cities (e.g. frost, salinity, pollution, etc.), costs involved in the production of the 516 metabolic components of HCN may result in selection against these genes and lower frequencies 517 in urban populations. Consistent with this hypothesis, some urban-rural clines in HCN were 518 mirrored only by clines at Ac, suggesting selection may be acting on this locus due to its greater 519 metabolic costs (Kooyers et al. 2018). 520 521

Conclusions and future directions 522
We have demonstrated the repeated evolution of urban-rural cyanogenesis clines across eastern 523 North American cities. A major goal for future work in this system entails distinguishing among 524 the targets of selection across replicate clines (i.e., HCN vs. Ac. vs. Li) and disentangling the 525 numerous ecological (e.g., environmental factors) and evolutionary (e.g., selection, drift) drivers 526 of (non)parallel responses of HCN to urbanization. This work will require quantifying a broad 527 array of environmental factors at a finer-scale (e.g. population-level) in cities spanning all 528 continents. White clover is a natural model for understanding how cities drive parallel evolution 529 on a global scale due to its ubiquity across the globe and ease of sampling and phenotyping. Such 530 work would advance our understanding of how cities influence the evolution of populations in 531 our own backyards, and further cement the utility of cities as useful models for understanding the 532 causes and consequences of parallel evolution in nature. 533 534 Tables   669  670  Table 1: Beta coefficients (i.e. slope) and P-values from linear models testing the change in the frequency of HCN, Ac, or Li with 671 increasing distance (standardized) from the urban center for each of 16 cities. Also shown are the total number of populations and 672 plants sampled in each city, and whether deletion haplotypes were identified in urban and rural population of that city. Bolded terms 673 represent linear clines that were significant at P < 0.05. Grey boxes represent cities where we did not quantify the frequency at the 674 genes underlying HCN. Significance of β values: *P < 0.05; **P < 0.01, ***P < 0.001 676 ‡ Cities were better fit by a quadratic model (see online supplementary text: "Assessing the fit on non-linear clines") and showed a significant non-linear change 677 in the frequency of HCN, Ac, or Li with increasing distance from the urban center.

678
† Tampa was excluded from the analysis testing the environmental predictors of the strength of clines since it was functionally fixed for HCN (Fig. 1). 679