Response to Perrier and Charmantier: On the importance of time scales when studying adaptive evolution

Inferring adaptation and evolutionary change by combining data from field studies and genomics is an exciting new area in evolutionary biology but also presents challenges. These challenges are particularly acute when the focal trait has a polygenic architecture, because many long‐term field studies are sample‐size‐limited compared to studies of humans and model organisms, making the detection of loci that contribute to trait variation difficult. In a recent comment, Perrier and Charmantier (2018); hereafter P&C, highlight these issues and draw attention to several analyses described in our recent publication (Bosse et al. 2017) on the evolution of longer bill length in UK populations of the great tit (Parus major). While we support the overall message of P&C – that caution should be exercised when making inferences about long‐term evolutionary trends from shorter ecological time series – we also address some of the specific criticisms that P&C raised about the analyses described in Bosse et al. (2017). P&C's comments can broadly be split into two sets of queries. The first considers how phenotypic variation is distributed in space and time. The second explores how signatures of selective sweeps can be sensitive to local (in the genomic sense) variation in recombination rate.

Inferring adaptation and evolutionary change by combining data from field studies and genomics is an exciting new area in evolutionary biology but also presents challenges. These challenges are particularly acute when the focal trait has a polygenic architecture, because many long-term field studies are sample-size-limited compared to studies of humans and model organisms, making the detection of loci that contribute to trait variation difficult. In a recent comment, Perrier and Charmantier (2018); hereafter P&C, highlight these issues and draw attention to several analyses described in our recent publication (Bosse et al. 2017) on the evolution of longer bill length in UK populations of the great tit (Parus major). While we support the overall message of P&C -that caution should be exercised when making inferences about long-term evolutionary trends from shorter ecological time series -we also address some of the specific criticisms that P&C raised about the analyses described in Bosse et al. (2017).
P&C's comments can broadly be split into two sets of queries. The first considers how phenotypic variation is distributed in space and time. The second explores how signatures of selective * These authors contributed equally to this article. sweeps can be sensitive to local (in the genomic sense) variation in recombination rate.

TIME SERIES OF BILL LENGTH
In Bosse et al. (2017) we presented a number of analyses supporting the contention that bill length in UK populations had been under positive selection (see below), one of which was the observation that, over a 25-year time series, bill length had significantly increased. We performed a linear regression of bill length on year of birth--in hindsight a mixed model with year fitted both as a fixed effect (to detect any temporal trend) and as a random effect (to account for annual differences in bill length variation) would have been a better choice of model. P&C have investigated the relationship in more detail. After inspection of Fig. 4B of Bosse et al. (2017), P&C observed a downturn in the mean bill length-year relationship, which they subsequently analyzed more formally by a breakpoint analysis, aimed at identifying whether there was a rapid change in the mean bill length temporal trend. P&C showed that the significant increase in the bill length linear model was dependent on the inclusion of the first five years of the dataset (1982)(1983)(1984)(1985)(1986) in the analysis; if these years are excluded from the model, bill length has significantly decreased. P&C then argued that the data cannot be used to support evidence of contemporary  evolution of bill length, and especially that the data cannot be used to support the idea that bird feeder use was driving a contemporary increase in bill length. They also suggested that further research should investigate whether an apparent decline in bill length, starting in 1986, could be caused by a genetic change, phenotypic plasticity, or a change in the measuring process.
We do not dispute that there is an apparent decline in bill length during the latter part of the time series-that pattern holds even when a mixed model with year included as a random effect is fitted (see Supplementary Information). However, we suggest that the trend should be interpreted cautiously. The authors cite the work of Rosemary and Peter Grant as their inspiration for searching for sudden changes in the trajectory of bill morphology in the great tit time series, but that work was motivated by selection witnessed after a sudden climatic event; an El Nino event in 1982-1983 causing exceptionally heavy rainfall (Grant and Grant 1993). In the analysis in Bosse et al. 2017 however, there is no a priori reason to expect bill length to have started to change after 1986. Instead, P&C's analysis was motivated by a posthoc inspection of the figure in Bosse et al. 2017. Moreover, the conclusion that bill length declined after 1986 is sensitive to the dataset used. In the supplementary material of Bosse et al. (2017) we described a longer  and larger (9980 records, from 5145 birds) dataset, comprising birds measured throughout the year; the series described in the main text of Bosse et al. (2017) and reanalyzed by P&C was a subset of 2489 birds that were measured in May or June. Using a linear-mixed model framework, the larger dataset also shows an increase in mean bill length (Table S3 in Bosse et al. 2017). Adopting a similar approach used by P&C, we searched the larger dataset, using the R package segmented (Muggeo 2008), for a breakpoint where bill length switched from an increasing trend to a declining one. The larger dataset also appears to be consistent with bill length increasing over the first half of the time series, followed by a decline in the second half ( Fig. 1; null hypothesis of no change in slope has P < 0.0001). However, the decline in bill length starts between 1995 and 1996. Therefore, the period where bill length was increasing spans an 18-year period rather than a five year one, and the decline may have started in 1995 rather than 1986. Great tit bill size, especially length, shows strong fluctuations between years and seasons as a response to diet (Gosler 1987). Thus, we think it is unwise to place too much emphasis on the possibility of an evolutionary or plastic response to an unknown environmental change occurring in 1986 (or any other year).
There is additional evidence that evolution of longer bills has been occurring over a longer, yet nonetheless recent, period in the UK population. In Fig Bill length measurements are corrected for sex, age of bird, month measured, and whether the bird was a resident or immigrant to Wytham Woods. All birds were measured by AGG.
a sample of 291 museum specimens of great tits to describe a difference in bill length between UK and mainland European populations. P&C show that there is no temporal variation in bill length in the museum samples collected in the UK between 1850 and 2007. However, given the large within-year variation in billlength (Gosler 1987), the power to detect a trend is very low in this sample (n = 177) in comparison to the contemporary data, which is why we declined to test for it in Bosse et al. (2017). Ideally, we would compare genomic breeding values from museum and contemporary birds to evaluate whether there was an underlying genetic change in bill length, but genomic data are not currently available from museum specimens.
In Bosse et al. (2017) we were careful not to make the argument that the recent Wytham time series implicated the role of bird feeders in the evolution of longer bills. Rather, the increase in bill length was one piece of evidence for the relatively recent evolution of longer bills, along with: (i) signatures of selection being more prevalent at genes associated with craniofacial morphology and palate development; (ii) loci affecting bill length being found in selective sweep regions more often than expected by chance; (iii) UK populations exhibiting longer bills than other European populations; (iv) alleles causing longer bills being associated with greater fitness; and (v) the use of haplotype-based tests of selective sweeps that are sensitive to relatively recent selection. comparison was between UK and mainland European birds, with UK birds having bills approximately 0.4 mm longer than European birds. P&C compare the UK birds to each of the mainland European countries (n = 11 countries). UK birds had longer bills than birds from these other countries, but were only significantly longer than birds from three countries (the Netherlands, France, and Italy). However, the sample sizes for the mainland countries ranged from two to 33, so the power to detect differences from the UK population was very low. P&C argue that because the bill length across different countries is not bimodal (i.e., with UK birds all in one distribution and birds from all other populations in a second distribution), that is evidence against recent evolution of bill length in the United Kingdom. This argument is incorrect. A highly polygenic trait is likely to exhibit a continuous distribution between populations, even if one (or more) population is evolving by natural selection toward a larger mean, simply due to the effects of genetic drift and environmental variation being unequal in all populations. Ideally, the effects of the environment on spatial variation in bill length would be removed or reduced by comparing the genetic component of bill length between populations. In we have genotyped birds from >20 European populations with the same SNP chip (Kim et al. 2018) as the one used in Bosse et al. (2017). Based on those data, here we ask whether the loci identified as being under selection in the United Kingdom could cause UK populations to have longer bills than mainland European populations. By using genotyped and phenotyped birds from Wytham as a training population, we performed genomic prediction of bill length in the other European populations. The results (Fig. 2) show that loci under selection cause UK populations to have greater genomic estimated breeding values (GEBVs), that is genetically longer bills, than most other European populations. Note that UK populations additional to the one at Wytham Woods are included in this dataset. It is also striking that a Finnish population has larger GEBVs than other European populations, suggesting that this population may also be experiencing selection for longer bills. All other mainland European populations have lower mean GEBVs (often significantly lower) than UK populations. Notably, the museum birds from Finland had longer bills than the other European mainland populations (see Fig. 4 of P&C), although the sample size is very small (n = 4). We interpret these results cautiously, because the accuracy of genomic prediction can depend on population structure, LD pattern (between markers and unknown causal loci), trait architecture, and genotype by environment interactions. In particular, a comparison of samples that are temporally separated by many generations may require recalibration of the relationship between SNP genotypes and phenotypic variation, as the marker-causal loci LD relationships may be altered by recombination (Habier et al. 2009). However, crosspopulation polygenic scores tend to be most reliable when population differentiation is low, as is the case here (Bulik-Sullivan et al. 2015;Berg et al. 2018).

Genomic Evidence of Selection at COL4A5
We now switch our focus to the genomic arguments made by P&C. In Bosse et al. (2017), much of the focus on genomic signatures of selection was on the COL4A5 locus. This region showed evidence of a selective sweep in the UK population, and was also one of the region's most strongly associated with bill length variation in the United Kingdom, with the selected haplotype associated with longer bills. Further, the positively selected haplotype was associated with an increased production of fledglings and an increased number of visits to feeders. It is important to note, though, that the observation that selected regions explained more variation in bill length than expected by chance was not dependent on the COL4A5 region. Neither was the observation that the Gene Ontology (GO) term most significantly overrepresented among the selected loci was palate development. Thus, the overall conclusion that bill length has been a target of selection is not critically dependent on COL4A5 being a contributor to the polygenic architecture of bill length variation.
P&C convincingly show that COL4A5 is in a region of low recombination. They point out, highlighting recent evidence (Burri 2017;Comeron 2017), that background selection in regions of low recombination can give spurious evidence of selective sweeps in F ST outlier based tests (the eigenGWAS test used by Bosse et al. 2017 falls into this category). P&C use simulations to show that eigenGWAS tests could easily give a signal that looks like positive selection, when instead linked background selection is acting within populations. They argue that the apparent adaptive evolution at COL4A5 could be an artefact of background selection in a region with limited recombination; the argument is given further credibility by the observation that the same region shows a similar signal in collared flycatchers, where recombination is also very limited (Burri et al. 2015). P&C raise an important and valid point; however, for reasons we outline below, we remain confident that it has been one of the loci involved in adaptive evolution of longer bills in UK populations.
The evidence for a selective sweep at COL4A5 came not only from eigenGWAS and F ST outlier locus analyses, which are sensitive to local recombination rates, but also from an Rsb test (Tang et al. 2007), that compares haplotype homozygosity between two populations (see Fig. S8 of Bosse et al. 2017). Rsb and similar test statistics such as the cross-population extended hap-lotype homozygosity (XP-EHH) are considered robust to local recombination rate, provided the recombination landscapes are similar in the two populations being compared (Tang et al. 2007;Enard et al. 2014). We have previously shown through linkage maps independently constructed in the United Kingdom and the Netherlands populations that recombination landscapes are highly conserved between great tit populations (van Oers et al. 2014). Furthermore, if P&C were correct and the evidence of selection at COL4A5 in the United Kingdom was an artefact caused by background selection in a region of low recombination, then there is a clear prediction: F ST or EigenGWAS tests as well as haplotype homozygosity tests (Rsb) would show strong signatures of selection between the Netherlands population and other (non-UK) populations. We find no evidence for such a pattern in either Rsb or Fst (Fig. 3). Thus, the data are more consistent with positive selection acting at COL4A5 in the United Kingdom rather than spurious signatures of a sweep caused by background selection in a low recombining region operating in all great tit populations. Finally, there is no reason to think that the association between SNPs in and around COL4A5 and bill length is an artefact caused by low local recombination rates. Although regions with low recombination rates (and therefore high linkage disequilibrium) will have enhanced power to detect genuine causal variants in a GWAS (Visscher et al. 2017), there is no reason why they should be more prone to false positive associations. We note that the region has not been associated with other traits studied in the Wytham population in previous GWAS studies (Santure et al. 2013;Santure et al. 2015;Kim et al. 2018).
In summary, we are in broad agreement with the general points made by P&C. Pinning down the time scale of adaptive evolution remains a challenging problem, especially when there have been numerous genomic regions driving an evolutionary response to selection. We hope to make progress with understanding the evolution of great tit bill length by genomic analysis of museum samples and other European populations. We remain cautious about supplementary feeders being the cause of longer bill length evolution and confirming (or ruling out) that explanation will require careful experimentation and ecological study, alongside accurate dating of when longer bill length evolution started. Despite the caveats identified by P&C, the evidence that great tit bill length has evolved under recent positive selection in UK populations clearly remains, as does the evidence that COL4A5 is likely to be one of many loci involved in this adaptation. One clear message arising from P&C and this paper is that attempts to make inference about polygenic adaptation from any single line of evidence are likely to be inconclusive. Instead, careful formulation and testing of hypotheses that incorporate different types of data are more likely to prove incisive. More generally, combining multiple interdisciplinary approaches is the key to understanding the mechanisms involved in local adaptation.

AUTHOR CONTRIBUTIONS MB,
LGS and JS performed the analyses and wrote the response. All authors approved the response. Author contributions to the original work that motivated Perrier and Charmantier's comment and our response are described in Bosse et al. (2017).

DATA ARCHIVING
Data described in this article were originally reported in Bosse et al. 2017, where they were deposited on Dryad with accession number https://doi.org/10.5061/dryad.p03j0.