Whole‐genome patterns of linkage disequilibrium across flycatcher populations clarify the causes and consequences of fine‐scale recombination rate variation in birds

Recombination rate is heterogeneous across the genome of various species and so are genetic diversity and differentiation as a consequence of linked selection. However, we still lack a clear picture of the underlying mechanisms for regulating recombination. Here we estimated fine‐scale population recombination rate based on the patterns of linkage disequilibrium across the genomes of multiple populations of two closely related flycatcher species (Ficedula albicollis and F. hypoleuca). This revealed an overall conservation of the recombination landscape between these species at the scale of 200 kb, but we also identified differences in the local rate of recombination despite their recent divergence (<1 million years). Genetic diversity and differentiation were associated with recombination rate in a lineage‐specific manner, indicating differences in the extent of linked selection between species. We detected 400–3,085 recombination hotspots per population. Location of hotspots was conserved between species, but the intensity of hotspot activity varied between species. Recombination hotspots were primarily associated with CpG islands (CGIs), regardless of whether CGIs were at promoter regions or away from genes. Recombination hotspots were also associated with specific transposable elements (TEs), but this association appears indirect due to shared preferences of the transposition machinery and the recombination machinery for accessible open chromatin regions. Our results suggest that CGIs are a major determinant of the localization of recombination hotspots, and we propose that both the distribution of TEs and fine‐scale variation in recombination rate may be associated with the evolution of the epigenetic landscape.

In the context of population genetics, meiotic recombination contributes to patterns of genomic diversity by shuffling alleles on different haplotypes. This can be a double-edged sword because recombination can not only create novel and potentially advantageous combinations of alleles but can also break up existing advantageous ones. In addition, recombination influences the efficacy of selection at neighbouring sites along a chromosome (i.e., Hill-Robertson interference) (Hill & Robertson, 1966). Moreover, recombination together with the density of targets for selection influences genomewide variation in genetic diversity via the process of linked selection (Cutter & Payseur, 2013). Both selective sweeps and background selection reduce genetic variation at neutral sites linked to those under natural selection, which manifests as a locally reduced effective population size (N e ). This association between the rate of recombination and local N e in turn leads to locally accelerated lineage sorting during the process of speciation in regions of low recombination (Charlesworth, 2009). Therefore, meiotic recombination has been suggested to play an important role in shaping the differentiation landscape and linked selection can be seen as a null model in explaining the existence of islands of differentiation (Cruickshank & Hahn, 2014;. Recombination can also affect local nucleotide composition by the process called GC-biased gene conversion (gBGC) (Duret & Galtier, 2009). gBGC is associated with meiotic recombination and leads to the preferential fixation of "strong" bases (G and C) over "weak" bases (A and T). Although this bias is not very strong, gBGC gradually establishes a positive correlation between GC content and recombination rate over time (Backstr€ om et al., 2010;Birdsell, 2002;Duret & Arndt, 2008). Recombination may further impact genomic stability by introducing point mutations and structural rearrangements via nonallelic homologous recombination (Sasaki, Lange, & Keeney, 2010;Strathern, Shafer, & McGill, 1995). Moreover, distribution of transposable elements (TEs) is associated with variation in recombination rate due to effective elimination of deleterious TE insertions at high recombination regions and/or suppression of recombination via epigenetic regulation of TE activities (Dolgin & Charlesworth, 2008;Rizzon, Marais, Gouy, & Biemont, 2002). However, several studies suggested a positive association between recombination and insertions of several TE families because of their integration site preferences (Baller, Gao, & Voytas, 2011;Liu et al., 2009;Yoshida et al., 2017), indicating complex interaction between recombination and TE distribution.
Studies in rodents and primates have shown that recombination is genetically regulated by localizing recombination-initiating DNA double-strand breaks (DSBs) to small regions of the genome, known as "recombination hotspots," where recombination rate is 100-to 1,000fold higher than the genomic average (Baudat, Imai, & de Massy, 2013;. These recombination hotspots are usually less than a few kilobases (kb) long and contain degenerate DNA sequence motifs of 7-13 nucleotides, which are recognized by the zinc finger protein PR domain-containing 9 (PRDM9) (Myers et al., 2010;Parvanov, Petkov, & Paigen, 2010). However, many nonmammalian species, such as birds, plants and yeasts, lack PRDM9 but still show highly variable rates of recombination across the genome with distinct recombination hotspots (Choi et al., 2013;Pan et al., 2011;Singhal et al., 2015;Smeds, Mugal, Qvarnstr€ om, & Ellegren, 2016). In addition, dogs, where PRDM9 was pseudogenized, still show hotspots (Auton et al., 2013;Axelsson et al., 2012;Berglund, Quilez, Arndt, & Webster, 2015). In these species, recombination hotspots tend to coincide with genomic features, such as regions in the proximity of transcription start sites (TSSs) and transcription termination sites (TTSs). It has been suggested that colocalization of recombination hotspots with transcription initiation contributes to an evolutionary stable recombination landscape (Axelsson et al., 2012;Lam & Keeney, 2015;Singhal et al., 2015). However, it is not clear whether transcriptional activity per se or underlying genomic features that are associated with promoter regions, such as CpG islands (CGIs), are responsible for this colocalization.
Studies of ecological adaptation often use multiple pairs of populations in ecologically distinct habitats (e.g., Fraser, Kunstner, Reznick, Dreyer, & Weigel, 2015;Hohenlohe et al., 2010;Westram et al., 2014), and shared patterns of elevated genetic differentiation along a chromosome are often considered as a signature of parallel evolution.
However, because of the potential influence of recombination rate variation on several population genetics statistics (e.g., nucleotide diversity [p] and genetic differentiation [F ST ]) via the effect of linked selection, conserved patterns of recombination rate variation can lead to concerted evolution of diversity and differentiation landscapes independently of ecological selection Roesti, Hendry, Salzburger, & Berner, 2012;Van Doren et al., 2017;Vijay et al., 2017).
On the other hand, changes of the recombination landscape can affect the pattern of genetic diversity in a given lineage, mimicking evidence of lineage-specific adaptation. For proper inferences of patterns of genetic diversity, it is therefore important to characterize variation in recombination rate for each population or species under study.
Using linkage map data, we have previously documented heterogeneity in the rate of recombination across the genome of collared flycatcher (Ficedula albicollis) at the resolution of 200-kb windows . The precision of pedigree-based recombination rate estimates (hereafter referred to as pedigree recombination rate) is dependent on the number of meiotic crossovers in a given pedigree, and, as a result, there were regions with insufficient resolution in our previous study, particularly at the centre of chromosomes where crossovers rarely occurred. To gain a better insight into variation in recombination rate in avian genomes, we here estimate the population-scaled recombination rate (q) based on whole-genome patterns of linkage disequilibrium (hereafter referred to as LD recombination rate) in multiple populations of two closely related species of Old World flycatchers, collared flycatcher and pied flycatcher (F. hypoleuca). These two species are estimated to have diverged <1 million years ago with speciation essentially completed, although occasional hybridization still occur (Nadachowska-Brzyska, Burri, Smeds, & Ellegren, 2016;Nadachowska-Brzyska et al., 2013;Nater, Burri, Kawakami, Smeds, & Ellegren, 2015;Wiley, Qvarnstrom, Andersson, Borge, & Saetre, 2009 , 2012;Chan, Jenkins, & Song, 2012). In contrast, the pedigree recombination map reflects the per-generation recombination rate observed in a given pedigree. Therefore, LD recombination rate estimates of the two flycatchers do not only allow us to characterize variation in recombination rate and its regulation at much higher resolution than before but also to analyse the conservation of the recombination landscape and the location of hotspots among populations and species at multiple evolutionary timescales. As the heterogeneous landscape of differentiation in Ficedula flycatchers seems mainly to be the result of the extent of linked selection , species-specific estimates of recombination rate further make it possible to study the impact of changes in the recombination landscape on differentiation.

| Samples and SNP selection
We used whole-genome sequences of 77 collared flycatchers from  : u006, u012, u016, u017 and u117). Detailed information about whole-genome sequencing and sequence analysis is described in . Briefly, sequence reads obtained by Illumina HiSeq2000 were mapped to a repeat-masked version of the  (Scheet & Stephens, 2006) using default parameters except for the maximum number of EM iteration that was set to 50. The subpopulation option was used to specify four populations within species. Five, 10 and 15 clusters of haplotypes were considered for determining the number of haplotype clusters.

| Estimation of population recombination rate
LD recombination rate q (q = 4N e r, where N e denotes the effective population size, and r denotes the recombination rate) was estimated using LDhelmet (Chan et al., 2012). This program uses a coalescent- We ran five independent rjMCMC simulations for each population with each run for 2,000,000 iterations after 200,000 iterations of burn-in. Estimated LD recombination rate was averaged over the five runs. Based on the results of simulation analysis (Text S1, Figure S1 Supporting Information), we used a block penalty of 10 to minimize overfit while maintaining resolution of variation in recombination rate.
To characterize variation in recombination rate across genomes, we calculated weighted average of the LD recombination rate in To evaluate sensitivity and specificity of detecting recombination hotspots, we used the program MaCS (Chen, Marjoram, & Wall, 2009) to simulate sequences of 200 kb with various combinations of effective population size, background recombination rate, the number of haplotypes and hotspot intensity and width (Text S1, Supporting Information). Based on the results of these simulations (Table S1, Supporting Information), recombination hotspots were called if given pairs of SNPs had at least 10 times higher recombination rate than the mean recombination rate of each surrounding 200-kb window, and the width was at least 750 bp from the beginning to the end of hotspots.
Recombination hotspots were not called in windows with low SNP density (<1 SNP/kb). LDhelmet occasionally infers narrow spikes with unusually high q, which is likely to be a spurious artefact of the rjMCMC procedure (Chan et al., 2012). To distinguish between spurious peaks and true recombination hotspots, we only called recombination hotspots that were detected in all five independent rjMCMC runs.
Based on the expected allele frequency shifts at recombination hot-

| Motif identification
We used HOMER 4.2 (Heinz et al., 2010) and MEME suite 4.11.1 (Bailey, Johnson, Grant, & Noble, 2015) to discover sequence motifs associated with recombination hotspots. In the HOMER de novo motif enrichment analysis, we searched for 6-to 16-bp sequence motifs enriched at 1-kb regions containing recombination hotspots at the centre relative to 1-kb "cold" regions that were at least 10 kb away from neighbouring hotspots and had a GC content similar to the one in hotspots (<2% difference). In the MEME motif discovery analysis, 6to 20-bp sequence motifs were searched at 1-kb hotspot regions relative to "cold" regions using discriminative motif discovery mode. Only hotspots shared among populations of collared flycatcher were used.

| Annotation of the recombination landscape
To investigate the distribution of hotspots with respect to annota- To estimate LD recombination rate in TEs and compare it with the approximate time since transposition of TE copies, we re-estimated LD recombination rate using repeat unmasked data. LD recombination rate was calculated for each copy of TEs using all pairs of adjacent SNPs within the TE copies. We chose 72 LINE subfamilies and 127 LTR retrotransposon subfamilies, which had more than 50 copies within each subfamily in the collared flycatcher genome assembly. Consensus sequences of subfamilies of these TEs were constructed following Smeds et al. (2015) and merged with a library from Vijay et al. (2017) (Data S1, Supporting Information).
CpG-corrected Kimura 2-parameter (K2P) genetic distances were calculated between TE copies and the consensus sequences of each subfamily, and these distances were used as a proxy for the time since transposition or "age of TEs" (Kapusta & Suh, 2017). Correlation between mean K2P genetic distance and LD recombination rate was evaluated separately for LINEs and LTR retrotransposons using all copies within LINEs and LTR retrotransposons.
To test whether LD recombination rate was higher in TEs with CGIs than those without CGIs, LINE and LTR retrotransposons were subdivided into four classes: (i) TEs whose ancestral consensus sequences contained CGIs and were identified within 400 bp from the nearest CGIs in the present-day reference genome, (ii) TEs whose ancestral consensus sequences did not contain CGIs and were identified within 400 bp from the nearest CGIs in the presentday reference genome, (iii) TEs whose ancestral consensus sequences contained CGIs and were inserted >400 bp away from CGIs in the present-day reference genome and (iv) TEs whose ancestral consensus sequences did not contain CGIs and were inserted

| The relationship between recombination rate and the diversity landscape
It has previously been shown that variation in genetic diversity within both collared flycatcher and pied flycatcher genomes is correlated with the pedigree-based recombination rate in collared flycatcher . Here we extend these results by analysing the relationship between species-and population-specific LD recombination rates, and species-and population-specific data on genetic diversity. LD recombination rate and nucleotide diversity

| Recombination hotspots
To investigate variation in recombination rate at a fine scale, we Following a previous approach (Singhal et al., 2015), we defined shared hotspots between populations or between species as those whose mid-points were located <3 kb from each other in the genomes of the respective population or species. In collared flycatcher, 51% of hotspots were shared between at least one pair of populations within species, while 39% were shared between at least one pair of populations between species (Figure 3). The corresponding proportions in pied flycatcher were 69% and 45%, respectively. For hotspots not defined as shared between species (i.e., being speciesspecific), recombination rate in the other species was still elevated, indicating that the extent of hotspot sharing was somewhat dependent on the threshold used for calling hotspots. For hotspots unique to collared flycatcher, the LD recombination rate in the corresponding regions in the pied flycatcher genome was about four times higher than the local background (Figure 3b). Similarly, for hotspots unique to pied flycatcher, the LD recombination rate in collared flycatcher was seven times higher than the local background (Figure 3c). We observed similar patterns for population-specific hotspots within species ( Figure S8, Supporting Information).
We next sought to validate the patterns of interspecific variation in the intensity of recombination hotspots by taking advantage of the expected effects of recombination on the site frequency spectrum via gBGC. There were 2,669 and 5,020 hotspots that were called as unique to collared flycatcher or pied flycatcher (hereafter referred to as "collared flycatcher-specific hotspots" and "pied flycatcher-specific hotspots," respectively; Figure 3b and c). Provided that the location of elevated recombination is conserved between species (even though called as hotspot only in one species), we would expect that stronger signatures of gBGC should be observed in both species at hotspots than coldspots. In agreement with this prediction, values of H gBGC were more negative in both species at recombination hotspots than at coldspots regardless of which species were used for identifying hotspots (bootstrap with 1,000 replicates, p = .001 and p = .008 for collared flycatcher and pied flycatcher, respectively) ( Figure 4). In addition, for both species, H gBGC tended to show more negative values when hotspots were identified in the focal species than in the other species (Figure 4), although the differences were not statistically significant (bootstrap with 1,000 replicates, p = .26 at collared flycatcher-specific hotspots and p = .16 at pied flycatcher-specific hotspots).

| Association between recombination hotspots and genomic features
We searched the collared flycatcher genome for sequence motifs associated with recombination hotspots using HOMER, which uses ZOOPS scoring (zero or one occurrence per sequence) coupled with hypergeometric enrichment calculations to determine motif enrichment at hotspots (Heinz et al., 2010). Additionally, we used MEME, which discovers longer motifs by calculating position-specific priors to search motifs enriched in hotspot regions based on the expectation-maximization algorithm (Bailey et al., 2015). HOMER discovered 230 motifs that were significantly enriched at recombination hotspots, many of which were mostly composed of G and C nucleotides (mean GC content = 83%, The number of hotspots shared within species, between species and both within and between species is also shown.

| DISCUSSION
Genomewide polymorphism data obtained by resequencing of individual genomes have opened up a new avenue for estimating recombination rate at a very fine scale in various organisms (Auton et al., 2012(Auton et al., , 2013Singhal et al., 2015). Using whole-genome polymorphism data of four populations of collared flycatchers and four populations of pied flycatchers, we found highly heterogeneous recombination landscapes in both species. Variation in LD recombination rate at 200-kb resolution was largely concordant with recombination rate data from a collared flycatcher linkage map . However, the LD recombination map could improve the existing pedigree recombination map in at least three aspects.
First, the LD recombination map covered genomic regions that were not represented in the pedigree recombination map due to lack of genetic markers, particularly regions towards the ends of chromosomes. This allowed detection of steep decline in recombination rate at the very ends of chromosomes (Figures S3 and S4,Supporting Information), similar to what has been reported in yeasts, plants and Drosophila (Comeron, Ratnappan, & Bailin, 2012;Lam & Keeney, 2015;Yelina et al., 2015). This is potentially because of densely packed chromatin in these highly heterochromatic regions and/or a deleterious effect of recombination in repeat-rich regions (e.g., nonallelic homologous recombination and segregation errors) (Barton, Pekosz, Kurvathi, & Kaback, 2008). Second, the LD recombination map offered much higher resolution of recombination rate variation than the pedigree-based map ( Figure S6 Supporting Information). In particular, the LD recombination map allowed for identification of recombination hotspots and thus direct analyses of the association between recombination events and genomic features. Finally, our approach provided a recombination map for the pied flycatcher, which we previously had to assume to be identical to the collared flycatcher map as well as population-specific maps of both species.
Using these population-and species-specific maps, we could study the degree of conservation of rates and patterns of recombination within and between species. Below, we discuss the evolution of the recombination landscape in the two closely related Ficedula flycatchers from broad chromosome-scale resolution to fine-scale local patterns.

| Interspecific variation in recombination landscape
The recombination landscape was generally conserved between col- Several recent studies have suggested that a conserved recombination landscape between distantly related birds is the cause of concordant patterns of genetic diversity and differentiation along genomes of different avian species (Dutoit et al., 2017;Van Doren et al., 2017;Vijay et al., 2017). While global conservation of recombination landscape seems indeed likely within flycatcher lineages, our observations demonstrate that small differences in recombination rate between even closely related taxa can be translated into species-specific patterns of genetic diversity and differentiation. This would be particularly so in low recombination regions because only a few recombination events are sufficient to counteract the effects of linked selection (Charlesworth & Campos, 2014).
In addition to the conservation of recombination rate at the scale of 200-kb windows, we also found conservation of the fine-scale localization of recombination events between species, that is, conservation of hotspot location (Figure 3). This was supported by the KAWAKAMI ET AL.
| 4167 strong signatures of gBGC seen in both species at hotspots (Figure 4). Recombination intensity appeared to differ between species at species-specific hotspots, but the differences were not statistically supported in our H gBGC measures, possibly reflecting the recent divergence of the two species. Different recombination intensities at hotspots have been reported in three commercial chicken breeds, where similar associations between recombination and genomic features were observed (Pengelly et al., 2016). One possible explanation for the variation in recombination intensity at hotspots in species without PRDM9 is that differences in gene expression during meiosis results in differences in the likelihood of DSB and thereby recombination frequency between individuals and between species (Adrian & Comeron, 2013;Petes, 2001). In the next section, we further dis-  (Flanagan et al., 2005;Li et al., 2006;Ramirez-Carrozzi et al., 2009;Thomson et al., 2010). It has been suggested that genomic regions with less condensed chromatin have higher recombination rates by allowing the recombination machinery to bind to DNA and initiate DSBs (Berglund et al., 2015;Choi et al., 2013;Comeron et al., 2012;Shilo, Melamed-Bessudo, Dorone, Barkai, & Levy, 2015). It is notable that humans and chimpanzees, which have a functional PRDM9, also showed elevated recombination rates at regions nearby TSSs and CGIs, suggesting that the ancestor of these species and other mammalian species has acquired the PRDM9-dependent regulation of recombination while maintaining the ancestral mechanism of localizing recombination nearby genes and CGIs (Auton et al., 2012). However, a recent study implicated a possible involvement of PRDM9 in recombination in more basal vertebrate lineages (Baker et al., 2017), which warrants further in-depth analysis of the evolutionary origin of the PRDM9-dependent recombination.
We also found that hotspot localization was not limited to promoter-associated CGIs, but intergenic CGIs were also associated with recombination hotspots. It has been shown that artificial insertion of CGIs into a mouse genome induced H3K4ME3 and modified local chromatin state even at sites away from existing promoters, suggesting that structural chromatin modification can be induced by unmethylated CGIs without promoter function (Thomson et al., 2010). Therefore, regardless of their functional significance and relative location to promoters, unmethylated CGIs may be generally targeted by the recombination machinery. Moreover, our results prompt the idea that it is not promoters per se that are associated with recombination, but that only CGI-containing promoters show such an association.
In addition to CGIs, our enrichment analysis identified several other sequence motifs associated with recombination hotspots (Figure S9, Supporting Information). Among these, the CT-rich motifs and the A-rich motif were remarkably similar to the ones identified as associated with recombination in humans and chimpanzees (Auton et al., 2012), finches (Singhal et al., 2015), Drosophila melanogaster (Comeron et al., 2012), Arabidopsis thaliana (Choi et al., 2013;Shilo et al., 2015) and yeast (Mancera, Bourgon, Brozzi, Huber, & Steinmetz, 2008). In plants, for example, high recombination regions with CT-rich motifs are accompanied by trimethylation of H3K4 and histone variant H2A.Z, which lead to low nucleosome density and increased chromatin accessibility (Choi et al., 2013;Shilo et al., 2015). In plants and yeasts, the A-rich sequence is known to be associated with nucleosome-depleted regions due to their unusual mechanical properties, which disfavour nucleosome formation (Choi et al., 2013;Jansen & Verstrepen, 2011;Segal & Widom, 2009).
Characterization of epigenetic patterns at these motifs would represent an important topic for future research in order to investigate whether these motifs can modify patterns of recombination landscape independently or in concert with CGIs in avian genomes.

| Transposable elements at recombination hotspots
TEs were preferentially located in high recombination regions, which is in contrast to findings in insects and plants (Rizzon et al., 2002;Xu & Du, 2014). The association between TEs and recombination seen in collared flycatcher appears counter-intuitive in the light of the increased efficiency of selection against deleterious insertions of TEs in high recombination regions (Dolgin & Charlesworth, 2008;Petrov, Fiston-Lavier, Lipatov, Lenkov, & Gonz alez, 2011). To test whether the positive association between TEs and recombination rate is specific to flycatchers or common to other birds, we used LD recombination rate data from zebra finch (Singhal et al., 2015) and similarly found a significantly higher recombination rate at TEs For example, LTR retrotransposons in yeast preferentially insert upstream of genes transcribed by RNA polymerase and nucleosomefree regions flanking genes (Baller et al., 2011;Guo & Levin, 2010).
Assuming an indirect relationship between retrotransposons and elevated recombination rate by the shared preference for an accessi-  Figure S13, Supporting Information), which supports the idea that young TEs reflect a more recent state of the epigenetic landscape with higher accessibility to the recombination machinery.
Our results highlight the importance of CGIs as a key molecular feature for the localization of meiotic recombination events in flycatchers. Our analysis thus refines the model proposed for finches (Singhal et al., 2015) and suggests that it is not functional elements per se that play an important role for attracting the recombination machinery, but that the number and location of recombination events depend on the epigenetic status of CGIs irrespective of their genomic location. It is tempting to speculate that the conservation of recombination hotspot locations is a consequence of the conserved distribution of CGIs between the closely related flycatchers, but that the activity of recombination at hotspots can vary between them due to the differences in the epigenetic status of CGIs. The