Flycatcher single-nucleotide polymorphism array
Recent advances in high-throughput sequencing technologies have enabled rapid and reliable discovery of genome-wide SNP markers for ecologically important organisms. We developed a custom collared flycatcher 50K SNP array using the comprehensive genomic resources recently developed for Ficedula flycatchers: a draft genome assembly, genome-wide SNP discovery by whole-genome resequencing of population samples, and transcriptome sequencing (Ellegren et al. 2012). The SNP array has markers covering all 30 characterized chromosomes in the genome assembly plus all large unassigned scaffolds, some of which are likely to originate from as yet uncharacterized microchromosomes.
The SNP selection criteria we employed for array development proved highly successful. Only 2162 markers failed to produce scorable genotypes in both collared and pied flycatchers (95% success rate), suggesting that (i) flanking sequences for probe design were successfully extracted from the reference collared flycatcher genome (Ellegren et al. 2012) and (ii) these flanking sequences were well conserved both between and within species. High reproducibility across replicated samples of collared flycatchers (50 of 50) confirms reliable genotyping with a very low error rate. With respect to informativeness, we made two important observations. First, by selecting markers based on the degree of polymorphism initially observed in whole-genome resequencing data, it is possible to obtain an array of markers with high polymorphism information content. Second, resequencing 9–10 individuals of each species was in most cases not sufficient to identify sites truly fixed for alternate alleles. Of 4246 Category II markers initially suggested to be species-diagnostic, only 19.5% were subsequently found to be fixed for alternate alleles in the much larger population samples (Table 2). However, the largely nonoverlapping allele frequency spectra (Fig. 2) still support the utility of Category II markers for characterizing patterns of genetic admixture between collared and pied flycatchers (Fig. 5 and Fig. S1, Supporting information).
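Two quantities behind this paragraph can be made concrete. The sketch below, assuming biallelic SNPs and random sampling of unrelated diploid individuals, computes the standard polymorphism information content (PIC) of a biallelic marker and the probability that an allele at frequency q goes unobserved in a sample of n diploids. The second quantity is why a site can look fixed for alternate alleles in 9–10 resequenced birds without being fixed in the population. The function names are hypothetical and are not part of any array-design pipeline used in this study.

```python
def pic_biallelic(p: float) -> float:
    """Polymorphism information content of a biallelic marker.

    General PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2; with two alleles
    at frequencies p and q = 1 - p this reduces to the expression below.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    return 1.0 - (p * p + q * q) - 2.0 * p * p * q * q


def prob_allele_unseen(q: float, n_individuals: int) -> float:
    """Probability that an allele at population frequency q is absent from a random
    sample of n diploid individuals (2n independently drawn chromosomes)."""
    return (1.0 - q) ** (2 * n_individuals)


def fixed_for_alternate_alleles(freq_sp1: float, freq_sp2: float) -> bool:
    """True if an allele is fixed in one species' sample and absent from the other's."""
    return (freq_sp1, freq_sp2) in ((1.0, 0.0), (0.0, 1.0))
```

For a biallelic SNP, PIC peaks at 0.375 when p = 0.5. prob_allele_unseen(0.05, 10) is about 0.36: an allele carried by 5% of chromosomes in one species is missed in roughly a third of samples of 10 birds, so such a site would wrongly appear species-diagnostic.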
Linkage disequilibrium decay
We have previously reported that genome-wide divergence between collared and pied flycatchers is highly heterogeneous, with ~50 ‘genomic islands’ of elevated divergence that are usually associated with higher LD than background genomic regions (Ellegren et al. 2012). Our new genotype data from the SNP array confirm this observation, showing higher mean LD between pairs of markers, and slower LD decay, within the islands than outside them (Fig. 3). Variation in the extent of LD is caused by a number of interrelated factors, such as differences in recombination rate, mutation rate, genetic diversity, selection, effective population size and genetic drift (Hill & Robertson 1968; Barton 2000; Pritchard & Przeworski 2001; Wang et al. 2002; Stumpf & McVean 2003; Rundle & Nosil 2005; Slatkin 2008). As many of the islands appear to be located close to predicted centromeric and telomeric regions (Ellegren et al. 2012), extended LD may also be associated with underlying molecular and genetic features of these regions. Several species show reduced recombination rates near centromeres (chicken, Groenen et al. 2009; tomato, Sherman & Stack 1995), whereas in others, such as the domestic pig, variation in recombination rate is not strongly correlated with centromere position (Tortereau et al. 2012). Because the locations of centromeres were predicted from the homologous chromosomes of the zebra finch, an accurate karyotype of the collared flycatcher will be essential for further investigating the relationship between the extent of LD and centromeric/telomeric regions.
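The LD measures used in this section can be computed from phased haplotypes as follows. This is a minimal textbook sketch, assuming two biallelic loci coded 0/1 and known haplotype phase (for example, Z-chromosome haplotypes observed directly in hemizygous females); unphased diploid genotypes would first need phasing or a genotype-based estimator. The function name ld_stats is hypothetical.

```python
from typing import Sequence, Tuple


def ld_stats(hap_a: Sequence[int], hap_b: Sequence[int]) -> Tuple[float, float, float]:
    """Return (D, |D'|, r^2) for two biallelic loci.

    hap_a[i] and hap_b[i] are the 0/1 alleles carried by the same phased
    chromosome i at locus A and locus B.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("need two non-empty haplotype vectors of equal length")
    p_a = sum(hap_a) / n  # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n  # frequency of allele 1 at locus B
    # frequency of the haplotype carrying allele 1 at both loci
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        raise ValueError("both loci must be polymorphic in the sample")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max  # Lewontin's normalised D, as an absolute value
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))  # squared allelic correlation
    return d, d_prime, r2
```

The haplotypes hap_a = [1, 1, 0, 0] and hap_b = [1, 0, 0, 0] give |D′| = 1 but r² = 1/3, which is why decay curves based on D′ and on r² are not directly interchangeable.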
We have previously reported rather extended LD in the collared flycatcher using 34 SNPs from 23 genes on the Z chromosome; LD measured as D′ dropped below 0.5 at ~400 kb (Backström et al. 2006). Using the same LD metric here, D′ dropped below 0.5 at ~240 kb based on 770 SNPs on the Z chromosome. The shorter extent of LD in the current study may result from the much larger sample size (82 females in the previous study vs. 221 males here) as well as from the higher marker density. In addition, the markers in our previous study spanned roughly 50% of the Z chromosome, whereas those in the current study cover nearly all of it. As the ends of avian chromosomes tend to have higher recombination rates than their central parts (Backström et al. 2010), this may have led to the detection of more rapid LD decay in the present study. Such bias may well be a general feature of population genetic studies using limited numbers of markers.
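One simple way to extract a single decay distance, such as ~240 kb, from pairwise D′ values is to bin marker pairs by physical distance and report the first bin whose mean D′ falls below 0.5. The sketch below assumes this binning approach and a user-chosen bin width; the study may instead have used a fitted decay curve, and decay_distance is a hypothetical helper. Because only pairs within the sampled span contribute, restricting markers to the centre of a chromosome, where recombination tends to be lower, would push the estimate towards longer distances.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def decay_distance(
    pairs: Iterable[Tuple[int, float]],
    bin_size: int,
    threshold: float = 0.5,
) -> Optional[int]:
    """Lower edge (bp) of the first distance bin whose mean D' is below threshold.

    pairs holds (distance_bp, d_prime) for marker pairs on one chromosome.
    Returns None if no bin falls below the threshold.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    bins: Dict[int, List[float]] = defaultdict(list)
    for distance, d_prime in pairs:
        bins[distance // bin_size].append(d_prime)
    for index in sorted(bins):  # walk bins from short to long distances
        values = bins[index]
        if sum(values) / len(values) < threshold:
            return index * bin_size
    return None
```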
The SNP array presented here offers a valuable resource for future studies of Ficedula flycatchers, such as linkage mapping, association mapping, LD mapping and scans for selective sweeps. At the same time, our results also highlight current difficulties and challenges in developing genomic toolkits for natural populations. First and foremost, because LD decays rapidly in most parts of the flycatcher genome, covering the entire genome with independent SNP sets would require a much larger number of markers. For instance, given the distance over which LD decays to background levels (a mean of 17 kb on each side of a marker), >32 000 evenly distributed markers spaced ≤34 kb apart would be required to cover the 1.1-Gb flycatcher genome. As the flycatcher array contained ~13 000 polymorphic markers after pruning markers within 34 kb of neighbouring markers, ~60% of the flycatcher genome is not covered by markers on the array. Second, consistent with this calculation, tag SNP and haplotype block analyses revealed that the number of markers on the array is too low to capture variation across the whole genome. Even with a moderate LD threshold of r² = 0.5, 95.4% of the markers (32 289 tag SNPs) are still required to represent all markers on the array. Finally, the genome-wide pattern of rapid LD decay is further illustrated by the large number of short LD blocks (<1 kb) and by a median block size of only 3.0 kb. It should be noted, however, that our marker set was biased towards high-recombination regions, resulting in the recovery of a large number of small LD blocks.
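The coverage arithmetic above, together with a greedy form of tag SNP selection, can be sketched as follows. The coverage functions assume that each marker covers a window extending the mean LD decay distance (17 kb) on either side and that windows do not overlap. The tagging function is a generic greedy pairwise algorithm (pick the SNP that captures the most untagged SNPs at r² ≥ threshold, then repeat), offered as an illustration rather than as the exact tagging procedure used here. All function names are hypothetical.

```python
import math
from typing import Dict, List, Set


def markers_needed(genome_bp: float, ld_half_width_bp: float) -> int:
    """Evenly spaced markers needed if each marker covers ld_half_width_bp on each side."""
    return math.ceil(genome_bp / (2 * ld_half_width_bp))


def fraction_covered(n_markers: int, genome_bp: float, ld_half_width_bp: float) -> float:
    """Genome fraction covered by n non-overlapping windows of width 2 * ld_half_width_bp."""
    return min(1.0, n_markers * 2 * ld_half_width_bp / genome_bp)


def greedy_tag_snps(r2: Dict[str, Dict[str, float]], threshold: float = 0.5) -> List[str]:
    """Greedy pairwise tagging: repeatedly choose the SNP that captures the most
    still-untagged SNPs at r^2 >= threshold (a SNP always captures itself).

    r2[a][b] holds pairwise r^2; missing pairs are treated as r^2 = 0.
    """
    untagged: Set[str] = set(r2)
    tags: List[str] = []

    def captured(snp: str) -> Set[str]:
        return {s for s in untagged if s == snp or r2[snp].get(s, 0.0) >= threshold}

    while untagged:
        # sorted() makes tie-breaking deterministic (max keeps the first best SNP)
        best = max(sorted(untagged), key=lambda s: len(captured(s)))
        tags.append(best)
        untagged -= captured(best)
    return tags
```

With the values in the text, markers_needed(1.1e9, 17e3) returns 32 353 and 1 - fraction_covered(13_000, 1.1e9, 17e3) is about 0.60, matching the >32 000 markers and ~60% uncovered genome quoted above. The ratio len(tags) / number of SNPs corresponds to the proportion of markers retained as tags.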
The collared flycatcher and the pied flycatcher are almost completely reproductively isolated from each other, yet occasionally form heterospecific breeding pairs and hybridize, producing individuals of mixed ancestry (reviewed by Sætre & Sæther 2010 and references therein). Hybrid fitness is severely reduced: female hybrids are apparently sterile (Alatalo et al. 1990; Gelter & Tegelström 1992), male hybrids have reduced fertility (Ålund et al. 2013), and sexual selection against intermediate phenotypes further reduces male fitness (Svedin et al. 2008; see further below). According to field observations in our sympatric study populations, about 4% of breeding pairs are mixed and about 3% involve a breeding hybrid male (Svedin et al. 2008). The genetic admixture analysis identified a total of five F1 hybrids in our main sample cohort of collared and pied flycatchers (Table 1). Finding a small number of such hybrids is not surprising given the difficulty of confidently identifying hybrids of these species from phenotype, particularly females. This difficulty is also reflected in the fact that seven of 31 individuals classified as hybrids on the basis of phenotypic characters turned out to have genotypes corresponding to pure species (Table 1; cf. Veen et al. 2001). Interestingly, of 17 offspring from mixed breeding pairs, only four had F1 hybrid genotypes (21%). Extra-pair paternity occurs relatively frequently in collared flycatchers (Sheldon & Ellegren 1999), and even more so in mixed pairs (Veen et al. 2001). It has been suggested that female collared flycatchers can reduce the indirect costs of mixed pairing (unfit offspring) by engaging in conspecific extra-pair copulations, either as an active strategy or favoured via conspecific sperm precedence (Veen et al. 2001). At the same time, direct benefits of hybridization could accrue through ecological factors, such as habitat conditions (Veen et al. 2001; Wiley et al. 2007).
Our results provide quantitative support for the idea that males conspecific to the female often sire the offspring of mixed pairs in this system.
Our SNP array contained 965 fully diagnostic markers for distinguishing the two flycatcher species, and a subset of these markers with minimal linkage (466 markers, >100 kb apart) was used to characterize the genetic ancestry of 33 putative hybrids. The ancestry analysis revealed that all of these individuals had F1 hybrid genotypes; there were no backcrosses or more advanced later-generation hybrids (Fig. S1, Supporting information). For the same populations, Wiley et al. (2009) estimated the incidence of F1 hybrids, first-generation backcrosses (B1) and second- or later-generation backcrosses (B2) as 0.9–1.8%, 0.4–0.5% and <0.3%, respectively, using 40 informative SNPs to distinguish these species. As most of the hybrids detected in the present study did not come from a random sample of birds, the incidence of hybrids is not directly comparable between the two studies. Importantly, however, three hybrids genotyped in both studies were classified as B1 (one) and B2 (two) hybrids by Wiley et al. (2009), whereas all three had F1 hybrid genotypes in the present study. This discrepancy is likely explained by differences in the number of markers used (40 vs. 466) and in the number of individuals screened for assessing species-specificity, and/or by the fact that the 40 SNPs in the previous study were deemed informative based on allele frequency distributions in separate allopatric populations. It could thus be that backcrosses are extremely rare in these sympatric flycatcher populations and that ongoing gene flow is very low, if present at all. More generally, this illustrates the limitations of admixture analyses when genome-wide approaches cannot be taken.
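The distinction between F1 hybrids and backcrosses drawn here rests on two summary statistics computed from fully diagnostic markers: the hybrid index (proportion of pied-type alleles) and interspecific heterozygosity (proportion of loci carrying one allele from each species). An F1 is expected to be heterozygous at every diagnostic locus (index 0.5, heterozygosity 1), whereas a first-generation backcross is expected to have an index near 0.25 or 0.75 and heterozygosity near 0.5. The sketch below assumes genotypes coded as the number of pied-type alleles (0, 1 or 2) and also thins markers to >100 kb spacing on one chromosome, as described in the text. The function names and the classification tolerance are hypothetical and do not reproduce the ancestry analysis behind Fig. S1.

```python
from typing import List, Optional, Sequence, Tuple


def thin_markers(positions: Sequence[int], min_gap: int = 100_000) -> List[int]:
    """Indices of markers kept (scanning one chromosome left to right) so that
    consecutive retained markers are more than min_gap bp apart."""
    kept: List[int] = []
    last: Optional[int] = None
    for i in sorted(range(len(positions)), key=lambda k: positions[k]):
        if last is None or positions[i] - last > min_gap:
            kept.append(i)
            last = positions[i]
    return kept


def hybrid_index_and_het(pied_allele_counts: Sequence[int]) -> Tuple[float, float]:
    """(hybrid index, interspecific heterozygosity) from fully diagnostic markers.

    Each entry is the number of pied-type alleles (0, 1 or 2) at one marker, so the
    hybrid index is 0 for a pure collared and 1 for a pure pied flycatcher.
    """
    n = len(pied_allele_counts)
    if n == 0 or any(g not in (0, 1, 2) for g in pied_allele_counts):
        raise ValueError("expected a non-empty sequence of 0/1/2 allele counts")
    index = sum(pied_allele_counts) / (2 * n)
    het = sum(1 for g in pied_allele_counts if g == 1) / n
    return index, het


def classify(index: float, het: float, tol: float = 0.1) -> str:
    """Crude class from expected values: F1 (0.5, 1), B1 (0.25 or 0.75, 0.5), pure (0 or 1, 0)."""
    if het >= 1 - tol and abs(index - 0.5) <= tol:
        return "F1"
    if abs(het - 0.5) <= tol and min(abs(index - 0.25), abs(index - 0.75)) <= tol:
        return "B1"
    if het <= tol and min(index, 1 - index) <= tol:
        return "pure"
    return "later-generation or unresolved"
```

An F2 individual (index ≈ 0.5, heterozygosity ≈ 0.5) falls through to the last class, which is where a finer ancestry model would be needed.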
Absence of backcrosses and later-generation hybrids implies strong selection against F1 hybrids. Using approximate Bayesian computation (ABC) to reconstruct the demographic history and timing of speciation, we recently estimated the divergence of the two species to be <1 Ma, with gene flow from pied flycatcher into collared flycatcher at a rate of 0.16–0.36 migrants per generation (Nadachowska-Brzyska et al. 2013). Although the timing of gene flow could not be precisely ascertained, the data pointed towards a model with recent gene flow after the last glacial maximum (LGM). If this is correct, several interpretations of the present results are possible. One is that the rate of gene flow differs among hybrid zones and areas of secondary contact of these species, such that it is lower in our study populations than elsewhere. Flycatcher populations on the Baltic Sea islands of Gotland and Öland most likely came into secondary contact only recently (Qvarnström et al. 2010). However, if anything, one would expect stronger barriers to gene flow in old hybrid zones than in areas of recent contact; a previous study using 25 microsatellite loci and 20 SNPs supports this expectation, finding the highest introgression in the Gotland and Öland populations (Borge et al. 2005). Alternatively, our field sampling regime may not provide a random representation of the population, for example because hybrids are more dispersive. Of course, given the uncertainty in the ABC estimates, gene flow may only have occurred up until, or before, the LGM, with strong reproductive incompatibilities having evolved very recently.
Previous studies suggest that various mechanisms are involved in the reproductive isolation of these species (Qvarnström et al. 2010; Sætre & Sæther 2010). First, the mating success of hybrid males is lower than that of pure males of either species, because their intermediate plumage and mixed song are disadvantageous for attracting mates (Svedin et al. 2008). Second, even after successful pairing, hybrid males are less likely to contribute genes to subsequent generations because of the low hatching rate of their offspring and their high susceptibility to extra-pair paternity, with hatched nestlings likely to be sired by other, pure-species males (Svedin et al. 2008; Ålund et al. 2013). Finally, the fitness of F1 hybrids is much lower than that of pure species because of complete sterility in female hybrids and severely reduced reproductive performance of male hybrids, which produce a high proportion of malformed sperm (Alatalo et al. 1982; Sætre et al. 1999; Veen et al. 2001; Svedin et al. 2008; Wiley et al. 2009; Ålund et al. 2013). The evolution of such strong intrinsic postzygotic isolation despite the recent divergence of collared and pied flycatchers makes these species unusual, because diverging avian lineages are thought to develop intrinsic reproductive incompatibilities slowly (Price & Bouvier 2002; Fitzpatrick 2004). The rarity of backcross hybrids, together with strong postzygotic reproductive isolation, therefore indicates that speciation has progressed very rapidly in collared and pied flycatchers.
The genetic basis of reduced hybrid fitness could involve forms of incompatibility other than a standard Bateson–Dobzhansky–Müller model, in which interacting loci are fixed for different alleles in the hybridizing species. One such scenario arises when one allele (a1), but not the other (a2), at a locus polymorphic in one parental species shows reduced compatibility in hybrids when interacting with a locus fixed for a species-specific allele (b2) in the other parental species. If this were the case, we would expect distorted segregation of the a1 and a2 alleles transmitted to hybrids, specifically a deficit of a1. We tested this scenario by comparing allele frequency distributions at segregating sites in the collared flycatcher with the frequency distribution, in hybrids, of alleles transmitted by this species (Fig. 6). The test was limited to 23 846 loci at which pied flycatchers were fixed for one of the alleles (a2), so that the allele contributed by the collared flycatcher parent could always be inferred. Four loci showed a highly significant deviation from the expected transmission ratio (Table 3). However, all four showed a deficit of the a2 allele (i.e. an excess of a1), which is not easily reconciled with a model of incompatibility between a1 and variants at other loci transmitted by the pied flycatcher parent. We still think these loci are worthy of further investigation, as the signal of biased transmission was very strong in each case. Deviation from random inheritance of gametes can result from a number of mechanisms, including meiotic drive (de Villena & Sapienza 2001; Zollner et al. 2004; Huang et al. 2013).
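The transmission test can be sketched for a single locus. Because pied flycatchers are fixed for a2 at the tested loci, a hybrid genotype a1/a2 implies that the collared parent transmitted a1, and a2/a2 implies that it transmitted a2. The number of transmitted a2 alleles can then be compared with the a2 frequency in the collared flycatcher sample using an exact two-sided binomial test. This is one straightforward way to frame the comparison, assuming each hybrid's collared parent is a random draw from the sampled population; it is not necessarily the test used for Fig. 6 or Table 3, and the function names are hypothetical. With 23 846 loci tested, per-locus p-values would also need correction for multiple testing.

```python
from math import comb
from typing import Sequence


def collared_transmitted_a2(hybrid_genotypes: Sequence[str]) -> int:
    """Number of hybrids that received a2 from the collared parent at a locus where
    the pied flycatcher is fixed for a2 (so the pied parent always transmits a2)."""
    count = 0
    for genotype in hybrid_genotypes:
        if genotype == "a2/a2":
            count += 1  # collared parent transmitted a2
        elif genotype not in ("a1/a2", "a2/a1"):
            raise ValueError(f"genotype {genotype!r} is impossible if pied is fixed for a2")
    return count


def binom_two_sided(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial p-value: total probability of outcomes that are
    no more likely than the observed count k under Binomial(n, p)."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)  # small relative tolerance for floating-point ties
    return min(1.0, sum(pr for pr in probs if pr <= cutoff))
```

For a locus genotyped in a list of hybrids genos, binom_two_sided(collared_transmitted_a2(genos), len(genos), f_a2) gives the per-locus p-value, where f_a2 is the a2 frequency estimated in the collared flycatcher sample.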