Carmen Segarra, Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Diagonal 645, 08028 Barcelona, Spain. Tel.: 34 93 4021848; fax: 34 93 4034420; e-mail: firstname.lastname@example.org
Drosophila madeirensis is an endemic species of Madeira that inhabits the island Laurisilva forest. Nucleotide variation in D. madeirensis is analysed in six genomic regions and compared to that previously reported for the same regions in Drosophila subobscura, an abundant species in the Palearctic region that is closely related to D. madeirensis. The gene regions analysed are distributed along the O3 inversion. The O3 arrangement is monomorphic in D. madeirensis, and it was present in ancestral populations of D. subobscura but went extinct in this species after the origin of the derived OST and O3+4 arrangements. Levels of nucleotide polymorphism in D. madeirensis are similar to those present in the OST and O3+4 arrangements of D. subobscura, and the frequency spectrum is skewed towards rare variants. Purifying selection against deleterious nonsynonymous mutations is less effective in D. madeirensis. Although D. madeirensis and D. subobscura coexist at present in Madeira, no clear evidence of introgression was detected in the studied regions.
Drosophila species have been used as a model system in evolutionary studies both in the laboratory and in nature (reviewed in Powell, 1997). Standing genetic variation has been characterized at different levels in natural populations of several Drosophila species. However, the cosmopolitan species Drosophila melanogaster and Drosophila simulans are best characterized at the molecular level. Since the work of Kreitman (1983) in D. melanogaster, a large number of population studies performed by direct DNA sequencing have been published. These studies, which at first were based on a single locus and later used multilocus approaches, are now being improved in genome-wide surveys (Begun et al., 2007; Sackton et al., 2009). In contrast to the rather large number of surveys in cosmopolitan and other widely distributed species, molecular population genetics studies in island-endemic Drosophila species are still scarce. Multilocus analyses have been reported in three island species –Drosophila mauritiana, Drosophila sechellia and Drosophila santomea– that together with D. melanogaster, D. simulans and Drosophila yakuba constitute the melanogaster group of the Sophophora subgenus. Drosophila mauritiana (endemic to Mauritius Island) and D. sechellia (endemic to the Seychelles archipelago) are closely related to D. simulans, whereas D. santomea (endemic to São Tomé Island) is a close relative of D. yakuba. Comparison of the level of nucleotide variation between each endemic species and its close widely distributed relative has yielded contradictory results. A strong reduction of variation has been described in D. sechellia relative to D. simulans (Kliman et al., 2000; Legrand et al., 2009). In contrast, D. mauritiana exhibits only slightly lower nucleotide diversity than D. simulans (Kliman et al., 2000; McDermott & Kliman, 2008). Drosophila santomea also exhibits a lower level of variation than D. yakuba (Llopart et al., 2005), with an up to two-fold reduction of silent nucleotide diversity for different X-linked loci (Bachtrog et al., 2006). These contradictory results would indicate that not only the speciation process itself and the reduced distribution area but also other factors might affect the level of variation in endemic species in relation to their close relatives.
Drosophila madeirensis is an endemic species of Madeira Island (Portugal), located at 32.75°N and 16.89°W in the Atlantic Ocean. The Madeira archipelago has a volcanic origin that has been dated to some 5–6 Mya (Galopin de Carvalho & Brandão, 1991). Natural populations of D. madeirensis inhabit the Laurisilva forest, which is a relict of the Pliocene subtropical forest. Nowadays, the Laurisilva forest of Madeira represents the largest surviving area of this vegetation type and it is included in the World Heritage List (IUCN 1999). A unique suite of plants and animals, including many endemic species such as D. madeirensis, is associated with the different Laurisilva forests. Drosophila madeirensis is thus a specialized species with a restricted distribution area. These characteristics might have affected its effective population size, which can be expected to be relatively low.
Drosophila madeirensis is closely related to Drosophila subobscura, a widespread and abundant Palearctic species (Krimbas, 1993) that has recently colonized the American continent (Prevosti et al., 1988). Natural populations of D. subobscura have been extensively surveyed in the last 50 years, and the adaptive character of the species’ chromosomal polymorphism is well established. Chromosomal inversions present latitudinal clines in the same direction both in Europe and in America (Prevosti et al., 1988), and their frequencies have undergone temporal changes in Europe that are consistent with a response to global warming (Balanyà et al., 2004). Drosophila subobscura and D. madeirensis diverged very recently (some 0.6–1.0 Mya; Ramos-Onsins et al., 1998) and can be considered a model species pair to study how differences in effective size and habitat preferences (specialized vs. more generalist species) can affect the level and pattern of nucleotide variation. Indeed, under strict neutrality, endemic species with small effective sizes are expected to harbour low levels of nucleotide variation (Kimura, 1983). However, the small effective size might also cause a relaxation of selection that would imply, according to the nearly neutral model, a longer sojourn time of mildly deleterious mutations and thus an increase in the level of variation (Ohta, 1992).
The species pair D. madeirensis–D. subobscura is also a good system to analyse the speciation process. Under the biological species concept, species are isolated systems that neither interbreed nor exchange genetic information with other systems (Dobzhansky, 1937; Mayr, 1942). The mechanisms involved in the isolation process, their relative importance and the temporal order in which they are established are important aspects in studies of speciation (Coyne & Orr, 2004). Nevertheless, genes can cross the species barriers when reproductive isolation is incomplete and the production of nonsterile hybrids is possible. Introgression is a common process in incipient species with overlapping distribution areas and has often been detected from variation at the mitochondrial and/or the nuclear genomes in closely related Drosophila species (Wang et al., 1997; Llopart et al., 2005; Bachtrog et al., 2006; Rand et al., 2006). The level of gene flow between incompletely isolated species can also differ across the nuclear genome. Indeed, gene regions with restricted gene flow, as reflected by a low number of shared polymorphisms and a high number of fixed differences, can coexist with gene regions that exhibit extensive gene flow that results in a high number of shared polymorphisms and no fixed differences between species (Wang et al., 1997). Some studies have revealed a good correspondence between genomic regions associated with reproductive isolation and regions with restricted gene flow (Ting et al., 2000; Machado et al., 2002). Hence, natural selection may prevent gene flow in those genes involved in sexual isolation and thus contribute to the reinforcement of reproductive isolation.
Gene flow between incipient species can also be prevented in certain genomic regions by the presence of chromosomal inversions. Classical chromosomal speciation models based on the reduction of fitness in heterokaryotypes (White, 1978) have been recently extended to speciation models that focus on the suppression of recombination between different arrangements as a means to preclude gene flow in key genomic regions associated with species-specific adaptations or reproductive isolation (Noor et al., 2001, 2007; Rieseberg, 2001; Navarro & Barton, 2003; Brown et al., 2004).
Reproductive isolation between D. madeirensis and D. subobscura is not complete because fertile hybrids are obtained in laboratory conditions (Khadem & Krimbas, 1991; Papaceit et al., 1991; Rego et al., 2007). At present, both species coexist in the Madeira Island and thus they might exchange genes in nature via interspecific hybridization. However, D. madeirensis and D. subobscura differ by chromosomal inversions, either fixed between them or segregating as polymorphic in D. subobscura. These inversions might have prevented interspecific genetic exchange in the regions covered by the inversions in the interspecific hybrids (Khadem et al., 2011).
Nucleotide variation was previously surveyed at the rp49 gene region in a sample of D. madeirensis and D. subobscura from Madeira (Khadem et al., 2001). No genetic differentiation between continental and island D. subobscura populations and similar levels of genetic diversity in D. subobscura and D. madeirensis were detected. The rp49 gene region is located in the O chromosome (Muller’s element E) of D. madeirensis. This species is monomorphic for the O3 arrangement, which was present in ancestral populations of D. subobscura but went extinct in this species after the origin of its derived arrangements OST and O3+4 (Ramos-Onsins et al., 1998). Current natural populations of D. subobscura in Madeira are almost monomorphic for the O3+4 arrangement (Prevosti, 1972; Larruga et al., 1983). Therefore, interspecific hybrids would be O3+4/O3 heterokaryotypes and would form an inversion loop including almost completely segment I of the O chromosome (sections 91–99). Here, the previous analysis of nucleotide variation at rp49 is extended to six new genomic regions in a set of 14 lines of D. madeirensis. These six regions, like rp49, are associated with the O3 inversion (Fig. 1), and their variation was previously analysed in a sample of 14 OST and 14 O3+4 chromosomes of D. subobscura (Munté et al., 2005).
The main aim of the present multilocus study is to compare the level and pattern of nucleotide variation between an endemic and specialized species (D. madeirensis) and a species with a wide distribution in the Palearctic region (D. subobscura). The expected strong differences in effective population size between D. madeirensis and D. subobscura might be reflected in their level and pattern of nucleotide variation. In addition, the present study also aims to shed some light on the speciation process that gave rise to D. madeirensis. Indeed, introgression between species in the studied regions is expected to be low as they are associated with the inversion and would, thus, lie in the loop formed in putative interspecific hybrids. Therefore, the level and pattern of genetic variation in these regions would not be affected by introgression, unlike collinear regions, and would better reflect the speciation history of D. madeirensis and D. subobscura.
The results obtained show that: (1) the level of nucleotide variation in D. madeirensis is similar to that present in the OST and O3+4 arrangements of D. subobscura, (2) the pattern of variation in D. madeirensis is skewed towards polymorphisms with rare variants, (3) purifying selection against nonsynonymous mutations is stronger in D. subobscura than in D. madeirensis and (4) introgression at loci located along the O3 inversion is negligible.
Materials and methods
Fourteen highly inbred lines of D. madeirensis were established after 12 generations of sib-mating from flies collected in Ribeiro Frio (Madeira, Portugal) in 2000. All lines were homozygous for the O3 arrangement. The 14 OST and 14 O3+4 D. subobscura lines used in some analyses (Rozas et al., 1995; Munté et al., 2005) were obtained from a sample collected in El Pedroso (Galicia, Spain).
DNA extraction and sequencing
A modification of protocol 48 in Ashburner (1989) was used to extract genomic DNA from a single individual of the inbred lines. Six genomic regions (P2, P21, P22, P154, S1 and S25) were PCR-amplified in all lines. The primers and conditions used in the amplification process were the same as described in Munté et al. (2005). PCR products were purified with Qiaquick columns (Qiagen, Chatsworth, CA, USA) and used as template for sequencing with internal primers designed at increasing intervals of ∼300 nucleotides. Both strands were cycle-sequenced using the ABI Prism® BigDye™ Terminators 3.0 Cycle Sequencing Kit (Applied Biosystems, Foster City, CA, USA) and run on an ABI Prism 3700 sequencer (Applied Biosystems). Sequences were assembled using the SeqMan program of the DNASTAR Lasergene software package (Burland, 2000), multiply aligned using Clustal W (Thompson et al., 1994), and edited with the MacClade (version 4.0) program (Maddison & Maddison, 2000). Newly reported sequences have been deposited in the EMBL/GenBank Data Libraries under accession numbers HE653145–HE653228.
The level and pattern of nucleotide variation in D. madeirensis were analysed in the six genomic regions studied. DNA sequences from the six orthologous regions of D. subobscura previously reported in Munté et al. (2005) and the rp49 sequences of D. subobscura and D. madeirensis (Rozas & Aguadé, 1994; Ramos-Onsins et al., 1998; Khadem et al., 2001; EMBL accession nos. X80076–X80109, Y09708 and J310269–J310306, respectively) were also used in most data analyses. The orthologous Drosophila pseudoobscura sequences used for interspecific comparisons were obtained from the Drosophila Genome Project database at the HGSC using the Mega BLAST search engine at NCBI. Multiply aligned sequences of the six newly reported regions were concatenated in the same order as they are in the O3 arrangement of D. madeirensis, and the analysis was performed either for each genomic region independently or for the concatenated data set.
The level of nucleotide polymorphism in D. madeirensis was estimated as S (the number of polymorphic sites), π (nucleotide diversity; Nei, 1987) and θ (the heterozygosity per site expected at mutation–drift equilibrium; Watterson, 1975). Sites with insertion/deletion variants (indels) were not considered in the analyses. Interspecific analyses were performed independently between D. madeirensis and either OST or O3+4, given that these gene arrangements are genetically differentiated in D. subobscura (Munté et al., 2005). Genetic differentiation between species was estimated as DXY and DA (the average number of nucleotide substitutions per site and the number of net nucleotide substitutions per site, respectively; Nei, 1987). The statistical significance of genetic differentiation as measured by the KS* statistic was determined by the permutation test (Hudson et al., 1992a). FST or the proportion of nucleotide diversity attributed to variation between populations was estimated according to Hudson et al. (1992b). This parameter was used to estimate Nm (migration index) under the island model of population structure and assuming migration–drift equilibrium (Wright, 1951; Hudson et al., 1992b). The hypergeometric distribution was applied, as described in Rozas & Aguadé (1994), to infer whether the observed number of shared polymorphisms between species (i.e. sites with the same polymorphic variants in D. madeirensis and D. subobscura) could be explained by recurrent mutations. The algorithm developed by Betran et al. (1997) to identify gene conversion tracts was used to detect putative introgressed haplotypes between D. subobscura and D. madeirensis. The DnaSP v5 (Librado & Rozas, 2009) and SITES (made kindly available by J. Hey at http://lifesci.rutgers.edu/~heylab/HeylabSoftware.htm) software packages were used for most DNA polymorphism analyses.
Scaled selection coefficients (γ = Nes, where Ne is the effective population size and s is the selection coefficient) were estimated for synonymous and nonsynonymous polymorphic sites within D. madeirensis and within the D. subobscura OST and O3+4 chromosomal arrangements. We used a Poisson random field (PRF) approach (Sawyer & Hartl, 1992) based on the frequency distribution (fd) of polymorphic variants (Hartl et al., 1994; Akashi & Schaeffer, 1997). The prfml program, made kindly available by S. Sawyer at http://www.math.wustl.edu/~sawyer/, was used to estimate γ, its confidence interval and its associated P-value. The distribution of fitness effects of new nonsynonymous mutations was inferred according to Eyre-Walker & Keightley (2009) using the DFE-alpha server kindly available at http://liberty.cap.ed.ac.uk/~eang33/dfe-alpha-server.html. This maximum-likelihood approach uses information from the site frequency spectra at selected and neutral sites in a population sample. Nonsynonymous and synonymous sites in the coding regions were used as selected and neutral sites, respectively. The method also takes into account demographic events that can alter the frequency spectrum and allows the estimation of the fraction of adaptive amino acid mutations (α) when divergence data are available. Divergence from D. pseudoobscura was used for this purpose.
Gene genealogies were reconstructed by the neighbour-joining method (Saitou & Nei, 1987) as implemented in the mega v4.0 program (Tamura et al., 2007). Genetic distances were corrected for multiple hits according to Jukes & Cantor (1969). Bootstrap values were obtained after 1000 replicates.
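The Jukes & Cantor (1969) correction for multiple hits converts the observed proportion of differing sites, p, into an estimated number of substitutions per site, d = −(3/4) ln(1 − 4p/3). A minimal sketch follows; the function names are hypothetical and this is not the MEGA code.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, ignoring positions with gaps or ambiguity codes."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a in "ACGT" and b in "ACGT"]
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor distance: d = -(3/4) * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("distance undefined for p >= 0.75 (saturation)")
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical pair of sequences differing at 1 of 10 sites.
d = jukes_cantor(p_distance("ACGTACGTAC", "ACGTACGTAT"))
```

For closely related sequences such as those compared here, the corrected distance is only slightly larger than the uncorrected proportion of differences.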
The method developed by Wakeley (1996a,b) was applied to contrast whether the observed pattern of variation was consistent with that expected under the isolation model. The ψ-test (Wakeley, 1996a) based on the variance of the number of nucleotide differences between species was performed with a C program kindly provided by J. Wakeley. The numbers of exclusive mutations (SX1, SX2), fixed differences between groups (SF) and shared polymorphisms between groups (SS) were used to estimate the population parameters θ1 (OST or O3+4), θ2 (D. madeirensis), θA (ancestral species) and τ (τ = 2μt, where μ is the mutation rate per sequence and t the time since the split) for each genomic region and for the concatenated data set as described in Wakeley & Hey (1997). In addition, the fit of the data to the isolation model was contrasted by a multilocus approach according to the test statistics χ2 (Kliman et al., 2000) and wh (Wakeley & Hey, 1997; Wang et al., 1997) as implemented in the wh program (http://lifesci.rutgers.edu/~heylab/HeylabSoftware.htm). This model assumes that an ancestral population split into two descendent populations with constant effective sizes.
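The four site classes that feed the isolation-model estimates can be counted directly from two aligned samples. The sketch below is an illustration only, not the SITES or wh code: the function name is hypothetical, gap-containing columns are skipped, and a site is counted as shared only when the two samples have at least two variants in common, following the definition of shared polymorphisms given earlier.

```python
def classify_sites(sample1, sample2):
    """Count exclusive (SX1, SX2), fixed (SF) and shared (SS) sites.

    sample1 and sample2 are lists of aligned sequences of equal length.
    """
    counts = {"SX1": 0, "SX2": 0, "SF": 0, "SS": 0}
    for col1, col2 in zip(zip(*sample1), zip(*sample2)):
        if "-" in col1 or "-" in col2:
            continue  # indel sites are not considered
        alleles1, alleles2 = set(col1), set(col2)
        poly1, poly2 = len(alleles1) > 1, len(alleles2) > 1
        if poly1 and poly2:
            # Shared polymorphism: the same two variants segregate in both samples.
            # Sites polymorphic in both samples for different variants are left uncounted.
            if len(alleles1 & alleles2) >= 2:
                counts["SS"] += 1
        elif poly1:
            counts["SX1"] += 1  # polymorphic only in sample 1
        elif poly2:
            counts["SX2"] += 1  # polymorphic only in sample 2
        elif alleles1 != alleles2:
            counts["SF"] += 1  # both monomorphic, for different variants
    return counts


# Hypothetical toy samples of two sequences each.
counts = classify_sites(["AAGTC", "ATGTG"], ["CAGAC", "CTGTC"])
```

A low SS together with a high SF is the signature of restricted gene flow referred to in the Introduction.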
The isolation with migration model, as implemented in the im program, was also used to contrast whether the data could be explained by isolation with some degree of directional gene flow between species (Nielsen & Wakeley, 2001; Hey & Nielsen, 2004). The im program (http://lifesci.rutgers.edu/~heylab/HeylabSoftware.htm) applies Markov chain Monte Carlo (MCMC) simulations to estimate the posterior probability distribution of model parameters from multiple unlinked neutral loci data sets assuming no recombination within loci. Thus, only the largest fragment for each region with no evidence for recombination according to the four gamete test (Hudson & Kaplan, 1985) was considered in this analysis. The basic procedure to obtain maximum-likelihood estimates and confidence intervals of model parameters was to start with a burn-in period of 100 000 steps and proceed for more than ten million steps after the burn-in period. We applied the Hasegawa–Kishino–Yano substitution model (HKY; Hasegawa et al., 1985) and Metropolis coupling with six coupled Markov chains and the two-step increment model. Inheritance scalars were assigned as constants (h = 1, for autosomal loci). The im program was run under the changing population size model that allows changes in the population size of the descendent populations. Sequence divergence from D. pseudoobscura in the studied regions (7.4 × 10−9 changes per site per year assuming a divergence time of 8.1 Myr; Ramos-Onsins et al., 1998) was used to infer the mutation rate per locus per year. Time estimates in mutation units (t) were converted to years (T = t/μ) according to the geometric mean of the mutation rate per locus per year (μ).
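Two of the steps described above lend themselves to a short illustration: the four-gamete test used to delimit fragments without evidence of recombination, and the conversion of time from mutation units to years. The sketch is hypothetical in its names and inputs. It assumes gap-free sequences, and it assumes that the mutation rate per locus is the per-site rate multiplied by the locus length; neither function reproduces the im program itself.

```python
import math
from itertools import combinations


def four_gamete_violations(seqs):
    """Return pairs of site indices at which all four two-site haplotypes occur.

    Under an infinite-sites model without recombination, two biallelic sites
    can show at most three of the four possible haplotypes.
    Sequences are assumed to be aligned and gap-free.
    """
    columns = list(zip(*seqs))
    biallelic = [i for i, col in enumerate(columns) if len(set(col)) == 2]
    return [
        (i, j)
        for i, j in combinations(biallelic, 2)
        if len(set(zip(columns[i], columns[j]))) == 4
    ]


def years_since_split(t_mutation_units, locus_lengths, rate_per_site_per_year=7.4e-9):
    """Convert t (mutation units) to years as T = t / mu.

    mu is the geometric mean across loci of the mutation rate per locus per
    year, taken here as per-site rate x locus length (an assumption).
    """
    per_locus = [rate_per_site_per_year * length for length in locus_lengths]
    mu = math.exp(sum(math.log(r) for r in per_locus) / len(per_locus))
    return t_mutation_units / mu


# Hypothetical toy data: all four haplotypes (AA, AT, TA, TT) are present.
violations = four_gamete_violations(["AA", "AT", "TA", "TT"])
```

In practice, the largest run of consecutive sites free of such violations would be retained for each region before running the MCMC analysis.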
Nucleotide diversity and neutrality tests
Six genomic regions in segment I of the O chromosome (Fig. 1) were sequenced in 14 D. madeirensis lines. Information about the studied regions is available in supplementary Table S1 (Supporting Information). The concatenated data set including the six newly reported regions had 11 255 sites after excluding alignment gaps. A detailed description of nucleotide variation is shown in supplementary Fig. S1 (Supporting Information). Table 1 shows a summary of nucleotide variation estimates in the different regions of the O chromosome of D. madeirensis. Silent nucleotide diversity (πsil) ranges from 0.006 at regions S25 and P21 to 0.018 at region P154. These estimates are within the range observed in the orthologous regions of D. subobscura, either when the OST and O3+4 arrangements are analysed independently or when they are jointly analysed (Table 2). Therefore, no strong reduction of nucleotide variability was observed in the endemic species D. madeirensis relative to D. subobscura.
Table 1. Nucleotide polymorphism and neutrality tests in Drosophila madeirensis.
n, Number of DNA sequences in the sample; Hd, haplotype diversity; # sites, number of sites after excluding alignment gaps; S, number of polymorphic sites; πtotal, nucleotide diversity in all sites; πsil, nucleotide diversity in silent sites (noncoding and synonymous sites); θsil, heterozygosity per silent site based on the number of silent segregating sites; πsyn, nucleotide diversity at synonymous sites; πa, nucleotide diversity at nonsynonymous sites; πnoncoding, nucleotide diversity at noncoding sites.
*, 0.01 < P < 0.05; **, P ≤ 0.01 (two-tailed test).
†rp49 was not included in the concatenated data set as this region was sequenced in a different set of D. madeirensis lines.
‡Fu and Li tests based on the D and F statistics were performed using D. pseudoobscura as the outgroup.
[Table 1 body not recovered; surviving column headers: # silent sites, Fu and Li’s D‡, Fu and Li’s F‡.]
Table 2. Silent nucleotide polymorphism (πsil) in Drosophila madeirensis and Drosophila subobscura.
[Table 2 body not recovered; surviving column headers: D. subobscura (OST), D. subobscura (O3+4), D. subobscura (OST + O3+4).]
*Concatenated data set including all regions, except rp49 (see Table 1).
When comparing different estimates of nucleotide variation in D. madeirensis (Table 1), silent nucleotide diversity (πsil) was lower than silent heterozygosity (θsil) in all regions, which indicates an excess of variants segregating at low frequency in the species. Among silent sites, nucleotide diversity was always lower at noncoding than at synonymous sites. This result suggests a higher level of constraint at noncoding relative to synonymous sites, as previously reported in Drosophila (Andolfatto, 2005). The lowest estimates of nucleotide variation were detected at nonsynonymous sites, as expected by the action of purifying selection acting on amino acid replacements.
Neutrality test statistics based on the frequency spectrum (Tajima, 1989; Fu & Li, 1993) yielded negative values in all regions, indicating an excess of singleton polymorphisms in D. madeirensis (Table 1). Tajima’s D test did not indicate a significant departure from neutral expectations in any region, except at S25 and P21 where the test was marginally significant (0.05 < P < 0.1). In these two regions, as well as at P154, the test was significant (P < 0.05) when considering only noncoding sites. Nevertheless, Fu and Li’s tests based on the D or F statistics were significant at P2, S25 and P21, using either D. pseudoobscura (Table 1) or D. subobscura (results not shown) as the outgroup. Multilocus Tajima’s D and Fu and Li’s D tests (two-tailed tests) showed that the average value of both statistics across regions differed significantly from zero (P = 0.002 for each statistic). These results indicate an overall highly significant excess of singletons in D. madeirensis. Indeed, the single-region tests performed are rather conservative because the confidence intervals of the test statistics were inferred under the assumption of no recombination.
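To make the sign of the statistic concrete, Tajima’s D can be computed from the sample size n, the number of segregating sites S and the average number of pairwise differences k, using the constants given in Tajima (1989). The function name is hypothetical, and the sketch returns only the statistic, not its significance.

```python
import math


def tajimas_d(n, s, k):
    """Tajima's D from sample size n, segregating sites s and mean pairwise differences k."""
    if s == 0:
        raise ValueError("D is undefined when there are no segregating sites")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Numerator: difference between the two estimators of theta (k and S/a1).
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical case: 14 sequences, 10 segregating sites, all singletons,
# so each site contributes 2/n to the mean number of pairwise differences.
d_singletons = tajimas_d(14, 10, 10 * 2 / 14)
```

When every variant is a singleton, k is smaller than S/a1 and D is negative, which is the pattern reported here for D. madeirensis.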
HKA tests (Hudson et al., 1987) based on silent variation were performed between pairs of regions using D. subobscura OST, D. subobscura O3+4 or D. pseudoobscura as the outgroup. None of the tests yielded a significant result. Therefore, no decoupling between the level of divergence between species and the level of polymorphism within D. madeirensis was detected in any region (results not shown). In addition, a multilocus HKA test based on variation in D. madeirensis and using D. pseudoobscura as the outgroup also yielded a nonsignificant result (X2 = 3.47; d.f. = 6, P = 0.74).
Selection coefficient estimates
Interspecific differences in population size might affect the effectiveness of selection acting on new mutations. Estimates of the scaled selection coefficients (γ) in the coding regions are summarized in Table 3. Purifying selection against nonsynonymous mutations is much stronger in D. subobscura than in D. madeirensis. Indeed, γ estimates for nonsynonymous mutations differ significantly from zero in both the OST and O3+4 arrangements of D. subobscura, but not in D. madeirensis. In addition, Tajima’s D statistic for nonsynonymous variants is negative in both species but only significant in D. subobscura. These results would indicate that purifying selection is more effective in eliminating nonsynonymous mutations in D. subobscura than in D. madeirensis. The distribution of fitness effects of deleterious nonsynonymous mutations also supports differences in the strength of purifying selection between species (Fig. 2). Drosophila madeirensis presents a higher fraction of nearly neutral and weakly selected mutations and a smaller fraction of mutations under strong purifying selection than D. subobscura (O3+4). In addition, the fraction of adaptive nonsynonymous substitutions is lower in D. madeirensis than in D. subobscura (O3+4) (α = 41% and α = 59%, respectively). These results are also consistent with a less effective selection and thus with a lower effective size of D. madeirensis relative to D. subobscura. However, the use of synonymous sites as the neutral class in the maximum-likelihood approach (Eyre-Walker & Keightley, 2009) to infer both the distribution of fitness effects of deleterious nonsynonymous mutations and the fraction of adaptive nonsynonymous substitutions might be questioned. Indeed, γ estimates in Table 3 indicate purifying selection on synonymous mutations in D. subobscura (O3+4), which can be related to the maintenance of codon bias as the mutations from preferred to unpreferred codons present the same pattern (Akashi, 1995).
On the other hand, it is interesting to note that the distribution of fitness effects differs substantially between the two arrangements of D. subobscura. Indeed, 98% of mutations fall into the class 10 < |Nes| < 100 in D. subobscura (OST).
Table 3. Tajima’s D and scaled selection coefficient (γ) estimates according to the frequency distribution (fd) method.
[Table 3 body not recovered; surviving column headers: Number of polymorphic sites, P (γ = 0).]
†P→U, polymorphic changes from preferred to unpreferred codons inferred according to the Drosophila melanogaster codon preferences table (Akashi, 1995). Only polymorphic synonymous changes that could be unambiguously polarized using the sequence of Drosophila pseudoobscura as the outgroup were considered.
‡*P < 0.05.
§Confidence interval at the 95% level.
[Table 3 row groups: Drosophila subobscura (O3+4), D. subobscura (OST).]
The OST and O3+4 arrangements of D. subobscura are genetically differentiated (P < 0.001) in the different gene regions distributed along the O3 inversion studied here (Munté et al., 2005). The permutation test (Hudson et al., 1992a) also indicated a significant genetic differentiation between D. madeirensis and D. subobscura (either OST or O3+4) when regions are independently analysed and for the concatenated data set (KS* = 4.11 and KS* = 4.33 in the D. madeirensis vs. OST and D. madeirensis vs. O3+4 comparison, respectively; P < 0.001 in both cases). The DXY and DA estimates were used to infer the level of genetic differentiation between species (Table 4). In each region and in the concatenated data set, the level of genetic differentiation between D. madeirensis and O3+4 was higher than between D. madeirensis and OST.
Table 4. Genetic differentiation and gene flow between Drosophila subobscura (OST or O3+4) and Drosophila madeirensis.
*Lines M27 and M54 of D. madeirensis were excluded from the analysis of the P2 region because they lacked the sequences from nucleotide 1900 to 2720 (M27) and from 1680 to 2720 (M54) available in D. subobscura and the other D. madeirensis lines.
†Concatenated data set including all regions, except rp49 (see Table 1).
As expected from the significant genetic differentiation between species, estimates of gene flow (Nm) were rather low (Table 4). The observed number of shared polymorphisms can be explained by recurrent mutation in most genomic regions and in the concatenated data set (Table 5), which is also consistent with restricted gene flow between species. Only at the P154 (OST–D. madeirensis) and rp49 (O3+4–D. madeirensis) regions was the number of shared polymorphisms slightly higher than expected. Indeed, the maximum number of shared polymorphisms expected from recurrent mutation was three at P154 and four at rp49, which is lower than the four and five shared polymorphisms observed in each comparison, respectively. In addition, no introgressed haplotypes between D. subobscura and D. madeirensis were detected in five of the seven regions studied, and in the remaining two regions only very short tracts were identified (17 nt long at P154 and 15 nt long at P22). Therefore, it can be inferred that in the regions studied, neither introgression nor the presence of ancestral polymorphisms has greatly contributed to increasing the number of shared polymorphisms between species.
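The logic of the recurrent mutation test can be sketched with the hypergeometric distribution: if polymorphic sites arose independently in the two species, the number of sites polymorphic in both follows a hypergeometric law. The sketch below is a simplification and may differ in detail from the procedure of Rozas & Aguadé (1994); in particular it ignores the requirement that the same variants segregate in both species and any heterogeneity in mutation rate among sites. The function name and parameters are hypothetical.

```python
from math import comb


def prob_shared_at_least(k, n_sites, s1, s2):
    """P(X >= k) for the number X of sites polymorphic in both samples.

    n_sites: number of sites compared
    s1, s2:  polymorphic sites in sample 1 and sample 2
    The s2 sites are assumed to fall at random among the n_sites positions,
    s1 of which are already polymorphic in sample 1.
    """
    total = comb(n_sites, s2)
    upper = min(s1, s2)
    favourable = sum(
        comb(s1, x) * comb(n_sites - s1, s2 - x) for x in range(k, upper + 1)
    )
    return favourable / total


# Hypothetical example: 10 sites, 3 polymorphic in each sample.
p_one_or_more = prob_shared_at_least(1, 10, 3, 3)
```

The largest k for which this probability stays above the chosen significance level plays the role of the maximum number of shared polymorphisms expected from recurrent mutation.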
Table 5. Population parameter estimates according to the isolation model (Wakeley & Hey, 1997).
SF, fixed differences; SS, shared polymorphic sites; SX1, exclusive polymorphic sites in D. madeirensis; SX2, exclusive polymorphic sites in OST or O3+4; ψ statistic (Wakeley, 1996a,b); P-value, P[ψ ≥ ψobserved] after 1000 simulations. θ1 (Drosophila subobscura OST or O3+4), θ2 (D. madeirensis), θA (ancestral population) and τ estimates obtained according to the isolation model with the WH program.
†Recurrent mutation test based on the hypergeometric distribution. *P < 0.05; ns, not significant.
‡Lines M27 and M54 of D. madeirensis were excluded from the analysis in this region (see Table 4).
§Concatenated data set including all regions, except rp49 (see Table 1).
The gene genealogy of the D. subobscura and D. madeirensis lines was inferred from total variation in the concatenated data set. Fig. 3 shows the neighbour-joining tree reconstructed using D. pseudoobscura as the outgroup. Lines group with a high bootstrap support in three well-defined clusters: D. madeirensis, D. subobscura OST and D. subobscura O3+4. This result is consistent with the significant genetic differentiation detected between D. madeirensis and the D. subobscura OST and O3+4 arrangements. It is also consistent with a restricted genetic exchange between the OST and O3+4 arrangements of D. subobscura. However, the branching order of the three main clusters detected (D. madeirensis, OST and O3+4) is not well resolved in the inferred genealogy as the bootstrap support of the corresponding nodes is low.
Gene genealogies reconstructed from each genomic region independently also support the presence of the three main clusters with rather high bootstrap support (Fig. S2; Supporting Information). However, the branching pattern of the D. madeirensis, OST and O3+4 clusters differs from that inferred from the concatenated data set in some cases. Indeed, only rp49 and P22 rendered the same topology as in Fig. 3 relative to the three main clusters. According to nucleotide variation at S1 and S25, the D. madeirensis lines cluster with the O3+4 lines. In contrast, the OST and O3+4 D. subobscura lines form a monophyletic group in the gene genealogies reconstructed from variation at P21, P2 and P154. Nevertheless, bootstrap values supporting the branching order of the three main clusters are also low in the gene genealogies based on individual genomic regions.
Speciation model tests
The ψ-test statistic (Wakeley, 1996b) was used to test whether the data were consistent with the isolation model of speciation, in which an ancestral population splits into two descendant populations without migration. The isolation model was not rejected for any of the studied regions or for the concatenated data set (Table 5). The multilocus tests based on both the WH (Wang et al., 1997) and the χ2 (Kliman et al., 2000) test statistics also failed to reject the isolation model of speciation (Table 6). The population parameters θ1, θ2, θA and τ were estimated as proposed by Wakeley & Hey (1997) under the isolation model in comparisons between D. madeirensis and D. subobscura (Table 5). Estimates of θ (θ = 4Nμ) in each region and in the concatenated data set are lower in D. madeirensis than in the D. subobscura O3+4 arrangement but higher than in OST. Assuming a similar mutation rate across species, this result would indicate that D. madeirensis has an effective population size intermediate between those of O3+4 and OST. Estimates of τ in Table 5 would indicate that the divergence time between D. madeirensis (with the ancestral O3 arrangement) and O3+4 is greater than that between D. madeirensis and OST.
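The site categories on which these isolation-model tests rely (SF, SS, SX1 and SX2 in Table 5) can be illustrated with a minimal sketch. It is not the WH program: the function name classify_sites and the toy sequences are hypothetical, and the sketch assumes aligned, gap-free sequences and ignores multiple hits at a site.

```python
def classify_sites(sample1, sample2):
    """Count site categories between two aligned samples of sequences.

    SF  : fixed differences (each sample monomorphic, for different bases)
    SS  : sites polymorphic in both samples
    SX1 : sites polymorphic only in sample1
    SX2 : sites polymorphic only in sample2
    Assumes equal-length, gap-free sequences (illustrative simplification).
    """
    counts = {"SF": 0, "SS": 0, "SX1": 0, "SX2": 0}
    # zip(*sample) walks the alignment column by column.
    for col1, col2 in zip(zip(*sample1), zip(*sample2)):
        alleles1, alleles2 = set(col1), set(col2)
        poly1, poly2 = len(alleles1) > 1, len(alleles2) > 1
        if poly1 and poly2:
            counts["SS"] += 1
        elif poly1:
            counts["SX1"] += 1
        elif poly2:
            counts["SX2"] += 1
        elif alleles1 != alleles2:
            counts["SF"] += 1
    return counts


# Toy alignment (hypothetical data, not from the study).
sample_a = ["ACGTA", "ACGTT", "ACGAA"]
sample_b = ["GCGTA", "GCCTT", "GCCTA"]
print(classify_sites(sample_a, sample_b))
```

Under the isolation model, shared polymorphisms can only be inherited from the ancestral population, so an excess of SS relative to SF is what tests of this kind treat as evidence against strict isolation.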
Table 6. Multilocus analysis according to the isolation model.
The isolation model without migration was also supported by the Markov chain Monte Carlo approach implemented in the IM program. Indeed, the marginal posterior probability distribution of the migration rate revealed a peak at the lower limit of resolution in comparisons between D. madeirensis and D. subobscura (OST or O3+4). Maximum-likelihood estimates of different population parameters inferred by the IM program are summarized in Table 7. The ratio θ1/θ2 in both comparisons is similar to, although slightly higher than, that inferred from the concatenated data set in Table 5 (0.65 vs. 0.71 for D. madeirensis–OST and 1.14 vs. 1.37 for D. madeirensis–O3+4), which also supports the inference that the effective size of D. madeirensis is lower than that of O3+4 but higher than that of OST. However, maximum-likelihood estimates of θA relative to either θ1 or θ2 are much lower than estimates in Table 5, which may be a consequence of the different assumptions in the IM changing population size model (as implemented in the IM program) and the isolation model (WH program). The IM changing population size model was also used to obtain maximum-likelihood estimates of the elapsed time since the split of D. madeirensis and either OST or O3+4 (Table 7). Although the marginal posterior probability distributions of the t parameter in the two comparisons partly overlap, the results also suggest that the divergence time between D. madeirensis (with the ancestral O3 arrangement) and O3+4 is greater than that between D. madeirensis and OST.
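The arithmetic behind these comparisons can be sketched as follows, on two assumptions: that θ = 4Nμ holds with a mutation rate shared by both populations, and that the divergence time is reported scaled by the per-locus mutation rate, so that dividing by a rate per year gives years. The function names and the scaled time used below are hypothetical. Only the two mutation rates (from the Table 7 footnote) and the θ1/θ2 ratios quoted above come from the text.

```python
# Mutation rates per locus per year quoted in the Table 7 footnote.
MU_PER_LOCUS_PER_YEAR = {"OST": 3.88e-6, "O3+4": 4.17e-6}


def ne_from_theta(theta, mu):
    """Effective size implied by theta = 4 * N * mu (theta and mu in the same units)."""
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    return theta / (4.0 * mu)


def scaled_time_to_years(t_scaled, mu_per_year):
    """Convert a mutation-scaled time (t * mu) into years."""
    if mu_per_year <= 0:
        raise ValueError("mutation rate must be positive")
    return t_scaled / mu_per_year


# With a shared mutation rate, theta1/theta2 equals N1/N2, so the ratios
# quoted in the text translate directly into relative effective sizes.
for comparison, ratio in {"OST": 0.71, "O3+4": 1.37}.items():
    print(f"N({comparison}) / N(D. madeirensis) = {ratio:.2f}")

# Hypothetical scaled time, chosen only to show the unit conversion.
t_scaled_example = 3.0
years = scaled_time_to_years(t_scaled_example, MU_PER_LOCUS_PER_YEAR["OST"])
print(f"t = {t_scaled_example} (scaled) corresponds to about {years:,.0f} years")
```

Because the conversion is a simple division, any uncertainty in the assumed mutation rate carries over proportionally to the estimated time in years.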
Table 7. Maximum-likelihood estimates according to the IM changing population size model.
θ1 (Drosophila subobscura OST or O3+4), θ2 (D. madeirensis), θA (ancestral population) and t estimates obtained according to the IM program.
*Values of t (years) were estimated assuming a mutation rate per locus per year of 3.88 × 10−6 (OST–D. madeirensis) and 4.17 × 10−6 (O3+4–D. madeirensis).
‡HPD, highest posterior density interval.
582 614–904 856
610 744–934 079
Natural populations of D. madeirensis are restricted to the Laurisilva forest of Madeira Island. Island-endemic species are expected to have a low effective population size and thus reduced genome-wide variation as compared to abundant species with a wide distribution range. However, levels of nucleotide diversity in D. madeirensis (πsil = 0.0087 for the concatenated data) are within the range of those in D. subobscura, not only for the genomic regions analysed here (πsil = 0.0080 in OST and πsil = 0.0100 in O3+4, Table 2), but also for different D. subobscura X-linked regions (πsil = 0.0072; Nóbrega et al., 2008). The first intron of the X-linked RpII215 gene also presents similar levels of nucleotide variation in both species (Llopart & Aguadé, 2000; Khadem, unpublished data). In addition, silent nucleotide variation in D. madeirensis is only slightly lower than in the continental species D. pseudoobscura (πsil = 0.0105 for different genes of the second chromosome, which is not affected by polymorphic inversions; Hamblin & Aquadro, 1999). When compared to other endemic species, silent nucleotide diversity in D. madeirensis is much higher than in D. sechellia (πsil = 0.0011; Legrand et al., 2009) and also higher than in the single gene so far analysed in Drosophila guanche (πsil = 0.0059; Pérez et al., 2003), an endemic species of the Canary Islands that is also closely related to D. subobscura. Therefore, it would seem that the historical effective size of D. madeirensis has not been extremely low, although the highly significant excess of singletons indicates that variation is not at mutation–drift equilibrium.
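For readers unfamiliar with the statistic compared above, nucleotide diversity (π) is the average number of pairwise differences per site in a sample. The following minimal sketch computes it over all sites of a toy alignment. The function name and the sequences are hypothetical, and the πsil values quoted in the text are restricted to silent sites, a filtering step this sketch does not attempt.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site for aligned, gap-free sequences."""
    if len(sequences) < 2:
        raise ValueError("at least two sequences are required")
    length = len(sequences[0])
    if length == 0 or any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(sequences, 2))
    # Count mismatching positions in every pair of sequences.
    total_differences = sum(
        sum(base_a != base_b for base_a, base_b in zip(seq_a, seq_b))
        for seq_a, seq_b in pairs
    )
    return total_differences / (len(pairs) * length)


# Toy alignment (hypothetical data, not from the study).
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
```

An excess of singletons, as reported for D. madeirensis, adds many segregating sites but few pairwise differences, which is why π alone does not capture departures from mutation–drift equilibrium.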
According to the nearly neutral model of molecular evolution, slightly deleterious mutations may segregate in populations at low frequencies when the effective population size is low (Ohta, 1992). In small populations, selection can be counteracted by genetic drift, which lengthens the sojourn time of mildly deleterious mutations in the population. The scaled selection coefficients estimated for nonsynonymous mutations and Tajima’s D tests applied to this class of mutations indicate that purifying selection is less effective in D. madeirensis than in D. subobscura (Table 3). In addition, the distribution of fitness effects of nonsynonymous mutations is slightly shifted towards more weakly selected mutations (i.e. 0 < |Nes| < 100) in D. madeirensis relative to D. subobscura (O3+4). Selection against nonsynonymous mutations would seem to have been less effective in D. madeirensis, which would be consistent with a stronger effect of genetic drift in this species and thus with the expected difference in effective population size between D. madeirensis and D. subobscura.
The closely related species D. madeirensis and D. subobscura are not completely isolated, as viable and fertile hybrids are recovered in the laboratory (Krimbas & Loukas, 1984). Although reproductive isolation between species in the laboratory may differ from that in their natural habitats (Counterman & Noor, 2006), hybridization between D. madeirensis and D. subobscura might occur in nature at present. However, the present multilocus approach indicates that nucleotide variation in the genomic regions analysed is consistent with the speciation model of isolation without migration, and confirms that the studied regions are especially suitable to trace the history of these species. Drosophila madeirensis likely arose in allopatry by a speciation process in Madeira after ancestral O3 D. subobscura populations colonized the island (Fig. 4). Current populations of D. subobscura in Madeira with a high frequency of the O3+4 arrangement would thus be the result of a second colonization event (Khadem et al., 2001). O3+4 appears as a basal cluster in the inferred gene genealogy, although with very low bootstrap support. Indeed, O3+4 is a rather old arrangement according to its standing level of variation (Rozas & Aguadé, 1994; Navarro-Sabaté et al., 1999; Rozas et al., 1999). O3+4 might have its origin in Iran or Asia Minor, where it currently reaches its highest frequency (Krimbas & Powell, 1992). After its origin, the new arrangement would have expanded westwards into Europe, where its frequency is lowest in the north. A putative scenario would thus be that O3+4 was not present in either Europe or North Africa when the ancestral O3 D. subobscura population colonized Madeira. It is even likely that the origin of OST, which according to the previously estimated ages of both arrangements (Rozas & Aguadé, 1994; Navarro-Sabaté et al., 1999; Rozas et al., 1999) would be younger than O3+4, occurred in Europe after the first colonization of Madeira Island.
Reproductive isolation between D. madeirensis and D. subobscura would have arisen before the second colonization event and was likely reinforced when both species coexisted. The lack of haplotypes introgressed from D. subobscura into D. madeirensis supports the view that the gene pools of the two species have evolved independently in the studied regions. However, interspecific hybridization between D. madeirensis and D. subobscura in nature after the second colonization event cannot be completely ruled out according to the present results. Putative interspecific hybrids would be O3/O3+4 heterokaryotypes. Therefore, the presence of inversion 4 in hybrids would have prevented genetic exchange in the studied regions. Only the analysis of variation at nuclear regions not affected by inversions, and thus collinear between D. madeirensis and D. subobscura, might help to elucidate the extent of introgression between the two species in Madeira. The study of variation in the mtDNA genome might also be informative, as more extensive gene flow in mitochondrial than in nuclear genes has been detected in the D. yakuba species group (Llopart et al., 2005; Bachtrog et al., 2006). However, preliminary data on mtDNA variation in D. madeirensis (Khadem, unpublished results) do not support introgression from D. subobscura and indicate genetic differentiation between the two species at the organelle genome, as previously reported in phylogenetic studies (Barrio et al., 1994; Gleason et al., 1997; Brehm et al., 2001).
Drosophila madeirensis inhabits the Laurisilva forest, which, although it formerly covered much of Madeira, at present extends over an area of 150 km2, which represents 20% of the island surface. Therefore, the natural habitat of the species has been progressively reduced in the recent past by human pressure. Average heterozygosities are lower in species with a high extinction risk than in their unthreatened relatives (Spielman et al., 2004). However, present data indicate that the endemic species D. madeirensis harbours a rather high level of nucleotide polymorphism and a pattern of variation characterized by a significant excess of low-frequency variants. Therefore, evolutionary factors that reduce variation (i.e. genetic drift and inbreeding) have not so far had a strong effect in eroding the genetic variability of the species. However, ecological factors, such as global warming and the risk of damage to the Laurisilva forest, might threaten the survival of D. madeirensis.
We thank J. Rozas for critical comments on a previous version of the manuscript and the anonymous reviewers for their constructive comments. We also thank Serveis Científico-Tècnics, Universitat de Barcelona, for automated DNA sequencing facilities. This work was supported by grant POCTI/BSE/43097/2001 from the Foundation for Science and Technology (FCT), Portugal, to M.K. and by grants BFU2004-2253 and BFU2007-63228 from Comisión Interdepartamental de Ciencia y Tecnología, Spain, and 2005SGR-166 and 2009SGR-1287 from Comissió Interdepartamental de Recerca i Innovació Tecnològica, Catalonia, Spain, to M.A.