STRUCTURE AND POPULATION GENETICS OF THE BREAKPOINTS OF A POLYMORPHIC INVERSION IN DROSOPHILA SUBOBSCURA

Authors

  • Montserrat Papaceit,

    1. Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, i Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
  • Carmen Segarra,

    1. Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, i Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
  • Montserrat Aguadé

    1. Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, i Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
    2. E-mail: maguade@ub.edu

  • This paper is dedicated to the memory of Professor Antoni Prevosti.

Abstract

Drosophila subobscura is a Palearctic species of the obscura group with a rich chromosomal polymorphism. To further our understanding of the origin of inversions and of how they regain variation, we have identified and sequenced the two breakpoints of a polymorphic inversion of D. subobscura (inversion 3 of the O chromosome) in a population sample. The breakpoints were identified as two rather short fragments (∼300 bp and 60 bp long) with no similarity to any known transposable element family or repetitive sequence. The presence of the ∼300-bp fragment at both breakpoints of inverted chromosomes implies its duplication, which indicates that the inversion originated via staggered double-strand breaks. The present results and previous findings support the view that the mode of origin of inversions is neither related to inversion age nor specific to a species group. If gene conversion were greatly restricted at inversion breakpoints, breakpoint regions would be expected, even in moderately small inversions, to show lower variation within arrangements and stronger genetic differentiation between arrangements than more internal regions; the breakpoint regions studied here do not consistently show this pattern. Comparison of the proximal breakpoint region in species of the obscura group shows that this breakpoint lies in a small high-turnover fragment within a long collinear region (∼300 kb).

Structural variation has played an important role in chromosomal evolution, as initially revealed by cytogenetic studies (e.g., as reviewed in Powell 1997 for Drosophila) and more recently through the comparison of complete genome sequences from both distantly and closely related species (e.g., Pevzner and Tesler 2003; Bhutkar et al. 2008; Lee et al. 2008; von Grotthuss et al. 2010). Among structural variants, paracentric inversions have greatly contributed to within-chromosome reorganization, whereas translocations and chromosome fusions underlie most between-chromosome reorganizations, with transpositions having played a minor role in this context (Ranz et al. 2001; see also Conceição and Aguadé 2008). At the intraspecific level, chromosomal inversion polymorphism has been extensively studied at the cytogenetic level in Drosophila (Powell 1997), where it is widespread in some species (e.g., D. melanogaster, D. pseudoobscura, D. subobscura, and D. buzzatii) and absent, or nearly absent, in others (e.g., D. simulans). Inversion polymorphism is, however, not restricted to Drosophila and other insects, in which its study first became possible owing to the presence of polytene chromosomes in some of their organs. Indeed, the availability of molecular markers has allowed the relatively recent identification and characterization of segregating inversions in taxa as diverse as birds and plants (Thomas et al. 2008; Lowry and Willis 2010). Inversions segregating at intermediate frequencies have also been detected in human populations (Feuk et al. 2005; Stefansson et al. 2005; Bansal et al. 2007).

As for any mutation, the inversions detected through the comparison of extant taxa (that is, those fixed between species or segregating within species) constitute a small subset of all the inversions that originated in the past. Most inversions are lost by drift soon after their origin, independently of their effect on fitness. Adaptive inversions that escape loss by drift can rapidly increase in frequency and either become established at an intermediate frequency (balanced polymorphism) or become fixed. In contrast, the frequency of inversions that do not affect the fitness of their bearers drifts slowly on their way to either fixation or loss. There is evidence in multiple Drosophila species, as well as in other organisms, for the adaptive character of inversion polymorphism, even if the underlying mechanisms are generally unknown (Powell 1997).

Concerning the origin of inversions, two mechanisms have been proposed based on available data (Ranz et al. 2007). First, inversions can originate through recombination between distant, inverted copies of a particular transposable element or repetitive sequence. Alternatively, inversions can originate via staggered double-strand breaks at distant locations on a particular chromosome. For relatively young inversions, the presence of transposable elements at both inversion breakpoints would support the first mechanism (henceforth mechanism 1), whereas their absence, together with the presence of duplicated fragments at one or both inversion breakpoints, would support the second (henceforth mechanism 2). Only a few polymorphic inversions have had both breakpoints identified and sequenced. At this timescale, species seem to differ in how their polymorphic inversions originated. Data from Anopheles species (Mathiopoulos et al. 1998; Lobo et al. 2010) and from both D. buzzatii (Cáceres et al. 1999; Casals et al. 2003; Delprat et al. 2009) and D. pseudoobscura (Richards et al. 2005) would point to mechanism 1, owing to the presence of transposable elements and other repetitive sequences at the breakpoints. Indeed, polymorphic inversion breakpoints in D. pseudoobscura seem enriched in two different medium-sized repetitive sequences (Richards et al. 2005). In contrast, data from D. melanogaster (Wesley and Eanes 1994; Andolfatto et al. 1999; Matzkin et al. 2005) would favor mechanism 2, because duplications, but no transposable elements, were detected at the characterized breakpoints. At the longer timescale (i.e., for inversions fixed between species), there is little evidence for mechanism 1, because transposable elements have generally not been detected at fixed inversion breakpoints (Cirera et al. 1995; Sharakhov et al. 2006; Prazeres da Costa et al. 2009; Runcie and Noor 2009). Transposable elements could, however, have been lost once the inversions became fixed. On the other hand, support for mechanism 2 emerged from the comparison of the D. melanogaster and D. yakuba genome sequences (Ranz et al. 2007), which revealed an enrichment of duplicated fragments at within-chromosome rearrangement breakpoints.

Although the number of polymorphic inversion breakpoints characterized at the sequence level for any particular clade of Drosophila species is too low to draw any conclusion about the different modes of origin, the available data would point to staggered breaks in D. melanogaster and to repetitive sequences in D. pseudoobscura and D. buzzatii (see, however, Calvete et al. 2012). If the observed trend did hold within species, it would also be important to establish whether modes of origin are species specific or clade specific.

In a first effort to address this question in the obscura group, we identified and sequenced the two breakpoints of a polymorphic inversion of D. subobscura: inversion 3 of the O chromosome (i.e., of Muller's element E). The breakpoints of this inversion had been mapped to sections 91B/C and 94E/95A of the Ost cytological map (Künze-Mühl and Müller 1958; Fig. 1). Given its physical (∼3.5 Mb) and recombination (27.4 cM) lengths (Munté et al. 2005), inversion 3 can be considered a rather short inversion. Inversion 3 originated on the ancestral O3 arrangement and gave rise to the Ost chromosomal arrangement (Fig. 1). A second arrangement (O3 + 4), also derived from O3 through a single inversion (Inv 4 in Fig. 1), segregates together with Ost in extant populations of this species. The ancestral O3 arrangement went extinct in the D. subobscura lineage after the origin of the Ost and O3 + 4 chromosomal arrangements. We therefore identified the breakpoint regions in a homokaryotypic O3 + 4 line, because the breakpoint regions present in the ancestral O3 arrangement (named AB and CD in Fig. 1) are maintained in extant O3 + 4 chromosomes, except for the reversed order of the distal breakpoint (DC in Fig. 1). Sequence comparison of the two breakpoints of inversion 3 between Ost and O3 + 4 chromosomes indicates that inversion 3 of D. subobscura most likely originated via staggered double-strand breaks, which contrasts with the origin of the arrowhead inversion of D. pseudoobscura via ectopic recombination between repetitive sequences (Richards et al. 2005). This would imply that the association detected in the D. pseudoobscura lineage between inversion breakpoints and two families of repetitive sequences is not a characteristic of the obscura group but is specific to the pseudoobscura lineage.

Figure 1.

Schematic representation of the chromosome regions affected by inversions 3 and 4 in Drosophila subobscura. The ancestral O3 chromosomal arrangement, which went extinct in this species, is represented in the central part of the figure. The derived and extant Ost and O3 + 4 arrangements are represented above and below the ancestral O3 arrangement. Bars labeled Inv 3 and Inv 4 indicate the extent of the fragments affected by inversions 3 and 4. The rectangular boxes (A, B, C, D) represent the regions flanking inversion 3 breakpoints. Gray boxes in the derived Ost and O3 + 4 arrangements indicate regions whose location has been affected by inversions 3 and 4, respectively. The cytological location of the breakpoints in Ost is indicated below the corresponding regions. Markers flanking the breakpoints of inversion 3 (P2, Acph, S1) are indicated above the chromosome schemes. Cen, centromere; Tel, telomere.

Moreover, our resequencing of the fragments encompassing each breakpoint in a moderately large sample of Ost and O3 + 4 isochromosomal lines extracted from a Spanish natural population (Rozas et al. 1995) has allowed us to evaluate the evolutionary forces that have shaped variation at the breakpoints of both chromosomal arrangements.

Materials and Methods

DROSOPHILA STRAINS

Nineteen D. subobscura lines isochromosomal for the O chromosome were used: 10 O3 + 4 and 9 Ost. These lines, from El Pedroso (Spain), are a subset of those used in previous studies (Rozas et al. 1995; Navarro-Sabaté et al. 1999; Munté et al. 2005). A highly inbred line of each of D. madeirensis and D. guanche, obtained by more than 10 generations of sib mating (Khadem et al. 1998; Pérez et al. 2003), was also used.

PCR AMPLIFICATION AND SEQUENCING

Genomic DNA was extracted from frozen individuals of each isochromosomal line of D. subobscura and from one individual of a highly inbred line of each of D. madeirensis and D. guanche, using a modification of protocol 48 in Ashburner (1989). Oligonucleotides for PCR amplification and sequencing were designed based on the comparison of the genome sequences of D. melanogaster and D. pseudoobscura, and using D. subobscura sequences whenever available (unpubl. results, Barcelona Subobscura Initiative or BSI). Sequences of the primers used for PCR amplification and sequencing are available from the authors upon request. Different Taq polymerases (TaKaRa DNA polymerase from Takara Bio Inc. and GoTaq DNA polymerase from Promega) were used depending on the expected length of the fragment to be amplified. The amplified fragments were purified with MultiScreen PCR (Millipore) and used directly as templates for sequencing with the ABI PRISM version 3.2 cycle sequencing kit (Applied Biosystems, Foster City, CA) according to the manufacturer's conditions. Sequencing products were separated on an ABI PRISM 3730 sequencer (PerkinElmer, Norwalk, CT). Sequences were assembled using the DNASTAR package (Burland 2000) and multiply aligned with MAFFT version 6.864 (Katoh and Toh 2008). Multiple sequence alignments were edited with MacClade version 3.06 (Maddison and Maddison 1992). All sequences were obtained on both strands. The newly obtained sequences have been deposited in the EMBL/GenBank Data Library under accession numbers HE614146–HE614186.

IN SITU HYBRIDIZATION

Polytene chromosome preparations of D. subobscura were made according to Montgomery et al. (1987). The fragments amplified by PCR using DNA from an isochromosomal O3 + 4 D. subobscura line (J16 in Figs. S1–S5) were gel-band extracted with the QIAquick kit (Qiagen) prior to labeling for in situ hybridization. Probes were obtained by biotin-16-dUTP labeling of the purified PCR amplicons through nick translation. Prehybridization, hybridization, and detection were as described in Montgomery et al. (1987), using the ABC-Elite Vector Laboratories kit for detection and a hybridization temperature of 37°C. Digital images were obtained at 400× magnification using a Zeiss Axioskop 2 phase-contrast microscope and a Leica DFC290 camera. The location of the hybridization signals was determined using the cytological map of D. subobscura (Künze-Mühl and Müller 1958) with the standard arrangement for all chromosomes. The length and location of the different fragments used as probes are given in Table S1.

SEQUENCE ANALYSIS

Sequences of the regions that span the breakpoints of inversion 3 in D. subobscura O3 + 4 chromosomes were compared to the D. pseudoobscura genome sequence through the discontiguous megablast algorithm (http://blast.ncbi.nlm.nih.gov/). Moreover, they were analyzed with RepeatMasker (http://www.repeatmasker.org/) to identify any repetitive sequences.

DnaSP version 5.10.01 (Librado and Rozas 2009) was used for most analyses of intraspecific and interspecific variation. Nucleotide polymorphism was estimated as the number of segregating sites (S), the minimum number of mutations (η), nucleotide diversity (π; Nei 1987), the number of haplotypes (h), and haplotype diversity (Hd; Nei 1987). Insertion-deletion (indel) polymorphism was characterized by the number of indels (I), the average indel length (IL), and indel diversity per site (ID). Tajima's (1989) D statistic was used as a summary of the frequency spectrum. The level of genetic differentiation between chromosomal arrangements was estimated as DXY (Nei 1987) and FST (Hudson et al. 1992a), and its significance was established using the KS* test statistic (Hudson et al. 1992b). Gene conversion tracts were inferred according to Betrán et al. (1997). Interspecific divergence was estimated as the number of nucleotide substitutions per site (K), using D. madeirensis as the outgroup and correcting for multiple hits according to Jukes and Cantor (1969). Gene genealogies were reconstructed by the neighbor-joining method as implemented in MEGA version 5.05 (Tamura et al. 2011). A goodness-of-fit test (χ²L; Kreitman and Hudson 1991) was used to test whether levels of nucleotide variation differed among regions. Computer simulations conditioned on the number of segregating sites, under the conservative assumption of no recombination, were used to obtain confidence intervals for nucleotide diversity estimates; these intervals were used to assess putative differences between chromosomal arrangements.
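To show how the polymorphism summaries listed above relate to an alignment, the following Python sketch computes S, η, π, h, Hd, and Tajima's D for a set of aligned sequences. It is an illustration under stated assumptions, not the DnaSP implementation used in this study: alignment columns containing a gap or ambiguous base in any sequence are removed from all sequences (complete deletion), and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def drop_gap_columns(seqs):
    """Remove alignment columns with a gap or non-ACGT symbol in any sequence
    (complete deletion; an assumption of this sketch)."""
    keep = [i for i in range(len(seqs[0])) if all(s[i] in "ACGT" for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]


def diversity_stats(seqs):
    """Hypothetical helper returning L, S, eta, pi, h, Hd and Tajima's D."""
    seqs = drop_gap_columns([s.upper() for s in seqs])
    n, length = len(seqs), len(seqs[0])
    columns = list(zip(*seqs))
    # S: segregating sites; eta: minimum number of mutations (variants per site minus one).
    s = sum(1 for col in columns if len(set(col)) > 1)
    eta = sum(len(set(col)) - 1 for col in columns)
    # k: mean number of pairwise differences; pi = k / L (Nei 1987).
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Hd = n / (n - 1) * (1 - sum of squared haplotype frequencies) (Nei 1987).
    freqs = [c / n for c in Counter(seqs).values()]
    hd = n / (n - 1) * (1 - sum(f * f for f in freqs))
    # Tajima's (1989) D; its variance is zero for n = 3, so it is computed for n >= 4.
    d = float("nan")
    if n >= 4 and s > 0:
        a1 = sum(1 / i for i in range(1, n))
        a2 = sum(1 / i**2 for i in range(1, n))
        b1 = (n + 1) / (3 * (n - 1))
        b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
        c1 = b1 - 1 / a1
        c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
        e1, e2 = c1 / a1, c2 / (a1**2 + a2)
        d = (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return {"L": length, "S": s, "eta": eta, "pi": k / length,
            "h": len(freqs), "Hd": hd, "D": d}
```

For example, diversity_stats(["AAAA", "AAAT", "AATT", "ATTT"]) gives S = 3, π ≈ 0.417, Hd = 1.0, and D ≈ 0.17.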

Results

CHARACTERIZATION OF THE REGIONS SPANNING INVERSION 3 BREAKPOINTS IN O3 + 4 CHROMOSOMES

The starting point for the identification of inversion 3 breakpoints of D. subobscura (Fig. 1) consisted of three molecular markers that had been previously mapped close to the inversion breakpoints (Segarra et al. 1996; Munté et al. 2005). Two markers flanked the distal breakpoint on O3 chromosomes (CD in Fig. 1): Acph and S1, located inside (at section 91C) and outside the inverted region (at section 95A), respectively, as represented in Figure 1. The third marker (P2 in Fig. 1) mapped at section 94E and was therefore close to the internal part of the proximal breakpoint (AB in Fig. 1). Comparison of the D. subobscura sequences of markers S1 and P2 with the available genome sequence of D. pseudoobscura (Richards et al. 2005) had revealed that these markers were orthologous to the GA26879 and Abdominal A (AbdA) genes (Munté et al. 2005), respectively.

We used the D. pseudoobscura and D. melanogaster genome sequences (Richards et al. 2005; Tweedie et al. 2009) to anchor our mapping efforts. Sequence comparison revealed that these genomes are collinear with the O3 + 4 chromosome over several hundred kilobases at both inversion breakpoints (AB and CD in Fig. 1). Indeed, the ∼400-kb-long fragment delimited by Acph and GA26879 (S1 marker in Fig. 1) is collinear between these distantly related species, and most likely also between D. pseudoobscura and D. subobscura. In addition, an ∼300-kb-long region spanning the AbdA gene (marker P2 in Fig. 1) is collinear between those species. We obtained probes by PCR amplification using DNA from an O3 + 4 line, hybridized these probes in situ to Ost polytene chromosomes, and determined the location of each signal using the Ost cytological map (Künze-Mühl and Müller 1958). The sequential strategy depicted in Figure 2 was used to narrow down the location of each of the two breakpoints of inversion 3.

Figure 2.

Schematic representation (not to scale) of the experimental strategy used to identify the region spanning each of the two breakpoints of inversion 3. Markers flanking these breakpoints are indicated above the chromosome scheme. Cen, centromere; Tel, telomere.

In the case of the distal breakpoint (CD in Fig. 1), two fragments located ∼270 kb and ∼190 kb from the Acph region in D. pseudoobscura (probes 3a and 3b in Fig. 2, respectively) were used as probes in the first round of in situ hybridizations on Ost polytene chromosomes (Fig. 3A and B). Both probes gave a strong signal at section 95A. Their colocalization with marker S1 indicates that they are also outside the inverted region and, therefore, that the breakpoint lies in the fragment flanked by Acph and probe 3b (Fig. 2). In the second round, two new fragments were used as probes, of which probe 3d mapped at section 95A (Fig. 3C) and probe 3e at section 91C (Fig. 3D). Thus, the fragment spanning the inversion breakpoint could be narrowed down to the ∼50-kb-long region separating probes 3d and 3e (Fig. 2). Three new fragments within this region were used as probes in the third and final round (Fig. 2). Two probes (3g and 3h) gave a single signal at section 91C (Fig. 3E and F), whereas the last probe (3f) gave multiple strong signals (see below) that included the locations of the two cytological breakpoints of inversion 3 (subsections 91B/C and 94E/95A; Fig. 3G). As probe 3f (∼7.5 kb long) almost completely covered the region between probes 3g and 3d, it most likely spanned the distal breakpoint of inversion 3 (Figs. 1 and 2). This was later confirmed using a shorter probe [3f(s)] that clearly hybridized at both inversion breakpoints (Fig. 3H).
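The reasoning behind each hybridization round can be summarized as a bracketing rule: with probes ordered by position in the collinear D. pseudoobscura reference, the breakpoint lies between the last probe that hybridizes inside the inverted region (section 91C) and the first probe that hybridizes outside it (section 95A), unless a probe hybridizes at both inversion ends, in which case that probe spans the breakpoint. The Python sketch below encodes this rule. The function name and signal labels are hypothetical, and apart from the approximate distances of probes 3a and 3b from Acph, any coordinates used with it would be illustrative.

```python
from typing import List, Optional, Tuple

# (probe name, position in kb from Acph in the collinear reference, signal), where the
# signal is "inside" (e.g., section 91C), "outside" (e.g., section 95A), or "both" ends.
Probe = Tuple[str, float, str]


def bracket_breakpoint(probes: List[Probe]) -> Tuple[Optional[str], Optional[str]]:
    """Return the adjacent probes flanking the breakpoint, (p, p) if probe p spans it,
    or (None, None) if the probes do not bracket it."""
    ordered = sorted(probes, key=lambda p: p[1])
    # A probe hybridizing at both inversion ends is taken to span the breakpoint.
    for name, _, signal in ordered:
        if signal == "both":
            return name, name
    # Otherwise look for the inside -> outside transition between neighbouring probes.
    for (left, _, s_left), (right, _, s_right) in zip(ordered, ordered[1:]):
        if s_left == "inside" and s_right == "outside":
            return left, right
    return None, None


# First round: marker Acph (inside) plus probes 3b (~190 kb) and 3a (~270 kb), both outside.
round1 = [("Acph", 0.0, "inside"), ("3b", 190.0, "outside"), ("3a", 270.0, "outside")]
print(bracket_breakpoint(round1))  # ('Acph', '3b')
```

Each later round repeats the same call with new probes placed inside the interval returned by the previous one.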

Figure 3.

Results of the in situ hybridizations performed on Ost chromosomes to isolate the distal (CD in Fig. 1) and proximal (AB in Fig. 1) breakpoints of inversion 3 of the O chromosome of D. subobscura. Distal breakpoint: (A) probe 3a; (B) probe 3b; (C) probe 3d; (D) probe 3e; (E) probe 3g; (F) probe 3h; (G) probe 3f; (H) probe 3f(s). Proximal breakpoint: (I) probe 4a; (J) probe 4b; (K) probe 4b1. The multiple strong hybridization signals in panel G can be attributed to the presence, within probe 3f, of a region with similarity to known transposable element families (three fragments similar to helitrons and one to a P-family element; see text). As shown in panel H, a shorter fragment that did not include this region, probe 3f(s), hybridized only at the two breakpoints of inversion 3.

In the case of the proximal breakpoint (AB in Fig. 1), we used a similar strategy, although the task was much riskier because we started from a single marker within the inverted region: marker P2, which partly contained the AbdA gene. As previously indicated, an ∼300-kb-long region spanning this gene is collinear between D. pseudoobscura and D. melanogaster. This region spans the three genes (Ubx, AbdA, and AbdB; Fig. 2) that constitute the bithorax complex (Martin et al. 1995). In the first round of in situ hybridizations on Ost polytene chromosomes (Fig. 3), two fragments (4a and 4b in Fig. 2) located at the ends of this collinear region were used as probes. Probe 4a (AbdB) mapped at the distal end of the inversion (section 94E; Fig. 3I), whereas probe 4b mapped at its proximal end (section 91B; Fig. 3J). The proximal breakpoint therefore lies in the fragment flanked by probes 4a and 4b (Fig. 2). Given the strong conservation of the bithorax complex, we assumed that the breakpoint lay between Ubx and fragment 4b (Fig. 2). We therefore designed one probe spanning the complete region separating GA16100 and Ubx (fragment 4b1). To confirm this assumption, two probes (4a1 and 4a2) were designed between Ubx and AbdA (Fig. 2). As expected, these two probes hybridized at section 94E (results not shown), as did the AbdA and AbdB probes, whereas the ∼6.0-kb-long fragment corresponding to probe 4b1 hybridized at both ends of inversion 3 (subsections 91B/C and 94E/95A; Fig. 3K), which confirms that this fragment includes the proximal inversion breakpoint (AB in Fig. 1).

The complete AB and CD fragments (i.e., the ∼6.0-kb and ∼7.5-kb fragments that spanned the breakpoints in the ancestral arrangement) were sequenced for one O3 + 4 line (J16 in Figs. S1–S5). These sequences were compared with the corresponding D. pseudoobscura genome sequence using the discontiguous megablast algorithm. For the AB region (Fig. 4A), this comparison revealed four fragments with high similarity between species (sequence identity between 78% and 92%) and allowed the identification of the orthologs of genes GA16100 (modSP) and Ubx, as well as of their flanking regions. For the CD region (Fig. 4B), the megablast comparison revealed three high-similarity fragments (sequence identity between 72% and 92%), corresponding to the ortholog of gene GA26869 (trp), the ortholog of gene GA20651 (Jon99C) and its flanking region, and an additional intergenic region.

Figure 4.

Sequence organization of the two breakpoint regions of inversion 3 of the O chromosome in O3 + 4 chromosomes of D. subobscura and in the corresponding D. pseudoobscura regions. (A) Proximal breakpoint; (B) distal breakpoint. Coding regions are represented by green rectangles with arrowheads indicating the direction of transcription. The region of similarity to known transposable element families is represented by a black box. Noncoding regions are depicted as red rectangles when they exhibit high similarity between species and as thick horizontal lines when they exhibit no similarity. Yellow boxes in the D. subobscura schemes represent the breakpoints themselves. Each breakpoint is delimited by the same 4-bp sequences in inverted orientation (depicted in yellow and blue), as detailed below the general schemes. Black arrowheads above the D. subobscura schemes indicate the approximate location of the primers used to PCR amplify the breakpoint regions in a population sample of O3 + 4 (pairs pA/pB and pC/pD) and Ost (pairs pA/pC and pB/pD) isochromosomal lines (see text and Fig. 1).

No similarity was detected in the remaining fragments (i.e., in the ∼2.5-kb-long central fragment of the AB region, or in either of the two short intervening fragments of the CD region). It is worth noting that the central part of the AB region of D. pseudoobscura contains a short coding region (GA26454; Fig. 4A). This CDS has no homolog in any of the Drosophila species with an available whole-genome sequence, with the exception of D. persimilis.

Analysis of the AB and CD regions with RepeatMasker revealed only a few short low-complexity regions (not shown); the only moderately long region with similarity to known transposable elements was found in the CD region (Fig. 4B). The multiple signals detected in the in situ hybridization with the 3f fragment as probe (Fig. 3G) reflect the presence of these repetitive sequences in this ∼7.5-kb-long fragment spanning the CD breakpoint.

IDENTIFICATION OF THE BREAKPOINTS OF INVERSION 3

Upon sequencing the complete AB and CD fragments for one O3 + 4 line (see above), internal primers were used to isolate the two breakpoints in Ost chromosomes. Primer pairs pA/pB and pC/pD (Fig. 4) successfully amplified the central part of the AB and CD regions, respectively, in the sequenced O3 + 4 line (yielding ∼1.3-kb- and ∼1.0-kb-long fragments), but amplification failed when using Ost DNA. This result suggested that these fragments spanned the breakpoints in O3 + 4 chromosomes, which was confirmed by PCR amplification with primer pairs pA/pC and pB/pD (Fig. 4). As expected, these primer pairs successfully amplified the AC and BD regions in Ost (yielding ∼1.0-kb- and ∼1.4-kb-long fragments), whereas amplification failed when using O3 + 4 DNA. Fragments spanning the breakpoints in Ost chromosomes were initially sequenced in a single Ost line (J07 in Figs. S1–S5).

Comparison of the breakpoint regions in O3 + 4 (AB and DC) and Ost (AC and BD) chromosomes allowed the detailed identification of the breakpoints (Fig. 4). In the initially sequenced O3 + 4 line (J16), the proximal breakpoint (AB) could be delimited to a 309-bp-long fragment (Fig. 4A), which was duplicated during the inversion process, as revealed by its presence at the two breakpoints of the inverted Ost chromosomes (AC and BD in Fig. S6). This fragment is located in the ∼2.5-kb-long central fragment of the AB region, which exhibits no similarity to the corresponding region of D. pseudoobscura (Fig. 4A). In contrast, the distal breakpoint lies at the end of one of the high-similarity fragments of the CD region (Fig. 4B). In the initially sequenced Ost line (J07), the distal breakpoint could be delimited to a 63-bp-long fragment (Fig. 4B), which was deleted during the inversion process (see below).

The extended AB and CD regions were also amplified and partially sequenced in the closely related species D. madeirensis and D. guanche. The AB amplicon was ∼2 kb shorter in these species than in D. subobscura. Indeed, sequencing of the fragment spanning the AB breakpoint region revealed an ∼1.8-kb deletion in these species relative to D. subobscura. Interestingly, the deleted fragment includes the entire proximal breakpoint, and its distal end nearly coincides with that of the inversion breakpoint.

POLYMORPHISM AT THE BREAKPOINT REGIONS AND ALONG THE O3 INVERSION

Variation at the breakpoints was surveyed in a population sample of Ost and O3 + 4 chromosomes through the amplification and sequencing of ∼1.0–1.4-kb-long fragments spanning the breakpoints (Fig. S6). Table 1 summarizes nucleotide polymorphism at the A, B, C, and D regions. Based on the comparison of the D. subobscura sequences with both the D. madeirensis and D. guanche sequences, the fragment that was duplicated during the inversion process was considered part of the A region (Fig. S6). The size of the regions analyzed ranges from 309 nucleotides (D region) to 666 nucleotides (B region), after excluding alignment gaps in the complete dataset. Estimates of nucleotide diversity in these regions range from 0.005 to 0.018 in Ost chromosomes and from 0.009 to 0.046 in O3 + 4 chromosomes. Except for the value estimated at the C region in O3 + 4 chromosomes (0.046), these estimates are within the range of previous estimates at silent sites of eight regions of D. subobscura affected by inversion 3 (0.004–0.022; Munté et al. 2005). Indeed, estimates of variation in O3 + 4 chromosomes do not differ significantly among regions (χ²L = 1.60, 2 df, P = 0.449) except when the C region is included (χ²L = 29.97, 3 df, P < 0.0001). Considering each region separately, levels of variation are similar in both arrangements at the A and B regions, whereas they are higher in O3 + 4 than in Ost chromosomes at the C and D regions, although they differ significantly between arrangements only at the C region (χ²L = 25.16, 1 df, P < 0.0001). For the concatenated dataset, the level of variation is significantly higher in O3 + 4 (0.020) than in Ost chromosomes (0.012; χ²L = 13.39, 1 df, P = 0.0002). This significance vanishes, however, when only the A, B, and D regions are compared (0.013 in both Ost and O3 + 4 chromosomes; χ²L = 0.743, 1 df, P = 0.389). Similar results were obtained by computer simulation under the conservative assumption of no recombination (results not shown). The frequency spectrum of nucleotide polymorphisms, as summarized by Tajima's D statistic, is generally (although not significantly) shifted toward an excess of low-frequency variants in O3 + 4 chromosomes, with Ost chromosomes exhibiting a weaker trend (Table 1). The A, B, C, and D regions exhibit not only an appreciable level of nucleotide variation but also extensive length variation (Table S2). The level of length variation, as measured by the number of indels and indel diversity, is higher in O3 + 4 than in Ost chromosomes.
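The P values reported for χ²L follow from the χ² distribution with the stated degrees of freedom. The Python sketch below shows one simple form of a goodness-of-fit test of polymorphism across regions, in which the expected number of segregating sites in each region is assumed to be proportional to its length. This is a simplifying assumption for illustration, not necessarily the exact formulation of Kreitman and Hudson (1991) used for the values above, and the function names are hypothetical. The upper tail of the χ² distribution is computed in closed form for integer degrees of freedom so that only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    y = x / 2
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{k < df/2} y^k / k!
        return math.exp(-y) * sum(y**k / math.factorial(k) for k in range(df // 2))
    # Odd df: start from df = 1 (erfc) and add terms of the incomplete-gamma recurrence.
    p = math.erfc(math.sqrt(y))
    for k in range(1, (df + 1) // 2):
        p += math.exp(-y) * y ** (k - 0.5) / math.gamma(k + 0.5)
    return p


def polymorphism_gof(seg_sites, lengths):
    """Chi-square goodness of fit of segregating sites across regions, assuming
    (hypothetically) that expected counts are proportional to region length."""
    total_s, total_l = sum(seg_sites), sum(lengths)
    expected = [total_s * length / total_l for length in lengths]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(seg_sites, expected))
    df = len(seg_sites) - 1
    return chi2, df, chi2_sf(chi2, df)
```

As a check on the distribution function, chi2_sf(1.60, 2) returns about 0.449 and chi2_sf(0.743, 1) about 0.389, in agreement with the P values given above for those statistics.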

Table 1.  Polymorphism and divergence at the fragments (A, B, C, and D) spanning the breakpoints of inversion 3.

Fragment      Arrangement  n   L (with gaps)  S (mutations)  h   Hd    π      D       K      π/K
A             O3 + 4       9   340 (413)      18 (19)        9   1.00  0.017  −0.812  –      –
A             Ost          9   340 (413)      18             9   1.00  0.018  −0.419  –      –
B             O3 + 4       9   666 (935)      28             9   1.00  0.012  −1.279  0.034  0.353
B             Ost          9   666 (935)      27             9   1.00  0.015  −0.051  0.034  0.441
C             O3 + 4       10  401 (738)      45 (50)        10  1.00  0.046  0.195   0.063  0.730
C             Ost          9   401 (738)      10             9   1.00  0.009  −0.227  0.093  0.097
D             O3 + 4       10  309 (355)      12             9   0.98  0.009  −1.464  0.036  0.250
D             Ost          9   309 (355)      5              5   0.81  0.005  −0.654  0.040  0.125
Concatenated  O3 + 4       9   1814 (2441)    104 (110)      9   1.00  0.020  −0.539  0.043  0.465
Concatenated  Ost          9   1814 (2441)    62             9   1.00  0.012  −0.245  0.052  0.231

n, sample size; L, fragment length (fragment length with gaps); S, number of segregating sites (number of mutations); h, number of haplotypes; Hd, haplotype diversity; π, nucleotide diversity; D, Tajima's D; K, nucleotide divergence from Drosophila madeirensis; π/K, nucleotide diversity scaled by divergence; –, not estimated (the A region is absent in D. madeirensis).

Table 2 summarizes the level of genetic differentiation between chromosomal arrangements. Despite some differences among regions (with the A and C regions exhibiting higher DXY estimates than the B and D regions), all four fragments exhibit significant genetic differentiation between the two arrangements, as established by Hudson's KS* statistic (Table 2), both individually and jointly.
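The differentiation statistics in Table 2 can be illustrated with a short Python sketch. It computes DXY as the mean number of differences per site between sequences from different arrangements, Hudson's FST as 1 − Hw/Hb (taking Hw, as an assumption, to be the unweighted mean of the two within-arrangement pairwise differences), and KS* as the sample-size-weighted mean of log(1 + Kij) over pairs of sequences within each arrangement, with significance assessed by permuting arrangement labels. Function names and the number of permutations are hypothetical choices, and DnaSP may differ in detail; each arrangement needs at least two sequences.

```python
import math
import random
from itertools import combinations


def n_diffs(x, y):
    """Number of nucleotide differences between two aligned sequences."""
    return sum(a != b for a, b in zip(x, y))


def mean_within(group):
    """Mean pairwise differences among sequences of one arrangement."""
    pairs = list(combinations(group, 2))
    return sum(n_diffs(x, y) for x, y in pairs) / len(pairs)


def dxy_and_fst(group1, group2):
    """Return (DXY per site, Hudson's FST = 1 - Hw/Hb)."""
    length = len(group1[0])
    hb = sum(n_diffs(x, y) for x in group1 for y in group2) / (len(group1) * len(group2))
    hw = (mean_within(group1) + mean_within(group2)) / 2  # assumption: unweighted mean
    fst = 1 - hw / hb if hb > 0 else float("nan")
    return hb / length, fst


def ks_star(group1, group2):
    """KS*: weighted mean over arrangements of the mean log(1 + Kij) within each."""
    n = len(group1) + len(group2)
    total = 0.0
    for g in (group1, group2):
        pairs = list(combinations(g, 2))
        mean_log = sum(math.log(1 + n_diffs(x, y)) for x, y in pairs) / len(pairs)
        total += len(g) / n * mean_log
    return total


def ks_star_test(group1, group2, n_perm=1000, seed=1):
    """Permutation P value: fraction of label shuffles with KS* <= observed."""
    observed = ks_star(group1, group2)
    pooled = list(group1) + list(group2)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_star(pooled[:len(group1)], pooled[len(group1):]) <= observed:
            hits += 1
    return observed, hits / n_perm
```

Small KS* values indicate that sequences are more similar within than between arrangements, so the P value is the proportion of permuted datasets whose KS* is at least as small as the observed one.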

Table 2.  Genetic differentiation at the breakpoint fragments.
Comparison          Fragment      Fixed  P1F2  F1P2  Shared  FST    DXY (sign.)
O3 + 4 vs. Ost      A             3      18    17    1       0.528  0.037 (***)
                    B             1      23    22    5       0.421  0.022 (***)
                    C             6      50    10    0       0.572  0.064 (***)
                    D             1      11    4     1       0.568  0.017 (***)
                    Concatenated  12     103   55    7       0.525  0.033 (***)
DUP-AB vs. DUP-AC   DUP           2      12    14    1       0.512  0.033 (***)
DUP-AB vs. DUP-BD   DUP           0      12    11    0       0.312  0.022 (***)
DUP-AC vs. DUP-BD   DUP           5      13    11    0       0.572  0.035 (***)

Fixed, fixed differences; P1F2, sites polymorphic in O3 + 4 and fixed in Ost; F1P2, sites fixed in O3 + 4 and polymorphic in Ost; Shared, polymorphic sites segregating for the same two variants; FST, proportion of nucleotide diversity attributable to variation between arrangements; DXY, average number of nucleotide differences per site; sign., significance established using Hudson's KS* test statistic; A, B, C, and D, fragments spanning the breakpoints; DUP, duplicated fragment (see text); ***P < 0.001.
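DXY and FST in Table 2 summarize between-arrangement differentiation. The Python sketch below shows one common way to compute them from aligned sequences: DXY as the average per-site difference between sequences drawn from different arrangements, and FST in the Hudson, Slatkin and Maddison (1992) form FST = 1 − Hw/Hb, with Hw the mean within-arrangement diversity and Hb = DXY. This section does not state which FST estimator or software the authors used, so treat the estimator choice, the gap handling, and the toy sequences as assumptions.

```python
from itertools import combinations, product


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences (gapped sites skipped)."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in sites) / len(sites)


def mean_within(group: list[str]) -> float:
    """Average per-site differences between sequences of the same arrangement."""
    pairs = list(combinations(group, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def dxy(group_1: list[str], group_2: list[str]) -> float:
    """DXY: average per-site differences between sequences of different arrangements."""
    pairs = list(product(group_1, group_2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def hudson_fst(group_1: list[str], group_2: list[str]) -> float:
    """FST = 1 - Hw / Hb, with Hw the mean within-arrangement diversity and Hb = DXY."""
    hw = (mean_within(group_1) + mean_within(group_2)) / 2
    return 1 - hw / dxy(group_1, group_2)


# Hypothetical toy sequences (not data from this study).
o34 = ["AAAAAAAAAA", "AAAAAAAAAT"]
ost = ["CAAAAAAAAA", "CAAAAAAAAT"]
print(f"DXY = {dxy(o34, ost):.3f}, FST = {hudson_fst(o34, ost):.3f}")
```

Here the single fixed difference at the first site raises DXY above the within-arrangement diversity, so FST is positive (one third in this toy case).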

To gain further information on the origin of the inversion, nucleotide variation was also estimated at the ∼300-bp-long fragment that was duplicated during the inversion process. This fragment is present as a single copy in O3 + 4 chromosomes (DUP-AB) and as two copies in Ost chromosomes (DUP-AC and DUP-BD; Fig. S6). Estimates of nucleotide diversity (Table 3) are the same for all copies (0.015 in all three cases). Moreover, their pairwise comparison reveals similar and significant levels of genetic differentiation among copies (Table 2). This is also reflected in the reconstructed gene genealogy, which is unrooted because this region is absent in both D. madeirensis and D. guanche: the presence of three clusters, each corresponding to one of the three copies (Fig. S7), points to their independent evolution.

Table 3.  Nucleotide and insertion–deletion polymorphism at the fragment (DUP) duplicated during the inversion process.
      DUP-AB      DUP-AC      DUP-BD
n     9           9           9
L     236 (355)   236 (355)   236 (355)
S     12          11          10
h     8           6           9
Hd    0.97        0.83        1.00
π     0.015       0.015       0.015
D     −0.982      −0.507      −0.297
I     9           1           4
IL    12.7        5.0         3.8
ID    0.007       0.002       0.004

n, sample size; L, fragment length (fragment length with gaps); S, number of segregating sites; h, number of haplotypes; Hd, haplotype diversity; π, nucleotide diversity; D, Tajima's D; I, number of InDel events; IL, average InDel event length; ID, InDel diversity per site.
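Haplotype diversity (Hd) and Tajima's D, reported in Tables 1 and 3, follow standard formulas. The Python sketch below implements Nei's Hd with the n/(n − 1) sample-size correction and Tajima's (1989) D from the sample size, the number of segregating sites and the mean number of pairwise differences per sequence. The function names and the placeholder haplotype labels are hypothetical. Recomputing D from the rounded per-site π values in Table 3 will not reproduce the published estimates exactly, so the example only reproduces Hd, which depends on haplotype counts alone.

```python
import math
from collections import Counter


def haplotype_diversity(haplotypes: list[str]) -> float:
    """Nei's haplotype diversity: Hd = n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def tajimas_d(n: int, s: int, k: float) -> float:
    """Tajima's D from sample size n, segregating sites s and mean pairwise differences k
    (differences per sequence pair, not per site)."""
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Numerator contrasts the pairwise-difference and segregating-site estimators of theta.
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Eight distinct haplotypes among nine sequences means exactly one haplotype occurs twice,
# as for DUP-AB (n = 9, h = 8); the labels are placeholders, not real sequences.
dup_ab = ["h1", "h1", "h2", "h3", "h4", "h5", "h6", "h7", "h8"]
print(f"Hd = {haplotype_diversity(dup_ab):.2f}")
```

The printed value (0.97) matches the DUP-AB entry in Table 3. A negative D, as in all three DUP copies, arises when the mean pairwise difference falls below the value expected from the number of segregating sites.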

The level of nucleotide diversity scaled by divergence (π/K) at inversion 3 breakpoints in O3 + 4 and Ost chromosomes (Table 1) was compared to that previously detected in the same population at eight regions along this inversion (Navarro-Sabaté et al. 1999; Rozas et al. 1999; Munté et al. 2005). Region A was excluded from this analysis because it is absent in the outgroup species D. madeirensis. The scaled π/K estimate for the C region in O3 + 4 chromosomes (0.730) far exceeds all other estimates, as does the unscaled π estimate (Table 1 and Fig. 5). Because scaling by divergence accounts for regional differences in mutation rate, this very high level of variation cannot be explained by a higher than average mutation rate in this region. The other π/K estimates for inversion breakpoints do not differ greatly from those at the eight other regions (Fig. 5), although the lowest estimate in each arrangement corresponds to a breakpoint region. Genetic differentiation between O3 + 4 and Ost chromosomes, as measured by the DXY statistic, tends to be highest at inversion breakpoints, with three of the four breakpoint regions (C region included) exhibiting higher values than more internal regions (Fig. 5). Levels of genetic differentiation are more similar among regions when measured as FST (results not shown).

Figure 5.

Nucleotide variation (π/K) and genetic differentiation (DXY) at regions affected by inversion 3 of the O chromosome. Distance of each region to the nearest breakpoint was estimated assuming a homogeneous distribution along inversion 3 (3.5 Mb).
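The caption states that distances to the nearest breakpoint were estimated assuming a homogeneous distribution along the 3.5-Mb inversion. Under that assumption, a region's relative position along the inverted segment converts linearly into physical distance. The Python sketch below illustrates this conversion; the relative positions are hypothetical, and how the authors derived each region's position (e.g., from the cytological map) is not specified here.

```python
INVERSION_LENGTH_KB = 3500  # inversion 3 length given in the Fig. 5 caption (3.5 Mb)


def distance_to_nearest_breakpoint(relative_position: float,
                                   length_kb: float = INVERSION_LENGTH_KB) -> float:
    """Distance (kb) from a region to the closer breakpoint, given its relative position
    (0 = one breakpoint, 1 = the other) and assuming a homogeneous, linear scale."""
    if not 0.0 <= relative_position <= 1.0:
        raise ValueError("relative_position must lie between 0 and 1")
    return min(relative_position, 1.0 - relative_position) * length_kb


# Hypothetical relative positions, for illustration only.
for pos in (0.0, 0.25, 0.5, 0.9):
    print(pos, round(distance_to_nearest_breakpoint(pos), 1))
```

A region at the center of the inversion is at most 1750 kb from either breakpoint, which bounds the x-axis of such a plot.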

BREAKPOINTS, POPULATION SAMPLE, AND INVERSION ORIGIN

When all O3 + 4 and Ost sequences are considered (Figs. S1–S5), the two regions spanning the breakpoints are generally delimited by the same two 4-bp motifs (5′ATGC3′ and 5′GCAG3′) in inverted orientation (Fig. S6). Although this is reminiscent of repetitive sequences, the delimited fragments differ greatly in length (∼300 bp at AB vs. ∼60 bp at CD) and show little if any similarity to each other. Nor do they show any similarity to any known transposable element family. The fragment duplicated during the inversion process (DUP) is generally longer in the DUP-AC copy than in the DUP-AB and DUP-BD copies (Fig. S6). The CD breakpoint fragment varied from 40 to 79 bp in length in O3 + 4 chromosomes (Fig. S4) and, as previously indicated, was deleted in Ost chromosomes. The presence of a duplicated fragment at the breakpoints of inverted chromosomes (Ost in this case) supports an origin of the inversion via staggered breaks, as depicted in Figure 6.

Figure 6.

Schematic representation of the origin of inversion 3 through staggered breaks at both breakpoints and subsequent repair of the intervening sequence (in red) after the alternative motifs reanneal through base-pair complementarity. The two 4-bp motifs that delimit each breakpoint (i.e., the AB and CD breakpoints) are shown in yellow and blue, respectively (as in Fig. 4 and Fig. S1). The lowercase letters at each motif refer to the A, B, C, and D regions (see Fig. 4 and text), with the two complementary strands depicted as x and x′ (with x ranging from a to d).

Discussion

ORIGIN OF INVERSION 3

The availability of whole genome sequences of closely related species has provided new insights into the role played by chromosomal inversions in genome evolution (e.g., Pevzner and Tesler 2003; Feuk et al. 2005; Bhutkar et al. 2008; Lee et al. 2008; von Grotthuss et al. 2010). It has also provided support for the two major molecular mechanisms underlying their origin (e.g., Richards et al. 2005; Ranz et al. 2007). The identification of new polymorphic inversions and of their breakpoints has likewise benefited from whole genome sequencing projects and the associated development of molecular markers. Here we have used a mixed strategy (i.e., taking advantage of both whole genome sequences and molecular markers) to isolate and finely characterize the two breakpoints of a polymorphic inversion of D. subobscura, inversion 3 of the O chromosome, in a population sample of both Ost and O3 + 4 chromosomes. In O3 + 4 chromosomes (which carry the ancestral configuration for inversion 3; Fig. 1), the inversion breakpoints could be identified as two rather short fragments (∼300 bp and ∼60 bp long, respectively; Fig. 4) with no similarity to any known transposable element family or repetitive sequence. This observation makes an origin of inversion 3 by ectopic recombination between transposable elements, or other repetitive sequences, highly unlikely. Furthermore, the presence of an ∼300-bp duplicated fragment at the two breakpoints of Ost chromosomes (Figs. 1, 4, and Fig. S6) clearly supports an origin of this inversion via staggered double-strand breaks. The fact that the two breakpoints are delimited by the same two 4-bp motifs in inverted orientation makes the inversion process easy to visualize. After the staggered breaks at the two breakpoints and the subsequent inversion of the intervening chromosomal region (Fig. 6), the short single-strand motifs would have mediated the annealing of the inversion ends. This would have produced the ∼300-bp-long duplication, through synthesis of the complementary strand at both breakpoints, and the deletion of the shorter single-strand fragment. Inversion 3 breakpoints thus provide one of the clearest examples of a segregating inversion that originated through the staggered double-strand break mechanism.
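The net outcome of the staggered-break origin described above can be illustrated with a toy string model: starting from the ancestral order A, duplicated fragment (dup), inner segment (B to C), CD fragment (cd), D, the inner segment is inverted, dup ends up at both new junctions (AC and BD), and the shorter cd fragment is lost. The Python sketch below models only this end product, not the molecular steps; the function names and toy sequences are hypothetical, and placing the second dup copy in reverse-complement orientation is an assumption of the model rather than a detail reported here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (uppercase A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def staggered_break_outcome(a: str, dup: str, inner: str, cd: str, d: str) -> str:
    """Toy net product of an inversion by staggered breaks on the ancestral order
    a + dup + inner + cd + d: the inner segment is inverted, dup is present at both
    new junctions, and cd (the shorter single-strand fragment) is lost."""
    # cd is accepted only to mirror the ancestral order; it does not appear in the product.
    return a + dup + reverse_complement(inner) + reverse_complement(dup) + d


parts = dict(a="AAAA", dup="GATTACA", inner="CCCGGGTT", cd="TTT", d="GGGG")
ancestral = "".join(parts.values())          # O3 + 4-like order: A-dup-B...C-cd-D
derived = staggered_break_outcome(**parts)   # Ost-like order: A-dup-C...B-dup'-D
print(ancestral)
print(derived)
# Net length change: one extra copy of dup, minus the lost cd fragment.
print(len(derived) - len(ancestral) == len(parts["dup"]) - len(parts["cd"]))
```

The length bookkeeping in the last line mirrors the observation that Ost chromosomes carry two copies of the ∼300-bp fragment and lack the CD fragment.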

The characterization and sequencing of polymorphic and fixed inversions had raised the possibility that the mode of origin of inversions was related to inversion age (Prazeres da Costa et al. 2009), and also that it might be species or group specific. Our results, together with previous ones, suggest instead that the mode of origin is neither related to inversion age (as shown by the staggered-break origin of inversion 3, a polymorphic inversion) nor group specific (as shown by the coexistence of both modes in the obscura lineage, that is, through repetitive sequences in the D. pseudoobscura lineage and through staggered double-strand breaks in the D. subobscura lineage).

INVERSIONS AND NUCLEOTIDE VARIATION

The origin of an inversion constitutes an extreme bottleneck that, for successful inversions (i.e., those that persist and reach some frequency in the population), implies an initial depletion of variation. The inverted region regains variation through mutation and also through genetic exchange with the ancestral gene arrangement (Navarro et al. 1997). Levels of variation in a particular inversion would thus depend on the inversion age and also on the time of coexistence with the ancestral arrangement. Although both gene conversion and double crossovers can contribute to genetic exchange in inversion heterokaryotypes, double crossovers are only likely to have an effect in the more central part of the inverted region. Genetic exchange between inverted and noninverted chromosomes is, therefore, expected to increase with distance from the breakpoints (Navarro et al. 1997). Within the inverted region, the level of variation would accordingly increase with distance from the breakpoints, following the increasing contribution of double crossovers.

In the case studied here, extant populations of D. subobscura harbor two chromosomal arrangements, Ost and O3 + 4, that arose independently from the ancestral O3 arrangement through inversions 3 and 4, respectively (Fig. 1). O3 went extinct some time after both inversions occurred, which implies that each of the derived arrangements had limited time to exchange genetic information with O3. The location of the regions analyzed, which lie at inversion 3 breakpoints in Ost chromosomes (AC and BD in Fig. 1) but rather far from inversion 4 breakpoints in O3 + 4 chromosomes (AB and DC in Fig. 1), might have affected this exchange differently in each arrangement. Moreover, the overlapping character of inversions 3 and 4 would have greatly affected the level of genetic exchange between Ost and O3 + 4 once both were established.

In a previous study of nucleotide variation at eight regions distributed along inversion 3 in a sample of Ost and O3 + 4 chromosomes, levels of nucleotide variation within arrangements and of genetic differentiation between arrangements were quite uniform across regions (Munté et al. 2005). Double crossovers would thus have contributed little to genetic exchange along this rather short inversion, and variation in each arrangement would be the result either of mutation or of genetic exchange mainly due to gene conversion. Here we have surveyed nucleotide variation at and around the breakpoints themselves, where gene conversion (but not mutation) is expected to be restricted owing to mechanical problems in the synapsis of chromosomes carrying alternative gene arrangements (Navarro et al. 1997). Accordingly, a lower level of variation within each arrangement and a higher level of genetic differentiation between arrangements would be expected at the breakpoint regions relative to more internal regions. These expectations are not well supported by present results. Although in each arrangement the lowest scaled diversity estimate corresponds to a breakpoint region, the level of variation at breakpoint regions other than the outlier C region is overall of the same order as in more internal regions (Fig. 5). Likewise, genetic differentiation as estimated by DXY does not differ greatly between breakpoint regions (excluding the C region) and more internal regions, with only two of the three breakpoint regions exhibiting a higher estimate than internal regions (Fig. 5). This trend is even milder when genetic differentiation is measured as FST. These results, together with the detection of two short (4 and 14 nt long) putative gene conversion tracts in the C region and two medium-sized (87 and 125 nt long) tracts in the B region, suggest that gene conversion at inversion breakpoints is not as restricted as previously thought. In any case, if the actual reduction in gene conversion were mild or affected only a small region, its effect on an evolutionary time scale might be difficult to detect, at least in some cases.

Nucleotide variation at the eight regions distributed along inversion 3 revealed a higher level of nucleotide diversity in O3 + 4 than in Ost chromosomes, suggesting a more recent origin of Ost (Navarro-Sabaté et al. 1999; Rozas et al. 1999; Munté et al. 2005). The pattern is less clear at the breakpoint regions studied here. Our results suggest that, even if Ost originated from O3 more recently than O3 + 4, the time elapsed between these two events was not very long. The rather similar level of variation detected at the three copies of the DUP region (Table 3), even though DUP-AC and DUP-BD originated through the inversion process itself, points in the same direction.

THE BREAKPOINT REGIONS IN THE OBSCURA GROUP

Our strategy to identify inversion 3 breakpoints relied heavily on the comparison of the D. melanogaster and D. pseudoobscura genome sequences and, more specifically, on the collinearity of long regions between or around the orthologs of known cytological markers in the target species D. subobscura (Figs. 1 and 2). The regions spanning the breakpoints were initially identified and sequenced in one O3 + 4 line, which confirmed that these regions were orthologous to those in both D. melanogaster and D. pseudoobscura. In D. subobscura, as well as in the other two species of the subobscura species cluster (D. madeirensis and D. guanche), the AB breakpoint is flanked by the orthologs of genes modSP and Ubx, and the CD breakpoint by those of genes trp and Jon99C. Given the strong association previously detected in D. subobscura between peptidase allozyme variants and chromosomal arrangements O3 + 4 and Ost (Fontdevila et al. 1983; Prevosti et al. 1983), it is worth noting that each breakpoint is flanked by at least one gene encoding an enzyme with peptidase activity (modSP and Jon99C, respectively).

The analysis of the extended AB region sequences in species of the obscura group suggests that the inversion breakpoint lies in a small high-turnover fragment, even though this fragment is within a long collinear region (∼300 kb). In the three species of the subobscura species cluster, the extended AB region is highly conserved except for an ∼1.8-kb fragment that is absent in both island species, D. madeirensis and D. guanche. Interestingly, in D. subobscura one of the ends of this fragment is the breakpoint itself (i.e., the ∼300-bp fragment). It seems unlikely that both states (i.e., with and without this fragment) segregated in ancestral D. subobscura populations (with the O3 chromosomal arrangement) during the more than 2 million years that separate the origins of the two island species (Ramos-Onsins et al. 1998). It seems more plausible that this fragment was gained after the split of the D. guanche lineage, and likely after that of the D. madeirensis lineage. Moreover, the presence of this fragment in both O3 + 4 and Ost arrangements would require that the variant carrying it had attained a rather high frequency across most of the distribution area of the D. subobscura O3 arrangement. Indeed, there are some indications from the current geographical frequency distribution of these arrangements that the two inversions originated in different parts of the ancestral distribution area of D. subobscura (Krimbas 1992).

In D. pseudoobscura and D. persimilis, the extended AB region provides further evidence for the high turnover of its central part in the obscura group. In the D. persimilis genome sequence, a 5-exon coding region (1454 nt long) has been annotated in this region. In the D. pseudoobscura genome sequence, this CDS is annotated as a 1-exon pseudogene (Fig. 4) owing to a donor splice-site mutation and the resulting in-frame stop codon at the beginning of the first intron. This CDS has not been found in any of the other Drosophila species with whole genome sequences (Clark et al. 2007; Tweedie et al. 2009), nor in the AB fragment of any of the three species of the subobscura species cluster. The origin of this CDS remains an open question, and its location in the AB region further suggests that this short fragment has a high turnover (Fig. 4).

We can therefore conclude that, although the AB breakpoint of inversion 3 of the O chromosome could be readily identified and is not associated with any known transposable element family, it is located in a high-turnover region. In contrast, the CD breakpoint region is more stable across the obscura species group, as it has only been affected by inversion 3 in D. subobscura. However, in the extended CD region there is an ∼1-kb-long fragment around the breakpoint that exhibits the lowest level of similarity between D. subobscura and D. pseudoobscura (Fig. 4), and in D. subobscura the breakpoint itself (i.e., the ∼60-bp-long fragment) harbors considerable length variation. The question arises of how often inversion breakpoints occur in short unstable regions. This question has recently been addressed for fixed rearrangements (von Grotthuss et al. 2010): the analysis of chromosome reorganization across the Drosophila genus (i.e., across the 12 Drosophila species whose genomes were first sequenced; Clark et al. 2007) revealed that fragile regions have played a more prevalent role than functional constraints in chromosomal evolution (von Grotthuss et al. 2010). Addressing the same question for polymorphic inversions, however, awaits the detailed characterization of polymorphic inversion breakpoints on a large scale. Other open questions, such as those concerning the distribution of polymorphic inversions across the genome, will also benefit from such detailed information.

Associate Editor: K. Dyer

ACKNOWLEDGMENTS

We thank D. Salguero for his excellent technical assistance, Servei de Genòmica, Serveis Cientifico-Tècnics, Universitat de Barcelona, for automated sequencing facilities, and three anonymous reviewers for comments. This paper was prepared with full knowledge and support of the Barcelona Subobscura Initiative (BSI). This work was supported by grants BFU2007–63228 from Ministerio de Educación y Ciencia, Spain, and 2009SGR-1287 from Comissió Interdepartamental de Recerca i Innovació Tecnològica, Generalitat de Catalunya, Spain to MA.
