Mitochondrial DNA insertions in the nuclear horse genome

Authors


E. Giulotto, Dipartimento di Genetica e Microbiologia ‘Adriano Buzzati-Traverso’, Università di Pavia, Via Ferrata 1, 27100 Pavia, Italy.
E-mail: elena.giulotto@unipv.it

Summary

The insertion of mitochondrial DNA in the nuclear genome generates numts, nuclear sequences of mitochondrial origin. In the horse reference genome, we identified 82 numts and showed that the entire horse mitochondrial DNA is represented as numts without gross bias. Numts were inserted in the horse nuclear genome at random sites and were probably generated during the repair of DNA double-strand breaks. We then analysed 12 numt loci in 20 unrelated horses and found that null alleles, lacking the mitochondrial DNA insertion, were present at six of these loci. At some loci, the null allele is prevalent in the sample analysed, suggesting that, in the horse population, the number of numt loci may be higher than the 82 present in the reference genome. In contrast to the situation in humans, the insertion polymorphism of numts is extremely frequent in the horse population, supporting the hypothesis that the genome of this species is in a rapidly evolving state.

Introduction

Integration of mitochondrial DNA (mtDNA) sequences into the nuclear genome gives rise to so-called numts (nuclear sequences of mitochondrial origin) (Lopez et al. 1994). Following the pioneering work in which the presence of genomic DNA hybridizing with mitochondrial DNA probes was identified more than 40 years ago (Du Buy et al. 1966; Du Buy & Riley 1967), numts were studied in different eukaryotes, from protists to mammals (Leister 2005; Hazkani-Covo et al. 2010 and references therein). Studies aimed at identifying the mechanisms of integration and evolutionary dynamics of numts are now greatly facilitated by the availability of whole nuclear genome sequences. These studies demonstrated that numt abundance varies in different eukaryotic genomes from those with no detectable numt sites such as some protists (Cyanidioschyzon merolae, Monosiga brevicollis, Naegleria gruberi and Thalassiosira pseudonana) and animals (Anopheles gambiae, Branchiostoma floridae, Ciona savignyi and Danio rerio), up to more than 2.1 Mb of numt sequences in the opossum Monodelphis domestica (Hazkani-Covo et al. 2010). Taking into account the density of numts in the genome, the highest estimate seems to be 0.1% (1 bp/kb) in the honeybee (Pamilo et al. 2007).

Insertion of mitochondrial DNA fragments into nuclear chromosomes, together with the insertion of transposons, retroviruses and telomeric-like sequences, is a driving force in evolution. While transposons and retroviruses are integrated by well-described mechanisms relying on specific proteins encoded by the inserted element itself (Kazazian 2004), and interstitial telomeric sequences are integrated by retrotranscription of the telomerase RNA component (Nergadze et al. 2004, 2007), the mechanisms responsible for the transfer of DNA fragments from mitochondria to nuclei are still elusive (Hazkani-Covo et al. 2010). Numt integration represents the prototype of exogenous insertions in the nucleus (Ricchetti et al. 2004). Studies on a yeast experimental system showed that the integration of mitochondrial sequences can occur during the repair of DNA double-strand breaks (Ricchetti et al. 1999). Numts, similarly to transposons and interstitial telomeres (Salem et al. 2003; Nergadze et al. 2004; Hazkani-Covo et al. 2010), have been successfully used in several studies as evolutionary markers, because the insertion events can be easily dated by comparing the sequence of orthologous loci in related species: the presence of a locus containing the insertion in one species and of the corresponding ‘empty’ locus, lacking the insertion, in another species demonstrates that the radiation of the two species occurred before the integration event. Besides phylogenetic studies, insertion polymorphisms are particularly informative for population studies, because the probability that two insertion events occur independently in the same position is essentially equal to zero and, conversely, the precise removal of an inserted sequence is extremely rare.

The analysis of numts has been extensively carried out in humans and other primates. Of the 452 estimated numts in the human genome (Hazkani-Covo et al. 2010), at least 40 were inserted in the human lineage after the split from the chimpanzee lineage. Of these human-specific numts, 12 display insertion polymorphism in the human population, reflecting their recent integration; thus, the colonization of nuclear genomes by mitochondrial DNA sequences is an ongoing process (Zischler et al. 1995; Thomas et al. 1996; Hazkani-Covo et al. 2010). It was also shown that 23 human-specific numts are inserted into known or predicted genes, mainly in introns (Ricchetti et al. 2004). Although insertions of mtDNA fragments into the nuclear genome usually appear as neutral mutations, in rare cases numt insertions into genes are associated with human diseases: a 251-bp numt insertion into the gene for plasma factor VII caused a splice site junction abnormality (Borensztajn et al. 2002); a 72-bp insertion into exon 14 of the GLI3 gene created a premature stop codon resulting in a truncated protein product. This mutation was shown to be responsible for a sporadic case of Pallister-Hall syndrome detected in a patient who was exposed to high-level radioactive contamination following the Chernobyl accident (Turner et al. 2003), hinting that environmental factors inducing DNA double-strand breaks may facilitate numt insertions. A 93-bp numt insertion into the MCOLN1 gene caused an inherited case of mucolipidosis IV (Goldin et al. 2004), and a 36-bp numt insertion into the USH1C gene is associated with Usher syndrome 1c (Chen et al. 2005). In one case, a 41-bp fragment from the mitochondrial 12S rRNA gene was integrated at the breakpoint junction of a familial constitutional reciprocal translocation t(9;11)(p24;q23) (Willett-Brozick et al. 2001).

The insertion of mitochondrial sequences into the nuclear genome could occur in principle both by direct DNA transfer and by cDNA-mediated transfer. It was proposed that, during the repair of a double-strand break in the nuclear DNA, either a mitochondrial DNA duplex fragment or a cDNA derived from retrotranscription of mitochondrial RNA can be recruited as filler DNA by the non-homologous end-joining repair process (Shay & Werbin 1992; Mourier et al. 2001; Ricchetti et al. 2004; Leister 2005; Hazkani-Covo et al. 2010), similarly to what we have proposed for the insertion of interstitial telomeric repeats (Nergadze et al. 2004, 2007).

The transfer of functional genes from mitochondrial to nuclear genomes has occurred extensively during evolution (Henze & Martin 2001), leading to the very streamlined mitochondrial genome of today’s mammals. It is conceivable that we are now observing the continuation of this type of event that is mainly transferring non-functional DNA.

The genomes of the species from the genus Equus (horses, asses and zebras) are in a rapidly evolving phase (Ryder et al. 1978; Wichman et al. 1991; Trifonov et al. 2008). For instance, the formation of evolutionarily novel centromeres (Carbone et al. 2006) still lacking satellite repeats has been recently described (Piras et al. 2009; Wade et al. 2009; Piras et al. 2010). This situation makes the study of numts particularly attractive to obtain molecular data on the evolutionary dynamics of these genomes.

In this work, the availability of the horse genome sequence offered us the opportunity to study numt insertions and their polymorphism in this species.

Materials and methods

Search of numts within the horse nuclear genome

The horse mitochondrial database sequence (acc. No.: NC_001640) was used as query for a BLAT search (BLAST-Like Alignment Tool; Kent 2002; http://genome.ucsc.edu/cgi-bin/hgBlat) to reveal all numt loci present in the Equus caballus nuclear genome database sequence (September 2007 Broad/equCab2 assembly) (Wade et al. 2009). BLAT finds sequences with at least 95% similarity that are 40 bp or longer and may be interrupted. It may miss genomic alignments that are more divergent or shorter, although it will find perfect sequence matches of as few as 22 nucleotides; BLAT is particularly useful to reveal sequences that are interrupted relative to the query. As numt sequences undergo post-insertional rearrangements, such as deletions or insertions of interspersed fragments and insertion of known repetitive elements, BLAT is a good tool for locating them (Lascaro et al. 2008). We found 82 numt loci containing 32 or more nucleotides homologous to the horse mtDNA sequence. A complete list and description of the numt loci used for this analysis is presented in Table 1. It is important to underline here that Hazkani-Covo et al. (2010) detected 203 numt loci in the horse genome using a blastn search with a threshold E-value of 0.0001.
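The length criterion described above can be illustrated with a short filter over BLAT output in PSL format. This is a minimal sketch, not the procedure actually used for the search: it assumes tab-separated PSL lines without a header, it treats "32 or more homologous nucleotides" as aligned length (matches plus mismatches), and the helper `make_psl_line` and the example hits are hypothetical.

```python
from typing import NamedTuple

MIN_ALIGNED = 32  # shortest numt reported in the text (assumed to mean aligned length)


class Hit(NamedTuple):
    chrom: str       # target (nuclear) sequence name, PSL column 14
    start: int       # target start, PSL column 16
    aligned: int     # matches + repMatches + misMatches
    identity: float  # simple percent identity over aligned bases


def parse_psl_line(line: str) -> Hit:
    """Parse one headerless PSL line (21 tab-separated fields)."""
    fields = line.rstrip("\n").split("\t")
    matches = int(fields[0]) + int(fields[2])  # matches + repMatches
    mismatches = int(fields[1])
    aligned = matches + mismatches
    return Hit(fields[13], int(fields[15]), aligned, 100.0 * matches / aligned)


def keep_numt_candidates(lines):
    """Keep hits whose aligned length reaches the 32-nt threshold."""
    hits = [parse_psl_line(line) for line in lines if line.strip()]
    return [hit for hit in hits if hit.aligned >= MIN_ALIGNED]


def make_psl_line(matches: int, mismatches: int, chrom: str, start: int) -> str:
    """Hypothetical helper: build a synthetic PSL line for the demonstration."""
    fields = ["0"] * 21
    fields[0], fields[1] = str(matches), str(mismatches)
    fields[13], fields[15] = chrom, str(start)
    return "\t".join(fields)


# Two synthetic hits: one at the threshold, one below it.
example_lines = [
    make_psl_line(32, 0, "chr14", 39907520),
    make_psl_line(20, 1, "chr1", 1000),
]
candidates = keep_numt_candidates(example_lines)
for hit in candidates:
    print(hit.chrom, hit.start, hit.aligned, f"{hit.identity:.1f}")
```

Only the first synthetic hit passes the filter; a real analysis would also need to merge adjacent alignment blocks that belong to the same locus.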

Table 1.   Numts identified in the horse reference genome¹.
Numt name | Chromosome | Starting nucleotide of numt insertion | Length of numt insertion | Starting nucleotide in horse mtDNA sequence | Identity with horse mtDNA (%) | Sequence analysis of numts² | Analysis of flanking sequences
  ¹ Horse genome database sequence at Broad Institute, MIT (Wade et al., 2009).
  ² NR, no rearrangement (see Fig. 1a); Del, deletion (see Fig. 1b); Ins, insertion (see Fig. 1c); Del/Ins, deletion and insertion (see Fig. 1d).
  ³ Numt loci analysed in different horse individuals (see Table 3); in the rows below, this marker appears as a 3 directly after the numt name.

1-131218121319164196.2NRSingle-copy sequence
1-21697051641684494382.7DelSingle-copy sequence
1-319345715668427892.7NRSingle-copy sequence
1-4110817489211001402784.1NRSingle-copy sequence
1-53111135676365514998.4NRSingle-copy sequence
1-6111545827064199496.9NRInterrupts SINE/ERE3
1-7113900943355251456393.4DelFlanked by LINE/L1
1-8115518186136532294.5NRSingle-copy sequence
1-9115518189567982094.1NRSingle-copy sequence
1-1011604322888319623381.3InsSingle-copy sequence
2-1266330511324290.0NRSingle-copy sequence
2-2265735556260614584.1DelSingle-copy sequence
2-32110796771801157893.8NRFlanked by LINE/L1
3-133348708673123098.8NRSingle-copy sequence
3-239198847953773490.0NRSingle-copy sequence
3-3310206927283892178.4NRFlanked by SINE/MIR
3-43111796087711068591.6NRInterrupts LINE/L1
4-14378568822651314799.3NRSingle-copy sequence
4-24505623637356094.0NRInterrupts LINE/L1
4-34543793758939789.3NRFlanked by LTR/ERVL and SINE/MIR
4-447568537011947890.2NRSingle-copy sequence
4-54912103871481159687.9NRSingle-copy sequence
5-135875927424921340395.4NRSingle-copy sequence
5-2572726099456110583.2Del/InsSingle-copy sequence
5-358985019247986393.7NRSingle-copy sequence
5-45913183262003577885.9DelSingle-copy sequence
6-136628656244881634094.1NRSingle-copy sequence
6-23667014027170724298.9NRInterrupts LINE/L2
6-3629621487320868483.7NRInterrupts LINE/L2
6-4629622055398968488.1NRInterrupts LINE/L2
8-18265397751161160284.3DelFlanked by LINE/L2
8-2884205943884186887.7NRInterrupts LINE/L2
9-195390309961157992.6NRSingle-copy sequence
9-2930275929324183.5DelSingle-copy sequence
9-39302792456716735479.3Del/InsSingle-copy sequence
9-493847952154484394.5NRInterrupts LINE/L1
9-5943143577701540692.9NRSingle-copy sequence
9-694379402757191.3NRSingle-copy sequence
9-79577191669514686.3NRSingle-copy sequence
9-89615321874531388081.8NRSingle-copy sequence
10-110123524534651076785.4NRSingle-copy sequence
10-210548559046012488100.0NRInterrupts SINE/ERE1
10-3107342967878173692.4NRSingle-copy sequence
10-4107370352143282090.7NRFlanked by LINE/L1
10-51075481662537269899.1NRFlanked by LINE/L2
11-11112667007451493691.2NRSingle-copy sequence
11-2112884247241838292.7NRSingle-copy sequence
11-3114601898153698096.3NRSingle-copy sequence
12-112200252442591446186.2NRSingle-copy sequence
14-1314162135078451427599.6NRInterrupts LINE/L1
14-21439907520324774100.0NRSingle-copy sequence
14-31443692682871427792.0NRSingle-copy sequence
14-414911485014457184.4DelSingle-copy sequence
14-5149115345910201531191.6InsSingle-copy sequence
15-115166745183001579584.4NRSingle-copy sequence
15-2151697592295204792.8NRSingle-copy sequence
15-31527077095761170994.8NRSingle-copy sequence
16-1161229032977514189.7NRSingle-copy sequence
16-21674871971621071890.4NRSingle-copy sequence
18-11822907251991617693.8NRSingle-copy sequence
18-2187173924841409684.6NRSingle-copy sequence
18-31823031196106138093.4NRSingle-copy sequence
19-1195878351951299288.5NRSingle-copy sequence
19-21913989012149107589.6NRSingle-copy sequence
19-331937379292801261598.8NRSingle-copy sequence
19-41951225342499468881.5NRSingle-copy sequence
20-120349325325051570187.0Del/InsFlanked by LINE/L1
21-12186721625441117479.7DelSingle-copy sequence
21-232148937028849843100.0NRSingle-copy sequence
21-321572954153596665380.3InsSingle-copy sequence
23-132312913902901549988.9NRSingle-copy sequence
23-22338318463401174097.5NRInterrupts LINE/L1
24-1243220978727691333485.3InsSingle-copy sequence
25-125166223156071247380.0NRSingle-copy sequence
26-132635276579135795187.5NRSingle-copy sequence
27-12752034273189425289.8NRFlanked by LINE/L1
27-227520919914811403690.4NRSingle-copy sequence
27-327373670286193806598.2DelSingle-copy sequence
29-129120013671041158493.5NRSingle-copy sequence
30-1301974356164696290.2NRSingle-copy sequence
31-13119419621881638100.0NRSingle-copy sequence
31-23311652020312861049899.4NRSingle-copy sequence

Source of DNA and polymerase chain reaction amplification

Unique primer pairs were designed on the sequences flanking 12 numt loci inserted within single-copy sequence (Table 1). The primer sequences lie outside the numt insertions and are reported in Table 2. Genomic DNA was extracted from 20 unrelated horses. For DNA extraction, blood was collected from 17 animals used for show-jumping competitions; these horses derive from different European stud farms and, according to their pedigree chart, do not share common ancestors up to the third generation. DNA was also extracted from fibroblast cell lines established from the skin of three different slaughtered animals. Genomic DNA (50–100 ng) was amplified by polymerase chain reaction as previously described (Nergadze et al. 2004).

Table 2.   Primers used in PCR numt analysis.
Numt loci | Primers
1-1 | GGAAGAATTCTGTGCTGGACTCAAAGGAGTGGTAAAGCAGCAA
1-5 | AGTCACATCGCAGGAAGAACTATGGGTCATATCGAGCTTTC
3-1 | TGATGGATAGGCAGATGGAGGGGTAAAGGTCTGCAGGATG
5-1 | CAGCAGCCTCTGTCTAATCCTTCACACCAAGTCCCACAACTA
6-1 | GCAGAAGGGAATAGATAAGACGAAGCGATGAATCTTGCAGAG
6-2 | CTGATGGGCTGGAAGTTGAAGCCTCTAGAACATCGGGTAATTGC
14-1 | CTGAATCAAGTTACCATCGCTGTAAGTTTGGAGTGCCCAGA
19-3 | TGCTACATTCAAACCACTTCCCATCACAGGCTCAGAGGACT
21-2 | GCCAGAGGAAGATGTGAGACGGTTTCCACACATTTGCATAG
23-1 | GCCATGTATCTACAGCAGAATCAGTTTGGGTCAGTGCTCAGT
26-1 | GCAACTGTGCCTGGAGAACAATGGCAGTCTTAGTGCTTGTGTTCA
31-2 | TGTAGGTGGGAAAAGGTGAACGGAAACAGCTTGCCAT

Reaction products were analysed by electrophoresis on 1–2% agarose gels, and fragments of interest were gel-extracted and purified (Wizard SV gel and PCR clean-up system; Promega Corporation) for sequencing.

Sequence generation and analysis

Gel-purified amplification products were sent to sequencing facilities for direct sequencing with both the forward and the reverse amplification primers. To prove specificity of amplified fragments, their sequences were compared with the corresponding loci from the horse genome database sequence (http://genome.ucsc.edu/cgi-bin/hgGateway?org=Horse&db=equCab2&hgsid=153566333).

The RepeatMasker software at EMBL (http://woody.embl-heidelberg.de/repeatmask/) was used to identify known repetitive elements (such as SINEs, LINEs, microsatellites, etc.). Sequences were aligned using the multiple sequence alignment software MultAlin (http://prodes.toulouse.inra.fr/multalin/multalin.html).

Results and discussion

Search of numts in the horse genome reference sequence

We used the horse mitochondrial sequence from GenBank as query for a BLAT search aimed at revealing the nuclear genome loci containing mitochondrial DNA insertions (numts) in the horse genome reference sequence. Eighty-two such loci with 78–100% identity to mtDNA were found. Numt length varies from 32 (at nt 39907520 of horse chromosome 14) to 9456 bp (at nt 7272609 of horse chromosome 5). In the horse nuclear reference genome, 77191 bp (0.0029%) are composed of sequences of mitochondrial origin. A complete list of the 82 numt insertions found in this search is reported in Table 1. Twenty-nine of the 82 numts (35%) are inserted into introns of known or predicted genes. The remaining 53 numts are located at intergenic sites. No numt insertions were observed within exons. Hazkani-Covo et al. (2010) performed a blast search and found 203 blast hits. The discrepancy between the number of numts found by Hazkani-Covo et al. and by us derives from the different approaches used (see Materials and methods). In addition, while we performed our search on the genome of a single individual (the reference genome of the Thoroughbred mare Twilight), the other authors carried out a blast search that includes sequence runs from the genomes of other individuals; because of the high frequency of numt insertion polymorphisms that we demonstrate in the present work, it is likely that the search of Hazkani-Covo et al. (2010) may have included some numts that are absent in Twilight.

Distribution and sequence analysis of numts

In the human and other genomes, numt insertion was sometimes followed by duplication of the surrounding region, leading to duplication of the numt loci; in the horse reference genome, by contrast, we did not find duplicated numt loci. This observation is consistent with the relatively low abundance of duplications in the horse genome in comparison with other genomes (Wade et al. 2009).

The overall distribution in the horse reference genome is one numt per 33 Mb, with values ranging between less than one per 124 Mb (the length of chromosome X, which seems to be devoid of numt insertions) and one per 10.5 Mb (chromosome 9) (Table 1). It should be noted, however, that owing to their insertion polymorphism, not all numt loci are present in the reference genome. In conclusion, although some chromosomes seem more prone to numt insertion than others, similarly to human chromosomes 18 and Y (Ricchetti et al. 2004), given the relatively small number of insertions, our data do not allow us to identify the presence of chromosomal regions clearly favouring or disfavouring numt insertions. In a similar manner to interstitial telomeric repeat insertions (Azzalin et al. 2001; Nergadze et al. 2004, 2007; Ruiz-Herrera et al. 2008), numt insertions (Ricchetti et al. 1999, 2004; Hazkani-Covo & Covo 2008) are the consequence of DNA double-strand break repair; therefore, we may expect that these events also occur preferentially at chromosomal regions more prone to breakage.
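The genome-wide figures quoted in this section can be cross-checked against each other with simple arithmetic. The sketch below uses only numbers stated in the text (82 numts, one numt per 33 Mb, 77191 bp of mitochondrial origin); the genome size it derives is therefore an approximation implied by those rounded values, not a measured assembly length.

```python
NUMT_COUNT = 82    # numt loci in the reference genome
MB_PER_NUMT = 33   # reported average spacing, in Mb
NUMT_BP = 77191    # total nuclear sequence of mitochondrial origin, in bp

# Genome size implied by the reported density (approximate by construction).
approx_genome_bp = NUMT_COUNT * MB_PER_NUMT * 1_000_000

# Percentage of the nuclear genome made up of numt sequence.
numt_percent = 100.0 * NUMT_BP / approx_genome_bp

print(f"implied genome size: {approx_genome_bp / 1e9:.2f} Gb")
print(f"numt fraction: {numt_percent:.4f}%")  # agrees with the 0.0029% quoted above
```

The two independently reported values (density and fraction) are thus mutually consistent at the stated precision.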

To detect mutations that may have occurred in the mitochondrial DNA following its insertion in the horse genome, we compared the sequence of the 82 numts with the horse mitochondrial DNA reference sequence. However, it is important to underline that mitochondrial DNA is characterized by high mutation rates and polymorphism and that the identity values reported in Table 1 derive from the comparison of one nuclear genome (the genome of Twilight) with one mitochondrial DNA sequence (the reference horse mtDNA sequence). From Table 1, it appears that four relatively short numts share an identical sequence with the corresponding mitochondrial genome fragment (numts 10-2, 14-2, 21-2 and 31-1); all other numts contain variable numbers of point mutations (transitions, transversions, nucleotide additions or deletions), with identity values ranging between 78.4% and 99.6%. Among these, 16 numts also underwent gross rearrangements (Table 1); the sequence of four numts representative of different situations is depicted in Fig. 1. Fig. 1a shows an example of a numt in which no gross rearrangement has occurred. Nine numts (1-2, 1-7, 2-2, 5-4, 8-1, 9-2, 14-4, 21-1 and 27-3) contain non-contiguous pieces of mitochondrial DNA in the same orientation; these rearrangements are probably attributable to deletions (23–2070 bp) of internal fragments from integrated numt sequences; an example is reported in Fig. 1b. Four numts (1-10, 14-5, 21-3 and 24-1) are interrupted by 17- to 2141-bp sequences; the insertion of 259 bp from an ERE2_SINE/tRNA element is reported in Fig. 1c. Three numts (5-2, 9-3 and 20-1) contain more complex rearrangements, including both insertions and deletions (one example is shown in Fig. 1d).

Figure 1.

 Post-integration modifications of numt loci. The sequence of four numt insertions (red nucleotides) is compared with the sequence of the corresponding region of the horse mitochondrial DNA (blue nucleotides); flanking and inserted nucleotides are in black. (a) At the numt locus on chromosome 4 (nt 37856882), the 265 bp from mtDNA (see Table 1) did not undergo gross rearrangements. (b) At the numt locus on chromosome 21 (nt 8672162), 64 nt were deleted following insertion. (c) At the numt locus on chromosome 1 (nt 160432288), a 259-nt fragment from the repetitive element ERE2_SINE/tRNA was inserted. (d) At the numt locus from chromosome 20 (nt 3493253), a 207-bp fragment was deleted and an 85-bp random sequence was inserted.

In Fig. 2, a map of the horse mtDNA with the distribution of its genes is depicted. Each external black line represents the sequence of a numt. The map shows that the entire horse mitochondrial genome is represented as numts, without gross bias. In particular, the ribosomal RNA genes do not seem to be conspicuously more represented, contrary to what is suggested for other mammals (Qu et al. 2008).

Figure 2.

 Horse mtDNA coverage of numt fragments. The mitochondrial protein coding genes and the control region are depicted in different colours on the horse mitochondrial DNA map (purple: rRNA genes, red: tRNA genes, grey: control region, green: protein genes). Each black line represents a numt insertion in the horse genome reference sequence.

It has been suggested that numt fragments could derive from the insertion of either retrotranscribed RNAs or mtDNA fragments. The available data do not allow us to distinguish between these two possibilities. The mechanism of mtDNA transcription involves the processing of two long complementary transcripts covering the entire genome, and the transcript of the light strand is fragmented into a number of short primers; therefore, the possibility of retrotranscription must certainly be considered, even if this is made less likely by the fact that polyA tracts appear to be absent in the inserts, similarly to what has already been described in humans (Leister 2005).

All of these results suggest that mtDNA sequences have appeared in nuclear intrachromosomal locations by transfer of mtDNA fragments into DNA double-strand break sites as filler DNA derived either directly from mtDNA fragmentation or by retrotranscription of mitochondrial RNA.

Analysis of the numt insertion sites

Analysis of the sequences flanking the insertions showed that the GC content at numt integration sites did not differ from that of the entire Equus caballus genome: 39.62% using 1-kb intervals spanning the insertion sites, and 39.26% using 10-kb intervals, when compared with 40.27% (Leeb et al. 2006) (data not shown).
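The GC comparison described above amounts to computing base composition in fixed windows centred on each insertion site. The following is a minimal sketch under stated assumptions: the function names, the handling of ambiguous bases (ignored) and the toy sequence are illustrative and do not reproduce the actual analysis.

```python
def gc_percent(seq: str) -> float:
    """GC content (%) over unambiguous bases; N and other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def window_around(chrom_seq: str, site: int, width: int) -> str:
    """Return a window of about `width` bases centred on a 0-based `site`."""
    half = width // 2
    return chrom_seq[max(0, site - half): site + half]


# Toy chromosome (hypothetical): a GC-poor stretch followed by a GC-rich one.
toy_chrom = "AT" * 500 + "GC" * 500
window = window_around(toy_chrom, site=1000, width=1000)
print(f"{gc_percent(window):.1f}")
```

In practice the same function would be applied with widths of 1000 and 10000 to each of the 82 insertion sites and the values averaged.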

RepeatMasker analysis of the sequences flanking the numt insertions showed that, in 64 of 82 cases (78%), mtDNA sequences are inserted within single-copy sequence (Table 1). Because 26% of the horse genome is made up of LINEs and SINEs (Wade et al. 2009), and 18 of the 82 numts (22%) are inserted within or flanked by one of these repetitive elements, numt insertions seem to occur essentially at random sites.
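The comparison between 22% of numts and 26% of the genome can be framed as an exact binomial test. This is an illustrative sketch rather than an analysis reported in the paper: it assumes that each insertion independently lands in a LINE or SINE with probability 0.26 and treats "within or flanked by" as a hit, which is a simplification.

```python
from math import comb

N_NUMTS = 82            # numt loci examined
IN_REPEATS = 18         # numts within or flanked by LINEs/SINEs
REPEAT_FRACTION = 0.26  # fraction of the genome made up of LINEs and SINEs


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p(k_obs: int, n: int, p: float) -> float:
    """Exact two-sided p-value: sum of outcomes no more likely than the observed one."""
    p_obs = binom_pmf(k_obs, n, p)
    total = sum(
        binom_pmf(k, n, p)
        for k in range(n + 1)
        if binom_pmf(k, n, p) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


expected = N_NUMTS * REPEAT_FRACTION
p_value = two_sided_p(IN_REPEATS, N_NUMTS, REPEAT_FRACTION)
print(f"expected in repeats: {expected:.1f}, observed: {IN_REPEATS}")
print(f"two-sided p-value: {p_value:.2f}")
```

A p-value well above conventional thresholds is what one would expect if insertion sites are not biased towards or away from these repeats.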

Polymorphism of numt insertion in the horse

As mentioned previously, a large body of evidence suggests that the horse genome is in a rapidly evolving phase; therefore, we might expect that several numt insertions have occurred in the horse lineage in relatively recent evolutionary times and have not yet been fixed in the horse species, giving rise to polymorphism. To test this possibility, 12 numt loci (marked in Table 1) were analysed by PCR in 20 unrelated horse individuals, using the primers listed in Table 2. The numt inserts at these loci are between 73 and 1286 bp long, are >87% identical to the horse mtDNA and are flanked by single-copy sequences on which the primer pairs were designed.

The results (Table 3) showed that, for six of the 12 numts (1-1, 1-5, 6-1, 21-2, 23-1 and 26-1), all individuals analysed were homozygous for the presence of the numt (numt+/+), suggesting that these insertions are either very frequent or even fixed in the horse species. The remaining six numts (50% of numts analysed) are characterized by insertion polymorphism, as in the studied individuals null alleles (i.e. amplified sequences not containing the numt insertion) are also present. The observation of these ‘empty’ alleles at 50% (six out of 12) of numt loci strongly supports the hypothesis that the horse genome is evolving rapidly, at least in comparison with the human genome, in which the fraction of polymorphic numt loci is lower (Hazkani-Covo et al. 2010). The frequency of null alleles in the 20 individuals is variable, ranging between 7.5% (numt 5-1) and 100% (numt 14-1). In the case of numt 14-1, only null alleles were found in the 20 horses, and this numt locus was only present in Twilight, suggesting that this insertion was a recent event. Figure 3 reports the results of PCR amplification of the DNA of three representative polymorphic loci in one heterozygous individual, one individual homozygous for the presence of the numt and one homozygous for the null allele.
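The null-allele frequencies quoted above follow directly from the genotype counts in Table 3: each homozygote contributes two copies of its allele and each heterozygote one copy of each. The short sketch below recomputes them for the six polymorphic loci; the genotype counts are taken from Table 3, while the function and variable names are illustrative.

```python
# Genotype counts (numt+/+, numt-/-, numt+/-) for the polymorphic loci in Table 3.
GENOTYPES = {
    "3-1": (17, 1, 2),
    "5-1": (17, 0, 3),
    "6-2": (1, 16, 3),
    "14-1": (0, 20, 0),
    "19-3": (12, 2, 6),
    "31-2": (2, 13, 5),
}


def null_allele_percent(hom_plus: int, hom_null: int, het: int) -> float:
    """Percentage of null alleles among all alleles sampled at one locus."""
    total_alleles = 2 * (hom_plus + hom_null + het)
    null_alleles = 2 * hom_null + het
    return 100.0 * null_alleles / total_alleles


null_freq = {locus: null_allele_percent(*counts) for locus, counts in GENOTYPES.items()}
for locus, freq in null_freq.items():
    print(f"numt {locus}: {freq:.1f}% null alleles")
```

The recomputed values range from 7.5% (numt 5-1) to 100% (numt 14-1), in agreement with the range given in the text.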

Table 3.   Numt loci in 20 unrelated horses.
Numt name | PCR fragment length¹, numt+ (bp) | PCR fragment length¹, null (bp) | Alleles, numt+: n (%) | Alleles, null: n (%) | Homozygous numt+/+: n (%) | Homozygous numt−/−: n (%) | Heterozygous numt+/−: n (%)
1-1 | 1230 | NF | 40 (100) | 0 (0) | 20 (100) | 0 (0) | 0 (0)
1-5 | 978 | NF | 40 (100) | 0 (0) | 20 (100) | 0 (0) | 0 (0)
3-1 | 585 | 507 | 36 (90.0) | 4 (10.0) | 17 (85.0) | 1 (5.0) | 2 (10.0)
5-1 | 782 | 290 | 37 (92.5) | 3 (7.5) | 17 (85.0) | 0 (0) | 3 (15.0)
6-1 | 745 | NF | 40 (100) | 0 (0) | 20 (100) | 0 (0) | 0 (0)
6-2 | 1522 | 1346 | 5 (12.5) | 35 (87.5) | 1 (5.0) | 16 (80.0) | 3 (15.0)
14-1 | NF | 694 | 0 (0) | 40 (100) | 0 (0) | 20 (100) | 0 (0)
19-3 | 323 | 239 | 30 (75.0) | 10 (25.0) | 12 (60.0) | 2 (10.0) | 6 (30.0)
21-2 | 439 | 355 | 40 (100) | 0 (0) | 20 (100) | 0 (0) | 0 (0)
23-1 | 413 | NF | 40 (100) | 0 (0) | 20 (100) | 0 (0) | 0 (0)
26-1 | 845 | NF | 40 (100) | 0 (0) | 20 (100) | 0 (0) | 0 (0)
31-2 | 1669 | 383 | 9 (22.5) | 31 (77.5) | 2 (10.0) | 13 (65.0) | 5 (25.0)
  ¹ NF, not found in the 20 individuals.

Figure 3.

 PCR fragments obtained from three polymorphic numt loci. The results of agarose (1–2%) gel electrophoresis, following PCR amplification, are shown. 1: Homozygous individual for the numt insertion. 2: Heterozygous individual. 3: Homozygous individual for the null allele. The molecular weight of the fragments is shown to the left of the gel.

In Fig. 4, the sequences of the numt allele and the null allele at three loci are compared with the corresponding portion of horse mtDNA. In all three cases, the inserted mitochondrial sequence interrupts the ancestral empty sequence, and 1–6 random nucleotides were added at the 3′ end of the insert. This sequence organization suggests that numts were probably inserted during the repair of DNA double-strand breaks via the non-homologous end-joining pathway. In fact, similar modifications of break sites during repair were previously observed in mammalian cellular systems (Rebuzzini et al. 2005). In addition, dinucleotide microhomology between the mitochondrial DNA and the genomic insertion site is present at the 5′ end (Fig. 4, shaded in grey), similar to what was observed for other mammalian numt insertions (Hazkani-Covo & Covo 2008) and interstitial telomeric sequences (Nergadze et al. 2004, 2007), suggesting that such microhomology facilitates ligation of the broken DNA ends with the mitochondrial DNA fragment.
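Junction microhomology of the kind described above can be measured by comparing the nuclear sequence immediately upstream of the insert with the mitochondrial sequence immediately upstream of the transferred fragment. The sketch below is one possible way to do this, under the assumption that microhomology is counted as the run of identical bases reading leftwards from the 5′ junction; the sequences are toy examples and are not taken from Fig. 4.

```python
def junction_microhomology(nuclear_upstream: str, mt_upstream: str) -> int:
    """Count identical bases shared by two sequences, reading leftwards from the junction.

    nuclear_upstream: nuclear flank ending at the 5' boundary of the insert.
    mt_upstream: mtDNA ending just before the first base of the transferred fragment.
    """
    shared = 0
    for nuc_base, mt_base in zip(
        reversed(nuclear_upstream.upper()), reversed(mt_upstream.upper())
    ):
        if nuc_base != mt_base:
            break
        shared += 1
    return shared


# Toy example (hypothetical sequences): both end in "AG", then diverge.
nuclear_flank = "TTGACCAG"
mt_flank = "CCATTTAG"
print(junction_microhomology(nuclear_flank, mt_flank))  # 2, i.e. a dinucleotide
```

A shared run of two bases corresponds to the dinucleotide microhomology reported at the 5′ end of the three loci.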

Figure 4.

 Sequence of the numt and the null allele at three polymorphic loci. The mitochondrial reference sequences are shown in blue; genomic numt insertions are shown in red and genomic flanking nucleotides in black. Insertions of random nucleotides are in green, and microhomology between mitochondrial and nuclear DNA is shaded in grey. For each locus, the first and the last nucleotide of the numt in the nuclear genome (top) and in the mitochondrial DNA sequence (bottom) are reported.

Concluding remarks

In our survey of mitochondrial DNA insertions in the horse reference genome sequence, we identified 82 numts. Unexpectedly, at some loci, the null allele is prevalent in the analysed sample, suggesting that the number of numt loci in the global horse population is higher than the 82 present in Twilight. The analysis of the insertion sites is consistent with the argument that numts arise from the repair of DNA double-strand breaks by non-homologous end-joining during evolution, similarly to interstitial telomeric sequences (Nergadze et al. 2004, 2007; Ruiz-Herrera et al. 2008).

It is worth noting that in the human genome (Giampieri et al. 2004; Ricchetti et al. 2004; Hazkani-Covo et al. 2010), the proportion of loci characterized by insertion polymorphism is much lower than that in the horse. This observation is in agreement with the notion that the horse genome is in a rapidly evolving state (Ryder et al. 1978; Wichman et al. 1991; Carbone et al. 2006; Trifonov et al. 2008; Wade et al. 2009; Piras et al. 2010).

Acknowledgements

This work was funded by Programmi di Ricerca Scientifica di Rilevante Interesse Nazionale (PRIN 2008).

Conflicts of interest

The authors have declared no potential conflicts.
