Table 1 notes: A, B, types of SRSRs that are distinct (more than 3 bp differences) within the same microorganism. ND, not determined.
Biological significance of a family of regularly spaced repeats in the genomes of Archaea, Bacteria and mitochondria
Article first published online: 18 JAN 2002
Volume 36, Issue 1, pages 244–246, April 2000
How to Cite
Mojica, F. J. M., Díez-Villaseñor, C., Soria, E. and Juez, G. (2000), Biological significance of a family of regularly spaced repeats in the genomes of Archaea, Bacteria and mitochondria. Molecular Microbiology, 36: 244–246. doi: 10.1046/j.1365-2958.2000.01838.x
- Issue published online: 18 JAN 2002
- Received 20 December, 1999; revised 5 January, 2000; accepted 10 January, 2000.
A peculiar type of repeated element has been detected in different prokaryotes and, as genome sequencing proceeds, similar elements are being reported in phylogenetically very distant groups. A comparative study of these elements, aimed at determining their common structural and sequence features as well as their phylogenetic distribution, will help to elucidate their biological relevance.
These sequences share multiple features that, taken together, are unique; they are easily distinguishable from any other recurrent motif and constitute a new family of prokaryotic repeats. They are short repeated elements that generally occur in clusters, but their main peculiarity is their layout: they are always regularly spaced by unique intervening sequences of constant length. For the sake of clarity, and in keeping with these characteristics, we will refer to the members of this family of repeats as Short Regularly Spaced Repeats (SRSRs).
Using a specific computer program, we searched for SRSRs in the completed microbial genomes and in the available partial sequences of genomes close to completion. The organisms in which SRSRs have been found so far are listed in Table 1. In summary, SRSRs are widespread among the various physiological and phylogenetic groups, probably being present in all the Archaea and hyperthermophilic Bacteria, in at least some members of the cyanobacteria and proteobacteria lineages, and in the two subgroups of Gram-positive bacteria (the low and high GC content groups). They thus represent the most widely distributed family of repeats among prokaryotic genomes.
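The search program itself is not described in this note. Purely as an illustration of what such a search involves, the following Python sketch looks for exact copies of a word of fixed length that recur with a spacer of 20–58 bp between consecutive copies. The function name, its parameters and the exact-match criterion are assumptions made for this example, not the method used in this work.

```python
from collections import defaultdict


def find_spaced_repeats(seq, repeat_len, min_spacer=20, max_spacer=58, min_units=3):
    """Return (word, [start positions]) for words of length `repeat_len`
    that occur at least `min_units` times in a row, with each consecutive
    pair of copies separated by min_spacer..max_spacer bases.

    Hypothetical helper: exact matches only, one fixed repeat length.
    """
    seq = seq.upper()

    # Index every window of the chosen length by its start position.
    positions = defaultdict(list)
    for i in range(len(seq) - repeat_len + 1):
        positions[seq[i:i + repeat_len]].append(i)

    clusters = []
    for word, starts in positions.items():
        if len(starts) < min_units:
            continue
        run = [starts[0]]
        for start in starts[1:]:
            spacer = start - run[-1] - repeat_len
            if min_spacer <= spacer <= max_spacer:
                run.append(start)  # regular spacing continues
            else:
                if len(run) >= min_units:
                    clusters.append((word, run))
                run = [start]  # begin a new candidate cluster
        if len(run) >= min_units:
            clusters.append((word, run))
    return clusters
```

Because the window length is fixed, windows shifted by a base or two relative to the true repeat may also be reported when neighbouring spacers happen to begin or end with the same base; a practical search would merge such overlapping hits and tolerate a few mismatches between units.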
| Organism | SRSR size (bp) | Spacing (bp) | Number of clusters | SRSR units per cluster | Reference |
|---|---|---|---|---|---|
| H. volcanii | 30 | ND | ≥2 | ND | Mojica et al. (1995) Mol Microbiol 9: 13–21 |
| H. mediterranei | 30 | 33–39 | 3 | 21/ND/ND | Mojica et al. (1995) Mol Microbiol 9: 13–21 |
| M. jannaschii | 28–30 | 31–51 | 7A + 6B + 1C | 4–25 | Bult et al. (1996) Science 273: 1058–1073 and this work |
| M. thermoautotrophicum | 30 | 34–38 | 2 | 124/47 | This work |
| A. fulgidus | 37A/30B | ≈ 37 | 1A + 2B | 42A/48B/60B | This work |
| S. solfataricus | 25 | ≈ 40 | 2 | 94/102 | Sensen et al. (1998) Extremophiles 2: 305–312 |
| P. abyssi | 29A/30B | 26–43 | 1A + 2B | 7A/22B/27B | This work |
| P. horikoshii | 29 | 34–58 | 3 | 18/26/66 | Kawarabayasi et al. (1998) DNA Res 5: 55–76 |
| A. pernix | 24A/23B | 37–52 | 2A + 1B | 19A/27A/42B | Kawarabayasi et al. (1999) DNA Res 6: 83–101 |
| T. maritima | 30 | 39–40 | 8 | 2–40 | Nelson et al. (1999) Nature 399: 323–329 |
| A. aeolicus | 29 | 36–38 | 1 | 6 | This work |
| E. coli | 29 | 32–33 | 3 | 2/7/13 | Nakata et al. (1989) J Bacteriol 171: 3553–3556 and this work |
| S. typhi | 29 | 32 | 1 | 6 | This work |
| C. jejuni | 36 | 30 | 1 | 5 | This work |
| Y. pestis | 28 | 32–33 | 2 | 6/9 | This work |
| C. difficile | 29 | 36–38 | 4A + 2B | 5–17 | This work |
| M. tuberculosis | 36 | 38–40 | 1 | Variable | Hermans et al. (1991) Infect Immun 59: 2695–2705 |
| Calothrix sp. | 37 | 35–41 | >1 | 5 | Masepohl et al. (1996) Biochim Biophys Acta 1307: 20–36 |
| Anabaena sp. | 37 | 32–43 | >1 | 17 | Masepohl et al. (1996) Biochim Biophys Acta 1307: 20–36 |
| V. faba | 40 | 20–35 | 1 | 6 | Flamand et al. (1992) Plant Mol Biol 19: 913–923 |
The main features of the SRSRs are summarized in Table 1. They are typically short, partially palindromic sequences of 24–40 bp, containing inner and terminal inverted repeats of up to 11 bp (see Fig. 1). Although isolated elements have been detected, the SRSR elements are generally arranged in clusters (up to 14 per genome) of repeated units separated by unique intervening sequences of 20–58 bp. The extent of the clusters is particularly noteworthy in the Archaea.
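To make the notion of a terminal inverted repeat concrete, the sketch below measures how many bases at the start of an element are the reverse complement of the bases at its end. It is only an illustration: the function names are hypothetical, no mismatches are tolerated, and inner inverted repeats would need a separate scan over internal positions.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def terminal_inverted_repeat(seq):
    """Length of the longest prefix of `seq` that is the reverse
    complement of the suffix of the same length (exact matches only,
    capped at half the sequence length so the two arms cannot overlap).
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    n = 0
    # The first n bases of the reverse complement of the whole sequence
    # are the reverse complement of its last n bases.
    while n < len(seq) // 2 and seq[n] == rc[n]:
        n += 1
    return n
```

For a made-up element such as `"GTTCC" + "A" * 10 + "GGAAC"`, the two 5 bp arms are reverse complements of each other, so the function reports a terminal inverted repeat of 5 bp.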
The SRSRs are very homogeneous within a genome, most of them being identical. However, there are examples of heterogeneity, especially in Archaea. Various SRSR sequences with less than 85% similarity can be distinguished in Pyrococcus abyssi, Archaeoglobus fulgidus, Aeropyrum pernix and Methanococcus jannaschii. In the latter, two clusters with 25 and five units of the same element were initially reported (Bult et al., 1996, Science 273: 1058–1073). We have found 12 additional loci and three different SRSR elements, with more than 5 bp changes.
The sequence is conserved in members of the same phylogenetic group, and there is a high percentage of similarity even among domains (see Fig. 1), indicative of a common origin. Phylogenetic distance and the degree of sequence conservation closely concur. Haloferax volcanii differs from Haloferax mediterranei in 3 out of 30 bp, and Pyrococcus horikoshii differs from Pyrococcus abyssi in 2 out of 29 bp. The high degree of homology between Escherichia coli and Salmonella typhi is remarkable, with one difference out of 29 bp.
The terminal and inner inverted repeats of each element are the most conserved regions of the SRSRs (Fig. 1), suggesting that they play an essential role.
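The difference counts quoted above translate directly into percentage identities. The short sketch below does the arithmetic and also shows how differences between two aligned, equal-length repeats could be counted; both helper names are hypothetical and are used here only for illustration.

```python
def identity_from_mismatches(mismatches, length):
    """Percent identity given a number of differing positions."""
    return round(100 * (length - mismatches) / length, 1)


def count_differences(a, b):
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


# Pairs quoted in the text: (differences, repeat length in bp).
pairs = {
    "H. volcanii / H. mediterranei": (3, 30),
    "P. horikoshii / P. abyssi": (2, 29),
    "E. coli / S. typhi": (1, 29),
}
identities = {name: identity_from_mismatches(d, n) for name, (d, n) in pairs.items()}
# identities: 90.0, 93.1 and 96.6 per cent, in the order listed above
```

All three pairs are therefore well above the 85% similarity that separates the distinct SRSR types found within some archaeal genomes.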
In M. jannaschii, Methanobacterium thermoautotrophicum, A. fulgidus, Thermotoga maritima, A. pernix and Mycobacterium tuberculosis, some SRSR clusters are followed by larger (> 300 bp) repeated elements. This association is not detectable in other microorganisms, nor is its possible relevance known.
A general pattern in the location of SRSR loci is not recognizable. There is, however, a remarkable coincidence. Possible chromosomal origins of replication have recently been proposed for the Archaea M. thermoautotrophicum and P. horikoshii (Lopez et al. 1999, Mol Microbiol 32: 883–886). In both cases, two clusters of SRSRs are located one on each side of the proposed origin of replication. The distances to the origin are similar, and relatively short, for both clusters (200 and 270 kb in M. thermoautotrophicum, 40 and 78 kb in P. horikoshii). The early and simultaneous appearance of the SRSR clusters in the nascent molecules can be interpreted as an indication of their relevance.
Besides sequence conservation, other remarkable features of this family of tandem repeats are the palindromic nature and regular spacing of the SRSR elements. The size of the repeated unit and the presence of short inner inverted repeats are consistent with the characteristics of recognition sites for certain DNA-binding proteins. The regular spacing of the SRSR elements places the inverted repeats on the same side of the DNA chain. Although cooperative binding to free proteins cannot be excluded, this peculiar arrangement, with such a long array of regularly positioned sites, would rather suggest the need for solid attachment to a cellular structure that is organized accordingly. This would agree with the role in replicon partitioning previously proposed for the SRSRs of haloarchaea (Mojica et al. 1995, Mol Microbiol 9: 13–21).
The question that emerges is whether the SRSRs have a common function in prokaryotes, or whether they are vestiges of ancient sequences whose roles have diverged during evolution. The universality, phylogeny and biological significance of this peculiar family of repeats remain to be elucidated.
This work was financed by a research grant from the Conselleria de Cultura Educació i Ciència, Generalitat Valenciana (GV97-VS-25–82). E.S. holds a graduate fellowship from the Conselleria de Cultura Educació i Ciència, Generalitat Valenciana.
The sequence data for the unfinished genomes were produced by the S. typhi (Salmonella typhi), C. jejuni (Campylobacter jejuni), Y. pestis (Yersinia pestis) and C. difficile (Clostridium difficile) Sequencing Groups at the Sanger Centre and can be obtained from ftp://ftp.sanger.ac.uk/pub/pathogens/st, ftp://ftp.sanger.ac.uk/pub/pathogens/cj, ftp://ftp.sanger.ac.uk/pub/pathogens/yp and ftp://ftp.sanger.ac.uk/pub/pathogens/cd, respectively.