Instituto de Microbiología Bioquímica/Departamento de Microbiología y Genética, Consejo Superior de Investigaciones Científicas (CSIC)/Universidad de Salamanca, Campus Miguel de Unamuno, 37007 Salamanca, Spain.
Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA.
The sequencing of the entire genetic complement of Streptomyces coelicolor A3(2) has been completed with the determination of the 356 023 bp sequence of the linear plasmid SCP1. Remarkably, the functional distribution of SCP1 genes somewhat resembles that of the chromosome: predicted gene products/functions include ECF sigma factors, antibiotic biosynthesis, a gamma-butyrolactone signalling system, members of the actinomycete-specific Wbl class of regulatory proteins and 14 secreted proteins. Some of these genes are among the 18 that contain a TTA codon, making them targets for the developmentally important tRNA encoded by the bldA gene. RNA analysis and gene fusions showed that one of the TTA-containing genes is part of a large bldA-dependent operon, the gene products of which include three proteins isolated from the spore surface by detergent washing (SapC, D and E), and several probable metabolic enzymes. SCP1 shows much evidence of recombinational interactions with other replicons and transposable elements during its history. For example, it has two sets of partitioning genes (which may explain why an integrated copy of SCP1 partially suppressed the defective partitioning of a parAB-deleted chromosome during sporulation). SCP1 carries a cluster of probable transfer determinants and genes encoding likely DNA polymerase III subunits, but it lacks an obvious candidate gene for the terminal protein associated with its ends. This may be related to atypical features of its end sequences.
Streptomyces spp. are morphologically and genetically complex members of the high G+C Gram-positive actinomycetes, and are important as the major source of natural antibiotics and therefore of genes for combinatorial biosynthesis. Streptomyces coelicolor A3(2) has become a model organism for this genus for two main reasons: its early development as a genetic system and its production of at least four chemically distinct antibiotics, of which two are pigmented and therefore convenient for genetic studies (Hopwood, 1999). The ≈ 8.7 Mb linear chromosome of S. coelicolor A3(2) has been sequenced (Bentley et al., 2002; http://www.sanger.ac.uk/Projects/S_coelicolor/).
Soon after early conjugation-based genetic mapping studies of S. coelicolor had demonstrated that conventional genetic markers could all be aligned on a single linkage map (Hopwood, 1967), it became apparent that the levels, direction and pattern of natural genetic exchange involved a genetically defined plasmid, SCP1 (Vivian and Hopwood, 1970). It was deduced that SCP1 existed autonomously in the original A3(2) strain, and that it had integrated into the chromosome in certain fertility variants (Hopwood et al., 1973). SCP1 could also be lost altogether, and SCP1– × SCP1– crosses exhibited very low, but still perceptible, fertility. Subsequently, most of this residual fertility was attributed to a second plasmid, SCP2 (Bibb et al., 1977), an ≈ 30 kb circle, the complete sequence of which was determined recently (Haug et al., 2003).
SCP1 resisted isolation as an intact molecule for many years until pulsed-field gel electrophoresis (PFGE) revealed it as a linear molecule of ≈ 350 kb with long terminal inverted repeats (TIRs) of about 80 kb (Kinashi et al., 1987; Kinashi and Shimaji-Murayama, 1991). It has since become clear that linear plasmids (so far ranging from c. 12 kb to many hundreds of kilobases) are present in many streptomycetes (Kinashi, 1994) and other actinomycetes (e.g. Le Dantec et al., 2001). Particularly with relatively small linear plasmids of actinomycetes, progress has been made in studying their mode of replication, which typically involves bidirectional replication from a centrally located origin and end-filling of the lagging strand from a protein bound to the 5′ ends (e.g. Bao and Cohen, 2001; Yang et al., 2002).
The detailed characterization of the genetic composition of representative large linear plasmids may provide further insights into the origins and evolution of linear chromosomes in Streptomyces spp. (where studied, other actinomycetes such as mycobacteria and corynebacteria have circular chromosomes; Cole et al., 1998; Bentley et al., 2002). Recently, Mochizuki et al. (2003) found that about 75% of the sequence of pSLA2-L, a linear plasmid of 210 614 bp from Streptomyces rochei, comprises genes related to secondary metabolism. Here, we analyse the 356 023 bp sequence of SCP1 and show that its complex evolution has included the acquisition of genes that probably represent adaptations to reproduction via a sporulating aerial mycelium and to life in the complex soil environment. As an example, we analyse a large, developmentally controlled operon that encodes several proteins associated with the surface of spores as well as some predicted metabolic enzymes.
Overview of the DNA sequence of SCP1, in relation to the chromosome, other plasmids and the lifestyle of streptomycetes
General features of the 356 023 bp SCP1 sequence are shown in Fig. 1. The sequence and annotation are available in the public databases with EMBL accession number AL589148. At 69.32%, SCP1 has a lower G+C content than the S. coelicolor chromosome (72.12%; Bentley et al., 2002), the Streptomyces avermitilis chromosome (70.7%; Ikeda et al., 2003) and the large linear plasmid pSLA2-L of S. rochei (72.8%; Mochizuki et al., 2003). There are segmental shifts in GC strand bias, which, unlike the situation often found in bacteria (Ochman and Santos, 2003), do not correlate precisely with a centrally located autonomously replicating sequence tentatively identified as a replication origin (see below) (Fig. 1). Nevertheless, like most bacterial replicons (Ochman and Santos, 2003), but in contrast to the S. coelicolor chromosome (Bentley et al., 2002), SCP1 displays an upward shift (i.e. a bias towards G) on most of what would be considered as the leading strand, based on the presumptive central replication origin. As in the host chromosome, there are further shifts at the junctions of the central region with the long (75 122 bp) TIRs. The predominant direction of transcription is strongly biased towards the centre in the left non-TIR half and shows little bias in the right non-TIR half.
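The G+C content and strand-bias figures quoted above rest on simple base counts. As a minimal sketch (not the software used to prepare Fig. 1, which is not specified here), the following Python functions compute percentage G+C and a windowed GC skew, (G − C)/(G + C), together with its running sum. The function names, the non-overlapping window scheme and the toy sequences are illustrative assumptions.

```python
from itertools import accumulate


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq: str, window: int) -> list[float]:
    """(G - C) / (G + C) for consecutive non-overlapping windows.

    Windows containing neither G nor C are given a skew of 0.0.
    A trailing partial window is ignored.
    """
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_skew(skews: list[float]) -> list[float]:
    """Running sum of window skews; turning points mark shifts in bias."""
    return list(accumulate(skews))


if __name__ == "__main__":
    toy = "GGGCAAAACC"  # hypothetical sequence, not SCP1 data
    print(gc_content(toy), gc_skew(toy, 5), cumulative_skew(gc_skew(toy, 5)))
```

A positive skew in a window corresponds to the bias towards G described in the text; on a real replicon the window size would be chosen in the kilobase range.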
Of the 353 predicted SCP1 coding sequences (CDSs), of which five are pseudogenes, 75 reside in each TIR. Remarkably, 57% of the deduced gene products show no significant database matches, compared with 23% for the host's chromosomal genes. A further nearly 20% are conserved hypotheticals, compared with 30% of chromosomal genes. Overall, 128 (36%) CDSs could be allocated some functional prediction when database similarities and structural predictions were combined (Table 1). In line with the distribution of gene functions on the chromosome, SCP1 encodes a high proportion of proteins with predicted regulatory functions (about 6%). Among these, three appear to be ECF (extracytoplasmic function) sigma factors, in proportion with the 63 sigma factors encoded by the chromosome. Possession of so many ECF sigma factors by streptomycetes has been considered to reflect the complexity and variability of the soil habitat, as well as the developmental complexity of these organisms (Bentley et al., 2002; Ikeda et al., 2003). The SCP1.161c-encoded sigma factor has the unusual feature of an N-terminal extension homologous to the actinomycete-specific Wbl (WhiB-like) class of regulatory proteins (Soliveri et al., 2000) and is designated wblP. Two other genes also encode Wbl proteins (SCP1.95, wblN; and SCP1.115, wblO), and wblO is located next to another of the ECF sigma-encoding genes (SCP1.116). Conjunctions of wbl genes and sigma factor determinants are intriguing in the light of evidence that a Wbl protein of Mycobacterium tuberculosis functions by direct contact with the principal sigma factor (Steyn et al., 2002). At least two of the chromosomal wbl genes (whiB and whiD) are needed for normal sporulation of S. coelicolor (Davis and Chater, 1992; Molle et al., 2000).
Table 1. List of SCP1 genes with functional predictions. [Table body not reproduced. Column headings: General function, and number of genes; SCP1 CDS number and brief annotation. Footnote: Additional copies in TIR.]
The criteria for setting up the four major and 13 subsidiary functional groupings and for including particular genes in them were based on the submitted annotation (EMBL accession number AL589148). The conventional abbreviations used in the annotation were as follows: Rep, replication; Pol, DNA polymerase; Par, DNA partitioning; Tra, plasmid transfer; Spd, plasmid intramycelial spread; Wbl, WhiB-like proteins peculiar to actinomycetes; AraC, TetR, GntR, the paradigms of classes of regulatory proteins; Mmf, Mmy, associated with methylenomycin production; Mmr, methylenomycin resistance; Sap, spore-associated protein; ACP, acyl carrier protein.
Proteins predicted to be involved in membrane transactions are encoded by about 8% of SCP1 genes, again mirroring the chromosome. In one notable operon-like cluster (SCP1.165–179), 10 of the 12 gene products have predicted membrane association, the other two having predicted ATP/GTP-binding domains. Possibly, some or all of these 12 proteins may form a surface-located protein complex.
SCP1 contains several genes that are potential targets for developmental stage-specific regulation by bldA
Among six alternative leucine codons, one, UUA, is particularly rare in S. coelicolor. The only tRNA able to translate this codon efficiently is the product of bldA, a gene needed for various stationary phase attributes (Lawlor et al., 1987; Leskiw et al., 1991). Thus, bldA mutants show conditional loss of aerial mycelium formation and failure to make antibiotics (Merrick, 1976). The preferential accumulation of this tRNA late in growth (Trepanier et al., 1997) may be one of the mechanisms responsible for the general limitation of secondary metabolism to stationary phase. About 2% (145) of the 7825 genes of the S. coelicolor chromosome contain a TTA codon (Bentley et al., 2002). The vigorous growth of bldA mutants indicates that none of these 145 genes is essential. Although the possibility has not been excluded that this surprising distribution of TTA codons is merely adventitious, the fact that most gene sets for antibiotic production in streptomycetes include a regulatory gene containing a TTA codon points towards an adaptive benefit.
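To illustrate what is meant by a gene "containing a TTA codon", the sketch below scans a coding sequence codon by codon and reports in-frame TTA codons only; TTA triplets that span two codons are ignored. It is a minimal illustration, not the procedure used for the published annotation. The function name and the toy sequences are hypothetical, and codon numbers are assumed to be counted from the start codon as codon 1.

```python
def tta_codons(cds: str) -> list[int]:
    """Return 1-based codon numbers of in-frame TTA (UUA) codons in a CDS.

    The sequence is assumed to begin at the first base of the start codon.
    """
    cds = cds.upper().replace("U", "T")  # accept RNA or DNA notation
    return [
        i // 3 + 1
        for i in range(0, len(cds) - 2, 3)
        if cds[i:i + 3] == "TTA"
    ]


if __name__ == "__main__":
    # Hypothetical toy CDS: ATG TTA GCC TTA TAA
    print(tta_codons("ATGTTAGCCTTATAA"))
    # Fraction of chromosomal genes with a TTA codon, from the figures in the text
    print(round(100 * 145 / 7825, 1), "% of chromosomal genes")
```

With the numbers given in the text, 145 of 7825 genes is about 1.9%, consistent with the "about 2%" quoted.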
Six of the 18 SCP1 genes containing TTA codons are predicted to be involved in regulating gene expression. They include SCP1.116 and 161c, two of the three SCP1 genes encoding putative ECF sigma factors. The SCP1.161c TTA is codon 70 of this gene, exactly at the downstream end of the region of homology to wbl genes described above. None of nearly 50 S. coelicolor chromosomal genes for ECF sigma factors contains a TTA codon, although we found a TTA codon in one of 45 such genes in the S. avermitilis chromosome (Ikeda et al., 2003).
A TTA codon is also present at codon 11 in both copies (SCP1.58c/295) of a gene present in the TIRs that encodes a homologue of AbaA OrfA, which is the product of one of a cluster of S. coelicolor chromosomal genes involved in regulating antibiotic production (Fernandez-Moreno et al., 1991). SCP1.58c/295 is co-transcribed with the nearby sapCED genes encoding spore-associated proteins (see below). Among other abaA orfA-like genes in the S. coelicolor chromosome, one has been associated with antibiotic production (mia; Champness et al., 1992), whereas another is next to the abaA orfE-like gene whiJ, which is involved in morphological differentiation (Gehring et al., 2000; N. J. Ryding, J. Ainsa and K. F. Chater, unpubl.).
SCP1 carries the mmy gene cluster for biosynthesis of the antibiotic methylenomycin, production of which is eliminated by bldA mutations (Merrick, 1976). The further detailed analysis of this cluster will be reported separately (S. O’Rourke, C. J. Bruton and K. F. Chater, in prep.; see also Challis and Chater, 2001), but here we note particularly that two genes in the mmy cluster contain TTA codons: mmfL, which is likely to be involved in a methylenomycin-specific gamma-butyrolactone signalling system (it is related to afsA, a TTA-free gene implicated in A-factor biosynthesis in Streptomyces griseus; Horinouchi et al., 1989); and mmyB, which encodes a putative DNA-binding protein of the XRE (xenobiotic response element) family. Experiments to be described elsewhere show that the TTA codons in mmfL and mmyB are the cause of the bldA dependence of methylenomycin production (S. O’Rourke and K. F. Chater, unpubl. results).
TTA codons are also found in SCP1.212, encoding an isoprene biosynthetic enzyme, and two genes that encode putative transposases: SCP1.276 is the core of IS466S, which is identical to two S. coelicolor chromosomal copies of IS466 (i.e. including the TTA codon; the relevant chromosomal genes are SCO3467 and 3490 in the ScoDB database); and SCP1.163, which is similar to a TTA-free chromosomal gene, SCO4343.
Spore-associated proteins encoded by SCP1
The evidence that SCP1 is highly integrated into its host's physiology and developmental biology is not limited to in silico analysis. Here, we present results in which a study starting from the developmental biology of S. coelicolor led to the demonstration that a large cluster of SCP1 genes in the TIRs has interesting temporal regulation, gives rise to proteins with a distinctive developmental association and is dependent on bldA.
Detergent washing of spores from cultures of the SCP1-containing S. coelicolor HU3 grown on medium R2YE yielded several electrophoretically distinct spore-associated proteins or Saps, two of which, SapA and SapB, have been described elsewhere (Guijarro et al., 1988; Willey et al., 1991). Three more (Saps C, D and E) were found to accumulate as cultures sporulated (Fig. 2). They were purified and subjected to N-terminal sequencing. Using the codon preference of the organism, the resulting sequences were used to design ‘best guess’ oligonucleotides, which were used as probes to clone hybridizing DNA from strain A3(2). Southern hybridization analysis, and further cloning and sequencing experiments, showed that the three genes were present only in strains carrying SCP1, and were closely linked, with sapC being about 4 kb from sapE, and sapD close to sapE. When the SCP1 sequence became available, it confirmed the arrangement deduced from the Southern blots, and revealed that an identical gene set for the three Sap proteins is located in each TIR of SCP1 in its autonomous form (the righthand set is deleted from the integrated SCP1NF). sapC was identified with SCP1.56c and 297 (here termed SCP1.56c/297), sapE with SCP1.51c and 303 (i.e. SCP1.51c/303), and sapD with SCP1.50c and 304 (i.e. SCP1.50c/304).
The experimentally determined N-termini of the Sap proteins agreed with the genome-based predictions. The N-terminus of one of them, SapC, resembles Sec-dependent signal sequences, and also has a potential signal peptidase cleavage site, yet SapC remained unprocessed. SapE and SapD have no obvious features that might bring about their export. In contrast, chromosomally determined SapA contains a processed signal sequence (Guijarro et al., 1988). Finding extracellular spore-associated proteins without a signal sequence is not novel: an example has been found in Myxococcus xanthus (Nelson and Zusman, 1983). SapC, D and E are predicted to be positively charged, which may account for their association with the negatively charged cell wall. Otherwise, the primary sequences render no further insight into their potential function. They are not essential for aerial growth, spore formation/maturation or spore germination, as SCP1– strains are not obviously impaired in these processes.
The three sap genes are part of an ≈ 9 kb cluster of 12 contiguous or nearly contiguous genes (SCP1.59c/294–48c/306; Fig. 3A). Other genes in the cluster, those sandwiched between sapC and sapED, are annotated as encoding a hydratase (SCP1.55c/299), an acetaldehyde dehydrogenase (SCP1.54c/300), a 4-hydroxy-2-oxovalerate aldolase (SCP1.53c/301) and a lyase (SCP1.52c/302), and may perhaps be involved in a catabolic pathway. In view of the unusual mixture of genes, some encoding abundant secreted proteins associated with the spore surface, and others possibly participating in an intracellular metabolic pathway, it was of interest to investigate the possible operon arrangement of the cluster. Co-transcription of these genes was confirmed by Northern blotting of RNA from surface-grown cultures of HU3, using non-overlapping probes for sapED, sapC and regions upstream of the sap genes (SCP1.59c/294–57c/296, and 65/288c to 60/293c) (Fig. 3A). Probes I, II and III all revealed a smear of partially degraded RNA of maximum size ≈ 9 kb (Fig. 3B), present in high abundance only at later time points when aerial mycelium was present (not shown). No such mRNA was detected in liquid-grown cultures [strain J1506 (SCP1+) in YEME medium]. Hybridization to probe IV revealed an RNA of ≈ 1.35 kb (Fig. 3B) coinciding most closely with the predicted size of the divergently co-transcribed genes SCP1.60/293c, which encodes a putative DNA-binding protein of the XRE family, and 61/292c, which encodes a protein resembling AbaA OrfD, a protein encoded by a chromosomal gene in a cluster that regulates antibiotic production (see above). S1 mapping and primer extension analysis revealed a major transcription start point at nucleotide (nt) position 59 464/296 560 for the sap operon, 68 nt upstream of the predicted SCP1.59c/294 translation start point (Fig. 3C and D). Our experiments did not rule out the possibility of other internal promoters within the 12-gene operon.
To test the possible developmental association of transcription of this operon, an SCP1NF construct containing a xylE–hyg reporter cassette inserted into sapC (at a HindIII site; Fig. 3A) was introduced into representative bldA, B, C, D, G and H mutants (defective in aerial growth on R2YE medium), as well as into the morphologically wild-type strain J1501. Most of the mutants gave more or less the same level of xylE activity, as revealed by yellow staining when exposed to catechol, showing that sapCED expression was not obligatorily coupled to normal aerial growth or the extracellular signal cascade in which several of these bld genes participate (Willey et al., 1993); but the bldA mutant had sharply reduced yellow staining, indicating reduced transcription of sapC. As pointed out earlier, a TTA codon is present in the abaA orfA-like second gene (SCP1.58c/295) of the sapCED operon, raising the possibility that the reduced expression of sapC in the bldA mutant might be a polar effect of translational arrest at the TTA codon. To investigate this, xylE was inserted into SCP1.59c/294, the gene immediately upstream of the TTA-containing gene (at a NotI site; Fig. 3A). With this upstream fusion, the bldA mutant gave a strong yellow halo, in contrast to the marked reduction in the level of transcription downstream of the TTA codon seen with the sapC::xylE fusion in the bldA mutant; the bldA+ control strain gave strong yellow haloes with both fusions. This indicates that, in Streptomyces, ribosomal arrest caused by an inability to recognize a codon can give rise to transcriptional polarity.
During this work, it became obvious that the sapCED operon was much more strongly expressed in HU3 than in another SCP1-containing strain (J1506). This phenotypic difference was closely linked to the operon, as high expression was retained by exconjugants containing the SCP1NF region of HU3 after crosses with morphologically wild-type strains or bld mutants other than bldA. In addition, consideration of the fragments used to create the xylE fusions suggested that the strain difference lies upstream of the NotI site in SCP1.59c/294, ruling out a mutation in the abaA orfA-like gene (SCP1.58c/295). Finally, because the Northern blots suggested that the divergently transcribed putative regulatory genes (SCP1.61/293c and 62/292c described above) were expressed, the sequence of these genes and the intergenic region containing the sap promoter was determined for HU3. A single point mutation was found in the 4-base overlap between SCP1.61 and 62 (293c and 292c) (M. Y. Ryan and J. R. McCormick, unpubl. result). The consequences of the mutation will be more fully explored in the future.
Plasmid maintenance functions
Replication. A previous study revealed a short segment of SCP1 that, when ligated to a replicon-free resistance gene, conferred on it the ability to replicate in Streptomyces lividans (an SCP1-free organism) (Redenbach et al., 1999). It was speculated that a low (G+C) region of this segment, located between SCP1.194 and SCP1.195c, might function as the replication origin. The essential part of the autonomously replicating sequence (ARS) also includes SCP1.196, which encodes a 507-amino-acid primase/helicase-like product 46% identical over 470 of its residues to mycobacteriophage TM4 gp70, and SCP1.195c and 197, which may encode nucleotide-binding proteins. Presumably, in S. lividans, this ARS is opened up at its origin assisted by the helicase activity of the SCP1.196 product, the primase activity of which would then prime replication. Host components would presumably make up the enzymatic machinery needed for DNA synthesis, although SCP1 also encodes homologues of two major subunits of DNA polymerase III: α, the catalytic subunit for replicative chromosomal DNA synthesis in most bacteria including S. coelicolor (Flett et al., 1999) (SCP1.224c); and β, which is important for processivity (SCP1.119). Two similar gene sets in pSLA2-L can also function as replication origins (Mochizuki et al., 2003), but pSLA2-L does not encode putative DNA polymerase subunits. SCP1 does not contain a replication system resembling those of pSLA2-S, pSCL1 or SLP2, three smaller linear Streptomyces plasmids (Chang et al., 1996; Huang et al., 2003). A further SCP1 gene, SCP1.113, encodes a protein resembling a possible controlling element for replication of the integrative plasmid pSAM2 of Streptomyces ambofaciens.
Telomere structure. Kinashi et al. (1991) first analysed the SCP1 ends. More information about the ends of Streptomyces linear replicons has accrued since then, and it has become clear that the predicted secondary structure of the first 237 nt of the 3′ ends of SCP1 (Fig. 4) differs from typical Streptomyces telomeres. The G+C content (58%) of this sequence is relatively low. The 3′ end is exposed, whereas that of typical Streptomyces telomeres is paired with an internal sequence; the hairpin organization is somewhat simpler than other known Streptomyces telomeres; and all seven hairpin loops have 4 nt loops, while those in typical Streptomyces telomeres have 3 nt loops. Of the seven 4 nt hairpin loop sequences, six have the potential of forming Pu:Pu sheared pairing between the first and fourth nucleotide, thus leaving two unpaired nucleotides at the loops. No SCP1 genes recognizably encode homologues of known terminal proteins (i.e. proteins covalently bound to the 5′ ends of Streptomyces linear replicons; Bao and Cohen, 2001; Yang et al., 2002). The terminal protein of SCP1 is either supplied by the host or not closely similar to known terminal proteins. The latter possibility is quite attractive considering the atypical telomeres of SCP1.
Pseudotelomeres, internal sequences similar to typical Streptomyces telomeres in primary and secondary structures, have been discovered previously in linear plasmid SLP2 (Huang et al., 2003). Two such pseudotelomeres are present in each TIR of SCP1: nt 3974–4234 and 6318–6590 in the left repeat and nt 349 434–349 706 and 351 790–352 050 in the right repeat. The SCP1 pseudotelomeres are very different from the telomere sequences of SCP1. Instead, like typical Streptomyces telomeres, these sequences have the potential to form complex and extensive secondary structures with many hairpins (Fig. 5), all of which contain 3 nt loops mostly of the sequence GCA. Each pseudotelomere contains two Y-shaped secondary structures (‘rabbit ears’) such as are found in typical Streptomyces telomeres and in the 3′ ends of autonomous parvovirus genomes. The hairpin complexes, like those in SLP2, sit on very long (30 and 39 bp) GC-rich stems. The presence of the pseudotelomeres suggests integration of linear replicons in the past. Interestingly, the pseudotelomeres in SLP2 are also localized near (3.6–5.3 kb) an end.
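Because the two TIRs are inverted copies of each other, a feature at positions start to end in the left repeat is expected at positions (L − end + 1) to (L − start + 1) in the right repeat, where L is the plasmid length. The short sketch below applies this to the pseudotelomere coordinates quoted above, assuming 1-based inclusive coordinates, L = 356 023 bp and identical TIR copies; the function name is illustrative.

```python
SCP1_LENGTH = 356_023  # bp, total length of SCP1 as given in the text


def mirror_interval(start: int, end: int, length: int = SCP1_LENGTH) -> tuple[int, int]:
    """Map a 1-based inclusive interval to its mirror image on a linear replicon."""
    return length - end + 1, length - start + 1


if __name__ == "__main__":
    left_pseudotelomeres = [(3974, 4234), (6318, 6590)]
    for start, end in left_pseudotelomeres:
        m_start, m_end = mirror_interval(start, end)
        print(f"left {start}-{end} ({end - start + 1} nt) -> right {m_start}-{m_end}")
```

The mirrored intervals, 351 790–352 050 and 349 434–349 706, agree with the right-repeat coordinates given in the text.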
Partitioning. In S. coelicolor, SCP1 is a low-copy-number plasmid (the copy number was recently recalculated as seven; Yamasaki et al., 2003) that is very seldom lost during sporulation [about 0.1% of S. coelicolor A3(2) spores are SCP1–; Vivian and Hopwood, 1970]. With SCP1.138 and 139 (located about 1 kb away from SCP1.136, which appears to encode a helicase) and SCP1.221 and 222 (about 4 kb from another putative helicase gene, SCP1.217c), SCP1 contains two parAB gene pairs of the kind involved in partitioning of various bacterial chromosomes (the most similar plasmid partitioning systems are the type Ia partitioning loci of low-copy-number circular plasmids such as F and prophage P1, but the SCP1 systems differ from these in the absence of a DNA-binding helix–turn–helix from ParA; Gerdes et al., 2000). Both gene pairs have more than 30% identity through much of their length to their S. coelicolor chromosomal homologues at the amino acid level, and they are more than 50% identical to each other. These genes may be involved in the efficient partitioning of SCP1 at cell division. This raised the possibility that a copy of SCP1 integrated into the chromosome could substitute for the chromosomal genes in the partitioning of chromosomes into spore compartments. Indeed, the introduction of an integrated SCP1NF largely suppressed the sporulation-associated chromosome partitioning defects of chromosomal parAB or parB mutants reported previously by Kim et al. (2000); thus, the frequency of spores with aberrant DNA amounts in spore chains was reduced from ≈ 13% in the SCP1-free parAB mutant to ≈ 4% in the SCP1-containing derivative (Fig. 6).
In type Ia systems of plasmids, to which the SCP1 Par systems are imperfectly related (see above), ParB proteins generally bind to specific DNA sequences close to the parAB locus (Gerdes et al., 2000). Such parS sites are normally single-copy elements downstream of parB. In the case of bacterial chromosomes, there are usually several parS sites scattered in the region of the replication origin, S. coelicolor currently holding the record with more than 20 (Kim et al., 2000; Jakimowicz et al., 2002). We could detect no S. coelicolor parS-like sites in SCP1.
Plasmid transfer. The transfer of SCP1 into SCP1-free strains is highly efficient (Vivian and Hopwood, 1970). SCP1, like many Streptomyces plasmids, also spreads efficiently within the mycelium of a newly infected recipient. Notably, several similar transfer- and spread-related genes are found in a cluster in both SCP1 and SCP2 (SCP2.20c, traA = SCP1.102; SCP2.22c, traB = SCP1.101; SCP2.23c, traC = SCP1.100; SCP2.27c, spdA = SCP1.98; SCP2.35, korA = SCP1.91c) (Brolle et al., 1993; Haug et al., 2003). This same tra, spd region of SCP1 is part of a longer series of 16 genes with homologues present in the same order and orientation (i.e. showing synteny) in the 94 kb linear plasmid SAP1 of S. avermitilis (Ikeda et al., 2003). The relevant segments of the plasmids are SCP1.91c-122 and SAP1.38c-68. This extended region of synteny is punctuated intergenically by nine non-homologous segments, indicating the occurrence of numerous insertions and deletions since the syntenous regions of the two plasmids diverged.
Other potentially transfer-related genes include SCP1.136, which encodes an 881-amino-acid putative helicase, and SCP1.217c, encoding an N-terminally truncated version of the SCP1.136 gene product (93% identity over 599 residues; the conserved C-terminus contains the helicase-like domain). These genes resemble ttrA, a putative helicase gene of the S. lividans linear plasmid SLP2 that is necessary for conjugal transfer of both SLP2 and the chromosome (Huang et al., 2003). Homologues of ttrA are also present on the chromosomes of S. coelicolor and S. lividans. All these ttrA homologues differ from the SCP1 genes in being terminally located in the respective replicons, and it has been suggested that their products bring about transfer of linear molecules from termini.
SCP1 has a complex evolutionary history
In addition to the presence of two sets of parAB genes, the pseudotelomeres and the presence of many insertion/deletion differences in regions of synteny with other plasmids, there is abundant evidence of complex origins of SCP1. The case for a transfer of an ≈ 37 kb block of DNA from a precursor of the circular plasmid pSV1 of S. violaceoruber to a precursor of SCP1 was presented recently by Yamasaki et al. (2003). The block included the 20–25 kb cluster for methylenomycin biosynthesis and some genes presumptively associated with plasmid replication and partitioning. Subsequent to this transfer, a 42 kb segment of DNA has become inserted between the mmy genes and the parAB, dnaE-containing segment. The SCP1 sequence also records an extensive history of bombardment by transposons, and subsequent recombination events that often involve transposons.
Transposable elements and their association with DNA rearrangements. Three transposable elements, IS466, Tn4811 and Tn5714, have previously been located on SCP1.
IS466 was first found as a DNA sequence possibly associated with integration of SCP1 into the chromosome in different high-frequency chromosome donor fertility variants derived from S. coelicolor A3(2) (Kendall and Cullum, 1986). In the freely replicating SCP1 (as studied in strain M138), one copy of IS466 is located at the inside end of the right terminal inverted repeat (TIR-R) of SCP1, while two copies are present on the chromosome (Kinashi et al., 1991; Redenbach et al., 1998; Yamasaki et al., 2000). Analysis of SCP1 integration in the NF strain 2612 and an ‘NF-like’ donor strain A634 (isolated by Vivian and Hopwood, 1973) showed that IS466 was also implicated in rearrangements of the chromosome (Hanafusa and Kinashi, 1992; Yamasaki et al., 2001). The SCP1 sequence shows that the transposase gene (SCP1.276) of IS466S of SCP1 is flanked by imperfect inverted repeats (IRs) identical at 32/37 positions, which in turn are flanked by sequences that are not direct repeats (DRs): AAACGATT at the left end and CGCAGAAG at the right end. This observation suggests that this copy of IS466 was not generated directly by transposition, but was formed by recombination of two copies of IS466 each carrying different DRs. This agrees with the idea that IS466 might have been involved in the acquisition of TIR-R (Hanafusa and Kinashi, 1992).
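The reasoning above rests on two sequence comparisons: the terminal IRs of an insertion sequence match when one is compared with the reverse complement of the other, whereas the target-site DRs created by transposition should be identical in the same orientation. The sketch below is a minimal illustration of both checks. The flanking octamers are those quoted in the text; the IR sequences of IS466 are not given here, so the IR example uses a hypothetical toy pair, and the ungapped position-by-position comparison is an assumption.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def inverted_repeat_matches(left: str, right: str) -> int:
    """Count identical positions between left and the reverse complement of right.

    Ungapped comparison; both sequences must have the same length.
    """
    rc = reverse_complement(right)
    if len(rc) != len(left):
        raise ValueError("inverted repeats must be the same length")
    return sum(a == b for a, b in zip(left.upper(), rc))


def is_direct_repeat(left_flank: str, right_flank: str) -> bool:
    """True if the two flanking sequences are identical in the same orientation."""
    return left_flank.upper() == right_flank.upper()


if __name__ == "__main__":
    # Flanking sequences of IS466S quoted in the text
    print(is_direct_repeat("AAACGATT", "CGCAGAAG"))
    # Hypothetical 5 nt toy IR pair with one mismatch (not IS466 data)
    print(inverted_repeat_matches("GGTCA", "TGACA"), "of 5 positions match")
```

A transposition event would normally leave `is_direct_repeat` true for the two flanks; the differing flanks reported for IS466S are what point to recombination between two separately inserted copies.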
A copy of Tn4811 is present in each TIR (SCP1.36c-40c and 314–318) (Spychaj and Redenbach, 2001). Tn4811 was first detected in S. lividans 66 as a 5.4 kb DNA fragment spontaneously picked up by plasmid pIJ702, causing the inactivation of the melC operon (Chen et al., 1992). Three copies of Tn4811 were found in S. lividans, one near each end of the linear chromosome and one near the right end of the linear plasmid SLP2. The identity of one end of SLP2 with a segment of the chromosome led Lin et al. (1993) to the finding that the chromosome of S. lividans is linear. Tn4811 contains five genes, among which ORF3 encodes a transposase. Interestingly, in the Tn4811 in the SCP1 TIRs, ORF1 (SCP1.40c and 314) is truncated by deletion of its N-terminal three-quarters, which suggests that Tn4811 on SCP1 was involved in at least two recombination events that caused the truncation of ORF1 and the formation of the present TIRs of SCP1.
Tn5714 (SCP1.80–82) was found as a sequence common to SCP1 and the circular plasmid SCP2 that was able to mediate their co-integration (Kinashi et al., 1993; J. McCormick, J. Dalton, M. Yamasaki, M. Murayama and H. Kinashi, in prep.). Tn5714 contains three genes similar to ORF1, ORF2 and ORF3, respectively, of Tn4811.
The SCP1 sequence revealed traces of seven more possible transposable elements including some carrying a phage-type integrase/recombinase. Two consecutive genes, SCP1.219 (hypothetical protein) and SCP1.220c (transposase, pseudogene), are similar to two chromosomal genes of S. coelicolor, SCO4342 and SCO4344. These two chromosomal genes are separated by a small gene, SCO4343, similar to parts of another transposase gene, SCP1.163, and they are flanked by inverted repeat (IR) sequences and so may constitute a transposon. However, both sides of the SCP1 genes were deleted, and the IR sequences were not detected, which again suggests rearrangements of SCP1.
Taken together, the data on the transposable elements of SCP1 indicate that the present form of the plasmid was generated by many recombination events involving these elements. This may be true for Streptomyces linear replicons in general, including linear chromosomes (Fischer et al., 1998; Pandza et al., 1998; Huang et al., 2003). As several of the SCP1 transposable elements are also represented on the chromosome or on SCP2, it is likely that the three replicons have inhabited the same host for a significant length of time.
Huang et al. (2003) showed that SLP2, a 50 kb linear plasmid of S. lividans, has evolved from at least three replicons, and that much of its genome is involved in plasmid maintenance and transfer functions. In contrast, Mochizuki et al. (2003) found that the 210 kb linear plasmid pSLA2-L of S. rochei is mainly composed of large gene clusters for secondary metabolites; but among the fairly small amount of it devoted to plasmid replication and maintenance, there are three replication origins, again indicating that several replicon fusions have contributed to its present structure. The DNA sequence of SCP1 indicates a very complex evolution. It contains traces of at least two replicons, as indicated by the two sets of partitioning genes and the two pseudotelomeres, and has acquired large numbers of transposable elements. Multiple recombination events, often involving the transposons, must be postulated to account for the organization of the DNA flanking some of these elements. Such events can be observed in the laboratory as interactions of SCP1 with the chromosome to generate novel fertility types (Vivian and Hopwood, 1973; Hopwood and Wright, 1976; Kinashi et al., 1993; Yamasaki et al., 2001), and with the circular plasmid SCP2 to give co-integrates (Kinashi et al., 1993). The ‘antibiotic island’ containing the large gene cluster for methylenomycin biosynthesis has, in the comparatively recent past, undergone lateral transfer, as shown by the near identity of the equivalent gene set on a quite different plasmid, pSV1 (Chater and Bruton, 1985; Yamasaki et al., 2003). Yamasaki et al. (2003) considered the possibility that an IS117 minicircle-like gene (SCP1.219) and a transposase-like pseudogene (SCP1.220c) in the sequences common to SCP1 and pSV1 (near the left boundaries) might have been involved in the horizontal gene transfer of the common DNA. Two transposase genes (SCP1.214, 215) just outside the common region in SCP1 might also have played a role in the horizontal transfer event.
With such a state of flux, one cannot be confident that natural selection has operated to maintain most or all of the genetic features of SCP1. Nevertheless, a perceptible theme in the plasmid's gene content, which differs strikingly from the gene content of SLP2 (Huang et al., 2003), SAP1 (Ikeda et al., 2003) and pSLA2-L (Mochizuki et al., 2003), the other sequenced large linear plasmids of streptomycetes, is a possible adaptation or contribution to the host's complex developmental biology. For example, in addition to the duplicated sapCED-containing clusters encoding proteins associated with the surface of spores, SCP1 also carries three members of the actinomycete-specific wbl (whiB-like) family of regulatory genes, some members of which are involved in sporulation (Molle et al., 2000; Soliveri et al., 2000). SCP1 encodes the production of an antibiotic, methylenomycin, which appears to inhibit the aerial growth (and some antibiotic production) of SCP1-free derivatives of S. coelicolor (Vivian, 1971). We speculate that, in an intimate mixture of SCP1+ and SCP1– strains, those hyphae that have not been infected by the plasmid would be prevented from contributing to the spore progeny by the action of methylenomycin. SCP1-containing aerial hyphae or spores presumably also acquire novel attributes from their possession of the SapC, D and E proteins. We further suppose that the TTA codons in the methylenomycin and sapCED clusters, and perhaps those in other SCP1 genes including some with probable regulatory function, may in some circumstances serve to delay their expression until the major period of rapid vegetative growth is over, when the levels of the bldA-specified tRNA, needed for translation of UUA codons, increase (Trepanier et al., 1997). In the case of TTA codons in transposase genes, one can readily imagine that transposition would be suppressed in rapid growth conditions, but permitted when growth is severely limited.
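Whether a gene is a potential bldA target depends on TTA occurring in the reading frame, not merely somewhere in the DNA. The following minimal sketch shows one way such a scan could be done; the function name and both toy sequences are hypothetical and are not SCP1 genes.

```python
from typing import List


def tta_codon_positions(cds: str) -> List[int]:
    """Return 1-based codon indices of in-frame TTA codons in a coding sequence.

    TTA in the DNA corresponds to the UUA codon in the mRNA, which is the
    codon read by the bldA-specified tRNA.
    """
    seq = "".join(cds.upper().split())  # tolerate spaces and line breaks
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [i // 3 + 1 for i in range(0, len(seq), 3) if seq[i:i + 3] == "TTA"]


# Hypothetical toy sequences for illustration only.
IN_FRAME = "ATGGCCTTAGGCTTATGA"    # codons: ATG GCC TTA GGC TTA TGA
OUT_OF_FRAME = "ATGCTTACCTGA"      # codons: ATG CTT ACC TGA (TTA spans two codons)

if __name__ == "__main__":
    print(tta_codon_positions(IN_FRAME))      # [3, 5]
    print(tta_codon_positions(OUT_OF_FRAME))  # []
```

The second sequence contains the trinucleotide TTA but not as a codon, so it would not count as a TTA-containing gene under this definition.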
The complicated life cycle of the host probably involves different modes of genome partitioning (i.e. during syncytial mycelial growth and branching, and during the subdivision of multigenomic aerial hyphae into unigenomic spore compartments). A large low-copy-number plasmid such as SCP1 might be expected to encode systems to ensure that it, like the chromosome, is reliably inherited by all daughter cells. Indeed, SCP1 contains two parAB gene pairs, at least one of which may contribute to the partitioning efficiency of linked DNA during sporulation septation. This was indicated by an amelioration of the partitioning deficiency of the chromosomes of a chromosomal parAB mutant when SCP1 was integrated into the chromosome. The chromosomally determined ParB protein of S. coelicolor is believed to interact with about 20 palindromic parS sites near the origin of replication (three of them in the parAB operon) (Kim et al., 2000; Jakimowicz et al., 2002). We did not detect any candidate parS sites in SCP1, so we suppose that the SCP1-specified ParB proteins recognize some different (but possibly related) sequences. In line with this, the predicted DNA recognition helices are not well conserved between the chromosomal and SCP1-determined ParB proteins. The incompleteness of the restoration of efficient partitioning in the NF parAB mutant may indicate that the large number of chromosomal parS sites is important for the very large linear chromosome.
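A search for candidate palindromic sites of this kind can be sketched as follows. This is an illustration under stated assumptions, not the procedure used for SCP1: the function names are hypothetical, and the 14 bp motif is a placeholder palindrome chosen for the example, since the text does not give the parS sequence.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """A site is palindromic if it equals its own reverse complement."""
    return site.upper() == reverse_complement(site)


def find_sites(sequence: str, motif: str, max_mismatches: int = 0) -> List[int]:
    """Return 0-based start positions matching the motif on either strand."""
    seq = sequence.upper()
    targets = {motif.upper(), reverse_complement(motif)}
    width = len(motif)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Accept the window if it is close enough to the motif on either strand.
        if any(sum(a != b for a, b in zip(window, t)) <= max_mismatches
               for t in targets):
            hits.append(start)
    return hits


# Placeholder motif (assumption): a 14 bp perfect palindrome.
MOTIF = "GTTTCACGTGAAAC"

if __name__ == "__main__":
    toy = "AAAA" + MOTIF + "CCCC"
    print(is_palindromic(MOTIF))
    print(find_sites(toy, MOTIF))
```

Raising `max_mismatches` would allow related but non-identical sites to be reported, which is the situation the text envisages for the SCP1-specified ParB proteins.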
As annotated by Bentley et al. (2002), the chromosome of S. coelicolor contains an exceptionally high number of genes that facilitate interactions with the environment, including the determinants of ABC transporters, ECF sigma factors and sensor kinase-response regulator pairs. SCP1 echoes this likely adaptation to the biologically complex and abiotically variable soil environment: it encodes a further three ECF sigmas, a class whose original definition as ‘extracytoplasmic function’ (Lonetto et al., 1994) has substantially stood the test of time; 14 SCP1 genes are annotated as encoding probable secreted proteins; a 12-gene cluster (SCP1.165–179) may encode a surface-located protein complex; and SCP1 specifies production of the antibiotic methylenomycin and a probable gamma-butyrolactone intercellular signalling system, associated with methylenomycin production (S. O’Rourke and K. F. Chater, unpubl.). It is perhaps surprising that SCP1 does not encode any readily recognizable determinants of specific antibiotic resistance (apart from the methylenomycin resistance determinant embedded in the cluster of cognate production genes) – it might have been anticipated that transmissible resistance genes should be particularly valuable in organisms that are highly likely to encounter diverse antibiotics in their natural environment. Conceivably, most of the naturally encountered antibiotics are different from those exploited in human medicine, and the relevant resistance genes may simply be unrecognizable from our highly selective experience.
Strains, media and culture conditions
The source of SCP1 was the wild-type Streptomyces coelicolor A3(2) (John Innes strain 1147). The S. coelicolor derivative used for the isolation and study of Sap proteins was HU3 (an SCP1NF strain) (Guijarro et al., 1988), which produced higher yields of Saps than other strains tested. The bldA, B, C, D, G and H mutants used to examine the developmental dependence of sapCED transcription were hygromycin-resistant SCP1NF-containing derivatives of the representative bld mutants used by Willey et al. (1993), obtained after mixed culture of the bld mutants with HU3 marked with a xylE–hyg cassette integrated at one of two positions in the lefthand sap gene cluster of SCP1 (the righthand cluster is deleted from SCP1NF strains; Hanafusa and Kinashi, 1992). The expression of xylE was detected by spraying agar-grown cultures with catechol solution (Ingram et al., 1989). Strain J1501 (Kieser et al., 2000) was the morphologically wild-type parent of several of the bld mutants including the bldA39 mutant J1700. The parB- and parAB-deleted M145 derivatives J2537 and J2538 (Kim et al., 2000) were crossed with a differently marked SCP1NF strain (J1508; Kieser et al., 2000), circumstances expected to show close to 100% conversion of the J2537/8 recipient to SCP1NF status. Strains retaining the phenotypic markers of J2537/8 (prototrophic, streptomycin-sensitive) were nearly all defective in extracellular agarase, indicating that they had received the integrated SCP1 together with the associated deletion of the dagA (agarase) gene (Hodgson and Chater, 1981). One such strain from each cross (J3301, ΔparB; J3300, ΔparAB) was used to evaluate the efficiency of chromosome partitioning during sporulation on minimal medium, essentially as described by Kim et al. (2000). The medium for all other surface cultures was the rich medium R2YE (Kieser et al., 2000). Growth was at 30°C.
Sequencing of SCP1
SCP1 was isolated from S. coelicolor A3(2) by PFGE, essentially as described by Redenbach et al. (1998). The methods of Harris and Murphy (2001) were used for sonication to produce 1.4–2 kb fragments, library preparation in either M13 or pUC18 vectors, clone growth and isolation, and sequencing. The 75 122 bp of sequence at each end were represented by about twice as many sequence reads as the 205 779 bp of DNA located in the unique central region of the plasmid. Moreover, no more heterogeneity at any base position was observed among sequence reads from the terminal inverted repeats (TIRs) than among those from the central region, indicating that the two TIRs are identical [except that the variation in the number (4–6) of G residues at the 5′ ends reported by Kinashi et al. (1991) was not excluded]. The sequence was assembled, finished and annotated as described previously (Cole et al., 1998), using Artemis (Rutherford et al., 2000; http://www.sanger.ac.uk/Software/Artemis) to collate data and facilitate annotation.
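The lengths given above can be combined in a short worked calculation. This is only an arithmetic sketch with hypothetical function names; the depth ratio assumes that reads from both TIR copies collapse onto a single TIR consensus during assembly, which is the usual interpretation of a twofold excess of reads over a repeat.

```python
TIR_LENGTH_BP = 75_122        # length of each terminal inverted repeat
CENTRAL_LENGTH_BP = 205_779   # length of the unique central region


def plasmid_length(tir_bp: int, central_bp: int) -> int:
    """Total length of a linear replicon with two identical TIRs."""
    return 2 * tir_bp + central_bp


def tir_fraction(tir_bp: int, central_bp: int) -> float:
    """Fraction of the replicon that lies within the two TIRs."""
    return 2 * tir_bp / plasmid_length(tir_bp, central_bp)


def expected_relative_depth(copies_in_repeat: int = 2, copies_unique: int = 1) -> float:
    """Expected read depth over a collapsed repeat relative to unique sequence."""
    return copies_in_repeat / copies_unique


if __name__ == "__main__":
    print(plasmid_length(TIR_LENGTH_BP, CENTRAL_LENGTH_BP))          # 356023
    print(round(tir_fraction(TIR_LENGTH_BP, CENTRAL_LENGTH_BP), 2))  # 0.42
    print(expected_relative_depth())                                 # 2.0
```

Two TIRs of 75 122 bp plus a central region of 205 779 bp give a total of 356 023 bp, of which about 42% is TIR sequence.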
The extraction of spore-associated proteins (Saps) from spores of S. coelicolor strain HU3, using a non-lethal detergent wash, their separation by polyacrylamide gel electrophoresis and N-terminal sequencing, the use of this information for the design of oligonucleotides and the use of the oligonucleotides in Southern blots and as probes to clone the Sap determinants were essentially as described by Guijarro et al. (1988).
Northern blot hybridization was done essentially as described by Virca et al. (1990), with RNA isolated from cultures of strain HU3 grown for different lengths of time on the surface of agar medium R2YE until sporulation or in the liquid medium YEME. To locate the 5′ end of the mRNA encoding SapC, D and E, we used S1 nuclease protection. The S1 nuclease protection buffer was as in Maniatis et al. (1982), and the hybridization buffer was as in Favaloro et al. (1980), with probes labelled at the NotI site 382 bp downstream of the SCP1.59c/294 start codon or the HindIII site 199 bp downstream of the sapC start codon. We confirmed the result by primer extension (McCormick et al., 1991), with an oligonucleotide primer (21C92; 5′-GGATAGCTCATGACGTGCTCC-3′) that gave an extension product of 79 nt with M-MLV reverse transcriptase (Life Technologies).
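As a simple consistency check on the primer, its length and the sequence of the mRNA-sense region to which it anneals can be derived as sketched below. The function names are hypothetical, and the primer is entered without internal spaces; nothing here locates the primer on SCP1, because the flanking sequence is not given in the text.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Primer 21C92 as given in the text, written 5' to 3'.
PRIMER_21C92 = "GGATAGCTCATGACGTGCTCC"


def normalize(seq: str) -> str:
    """Upper-case a sequence and drop any whitespace left by typesetting."""
    return "".join(seq.upper().split())


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return normalize(seq).translate(_COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Fraction of G and C residues in the sequence."""
    s = normalize(seq)
    return (s.count("G") + s.count("C")) / len(s)


if __name__ == "__main__":
    print(len(normalize(PRIMER_21C92)))        # 21
    # Sense-strand (mRNA-like, written as DNA) sequence bound by the primer.
    print(reverse_complement(PRIMER_21C92))
    print(round(gc_content(PRIMER_21C92), 2))
```

The 21 nt length agrees with the primer name, and the reverse complement gives the sense-strand stretch that would have to be found downstream of the mapped transcription start.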
Construction of xylE operon fusions
An xylE–hyg cassette was introduced into SCP1NF by single cross-over homologous recombination, using derivatives of pJR57 [8.15 kb HindIII–XhoI fragment of pSC101 (Cohen et al., 1973) containing xylE–hyg as a HindIII–XhoI fragment from pKC1053 (Kuhstoss and Rao, 1991)]. xylE was integrated at the HindIII site in sapC by transforming HU3 to hygromycin resistance using pJR59 (pJR57 with a 3.6 kb SacI–HindIII fragment), resulting in strain HU26. The xylE fusion was also integrated at the NotI site in SCP1.59c/294 essentially by deleting the DNA between NotI and HindIII in pJR59, creating pJR76. HU3 was transformed with pJR76 to hygromycin resistance. In this case, most transformants did not have high-level expression of xylE. Among the few that did, one, HU88, was used for further work and had apparently been generated by homogenotization between the wild-type sequence and a point mutation present in SCP1NF of HU3 (see Results).
We would like to acknowledge the support of the Wellcome Trust Sanger Institute core sequencing and informatics groups. This work was funded by BBSRC grants to B.G.B. and K.F.C., NSF grant MCB-9727234 to R.L., NIH post-doctoral grant F32 G12961 to J.R.M., National Research Council (R.O.C.) grant NSC91-2321-B010-001 to C.W.C., Marie Curie Fellowship CT-2002-01676 of the EU Commission to D.J., and the John Innes Foundation. We thank David Hopwood for help in establishing this work and for valuable comments.