• Open Access

Natural products genomics


  • Roland J. Siezen,

    Corresponding author
    1. Kluyver Centre for Genomics of Industrial Fermentation; TI Food and Nutrition, 6700AN Wageningen, the Netherlands.
    2. NIZO food research, 6710BA Ede, the Netherlands.
    3. Center for Molecular and Biomolecular Informatics, Radboud University Nijmegen Medical Centre, 6500HB Nijmegen, the Netherlands.
      *E-mail r.siezen@cmbi.ru.nl.
  • Barzan I. Khayatt

    1. Kluyver Centre for Genomics of Industrial Fermentation; TI Food and Nutrition, 6700AN Wageningen, the Netherlands.
    2. NIZO food research, 6710BA Ede, the Netherlands.
    3. Center for Molecular and Biomolecular Informatics, Radboud University Nijmegen Medical Centre, 6500HB Nijmegen, the Netherlands.


Secondary metabolites (or natural products) are often synthesized by multi-modular, multi-domain proteins called non-ribosomal peptide synthetases (NRPS) and polyketide synthases (PKS). Various well-known metabolites produced by microorganisms are listed in Table 1, and examples of structures are shown in Fig. 1. In particular, Streptomyces species are known for their ability to produce a wide variety of secondary metabolites such as antibiotics, herbicides, parasiticides, siderophores and pharmacologically active substances including antitumour agents and immunosuppressants. Genome sequencing of Streptomyces coelicolor (Bentley et al., 2002) and S. avermitilis (Omura et al., 2001) revealed over 20 gene clusters for biosynthesis of secondary metabolites, although only a few of their natural products were known prior to sequencing. High-throughput genome sequencing of hundreds of other bacterial species and strains is now rapidly increasing the repertoire of identified gene clusters for biosynthesis of natural products (Donadio et al., 2007). Here we give a brief update on the current status of genome mining and of bioinformatic tools to identify novel NRPS and PKS systems.

Table 1.  Examples of microbial natural products produced by NRPS/PKS systems.
Natural product | Microorganism (1) | NRP/PK
Penicillin | Penicillium chrysogenum (fungi) | NRP
Bacitracin | Bacillus licheniformis | NRP
Tyrocidin | Bacillus brevis | NRP
Cephalosporin | Streptomyces clavuligerus | NRP
Erythromycin | Saccharopolyspora erythraea | PK
Tetracycline | Streptomyces aureofaciens | PK
Actinomycin | Streptomyces chrysomallus | NRP
Antitumour agents
Dolastatin 10 | Symploca species (cyanobacteria) | NRP
Bleomycin | Streptomyces verticillus | Hybrid NRP/PK
Chondramide | Chondromyces crocatus | Hybrid NRP/PK
Epothilone | Sorangium cellulosum | Hybrid NRP/PK
Cyclosporin | Tolypocladium inflatum (fungi) | NRP
Rapamycin | Streptomyces hygroscopicus | PK
FK506 | Streptomyces sp. | PK
FK520 | Streptomyces hygroscopicus | PK
Protease inhibitors
Anabaenopeptin | Anabaena flos-aquae (cyanobacteria) | NRP
Oscillamide | Oscillatoria agardhii (cyanobacteria) | NRP
Mycobactin | Mycobacterium tuberculosis | NRP
Bacillibactin | Bacillus subtilis | NRP
Enterobactin | Escherichia coli | NRP
Yersiniabactin | Yersinia pestis | Hybrid NRP/PK
Mycolactone | Mycobacterium ulcerans | PK
Naphthazarins | Fusarium oxysporum (fungi) | PK
HC-toxin | Cochliobolus carbonum (fungi) | NRP

  1. Bacteria unless otherwise indicated.

Figure 1.

Examples of some chemical structures of (A) polyketides, (B) non-ribosomal peptides and (C) mixed NRP-PK compounds. Reprinted with permission from Watanabe and Oikawa (2007). Copyright Royal Society of Chemistry.

Polyketide and non-ribosomal peptide biosynthesis

Both NRPS and PKS systems are molecular assembly lines for the successive linking of multiple amino/hydroxy acids or acyl-CoA precursors, respectively, into complex polymers, which are often further modified into unique structures (Table 1, Fig. 1). The basic steps of both systems are initiation, elongation and termination, performed by separate modules of the synthases (Fig. 2). These modules and others are usually encoded in large gene clusters (Khosla et al., 1999; Crosa and Walsh, 2002; Donadio et al., 2007; Rokem et al., 2007).

Figure 2.

Basic steps during (A) non-ribosomal peptide synthesis and (B) polyketide synthesis. Adapted with permission from Donadio and colleagues (2007). Copyright Royal Society of Chemistry.

Non-ribosomal peptide synthetase modules can contain four principal domains (Fig. 2A): an adenylation domain (A) that selects, activates and loads the building blocks (proteinogenic and non-proteinogenic amino acids or carboxylic acids), a thiolation domain (T), also known as peptidyl carrier protein (PCP), that covalently fixes the amino acid on the synthetase, a condensation domain (C) that catalyses peptide bond formation, and a thio-esterase domain (Te) that releases the assembled peptide from the synthetase (Sieber and Marahiel, 2005; Wenzel and Muller, 2005). The diversity in structure and composition of the products arises from the different specificities of the A domains and from further modifications by gene cluster-embedded or stand-alone additional domains such as methyltransferase (MT), epimerization (E), cyclization (Cy) and others (Walsh et al., 2001). The assembled final peptide structures range from linear [such as the pentadecapeptide gramicidin (Kessler et al., 2004)], to branched [such as vibriobactin (Keating et al., 2000)], partially cyclic [such as daptomycin (McHenney et al., 1998)], cyclic [such as gramicidin S (Erlanger and Goode, 1960)] or bicyclic [such as actinomycin (Pfennig et al., 1999)].

Polyketide synthase modules can contain four core domains (Fig. 2B): an acyltransferase (AT) domain that selects and activates the acyl-CoA building blocks (such as acetyl-CoA, malonyl-CoA, methylmalonyl-CoA and ethylmalonyl-CoA), an acyl carrier protein (ACP) domain, a keto-acylsynthase (KS) condensation domain and a releasing thio-esterase (Te) domain. The modules may contain other modification domains such as ketoreductase (KR), dehydratase (DH) and enoylreductase (ER). Polyketide synthases generate enzyme-bound ketoacyl intermediates in stepwise decarboxylative condensations between the extender building blocks and the growing polyketide chain, in a process similar to fatty acid synthesis. An example of such an assembly process is shown in Fig. 3.

Figure 3.

Biosynthetic pathway, module and domain organization of two polyketide synthases (Type I) (MlsA1 and MlsA2) responsible for mycolactone core biosynthesis in Mycobacterium ulcerans. Reprinted with permission from http://www.med.monash.edu.au/microbiology/research/stinear.html.
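The effect of the optional KR, DH and ER domains on each chain-extension step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method from any of the tools discussed here: the function names and the example module list are hypothetical, and the rule used (KR gives a hydroxyl, KR plus DH an alkene, KR plus DH plus ER a saturated methylene, no reductive domain leaves the keto group) is the usual textbook simplification that ignores inactive domains and other exceptions.

```python
# Side chain left on the alpha-carbon by each extender unit named in the text.
EXTENDER_SIDE_CHAIN = {
    "malonyl-CoA": "H",
    "methylmalonyl-CoA": "methyl",
    "ethylmalonyl-CoA": "ethyl",
}


def beta_carbon_state(domains):
    """Return the beta-carbon state left by a module with these domains.

    Assumption (textbook simplification): reduction proceeds KR -> DH -> ER,
    and each step requires the previous one to be present.
    """
    domains = set(domains)
    if "KR" not in domains:
        return "keto"
    if "DH" not in domains:
        return "hydroxyl"
    if "ER" not in domains:
        return "alkene"
    return "methylene"


def describe_chain(modules):
    """Describe each extension step of a hypothetical assembly line.

    modules: list of (extender, domains) pairs in assembly-line order.
    Returns one (alpha side chain, beta-carbon state) pair per module.
    """
    return [
        (EXTENDER_SIDE_CHAIN[extender], beta_carbon_state(domains))
        for extender, domains in modules
    ]


def backbone_carbons_added(modules):
    """Each decarboxylative condensation extends the backbone by two carbons."""
    return 2 * len(modules)


if __name__ == "__main__":
    # Hypothetical three-module PKS, not taken from any real gene cluster.
    example = [
        ("malonyl-CoA", {"KS", "AT", "ACP"}),
        ("methylmalonyl-CoA", {"KS", "AT", "KR", "ACP"}),
        ("malonyl-CoA", {"KS", "AT", "DH", "ER", "KR", "ACP"}),
    ]
    for step in describe_chain(example):
        print(step)
    print("backbone carbons added:", backbone_carbons_added(example))
```

Keeping the domain-to-chemistry rule in one small function makes it easy to replace with a more careful model, for example one that accounts for inactive KR domains.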

Prediction of structure of non-ribosomally synthesized peptides

In most of the NRPS systems known so far, the order and structure of the building blocks present in the peptide product are reflected by the modular architecture of the NRPS. This relation between the template and the product is referred to as the co-linearity rule. The specificity of the A domains, together with the role of the other modifying domains, determines the composition of the produced peptide. General rules for predicting substrate specificity of A domains were initially developed based on the crystal structure of an adenylation domain of gramicidin synthetase (Stachelhaus et al., 1999; Challis et al., 2000). The NRPSpredictor (http://www.ab.informatik.uni-tuebingen.de/toolbox) uses transductive support vector machines (TSVMs) as a predictive tool for detecting substrate specificities of A domains (Rausch et al., 2005) based on the physicochemical properties of substrate-binding pocket residues.
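The co-linearity rule lends itself to a short sketch: given the module order and a predicted substrate for each A domain, the linear peptide can be read off directly. The Python below is a minimal illustration under stated assumptions. The class and function names are hypothetical, the substrate of each module is assumed to have been predicted already (for instance by a tool such as NRPSpredictor), only the epimerization (E) domain is modelled as a modification, and the example module list is illustrative input rather than data from this article.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NRPSModule:
    """One module of a hypothetical NRPS assembly line."""
    substrate: str            # predicted A-domain substrate, e.g. "Phe"
    domains: Tuple[str, ...]  # domain abbreviations in order, e.g. ("C", "A", "T")


def predict_peptide(modules):
    """Apply the co-linearity rule: one residue per module, in module order.

    Assumptions: a module without an A domain loads no building block, and
    an E domain converts the residue of its own module to the D-form.
    """
    residues = []
    for module in modules:
        if "A" not in module.domains:
            continue  # no adenylation domain, so nothing is incorporated
        residue = module.substrate
        if "E" in module.domains:
            residue = "D-" + residue
        residues.append(residue)
    return "-".join(residues)


if __name__ == "__main__":
    # Illustrative five-module assembly line (hypothetical input).
    assembly_line = [
        NRPSModule("Phe", ("A", "T", "E")),
        NRPSModule("Pro", ("C", "A", "T")),
        NRPSModule("Val", ("C", "A", "T")),
        NRPSModule("Orn", ("C", "A", "T")),
        NRPSModule("Leu", ("C", "A", "T", "Te")),
    ]
    print(predict_peptide(assembly_line))
```

Systems that deviate from co-linearity, for example through iterative use of modules, would need additional rules that this sketch does not attempt.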

In silico genome screening for NRPS/PKS gene clusters

There are several bioinformatic tools available for searching NRPS/PKS systems in genome sequences. The NRPS-PKS tool is web-based software (http://www.nii.res.in/nrps-pks.html) for analysing the large multi-enzymatic, multi-domain megasynthases (Ansari et al., 2004). The results of these analyses have been organized as four searchable databases for elucidating domain organization and substrate specificity of NRPS and PKS. These databases provide an interface to correlate chemical structures of these natural products with the domains and modules in the corresponding PKS or NRPS. ASMPKS is a web-based tool (http://gate.smallsoft.co.kr:8008/~hstae/asmpks/index.html) for computational analysis of PKS systems in genome sequences (Tae et al., 2007). ASMPKS can predict functional modules for each protein sequence, estimate the chemical composition of a polyketide synthesized from the modules, and display the carbon chain structure on the web interface. Another recent method to accurately predict PK/NRP structures from genome sequences is described by Minowa and colleagues (2007). Norine (http://bioinfo.lifl.fr/norine) is a platform that includes a database of non-ribosomal peptides (currently more than 700) together with tools for their analysis. The Norine database stores peptide structures as well as various annotations such as the biological activity, producing organisms, bibliographical references and others (Caboche et al., 2008).

Analysis of over 220 completed bacterial genomes up to 2005 revealed that PKS and NRPS systems are mainly found in actinobacteria, β-proteobacteria, γ-proteobacteria, firmicutes and cyanobacteria (Donadio et al., 2007). We have now analysed the 140 most recently sequenced microbial genomes (July 2007–April 2008; GOLD database http://www.genomesonline.org/) using Hidden Markov Model profiles of all core domains of both NRPS and PKS. Many of these genomes are publicly accessible in the NCBI database but have not yet been described in the scientific literature (Siezen and Wilson, 2008). Numerous NRPS/PKS systems were found, and Table 2 lists the genomes with three or more systems; several are described in more detail below. They are mainly found in microorganisms with genomes larger than 4 Mb isolated from soil or aquatic environments. In addition, at least two NRPS or PKS systems are predicted in Yersinia pseudotuberculosis IP 31758, Azorhizobium caulinodans ORS 571, Marinomonas sp. MWYL1 and Bacillus cereus cytotoxis NVH 391-98, while at least one system is predicted in Escherichia coli HS, Coxiella burnetii Dugway 7E9-12, Enterobacter sakazakii ATCC BAA-894, Staphylococcus aureus ssp. aureus Mu3, Vibrio harveyi BB120, Serratia proteamaculans 568, Delftia acidovorans SPH-1, Salmonella enterica arizonae sv. 62:z4,z23 RSK2980, Klebsiella pneumoniae MGH78578 and Kineococcus radiotolerans SRS30216. Several of the latter bacteria are human pathogens.

Table 2.  Recently sequenced bacterial genomes (1 July 2007 to April 2008) with at least three predicted NRPS/PKS gene clusters.
Species | Habitat | Genome size (Mb) | Gene clusters (predicted) | Reference and/or NCBI code
Sorangium cellulosum So ce56 | Soil | 13.0 | 3 NRPS; 6 PKS | Schneiker et al. (2007); NC_010162
Salinispora tropica CNB-440 | Marine, sediment | 5.2 | 3 NRPS; 6 PKS | Udwary et al. (2007); NC_009380
Streptomyces griseus IFO13350 | Soil | 8.5 | 9 NRPS; 5 PKS | Ohnishi et al. (2008); NC_010572
Salinispora arenicola CNS205 | Marine, sediment | 5.8 | 4 NRPS; 2 ambiguous PKS | NC_009953
Frankia sp. EAN1pec | Plant symbiont, soil | 9.0 | 2 NRPS; 5 ambiguous PKS | NC_009921
Bacillus amyloliquefaciens FZB42 | Rhizosphere-colonizing, soil | 3.9 | 4 NRPS; 2 PKS; 1 ambiguous NRPS | Chen et al. (2007); NC_009725
Herpetosiphon aurantiacus ATCC 23779 | Aquatic | 6.4 | 5 NRPS | NC_009972
Pseudomonas aeruginosa PA7 | Soil, aquatic, host (human) | 6.6 | 5 NRPS | NC_009656
Xanthobacter autotrophicus Py2 | Soil, aquatic, sediment | 4.8 | 2 NRPS; 2 ambiguous PKS | NC_009720
Clostridium kluyveri DSM 555 | Aquatic, mud | 4.0 | 1 NRPS; 3 NRPS/PKS | Seedorf et al. (2008); NC_009706
Bacillus pumilus SAFR-032 | Soil | 3.7 | 2 NRPS; 1 NRPS/PKS | Gioia et al. (2007); NC_009848
Citrobacter koseri ATCC BAA-895 | Soil, aquatic, food, human intestine | 4.7 | 3 NRPS/PKS | NC_009792
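
The screening step behind Table 2 (profile hits for core domains, grouped into candidate systems and counted per genome) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: the hit records, genome names, locus names and function names are all hypothetical, hits are assumed to have been produced beforehand by a profile search, and a locus is counted as a candidate system when it carries a minimal set of core domains (C, A and T for NRPS; KS, AT and ACP for PKS).

```python
from collections import defaultdict

# Assumed minimal core-domain sets for calling a candidate system.
NRPS_CORE = {"C", "A", "T"}
PKS_CORE = {"KS", "AT", "ACP"}


def classify_locus(domains):
    """Classify a locus by the core domains detected in it."""
    domains = set(domains)
    has_nrps = NRPS_CORE <= domains
    has_pks = PKS_CORE <= domains
    if has_nrps and has_pks:
        return "NRPS/PKS"
    if has_nrps:
        return "NRPS"
    if has_pks:
        return "PKS"
    return None


def count_systems(hits):
    """Count candidate systems per genome.

    hits: iterable of (genome, locus, domain) tuples, one per profile hit.
    Returns {genome: {system type: count}}.
    """
    domains_by_locus = defaultdict(set)
    for genome, locus, domain in hits:
        domains_by_locus[(genome, locus)].add(domain)

    counts = defaultdict(lambda: defaultdict(int))
    for (genome, _locus), domains in domains_by_locus.items():
        kind = classify_locus(domains)
        if kind is not None:
            counts[genome][kind] += 1
    return {genome: dict(by_kind) for genome, by_kind in counts.items()}


def genomes_with_min_systems(counts, minimum=3):
    """Return genomes whose total number of candidate systems reaches minimum."""
    return sorted(g for g, by_kind in counts.items() if sum(by_kind.values()) >= minimum)


if __name__ == "__main__":
    # Hypothetical hits for two made-up genomes.
    hits = [
        ("genomeX", "locus1", "C"), ("genomeX", "locus1", "A"), ("genomeX", "locus1", "T"),
        ("genomeX", "locus2", "KS"), ("genomeX", "locus2", "AT"), ("genomeX", "locus2", "ACP"),
        ("genomeX", "locus3", "C"), ("genomeX", "locus3", "A"), ("genomeX", "locus3", "T"),
        ("genomeX", "locus3", "KS"), ("genomeX", "locus3", "AT"), ("genomeX", "locus3", "ACP"),
        ("genomeY", "locus1", "A"), ("genomeY", "locus1", "T"),
    ]
    counts = count_systems(hits)
    print(counts)
    print(genomes_with_min_systems(counts, minimum=3))
```

A real screen would also need score thresholds for the profile hits and a rule for merging neighbouring genes into one cluster, both of which are left out here.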

Recently sequenced microbial genomes with large potential for production of NRPS/PKS natural products

Sorangium cellulosum is a soil-dwelling δ-proteobacterium belonging to the myxobacteria. The genus Sorangium synthesizes approximately half of the secondary metabolites isolated from myxobacteria, including the anticancer metabolite epothilone. Seventeen secondary metabolite loci are encoded in the genome of strain So ce56 (Schneiker et al., 2007), mostly PKS and NRPS systems (Table 2). Known products are chivosazol, etnangien and myxochelin, while others are still unknown. The epothilones secreted by S. cellulosum have antineoplastic activity, which has led to the development of analogues that mimic this activity. One such analogue, ixabepilone, is a US Food and Drug Administration (FDA)-approved chemotherapy agent for the treatment of metastatic breast cancer.

The soil actinomycete Streptomyces griseus produces the well-known antituberculosis agent streptomycin. Recent sequencing of the genome of S. griseus IFO 13350 shows that it has 34 gene clusters or genes for biosynthesis of secondary metabolites, of which 14 PKS or NRPS gene clusters seem to be specific for this species (Ohnishi et al., 2008). These clusters presumably direct the synthesis of various as yet unknown secondary metabolites.

Actinomycetes of the marine-dwelling genus Salinispora are a rich source of drug-like molecules. Salinispora strains are commonly isolated from tropical marine sediment, and many isolates produce compounds that inhibit cancer cells, such as salinosporamide A (Feling et al., 2003). Salinispora tropica CNB-440 dedicates nearly 10% of its genome to natural product assembly (Udwary et al., 2007), a greater proportion than in Streptomyces coelicolor and Streptomyces avermitilis or in other secondary metabolite-producing actinomycetes. The S. tropica genome features PKS systems of every known formally classified family, NRPS systems and several hybrid clusters. The majority of the 17 biosynthetic loci are novel. Genome sequencing is ongoing for Salinispora arenicola CNS-205, a producer of the bioactive compounds staurosporine and rifamycin, which may be useful in the treatment of cancer. Other marine actinobacteria are also potential sources of bioactive natural products (Bull and Stach, 2007).

Frankia species form a separate lineage among the high % G+C Gram-positive Actinobacteria. They are filamentous ‘euactinomycetes’ that grow by hyphal branching and tip extension and thus resemble the antibiotic-producing Streptomyces species. Frankia species form a symbiotic nitrogen-fixing association with a number of plants. These symbioses add a large proportion of new nitrogen to several ecosystems. The genome of Frankia sp. strain EAN1pec has all housekeeping genes necessary for saprophytic existence plus genes for sporulation, vesicle development, symbiosis, N2 fixation and secondary metabolite production. Ten putative NRPS/PKS clusters were identified in the genome sequence of strain EAN1pec.

Bacillus amyloliquefaciens is a Gram-positive bacterium belonging to the firmicutes. It is a member of a group of free-living soil bacteria known to promote plant growth and suppress plant pathogenic bacteria and fungi. The B. amyloliquefaciens FZB42 genome reveals an unexpected potential to produce secondary metabolites, with more than 8.5% of the genome devoted to synthesizing antibiotics and siderophores by NRPS and PKS pathways (Chen et al., 2007). Besides five gene clusters known from Bacillus subtilis to mediate biosynthesis of secondary metabolites (surfactin, fengycin, bacillibactin, bacilysin, bacillaene), an additional four giant gene clusters were identified for biosynthesis of bacillomycin D, macrolactin, difficidin and a putative siderophore. Bacillus spores are notoriously resistant to unfavourable conditions such as UV radiation, γ-radiation, H2O2, desiccation, chemical disinfection or starvation. Bacillus pumilus SAFR-032 spores and vegetative cells exhibit elevated resistance to UV radiation and H2O2 compared with other Bacillus species, and its genome sequence provides insight into numerous DNA repair and oxidative stress pathways (Gioia et al., 2007). It also encodes three NRPS/PKS systems of unknown function.

Clostridium kluyveri DSM555, a strictly anaerobic firmicute, was isolated from canal mud in the Netherlands. It is unique among clostridia in that it can grow on ethanol and acetate as sole energy sources, producing butyrate, caproate and H2 (Seedorf et al., 2008). Furthermore, it is biotechnologically interesting as the genome sequence predicts that it could ferment ethanol and glycerol to 1,3-propanediol. Quite unexpected in an anaerobic firmicute is the presence of three hybrid PKS-NRPS clusters of unknown function, and one NRPS gene cluster which is predicted to synthesize a yersiniabactin/pyochelin-like siderophore.

Chloroflexi are a class of eubacteria that produce energy through photosynthesis. They make up the bulk of the filamentous anoxygenic phototrophs (formerly known as green non-sulfur bacteria). The phylum Chloroflexi accommodates additional genera, including filamentous but non-phototrophic species. Herpetosiphon aurantiacus is a non-phototrophic, strictly aerobic, gliding bacterium. Herpetosiphon spp. have been found in soil, freshwater and sewage treatment plants and grow in microbial mats. The genome of H. aurantiacus strain ATCC 23779 is predicted to encode nine NRPS/PKS systems of unknown function.

Xanthobacter autotrophicus, an α-proteobacterium, is a nitrogen-fixing methylotroph, commonly isolated from organic-rich soil, sediment and water. Xanthobacter autotrophicus strain Py2 is unique in that it can use propene as a sole carbon and energy source, converting it to epoxypropane using an alkene-specific monooxygenase. The monooxygenase gene and other genes involved in alkene degradation are located on a 320 kb megaplasmid. The genome sequence provides further information on the production and regulation of the genes involved in alkene degradation. The genome also has five putative NRPS/PKS gene clusters of as yet unknown function.

High-throughput experimental screening for NRPS/PKS gene clusters

The newly discovered gene clusters for NRP and PK synthesis represent a tremendous source of novel bioactive compounds, but in most cases the natural product is unknown. Classical methods to characterize the products include heterologous expression of gene clusters (Wenzel and Muller, 2005), metabolic profiling and assay-guided fractionation (Zazopoulos et al., 2003; McAlpine et al., 2005). A novel ‘genomisotopic’ approach uses a combination of genomic sequence analysis and isotope-guided fractionation to identify unknown compounds synthesized by NRPS gene clusters (Gross et al., 2007). A phage-display method was developed for high-throughput mining of gene clusters encoding PKS and NRPS systems, which can be applied to genomes of unknown sequence and metagenomes (Yin et al., 2007), providing opportunities for exploiting the potentially rich source of natural products from unculturable microbes.

Novel natural products and applications

The past decade has already seen numerous examples of genetic engineering, metabolic engineering, rational design, and directed evolution of NRPS and PKS systems to provide novel compounds based on known NRPS/PKS gene clusters for biosynthesis of natural products (Stachelhaus et al., 1996; Cane et al., 1998; Chartrain et al., 2000; Du and Shen, 2001; Du et al., 2001). The impact of systems biology to control and regulate secondary metabolite production has only recently been addressed (Rokem et al., 2007). The ever-increasing pace of microbial genome sequencing is revealing a plethora of new NRPS/PKS gene clusters, mostly of unknown function. A major challenge for the next decade is to back this up with characterization of the chemical structures and biological activities of these secondary metabolites, so that we can chart Nature's unique repertoire of natural products and exploit them for the directed synthesis of novel molecules of biotechnological, agricultural and pharmaceutical utility.


We thank Kenji Watanabe, Stefano Donadio and Tim Stinear for permission to use Figs 1–3, respectively, and Greer Wilson for reading and correcting the manuscript. This project was carried out within the research programme of the Kluyver Centre for Genomics of Industrial Fermentation which is part of the Netherlands Genomics Initiative/Netherlands Organization for Scientific Research.