• metagenomics;
  • Cyanobacteria;
  • freshwater;
  • Cylindrospermopsis;
  • Australia


  1. Top of page
  2. Abstract
  3. Introduction
  4. Materials and methods
  5. Results and discussion
  6. Acknowledgements
  7. References
  8. Supporting Information

Abstract

High molecular weight (HMW) DNA prepared from a toxic freshwater cyanobacterial bloom sample was used to construct a 75-clone PCR-generated 16S rRNA gene library and a 2850-clone bacterial artificial chromosome (BAC) library. Phylogenetic analysis of the 16S rRNA gene library demonstrated that members of eight phyla of the domain Bacteria were present in the bloom community: Cyanobacteria, Actinobacteria, Verrucomicrobia, Bacteroidetes, Planctomycetes, Chloroflexi, Candidate Division OP10 and Proteobacteria (Alpha-, Beta- and Gammaproteobacteria). Diversity estimates from 16S rRNA gene analysis, together with direct cell counts and morphological identification of phytoplankton, suggested that the bloom community was dominated by members of the cyanobacterial genera Aphanizomenon and Cylindrospermopsis. BAC-end sequencing of 37 randomly selected clones and subsequent sequence analysis provided a snapshot of the putative metabolic activities of the total bloom community. Complete sequencing of the inserts of seven clones (designated 578, 67, 142, 543, 905, 1664 and 2089), selected on the basis of the BAC-end sequence survey, generated a total of 144 kb of sequence data and identified 130 genes for putative proteins representing at least four phyla: Proteobacteria, Actinobacteria, Bacteroidetes and Cyanobacteria. This is the first report of a snapshot analysis of a limited metagenome of a toxic freshwater cyanobacterial bloom.


Introduction

Cyanobacteria are widely distributed in nature and represent one of the major, most diverse and most ancient lineages of the domain Bacteria. Cyanobacterial blooms result from the rapid growth of cyanobacteria within complex communities of photosynthetic and heterotrophic microorganisms, and are often associated with the production of a range of secondary metabolites. Some of these secondary metabolites are thought to possess useful therapeutic properties (anticancer, antibacterial, antifungal and protease-inhibiting activities) (Namikoshi & Rinehart, 1996), some are nuisance taste and odor compounds (e.g. geosmin) (Izaguirre & Taylor, 2004) and yet others are toxins (Carmichael et al., 1988). Among the toxins, hepatotoxins, neurotoxins and dermatotoxins are commonly produced by members of the cyanobacterial genera Anabaena, Microcystis, Aphanizomenon and Cylindrospermopsis (Wiegand & Pflugmacher, 2005). Our inability to culture up to 95% of the microorganisms from any given environment (Curtis et al., 2002; Torsvik et al., 2002) is a significant bottleneck to understanding the structure and metabolic activities of microbial communities such as those of cyanobacterial blooms. Culture-independent molecular methods, which rely on DNA extracted directly from environmental microbial biomass, have therefore become the methods of choice for studying such complex ecosystems. Consequently, we have initiated culture-independent studies of a cyanobacterial bloom community from Lake Samsonvale, a subtropical freshwater lake in Southeast Queensland, Australia, which supplies drinking water to the city of Brisbane. South East Queensland Water Corporation (SEQ Water), Brisbane, Australia, the stakeholder of the resource, routinely monitors the water quality and has reported that the toxin-producing cyanobacteria Aphanizomenon and Cylindrospermopsis dominate the planktonic blooms of the lake.
An artificial destratification system installed in 1995 has not successfully controlled the blooms. To gain a better understanding of the community structure and function of the blooms, we have constructed a large-insert bacterial artificial chromosome (BAC) library and a 16S rRNA gene PCR library from high molecular weight (HMW) bloom DNA. Selected BAC clones (67 BAC-end sequence tags and seven complete inserts) and 75 clones from the 16S rRNA gene library were sequenced to assess the putative functions of genes and gene clusters and to determine the phylogenetic diversity of the bloom community. Here we report on the current status of the cyanobacterial bloom metagenome project, which to our knowledge constitutes the first snapshot of the metagenome of a toxic freshwater cyanobacterial bloom.

Materials and methods

Sample collection, counting and identification of phytoplankton

Cyanobacterial bloom samples were collected from North Pine Dam (also known as Lake Samsonvale; 27°15′S, 152°55′E, Brisbane, Australia) using a 3-m-long hosepipe sampler (5 cm diameter) at predetermined locations (designated sites 10001S and 10006S by SEQ Water Corporation). These sites were regularly monitored for the number and identity of phytoplankton cells. Several liters of water were collected, transported to the laboratory and used within 24 h. Total cell counts of the phytoplankton community and morphological identification of the cyanobacterial species in this bloom community were performed by SEQ Water Corporation as described by Burford & O'Donohue (2006), as part of its routine assessment of the health and safety risk posed to the public.

Extraction of HMW DNA

The bloom sample (2 L) was centrifuged at 18 000 × g for 4 h at 4 °C. The cell pellet was resuspended in 1 mL of STE buffer (1 M NaCl, 10 mM Tris-HCl, 1 mM EDTA, pH 8.0) and mixed with an equal volume of 1.8% low melting temperature (LMT) agarose (FMC Corp., Philadelphia, PA), and the mixture was cast into molds (Bio-Rad Corp., Hercules, CA) and allowed to set into plugs. HMW DNA was prepared by in situ lysis of the cells in the plugs (Stein et al., 1996) and was used to construct both the BAC library and the 16S rRNA gene library.

Construction, sequencing and analysis of the PCR-based 16S rRNA gene library

DNA for 16S rRNA gene PCR was prepared by digesting lysed gel plugs with 1 U of GELase (Epicentre Technologies Inc., Madison, WI), followed by extraction with an equal volume of phenol–chloroform and ethanol precipitation in the presence of 3 M ammonium acetate. The 16S rRNA gene was amplified by PCR in 50-μL reactions containing 100 ng of template, 1 U of Taq polymerase (Promega Corp., Madison, WI), 25 mM dNTP (Promega Corp.) and 50 μM each of the 16S rRNA gene-specific primers FD1 and RD1 (Redburn & Patel, 1993) in a Corbett thermal cycler (Corbett Research, Mortlake, NSW), with initial denaturation at 95 °C for 2 min followed by 30 cycles of 95 °C for 1 min, annealing at 55 °C for 1 min and extension at 72 °C for 90 s. PCR products were analysed by agarose gel electrophoresis and purified using QIAquick® spin columns according to the manufacturer's instructions (Qiagen Pty Ltd, Clifton Hill, Vic., Australia). Purified PCR fragments were cloned into the pGEM-T Easy vector using the pGEM-T Easy cloning kit (Promega Corp.) and transformed into Escherichia coli JM109 cells (Promega Corp.) according to the manufacturer's instructions. Plasmid DNA was purified using QIAprep® miniprep columns according to the manufacturer's instructions (Qiagen Pty Ltd). The purity and concentration of PCR products and plasmid DNA were checked by agarose gel electrophoresis using routine methods (Sambrook et al., 1989). The purified plasmids were sequenced using an ABI PRISM BigDye Terminator Cycle Sequencing Ready Reaction kit and electrophoresed on an Applied Biosystems 377 DNA sequencer with a 96-lane upgrade (Spanevello & Patel, 2004). DNA sequences were edited manually using bioedit (Hall, 1998) and aligned using clustalw. Distance matrix files were generated in DNADIST (Felsenstein, 2006) using the neighbor-joining method and imported into DOTUR (Schloss & Handelsman, 2005) to assign sequences to operational taxonomic units (OTUs).
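
DOTUR groups sequences into OTUs by clustering a pairwise distance matrix, using furthest-neighbor (complete-linkage) clustering by default. The Python sketch below illustrates that idea only; it is not DOTUR and does not reproduce DNADIST's distance models. It assumes already-aligned sequences, uses a simple uncorrected p-distance that skips gap and ambiguity columns, and takes the distance cutoff as a parameter, since the cutoff used in this study is not stated here. The sequences in the example are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Columns where either sequence has a gap or an ambiguity code are skipped.
    """
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        differences += x != y
    return differences / compared if compared else 1.0


def furthest_neighbor_otus(aligned: dict[str, str], cutoff: float) -> list[set[str]]:
    """Complete-linkage (furthest-neighbor) clustering of aligned sequences.

    Two clusters are merged only if every cross-cluster pair is within `cutoff`.
    """
    names = list(aligned)
    dist = {frozenset(pair): p_distance(aligned[pair[0]], aligned[pair[1]])
            for pair in combinations(names, 2)}
    clusters = [{name} for name in names]

    def linkage(c1: set[str], c2: set[str]) -> float:
        # Furthest neighbor: the largest distance between any two members.
        return max(dist[frozenset((x, y))] for x in c1 for y in c2)

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        if linkage(clusters[i], clusters[j]) > cutoff:
            break  # no pair of clusters can be merged within the cutoff
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Hypothetical toy alignment (not study data); 16S OTUs are commonly defined near 0.03.
toy = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAA", "s3": "TTTTACGTAC"}
print(furthest_neighbor_otus(toy, cutoff=0.15))
```

With furthest-neighbor linkage every pair of sequences inside an OTU is within the cutoff, which makes the resulting OTUs conservative compared with nearest-neighbor clustering.
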
Phylogenetic affiliations were determined using the RDP classifier (Wang et al., 2007), which assigns 16S rRNA gene sequences to the taxonomic hierarchy proposed in Bergey's Manual of Systematic Bacteriology, 2nd edn. Genus-level affiliation of 16S rRNA gene sequences belonging to the phylum Cyanobacteria was additionally determined by comparison with the GenBank database using blastn (Altschul et al., 1990). Phylogenetic trees were constructed by the neighbor-joining method using the treeconw package (Van de Peer & De Wachter, 1994).
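
The RDP classifier is a naive Bayesian classifier that scores a query by the 8-base words (k-mers) it shares with the reference sequences of each genus, and reports bootstrap confidence. The sketch below is a simplified, hypothetical reimplementation following our reading of Wang et al. (2007): a word prior P_i = (n(w_i) + 0.5)/(N + 1), where n(w_i) is the number of the N reference sequences containing word w_i; a genus-conditional probability (m(w_i) + P_i)/(M + 1), where m(w_i) counts the genus's M sequences containing w_i; and bootstrap replicates built from one-eighth of the query's words. It handles a single rank (genus) only, and the reference sequences and genus names in the test are toy values, not RDP training data.

```python
import math
import random
from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """Distinct overlapping k-mers ("words") of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class NaiveBayesGenusClassifier:
    """Toy genus-level naive Bayesian k-mer classifier in the spirit of Wang et al. (2007)."""

    def __init__(self, references: list[tuple[str, str]], k: int = 8):
        self.k = k
        self.n_refs = len(references)                            # N
        self.word_refs = defaultdict(int)                        # n(w)
        self.genus_word = defaultdict(lambda: defaultdict(int))  # m(w) per genus
        self.genus_size = defaultdict(int)                       # M per genus
        for genus, seq in references:
            self.genus_size[genus] += 1
            for w in kmers(seq, k):
                self.word_refs[w] += 1
                self.genus_word[genus][w] += 1

    def _log_p(self, w: str, genus: str) -> float:
        prior = (self.word_refs.get(w, 0) + 0.5) / (self.n_refs + 1)
        m = self.genus_word[genus].get(w, 0)
        return math.log((m + prior) / (self.genus_size[genus] + 1))

    def _best(self, words) -> str:
        # Genus maximizing the summed log word probabilities.
        return max(self.genus_size, key=lambda g: sum(self._log_p(w, g) for w in words))

    def classify(self, seq: str, n_boot: int = 100, seed: int = 0) -> tuple[str, float]:
        """Return (genus, bootstrap support) for a query sequence."""
        words = sorted(kmers(seq, self.k))
        if not words:
            raise ValueError("query is shorter than the word size")
        best = self._best(words)
        rng = random.Random(seed)
        subset = max(1, len(words) // 8)  # one-eighth of the query's words per replicate
        agree = sum(self._best(rng.choices(words, k=subset)) == best for _ in range(n_boot))
        return best, agree / n_boot


refs = [("GenusA", "AAAAAAAAAAAAAAAACCCC"), ("GenusA", "AAAAAAAAAAAAAAAACCCG"),
        ("GenusB", "GGGGGGGGGGGGGGGGTTTT"), ("GenusB", "GGGGGGGGGGGGGGGGTTTA")]
clf = NaiveBayesGenusClassifier(refs, k=4)
print(clf.classify("AAAAAAAAAAAACCCC"))
```

The toy example uses k = 4 only because its sequences are very short; for full-length 16S rRNA gene sequences the published method uses 8-base words.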

BAC library construction

Partial digestion of the HMW DNA was performed with 5 U μL−1 BamHI (Promega Corp.), and the DNA fragments were separated using a purpose-built, in-house contour-clamped homogeneous electric field (CHEF) apparatus. For this, 1% low melting point (LMP) agarose gels were prepared and run at 12 °C in 0.5× TBE buffer (Sambrook et al., 1989) at 6.0 V cm−1 with a pulse time of 30 s for 16 h. DNA in the size range of 100–300 kb was excised from the gel and purified with 1 U of GELase (Epicentre Technologies) according to the manufacturer's recommendations. Size-selected DNA (100 ng) was ligated to the pre-dephosphorylated pIndigoBAC-5 vector (Epicentre Technologies Inc.) at a molar ratio of 10 : 1 using 4 U μL−1 T4 DNA ligase (Promega Corp.) according to the manufacturer's protocol. Ligation reactions were drop-dialysed for 2 h to remove salts and other small molecules by transferring the reaction onto a Millipore 0.025 μm VSWP filter floating on 0.5× TE buffer. The retentate (2 μL) was used to transform 30 μL of TransforMAX® EC100 electrocompetent cells (Epicentre Technologies Inc.) by electroporation at 2.5 kV, 25 μF capacitance and 1000 Ω resistance, with time constants ranging from 4.3 to 4.8 ms. Following electroporation, the cells were suspended in 1 mL of SOC medium (2% tryptone, 0.5% yeast extract, 10 mM NaCl, 2.5 mM KCl, 10 mM MgCl2, 10 mM MgSO4 and 20 mM glucose) and incubated for 90 min at 37 °C with shaking, and 200 μL of the culture was plated on LBc agar (Luria–Bertani medium containing 12.5 μg mL−1 chloramphenicol) supplemented with isopropyl-β-d-thiogalactoside and X-Gal. After overnight incubation at 37 °C, recombinant clones were picked by blue/white color selection, and a BAC library, designated CBNPD1, consisting of 2850 clones was constructed.
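
Turning the 10 : 1 molar ratio into masses requires the lengths of both molecules. The minimal sketch below assumes an average mass of 650 g mol−1 per base pair of double-stranded DNA. Two further assumptions are made because the text does not specify them: the ratio is treated as vector : insert, and the 7.5-kb vector length is a placeholder rather than a value taken from the vendor's documentation. The 100-kb insert length is simply the lower end of the 100–300 kb size window described above.

```python
BP_MASS = 650.0  # approximate g/mol per base pair of double-stranded DNA


def pmol_from_ng(ng: float, length_bp: int) -> float:
    """Convert a mass of dsDNA (ng) to pmol of molecules."""
    return ng * 1000.0 / (length_bp * BP_MASS)


def ng_from_pmol(pmol: float, length_bp: int) -> float:
    """Convert pmol of dsDNA molecules back to a mass (ng)."""
    return pmol * length_bp * BP_MASS / 1000.0


def partner_mass_ng(known_ng: float, known_bp: int, partner_bp: int, ratio: float) -> float:
    """Mass of the partner molecule giving `ratio` partner molecules per known molecule."""
    return ng_from_pmol(pmol_from_ng(known_ng, known_bp) * ratio, partner_bp)


insert_ng, insert_bp = 100.0, 100_000  # 100 ng of size-selected DNA at the 100-kb lower bound
vector_bp = 7_500                      # placeholder vector length (assumption)
print(f"{pmol_from_ng(insert_ng, insert_bp) * 1000:.2f} fmol of insert molecules")
print(f"{partner_mass_ng(insert_ng, insert_bp, vector_bp, 10):.1f} ng vector for 10:1 vector:insert")
```

Because the insert is much longer than the vector, even a 10-fold molar excess of vector corresponds to less vector mass than insert mass in this example.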

BAC library survey – insert size estimation and BAC-end sequencing

Thirty-seven clones were randomly selected for estimation of library insert sizes and for BAC-end sequencing. BAC DNA was purified using Wizard SV miniprep spin columns (Promega Corp.), digested with 5 U BglII for 2 h at 37 °C and separated by pulsed-field gel electrophoresis (PFGE; pulse time 5–15 s for 16 h).

BAC-end sequencing reactions were performed using the vector primers T7 or RPII (Epicentre Technologies Inc.) with 5 ng μL−1 template DNA. BAC-end sequences were analysed by translated similarity searches against the GenBank nr database using blastx. COG (Clusters of Orthologous Groups of proteins) functions of the genes affiliated with translated BAC-end sequences were identified using the Integrated Microbial Genomes (IMG) system v1.3 (Markowitz et al., 2006). Sequence data were also processed with GeneMark (Borodovsky & McIninch, 1993) to identify ORFs.
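
The text does not say which BLAST interface or output format was used for the blastx searches. If the searches are run with BLAST+ using tabular output (-outfmt 6, whose default columns are listed in the code), the best hit for each BAC-end tag can be extracted with a few lines of Python. The e-value cutoff is a hypothetical choice, the query and subject identifiers in the test are made up, and note that the pident column reports percent identity rather than similarity.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular: str, max_evalue: float = 1e-5) -> dict[str, dict]:
    """Keep the highest-bit-score hit per query that passes the e-value cutoff."""
    best: dict[str, dict] = {}
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(OUTFMT6, row))
        for field in ("pident", "evalue", "bitscore"):
            hit[field] = float(hit[field])
        if hit["evalue"] > max_evalue:
            continue
        query = hit["qseqid"]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return best
```

Selecting by bit score rather than percent identity favors longer, more significant alignments, which matters for short translated BAC-end reads.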

Sequencing and analysis of entire BAC DNA inserts

BAC DNA was purified from 100-mL cultures using QIAfilter midipreps according to the manufacturer's recommended protocols (Qiagen Pty Ltd). The entire BAC DNA inserts were sequenced by primer walking: sequence data generated from the vector primers T7 and RPII (Epicentre Technologies Inc.) were used in primer3 software (Rozen & Skaletsky, 2000) to design further primers. Contigs were assembled manually using bioedit (Hall, 1998). ORFs were identified and COG (Clusters of Orthologous Groups) functions assigned using Glimmer and BASys. The following criteria were used to select probable genuine ORFs from among all potential ORFs detected: only ORFs longer than 50 amino acids that were nonoverlapping or only slightly overlapping were retained, and where putative ORFs were detected in more than one reading frame, only those with known homologs were retained. Sequence identity between ORFs and published protein sequences was established using blastp (Altschul et al., 1990). Protein sequences were retrieved using the blast algorithm (Altschul et al., 1990) against nonredundant protein databases (compiled from SWISS-PROT, TrEMBL and PIR). Protein trees were reconstructed by distance methods using the treeconw package (Van de Peer & De Wachter, 1994).
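
Glimmer and GeneMark predict genes with Markov models, which this sketch does not attempt to reproduce. It only illustrates the length and overlap criteria described above, applied to a naive six-frame ATG-to-stop scan: ORFs shorter than 50 amino acids are discarded, and ORFs are then kept greedily from longest to shortest, dropping any that overlap an already-kept ORF by more than a tolerance. The 60-nt tolerance standing in for "slight" overlaps is an assumption, and the homolog-based choice between reading frames is not modeled.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def six_frame_orfs(seq: str, min_aa: int = 50) -> list[tuple[int, int, str]]:
    """ATG-to-stop ORFs in all six frames as 1-based forward-strand (start, end, strand).

    The amino-acid count excludes the stop codon; the reported end includes it.
    """
    seq = seq.upper()
    n = len(seq)
    found = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_aa:
                        lo, hi = start, i + 2        # 0-based, stop codon included
                        if strand == "-":            # map back to forward coordinates
                            lo, hi = n - 1 - hi, n - 1 - lo
                        found.append((lo + 1, hi + 1, strand))
                    start = None
    return found


def drop_overlaps(orfs, max_overlap_nt: int = 60):
    """Greedily keep longer ORFs; drop any ORF overlapping a kept one by > max_overlap_nt."""
    kept = []
    for orf in sorted(orfs, key=lambda o: o[1] - o[0], reverse=True):
        if all(min(orf[1], k[1]) - max(orf[0], k[0]) + 1 <= max_overlap_nt for k in kept):
            kept.append(orf)
    return sorted(kept)


demo = "ATG" + "GCT" * 60 + "TAA"  # synthetic 61-codon ORF plus stop, for illustration
print(drop_overlaps(six_frame_orfs(demo)))
```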

Results and discussion

Microscopy and 16S rRNA gene phylogeny of the freshwater bloom sample

Morphological examination, identification and cell counts of the phytoplankton in the bloom sample revealed that Cyanobacteria were the dominant members, with Aphanizomenon and Cylindrospermopsis making up 69% of the total phytoplankton community counted (Fig. 1). Further support for this observation came from sequencing of the 75-clone 16S rRNA gene library, which indicated that Cyanobacteria was the most represented phylum in the bloom sample (31 clones), with the majority of these sequences affiliated with the genera Aphanizomenon and Cylindrospermopsis (Fig. 1). Cyanobacterial blooms are complex mixtures of dominant cyanobacteria and associated nonphotosynthetic bacterial communities. Accordingly, the bloom community also contained an abundance of heterotrophic bacteria affiliated with seven different phyla (Fig. 2): Proteobacteria (18 clones), Actinobacteria (17 clones), Verrucomicrobia (four clones), Bacteroidetes (two clones), Planctomycetes (one clone), Chloroflexi (one clone) and Candidate Division OP10 (one clone).
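
The clone counts given above can be cross-checked with simple arithmetic: the eight phylum-level groups sum to the 75 clones in the library, and Cyanobacteria account for c. 41% of them (31/75). A short sketch using only the counts stated in the text:

```python
clones_per_phylum = {
    "Cyanobacteria": 31,
    "Proteobacteria": 18,
    "Actinobacteria": 17,
    "Verrucomicrobia": 4,
    "Bacteroidetes": 2,
    "Planctomycetes": 1,
    "Chloroflexi": 1,
    "Candidate Division OP10": 1,
}

total = sum(clones_per_phylum.values())
percent = {phylum: 100 * n / total for phylum, n in clones_per_phylum.items()}
for phylum, pct in sorted(percent.items(), key=lambda kv: -kv[1]):
    print(f"{phylum:<25}{pct:5.1f}%")
```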


Figure 1.  Percentages of total cyanobacterial cells identified microscopically and counted in a bloom sample (patterned) and of 16S rRNA gene clones affiliated with various genera of the phylum Cyanobacteria recovered from DNA extracted from the same sample (filled). Total phytoplankton cell count estimates demonstrated that members of the genera Aphanizomenon and Cylindrospermopsis dominated the cyanobacterial bloom sample. 16S rRNA gene analysis further supported that the dominant members of the phylum Cyanobacteria belonged to the genera Aphanizomenon and Cylindrospermopsis. Percentages of clones were calculated with respect to the number of clones affiliated with the phylum Cyanobacteria, not the total number of clones. This bloom sample was used to construct the BAC metagenomic library and the 16S rRNA gene library.

Figure 2.  Neighbor-joining dendrogram showing the OTUs assigned from 16S rRNA gene sequences recovered from an Aphanizomenon- and Cylindrospermopsis-dominated cyanobacterial bloom sample in relation to their closest relatives in the domain Bacteria. The habitat sources of cloned environmental sequences are indicated in brackets. For OTUs comprising multiple sequences, the total number of sequences is given after the OTU name. Thermus thermophilus (accession number M26923) was used as the outgroup. Accession numbers of all sequences used in the analysis are given after the sequence names. The scale bar represents 10% sequence divergence.

From the 75 clones, a total of 31 OTUs were identified. Ten OTUs from two phyla (Cyanobacteria and Proteobacteria) shared 96% or more similarity with cultured isolates and comprised 32% of all clones analysed. Twenty-one OTUs from five phyla had <96% similarity to cultured isolates or were related only to previously cloned 16S rRNA gene sequences, and comprised 68% of the clones analysed. In particular, none of the OTUs affiliated with the phyla Actinobacteria, Bacteroidetes, Verrucomicrobia, Planctomycetes, Chloroflexi and OP10 had cultured relatives. This shows that freshwater cyanobacterial bloom communities include a diverse range of uncultured and likely novel species. Comparisons with the GenBank database showed that the majority of the clones were most closely related to sequences recovered from other freshwater clones or isolates (<80%), while a few were closely related to sequences recovered from soil or marine habitats. Among the heterotrophic bacteria, four of 25 OTUs (16%) were most closely related to sequences recovered from heterotrophic bacteria inhabiting freshwater lakes susceptible to cyanobacterial blooms. To identify possible patterns of microbial community composition at a global scale, our 16S rRNA gene library sequences have also been compared extensively with the 16S rRNA gene sequences of other cyanobacterial blooms reported in the literature. These results will be presented elsewhere (P.B. Pope & B.K.C. Patel, unpublished data).
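
The abstract refers to diversity estimates derived from the 16S rRNA gene data, but the specific estimators are not named in this section. DOTUR can report estimators such as Chao1 richness and the Shannon index, so, as a hedged illustration only, the sketch below computes Chao1, Shannon H′ and Good's coverage (the fraction of clones belonging to OTUs observed more than once) from a vector of clones per OTU. The abundance vector in the example is hypothetical and is not the study's data.

```python
import math
from collections import Counter


def chao1(abundances: list[int]) -> float:
    """Chao1 richness; falls back to the bias-corrected form when there are no doubletons."""
    observed = [a for a in abundances if a > 0]
    freq = Counter(observed)
    f1, f2 = freq[1], freq[2]  # singleton and doubleton OTU counts
    if f2 > 0:
        return len(observed) + f1 * f1 / (2 * f2)
    return len(observed) + f1 * (f1 - 1) / 2


def shannon(abundances: list[int]) -> float:
    """Shannon index H' (natural logarithm)."""
    n = sum(abundances)
    return -sum((a / n) * math.log(a / n) for a in abundances if a > 0)


def goods_coverage(abundances: list[int]) -> float:
    """Good's coverage: 1 - (singleton OTUs / total clones)."""
    return 1 - Counter(abundances)[1] / sum(abundances)


toy = [5, 3, 2, 1, 1, 1]  # hypothetical clones per OTU, not the study's data
print(chao1(toy), round(shannon(toy), 3), round(goods_coverage(toy), 3))
```

A large gap between Chao1 and the observed number of OTUs, or a low Good's coverage, indicates that a library of this size has not sampled the community to saturation.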

General survey of the BAC library

The same HMW DNA used to construct the 16S rRNA gene library was used to construct a metagenomic library, designated CBNPD1. This BAC library consisted of 2850 clones with an average insert size of 27 kb, as determined by BglII digestion and pulsed-field gel electrophoresis of c. 1.3% of the library clones (n=37). Insert sizes ranged from 5 kb to >50 kb, with more than 60% of the inserts >20 kb. The CBNPD1 library was therefore estimated to contain c. 77.5 Mb of bloom community DNA, equivalent to c. 19 genomes assuming an average genome size of 4 Mb, or to c. 77 500 genes assuming an average coding-gene length of c. 1 kb.
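
The library-size estimate is simple arithmetic on the clone number and mean insert size. Reproducing it with the rounded 27 kb mean gives 76.95 Mb, slightly below the c. 77.5 Mb stated, a difference consistent with rounding of the mean insert size; the genome and gene equivalents use the 4 Mb genome size and 1 kb gene length assumed in the text.

```python
clones = 2850
mean_insert_kb = 27     # rounded mean insert size from the text
genome_size_mb = 4.0    # average genome size assumed in the text
gene_length_kb = 1.0    # average coding-gene length assumed in the text
clones_sized = 37       # clones analysed by BglII digestion and PFGE

library_mb = clones * mean_insert_kb / 1000
genome_equivalents = library_mb / genome_size_mb
gene_equivalents = library_mb * 1000 / gene_length_kb
sampled_percent = 100 * clones_sized / clones

print(f"library size       ~{library_mb:.2f} Mb")
print(f"genome equivalents ~{genome_equivalents:.1f}")
print(f"gene equivalents   ~{gene_equivalents:,.0f}")
print(f"clones sized        {sampled_percent:.1f}% of the library")
```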

BAC-end sequencing of 37 randomly selected BAC clones was used to provide a snapshot of the potential metabolic capabilities of the cyanobacterial bloom community and to aid in selecting clones containing genes from yet-to-be-cultivated microorganisms for further sequencing. In this survey, 40 kb of sequence data were generated from 65 sequencing reactions, and the resulting sequence tags were analysed (see supplementary material, Table S1). The sequences showed wide variation in G+C content, ranging from 30.02% to 75.5%. Sixty-three per cent of the sequence tags (41 of 65) showed only low matches (<80% similarity) to database entries, suggesting that the bloom microbial community harbors many taxa that have yet to be studied.
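
G+C content and the share of weakly matching tags are straightforward to compute. In this minimal sketch, ambiguous bases are excluded from the G+C denominator (an assumption; the text does not say how they were handled), and the low-match fraction takes one best-hit similarity value per tag. The similarity values in the example are placeholders chosen only to reproduce the 41-of-65 split reported above.

```python
def gc_percent(seq: str) -> float:
    """G+C content (%) computed over unambiguous bases (A, C, G, T) only."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def low_match_fraction(best_similarity: list[float], threshold: float = 80.0) -> float:
    """Fraction of tags whose best database match is below `threshold` per cent."""
    if not best_similarity:
        raise ValueError("no tags supplied")
    return sum(s < threshold for s in best_similarity) / len(best_similarity)


# Placeholder similarities reproducing the 41-of-65 split (not real per-tag values).
placeholder = [60.0] * 41 + [90.0] * 24
print(f"{100 * low_match_fraction(placeholder):.0f}% of tags below 80% similarity")
```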

The BAC-end sequence tags were classified into COG functional categories (supplementary Table S1) and were found to be involved in a wide array of cellular metabolic processes, including amino acid metabolism (e.g. methionine synthase), carbohydrate metabolism (cellulose degradation), inorganic ion metabolism (nitrite/sulfite reductase) and lipid metabolism (fatty acid hydroxylase). A number of genes involved in cell structure (e.g. flagella), DNA processes, defence mechanisms (nuclease) and energy production (photosynthetic reaction center L subunit) were also found. The identification of a gene encoding the photosynthetic reaction center L subunit suggests that the library may also contain other genes of photosynthetic bacteria.
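
Summarizing tags by COG functional category amounts to mapping each assigned one-letter category code to its name and counting. The letter-to-name mapping below covers the standard COG categories relevant to the processes mentioned above; the per-tag codes in the example are hypothetical and are not taken from Table S1.

```python
from collections import Counter

# Standard COG functional-category letters for the classes discussed above.
COG_CATEGORIES = {
    "C": "Energy production and conversion",
    "E": "Amino acid transport and metabolism",
    "G": "Carbohydrate transport and metabolism",
    "I": "Lipid transport and metabolism",
    "L": "Replication, recombination and repair",
    "N": "Cell motility",
    "P": "Inorganic ion transport and metabolism",
    "V": "Defense mechanisms",
    "R": "General function prediction only",
    "S": "Function unknown",
}


def tally_cog(category_codes: list[str]) -> Counter:
    """Count tags per COG category; multi-letter codes (e.g. "EG") count once per letter."""
    counts = Counter()
    for code in category_codes:
        for letter in code:
            counts[COG_CATEGORIES.get(letter, f"Other ({letter})")] += 1
    return counts


# Hypothetical per-tag category codes, for illustration only.
for category, n in tally_cog(["E", "E", "P", "EG", "Q"]).most_common():
    print(f"{n:3d}  {category}")
```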

Seventy-five per cent of the BAC-end sequence tags were affiliated with genes from members of the phylum Proteobacteria. Numerous reports have suggested that this phylum represents the majority of heterotrophic bacteria in freshwater communities (Glöckner et al., 1999), which concurs with our 16S rRNA gene and metagenomic results presented here. Likewise, 40% of the GenBank sequences from completed microbial genome sequencing projects originate from members of this phylum. Although morphological identification, cell counts and 16S rRNA gene library analysis had suggested that members of the phylum Cyanobacteria dominated the bloom community, no BAC-end sequence tags matched cyanobacterial genes (supplementary Table S1). The lack of cyanobacterial BAC clones, despite their numerical dominance in the plankton community, could suggest that the DNA extraction discriminated against this group; however, the 16S rRNA gene library results suggest otherwise, as high numbers of clones affiliated with the phylum Cyanobacteria were recovered (Fig. 2). Another possibility is bias introduced either during ligation of cyanobacterial DNA fragments into the pIndigoBAC vector or by failure of the E. coli host to proliferate while harboring cyanobacterial BACs. However, the wide variation in the G+C content of the sequence tags (supplementary Table S1) suggests that there was no bias against any particular group of DNA fragments. The absence of sequence tag matches to Cyanobacteria may instead be attributable to the absence or under-representation of cyanobacterial gene and genome sequences in the databases.
For example, of the 460 completed microbial genomes, only 14 are from the phylum Cyanobacteria and, more significantly, no genomes are available for Cylindrospermopsis raciborskii or Aphanizomenon species, the dominant species in the cyanobacterial bloom used in this study. As noted above, 63% of the sequence tags showed only low matches to database entries, suggesting that some of these sequences could perhaps be ascribed to the phylum Cyanobacteria; however, with sequence data missing for up to 95% of microbial taxa, including members of the phylum Cyanobacteria, this remains speculative at this stage.

Analysis of large BAC metagenome inserts

Based on the sequence tag survey, several clones were selected for insert sequencing and analysis. Table 1 shows the sequence tag survey results for BAC clones 67, 142, 543, 578, 905, 1664 and 2089, which were sequenced to completion. In the absence of any sequence tags matching members of the phylum Cyanobacteria, clones 543 and 1664 were selected because of the uniqueness of their sequence tags (low matches to proteins in the GenBank database) and/or their affiliation with an under-represented phylum (clone 543). Genes obtained from these inserts could then be subjected to phylogenomic analysis, used to infer taxonomic assignments and used to gain insights into gene function. The 16S rRNA gene library analysis demonstrated that, apart from members of the phylum Cyanobacteria, which dominated the bloom community, heterotrophic bacteria affiliated with phyla such as Bacteroidetes and Actinobacteria were also present, and none of these OTUs had cultured representatives. Clones 905 and 2089 were therefore selected for insert sequencing as possible representatives of the Bacteroidetes and Actinobacteria, respectively, based on the G+C content of their sequence tags and their similarity to protein-coding genes from those phyla. Clones 578, 67 and 142, affiliated with the phylum Proteobacteria, were also selected, because the high numbers of Proteobacteria detected in both the 16S rRNA gene library and the BAC-end sequencing survey suggest that members of this phylum make up a substantial part of the heterotrophic bacterial community. The seven BAC clones sequenced had insert sizes ranging from 9 kb to 30 kb; a cumulative 144 kb of nucleotide sequence was generated and assigned to 130 putative ORFs. Insert sizes, ORF positions, translation frames, putative functions, closest relatives and percentage similarities to putative proteins are summarized in Table 2, and the COG categories of the ORFs are presented in Fig. 3.
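
The Position and D columns of Table 2 give each ORF's coordinates on the insert, and ORFs on the reverse strand are listed with start > end. The small helper below is a hypothetical sketch (its parsing rules are assumptions about how the coordinates are written, with whitespace as a thousands separator and an en dash between start and end). It derives strand, length and codon count from a position string, for example clone 67 ORF 2 (997–473, i.e. 525 nt or 175 codons), and measures the overlap between neighbouring ORFs, such as the 4-nt overlap between clone 67 ORFs 3 (1113–1790) and 4 (1787–2806).

```python
import re


def parse_position(position: str) -> tuple[int, int]:
    """Parse a 'start–end' string such as '20 268–19 192' into two integers."""
    start, end = re.split(r"[–-]", position.strip(), maxsplit=1)
    # Remove whitespace used as a thousands separator inside each number.
    return int(re.sub(r"\s", "", start)), int(re.sub(r"\s", "", end))


def orf_geometry(position: str) -> dict:
    start, end = parse_position(position)
    length_nt = abs(end - start) + 1
    return {
        "strand": "+" if end > start else "-",  # start > end marks the reverse strand
        "length_nt": length_nt,
        "codons": length_nt // 3,
        "whole_codons": length_nt % 3 == 0,
    }


def overlap_nt(position_a: str, position_b: str) -> int:
    """Number of nucleotides shared by two ORFs (0 if they do not overlap)."""
    a0, a1 = sorted(parse_position(position_a))
    b0, b1 = sorted(parse_position(position_b))
    return max(0, min(a1, b1) - max(a0, b0) + 1)


print(orf_geometry("997–473"))               # clone 67, ORF 2
print(overlap_nt("1113–1790", "1787–2806"))  # clone 67, ORFs 3 and 4
```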

Table 1.   Selection of cyanobacterial bloom metagenome clones for insert sequencing
Clone | Sequence* | G+C (%) | Most similar orthologue (taxon) | Identities (aa)
578 | fwd | 76 | Conserved hypothetical membrane protein (Betaproteobacteria) | 71/157 (45%)
578 | rev | 65 | Lipoic acid synthetase (Betaproteobacteria) | 233/276 (84%)
905 | fwd | 36 | Zinc protease (Bacteroidetes) | 54/175 (30%)
905 | rev | 30 | Linoleoyl-CoA desaturase (Bacteroidetes) | 70/87 (80%)
1664 | fwd | 73 | Putative dehydrogenase (Alphaproteobacteria) | 123/173 (71%)
1664 | rev | 68 | Conserved hypothetical protein (Betaproteobacteria) | 35/136 (25%)
2089 | fwd | 63 | UDP-N-acetylglucosamine 1-carboxyvinyltransferase (Actinobacteria) | 36/57 (68%)
2089 | rev | 65 | Peptidase (Actinobacteria) | 117/290 (40%)
543 | fwd | 67 | Acetyl-muramoyl-alanine amidase (Betaproteobacteria) | 28/86 (32%)
543 | rev | 63 | ATP synthase F0, C subunit (Candidate division TM7) | 26/39 (66%)
142 | fwd | 49 | Peptidase S1 and S6 (Gammaproteobacteria) | 148/248 (55%)
142 | rev | 45 | Conserved hypothetical protein (Gammaproteobacteria) | 100/147 (68%)
67 | fwd | 64 | Sugar transporter, MFS family (Alphaproteobacteria) | 171/270 (61%)
67 | rev | 62 | Dehydrogenase/reductase SDR (Alphaproteobacteria) | 41/49 (83%)

  • * fwd and rev, sequences generated with the pIndigoBAC-5 forward and reverse sequencing primers, respectively (see ‘Materials and methods’).
  • Most similar orthologues were identified by blastx searches.
Table 2.   Predicted putative proteins and RNA encoded by the metagenome of BAC clones 67 (EF157666), 142 (EF157667), 543 (EF157668), 578 (EF157669), 905 (EF157670), 1664 (EF157671) and 2089 (EF157672)
ORF | Position | D | Size (aa) | Putative function: blastp (COG accession number if available) | Closest relative (phylum; genus) | Identity (aa) | Comments
  • D, direction.

67 (EF157666)
1 | 45–266 | + | 74 | Toluene sulfonate zinc-independent alcohol dehydrogenase oxidoreductase (COG1028) | Proteobacteria; Xanthobacter | 60/73 (84%) | Fatty acid biosynthesis pathway; first reduction step
2 | 997–473 | − | 175 | Conserved hypothetical protein | Bacteroidetes; Flavobacterium | 40/113 (35%) | Cytoplasmic protein, unknown function
3 | 1113–1790 | + | 226 | Two-component transcriptional regulator, winged helix family | Proteobacteria; Burkholderia | 107/220 (48%) | Member of a two-component regulatory system
4 | 1787–2806 | + | 340 | Signal transduction histidine kinase (COG0642) | Proteobacteria; Magnetospirillum | 117/296 (39%) | Member of the two-component regulatory system qseB/qseC; activates the flagella regulon by activating transcription of flhDC
5 | 3951–3106 | − | 282 | Enoyl-CoA hydratase/carnitine racemase (COG1024) | Proteobacteria; Pseudomonas | 128/246 (52%) | Could possibly oxidize fatty acids using specific components
6 | 4032–5021 | + | 330 | Oxidoreductase, zinc-binding dehydrogenase family protein (COG0604) | Proteobacteria; Roseovarius | 248/328 (75%) | Involved in energy production and conversion
7 | 5398–5024 | − | 125 | Lysyl-tRNA synthetase | Eukaryota; Arabidopsis | 19/48 (39%) |
8 | 6108–5479 | − | 210 | Transcriptional regulator, TetR family | Proteobacteria; Rhodopseudomonas | 103/199 (51%) |
9 | 7479–6175 | − | 435 | Phenylacetate-CoA ligase (COG1541) | Proteobacteria; Rhodopseudomonas | 333/424 (75%) | ORFs 9–25: components of a phenylacetic acid degradation gene cluster
10 | 7932–7483 | − | 150 | Phenylacetic acid degradation-related protein (COG2050) | Proteobacteria; Paracoccus | 93/143 (65%) |
11 | 8746–7934 | − | 271 | Phenylacetate degradation, enoyl-CoA hydratase paaB (COG1024) | Proteobacteria; Nitrobacter | 164/269 (60%) |
12 | 9014–10 177 | + | 388 | Putative branched-chain amino acid ABC transporter system substrate-binding protein (COG0683) | Proteobacteria; Rhodopseudomonas | 274/376 (72%) | ORFs 12–16: components of the high-affinity leucine, isoleucine, valine transport system I (LIV-I), which is operative without Na(+) and is specific for alanine and threonine in addition to branched-chain amino acids; involved in cell growth and/or maintenance
13 | 10 253–11 125 | + | 291 | Inner-membrane translocator (COG0599); putative branched-chain amino acid ABC transport system permease protein (COG4177) | Proteobacteria; Paracoccus | 187/290 (64%) |
14 | 11 125–12 072 | + | 316 | Permease protein LivM | Proteobacteria; Rhodopseudomonas | 196/306 (64%) |
15 | 12 069–12 872 | + | 268 | ABC transporter related LivG | Proteobacteria; Rhodopseudomonas | 170/258 (65%) |
16 | 12 865–13 563 | + | 233 | ABC transporter related LivF | Proteobacteria; Rhodopseudomonas | 140/232 (60%) |
17 | 13 688–15 730 | + | 681 | Phenylacetic acid degradation protein, PaaN/Z subunit | Proteobacteria; Rhodopseudomonas | 432/674 (64%) |
18 | 15 727–17 853 | + | 709 | Enoyl-CoA hydratase/isomerase/3-hydroxyacyl-CoA dehydrogenase | Proteobacteria; Roseobacter | 330/695 (47%) |
19 | 17 901–19 103 | + | 401 | Thiolase, PaaJ subunit | Proteobacteria; Stappia | 295/400 (73%) |
20 | 20 268–19 192 | − | 359 | Phenylacetate-CoA oxygenase/reductase, PaaK subunit | Proteobacteria; Rhodopseudomonas | 189/355 (53%) |
21 | 20 817–20 278 | − | 180 | Phenylacetate-CoA oxygenase, PaaJ subunit | Proteobacteria; Bradyrhizobium | 104/158 (65%) |
22 | 21 573–20 821 | − | 251 | Phenylacetate-CoA oxygenase, PaaI/C subunit | Proteobacteria; Rhodopseudomonas | 160/249 (64%) |
23 | 21 860–21 573 | − | 96 | Phenylacetic acid degradation B, PaaB | Proteobacteria; Rhodopseudomonas | 77/95 (81%) |
24 | 22 870–21 866 | − | 335 | Phenylacetate-CoA oxygenase, PaaG subunit | Proteobacteria; Rhodopseudomonas | 249/327 (76%) |
25 | 23 748–22 978 | − | 257 | Phenylacetic acid degradation operon negative regulatory protein, PaaX | Proteobacteria; Bradyrhizobium | 90/236 (38%) |
26 | 24 408–23 752 | − | 219 | Putative glutathione S-transferase | Proteobacteria; Sinorhizobium | 125/219 (57%) |
27 | 24 714–24 496 | − | 73 | Hypothetical protein | No significant similarity found | |
28 | 24 724–25 518 | + | 265 | ABC transporter, substrate-binding protein, ModA subunit | Proteobacteria; Agrobacterium | 147/231 (63%) | ORFs 28–31: components of a gene cluster that encodes an ABC-type, high-affinity molybdate transporter
29 | 25 525–26 217 | + | 231 | Molybdenum transport protein, ModB subunit | Proteobacteria; Mesorhizobium | 165/226 (73%) |
30 | 26 221–26 907 | + | 229 | Molybdenum import ATP-binding protein ModC | Proteobacteria; Agrobacterium | 128/228 (56%) |
31 | 26 904–27 314 | + | 137 | Molybdenum-binding transcriptional regulator, ModE family protein | Proteobacteria; Agrobacterium | 61/101 (60%) |
32 | 27 589–28 548 | + | 320 | RNA methyltransferase TrmH, group 3 | Proteobacteria; Xanthobacter | 158/276 (57%) |

142 (EF157667)
1 | 95–973 | + | 292 | Peptidase S1/S6, chymotrypsin/Hap | Proteobacteria; Shewanella | 153/275 (55%) |
2 | 2531–1119 | − | 470 | Glutamate synthase, NADH/NADPH, small subunit 2 (COG0493) | Proteobacteria; Pseudoalteromonas | 347/471 (73%) | ORFs 2 and 3: involved in nitrogen metabolism and glutamate biosynthesis
3 | 7092–2548 | − | 1514 | Glutamate synthase, large subunit (COG0069) | Proteobacteria; Alteromonas | 1055/1429 (70%) |
4 | 8730–7777 | − | 317 | Uncharacterized conserved protein (COG1469) | Proteobacteria; Idiomarina | 225/308 (73%) |
5 | 9415–8963 | − | 150 | Isoaspartyl dipeptidase | Firmicutes; Bacillus | 69/145 (47%) | Catalyzes the hydrolytic cleavage of a subset of l-isoaspartyl (l-β-aspartyl) dipeptides

543 (EF157668)
1 | 1806–172 | − | 544 | Peptidyl-tRNA hydrolase | Planctomycetes; Blastopirellula | 44/113 (38%) | Involved in translation, ribosomal structure and biogenesis
2 | 2105–2704 | + | 200 | Metallophosphoesterase (COG1408) | Acidobacteria; Candidatus Chloracidobacterium | 68/200 (34%) | Hydrolase activity
3 | 3784–2711 | − | 358 | Saccharopine dehydrogenase (COG1748) | Proteobacteria; Hahella | 185/357 (51%) | Involved in amino acid transport and metabolism
4 | 5565–3781 | − | 595 | Oligoendopeptidase F (COG1164) | Bacteroidetes; Cytophaga | 229/566 (40%) | Involved in amino acid transport and metabolism
5 | 6412–5474 | − | 313 | 4-Diphosphocytidyl-2-C-methyl-d-erythritol kinase (COG1947) | Firmicutes; Desulfitobacterium | 91/270 (30%) | Catalyzes the phosphorylation of the position 2 hydroxy group of 4-diphosphocytidyl-2C-methyl-d-erythritol
6 | 6442–6621 | + | 60 | Putative S1/P1 nuclease | Proteobacteria; Alteromonas | 17/40 (42%) | Cleaves RNA and single-stranded DNA with no base specificity
8 | 9929–10 306 | + | 126 | Quinolinate synthetase (COG0379) | Actinobacteria; Streptomyces | 55/109 (50%) | Catalyzes the condensation of iminoaspartate with dihydroxyacetone phosphate to form quinolinate
9 | 12 776–10 530 | − | 749 | Hypothetical protein | No significant similarity found | |
10 | 13 639–12 596 | − | 348 | Hypothetical protein | No significant similarity found | |
11 | 10 341–13 700 | + | 1119 | Hypothetical protein | No significant similarity found | |
12 | 13 717–14 874 | + | 386 | Hypothetical protein | Eukaryota | 56/215 (26%) |
13 | 15 471–14 914 | − | 186 | NADH dehydrogenase I chain I (COG1143) | Proteobacteria; Bdellovibrio | 66/156 (42%) | May donate electrons to ubiquinone
14 | 15 528–15 872 | + | 115 | ATP synthase F0, subunit I | Proteobacteria; Roseovarius | 20/63 (30%) |
15 | 15 869–16 327 | + | 153 | Probable AMP-dependent synthetase and ligase protein | Proteobacteria; Rhizobium | 30/97 (30%) |
16 | 16 373–17 380 | + | 336 | Probable protein ATP synthase A chain (COG0356) | Planctomycetes; Rhodopirellula | 74/205 (36%) | Key component of the proton channel; it may play a direct role in the translocation of protons across the membrane
17 | 17 481–17 798 | + | 106 | F0F1-type ATP synthase C subunit/archaeal/vacuolar-type H+-ATPase subunit K (COG0636) | Spirochaetes; Leptospira | 36/62 (58%) | One of the three chains of the nonenzymatic component (CF(0) subunit) of the ATPase complex
18 | 17 557–18 306 | + | 250 | Probable protein ATP synthase B chain (COG0711) | Planctomycetes; Blastopirellula | 47/139 (33%) |

578 (EF157669)
1 | 2416–5073 | + | 886 | Leucine aminopeptidase-related protein (COG2234) | Proteobacteria; Caulobacter | 97/197 (49%) | Pfam04389; peptidase family M28, involved in proteolysis
2 | 6489–5125 | − | 455 | d-Lactate dehydrogenase (COG0277) | Proteobacteria; Burkholderia | 203/456 (44%) | Involved in energy production and conversion
3 | 7514–6486 | − | 343 | Conserved hypothetical protein | Proteobacteria; Polaromonas | 209/340 (61%) | Cytoplasmic protein, unknown function
4 | 9063–7507 | − | 519 | Aldehyde dehydrogenase (COG1012) | Proteobacteria; Rhodoferax | 345/499 (69%) | Enzymes that oxidize a wide variety of aliphatic and aromatic aldehydes using NADP as a cofactor
5 | 10 173–9079 | − | 365 | Putative saccharopine dehydrogenase, NAD-binding (COG1748) | Proteobacteria; Burkholderia | 235/357 (65%) | Involved in amino acid transport and metabolism
6 | 10 290–10 775 | + | 162 | Transcriptional regulator, AsnC family (COG1522) | Proteobacteria; Pseudomonas | 80/149 (53%) | The AsnC family is a family of similar bacterial transcription regulatory proteins. ORFs 7–11 exhibit significantly lower G+C content than the average ORF in the fragment
7 | 11 584–11 928 | + | 115 | Conserved hypothetical protein | Proteobacteria; Legionella | 81/113 (71%) | G+C: 52.2%
8 | 12 578–12 907 | + | 110 | Conserved hypothetical protein | Deinococcus-Thermus; Deinococcus | 52/112 (46%) | G+C: 53.3%
9 | 13 451–13 831 | + | 127 | Conserved hypothetical protein | Proteobacteria; Ralstonia | 44/68 (64%) | G+C: 49.6%
10 | 13 815–13 964 | + | 50 | Conserved hypothetical protein | Actinobacteria; Arthrobacter | 24/42 (57%) | G+C: 50.7%
1114 136–14 465+110Conserved hypothetical proteinProteobacteriaMesorhizobium36/62(58%)G+C%: 57.3
tRNA14 308–14 236 tRNA-Ala  tRNA-alanine
tRNA14 403–14 330 tRNA-Ile  tRNA-Isoleucine
rRNA14 551–16 070+ 16S rRNA geneBetaproteobacteria*(99%)Unlinked rrn operon
1215 996–16 472+159Hypothetical proteinEukaryota20/68(29%)Cytoplasmic protein, unknown function
1318 323–16 572584Phosphoenolpyruvate-protein kinase (COG1080)ProteobacteriaRubrivivax375/567(66%)Transfers the phosphoryl group from phosphoenolpyruvate (PEP) to the HPr
1418 633–18 36490Phosphocarrier HPr protein (COG1925)ProteobacteriaMethylibium64/89(71%)A component of the phosphotransferase system (PTS), a major carbohydrate transport system in bacteria
1519 323–20 786+488Hypothetical proteinProteobacteriaMethylibium116/346(33%) 
1620 803–21 315+171Hypothetical proteinProteobacteriaMethylibium87/134(64%) 
1722 032–21 328235Lipoate synthase (COG0320)ProteobacteriaMethylibium195/232(84%) 
905 (EF157670)
12890–77937Putative zinc proteaseBacteroidetesBacteroides493/940(52%)Zn-dependent peptidase involved in proteolysis and peptidolysis
23062–3619+185Conserved hypothetical proteinBacteroidetesFlavobacterium95/165(57%)Cytoplasmic protein, unknown function
33657–4325+222Conserved hypothetical proteinBacteroidetesFlavobacterium143/222(64%)Cytoplasmic protein, unknown function
44394–4780+128Protein of unknown function DUF525 (COG2967)BacteroidetesFlavobacterium115/128(89%)Uncharacterized protein affecting Mg2+/Co2+ transport
54900–6528+542δ-1-pyrroline-5-carboxylate dehydrogenase 1 (COG1012)BacteroidetesFlavobacterium486/541(89%)Oxidizes proline to glutamate for use as a carbon and nitrogen source
67769–6693358Conserved hypothetical proteinBacteroidetesBacteroides103/335(30%)Cytoplasmic protein, unknown function
77864–9093+409ISPg4, transposase (COG3385)BacteroidetesPsychroflexus292/401(71%)Necessary for efficient DNA transposition
89463–10 515+350Two-component system sensor protein, without kinase domain (COG2972)Proteobacteria110/263(41%)Predicted signal transduction protein with a C-terminal ATPase domain.
910 549–11 724+391Hypothetical proteinBacteroidetesFlavobacterium67/240(27%)Predicted hydrolases or acyltransferases (α/β hydrolase superfamily).
1011 763–12 458+231Two-component system, regulatory proteinProteobacteria102/231(44%)Member of the two-component regulatory system lytR/lytS that regulates genes involved in autolysis and cell wall metabolism.
1112 677–12 949+90Conserved hypothetical proteinBacteroidetesBacteroides33/86(38%)Cytoplasmic protein, unknown function
1212 946–14 040+364Probable transposaseBacteroidetesBacteroides184/365(50%)Cytoplasmic protein, unknown function
1314 549–14 27790Peptidase M48, Ste24p (COG0501)BacteroidetesFlavobacterium58/74(78%)Zn-dependent protease with chaperone function
1415 109–14 606167Peptidase M48, Ste24p (COG0501)BacteroidetesFlavobacterium99/152(65%) 
1515 895–15 266209Methyltransferase GidB (COG0357)BacteroidetesFlavobacterium180/207(86%)Predicted S-adenosylmethionine-dependent methyltransferase involved in bacterial cell division
1664 (EF157671)
23901–1337854Thiamine pyrophosphate enzyme TPP-binding domain proteinProteobacteriaRhodobacter56/217(25%) 
34767–3868299Cytoplasmic membrane extrusion proteinProteobacteriaBurkholderia35/121(28%) 
44817–6391+524Hypothetical proteinEukaryota37/121(30%) 
56936–7706+256DSH domain proteinActinobacteriaAcidothermus32/93(34%) 
67719–8207+162Cystathionine β synthase (CBS) domain protein (COG0517)ProteobacteriaCaulobacter91/142(64%)GMP biosynthesis from IMP; first step
79757–8189522Conserved hypothetical proteinProteobacteriaBurkholderia127/292(43%)Cytoplasmic protein, unknown function
89761–10 858+365PilLProteobacteriaDelftia97/317(30%)Evolutionarily-related protein involved in the transport of ammonium ions across membranes
911 143–9839434Putative ammonium transporter (COG0004)Proteobacteria212/409(51%) 
1011 279–11 821+180Translation initiation factor IF-2CyanobacteriaSynechococcus35/102(34%) 
1112 183–13 469+428Hypothetical proteinProteobacteriaStigmatella66/207(31%) 
1215 243–12 1481031Hemolysin activator-related protein (COG0532)No significant similarities foundActivates hemolysin, a toxin secreted across both the cytoplasmic and outer membranes of Gram-negative bacteria
1317 354–15 567595Hypothetical protein (COG0595)No significant similarities foundPredicted hydrolase of the metallo-β-lactamase superfamily
1417 523–18 497+324Hypothetical protein (COG0692)No significant similarities foundProbable histidine kinase involved in signal transduction mechanisms
1518 558–17 650302Conserved hypothetical proteinProteobacteriaBurkholderia122/289(42%) 
1618 730–19 566+278Hypothetical proteinNo significant similarities found 
1719 554–21 086+510Hypothetical proteinNo significant similarities found 
1821 750–21 59850Hypothetical proteinNo significant similarities found 
1922 229–21 744161Guanine nucleotide exchange factorEukaryota25/69(36%) 
2089 (EF157672)
12882–1377501Conserved hypothetical protein (COG0183)ActinobacteriaMycobacterium283/506(47%)Involved in Lipid transport and metabolism.
22861–3406+181Ribose-5-phosphate isomerase 3 (COG0698)ActinobacteriaPropionibacterium81/144(56%)In addition to its activity on d-ribose 5-phosphate it probably also has activity on d-allose 6-phosphate
35727–396487Short-chain dehydrogenase/reductase SDRProteobacteriaBurkholderia93/255(36%) 
45707–5970+87Glycosyl transferase, family 4ActinobacteriaNocardioides25/74(33%)Involved in lipid metabolism
56050–6385+111ATP synthase F0, subunit IProteobacteriaSulfitobacter20/60(33%)ORFs 5–16: ATP synthase complex, composed of a nine-subunit (A–G, F6, F8) transmembrane channel through which protons are pumped (F0-complex), and a five-subunit (α, β, γ, δ and ɛ) catalytic core for ATP synthesis (F1-ATPase).
66382–6846+154Putative ATP synthase protein IActinobacteriaFrankia35/138(25%)
76906–7664+252ATP synthase F0, A subunit (COG0356)ActinobacteriaFrankia109/233(46%)
87713–7997+94ATP synthase C chain (COG0636)ActinobacteriaCorynebacterium43/62(69%) 
98072–8683+203ATP synthase B chain (COG0711)ActinobacteriaCorynebacterium38/108(35%) 
108680–9222+180ATP synthase F0, B subunitActinobacteriaFrankia47/161(29%) 
119219–9743+175ATP synthase δ subunit (COG0711)ActinobacteriaSymbiobacterium64/175(36%) 
129774–11 366+530ATP synthase α subunitActinobacteriaAcidothermus339/514(65%) 
1311 372–11 854+160H+-transporting ATPase, γ subunitProteobacteriaMethylophilales53/128(41%) 
1411 829–12 332+167H+-transporting ATPase, γ subunitActinobacteriaFrankia80/139(57%) 
1512 348–13 790+480ATP synthase β subunitActinobacteriaPropionibacterium320/450(71%) 
1613 790–14 218+142ATP synthase ɛ chainFirmicutesGeobacillus57/129(44%) 
1714 359–13 862165Hypothetical proteinNo significant similarities found 
1814 314–14 994+226Short chain dehydrogenase (COG0300)ActinobacteriaSaccharopolyspora106/227(46%)Short-chain dehydrogenases of various substrate specificities
1915 032–15 757+241LuxR:Response regulator receiverCyanobacteriaTrichodesmium24/48(50%)A transcriptional activator for quorum-sensing control
2015 809–16 462+217Conserved hypothetical proteinActinobacteriaJanibacter47/173(27%)Cytoplasmic protein, unknown function
2116 619–16 48843Putative fructose-1,6-bisphosphataseActinobacteriaNocardia24/42(57%)Involved in carbohydrate metabolism
2216 593–19 334+913Peptidase S9, prolyl oligopeptidase (COG1506)ChloroflexiChloroflexus155/504(30%)Serine-type peptidase activity
2319 346–20 119+257Riboflavin biosynthesis protein RIBAFirmicutesDesulfitobacterium36/101(35%) 
2420 888–20 127253Fatty acid transporter protein-like proteinEukaryota27/82(32%) 
2520 839–21 405+188Predicted hydrolase of the α/β-hydrolase foldProteobacteriaMagnetospirillum84/154(54%)Cytoplasmic protein, unknown function
2622 970–21 402522General substrate transporter:Major facilitator superfamily MFSProteobacteriaCaulobacter99/411(24%) 
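
The strand and size columns of Table 2 follow from the ORF coordinates: rows whose start coordinate exceeds the end coordinate describe ORFs on the complementary strand. The Python sketch below assumes 1-based inclusive coordinates and protein sizes that exclude the stop codon; `describe_orf` and `OrfCall` are illustrative names, not part of the annotation pipeline used in this study. Some tabulated sizes (e.g. clone 543, ORF 2) are one residue larger than this rule gives, suggesting the stop codon was not treated consistently, so the output should be read as approximate.

```python
from typing import NamedTuple


class OrfCall(NamedTuple):
    strand: str     # "+" when start < end, "-" for the complementary strand
    length_nt: int  # nucleotides spanned, both coordinates inclusive
    length_aa: int  # complete codons minus the stop codon


def describe_orf(start: int, end: int) -> OrfCall:
    """Derive strand and approximate protein length from 1-based inclusive ORF coordinates."""
    strand = "+" if start < end else "-"
    length_nt = abs(end - start) + 1
    # Integer division drops any incomplete codon; subtract one for the stop codon.
    length_aa = length_nt // 3 - 1
    return OrfCall(strand, length_nt, length_aa)


# Clone 142, ORF 2 (2531–1119): OrfCall(strand='-', length_nt=1413, length_aa=470)
print(describe_orf(2531, 1119))
# Clone 2089, ORF 15 (12 348–13 790): forward strand, 480 residues
print(describe_orf(12348, 13790))
```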

Figure 3. ORFs of the seven BAC clones 578 (EF157669), 905 (EF157670), 1664 (EF157671), 2089 (EF157672), 543 (EF157668), 142 (EF157667) and 67 (EF157666) from the cyanobacterial bloom metagenome library CBNPD1. ORFs are color coded according to their COG affiliations. Further details of the putative function ascribed to each ORF are given in Table 2.


It is well established that clones containing phylogenetic anchors (usually rRNA genes) are preferred in metagenomic studies, because their phylogeny and phylogenetic neighbors can be readily determined. While the 16S rRNA gene is the marker most commonly sought for phylogenetic analysis and taxonomic assignment, previous findings suggest that particular genes involved in informational processes can also be used to reconstruct an organism's evolution, as they are rarely affected by horizontal gene transfer (Jain et al., 1999). In the absence of a suitable phylogenetic anchor, each translated query ORF is assigned to a phylum based on its best blast hit, and the whole metagenomic insert is then ascribed to the phylum with the most matches. Accordingly, clones 67 and 142 were assigned to the phylum Proteobacteria, with over 90% and 80% of their ORFs, respectively, assigned to that phylum. The insert of clone 905 was initially assigned to the phylum Bacteroidetes, with 87% of its predicted ORFs affiliated with it. Among the 15 ORFs of clone 905, ORF 15 encoded a 209-residue protein to which a putative function as an informational gene could be assigned, more specifically as a methyltransferase involved in bacterial cell division (Table 2). The deduced protein was used to reconstruct the phylogeny of clone 905 (Fig. 4a), and the tree topology provided further support for a phylogenetic association of clone 905 with members of the genus Flavobacterium, phylum Bacteroidetes.
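
The majority rule described above can be written down in a few lines. This is a minimal Python sketch, assuming one best-hit phylum per ORF and counting ORFs without a significant hit in the denominator, which reproduces the 87% reported for clone 905 from its Table 2 rows. `assign_insert_phylum` and its `min_fraction` threshold are hypothetical, since the text states neither a tie-breaking rule nor a minimum majority.

```python
from collections import Counter
from typing import Optional


def assign_insert_phylum(best_hit_phyla: list[Optional[str]],
                         min_fraction: float = 0.5) -> tuple[Optional[str], float]:
    """Ascribe an insert to the phylum matched by the most ORF best hits.

    best_hit_phyla has one entry per ORF; None marks an ORF with no significant hit.
    Returns (phylum, fraction of all ORFs); the phylum is None when the leading
    fraction is below min_fraction. Ties resolve to the phylum encountered first,
    following Counter.most_common ordering.
    """
    counts = Counter(p for p in best_hit_phyla if p is not None)
    if not counts:
        return None, 0.0
    phylum, n = counts.most_common(1)[0]
    fraction = n / len(best_hit_phyla)
    return (phylum if fraction >= min_fraction else None), fraction


# Best-hit phyla of clone 905, ORFs 1-15, as listed in Table 2.
clone_905 = (["Bacteroidetes"] * 7
             + ["Proteobacteria", "Bacteroidetes", "Proteobacteria"]
             + ["Bacteroidetes"] * 5)
phylum, fraction = assign_insert_phylum(clone_905)
print(phylum, round(fraction, 2))  # Bacteroidetes 0.87
```

A majority vote of this kind is only a first pass; as the clone 905 example shows, the assignment is stronger when an informational gene on the same insert gives a concordant phylogeny.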


Figure 4. Distance dendrograms showing the phylogenies of (a) methyltransferase, including the deduced amino acid sequence of ORF 15 from clone 905; (b) ATP synthase β subunit, including the deduced amino acid sequence of ORF 15 from clone 2089; and (c) quinolinate synthase, including the deduced amino acid sequence of ORF 8 from clone 543. Accession numbers of the reference sequences are given in parentheses. The scale bar represents a sequence divergence of 10%. Bootstrap values >250 are shown at nodes.


The taxonomic assignment of clones 2089, 1664 and 543 proved difficult, as there was no dominant affiliation with a particular phylum and most matches showed low similarity. Clone 2089 contained 26 ORFs, and 57% of the encoded putative proteins exhibited highest similarity to homologs affiliated with the phylum Actinobacteria. Highest blastp matches for the remaining putative proteins included members of the Proteobacteria (20%), Firmicutes (8%), Cyanobacteria (4%) and Chloroflexi (4%). ORFs 5–16 encoded a putative ATP synthase complex, composed of a nine-subunit (A–G, F6, F8) transmembrane channel through which protons are pumped (F0 complex) and a five-subunit (α, β, γ, δ and ɛ) catalytic core for ATP synthesis (F1-ATPase). In particular, ORF 15 encoded a 480-residue protein that exhibited 71% sequence identity to an ATP synthase β subunit of a member of the genus Propionibacterium, phylum Actinobacteria (Table 2). The predicted function of ORF 15 as a putative ATP synthase β subunit was further supported by ten or more top hits to proteins of this function (data not shown). ATP synthase β subunits have previously been used as an alternative phylogenetic marker (Wolfgang et al., 1998), so ORF 15 was also used to reinforce the taxonomic assignment of clone 2089. A phylogenetic tree generated from aligned homologs by a distance method suggested an evolutionary relationship with the phylum Actinobacteria and further supported the predicted function of ORF 15 (Fig. 4b).
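
The distance trees in Fig. 4 rest on pairwise distances between aligned protein sequences. The distance model is not stated in this section, so the Python sketch below assumes the simplest common choice, an uncorrected p-distance with a Poisson correction, to show what a 10% divergence on the scale bar measures. The two sequences are toy examples, not data from this study, and tree building (for example, by neighbor joining) and bootstrapping are not shown.

```python
import math


def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing residues between two aligned sequences, skipping gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # pairwise deletion of columns containing a gap
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def poisson_distance(p: float) -> float:
    """Poisson-corrected amino acid distance, d = -ln(1 - p), defined for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1.0 - p)


a = "MKVLAT-GEA"
b = "MKVIATQGDA"
p = p_distance(a, b)  # 2 differences over 9 ungapped columns
print(round(p, 3), round(poisson_distance(p), 3))  # 0.222 0.251
```

The correction matters most for distant homologs: at low divergence the two values are nearly equal, whereas for highly divergent pairs the p-distance increasingly underestimates the number of substitutions.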

The insert of clone 543 contained 18 ORFs, and the highest blast matches of its putative proteins spanned six different phyla (Proteobacteria, Planctomycetes, Actinobacteria, Bacteroidetes, Firmicutes and Spirochaetes), making it difficult to verify a taxonomic assignment based on highest similarities. Several ORFs were therefore considered for phylogenomic analysis to try to verify the taxonomic assignment of clone 543. ORF 1 was predicted to encode a 544-residue protein whose highest match was a peptidyl-tRNA hydrolase homolog, a crucial bacterial enzyme involved in protein translation. However, this functional assignment was questionable, given the very low sequence identity (38%) and the short alignment (113 aa, about a fifth of the ORF length). ORF 8 encoded a 126-residue protein that exhibited 50% sequence identity to a quinolinate synthetase of a member of the phylum Actinobacteria, and its top ten or more hits were to quinolinate synthetases. COG predictions also assigned ORF 8 a putative function as a quinolinate synthase (Table 2). Quinolinate synthetases produce quinolinic acid, from which NAD, a cofactor that plays a crucial role in numerous essential biological redox reactions, is derived (Ollagnier-de Choudens et al., 2005). A phylogenetic tree generated with ORF 8 and aligned homologs (Fig. 4c) suggested that clone 543 may be phylogenetically associated with members of the phylum Actinobacteria, although this assignment must be treated with caution.
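
The argument above weighs both percent identity and the fraction of the ORF covered by the alignment. The Python sketch below makes that check explicit for entries written in the Table 2 style (identical/aligned residues). `assess_hit` and its 40% identity and 70% coverage thresholds are hypothetical choices for illustration, not criteria used in this study, and coverage is approximated as aligned length divided by ORF length.

```python
import re
from typing import NamedTuple

# Table 2 style "identical/aligned (percent)", e.g. "44/113(38%)".
IDENTITY_FIELD = re.compile(r"(\d+)/(\d+)")


class HitSupport(NamedTuple):
    identity: float        # identical residues / aligned residues
    query_coverage: float  # aligned residues / ORF length, capped at 1.0
    weak: bool             # True if either value falls below its threshold


def assess_hit(identity_field: str, orf_length_aa: int,
               min_identity: float = 0.40, min_coverage: float = 0.70) -> HitSupport:
    """Flag a blast-based annotation supported by low identity or a short alignment."""
    match = IDENTITY_FIELD.search(identity_field)
    if match is None:
        raise ValueError(f"unrecognized identity field: {identity_field!r}")
    identical, aligned = int(match.group(1)), int(match.group(2))
    identity = identical / aligned
    coverage = min(1.0, aligned / orf_length_aa)
    return HitSupport(identity, coverage, identity < min_identity or coverage < min_coverage)


print(assess_hit("44/113(38%)", 544))  # clone 543, ORF 1
print(assess_hit("55/109(50%)", 126))  # clone 543, ORF 8
```

With these thresholds, ORF 1 of clone 543 (38% identity over 113 of 544 residues) is flagged as weak, whereas ORF 8 (50% identity over 109 of 126 residues) is not, mirroring the reasoning in the text.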

Clone 1664 contained 19 ORFs and, compared with the other clones, had the highest percentage of hypothetical proteins (37%) and the highest percentage of ORFs without homologs (32%). Approximately 42% of the putative proteins encoded on clone 1664 exhibited closest similarity to homologs affiliated with the phylum Proteobacteria. Most matches were of low significance, that is, very low percentage sequence identity and alignments covering only part of the homolog (Table 2). Several matches were to predicted informational genes that could serve as phylogenetic markers, including ORF 10, which encoded a 180-residue protein with closest similarity to translation initiation factor IF-2 of a member of the phylum Cyanobacteria; however, the blastp statistics allowed only a poor prediction of putative function. Thus, any phylogenomic analysis attempting to predict function as well as taxonomic assignment would be inconclusive. The results suggest that the insert of clone 1664 may have originated from yet-to-be-cultured bacteria, which may represent novel lineages or known lineages for which no genome sequences are available in the databases.

Clone 578 was found to contain a 16S rRNA gene affiliated with members of the order Burkholderiales, class Betaproteobacteria. Its closest cultured relative was a yet-to-be-validated taxon, strain A1004 (99% similarity), and it was placed equidistantly from Roseateles depolymerans (DSM 11813T) and Rubrivivax gelatinosus (DSM 1709T), with 96% similarity to each. Further analysis of clone 578 showed that ORFs 1–6 were most similar to proteins affiliated with different genera of the order Burkholderiales, whereas ORFs 13–17, immediately after the 16S rRNA gene, were most similar to a region of the genome of Methylibium petroleiphilum (ATCC BAA-1232T) (positions 2394076–2404086; GenBank accession number CP000555), confirming that the clone 578 insert derives from a member of the order Burkholderiales. Interestingly, ORFs 7–12 were affiliated with members of three different phyla rather than with the order Burkholderiales. The mol% G+C content of ORFs 1–6 and 13–17 ranged from 62% to 72% (average 66%), whereas that of ORFs 7–12 ranged from 49% to 57% (average 52%). This difference suggests that ORFs 7–12 may have been acquired by horizontal gene transfer (HGT). Our results resemble those of Nesbø et al. (2005) in their metagenomic studies of anaerobic sediments, in which 57–96% of the predicted ORFs of each metagenomic fragment were similar to homologs from a particular phylum, whereas another fraction (7–44%) of the ORFs of the same fragment, with a different mol% G+C content, were not, presumably because the latter had been acquired by lateral gene transfer.
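
A G+C shift of this kind is straightforward to compute. The Python sketch below calculates mol% G+C for a nucleotide sequence and flags ORFs that differ from a baseline by at least a chosen margin. The 5-point margin and the helper names are hypothetical; the baseline of 66% is the average reported above for ORFs 1–6 and 13–17, and the per-ORF values are the five listed in Table 2 for ORFs 7–11 (no value is tabulated for ORF 12).

```python
def gc_percent(seq: str) -> float:
    """Mol% G+C of a nucleotide sequence, counting only unambiguous A, C, G and T."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_shifted_orfs(orf_gc: dict[int, float], baseline: float,
                    min_shift: float = 5.0) -> list[int]:
    """Return ORF numbers whose G+C differs from the baseline by at least min_shift points."""
    return sorted(orf for orf, gc in orf_gc.items() if abs(gc - baseline) >= min_shift)


# Clone 578, ORFs 7-11: G+C values from Table 2; baseline = reported flanking-ORF average.
tabulated_gc = {7: 52.2, 8: 53.3, 9: 49.6, 10: 50.7, 11: 57.3}
print(gc_shifted_orfs(tabulated_gc, baseline=66.0))  # [7, 8, 9, 10, 11]
print(round(gc_percent("ATGGCGCCGTAA"), 1))  # 58.3
```

A G+C shift on its own is only suggestive of HGT; in the analysis above it is the agreement between the shift and the unrelated best-hit affiliations of ORFs 7–12 that supports the inference.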

Description of a putative phenylacetyl-CoA catabolon (PhAc)

A total of 32 putative genes identified in the insert of clone 67 (Table 2) were affiliated with members of the phylum Proteobacteria. Of these, 17 genes (ORFs 9–25) were deemed to be involved in the degradation of phenylacetate and/or phenylacetyl-CoA via a β-oxidation-like pathway (the phenylacetyl-CoA catabolon, PhAc), indicating that some organisms in the community are capable of using phenylacetate as a carbon and energy source. The genes of the PhAc catabolon comprise several gene clusters organized into operons and have been identified in a few environmentally important aromatic-compound-degrading bacteria, such as E. coli (designated paa) (Ferrandez et al., 1997), Pseudomonas sp. Y2 (Velasco et al., 1998) and P. putida (designated pha) (Olivera et al., 1998). The arrangement of the operons differs somewhat among species, suggesting that gene and operon rearrangements may have occurred during the evolution of the catabolon.

The P. putida catabolon is the most studied, and the catabolon of clone 67 differs markedly from it in organization. The P. putida catabolon is an 18-kb gene cluster of 15 genes organized in five contiguous operons, namely paaABCEF, paaGHIJK, paaLMN, paaY and paaX. In contrast, the clone 67 catabolon is made up of 17 genes arranged in five functional units that form four putative operons (Fig. 5). The two transport genes found in P. putida, paaL (encoding a permease) and paaM (encoding a specific channel-forming protein for the uptake of phenylacetic acid), are missing from the clone 67 cluster. Instead, a cluster of five genes, livKHMGF, was identified; these genes are known to be involved in the transport of branched-chain amino acids and have also been implicated in phenylalanine accumulation in E. coli. It is therefore likely that the livKHMGF cluster has a role in the transport of branched-chain amino acids and/or phenylalanine. Interestingly, the same two genes that are missing from the clone 67 catabolon are also missing from a number of other PhAc catabolons (Fig. 5). It has been suggested that missing gene functions could be replaced by other similar genes involved in the β-oxidation pathway (Jiménez et al., 2002), so the role of paaL and paaM is likely undertaken by other genes. The presence of the livKHMGF ABC transporter system instead of a symporter (as in P. putida) could suggest that the concentration of phenylacetate in this environment is lower than in soil (from which Pseudomonas spp. were isolated), because ABC transporters generally provide high-affinity uptake suited to low substrate concentrations. This suggestion is plausible because, in soil, phenylacetate is generated not only from the anaerobic degradation of aromatic amino acids in proteins but also from the breakdown of lignin (Colberg & Young, 1985; Chefetz et al., 2002).
Whether phenylacetate is generated by anoxic degradation of aromatic amino acids within the cyanobacterial bloom community itself or is released from the sediment remains to be elucidated. In addition, a gene for a putative regulatory protein of the TetR family (paaR), rather than a paaX orthologue, has been found in Azoarcus evansii and Burkholderia pseudomallei (Betaproteobacteria) (Mohamed et al., 2002); it is similar to the TetR transcriptional regulator found upstream of the PhAc cluster of clone 67 (Table 2).
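
The operon names used above follow the usual bacterial shorthand, in which a lowercase locus prefix is followed by one capital letter per gene. As a quick check of the gene counts, the Python sketch below expands that shorthand; `expand_cluster` is a hypothetical helper whose regular expression assumes exactly a three-letter lowercase prefix followed by capital letters.

```python
import re

# Assumed shorthand: a three-letter lowercase prefix followed by one capital letter per gene.
CLUSTER_PATTERN = re.compile(r"^([a-z]{3})([A-Z]+)$")


def expand_cluster(shorthand: str) -> list[str]:
    """Expand gene-cluster shorthand such as 'paaABCEF' into individual gene names."""
    match = CLUSTER_PATTERN.match(shorthand)
    if match is None:
        raise ValueError(f"not a recognized gene-cluster shorthand: {shorthand!r}")
    prefix, letters = match.groups()
    return [prefix + letter for letter in letters]


# P. putida operons as listed in the text.
p_putida_operons = ["paaABCEF", "paaGHIJK", "paaLMN", "paaY", "paaX"]
p_putida_genes = [gene for operon in p_putida_operons for gene in expand_cluster(operon)]

print(len(p_putida_genes))  # 15
print(expand_cluster("livKHMGF"))  # ['livK', 'livH', 'livM', 'livG', 'livF']
```

Expanding the five P. putida operons listed in the text yields 15 genes, matching the stated cluster size, and livKHMGF expands to the five transport genes found in place of paaL and paaM.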


Figure 5. Organization of a putative phenylacetyl-CoA catabolon. The gene cluster encoding the phenylacetyl-CoA catabolic pathway (catabolon core) from clone 67 of the CBNPD1 library is shown in comparison with other catabolons. Genes are color coded by functional unit: (i) PaaL and PaaM, which code for transport proteins (yellow); (ii) PaaF, a PhAc-activating enzyme, also known as PhAc-CoA ligase (green); (iii) PaaG, PaaH, PaaI, PaaJ and PaaK, which code for a ring-hydroxylating complex (red); (iv) PaaN, which codes for a ring-opening protein (yellow); (v) a β-oxidation-like system (PaaA, PaaB, PaaC and PaaE) (green); (vi) PaaX (purple) and PaaY (blue), which code for regulatory proteins; and (vii) the LivKHMGF cluster, which is inferred to function in branched-chain amino acid transport, although it was previously found to be involved in phenylalanine accumulation in Escherichia coli.


Concluding remarks

Freshwater cyanobacterial blooms dominated by Aphanizomenon and Cylindrospermopsis species are reported to produce cylindrospermopsin, a potential cytotoxin and carcinogen (Harada et al., 1994), unlike Microcystis-dominated blooms, in which the hepatotoxin microcystin is produced (Carmichael et al., 1988). Despite these dangers, very little is known about the population composition and function of cyanobacterial bloom communities, or about the intricate relationships that may exist between the cyanobacteria and other heterotrophic bacteria. Here, we report the first culture-independent investigation of the metagenome of a toxic freshwater bloom dominated by Aphanizomenon and Cylindrospermopsis species, members of the phylum Cyanobacteria.

Our report provides only a small glimpse into this complex, toxin-producing freshwater cyanobacterial community. Our laboratory efforts are currently concentrated on isolating novel nonphotosynthetic bacteria, and 16 new strains, members of the phyla Actinobacteria and Proteobacteria, have been isolated so far. We are also investigating the possibility of sequencing the complete bloom metagenomic library, together with the genomes of novel isolates not yet represented in genome sequencing projects, in order to predict metabolic processes and gain further insight into the bloom community.



P.B.P. is a recipient of the Australian Postgraduate Award (APA) from Griffith University. We thank CRC for Water Quality and Treatment (CRCWQT) for their support, Daphne Lim for the construction and sequencing of the 16S rRNA gene library, and SEQWater Corporation staff for assistance in sample collection and access to the data on cyanobacterial counts.



Supporting Information


Table S1. BAC-end sequence analysis of cyanobacterial bloom metagenome clones.

This material is available as part of the online article.

Please note: Blackwell Publishing is not responsible for the content or functionality of any supplementary materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.

FEM_448_sm_TableS1.doc (142K): supporting information item
