Metagenomic analysis of viruses in reclaimed water


*E-mail; Tel. (+1) 727 553 3520; Fax (+1) 727 553 1189.


Reclaimed water use is an important component of sustainable water resource management. However, there are concerns regarding pathogen transport through this alternative water supply. This study characterized the viral community found in reclaimed water and compared it with viruses in potable water. Reclaimed water contained 1000-fold more virus-like particles than potable water, having approximately 10⁸ VLPs per millilitre. Metagenomic analyses revealed that most of the viruses in both reclaimed and potable water were novel. Bacteriophages dominated the DNA viral community in both reclaimed and potable water, but reclaimed water had a distinct phage community based on phage family distributions and host representation within each family. Eukaryotic viruses similar to plant pathogens and invertebrate picornaviruses dominated RNA metagenomic libraries. Established human pathogens were not detected in reclaimed water viral metagenomes, which contained a wealth of novel single-stranded DNA and RNA viruses related to plant, animal and insect viruses. Therefore, reclaimed water may play a role in the dissemination of highly stable viruses. Information regarding viruses present in reclaimed water but not in potable water can be used to identify new bioindicators of water quality. Future studies will need to investigate the infectivity and host range of these viruses to evaluate the impacts of reclaimed water use on human and ecosystem health.


Increasing urbanization on a global scale places enormous pressure on finite freshwater supplies. The use of alternative water supplies is therefore an important component of sustainable water resource management practices across the world (Levine and Asano, 2004). Reclaimed water (i.e. the reusable end-product of wastewater treatment) is an important alternative water supply since it reduces the discharge of wastewater effluent into surface waters and contributes to water conservation by supplying water for activities that do not require drinking water quality standards. For more than 20 years Florida has been at the forefront of water reuse efforts in the USA (Young and York, 1996). Local freshwater supplies are insufficient for supporting rapid population growth in Florida, which has led regulatory agencies to increase emphasis on beneficial water reuse, such as water reclamation (Overman and Pirozzoli, 1996). Reclaimed water is currently used in Florida for non-potable public water supply, agricultural irrigation, environmental enhancement, industrial uses and groundwater recharge (Florida Department of Environmental Protection, 2007).

Reclaimed water has successfully been used as an alternative water resource for decades (Young and York, 1996; Levine and Asano, 2004). Nevertheless, as water reuse applications increase and reclaimed water distribution expands, there are some concerns that need to be addressed to ensure protection of public health and the health of the environment. One of the biggest issues regarding reclaimed water use is pathogen transport. Since the microbiological content of this water supply is still largely unknown, it is difficult to assess which pathogens can potentially be spread through this alternative water supply. Viruses are a group of particular concern because they include highly stable pathogens that can be resistant to standard wastewater treatment processes. Although reclaimed water meets water quality standards, for practical reasons, current quality control methods do not test for the presence of pathogens directly (Salgot et al., 2001). The spread of viral pathogens through reclaimed water is a real possibility as several studies have detected enteric viruses in treated wastewater, including reoviruses, astroviruses, sapoviruses, rotaviruses, noroviruses, adenoviruses, hepatitis A viruses and enteroviruses (Morace et al., 2002; Sedmak et al., 2005; Bofill-Mas et al., 2006; Haramoto et al., 2006; 2008; Arraj et al., 2008; Katayama et al., 2008; Meleg et al., 2008). In spite of these findings, most of the microbiological research in treated wastewater has been directed towards bioindicator organisms, such as faecal coliforms, to indirectly reflect the presence of enteric bacteria and viruses. It has been shown that bacterial indicators, such as faecal coliforms, do not correlate with the occurrence of viral pathogens in wastewater (Harwood et al., 2005; Haramoto et al., 2006; Carducci et al., 2008).
These findings have led several scientists to propose a suite of bioindicator organisms, including bacteria and coliphage, as well as viral indicators, such as human adenoviruses and polyomaviruses, as a more sensitive tool to detect viral pathogens (Harwood et al., 2005; Bofill-Mas et al., 2006; McQuaig et al., 2006; Carducci et al., 2008).

Concerns regarding reclaimed water use mainly focus on human pathogens, since most of the water is derived from domestic (human) wastewater. However, reclaimed water may also be a reservoir for non-human pathogens that are present in human waste. For example, it has been shown that plant viruses dominate the RNA viral community in human faeces (Zhang et al., 2006). Other studies have identified animal rotavirus strains of unknown origin cocirculating with human strains in sewage and treated wastewater (Villena et al., 2003; Meleg et al., 2008). Recently, a metagenomic study of viruses in stool from South Asian children identified an abundance of novel picornaviruses related to the Enterovirus genus (i.e. cosaviruses) and four new viral species related to the Dicistroviridae, Nodaviridae, Circoviridae families and the Bocavirus genus (Kapoor et al., 2008; Victoria et al., 2009). Therefore, the diverse viral flora in human faeces may contain plant, insect and animal viruses, in addition to human pathogens.

It is critical to have a comprehensive understanding of viruses in reclaimed water as this alternative resource becomes more widely used. Information regarding viruses will help regulatory agencies to make informed decisions about reclaimed water use to minimize negative impacts upon human and environmental health. The main objective of this study was to examine the abundance and diversity of viruses found in reclaimed and potable water samples from south-west Florida, USA, through direct epifluorescent microscopy and metagenomic sequencing of purified viral particles from these water sources. Bacteriophages (phages) were abundant in both potable and reclaimed water; however, differences in phage community composition between these water supplies can be exploited to identify potential bioindicators of water quality. Reclaimed water also contained a wealth of novel viruses related to plant, animal and insect pathogens, suggesting that highly stable viruses can spread through the use of this alternative water supply.

Results and discussion

Abundance of virus-like particles

The concentration of virus-like particles (VLPs) in various traditional and alternative water supplies, including well, potable and reclaimed water, was determined through SYBR Gold staining and epifluorescent microscopy. Potable and well water had VLP concentrations on the order of 10⁵ and 10⁶ VLPs ml⁻¹ respectively (Fig. 1A). The VLP concentrations found in potable water are similar to those observed in other studies, which have found an abundance of bacteria (on the order of 10⁵ cells ml⁻¹) and VLPs (on the order of 10⁶–10⁷ VLPs ml⁻¹) using direct counts (Rinta-Kanto et al., 2004; Berney et al., 2008). Reclaimed water samples contained approximately 1000-fold more VLPs (on the order of 10⁸ VLPs ml⁻¹) than the conventional water supplies studied. In order to compare the abundance of VLPs in raw sewage, after treatment, and at the downstream point-of-use, several samples originating from a single wastewater treatment plant were collected. The average VLP concentrations for treated reclaimed water effluent and at the point-of-use were similar to the VLP concentrations in raw sewage (Fig. 1A). Examination of purified reclaimed water VLPs by transmission electron microscopy (TEM) showed an abundance of viruses resembling known phages and plant pathogens (Fig. 1B).

Figure 1.

A. Epifluorescent microscopy counts of virus-like particles (VLPs) found in conventional water supplies (well and potable water), raw sewage and reclaimed water (RW).
B. Transmission electron micrographs of virus-like particles found in reclaimed water (the bar in each panel is 100 nm). Viral counts include well water samples collected from private shallow wells (n = 5); potable water samples collected at a plant nursery (n = 3); raw sewage and RW effluent samples collected at a wastewater treatment plant (n = 3 for each); and RW samples at the point-of-use obtained from different public sprinklers and a plant nursery (n = 8). Error bars represent one standard deviation.

Although average VLP concentrations were similar in raw sewage and reclaimed water (∼10⁸ VLPs ml⁻¹), this should not be considered an indication of treatment efficiency. Virus-like particles counted with epifluorescent microscopy are not necessarily infectious virus particles. The infectivity of the virus particles will rely heavily upon the type of wastewater treatment. This study focused on reclaimed water from a treatment plant using secondary treatment (activated-sludge) with chlorine disinfection, which is typical of wastewater reclamation facilities in Florida (Florida Department of Environmental Protection, 2007). In addition, high VLP concentrations in reclaimed water samples may reflect the abundance of phages in treated effluent. A study investigating bacteriophage populations in an activated sludge system found that there was a net production of phages in the system, with total phage concentrations in the supernatant of an activated-sludge reactor and the unchlorinated effluent measuring as high or higher than those in sewage entering the reactor (Ewert and Paynter, 1980). The abundance of phages in reclaimed water was also supported by TEM of viral concentrates, where an abundance of phage-like particles was observed compared with other VLPs (examples of VLPs shown in Fig. 1B) and by the dominance of phage-like sequences in the reclaimed water DNA viral metagenomes (see below).
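The order-of-magnitude comparison reported in this section can be checked with a few lines of arithmetic. The sketch below uses only the rounded orders of magnitude quoted in the text (not the measured means behind Fig. 1A), and the function and variable names are illustrative.

```python
import math

# Order-of-magnitude VLP concentrations (VLPs per ml) quoted in the text.
# These are rounded values, not the measured means shown in Fig. 1A.
VLP_PER_ML = {
    "potable": 1e5,
    "well": 1e6,
    "reclaimed": 1e8,
}


def fold_difference(sample: str, reference: str) -> float:
    """Return how many times more VLPs `sample` contains than `reference`."""
    return VLP_PER_ML[sample] / VLP_PER_ML[reference]


def log10_difference(sample: str, reference: str) -> float:
    """Return the difference between two samples in orders of magnitude."""
    return math.log10(VLP_PER_ML[sample]) - math.log10(VLP_PER_ML[reference])


if __name__ == "__main__":
    for reference in ("potable", "well"):
        print(
            reference,
            fold_difference("reclaimed", reference),
            log10_difference("reclaimed", reference),
        )
```

With these rounded inputs, reclaimed water sits three orders of magnitude above potable water, which corresponds to the 1000-fold difference stated above.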

Overview of metagenomic analyses

This study utilized a metagenomic sequencing approach to examine the viruses present in reclaimed water. The advantage of this method is that it surveys the complete viral community, without selection based on host or sequence similarity to known viruses. DNA viral metagenomes were obtained from a potable water sample (‘Potable’), and reclaimed water samples at the point-of-discharge (‘Effluent’) and the point-of-use (‘Nursery’ and ‘Park’) (Table 1). Corresponding RNA viral metagenomes were sequenced from reclaimed water samples at the point-of-discharge (‘Effluent’) and the point-of-use (‘Nursery’) (Table 1). It was not possible to obtain enough RNA from the potable water sample for pyrosequencing and, thus, a comparison between reclaimed and potable water RNA viral communities could not be achieved.

Table 1. List of samples used to construct metagenomic libraries and overview of the total number of sequences and contigs for each library.

| Sample | Date of collection | Volume | Description | Library | No. of raw reads | No. of contigs | Raw reads in contigs (%) |
|---|---|---|---|---|---|---|---|
| Park | 9/7/2006 | 50 l | Reclaimed water collected from a public sprinkler | DNA | 307 069 | 8 327 | 67.60 |
| Effluent | 5/10/2007 | 100 l | Reclaimed water collected at a treatment plant | DNA | 260 304 | 26 729 | 78.40 |
| Effluent | 5/10/2007 | 100 l | Reclaimed water collected at a treatment plant | RNA | 232 929 | 5 392 | 96.70 |
| Nursery | 5/10/2007 | 100 l | Reclaimed water collected at a plant nursery | DNA | 281 542 | 27 372 | 76.60 |
| Nursery | 5/10/2007 | 100 l | Reclaimed water collected at a plant nursery | RNA | 287 014 | 13 887 | 92.00 |
| Potable | 5/10/2007 | 100 l | Potable water collected at a plant nursery | DNA | 231 715 | 4 923 | 95.60 |

Note: All samples were collected in Manatee County, Bradenton, FL. Samples collected at the ‘point-of-use’ (Park and Nursery) receive reclaimed water from the wastewater treatment plant where ‘point-of-discharge’ samples (i.e. Effluent) were collected. The number of contigs includes all contigs analysed by blastx (i.e. contigs larger than 200 bases).

All DNA and RNA reclaimed water libraries were analysed individually. However, there were no notable differences in the distribution of phages and viral types between the Effluent and the point-of-use, for both DNA and RNA libraries, suggesting that the viral community composition does not change significantly in the distribution system. Therefore, the sequence distributions of all reclaimed water DNA libraries (Effluent, Park and Nursery) were averaged together and are shown as ‘Reclaimed DNA’ in the results. The Effluent and Nursery RNA library distributions were averaged together as well, and are referred to as ‘Reclaimed RNA’.
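A minimal sketch of how per-library distributions might be pooled into a single ‘Reclaimed DNA’ profile is given below. It assumes a simple unweighted mean of per-category percentages, which the text does not specify, and the percentages in the example are hypothetical placeholders rather than values from this study.

```python
from statistics import mean


def average_distribution(libraries: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average per-category percentages across libraries (unweighted mean).

    A category that is absent from a library is counted as 0% for that library.
    """
    categories = sorted({cat for dist in libraries.values() for cat in dist})
    return {
        cat: mean(dist.get(cat, 0.0) for dist in libraries.values())
        for cat in categories
    }


# Hypothetical percentages for illustration only (not data from the study).
example_libraries = {
    "Effluent": {"Siphoviridae": 30.0, "Myoviridae": 20.0, "prophage": 50.0},
    "Park": {"Siphoviridae": 40.0, "Myoviridae": 10.0, "prophage": 50.0},
    "Nursery": {"Siphoviridae": 35.0, "Myoviridae": 15.0, "prophage": 50.0},
}

reclaimed_dna = average_distribution(example_libraries)
print(reclaimed_dna)
```

An unweighted mean gives each library equal influence regardless of its number of contigs; weighting by contig count would be an alternative design if library sizes differed greatly.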

For metagenomic sequence analysis, more than 230 000 raw reads for each library were used for contig assembly in the SeqMan program (DNASTAR) (Table 1). The average raw sequence length for Effluent, Nursery and Potable libraries was 245 nt while the average length for the Park library was 136 nt. The majority (68–97%) of the sequences in each metagenome assembled into contigs larger than 200 nt (Table 1). Contigs larger than 200 nt were utilized for blastx searches against the GenBank non-redundant (nr) protein database since longer read length increases the chances of identifying homologies in the database (Wommack et al., 2008).
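The length filter and the ‘raw reads in contigs’ percentage described above can be sketched as follows. The contig names, lengths and read counts are hypothetical, and the helper functions are illustrative rather than part of the SeqMan workflow.

```python
MIN_CONTIG_LENGTH = 200  # nt; only contigs larger than this were searched with blastx


def select_contigs(contig_lengths: dict[str, int],
                   min_length: int = MIN_CONTIG_LENGTH) -> list[str]:
    """Return the names of contigs strictly longer than `min_length`."""
    return [name for name, length in contig_lengths.items() if length > min_length]


def percent_reads_in_contigs(reads_in_contigs: int, total_reads: int) -> float:
    """Return the percentage of raw reads that were assembled into contigs."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * reads_in_contigs / total_reads


# Hypothetical assembly output for illustration only.
example_lengths = {"contig_1": 150, "contig_2": 200, "contig_3": 201, "contig_4": 1200}

print(select_contigs(example_lengths))
print(percent_reads_in_contigs(750, 1000))
```

Note that the comparison is strict, so a contig of exactly 200 nt is excluded, matching the ‘larger than 200 nt’ wording.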

Novel sequences

Summaries of the blastx similarities for the reclaimed and potable water viral metagenomes are shown in Fig. 2. Over 50% of the viral metagenomic sequences (both DNA and RNA) identified in reclaimed water metagenomes had no significant similarity to proteins in GenBank, suggesting the novelty of viruses in this alternative water source (Fig. 2A). This is similar to previous studies, in which the majority of the sequences in environmental viral metagenomes had no similarities to known genes in the database, indicating the high proportion of unknown viruses in the environment (Breitbart et al., 2002; Breitbart et al., 2004; Angly et al., 2006; Culley et al., 2006; Bench et al., 2007). The Potable library had a higher proportion of sequences (56%) with significant similarities to sequences in GenBank compared with reclaimed water libraries, but a large fraction of the viral community still could not be identified.

Figure 2.

Overview of the average distribution of contigs larger than 200 nt for potable and reclaimed water metagenomic libraries based on blastx analysis (E-value < 0.001).
A. Percentage of contigs that had homologues in the GenBank protein database (‘assigned’) versus contigs for which no homologues were found (‘no hits’).
B. Distribution of ‘assigned’ contigs according to their top blastx homologues in GenBank and ACLAME databases.
C. Host distribution of top viral homologues in GenBank and ACLAME. For reclaimed water DNA libraries eukaryotic hosts included animals, insects, algae and protists. For RNA libraries all viruses identified as prokaryotic viruses had hits to DNA phage. The environmental category refers to contigs with similarities to marine viruses believed to infect protists.

Mobile genetic elements

All blastx results with an E-value of < 0.001 were analysed using the MEGAN software to identify the different taxa present in reclaimed and potable water viral communities. Since viral particles were purified extensively prior to isolating DNA and RNA, contigs with their best matches to proteins from cellular organisms (i.e. bacteria, archaea and eukaryotes) were further compared against the ACLAME database to identify proteins related to mobile genetic elements (i.e. plasmids and phages). For all metagenomes, more than 60% of the sequences that were classified as bacteria were re-classified as plasmids or phages after performing a blastx analysis against the ACLAME database. Although sequences similar to eukaryotes and archaea were not as abundant as the bacterial sequences, 15–56% of eukaryotic and 40–70% of archaeal sequences were also re-classified in the same manner. The discrepancy between the results obtained from searching the GenBank versus ACLAME databases may be due to an abundance of unidentified prophage-like sequences within microbial genomes in GenBank (Fouts, 2006). Similarities to mobile genetic elements are common in previously sequenced DNA viral metagenomes from other sources (Breitbart et al., 2002; 2003; 2004; Bench et al., 2007; Kim et al., 2008).
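The two-step classification described above (GenBank top hit first, then an ACLAME re-check for contigs assigned to cellular organisms) can be illustrated with a small sketch. The record layout, category labels and the use of the same E-value cut-off for the ACLAME search are assumptions made for illustration; this is not the MEGAN or ACLAME interface.

```python
from dataclasses import dataclass
from typing import Optional

E_VALUE_CUTOFF = 0.001  # significance threshold quoted in the text
CELLULAR = {"bacteria", "archaea", "eukaryote"}


@dataclass
class Contig:
    """Hypothetical summary of the top blastx hits for one contig."""
    name: str
    genbank_category: Optional[str]  # e.g. 'virus', 'bacteria'; None if no hit
    genbank_evalue: Optional[float]
    aclame_category: Optional[str]   # 'plasmid' or 'phage'; None if no hit
    aclame_evalue: Optional[float]


def classify(contig: Contig) -> str:
    """Return the final category for a contig after the ACLAME re-check."""
    if (contig.genbank_category is None or contig.genbank_evalue is None
            or contig.genbank_evalue >= E_VALUE_CUTOFF):
        return "no hit"
    if (contig.genbank_category in CELLULAR
            and contig.aclame_category is not None
            and contig.aclame_evalue is not None
            and contig.aclame_evalue < E_VALUE_CUTOFF):
        # Cellular top hit, but a significant match to a mobile genetic element.
        return contig.aclame_category
    return contig.genbank_category


# Hypothetical contigs for illustration only.
examples = [
    Contig("c1", "bacteria", 1e-20, "phage", 1e-15),
    Contig("c2", "bacteria", 1e-10, None, None),
    Contig("c3", "virus", 1e-30, None, None),
    Contig("c4", "eukaryote", 0.5, "plasmid", 1e-8),
]
for contig in examples:
    print(contig.name, classify(contig))
```

In this sketch a contig whose GenBank hit fails the cut-off is reported as ‘no hit’ even if an ACLAME match exists, because the text describes the ACLAME comparison only for contigs already assigned to cellular organisms.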

Hits to mobile genetic elements dominated all the viral metagenomes (Fig. 2B). For reclaimed water DNA libraries, 51% of the known sequences were similar to viral proteins, whereas the majority of the sequences in the reclaimed water RNA libraries and potable water DNA library were similar to proteins found in plasmids. More than 99% of plasmid-like protein sequences were identified after re-analysing sequences with hits to cellular organisms through the ACLAME database. Although most of the sequences identified were similar to hypothetical proteins, numerous integrases, transposases, recombinases and replication-associated proteins, among others, were identified. Since viral particles were purified by CsCl gradients and DNase treatment to remove contaminating cells and free DNA, plasmids should have been eliminated before viral DNA and RNA isolation. However, phages and plasmids share a number of characteristics as both contain machinery for gene transfer and replication. Moreover, there may be genetic exchange between plasmids, phages and other mobile genetic elements within a bacterial host (Boltner et al., 2002; Mark Osborn and Böltner, 2002), and this exchange may lead to gene organization and protein similarities between plasmids and phages (Hazen et al., 2007). In addition, some prophages (e.g. pKO2, PY4, P1, N15, LE1, φ20 and φBB-1) replicate in their hosts as low-copy-number plasmids instead of integrating into the host genomes (Ikeda and Tomizawa, 1968; Inal and Karunakaran, 1996; Eggers et al., 2000; Girons et al., 2000; Ravin et al., 2000; Briani et al., 2001; Casjens et al., 2004). It is possible that there is an abundance of previously undescribed phages containing plasmid-like proteins in potable water as 52% of the identified sequences had similarities to plasmid proteins (Fig. 2B). 
The abundance of plasmid-like sequences in reclaimed water RNA libraries may reflect the abundance of novel RNA viruses with plasmid-like properties such as the endornaviruses. Currently, four species of dsRNA viruses with plasmid-like properties found in some rice and bean species have been classified by the International Committee on Taxonomy of Viruses as members of the Endornavirus genus (Gibbs et al., 2005). In addition, double-stranded RNA viruses with plasmid-like properties have been found in plants, algae, fungi, protozoa and insects and, thus, endorna-like viruses may be widely distributed among eukaryotes (Horiuchi and Fukuhara, 2004; Fukuhara et al., 2006; Osaki et al., 2006).

Comparison of DNA and RNA viral sequences

The fraction of metagenomic sequences with similarities to known viral proteins suggests there are fundamental differences between the DNA and RNA viral communities in reclaimed water (Fig. 2C). The potable and reclaimed water DNA viral communities were dominated by phages (more than 98% of the contigs), with only one contig similar to an archaeal virus protein. In contrast, the RNA viral community in reclaimed water was dominated by eukaryotic viruses. Similar results have been found in viral metagenomic studies of human faeces where the DNA community was dominated by phages (Breitbart et al., 2003), whereas the RNA community was dominated by eukaryotic viruses (Zhang et al., 2006; Finkbeiner et al., 2008; Kapoor et al., 2008). The same trend has also been observed in marine viral metagenomes (Breitbart et al., 2002; Angly et al., 2006; Culley et al., 2006; Bench et al., 2007). This trend may represent an important fundamental difference between these viral types that persists across systems; however, it may simply reflect the under-representation of RNA phage genomes in the database. To date, there are only 11 complete RNA phage genomes in GenBank. The small number of RNA phages in the database compared with DNA phages may lead to an underestimation of RNA phages identified through metagenomic surveys of natural populations in different environments.


Since the DNA viral community was dominated by phages, DNA metagenomic sequences were analysed against the Phage Sequence Databank, containing the complete genomes of 512 phages and prophages. Although phages were abundant in DNA viral metagenomes from both potable water and reclaimed water, there were distinct dominant phage types identified in each of these water sources based on the distribution of top phage homologues identified in the Phage Sequence Databank. The phages in potable water were dominated by prophages while reclaimed water libraries had a similar proportion of hits to phages in the Siphoviridae family and prophages (Fig. 3A). The abundance of phages belonging to the Siphoviridae family in reclaimed water is consistent with a viral metagenomic study performed in human faeces (Breitbart et al., 2003) and suggests that members of the Siphoviridae are potential indicators of faecal pollution. In addition, the prevalence of Siphoviridae in reclaimed water suggests that members of this viral family are resistant to chlorination. This is consistent with previous studies, which demonstrated that some phages from the Siphoviridae family are more resistant to chlorination than male-specific phages (i.e. MS2) and some members of the Microviridae (i.e. phiX174) and Myoviridae (i.e. MY2) families (Duran et al., 2003). Together, these findings suggest that members of the Siphoviridae family could be further explored as potential bioindicators as they represent an abundant group of viruses in human sewage that are fairly resistant to wastewater treatment (chlorination).

Figure 3.

Distribution of phage families (A) identified through tblastx analysis of DNA contigs against the Phage Sequence Databank and host representation within prophage (B), Siphoviridae (C), Myoviridae (D) and Podoviridae (E).

For all sequences that had significant similarities to the Phage Sequence Databank, the host for the top phage hit was examined. Differences in host representation within each phage family indicate that reclaimed water has a distinct phage community when compared with potable water (Fig. 3). Although prophages were abundant in both potable and reclaimed water libraries, the host distribution was quite different. The Potable library was dominated by hits to prophages identified in different strains of Escherichia coli (∼12% of total phage hits), whereas prophages in reclaimed water libraries had a more even distribution of hosts including, but not limited to, Xylella spp., Ralstonia spp., Streptococcus spp. and Pseudomonas spp. (less than 4% of the total phage hits for each) (Fig. 3B). Differences in host distributions were also noted within the Myoviridae (Fig. 3D) and Podoviridae (Fig. 3E) families.

Total somatic coliphages (viruses that infect E. coli) and male-specific (F+) RNA coliphages have been used as viral indicators of faecal contamination (Griffin et al., 2000; United States Environmental Protection Agency, 2001; Cole et al., 2003; Harwood et al., 2005). However, the results presented here suggest that coliphages are not appropriate for this purpose. DNA coliphages were found in potable and reclaimed water, indicating that their presence does not reflect water quality. In addition, the metagenomic data gathered during this study demonstrate that coliphages were not the most abundant phages in reclaimed water (Fig. 3). Phages infecting Salmonella spp. were more abundant in reclaimed water (∼7%) than coliphages (∼5%) and their abundance in potable water was not as high (∼4%) as coliphages (∼12%). The most abundant host in reclaimed water libraries was Burkholderia spp. (∼9.5%), while phages infecting this host were less abundant in potable water (∼2.5%). Therefore, phages that infect hosts other than E. coli should be explored as potential bioindicators of water quality.
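The host comparison in this paragraph can be made explicit with a short sketch that ranks hosts by how much more common their phages are in reclaimed than in potable water. The percentages are the approximate values quoted in the text; the reclaimed-to-potable ratio is an illustrative metric chosen here, not an analysis performed in the study.

```python
# Approximate share of total phage hits (%) per host, as quoted in the text.
HOST_SHARE = {
    "Escherichia coli": {"reclaimed": 5.0, "potable": 12.0},
    "Salmonella spp.": {"reclaimed": 7.0, "potable": 4.0},
    "Burkholderia spp.": {"reclaimed": 9.5, "potable": 2.5},
}


def enrichment(host: str) -> float:
    """Return the reclaimed-to-potable ratio of phage hits for a host."""
    share = HOST_SHARE[host]
    return share["reclaimed"] / share["potable"]


def rank_candidates() -> list[tuple[str, float]]:
    """Rank hosts from most to least enriched in reclaimed water."""
    return sorted(
        ((host, enrichment(host)) for host in HOST_SHARE),
        key=lambda item: item[1],
        reverse=True,
    )


for host, ratio in rank_candidates():
    print(f"{host}: {ratio:.2f}")
```

A ratio above 1 marks a host whose phages are relatively more common in reclaimed water; under this metric coliphages fall below 1, in line with the argument that they are poor discriminators between the two water types.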

Eukaryotic DNA viruses

The role of reclaimed water in the dissemination of human, animal and plant viral pathogens is currently unknown. Therefore, one of the goals of this study was to examine the prevalence of different eukaryotic viral groups in reclaimed water. The types of eukaryotic viruses found in reclaimed water metagenomic libraries, based on their best blastx homologies in GenBank, are summarized in Tables 2 and 3. No eukaryotic viruses were detected among the sequences from potable water (Fig. 2C).

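Table 2. Plant viruses identified in reclaimed water metagenomic libraries.

RNA libraries

| Plant virus homologue | Taxonomic classification | Amino acid identity range (%) | Library |
|---|---|---|---|
| Cherry rasp leaf virus | Comoviridae family | 37 | Effluent |
| Tobacco etch virus | Potyviridae family | 33 | Nursery |
| Maize chlorotic dwarf virus | Sequiviridae family | 24–35 | Effluent, Nursery |
| Parsnip yellow fleck virus | Sequiviridae family | 32 | Nursery |
| Rice tungro spherical virus | Sequiviridae family | 26–38 | Effluent, Nursery |
| Paprika mild mottle virus | Tobamovirus genus | 77–98 | Effluent, Nursery |
| Pepper mild mottle virus | Tobamovirus genus | 80–100 | Effluent, Nursery |
| Tobacco mild green mosaic virus | Tobamovirus genus | 97–100 | Effluent |
| Tobacco mosaic virus | Tobamovirus genus | 89–100 | Effluent, Nursery |
| Tomato mosaic virus | Tobamovirus genus | 87–100 | Effluent, Nursery |
| Cardamine chlorotic fleck virus | Tombusviridae family | 57 | Nursery |
| Maize chlorotic mottle virus | Tombusviridae family | 53 | Nursery |
| Melon necrotic spot virus | Tombusviridae family | 34–99 | Effluent, Nursery |
| Pelargonium flower break virus | Tombusviridae family | 61 | Nursery |
| Oat dwarf virus | Reoviridae family | 45 | Nursery* |
| Pea stem necrosis virus | Unclassified | 60 | Nursery |

DNA libraries

| Plant virus homologue | Taxonomic classification | Amino acid identity range (%) | Library |
|---|---|---|---|
| African cassava mosaic virus-Uganda Mild | Geminiviridae family | 30–36 | Nursery |
| Ageratum yellow vein virus-associated DNA 1 | Geminiviridae family | 33–34 | Nursery |
| Alternanthera yellow vein virus | Geminiviridae family | 45 | Nursery |
| Bean leaf curl Madagascar virus | Geminiviridae family | 44 | Nursery |
| Bean yellow dwarf virus | Geminiviridae family | 24 | Nursery |
| Beet curly top Iran virus | Geminiviridae family | 44 | Nursery |
| Beet curly top virus | Geminiviridae family | 35 | Nursery |
| Beet mild curly top virus | Geminiviridae family | 26 | Nursery |
| Chickpea chlorotic dwarf Sudan virus | Geminiviridae family | 30–40 | Nursery, Park |
| Chili leaf curl Pakistan virus | Geminiviridae family | 43 | Nursery |
| Chloris striate mosaic virus | Geminiviridae family | 30–39 | Effluent*, Nursery |
| Clerodendron yellow mosaic virus | Geminiviridae family | 32 | Nursery |
| Cotton leaf curl Alabad virus | Geminiviridae family | 33 | Nursery |
| Cotton leaf curl Multan virus | Geminiviridae family | 33 | Nursery |
| East African cassava mosaic virus (Uganda variant) | Geminiviridae family | 45 | Nursery |
| Emilia yellow vein virus | Geminiviridae family | 24 | Nursery |
| Eupatorium yellow vein virus | Geminiviridae family | 37 | Nursery* |
| Maize streak virus | Geminiviridae family | 22 | Nursery* |
| Rhynchosia golden mosaic virus | Geminiviridae family | 50 | Nursery |
| Squash leaf curl virus | Geminiviridae family | 29 | Nursery |
| Squash leaf curl Yunnan virus | Geminiviridae family | 31–36 | Nursery |
| Tobacco yellow dwarf virus | Geminiviridae family | 38–40 | Nursery* |
| Banana bunchy top virus | Nanoviridae family | 28–56 | Effluent, Nursery* |
| Coconut foliar decay virus | Nanoviridae family | 30–42 | Effluent |
| Faba bean necrotic yellows virus | Nanoviridae family | 33–52 | Nursery |
| Milk vetch dwarf virus | Nanoviridae family | 52 | Nursery |
| Nanovirus-like particle | Nanoviridae family | 28–45 | Nursery* |
| Subterranean clover stunt virus | Nanoviridae family | 38–43 | Nursery |

Note: The identity range refers to the amino acid level identity between contigs and their top homologue in the nr protein database. *Sequences similar to DNA viruses that were found in RNA libraries and vice versa.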
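Table 3. Eukaryotic viruses, other than plant viruses, identified in reclaimed water metagenomic libraries.

RNA libraries

| Virus host | Virus homologue | Amino acid identity range (%) | Library |
|---|---|---|---|
| Human | Rhinovirus sp. | 24–36 | Effluent, Nursery |
| Human | Enterovirus sp. | 28–36 | Effluent, Nursery |
| Human | Parechovirus | 32–35 | Effluent, Nursery |
| Human | Aichi virus | 34 | Nursery |
| Animal | Porcine enterovirus sp. | 25 | Effluent |
| Animal | Bovine enterovirus | 25 | Effluent |
| Animal | Simian enterovirus sp. | 26–30 | Nursery |
| Animal | Seneca valley virus | 32 | Effluent |
| Animal | Ljungan virus | 25 | Effluent |
| Animal | European brown hare syndrome virus | 29 | Effluent, Nursery |
| Animal | Hepatitis C virus subtype 1a | 29 | Effluent* |
| Invertebrates | Taura syndrome virus | 25–37 | Effluent, Nursery |
| Invertebrates | Cricket paralysis virus | 28–56 | Effluent, Nursery |
| Invertebrates | Drosophila C virus | 27–35 | Effluent, Nursery |
| Invertebrates | Plautia stali intestine virus | 26–51 | Effluent, Nursery |
| Invertebrates | Rhopalosiphum padi virus | 34 | Nursery |
| Invertebrates | Himetobi P virus | 25–42 | Effluent, Nursery |
| Invertebrates | Triatoma virus | 22–42 | Effluent, Nursery |
| Invertebrates | Aphid lethal paralysis virus | 27–36 | Effluent, Nursery |
| Invertebrates | Kashmir bee virus | 38–43 | Effluent, Nursery |
| Invertebrates | Acute bee paralysis virus | 32–43 | Effluent, Nursery |
| Invertebrates | Israel acute paralysis virus of bees | 30–74 | Effluent, Nursery |
| Invertebrates | Solenopsis invicta virus 1 | 25–48 | Effluent, Nursery |
| Invertebrates | Homalodisca coagulata virus-1 | 24–54 | Effluent, Nursery |
| Invertebrates | Honey bee slow paralysis virus | 27–31 | Effluent, Nursery |
| Invertebrates | Nora virus | 28–54 | Effluent, Nursery |
| Invertebrates | Sacbrood virus | 31 | Nursery |
| Invertebrates | Acheta domesticus virus | 37 | Effluent |
| Other eukaryotes (algae, diatoms, fungi) | Heterosigma akashiwo RNA virus | 27–51 | Effluent, Nursery |
| Other eukaryotes (algae, diatoms, fungi) | Schizochytrium single-stranded RNA virus | 31–50 | Effluent, Nursery |
| Other eukaryotes (algae, diatoms, fungi) | Rhizosolenia setigera RNA virus | 24–57 | Effluent, Nursery |
| Other eukaryotes (algae, diatoms, fungi) | Sclerophthora macrospora virus A | 36–37 | Effluent*, Nursery* |

DNA libraries

| Virus host | Virus homologue | Amino acid identity range (%) | Library |
|---|---|---|---|
| Animal | Bird circovirus sp. | 27–65 | Effluent*, Nursery*, Park |
| Animal | Swine circovirus sp. | 25–50 | Effluent*, Nursery*, Park |
| Animal | Canarypox virus | 31–38 | Effluent, Nursery |
| Animal | Crocodilepox virus | 51–57 | Effluent, Nursery |
| Animal | Cercopithecine herpesvirus sp. | 31–33 | Effluent |
| Animal | Lymphocystis disease virus 1 | 30 | Effluent |
| Animal | Rock bream iridovirus | 31 | Effluent |
| Animal | Equine herpesvirus sp. | 26 | Nursery |
| Invertebrates | Epiphyas postvittana NPV | 60 | Effluent |
| Invertebrates | Costelytra zealandica iridescent virus | 42 | Effluent |
| Invertebrates | Invertebrate iridescent virus 6 | 31–36 | Nursery |
| Invertebrates | Melanoplus sanguinipes entomopoxvirus | 51–61 | Park |
| Other eukaryotes (algae, protists) | Paramecium bursaria Chlorella virus sp. | 26–52 | Effluent, Nursery |
| Other eukaryotes (algae, protists) | Acanthocystis turfacea Chlorella virus 1 | 35–51 | Effluent, Nursery |
| Other eukaryotes (algae, protists) | Ostreococcus virus OsV5 | 32–45 | Effluent, Nursery |
| Other eukaryotes (algae, protists) | Emiliania huxleyi virus 86 | 39 | Nursery |
| Other eukaryotes (algae, protists) | Acanthamoeba polyphaga mimivirus | 30–41 | Effluent, Nursery |

Note: The identity range refers to the amino acid level identity between contigs and their top homologue in the nr protein database. *Sequences similar to DNA viruses that were found in RNA libraries and vice versa.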

Eukaryotic viral sequences in the DNA libraries were dominated by viruses containing single-stranded DNA (ssDNA) circular genomes, including plant pathogens from the Geminiviridae and Nanoviridae families (Tables 2 and 3) and animal pathogens from the Circoviridae family (Table 3) (Rosario et al., 2009). It is important to note that the amino acid identities to known viral proteins were less than 60%, suggesting that these are novel viruses with weak similarities to known ssDNA viruses. The identification of ssDNA viruses in reclaimed water suggests that they may be resistant to chlorination. This is consistent with the known resistance of small ssDNA viruses to wastewater treatment (Nwachuku and Gerba, 2004) and suggests the presence of these viruses in treated effluent should be further explored.
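The novelty argument above rests on a simple identity cut-off, which can be sketched as follows. The example values are the upper ends of the identity ranges listed in Table 2 for a few Geminiviridae and Nanoviridae homologues; treating 60% as a hard threshold is an illustrative simplification, not a formal criterion from the study.

```python
NOVELTY_IDENTITY_CUTOFF = 60.0  # % amino acid identity, as quoted in the text

# Highest reported amino acid identity (%) for selected ssDNA plant virus
# homologues, taken from the identity ranges in Table 2.
SSDNA_TOP_HITS = [
    ("Maize streak virus", 22.0),
    ("Rhynchosia golden mosaic virus", 50.0),
    ("Milk vetch dwarf virus", 52.0),
    ("Banana bunchy top virus", 56.0),
]


def is_novel(identity_percent: float,
             cutoff: float = NOVELTY_IDENTITY_CUTOFF) -> bool:
    """Flag a contig as likely novel when its top-hit identity is below the cut-off."""
    return identity_percent < cutoff


for name, identity in SSDNA_TOP_HITS:
    label = "likely novel" if is_novel(identity) else "close to known virus"
    print(f"{name}: {identity:.0f}% -> {label}")
```

All four homologues fall below the cut-off, whereas a contig with near-complete identity to a database protein would not be flagged.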

Circular ssDNA viruses are likely overrepresented in the libraries due to the multiple displacement amplification (MDA) step utilized before pyrosequencing. MDA has been shown to selectively amplify circular single-stranded genomes by 2 or 3 orders of magnitude compared with other DNA types in a mixed community (Kim et al., 2008). Despite this enrichment, this study demonstrates that circular ssDNA viruses can be disseminated through reclaimed water. To date, most of the research regarding circular ssDNA viruses has focused on pathogens infecting agriculturally important crops and animals (e.g. Todd, 2000; Seal et al., 2006). However, a recent metagenomic study used MDA to identify novel circular ssDNA viral genomes in a rice paddy soil sample (Kim et al., 2008), suggesting they are more abundant in the environment than previously recognized.

Eukaryotic RNA viruses

All the viral-like sequences in the RNA viral metagenomes were similar to positive-sense RNA eukaryotic viruses (Tables 2 and 3). Sequences similar to proteins from the proposed Picornavirales order (Le Gall et al., 2008) dominated the libraries, suggesting an unprecedented abundance of novel picorna-like viruses in reclaimed water. Some sequences were similar to picornaviruses that infect vertebrates, including members of the Enterovirus, Parechovirus, Rhinovirus and Kobuvirus genera, which have been associated with diseases in humans and animals (e.g. Yamashita et al., 1995; 2001; Rotbart, 2002; Yamada et al., 2004; Benschop et al., 2008). However, this does not necessarily mean that there are viruses in reclaimed water that cause human disease, since amino acid identities to known human pathogens were very low (< 36%). Furthermore, the majority of the picorna-like viruses were similar to viruses that infect invertebrates. Numerous sequences were similar to marine picorna-like viruses, Marine JP-A and JP-B, believed to infect protists based on phylogeny and genome characteristics (Culley et al., 2007). These sequences were classified in the environmental category (Fig. 2C) because the Marine JP-A and JP-B genomes were assembled from marine metagenomic libraries and a definitive host has not been identified (Culley et al., 2006).

Reclaimed water RNA libraries also contained sequences similar to insect and arthropod picornaviruses from the family Dicistroviridae and an insect virus (i.e. Nora virus) that may belong to a new picorna-like family (Habayeb et al., 2006). A recent study of stool from South Asian children also identified an abundance of sequences related to picornaviruses of invertebrates, including members of the Dicistroviridae and Nodaviridae families (Victoria et al., 2009). However, these types of viruses have not been reported in human faeces from the USA population and, thus, the source of novel picorna-like viruses found in reclaimed water remains to be determined. Future studies need to investigate the relationship between picorna-like viruses in reclaimed water and viruses from known hosts to determine if the reclaimed water viruses belong to known picornavirus families and if virus hosts can be inferred based on phylogenetic analysis (Culley et al., 2003; Culley and Steward, 2007).

Both DNA and RNA reclaimed water metagenomes contained a diverse group of sequences related to plant viruses (Table 2). Most of the viruses were novel as they shared less than 60% amino acid identity with known viral proteins. However, all the viruses belonging to the Tobamovirus genus and the Melon necrotic spot virus (MNSV) from the Tombusviridae family had high amino acid identity (> 90%) to known plant pathogens. Viruses from these groups are known to have extremely stable virions that can resist high temperatures and are insensitive to organic solvents and non-ionic detergents (Fauquet et al., 2005). In the early 1980s, the tombusvirus Tomato bushy stunt virus was used to demonstrate that humans can act as carriers of plant pathogens by consuming infected produce and shedding infective viral particles in their faeces (Tomlinson and Faithfull, 1982). It was suggested that plant viruses with no known vectors, such as most tobamoviruses and tombusviruses, may have a certain ‘alimentary resistance’ (i.e. stay intact after passing through the alimentary tract), which enables humans and other animals to act as carriers of these viruses (Tomlinson and Faithfull, 1982). Interestingly, all the tobamoviruses and MNSV detected in the reclaimed water metagenomes have also been previously detected in faecal samples from healthy individuals (Zhang et al., 2006). The tobamovirus Pepper mild mottle virus (PMMoV) was the most abundant virus in RNA viral metagenomes from individual faecal samples, and was still capable of infecting plants after passing through the human gut (Zhang et al., 2006). PMMoV was subsequently detected at concentrations greater than 10^4 copies ml−1 in both raw sewage and treated effluent samples collected throughout the USA (E. Symonds, K. Rosario and M. Breitbart, unpublished).
MNSV and most of the tobamoviruses, including PMMoV, were detected in reclaimed water samples both at the point-of-discharge (Effluent library) and at the point-of-use (Nursery library), indicating that stable viruses may reach irrigation systems. The tobamovirus Tomato mosaic virus (ToMV) and MNSV have been detected in irrigation systems in other countries (Gosalvez et al., 2003; Boben et al., 2007); however, data regarding the presence of plant pathogens in irrigation systems in the USA are not available. These findings indicate that reclaimed water may serve as a mechanism for the spread of highly stable plant pathogens that exhibit alimentary resistance. The infectivity of these plant viruses needs to be examined to determine if reclaimed water use represents a potential problem for the agricultural sector.


This study identified the dominant DNA and RNA viral types in reclaimed water, thus making a significant contribution to current microbiological data regarding treated wastewater. The DNA viral community in both reclaimed and potable water was dominated by phages. However, there were clear differences between the two communities, as demonstrated by phage family representation and host distribution in the different libraries. From a water quality standpoint, it is useful to evaluate which types of phages endure wastewater treatment processes in order to identify potential viral bioindicators. Finding strong bioindicators that correlate with the presence of human viruses is not an easy task, as different viruses exhibit varying levels of resistance to wastewater treatment (Nwachcuku and Gerba, 2004). Natural phage populations found in wastewater offer a range of resistance to disinfection (chlorination) that may represent most of the viruses that can be found in water (Duran et al., 2003). Therefore, phage populations in reclaimed water offer an untapped source of potential bioindicators.

The metagenomes also uncovered a wealth of novel eukaryotic viruses present in reclaimed water. DNA metagenomic libraries revealed the presence of viruses similar to ssDNA viruses, including plant and animal pathogens from the Geminiviridae, Nanoviridae and Circoviridae families. The RNA metagenome contained an abundance of plant pathogens known to be resistant to environmental degradation, including members of the Tobamovirus genus and Tombusviridae family. The RNA metagenome also contained an abundance of picorna-like viruses. Some of these picorna-like viruses may be related to human pathogens; however, the majority of these novel viruses are most closely related to insect and protist viruses. None of the established human pathogens (e.g. enteroviruses, hepatitis viruses and caliciviruses) were detected during this study, suggesting that these viruses were not abundant relative to phages and other eukaryotic viruses in the reclaimed water samples. The genetic information gathered during this study can be used to design molecular assays to detect viral types of interest and assess their abundance in wastewater and ecosystems exposed to wastewater discharge. Future research needs to evaluate the host range, infectivity and ecological impacts of novel viruses identified in reclaimed water to ensure the appropriate use of this important alternative water supply.

Experimental procedures

Enumeration and visualization of virus-like particles

SYBR Gold staining and epifluorescent microscopy (Chen et al., 2001; Shibata et al., 2006; Patel et al., 2007) were used to enumerate VLPs in well, potable and reclaimed water samples as well as in raw sewage. Potable water samples (n = 3) were collected from spigots at the point-of-use (plant nursery) in Manatee County (Bradenton, FL). Well water samples (n = 5) were collected from private shallow wells in Pinellas County (St. Petersburg, FL). Reclaimed water samples were collected at the point-of-discharge (i.e. wastewater treatment plant, n = 3) and at the point-of-use (i.e. public sprinklers, fountains, spigots at a plant nursery, n = 8) in Pinellas County (St. Petersburg, FL) and Manatee County (Bradenton, FL). Raw sewage samples (n = 3) were collected from a raw inflow stream at a wastewater treatment plant. Raw sewage and reclaimed water samples at the point-of-discharge and point-of-use originated from the same wastewater treatment facility, which uses activated sludge followed by chlorination for disinfection. All samples for VLP enumeration were collected in sterile 50 ml conical tubes and processed within 3 h. Samples were fixed with 2% paraformaldehyde and then subsamples were filtered onto a 0.02 μm Anodisc (Whatman, Maidstone, Kent, UK). For raw sewage and reclaimed water samples, 10 μl was diluted into 1 ml of sterile water to prepare the slides, whereas 3–5 ml of well and potable water samples was directly filtered onto the Anodisc. Filters were stained with 1× SYBR Gold (Invitrogen, Carlsbad, CA, USA) for 10 min in the dark, and virus particles were counted digitally (Chen et al., 2001) in at least eight fields of view for each sample.
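The conversion from per-field counts to VLPs per millilitre can be sketched as follows. This is a minimal illustration rather than the counting procedure of Chen et al. (2001): the function name, the parameter names and the assumption that the entire diluted subsample was passed through the filter are hypothetical, and the field and filter areas are not given in the text and must come from the microscope and filter actually used.

```python
def vlps_per_ml(field_counts, field_area_um2, filter_area_um2, sample_volume_ml):
    """Estimate VLPs per ml of original sample from epifluorescence counts.

    field_counts     -- VLP counts, one per field of view (hypothetical input)
    field_area_um2   -- area of one field of view (assumed, instrument-specific)
    filter_area_um2  -- effective filtration area of the filter (assumed)
    sample_volume_ml -- volume of ORIGINAL sample deposited on the filter,
                        e.g. 0.01 for 10 ul of sewage diluted before filtering
    """
    if not field_counts:
        raise ValueError("at least one field of view is required")
    if sample_volume_ml <= 0 or field_area_um2 <= 0:
        raise ValueError("volume and field area must be positive")

    # Mean count per field, scaled up to the whole filter surface.
    mean_count = sum(field_counts) / len(field_counts)
    fields_per_filter = filter_area_um2 / field_area_um2
    vlps_on_filter = mean_count * fields_per_filter

    # Normalize by the volume of original sample that was filtered.
    return vlps_on_filter / sample_volume_ml
```

For the raw sewage and reclaimed water samples described above, `sample_volume_ml` would be 0.01 (10 μl) under the stated assumption, whereas for well and potable water it would be the 3–5 ml filtered directly.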

Transmission electron microscopy was used to examine the morphology of virus particles in reclaimed water. For this purpose, purified viral particles (see below) were dried down overnight onto copper 400-mesh carbon coated formvar grids. Grids were stained with 1% uranyl acetate for 20 s and air-dried before visualization on a Hitachi 7100 transmission electron microscope.

Viral isolation and extraction of nucleic acids

Viruses were purified from four different samples: potable water (‘Potable’), reclaimed water at the point-of-discharge from the wastewater treatment plant (‘Effluent’) and reclaimed water at two points-of-use, a plant nursery (‘Nursery’) and a public park sprinkler (‘Park’) (Table 1). For each sample, viruses were concentrated and purified from 50 to 100 l of water (Table 1) using a combination of tangential flow filtration, density-dependent centrifugation and nuclease treatment (Breitbart et al., 2002; 2003; 2004; Zhang et al., 2006; Thurber et al., 2009). Each water sample was first filtered through a 0.2 μm tangential flow filter (TFF) (GE Healthcare, Westborough, MA, USA) to remove bacteria, eukaryotes and large particles. Viruses in the filtrate were concentrated using a 100 kDa TFF until the final sample volume was less than 1 l. All TFF viral concentrates were then filtered through a 0.22 μm Sterivex filter (Millipore, Billerica, MA, USA) to remove any bacterial contamination. The viral concentrate from potable water was further concentrated through polyethylene glycol (PEG 8000) precipitation in order to obtain enough nucleic acids for metagenomic sequencing, and treated with 10% chloroform to remove contaminating microbial cells. TFF viral concentrates from reclaimed water and the PEG-precipitated concentrates from potable water were loaded onto a cesium chloride (CsCl) density gradient, ultracentrifuged at 61 000 g for 3 h at 12°C, and the 1.2–1.5 g ml−1 fraction was collected. Examination of the different CsCl fractions by SYBR Gold staining and epifluorescent microscopy revealed that a significant number of VLPs remained in the sample reservoir; therefore, the sample fraction was re-loaded onto a second CsCl density gradient and processed using the same procedure. The 1.2–1.5 g ml−1 fractions collected from both gradients were pooled. This double CsCl gradient procedure allowed the recovery and isolation of the vast majority of VLPs.
After CsCl purification, viral fractions were treated with DNase I to further eliminate free DNA. Samples were not treated with RNase due to the potential for RNase to destroy the nucleic acids of some RNA viruses (Griffin et al., 2000; Cole et al., 2003). DNase-treated VLPs were further concentrated using centrifugal concentration filters (Microcon Ultracel YM-30; Millipore, Bedford, MA, USA) before nucleic acid extraction. After final purification and concentration, viral DNA and RNA were simultaneously extracted from viral concentrates using the QIAamp MinElute Virus Spin Kit (Qiagen, Valencia, CA, USA). Extracted nucleic acids were split into two fractions, one for DNA libraries and one for RNA libraries. The RNA fraction was treated with DNase I using the DNA-free Kit (Ambion, Austin, TX, USA) followed by random-primed cDNA synthesis using the SuperScript III First-Strand Synthesis System for RT-PCR (Invitrogen).

Library construction and pyrosequencing

Four DNA and two RNA viral metagenomes were pyrosequenced at the Genome Institute of Singapore (Table 1). For this purpose, DNA and cDNA samples were amplified in triplicate reactions with the GenomiPhi V2 DNA Amplification Kit (GE Healthcare, Little Chalfont, Buckinghamshire, UK). All GenomiPhi reactions were purified using standard phenol/chloroform extraction and ethanol precipitation (Sambrook and Russell, 2001). The amplification products were pooled and 4 μg was processed for pyrosequencing with the GS-FLX sequencer (454 Life Sciences, Roche) according to the manufacturer's protocol. The Park-DNA sample was amplified in duplicate by the GenomiPhi reaction and purified amplification products (4 μg) were processed for shotgun sequencing using a GS20 sequencer (454 Life Sciences, Roche). Sequences have been deposited in the Short Read Archive (SRA) at NCBI (accession number: SRA008294).


For metagenomic sequence analysis, raw reads longer than 100 nt were assembled into contigs using SeqMan (DNASTAR, Madison, WI) with a criterion of ≥ 95% identity over at least 35 nt (Table 1). Contigs larger than 200 nt were then compared against the GenBank non-redundant (nr) protein database using blastx (Altschul et al., 1997) (E-value < 0.001). These blastx results were analysed using the Metagenome Analyzer (MEGAN) software (Huson et al., 2007) to identify the different taxa present in reclaimed and potable water viral metagenomes. Contigs with best matches to cellular organisms (i.e. bacteria, archaea or eukaryotes) were further compared against the ACLAME (‘A Classification of Mobile Genetic Elements’) database using the same parameters as in GenBank to identify mobile genetic elements, including phage and plasmids (Leplae et al., 2004; 2006). All contigs from the DNA libraries with significant hits to phages and cellular organisms were also analysed against the Phage Sequence Databank using tblastx (E-value < 0.001) in order to identify the dominant phage families in the DNA metagenomes.
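The contig length and E-value thresholds described above can be illustrated with a short filtering sketch. It assumes that the blastx results are available as tab-separated lines in the common 12-column tabular layout (query, subject, per cent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score); this input format, the function name and the choice of bit score to pick the best hit are assumptions made for illustration and do not describe the MEGAN workflow used in this study.

```python
def best_hits(blast_lines, contig_lengths, min_contig_len=200, max_evalue=1e-3):
    """Return the best significant hit per contig as {contig: (subject, evalue, bitscore)}.

    blast_lines    -- iterable of tab-separated, 12-column BLAST tabular lines (assumed format)
    contig_lengths -- dict mapping contig name to its length in nt (hypothetical input)
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])

        # Keep only contigs larger than 200 nt, as in the text.
        if contig_lengths.get(query, 0) <= min_contig_len:
            continue
        # Keep only hits with E-value < 0.001, as in the text.
        if evalue >= max_evalue:
            continue
        # Retain the highest-scoring hit for each contig.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

The resulting dictionary could then be joined to a taxonomy table to tally hits per viral family, which is the kind of summary that the taxonomic assignment step provides.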

A number of chimeric sequences were identified after manual examination of contigs. This chimera problem was more pronounced in the RNA libraries, making genome assemblies impossible from the RNA libraries. These artifacts were most likely the result of the multiple displacement amplification (MDA) used to obtain enough nucleic acids for pyrosequencing (Lasken and Stockwell, 2007). In addition, MDA is known to have amplification biases selecting for circular ssDNA templates (Kim et al., 2008). Therefore, the relative abundance of different viral sequences, such as ssDNA and RNA viruses, was not used to infer the relative abundance of different DNA and RNA eukaryotic viruses. Instead, this study focused on contigs to evaluate the different types of viruses present in reclaimed water as opposed to investigating the absolute abundance of individual raw reads. Analysis of contigs rather than individual raw reads has several advantages. First, this strategy reduces the number of sequences to be analysed by eliminating redundant sequences (i.e. identical sequences assemble into the same contig). Second, contigs are longer than individual reads, which increases the chances of finding significant matches in the database by increasing sequence length (Wommack et al., 2008). Third, analysis of contigs may help collapse the effects of artifacts caused by MDA, as chimeras are likely to form within the same template (Lasken and Stockwell, 2007).
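To illustrate why sequence counts were not used to infer relative abundance, the effect of a template-specific amplification bias can be worked through numerically. The sketch below is purely illustrative: the function name is hypothetical, the model assumes a single favoured template class amplified by a constant factor relative to all other templates, and the 1% starting fraction in the usage note is an invented example rather than a value measured in this study.

```python
def apparent_fraction(true_fraction, fold_bias):
    """Fraction of amplified product derived from a favoured template class.

    true_fraction -- fraction of the favoured class before amplification (0 to 1)
    fold_bias     -- how many times more the favoured class is amplified
                     than all other templates (assumed constant)
    """
    if not 0.0 <= true_fraction <= 1.0:
        raise ValueError("true_fraction must be between 0 and 1")
    if fold_bias <= 0:
        raise ValueError("fold_bias must be positive")

    favoured = true_fraction * fold_bias  # amplified favoured templates
    other = 1.0 - true_fraction           # all other templates, amplified 1x
    return favoured / (favoured + other)
```

Under this simple model, a template class making up 1% of the community and amplified 100-fold more than the rest (the lower end of the bias reported by Kim et al., 2008) would account for roughly half of the amplified product, so sequence counts would greatly overstate its original abundance.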


This research was funded by grants to M.B. from the National Science Foundation (MCB-0701984) and Alfred P. Sloan Foundation (BR-4772). K.R. was funded by a Bridge to the Doctorate Fellowship from the National Science Foundation and the Saint Petersburg Progress Endowed Fellowship. The Genome Institute of Singapore (GIS) sequencing team provided the Roche 454 sequencing data. Y.R. was funded by Singapore A*STAR and NHGRI (R01 HG004456-01 and R01HG003521-01). The authors would like to acknowledge Tony Greco for the TEM analysis and Dana Hall and Wah Heng Lee for assistance with the bioinformatics analysis. Reclaimed water sampling was possible thanks to the collaboration of Owrang Kashef, Jan Tracy, David Shulmister and Chris Collins.