Challenges of studying viral aerosol metagenomics and communities in comparison with bacterial and fungal aerosols


  • Aaron J. Prussin II,

    Corresponding author
    1. Department of Civil and Environmental Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
    • Correspondence: Aaron J. Prussin, II, Virginia Tech, 245 Kelly Hall, Blacksburg, VA 24061, USA. Tel.: +1 540 2399303; fax: +1 540 2317916; e-mail:

  • Linsey C. Marr,

    1. Department of Civil and Environmental Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
  • Kyle J. Bibby

    1. Department of Civil and Environmental Engineering, University of Pittsburgh, Pittsburgh, PA, USA


Despite the obvious importance of viral transmission and ecology to medicine, epidemiology, ecology, agriculture, and microbiology, the study of viral bioaerosols and community structure has remained a vastly underexplored area, due to both unresolved technical challenges and unrecognized importance. High-throughput, culture-independent techniques such as viral metagenomics are beginning to revolutionize the study of viral ecology. With recent developments in viral metagenomics, characterization of viral bioaerosol communities provides an opportunity for high-impact future research. However, there remain significant challenges for the study of viral bioaerosols compared with viruses in other matrices, such as water, the human gut, and soil. Collecting enough biomass is essential for successful metagenomic analysis, but this is a challenge with viral bioaerosols. Herein, we provide a perspective on the importance of studying viral bioaerosols, the challenges of studying viral community structure, and the potential opportunities for improvements in methods to study viruses in indoor and outdoor air.


Bacteria, fungi, viruses, and other biological particles (e.g., pollen) can all be found in airborne particles, known as bioaerosols. Bioaerosols can spread diseases of humans, plants, and animals on the scale of meters to continents, due to potential for long-distance transport in the atmosphere (Griffin, 2007; Smith et al., 2012; DeLeon-Rodriguez et al., 2013). Bioaerosols may also play a role in climate change and ecological processes and may even be beneficial for human health and the environment (Ariya et al., 2009; Pratt et al., 2009; Barr et al., 2013). There have been many studies elucidating the properties of bacterial and fungal bioaerosols. Bacterial concentrations of 10^2–10^6 CFU m−3 and fungal concentrations of 10^2–10^3 spores m−3 have been found in indoor and outdoor air (Lighthart, 2000; Zhu et al., 2003; Jo & Seo, 2005; Lee et al., 2006; Bowers et al., 2009). In contrast, relatively little is known about viral bioaerosols: Only a few studies have reported concentrations in air, and they focused on a single virus species.

The diversity and seasonal dynamics of bacterial communities have been studied in many environments, including both indoor and outdoor air (Pastuszka et al., 2000; Maron et al., 2005; Rintala et al., 2008; Bowers et al., 2011a, b). The universal bacterial 16S rRNA genomic marker region is typically used in bacterial community studies. Like bacteria, fungi have a conserved genomic region, the ribosomal internal transcribed spacer (ITS), that can serve as a universal marker for community sequencing (O'Brien et al., 2005; Schoch et al., 2012). The fungal diversity of both indoor and outdoor air has been previously described (Pitkäranta et al., 2008; Fröhlich-Nowoisky et al., 2009; Yamamoto et al., 2012). Unlike bacteria and fungi, viruses do not share a conserved genetic element, so metagenomic sequencing is required to study viral community structure.

The viral community structure in air remains largely unexplored. Researchers have used a metagenomic sequencing approach to examine the composition of viral communities in marine, soil, and sludge environments (Angly et al., 2006; Fierer et al., 2007; Dinsdale et al., 2008; Bibby & Peccia, 2013) and to identify viral pathogens (Edwards & Rohwer, 2005; Rosario & Breitbart, 2011; Bibby, 2013). However, none of these studies targeted airborne virus community structure, although there have been studies targeting particular viruses in the air. For example, Yang et al. (2011) identified influenza A virus in indoor air and found an average concentration of 1.6 × 10^4 genome copies m−3. However, such studies provide a limited view of the overall viral community structure.

The viral bioaerosol community is an important scientific frontier for understanding overall microbial ecology, in particular transmission of infectious disease. Advances in metagenomics hold great promise for improving our knowledge about viral bioaerosols, but there are some methodological challenges that must be overcome first. Herein, we discuss the importance, challenges, and future of studying community structures of viral bioaerosols through the use of metagenomics.

Aerosol transport drives viral bioaerosol community structure

The role of transport in structuring microbial communities has been previously described in many environments including air, soil, and water (Yavuz Corapcioglu & Haridas, 1985; Kellogg & Griffin, 2006), and many complex mechanisms dictate microbial transport in any given environment. In air, settling velocity is a key factor that dictates microorganism transport (Friedlander, 2000). Settling velocity scales with the square of a particle's diameter, so larger microorganisms typically settle out of the air much more quickly than do smaller ones. Particle density also affects settling velocity but likely does not vary much between different types of microorganisms. Most biological particles have a density approximately equal to that of water (c. 1 g mL−1) (Sharp et al., 1945; Martinez-Salas et al., 1981; Trail et al., 2005).

Due to the significant differences in size between virus-like particles (VLPs) (c. 0.02–0.30 μm), bacteria (c. 0.50–5 μm), and fungi (c. 1–100 μm), orders of magnitude differences in settling times and transport distances are expected (Fig. 1). Here, we demonstrate that transport plays a key role in viral bioaerosol community composition, and thus, knowledge of transport distance is important when attempting to use metagenomics to determine the origin of VLPs in a community. Consider a scenario in which a small virus, such as hepatitis B (c. 30 nm), and a large fungal spore, such as Alternaria sp. (c. 100 μm), are released at the same time from the top of a 10-m-tall building. If the horizontal wind speed is constant at 4 m s−1 and there is no vertical advection or particle aggregation, the spore would be transported a horizontal distance of less than 150 m before settling to the ground, while the viral particle would be transported nearly 200 000 km. The circumference of Earth is c. 40 000 km, so it is theoretically possible that a VLP can be transported around the entire planet before depositing to the ground (Kellogg & Griffin, 2006; Allen & Turner, 2008; Takemura et al., 2011). Similarly, if a small virus and large fungus are released from 2 m high in an indoor environment, it would take the fungal spore less than 10 s to settle onto the floor, while it would take the viral particle nearly 3 months. Although these examples are oversimplified, the results demonstrate that viral bioaerosols can be significantly removed from their original hosts, both spatially and temporally.
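The scenario above can be approximated with Stokes' law plus a slip correction for very small particles. The following is a minimal sketch under stated assumptions, not the calculation used for Fig. 1: the air viscosity, mean free path, and slip-correction coefficients are assumed textbook values for room-temperature air, the function names are hypothetical, and Stokes' law is only approximate for particles as large as 100 μm.

```python
import math

# Assumed textbook values for air near 20 °C and 1 atm (not taken from the article).
GRAVITY = 9.81             # m s^-2
AIR_VISCOSITY = 1.81e-5    # Pa s
MEAN_FREE_PATH = 6.6e-8    # m
PARTICLE_DENSITY = 1000.0  # kg m^-3, i.e. c. 1 g mL^-1 as stated in the text


def slip_correction(diameter_m):
    """Cunningham slip correction; it matters for particles well below 1 um."""
    kn = 2.0 * MEAN_FREE_PATH / diameter_m  # Knudsen number
    return 1.0 + kn * (1.257 + 0.4 * math.exp(-1.1 / kn))


def settling_velocity(diameter_m):
    """Terminal settling velocity (m/s) from Stokes' law with slip correction."""
    stokes = PARTICLE_DENSITY * diameter_m ** 2 * GRAVITY / (18.0 * AIR_VISCOSITY)
    return stokes * slip_correction(diameter_m)


def airborne_time_s(diameter_m, release_height_m):
    """Time (s) to settle from the release height in still air."""
    return release_height_m / settling_velocity(diameter_m)


def horizontal_distance_m(diameter_m, release_height_m, wind_speed_m_s):
    """Distance (m) carried by a constant horizontal wind before reaching the ground."""
    return wind_speed_m_s * airborne_time_s(diameter_m, release_height_m)


if __name__ == "__main__":
    for name, d in [("virus, 30 nm", 30e-9), ("fungal spore, 100 um", 100e-6)]:
        outdoor_km = horizontal_distance_m(d, 10.0, 4.0) / 1000.0
        indoor_days = airborne_time_s(d, 2.0) / 86400.0
        print(f"{name}: {outdoor_km:.3g} km outdoors, {indoor_days:.3g} days indoors")
```

With these assumed constants, the sketch gives outdoor distances consistent with the figures quoted above (less than 150 m for the spore and nearly 200 000 km for the virus) and an indoor airborne time for the virus on the order of three months. The diameter-squared dependence dominates the result, while the slip correction adds a further factor of several for the smallest particles.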

Figure 1.

Potential horizontal transport distance in an outdoor environment for viruses (dotted line), bacteria (solid line), and fungi (dashed line) estimated for bioaerosols released from 10 m high (a). Lines representing the upper and lower bounds are shown for viruses, bacteria, and fungi, corresponding to 20 and 300 nm, 0.5 and 5 μm, and 1 and 100 μm, respectively. The total time a bioaerosol remains airborne in an indoor environment is also estimated, assuming a particle is released from 2 m high (b). These estimates show that viruses may be transported orders of magnitude farther and remain airborne much longer than bacteria or fungi.

Previously, Bowers et al. (2011a, b) used community sequencing to determine potential sources of airborne bacteria in the United States. When using metagenomic sequencing to determine viral bioaerosol community sources, one must be cautious in attempting to identify potential sources. The potential of viral bioaerosols for long-distance transport makes it difficult to interpret whether a viral bioaerosol community represents local sources and community structure or a combination of more distant sources, if viruses are transported intercontinentally and are just ‘passing through’ at the time of sampling.

Another challenge associated with studying a viral bioaerosol community is determining whether virus particles collected are viable and infectious. Environmental factors, such as humidity, temperature, and solar irradiation, have been shown to affect viral bioaerosol viability (Harper, 1961, 1963; Akers et al., 1966; Songer, 1967; Sagripanti & Lytle, 2007). Fungi and bacteria can easily be cultured in a laboratory setting to determine viability, but because viruses are obligate intracellular parasites, they require a living host to grow and replicate. There are no growth media that allow easy determination of viability. Using metagenomics to study viral bioaerosol communities is informative in its own right, and its utility could be extended further if there were a way to relate community data with viability. Knowledge about viability could show that viruses are not just carriers of nucleic acid, but rather, they are potentially infectious agents with implications for long-distance disease transmission and genetic transfer.

Challenges associated with viral bioaerosol metagenomic studies

Commonly used methods for collecting bioaerosols include filters, liquid impingers, cyclone samplers, slit samplers, and electrostatic precipitators (Verreault et al., 2008). Choosing an appropriate sampling method to collect viral bioaerosols is both essential and challenging because they are orders of magnitude smaller than bacteria and fungi. Not all collection methods have high enough collection efficiencies for viral metagenomic studies. Historically, researchers have primarily used either liquid impingers or filters to collect viral bioaerosols; however, each of these methods has important drawbacks. Impingers, first introduced as all-glass impingers (AGIs) (May & Harper, 1957), are the most commonly used samplers for viral bioaerosols (Verreault et al., 2008). In an impinger, a jet of air is forced into a liquid, and particles impact into the liquid. Smaller particles can diffuse into the liquid as the air bubbles its way back to the surface. The major advantage of impingers is that viral bioaerosols are deposited into a liquid medium that preserves viability (May & Harper, 1957; Harper, 1961). There are two major disadvantages of impingers. First, they have low collection efficiencies for viral bioaerosols, typically less than 1% for particles smaller than 100 nm (Hogan et al., 2005; Tseng & Li, 2005; Dart & Thornburg, 2008). Second, sampling time and flow rate are limited to c. 30 min and 15 L min−1 (Lin et al., 1997, 1999), respectively, as the liquid collection medium evaporates quickly. Additionally, there is the potential for re-aerosolization and loss in viability of collected bioaerosols. These limitations are problematic when trying to collect a large enough sample for metagenomic studies.

Filter collection is an alternative method to impingers. Bioaerosols are collected onto filters through five mechanisms: (1) interception, (2) impaction, (3) diffusion, (4) settling, and (5) electrostatic attraction (Hinds, 1999). Polytetrafluoroethylene, polycarbonate, and gelatin filters are commonly used for the collection of VLPs (Sawyer et al., 1994; McCluskey et al., 1996; Aintablian et al., 1998; Myatt et al., 2003, 2004; Tseng & Li, 2005). Filters are capable of trapping particles much smaller than the nominal pore size because many collection mechanisms other than straining are at work. Burton et al. (2007) observed collection efficiencies > 93% for VLPs with polytetrafluoroethylene and gelatin filters. Polycarbonate filters had lower collection efficiencies of 22–49%, although these values are still much better than with impingers. With the exception of gelatin filters, there are no limitations to sampling duration or flow rate, although using filters to study viral bioaerosols can result in structural damage to the virus particles, which would affect viability (Verreault et al., 2008). Some researchers prefer gelatin filters as they do not appear to significantly affect infectivity, but samples can only be collected for c. 30 min at a low flow rate (< 15 L min−1) to prevent the filter from drying out (Tseng & Li, 2005). Collection of viable viruses is not necessary for viral metagenomics, so using filters is advisable because they would allow sampling of a larger volume of air.

Filters used in heating, ventilation, and air-conditioning (HVAC) systems can conveniently collect bioaerosols, and the volume flow rate through them is often quite high (ASHRAE, 2007; Noris et al., 2011). Farnsworth et al. (2006) developed a method to successfully extract bacteria and VLPs from HVAC filters. HVAC filters vary greatly in their efficiency at trapping particles. The efficiency is quantified by a minimum efficiency reporting value (MERV), which ranges from 1 to 20, where higher numbers indicate higher collection efficiency. For studies of bioaerosols, using sampling filters with a MERV rating greater than 12 is recommended to ensure more than 90% of bioaerosols are collected (Fisk et al., 2002; ASHRAE, 2007; Noris et al., 2011). HVAC filters with low MERV ratings are not suitable for metagenomic studies due to low collection efficiencies for VLPs. On the other hand, filters with the highest MERV ratings are usually not practical because of the large pressure drop across them and thus greater power required to move air through them. Another advantage of HVAC filters is that they require no additional sampling equipment, namely pumps, which can be problematic because of power demand, space requirements, and noise generated. Researchers studying bacterial communities in indoor environments have already started using HVAC filters as collection media (Zhu et al., 2003; Noris et al., 2011; Hospodsky et al., 2012).

Viral biomass limitation

As viruses do not share a common gene or genetic element (Rohwer & Edwards, 2002), metagenomics represents a powerful tool for community analysis; however, viral metagenomics requires a large quantity of biomass to be collected during environmental sampling to have a sufficient amount of nucleic acid for sequencing. Traditional viral metagenomic sequencing requires a minimum of 1–5 μg of genomic material, which corresponds to c. 10^11 virus particles (Thurber et al., 2009). Newer sequencing kits have allowed viral metagenomic sequencing with as little as 1 ng of genomic material. Additionally, researchers must account for the extraction efficiency of viral particles and nucleic acids. Previous studies have used 10^12 virus particles for metagenomic sequencing preparation (Angly et al., 2006; Thurber et al., 2009).
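The link between nucleic acid mass and particle number can be illustrated with a rough conversion. The sketch below is only an order-of-magnitude check under assumptions that are not stated in this article: one double-stranded DNA genome per particle, an average genome length of 50 000 bp (similar to a tailed bacteriophage), and an average molar mass of 650 g mol−1 per base pair. The function name is hypothetical.

```python
AVOGADRO = 6.022e23      # molecules per mole
BP_MOLAR_MASS = 650.0    # g/mol per dsDNA base pair (assumed average)


def genomes_in_mass(mass_ug, genome_length_bp):
    """Approximate number of dsDNA genomes contained in a given mass of DNA."""
    mass_g = mass_ug * 1e-6
    grams_per_genome = genome_length_bp * BP_MOLAR_MASS / AVOGADRO
    return mass_g / grams_per_genome


if __name__ == "__main__":
    ASSUMED_GENOME_BP = 50_000  # illustrative genome length, not from the article
    for mass_ug in (0.001, 1.0, 5.0):  # 1 ng, 1 ug, 5 ug
        n = genomes_in_mass(mass_ug, ASSUMED_GENOME_BP)
        print(f"{mass_ug} ug of DNA ~ {n:.2e} genomes")
```

Under these assumptions, 1–5 μg corresponds to roughly 10^10 to 10^11 genomes, in line with the particle number quoted above. Communities dominated by viruses with smaller genomes would need proportionally more particles to reach the same mass.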

The actual required sample volume may be even larger because there are additional biomass needs and opportunities for losses during preparation of viral bioaerosol samples for metagenomic analysis. The following steps are required: (1) purification and isolation of VLPs from bacterial cells, free nucleic acids, and other contaminants; (2) concentration of VLPs; and (3) extraction and purification of nucleic acid (Thurber et al., 2009). Tangential flow filtration and CsCl density gradient centrifugation have been successful for VLP isolation with large sample volumes (Bench et al., 2007; Dinsdale et al., 2008; Schoenfeld et al., 2008). If the viral sample volume is limited, it is possible to purify the virus by passing a suspension through a 0.45-μm filter to remove bacteria and eukaryotic cells, followed by DNase/RNase treatment of the filtrate to digest any free nucleic acid (Bibby et al., 2011). One challenge with relying upon DNase/RNase treatment to remove free nucleic acids from a sample collected from air is that nucleic acids may be physically associated with dust and other minerals commonly present in air that protect them from DNase digestion (Romanowski et al., 1991; Cai et al., 2006). If VLPs in suspension need to be concentrated, a polyethylene glycol (PEG) precipitation step is commonly added to the sample preparation protocol (Yamamoto et al., 1970; Lewis & Metcalf, 1988; Thurber et al., 2009). However, there are two drawbacks with PEG precipitation for viral bioaerosol preparation. First, a large sample volume (50–100 mL) is usually required. Second, PEG precipitation is not selective solely for viral particles and will precipitate other contaminants, such as metals and proteins commonly found in air (Demacker et al., 1980; Atha & Ingham, 1981). Finally, nucleic acid from the purified and concentrated VLPs must be extracted. Nucleic acid extraction efficiency is known to improve with higher nucleic acid concentration, so small sample sizes present a challenge. Additionally, it is possible that the nucleic acid extraction step might either concentrate or remove contaminants.

The quality and purity of viral samples for metagenomic sequencing must be verified to ensure there is not any contaminating free DNA. Epifluorescence microscopy can be used to assess purity, as VLPs appear as distinct pinpoints, which differ from contamination which might appear as broader spots (Patel et al., 2007; Thurber et al., 2009). Additionally, PCR approaches, such as targeting the 16S rRNA region, may also be employed to assess VLP purity and confirm the absence of bacterial contamination. After nucleic acid has been extracted, the extraction products should be run on a gel to check for quality and the presence of ribosomal contamination; however, this is a challenge when working with air samples in which biomass is limited. At least 200 ng of nucleic acid is needed for a gel check, a significant portion of what would be required for metagenomic sequencing.

The volume of air that must be sampled to obtain the required number of VLPs depends on typical ambient concentration, which, to our knowledge, has not been established in air. Table 1 lists typical virus concentrations per unit volume found in a variety of environments including seawater, freshwater, soil, and humans (Fuhrman, 1999; Lighthart, 2000; Suttle, 2005; Williamson et al., 2005; Breitbart et al., 2008; Clasen et al., 2008; Bowers et al., 2011a, b; Kim et al., 2011; Qian et al., 2012; Reyes et al., 2012). The values for air are estimated based on the typical bacterial concentrations in air and the virus-to-bacteria ratio (VBR) found in other environments (Table 1). Bacterial concentrations as high as 10^6 bacteria per m3 have been reported both in indoor and in outdoor air (Lighthart, 2000; Bowers et al., 2011a, b; Qian et al., 2012). The VBR ranges from c. 0.2 to 3000 in seawater, freshwater, soil, and human environments. Using this information, we developed a range of potential viral concentrations in air (Table 1), assuming low and high VBRs of 0.01 and 10 000, respectively. These VBRs are an order of magnitude lower than and higher than the extremes reported in other environments. Even an extremely high VBR (10 000 VLPs per bacterial cell) results in a VLP concentration in air of 10^4 viruses mL−1, which is two to three orders of magnitude lower than observed in marine and freshwater environments and more than five orders of magnitude lower than observed in soil and human gut environments. In air, it is likely that the VBR is higher than in other environments due to the preferential removal of bacteria by settling and the ability of VLPs to remain in the air for longer periods of time (Fig. 1).

Table 1. Viral and bacterial biomass present in different environments

| Environment | Viral concentration | Bacterial concentration | Ratio of viruses to bacteria | References |
|---|---|---|---|---|
| Coastal Arctic Ocean | 3.9 × 10^6 VLPs mL−1 | 5.6 × 10^5 cells mL−1 | 11.1 | Clasen et al. (2008), Fuhrman (1999) and Suttle (2005) |
| Coastal Pacific Ocean | 6.7 × 10^7 VLPs mL−1 | 3.0 × 10^6 cells mL−1 | 40.0 | Clasen et al. (2008), Fuhrman (1999) and Suttle (2005) |
| Lake | 9.5 × 10^6 VLPs mL−1 | 4.0 × 10^6 cells mL−1 | 2.9 | Clasen et al. (2008) |
| Agricultural soil | 1.1 × 10^9 VLPs g−1 | 4.0 × 10^5 cells g−1 | 2750 | Williamson et al. (2005) |
| Forest soil | 4.2 × 10^9 VLPs g−1 | 3.4 × 10^8 cells g−1 | 12 | Williamson et al. (2005) |
| Human gut | 1.5 × 10^9 VLPs g−1 | 7.6 × 10^9 cells g−1 | 0.2 | Breitbart et al. (2008), Kim et al. (2011) and Reyes et al. (2012) |
| Air | 1.0 × 10^−2 VLPs mL−1 (a) | 1.0 × 10^0 cells mL−1 | 0.01 | Bowers et al. (2011a, b), Lighthart (2000) and Qian et al. (2012) |
| Air | 1.0 × 10^4 VLPs mL−1 (a) | 1.0 × 10^0 cells mL−1 | 10 000 | Bowers et al. (2011a, b), Lighthart (2000) and Qian et al. (2012) |

(a) Estimated based on two extreme VBRs.

Assuming an upper bound VBR of 10 000, collecting 10^12 VLPs would require sampling an air volume of 10^5 L followed by 100% removal efficiency of viruses from the collection medium. If a VBR of 0.01 is assumed, 10^11 L of air would be needed. At an air sampling flow rate of 10 L min−1, it would take 7 and 7 × 10^6 days (nearly 20 000 years) to sample 10^5 and 10^11 L of air, respectively.
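The arithmetic behind these sampling estimates can be sketched as follows. This is an illustrative calculation only: the function names and the `recovery` parameter are hypothetical, and the inputs are the values given in the text (a bacterial concentration of one million cells per m3, a target of one trillion VLPs, and a flow rate of 10 L min−1).

```python
ML_PER_M3 = 1e6
ML_PER_L = 1e3
MIN_PER_DAY = 60 * 24


def viral_concentration_per_mL(bacteria_per_m3, vbr):
    """Estimated VLPs per mL of air from a bacterial count and a virus-to-bacteria ratio."""
    return bacteria_per_m3 / ML_PER_M3 * vbr


def air_volume_L(target_vlps, vlps_per_mL, recovery=1.0):
    """Litres of air needed to collect target_vlps; recovery=1.0 means 100% recovery."""
    return target_vlps / (vlps_per_mL * recovery) / ML_PER_L


def sampling_days(volume_L, flow_L_per_min):
    """Days of continuous sampling needed at a constant flow rate."""
    return volume_L / flow_L_per_min / MIN_PER_DAY


if __name__ == "__main__":
    for vbr in (10_000, 0.01):
        conc = viral_concentration_per_mL(1e6, vbr)
        volume = air_volume_L(1e12, conc)
        days = sampling_days(volume, 10.0)
        print(f"VBR {vbr}: {conc:g} VLPs/mL, {volume:.1e} L, {days:.2e} days")
```

Lowering `recovery` below 1.0 shows how quickly losses during extraction inflate the required air volume, since the volume scales inversely with recovery.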

Viral database

Following successful sampling, sample preparation, and sequencing, raw sequence data are subjected to QA/QC, assembled into contiguous sequences (an optional step), and finally annotated against a viral database. Databases for the bacterial 16S rRNA gene, such as Greengenes, SILVA, RDP, and EzTaxon-e, are well established and comprehensive, which makes the annotation step more straightforward when the 16S rRNA gene is used to study bacterial communities (DeSantis et al., 2006; Chun et al., 2007; Pruesse et al., 2007; Cole et al., 2009; Kim et al., 2012). Genomic marker regions (e.g., the 16S rRNA gene) are useful, as they are conserved regions that allow amplification of genomic material without knowledge of the target. Additionally, there is phylogenetic information for the 16S rRNA gene, so the relationship of a sequence to previously identified sequences can be determined without knowledge of its source, which is not the case for viruses.

Reference databases for viruses are limited due to the challenges of culturing viruses in the laboratory and performing whole genome sequencing (Breitbart & Rohwer, 2005; Edwards & Rohwer, 2005; Bibby, 2013). The most commonly used database for viral metagenomics is GenBank; however, it is estimated that much less than one percent of viruses have been cultured, sequenced, and uploaded to the database (Hendrix et al., 1999; Bibby, 2013). Another challenge associated with creating viral databases is that viruses must be isolated and subjected to whole genome sequencing (Rohwer & Edwards, 2002). Recently, Bibby and Peccia (2013) reported that c. 80% of viral samples collected from sewage sludge were unidentifiable. Continued improvements in identification and full genome sequencing of viruses will enhance viral databases, making metagenomics a powerful tool for studying viral community structure in both air and other environments.

Infectious viral particles

Another challenge and significant consideration when sampling the viral bioaerosol community is determining whether VLPs are still viable and infectious. Metagenomics is not able to distinguish between viable and inactivated viruses, as genomic material from both is sequenced. Environmental factors, such as humidity, temperature, and solar irradiation, have been shown to affect viral bioaerosol viability (Harper, 1961, 1963; Akers et al., 1966; Songer, 1967; Sagripanti & Lytle, 2007). Because viruses cannot be cultured in a laboratory setting without a living host (Mokili et al., 2012), additional approaches are required to demonstrate that VLPs are infectious. This information has implications for long-distance disease transmission and genetic transfer.

Future work needed for successful studies of viral bioaerosol communities

Viruses play an important role in the health of humans, animals, and plants, and they influence the global biogeochemical cycle by infecting bacteria. Viral bioaerosols are of particular concern due to their potential for intercontinental transport and ability to remain airborne for long periods of time in an indoor environment. Using metagenomics to study viral bioaerosol communities in indoor and outdoor environments holds considerable promise. Through metagenomics, researchers will be able to decipher changes in viral communities under different environmental conditions, relationships to other microbial communities, the effect of viruses on ecological processes, and the transport and dynamics of viruses relevant to human health and agricultural applications. Advances in sequencing technology have been breathtakingly rapid over the past decade. Sequencing is now faster and orders of magnitude cheaper, and costs continue to decrease, demonstrating the future potential of these approaches (Wetterstrand, 2014). Although there has been headway in the study of viral bioaerosols, much additional work and improvements are needed.

Improved sampling methods are crucial for exploration of viral communities in the atmospheric environment. Due to an extremely low biomass of viral particles in the air compared with other environments, current sampling technologies make it difficult to collect enough viral particles for effective metagenomic sequencing. Additionally, extraction efficiency is known to decrease with low biomass. One potential solution in an indoor environment is to use HVAC filters, which sample very large volumes of air. Researchers have successfully used HVAC filters to study microorganisms in indoor environments (Farnsworth et al., 2006; Noris et al., 2011; Qian et al., 2012). A potential sampling method for outdoor environments is to affix filters onto unmanned aerial vehicles (UAVs), which have been used to sample for pathogens in the atmosphere (Maldonado-Ramirez et al., 2005; Schmale et al., 2008; Techy et al., 2010). UAVs are capable of high volume flow rates of c. 500 m3 or 5 × 10^5 L of air per h, which would be expected to provide sufficient biomass for viral metagenomic sequencing.

In addition to improving sampling strategies for viral bioaerosols, databases to compare sequence data need improvement. Currently, only a very small fraction of all viral genomes are available in reference databases, resulting in a large fraction of ‘unidentified’ samples in a viral metagenomic analysis of any medium, whether air, water, or soil. Although the time and cost requirements for virus isolation are high, full genome sequencing of all viruses that can be cultured in the laboratory is necessary to advance viral metagenomics. Additionally, methods need to be developed that allow sequencing of the genomes of viruses that cannot currently be cultured in the laboratory. With improvements in methods and techniques, researchers will be able to begin to readily use metagenomics to explore viral communities and answer important questions about them.


Over the past decade, the community structure of bacterial and fungal bioaerosols has become a subject of intense scientific interest. The study of bacterial and fungal bioaerosol communities has been facilitated by the presence of well-conserved genes, comprehensive sequence databases, improvements in sequencing and costs, and an increased recognition of their importance. Although research has shown that many diseases of importance are caused by viral bioaerosols, interestingly, there has been very little work examining viral community structure in either indoor or outdoor air. There are many challenges associated with studying viral bioaerosols, including sample collection, low biomass, and lack of a conserved gene among viruses, among others. Metagenomics has made the study of viral community structures increasingly feasible. Viral bioaerosols provide an opportunity for future research, as they have been largely unexplored despite major implications for human, animal, and plant health, microbial ecology, and global biogeochemical and biodispersion processes.


This work was supported by the Alfred P. Sloan Foundation Grant 2013-5-19MBPF. The authors would like to acknowledge Robert M. Bowers for his helpful comments.