A unifying quantitative framework for exploring the multiple facets of microbial biodiversity across diverse scales


For correspondence. E-mail arthur.escalas@univ-montp2.fr; Tel. (+33) 4 67 14 92 27; Fax (+33) 4 67 14 37 19.


Recent developments in molecular tools have revolutionized our knowledge of microbial biodiversity by allowing detailed exploration of its different facets and generating an unprecedented amount of data. One key issue with such large datasets is the development of diversity measures that cope with different data outputs and allow comparison of biodiversity across different scales. Diversity has three components: local (α), regional (γ) and the overall difference between local communities (β). Current measures of microbial diversity, derived from several approaches, provide complementary but different views. They only capture the β component of diversity, compare communities in a pairwise way, consider all species as equivalent or lack a mathematically explicit relationship among the α, β and γ components. We propose a unified quantitative framework based on the Rao quadratic entropy that provides an additive decomposition of diversity (γ = α + β), so that the three components can be compared, and that integrates the relationships (phylogenetic or functional) among the Microbial Diversity Units that compose a microbial community. We show how this framework is suited to all types of molecular data, highlight crucial issues in microbial ecology that would benefit from it and propose ready-to-use R functions to easily set up our approach.


Through their large biomass and their diverse metabolic abilities, micro-organisms play a key role in the regulation of ecological processes, such as organic matter degradation, in all ecosystems (Sleator et al., 2008). However, the diversity of micro-organisms, which represents a large portion of all biological diversity, has been described only recently and partially, thanks to major advances in molecular methods; this is particularly true for bacteria and microbial eukaryotes (Fierer and Lennon, 2011). Indeed, the use of nucleic acid-based analyses has revealed that microbial diversity levels have been underestimated by several orders of magnitude over the past few decades (Ward et al., 1990; Rappe and Giovannoni, 2003; DeSantis et al., 2005). By improving our capacity to assess microbial community composition (i.e. the number and identity of taxonomical units) and structure (i.e. the abundance distribution among these units as well as their phylogenetic relatedness), the era of molecular microbiology already provides, and will increasingly provide, a better understanding of the ecological processes that shape these communities and underpin ecosystem functioning (Bell et al., 2005). To gain from this unprecedented mass of information, we need to unify the calculations of different diversity indices while considering the wide range of available and forthcoming data from genes to functions. Moreover, diversity indices should be able to quantify the biodiversity of microbial communities and its variations across spatial and temporal scales, and along environmental gradients (Christen, 2008).

Quantifying the diversity of ecological communities has become a multifaceted issue for all groups of organisms including microbes (Devictor et al., 2010). Species belonging to a community can differ in their abundance, their taxonomic affiliation, their phylogenetic relatedness along with their ecological functions (Bryant et al., 2008; Peter et al., 2011). These last three facets of biodiversity were coined as taxonomic, phylogenetic and functional diversity respectively. Beyond the simple assessment of community composition, in terms of microbial taxa, species or OTU (operational taxonomic unit) richness, the phylogenetic structure sheds light on evolutionary constraints and historical contingencies shaping microbial communities, while their functional structure indubitably relates to ecosystem functioning (Salles et al., 2009; Bissett et al., 2011; Peter et al., 2011; Pommier et al., 2012). In other words, while taxonomic diversity is the facet of diversity that provides information about how many and which microbes are present in the community, phylogenetic diversity informs about their evolutionary history, and functional diversity quantifies the breadth of roles that they play. However, the current indices used to estimate the level of microbial diversity are mainly restricted to community composition and rarely embrace the different facets of community structure by including phylogenetic and functional information along with the distribution of abundances among lineages or functional groups (Lozupone et al., 2007; Bryant et al., 2008). This is even more critical as environmental filtering may operate at the level of lineages instead of isolated species, and specific microbial processes may rely on a single phylotype or functional group (Bryant et al., 2008; Peter et al., 2011). 
This could be the case, for example, if a niche-based process selects lineages or species according to biological traits associated with specific phylogenetic groups (Pommier et al., 2012).

Measuring the different facets of biodiversity (taxonomic, phylogenetic and functional) while taking scale effects into account is fundamental when quantifying microbial diversity from a biogeographical perspective or tackling macroecological issues (Odonnell et al., 1994; Christen, 2008; Fierer and Lennon, 2011). This could benefit the development of theories and hypotheses regarding the factors that structure microbial communities, their response to environmental pressures and the connections between diversity and function (Kembel, 2009; Stegen et al., 2012). Surprisingly, only a few microbiological studies have used Whittaker's historical biogeographical framework (Whittaker, 1960; 1972), along with its decomposition of biodiversity into local (α), inter-site (β) and regional (γ) components (Griffiths et al., 2011). Here, the term ‘decomposition of biodiversity’ refers to the idea that there are different components (or levels) that, together, constitute the biodiversity at a certain scale. This approach is particularly useful, for instance, in biogeography, where one wants to estimate the biodiversity across a landscape composed of different localities inhabited by different biological communities. Hence, the biodiversity estimated in each locality corresponds to the α level, the difference between these localities is the β level and the overall biodiversity across all localities is the γ level.

The estimation of these biodiversity components can be achieved either with indices related to community composition (species richness) or with more sophisticated indices embracing the different facets of community structure. Recently, Lozupone and Knight (2008) reviewed the indices designed to estimate separately the α and β components of Whittaker's decomposition for different types of data. While the use of these indices has led to valuable insights in microbial ecology and biogeography, such as the existence of universal patterns (e.g. taxa–area relationship and community assembly rules; Fuhrman, 2009), in our view they exhibit several limitations. Indeed, classical indices dedicated to the estimation of community differences (e.g. Jaccard, Sorensen and Bray–Curtis) only capture the β component of diversity, leaving aside the α and γ components. They also often compare communities in a pairwise way and consider all species as equivalent. Similarly, recent indices that include species abundance distributions or phylogenetic or functional differences between species (e.g. UniFrac, VAW-UniFrac and βMNTD) only account for the β component of diversity (Lozupone et al., 2007; Kembel, 2009; Chang et al., 2011; Stegen et al., 2012). This can be a limiting view of biodiversity across scales, as estimating differences between communities (β-diversity) does not provide any information about the biodiversity of local communities (α) or of the whole system (γ). For instance, for a given level of β-diversity estimated between two communities using the Bray–Curtis or the UniFrac dissimilarity index, these two communities can have either a low or a high level of α-diversity, and these two scenarios cannot be discriminated. In this case, the estimation of biodiversity is only partial, and the β-diversity value would need to be compared with the α and γ values to fully assess the biodiversity across the system.
Moreover, the α, β and γ components are often estimated using different and independent indices, which prevents any mathematically explicit relationship among them and thus any comparison, contrary to Whittaker's framework. Finally, no consensus has emerged about how and in which cases these indices should be used given the earlier-mentioned limitations. We thus urgently need to improve our way of analysing the different facets of microbial diversity across scales through a unified and flexible framework that allows us (i) to estimate β-diversity for a set of two or more communities, (ii) to estimate simultaneously the three components using the same unit, (iii) to compare the three components using a decomposition of γ-diversity and (iv) to integrate data of different natures (e.g. abundance, taxonomy, phylogeny and function). New indices recently developed for biogeography and community analyses of macro-organisms may fill this gap and may successfully be applied to the microbial world (Allen et al., 2009; Ricotta and Szeidl, 2009; Devictor et al., 2010).
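
The limitation of pairwise dissimilarity indices discussed above (the same β value for communities with very different local diversity) can be illustrated with a minimal sketch. The abundance vectors are invented for illustration, and `bray_curtis` and `richness` are hypothetical helper names written here, not functions from any package.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(x) + sum(y)
    return numerator / denominator


def richness(x):
    """Number of units with non-zero abundance (a simple local diversity measure)."""
    return sum(1 for a in x if a > 0)


# Invented abundances: one pair of species-poor and one pair of species-rich communities.
poor_pair = ([3, 1], [1, 3])
rich_pair = ([3, 1, 3, 1, 3, 1], [1, 3, 1, 3, 1, 3])

bc_poor = bray_curtis(*poor_pair)
bc_rich = bray_curtis(*rich_pair)

print("Bray-Curtis:", bc_poor, bc_rich)
print("Richness per community:", richness(poor_pair[0]), richness(rich_pair[0]))
```

Both pairs yield the same dissimilarity although the local richness differs threefold, which is why a β value alone does not describe biodiversity across the system.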

Here we aim to (i) briefly review the different kinds of methods used to assess the different facets of biodiversity in microbial communities and classify the generated data; (ii) propose a unified, flexible and multifaceted framework to estimate microbial diversity based on taxonomic, phylogenetic or functional data and across temporal and spatial scales; (iii) present some possible applications of this framework in microbial ecology and finally (iv) provide appropriate ready-to-use resources for organizing and analysing multiple microbiological data.

Measuring the diversity of microbial communities: different methods for different data

We first present the available methods to study the different facets of microbial biodiversity along with their advantages, disadvantages, some key considerations for their application and the associated data (Table 1).

Table 1. Summary of the most widely used molecular methods to assess the diversity of microbial communities and their associated data.

Columns. Method type. Method characteristics: diversity source, amplification, output, sample throughput, cost, advantages, disadvantages. Nature of Microbial Diversity Units (MDUs): nucleic acid fragments, taxa, phylogenetic groups, functions, relative abundance.

Nucleic acid-based fingerprints
  Diversity source: Group-specific gene; Functional encoding gene
  Amplification: PCR
  Output: Band/peak pattern
  Sample throughput: Medium-high
  Cost: +
  Advantages: Quick and easy to set up
  Disadvantages: Limited resolution; Restricted sampling effort
  Nature of MDUs: (✓)(✓)(✓)(✓)

Gene-based cloning and sequencing
  Diversity source: Group-specific gene
  Amplification: PCR
  Output: Sequences
  Sample throughput: Low
  Cost: +++
  Advantages: Adaptable sampling effort; The highest resolution and coverage (reads >1000 bp)
  Disadvantages: Cloning step and associated costs and quantitative biases; Focus on a specific sequence; Reliance on existing databases
  Nature of MDUs: (✓)(✓)

Shotgun cloning and sequencing
  Diversity source: Metagenome
  Amplification: New amplification methods; No amplification
  Output: Sequences
  Sample throughput: Low
  Cost: +++
  Advantages: Adaptable sampling effort; Long reads; Whole metagenome screening
  Disadvantages: Cloning step and associated costs and quantitative biases; Reliance on existing databases

Next generation sequencing
  Diversity source: Metagenome
  Amplification: New amplification methods; No amplification
  Output: Sequences
  Sample throughput: High
  Cost: +++
  Advantages: Adaptable sampling effort; No cloning step; Whole metagenome screening
  Disadvantages: Potential diversity overestimation; Reliance on existing databases; Short reads; Sample throughput limited by the cost

Microarrays
  Diversity source: Group-specific gene; Functional encoding gene
  Amplification: New amplification methods; PCR; No amplification
  Output: Array image
  Sample throughput: High
  Cost: ++++
  Advantages: Adaptable and standardized sampling effort; Whole metagenome screening
  Disadvantages: Only predefined genes can be detected; Reliance on existing databases; Sample throughput limited by the cost

  1. Ticks in brackets correspond to method–data associations that are not unanimously accepted in the literature. See text section 1 for further details on the methods and data type, definition of ‘group-specific gene’, ‘functional encoding gene’ and the corresponding references.

Methods for studying the different facets of biodiversity in microbial communities

The presented methods were selected based on their potential to be used for extensive biodiversity studies. They also need to be well established in the literature, be independent of cultivation, allow the characterization of microbial biodiversity at the community level, and allow the analysis and comparison of a large number of samples in a standardized and reproducible way. Moreover, in order to be used with the proposed framework, these methods should provide a table output depicting the microbial composition (in rows) of the studied communities (in columns). Based on these criteria, we set aside several methods used to assess microbial diversity, such as the family of fluorescent in situ hybridization methods, because of their sample size limitation and low resolution, and the community-level physiological profile methods such as Biolog Ecoplates, because they are not culture-independent and are biased by inoculum variability (Kirk et al., 2004). Hence, we focus on high-throughput nucleic acid-based methods even if they have well-known limitations, such as those of the nucleic acid extraction step (Petric et al., 2011). The following section briefly reviews some key characteristics of the chosen methods, along with their advantages and disadvantages.

One of the most common approaches to studying the diversity of microbial communities is ribosomal RNA (rRNA) gene analysis using fingerprinting methods. These fingerprint techniques are based on the electrophoretic separation of PCR products (or amplicons) amplified from nucleic acid extracted from a sample (Nocker et al., 2007). The separation step can be performed by gel electrophoresis, chromatography or capillary electrophoresis, and is based on amplicon length (T-RFLP, ARISA and LH-PCR) or nucleotide composition (SSCP, DGGE and DHPLC). For each community (i.e. sample), the resulting output is a profile specific to the studied community with respect to the migration distance and relative intensity of each band or peak, which theoretically corresponds to a unique sequence (Loisel et al., 2006). All fingerprints are affected by the same nucleic acid extraction and PCR biases (unspecific amplification, generation of chimeric sequences, formation of heteroduplexes and nucleotide misincorporation), and are inherently limited in the maximum number of microbial units detected (Von Wintzingerode et al., 1997; Loisel et al., 2006). Moreover, the diversity estimation is biased by migration issues leading to comigration of multiple amplicons (up to 15) under one band or peak, or to the formation of several bands or peaks for one amplicon (Kisand and Wikner, 2003). For detailed comparisons of fingerprint methods, see Kirk and colleagues (2004) and Nocker and colleagues (2007).

During the last decade, the development of metagenomic tools with even higher resolution has revolutionized the description of biodiversity in microbial communities. The term metagenomics refers to the culture-independent analysis of the complete genomes of microbial communities, directly isolated from an environmental sample (water, soil and air) or living on plant or animal hosts (Sleator et al., 2008; Petrosino et al., 2009; Metzker, 2010). The metagenomic methods considered here (sequencing and microarrays) offer the highest throughput and resolution for microbial diversity assessment and could avoid biases introduced during PCR amplification of marker genes (von Mering et al., 2007).

Historically, the most standard metagenomic approach is gene-based cloning and sequencing, which involves the cloning and sequencing of amplicons using the Sanger method (Suenaga, 2012). This chain-termination method (Sanger et al., 1977), improved and still in use, produces high-quality reads (sequences) up to 1000 bp and has a wider coverage of the targeted sequences and a better resolution than other sequencing approaches (Xiong et al., 2010). On the other hand, the cloning step can be time consuming and laborious, reducing the sample throughput or leading to arbitrary loss of genomic DNA (Metzker, 2010; Zinger et al., 2012). The use of direct pyrosequencing of amplified fragments can avoid the cloning step, but the reads are shorter with this method, which makes the process of genome assembly more difficult (Fierer and Lennon, 2011). Moreover, gene-based sequencing (with or without cloning) is restricted to PCR-amplified sequences and so is affected by PCR biases. Overall, the limitation here lies in the reliability of relative abundance distributions for detected sequences, because the extraction, amplification and cloning of nucleic acid may introduce quantitative biases. These gene-based sequencing approaches are progressively being replaced by whole metagenomic sequencing, which produces reads from the whole metagenome and could benefit from the development of new amplification methods such as whole genome amplification or emulsion PCR (Shendure and Ji, 2008; Petrosino et al., 2009). Whole metagenomic DNA can be randomly sheared before cloning and sequencing, as in the shotgun approach, or can be directly sequenced using next generation sequencing (NGS) methods (Petrosino et al., 2009; Roh et al., 2010); for a detailed comparison of NGS methods, see Shendure and Ji (2008) and Metzker (2010).
These whole metagenomic sequencing approaches generate thousands of short sequences (reads) that can be assembled in longer sequences using bioinformatic automated pipelines (Hirsch et al., 2010; Santamaria et al., 2012). Recent developments in high-throughput sequencing are limited by the computing step, but rapid improvements are underway (von Mering et al., 2007; Metzker, 2010).

Another recently improved metagenomic approach is the microarray, a glass slide on which DNA fragments are spotted to serve as probes for the hybridization of labelled metagenomic DNA. The intensity of the fluorescent label is then measured, which reveals the presence of hybridized DNA on the slide (Zhou, 2003). Microarrays overcome some of the restrictions of other methods, such as the low resolution of fingerprint methods and the laborious cloning step of some sequencing approaches, and avoid the amplification step through direct hybridization of metagenomic DNA on the slides (Roh et al., 2010). However, microarrays only detect already sequenced genes, as the probes spotted on the slides are designed using reference databases (Zhou, 2003; Hirsch et al., 2010).

For detailed comparisons of metagenomic approaches see Metzker (2010), Roh and colleagues (2010) and Su and colleagues (2012).

Markers to assess the biodiversity of microbial communities

Basically there are two categories of marker genes, group-specific genes and functional encoding genes (Stahl, 2007). Group-specific genes (taxonomic or phylogenetic) correspond to conserved biopolymers that can be used to infer taxonomic or phylogenetic relationships among the organisms. Functional encoding genes code for specific proteins and could be used to evaluate specific chemical transformations or potential activity of microbial populations (Stahl, 2007).

The taxonomic and phylogenetic relationships between prokaryotes can be deduced from sequence comparisons of conserved group-specific genes. To do so, these genes should be widely distributed, should not be frequently transmitted horizontally and should be present as a single copy. In addition, they should be short enough to be easily amplified and sequenced, but long enough to contain sufficient information. Moreover, they should have a ‘good’ level of resolution, that is, they should be neither too conserved nor too variable (Gevers and Coenye, 2007). The group-specific markers most commonly used in microbial ecology are the rRNA genes because they are universally present, functionally constant and composed of both conserved and more variable domains (Vos et al., 2012). Modern microbial taxonomy and phylogeny benefit from the PCR sequencing of the genes coding for the small (16S) or large (23S) rRNA molecules (Gevers and Coenye, 2007; Stahl, 2007). However, other genes have been used as group-specific marker genes to delineate relationships between micro-organisms, such as recA, gyrB, rpoB, rpoD, hsp60, sodA, atpD and infB (Gevers and Coenye, 2007; Bonilla-Rosso et al., 2012; Vos et al., 2012).

Functional encoding genes have been used as markers for studying the diversity of microbial functions in different nucleic acid-based approaches such as fingerprints (Hirsch et al., 2010), microarrays (He et al., 2008) or various sequencing approaches (Dinsdale et al., 2008; Carvalhais et al., 2012).

In any of the nucleic acid-based methods presented earlier, the target molecule can be either DNA or RNA, the latter being the transcribed version of the former, in which case reverse transcription is required to generate cDNA from RNA (Stahl, 2007). From an ecological perspective, DNA and RNA targets provide different views of the microbial communities, which correspond, respectively, to metagenomic and metatranscriptomic approaches. When using a group-specific marker and its corresponding DNA and RNA versions, 16S rDNA and 16S rRNA for instance, we can differentiate present versus active bacteria respectively (Gremion et al., 2003). In the case of functional encoding genes, one can differentiate present (potential) versus expressed (realized) functions depending on the use of DNA or mRNA respectively (Sleator et al., 2008; Roh et al., 2010). Note that total environmental RNA extracted from microbial communities is mainly composed of rRNA and transfer RNA, with approximately 1–5% mRNA (Carvalhais et al., 2012).

Data generated and associated microbial biodiversity facets

The data outputs of selected methods are represented in Fig. 1. In the rest of the article, we will use the generic term Microbial Diversity Unit (MDU) to refer to the different diversity units that compose a microbial community. These MDUs can correspond to different facets of biodiversity: the diversity of nucleic acid fragments, taxa, phylogenetic lineages or functions depending on the method and the genetic marker.

Figure 1.

The different ways to obtain data for the different facets of microbial biodiversity using the methods presented in Table 1. See text and references therein for a detailed description of the different steps presented here.

The data obtained using fingerprint methods can be represented as a presence–absence matrix for each MDU in each sample (Fig. 1). Although there is still an ongoing debate about how to name the units detected by fingerprints (e.g. OTU, biotype, ribotype, genotype, phylotype and ribosomal genotype), all these terms refer to the same entities. They are in fact nucleic acid fragments (amplicons) that are discriminated in various ways, depending on the method, to give a snapshot of the complexity of the microbial community. Consequently, in the case of fingerprints, the generic term MDU will refer to these entities, and the associated diversity will be called ‘fingerprint's nucleic acid fragments diversity’. Considering the earlier-mentioned PCR biases, there is still a debate on whether band intensities, peak heights or areas can be used as relative abundances of MDUs (Bent et al., 2007). Taking into account the limited resolution of the method and the earlier definition of MDU, it is important to note that these MDUs may not correspond to any taxonomic or phylogenetic group. Hence, the resolution of fingerprint methods is limited, as they do not provide any clues about which microbes compose the community. However, it is possible to collect and sequence the MDUs (e.g. bands in DGGE) to determine their taxonomic affiliation. This has allowed the organism of interest to be identified in many experiments, but the approach remains complicated in practice and is still considered limited (Zinger et al., 2012). The amplified fragment may also correspond to a functional encoding gene (mer, amoA, nifH, nosZ, mcrA, etc.). In such a case, what is estimated is the diversity of the gene encoding the function within the community, not the whole diversity of community functions, because fingerprints can analyse only one gene at a time (Stahl, 2007; Hirsch et al., 2010).

Measuring the taxonomic diversity of micro-organisms requires their identification and classification into the different levels of the taxonomic hierarchy, such as genus, family, order and phylum (Odonnell et al., 1994; Santamaria et al., 2012). This approach is still widely used to study microbial diversity, but the affiliation of microbes to taxonomic levels is no longer based on their phenotypic characteristics but rather on their genetic similarity with known taxa (Huse et al., 2008). This similarity is estimated using nucleotide sequences of group-specific marker genes (usually the 16S rRNA gene) present in the community metagenome (Odonnell et al., 1994). The sequences can be obtained using the sequencing approaches presented earlier and can then be compared with reference databases to obtain their taxonomic affiliation (Christen, 2008; Santamaria et al., 2012; Fig. 1). The description of this step (called binning) is beyond the scope of this work (but see Kunin et al., 2008; Santamaria et al., 2012 for further details). Although there are some quantitative biases associated with the nucleic acid extraction, PCR and cloning steps, metagenomic approaches provide the relative abundances of detected MDUs (i.e. taxa or OTUs).

The taxonomic diversity of microbial communities can also be estimated using microarrays designed with probes corresponding to group-specific marker genes of different microbial taxa. For instance, the PhyloChip G3 is a microarray able to detect more than 60 000 different MDUs (i.e. taxa) (Kellogg et al., 2012). The probes hybridized on the array provide the MDU composition of the community, and their hybridization intensities can be linked to the relative abundances of MDUs to fully assess the quantitative structure of microbial communities, i.e. the abundance distribution among MDUs (Fig. 1; DeSantis et al., 2005; Handley et al., 2012). The resulting data are a table with detected MDUs in rows and the studied samples in columns (Fig. 1).
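
As an illustration of the MDU-by-sample table described above, the following minimal sketch converts a raw signal table into relative abundances and into a presence–absence matrix. The values, the MDU and sample names, and the helper functions `to_relative_abundance` and `to_presence_absence` are all hypothetical; treating any non-zero signal as a detection is an assumption made for simplicity, not a rule from the methods discussed here.

```python
# Hypothetical raw signal (e.g. read counts or hybridization intensities):
# rows are MDUs, columns are samples.
mdus = ["MDU_1", "MDU_2", "MDU_3"]
samples = ["site_A", "site_B"]
raw = [
    [30.0, 0.0],
    [50.0, 10.0],
    [20.0, 10.0],
]


def to_relative_abundance(table):
    """Divide each column by its total so that every sample sums to 1."""
    n_cols = len(table[0])
    totals = [sum(row[j] for row in table) for j in range(n_cols)]
    if any(total == 0 for total in totals):
        raise ValueError("a sample with no signal cannot be normalized")
    return [[row[j] / totals[j] for j in range(n_cols)] for row in table]


def to_presence_absence(table):
    """Code each cell as 1 (detected) or 0 (not detected); assumes signal > 0 means detected."""
    return [[1 if value > 0 else 0 for value in row] for row in table]


relative = to_relative_abundance(raw)
presence = to_presence_absence(raw)

for name, rel_row, pa_row in zip(mdus, relative, presence):
    print(name, rel_row, pa_row)
```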

The investigation of phylogenetic diversity of microbial communities has increased since the development of metagenomic methods, providing deeper insight into the processes that influence their composition and structure (Martin, 2002; Christen, 2008). To assess the phylogenetic diversity of microbial communities, one needs to know the phylogenetic relatedness between the microbes present in the community. A first way is to determine the taxonomic diversity of the community using the sequencing approaches described previously and then to assign the identified MDUs to lineages of a reference phylogenetic tree (Petrosino et al., 2009; Liggenstoffer et al., 2010). Phylogenetic relatedness between MDUs can also be obtained by comparing directly their respective sequences with referenced sequences (Kembel et al., 2011). As in the case of taxonomic MDUs, these approaches provide the relative abundances of the phylogenetic MDUs (i.e. the leaves of the phylogenetic tree).

The phylogenetic diversity of microbial communities can also be estimated using microarrays. Indeed, the list of positive hybridized probes (positive MDUs) can be used to prune the phylogenetic tree relating the entire spotted probes on the array. This gives the tree relating only the MDUs composing the studied community (Holmes et al., 2010).

In both cases (sequencing and microarrays), the resulting dataset is a table containing the detected MDUs and the studied samples, associated with a phylogenetic tree or a phylogenetic distance matrix depicting the relatedness among MDUs (Fig. 1).
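
The restriction of a reference phylogeny to the MDUs actually detected can be sketched on a pairwise distance matrix. This is a simplified stand-in for pruning a tree: it assumes the distances are path lengths between leaves of the reference tree, which are unchanged when other leaves are removed. The probe names, the distance values and the function `prune_distance_matrix` are hypothetical.

```python
def prune_distance_matrix(labels, dist, detected):
    """Keep only the rows and columns of a square distance matrix for detected MDUs."""
    missing = set(detected) - set(labels)
    if missing:
        raise KeyError(f"detected MDUs absent from the reference: {sorted(missing)}")
    keep = [i for i, name in enumerate(labels) if name in detected]
    kept_labels = [labels[i] for i in keep]
    kept_dist = [[dist[i][j] for j in keep] for i in keep]
    return kept_labels, kept_dist


# Hypothetical reference distances among all probes spotted on an array.
all_probes = ["probe_1", "probe_2", "probe_3", "probe_4"]
reference = [
    [0.0, 0.2, 0.6, 0.8],
    [0.2, 0.0, 0.6, 0.8],
    [0.6, 0.6, 0.0, 0.8],
    [0.8, 0.8, 0.8, 0.0],
]

# Probes with a positive hybridization signal in the studied community.
positive = {"probe_1", "probe_3", "probe_4"}

community_labels, community_dist = prune_distance_matrix(all_probes, reference, positive)
print(community_labels)
print(community_dist)
```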

The functional diversity of micro-organisms is now widely considered as the biodiversity component underpinning ecosystem functioning (Christen, 2008). Its estimation differs drastically depending on the size of the studied organisms. Indeed, for macro-organisms (e.g. plants, fishes and birds), the functions performed by each species or population are usually approximated by measuring functional traits on individuals, from which the functional diversity of communities is then calculated (Mendez et al., 2012; Villéger et al., 2012). Even if this approach is possible for some groups of micro-organisms such as zooplankton or flagellates, it becomes more difficult to set up as the size of the studied organisms decreases (Barnett et al., 2007; Kruk et al., 2010). When communities are composed of bacterial, archaeal or microbial eukaryote populations, this species-centred approach is currently impossible because we cannot separate all the populations that compose a community to measure their specific functional traits. In the near future, the development of technologies such as MicroFISH, NanoSIMS and flow cytometry may allow both identity and functions to be assigned simultaneously to specific microbes forming natural communities (Amann and Fuchs, 2008). However, today, the functional diversity of certain groups of micro-organisms such as bacteria, archaea and microbial eukaryotes is only assessed at the community level using nucleic acid-based methods (Dinsdale et al., 2008; He et al., 2008; Yavitt et al., 2012). Metagenomic data provide extensive information about gene content and potential functions, and metatranscriptomic data indicate which genes may be expressed. Whatever the approach, the functional data are obtained as lists of functional encoding genes associated with the whole community (Fig. 1).
These genes can, however, be treated as discrete functional units (MDUs) composing the functional diversity of microbial communities, in exactly the same way as taxa compose their taxonomic diversity. This diversity of functions can be obtained using whole metagenomic sequencing approaches by comparing the obtained sequences with databases containing sequences of functional genes (Dinsdale et al., 2008; Prakash and Taylor, 2012). As is the case for taxa or phylogenetic groups, relative abundances of these functional MDUs can be extracted, providing a full quantitative assessment of functional community structure. Similarly, using probes corresponding to sequences of functional genes, microarrays can provide the functional diversity of microbial communities along with the relative abundances of detected functions (He et al., 2008).

The resulting dataset is a table containing the detected MDUs in rows and the studied samples in columns (Fig. 1). In the near future, we may use functional data in the same way as we use phylogenetic data, as is done for macro-organisms. Then, by combining a taxonomic MDU composition table with a functional matrix depicting the functions performed by each MDU, we will be able to assess microbial functional diversity using the tools primarily developed for macro-organisms (Mouchet et al., 2010).

Decomposing the biodiversity into α, β and γ components

Biodiversity across scales

Biodiversity is classically decomposed across temporal and spatial scales into three levels considered as components: (i) local diversity (α), (ii) regional diversity (γ) and (iii) the difference among local communities (β). β-diversity is also referred to as differentiation diversity and turnover (Whittaker, 1960; Vellend, 2001; Jurasinski et al., 2009; Anderson et al., 2011). Although these last two terms are often used synonymously in the literature, they actually apply to different concepts (Tuomisto, 2010a,b; Anderson et al., 2011). ‘Differentiation diversity’ refers to the variation in community structure regardless of any external gradient and is often estimated using (dis)similarity or distance estimators (Jurasinski et al., 2009; Anderson et al., 2011). ‘Turnover’ can be defined as a directional (along a gradient) pairwise estimation of change in community structure (according to Jurasinski et al., 2009 and Anderson et al., 2011, but criticized in Tuomisto, 2010a,b). To prevent ambiguity, we use the generic term β-diversity throughout the paper to refer to between-community diversity based on an additive partitioning of biodiversity components.

Since Whittaker's seminal works on β-diversity (Whittaker, 1960; 1972), the number of proposed indices to quantify the three components of diversity has drastically increased (Koleff et al., 2003; Tuomisto, 2010a,b; Anderson et al., 2011). β-diversity indices can be divided into two classes according to whether they are based on a dissimilarity metric or deduced from the decomposition of diversity into α, β and γ components. In the first case, β-diversity is simply estimated as a pairwise intercommunity distance using a chosen dissimilarity metric (e.g. Sorensen, Jaccard or Bray–Curtis dissimilarity). This is achieved regardless of α and γ components (Koleff et al., 2003; Zinger et al., 2012). An important limitation of this dissimilarity-based approach to β-diversity is that the same intercommunity dissimilarity value (e.g. calculated with the Bray–Curtis index) may be obtained for two pairs of communities that have different local diversity values (estimated with another index). In the second case, diversity is decomposed into α, β and γ components, all being related within an additive framework, β = γ − ᾱ, or a multiplicative framework, β = γ/ᾱ, in which ᾱ corresponds to the mean local diversity across samples (Whittaker, 1960).

Additive decomposition of the Rao quadratic entropy

The Rao quadratic entropy (Q) is a measure of diversity that combines species-relative abundances and pairwise interspecies differences. By combining these two features of diversity, this index measures the community structure rather than its composition. This approach therefore complements the classic estimation of diversity using indices based on species richness, evenness or community composition (i.e. basically who is present in the community). In the context of microbial ecology, species can be replaced by any MDUs, such as phylotypes, OTU, taxa, species or functional genes, according to the method used (Table 1).

Here, we propose to use additive partitioning of the Rao quadratic entropy, which has several valuable properties in comparison to independent α, β and γ diversity estimations (Rao, 1982; Ricotta and Szeidl, 2009). Additive partitioning has the advantage, over its multiplicative counterpart, of expressing the three components (α, β and γ) in the same unit (phylotype, taxa, OTU, functional genes, microbial unit, etc.) so they can be compared directly (Lande, 1996). Another advantage of this framework is that α-diversity values do not influence the calculation of β-diversity values (Jost, 2007; 2010). The additive property also enables the calculation of the relative contributions of α- and β-diversity to the γ-diversity and, in doing so, the comparison of their values among multiple scales and studies (Lande, 1996). Finally, using this framework, β-diversity can be estimated globally for a set of communities or between pairs of communities.

At the local scale, Rao quadratic entropy Qα represents the expected dissimilarity between two randomly chosen MDUs from a sampled community and hence can be defined as the extent of dissimilarity between MDUs in a community (e.g. the phylogenetic distance between taxa in a community):

$$Q_{\alpha} = \sum_{i=1}^{s}\sum_{j=1}^{s} d_{ij}\, p_i\, p_j \tag{1}$$

where dij is the distance (taxonomic, phylogenetic or functional) between the i-th and the j-th MDUs in the local community; distances need to be ultrametric to ensure that Q increases monotonically with richness (Pavoine et al., 2005). In an ultrametric tree, the branch lengths are scaled in such a way that all distances from the root to the tips (or leaves) of the tree (MDUs in our case) are the same (Vellend et al., 2010). When these distances are unknown, dij can be set to unity. pi and pj are the relative local abundances of the i-th and the j-th MDUs respectively; pi and pj can be set equal for presence–absence data. s is the number of MDUs in the local community.
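As a minimal sketch of the local index, assuming Qα is the full double sum of dij·pi·pj over all pairs of MDUs with dii = 0 (the reading consistent with an expected dissimilarity between two randomly chosen MDUs), the calculation can be written in a few lines of Python. The names rao_q and unit_distances and the example abundances are illustrative; they are not taken from Appendix S1.

```python
def rao_q(p, d):
    """Rao quadratic entropy: sum over all i, j of d[i][j] * p[i] * p[j]."""
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("relative abundances must sum to 1")
    s = len(p)
    return sum(d[i][j] * p[i] * p[j] for i in range(s) for j in range(s))


def unit_distances(s):
    """Distances for unknown relationships: d_ij = 1 if i != j, else 0."""
    return [[0.0 if i == j else 1.0 for j in range(s)] for i in range(s)]


# Three equally abundant MDUs with unknown relationships (d_ij = 1):
# Q reduces to 1 - sum(p_i^2) = 2/3.
q_even = rao_q([1 / 3, 1 / 3, 1 / 3], unit_distances(3))

# Same three MDUs with uneven abundances: Q drops to 1 - 0.66 = 0.34.
q_uneven = rao_q([0.8, 0.1, 0.1], unit_distances(3))
```

With unit distances the index depends only on the abundance distribution, which is why presence–absence data (equal pi) can be handled by the same function.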

At the regional scale γ, sampled communities are pooled together into a single regional community. The Rao quadratic entropy at this regional scale Qγ can be defined as the extent of dissimilarity between two randomly chosen MDUs in the regional community.

$$Q_{\gamma} = \sum_{i=1}^{S}\sum_{j=1}^{S} d_{ij}\, P_i\, P_j \tag{2}$$

where dij is the distance (taxonomic, phylogenetic or functional) between the i-th and the j-th MDUs in the regional community. Pi and Pj are the relative regional abundances of the i-th and the j-th MDUs respectively. S is the number of MDUs in the regional community. The relative regional abundance of each MDU is commonly quantified as its mean relative abundance over local communities. As for the quantification of local diversity, dij, Pi and Pj can be set to a single value when information is missing, e.g. dij = 1 or Pi = Pj = 1/S.

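The mean intracommunity (ᾱ) quadratic entropy Qᾱ is simply the mean of the local quadratic entropy values across the studied communities, indexed by k. The local quadratic entropy (Qα) of each community can be weighted by a parameter wk or not (see de Bello et al., 2010 for more details). For instance, this parameter could correspond to local community abundances:

$$Q_{\bar{\alpha}} = \sum_{k} w_k\, Q_{\alpha,k} \tag{3}$$

where Qα,k is the local quadratic entropy of the k-th community and the weights wk sum to one (equal weights in the unweighted case).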

Subtracting the mean local quadratic entropy (Qᾱ) from the regional quadratic entropy (Qγ) quantifies the β component of the quadratic entropy (Qβ) within an additive framework, and hence the intercommunity diversity (Ricotta and Szeidl, 2009; de Bello et al., 2010):

$$Q_{\beta} = Q_{\gamma} - Q_{\bar{\alpha}} \tag{4}$$
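The additive decomposition described above (mean local entropy, regional entropy computed from pooled abundances, and their difference) can be sketched as follows. This is an illustration under stated assumptions rather than the function of Appendix S1: the name additive_partition is hypothetical, regional abundances are taken as the (weighted) mean of local relative abundances, and equal weights are used when none are supplied.

```python
def rao_q(p, d):
    """Rao quadratic entropy: sum over all i, j of d[i][j] * p[i] * p[j]."""
    s = len(p)
    return sum(d[i][j] * p[i] * p[j] for i in range(s) for j in range(s))


def additive_partition(communities, d, weights=None):
    """Return (mean alpha, beta, gamma) entropies for a set of communities.

    communities: relative-abundance vectors, all over the same regional MDU list.
    weights: optional community weights summing to 1 (assumption: equal if None).
    """
    n = len(communities)
    if weights is None:
        weights = [1.0 / n] * n
    s = len(d)
    # Regional abundance of each MDU = weighted mean of its local abundances.
    regional = [sum(w * com[i] for w, com in zip(weights, communities))
                for i in range(s)]
    q_gamma = rao_q(regional, d)
    q_alpha_mean = sum(w * rao_q(com, d) for w, com in zip(weights, communities))
    return q_alpha_mean, q_gamma - q_alpha_mean, q_gamma


# Two communities, two equally abundant MDUs each, none shared, d_ij = 1.
d = [[0.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
com_1 = [0.5, 0.5, 0.0, 0.0]
com_2 = [0.0, 0.0, 0.5, 0.5]
q_alpha_mean, q_beta, q_gamma = additive_partition([com_1, com_2], d)
# q_alpha_mean = 0.5, q_gamma = 0.75, q_beta = 0.25
```

Because the three values are on the same scale, q_beta / q_gamma can be read directly as the share of regional entropy due to differences among communities.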

Standardized indices

Recent studies show that many diversity indices, including the Rao quadratic entropy, might have counterintuitive ecological properties (Jost, 2007; Ricotta and Szeidl, 2009; de Bello et al., 2010). Indeed, when α-diversity increases, the β-diversity decreases and approaches zero, even in cases where there are no shared species between sampling units. Consequently, estimated β-diversity would be low regardless of the actual species overlap and the change in diversity across sampling units (Jost, 2007). We therefore applied the correction proposed by Jost (2007), derived from equivalent numbers (see de Bello et al., 2010 for further details). By definition, the equivalent number of species is the number of maximally dissimilar species having equal abundance, which produces maximal entropy. Thus, by replacing Qᾱ and Qγ by their equivalent numbers in Eq. (4), we obtain the unbiased measures of intracommunity, regional and intercommunity diversity as follows (Ricotta and Szeidl, 2009):

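$$Q_{\bar{\alpha}(\mathrm{corrected})} = \frac{1}{1 - Q_{\bar{\alpha}}} \tag{5}$$
$$Q_{\gamma(\mathrm{corrected})} = \frac{1}{1 - Q_{\gamma}} \tag{6}$$
$$Q_{\beta(\mathrm{corrected})} = Q_{\gamma(\mathrm{corrected})} - Q_{\bar{\alpha}(\mathrm{corrected})} \tag{7}$$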

It is worth noting that the correction is applied to Qᾱ and not to the local Qα values, so the relationship that makes Qᾱ the mean of the local Qα values is lost (see de Bello et al., 2010 for more details).

To quantify the relative proportion of α and β components of diversity within the γ diversity, the corrected Qβ(corrected) component of quadratic entropy can be expressed as the percentage of the corrected Qγ(corrected) component (total regional diversity):

$$Q_{\beta\mathrm{st}} = \frac{Q_{\beta(\mathrm{corrected})}}{Q_{\gamma(\mathrm{corrected})}} \times 100 \tag{8}$$
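To make the effect of the correction concrete, the sketch below assumes that the equivalent number of an entropy value Q is 1/(1 − Q), which requires distances scaled to lie between 0 and 1. The function names are hypothetical. The example uses two communities that share no MDU, each holding 100 equally abundant MDUs with dij = 1, so that Qᾱ = 0.99 and Qγ = 0.995.

```python
def equivalent_number(q):
    """Equivalent number of maximally dissimilar, equally abundant MDUs."""
    if not 0.0 <= q < 1.0:
        raise ValueError("Q must lie in [0, 1); scale distances to [0, 1] first")
    return 1.0 / (1.0 - q)


def standardized_beta(q_alpha_mean, q_gamma):
    """Return corrected alpha, beta, gamma and beta as a percentage of gamma."""
    alpha_eq = equivalent_number(q_alpha_mean)
    gamma_eq = equivalent_number(q_gamma)
    beta_eq = gamma_eq - alpha_eq
    return alpha_eq, beta_eq, gamma_eq, 100.0 * beta_eq / gamma_eq


q_alpha_mean, q_gamma = 0.99, 0.995

# Uncorrected: beta is only about 0.5% of gamma despite zero overlap.
raw_beta_percent = 100.0 * (q_gamma - q_alpha_mean) / q_gamma

# Corrected: alpha = 100, gamma = 200, beta = 100, i.e. 50% of gamma.
alpha_eq, beta_eq, gamma_eq, beta_st = standardized_beta(q_alpha_mean, q_gamma)
```

The uncorrected ratio shrinks towards zero as local diversity grows, whereas the corrected ratio stays at 50%, the value expected for two communities with nothing in common.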

We wrote a function in the free R software environment (R Development Core Team, 2011) to estimate all Rao indices described earlier. This function, along with an R-script, is available in Appendix S1.

Theoretical examples

The decomposition described above is illustrated with four simplified but realistic cases in Fig. 2, according to the nature of the MDU data (presence/absence versus relative abundance, and known versus unknown relationships among MDUs). We built an artificial regional pool of five MDUs (A–E) scattered across three local communities (I–III) of similar size, i.e. three individuals [cases (A) and (B)] and 100 individuals [cases (C) and (D)] distributed into three MDUs taken from the regional pool. For each community, we estimated α-diversity (Qα), the mean α-diversity (Qᾱ), the regional γ-diversity (Qγ), the β-diversity (Qβ) and the standardized β-diversity (Qβst). The Jost correction was applied in order to quantify diversity while accounting for the equivalent number of species, i.e. the number of maximally dissimilar (dij = 1) and evenly distributed MDUs required to obtain the same index value (Q) as estimated with our dataset. In case (A), data are presence–absence of MDUs, while pairwise distances are unknown (dij = 1). In case (C), the relative abundances of MDUs are known. Cases (B) and (D) are based on the same community matrices as cases (A) and (C), respectively, but the phylogenetic relatedness among the five MDUs is known in the former. This relatedness corresponds to the pairwise cophenetic distances between MDUs, i.e. the amount of branch length relating each MDU pair on an ultrametric phylogenetic tree.
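Since the framework requires ultrametric distances, it can be useful to check a distance matrix before using it. The sketch below relies on the standard three-point condition for an ultrametric, dij ≤ max(dik, dkj) for every triple of MDUs; the function name and the two small matrices are hypothetical and do not correspond to the tree of Fig. 2.

```python
from itertools import permutations


def is_ultrametric(d, tol=1e-9):
    """True if d[i][j] <= max(d[i][k], d[k][j]) for every triple of MDUs."""
    n = len(d)
    return all(
        d[i][j] <= max(d[i][k], d[k][j]) + tol
        for i, j, k in permutations(range(n), 3)
    )


# Distances from a tree whose tips are all equally far from the root:
# A and B are close relatives, C is equally distant from both.
tree_like = [
    [0.0, 2.0, 6.0],
    [2.0, 0.0, 6.0],
    [6.0, 6.0, 0.0],
]

# Here C is closer to B than to A although A and B are close relatives,
# which violates the three-point condition.
not_tree_like = [
    [0.0, 2.0, 6.0],
    [2.0, 0.0, 5.0],
    [6.0, 5.0, 0.0],
]

ok = is_ultrametric(tree_like)         # True
broken = is_ultrametric(not_tree_like)  # False
```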

Figure 2.

Schematic example illustrating the calculation of the additive partitioning of Rao's quadratic entropy (Qᾱ, Qβ and Qγ) for different types of data. We estimated the different components of Rao entropy using the Rao function: α-diversity (Qα), the mean α-diversity (Qᾱ), the regional γ-diversity (Qγ), the β-diversity (Qβ) and the standardized β-diversity (Qβst) (see Appendix S1 for data, R-script and R-function). We applied the Jost correction, but the weighting of local communities was not applied because all communities contain the same number of individuals (100).

A. Presence–absence data of equally distant Microbial Diversity Units (MDUs, A–E) within local communities (I, II and III).

B. Presence–absence of phylogenetically related MDUs within local communities.

C. Abundance data of equally distant MDUs within local communities.

D. Abundance data of phylogenetically related MDUs within local communities. dij corresponds to the distance between MDUi and MDUj; coph (MDUi and MDUj) is the cophenetic distance between MDUi and MDUj, that is, the length of the branches relating these two MDUs on a phylogenetic or functional tree; Ncom is the total number of individuals in the studied community, and NMDU is the number of individuals of MDUi. Note that Qᾱ values do not correspond to the mean local Qα values because the Jost correction was applied after the calculation of local Qα, Qᾱ, Qβ and Qγ.

In case (A), all communities have the same Qα diversity as they contain the same number of MDUs (i.e. three). The regional Qγ value does not equal the number of MDUs (i.e. five) because Qγ decreases with the proportion of shared units among communities. Here, MDUs A, C, D and E are shared by two communities. In case (C), the relative abundances of MDUs are known. Local diversity (Qα) is maximized when individuals are evenly distributed (community I) and minimized with unbalanced distributions of individuals amongst units (community III).
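The figures quoted for case (A) can be re-derived by hand. The sketch below is a hedged reconstruction, not the script of Appendix S1: community membership is inferred from the text (community I holds A, C and E; B occurs only in community II; A, C, D and E each occur in two communities), the double-sum form of Q with dij = 1 is assumed, and the equivalent number is taken as 1/(1 − Q).

```python
def rao_q(p, d):
    """Rao quadratic entropy: sum over all i, j of d[i][j] * p[i] * p[j]."""
    s = len(p)
    return sum(d[i][j] * p[i] * p[j] for i in range(s) for j in range(s))


mdus = ["A", "B", "C", "D", "E"]

# Assumption: membership inferred from the text, since Fig. 2 is not shown here.
members = {
    "I": {"A", "C", "E"},
    "II": {"A", "B", "D"},
    "III": {"C", "D", "E"},
}

# Unknown relationships: all distinct MDUs are maximally dissimilar.
d = [[0.0 if a == b else 1.0 for b in mdus] for a in mdus]

# Presence-absence data: equal relative abundance for each MDU present.
local = {
    name: [1.0 / len(m) if u in m else 0.0 for u in mdus]
    for name, m in members.items()
}

q_alpha = {name: rao_q(p, d) for name, p in local.items()}
q_alpha_mean = sum(q_alpha.values()) / len(q_alpha)

# Regional abundances = mean of local relative abundances.
regional = [sum(p[i] for p in local.values()) / len(local)
            for i in range(len(mdus))]
q_gamma = rao_q(regional, d)

alpha_eq = 1.0 / (1.0 - q_alpha_mean)   # 3 equivalent MDUs per community
gamma_eq = 1.0 / (1.0 - q_gamma)        # about 4.76, below the 5 MDUs present
beta_st = 100.0 * (gamma_eq - alpha_eq) / gamma_eq  # about 37.04 %
```

Under these assumptions the corrected regional value is 81/17, slightly below five because four of the five MDUs are shared, and the standardized β-diversity is 30/81 of the regional diversity.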

Cases (A) and (C) exhibit the highest Qβst values among the four cases; β-diversity represents more than 37% of the estimated regional diversity Qγ. In cases (B) and (D), the highest Qα value is estimated for community I, which has the highest phylogenetic diversity (distance = 19.2) as its MDUs (A, C and E) belong to scattered lineages. Moreover, MDUs have similar abundances in this community, thus increasing the estimated diversity. Community II has the lowest Qα value in both cases. This is due to an uneven abundance distribution among MDUs but, more importantly, to the low phylogenetic diversity (distance = 14.9), explained by the presence of closely related MDUs such as A and B. Finally, community III has intermediate Qα values because the phylogenetic diversity is intermediate (distance = 18.1) and abundances are unevenly distributed (case D).

The estimated β-diversity (Qβst) represents 23.6% and 22.1% of the regional Qγ diversity in cases (B) and (D) respectively. These values are lower than in cases (A) and (C) because taking into account the phylogenetic relationships between MDUs reduces the dissimilarity between communities that share common branches on the phylogenetic tree in addition to sharing some species.

All the data, R-scripts and R-functions required to run these theoretical cases are available in a user-friendly format in Appendix S1.

How does this framework enrich the microbial ecologist's toolbox?

What do we need in microbial ecology?

Microbial ecology faces at least two major challenges. The first is the need to elucidate the role of microbes not only in ecosystem functioning, but also in ecosystem resilience and stability in the context of environmental changes (Bell et al., 2005). The second challenge is to use the enormous amount of available and forthcoming microbial data generated through molecular approaches in a quantitative way that is more ecologically relevant (Jones et al., 2012). These two challenges are interrelated because the former requires a large number of samples (which is realistic in terms of sampling strategy and collection), and the latter will involve an increasing amount of information per individual and species, differing in nature (abundance, identity, phylogeny, activity, physiology and function). To address these challenges, microbial ecology calls for a much better description of biodiversity, microbial processes and interactions in space and time. Decomposing diversity, as described earlier, in a way that fulfils Whittaker's framework but with more flexibility, is of high priority because it will allow the comparison of communities (time points and sites) in a standardized way and the integration of data of various sorts.

Measuring community structure to complement existing tools

Given the myriad of indices already available to microbiologists, the proposal of a new framework is only valuable if it brings additional and complementary information to existing tools (Lozupone and Knight, 2008). Using four theoretical cases (Fig. 3), we compared β-diversity values estimated using the additive Rao quadratic entropy framework, based on MDU phylogenetic relatedness and relative abundance, with those obtained using a classical additive partition based on MDU composition only (i.e. MDU presence/absence).

Figure 3.

Comparison of the additive framework for the Rao quadratic entropy with a composition-based additive partition of Microbial Diversity Unit (MDU) diversity. In each case (A–D), we calculated an additive partition of the regional diversity. We estimated Qᾱ, Qβ and Qβst components of diversity using the Rao quadratic entropy framework (Q). We estimated ᾱ, β and βst using an additive composition-based framework, i.e. a framework that considers all MDUs as equivalent and uses only presence/absence of MDUs within communities, where: γ = the number of MDUs across communities; ᾱ = the mean number of MDUs per community; β = γ − ᾱ; βst = β/γ. The big circles correspond to 20 individuals and the small circles to 1.

In case (A), the two communities have no MDU in common, hence explaining the compositional β-diversity of 50%, which represents the highest possible value for this number of communities (see Appendix S2). The two MDUs within each community are closely related phylogenetically (low Q value), but these two pairs of MDUs are phylogenetically distant between communities (Qβst = βst = 50%, highest possible value). In this case, the taxonomic composition and the phylogenetic structure of the two communities differ in a similar way, i.e. the maximum level.

In case (B), each community contains four MDUs, three of which are shared. The estimated β-diversity values, Qβst and βst, are both low because only one out of four species differs between the communities and because the two unshared species (A and B) are closely related phylogenetically (this reduces Qβst). In this case, the taxonomic composition and the phylogenetic structure are close between the two communities.

In case (C), the communities have no MDU in common, and the two MDUs that compose each community have markedly unequal abundances. The β-diversity estimated using only community composition is 50% (βst), which is the highest possible value, as in case (A). However, phylogenetic β-diversity (Qβst) is much lower than in case (A), as closely related MDUs have the same abundances in their respective communities (A–B in community I, and C–D in community II). In this case, the taxonomic composition differs maximally between the two communities, but their phylogenetic structures are very close.

In case (D), the MDU composition is the same between the two communities, explaining the lowest compositional β-diversity value (βst = 0) suggesting no turnover. However, the estimated phylogenetic β-diversity (Qβst) is high (44.7% over a maximum of 50%) as the most abundant MDUs are phylogenetically distant between the two communities. In this case, while the community composition is perfectly identical between the two communities, their phylogenetic structure markedly differs.

Here, using four theoretical cases, we show that compositional and phylogenetic β-diversity are not trivially related and that the Rao framework deserves to be applied in addition to classical taxonomic-based analyses in order to reveal complementary biodiversity patterns.
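For comparison, the composition-based partition used in Fig. 3 (γ = number of MDUs across communities, ᾱ = mean number of MDUs per community, β = γ − ᾱ, βst = β/γ) can be sketched as below. The function name is hypothetical, βst is expressed as a percentage, and MDU labels beyond those named in the text (A and B as the unshared pair in case B) are placeholders.

```python
def composition_partition(communities):
    """Additive partition from presence/absence only.

    communities: list of sets of MDU labels.
    Returns (mean alpha, beta, gamma, beta as a percentage of gamma).
    """
    gamma = len(set().union(*communities))
    alpha_mean = sum(len(c) for c in communities) / len(communities)
    beta = gamma - alpha_mean
    return alpha_mean, beta, gamma, 100.0 * beta / gamma


# Two MDUs per community, none shared (as in cases A and C).
no_shared = composition_partition([{"A", "B"}, {"C", "D"}])
# -> (2.0, 2.0, 4, 50.0)

# Four MDUs per community, three shared (as in case B).
three_shared = composition_partition(
    [{"A", "C", "D", "E"}, {"B", "C", "D", "E"}]
)
# -> (4.0, 1.0, 5, 20.0)

# Identical composition (as in case D): no compositional beta-diversity.
identical = composition_partition([{"A", "B", "C", "D"}, {"A", "B", "C", "D"}])
# -> (4.0, 0.0, 4, 0.0)
```

Because this partition ignores both abundances and relatedness, it returns the same value for cases (A) and (C) and zero for case (D), which is exactly where the Rao-based values diverge from it.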

Potential applications in microbial ecology

To illustrate possible uses of the presented framework, we identify three common issues that may necessitate a scaling of biodiversity (Fig. 4).

Figure 4.

Potential designs for studying the different components (α, β and γ) of microbial biodiversity across scales. In each case, the sampling unit is the local community (α1, α2 and α3), γ represents the regional diversity across all communities and β the intercommunity diversity, in a global (β1-2-3) or in a pairwise way (β1–2, β1–3 and β2–3). In case (A), the three local communities correspond to the same community sampled at different times; in case (B) to spatially dispersed communities; and in case (C) to fish gut microbial communities.

The first example corresponds to the monitoring of one or several bacterial communities over time (Fig. 4A). This can refer, for instance, to the dynamics of an in situ microbial community across seasons, or in experiments following an input of contaminants or nutrients or a modification in land use (Jones et al., 2012; Perez-Leblic et al., 2012; Zhou et al., 2012). The objective would then be to determine whether the structure of these communities varies through time by estimating the relative contribution of Qᾱ and Qβ values to Qγ values. If Qᾱ explains most of Qγ, then microbial communities remain stable through time, while a higher contribution of Qβ to Qγ would mean a major change in community structure.

Another possible use of our approach (Fig. 4B) is the investigation of spatial processes across systems, metacommunities, and the dynamics of biodiversity and ecosystem processes from the nano to the regional scale (Jones et al., 2012; Yavitt et al., 2012). The relative contribution of Qᾱ and Qβ values to Qγ values, and their interactions with biotic and abiotic factors, may shed light on the processes underpinning empirical community patterns, i.e. the patch dynamics, species sorting, source–sink effects and neutral model frameworks (Logue et al., 2011).

The last example refers to micro-organism–macro-organism associations (Fig. 4C) and corresponds to the study of host-associated microbial communities. In recent years, there has been increased interest in understanding the structure of the indigenous microbiota that inhabit the surface or the inside of terrestrial and aquatic animals and plants (Fierer et al., 2012; Mouchet et al., 2012). The application of ecological theory to the host-associated microbiota, through description of α, β and γ components of diversity and their interactions, may push us beyond simple descriptions of community structure towards an understanding of the mechanisms that structure their diversity and functions. Consequently, this could lead us to a better understanding of their role in animal and plant health (Fierer et al., 2012).

The three potential applications described earlier are not exhaustive. Within natural communities, microbial taxa coexist in a wide range of physiological states, expressed as different activity levels from very active to latent or even dead states (Del Giorgio and Gasol, 2008). Moreover, communities are dominated by species represented by few individuals with wide functional potential (Szabo et al., 2007). The loss and persistence of these categories of cells in the assemblage, as well as their relative importance in assembly processes, are still unknown. However, they may be key players in explaining the persistence of species and their global diversity patterns. These issues may benefit from the assessment of α, β and γ diversity of these cell categories at the different scales described in Fig. 4.

Concluding remarks

The modern molecular tools presented here allow the estimation of different facets of microbial diversity (taxonomic, phylogenetic and functional) with different but high levels of resolution and standardization. Many recent papers rely on this multifaceted approach to address questions such as the understanding of biogeographical patterns (Griffiths et al., 2011; Nemergut et al., 2011), the multiscale assessment of diversity and the comprehension of microbial community assembly rules (Lozupone and Knight, 2007; Fierer et al., 2012; Zinger et al., 2012). While these studies have facilitated the development of concepts and the testing of major theories, they are not consistent in the way they measure the different components of diversity (α, β and γ) across scales, phylogenies and functions, so comparison between studies is not possible. The framework proposed by de Bello et al. (2010), which we have adapted to the microbial world, is unique in combining the dissimilarity and the relative abundances among the community members (here MDUs), and in being flexible enough to cope with the different kinds of data that are, or will be, generated by molecular tools. In addition, it provides a standardized methodology for the comparison of α, β and γ components across different facets of microbial diversity. Thus, large datasets covering microbial cell identity and function that are currently methodologically accessible, as well as the unified framework of diversity calculations described here, are key ingredients for successful findings in spatio-temporal distributions of microbial life, along with comparisons between case studies.


This work was partially funded by an EC2CO project FDFish (2008PRJ1) and the ANR project BIODIVNEK. Authors would like to thank Alison Duncan for improving the English language of this manuscript, and two anonymous reviewers for insightful comments on this manuscript. The authors declare that they have no conflict of interest.