The available taxonomic expertise and knowledge of species is still inadequate to cope with the urgent need for cost-effective methods to quantify community response to natural and anthropogenic drivers of change. So far, the mainstream approach to overcoming these impediments has focused on using higher taxa as surrogates for species. However, the use of such taxonomic surrogates often limits inferences about the causality of community patterns, which are essential for effective environmental management strategies. Here, we propose an alternative approach to species surrogacy, the “Best Practicable Aggregation of Species” (BestAgg), in which surrogates are freed from fixed taxonomic schemes. The approach uses null models from random aggregations of species to minimize the number of surrogates without causing significant losses of information on community patterns. Surrogate types are then selected to maximize ecological information. We applied the approach to real case studies on natural and human-driven gradients from marine benthic communities. Outcomes from BestAgg were also compared with those obtained using classic taxonomic surrogates. Results showed that BestAgg surrogates are effective in detecting community changes. In contrast to classic taxonomic surrogates, BestAgg surrogates retain significantly more information on species-level community patterns than expected by chance, with a potential time saving during sample processing up to 25% higher. Our findings showed that BestAgg surrogates from a pilot study could be used successfully in similar environmental investigations in the same area, or for subsequent long-term monitoring programs. BestAgg is applicable to virtually any environmental context and allows multiple surrogacy schemes to be exploited, beyond stagnant perspectives relying strictly on taxonomic relatedness among species. This prerogative is crucial to extend the concept of species surrogacy to ecological traits of species, thus leading to ecologically meaningful surrogates that, while cost effective in reflecting community patterns, may also help to unveil underlying processes. Specific R code for BestAgg is provided.
The unprecedented increase in anthropogenic disturbance worldwide has exacerbated concerns about the potential ensuing depletion of biodiversity and ecosystem functioning (Hooper et al. 2012). However, the intrinsic complexity of ecological systems largely limits our ability to predict their possible critical transitions toward undesirable states (Scheffer et al. 2012). Environmental impact assessment and monitoring, therefore, are of basic importance in revealing the effects of human pressures and their interactions with natural sources of variability, detecting early signals of phase shifts, and guiding subsequent adaptive management and mitigation strategies (Hill and Arnold 2012).
Wide gaps in knowledge of the phylogenetic, taxonomic, and functional characteristics of most species (Lomolino 2004; Whittaker et al. 2005; Cardoso et al. 2011) make it difficult to quantify human-driven patterns of change and to unveil the underlying ecological processes. Progress in molecular analyses, such as DNA barcoding of organisms, is helping the process of cataloging biodiversity (Gross 2012), and recent developments in this field have highlighted the value of genetic tagging in estimating ecological properties of communities despite the inherent loss of taxonomic information (e.g., Fonseca et al. 2010; Yu et al. 2012). Molecular analyses and bioinformatics, nevertheless, are complementary rather than alternative to the huge research endeavors in taxonomy and autoecology (Wilson 2004), which remain indispensable for advancing the knowledge of biodiversity (May 1990; Wheeler 2004; Wheeler et al. 2004; de Carvalho et al. 2007).
Despite renewed efforts in the exploration of biodiversity (e.g., Snelgrove 2010; Fontaine et al. 2012) and in the enhancement of taxonomy and systematics (Boero 2001; Wilson 2003), current knowledge of species is still far from exhaustive (Pereira et al. 2012) and the availability of taxonomic expertise still appears insufficient (Wägele et al. 2011) to cope with the current need for timely solutions to pressing environmental problems.
This so-called ‘taxonomic impediment’ (e.g., Wheeler 2004) challenges applied ecological research to provide cost-effective methods for elucidating the response of communities and ecosystems to natural and anthropogenic drivers of change (Pik et al. 1999; Jones 2008; Mandelik et al. 2010; Mellin et al. 2011). A mainstream practice to overcome this hindrance across terrestrial, freshwater, and marine environments focuses on the use of higher taxa as surrogates for species (Bevilacqua et al. 2012). The higher taxon approach in environmental investigations is based on the concept of taxonomic sufficiency, which involves the use of coarse taxonomic resolution without causing a significant loss of information, thus avoiding costly, time-consuming, and difficult species-level identifications (Beattie and Oliver 1994). Such an approach, especially when based on intermediate taxonomic ranks (i.e., Genus and Family), is generally effective in depicting species-level patterns of community response under a wide range of environmental settings (e.g., Heino and Soininen 2007; Lovell et al. 2007; Terlizzi et al. 2009).
However, taxonomic sufficiency implies the static grouping of organisms in taxa belonging to a single taxonomic level higher than species (e.g., all organisms identified as genera, or families, etc.) irrespective of their ecological relevance or difficulty of taxonomic identifications. As a consequence, the use of higher taxa as surrogates for species (hereafter referred to as taxonomic surrogates) often restricts inferences about the causality of the observed patterns (Lenat and Resh 2001; Terlizzi et al. 2003; Jones 2008).
Uncertainties about the appropriateness of this approach to species surrogacy may depend on the fact that related empirical studies have amassed in the absence of incisive efforts in structuring a solid theoretical framework for the application of taxonomic surrogates. Putative similarities in ecological traits among closely related species, or hierarchical (from species to higher taxonomic ranks) responses to environmental disturbance, have been invoked to substantiate the ability of taxonomic surrogates to mirror species-level patterns (e.g., Warwick 1993; Ferraro and Cole 1990; Heino and Soininen 2007). Such explanations are, nevertheless, unable to elucidate exhaustively the reasons behind the success, or failure, of taxonomic surrogates (Lenat and Resh 2001; Bertrand et al. 2006; Dethier and Schoch 2006; Bevilacqua et al. 2009), and are difficult to validate experimentally. The absence of clearly stated assumptions on the effectiveness of taxonomic surrogates, and the lack of standard methods for quantifying the probability of Type-I error when identifying a particular taxonomic level as effective in discerning a given pattern of interest, raised criticism about their potential utility (Mellin et al. 2011).
Several studies that have investigated factors affecting the performance of taxonomic surrogates, such as taxonomic relatedness among species, outlined that higher taxa perform better as surrogates for species when they are poor in species (e.g., Lovell et al. 2007), or there is a small mean and variance in the number of species per higher taxon (e.g., Neeson et al. 2013) or, in other words, when the ratio of the number of species to the number of higher taxa is low (e.g., Giangrande et al. 2005; Dethier and Schoch 2006). In a recent attempt to shed light on potential mechanisms determining the performance of taxonomic surrogates, Bevilacqua et al. (2012), working on marine molluscs at a regional scale, used null models to show that higher taxa of the Linnaean taxonomic hierarchy may be considered as arbitrary categories of species unlikely to convey consistent responses to natural or human-driven environmental changes. A similar approach, based on the metacommunity concept, led Siqueira et al. (2012) to analogous conclusions when investigating congruences in spatial patterns of variation in community composition of freshwater invertebrates among the whole set and different subset taxa. Bevilacqua et al. (2012) showed that information loss and the ensuing decrease in statistical power to detect changes in assemblage structure at higher taxonomic levels depended on the degree of species aggregation (exemplified by the ratio between the number of higher taxa and the number of species), rather than on taxonomic relatedness of species (i.e., the relative closeness of species in the Linnaean taxonomic hierarchy) (see also Siqueira et al. 2012 for similar findings). By analyzing 20 years of research on taxonomic surrogates, the authors also found strong evidence supporting the generality of such findings across a wide range of terrestrial, freshwater, and marine organisms.
From this perspective, we propose a novel approach to species surrogacy, the Best Practicable Aggregation of Species (BestAgg), which allows alternative ways of aggregating species into surrogates, beyond static taxonomic grouping, in order to maximize ecological information and to optimize the use of surrogates for species in ecological studies. Taxonomic sufficiency concerns the use of higher taxa as surrogates for species and aims to identify the coarsest level of taxonomic resolution sufficient to allow the assessment of community response to environmental drivers. The BestAgg approach, instead, relies on determining the sufficient (i.e., minimum) number of surrogate groups, irrespective of their type (i.e., whether taxonomic, morphological, functional, etc.), that can be used while still obtaining results consistent with species-level community response. As for any rigorous surrogacy approach (e.g., Van Wynsberge et al. 2012), taxonomic sufficiency relies on a first assessment of the sufficient taxonomic resolution based on species-level data (e.g., Terlizzi et al. 2003; Defeo and Lercari 2004; Jones 2008). In this framework, a pilot investigation compares results of analyses at species level with those obtained using higher taxa. Species-level data are therefore aggregated (i.e., grouped and summed) into higher taxa, and the coarsest taxonomic resolution able to provide results consistent with those obtained from species-level data is assumed to be suitable for subsequent monitoring or for very similar study contexts.
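The aggregation step mentioned above (grouping species and summing their abundances) can be sketched in a few lines. This is an illustrative Python sketch, not the R code provided by the authors; the function name `aggregate`, the species and family labels, and the abundances are all hypothetical.

```python
def aggregate(abundances, species, group_of):
    """Sum the columns of a samples-by-species table that share a surrogate.

    abundances: list of samples, each a list of abundances aligned with `species`
    species:    list of species names (column labels)
    group_of:   dict mapping each species name to its surrogate (e.g., a family)
    Returns (sorted surrogate names, samples-by-surrogates table).
    """
    groups = sorted({group_of[s] for s in species})
    column = {g: i for i, g in enumerate(groups)}
    out = [[0] * len(groups) for _ in abundances]
    for r, row in enumerate(abundances):
        for name, value in zip(species, row):
            out[r][column[group_of[name]]] += value
    return groups, out


# Hypothetical example: three species collapsed into two families.
species = ["sp_a", "sp_b", "sp_c"]
group_of = {"sp_a": "fam_X", "sp_b": "fam_X", "sp_c": "fam_Y"}
samples = [[3, 2, 1],
           [0, 0, 7]]

groups, aggregated = aggregate(samples, species, group_of)
phi = len(groups) / len(species)  # ratio of surrogates to species
```

Because the function only needs a species-to-group mapping, the same call works whether the groups are higher taxa, morphological groups, or any mixture of surrogate types.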
Following the same framework, we used species-level information from pilot studies to identify the sufficient number of surrogates able to depict community patterns consistently with species-level information. Surrogates were then defined based on their ecological importance (relevance), low difficulty of taxonomic identification during sample processing (easiness), and shared characteristics among organisms (resemblance). Finally, we tested the performance of BestAgg surrogates in similar study contexts and compared their response with classic surrogates based on taxonomy (i.e., higher taxa).
For both pilot assessments, linear regressions of ρ against ln(ϕ) were significant (P < 0.001), indicating that the information retained in the aggregated matrices strongly depended on the level of aggregation following a semilog model (Fig. 3).
Figure 3. Semilog plot of ρ values between the species-level matrix and each randomly aggregated matrix against the corresponding ϕ values for pilot studies (A) P1 (OP) and (B) T1 (DG). Fading gray zones indicate the range of ϕ values at which analyses were consistent with those at species level. Dotted lines indicate ϕlow (i.e., the lowest practicable aggregation level), sufficient to obtain results consistent with those obtained analyzing species-level data.
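The semilog model referred to above has the form ρ = a + b·ln(ϕ). A minimal ordinary least squares fit is sketched below in Python as an illustration only. The ϕ and ρ values are synthetic and are not data from the pilot studies, and the function name `semilog_fit` is an assumption of this sketch.

```python
import math


def semilog_fit(phi, rho):
    """Least squares fit of rho = a + b * ln(phi); returns (a, b, r_squared)."""
    x = [math.log(p) for p in phi]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(rho) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, rho))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, rho))
    ss_tot = sum((yi - mean_y) ** 2 for yi in rho)
    return a, b, 1 - ss_res / ss_tot


# Synthetic values lying exactly on rho = 1.0 + 0.1 * ln(phi) (illustration only).
phi = [0.05, 0.10, 0.20, 0.40, 0.60]
rho = [1.0 + 0.1 * math.log(p) for p in phi]
a, b, r2 = semilog_fit(phi, rho)
```

A positive slope b corresponds to the pattern in Fig. 3: the more groups are retained per species, the closer the aggregated matrix stays to the species-level one.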
For the OP case study, the pilot assessment showed that the lowest ϕ value allowing at least 95% of PERMANOVAs on aggregated data to give results consistent with those obtained at species level was ϕlow = 0.10, corresponding to Gmin = 26 (Table 1). For the DG case study, instead, ϕlow = 0.20 and, therefore, Gmin = 16 (Table 1). This means that the original S species (259 for OP and 79 for DG) could be aggregated into 26 and 16 surrogates, respectively, while still allowing analyses to perform as well as at species level.
Table 1. Percentage of tests from PERMANOVA on random aggregated data consistent with those from species-level analyses, at decreasing levels of aggregation (φ)
|OP – Pilot assessment (Platform P1 − S = 259)||DG – Pilot assessment (Time 1 − S = 79)|
|Number of surrogates (G)||Aggregation ratio (φ)||% Analyses consistent with species level||Number of surrogates (G)||Aggregation ratio (φ)||% Analyses consistent with species level|
|156||0.60||100%|| 16 || 0.20 ||98%|
|130||0.50||100%|| || || |
|117||0.45||100%|| || || |
|104||0.40||100%|| || || |
|91||0.35||100%|| || || |
|78||0.30||100%|| || || |
|65||0.25||100%|| || || |
|52||0.20||100%|| || || |
|39||0.15||100%|| || || |
| 26 || 0.10 ||98%|| || || |
|13||0.05||69%|| || || |
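The selection of ϕlow from Table 1 can be written as a simple rule: take the lowest ϕ at which at least 95% of analyses on randomly aggregated data agree with the species-level result. The Python sketch below applies that rule to the OP rows of Table 1. The function name `lowest_practicable` is hypothetical, and the sketch assumes that consistency does not drop again at higher ϕ, which holds for these rows.

```python
# (G, phi, % analyses consistent with species level) for OP, copied from Table 1.
OP_ROWS = [
    (156, 0.60, 100), (130, 0.50, 100), (117, 0.45, 100), (104, 0.40, 100),
    (91, 0.35, 100), (78, 0.30, 100), (65, 0.25, 100), (52, 0.20, 100),
    (39, 0.15, 100), (26, 0.10, 98), (13, 0.05, 69),
]


def lowest_practicable(rows, threshold=95):
    """Return (phi_low, G_min): the lowest phi meeting the consistency threshold."""
    passing = [(phi, g) for g, phi, pct in rows if pct >= threshold]
    if not passing:
        raise ValueError("no aggregation level meets the threshold")
    return min(passing)


phi_low, g_min = lowest_practicable(OP_ROWS)

# Cross-check of the reported ratios: G_min / S for OP (S = 259) and DG (S = 79).
ratio_op = 26 / 259
ratio_dg = 16 / 79
```

For OP this returns ϕlow = 0.10 with Gmin = 26, matching the text. The cross-check shows that 26/259 and 16/79 round to the reported 0.10 and 0.20.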
The procedure for selecting BestAgg surrogates from pilot assessments led to the definition of a set of GBestAgg = 29 surrogates for OP (see Table S3) and GBestAgg = 23 surrogates for DG (see Table S4). General and context-specific ecological relevance of surrogates was defined based on available scientific information (see Tables S3 and S4), whereas study-specific relevance was defined based on the species contributing most to the observed patterns (see Table S2 for results of SIMPER analyses). For OP, all 29 surrogates were based on taxonomy, with six species (Aspidosiphon sp., Corbula gibba, Golfingia sp., Thyasira biplicata, Timoclea ovata, and Nuculana commutata), three genera (Kelliella, Diplodonta, and Nucula), four families (Capitellidae, Cirratulidae, Paraonidae, and Spionidae), five orders (Amphipoda, Cumacea, Decapoda, Isopoda, and Tanaidacea), 10 classes (Aplacophora, Asteroidea, Bivalvia, Echinoidea, Gastropoda, Holothuroidea, Polychaeta, Ophiuroidea, Scaphopoda, and Turbellaria), and one phylum (Sipuncula) (Table S3). For DG, the set of 23 BestAgg surrogates was more heterogeneous, including 11 taxonomic surrogates from species to class level (Agelas oroides, Anthozoa, Axinella sp., Bivalvia, Cirripedia, Cladocora caespitosa, Cliona spp., Hydrozoa, Peyssonnelia spp., Tunicates, and Wrangelia penicillata), and 12 morphological/functional groups (calcareous tube worms, canopy-forming algae, coarsely branched/unbranched algae, Crambe/Spirastrella, encrusting Bryozoans, erect Bryozoans, encrusting calcified Rhodophytes, encrusting/massive sponges, green filamentous algae, Madreporarians/Zoanthidea, massive black sponges, and turf-forming Algae) (Table S4).
For both pilot assessments, tests based on randomizations showed that the probability of the GBestAgg surrogates failing to depict species-level community patterns was low (P = 0.021 for OP and P = 0.005 for DG). Correlation ρBestAgg was in both cases significantly (P < 0.05) higher than random expectations (Fig. 4), indicating that data aggregated using BestAgg surrogates retained much more of the original species-level information than expected by chance.
Figure 4. Frequency distribution (n = 1000) of ρ values between the species-level matrix and matrices in which species were randomly aggregated in GBestAgg groups (see text and Fig. 2), for pilot studies (A) P1 (OP) and (B) T1 (DG). Dotted lines indicate ρBestAgg, that is, the correlation value between the species-level matrix and the matrix aggregated using BestAgg surrogates, which in both cases falls significantly (P < 0.05) above random expectations.
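The null-model comparison summarized in Fig. 4 can be sketched as follows. This Python code is an illustration under stated assumptions, not the authors' R implementation. It assumes that ρ is a Spearman rank correlation between Bray-Curtis dissimilarities among samples, which this section does not specify. All function names, the toy data, and the surrogate assignment are hypothetical.

```python
import random
from itertools import combinations


def aggregate(matrix, labels):
    """Sum species columns (samples x species) that share a group label."""
    index = {g: i for i, g in enumerate(sorted(set(labels)))}
    out = [[0.0] * len(index) for _ in matrix]
    for r, row in enumerate(matrix):
        for value, label in zip(row, labels):
            out[r][index[label]] += value
    return out


def random_labels(n_species, n_groups, rng):
    """Randomly assign species to n_groups groups, leaving no group empty."""
    labels = list(range(n_groups))
    labels += [rng.randrange(n_groups) for _ in range(n_species - n_groups)]
    rng.shuffle(labels)
    return labels


def bray_curtis(matrix):
    """Bray-Curtis dissimilarity for every pair of samples, as a flat list."""
    out = []
    for a, b in combinations(matrix, 2):
        den = sum(x + y for x, y in zip(a, b))
        out.append(sum(abs(x - y) for x, y in zip(a, b)) / den if den else 0.0)
    return out


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def null_rhos(matrix, n_groups, n_rand=1000, seed=0):
    """rho between species-level and randomly aggregated dissimilarities."""
    rng = random.Random(seed)
    d_species = bray_curtis(matrix)
    n_species = len(matrix[0])
    return [
        spearman(d_species,
                 bray_curtis(aggregate(matrix, random_labels(n_species, n_groups, rng))))
        for _ in range(n_rand)
    ]


def p_above_random(rho_obs, null):
    """One-tailed probability that a random aggregation does at least as well."""
    return (sum(r >= rho_obs for r in null) + 1) / (len(null) + 1)


# Toy data (hypothetical): 6 samples x 8 species, aggregated into 3 surrogates.
rng = random.Random(1)
toy = [[rng.randint(0, 20) for _ in range(8)] for _ in range(6)]
null = null_rhos(toy, n_groups=3, n_rand=200)
chosen = [0, 0, 1, 1, 1, 2, 2, 2]  # hypothetical surrogate assignment
rho_chosen = spearman(bray_curtis(toy), bray_curtis(aggregate(toy, chosen)))
p = p_above_random(rho_chosen, null)
```

With real data, the chosen assignment would be the set of BestAgg surrogates and the number of random groups would equal GBestAgg. A small p then indicates that the chosen surrogates retain more species-level information than random groupings of the same size.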
Pilot assessments showed that PERMANOVA based on data aggregated using BestAgg surrogates yielded the same results as species-level analyses (Table 2). For OP, the variance component associated with the investigated source of variation (i.e., distance from platform) accounted for 22% of the total variance at species level and 24% when using the BestAgg surrogates. For DG, the variance component associated with the investigated source of variation (i.e., depth) accounted for 26% of the total variance at species level and 27% when using BestAgg surrogates. Analyses on aggregated data not only detected the main effect of the investigated sources of variation but also consistently depicted species-level patterns of difference in community structure along the studied gradients (Table 2). For both case studies, PERMANOVA on test data using the specific BestAgg surrogates derived from pilot assessments showed results consistent with those that would have been obtained if species had been analyzed (Table 2).
Table 2. Results of PERMANOVA on data aggregated using BestAgg surrogates. Results consistent (including pairwise comparisons) with those obtained at species level (which are also reported) are given in bold
|BestAgg pilot assessment|
|Case study||Pilot study||Source of variation||Species level||BestAgg|
|OP||P1||Distance||(N ≠ M ≠ F)***||(N ≠ M ≠ F)***|
|DG||T1||Depth||(5 ≠ 15 = 25)***|| (5 ≠ 15 = 25) *** |
|Application of BestAgg|
|Case study||Test study||Source of variation||Species level||BestAgg|
|OP||P2||Distance||(N = M ≠ F)**|| (N = M ≠ F) ** |
|DG||T2||Depth||(5 ≠ 15 = 25)*|| (5 ≠ 15 = 25) * |
| ||T3||Depth||(5 ≠ 15 ≠ 25)**|| (5 ≠ 15 ≠ 25) ** |
| ||T4||Depth||(5 ≠ 15 ≠ 25)**|| (5 ≠ 15 ≠ 25) ** |
For both case studies, the information retained in taxonomically aggregated matrices fell within or below the 95% confidence interval from random expectation for all investigated taxonomic levels (i.e., genus, family, order, class, phylum) (Fig. 5), indicating that taxonomic surrogates performed no better than, or even worse than, random groups of species. Pilot assessments of taxonomic surrogates based on taxonomic sufficiency indicated order and phylum, respectively, as the sufficient taxonomic levels for analyses in OP and DG (Table 3). In contrast, the sufficient taxonomic level predicted based on the lowest practicable aggregation (ϕlow) was that of order for both case studies (Table 3). For OP, PERMANOVA on test data aggregated at order level confirmed this taxonomic resolution as sufficient in providing results consistent with those obtained using species (Table 4). For DG, analyses showed that orders were effective surrogates, whereas the analysis at phylum level, although still detecting main effects, failed to depict the species-level pattern of differences in community structure along the investigated environmental gradient (Table 4).
Table 3. φ values of each taxonomic level and sufficient φ values based on BestAgg (pilot studies)
|Case study||Pilot study||Source of variation|| S ||Values of φ based on taxonomic aggregation||Sufficient φ from BestAgg||BestAgg prediction of sufficient taxonomic level||Sufficient taxonomic level (classic)|
Table 4. Results of PERMANOVA on test data aggregated on the basis of the sufficient taxonomic level determined using taxonomic sufficiency (classic approach) and the lowest practicable aggregation level φlow from BestAgg (see Table S5)
|Case study||Source of variation||Taxonomic level||Test study P2|
|OP||Distance||Species||(N = M ≠ F)**|
| || ||Order|| (N = M ≠ F) ** |
|Case study||Source of variation||Taxonomic level||Test study T2||Test study T3||Test study T4|
|DG||Depth||Species||(5 ≠ 15 = 25)*||(5 ≠ 15 ≠ 25)**||(5 ≠ 15 ≠ 25)**|
| || ||Order|| (5 ≠ 15 = 25) * ||(5 ≠ 15 ≠ 25)**||(5 ≠ 15 ≠ 25)**|
| || ||Phylum||(5 = 15 ≠ 25)*||(5 ≠ 15 = 25)**||(5 ≠ 15 = 25)*|
Figure 5. Mean ± 95% confidence interval (n = 1000) of ρ values between species and randomly aggregated matrices for (A) P1 pilot study (OP case study) and (B) T1 pilot study (DG case study). Black points are ρ values between species and higher taxon matrices at genus, family, order, class, and phylum level. Numbers in brackets indicate the number of taxa in each taxonomic level.
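The comparison in Fig. 5 amounts to asking where the ρ of a taxonomic surrogate falls relative to the ρ values of random aggregations with the same number of groups. The Python sketch below classifies an observed ρ as below, within, or above a central 95% interval of the null values. Using a percentile interval is an assumption of this sketch, since the way the interval in Fig. 5 was computed is not described here. The function names and example values are hypothetical.

```python
def percentile(sorted_values, q):
    """Percentile by linear interpolation; q is in [0, 100]."""
    position = (len(sorted_values) - 1) * q / 100
    low = int(position)
    high = min(low + 1, len(sorted_values) - 1)
    fraction = position - low
    return sorted_values[low] + (sorted_values[high] - sorted_values[low]) * fraction


def classify_against_random(rho_obs, null_rhos, level=95):
    """Return 'below', 'within', or 'above' relative to the central null interval."""
    values = sorted(null_rhos)
    tail = (100 - level) / 2
    lower = percentile(values, tail)
    upper = percentile(values, 100 - tail)
    if rho_obs < lower:
        return "below"
    if rho_obs > upper:
        return "above"
    return "within"


# Hypothetical null distribution: 101 evenly spaced values from 0.00 to 1.00.
null = [i / 100 for i in range(101)]
results = [classify_against_random(r, null) for r in (0.01, 0.50, 0.99)]
```

In these terms, the taxonomic surrogates in Fig. 5 were classified as "within" or "below" at every rank, whereas the BestAgg surrogates in Fig. 4 fell above random expectation.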
The application of BestAgg surrogates led to an estimated saving of time during sample processing and organism identification of 90% for OP and 71% for DG, in contrast to savings of 85% and 45% for OP and DG, respectively, when using taxonomic surrogates.
Over the past three decades, the use of higher taxa as surrogates for species has received increasing attention as a pragmatic solution to overcome impediments related to fine taxonomic identifications of organisms in ecological studies (Bevilacqua et al. 2012). However, a number of issues on taxonomic surrogates remained largely unsolved, possibly preventing the consolidation of such a practice in routine monitoring programs despite its undeniable advantages (Dauvin et al. 2007). The use of taxonomic surrogates is problematic when the allocation of species into higher taxa is queried, or when cladistic revisions of the taxonomic hierarchy lead to the insertion/removal of additional (e.g., infraorder, superfamily) or classic ranks. Also, taxa of the same rank may not be equivalent from a phylogenetic point of view among different phyla, making the use of taxonomic surrogates less stringent when considering assemblages embracing more than one phylum (Bertrand et al. 2006; Bevilacqua et al. 2009). More importantly, ecological similarity among species within taxa may be markedly taxon specific (e.g., Losos 2008), hampering the association of a clear ecological meaning to changes in community structure when it is codified through ranks of the Linnaean hierarchy higher than species (Somerfield and Clarke 1995; Terlizzi et al. 2003; Bertrand et al. 2006; Jones 2008).
In spite of these evident intrinsic limits, approaches based on taxonomic relationships have profoundly conditioned the way species surrogacy has been conceived so far. The BestAgg framework proposed in this study attempts to rise above this stagnant perspective. BestAgg focuses on the aggregation of variables in multivariate data matrices, looking at the effect of aggregation on congruencies between the information contained in the original versus the corresponding aggregated matrix (Bevilacqua et al. 2012). The approach is based on the simple concept that the stronger the aggregation (i.e., the lower the ratio ϕ of the number of aggregated variables to the number of original variables), the greater the loss of information. As it is the numerical relationship between original and aggregated variables that matters, the nature of variables (which could express the abundance of species, taxa, morphological groups, etc.) and the logic guiding the aggregation of variables are irrelevant. Thus, BestAgg is applicable to any kind of community data from any environmental context and type of organisms (whether involving a single phylum or several), allows mixing any type of surrogates (as the identification of the sufficient number of surrogates goes beyond any potential relationship among species, whether taxonomic, phylogenetic, etc.), and prioritizes the choice of ecologically meaningful groupings (in contrast to taxonomic surrogates, for instance, which are based on taxonomic relatedness regardless of whether higher taxa actually represent ecologically meaningful units).
It could be argued that such numerical relationships might be biased by sample size, as might happen for ratios between taxonomic categories/subcategories (Gotelli and Colwell 2001). This is not the case for aggregation ratios in BestAgg because (1) the approach does not assume any intrinsic relationship between the original and the aggregated variables (which, actually, are arbitrary categories deriving from random aggregations) and (2) sample size is constant for a given study.
By disentangling species surrogacy from static aggregation schemes, BestAgg can also take advantage of different surrogate types (e.g., higher taxa, functional groups, ecological indicators, etc.). Such a prerogative is decisive in opening species surrogacy to ecological knowledge (Groc et al. 2010), which can guide the choice of those surrogates most aligned with the ecological characteristics of species in order to maximize ecological information on community patterns, notwithstanding the inherent reduction in taxonomic detail. Although this aspect seems to introduce some level of subjectivity into the approach, surrogate selection in BestAgg is far from arbitrary. The number of effective surrogates is determined objectively and the identity of surrogates is determined based on objective macrocriteria. Also, evidence from pilot assessments and solid scientific information on the investigated system substantiate the choice of surrogates, limiting the subjectivity of the experimenter (Dauvin et al. 2007). Moreover, in contrast to other approaches, within the BestAgg framework the experimenter can set a priori the probability of failing to detect significant results using the selected surrogates, and thereby control the uncertainty of their application.
Quite intuitively, the set of surrogates from BestAgg may be strictly context specific because their choice, as for any other approach to species surrogacy, depends on the aim of the study, the particular environmental situation, the organisms involved, and the available ecological knowledge of the system. Pragmatic considerations seem to suggest that levels of aggregation up to 0.4–0.5 (corresponding to a number of surrogates equal to 40–50% of the original number of species) are usually still conducive to effective representations of species-level community patterns (Bevilacqua et al. 2012). However, one-size-fits-all solutions in species surrogacy could be misleading, and the identification of suitable surrogates for a given study needs to be based on representative pilot assessments at species level (Terlizzi et al. 2003; Jones 2008; Siqueira et al. 2012). Therefore, the set of effective surrogates obtained from the application of BestAgg to a pilot study should be applied only to subsequent studies in very similar environmental contexts (e.g., same source of impact in the same habitat, the same natural gradient in areas of the same region, etc.) and with the same experimental design, which, clearly, needs to be appropriately planned to assess the effects of the investigated source of variability in modifying community patterns. We simulated the application of BestAgg to real case studies, in which a first pilot assessment was performed to define the set of effective surrogates that was then used successfully in similar environmental investigations (as in the OP example), or for subsequent monitoring programs (as in the DG example). Results demonstrated the robustness of BestAgg in analyzing community patterns in relation to both natural and human-driven gradients, whether involving individual or colonial species, although further efforts are required to extend this approach to other environmental contexts.
As the estimation of cost savings deriving from using surrogates strongly depends on the investigated group(s) of organisms, the number of specimens to be classified, and available taxonomic expertise (Ferraro and Cole 1990), quantifying the cost advantages of BestAgg surrogates over classic taxonomic surrogates is a difficult task, and estimated cost savings may not be generally valid. However, our results on real case studies showed that BestAgg surrogates might save up to 25% more time during sample processing than classic taxonomic surrogates. Moreover, such a time saving is likely to be underestimated because, in contrast to taxonomic surrogates, which imply at least a basic taxonomic expertise, the choice of surrogates in BestAgg prioritizes ease of identification and might involve nontaxonomic surrogates (e.g., morphological groups).
Above all, our findings showed that BestAgg represents a valuable alternative method to species surrogacy in environmental impact assessment and ecological monitoring, potentially leading to increased time saving with respect to traditional approaches, such as those involving the use of higher taxa as surrogates for species. In addition, BestAgg recognizes the need for conferring an ecological meaning to surrogates, which is fundamental for the interpretation of ecological patterns. It is increasingly evident that the quest for effective proxies for species has to abandon static approaches, moving toward the integration of taxonomic, phylogenetic, and functional aspects (Devictor et al. 2010). BestAgg may represent a step forward in this direction.