1. Stewardship of biological and ecological resources requires the ability to make integrative assessments of ecological integrity. One of the emerging methods for making such integrative assessments is multimetric indices (MMIs). These indices synthesize data, often from multiple levels of biological organization, with the goal of deriving a single index that reflects the overall effects of human disturbance. Despite the widespread use of MMIs, there is uncertainty about why this approach can be effective. An understanding of MMIs requires a quantitative theory that illustrates how the properties of candidate metrics relate to MMIs generated from those metrics.
2. We present the initial basis for such a theory by deriving the general mathematical characteristics of MMIs assembled from metrics. We then use the theory to derive quantitative answers to the following questions: Is there an optimal number of metrics to comprise an index? How does covariance among metrics affect the performance of the index derived from those metrics? And what are the criteria to decide whether a given metric will improve the performance of an index?
3. We find that the optimal number of metrics to be included in an index depends on the theoretical distribution of signal of the disturbance gradient contained in each metric. For example, if the rank-ordered parameters of a metric-disturbance regression can be described by a monotonically decreasing function, then an optimum number of metrics exists and can often be derived analytically. We derive the conditions by which adding a given metric can be expected to improve an index.
4. We find that the criterion defining such conditions depends nonlinearly on the signal of the disturbance gradient, the noise (error) of the metric and the correlation of the metric errors. Importantly, we find that correlation among metric errors increases the signal required for the metric to improve the index.
5. The theoretical framework presented in this study provides the basis for understanding the properties of MMIs. It can also be useful throughout the index construction process. Specifically, it can be used to aid understanding of the benefits and limitations of combining metrics into indices; it can inform selection/collection of candidate metrics; and it can be used directly as a decision aid in effective index construction.
Multimetric indices (MMIs) are monitoring and assessment tools used in many fields such as finance, policy, economics, medicine and ecology. They are developed to act as quantitative indicators of an underlying process such as health or integrity or simply to act as a composite of the impact of the underlying process. In biology, MMIs developed to assess and indicate ecosystem integrity, such as the Index of Biological Integrity (IBI, Karr 1981) and Vegetation Index of Biological Integrity (VIBI, Mack 2004), have been widely used as indicators of the effects of human disturbance on natural communities. The goal of MMI development is to create an index that can function as a proxy indicator of human disturbance, thus providing a tool ‘…to select high-quality areas for acquisition and conservation; to diagnose likely causes of degradation; and to define management actions to halt degradation or restore degraded areas’ (Karr 2006). This is performed by combining measures of the biological community that respond strongly to one or more of the physical or chemical properties that are altered by human activity/disturbance (Karr & Chu 1997).
Multimetric indices are generally used in complex systems where the underlying causal processes are poorly understood. The observed measures, usually compiled into ‘metrics’ (e.g. abundances of individual species or growth forms, diversity and conservation value), often integrate several levels of causal hierarchy, potentially reducing the signal of the causative factor one might hope to detect. MMIs functionally collapse the causative hierarchy and use the independent information contained in the metrics to create a single index that is indicative of human-caused disturbance (Karr & Chu 1997). This process is especially useful in the context of resource management, where we are often interested in the impacts of human disturbance on biological communities, but where the causal networks linking human disturbance to the measured biological metrics are complex and the mediating factors are unknown. In such situations, managers are nonetheless charged with assessing the impact of human disturbance on systems, allocating resources to remediate the impacts and measuring the effect of remediation efforts. MMIs are useful in this context because they allow for a quantitative measure of the impact of disturbance even in the presence of uncertainty of the causal processes and can be used as a starting point for the building of causal models (Riseng et al. 2006).
While they have been widely used and are of increasing popularity, MMIs have been questioned on grounds of both interpretability (Suter 1993) and usefulness (von der Ohe et al. 2007). Under what conditions does an MMI provide a better prediction of disturbance than the best single metric? Is there an optimal number of metrics? Does a metric have to demonstrate a significant bivariate association with human disturbance to improve an index? Here, we begin to develop a statistical theory of MMIs to quantitatively address questions such as these. Our strategy is to quantitatively define the quality of an MMI as a predictor of human disturbance and derive quantitative rules for when an MMI provides better information than any of its component metrics. We then use the results to infer some general features of MMIs. We end with some generalized advice for the development of procedures for effective MMI construction.
Assumptions and context
The literature relating to MMIs is complex and encompasses a range of situations and priorities. For this reason, it is important to define the context for our presentation and the assumptions being made. Oversimplifying necessarily, we can contrast two approaches to selecting biological properties (metrics) for inclusion in an MMI. One can be described as the ‘universal metric’ approach, while another can be described as the ‘strongest index’ approach. In the first of these, the dimensions of biotic or ecological integrity to be included in an MMI are decided in advance based on expert judgment. This approach seems to be most common in the assessment of fish communities (Karr 1981; Karr et al. 1986). In principle, an MMI constructed from universal metrics may not necessarily show a correlation with human activities, as the metrics for inclusion are decided independently of their correlation with human activities. Of course in practice, universal metrics may tend to be sensitive to human influences on the abiotic environment, so they certainly can show a strong relationship to human disturbance. A perhaps more common case, certainly for invertebrate and plant communities, is the selection of metrics so as to derive an index that is highly sensitive to (and therefore a good predictor of) degrees of human activities (Ofenbock et al. 2004; Hering et al. 2006a; Whittier et al. 2007; Stoddard et al. 2008). In this more empirical approach, there is usually a large number of potential metrics identified and some combination of those is selected, based on the strength of signal between metrics and human disturbance. It is this case that we address in our analysis.
An MMI that is sensitive to levels of human disturbance can be derived using either a continuous measure of human disturbance (e.g. Karr & Chu 1997; Ofenbock et al. 2004) or a set of samples representing ‘reference’ conditions (e.g. Stoddard et al. 2008). We think it is appropriate to consider the reference condition approach to be a special case of the continuous measure approach where some range of human disturbance is classified as ‘minimally disturbed’ and then treated as ‘undisturbed’ for practical purposes. While the pros and cons of the reference condition approach may be debated (e.g. D. R. Schoolmaster unpublished data), in this article, we address the situation where there is assumed to be a continuous gradient of human disturbance and the capacity to quantify that gradient. Figure 1 presents a causal network to facilitate our presentation. In Fig. 1a, we represent what we believe to be a common situation where human activities (H) can lead to changes in a number of abiotic environmental properties (Ps), which in turn lead to alterations in biotic conditions (i.e. metrics, ms). The process for developing an MMI will commonly begin by constructing some index of human disturbance through the weighted summation of known alterations of the environment (e.g. Mack 2004; Hering et al. 2006b; Hughes et al. 2009; Whittier & Van Sickle 2010). This disturbance index (D) shown in Fig. 1b is then used in some fashion to select biological metrics (ms) for possible use in building an MMI. Ultimately, some subset of metrics is selected for constructing an MMI deemed to be suitable for our objectives. There are two levels of objectives, however. At the more detailed level, we will be interested in the individual metrics chosen to represent the system. At a more general level, we are interested in having an MMI that is highly sensitive to D.
This sensitivity means that using backwards inference, it is possible to use MMI scores derived from some samples to make inferences about human disturbance in other samples. In this article, we focus on the more general objective of constructing MMIs that are sensitive indicators of human activities. We recognize the importance of having metrics that are interpretable. We believe, however, that achieving this more specific objective requires steps designed to investigate the causal connections between disturbance and responses (e.g. structures like Fig. 1a, see also Riseng et al. 2006), which we address in a separate study (J. B. Grace, unpublished data).
In this section, we develop a model of the relationship between metrics and disturbance and then use it to derive some general properties of MMIs. For simplicity and concreteness, we begin at the step represented in Fig. 1b, assuming that we have a single measure of human disturbance and several metrics. We assume that all variables are normally distributed and that all metrics are a linear function of human disturbance. In Appendix S1, we show that the general properties that we derive are robust to non-normally distributed metrics, metrics measured on discrete or continuous scales, or metrics that exhibit a nonlinear relationship with the measure of human disturbance.
Imagine a set of metrics that have been scaled to be unitless measures of similar range (indexed by i), mi (e.g. biological attributes of a community such as measures of diversity, cover, taxonomic composition), which are each a linear function of human disturbance D plus some independent error ɛ,

$$ m_i = \beta_i D + \varepsilon_i \tag{1} $$
where mi, ɛi and D are vectors with one value per site, βi is a coefficient representing the effect of D on mi and ɛ is multivariate normally distributed with mean of zero and variance/covariance equal to Σ. Note that while we assume that metrics have a mean equal to zero to simplify the presentation, metrics may be scaled differently (as strictly positive, for example) without loss of generality. We have the goal of combining metrics in such a way as to represent the effect of human disturbance better than could be represented with any metric individually. A common approach to creating an index is to take, at each site, the mean of the metrics included in the index. Note that this is mathematically equivalent to summing the metrics and rescaling them, as is sometimes done by MMI developers. Taking a mean over n metrics in eqn (1) gives

$$ m_I = \bar{\beta} D + \bar{\varepsilon} \tag{2} $$
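where mI is a vector of MMI scores, $\bar{\beta} = \frac{1}{n}\sum_{i=1}^{n}\beta_i$ and $\bar{\varepsilon} = \frac{1}{n}\sum_{i=1}^{n}\varepsilon_i$. One way to measure the sensitivity of the MMI to human disturbance is through correlation strength. The correlation of the MMI, mI, with human disturbance D is

$$ r_{m_I,D} = \frac{\bar{\beta}\,\sigma_D^2}{\sqrt{\sigma_D^2\left(\bar{\beta}^2\sigma_D^2 + \sigma_{\bar{\varepsilon}}^2\right)}} \tag{3} $$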
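where $\sigma_D^2$ is the variance of D, and $\sigma_{\bar{\varepsilon}}^2$ is the variance of $\bar{\varepsilon}$. It is helpful to take the square of eqn (3), to eliminate the square root in the denominator. Squaring both sides of eqn (3) gives,

$$ r^2_{m_I,D} = \frac{\bar{\beta}^2\sigma_D^2}{\bar{\beta}^2\sigma_D^2 + \sigma_{\bar{\varepsilon}}^2} \tag{4} $$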
From eqn (2), the variance of the MMI is $\sigma_{m_I}^2 = \bar{\beta}^2\sigma_D^2 + \sigma_{\bar{\varepsilon}}^2$. Thus, eqn (4) is equivalent to the ‘coefficient of determination’ of linear regression and can be usefully interpreted as the relative proportion of variation in the MMI attributable to the variation in D. It is this quantity that we use to assess the quality of an MMI.
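The coefficient of determination described for eqn (4) can be checked numerically. The following is a minimal sketch, assuming uncorrelated metric errors and a disturbance gradient with unit variance; the function names, coefficient values and sample size are illustrative choices and do not come from the original analysis.

```python
import numpy as np

def analytic_r2(beta, sigma, var_d=1.0):
    """Squared MMI-disturbance correlation for uncorrelated errors.

    beta  : effect of D on each metric
    sigma : error standard deviation of each metric
    """
    beta = np.asarray(beta, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n = beta.size
    signal = beta.mean() ** 2 * var_d            # variance in the index due to D
    var_mean_error = np.sum(sigma ** 2) / n ** 2  # variance of the mean error
    return signal / (signal + var_mean_error)

def simulated_r2(beta, sigma, n_sites=100_000, seed=0):
    """Simulate metrics as beta * D + error, average them, correlate with D."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    d = rng.normal(0.0, 1.0, size=n_sites)        # disturbance, unit variance
    errors = rng.normal(0.0, sigma, size=(n_sites, beta.size))
    metrics = d[:, None] * beta + errors          # one column per metric
    index = metrics.mean(axis=1)                  # MMI = mean of metrics per site
    r = np.corrcoef(index, d)[0, 1]
    return r ** 2

beta = [0.8, 0.5, 0.3]    # hypothetical signal strengths
sigma = [1.0, 1.5, 2.0]   # hypothetical error standard deviations
print(analytic_r2(beta, sigma), simulated_r2(beta, sigma))
```

With a large number of simulated sites, the two values should agree closely, which is one way to confirm that the analytical expression describes the averaged index.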
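An MMI can be expected to yield a greater $r^2$ than that attributable to any one of the metrics, whenever combining metrics increases the relative proportion of the variation in mI associated with variation in D. To understand when this will be the case, it is necessary to understand the relationship between mean error, $\bar{\varepsilon}$, and error variance $\sigma_{\bar{\varepsilon}}^2$. Imagine two metrics m1 and m2 such that the error associated with m1 (ɛ1) has variance $\sigma_1^2$ and that associated with m2 (ɛ2) has variance $\sigma_2^2$ and the correlation of ɛ1 and ɛ2 is represented by the coefficient ρ1,2. The error variance of the index, $\sigma_{\bar{\varepsilon}}^2$, is

$$ \sigma_{\bar{\varepsilon}}^2 = \frac{\sigma_1^2 + \sigma_2^2 + 2\rho_{1,2}\,\sigma_1\sigma_2}{4} \tag{5} $$

More generally, for an index of n metrics,

$$ \sigma_{\bar{\varepsilon}}^2 = \frac{1}{n^2}\left(\sum_{i=1}^{n}\sigma_i^2 + 2\sum_{i<j}\rho_{i,j}\,\sigma_i\sigma_j\right) \tag{6} $$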
For ρi,j = 0, eqn (6) reduces to $\overline{\sigma^2}/n$, where $\overline{\sigma^2}$ is the mean error variance of the metrics; thus, eqns (5) and (6) suggest that the error variance of the index is inversely related to the number of metrics in the index. Because $r^2_{m_I,D}$ decreases as the error variance of the index increases (eqn 4), eqns (5) and (6) also show that metrics with positively correlated errors will improve an MMI less than those that are negatively correlated or uncorrelated, because of their cumulative effects on $\sigma_{\bar{\varepsilon}}^2$. Further, they suggest that any index constructed from metrics with perfectly correlated errors will never be an improvement over the single metric with the highest correlation with D. To see this in the two-metric case, let β1 > β2 and ɛ1 = ɛ2 (thus implying ρ1,2 = 1). The $r^2$ of m1 alone is greater than that for an MMI using both m1 and m2 because $\bar{\beta} < \beta_1$ and $\sigma_{\bar{\varepsilon}}^2 = \sigma_1^2$, showing that the relative proportion of the variance of the MMI associated with variation in D is less than that for the single metric m1. This occurs because the arithmetic benefit of MMI construction is the cancelling of errors in the metrics. Such cancelling works most efficiently when the errors are uncorrelated (or negatively correlated). When metric errors are positively correlated, they tend to reinforce one another in such a way as to offset the benefit derived from the division by $n^2$.
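The effect of error correlation on the error variance of the index can be illustrated with a short calculation. This is a sketch under the assumption that the error covariance between metrics i and j is ρi,j multiplied by the two error standard deviations, with ones on the diagonal of the correlation matrix; the function names and numerical values are hypothetical.

```python
import numpy as np

def index_error_variance(sigma, rho):
    """Variance of the mean error across metrics.

    sigma : error standard deviations, length n
    rho   : n x n error correlation matrix (ones on the diagonal)
    """
    sigma = np.asarray(sigma, dtype=float)
    rho = np.asarray(rho, dtype=float)
    cov = rho * np.outer(sigma, sigma)   # error covariance matrix
    return cov.sum() / sigma.size ** 2   # Var(mean) = sum of all entries / n^2

def index_r2(beta, sigma, rho, var_d=1.0):
    """Squared correlation of the mean-of-metrics index with D."""
    signal = np.mean(beta) ** 2 * var_d
    return signal / (signal + index_error_variance(sigma, rho))

uncorrelated = np.eye(2)
perfectly_correlated = np.ones((2, 2))

# Two metrics with unit error variance: averaging halves the error variance
# only when the errors are uncorrelated.
print(index_error_variance([1.0, 1.0], uncorrelated))           # 0.5
print(index_error_variance([1.0, 1.0], perfectly_correlated))   # 1.0

# Perfectly correlated errors with beta1 > beta2: the pair does worse than m1.
print(index_r2([1.0], [1.0], [[1.0]]))                          # m1 alone
print(index_r2([1.0, 0.5], [1.0, 1.0], perfectly_correlated))   # m1 and m2
```

In the last comparison the averaged index has the same error variance as m1 but a smaller mean coefficient, so its squared correlation with D is lower than that of m1 alone.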
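Given a metric, m1, it is possible to solve for the conditions under which constructing an index by combining it with a second metric, m2, would lead to a combined index, mI, with a greater capacity to predict D. This is done by solving the conditional relationship $r^2_{m_I,D} > r^2_{m_1,D}$ for β2, which gives

$$ \beta_2 > \frac{\beta_1}{\sigma_1}\sqrt{\sigma_1^2 + \sigma_2^2 + 2\rho_{1,2}\,\sigma_1\sigma_2} - \beta_1 \tag{7} $$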
In words, this says that β2 has to be greater than the difference between β1 and the signal-to-noise ratio (i.e. $\beta_1/\sigma_1$) of m1 times a measure of how the variances of the metrics will combine.
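In general, adding metric mn+1 improves an MMI when

$$ \beta_{n+1} > \left(\sum_{i=1}^{n}\beta_i\right)\left(\sqrt{\frac{S_{n+1}}{S_n}} - 1\right), \qquad S_k = \sum_{i=1}^{k}\sum_{j=1}^{k}\rho_{i,j}\,\sigma_i\sigma_j \quad (\rho_{i,i} = 1) \tag{8} $$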
See Appendix S2 for a derivation of this result based on Bayesian reasoning.
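The two-metric condition can be sketched as a threshold on β2 and checked against a direct comparison of squared correlations. This sketch assumes that both coefficients are positive and that the index is the simple mean of the two metrics; the threshold expression is obtained by requiring the pair to have a higher squared correlation with D than m1 alone, and all names and values are illustrative.

```python
import math

def beta2_threshold(beta1, sigma1, sigma2, rho12):
    """Smallest beta2 for which averaging m1 and m2 beats m1 alone.

    Assumes beta1 > 0; sigma1 and sigma2 are error standard deviations
    and rho12 is the correlation of the two metric errors.
    """
    combined_sd = math.sqrt(sigma1 ** 2 + sigma2 ** 2
                            + 2.0 * rho12 * sigma1 * sigma2)
    return (beta1 / sigma1) * combined_sd - beta1

def r2_single(beta, sigma, var_d=1.0):
    """Squared correlation of one metric with D."""
    return beta ** 2 * var_d / (beta ** 2 * var_d + sigma ** 2)

def r2_pair(beta1, beta2, sigma1, sigma2, rho12, var_d=1.0):
    """Squared correlation with D of the mean of two metrics."""
    signal = ((beta1 + beta2) / 2.0) ** 2 * var_d
    noise = (sigma1 ** 2 + sigma2 ** 2 + 2.0 * rho12 * sigma1 * sigma2) / 4.0
    return signal / (signal + noise)

# Hypothetical first metric with beta1 = 1 and unit error variance.
for rho in (0.0, 0.5, 1.0):
    print(rho, beta2_threshold(1.0, 1.0, 1.0, rho))

# A second metric just above or below the uncorrelated threshold.
print(r2_pair(1.0, 0.5, 1.0, 1.0, 0.0) > r2_single(1.0, 1.0))  # above: improves
print(r2_pair(1.0, 0.4, 1.0, 1.0, 0.0) > r2_single(1.0, 1.0))  # below: does not
```

The loop shows the required β2 rising as the error correlation increases, which is the pattern described for positively correlated errors.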
Figure 2a shows an example of the relationship in eqn (7). Combinations of $\sigma_2^2$ and β2 above the curve improve the $r^2$; those below reduce it. It shows that given metric m1 (with β1 and $\sigma_1^2$ indicated by the cross), if there is little or no correlation in the errors of the metrics, creating an MMI improves $r^2$ even if β2 is much lower and $\sigma_2^2$ is much higher than those of m1. However, positively correlated errors strongly increase the requirements on m2 to improve $r^2$.
Using eqn (7), we can also answer the question as to whether only metrics that are ‘significantly’ correlated with D can improve an MMI. Expanding on the two-metric example and assuming a sample size of 100, eqn (7) can be used to derive the P-value of the correlation between m2 and D that satisfies the inequality (Appendix S3). Figure 2b reproduces Fig. 2a with an additional overlay that indicates whether the combination of β2 and $\sigma_2^2$ results in P > 0·05 (Fig. 2b, white area) or P < 0·05 (grey area). It shows that a metric need not necessarily be significantly correlated with the measure of human disturbance to improve an MMI. However, if the errors of the metrics are highly correlated, even a metric that has a very small P-value may not improve an MMI.
Given eqn (4), some properties of MMIs can be deduced. For example, we can address the question: ‘Is there an optimum number of metrics, n, that can be included in an MMI?’ Here, we show that the answer depends on the distribution of βi. In the case where the error variances of all the metrics are equal and the errors are equally correlated ($\sigma_i^2 = \sigma^2$ and ρi,j = ρ), if all βi are equal (i.e. βi = β), then substituting into eqn (4) gives

$$ r^2_{m_I,D} = \frac{\beta^2\sigma_D^2}{\beta^2\sigma_D^2 + \sigma^2\,[1 + (n-1)\rho]/n} \tag{9} $$
which is a monotonic increasing function of the number of metrics, n, and has an asymptotic limit of $\beta^2\sigma_D^2/(\beta^2\sigma_D^2 + \rho\sigma^2)$ as n goes to infinity (Fig. 3). Thus, even in this ideal case, there is a limit to the number of metrics that can be usefully added to an MMI and that number depends crucially on the correlation of metric errors (ρ in this example).
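The saturating behaviour in the equal-metric case can be tabulated directly. The sketch below assumes identical coefficients, identical error variances and a single common error correlation ρ between 0 and 1; the parameter values are hypothetical.

```python
def r2_equal_metrics(n, beta, sigma, rho, var_d=1.0):
    """Squared MMI-disturbance correlation for n interchangeable metrics."""
    signal = beta ** 2 * var_d
    # Variance of the mean of n equally correlated errors.
    noise = sigma ** 2 * (1.0 + (n - 1) * rho) / n
    return signal / (signal + noise)

def r2_limit(beta, sigma, rho, var_d=1.0):
    """Value approached as the number of metrics grows without bound."""
    signal = beta ** 2 * var_d
    return signal / (signal + rho * sigma ** 2)

beta, sigma = 1.0, 1.0   # illustrative values
for rho in (0.0, 0.25, 0.5):
    curve = [round(r2_equal_metrics(n, beta, sigma, rho), 3) for n in (1, 2, 5, 20)]
    print(rho, curve, round(r2_limit(beta, sigma, rho), 3))
```

Only with uncorrelated errors does the limit reach 1; for any positive ρ the curve levels off below that, so additional interchangeable metrics return progressively less.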
With real data, not all metrics will share the same βi. In the more likely case that they are unequal, they can be sorted to form a monotonic decreasing function of n, such that β1 > β2 > ... > βn. In this case, there is an optimum number of metrics that is independent of σ and inversely related to the degree of correlation of the metric errors ρ (Fig. 4). Figure 4a shows an example where . The optimal number of metrics, , is the solution for n of . In this case, it is a function that decreases in both b1 and ρ (Fig. 4), indicating that as the differences in information among rank-ordered βs increase, the number of metrics resulting in the best possible MMI decreases.
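An optimum at an intermediate number of metrics can be illustrated by adding rank-ordered metrics one at a time and recording the squared correlation at each step. The sketch below assumes an exponentially declining sequence of coefficients with rate b1; this functional form is a hypothetical stand-in chosen for illustration and is not necessarily the function used for Fig. 4. Error variances and error correlations are assumed equal across metrics.

```python
import numpy as np

def decaying_betas(b1, n_metrics=15):
    """Hypothetical rank-ordered coefficients: beta_i = exp(-b1 * (i - 1))."""
    return np.exp(-b1 * np.arange(n_metrics))

def r2_by_size(betas, sigma, rho, var_d=1.0):
    """Squared correlation with D of the index built from the first n metrics."""
    betas = np.asarray(betas, dtype=float)
    n = np.arange(1, betas.size + 1)
    beta_bar = np.cumsum(betas) / n                     # mean coefficient
    noise = sigma ** 2 * (1.0 + (n - 1) * rho) / n      # variance of mean error
    signal = beta_bar ** 2 * var_d
    return signal / (signal + noise)

def optimal_n(betas, sigma, rho):
    """Number of rank-ordered metrics giving the largest squared correlation."""
    return int(np.argmax(r2_by_size(betas, sigma, rho))) + 1

for b1 in (0.3, 0.6):
    for rho in (0.0, 0.5):
        print(b1, rho, optimal_n(decaying_betas(b1), sigma=1.0, rho=rho))
```

Under these assumptions the best index uses fewer metrics when the coefficients fall off more steeply or when the errors are more strongly correlated.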
We tested the idea that indices created from metrics in which both βi and $\sigma_i^2$ varied would have an optimal number of metrics by simulating 15 metrics using eqn (1), where βi were selected from a beta distribution with mean 1/2 and variance 1/20 (i.e. shape parameters 2 and 2), $\sigma_i^2$ were selected from a gamma distribution with mean 40 and variance 200 (shape 8, scale 5) and ρi,j ranged from 0 to 0·75. Metrics were then scaled to have a negative relationship with D and range from 0 to 10. From these metrics, an MMI was assembled algorithmically (see Discussion). Figure 5 shows the mean and standard error of the MMI correlation with the simulated human disturbance gradient as a function of the number of metrics added to the index. The curve generated by the simulations clearly has a minimum (i.e. the strongest negative correlation with D) at an intermediate number of metrics, and both the optimum number of metrics and the maximum correlation achieved by an MMI are reduced by correlation of metric errors, as predicted by the analytical model.
This analysis aims to provide a general understanding of the properties of MMIs. It is our hope that this will help both guide data collection for new MMIs and allow construction of better MMIs from existing data (as discussed below).
At its most fundamental level, an MMI allows a shared signal in a set of metrics to be amplified relative to the errors contained in those metrics. Amplification of the shared signal can occur when the independent (or negatively correlated) errors of the component metrics tend to cancel one another. Thus, to be effective, index construction requires that each component metric of the index provide some independent measure of the shared signal. If metric errors are positively correlated, they become amplified with the signal, eliminating much of the benefit of index construction. Others have partially recognized this and suggested a criterion by which to exclude candidate metrics based on correlation with other metrics using only the minimally disturbed subset of sites (Barbour et al. 1992; Whittier et al. 2007; Stoddard et al. 2008) or the whole data set (Hughes et al. 1998; McCormick et al. 2001; Hughes, Howlin & Kaufmann 2004; Herbst & Silldorff 2009; Raburu, Okeyo-Owuor & Masese 2009). However, this analysis suggests that a criterion based on the correlation of the metric scores, instead of metric errors, risks eliminating metrics that share a high degree of correlation with the gradient of interest and low error (see Appendix S4 for an example).
The reliance of MMI construction methods on the simple correlation of metric scores and human disturbance has led to a recognized misunderstanding as to whether strongly intercorrelated metrics are beneficial or detrimental to MMIs (Van Sickle 2010). This potential for disagreement exists because a strong intercorrelation can exist between metrics for two different reasons, one of which benefits MMI properties and the other of which is detrimental. If two metrics are each strongly related to disturbance (large β), they will be highly correlated and benefit an MMI. However, if metrics have strongly positively correlated disturbance-independent errors (ρ), their scores will be strongly correlated but, as our analyses show, will likely not improve an MMI. Van Sickle’s (2010) analysis of previously published MMIs found that correlated metrics were detrimental to MMIs. Based on this analysis, Van Sickle’s result suggests that correlated disturbance-independent errors are prevalent. In general, if one wishes to judge the redundancy of metrics (the non-beneficial correlation), the appropriate statistic is the correlation of the metric errors. These can be estimated as the correlation of the residuals of the simple regression (or nonlinear analogue) of each metric individually on disturbance.
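Estimating the correlation of metric errors from regression residuals can be sketched as follows. This is a minimal illustration that assumes a linear relationship between each metric and D and uses ordinary least squares; the function name and the small data set are hypothetical.

```python
import numpy as np

def error_correlation(metrics, d):
    """Correlation matrix of residuals from regressing each metric on D.

    metrics : array of shape (n_sites, n_metrics)
    d       : disturbance values, length n_sites
    """
    metrics = np.asarray(metrics, dtype=float)
    d = np.asarray(d, dtype=float)
    d_centred = d - d.mean()
    m_centred = metrics - metrics.mean(axis=0)
    # Least-squares slope of each metric on D (one slope per column).
    slopes = d_centred @ m_centred / (d_centred @ d_centred)
    residuals = m_centred - np.outer(d_centred, slopes)
    return np.corrcoef(residuals, rowvar=False)

# Hypothetical example: two metrics that respond to D in opposite directions
# but share the same disturbance-independent error pattern.
d = np.arange(10.0)
shared_error = np.where(np.arange(10) % 2 == 0, 1.0, -1.0)
m1 = 2.0 * d + shared_error
m2 = -1.0 * d + 0.5 * shared_error
metrics = np.column_stack([m1, m2])

print(np.corrcoef(m1, m2)[0, 1])              # correlation of raw scores
print(error_correlation(metrics, d)[0, 1])    # correlation of errors
```

In this constructed case the raw scores are negatively correlated while the residuals are perfectly positively correlated, so a screen based on score correlations and one based on error correlations would reach different conclusions about redundancy.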
Another step present in almost all published MMI construction methodologies is the elimination of candidate metrics prior to index construction based on ‘responsiveness’, usually using a t-test or P-value cut-off of the correlation between the metric and the measure of disturbance (Karr & Chu 1997; Hering et al. 2006a; Stoddard et al. 2008). Figure 2b shows that this step is neither necessary nor sufficient and may result in the elimination of potentially useful metrics. Metrics that have a low correlation with disturbance may nonetheless improve an MMI, especially if their errors are not correlated with those of metrics in the MMI. The concept of P-values comes from a hypothesis-testing framework and depends on factors such as sample size. Index construction depends on the amount of information of a shared signal contained in metrics. While a metric must contain information of the shared signal to benefit an MMI, the strict hypothesis-testing framework is not relevant. Indeed, it is possible to create an MMI that is significantly related to a disturbance gradient from a set of metrics, which all contain information on the gradient, but are not individually significantly correlated with disturbance (Fig. 6).
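The point that individually non-significant metrics can combine into a significant index can be illustrated with expected (population) correlations. The sketch below assumes interchangeable metrics, 100 sites, and an approximate two-sided 5% critical value of 1.984 for a t statistic with 98 degrees of freedom; the coefficient values are hypothetical, and the calculation uses expected correlations rather than sampled data.

```python
import math

T_CRIT_98 = 1.984   # approximate two-sided 5% critical t value, 98 d.f.

def expected_r(beta, sigma, n_metrics, rho=0.0, var_d=1.0):
    """Expected correlation with D of the mean of interchangeable metrics."""
    signal = beta ** 2 * var_d
    noise = sigma ** 2 * (1.0 + (n_metrics - 1) * rho) / n_metrics
    return math.sqrt(signal / (signal + noise))

def t_statistic(r, n_sites):
    """t statistic for testing a correlation r estimated from n_sites samples."""
    return r * math.sqrt(n_sites - 2) / math.sqrt(1.0 - r ** 2)

n_sites = 100
single = expected_r(beta=0.15, sigma=1.0, n_metrics=1)
index10 = expected_r(beta=0.15, sigma=1.0, n_metrics=10)
index10_corr = expected_r(beta=0.15, sigma=1.0, n_metrics=10, rho=0.9)

for label, r in (("single metric", single),
                 ("10 metrics, rho = 0", index10),
                 ("10 metrics, rho = 0.9", index10_corr)):
    t = t_statistic(r, n_sites)
    print(label, round(r, 3), round(t, 2), t > T_CRIT_98)
```

Under these assumptions a single weak metric falls short of the cut-off, the mean of ten such metrics with uncorrelated errors exceeds it comfortably, and the same ten metrics with strongly correlated errors do not.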
This analysis also suggests that one should check the predictive performance (as measured by $r^2$, for example) when assembling metrics into an MMI. Not all combinations of metrics will result in an MMI with a better predictive capacity than the best metric (e.g. the one with the strongest correlation with human disturbance). In addition, given a set of candidate metrics with varying amounts of signal and error, one can expect that the MMI with the best predictive capacity will occur at an intermediate number of metrics. In other words, there is likely to be an optimum number of metrics to include, past which each additional candidate metric will only reduce the predictive capacity of the MMI.
One is then faced with the decision of which metrics to include. Eqn (8) gives quantitative criteria by which these decisions can be made. However, because the effect of a candidate metric depends on its relationship with each of the others in the index, different selections for the initial metric will lead to different final sets of included metrics. Due to this fact, and the fact that the number and identity of metrics that will lead to the best MMI cannot be deduced simply from the bivariate relationships between metrics and human disturbance, constructing a sensitive MMI will require a more computationally intensive approach.
Potential applications of multimetric indices principles
The theory described above shows that given only information on metric bivariate relationships with disturbance and intercorrelations with one another, it is not possible to discern which metrics will combine into a sensitive MMI. Indeed, the theory suggests that the minimum amount of information needed to most effectively construct an MMI is , which is the number of metric bivariate correlations with D plus the number of correlations of metric errors of the developing MMI and all the candidate metrics yet to be included in the MMI. For example, a 7-metric MMI selected from 30 candidate metrics requires at least 2501 pieces of information plus additional calculations (such as those needed to evaluate eqn 8). As a result, we suggest that an automated MMI construction process would be more efficient and effective at constructing robust MMIs. Below, we summarize one potential approach by outlining an algorithm for index construction. While this approach uses a stepwise strategy to initially construct multiple MMIs from the set of candidate metrics, it then uses statistical criteria to select from among them. A stepwise approach to metric selection has been used previously by others to create single MMIs (Roth et al. 1998; Southerland et al. 2007; Angradi et al. 2009). Further, the strategy of creating multiple MMIs from a set of candidate metrics has been suggested previously on empirical grounds by Van Sickle (2010).
What follows is a list of steps (written in the form of pseudocode) that can be used to generate the MMI with the best predictive capacity from a given set of metrics. These steps can be developed into computer code to quickly generate a set of candidate MMIs. This method assumes one wants an MMI with the strongest possible negative correlation with disturbance and that metrics have already been adjusted for correlated environmental gradients (D. R. Schoolmaster et al., unpublished ms.) and scaled. If metrics are not a linear function of D, a measure of nonlinear association could be used instead, such as rank correlation or Efron’s R2 (Efron 1978) of a LOESS regression.
1 Reflect (reverse) metrics that are positively correlated with disturbance, D.
2 Select an initial metric to include in the MMI, m1.
3 Add m1 to each of the rest of the metrics, mj, site-by-site.
4 For each mj, find which combination m1 + mj has the strongest negative correlation with D. Select that one.
5 Check to see whether the correlation of the assembled index with D is stronger than before adding mj.
6 Add the MMI to each of the remaining metrics element-wise.
7 Find which combination of index + mj has the strongest negative correlation with D. Select that one.
8 Check to see whether the correlation of the assembled index with D is stronger than before adding mj.
9 Continue steps 6–8 until some stopping criterion has been satisfied. For example, until adding the best remaining metric weakens the index’s correlation with D, or until all metrics have been used, etc.
10 Repeat steps 2–9 until all metrics have been used as initial metric, m1.
This process will result in a set of MMIs equal in number to the candidate metrics. Many will be identical. From these candidate MMIs, one can select the MMI that has the strongest final relationship with D or other desired properties such as interpretability. When the number of initial candidate metrics is large, it will be desirable to use some criteria to narrow the list of candidate MMIs, such as eliminating duplicate lists and those that contain a metric that only occurs in one MMI. Because stepwise model selection procedures can have undesirable effects in some contexts (Whittingham et al. 2006), one can also employ information theoretical approaches such as likelihood ratio tests to determine where cut-offs should be made within each candidate MMI, and the Akaike Information Criterion to help select the most parsimonious, predictive models (D. R. Schoolmaster, unpublished data).
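The stepwise procedure listed above can be sketched in code as follows. This is one possible reading of the steps, not a definitive implementation: it assumes Pearson correlation as the measure of association, stops when the best remaining metric no longer strengthens the negative correlation with D, and collapses duplicate metric sets. The function names and the small test data set are hypothetical. Because correlation is unaffected by rescaling, the running sum of metrics is used in place of the mean.

```python
import numpy as np

def corr(x, d):
    """Pearson correlation between an index (or metric) and disturbance."""
    return np.corrcoef(x, d)[0, 1]

def build_mmi(metrics, d, start):
    """Steps 2-9: grow an index from one initial metric by greedy addition."""
    selected = [start]
    total = metrics[:, start].copy()             # running site-by-site sum
    best_r = corr(total, d)
    remaining = set(range(metrics.shape[1])) - {start}
    while remaining:
        # Correlation with D of (current index + m_j) for each candidate j.
        trial = {j: corr(total + metrics[:, j], d) for j in remaining}
        j_best = min(trial, key=trial.get)        # strongest negative correlation
        if trial[j_best] >= best_r:               # no improvement: stop
            break
        selected.append(j_best)
        total += metrics[:, j_best]
        best_r = trial[j_best]
        remaining.remove(j_best)
    return selected, best_r

def candidate_mmis(metrics, d):
    """Steps 1 and 10: reflect metrics, then start once from every metric."""
    metrics = np.asarray(metrics, dtype=float)
    d = np.asarray(d, dtype=float)
    # Step 1: reverse metrics that are positively correlated with D.
    signs = np.array([-1.0 if corr(metrics[:, j], d) > 0 else 1.0
                      for j in range(metrics.shape[1])])
    reflected = metrics * signs
    results = {}
    for start in range(reflected.shape[1]):
        selected, r = build_mmi(reflected, d, start)
        results[frozenset(selected)] = r          # identical sets collapse
    return results

# Hypothetical data: two metrics with opposite errors and one noisy metric.
d = np.arange(20.0)
e = np.where(np.arange(20) % 2 == 0, 1.0, -1.0)
metrics = np.column_stack([-d + 3 * e, -d - 3 * e, 5 * np.sin(1.7 * np.arange(20))])
for members, r in candidate_mmis(metrics, d).items():
    print(sorted(members), round(r, 4))
```

The returned dictionary maps each distinct set of metrics to the correlation of its index with D, so the final choice among candidate MMIs can be made on correlation strength, interpretability or other criteria.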
These theory-based suggestions for MMI construction deviate significantly from other recent recommended methodologies. Part of the reason for the difference is that our method emphasizes the predictive capacity of the MMI, as opposed to interpretability of the final MMI. We argue that interpretability is addressed best by identifying the mediating factors that causally relate measures of human disturbance with the MMI (we discuss MMI interpretability elsewhere, Grace unpublished data, but see Riseng et al. 2006). However, as the outcome of our suggested procedure is a set of indices, professional judgment and issues of interpretability can be brought to bear on the process of selecting one MMI from the set.
The use of MMIs has been steadily increasing, and the conclusions drawn from them are being given increasing weight in decision-making. Biological and ecological MMIs are increasingly used to both set standards and judge improvement of biological and ecological integrity at a variety of spatial scales. The goal of this work is to inform the MMI construction process by creating general theory that allows discovery of some general properties of MMIs, provides understanding of how and why they work and helps determine how best to construct them. We believe that this work provides a foundation for a deeper understanding of what MMIs are and how they work, which will lead to advancements in both the construction and use of MMIs.
We thank Kathy Irvine for reviewing an early version of this manuscript. We also thank John Van Sickle and an anonymous reviewer for constructive comments on an earlier draft of the manuscript. This work was supported, in part, by funding from the USGS Status and Trends, Ecosystems, and Global Change Programs. The use of trade names is for descriptive purposes only and does not imply endorsement by the U.S. Government.