Keywords:

  • Study design;
  • Sample size;
  • Environmental monitoring;
  • Critical effect size

Abstract

  1. Abstract
  2. INTRODUCTION
  3. EXISTING OR IDEAL CES
  4. SOURCES UNRELATED TO MONITORING PROGRAM DATA
  5. DATA DISTRIBUTIONS AT REFERENCE SITES
  6. DATA DISTRIBUTIONS ACROSS IMPACTED AND REFERENCE SITES
  7. MULTIVARIATE APPROACHES
  8. A BAYESIAN PERSPECTIVE
  9. CONCLUSION
  10. Acknowledgements
  11. REFERENCES

The effective design of field studies requires that sample size requirements be estimated for important endpoints before conducting assessments. This a priori calculation of sample size requires initial estimates for the variability of the endpoints of interest, decisions regarding significance levels and the power desired, and identification of an effect size to be detected. Although many programs have called for use of critical effect sizes (CES) in the design of monitoring programs, few attempts have been made to define them. This paper reviews approaches that have been or could be used to set specific CES. The ideal method for setting CES would be to define the level of protection that prevents ecologically relevant impacts and to set a warning level of change that would be more sensitive than that CES level to provide a margin of safety; however, few examples of this approach being applied exist. Program-specific CES could be developed through the use of numbers based on regulatory or detection limits, a number defined through stakeholder negotiation, estimates of the ranges of reference data, or calculation from the distribution of data using frequency plots or multivariate techniques. The CES that have been defined often are consistent with a CES of approximately 25%, or two standard deviations, for many biological or ecological monitoring endpoints, and this value appears to be reasonable for use in a wide variety of monitoring programs and with a wide variety of endpoints.


INTRODUCTION

The traditional paradigm in ecological studies and environmental monitoring bases both research and management decisions on statistical tests of null hypotheses (H0). Statistical tests of collected data are used to evaluate the probability of observing a test statistic at least as extreme as the one observed. Newman [1] recently reviewed the use of statistical tests and highlighted some of the concerns with this approach. The statistical test sets a predefined critical significance value (α), which is the type I error rate, and a second predefined critical value (β), which describes the overall type II error rate. Typically, α is set at 0.05, but a range of values is used for β, and insufficient attention has been given to setting β in a justifiable manner [1]. Some programs (e.g., the Canadian Environmental Effects Monitoring [EEM] program [2]) have set α = β for field programs so that the risk of a false conclusion is balanced equally between risk to industry (type I error) and risk to the ecosystem (type II error).
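As a concrete illustration of the α = β convention, the sketch below finds the significance level at which the two error rates balance for a fixed design. It uses a two-sample z-test approximation with invented numbers (20 samples per site, a shift of 0.25, standard deviation 0.35); it is not the EEM program's actual procedure.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution (Python stdlib)

def type2_error(alpha, n, delta, sigma):
    """Approximate type II error rate (beta) of a two-sided, two-sample
    z-test with n observations per group, true mean difference delta,
    and common standard deviation sigma (the far tail is ignored)."""
    se = sigma * (2.0 / n) ** 0.5          # SE of the difference in means
    z_crit = Z.inv_cdf(1 - alpha / 2)      # two-sided critical value
    return Z.cdf(z_crit - delta / se)

def balanced_alpha(n, delta, sigma, tol=1e-9):
    """Bisect for the alpha at which alpha == beta, the balanced-risk
    convention described in the text."""
    lo, hi = 1e-6, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if type2_error(mid, n, delta, sigma) > mid:
            lo = mid                       # beta still exceeds alpha
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical design: 20 fish per site, detecting a 0.25 shift
# in an endpoint with standard deviation 0.35.
a = balanced_alpha(n=20, delta=0.25, sigma=0.35)
print(f"balanced alpha = beta = {a:.3f}")
```

Note that for a fixed design the two error rates trade off against each other, so the balanced point depends on sample size, variability, and the difference to be detected.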

If the probability associated with H0 is less than α, a significant departure from H0 is indicated, support for HA (the logical opposite of H0) is inferred, and a difference, or effect, likely exists. The magnitude of difference between treatments that is considered meaningful is the effect size [1,3]. The designation of this value for purposes of study design often is referred to as the critical effect size (CES). No clear guidelines exist on how to (or who should) determine how large an effect is unacceptable, and this determination can vary with the design, purpose, and regulatory basis for a monitoring program. Differences exist between the statistical threshold for detecting changes and the threshold that describes either ecologically important changes [4] or the changes thought to be of importance to higher levels of organization [5]. Few studies have attempted to link changes across levels of organization [6,7], and even fewer have tried to define levels of change that would be protective of higher levels of organization or important ecological processes.

The specification of a CES has been argued to be one of the most crucial aspects of environmental monitoring programs [3,8–10], although failure to consider CES a priori has been widespread [1]. In practice, CES rarely are used to guide ecological experiments or environmental management decisions in an effective manner, and few examples exist of aquatic monitoring programs implementing CES in a useful way. This has resulted in considerable criticism regarding the adequacy of environmental monitoring programs [11,12].

Critical effect sizes are defined by two components: The form and the magnitude (i.e., the type and the size) of the impact to be detected. The form of impact involves deciding which endpoints to monitor (e.g., individual- and/or community-level characteristics), deciding whether we are concerned with the means and/or variances at impacted sites relative to control sites, and specifying at what scales the impact is expected to occur. The magnitude of the impact is a measure of the amount by which means or variances change [3].

Critical effect sizes can be used in two main ways: As an a priori component of study design to set target levels for sample collection sufficient to detect a difference that would trigger more detailed monitoring [1,2], or a posteriori to identify the minimum effect size (difference between H0 and HA) that would be deemed unacceptable and trigger management actions [3,13,14]. Decisions regarding management action are site-specific and depend on a number of factors, including the magnitude and extent of changes, species sensitivity, number of endpoints responding, and their trend over time [2,15]. Ideally, a monitoring framework would be intentionally designed with endpoints that would be more sensitive than regulatory endpoints (e.g., species presence and absence) to allow a response time that would mitigate serious impacts. In any case, the CES becomes a component of study design to calculate the desired number of animals/samples/replicates required to make decisions.
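For the a priori use, the required sample size follows from the CES, the endpoint's variability, and the chosen error rates. A minimal sketch under a normal approximation (the 30% coefficient of variation below is an invented illustration, not a program value):

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def per_group_n(ces, cv, alpha=0.10, beta=0.10):
    """Per-group sample size for a two-sided, two-sample comparison
    (normal approximation): detect a relative difference `ces` for an
    endpoint with coefficient of variation `cv`."""
    z_a = Z.inv_cdf(1 - alpha / 2)   # two-sided type I critical value
    z_b = Z.inv_cdf(1 - beta)        # one-sided type II critical value
    return ceil(2 * ((z_a + z_b) * cv / ces) ** 2)

# A 25% CES with an assumed CV of 30% and alpha = beta = 0.10:
print(per_group_n(ces=0.25, cv=0.30))   # -> 25 per site
```

Halving the variability, or doubling the CES, cuts the required sample size roughly fourfold, which is why the choice of CES dominates study cost.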

Table 1. Potential methods used for determining critical effect sizes in monitoring programs

Category | Method | Used by government in monitoring | Used in research | Potential
Ideal approaches | Ecologically relevant differences | × (Australia) | | Maybe; data not available
 | Stakeholder negotiation | × (Sweden) | | Would require time and monetary investment
 | Legal mandate or program policy | × (Canada) | | Already exists in the Canadian Environmental Effects Monitoring program
Using a number from outside the monitoring program | Regulatory threshold | | | Probably insufficient data
 | Universal effect size | | × | No
 | Detection thresholds | | | No
Data distributions at reference sites | Range of natural variability | × (Sweden) | | Maybe
 | Within-site natural variability | × (Canada) | | Yes
 | Range of differences between studies | × (Canada) | | Yes
Data distributions across impacted and reference sites | Data distributions | × (USA) | | Yes
 | Distribution of impacted site data relative to reference data | | | Maybe
 | Distribution of relative comparisons between sites | × (Canada) | | Already exists in the Canadian Environmental Effects Monitoring program
 | Distribution of statistical results of comparisons between sites | | × |
 | Pooled effect size | | × |
Multivariate approaches | Analysis of reference conditions | × (Canada) | × | Yes
 | Combination multimetric and reference condition | × (European Union) | × | Probably not useful for Environmental Effects Monitoring
 | Magnitude of effects relative to impacts | | × | Maybe; not enough detail outlined in reports
 | Spatially based modeling | | × | Based on fish community approach; may be suitable for benthos

A variety of other papers have focused on issues related to power [3,8,10,12,16–21] and sampling design [10,19,22–25]. The present paper focuses on approaches that have been or could be used to set a CES or on approaches and processes that have been used to set targets and their potential relevance for setting CES. Three approaches are ideal: Knowing the size of a change that is ecologically relevant, negotiating a level of change that would be considered as ecologically relevant, and adopting a number set from ecological studies and environmental monitoring [3,26], medical research [27,28], or behavioral sciences [29]. Because so few attempts have been made to set CES, the present review examines alternative approaches, and it divides them into numbers derived from outside the specific monitoring program, the examination of past data using distributions, or multivariate techniques (Table 1). Regardless of the approach taken or the endpoints in question, the magnitude of the change that programs have used for a CES has been similar (Table 2). Detailed descriptions of the potential approaches follow.

Table 2. Summary of critical effect sizes found in the literature review

Method | Endpoint | Magnitude of difference | Reference
Ecologically relevant differences | Abundance of invertebrates settling relative to reference areas | 25–50% | [14]
Published number (previously identified concern) | Gonad size | 25% | [2]
Universal effect size | All endpoints | Small (0.10), medium (0.25), and large (0.40) | [29]
Within- and among-site natural variability | Invertebrate univariate endpoints | Standard deviation (SD) or equivalent (e.g., 2 SD) | [15]
 | Invertebrate and fish multimetric endpoints | Various magnitudes; analogous to using SD units | [44,45]
 | Invertebrate multivariate endpoints | Various magnitudes; analogous to using SD units (e.g., 95% probability ellipse) | [46]
Data distributions | Gonad size, liver size, condition | 25, 25, and 10%, respectively | [49]
 | Biological condition | 20% | [50]
 | Various | 5–25% | [39]
Not defined | Benthic abundance and biomass; eelgrass biomass; blue mussel settling | 25% | [93]

EXISTING OR IDEAL CES

Three main approaches to develop an ideal CES include using published data to determine the magnitude of changes in similar endpoints that would result in changes at higher levels of biological organization, negotiating a CES a priori between stakeholders, and adopting proven CES used in other monitoring programs that use similar endpoints.

Citing or developing sufficient background data

Few attempts have been made to use comparative studies and ecological theory to develop a CES. Lincoln-Smith et al. [14] examined the effects of a marine reserve on the recovery of valuable, commercially exploited tropical invertebrates and designed a study to detect 25 and 50% changes in proportional abundance relative to data collected before establishment of the marine reserve. Critical effect sizes were based on the percentage difference in the abundance of selected invertebrates at the marine reserve (before its establishment) relative to the abundance of selected invertebrates reported in the literature in areas free of exploitation elsewhere in the tropical Pacific.

Other examples of successful approaches are limited, but hypothetical examples of defining CES based on ecological-response thresholds and sensitivity were provided by Mapstone [3], regarding the effect of the loss of key algal species on the functioning of rocky intertidal platforms, and by Downes et al. [9], concerning the potential impacts of the liming of Welsh streams (to promote the survival of trout) on natural benthic macroinvertebrate species. These hypothetical examples employ data and safety margins or stakeholder negotiation, and they could be used to guide data development to define CES. In reality, however, few studies have the luxury of the time lag required to develop the baseline databases needed to define these CES a priori. Specifying a CES is not a simple procedure, and unfortunately, ecological information at the level of detail presented in the mentioned examples [3,9] seldom is available. Consequently, few regulations state a required CES.

Deriving numbers from stakeholder negotiation

Although science is involved in quantifying relationships, the strength of impacts, and the variables being measured, social and economic values need to be considered when trying to decide what constitutes “harmless” or “acceptable” [9]. As a consequence, several authors emphasize the importance of public consultation with landowners, environmental groups, and industry when determining CES to develop a credible and widely accepted result by the stakeholders involved [3,9,12].

One of the major challenges involved with determining the magnitude of change deemed important is achieving consensus between stakeholders regarding how to interpret the results. For example, although descriptions of the impacts of pulp mill effluent on sexual maturation and reproductive development in fish have been available for more than 15 years [30], a lack of consensus remains among stakeholders concerning whether these changes are real, consistent, or important [31]. The inability to reach consensus about the existence of impacts, or about their causes, is based on a number of factors, including a lack of agreement regarding the importance of measurement endpoints, confusion about the relevance of changes, and confusion about what would have to be done if an impact was declared. The decision concerning the acceptability of changes can be based on science or can be developed through consensus agreement on the nature of changes that will be socially or economically unacceptable. These decision points can be agreed to through stakeholder consultation a priori, before data collection, or a posteriori.

In Sweden, a multistakeholder group met over a considerable period of time and proposed levels of impact that would be considered as unacceptable [32]. The group divided indicators into functional groups [33] and recommended that if three or more variables in the same functional group were significantly affected (meaning a statistical difference), this should be interpreted as an unacceptable disturbance of the function. For physiological functions, an unacceptable disturbance was two or more statistical differences, which would represent an unacceptable disturbance of fish health and an evident risk of population effects through increased mortality. If statistical changes occur in a functional group in one (physiological) or two (other) variables, further investigations would be needed to confirm the responses and to analyze their wider significance.
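The Swedish rule described above can be expressed as a simple threshold check. The sketch below encodes only what the text states (two or more significant variables in a physiological group, three or more in other groups, is unacceptable; one fewer triggers further investigation); the functional-group names are hypothetical placeholders.

```python
def assess_group(group_type, n_significant):
    """Decision rule from the Swedish multistakeholder proposal, as
    described in the text. `group_type` is 'physiological' or 'other';
    `n_significant` is the count of statistically affected variables."""
    threshold = 2 if group_type == "physiological" else 3
    if n_significant >= threshold:
        return "unacceptable disturbance"
    if n_significant == threshold - 1:
        return "further investigation"
    return "no action indicated"

# Hypothetical monitoring outcome: significant variables per group.
results = {"physiological": 2, "reproduction": 1, "benthic community": 3}
for group, n in results.items():
    kind = "physiological" if group == "physiological" else "other"
    print(group, "->", assess_group(kind, n))
```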

This strategy of multistakeholder negotiation to a priori define levels of impact that would be considered as unacceptable does not require that the changes be ecologically relevant, only that all parties agree in advance that the changes would be considered important enough to correct. It should be emphasized that the Swedish EEM program includes a wide variety of endpoints across multiple levels of organization, ranging from physiological changes in fish to community-level disturbances [34].

Criteria set by program policy—The Canadian EEM example

In 1992, the Canadian Pulp and Paper Effluent Regulations were developed, and these included an EEM program [35]. The EEM program is a cyclical monitoring study that provides information regarding the potential effects of effluent on fish populations, fish tissue, and benthic invertebrate communities [2]. Nine decision endpoints for fish and macroinvertebrate communities currently are used in the EEM program. Although effects are determined based on consistent, statistically significant responses over two consecutive monitoring cycles, significant environmental impacts requiring further study currently are identified based on responses exceeding a CES. Critical effect sizes initially were developed after the completion of cycle 1 monitoring by fish and macroinvertebrate expert working groups composed of both industry and government scientists [15,36]. The fish and macroinvertebrate working groups operated independently to develop the monitoring requirements, study designs, and CES for their respective taxonomic groups. Critical effect sizes currently are set for fish at a 25% difference in relative gonad and liver size and a 10% difference in condition, whereas macroinvertebrate population- and community-level endpoints are set at two standard deviations as derived from reference area data [37] (Table 2).

For fish, new pulp mill EEM requirements were developed largely as a result of concerns about potential reproductive responses to effluent. As a consequence, changes in the gonadosomatic index were of primary concern, and CES initially were set at 25%, based on the magnitude of difference observed at pulp mills at Jackfish Bay (Canada) [30] and Norrsundet (Sweden) [38], where reproductive impacts were known to have occurred. Effectively, the target was set at a magnitude of change commonly accepted to represent significant impacts. Other endpoints developed for the EEM program, such as liver size and condition, were determined to be less variable than gonad size, and as a consequence, sample sizes for these endpoints were based on the power and sample size requirements to detect a statistical difference of 25% in the gonadosomatic index (actually, a range of 20–30%). These levels of change subsequently were reexamined after two more cycles of data collection, based on the distributions of data (discussed below).

The present review did not find other examples of CES defined by developing baseline data, through stakeholder negotiation, or by legal mandate. If information from comparable studies, ecological theory, or legal mandates is insufficient and published numbers for CES are not available or not acceptable, it may be possible to develop CES using generic numbers or using numbers generated from within the monitoring program after it is operating (see later sections).

SOURCES UNRELATED TO MONITORING PROGRAM DATA

Several methods have been proposed for setting CES that could be based on a threshold that would trigger a regulatory response, a value based on a universal effect size, or a value based on detection thresholds.

Regulatory response threshold

It is possible to set a CES based on a regulatory threshold identified through a review of regulatory decisions relevant to the endpoint in question (i.e., how often a discharger has been fined for a specific outcome). Analysis would require reviewing the regulatory decisions made as a consequence of environmental impacts and the associated effect sizes and then deciding how large a difference in endpoints has been associated with those decisions. One of the challenges is that environmental regulations have not been enforced often enough to generate a sufficiently large database for this purpose.

Universal effect size

Critical effect sizes could be established through the use of standard or universal effect sizes. Within the social sciences, specific magnitudes for small (0.10), medium (0.25), and large (0.40) effect sizes were proposed by Cohen [29]. The biological and ecological basis for their use in environmental monitoring remains unproven, however, and even Cohen acknowledged that these values were relative to the specific content and method applicable to a given research situation. Researchers from several fields of study have recommended against adopting CES based on this method [9,27] and have emphasized that no single value is applicable in all situations. Although Cohen's effect sizes have been used in the environmental monitoring literature, thus far they have been used only in a theoretical context focused on optimizing sample sizes in monitoring programs [21].

Detection thresholds

It is possible to choose a CES based on analytical detection limits, especially for chemically based monitoring endpoints. These have been used, including in Canada, for endpoints such as the presence of chlorinated dioxins and furans in effluents (i.e., dioxins must be nondetectable in effluents or <10 ppq, which was the detection limit at the time the regulation was developed). The limit could be defined based on practicality (e.g., a detection limit), experience (difference from normal that is able to be detected), or data collection requirements, but the difficulty is that the relationship to effects on receiving water biota is unclear. Critical effect sizes based on detection thresholds make sense only when the target is well defined; otherwise, a CES would have to be developed using data collected from similar monitoring programs when they are available.

DATA DISTRIBUTIONS AT REFERENCE SITES

Once a program is running, potential CES can be defined in a variety of ways based on data generated within a monitoring program, parallel research to generate the necessary background data for the program, or data from similar programs. The variability estimated could be derived from the ranges of data found across a range of natural variability within or among reference areas (temporal or spatial) or from the distributions of reference data. A variety of approaches can be used to define reference sites [39]; these include best professional judgment; using minimally disturbed or least-disturbed sites; following preset chemical, physical, or biological criteria; or interpreting historical conditions.

Range of natural variability among reference areas

Several researchers have cited the importance of using natural variability among reference sites within a monitoring program to identify the range of expected response levels and then subsequently defining CES as an observed value either outside of or at the extreme of this range. This approach would involve sampling multiple reference sites, potentially in multiple reference seasons, to assess natural variation in characteristics. Balk et al. [40] sampled Eurasian perch (Perca fluviatilis) at two reference sites for six years to document natural variability in whole-organism characteristics. Swedish scientists have a number of long-term reference databases to use in defining natural variability [38,41,42]. Few studies have defined natural variability, although some databases are available, including long-term studies on fish endpoints at pulp mills [43] (K.R. Munkittrick, unpublished data). It should be noted that natural variability among reference sites frequently is used to derive the equivalent of CES for invertebrate and fish monitoring studies using both multimetric approaches (see, e.g., [44,45]) and multivariate approaches (see, e.g., [46]). These kinds of approaches are described in more detail below.

Effectively, this approach defines the CES as any value outside the range seen at reference sites. This approach has definite disadvantages for use in a monitoring program, including the fact that many monitoring endpoints change seasonally and are affected by habitat. As a consequence, a CES based on this method would have the added disadvantage of needing to be region and habitat specific, and a nationwide standard would not be possible. Dramatic differences can occur in reference levels within a species regionally and between lake and river environments [43,47], such that exceeding the normal range of variability would require about half the fish observed at an exposed site to have a condition that is outside the normal range of variability observed at reference sites. This is less acceptable for fish than for more variable invertebrate community endpoints, such as indices of benthic community composition, for which use of the normal range generally is accepted [48].
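The normal-range criterion can be made concrete by asking what fraction of exposed-site observations fall outside the reference range, rather than flagging any single excursion. The condition-factor numbers below are invented for illustration.

```python
from statistics import mean, stdev

def fraction_outside_range(reference, exposed, k=2.0):
    """Fraction of exposed-site observations falling outside the
    reference 'normal range' (reference mean +/- k standard deviations)."""
    m, s = mean(reference), stdev(reference)
    lo, hi = m - k * s, m + k * s
    outside = [x for x in exposed if not (lo <= x <= hi)]
    return len(outside) / len(exposed)

# Invented fish condition-factor data:
reference = [0.90, 1.00, 1.10, 1.00, 0.95, 1.05]
exposed = [1.20, 1.10, 0.85, 1.00]
print(fraction_outside_range(reference, exposed))   # -> 0.5
```

A decision rule built on this statistic (e.g., act only when the fraction approaches one half) reflects the point in the text that exceeding the normal range demands a large share of exposed fish to be individually abnormal.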

Natural variability within reference areas

Critical effect sizes have been set for macroinvertebrate communities using a measure of natural variability within reference areas, such as two standard deviations calculated from the mean of the reference area data [15,37,48,49]. A general relationship exists between effect sizes expressed as percentage differences (from the reference mean) and as reference area standard deviation units (R.B. Lowell, unpublished analysis, Canadian pulp and paper EEM data). Fixed percentage CES for invertebrate community data (e.g., a −50% to +200% change relative to the reference mean) ultimately are based on measures of reference area variability [15] but would vary widely in magnitude among different geographic areas. One of the challenges of a CES based on standard deviation is that the value would be free to vary between sites and studies and could lead sampling personnel to artificially inflate reference area variability and so reduce the potential for finding significant differences.
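The two-standard-deviation CES, and its rough equivalence to a percentage difference from the reference mean, can be sketched as follows. The invertebrate abundance values are invented for illustration.

```python
from statistics import mean, stdev

def ces_band(reference, k=2.0):
    """CES band from within-reference variability: reference mean
    +/- k*SD, with the half-width also expressed as a percentage
    of the reference mean."""
    m, s = mean(reference), stdev(reference)
    return {"lower": m - k * s,
            "upper": m + k * s,
            "percent_of_mean": 100.0 * k * s / m}

# Invented reference-area abundances (organisms per sample):
band = ces_band([10, 12, 8, 11, 9])
print(band)
```

Because the band width is proportional to the reference standard deviation, the same 2 SD rule translates into different percentage CES in different regions, which is the portability problem noted above.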

Basing CES on variability has the advantage of being endpoint specific, but the magnitude of impact determined to be important can vary between programs or cycles when converted to a fixed percentage (as in the multivariate approach described below). This could potentially be viewed as a means of implementing adaptive management (i.e., adapting CES to natural cycle-to-cycle variability in reference conditions).

The distribution of data from the reference sites could be used to set a CES based on ambient distributions, and these have been based on 5th or 25th percentiles of important indicators [39] or on the 10th percentile [50]. Impairment also has been defined as a percentile of the reference data distribution, including assigning the 95th percentile as maximum impairment [51] or the 25th percentile of the least-impacted streams as minimum impairment [52]. In addition, it is possible to set a target level as the maximum (or minimum) level found at reference sites by using a range associated with two standard deviations from the reference sites or by using the 95% confidence interval of the reference sites; this would represent a combination of the approaches, using variance to define ranges and natural variability to set the CES [43].
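One sketch of this percentile approach: build cut points from the reference-data distribution and flag values outside a chosen percentile band. The 5th/95th percentile choices and the data below are illustrative, not prescribed values.

```python
from statistics import quantiles

def classify_against_reference(value, reference, lower_pct=5, upper_pct=95):
    """Classify a site value against the reference-data distribution:
    outside the [lower_pct, upper_pct] percentile band -> 'impaired'."""
    cuts = quantiles(reference, n=100)    # 1st..99th percentile cut points
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return "impaired" if not (lo <= value <= hi) else "at reference"

reference = list(range(1, 101))           # illustrative reference values
print(classify_against_reference(3, reference))    # -> impaired
print(classify_against_reference(50, reference))   # -> at reference
```

Moving the band (e.g., to the 25th percentile, as some of the cited programs do) tightens or loosens the effective CES without changing the mechanics.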

Based on these approaches, Meador et al. [50] judged a 20% change in biological condition as being degraded, although they caution that “a 20% change in biological condition should represent a reasonable threshold for statistical comparison and a biologically relevant response to disturbance [but] should not necessarily be considered a standard for regulatory purposes.”

A variation of the data distribution approach also has been advocated by the U.S. Environmental Protection Agency for nutrient and algal monitoring in their reference reach approach [53]. Data from relatively undisturbed stream segments are used to identify the natural range of nutrient and algal indices, and potentially impacted streams are classified, based on their condition relative to the reference streams, as at reference, at risk, or impaired. The reference values can be selected based on best professional judgment, a percentile of the distribution (e.g., the 75th), or a percentile of the streams thought to represent reference streams (e.g., the 5th to 25th) [53].

DATA DISTRIBUTIONS ACROSS IMPACTED AND REFERENCE SITES

If similar monitoring programs have been developed, or if the monitoring program is ongoing, data distributions across all sites can be used to generate a CES using a variety of procedures, including the distributions of impacted sites versus reference data, the distributions of comparisons relative to reference data, and comparisons of the distributions of the statistical differences between sites or the magnitude of the differences between sites (the pooled effect size approach described below).

Data distributions

The scoring approaches common in multimetric index development typically use the distributions of data from both reference and impacted sites [54]. A multimetric index takes data from a variety of endpoints, calibrates the various endpoints (e.g., size, lesions, and percentage omnivorous species) individually against the distributions of data, scales its values, and obtains a unitless score, which is then aggregated with other scores to form a multimetric index (see, e.g., [55–57]). Total rankings can be summed, scaled to 100 based on the number of metrics used, or standardized to a scale of 0 to 1 by dividing by the sum of the reference sites. Ideal multimetric indices incorporate multiple levels of biological organization, address structure and function within the community, and incorporate broad sensitivities and ranges of habitat. Final metrics usually are selected from a large group of initial metrics based on criteria of power, consistency, uniqueness, or overlap with defined reference sites [58], and they are designed to be responsive to stressors and to exhibit low natural variability [59].

The relevant issue for developing CES is the designation of the scoring benchmarks for the metrics and the use of the distributions of data across all sites. Sites usually are preclassified based on human judgment or stressor data, and scores are picked based on the distributions of data. The metrics can be assigned scores on a subjective scale (e.g., slight deviation or moderate) [60], based on dividing the distributions of data by percentiles or quartiles, or based on lines that divide the data into two or three groups [51,61] or that bisect interquartiles [62]. Candidate scores can be distributed across a variety of rankings, ranging from 0 to 10 [58], 1 to 5 [63], 0 to 6 [51], or 0, 10, or 20 [52]. The scales can be discontinuous (i.e., only scores of 0, 2, 4, or 6 are possible, as in Applegate et al. [51], or of 1, 3, or 5, as in Karr [63]) or continuous (as in Mayon et al. [64]). In all cases, assigning scores still requires some assumption about what represents normal and what represents impairment for each system and each endpoint.
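A simplified sketch of quartile-based 1/3/5 scoring against reference distributions, one of several conventions mentioned above. The metric names, direction flags, and data are hypothetical, and real indices use more elaborate benchmark rules.

```python
from statistics import quantiles

def score_metric(value, reference, higher_is_better=True):
    """Assign a 1/3/5 score by comparing a metric with the quartiles of
    its reference-site distribution (5 = reference-like). This is one
    simplified convention, not a standard."""
    q1, _q2, q3 = quantiles(reference, n=4)
    if not higher_is_better:          # e.g., percentage of tolerant taxa
        value, q1, q3 = -value, -q3, -q1
    if value >= q3:
        return 5
    if value < q1:
        return 1
    return 3

def multimetric_index(site_values, reference_sets):
    """Sum the per-metric scores and rescale to 0-100."""
    scores = [score_metric(site_values[m], ref)
              for m, ref in reference_sets.items()]
    return 100.0 * sum(scores) / (5 * len(scores))

refs = {"taxa richness": list(range(1, 20)),
        "EPT richness": list(range(1, 20))}
site = {"taxa richness": 16, "EPT richness": 3}
print(multimetric_index(site, refs))   # -> 60.0
```

The benchmark choice (quartiles here) is exactly the assumption about normal versus impaired that the text says every scoring scheme must make.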

A similar approach has been taken with the biological condition gradient [65] as a model of biological response to the increasing effects of stressors. This model encompasses the complete range, or gradient, of aquatic resource conditions, from natural (e.g., undisturbed or minimally disturbed conditions) to severely altered conditions. The biological condition gradient uses changes in 10 ecological attributes and divides responses into six condition tiers, with tier 1 representing natural or undisturbed conditions and tier 6 representing severely altered conditions. The attributes vary from taxonomic composition and tolerance to nonnative taxa to organism condition and ecosystem function. Organism condition indicators include fecundity, morbidity, mortality, growth rates, and anomalies (lesions, tumors, and deformities). For individual-level indicators, tiers 1 and 2 represent background conditions; tier 3, infrequent changes; tier 4, an incidence of anomalies slightly higher than expected; tier 5, possibly reduced biomass with increasingly common anomalies; and tier 6, conditions in which long-lived taxa may be absent, biomass is reduced, anomalies are common and serious, and reproduction is minimal except in extremely tolerant groups. The tiers are combined into a weight-of-evidence approach [65], and levels of impairment are not defined for specific tiers, resulting in limitations similar to those seen with the multimetric approaches.

Distribution of impacted site data relative to reference data

It is possible to develop a CES by examining the distribution of the data from impacted sites relative to the reference sites, similar to what is done within the sediment-quality triad approach [66]. The triad involves a comparison of data for sediment chemistry, sediment toxicity testing, and benthic community indices, although it can be extended to include a wide variety of other lines of evidence [67]. Each station is classified for contamination (high, medium, or low, based on contaminant levels relative to mean effects-range values), toxicity (subjective scale from toxic to marginally toxic to nontoxic), and quality of the benthic assemblage (impaired, slightly impaired, or not impaired). Each metric can be scored as 5, 3, or 1, depending on whether it approximates, deviates slightly from, or deviates greatly from conditions at reference sites, respectively. The data also can be plotted graphically (on a scale of 0–1 based on the ratio to reference) [68] or scaled from 0 to 100 based on the ratio of the difference to the maximum difference from the reference site [69]. The data also can be ranked nonparametrically, compared by rank correlations or spatial correspondence [68,70], and subjected to principal component analysis [70], cluster analyses, or descriptive discriminant analysis [71].
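The 0-to-100 scaling relative to the maximum difference from reference [69] amounts to a simple normalization; a minimal sketch, with a hypothetical function name and invented example values:

```python
def scale_to_max_difference(site_values, reference_mean):
    """Scale each site's absolute difference from the reference mean to
    0-100, where 100 is assigned to the site most different from reference."""
    diffs = [abs(v - reference_mean) for v in site_values]
    max_diff = max(diffs)
    return [100.0 * d / max_diff for d in diffs]

# Sites at 10, 14, 6, and 2 against a reference mean of 10:
print(scale_to_max_difference([10, 14, 6, 2], reference_mean=10))  # [0.0, 50.0, 50.0, 100.0]
```

Because the scale is anchored to the single worst site, the same absolute difference can receive different scores in different data sets, which is the relative-ranking limitation noted below.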

The sediment-quality triad does not use a CES approach per se; rather, it uses a difference relative to the maximum difference from the reference site to rank sites and so identify the most different sites and the degree of confidence in the relationship between altered communities, chemical contamination, and inherent toxicity. The final approach is not quantitative, requires the use of multiple stations, and is based on relative differences to the reference site, basically ranking the most different as the worst. Similar to the distribution data across all sites, the relative distribution of impacted sites could be used to define CES by employing any of the types of distributions discussed in the multimetric approach.

Distribution of relative comparisons between sites

Since the completion of cycles 2 and 3 of the pulp and paper program, the Canadian EEM approach has used the distributions of relative comparisons between sites. Depending on the endpoint and cycle, studies are conducted at 60 to more than 110 sites; each study compares reference and impacted sites (>95% of studies use a single reference and a single impacted site) and defines a magnitude of effect in that study (for males or females, targeting two species). The program prioritizes sites based on the distributions of differences between sites for the measured endpoints (Fig. 1). Using this structure, CES could then be based on defining relatively rare differences as representing potential areas of concern in which more information collection is warranted [72,73]. Currently, CES for fish are set at 25% for most endpoints [37]. The distributions of responses also could be subjected to a distribution analysis using the 90th or 95th percentile of the differences to define the magnitude of a CES (Fig. 2); for gonad sizes, the 90th percentile difference averaged 33.9% across cycles 1 to 3 of the program (data not shown).
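A candidate CES of this kind can be read directly off the empirical distribution of between-site differences; a minimal sketch, assuming differences are already expressed as a percentage of the reference mean (the values below are invented, not EEM data):

```python
def ces_from_difference_distribution(percent_differences, pct=90):
    """Return the given percentile of observed between-site differences
    (each expressed as % of the reference mean) as a candidate CES."""
    data = sorted(percent_differences)
    # Linear interpolation between the two closest ranks.
    k = (len(data) - 1) * pct / 100.0
    lo, hi = int(k), min(int(k) + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (k - lo)

diffs = [2, 5, 7, 8, 10, 12, 15, 18, 22, 40]
print(ces_from_difference_distribution(diffs, pct=90))
```

A data-derived threshold of this sort would shift as new cycles of data accumulate, which is one route to the adaptive-management resetting discussed in the conclusion.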

Figure 1. Gonadal size differences during cycle 2 (□) and cycle 3 (▪) fish comparisons from the Canadian pulp and paper Environmental Effects Monitoring program (modified from Lowell et al. [37]).

This approach is similar in intent to the benchmark dose (BMD) approach advocated for determining human exposure limits to toxic compounds [27,74]. The BMD approach uses a mathematical dose-response curve fitted to a specified benchmark response, defined as the degree of change assumed to distinguish between an adverse and a nonadverse effect [27]. The lower limit of the statistical confidence interval of the critical effect dose is called the BMD and has been proposed as an alternative to the use of no-effect levels to define a point of departure for reference dose/concentration calculations. The benchmark response often is guided by a threshold response in a dose-response curve. Threshold responses often are difficult to specify unambiguously in practice, and values often are left to best professional judgment and are open to interpretation. Conceptually, however, the approach is built with the purpose of preventing effects of any magnitude, and it requires the initial determination of a benchmark response for a chosen endpoint in response to an exposure variable. Regardless, the choice of the benchmark response typically is arbitrary, and it is common to apply additional 10-fold safety factors to compensate for the uncertainty of responses within the human population and for extrapolating the results of animal exposures to humans [27].

Although this approach is not typically used to set a CES, it could be. The BMD could be set where a proportional increase occurs in the response of test subjects to an effect, along a dose-response curve [27,28]. An example of the approach can be seen in Figure 2 using the Canadian EEM fish data from the pulp and paper monitoring program. A significant change in the slope of the line would signify a biologically relevant change in the magnitude of the response and could be used to determine the benchmark response.

Figure 2. Distributions of percentage change in gonadosomatic index (absolute levels, expressed as a percentage of the reference site mean) from the 210 comparisons conducted during cycle 2 and cycle 3 of the Canadian pulp and paper Environmental Effects Monitoring program (T. Barrett and K.R. Munkittrick, unpublished data).

Distribution of statistical results of comparisons between sites

Yeom and Adams [60] developed an integrated index to evaluate the effects of stressors over several levels of biological organization, ranging from the suborganism to the community level, using an integrative star plot analysis. Accumulated values are plotted after being summed and divided by the total score at the reference site to give a number between 0 and 1. The area of the star plot is then calculated (see Beliaeff and Burgeot [75]). The star plot area of the least-disturbed sites is assigned a value of 2.0, whereas the area at maximal impairment approaches 0. Sampling sites are evaluated according to three categories of health status: acceptable (star plot area, >1.60), marginally impaired (star plot area, 1.20–1.60), and impaired (star plot area, <1.20).
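The star plot area of Beliaeff and Burgeot [75] is the sum of the triangular wedges between adjacent, equally spaced axes; a sketch of that calculation follows. Note that with four metrics all standing at a reference-scaled value of 1.0, the area is exactly 2.0, consistent with the least-disturbed value quoted above:

```python
import math

def star_plot_area(scores):
    """Area of a star (radar) plot with one equally spaced axis per metric,
    computed as the sum of the triangular wedges between adjacent axes."""
    n = len(scores)
    wedge = 0.5 * math.sin(2 * math.pi / n)  # area factor per adjacent pair
    return sum(wedge * scores[i] * scores[(i + 1) % n] for i in range(n))

# Four metrics, all at the reference-scaled value of 1.0:
print(round(star_plot_area([1.0, 1.0, 1.0, 1.0]), 2))  # 2.0
```

Because adjacent metrics are multiplied, the area (and thus the health-status category) depends on the order in which metrics are placed around the plot, a known property of this index.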

The approach uses statistical results to assign the scores for the metrics: Those metrics with no difference from reference conditions at p > 0.05 are assigned a score of 4, those differing from reference conditions at p < 0.05 receive a score of 3, those differing at p < 0.01 a score of 2, and those differing at p < 0.001 a score of 1. For other metrics, the 95th percentile of the data distribution is used to eliminate outliers, and remaining values are standardized as a percentage of the 95th percentile value to provide a range of scores: Values of 75% and higher receive a score of 4, values of 50 to 74% a score of 3, values of 25 to 49% a score of 2, and values less than 25% a score of 1. As with the multimetric and sediment-quality triad approaches described above, the distribution of the data could be used to define CES for specific endpoints, with the added advantage that it would be based on statistical differences.

Tejerina-Garro et al. [76] based their designations of candidate scores for a multimetric index on distributions of the Student's t values for comparisons with reference sites. In the Canadian EEM program, the 75th percentile p value for cycles 2 and 3 gonad size data was 0.009, and the 80th percentile p value was 0.001 (T. Barrett and K.R. Munkittrick, unpublished data).

Pooled effect size

Bailer et al. [77] recommended an approach that standardizes the differences between sites by the pooled standard deviation, similar to the one currently conducted for metaanalysis of Canadian EEM fish endpoint data [37,49]. The standardized difference suggested [77] is used to compare the p values from the statistical testing at each site to the pooled effect size values [(mean_exp − mean_ref)/(pooled standard deviation)], although care should be taken to examine the potential influence of unequal variance between sites. The regression between the two values is then used to define the effect sizes (in units of standard deviation) at which the curve crosses p values of 0.05 and 0.10, bounding a range of important effect sizes. This pooled effect range could be used to set the CES and could be described by the range itself or by its median.
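The standardized difference of Bailer et al. [77] is essentially a pooled-standard-deviation effect size; a minimal sketch, with invented example data and equal variances assumed per the caveat above:

```python
import math

def pooled_effect_size(exp, ref):
    """Standardized difference (mean_exp - mean_ref) / pooled SD; assumes
    roughly equal variances between the exposed and reference sites."""
    n1, n2 = len(exp), len(ref)
    m1, m2 = sum(exp) / n1, sum(ref) / n2
    v1 = sum((x - m1) ** 2 for x in exp) / (n1 - 1)   # sample variances
    v2 = sum((x - m2) ** 2 for x in ref) / (n2 - 1)
    sp = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / sp

exp = [8, 9, 10, 11, 12]    # exposed-site observations (invented)
ref = [10, 11, 12, 13, 14]  # reference-site observations (invented)
print(pooled_effect_size(exp, ref))
```

Plotting each site's p value against this standardized difference, as the approach describes, would then locate the effect sizes at which the fitted curve crosses p = 0.05 and p = 0.10.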

MULTIVARIATE APPROACHES


Considerable interest exists in using multivariate approaches to examine the data from monitoring studies, whether within a study, between sites, or between programs, and it generally is recognized that multivariate techniques can be more sensitive than univariate techniques [78]. Various multivariate approaches were reviewed for their relevance to setting CES, including multidimensional scaling and other ordination techniques to examine differences among reference sites, distributions and magnitudes of effect sizes, a combination of approaches, and the potential use of spatially based mapping.

Multivariate analysis of reference conditions

The reference condition approach involves testing an ecosystem exposed to a potential stressor against a reference condition that is little impaired [46,79,80]. It has been applied primarily to benthic communities (see below for fish-based applications) in the United Kingdom, Canada, the United States, and Australia (for reviews, see [46,81]). It usually requires a large number of reference sites (≫100) and can use the presence or absence of taxa or additive indices using multiple metrics [79,82]. Expert judgment or public workshops are used to identify regional (or ecoregional) reference sites, and a large number of community metrics (55 in Reynoldson et al. [82]) are assessed by ordination to reduce redundancy. Environmental variables that can be affected by disturbance are removed from the analysis. The reference condition approach clusters communities based on similarity of community structure, and it correlates the biological data with environmental attributes to define an optimal set of environmental variables that can be used to predict group membership. Test (exposure) stations are assessed relative to the group to which they are predicted to belong to determine whether they differ, either by the range of variation observed at reference sites (two standard deviations) or by ordination methods that determine whether the test site falls within the 95% probability ellipse of the matched reference sites [2].
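The 95% probability ellipse test described above can be sketched for two ordination dimensions using the squared Mahalanobis distance of the test site from the reference centroid, compared against the chi-square critical value with 2 degrees of freedom (5.99); the coordinates below are invented for illustration:

```python
def mahalanobis_2d(point, reference_points):
    """Squared Mahalanobis distance of a test site from the centroid of
    reference sites in two ordination dimensions; a value > 5.99
    (chi-square, 2 df) places the site outside the 95% ellipse."""
    n = len(reference_points)
    mx = sum(p[0] for p in reference_points) / n
    my = sum(p[1] for p in reference_points) / n
    sxx = sum((p[0] - mx) ** 2 for p in reference_points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in reference_points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in reference_points) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

refs = [(0.1, 0.2), (-0.2, 0.1), (0.0, -0.1), (0.2, -0.2), (-0.1, 0.0)]
print(mahalanobis_2d((0.05, 0.0), refs) < 5.99)  # inside the ellipse
print(mahalanobis_2d((2.0, 2.0), refs) < 5.99)   # well outside
```

In practice the ellipse would be computed from the matched reference group identified by the predictive model, not from an arbitrary set of sites as here.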

Tonn et al. [83] have developed a reference condition approach for fish that uses canonical correspondence analysis to identify reference community characteristics for environmental variables at reference sites, followed by discriminant function analysis with cross-validation and best-model fits to determine whether disturbed sites had predicted communities.

In terms of defining CES, the effect size could be determined from the distributions of the reference site data once the analysis has been completed. Some limitations exist: Because sites are randomly selected and usually visited only once to define reference status, the approach does not encompass natural temporal variability; instead, it is assumed that the large number of sites sampled captures that variability. These approaches basically define the sites that are furthest from the expected reference norms but differ from the other approaches in that they describe the difference relative to the range of variability among reference sites. They are not unlike attempts to define the maximum differences and to rank the maximum differences as the most important.

Combination of multimetric and reference condition approaches

The European fish-based index combines the multimetric Index of Biotic Integrity approach with the reference condition approach, but it requires a large number of sites (>5,000 were used by Pont et al. [84]). It depends on best professional judgment to classify the sites and to select the reference sites. The approach is semiquantitative and uses four measures of human impact (modification of morphology, hydrology, presence of toxic substances or acidification, and nutrient loading) that are each ranked from 1 to 5 (no pressure to severe impact) and summed for a rating of 4 to 20. These ratings are then re-ranked from 1 to 5 (based on groupings of four) to represent no impact to very heavy impact. Fish community data were modeled statistically to define the reference sites representing the least-disturbed conditions using the relationship of functional ecological characteristics to 13 local and regional environmental variables, and metrics were developed based on the deviation of each site from 0 (i.e., the most different sites were rated as the worst). The final index sums all retained variables, each rated on a scale of 0 to 10, and then rescales the result to a value between 0 and 1. These classifications can be analyzed to determine an effect size that would place a response outside of the reference data.

Figure 3. Environmental Effects Monitoring program cycle 2 fish data plotted in a multidimensional scaling ordination (modified from Lowell et al. [49]). Line encompasses the 90% of sites closest to the origin of the plot (○ = female fish; • = male fish).

Relative magnitude of effect

It is possible to set CES based on the multivariate magnitude of effect relative to other facilities (e.g., other pulp mills) using a multidimensional scaling that measures the multivariate distance a site lies from the zero-effect condition. The approach effectively would set a CES by identifying sites that show larger effects when several response variables are considered simultaneously. Figure 3 shows cycle 2 fish data for the Canadian pulp and paper EEM program plotted in a multidimensional scaling ordination [49]. Points drawn to the bottom right of the plot represent sites where the collected fish had larger gonads, larger livers, and higher condition factor; sites drawn to the upper left represent sites where collected fish had smaller livers, smaller gonads, and lower condition factor. The circle encompasses the 90% of sites closest to the origin of the plot. Under this approach, sites outside the circle would represent the 10% of mills with the worst impacts in the country. The magnitude of these differences could be used to define a CES for a subsequent cycle. This approach has a number of disadvantages. For example, a mill's location inside or outside the circle may vary depending on the endpoints included in the analysis, and the magnitude of impact determined to be important can vary from cycle to cycle and with which mills are incorporated in the analysis.
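The circle-based screening described above reduces, in sketch form, to computing each site's multivariate distance from the zero-effect origin and flagging sites beyond the radius that encloses 90% of them (the effect values below are invented, not EEM data):

```python
import math

def flag_extreme_sites(site_effects, coverage=0.90):
    """Flag sites whose Euclidean distance from the zero-effect origin
    (effects expressed on common scales, e.g., relative change in gonad
    size and liver size) falls outside the circle enclosing `coverage`
    of the sites."""
    dists = [math.sqrt(sum(e * e for e in effects)) for effects in site_effects]
    radius = sorted(dists)[int(math.ceil(coverage * len(dists))) - 1]
    return [d > radius for d in dists]

effects = [(0.1, 0.0), (0.0, 0.2), (-0.1, 0.1), (0.2, -0.1),
           (0.0, -0.2), (0.1, 0.1), (-0.2, 0.0), (0.1, -0.1),
           (0.3, 0.2), (1.5, 1.2)]  # the last site is an outlier
print(flag_extreme_sites(effects))
```

This makes the cycle-to-cycle instability noted above concrete: the flagging radius is defined by the other sites in the analysis, so adding or removing mills moves the threshold.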

Spatially based modeling

A second European approach to examining impacts on fish uses spatially based modeling, which copes with the problem of natural variability by predicting reference conditions of distinct fish assemblage types. An individual method is developed for each fish assemblage type, and discriminant function analyses are used to model and predict status class membership of any given site [85]. Metrics are weighted in the discriminant function according to their individual contribution to the overall pressure-index relationship. The disadvantage of discriminant function analysis is that because of its multidimensional nature, the contribution of individual metrics is hidden in the discriminant functions of the model [59].

A BAYESIAN PERSPECTIVE


The use of Bayesian statistics and decision theory has been increasing in ecology, conservation biology, and fisheries management since the mid-1990s (see, e.g., [86–88]). Use of Bayesian statistics and decision theory has been advocated to improve the quality of environmental monitoring programs by taking uncertainties into account quantitatively when evaluating management options [86,88,89]. In traditional frequentist inference, tests of significance are performed by supposing that a hypothesis is true (the null hypothesis) and then computing the probability of observing a statistic at least as extreme as the one actually observed during hypothetical future repeated trials. In contrast, Bayesian statistical inference requires the explicit assignment of prior probabilities, based on expert judgments elicited using existing and additional data, to the outcomes of experiments. The results of these experiments, regardless of sample size, then can be used to compute posterior probabilities of the hypotheses given the available data [86]. In other words, frequentist statistics examine the probability of the data given a model (hypothesis), whereas Bayesian statistics examine the probability of a model given the data. Proponents of Bayesian statistics have argued that this approach makes better use of existing data, allows stronger conclusions to be drawn from large-scale experiments with few replicates, and is a more relevant approach to environmental decision making.
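As a minimal illustration of the Bayesian logic in a CES setting, and not a method proposed in the literature reviewed here, a normal prior on the true percentage difference (e.g., informed by earlier cycles) can be combined with a current survey estimate to give the posterior probability that the effect exceeds a CES; all numbers below are invented:

```python
import math

def posterior_prob_exceeds_ces(obs_mean, obs_se, prior_mean, prior_sd, ces):
    """Posterior probability that the true % difference exceeds a CES,
    combining a normal prior with a normal likelihood (known standard
    error) via the conjugate precision-weighted update."""
    w_prior, w_obs = 1 / prior_sd ** 2, 1 / obs_se ** 2
    post_mean = (w_prior * prior_mean + w_obs * obs_mean) / (w_prior + w_obs)
    post_sd = math.sqrt(1 / (w_prior + w_obs))
    z = (ces - post_mean) / post_sd
    return 0.5 * (1 - math.erf(z / math.sqrt(2)))  # P(effect > CES)

# Prior centred on a 10% difference; survey observes 30% +/- 8%; CES = 25%:
print(posterior_prob_exceeds_ces(30, 8, 10, 20, 25))
```

The answer is a direct probability statement about the hypothesis of concern (the effect exceeds the CES), which is the contrast with frequentist p values drawn in the text.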

Although this approach is gaining momentum in the ecological and environmental literature, it has not been widely adopted, and the theoretical framework generally is not well understood by biologists or managers. An additional concern is the subjectivity introduced into subsequent analyses, particularly through the assignment of prior probabilities. Proponents, however, argue that these concerns are largely superficial and originate from poor understanding of the concept. Although the Bayesian approach is conceptually a viable option for field-monitoring programs, the predominant method currently in practice uses traditional hypothesis-driven experimental design.

CONCLUSION


Despite widespread appeals for using CES to guide environmental management decisions, they are rarely used in practice. This is particularly true for CES developed directly for fish endpoints, such as those used in the Canadian EEM program [37], in part because of the popularity of multimetric endpoints for fish assessments in the United States and Europe. Biologically relevant effect sizes should be defined a priori and, in applied studies, should take into consideration the type and magnitude of change that is likely to be of concern [78,89], which requires that the sampling designs be understood in advance [90–92].

A number of potential methods for determining CES were identified and reviewed, and several of these approaches have the potential to be suitable for determining CES in a monitoring program like Canada's EEM program, based on existing EEM data collected in cycles 1 to 4. Furthermore, a number of attempts to define CES are relevant in terms of the justifications they have used and the values employed for decision making.

Values based on extreme effect sizes and on stakeholder negotiations were each identified as potential alternatives for determining CES. Numbers based on observed regulatory thresholds and published CES, however, do not appear to be viable options given the lack of information available at this time. Universal effect sizes also were determined to be unsuitable, both because single values often are not applicable across disciplines and because of the lack of biological and/or ecological basis in their assignment.

Several of the data-defined methods reviewed may be applicable for evaluating CES, and further effort should be placed on evaluating these approaches using available data, with additional credibility potentially achieved by obtaining agreement among the various methods. These evaluations would include examinations of data distributions and of the distributions of statistical differences, as well as evaluations of applying the various benchmark approaches and thresholds to examine differences. Dose-response curves commonly used for determining CES-type effect sizes in medical science may be a potential option for determining CES; however, available data will have to be mined to determine whether a suitable dose-response curve can be developed. An additional advantage of data-defined methods is that they offer the potential for adaptive management, resetting the bar between monitoring cycles and allowing continuous movement toward improvement. Suitability would depend on the motivation for the monitoring program.

Numerous examples of alternative approaches were found that would be applicable to monitoring programs. Recommended approaches include basing CES on natural variability within and among comparable reference areas (e.g., two standard deviations), an approach commonly accepted for benthic community analyses, as well as deriving fixed percentage CES (e.g., 25%) for fish endpoints (Table 2). Further analyses of the existing data should be done to evaluate these alternative approaches for setting CES.

Acknowledgements


We would like to thank the researchers we corresponded with for their thoughts and suggestions regarding the use of CES in environmental monitoring, particularly Tony Underwood and three anonymous reviewers for their insightful comments. Tim Barrett is acknowledged for the statistical plots of the cycle 2 and 3 EEM data.

REFERENCES

1. Newman MC. 2008. "What exactly are you inferring?" A closer look at hypothesis testing. Environ Toxicol Chem 27:1013–1019.
2. Environment Canada. 2005. Pulp and paper environmental effects monitoring guidance document. EEM/2005/1. National Environmental Effects Monitoring Office, Environment Canada, Gatineau, QC.
3. Mapstone BD. 1995. Scalable decision rules for environmental impact studies: Effect size, type I, and type II errors. Ecol Appl 5:401–410.
4. Oris JT, Roberts AP. 2007. Statistical analysis of cytochrome P4501A biomarker measurements in fish. Environ Toxicol Chem 26:1742–1750.
5. van der Oost R, Beyer J, Vermeulen NPE. 2003. Fish bioaccumulation and biomarkers in environmental risk assessment: A review. Environ Toxicol Pharmacol 13:57–149.
6. Kidd KA, Blanchfield PJ, Mills KH, Palace VP, Evans RE. 2007. Collapse of a fish population after exposure to a synthetic estrogen. Proc Natl Acad Sci U S A 104:8897–8901.
7. Carls MG, Heintz RA, Marty GD, Rice SD. 2005. Cytochrome P4501A induction in oil-exposed pink salmon Oncorhynchus gorbuscha embryos predicts reduced survival potential. Mar Ecol Prog Ser 301:253–265.
8. Osenberg CW, Schmitt RJ, Holbrook SJ, Abu-Saba KE, Flegal AR. 1994. Detection of environmental impacts: Natural variability, effect size, and power analysis. Ecol Appl 4:16–30.
9. Downes BJ, Barmuta LA, Fairweather PG, Faith DP, Keough MJ, Lake PS, Mapstone BD, Quinn GP. 2002. Monitoring Ecological Impacts: Concepts and Practice in Flowing Waters. Cambridge University Press, Cambridge, UK.
10. Underwood AJ, Chapman MG. 2003. Power, precaution, type II error and sampling design in assessment of environmental impacts. J Exp Mar Biol Ecol 296:49–70.
11. Elliott M, De Jonge VN. 1996. The need for monitoring the monitors and their monitoring. Mar Pollut Bull 32:248–249.
12. Field SA, O'Connor PJ, Tyre AJ, Possingham HP. 2007. Making monitoring meaningful. Aust J Ecol 32:485–491.
13. Keough MJ, Mapstone BD. 1997. Designing environmental monitoring for pulp mills in Australia. Water Sci Technol 35:397–404.
14. Lincoln-Smith MP, Pitt KA, Bell JD, Mapstone BD. 2006. Using impact assessment methods to determine the effects of a marine reserve on abundances and sizes of valuable tropical invertebrates. Can J Fish Aquat Sci 63:1251–1266.
15. Lowell RB. 1997. Discussion paper on critical effect size guidelines for EEM using benthic invertebrate communities. EEM/1997/8. National Environmental Effects Monitoring Office, Environment Canada, Ottawa, ON.
16. Green RH. 1989. Power analysis and practical strategies for environmental monitoring. Environ Res 50:195–205.
17. Peterman RM. 1990. Statistical power analysis can improve fisheries research and management. Can J Fish Aquat Sci 47:2–15.
18. Fairweather PG. 1991. Statistical power and design requirements for environmental monitoring. Aust J Mar Freshw Res 42:555–567.
19. Underwood AJ. 1995. Detection and measurement of environmental impacts. In Underwood AJ, Chapman MG, eds, Coastal Marine Ecology of Temperate Australia. University of New South Wales, Sydney, Australia, pp 311–324.
20. Fox DR. 2006. Statistical issues in ecological risk assessment. Hum Ecol Risk Assess 12:120–129.
21. Ortiz M. 2002. Optimum sample size to detect perturbation effects: The importance of statistical power analysis—A critique. Mar Ecol 23:1–9.
22. Clarke KR, Green RH. 1988. Statistical design and analysis for a 'biological effects' study. Mar Ecol Prog Ser 46:213–226.
23. Underwood AJ. 1994. On beyond BACI: Sampling designs that might reliably detect environmental disturbances. Ecol Appl 4:3–15.
24. Benedetti-Cecchi L. 2001. Beyond BACI: Optimization of environmental sampling designs through monitoring and simulation. Ecol Appl 11:783–799.
25. Cabral HN, Murta AG. 2004. Effect of sampling design on abundance estimates of benthic invertebrates in environmental monitoring studies. Mar Ecol Prog Ser 276:19–24.
26. Field SA, Tyre AJ, Jonzen N, Rhodes JR, Possingham HP. 2004. Minimizing the cost of environmental management decisions by optimizing statistical thresholds. Ecol Lett 7:669–675.
27. Dekkers S, de Heer C, Rennen MAJ. 2001. Critical effect sizes in toxicological risk assessment: A comprehensive and critical evaluation. Environ Toxicol Pharmacol 10:33–52.
28. Dekkers S, Telman J, Rennen MAJ, Appel MJ, de Heer C. 2006. Within-animal variation as an indication of the minimal magnitude of the critical effect size for continuous toxicological parameters applicable in the benchmark dose approach. Risk Anal 26:867–880.
29. Cohen J. 1988. Statistical Power Analysis for the Behavioral Sciences, 2nd ed. Lawrence Erlbaum, Hillsdale, NJ, USA.
30. Munkittrick KR, Portt CB, Van der Kraak G, Smith IR, Rokosh DA. 1991. Impact of bleached kraft mill effluent on population characteristics, liver MFO activity, and serum steroid levels of a Lake Superior white sucker (Catostomus commersoni) population. Can J Fish Aquat Sci 48:1371–1380.
31. Munkittrick KR, Sandström O. 2003. Ecological assessments of pulp mill impacts: Issues, concerns, myths and research needs. In Stuthridge T, van den Heuvel M, Marvin N, Slade A, Clifford J, eds, Environmental Impacts of Pulp and Paper Waste Streams. Proceedings, Third International Conference on Environmental Fate and Effects of Pulp and Paper Mill Effluents, Rotorua, New Zealand, pp 352–362.
32. Larsson Å, Förlin L, Grahn O, Landner L, Lindesjöö E, Sandström O. 2000. Guidelines for interpretation and biological evaluation of biochemical, physiological and pathological alterations in fish exposed to industrial effluents. SSVL Miljö 2000, Report 5, Supplement 2. Swedish Environmental Protection Agency, Stockholm, Sweden.
33. Sandström O, Larsson Å, Andersson J, Appelberg M, Bignert A, Ek H, Förlin L, Olsson M. 2005. Three decades of Swedish experience demonstrates the need for integrated long-term monitoring of fish in marine coastal areas. Water Qual Res J Can 40:233–250.
34. Swedish Environmental Protection Agency. 1997. Environmental impacts of pulp and paper mill effluents: A strategy for future environmental risk assessments. Report 4785. Stockholm, Sweden.
35. Walker SL, Hedley K, Porter E. 2002. Pulp and paper environmental effects monitoring in Canada: An overview. Water Qual Res J Can 37:7–19.
36. Paine M. 1997. Detailed variability analyses for cycle 1 results. In Environment Canada, Fish Survey Expert Working Group: Review of EEM Cycle 1. Final Report, Cycle 1 Analysis. Evaluation and Interpretation Branch, Environment Canada, Ottawa, ON, pp 55–93.
37. Lowell RB, Ring B, Pastershank G, Walker S, Trudel L, Hedley K. 2005. National assessment of pulp and paper Environmental Effects Monitoring data: Findings from cycles 1 through 3. National Water Research Institute, Scientific Assessment Report Series 5. Environment Canada, Burlington, ON.
38. Sandström O, Thoresson G. 1988. Mortality in perch populations in a Baltic pulp mill effluent area. Mar Pollut Bull 19:564–567.
39. Stoddard JL, Larsen DP, Hawkins CP, Johnson RK, Norris RH. 2006. Setting expectations for the ecological condition of streams: The concept of reference condition. Ecol Appl 16:1267–1276.
40. Balk L, Larsson Å, Förlin L. 1996. Baseline studies of biomarkers in the feral female perch (Perca fluviatilis) as tools in biological monitoring of anthropogenic substances. Mar Environ Res 42:203–208.
41. Sandström O, Neuman E. 2003. Long-term development in a Baltic fish community exposed to bleached pulp mill effluent. Aquat Ecol 37:267–276.
42. Sandström O, Larsson Å, Andersson J, Appelberg M, Bignert A, Ek H, Förlin L, Olsson M. 2005. Three decades of Swedish experience demonstrates the need for integrated long-term monitoring of fish in marine coastal areas. Water Qual Res J Can 40:233–250.
43. Munkittrick KR, McMaster ME, Van der Kraak G, Portt C, Gibbons WN, Farwell A, Gray M. 2000. Development of Methods for Effects-Driven Cumulative Effects Assessment Using Fish Populations: Moose River Project. SETAC Technical Publications Series. SETAC, Pensacola, FL, USA.
44. Ohio Environmental Protection Agency. 1988. Biological Criteria for the Protection of Aquatic Life, Vol II—User's manual for biological field assessment of Ohio surface waters. Document 0046e/0013e. Division of Water Quality Monitoring and Assessment, Columbus, OH, USA.
45. Ohio Environmental Protection Agency. 1988. Biological Criteria for the Protection of Aquatic Life, Vol I—The role of biological data in water quality assessment. Document 0055e/0015e. Division of Water Quality Monitoring and Assessment, Columbus, OH, USA.
46. Bailey RC, Norris RH, Reynoldson TB. 2004. Bioassessment of Freshwater Ecosystems: Using the Reference Condition Approach. Kluwer, Dordrecht, The Netherlands.
47. Galloway BJ, Munkittrick KR, Currie S, Gray MA, Curry RA, Wood CS. 2003. Examination of the responses of slimy sculpin (Cottus cognatus) and white sucker (Catostomus commersoni) collected on the Saint John River (Canada) downstream of pulp mill, paper mill, and sewage discharges. Environ Toxicol Chem 22:2898–2907.
48. Kilgour BW, Somers KM, Matthews DE. 1998. Using the normal range as a criterion for biological significance in environmental monitoring and assessment. Ecoscience 5:542–550.
49. Lowell RB, Ribey SC, Ellis IK, Porter EL, Culp JM, Grapentine LC, McMaster ME, Munkittrick KR, Scroggins RP. 2003. National assessment of the pulp and paper environmental effects monitoring data. National Water Research Institute Contribution 03-521. Environment Canada, Gatineau, QC.
50. Meador MR, Whittier TR, Goldstein RM, Hughes RM, Peck DV. 2008. Evaluation of an Index of Biotic Integrity approach used to assess biological condition in western U.S. streams and rivers at varying spatial scales. Trans Am Fish Soc 137:13–22.
51. Applegate JM, Baumann PC, Emery EB, Wooten MS. 2007. First steps in developing a multimetric macroinvertebrate index for the Ohio River. River Res Appl 23:683–697.
52. Lyons J. 2006. A fish-based index of biotic integrity to assess intermittent headwater streams in Wisconsin, USA. Environ Monit Assess 122:239–258.
53. U.S. Environmental Protection Agency. 2000. Nutrient Criteria Technical Guidance Manual—Rivers and Streams. EPA-822-B-00-002. Office of Water, Washington, DC, USA.
  • 54
    Roset N, Grenouillet G, Goffaux D. 2007. A review of existing fish assemblage indicators and methodologies. Fish Manag Ecol 14: 393405.
  • 55
    Hughes RM, Oberdorff T. 1999. Applications of IBI concepts and metrics to waters outside the United States and Canada. In SimonTP, ed, Assessing the Sustainability and Biological Integrity of Water Resources using Fish Communities. Lewis, Boca Raton, FL, USA, pp 7983.
  • 56
    Angermeier PL, Smogor RA, Stauffer JR. 2000. Regional frameworks and candidate metrics for assessing biotic integrity in mid-Atlantic highland streams. Trans Am Fish Soc 129: 962981.
  • 57
    Scardi M, Tancioni L, Cautaudella S. 2006. Monitoring methods based on fish. In ZiglioG, SiligardiM, FlaimG, eds, Biological Monitoring of Rivers. John Wiley, Chichester, UK, pp 135154.
  • 58
    Herbst D, Silldorff E. 2006. Comparison of the performance of different bioassessment methods: Similar evaluations of biotic integrity from separate programs and procedures. J North Am Benthol Soc 25: 513530.
  • 59
    Schmutz S, Cowx IG, Haidvogl G, Pont D. 2007. Fish-based methods for assessing European running waters: A synthesis. Fish Manag Ecol 14: 369380.
  • 60
    Yeom D-H, Adams SM. 2007. Assessing effects of stress across levels of biological organization using an aquatic ecosystem health index. Ecotoxicol Environ Saf 67: 286295.
  • 61
    Pinto BCT, Araujo FG, Hughes RM. 2006. Effects of landscape and riparian condition on a fish index of biotic integrity in a large southeastern Brazil river. Hydrobiologia 556: 6983.
  • 62
    Rodríguez-Olarte D, Amaro A, Coronel J, Taphorn BDC. 2006. Integrity of fluvial fish communities is subject to environmental gradients in mountain streams, Sierra de Aroa, north Caribbean coast, Venezuela. Neotropical Ichthyology 4: 319328.
  • 63
    Karr JR. 1981. Assessment of biotic integrity using fish communities. Fisheries 6: 2127.
  • 64
    Mayon N, Bertrand A, Leroy D, Malbrouck C, Mandiki SNM, Silvestre F, Goffart A, Thomé J-P, Kestemont P. 2006. Multiscale approach of fish responses to different types of environmental contaminations: A case study. Sci Total Environ 367: 715731.
  • 65
    Davies SP, Jackson SK. 2006. The Biological Condition Gradient: A descriptive model for interpreting change in aquatic ecosystems. Ecol Appl 16: 12511266.
  • 66
    Chapman PM. 2007. Do not disregard the benthos in sediment quality assessments! Mar Pollut Bull 54: 633635.
  • 67
    Chapman PM, Hollert H. 2006. Should the Sediment Quality Triad become a tetrad, a pentad, or possibly even a hexad? Journal of Soils and Sediments 6: 48.
  • 68
    Chapman PM. 1996. Presentation and interpretation of Sediment Quality Triad data. Ecotoxicology 5: 327339.
  • 69
    Alden RW, Hall LW, Dauer DM, Burton DT. 2005. An integrated case study for evaluating the impacts of an oil refinery effluent on aquatic biota in the Delaware River: Integration and analysis of study components. Hum Ecol Risk Assess 11: 879936.
  • 70
    Iannuzzi TJ, Armstrong TN, Long ER, Iannuzzi J, Ludwig DF. 2008. Sediment quality triad assessment of an industrialized estuary of the northeastern USA. Environ Monit Assess 139: 257275.
  • 71
    Hall LW, Dauer DM, Alden RW, Uhler AD, DiLorenzo J, Burton DT, Anderson RD. 2005. An integrated case study for evaluating the impacts of an oil refinery effluent on aquatic biota in the Delaware River: Sediment Quality Triad studies. Hum Ecol Risk Assess 11: 657770.
  • 72
    Munkittrick KR, McGeachy SA, McMaster ME, Courtenay SC. 2002. Overview of freshwater fish studies from pulp and paper environmental effects monitoring program. Water Qual Res J Can 37: 4977.
  • 73
    Kilgour BW, Munkittrick KR, Portt C, Hedley K, Culp JM, Dixit S, Pastershank G. 2005. Biological criteria for municipal waste-water effluent monitoring programs. Water Qual Res J Can 40: 374387.
  • 74
    Crump KS. 1984. A new method for determining allowable daily intakes. Fundam Appl Toxicol 4: 854871.
  • 75
    Beliaeff B, Burgeot T. 2002. Integrated biomarker response: A useful tool for ecological risk assessment. Environ Toxicol Chem 21: 13161322.
  • 76
    Tejerina-Garro FL, de Mérona B, Oberdorff T, Hugueny B. 2006. A fish-based index of large river quality for French Guiana (South America): Method and preliminary results. Aquat Living Resour 19: 3146.
  • 77
    Bailer AJ, Oris JT, See K, Hughes MR, Schaefer R. 2003. Defining and evaluating impact in environmental toxicology. Environmetrics 14: 235243.
  • 78
    Somerfield PJ, Clarke KR, Olsgard F. 2002. A comparison of the power of categorical and correlational tests applied to community ecology data from gradient studies. J Anim Ecol 71: 581593.
  • 79
    Reynoldson TB. 2001. Comparison of models predicting invertebrate assemblages for biomonitoring in the Fraser River catchment, British Columbia. Can J Fish Aquat Sci 58: 13951410.
  • 80
    Bailey RC, Kennedy MG, Dervish MZ, Taylor RM. 1998. Biological assessment of freshwater ecosystems using a reference condition approach: Comparing predicted and actual benthic in-vertebrate communities in Yukon streams. Freshw Biol 39: 765774.
  • 81
    Bailey RC, Reynoldson TB, Yates AG, Bailey J. 2007. Integrating stream bioassessment and landscape ecology as a tool for land use planning. Freshw Biol 52: 908917.
  • 82
    Reynoldson TB, Norris RH, Resh VH, Day KE, Rosenberg DM. 1997. The reference condition: A comparison of multimetric and multivariate approaches to assess water-quality impairment using benthic macroinvertebrates. J North Am Benthol Soc 16: 833852.
  • 83
    Tonn WM, Paszkowski CA, Scrimgeour GJ, Aku PM, Lange M, Prepas EE, Westcott K. 2003. Effects of forest harvesting and fire on fish assemblages in boreal plains lakes: A reference condition approach. Trans Am Fish Soc 132: 514523.
  • 84
    Pont D, Hugueny B, Rogers C. 2007. Development of a fish-based index for the assessment of river health in Europe: The European Fish Index. Fish Manag Ecol 14: 427439.
  • 85
    Schmutz S, Melcher A, Frangez C, Haidvogl G, Beier U. 2007. Spatially based methods to assess the ecological status of riverine fish assemblages in European ecoregions. Fish Manag Ecol 14: 441452.
  • 86
    Ellison AM. 1996. An introduction to Bayesian inference for ecological research and environmental decision-making. Ecol Appl 6: 10361046.
  • 87
    Punt AE, Hilborn R. 1997. Fisheries stock assessment and decision analysis: The Bayesian approach. Rev Fish Biol Fish 7: 3563.
  • 88
    Wade PR. 2000. Bayesian methods in conservation biology. Conserv Biol 14: 13081316.
  • 89
    Peterman RM, Anderson JL. 1999. Decision analysis: A method for taking uncertainties into account in risk-based decision making. Hum Ecol Risk Assess 5: 231244.
  • 90
    Underwood AJ. 1997. Experiments in Ecology: Their Logical Design and Interpretation Using Analysis of Variance. Cambridge University Press, Cambridge, UK.
  • 91
    Underwood AJ. 1996. Detection, interpretation, prediction and management of environmental disturbances: Some roles for experimental marine ecology. J Exp Mar Biol Ecol 200: 127.
  • 92
    Underwood AJ. 2000. Trying to detect impacts in marine habitats: Comparisons with suitable reference areas. In SparksT, ed, Statistics in Ecotoxicology. John Wiley, Chichester, UK, pp 279308.
  • 93
    Gray JS. 1999. Using science for better protection of the marine environment. Mar Pollut Bull 39: 310.