Bosker et al. (2012) correctly state that pseudoreplication is of concern primarily at the individual mine site level. This “regulatory objective” was also the focus in Huebert et al. (2011), although perhaps not stated as clearly as it might have been.

The recommendation of Bosker et al. (2012) to include multiple reference sites within Environmental Effects Monitoring (EEM) programs is of critical importance and, as stated within the EEM guidance document (Environment Canada 2002, 2011), should be adopted where possible. Despite these recommendations, the majority of metal mines EEM studies carried out to date have apparently used a single reference site. It is this reality that was of concern in Huebert et al. (2011). National leadership to advance “good scientific practice” in the design of monitoring programs is clearly required.

Bosker et al. (2012) provide a response to comments concerning the statistical structure of the EEM program. For clarity, it is useful to provide a short description of the various Type I and Type II errors (Table 1) under discussion. Bosker et al. (2012) suggest that the probability of a Type I error when rejecting the Null Hypothesis (α′) is not actually 61%, because the 9 endpoints are not independent. However, Type I errors are caused by random error and chance alone and are thus only affected if the endpoints are covariates. The original authors of the EEM guidance document understood this, as evidenced by the replacement, early on, of Simpson's Diversity Index with Simpson's Evenness. The former is a covariate of invertebrate species richness and abundance, whereas the latter is not. Clearly, when multiple endpoints are used to test a single hypothesis, the default estimate of the probability of at least 1 Type I error when rejecting the Null Hypothesis (α′) is $1 - (1 - \alpha)^{n}$, where n is the number of endpoints, independent or not, being measured, and α is the probability of a Type I error when finding an effect for each endpoint.
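The arithmetic behind the 61% figure can be checked with a minimal sketch. It simply evaluates the default estimate quoted above for the 9 EEM endpoints at α = 10%, and then repeats the calculation with a Bonferroni-adjusted α. The function names are hypothetical and chosen for illustration only; they do not come from the EEM guidance.

```python
def experimentwise_alpha(alpha: float, n_endpoints: int) -> float:
    """Probability of at least one Type I error across n endpoints,
    using the default estimate 1 - (1 - alpha)^n quoted in the text."""
    return 1.0 - (1.0 - alpha) ** n_endpoints


def bonferroni_alpha(target_alpha: float, n_endpoints: int) -> float:
    """Comparison-wise alpha after a Bonferroni correction (target / n)."""
    return target_alpha / n_endpoints


if __name__ == "__main__":
    n = 9         # endpoints used to test the Null Hypothesis in EEM
    alpha = 0.10  # comparison-wise significance level used in EEM

    # Unadjusted: roughly 0.61, the value reported in Table 1.
    print(f"unadjusted alpha': {experimentwise_alpha(alpha, n):.3f}")

    # Bonferroni-adjusted comparison-wise alpha: roughly 0.011.
    adjusted = bonferroni_alpha(alpha, n)
    print(f"adjusted alpha:    {adjusted:.4f}")

    # Experiment-wise alpha' with the adjusted level: close to 10%.
    print(f"adjusted alpha':   {experimentwise_alpha(adjusted, n):.3f}")
```

With the adjusted level, the experiment-wise value falls just below the nominal 10%, which is the behaviour expected of a Bonferroni correction.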

Table 1. Definition of Type I and Type II error within the EEM program

| Error | Scale | Error definition | Term | Error value (unadjusted α) | Error value (adjusted α) |
|---|---|---|---|---|---|
| Type I | Endpoint (comparison-wise) | The probability, by chance alone, of finding an effect [a] when there is none, for any of 9 endpoints | α | 10% | ≈1.1% [c] |
| Type I | Null Hypothesis (Ho) [b] (experiment-wise) | Rejecting Ho when it is true; the probability, by chance alone, of finding at least 1 effect (in 9 endpoints) when there is none | α′ | ≈61% | ≈10% |
| Type II | Endpoint (comparison-wise) | The probability, by chance alone, of finding no effect when there is one, for any of 9 endpoints | β | ≈10% | ≈38% |
| Type II | Null Hypothesis (Ho) (experiment-wise) | Accepting Ho when it is false; the probability, by chance alone, of finding no effect (in any or all 9 endpoints) when there is at least 1 | β′ | ??? | ≥0.02% |

EEM = Environmental Effects Monitoring.
[a] An “effect” within EEM is defined as a significant difference in at least 1 endpoint between a reference and exposure site.
[b] The Null Hypothesis within the metal mines EEM can be stated as follows: “There is no effect of mine discharge on fish or benthic invertebrate communities in receiving waters.”
[c] Adjustment to α by Bonferroni correction.

Bosker et al. (2012) correctly state that using a Bonferroni Correction will increase comparison-wise β for individual endpoints, and calculate a β of 0.38 with a Bonferroni corrected α of 0.011 (Table 1). However, this analysis, although correct, misses the point. For EEM practitioners, it is the probability of a Type II error when accepting the Null Hypothesis (experiment-wise β′) that is of interest, not the probability of a Type II error when determining no effect for individual endpoints (comparison-wise β). Practitioners are primarily concerned with whether the Null Hypothesis has been incorrectly accepted, that is, whether any or all endpoints did not show an effect when they should have; practitioners are not initially interested in which endpoint that might be (i.e., β in Table 1). Considering that 9 endpoints are used to test the experiment-wise Null Hypothesis, there is a substantial difference between β and β′. If one considers that comparison-wise power = 62% (β = 0.38 for each of the 9 endpoints within EEM), then it could also be considered that experiment-wise power is approximately equal to 99.98% ($\beta' = 0.38^{9} \approx 0.02\%$) when accepting the Null Hypothesis. Using a comparison-wise β = 0.38 in EEM clearly does not compromise power of the experiment-wise test of the Null Hypothesis.
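The experiment-wise figures above follow from raising the comparison-wise β to the power of the number of endpoints. The sketch below reproduces that arithmetic under the simplification implicit in the calculation, namely that every endpoint carries a true effect and that misses on different endpoints occur independently. The function names are hypothetical and used only for illustration.

```python
def experimentwise_beta(beta: float, n_endpoints: int) -> float:
    """Probability that all n endpoints miss a true effect (beta^n),
    assuming independent misses and an effect present on every endpoint."""
    return beta ** n_endpoints


def experimentwise_power(beta: float, n_endpoints: int) -> float:
    """Probability that at least one endpoint detects the effect."""
    return 1.0 - experimentwise_beta(beta, n_endpoints)


if __name__ == "__main__":
    beta = 0.38  # comparison-wise beta at the Bonferroni-adjusted alpha
    n = 9        # endpoints used to test the Null Hypothesis in EEM

    # 0.38^9 is roughly 0.000165, i.e. about 0.02% when rounded.
    print(f"beta':  {experimentwise_beta(beta, n):.6f}")
    # Experiment-wise power is therefore roughly 99.98%.
    print(f"power': {experimentwise_power(beta, n):.4%}")
```

If fewer than 9 endpoints truly respond to the discharge, the exponent would be smaller and β′ correspondingly larger, which is consistent with the "≥0.02%" entry in Table 1.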

To support their argument against use of the Bonferroni Correction, Bosker et al. (2012) cite Nakagawa (2004). However, abandoning the Bonferroni Correction is only the introductory portion of the argument in Nakagawa (2004). Ultimately, the recommendation is to completely abandon inferential statistics and to use Critical Effect Size to make a determination of “biological significance” rather than “statistical significance” (Nakagawa and Cuthill 2007). This concept is not without merit but should certainly not be adopted piecemeal.

Bosker et al. (2012) correctly state that within the EEM a second confirmatory cycle of monitoring is required, and that it is unlikely that a Type I error (α) for at least 1 of the 9 endpoints will occur twice. However, stating that a 61% probability of at least 1 Type I error within any single monitoring study is acceptable, because the study will be repeated and the errors might cancel, is fundamentally incompatible with good scientific practice and will more likely than not result in requirements for ongoing confirmatory monitoring. This outcome will create a regulatory burden that is unjustifiable, because it is based in large part on the statistical bias inherent within each individual, independent monitoring study.

Calculation of the probability of finding and confirming an effect is also not the sole interest of the EEM practitioner; it is the finding and confirming of no effect that is of equal interest. It is surely reasonable to expect that if a discharge has no measurable environmental impact, then the probability of a Type I error when rejecting the Null Hypothesis (α′) should be close to the statistical standard (i.e., 10% in EEM; Table 1). This is not the case within the current EEM statistical structure (Table 1). Furthermore, the probability of at least 1 Type I error in 2 consecutive EEM studies is approximately equal to 85% ($1 - 0.39^{2}$). Currently, it is therefore unlikely for the practitioner to find and confirm a determination of no effect, even in the complete absence of a discharge-related environmental impact. If the EEM program is to fairly evaluate and test the Null Hypothesis, it is clear that a level of significance of 0.011 must be adopted for each of the 9 endpoints used within the EEM program. Failure to accept this alteration will lead inexorably to an unnecessary and unjustifiable requirement for increasingly intensive and extensive monitoring on the part of EEM practitioners.
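The 85% figure for 2 consecutive studies can be reproduced with a short sketch. It assumes that the monitoring cycles are statistically independent and that each has the same experiment-wise α′; the function name is hypothetical. For comparison, the same formula is applied to the ≈10% value given in Table 1 for the adjusted level of significance.

```python
def prob_error_in_any_cycle(alpha_prime: float, n_cycles: int) -> float:
    """Probability of at least one experiment-wise Type I error across
    n independent monitoring cycles: 1 - (1 - alpha')^n_cycles."""
    return 1.0 - (1.0 - alpha_prime) ** n_cycles


if __name__ == "__main__":
    cycles = 2  # initial study plus confirmatory study

    # Current structure, alpha' of about 0.61: 1 - 0.39^2, roughly 0.85.
    print(f"current:  {prob_error_in_any_cycle(0.61, cycles):.2f}")

    # Adjusted structure, alpha' of about 0.10: 1 - 0.90^2 = 0.19.
    print(f"adjusted: {prob_error_in_any_cycle(0.10, cycles):.2f}")
```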


  • Bosker T, Barrett TJ, Munkittrick KR. 2012. Response to Huebert et al. “Canada's environmental effects monitoring program: Areas for improvement.” Integr Environ Assess Manag 8: 381–382.
  • Environment Canada. 2002. Metal Mining Guidance Document for Aquatic Environmental Effects Monitoring. Ottawa, ON, Canada.
  • Environment Canada. 2011. Metal Mining Environmental Effects Monitoring (EEM) Technical Guidance Document. Ottawa, ON, Canada.
  • Huebert DB, McKernan M, Paquette C. 2011. Canada's environmental effects monitoring program: Areas for improvement. Integr Environ Assess Manag 7: 143–144.
  • Nakagawa S. 2004. A farewell to Bonferroni: The problems of low statistical power and publication bias. Behav Ecol 15: 1044–1045.
  • Nakagawa S, Cuthill IC. 2007. Effect size, confidence interval and statistical significance: A practical guide for biologists. Biol Rev 82: 591–605.