A method for assessing the potential for confounding applied to ionic strength in central Appalachian streams



Causal relationships derived from field data are potentially confounded by variables that are correlated with both the cause and its effect. The present study presents a method for assessing the potential for confounding and applies it to the relationship between ionic strength and impairment of benthic invertebrate assemblages in central Appalachian streams. The method weighs all available evidence for and against confounding by each potential confounder. It identifies 10 types of evidence for confounding, presents a qualitative scoring system, and provides rules for applying the scores. Twelve potential confounders were evaluated: habitat, organic enrichment, nutrients, deposited sediments, pH, selenium, temperature, lack of headwaters, catchment area, settling ponds, dissolved oxygen, and metals. One potential confounder, low pH, was found to be biologically significant and eliminated by removing sites with pH < 6. Other potential confounders were eliminated based on the weight of evidence. This method was found to be useful and defensible. It could be applied to other environmental assessments that use field data to develop causal relationships, including contaminated site remediation or management of natural resources. Environ. Toxicol. Chem. 2013;32:288–295. © 2012 SETAC


The use of field data to understand and manipulate causal relationships is limited by the possibility that the apparent relationship is confounded. Confounding is a bias in the analysis of causal relationships due to the influence of extraneous factors (confounders). Confounding can occur when a variable is correlated with both the potential cause and its effect. The correlations are usually due to a common source of multiple potentially causal agents. However, they may be observed for other reasons (e.g., when one variable is a by-product of another) or due to chance associations.

Confounding is not, in general, well treated in ecological studies. Investigators often assume that confounding is not a problem if the association of interest is strong (e.g., has a high correlation coefficient). Alternatively, they assume that multiple regression, path analysis, or other multivariate statistics adequately deal with confounding, even though assumptions are violated and important potential confounders are often excluded for lack of adequate data. Correlation and regression do not even tell us whether the putative cause (C) produces the effect (E) or E produces C. In the present study, we present an alternative approach based on weighing all available evidence for and against plausible confounders. The approach includes a list of types of evidence that could indicate that confounding interferes with our ability to characterize the causal relationship and uses explicit criteria and scoring to transparently evaluate the evidence.

The method is applied to potential confounders of the relationship between stream invertebrate presence and the salts that leach from crushed rock in central Appalachia [1; this issue]. The goal of the present analysis was to determine which environmental variables must be treated as confounders in the development of the benchmark value. It was not to eliminate confounding variables. Most of them are natural variables, such as temperature and habitat structure, that cannot be literally eliminated in the way that women or smokers can be excluded from an epidemiological study. Nor was the goal to equate the levels of confounders to an ideal or pristine level. Furthermore, the goal was not to demonstrate that these variables never cause effects. It is known that these factors all cause some effects in some circumstances. The goal was to support estimation of the ionic strength, measured as specific conductance, that protects against unacceptable effects on the invertebrate communities in those streams without significant influence by confounding variables [1, 2].


General approach

We developed a weight-of-evidence approach for evaluating potential confounders. Both logical arguments and statistical analyses are used to indicate whether an environmental factor affects or does not affect our ability to model the causal relationship. If the body of evidence indicated that the factor was not a potential confounder, no action was taken. If the body of evidence indicated that an environmental factor was a likely confounder, then the data set was truncated to reduce the effect of the confounder. Truncation removes the observations for which the confounder was beyond its threshold for effects. Although it was not necessary in this case, other methods might be used to adjust for any discovered confounding of the causal relationship.
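The truncation step can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the site records, the field names, and the `truncate` helper are hypothetical, and the pH < 6 cutoff is the one reported in the abstract for the low-pH confounder.

```python
# Hypothetical site records; field names and values are illustrative only.
sites = [
    {"site": "A", "ph": 7.8, "conductivity_us_cm": 950, "ephemeroptera_genera": 1},
    {"site": "B", "ph": 5.2, "conductivity_us_cm": 310, "ephemeroptera_genera": 3},
    {"site": "C", "ph": 7.1, "conductivity_us_cm": 120, "ephemeroptera_genera": 8},
    {"site": "D", "ph": 4.3, "conductivity_us_cm": 1400, "ephemeroptera_genera": 0},
]


def truncate(records, field, beyond_threshold):
    """Drop observations whose confounder value is beyond its effects threshold."""
    return [rec for rec in records if not beyond_threshold(rec[field])]


# Remove sites with pH < 6 (the cutoff reported for the low-pH confounder).
retained = truncate(sites, "ph", lambda ph: ph < 6)
retained_ids = [rec["site"] for rec in retained]
print(retained_ids)
```

Passing the threshold as a function keeps the helper usable for confounders whose harmful range is high (e.g., a contaminant concentration) as well as low.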

Evidence of confounding

Most confounders are causal agents that are correlates of the cause of interest. Causation is commonly addressed by applying Hill's [3] considerations or some equivalent set of criteria for causation [4]. This is done because statistics alone cannot determine the causal nature of relationships [5–7]. Confounding, whether due to causation or chance correlations, can bias a causal model resulting in uncertainty concerning the actual magnitude of the effects. A variety of types of evidence may be used to determine whether confounders significantly affect a field-derived benchmark. We have identified 10 types, but there may be others. They are related to three of the characteristics of causation used to determine that elevated ionic strength is a cause of impairment of stream communities (co-occurrence, sufficiency, and alteration) [1]. In some cases, one piece of evidence may rule out a potential confounder but more than one piece of evidence provides more confidence. Also, exclusion of relevant evidence can lead to false conclusions or accusations of bias. Only a weight-of-evidence approach allows assessors to consider all relevant statistical and logical evidence [8].

We used 10 types of evidence to assess confounding. They are listed below, beginning with a short description and followed by an explanation. In the descriptions, “the cause” refers to the cause of concern (ionic strength in this case) and “the confounder” refers to any potential confounder of the causal relationship.

Type 1, correlation of confounder and cause: Confounders are correlated with the cause of interest. A low correlation coefficient is evidence against the potential confounder.
Type 2, correlation of confounder and effect: Confounders are correlated with the effect of interest. A low correlation coefficient is evidence against the potential confounder.
Type 3, influence of the confounder at extreme levels: Even when the confounder is not correlated with the cause of interest, it may be influential at extreme levels. A lack of influence at extreme levels of the potential confounder is evidence against the potential confounder.
Type 4, influence of the presence of the confounder: If the frequency of the effect does not diminish when the potential confounder is never present or is present in all cases, it can be discounted in that subset.
Type 5, occurrence of confounder at sufficient levels: The magnitude of the potential confounder (e.g., concentration of a cocontaminant) may be compared to exposure–response relationships from elsewhere (e.g., laboratory toxicity tests) to determine if the exposure to the potential confounder is sufficient to influence the effect. If it is not sufficient, that is evidence that it is not acting as a confounder.
Type 6, influence of removing a confounder where it is at sufficient levels: If the confounder is estimated to be sufficient in a subset of cases, those cases may be removed from the data set and the remaining set reanalyzed to determine the influence of their removal on the results. If the cause–effect relationship is unchanged, the confounder was not causal or influential. Note that this evidence of confounding may also identify a treatment for confounding.
Type 7, influence of the confounder in multivariate correlations: Multiple regression and other multivariate statistical techniques may be used to estimate the relative degree of association of the cause and potential confounders with the effect.
Type 8, frequency of occurrence of the confounder: If the potential confounder occurs in a sufficiently small proportion of cases, it can be ignored. That is because if it occurs rarely, it cannot significantly influence the causal relationship.
Type 9, occurrence of characteristic effects of the confounder: If a potential confounder has characteristic effects that are distinct from those of the cause of concern, then the absence of those effects can eliminate the potential confounder as a concern in either individual cases or the entire data set.
Type 10, occurrence of characteristic effects of the cause: If the effects are characteristic of the cause of concern and not of the potential confounder, then the potential confounder can be eliminated as a concern in either individual cases or the entire data set.

Scoring evidence

Weighing evidence for confounding differs from weighing evidence for causation. A causal assessment determines whether the contaminant of concern (e.g., dissolved ions) is an important cause of biological impairment in the region [1]. This assessment of confounding accepts the result of the causal assessment and attempts to determine whether any of the known potential confounders substantively interferes with estimating the effects of ionic strength in the causal model. If there is significant interference, the confidence in the model predictions would be weakened unless the model is modified. Although the general approach of weighing evidence by assigning scores using explicit criteria is the same as that used in the assessment of general causation (e.g., have dissolved ions caused impairment? [1]) or specific causation (e.g., what caused impairment of this community? [9]), the inference is different.

Two biological effect end points are used to develop evidence: (1) the species sensitivity distribution of invertebrate genera and the resulting 5th percentile hazard concentration (HC05), and (2) the number of ephemeropteran genera. The species sensitivity distribution and HC05 are used because they are the model and resulting output used to develop the benchmark. If a potential confounder does not influence that end point, it is not a confounder. However, the HC05 does not lend itself to correlation, contingency tables, or regression because it has only one value for the region and values are needed for individual sites. For those statistical analyses, the number of ephemeropteran genera is used as a surrogate end point. That metric was chosen because these genera are consistently among the most sensitive to salts. However, because of a resistant mayfly genus, it is not expected that all Ephemeroptera will be missing at high specific conductance, hereafter referred to as “conductivity.”

The primary data source for evidence of confounding is West Virginia's watershed analysis database, which was used to derive the benchmark [1]. Except where indicated, reported results are derived from those data, which are referred to as the West Virginia data. However, where possible and appropriate, the U.S. Environmental Protection Agency (U.S. EPA) Region 3 data set from West Virginia samples (referred to as the U.S. EPA data) is used for independent corroboration. The U.S. EPA data set is much smaller and often does not have enough extreme values of the potential confounder to calculate reliable contingency tables or regressions of censored data.

The evidence is weighted using a system of plus (+) for supporting the potential confounder (i.e., the evidence suggests that the potential confounder is actually causing the effect to a significant degree), minus (−) for weakening the potential confounder (i.e., the evidence suggests that the potential confounder does not contribute to the effect to a significant degree), and zero (0) for no effect, usually due to ambiguity. One to three (+) or (−) symbols are used to indicate the weight of a piece of evidence: (+ + +) or (− − −) indicates convincing support or weakening, (+ +) or (− −) indicates strong support or weakening, (+) or (−) indicates some support or weakening, and 0 indicates no effect on the hypothesis of confounding.

Any relevant evidence receives a single plus, minus, or zero to register the relevance of the evidence and to indicate its logical implication (i.e., does it decrease or increase the potential for confounding) (Table 1). The strength of evidence is considered next. Criteria for scoring the strength of evidence are presented below for the common types. The criteria were developed for transparency and consistency and are based on the authors' judgments. After strength, the other possible unit of weight is assigned depending on the type of evidence.

Table 1. Relationships between qualities of evidence and scores for weighing evidence
Qualities of evidence | Score, not to exceed three minus or three plus
Logical implications and relevance | +, 0, −
Strength | Increase score
Other qualities | Increase score
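One possible reading of Table 1 as code: a relevant piece of evidence starts at one unit of weight, and strength and other qualities can each add one more, never exceeding three. The function name, its arguments, and the ASCII symbols are assumptions made for this sketch, not part of the published method.

```python
def evidence_score(direction: int, strong: bool = False, other_quality: bool = False) -> str:
    """Combine the units of weight for one piece of evidence.

    direction: +1 supports confounding, -1 weakens it, 0 is ambiguous.
    strong / other_quality: each adds one unit of weight when True.
    """
    if direction == 0:
        return "0"
    # One unit for relevance and logical implication, plus optional units.
    units = min(3, 1 + int(strong) + int(other_quality))
    symbol = "+" if direction > 0 else "-"
    return symbol * units


example_scores = [
    evidence_score(-1),
    evidence_score(-1, strong=True),
    evidence_score(+1, strong=True, other_quality=True),
]
print(example_scores)
```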

For evidence based on co-occurrence (types 1–4), the strength and consistency of the association are the primary considerations. The primary measure of association is Spearman's correlation coefficient. For comparison to the potential confounders, the correlation coefficients for conductivity and number of ephemeropteran genera are −0.61 for the West Virginia data set and −0.72 for the U.S. EPA Region 3 data set; these values fall in the upper end of the moderate range. Correlations, as measures of co-occurrence, can be scored as in Table 2.

Table 2. Weighting co-occurrence using correlations for types 1 and 2
Absent | r ≤ 0.1 | − −
Weak | 0.1 < r < 0.25 | −
Moderate | 0.75 ≥ r ≥ 0.25 | +
High | r > 0.75 | + +
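The bands in Table 2 translate directly into a small lookup. The sketch below is illustrative only. It assumes that the magnitude of the coefficient is what is scored (the correlations reported in the text are negative) and that the weak band carries a single minus, which is inferred from the pattern of the other rows. The function name and the ASCII symbols are hypothetical.

```python
def score_correlation(r: float) -> str:
    """Return a Table 2 style score for a correlation coefficient."""
    magnitude = abs(r)  # assumption: sign is ignored when scoring
    if magnitude <= 0.1:
        return "--"  # absent
    if magnitude < 0.25:
        return "-"   # weak (single minus is an assumed score)
    if magnitude <= 0.75:
        return "+"   # moderate
    return "++"      # high


# The conductivity correlations quoted in the text fall in the moderate band.
wv_score = score_correlation(-0.61)
epa_score = score_correlation(-0.72)
print(wv_score, epa_score)
```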

These scores are based on conventional expectations for a confounder that is itself a cause. That is, a potential confounder such as deposited sediment by itself can cause extirpation of invertebrate genera (independent combined action) or can act in combination with conductivity to extirpate invertebrate genera (additive or more than additive combined action). However, sometimes correlations are anomalous. For example, a potential confounder may actually decrease effects. Such anomalous results require case-specific interpretation, based on knowledge of mechanisms and characteristics of the ecosystems being analyzed.

Anomalous results may also result from violation of the expectation that a confounder should be correlated with both conductivity and the effect. If only one of the correlations is observed, that result requires additional interpretation. If the potential confounder is correlated with the effect but not with conductivity, the result may be due to chance or to a partitioning of causation in space. That is, the cause and potential confounder are independent because they impair communities at different locations. This could occur if the potential confounder and conductivity have different sources. In any case, it is not a confounder of conductivity.

In the contingency tables (evidence type 3), the frequency of occurrence of any Ephemeroptera (i.e., of the failure to extirpate all ephemeropteran genera) is presented for combinations of high and low levels of conductivity and of the potential confounder. If the frequency of occurrence is much lower when the confounder is present at high levels, this is supporting evidence for confounding. Note that the goal here is not to determine the effects of exceeding a criterion or other benchmark. Rather, the goal is to clarify the co-occurrence of conductivity, confounders, and effects by determining the frequency of effects at each possible combination of extremely high and low levels of conductivity and the potential confounder. It is expected that if a variable is indeed a confounder, its influence on the occurrence of effects would be seen at an extreme level. This use of contingency tables could reveal influences of confounders that are obscured when the entire ranges of data are correlated. Therefore, clearly high and low levels of conductivity and the potential confounder are used in contingency tables.

A potential confounder gets a plus score if its presence at a high level reduces the probability of occurrence of Ephemeroptera by more than 25% and a minus score if it does not (Table 3). It gets a double plus score if its presence at a high level reduces the probability of occurrence by more than 75% and a double minus score if it reduces that probability by less than 10%. These cutoff levels delimit the indicated strength categories, based on the experience and judgment of the authors and reviewers. Any decrease in effects at high levels of a potential confounder (i.e., a higher probability of occurrence) is anomalous and treated as strong negative evidence.

Table 3. Weighting co-occurrence for evidence type 3 using contingency tables
Expectation: High levels of a confounder should increase the probability that a site lacks Ephemeroptera at low conductivity, and low levels of the confounder should decrease the effect at high conductivities
Increased effect >25% | + for co-occurrence
Increased effect >75% | + + for co-occurrence and strength
Increased effect <25% | − for co-occurrence
Increased effect <10% or decreased effect | − − for co-occurrence and strength
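The Table 3 cutoffs can be applied to a pair of contingency-table cells as follows. This is a sketch under stated assumptions: the increase in effect is taken as the difference, in percentage points, between the fraction of sites lacking Ephemeroptera at high and at low levels of the confounder (the text does not say whether the percentages are absolute or relative), an increase of exactly 25% is scored as a single minus, and all names and counts are hypothetical.

```python
def absent_fraction(sites_lacking_mayflies: int, total_sites: int) -> float:
    """Fraction of sites in one contingency-table cell with no Ephemeroptera."""
    return sites_lacking_mayflies / total_sites


def score_contingency(absent_low_confounder: float, absent_high_confounder: float) -> str:
    """Score evidence type 3 from the change in effect at high confounder levels."""
    # Assumption: increase is measured in percentage points.
    increase = 100.0 * (absent_high_confounder - absent_low_confounder)
    if increase > 75.0:
        return "++"  # co-occurrence and strength
    if increase > 25.0:
        return "+"   # co-occurrence
    if increase < 10.0:
        return "--"  # little increase, or an anomalous decrease
    return "-"       # increase between 10% and 25%


# Hypothetical counts at low conductivity: 2 of 20 sites lack mayflies when the
# confounder is low, 10 of 20 when it is high.
example = score_contingency(absent_fraction(2, 20), absent_fraction(10, 20))
print(example)
```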

The evidence concerning sufficiency of the confounder (evidence types 5–8) is diverse. Only evidence type 6 was sufficiently common and consistent to develop scoring criteria. For evidence type 6, the primary consideration is the degree of departure of the correlation in the truncated data set from the correlation of conductivity and Ephemeroptera in the full data set (Table 4). However, no more than one negative score was given if less than 10% of the data were removed.

Table 4. Weighting sufficiency for evidence type 6: Alteration of the correlation of conductivity with the number of ephemeropteran genera after removal of elevated levels of a confounder
Expectation: Removal of elevated levels of a confounder should change the correlation coefficient
Coefficients deviating by <10% | − − for a lack of change in effect with removal of confounder
Coefficients deviating by <20% | − for a small change in effect with removal of confounder
Coefficients deviating by >20% | + for a strong increase or decrease in effect with removal of confounder
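A sketch of the Table 4 scoring, including the rule that no more than one minus is given when less than 10% of the data were removed. It assumes the deviation is the relative change of the truncated-data coefficient from the full-data coefficient and treats a deviation of exactly 20% as a single minus. The function name and the example coefficients are hypothetical.

```python
def score_truncation(r_full: float, r_truncated: float, fraction_removed: float) -> str:
    """Score evidence type 6 from the change in correlation after truncation."""
    # Assumption: deviation is relative to the full-data coefficient.
    deviation = 100.0 * abs(r_truncated - r_full) / abs(r_full)
    if deviation > 20.0:
        return "+"  # strong change in effect when the confounder is removed
    score = "--" if deviation < 10.0 else "-"
    # At most one minus when less than 10% of the data were removed.
    if score == "--" and fraction_removed < 0.10:
        score = "-"
    return score


# Hypothetical coefficients: a 5% deviation after removing 20% of the sites.
example = score_truncation(r_full=-0.60, r_truncated=-0.57, fraction_removed=0.20)
print(example)
```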

For alteration, the primary consideration is the degree of specificity of the effects of the confounder relative to those of the dissolved ions. This type of evidence is rare and scored ad hoc when it occurs.

Additional considerations that may result in a higher score are presented in Table 5.

Table 5. Considerations used to weight the evidence concerning the influence of potentially confounding variables
Quality of evidence | Descriptor
Logical implication | Negative or positive
Directness of cause | Proximate cause, sources, or intermediate causal connections
Specificity | Effect attributable to only one cause or to multiple causes
Relevance to effect | From the case or from other similar situations
Nature of association | Quantitative or qualitative
Strength of association | Strong relationships and large range or weak relationships and small range
Consistency of information | All consistent or some inconsistencies
Quantity of information | Many data or few data
Quality of information | Good study or poor study

Weighing the body of evidence

After the individual pieces of evidence had been weighted, the body of evidence for a potential confounder was weighed based on the credibility, diversity, strength, and coherence of the body of evidence (Table 6). The body of evidence, rather than a single piece of evidence, was considered to determine how strongly these potential confounders might affect the model. Seven potential confounders (habitat quality, deposited sediment, high and low pH, selenium [Se], catchment area, settling ponds, and metals) are presented here. Five other potential confounders (organic enrichment, nutrients, temperature, loss of headwaters, and dissolved oxygen) are assessed in the U.S. EPA report [2].

Table 6. Weighing confidence in the body of evidence for a potential confounder
Assessment | Score | Body of evidence | Action
Very confident | − − − | All minus, some strongly negative evidence | No treatment for confounding
Moderately confident | − − | All minus, no strongly negative evidence | No treatment for confounding
Reasonably confident | − | Majority minus | No treatment for confounding
Undetermined | 0 | Approximately equal positive and negative, ambiguous evidence, or low-quality evidence | Additional study advised
Potential confounding | + | Majority plus | Correction for confounding may be advised
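The decision rules in Table 6 can be approximated in code. The sketch below is a simplified reading, not the authors' procedure. It treats any score of two or more minuses as strongly negative, compares counts of plus and minus scores to decide the majority, assumes a single minus for the majority-minus row, and ignores evidence quality, which in practice requires judgment. All names are hypothetical.

```python
ACTIONS = {
    "---": "No treatment for confounding",
    "--": "No treatment for confounding",
    "-": "No treatment for confounding",
    "0": "Additional study advised",
    "+": "Correction for confounding may be advised",
}


def weigh_body_of_evidence(scores):
    """Summarize scored pieces of evidence (e.g. ["--", "-", "+"]) for one confounder."""
    minus = [s for s in scores if s.startswith("-")]
    plus = [s for s in scores if s.startswith("+")]
    if scores and len(minus) == len(scores):
        # All minus: confidence depends on whether any piece is strongly negative.
        strongly_negative = any(len(s) >= 2 for s in minus)
        return "---" if strongly_negative else "--"
    if len(minus) > len(plus):
        return "-"
    if len(plus) > len(minus):
        return "+"
    return "0"  # balanced or ambiguous evidence


overall = weigh_body_of_evidence(["+", "+", "--", "--", "-"])
print(overall, "->", ACTIONS[overall])
```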


Habitat quality

Stream habitat may be modified by physical disturbance, changes in flow, or increased sediment loads in reaches that receive high conductivity effluents. Habitat quality was represented by an index, the rapid bioassessment protocol derived by the West Virginia Department of Environmental Protection, which increases as habitat quality increases. Component metrics were not used because they were less correlated with Ephemeroptera than the index.

Habitat quality was analyzed as part of groups of variables judged a priori to be more likely than others to have combined effects. Therefore, sites at which the rapid bioassessment protocol and pH were low and fecal coliform count was high were removed to determine whether the HC05 was affected (Fig. 1). Similarly, the rapid bioassessment protocol score was used with fecal coliform count and temperature in a multiple linear regression with conductivity (Supplemental Data, Table S1).

Figure 1.

Species sensitivity distribution used to derive the conductivity benchmark (open circles) and one for sites with good habitat (rapid bioassessment protocol ≥135) and low organic enrichment (fecal coliform fewer than 400 colonies/100 mL) (closed circles). The similarity of the relationships shows that even when both common potential confounders are removed, the results do not significantly change. The lower and upper confidence bounds on 300 µS/cm (5th percentile of distribution of open circles) are 225 and 350 µS/cm, respectively.

The body of evidence was mixed. Habitat scores were moderately correlated with both conductivity and biological response, indicating a potential for confounding. However, removal of poor habitat had little effect on the correlation of conductivity with Ephemeroptera or on the derivation of the HC05 for conductivity (Fig. 1 and Supplemental Data, Table S2). Habitat score had a very slight effect on the intercept and the slope for conductivity in a multiple regression (Supplemental Data, Table S1). In addition, Ephemeroptera occur even when habitat is poor (Supplemental Data, Table S2). The weight of the scored body of evidence indicated habitat was not a substantial confounder (Supplemental Data, Table S3).

Deposited sediment

Mining and other activities that result in crushing and exposing rocks are sources of salts and potentially of silt that may affect stream organisms. A qualitative measure of embeddedness (WABase embeddedness score) was evaluated by contingency table and by correlation [10] (Supplemental Data, Tables S1 and S4). No evidence supported embeddedness as a confounder (Supplemental Data, Table S5).

High pH

The dissolution of limestone, dolomite, and sandstone increases as unweathered surface area of rock increases. Waters draining crushed limestone, dolomite, or lime-cemented sandstone contain HCO₃⁻, which contributes to higher pH and alkalinity. The HCO₃⁻ that raises the pH is also a major anion moiety that contributes to conductivity. Hence, pH directly reflects a major constituent of conductivity (HCO₃⁻), so it could not be a conventional confounder. In addition, salts influence hydrogen ion activity, which is measured as pH. In any case, the available evidence indicates that the variance in pH has little effect on the derivation of the HC05 for conductivity in waters above pH 7 (Supplemental Data, Tables S6 and S7).

Low pH

Because low pH from acid mine drainage is known to be an important cause of impairment where coal is mined, it was judged a priori to be a potentially important environmental variable. That preconception was supported by the evidence summarized in the Supplemental Data (Table S8). Therefore, sites with pH < 6 were not used to calculate the benchmark values. However, 84% of sites with low pH still had at least one genus of Ephemeroptera, whereas none occurred at either the low- or high-pH sites with high conductivity (Supplemental Data, Table S6). This suggests that even below pH 4.5, ionic strength is more important than acidity to the occurrence of Ephemeroptera. In sum, although the benchmark applies to waters with neutral or basic pH, high ionic strength appears to also cause effects at low pH.


Selenium

Selenium is a potential confounder because it is commonly associated with coal and elevated levels have been reported in the region, but the evidence does not support confounding (Supplemental Data, Table S9). No correlations were found between Se and Ephemeroptera or between Se and conductivity in the West Virginia data set or in the U.S. EPA data set. This result is unreliable because most of the Se values were detection limits, and many of the detection limits were relatively high, even equaling the water-quality criterion of 5.0 µg/L. In addition, there were too few high Se concentrations in the West Virginia data to perform a contingency table analysis. For these reasons, correlational evidence of confounding was ambiguous.

Evidence of the sufficiency of observed Se levels to cause extirpation of stream macroinvertebrates is weakly negative. The national ambient water quality criterion (5 µg/L) is irrelevant because it is based on more sensitive vertebrates [11]. Field and laboratory studies have found invertebrates to be relatively insensitive and unaffected at levels observed in West Virginia streams [12, 13]. In outdoor artificial streams dosed with Se, insects were less sensitive than fish, crustaceans, and oligochaetes; baetid mayfly nymphs (Baetis, Callibaetis), damselfly nymphs (Enallagma), and chironomid larvae were not statistically significantly reduced, even at 30 µg/L [14]. Relatively few invertebrate species have been tested, and highly sensitive species may be identified in the future [15]; but the available toxicological evidence does not indicate that Se confounds the relationship between conductivity and invertebrate extirpation.

The effects of removing high Se on the conductivity relationship (evidence type 6) were addressed using the West Virginia data set. When data from streams with Se concentrations above the water-quality criterion (5 µg/L) were removed, the linear correlation coefficient for number of ephemeropteran genera and log conductivity was barely changed (r = −0.56, n = 339) relative to the full data set. When the same analysis was performed with the U.S. EPA data set, the correlation was actually greater than that for the full data set (r = −0.84, n = 32) (Fig. 2), which is contrary to expectations for a confounder. This result indicates that the conductivity relationship is not confounded by the toxic effects of Se.
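The before-and-after comparison used for evidence type 6 can be illustrated with a rank correlation computed on the full and the truncated data. The sketch below uses only the standard library and invented site values, so its coefficients are not those reported above. The helper names and the tuple layout are assumptions, and Spearman's coefficient is used here because it is the measure named in the scoring section.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Invented sites: (conductivity in uS/cm, ephemeropteran genera, Se in ug/L).
sites = [(100, 9, 1.0), (250, 7, 2.0), (400, 8, 6.5),
         (800, 4, 3.0), (1200, 2, 7.0), (2000, 0, 4.0)]

rho_full = spearman([s[0] for s in sites], [s[1] for s in sites])

# Truncate: drop sites with Se above the 5 ug/L water-quality criterion.
low_se = [s for s in sites if s[2] <= 5.0]
rho_low_se = spearman([s[0] for s in low_se], [s[1] for s in low_se])

print(round(rho_full, 3), round(rho_low_se, 3))
```

A coefficient that barely changes, or that strengthens, after truncation argues against confounding, as in the Se results described above.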

Figure 2.

Spearman's correlation coefficient and scatterplot between the number of ephemeropteran genera and conductivity for 32 sites with low selenium concentrations (<5 µg/L).

Consideration of the specific effects of Se (evidence type 9) suggests that it is not an important contributor to the impairment. First, the most sensitive organisms to aqueous Se are fish and other oviparous vertebrates [12]; however, in this case, relatively Se-insensitive insects are most affected. Second, Se causes characteristic deformities in fish, which have not been reported in West Virginia streams. Third, the effects of Se at low concentrations are seen in lentic ecosystems (lakes, reservoirs, ponds, wetlands), not in streams like those from which the conductivity relationship and benchmark were derived [12]. Finally, because Se is biomagnified, it primarily affects top predators, not the herbivores and detritivores that are affected in this case. This specificity is supported by the fact that, in the region, the reported effects of Se are greatly elevated body burdens and associated deformities in a top predator fish (largemouth bass) in a lentic system (Upper Mud River Reservoir) [16, 17].

The weight of evidence does not support confounding by Se, so no action was taken to adjust the data set or analysis. However, because existing Se data are poor, the occurrence of Se in central Appalachian streams should be investigated further.

Catchment area

Larger streams tend to have more moderate chemical properties than small streams because they receive waters from more sources. Consequently, extreme values (in this case, both low and high conductivity) tend to occur less frequently in large streams. One of the initial data filters for this analysis was to exclude streams larger than 155 km² (or 60 mi²). Small streams are numerically more abundant than large streams, and the inclusion of large streams might introduce extraneous variance. This raises the issue of whether stream size is a potential confounder and whether the results from small streams might be extrapolated to larger streams. That is, do the same effects of conductivity occur in larger streams as were found in the detailed analysis of smaller streams? We examined these issues by analyzing the influence of stream size (as catchment area) on the effects of conductivity and on the occurrence of Ephemeroptera.

We categorized streams by catchment area into three groups: small catchments less than 6 mi² (15.5 km²), medium catchments of 6 to 60 mi² (15.5 km² to 155 km²), and large catchments greater than 60 mi² (155 km²). These categories were distinguished because small catchments correspond to headwater streams, which have few pollutant sources and large terrestrial influence, and large catchments may have different sampling methods. In all three stream size categories, if conductivity was <200 µS/cm, 99% or more of streams had Ephemeroptera, but if conductivity was above 1,500 µS/cm, fewer streams had Ephemeroptera (Supplemental Data, Table S10). The number of Ephemeroptera taxa declines with increasing conductivity in all streams with measured catchment areas, independent of classification of catchment area (r = −0.59). Correlation of log conductivity with log catchment area is weak (Supplemental Data, Table S11).
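The three size classes and the paired units quoted above can be checked with a short sketch. The function name and class labels are hypothetical. The handling of the boundaries follows the wording of the text (less than 6 mi² is small, greater than 60 mi² is large), and the conversion factor is the standard one for square miles to square kilometers.

```python
KM2_PER_MI2 = 2.589988  # square kilometers in one square mile


def catchment_class(area_mi2: float) -> str:
    """Assign a stream to a catchment-size class from its area in square miles."""
    if area_mi2 < 6:
        return "small"
    if area_mi2 <= 60:
        return "medium"
    return "large"


# The class limits expressed in square kilometers.
lower_km2 = round(6 * KM2_PER_MI2, 1)
upper_km2 = round(60 * KM2_PER_MI2)
print(lower_km2, upper_km2, catchment_class(12.0))
```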

The weight of evidence for confounding by catchment area (Supplemental Data, Table S11) is uniformly negative, so we conclude that catchment area has little or no effect on invertebrate response to conductivity.

Settling ponds

The effluents from most valley fills flow into settling ponds, and it has been suggested that those ponds are the actual cause of downstream community impairments. This issue was addressed using the U.S. EPA Region 3 data set because it identifies the presence of ponds. When data from only streams with ponds are used (i.e., the occurrence of ponds is removed as a variable, evidence type 4), the correlation coefficient for number of ephemeropteran genera and log conductivity is r = −0.84 (Fig. 3). This result is somewhat higher than the result for the uncensored U.S. EPA Region 3 data set (r = −0.73), which is contrary to the expectation if ponds were the cause or contributed to the effects of ionic strength. This result clearly shows that the conductivity relationship is not a result of co-occurrence with ponds. In addition, when ponds are removed and the streams are reclaimed, conductivity remains high and the effects continue. For example, Venter's Branch and Jones Branch in Martin County, Kentucky, USA, were mined in the mid-1990s and the ponds removed. When the streams were sampled in 2009, conductivity was >2,000 µS/cm and no Ephemeroptera were found in either stream (Greg Pond, U.S. EPA, personal communication).

Figure 3.

Spearman's correlation coefficient and scatterplot between the number of ephemeropteran genera and conductivity for 20 sites below settling ponds for valley fills. Data from the U.S. Environmental Protection Agency Region 3 data set.
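The comparison of the pond-only subset with the full data set can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used for the study: the site records are hypothetical, and Spearman's coefficient is implemented by hand so that the example needs only the standard library. Because Spearman's coefficient is rank based, it is the same for conductivity and log conductivity.

```python
import math


def ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical records: (conductivity in uS/cm, ephemeropteran genera,
# settling pond present). Made-up values, not the Region 3 data.
sites = [
    (150, 8, False), (300, 7, False), (250, 5, False), (900, 6, False),
    (1200, 2, False), (400, 7, True), (800, 5, True), (1500, 3, True),
    (2200, 1, True), (3000, 0, True),
]

r_all = spearman([s[0] for s in sites], [s[1] for s in sites])
pond_sites = [s for s in sites if s[2]]
r_pond = spearman([s[0] for s in pond_sites], [s[1] for s in pond_sites])

print("all sites:", round(r_all, 2))
print("pond sites only:", round(r_pond, 2))
```

If ponds were responsible for the relationship, holding their presence constant would be expected to weaken the correlation rather than leave it intact or strengthen it.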

The weight of evidence for confounding from ponds is uniformly negative, so we conclude that the presence of ponds has little or no effect on invertebrate response to conductivity.


Metals

Iron (Fe), aluminum (Al), and manganese (Mn) are the metals most associated with acid mine drainage, and commenters have suggested that they may cause the impairment associated with ionic strength. However, for the following reasons, streams that are circumneutral to moderately alkaline are unlikely to experience toxicity from these metals [18].

The most toxic form of Fe (free Fe²⁺) does not occur in oxygenated waters above pH 4. Under those conditions, Fe occurs as hydroxide particles or, if significant dissolved organic matter is present, as Fe colloids. In these forms, Fe is thought to reduce the toxicity of co-occurring metals by adsorption and coprecipitation. Toxic dissolved Al precipitates similarly above pH 5 as hydroxide flocs or polymeric Al. Divalent Mn is converted to insoluble Mn⁴⁺ in mildly alkaline waters. The precipitates of these metals may adversely modify habitats and directly affect organisms; however, the valley fill effluents that are primarily responsible for the relationship between conductivity and extirpation of invertebrates are not equivalent to the acid drainage into neutralizing streams that results in heavy accumulations of precipitates. Finally, the toxicity of these metal cations is mitigated by divalent calcium, which is the dominant cation in the ionic mixtures. Hence, because the calcium increase is much greater than the increase in these metals, it is expected that as conductivity increases, the toxicity of these metals will decrease per unit concentration.

Because of concern for combined effects of metals, multiple linear regression of conductivity, Fe, Al, and Mn was performed. The metals reduced the coefficient for conductivity by only 8.6% (Supplemental Data, Table S12).
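The logic of this check, comparing the conductivity coefficient with and without metals in the model, can be sketched as below. The sketch is an assumption-laden illustration: `ols` is a hypothetical helper that solves the normal equations directly, a single made-up metal column stands in for Fe, Al, and Mn, and the response values are constructed from an exact linear formula so that the coefficients are known in advance. It does not reproduce the 8.6% reported for the study data.

```python
def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Build the augmented normal-equation matrix [X'X | X'y].
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical predictors: log10 conductivity and one metal concentration
# that is partly correlated with it. Made-up values, not study data.
log_cond = [2.0, 2.5, 3.0, 3.5, 2.0, 2.5, 3.0, 3.5]
metal = [0.5, 0.6, 0.7, 0.8, 0.3, 0.4, 0.5, 0.6]

# Constructed response so the true coefficients are known:
# -2.0 per unit log conductivity and -1.0 per unit metal.
response = [10.0 - 2.0 * c - 1.0 * m for c, m in zip(log_cond, metal)]

b_alone = ols([log_cond], response)[1]            # conductivity only
b_adjusted = ols([log_cond, metal], response)[1]  # conductivity plus metal

# Percent reduction in the conductivity coefficient after adding the metal.
percent_reduction = 100.0 * (b_alone - b_adjusted) / b_alone
print(round(b_alone, 3), round(b_adjusted, 3), round(percent_reduction, 1))
```

A small reduction indicates that little of the apparent conductivity effect is attributable to the added covariates; a large reduction would suggest confounding or shared causation.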

Based on contingency table analyses, weak correlations, and other evidence [2], Fe and Al are clearly not confounders. However, Mn is more ambiguous because it is moderately correlated with both conductivity and ephemeropteran genera (Supplemental Data, Tables S13 and S14). Manganese has been relatively poorly studied because it has seldom been found at toxic levels. Like other divalent cationic metals, Mn²⁺ is less toxic in hard (i.e., high Ca) waters, and the high conductivity waters in this region are inherently hard. Based on a linear relationship of hardness to conductivity in the West Virginia data, 300 µS/cm conductivity is equivalent to a hardness of approximately 200 mg/L CaCO₃. The equivalent hardness-adjusted British Columbia Chronic Water Quality Guideline for Mn is 1.5 mg/L [19]. Dittman and Buchwalter [20] provide the laboratory study with the most directly relevant taxa: aquatic insects from Appalachia. They quantified bioaccumulation and performed biomarker studies that found reduced levels of cysteine and glutathione at 0.10 and 0.50 mg/L, but they saw no overt toxic effects. The most relevant conventional toxicity tests of aquatic invertebrates were 21-d reproduction tests of Daphnia magna, which yielded inhibiting concentration 25% (IC25) values of 5.4 and 9.4 mg/L for hardness levels of 100 and 250 mg/L, respectively [21]. A recent assessment of the Clear Fork watershed, West Virginia, concluded that total Mn at 0.002–0.50 mg/L was a minor contributor to biotic impairment because Mn was weakly correlated (r = −0.16) with the West Virginia Stream Condition Index when corrected for stronger causes [22].

In summary, Fe and Al are clearly not confounders. Equivocal evidence suggests that Mn is potentially a weak confounder.

Summary of actions taken to address potential confounding

The primary means of dealing with confounding is categorization [23]. For example, in epidemiology, it is common to categorize people by gender and age (e.g., child, adult, elderly). In this case, pH is an apparent confounder and, because the problem of concern was associated with circumneutral to alkaline pHs, acidic sites (pH < 6) were censored from the data set.
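Censoring by category amounts to a simple filter applied before any relationship is fitted. The following minimal sketch assumes hypothetical site records with a `pH` field; the record layout, site labels, and values are illustrative only.

```python
PH_CUTOFF = 6.0  # acidic sites (pH < 6) are censored, as described in the text


def censor_acidic(sites, cutoff=PH_CUTOFF):
    """Keep only sites whose pH is at or above the cutoff."""
    return [site for site in sites if site["pH"] >= cutoff]


# Hypothetical site records; made-up values, not study data.
sites = [
    {"site": "A", "pH": 4.8, "conductivity": 950},
    {"site": "B", "pH": 6.0, "conductivity": 310},
    {"site": "C", "pH": 7.4, "conductivity": 1800},
    {"site": "D", "pH": 5.9, "conductivity": 620},
    {"site": "E", "pH": 8.1, "conductivity": 140},
]

retained = censor_acidic(sites)
print([site["site"] for site in retained])
```

The conductivity relationship would then be derived from the retained sites only, so that low pH cannot contribute to the observed effects.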

An alternative is to retain the apparent confounders and use multivariate statistics. This offers the opportunity to treat the agent of concern and the confounder as a combined cause if they interact or to statistically partition out the contribution of the confounder if they are independent. However, this requires a large data set that meets the assumptions of the statistical method (e.g., independence, additivity, and normality). It also requires that the contributions of the confounders be sufficiently large relative to the agent of concern and relative to the background variability among sites, sampling events, and analyses. Although habitat and Mn showed signs of being weak confounders, their contributions to multivariate models were too small and the violations of the assumptions of multiple regression were too great for them to be used with any confidence to adjust the conductivity model (Supplemental Data, Tables S13 and S14).

Other potential confounders were eliminated from consideration with confidence. We do not argue that these variables do not cause impairment at some locations in the region. Neither do we argue that they have no influence at all on ionically impaired sites. Rather, given the inevitable variability in sites to which the benchmark would be applied and the relatively strong relationship of conductivity and loss of sensitive genera, the evaluated confounders do not substantially affect the model that is used to develop and apply the conductivity benchmark.


A weight-of-evidence analysis proved to be effective in analyzing diverse information to determine whether the relationship of ionic strength to an effect on stream invertebrates (loss of ephemeropteran genera) was confounded by specific environmental variables. Quantitative methods are used to generate the individual pieces of evidence, but the combining of heterogeneous evidence is inevitably qualitative. However, the weighing method used in the present study is defensible because it is transparent and, as far as possible, uses consistent logic, criteria, and scoring rules.

Too often, analyses of causal relationships in nature are conducted without a serious analysis of the potential for confounding. We believe that the weight-of-evidence method used in the present study provides a flexible yet rigorous means to use available evidence to evaluate confounding. Its utility potentially extends to other applications of causal models derived from field data such as resource management or the development of remedial goals for contaminated sites.


Tables S1–S14. (33 KB DOC).


We thank the West Virginia Department of Environmental Protection and G. Pond, U.S. EPA, for providing sampling data. L. Zheng, Tetratech, performed all analyses. Many anonymous and named reviewers improved the quality of the manuscript, including the following individuals: M. Griffith, C. Delos, M. Passmore, J. VanSickle, C. Schmitt, C. Menzie, C. Hawkins, and members of the U.S. EPA Biological Advisory Committee. The following members of the U.S. EPA Science Advisory Board provided careful review, interdisciplinary insights, and encouragement: D. Patten, E. Boyer, W. Clements, J. Dinger, G. Geidel, K. Hartman, R. Hilderbrand, A. Huryn, L. Johnson, T.W. La Point, S.N. Luoma, D. McLaughlin, M.C. Newman, T. Petty, E. Rankin, D. Soucek, B. Sweeney, and R. Warner. We thank U.S. EPA management for its support throughout this project. The present study is based on work supported by the U.S. EPA. The views expressed in this article are those of the authors and do not necessarily represent the views or policies of the U.S. EPA.