Field studies can generate statistical relationships between environmental attributes and biological responses, but those relationships are not necessarily causal. Epidemiologists have addressed this problem by weighing evidence of causation in terms of lists of considerations 1, 2, and that approach has been applied to ecological inferences 3–7. Ecoepidemiologists tend to be concerned with specific causation (what is the cause of impairment of a particular population or ecosystem?). As a result, ecoepidemiological methods are well developed for specific causation, particularly in the Causal Analysis/Diagnosis Decision Information System 7, 8. However, in the present study, we present a method for addressing general causation. That is, it asks, “Is agent C capable of causing effect E in the region?” rather than, “Did C cause E in a particular impaired stream reach?” 7, 9, 10. The standard of proof for general causation is that the agent did cause the effect under some circumstances.
The present study is one of a series that describes a method for using field data to derive benchmark values. The series begins with the method for deriving the benchmark and its application to dissolved ions in Central Appalachian streams 11, 12. The present study then explains how to determine that an association in the field is causal, and a separate article applies that method to determine that the relationship between specific conductance, hereafter referred to as conductivity, and the extirpation of benthic invertebrates is causal 13. The next article explains how to determine whether a field-derived exposure–response relationship is confounded and assesses potential confounders of the conductivity–extirpation relationship 14. The last article in the series analyzes the spatial relationship between sources of ions and conductivity, which supports the causal assessment and identifies targets for remediation 15.
The method described here determines whether an agent has caused a biological effect in a region, not that it causes all instances of the effect, nor that there are no other causes of the effect, nor that it causes the effect at any particular site. Therefore, the causal assessment does not require an evaluation of the relative effect of the agent compared with other potential causes. Instead, the relationship is compared with a null hypothesis that the agent does not cause the effect.
If the agent is found to be causal, other potential causes are analyzed as potential confounders 14. An agent may be a cause even if it is confounded. The agent of interest and confounder may be causal at different times, or the confounder may increase the effect of the cause of interest. In such cases, the model of the causal relationship must be adjusted to account for the confounding variable.
The determination of causation is controversial in epidemiology and ecoepidemiology. Some argue that field data cannot be interpreted causally and only experimental results are demonstrably causal. However, causal relationships in experimental systems do not prove that associations in the field are causal, and it is not practical to test some relationships in controlled systems. Others argue that a statistical analysis of field data can demonstrate causation. This argument ignores the fact that association is not causation, no matter how sophisticated the statistics. An alternative is a process that uses all types of knowledge, including statistically generated evidence, to demonstrate that causation is the best explanation of the body of evidence.
The method presented here consists of weighing the available evidence on the basis of causal considerations. That conceptual approach was developed by Hill 1 and has become the standard in epidemiology. For example, Hill's considerations are used by the U.S. Environmental Protection Agency to determine whether a chemical causes cancer 16. Inevitably, an element of subjectivity arises in deciding whether to accept that a relationship is causal, but criterion-guided judgment has the advantage of transparency and of requiring assessors to fully consider the implications of all available evidence. A more complete discussion of concepts of causation and their application to environmental effects can be found in U.S. Environmental Protection Agency 7 and Suter et al. 17. On the website for the Causal Analysis/Diagnosis Decision Information System 7, see Causal Concepts and History, http://www.epa.gov/caddis/si_approach.html.
We modified Hill's considerations 1 for establishing a probable causal relationship because we realized that these considerations are a mixture of types of evidence, sources of information, and qualities of information 3. In our modified approach, the causal arguments are based on various types of evidence for a set of characteristics shared by all causal relationships (Table 1), and the sources of information and qualities are used to weigh the evidence 3.
Table 1. Characteristics of causal relationships and the evidence for each
Co-occurrence: Evidence that the cause co-occurs with the unaffected entity in space and time
Preceding causation: Evidence that the causal relationship is a result of a larger web of cause-and-effect relationships (not recognized by Hill; however, he used metaphors of preceding causation or a causal pathway leading to the proximate cause)
Interaction: Evidence that the cause physically interacts with the entity in a way that induces the effect
Alteration: Evidence that the entity is changed by the interaction with the cause
Sufficiency: Evidence that the intensity, frequency, and duration of the cause are adequate, and the entity is susceptible, to produce the type and magnitude of the effect
Time order: Evidence that the cause precedes the effect
The following sections describe the causal assessment method used in the case study concerning the extirpation of aquatic life by ionic stress 13. The basic method is generally applicable, but the details, such as scoring criteria, may change from case to case.
Planning and problem formulation
The causal assessment begins by stating the reasons for the causal assessment. The causal hypothesis is articulated including its scope and relevance. (Note that this is a scientific hypothesis but not a conventional statistical null hypothesis.) Both the cause and effect metrics are described. Sources of data used to develop evidence are characterized. In some cases, a conceptual model of the routes of exposure and mechanisms of action is useful.
Analyzing and weighing evidence
The overall process for performing the assessment is depicted in Figure 1. Causal evidence is data that have been analyzed or organized in some way to show a characteristic of causation or a lack of one. Evidence may come from the laboratory or field (from experiments, observations, or general knowledge), and from the region or elsewhere. After the evidence is developed, a form of criterion-guided judgment is used to weight evidence, to weigh consolidated evidence for each causal characteristic, and to weigh the body of evidence of the causal relationship 6.
Any scientific work can be deliberately or inadvertently biased. To avoid this, a formalized process is followed, and evidence and scoring are explicit. First, we list the characteristics of causation 3. Second, we apply analytical and inferential methods to reveal how each piece of evidence relates to the characteristic. We document how the evidence relates to a particular characteristic and ensure that it is not repeated in a slightly different form. Third, whenever possible, we use more than one data set to increase independence of evidence.
After the evidence is developed in step 1, it is sorted by type and by causal characteristic in step 2. A type of evidence consists of data from a particular source such as before-and-after observations, field experiments, toxicity tests, or body burden analyses, analyzed so as to relate to a causal characteristic 3. For example, toxicity test data might be used to estimate a threshold sufficient for lethality or to demonstrate an interaction such as neurotoxicity. In step 3, the evidence is evaluated for relevance to the assessment (e.g., with respect to community type), consistency with scientific knowledge and theory, and quality of the study. Evidence that does not meet those standards is not used in the assessment. The remaining types of evidence are scored based on logical implications and weighted by the strength of the signal and the degree of corroboration. For example, independence and lack of confounding strengthen a piece of evidence. In step 4, the overall qualities of the collected evidence for each characteristic are weighed and then scored. In step 5, the body of evidence for the causal relationship is evaluated based on the evidence that the hypothesized relationship possesses the characteristics of causation. Lastly, in step 6, the results and confidence in the findings are described. Specific scoring associated with different statistical analyses is described in the case study 13.
The evidence is weighted using a system of plus (+) for supporting conductivity as a cause, minus (−) for weakening, and zero (0) for no effect. (Both neutral evidence and ambiguous evidence have no effect on the inference.) One to three plus or minus symbols are used to indicate the weight of a piece of evidence; (+ + +) or (− − −) indicates convincing support or weakening, (+ +) or (− −) indicates strong support or weakening, (+) or (−) indicates some support or weakening, and (0) indicates no effect. These qualitative scores were developed for epidemiology 18 and adopted for ecoepidemiology 4, 8.
Note that these scores are for particular types of evidence or a body of evidence for a causal characteristic but not for causation as a whole. For example, several studies may convincingly demonstrate that a source exists that is associated with elevated conductivity in the region, so the overall evidence for that causal characteristic is scored + + +, but alone, it is not convincing evidence that conductivity causes extirpation of biota.
Evidence is sorted into types by (1) the kind of association or information; (2) the source of the information (from observation, manipulation, or general knowledge); and (3) the source of the association (from the case, from elsewhere, or from theory). For example, a type of evidence might be characterized as co-occurrence of conductivity and Ephemeroptera from contingency tables (the kind of association) from field surveys (the source of information) and from the region (the source of the association; Table 2).
Table 2. Weighing and scoring evidence for co-occurrence
Type of evidence: Co-occurrence of conductivity and Ephemeroptera from contingency tables
Description of evidence: Contingency tables provide quantitative evidence that high conductivity is strongly associated with severe effects. Ephemeroptera are present at >97% of low-conductivity sites and absent at 31 to 81% of high-conductivity sites in three data sets 13.
Type of evidence: Co-occurrence in nearby watersheds
Description of evidence: In two studies, there is a two- to threefold difference between high- and low-conductivity sites for several effect endpoints despite similar habitat quality among sites 13.
Type of evidence: Co-occurrence between conductivity and extirpation of genera
Description of evidence: All genera in the study set except one were observed at sites <150 µS/cm, whereas 24.5% of genera are never seen at sites >1,500 µS/cm. These findings were confirmed with an independent data set collected by independent entities using different sampling methods in Kentucky 13, 20.
Summary of co-occurrence: The causal relationship exhibits the causal characteristic of co-occurrence of loss of susceptible taxa with conductivity greater than natural background (+). Many genera are never seen at high conductivity in two independent data sets 19. Also, Ephemeroptera are present where conductivity is low even when other stressors are present 13. Ephemeroptera are frequently absent where conductivity is high, even when other stressors are absent 13. Loss of many genera is a strong effect (+). In paired watersheds, various biological metrics are diminished in co-occurrence with elevated conductivity. Each type of evidence was independently corroborated (+). A summary score of + + + was assigned.
After the pieces of evidence are grouped by type, the types of evidence are grouped by causal characteristics (Fig. 1, step 2). For example, co-occurrence of conductivity and Ephemeroptera from contingency tables is listed together with two other types of evidence, co-occurrence in other watersheds and co-occurrence between conductivity and extirpation of genera, as evidence of the characteristic of co-occurrence (Table 2).
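As a rough illustration of how contingency-table evidence of the kind summarized in Table 2 could be tabulated, the following Python sketch counts presence and absence of a taxon at low- and high-conductivity sites. The site records, the cutoffs `low_max` and `high_min`, and the function names are hypothetical and are not taken from the case study.

```python
from collections import Counter

def cooccurrence_table(sites, low_max, high_min):
    """Tabulate taxon presence by conductivity class.

    sites: iterable of (conductivity, taxon_present) pairs.
    low_max, high_min: hypothetical cutoffs for the low and high classes;
    sites that fall between them are ignored.
    """
    counts = Counter()
    for conductivity, present in sites:
        if conductivity <= low_max:
            level = "low"
        elif conductivity >= high_min:
            level = "high"
        else:
            continue
        counts[(level, bool(present))] += 1
    return counts

def percent(counts, level, present):
    """Percentage of sites in a conductivity class with the given presence status."""
    total = counts[(level, True)] + counts[(level, False)]
    return 100.0 * counts[(level, present)] / total if total else float("nan")

# Made-up records for illustration only: (conductivity in µS/cm, taxon present?)
example_sites = [(80, True), (120, True), (140, True), (150, True),
                 (900, True), (1600, False), (1800, False), (2100, True)]
table = cooccurrence_table(example_sites, low_max=200, high_min=1500)
present_low = percent(table, "low", True)    # share of low-conductivity sites with the taxon
absent_high = percent(table, "high", False)  # share of high-conductivity sites without it
```

With real survey data, the two percentages correspond to the "present at low conductivity" and "absent at high conductivity" figures reported for a type of evidence; the choice of cutoffs would need to be justified in each case.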
Scoring types of evidence
Each type of evidence is dichotomously evaluated as credible or not based on (1) relevance to the assessment, (2) coherence with scientific theory, and (3) quality of the study. Evidence that failed any of these criteria was not used in the assessment. For example, in evaluating sufficiency, we did not include toxicity test studies of genera or ionic mixtures that had low relevance because they were substantially different from those used to construct the causal model 13. No studies were found to be inconsistent with scientific theory. Low-relevance studies were rejected based on content. To assure quality, data from non-peer-reviewed studies were not used, although an exception was made for a data set of chemical analyses of brine drilling waste.
The remaining evidence was weighted by scoring the types of evidence using the +, −, and 0 system (Fig. 1, step 3). Three qualities of the evidence may contribute to the score: (1) a single score is applied to register the logical implication of relevant, good-quality evidence, that is, whether it decreases (−) or increases (+) support for the causal relationship or has neither tendency (0); (2) especially strong evidence receives an additional score, based on logical properties (e.g., the effect is inconsistent with the mode of action of the agent) or the quantitative strength of the evidence (e.g., high correlation coefficients or large quantitative differences) (Table 3); and (3) a type of evidence may receive an additional score if there is consistency among multiple studies for that type of evidence. For example, multiple pieces of evidence for co-occurrence of the cause and Ephemeroptera show that, where conductivity is high, genera of the order Ephemeroptera are less likely to occur. This supports the causal hypothesis, so a + is assigned for logical implication. A change of 50% or more is large in this case, so another + is assigned for strength. The evidence was consistently corroborated in three independent data sets, and therefore, the evidence receives another + for a total of + + + 13.
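The three contributions to the score of a type of evidence can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names and the integer encoding (−3 to +3) are assumptions, as is the choice to leave neutral evidence at 0 regardless of strength or corroboration.

```python
def score_type_of_evidence(implication, strong=False, corroborated=False):
    """Score one type of evidence on an assumed integer scale from -3 to +3.

    implication: +1 (supports), -1 (weakens), or 0 (neutral or ambiguous).
    strong: True if the evidence earns an additional score for strength.
    corroborated: True if it earns an additional score for consistency
        among multiple studies.
    """
    if implication not in (-1, 0, 1):
        raise ValueError("implication must be -1, 0, or +1")
    if implication == 0:
        # Assumption: neutral evidence gains nothing from strength or corroboration.
        return 0
    magnitude = 1 + int(strong) + int(corroborated)
    return implication * min(magnitude, 3)  # never more than three symbols

def as_symbols(score):
    """Render an integer score in the plus/minus notation used in the text."""
    if score == 0:
        return "0"
    symbol = "+" if score > 0 else "−"
    return " ".join(symbol * abs(score))

# The Ephemeroptera example from the text: supporting, strong, and corroborated.
ephemeroptera = score_type_of_evidence(+1, strong=True, corroborated=True)
ephemeroptera_symbols = as_symbols(ephemeroptera)
```

For the Ephemeroptera example, the sketch reproduces the + + + total described above; whether a given piece of evidence counts as strong is decided separately, using criteria such as those in Table 3.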
Table 3. Scoring the logical implication and strength of evidence for co-occurrence from contingency tables
Strength from contingency tables
Strength from correlation
Strength from difference
Scoring was done by Spearman correlation or factor of difference and, for sufficiency, from regression. Relevant, good-quality evidence was awarded a single entry (0, +, or −); an additional score was applied for strength, for a total of two pluses or minuses.
An additional + or − score may be applied for corroboration, for a total not to exceed three pluses or minuses.
Increased effect >25%
Increased effect >50%
Increased effect <25%
Increased effect <5%
r has the wrong sign
Wrong sign >2
When scoring evidence based on correlations, regression, contingency tables, or quantitative comparisons, we used the standard criteria for logical implication and strength described in Table 3. Other qualities, which are not simple and quantitative, must be scored based on judgment and explained in each case.
The standard scores for strength of evidence (Table 3) are based on the authors' experience and judgment as to what constitutes convincing, strong, or weak evidence. No objective basis exists for such judgments. They are equivalent to the judgment of Karl Pearson that an error probability of 0.01 was sufficiently low, which, after more experience, was changed to the current convention of 0.05 19. Users of this method may substitute their own judgments in future applications because the significance of a correlation or other statistical metric depends on the background variability. The advantage of these scores is that, unlike most weight-of-evidence judgments, they are explicit.
Weighing and scoring the collected evidence for each causal characteristic
We continued the process by weighing the strength, diversity, and consistency of the evidence for each causal characteristic and noting any discrepancies and any aspects of the body of evidence that could be improved (Fig. 1, step 4). The evidence is weighed using a system with the same symbols (+, −, or 0) as for weighting the types of evidence.
The summary score for each causal characteristic was assigned the median score for the body of evidence. A score was reduced if the evidence for that characteristic was inconsistent. The score was increased if the evidence included at least three types of consistent evidence.
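The summary rule can be sketched in Python under several stated assumptions, none of which come from the text: scores are encoded as integers from −3 to +3, the median of an even number of scores is taken as the lower middle value so that it remains a whole score, and "reduced" and "increased" each mean one step in magnitude, capped at three.

```python
from statistics import median_low

def summary_score(type_scores, consistent=True):
    """Summary score for one causal characteristic (assumed -3..+3 scale).

    type_scores: integer scores of the types of evidence for the characteristic.
    consistent: False if the evidence for the characteristic is inconsistent.
    """
    if not type_scores:
        raise ValueError("at least one type of evidence is required")
    score = median_low(type_scores)          # assumption: lower median keeps an integer
    sign = (score > 0) - (score < 0)
    magnitude = abs(score)
    if not consistent:
        magnitude = max(magnitude - 1, 0)    # assumption: reduce by one step
    elif len(type_scores) >= 3:
        magnitude = min(magnitude + 1, 3)    # assumption: increase by one step, capped
    return sign * magnitude

# Hypothetical scores for three consistent types of evidence.
example_summary = summary_score([2, 2, 3])
```

A neutral median stays at 0 in this sketch because the direction of any adjustment would be undefined.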
Weighing the body of evidence
The scores for the evidence of the causal characteristics were used to evaluate the body of evidence for the causal relationship (Fig. 1, step 5). The following criteria were applied to determine causation by weighing the body of evidence for the causal relationship. Causation was refuted when there was evidence refuting one or more characteristics. Refuting is the logical process of demonstrating the impossibility of a candidate cause, thus allowing it to be eliminated from further consideration (e.g., effects occur before causes or upstream of the sources). Causation was judged unlikely when there was evidence discounting four, five, or six causal characteristics. Discounting (the antonym of supporting) is the property of evidence that weakens the case for a candidate cause but is insufficient to refute. Causation was unlikely but with low confidence when there was evidence discounting one, two, or three characteristics with other evidence supporting. Causation was confirmed when there was evidence strongly documenting six characteristics. Causation was very probable when there was evidence documenting five or six characteristics (but not necessarily all strong) and none discounting. Causation was probable when there was evidence strongly documenting three or four characteristics with at least one being either sufficiency or alteration and none discounting. Causation was probable but with low confidence when there was evidence strongly documenting two characteristics with at least one being either sufficiency or alteration and none discounting. When there was evidence documenting only one causal characteristic, there was insufficient evidence to make a determination of causation.
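The criteria above can be written as an ordered set of rules. The Python sketch below is one possible reading, with assumptions that are not from the text: summary scores are integers from −3 to +3, "strongly documenting" means a score of + + or higher, the dictionary keys are illustrative labels for the six characteristics, refutation is passed in separately, and any combination the criteria do not cover is reported as "undetermined".

```python
STRONG = 2  # assumption: "strongly documenting" means + + or + + +

def assess_causation(scores, refuted=()):
    """Apply the body-of-evidence criteria to summary scores.

    scores: dict mapping characteristic label -> summary score (-3..+3).
    refuted: labels of characteristics for which evidence refutes causation.
    """
    if refuted:
        return "refuted"
    discounting = [c for c, s in scores.items() if s < 0]
    documenting = [c for c, s in scores.items() if s > 0]
    strong = [c for c in documenting if scores[c] >= STRONG]
    key_strong = any(c in ("sufficiency", "alteration") for c in strong)

    if len(discounting) >= 4:
        return "unlikely"
    if discounting:
        # One to three characteristics discounted.
        return "unlikely, low confidence" if documenting else "undetermined"
    if len(strong) == 6:
        return "confirmed"
    if len(documenting) >= 5:
        return "very probable"
    if len(strong) in (3, 4) and key_strong:
        return "probable"
    if len(strong) == 2 and key_strong:
        return "probable, low confidence"
    if len(documenting) == 1:
        return "insufficient evidence"
    return "undetermined"  # assumption: combination not covered by the criteria

# Hypothetical summary scores, for illustration only.
example_scores = {
    "co_occurrence": 3, "preceding_causation": 3, "interaction": 2,
    "alteration": 2, "sufficiency": 3, "time_order": 1,
}
example_result = assess_causation(example_scores)
```

The order of the checks matters: refutation and discounting are tested first, so supporting evidence cannot override them, which mirrors the order in which the criteria are stated.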
In addition to the causal assessment, other potential causes should be assessed to assure that the model of the causal relationship is not materially confounded and predictions from it are reliable 14. In addition to assuring that the cause–effect model is not confounded, this analysis provides a response to contentions that the causal relationship is actually due to a different agent.
We have shown how evidence of causal characteristics can be used to determine whether a particular agent causes an effect. The process is an improvement on Hill's considerations 1, because it ties evidence directly to characteristics of causal relationships and does not confuse evidence with the source of the information or the quality of the information 3. The source and quality of the information, instead, are used to evaluate the logical implications, strength, and overall credibility and relevance of the evidence.
This method also shows how a clearly defined, a priori process for weighing evidence enhances the communication and credibility of a causal assessment 6. That process uses qualitative scoring of the characteristics of causal relationships to organize and weigh evidence for and against a general causal hypothesis. Defining the scoring criteria reduces lapses in consistency and provides transparency when weighting evidence for assessment of either general or specific causation 8, 18. The method has been applied to determining whether the water quality benchmark for conductivity is based on a causal relationship 13, 20. It can be used to assess the causation of other relationships in epidemiology or ecoepidemiology. The same approach, with different scoring criteria, can be used to assess potential confounders 14.
We thank the West Virginia Department of Environmental Protection and G. Pond, U.S. Environmental Protection Agency (U.S. EPA), who provided data, and L. Zheng, Tetratech, who performed all statistical analyses. Anonymous and named reviewers improved the quality of the manuscript: M. Griffith, C. Delos, M. Passmore, J. VanSickle, C. Schmitt, C. Menzie, C. Hawkins, and members of the U.S. EPA Biological Advisory Committee. The U.S. EPA Science Advisory Board provided careful review, interdisciplinary insights, and encouragement: D. Patten, E. Boyer, W. Clements, J. Dinger, G. Geidel, K. Hartman, R. Hilderbrand, A. Huryn, L. Johnson, T.W. LaPoint, S.N. Luoma, D. McLaughlin, M.C. Newman, T. Petty, E. Rankin, D. Soucek, B. Sweeney, and R. Warner. The article was formatted by D. Kleiser, C. Lewis, S. Moore, and L. Wood of EC Flex, and L. Kessler, K. Secor, and L. Tackett of IntelliTech Systems. The article is based on work supported by the U.S. EPA. The views expressed in the present study are those of the authors and do not necessarily represent the views or policies of the U.S. EPA.