Keywords:

  • Water quality criteria;
  • Stressor–response model;
  • Criteria assessment;
  • Risk-based;
  • Receiver operating characteristics

Abstract


Field data relating aquatic ecosystem responses with water quality constituents that are potential ecosystem stressors are being used increasingly in the United States in the derivation of water quality criteria to protect aquatic life. In light of this trend, there is a need for transparent quantitative methods to assess the performance of models that predict ecological conditions using a stressor–response relationship, a response variable threshold, and a stressor variable criterion. Analysis of receiver operating characteristics (ROC analysis) has a considerable history of successful use in medical diagnostic, industrial, and other fields for similarly structured decision problems, but its use for informing water quality management decisions involving risk-based environmental criteria is less common. In this article, ROC analysis is used to evaluate predictions of ecological response variable status for 3 water quality stressor–response data sets. Information on error rates is emphasized due in part to their common use in environmental studies to describe uncertainty. One data set comprises simulated data, and 2 involve field measurements described previously in the literature. These data sets are also analyzed using linear regression and conditional probability analysis for comparison. Results indicate that of the methods studied, ROC analysis provides the most comprehensive characterization of prediction error rates, including false positive, false negative, positive predictive, and negative predictive errors. This information may be used along with other data analysis procedures to set quality objectives for and assess the predictive performance of risk-based criteria to support water quality management decisions. Integr Environ Assess Manag 2012; 8: 674–684. © 2012 SETAC


INTRODUCTION


Increased attention is being given to the development of water quality benchmarks and criteria using weight of evidence and field data that express relationships between ecological responses of aquatic ecosystems and stressor variables that cause those responses (Paul and McDonald 2005; USEPA 2006a, 2010a, 2010b, 2011; Cormier et al. 2008; Hollister et al. 2008; Suter and Cormier 2008). Efforts by a number of state environmental agencies to develop numeric nutrient criteria have involved evaluations of relationships between nutrient concentrations and measures of ecological and biological responses to excess nutrients such as increases in algal growth. In addition, several documents published by US Environmental Protection Agency (USEPA) are intended to address how field data on potential stressors and related responses may be evaluated to develop numeric water quality criteria (WQC) or benchmarks that support the attainment of designated uses as required by the Clean Water Act (CWA). Examples include the derivation of criteria for suspended and bedded sediment (USEPA 2006a), numeric nutrient criteria (USEPA 2010b), and benchmarks for specific conductance in central Appalachian streams (USEPA 2011).

Suter and Cormier (2008) provide a useful risk-based framework for the development and evaluation of environmental quality criteria derived from field data. The authors compared conventional risk assessment with a process termed "criterion assessment." Whereas conventional risk assessment seeks to define human health or ecological risks associated with a range of exposures to 1 or more stressors, the criterion assessment process seeks to define the level of exposure needed to achieve a specific environmental goal, e.g., an ecosystem that is "healthy" with respect to 1 or more ecological attributes. The authors define 3 phases of criterion assessment: planning, analysis, and synthesis. Cormier et al. (2008) describe the criterion assessment process in greater detail and provide a hypothetical example using field data on stream macroinvertebrates and deposited and bedded sediments. The example is based on information in the USEPA Framework for Developing Suspended and Bedded Sediments Water Quality Criteria (USEPA 2006a).

In general, such criterion assessment is a multistep process that involves identifying numeric thresholds or ranges of 1 or more response variables that define attainment and nonattainment of designated uses, a stressor variable that causes changes in the response variables, and a model that describes the relationship between the stressor and response variables. The modeled relationship may then be used to identify levels of the stressor (i.e., a numeric criterion or benchmark) that minimize the likelihood of occurrence of the unwanted condition (for clarity and brevity throughout this article, the term “criterion” refers to either WQC or benchmarks for a stressor variable, and the term “threshold” refers to a response variable threshold).

Conceptually, the process is relatively straightforward. In practice, as other authors have described (Barbour et al. 2004; Cormier et al. 2008), several challenges may need to be addressed. For example, it may be difficult to identify 1 or more response variables and associated thresholds that appropriately define ecosystem management goals and reflect the attainment of designated uses. Ultimately, selection of responses and response thresholds may be based on a combination of policy choices and information from scientific analyses, including studies of the impacts of varying levels of ecosystem responses on designated use attainment. Next, 1 or more stressor variables that are causally related to the selected responses must be identified and the nature of their relationship to selected responses assessed. A valid causal analysis developed through a weight of evidence approach goes beyond statistical modeling and can help to demonstrate that relationships observed in field data are not simply associations with no direct causal links to a stressor variable. Without a causal connection, management of the stressor may yield no improvement in the targeted ecological response (USEPA SAB 2010). Despite these challenges, the examples cited previously indicate that decisions on appropriate response variables and thresholds are made as a part of contemporary environmental management.

An important additional challenge is to identify an appropriate value for the stressor variable, i.e., a numeric criterion, that supports attainment of the desired response condition. Contributing to this challenge is the uncertainty that exists in stressor–response relationships described using field data. Combined with a causal stressor–response relationship and an established response threshold, selection of a numeric stressor criterion completes a model framework that may be used to predict the status of the response. Uncertainty in the stressor–response relationship dictates that these predictions will also be uncertain and may make the appropriate choice of a numeric criterion unclear. Attention to the characterization of uncertainty and its effects on water quality predictions is important for improved water quality management (Borsuk et al. 2002; DiToro et al. 2005; Reckhow et al. 2005; Gronewold et al. 2008; Stevenson et al. 2008) and is the focus of this article for the model framework described above.

There are 2 important questions for WQC development and implementation based on field-derived stressor–response relationships. For a given water body, how accurately does nonattainment of a stressor criterion indicate a nonattaining response condition, and conversely, how accurately does attainment of a stressor criterion indicate attainment of a desired response condition? These questions may be thought of in the context of statistical decision errors in hypothesis testing (Barbour et al. 2004). Smith et al. (2001, 2003) address the probability of statistical decision errors when comparing measures of a single variable to a numeric criterion for that variable to assess violations of US water quality standards. Assuming a null hypothesis that a standard is being attained, a Type I (also called a false rejection or false positive) error is defined as a case where a site may be classified as nonattaining when in fact designated uses are attained. A Type II (a false acceptance or false negative) error is defined as a case where a site is classified as attaining when it truly is nonattaining.

Smith et al. (2001) state that the choice of acceptable error rates should be a risk management decision, and achieving a balance of these errors may be appropriate when agreement on acceptable false positive and false negative error rates is not possible. Furthermore, considering decision error rates quantitatively is important because of uncertainty that exists due to natural variation and measurement and sampling errors, and because policy determinations may allow occasional violations of a standard (Smith et al. 2001). Reasons for minimizing false positive errors include a need for wise use of limited regulatory agency and other stakeholder resources and the application of remedial activities to truly impaired sites so that water quality goals can be achieved effectively (Smith et al. 2001, 2003; Llanso et al. 2009; Paul and Munns 2011). Minimizing false negative errors is important to minimize water quality risks to aquatic life and human health. Characterizing and selecting appropriate levels of both error types is also a central goal in the development of project specific data quality objectives (DQO) as discussed in USEPA (2002, 2006a, b).

For predictive models involving field-based stressor–response relationships, response thresholds, and stressor criteria, the consideration of decision errors can be extended to inferences about the response using information about the stressor variable. The general diagnostic nature of this prediction problem exists in a number of other fields such as medicine, meteorology, and machine learning (Swets et al. 2000). Often, such diagnostic models are evaluated with a receiver operating characteristic (ROC) approach in which the status of an indicator variable (e.g., exceedance or nonexceedance of an indicator threshold) is used to predict the status of the primary variable of interest (e.g., the presence or absence of disease). However, ROC analysis appears to be used less commonly to evaluate model performance in water quality management (Hale and Heltshe 2008). Examples do exist in the peer-reviewed environmental literature; several are given in Table 1.

Table 1. Examples from peer-reviewed literature of ROC analysis used in environmental research and management

(AVS = acid volatile sulfides; ROC = receiver operating characteristics; SEM = simultaneously extracted metals.)

  • Benyi et al. 2009: Evaluating the extent of agreement between 2 benthic macroinvertebrate indices and associations between an index and environmental metrics
  • Efstratiou et al. 2009: Comparing bacterial indicators to predict the presence of Salmonella sp. in sewage-polluted marine waters using different indicator thresholds
  • Hale and Heltshe 2008: Developing a benthic index for nearshore waters in the Gulf of Maine
  • Hale et al. 2004: Comparing logistic regression models developed to estimate the probability of degraded benthic conditions
  • Long et al. 2011: Studying factors affecting the occurrence of terrestrial carnivores within a Vermont study area
  • Mason and Graham 1999: Evaluating the quality of a meteorological forecast system
  • McLellan et al. 2008: Characterizing the predictive ability of competing candidate regression models used to predict the probability of return by anglers in a coastal rainbow trout fishery
  • Morrison et al. 2003: Evaluating the ability of indicator variables to correctly classify water as suitable or unsuitable for swimming by comparing the mean density of Enterococcus sp. with a threshold used to protect public health
  • Murtaugh 1996: Evaluating ecological indicators to identify useful surrogates or indicators for ecological response variables
  • Murtaugh and Pooler 2006: Studying lake condition indicators in the northeastern United States
  • Nevers and Whitman 2011: Comparing measured and predicted Escherichia coli concentrations relative to a human health standard used to decide whether beaches should be closed to swimming
  • Shine et al. 2003: Comparing percent mortality in sediment bioassays with toxicity predicted from the SEM–AVS ratio

The basis for ROC analysis is commonly a 2 × 2 contingency table (also called a “confusion” or error matrix) representing 2 states of actual condition (e.g., a reference group and a diseased group) and 2 states of the predicted condition using results from a diagnostic test involving an indicator variable (see figure 1 in Fawcett 2006 and Table 2). The true condition is represented by 1 of 2 states, i.e., either the condition is present or it is absent. Likewise, the prediction is that the condition is either present or absent.

Figure 1. Scatter plots (A,D,G), conditional probability plots (B,E,H), and ROC analysis error plots (C,F,I) for Data Sets 1, 2, and 3.

Table 2. Performance metrics (after Linnet 1988) and example calculations for a 2 × 2 contingency table (i.e., error matrix) for ROC analysis, with quadrant counts and ROC terms calculated for a hypothetical situation involving uncorrelated stressor and response variables, total n = 1001 data pairs, and Ythr and Xc set at the median value of each variable(a)

                           Indicator (stressor)
  Actual (response)      Attaining          Nonattaining       Total count
   Nonattaining          n(FN) = 250        n(TP) = 251        250 + 251 = 501
   Attaining             n(TN) = 252        n(FP) = 248        252 + 248 = 500
   Total count           250 + 252 = 502    251 + 248 = 499    1001

  a. Counts are chosen to help illustrate the calculation of each term. FN = false negative; FNE = false negative error; FP = false positive; FPE = false positive error; NPE = negative predictive error; NPV = negative predictive value; PPE = positive predictive error; PPV = positive predictive value; ROC = receiver operating characteristics; Se = sensitivity; Sp = specificity; TN = true negative; TP = true positive.

Prevalence = [n(TP) + n(FN)]/[n(TP) + n(FP) + n(FN) + n(TN)] = 501/1001 ≈ 0.5

Nonerror rates:
Sp = n(TN)/[n(TN) + n(FP)] = 252/500 ≈ 0.5
Se = n(TP)/[n(FN) + n(TP)] = 251/501 ≈ 0.5
PPV = n(TP)/[n(TP) + n(FP)] = 251/499 ≈ 0.5
NPV = n(TN)/[n(TN) + n(FN)] = 252/502 ≈ 0.5
Accuracy = ½(Sp + Se) = [n(TP) + n(TN)]/[n(TP) + n(FP) + n(FN) + n(TN)] = (251 + 252)/1001 ≈ 0.5 (the two expressions coincide when prevalence is 0.5, as it is here)

Error rates:
FPE = n(FP)/[n(TN) + n(FP)] = 1 − Sp = 248/500 ≈ 0.5
FNE = n(FN)/[n(FN) + n(TP)] = 1 − Se = 250/501 ≈ 0.5
PPE = n(FP)/[n(FP) + n(TP)] = 1 − PPV = 248/499 ≈ 0.5
NPE = n(FN)/[n(FN) + n(TN)] = 1 − NPV = 250/502 ≈ 0.5

Group classifications may be based on categorical data or continuous data in which category membership is determined using previously established thresholds and/or criteria (Linnet 1988, Murtaugh 1996). In ROC analysis, counts from the 2 × 2 error matrix can then be used to derive several metrics of the predictive performance of the overall prediction model, including estimation of error rates and their complementary nonerror rates (Table 2). Error rates include false positive error (FPE), false negative error (FNE), positive predictive error (PPE), and negative predictive error (NPE). As shown in Table 2, FPE represents the proportion of all observations actually attaining the desired response condition that are indicated as nonattaining, whereas PPE represents the proportion of all observations that are indicated as nonattaining that actually attain the desired response. FNE represents the proportion of all observations that are actually not attaining the response but are indicated as attaining, whereas NPE represents the proportion of all observations that are indicated as attaining that actually do not attain the desired response. PPE and NPE may be most relevant when new information is available only for the stressor and/or indicator, and inferences about the likelihood of observing one or the other actual response condition are desired.

Nonerror rates include specificity (Sp), sensitivity (Se), positive predictive value (PPV), and negative predictive value (NPV). Sp represents the proportion of true negatives among all cases in which the desired response is actually attained. Se represents the proportion of true positives among all cases in which the desired response is not attained; thus, high sensitivity indicates good model performance for identifying truly nonattaining cases. PPV and NPV are, respectively, the proportion of true positives among all observations that are indicated as nonattaining, and the proportion of true negatives among all observations that are indicated as attaining the desired response. In addition, the overall accuracy of the predictive model may be estimated as ½(Sp + Se), which equals the proportion of all cases correctly classified when prevalence is 50%. Finally, prevalence estimates the rate of the true nonattaining condition in the population.
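These definitions are straightforward to compute directly. The sketch below (plain Python, offered only as an illustration of the Table 2 formulas, not the software used in this study) derives every error and nonerror rate from the 4 quadrant counts of the error matrix, using the counts from the hypothetical uncorrelated example:

```python
# Metrics from the 2 x 2 error matrix of Table 2 (counts from the
# hypothetical uncorrelated example: n = 1001, prevalence ~ 0.5).
def roc_metrics(tp, fp, fn, tn):
    n = tp + fp + fn + tn
    sp = tn / (tn + fp)            # specificity
    se = tp / (tp + fn)            # sensitivity
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    return {
        "prevalence": (tp + fn) / n,
        "Sp": sp, "Se": se, "PPV": ppv, "NPV": npv,
        "FPE": 1 - sp,             # false positive error
        "FNE": 1 - se,             # false negative error
        "PPE": 1 - ppv,            # positive predictive error
        "NPE": 1 - npv,            # negative predictive error
        "accuracy": (tp + tn) / n,
    }

m = roc_metrics(tp=251, fp=248, fn=250, tn=252)
# Every metric comes out near 0.5, matching the Table 2 example.
```

Two perfectly correlated variables would instead yield nonerror rates of 1 and error rates of 0.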

Literature on ROC analysis often emphasizes the generation and interpretation of the ROC curve, in which Se is plotted against 1-Sp as a function of a range of possible cutoff or criterion values, Xc, for the variable used to predict the presence or absence of the actual condition. An alternative to this presentation of ROC data is described by Linnet (1988) in which both Se and Sp are plotted against Xc. Using the equations in Table 2, it can be shown that FNE is equal to 1-Se and FPE is equal to 1-Sp. Thus, FPE and FNE also can be easily plotted as a function of Xc. For WQC derivation, this approach provides a useful way to characterize the influence of choices of Xc directly on decision error rates associated with classification predictions derived from possible stressor criteria.
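The Linnet-style presentation can be sketched in a few lines. The data pairs and thresholds below are invented for illustration; the code classifies each (stressor, response) pair against a candidate criterion Xc and a response threshold, then tabulates FPE = 1 − Sp and FNE = 1 − Se across a range of Xc values:

```python
def error_rates(pairs, x_c, y_thr):
    """FPE and FNE for one candidate criterion x_c, where 'nonattaining'
    means a value above its threshold, for both variables."""
    tp = sum(1 for x, y in pairs if x > x_c and y > y_thr)
    fp = sum(1 for x, y in pairs if x > x_c and y <= y_thr)
    fn = sum(1 for x, y in pairs if x <= x_c and y > y_thr)
    tn = sum(1 for x, y in pairs if x <= x_c and y <= y_thr)
    fpe = fp / (tn + fp)   # 1 - Sp
    fne = fn / (fn + tp)   # 1 - Se
    return fpe, fne

# Hypothetical paired (stressor, response) observations; the response
# threshold is set at y_thr = 10.
pairs = [(1, 4), (2, 6), (3, 9), (4, 12), (5, 8), (6, 14), (7, 16), (8, 19)]
curve = [(xc, *error_rates(pairs, xc, 10)) for xc in range(0, 9)]
# FPE falls and FNE rises as the candidate criterion Xc increases,
# tracing the shape of the error plots in Figure 1 (panels C, F, I).
```

For this tiny example the two error lines cross at Xc = 4, where FPE = FNE = 0.25, an error rate balance point in the sense used later in this article.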

This study explores the use of decision error rates estimated from ROC analysis to inform the criterion assessment process described by Suter and Cormier (2008) by providing metrics of predictive performance for models based on stressor–response data. The method is applied to 1 simulated data set and 2 data sets from published literature, along with associated response thresholds for each. It is assumed that the stressor variables are causally related to responses to focus the study on methods for quantitative evaluation of this type of criterion assessment model. Error rate estimates, rather than nonerror rates or other ROC results, are emphasized because they are often used in environmental research and management to characterize and control uncertainty (Smith et al. 2001; USEPA 2006a) and may be more commonly understood within the environmental community than Se and Sp.

The results from ROC analysis are compared with those obtained from 2 other approaches for evaluating stressor–response data: linear regression and conditional probability analysis (CPA). The simulated data set provides a hypothetical example of a simple linear relationship with relatively low variability that is useful for illustrating typical output from all 3 procedures. The published data sets are chlorophyll a (chl a) and total P (tp) concentration measurements used as the basis for proposed numeric nutrient criteria for Florida colored lakes (USEPA 2010b), and Ephemeroptera/Plecoptera/Trichoptera (EPT) taxa richness and percent sediment fine material (percent fines) data published by Paul and McDonald (2005) and Hollister et al. (2008).

METHODS


Data Set 1 was generated with Microsoft Excel using a simple linear regression model to yield simulated response data from a set of randomly generated stressor variable observations. The stressor variable values were generated from a normal distribution. The slope, intercept, and variance used to generate response variable values were selected to yield a statistically significant positive relationship with a relatively high degree of correlation. The median value of the simulated response variable observations is used as the response variable threshold, Ythr. This establishes a prevalence of 50% for the purpose of this study. Values greater than Ythr are defined as representing a nonattaining response condition.
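Such a data set can be generated along the lines described. In the sketch below, the slope and intercept match the fitted model reported in Table 4 and the stressor distribution parameters follow Table 3; the residual standard deviation of 5 is an assumed value chosen to give a strong positive correlation, as the text describes:

```python
import random
import statistics

random.seed(0)
N = 100  # sample size, as in Table 3

# Stressor drawn from a normal distribution; slope and intercept match
# the regression equation in Table 4.  The residual SD of 5 is an
# assumption chosen to yield a statistically significant positive
# relationship with relatively high correlation.
stressor = [random.gauss(6.9, 1.1) for _ in range(N)]
response = [15.5 + 7.9 * x + random.gauss(0, 5) for x in stressor]

y_thr = statistics.median(response)            # response threshold Ythr
nonattaining = [y > y_thr for y in response]   # values above Ythr nonattain
prevalence = sum(nonattaining) / N             # 0.5 by construction
```

Setting Ythr at the sample median guarantees that exactly half of the simulated responses are classified as nonattaining, fixing prevalence at 50%.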

Data Set 2 consists of data on annual geometric mean chlorophyll a (chl a), a response variable, and annual geometric mean total P concentrations (abbreviated “tp” to avoid confusion with the abbreviation for true positives, TP, used in Table 2), as a stressor variable. These data were used to derive proposed WQC for Florida colored lakes (USEPA 2010b) and were previously evaluated by McLaughlin (2012). The previously selected annual geometric mean chl a concentration of 20 µg/L was used as Ythr, with higher values indicating nonattainment of designated uses. The proposed baseline and modified tp criteria, 0.05 mg/L and 0.157 mg/L tp, respectively, were evaluated among other possible Xc values.

Data Set 3 consists of paired observations of EPT taxa richness and percent fine-grained sediment in bottom substrate (percent fines). These data were previously published by Paul and McDonald (2005) and Hollister et al. (2008) to evaluate the use of CPA and to illustrate applications of the CProb software for conducting CPA. EPT taxa richness is negatively correlated with percent fines (high EPT taxa richness and low percent fines represent higher quality conditions). EPT taxa richness less than 9 is used to indicate nonattaining conditions, consistent with the previous publications.

Summary statistics for all 3 pairs of stressor and/or indicator and response variables, including means, standard deviations, medians, minimums, and maximums were calculated using Minitab®, Version 16. Minitab also was used for correlation and regression analyses. The strength of correlation for all 3 relationships is compared using the nonparametric Spearman's rank correlation coefficient. The relationships in Data Sets 1 and 2 were analyzed using linear regression (the logarithms of chl a and tp were used for regression analysis of Data Set 2 following USEPA 2010b). Linear regression models are characterized using the slope and intercept of the regression line, the coefficient of determination (R2), residual plots, and the statistical significance of the regression line, slope, and intercept parameters. Upper and lower 50% prediction limits were evaluated, consistent with the approach used for the proposed USEPA Florida lakes criteria. No linear regression model was previously described for the EPT taxa richness data by Paul and McDonald (2005) or Hollister et al. (2008), nor is one developed here. Instead, the nature of the stressor–response relationship is characterized using Spearman's ρ and locally weighted scatterplot smoothing (LOWESS). The CProb procedure referenced in Hollister et al. (2008), developed for use within the R computing environment, was used to derive all CPA results.
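The core CPA computation (without the bootstrap confidence limits that CProb adds) reduces to a conditional proportion, sketched here in Python with invented data pairs for illustration:

```python
def conditional_probability(pairs, x_c, y_thr):
    """P(response nonattaining | stressor > Xc): among observations whose
    stressor value exceeds the candidate criterion x_c, the fraction whose
    response exceeds the response threshold y_thr.  A minimal sketch of
    the CPA calculation; the CProb R package also supplies bootstrap
    confidence limits around this estimate."""
    exceed = [(x, y) for x, y in pairs if x > x_c]
    if not exceed:
        return None  # no observations above this candidate criterion
    return sum(1 for _, y in exceed if y > y_thr) / len(exceed)

# Invented (stressor, response) pairs for illustration.
pairs = [(1, 5), (2, 8), (3, 12), (4, 9), (5, 15), (6, 18)]
p = conditional_probability(pairs, x_c=3, y_thr=10)
# Among the 3 pairs with stressor > 3, responses 15 and 18 exceed 10,
# so p = 2/3.
```

Evaluating this proportion over a grid of candidate Xc values yields the conditional probability curves shown in Figure 1 (panels B, E, H).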

For ROC analysis, the pROC software (Robin et al. 2011) was used to calculate prevalence, error rates, nonerror rates, and accuracy. In addition to providing definitions of these terms, Table 2 contains a set of example values for each box of the error matrix. The example values were selected to illustrate the calculation of each term, and to provide a reference example for comparison with ROC results obtained from each data set. Because the values in each box are nearly equal, the example represents a case in which the stressor and response variables are effectively uncorrelated with the response threshold and stressor criterion set at their respective medians. In this case, all terms defined in Table 2 have calculated values of approximately 0.5. This table also can be used to show that 2 perfectly correlated variables would have nonerror rates equal to 1 and error rates equal to 0. Se and Sp for all 3 data sets including the 95% confidence interval from 2000 bootstrap resampling events, were obtained using the “ci.thresholds(rocobj)” command in pROC. Se and Sp (i.e., their median estimates at each Xc value) were used along with the formulas in Table 2 and raw counts for the 2 × 2 matrix to estimate median rates of all 4 error types, FPE, FNE, PPE, and NPE. Results are compared with the hypothetical uncorrelated reference example provided in Table 2.
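The bootstrap procedure can be sketched as follows. This is a minimal Python illustration of percentile-bootstrap confidence limits for Se and Sp at a single candidate criterion, not the pROC implementation itself, and the demonstration data are invented:

```python
import random

def se_sp(pairs, x_c, y_thr):
    """Sensitivity and specificity at one candidate criterion x_c
    ('nonattaining' = value above its threshold, for both variables)."""
    tp = sum(1 for x, y in pairs if x > x_c and y > y_thr)
    fn = sum(1 for x, y in pairs if x <= x_c and y > y_thr)
    tn = sum(1 for x, y in pairs if x <= x_c and y <= y_thr)
    fp = sum(1 for x, y in pairs if x > x_c and y <= y_thr)
    return tp / (tp + fn), tn / (tn + fp)

def bootstrap_ci(pairs, x_c, y_thr, reps=2000, seed=42):
    """95% percentile-bootstrap intervals for Se and Sp at one Xc."""
    rng = random.Random(seed)
    se_vals, sp_vals = [], []
    for _ in range(reps):
        sample = [rng.choice(pairs) for _ in pairs]  # resample with replacement
        try:
            se, sp = se_sp(sample, x_c, y_thr)
        except ZeroDivisionError:
            continue  # resample lacked one of the actual classes
        se_vals.append(se)
        sp_vals.append(sp)
    def pct(vals, q):
        vals = sorted(vals)
        return vals[round(q * (len(vals) - 1))]
    return ((pct(se_vals, 0.025), pct(se_vals, 0.975)),
            (pct(sp_vals, 0.025), pct(sp_vals, 0.975)))

# Illustrative correlated data (invented for demonstration only).
rng = random.Random(7)
demo = [(x, x + rng.gauss(0, 1)) for x in (rng.uniform(0, 10) for _ in range(60))]
ci_se, ci_sp = bootstrap_ci(demo, x_c=5.0, y_thr=5.0, reps=500)
```

Because FNE = 1 − Se and FPE = 1 − Sp, the same resampling output also yields confidence limits on the error rates, as plotted in Figure 2.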

RESULTS


Summary statistics

Summary statistics are shown for both variables of each data set in Table 3. The median value of the simulated response variable is 72 (unitless); as previously discussed, this value is used as Ythr, resulting in a prevalence of 50%. Using a coefficient of variation less than 1 as an indication of approximately normally distributed data (McBean and Rovers 1998), only the stressor and response variables from Data Set 1 and the EPT taxa richness data of Data Set 3 appear approximately normally distributed. Figure 1 shows scatter plots, conditional probability plots, and error plots from ROC analysis for all 3 data sets.

Table 3. Summary statistics for variables in Data Sets 1–3

(Coef. Var. = coefficient of variation; EPT = Ephemeroptera/Plecoptera/Trichoptera; FL = Florida; SD = standard deviation.)

  Variable                 n     Mean    SD      Coef. Var.  Median  Min     Max
  Data Set 1 (simulation)
   Response                100   70.4    10.3    0.15        72.0    39.3    92.3
   Stressor                100   6.9     1.1     0.16        6.9     4.3     9.6
  Data Set 2 (FL colored lakes)
   Chl a (µg/L)            291   31.9    43.9    1.38        15.9    0.39    317
   Total P (mg/L)          291   0.097   0.107   1.09        0.070   0.0047  0.963
  Data Set 3 (EPT taxa/percent fines) (Hollister et al. 2008)
   EPT taxa richness       99    11.6    6.8     0.58        11.0    0       29
   Percent fines           99    15.6    18.3    1.17        10.9    0       100

Data Set 1 (simulated data)

Simple linear regression of the simulated data indicates that the relationship is statistically significant with a slope of approximately 7.9 and an R2 of 0.77 (Table 4). Upper and lower 50% prediction limits from the linear regression cross Ythr of 72 at simulated stressor values of Xc = 6.7 and Xc = 7.5 (Figure 1A). These intersection points, respectively, are referred to as Ythr/UPL50 and Ythr/LPL50 in the remainder of the article. Because higher values of the simulated response and stressor variables were defined to indicate nonattaining conditions, Ythr/UPL50 and Ythr/LPL50 indicate, respectively, that 25% of the response values are expected to be nonattaining at Xc = 6.7 and 75% are expected to be nonattaining at Xc = 7.5.
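Locating these intersection points amounts to finding where a regression prediction limit crosses Ythr. The sketch below is an illustrative reconstruction, not the Minitab procedure used in the study: it substitutes the normal 0.75 quantile for Student's t (a close approximation at n = 100), regenerates Data Set 1-like data with an assumed residual standard deviation of 5, and finds the crossings by grid search:

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares fit; returns what pl50() needs."""
    n = len(xs)
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    s2 = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    return a, b, s2, xbar, sxx, n

def pl50(x, fit):
    """Lower and upper 50% prediction limits at x (normal approximation
    to Student's t, adequate for n = 100)."""
    a, b, s2, xbar, sxx, n = fit
    z = statistics.NormalDist().inv_cdf(0.75)
    half = z * (s2 * (1 + 1 / n + (x - xbar) ** 2 / sxx)) ** 0.5
    yhat = a + b * x
    return yhat - half, yhat + half

def crossing(fit, y_thr, limit, lo, hi, steps=2000):
    """Smallest x in [lo, hi] whose chosen prediction limit reaches y_thr
    (limit=1: upper limit, i.e., Ythr/UPL50; limit=0: lower limit, i.e.,
    Ythr/LPL50) -- a simple grid search for a positive-slope model."""
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        if pl50(x, fit)[limit] >= y_thr:
            return x
    return None

# Regenerate a Data Set 1-like sample (residual SD of 5 is assumed).
rng = random.Random(3)
xs = [rng.gauss(6.9, 1.1) for _ in range(100)]
ys = [15.5 + 7.9 * x + rng.gauss(0, 5) for x in xs]
fit = fit_line(xs, ys)
y_thr = statistics.median(ys)
x_upl50 = crossing(fit, y_thr, limit=1, lo=min(xs), hi=max(xs))
x_lpl50 = crossing(fit, y_thr, limit=0, lo=min(xs), hi=max(xs))
# The upper limit reaches Ythr at a smaller stressor value than the
# lower limit, bracketing the candidate criterion range as in Figure 1A.
```

At the Ythr/UPL50 crossing, 25% of predicted responses lie above Ythr; at the Ythr/LPL50 crossing, 75% do, which is why the pair brackets the criterion choice.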

Table 4. Characteristics of criterion assessment models for Data Sets 1–3

(AUC = area under ROC curve; EPT = Ephemeroptera/Plecoptera/Trichoptera; FL = Florida; FNE = false negative error; FPE = false positive error; na = not applicable; ROC = receiver operating characteristics.)

  Data Set                         Prevalence (%)  AUC    FPE and FNE at balance point  Spearman's ρ  R2    Regression equation
  Data Set 1 (simulation)          50              0.94   0.14                          0.87          0.77  Response = 15.5 + 7.9 (Stressor)
  Data Set 2 (FL colored lakes)    55              0.85   0.26                          0.75          0.58  log10(chl a) = 2.488 + 1.128 log10(tp)
  Data Set 3 (EPT taxa/percent
   fines) (Hollister et al. 2008)  34              0.81   0.24                          −0.54         na    na
  Base case (from Table 2)         ≈50             ≈0.5   ≈0.50                         ≈0            na    na
A conditional probability plot (Figure 1B) shows a probability of impairment for the lowest Xc values of approximately 50%, consistent with the PPV and prevalence after setting Ythr equal to the median response value (i.e., 72). Bootstrap 95% confidence limits indicate a range of slightly more than 40% to just under 60%. With increasing Xc, the median probability of impairment among cases greater than Xc increases to nearly 1.0 at approximately Xc = 7.5 (the Ythr/LPL50). The width of the confidence interval around the median conditional probability estimate appears relatively constant across most of the range of Xc. The median probability of impairment associated with Xc = 6.7 (the Ythr/UPL50) is approximately 80% ± 10%. Using a nonoverlapping confidence intervals approach (Paul and McDonald 2005; Hollister et al. 2008), CPA results suggest Xc = 6.1 as a potential indicator/stressor variable criterion.

The median estimates of all 4 error types obtained from ROC analysis across the range of Xc values are shown in Figure 1C. FPE and PPE decrease as Xc increases, whereas FNE and NPE increase. All 4 error lines intersect at a probability of 0.14, indicating that 14% is the lowest error rate that can be achieved for all 4 error types at any single Xc, i.e., an error rate “balance point” for this stressor–response relationship with Ythr = 72. Table 5 shows ROC error rate estimates associated with Ythr/UPL50 and Ythr/LPL50 (i.e., Xc = 6.7 and Xc = 7.5). At Xc = 6.7, FNE and NPE are relatively low at less than 0.1 whereas FPE and PPE are somewhat higher at 0.20 and 0.18, respectively. At Xc = 7.5, FPE and PPE are very low (≤0.02) whereas FNE and NPE are much higher at 0.36 and 0.27, respectively. These error rate estimates represent improvements (i.e., reduced error rates) compared to the uncorrelated base case rates of approximately 0.5 (Table 2).

Table 5. ROC-derived median error rate estimates for selected Xc values from linear regression and conditional probability analysis for Data Sets 1–3

(FL = Florida; FPE = false positive error; FNE = false negative error; PPE = positive predictive error; NPE = negative predictive error; ROC = receiver operating characteristics.)

  Data Set / Xc                                    FPE    FNE    PPE    NPE
  Data Set 1 (simulation)
   Ythr/UPL50 = 6.7                                0.20   0.08   0.18   0.09
   Ythr/LPL50 = 7.5                                0.01   0.36   0.02   0.27
  Data Set 2 (FL colored lakes)
   Ythr/UPL50 = 0.05 mg/L tp                       0.38   0.04   0.33   0.04
   Ythr/LPL50 = 0.157 mg/L tp                      0.04   0.68   0.14   0.36
  Data Set 3 (EPT taxa/percent fines) (Hollister et al. 2008)
   CPA-derived Xc criterion = 15% fines            0.21   0.28   0.36   0.16

Data Set 2 (Florida colored lakes)

Simple linear regression of the log10 chl a and log10 tp concentration data yields a slope of approximately 1.128 and an R2 of 0.58 (Table 4). The prevalence of nonattaining lakes based on Ythr = 20 µg/L chl a is 55%, and Ythr/UPL50 and Ythr/LPL50 are located at tp concentrations of 0.05 mg/L and 0.157 mg/L, respectively (Figure 1D), as reported previously (USEPA 2010b). A conditional probability plot (Figure 1E) shows a probability of impairment at low Xc values of approximately 0.45. Unlike the CPA plot for Data Set 1, an intermediate plateau occurs, reflecting a leveling of the probability of nonattainment near 0.9 in the Xc range of 0.2 to 0.3 mg/L tp. This is followed by a decline to a minimum probability of 0.7 near Xc = 0.4 mg/L tp. The plot then shows an increase in probability to 1.0 at Xc = 0.5 mg/L tp. Bootstrap 95% confidence limits tend to widen at higher Xc values, also unlike Data Set 1. Using a nonoverlapping confidence intervals approach to estimate a tp criterion reflecting significantly higher probability of nonattainment compared with background, a value of less than 0.05 mg/L tp is obtained.

Receiver operating characteristics analysis shows that median estimates of all 4 error types do not converge at a single balance point as occurred with Data Set 1 (Figure 1F). However, there is an FPE and FNE balance point of 0.26 at Xc = 0.074 mg/L tp (Table 4); here, PPE and NPE are approximately 0.3 and 0.22, respectively (Figure 1F). Table 5 shows ROC error rate estimates associated with the Ythr/UPL50 and Ythr/LPL50 intersection points of 0.05 mg/L and 0.157 mg/L tp, corresponding to the baseline and modified tp criteria. At tp = 0.05 mg/L, median FNE and NPE estimates are very similar at approximately 0.05, whereas FPE and PPE estimates are considerably higher at 0.38 and 0.33, respectively, although still below the 0.5 of the uncorrelated base case (Table 5). At tp = 0.157 mg/L, FPE and PPE are less than 0.15, whereas NPE is 0.36 and FNE is 0.68. Figure 2 shows FPE and FNE estimates with 5th and 95th percentile confidence limits from 2000 bootstrap resampling events using pROC. Confidence limits and medians estimated for the intersection points compare favorably with those estimated by McLaughlin (2012) using an alternative bootstrapping approach.


Figure 2. False positive error (FPE, 1-specificity) and false negative error (FNE, 1-sensitivity) as a function of Xc for Data Set 2 (FL colored lakes). Dashed lines represent 95% confidence limits.


Data Set 3 (EPT taxa and percent fines)

Summary statistics for the EPT taxa richness and percent fines data are shown in Table 3. The LOWESS line crosses Ythr at 17% fines (Figure 1G) and indicates an overall decline in EPT taxa richness with increasing percent fines (Spearman's ρ = −0.54). Results from CPA, including 95% confidence limits on the probability of EPT taxa richness less than 9 at various Xc values, are shown in Figure 1H and are consistent with those presented by Hollister et al. (2008). The nonoverlapping confidence intervals method yields an estimated Xc criterion for percent fines of 15% (shown in Figure 1G), which is similar to the value reported in Paul and McDonald (2005). The CPA plot indicates that for the lowest values of Xc, there is a 40%–50% probability of EPT taxa richness less than 9 based on the median estimate. This probability increases to 1 at Xc greater than approximately 45% fines. Unlike the CPA results for Data Set 1, although similar to Data Set 2, confidence limits tend to be much wider at Xc greater than approximately 18% fines, most likely reflecting greater scatter in the data pairs about Ythr at these Xc values (Hollister et al. 2008). The total number of cases available for estimating impairment probability decreases to fewer than 10 at Xc greater than about 40% fines.

Median estimates of each of the 4 types of error rates as a function of the percent fines criterion Xc, obtained from ROC analysis, are shown in Figure 1I. As Xc increases from zero, the FPE decreases from 100%, reaching 0 at Xc = 48% fines, whereas the FNE increases from 0 to 100%. The FPE and FNE lines intersect at a balance point of 0.24, where Xc = 13.6% fines (Table 4). NPE increases from 0.05 to about 0.3 at Xc values greater than 35% fines, with a value of 0.14 at Xc = 13.6% fines. PPE decreases from 0.60 to 0.32 over this same range and is 0.38 at Xc = 13.6% fines. Unlike Data Set 1, PPE and NPE do not converge at the FPE and FNE balance point, which may be related to the asymmetric nature of the scatter plot of these 2 variables compared with the simulated data set. At the Xc estimate of 15% fines derived from CPA, error rates range from a low of 0.16 (NPE) to a high of 0.36 (PPE), with FPE and FNE rates of 0.21 and 0.28, respectively.
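The FPE/FNE balance point reported above can be located numerically by scanning candidate criteria and selecting the one at which the 2 rates are closest to equal. A minimal Python sketch, assuming candidate Xc values at midpoints between sorted observed stressor values (the function name and candidate scheme are illustrative assumptions):

```python
import numpy as np

def fpe_fne_balance_point(x, y, y_thr, response_low_is_bad=True):
    """Return the candidate criterion Xc at which the false positive and
    false negative error rates are closest to equal, together with the
    balanced error rate there. Candidates are midpoints between sorted
    unique stressor values."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    nonattain = (y < y_thr) if response_low_is_bad else (y >= y_thr)
    xs = np.sort(np.unique(x))
    best = None
    for xc in (xs[:-1] + xs[1:]) / 2:
        pred = x > xc
        fpe = np.sum(pred & ~nonattain) / np.sum(~nonattain)
        fne = np.sum(~pred & nonattain) / np.sum(nonattain)
        gap = abs(fpe - fne)
        if best is None or gap < best[0]:
            best = (gap, xc, (fpe + fne) / 2)
    _, xc, balanced_err = best
    return xc, balanced_err
```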

Specificity, sensitivity, and accuracy

Figure 3 shows the Sp, Se, and accuracy of response condition predictions, calculated as (specificity + sensitivity)/2, as a function of Xc for all 3 data sets. Note that the y axis scale is the same for all 3 panels, ranging from 0.5 to 1. The highest accuracy observed for any of the 3 data sets was 0.86, occurring for Data Set 1 at Xc in the range of 6.7 to 7.1; this corresponds with an FPE/FNE balance point of 0.14. The highest accuracy for the FL colored lakes data is 0.79 at Xc = 0.05 mg/L tp, which differs from the Xc of 0.074 mg/L at which the FPE and FNE rates are equal. The EPT taxa richness data set also shows a peak accuracy of 0.79, at Xc = 12% fines, slightly lower than the FPE/FNE balance point of 13.6% fines. Figure 3 illustrates that accuracy is likely to be highly dependent on Xc, and that the Xc associated with peak accuracy may differ from the Xc at which Sp and Se are equal.
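An accuracy curve of the kind shown in Figure 3 can be traced by evaluating (specificity + sensitivity)/2 at each candidate criterion. A brief illustrative sketch in Python (function and argument names are assumptions, not the analysis code used in this study):

```python
import numpy as np

def balanced_accuracy_curve(x, y, y_thr, candidates, response_low_is_bad=True):
    """Balanced accuracy, (specificity + sensitivity) / 2, evaluated at each
    candidate stressor criterion Xc. The Xc of peak accuracy is then
    candidates[np.argmax(result)]."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    nonattain = (y < y_thr) if response_low_is_bad else (y >= y_thr)
    acc = []
    for xc in candidates:
        pred = x > xc
        se = np.sum(pred & nonattain) / np.sum(nonattain)     # sensitivity
        sp = np.sum(~pred & ~nonattain) / np.sum(~nonattain)  # specificity
        acc.append((sp + se) / 2)
    return np.array(acc)
```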


Figure 3. Accuracy, sensitivity, and specificity as a function of Xc for Data Sets 1, 2, and 3.


DISCUSSION

The objective of this study is to evaluate the use of ROC analysis to inform the criterion assessment process described by Suter and Cormier (2008) and Cormier et al. (2008). Many factors, both scientific and policy driven, may contribute to the choices of response variables, response thresholds, stressor variables, and stressor criteria that may comprise a predictive model of the type discussed in this article. In addition, multiple models derived from other sets of stressors and responses may be needed to adequately address broad ecosystem management goals. Nonetheless, an important consideration in selecting numeric criteria may be the tolerable level of uncertainty in predictions of designated use attainment status using such models. This study shows that ROC analysis can be used to estimate FPE, FNE, PPE, and NPE, as a function of Xc (Figure 1C, F, I). These estimates quantify, in terms of several types of prediction error rates, the potential consequences of selecting various candidate criteria.

Applying ROC analysis, CPA, and linear regression approaches to 3 different data sets illustrates differences among the types of uncertainty characterizations possible using each technique and shows how those characterizations may change as a function of specific data set attributes. Regression models can be used to describe the nature and extent of the relationship between stressor and response variables, yielding information on the form of the relationship (linear, curvilinear, nonlinear) and goodness of fit (Draper and Smith 1998). Where a valid regression model can be developed, prediction limits can provide estimates of the proportion of nonattaining responses at specific stressor levels, i.e., at points where prediction limits intersect a response threshold (USEPA 2010b). However, no guidelines currently exist for selecting appropriate prediction limits (e.g., 50%, 80%, or some other percentage) to inform water quality management decisions. Furthermore, as illustrated in this study, prediction limits do not easily yield comprehensive information on the performance of a criterion-based water quality prediction model in terms of the probability of misclassifying the attainment status of surface waters. As shown here, CPA can partially meet this latter objective; however, compared to ROC analysis, CPA provides a limited assessment of error and nonerror rates for a given prediction model as described further below.

As illustrated by the example in Table 2, when 2 variables are uncorrelated and the response variable threshold yields a prevalence of a nonattaining condition equal to 50%, error and nonerror rates from ROC analysis are expected to be 0.5 (equal to the prevalence) because the indicator/stressor variable provides no information on the level of the response. As shown by the strongly correlated linear relationship between the variables in Data Set 1 where the prevalence is also 50%, much higher nonerror rates and much lower error rates may be achieved through careful selection of Xc. This example also shows that choosing Xc based on the intersection of regression prediction limits with response thresholds can yield relatively high rates of certain decision error types depending on the strength of the stressor–response relationship. The uncorrelated reference example and the example provided by Data Set 1 show that by combining information from regression and ROC analyses, the nature and extent of the stressor–response relationship, as well as the accuracy of response condition predictions based on selected stressor criteria, can be described.
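The uncorrelated base case described above can be reproduced by simulation. In the sketch below (illustrative only, with an arbitrary random seed), 2 independent standard normal variables are generated and both thresholds are set at the median, giving a prevalence of 50%; because the stressor carries no information about the response, all 4 error rates approach the prevalence, as in Table 2:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
x = rng.normal(size=n)   # stressor, independent of the response
y = rng.normal(size=n)   # response; threshold at its median -> 50% prevalence
nonattain = y >= 0.0     # true nonattaining condition
pred = x > 0.0           # arbitrary criterion at the stressor median
fpe = np.sum(pred & ~nonattain) / np.sum(~nonattain)
fne = np.sum(~pred & nonattain) / np.sum(nonattain)
ppe = np.sum(pred & ~nonattain) / np.sum(pred)
npe = np.sum(~pred & nonattain) / np.sum(~pred)
# With no stressor-response association, fpe, fne, ppe, and npe all
# converge on the prevalence (0.5) as the sample size grows.
```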

The analysis of Data Set 3, the EPT taxa richness and percent sediment fines data of Hollister et al. (2008), provides a comparison of CPA and ROC analysis using field data in which the stressor and response variables are negatively correlated to a moderate extent. This example is also useful because the data are not as easily modeled using linear regression as Data Set 1, highlighting the value of the nonparametric nature of both CPA and ROC analysis. As shown by the definitions provided in Table 2, the CPA plot is equivalent to a plot of PPV as a function of Xc. Thus, CPA also can be used to obtain 1 − PPV = PPE. CPA does not, however, provide an estimate of either FPE or FNE rates. In contrast, ROC analysis readily shows that at Xc = 15% fines, 22% of all attaining waters would be incorrectly classified as nonattaining (FPE = 0.22), and 26% of all nonattaining waters would be incorrectly classified as attaining (FNE = 0.26). In addition, 30% of waters having greater than 15% fines (therefore indicating nonattainment) would actually be attaining (PPE = 0.3), and 16% of waters with less than 15% fines (therefore indicating attainment) would actually be nonattaining (NPE = 0.16). ROC analysis also provides information on the accuracy of the prediction model for a selected Xc: for Data Set 3, the highest overall accuracy is estimated to be just below 80% at Xc = 12% fines, and is slightly less (∼75%) at Xc = 15% fines. These error and nonerror rates may or may not be acceptable to water quality managers and stakeholders; the salient point, however, is that ROC analysis provides more complete information than CPA on the type and magnitude of errors associated with predictions of response variable condition based on exceedances of a stressor variable criterion.
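The equivalence noted above, that a CPA plot is a plot of PPV versus Xc, can be checked directly: the conditional probability of a nonattaining response given criterion exceedance is, by definition, the positive predictive value, so 1 minus each point on the curve is the PPE. A brief illustrative sketch (function name assumed, not the CProb tool of Hollister et al. 2008):

```python
import numpy as np

def cpa_curve(x, y, y_thr, candidates, response_low_is_bad=True):
    """Conditional probability of a nonattaining response given x > xc,
    evaluated at each candidate criterion. Each point equals the positive
    predictive value at that xc, so 1 minus the point is the PPE."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    nonattain = (y < y_thr) if response_low_is_bad else (y >= y_thr)
    probs = []
    for xc in candidates:
        exceed = x > xc
        probs.append(np.sum(exceed & nonattain) / np.sum(exceed))
    return np.array(probs)
```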

ROC analysis of Data Set 3 also shows that the lowest balanced FPE and FNE rate achievable at any single Xc (i.e., the balance point) is 0.24, which occurs at Xc = 13.6% fines. This value is similar to, though slightly less than, the Xc = 15% fines determined by Paul and McDonald (2005). This suggests that where it is a management goal to balance FPE and FNE rates using a single criterion Xc, ROC analysis can be used to identify the appropriate value. Comparing error rate results from Data Sets 1 and 3 suggests that it may not be possible to balance all 4 error rate types with a single criterion except in the most linear stressor–response relationships. Furthermore, the magnitude of the balance point is likely to reflect the amount of variation in the stressor–response relationship: Data Set 1 has both the lowest balance point, at 0.14, and the highest degree of correlation among the 3 data sets.

Using the data shown in Figure 1I, the errors associated with other Xc values may also be easily determined, and Xc values could be selected based on preferred FPE and FNE rates. For example, to reduce FNE to 10%, the corresponding Xc is 6% fines. This lower value reflects a preference for minimizing false negative errors and could be considered more protective of high EPT taxa richness scores. However, at a percent fines value of 6%, the FPE rate is estimated to be nearly 50%. That is, nearly half of “healthy” cases (EPT taxa richness of 9 or more) can be expected to have percent fines greater than 6%. This proportion is roughly the same as the FPE rate for the uncorrelated reference example given in Table 2. Conversely, choosing an FPE rate of 0.1 yields Xc = 26% fines, but here the FNE rate increases to more than 60%. These results illustrate that reliance on a single stressor variable criterion may not provide adequate control of errors in predicting the condition of a response variable.
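The trade-off described above can be explored by inverting the error rate curves: fix a tolerable FNE and read off the implied Xc and the FPE paid for it. A hypothetical helper, under the same assumption of candidate criteria at midpoints between observed stressor values:

```python
import numpy as np

def xc_for_target_fne(x, y, y_thr, target_fne, response_low_is_bad=True):
    """Largest candidate criterion Xc whose false negative error rate stays
    at or below a target, returned with the FNE and FPE realized there.
    Returns None if no candidate meets the target."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    nonattain = (y < y_thr) if response_low_is_bad else (y >= y_thr)
    xs = np.sort(np.unique(x))
    best = None
    for xc in (xs[:-1] + xs[1:]) / 2:   # candidates in ascending order
        pred = x > xc
        fne = np.sum(~pred & nonattain) / np.sum(nonattain)
        fpe = np.sum(pred & ~nonattain) / np.sum(~nonattain)
        if fne <= target_fne:
            best = (xc, fne, fpe)       # keep the largest qualifying Xc
    return best
```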

ROC analysis of the Florida colored lakes data (Data Set 2) illustrates the error implications of selecting a single Xc criterion based on 50% prediction limits in a current regulatory application. In the proposed and final criteria developed by USEPA, the baseline tp criterion of 0.05 mg/L, established using the Ythr/UPL50 of the regression model, is the applicable criterion for an individual lake if sufficient chl a data are not available. In this case, exceedance of 0.05 mg/L tp may be used to list a lake as not meeting water quality standards based on nonattainment of the chl a threshold. ROC results show that although this tp criterion limits FNE and NPE rates to less than 0.05, PPE is greater than 30% and FPE is nearly 40%. Thus, more than 30% of colored lakes exceeding the tp criterion would actually attain the chl a criterion (PPE), and nearly 40% of colored lakes that actually attain the chl a criterion would be declared nonattaining by the baseline tp criterion (FPE).

High misclassification rates can have negative consequences for water quality management. Although minimizing FNE and NPE is clearly an important environmental management objective, minimizing FPE is also relevant to maximize effective use of limited environmental management resources. If sufficient chl a data are available, as defined in the regulation, a tp criterion that is higher than the baseline criterion is allowed up to the modified tp criterion. Thus, the regulation appears structured in a way that can make use of direct measurements of the response, in this case chl a, to limit potentially high FPE and PPE rates when a single tp concentration criterion is established that minimizes FNE and NPE rates. ROC analysis provides a means to estimate the associated reduction in prediction errors.

These examples show that although many factors may affect the selection of an indicator or stressor criterion, ROC analysis can be used with stressor–response data to provide important information about potential decision error rates to decision makers and stakeholders. This information may be useful in all 3 phases of criterion assessment described by Suter and Cormier (2008) and Cormier et al. (2008), i.e., planning, analysis, and synthesis. In the planning phase, ROC-derived error rate results could be used to evaluate available data and preliminary response thresholds against predefined tolerable limits on decision errors. Results could also guide additional studies designed to reduce uncertainty in the stressor–response relationship. In the analysis phase, in which goals include modeling the stressor–response relationship and identifying an appropriate response threshold or “benchmark effect” (Cormier et al. 2008), ROC error rates could be used to quantify uncertainties associated with alternative responses and thresholds. In the synthesis phase, in which evaluation of candidate stressor criteria is the primary goal, ROC error rates characterize the uncertainties associated with specific criteria selections given the model and response thresholds established in the analysis phase. Thus, decision error rate estimates obtained using ROC analysis could support a criterion assessment process in which all 3 phases are addressed in an iterative manner, from preliminary to final stages of criterion selection. Other aspects of ROC analysis not emphasized in this article, such as characterizations of nonerror rates and analysis of the ROC curve, could also contribute useful information to criterion assessment.

Paul and McDonald (2005) list 5 necessary conditions for the appropriate application of CPA, and these also may apply to ROC analysis: 1) a probability-based sampling design, 2) a metric that quantifies the pollutant, 3) a response metric that responds to the pollutant at present (observed) levels, 4) known characteristics of an impacted response, and 5) a pollution parameter that can exert a “strong effect” on the response metric. When these conditions exist as part of a predictive model relating a causal variable to attainment status of a response variable, ROC analysis can provide a useful characterization of the reliability of such predictions.

CONCLUSIONS

Analysis of receiver operating characteristics can provide estimates of false positive, false negative, positive predictive, and negative predictive errors for a given combination of stressor–response relationship, response threshold, and a range of possible stressor criteria. These values can inform the criterion assessment process for risk-based WQC in terms that may be more easily interpreted by water quality managers and stakeholders than traditional output from other statistical procedures such as regression. Furthermore, ROC analysis can provide more comprehensive information on both error and nonerror rates than conditional probability analysis. Where causal links between stressor and response variables are established, a combination of data analysis methods that includes ROC analysis is likely to provide the most complete assessment of the performance of a criterion-based water quality prediction model based on stressor–response relationships.

Acknowledgements

The author thanks Xavier Robin, Camille Flinders, Craig Loehle, and 2 anonymous reviewers for their helpful comments and Anna Aviza for help in manuscript preparation.

REFERENCES

  • Barbour MT, Norton SB, Preston HR, Thornton K, editors. 2004. Ecological assessment of aquatic resources: Linking science to decision-making. Pensacola (FL): SETAC Press. 272 p.
  • Benyi SJ, Hollister JW, Kiddon JA, Walker HA. 2009. A process for comparing and interpreting differences in two benthic indices in New York Harbor. Mar Pollut Bull 59: 65–71.
  • Borsuk ME, Stow CA, Reckhow KH. 2002. Predicting the frequency of water quality standard violations: A probabilistic approach for TMDL development. Environ Sci Technol 36: 2109–2115.
  • Cormier SM, Paul JF, Spehar RL, Shaw-Allen P, Berry WJ, Suter GW. 2008. Using field data and weight of evidence to develop water quality criteria. Integr Environ Assess Manag 4: 490–504.
  • DiToro DM, Berry WJ, Burgess RM, Mount DR, O'Connor TP, Swartz RC. 2005. Predictive ability of sediment quality guidelines derived using equilibrium partitioning. In: Wenning RJ, Batley GE, Ingersoll CG, Moore DW, editors. Use of sediment quality guidelines and related tools for the assessment of contaminated sediment. Pensacola (FL): SETAC Press. p 557–588.
  • Draper NR, Smith H. 1998. Applied regression analysis. 3rd ed. New York (NY): John Wiley & Sons. 736 p.
  • Efstratiou MA, Mavridou A, Richardson C. 2009. Prediction of Salmonella in seawater by total and fecal coliforms and Enterococci. Mar Pollut Bull 58: 201–205.
  • Fawcett T. 2006. An introduction to ROC analysis. Pattern Recogn Lett 27: 861–874.
  • Gronewold AD, Borsuk ME, Wolpert RL, Reckhow KH. 2008. An assessment of fecal indicator bacteria-based water quality standards. Environ Sci Technol 42: 4676–4682.
  • Hale SS, Heltshe JF. 2008. Signals from the benthos: Development and evaluation of a benthic index for the nearshore Gulf of Maine. Ecol Indic 8: 338–350.
  • Hale SS, Paul JF, Heltshe JF. 2004. Watershed landscape indicators of estuary and benthic condition. Estuaries Coasts 27: 283–295.
  • Hollister JW, Walker HA, Paul JF. 2008. CProb: A computational tool for conducting conditional probability analysis. J Environ Qual 37: 2392–2396.
  • Linnet K. 1988. A review on the methodology for assessing diagnostic tests. Clin Chem 34: 1379–1386.
  • Llanso RJ, Dauer DM, Volstad JH. 2009. Assessing ecological integrity for impaired waters decisions in Chesapeake Bay, USA. Mar Pollut Bull 59: 48–53.
  • Long JP, Donovan TM, MacKay P, Zielinski WJ, Buzas JS. 2011. Predicting carnivore occurrence with noninvasive surveys and occupancy modeling. Landscape Ecol 26: 327–340.
  • Mason SJ, Graham NE. 1999. Conditional probabilities, relative operating characteristics, and relative operating levels. Weather Forecasting 14: 713–725.
  • McBean EA, Rovers FA. 1998. Statistical procedures for analysis of environmental monitoring data and risk assessment. Englewood Cliffs (NJ): Prentice-Hall Publishing. 336 p.
  • McLaughlin DM. 2012. Estimating the designated use attainment decision error rates of US Environmental Protection Agency's proposed numeric total phosphorus criteria for Florida, USA, colored lakes. Integr Environ Assess Manag 8: 167–174.
  • McLellan HJ, Hayes SG, Scholz AT. 2008. Effects of reservoir operations on hatchery coastal rainbow trout in Lake Roosevelt, Washington. N Am J Fish Manage 28: 1201–1213.
  • Morrison AM, Coughlin K, Shine JP, Coull BA, Rex AC. 2003. Receiver operating characteristic curve analysis of beach water quality indicator variables. Appl Environ Microbiol 69: 6405–6411.
  • Murtaugh PA. 1996. The statistical evaluation of ecological indicators. Ecol Appl 6: 132–139.
  • Murtaugh PA, Pooler PS. 2006. Evaluating ecological indicators: Lakes in the northeastern United States. Environ Monit Assess 119: 83–96.
  • Nevers MB, Whitman RL. 2011. Efficacy of monitoring and empirical predictive modeling at improving public health protection at Chicago beaches. Water Res 45: 1659–1668.
  • Paul JF, McDonald ME. 2005. Development of empirical, geographically specific water quality criteria: A conditional probability analysis approach. J Am Water Resour Assoc 41: 1211–1223.
  • Paul JF, Munns WR. 2011. Probability surveys, conditional probability, and ecological risk assessment. Environ Toxicol Chem 30: 1488–1495.
  • Reckhow KH, Arhonditsis GB, Kenney MA, Hauser L, Tribo J, Wu C, Elcock KJ, Steinberg LJ, Stow CA, McBride SJ. 2005. A predictive approach to nutrient criteria. Environ Sci Technol 39: 2913–2919.
  • Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, Muller M. 2011. pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12: 77.
  • Shine JP, Trapp CJ, Coull BA. 2003. Use of receiver operating characteristic curves to evaluate sediment quality guidelines for metals. Environ Toxicol Chem 22: 1642–1648.
  • Smith EP, Ye K, Hughes C, Shabman L. 2001. Statistical assessment of violations of water quality standards under section 303(d) of the Clean Water Act. Environ Sci Technol 35: 606–612.
  • Smith EP, Zahran A, Mahmoud M, Ye K. 2003. Evaluation of water quality using acceptance sampling by variables. Environmetrics 14: 373–386.
  • Stevenson RJ, Hill BH, Herlihy AT, Yuan LL, Norton SB. 2008. Algae-P relationships, thresholds, and frequency distributions guide nutrient criterion development. J N Am Benthol Soc 27: 783–799.
  • Suter GW, Cormier SM. 2008. What is meant by risk-based environmental quality criteria? Integr Environ Assess Manag 4: 486–489.
  • Swets JA, Dawes RM, Monahan J. 2000. Better decisions through science. Sci Am 283: 82–87.
  • [USEPA] US Environmental Protection Agency. 2002. Guidance for quality assurance project plans for modeling (EPA QA/G-5M). December. Washington (DC): USEPA. EPA 240/R-02/007.
  • [USEPA] US Environmental Protection Agency. 2006a. Framework for developing suspended and bedded sediments (SABS) water quality criteria. Washington (DC): USEPA. EPA-822-R-06-001.
  • [USEPA] US Environmental Protection Agency. 2006b. Guidance on systematic planning using the data quality objectives process (EPA QA/G-4). Washington (DC): USEPA. EPA/240/B-06/001.
  • [USEPA] US Environmental Protection Agency. 2010a. Using stressor-response relationships to derive numeric nutrient criteria. Washington (DC): USEPA. EPA-820-S-10-001.
  • [USEPA] US Environmental Protection Agency. 2010b. Technical support document for U.S. EPA's final rule for numeric criteria for nitrogen/phosphorus pollution in Florida's inland surface fresh waters. [cited 2012 February 1]. Available from: http://water.epa.gov/lawsregs/rulesregs/upload/floridatsd1.pdf
  • [USEPA] US Environmental Protection Agency. 2011. A field-based aquatic life benchmark for conductivity in central Appalachian streams. Cincinnati (OH): USEPA. EPA/600/R-10/023F.
  • [USEPA SAB] US Environmental Protection Agency Science Advisory Board. 2010. Letter to the Honorable Lisa P. Jackson re: SAB Review of Empirical Approaches for Nutrient Criteria Derivation, April 27, 2010. [cited 2012 February 1]. Available from: http://yosemite.epa.gov/sab/SABPRODUCT.NSF/81e39f4c09954fcb85256ead006be86e/E09317EC14CB3F2B85257713004BED5F/$File/EPA-SAB-10-006-unsigned.pdf