Keywords:

  • Benthos;
  • Correlation;
  • Kappa;
  • Sediment quality guidelines;
  • Toxicity

Abstract

Toxicity-based sediment quality guidelines (SQGs) are often used to assess the potential of sediment contamination to adversely affect benthic macrofauna, yet the correspondence of these guidelines to benthic community condition is poorly documented. This study compares the performance of 5 toxicity-based SQG approaches with that of a new benthos-based SQG approach relative to changes in benthic community condition. Four of the toxicity-based SQG approaches (effects range median, logistic regression modeling [LRM], sediment quality guideline quotient 1 [SQGQ1], and consensus) were derived in previous national studies in the United States, and one was developed as a regional variation of LRM calibrated to California data. The new benthos-based SQG approach, the chemical score index, was derived from Southern California benthic community data. The chemical-specific guidelines for each approach were applied to matched chemical concentration, amphipod mortality, and benthic macrofauna abundance data for Southern California. For each SQG approach, the results were then combined into a summary metric describing the overall contamination magnitude (e.g., mean quotient) and assessed against a set of thresholds in order to classify stations into 4 categories of expected biological effect. Results for each SQG approach were significantly correlated with changes in sediment toxicity and benthic community condition. Cumulative frequency plots and effect category thresholds for toxicity and benthic community condition were similar, indicating that both types of effect measures had similar sensitivity and specificity of response to contamination level. In discriminating among multiple levels of benthic community condition, the toxicity-based SQG indices showed moderate capability, similar to that for multiple levels of toxicity. The National LRM, California LRM, and the chemical score index had the highest overall agreement with benthic categories. 
However, only the benthos-based chemical score index was consistently among the highest performing SQG indices for all measures of association (correlation, percent agreement, and weighted kappa) for both toxicity and benthos. Integr Environ Assess Manag 2012; 8: 610–624. © 2011 SETAC


Editor's Note

This article represents 1 of 6 papers describing development and evaluation of a sediment quality assessment framework to support implementation of California's new sediment quality objectives for bays and estuaries, which became effective in 2009. Over thirty scientists collaborated on this effort by the California State Water Resources Control Board, which resulted in the establishment of one of the first statewide programs in the US to fully incorporate the sediment quality triad for regulatory applications.

INTRODUCTION

Sediment quality guidelines (SQGs) are tools used by resource managers and scientists to relate sediment contaminant concentrations to predicted impacts on sediment-dwelling organisms (e.g., benthic macrofauna) in marine, estuarine, and freshwater systems. Many variations exist, but the most widely used SQGs consist of empirically based chemical concentration thresholds or ranges derived using statistical analysis of matched sediment chemistry and biological effects data (e.g., magnitude or frequency of adverse impacts). Although impacts on the benthic macrofauna community are often the focus of concern in sediment quality assessments, few SQGs based on benthic community effects are available; those that are available are typically based on data sets with limited geographic scope (e.g., Puget Sound in Washington, USA, and the Hong Kong coastal region, China) and may not be applicable to other regions (Barrick et al. 1988; Leung et al. 2005; Kwok et al. 2008). Consequently, SQGs are often derived using laboratory sediment toxicity test data as the primary measure of biological effects (Wenning et al. 2005).

A wide variety of statistical approaches has been used to develop toxicity-based empirical SQGs. These approaches fall into 2 broad categories: co-occurrence and consensus (Wenning et al. 2005). Co-occurrence SQGs are developed using large databases of paired chemistry and toxicity information; the data are usually ordered by concentration and presence (or magnitude) of effect, and then chemical-specific SQGs are determined based on the characteristics of the data distribution. Commonly used co-occurrence approaches include the effects range median (ERM), which represents the median concentration of the chemical associated with multiple types of effects (Long et al. 1995), and logistic regression models (LRMs), which relate concentration to the probability of toxicity to amphipods (Field et al. 1999; Field et al. 2002). Consensus guidelines are “2nd generation” SQGs that aggregate values from several empirical approaches to produce guidelines that reflect the central tendency and are thus considered to be more robust for use in sediment quality assessment.

The association of empirical SQGs with toxicity has been documented in many studies (summarized in Wenning et al. 2005). Association is often demonstrated by a high incidence of toxicity when multiple guidelines are exceeded (Long et al. 1998). A stronger association is obtained when an integrative SQG index that attempts to account for mixtures of chemicals, such as the mean of SQG quotients for multiple chemicals in a sample, is used to describe the magnitude of contamination (Long et al. 2000; Long et al. 2006). Use of such integrative indices is generally preferred over individual guidelines for overall assessment of sediment contamination or comparisons among studies (Wenning et al. 2005).

The relevance of toxicity-based SQGs for assessing the potential for adverse effects to aquatic communities in situ is uncertain for most habitats. Although some studies have reported that contamination indices derived from toxicity-based SQGs are well correlated with benthic community disturbance (Hyland et al. 2003; Wenning et al. 2005), the characteristics of this relationship have been documented for only a few SQG approaches or locations, and dissimilar methods of analysis prevent generalization to other habitats. Other studies indicate that invertebrate community structure and function may be impacted at contamination levels lower than those indicated by toxicity tests for both freshwater (Liess and Von der Ohe 2005; Schäfer et al. 2007) and marine (Hyland et al. 2003; Kwok et al. 2008) systems. A wide discrepancy in contaminant sensitivity between laboratory toxicity tests and invertebrate communities suggests that application of toxicity-based SQGs may not be protective of key ecosystem resources.

Three strategies are available to increase confidence in the SQG-based assessment of sediment contamination effects, with respect to community condition. First, the accuracy of existing toxicity-based SQGs might be improved through site-specific calibration in order to account for differences in contaminant mixtures or geochemistry. Second, calibrating SQG effects thresholds based on changes to communities, rather than to changes in toxicity, might improve the ability of toxicity-based SQGs to predict impacts on communities. Finally, the development of new SQGs based on adverse community effects can provide more relevant and accurate methods for assessing the potential impacts of sediment contamination. The goal of this study was to evaluate these 3 strategies using Southern California sediment contamination data. We describe a new sediment quality guideline approach based on the response of a Southern California benthic community assemblage to contamination. The new approach, along with toxicity-based SQG approaches, was evaluated based on its ability to describe the potential for sediment contamination to affect benthic communities.

METHODS

Study design

This study assessed the relationship of 6 empirical SQG approaches (1 benthos-based and 5 toxicity-based) with biological response for Southern California marine bay sediments. The toxicity-based approaches were selected from a group of established methods that met several screening criteria (described below). The benthos-based SQG approach was developed in this study, because no established approaches were identified that met the selection criteria. The evaluation included 4 major elements (Figure 1). The 1st element consisted of creating standardized development and validation data sets of Southern California sediment chemistry, toxicity, and benthic community measurements. The development data set was used in the 2nd analysis element to develop a new sediment quality index, the chemical score index (CSI), based on the association of changes in benthic macrofauna community condition with chemistry. The 3rd analysis element used a statistical optimization procedure to develop a consistent set of classification thresholds for indices based on each of the 6 SQG approaches that were evaluated. These classification thresholds were used to associate the SQG index values with expected toxicity or benthic community condition responses so that index performance could be compared among approaches. The final analysis element applied each of the SQG indices (and classification thresholds) to an independent validation data set in order to evaluate their association with toxicity and benthos responses. Three independent approaches were used to describe the associations: correlation with magnitude of response, cumulative frequency distributions of affected and unaffected samples, and agreement between SQG index and response categories.

Figure 1. Schematic of data analyses for index development and evaluation. Numbers correspond to the sequence of analyses described in the Methods.

Data

Data consisting of 441 samples of matched toxicity, chemistry, and benthic macrofauna abundance data were compiled across 6 regional monitoring and research surveys in Southern California marine embayments. The data were screened to select high-quality and comparable information. All stations were located in enclosed bays or harbors at subtidal depths, and only data from surficial sediment (top 30 cm or less) were selected. The database included stations from Santa Barbara Harbor, in the north, to San Diego Bay, in the south. More information on the studies used to populate this database can be found at http://www.sccwrp.org/view.php?id=519.

Toxicity data were obtained by solid phase 10-d amphipod survival tests using Rhepoxynius abronius or Eohaustorius estuarius and standardized methods (USEPA 1994). Amphipod survival was normalized to the control response for each test batch by expressing the results as a percentage of the control survival. Each toxicity result was assigned to 1 of 4 categories of response: Nontoxic, Low Toxicity, Moderate Toxicity, and High Toxicity. The response ranges defining the toxicity categories were specific to each test species and were based on analyses of minimum significant difference and survival percentage (Bay et al. 2007). The category definitions for E. estuarius were: Nontoxic (≥90% survival), Low Toxicity (82–89% survival), Moderate Toxicity (59–81% survival), and High Toxicity (<59% survival). The corresponding categories for R. abronius were: ≥90% survival, 83–89% survival, 70–82% survival, and <70% survival. The magnitude of toxicity varied widely in the data set; however, most samples were classified as Nontoxic (Figure 2).
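The control normalization and species-specific category ranges above can be expressed as a small lookup. The Python sketch below assumes that each category is defined by its lower survival bound, so a fractional value that falls between the published integer ranges (e.g., 82.5% for R. abronius) goes to the lower category; the function and dictionary names are illustrative, and only the survival ranges listed above are applied.

```python
# Lower survival bounds (% of control) per category, taken from the ranges in the text.
# Treating each range as "at or above its lower bound" is an assumption of this sketch.
CATEGORY_BOUNDS = {
    "E. estuarius": [(90, "Nontoxic"), (82, "Low Toxicity"), (59, "Moderate Toxicity")],
    "R. abronius": [(90, "Nontoxic"), (83, "Low Toxicity"), (70, "Moderate Toxicity")],
}


def control_normalized(survival, control_survival):
    """Express sample survival as a percentage of control survival for the same batch."""
    return 100.0 * survival / control_survival


def toxicity_category(normalized_survival, species):
    """Assign 1 of the 4 toxicity categories from control-normalized survival."""
    for lower_bound, label in CATEGORY_BOUNDS[species]:
        if normalized_survival >= lower_bound:
            return label
    return "High Toxicity"


# Hypothetical test: 66% raw survival with 80% control survival.
pct = control_normalized(66, 80)
print(pct, toxicity_category(pct, "E. estuarius"), toxicity_category(pct, "R. abronius"))
```

The same normalized value can fall in different categories for the 2 species because the category ranges are species specific.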

Figure 2. Distribution of sediment toxicity and benthic community disturbance categories in the data set. N = 441.

Benthic community condition was determined based on species abundances collected in 0.1-m² Van Veen grabs and sieved through 1-mm screens. Taxonomic inconsistencies among programs were eliminated by cross-correlating species lists, identifying differences in nomenclature, and resolving discrepancies by consulting the taxonomists from each program. The benthic community condition of each sample was classified into 1 of 4 categories of biological response: Reference (community that would occur at a reference site for that habitat), Low Disturbance (community shows some indication of stress, but is within measurement variability of reference condition), Moderate Disturbance (clear evidence of physical, chemical, natural, or anthropogenic stress), and High Disturbance (high magnitude of stress evident). The condition categories were determined by the median of 4 benthic indices calibrated to Southern California embayments (Ranasinghe et al. 2009): Benthic Response Index (BRI; Smith et al. 2001; Smith et al. 2003), Index of Benthic Biotic Integrity (Thompson and Lowe 2004), Relative Benthic Index (Hunt et al. 2001), and River Invertebrate Prediction and Classification System (Wright et al. 1993; Van Sickle et al. 2006). All categories of benthic condition were represented in the database, but most samples were classified as Reference or Low Disturbance; only 4% had High Disturbance (Figure 2).
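Combining the 4 benthic index categories into a single condition category can be sketched as follows. The text states only that the median is used; with 4 indices the median can fall halfway between 2 categories, and this sketch assumes such ties are rounded up to the more disturbed category (an assumption to verify against Ranasinghe et al. 2009). The integer codes 1 to 4 are illustrative.

```python
import math
from statistics import median

CONDITION = ["Reference", "Low Disturbance", "Moderate Disturbance", "High Disturbance"]


def benthic_condition(index_categories):
    """Combine the categories (coded 1-4) from the 4 benthic indices via the median.

    Assumption: a median between 2 categories (e.g., 2.5) is rounded up to the
    more disturbed category.
    """
    return CONDITION[math.ceil(median(index_categories)) - 1]


# Hypothetical index results for one sample: BRI, IBI, RBI, RIVPACS.
print(benthic_condition([1, 2, 3, 4]))
```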

The chemistry data selected for analysis had to meet several screening criteria that included a review of the data quality assessment by the study authors, use of comparable extraction or digestion methods, and measurement of a minimum suite of contaminants that included multiple metals and polycyclic aromatic hydrocarbons (PAHs). Standardized sums of PAHs, DDT and degradation products (DDTs), polychlorinated biphenyls (PCBs), and chlordanes were calculated using a consistent methodology for all samples (Bay et al. 2008). When calculating totals, chemical values that were missing or reported as below reporting limits were estimated using multiple regression imputation, via SAS procedure PROC MI (SAS Institute 2004). Imputation produces less bias than conventional approaches for estimating nondetect data, such as substituting 0 or one-half of the reporting limit (Helsel 2005).

The resulting matched toxicity–benthic data set spanned a concentration range of 1 to 2 orders of magnitude for each constituent (Table 1). Approximately two-thirds of the data (293 samples) were used for benthos-based SQG development and threshold calibration to benthic community condition. A validation data set, independent of the benthic development data, containing one-third of the data (148 samples) was used for performance evaluation of the SQG approaches, both with respect to benthos and toxicity. The samples were split into the calibration and validation data sets using the SAS procedure PROC SURVEYSELECT (SAS Institute 2004) with options for systematic sampling (with random start) throughout the range of benthic response values.
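A simplified Python analogue of systematic sampling with a random start is shown below. It is not the SAS PROC SURVEYSELECT algorithm; it assumes the samples are ordered by a benthic response value and that every (N/n)th sample, beginning at a random offset, is assigned to the validation set. The function name, the `key` argument, and the toy response values are hypothetical.

```python
import random


def systematic_split(samples, key, n_validation, seed=None):
    """Split samples into development and validation sets by systematic sampling
    with a random start along the ordering given by `key`."""
    ordered = sorted(samples, key=key)
    interval = len(ordered) / n_validation            # sampling interval N/n (>= 1)
    start = random.Random(seed).random() * interval   # random start in [0, interval)
    picked = {int(start + i * interval) for i in range(n_validation)}
    validation = [s for i, s in enumerate(ordered) if i in picked]
    development = [s for i, s in enumerate(ordered) if i not in picked]
    return development, validation


# Toy example: 441 samples keyed by a made-up benthic response value.
responses = [float(x) for x in range(441)]
development, validation = systematic_split(responses, key=lambda r: r,
                                           n_validation=148, seed=42)
print(len(development), len(validation))
```

Because the validation samples are spread evenly along the sorted response values, both subsets cover the full range of benthic condition.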

Table 1. Distribution of chemical contaminants in the Southern California data set
                            Percentile
Chemical           Units    10th     50th     90th       N
Copper             mg/kg    14.3     68.8     211.0      441
Lead               mg/kg    10.4     34.4     101.0      441
Mercury            mg/kg    <0.1     0.2      0.8        441
Zinc               mg/kg    61.7     164.0    315.0      441
DDD, total         µg/kg    0.8      3.0      10.8       408
DDE, total         µg/kg    0.8      8.6      52.9       408
DDT, total         µg/kg    0.8      2.2      8.7        406
alpha-Chlordane    µg/kg    0.3      1.1      4.2        404
gamma-Chlordane    µg/kg    0.4      1.4      6.2        380
HMW PAH, total     µg/kg    77.3     339.0    3723.0     441
LMW PAH, total     µg/kg    19.1     77.1     525.0      441
PCB, total         µg/kg    6.1      21.1     116.0      441

Toxicity-based sediment quality indices

Five toxicity-based SQG approaches were evaluated in this study. Four criteria were used to select these approaches: application on a national scale within the United States, diverse conceptual basis relative to other approaches, prior development of a method to represent mixtures of multiple contaminants as a single contamination index, and previous use to characterize California marine sediments. SQGs were considered to have national application if they were based on published methods and had been used for sediment quality assessment in multiple states. Multiple SQG approaches met the criterion for national application. Four of these national SQG approaches also met the remaining selection criteria and were chosen for evaluation. A 5th toxicity-based SQG approach, which was a regional calibration of 1 of the 4 selected national approaches, was also evaluated; this approach was included in the study in order to investigate the relative increase in performance resulting from adjustment for regional variations in geochemistry and contamination patterns.

Effects range median

The ERM guideline values are based on the analysis of marine chemistry and biological effects data from throughout North America (Long et al. 1995). These SQGs use results from a wide range of biological effects measures, including acute and sublethal sediment toxicity tests of field sediments, spiked sediment experiments, benthic community assessments, fish pathology, and mechanistic models of sediment toxicity. In general, the chemical concentrations associated with adverse effects for each study were compiled and sorted in ascending order, with the ERM representing the median concentration of the data distribution. The index used to represent the ERM approach in the present study was the mean ERM quotient (mERMQ) developed by Long et al. (2000), which was calculated by dividing each chemical concentration by its respective ERM and averaging the individual quotients. A subset of 27 ERM values was used to calculate the mERMQ (Table 2), which was the same as that used in previous mERMQ performance studies (Long et al. 2000).
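The mERMQ is an average of concentration-to-guideline quotients, as the following Python sketch shows for a hypothetical sample. It uses only 3 of the 27 ERMs (the metal values from Table 2), the sample concentrations are the Table 1 medians rather than a real station, and skipping chemicals without a measured value is an assumption of the sketch. The same function applies to any other set of quotient-based guideline values.

```python
def mean_quotient(concentrations, guidelines):
    """Mean SQG quotient: average of concentration / guideline over the chemicals
    that have both a measured concentration and a guideline value."""
    quotients = [concentrations[chem] / guidelines[chem]
                 for chem in guidelines if chem in concentrations]
    if not quotients:
        raise ValueError("no chemicals in common between sample and guidelines")
    return sum(quotients) / len(quotients)


# ERM values (mg/kg) for 3 metals; the study's mERMQ used 27 ERMs.
erm = {"Copper": 270.0, "Lead": 218.0, "Zinc": 410.0}
# Hypothetical sample set to the Table 1 median concentrations (mg/kg).
sample = {"Copper": 68.8, "Lead": 34.4, "Zinc": 164.0}
mermq = mean_quotient(sample, erm)  # about 0.271
```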

Table 2. Chemical concentrations for individual sediment quality guideline (SQG) approaches used for data analyses (see note a)
ChemicalUnitsERMSQGQ1ConsensusNational LRMCA LRMmCSI
  • a

    Values for the effects range median (ERM) were taken from Long et al. (1995). Concentrations used to calculate the mean sediment quality guideline quotient 1 (SQGQ1) were taken from Fairey et al. (2001). Consensus midpoint effect concentration values were taken from Swartz (1999); MacDonald et al. (2000); and Vidal and Bay (2005). T50 values (concentration at which probability of toxicity equals 50%) are shown for the National LRM approach and were calculated from model parameters in Field et al. (2002). California LRM T50 values were taken from Bay et al. (2008). Values shown for the CSI are the concentrations above which samples are associated with moderate to high levels of disturbance to benthic communities; they were developed in this study (Table 3).

  • *

    Values are given on µg/g organic carbon basis.

Arsenicmg/kg70.0 55.0   
Cadmiummg/kg9.64.25.911 
Chromiummg/kg370.0 224.9   
Coppermg/kg270.0270.0225.0 14596.50
Leadmg/kg218.0112.2222.3944660.80
Mercurymg/kg0.7 0.6 10.45
Nickelmg/kg51.6 67.6   
Silvermg/kg3.71.83.4   
Zincmg/kg410.0410.0357.1245132201.00
1-Methylnaphthaleneµg/kg   94  
1-Methylphenanthreneµg/kg   112  
2,6-Dimethylnaphthaleneµg/kg   133  
2-Methylnaphthaleneµg/kg670.0  128  
Acenaphtheneµg/kg500.0  116  
Acenaphthyleneµg/kg640.0  140  
Anthraceneµg/kg1100.0     
Benzo[a]anthraceneµg/kg1600.0     
Benzo[a]pyreneµg/kg1600.08.07.0   
Benzo[b]fluoranthene    1107  
Biphenyl    73  
Chryseneµg/kg2800.0     
Dibenz[a,h]anthraceneµg/kg260.0     
Dieldrinµg/kg8.0  35 
Fluorantheneµg/kg5100.0  1034  
Fluoreneµg/kg540.0  114  
HMW PAH, total     12 5061325.00
LMW PAH, total     4127312.00
Naphthaleneµg/kg2100.0  217  
p,p′-DDD    19  
p,p′-DDEµg/kg 6.0    
p,p′-DDT     12 
Phenanthreneµg/kg1500.0 25.4455  
Pyreneµg/kg2600.01800.0*1800.0*   
Total chlordaneµg/kg 400.00.5   
trans-Nonachlor     6 
DDD, total      3.56
DDE, total      6.01
DDTs, totalµg/kg46.1    2.79
PAH, totalµg/kg      
PCB, totalµg/kg180.0  36894524.70
alpha-Chlordaneµg/kg    61.23
gamma-Chlordaneµg/kg     1.45

Consensus

Consensus SQGs are chemical values based on the integration of multiple SQG approaches in an effort to obtain guidelines with greater validity. The integration method and types of SQGs used vary, but in general the consensus SQG represents either the arithmetic or geometric mean of at least 3 different SQGs having a similar intended application (e.g., to predict probable biological effects). The Consensus SQG values for PAHs and PCBs were midrange effect concentrations obtained from Swartz (1999) and MacDonald et al. (2000), respectively. Consensus values for DDTs, dieldrin, As, Cd, Cr, Cu, Pb, Hg, Ni, Ag, and Zn were obtained from Vidal and Bay (2005). The index used to represent the Consensus SQGs in the present study was the mean Consensus quotient (mConsensusQ), which was calculated by dividing each chemical concentration by its respective Consensus SQG (Table 2) and averaging the individual quotients.

Sediment quality guideline quotient 1

The sediment quality guideline quotient (SQGQ) approach is a composite of chemical guidelines from other approaches that were selected to provide an improved ability to predict toxicity to amphipods using California data (Fairey et al. 2001). These values are a combination of consensus values for PAHs and PCBs (Swartz 1999; MacDonald et al. 2000), ERMs, and probable effects levels (PELs; MacDonald et al. 1996). The index used to represent the SQGQ1 guidelines in the present study was the mean SQGQ1 quotient (mSQGQ1Q), which was calculated by dividing each chemical concentration by its respective SQG (Table 2) and averaging the individual quotients.

Logistic regression modeling (National LRM)

The LRM approach uses a regression model to relate chemical concentration to the probability of sediment toxicity. The models were developed using logistic regression analysis of a large database of marine amphipod survival data from field studies throughout North America (Field et al. 1999; Field et al. 2002). The logistic regression model is described by the following equation:

$$p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$

where p = probability of observing a toxic effect; β0 = intercept parameter; β1 = slope parameter; and x = log concentration of the chemical.

Chemical-specific models for 18 contaminants were used in this study; the models were selected on the basis of a combination of occurrence of the chemical in the California data set and a low rate of false positives for predicting toxicity. The model parameters used for each chemical were obtained from Field et al. (2002). The maximum probability of toxicity across all 18 LRM models was used as the index of overall contamination for the National LRM approach. As a point of comparison analogous to the chemical concentrations used for the ERM, SQGQ1, and Consensus approaches, we calculated the concentration corresponding to a 50% probability of toxicity (T50) for each chemical model (Table 2).
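The LRM index and the T50 values can be computed directly from the model parameters, as in the Python sketch below. Setting p = 0.5 gives β0 + β1x = 0, so x = −β0/β1. The sketch assumes x is the base-10 logarithm of concentration, and the parameter values are hypothetical (they are not the Field et al. 2002 values). The CA LRM index follows the same pattern with regionally calibrated parameters.

```python
import math


def lrm_probability(conc, b0, b1):
    """Probability of toxicity, p = exp(b0 + b1*x) / (1 + exp(b0 + b1*x)).

    Assumption: x = log10(concentration).
    """
    z = b0 + b1 * math.log10(conc)
    return 1.0 / (1.0 + math.exp(-z))  # algebraically equal to exp(z) / (1 + exp(z))


def t50(b0, b1):
    """Concentration at which p = 0.5, i.e., where b0 + b1*x = 0."""
    return 10.0 ** (-b0 / b1)


def max_probability(concentrations, models):
    """LRM contamination index: maximum predicted probability across chemicals."""
    return max(lrm_probability(concentrations[chem], b0, b1)
               for chem, (b0, b1) in models.items() if chem in concentrations)


# Hypothetical model parameters (intercept, slope) for 2 chemicals.
models = {"chem_A": (-2.0, 1.0), "chem_B": (-4.0, 2.0)}
sample = {"chem_A": 100.0, "chem_B": 1000.0}
index = max_probability(sample, models)  # chem_B dominates: 1 / (1 + e**-2), about 0.881
```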

California logistic regression modeling (CA LRM)

A regional version of the LRM approach calibrated using California data (CA LRM) was also evaluated, as recent analyses showed that the CA LRM had greater association with toxicity in Southern California sediments relative to other SQG approaches (Bay et al. 2008). Regional calibration consisted of developing logistic regression models for individual chemicals based on California sediment toxicity data (marine amphipod survival) using the methods described in USEPA (2005). Models for 12 chemicals were included in the CA LRM approach. The model parameters for each chemical are described in Bay et al. (2008). The maximum probability of toxicity across all chemical LRM models for each sample was used as the index of overall contamination. As with the National LRMs, T50 values were calculated for CA LRMs as a point of comparison with the other SQGs (Table 2).

Benthos-based sediment quality index

The chemical score index (CSI) is a new benthos-based SQG approach that was derived for this study using matched chemistry and Southern California benthic macrofauna data. The CSI describes the overall level of chemical exposure in terms of the potential for benthic community disturbance. First, chemical concentrations are translated into chemical exposure scores that are associated with 1 of 4 levels of increasing benthic disturbance, measured using the benthic response index (BRI), an index based on the abundance-weighted pollution tolerance score of the organisms present in the sample (Smith et al. 2001; Smith et al. 2003). This translation was done for 2 reasons: 1) the BRI has a readily available set of ordered classifications representing 4 levels of disturbance to benthic communities, and 2) data variability is greatly reduced when data are placed into discrete categories. In addition to scores, we assigned a weighting factor to each chemical to reflect the relative strength of its chemical–benthos association. Chemicals whose exposure scores had high concordance with benthic disturbance categories were given more weight than chemicals showing relatively low concordance with disturbance. Finally, these 2 types of data, exposure scores and relative weights, are combined across 12 chemical contaminants known to be associated with benthic disturbance in order to account for the joint chemical contribution to the sample. In the sections that follow, we describe 1) the calculation of the CSI, 2) an optimization routine that selects the chemical-specific guidelines defining the chemical exposure categories, and 3) the statistic that defines the weighting factor for each chemical.

Calculation of the CSI

Chemical scores are determined by comparing each chemical concentration to a set of 3 chemical-specific guidelines (G1, G2, and G3) that classify chemical concentration into 1 of 4 exposure categories that correspond to 4 levels of community disturbance defined by the BRI (Appendix C): Reference (≤G1), Low Disturbance (>G1 and ≤G2), Moderate Disturbance (>G2 and ≤G3), and High Disturbance (>G3). The chemical exposure categories are given a numerical score of 1, 2, 3, or 4 corresponding to a category of Reference, Low, Moderate or High, respectively. The chemical score for each contaminant is then multiplied by a weighting factor that measures the strength of association between the chemical score and biological effect. These products are then summed across the 12 constituents in the sample and divided by the sum of weighting factors, producing the mean weighted CSI.

$$\mathrm{CSI} = \frac{\sum_{j} w_j s_j}{\sum_{j} w_j}$$

where $s_j$ = chemical score based on the concentration of contaminant $j$, $w_j$ = weighting factor for that contaminant, and the sums run over the 12 constituents.

An example of the CSI calculation is shown in Appendix A (online Supplemental Data).
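The scoring and weighted averaging described above can also be sketched in Python. The guideline triples and weights below are hypothetical placeholders, not the values developed in this study (Table 3), and only 2 chemicals are used instead of 12.

```python
from bisect import bisect_left


def chemical_score(conc, guidelines):
    """Score 1-4 from ascending guidelines (G1, G2, G3):
    <=G1 -> 1 (Reference), <=G2 -> 2 (Low), <=G3 -> 3 (Moderate), >G3 -> 4 (High)."""
    return bisect_left(guidelines, conc) + 1


def csi(concentrations, guidelines, weights):
    """Weighted mean chemical score over chemicals with both guidelines and data."""
    num = den = 0.0
    for chem, g in guidelines.items():
        if chem in concentrations:
            num += weights[chem] * chemical_score(concentrations[chem], g)
            den += weights[chem]
    return num / den


# Hypothetical guidelines and weights, for illustration only.
guidelines = {"Copper": (10.0, 30.0, 100.0), "Lead": (25.0, 50.0, 100.0)}
weights = {"Copper": 1.0, "Lead": 0.5}
sample = {"Copper": 68.8, "Lead": 34.4}
value = csi(sample, guidelines, weights)  # (1.0*3 + 0.5*2) / 1.5, about 2.67
```

`bisect_left` implements the "≤ G" boundaries: a concentration equal to a guideline stays in the lower category.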

CSI chemical guideline optimization procedure

The CSI chemical-specific guidelines, G1, G2, and G3, were selected by applying an optimization procedure that maximized the overall agreement between chemical scores and the 4 benthic community disturbance levels of the BRI. Overall agreement was measured by the weighted kappa statistic (Cohen 1960, 1968), which gives the proportion of agreement beyond that expected by chance alone between 2 sets of ordered classifications. The weights for the kappa statistic were selected according to the linear weighting scheme of Cicchetti and Allison (1971). These weights are commonly used with ordinal data and are the default in many statistical programs, including SAS. With this weighting scheme, full weight is given to perfect agreement (e.g., chemical score = 2 and BRI = Low Disturbance). A relatively high weight (i.e., more partial credit) is given when the disagreement is small (e.g., chemical score = 2 and BRI = Moderate Disturbance), whereas a lower weight is given when the disagreement is large (e.g., chemical score = 2 and BRI = High Disturbance). The weighted kappa statistic compares the observed weighted agreement (po) with the weighted agreement expected under random classification (pe), κw = (po − pe)/(1 − pe) (the weights and calculation of weighted kappa are given in Appendix B). A weighted kappa > 0 implies classification better than chance, whereas weighted kappa = 1 indicates perfect agreement. A weighted kappa < 0 indicates classification worse than chance expectation.
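A minimal Python sketch of linearly weighted kappa follows, assuming both classifications are coded 1 to 4. The Cicchetti–Allison weight for categories i and j is 1 − |i − j|/(k − 1), and the expected agreement is built from the 2 sets of marginal frequencies. The function name is illustrative, not taken from SAS or another library.

```python
from collections import Counter


def weighted_kappa(a, b, k=4):
    """Linearly weighted Cohen's kappa for paired ordinal ratings coded 1..k.

    Cicchetti-Allison weights: w_ij = 1 - |i - j| / (k - 1).
    """
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and paired")
    n = len(a)

    def w(i, j):
        return 1.0 - abs(i - j) / (k - 1)

    p_obs = sum(w(i, j) for i, j in zip(a, b)) / n  # observed weighted agreement
    ca, cb = Counter(a), Counter(b)
    p_exp = sum(w(i, j) * ca[i] * cb[j]              # chance-expected weighted agreement
                for i in range(1, k + 1) for j in range(1, k + 1)) / n ** 2
    if p_exp == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical chemical scores vs BRI categories for 5 samples.
print(weighted_kappa([1, 2, 2, 3, 4], [1, 2, 3, 3, 4]))
```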

The optimal set of 3 guideline values for each chemical was selected by comparing weighted kappa values across a large set of possible candidates. These candidates consisted of all ordered combinations of 3 guidelines taken at 5% increments of the concentration range. Weighted kappa values for guidelines between the 5% increments were linearly interpolated. To ensure convergence of the optimization procedure and to keep guidelines from being too close to one another, the distance between individual guidelines within each set was constrained to be no less than 10% of the chemical range. The set of 3 guidelines that yielded the highest kappa statistic among all candidates was selected as optimal for that chemical.
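The grid search can be sketched as follows, under stated simplifications: candidates lie on a 21-point grid (5% steps) spanning the observed concentration range, adjacent guidelines must be at least 2 steps (10% of the range) apart, and the linear interpolation of kappa between grid points is omitted. Function names and the toy data are hypothetical.

```python
from bisect import bisect_left
from collections import Counter
from itertools import combinations


def weighted_kappa(a, b, k=4):
    """Linearly weighted kappa (Cicchetti-Allison weights) for ratings coded 1..k."""
    n = len(a)

    def w(i, j):
        return 1.0 - abs(i - j) / (k - 1)

    p_obs = sum(w(i, j) for i, j in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_exp = sum(w(i, j) * ca[i] * cb[j]
                for i in range(1, k + 1) for j in range(1, k + 1)) / n ** 2
    return float("nan") if p_exp == 1.0 else (p_obs - p_exp) / (1.0 - p_exp)


def score(conc, g):
    """Exposure score 1..4 from ascending guidelines g = (G1, G2, G3)."""
    return bisect_left(g, conc) + 1


def optimize_guidelines(concs, categories, n_steps=20, min_gap_steps=2):
    """Return the guideline triple (and its kappa) that maximizes weighted kappa."""
    lo, hi = min(concs), max(concs)
    grid = [lo + (hi - lo) * i / n_steps for i in range(n_steps + 1)]
    best_g, best_k = None, float("-inf")
    for i, j, m in combinations(range(n_steps + 1), 3):  # i < j < m
        if j - i < min_gap_steps or m - j < min_gap_steps:
            continue  # guidelines closer than 10% of the range are not allowed
        g = (grid[i], grid[j], grid[m])
        kap = weighted_kappa([score(c, g) for c in concs], categories)
        if kap > best_k:  # NaN comparisons are False, so undefined kappas are skipped
            best_g, best_k = g, kap
    return best_g, best_k


# Toy data in which concentration separates the 4 BRI categories perfectly.
concs = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
bri = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4]
best_guidelines, best_kappa = optimize_guidelines(concs, bri)
```

Working with grid indices rather than concentration differences keeps the 10% spacing check exact despite floating-point grid values.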

The guideline optimization procedure was bootstrapped: 50 random subsamples were selected (without replacement) from the development data for optimization. Each subsample contained 60 minimally affected (i.e., Reference or Low Disturbance) and 60 clearly affected (Moderate or High Disturbance) samples according to the BRI. This step was necessary to minimize bias due to the greater prevalence of reference samples in the benthic data set. It is well known that high prevalence in a single category can yield paradoxical kappa results (e.g., high agreement but low kappa, and vice versa) that could distort the relative strengths of association among the chemicals (Kraemer and Bloch 1988; Feinstein and Cicchetti 1990). Kappa is least susceptible to such contradictions when samples are more evenly distributed among the categories (Lantz and Nebenzahl 1996; Canran et al. 2005). Repeating the subsampling process (i.e., bootstrapping) 50 times ensured that nearly all samples were represented in at least 1 subsample. For each guideline, the median across all 50 subsamples was used as the final optimal value for the Low (G1), Moderate (G2), and High (G3) categories.
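The balanced subsampling and median aggregation can be sketched as below. The sketch assumes each sample is a dict with a hypothetical `"bri"` category code (1 = Reference through 4 = High Disturbance) and that `optimize` is any function returning (G1, G2, G3) for a subsample; the quartile-based placeholder optimizer only stands in for the guideline search and is not part of the method.

```python
import random
from statistics import median, quantiles


def balanced_subsample(samples, n_per_group=60, rng=random):
    """Draw n_per_group minimally affected (BRI 1-2) and n_per_group clearly
    affected (BRI 3-4) samples, without replacement."""
    minimal = [s for s in samples if s["bri"] in (1, 2)]
    affected = [s for s in samples if s["bri"] in (3, 4)]
    return rng.sample(minimal, n_per_group) + rng.sample(affected, n_per_group)


def bootstrap_guidelines(samples, optimize, n_reps=50, n_per_group=60, seed=0):
    """Run `optimize` on n_reps balanced subsamples and return the median of
    each guideline (G1, G2, G3) across replicates."""
    rng = random.Random(seed)
    fits = [optimize(balanced_subsample(samples, n_per_group, rng)) for _ in range(n_reps)]
    return tuple(median(fit[k] for fit in fits) for k in range(3))


# Toy data: 400 samples, 100 per BRI category, with a made-up concentration.
samples = [{"bri": 1 + i % 4, "conc": float(i)} for i in range(400)]
subsample = balanced_subsample(samples, 60, random.Random(1))

# Placeholder optimizer: concentration quartiles of the subsample.
toy_guidelines = bootstrap_guidelines(
    samples, lambda sub: quantiles([s["conc"] for s in sub], n=4))
```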

CSI weighting factors

The CSI weighting factor for each chemical was determined by applying the final set of optimal guidelines to all 50 bootstrap subsamples and computing the weighted kappa for each subsample. The median weighted kappa for each chemical was then calculated, representing the overall relative strength of association between variations in chemical concentration and BRI categories. Finally, the median kappa values were normalized to the largest median kappa among the 12 chemicals for use as weighting factors in the calculation of the CSI. Thus, bootstrapping addressed 2 important considerations: 1) potential bias due to the high prevalence of reference samples in the data set, and 2) robustness of the guidelines, so that they perform reasonably well across a variety of chemical contributions within each benthic category.
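The weighting step reduces to a median followed by normalization to the largest median, as in this sketch. The chemical names and kappa values are hypothetical, and the sketch assumes the largest median kappa is positive.

```python
from statistics import median


def csi_weights(kappas_by_chem):
    """Map chemical -> list of per-subsample weighted kappas to normalized weights.

    Each chemical's median kappa is divided by the largest median kappa, so the
    most strongly associated chemical receives a weight of 1.
    """
    med = {chem: median(ks) for chem, ks in kappas_by_chem.items()}
    top = max(med.values())
    return {chem: m / top for chem, m in med.items()}


# Hypothetical kappas from 3 subsamples per chemical.
weights = csi_weights({"Copper": [0.3, 0.4, 0.5], "Lead": [0.1, 0.2, 0.3]})
print(weights)
```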

Threshold development for all SQG indices

Determining the agreement of sample classification based on SQG indices with those based on toxicity or benthos requires the application of effect thresholds for each SQG index. Such thresholds are generally unavailable for these SQG approaches or vary in the method of development. The thresholds used in this study were developed for each SQG index using a consistent methodology so that differences in performance would reflect inherent differences among approaches, rather than variations in how thresholds were assigned.

Two sets of 3 thresholds were established for each SQG approach. One set of thresholds defined 4 ranges of index values that corresponded to toxicity categories based on amphipod survival: Nontoxic, Low Toxicity, Moderate Toxicity, and High Toxicity. These thresholds were developed in a previous study (Bay et al. 2008). The other set of thresholds, developed in this study, corresponded to benthic community condition categories determined from the median of 4 benthic indices: Reference, Low Disturbance, Moderate Disturbance, and High Disturbance. Each set of thresholds was selected by optimizing the weighted agreement between the 4 SQG index ranges and the corresponding biological effect categories, using the same methodology as previously described for the CSI Chemical Guideline Optimization Procedure, except that the weighted kappa statistic was replaced by weighted agreement. This small modification to the optimization routine preserved consistency with the toxicity threshold optimization of the previous study. Selection of the benthic community-based SQG index thresholds used the SQG development data set described previously. The toxicity-based thresholds were based on a larger development data set (n = 887), which was used in a related study to calibrate and evaluate various toxicity-based SQG indices (Bay et al. 2008). This data set contained many of the same samples as the benthic community data set, as well as additional samples from the same water bodies for which benthic community data were not available. This larger data set was used to provide the most robust threshold optimization analyses possible and to enhance comparability between the 2 studies.
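Threshold selection can be illustrated with an exhaustive search over ordered triples of candidate cut points, scoring each triple by the mean linear agreement weight, 1 − |i − j|/3 for 4 categories coded 0 to 3. This is a simplified sketch under stated assumptions: the candidate set, the rule that a value equal to a threshold falls in the higher category, and the absence of bootstrapping are illustrative choices, not details taken from the study.

```python
from itertools import combinations


def categorize(value, thresholds):
    """Category 0..3 = number of thresholds the value meets or exceeds (assumed rule)."""
    return sum(value >= t for t in thresholds)


def weighted_agreement(pred, obs, k=4):
    """Mean linear agreement weight: 1 for a match, 0 for maximal disagreement."""
    return sum(1 - abs(p - o) / (k - 1) for p, o in zip(pred, obs)) / len(obs)


def optimize_thresholds(index_values, categories, candidates):
    """Return the increasing threshold triple that maximizes weighted agreement."""
    best, best_score = None, -1.0
    for triple in combinations(sorted(set(candidates)), 3):
        pred = [categorize(v, triple) for v in index_values]
        score = weighted_agreement(pred, categories)
        if score > best_score:
            best, best_score = triple, score
    return best, best_score
```

The number of triples grows cubically with the number of candidates, so in practice candidates might be limited to a grid of index values.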

SQG index evaluation

SQG index performance was evaluated by quantifying the strength of association between chemistry and biological effect (i.e., toxicity or benthic community condition) using both threshold-independent and threshold-dependent measures of association. Threshold-independent measures included correlation and cumulative frequency plots, whereas threshold-dependent measures included categorical classification accuracy and the weighted kappa statistic. Correlation was measured as the nonparametric Spearman's correlation coefficient between the SQG index value (i.e., mean quotient, mean score, or maximum probability) and amphipod survival or benthic community condition category.

The SQG index cumulative frequencies were plotted separately for unaffected and affected samples with respect to sediment toxicity or benthic community condition. For this analysis, samples classified as Nontoxic or Low Toxicity were regarded as unaffected with respect to toxicity. Samples classified as Reference or Low Disturbance were regarded as unaffected with respect to benthic community condition. Affected samples were defined as those with either a moderate or high category of biological effect. Cumulative frequencies give the percentage of affected (or unaffected) samples having SQG index values less than or equal to a threshold value, where threshold values span the range of all possible SQG values. The cumulative frequency for unaffected samples estimates the specificity for correctly classifying samples with little or no biological effect, whereas 1 − cumulative frequency for affected samples estimates the SQG approach's sensitivity for correctly classifying samples with substantial biological effects.
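The specificity and sensitivity estimates read off the cumulative frequency plots can be computed directly. The sketch below assumes plain lists of SQG index values for unaffected and affected samples; the function names are illustrative.

```python
def cumulative_frequency(values, threshold):
    """Percentage of index values less than or equal to the threshold."""
    return 100.0 * sum(v <= threshold for v in values) / len(values)


def specificity_sensitivity(unaffected, affected, threshold):
    """Specificity: % of unaffected samples at or below the threshold.
    Sensitivity: % of affected samples above it (100 minus their cumulative frequency)."""
    specificity = cumulative_frequency(unaffected, threshold)
    sensitivity = 100.0 - cumulative_frequency(affected, threshold)
    return specificity, sensitivity
```

Sweeping `threshold` across all observed index values traces the 2 cumulative frequency curves shown in the plots.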

Categorical classification accuracy was quantified as the percent agreement between the 4 SQG index categories (determined by applying the thresholds derived using the development data set) and the biological effect categories (toxicity or benthic community condition). Percent agreement was calculated as:

$$A = \frac{N_C}{N_T} \times 100$$

where $A$ = percent agreement, $N_C$ = number of samples correctly classified, and $N_T$ = total number of samples.

Finally, the weighted kappa statistic was compared among SQG indices with respect to the 4 categories of benthic community condition. As previously stated, the weighted kappa statistic measures the weighted agreement between 2 sets of ordered classifications (e.g., SQG index and benthic community condition) beyond what is expected by chance alone. The weights are the same linear weights that were used in CSI development and SQG index threshold optimization (see above). The weighted kappa statistic is obtained by comparing the average weighted agreement with its expectation under the assumption of random classification (see Appendix B).
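For reference, the textbook Cohen's weighted kappa with linear agreement weights, κ_w = (p_o − p_e)/(1 − p_e), can be computed as below. Appendix B is not reproduced here, so this standard formulation may differ in detail from the study's implementation; categories are assumed to be coded 0 to k − 1.

```python
def weighted_kappa(pred, obs, k=4):
    """Cohen's weighted kappa with linear agreement weights.
    pred, obs: equal-length sequences of integer categories 0..k-1."""
    n = len(obs)

    def weight(i, j):
        # 1 for exact agreement, decreasing linearly to 0 for maximal disagreement
        return 1 - abs(i - j) / (k - 1)

    p_obs = sum(weight(p, o) for p, o in zip(pred, obs)) / n
    row = [list(pred).count(i) / n for i in range(k)]  # marginal of classification 1
    col = [list(obs).count(j) / n for j in range(k)]   # marginal of classification 2
    p_exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return (p_obs - p_exp) / (1 - p_exp)  # undefined when p_exp == 1
```

A value of 0 means weighted agreement no better than chance, and 1 means perfect agreement.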

All 3 measures of association (correlation, percent agreement, and weighted kappa) were computed across 50 bootstrap subsamplings of 40 unaffected and 40 affected benthic samples, selected randomly without replacement. Because the validation data contained too few High Disturbance benthic samples (10), complete uniformity across all 4 benthic categories was not possible. Instead, sample sizes were constrained to be equal between the affected and unaffected samples. Even so, this method could not sample evenly across both the toxicity and benthic categorizations simultaneously. Therefore, the weighted kappa statistic was computed only with respect to benthic community condition. Bootstrapping 50 subsamples ensured that all samples were represented at least once in the analysis.
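The evaluation resampling can be sketched as follows, using Spearman's coefficient computed as Pearson's correlation on average ranks. Samples are assumed to be (index value, benthic category) pairs with categories coded 0 (Reference) to 3 (High Disturbance); the function names are hypothetical.

```python
import random
import statistics


def average_ranks(xs):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and xs[order[end + 1]] == xs[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as Pearson's correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def bootstrap_spearman(samples, n_per_group=40, n_reps=50, seed=1):
    """Correlations across balanced subsamples drawn without replacement.
    Categories 0-1 count as unaffected and 2-3 as affected."""
    rng = random.Random(seed)
    unaffected = [s for s in samples if s[1] <= 1]
    affected = [s for s in samples if s[1] >= 2]
    results = []
    for _ in range(n_reps):
        sub = rng.sample(unaffected, n_per_group) + rng.sample(affected, n_per_group)
        results.append(spearman([v for v, _ in sub], [c for _, c in sub]))
    return results
```

The median of the returned list would serve as the summary correlation for one SQG index.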

Bootstrapping also allowed for statistical comparisons of SQG index performance. For each measure of association (correlation, percent agreement, and weighted kappa), the 10th percentile of the bootstrap distribution for the SQG index with the highest median established a lower bound for determining statistical differences. Medians for SQG indices falling below this 10th percentile were deemed statistically different from the highest median, whereas medians above it were characterized as statistically similar. All evaluations were conducted using the same independent validation data set, which consisted of data that were not used for the development of the CSI or the calibration of benthos-based SQG index thresholds for categorical classification.
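The comparison rule reduces to a few lines. In this sketch (an assumption, not the authors' code), `boot` maps each SQG index to its bootstrap values for one performance measure, the 10th percentile uses Python's inclusive quantile method, and a median exactly equal to the cut point is treated as similar.

```python
import statistics


def similar_to_best(boot):
    """Flag each index whose median is at or above the 10th percentile of the
    bootstrap distribution of the index with the highest median."""
    medians = {name: statistics.median(vals) for name, vals in boot.items()}
    best = max(medians, key=medians.get)
    p10 = statistics.quantiles(boot[best], n=10, method="inclusive")[0]
    return {name: m >= p10 for name, m in medians.items()}
```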

RESULTS


CSI development

There was substantial variability in the relationship between individual chemical concentrations and the benthic response index (BRI). Spearman correlation coefficients were highest for some metals (e.g., Cu and Zn) and lowest for DDE and DDT (Figure 3). Nine of the 12 chemicals showed statistically significant correlation with respect to BRI, even after adjusting for multiple comparisons; the exceptions were LPAH, DDT, and DDE. Intercorrelation among chemicals also varied greatly, from 0.15 for DDT and Cu to 0.93 for alpha-chlordane and gamma-chlordane. Chlordanes, PAHs, pesticides, and metals all showed moderate to high correlations within each class, whereas correlations between classes were typically lower.


Figure 3. Correlation of chemical concentration with toxicity or benthic response index (BRI). Nearly all chemicals were significantly correlated with toxicity and benthic response (p < 0.05), even after adjusting for multiple comparisons. Only HPAH was nonsignificantly correlated with toxicity, whereas LPAH, DDT, and DDE were nonsignificantly correlated with benthic response.


Markedly different correlations were obtained for some contaminants when compared to toxicity. For example, increased DDD, DDE, and DDT concentrations showed a relatively high correlation with increased amphipod mortality but a low correlation with increased BRI. All chemicals were statistically significantly correlated with toxicity except high molecular weight PAHs (HPAH). BRI and toxicity were significantly correlated (p < 0.0001), although the strength of their correlation was low (r = 0.28).

Plots representative of results from the CSI guideline development analyses for individual chemicals are shown in Figure 4 for Cu, which had one of the strongest correlations, and HPAH (weak correlation). Analyses for each contaminant usually showed a trend of greater frequency and severity of benthic disturbance with higher concentrations. A poor relationship was typical at low chemical concentrations, where BRI scores were highly variable, ranging from Reference to High Disturbance (Figure 4). High BRI scores at low concentrations may indicate the presence of samples impacted by stress due to noncontaminant factors, such as physical disturbance or predation.


Figure 4. Benthic response index (BRI) data distributions and chemical score indicator (CSI) guideline values (vertical dashed lines) for Cu and high molecular weight PAHs (HPAH).


A set of guideline values was developed for 12 contaminants using a bootstrapped optimization procedure that maximized overall agreement (weighted kappa) between chemical concentration and BRI score (Table 3). The guidelines for Cu, Pb, and Zn produced the highest weighted kappa values (0.34, 0.30, and 0.33, respectively), indicating that classifications based on these metals had greater concordance with the 4 levels of benthic disturbance (BRI score) than those based on other chemicals. The lowest weighted kappa values and therefore the least concordance with the BRI were obtained for the HPAH and low molecular weight PAHs (LPAH) (0.05 and 0.02, respectively) and DDT (0.07). Variation in kappa values was generally similar to the pattern observed for Spearman correlation coefficients. The normalized weighting factor for each contaminant, which was based on kappa, varied from 5 (LPAH) to 100 (Cu).

Table 3. Chemical score indicator (CSI) guideline values and weighting factors calculated in this study from the Southern California data set (a)

| Chemical | Units | Guideline Low (G1) | Guideline Moderate (G2) | Guideline High (G3) | Kappa | Weight |
|---|---|---|---|---|---|---|
| Copper | mg/kg | 52.80 | 96.50 | 406.00 | 0.34 | 100 |
| Lead | mg/kg | 26.40 | 60.80 | 154.00 | 0.30 | 88 |
| Mercury | mg/kg | 0.09 | 0.45 | 2.18 | 0.10 | 30 |
| Zinc | mg/kg | 113.00 | 201.00 | 629.00 | 0.33 | 98 |
| DDD, total | µg/kg | 0.77 | 3.56 | 26.37 | 0.15 | 45 |
| DDE, total | µg/kg | 1.19 | 6.01 | 45.84 | 0.11 | 33 |
| DDT, total | µg/kg | 0.61 | 2.79 | 34.27 | 0.07 | 20 |
| alpha-Chlordane | µg/kg | 0.50 | 1.23 | 11.10 | 0.19 | 55 |
| gamma-Chlordane | µg/kg | 0.54 | 1.45 | 14.50 | 0.20 | 58 |
| HMW PAH, total | µg/kg | 313.00 | 1325.00 | 9320.00 | 0.05 | 16 |
| LMW PAH, total | µg/kg | 85.40 | 312.00 | 2471.00 | 0.02 | 5 |
| PCB, total | µg/kg | 11.90 | 24.70 | 288.00 | 0.19 | 55 |

a. Weights are based on the kappa obtained during threshold optimization for each chemical, normalized to the largest kappa value.

No attempt was made to develop chemical guideline values using common community indices such as Shannon–Wiener or Simpson. Preliminary analyses indicated that such indices were less accurate than the BRI in describing benthic condition, and thus predictions of sediment quality based on a CSI derived using these alternative indices would likely be less effective.

SQG index associations with toxicity and benthos

All of the SQG indices were significantly correlated with changes in sediment toxicity and benthic community condition, though actual correlation values were moderate in scale (Table 4). The correlations were similar, both among indices and between biological effects measures. The median toxicity correlations ranged from 0.42 (mSQGQ1Q) to 0.55 (CA LRM index), whereas the median correlations with the combined index of benthic community condition ranged from 0.46 (mConsensusQ) to 0.53 (National LRM index). None of the SQG indices' median correlations with benthic community condition were statistically different from that of the National LRM. Median correlations for the new CSI approach were among the highest (0.51 and 0.50 for toxicity and benthos, respectively), regardless of which biological endpoint was being measured.

Table 4. Correlation and classification accuracy of sediment quality guideline (SQG) index values with respect to toxicity or benthic community condition (a)

| SQG approach | Index | Toxicity correlation (r) | Toxicity agreement (%) | Benthos correlation (r) | Benthos agreement (%) | Benthos weighted kappa |
|---|---|---|---|---|---|---|
| CSI (b) | Weighted mean score | 0.51* | 52* (63, 43, 61, 15) | 0.50* | 49* (65, 40, 50, 22) | 0.37* |
| CA LRM | Maximum probability | 0.55* | 44 (45, 40, 47, 36) | 0.52* | 49* (50, 35, 54, 56) | 0.41* |
| National LRM | Maximum probability | 0.45 | 46* (45, 57, 59, 29) | 0.53* | 49* (45, 35, 61, 50) | 0.40* |
| National ERM | Mean quotient | 0.45 | 49* (40, 38, 83, 17) | 0.47* | 40 (45, 30, 54, 11) | 0.27 |
| Consensus | Mean quotient | 0.53* | 47* (45, 38, 63, 25) | 0.46* | 44 (55, 35, 46, 38) | 0.33 |
| SQGQ1 | Mean quotient | 0.42 | 40 (31, 31, 58, 29) | 0.47* | 36 (18, 35, 50, 33) | 0.23 |

a. Values marked with an asterisk are within the 90th percentile of the SQG distribution that yielded the largest performance value across 50 bootstrap samplings. Numbers in parentheses are the median percent of bootstrap samples in each of the 4 biological effect categories (i.e., reference, low, moderate, and high, respectively) where SQG classifications agreed with biological endpoint.
b. All thresholds for CSI were based on benthic community condition.
* Statistically similar to highest median.

Each of the indices was also significantly correlated with each other. Pairwise correlations ranged from 0.73 (mConsensusQ versus mSQGQ1Q) to 0.95 (mConsensusQ versus mERMQ). Correlations of the CSI with the other indices ranged from 0.80 to 0.88.

Although overall correlations were similar, the toxicity and benthos data showed a different distribution with respect to the SQG index values. Results for the mERMQ (Figure 5) are typical of those for the other indices. At low index values (e.g., mERMQ < 0.1), a low incidence of toxic samples is evident but the benthic community condition is highly variable. The presence of some samples with high community disturbance at low contaminant concentrations illustrates a limitation common to most benthic indices: they cannot distinguish contaminant stress from impacts of other types, such as hypoxia, episodic stormwater runoff discharges, and physical disturbance, which are difficult to identify accurately in tidal embayments having multiple sources of contaminants and complex circulation patterns. An opposite pattern is present among samples with the highest mERMQ values: most of these samples have evidence of benthic community disturbance whereas the toxicity results are highly variable.


Figure 5. Biological effect data distributions relative to mean effects range median (ERM) quotient. The mean quotient was calculated using ERM values for the 27 contaminants listed in Table 2.


However, plots of the cumulative frequencies of stations with and without biological effects against the nationally derived SQG indices show a similar degree of sensitivity and specificity for toxicity and benthic condition (Figure 6). Within each type of biological effect measure, cumulative frequencies for unaffected samples were always higher than those for affected samples at a given SQG index value. The difference in index values between unaffected and affected samples at a given cumulative percentage was also similar for toxicity and benthic condition, indicating a similar potential within each SQG approach to discriminate between affected and unaffected samples, regardless of which biological effect was targeted. For example, at a cumulative percentage of 50%, the difference in mERMQs for toxicity and benthic condition was 0.062 and 0.072, respectively. The mERMQ values corresponding to low, moderate, and high incidence of effects (i.e., 10, 50, and 90% cumulative frequency) were also similar for toxicity and benthic condition. For example, the mERMQs corresponding to the 10th percentile of affected samples were 0.083 and 0.074 for toxicity and benthos, respectively. The main difference between the frequency plots for toxicity and benthos occurred in the lower range of SQG index values (e.g., mConsensusQ ≤ 0.2), where there was often a higher proportion of affected benthic samples than affected toxicity samples.


Figure 6. Cumulative percentage of samples showing toxicity or benthic community responses versus SQG index value. Two plots are shown for each type of response (toxicity or benthos): 1 plot for samples with minimal effects based on a 4-category classification of the response (e.g., samples having toxicity categories of Nontoxic or Low) and 1 plot for clearly affected samples (e.g., toxicity categories of Moderate or High). The index values shown on the x-axis are based on the concentrations of multiple chemicals and vary for each sediment quality guideline (SQG) approach: mean effects range median (ERM) quotient, mean consensus guideline quotient (mConsensusQ), National logistic regression model (LRM) maximum probability of toxicity, and mean sediment quality guideline quotient 1 (SQGQ1) quotient.


The SQG index category thresholds calibrated to 4 categories of benthic community condition were similar to those calibrated to toxicity (Table 5). Although the toxicity-based thresholds were almost always lower than those based on benthos, the differences were small (usually less than 10%). The 2 SQGQ1Q High thresholds showed the greatest variation: 0.80 for High Toxicity versus 1.26 for High Benthic Disturbance.

Table 5. Sediment quality guideline (SQG) index thresholds based on toxicity or benthic community condition (a)

| SQG approach | Index | Toxicity Low | Toxicity Moderate | Toxicity High | Benthos Low | Benthos Moderate | Benthos High |
|---|---|---|---|---|---|---|---|
| CSI | Weighted mean score | na | na | na | 1.72 | 2.28 | 3.03 |
| CA LRM | Maximum probability | 0.42 | 0.58 | 0.72 | 0.43 | 0.60 | 0.78 |
| National LRM | Maximum probability | 0.23 | 0.44 | 0.61 | 0.22 | 0.43 | 0.65 |
| National ERM | Mean quotient | 0.06 | 0.12 | 0.38 | 0.06 | 0.13 | 0.40 |
| Consensus | Mean quotient | 0.14 | 0.26 | 0.60 | 0.15 | 0.30 | 0.68 |
| SQGQ1 | Mean quotient | 0.16 | 0.34 | 0.80 | 0.13 | 0.37 | 1.26 |

na = not applicable.
a. Chemical score indicator (CSI) thresholds were not developed for toxicity.
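Applying the thresholds in Table 5 amounts to counting how many cut points an index value meets or exceeds. The sketch below uses the benthos-based mERMQ thresholds; treating a value equal to a threshold as belonging to the higher category is an assumption, since the boundary rule is not stated here.

```python
# Benthos-based mean ERM quotient thresholds (Low, Moderate, High) from Table 5
MERMQ_BENTHOS = (0.06, 0.13, 0.40)
LABELS = ("Reference", "Low Disturbance", "Moderate Disturbance", "High Disturbance")


def classify(value, thresholds=MERMQ_BENTHOS, labels=LABELS):
    """Assign 1 of 4 categories; a value equal to a threshold is assumed
    to fall in the higher category."""
    return labels[sum(value >= t for t in thresholds)]
```

For example, an mERMQ of 0.25 would be classified as Moderate Disturbance under these thresholds.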

Each of the SQG approaches showed moderate levels of agreement with each type of biological effect (Table 4). The CSI and National ERM approaches had the greatest percent agreement for toxicity (52% and 49%, respectively). Indices for the CSI, CA LRM, and National LRM had the highest percent agreement with benthic community condition (49% in all 3 cases). The SQGQ1Q had the lowest percent agreement for each effects measure. Results for the other SQG indices varied within a relatively narrow range. There was no consistent trend in percent agreement between the effects measures. For CA LRM and National LRM, percent agreement was slightly higher for benthos than for toxicity, but the trend was reversed for the other SQG approaches.

All the SQGs investigated in this study showed agreement percentages greater than would be expected by chance alone. Recall from probability theory that when samples are evenly distributed across the 4 levels of benthic response (e.g., 20 samples in each category), the expected agreement by chance alone is $(20^2 + 20^2 + 20^2 + 20^2)/80^2 = 25\%$. However, because there tended to be fewer samples in the High Disturbance benthic category, the expected agreement due to chance alone was approximately $(20^2 + 20^2 + 31^2 + 9^2)/80^2 = 29\%$, where 20, 20, 31, and 9 represent the average numbers of samples (across the 50 bootstrapped samplings) in the Reference, Low Disturbance, Moderate Disturbance, and High Disturbance categories, respectively. For toxicity, the average numbers of samples in the Nontoxic, Low Toxicity, Moderate Toxicity, and High Toxicity categories were 34, 9, 25, and 12, respectively, giving an expected agreement due to chance of $(34^2 + 9^2 + 25^2 + 12^2)/80^2 = 31\%$.
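The percent agreement and chance-agreement arithmetic above can be checked with a short sketch. The chance calculation in the text assumes both classifications share the same category counts; the optional second argument below gives the general form, Σ nᵢmᵢ/N², for differing counts. Function names are illustrative.

```python
def percent_agreement(pred, obs):
    """A = N_C / N_T x 100: percentage of samples whose categories match."""
    matches = sum(p == o for p, o in zip(pred, obs))
    return 100.0 * matches / len(obs)


def chance_agreement(counts_a, counts_b=None):
    """Expected % agreement under random classification, given category counts.
    If counts_b is omitted, both classifications share counts_a, as in the text."""
    counts_b = counts_a if counts_b is None else counts_b
    total = sum(counts_a)
    return 100.0 * sum(a * b for a, b in zip(counts_a, counts_b)) / total ** 2
```

With the benthic counts (20, 20, 31, 9) this gives about 28.8%, and with the toxicity counts (34, 9, 25, 12) about 31.3%, matching the rounded values above.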

The weighted kappa values for all SQG approaches with respect to benthic community condition were all > 0, indicating improvement in weighted agreement over that expected solely on the basis of chance. The CA LRM, National LRM, and CSI approaches showed the greatest improvement (weighted kappa = 0.41, 0.40, and 0.37, respectively), whereas mSQGQ1Q showed the least (0.23).

DISCUSSION


Each of the SQG indices investigated in this study showed utility with respect to describing the potential for impacts on the benthic community, regardless of whether the indices were developed and calibrated to toxicity or benthos. This effectiveness was demonstrated by the presence of significant but moderate correlations, percent agreements, and weighted kappa values among 4 levels of biological impact. These results are consistent with studies conducted in other locations in the United States (Long et al. 2006) and worldwide (Borja et al. 2008). Mean SQG quotients based on ERMs have been shown to be predictive of marine benthic community disturbance in Florida, USA (MacDonald et al. 2004), along the Atlantic and Gulf Coasts (Hyland et al. 2003), and in Washington state, USA (Long et al. 2005). In Europe, SQG quotients based on empirical guidelines corresponded to benthic community degradation in the Baltic Sea (Gulf of Gdańsk, Bettinetti et al. 2009), whereas relationships with benthic community disturbance in Spanish embayments have been shown for indices of contamination based on toxic units in Barcelona harbor (Lladó et al. 2007) and for empirical SQGs in the Gulf of Cádiz (Choueri et al. 2009). This study documents for the 1st time that the new CSI index, as well as SQG indices derived from logistic regression models and consensus guidelines, relate comparably to benthic community impacts in Southern California.

The wide range of benthic community condition observed at low contamination levels (Figure 5) illustrates the challenge in assessing benthic disturbance in urban bays. On the coastal shelf or in streams impacted by pollution sources, well-defined spatial pollution and disturbance gradients are present and reference regions are relatively easy to identify. As a result, benthic indices applied in these habitats typically have less variability when sediment contamination is low (e.g., Schäfer et al. 2007). In contrast, contamination gradients in Southern California marine bays are less distinct and more complex, being influenced by multiple sources of pollution and disturbance that are further modified by complex tidal currents and episodic freshwater and/or storm water flows. Benthic community responses to these gradients are further modified by additional natural and anthropogenic physical stresses, such as predation by fish, hypoxia, and ship movements (e.g., prop scour and anchors). These factors result in difficulties in identifying reference areas with certainty and account for benthic samples in poor biological condition in areas with low contaminant concentrations.

Similar levels of association with toxicity and benthic community disturbance were observed among SQG indices, regardless of the SQG approach or type of biological data underlying the guidelines (i.e., toxicity or benthos). Such similarity may have been related to 2 factors: 1) the robust nature and general applicability of empirical SQGs, and 2) the use of consistent methods for index evaluation. Because the empirical SQGs were developed using large and diverse data sets, these SQGs tend to describe general trends in the data set that are common to multiple regions rather than site-specific variations in contaminant mixtures or bioavailability. As a result, contamination indices based on these SQGs (e.g., mean ERM quotient) would be expected to have a similar (albeit relatively low) degree of association with various measures of biological effect. The use of consistent methods for threshold selection and SQG index evaluation in this study minimized the potential for differences in analysis method to influence the evaluation results. Had thresholds obtained from the literature been used for evaluating the national SQG indices, it is likely that the agreement with biological effects would have been lower and not representative of the maximum performance of each approach. For example, a mERMQ >1.5 has been identified as indicative of a high risk of biological effects (Long et al. 2000), yet the threshold providing the highest concordance between mERMQ and the highest category of Southern California sediment toxicity was much lower (0.4).

The lack of improvement in classification agreement with the use of a SQG index developed using benthic community response data (CSI) indicates that uncertainty in the measurement of chemical exposure to the organism is likely the key factor limiting index performance. The 3 primary aspects of this uncertainty are: relatively low sediment contamination in Southern California embayments, unaccounted exposure from unmeasured contaminants, and inaccuracy in quantifying the contaminant dose delivered to benthic organisms. Most Southern California sediments, while contaminated with multiple compounds, have overall contaminant levels that are 1 to 2 orders of magnitude less than other contaminated US embayments. Thus, typical "high" Southern California contaminant levels (e.g., mERMQ = 0.5–1.0) represent intermediate levels on a national scale, where mERMQ values range above 100. The predictive accuracy of SQG indices is relatively low (approximately 40% incidence of toxicity) at intermediate contamination levels (Long et al. 2006). Currently used pesticides, such as pyrethroids and organophosphates, have been identified as important (and perhaps the primary) causes of sediment and water toxicity in California's urban watersheds (Holmes et al. 2008). None of the empirical SQG indices in widespread use include current use pesticides, and thus they may not represent this significant exposure source. Finally, it is well established that measurement of total sediment contaminant concentrations is an inaccurate indicator of the bioavailable contaminant fraction (i.e., dose) that is able to enter the organism (Di Toro et al. 1991). No SQG index has been able to effectively account for variations in contaminant partitioning (a primary determinant of bioavailability) among different sediment types.
The greatest potential for improving the accuracy in SQG indices is to improve the representativeness and accuracy of the chemical exposure measurement, both through developing SQGs for additional priority contaminants (e.g., current use pesticides) and developing improved measures of the bioavailable concentration of contaminants on sediment (e.g., measurement of porewater or easily desorbed portion of sediment contaminants).

The cumulative frequency plots and classification thresholds for toxicity and benthic community condition were nearly identical, indicating that these laboratory and field biological effects measures had similar sensitivity and specificity in their response to sediment contamination. This finding differs from other studies, where benthic community responses were reported to be more sensitive. For example, Hyland et al. (2003) determined that a very high level of risk to the benthos occurred at an mERMQ >0.361, whereas Long et al. (1998) determined that an mERMQ >1.5 corresponded to a high risk of toxicity. Kwok et al. (2008) determined probable effect concentrations for Hong Kong benthos that were lower than most current SQGs.

The disparity in relative sensitivity reported in other studies may be related to several factors. One factor is a difference in the source of the toxicity response thresholds used for comparison. Both Hyland et al. (2003) and Kwok et al. (2008) compared benthic community effect thresholds that they derived from regional data sets to toxicity effect thresholds derived by others from different data sets and regions. The toxicity and benthic community thresholds compared in this study were developed using similar data sets and an identical statistical approach. Furthermore, the decision to bootstrap roughly even numbers of samples across the biological effects categories allowed for the calibration and evaluation of multiple thresholds equally, without preference toward a particular level of classification. In practice, accuracy for assessing benthic condition can be improved by calibrating SQG index thresholds to data distributions more reflective of particular benthic populations for the region of interest. This study combined multiple data sets across many different regions in Southern California, which may have limited the effectiveness of SQG index calibration.

Differences in relative sensitivity of toxicity or benthic community SQG index thresholds among studies may also be related to variations in test species or benthic community indices. The toxicity data used by Long et al. (1998) to identify SQG index thresholds were dominated by survival tests using the marine amphipod Ampelisca abdita, whereas the Southern California analyses were based on the survival of 2 other species: E. estuarius and R. abronius. Tests of split samples indicate that A. abdita is less responsive to sediment contamination than E. estuarius or R. abronius (Bay et al. 2007), suggesting that the similarity of toxicity and benthic-based thresholds in this study may be due to the use of a more responsive test species. Different benthic indices were used in each study, and their relative sensitivity to contamination may influence threshold selection.

The CA LRM, National LRM, and CSI indices tended to have the highest association (correlation, percent agreement, and weighted kappa) with respect to benthos. However, only the CSI consistently had among the highest associations for all 3 metrics with respect to both benthos and toxicity. The high degree of association for CSI may have been related to several unique attributes: the approach was developed using Southern California data and therefore incorporated regional differences in contaminant mixtures or bioavailability, the approach was calibrated directly to a measure of benthic community response instead of toxicity, and finally, the contribution of individual contaminants to the final index value was weighted in proportion to the strength of association with biological effects.

Although the validation data set used in this study provided an independent assessment of the benthic-based CSI and benthos-based thresholds for existing SQG approaches with respect to benthic community data, these data partially overlapped with the toxicity development data set used in Bay et al. (2008). Therefore, the toxicity data used for comparing toxicity-based thresholds in this study were not entirely independent of threshold selection. Unfortunately, due to limitations in data availability, such overlap could not be avoided. Therefore, percent agreement of the SQG index values with respect to the toxicity categories reported in this study is expected to be somewhat higher than if the indices had been evaluated independently. In fact, toxicity agreements reported in Bay et al. (2008), where an independent toxicity validation data set was used for evaluation, were 3 to 7 percentage points lower than those reported here.

It is interesting to note that the Spearman correlations for the SQG approaches with respect to toxicity in this study are also higher than those reported in Bay et al. (2008), even though calibration was not required for Spearman analysis (with the exception of CA LRM). This suggests that the greater association of the SQG index values with toxicity in this study is more likely due to distributional differences in the evaluation data sets. Recall that evaluations in this study were based on bootstrapping roughly even numbers of benthic subsamples in 2 categories: unaffected and affected. In contrast, bootstrapping in Bay et al. (2008) ensured uniformity among all 4 toxicity categories. Although benthos and toxicity responses are correlated, they do not necessarily agree in all cases, particularly at the low end of the toxicity range. In fact, nontoxic samples made up the majority of samples among the benthic bootstrap subsamples. The toxicity-based SQG indices tend to have greater association with the lower end of the toxicity range (Bay et al. 2008).

The results of this study suggest 3 strategies for increasing the utility of SQG indices for evaluating sediment quality with respect to benthic community effects. First, the performance of established approaches such as the ERM can be improved by using thresholds calibrated to local benthic community response patterns. A 2nd strategy is to use an SQG index developed specifically for benthic macrofauna responses instead of a toxicity-based index. A 3rd strategy is to combine SQG approaches, integrating multiple SQG indices, whether toxicity-based or benthos-based, to describe the overall potential for biological effects. A similar multiple-index approach is frequently used in sediment quality assessments, in which various toxicity-based chemical indices are integrated to interpret sediment chemistry data (Chapman et al. 2002; McDonald et al. 2007).
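The first strategy changes only the thresholds, not the index itself. A minimal Python sketch of assigning a station to 1 of 4 effect categories from 3 ascending thresholds follows; both threshold sets are hypothetical values for a mean ERM quotient, not the calibrated thresholds from this study, and placing a value equal to a threshold in the higher category is an assumption.

```python
from bisect import bisect_right
from typing import Sequence


def classify(index_value: float, thresholds: Sequence[float]) -> int:
    """Return an effect category from 1 (lowest) to 4 (highest).

    `thresholds` holds the 3 ascending cut points between the 4 categories;
    a value equal to a cut point falls in the higher category.
    """
    cuts = list(thresholds)
    if len(cuts) != 3 or cuts != sorted(cuts):
        raise ValueError("expected 3 ascending thresholds")
    # bisect_right counts how many cut points are <= index_value
    return bisect_right(cuts, index_value) + 1


# Hypothetical thresholds for a mean ERM quotient (illustration only)
toxicity_based = (0.10, 0.50, 1.00)
benthos_calibrated = (0.05, 0.30, 0.80)

station_mean_erm_quotient = 0.35
print(classify(station_mean_erm_quotient, toxicity_based))      # 2
print(classify(station_mean_erm_quotient, benthos_calibrated))  # 3
```

The same index value moves from category 2 to category 3 under the locally calibrated thresholds, which is how recalibration can change the way an established index classifies stations without altering the index calculation.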

Integrating the results of various types of SQG indices is likely to be the most reliable strategy for assessing the potential for sediment contamination to affect benthic communities. This strategy strengthens the conceptual link between sediment contamination and benthic community response while preserving the ability to assess the potential for toxicity, an independent and widely used measure of contaminant effects. The need for a combined analysis is supported by the results of this study, which found different chemical-specific and station-specific patterns of response for toxicity and benthic community condition. The greater effectiveness of a multiple-index approach has recently been documented for benthic community assessment, where integrating the results of multiple indices determined benthic community condition more accurately than the use of a single index (Ranasinghe et al. 2009). Given the limited ability of current SQG approaches to predict the bioavailability and biological effects of contaminants under diverse conditions, such a multiple-index approach may provide a more reliable basis for assessing the significance of chemical contamination.
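The text does not specify how the results of several indices should be combined. One simple, hypothetical rule is sketched below in Python: each index first assigns the station to a category from 1 to 4, and the integrated category is the mean of those categories rounded up, so that disagreements resolve toward the higher expected effect. The per-index results shown are placeholders, and this rule is an illustration, not the method evaluated in this study.

```python
from typing import Mapping


def integrate_categories(categories: Mapping[str, int]) -> int:
    """Combine per-index categories (1-4) by averaging and rounding up.

    Rounding up is a hypothetical, precautionary choice: ties between
    indices resolve toward the higher expected effect.
    """
    if not categories:
        raise ValueError("need at least one index")
    values = list(categories.values())
    if any(v not in (1, 2, 3, 4) for v in values):
        raise ValueError("categories must be integers 1-4")
    # Integer ceiling of the mean avoids floating-point rounding surprises
    return -(-sum(values) // len(values))


station = {"CSI": 2, "CA LRM": 3}   # hypothetical per-index categories
print(integrate_categories(station))  # 3
```

Other rules, such as taking the median category, are equally easy to implement; the choice determines how disagreements between indices are resolved.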

Editor's Note

This article is 1 of 6 papers describing the development and evaluation of a sediment quality assessment framework to support implementation of California's new sediment quality objectives for bays and estuaries, which became effective in 2009. More than 30 scientists collaborated on this California State Water Resources Control Board effort, which resulted in the establishment of one of the first statewide programs in the US to fully incorporate the sediment quality triad into regulatory applications.

SUPPLEMENTAL DATA

A: Chemical score index (CSI).

B: Weighted Agreement and Weighted Kappa Statistic.

C: Benthic Response Index Calculation.
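Appendix B itself is not reproduced here. As a hedged illustration of the statistic it names, the Python sketch below computes weighted agreement and Cohen's (1968) weighted kappa with the linear weights of Cicchetti and Allison (1971); whether the appendix uses exactly this weighting is an assumption, and the station categories are hypothetical.

```python
from typing import Sequence, Tuple


def weighted_kappa(pred: Sequence[int], obs: Sequence[int], k: int = 4) -> Tuple[float, float]:
    """Return (weighted agreement, weighted kappa) using linear weights.

    Categories are integers 1..k. The linear weight w_ij = 1 - |i - j| / (k - 1)
    gives full credit on the diagonal and partial credit to near misses.
    """
    n = len(pred)
    if n == 0 or n != len(obs):
        raise ValueError("need two non-empty sequences of equal length")
    if any(not 1 <= c <= k for c in list(pred) + list(obs)):
        raise ValueError(f"categories must lie in 1..{k}")
    # Cross-tabulate counts of (predicted, observed) category pairs
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(pred, obs):
        counts[a - 1][b - 1] += 1
    p = [[c / n for c in row] for row in counts]                # joint proportions
    row_m = [sum(p[i]) for i in range(k)]                        # predicted marginals
    col_m = [sum(p[i][j] for i in range(k)) for j in range(k)]   # observed marginals
    w = [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    po = sum(w[i][j] * p[i][j] for i in range(k) for j in range(k))
    pe = sum(w[i][j] * row_m[i] * col_m[j] for i in range(k) for j in range(k))
    if pe == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return po, (po - pe) / (1 - pe)


# Hypothetical SQG and observed effect categories for 8 stations
po, kw = weighted_kappa([1, 1, 2, 3, 4, 2, 1, 3], [1, 2, 2, 3, 3, 2, 1, 4])
print(f"weighted agreement = {po:.3f}, weighted kappa = {kw:.3f}")
```

Weighted kappa subtracts the agreement expected by chance from the row and column totals, so it can be low even when agreement is high if most stations fall in a single category.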

Acknowledgements

We thank Stephen Weisberg; Chris Beegan of the California State Water Resources Control Board; and Mike Connor and Bruce Thompson of the San Francisco Estuary Institute for their suggestions on the design of this study. Ananda Ranasinghe provided benthic community analyses and manuscript review. Peggy Myre of Exa Data and Mapping, together with Darrin Greenstein, Jeff Brown, and Diana Young, compiled the data and assisted with data analysis. The authors also thank Peter Landrum, Ed Long, Todd Bridges, Tom Gries, Rob Burgess, and Bob Van Dolah for their thoughtful review of the ideas contained in this manuscript. Work on this project was funded by the California State Water Resources Control Board under agreement 01-274-250-0.

REFERENCES

  • Barrick R, Becker S, Brown L, Beller H, Pastorok R. 1988. Sediment quality values refinement: 1988 update and evaluation of Puget Sound AET, Volume 1. Bellevue (WA): PTI Environmental Services.
  • Bay S, Greenstein D, Young D. 2007. Evaluation of methods for measuring sediment toxicity in California bays and estuaries. Technical Report 503. Costa Mesa (CA): Southern California Coastal Water Research Project.
  • Bay SM, Ritter KJ, Vidal-Dorsch DE, Field LJ. 2008. Comparison of national and regional sediment quality guidelines for classifying sediment toxicity in California. In: Weisberg SB, Miller K, editors. Southern California Coastal Water Research Project Annual Report 2008. Costa Mesa (CA): Southern California Coastal Water Research Project. p 79–90.
  • Bettinetti R, Galassi S, Falandysz J, Camusso M, Vignati AL. 2009. Sediment quality assessment in the Gulf of Gdańsk (Baltic Sea) using complementary lines of evidence. Environ Manag 43:1313–1320.
  • Borja Á, Bricker SB, Dauer DM, Demetriades NT, Ferreira JG, Forbes AT, Hutchings PA, Jia X, Kenchington R, Marques JC, et al. 2008. Overview of integrative tools and methods in assessing ecological integrity in estuarine and coastal systems worldwide. Mar Pollut Bull 56:1519–1537.
  • Canran L, Berry PM, Dawson TP, Pearson RG. 2005. Selecting thresholds of occurrence in the prediction of species distributions. Ecography 28:385–393.
  • Chapman PM, McDonald BG, Lawrence GS. 2002. Weight-of-evidence issues and frameworks for sediment quality (and other) assessments. Hum Ecol Risk Assess 8:1489–1515.
  • Choueri RB, Cesar A, Abessa DMS, Torres RJ, Morais RD, Riba I, Pereira CDS, Nascimento MRL, Mozeto AA, DelValls TA. 2009. Development of site-specific sediment quality guidelines for North and South Atlantic littoral zones: comparison against national and international sediment quality benchmarks. J Hazard Mater 170:320–331.
  • Cicchetti DV, Allison T. 1971. A new procedure for assessing reliability of scoring EEG sleep recordings. Am J EEG Technol 11:101–109.
  • Cohen J. 1960. A coefficient of agreement for nominal scales. Educ Psychol Meas 20:37–46.
  • Cohen J. 1968. Weighted Kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull 70:213–220.
  • Di Toro DM, Zarba CS, Hansen DJ, Berry WJ, Swartz RC, Cowan CE, Pavlou SP, Allen HE, Thomas NA, Paquin PR. 1991. Technical basis for establishing sediment quality criteria for nonionic organic chemicals using equilibrium partitioning. Environ Toxicol Chem 10:1541–1586.
  • Fairey R, Long ER, Roberts CA, Anderson BS, Phillips BM, Hunt JW, Puckett HR, Wilson CJ. 2001. An evaluation of methods for calculating mean sediment quality guideline quotients as indicators of contamination and acute toxicity to amphipods by chemical mixtures. Environ Toxicol Chem 20:2276–2286.
  • Field LJ, MacDonald D, Norton SB, Severn CG, Ingersoll CG. 1999. Evaluating sediment chemistry and toxicity data using logistic regression modeling. Environ Toxicol Chem 18:1311–1322.
  • Field LJ, MacDonald DD, Norton SB, Ingersoll CG, Severn CG, Smorong D, Lindskoog R. 2002. Predicting amphipod toxicity from sediments using logistic regression models. Environ Toxicol Chem 21:1993–2005.
  • Feinstein AR, Cicchetti DV. 1990. High agreement but low Kappa: I. The problems of two paradoxes. J Clin Epidemiol 43:543–549.
  • Helsel D. 2005. More than obvious: better methods for interpreting nondetect data. Environ Sci Technol 39:419A–423A.
  • Holmes RW, Anderson B, Phillips B, Hunt JW, Crane DB, Mekebri A, Connor V. 2008. Statewide investigation of the role of pyrethroid pesticides in sediment toxicity in California's urban waterways. Environ Sci Technol 42:7003–7009.
  • Hunt JW, Anderson BS, Phillips BM, Tjeerdema RS, Taberski KM, Wilson CJ, Puckett HM, Stephenson M, Fairey R, Oakden J. 2001. A large-scale categorization of sites in San Francisco Bay, USA, based on the sediment quality triad, toxicity identification evaluations, and gradient studies. Environ Toxicol Chem 20:1252–1265.
  • Hyland JL, Balthis WL, Engle VD, Long ER, Paul JF, Summers JK, Van Dolah RF. 2003. Incidence of stress in benthic communities along the U.S. Atlantic and Gulf of Mexico coasts within different ranges of sediment contamination from chemical mixtures. Environ Monit Assess 81:149–161.
  • Kraemer HC, Bloch DA. 1988. Kappa coefficients in epidemiology: an appraisal of a reappraisal. J Clin Epidemiol 41:959–968.
  • Kwok KWH, Bjorgesæter A, Leung KY, Lui GCS, Gray JS, Shin PKS, Lam PKS. 2008. Deriving site-specific sediment quality guidelines for Hong Kong marine environments using field-based species sensitivity distributions. Environ Toxicol Chem 27:226–234.
  • Lantz CA, Nebenzahl E. 1996. Behavior and interpretation of the kappa statistic: Resolution of the two paradoxes. J Clin Epidemiol 49:431–434.
  • Leung KMY, Bjorgesater A, Gray JS, Li WK, Lui GCS, Wang U, Lam PKS. 2005. Deriving sediment quality guidelines from field-based species sensitivity distributions. Environ Sci Technol 39:5148–5156.
  • Liess M, Von der Ohe PC. 2005. Analyzing effects of pesticides on invertebrate communities in streams. Environ Toxicol Chem 24:954–965.
  • Lladó XM, Gibert O, Martí V, Díez S, Romo J, Bayona JM, de Pablo J. 2007. Distribution of polycyclic aromatic hydrocarbons (PAHs) and tributyltin (TBT) in Barcelona harbour sediments and their impact on benthic communities. Environ Pollut 149:104–113.
  • Long ER, Ingersoll CG, MacDonald DD. 2006. Calculation and uses of mean sediment quality guideline quotients: A critical review. Environ Sci Technol 40:1726–1736.
  • Long ER, MacDonald DD, Severn CG, Hong CB. 2000. Classifying the probabilities of acute toxicity in marine sediments with empirically derived sediment quality guidelines. Environ Toxicol Chem 19:2598–2601.
  • Long ER, MacDonald DD, Smith SL, Calder FD. 1995. Incidence of adverse biological effects within ranges of chemical concentrations in marine and estuarine sediments. Environ Manag 19:81–97.
  • Long ER, Field LJ, MacDonald DD. 1998. Predicting toxicity in marine sediments with numerical sediment quality guidelines. Environ Toxicol Chem 17:714–727.
  • Long ER, Dutch M, Aasen S, Welch K, Hameedi MJ. 2005. Spatial extent of degraded sediment quality in Puget Sound (Washington state, USA) based upon measures of the sediment quality triad. Environ Monit Assess 111:173–222.
  • MacDonald DD, Ingersoll CG, Smorong DE, Greening H, Pribble R, Janicki T, Janicki S, Grabe S, Sloane B, Eckenrod D, et al. 2004. Development of an ecosystem-based framework for assessing and managing sediment quality conditions in Tampa Bay, Florida. Arch Environ Contam Toxicol 46:147–161.
  • MacDonald DD, Di Pinto LM, Field LJ, Ingersoll CG, Long ER, Swartz RC. 2000. Development and evaluation of consensus-based sediment effect concentrations for polychlorinated biphenyls (PCB). Environ Toxicol Chem 19:1403–1413.
  • MacDonald DD, Carr RS, Calder FD, Long ER, Ingersoll CG. 1996. Development and evaluation of sediment quality guidelines for Florida coastal waters. Ecotoxicology 5:253–278.
  • McDonald BG, de Bruyn AMH, Wernick BG, Patterson J, Pellerin N, Chapman P. 2007. Design and application of a transparent and scalable weight of evidence framework: An example from Wabamun Lake, Alberta, Canada. Integr Environ Assess Manag 3:476–483.
  • Ranasinghe JA, Weisberg SB, Smith RW, Montagne DE, Thompson B, Oakden JM, Huff DD, Cadien DB, Velarde RG, Ritter KJ. 2009. Calibration and evaluation of five indicators of benthic community condition in two California bay and estuary habitats. Mar Pollut Bull 59:5–13.
  • SAS Institute. 2004. SAS OnlineDoc® 9.1.3. Cary (NC): SAS Institute.
  • Schäfer RB, Casquet T, Siimes K, Mueller R, Lagadic L, Liess M. 2007. Effects of pesticides on community structure and ecosystem functions in agricultural streams of three biogeographical regions in Europe. Sci Total Environ 382:272–285.
  • Smith RW, Ranasinghe JA, Weisberg SB, Montagne DE, Cadien DB, Mikel TK, Velarde RG, Dalkey A. 2003. Extending the Southern California benthic response index to assess benthic condition in bays. Technical Report 410. Westminster (CA): Southern California Coastal Water Research Project.
  • Smith RW, Bergen M, Weisberg SB, Cadien DB, Dalkey A, Montagne DE, Stull JK, Velarde RG. 2001. Benthic response index for assessing infaunal communities on the southern California mainland shelf. Ecol Appl 11:1073–1087.
  • Swartz RC. 1999. Consensus sediment quality guidelines for PAH mixtures. Environ Toxicol Chem 18:780–787.
  • Thompson B, Lowe S. 2004. Assessment of macrobenthos response to sediment contamination in the San Francisco Estuary, California, USA. Environ Toxicol Chem 23:2178–2187.
  • [USEPA] US Environmental Protection Agency. 1994. Methods for assessing the toxicity of sediment associated contaminants with estuarine and marine amphipods. Washington (DC): USEPA Office of Research and Development. EPA/600/R-94/025.
  • [USEPA] US Environmental Protection Agency. 2005. Predicting toxicity to amphipods from sediment chemistry (Final Report). Washington (DC): USEPA National Center for Environmental Assessment Office of Research and Development. EPA/600/R-04/030.
  • Van Sickle J, Huff DD, Hawkins CP. 2006. Selecting discriminant function models for predicting the expected richness of aquatic macroinvertebrates. Freshwater Biol 51:359–372.
  • Vidal DE, Bay SM. 2005. Comparative sediment guideline performance for predicting sediment toxicity in southern California, USA. Environ Toxicol Chem 24:3173–3182.
  • Wenning RJ, Batley GE, Ingersoll CG, Moore DW, editors. 2005. Use of sediment quality guidelines (SQGs) and related tools for the assessment of contaminated sediments. Pensacola (FL): Society of Environmental Toxicology and Chemistry.
  • Wright JF, Furse MT, Armitage PD. 1993. RIVPACS: a technique for evaluating the biological water quality of rivers in the UK. Eur Water Pollut Control 3:15–25.

Supporting Information

Supporting information may be found in the online version of this article.

Supplementary Appendix (filename: ieam_191_sm_SuppApp.doc; size: 77K)

Please note: Wiley Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.