Use of receiver operating characteristic curves to evaluate sediment quality guidelines for metals

Authors

  • James P. Shine,

    Corresponding author
    1. Environmental Science & Engineering Program, Department of Environmental Health, 665 Huntington Avenue, Boston, Massachusetts 02115, USA
  • Crista J. Trapp,

    1. Environmental Science & Engineering Program, Department of Environmental Health, 665 Huntington Avenue, Boston, Massachusetts 02115, USA
  • Brent A. Coull

    1. Department of Biostatistics, Harvard School of Public Health, 665 Huntington Avenue, Boston, Massachusetts 02115, USA

Abstract

Receiver operating characteristic (ROC) curves are commonly used in the biomedical field to assess the quality of a diagnostic test. The area under an ROC curve, which ranges from 0.5 (no better than chance) to 1.0 (perfect classification) for any test at least as good as random guessing, is a measure of the overall effectiveness of a diagnostic test. These curves can be used to elucidate the trade-offs between sensitivity (the ability to correctly classify a toxic sample as toxic) and specificity (the ability to correctly classify a nontoxic sample as nontoxic) associated with a given threshold. In this study, ROC curves were used to evaluate methods for estimating the acute toxicity of metals in marine sediments. Differences in effectiveness between speciation approaches (comparisons of labile sulfides with simultaneously extracted metals) and total sediment concentration approaches (such as the sediment quality guidelines of the National Oceanic and Atmospheric Administration, Washington, DC, USA) were assessed using a database of field-collected and laboratory-spiked sediments. Despite the uncertainties associated with these methods, the areas under the ROC curves ranged from 0.84 to 0.89 for all approaches tested, with no significant differences between the speciation and whole-sediment approaches. Thresholds commonly used by environmental managers yielded high sensitivity, but at the expense of low specificity. Thresholds providing more desirable trade-offs between sensitivity and specificity are generally higher than those commonly used.
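To make the sensitivity/specificity trade-off and the area under the curve concrete, the following is a minimal sketch, not the authors' analysis. It assumes each sediment sample has a single numeric score (for example, a hypothetical total metal concentration) where higher values indicate greater predicted toxicity, and a binary toxicity label (1 = toxic, 0 = nontoxic). The data are invented for illustration and do not come from the study's database. The ROC curve is built by sweeping the threshold over every observed score, and its area is computed with the trapezoidal rule. The Youden index (sensitivity + specificity - 1) is used here as one common criterion for a "desirable" trade-off. It is a swapped-in choice and not necessarily the criterion used in the paper.

```python
from typing import List, Tuple

# Hypothetical data, invented for illustration (not from the study's database).
# Higher scores are assumed to indicate greater predicted toxicity.
SCORES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
LABELS = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]  # 1 = toxic, 0 = nontoxic


def sens_spec(scores: List[float], labels: List[int], threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when samples with score >= threshold are called toxic."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """(1 - specificity, sensitivity) pairs, sweeping the threshold from high to low."""
    points = [(0.0, 0.0)]  # threshold above every score: nothing is called toxic
    for t in sorted(set(scores), reverse=True):
        sens, spec = sens_spec(scores, labels, t)
        points.append((1.0 - spec, sens))
    return points  # the lowest threshold calls every sample toxic, ending at (1, 1)


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def youden_threshold(scores: List[float], labels: List[int]) -> Tuple[float, float, float]:
    """Threshold maximizing Youden's J = sensitivity + specificity - 1."""
    best = max(sorted(set(scores)),
               key=lambda t: sum(sens_spec(scores, labels, t)) - 1.0)
    sens, spec = sens_spec(scores, labels, best)
    return best, sens, spec


if __name__ == "__main__":
    print("AUC:", auc(roc_points(SCORES, LABELS)))
    print("threshold 3 (sens, spec):", sens_spec(SCORES, LABELS, 3))
    print("Youden threshold (t, sens, spec):", youden_threshold(SCORES, LABELS))
```

On these invented data the area is 0.8 (up to floating-point rounding). A low threshold of 3 catches every toxic sample (sensitivity 1.0) but correctly clears only 40% of nontoxic samples (specificity 0.4). The Youden-optimal threshold of 6 gives 0.8 for both. This mirrors, in miniature, the pattern described above, in which a higher threshold gives a more balanced trade-off.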
