Sediment reference sites were used to establish toxicity standards against which to compare results from sites investigated in San Francisco Bay (California, USA) monitoring programs. The reference sites were selected on the basis of low concentrations of anthropogenic chemicals, distance from active contaminant sources, location in representative hydrographic areas of the Bay, and physical features characteristic of depositional areas (e.g., fine grain size and moderate total organic carbon [TOC]). Five field-replicated candidate sites in San Francisco Bay were evaluated over three seasons. Samples from each site were tested with nine toxicity test protocols and were analyzed for sediment grain size and for concentrations of trace metals, trace organics, ammonia, hydrogen sulfide, and TOC. The candidate sites had relatively low concentrations of the measured chemicals and generally exhibited low toxicity. Toxicity data from the reference sites were then used to calculate numerical tolerance limits, which serve as threshold values for identifying test sites with significantly higher toxicity than the reference sites. Tolerance limits are presented for four standard test protocols: solid-phase sediment tests with the amphipods Ampelisca abdita and Eohaustorius estuarius, and sea urchin (Strongylocentrotus purpuratus) embryo/larval development tests in pore water and at the sediment-water interface (SWI). Tolerance limits delineating the lowest 10th percentile (0.10 quantile) of the reference site data distribution were 71% of the control response for Ampelisca, 70% for Eohaustorius, 94% for sea urchin embryos in pore water, and 87% for sea urchin embryos exposed at the SWI. The tolerance limits are discussed in terms of the critical values governing their calculation and the management implications of using them to determine elevated toxicity relative to reference conditions.
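The abstract does not spell out how the tolerance limits were computed, so the following is only a minimal sketch of one generic approach, not the authors' method. It assumes reference responses (percent of control) are approximately normal and computes a one-sided lower tolerance limit, mean − k·s, using the Natrella approximation for k. Under that model, with confidence 1 − alpha, the limit lies at or below the 0.10 quantile. The confidence level `alpha = 0.05`, the function names, and the reference values are hypothetical. The published calculation may differ, for example by partitioning variance among sites and seasons. The `REPORTED_LIMITS` values are the thresholds stated in the abstract. Treating a response strictly below the limit as "significantly more toxic" is an assumed convention.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def lower_tolerance_factor(n, p=0.10, alpha=0.05):
    """Approximate one-sided normal tolerance factor k (Natrella approximation).

    p is the lower-tail proportion (0.10 -> 0.10 quantile); alpha is an assumed
    significance level, so the confidence level is 1 - alpha.
    """
    if n < 3:
        raise ValueError("need at least 3 reference observations")
    z_p = NormalDist().inv_cdf(1 - p)      # about 1.2816 for p = 0.10
    z_a = NormalDist().inv_cdf(1 - alpha)  # about 1.6449 for alpha = 0.05
    a = 1 - z_a ** 2 / (2 * (n - 1))
    b = z_p ** 2 - z_a ** 2 / n
    return (z_p + sqrt(z_p ** 2 - a * b)) / a

def lower_tolerance_limit(values, p=0.10, alpha=0.05):
    """Hypothetical helper: mean - k * s for reference responses (% of control)."""
    k = lower_tolerance_factor(len(values), p, alpha)
    return mean(values) - k * stdev(values)

# Thresholds (% of control response) as reported in the abstract.
REPORTED_LIMITS = {
    "Ampelisca abdita": 71.0,
    "Eohaustorius estuarius": 70.0,
    "S. purpuratus pore water": 94.0,
    "S. purpuratus SWI": 87.0,
}

def exceeds_reference(protocol, pct_of_control):
    """True if a test-site response falls below the reference tolerance limit
    (assumed convention: strictly below = significantly more toxic)."""
    return pct_of_control < REPORTED_LIMITS[protocol]

# Hypothetical reference-site responses, NOT data from the study.
reference = [88, 92, 79, 95, 85, 90, 83, 97, 86, 91]
limit = lower_tolerance_limit(reference)
print(f"illustrative lower tolerance limit: {limit:.1f}% of control")
print(exceeds_reference("Ampelisca abdita", 65.0))
```

With only ten observations, k is noticeably larger than the point-quantile z value (about 2.32 versus 1.28 here). This reflects the extra allowance for uncertainty in the estimated mean and standard deviation. That allowance is one reason the critical values (p and confidence level) strongly influence where the threshold falls.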