Importance of background threshold value development within risk‐based corrective action programs

Risk‐based corrective action (RBCA) programs employ conservative models to develop default values for soil screening, which simplify the risk assessment process. However, for several naturally occurring metals (e.g., arsenic and lead), these published screening values are often unrealistic and well below the documented background levels in soil. This can lead to confusion among the regulated community and inexperienced regulators, as it will inappropriately identify naturally occurring conditions as a release (false positive or Type I error). An effective RBCA program requires the incorporation of defensible background threshold values (BTVs) in the screening process. Recent datasets and BTV development methods are available to enhance existing RBCA programs and reduce the occurrence of Type I errors. This review evaluated the role “background” currently plays in the Texas Risk Reduction Program (TRRP) and offers defensible approaches in minimizing Type I errors estimated by one Texas municipality to directly result in an unnecessary expenditure of over $250,000 annually to address this confusion in the form of additional assessment, remediation, soil management, and even disposal requirements. The same BTV development process demonstrated in this Texas case study can also inform risk assessment efforts in other areas where BTVs can supplement existing RBCA programs.


INTRODUCTION
Many states have sophisticated risk-based corrective action (RBCA) programs that include detailed exposure and fate and transport models on par with federal programs. By using a conservative set of site-specific parameters (e.g., soil type, soil pH, and depth to groundwater) in these models, precalculated values for soil, water, and soil gas are published to ensure that the regulated public has a sufficient margin of safety. While this approach has been successful in simplifying the assessment process, it can lead to excessive Type I errors (false positives) if representative background

Core Ideas
• Understanding an expected "background" value is essential to any environmental risk-based corrective action (RBCA) decision-making. Unfortunately, many RBCA programs use artificially low, if any, background values for naturally occurring compounds. This can result in high false positive rates (Type I errors) and unnecessary costs and confusion within the regulated community.
• High-resolution data are available on a national scale to inform the development of background threshold values for many naturally occurring metals (e.g., arsenic and lead) that are related to common Type I errors observed in RBCA programs.
• Like any regulatory process, RBCA programs should incorporate periodic reviews of their risk assessment processes with stakeholder input to ensure that the best available data and methods are utilized.

Background on the Texas Risk Reduction Program
TRRP has proven to be a highly effective risk-based assessment program that is employed daily in Texas to ensure all those engaged in the practice of site assessment, investigation, and risk evaluation have a common framework. It has no doubt led to a large number of brownfield properties being addressed and successfully redeveloped under the Texas Corrective Action and Voluntary Cleanup Programs. Like any effective RBCA program, the flexible nature of TRRP in identifying and addressing risk is key to its success. This approach employs default Tier 1 Protective Concentration Levels (PCLs), calculated and published by the Texas Commission on Environmental Quality (TCEQ) using conservative inputs, as initial screening "lookup" tables, together with "background" values for 22 naturally occurring metals. For convenience, these 22 naturally occurring chemical elements are called "metals" despite including various nonmetals and metalloids: arsenic, antimony, boron, fluoride, and selenium. Of these metals, lead and arsenic are the most frequently identified as exceeding the default Tier 1 PCLs and requiring a comparison to the expected background levels. It is well known that arsenic (ATSDR, 2007) and lead (ATSDR, 2020) are prevalent in the natural environment (e.g., AsO₄³⁻, AsO₂⁻, Pb²⁺, PbSO₄, and PbCO₃), and their concentrations vary greatly depending on the parent material from which the soil is derived as well as vegetation (e.g., biomagnification near the surface). For example, soils with negligible weathering (Entisols) often have much lower metal concentrations than highly developed soils (Alfisols) (Díez et al., 2009), and soils derived from shale (Hydrolyzates) can have more than five times the arsenic or lead concentrations compared to soils derived from sedimentary formations (Resistates) (Hem, 1985).
When TRRP was promulgated in 1999, the TCEQ (then the Texas Natural Resource Conservation Commission [TNRCC]) elected to use a single median value for each of the 22 Texas-specific soil background concentrations (TSSBCs) representing all of Texas. Comments to the proposed TSSBCs suggested the use of a median value was not statistically valid to derive a "background" value as it falsely suggests 50% of a normally distributed population is an exceedance (TCEQ, 1999) (i.e., a false positive or Type I error). Although this is true theoretically, it is more common for environmental data to exhibit right-skewed lognormal or gamma distributions (US EPA, 2022a). Under these conditions, the median value will frequently be less than the mean, meaning that the use of a median statistic for the background is likely to cause a false positive (Type I error) rate greater than 50%. This error rate is inconsistent with 30 TAC §350.79(2)(A)(iii), which requires a Type I error rate of no more than 5% within TRRP. The TSSBCs also neglect the consideration of spatial variability and increase Type I error rates for many parts of Texas (as further illustrated in Section 6.2).
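The effect of a median-based threshold on a right-skewed population can be illustrated with a short simulation. This is a minimal sketch only: the lognormal parameters, sample size, and function name are hypothetical and are not drawn from TRRP or any USGS dataset. Thresholds are derived from one simulated background sample and then applied to a second, independent background sample from the same population.

```python
import random
import statistics


def exceedance_rates(seed=0, n=5000, mu=1.5, sigma=0.6):
    """Simulate background-only data and report how often new background
    samples exceed a median-based versus a 95th-percentile threshold.

    mu and sigma are hypothetical lognormal parameters (log scale).
    """
    rng = random.Random(seed)
    reference = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    new_samples = [rng.lognormvariate(mu, sigma) for _ in range(n)]

    median_threshold = statistics.median(reference)
    # Empirical 95th percentile of the reference data.
    p95_threshold = sorted(reference)[int(0.95 * n) - 1]

    def rate(threshold):
        # Fraction of new background samples flagged as exceedances.
        return sum(x > threshold for x in new_samples) / len(new_samples)

    return {
        "mean": statistics.mean(reference),
        "median": median_threshold,
        "rate_above_median": rate(median_threshold),
        "rate_above_p95": rate(p95_threshold),
    }


result = exceedance_rates()
print(result)
```

Under these assumptions, the median falls below the mean, roughly half of the background-only samples are flagged by the median-based threshold, and an upper-percentile threshold keeps the flagged fraction near the 5% ceiling.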
It is important to emphasize that when a regulatory criterion is based on "background" rather than a risk-based cleanup standard, the null hypothesis (H₀) is framed where the collected sample is presumed to be consistent with background. This is why the decision that a given sample is above the background, when it is actually below the background, is a Type I error (false positive). This type of compliance monitoring is central to the EPA's risk assessment approach (US EPA, 1989, 1992a, 1992b), detailed under 40 CFR 264.98(f) and within TRRP itself (30 TAC §350).

3 What is "background?"
Under 30 TAC §350.4(a)(6), TRRP offers a definition of "background" as a "population of concentrations characterized from samples in an environmental medium containing a chemical of concern (COC) that is naturally occurring . . . or anthropogenic . . . but is not the result of site-specific use or release. . ." Under 30 TAC §350.51(c), it is further clarified that "if the assessment level is based upon background concentrations, then the assessment shall only extend to the background concentration level." The TRRP Preamble (TCEQ, 1999) provides some context for both the concerns expressed by the regulated community concerning the TSSBC approach and the TCEQ (then TNRCC) responses when originally proposed in 1999. These specifically include the following:
1. "TRRP allows persons to use the 'Texas-specific background concentration;' it is not a requirement."
2. "Clearly, there is no scientific basis for drawing inferences about the distribution of background concentrations on a specific affected property based on a value that represents a median concentration for the entire state."
3. "Persons have absolute latitude to establish background on a more site-specific basis in lieu of using the Texas median background levels."

Risk assessment under TRRP
TRRP is clear that under 30 TAC §350.51(l), a person can "determine representative concentrations of COCs at the affected property or within areas representative of site-specific background conditions" as long as the study includes sound geostatistical methods, an appropriate number of samples are included to allow predictive modeling, exposure area assumptions are met, and "hot spots" (i.e., originating from an actual release) are not suggested. However, the use of TSSBCs for TRRP applicability decisions was expanded in a 2003 memorandum entitled Determining Which Releases are Subject to TRRP (TRRP Memo, revised 2010; TCEQ, 2010) in an effort to "help clarify when a release is subject to TRRP." Within the TRRP Memo, "background" is presented as an "Action Level" and TSSBCs as being "background." While "screening in" a COC for further evaluation is indeed a tenet of any risk-based program, the ability to also account for expected regional or site-specific background was always an intended element within TRRP and the seminal EPA (US EPA, 1996) guidance upon which it was partially based. The selection of an arbitrary median value to represent the entire state has resulted in penalizing specific physiographic regions of Texas where elevated metal concentrations occur naturally. This has resulted in large portions of the regulated community expending time and resources "delineating" and "remediating" to what TCEQ has set as an "Action Level" in the TRRP Memo. This is also inconsistent with the defined intent of TRRP as set forth in 30 TAC §350.1, which outlined "a consistent corrective action process directed toward protection of human health and the environment balanced with the economic welfare of the citizens of this state."
TRRP uses expected background concentrations in concert with pre-calculated PCLs as a means to simplify the risk assessment process. For example, during the course of general environmental due diligence, samples collected and evaluated for arsenic may show concentrations of 9-11 mg/kg in parts of Texas where this is commonly expected (see Figure 2). The regulated community was asked to screen against the higher of the lowest PCL for that medium (i.e., 5.0 mg/kg in soil) and background (TSSBC of 5.9 mg/kg). This yields a screening value (Residential Assessment Level) of 5.9 mg/kg. The TRRP Memo requires the public to expend time and money to either develop a site-specific background value or commence risk-assessment/remedial activities and obtain regulatory concurrence, even though the available data suggest that the investigation result is likely background. It should be noted that the TCEQ will not currently allow the person to utilize existing USGS data for screening (TCEQ, 2023), so the public must collect data, at their expense, for a technical submission to the TCEQ. While well intended, the TRRP Memo highlights several overlapping conservative issues in one place. These include an overly conservative PCL developed assuming a soil pH of 4.9 (contrary to EPA guidance to use a pH of 6.8; US EPA, 1996), use of the Synthetic Precipitation Leaching Procedure by EPA Method 1312 even though TCEQ noted it may "overpredict" leaching (TCEQ, 2001a), and the median values represented by TSSBCs. For reference, recent data review efforts (Walkinshaw et al., 2022) have confirmed the median soil pH in Texas to be approximately 7.85, with only an estimated 3% of Texas surface soil (i.e., surface to a depth of 2 m) meeting conditions currently utilized to calculate PCLs (i.e., pH ≤ 4.9). Figure S1 shows a depiction of surface soil pH across Texas.
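The screening arithmetic in the arsenic example can be sketched in a few lines. The function name and the three sample results are hypothetical; the 5.0 and 5.9 mg/kg values are those quoted above, and 11 mg/kg is the state-wide arsenic BTV proposed later in this review.

```python
def residential_assessment_level(lowest_pcl, background):
    """Return the screening value as the higher of the lowest Tier 1 PCL
    and the background value (both in mg/kg)."""
    return max(lowest_pcl, background)


LOWEST_PCL = 5.0  # mg/kg, lowest Tier 1 PCL for arsenic in soil (from the text)
TSSBC = 5.9       # mg/kg, state-wide median background (from the text)
STATE_BTV = 11.0  # mg/kg, state-wide arsenic BTV proposed in this review

# Hypothetical due-diligence results within the 9-11 mg/kg range.
results = [9.0, 10.2, 11.0]

level_tssbc = residential_assessment_level(LOWEST_PCL, TSSBC)
level_btv = residential_assessment_level(LOWEST_PCL, STATE_BTV)

flagged_tssbc = [r for r in results if r > level_tssbc]
flagged_btv = [r for r in results if r > level_btv]
print(level_tssbc, flagged_tssbc)
print(level_btv, flagged_btv)
```

With the median-based background, every hypothetical result is flagged for further action; with the BTV as the background term, none are.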
While the earlier portions of this review demonstrate that TRRP goals are not met with TSSBCs serving as part of any initial screening exercise, it is the resulting confusion, lost time, and expense (Daniel, 2015) that should be immediately recognized so that the matter can be addressed.
Interestingly, in 2001, the TCEQ updated the TRRP Tier 1 PCLs for mercury to include both 4.9 and 6.8 pH listings, which lessened the cost implication of the TSSBC for this metal. Similar pH-specific PCLs were not offered for the other COCs (e.g., lead). When the authors inquired about the two pH PCLs for mercury, TCEQ indicated there was no "responsive information" as to why this was done (TCEQ, 2023). Separately, a 2004 Q&A document suggested that this 6.8 pH PCL for mercury was convenience-based and that "if the pH is 6.8 or greater, then the person can use the 6.8-based PCL" (TCEQ, 2004). Seeing the precedent established by the TCEQ for mercury and its immediate benefit for the regulated community, the authors requested that additional Tier 1 PCLs be developed by the TCEQ for important metals such as arsenic and lead. Unfortunately, the TCEQ declined to develop these 6.8 pH PCLs (TCEQ, personal communication, September 14, 2023), further emphasizing the importance of BTVs in risk assessment under TRRP.
Based on returned surveys for this article, large municipal areas experience five- to six-figure cost implications as a result of the false positive (Type I error rate) issues created from TSSBCs (see Figure 1; Modern, 2023).
The City of Dallas specifically noted, "The current median-based background values by TCEQ under represent naturally occurring conditions in the Dallas area which leads to confusion within the regulated community. This has led to direct and indirect costs for additional assessment, additional soil management and disposal, and additional regulatory reporting when no actual release has occurred."

1981 USGS dataset
While the initial reference for the dataset used to generate the TSSBCs was not given when the TRRP rule was first published in 1999, in response to comments submitted to TCEQ in 2006, the TRRP rule update (TCEQ, 2007) cited a 1975 USGS study (Connor and Shacklette, 1975

2013 USGS dataset
Since the 1981 study was released, the USGS has worked to improve sampling techniques, spatial representation of the United States, and more specifically, the data resolution needed to better understand the variability found across the United States, given our heterogeneous pedology and geology. A modern collection of these data was published in 2013 (Smith et al., 2013) by the USGS and is available to the public to allow improved screening for expected (spatially relevant) background conditions. A representative Texas map of sampling points comparing the 1981 and 2013 USGS efforts is shown in Figure S2. The visual comparison clearly shows the 2013 USGS dataset offers a significantly higher resolution representation of Texas.
FIGURE 1 Estimated municipal cost impact of Texas-specific soil background concentrations (TSSBCs) (Modern, 2023).
The 2013 USGS dataset utilized a highly detailed process, from sample point collection through analysis, that included avoidance of anthropogenic sources (i.e., not sampling within 200 m of a major highway or within 5,000 m of industrial activities), high sampling density (∼1 per 1,600 km²), multiple depths (e.g., 5 cm, Horizon A, Horizon C), and extensive use of quality control procedures within the laboratory analysis.
USGS provides geographic information system (GIS) layers to the public to inform any size study with correlated isopach maps, clearly providing the reader with expected ranges of dozens of elements found in the surface soils across the United States. An example of the 2013 USGS arsenic data (Horizon A) across Texas is presented in Figure 2. The 2013 USGS lead data (Horizon A) across Texas are shown in Figure S3.

Review of potential bias between USGS and EPA methods
TCEQ staff have recently become reluctant to allow the use of USGS data for background concentration development, noting that "since Texas-specific background concentrations are memorialized in the rule, we go by them and not USGS" (TCEQ, personal communication, January 1, 2023). While this neglects the fact that TSSBCs are based on USGS data, it may be related to a larger concern that USGS and EPA analysis methods are not the same (Ames and Prych, 1995). It is true that historic and current USGS methods differ from the EPA methods (e.g., EPA SW-846 6010, 6020, and 7141) commonly utilized in RBCA. Additionally, the 2013 USGS methods dried all samples before analysis to remove moisture bias, while the EPA methods utilize the measured moisture content in "wet" samples to develop "dry" weight values.
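The wet-to-dry conversion mentioned above is commonly a simple ratio. The sketch below assumes moisture is reported as a percentage of the as-received (wet) sample mass; the function name and the sample values are hypothetical, and individual laboratories may report moisture on a different basis.

```python
def dry_weight_concentration(wet_conc_mg_kg, percent_moisture):
    """Convert an as-received ("wet") concentration to a dry-weight basis.

    Assumes percent_moisture is the mass of water as a percentage of the
    wet sample mass.
    """
    if not 0 <= percent_moisture < 100:
        raise ValueError("percent_moisture must be in [0, 100)")
    solids_fraction = 1 - percent_moisture / 100
    return wet_conc_mg_kg / solids_fraction


# Hypothetical sample: 8.0 mg/kg as received with 20% moisture.
dry = dry_weight_concentration(8.0, 20.0)
print(round(dry, 2))
```

Because the solids fraction is at most 1, the dry-weight value is never lower than the as-received value, which is why results from pre-dried samples and moisture-corrected samples are meant to be comparable.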
Based on this, it would be expected that all USGS mercury results would be biased low when compared to EPA methods that require preservation of this volatile metal prior to analysis.

EPA utilization of USGS data for background evaluations
When tasked with developing soil screening levels for ecological assessment, the EPA (US EPA, 2007) included a review of the 1981 USGS dataset used by the TCEQ TSSBCs. From the screening, the EPA preferred USGS data to its own EPA programmatic data specifically because EPA data "may not be representative of background concentrations." Instead, EPA decided the USGS data were "considered natural background concentrations because environmental samples were intentionally excluded in locations where metal concentrations were expected to be affected due to anthropogenic activities." This is consistent with many other EPA guidance documents (Breckenridge and Crockett, 1995; US EPA, 1995) outlining methods for developing background values for risk assessment. The EPA specifically includes the USGS as a resource to "establish an upper limit of background . . . if the investigator wanted to compare single values for a soil type." However, the EPA's most definitive support for using USGS data for background is the National Interactive Map for background lead concentrations that uses exclusively 2013 USGS data. The EPA notes, "USGS datasets for soil provide a baseline for the amount and distribution of chemical elements and minerals against which scientists can measure future changes from natural processes or human activities" (US EPA, 2023).

Reanalysis of USGS samples by EPA methods (Minnesota study)
An even more targeted answer to this bias concern between the USGS and EPA methods is available regarding arsenic, specifically in the form of recent work completed by the Minnesota Pollution Control Agency (MPCA). In 2021, MPCA sought to address the bias concern through a reanalysis of USGS soil samples for eight target inorganics (Brooks, 2021). The USGS soil samples from the 2013 USGS study (Smith et al., 2013) were shipped from Denver, Colorado, to MPCA and then reanalyzed to determine bias and to allow a more quantitative use of this dataset. A total of 45 samples were analyzed and compared against previous results obtained using USGS methods.
The MPCA effort definitively demonstrated that the USGS and EPA methods were nearly indistinguishable with regard to arsenic evaluation (Brooks, 2021). More specifically, the MPCA noted "an approximate −1% difference between USGS and EPA methods is observed in the 95th percentile." The MPCA elected to use all prior USGS datasets for arsenic in conjunction with the 45 reanalyzed samples to develop an arsenic BTV of 9 mg/kg for their state. It is notable that bias was quantified for other elements in the form of percent differences (−19% to −71%), and the data were then utilized to update the Minnesota BTVs. The Minnesota Study suggests significantly high biases within the USGS methods for aluminum, barium, chromium, and vanadium (>50% difference). Unfortunately, the MPCA study did not include lead in its evaluation.

USGS versus EPA method performance evaluation (Montana study)
The Montana Department of Environmental Quality (MDEQ) completed a Montana Background Soils Investigation (MBSI) in 2013 (Hydrometrics, 2013), which included the analysis of 20 metals. In the evaluation, the MDEQ included a comparison of the mean values from the USGS datasets (Shacklette and Boerngen, 1984) and 2013 MBSI results. In this review, a high USGS bias, similar to that noted in the 2021 MPCA study, was suggested for aluminum. However, a slight low USGS bias (underrepresentation) was suggested for elements such as arsenic and manganese. Consistent results (no significant bias) were observed between the USGS and MDEQ datasets for mercury, lead, nickel, copper, cobalt, and beryllium. MDEQ concluded that "the general agreement of the MBSI and USGS datasets in terms of mean concentrations suggests that MBSI samples are not dissimilar from the samples evaluated by Shacklette and Boerngen, which have frequently been used as generic regional 'background' benchmarks." Arsenic concentrations ranged from 1.5 to 116 mg/kg, with a median of 11.5 mg/kg, and a BTV of 22.5 mg/kg was developed. Lead concentrations ranged from 3 to 45 mg/kg, with a median of 15 mg/kg, and a BTV of 29.8 mg/kg was developed.

4.3.4 EPA arsenic background study (addressing false positives)
A 2009 review utilized EPA-collected samples from 1995 to 2001 (Vosnakis & Perry, 2009), focusing exclusively on arsenic and the impact of "false positives" in the risk assessment process. To derive representative BTVs for risk assessment, only EPA-based methods were used on soil samples collected from 189 sites in Kentucky, Maryland, New York, Ohio, Pennsylvania, Virginia, and West Virginia (over 1,600 sample points) under strict EPA quality assurance/quality control (QA/QC) protocols in compliance with a Superfund Administrative Order on Consent (AOC).
This study found arsenic values ranging from 1.1 to 89 mg/kg and noted, "The consideration of this dataset and the BTVs may aid in the appropriate identification of arsenic in soils below typical background concentrations. In turn, the use of BTVs may aid in identifying where risks are truly elevated relative to background, and thus where remediation may or may not be appropriate." An example of the state-specific BTVs developed and the applicable risk-based screening levels (RBSLs) used is shown for Ohio (Figure S4). This provides a useful illustration of how RBCA criteria, when screened without a realistic BTV, can unintentionally result in an expenditure of time and resources to demonstrate that a release is not present (see the gap between the RBSL of 6.8 mg/kg and the BTV of 25.5 mg/kg, the false positive region).
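The false positive region in the Ohio example can be expressed as a three-way classification. This is an illustrative sketch: the function name, the labels, and the three sample results are hypothetical, while the 6.8 and 25.5 mg/kg values are those quoted above.

```python
def classify_result(conc, rbsl, btv):
    """Classify a soil result (mg/kg) against a risk-based screening level
    (RBSL) and a background threshold value (BTV). Labels are illustrative."""
    if conc <= rbsl:
        return "below RBSL"
    if conc <= btv:
        return "false positive region"  # above RBSL but within background
    return "above BTV"


OHIO_RBSL = 6.8  # mg/kg, from the text
OHIO_BTV = 25.5  # mg/kg, from the text

# Hypothetical arsenic results (mg/kg).
labels = {c: classify_result(c, OHIO_RBSL, OHIO_BTV) for c in (4.0, 12.0, 30.0)}
for conc, label in labels.items():
    print(conc, label)
```

Only results in the last category would suggest concentrations elevated relative to background; results in the middle category exceed the RBSL but remain within the expected background range.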

Review of BTV methods
In 2022, the Interstate Technology & Regulatory Council (ITRC) released guidance (ITRC, 2022), noting that "since the mean is a measure of the central tendency of a dataset, upper confidence limit (UCL) of the mean should not, under all but select circumstances, be used as a BTV because the result would be excessive false positive results." The authors could not identify regulatory guidance, outside the TSSBCs themselves, supporting the use of a median value as representative of the background concentration. Using the median itself is even more conservative than the UCL that the ITRC describes as inappropriate.
The ITRC and EPA recognize that the 95% upper tolerance limit (95% UTL) "has become the most common measure of BTV in practice," with an upper simultaneous limit (95% USL) presented when there are minimal outliers and can be "specifically used to mitigate the issue of excessive false positive error rate in point-by-point comparisons" (ITRC, 2022). The 95% UTL represents the value where 95% of recorded samples would be expected to fall below 95% of the time, whereas the 95% USL represents a value where all recorded samples will fall below with a 95% confidence level. Another often "unrealistically conservative" BTV tool, per ITRC, can be the 95% upper prediction limit (95% UPL), which is uniquely tailored to predict an "upper bound value for a single comparison value." While TCEQ (TCEQ, 1998) has previously confirmed "an accepted statistical method for determining a background value from a set of data is the 95% UTL," consistent with current ITRC recommendations, a subsequent Interoffice Memorandum (IOM) (TCEQ, 2001c) expressed TCEQ concerns with 95% UTL values for less normal datasets. More recently, the TCEQ has confirmed (TCEQ, 2018) that "the UPL is the TCEQ's preferred background statistic" and "EPA's ProUCL program can be used to calculate the UPLs." For consistency, all statistical calculations were performed within ProUCL (Version 5.2), which utilizes adjusted formulas based on population distributions. The general normal distribution statistical formulas utilized within ProUCL and EPA guidance (US EPA, 2006) are shown for reference in Figure S5, but will be augmented pursuant to current EPA guidelines as performed within ProUCL itself within the calculations. A more detailed summary of all the statistics is provided in the Supporting Information.
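For orientation, the one-sided normal-theory forms of the UPL and UTL are commonly written as below. This is a sketch of the textbook expressions only, not a reproduction of Figure S5 or of the distribution-specific adjustments ProUCL applies; the notation (sample mean x̄, sample standard deviation s, sample size n, Student's t quantile, and tolerance factor K) is introduced here as an assumption.

```latex
\begin{aligned}
\text{95\% UPL} &= \bar{x} + t_{0.95,\,n-1}\; s\,\sqrt{1 + \tfrac{1}{n}} \\[4pt]
\text{95\% UTL (95\% coverage)} &= \bar{x} + K_{n,\,0.95,\,0.95}\; s,
\qquad
K_{n,\,0.95,\,0.95} = \frac{t'_{0.95,\,n-1}\!\left(z_{0.95}\sqrt{n}\right)}{\sqrt{n}}
\end{aligned}
```

Here t′ denotes a quantile of the noncentral t distribution with noncentrality parameter z₀.₉₅√n. For a given dataset, K is larger than the UPL multiplier, so the normal-theory 95% UTL sits above the 95% UPL.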

BTV calculation
Given the verified performance of USGS arsenic and lead datasets in the above Minnesota and Montana studies, the sample precision and verification goals met within the 2013 USGS dataset (Smith et al., 2013), and a higher number of available samples representing Texas (10-fold above the 1981 USGS dataset), the authors isolated the Texas 2013 USGS data for further evaluation and development of BTVs that could inform RBCA under TRRP. Raw data from each vertical interval and combined intervals for arsenic and lead were then evaluated using the EPA's ProUCL (Version 5.2). For context, the sample number (n), mean (x̄), standard deviation (σ), minimum, and maximum observations within each dataset, along with the 95% UPL and 95% UTL, are shown (Table 1).

Table 1 notes: All concentrations in mg/kg. There are areas of Texas with elevated arsenic naturally occurring above 100 mg/kg (i.e., Weches formation [Ledger, 2005] in Anderson, Cherokee, Nacogdoches, Sabine, and San Augustine counties). This is suggestive that even the elevated results removed as outliers are likely naturally occurring and regional screening would yield a higher locally applicable background threshold value (BTV).
Abbreviations: CV, coefficient of variation; Max, maximum; Min, minimum; UPL, upper prediction limit; UTL, upper tolerance limit.
a Censored dataset: In an effort to minimize bias from outliers noted during data review, 14 observations in excess of 24 mg/kg were removed from the arsenic dataset and 10 observations in excess of 41 mg/kg were removed from the lead dataset prior to development of possible BTVs (95% UPL and 95% UTL).
b Censored dataset: In an effort to minimize bias from outliers noted during data review, eight observations in excess of 24 mg/kg were removed from the arsenic dataset and 10 observations in excess of 41 mg/kg were removed from the lead dataset prior to development of possible BTVs (95% UPL and 95% UTL).
The authors considered combined intervals (e.g., Horizons A, C, and 5 cm data and Horizon A and 5 cm data) to develop the most robust sample set for a state-wide population consistent with 30 TAC §350.51(l). The combined datasets were evaluated in both full form (i.e., with possible outliers) and censored form (i.e., with the removal of possible outliers) based on a desire to obtain the most defensible population distribution to allow calculation of confidence intervals suitable for BTV purposes. It is recognized that our censoring of only high-concentration observations likely removed naturally occurring data and may have produced results that are biased low. However, this was done purposefully to increase regulatory acceptance of both the methodology and the resulting BTVs.
Within the screening, the authors utilized visual examination of individual quantile-quantile (Q-Q) plots to identify possible outliers (presence of significant breaks/gaps) and indications of statistical distribution (e.g., normal or lognormal) along with minimum population statistics that included a linear correlation coefficient (R) of ≥|0.7| (NAVFAC, 2002) and a coefficient of variation (CV) of ≤1.0 (US EPA, 2022b) to confirm that the dataset normality thresholds were acceptable to allow the proposed statistical testing. This is consistent with the EPA CV requirement for censoring data (US EPA, 2006) and represents the EPA's recommended maximum CV when diverse soil types are anticipated (Breckenridge and Crockett, 1995). This approach was inclusive of outlier censoring based on visual and numerical considerations, as noted in Table 1 (e.g., R ≥ 0.95, without significant Q-Q breaks for BTV establishment). No non-detect results were identified in the USGS dataset for lead or arsenic. A normality quadrant graph (NQG) was provided with each dataset screening to illustrate the performance of the resulting R and CV with and without censoring. Datasets within the upper left quadrant of the NQG were considered suitable for further statistical testing and development of BTVs. Datasets falling outside the upper left quadrant may require censoring of data and/or further evaluation using nonparametric methods.
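The two screening statistics described above can be sketched with the Python standard library. This is not the ProUCL implementation: the plotting-position formula (i − 0.375)/(n + 0.25) is an assumed convention, the function names are hypothetical, and the arsenic values are invented for illustration. Only the 24 mg/kg censoring cutoff comes from Table 1.

```python
import math
import statistics
from statistics import NormalDist


def coefficient_of_variation(data):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(data) / statistics.mean(data)


def qq_correlation(data):
    """Pearson correlation between the ordered data and theoretical normal
    quantiles, i.e., the linearity of a normal Q-Q plot.

    Plotting positions (i - 0.375) / (n + 0.25) are an assumption;
    other software may use a different convention.
    """
    xs = sorted(data)
    n = len(xs)
    qs = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, mq = statistics.mean(xs), statistics.mean(qs)
    sxy = sum((x - mx) * (q - mq) for x, q in zip(xs, qs))
    sxx = sum((x - mx) ** 2 for x in xs)
    sqq = sum((q - mq) ** 2 for q in qs)
    return sxy / math.sqrt(sxx * sqq)


def censor_above(data, cutoff):
    """Drop observations above a cutoff (mg/kg)."""
    return [x for x in data if x <= cutoff]


# Hypothetical arsenic results (mg/kg) with two high values.
full = [2.1, 3.4, 4.0, 4.8, 5.5, 6.1, 6.9, 7.4, 8.2, 9.0, 10.3, 58.0, 112.0]
censored = censor_above(full, 24.0)  # arsenic cutoff noted in Table 1

summary = {
    label: (coefficient_of_variation(d), qq_correlation(d))
    for label, d in (("full", full), ("censored", censored))
}
for label, (cv, r) in summary.items():
    print(label, round(cv, 2), round(r, 2))
```

In this invented dataset, removing the two high values lowers the CV and raises R, which is the same direction of change the NQG is designed to display.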
The same screening of raw datasets, followed by censoring to ensure population distributions supported the development of BTVs, was applied to the physiographic evaluation of lead and arsenic occurrence within the 2013 USGS data, as well as further screening of the 1981 USGS and 2013 USGS datasets with regard to other metals. If non-detects were within a given dataset, their treatment was noted in the table footnotes.
FIGURE 3 Comparison of lead dataset quantile-quantile (Q-Q) plots.
The authors included 95% UPL and 95% UTL calculations as part of BTV value development consistent with 30 TAC §350.79(2)(A)(iii). While more conservative than EPA and many state program recommendations (ADEC, 2018; DTSC, 2012; ITRC, 2021; TCEQ, 1998; US EPA, 2022a), the authors utilized the 95% UTL for BTV development only when the dataset exhibited an R of ≥|0.95| and a CV of ≤0.75. When dataset normality goals did not allow use of the 95% UTL statistic but met the more general normality requirements (R of ≥|0.7| and a CV of ≤1.0), the more conservative 95% UPL statistic was utilized for BTV development.
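The selection rule described above reduces to two threshold checks. The sketch below uses a hypothetical function name; the thresholds are those stated in the text, and the example inputs are the R and CV values this review reports for the combined-interval lead dataset before and after censoring at 41 mg/kg.

```python
def select_btv_statistic(r, cv):
    """Apply the screening thresholds described in the text.

    Returns the statistic to use for BTV development, or None when the
    dataset does not meet the minimum normality thresholds.
    """
    r = abs(r)
    if r >= 0.95 and cv <= 0.75:
        return "95% UTL"
    if r >= 0.7 and cv <= 1.0:
        return "95% UPL"
    return None  # censor outliers and/or consider nonparametric methods


full_choice = select_btv_statistic(r=0.66, cv=0.771)      # full lead dataset
censored_choice = select_btv_statistic(r=0.97, cv=0.411)  # censored at 41 mg/kg
print(full_choice, censored_choice)
```

Under these thresholds, the full lead dataset does not qualify for either statistic because R falls below 0.7, while the censored dataset qualifies for the 95% UTL.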

6 RESULTS AND DISCUSSION

State-wide BTV development (lead and arsenic)
Table 1 shows the population statistics and confidence intervals, assuming a normal distribution for the 95% UPL and 95% UTL development. To ensure that the most defensible statistical interval was selected for BTV purposes, normality testing was performed using ProUCL within the combined intervals exhibiting the highest degree of normality after potential outlier removal to confirm population distribution (e.g., Gaussian, gamma, lognormal, and others).
Figure 3 shows a comparison of Q-Q plots for the full 2013 (all horizons) lead dataset versus the same dataset censored at 41 mg/kg. Censoring changed the CV from 0.771 to 0.411 and R from 0.66 to 0.97. The censored data exhibit a lognormal pattern common to naturally occurring parameters and are a superior dataset for BTV development.
Figures 4 and 5 show the NQGs for the arsenic and lead datasets, respectively. The upper-left quadrant provides an easy visual reference when a dataset has sufficient normality to allow consideration of BTV development. Within the upper left quadrant is a dashed line providing a visual demarcation of when highly normal datasets are represented with the 95% UTL (three datasets) versus the 95% UPL for the remainder of the upper left quadrant (one dataset). Addressing outliers in a dataset will often improve normality, as seen in the censored arsenic and lead data for all intervals and the surface and Horizon A datasets.
Based on the data evaluation (Table 1), potential state-wide BTVs for arsenic (11 mg/kg) and lead (25 mg/kg) were developed using censored data within all intervals.These BTVs represent conservative values that can be employed as a starting point instead of TSSBCs, as they are more consistent with 30 TAC §350.79(2)(A)(iii).However, the authors wish to emphasize that while state-wide screening BTVs are a starting place for screening, more targeted area-based BTVs will better represent likely physiographic variability (see Table 2 and  Table S1) when possible, provided that a minimum number of samples (≥20) are available (US EPA, 2000).
Figure 6 shows the frequency histogram of arsenic, as seen across all 2013 USGS data intervals in Texas, as well as the related 1981 data histogram.A full statistical summary of each dataset and the corresponding Q-Q plots are also provided in the Supporting Information.Figure S6 shows a similar comparison for lead datasets from 1981 to 2013.
Figure 6 confirms that a large false positive region is present across both the 1981 and 2013 data when a median value (5.9 mg/kg) is used in place of the more representative BTV (11 mg/kg) proposed for arsenic in this review. This is consistent with the concerns expressed in the municipal survey feedback. Similar distributions were noted for arsenic between the 1981 and 2013 datasets, as well as for the individual soil horizons within the 2013 dataset.
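The size of the false positive region can be illustrated with a simple empirical calculation: the share of background samples that would exceed a given screening value. The sketch below uses hypothetical arsenic concentrations (not USGS data) and a hypothetical helper name; only the two screening levels, 5.9 and 11 mg/kg, come from the text.

```python
def false_positive_rate(background, screening_value):
    """Fraction of background samples that exceed the screening value."""
    exceed = sum(1 for x in background if x > screening_value)
    return exceed / len(background)


# Hypothetical background arsenic results (mg/kg); not USGS data.
background = [1.2, 2.0, 2.8, 3.5, 4.1, 4.6, 5.0, 5.4, 5.7, 5.9,
              6.1, 6.5, 7.0, 7.6, 8.2, 8.9, 9.5, 10.2, 10.8, 12.5]

for level in (5.9, 11.0):
    rate = false_positive_rate(background, level)
    print(f"screening at {level} mg/kg flags {rate:.0%} of background samples")
```

By construction, screening against a median flags roughly half of all unimpacted samples, whereas an upper-percentile statistic holds that share near the intended 5%.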

Evaluating spatial variability
While the development of individual physiographic BTVs was not an objective of this review, physiographic screening was performed to demonstrate the significance of spatial variability within the 2013 USGS arsenic and lead datasets. For this effort, the authors independently evaluated USGS Horizon A data against 12 major ecoregions in Texas (see Figure 7; Griffith et al., 2007). Table 2 and Table S1 show a summary of the observed statistics within the individual physiographic regions for arsenic and lead, respectively. The NQGs for the arsenic and lead data within each ecoregion are shown in Figure 8 and Figure S7, respectively.
Table 2 and Figure 8 illustrate that when uncensored 2013 USGS (Horizon A) arsenic data are divided into physiographic areas, a significant improvement in normality is seen without any outlier censoring. Nine of the 11 ecoregions screened met the normal performance thresholds to allow BTV development. Six would allow the use of the 95% UTL for BTV development. The South Central Plains and East Central Texas Plains likely have subpopulations that are not accounted for in this screening or possess significant outliers.
Consistent with the performance of the 2013 USGS arsenic data divided by physiographic area, Table S1 and Figure S7 show that the 2013 USGS lead dataset (Horizon A) also sees a significant improvement in normality without outlier censoring. Eight of the 11 ecoregions screened met the normal performance thresholds to allow BTV development. Five would allow the use of the 95% UTL for BTV development.
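A minimal sketch of the grouping step behind an ecoregion summary is given below: results are split by region, basic statistics are tabulated, and regions with fewer than 20 samples (the minimum cited from US EPA, 2000) are flagged. The region labels, concentrations, and function name are hypothetical placeholders, not values from Table 2 or Table S1.

```python
from collections import defaultdict
from statistics import mean, stdev

MIN_SAMPLES = 20  # minimum count cited in the text (US EPA, 2000)


def summarize_by_region(records):
    """Group (region, concentration) pairs and summarize each region."""
    groups = defaultdict(list)
    for region, value in records:
        groups[region].append(value)
    summary = {}
    for region, values in groups.items():
        n = len(values)
        summary[region] = {
            "n": n,
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "cv": stdev(values) / mean(values) if n > 1 else None,
            "enough_samples": n >= MIN_SAMPLES,
        }
    return summary


# Hypothetical records (mg/kg); region names are placeholders.
records = [("Region A", 4.0 + 0.2 * i) for i in range(20)]
records += [("Region B", v) for v in (2.5, 3.0, 9.0)]

for region, stats in summarize_by_region(records).items():
    print(region, stats)
```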
The confirmed spatial variability for lead and arsenic underscores the need for inclusive state-wide BTVs for all naturally occurring compounds that can replace the current use of TSSBCs. However, it also highlights the need for area BTV development early in the process that can be used in concert with precalculated risk-based criteria (i.e., PCLs).

State-wide BTV evaluation: 1981 USGS data
The TSSBC datasets were obtained and evaluated consistent with the 2013 USGS datasets to allow both an understanding of the data as well as the development of a representative "background" more consistent with TRRP and EPA requirements (5% Type I error). This evaluation used the same data selected by the TCEQ in 1999, relied on EPA's ProUCL (Version 5.2) for analysis, and included the 95% UPL, TCEQ's preferred background statistic, as well as 95% UTL values. As only mercury and strontium fell outside both the R and CV normality performance thresholds, censoring was completed on these datasets (as noted below) to obtain a reasonable population distribution from which to derive defensible BTVs under parametric conditions. The antimony dataset had a limited number of samples (n = 27) and exhibited an R of 0.59, below the minimum goal of 0.7.
The results of our review of the 1981 USGS dataset utilized by TCEQ for development of the TSSBCs confirmed that development of 95% UPL and 95% UTL values compliant with EPA and TRRP's Type I error requirements was possible. The NQG for all TSSBC metals within the 1981 USGS dataset is shown in Figure 9. While performance issues in the mercury and strontium datasets were addressed through censoring, antimony data were not considered suitable for BTV development. The best-fit distributions selected by ProUCL are included in Table 3.

State-wide BTV evaluation: 2013 USGS data
To complete the BTV evaluation for all 22 TSSBC metals selected by the TCEQ, the authors screened the full 2013 USGS dataset (all intervals) consistent with the previous methods applied for arsenic and lead from the 2013 USGS dataset and 1981 USGS dataset. This provides the largest possible dataset to inform BTV development for all 22 TSSBC metals in Table 4. Boron and fluoride were not included in the 2013 USGS dataset. Selenium is not included as the 2013 USGS dataset includes over 50% non-detects and requires regulatory concurrence on the treatment of non-detect data to allow the generation of a defensible BTV. Censoring of maximum concentrations only was performed on the arsenic, barium, copper, lead, strontium, and tin datasets to address possible outliers and meet the R and CV thresholds to allow the development of BTVs. The NQG for all TSSBC metals within the 2013 USGS dataset is shown in Figure 10.
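The non-detect handling described for these datasets (substituting half the detection limit, and setting aside any metal with more than 50% non-detects) can be sketched as below. The input layout, function name, and example values are assumptions for illustration; the review does not describe its data structures.

```python
def prepare_results(results, max_nd_fraction=0.5):
    """Substitute half the detection limit for non-detects.

    `results` is a list of (value, detected) pairs; for a non-detect,
    `value` holds the detection limit. Returns None when the share of
    non-detects exceeds `max_nd_fraction`, signalling that the dataset
    should be set aside rather than used for a BTV.
    """
    nd = sum(1 for _, detected in results if not detected)
    if nd / len(results) > max_nd_fraction:
        return None
    return [v if detected else v / 2 for v, detected in results]


# Hypothetical results (mg/kg): one non-detect reported at <0.01.
mostly_detected = [(0.02, True), (0.01, False), (0.03, True)]
mostly_nd = [(0.2, False), (0.2, False), (0.5, True)]

print(prepare_results(mostly_detected))
print(prepare_results(mostly_nd))
```

Returning None rather than a substituted dataset mirrors the decision to leave selenium unevaluated until the treatment of non-detects is agreed with the regulator.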
An analysis of the 22 metals identified by the TCEQ for TSSBC development within the 2013 USGS dataset confirmed that the development of 95% UPL and 95% UTL values is possible.

State-wide BTV values
Within the Texas BTV summary table (Table 5), the authors presented the 95% UPL or 95% UTL results, derived from the 1981 and 2013 USGS datasets, based on dataset normality. However, given the demonstrated bias with the 1981 and 2013 USGS data for aluminum, barium, chromium, and vanadium (Montana and Minnesota studies), it is suggested that the mean results be utilized for these metals until laboratory performance is verified (USGS methods vs. EPA methods) and an appropriate statistic can be developed. Where applicable, the 95% UPL or 95% UTL derived from the censored datasets and/or those with the highest number of samples (n) were selected. For simplicity, values were rounded to the nearest whole number, where possible. As illustrated in Table 5, with the exception of total chromium, vanadium, and mercury, the proposed state-wide BTVs exceeded the existing TSSBCs and would lessen the ongoing Type I error rates seen when screening against TSSBCs. In the case of total chromium and vanadium, this is because a mean value likely underrepresents a true BTV. In the case of mercury, the calculated BTV is equal to the current TSSBC of 0.04 mg/kg. Because of the volatility of mercury, it is anticipated that all USGS data underrepresent actual background concentrations, and analysis using EPA methods that minimize volatility during sample collection and analysis would be preferable. However, as outlined in this review, the TCEQ elected to publish pH 6.8 based PCLs for mercury, which indirectly reduces the false positive rates for this metal.

CONCLUSIONS
It is apparent from this review that the median TSSBCs represented a starting point for background comparison while a new RBCA program was first being implemented. However, with time, the median TSSBCs became an "Action Level" that triggered unnecessary expenditure of resources by the regulated community. It is also clear from this review that the Type I error rate (false positive) was not considered when the median TSSBCs were selected for use across Texas.
The statistical analyses performed for this study successfully developed defensible state-wide BTVs for arsenic (11 mg/kg) and lead (25 mg/kg), as well as for other TSSBC metals and metalloids. A review of the 2013 USGS datasets also confirmed the effect of regional physiographic differences on expected background concentrations and reaffirmed the need for an inclusive BTV within a state-wide program such as TRRP. Additionally, this study confirms the viability of 2013 USGS data in meeting 30 TAC §350.51(l), as these were collected "within areas representative of site-specific background conditions" and allow the development of local area BTVs using 95% UPL and 95% UTL values. This approach utilizes spatially relevant datasets and commonly accepted statistical practices preferred by the EPA and the TCEQ.
We hope that we have added to the RBCA discussion in a meaningful way and benefited practitioners working within TRRP or similar RBCA programs that lack BTVs. RBCA programs remain an effective approach to risk assessment that serves regulator and regulated alike, while protecting human health and the environment. However, the absence of truly representative BTVs within an RBCA program can unnecessarily complicate the assessment process. Accordingly, RBCA programs should include periodic reviews with meaningful stakeholder involvement so that updated information concerning risk assessment can be incorporated efficiently.
Having shown the TSSBCs are not representative of "background," we suggest the TCEQ consider updating the TCEQ Memo (Determining Which Releases are Subject to TRRP) as this document is the starting point for most risk assessment efforts in Texas. Future work concerning BTV development in Texas and elsewhere should focus on verifying USGS versus EPA analysis method performance (e.g., bias rate) for metals where a concern was identified (e.g., mercury, aluminum, barium, vanadium, and chromium). It is also suggested that the TCEQ reconsider the addition of pH 6.8 based PCLs for metals only modeled at a pH of 4.9. This will reduce the unnecessary cost concerns expressed by municipalities in relation to false positive rates currently seen under TRRP.

AUTHOR CONTRIBUTIONS
Kenneth S. Tramm: Conceptualization; data curation; formal analysis; funding acquisition; investigation; methodology; project administration; resources; software; supervision; validation; visualization; writing-original draft; writing-review and editing. Jason Minter: Formal analysis; validation; writing-review and editing. Catherine A. Seaton: Visualization; writing-review and editing.

ACKNOWLEDGMENTS
The authors would like to thank Modern Geosciences for allowing staff hours and resources needed to complete this paper. We would also like to thank the leadership and staff of the Texas Commission on Environmental Quality (TCEQ) past and present. While this review focuses on one element of the Texas Risk Reduction Program (TRRP) that needs to be updated to reflect current science, the authors want to emphasize that the TCEQ has been at the forefront of RBCA since the 1990s and has an exemplary record of protecting Texas by reducing and preventing pollution. The authors would like to acknowledge the following individuals for their support,

Surface soil arsenic concentrations in Texas (USGS, 2013). Depiction of USGS data layer for arsenic in surface soils (Horizon A) demonstrating variability and extensive false positive footprint (all yellow, orange, and red).

FIGURE 4 Normality quadrant graph for 2013 United States Geological Survey (USGS) arsenic data by interval. CV, coefficient of variation. Q-Q, quantile-quantile.

FIGURE 5 Normality quadrant graph for 2013 United States Geological Survey (USGS) lead data by interval. Q-Q, quantile-quantile.

FIGURE 6 Arsenic concentrations in Texas soil samples (1981 vs. 2013 United States Geological Survey [USGS] datasets).

FIGURE 7 United States Geological Survey (USGS) data points divided by ecoregions (TCEQ, 2007a).

FIGURE 8 Normality quadrant graph for 2013 United States Geological Survey (USGS) arsenic data by ecoregion. The coefficient of variation (CV) limit has been extended to accommodate elevated CV results. Q-Q, quantile-quantile.

FIGURE 9 Normality quadrant graph for 1981 United States Geological Survey (USGS) data (Texas-specific soil background concentration [TSSBC] dataset). CV, coefficient of variation. Q-Q, quantile-quantile.

FIGURE 10 Normality quadrant graph for 2013 United States Geological Survey (USGS) data (Texas-specific soil background concentration [TSSBC] dataset). The uncensored copper dataset is not depicted (coefficient of variation [CV] = 7.98; R = 0.17). Q-Q, quantile-quantile.

TABLE 1 United States Geological Survey arsenic and lead data by interval for Texas.

TABLE 2 United States Geological Survey (USGS) arsenic data by Texas ecoregion.
Note: All concentrations in mg/kg. Ecoregions (TCEQ, 2007) presented as documented by USGS and Environmental Protection Agency (EPA). Data normality goal set as a coefficient of variation (CV) of ≤ 1.0 and R ≥ 0.7 for this review when considering a dataset suitable for further review. Non-normal datasets should be further evaluated with additional statistical tools or consider removal of outliers.
Abbreviations: CV, coefficient of variation; Max, maximum; Min, minimum; UPL, upper prediction limit; UTL, upper tolerance limit.

TABLE 3 Full evaluation of 1981 United States Geological Survey dataset used for original median Texas-specific soil background concentrations (TSSBCs).
Note: All concentrations in mg/kg. All data from USGS (1981). To match Texas Commission on Environmental Quality (TCEQ) TSSBC results and ensure a consistent dataset was evaluated, half of estimated values (J) were honored and non-detects were not represented for boron, beryllium, cobalt, fluoride, antimony, and selenium datasets. For the lead dataset, a value of half the detection limit was used for 19 non-detect results.
Abbreviations: CV, coefficient of variation; Max, maximum; Min, minimum; NP, nonparametric; UPL, upper prediction limit; UTL, upper tolerance limit.
a Normal (1% significance).
b Normal (CV, R, quantile-quantile [Q-Q] basis as noted earlier) with slight skewness; use of normal distribution statistic acceptable.

TABLE 4 Full evaluation of Texas-specific soil background concentration (TSSBC) metals within 2013 dataset.
Note: All concentrations in mg/kg. All data from USGS (2013). Non-detect treatment: cobalt (2% of 1237 samples were non-detect at <0.01 and honored as 0.005), chromium (3% of 1237 samples were non-detect at <1 and honored as 0.05), and mercury (12.1% of 150 samples were non-detect at <0.01 and honored as 0.005). Selenium exhibited >50% non-detect results and was not further evaluated. Regulatory input on the treatment of non-detect results is required to allow further processing.

TABLE 5 State-wide Texas background threshold value (BTV) summary.
Note: All concentrations in mg/kg. Symbol "-" indicates that normality criteria are not met.
Abbreviations: NE, not evaluated; TSSBC, Texas-specific soil background concentration.
a Values for this metal utilize a mean concentration to address anticipated high bias within United States Geological Survey analysis methods when compared to EPA analysis methods.