Enhancing perspectives on lake impairments using satellite observations: A case study on High Rock Lake, North Carolina

Stakeholders developing water quality improvement plans for lakes and reservoirs are challenged by the sparsity of in‐situ data and the uncertainty ingrained in management decisions. This study explores how satellite images can fill gaps in water quality databases and provide more holistic assessments of impairments. The study site is an impaired water body that is serving as a pilot for improving state‐wide nutrient management planning processes. An existing in‐situ database was used to calibrate semi‐analytical models that relate satellite reflectance values to turbidity and total suspended solids (TSS). Landsat‐7 images from 1999 to 2020 that overpass High Rock Lake, North Carolina were downloaded and processed, providing 42 turbidity and 39 TSS satellite and in‐situ match‐ups for model calibration and validation. Model r‐squared values for the fitted turbidity and TSS models are 0.72 and 0.74, and the mean absolute errors are 14.6 NTU and 3.2 mg/L. The satellite estimates were compared to the in‐situ data and simulated TSS values produced by a calibrated hydrologic‐hydrodynamic model. The process‐based model is considered less accurate than the satellite model based on statistical performance metrics. Comparisons between data sources are illustrated with time series plots, frequency curves, and aggregate decision metrics to highlight the dependence of lake impairment assessments on the spatial and temporal frequency of available data and model accuracy.

of returning water bodies to an unimpaired state.The establishment of TMDLs or management strategies can take several years and requires input from diverse stakeholder groups composed of environmental scientists, engineers, utility managers, non-profit advocacy group representatives, and engaged community leaders (National Research Council, 2001).Once plans are established, frequent water quality monitoring is needed to adaptively manage these complex ecosystems (Gunderson, 1999).
Stakeholder groups often base their decisions on in-situ water quality data that has significant spatial and temporal gaps and processbased model simulations that estimate the impact of alternative load reduction scenarios.Due to the complexity of nutrient and sediment runoff processes and lake hydrodynamics, these simulations are often highly uncertain, and accuracy is limited by the spatial and temporal resolution of available calibration data (Borah et al., 2006;Shirmohammadi et al., 2006).This uncertainty concerns stakeholders who are using limited information to develop new regulations and policies that have significant social, environmental, and economic impacts on their community.Increasing the spatial and temporal frequency of water quality databases would support more flexible and adaptive management approaches.However, in-situ data collection campaigns are expensive, and many monitoring plans are consequently designed to meet the minimum requirements that are specified by state rules (Benham et al., 2008;Freedman et al., 2008;Jesiek et al., 2008).Alternative low-cost water quality measurements are therefore needed to support stakeholder groups that are assessing lake impairments and developing TMDL's and watershed management plans.
A largely underutilized resource for inland water quality assessments is freely available satellite imagery.Optical satellites record the radiance reflectance from the water surface at different wavelengths, providing measurements of remote sensing reflectance.This information can be translated into water quality estimates to produce spatially distributed maps of Chlorophyll-a (Chl-a), Total Suspended Solids (TSS), and Color Dissolved Organic Matter (CDOM) (Pahlevan et al., 2020;Palmer et al., 2015;Reif, 2011;Topp et al., 2020).Integrating water quality maps into operational lake monitoring and management has the potential to significantly reduce costs associated with in-situ sampling; one study estimated an annual potential avoidance cost for Chl-a measurements of up to $316 million (Papenfus et al., 2020).Satellite-based water quality maps have been used operationally within the ocean sciences community for decades but are less often applied to inland water bodies.
The research community has been developing operational inland satellite water quality products that are gaining interest from water managers.For example, the cyanobacteria assessment network (CyAN) leverages medium resolution imaging spectrometer (MERIS) to estimate cyanobacteria cell counts for more than 2000 inland water bodies and was designed to serve as an early warning indicator for detecting algal blooms (Schaeffer et al., 2018).Estimates can be accessed on the CyAN mobile interface, and several applications of this tool are being documented on story maps hosted on the North American Lake Management Society website.MERIS also produces Chl-a products for over 2300 inland water bodies in the continental US, and estimates have been shown to have high accuracy in terms of bias and error when compared to in-situ Chl-a measurements (Seegers et al., 2021).
Estimating water quality on a continental scale from satellites with higher spatial resolution, including Landsat (30 m) and Sentinel-2 (20 m), is more challenging because the sensors were designed for land cover assessments as opposed to inland water color.Machine-learning based algorithms are being developed and hold great promise (Pahlevan et al., 2022), but accuracy is currently limited due challenges with atmospheric correction (Pahlevan et al., 2021) and the availability of a robust and consistent calibration/validation dataset.While these limitations are being addressed, more accurate models can be developed on regional and local scales where dense calibration/validation datasets are available.For instance, a regional tool for monitoring turbidity in the Sacramento-San Joaquin Delta has been developed and is used operationally by water managers in California (Lee et al., 2021).
Despite potential benefits and recent technological advances, lake management groups are often reluctant to integrate satellite-based water quality estimates into their decision-making process.Barriers to adoption are attributed to (i) the required initial investment in data collection and research, (ii) limited and inconsistent accuracy comparisons between remotely sensed and in-situ water quality estimates, (iii) insufficient training for water managers on how to update satellite databases and apply quality control best practices, and (iv) a lack of demonstrations on how remotely sensed estimates can be used to complement existing decision-making tools, as opposed to replacing them entirely (IOCCG, 2018;Schaeffer et al., 2013).These barriers to adoption can be addressed through collaborative partnerships between researchers and water managers, where the final data products and tools are co-developed and designed with stakeholders in the context of relevant decision-making needs.

Research Impact Statement
Satellite images can fill gaps in water quality databases and produce accurate estimates relative to process-based model simulations that are often used in long-term planning.
The goal of this study is to demonstrate the utility of satellite-based water quality estimates in the context of a specific lake in North Carolina that is currently developing a basin-wide nutrient management plan.The study focuses on a single location to emphasize the importance of considering local context and needs, but the methodology and applications are general and can be applied to any lake or reservoir.The stakeholder group in this study is tasked with allocating nutrient load reductions across the watershed that will reduce Chl-a concentrations within the lake.The required nutrient load reductions were determined using a hydrologic-hydraulic model that was calibrated with a dense in-situ database, compiled during the 2005 to 2010 period.Stakeholders have expressed their concerns with the model accuracy, the lack of in-situ data that has been collected since 2010, and the absence of a monitoring plan that will enable them to adaptively assess and manage the proposed nutrient reduction plan.This scenario presents an opportunity to demonstrate how satellite-based Chl-a maps can fill data gaps and be translated into metrics that can inform the decision-making process.Toward this goal, in-situ measurements are currently being collected to calibrate and validate a Chl-a model using Sentinel-2 imagery.As the new database and Chl-a models are being developed, we engaged with the stakeholder group though a historical satellite study that leverages the exiting in-situ database and produces results that can be compared those of the hydrologic-hydraulic model, their current decision-making tool.We hypothesized that the satellite-based water quality estimates have higher accuracy and less uncertainty compared to the process-based model simulations.
Because the in-situ database that is needed to calibrate and validate a satellite-based model was compiled during the 2005 to 2010 period, Landsat-7 was chosen as the source of satellite imagery in this study.Landsat-7 imagery is not optimal for Chl-a retrievals due to the limited spectral bands and relatively high signal-to-noise ratios, but it has been applied successfully to estimate turbidity and TSS (Bustamante et al., 2009;Hadjimitsis et al., 2006).The Landsat-7 mission spanned from 1999 to 2022, providing over 20 years of data that can be used to analyze multi-decadal trends and fill data gaps during, before, and after the in-situ study was completed.This manuscript describes how turbidity and TSS models were developed for the study site with Landsat-7 reflectance data, using current best practices for statistical regression analysis applied to remote sensing of water quality (Seegers et al., 2018).The satellite-based estimates, in-situ data, and process-based model simulations were compared using time series plots, monthly means, and frequency curves for alternative spatial and temporal aggregations.
These comparisons are presented and discussed in the context of stakeholder-driven management questions.The resulting figures and statistical summary metrics show how satellites can augment in-situ databases spatially and temporally and provide a more synoptic view of a lake's water quality status.Completing this historical analysis while Chl-a models are being developed helped us establish a productive relationship with the stakeholder group and identify strategies for presenting future results in a way that will inform the ongoing nutrient management planning process.

| S TUDY S ITE
The state of North Carolina (NC) is currently developing Nutrient Criteria Development Plans (NCDPs) for several river basins that have impaired water bodies.Many of these impairments reflect high levels of turbidity and Chlorophyll-a (Chl-a) that are caused by sediment and nutrient loads in upstream tributaries.These loads originate from both point (i.e., wastewater treatment plants) and non-point (i.e., agriculture, including swine and poultry farms) sources, the latter considered very challenging to quantify and regulate.A survey of CAFOs (Concentrated Animal Feeding Operations) in North Carolina estimated that 10 billion gallons of wet animal waste and 2 million tons of dry animal waste are produced each year across the state (Environmental Working Group, 2016).These agricultural activities and CAFOs impact water quality within the Yadkin Pee Dee River Basin (YPDRB), located in central North Carolina and highlighted in Figure 1.High Rock Lake (HRL), the focus of this study, is within the YPDRB and is a man-made reservoir that was constructed in 1929 to produce hydroelectric power and maintain downstream water levels, but it now supports a very good sport fishery and other recreational activities.HRL water quality has been monitored since the 1970s and starting in 2004 the lake has been listed as impaired for Chl-a and turbidity and is now also impaired for pH (N.C.

Environmental Management Commission, 2021).
In response to these impairments, the North Carolina Division of Water Resources (NC DWR) worked with stakeholders to collect data and develop watershed and lake hydrodynamic/nutrient response models.These models were completed in 2016 based on data collected during the 2005 to 2010 period.A consulting firm completed modeling efforts in 2016, following several iterations and improvements that were recommended by a Technical Advisory Committee (TAC) (NC DWR, 2016).The calibrated models can simulate nutrient and sediment inputs into the lake and several water quality parameters within the lake over a distributed grid in 6-h time steps.However, the calibration statistics highlight parameters that have low accuracy (Tetra Tech, 2016).Additionally, the watershed model does not accurately represent current land cover and land use because it did not directly consider nutrient inputs from animal operations, specifically poultry.Many observers believe that poultry production has expanded rapidly in the YPDRB and the waste produced is likely to be a significant source of nutrient and sediment runoff that enters HRL (Environmental Working Group, 2016;The Piedmont Triad Regional Council, 2020), although the US Department of Agriculture (USDA) data does not fully support this view (USDA, 2021).Considering this watershed model will be used to quantify nutrient and sediment inputs for alternative management scenarios, this deficiency is concerning to stakeholders that are advocating for equal accountability and action from all members of the HRL community toward improved management.
Since the model was completed, progress toward the HRL NCDP has been slow, primarily due to a proposed rule change by the NC DWR to move from a state-wide Chl-a water quality standard to a site-specific standard.This process took nearly 5 years, and the new standard specifies that the seasonal geometric mean (geomean) of Chl-a concentrations at a station should not exceed 35 μg/L (NC SAC, 2020).Stakeholders have shared several concerns regarding the implementation of the new standard (YPDRBA, personal communication, November 12, 2021), largely related to monitoring.A Scientific Advisory Council (SAC) that studied the rule change stated that the geomean should be evaluated over a multi-year period and should not exceed the 35 μg/L threshold more than once in 3 years (NC SAC, 2020).They also indicated that NC DWR should apply the standard only to open water areas on HRL and not shallow cove areas.NC DWR currently samples HRL once every 5 years in monthly intervals from May to September at 10 stations spread across 15,000 acres of surface water.Adopting the SAC recommendation would require NC DWR to modify their monitoring process and sample HRL more frequently.Despite the SAC recommendation, NC DWR did not specify an acceptable frequency of exceedance in the proposed rule change that was submitted, and thus it was proposed as a not-to-exceed threshold.The new Chl-a standard is expected to become effective in early 2023.Meanwhile, the nutrient management planning process commenced in Fall 2022.Considering the uncertainty ingrained in the load reduction curves, stakeholders are advocating for an adaptive management approach, where the lake water quality is frequently assessed to evaluate the impact of management interventions.
However, this approach requires an extensive monitoring plan that exceeds the minimum spatial and temporal sampling frequency specified in the new Chl-a standard.

| In-situ water samples
Satellite-based water quality models are typically calibrated to in-situ "match-ups", where water grab samples are collected concurrent with satellite overpass dates and analyzed in a laboratory to derive parameters of interest.Sampling dates in existing in-situ lake water quality databases typically do not align with satellite observations, limiting the amount of data available for model calibration.However, a large in-situ database is available for HRL for the 2005 to 2010 period that aligns with the hydrologic-hydraulic model calibration period.With the F I G U R E 1 Map of North Carolina river basins and impaired water bodies, created using online ArcGIS maps (https://ncdenr.maps.arcgis.com/), and a close-up of High Rock Lake (35 o 39′ N, 80 o 18′ W), within the Yadkin Pee Dee River Basin that shows the distribution of Chl-a, turbidity, and pH impairments across the lake, adapted from Tetra Tech (2016).
support of an EPA 319 grant, NC DWR worked with LimnoTech and Alcoa Power Generating Inc. to carry out an intensive sampling campaign that produced the data needed to calibrate the process-based models.The sampling campaign was piloted with a scoping study that resulted in monthly measurements from March 2005 to August 2006 at 12 lake stations.The full campaign took place from April 2008 to April 2010.
Composite samples from the photic zone were collected at 10 stations approximately every 2 weeks, with more samples collected during the Summer.The samples were stored in a cooler and processed in a laboratory within 24 h to extract chemical and water solids parameters, including TSS.Turbidity was measured in-situ with a depth-profiling device to obtain an average reading over the photic zone.The stations are shown in Figure 2; they span the full lake area and align with the NC DWR's standard monitoring locations.
In addition to these dense datasets that supported the development of the process-based models, data associated with NC DWR's ambient monitoring program was downloaded from the National Water Quality Data Portal (www.waterquali tydata.us).These data consist of monthly samples collected over a 1-year period every 5 years and provides additional water

| Process-based model simulations
The coupled watershed hydrologic and lake hydrodynamic model was initially developed and calibrated by TetraTech in 2012 and updated by the NC DWR in 2016 (Tetra Tech, 2016) using Hydrologic Simulation Program-FORTRAN (HSPF) for the watershed model and Environmental Fluid Dynamics Code (EFDC) for the lake hydrodynamic model.HSPF is a spatially distributed model that can simulate rainfall-runoff flow rates and associated sediment and nutrient loads for any point in the watershed.The model computes sediment and nutrient concentrations in the tributaries that enter the lake at user-specified time step.These concentrations are then used as inputs in the lake hydrodynamic model.The movement of water throughout the lake is simulated by EFDC, a three-dimensional numerical model.
EFDC was coupled to WASP (version 7.53), an EPA-supported modeling system that is commonly used for TMDL assessments.WASP simulates the advection and dispersion of nutrients and other pollutants through the lake for each EFDC grid.The final calibrated model for HRL contains 538 horizontal grid cells on the surface that have dimensions of approximately 100 to 300 m.The model simulation period is Jan 2005 to March 2010, using 6-h time steps.This dataset was reduced to a daily temporal resolution by taking the values at noon to align most closely with in-situ and satellite observations.Several water quality parameters were simulated and calibrated to inform nutrient management planning efforts in the basin.However, this study focused on the TSS simulations since they can be directly compared to concurrent satellite data.
The WASP model has limitations that affect its accuracy, including the representation of particle size distributions.Consequently, the model cannot accurately simulate inflows of fine silt and clay sediments from the tributaries and cannot replicate particle settling and resuspension processes within the lake.Comparisons between TSS model simulations and in-situ measurements reflect these limitations and show that the model tends to overestimate TSS during high flow events and underestimate TSS during ambient periods that have higher clay and silt concentrations.The TSS calibration statistics are provided for each station in the final report (Tetra Tech, 2016).
Relative absolute errors varied from 55% to 125%, and correlation coefficients between simulated and observed time series ranged from 0.13 to 0.84.These limitations in model accuracy increase the uncertainty of the sediment loading curves that have been proposed for the nutrient management plan.Therefore, these calibration performance measures can be compared to those resulting from the satellitebased models to assess the value of satellite derived TSS concentrations for planning and management in the context of existing tools and information.

| Satellite imagery
Satellite images of HRL that align temporally with the 2005 to 2010 in-situ database and model simulation period were available from Landsat 7 Enhanced Thematic Mapper Plus (ETM+).Landsat 7 images are acquired over the same area every 16 days and contain reflectance values for 8 spectral bands and two thermal bands.The green, red, and near-infrared (NIR) bands were used in this study and have a spatial resolution of 30 m.The satellite was launched in 1999 and was operational until April 2022; however, in 2003 the Scan Line Corrector (SLC) failed, resulting in missing stripes in images and a loss of about 20% of pixels.All Landsat 7 Collection 2 Level 1 images that covered HRL from 1999 to 2022 were downloaded from Earth Explorer (earth explo rer.usgs.gov)unless they were completely cloudy over the study site.Pixel-level quality assessments for each image were used to mask out pixels that were flagged for clouds, cloud shadows, and high aerosols.The images were atmospherically corrected in ACOLITE (version 20210802.0),a program that uses a dark spectrum fitting approach (Vanhellemont, 2019(Vanhellemont, , 2020;;Vanhellemont & Ruddick, 2016) that is appropriate for atmospheric correction over inland and coastal waters.Figure 2

displays a Google
Earth image that delineates the in-situ stations along with an atmospherically-corrected L7 image and reflectance values (spectral curves) of stations where in-situ match-ups are available.The ACOLITE parameter settings were modified to include sun glint corrections and a fixed path for the aerosol optical thickness retrieval, based on the relatively small size of the study area.Ancillary data, including elevation from the SRTM DEM, was pulled into ACOLITE through the user's EarthData account.A total of 227 images were downloaded and processed, although only 10 of the images aligned with in-situ sampling dates.
Each 30-m pixel within an image contains latitude and longitude coordinates that can be used to identify the pixels that align with in-situ sampling locations.However, several factors should be considered when selecting pixels that are "closest" to in-situ samples, including the geometric accuracy of the satellite image, temporal lags between the satellite acquisition and sample collection time, and the proximity of the actual in-situ sampling location to the station coordinates.In addition, satellite sensors and the atmospheric correction algorithms applied to them have limited precision and accuracy, resulting in significant uncertainty in pixel-scale data.These geometric errors and data uncertainties can be mitigated by averaging over multiple pixels when extracting reflectance values for model calibration.Bailey and Werdell (2006) performed a sensitivity analysis with ocean color data to identify an optimal number of pixels and found that a 5 × 5 pixel box was the smallest size that met error tolerances.Seegers et al. (2018) reference this study and recommend taking the median value from a 5 × 5 box to support more consistent satellite water quality model performance metrics across multiple study sites.However, a global study that evaluated the accuracy of turbidity derived from Landsat 8 and Sentinel 2 reflectance values recommended a 3 × 3 box (Kuhn et al., 2019).For the HRL calibration, alternative box sizes were tested, including a single pixel and 2 × 2, 3 × 3, 4 × 4, and 5 × 5 grids.The pixel coordinates were used to find the closest surrounding pixels in the 2 × 2 grid and the 4 × 4 grid, whereas the 3 × 3 and 5 × 5 grids could evenly buffer the central pixel.Pixels that were impacted by clouds and cloud shadows were identified visually and removed from the dataset.Outlier reflectance values were also removed if they differed by more than 2 standard deviations from the mean value calculated for each grid.Finally, median values of the masked grids were extracted if more than half of the grid had valid data (Table 1).

| ME THODS
4.1 | Bio-optical model calibration Nechad et al. (2009Nechad et al. ( , 2010) ) introduced a model that relates remote sensing reflectance to turbidity and TSS concentrations that has been widely applied (Caballero et al., 2018;Dogliotti et al., 2015;Novoa et al., 2017;Petus et al., 2010) and integrated into operational tools (Lee et al., 2021;Vanhellemont, 2019).This semi-analytical model can be applied to any optical satellite sensor data and is structured as the ratio of backscattered radiation out of the water surface to the total radiation that entered the water surface.Equation (1) provides the mathematical formulation for turbidity (T; Nechad, et al., 2010), where A T is a calibration parameter, w ( ) is the measured remote sensing reflectance at a specific wavelength, B T is a second calibration parameter that accounts for measurement and model errors, and C is a third calibration parameter that controls the degree of non-linearity.The TSS model (Nechad et al., 2009) takes the same form.The model parameters have been estimated analytically from in-situ measurements of the inherent optical properties of several water bodies (Nechad et al., 2010).When satellite/in-situ match-ups are available, these parameters can be calibrated to a specific water body, The optimal wavelength to use for w ( ) in the bio-optical model is often selected empirically; multiple satellite bands are tested and the one that produces the lowest model error is selected.Several studies (Caballero et al., 2018;Dogliotti et al., 2015;Novoa et al., 2017) suggest models that switch between the green, red, and near-infrared (NIR) bands based on the understanding that shorter wavelengths can detect changes in low concentrations and longer wavelengths are more suitable for high concentrations.In this study, the relationship between all three bands and the turbidity and TSS concentrations were first explored using scatter plots.If the relationship between the satellite reflectance and TSS and turbidity concentrations did not appear to follow the non-linear semi-analytical model (Equation 1), then a linear relationship (equivalent to Equation 1 with a very large C parameter) was calibrated.This approach to determining the most appropriate model structure is consistent with that of Novoa et al. (2017).
Two alternative approaches were employed to determine the optimal model coefficients: the traditional least squares approach and the minimization of the mean absolute error (MAE) and bias.The second approach was chosen based on suggestions for bio-optical model calibration procedures presented by Seegers et al. (2018).Least squares regression approaches are sensitive to the outliers that frequently occur in bio-optical modeling datasets, and both in-situ water quality data and remote sensing reflectance values that are atmospherically corrected over inland water bodies have high error and uncertainty.Seegers et al. therefore suggest that mean absolute error and bias are used to assess model performance in addition to the more commonly applied r-squared and root mean square error metrics.Seegers et al. also suggest that the regression slope and intercept (for the in-situ water quality versus fitted estimates) is used to evaluate the model performance, and they advocate for the use of reduced major axis regression (Smith, 2009) to account for errors ingrained in the in-situ water quality estimates.
The model performance metrics and the equations applied to calculate them are summarized in Table 2. Seegers et al. recommended an alternative MAE formulation that applies a log transformation to the residuals.While we agree that this approach is more statistically sound, we selected the traditional MAE formulation shown in Table 2 because it is more commonly used by water managers and facilitates comparisons with existing models.The least squares approach was implemented using MATLAB statistical functions and the MAE and bias minimization was a brute-force approach, where the optimal least squares values were used to identify the range of feasible values.Performance metrics were calculated using the Leave-One-Out-Cross Validation (LOOCV) method, where the model was trained with a single data point removed, and then tested with the removed data.The process is repeated for each data point, and performance metrics comparing the predicted concentrations to the observed concentrations were calculated.However, all data were used to determine the optimal models and best estimates of the fitted parameters.The residuals resulting from the optimal model were plotted to detect trends and assess whether they are TA B L E 1 Summary of water quality data sources compared in this study.

Spatial coverage for HRL
In

| Model application and analysis
The optimal turbidity and TSS regression models were applied to all 227 atmospherically corrected Landsat images, 95 of which were collected during the 2005 to 2010 period when the in-situ database and process-based model simulations are available.All three data sources were compared using alternative data visualizations, including time series plots, monthly mean comparisons, and frequency curves.Aggregate metrics were also calculated, including the mean, geomean, and threshold exceedance percentages.These visualizations and metrics were produced for different levels of spatial and temporal aggregation, demonstrating how decision-making metrics may be impacted by the limited spatial and temporal frequency of in-situ databases.Satellite-based turbidity estimates for the 1999 to 2022 period were produced using the entire L7 database and were analyzed to illustrate how this information can support multi-decadal impairment assessments.

| RE SULTS AND D ISCUSS I ON
The optimal regression models are displayed in Figure 3 and the corresponding error metrics are provided in Table 3.The fitted model coefficients and their standard error and 95% confidence intervals are provided in Table 4.The green and NIR bands had poor performance metrics compared to those of the red band in all model iterations; therefore, the red band was selected as the predictor.This result is similar to previous studies that have used Landsat 7 to estimate turbidity and TSS (Bustamante et al., 2009;Hadjimitsis et al., 2006).The 2 × 2 pixel grid size consistently yielded better performance metrics than the single pixel, 3 × 3, 4 × 4, and 5 × 5 grid size and was therefore used in the median reflectance calculation.
The metrics in Table 3 were used to holistically compare the least squares approach to the MAE and bias minimization approach.The TSS least squares model slightly outperforms the MAE and bias minimization model in terms of the R 2 , RMSE, and MAE; however, the intercept, slope, and percent wins metrics favor the MAE and Bias minimization model.All turbidity performance metrics favor the MAE and Bias minimization model.The regression slopes suggest whether the satellite TSS and turbidity estimates are generally overestimated (values greater than unity) or underestimated (values less than unity).A slope that deviates significantly from a value of one indicates that the regression model might be biasing TSS and turbidity estimates.This potential bias becomes problematic when comparing the full satellite and in-situ databases because it is difficult to assess whether differences are introduced by errors ingrained in the regression model or whether differences are attributed to the spatial and temporal resolutions of data sources.Considering a primary goal of this study is to assess how alternative data sources might affect aggregate measures that are used to quantify the extent of lake impairments, the regression slope is a very important metric for model performance in this study.Therefore, the minimized MAE and bias models were selected for both the TSS and turbidity models.The RMSE of the selected TSS model, 4.94 mg/L, can be directly compared to the average RMSE reported for the calibrated hydrodynamic model, 24.5 mg/L, which is approximately five times higher.The mean absolute percent relative error of the optimal satellite TSS model is 24%, also significantly smaller than the 55% to 125% reported in the process-based model summary.
TA B L E 2 Summary of metrics used to assess regression model performance.

Metric Equation or description
Mean absolute error (MAE)

Regression slope and intercept
The slope and intercept terms resulting from reduced major axis regression of estimated (ŷ) and in-situ (y) TSS and turbidity estimates Percent wins (% wins) The percentage of estimated values that have the lowest mean absolute error for a model that is being compared to an alternative model.This metric provides a direct comparison between models, where a majority of percent wins is desirable The metrics in Table 3 suggest that the satellite regression models can produce accurate and unbiased estimates of TSS and turbidity; however, Figure 3  a limitation on uncertainty assessments.Statistical assessments with regards to assumptions of normality and constant variance were not reported in the process-based model report (Tetra Tech, 2016).We therefore assumed that the TSS model residuals were normally distributed and calculated a 95% confidence interval from the reported RMSE to enable uncertainty comparisons.
Quantitative comparisons between the in-situ data, satellite estimates, and model simulations are summarized in Figures 4-8. Figure 4 shows time series of the alternative data sources for a station in the middle section of the lake.This station (YAD152C) has historically had high Chl-a concentrations, and NC DWR has indicated in stakeholder meetings that they will use this station to regulate the new Chl-a standard.While this study focused on TSS instead of Chl-a due to data availability, the comparisons between data sources translate to future Chl-a estimates.The TSS time series plot highlights the significance of the errors in the hydrodynamic model simulation, where ambient values are consistently underestimated, and peaks are severely overestimated.Compared to model simulations, the satellite estimates are of similar magnitude to the in-situ database.Figure 4 also illustrates how the satellite data effectively fills temporal gaps in the in-situ database.This study focuses on a time period with a dense in-situ database, but since 2011, in-situ samples have been collected on a five-year cycle, so the temporal gaps are now more significant than what is shown here.
To further assess the reliability and consistency of the satellite TSS and turbidity estimates, the monthly mean values were calculated for each data source and for all ten stations during the 2005 to 2010 period.The full temporal resolution of each source (not just the match-ups) TA B L E 3 Regression model performance metrics using 2 × 2 pixel satellite grids for both the least squares and minimization of mean absolute error (MAE) and mean bias (MB) approaches.F I G U R E 4 Example of time series comparison of TSS and turbidity estimates for a sampling station in the middle section of High Rock Lake.
was used to increase the number of data points available to estimate the mean values.The 95% confidence intervals of the TSS monthly means are shown in Figure 5 and were determined using the confidence intervals of the fitted satellite model coefficients (Table 4) and the are therefore highly uncertain and potentially biased, yet they are a key metric for lake water quality assessments and can heavily influence longterm management objectives.Since the temporal and spatial resolution of satellite imagery is constant and the full extents of water bodies are captured, aggregate water quality estimates generated from satellite images are less likely to be impacted by sampling bias.Although, a seasonal analysis of the missing satellite data attributed to cloud cover should be performed to assess sampling bias on a site-by-site basis.Regardless, comparing frequency curves between alternative data sources can help stakeholders understand and explore these potential biases.
Frequency curves of TSS and turbidity were produced for HRL using both the satellite and in-situ database, enabling comparisons of the frequency of threshold exceedances.Frequency curves were first produced using the satellite/in-situ match-ups only to assess how well the curves align and the extent to which the regression model error may impact exceedance frequencies.These curves are displayed in the top row of Figure 6 for TSS and turbidity.The TSS comparison also includes the model simulations as a comparison.The turbidity axes were limited to 60 NTU because the figures would otherwise be skewed by a few extremely large values, making it difficult to distinguish differences between the curves.For both parameters, as concentration magnitudes increase, the satellite values are slightly lower than the in-situ values, but the difference is less significant in the turbidity curves.This observation is consistent with the regression model performance metrics.The model simulation TSS curves deviate substantially from both the satellite and in-situ curves as it underestimates low values and then begins to overestimate the higher values, reflecting the known physical model limitations.To assess how the frequency curves shift when the temporal frequency of observations increases, the station estimates from the 95 Landsat images from the 2005 to 2010 period were used in the analysis.To assess how the frequency curves shift when the spatial frequency of observations increases, the full lake area for the same 95 Landsat images was used in the analysis.These alternate frequency curves are displayed in the bottom row of Figure 6.To compute the curves over the full lake area, a water mask was created using the mean Modified Normalized Difference Wetness Index (MNDWI) (Xu, 2006) and a lower threshold of 0.5.The pixels within this mask were included in the areal averages.When a higher temporal frequency is used, the curves shift to the right for mid-range concentrations.If the 25 NTU turbidity threshold is used as an impairment metric, the probability of exceedance shifts just slightly above 20%.However, when the full spatial and temporal frequency of the satellite estimates is used, the curve shifts to the left and the probability of exceedance for the 25 NTU threshold drops below 10%.This difference is likely attributed to the shape of the lake and the choice of sampling stations; the highly turbid upstream reaches of the lake are generally narrower, but these stations are given equal weight in aggregate measurements of all stations.
Similar to the NC Chl-a study implemented by Keith (2014), Figure 7 contains time series of turbidity estimates compared to the exceedance threshold at specific stations and for the entire lake.The station time series compare the satellite and in-situ data for the 2000 to 2011 period, and the full lake assessment time series uses all 227 images in the Landsat-7 database, covering the 2000 to 2021 period.The geometric mean (geomean) was calculated from all lake pixels within each image and is plotted along with the percentage of lake pixels exceeding the 25 NTU threshold.The model validation metrics from the satellite/in-situ match-ups (Table 3) indicate that the turbidity model is unbiased, but differences between the satellite and in-situ estimates are present when using a larger database; the satellite turbidity estimates are slightly lower than the in-situ estimates for station HRL051, and the reverse occurs for station YAD152C.The exceedances and geomeans are lower when the full lake area is considered, compared to the individual stations.These observations are consistent with the turbidity frequency curves in Figure 6, but the time series plots show when the exceedances occur and illustrate differences in temporal frequency for each data source.Note that there are approximately twice as many satellite observations compared to the in-situ dataset for each station, and the dataset is significantly amplified if the full lake area is considered, by a factor of nearly 2000.Landsat-7 has a relatively long revisit period compared to Sentinel-2 and Landsat-8/9, so the quantity of satellite-based water quality observations will further increase when the new models are developed (Table 5).
As a final comparison, singular aggregate measures of TSS and turbidity are compared in Figure 8 to illustrate the variability of decisionmaking metrics attributed to alternative data sources and levels of spatial and temporal aggregation.Means, geomeans, and the percent of threshold exceedances are compared for each data source for concurrent data (same spatial and temporal frequency -the match-ups), the full time series that was available at the stations, and the full lake area over the full time series.The in-situ "Concurrent" measure is repeated for the "Full Time" and "Full Time & Space" plots to facilitate comparisons between the denser datasets.The mean and geomean comparisons demonstrate how these two metrics can lead to very different conclusions, especially for the model TSS simulations that have some extremely high values but consistently underpredict ambient values.In general, the geomean is of lower magnitude than the mean values and differences between the alternative data sources are less significant.Differences between the mean and geomean are less severe when using the full  temporal and spatial resolution of the satellite imagery.Both the mean and geomean satellite estimates are substantially reduced when the full lake area is used, a result that is consistent with the frequency curve and time series analysis.
Figures 4-8 highlight potential applications of turbidity and TSS estimates derived from Landsat-7, and the model performance metrics presented in Table 3 suggest that satellite estimates are accurate and reliable.However, the satellite/in-situ match-ups that were used for the model calibration/validation were manually screened to ensure they were not affected by clouds, cloud shadows, surface objects, nor excessive sun glint, and the stations were all located far from the shoreline, suggesting that they are not impacted by bottom effects, submerged vegetation, nor adjacency effects from tree canopies.This manual screening process is not feasible when processing full images with thousands of pixels, so automated quality control processes must be implemented.The Landsat-7 pixel quality assurance data was used to mask out pixels that were flagged for clouds, cloud shadows, and high aerosols, but these flags are sometimes inaccurate.This pixel quality assurance information does not flag additional issues related to remote sensing of water quality, including white caps, sun glint, adjacency effects, and bottom effects.A conservative water mask was applied to prevent the inclusion of pixels that were close to the shoreline to mask some of these pixels, but the reliability of this processing step was not thoroughly evaluated in this study.Recall that the primary focus of this study was to demonstrate the utility of satellite-based water quality estimates in the context of nutrient management at HRL.This goal has been met, but additional research and algorithm development is required to produce an operational product that successfully masks low-quality pixels.

| CON CLUS ION
The analysis presented for TSS and turbidity demonstrates how satellite imagery can be leveraged to inform ongoing lake assessments.
Our present analysis is somewhat limited by the lack of in-situ water quality and satellite image match-ups; however, the calibrated models quality information from 2005 to 2006 and 2011.The aggregated database contains approximately 760 observations of each water quality variable.Despite having a relatively large database, only 10 out of the 79 total sampling dates have a concurrent, same-day Landsat-7 satellite image.These images are spread across the 2005 to 2011 period and are listed in Figure 3.The number of match-ups are further reduced due to missing values from the in-situ database and null values in the satellite images.The final in-situ calibration/validation database contains 42 turbidity and 39 TSS values.

F
Illustration of the satellite image processing procedure.The left image is a high-resolution true-color image from Google Earth (from September 2020) that shows the locations of in-situ stations.The upper right image displays a true-color Landsat-7 surface reflectance image, produced by the ACOLITE software using the blue, green, and red bands.The image on the lower right displays the satellite reflectance values for 2 × 2 pixel grids for selected stations, derived from the processed Landsat-7 image.The in-situ turbidity (NTU) and TSS (mg/L) values associated with the reflectance data are provided in the lower right image legend.The lake center is located at approximately 35°39′ N and 80°18′ W.
illustrates the model limitations.Out of the forty (approx.)data points used to calibrate the model, relatively few of them have high TSS and turbidity values, and most of them occur on the same date in March 2005.The optimal model parameters are therefore highly dependent on a single satellite image.Quantifying model uncertainty is challenging considering the model residuals contain many outliers and are not normally distributed, as shown in the normal probability plots in Figure3.The confidence interval for the turbidity model was not calculated because the turbidity variance increases with increasing magnitude, and Bootstrapping is not appropriate for variables with non-constant variance.Addressing this issue would require us to transform the data and use a different model structure than the commonly applied semi-analytical turbidity model.We decided to maintain the common model structure and note that the non-constant variance poses F I G U R E 3 Optimal total suspended solids (TSS) and turbidity regression models.The top row displays the reflectance data, color-coded by date, and the fitted linear and non-linear models along with their statistical performance metrics.The 95% confidence interval of the fitted model is shown for TSS but could not be determined for the turbidity model because the residuals have non-constant variance.The middle row shows the fitted versus in-situ data, from which the regression slope and intercept metrics were calculated, and the bottom row contains the normal probability plots of the residuals.
24.5 mg/L RMSE reported for the process-based model simulations.Error estimates for the in-situ TSS estimates were not available, but these values are considered uncertain based on the laboratory procedure, estimated to have a precision of ±20% (APHA-AWWA-WEF, 2005).The monthly means calculated from the satellite and in-situ datasets have very similar trends and magnitudes, especially when compared to the model simulation values.The in-situ estimates lie within the 95% confidence interval of the satellite estimate with the exception of December, although the difference is very small.The 95% confidence intervals of the model TSS means is significantly larger than those of the satellite, and the best estimates of the model means frequently fall outside of the satellite TSS confidence intervals.The relatively narrow uncertainty and general consistency between the monthly mean satellite and in-situ estimates suggests that satellite-based estimates can be used to fill temporal gaps while adequately capturing seasonal trends.The severity of lake impairments is often measured using thresholds on the water quality parameters of concern.These thresholds are established under the guidance of the EPA and are incorporated in an assessment methodology document that is approved by the Environmental Management Commission (EMC) for each biennial assessment(N.C.Environmental Management Commission, 2021).The current N.C.turbidity standard specifies a threshold of 25 NTU in lakes and reservoirs.With current in-situ sampling practices, the number of exceedances observed depends on the spatial and temporal frequency of sampling, which is limited by the State's financial resources.Exceedance frequency estimates

F
Monthly mean TSS and turbidity estimates from satellite images and model simulations compared to in-situ values, for all stations during the 2005 to 2010 period.The whiskers in the left image represent the 95% confidence intervals.

F
Exceedance frequency curves for TSS and turbidity concentrations using satellite/in-situ match-ups only (top row), and full spatial and temporal resolution of satellite images (bottom row).F I G U R E 7 Time series comparison of in-situ and satellite turbidity estimates for station HRL051 (top), and station YAD152C (middle) from 2000 to 2011, and satellite-based turbidity geomeans and percent exceedances calculated over the full lake area, from 2000 to 2021.TA B L E 5 Turbidity summary metrics for select stations during the 2000 to 2011 period and the full lake area for the 2000 to 2021 period.Both satellite and in-situ data are included in the station comparisons, and only satellite data can be presented for the full lake area over the extended time period.
have very low bias and the satellite estimates capture seasonal trends that are consistent with the in-situ data.The satellite model for TSS out-performed the process-based model TSS simulations based on statistical performance metrics and the time series, seasonal mean, and F I G U R E 8 Comparison of aggregate decision metrics for TSS (top) and turbidity (bottom) using alternative levels of spatial and temporal aggregation.
The standard errors of the fitted coefficients were calculated, and bootstrapping was applied to identify 95% confidence intervals, considering the errors were not normally distributed.Bootstrapping consists of randomly sampling from an estimated distribution with replacement and fitting a new model for each bootstrapped sample, allowing a distribution of the model estimates to be created that provides the 95% confidence interval.
Selected models and fitted coefficients, and their estimated error and confidence intervals.