Validity of the Landsat surface reflectance archive for aquatic science: Implications for cloud‐based analysis

Originally developed for terrestrial science and applications, the US Geological Survey Landsat surface reflectance (SR) archive spanning ~ 40 yr of observations has been increasingly utilized in large‐scale water‐quality studies. These products, however, have not been rigorously validated using in situ measured reflectance. This letter quantifies and demonstrates the quality of the SR products by harnessing a sizeable global dataset (N = 1100). We found that the Landsat 8/9 SR in the green and red bands marginally meet the targeted accuracy requirements (30%), whereas the uncertainties in the blue and coastal‐aerosol bands ranged from 48% to 110%. We further observed > +25% biases in the visible bands of Landsat 5/7 SR, which can introduce an apparent downward trend when applied in time‐series analyses combined with Landsat 8/9. Users must exercise caution when using this archive for trend analyses, and progress in atmospheric correction is required to foster advanced applications of the Landsat archive for aquatic science.

apparent downward trend when applied in time-series analyses combined with Landsat 8/9.Users must exercise caution when using this archive for trend analyses, and progress in atmospheric correction is required to foster advanced applications of the Landsat archive for aquatic science.
Studying spatiotemporal variability of water quality (WQ) is essential to understanding the impacts of anthropogenic pressure and climate change on drinking water supplies, reservoirs, lakes, rivers, and coastal estuaries (Mouw et al. 2015).However, in situ measurements of WQ parameters across vast geographic areas are costly and demand extensive efforts.Government-funded satellite remote sensing programs (like the US Geological Survey [USGS] and National Aeronautics and Space Administration's Landsat mission) allow WQ practitioners to overcome this barrier by providing well-calibrated observations over inland and coastal waters at no cost (Zhu et al. 2019), paving the way for examining spatiotemporal trends in WQ (Mouw et al. 2015).Research on algorithm development and validation, however, continues to evolve to improve the consistency and credibility of remote sensing products in aquatic environments (Pahlevan et al. 2021).One of the largest sources of uncertainty in satellite-derived WQ parameters stems from atmospheric correction, computed differently over aquatic and terrestrial surfaces.Several atmospheric correction methods exist for terrestrial applications, such as the Land Surface Reflectance Code (LaSRC) used as the basis for the operational processing of Landsat 8/9 imagery from 2013 to the present.Examples of water-specific atmospheric correction processors include the Atmospheric Correction for OLI lite (ACOLITE; Vanhellemont 2019), image correction for atmospheric effects (De Keukelaere et al. 2018), and SeaWiFS Data Analysis System (SeaDAS;Franz et al. 2015), which is used to generate the USGS Landsat Provisional Aquatic Reflectance (AR) products (only for Landsat 8-9).However, reflectance products from these methods are not readily available for the entire Landsat archive, restricting their use for global-or continental-scale applications.For example, as the Landsat AR products are still on-demand, the end-user communities (e.g., limnologists, hydrologists) can only count on the data available in the cloud for largescale studies (e.g., long-term time series at global or continental scales).For the heritage missions, Pahlevan et al. (2018) demonstrated SeaDAS-corrected reflectance products, however, because of the random/systematic noise in their near-infrared (NIR) and shortwave infrared bands, the extension of this processing to Landsat 5/7 images has been put off until a more viable processing method is formulated and tested.
With the free and open-access Landsat archive (Wulder et al. 2022), web-based platforms have been developed to enable rapid large-scale analyses of this invaluable Earth observation database.For example, cloud access and computing platforms, such as Google Earth Engine (GEE) or Microsoft Planetary Computer (MPC), offer a publicly available and efficient computational environment for redistributing, storing, and processing big (spatial) data from Federally funded satellite programs like Landsat (Gorelick et al. 2017;Microsoft Open Source et al. 2022).For example, the USGS Landsat Collection 2 Level-2 surface reflectance (SR) products (or ρ s ) are available for Landsat 4-9 in GEE and MPC, which offers an easy-to-use platform that has been harnessed not only for terrestrial science and applications (Wulder et al. 2022), but also for WQ assessments (Kuhn et al. 2019;Ross et al. 2019;Topp et al. 2021;Zhang et al. 2022).For instance, Hu et al. (2022) employed 35 yr of Landsat data to evaluate trends in eutrophication in China.For Taihu Lake, Yin et al. (2021Yin et al. ( , 2023) ) used the SR products to analyze trends in water clarity and chlorophyll a (Chl a) concentration.The SR products have also been used to develop global Secchi Disk depth (Z sd ) algorithms (He et al. 2022;Zhang et al. 2022), and assessment of lacustrine algae blooms (Hou et al. 2022).In addition, Ross et al. (2019) used a previous version of the SR products (Collection 1) to develop the AQUASAT dataset-a compilation of more than 600,000 matchups between SR and WQ parameters.Despite the reported utility of SR products for WQ assessments, these products have not been specifically developed for aquatic environments, and their viability and application in this domain have not been fully examined.Ilori et al. (2019) assessed the quality of the Landsat 8 SR products (2013)(2014)(2015)(2016)(2017) in coastal waters with a limited match-up dataset (N $ 50) and reported large uncertainties compared to equivalent products from ACOLITE and SeaDAS, corroborating their lack of accuracy for WQ applications in coastal waters.
This study aims to evaluate the accuracy of the USGS Landsat SR archive in global inland and near-shore coastal waters.To do so, a sizeable in situ radiometric dataset (N $ 12,000) was compiled and matched with the Landsat archive .We report the quality of the Landsat SR products collectively and discuss the results and their implications for future short-and long-term WQ assessments.

Study site and field data
The in situ data corresponds to multiple sources from globally measured over-water radiometric spectra referred to as spectral remote sensing reflectance (R rs [λ]) (N $ 12,000; Fig. 1; λ is excluded hereafter for brevity).The first one is the GLORIA dataset (Lehmann et al. 2023).The second one is a compilation of remote sensing data obtained by other partners in Pahlevan et al. 2022, which were not included in GLORIA.The last dataset is from Brazilian inland waters (Maciel et al. 2021).Reader is referred to each reference for a detailed description of the datasets.
R rs is calculated via the ratio of water-leaving radiance (L w ) and downwelling irradiance just above the water surface (E s ).To compare with the Landsat SR products, R rs is multiplied by p to generate unitless SR (ρ s ) (Eq. 1; Mobley 1999).

Landsat Collection 2 Level-2 SR, match-up identification
The Landsat Collection 2 SR products are based on two different algorithms.For OLI and OLI-2, the LaSRC algorithm (Masek et al. 2020;United States Geological Survey 2023) has been conditioned by the USGS to run operationally, but the Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS) algorithm is applied to TM and ETM+ data (Schmidt et al. 2013).The main difference between these algorithms is how they obtain the atmospheric parameters.Both algorithms obtain the Aerosol Optical Thickness (AOT) directly from the imagery using the Dark Dense Vegetation method (Kaufman et al. 1997).LEDAPS employs the relationship between the blue and red spectral bands to derive AOT at 550 nm (Claverie et al. 2015) and uses a fixed continental aerosol model.On the other hand, LaSRC assumes an urban clean aerosol model, and the inversion of AOT is performed using the relationship between the coastal-aerosol (CA)/red or blue/red spectral bands (Vermote et al. 2016).In addition, LaSRC adjusts the aerosol model through a spectral dependence coefficient optimized by using CA and blue bands for AOT inversion.The original LaSRC source code was translated by the USGS to run computationally efficiently and reduce archival processing costs (see Masek et al. 2020 for a general overview).It is important to note that Landsat SR products that are accessed and analyzed in GEE or MPC are redistributed and cannot be considered authoritative USGS Landsat products, as the official source is the USGS, and data provided in cloud platforms is a copy from USGS.
To extract SR from the Landsat SR archive, we used the MPC's SpatioTemporal Asset Catalog.We identified Landsat scenes matching our field dataset ("Study site and field data" section) by accepting a AE 2-d temporal gap for inland matchups.This decreased the number of match-ups to 5398.Representative SR products were extracted by taking the median value of a 5 Â 5-element window to suppress various noise sources (e.g., whitecaps).After that, we reduced the timedifference condition to AE2 h for our near-shore coastal matchups to minimize the effects of changes in water mass due to tide events.The second criterion was the exclusion of a station if any pixels within the 5 Â 5-element window were flagged as land or cloud/cloud shadows for which the pixel_QA band was applied.In addition, samples with ρ s < 0 in any visible or NIR spectral bands (VNIR) were removed.We also added a criterion to exclude outliers defined as match-ups with differences between in situ measured and satellitederived ρ s in the green band > 200% (Pahlevan et al. 2021).These filtering criteria led to a total of 1100 data points out of the initial $ 12,000 match-ups (i.e., N = 64 For TM, N = 543 for ETM+, N = 454 for OLI, and N = 39 for OLI-2).The final dataset is available on Maciel (2023).

Optical water types
We evaluated the accuracy of the SR products for different aquatic conditions by classifying the spectra according to their Optical Water Types (OWT) and computing the associated performance metrics for each OWT.We classified water spectra into seven classes according to Pahlevan et al. (2021), which is a coarser categorization scheme than that of Spyrakos et al. (2018).Briefly, each reference spectrum (400-750 nm) was normalized by its area under the curve and compared to each reference class through the Spectral Angle Mapper algorithm (Clark et al. 1993).Our reference spectra followed the ones described in (Pahlevan et al. 2021) with a gradual transition from blue waters (OWT1) to highly turbid waters represented by OWT7.We selected only the combination of OWT/sensor with a sample size larger than 50 to provide statistically robust analyses.See the Supporting Information for further information about OWTs.

Time-series generation
To evaluate the consistency of the SR products in timeseries analyses, we examined the Landsat archive over three optically distinct water bodies across the United States.These regions include Crater Lake (deep oligotrophic), western Lake Erie (shallow eutrophic), and Lake Okeechobee (shallow, highly turbid, and eutrophic), covering broad limnological conditions.The SR products were computed by applying median operators spatially (approximately 3 km Â 3 km windows) and temporally (monthly).To further mitigate noise in the time-series data, a moving 3-month median window filter was applied (Ho et al. 2019).More information on the timeseries generation is available in Supporting Information.

Accuracy assessment
Five statistical metrics are used to evaluate the quality of SR products.The two principal metrics are the median symmetric accuracy (ε) and median bias (β) (Morley et al. 2018;Seegers et al. 2018).These metrics provide accuracy and bias in percentage (%) are interpretable, zero-centered, calculated in log scale, and are resistant to outliers.In addition to these metrics, we added the mean absolute percentage error (MAPE), root mean square log error (RMSLE), and the slope of linear regression (S) are also computed to allow comparability with previous studies.ε and MAPE quantify accuracy in percentage (with zero as the ideal performer), β provides bias in percentage, and allows negative and positive values to report under(over)estimation, and RMSLE captures mean errors in log scale.

Results
The accuracy of ρ s in the visible spectral bands is presented in Fig. 2, but the complete statistical metrics are tabulated in Table 1.For TM, ε ranges from 37.3% to 52.6%, with a mean β of $ 30%.For ETM+, ε is between 31.5% (red spectral band) and 62.3% (blue spectral band), with b from 19.5% to 61.5%, indicating overestimated satellite-derived ρ s .For these heritage Fig. 2. Accuracy assessment of SR products for TM, ETM+, and OLI's blue, green, and red spectral bands.See Table 1 for the full statistics including the other bands and OLI-2.
The time-series results for each lake are shown in Fig. 4. We noticed biases in time-series generation using TM and ETM+ green spectral bands relative to that of OLI (see the Supporting Information for details).The relative differences in the median ρ s range from > 100% over the Crater Lake to $ 40% across the highly scattering waters of western Lake Erie.

Discussion
The USGS Landsat SR products have revolutionized human-scale biophysical studies that require consistent and atmospherically corrected products (Dwyer et al. 2018).However, the aquatic remote sensing user base tends to apply water-tuned atmospheric correction processors to account for confounding effects of the atmosphere and meet the targeted accuracy requirements in aquatic reflectance products (i.e., 30%; GCOS 2016; Pahlevan et al. 2021) for generating reliable downstream WQ products.Although this requirement could constrain relevant SR applications (e.g., WQ), we noted that the Landsat 8/9 OLI ρ s products in the green and red spectral bands carry marginally acceptable uncertainties (ε $ 30%) that closely resemble that of selected processors whose performances were reported in fig.A1 in Pahlevan et al. (2021).
This observation supports downstream products (2013 to present) that employ these two spectral bands for assessing changes in global WQ.However, the equivalent TM and ETM + spectral bands suffer from large biases, limiting their utility for end-users' applications and algorithms (e.g., ocean-color three bands; O'Reilly and Werdell 2019) where absolute ρ s values are critical.The overestimation in the green and red bands hinders long-term aquatic studies by introducing apparent downward trends in the time-series analysis of ρ s from TM -ETM+ to OLI -OLI-2 (see Fig. 4).These portrayed trends could lead to the mischaracterization of long-term WQ studies.For example, Hu et al. (2022) evaluated Trophic State Index (TSI) trends in Eastern China between 1986 and 2020.They observed an upward trend between 1986 and 2013, with a downward trend after 2013.This trend could be attributed to OLI's capability relative to TM and ETM+ in retrieving Chl a (and TSI) over clearer waters due to its enhanced radiometric fidelity (Pahlevan et al. 2014).This improvement may appear as a downward trend in time-series analyses, leading to erroneous conclusions on temporal changes in WQ.The noticeable range of biases aligns well with our match-up and OWTs analyses ("Results" section), where the quality of SR tends to improve for waters with higher water-leaving radiance.It is important to elucidate that these differences can be partially explained by the differences in the RSRs of each sensor/band (Chander et al. 2013;Roy et al. 2016;Helder et al. 2018).Nonetheless, such differences were found to be very small in the visible bands (< 11%; see the Supporting Information).Overall, with Landsat 7/ETM+'s nominal science operations ending on 05 April 2022, the user community will continue to rely on OLI and OLI-2 products to track changes in aquatic ecosystems in the 2020s.Such a discontinuity in SR products may hamper these efforts unless more appropriate reflectance products are operationally produced in future Landsat collections.
Although no major biases are identified in OLI, and OLI-2's CA and blue spectral bands, their overall uncertainties (from 50% to 110%) appear inadequate for aquatic applications.Hence, algorithms that harness the blue spectral bands or all four visible spectral bands of Landsat 8/9 OLI/OLI-2 SR products may be susceptible to substantial uncertainties.Identifying the source of bias in the Landsat 5 TM and Landsat 7 ETM + SR products is challenging and requires further investigation.For instance, Pahlevan et al. (2018) applied system vicarious calibrations to remove observed biases (albeit in the opposite direction of what is found here) in TM and ETM+ TOA observations.It is, thus, imperative to determine whether the biases in SR products originate from TOA observations or can be attributed to algorithmic inaccuracies in the atmospheric correction scheme.Our observations further corroborate the need for improvements in atmospheric corrections methods to enable advancements in the retrieval of WQ constituents and limnological studies (e.g., estimating particle type and size; Reynolds et al. 2016).

Conclusions
Although developed for terrestrial science and applications, the SR products have been applied in several aquatic remote sensing studies due to their accessibility.Using a comprehensive in situ dataset of global inland and near-shore coastal waters (N = 1100), for the first time, we quantified the quality of the Landsat SR archive from TM, ETM+, OLI, and OLI-2 for aquatic applications (e.g., WQ studies).We found that OLI and OLI-2 green and red spectral bands narrowly meet the current accuracy requirements (30%) in reflectance products for aquatic science and applications, whereas their validity in the CA, blue, and NIR spectral bands are inadequate.We further identified large biases in the TM, and ETM+ SR products across all the VNIR bands, restricting their utility for limnological studies.More importantly, this bias can distort future time-series analyses based on multimission data (from TM, ETM+, OLI, and OLI-2) by exhibiting apparent downward trends (or darkening) in the SR products.Therefore, care must be exercised when analyzing the Landsat archive for such applications.Our results suggest that aquatic reflectance products obtained from more rigorous processing techniques should be harnessed for their reliability.Until their broad accessibility via existing cloud-computing platforms, the user community may treat SR-based trend assessments (1984 to present) with extra care.

Fig. 1 .
Fig. 1.Study area with locations where satellite to in situ radiometric data (match-ups) were available for evaluating the Landsat SR archive.Samples in South Africa (N = 2) are not shown in this Figure.Basemap Source: GEBCO Bathymetric Grid.

Fig. 3 .
Fig. 3. Variability of ρ s in each OWT.The color palette varies from blue (best quality) to red (poorest).

Fig. 4 .
Fig. 4. Time series of the green bands for arbitrarily selected locations in three optically different lakes across the United States.The SR products from TM and ETM+ are biased high to varying degrees relative to OLI.Temporally median ρ s and the geographic location for the center pixel are annotated.See Supporting Information for the blue and red bands.

Table 1 .
Summary of accuracy assessment of SR products.