SPACE-BORNE CLOUD-NATIVE SATELLITE-DERIVED BATHYMETRY (SDB) MODELS USING ICESat-2 and SENTINEL-2

Shallow nearshore coastal waters provide a wealth of societal, economic and ecosystem services, yet their topographic structure is poorly mapped due to a reliance upon expensive and time intensive methods. Space-borne bathymetric mapping has helped address these issues, but has remained largely dependent upon in situ measurements. Here we fuse ICESat-2 lidar data with Sentinel-2 optical imagery, within the Google Earth Engine cloud platform, to create openly available spatially continuous high-resolution bathymetric maps at regional-to-national scales in Florida, Crete and Bermuda. ICESat-2 bathymetric classified photons are used to train three Satellite Derived Bathymetry (SDB) methods, including Lyzenga, Stumpf and Support Vector Regression algorithms. For each study site the Lyzenga algorithm yielded the lowest RMSE (approx. 10-15%) when compared with validation data. We demonstrate a means of using ICESat-2 for both model calibration and validation, thus cementing a pathway for fully spaceborne estimates of nearshore bathymetry in shallow, clear water environments.


Introduction
Accurate and current bathymetric maps are essential for coastal management. Emerging demands of the blue economy will open up new opportunities, but have significant impacts on coastal regions and habitats globally (LiVecci et al, 2019). Several key markets that will demand resources from the nearshore environment have been identified for future and continued development including, marine navigation, aquaculture, climate change adaption and mitigation, coastal resilience and disaster recovery. Technological innovation that yields contemporary nearshore seafloor maps with regular repeat observations will enable proper Marine Spatial Planning (MSP) and sharing of coastal waters (Lester et al, 2018;Foley et al, 2010). This is particularly important for Big Ocean States (Small Island Nations) that have limited data access (Flower et al., 2020) or where data does not match local needs (Kelman and west, 2009).
Competing sectors of the Blue Economy will change the bathymetry of nearshore waterways and in a variety of ways. Dredging to meet shipping and navigation demands will increase channel depth and quantity of material spoil (Bishop et al, 2006). Similarly, aquaculture practices such as kelp and oyster farming will reduce erosion and increase sediment accumulation (Zhang et al, 2020;de Paiva et al, 2018). These practices will change the structure of the seafloor and have local-scale implications for sub-aquatic ecosystems and nearshore navigation, by causing rapid changes in benthic morphology. In addition, nearshore structure is increasingly being looked to as a source of nature-based risk reduction solutions, including the use of natural barriers to sea level rise and storm surges (Spalding et al, 2014). Accurate maps of the seafloor are a critical parameter in measuring the wave attenuation of benthic habitats, like seagrasses and coral reefs, (Narayan et al, 2016) and the erosion potential of dune-lined beaches (Schweiger et al, 2020), but up-to-date and repeatable observations of sediment stability and structural complexity are needed

Accepted Article
This article is protected by copyright. All rights reserved. (Christianen et al, 2013: Harris et al, 2018. These and other processes are not fully captured by current, openly available bathymetry data (Wolfl et al, 2019), which are limited in spatial and temporal resolution. Increasing this resolution requires financial investment and substantial computation and energy needs to conduct more comprehensive or frequent surveys, particularly to capture the detail required in coastal environments.
There are several free and open initiatives that procure bathymetric data such as the  Kapoor, 1981). While these initiatives ensure that global bathymetric data are freely available across open oceans, these data are inadequate in shallow waters where the vertical and spatial resolution is insufficient. Singlebeam (SBES) and Multibeam Echo Sounders (MBES) are commonly utilized for local-scale high-resolution mapping (Janowski et al, 2018) but collecting data in shallow water is hazardous and time consuming.
Bathymetric lidar data acquired from airborne systems (Kim et al, 2017) circumvent navigation in busy shipping traffic, however they are economically expensive and time-intensive to gather over large areas.Thus existing methods of acquiring bathymetric models face frequent and prohibitive limitations.
Recent advances in Satellite-Derived Bathymetry (SDB) using commercial and open source multispectral Earth Observations have lead to new methodological developments and applications through increased spatial resolution and improved estimations (Traganos et al, 2018a;Caballero et al, 2019;Stumpf, 2020a, 2020b;Casal et al, 2019Daly et al, 2020;Mateo-Perez et al. 2020, Li et al, 2019Poursanidis et al, 2019a;Lyons et al, 2020).

Accepted Article
This article is protected by copyright. All rights reserved.
number of SDB studies have improved water depth retrieval through empirical correlations of surface reflectance with field-acquired depth points (Lyzenga et al., 2006, ;Stumpf et al, 2003); machine learning that combines surface reflectance and in-situ data (Pan et al, 2015;Geyman and Maloof, 2019;Albright and Glennie, 2020); automatic tuning of SDB to water column conditions (Kerr and Purkis, 2018;Li et al, 2019); tracking surface wave kinematics based on the temporal offset between the satellite's bands (Daly et al, 2020); and physics-based inversion algorithms that produce highly-accurate SDB estimations but at the expense of required increased computational power . Despite the advances in SDB and the recent increased availability of cloud-computing platforms such as the Google Earth Engine and Amazon Web Services, most approaches still rely on airborne/shipborne data and local computing resources.
An ability to widely collect consistent open source SDB calibration and validation data will alleviate some of the limitations in deriving routine nearshore bathymetry. The first release of ICESat-2 data highlighted the potential to acquire global bathymetric lidar data in shallow (<40m) coastal waters (Markus et al, 2017;Parrish et al, 2019). This exciting new capability is especially timely for coastal ecosystem studies as it paves the way for purely spaceborne SDB approaches in the optically shallow global seascape realm (Albright andGlennie, 2020, Ma et al. 2020). Such fusion approaches between satellite-based multispectral imagery and lidar data are now feasible and could significantly reduce the needed time, costs, and computation to produce seamless SDB maps, especially in data poor regions.
In the present study, we have developed one of the first open source fully space-based approaches to measure nearshore bathymetry in optically shallow waters. The SDBs presented here are derived from a newly designed cloud-native workflow within the Google Earth Engine (GEE) cloud platform (Gorelick et al, 2017) using multi-temporal Sentinel-2A/B data (Traganos

Accepted Article
This article is protected by copyright. All rights reserved.

Sentinel-2
The estimation of satellite-derived bathymetry is based on Copernicus Sentinel-2 data. Sentinel-2 is a twin-satellite mission with 10-m spatial resolution and a 5-day revisit period, available since

Accepted Article
This article is protected by copyright. All rights reserved.
L1C yielded higher quality at some locations due to increased data availability. L2A and L1C data were not mixed at a single study site. Atmospheric correction was not performed on the L1C data as this is computationally expensive and allows a comparison of the errors bewtween L2A and L1C data. year period (1993-2012Sutherland et al, 2013). Data was derived from a variety of measurement techniques including topographic surveys, bathymetric lidar, gridded and raw multibeam bathymetry, and nautical chart sounding depth, that were all combined to create a 30 m DEM. The Biscayne Bay (S200) DEM has a 30 m spatial resolution which was derived from nearly 150,000 soundings collected within the bay from 12 different surveys over a period of 63 years (1930-1993NOAA, 1998). The data was gridded by prioritizing most recent data and/or highest resolution data. Where data coverage was sparse, generic interpolation and extrapolation models were used to fill gaps. No information on the accuracy of the datasets are disseminated with the data.

Singlebeam Sonar
Using low cost fishfinder tools, the collection of bathymetry data at the Gulf of Chania (Crete) was completed June -July 2020 based on the method of Poursanidis et al (2018), covering a

Accepted Article
This article is protected by copyright. All rights reserved.

Biscayne Bay, Florida
Biscayne Bay is an estuary on the east coast of South Florida (USA) that is ecologically diverse and serves as a nursery for many marine species. The bay is heavily influenced by human activities such as boating, diving, recreational and commercial fishing, and serves as a major shipping port where there are incised channels that can exceed 15 m in depth and require regular dredging. The benthic habitats of the central and southern regions of the bay are dominated by

Accepted Article
This article is protected by copyright. All rights reserved.

Bermuda
Bermuda is a subtropical Caribbean Island over 1200 km north of the Bahamas, and 965 km east of North Carolina. It is surrounded by the northernmost coral reef assemblage in the Atlantic Ocean and also includes seagrass beds, mangroves, salt marshes, and rocky and sandy intertidal areas (Coates et al, 2013). The reef complex forms a 2 to 10 m deep, 1.5 km wide rim that surrounds the northern part of the islands. Patch reefs within the lagoon can be found within 1-2 m of the water surface. Marine transport is important to the island as most resources are imported and shipping channels have been modified to accommodate large cruise ships. Channel dredging has led to water quality issues in nearby regions (Lester et al, 2016).

ICESat-2 Bathymetric Photons
ICESat-2 ATL03 data was queried via the Open Altimetry online portal (https://www.openaltimetry.org) where it was subset and downloaded over our regions of interest from the NSIDC server in HDF5 format (Figure 1, a, b, c). The ATL03 product does not record the true location and elevation of sub-aquatic photons due to the refraction of the laser at the air/water interface and the delayed travel time of the laser through the water column. To correct the offset, accurate longitude, latitude, and photon height corrections were performed using the methods of Parrish et al. (2019). This uses the spacecraft geometry and incident laser refraction at the water surface to correct target photon depths, using inputs that include the ICESat-2 instrument wavelength (532 nm), water salinity (35 Practical Salinity Units (PSU)) at atmospheric pressure

Accepted Article
This article is protected by copyright. All rights reserved. and location specific ocean temperatures. Photons were transformed to orthometric height (EGM2008), and local UTM zone. Water surface photons were manually selected using an interactive Python plot and user interpretation The photons located beneath the water surface were then corrected for refraction. Longitude and latitude corrections were minimal and photon depth correction was approximately 25% shallower than the value recorded in the ATL03 data, in line with the calculations of Parrish et al (2019). Bathymetric photons were isolated from the corrected photons using an interactive Python plot and user interpretation. In areas where no seafloor surface was identified, no bathymetric photons were selected. Examples of the refraction corrected photons and selected bathymetric photons are given in Figure 1 (g, h, i). For each ICESat-2 pass only the high-power beams were utilized. When input into the SDB model, only ICESat-2 depth photons with a system assigned "Land Confidence" level of 4 were used. This system assigned value is determined by the sensor based upon the detected signal-to-noise ratio and ranges from 0 to 4 from least to most confident. Based on preliminary models, the use of photons with the highest confidence values alone improved the model accuracy. Photons were aggregated into a Sentinel-2 10-m resolution grid where multiple photons within a grid cell were averaged (Figure 1 d, e f).

Accepted Article
This article is protected by copyright. All rights reserved.

Accepted Article
This article is protected by copyright. All rights reserved.

Sentinel-2 Satellite Derived Bathymetry (SDB)
We developed a novel cloud-native geoprocessing workflow to pre-process and synthesize Sentinel-2 data in order to to estimate and scale up the SDB models. This new cloud-based workflow builds on the pre-processing and SDB estimation developed by Traganos et al, (2018a) and Traganos et al, (2018b). Firstly, an evolved cloud mask was developed that combines the GEE-based Sentinel-2 Cloud Probability dataset, the QA60 band, and metadata information.
Next, a multi-temporal mosaic was derived using four Sentinel-2 bands; B1-coastal aerosol (resampled to 10 m), B2-blue, B3-green, and B4-red, as these wavelengths are less susceptible to light attenuation than longer wavelength bands. Based on our work within both temperate and tropical optical conditions (Poursanidis et al., 2019b;, the use of the first four Sentinel-2 bands yields increased accuracies in comparison to the exclusion of one or more bands. This is particularly pertinent to the first 5 m of depth within which one can observe the minimum light attenuation in the water column. We derived the mosaic using the 20 th percentile from the cloud masked Sentinel-2 reflectance data cube at each region, which relates to the darker values of the entire reflectance range to ensure the reduction of brighter common natural interferences in satellite images over coastal regions, such as sunglint, turbidity, waves, and remaining clouds and haze.
After the initial pre-processing steps, three cloud-based SDB models were derived using the relationship between the multi-temporal satellite image mosaic and the ICESat-2 datasets. We

Accepted Article
This article is protected by copyright. All rights reserved. method, cluster-based regression with Lyzenga (CBL) algorithm, combined CBR and the loglinear regression SDB method developed by Lyzenga et al. (2006). Prior to selecting the relevant bands for modeling, we performed preliminary statistical tests with various combinations of Sentinel-2 bands with this method, and acquired the best accuracy with the multilinear regression of B1, B2, B3, and B4. The second method, cluster-based regression with Stumpf (CBS) algorithm, combined CBR and the log-ratio regression SDB method developed by Stumpf et al. (2003). Our preliminary statistical tests with various cluster-based ratios identified the B3/B2 ratio as the most accurate model(lowest RMSE). The implementation of k-means clustering (Arthur and Vassilvitskii, 2007) prior to the regression in these two methods ensures that by splitting the multi-temporal data into numerous classes, each band adheres to the linear model assumption of homogenous bottom albedo, at each study site. This was particularly beneficial in Bermuda and Biscayne Bay which have variable bottom types. The variable benthic habitats in Bermuda and Biscayne Bay prompted the use of five clusters in the CBL and CBS SDB models.
In contrast, Crete features a predominantly homogeneous sandy seabed, averting us from applying clustering prior to the empirical linear and ratio models. In the third method, SVR, we implemented an Epsilon-SVR with the default GEE parameters and a linear kernel to map the first four log-transformed Sentinel-2 bands to a high-dimensional feature space to fit a regression hyperplane. The relationship between ICESat-2 and Sentinel-2 data, for the best performing model, at each study site is presented in supplementary Figure S1.

Reference Data and Accuracy Assessment
We used three different training/validation approaches for each study site which was driven by the availability and quality of reference bathymetric DEM data. Training and validation sample

Accepted Article
This article is protected by copyright. All rights reserved.
sizes were based on an 80/20 rule with 80% of the sample size used for training and 20% used for validation. For the country of Bermuda, we used six ICESat-2 transects for training (5,173 points) and two ICESat-2 transects for validation-reduced from 2,510 to 1,293 points to fit the 80/20 ratio. Here, the training and validation data were collected on different days, during different passes, which served as separate independent observations. An initial comparison to the NOAA DEM (Bermuda) yielded poor results due to the low temporal and spatial resolution of the DEM (Supplemenatry Figure S2 and Figure S3). For Biscayne Bay, ICESat-2 data was used for training (34,342 points) whereas NOAA DEM data was used for validation, using 8,585 (20%) random-stratified points. Lastly, an amalgamation of training with ICESat-2 data (133 points) and with validation data collected from in-situ singlebeam data (85 points) was used in Crete. As a simplifying assumption, for the specific microtidal areas listed, the uncertainty related to data collected at unknown and different-stages of tide was assumed to be small, albeit non-negligible.
For each validation, we calculated the root mean square error (RMSE), mean absolute error (MAE), coefficient of determination (R 2 ), and the standard deviation of the bathymetry estimation. Residual maps to identify the spatial distribution of the differences between the SDB models and the corresponding NOAA DEMs in Bermuda and Biscayne Bay were also generated (Supplementary Figure S3 and Figure S4).

Results
The CBL method produced the most reliable SDB estimates at all sites with a RMSE of 2.62 m, 0.83 m, and 2.19 m, and MAE of 2 m, 0.65 m, and 2.02 m for Bermuda, Biscayne Bay, and Crete, respectively (Table 1). Among the three methods, SVR tends to underestimate the range

Accepted Article
This article is protected by copyright. All rights reserved.
of the depths, although its error values do not differ much from that of the CBL method. The RMSEs of the CBL models are lower or close to 10% of the maximum depth for the Bermuda (26 m) and Crete (22 m) models, but close to 17% for the Biscayne Bay model where the maximum depth of the modeled area is much lower (5 m). The R 2 of the models for Bermuda, Biscayne Bay, and Crete were 0.68, 0.79, and 0.83, respectively (Figure 2 j-l). The reference and the modeled depths of Bermuda (Figure 2 j) are in good agreement mostly between the depths of 11-17 m, whereas for the shallower Biscayne Bay (Figure 2 k) it is between 1.2-3 m.

Accepted Article
This article is protected by copyright. All rights reserved.

Accepted Article
This article is protected by copyright. All rights reserved.

Accepted Article
This article is protected by copyright. All rights reserved.

Comparison with publicly available DEM data
The detailed Bermuda SDB model picks up the bathymetric relief and rugosity of the shallow and coral reef areas in more detail compared to the NOAA DEM product (Figure 2 d and g). The Bermuda NOAA DEM product is a merge of multiple datasets and is heavily interpolated in regions with sparse sounding data. In addition the DEM is composed of 63 years worth of data and therefore changes are expected to have occurred due to hurricanes, dredging and other anthropogenic changes (Lester et al., 2016;Smith et al., 2013). For these reasons, we observe high residuals between the Bermuda SDB and NOAA DEM in the coral reef complexes ( Figure S4). Photons were able to penetrate the water to these depths , but did not exhibit an obvious surface for training selection.
For the Crete study site, however, the SDB models under predict the deeper depths, specifically greater than a depth of 15 m, where the water is optically deep. Here the relationship between

Accepted Article
This article is protected by copyright. All rights reserved.
ICESat-2 reference depths and the log-transformed Sentinel-2 reflecance values become nonlinear. The reference and the modeled depths of Crete are in good agreement mostly between the depth of 7-15 m. We obtained an available bathymetric map for the Crete study site for qualitative comparison, but not for validation, from EMODNET, with a resolution of ~115 m (Figure 2 i). Compared to the EMODNET bathymetry map, our Crete SDB model is of much higher spatial resolution (10 m), and features a more gradual changes in depth.

Discussion and Conclusions
In this study, we have demonstrated the unique fusion of openly available ICESat-2 and Sentinel-2 data for retrieving openly availale shallow water bathymetry DEMs, from coastline to island nation scales. We developed adaptive bathymetry estimation methods derived solely from space-borne observations over coastal waters in Bermuda, Biscayne Bay, and Crete at highresolution and with low error. The open GEE cloud computing platform provides large computational power and a well-integrated system to process hundreds of multi-temporal images in short time, as well as perform analysis with thousands of data points with ease. The high resolution of Sentinel-2 and ICESat-2 data allows us to map benthic variability in detail, improving upon freely available bathymetry maps, especially within optically shallow water where the reflectance values between different depths are more distinguishable. Errors from L2A and L1C data were comparable, highlighting that computationally expensive atmospheric correction is not needed for accurate SDB retrieval in shallow coastal environments.
Our approach improved upon locally specific and openly available DEM data ExistingDEMdata were composed of multiple datasets collected over a large temporal period at varying resolutions.
Therefore, changes in seabed structure due to hydrodynamic processes (such as sediment

Accepted Article
This article is protected by copyright. All rights reserved.
deposition) and natural catastrophes could have been overlooked. At Biscayne Bay, while our RMSE error was small, some uncertainty can be attributed to differences between the high resolution SDB model and lower temporal and spatial resolution NOAA DEM. However, sources of uncertainty are also recognized in the remotely sensed data. At Biscayne Bay, ICESat-2 was unable to detect the bottom of dredged channels, which may have been caused by sediment (turbidity in active shipping lanes) or the inability of the photons to reliably penetrate to those depths. As the ICESat-2 bathymetric photons were manually selected, photons at these depths could be omitted where they did not form a coherent reflecting surface, however the manual selection of photons is simultaneously a potential source of error due to user interpretation. This error may also be present in Bermuda and Crete, particularly where ICESat-2 was used as both training and validation data. Furthermore, a more general source of uncertainty was the effect of tide level in the analysis. By creating Sentinel-2 composite images using the 20% percentile of tens to hundreds of tiles, we collected the darkest reflectance values, which might not coincide with the depth values acquired by the ICESat-2 platform on a certain acquisition date. The tidal range for our study sites was microtidal (< 1 m) thus the advantages of the approach over the small introduction of error are interpreted to be an acceptable tradeoff.
Our SDB models are capable of contributing to the development of the Blue Economy. Without the requirement for in situ data, repeat bathymetric maps can be created for customized timeperiods. This flexibility is required for monitoring changes in nearshore topography for the purposes of navigation (Mavraeidopoulos et al, 2017), site assessments, post-disaster mobilization and response (Stronko, 2013), infrastructure developments (Coughlan et al, 2020), and for bathymetry and benthic cover mapping in regions where field data acquisitions are scarce

Accepted Article
This article is protected by copyright. All rights reserved.
or prevented by a hazardous environment. Coastal zones will experience a future increase in development and impacts from storm events (Horton et al, 2015) and therefore the need for contemporary and repeat bathymetric observations, particularly for data poor regions, will be critical for ensuring sustainability of coastal resources. This is particularly pertinent for "Big Ocean States" (or "Small Island Nations") which may lack the capacity to carry out bathymetric surveys of their territories (Purkis et al, 2019).
Furthermore, our demonstrated method could enable the development of a global map of coastal submerged ecosystems, which continues to be a critical need of the Blue Economy community.
This would be the foundation of global habitat accounting for currently poorly mapped subaquatic ecosystems as seabed morphology is a usual and helpful parameter in aiding underwater coastal habitat monitoring. Indeed, the need for global distribution maps for seagrasses, a blue carbon ecosystem, has been an issue in coastal ecosystem studies, global conservation efforts and national climate change policy agendas (Unsworth et al, 2019). Moving forward, ensuring vertical reference datum consistency and including tide gauge data or tide models will allow for further characterization of the benthic surface and help to resolve errors across tidal amplitudes.
Time series and change detection assessments using pseudo-invariant features could also decrease spectral variabilities in Sentinel-2 composites. We believe that by further characterizing errors and scaling up our approach, we will be able to contribute one of the key factors in making a global map of seagrass, and other shallow benthic habitats, possible.