Artificial Intelligence and Objective‐Function Methods Can Identify Bankfull River Channel Extents

Bankfull channel extents are of fundamental importance in fluvial geomorphology, to describe the geomorphic character of a river, and to provide a boundary for further processing of morphologic and hydraulic attributes. With ever‐increasing availability of high‐resolution spatial data (e.g., lidar, aerial photography), manual delineation of channel extents is a bottleneck which limits the geomorphic insights that can be gained from that data. To address this limitation, we developed and tested two automated channel delineation methods that define bankfull according to different conceptualisations of bankfull extent: (a) a cross‐sectional method called HydXS that identifies the elevation which maximizes hydraulic depth (cross‐section area/wetted width); and (b) a neural network image segmentation model based on a pretrained model (ResNet‐18), retrained with images derived from a digital elevation model. The cross‐sectional method outperformed the neural network method overall. Its prediction accuracy varied according to channel size and type, with overall precision of 0.87 and recall of 0.80. The neural network method was strongest in larger streams, and outperformed the cross‐sectional method in channel sections with inset benches. A tool to delineate morphological bankfull conditions can allow us to more efficiently implement high‐resolution and large‐scale analyses of channel morphology (e.g., regional hydraulic geometry, channel evolution, physical complexity/habitat surveys), and improve management of fluvial geomorphology and stressors.


10.1029/2023WR035269
2 of 24 Defining the top of bank has long been recognized as a difficult problem in geomorphology, whether it is undertaken in the field or remotely.While hydrological definitions of bankfull flow are commonly used (e.g., the 1.5-year recurrence interval; Leopold et al., 1964), their usefulness and transferability is limited.Williams (1978) found that recurrence intervals of bankfull flow ranged from 3 months to 32 years.Furthermore, hydrological definitions do not provide a useful metric to assess altered flow regimes (Page et al., 2005) or transitory morphological responses such as incised channels (De Rosa et al., 2019;Johnson & Heil, 1996).Definitions based on hydrology and sediment transport, such as effective discharge (Wolman & Miller, 1960), can correspond well to alluvial channel capacity (Andrews, 1980), but these require much more data to assess compared to recurrence interval flows.Morphological definitions are more fundamental and useful, but vary depending on the type of channel, purpose of assessment and data availability.Williams (1978) identified 11 possible definitions of bankfull level, loosely classified into three groups: (a) identification of sedimentary surfaces (benches, floodplains); (b) observations of boundary features (vegetation, particular sediment sizes); and (c) analysis of cross-section geometry.Geomorphologists using the first two of these paradigms in field inspections can achieve highly consistent and accurate results.Field estimates by multiple crews generally fall within around 10% of each other (Roper et al., 2008) but this accuracy requires a great deal of manual labor.Digitizing river boundaries with aerial or satellite imagery (usually using vegetation boundaries) can be highly accurate in large rivers, with horizontal errors of around 1.4-2 m (Donovan et al., 2019;Gurnell et al., 1994).Additionally, automation of methods using water or vegetation boundaries has led to efficient production of river width datasets for large rivers (e.g., Allen & Pavelsky, 2015;Pavelsky & Smith, 2008).However, vegetation boundaries do not always provide reliable indicators of bankfull (Riley, 1972), and imagery alone cannot indicate bank elevation or channel depth.Additionally, methods relying on aerial imagery have limitations where vegetation canopy or fringing vegetation obscures the banks, particularly in small rivers and streams (Priestnall & Aplin, 2006).
Instead, analysis of cross-sectional geometry (i.e., the third group of definitions identified by Williams (1978)) provides the most useful avenue to capitalize on lidar data, requiring only geometry and no interpretation of sedimentation or vegetation (see Table 1).Nonetheless, selection of a single bankfull elevation from cross-sectional geometry has considerable uncertainty (Johnson & Heil, 1996).For example, which depositional surfaces (benches/floodplains) are more relevant to top-of-bank (Williams, 1978)?Should the break of slope or the flat surface be considered the top of bank (Navratil et al., 2006)?Approaches such as defining bankfull level as a reach average (Williams, 1978), directly interrogating three-dimensional geometry (McKean et al., 2009), or using fuzzy logic (Johnson & Heil, 1996) address some of this uncertainty.However, such averaging methods potentially ignore important information about variability in bankfull channel morphology which could be useful to interpret morphologic complexity or channel evolution history and trajectory (e.g., identifying channel incision and recovery).
Objective cross-section methods generally require some manual effort to provide the spatial framework of cross-sections to run these analyses.Once a channel centreline is developed, cross-sections can be automated (e.g., Kalyanapu et al., 2007) but difficulties can arise from intersecting cross-sections, particularly for highly sinuous channels (Bales & Wagner, 2009;Merwade et al., 2008).Fully automated methods which work directly from a two-dimensional digital elevation model therefore have great appeal.For example, image segmentation methods developed for visible or hyperspectral imagery (e.g., Buscombe & Ritchie, 2018;Rowland et al., 2016) could be adapted to ingest lidar data instead.Rowland et al. (2016) used the Genetic Algorithm Genie Pro (Perkins et al., 2005) to segment out river pixels in order to study planform change from multiyear aerial imagery sets.Accuracy of derived widths and erosion rates were found to be high, although accuracy of the channel segmentation itself was not reported, nor were details of the layers or methods used in the algorithm, highlighting the limitations of 'black box'-type software packages.More recently, transfer learning has been applied to retrain convolutional neural networks (CNNs) to classify riverscape images (Buscombe & Ritchie, 2018;Carbonneau et al., 2020).Like all image-based methods, however, these CNNs are unable to delineate the top of bank, and suffer from occlusion of boundaries by vegetation.Demarchi et al. (2016) combined hyperspectral imagery and lidar to delineate fluvial geomorphic units using a multi-level object-based identification algorithm (GEOBIA).They found that classification accuracy based on Kappa (κ) coefficients (Cohen, 1960) using lidar estimates alone were around 60%, but lidar combined with imagery yielded κ as high as 95%.While not explicitly segmenting out channel geometry, this paper shows the potential for machine learning approaches to integrate elevation data to map the active river channel.

Cross-sectional geometry methods
Minimum W/D ratio Wolman (1955) An objective method based on minimizing the width-to-depth ratio.Riley (1972) found discrepancies which varied with channel size and shape.This method worked best for channels with rectangular profiles.Navratil et al. (2006) confirmed this definition was not consistent for meandering streams, indicating the top of the point bar at bends.
First maximum of the bench index Riley (1972) An objective method based on maximizing the change in wetted width across a set increment of depth.
Is most consistent with manual methods when the floodplain is relatively horizontal (Riley, 1972).The first maximum depends greatly on the upper limit of the surveyed cross-section (Navratil et al., 2006).
Break of slope on logarithmic plot of A versus W Williams (1978) "The stage corresponding to a change in the relation of cross-sectional area to top width" Some subjectivity involved in interpreting results (Williams, 1978).

Passalacqua et al. (2012)
Lower of the two localized peaks in slope each side of the centerline.
Applicable to meandering, single-thread rivers (assumes that the depositional bank has lower slope that the cutbank).Passalacqua et al. (2012) reported reach-averaged channel width by this method (55.7 m) exceeded manual field and air photo measurements (37-47 m).70% of the total cross-section analyzed was reported as correct through a visual check, although the evaluation method/metric were not report.

Maximum hydraulic depth (A/W ratio)
De Rosa et al. (2019) The elevation corresponding to the maximum of the hydraulic mean depth (flow area/width) Has not yet been widely tested.De Rosa et al. ( 2019) found best results on a single-channel river, but also found useful results on a wandering disequilibrium river (underestimated mean BF width by 21%).

River bathymetry toolkit McKean et al. (2009)
A detrended digital elevation model is processed to produce a volume-elevation curve from which a break of slope is identified.

Automated raster-based methods
Genie Pro Perkins et al. (2005) Genetic Algorithm for image classification, integrating spectral, texture, and shape inputs.Rowland et al. (2016) applied Genie Pro to derive channel masks for the full extent of a river channel but did not assess its accuracy.Some manual cleaning was required to remove classification errors, trim overhanging trees and clean tributary junctions.Demarchi et al. (2016) Geographic Object-Based Image Analysis using Machine Learning classifiers (random forest; support vector mechanics) using both spectral and topographic data inputs.

GEOBIA
Classification accuracy (Kappa) of ∼0.9 with both spectral and topographic data, compared to 0.6 with only topographic data and 0.8 with only spectral data.

Table 1 Automated and Semi-automated Methods for Identification of Bankfull Elevation Based on Geometry, Relevant to High-Resolution Digital Elevation Models
Despite multiple existing approaches to delineate rivers from remote sensing data, there is currently no method that allows for automated detection of bankfull extent and extraction of bankfull geometry in small, complex and vegetated channels.In this article, we develop and test two near-fully automated methods with different paradigms, that extract bankfull elevation and dimensions from lidar data.We apply and test these approaches in a small river in south-eastern Australia that has a history of degradation and disequilibrium, as well as challenges presented by overhanging vegetation.The first method adopts and improves an objective cross-sectional geometry method developed by De Rosa et al. (2019).This method is based on maximizing hydraulic depth (area/ width) for each cross-section, thus detecting the elevation at which flow starts to spill over the bank.The method includes a smoothing procedure to select the most prominent extreme value and ignore small breaks of slope (e.g., in-channel benches).The second method retrains a CNN using lidar data and its derivatives.Under this approach, a pretrained U-net CNN called ResNet-18 (He et al., 2016) was retrained to classify pixels within the river to using elevation, slope, and other geometry information.In contrast with the first method, the only manual input this method requires is training data, comprising digitized masks of channel extent.The objectives of this study are to test and compare the two methods in a robust and quantitative way, to understand which methods work better in which channel types, and to explore why they perform the way they do.

Study Area
We tested the methods on a 13 km segment of Cardinia Creek, a small stream in Victoria, Australia, and a 3 km segment of its tributary Officer Road Drain (Figure 1).Cardinia Creek is located east of Melbourne and has a catchment area of 132 km 2 .It flows from hills in the north reaching 380 m above sea level (ASL), to Western Port, a large bay to the south.A large reservoir in the north-east of the catchment provides water supply to Melbourne's south-eastern suburbs, and hydrologically alters part of the catchment.Cardinia Creek now flows through the urban area of Beaconsfield, contributing further to channel simplification and enlargement (Anim et al., 2018), and Officer Road Drain drains a rapidly urbanizing catchment.The creek flows all year, with most high flow events occurring from May to November and low flows from December to April.
The catchment has mixed geology with Devonian granodiorite in the northern part of the catchment, mudstone and hornfels in the mid-catchment foothills, and alluvium in the lower catchment.The study segment spans lowland alluvial parts of the stream, with invert elevation from around 6 to 41 m ASL.Officer Road Drain and the most downstream reach of Cardinia Creek are straightened and deepened channels built to drain the Dalmore Swamp for agriculture in the late 1800s and early 1900s (Butcher, 1979).That channelization led to major channel incision upstream in Cardinia Creek, resulting in inset channels in some reaches that have now recovered.
Various drop structures and rock chutes have been constructed along the channel to arrest incision (Wilkinson et al., 2016).The study area was split into five reaches ranging from 2.2 to 4.5 km in length (Figure 1), with similar geomorphic character in each reach ( methods to extract bankfull dimensions on small, complex and vegetated alluvial and constructed streams with varying character and geomorphic legacies.

Data Availability and Preparation
The main data input was aerial lidar survey data, from which river centrelines were confirmed, cross-sections were delineated, images were extracted and ground truth bankfull polygons were annotated.The data preparation steps are summarised in Figure 2. Aerial lidar survey data was available for the study area for 2009-2010and 2017-2018. The 2017-2018 Greater Melbourne lidar dataset (DELWP, 2019) had horizontal accuracy of 0.2 m and vertical accuracy of 0.1 m (1 σ).Improvements in point classification and filtering methods mean this dataset had better definition in vegetated areas than the previous dataset.The 1 m resolution digital elevation model (DEM) was used for this study, after assessing the point cloud and determining that it could not provide additional value.High-resolution aerial imagery was also available, most recently for 2016 (DELWP, 2016).

Centerline and Cross-Section Preparation
Stream centerlines were the main spatial reference that cross-sections and subsequently data points for the HydXS method were derived from.Centerlines based on government mapping corrected with ESRI Arc Hydro  2. 10.1029/2023WR035269 6 of 24 Tools were available (Kunapo et al., 2020) and required further minor manual adjustment prior to generation of cross-sections, to ensure it aligned as closely as possible to the lidar DEM.Centerlines derived from automated methods can give spurious results in urban areas like our study area, where barriers are common.Therefore, manual checks remain important even where automated stream lines have been generated.Station points were generated along the adjusted stream centerlines at 25 m intervals using the ArcGIS function "Generate-PointsAlongLines".At each station point, a cross-section was developed for a total width of 80 m in the direction from left to right bank.A python tool was developed to construct cross-sections which were designed to be as close as possible to perpendicular to the stream network, without intersecting with each other or with meanders in the centerline.Elevation was extracted from the lidar DEM at points at approximately 1 m spacing along each cross-section as the basis for cross-sectional geometry analyses.Elevations were extracted for every point regardless of whether water was present in the channel.Where water was present, the extracted elevation more closely reflects water level and likely overestimates bed level.Bank morphology above the water level should still provide adequate information to identify the maximum hydraulic depth and resulting channel extent.Following elevation data extraction, the lowest point within 5 m of the original centerline point (which in most cases was the same as the original centerline point) for each cross-section was identified as the new center for the cross section, to correct minor inaccuracies in the centerline alignment.

Annotation of Bankfull Extent as Ground Truth Data
Bankfull channel extents were manually delineated for the whole study area to provide data for training and testing of automated methods.Extents were delineated in ArcGIS, using the lidar 1 m digital elevation model and derived products such as hillshade, slope and curvature rasters, as well as aerial imagery to provide context.Polygon extents were not leveled but represented the morphologic top of bank on each side of the channel.These extents were used directly where they were comparable with the expected output of the algorithm (e.g., neural network).Where the automated method produced a leveled bankfull channel delineation (e.g., the De Rosa et al. (2019) model), the annotation was leveled according to the following procedure.Firstly, the cross-sections used for the algorithm were clipped to the manually delineated bankfull polygon.Then, the elevation of each endpoint was extracted from the DEM and the lower endpoint elevation recorded.Then, the cross-section was converted to points at 0.1 m spacing and points with elevation higher than the lower bank elevation were deleted.Finally, the points were converted back to polylines.Note.Sinuosity is defined as stream length/valley length.Slope is calculated between inverts of the upstream and downstream ends of the reach.

Table 2 Geomorphic Properties of Study Reaches
Figure 2 for a summary of the method).Hydraulic depth is defined as the ratio of wetted area to wetted top width of a cross-section.As water level in a typical channel increases, hydraulic depth increases, then reaches a maximum just before it starts to spill over banks onto the floodplain.By creating a table of elevation versus hydraulic depth, the elevation which produces the maximum hydraulic depth can be extracted as an estimate of leveled bankfull elevation.To avoid the influence of minor breaks of slope and in-channel features, the elevation-hydraulic depth curve is smoothed using a spline with parameters generated through the k-fold cross-validation procedure developed by De Rosa et al. ( 2019) before the maximum is detected.
The DeRosa code is available on GitHub (https://github.com/pierluigiderosa/BankFullDetection).The code was constructed as a plugin for GIS, but for the purposes of this research only the two key code segments were used: BankFullDetection.py and spline_withR.py.The input required is a "pointList" of distance-elevation couplets for each cross-section (where distance is measured from the beginning of the cross-section, calculated as the sum of distances between each point; the first data point for each cross-section has distance of 0 m).The key outputs for the purposes of this work are (a) bankfull elevation, and (b) left and right bank locations.We first tested the  2019) model (referred to here as DeRosa), with minor code amendments to update and streamline it (see Supporting Information S1).During testing it became apparent that further development of the DeRosa bankfull calculation model was warranted for our study area (see Results).Therefore, we developed an improved model called HydXS, as detailed in the following sections.

HydXS: Improving the DeRosa Bankfull Method
For HydXS, the core of the DeRosa smoothing and maximizing functions remained unchanged.Amendments were made to address how cross-sections with multiple depressions were handled, and to handle variability in outputs due to randomization in the smoothing function.All changes made to the code, between DeRosa and HydXS, are parameterized and can be turned on and off for comparison.

Allowing for Multi-Polygons in Bankfull Calculation
The DeRosa model presumes that (a) the lowest elevation point in the cross-section is the river channel and (b) the largest area between the calculated bankfull elevation and cross-section is the river channel.However, this is not the case in the Cardinia Creek dataset, where minimum elevations sometimes occur outside the river channel (e.g., adjacent ponds, perched channels).Where there is more than one depression in the cross-section-where, at a given elevation, the polygon bounded by the cross-section has more than one segment (i.e., multi-polygonal)the DeRosa model calculates a bankfull that is too low because the calculation of hydraulic depth is maximized at the base of the shallower of the polygons.
HydXS differs from DeRosa in that the code bases the bankfull calculation on only the polygon containing the minimum cross-section elevation, giving a more accurate bankfull calculation.This modification led to erroneous bankfull calculation in cases where the minimum cross-section elevation is outside the river channel.This is addressed in HydXS by creating a trimming function, which removes left and/or right contiguous segments of the cross-section where the elevation is lower than the minimum in the "river channel".The "river channel" is identified based on the river centerline (an additional HydXS-specific data requirement), and the "river minimum" determined as the lowest elevation within a certain number of data points either side of that centerline (parameterized, default = 10).The trimming function moves from the center line outwards, and if at any point the elevation drops below this "river minimum" the remaining edge of the cross-section is removed.

Identifying the Right and Left Bank Locations
After the bankfull elevation is calculated, DeRosa identifies intersections between the bankfull elevation line and the cross-section line.It calculates cross-sectional areas under the bankfull elevation line and sets the left and right bank location at the limits of the largest area.If the largest area is outside the river channel (e.g., perched channels), DeRosa identifies the incorrect bank locations.HydXS differs from DeRosa in that the code now sets the left and right bank locations based on the polygon containing the minimum elevation in the cross-section (after trimming described above); that is, the river channel.

Recalculating if Bankfull Intersects With Left or Right Data Boundaries
For some cross-sections, it is possible for the bankfull elevation line to intersect with only one point on the cross-section line (where the opposite side of the cross-section line remains below the bankfull elevation due to poor placement or poor definition of the top of bank).DeRosa allows bankfull elevation to be calculated in such cases, and returns one of the bank locations at the left or right data boundary.This appeared inappropriate as it presumes to extend to data not actually available, resulting in a possibly incorrect bankfull calculation.HydXS handles such anomalies by checking the bankfull calculation, and re-running the calculation if the left or right boundary is intersected.The calculation is attempted a parameterized number of times (default = 3), and if a valid bankfull calculation is still not returned, then that cross-section is marked as being unable to calculate (i.e., bankfull output value is left blank) and the code moves to the next cross-section.

Handling Variability of Bankfull Calculations
The calculation of bankfull is dependent on a smoothing parameter generated through a k-fold cross-validation procedure which includes some inherent randomness (De Rosa et al., 2019).The more complex a cross-section, the more variability there is across the bankfull calculations runs (with different smoothing parameters).Only the most straightforward cross-section will have the same bankfull calculation for any value of smoothing parameter.
HydXS handles this by applying the bankfull calculation several times to each cross-section (e.g., 11 runs), then 10.1029/2023WR035269 9 of 24 comparing the bankfull calculations for all runs to determine the "most likely" bankfull (using mode or 'binning'; detailed further in Section 2.3.3) for comparison to the annotated ground truth.

Testing DeRosa and HydXS on Study Area
Bankfull extent was estimated using DeRosa and HydXS for the study area, and each was compared to the annotated ground truth to determine and compare their accuracy.Two of the DeRosa default parameters were altered for the study area.Firstly, DeRosa has a default minimum hydraulic depth of 1.0 m; for HydXS, we set the default at 0.2 m, as Cardinia Creek is a smaller river system than the rivers DeRosa was based on.Secondly, DeRosa adds an extension to the top of the cross-section for creating a polygon, set at 1.0 m.The smaller this parameter, the less smoothing is applied, and thus the more likely the bankfull calculation is to falsely identify benches and minor convex kinks as top of banks.We retained the value of 1.0 m after testing values down to 0.2 m.
The final bankfull elevation and top-of-bank points were determined by comparing calculations from 11 runs.The mode of bankfull elevation was used if more than two-thirds of runs fell within 0.05 m (parameterized) of the mode; then left and right bank were determined based on the mode.If not, bankfull results were allocated into three bins (using Python's binning function) and the average of the most populated bin was used; left and right banks were then interpolated.Approximately 7% of the cross-sections used binning to determine bankfull; that is, these are the most complex and variable of the cross-sections.

Neural Network Image Segmentation and Transfer Learning
We used a deep CNN as an alternative methodology to delineate channel boundaries via image segmentation of lidar-based images.Neural networks are mathematical algorithms that mimic the interactions between neurons in a brain.They can be constructed to solve a variety of predictive problems in science, technology, and business applications.CNNs are a type of neural network specifically suited to the classification and segmentation of image data into different object classes.Image segmentation is most useful for this study as it classifies each pixel of an image as a different object, and therefore is ideal for classifying a lidar-derived image into pixels that fall within and outside the bankfull channel.Training a new model by choosing the right architecture and training the model from random weights requires a large training set of thousands to tens of thousands of images (Tajbakhsh et al., 2016), which is not often achievable in many settings.However, transfer learning is a growing subset of machine learning that takes models that have been trained on large datasets, and retrains them on new datasets, with new categories (Pan & Yang, 2009).This process uses the existing architecture, but retrains the weights in the top "aggregating" layer and one or more neural network layers in order to predict new classes on a new dataset.Pretrained CNNs can be re-trained on new categories with much fewer training samples (typically 50 or more) by capitalizing on the pretrained CNN's ability to recognize base features of an image, such as shapes, edges, and textures.
One other study that has attempted to use CNN image segmentation on lidar data used a Mask CNN (Maxwell et al., 2020), which both performs both pixel segmentation and object localization to define the boundary of coal mine tailings deposits.Because we do not need to localize individual river objects, we use U-net CNN architecture (Ronneberger et al., 2015) pre-trained on the Imagenet dataset (i.e., ResNet-18;He et al., 2016) as our starting point for transfer learning.The U-net architecture has been shown to be effective for the image segmentation of medical imagery (Siddique et al., 2021), and we expect it will be retrainable for imagery derived from lidar data.

Imageset Construction
In preparing a lidar dataset for CNN-based image segmentation, the clipping and masking of the images, as well as the derived data from the lidar DEM can be adjusted to fit the individual problem and allow for the most Following Ghorbanzadeh et al. (2019), where the smallest image sizes resulted in the best segmentation of landslide scars, we decided to use the smallest images allowable for our 1 m DEM without resampling to a smaller pixel size.Therefore, we created a regular grid of 224 m by 224 m image boundaries, to satisfy the CNN U-net requirement that images be 224 by 224 pixels (Figure 3).This grid contained many images that did not intersect with the river corridor or the valley bottom, which would overbalance the "not in river" class, so we filtered out images that did not overlay a 50 m river centerline buffer to reduce the proportion of training images without any river pixels in them.
We derived several different raster layers to use as pseudo-RGB channels in our input (Table 3) including regionally normalized elevation (DZ ds ), where we normalized lidar elevations using the maximum and minimum elevation in the overall lidar dataset.To accentuate local features on an image-by-image basis, we also calculated image-normalized elevation, with a minimum relief (DZ L-MR ) This function normalizes the image elevation data to the minimum and maximum elevations within the image.It also applies an extra rule for minimum relief, to prevent relatively flat sections of the lidar dataset from accentuating small depressions, thereby preventing false positives in floodplain images where the river isn't present.In addition to normalized elevation, several other parameters were used as input image channels to potentially accentuate the river boundaries.To accentuate the depressions of a river, we used up-facing topographic openness (Op) created from SAGA GIS open-source tools (Yokoyama et al., 2002).Additionally, slope (S) and Lagrangian curvature (Cu) were calculated from the DEM.Lagrangian curvature was included to capture inflection points at the tops of banks and bank slopes, as well as to differentiate constructed embankments from the river.Additionally, an ArcGIS Spatial Analyst derived hillshade was used as an input to accentuate slopes and slope breaks (HS).Finally, an empty layer of zeros (0) was used as a no-data channel to substitute for a data channel, in order to run the CNN with fewer than three data channels.where Z is the elevation at a given pixel, pixel Z ds,max is the maximum elevation in the dataset, and Z ds,min is the minimum elevation in the dataset.

DZ L-MR
Image-normalized elevation with minimum relief where Z is the elevation at a given pixel Z im,max is the maximum elevation in the image, and Z im,min is the minimum elevation.A minimum relief, MR is enforced using the following logic: Op Up-facing topographic openness Smallest vertical angle that can be created between two elevation points along a straight-line diameter centered at the pixel (Yokoyama et al., 2002).The algorithm calculates the smallest angle for 8 cardinal directions (N, NE, E, SE, S, SW, W, NW) and takes the mean for each pixel.The maximum horizontal distance is set by a radius attribute, and we used a radius of 100 m to emphasize river-corridor-scale effects.

S Slope
Calculated from the DEM using NumPy's gradient function, and the Pythagorean theorem to combine the northing and easting components of the slope.Using the slope magnitude produces a one-sided bell curve distribution, whereas typically a Gaussian distribution of values for each channel of an input image yields the best results for a CNN.Therefore, we introduced positive and negative slopes based on the compass direction (slopes north and west were negative).

Table 3
Raster Layers Used as Pseudo-RGB Channels for Input to CNN These layers were combined in 14 different combinations to assess the best inputs for CNN image segmentation of the bank full river channel.In the results, we label the assembled imagesets with the symbols for the three channels separated by underscores.For example, DZ L-4 _Op_0, represents the imageset where the image-normalized DEM with a minimum relief of four is the first channel, the topographic openness is the second channel, and a no-data layer of zeros is the third channel.A list of all the imagesets run can be found in Figure 5 and

Image Augmentation
To effectively retrain a neural network on a small number of samples, and prevent the model from overtraining, or effectively memorizing the training set, image augmentation was performed on each training image before being input into the model.Image augmentation applies random transformations and random noise to the image before predicting on the image during the training loop.This way, the model never sees the exact same image twice during the training loops, aiding in producing a more generalizable model with less data (Perez & Wang, 2017;Shorten & Khoshgoftaar, 2019).
We augmented the training data (Figure 3) using processes that change the image but maintain realistic but altered riverscape patterns, such as: • Vertical and horizontal flipping: images had a 50% chance of being randomly flipped vertically and horizontally.• Random zooming: images were randomly zoomed in or out of the image by 20%.Empty space was filled with zeros.• Warping: images were perspective-warped randomly to a magnitude of 50%.
• Random rotations: images were randomly rotated as much as 180°.
• Random erasing: small square regions in the image were randomly erased and filled with noise.
• Random Gaussian noise: random amounts of noise were added to the dataset from a Gaussian distribution with mean of 0 and standard deviation of 0.3.

CNN Training
Image segmentation CNN's are trained by using the model to predict pixel by pixel classifications on small batches of the training images at a time, checking the predicted pixels against the actual ground truth masks by calculating a loss metric, and adjusting the weights within all or a subset of "unfrozen" layers of the CNN model according to the loss function and a learning rate, and repeats this until all the training images have been fed through the model.Typically, multiple passes through training dataset are performed in a loop, and each pass is called an epoch.
We trained a ResNet-18 (He et al., 2016) based U-net neural network from pretrained imageset model weights using the Fastai python package on a Tesla A100 GPU.We used a weighted combination of Focal Loss (Lin et al., 2017) and Soft Dice Loss (Wang et al., 2020), which are preferred loss functions for image segmentation tasks with unbalanced classes, given that in this case there are many more out-of-river pixels than in-river pixels.
We used a one-cycle-policy (Smith, 2017) to vary the learning rate during training, and the maximum learning rate was found via a grid search (further details are provided in the Supporting Information S1).

Metrics for Assessing and Comparing the Performance of Methods
Other studies attempting to assess the accuracy of bankfull or geomorphic landform prediction have used a variety of methodologies.For cross sectional methods, accuracy calculations include expert assessment (Passalacqua et al., 2012), and percent error in predicted bankfull widths (De Rosa et al., 2019).2D landform mapping methodologies usually assess accuracy with Kappa coefficient (κ), as well as Users and Producers accuracy (Ua, Pa), although alternative metrics have been proposed (Orlandini & Moretti, 2009).Users and Producers accuracy are identical to precision and recall, widely used for assessment of model predictions.They are computed from the number of True Positive (TP), False Positive (FP), and False Negative (FN) pixels for the "in river" category, according to the following equations: (1) We report these values along with Dice coefficients for both models.The Dice coefficient is a balanced accuracy metric identical to the F1 Score and is the harmonic mean of precision and recall, calculated according to: (3) Kappa (κ) coefficient (Cohen, 1960) is a metric suited to assessing inter-rater agreement for nominal scales, but which has also been widely used to assess classification accuracy, taking into account the expected accuracy of a random classifier.κ is sensitive to imbalanced data which is common in river classification problems.There are likely to be many more non-river than river pixels in an image, and the degree of imbalance depends on the width of the corridor under consideration.Dice coefficient is considered a better metric for our purposes than κ because it is insensitive to the number of true negatives and thus to the width of the river corridor being considered.We computed κ for comparison to other studies which reported κ but not Dice, precision or recall, but did not use κ in comparing or evaluating our own models.
These metrics were calculated differently when assessing the performance of the DeRosa and HydXS cross-sectional results, and the CNN results.For the CNN results, we counted each pixel that was a TP, FP, or FN within the test regions that lay within 50 m of the river centerline.TP and FN are insensitive to the width of the corridor, as long as it contains the whole extent of the river, although FP can increase with corridor width, affecting Precision and Dice if other features such as dams, paleochannels and drains are incorrectly classified as river pixels.Therefore, the corridor width was selected to be as narrow as possible while containing the whole expected width of the river.A comparison of model performance using images cropped to a 50 m river corridor versus whole images is provided in Table S2 in Supporting Information S1.We also provide in the Supporting Information a comparison of training, validation and test set Dice coefficients (Table S3 in Supporting Information S1) and an example of the training and validation loss profile over the course of a model run (Figure S1 in Supporting Information S1) to indicate the generalization capabilities of the model.
To calculate accuracy metrics for one-dimensional cross-sectional bankfull extent, the number of cross section points that were TP, FP, and FN were aggregated by overlaying the predicted bankfull extent and the ground truth at each meter along each cross section, then computing precision, recall and Dice coefficient.For assessment of the cross-sectional models, these metrics were computed for the whole study area (as there was no test-training split).However, for comparison to the CNN results, the metrics were recomputed for only the neural network test areas to allow direct comparison.Additional metrics were used to assess the bankfull elevation predicted by the cross-sectional methods, including mean error, mean absolute error, and root-mean-squared error (RMSE) of predicted versus ground truth bankfull elevations.

Computational Robustness
DeRosa and HydXS were successfully applied to the Cardinia Creek study area, converging on a solution for the vast majority of cross-sections.Around 2%-4% of cross-sections repeatedly returned errors and were therefore excluded from the comparison analysis.This was usually due to cross-section geometry which did not have clear riverbanks.Such cross-sections occurred due to lidar error (e.g., interpolation across the channel where point density was not adequate), poor alignment of the centerline, or automated extraction of cross-sections at points on the centerline where a river channel did not exist (e.g., where it flows underground under road crossings).Error return in such cases is considered an appropriate outcome, prompting the user to check the input data and verify that a result is not expected.

Bankfull Elevation
DeRosa modeled bankfull elevation was within 0.1 m of ground truth for 27% of the cross-sections in the study area, and within 0.1-0.5 m of ground truth for a further 47%.The remaining 26% diverged by more than 0.5 m.DeRosa diverged most noticeably from ground truth in reaches 4 and 5 (Figure 4).DeRosa tended to underpredict the ground truth bankfull elevation by a mean value of 0.46 m (Table 4; Table S1 in Supporting Information S1).
HydXS bankfull elevation differed from DeRosa for 85% of cross-sections, with 56% being more than 0.1 m different.HydXS calculated a higher bankfull elevation than DeRosa for more than two-thirds of the cross-sections calculated (414 of 629).HydXS still tended to underpredict the ground truth bankfull elevation, but only by a mean value of 0.35 m.HydXS tended to predict bankfull elevations closer to ground truth than DeRosa did.The improvements of HydXS were particularly beneficial for reach 2 (Figure 4), though reach 4 challenged both DeRosa and HydXS, and reach 5 was only marginally improved by HydXS.

Bankfull Channel Extents
The original DeRosa code predicted bankfull channel extents with precision ranging from 0.36 to 0.91 and recall ranging from 0.63 to 0.84 across the study reaches (Table 4; Table S1 in Supporting Information S1).Accuracy varied depending on reach characteristics, and tended to be higher for larger channels.DeRosa appeared useful for predicting channel extent in reaches 4 and 5, though less so for others.
Running the HydXS model over the study area cross-sections gave an improved result compared to the original DeRosa code.River reaches 1, 2, and 3 particularly benefited, while reaches 4 and 5 had HydXS precision and recall similar to the original DeRosa code.DeRosa tended to underestimate bankfull elevation in reaches 1, 2, and 3, indicating that it was picking up a lower maximum hydraulic depth due to the presence of perched channels or adjacent paleochannels.The improvements to the treatment of multiple and perched channels implemented in HydXS were able to rectify these problems to some extent.In reach 5, inset floodplains and benches were particularly prevalent, and reach 4 had some lidar artifacts (steep in-river channels caused by infilling lidar gaps  2. with a constant value) which resulted in false identification of the top of bank and hence poor recall which could not be remedied without correcting the data or substantially altering the smoothing method.

Key Results
Losses of both test and validation set decreased with epoch until they stabilized, and no second peak in losses of the validation set occurred, suggesting that there was no overfitting occurring during training.Detailed results can be found in the Supporting Information, including an example plot of loss versus epoch (Figure S1 in Supporting Information S1).Depending on the imageset chosen, the performance of the neural network varied widely (Figure 5), with Dice coefficients as low as 0.35-0.6 when using slope (S_0_0) or hillshades (HS_0_0) to over 0.82 for image normalized elevations with a minimum relief of 2 m ( −2 _0_0).Although the optimal minimum relief was 2 m, minimum relief set at 0, 3, 4 and 1 m all exhibited accurate results (Dice coefficients >0.8) within the river corridor, defined as the region within a 50-m buffer of the river centerline.The image-normalized elevations performed significantly better than dataset-normalized elevations (DZ ds , Dice = 0.72), indicating that the amplification of local relief by local normalization creates a more consistent image for the neural network to identify the correct riverine extent.
One surprising finding is that using multiple derived input layers to create the training images did not appear to improve model performance, with the top five performing imagesets only containing locally normalized relief.
The addition of additional layers, such as topographic openness, curvature and slope result in poorer performance than elevation alone.For example, combining DZ L-4 with topographic openness resulted in a decrease in Dice coefficient of 0.02, and adding curvature decreased the Dice coefficient a further 0.2.This shows that choosing the correct lidar-derived input and not providing extraneous information can considerably improve the predictive capability of the CNN.

Comparison of Neural Network and HydXS Results
We compared the two methodologies by overlaying the predicted river corridor extents from the CNN on the cross sections used for HydXS, to assess the relative performance of the two methods.We did so by overlaying the pixel level predictions from the CNN model on top of the cross section points, the comparing the results between HydXS and the CNN model for each cross section.This cross-sectional comparison was chosen to compare equivalent metrics; comparing a cross sectional Dice coefficient to a Dice Coefficient calculated by the number of pixels overlapping would not be a like-for-like comparison of metrics.We report the cross-sectional Dice coefficient comparisons using the cross sections in the test regions of each reach (Figure 6).
We found HydXS outperformed the best CNN imageset ( −2 _0_0) with a difference in overall Dice coefficient of 0.07 (see Table S4 in Supporting Information S1).However, in certain situations, the neural network performed comparably or outperformed HydXS.HydXS typically performed much better further upstream, and Note.The mean error (ME), mean absolute error (MAE), and root-mean-squared error (RMSE) are given for each reach.The precision, recall and Dice coefficient for bankfull extent are also given for each reach.Cross-sections for which a bankfull elevation could not be calculated (10 for DeRosa and 24 for HydXS) are excluded from the statistics.Reach details are given in Table 2.

Table 4
Bankfull Elevation and Extent Comparisons Between DeRosa and Annotated Ground Truth, and HydXS and Annotated Ground Truth in the ORD tributary (Figure 6), whereas the CNN method performed better in the two most-downstream reaches (RB to ORD confluence, and downstream of ORD).Both algorithms developed in this study improved on the DeRosa algorithm, with the CNN method improving on DeRosa by 5% and HydXS improving on DeRosa by 12% overall.
This variable performance from reach to reach reflects in the differences in channel form between reaches (Figure 7).In cross sections with inset benches or floodplains (Figure 6b), the CNN outperformed the HydXS method, because the HydXS algorithm identified the peak in hydraulic depth at the inset bench, and therefore was likely to choose that as bankfull rather than the bankfull floodplain delineated by the expert annotations.However, HydXS performed better with multiple channels, as well as normal single thread cross sectional morphologies than the CNN model.The difference between predictions of the HydXS method and the neural network method are plotted on a hillshade in Figure 7.

Discussion
We tested three algorithms for automatically delineating bankfull river channel boundaries from high-resolution airborne lidar DEMs.We found that HydXS performed most accurately on the test set and on the whole dataset.The adjustments we made on previous objective-based models allowed application to more complex channels, including multiple depressions in the cross section (e.g., fringing/constructed wetlands, perched channels), and improved robustness of the results (by preventing bankfull extents that reached the cross-section edge, and handling variability between runs).These amendments improved accuracy in prediction of bankfull extent over the DeRosa formulation of the model.In addition to the objective model, we used transfer learning to retrain a U-net CNN on several combinations of derived lidar outputs.We found that the choice of inputs was important in accurately delineating the channel boundary, and that image-normalized elevation with a minimum relief of 2 m provided the most accurate bankfull extent results of all combinations tested.
One of the advantages of using transfer learning and image augmentation is the reduced requirements for training data versus training a neural network from randomized weights or a new architecture (Tajbakhsh et al., 2016).We found that it took less than 13 km of river length to create a large enough dataset (101 images) to create a useful image segmentation model.However, it is likely that a more generalizable model will require more training data from a greater diversity of stream types.While we cannot prescribe a target number of images or length of river to create an accurate model, we suggest that the amount of training data needed will be a factor of the image chip size, complexity of the river in question, and the targeted generalizability of the model.
Both of the new methods presented here (HydXS and CNN) performed better than the original DeRosa algorithm, and both new methods yielded comparable results to other studies using machine learning to automatically  2.
map and identify landforms (Demarchi et al., 2016;Ghorbanzadeh et al., 2019;Maxwell et al., 2020).HydXS performed better on average than the CNN, except in reaches where inset floodplain features were dominant.Given this dependence on the morphologic setting, we do not expect HydXS to consistently outperform CNN in new areas.Choosing an appropriate method requires an understanding of channel types and forms being considered and the theoretical underpinnings of different methods as well as validation efforts in new areas.Alternatively, a multi-method approach can also provide a weight of evidence where complex or variable channel forms exist.

Comparison of HydXS to Previous Work
HydXS significantly improved results from De Rosa et al. ( 2019), increasing the Dice coefficient for the whole dataset from 71% to 83%.The biggest gains in accuracy occurred in the upstream reaches which exhibited shallower streams with some multithread sections, and presence of anthropogenic structures (artificial berms and settling basins).De Rosa et al. ( 2019)'s implementation of their algorithm on an unstable reach of the Paglia River resulted in poor accuracy, and we found similarly poor accuracy for an incised river in Australia.HydXS has improved the method's ability to identify channel boundaries in channels with complex morphology.This shows that correctly accounting for multichannel features, aggregating repeat trials and preventing the algorithm from choosing the boundary of the cross section as bankfull substantially increase the ability of the algorithm to accurately predict bankfull elevation and extent in more complex channel morphologies.

Comparison of CNN to Previous Work
Various neural network and machine learning methods have been built to either delineate the channel boundary (Demarchi et al., 2016) or other geomorphic features (Ghorbanzadeh et al., 2019;Maxwell et al., 2020).Our neural network produced comparable results with previous work which used only lidar data (Maxwell et al., 2020), or combinations of lidar and aerial imagery (Demarchi et al., 2016;Ghorbanzadeh et al., 2019).2.
One other study has attempted to use transfer learning from another image segmentation neural network to map features using lidar alone (Maxwell et al., 2020).The methodologies used by Maxwell et al. (2020) are the most similar CNN pipeline to our method, though they used a mask R-CNN feeding in greyscale lidar DEM slope maps instead of a U-net architecture.Though Maxwell et al. (2020) used over 600 images, they only got marginally better image segmentation results than our study (Dice coefficient of 0.81 for image segmentation of the test region) to which our best performance of a 0.78 Dice coefficient is similar.It is surprising that slope-derived datasets did not improve the performance for bankfull in this study, when they worked well for identifying artificial valley fills.This is likely due to the fact that slopes which characterize riverbank features can result in falsely identifying other steep slope escarpments as in-river (e.g., artificial berms and road fill).By contrast, the slope patterns of artificial valley fills are distinctive when compared to natural surrounding hillslopes, and thus accentuated the objects for segmentation.This shows that it is important to choose the inputs that fit the segmentation problem to maximize the performance of an image segmentation CNN for geomorphic landform extraction.
Our transfer learning approach yielded more accurate results than other machine learning methods designed to delineate channel landforms with lidar data (Demarchi et al., 2016).Demarchi et al. (2016) trained multi-hierarchical object identification models on 21 km of river, and found that they were only able to achieve validation κ coefficients of ∼0.6 when categorizing objects as "active channel" using lidar data alone for object categorization.Our method is more accurate (Highest κ was 0.76-see Table S2 in Supporting Information S1) though Demarchi et al. (2016) achieved higher accuracy when using both aerial imagery and lidar slope maps as inputs.For the first step of their model, an off-the-shelf unsupervised object detection delineated set objects using combined lidar and aerial imagery followed by a model to categorize each object.The fact that our model produced similar results with similar amounts of training data, while only using a lidar derived DEM, suggests that transfer learning has more potential than Support Vector Machines and unsupervised object detection to identify the extents of different geomorphic features, due to the translatability of ImageNet trained neural networks.
Like Demarchi et al. (2016), Ghorbanzadeh et al. (2019) used a combination of spectral imagery and DEM-derived topographic layers as inputs into a landslide identification model using CNN as well as other machine learning methodologies.They found that using topographic layers did not improve the model, and the best performance was using a 16-layer CNN on five spectral raster layers (Dice coefficient of 0.87), whereas including an additional three topographic layers lowered the score to 0.82.This differs from the other studies here in that the authors created their own neural network model architecture using randomised weights as a starting point, instead of applying transfer learning to a predetermined model architecture with weights pre-trained on a large dataset.They could do that because they had a larger dataset (∼3,500 images).Despite the larger training set, the addition of topographic channels to the neural network inputs may have increased the complexity of the problem past what the dataset could resolve.Additionally, the resolution of the topographic data (5 m) may not have been fine enough to express the topographic signatures of landslide scars.Comparing our results to this study further illustrates the potential of transfer learning to adapt to lidar-derived imagery, as our Dice coefficients approached the best results with topography reported by Ghorbanzadeh et al. (2019), with two orders of magnitude fewer image tiles used for training.
Li et al. ( 2020) compared random forest and CNN approaches to map loess hills and loess ridges using 30 m DEM data and satellite imagery.Like our study, Li et al. (2020) showed that image segmentation with U-net architectures is viable with a small sample set (150 images combined for training, validation and testing) using DEM-derived outputs.However, the authors were not able to achieve the accuracies of our study.The overall pixel accuracy they obtained was 91% for loess hills and 83% for loess ridges, which is lower than the best CNN run from our study (overall pixel accuracy = 97%).It is important to note, however, that pixel accuracy is not the best measure of accuracy in our case, as our dataset contains a much larger number of pixels outside of the river than in the river.Li et al. (2020) found that the best performing input image contained a layer that was a grayscale satellite image, and a layer that was the 30 m DEM, while adding aspect and slope did not improve accuracy.We too found that adding additional DEM-derived layers beyond elevation alone (e.g., slope, curvature, openness) did not improve accuracy.
In the context of the growing number of studies of automated geomorphic mapping using machine learning, our results illustrate that transfer learning of pretrained U-net architectures is a valid approach, resulting in comparable accuracies to models trained on larger datasets (Ghorbanzadeh et al., 2019;Li et al., 2020;Maxwell et al., 2020), other CNN architectures (Ghorbanzadeh et al., 2019;Maxwell et al., 2020), and additional spectral inputs (Demarchi et al., 2016;Ghorbanzadeh et al., 2019;Li et al., 2020).The results also suggest that fewer input layers may result in better results, as greater numbers of input channels require an even greater amount of data to resolve in a robust CNN predictor.By comparing our results to Maxwell et al. (2020), andLi et al. (2020), we further illustrate how the types of derived DEM inputs have a strong influence on effectiveness of the CNN learner, suggesting that thinking about the topographic signature of the landform, as well as possible falsely identified landforms can aid in the selection of the right input layers for the segmentation task.

Future Improvements of HydXS
The HydXS bankfull delineation algorithm is highly accurate for our study area, which includes both alluvial and engineered channels ranging from 7.5 to 26 m in width.Dice coefficients approached 90% in some reaches, approaching the expected level of agreement between independent expert field crews (Roper et al., 2008).Geomorphologists make choices when delineating bankfull extent, and hold biases based on the fluvial contexts they are most familiar with.Therefore, achieving on-paper accuracy higher than around 90% with a single expert annotator is probably not indicative of higher real-world accuracy.De Rosa et al. ( 2019) successfully tested the previous version of the model on the Tiber River, a larger and simpler stream than our study stream.Generalizability can be further confirmed by future testing in new river contexts including smaller streams and semi-alluvial streams.We expect to find limits to generalizability in confined streams which have little to no floodplain which would be detectable as a maximum hydraulic depth.
Future improvements could attempt to address the problem of misidentification of inset benches/floodplains (see Section 4.1.5).This could be done by adjusting the smoothing function, but care would need to be taken not to oversmooth and fail to detect real riverbanks.An adjustable smoothing function would allow the user to decide on an appropriate level of smoothing for the local context and the desired treatment of inset channels.In our study area, lidar artifacts produced artificially steep-banked inset channels.Adjusting the smoothing function to a point where it could ignore such apparently sharp breaks of slope would be unlikely to give satisfactory results across the whole system.In this case, improvements could be achieved by filtering out lidar artifacts (e.g., by inspecting point clouds and detecting large gaps in ground-strike points).

Limitations and Potential Improvements of CNN Methodologies
Notwithstanding the high accuracies we achieved with a small training dataset, our CNN pipeline can be improved in several ways.The accuracy and generalizability of a neural network model is very much dependent on the amount of input data, and the representativeness of the input images as a sample of all possible river systems.Therefore, it is possible that regionally trained neural network river segmentation models may not work in other fluvial contexts.However, one advantage of using lidar data only is that one can make disparate parts of the river corridor appear quite similar by scaling the elevations and the size of the images (in map units) according to the potential size of the river.Our goal was a "naïve" river segmentation algorithm, that could be used on any piece of lidar DEM without any other a priori knowledge.However, our results show that the CNN was better at identifying larger deeper rivers in the downstream-most reaches.This provides an opportunity to scale the input image according to the possible width and depth of the river to make the CNN model better at identifying the river boundary.The approximate width and depth can be assessed manually or using regional hydraulic geometry relations which relate river dimensions to catchment area.Future model iterations could scale the image size and minimum relief with the catchment size calculated from a flow accumulation raster.This might create a more generalizable river segmentation model.This type of dynamically scaled lidar derived imagery may also be useful for other forms of geomorphic landform identification.
One other possible improvement that effectively makes the model less "naïve" would be the centering of all images on the river centerline, which could be constructed from an accumulation raster, as a way to center the image on the river, and limit edge effects.Identifying a small portion of river on the edge of an image proved problematic (see Figure 7, reach 5 for an example).Better accuracy may be possible when the model is trained on a subset of lidar images that are expected to have active channel in them (i.e., images in the river corridor).
Though a less generalizable model, it could yield more accurate results.
The CNN model had issues with misidentifying artificial features, such as berms and retention ponds as "in river".Therefore, future work could train the CNN to label these features as artificial structures (which would require a larger training set), or filter them out (Passalacqua et al., 2012) to improve model accuracy.
We also suggest and reiterate suggestions of other authors (Li et al., 2020;Maxwell et al., 2020) that a global database of publicly available imagery and annotation data for river boundaries and other geomorphic features be collated, as well as a registry for models and model weights and architectures.We also advocate for standardization in the reporting of data and accuracy statistics.Reporting pixel accuracy (e.g., Li et al., 2020) is a potentially misleading statistic for most landform segmentation tasks, in that correctly identifying pixels that aren't the target landform will occur quite often, as most landforms will take only a small portion of the target area.Therefore, we suggest using Dice coefficient (F1 score) which is the standard used by image segmentation specialists in the Data Science field, as it is sensitive to the correct overlaying of the predicted and the ground truth mask.We also suggest reporting the number of TP, FP, TN and FN to allow others to assess accuracy directly and compute new error statistics.

Influence of Inset Benches/Floodplains on Predictions of Bankfull
Though much improved on previous formulations, the one complex channel morphology that HydXS struggled to correctly identify was channel boundaries with the presence of inset channel benches or floodplains, which were falsely identified as the bankfull boundary.This systematic error occurs because these features create extrema in the hydraulic depth versus discharge curve, and are therefore the first peak to be identified by the algorithm.This error highlights one of the confounding issues with bankfull identification via objective function maximization.Like HydXS, other methodologies typically assume the lowest elevation identifiable is the bankfull point (De Rosa et al., 2019;Passalacqua et al., 2012).Passalacqua et al. (2012) argued that this assumption is justified for meandering rivers as the lower bank level is typically the top of the point bar side of the meandering river (Lauer & Parker, 2008).However, in Australian rivers, inset floodplain and bench features are very common (Croke et al., 2013(Croke et al., , 2016;;Daley et al., 2017;Erskine et al., 2012;Vietz et al., 2012), and in some river systems, there is an active debate as to whether or not the inset floodplains should be considered bankfull, as they are often a result of the widening and aggradation of a river responding to a wave of incision (Daley et al., 2017;Hupp & Simon, 1991;Schumm, 1973).However, in some systems, inset depositional "benches" are often formed at below bankfull discharge, as a result of local hydraulic patterns (Croke et al., 2013;Nanson & Page, 1983;Vietz et al., 2012).
The fact that the CNN method outperformed the HydXS method where there were inset floodplains/benches shows some promise in the treatment of inset features when mapping bankfull extents.This is likely due to the fact that U-net CNN architectures look at the neighborhood of pixels and the layer values together when constructing the probabilities of a given class, and therefore is natively "spatially aware" when constructing the river boundaries.This means that the CNN can take into account upstream and downstream parts of the river where there may not be an inset bench, when classifying a section of the river with an inset bench, and is therefore able to choose the right river boundary.Conversely, cross sectional methods typically isolate and attempt to find bankfull elevation at the individual cross section, without any information from nearby cross sections.This comparison illustrates the great potential that CNNs have in automatically mapping geomorphic features by both examining the topographic attributes that are input as the image, and the spatial context in which these features lie.

Conclusions
We developed and tested two computational methods that can help automatically detect river boundaries from lidar data.These methods have great utility in areas where aerial imagery cannot readily be used to detect river boundaries due to overhanging vegetation, and where rapid and automated delineation is required.By directly using lidar data, these methods calculate not only channel widths, but bankfull elevations and depths, as well as allowing a range of further terrain analyses to be done directly on bankfull channel cross-sections or polygons.The two methods are derived from distinct paradigms, and each have particular strengths.The objective-based method that identifies bankfull elevation from cross-section geometry data was particularly strong in delineating smaller channels, while the neural network image segmentation model performed best in large channels, and outperformed the objective-based method in delineating complex channels with inset benches.The models have the potential to streamline channel delineation workflows, leading to larger-scale morphological and hydraulic analyses, integration with other automated processes, resulting in more complete regional datasets to inform flood management, waterway rehabilitation design, and channel evolution and recovery studies.This research was undertaken under a collaboration between the Melbourne Data Analytics Platform and the Waterway Ecosystem Research Group.Funding and lidar data were provided by Melbourne Water under the Melbourne Waterway Research-Practice Partnership (mwrpp.org).

Figure 1 .
Figure 1.Study area, showing reach extents, digital elevation model and aerial imagery.For reach details, refer to Table2.

Figure 2 .
Figure 2. Flow chart of methods.Boxes represent processing steps in the pipelines.Sections of this article which describe each method step are indicated in the flow chart.
CNNs are made up of a model architecture, and model weights.The architecture of a model can vary significantly from model to model, depending on what prediction task that is the aim of the model.The architecture consists of layers handling the input, layers of neurons that are connect with each other in a variety of patterns, and a hidden layer that aggregates the CNN layers and turns them into the predictions.Within each layer are weights at each neuron, that are adjusted during the training until the model minimizes a loss function and predicts the training set most accurately.
accurate prediction.For example,Maxwell et al. (2020) used lidar-derived hillshades as inputs to their image segmentation model.Transfer learning using CNN image segmentation requires input of exactly three image channels mimicking the RGB channels of an image.Ghorbanzadeh et al. (2019) combined layers of topographic and spectral outputs into imagery cut to different window sizes to assess the relative effectiveness of CNN image segmentation of landslide scars.They found that spectral inputs in the images, along with the smallest image size resulted in the most accurate CNN predictions.We expect that the choice of image size, the location and orientation of images clipped from the lidar, and the derived layers may affect model performance and will likely be different depending on the landform being detected.

Figure 3 .
Figure 3. (a) Map showing the image grids used for imageset creation, overlaying the river centerline buffers.(b) Example augmented image that has been warped, and randomly erased before being fed into the training loop to prevent overfitting.
Lagrangian curvature Lagrangian curvature was calculated from the DEM using the functions outlined in Passalacqua et al. (2012) and Sangireddy et al. (2016) using NumPy.HS Hillshade Hillshade produced by ArcGIS Spatial Analyst.0 Zero No-data layer composed of all zeros.

Figure 4 .
Figure 4. (a) Bankfull elevation comparisons between HydXS predictions, DeRosa predictions, and annotated ground truth (GT), by river distance along each reach.(b) Histograms of difference between modeled and ground truth bankfull elevation, for both De Rosa et al. (2019) and HydXS (this study) versions of the cross-sectional objective model, by reach.Histograms use a bin width of 0.1 m.Differences less than −1 m and greater than 1 m are accumulated and shown left of −1.0 and right of 1.0 on the x-axis.Reach details are given inTable 2.

Figure 5 .
Figure 5. Bar plot of Dice coefficient for each imageset, aggregated for the entire test set (a.), and aggregated by reach (b).Note that the imageset is delineated by symbols denoting the three layers as defined in Table 3.(a) shows the Dice coefficients aggregated for the entire image, and aggregated within only a 50m buffer of the river centerline.(b) shows Dice coefficients aggregated within a 50 m buffer of the river centerline for each reach's test set.Reach properties are given in Table2.Reaches 3 and 4 which are the furthest downstream are predicted with highest accuracy, and image-normalized elevations with a minimum relief of 2 m (DZ L-2 _0_0) yield the best results overall.

Figure 6 .
Figure 6.Box plot of difference between cross sectional Dice coefficients for HydXS outputs, and the best neural network output (Run: −2_0_0), for the test set only.Results are sown for (a.) each reach and (b.) each geomorphic category; "multi-channel" indicates there are multiple low channel areas within the bankfull cross section, and "inset floodplain" indicates an inset floodplain or bench feature within the bankfull cross section.Details of reaches are given inTable 2.

Figure 7 .
Figure 7. Maps showing the test set region for each reach with HydXS bankfull extent results (red circles) and the best neural network ( −2 _0_0) bankfull extent predictions (blue polygons) overlaying the ground truth (black lines).Details of reaches are given inTable 2.

Table 2
). Manually estimated mean bankfull width (see Section 2.2.4) ranges from 7.5 m in Officer Road Drain to 26 m in the incised reach of Cardinia Creek.Riparian vegetation is variable in quality and extent, with forested buffers of around 100 m width in some reaches, and narrow or sparse forest cover, or weed-dominated vegetation in others.The study area thus provides the opportunity to test

Cross-Sectional Method 2.3.1. Method Overview We
developed an objective method for delineation of bankfull extents based on De Rosa et al. (2019), named HydXS to indicate that the bankfull calculation is a function of the Hydraulic Depth of a Cross Section (see

Table S2
Annotations (see Section 2.2.4) were split into test and training sets with a 10%-90% split stratified by reach.From the training data, 10% was used as a validation set within the training loop.To split out these sections geographically, each reach was split into 10 equal segments and one segment from each reach was randomly selected and reserved for test data.The remaining nine segments from each reach were used for training data.A 100 m buffer of the river centerline was constructed and split into test and training regions based on the segments.The image tiles extracted from the lidar set were sorted into training (69 images), validation (16 images) and test (20 images) sets according to which regions of the corridor the images overlaid.