Topological Relationship-Based Flow Direction Modeling: Stream Burning and Depression Filling

Flow direction modeling consists of (a) an accurate representation of the river network and (b) digital elevation model (DEM) processing to preserve characteristics with hydrological significance. In part 1 of our study, we presented a mesh-independent approach to representing river networks on different types of meshes. This follow-up part 2 study presents a novel DEM processing approach for flow direction modeling. This approach consists of (a) a topological relationship-based hybrid breaching-filling method to conduct stream burning for the river network and (b) a modified depression removal method for rivers and hillslopes. Our methods reduce modifications to surface elevations and provide a robust two-step procedure to remove local depressions in DEM. They are mesh-independent and can be applied to both structured and unstructured meshes. We applied our new methods with different model configurations to the Susquehanna River Basin. The results show that topological relationship-based stream burning and depression-filling methods can reproduce the correct river networks, providing high-quality flow direction and other characteristics for hydrologic and Earth system models.

. The stream burning method was extensively discussed in our earlier study (Liao et al., 2022) and other literature (Lindsay, 2016b).
The major limitation in existing stream-burning models is their aggressive modifications to the river (and the riparian zone) elevations.These modified elevations directly alter the calculation of slope, an important flow routing parameter.The modifications are needed because the models treat the vector-based river networks as a binary mask to lower the elevations.Unlike stream burning, depression removal does not require a vector-based river network data set as input and can be carried out before or after stream burning.Depending on how elevation is modified, depression removal can be classified into (a) depression filling, which increases the elevation of the depression (Barnes et al., 2014) and (b) depression breaching, which breaches a path from the depression toward the domain boundary.Depression filling is more computationally efficient but suffers from aggressive elevation modifications.Depression breaching does not have the aggressive modification issue, but it suffers from computational complexity (Lindsay, 2016b).
While stream burning and depression removal are different techniques, they are closely connected.Both methods may modify a cell's elevation so that stream burning may alter the result of depression removal or vice versa.Several studies have tried to combine stream burning and depression removal within a unified workflow to obtain consistent results (Liao et al., 2022;Saunders, 2000).However, producing a hydrologic-simulation-ready DEM and its associated flow direction remains challenging as a well-established elevation manipulation scheme does not exist (Lindsay, 2016a(Lindsay, , 2016b)).Previous research proposed an alternative hybrid breaching filling method to avoid aggressive modification to both river and land elevations (Lindsay, 2016b).The method uses a revised priority flood approach to fill the land cell depressions and river network topological relationships to breach river cell depressions.
The second flow direction modeling method is used at a continental or global scale.As discussed in our part 1 study, it is often referred to as the "upscaling" method (e.g., the Dominant River Tracing model) because it uses high spatial resolution data sets (e.g., results from the raster DEM-based method) as guidance to define the coarse resolution (around 10-200 km) cell-to-cell flow direction (Davies & Bell, 2009;Fekete et al., 2001;Wu et al., 2011).Because this method often assumes that there is always one major river channel within each large-scale mesh cell, the flow direction field is generally equivalent to the river networks.Because the upscaling method relies on high spatial resolution data sets, it does not require additional stream burning or depression removal.It derives flow routing parameters through fine-scale data synthesis.
Similar to the river network representation methods, existing flow direction models at both regional and global scales are limited to the rectangle mesh systems.However, some algorithms can be extended to other mesh systems (Barnes et al., 2014).Model development based on unstructured meshes has become an emerging area of interest in hydrologic and ESMs.In addition to the three advantages discussed in our part 1 study, model development based on unstructured meshes addresses several limitations of traditional hydrologic models, including high latitude spatial distortion (Feng et al., 2022;Liao et al., 2020).
To the authors' knowledge, the HexWatershed model, a hexagon mesh-based watershed delineation model, is the only flow direction model that includes stream burning and depression removal and can be extended to a fully unstructured mesh framework as of this writing (Liao et al., 2020).This study extends our part 1 study (Liao et al., 2023), describing a topological relationship-based river network representation method to introduce topological relationship-based stream burning and depression-filling algorithms within the HexWatershed model.We upgrade the HexWatershed model to a fully mesh-independent framework (Liao et al., 2020(Liao et al., , 2022(Liao et al., , 2023)).Part 2 of the study is organized as follows.We first introduce the model algorithms.We then apply the updated model to the same coastal watershed used in the part 1 study, the Susquehanna River Basin (SRB), with different model configurations and evaluate the model performance against several characteristics and data sets (e.g., elevation, slope, and drainage area).Finally, we discuss the method's limitations and future applications in hydrologic and ESMs.

Overview of HexWatershed
HexWatershed (v1.0/2.0) was initially designed as a hexagonal mesh-based watershed delineation model based on the priority-flood depression filling algorithm (Liao et al., 2020).Later, we introduced stream burning to improve the flow direction, and stream network representation at coarse spatial resolutions (Figure S1 in Supporting Information S1) (Liao et al., 2022).Because the core stream burning algorithm within HexWatershed v2.0 is based on a rasterization-based method, the model is subject to the same limitations as existing methods.HexWatershed (v1.0/2.0) was written in C++11 with OpenMP enabled.
In HexWatershed v3.0, we introduced topological relationship-based stream burning and revised depression-filling algorithms.The overall workflow of HexWatershed v3.0 is similar to earlier versions, with the major difference being the use of topological relationships (Figure 1).Additional watershed characteristics are also modeled, including travel distance (the flow direction-based distance between each cell and its watershed outlet).
In HexWatershed v3.0, we introduced the hybrid Python (front-end) and C++11 (back-end) software architecture to streamline the model simulation.The Python component processes most of the data I/O to support various data formats, including GeoTIFF and the network Common Data Form.It also links the HexWatershed model with the dependency PyFlowline Python package, which allows advanced high-performance packages such as Cython (Behnel et al., 2011).The C++11 component conducts the depression removal with enhanced capabilities.Following the design of PyFlowline, which supports parallel computing for multiple watersheds, HexWatershed also supports parallel computing in several steps.However, parallel computing is limited within a large watershed because of the graph dependency, especially for unstructured meshes.In the future, advanced graph theory algorithms such as the axis-aligned bounding boxes tree will be used to further improve model efficiency.
We will first introduce the topological relationship-based stream burning and depression-filling algorithms before providing details of the mesh-independent framework.

Topological Relationship-Based Stream Burning and Depression Filling
The topological relationship-based stream burning and depression-filling algorithms process cell elevations using a two-step approach.First, the model processes river cells and their riparian zone land cells using a hybrid breaching filling stream burning algorithm.Each river cell may be modified more than once in this step because of the breaching algorithm.Second, the model processes the remaining land cells using a revised priority flood depression filling algorithm.Because the second step does not modify the results of the first step, this approach generates a consistent depression-free DEM with river networks burnt in.

Hybrid Breaching-Filling Stream Burning
The PyFlowline simulation from our part 1 study produces a JavaScript Object Notation (JSON) file that contains the neighbor and downstream information (if applicable) of each mesh cell (Liao & Cooper, 2022, 2023).The hybrid breaching filling stream burning algorithm uses this information to fill adaptively or breach river cell elevation.Our model's stream-burning algorithm is a depression removal (filling and breaching) algorithm specifically designed for river networks.
Similar to our earlier study (Liao et al., 2022), the algorithm reversely searches and adjusts river cells from the outlet toward the headwater.Without significantly decreasing the outlet elevation, it adjusts the elevation of a depression river cell using either filling or breaching based on the elevation difference between the depression and a user-provided threshold.For example, if the absolute value of depression is lower than the user-provided threshold, a filling is applied.Otherwise, breaching is applied.Figure 2 provides a one-dimensional example.
This algorithm runs recursively until all the river segments/reaches are processed (Figure S2 in Supporting Information S1).Because stream order information is also available from the PyFlowline simulation, different parameters are used for different upstream channels when a river cell is a confluence.For example, a lower percentage (e.g., 1%) is used for high-order rivers, and a higher value (e.g., 2%) is used for low-order rivers.After the river cells are processed, the land cells' elevations in their riparian zones are increased if needed.Because of the availability of the topological relationship, the flow direction in the river networks is directly defined using the upstream-downstream information.
The topological relationships feature can also be turned off (Table S1 in Supporting Information S1).This converts the river networks from the PyFlowline JSON file to a binary mask.As a result, the model runs in a traditional rasterization-based stream burning method and only applies depression filling in the river cells and their riparian zone land cells.

Revised Priority-Flood Depression Filling
Priority-flood is an algorithm that fills DEM depressions by sequentially "flooding" the domain from a boundary inward to adjust elevations to assure surface drainage.In HexWatershed v1.0, all the mesh cells are treated the same in the priority-flood algorithm.This algorithm performs reasonably well when the quality of DEM is high.In HexWatershed v2.0, the priority-flood algorithm was revised to prioritize the river cells when pushing neighboring cells into the priority queue.For example, after the algorithm locates the cell with the lowest elevation (starting from the watershed outlet) in the queue, it will first adjust the upstream river cell(s) of this cell, then processes the remaining non-river cells.This process runs recursively until all cells are processed.However, if a river cell has two or more neighboring river cells (e.g., near confluence or meander), the algorithm cannot decide which upstream cell should be processed first, leading to uncertainty in modeled flow direction in river networks.Even with user-provided stream order information, it remains a challenge to consider all the scenarios if the explicit topological relationship is unknown (Figure S3 in Supporting Information S1).Besides, without the topological relationship, the algorithm can only increase the cell elevation when a depression is identified; thus, a large elevation drop (a user-provided parameter) is required to reduce the elevation modifications to the land cells.In HexWatershed v3.0, the depression removal as a whole is separated into two steps.As described above, the first step focuses on the river mesh cells and their riparian zone land mesh cells using a hybrid breaching-filling stream-burning algorithm.In the second step, we revised the priority-flood algorithm to accommodate this two-step workflow.First, although the revised algorithm still searches the whole domain to identify the cell with the lowest elevation in each step, it does not distinguish river and land cells.Second, it will not change the cell elevation once it recognizes it is already processed in Step 1, guaranteeing the consistency of the model results.This design significantly reduces the algorithm's complexity because it no longer needs to consider different scenarios.

Mesh-Independent Framework
To support unstructured mesh systems (Engwirda, 2017;Ringler et al., 2013;Sahr, 2015), HexWatershed v3.0 includes several changes.First, it supports all mesh systems from the mesh-independent PyFlowline model (Liao et al., 2023).However, the definitions of neighboring cells in PyFlowline and HexWatershed v3.0 are not always identical (Text S1 in Supporting Information S1).For example, a rectangle cell in PyFlowline has only four neighbors.In contrast, the same cell may have eight neighbors (four face neighbors + four vertex neighbors) in HexWatershed (Text S1 in Supporting Information S1).
Second, in a projected coordinate system, flow accumulation is often represented using the total number of upslope cells contributing to the current cell.The total drainage area can be calculated by multiplying the flow accumulation by the cell area, which is a constant.In an unstructured mesh, the cell area is not constant.To resolve this, we use the geodesic area of each cell when calculating the flow accumulation.
Third, HexWatershed v3.0 supports continental to global scale simulation, which is enabled by the design of the PyFlowline model.PyFlowline allows for multi-outlet modeling to generate multiple river basin networks within a single mesh.Based on this, HexWatershed v3.0 performs stream burning and depression filling for multiple watersheds in one simulation.

Study Area and Data
We applied the model to the same study area used in our part 1 study, the SRB (Figure S4 in Supporting Information S1).We use the same baseline data sets including Watershed Boundary Data set from our part 1 study (USGS, 2013).However, the user-provided river networks in this study represent the conceptual river networks produced from our part 1 study.We also obtained the DEM data set from the United States National Elevation Dataset (NED).Spatial data sets and maps were produced using Python packages, including Matplotlib and GDAL (GDAL/OGR Contributors, 2019; Gillies et al., 2023;Hunter, 2007;Liao, 2022bLiao, , 2022c;;Liao et al., 2023).

Model Setup
To evaluate the performance of the HexWatershed v3.0, we ran the model under different configurations with case indices used for illustrations (Table S1 in Supporting Information S1 and Table 1).The resolutions and case indices differ from those in our part 1 study.
For structured meshes, we ran two different spatial resolutions (5 and 40 km).For unstructured mesh, that is, the Model for Prediction Across Scales (MPAS) mesh, we used a variable resolution mesh with cell lengths varying from 3 to 10 km.These resolutions were selected for two reasons: (a) existing large-scale hydrologic models, such as the Variable Infiltration Capacity and river routing model, such as the Model for Scale Adaptive River Transport, often run at 0.5° × 0.5° or even coarser spatial resolutions; (b) the "coastal resolving" ESMs often use high spatial resolution (3-5 km) near the coastal lines.To demonstrate the effect of the topological relationship-based stream burning algorithm, we ran two simulations (without and with the topological relationships) for each resolution.The Supporting Information S1 contain the high-resolution meshes (overlapped with flow direction) of Cases 2, 6, 10, and 14 (Figures S5-S8 in Supporting Information S1).

Results and Analysis
Although the new algorithms affect many results, we only present major watershed characteristics often used by hydrologic and ESMs, such as surface slope and flow direction.

Surface Elevation
The modeled surface elevations with and without topological relationships exhibit significant differences near river cells.When the topological relationships feature is turned off, the modeled river cell elevations decrease approximately 100 m due to the large threshold applied.As a result, the river networks are also visible (e.g., Cases 1, 5, and 9 in Figure 3).The dramatic modification is also widespread from the headwater to the outlet.
In contrast, when the topological relationships feature is turned on, modeled river cell elevations are closer to their riparian zone cell elevations (e.g., Cases 2, 6, 10, and 14 in Figure 3).
We also extracted the elevation profiles from the watershed outlet to a USGS gauge site (Site ID: 01497842) on the main channel.The results show that when the topological relationships feature is turned on, the model can produce more realistic elevation gradients along the channel (Figure 4).However, the modeled elevations are still overestimated compared to the NED data sets (Text S2 in Supporting Information S1).
The topological relationships also significantly impact the distributions of domain-wide channel elevations (Figure 5).Generally, the average river channel elevations are much higher with the feature turned on than when it is turned off.

Surface Slope
Because the model calculates the between-cell slope from the depression-free surface elevation, the spatial patterns of the modeled slope with and without topological relationships are generally similar.Significant differences can appear near the river cells.
The density functions of the channel slope show that the average channel slope is smaller when the topological relationships feature is turned on than when it is turned off (Figure S9 in Supporting Information S1).This is consistent with the elevation profiles (Figure 4).
Because the river cell elevations substantially decrease when the topological relationships feature is turned off, the slopes between the river cells and their riparian zone cells are much larger than when this feature is turned on (Figure 6).
The density functions of the riparian zone slopes show that the average riparian-zone slope is larger when the topological relationships feature is turned off than when it is turned on.Their distributions are less affected by mesh types and resolutions when it is turned on (Figure 7).

Flow Direction
When the topological relationships feature is turned off, the model cannot reproduce flow direction fields that precisely follow the user-provided river networks, especially at coarse resolutions (e.g., Cases 3,7,and 11 in Figure 8).Specifically, the modeled flow direction fields cannot resolve river meanders and confluences.
In contrast, when the topological relationships feature is turned on, the modeled flow direction fields exactly overlap the user-provided river networks regardless of mesh type and resolution (e.g., Cases 2, 4, and 14 in Figure 8).

Drainage Area
Because the drainage area is calculated based on cell area and flow direction, the modeled drainage area varies with mesh type and resolution.When the topological relationships feature is turned off, the spatial patterns of modeled drainage areas from different meshes are similar at high resolutions.However, they differ significantly at coarse resolutions (e.g., Cases 3 and 6 in Figure 9).
LIAO ET AL. 10.1029/2022MS003487 7 of 17 In contrast, the drainage area spatial patterns from different meshes are very similar when the topological relationships feature is turned on across all tested resolutions (e.g., Cases 4, 8, and 12 in Figure 9).
At high mesh resolution, turning on the topological relationships feature better captures drainage area near river channels (e.g., Cases 13 and 14 in Figure 9).
The reference drainage area can be calculated from the Watershed Boundary Dataset using the watershed polygons.Compared with the reference drainage area, all cases underestimated the total drainage area by 5%-10% at high spatial resolution.This is primarily caused by missing portions at the upper boundary (e.g., Cases 2 and 10 in Figure 9).The numbers of cells in all cases suggest that the MPAS mesh includes more cells than other meshes due to its refinement near the watershed outlet (Figure S10 in Supporting Information S1).  1).Because the topological relationships feature is turned off in cases with odd indices (e.g., 1, 3, 5, and 7), the river cell elevations are much lower than the corresponding cases with even indices (e.g., 2, 4, 6, and 8).1).The x-axis is the travel distance from the outlet, and the y-axis is the elevation.The black line represents the elevation profile from the NED data sets.Different cases have a different number of data points due to resolution differences.The NED data sets are not depression-free.
Figure 5. Density functions of the river channel elevation (m) from Cases 1 to 14 (Table 1).10.1029/2022MS003487 10 of 17 of mesh type and resolution (Figure 11).The topological relationship feature does not have a significant impact on travel distance.This is because a cell can have a similar travel distance even with different flow directions (Figure S11 in Supporting Information S1).
The scatter plot between the observed and modeled travel distances suggests that the model can reasonably capture the travel distance, especially when the topological relationship feature is turned on.First, when this feature is turned off at the high spatial resolution, the flow direction often takes shortcuts and the model often underestimates the travel distance (e.g., Cases 1, 5, 9, and 13 in Figure 12).In contrast, when it is turned on, and the flow direction precisely follows the river channel, the modeled travel distances are larger and maybe even be greater than observations (e.g., Cases 2, 6, 10, and 14).Second, compared with structured meshes, MPAS mesh-based cases underestimate travel distance because the river cells are aligned with real-world river channels.Third, a strong correlation ratio exists between the observed and modeled travel distances when the topological relationship feature is turned on.This ratio depends on the mesh type.For example, the ratio for the structured latlon, square, and hexagon meshes are 1.04, 1.04, and 1.03, respectively.Finally, Case 14 performs similarly to the DRT model at 1/16° resolution (∼7 km) near the watershed outlet.The DRT model uses the actual flowlines to represent the travel distance (Wu et al., 2011).

Importance of Topological Relationships in Stream Burning
Model simulations from Cases 1 to 14 demonstrate that the topological relationship-based stream burning can reduce modification to river and land elevations.This is because the adaptive hybrid breaching filling algorithm uses topological relationships to adjust elevations on demand (Figure 3) (Lindsay, 2016b).As a result, the updated DEM can be used to directly model river channel and riparian zone slopes that meet hydrologic model requirements.However, the comparison between modeled DEM and NED data sets suggests that the model parameters, that is, the filling ratio and breaching threshold, should be tested to improve further model performance  (Figure 4).The flow direction algorithm considers both the topological relationships and elevation gradient, allowing it to define flow direction fields that are consistent with the user-provided conceptual river networks (Figures S4-S7 in Supporting Information S1).
However, because these capabilities depend on the topological relationships modeled by PyFlowline, the limitations from the part 1 study propagate into this study.

Depression Filling
Unlike the method from our earlier study (Liao et al., 2022), our new method separates stream burning and depression filling into two steps and significantly simplifies the workflow.
First, the new method demonstrates that, if carefully designed, we can conduct full-domain depression removal sequentially to include different hydrologic features (e.g., lakes, rivers, and land) without introducing additional model complexity.The results remain consistent after the final step.Second, because stream burning reduces the modification to the river (and its riparian zone) elevations, the improvements will also improve depression filling for the remaining land cells.

Watershed Characteristics Across Scales
The hydrologic processes in hydrologic and ESMs are not at the same spatial scales as the mesh resolutions (BERAC, 2017).As a result, representing watershed characteristics across scales is critical.Our simulation cases suggest that some characteristics, such as the travel distance, can be reconstructed from the conceptual travel distance by a scale factor (Figure 12).However, reconstruction remains challenging for other characteristics.For example, river segment and river order information can differ from case to case.Even in the same case, the modeled stream segment and order outputs can vary from the user-provided values because of the flow accumulation threshold (Lin et al., 2021).In some scenarios, preserving these values is preferred to maintain consistency (the orange arrow in Figure 1).Another example is the drainage area, as all cases either underestimate or overestimate the total drainage area.This is mainly caused by the missing portions or the partially included/excluded cells at basin margins (Figure 9 and Figure S9 in Supporting Information S1).

Limitations
This study has a few limitations: 1. Currently, we only consider the immediate neighbors as the river channel buffer zone in the stream burning algorithm, which means the single-cell resolution determines the buffer zone width.An adaptive buffer zone width that includes more than immediate neighbors is needed when the riparian zone width is larger than the mesh resolution.2. The elevation gradient near the river mouth is generally smoother compared to the vicinity of the headwaters (Figure 4).As a result, the filling and breaching parameters should be adaptive, considering the mesh resolution and distance to the watershed outlet.In some cases, a dam may alter the elevation profile.These parameters should also depend on the location of the river channel (Figure 4). 3. The model should include other hydrologic features such as watershed boundary and (endorheic) lakes in the workflow.For example, it is possible to include watershed boundaries in the mesh generation process to allow the model to improve the drainage area without missing the marginal areas.Similarly, we should include lakes in the mesh generation and depression removal processes to consider fill-spill scenarios (Barnes et al., 2020).

Conclusions
In this study, we extended our part 1 study to develop a mesh-independent topological relationship-based flow direction model (HexWatershed v3.0).We applied the model in different configurations to the SRB.The results show that our model reduces modification to the river and land elevations and produces high-quality flow direction fields and other flow routing parameters.We suggest that hydrologic and ESMs with a flow routing component should adopt our method, especially for unstructured mesh-based simulations.1).As explained in our earlier studies, the travel distance may be larger when the topology feature is turned on because of the "zig-zag" effect.

Figure 1 .
Figure 1.Workflow of the HexWatershed v3.0 model with the topological relationship-based stream burning and depression filling algorithms.Rectangles inside the green dashed rectangle are the part 1 study topological relationship-based river network representation using the PyFlowline model, which produces the topological relationships information (green rectangle).The topological relationships are used by the hybrid breaching filling stream burning algorithm (light purple rectangle), followed by the revised depression filling algorithm (orange rectangle).The topological relationships are also used by the flow direction algorithm (blue arrow) and stream segment/order definition algorithms (orange arrow).

Figure 2 .
Figure 2. Illustration of the hybrid breaching filling stream burning algorithm.(a) The original river cell elevation profile is the same as in Figure S1 in Supporting Information S1.Each cell is marked with an index, the process step.(b) Because the depression between Cell 2 and 3 is less than the user-provided threshold (e.g., 5 m), Cell 3's elevation is increased by a gentle slope (e.g., 1%).(c) Similarly, Cell 7's elevation is increased.(d) Because the difference between the updated Cell 7 and 8 exceeds the threshold, Cell 8's elevation is unchanged, while Cell 7 and its downstream cells are breached if needed.(e) Shows the resulting river cell elevation profile.
The illustrations and analyses all use the same indices.

Figure 4 .
Figure 4. River elevation (m) from the outlet to the main channel upstream United States Geological Survey gage site 01497842 (travel distance in m) for Cases 1 to 14 and the National Hydrography Data set/National Elevation Dataset (NED) (Table1).The x-axis is the travel distance from the outlet, and the y-axis is the elevation.The black line represents the elevation profile from the NED data sets.Different cases have a different number of data points due to resolution differences.The NED data sets are not depression-free.

Figure 8 .
Figure 8. Modeled flow direction fields from Cases 1 to 14 (Table1).The black line features represent flow direction fields with the drainage area scaled as the line thickness.The colored line features are the conceptual river networks generated by the PyFlowline simulation.The non-conceptual black line features are the original National Hydrography Data set river networks.When the topological relationships feature is turned on, the modeled flow direction fields are consistent with the PyFLowline generated conceptual river networks.

Figure 10 .
Figure10.Drainage area at the watershed outlet from Cases 1 to 14 (km 2 ) (Table1).The x-axis is the mesh resolution (5 and 40 km), and the y-axis is the drainage area (km 2 ).The dashed line is the reference drainage area from the Watershed Boundary Dataset.Model for Prediction Across Scales-based cases are plotted in both resolutions.

Figure 11 .
Figure11.The modeled travel distances from Cases 1 to 14 (Table1).As explained in our earlier studies, the travel distance may be larger when the topology feature is turned on because of the "zig-zag" effect.