A Rainfall‐Based, Sequential Depression‐Filling Algorithm and Assessments on a Watershed in Northeastern Indiana, USA

The landscapes across much of the Midwestern United States are characterized by glacial activity that left water‐holding kettles, depressions, and potholes. Until recently, traditional watershed algorithms assumed these depressions to be errors in the elevation data and filled them as a means of correction, when many of these features may rarely fill and have the potential to dramatically affect surface flow patterns. These depressions play an important role in hydrology, water management, site planning, and agronomy. An optimized sequential depression‐filling algorithm (SDFA) was developed which fills these water‐holding features sequentially based on their respective retention capacity and contributing area. Outputs reflect the state of connectivity following the application of a user‐specified amount of rainfall excess (i.e., in excess of infiltration); depressions that have not been filled will remain as their own hydrologically common subcatchments. A performant algorithm is integral to future delivery and utilization by practitioners both in the office and in the field. The optimal set of subroutines were able to fill all of the depressions in a 239 km2 watershed in northeastern Indiana in 42 s and 1.5 h on a consumer desktop computer for digital elevation models (DEMs) at 30 m and 3 m resolutions, respectively (O(n2) overall performance).

depressions. The earliest algorithms filled all depressions; the assumptions behind this decision are suggested by Lindsay and Creed (2006): "(1) the scale and accuracy of a DEM is inadequate to represent actual depressions, which are generally small landforms; (2) depressions rarely occur in natural landscapes and artifact digital depressions are abundant; and (3) actual depressions have minimal impact on hydrogeomorphic processes since they either fill with water and overflow or find subsurface pathways that are closely approximated by surface topography." Early attempts to acknowledge depressional features applied filling and breaching to depressions greater than user-defined thresholds in depth, volume, or area (Lindsay & Creed, 2005;Martz & Garbrecht, 1998, 1999. However, a single threshold applied to all depressions is arbitrary in effect because it fails to represent dynamics of nested depressions and the hydrologic role that these depressions serve in the landscape. Initially, studies of the role of surface depressions within the landscape focused on the effects of soil roughness on depression storage, overall hydrologic connectivity, and runoff generation (Antoine et al., 2009;Darboux et al., 2002;Huang & Bradford, 1990;Yang & Chu, 2012). More recently, algorithms that simulate the fill-spill behavior of depressions have been developed, gradually filling depressions as they would actually be filled by water in a rainfall event. Some algorithms filled depressions by incrementally applying small amounts of water and adjusting the fill-levels of every depression dynamically (Appels et al., 2011). Shaw et al. (2012) developed a proof-of-concept model called the Simple Pothole Terrain Analysis Algorithm (SPILL) that fills depressions sequentially (one per iteration), resulting in improved theoretical performance over algorithms that incrementally apply water to every depression. Their results, however, were limited to small numbers of depressions and lacked the programmatic implementation details necessary for replication. Chu et al. (2010) developed a Windows-based tool for delineating depressions and several physical parameters, given a DEM. Yang and Chu (2012); Chu et al. (2013) then detailed a progressive puddle-to-puddle model that incorporates filling, spilling, and merging in addition to a conceptual splitting component where infiltration causes depressions to disconnect from one another. Yang and Chu (2015) introduced overland flow and infiltration into their model. Nasab, Singh, and Chu (2017) integrated this model into the popular Soil and Water Assessment Tool (SWAT). Nasab, Zhang, and Chu (2017) recognizes the stepwise patterns to hydrologic connectivity, depression storage, ponding area, and contributing area that results from topographies with depressions in the D-cubed algorithm developed.
A description of the algorithmic complexity will inform the performance of the algorithm for other potential users, as well as the scalability of such algorithms for generating expansive datasets of depression information. Existing studies of algorithm complexity are limited to those that were devised to remove all depressions as opposed to filling them sequentially (Planchon & Darboux, 2001). Barnes et al. (2014) provides a thorough review. The most efficient of these algorithms, referred to as priority-flooding, begin along the perimeter of the DEM and work inward, raising cells as necessary while prioritizing lower-elevation cells for processing first. Priority-flood algorithms are able to achieve an algorithmic efficiency of O (m log m), where the number of depressions m does not exceed the number of cells in the DEM, n.
Further performance gains were made by leveraging the hierarchical nature of depressions nested within one another in a graph structure where nodes represent each depression and edges represent the potential spillover elevations (i.e., a "pass" in common literature) between adjacent depressions. Motivated by wetland cataloging, Wu et al. (2015) devised a novel contour-tree algorithm where nested depressions are identified as enclosed rings contained within other rings from which the hierarchical relationships can then be extracted (Wu & Chu, 2017). After identifying wetlands of interest, they utilized conventional flow routing and watershed delineation techniques, that is, the contour tree method's minimum unit is the depression and does not simultaneously handle the complexity of raster flow (re)routing. A later iteration improved performance over the vector-based method using a raster-based level-set method (Wu et al., 2019). Furthermore, the work of Wu and Chu (2017) demonstrates a definition of depressions, commonly used where only the "wet surface" is considered, omitting the contributing area above the high water line. In doing this, a global hierarchy of depressions based on the rainfall events required to fill them cannot be produced, precluding rainfall depth-based sequencing of depression filling.
The algorithms of Barnes et al. (2014); Barnes et al. (2020) utilize a wet surface definition of depressions in their development of filling algorithms optimized for the case, where all depressions must be filled, disregarding sequence. Cordonnier et al. (2019) used a minimum spanning tree (MST) approach to find the depression hierarchy. A graph representation of the landscape is developed where all adjacent depressions are linked in the graph, and weight by the minimum spillover elevations between pairs of depressions. Two different MST algorithms were then applied, preserving the minimal set of spillover locations and granting connectivity from each depression to the DEM boundaries (i.e., spanning the tree). However, the process which then traverses this graph, merging depressions, and handling the associated raster datasets is not equipped to sequence these merge operations based on rainfall depth. As a result, these algorithms do not generate the rainfall-based depression hierarchy nor can they produce the set of intermediate states of connectivity resulting from a particular rainfall event. By developing a rainfall-based sequence through which depression merge operations should follow, an algorithm will better replicate the natural process of rainfall filling depressions, producing intermediate flow routes and states of overland flow connectivity that better match the progressions observed in the field. This is critical in landscapes where natural depressions exist and do not readily overflow.
This study seeks to extend existing research with the following objectives: 1. Develop an open source sequential depression-filling algorithm (SDFA) capable of simulating the fill-spill flow patterns of surface depressions based on rainfall depth in an automated, repeatable fashion 2. Describe the resulting set of states of connectivity through a graph-structured, rainfall-based depression hierarchy 3. Assess the theoretical performance of the algorithm using Big O notation to describe algorithmic complexity 4. Assess the performance of SDFA on a 239 km 2 watershed in northeastern Indiana to demonstrate the effect of rainfall excess and depression storage on the resulting contributing area at the watershed outlet

Methods
A sequential depression-filling algorithm is key to simulating the dynamics of water retention within depressions and simulating surface flow connectivity that matches real-world observations. The algorithm fills depressions with a user-specified rainfall excess amount, that is, the runoff-ready portion of rainfall that assumes infiltration and other abstractions have already been taken into account; all rainfall excess is applied to the depression-filling process. Figure 1 presents a flow chart of the primary algorithm steps.

Initialize Flow Direction Grid (Step 1a)
D-8 flow direction (Jenson & Domingue, 1988) was chosen due to its simplicity, reliability, and ubiquity in common software. While other methods such as D-Infinity may produce improved localized flow routes, this choice has little impact on the overall area contributing to each depression, only how it arrives there. For a particular cell, if no neighboring cells have a lower elevation, including flats, then that cell is marked as a pit cell in the flow direction grid.
An alternate data structure of flow direction under the D-8 algorithm was implemented whereby each cell's flow direction is defined as the index of the cell to which it flows (its child) rather than the power of two representation used conventionally and made arbitrary by modern programming language capabilities.  providing a means to representing real-world structures such as culverts, drainage tiles, and other conduits given the appropriate datasets. By eliminating the conventional lookup table associating neighbors with unique powers of two and instead referencing cell indices directly, traversals of the flow direction data (e.g., watershed delineation) are performed more rapidly.

Initialize Flow Direction Parents (Step 1b)
The set of cells that contribute flow into downstream cells (i.e., the flow direction parents) were also stored, making the flow direction tree structure doubly linked. While the relationship between parent and child is one-to-one under the D-8 approach, the relationship between child and parents is one to many, necessitating a different data structure from the downstream flow direction matrix. In MATLAB, this was accomplished using a cell array where each element contains a variable-length array of parent cell indexes.

Initialize Depression Label Matrix (Step 2)
Each pit cell that was found in the flow direction grid is assigned a unique integer value label beginning with one and counting up in ascending order. Specifically, as shown in Figure 3, we define a depression as a single pit cell and the set of cells that flow to that pit cell; this set of cells are assigned the same label value in the depression ID matrix. To initialize this matrix, each pit cell must be located and marked, an O(n) operation by itself. This process also returns the array of pit cell indices as output to be used in preceding steps. The depression label matrix is initialized by traversing upstream from each pit cell via flow direction parents. As the traversal takes place, each visited cell is assigned a label in the label matrix while the set of traversed cell indices are accumulated in an array. The array may then be reused in order to avoid unnecessary traversals in future steps of the algorithm.

Initialize Spillover Elevation/Location (Step 3)
Each depression has a minimum spillover elevation along the perimeter where water will begin to overflow. This step is perhaps algorithmically unintuitive because it is dependent upon not only the cells along the perimeter of the depression in question, but also the cells of neighboring depressions that are immediately adjacent. As demonstrated in Figure 3, cell F is the cell of minimum elevation along the boundary of a depression. However, cells just beyond the perimeter (cells A and G) belonging to the neighboring depressions may be of greater elevation. Such cells serve as barriers and must also be considered. In Figure 3, the depression cannot overflow until filled to the elevation of cell B. This detail affects the spillover location as well as calculations of retention volume, volume to contributing area ratio (VCA), and ordinal position in the filling process.
Perimeter cells are defined as any cell having an adjacent cell of a different depression label within its 3-by-3 neighborhood. Given the set of cells along the perimeter of two depressions i and j, the following equation defines the spillover elevation/location: where s ij E is the spillover elevation (m), and [E i , E j ] is the array of paired elevations along the perimeter between depressions i and j. E i is the elevation of a perimeter cell that belongs to the depression of interest and E j is the elevation of a neighboring cell just outside of the perimeter in an adjacent depression.

Initialize Depression Volumes, Areas, and VCAs (Step 4)
After finding the spillover elevation, the volume of the depression can be computed: 10.1029/2020MS002362 4 of 12 where: V d = depression volume, m 3 V pf = volume filled in previous iterations, m 3 n = number of cells with elevations less than the spillover elevation A i = cell area, (DEM resolution) 2 , m 2 E s = spillover elevation, m E i = cell elevation, m Only cells with elevations less than the spillover elevation will retain water and are evaluated in Equation 2. The value of V pf accounts for the volume that was filled in prior iterations of the filling process (further described in Section 2.6). The full set of cells used in Equation 2 are stored in memory and looked up, avoiding any need to traverse the flow direction data to gather them. Next, the contributing area of the depression is given by: where: m = number of cells draining to the pit cell, including the pit cell A c = contributing area, m 2 Together, the VCA is equivalent to the depth of rainfall excess necessary to fill that particular depression: where VCA = volume to contributing area ratio, m P e = excess precipitation, m

Iterative Depression Filling (Step 5)
Filling begins as the depression with the minimum VCA overflows, continuing in ascending VCA order until all depressions are filled. The algorithm can also perform filling to a particular rainfall excess value by iterating through depressions until the VCA of the next depression exceeds that particular rainfall excess value. Any depressions that remain will not be filled, they will remain their own hydrologically common watershed and do not contribute to other watersheds.
Two types of depression mergers have been identified based on how the two depressions are oriented. A submerge-type merger (Figure 4a) occurs when the spillover elevation of the resulting composite depression is greater than the spillover elevation identified between the two depressions individually (i.e., the two merging depressions are submerged in the result). The DEM representing the current state of the surface, as it is progressively filled may be referenced to correctly compute volume on each iteration for submerge-type depressions. A cascade-type merger (Figure 4b) occurs where the first depression fills and cascades into the second depression; the new spillover elevation has a lower elevation than one or more depressions that have been previously filled. Because the summation of volume under this new spillover elevation would exclude the volume located above the spillover elevation, a separate sum, V pf in Equation 2, is necessary and maintained for each depression.
Step 5a begins each iteration by identifying the depression of minimum VCA. Next, Step 5b updates the flow directions of the overflowing pit to reflect the new state of flow connectivity downstream. Flow is rerouted NOEL ET AL.

10.1029/2020MS002362
from the overflowing depression by iteratively tracing downstream from the spillover location until the pit cell is reached, pointing flow back toward the previous iteratee along the way. By virtue of providing a flow path for the pit cell, every cell in the depression is granted connectivity downstream. Alternatively, an O(1) method assigns the pit cell a flow direction value equal to the spillover index. However, this method does not conform to conventional D-8 and results in discontinuities in the resulting flow routing outputs that can be visually unintuitive.
Step 5c updates the DEM, raising the elevations of the depression and then updating the depression ID matrix such that the overflowing depression takes on the label of the downstream depression to which it now flows. Again, the cached copy of cell indexes is accessed to update the depression ID matrix and DEM using conditionals. In Step 5d, the set of candidate spillover locations are obtained by appending those of the two merging depressions, pruning out spillover locations no longer along the perimeter, and using Equation 1 to then find the new spillover location. For Step 5e, the set of cells comprising the two merging depressions are concatenated, and volume, area, and VCA are computed using the same equations and methods as in Step 4. An optimization for cascading-type mergers (detected when the new spillover location is less than the previous one) where the volumes of the two merging depressions are summed, eliminating the need to visit the set of cells comprising the depressions. This O(1) operation, however, reverts to an upper bound of O(n) (for an n-cell depression) in the case of a submerge-type merger.

For
Step 5, the algorithmic complexity must consider the number of iterations in addition to the complexity of the step itself. For example, a step that must visit every cell in the DEM on each fill iteration could be described as O(nd), where n is the number of cells in the DEM and d is the number of depressions. At most, Step 5 must compute on the set of cells comprising the merged depression as opposed to all n cells in the DEM. Unfortunately, there exists no analytical description of the depressions in terms of n due to their unpredictable size and shape in a given DEM, but only an upper bound in algorithmic complexity that may occur (though unlikely) when every cell in the DEM is a pit cell. Given a DEM with n cells and therefore n depressions, there will be n total iterations where the first iteration begins with two one-cell depressions merging (visiting 1 + 1 cells on that iteration). On the next iteration, the two-cell depression will merge with another one-cell depression (visiting 2 + 1 cells on that iteration), and so forth until no depressions remain. This series can be described as follows: where n is the number of cells in the DEM, which reduces to a Big O notation of O(n 2 ). This result also demonstrates the potential shortcomings in Big O notation comparisons; for example, an algorithm that must visit every cell in the DEM on every iteration would have an equivalent Big O notation despite greater run times.  excess amount of interest. Each step along the graph represents a change in the state of connectivity over the DEM surface.

Flow Accumulation (Step 6) and Watershed Delineation (Step 7)
Flow accumulation is a challenging computation because it requires global knowledge regarding the other cells flowing toward any particular cell of interest. Flow accumulation is computed using a recursive traversal upstream along flow direction parents. As each recursive call returns, flow accumulation values are incremented and this sum is returned to be used in the next flow direction child. Finally, watershed delineation (Step 7) utilizes the traversal method described in Step 2 on a selected outlet cell to generate a Boolean matrix repressing the watershed consisting of ones for contributing cells and zeros for all remaining cells. Because of the redundancy of this subroutine with respect to others, this step is omitted from performance evaluation.

Test Watershed and Data Used
A watershed was selected to evaluate the scalability of the model in terms of performance on a broad scale. A subportion of the St. Joseph River Watershed (HUC 0410000306) located in northeastern Indiana was chosen for analysis (St. Joseph River Watershed Initiative, 2005). It has an area of 239 km 2 and glacial activity played a major role in the development of the soils, topography, and hydrology of the region. The landscapes are flat to rolling and feature a number of potholes and depressions that retain water. It is representative of the glaciated landscapes across much of the Midwestern United States. It is largely undeveloped with agriculture as the primary land use. Figure 6 shows imagery and elevations of the watershed. A watershed of this size was selected for testing in order to (1) demonstrate the scalability of the algorithm in terms of algorithmic performance and (2)  Note, the graph is rooted at the node labeled edge (i.e., the edge of the terrain) and VCA increases along graph edges when working from leaves to roots. These figures illustrate the set of states realized only after depressions are filled. For example, the spillover elevation of depression 2 in (a) and the VCA of depression 2 in (b) are computed with the knowledge that depression 1 contributes to depression 2 prior to its own fill-spill event.
watershed, and cells outside of the watershed were assigned NaN values. The DEMs were processed in MATLAB 2018a running on a Windows 10 desktop machine with an Intel Core i7-4770K processor and 6.7 GBof allocated Java Heap Memory. The MATLAB profiler was used to track performance and outputs of run time, and peak memory usage were recorded. A parallel pool consisting of six workers was configured for parallel processes. Other outputs of the algorithm itself also serve to illustrate the hydrologic role of depressions in the landscapes of this region. The percentage of the total watershed area connected to the outlet was recorded on each iteration for the three DEM resolutions.
Finally, a mass balance was performed to verify the conservation of mass through this simple model, omitting infiltration and other abstractions. The overall mass balance can be described as follows: where V t is the total rainfall excess volume applied, V s is the volume stored, and V r is the volume that ran off. V t can be computed as the rainfall depth applied uniformly over the area of interest, A AOI .
Next, the filled DEM surface can be compared with the original DEM surface to compute the volume stored in depressions: While depressions are typically ignored once they begin flowing off of the DEM surface, the paired array of contributing area versus rainfall excess was used to compute the runoff volume for the study watershed: where V r = runoff volume; m 3 n = number of iterations A i = contributing area at iteration i; m 2 VCA i = volume to contributing area ratio at iteration i, m, and VCA i−1 is the initial VCA of the region contributing to the outlet when iterative filling begins.
NOEL ET AL.

Algorithm Performance Results
Table 1 displays the performance results in terms of run time and peak memory usage associated with each subroutine comprising the algorithm. The high peak memory usage of Step 1 can be attributed to the additional resources needed for parallel processing.
Step 3 was grouped with Step 4 and performed in parallel across depressions, resulting in both reduced run time and increased peak memory usage. Unfortunately, MATLAB does not provide detailed profiling results for code within parfor loops, only summary information. For Step 5a, MATLAB utilizes an optimized min function that provides performance comparable to that of finding the minimum of a sorted list (finding the minimum of a sorted list was expected to perform as O(n)).
Step 5b, in which two options were described, one constrained by the convention of adjacent flow paths and the other not, is shown in Table 1. While the latter is superior due to its O(1) complexity, the former provides more intuitive output, it is compatible with conventional D-8 implementations, and is not a bottleneck in the algorithm.
Step 5d of the algorithm is one of the costliest steps in the algorithm as each potential spillover location must be evaluated.
Step 5e, however, has the largest impact on the performance of the algorithm with exception to flow accumulation, which though it has been evaluated, is not considered part of the proposed filling algorithm. While cascading-type mergers are an O(1) operation, the computation of depression volume is much more unpredictable and inefficient for submerge-type mergers. For this reason, the resulting run time and memory consumption can be largely attributed to the submerge-type depression mergers. Finally, Step 6, flow accumulation, serves as a proxy for the similar process of watershed delineation. While this step is the costliest in terms of run time, it is included for reference, not as part of the proposed algorithm.
The performance results for the algorithm run on three different DEM resolutions are summarized in Table 1. Peak memory usage occurs during Steps 3 and 4 where depression parameters are initialized in parallel, resulting in six times the memory demand had it been performed serially (due to the six parallel workers). The algorithm utilizes 3.15 GBof memory at peak on the 3 m DEM, meaning that an even larger watershed could be processed on the hardware used for testing before the constraint becomes memory.

Other Algorithm Outputs and Interpretations
The plot in Figure 7, used in tandem with Figure 8, demonstrates the discrepancy between previous assumptions that depressions are artifacts that can be negated and the claim herein that depressions can play an appreciable role in hydrology at a range of scales. These depressions are important to consider in regard to agricultural, drainage, flood control, and other decisions. This plot in Figure 7 takes a stepwise form, where after a threshold amount of rainfall excess, a depression is filled, overflows, and begins to contribute downstream to the watershed outlet. While many depressions are filled as rainfall excess is applied, only those depressions that contribute to the selected watershed outlet point are captured in this plot. After 50 mm of applied rainfall excess, less than 20% of the watershed area is connected to the outlet using the 3 m DEM (compared to 70% for the 30 m DEM). After 150 mm of applied rainfall excess, only 60% of the potential watershed area achieves connectivity to the watershed outlet according to the 10 and 3 m DEMs (80% for the 30 m DEM). Finally, a rainfall excess amount of 600 mm is necessary to fill all depressions and achieve 100% watershed connectivity, exceeding the equivalent of a 100 years, 24 h rainfall event, approximately 140 mm according to Hershfield (1961).  Figure 7 can also be used to visualize the runoff versus storage relationship with increasing rainfall excess. The area below the curve can be integrated to compute runoff (as a percentage of the total watershed area in its current form), while the area above the curve represents NOEL ET AL.
10.1029/2020MS002362 10 of 12  the water stored. Furthermore, the shape of the curve shows the rate at which the watershed transitions from storing water to generating runoff. Figure 8 shows the set of depressions produced on the 10 m DEM after applying 0, 25, 75, 150, 300, and 500 mm of rainfall excess. Connectivity, observed as contiguous regions of a particular color, gradually grow in spatial extent with increasing rainfall excess. The areas immediately adjacent to rivers and drainage ditches begin to connect in the early stages of the process (beginning with Figure 8b). As filling continues, this region eventually grows into the full watershed Figure 8. Subcatchment boundaries with straight edges corresponding to interstate roads and highways can be identified, for example, Figure 8e; such features erroneously retain water that is likely drained by culvert and bridge features that have not yet been hydrologically corrected in the elevation data. While the nonadjacent flow connections outlined in this study serve to address these sorts of features, the focus of this study is on the sequential depression-filling algorithm.

Conclusions, Limitations, and Future Work
An algorithm was developed which routes water across a modeled landscape (using the DEM), simulates retention, and fills depressions sequentially given an input rainfall excess amount. Such an algorithm is useful where elevation data reveals surface depressions that are not artifactual and may dramatically affect surface flow patterns as present in much of the Midwestern United States as well as other glaciated topographies the world over. Requiring only a value for rainfall in excess of infiltration provides a straightforward, intuitive parameterization of the algorithm, and should serve to provide a better understanding of the dynamics of potholes and depressional features on existing hydrologic models. In its current form, the algorithm is limited in terms of its ability to handle spatiotemporally vary rainfall events; while the algorithm can be augmented to better handle this, the complexity, intuitiveness, and performance of the algorithm may be impacted. Future work will focus on the integration of this algorithm into traditional hydrologic modeling frameworks to assess the overall hydrologic role of depressions in these landscapes and within agricultural systems in particular. This also means integrating additional model processes such as infiltration, routing, and other abstractions.
End-to-end, the algorithm produced a total run time of under 1 min for a 30 m DEM and about 1.5 h for a 3 m DEM of a 239 km 2 watershed (on a consumer desktop machine). Given a machine equipped with additional system hardware resources, DEMs that are several times larger can be processed. Statewide or nationwide datasets may be viable given a week's time of processing with a small cluster of machines. Assuming a similar frequency of depressions as the study watershed, the land area of the entire state of Indiana (94,322 km 2 ) at 3 m resolution is estimated to take about 5 days to process on a 16 GB machine. Shifting run time toward data initialization prior to filling provides future opportunities to serve the initialized depression parameters, allowing end-use applications to alter flow directions and perform filling to user-specified levels.
Corrugated drainage tiles are buried below the ground surface to drain excess water from heavy, poorly drained soils in much of the Midwestern United States; these tiles also provide an outlet for isolated depressions that produce standing water conditions. Tile inlets, culverts, and other conduits fundamentally alter the depression filling results as well as field-scale flow patterns and should be incorporated to provide accurate flow predictions beyond what can be achieved using only topographic data. Nonadjacent flow connections were implemented and utilized to redirect water after each spillover event, but also provide a means to include the aforementioned drainage features if supplied with this input data set.

Data Availability Statement
Source code and input data to replicate the results found herein are available via the Purdue University Research Repository with DOI 10.4231/ S2B6-J628 (Buckmaster & Noel, 2020