River networks: An analysis of simulating algorithms and graph metrics used to quantify topology

River networks are frequently simulated for use in the development and testing of ecological theory. Currently, two main algorithms are used: stochastic branching networks (SBNs) and optimal channel networks (OCNs). The topology of these simulated networks and ‘real’ rivers is often quantified using graph theoretic metrics; however, to date, there has not been a comprehensive analysis of how these algorithms compare regarding graph theoretic metrics, or an analysis of metric redundancy and variability across dendritic ecological networks. We aim to provide guidance as to which algorithm and metrics should be used, and under what circumstances. We performed an extensive simulation study in which we (a) identified orthogonal sets of metrics that describe the topology of real and simulated river networks, (b) analysed the relationship between algorithm hyper‐parameters and node topology metrics, (c) determined whether simulated and real rivers are indistinguishable in their graph metric scores and (d) examined how patterns of species abundances compare across the three network types. We identified two orthogonal sets of node metrics: those that describe centrality and those that describe neighbourhood characteristics. Both stochastic branching networks and optimal channel networks can reproduce network topology metric scores of real rivers, but this relationship is dependent on the algorithm hyper‐parameters used. Finally, using a metapopulation model, we show that both SBNs and OCNs can reproduce ecological patterns of species abundances similar to those of real rivers. SBNs and OCNs can replicate the node topology of real rivers. The choice of which algorithm to use will depend on the research aims: SBNs are faster to generate and more tractable, whereas OCNs can reproduce a wider variety of the characteristics of real rivers but are more time‐consuming to generate.
When quantifying node topology in river networks, we recommend the orthogonal node metrics eccentricity, when interested in network centrality, and mean neighbour degree, when interested in local node importance.


| INTRODUCTION
River networks exhibit intricate branching topologies and share properties such as critical scaling and self-similarity with other complex systems (Abed-Elmdoust et al., 2017; Albert & Barabási, 2002).
There is theoretical and empirical evidence that network topology is important for maintaining metapopulation stability and influences patterns of biodiversity (Carrara et al., 2012; Fronhofer & Altermatt, 2017; Terui et al., 2018). Graph theoretical approaches have frequently been used to quantify network topology (Dale & Fortin, 2010; Minor & Urban, 2008; Newman, 2018). Of broad interest are metrics that characterise the importance of nodes in the network in terms of the flow of information, material and energy (O'Sullivan, 2014); however, a comprehensive analysis of node metric redundancy and variability across dendritic ecological networks (DENs) has not been conducted. This analysis is necessary to provide guidance as to which metrics are best suited to capturing variability in node characteristics in DENs.
Graph (network) theory provides an elegant framework for unifying and evaluating multiple aspects of network topology and can quantify either structural or functional connectivity (Minor & Urban, 2008). A graph or network is a set of nodes (or vertices) and edges (links), where nodes are the individual elements in the network and edges represent connections between nodes. Edges may be binary (connected or not) or contain additional information (e.g. weights based on distance), and can be directed or undirected.
Likewise, nodes can be represented simply as points or can have additional attributes, such as size or carrying capacity (Newman, 2018).
Graph theory has been used to describe many network types including transportation, social interactions and landscape structure; here, we focus strictly on spatial graphs.
KEYWORDS
dendritic ecological networks, graph metrics, graph theory, network topology, node metrics, river simulation, river topology

FIGURE 1 Graph representations of rivers using (a, d) a stochastic branching network algorithm, (b, e) optimal channel network algorithm and (c, f) generated from a shapefile of a real river. Nodes represent habitat patches. Node colour is scaled by node eccentricity (yellow = higher eccentricity, blue = lower eccentricity).

Frequently used graph-based algorithms designed to simulate river networks fall into three broad categories (Figure 1). Stochastic branching networks (SBNs) depict rivers in the simplest terms (Labonne et al., 2008; Turcotte et al., 1998; Yeakel et al., 2014). This algorithm starts from an initial downstream node (the river mouth); one or two nodes are then iteratively added to a random node in the network with no upstream connections, and the process is repeated until the desired number of nodes is reached (Labonne et al., 2008).
SBNs have been widely used in models of metapopulation dynamics and ecosystem functioning (Anderson & Hayes, 2018; Fagan, 2002; Labonne et al., 2008; Terui et al., 2018). Recently, a second algorithm, optimal channel networks (OCNs), has become popular. OCNs mimic some of the geometric properties of real rivers, such as the distribution of upstream and downstream lengths and total contributing area to a point. The algorithm can also reproduce scaling features of real rivers such as Hack's law (the relationship between the length of streams and the area of their basins) and the scaling of the probability distribution of drainage areas (Rigon et al., 1993; Rinaldo et al., 2014). Finally, networks of real rivers can be extracted from digital elevation models or satellite imagery and converted to adjacency matrices and graph objects. These approaches lie along a tractability-realism gradient: SBNs are easy to generate but are the least realistic (in terms of mimicking some of the geometric properties of 'real' rivers), whereas OCNs are more realistic than SBNs but are computationally more expensive to generate (Figure S1).
The SBN and OCN algorithms can be used to generate many networks with varying and fixed hyper-parameters (inputs supplied to the algorithm), which is desirable for modelling applications where replicate simulations with variable network topologies are required.
Converting real networks to graph objects provides the most realistic representation, but it is not possible to produce a variety of networks with fixed input parameters (e.g. basin size, number of nodes).
Real networks are typically used for place-based experiments and analyses, whereas SBNs and OCNs are more frequently used for more abstract simulation models and theoretical development (Carrara et al., 2012; Erős et al., 2012; Fields et al., 2017; Fronhofer & Altermatt, 2017; Segurado et al., 2013). While OCNs reproduce many of the characteristics of real rivers, how these networks, and SBNs, compare to real rivers regarding network topology metrics has not been evaluated.
A plethora of metrics quantify different aspects of network topology; these metrics have a long history of use in spatial ecology (Bunn et al., 2000; Cantwell & Forman, 1993; Dale & Fortin, 2010; Wagner & Fortin, 2005) and many have been co-opted from other disciplines (Urban & Keitt, 2001). Graph metrics typically focus on the node level (one value per node), the network level (one value per network) or somewhere in between, such as motifs (there are also many edge metrics, which are analogous to node metrics).
Node-level metrics characterise each node in terms of its degree (the number of links the node has to other vertices), how often the node is expected to be traversed when moving between nodes, the number of shortest paths between nodes that cross through a given node and the distance from a node to other nodes; they can also integrate additional information such as edge weights (Newman, 2018). A number of reviews have assessed the application of node importance metrics across a variety of network topologies (e.g. landscape and temporal networks and random graphs; Minor & Urban, 2008; Dale & Fortin, 2010; Bounova & De Weck, 2012; Nicosia et al., 2013), but not on dendritic ecological networks. In many non-DENs, such as social networks or terrestrial habitat networks, each node may be connected to many other nodes and hence there will be multiple pathways between any two nodes. By contrast, in DENs there is typically just one pathway between any pair of nodes (assuming the network is constructed at the basin scale); this characteristic reduces the possible range of some metric scores and renders some topological metrics redundant. Because DENs differ in fundamental ways from the types of networks covered in previous reviews, there is a need to identify a set of metrics that enable concise characterisation of independent aspects of network topology and to help modellers determine the appropriate algorithm and parameterisation for a given problem.
The primary aim of our study was to evaluate how a suite of frequently used network topology metrics vary across a gradient of river representations, ranging from networks produced via a simple stochastic branching algorithm to real rivers.
Specifically, we aimed to:

| Network generation
SBNs were generated using a stochastic branching process. The network generation process started from an initial downstream node (the river mouth). At each iteration, a random node in the network with no upstream connections was selected, and one or two nodes were added upstream of it, depending on a branching probability (p). This process was repeated until a pre-determined number of nodes across the entire network was attained. Because the generation process was stochastic, two networks generated with the same branching probability could differ and, conversely, different branching probabilities could produce identical network structures (i.e. equifinality). The exceptions to this stochasticity are p = 0.0, which always produces a linear (non-branching) network, and p = 1.0, which always produces a bifurcating network.
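The branching process described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `generate_sbn`, the parent-dictionary representation, the reading of p as the probability of adding two nodes rather than one, and the truncation of the final step so that the target size is hit exactly are all assumptions made for the example.

```python
import random


def generate_sbn(n_nodes, p, seed=None):
    """Grow a stochastic branching network (hypothetical helper).

    Returns a dict mapping each node to its downstream neighbour;
    node 0 is the river mouth and has no downstream neighbour (None).
    """
    rng = random.Random(seed)
    parent = {0: None}
    tips = [0]  # nodes that currently have no upstream connections
    while len(parent) < n_nodes:
        # Pick a random tip and remove it from the pool of tips.
        tip = tips.pop(rng.randrange(len(tips)))
        # Assumption: add two nodes with probability p, otherwise one.
        n_new = 2 if rng.random() < p else 1
        # Assumption: truncate the last step so the total equals n_nodes.
        n_new = min(n_new, n_nodes - len(parent))
        for _ in range(n_new):
            node = len(parent)
            parent[node] = tip
            tips.append(node)
    return parent


linear = generate_sbn(10, p=0.0, seed=1)      # always a single chain
bifurcating = generate_sbn(15, p=1.0, seed=1)  # every node has 0 or 2 upstream nodes
```

With p = 0.0 the sketch always returns a chain and with p = 1.0 every node has either zero or two upstream neighbours, which matches the two deterministic cases noted in the text.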
OCNs are oriented spanning trees (built on rectangular lattices) that reproduce some of the scaling features characteristic of real river networks (Rodriguez-Iturbe et al., 2009). The algorithm starts with a feasible network configuration imposed on a lattice landscape, and the OCNs are obtained by minimisation of a function representing total energy dissipated by water flowing through the network spanning the lattice. Links in the initial network are sequentially rewired until total energy expenditure across the network is minimised (Carraro et al., 2020). We simulated OCNs using the R package OCNet version 0.3.2 (Carraro et al., 2020), where the network topology is determined by seven hyper-parameters. Hyper-parameters dim x and dim y determine the size (number of pixels) of the initial lattice; thrA is the threshold value of the drainage area used to derive the aggregated network; and maxL is the maximum reach length allowed (in planar units). If the path length between a channel head and the downstream confluence is higher than maxL, an additional node is inserted into the reach. The shape of the initial network configuration is defined by the parameter state, and temperature (a combination of two parameters, initialNoCoolingPhase and coolingRate) determines the temperature of the simulated annealing algorithm, which is used to derive the final network configuration. Finally, type determines the level at which pixels are aggregated together. Carraro et al. (2020) provide a detailed description of the implementation of the OCN algorithm.
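For orientation, the energy functional and the rewiring rule referred to above are usually written in the OCN literature in the following form. The symbols (H for total energy, A_i for the drainage area contributing to node i, γ for the exponent, T for the annealing temperature) and the commonly quoted value γ = 0.5 are not given in this section; they are stated here as background assumptions, and Carraro et al. (2020) should be consulted for the exact implementation in OCNet.

```latex
% Total energy dissipation of a network configuration s on a lattice of N pixels
H(s) = \sum_{i=1}^{N} A_i^{\gamma}, \qquad \gamma = 0.5 \ \text{(commonly assumed)}

% Usual Metropolis rule for accepting a rewired configuration s' at temperature T
P(\text{accept } s') = \min\left\{ 1,\ \exp\!\left( -\frac{H(s') - H(s)}{T} \right) \right\}
```

Under this rule a rewiring that lowers H is always accepted, and one that raises H is accepted with a probability that shrinks as the temperature is lowered, which is how the temperature hyper-parameters influence the final configuration.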
Graphs of real river networks were generated by clipping a shapefile of global river centrelines with catchment polygons (https://www.hydrosheds.org) ranging in size from 300 to 6,000 km² (given we did not sample the largest rivers, it is possible that extrapolating the results of this study to larger systems will underestimate differences between simulated and real river networks). Each river network was converted to an adjacency matrix with a node placed at all reach terminals and confluences. In all, 150 randomly selected catchments were sampled (Figure S2). The initial 150 sampled rivers consisted of nodes with a degree of 1 or 3, whereas SBNs and OCNs can produce nodes with a degree of 1, 2 or 3 (because these algorithms can discretise a river reach into a sequence of nodes). Thus, to make SBNs, OCNs and real river networks comparable, when we converted shapefiles of real rivers to graph networks, we split reaches and inserted additional nodes so that none of the obtained reaches were longer than a predefined length (med n ). We set med n to 0.5, 1 and 1.5 multiplied by the median network reach length; this yielded a pool of 600 real river networks for analysis (including the 150 networks where no partitioning of reaches was performed). Finally, for all metric calculations, we used unweighted graphs so that we could compare metrics across the three algorithms; metric scores generated from weighted and unweighted networks were highly correlated (Figure S3).
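The reach-splitting rule can be illustrated with a short Python sketch. The text only specifies that no resulting reach may exceed the chosen multiple of the median reach length; the helper names `nodes_to_insert` and `discretise`, and the choice to split each reach into equal-length pieces, are assumptions made for this example.

```python
import math
from statistics import median


def nodes_to_insert(reach_length, max_length):
    """Number of interior (degree-2) nodes needed so no piece exceeds max_length."""
    n_segments = max(1, math.ceil(reach_length / max_length))
    return n_segments - 1


def discretise(reach_lengths, multiplier):
    """Interior nodes to add per reach when the cap is multiplier x median length."""
    max_length = multiplier * median(reach_lengths)
    return [nodes_to_insert(length, max_length) for length in reach_lengths]


# Hypothetical reach lengths (arbitrary planar units); the median is 4.0.
reaches = [2.0, 4.0, 6.0]
print(discretise(reaches, 0.5))  # cap of 2.0 -> [0, 1, 2]
print(discretise(reaches, 1.0))  # cap of 4.0 -> [0, 0, 1]
print(discretise(reaches, 1.5))  # cap of 6.0 -> [0, 0, 0]
```

Smaller multipliers insert more degree-2 nodes, so the same river yields a larger graph with longer shortest paths, which is why the discretisation level matters when comparing metric scores with simulated networks.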

| Metric calculation
We calculated 14 structural topology metrics for every node in each network (Table 1; Figure S4). Each of these metrics provides information about a given node's position in the network (e.g. centrality) or its importance based on the number of connections it has to other nodes (e.g. degree). The 14 metrics we evaluated cover those widely used in the analysis of river networks, as well as some metrics not commonly used in the analysis of rivers (e.g. PageRank) but frequently used in other fields of network analysis. We did not consider metrics that do not focus on node position in the network (e.g. fragmentation metrics). Additionally, we only considered node metrics on simple, unweighted, undirected networks; that is, we did not consider metrics that require additional information, such as species dispersal kernels.

TABLE 1

| Metric | Description | Reference |
| --- | --- | --- |
| Eccentricity | The shortest path distance from the farthest node in the graph | West (1996) |
| Farness centrality | Sum of the length of the shortest paths between the node and all other nodes | Altermatt (2013) |
| Mean neighbour degree | Average degree of neighbouring nodes | Barrat et al. (2004) |
| Degree | Number of edges connected to a given node | Newman (2010) |
| Harary centrality | Reciprocal of the shortest path distance from the farthest other node in the graph (reciprocal of eccentricity) | Hage and Harary (1995) |
| Closeness centrality | Reciprocal of the sum of the length of the shortest paths between the node and all other nodes (reciprocal of the farness) | Newman (2010) |
| PageRank | Based on the number of connected nodes and the number of connections those neighbours have | Newman (2010) |
| Katz centrality | Measures the number of nodes that can be connected through a path, while the contributions of distant nodes are penalised | Newman (2010) |
| Eigenvector centrality | Measures a node's importance (degree) while considering the importance (degree) of its neighbours | Newman (2010) |
| Betweenness centrality | Number of shortest paths that pass through the node | Newman (2010) |
| Harmonic centrality | Reverses the sum and reciprocal operations in the definition of closeness centrality | Marchiori and Latora (2000) |
| Neighbours within (graph diameter × d) distance [a] | Number of nodes within a given distance along the network | West (1996) |

[a] d = 0.05, 0.1 and 0.2.
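To make the distance-based and degree-based definitions concrete, the sketch below computes six of the simpler metrics on an unweighted, undirected network using breadth-first search. It is an illustration under stated assumptions, not the code used in the study: the adjacency-dictionary input, the function names and the four-node toy river are all hypothetical, and the network is assumed to be connected with at least two nodes.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def node_metrics(adj):
    """Per-node metrics for a connected, unweighted, undirected graph."""
    metrics = {}
    for node, neighbours in adj.items():
        dist = bfs_distances(adj, node)
        eccentricity = max(dist.values())  # distance to the farthest node
        farness = sum(dist.values())       # summed distance to all other nodes
        degree = len(neighbours)
        metrics[node] = {
            "eccentricity": eccentricity,
            "farness": farness,
            "harary": 1 / eccentricity,    # reciprocal of eccentricity
            "closeness": 1 / farness,      # reciprocal of farness
            "degree": degree,
            "mean_neighbour_degree": sum(len(adj[v]) for v in neighbours) / degree,
        }
    return metrics


# Toy river: mouth (0), one confluence (1) and two headwaters (2, 3).
toy = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
for node, values in node_metrics(toy).items():
    print(node, values)
```

In the toy river the confluence has the lowest eccentricity (1) and the highest degree (3), while the mouth and headwaters have an eccentricity of 2 and a mean neighbour degree of 3, which shows how centrality and neighbourhood metrics describe different properties of the same node.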

| Statistical analysis
2.3.1 | Aim 1: Identify a set of orthogonal metrics
To identify a set of orthogonal metrics that capture the most information about the topology of DENs, we used principal components analysis (PCA). SBNs and OCNs were generated in a full factorial design across hyper-parameter combinations (Table 2).
Because the network generation algorithms are stochastic, 25 replicate networks were produced per hyper-parameter combination, yielding 5,500 SBNs and 145,800 OCNs.
Once networks were generated, node metrics were calculated and standardised by subtracting the mean and dividing by the standard deviation; this standardisation was calculated for each node × metric matrix prior to the PCA. Each network type (SBN, OCN and real rivers) was analysed independently and PCAs were conducted separately for individual networks (i.e. for the SBNs, 5,500 PCAs were computed). PCAs for each network type were aligned using Procrustes rotation (Peres-Neto & Jackson, 2001) and loadings were used to identify the metrics contributing most to each of the principal components; only principal components that explained more than 10% of variation in node metrics were retained. In the subsequent aims, we focus on the node metrics eccentricity (the maximum distance from a node to any other node) and mean neighbour degree because they had high loadings on principal components one and two and are intuitively understood. Additionally, we illustrate that the explanatory power of these two metrics depends on the processes controlling ecological patterns, and hence the utility of these distinct metrics (Figure S5).
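The standardisation and the first step of a PCA can be sketched in plain Python as follows. This is a simplified stand-in for a full PCA: it extracts only the leading principal component of the metric correlation matrix by power iteration, and it omits the Procrustes alignment entirely. The function names, the dictionary input format and the toy metric values are assumptions made for illustration.

```python
from statistics import mean, stdev


def standardise(columns):
    """Centre each metric on zero and scale it to unit standard deviation."""
    out = {}
    for name, values in columns.items():
        m, s = mean(values), stdev(values)  # assumes every metric varies (s > 0)
        out[name] = [(v - m) / s for v in values]
    return out


def correlation_matrix(z):
    """Correlation matrix of standardised columns (dict of equal-length lists)."""
    names = list(z)
    n = len(z[names[0]])
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in names]
            for i in names]
    return names, corr


def first_component(corr, iterations=200):
    """Leading eigenvector (loadings) and its share of variance, by power iteration."""
    k = len(corr)
    vec = [1.0 + 0.1 * i for i in range(k)]  # arbitrary non-uniform start
    for _ in range(iterations):
        new = [sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in new) ** 0.5
        vec = [x / norm for x in new]
    eigenvalue = sum(vec[i] * sum(corr[i][j] * vec[j] for j in range(k))
                     for i in range(k))
    return eigenvalue / k, vec  # the trace of a correlation matrix equals k


# Hypothetical node x metric values for a four-node network.
node_metric_values = {
    "eccentricity": [2, 1, 2, 2],
    "farness": [5, 3, 5, 5],
    "degree": [1, 3, 1, 1],
}
names, corr = correlation_matrix(standardise(node_metric_values))
share, loadings = first_component(corr)
print(round(share, 3), dict(zip(names, (round(x, 3) for x in loadings))))
```

In this tiny example the three metrics are perfectly (anti)correlated, so the first component explains all of the variance; on real networks the share would be compared against the 10% retention threshold described above.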

| Aim 3: Similarity of simulated networks and real rivers
We compared node topology metrics for real rivers to those from artificial networks (SBNs and OCNs) to evaluate whether the artificial networks can adequately replicate metric patterns of real rivers. Given the computational expense of generating many large OCNs, we restricted our analysis to networks of up to 1,000 nodes. For real rivers, we used the networks used to address aim 1, resulting in 541 networks after dropping those larger than 1,000 nodes. SBNs were generated in a full factorial design across hyper-parameter combinations ( […]

[…] from that of the real river network. We assumed that local population growth followed a discrete-time stochastic logistic model in each patch:

(1) […]

where d i,j is the along-network distance (i.e. number of nodes; links were not weighted) between patches i and j, and δ determines the shape of the dispersal kernel.

To address aim 4, a real river network with 55 nodes was sampled from the 600 networks generated to address aim 1. Two OCNs and two SBNs were then generated; one of each was similar to the real network regarding the distribution of node eccentricity scores, and one of each was dissimilar (see Figure S8 for an analysis using mean neighbour degree). Hyper-parameters used to […]

(2) 1 e − d i,j

TABLE 2 Hyper-parameter values used to generate SBNs and OCNs that were subsequently used to analyse node metrics. In all, 25 replicate networks were generated for each hyper-parameter combination of SBNs and OCNs.

| Aim 1: Identify a set of orthogonal metrics
The variation in node topology explained by principal components analysis was relatively consistent across SBNs, OCNs and real rivers, as were the loadings (Figure 2). For all network types, PC one was dominated by eccentricity and farness centrality, while PC two was dominated by mean neighbour degree, degree, Harary centrality, closeness centrality, PageRank and Katz centrality. Eccentricity and farness centrality are distance-based metrics calculated using the distance from the target node to (a) the farthest node (eccentricity) or (b) all other nodes (farness centrality). The majority of metrics dominating PC two are associated with node degree and, therefore, with the number of local connections rather than centrality.

| Aim 2: Relationship between algorithm hyperparameters and node metrics
For SBNs, ordination of metric scores revealed clear structuring for the hyper-parameters n nodes (total number of nodes) and p (branching probability). As n nodes increased, eccentricity increased […]

[…] Metric scores for the hyper-parameter maxL exhibited clustering, but this was not consistent with the gradients associated with either mean neighbour degree or eccentricity. It is likely that, for maxL and the other hyper-parameters, interactions between hyper-parameters obscure the patterns (Figure 4).

| Aim 3: Similarity of simulated networks and real rivers
SBNs were able to reproduce eccentricity scores of real rivers, regardless of the real river discretisation method (med n ) used, or the size of the network. The ability of SBNs to reproduce eccentricity scores of real rivers depended on the hyper-parameter p, with medium values (p = 0.5) replicating eccentricity scores of non-discretised real rivers and the required value of p decreasing as med n decreased (Figure 5).
Values of p below 0.1 and greater than 0.6 resulted in eccentricity scores that departed substantially from those of the real rivers (Figure 5; Figure S10). However, the eccentricity values derived from real river networks were obtained for arbitrarily selected values of med n , and further increasing or decreasing med n will change these results. For mean neighbour degree, the SBNs could reproduce metric scores of real rivers when p was greater than 0.1 (Figure 5).
OCNs could also reproduce the eccentricity scores of real rivers, but the relationship was more complicated given the large number (7) of hyper-parameters required to generate OCNs. The hyper-parameter type was the main determinant of the similarity between OCNs and real networks in terms of their eccentricity: type = RN resulted in networks similar to real networks with discretisation method med n = 0.5 (Figures S11-S16), while OCNs generated with type = AG were similar to real rivers with either no discretisation or discretisation method med n = 1.5 (Figure 6; Figures S11-S16), although this relationship was dependent on maxL and thrA. Hyper-parameter thrA was the next most important; networks generated with high thrA values were typically similar to real rivers with a discretisation method of med n = 1.5 or none, and as thrA decreased, eccentricity scores became more similar to those of networks with med n = 0.5. When standardised thrA was less than 0.1, OCNs were not similar to real rivers, suggesting low values of thrA […]

[…] Figures S7, S8, S10-S16).

FIGURE 3 nMDS ordination plots generated using 14 standardised node metric scores from 5,500 SBNs, coloured by the hyper-parameters n nodes (number of nodes in the network) and p (branching probability), and the metrics eccentricity and mean neighbour degree.

FIGURE 4 nMDS ordination plots generated using 14 standardised node metric scores from 24,300 OCNs, coloured by the hyper-parameters dim y (determines the size of the network), thrA (determines drainage area), state and temp (determine the temperature of the simulated annealing algorithm) and type (level at which pixels are aggregated), and the metrics eccentricity and mean neighbour degree. Significantly related hyper-parameter vectors are shown as arrows (vector indicates direction and strength of association) and factors as labelled points, located at group centroids. Stress: 0.016

Different metrics characterise different components of network topology.
We recommend eccentricity and/or farness metrics be used to quantify node centrality and either mean neighbour degree or degree be used to quantify local node importance in DENs. Given the increasing use of graph-theoretic approaches to analyse river networks, it is important to identify how commonly used metrics respond to a variety of network topologies. Because of the constrained nature of rivers' topology, many metrics show little variability across nodes. This reduced variability is driven by (i) the low values and range in degree of nodes in DENs, which range from 1 to 3 and (ii) there only ever being one unique pathway between any pair of nodes in DENs (exceptions to this assumption are braided rivers and for species capable of overland dispersal, not considered here), which is important because many metrics are fundamentally derived from either the number of times a node is traversed or internode distances (Newman et al., 2001). Many metrics are calculated using the same limited set of network characteristics (degree, distance, path traversal); thus, there is considerable redundancy among metrics. This redundancy is exacerbated by the topologically constrained nature of DENs. To reduce this redundancy, we identified orthogonal sets of metrics. Our analysis suggests metrics can be split into two broad groups: (1) those that describe a node's position in the network (node centrality) and (2) those that describe the number of local connections (node importance). Eccentricity and farness were strongly associated with PC 1, which explained a large (>50%) amount of variation in node variability; these metrics are well suited to identifying nodes most vulnerable to fragmentation and keystone patches that link the wider network (Sarker et al., 2019).

| Aim 4: Metapopulation abundance related to node eccentricity
Mean neighbour degree and degree were two of the preferred node importance metrics; these metrics are determined by the number of connections a node and its neighbours have and describe how locally connected a node is rather than the node's overall position in the network (à la centrality metrics). Importance metrics are useful for identifying locally well-connected patches and sites for invasive species control or targets for restoration (Drake et al., 2017; Ferrari et al., 2014; Perry et al., 2017).
When interrogating the influence of topology on ecological processes and their emergent patterns, it is useful to be able to generate a gradient of network structures spanning the breadth of possible network configurations. Ideally, a clear relationship would exist between algorithm hyper-parameters and the resulting network metrics. For both SBNs and OCNs, network size was most associated with consistent shifts in metric scores; relationships with other hyper-parameters were mixed. Aside from network size, for SBNs the hyper-parameter branching probability (p) had a predictable effect on node metrics. In particular, increasing p resulted in increasing mean neighbour degree, suggesting networks become more 'branchy'. Other than the size of the lattice used, when generating OCNs, aggregation level (type) and thrA were the hyper-parameters with the most predictable effect on node metric scores; pixel aggregation at the river network level (type = RN) resulted in nodes with lower mean neighbour degree scores (i.e. nodes having fewer local connections), compared to type = AG. The lack of clear association between other OCN hyper-parameters and node metric scores makes generating networks with predictable node metric scores difficult using OCN algorithms. There is no one-size-fits-all […]

FIGURE 5 Relationship between network size (number of nodes) and graph metrics eccentricity and mean neighbour degree, comparing SBNs (coloured points) to real rivers (grey shapes represent the four discretisation methods used to convert shapefiles of real rivers). Note that p = 0.0 is not shown because it was an outlier; including it compressed the rest of the points to the extent that they were uninterpretable. See Figure S9 for a graph with p = 0.0 included.

Centrality and node importance metrics measure different aspects of network topology, and they can be used in conjunction to evaluate different hypotheses.
This use of different metrics was highlighted by our demonstration of how local abundance can be predicted by node metrics in a metapopulation model (Figure S3).
In our analysis, a local node importance metric (degree) better predicted abundance when dispersal was local; however, when dispersal was wide ranging, eccentricity was a better predictor. This analysis highlights how the (spatial) extent of the process of interest can be matched by topology metrics that operate at similar scales.
We recommend that authors explicitly state whether they are interested in node centrality or node importance (or both) and use the metrics highlighted above.

| CONCLUSIONS
Riverine ecosystems are spatially structured habitats in which network topology shapes many biotic and abiotic processes. The hierarchical and branching structure of river networks can help to explain patterns of diversity, metapopulation stability and processes affecting productivity in riverine ecosystems (Helton et al., 2018; Mari et al., 2014; Muneepeerakul et al., 2019). Graph theory provides a convenient way to represent river networks and a host of easy-to-calculate metrics for quantifying the relative importance of nodes based on their position in the network. Because of the topologically constrained nature of river networks, there are fewer suitable metrics available than for other network types (e.g. terrestrial habitat networks, social networks). Eccentricity and farness centrality are recommended for capturing node centrality, and mean neighbour degree or degree are recommended for identifying well-connected nodes. In terms of generating graph-based representations of rivers (e.g. for simulation exercises), all three methods we evaluated have their limitations, but OCNs represent a balance between ease of generation and realism.
Many questions remain about how spatial node characteristics relate to ecological patterns and processes; identifying a suite of topological metrics that adequately describe these systems will help to answer these questions. Finally, the metapopulation modelling results presented here provide some concrete predictions about how ecological patterns may relate to network topology metrics. Empirically testing these hypotheses should be the focus of future research.

ACKNOWLEDGEMENTS
The authors wish to thank two anonymous reviewers for providing detailed, constructive feedback that has greatly improved this manuscript. F.L. was supported by a University of Auckland Doctoral

CONFLICT OF INTEREST
The authors have no conflict of interest to declare.

AUTHORS' CONTRIBUTIONS
All authors conceived the ideas and designed the methodology; F.L.
compiled the data, developed the code, performed the data analysis and led the writing of the manuscript. All authors contributed critically to the drafts and gave final approval for publication.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/2041-210X.13854.

DATA AVAILABILITY STATEMENT
The code and associated data are archived on Zenodo .