A theory for the relationship between lake surface area and maximum depth

Maximum depth is crucial for many lake processes and biota, but attempts to explain its variation have achieved little predictive power. In this paper, we describe the probability distribution of maximum depths based on recent developments in the theory of fractal Brownian motions. The theoretical distribution is right‐tailed and adequately captures variations in maximum depth in a dataset of 8164 lakes (maximum depths 0.1–135 m) from the northeastern United States. Maximum depth increases with surface area, but with substantial random variation—the 95% prediction interval spans more than an order of magnitude for lakes with any specific surface area. Our results explain the observed variability in lake maximum depths, capture the link between topographic characteristics and lake bathymetry, and provide a means to upscale maximum depth‐dependent processes, which we illustrate by upscaling the diffusive flux of methane from northern lakes to the atmosphere.

Maximum depth varies among lakes between $ 0.1 and 1741 m (Herdendorf 1982;Sobek et al. 2011). This variation engenders patterns of diverse ecosystem characteristics including mixing and thermal stratification (Lewis 1983), the relative sizes of littoral and pelagic habitats (Seekell et al. 2021a, b), and carbon cycling including methane evasion (Li et al. 2020). However, there is a paucity of bathymetric data relative to the global abundance of lakes, and empirical relationships that relate maximum depth to lake and landscape characteristics consistently fail to develop sufficient predictive power to estimate maximum depth in lakes that have not been depth sounded (Seekell 2018). In particular, there is need to develop scaling relationships that relate maximum depth to lake characteristics that are easily measured across broad geographic regions, such as surface area (e.g., Cael et al. 2017;Seekell et al. 2021a,b). Such relationships provide the simple rules used to generalize understanding of aquatic ecosystem patterns and processes at regional to global scales (Downing 2009).
Bathymetric surveys are time consuming, and prohibitively expensive for large numbers of lakes (Seekell 2018;Hollister et al. 2011). While global perspectives have come to dominate the aquatic sciences over the last 20 years, the proportion of lakes depth sounded has remained low, preventing upscaling of empirical results from local to global scales (Seekell 2018;Oliver et al. 2016;Downing 2014). This has spurred a series of empirical studies seeking to predifct lake-specific maximum depth based on surface area and other easily mapped characteristic, typically some metric of vertical relief within a buffer zone around each lake (e.g., Minns et al. 2008;Hollister et al. 2011;Sobek et al. 2011;Heatcote et al. 2015;Messsager et al. 2016;Oliver et al. 2016). These studies assume that larger lakes should be deeper than smaller lakes, and that integrative measures of topography (i.e., variance, slope) should relate to lake bathymetry. However, these might not be reliable assumptions. For example, while the deepest lakes all have large surface areas (≥ 500 km 2 ), many lakes with large surface areas are remarkably shallow, often only a few meters deep (Herdendorf 1982). Furthermore, correlations between surface area and maximum depth are often weak. For example, Minns et al. (2008) found a Pearson correlation between the logarithms of surface area and maximum depth for Canadian lakes of r = 0.46. Based on this correlation, the probability of a larger lake being deeper than a smaller lake is only slightly better (p = 0.65) than a fair coin flip (p = 0.5), if the two lakes are selected at random (Dunlap 1994).
In addition, it is not clear that topography should predict maximum depth. Maximum depth is essentially an extreme along a combined random topographic-bathymetric profile (Seekell 2018). While integrative measures of topography may be able to predict integrative measures of bathymetry, there is no clear reason to believe that they should accurately predict the value of a random extreme (Seekell 2018). The consequence of this is highly uncertain predictions that preclude the application of these equations for upscaling. For example, the error ratio (Carpenter et al. 1991) for maximum depth predictions for Canadian lakes is approximately 2 in Minns et al. (2008). This means that a lake predicted to have a maximum depth of 10 m will have a maximum depth between 5 and 20 m (Carpenter et al. 1991). Error ratios can be even higher, such as in Heathcote et al. (2015). Appreciable prediction errors occur across the entire size spectra of lakes (Carpenter et al. 1991;Minns et al. 2008;Heatcote et al. 2015).
In this paper, we describe the theoretical basis for predicting lake maximum depth from surface area. This theory is based on the characteristics of fractional Brownian motions, which approximate key features of Earth's topography and are the basis for other lake scaling relationships including mean depth and volume (e.g., Goodchild 1988;Seekell et al. 2013;Cael et al. 2017;Seekell et al. 2021a,b). Specifically, we describe how recent developments generalizing the arcsine laws from Brownian motions to fractional Brownian motions relate to the problem of predicting maximum depths Wiese 2015, 2016) by considering lakes as resting on a fractal Brownian surface, as in Fig. 1. We test goodness-of-fit of the theoretical distribution derived based on these theoretical developments to a bathymetry database with more than 8000 lakes. Finally, we demonstrate how this distribution can be applied to advance understanding of global lake characteristics by upscaling diffusive methane flux from temperate lakes to the atmosphere, a process that has previously been shown to correlate with maximum depth but not surface area, and hence a process that is difficult to upscale without maximum depth data or an approach such as the one presented here (Li et al. 2020). Collectively, these results both advance fundamental understanding of patterns of lake morphometry, and 4. This path is taken to be an analog of a transect along a landscape; lakes fill in sections along this transect, which thus sets their maximum depth. The lake surface transect L is proportional to the square-root of the lake surface area ffiffiffi a p ; the maximum depth z is the minimum of the fractal Brownian motion path beneath this transect.
provide tools for upscaling depth dependent processes when seeking to place lakes within a global context.

Theory
Earth's topography is approximately scale-invariant (Turcotte and Huang 1995). This is self-evident for many landforms including lakes. For example, it would be difficult or impossible to determine if a lake is 1 or 1000 ha on a map or image without a scale reference (e.g., a scale bar on a map or a house on an aerial image) because the key characteristics of lake geometry are similar across wide ranges of scales (Seekell et al. 2013;Seekell 2018;Seekell et al. 2021a,b). In particular, Earth's topography is self-affine, which implies different scaling in the vertical and horizontal directions (Turcotte and Huang 1995). This is also self-evident; when walking or driving away from a mountain range, the profile flattens more rapidly than it compresses horizontally (Seekell 2018 H takes values in the range (0, 1). When H ¼ 1 2 , transects across Z have the statistical properties of a one-dimensional Brownian motion; when H > 1 2 ( < 1 2 ), surfaces are smoother (rougher) than Brownian trajectories (e.g., Goodchild 1988). These characteristics engender the derivation of empirically robust and theoretically sound power-law relationships between lake surface area and various aspects of lake morphometry including abundance, volume, mean depth, perimeter, and hydrological connectivity (Seekell et al. 2013;Cael et al. 2017;Seekell et al. 2021a,b). However, to our knowledge, the relationship between maximum depth and surface area has not been described from a fractal perspective (Seekell 2018).
When topography approximates a Brownian motion (H ¼ 1 2 ), the maximum depth over a given interval is described by the third arcsine law (Delorme and Wiese 2015). Essentially, because the maximum depth is a single random displacement, that is, an extreme on a random topographic profile, the arcsine law shows that maximum depth converges as a probability distribution based on surface area. Recently, this result has been generalized to cases where H ≠ 1 2 Wiese 2015, 2016). This generalization allows application of the arcsine law to predict the distribution of lake maximum depth p(z) based on surface area and the Hurst coefficient. Unfortunately, the mathematical form of p(z) is complex. First, define the normalized maximum depth and define the probability distribution in terms of y, that is, p (y) (later we recast this in terms of z as well). Doing so, where 1 ffiffi 2 p L H is a normalization constant that ensures the total probability integrates to one, and f(y) has the form: and G y ð Þ is the complex expression: where F is the hypergeometric function, erfi is the imaginary error function, and γ E is the Euler-Mascheroni constant Wiese 2015, 2016).
To date, the arcsine laws for fractal Brownian motions are only proven for the one-dimensional case. Therefore, lake areas must be recast as one-dimensional lengthscales L to empirically test the theoretical depth distribution. The lengthscale L is in a sense the length of a randomly chosen horizontal transect along a lake's bathymetry, which includes the lake's maximum depth (solid black line in Fig. 1). As surface area a (m 2 ) is the fundamental horizontal metric for lakes, it would be preferable to cast L in terms of a; from unit considerations one necessarily must have L / ffiffiffi a p . However, the coefficient relating the two is not easily derived from first principles because lake surfaces often have complex shapes, and their bathymetry can also be complex, including multiple basins and/or having maximum depths far from their centers. We, therefore, introduce the free parameter ℓ that allows us to relate L to a according to L ¼ ℓ ffiffiffi a p . ℓ should be no larger than what it would be for a circular lake with its maximum depth at the center, that is, ℓ ¼ ffiffiffiffiffiffiffiffi 2=π p ≈ 0:8, but is likely much smaller as lakes take myriad shapes that are often very far from circular, and their maximum depths do not have to be in their center (Stachelek et al. 2022). To compare the theoretical distribution to observed depths, it is further useful to use a normalized maximum depth y ¼ z= ffiffiffi 2 p L H and define the probability distribution in terms of y, that is, p(y) (that can later be recast in terms of z). Thus in terms of maximum depth and area, normalized maximum depth is: This allows an overall test of goodness-of-fit that is needed because of difficulty finding large numbers of lakes with identical surface areas.

Empirical test
We tested goodness-of-fit of normalized maximum depths to the theoretical distribution using the publicly available LAGOS database, which includes maximum depths for N = 8164 lakes in the northeastern United States (Oliver et al. 2015(Oliver et al. , 2016 (accessed 28 August 2021). We selected this database because it has been extensively documented and has previously been utilized for developing and testing predictive models for lake maximum depth (Oliver et al. 2016). Maximum depth spans more than three orders of magnitude among these lakes, from z = 0.1 m to z = 135 m while surface area spans from a = 4 ha to a = 989 ha. This database does not distinguish between lakes and reservoirs.
We compare the theoretical and empirical depth distributions using the Kolmogorov-Smirnov statistic D ¼ max P y ð ÞÀE y ð Þ ð Þ , where P(E) is the theoretical (empirical) cumulative distribution function for y (Stephens 1974). For a given value of H and ℓ, we calculate y for each lake, then E(y) from the empirical distribution of y values, then p(y) based on that H value, then P(y) by integrating p(y), then finally D by comparing E(y) and P(y). First, we set H = 0.4 as an estimated value for Earth's topography (Mark and Aronson 1984;Dodds and Rothman 2000), which has also been shown to capture global scale relationships between mean depth and surface area (Cael et al. 2017). We then select ℓ by systematically varying ℓ and repeating this procedure until D is minimized. We also test the best-fit value of H by systematically varying both H and ℓ until D is minimized. Finally, because small lakes are often underrepresented in lake databases (Downing 2009;Stanley et al. 2019) and are prone larger relative errors simply by being shallower on average, we repeat this process but only considering the upper half of the distribution, that is, using a modified D alt ¼ max P y > median y ð Þ ð Þ À E y > median y ð Þ ð Þ ð Þ (Seekell 2018). This tests the goodness-of-fit of the larger well mapped lakes to the upper half of the theoretical distribution. Figure 2 shows that the theoretical distribution p(z) adequately captures the empirical distribution of maximum depth when H = 0.4 (D = 0.033). The distribution is rightskewed and unimodal, with few lakes having a y ≈ 0, many lakes having a y ≈ 0.5, and a heavy tail of relatively deep lakes having a y (1, 4). The best-fitting ℓ value was 0.17, substantially lower than the theoretical limit of ffiffiffiffiffiffiffiffi 2=π p ; this presumably reflects the extent to which these lakes are not circular and their maximum depths are not located in their centers. Furthermore, when H is left as a free parameter rather than externally set by topographic considerations, the best-fitting H = 0.38 is negligibly different to H = 0.4 ( Fig. A1; D = 0.027; ℓ also changes only slightly, from 0.17 to 0.18).

Results
This distribution p(y) can be recast as a distribution for maximum depth z for lakes of a given area a as shown in Fig. 3; for a lake with a = 1 ha this corresponds to a median maximum depth of 3.9 m and a 95% confidence interval of (0.4 m, 10.5 m). This wide range of predicted depths arises naturally from random topographic variations, and is consistent with the ranges of variability observed in empirical datasets (Fig. 3). In this context of a particular a value, the difference between the theoretical and empirical distributions can be most easily understood by comparing the percentiles of each, which underscores that the correspondence is indeed quite good. If we treat the empirical and theoretical y distributions as if they were maximum depth distributions for lakes with a = 1 km 2 , the 1 st -91 st percentiles of the theoretical and empirical distributions for maximum depth agree within 0.20 m. Differences of the 92 nd -99 th percentiles are larger at 0.25-1.40 m, but these lakes have deeper maximum   3. Lake surface areas a and maximum depths z, with percentiles of the probability distribution for z (from Fig. 2, for a given a) superimposed.

Discussion
Scaling relationships provide simple rules for understanding hydrographic patterns at regional and global scales. Our study contributes to this understanding by describing the relationship between surface area and maximum depth. Prior work has primarily focused on developing empirical relationships with multiple linear regression or similar methods (e.g., Sobek et al. 2011;Heatcote et al. 2015;Oliver et al. 2016), whereas our study provides a rigorous theoretical perspective which is very general, as it is based on fractal topographic theory, while also accurately characterizing the data we use to test it. Lake area and the Hurst coefficient are the key factors controlling lake depth, with a large stochastic component. These factors can be measured without bathymetric surveys, and hence our results can be applied to existing hydrographic and topographic data sets to estimate characteristics of lake across broad geographic extents including littoral habitat size, carbon burial, or greenhouse gas evasion (Li et al. 2020;Seekell et al. 2021a,b).
Theoretical lake morphometry, including our study, primarily derives from the assumption that Earth's topography approximates a fractal Brownian surface (Goodchild 1988;Seekell et al. 2013;Cael et al. 2017). This assumption is imperfect because (1) it implies that the landscape is a static surface whereas real landscapes are dynamic and evolve over time, (2) it implies scale-invariance whereas real landscapes are shaped by scale dependent processes, and (3) it implies that lakes are formed by flooding pre-existing landscape depressions, which is often not the case (Goodchild 1988;Timms 1992). Within this context, linear regression analyses based on landscape characteristics are trying to leverage the differences between real and random surfaces to improve maximum depth predictions. However, the low explanatory power of such analyses, and the general consistency between theoretical predictions and empirical patterns, indicates that the imperfect assumptions of the fractal Brownian motion do not materially detract from their application to lake morphometry. Hence, fractional Brownian motion is an effective starting point for developing predictions about the global characteristics of lakes.
The overall fit is remarkably good in our analysis. However, discrepancies do arise between the data and theory, for several potential reasons. The number of medium-depth lakes is slightly overestimated and the number of small lakes slightly underestimated by the theoretical distribution. Whether H is set by external topographic considerations (H = 0.4) or estimated from the lake y distribution (H = 0.38), there are some discrepancies between the theoretical depth distribution and observations. Specifically, at the lower end of the distribution, with the theory slightly overestimating the number of medium-depth lakes (y [1, 2]), underestimating the number of shallow lakes (y 1 4 ,1 À Á ), and overestimating the number of very shallow lakes (y < 1 4 ). The general ontongeny of lakes is decreased depth over time due to sedimentation (Seekell et al. 2021a,b). Empirical deviations from the theoretical depth distribution are consistent with this ontogeny, specifically very shallow lakes are rarer than expected, probably because they have transitioned from lake to wetland or terrestrial ecosystem. In addition, the accentuated peak for shallow lakes is consistent with patterns expected due sedimentation.
There are several other factors that can contribute to discrepancies between theory and empirical patterns, primarily related to the collection of bathymetric data. First, large lake databases typically contain samples biased to certain lake characteristics, and the values in the data we analyzed may not be completely representative of the true maximum depth distribution (Seekell 2018;Seekell et al. 2021a,b). Second, maximum depths are often only reported to one decimal place, which can lead to significant rounding errors for shallow lakes. This is clearly visible in Fig. 3, where there is a regular patterning of shallow maximum depths, but not deep maximum depths (i.e., equal spacing among points along the ordinate). Finally, very shallow lakes may be misclassified as wetlands and therefore not included in bathymetric databases. Collectively, these factors most strongly impact small and shallow lakes. This observation is consistent with the excellent fit between theoretical and empirical distributions for deep lakes and somewhat weaker fit across the whole depth distribution, which can be interpreted as caused by larger relative errors in the shallower lakes' maximum depths. Specifically, when we fit only the upper-half of the y distribution (i.e., using D alt above; Fig. A2); the goodness-of-fit was excellent (D alt = 0.012). Advancing global limnology relies on developing robust probability distributions for lake characteristics (Downing 2009). The variety of factors causing discrepancies between the theoretical and observed distributions highlight the challenges faced in characterizing these distributions, even in ideal cases like this where theory provides clear guidance on the appropriate distribution.
To demonstrate the utility of our approach, we apply it to the example of estimating evasion of diffuse methane flux from temperate lakes to the atmosphere. Note that our approach is useful for predicting characteristics of collections of lakes, but not individual lakes. The only accurate way to measure morphometry for individual lakes remains the detailed bathymetric survey. However, many urgent questions in the aquatic sciences are framed in a global perspective where the characteristics of large populations are of specific interest (Downing 2009(Downing , 2014; our approach is well-suited to these questions. The characteristics of small numbers of lakes are typically upscaled based on abundance-area distributions to estimate population level characteristics (Downing 2009;Seekell et al. 2018). This poses a problem for ecosystem characteristics that are closely tied to maximum depth but not surface area.
Diffusive methane flux from temperate lakes to the atmosphere is one example of these characteristics (Li et al. 2020). Methane is a potent greenhouse gas and estimating evasion from lakes to the atmosphere is a priority for understanding contributions of lakes to the global carbon cycle. A recent synthesis of diffusive flux measurements for temperate lakes revealed a statistically significant inverse relationship with maximum depth (p < 0.001), but no significant relationship with surface area (p > 0.05) (Li et al. 2020). We applied the empirical relationship for diffusive methane flux from (Li et al. 2020) to the full database of lakes from 17 northeastern United States for which surface areas are available (N = 141, 265) by randomly simulating maximum depths for these according to the distribution above (with H = 0.4, and ℓ = 0.17 estimated from the 8164 lakes from these same 17 states for which maximum depth measurements were available; Fig. 2) and then calculating diffusive methane flux for each lake as a function of surface area and simulated maximum depth. We repeated this process many times to estimate uncertainty due to the randomness associated with simulating maximum depths via a probability distribution.
Using this approach to estimate methane diffusion, we find an estimated flux of 0.5 AE 0.04 Tg CH 4 yr À1 from these lakes. This is a substantially different estimate than if one takes a simple average rate per unit area across all lakes because the relationship between methane flux and maximum depth is nonlinear (Li et al. 2020) and this average will be dominated by lakes with large surfaces, deep maximum depths, and low rates of methane flux per unit area. One may substantially underestimate overall methane flux as a result; our results thus represent an important methodological advancement for upscaling lake characteristics correlated with depth but not surface areas. For instance, multiplying the average of 0.9 g CH 4 m À2 yr À1 from (Li et al. 2020) by the total surface area of the 141,265 lakes above results in an estimate of 0.03 Tg CH 4 yr À1 .
Our study highlights the far reaching influence of the Hurst coefficient on global scale lake characteristics. Specifically, the differences between horizontal and vertical scaling described by the Hurst coefficient underlie differences in characteristics between large and small lakes-small lakes are typically deeper relative to their surface area compared to large lakes which has implications for energy and carbon budgets across the lake size spectra. In addition, the Hurst coefficient is involved in most other lake scaling relationships, including for abundance, perimeter, volume, and hydrologic connectivity (Goodchild 1988;Seekell et al. 2013;Cael et al. 2017;Seekell et al. 2021a,b). Despite this, empirical measurements of the Hurst coefficient for Earth's topography and bathymetry are relatively rare and highly variable (H = 0.4-0.7) (Goodchild 1988;Weissel et al. 1994;Gagnon et al. 2006). Developing such measurements should be an important priority for advancing global scale understanding of lakes. These measurements could explain variations in scaling relationships among regions with different lake basin forming processes, as well as improve the precision of predictions by reducing uncertainty in parameterization.