### Abstract

- Top of page
- Abstract
- 1 Introduction
- 2 Conceptual Foundation for Lake Size-Distributions in Fractal Geometry
- 3 Empirical Analysis
- 4 Discussion
- Acknowledgments
- References
- Supporting Information

[1] The abundance and size distribution of lakes are critical to assessing the role of lakes in regional and global biogeochemical processes. Lakes are fractal but do not always conform to the power law size-distribution typically associated with fractal geographical features. Here, we evaluate the fractal geometry of lakes with the goal of explaining apparently inconsistent observations of power law and non–power law lake size-distributions. The power law size-distribution is a special case that applies to lakes near the mean elevation. Lakes in flat regions are power law distributed, while lakes in mountainous regions deviate from power law distributions. Empirical analyses of lake size data sets from the Adirondack Mountains in New York and the flat island of Gotland in Sweden support this finding. Our approach provides a unifying framework for lake size-distributions, indicates that small lakes cannot dominate total lake surface area, and underscores the importance of regional hypsometry in influencing lake size-distributions.

### 1 Introduction

[2] How many lakes are there, and how big are they? This is one of the most fundamental questions when assessing the roles of lakes in regional and global biogeochemical cycling. Small lakes are generally not recorded on maps, and even the best compilations of global lake data are thought to greatly underestimate the abundance and surface area of small lakes [*Meybeck*, 1995; *Lehner and Döll*, 2004; *Downing et al*., 2006]. Consequently, the abundance and surface area of small, unrecorded lakes are typically estimated by extrapolating from power law size-distributions [*Downing et al*., 2006]. Analyses based on this methodology have suggested that lakes cover a much greater portion of Earth's land surface (~3%) than previously believed, that lakes store substantial amounts of carbon in their sediments (up to 820 Pg C), and that greenhouse gas emissions from lakes may almost completely offset the terrestrial carbon sink [e.g., *Cole et al*., 2007; *Tranvik et al*., 2009; *Bastviken et al*., 2011]. Because of their abundance and high biogeochemical rates, small lakes appear to play a large role in carbon emission and sequestration [*Wetzel*, 1990; *Downing*, 2010].

[3] There are two principal lines of evidence in support of extrapolation based on a power law lake size-distribution. First, lakes are fractals, meaning that the apparent convolutedness of their shorelines depends on the scale at which they are examined [*Goodchild*, 1988; *Hamilton et al*., 1992]. Fractal geological features, like lakes, typically conform to a power law size-distribution [*Mandelbrot*, 1983]. Second, linear regressions on log-abundance log-size plots typically have high *r*^{2} values, a pattern consistent with power law-distributed data [*Downing et al*., 2006; *Seekell and Pace*, 2011]. However, many size distributions, not only power laws, yield high *r*^{2} values on log-abundance log-size plots when small values are excluded (i.e., when lower size limits are truncated because of the uncertain accuracy of observations at these lake sizes). Some high-resolution lake size data sets that accurately observe small lakes have low *r*^{2} values and deviate considerably from the power law distribution, potentially indicating that power law distributions overestimate the abundance of small lakes by orders of magnitude [*Meybeck*, 1995; *Seekell and Pace*, 2011]. For instance, a recent lake census for the United States found that the power law distribution does not adequately describe the size distribution of lakes in some regions and that early extrapolations based on the power law distribution may have overestimated the global abundance of lakes by 240 million [*McDonald et al*., 2012]. These differences in estimates of lake abundance are significant. For example, *Lewis* [2011] compared estimates of global gross primary production of lakes based on power law and non–power law lake size-distributions; the power law-based estimate was 45% larger than the estimate based on an alternate non–power law distribution. Hence, the size distribution of lakes is poorly constrained, yet understanding lake size-distributions is critical to evaluating the role of lakes in regional and global biogeochemical cycles [*Tranvik et al*., 2009].

[4] Analyses of lake size-distributions [e.g., *Hamilton et al*., 1992; *Downing et al*., 2006; *Seekell and Pace*, 2011; *McDonald et al*., 2012] are limited, and there is a critical lack of theory from which to derive testable hypotheses to guide new developments in global-scale limnological analyses. Here, we consider lake size-distributions in a fractal geometry framework, with the goal of resolving inconsistent observations of power law and non–power law lake size-distributions. We specifically focus on the role of regional hypsometry (area-elevation relationships) in shaping lake size-distributions. We evaluate our findings with analyses of lake size-distributions from mountainous and flat regions.

### 2 Conceptual Foundation for Lake Size-Distributions in Fractal Geometry

[5] A lake shoreline is equivalent to a contour line for the lake surface elevation. Consequently, fractal geometry theory developed for topographic contour lines is applicable to the analysis of lake shorelines. Here, we synthesize evidence from several studies of the fractal geometry of contour lines that are relevant to lakes and show that apparently conflicting reports of both power law and non–power law lake size-distributions [*Downing et al*., 2006; *Seekell and Pace*, 2011] are consistent with expectations from fractal geometry.

[6] If we approximate a landscape with a fractal surface [*Mandelbrot*, 1975, 1983; *Russ*, 1994] and intersect the landscape with a horizontal plane at the mean landscape elevation, the points where the fractal surface returns to the horizontal plane form a fractal known as a random Cantor set [*Matsushita et al*., 1991; *Russ*, 1994]. The fractal dimension of the random Cantor set is one less than the fractal dimension of the surface and can be measured from the distribution of distances between returns [*Russ*, 1994]. The return points can be connected to form contour lines, and longer distances between returns lead to larger areas enclosed by the contour lines [in the sense of *Matsushita et al*., 1991; *Russ*, 1994]. Because lake shorelines are contour lines for the lake surface elevation, the distance between returns is analogous to lake area [in the sense of *Matsushita et al*., 1991; *Russ*, 1994]. The shoreline of each individual lake has a fractal dimension, but the collection of areas enclosed by the shorelines is also power law distributed, with a fractal dimension that characterizes the collection of shorelines [*Matsushita et al*., 1991; *Isogami and Matsushita*, 1992; *Russ*, 1994; *Sasaki et al*., 2006]. The fractal dimension of the size distribution of lakes near the mean elevation is measured with the regression

$$\log N = c - b \log A \tag{1}$$

[7] where *N* is the number of lakes with area greater than or equal to *A*, *c* is a constant, *b* = *D* / 2, and the functional form (i.e., power law form) of the regression is based on the first return rate of a fractional Brownian motion to the mean elevation [*Goodchild*, 1988; *Matsushita et al*., 1991]. *D* is the fractal dimension of the shorelines surrounding the lake areas and is constrained between *D* = 1 (a population of perfectly smooth shorelines) and *D* = 2 (a population of shorelines so irregular that they are space filling). Hence, there is a theoretical basis for a power law size-distribution of lakes at the mean elevation, but because *b* = *D* / 2, there are also theoretical constraints (0.5 ≤ *b* ≤ 1) on the plausible range of exponents [*Goodchild*, 1988; *Hamilton et al*., 1992].
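
As a minimal sketch of how a regression of the form log *N* = *c* − *b* log *A* can be fitted in practice, the Python code below builds *N* (the number of lakes with area ≥ *A*) from a list of lake areas, fits the line by ordinary least squares, converts the slope to *D* = 2*b*, and checks the 0.5 ≤ *b* ≤ 1 constraint. The function names and the use of NumPy are illustrative assumptions, not the authors' code.

```python
import numpy as np


def rank_abundance(areas):
    """Return sorted areas A and N, the number of lakes with area >= A."""
    a = np.sort(np.asarray(areas, dtype=float))
    # For ascending data, searchsorted(..., side="left") gives the index of the
    # first value equal to A, so every value from there on is >= A (ties handled).
    n_ge = a.size - np.searchsorted(a, a, side="left")
    return a, n_ge


def fit_equation_1(areas):
    """Fit log N = c - b log A; report c, b, D = 2b, r^2, and the b constraint."""
    a, n_ge = rank_abundance(areas)
    x, y = np.log10(a), np.log10(n_ge)
    slope, intercept = np.polyfit(x, y, 1)  # polyfit returns [slope, intercept]
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    b = -slope
    return {
        "c": intercept,
        "b": b,
        "D": 2.0 * b,
        "r2": r2,
        "within_constraints": bool(0.5 <= b <= 1.0),  # equivalent to 1 <= D <= 2
    }


# Hypothetical exact power law with b = 0.8: the i-th largest lake has area
# i**(-1/0.8), so N(>= A_i) = i and log N = -0.8 log A.
areas = np.arange(1, 101) ** (-1 / 0.8)
print(fit_equation_1(areas))
```

Real lake data will not be this tidy; the example only shows the mechanics of the regression and the conversion from *b* to *D*.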

[8] In some landscapes (e.g., mountainous ones), lakes are present at elevations far from the mean [*Goodchild*, 1988]. If returns through a horizontal plane intersecting the landscape at an elevation far from the mean are recorded, the distribution of return times through the section (lake areas) begins to deviate from a power law [cf. *Ding and Yang*, 1995]. In this case, the log-abundance log-size equation takes the form

- (2)

[9] where *d* is a constant and the functional form of the regression is determined by the probability of the first return of a fractional Brownian motion to an elevation not equal to the mean [*Ding and Yang*, 1995]. This second size distribution is a generalization of the size distribution at the mean elevation: if *d* = 0, the equation is equivalent to the equation for the size distribution at the mean elevation (i.e., equation (1)). Conceptually, this means that in regions of high vertical relief there is less surface area at any one elevation, and hence there is simply not enough surface area for lakes to form in great enough abundance to achieve a power law. As a consequence, regional hypsometry likely plays a strong role in determining the shape of the size distribution of lakes.
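
The notion of "returns" to a horizontal plane can be illustrated on a one-dimensional transect. The sketch below is a simplification, assuming a discretized ordinary Brownian profile (a cumulative sum of Gaussian steps, i.e., fractional Brownian motion with Hurst exponent *H* = 0.5) rather than a full fractal surface; `return_intervals` is a hypothetical helper that records the distances between successive crossings of a chosen level, the 1-D analogue of the spacing between returns discussed above.

```python
import numpy as np


def return_intervals(profile, level):
    """Distances (in samples) between successive crossings of `level`."""
    above = np.asarray(profile, dtype=float) >= level
    # Index i marks a crossing when sample i lies on the other side of `level`
    # from sample i - 1.
    crossings = np.flatnonzero(above[1:] != above[:-1]) + 1
    return np.diff(crossings)


rng = np.random.default_rng(0)
profile = np.cumsum(rng.standard_normal(20_000))  # Brownian transect, H = 0.5
mean, sd = profile.mean(), profile.std()

for label, level in [("mean", mean), ("mean + 1 sd", mean + sd)]:
    gaps = return_intervals(profile, level)
    print(f"{label:>12}: {gaps.size} intervals, largest = {gaps.max() if gaps.size else 0}")
```

Plotting the interval distributions for the two levels on log-abundance log-size axes is a quick way to explore how sampling away from the mean changes the distribution of return distances; the exact counts depend on the random seed.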

### 3 Empirical Analysis

[10] Based on the concepts outlined above, we conclude that although the shorelines of individual lakes are fractal, this leads to a power law size-distribution of lakes only in specific, although not necessarily uncommon, cases. To test these expectations, we analyzed lake size data from the Adirondack Mountains in New York [*Seekell and Pace*, 2011] and from the island of Gotland in Sweden [*Verpoorter et al*., 2012]. We selected the Adirondack data because they derive from a well-defined mountainous region and the Gotland data because they derive from a well-defined flat region.

[11] We evaluated the Adirondack data based on four criteria derived from the fractal concept for lakes. First, we tested the data for deviation from a power law on a log-abundance log-size plot, based on the *r*^{2} value [*Seekell and Pace*, 2011]. Power law-distributed data should form a straight line on a log-abundance log-size plot, and a low *r*^{2} value indicates deviation from a straight line and hence deviation from a power law [*Seekell and Pace*, 2011; Appendix A]. There is large vertical relief in the Adirondack data set, and hence the distribution should deviate significantly from a power law. Second, we extracted lakes at the mean lake elevation (*n* = 19 lakes at an elevation of 503 m), which for this data set is approximately the same as the mean landscape elevation, and tested them for deviation from a power law on a log-abundance log-size plot, again based on the *r*^{2} value. Lakes at the mean elevation should not deviate significantly from a power law distribution. Third, we compared the fractal dimension *D* from the log-abundance log-size plot of lakes at the mean elevation to the fractal dimension derived from dimensional analysis (log perimeter-log area analysis) [*Russ*, 1994]. The values estimated from the log-abundance log-size plot should be theoretically plausible and similar to estimates of *D* from dimensional analysis. Fourth, we compared the fit of size-distribution equations (1) and (2) (above) for lakes away from the mean elevation. For these lakes, size-distribution equation (2) should fit better than size-distribution equation (1).
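
The *r*^{2} criterion needs a critical value for a given sample size. One way to obtain such a value, sketched below under stated assumptions, is Monte Carlo simulation: draw many samples of the same size from an exact Pareto (power law) distribution, compute the *r*^{2} of each log-abundance log-size regression, and take a low quantile as the critical *r*^{2}. This is an illustrative reconstruction; the function names, the 5% level, and the simulation settings are assumptions and not necessarily the procedure of *Seekell and Pace* [2011].

```python
import numpy as np


def loglog_r2(areas):
    """r^2 of log N versus log A, where N = number of lakes with area >= A."""
    a = np.sort(np.asarray(areas, dtype=float))
    n_ge = a.size - np.searchsorted(a, a, side="left")
    return np.corrcoef(np.log10(a), np.log10(n_ge))[0, 1] ** 2


def critical_r2(n, b, n_sim=2000, alpha=0.05, seed=0):
    """Lower alpha-quantile of r^2 for samples of size n from a Pareto law.

    1 + Generator.pareto(b) is a classical Pareto variable with minimum 1,
    whose complementary cumulative distribution is A**(-b).
    """
    rng = np.random.default_rng(seed)
    r2 = np.array([loglog_r2(1.0 + rng.pareto(b, n)) for _ in range(n_sim)])
    return float(np.quantile(r2, alpha))


# An observed r^2 below the critical value suggests deviation from a power law.
crit = critical_r2(n=19, b=0.613)
print(f"critical r^2 for n = 19: {crit:.3f}")
```

Because *r*^{2} is unchanged by rescaling the areas, the minimum area of the simulated samples does not affect the result.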

[12] The Adirondack Mountain data set is based on a stratified sample of lakes digitized from USGS topographic maps, designed to accurately represent lakes of different sizes, and includes ponds as small as 0.1 ha. For the Adirondack Mountain lake data set (*n* = 1469; Figure 1), the slope of the log-abundance log-size regression on the entire data set was −0.658 (Figure 1a). This falls within the theoretical constraints of 0.5 ≤ *b* ≤ 1, but the *r*^{2} value (*r*^{2} = 0.853) was significantly lower than the critical value expected if the data conformed to a power law distribution (critical *r*^{2} = 0.990) [*Seekell and Pace*, 2011]. The lake elevation distribution roughly conformed to a normal distribution (Figure 1b), and there was considerable variability in lake elevation (mean elevation = 503.6 m, standard deviation = 111.7 m). The slope of the log-abundance log-size regression for lakes at the mean elevation (503 m, *n* = 19 lakes) was −0.613 (Figure 1c). This slope falls within the theoretical constraints of 0.5 ≤ *b* ≤ 1, and the *r*^{2} value (*r*^{2} = 0.98) exceeded the critical value (critical *r*^{2} = 0.846), consistent with data that conform to a power law distribution [*Seekell and Pace*, 2011]. The corresponding fractal dimension of the size distribution is *D* = 2*b* = 1.23, which is very similar to the fractal dimension (*D* = 1.22) derived from dimensional analysis for the entire lake data set.
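
Dimensional analysis of shorelines is commonly based on the perimeter-area relation *P* ∝ *A*^{*D*/2}, so that *D* is twice the slope of log *P* against log *A*. Assuming that form (the exact regression used for the Adirondack data is not given here), a minimal sketch with a hypothetical helper `perimeter_area_dimension` is:

```python
import numpy as np


def perimeter_area_dimension(perimeters, areas):
    """Estimate D from the regression log P = k + (D / 2) log A."""
    slope, _intercept = np.polyfit(np.log10(areas), np.log10(perimeters), 1)
    return 2.0 * slope


# Sanity check with circles: P = 2*pi*r and A = pi*r**2 give P proportional to
# A**0.5, i.e. D = 1 (perfectly smooth shorelines).
r = np.linspace(1.0, 50.0, 25)
print(perimeter_area_dimension(2 * np.pi * r, np.pi * r**2))  # close to 1.0
```

Comparing this estimate with 2*b* from the log-abundance log-size regression is the consistency check described in the text.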

[13] We fit size-distribution equations (1) and (2) to lake sizes from a 25 m elevation range (*n* = 67 lakes between 612.5 and 637.5 m) about 100 m above the mean elevation. Using lakes from this small range, rather than from a single elevation, was necessary to achieve a sample size large enough for analysis; the range of elevations corresponds to the bounds of an arbitrarily selected bin from a histogram of lake elevations. We compared the fits of the alternate regression models on log-abundance log-size plots by examining the dual criteria of linearity of predicted versus observed values and evenness of the distribution of points above and below the regression line [*Quandt*, 1964]. Both equations are unbiased in the statistical sense (the slopes of regressions of observed on predicted values equal 1), but the predicted and observed values are not linearly related for size-distribution equation (1) (Figure 2a), whereas they are for size-distribution equation (2) (Figure 2b). The points are also much more evenly distributed around the regression line for size-distribution equation (2) (Figure 2b) than for size-distribution equation (1) (Figure 2a). Based on these dual criteria, the fit of size-distribution equation (1) (*r*^{2} = 0.826) was poor relative to the fit of size-distribution equation (2) (*r*^{2} = 0.992). We do not attempt to interpret the values of the coefficients from size-distribution equation (2) because both independent variables are functions of lake surface area, and this collinearity can lead to highly inaccurate parameter estimates.
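
The two fit criteria can be approximated with simple numerical proxies. The sketch below is model-agnostic: given observed and predicted log *N* values ordered by lake area, it reports the slope of observed on predicted (about 1 for an unbiased model), the quadratic coefficient of the same regression (a rough check on linearity), the fraction of points above the fitted line, and the number of sign runs in the residuals (few runs indicate systematic curvature rather than even scatter). These proxies and the name `fit_diagnostics` are assumptions for illustration; they are not the procedure of *Quandt* [1964].

```python
import numpy as np


def fit_diagnostics(observed, predicted):
    """Numerical proxies for linearity and evenness of a fitted size distribution.

    Both inputs are log N values ordered by lake area, so that residual runs
    reflect position along the size axis.
    """
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    slope, _intercept = np.polyfit(pred, obs, 1)  # near 1 for an unbiased model
    quad = np.polyfit(pred, obs, 2)[0]  # far from 0 suggests a curved relation
    resid = obs - pred
    signs = np.sign(resid[resid != 0])
    # A run is a maximal block of consecutive residuals with the same sign.
    runs = 1 + int(np.count_nonzero(signs[1:] != signs[:-1])) if signs.size else 0
    return {
        "slope": float(slope),
        "quadratic_coef": float(quad),
        "frac_above": float(np.mean(resid > 0)),
        "sign_runs": runs,
    }


# Hypothetical example: small alternating residuals give evenly spread points.
pred = np.linspace(0.0, 2.0, 10)
obs = pred + 0.01 * (-1.0) ** np.arange(10)
print(fit_diagnostics(obs, pred))
```

In practice, the predicted values would come from each candidate size-distribution equation fitted to the same lakes, and the diagnostics would be compared side by side.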

[14] The Gotland lake data (*n* = 114) are based on a recent high-resolution census of lakes greater than 0.01 km^{2}, described in detail by *Verpoorter et al*. [2012]. We tested the lake areas for deviation from a power law on a log-abundance log-size plot based on the *r*^{2} value. Gotland is an island with low vertical relief, and hence the data should not deviate from a power law. We compared the fractal dimension derived from the log-abundance log-size regression to a fractal dimension derived from dimensional analysis; the two should be similar. For the Gotland data set, the slope of the log-abundance log-size regression on the entire data set was −0.795 (Figure 3a). This falls within the theoretical constraints of 0.5 ≤ *b* ≤ 1, and the *r*^{2} value (*r*^{2} = 0.995) exceeded the critical value (critical *r*^{2} = 0.935), consistent with data that conform to a power law distribution [*Seekell and Pace*, 2011]. There was little variability in lake elevation in this region (mean elevation = 18.8 m; standard deviation = 18.7 m), and all lakes are near the mean elevation (Figure 3b). The fractal dimension derived from the log-abundance log-size plot (*D* = 2*b* = 1.59) is higher than the fractal dimension derived from dimensional analysis (*D* = 1.3), but the 95% confidence intervals for these parameters overlapped, so the two estimates are not inconsistent.

### 4 Discussion

[15] Our empirical analysis of lake sizes from a flat and a mountainous region supported expectations drawn from the fractal concept and can explain apparently conflicting observations of power law and non–power law lake size-distributions. Regional hypsometry influences the shape of lake size-distributions such that mountainous regions likely depart from the power law lake size-distribution, whereas flatter regions likely conform to it. The fractal concept is not specific to Earth's surface, and the generality of the concept, as applied to lakes, is supported by additional empirical analyses of ancient lake basins on Mars (Appendix B).

[16] Our empirical results for lakes at the mean elevation met the dual criteria for the fractal concept: (1) the shape on a log-abundance log-size plot conformed to a power law distribution, and (2) the slope of the regression on the log-abundance log-size plot was consistent with the expectation from independent measurements of the topography. However, some distributions, even for lakes at or near the mean elevation, might still deviate from the power law distribution. In many distributions, the upper tail deviates from a power law because it is impossible to fit enough large lakes on a finite surface, which truncates the distribution. In this case, the lower tail of the distribution may still conform to the power law distribution [*Hamilton et al*., 1992]. When small lakes are not completely enumerated, the lower tail of the size distribution may depart from a power law, but only below the minimum reliably mapped area; deviations in the lower tail above this minimum are not the result of mapping error [*Seekell and Pace*, 2011]. We cannot completely rule out potential impacts of truncation or mapping error on our analysis. However, we observed curvature extending throughout the entire Adirondack lake distribution (e.g., Figure 1a). This curvature is inconsistent with a power law but consistent with size-distribution equation (2) [*Seekell and Pace*, 2011]. Size-distribution equation (2) cannot explain the complete flattening of the extreme lower tail (lakes < 0.01 km^{2}), which could be due to omission of small lakes from the data set. This flattening could also occur if scale-dependent geomorphic processes have eliminated these very small lakes from the landscape. Our approach serves as a null hypothesis for landscapes without such processes and hence does not account for this effect [*Goodchild*, 1988]. We did not observe any patterns in our perimeter-area relationships (e.g., breaks in linearity or regular spacing between points at the low end of the area and perimeter ranges) that would suggest an adverse effect of mapping resolution.

[17] Our analysis is based on a theoretical fractal surface. Fractal surfaces can approximate a wide variety of landscapes [*Mandelbrot*, 1975, 1983] but are scale free and hence do not include scale-dependent geological processes, including some that may change lake abundances [*Goodchild*, 1988]. The fractal approach used here is only an approximation to real landscapes, but it is advantageous for simplifying the development of testable hypotheses that are useful for regional and global limnological studies [*Goodchild*, 1988]. The geology of a landscape may modify how well data fit the simple fractal landscape model. For example, Gotland formed through isostatic uplift and the draining of the Baltic Sea's freshwater predecessors. Its bedrock is carbonate, resulting in karst weathering and a landscape likely to mimic the equally peaked and pitted fractal model, because pits are less likely to be filled by mass wasting than in other landscapes [*Goodchild*, 1988; *Clarke*, 1988]. Hence, our null model for Gotland suggests a power law distribution because the island is relatively flat, but the karst topography may also promote the observed conformity of the lake size-distribution to this null hypothesis. The Adirondacks are mountainous, and a null hypothesis for this landscape suggests deviation from a power law (i.e., size-distribution equation (2)). This landscape was glaciated, with steep slopes subject to soil erosion, rockslides, and valley formation. These types of geomorphic processes may modify the landscape at scales that can contribute to deviations from a power law (i.e., additional deviation beyond that due to elevation alone). The hypothetical fractal surface is flat (i.e., not spherical) and has a single fractal dimension. While this is a reasonable assumption for the relatively small regions in our empirical analysis, it may not be appropriate for large-scale analyses that combine multiple physiographic regions. Overall, our empirical results were largely consistent with fractal expectations, suggesting that this approach will be useful for further analyses of lake morphometry and lake size-distributions.

[18] In an analysis by *Downing et al*. [2006], the value of *b* for the world's largest lakes was 1.06. This has important implications because, if the data conform to a Pareto distribution (power law distribution), this slope (*b* = 1.06) indicates that small lakes dominate the total surface area covered by lakes. However, this value is inconsistent with fractal geometry theory, which constrains *b* between 0.5 and 1 [*Goodchild*, 1988]. A potential reason for finding *b* > 1 is truncation of the lower tail of the lake size-distribution: in their analysis of the world's largest lakes, *Downing et al*. [2006] excluded all lakes less than 10 km^{2} in area. This type of truncation can make data from many size distributions mimic the linearity of a power law distribution on log-abundance log-size plots and also make the slope of the mimicking data appear steeper than is plausible for power law-distributed data [*Perline*, 2005; *Seekell and Pace*, 2011]. Another estimate of *b*, based on 251 large lakes, was 0.83, a plausible value [*Downing et al*., 2006], but extrapolation from many of these estimates remains difficult because the mean and variance calculated from samples of power law-distributed data can vary wildly with small changes in sample size [*Mandelbrot*, 1963]. Hence, there is tremendous uncertainty in (1) whether the power law distribution adequately describes the global and regional size distributions of lakes and (2) the relative contribution of small versus large lakes to global lake surface area. These uncertainties are probably best resolved by complete enumeration of lakes [*Seekell and Pace*, 2011; *McDonald et al*., 2012].
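
The effect of lower-tail truncation can be explored with a small simulation. The sketch below assumes a lognormal parent distribution of lake areas (an arbitrary, non–power law choice used only for illustration; the parameters and the helper `truncated_fit` are hypothetical, not data from *Downing et al*. [2006]) and fits the log-abundance log-size regression to the full sample and to only the largest 10% of lakes.

```python
import numpy as np


def truncated_fit(areas, a_min):
    """Fit log N = c - b log A to lakes with area >= a_min; return (b, r^2)."""
    a = np.sort(np.asarray(areas, dtype=float))
    a = a[a >= a_min]  # drop the lower tail, as a mapping cutoff would
    n_ge = a.size - np.searchsorted(a, a, side="left")
    x, y = np.log10(a), np.log10(n_ge)
    slope, _intercept = np.polyfit(x, y, 1)
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return -slope, r2


rng = np.random.default_rng(42)
areas = rng.lognormal(mean=0.0, sigma=1.5, size=5000)  # not a power law

for label, cutoff in [("all lakes", 0.0), ("largest 10%", np.quantile(areas, 0.9))]:
    b, r2 = truncated_fit(areas, cutoff)
    print(f"{label:>12}: b = {b:.2f}, r^2 = {r2:.3f}")
```

With settings like these, the truncated fit will typically look more linear and have a steeper slope than the full fit, even though no power law generated the data; exact values depend on the seed and parameters.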

[19] Most of Earth's land surface (~75%) lies outside mountainous or high-elevation regions, and consequently the power law distribution may hold over much of Earth's surface [*Miller and Spoolman*, 2011]. Many lake-rich regions are relatively flat (e.g., Finland), and power law distribution fits to lakes in these regions may be useful for understanding lake hydrological and biogeochemical responses to environmental change. Small lakes are thought to play an important role in regional- and global-scale biogeochemical cycles, and the potential dominance of small lakes in terms of surface area has been cited in support of this argument [*Downing*, 2010]. While our results suggest that small lake dominance of surface area is unlikely even in regions where the power law distribution holds (because *b* must be ≤ 1), this does not preclude small lakes from being significant in regional biogeochemical cycles. Small lakes typically have higher fluxes and faster reaction rates than large lakes and consequently may still contribute disproportionately to the biogeochemical cycles of lake-rich regions [*Downing*, 2010]. This potential importance, however, is not due to small lake dominance of total lake surface area. Many critical biogeochemical processes in lakes may be driven by processes at a regional scale [*Lapierre and del Giorgio*, 2012]. Relating the spatial organization of biogeochemical processes to regional lake size-distributions may be a promising route toward improved understanding of the role of lakes in biogeochemical cycles at broad spatial scales.
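
The link between the exponent *b* and the share of lake area held by small lakes can be made explicit with a short derivation, assuming the power law N(≥A) = C A^{-b} (with *C* a constant) holds between lake areas *A*_1 and *A*_2. Differentiating gives the number density of lakes by area, and integrating area times density gives the total lake area in that size range:

$$
\begin{aligned}
n(A) &= -\frac{dN}{dA} = bC\,A^{-b-1}, \\
S(A_1, A_2) &= \int_{A_1}^{A_2} A\,n(A)\,dA = bC\int_{A_1}^{A_2} A^{-b}\,dA
= \frac{bC}{1-b}\left(A_2^{\,1-b} - A_1^{\,1-b}\right), \quad b \neq 1, \\
S(A_1, A_2) &= C \ln\frac{A_2}{A_1}, \quad b = 1.
\end{aligned}
$$

For *b* < 1 the exponent 1 − *b* is positive, so total area is controlled by the upper limit and the largest lakes hold most of it; for *b* > 1 the *A*_1^{1−*b*} term dominates, so the smallest lakes hold most of it; at *b* = 1 every logarithmic size class contributes equally. This is why the constraint *b* ≤ 1 implies that small lakes cannot dominate total lake surface area under a power law.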