Integrating ecosystem metabolism and consumer allochthony reveals nonlinear drivers in lake organic matter processing

Lakes process both terrestrial and aquatic organic matter, and the relative contribution from each source is often measured via ecosystem metabolism and terrestrial resource use in the food web (i.e., consumer allochthony). Yet, ecosystem metabolism and consumer allochthony are rarely considered together, despite possible interactions and potential for them to respond to the same lake characteristics. In this study, we compiled global datasets of lake gross primary production (GPP), ecosystem respiration (ER), and zooplankton allochthony to compare the strength and shape of relationships with physicochemical characteristics across a broad set of lakes. GPP was positively related to total phosphorus (TP) in lakes with intermediate TP concentrations (11–75 μg L−1) and was highest in lakes with intermediate dissolved organic carbon (DOC) concentrations. While ER and GPP were strongly positively correlated, decoupling occurred at high DOC concentrations. Lastly, allochthony had a unimodal relationship with TP and related variably to DOC. By integrating metabolism and allochthony, we identified similar change points in GPP and zooplankton allochthony at intermediate DOC (4.5–10 mg L−1) and TP (8–20 μg L−1) concentrations, indicating that allochthony and GPP may be coupled and inversely related. The ratio of DOC:nutrients also helped to identify conditions where lake organic matter processing responded more to autochthonous or allochthonous organic matter sources. As lakes globally face eutrophication and browning, predicting how lake organic matter processing will respond requires an updated paradigm that incorporates nonlinear dynamics and interactions.

The production and processing of organic matter are central to the field of ecosystem science. Freshwater lakes are particularly active sites for carbon cycling because they can receive substantial organic matter inputs from the watershed in addition to within-system production from photosynthesis (Cole et al. 2007).
The balance of internal (i.e., autochthonous) production and terrestrial (i.e., allochthonous) inputs influence the extent to which carbon is stored in sediments, transformed for release to the atmosphere or entry to the food web, or transported downstream (Cole et al. 2007;Tranvik et al. 2009). For instance, high rates of gross primary production (GPP) can reduce atmospheric CO 2 emissions and increase organic matter burial (Schindler et al. 1997;Downing et al. 2008;Battin et al. 2009). In contrast, large fluxes of terrestrial dissolved or particulate organic matter (DOM or POM, respectively) may enhance microbial respiration, and cause lakes to be net carbon sources to the atmosphere (Cole et al. 2000;Solomon et al. 2013). In addition to metabolic measurements of GPP and ecosystem respiration (ER), organic matter processing occurs within lake food webs as both autochthonous and allochthonous carbon establish basal energy sources for higher trophic levels, leading to differences in consumer reliance on terrestrial sources (i.e., allochthony) across lakes. As metabolic rates (GPP, ER) and consumer allochthony each describe organic matter processing and interact with one another (Rüegg et al. 2021), understanding these three metrics and their response to lake characteristics will inform both global carbon cycling and predictions of lake responses to environmental change.
Prior work has identified drivers of GPP, ER, and consumer allochthony, including lake DOM (often measured as dissolved organic carbon, DOC) concentration, nutrient concentration, and lake size (Ask et al. 2009;Staehr et al. 2012;Wilkinson et al. 2013). For instance, GPP is often positively related to total phosphorus (TP) and negatively related to DOC, whereas ER is positively related to both GPP and DOC (Cole et al. 2000;Hanson et al. 2003;Solomon et al. 2013). Allochthony predictions are less clear, although allochthony may relate positively to DOC and negatively to lake size and nutrient concentrations (Solomon et al. 2011;Wilkinson et al. 2013;Tanentzap et al. 2017). While this past work has advanced our knowledge of lake organic matter processing, recent research has highlighted context-dependency and nonlinearities between lake characteristics and GPP. One avenue of recent research has reimagined the paradigm that DOC negatively relates to GPP; researchers now propose a unimodal relationship where DOCassociated nutrients can fertilize lakes at low DOC concentrations, before limiting GPP due to greater light attenuation at high DOC concentrations (Seekell et al. 2015;Kelly et al. 2018;Olson et al. 2020). Another line of research suggests that GPP and nutrients may relate nonlinearly in hypereutrophic systems, where high nitrate can reduce chlorophyll concentration through nitrogen stress (Filstrup et al. 2018). Nonlinear relationships may also affect ER similarly to GPP due to the two metrics often being tightly coupled (Solomon et al. 2013;Cremona et al. 2016). Additionally, as GPP and ER can become uncoupled at high DOC or nutrient concentrations (Solomon et al. 2013;Bortolotti et al. 2019), there is additional potential for nonlinear or threshold dynamics at various DOC-to-nutrient ratios.
Given the potential for GPP and ER to respond nonlinearly to possible drivers, we may also expect to observe nonlinear patterns between lake characteristics and consumer allochthony. Terrestrial inputs can be consumed directly or enter the food web via the microbial loop, and previous studies have linked higher allochthony to greater terrestrial inputs as measured by DOC or water color (Pace et al. 2007;Solomon et al. 2011;Tanentzap et al. 2017). However, the relative availability of both autochthonous and allochthonous resources may better predict allochthony, as chlorophyll concentration can modulate the relationship between DOC and allochthony, and some studies have demonstrated strong relationships between allochthony and water color-to-chlorophyll or water color-to-nutrient ratios rather than DOC (Carpenter et al. 2005;Wilkinson et al. 2013). As terrestrial support of food webs may relate to GPP (Karlsson et al. 2003;Grosbois et al. 2020), we expect that allochthony may also respond nonlinearly to DOC and nutrient gradients.
To date, the identification of context-dependent and nonlinear patterns between lake characteristics and these three measurements of organic matter processing (i.e., GPP, ER, and allochthony) have been constrained to a relatively narrow lake profile, and are seldom considered together. In particular, most surveys of food web allochthony are performed in nutrient poor, mid-sized north-temperate lakes (Pace et al. 2007;Solomon et al. 2011;Wilkinson et al. 2013). Additionally, linkages between organic matter processing rates and resource use by consumers remain lacking, as coupled measurements of metabolism and allochthony are rare. Establishing the connection between metabolism and allochthony, as well as assessing whether similar nonlinear patterns exist between lake characteristics and allochthony would help close the gap in our understanding of how organic matter is processed within lakes, not only by primary producers and microbes, but throughout the food web.
In this study, we combine metrics of metabolic rates (i.e., GPP, ER) with food web dynamics (i.e., allochthony) to holistically evaluate lake organic matter processing. We compiled data from the literature on GPP, ER, and consumer allochthony in a broad selection of lakes worldwide that varied in size, geographic setting, and productivity. Our goal was to examine the strength and shape of the relationship between these three metrics of organic matter processing (GPP, ER, and allochthony) and environmental predictor variables. We hypothesized that (1) GPP would have a unimodal relationship with DOC and increase with TP, (2) ER would be strongly coupled to GPP, with decoupling at high DOC concentrations, and (3) consumer allochthony and GPP would each respond to the abundance of DOC relative to nutrients. Ultimately, by examining all three metrics simultaneously, we can identify nonlinearities and possible thresholds between predictor variables and metrics of organic matter processing.

Methods
We used a literature search to compile two datasets: one on ecosystem metabolism and the second on zooplankton allochthony in the food web.

Ecosystem metabolism dataset
We started with an ecosystem metabolism dataset compiled and published by Hoellein et al. (2013). Their dataset included studies published through part of 2012 that measured ecosystem metabolism using open-water diel O 2 measurements in the summer months for freshwater and estuarine ecosystems; we included their freshwater pond and lake data in our dataset. In November 2019, we searched the literature for additional studies published between 2011 and 2019 that measured metabolism with diel O 2 as in Hoellein et al. (2013). We searched Web of Science using the terms: "ecosystem AND (metabolism OR primary production OR respiration) AND (oxygen OR O-2 OR O2) AND (pond* OR lake* OR pool*)." We included search terms for pools, ponds, and lakes to capture water bodies of different sizes, but subsequently refer to all categories as "lakes." Our search yielded 344 results, which we filtered for relevant titles down to 146 studies that we investigated for possible inclusion in the dataset.
We included studies if they reported lake-specific estimates of GPP and ER. We had no requirement for the minimum number of days measured, and the number of days per study ranged from 3 to 760 (mean: 79 d). If open-water diel O 2 metabolism was measured, but not reported for individual lakes, we contacted authors for more details (including for some studies in the Hoellein et al. 2013 dataset). We reported areal metabolic rates following Hoellein et al. (2013). When only volumetric metabolic rates were reported, we converted them to areal rates using mean lake depth or mixing depth if it was available. In some instances, multiple metabolism estimates were available for the same lake. In cases where a single study measured multiple habitats within a lake, we used the open water rate. If a lake had metabolic rates reported in multiple publications, the following rules were used to calculate or select a single value for the lake. If two or more studies encompassed overlapping study dates, the metabolism estimate from the study with the greater number of study days was used. If studies did not have overlapping timelines and the durations of the studies were all > 5 d, we calculated a weighted mean based on the study length. If one of the studies lasted ≤ 5 d, only values from longer studies were used. If all studies lasted ≤ 5 d, metabolism estimates were averaged. The 5-d threshold was selected based on the distribution of the data, and effectively separated out short-term studies when longer, more robust data were available. Ultimately, we compiled a metabolism dataset that included GPP and ER estimates from 27 studies, representing 92 unique lakes.

Zooplankton allochthony dataset
We examined consumer allochthony using zooplankton due to (1) their ability to assimilate allochthonous and autochthonous organic matter directly and indirectly via the microbial loop, (2) their importance in linking basal resources to higher trophic levels, and (3) their widespread use for measuring consumer allochthony across lake sizes, allowing us to compile a large dataset. We created a zooplankton allochthony dataset by searching Web of Science in November 2019 with the following terms: "(lake* OR pond* OR pool*) AND "stable isotop*" AND (allochth* OR terrestrial)". Our search yielded 1131 results, which we filtered for relevant titles to 244 results to further investigate. We included studies that reported lake-specific estimates of zooplankton allochthony based on stable isotope analysis (usually δ 13 C and δ 15 N, but also δD). If allochthony was measured, but was not reported for each lake, we contacted authors for more details. We included one dataset that did not come up in our literature search, which was a study reporting zooplankton allochthony in 10 lakes in the northern Midwest of the United States (Kelly et al. 2014). Our final dataset included 173 zooplankton allochthony estimates from 18 studies, representing 92 unique lakes.
Our dataset maintained the taxonomic resolution provided by authors (from as coarse as "mixed zooplankton" to as fine as genus). We grouped taxa into one of three groups: mixed zooplankton, copepods, or cladocerans. When multiple estimates of zooplankton allochthony were provided for the same lake and taxa type, we used the following rules to determine allochthony. First, if allochthony was reported for the same lake and taxa, we took the average across the ice-free season. Second, if estimates came from both natural abundance stable isotopes as well as an isotope tracer study, we used natural abundance estimates for more consistency with the rest of our dataset. Our resulting taxa-specific datasets included allochthony estimates in 69, 49, and 32 unique lakes for mixed zooplankton, copepods, and cladocerans, respectively.

Lake characteristics
For both datasets, we compiled information on lake characteristics, including latitude and longitude, surface area, maximum depth, mean depth, phytoplankton chlorophyll a, DOC, total nitrogen (TN), and TP. We compiled variables primarily as reported in the original source; when these variables were not reported, we contacted authors and searched for other published studies on the same lake that reported any of the missing information for a similar time of year (source can be found in "comments" column of the datasets; cite repository here). In some cases, we calculated lake area using Google Earth or gathered data from county websites. We unfortunately could not account for differences in DOC sources, and while we use DOC as a proxy for terrestrial inputs, we recognize the potential for autochthonous contributions. We determined terrestrial biome from The Nature Conservancy's Terrestrial Ecoregions GIS database (Olson and Dinerstein 2002) to provide landscape context.

Statistical analysis
Prior to statistical analysis, we explored our datasets for possible outliers. We removed two sites from our metabolism dataset: Lake Ontario due to its large size (18,960 km 2 ) and maximum depth (244 m) and Santa Olalla (Spain) for its high nutrient (TN = 13,778 μg L À1 ; TP = 680 μg L À1 ) and chlorophyll a (365 μg L À1 ) concentrations, bringing our sample size to 90 lakes. In our mixed zooplankton allochthony dataset, we removed Lake Superior due to its large size (82,103 km 2 ) and maximum depth (406 m), bringing our sample size to 68 lakes.
To examine the strength and shape of relationships between lake characteristics and GPP, ER, and zooplankton allochthony, we used three separate boosted regression trees. Boosted regression trees are a nonparametric tool that extends the concept of a regression tree, where observations are partitioned into groups that respond similarly to predictor variables, and uses machine learning to iteratively and sequentially add trees to the model (i.e., boosting) in a way that reduces loss in predictive performance (Breiman 2001; Leathwick et al. 2006;Elith et al. 2008). Thus, boosted regression trees do not produce a single "best" model, but rather fit a final model in a forward stagewise procedure to best predict the response variable. Boosted regression trees are advantageous because they can handle both categorical and continuous predictor variables and are robust to missing data, multicollinearity, and nonlinear relationships (Elith et al. 2008). Boosted regression trees can be subject to overfitting, a challenge that can be reduced by introducing randomness into the model by using only a portion of data (called "bag fraction") in each iteration, slowing the shrinkage parameter (called "learning rate") that determines the contribution of each tree to the growing model, and limiting the "tree complexity," which controls interactions among variables (Elith et al. 2008). We fit boosted regression trees using the "gbm.step" function within the "dismo" R package (Hijmans et al. 2017). The "gbm.step" function implements 10-fold cross validation, which divides these data into 10 subsets and calculates residual deviance at each step to estimate the optimal number of trees. By holding out data (using the bag fraction) and implementing cross validation, the final model is less prone to overfitting, and cross validation statistics can be used to evaluate model fit. We compared models with different bag fractions (0.60-0.65), learning rates (0.001-0.003), and tree complexity (3-4) to minimize cross-validated variance and to ensure that models were fit with at least 1000 trees, the minimum recommended by Elith et al. (2008).
We fit separate boosted regression models for GPP, ER, mixed zooplankton allochthony, and copepod allochthony. We did not run models for cladoceran allochthony due to a limited sample size (n = 32 lakes), as models on this smaller dataset would require dramatically increasing the bag fraction, reducing model accuracy and increasing the likelihood of overfitting models. We fit all models with the following predictor variables: surface area, maximum depth, latitude (absolute value), DOC, TP, and TN. We did not include mean depth because it was highly correlated with maximum depth (r = 0.89-0.97, Tables S1 and S2), had a smaller sample size than maximum depth (Table 1), and had a similar relationship as maximum depth to response variables in model outputs. We also excluded chlorophyll a from the boosted regression tree analysis as it is not an independent predictor, but rather is mechanistically linked to variables such as nutrients, DOC, and morphometry, which are included in the models. Response variables were transformed to fit Gaussian model assumptions: GPP and ER were natural-log transformed, and zooplankton allochthony was arcsine transformed. To examine relationship shape between each predictor variable and GPP, ER, and allochthony, we fit partial dependence plots. To evaluate relationship strength, we examined variable importance and which variables could be removed from the model without reducing predictive deviance by using cross-validation in the "gbm.simplify" function. We evaluated full model performance by examining two cross-validation statistics: the final cross-validated correlation, which describes the correlation between fitted values and raw data withheld, and the cross-validated deviance explained (cross-validated D 2 ), which is equal to 1 À (estimated cross-validated variance/mean total variance).
To explicitly test our second hypothesis related to coupling between GPP and ER, we ran a linear model between LNtransformed GPP and ER, and extracted the model residuals. This approach removes the GPP-ER relationship without assuming which variable should serve as the response; we then examined if any of our predictor variables could explain the residual variation using univariate generalized additive models (GAMs). We constructed GAMs with the "mgcv" R package (Wood 2021) using restricted maximum likelihood (REML), a Gaussian distribution, and thin plate regression Table 1. Physical and chemical characteristics of the 68 lakes analyzed in the mixed zooplankton allochthony dataset and 90 lakes analyzed in the ecosystem metabolism dataset. Note that we removed two outliers from the metabolism dataset and one outlier from the allochthony dataset (see text).

Metabolism database (n = 90)
Allochthony database (n = 68) splines for smoothing. We examined model fit based on the deviance explained (D 2 ), calculated as 1 -(residual deviance/ null deviance). Lastly, we tested our third hypothesis that GPP and allochthony would each respond to DOC and nutrients by using GAMs to supplement our boosted regression tree analysis. We constructed multivariate GAMs using the variables that improved model fit in the boosted regression trees. We fit GAMs using the "mgcv" R package (Wood 2021) as above with a Gaussian distribution for GPP models and a beta distribution with a logit link function for allochthony models (fit as proportions). We used the "visreg" R package (Breheny and Burchett 2017) to plot predicted relationships between one predictor variable (either TP or DOC) and the response variable (either GPP or allochthony), while holding other predictor variables to their mean value. To specifically test how GPP and allochthony each responded to the relative abundance of DOC and nutrients, we created univariate GAMs with the ratio of DOC to productivity as the predictor variable (e.g., DOC: total nutrients and DOC: chlorophyll a). As above, we evaluated model fits using D 2 .

Ecosystem metabolism
We analyzed metabolism data from 90 lakes across five continents, with only Africa and Antarctica not represented in the dataset (Fig. 1). While the 90 sites represent tropical, subtropical, Mediterranean, and temperate ecosystems, the majority of sites (n = 75) were located in the temperate broadleaf and mixed forest biome in North America and Europe. The lakes covered a wide range of environmental characteristics, spanning from small ponds to large lakes (83 m 2 -2611 km 2 ), from oligotrophic to eutrophic in nutrient status, and encompassed both low and extremely high DOC concentrations (0.8-35.2 mg L À1 ) ( Table 1, Fig. S1). Both GPP and ER had right-  skewed distributions ( Fig. 2A,B), and median ER (4.60 g O 2 m À2 d À1 ) was greater than median GPP (3.61 g O 2 m À2 d À1 ).
Each of the boosted regression trees indicated largely nonlinear relationships between GPP or ER and their predictor variables (Fig. 3). The boosted regression tree for GPP explained 37.5% of the cross-validated deviance and the cross-validated correlation was 0.63 (cross-validated residual deviance = 0.362, total deviance = 1.284, 5850 trees, learning rate = 0.001, bag fraction = 0.65, tree complexity = 3). The most important predictor variable was TP, followed by surface area, latitude, and DOC; maximum depth and TN did not improve GPP model performance (Fig. 3). Models predicted that GPP was positively related to TP in lakes with intermediate TP concentrations (11-75 μg L À1 ); there was no relationship in lakes with either lower or higher concentrations. GPP increased with lake surface area until~5 km 2 , after which models discerned little effect. Models also predicted that GPP was lower in low latitude lakes and higher above 50 N. Lastly, models predicted a unimodal relationship between GPP and DOC, where GPP was highest in lakes with intermediate DOC concentrations (4.5-7.0 mg L À1 ).
The boosted regression tree for ER explained 22.9% of the cross-validated deviance and the cross-validated correlation was 0.52 (cross-validated residual deviance = 0.825, total deviance = 1.070, 4200 trees, learning rate = 0.001, bag fraction = 0.65, tree complexity = 3). The most important predictor was TP, with maximum depth and TN each having lower importance; surface area, latitude, and DOC did not improve model performance (Fig. 3). ER response to TP mirrored that of GPP, with models predicting that ER increases with TP in lakes with intermediate TP concentrations (11-75 μg L À1 ), whereas lakes with lower TP concentrations had uniformly low ER, and lakes with higher TP concentrations had uniformly high ER. Models also predicted that ER was nonlinearly related to maximum depth: ER was high in both shallow (<1.15 m) and deep (>8.5 m) lakes, and lower at intermediate depths (minimum ER found at~4.5 m). Lastly, models predicted a negative relationship between ER and TN, but only in lakes with intermediate TN concentrations (520-1000 μg L À1 ).
The linear model indicated that ER was tightly coupled with GPP, particularly at low metabolic rates (LN-transformed, adj. R 2 = 0.64, p < 0.001; Fig. 4A). By examining relationships between GPP-ER residuals and predictor variables using GAMs, we found that DOC explained 44% of the remaining deviance (Fig. 4B), whereas the other predictor variables each explained ≤6% of the deviance. Fig. 3. Partial dependence plots for boosted regression trees relating each predictor variable to GPP (row 1, cross-validated D 2 = 37.5%), ER (row 2, cross-validated D 2 = 22.9%) in units of g O 2 m À2 d À1 , and mixed zooplankton allochthony (row 3, cross-validated D 2 = 60.0%). The plots represent the effect of each individual predictor variable while holding the other variables to their mean level. The relative importance of each predictor variable to each response variable is provided in the x-axis label; plots for the top predictor of each response variable is outlined in a thick black line. Numbers in the bottom right of plots report the order of importance, and only include variables that improved model performance. Note the log-transformed x-axis for all variables except latitude.

Food web allochthony
We analyzed mixed zooplankton allochthony from 68 lakes. The geographic representation of these lakes was limited to North America and Europe, with all sites above 41 N latitude in temperate, boreal, and tundra biomes (Fig. 1). The lake characteristics for the allochthony dataset had narrower ranges than the metabolism dataset (Table 1; Fig. S1), but still represented a variety of lake sizes (500 m 2 -47 km 2 ), maximum depth (0.5-62 m), DOC concentration (1.8-25.9 mg L À1 ), and spanned from oligotrophic to eutrophic in nutrient status (Table 1, Fig. S1).
Median mixed zooplankton allochthony was 25.8%, ranging from 0% to 80% and skewed right (Fig. 2C). The final boosted regression tree model explained 60.0% of the cross-validated deviance in allochthony and the cross-validated correlation was 0.78 (cross-validated residual deviance = 0.006, total deviance = 0.088, 4000 trees, learning rate = 0.005, bag fraction = 0.65, tree complexity = 3). The most important variable was DOC, followed by latitude, surface area, and TP; TN and maximum depth did not improve model performance (Fig. 3). Zooplankton allochthony was nonlinearly related with all variables. Models predicted intermediate levels of allochthony in lakes where DOC concentrations were either low (<5 mg L À1 ; 37% allochthony) or high (>10 mg L À1 ; 34% allochthony). The greatest range of allochthony values occurred in lakes with 5-10 mg L À1 DOC, with the lowest values in lakes with DOC concentrations~5.6 mg L À1 and highest values in lakes with~8.5 mg L À1 . Models predicted low zooplankton allochthony in lakes with latitudes 50-60 N and higher allochthony above 60 N; unfortunately, we were limited by no estimates below 41 N. Allochthony values were highest in small lakes <0.1 km 2 , lower in lakes 0.1-0.4 km 2 , and intermediate (~35%) in lakes >0.4 km 2 . Lastly, models predicted a unimodal relationship between allochthony and TP, where highest allochthony occurred in lakes with intermediate TP (~8-20 μg L À1 ).
We analyzed estimates of copepod allochthony from 49 unique lakes, which represented less productive (maximum chlorophyll a: 13.0 μg L À1 ) and shallower (maximum depth: 24 m) sites compared to the mixed zooplankton database (- Table S3). Median copepod allochthony was 29.4%, and predictor variables explained 32.6% of cross-validated deviance (Fig. S2, Supporting Information). Overall, models predicted that copepods responded to environmental drivers similarly to mixed zooplankton, although TP was the most important variable whereas DOC and latitude had reduced importance (Fig. S3). Similar to mixed zooplankton assemblages, copepod allochthony was highest in lakes with intermediate TP concentrations (~8.5-20 μg L À1 ) and exhibited reduced allochthony in lakes with 10-13 mg L À1 DOC (Fig. S4).

Integrating allochthony and GPP responses to DOC and nutrients
Multivariate GAMs explained 66.4% of the deviance for GPP and 81.7% of the deviance for allochthony (Fig. 5). The GAMs and boosted regression trees predicted similar relationships between TP and both GPP and allochthony. While holding other variables at their mean values, lakes with intermediate levels of TP showed a positive and somewhat linear relationship with GPP (Fig. 5A), while the relationship between TP and allochthony was unimodal, with the highest GPP occurring around 10 μg L À1 (Fig. 5C). The GAMs did not pick up the nonlinear patterns observed in boosted regression trees for DOC relationships with GPP and allochthony. Instead, the GAM models predicted a negative relationship between GPP and DOC (Fig. 5B) and a positive almost linear relationship between allochthony and DOC (Fig. 5D),

Fig. 4. (A) A linear regression model between GPP and ER shows tight
coupling, particularly at low metabolic rates. The solid line represents a linear regression model with 95% confidence intervals in gray (log-log transformation R 2 = 0.64). (B) A univariate GAM model where DOC explains 44% of the residual deviance between GPP and ER. The solid line represents the fitted spline from the GAM with 95% confidence intervals in gray. Each point represents one lake.  GPP and allochthony were each related to the DOC-tonutrient and DOC-to-chlorophyll a ratios (Fig. 6). GPP was greatest when nutrients or chlorophyll a were higher relative to DOC, and GPP was lower in lakes with greater DOC relative to nutrients or chlorophyll a, although the deviance explained for DOC: chlorophyll a was weak (Fig. 6A,B). The highest allochthony values occurred at intermediate ratios, where neither DOC nor nutrients or chlorophyll a dominated (Fig. 6C,D). Interestingly, lakes with the highest allochthony values and lakes with the lowest GPP occurred at similar DOCto-nutrient ratios (Fig. 6A,C).

Discussion
By examining the drivers of ecosystem metabolism and allochthony across a diverse set of global lakes, we identified that nonlinear relationships predominated between lake physicochemical characteristics and organic matter processing. Our results support previously observed patterns of GPP relating nonlinearly to nutrients and DOC, which had yet to be shown in lakes across a large geographic scale. While GPP and ER were closely coupled for most lakes in the dataset, we identified uncoupling in lakes with high DOC concentrations (>20 mg L À1 ), suggesting a shift in the underlying activity driving ER. We also discovered nonlinear relationships between allochthony and both nutrient and DOC concentrations. Ultimately, GPP and allochthony each exhibited large shifts across intermediate concentrations of TP (11-75 ug L À1 ) and DOC (4.5-10 mg L À1 ), and the shifts in allochthony and GPP often aligned at similar DOC and TP concentrations (e.g., the TP range where GPP began to increase corresponded to the TP range where allochthony peaked). Lastly, the ratio of DOC: nutrients identified that both GPP and allochthony responded to relative availability of different organic matter sources.

Nonlinear patterns among drivers of GPP and ER
Our models identified nonlinear relationships between predictor variables and ecosystem metabolism. The boosted regression trees explained 37.5% and 22.9% of the crossvalidated deviance for GPP and ER (Fig. 3), respectively, whereas the GAM models for GPP predicted 66.4% of the deviance (Fig. 5). The discrepancy in deviance explained is due to different statistical approaches, where the boosted regression trees deviance is based on 10-fold cross validation as opposed to deviance reported from the whole dataset in the GAMs. The boosted regression trees predicted that GPP and ER were most strongly related to TP (Fig. 3), which was supported in the multivariate GAMs for GPP (Fig. 5A), thus partially supporting our first hypothesis. Previous studies also highlight a strong positive relationship between TP and productivity (McCauley et al. 1989;Hanson et al. 2003;Solomon et al. 2013), yet we found the relationship was more nuanced: GPP and ER were positively related to TP only in lakes with intermediate (11-75 μg L À1 ) TP concentrations. To put this into context, assuming linear relationships between GPP and TP would inaccurately estimate eutrophication effects on metabolism in lakes with starting TP concentrations <11 or >75 μg L À1 , which would include 41% of lakes in the 2012 US National Lake Assessment (U.S. Environmental Protection Agency 2016). The lack of GPP response to TP in lakes with both low and high TP concentrations could indicate limitation by another resource, as iron is often limiting in oligotrophic lakes (Vrede and Tranvik 2006), and either light or nitrogen could limit productivity in high TP systems (Blindow et al. 2006;Bortolotti et al. 2019;Scott et al. 2019).
GPP was also related to DOC, as predicted by our first hypothesis. The boosted regression trees predicted a nonlinear, unimodal response between GPP and DOC (Fig. 3); however, the GAMs predicted a nearly linear, negative relationship (Fig. 5B). The unimodal DOC-GPP relationship was recently described in empirical and modeling studies suggesting that DOC can promote GPP due to DOC-associated nutrients or coupled nutrient-DOC loading up to a threshold, after which DOC reduces GPP via light limitation (Zwart et al. 2016;Kelly et al. 2018;Olson et al. 2020). Our boosted regression trees predicted that peak GPP occurred in lakes with 6-7 mg L À1 DOC, close to the 7-8 mg L À1 DOC peak predicted by Kelly et al.'s (2018) model, and within the range of empirical estimates from arctic and boreal regions (Seekell et al. 2015;Bergström and Karlsson 2019). Although the GAMs did not identify the unimodal relationship, the relationship was wedge-shaped with more variability in GPP at low DOC concentrations (Fig. 5B), a pattern described along with the unimodal relationship in Kelly et al. (2018) and predicted to be the likely outcome for spatial surveys of GPP. Kelly et al. (2018) found the exact shape (unimodal or wedge) and location of peak GPP in relation to DOC was mediated by additional lake characteristics, including surface area (the second-best predictor in our boosted regression trees) and the DOC: nutrient ratio (Fig. 6A). Specifically, greater P relative to DOC induced higher GPP peaks, whereas the location of peak GPP occurred at higher DOC concentrations in smaller lakes due to shallow mixed layer depths and reduced light limitation (Kelly et al. 2018). Increases in DOC may especially impact shallow lakes, where shading of benthic plants can release sediment nutrients, boosting pelagic and wholeecosystem GPP (Vasconcelos et al. 2016(Vasconcelos et al. , 2018. While TP was the strongest predictor of ER in the boosted regression trees, DOC did not improve the boosted regression tree model performance (Fig. 3). We suspect that the lack of relationship between ER and DOC along with the relatively low explanatory power for the boosted regression trees (22.9% of the cross-validated D 2 ) may be explained by the strong coupling between GPP and ER (Fig. 4A). GPP-ER coupling may overwhelm any potential relationship between ER and DOC seen elsewhere (Staehr et al. 2012;Hoellein et al. 2013;Solomon et al. 2013). Indeed, by examining the GPP-ER residuals, we found that DOC was the best predictor (Fig. 4B), supporting our second hypothesis. Specifically, we observed more decoupling between GPP and ER in lakes with DOC concentrations >20 mg L À1 . At high DOC concentrations, a greater proportion of ER may be attributed to allochthonous matter, rather than respiration reflecting autochthonous production (Jansson et al. 2008).
The drivers of GPP and ER described above are mediated by other lake characteristics, such as latitude and morphometry. Our boosted regression trees predicted higher GPP at higher latitudes. Models also predicted a positive relationship between GPP and surface area in lakes <5 km 2 , contrasting previous work (i.e., Staehr et al. 2012;Kelly et al. 2018). Low areal GPP in small lakes from our dataset may result from correlations with other variables: specifically, lake size was negatively correlated with DOC but had no relationship with TP (Table S1). For ER, models predicted nonlinear relationships with lake depth. Shallow sites likely had higher ER due to a greater contribution from benthic respiration (Staehr et al. 2012) whereas deeper lakes may have higher ER due to coupling with GPP, as our models predicted higher GPP in larger, deeper lakes (Fig. 3).

Nonlinear patterns among drivers of allochthony
Our models demonstrated nonlinear relationships between predictor variables and zooplankton allochthony. The boosted regression trees explained 60.0% of the cross-validated deviance and indicated that DOC was the strongest predictor of mixed zooplankton allochthony (Fig. 3), whereas the GAMs explained 81.7% of the deviance (Fig. 5) with TP and DOC together providing the most explanatory power (accounting for 58.6% of the deviance). Copepods showed similar patterns to mixed zooplankton samples, with TP and DOC being the top two predictors of boosted regression trees ( Fig. S2; Supporting Information). Collectively, these results support our third hypothesis that both nutrient and DOC availability impact consumer allochthony. The boosted regression trees predicted a nonlinear relationship with low mixed zooplankton allochthony in lakes with DOC concentrations around 5.1-5.7 mg L À1 , higher allochthony at around 8-9 mg L À1 , and reduced allochthony at higher DOC concentrations (>10 mg L À1 ) (Fig. 3). The wiggliness of the predicted model suggests possible overfitting; to evaluate this possibility, we ran a bootstrapped approach by sampling from our dataset with replacement. In general, the bootstrapping also identified troughs and peaks in allochthony in lakes with 5-10 mg L À1 DOC, with exact locations varying within this range. It is possible that the wide variety of allochthony values in lakes with 5-10 mg L À1 DOC made it challenging to fit precise models. In contrast, the GAM models predicted a nearly linear, positive relationship between allochthony and DOC, similar to Tanentzap et al. (2017). Yet, there was highest variability between 3 and 10 mg L À1 DOC (Fig. 5D) where the peak occurred in the boosted regression tree, suggesting we need more research on how DOC impacts allochthony at low to intermediate concentrations. We further suspect that the allochthony-DOC relationship may interact with nutrient concentrations, as nutrients and DOC were strongly correlated in this dataset (Table S2).
Both the boosted regression trees and the GAMs predicted a unimodal relationship between zooplankton allochthony and TP, with allochthony peaking in lakes with TP concentrations around 10 μg L À1 (Figs. 3 and 5C). To our knowledge, our study is the first to describe this unimodal allochthony-TP relationship, and indicates that allochthony is tied to lake productivity. At low TP concentrations, limited terrestrial inputs may limit both TP and allochthony; allochthony may also increase with nutrients due to microbial priming, where algal exudates promote microbial breakdown of terrestrial carbon that can then enter the food web (Guenet et al. 2010;Grosbois et al. 2020). While our analysis only considered DOC, priming can also impact particulate organic carbon (POC), which can account for a large proportion of allochthony when POC is consumed directly (Cole et al. 2006;Emery et al. 2015;Mehner et al. 2016). The unimodal relationship suggests that after a critical amount of TP (here~10 μg L À1 ), lakes may have enough autochthonous production that consumers can primarily feed on higher quality algal resources and allochthony will decline.

Integrating drivers and mechanisms of GPP, ER, and allochthony
Recent work has suggested possible connections between ecosystem metabolism and consumer community dynamics (Grosbois et al. 2020;Rüegg et al. 2021), and our results provide further evidence of connections between GPP, ER, and consumer allochthony. Specifically, our results support our third hypothesis that GPP and consumer allochthony each respond to nutrients and DOC, with both exhibiting large shifts at similar change points in TP and DOC. The inflection point at which GPP begins to increase with increasing TP (~11 μg L À1 ; Figs. 3 and 5A) aligns with the TP concentration predicting peak zooplankton allochthony (Figs. 3 and 5C). Similarly, lakes with 4.5-7.0 mg L À1 DOC aligned with peak GPP and the trough of zooplankton allochthony (Fig. 3). The connection between increased production and decreased allochthony suggests that allochthony may be sensitive to primary production. While previous work has typically linked zooplankton allochthony to terrestrial inputs, a few studies suggest that allochthony may only be prevalent when primary production is low relative to bacterioplankton production (Karlsson et al. 2003;Berggren et al. 2015). A whole-lake fertilization experiment also demonstrated a reduction in zooplankton allochthony following nutrient additions (Carpenter et al. 2005). The idea that primary production may decrease allochthony is intuitive based on its higher resource quality compared to terrestrial resources (Brett et al. 2009;Taipale et al. 2014;Brett et al. 2017;Hiltunen et al. 2017).
While the connection between allochthony and primary production has not been widely acknowledged, previous work has described a positive relationship between allochthony and DOC-to-chlorophyll a ratios where greater DOC relative to productivity increases allochthony (Carpenter et al. 2005;Wilkinson et al. 2013). This previous work occurred in lakes where the ratios fell to the left-hand side of our hump-shaped curve (Fig. 6C,D), suggesting the unimodal relationship we observed only emerged by comparing a broad set of lakes. The decline in allochthony at higher DOC: TP ratios may be due to interactions with other lake characteristics such as mixing depth, habitat use, or size. For example, even in lakes with high DOC concentrations, a shallow epilimnion and high volumetric phytoplankton biomass may provide an abundant supply of autochthonous resources, which may be favored over poor-quality terrestrial resources. This may be particularly true of shallow waterbodies, where light penetrates a deeper proportion of the water column, resulting in high production and low allochthony despite high terrestrial carbon inputs (Holgerson et al. 2016).
Connections between ER and zooplankton allochthony have less support in our analysis. ER was largely coupled to GPP, except at high DOC concentrations (Fig. 4); yet, we observed high allochthony across a range of intermediate to high DOC concentrations (Figs. 3 and 5D). The disconnect between ER and zooplankton allochthony can likely be explained by microbial allochthony, which we did not assess. For example, Karlsson (2007) found that allochthonous organic carbon supported~80% of respiration, yet only~40% of zooplankton. Even when the vast majority of organic carbon in a lake is allochthonous, much of it is lost through bacterial respiration before reaching zooplankton (Karlsson 2007;Karlsson et al. 2012). The disproportionately low allochthony in zooplankton may relate to the poorer food quality of allochthonous organic matter (Brett et al. 2009(Brett et al. , 2017 or reduced assimilation or transfer efficiencies compared to autochthonous organic matter (Kritzberg et al. 2005;Taipale et al. 2014). It may be that ER, DOC, and zooplankton allochthony only relate in lakes with high terrestrial inputs and low primary production.

Remaining uncertainties and future outlooks
While our datasets identified important relationships among predictor variables and GPP, ER, and allochthony, uncertainties remain due to limitations of our dataset. For instance, the lower explanatory power for the ecosystem metabolism models may result from differences in study timescales and mismatches to the static measurements of predictor variables. Metabolism often has high day-to-day variation driven by abiotic factors that exhibit daily variability (e.g., photosynthetically active radiation, temperature). In contrast, the environmental variables included here represent more stable measurements or average values across the openwater period. Long-term metabolism patterns are more indicative of site-specific patterns than are day-to-day variations (Bernhardt et al. 2018), but the study length varied among lakes and often focused on summer measurements. Annual trends in metabolism would be interesting to explore, particularly in conjunction with allochthony (Grosbois et al. 2020).
There are also sources of uncertainty in stable isotope-based allochthony estimates for zooplankton. These uncertainties include assumptions made about zooplankton trophic position, variability in isotopic fractionation in primary producers or during feeding, and degree of dietary water influence when δ 2 H isotopes are used (Vander Zanden and Rasmussen 2001;Post 2002;Solomon et al. 2011). Recently developed Bayesian techniques have attempted to deal with these sources of error in isotope models by including priors and uncertainty as input parameters for measured dietary water contribution and trophic position. These models allow for presentation of uncertainty in input parameters and resource use estimates as posterior distributions from model output, ultimately often using the median from the posterior distribution for point estimates of consumer allochthony (Moore and Semmens 2008;Solomon et al. 2011;Layman et al. 2012;Kelly et al. 2014). Input assumptions and models used for allochthony estimates presented here varied across studies, with some studies using Bayesian techniques and including deuterium (and therefore including dietary water) and others using only C and/or N isotopes. As methodology covaried with geographic region in our dataset, we did not explicitly account for this variability in methodology in our models. We also did not observe any obvious bias when comparing allochthony estimates across different isotopes/modeling techniques; however, we acknowledge it may be possible.
Our literature search also highlighted key gaps in field collection, supporting previously documented research biases in limnology (Stanley et al. 2019). Our allochthony dataset came mainly from intermediate-high DOC and low nutrient lakes (Fig. S1), likely pointing to specific research questions aimed at assessing the role of terrestrial matter on zooplankton allochthony. A more comprehensive assessment of allochthony in lakes with wider DOC and nutrient gradients would more adequately describe patterns in terrestrial resource use. This is particularly true for hypereutrophic lakes where GPP-nutrient and GPP-ER relationships may break down, and where phytoplankton may be more difficult to access and/or digest (Filstrup and Downing 2017;Filstrup et al. 2018). Our datasets were also limited in latitudinal range (Fig. 1), and more work is needed to understand the relationships between latitude and GPP and allochthony (Fig. 3). Lastly, our datasets underrepresented small, shallow systems relative to their high frequency on the landscape (Downing 2010); this was particularly true in the metabolism dataset. Shallow waterbodies can have greater benthic-pelagic and aquatic-terrestrial coupling, likely explaining higher DOC and nutrient concentrations, which can result in increased ER and decreased allochthony (Holgerson et al. 2016). To address these data gaps, future research could examine metabolism and allochthony in lakes representing diverse DOC-nutrient combinations across latitudes, including globally abundant small lakes and high nutrient and/or high DOC lakes.
In addition to the need for studying lakes with more diverse characteristics, our study emphasizes a need to measure metabolism and allochthony in the same set of lakes at the same time. We found only seven lakes where GPP, ER, and allochthony were all measured, and all were in the Northern Highlands Lake District of Michigan and Wisconsin, USA. This lack of data may underscore discipline-specific investigations of carbon cycling in lakes, as research into food webs and biogeochemical cycling often happen independently. Accordingly, future research that integrates animal-based energy flow with broad assessments of C flux would allow for explicitly testing the linkage between allochthony and GPP, and create a stronger framework to connect all facets of organic matter processing in lakes (sensu Rüegg et al. 2021).

Conclusions
Our newly compiled datasets highlight common drivers of organic matter processing in lakes, and emphasize that metabolism and allochthony can collectively provide a more complete picture of organic matter source and fate. GPP and allochthony respond nonlinearly to TP and DOC, with each demonstrating large changes at intermediate DOC and TP concentrations. The ratio of DOC: nutrients can help explain organic matter processing, with nutrients promoting GPP and reducing allochthony. As lakes globally experience both eutrophication and browning, understanding mechanisms for organic matter processing is increasingly necessary. For example, lakes recovering from acidification may receive increased DOC loads with only subtle changes in nutrients (Lennon and Pfaff 2005;Monteith et al. 2007;Gerson et al. 2016). Elsewhere, climate and land use change may increase nutrient loads relative to DOC (Mattsson et al. 2009;Duan and Kaushal 2013). An updated paradigm for organic matter processing that accounts for nonlinear dynamics and interactions among nutrients and DOC will better explain differences among lakes, and will guide predictions on how lake organic matter processing responds to environmental change.