Identifying Data Needed to Reduce Parameter Uncertainty in a Coupled Microbial Soil C and N Decomposition Model

Advancements in microbially explicit ecosystem models incorporate increasingly accurate representations of microbial physiology and enzyme‐mediated depolymerization of soil organic matter in predicting biogeochemical responses to global change. However, a major challenge with model structural improvements is the requirement for additional parameters, which are often poorly constrained sources of uncertainty. Furthermore, it is often unclear how to best focus data collection efforts toward reducing model uncertainty. Here, we use Dual Arrhenius Michaelis‐Menten Microbial Carbon and Nitrogen Physiology, a microbially mediated, coupled soil C and N cycling model, as a tool to explore the influence of microbial physiological and enzyme kinetic parameters on model estimates. We first quantify the potential for constraining model parameters using empirical measurements of soil respiration. We then use simulated data to identify which additional sources of data collection from the field would provide the greatest impact for constraining model estimates. We find that modeled soil C and N pools and fluxes are disproportionately sensitive to only a few parameters (e.g., activation energies and microbial CUE), while others exert less influence (e.g., Michaelis‐Menten half‐saturation constants). While some parameters can be constrained by the available data on heterotrophic respiration, the collection of additional data on dissolved organic C and N pools in the soil is identified as a high‐priority data need. Improving our ability to model the interactions of soil microbial physiology, soil chemistry, enzyme activities, and environmental factors on C and N cycling will require closely considering model uncertainties and prioritizing future data collection opportunities based on their impact on model performance.

(Manzoni & Porporato, 2009; Zhang et al., 2020, 2021). Advancements in the representation of N cycling in these models to account for potential N limitation have been shown to improve their capacity to predict C sequestration, which is often overestimated in C-only models (Hungate, 2003; Thornton et al., 2007; Zaehle, 2013). An additional development in modeling C and N cycling has been the incorporation of more explicit representations of soil microbial physiology and the role of microbial groups in producing extracellular enzymes (e.g., Bouskill et al., 2012; Fatichi et al., 2019; Huang et al., 2018; Kyker-Snowman et al., 2020; Schimel & Weintraub, 2003; Sistla et al., 2014; Tang & Riley, 2015; Waring et al., 2013). These advancements have allowed models to elucidate drivers of biogeochemical responses to environmental change (Allison et al., 2010; Sistla et al., 2014; Wieder et al., 2014) and improve their capacity to predict the spatial distribution of soil C (Wieder et al., 2013).
With growing complexity, biogeochemical models are sometimes able to represent microbial C and N cycling more realistically; however, the addition of new model compartments and parameters can also increase uncertainty (Manzoni & Porporato, 2009; Marschmann et al., 2019; Shi et al., 2018). Several recent model comparison studies demonstrate divergent predictions from closely related soil C cycle models. For example, even when given identical forcing data, models disagree on their response to temperature and plant inputs, due to a combination of differences in model forms and parameterization (Wieder et al., 2017). While such divergences are often demonstrated across model intercomparison efforts, it is not always apparent how best to tailor data collection efforts toward reducing model uncertainty, highlighting the need for closer integration between model development and data collection (Sulman et al., 2018; Weintraub et al., 2019; Xie et al., 2020; Zhou et al., 2021).
Soil biogeochemical models are subject to a number of sources of uncertainty, including model structural uncertainty (Ajami & Gu, 2010; Myrgiotis et al., 2018). For example, microbial processes were previously often represented by a single microbial biomass pool, while more recent models have shifted to include multiple functional guilds or distinguish between different microbial functional types (Moorhead & Sinsabaugh, 2006; Sistla et al., 2014; Waring et al., 2013; Wieder et al., 2014). Similarly, the number and types of specific C and N pools represented across models also vary, with a shift toward representing measurable, rather than conceptual, pools (Abramoff et al., 2021; Zhang et al., 2021). Identifying soil biogeochemical model structures that are not only accurate representations of soil microbial functions but also useful and parsimonious tools for making projections and predictions remains an active area of research, with many contrasting approaches to model structure (Fan et al., 2021; Kyker-Snowman et al., 2020; Sainte-Marie et al., 2021; Tang & Riley, 2020; Waring et al., 2020; Zhang et al., 2021) and several model comparison efforts exploring the implications of differences in structure (Sulman et al., 2018; Zhou et al., 2021).
A second source of uncertainty, which often receives less attention than model structural uncertainty, is related to parameters used in models (Luo & Schuur, 2019). Most soil biogeochemical models use fixed parameter values selected ad hoc or derived from site-specific literature measurements. While some model parameters correspond to values that can be directly measured with relative ease (e.g., stoichiometries of soil organic matter and plant litter) or which reflect physical constants (e.g., diffusion coefficients), parameters that describe enzyme kinetics and microbial physiology can be more challenging to directly quantify. Specific enzyme kinetics are often measured in lab assays, but their translation into models has been challenging due to a mismatch between the conceptual or simplified substrates represented in models and the diversity of highly specific extracellular enzymes, each with unique stoichiometries and sensitivities (Drake et al., 2013; Sinsabaugh et al., 2014, 2015). Microbial physiological parameters, such as carbon use efficiency or allocations toward extracellular enzyme production, are also particularly hard to quantify due to metabolic diversity across taxa and challenges with empirical measurements (Ballantyne & Billings, 2018; Dijkstra et al., 2015; Geyer et al., 2016; Hagerty et al., 2018; Saifuddin et al., 2019). Despite large uncertainties in these parameter choices, soil C and N cycle models rarely account for parameter uncertainty explicitly.
In addition to quantifying these distinct sources of uncertainty, identifying which types of data have the potential to reduce uncertainty in soil biogeochemical models is necessary to ensure that data collection efforts are synchronized with efforts to improve model performance. In complex soil biogeochemical models with several pools and dozens of parameters, it may not be readily evident which data sets will most directly reduce model uncertainty. The use of a Bayesian statistical framework has been demonstrated to allow for improved comparisons between soil biogeochemical models (Ajami & Gu, 2010; Xie et al., 2020), and the use of simulated data with a Bayesian approach has been used to identify pool sizes in a soil carbon model (Scharnagl et al., 2010). Here, we develop one approach to quantifying model parameter uncertainty using a combination of real and simulated data to inform future data collection efforts. Specifically, we use a microbially explicit model of soil C and N cycling, Dual Arrhenius Michaelis-Menten Microbial Carbon and Nitrogen Physiology (DAMM-MCNiP), to explore how parameters associated with enzyme kinetics, microbial uptake of C and N, and microbial physiology impact estimates of C and N cycling (Abramoff et al., 2017). Using a combination of sensitivity analyses and a Bayesian statistical approach, we explore the following questions:
(1) Inherent Model Sensitivity: Which model parameters have the greatest impact on estimates of soil C and N cycling in this model?
(2) Bayesian Data Assimilation: Can the assimilation of existing data reduce parameter uncertainty in this model, and what is the impact of reducing parameter uncertainty on model estimates?
(3) Simulated Data to Identify High-Priority Field Studies: Which specific types of data collection efforts have the potential to reduce parameter uncertainty in this model most effectively?

DAMM-MCNiP Model Description
The DAMM-MCNiP model represents coupled soil C and N cycling through soil organic stocks (SOCN), dissolved organic stocks (DOCN), microbial biomass, and extracellular enzyme pools based on soil temperature and soil moisture (Figure 1; Abramoff et al., 2017; Davidson et al., 2011; Finzi et al., 2015). The model utilizes a combination of Arrhenius and Michaelis-Menten kinetic equations to describe the depolymerization of SOCN, resulting in the production of DOCN. A similar set of Arrhenius and Michaelis-Menten kinetic equations describes the incorporation of DOCN into microbial biomass. A series of microbial physiological parameters determines how C and N are partitioned between microbial biomass, enzyme production, respiration, and N mineralization (Figure 1). The model captures seasonal patterns of heterotrophic respiration at trenched (root-free) plots in a temperate hardwood forest (Abramoff et al., 2017). This model structure is similar to those of several microbially explicit biogeochemical models across ecosystem types, making inferences regarding parameter uncertainty and model structure potentially generalizable (Manzoni & Porporato, 2009; Schimel & Weintraub, 2003; Sistla et al., 2014; Waring et al., 2013).
In DAMM-MCNiP, extracellular enzymes produced by microbes depolymerize a fraction of the SOCN pool according to equilibrium chemistry approximation kinetics (Tang, 2015), an approximation of reaction kinetics similar to Michaelis-Menten. SOMavail, the fraction of the total SOCN pool available for depolymerization, depends on the cube of soil moisture (θ), which represents diffusional constraints of DOC substrates and extracellular enzymes in water films (dLiq), and on the fraction of SOCN that is not physically or chemically occluded from enzyme binding (Frac; Magill et al., 2000). The maximum rate of depolymerization is determined by soil temperature according to the Arrhenius function. Thus, a total of three kinetic parameters (Km_dep, a_dep, and Ea_dep) are involved in specifying depolymerization rates based on temperature and the pool sizes of SOCN and enzymes.
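The depolymerization step described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published implementation: the rate form Vmax * S * E / (Km + S + E) is one common way of writing equilibrium chemistry approximation kinetics, and the function names and all numeric values are hypothetical placeholders.

```python
import math

R_GAS = 8.314e-3  # universal gas constant, kJ mol^-1 K^-1


def arrhenius_vmax(a, ea, temp_k):
    """Maximum rate from a pre-exponential factor `a` and activation energy `ea` (kJ/mol)."""
    return a * math.exp(-ea / (R_GAS * temp_k))


def available_som(socn, frac, d_liq, theta):
    """Substrate reachable by enzymes; scales with the cube of soil moisture."""
    return socn * frac * d_liq * theta ** 3


def depolymerization_rate(socn, enzyme, theta, temp_k,
                          km_dep, a_dep, ea_dep, frac, d_liq):
    """ECA-style rate (assumed form): Vmax * S * E / (Km + S + E)."""
    s = available_som(socn, frac, d_liq, theta)
    vmax = arrhenius_vmax(a_dep, ea_dep, temp_k)
    return vmax * s * enzyme / (km_dep + s + enzyme)


# Hypothetical values for illustration only (not the published parameter set).
params = dict(km_dep=0.0025, a_dep=1.0e8, ea_dep=60.0, frac=0.0004, d_liq=3.17)
cool = depolymerization_rate(100.0, 0.01, 0.3, 283.15, **params)
warm = depolymerization_rate(100.0, 0.01, 0.3, 293.15, **params)
print(f"rate ratio for a 10 K warming: {warm / cool:.2f}")
```

Because temperature enters only through the Arrhenius term, the ratio of the two rates depends on Ea_dep alone, which is one way to see why the activation energy dominates the temperature response.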
Following depolymerization, SOCN enters the DOCN pool, which is available for uptake by microbial biomass (Figure 1). Uptake rate is determined by Michaelis-Menten kinetics, which depend on the concentration of DOCN and on the concentration of O2, itself a function of soil moisture. The maximum rate of uptake is specified by the Arrhenius function. Thus, four parameters (Km_upt, Km_O2, a_upt, and Ea_upt) are involved in specifying rates of DOCN uptake based on DOCN pool sizes and soil O2 concentration, which is dependent on soil moisture (Abramoff et al., 2017).
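A comparable sketch for the uptake step is given here. It assumes a dual Michaelis-Menten form (one saturating term for DOCN, one for O2) and an O2 concentration derived from air-filled porosity; both forms, the helper names, and every numeric default are assumptions made for illustration rather than values taken from the model.

```python
import math

R_GAS = 8.314e-3  # universal gas constant, kJ mol^-1 K^-1


def o2_concentration(theta, bulk_density=0.8, particle_density=2.52, d_gas=1.67):
    """O2 at the reaction site from air-filled porosity (assumed form and defaults)."""
    air_filled = max(1.0 - bulk_density / particle_density - theta, 0.0)
    return d_gas * 0.209 * air_filled ** (4.0 / 3.0)


def uptake_rate(docn, o2, temp_k, km_upt, km_o2, a_upt, ea_upt):
    """Arrhenius Vmax multiplied by saturating terms for DOCN and O2."""
    vmax = a_upt * math.exp(-ea_upt / (R_GAS * temp_k))
    return vmax * (docn / (km_upt + docn)) * (o2 / (km_o2 + o2))


# Hypothetical inputs: same DOCN and temperature, drier versus wetter soil.
dry = uptake_rate(0.5, o2_concentration(0.15), 288.15, 0.3, 0.121, 1.0e8, 47.0)
wet = uptake_rate(0.5, o2_concentration(0.45), 288.15, 0.3, 0.121, 1.0e8, 47.0)
print(dry > wet)  # wetter soil leaves less air-filled pore space, so less O2
```

In this sketch soil moisture affects uptake only through O2 supply, which mirrors the dependence described in the text.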
Following uptake, C and N are either retained within microbial biomass, allocated toward enzyme production, or lost through respiration and N mineralization. The model ranks the order of operation for these activities according to: respiration > enzyme production > microbial biomass production > overflow C = N min.
Respiration is first calculated as a fixed fraction of C uptake (1 − CUE). Two separate parameters, p and q, determine maximum allocation of C and N toward enzyme production, respectively, with actual allocations constrained by enzyme stoichiometry according to Liebig's law of the minimum (Liebig, 1842). If excess C or N remains from this initial allocation that cannot be incorporated into enzymes due to stoichiometric demand, the remaining C and N are maximally incorporated into microbial biomass. Finally, if excess C or N remains that cannot be incorporated into biomass due to stoichiometric demand, it is lost as overflow respiration or N mineralization. While these assumptions regarding microbial physiology reflect a simplified representation of microbial resource allocation, they are in line with other microbially explicit biogeochemical models. Note that this model, parameterized with data from trenched (root-free) plots, also lacks explicit representation of root and mycorrhizal uptake of N, which may lead to an overestimation of N availability for microbes relative to stands with intact roots.
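The allocation order described above (respiration, then enzymes, then biomass, then overflow) can be written as a short mass-balanced function. This is a sketch under assumptions: the exact expressions for the enzyme and growth limits are not given in the text, so the `min` rules below are one plausible reading of Liebig's law, and `cn_enz`, `cn_mic`, and the example numbers are hypothetical.

```python
def allocate(upt_c, upt_n, cue, p, q, cn_enz, cn_mic):
    """Partition C and N uptake following the stated order of operations."""
    resp = (1.0 - cue) * upt_c              # growth respiration, fixed fraction of C uptake
    c_avail, n_avail = cue * upt_c, upt_n

    # Enzyme production (C units), limited by whichever element is scarcer.
    enz_c = min(p * c_avail, q * n_avail * cn_enz)
    enz_n = enz_c / cn_enz

    # Biomass production from what is left, again limited by stoichiometry.
    c_left, n_left = c_avail - enz_c, n_avail - enz_n
    growth_c = min(c_left, n_left * cn_mic)
    growth_n = growth_c / cn_mic

    return {
        "respiration": resp,
        "enzyme_c": enz_c, "enzyme_n": enz_n,
        "growth_c": growth_c, "growth_n": growth_n,
        "overflow_c": c_left - growth_c,    # excess C lost as overflow respiration
        "n_min": n_left - growth_n,         # excess N lost as mineralization
    }


out = allocate(upt_c=10.0, upt_n=1.0, cue=0.4, p=0.5, q=0.5, cn_enz=3.0, cn_mic=10.0)
print(out)
```

With these inputs the microbes are C limited after enzyme production, so the surplus appears as N mineralization rather than overflow respiration.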

Estimation of Parameter Sensitivity
DAMM-MCNiP requires a total of 25 parameters (Table 1). A subset of seven parameters is directly involved in the kinetic equations describing depolymerization and uptake, and three parameters are used to represent microbial physiology. The remaining 15 parameters are primarily associated with defining stoichiometries and soil physical properties. We assessed the sensitivity of model outputs to the 10 depolymerization, uptake, and physiological parameters. Each parameter was set to vary from 50% below to 50% above the default parameter value set in previous model publications.
Note. First ten rows show parameters estimated in the present analysis.
This approach scales rates of change by both pool size and parameter size to allow for congruent comparisons across model outputs and parameter types, which can vary by orders of magnitude due to the variety of types of parameters and model outputs.
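One way to compute a scaled sensitivity of this kind is sketched below. The exact index used in the analysis is not reproduced in the text, so the definition here (mean absolute relative change in output divided by relative change in the parameter, over a grid from 50% below to 50% above the default) is an assumption, and `scaled_sensitivity` is a hypothetical helper name.

```python
import math


def scaled_sensitivity(model, default, n_steps=11):
    """Mean |relative output change| / |relative parameter change| over +/-50%."""
    y0 = model(default)
    ratios = []
    for i in range(n_steps):
        value = default * (0.5 + i / (n_steps - 1))
        if math.isclose(value, default):
            continue  # no perturbation at the default itself
        rel_output = (model(value) - y0) / y0
        rel_param = (value - default) / default
        ratios.append(abs(rel_output / rel_param))
    return sum(ratios) / len(ratios)


# Toy outputs: one proportional to the parameter, one quadratic in it.
linear = scaled_sensitivity(lambda k: 3.0 * k, default=2.0)
quadratic = scaled_sensitivity(lambda k: k ** 2, default=2.0)
print(linear, quadratic)
```

Dividing by both the default output and the default parameter makes the index dimensionless, so a pool measured in mg C and a rate constant measured in h^-1 can be compared on the same scale.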

Estimation of Parameter Values Based on Data Assimilation
We utilized a Bayesian statistical framework to explore the potential for constraining parameter uncertainty through data assimilation (Dietze et al., 2014; Lu et al., 2017; Luo et al., 2009; Shi et al., 2015). This approach involves providing prior distributions describing the potential range of values for each parameter of interest. The likelihood of the observed data given particular parameter selections from within the prior distribution is then evaluated. This process is repeated many times to generate a posterior distribution describing how likely particular parameter values are given prior constraints and data.
As per Bayes' theorem (Box & Tiao, 1992), the posterior distribution P(θ|X) of the model parameters θ, given observational data X, can be calculated from the prior distribution P(θ) of the model parameters θ and the likelihood P(X|θ) as:
$$P(\theta \mid X) = \frac{P(X \mid \theta)\,P(\theta)}{P(X)}$$
In this analysis, we utilized broad, uniform prior distributions throughout all simulations (±50% of default parameter values) to explore a wide range of parameter options outside of those currently incorporated in the published model (Abramoff et al., 2017). Differential Evolution Markov Chain Monte Carlo (DEzs-MCMC; Ter Braak & Vrugt, 2008) simulations were performed to evaluate the likelihood across parameter space using the BayesianTools package (Hartig et al., 2018) in R (R Core Team, 2017). We used a normally distributed likelihood function and performed simulations using three chains, each with 20,000 to 100,000 iterations as needed to achieve convergence based on Gelman-Rubin potential scale reduction factors (psrf < 1.3). We either estimated all 10 parameters of interest simultaneously or estimated parameters of a given category (depolymerization, uptake, or physiology) while holding the others fixed.
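The ingredients of this procedure (a uniform prior of ±50% around a default, a normal likelihood, several chains, and a Gelman-Rubin check) can be illustrated on a toy problem. The sketch below deliberately swaps in a plain random-walk Metropolis sampler written with the Python standard library; it is not the DEzs sampler and does not use the BayesianTools package. The one-parameter model y = k * x, the data, and all settings are hypothetical.

```python
import math
import random
import statistics


def log_posterior(k, xs, ys, sigma, lower, upper):
    """Uniform prior on [lower, upper] plus a normal log-likelihood (up to a constant)."""
    if not lower <= k <= upper:
        return -math.inf
    return -0.5 * sum(((y - k * x) / sigma) ** 2 for x, y in zip(xs, ys))


def metropolis(start, n_iter, step, rng, **data):
    """Random-walk Metropolis sampler for a single parameter."""
    chain, current = [], start
    current_lp = log_posterior(current, **data)
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)
        lp = log_posterior(proposal, **data)
        delta = lp - current_lp
        if delta >= 0 or rng.random() < math.exp(delta):
            current, current_lp = proposal, lp
        chain.append(current)
    return chain


def psrf(chains):
    """Gelman-Rubin potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance([statistics.fmean(c) for c in chains])
    return math.sqrt(((n - 1) / n * within + between / n) / within)


rng = random.Random(42)
true_k, default_k, sigma = 2.0, 2.4, 0.5
xs = list(range(1, 21))
ys = [true_k * x + rng.gauss(0.0, sigma) for x in xs]
data = dict(xs=xs, ys=ys, sigma=sigma, lower=0.5 * default_k, upper=1.5 * default_k)

burn_in = 1000
chains = [metropolis(s, 5000, 0.02, rng, **data)[burn_in:] for s in (1.5, 2.4, 3.3)]
pooled = [k for chain in chains for k in chain]
print(f"posterior mean {statistics.fmean(pooled):.3f}, psrf {psrf(chains):.3f}")
```

Starting the three chains at dispersed points inside the prior is what gives the psrf its diagnostic value: the statistic approaches 1 only when chains that began far apart have settled on the same distribution.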
We first assimilated published field measurements of heterotrophic respiration from trenched plots at the Little Prospect Hill tract of the Harvard Forest Long-Term Ecological Research Site (42.58°N, 72.188°W) in Petersham, Massachusetts, USA, to constrain depolymerization, uptake, and physiological parameters in DAMM-MCNiP.
These observations were collected from a mixed hardwood forest dominated by Quercus rubra and Acer rubrum. Soils at these sites are classified as Canton fine sandy loam, Typic Dystrochrepts. Mean annual precipitation at the site is 110 cm and mean annual temperature is 8°C. The respiration data used in this analysis were collected using automated soil respiration chambers monitoring CO2 efflux from the soil at half-hourly increments across the growing season in 2009 (Davidson et al., 2011; Savage et al., 2008). Simultaneous, automated measurements of soil moisture and soil temperature were also recorded (Davidson et al., 2011; Savage et al., 2008). Additional details on the carbon budget of Harvard Forest, placing these fluxes in the context of carbon stocks in the forest, are described in Finzi et al. (2020).
Several of the model pools and fluxes in DAMM-MCNiP represent mechanistic processes upstream of respiration which currently lack comparable, high-resolution observational data (Figure 1). To identify which of these data sets would be most useful for future collection and to explore how additional data constraints might impact parameter estimation, we simulated data for SOCN, DOCN, and microbial biomass pools and respiration rates using the default parameter values set in previous model publications and added normally distributed observation error. Simulated data paralleled the frequency of the observational respiration data, with hourly estimates over the course of a single growing season. We then used these simulated data to estimate parameters and compared the resulting posterior parameter distributions with the known parameter values used to simulate the data. The model was fit to simulated data under five scenarios: (a) all pool size data available, (b) only SOCN pool size data available, (c) only DOCN pool size data available, (d) only microbial biomass data available, and (e) only respiration data available. We calculated the percent difference between the maximum a posteriori (MAP) parameter estimate and the parameter value used in simulation to measure the accuracy of parameter estimation in each data availability scenario. Scenarios with low percent differences indicate that the data assimilation reconstructed the true parameter value with high accuracy, while large percent differences between the MAP and the initial parameter value indicate that the data assimilation did not provide sufficient information for constraining parameter estimation.
We measured the precision of parameter estimates by comparing the range of the posterior distribution (2.5%-97.5% interval) to the range of the prior distribution for each parameter. A reduction in the range of the posterior distribution relative to the prior distribution indicates that assimilation of the provided data has provided some constraint on the range of parameter values. In contrast, no change to the posterior distribution relative to the prior distribution would indicate no reduction in uncertainty from assimilation of the provided data.
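The accuracy and precision measures described in the two preceding paragraphs can be sketched as follows. The exact expressions are not reproduced in the text, so this assumes accuracy is the absolute percent difference between the MAP estimate and the true value, and precision is one minus the ratio of the posterior 2.5%-97.5% interval width to the full width of the uniform prior. The function names and example numbers are hypothetical.

```python
import math


def percent_difference(map_estimate, true_value):
    """Accuracy: absolute difference between MAP and true value, in percent."""
    return 100.0 * abs(map_estimate - true_value) / abs(true_value)


def quantile(values, q):
    """Linearly interpolated quantile of a sample, with q between 0 and 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def uncertainty_reduction(posterior_samples, prior_lower, prior_upper):
    """Precision: 1 - (posterior 95% interval width / prior range)."""
    width = quantile(posterior_samples, 0.975) - quantile(posterior_samples, 0.025)
    return 1.0 - width / (prior_upper - prior_lower)


# A posterior as wide as the prior yields almost no reduction; a narrow one yields a lot.
uninformed = uncertainty_reduction([float(i) for i in range(101)], 0.0, 100.0)
informed = uncertainty_reduction([50.0 + 0.01 * i for i in range(101)], 0.0, 100.0)
print(percent_difference(1.1, 1.0), uninformed, informed)
```

Reporting both measures matters because they can disagree: a posterior can be narrow yet centered on the wrong value, which only the comparison against a known simulated parameter reveals.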

Model Projections With Updated Parameter Estimates
For parameters that could be constrained through assimilation of respiration data, we assessed the potential impacts of new parameter estimates on long-term model projections by comparing model outputs using posterior parameter estimates to model outputs estimated using default parameters. An annual seasonal cycle of temperature, moisture, and litterfall inputs was repeated for 100 years. Annual totals were calculated for respiration rates and annual means were calculated for pool sizes of SOC, SON, DOC, DON, microbial biomass C, and microbial biomass N. Annual estimates at median parameter settings and their uncertainties based on 2.5% and 97.5% posterior parameter intervals were compared to annual estimates at default parameter settings and their uncertainties based on the range of prior distributions.

Parameter Sensitivity
Model outputs are disproportionately sensitive to a few select parameters and largely insensitive to others (Figure 2). Calculated sensitivities, which reflect scaled changes in model outputs relative to scaled changes in model parameters from ±50% of their default values (Equation 6), ranged from 0 to 3.4 × 10^6. All model outputs are most sensitive to either the activation energy of depolymerization (Ea_dep) or the activation energy of uptake (Ea_upt). In terms of output variables, the DOCN pool sizes and respiration rates show the highest mean sensitivity to variation in individual parameters, while microbial biomass pools have the lowest mean sensitivity. SOCN pool sizes are almost exclusively sensitive to Ea_dep, with sensitivities below 0.02 for all other parameters. The low sensitivity of SOCN pools to most kinetic and physiological parameters is amplified by the fact that only a small fraction of total SOCN is available for enzymatic activity due to chemical and physical protection (Magill et al., 2000). DOCN pool sizes show relatively high sensitivities for all parameters except Km_dep, which shows the lowest sensitivity across all model outputs. Microbial biomass pools are most sensitive to Ea_dep, Ea_upt, CUE, and p. Respiration is most sensitive to Ea_upt, because respiration rates fall to near zero when Ea_upt is raised above approximately 95 kJ mol^-1. In contrast, respiration rates are nearly insensitive to the other uptake parameters.

Results of Assimilating Observed Respiration Data
Assimilating direct observations of seasonal heterotrophic respiration data from Harvard Forest resulted in large reductions in parameter uncertainty for Ea_dep, Ea_upt, and CUE (Figures 3 and 4a, Table 2). Although respiration is sensitive to a_dep, this parameter does not show a reduction in uncertainty when provided with respiration data. The parameters used to define allocations of C and N to enzyme production (p and q) show reductions in uncertainty, but are highly skewed toward the maximum values allowed in the prior specification. Thus, assimilating respiration data alone allowed for large reductions in parameter uncertainty for some of the most sensitive parameters (e.g., Ea_dep, Ea_upt, and CUE), while most other parameters require additional data constraints as discussed below.

Results of Parameter Estimation With Simulated Data
Using simulated respiration data with known parameter values was useful for corroborating the patterns observed with field measurements of respiration and identifying the potential for constraining parameter estimates with a more complete simulated respiration data set (i.e., one with no missing timepoints, in contrast to the observational data set). Estimating parameters following assimilation of simulated respiration data with known, default parameter values indicates that more complete respiration data alone can reliably reduce parameter uncertainty with high accuracy for Ea_dep, Ea_upt, CUE, and q (Figure 4b). However, the remaining six parameters showed low accuracy and limited reductions in uncertainty, reinforcing the finding that reducing uncertainty in these parameters would require additional data sources.
Providing the model with simulated data on SOCN pool sizes reduced uncertainty only in Ea_dep, while other parameters showed mostly low reductions in uncertainty and inaccurate MAP values (Figure 4c). These observations are consistent with the strong sensitivity of SOCN pool sizes to Ea_dep (Figure 2). In contrast, assimilating data on DOCN pool sizes reduced parameter uncertainty for several parameters (Km_O2, Ea_upt, q, CUE, Ea_dep) with high accuracy (Figures 4d and 5). Microbial biomass data were primarily useful for reducing uncertainty in Ea_dep and the microbial physiological parameters q and CUE (Figure 4e). The most useful data set for constraining all three depolymerization parameters and all three physiological parameters was DOCN data (Figures 4d and 5). Parameters associated with uptake were the most challenging to constrain, with Km_upt largely unidentifiable by any provided data. While reductions in uncertainty in uptake parameters were relatively small regardless of data type provided, DOCN showed the greatest overall uncertainty reduction and accuracy.

Results of Model Projections With Updated Parameter Values
Incorporating parameter estimates for Ea_dep, Ea_upt, and CUE based on assimilated respiration data results in major changes in estimates of specific pools and fluxes (Figure 6). For example, the posterior estimate for Ea_dep was 4% lower than the default parameter value (Table 2). Incorporating this relatively small decrease in parameter value resulted in large declines in SOCN stocks over time (Figure 6a), while other pools and fluxes equilibrate toward similar values. This reduction in Ea_dep represents a reduced barrier for SOCN depolymerization, which frees a larger fraction of the initial SOCN pool to be released as DOCN before eventually stabilizing. The increased availability of SOCN is temporarily associated with increases in DOCN and microbial biomass. Updating the parameter value for Ea_upt based on the posterior median estimate (a 17% increase in parameter value; Table 2) Updating CUE based on the posterior estimate (a 20% reduction in the parameter value; Table 2) results in sustained declines in microbial biomass, declining SOC stores due to reduced microbial inputs, and temporary increases in respiration, N mineralization, and DOC accumulation due to reduced uptake (Figure 6c).

Discussion
Modeling coupled soil microbial C and N cycling is critical for understanding the drivers of soil organic matter sequestration, losses of C and N through mineralization, and their responses to global change. In particular, incorporating explicit representations of microbial physiology and extracellular enzyme activities has been helpful for developing biogeochemical models that are more representative of the underlying processes (Abramoff et al., 2017; Allison et al., 2010; Fatichi et al., 2019; Kyker-Snowman et al., 2020; Schimel & Weintraub, 2003; Wieder et al., 2013). A major challenge with modeling these processes involves accounting for multiple sources of uncertainty, and few studies explicitly account for the potential impacts of parameter uncertainty in particular when reporting model outputs. Furthermore, efforts to reduce model uncertainty can be improved with closer integration of data collection efforts with an analysis of the impact of available data on model performance (Xie et al., 2020).
We used the soil biogeochemical model DAMM-MCNiP to explore how parameters associated with the enzyme kinetics of depolymerization, microbial uptake of C and N, and microbial metabolism impact estimates of C and N cycling. We then used a Bayesian statistical approach to constrain parameter estimates through data assimilation, and we identified specific parameters that could be constrained through existing respiration data alone. As several parameters required additional data constraints outside of the available respiration data, we used simulated data to identify which additional data sets would be most useful targets for future data collection to reduce model parameter uncertainty. Lastly, we updated model parameters based on data assimilation and explored the potential impacts of these shifts in parameter values on long-term model projections.

Which Model Parameters Have the Greatest Impact on DAMM-MCNiP Model Outputs?
DAMM-MCNiP requires a total of 25 parameters, with 10 parameters involved in depolymerization, uptake, and physiology specifically (Table 1). We focused our analyses on these parameters as they are some of the most challenging parameters to measure directly, in contrast to soil biophysical parameters (e.g., bulk density, stoichiometries of soil organic matter, and plant litter), which are routinely measured. These parameters are associated with some of the most recent advancements in soil biogeochemical modeling, as they are related to the direct representation of enzyme-mediated decomposition and microbial metabolism (Allison et al., 2010; Manzoni & Porporato, 2009; Tang, 2015).
Note. Parameter estimates for all 10 parameters estimated together using field observations of heterotrophic respiration (100,000 iteration MCMC with three chains and burn-in of 1,000).
Prioritizing efforts to constrain model parameter uncertainty should be guided in part by an assessment of which parameters have the largest impacts on model outputs of interest. In DAMM-MCNiP, model outputs are disproportionately sensitive to the activation energies of depolymerization and uptake (Ea_dep, Ea_upt), while other parameters, including the half-saturation constant of depolymerization (Km_dep), show minimal impact on C and N cycling (Figure 2). For example, holding temperature and moisture constant but varying Km_dep by ±50% of its default value alters estimated rates of depolymerization by only 3%. Furthermore, depolymerization acts only on a small fraction of the total SOCN pools, based on the available fraction of unprotected soil organic matter and diffusion dictated by soil moisture. Therefore, the large stocks of total SOCN are essentially insensitive to this parameter in this model. In contrast, varying Ea_dep by ±50% of its default value leads to variation in depolymerization rates over several orders of magnitude, allowing this parameter to have detectable impacts on SOCN pool sizes.
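The contrast between the two kinds of parameter follows from where each sits in the rate expression: an activation energy is inside an exponential, while a half-saturation constant is inside a saturating ratio. The arithmetic below illustrates this with hypothetical values (Ea = 60 kJ mol^-1 at 288.15 K, and a substrate concentration 30 times the half-saturation constant); neither value is taken from the model, and a simple Michaelis-Menten term stands in for the full kinetics.

```python
import math

R_GAS = 8.314e-3  # universal gas constant, kJ mol^-1 K^-1


def arrhenius_factor(ea, temp_k):
    """Temperature-dependent part of Vmax: exp(-Ea / (R * T))."""
    return math.exp(-ea / (R_GAS * temp_k))


def saturation(substrate, km):
    """Michaelis-Menten saturating term: S / (Km + S)."""
    return substrate / (km + substrate)


# Hypothetical defaults for illustration only.
ea0, temp_k = 60.0, 288.15
km0 = 1.0
substrate = 30.0 * km0

# Spread in the rate when each parameter moves from -50% to +50% of its default.
ea_fold = arrhenius_factor(0.5 * ea0, temp_k) / arrhenius_factor(1.5 * ea0, temp_k)
ea_orders = math.log10(ea_fold)
km_spread = (saturation(substrate, 0.5 * km0)
             - saturation(substrate, 1.5 * km0)) / saturation(substrate, km0)

print(f"Ea: about {ea_orders:.0f} orders of magnitude; Km: about {100 * km_spread:.0f}% of the default rate")
```

When substrate is abundant relative to the half-saturation constant, the saturating term is already close to 1, so even large relative changes in Km leave the rate nearly unchanged.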
In DAMM-MCNiP, activation energies (Ea_dep, Ea_upt) and the pre-exponential factors (a_dep, a_upt) are used to calculate the maximum reaction rates (Vmax) for depolymerization and uptake, which are both sensitive to temperature. In the context of global change, it is particularly important to constrain these enzyme kinetic parameters as they directly impact estimates of how SOCN stocks respond to warming. As Vmax increases exponentially in response to temperature, models parameterized according to this formulation predict positive feedbacks to warming. However, substrate supply can also limit reaction rates, resulting in observed rates that are below the predicted temperature-dependent Vmax. Indeed, if higher temperatures are accompanied by greater evapotranspiration and lower soil moisture, then the effect of substrate limitation (Equation 4) could offset the effect of temperature (Equation 5) on observed rates of decomposition. Climatic effects on plant inputs of soil C would also affect SOCN stocks, but are beyond the scope of this study. Collectively, these observations demonstrate the importance of constraining enzyme kinetic parameters and of articulating how uncertainties in Ea, Km, a_dep, and a_upt drive predictions of SOCN responses to global change.

Can the Assimilation of Existing Data Reduce Model Parameter Uncertainty, and What is the Impact of Reducing Parameter Uncertainty on Model Estimates?
In addition to analyzing the sensitivity of model outputs to variation in parameter settings, it is important to pair these analyses with an assessment of parameter uncertainty. Focusing solely on parameter sensitivity can be misleading, as highly sensitive but tightly constrained parameters can be smaller sources of uncertainty in model estimates than less sensitive but poorly constrained parameters (Dietze et al., 2014). Assimilating field data on heterotrophic respiration was primarily useful for constraining three parameters (Ea_dep, Ea_upt, CUE), while the remaining parameters were not reliably identifiable from respiration data alone (Figure 3). These three parameters also had some of the largest impacts on modeled estimates of C and N. Thus, although respiration data alone could not constrain all parameters, they were associated with uncertainty reductions in some of the most critical parameters to constrain.
Microbial physiological parameters like CUE are among the most challenging to directly measure and represent in biogeochemical models due to the diversity of microbial metabolic processes and variation across microbial taxa and substrate chemistry (Saifuddin et al., 2019; Sinsabaugh et al., 2013). The capacity to reduce uncertainty in CUE through the assimilation of heterotrophic respiration data may reflect the view of CUE as an emergent property of several intersecting microbial processes (Hagerty et al., 2018). However, reducing uncertainty in this individual parameter is itself not sufficient for determining whether or not the particular structural representation of CUE in this model and other similar ones is a valid way to simplify multiple sources of variation in emergent CUE, which may be dependent on constituent processes including microbial assimilation efficiency, biomass-specific respiration, and enzyme production (Hagerty et al., 2018).
Parameter estimates identified through data assimilation were sometimes very different from the default parameter settings used in the published model. Even small adjustments to these parameter choices, justified by reductions in uncertainty, could have major consequences for model projections (Figure 6) and their associated uncertainties (Figures 7 and 8). For example, assimilation of the respiration data supported a small (4%) reduction in Ea dep. This minimal change in a single parameter increased rates of depolymerization and reduced SOCN stocks by over 50% within 30 yr compared to the default parameter simulation (Figure 6a). Note that this result simply reflects model sensitivity to decomposition parameter selection rather than a projection of future global change scenarios, in which SOCN stocks would also depend on other changes that may covary with these parameters, such as changes in rates of gross primary productivity. Additional constraints beyond the observed respiration data would help further reduce uncertainty in estimates of Ea dep and other parameters. While respiration was the only direct observational data available to us, using simulated data allowed us to identify the types of data for future collection with the greatest potential for reducing parameter uncertainty.
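The outsized effect of a small change in an activation energy follows from the exponential form of the Arrhenius relationship, k = A exp(-Ea / (R T)). The sketch below computes the ratio of rates after and before a fractional reduction in Ea; the pre-exponential factor A cancels in this ratio. The activation energy of 60 kJ mol^-1 and the temperature of 15 °C are assumed values for illustration only, not the DAMM-MCNiP defaults, and the calculation ignores the substrate, moisture, and pool-feedback terms of the full model.

```python
import math

R = 8.314  # universal gas constant, J mol^-1 K^-1

def arrhenius_rate_ratio(ea_default, fractional_reduction, temp_kelvin):
    """Ratio k_new / k_default when Ea is lowered by the given fraction.

    The pre-exponential factor cancels, leaving exp((Ea_default - Ea_new) / (R * T)).
    """
    ea_new = ea_default * (1.0 - fractional_reduction)
    return math.exp((ea_default - ea_new) / (R * temp_kelvin))

# Assumed, illustrative inputs: Ea = 60 kJ/mol, 4% reduction, T = 288.15 K.
ratio = arrhenius_rate_ratio(60_000.0, 0.04, 288.15)
print(f"rate multiplier for a 4% lower Ea: {ratio:.2f}")
```

Under these assumptions, a 4% reduction in Ea multiplies the temperature-dependent rate by roughly 2.7, which gives some intuition for why a seemingly minor parameter adjustment can substantially change simulated SOCN stocks over decades.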

What Specific Types of Data Have the Potential to Reduce Model Parameter Uncertainty Most Effectively in DAMM-MCNiP?
As respiration data alone could not be used to constrain all model parameters, we explored which potential future data sources would be most useful for reducing parameter uncertainty in DAMM-MCNiP. We found that identifying changes in DOCN pool sizes, particularly reflecting their availability at the site of microbial uptake, could reduce parameter uncertainty for most parameters (Figure 4d), while measuring SOCN pool sizes had limited utility (Figure 4c). DOCN pool sizes were also sensitive to variation in most parameters, in contrast to SOCN pool sizes, which were relatively stable across parameter variation (Figure 2). We note that the present version of DAMM-MCNiP has no diffusivity function for DOCN to the enzyme reactive site, unlike for the depolymerization step. Therefore, while bulk DOCN measurements alone may be useful, it is likely that modeling the diffusion of DOCN, and hence DOCN concentrations at uptake sites, is of equal, if not greater, importance to parameter estimation.
DOCN pools are produced through depolymerization and consumed through uptake, placing them centrally in the model and making them directly responsive to both depolymerization and uptake parameters. Additionally, microbial physiological parameters indirectly impact DOCN pool sizes as the size of the microbial biomass pool impacts rates of uptake. Thus, due to their high connectedness and relatively small size compared to other pools and parameters in the model, DOCN data showed the greatest potential as a single source for constraining multiple parameters. While soil decomposition models differ in their specific representation of C and N pools, our analysis suggests that dynamic pools and fluxes that are located more centrally within model structures are most likely to constrain parameters and are a high-priority for data collection.
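The central position of the dissolved pool can be illustrated with a deliberately simplified steady-state calculation. The sketch below assumes a constant depolymerization input and Michaelis-Menten uptake proportional to microbial biomass, so that the steady-state dissolved pool is DOC* = Km × D / (Vmax × B − D). This is a toy model, not the DAMM-MCNiP equations; the function name, parameter names, and all values are hypothetical.

```python
def steady_state_doc(dep_flux, vmax_upt, km_upt, biomass):
    """Steady-state dissolved pool when input dep_flux balances Michaelis-Menten uptake.

    Solves dep_flux = vmax_upt * biomass * DOC / (km_upt + DOC) for DOC.
    """
    max_uptake = vmax_upt * biomass
    if dep_flux >= max_uptake:
        raise ValueError("no steady state: input exceeds maximum uptake capacity")
    return km_upt * dep_flux / (max_uptake - dep_flux)

# Hypothetical baseline values in arbitrary units.
base = dict(dep_flux=1.0, vmax_upt=0.5, km_upt=2.0, biomass=10.0)
doc0 = steady_state_doc(**base)

def relative_change(name, factor=1.1):
    """Fractional change in the steady-state pool when one input is scaled by factor."""
    perturbed = dict(base, **{name: base[name] * factor})
    return steady_state_doc(**perturbed) / doc0 - 1.0

changes = {name: relative_change(name) for name in base}
for name, change in changes.items():
    print(f"+10% {name}: steady-state DOC changes by {change:+.1%}")
```

In this toy setting the dissolved pool responds to the depolymerization flux, to both uptake kinetic parameters, and to microbial biomass, which is consistent with the argument that a small, well-connected pool carries information about several parameters at once.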
Classical decomposition models represent conceptual C pools with constant decay rates, which can be difficult to measure directly, in contrast to a growing trend of representing actual pools such as DOCN (Manzoni & Porporato, 2009; Robertson et al., 2019). DOCN is composed of amino acids, peptides, and other compounds which can be rapidly consumed by soil microorganisms as both C and N sources (Farrell et al., 2011, 2014; Finzi & Berthrong, 2005; Warren, 2014). A variety of methods exist for measuring this pool in the field, including in situ measurements through lysimetry or microdialysis (Currie et al., 1996; Inselsbacher et al., 2011; Warren, 2014), making it possible to pair measurements of DOCN with respiration measurements for constraining model parameters in the future. Despite these opportunities, collecting data on DOCN pools at a comparable temporal scale to the respiration data used in this study is likely to remain challenging.

Implications for Modeling Soil Biogeochemistry
DAMM-MCNiP combines the effects of temperature, soil moisture, and N on C cycling through the SOCN-microbial system. Several soil biogeochemical models share similar structures to DAMM-MCNiP, featuring multiple distinct soil C pools with transfers among pools mediated by microbially produced enzymes according to Michaelis-Menten type kinetics (Huang et al., 2018; Manzoni & Porporato, 2009; Tang & Riley, 2015; Wang et al., 2013). Thus, our findings related to uptake and depolymerization kinetic parameters and microbial physiological parameters may have similar implications for models with shared core structures. However, it is critical to note that DAMM-MCNiP represents one of many possible model structures, and there exist many contrasting approaches to model structure that affect the function and interpretation of shared parameters (Fan et al., 2021; Kyker-Snowman et al., 2020; Sainte-Marie et al., 2021; Tang & Riley, 2020; Waring et al., 2020; Zhang et al., 2021).
Parameter estimation efforts in biogeochemical modeling have largely focused on aboveground processes, plant-related parameters, or simple decay rate constants for decomposition, while similar approaches in microbially explicit coupled C-N soil biogeochemical models are lagging (Luo et al., 2009). The assimilation of simulated data has previously been used to assess the identifiability of decay rate parameters in the two-pool Introductory Carbon Balance Model (ICBM; Luo et al., 2017) and initial pool sizes in the Rothamsted C model (Scharnagl et al., 2010). Both the ICBM and the Rothamsted C model lack a mechanistic representation of microbial processes, coupled C and N cycling, and enzyme-mediated depolymerization, making it necessary to extend these approaches to explore parameters in more recent soil biogeochemical models. Adopting an approach similar to the one explored here for DAMM-MCNiP will be critical to identifying the specific sources of parameter uncertainty and the most promising opportunities for data collection to reduce parameter uncertainty in related soil biogeochemical models with different structures and parameters.
Current microbially explicit coupled C-N soil biogeochemical models produce widely divergent projections in response to global change due to differences in structure and parameterization (Sulman et al., 2018; Wieder et al., 2017). A comparison of microbial biogeochemical models found that confronting conflicting models with a large synthesis of available field data on SOC responses to global change could not identify which models are most representative, and also highlighted a consistent failure among models to predict certain empirical observations of increased SOC accumulation under warming (Sulman et al., 2018). These challenges indicate a need for more closely integrating data collection efforts with model development, both for model parameterization and for model structural improvements.
Efforts on improving model representations of C and N cycling tend to focus on increasing model structural complexity to include more realistic representations of microbial physiology and enzyme-mediated decomposition. These advancements have been critical for understanding the direct controls of enzyme activities on soil organic matter storage and depolymerization, as well as the role of microbial physiology in regulating C and N cycling; however, it is equally important to consider model uncertainties associated with parameterization. The present study explored sources of parameter uncertainty and opportunities for tailoring data collection toward reducing this uncertainty for one specific model structure; however, it would be necessary to use a similar approach to compare results and opportunities across various soil biogeochemical model structures. Improving our ability to model the interactions of soil microbial physiology, soil chemistry, enzyme activities, and environmental factors on C and N cycling and their responses to global change will require quantifying model uncertainties and closely integrating future data collection with model needs.

Data Availability Statement
The DAMM-MCNiP code developed in this manuscript is archived and publicly accessible in a GitHub repository: https://github.com/rabramoff/DAMM-MCNiPv0. The C efflux measurements used in this study and related metadata can be accessed at the Harvard Forest Data Archive, a freely accessible online archive of measurements