21st‐century biogeochemical modeling: Challenges for Century‐based models and where do we go from here?

21st‐century modeling of greenhouse gas (GHG) emissions from bioenergy crops is necessary to quantify the extent to which bioenergy production can mitigate climate change. For over 30 years, the Century‐based biogeochemical models have provided the preeminent framework for belowground carbon and nitrogen cycling in ecosystem and earth system models. While monthly Century and the daily time‐step version of Century (DayCent) have advanced our ability to predict the sustainability of bioenergy crop production, new advances in feedstock generation, and our empirical understanding of sources and sinks of GHGs in soils call for a re‐visitation of DayCent's core model structures. Here, we evaluate current challenges with modeling soil carbon dynamics, trace gas fluxes, and drought and age‐related impacts on bioenergy crop productivity. We propose coupling a microbial process‐based soil organic carbon and nitrogen model with DayCent to improve soil carbon dynamics. We describe recent improvements to DayCent for simulating unique plant structural and physiological attributes of perennial bioenergy grasses. Finally, we propose a method for using machine learning to identify key parameters for simulating N2O emissions. Our efforts are focused on meeting the needs for modeling bioenergy crops; however, many updates reviewed and suggested to DayCent will be broadly applicable to other systems.


| THERE IS A RICH HISTORY OF CENTURY-BASED MODELS UTILIZING A SIMPLE BUT POWERFUL FRAMEWORK FOR PREDICTING SOIL CARBON POOLS
The Century model was initially used to predict the extent to which abiotic and biotic factors control carbon cycling in temperate grasslands (Parton, Schimel, Cole, & Ojima, 1987). Over time, a large number of contemporary earth system models (Table 1) have adopted the Century framework as the core prediction of carbon cycling in terrestrial ecosystems. It is difficult to find a study comparing the performance of soil carbon dynamics between models that does not include Century or DayCent, the daily time-step version of Century (Parton, Hartman, Ojima, & Schimel, 1998), among the ranks Sulman et al., 2018;Walker et al., 2014;Wieder, Bonan, & Allison, 2013;Ye, Bradford, Dacal, Maestre, & García-Palacios, 2019). Century runs on a monthly time-step and simulates carbon, nitrogen, phosphorous, and sulfur dynamics within soil and vegetation of a specified system. The soil organic matter (SOM) sub-model uses three major pools to model carbon flow through the soil relying on first-order kinetics (decay constants; k-values) and soil texture (Figure 1a). This system consists of discrete pools to represent SOM and litter. SOM is partitioned into active, slow, and passive pools, each with a different intrinsic rate of decomposition which is altered by abiotic factors (i.e., soil temperature and soil moisture; Parton, 1996). Most of the models reviewed in Table 1 share similar methods of simulating soil carbon dynamics using decay constants to drive decomposition and carbon transfer between pools. While some models, such as DayCent, CASA (Wang, Law, & Pak, 2010), and DNDC (Li, Frolking, & Frolking, 1992;Li, Frolking, & Harriss, 1994), include a microbial biomass pool, they do not use microbial biomass, microbial enzyme production, or microbial carbon use efficiency (CUE; the ratio of carbon assimilated that is converted to biomass vs. that which is respired) to drive decomposition rates. The microbial biomass pool simply serves as another carbon pool with a different rate of carbon loss and transfer.
While Century-like soil carbon dynamics have been the staple for many models over the past three decades (Table 1), there is a movement to improve how decomposition is represented in models by progressing beyond k-value driven first-order kinetic equations that lack the complexity necessary to capture smaller scale temporal dynamics that are important for balancing carbon budgets (Wieder et al., 2013). Recent model advances have shown that explicit representation of soil microbial physiology leads to divergent soil carbon flux trajectories not captured by models that rely on first-order kinetics. For example, microbial models have been successful in capturing ephemeral increases in decomposition due to warming (Allison, Wallenstein, & Bradford, 2010), re-wetting (Evans, Dieckmann, Franklin, & Kaiser, 2016), and root-priming (Sulman et al., 2017). Researchers argue that temperature sensitivities are too variable across space to work well with single site observational data and therefore do not allow models of decomposition to be truly mechanistic (Davidson, Janssens, & Luo, 2006). As such, earth system model development (CLM; Table 1) has begun to integrate microbial process-based reverse Michaelis-Menten kinetics (Sulman et al., 2019;Wieder et al., 2013) as well as incorporating vertical connectivity to the Century-based model of decomposition already present in CLM (Koven et al., 2013). However, microbial process-based mechanisms are not being incorporated into the CMIP6 version of CLM. To date, the only CMIP6 model that is incorporating microbial processes (i.e., microbial biomass has an influence on decomposition rates) is IPSL-CM6A-LR which uses the ORCHIDEE-PRIM biogeochemical model.

IS BEING UTILIZED TO IMPROVE MODEL MECHANISMS AND CONSTRAIN MODEL PARAMETERS
The ability of models to accurately predict biogeochemical cycling of bioenergy systems ultimately relies on the quantity and quality of data available for parameterization and validation. When initially developed, Century was validated using ecosystem carbon and nitrogen budgets with the goal of accurately simulating biomass yields and equilibrium soil carbon and nitrogen stocks across grassland ecosystems (Parton et al., 1987). Century used these budgets to distill first principle controls (climate, tissue chemistry, etc.) on plant productivity and SOM decomposition into simple yet powerful mechanistic representations of these complex processes. As technology progressed, particularly with the advent of eddycovariance techniques to quantify ecosystem GHG budgets, Carbon dioxide is lost as respiration when there is a transfer from a pool to one of the live microbial C pools these new data streams were used to push the validation of Century and DayCent beyond ecosystem carbon and nitrogen pools to the fluxes that control them. Given the development and maturity of new empirical techniques, there is now the potential to delve even deeper and refine the mechanistic controls on these fluxes. For example, the ability to rapidly measure key leaf and photosynthetic traits (e.g., leaf mass per area, nitrogen content, V max , etc.) of bioenergy feedstocks can improve plant functional type (PFT) parameterizations of such traits in DayCent. Belowground, the maturity of soil fractionation techniques has led to the development of soil decomposition models (MEMS; Cotrufo, Wallenstein, Boot, Denef, & Paul, 2013;Robertson et al., 2019) that simulate soil carbon fractions that are measurable in the field, thereby replacing the conceptual active, slow, and passive fractions used by Century-based models.

FORMATION, STABILIZATION, AND LOSS ARE NOT INCLUDED IN ECOSYSTEM AND EARTH SYSTEM MODELS
Despite the success of models that rely on decay constants or "k-values" to control the turnover rates of SOM pools, recognition of new paradigms describing the formation, stabilization, and loss of SOM may improve the ability of these models to predict the environmental impact of bioenergy crops (and other agricultural and native plants) as we transition to novel climatic conditions in the future. At the heart of these new theories is the concept that plant-microbial interactions govern the extent to which plant-derived inputs (i.e., root and leaf litter) form stable SOM (Sokol & Bradford, 2019). For bioenergy crops, there are two key plant traits that feedback on SOM persistence. The first is that there is a strong interaction between feedstock litter chemistry and microbial traits. The second is that differences between feedstocks in belowground carbon allocation to rhizosphere microbes can enhance the formation of stable SOM. Feedstock litter chemistry drives keystone microbial traits that control the rate and pathway of SOM formation. For example, feedstocks like corn with low C:N ratio litters promote microbes with high CUE (carbon assimilated into biomass per unit carbon taken up) and fast turnover rates (Cotrufo et al., 2013;Zhu, Liang, Masters, Kantola, & DeLucia, 2018). The resulting pool of dead microbial products is thought to be preferentially sorbed to clay minerals in the soil making it physically protected from microbial attack (Schmidt et al., 2011). By contrast, plants with high C:N ratio litters like Miscanthus require greater enzyme investment for microbial decomposition, resulting in lower CUE and turnover rates. As a result, more SOM is chemically stabilized as particulate organic matter because microbial decay is energetically limited (Castellano, Mueller, Olk, Sawyer, & Six, 2015). Why do these pathways matter for models? SOM that is chemically protected is highly susceptible to loss in a warmer world because rising temperatures lower the activation energy of microbial decay. As such, models that can mechanistically link microbial traits and activity with the distribution of SOM that is physically versus chemically protected are essential to projecting differences between feedstocks in SOM persistence.
To improve DayCent's ability to better represent plantmicrobe interactions and to simulate soil carbon dynamics under future climate scenarios, we plan to integrate microbial explicit mechanisms of the coupled soil carbon and nitrogen cycling model, FUN-CORPSE ( Figure 1; Sulman et al., 2017) into DayCent. The nitrogen portion of the model, FUN (Fixation and Uptake of Nitrogen), estimates the carbon cost of nitrogen to determine carbon allocation from plants to stimulate N cycling in the rhizosphere (Brzostek, Fisher, & Phillips, 2014;Fisher et al., 2010). The SOC model, CORPSE (Carbon, Organisms, Rhizosphere, and Protection in the Soil Environment), integrates microbial explicit controls of SOC transformations with feedbacks between live microbial biomass and decomposition to drive soil carbon efflux and stabilization (Sulman, Phillips, Oishi, Shevliakova, & Pacala, 2014). We select FUN-CORPSE as a guide to modifying DayCent's SOM cycling processes because, unlike many other microbial explicit SOM models, FUN-CORPSE simulates N and well and C cycling, which are both vital to DayCent's plant growth and trace gas models, and has been evaluated against measurements of rhizosphere fluxes and SOC (Sulman et al., 2017(Sulman et al., , 2018. While the mechanisms driving decomposition vary, much of the core structural components of the DayCent SOM cycling serve the similar roles as pools in microbial explicit models, such as FUN-CORPSE. For example, the metabolic litter pools in DayCent serve the same function as the labile carbon pools in FUN-CORPSE. The structural litter and slow SOC pools in DayCent are similar to the recalcitrant or chemically protected pools with varying levels of protection. DayCent's passive pool, defined as microbially derived inputs sorbed to soil clay particles, is representative of physically protected carbon. DayCent's active pool is its representation of live microbial biomass. While there are similarities between these pools, there is also a fundamental need to redefine and reroute flows between pools so they serve more mechanistic functions that align with microbial frameworks. While maintaining a similar structure of the DayCent's SOM cycling sub-model (Figure 1a), we propose redefining pools to match advances in functional understanding in accordance with the MEMS framework (Cotrufo et al., 2013;Robertson et al., 2019). New pool definitions would align with measurable pools that serve a functional role similar to | 779 BERARDI Et Al. those described by Lavallee, Soong, and Cotrufo (2020) to move beyond the conceptual active, slow, and passive framework in Century-based models.
Here, we rename the metabolic surface and rhizosphere litter pools as labile carbon pools to align better with contemporary terminology. They serve the same role as a category of plant residues with simple chemical structure entering the litter or rhizosphere layers. However, the labile pools will now be routed directly into live microbial pools. Recalcitrant plant material (e.g., lignin, cellulose, etc.) will still enter the litter or rhizosphere through pools categorized as structural carbon. Carbon from this pool will pass through the live microbial carbon pools. After microbial processing of structural carbon, this material will be transferred to the chemically protected, recalcitrant carbon pools. Recalcitrant carbon pools can may undergo further microbial decomposition or experience physical transfer of material from the litter recalcitrant pool to the rhizosphere recalcitrant pool. The recalcitrant carbon pools in the litter and rhizosphere will take the place of the slow carbon pools in Century-based models.
Carbon that enters the live microbial biomass pool will either go toward microbial biomass growth or will be lost as CO 2 to the atmosphere based on the CUE of the original pool. The live microbial biomass pool can transfer material to the dead microbial carbon pool as microbial necromass or biproducts. From there, carbon can be transferred abiotically to the Mineral Associated Organic Matter pool (MAOM); this will replace the conceptual passive soil carbon pool in Century-based models. There will be a carbon saturation level of the MAOM pool that will be based on the mineral soil composition (sand, silt, and clay content). Carbon, regardless of the source, must undergo microbial processing before it can be incorporated into the MAOM pool.
We are also integrating mechanisms and parameters from FUN-CORPSE that will incorporate more plant-soil and soil-microbial interactions ( Figure 1b). We will replace DayCent's decomposition function with the microbial explicit, reverse Michaelis-Menten kinetics functions, and add the carbon cost of nitrogen function to determine carbon allocation as root exudates to the rhizosphere carbon pool. The new live microbial biomass pools for each, the litter and rhizosphere allow for the feedbacks between decomposition rate and microbial biomass growth and the transfer of microbial necromass to the MAOM pool. While here, we are describing how microbial processes may be incorporated specifically into DayCent, this may be applied similarly to other Centurybased soil carbon models.
As microbial processes are incorporated into ecosystem models, the challenge will be to attain parameterization and validation data at larger spatial scales with varying microbial and plant communities. For example, to date, FUN-CORPSE has only been validated in Eastern USA deciduous forests. Microbial response has been a focus of many warming studies but changes in microbial CUE, microbial biomass, and enzyme activity vary in direction and duration of response Ye et al., 2019). While studies are addressing the mechanisms driving this variation (Alvarez et al., 2018), more measurements are necessary to validate models across larger spatial scales. Recent studies have found that CUE of microbial communities has a stronger correlation with mean annual temperature (MAT) rather than assay temperature (Sinsabaugh, Moorhead, Xu, & Litvak, 2017;Takriti et al., 2018). Recently, Ye et al. (2019) evaluated using MAT as a driver of CUE and microbial extracellular enzyme kinetics (V max , the maximal activity, and k m , the half-saturation constant) using three microbial-explicit models compared with an early version of the Century soil model (Parton, Stewart, & Cole, 1988). They found that there was a strong positive relationship between CUE and MAT and that the microbial models performed better than the first-order kinetics model. In both DayCent and FUN-CORPSE, each SOM pool has its own CUE, and while it can be tuned during model calibration, direct measurements are often not available. Until measurements of microbial traits become more widespread, developing and testing the use of empirical relationships with microbial traits, such as that between CUE and MAT, is a step toward more data-driven parameterization. As models progress toward including the more process-based microbial mechanisms of decomposition, it will be vital that we continue to expand on field-based studies that can evaluate microbial model parameterizations and results.

LACK S REPRESENTATION OF VEGETATION DYNAMICS AND LANDSCAPE HETEROGENEIT Y OVER TIME
The perennial growth form of many high-yielding bioenergy crops, including Miscanthus (Miscanthus × giganteus), switchgrass (Panicum virgatum), and sugarcane (Saccharum officinarum) pose a special challenge to modelers as endogenous (i.e., aging) and management factors, in addition to the environment, may affect their yields over long periods of time (Figure 2). For several years after planting, perennial bioenergy grasses produce relatively low yields (i.e., establishment effects; Figure 2b) and are more vulnerable to nutrient and water deficiencies than established stands. Because of high costs of crop establishment and for the construction of biorefineries, it is desirable that the yield of bioenergy crops remain high and stable over many years; however, this may not be the case. Sugarcane yields decline precipitously after only two or three cycles of vegetative growth-ratoon cycles (Smith, Inman-Bamber, & Thorburn, 2005) necessitating replanting. Generalizations about the yield stability of Miscanthus and switchgrass are more difficult as few long-term data sets are available. After an initial establishment phase where yields increase for 1-5 years, Arundale et al. (2014) observed a gradual decline in yield for both species thereafter. This study was conducted on rich agricultural soils in the Midwest US without fertilization. After peak yields, reductions also were observed for these species in other regions; however, observed yield reductions are not universal. For Miscanthus, it took longer (>15 years) for these reductions to become apparent in the Mediterranean (Alexopoulou et al., 2015).
Predicting age-related declines in yield remains a key challenge for models. Adding to this challenge, the empirical understanding of the mechanisms driving time-dependent F I G U R E 2 Modeled and observed aboveground biomass of Miscanthus from two studies at the University of Illinois Urbana Champaign Energy Farm. Davis et al. (2010) did not have any Miscanthus validation data at the time of the study. Hudiburg et al. (2015) had biomass data for validation and used daily site weather data through 2012 when the study was finished. In other words, observations at the time of these studies did not have either a drought or aging-induced decline in yield and in the case of Davis et al. (2010), an establishment phase. DayCent's crop model (primarily used for annual crops) was not designed to capture these types of dynamics because annual crops do not have multi-year establishment or multi-year responses to past events. These new observations have led to the development of a new plant sub-model in DayCent that is now being used for the perennial bioenergy crops yield reductions is limited. Excessive tillage at planting followed by soil compaction contribute to the strong decline in sugarcane yield from 1 year to the next (Pankhurst et al., 2003). Nutrient management for Miscanthus and switchgrass is not well understood (Heaton, Dohleman, & Long, 2009). Particularly for Miscanthus, which can yield well in excess of 20 dry Mg ha −1 year −1 (Laurent, Pelzer, Loyce, & Makowski, 2015), removal of nitrogen and phosphorus from the soil as plant biomass is harvested can contribute to declining yields. Miscanthus more than switchgrass is sensitive to hard winter freezes which can damage overwintering rhizomes and reduce yields the following summer (Clifton-Brown & Lewandowski, 2000). Physiologically, the extraordinarily dense canopies developed by these crops can reduce photosynthetic efficiency by shading (Pignon, Jaiswal, McGrath, & Long, 2017). There is ongoing work that is using a staggered stand age to separate age dynamics from environmental effects on Miscanthus physiology and yield (Tejera et al., 2019). Studies such as this will provide insight to develop a more comprehensive and integrated understanding of mechanisms driving these changes, including the role of management practices, edaphic and biotic factors, and endogenous physiological controls of productivity. It is critical to continue incorporating emerging mechanistic improvements as they are identified in the literature.
In addition to an establishment phase for Miscanthus, a decline in yield was observed following the 2012 drought at the University of Illinois Energy Farm (Figure 2; red dots); this decline has continued. DayCent model predictions for Miscanthus yield at the Energy Farm prior to planting (Figure 2a; Davis et al., 2010) neither capture the establishment phase nor a decline in yield (to drought or aging). Model predictions were improved in later modeling studies (Hudiburg, Davis, Parton, & Delucia, 2015) by adding an establishment phase by modifying DayCent's plant growth potential parameter for Miscanthus. However, representing the establishment phase did not improve the ability of the model to capture the response to drought, or yield declines due to aging. The suggested path to addressing age related decline is described in the following section describing long-term plant dynamics in DayCent.
To improve our ability to model drought effects on yield, we must be able to incorporate the ways in which plants respond to drought which varies between feedstocks. Switchgrass pursues a strategy of drought avoidance by investing in root mass to extract deep soil water, whereas Miscanthus has been found to have a variety of physiological responses to drought. When established Miscanthus stands were subjected to an extreme drought in 2012, Joo, Zeri, Hussain, DeLucia, and Bernacchi (2017) observed increased ET driven by deep soil water use by Miscanthus compared to switchgrass and prairie plots. In contrast, another study found that Miscanthus had lower evapotranspiration in 2012 than either switchgrass, corn or mixed prairie (Hamilton, Hussain, Bhardwaj, Basso, & Robertson, 2015). Other research has suggested a lack of sensitivity of Miscanthus to moderate drought, with water use and photosynthesis remaining high through early phases of scarcity and stomatal closure only upon onset of severe drought (Ings, Mur, Robson, & Bosch, 2013). DayCent crop parameters currently allow specification of root allocation responses to water scarcity. However, the rate of consumption of available water is dictated by leaf area and climatic factors that are not specific to crop species. If field results establish clear differences in the water use strategies of perennial bioenergy crops, it may be necessary to introduce a water consumption algorithm that accounts for species-specific responses to drought.

DEVELOPMENTS SHOW IMPROVED SHORT-AND LONG-TERM DYNAMICS
To simulate the dynamics of fast-growing, highly productive biofuel plants which have different dynamics and traits than more traditional crops and grasses, we have developed a new bioenergy grass PFT with additional physiological parameters for the DayCent ecosystem model. Although the new bioenergy grass can represent both annual and perennial crops, it was developed primarily to improve the representation of the long-term dynamics of large perennial biofuel grasses such as sugarcane, switchgrass, and Miscanthus. The grass category differs from the model's traditional crop/ grass plant type by representing aboveground biomass as both stems and leaves instead of as shoots only and representing belowground root biomass as both rhizomes and fine roots instead of as fine roots only. The stems, which contain a larger percentage of structural material than leaves, have a higher C:N ratio than leaves and the amount of leaves that can grow is dependent on the amount of stem biomass that can support them. Phenology, such as when senescence begins, can be prescribed or can be governed by growing degree days and moisture-related triggers. Shading can cause partial senescence of leaves which can either remain attached to the stems or fall to the ground. During the latter part of the growing season, perennial plants build up carbohydrate storage which will be available for growth in the subsequent growing season. Similarly, during senescence large perennial grasses re-translocate nitrogen from leaves to the rhizomes and this stored nitrogen will be available for growth in the subsequent growing season. Thus, the health of the plant in one growing season may affect yields in the subsequent growing season.
New PFT parameterizations in DayCent-CABBI, a version of DayCent developed by the Center for Advanced Bioenergy and Bioproducts Innovation, have improved the model representation of perennial plant establishment and growth (Figure 3; Moore et al., in press). Model-data agreement can be tuned with a comparable amount of success for both DayCent-CABBI and DayCent-Photo, the previous version of the model (Stenzel et al., 2019;Straube et al., 2018). However, to simulate lower yields during establishment years for perennials in DayCent-Photo, the model required three consecutive switchgrass PFT parameterizations to be used with varying growth potential (see table in Figure 3). In this comparison, DayCent-CABBI captured the effects of establishment in a single parameterization of switchgrass since simulated stem and rhizome growth required several years to achieve peak biomass through additional biomass to LAI and maximum LAI parameters. DayCent-CABBI also differentiates carbon and nitrogen allocation between leaves and stems and root tissues, whereas previous versions of the model only differentiate C:N ratios between aboveground and belowground pools.
To improve long-term dynamics, we are investigating whether long-term changes in allocation between aboveground and belowground biomass are altered by prolonged drought and plant age. We are also investigating the best way to represent the age-related declines in Miscanthus and switchgrass production that have been observed in field trials in the USA since we are unsure if these yield declines are a result of reduced soil nutrient availability, changing soil conditions or pests and disease. Furthermore, heterogeneous conditions within a field may reduce average biomass harvest over time. The model already simulates nutrient uptake and therefore simulated yields will respond to nutrient supply, but other age-related productivity declines are not currently represented (Arundale et al., 2014). These changes to model algorithms will also be useful for woody plants, especially age-related changes to allocation of carbon.
The role of carbon and nitrogen stored belowground is also important to understanding stress response in perennial grasses. Eichelmann, Wagner-Riddle, Warland, Deen, and Voroney (2016) found that water-use efficiency of mature switchgrass increased in the 2012 drought year relative to 2011 despite lower CO 2 fixation, suggesting the use of belowground stored carbon to "subsidize" drought year biomass production. The new grass category provides for carbon accumulation in rhizomes which will allow for plants to mobilize carbon stored in roots under drought stress for aboveground biomass production. However, algorithms involved with the new grass category will need to be validated against measurements such as these as they become available to enable accurate prediction of yield stability across wet and dry growing seasons.

UNDERSTANDING ABOUT THE TIMING AND MAGNITUDE OF TRACE GAS FLUXES (e.g., N 2 O) IS INSUFFICIENT FOR MODEL DEVELOPMENT
Given the amplified global warming potential of N 2 O relative to other GHGs, inaccurate estimates of N 2 O fluxes under land use or climate change scenarios represent a large source of uncertainty on terrestrial ecosystemsclimate feedbacks, particularly when considering the agricultural sector (Stein & Yung, 2003). Accurate predictions of F I G U R E 3 DayCent-CABBI and DayCent-Photo (the previous version of the model) evaluated against measurements of peak aboveground and belowground peak biomass of switchgrass at the UIUC Energy Farm over 8 years following the initial planting. Points represent observed data. Error bars are standard error (n = 5). The table provides differences in major crop parameterizations between the old and the new version of the model. DayCent-Photo adjusting the growth potential coefficient during the first 2 years to limit growth during switchgrass establishment, whereas DayCent-CABBI achieves this though LAI parameters. The C:N and LAI values were derived from measurements taken at UIEF (Masters et al., 2016;Smith et al., 2013) | 783 BERARDI Et Al. potential feedbacks are key for evaluating risks and designing mitigation strategies.
Because a direct measurement of all factors and interactions integrating the nitrogen (N) cycle is unfeasible and subject to large variability, models of varying complexity have been developed to reproduce the complex processes driving N dynamics and ultimately N 2 O emissions. DayCent was developed to link to a daily land surface model to improve estimates of trace gas fluxes by incorporating daily soil hydrological and thermal dynamics in conjunction with modified parameterizations to accommodate a higher temporal resolution of ecosystem processes subject to significant daily variability (Parton, 1996;Parton et al., 1998). DayCent simulates N 2 O and NO x emissions from nitrification and denitrification and N 2 emissions from denitrification reproducing the regulation of each process separately. The model assumes that nitrification rates are controlled by the availability in soil of NH 4 + , and soil water content, temperature, and pH (Parton et al., 2001). Maximum nitrification rates occur at close to 50% water-filled pore space (WFPS) and are assumed to decreases as temperature and pH decrease. Denitrification is assumed to be a function of soil NO 3 − (e − acceptor) concentration, labile C (e − donor) availability, and O 2 (competing e acceptor). Heterotrophic respiration is used as a proxy for labile C availability while O 2 availability is estimated from WFPS, and soil physical properties related to texture that influence gas diffusivity (Del Grosso et al., 2000). The DayCent model has been extensively used to generate N 2 O flux estimates for regional GHG inventories, predict N 2 O emissions and N 2 O emission factors across land uses and management practices, and evaluate the impacts of climate change on agriculture (Adler, Del Grosso, & Parton, 2007;Davis et al., 2010Davis et al., , 2012Del Grosso et al., 2006;Del Grosso, Mosier, Parton, & Ojima, 2005;Del Grosso, Wirth, Ogle, & Parton, 2008). However, despite yielding more accurate estimates than the IPCC recommended EF methodology (Del Grosso et al., 2005) and a generally better performance relative to alternative process-based models (Abdalla et al., 2010;Ye et al., 2019), DayCent estimates are not devoid of uncertainty, with reported deviations ranging from −57% to 38% (Abdalla et al., 2010;Del Grosso et al., 2005;Gaillard et al., 2018;Ye et al., 2019). The complexity of model predictions of ecosystem N 2 O fluxes lies largely on the fact that N 2 O fluxes result from multiple processes, whose regulation, drivers, and interactions are still not sufficiently understood or even identified.
The lack of mechanistic understanding of N 2 O dynamics is largely due to the highly variable nature of these fluxes. In the last few decades, it has become apparent that the majority of N 2 O emissions from soil and aquatic systems occurs in hot spots and during hot moments (McClain et al., 2003). Sampling is limited in space and time leading to an uneven representation in the literature (e.g., more accessible vs. less accessible areas; growing season vs. non-growing season), making it difficult to disentangle the primary variables driving the response of N 2 O to changes in the environment.
Predicting ecosystem N 2 O fluxes requires integration of multiple processes operating simultaneously (i.e., nitrification and denitrification). These processes have different lag-times and sensitivities to a wide range of biotic factors (e.g., plant-soil interactions, natural inhibitors of the nitrogen metabolism, and dominant microbial communities) and environmental conditions which are still not fully understood or even identified. The role of soil pH in denitrification metabolism has only recently been recognized (Liu, Mørkved, Frostegård, & Bakken, 2010) and new evidence suggests that the contribution of N 2 O emissions induced by freezethaw cycles to annual emissions has been underestimated by ~20%-30% (Wagner-Riddle et al., 2017). Neglecting to capture these processes in biogeochemical models likely contributes to strong biases in ecosystem N 2 O annual budgets. Moreover, static model site (or grid cell) parameters such as soil texture, drainage, or pH may not be constant as they can evolve over the course of the development of supported vegetation, buildup of SOM, or particularly following disturbance events. A recent review found that there is frequently high N 2 O emissions in the first few years following conversion of annual crops to perennial bioenergy feedstocks (Whitaker et al., 2018). To accurately predict ecosystem N 2 O emissions, models need be able to integrate these legacy effects, where the short-term effects of changes in land use and/ or climate on the ecosystem physical and biological parameters are foundational to long-term responses.
The complex functional relationships between N 2 O and its controlling factors make it difficult for process-based biogeochemical models to predict spatial and temporal N 2 O dynamics. Furthermore, the popular process-based models for N 2 O prediction such as DayCent and DNDC are often parameterized based on limited number of observations from controlled laboratory experiments that do not represent the complex variable interactions. The process-based models often fail to accurately predict fine-scale (daily) temporal N 2 O predictions (Jarecki, Parkin, Chan, Hatfield, & Jones, 2008;Parton et al., 2001). We propose that datadriven machine learning models could be used to improve predictability as well as understand controlling variable sensitivity and identify their functional relationships with N 2 O, which, in turn, could help to refine the process-based models through improved parameterization (Perlman, Hijmans, & Horwath, 2014;Philibert, Loyce, & Makowski, 2013;Reichstein et al., 2019). For example, deep learning models can improve prediction accuracy for poorly understood processes at the expense of interpretability (Brenowitz & Bretherton, 2018;de Bezenac, Pajot, & Gallinari, 2019). Whereas tree-based models such as Random Forest is a non-parametric machine learning technique that learns functional forms between the response and predictor variables from the data (Breiman, 2001). Hence, can provide novel understanding on process controls while improving prediction accuracy. This process can operate in parallel or in fusion with biogeochemical or other process-based models (Walsh & Hudiburg, 2019). Unlike process-based models, the Random Forest algorithm learning is facilitated through an iterative process of recursive data partitioning and constructing hundreds of decision trees to partition the observations into distinct groups characterized by different properties of the predictor variables. Meta-modeling approach using Random Forest and process-based models has been used in predicting soil N biogeochemistry (Ramanantenasoa, Génermont, Gilliot, Bedos, & Makowski, 2019;Shahhosseini, Martinez-Feria, Hu, & Archontoulis, 2019); however, its use in predicting highly variable temporal soil N 2 O fluxes is limited. Saha, Basso, and Robertson (in review) developed a Random Forest model based on automated flux chamber data from corn in the upper Midwest that predicted 51% variability in daily N 2 O fluxes from an unknown site.
High temporal and spatial resolution input data are critical to facilitate this learning process; hence, measured data from diverse soil, climate, and cropping system management practices are critical for model training ( Figure 4). As more long-term observed flux data from automated flux chamber sites become available, the opportunity to use machine learning methods to improve process-based biogeochemical models, like DayCent, is increasing. The predictor variables may include soil properties (texture organic matter, pH), weather variables (precipitation, temperature), management practices (tillage, cover crop, fertilization, and land use change), and dynamic soil biogeochemistry (mineral nitrogen availability, soil moisture, and temperature). As a way forward, we suggest first identifying the available measured N 2 O data sources from diverse soil, climate, and production systems. Second, creation of a database of N 2 O fluxes and associated predictor variables that are empirically identified as important. Process-based models can be used to fill in the gaps in input data with a certain level of confidence to facilitate an integration of data-driven and physics-based N 2 O modeling approach ( Figure 4). Third, machine learning models should be trained and then tested with separate validation data. Fourth, performance comparisons between the process-based models and Random Forest method in predicting N 2 O fluxes can be used as a metric of improved model confidence. This method could extend to other ecosystem fluxes that are also difficult to model under quickly changing environmental conditions such as CO 2 from heterotrophic respiration and CH 4 production and consumption.

| CONCLUSION
Modeling is a powerful tool for generating hypotheses and predictions about how major changes in land use, driven by the expansion of bioenergy crops and other changes in managed landscapes, will affect the land-atmosphere exchange of GHGs. Century-based models have been keystone models representing processes affecting SOC and GHG exchange. Assimilation of new understanding of plant physiology, new paradigms explaining changes in SOC, the ability to quantify feedbacks between the microbial community and decomposition, and the use of machine learning to identify and optimize key parameters when high-resolution data is lacking will improve the predictive power of DayCent when confronted with rapidly changing environmental conditions. F I G U R E 4 A schematic representation of integrating machine learning and biogeochemical simulation models to predict N 2 O fluxes. The left broken box represents the development of machine learning model by learning from the data. To improve the machine learning model's ability to generalize predictions, the model should be trained and tested on data from diverse environmental and agronomic production systems. The right solid box represents output generation from process-based simulation model. The relevant outputs can serve as input variables for the machine model to predict N 2 O fluxes at the same resolution as the simulation model (adapted from Saha, Basso, & Robertson, in review)