Toward Robust Parameterizations in Ecosystem‐Level Photosynthesis Models

In a model simulating dynamics of a system, parameters can represent system sensitivities and unresolved processes, therefore affecting model accuracy and uncertainty. Taking a light use efficiency (LUE) model as an example, which is a typical approach for estimating gross primary productivity (GPP), we propose a Simultaneous Parameter Inversion and Extrapolation approach (SPIE) to overcome issues stemming from plant‐functional‐type (PFT)‐dependent parameterizations. SPIE refers to predicting model parameters using an artificial neural network based on collected variables, including PFT, climate types, bioclimatic variables, vegetation features, atmospheric nitrogen and phosphorus deposition, and soil properties. The neural network was optimized to minimize GPP errors and constrain LUE model sensitivity functions. We compared SPIE with 11 typical parameter extrapolating methods, including PFT‐ and climate‐specific parameterizations, global and PFT‐based parameter optimization, site‐similarity, and regression approaches. All methods were assessed using Nash‐Sutcliffe model efficiency (NSE), determination coefficient and normalized root mean squared error, and contrasted with site‐specific calibrations. Ten‐fold cross‐validated results showed that SPIE had the best performance across sites, various temporal scales and assessing metrics. Taking site‐level calibrations as a benchmark (NSE = 0.95), SPIE performed with an NSE of 0.68, while all the other investigated approaches showed negative NSE. The Shapley value, layer‐wise relevance and partial dependence showed that vegetation features, bioclimatic variables, soil properties and some PFTs determine parameters. SPIE overcomes strong limitations observed in many standard parameterization methods. We argue that expanding SPIE to other models overcomes current limits and serves as an entry point to investigate the robustness and generalization of different models.

and a lack of ecological considerations as additional constraints (Bloom & Williams, 2015). Although model parameters contribute to considerable uncertainties, most global vegetation models are parameterized using fixed, biome-or plant functional type (PFT)-based values, which alone cannot capture the spatial variability of carbon assimilation processes (Bloom et al., 2016), arguing for considering aspects related to ecosystem development stage and biodiversity. The fixed and PFT-based parameterization are also widely used and introduced uncertainties in gross primary productivity (GPP) models (Groenendijk et al., 2011;Ryu et al., 2019), including light use efficiency (LUE) models, leaf-scale-process-based models, machine-learning and sun-induced fluorescence models (Frankenberg et al., 2011;Jung et al., 2011;Running et al., 2004;Tian et al., 2020;Zhang et al., 2012). A more robust and physically intuitive parameterization method is desired for constraining the global GPP estimation.
LUE models are typical approaches for the estimation of GPP at large global scales (Mahadevan et al., 2008;Potter et al., 1993;Running et al., 2004;Tian et al., 2020;Yuan et al., 2007). These models incorporate the knowledge of environmental constraints to the originally proposed empirical LUE model, Monteith's model (1972), having the advantages of high efficiency and algorithmic transparency compared to leaf-scale-process-based models and machine-learning-based models, respectively.
The first global GPP product based on the MODIS LUE algorithm (Running et al., 2004) proposed a set of PFT-dependent parameters. Later, other published global LUE models inherited the PFT-based parameterization method or incorporated parameters directly extracted from literature (He et al., 2013;Mahadevan et al., 2008;Xiao et al., 2004;Xie & Li, 2020). The PFT-based approach is not exclusive to LUE models but is also commonly used in dynamic global vegetation models and the land surface schemes of global Earth system models supporting IPCC reports Masson-Delmotte et al., 2021;T. Stocker, 2014;Wenzel et al., 2014). However, applying parameterizations in locations or regions, which have not been previously used or evaluated against observations, might easily lead to naïve conclusions about model structure robustness and/or parameter generalization. To overcome this limitation, LUE models usually calibrate parameters within their physical ranges to minimize the mismatch between the modeled and the observational GPP (Carvalhais et al., 2008;Horn & Schulz, 2011a;Mäkelä et al., 2008;Yan et al., 2017;Zhou et al., 2016), that is, the GPP estimated from eddy covariance (EC) carbon flux. This method is supported by the availability of EC flux towers, given the need for parameter generalization limited for the global domain. To parameterize the Carnegie-Ames-Stanford approach (CASA) model in locations without EC observations, Carvalhais et al. (2010) considered in situ parameterizations from EC sites with the same PFT and similar climate and vegetation features. An alternative approach is to apply globally fixed parameters from literature (Mengoli et al., 2022;H. Wang et al., 2017) or globally optimized parameters (B. D. Stocker et al., 2020;Yuan et al., 2019). Another option to extrapolate parameters is based on clustering approaches. The clustering approaches denote using the average site-level optimized parameters per PFT (Guan et al., 2022;Yuan et al., 2014a;Zhou et al., 2016) or PFT-specific optimized parameters (Tian et al., 2020;Zheng et al., 2020). Yuan et al. (2014b) showed using six different LUE models that the modeled GPP using globally optimized parameters was not significantly different from that using PFT-specific optimized parameters. However, it has been illustrated that at least parameter variations between PFTs need to be considered to reach confident model performances (Tian et al., 2020). In general, most studies did not account for parameter variances within PFT, despite assuming that LUE model parameters are related to characteristics of the vegetation and the environment.
In some studies, the drivers for spatial changes of model parameters were analyzed based on site-level calibrated parameters. For example, Horn and Schulz (2011b) found that their LUE model parameters, which represent the maximum LUE and the sensitivity of GPP to temperature and soil moisture, varied across climate zones and biomes and can be predicted using vegetation and environmental properties. The relationship between parameters and plant traits also existed in process-based GPP models (Peaucelle et al., 2019). Moreover, Luo and Schuur (2020) claimed that model parameters, which can represent both the evolving ecosystem properties and the unresolved ecosystem processes, should be determined according to our knowledge about the changing ecosystem properties. All these studies confirmed the control of vegetation attributes and environmental features on model parameters, which represents GPP sensitivities. These findings inspire the possible next generation of parameterization methods based on the physical connection between model parameters and ecosystem properties.

10.1029/2022MS003464
3 of 21 conditions to parameter spatial variability. We allow the spatial variation of parameters, indicating GPP sensitivities to environmental forcings, to be learned from a set of ecosystem properties. To test the approach, we compared 12 different parameterization methods (see details in Section 2.2) based on an exemplified LUE model with appropriate environmental drivers and sensitivity functional forms selected from an ensemble of 5600 LUE models (Bao et al., 2022). These parameterization methods were assessed according to the accuracy of the simulated GPP (GPP sim ) across different time scales at the site level, per PFT, per climate type and globally (see Section 2.3). The importance of drivers for model parameters was evaluated using three different methods (see Section 2.4).

Light Use Efficiency Model
LUE models define GPP as a product between photosynthetically active radiation (PAR), the fraction of absorbed photosynthetically active radiation (FAPAR) and the maximum LUE (ε max ), regulated by environmental sensitivity functions. The environmental drivers and sensitivity functional forms, representing the responses from the photosynthesis processes, differ across LUE models and can affect the model's performance. To minimize the selection effect of environmental drivers and sensitivity functions, we select a LUE model based on Bao et al. (2022). The study evaluated a large set of LUE models with various combinations of drivers and functional forms against GPP observations across various PFTs and climate types. The selected model was shown to be the best one compared with those LUE models with different drivers or sensitivity functional forms. In summary, the LUE model considers the impacts of temperature (T), vapor pressure deficit (VPD), atmospheric CO 2 concentration (c a ), soil moisture (W), light intensity (L) and the cloudiness index on GPP dynamics (see Equations 1-8, and Bao et al. (2022)).
The LUE model includes 13 parameters in total (in bold). All sensitivity functions (i.e., fT, fVPD, fW, fL, and fCI) are scaled from zero to one, representing from strong to no constraints. The physical meanings and units of the parameters and references of these sensitivity functions are summarized in Table 1.
The sensitivity function of VPD (fVPD) includes the effect of both VPD and c a , which jointly control the leaf internal CO 2 concentration. The pure CO 2 fertilization effect is described only by the right part of fVPD (i.e., the sum of one and c a function). The product of PAR and FAPAR, that is, the absorbed photosynthetically active radiation (APAR), is the estimate of the light energy intercepted by the vegetation canopy, and thus was used as the light intensity input of the sensitivity function of L (fL). The T and W were temporally filtered using lag parameters, α T and α W , at boreal and arid regions, respectively, according to Horn and Schulz (2011a).

Forcing Data and Parameter Calibration
The forcing data for the LUE model was collected at 196 EC sites (listed in Table S1 of the Supporting Information S1) from FLUXNET (www.fluxnet.org). The detailed sources and algorithms of the forcing data are 10.1029/2022MS003464 4 of 21 summarized in Table S2 of Supporting Information S1. The GPP estimated from the observed net ecosystem exchange at EC sites (GPP obs ) were also collected to calibrate parameters and evaluate parameterization methods.
In the parameter calibration process, we added constraints on GPP simulation errors and sensitivity functions (e.g., fT) following the concept of ecological and dynamic constraints (see details in Text S1 of the Supporting Information S1). As a derivative-free global searching algorithm, CMAES (Hansen & Kern, 2004), was used to search the optimal parameters in its physical range according to the full-time series of GPP obs . We assumed that the simulated GPP using the calibrated parameters (GPP calib ) can reach the model potential (i.e., the highest model performance). Thus, the performance of the site-level calibrations ("site-calib") was set as the benchmark for other parameterization methods in cross-validation (will be introduced in Section 2.2).

Input Features of Ecosystem Properties for Predicting Model Parameters
To extrapolate the parameters across various sites and in the future to the global scale, we collect mainly the variables that can represent the ecosystem properties available at both local (i.e., site-level) and global scales. These variables include the PFT, climate classification types (Clim; Rubel et al., 2017), 19 bioclimatic variables (BIO1-19, classified as "BioClim"; Xu & Hutchinson, 2011), two aridity features (AI1-2, classified as "BioClim"), 11 vegetation index features (VIF1-11, represented by "VIF"), atmospheric Nitrogen and Phosphorus deposition (NdepNHX, NdepNOY and Pdep, summarized as "NPdep"; R. Wang et al., 2017) and 17 soil properties (classified as "Soil"; . In other words, we used six classes of input features which are shorted to BioClim, Clim, NPdep, PFT, Soil and VIF. The details of the input features are summarized in Table S3 of Supporting Information S1. The categorical variables (PFT and climate types) were converted to one or zero to indicate whether the target location belongs to a specific type or not. All non-categorical variables were normalized by subtracting the mean of each feature and dividing by the standard deviation (Equation 9). The original and normalized features are represented by var and var nor . The mean and standard deviation per feature are represented by mean and std, respectively.

Parameterization Methods
We extrapolate the parameters based on a ten-fold cross-validation strategy using the collected input features. Namely, the samples, referring to the EC sites, were divided into 10 groups randomly (see the group number of  Table S1 of the Supporting Information S1). We trained the parameterization models using nine of 10 groups and validated the result using the one left, and repeated 10 times until we got validated results from all sites. All PFT and climate classification types (11 PFT and 14 climate classification types in total, see Table S1 in Supporting Information S1) were included in each training data set.
The 12 parameterization methods can be divided into six groups, that is, univariate clustering, similarity-based, optimization-based, regression-based, our hybrid approach based on a neural network and globally fixed methods (see details in Sections 2.2.1-2.2.6).
2.2.1. Univariate Clustering Methods ("PFT mean ", "Clim mean ", "PFT med ", and "Clim med ") In regions without observational data, the parameters were determined by the arithmetic mean from the calibrated parameters at the sites with the same PFT (Guan et al., 2022;Yuan et al., 2014b;Zhou et al., 2016). Here we tested the methods using the mean and median parameters per PFT and climate type. "PFT mean ": means for calibrated parameter vectors per PFT; "Clim mean ": means for calibrated parameter vectors per main climate type; "PFT med ": medians for calibrated parameter vectors per PFT; "Clim med ": medians for calibrated parameter vectors per main climate type.

Similarity-Based Method ("Site sim ")
The site similarity is defined by Carvalhais et al. (2010) which measures the similarity (D) of the climate and vegetation features between site i and site j as in Equation 10: Here, V i is a vector including the normalized daily mean of the air temperature, precipitation (in logarithm), global radiation and LANDSAT-based normalized difference vegetation index (NDVI, see data source and processing method in Table S2 of the Supporting Information S1) between 1986 and 2015 at site i. N is the vector length (=366 × 4) and refers to the mean of V i .
To determine the parameters of a target location, we calculated D for each training site within the same PFT as the target location. The parameter vector at the site with the maximum D was used.
"Site sim ": parameter vectors for each site from the most similar site.

Optimization-Based Methods ("OPT-All" and "OPT-PFT")
The parameters can be optimized across all sites or at sites per PFT (Yuan et al., 2014a). Here we adopted the same algorithm, CMAES, and the same cost functions as the site-specific calibration method (see Text S1 in Supporting Information S1). "OPT-All": a parameter vector optimized using all sites in the training data set (Yuan et al., 2014a). "OPT-PFT": parameter vectors per PFT optimized using the sites within the same PFT in the training data set (Kuppel et al., 2012;Tian et al., 2020).

Regression-Based Methods ("sRF", "mRF", "mNN")
To test the assumption that calibrated parameters are determined by ecosystem properties, here we predict the calibrated parameters using the normalized features based on different regression methods. "sRF": parameter vectors per site of which each parameter was predicted independently based on the single-output random forest (trees number = 100; Breiman, 2001). "mRF": parameter vectors per site predicted simultaneously based on the multi-output random forest (trees number = 100; Pedregosa et al., 2011). "mNN": parameter vectors per site predicted simultaneously based on the multi-layer perceptron neural network (hidden layers number = 2, neurons number = 16; Gardner & Dorling, 1998;McCulloch & Pitts, 1943).

Simultaneous Parameter Inversion and Extrapolation Based on Ecosystem Properties ("SPIE")
Models can suffer from equifinality problems, namely different parameter vectors can generate similar model performance. Consequently, the calibrated parameters may not represent the true parameters to simulate GPP, which reflect the GPP sensitivities controlled by the environmental properties. Here we additionally test the assumption that the predicted parameter vector based on ecosystem properties, which might differ from the calibrated parameter vector, can simulate GPP with good accuracy. Instead of directly predicting parameters constraining by calibrated or literature-referenced values, we applied the neural network to predict the parameters of a LUE model which is constrained by GPP obs . The flowchart for this method is shown in Figure 1.
The sites were split into 10 groups according to the ten-fold cross-validation strategy to ensure robustness evaluation. Each training group includes all kinds of PFTs and climate types. For each group of training sites, the neural network takes the normalized site-level input features (Table S3 in Supporting Information S1) to predict simultaneously the individual site-level parameter vector (Table 1) for the LUE model to estimate GPP per site. GPP sim is assessed against GPP obs based on a custom cost function (see the equation of "cf NN " in Text S2 of the Supporting Information S1). Further, parts of the cost function are the constraints of the sensitivity functions (see the equations of "cf 3 " and "cf 4 " in Text S1 of the Supporting Information S1). To optimize the neural network, we calculate the gradient of the cost function to each parameter and backpropagate the gradient to the weight and bias terms of each hidden layer with the gradient descent algorithm. Afterwards, the weight and bias of each neuron are optimized using the ADAM algorithm (Kingma & Ba, 2014). Either if the number of epochs reaches 2 × 10 3 or the cost function is below a defined threshold (ε = 10 −2 ), the optimization process stops. The hyperparameter of the neural network was tuned using a grid-searching approach (not shown here) which resulted in the following Figure 1. Flowchart of the SPIE approach. We rely on the learning of a neural network, that outputs parameter vectors per site, contingent on site-level input features (introduced in Section 2.1.2), and constrained by observation-based gross primary productivity fluxes and sensitivity functions (see the cost function definitions in Text S2 in Supporting Information S1). The training process stops once the tolerance value for the cost is passed (ε = 10 −2 ) or a maximum number of epochs are reached (N e = 2 × 10 3 ).
values: learning rate = 10 −3 , L 2 regularization coefficient = 10 −4 , mini-batch size = 32, neurons per layer = 16 and hidden layers = 2. We apply the dropout strategy to the input and hidden layers, meaning that the neural network randomly deactivates a certain number of neurons in each batch. This leads to more robust predictions due to the avoidance of the overfitting problem (Srivastava et al., 2014). The output from the full pipeline is the trained neural network, which is then used to predict the parameter vector per site in test data sets. Finally, the LUE model with the predicted parameters is forced to simulate GPP and sensitivity functions (fT, fVPD, fW, fL, and fCI).
"SPIE": parameter vectors per site predicted based on ecosystem properties and constrained by both GPP errors and sensitivity functions of the LUE model.

Globally Fixed Method ("P-Model")
The P-model (B. D. Stocker et al., 2020;H. Wang et al., 2017) is derived based on Farquhar et al. (1980) and Fick's law together with an optimality theory (Prentice et al., 2014). All of its parameters are extracted from literature and upscaled from leaf-scale processes, therefore fixed globally. Mengoli et al. (2022) improved P-model from the original version by adding an acclimation process to the maximum rate of Rubisco activity and maximum rate of electron transport. Here we ran the Mengoli model based on daily data (introduced in Section 2.1.1). The P-model parameters were kept the same (acclimation window size = 15, running-mean approach based on daily data; Mengoli et al., 2022

Statistical Analysis for Parameterized Results
All the parameterization methods were assessed according to the GPP accuracy measured by Nash-Sutcliffe model efficiency (NSE, (−∞,1]; NSE = 1 indicates a perfect model), determination coefficient (R 2 , [0,1]; R 2 = 1 indicates a perfect model) and normalized root mean squared error (NRMSE, [0,∞); NRMSE = 0 indicates a perfect model) which is equal to the root mean squared error divided by the mean observational variable (e.g., GPP obs ). Only good-quality data were used to calculate NSE, R 2 , and NRMSE. Here the good-quality data refers to the input vector of which the quality assessment flags (see "QA" and "QC" in Table S2 of the Supporting Information S1) are all higher than 0.8, including quality assessment flags of all meteorological inputs, vegetation index and GPP obs . When aggregated to longer time scales, the good quality data means the average quality flags are all higher than 0.7 at the weekly and monthly scales, and 0.5 at the yearly scale. Besides, predicted parameters were compared to calibrated parameters to test if the model equifinality problem exists.

Site-Level Temporal GPP Assessment
We forced the LUE model at the daily scale and got the daily GPP sim as a result. The weekly, monthly and yearly GPP sim and GPP obs were calculated based on the mean daily GPP sim and GPP obs , respectively. These time series of site-level GPP at different time scales were evaluated using NSE, R 2 , and NRMSE. The vectors of NSE, R 2 , and NRMSE were compared across all sites, per PFT and climate types.

Spatial Variability of GPP Assessment
The site-mean GPP obs across sites represent the spatial variance of GPP. We used NSE, R 2 , and NRMSE to measure the accuracy of the site-mean GPP sim compared with GPP obs to evaluate the ability of these parameterization methods to capture the spatial variability of GPP.

Comprehensive Assessment Across Spatio-Temporal Scales Based on Model Likelihood
The likelihood of each parameterization method, P, was calculated according to Bao et al. (2022). To avoid selecting a method falling shortly at locally describing ecosystem GPP, P represents an overall performance at 200 different site groups. In each group, 100 sites were selected randomly from all sites and 2-year GPP was then randomly extracted from each of these 100 sites. The 200 site-years GPP sim were compared to GPP obs based on NSE, R 2 , and NRMSE at each site-year independently. The differences between the daily, weekly, and monthly NSE, R 2 , and NRMSE vectors and the yearly NRMSE vectors per parameterization method (each with 200 elements) were tested using Kolmogorov-Smirnov statistical test and t-test. The method with statistically higher NSE, R 2 or lower NRMSE than others was given with the largest score (=1, otherwise = 0) at each site group. In case that two or more methods were statistically equal and better than others, NSE, R 2 , or NRMSE across all site-years was additionally computed to sort these methods independently. P is equal to the average score across all site groups. The average P across different statistical metrics (NSE, R 2 , and NRMSE) and time scales (daily, weekly, monthly, and yearly) was used to detect the best parameterization method.

Comparison Between Predicted Parameters and Calibrated Parameters
Since the 13 parameters have different meanings and ranges, they were compared independently. The similarity between the predicted parameters using methods introduced in Section 2.2 and the calibrated parameters based on the observational data (see Text S1 in Supporting Information S1) was assessed using NSE, R 2 , and NRMSE.

Feature Importance Estimation
We evaluated the importance of input features using three methods and selected the most important features based on the average normalized feature importance values.

Shapley-Based Feature Importance (SHAP)
The Shapley value of a feature is calculated based on the deviation of the predicted parameter at a certain input from the average prediction (Lundberg & Lee, 2017), which represents the contribution of a feature to the output. Here SHAP is equal to the average absolute Shapley value across all inputs. The average SHAP across all cross-validation groups (Friedman, 2001) was used to assess the contribution of features for each parameter and all parameters.

Layer-Wise-Relevance-Propagation-Based Feature Importance (LRP)
The layer-wise relevance propagation refers to a strategy that allows decomposing the prediction of the neural network over an input feature (Montavon et al., 2019). It is usually used in deep classification neural networks, here we applied LRP to assess a shallow regression neural network. We calculated the relevance vector according to Bach et al. (2015) and measured the feature importance according to the average relevance across different cross-validation groups.

Partial-Dependence-Based Feature Importance (PD)
We estimated the partial dependence of the prediction on each input feature based on Friedman's (2001) algorithm. The feature importance, PD, was measured according to the partial dependence, which is equal to the standard deviation of the partial dependence if the input feature is non-categorical variables, otherwise is equal to the one-fourth of the absolute partial dependence range (Greenwell et al., 2018).

Temporal and Spatial Assessment
The parameterization method based on the neural network aiming at minimizing GPP errors and constraining sensitivity functions, SPIE, had the best performance compared with other typical parameterization methods. All the assessing metrics at daily, weekly, monthly, and yearly scales, NSE (Figure 2), R 2 ( Figure S1 in Supporting Information S1) and NRMSE ( Figure S2 in Supporting Information S1), showed that SPIE was better at more sites (i.e., more bright color blocks in Figure 2). The spatial variability of GPP can be also better captured by SPIE, which had higher NSE, R 2 , and lower NRMSE measured by site-mean GPP obs and GPP sim (Figure 3). The accuracy of time series and site-mean GPP sim using other methods were all significantly worse than SPIE. The results of these parameterization methods were also compared to those of site-specific calibrations ("site-calib", see details in Section 2.1.1). In the following analyses, the performance of "site-calib" is set as the benchmark of the cross-validation exercises since it represents the maximum NSE would reach in any of the out-of-sample prediction approaches. Although SPIE cannot perform as well as the site-specific calibration, it is the best parameter extrapolation method globally.
The global best parameterization method, SPIE, outperformed across various PFTs and climate types. It had the highest daily NSE quantiles for each PFT and climate type considered in this study (Figure 4). While SPIE was relatively better than other methods, no extrapolated parameters can provide accurate GPP dynamics (NSE > 0.4) at closed shrubland (Figure 4c), woody savanna (Figure 4l), tropical ( Figure 4m) and polar ( Figure 4q) climate types given that the model using calibrated parameters was good. It demonstrated that the variance of current extrapolated parameters was still insufficient and the parameters are possible to be overfitted in site calibrations.
Using R 2 or NRMSE as the assessing metric (see details in Figures S3 and S4 of the Supporting Information S1), the parameterization methods showed smaller but robust relative differences, that is, the SPIE was still the best method. In general, while none of these parameter extrapolation methods can reach the highest model performance, SPIE was the best option for areas without observational data.
The model likelihood, P, which represents the likelihood of a model statistically better than others across various site groups, illustrated that SPIE was the best method to extrapolate parameters, followed by OPT-All and Clim med with likelihoods lower than 0.06 (i.e., at less than 6% groups of sites the two methods can outperform). The average P of SPIE across daily, weekly, monthly, and yearly scales, and across various assessing metrics was also significantly higher than the other methods. It represented that the method is robust across multiple temporal and spatial scales.

Differences Between Calibrated Parameters and Predicted Parameters
The predicted parameters displayed different distribution patterns from the calibrated parameters. Taking the best method, SPIE, as an example (Figure 6), ranges of predicted parameters were narrower than calibrated parameters given the same predefined range. Further, SPIE predicted parameters had no "edge-hitting" problem, which means that the parameter frequently reaches its maximum or minimum values, for example, the calibrated parameters T opt , k T , C κ , C a0 , C m and k W (gray bars in Figures 6b-6c, 6f-6h, and 6j). The other parameterization methods based on ecosystem properties (e.g., mRF, Figure S5 in Supporting Information S1) also showed narrower ranges. However, clustering and optimization-based methods had similar ranges to the site-specific calibrations and more edge-hitting instances than SPIE (see details in Figures S6 and S7 of the Supporting Information S1).
The LUE model performance was not determined by the parameter difference to site-specific calibrations. For example, NSE between the predicted parameters using SPIE and calibrated parameters across sites were all negative. The maximum R 2 was 0.08 and the lowest NRMSE was 0.08. Furthermore, the NSE difference between the calibration and SPIE was not correlated with Figure 2. Comparison of NSE between GPP obs and GPP sim based on 12 different parameterization methods (see definitions of PFT mean , Clim mean , PFT med , Clim med , Site sim , OPT-All, OPT-PFT, sRF, mRF, mNN, SPIE, and P-model in Section 2.2), and NSE between GPP obs and GPP calib ("site-calib", see the calibration process in Text S1 of the Supporting Information S1) at daily (a), weekly (b), monthly (c) and yearly (d) scales. The methods are sorted according to the number of sites with positive NSE, which is displayed under each bar. The sites with negative NSE are in white color. The area under the gray dashed line represents the sites excluded in the comparison due to less than four good-quality (see the definition of "good-quality" data in Section 2.3) data points. Figure 3. Comparison of NSE, R 2 , and normalized root mean squared error between the site-mean GPP obs and GPP sim . The bars (i.e., parameterization methods) are sorted according to NSE.
the relative difference between calibrated and predicted parameters (Figure 7). Thus, the predicted parameters were not comparable to the calibrated parameters while they can produce similar GPP, suggesting the overfitting and parameter equifinality in the site-specific calibrations of LUE model. It also reflects the potential risk in applying calibrated parameters as the ground truth to make predictions or extrapolations of model parameters in space.

Important Features for Explaining Model Parameter Variability
The average normalized values of SHAP, LRP and PD illustrated that bioclimatic variables, soil properties, and VIF were the most important feature classes explaining the spatial variability of model parameters (Figure 8). The importance values differed across three different methods, but all of them showed that atmospheric NPdep was not an important feature class. However, the original SHAP shows that atmospheric phosphorus deposition was the tenth important feature ( Figure S8b in Supporting Information S1). Some specific PFTs, for example, DBF and MF, had relatively higher SHAP and LRP (Figures S8b and S8c in Supporting Information S1), indicating that their relative parameter vectors had different ranges compared to other PFTs. All approaches showed that most climate types were not important for determining model parameters (see details in Figures S8a-S8d of the Supporting Information S1). Given the divergence among approaches, the specific important feature was hard to determine. The approach differences also existed in each parameter, while the ranks of feature classes remained in the same order as the ranks for all parameters ( Figure S9 in Supporting Information S1). On average, the most important feature classes were bioclimatic variables, soil properties, and VIF.

Well-Constrained Site-Specific Parameterization Is Better Than PFT-Dependent Parameterization
It has been shown that the long-used PFT-based parameterization cannot capture the variance in parameters within PFT (Bloom et al., 2016) and can be influenced by PFT misclassification errors. The method of directly using parameters from papers without local or global evaluation can be also risky. P-model which adopted the globally fixed parameters upscaled from the leaf scale might not include PFT errors (Mengoli et al., 2022;B. D. Stocker et al., 2020), but had limited accuracy across temporal ( Figure 2) and spatial scales (Figure 3). Here, results showed that the globally fixed parameterization method (e.g., P-model) was worse than the PFT-based method (e.g., OPT-PFT, and PFT mean ). The global optimization method (e.g., OPT-All) had slightly better performance than PFT-based optimization at the global scale ( Figure 5) due to higher spatial generalizability (e.g., Figure 3), the same as Yuan et al. (2014a). However, OPT-All had accurate predictions at fewer sites ( Figure 2). This agrees with a study using the PRELES model (Tian et al., 2020), which demonstrated that globally optimized parameters are not sufficient to reflect the variability of GPP sensitivities. Luo and Schuur (2020) also confirmed that model parameters should vary with the spatial and temporal changes of ecosystem properties but not depend on PFTs. While the site-specific parameterizations (Site sim , sRF, mRF, and mNN) have higher flexibility than clustering methods (PFT mean , Clim mean , PFT med , and Clim med ), they did not show a robust advantage due to uncertainties remained in the calibrated parameters, which were used to constrain the predicted parameters in these methods. However, the site-specific parameterization constraining GPP prediction errors and sensitivity functions, SPIE, reaches the highest performance, highlighting that the well-constrained site-specific parameterization method can provide more reliable outputs than clustering and optimization methods. This is the opposite of the conclusion of Tian et al. (2020) who tested only the site-specific optimization method showing higher uncertainties than the PFT-based optimization method.

Reduced Parameter Variability by Considering the Relationship Between Parameters and Ecosystem Properties
Our results reveal the equifinality of model parameters, which consequently increases the model uncertainty. While no extrapolated parameter vectors outperformed calibrated ones, parameter ranges were better constrained in all methods based on ecosystem properties (e.g., sRF, mRF, mNN, and SPIE) compared with site calibrations, clustering and optimization methods. The results of parameter distribution and GPP simulation performance demonstrate that considering physical links between GPP sensitivities and ecosystem properties can reduce the parameter equifinality and the overfitting in site-specific calibrations. This is also true in other LUE models (Horn & Schulz, 2011b). Furthermore, SPIE, considering only the GPP errors and constraints on sensitivity functions but not the distance to calibrated parameters, avoids inheriting uncertainties from model calibration. In general, the model parameterization relying on ecosystem properties can reduce the parameter uncertainty resulting from model equifinality.

Drivers of the Spatial Variability of GPP Sensitivities
Overall, the information in bioclimatic variables, soil properties and vegetation features explain most of the spatial variation in predicted parameters by SPIE (Figure 8). Taking T opt as an example, our results display that this parameter is controlled by bioclimatic variables primarily ( Figure S9b in Supporting Information S1), followed by soil properties and vegetation features. PFT is the fourth important feature determining the spatial variability of T opt . The results agree partly with Huang et al. (2019) and Chang et al. (2020), which illustrate the relationship of T opt to mean annual temperature and vegetation index, respectively. These two studies together with ours showed that bioclimatic variables, soil properties and vegetation features cannot be neglected in determining T opt . Furthermore, bioclimatic variables, vegetation features and soil properties presented higher importance than PFT for other parameters of the LUE model. The parameters varying with six classes of features are significantly better than those parameters only varying with PFT or without spatial variability (Figures 2-5). The results are in contrast to studies supporting PFT-based and globally fixed parameterization (Tian et al., 2020;Yuan et al., 2014a). However, the general observation is similar to the results of other independent studies using a different LUE model (Horn & Schulz, 2011b) and terrestrial biosphere models (Peaucelle et al., 2019). These two studies also demonstrated that ecosystem model parameters are controlled by climate and vegetation features along with PFT. Our study highlights the significance of considering more ecosystem properties including vegetation features, climate conditions and soil properties in ecosystem model parameterization. Given our assumption that model parameters indicate ecosystem sensitivities, the results demonstrate the ecological understanding that vegetation features, climate conditions, soil properties along with plant forms impose instrumental controls on how ecosystems respond to environmental changes.
However, the current approach needs further development toward pinpointing key features controlling the spatial variability of parameters. On the one hand, strong covariation between input features ( Figure S10 in Supporting Information S1) limits a confident variable attribution. On the other hand, differences in variable importance between different methods (e.g., SHAP, LRP, and PD) may result in different variable rankings (Figures S8 and S9 in Supporting Information S1). These may be associated with the covariation between features as well as the underlying assumptions per feature importance estimation method. However, given stationarity in the correlation structure between features, none of these aspects would limit a statistical extrapolation of parameter variability in space.   Figure S8 of the Supporting Information S1. The feature classes (see definitions in Table S3 of the Supporting Information S1) refer to plant functional types, Koeppen-Geiger climate classification types (Clim), bioclimatic variables and aridity indexes (BioClim), vegetation index features, atmospheric nitrogen and phosphorus deposition and soil properties (Soil). Differences in the model fitness between the site calibration and SPIE may reflect common overfitting issues in standard calibration exercises, challenging the generalization of a given GPP model and/or of features and architectures for the extrapolation approach. Exploring the contribution of additional variables may be a model-specific exercise, for example, illumination features, as in the model of Horn and Schulz (2011b), that can contribute to the enhanced spatial variance in predicted parameters.
Given the background understanding that ecosystem properties, including vegetation features, climate regimes, soil characteristics and other environmental traits, influence response sensitivities of carbon assimilation fluxes to environmental conditions, our results suggest further expanding SPIE approach to other kinds of terrestrial ecosystem carbon cycle models (e.g., land surface models and dynamic global vegetation models). Yet, temporal changes in ecosystem properties can affect parameters (Luo & Schuur, 2020). Our results also suggest a cautionary remark regarding the temporal variation in parameters when extrapolating or applying parameters to long time scales (e.g., beyond decadal scales in prognostic simulations).

Conclusions
In this study, we developed a model parameterization approach based on the link between ecosystem sensitivities and properties constrained by GPP simulation errors and sensitivity functions. The method leverages a neural network method to learn the parameterization of an LUE model. Moreover, the experimental design demonstrates that the method enables parameter extrapolation in regions without observational data with significantly and substantially higher accuracy than the widely used PFT-based and globally fixed parameterization methods. Compared with a set of univariate clustering, similarity-, regression-based and fixed methods (NSE > 0 at less than 41% of sites) considered in this study, our approach uniquely predicts GPP fluxes with confidence in cross-validation (NSE > 0 at 74% of sites). The resulting reduction in parameter variability through prediction via ecosystem properties suggests reductions in equifinality and overfitting, which commonly emerge from site-specific parameterizations. Vegetation features, bioclimatic variables, soil properties and some PFTs can explain the spatial variability of LUE model parameters. Overall, these are key features for diagnostic modeling exercises, although the temporal variation in the parameters and the relationship between them and ecosystem properties need further exploration toward prognostic modeling. Given the general need for biological and ecological parameterizations in carbon and water terrestrial ecosystem models, our results propose statistical learning approaches for the parameterization of land surface schemes in Earth system models.

Data Availability Statement
Meteorological input data for the LUE model is available at the website of FLUXNET . The FluxnetEO product used for extracting vegetation index is accessed from ICOS Carbon Portal for LANDSAT (Walther et al., 2022a) and for MODIS (Walther et al., 2022b). CRUNCEP data set for calculating bioclimatic variables is available at NCAR . Atmospheric NPdep data (R. Wang et al., 2017) are extracted and applied as elements of input features. SoilGrids product is accessed through WCS tool . The neural network framework was developed using trainNetwork (Kudo et al., 1999) introduced in MATLAB 2022a, available under MATLAB license at https://www.mathworks.com/products/matlab.html. Data were processed and figures were made with MATLAB 2022a. The model outputs, parameters and scripts supporting the analysis and production of figures are available at Bao and Carvalhais (2023).