[9] The inverse analysis conducted in this study was based on a Terrestrial Ecosystem Regional (TECO-R) model, for which 12 data sets were used for parameter estimation. The TECO-R model was developed by combining the Carnegie-Ames-Stanford-Approach (CASA) model [*Potter et al.*, 1993; *Field et al.*, 1995] with the Vegetation-And-Soil-Carbon-Transfer (VAST) model [*Barrett*, 2002] to estimate the spatial patterns of carbon residence times in the conterminous United States. The TECO-R model uses the CASA algorithms for net primary productivity (NPP), which was estimated from satellite observation and ground measurements, and the VAST algorithms for the relationships of C transfer among pools. The TECO-R model is described in Appendix A, with a schematic diagram shown in Figure 1 and the parameter definitions given in Table 1. The parameters estimated in this study included the maximum potential light-use efficiency (*ɛ**), C allocation coefficients among pools, and C residence times in individual plant and soil pools. The allocation coefficients and residence times of C in the individual pools were integrated to estimate the whole ecosystem residence times over the conterminous United States. To facilitate parameter estimation, this study divided soil organic carbon (SOC) and root biomass into three soil layers (top: 0–20 cm, middle: 20–50 cm, and bottom: 50–100 cm) following *Barrett* [2002], instead of compartmentalizing SOC according to decomposition rates as in the Century model [*Parton et al.*, 1987]. In this way, the state variables of root biomass and SOC in the model have a one-to-one correspondence with the respective observations and do not need extra mapping functions (or observational operators) as in *Luo et al.* [2003].

#### 2.1. Data

[10] In this study, 12 observed data sets were used for parameter estimation: three NPP data sets (i.e., NPP in leaves, stems, and roots), five biomass data sets (i.e., one for biomass of leaves, one for stems, and three for roots in three soil layers), one litter data set (i.e., fine litter mass), and three SOC data sets in the three soil layers. There were a total of 7660 observed data points, comprising 7 data points in fine litter, 468 data points in NPP, 316 data points in biomass, and 6869 data points in SOC. The spatial distribution of the data points and detailed information on the data points and their sources are given in Figure S1 and Text S1.

[11] Sources of auxiliary data used in this study were (1) the AVHRR-NDVI continental subsets of 8-km spatial resolution from 1982 to 1999 available from the Data and Information Services Center of Goddard Earth Science; (2) annual solar radiation produced by the NASA/Global Energy and Water Cycle Experiment with one-by-one degree spatial resolution; (3) monthly precipitation and temperature data sets with 4-km spatial resolution offered by the Spatial Climate Analysis Service; (4) a soil texture data set from the State Soil Geographic Database (STATSGO) available from the USDA Natural Resources Conservation Service; and (5) 1-km spatial resolution land cover data, containing eight vegetation types in the conterminous United States, derived from AVHRR using a decision tree classifier [*Hansen et al.*, 2000]. All of these auxiliary data sets were resampled to a common projection (Lat-Long Projection) and spatial resolution (0.04 degree).

#### 2.2. Parameter Estimation

[12] The parameter estimation was based on the weighted least squares principle, which minimized the deviations between the modeled and observed values of all 12 data sets for each of the eight biomes: evergreen needleleaf forest (ENF), deciduous broadleaf forest (DBR), mixed forest (MF), woodland (W), wooded grassland (WG), shrubland (S), grassland (G), and cropland (C). Given one biome, we defined a partial cost function *j*_{m} as the sum of squares of deviations between observed and modeled values for data set *m*:

$$j_m(\mathbf{a}) = \sum_{n=1}^{N_m} \left[ y_{nm} - \hat{y}_{nm}(x_n; \mathbf{a}) \right]^2 \qquad (1)$$

where *y*_{nm} is the nth observed data point in the mth data set; *ŷ*_{nm}(*x*_{n}; **a**) is the modeled value that corresponds to the observation *y*_{nm}; *N*_{m} is the total number of data points in the mth data set; *x*_{n} is an auxiliary forcing vector that includes NDVI, solar radiation, air temperature, precipitation, and soil texture in the spatial grid cell where the nth observation was made; and **a** is a vector consisting of 22 parameters: **a** = {*ɛ**, *α*_{L}, *α*_{W}, *α*_{R}, *ξ*_{R1}, *ξ*_{R2}, *ξ*_{R3}, *τ*_{L}, *τ*_{W}, θ_{F}, θ_{C}, *η*, *τ*_{R1}, *τ*_{R2}, *τ*_{R3}, *τ*_{F}*, *τ*_{C}*, *τ*_{S1}*, *τ*_{S2}*, *τ*_{S3}*, θ_{S1}, θ_{S2}}. Each of the parameters is described with equations in Appendix A and also defined in Table 1.

[13] A particular data set may provide information to constrain a subset of parameters in vector **a**. For example, the data set of leaf NPP directly constrains the parameters *ɛ**, *α*_{L}, and *τ*_{L}. When all 12 data sets are used, all 22 parameters can be constrained to a certain degree. One parameter may be constrained by multiple data sets. In this case, an integrated cost function *J*, which consists of *M* (=12) partial cost functions *j*_{m}, is defined to measure the deviations between modeled and observed values for all the data points in the 12 data sets. Thus the cost function, *J*, to be minimized is

$$J(\mathbf{a}) = \sum_{m=1}^{M} \lambda_m \, j_m(\mathbf{a}) \qquad (2)$$

where *λ*_{m} is a weighting factor of the partial cost *j*_{m}, which is inversely proportional to the variance of each data set. Thus, each data set was equally weighted in the cost function [*Luo et al.*, 2003]. The cost function, *J*, in equation (2) was applied to each of the eight biomes so that eight sets of biome-specific values of the parameter vector, **a**, were obtained in this study.
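The two cost functions can be illustrated with a minimal Python sketch. It assumes that the weighting factor is exactly the reciprocal of the variance of the observations in each data set; the text states only inverse proportionality, so the constant is an assumption. The function names are hypothetical and are not part of TECO-R.

```python
from statistics import pvariance

def partial_cost(observed, modeled):
    """Equation (1): sum of squared deviations for one data set."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(observed, modeled))

def integrated_cost(observed_sets, modeled_sets):
    """Equation (2): weighted sum of the partial costs over all data sets.

    Assumption: lambda_m = 1 / variance of the observations in data set m
    (the text only says "inversely proportional").
    """
    total = 0.0
    for obs, mod in zip(observed_sets, modeled_sets):
        weight = 1.0 / pvariance(obs)
        total += weight * partial_cost(obs, mod)
    return total

# Two toy data sets whose magnitudes differ by a factor of 10.
observed = [[1.0, 3.0], [10.0, 30.0]]
modeled = [[2.0, 3.0], [20.0, 30.0]]
print(integrated_cost(observed, modeled))
```

In this toy case the same relative misfit in both data sets contributes equally to *J*, which is the purpose of variance-based weighting when data sets differ in magnitude.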

[14] To estimate the globally optimal parameters, the genetic algorithm (GA) was used in this study. The parameter spaces and constraints shown in Table 1 were defined primarily in reference to the work by *Barrett* [2002], but were specified for the eight vegetation types in this study instead of the three biomes of the Australian continent. The steps of searching for the globally optimal parameters in this study were (1) initializing the parameter vector, **a**, with random numbers drawn from the parameter ranges in Table 1; (2) applying the genetic algorithm (selection, crossover, and mutation) to generate new offspring of parameter values of **a**; (3) using the generated parameter values in equations (A5)–(A11) to calculate the modeled value, *ŷ*_{nm}(*x*_{n}; **a**), under a steady state assumption (i.e., d*q*_{i}/d*t* = 0, *i* = L, W, R1, R2, R3, F, C, S1, S2, and S3); (4) using the observation data, *y*_{nm}, and the corresponding modeled value, *ŷ*_{nm}(*x*_{n}; **a**), to calculate the partial cost function, *j*_{m}, in equation (1); (5) calculating the integrated cost function *J*; and (6) checking the stopping condition of the evolution (change of *J* in the last 100 offspring less than 0.01%). If the stopping criterion was satisfied, the algorithm exported the optimal parameters. Otherwise it returned to step (2) to continue the search.
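The search loop described in steps (1) to (6) can be sketched as a small real-coded genetic algorithm. This is an illustration only, not the implementation used in the study. The operators (tournament selection, uniform crossover, Gaussian mutation, and keeping the best individual), the population size, and the mutation rate are all assumptions. The stopping rule is read here as a relative change in the best cost of less than 0.01% over 100 generations. A toy quadratic cost stands in for the TECO-R cost function *J*.

```python
import random

def genetic_minimize(cost, bounds, pop_size=40, max_gen=5000,
                     window=100, tol=1e-4, seed=0):
    """Minimize cost(a) within box bounds using a simple real-coded GA."""
    rng = random.Random(seed)

    def random_vector():
        # Step 1: draw each parameter uniformly from its allowed range.
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def tournament(scored):
        # Selection: the cheaper of two randomly chosen individuals wins.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] <= b[0] else b[1]

    population = [random_vector() for _ in range(pop_size)]
    history = []
    for _ in range(max_gen):
        # Steps 3-5: evaluate the cost of every candidate parameter vector.
        scored = sorted(((cost(a), a) for a in population), key=lambda s: s[0])
        best_cost, best = scored[0]
        history.append(best_cost)
        # Step 6: stop when the best cost changed by less than tol (0.01%)
        # relative to its value `window` generations earlier.
        if len(history) > window:
            old = history[-window - 1]
            if abs(old - best_cost) <= tol * max(abs(old), 1e-300):
                break
        # Step 2: build the next generation; the current best is kept as is.
        children = [best[:]]
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < 0.1:  # mutation, clipped to the range
                    child[i] += rng.gauss(0.0, 0.05 * (hi - lo))
                    child[i] = min(hi, max(lo, child[i]))
            children.append(child)
        population = children
    return best, best_cost

def toy_cost(a):
    # Hypothetical stand-in for J; its minimum is at a = (0.3, 2.0).
    return (a[0] - 0.3) ** 2 + (a[1] - 2.0) ** 2

best, best_cost = genetic_minimize(toy_cost, [(0.0, 1.0), (0.5, 5.0)])
print(best, best_cost)
```

Clipping mutated values to the bounds keeps every candidate inside the prescribed parameter ranges, which mirrors the role of the ranges in Table 1.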

[15] The estimated C residence times and allocation coefficients for individual C pools in plants and soils were used to calculate the aggregated C residence time for the whole ecosystem *τ*_{E} using the following formula [*Barrett*, 2002]:

where

[16] We ran the optimization algorithm 30 times to obtain means and standard errors of the estimated parameters. The estimated standard errors reflected the combined effects of model errors, data errors, and errors in the data-model fusion technique.
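As an illustration of how repeated runs can be summarized, the sketch below computes a mean and standard error for each parameter. It assumes the conventional definition of the standard error (sample standard deviation divided by the square root of the number of runs); the text does not state which formula was used. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def summarize_runs(runs):
    """Return a (mean, standard error) pair for each parameter.

    runs: list of parameter vectors, one per independent optimization run.
    """
    n = len(runs)
    summary = []
    for values in zip(*runs):  # one tuple of n values per parameter
        summary.append((mean(values), stdev(values) / sqrt(n)))
    return summary

# Three toy runs of a two-parameter problem (the study used 30 runs).
print(summarize_runs([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]))
```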

#### 2.3. Carbon Uptake

[17] The means of the parameter values estimated by the genetic algorithm, together with the corresponding carbon pool sizes from the inverse analysis, were used in forward modeling to simulate carbon uptake. The same set of environmental variables (e.g., temperature, precipitation, land cover, and soil texture) used in the inverse analysis was used in the simulation of carbon uptake.

[18] We applied the NPP increase trend estimated by *Hicke et al.* [2002] to quantify spatial distributions of carbon uptake in the conterminous United States in two ways. In the first, a uniform NPP increase of 1.83 g C m^{−2} a^{−1} (i.e., the averaged NPP increase in the conterminous United States estimated by *Hicke et al.* [2002]) was assumed for each spatial grid cell, so that the spatial differences in C uptake potential were caused only by the spatial pattern of C residence times. In the second, the actual spatial pattern of NPP increases [*Hicke et al.*, 2002] was combined with the spatial pattern of C residence times to evaluate the actual C uptake potential resulting from both NPP increases and C residence times. Comparing the two cases helps distinguish the roles of NPP increase and residence times in regulating C uptake. The NPP increase trend was a direct driving force because it caused additional C to enter the ecosystem. Carbon residence times determined how long that additional C could stay in the ecosystem and thereby regulated the capacity of the ecosystem for C uptake [*Luo et al.*, 2001, 2003].

[19] To focus on these two factors of NPP increases and C residence times in influencing C uptake, we assumed that there were no changes in other environmental factors (e.g., temperature) and the same rate of NPP change continued for 50 years.
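The interplay between a steady NPP increase and residence time can be illustrated with a one-pool simplification, which is not the multi-pool TECO-R model. Assume a single pool *X* with residence time *τ* obeying d*X*/d*t* = NPP(*t*) − *X*/*τ*, an NPP that rises linearly at rate *r* per year, and a steady state start. The net uptake rate is then *rτ*(1 − e^{−*t*/*τ*}), independent of the initial NPP. The residence times below are illustrative values only, and treating 1.83 as an annual increment in NPP is an assumption.

```python
from math import exp

def uptake_rate(r, tau, t):
    """Net C uptake (dX/dt) at time t for a single pool.

    Assumes dX/dt = NPP(t) - X/tau with NPP(t) = NPP0 + r*t and a
    steady state start X(0) = NPP0*tau; NPP0 then cancels out.
    """
    return r * tau * (1.0 - exp(-t / tau))

def cumulative_uptake(r, tau, t):
    """Carbon gained since t = 0 (time integral of uptake_rate)."""
    return r * tau * t - r * tau ** 2 * (1.0 - exp(-t / tau))

# Uniform NPP trend from the text; residence times (years) are illustrative.
for tau in (5.0, 20.0, 80.0):
    print(tau, uptake_rate(1.83, tau, 50.0), cumulative_uptake(1.83, tau, 50.0))
```

Under these assumptions, a pool with a longer residence time takes up more carbon for the same NPP trend, which is why a uniform NPP increase isolates the effect of residence time.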

#### 2.4. Sensitivity Analysis

[20] Sensitivity analyses were conducted to evaluate impacts of observation errors, the steady state assumption, and initial soil organic C on parameter estimation and on C uptake. Because of the lack of well-documented time series of data on NPP, plant biomass, and SOC in most of the ecosystems, this study was unable to estimate residence times and initial values of pool sizes to assess nonsteady state carbon dynamics as done by *White et al.* [2005]. To examine influences of the steady state assumption on the estimated residence times, we conducted a sensitivity analysis to estimate nonsteady state carbon residence times. In the analysis, we increased C influx into an ecosystem so that C uptake equaled 10 to 50% of NPP; that is, the yearly C uptake equaled 0.1, 0.2, 0.3, 0.4, and 0.5 times NPP, respectively. Under these nonsteady state scenarios, the C residence times were estimated and compared with those under steady state.

[21] As measurement errors in the observed data sets potentially impact the precision of the estimated parameters in the inverse analysis [*Raupach et al.*, 2005], we conducted a sensitivity analysis to assess the sensitivity of the estimated parameters to measurement errors. Eight scenarios were used in this study; each scenario assumed that only one observation data set was either overestimated or underestimated by 20%. The observation data sets in the eight scenarios were (1) leaf NPP; (2) stem NPP; (3) root NPP; (4) leaf biomass; (5) stem biomass; (6) SOC in layer 1 (0–20 cm); (7) SOC in layer 2 (20–50 cm); and (8) SOC in layer 3 (50–100 cm).
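A minimal sketch of how such one-at-a-time perturbation scenarios could be generated is given below. It assumes that an overestimate or underestimate of 20% is represented by scaling the observations by 1.2 or 0.8, which is one possible reading of the text. The data set names and values are hypothetical placeholders.

```python
def perturbation_scenarios(datasets, names, fraction=0.2):
    """Yield (name, factor, perturbed copy) with exactly one data set scaled.

    datasets: dict mapping a data set name to a list of observations.
    names: the data sets to perturb, one at a time.
    """
    for name in names:
        for factor in (1.0 + fraction, 1.0 - fraction):
            # Copy everything so the original observations stay untouched.
            perturbed = {k: list(v) for k, v in datasets.items()}
            perturbed[name] = [factor * y for y in datasets[name]]
            yield name, factor, perturbed

# Hypothetical toy observations for two of the eight data sets.
data = {"leaf_npp": [100.0, 200.0], "soc_layer1": [5.0]}
for name, factor, scenario in perturbation_scenarios(data, list(data)):
    print(name, factor, scenario)
```

Each perturbed copy would then be passed to the parameter estimation in place of the original observations, and the resulting parameters compared with the unperturbed estimates.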

[22] Land use change can substantially influence carbon uptake. We conducted a sensitivity analysis to evaluate potential impacts of land use on parameter estimation. We decreased soil organic carbon by 40% for woodland to simulate land use change from previous croplands and increased it by 40% for cropland to simulate land use change from previously forested lands.

[23] Another factor that influences soil C uptake is the initial value of SOC content when C cycling processes are not in steady state. We conducted a sensitivity analysis to assess effects of initial values on parameter estimation with three scenarios: initial SOC being 20% below, at, and 20% above the equilibrium level.