A number of global land surface models simulate photosynthesis, respiration, and disturbance, important flows in the carbon cycle that are widely tested against flux towers and CO2 concentration gradients. The resulting forest biomass is examined in this paper for its resemblance to realistic stands, which are characterized using allometric theory. The simulated biomass pools largely do not conform to widely observed allometry, particularly for young stands. The best performing models had an explicit treatment of stand-thinning processes, which brought the slope of the allometry of these models closer to observations. Additionally, models that had relatively shorter wood turnover times were generally closer to observed allometries. The discrepancy in pool distribution between models and data suggests that estimates of NEE are biased when integrated over the long term, as compared to observed biomass data, and could therefore compromise long-term predictions of land carbon sources and sinks. We think that this presents a practical obstacle for improving models by informing them better with data. The approach taken in this paper, examining biomass pools allometrically, offers a simple way to improve the characteristic behaviors of global models with the relatively sparse data that are available globally from forest inventory.
 This paper compares the distribution of different pools of forest biomass in land surface models with biomass observed in forest inventory. This study is contextualized in part by the gap in our understanding of the nature of the large land sink of CO2. Since 1850, land sinks have nearly counterbalanced land use change sources, by taking up ∼160 Pg C [Canadell et al., 2007]. Roughly 70% of the current land sink has been attributed to biomass increase, with the remainder accumulating as soil C. This 110 Pg C of extra biomass is substantial compared to the standing biomass stock of 150 Pg C in the northern hemisphere, and 200 Pg C in the tropics [Dixon et al., 1994, Table 2]. The ∼25% additional biomass embodied in the land sink is an important research object, but not well addressed by terrestrial carbon cycle models whose biomass accumulation is relatively unvalidated. It would be a hopeless task to directly measure all the biomass on Earth to address this knowledge gap, so improvements to our understanding of the state and changes in global biomass will come from a mix of both improved data sources and improved model representation of the terrestrial carbon cycle.
 Currently, major research programs evaluating land surface model (LSM) performance rely almost exclusively on net ecosystem exchange (NEE) as a quality metric [Hoffmann et al., 2007], because modeled CO2 fluxes per unit land area are readily compared against the global networks of eddy covariance flux measurement sites [Baldocchi et al., 2001], the seasonal cycle of CO2 [Randerson et al., 1997], and, through atmospheric transport models, atmospheric CO2 concentration gradients [Masarie and Tans, 1995] (Figure 1). It is insufficient, however, to validate only the instantaneous performance of LSMs when the important performance benchmark from the perspective of carbon cycle-climate feedbacks is their long-term integral, namely the pool of carbon that is sequestered from the atmosphere as biomass. Even high-quality observations and careful model simulations can show substantial differences in representing interannual variability, suggesting that basic controls of the land carbon sink are not well constrained [Desai et al., 2010], in part because these models and measurements have historically emphasized characterizing short-term (i.e., diurnal to seasonal) controls of the carbon cycle. Flux tower observations of NEE aggregate many component fluxes that are difficult to partition; they generally have large short-term and long-term measurement error [Richardson et al., 2006; Wolf et al., 2008] and can only loosely distinguish the performance of different flux models [Stöckli and Vidale, 2005]. Empirical measurements of biomass, particularly viewed through the allometric lens used in this study, offer an opportunity to falsify model performance across the time span of ecosystem development relevant for monitoring the state and future of terrestrial carbon sinks on Earth.
 Biomass has a particular relevance for evaluating carbon cycle models for their suitability in data assimilation, because simulations of biomass accumulate biases in the integration of the component carbon fluxes of photosynthesis, growth, allocation, mortality and respiration. Most of these fluxes are difficult if not impossible to observe directly on the relevant spatial and temporal scales [Schulze, 2006], so direct data assimilation of most of these fluxes is essentially out of the question. However, we do have some perspective on how these fluxes should add up, because it is impossible to simulate biomass correctly if the component underlying fluxes are incorrect. Because data assimilation offers the prospect of improving estimates of key underlying process fluxes in a model (such as respiration) by revising an estimate of key observable state variables (such as biomass) [Williams et al., 2009], it is critical that the relationship between biomass and the component fluxes it integrates be realistic.
 There are several impediments that have limited the ability of the land surface modeling community to validate simulations of biomass in this latter sense. Even the most careful model validation studies are still strongly hampered by the availability of biomass data for direct model-data comparison [Randerson et al., 2009]. Forest inventory programs that estimate biomass by direct measurement are important regionally [Caspersen et al., 2000; Ciais et al., 2008; Lewis et al., 2009; Phillips et al., 2008], but their limited spatial and temporal coverage inhibits their use for continuous monitoring of the global biosphere from the perspective of the carbon cycle (Figure 1). Another major impediment to validating models against measured biomass is the need to consider the many idiosyncrasies of each forest stand at each site where a model can be tested. These idiosyncrasies could include the mix of species, the aboveground climate, water or nutrient limitations, the timing of past disturbances and management. Validating a model against a measured survey of site biomass is also problematic because it does not necessarily predict good future performance if the allocation and bulk turnover of the component pools are not also validated.
 To overcome these obstacles to evaluating terrestrial carbon cycle model biomass, we employ a scaling analysis of each model's biomass pools using the concept of allometry. Allometry is a widely used technique for summarizing biomass variation collected in forest inventories [Jenkins et al., 2004], and we explore it here as a link to compare land surface model simulations and forest inventory data (Figure 1). Allometric scaling relates the size of one part of an organism to that of a different part of the organism across a range of sizes, and thus captures the biophysical and evolutionary constraints on the ontogeny of form. Carbon cycle models simulate the biomass in different biomass pools that together constitute a forest, but do these simulated forests look like the forest stands we observe in nature? Despite the abundance of idiosyncrasies that make each forest stand unique, allometry suggests there are evolutionary forces and biophysical constraints that greatly constrain the empirically observed covariation of different components of biomass in a forest [cf. Wolf et al., 2010]. This variation extends from young stands with many small trees with proportionately large foliage biomass, to old stands with few large trees with proportionately large stem biomass.
 The allometric scaling analyses in this paper involve the regression of one component of biomass, say foliage biomass, against another component of biomass, say stem biomass, for individuals across a range of tree sizes [Niklas et al., 2003; Price et al., 2007]. Biomass components of individual trees are calculated for the mean tree in a stand by taking stand biomass and dividing by the number of mature trees (defined as larger than a minimum stem size, say 2.54 cm). This process discards all information about the distribution of stem sizes within a stand, but the remaining information is remarkably effective at summarizing quantitative differences between stands, because stem size distributions themselves have very regular features that progress in predictable ways with stand development [Mohler et al., 1978; West et al., 2009].
 Ultimately, allometric models and carbon cycle models should be complementary, because they are trying to explain the same thing: the variation in plant biomass observed in nature. Interestingly, the considerations of branching morphology and biophysical vascular constraints that are used to formulate allometric scaling theory [Enquist and Niklas, 2001; Enquist et al., 1998; West et al., 1999] are almost wholly different from the biochemical and thermodynamic concepts used to develop land surface models [e.g., Bonan, 2008; Monteith and Unsworth, 2008]. A synthesis between these two fields is slowly starting to emerge [Enquist et al., 2007], but in the meantime we can evaluate how well the ecosystem models commonly used in carbon cycle science reproduce theoretical predictions and empirical relations emerging from allometric scaling theory.
 We hope that the synthesis between these two largely independent efforts offers a path forward to improved representation of ecosystem processes, by emphasizing the need for ecosystem models to accommodate the broad array of measurements, such as forest inventory, that come from direct observation of individual plants (Figure 1). In particular, this study is motivated by the desire to link land surface models to satellite data, which together could help constrain our understanding of the state of the world's forests. Next generation remote sensing observations will be sensitive to individual level attributes such as crown diameter, tree height, and stand density [Hurtt et al., 2010; Wolf et al., 2010], but there is no clear way to use such data to inform land surface models unless these models have a representation of individual-level state variables (Figure 1). Allometry can be the conceptual link that bridges current land surface models with remote sensing to improve constraints on the state of the carbon cycle.
 Our objective in this paper is to calculate allometric scaling parameters for different biomass components in land surface models, compare these parameters with those obtained from empirical studies of forest component biomass, and discuss the implications of any discrepancies.
 In this paper, the output of several models designed for global applications and future projections of the coupled carbon-climate system is compared from an allometric perspective to several data sets of forest biomass that include separate component measurements of biomass: foliage, branch and trunk (stem), and coarse roots. The land surface models (LSMs) and the forest biomass databases are described in more detail under their respective section headings below. The LSMs depict biomass as stored in foliage, wood, and fine root pools, which are important to distinguish in a carbon cycle because they have different turnover times. These pools are for the most part directly comparable to the biomass components reported in the databases, with some subtle differences that require attention: the wood pool in LSMs is not always separated into aboveground stem and belowground coarse wood, and the root biomass in forest databases does not generally distinguish fine roots. Refer to Table 1 for a summary of the different biomass pools used in both the forest databases and LSMs and how they are related. In addition to resolving the definitions of the different pools between databases and LSMs, the component biomass information in the databases is summarized using allometric equations, which requires reframing the pools in LSMs on a per individual basis for comparison. These methodological issues are considered in the following paragraphs.
All masses express total dry matter. A log prefix in the text (e.g., logM) refers to log-10 transformation.
N: tree population density
Mfol: foliage mass per tree
Mstem: stem (trunk + branch) mass per tree
Mcroot: coarse root mass per tree
Mfroot: fine root mass per tree
Mroot = Mcroot + Mfroot
Mwood = Mstem + Mcroot
Mactive = Mfol + Mfroot
M = Mfol + Mstem + Mcroot + Mfroot
Mcwd: coarse woody debris mass per tree
Mean tree height
 The estimation of allometric scaling coefficients for empirical data and LSMs uses biomass pools calculated on a mean individual basis. For the inventory data, the number of individuals per area (N) is reported in the literature comprising the database, so biomass per individual (M) is readily calculated as biomass per area (M * N) divided by N. Biomass per individual can be calculated directly in ED and Orchidee-FM, which explicitly simulate the population in different diameter size classes. Orchidee-STD maintains a diagnostic population density variable, based on the quadratic mean diameter, that can be used to compute the size of the average individual. With the remaining LSMs, we estimate the number of individuals per area implicit in the simulated biomass per area by using the widely observed “self-thinning” behavior of forest stands, in which young stands composed of many small trees develop into older stands with few large trees. The self-thinning “law” [Mohler et al., 1978; Yoda et al., 1963] is formulated as:
$$M = \alpha N^{\beta} \qquad (1)$$
where M is the mass of an individual tree (kg) and N is the number density of trees (ha−1). (See Table 1 for a summary of variables referred to in this study.) The scaling exponent β generally takes values of −3/2 [Yoda et al., 1963] or −4/3 [Enquist et al., 1998], but regardless of the appropriate theoretical value, the parameters can be empirically estimated from data by linear regression after taking the logarithm of each side:
$$\log M = \alpha' + \beta \log N \qquad (2)$$
where α′ = log(α). This relationship is calculated indirectly from stand level forest inventory data [e.g., Cannell, 1982] on N and biomass per area M * N (kg ha−1), which has been divided by N to estimate a mean M [Enquist and Niklas, 2002]. This is in essence the same transfer function needed to estimate N from the area-based biomass M * N that is simulated by carbon cycle models, which in turn is needed to calculate individual biomass M. Multiplying the antilog of each side of (2) by N gives the scaling relation between M * N and N:
$$M N = \alpha N^{\beta + 1} \qquad (3)$$
Rearrangement of (3) to bring N to the left hand side permits the diagnosis of N from M * N as simulated by carbon cycle models:
$$N = \left( \frac{M N}{\alpha} \right)^{1/(\beta + 1)} \qquad (4)$$
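The diagnosis of N from area-based biomass can be sketched in a few lines of Python. This is a minimal illustration, assuming the self-thinning law takes the power-law form M = αN^β, so that M * N = αN^(β+1) and N = (M * N / α)^(1/(β+1)). The function names are hypothetical, and the parameter values are the angiosperm coefficients quoted later in the text (log10 α = 6.22, β = −1.320), used here only as an example.

```python
import math


def mean_tree_mass(n_per_ha, log10_alpha, beta):
    """Mean individual mass M (kg) at density N (ha^-1), assuming M = alpha * N**beta."""
    return 10.0 ** (log10_alpha + beta * math.log10(n_per_ha))


def diagnose_density(stand_biomass_kg_ha, log10_alpha, beta):
    """Invert M*N = alpha * N**(beta + 1) to recover N (ha^-1) from stand biomass (kg ha^-1)."""
    if stand_biomass_kg_ha <= 0:
        raise ValueError("stand biomass must be positive")
    if beta == -1.0:
        raise ValueError("beta = -1 makes stand biomass independent of N")
    alpha = 10.0 ** log10_alpha
    return (stand_biomass_kg_ha / alpha) ** (1.0 / (beta + 1.0))


# Illustrative coefficients (angiosperm values quoted in the text).
LOG10_ALPHA, BETA = 6.22, -1.320

# Round trip: build the stand biomass implied by a known density,
# then check that the diagnosis recovers that density.
n_true = 1000.0
stand_biomass = mean_tree_mass(n_true, LOG10_ALPHA, BETA) * n_true
n_diagnosed = diagnose_density(stand_biomass, LOG10_ALPHA, BETA)
m_individual = stand_biomass / n_diagnosed

print(f"stand biomass = {stand_biomass:.0f} kg/ha")
print(f"diagnosed N   = {n_diagnosed:.1f} trees/ha")
print(f"mean tree M   = {m_individual:.1f} kg")
```

Because β + 1 is negative for the exponents discussed here, the diagnosed density falls as stand biomass rises, which is the self-thinning behavior the relation is meant to capture.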
 The scaling parameters of equation (1) were estimated separately for angiosperms and gymnosperms from the Cannell database using Type II (reduced major axis or RMA) regression [Sokal and Rohlf, 1995, section 14.13]. RMA regression was used in part by convention, and in part because the solved parameters are symmetric with respect to X and Y in RMA regression, such that if Y ∝ X^β, then X ∝ Y^(1/β), which is not true in Type I regression [Niklas, 1994]. The parameters were checked for significant differences between tropical, temperate, and boreal sites, as well as between monoculture and multispecies stands. The tree population density N for each forested grid cell in each model was then diagnosed using the α and β calibrated using the Cannell database. The biomass pools in each LSM were then divided by N to estimate component biomass pools per individual for each grid cell.
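The symmetry property that motivates RMA regression can be checked numerically. The sketch below uses the standard RMA estimator (slope equal to the ratio of standard deviations, signed by the covariance) on synthetic log-transformed data; the data, seed, and function names are assumptions made for illustration and are not drawn from the Cannell database.

```python
import math
import random
import statistics


def _sums(x, y):
    """Return means and centered sums of squares and cross-products."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return mx, my, sxx, syy, sxy


def rma_regression(x, y):
    """Type II (reduced major axis) fit of y on x: returns (intercept, slope, r)."""
    mx, my, sxx, syy, sxy = _sums(x, y)
    if sxx == 0 or syy == 0 or sxy == 0:
        raise ValueError("RMA slope is undefined for constant or uncorrelated data")
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    r = sxy / math.sqrt(sxx * syy)
    return my - slope * mx, slope, r


def ols_slope(x, y):
    """Type I (ordinary least squares) slope of y on x, for comparison."""
    _, _, sxx, _, sxy = _sums(x, y)
    return sxy / sxx


# Synthetic log10(N) and log10(M) values with scatter (illustrative only).
rng = random.Random(0)
log_n = [rng.uniform(2.0, 4.0) for _ in range(200)]
log_m = [6.2 - 1.4 * v + rng.gauss(0.0, 0.3) for v in log_n]

_, b_rma_yx, r = rma_regression(log_n, log_m)
_, b_rma_xy, _ = rma_regression(log_m, log_n)
b_ols_yx = ols_slope(log_n, log_m)
b_ols_xy = ols_slope(log_m, log_n)

print(f"RMA: slope(y|x) * slope(x|y) = {b_rma_yx * b_rma_xy:.6f}")
print(f"OLS: slope(y|x) * slope(x|y) = {b_ols_yx * b_ols_xy:.6f} (equals r^2 = {r * r:.6f})")
```

For RMA the two slopes are exact reciprocals, so the fitted exponent does not depend on which variable is treated as the predictor. For Type I regression the product of the two slopes equals r², so the two directions disagree whenever the data have scatter.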
 Most models did not distinguish aboveground and belowground woody biomass (the exception is Orchidee). The separation of woody biomass (Mwood) into stem (Mstem) and coarse roots (Mcroot), and the subsequent aggregation of root biomass (Mroot) from fine root (Mfroot) and Mcroot, was made on the basis of allometric scaling between Mcroot:Mwood using the Luyssaert database [Luyssaert et al., 2007]. The allometric relationship between Mcroot:Mwood from the Luyssaert database was estimated as Mcroot = 0.2357 * Mwood^0.978 (r = 0.972, n = 40). This relationship was nearly isometric (β = 1), indicating that Mcroot was a fixed fraction of Mwood (Figure S1 in Text S1). Given that Mfroot was a small fraction of total belowground biomass, this allometry was consistent with the isometric scaling by Enquist and Niklas. The proportionality constant was not significantly different from 0.25, which is the proportionality used in CASA, so for ease of interpretation, the ratio between Mstem:Mcroot was also fixed at 75:25 for IBIS, Triffid, JSBACH, ED, and MEL. For JSBACH, the “active pool” of biomass includes both Mfol and Mfroot, which had to be separated in this study. JSBACH pools high-turnover biomass together, and so a similar approach was employed to separate Mfol and Mfroot from Mactive, using data from the Luyssaert database. The fitted regression using the Luyssaert database was log(Mfroot) = −0.3552 + 1.04 * log(Mactive). This relation was nearly identical to (and not significantly different from) Mfroot = 0.5 * Mactive (Figure S2 in Text S1), which we adopted in this study for ease of interpretation.
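The pool separation described above amounts to two fixed-fraction splits followed by an aggregation of the root pools. A minimal sketch follows, using the 75:25 stem to coarse root ratio and the 0.5 fine root fraction stated in the text; the function names and the example masses are hypothetical.

```python
COARSE_ROOT_FRACTION = 0.25  # Mcroot as a share of Mwood (75:25 stem:coarse root)
FINE_ROOT_FRACTION = 0.5     # Mfroot as a share of Mactive (JSBACH active pool)


def split_wood(m_wood, croot_fraction=COARSE_ROOT_FRACTION):
    """Split a combined wood pool into (Mstem, Mcroot)."""
    m_croot = croot_fraction * m_wood
    return m_wood - m_croot, m_croot


def split_active(m_active, froot_fraction=FINE_ROOT_FRACTION):
    """Split a combined high-turnover pool into (Mfol, Mfroot)."""
    m_froot = froot_fraction * m_active
    return m_active - m_froot, m_froot


def root_mass(m_croot, m_froot):
    """Aggregate root biomass: Mroot = Mcroot + Mfroot."""
    return m_croot + m_froot


# Hypothetical per-tree pools (kg dry matter) for a model with combined pools.
m_stem, m_croot = split_wood(100.0)
m_fol, m_froot = split_active(6.0)
m_root = root_mass(m_croot, m_froot)

print(f"Mstem={m_stem}, Mcroot={m_croot}, Mfol={m_fol}, Mfroot={m_froot}, Mroot={m_root}")
```

Both splits conserve mass by construction, so the total per-tree biomass is unchanged by the repartitioning.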
N and Mcwd columns indicate whether these models include representations for the tree population density and the coarse woody debris biomass pool, respectively. τfol, τwood and τfroot represent the characteristic lifetimes of foliage, wood, and fine roots, respectively. Studies are ordered from fastest turnover to slowest turnover: ss, varying with successional status; a, angiosperm; g, gymnosperm; *, maximum value with additional losses imposed by stand thinning; **, maximum value, with additional losses imposed by climate.
 The LSM simulation results come from a variety of sources. The model results for CASA, IBIS, Triffid, Orchidee and JSBACH come from a single year in distributed global simulations, run to equilibrium with current CO2, in which biomass from every forested pixel has been used to calculate scaling parameters. ED model results come from the final time period of a distributed regional run reported by Moorcroft et al. These results can be seen as cross-sectional data, similar to the forest inventory databases themselves. Each model has a variety of biome types representing broadleaves and conifers from different biomes; all gymnosperm and angiosperm biomes in the LSMs were aggregated for this study for the estimation of population density and for comparison to forest inventory data. The exception to this scheme is CASA, which does not have separate parameterizations for gymnosperms and angiosperms; the calculation of trees per area comes from scaling parameters for grouped angiosperm and gymnosperm data. Biomass scaling parameters derived from MEL come from a single run parameterized for the H.J. Andrews experimental forest in Oregon, dominated by Douglas fir (Pseudotsuga menziesii), and therefore population density was calculated using the equation for gymnosperms and compared to gymnosperm inventory data.
 The biomass distribution in the different LSMs was evaluated against several databases of forest biomass, including the Cannell world forest biomass and production database [Cannell, 1982], which is a benchmark database used in several high-profile papers developing allometric scaling theory [Enquist and Niklas, 2001, 2002; Enquist et al., 1998; Niklas and Enquist, 2001, 2002], supplemented by a more extensive Russian forest inventory database translated by the authors [Usoltsev, 2001] and a smaller, but more contemporary and methodologically vetted database compiled from Fluxnet sites [Luyssaert et al., 2007]. All databases include data on tree population density, leaf, stem, and root biomass collected from a large number of diverse published sources.
 The Cannell database [Cannell, 1982] (hereafter “the Cannell database”) reports the number of trees per area, average plant height, and plant biomass per area separated when possible into leaves, branches, bark, stem, reproductive structures, and roots. The mass of each pool per individual was computed by dividing mass per area by trees per area. All results henceforth will refer to mass (kg dry matter) per individual unless otherwise noted.
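The conversion from areal to per-individual biomass is a single division per pool. The sketch below shows it for one stand record; the record layout, field names, and numbers are invented for illustration and do not come from the Cannell database.

```python
def per_tree_pools(areal_pools_kg_ha, trees_per_ha):
    """Convert areal biomass pools (kg ha^-1) to mean-tree pools (kg per tree)."""
    if trees_per_ha <= 0:
        raise ValueError("tree density must be positive")
    return {name: mass / trees_per_ha for name, mass in areal_pools_kg_ha.items()}


# Hypothetical stand record (all values invented for illustration).
stand = {"foliage": 8000.0, "stem": 150000.0, "root": 36000.0}
density = 1200.0  # trees per hectare

mean_tree = per_tree_pools(stand, density)
print(mean_tree)
```

Dividing by density discards the size distribution within the stand, so the result describes the mean tree rather than any measured individual.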
 The Usoltsev database [Usoltsev, 2001] (hereafter “the Usoltsev database”) is a previously unpublished (in English) database of destructive forest harvests from boreal Eurasia. Approximately half of the studies cited in the Usoltsev database have been used in IIASA studies of Eurasian biomass [Lapenis et al., 2005; Shvidenko and Nilsson, 2002, 2003; Shvidenko et al., 2007], but the Usoltsev database includes both more studies and additional statistics from each study. The Usoltsev database includes 3874 records of destructively harvested forest plots spanning boreal Eurasia, approximately evenly split between four genera of gymnosperms (n = 1764) and seven genera of angiosperms (n = 2110). By far the most represented genus is Pinus, which is largely composed of Pinus sylvestris L. from western Russia and Scandinavia. Each record in the database cites an author and publication year, the location of the plot (to county level, and generally with coarse coordinates), and general attributes such as the dominant species, the yield potential class [Shvidenko et al., 2007], and the species composition. For each plot, the age, the tree density, diameter, height, volume of trunk, and mass of trunk, bark, branches, foliage, roots, and understory are reported. Not all records include all data, particularly bark, roots and understory biomass. The areal mass reported in the Usoltsev database was converted to an individual basis by dividing by the population density, as was done by Enquist and Niklas for analyzing allometry within the Cannell database. This database is used to broaden coverage of boreal forests, which are relatively underrepresented in the Cannell database, and to provide an independent check on biomass allometry between Mfol, Mstem and Mroot. Additionally, the Usoltsev database has a large number of Mcwd measurements that are used to test the allometry of Mcwd:M.
 The Luyssaert database [Luyssaert et al., 2007] (hereafter “the Luyssaert database”) is collected from a comparatively small number of sites (n = 111 total, n = 41 angiospermae, n = 65 gymnospermae, n = 5 mixed) but that have been more intensively studied as part of the Fluxnet program. This data set was previously used to examine net ecosystem exchange in old growth forests [Luyssaert et al., 2008], but has not otherwise been used to test allometric relationships. Like the Cannell and Usoltsev databases, the Luyssaert database includes stand age, diameter, height, density, and basal area, and separate biomass measurements for foliage, branch, stem, coarse and fine root pools. In general, the Luyssaert database is more comprehensive for NPP measurements than for biomass measurements, with the exception of its treatment of fine root biomass. Therefore this database is used mainly to estimate scaling exponents for separating Mwood into Mstem and Mcroot and in one case for separating “active biomass” into Mfol and Mfroot components. The Luyssaert database was also used to test the allometry of Mfroot:M.
 Although the goal of this study was to compare the biomass allometry implicit in land surface models with the allometry estimated empirically using the forest inventories above, the self-thinning relationship (equation (1)) from the Cannell database was used in some cases to estimate the population density of forests from stand-level biomass for both angiosperm and gymnosperm taxa. Because these two equations were imposed on all of the Earth's angiosperm and gymnosperm biomes, as conceptualized in the land surface models considered here, care was taken to ensure that the data were relevant to diverse, natural forest stands, a major portion of which are located in the tropics, which are generally undermeasured relative to temperate regions. The Cannell database does have the greatest representation from temperate sites, but nearly 20 percent of all sites (198 of 1047) are from tropical regions. And while one might expect most sites to be monospecific, managed sites, nearly 2/3 of the sites (670) have more than one species (Table 3). Significant differences in the parameter values between different subsets of the data (tropical versus temperate and boreal; monoculture versus polyculture) were examined and are presented in the results.
Table 3. Distribution of Plots in Cannell Forest Database Among Various Taxa, Climate Zone, and Species Diversity
 The main theoretical work on allometry referenced in this paper is Enquist and Niklas, in which Mfol ∝ Mstem^(3/4), Mfol ∝ Mroot^(3/4), and Mstem ∝ Mroot^1. It is not the goal of this paper to argue that these scaling exponents are necessarily correct, but rather that they offer guidance in the types of relationships one would expect, and clarify the relationships observed in the empirical data.
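A short worked consequence of these theoretical exponents may help in reading the later comparisons. It follows directly from the stated proportionalities (foliage scaling as the 3/4 power of stem and of root mass, and stem scaling isometrically with root mass) and is not a result from the databases; the subscripts 1 and 2 denote two hypothetical trees whose stem masses differ by a factor of 10^4.

```latex
% Foliage-to-stem ratio declines as the -1/4 power of stem mass
\frac{M_{fol}}{M_{stem}} \propto \frac{M_{stem}^{3/4}}{M_{stem}} = M_{stem}^{-1/4}

% Two hypothetical trees whose stem masses differ by a factor of 10^4
\frac{(M_{fol}/M_{stem})_2}{(M_{fol}/M_{stem})_1}
  = \left(\frac{M_{stem,2}}{M_{stem,1}}\right)^{-1/4}
  = \left(10^{4}\right)^{-1/4} = 10^{-1}

% The three exponents are mutually consistent
M_{fol} \propto M_{stem}^{3/4}, \quad M_{stem} \propto M_{root}^{1}
  \;\Longrightarrow\; M_{fol} \propto M_{root}^{3/4}
```

Under these exponents, a tree with 10,000 times the stem mass carries one tenth the foliage per unit stem mass, which is the shift from foliage-rich young stands to stem-dominated old stands described in the introduction.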
 As a diagnostic to interpret discrepancies between modeled and observed allometries, aboveground coarse woody debris (CWD) is treated in this paper as a biomass pool that obeys allometric scaling. In other words, we posit that the mass of CWD divided by N (Mcwd) scales with aboveground woody biomass (Mstem) with some scaling exponent, as an outcome of self-thinning of stands and self-pruning of individual trees. Although this is an atypical consideration of CWD [cf. Harmon et al., 2000, 1986], the collection of CWD data in forest inventories [Krankina et al., 2002; Waddell, 2002] makes it amenable to scaling analysis and comparison with model simulations.
 This study exploited the “self-thinning” relation that scales M to N. The global Cannell database was used to fit the parameters relating M to N, and the results were checked against the Usoltsev data, because the boreal biomes were relatively underrepresented in the Cannell database (Table 3). Figure 2a shows the M:N regression fitted to the Cannell database, and Figure 2b shows the resulting inverse relationship predicting N from Mtot, again superimposed on the Cannell forest plots. Figure 2c shows the Cannell M:N regression superimposed on the Usoltsev boreal plots, and similarly, Figure 2d shows the inverse relation predicting N from Mtot fitted to the Cannell plots superimposed on the Usoltsev plots. Interestingly, the line predicting N from Mtot (Figures 2b and 2d) lies intermediate between the Type I (least squares) and Type II (reduced major axis) regression fits that would result if they were directly applied to regress N on Mtot. However, it is important to note that M:N scaling parameters estimated from regressions fit to N:Mtot were wholly inconsistent with the range of parameters presented elsewhere for the self-thinning relationship. Therefore, while the amount of variation in N explained by Mtot may appear to be low for both the Cannell and Usoltsev databases, this relationship both captured the theoretical understanding of M:N scaling in forests and had coefficient values that are intermediate relative to direct regression of N on Mtot.
 The M-N scaling relationship adopted from the Cannell database and imposed on IBIS, Triffid, CASA, JSBACH and MEL was log(M) = 6.22 − 1.320 * log(N) for angiosperms and log(M) = 6.619 − 1.453 * log(N) for gymnosperms (Table 4). Within each taxonomic class (angiosperms and gymnosperms), the M:N regression was calculated separately for sites in temperate and boreal versus tropical zones, and for plots composed of monoculture versus polyculture (Table 4). While the different regressions did have different coefficients (Figure S3 in Text S1), the treatments with the largest departures, such as tropical gymnosperms, had relatively lower sample size, and did not have allometric scaling coefficients that were significantly different from those derived from all plots within each taxon.
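As a plausibility check on the magnitudes involved, the angiosperm relation log(M) = 6.22 − 1.320 * log(N) can be evaluated at two stand densities. The densities below are chosen arbitrarily for illustration and are not taken from any of the databases.

```latex
% N = 1000 trees per hectare
\log_{10} M = 6.22 - 1.320 \times 3 = 2.26
  \;\Rightarrow\; M \approx 182\ \mathrm{kg}, \quad
  M N \approx 1.8 \times 10^{5}\ \mathrm{kg\,ha^{-1}}

% N = 10000 trees per hectare
\log_{10} M = 6.22 - 1.320 \times 4 = 0.94
  \;\Rightarrow\; M \approx 8.7\ \mathrm{kg}, \quad
  M N \approx 8.7 \times 10^{4}\ \mathrm{kg\,ha^{-1}}
```

A tenfold drop in density therefore corresponds to roughly a twentyfold increase in mean tree mass and about a doubling of stand biomass, consistent with stands of fewer, larger trees holding more biomass per area.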
Table 4. Mass-Density Scaling in the Cannell Database for Different Subsets, Using the Power Law Relationship log10(M) = α + β * log10(N), Where M is Individual Biomass (kg DM) and N is Stand Density (ha−1)
Temp refers to temperate and boreal zones, and trop refers to tropical zones. Mono refers to monoculture plots with only one reported species, and poly refers to plots with more than one species. n is sample size.
Statistics reported for each subset (Y = M, X = N): n, α, β, 2σ.
 The self-thinning exponents relating M to N for Orchidee-STD, Orchidee-FM and ED are presented in Table 5 and shown in Figure S4 in Text S1. CASA, IBIS, Triffid, JSBACH and MEL all are represented by the same equation, which is identical to the M-N regression from the Cannell database used to estimate the number of individuals per area in these models. Both Orchidee models reasonably approximated the observed self-thinning in the Cannell and Usoltsev databases, but ED had a self-thinning line that is considerably steeper than the observed M-N line (Figure S4a in Text S1). It should be noted that N and Mtot for ED were calculated as integrals over all size cohorts in a patch, analogous to the empirical forest inventory data. Although the modeled patches largely fall near the regression for the Cannell and Usoltsev data, the patches are high-N low-M patches that are largely outside the plots observed in the databases, despite the majority of the cohorts being >50 years old.
Table 5. Allometric Relationships Between Tree Biomass Components (Mfol, Mstem, Mroot) and Tree Density (N), Using the Power Law Relationship log10(Y) = α + β * log10(X)
Confidence intervals around β are denoted β+ and β− for the upper and lower bounds. Model parameters highlighted in bold are not significantly different from the C82 data. An asterisk marks a model that does not have a CWD pool; r is the correlation coefficient.
Statistics reported: n, followed by α, β, β−, β+, and r for each Y:X pair (M:N, Mfol:Mstem, Mfol:Mroot, Mstem:Mroot, Mcwd:Mstem).
 The allometry between Mfol:Mstem, Mfol:Mroot and Mstem:Mroot is likewise presented in Table 5. Few models are within the 95% confidence intervals for the slope term of Mfol:Mstem allometry. The main bias evident in the models was an Mfol:Mstem relationship that is too steep, with an intercept that is too low, meaning an inability to reproduce biomass distributions in younger stands (Figure 3). The worst performing models in this regard were CASA and IBIS, which only passed through the data at fairly high levels of Mstem. The best models, Orchidee-FM and ED, had Mfol:Mstem exponents that were not significantly different from observed and performed well across the range of Mstem, although Orchidee-FM was systematically low for both angiosperms and gymnosperms. This underprediction is particularly true for gymnosperms, in which nearly all of the models save Orchidee-STD passed below the lower envelope of gymnosperm data. By contrast, Orchidee-STD showed exceptionally high Mfol for conifers.
 In general, the models performed better for both Mfol:Mroot and Mstem:Mroot (Figure 3), although because Mfroot is a relatively small component of Mroot, the tight convergence among models in Mstem:Mroot is partly an outcome of imposing the allometry from the Luyssaert database, which sets the ratio between Mstem:Mcroot (Figure S1 in Text S1). MEL was the only outlier in the Mstem:Mroot allometry, largely because it had the largest Mfroot component (Figure 4). All the models diverged greatly from each other and from the data in Mfroot:M (Figure 4), and it is interesting to note the strong differences between allocation to Mfroot in both angiosperms and gymnosperms. ED, MEL, and CASA best approximated the Luyssaert database, but both IBIS and Triffid had Mfroot 1–2 orders of magnitude smaller than observed.
 The allometry of Mcwd:Mstem likewise showed large dispersion among models, and large differences between the models and data (Figure 4). The allometry of Mcwd:Mstem was isometric (β ∼ 1). Orchidee-FM and CASA most closely approximated the upper envelope of Mcwd, and MEL and ED approximated the lower envelope of Mcwd, but IBIS had approximately 1 order of magnitude too much Mcwd and Orchidee-STD had 2 orders of magnitude too little Mcwd. Note that neither Triffid nor JSBACH had a pool representing litter or coarse woody debris.
 Errors in self-thinning and biomass allocation partly compensate for one another in some LSMs, yielding approximately correct distributions of biomass in leaf, stem, and root over a range of stand biomass (Figure 5). In particular, Orchidee-STD, Orchidee-FM and ED all represent the growth trajectory of biomass distributions in stands fairly well, particularly the sharp decline in the proportion of foliage biomass and the slow decline in root:shoot ratio. LSMs that were provided with the self-thinning relation from the Cannell database are at odds with the empirical data, particularly being too “stemmy” in younger stands. The greatest contrasts among models and between models and data are in the younger (low biomass) stands; all models and data converge on approximately the same result in the oldest (high biomass) stands.
 This paper is intended to diagnose shortcomings in model representation of biomass, using a new allometric approach. Particularly relevant to a world where human and nonhuman disturbance is fundamentally altering the age structure of forests, it is important to check whether the distribution of biomass in fast (foliage and fine root) and slow (stem, coarse root, CWD) pools is represented equally well by models in all stages of forest succession. It may be less obvious that this paper is also intended to enhance the possibility for informing models with global data. That is, if we were to provide LSMs with better information on disturbance, forest structure, or even biomass directly, could the models accommodate this information and put it to sound use?
 As we have alluded to earlier, examining the allometry implicit in land surface models is difficult to approach directly, because most models have skipped lightly over the process of scaling up from individual trees (the object of analysis in allometry) to whole stands. To be sure, some models, particularly ED and Orchidee-FM, have directly addressed the scaling of individuals to stands, and each parameterizes the competition among plants for horizontal space that results in the loss of individual trees as each grows larger over the course of stand development. The remaining models have essentially ignored this issue and treated woody biomass as a large aggregated pool, which we term the “big wood” approximation, by analogy to the “big leaf” models that scale up leaf-level photosynthesis to the canopy by explicitly treating interactions between leaves [Ehleringer and Field, 1993]. The big wood approximation in essence considers forest biomass as obeying first-order chemical kinetics, dM/dt = −kM + NPP, where all factors that contribute to forest biomass loss, such as growth respiration, maintenance respiration, branch mortality and whole-tree mortality, are embodied in a single rate parameter k. Big wood models were an important step in the creation of global-scale carbon cycle simulations, both because their simplified set of equations eased the burden of computation [Running and Coughlan, 1988], and because these area-based models were readily linked to global gridded canopy reflectance measured by satellites [Running and Nemani, 1988; Sellers et al., 1997], which is informative of the photosynthetic activity of ecosystems [Sellers et al., 1992].
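The behavior implied by the big wood equation can be made concrete with a minimal sketch. For constant NPP and k, dM/dt = −kM + NPP has the closed-form solution M(t) = NPP/k + (M0 − NPP/k)·exp(−kt), so biomass relaxes toward the equilibrium NPP/k with a time constant of 1/k. The parameter values below are hypothetical and are not taken from any of the models examined here.

```python
import math


def big_wood_biomass(t, m0, npp, k):
    """Analytic solution of dM/dt = -k*M + NPP with M(0) = m0.

    t   : time since stand initiation (yr)
    m0  : initial biomass (Mg/ha)
    npp : constant net primary production allocated to the pool (Mg/ha/yr)
    k   : first-order loss rate (1/yr)
    """
    m_eq = npp / k  # equilibrium biomass, where losses balance NPP
    return m_eq + (m0 - m_eq) * math.exp(-k * t)


# Hypothetical values for illustration only.
NPP = 5.0  # Mg/ha/yr
K = 0.02   # 1/yr, i.e. a 50-year mean lifetime

for year in (0, 25, 50, 100, 200):
    print(year, round(big_wood_biomass(year, 0.0, NPP, K), 1))
```

Because a single k governs all losses, the trajectory has only one shape: a saturating exponential. Any size dependence of mortality, such as self-thinning, has no place to enter.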
 Although there are clear benefits to modeling biomass using the big wood approximation, above all simplicity and speed, there are consequences that are worth considering, particularly in light of the large disagreement about the size of the future carbon sink among different LSMs. The allometric behavior of the LSMs is by and large inconsistent with the data. First, it should be pointed out that only Orchidee has information on plant population density and distinguishes between aboveground and belowground biomass for direct comparison to empirical data. CASA did distinguish between aboveground and belowground biomass, but had no concept of plant density, and ED had a concept of plant density but no distinction between aboveground and belowground biomass. All other models required a post hoc recalculation of data to compare with forest inventory, which serves to emphasize that these are not models readily amenable to validation against conventional observed biomass data.
 Whether the population density in the models was calculated using the internal model representation for plant density or with the allometry imposed from empirically measured scaling constants, the models largely performed poorly with respect to the scaling of different organs. Of the different plant organ allometries, only Orchidee-FM (for angiosperms only) has the correct allometry between Mfol:Mstem. Surprisingly, no model had a reasonably high amount of foliage in conifers except Orchidee-STD, which had nearly an order of magnitude too much Mfol for a given Mstem. Most models had a reasonable Mstem:Mroot allometry, but this is largely attributable to the Mstem:Mcroot scaling imposed from the data.
 The Mfol:Mstem allometry calculated from the models generally follows the gradient of characteristic tissue lifetimes (Table 2). The models with the greatest foliage for a given stem biomass also had the fastest stem biomass turnover rates, led by ED, whose ratio of wood lifetime to leaf lifetime is fairly low (25–30). JSBACH, which has the largest ratio of lifetimes (100), also has the allometry that most favors Mstem over Mfol. Orchidee-STD, which has the greatest bias toward Mfol among gymnosperms, also has the smallest ratio of lifetimes (16). Another feature evident from the Mfol:Mstem allometry is that the models that include some treatment of stand thinning (ED, Orchidee) have generally flatter Mfol:Mstem allometries than the remaining models, suggesting that the inclusion of a mechanism for increasing Mstem loss as the stand biomass increases is central to reproducing the slope of this allometry. Among the remaining models, the ordering from left to right of Triffid-CASA-IBIS-JSBACH in their Mfol:Mstem allometries for angiosperms follows the ordering of the ratios of their stem:leaf tissue lifetimes of 25–50–80–100.
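One way to see why the ordering of intercepts tracks the lifetime ratios is to consider the equilibrium of two first-order pools. If each pool receives a fixed fraction of NPP and loses mass at a rate of 1/lifetime, its equilibrium size is (allocation fraction) × NPP × lifetime, so the foliage:stem ratio is inversely proportional to the stem:leaf lifetime ratio. The sketch below uses the lifetime ratios quoted above; the equal allocation fractions are a hypothetical simplification, and the function name is illustrative only.

```python
# Stem:leaf lifetime ratios quoted in the text for angiosperms.
LIFETIME_RATIO = {"Triffid": 25, "CASA": 50, "IBIS": 80, "JSBACH": 100}


def equilibrium_fol_to_stem(lifetime_ratio, alloc_fol=0.3, alloc_stem=0.3):
    """Equilibrium M_fol / M_stem for two first-order pools.

    Each pool equilibrates at alloc * NPP * lifetime, so NPP cancels and
    the ratio is (alloc_fol / alloc_stem) / (stem lifetime / leaf lifetime).
    The default allocation fractions are hypothetical.
    """
    return (alloc_fol / alloc_stem) / lifetime_ratio


ratios = {name: equilibrium_fol_to_stem(r) for name, r in LIFETIME_RATIO.items()}
ranked = sorted(ratios, key=ratios.get, reverse=True)

for name in ranked:
    print(f"{name}: M_fol/M_stem = {ratios[name]:.3f}")
```

Under these assumptions the ranking from most to least foliage per unit stem reproduces the Triffid-CASA-IBIS-JSBACH ordering. It describes only the equilibrium offset between pools, not the slope of the allometry during stand development.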
 The worst model performance is for Mfroot and Mcwd scaling, in which most models had errors of 1 order of magnitude or larger in the intercept term. Fine roots are a small component of biomass but their turnover is a major fraction of CO2 efflux from ecosystems [Jackson et al., 1997], and this analysis suggests that this pool may be greatly underestimated by the models, particularly IBIS and Triffid, which in turn suggests an excessively large turnover rate. Similarly, CWD has important implications for the carbon balance of ecosystems because of its large mean residence time [Harmon et al., 1990; Ramankutty et al., 2007], but this analysis showed that all models have either an order of magnitude or more excess Mcwd (IBIS, Orchidee-FM, CASA) or an order of magnitude or more too little Mcwd (MEL, Orchidee-STD). Neither Triffid nor JSBACH has a representation of Mcwd, and it is reasonable to interpret the extremely small Mcwd simulated by Orchidee-STD as a misrepresentation of structural litter (dead straw or leaves) as CWD. Because Mcwd, as a slow turnover pool, provides inertia to the carbon balance of ecosystems, it is logical to attribute the high feedback of Triffid in the C4MIP comparison [Friedlingstein et al., 2006] to a structural failure in the model, namely the complete absence of an essential reservoir for carbon that would dampen model feedback to climate perturbation.
 The combination of too little Mfroot and too much Mstem and Mcwd suggests that the rate of respiration in LSMs is underestimated. It is not possible from the data presented to determine whether the problem lies with parameters governing allocation (the flux of C into these pools) or turnover (the flux of C out of these pools), but the overall picture is one in which the velocity of C through the ecosystem is too slow. It is interesting that the Orchidee-FM model showed substantial improvements over Orchidee-STD in both Mstem and Mcwd, possibly attributable to the explicit treatment of branch biomass and mortality in Orchidee-FM. This suggests that the bulk parameterization of stem wood in Orchidee-STD as inclusive of branches led to an overestimation of the residence time of carbon in woody pools by neglecting fine wood with higher rates of turnover caused by self-pruning. To the extent that Mfroot and Mfol are strongly correlated with Mstem (on a per individual basis), and these pools are important paths of C into the atmosphere due to high turnover, observations of Mstem can give improved constraints on the magnitude of these fluxes by constraining the size of these pools.
 Generally, the accumulation of biomass over the course of stand development in LSMs does not closely resemble the empirical data. To show the practical implications of the errors in allometry implicit in LSMs, the component allometries of foliage (Mfol:M), stem (Mstem:M), and root (Mroot:M) per individual were integrated to an areal basis using their individual M:N allometry (Table S1). The self-thinning allometries in the models that have a representation of individuals (Orchidee-STD, Orchidee-FM and ED) create stands that are at odds with the empirical allometry relating individual biomass to stand biomass (Figure S5 in Text S1). Although the estimated scaling exponents of these models were approximately correct, the departure of the scaling lines from the data was greatest for young stands (high number/low biomass). The steepness of the self-thinning curves in these models implied that, for a given total biomass, the stand is composed of more, and smaller, individual trees than is observed in nature. The median values of stand biomass are 154 and 178 t DM/ha for the Cannell and Usoltsev databases, respectively, with stands of 500 t/ha falling above the 90th and 95th percentile of all stands in the Cannell and Usoltsev databases, respectively. The median tree biomass is 80 and 115 kg/tree in the Cannell and Usoltsev databases, respectively. At these values of stand biomass, the LSMs produce individuals that are approximately one hundredth the size found in the databases.
 Allometric theory [Enquist and Niklas, 2002] suggests that the biomass distribution in different pools has a strong dependence on organism size, such that the discrepancy in organism size between the models and data leads to major differences in the biomass pools for organisms of different sizes (Figure S6 in Text S1). The Mstem:Mroot distribution in the models is fairly consistent with the data (a notable exception is IBIS), but the fraction of biomass in foliage is in general less than observed, particularly for gymnosperms. Most models do not simulate the nitrogen cycle, and some, like Orchidee, substitute an imposed limit to leaf area index for a nitrogen limitation to foliage growth, which could explain this pattern. CASA, by contrast, has far too much leaf biomass compared to the data (Figure S6 in Text S1). Because the turnover times of foliage [Williams et al., 1989; Wright and Westoby, 2002], fine roots [Jackson et al., 1997; Matamala et al., 2003], wood [Makinen, 1999], and coarse woody debris [Delaney et al., 1998; Harmon et al., 2000] are orders of magnitude different from one another, and these tissues have different construction and maintenance costs [Amthor, 2000; Merino et al., 1984; Penning de Vries et al., 1974], the total ecosystem respiratory losses due to leaf growth and senescence, wood mortality and fine root turnover are closely linked to the distribution of biomass in these different pools.
 Given the choice, would we want models to perform better during early growth or at the mature equilibrium? Models are generally spun up to reach equilibrium biomass in order to match the estimated global biomass of 600 Gt C [Olson et al., 1983]. However, evidence from a number of biomes worldwide suggests that forests in many regions are far from equilibrium, and in many cases different forces now lead to forests becoming demographically younger due to harvest [Houghton, 2005], increased fire frequency [Soja et al., 2006], climate or herbivore enhanced mortality [Ayres and Lombardero, 2000; van Mantgem et al., 2009] and woody encroachment [Asner et al., 2003; Mast et al., 1997], so estimating the current and future carbon cycle arguably requires greater fidelity during early growth. The analysis above suggests that none of the LSMs can accommodate satellite or forest inventory information about forest structure (i.e., tree number or crown size), stand age, or total stand biomass because of structural errors in the models themselves that create logical inconsistencies between these forest stand attributes at variance with that observed in nature.
 To understand the implications of these discrepancies, consider the goal of enhancing the ability to link LSMs to global satellite data. A previous study found that optical remote sensing data, particularly the underused off-nadir and multiangle observations, are strongly affected by the size, shape and spatial arrangement of trees in the scene [Wolf et al., 2010], which is an extension of the traditional use of optical remote sensing for estimation of leaf area index using nadir-observed NDVI. The implication of this finding is that global land surface models can be informed by the sizes of trees, as well as the large number of attributes associated with size, particularly biomass. The potential to properly initialize land surface models with appropriate biomass has important consequences for accurately predicting the strength of the carbon source or sink for land ecosystems [Friend et al., 2007; Williams et al., 2009]. There is, however, a major obstacle preventing the direct link between land surface models and satellite reflectance, because with few exceptions land surface models do not simulate any state variables related to the size or number of individual trees.
 The present study does not take up the issue of linking land surface models to satellite reflectance by estimating the number and size of individual trees. Instead, it considers the issue: if individual level biomass were provided to a land surface model, are there structural biases that would inhibit the appropriate use of such data? It is useful in this context to consider the data assimilation equation for the Kalman Filter:
$$X_{k|k} = X_{k|k-1} + P H^{T} \left( H P H^{T} + R \right)^{-1} \left( Y_k - f(X_{k|k-1}) \right)$$
where k∣k refers to the estimated state of the system (for instance forest biomass) at time k, subject to all information available by time k. The estimate of X is based on a weighted average of the previous estimate of X carried forward from time k − 1 (Xk∣k−1), plus some new data Yk, which has been remapped in terms of X using the Jacobian matrix of partial derivatives H, where Hi,j = dYi/dXj. The final term is called the observation error, and represents the discrepancy between the observation Y and the modeled prediction of what should be observed based on the prior estimate of the state, f(X). The weighting of the prior state estimate and the new data is cast in terms of the inverse of their uncertainty, where P and R are the variance-covariance matrices of the state vector and the data, respectively. The Kalman filter is a particular form of this weighted average, in which the new state estimate is framed as a linear combination of the old state estimate and an observation error, multiplied by the “Kalman gain,” PHT(HPHT + R)−1.
 We would like to draw attention to the fact that the new information is assimilated via a matrix of partial derivatives H, which acts on small errors in the prior state estimate (Y − f(X)). This implies that the important benchmark for model accuracy is not only whether the correspondence between a model and reality is approximately right in magnitude, but also whether the functional form of the model trajectory is correct. To make this more concrete, consider the following scenario: say we model biomass M for some plot of land, and estimate it as 210 Mg/ha. Later we come to learn that a separate survey estimates the biomass M as 231 Mg/ha. It appears we have to correct our prior estimate of biomass, but how? Logically we would increase the biomass in all individual pools (Mfol, Mstem, Mcroot, Mfroot) that comprise the total biomass, and we would use some vector of partial derivatives of each state variable to each observation to allocate M among the different pools, which in this case yields H = [dMfol/dM, dMstem/dM, dMcroot/dM, dMfroot/dM]. Across some small perturbation of biomass, the H matrix can be thought of as linear, but for larger observation errors, we know from analyses of allometric scaling [Enquist and Niklas, 2002] that the relationships between the different components of biomass are nonlinear, and vary greatly over the span from young forests with many small trees to old forests with few large trees. It follows from this that the important benchmark of model accuracy for assimilating biomass is the allometric relations embodied within the land surface models.
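The scenario above can be sketched as a single Kalman update in which the state is the vector of four pools and the one observation is their sum. In this special case the observation operator is H = [1, 1, 1, 1] (following the definition Hi,j = dYi/dXj), f(X) is the sum of the pools, and P is taken to be diagonal. The prior pool sizes, their variances and the observation variance are all hypothetical, chosen only so that the prior total is 210 Mg/ha and the observed total is 231 Mg/ha; the function name is illustrative.

```python
def kalman_update_total(prior, variances, obs_total, obs_var):
    """One Kalman update of pool sizes given an observation of their sum.

    Special case of X = X_prior + P H^T (H P H^T + R)^-1 (Y - f(X_prior))
    with diagonal P (given by `variances`), H = [1, ..., 1], and
    f(X) = sum(X). `obs_var` is the scalar R.
    """
    innovation = obs_total - sum(prior)      # Y - f(X_prior)
    s = sum(variances) + obs_var             # H P H^T + R
    gain = [p / s for p in variances]        # P H^T (H P H^T + R)^-1
    return [x + g * innovation for x, g in zip(prior, gain)]


# Hypothetical prior pools (Mg/ha): foliage, stem, coarse root, fine root.
prior_pools = [5.0, 160.0, 40.0, 5.0]        # sums to 210
prior_vars = [1.0, 400.0, 100.0, 1.0]        # hypothetical variances
posterior_pools = kalman_update_total(prior_pools, prior_vars, 231.0, 100.0)

print([round(v, 2) for v in posterior_pools], round(sum(posterior_pools), 2))
```

With a fixed linear H, the 21 Mg/ha discrepancy is shared among the pools in proportion to their prior variances, regardless of stand size. If the true relationships among pools are allometric, that split is only appropriate for small corrections around the prior state, which is why the allometry embodied in the model matters for assimilation.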
 The stand biomass simulated by a number of land surface models (LSMs) that are widely used in carbon cycle research was examined to determine whether the interrelationships between component parts, particularly between foliage, woody stem, coarse and fine roots, were consistent with observations in several forest biomass databases and with theoretical predictions. The interrelationships between biomass of different plant parts are collectively known as allometry. The allometric approach to diagnosing whether models realistically simulate biomass distributions was particularly powerful in identifying strong departures from empirically measured stands, and whether the errors occurred equally in stands with low biomass and high biomass. For the most part, models have a disproportionate amount of stem, with the largest discrepancy for young stands (low biomass, high population), which are much more widespread globally than old growth stands. The slope of the Mfol:Mstem allometry differed between models that represented self-thinning (flatter slopes, consistent with data) and those that did not (steeper slopes, not consistent with data). The intercept of the Mfol:Mstem allometry was linked to the relative turnover time of the tissues, with those models having a smaller ratio of wood:leaf turnover time having a higher intercept (consistent with the data). The best performing models generally had the shortest wood turnover time. All models perform particularly badly when compared to coarse woody debris data, with some models dramatically overestimating CWD and some models underestimating CWD. Those that underestimated CWD did not conceptually define this pool as woody debris, but as fine litter. The discrepancy in pool distribution between models and data suggests that estimates of NEE may have biases when integrated over the long term, as compared to observed biomass data, and could therefore compromise long-term predictions of land carbon sources and sinks. 
The pattern of excess stem and CWD and too little fine roots and foliage suggests that the flux of carbon through most ecosystem models is too low, leading to underestimations in respiration associated with mortality and turnover. We think that this presents a practical obstacle for improving models by informing them better with data. The approach taken in this paper, examining biomass pools allometrically, offers a simple approach to improving the characteristic behaviors of global models with the relatively sparse data that is available globally by forest inventory.
 This work was supported by a NASA ESS Fellowship to AW to visit LSCE, Saclay, France. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Thanks to Jen Johnson, Julia Pongratz, Long Cao, and Robert McKane, who provided model simulations for this work.