Estimating the wood density of species for carbon stock assessments




1. Studies of forest carbon stocks and fluxes rely on estimates of specific wood density to convert tree diameter and height measurements made within permanent plots into carbon stock estimates. However, measurements of wood density are often available for only a subset of species. As there is strong phylogenetic trait conservatism of wood density, missing data are usually estimated by averaging the wood densities of other species within the same genus or family, using whatever data are available locally.

2. The Global Wood Density (GWD) database provides wood densities for 8412 species from around the world, offering an opportunity to use data from further afield when values are missing in a study area. We investigated whether the GWD provides better estimates than local data sets when conventional averaging methods (AM) are used. We also developed Hierarchical Bayesian Models (HBM) that incorporate phylogenetic covariance to estimate missing wood densities.

3. Using AM, we found that correlations between observations and estimates were higher when the GWD was used in place of local data sets, mostly because of larger sample size. Missing wood densities were more accurately estimated from the global data set than from local data sets, indicating that the GWD should be used as a common standard when calculating carbon stocks.

4. Estimates based on including phylogenetic dependency in HBMs were also closely correlated to observations, but were no better than those obtained from the simpler AM. Estimations based on HBM could become more useful when phylogenetic trees resolved to the species level are available. Until such improvements are made, we conclude that building more data into the reference data set, rather than improving the method itself, is the most productive way to refine estimates of unknown wood densities.

Introduction


Forests are thought to contain about half of the carbon in terrestrial biomes (Prentice et al. 2001). Tropical forests alone absorb 1·3 billion tons of carbon more than they lose per year (Lewis et al. 2009). However, deforestation is estimated to contribute about 15% of global emissions of CO2 (Quere et al. 2009; van der Werf et al. 2009), and these fluxes are central to policy debates regarding forest management in the context of climate change. One limitation is that all fluxes are estimated with a large degree of uncertainty. The estimates are derived from measurements of diameters and heights of individual trees in forest inventory plots, which are converted to carbon stocks using standard formulae, summed across all trees within a plot, then scaled up to provide regional and global estimates (Phillips et al. 1998; Chambers et al. 2001; Coomes et al. 2003; Baker et al. 2004a; Chave et al. 2004; Lewis et al. 2009). Information on the specific wood density (ρ, the dry mass of a unit volume of fresh wood) of trees is needed to convert wood volume measurements (i.e. tree heights and diameters) into carbon estimates, and it is recognized that variability in ρ is a critical factor influencing these estimates (Baker et al. 2004a; Chave et al. 2004; Nogueira et al. 2005).

Specific wood density information is often available for only a proportion of stems recorded in forest inventory plots, especially in species-rich forests (c. 60%, for instance, in Lewis et al. 2009), giving rise to questions about how to estimate missing wood densities. The conventional approach is to use whatever wood density information is available from the study region to fill in the gaps: species are assigned averages calculated from other species within their genus or family, or an overall average is used if no data are available from closely related species (Slik 2006). This averaging method (AM) has been justified by the strong phylogenetic conservatism observed in wood density data sets, meaning that closely related species have more similar wood densities than distantly related species (Baker et al. 2004b; Chave et al. 2006; Swenson & Enquist 2007). However, a rigorous assessment of the approach is lacking. The recent compilation of a global wood density data set (GWD; Chave et al. 2009; Zanne et al. 2009) provides an opportunity to improve upon existing approaches. Can the accuracy of carbon stock estimates be improved by drawing on information available at the global scale from closely related species? Should more sophisticated approaches be used to calculate the mean wood density of closely related species rather than simply averaging within taxonomic groups? Specifically, can we use phylogenetic relatedness obtained from mega-trees when estimating missing values? The answers to these questions are not obvious because wood density correlates with environmental factors (Swenson & Enquist 2007), and regional floras differ in their degree of endemism, such that including information from a wide geographical area may blur rather than sharpen these estimates.

Here, we combine wood density information from the Global Wood Density database (GWD; Zanne et al. 2009) with phylogenetic relatedness information based on super-trees at the family and the genus levels (Chaw et al. 2000; Davies et al. 2004) to assess the effectiveness of alternative approaches to wood density estimation. We use a Bayesian framework to make estimates based on phylogenetic relatedness, because it enables the incorporation of prior knowledge and phylogenetic dependence, and also readily models missing data from actual observations (Banerjee et al. 2003; Wikle 2003; Clark et al. 2005).

Material and methods

We briefly describe the standard AM, then describe our approach for testing whether more accurate estimates are obtained by using a regional or a global reference data set. Then, we describe a general method for estimating missing data using hierarchical models that incorporate phylogenetic information. Finally, we explain how we used this approach to estimate missing wood densities in the Oceania region, and how the accuracy of the estimates was evaluated.

The averaging method

Suppose that a particular species s in a given genus g and family f has an unknown wood density ρ. We use a reference data set (R) to estimate ρ from one of the following rules: (i) if R contains species in the genus g, then ρ is the average wood density of these species; (ii) if R contains species in genera within the family f, then ρ is the average wood density of these species; and (iii) if the family f is not represented in R, then ρ is the overall average of all measurements in R. The method can be extended to include additional taxonomic levels (e.g. averaging within an order), but is usually restricted to the above scheme, as little information is gained at the level of orders and above (Baker et al. 2004b; Chave et al. 2006).
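The three fallback rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `averaging_method`, the tuple layout of the reference data set and the taxon names and densities in the usage lines are all hypothetical.

```python
from statistics import mean

def averaging_method(genus, family, reference):
    """Estimate a missing wood density from a reference data set R.

    reference: list of (genus, family, density) tuples, one per species
    (hypothetical layout). Returns (estimate, level used).
    """
    # Rule (i): average over species in the same genus, if any.
    same_genus = [d for g, f, d in reference if g == genus]
    if same_genus:
        return mean(same_genus), "genus"
    # Rule (ii): otherwise average over species in the same family.
    same_family = [d for g, f, d in reference if f == family]
    if same_family:
        return mean(same_family), "family"
    # Rule (iii): otherwise use the overall average of R.
    return mean(d for g, f, d in reference), "all"

# Illustrative values only (not taken from the GWD).
R = [
    ("Nothofagus", "Nothofagaceae", 0.50),
    ("Nothofagus", "Nothofagaceae", 0.60),
    ("Agathis", "Araucariaceae", 0.45),
    ("Araucaria", "Araucariaceae", 0.55),
]
estimate, level = averaging_method("Wollemia", "Araucariaceae", R)
```

The function also returns the level at which the average was taken, so that errors can later be summarised by level.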

Regional vs. global reference

The GWD contains wood density measures for 8412 species in 1683 genera and 184 families taken in 16 regions of the world as defined in Chave et al. (2009). Using the AM, we evaluated whether it was better to use a regional or a global reference (R) to estimate wood density. As we were interested in deriving species-level predictions, species with multiple records in the GWD were assigned the mean of their measured wood densities.

Regional floras represented in the GWD have varying species richness (nregion) from 68 species for Europe to 2755 species for tropical South-East Asia. For a given region, we first selected 100 species at random from the regional pool in regions where nregion > 200, or nregion/2 in regions with fewer species (Europe, Madagascar and Oceania; Table 1). This method allowed us to account for differences in nregion, and to obtain sample sizes comparable with those in forest biomass studies. The random sample of species had densities ρobs. We pretended that we did not know these wood densities, and used the AM to estimate them with three different R: (i) the regional data set; (ii) the complete GWD; and (iii) 100 random samples of size nregion taken from the GWD. Using these three types of reference data set allowed us to compare the predictive power of a regional vs. global reference data set, and to test whether the advantage of using the GWD (if observed) was because of its greater size. In all analyses, the wood densities that we were estimating were excluded from R.
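The hold-out design described above can be sketched as follows. This is an illustration under stated assumptions: the function names, the dictionary layout (species name mapped to a record) and the use of integer division for nregion/2 are hypothetical choices, not details given in the text.

```python
import random

def sample_size(n_region):
    """Number of species to hold out in a region with n_region species."""
    # 100 species where n_region > 200, otherwise half of the regional pool
    # (integer division is an assumption made here).
    return 100 if n_region > 200 else n_region // 2

def build_references(regional, gwd, held_out, rng):
    """Build the three reference data sets, excluding held-out species.

    regional, gwd: dicts mapping species name -> record (hypothetical layout)
    held_out: set of species whose densities are treated as unknown
    rng: a random.Random instance, passed in for reproducibility
    """
    regional_ref = {s: v for s, v in regional.items() if s not in held_out}
    global_ref = {s: v for s, v in gwd.items() if s not in held_out}
    # Random subset of the GWD with the same size as the regional pool.
    k = min(len(regional), len(global_ref))
    sub_ref = {s: global_ref[s] for s in rng.sample(sorted(global_ref), k)}
    return regional_ref, global_ref, sub_ref
```

Comparing the regional reference with the size-matched GWD subset is what separates the effect of sample size from the effect of geographical origin.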

Table 1.   Comparison between estimated and observed wood densities within regional floras

Region                        nregion  nsample  r      ε (%)  rGWD   εGWD (%)  rsub  εsub (%)
Africa (tropical)             619      100      0·62†  16     0·65   16        0·45  21
Australia/PNG (tropical)      915      100      0·74†  17     0·74   16        0·62  20
Central America (tropical)    329      100      0·52†  27     0·70‡  21        0·38  32
North America                 207      100      0·70†  15     0·82‡  14        0·42  27
South America                 580      100      0·59†  22     0·67‡  19        0·43  25
South America (tropical)      2004     100      0·75†  14     0·74   15        0·64  18
South-East Asia               206      100      0·27   28     0·66‡  24        0·25  33
South-East Asia (tropical)    2755     100      0·66   17     0·68   17        0·61  19

Mean Pearson's correlation coefficient (r) and mean relative error (ε) obtained from 50 randomly drawn samples of size nsample species within 16 regions (Chave et al. 2009) with data on nregion species each. Estimates were obtained using the averaging method with either the regional subset (r, ε), the complete Global Wood Density database (GWD, Zanne et al. 2009) (rGWD, εGWD) or random samples of GWD of size nregion (rsub, εsub) as a reference.

†Regions where the regional reference gave higher correlations than the sampled GWD.

‡Regions where the GWD gave superior correlations.

Each of these methods provided estimates of wood density (ρest) which we compared with the observed values (ρobs) using Pearson's correlation coefficient. We also calculated the relative error (ε) as a measure of the deviation of estimates from observed values:
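The formula for ε is not reproduced in this text. The sketch below therefore assumes one common definition, the mean of |ρest − ρobs| / ρobs expressed as a percentage, which is consistent with the percentages reported in the tables but may differ in detail from the authors' definition. Function names are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mean_relative_error(rho_est, rho_obs):
    """Mean relative error in percent (assumed definition, see lead-in)."""
    errors = [abs(e - o) / o for e, o in zip(rho_est, rho_obs)]
    return 100.0 * sum(errors) / len(errors)
```

Both measures are needed because they capture different things: a set of estimates can be strongly correlated with the observations and still be biased.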


In each region, we repeated the sampling and testing procedure using 50 different randomly chosen species samples. The complete procedure thus produced 2400 comparisons (16 regions × 50 replicates × 3 references).

Hierarchical models using phylogenetic similarity

We developed Hierarchical Bayesian Models (HBM) of wood density for a data set comprising ns species in ng genera in nf families, including observed and missing data ρ = (ρmis, ρobs). In the Bayesian framework, estimates of missing data can be readily produced using the predictive distribution p(ρmis | θ, ρobs) (Gelman et al. 2004) once model parameters (θ) have been estimated conditional on observations. Estimation of missing data therefore benefits from having a large quantity of information available.

We tested three different models that varied in the level of phylogenetic information included. The models are based on a special case of the following model:


where ρijk is the wood density of species i in genus j and family k, μ is the grand mean and αjk and βk are nested genus and family effects. At the species level, wood density is the sum of a random species-specific effect and a nested genus effect. A similar specification defines the genus effect at the genus level. At the family level, the family effect is modelled as the sum of a constant mean and a random effect. We assumed that si, gj and fk were normally distributed:


where the variances of these effects are unknown parameters. The components of the mean vector, μ, were normally distributed around the mean wood density in the entire reference data set.
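Written out from the verbal description above, the nested structure can be sketched as follows. This is a reconstruction rather than the authors' typeset equations: the standard-deviation symbols σs, σg and σf, the scaling of the covariance matrices Σg and Σf by a variance, and the symbols ρ̄R (mean wood density in the reference data set) and τ² (prior variance of μ) are assumptions.

```latex
\begin{align}
\rho_{ijk} &= \alpha_{jk} + s_i, & s_i &\sim \mathcal{N}(0,\ \sigma_s^2), \\
\alpha_{jk} &= \beta_k + g_j, & \mathbf{g} &\sim \mathcal{N}(\mathbf{0},\ \sigma_g^2\,\Sigma_g), \\
\beta_k &= \mu + f_k, & \mathbf{f} &\sim \mathcal{N}(\mathbf{0},\ \sigma_f^2\,\Sigma_f), \\
\mu &\sim \mathcal{N}(\bar{\rho}_R,\ \tau^2).
\end{align}
```

Under this notation, the choice of Σg and Σf is what distinguishes models with independent effects from models with phylogenetically structured effects.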

The three models that we tested differed with respect to the specification of gj and fk. The simplest model (MI) assumed independence at the genus and family levels, in which case Σg = Σf = I. The second model (MΦF) assumed covariation owing to phylogenetic relatedness at the family level, and independence among genera within families, in which case Σf = Φf and Σg = I, where Φf measures phylogenetic distance across families (see next section). The third model (MΦG) assumed covariation because of phylogenetic relatedness at the genus level. In this case, Σg = Φg, and Φg measures phylogenetic distance across genera within and across families. The family level is absent in this third model because phylogenetic relatedness was accounted for in the covariance calculated at the genus level. We expected MΦG to yield better estimates of wood density than MΦF, and MΦF better estimates than MI.

The set of parameters for the models, θ, thus comprised the grand mean and the variance parameters. The posterior distribution of parameters, p(θ | ρ, X), where X represents explanatory variables (here, taxonomic levels), was estimated using Markov chain Monte Carlo (MCMC) methods in JAGS (Plummer 2003). Missing data were automatically estimated from the posterior predictive distribution p(ρmis | θ, ρobs). Standard deviation parameters were assigned uninformative priors (see Data S2, Supporting information). We first ran a burn-in phase (30 000 iterations), checked that convergence was reached and then obtained a sample of estimates for missing wood densities (20 000 iterations). For each species, the posterior mean served as our estimate of the unknown wood density.

Oceania: a hierarchical modelling case study

We selected the Oceania region as a case study for hierarchical modelling of wood density. In this analysis, the reference data set R was a subset of GWD containing the 35 families represented in Oceania. We first built a supertree for these families using the trees of Chaw et al. (2000) for gymnosperm families and Davies et al. (2004) for angiosperm families, ensuring that family names matched those in the Angiosperm Phylogeny Group classification (Angiosperm Phylogeny Group III 2009). This first supertree served to estimate the covariance among families in the model MΦF. Secondly, the relationships between genera within families were resolved for 14 families using published phylogenies (see Data S1, Supporting information). In the remaining families, genera were all branched on the family-level node. Overall, the second supertree had 534 genera as tips and 242 internal nodes, among which 70% were bifurcations (nodes with two descending clades). Of the 61 genera contained in the regional data set for Oceania, 52 (92 species) could be placed on this second supertree and were thus retained for the case study. Thirdly, we calculated covariance matrices of size corresponding to the number of tips in each of the two supertrees, and used them in the hierarchical models MΦF and MΦG described in the previous section. The covariance between two genera i and j (Cij) was calculated as the inverse of the phylogenetic distance between them (dij): Cij ∝ 1/dij. Phylogenetic distances were calculated from branch lengths estimated following Grafen (1989). This approach conforms to a Brownian model of trait evolution (Hansen & Martins 1996). Other evolutionary models could be tested, but this is beyond the scope of our study.
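The inverse-distance rule Cij ∝ 1/dij can be sketched as below. The text does not say how the diagonal (where dij = 0) or the proportionality constant were handled, so the `diag` and `scale` arguments are assumptions introduced for illustration, and the function name is hypothetical. A matrix built this way is not guaranteed to be positive definite and should be checked before being used as a covariance.

```python
def inverse_distance_covariance(dist, scale=1.0, diag=1.0):
    """Covariance matrix with off-diagonal entries scale / d_ij.

    dist: symmetric matrix (list of lists) of phylogenetic distances
    scale: proportionality constant (assumption)
    diag: value placed on the diagonal, where d_ii = 0 (assumption)
    """
    n = len(dist)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            cov[i][j] = diag if i == j else scale / dist[i][j]
    return cov

# Three hypothetical tips: the first two are sister taxa.
d = [[0.0, 2.0, 4.0],
     [2.0, 0.0, 4.0],
     [4.0, 4.0, 0.0]]
C = inverse_distance_covariance(d)
```

In this toy example the sister taxa receive a covariance of 0·5 and the more distant pairs 0·25, so closely related tips are pulled towards each other more strongly.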

As with AM, we pretended that the wood densities of the 92 retained species were unknown, and used the HBMs to estimate these values. We calibrated the three hierarchical models presented above and compared the hierarchical and averaging approaches using Pearson's correlation coefficient and the mean relative error.

Results


Regional vs. global reference using AM

We found that the correlation between observed and estimated wood densities was higher in 687 of 800 tested cases (i.e. 16 regions × 50 random subsamples) when we used the complete GWD as our reference data set instead of the regional data set. In four regions with high species richness, regional references gave predictions of similar accuracy to those of the complete global reference (tropical Africa, Australia/PNG, tropical South-East Asia and South America; Table 1). Relative errors were consistent with correlations, favouring the global data set in most cases (Table 1). Overall, the relative error on estimates was 17% with the complete global reference compared with 20% with the regional reference data set (Table 2). As expected, mean relative errors were lower when missing values were assigned averages within genera than within families, and within families than within the overall reference (Table 2): the relative error was 16% when missing wood densities were estimated by averages within genera, 24% within families and 30% when they were estimated as the average of the complete GWD (Table 2).

Table 2.   Mean relative error (ε, in %) in wood density estimates produced with the averaging method (n = 800 simulations)

Level of averaging   Regional   GWD   Sampled GWD
All woody plants     28         30    28

Wood densities were estimated using averages of species within the same genus, or same family when possible, or ‘All woody plants’ when missing values were assigned the average trait value in the complete reference data set (R). Three different types of reference data sets were used: the regional subset, the complete Global Wood Density database (GWD, Zanne et al. 2009) or random subsets of GWD of size similar to the regional reference.

Using the GWD was advantageous simply because it provided a large data set for the AM. Indeed, when the GWD was sampled to obtain a reference data set that was identical in size to the regional reference, the regional data gave better correlation in 734 of 800 cases (Table 1; Fig. 1). Correlations strongly decreased when a subset was used in place of the complete GWD, especially in regions with low species richness (Europe, Madagascar; Table 1). Only in Australia, South-East Asia and tropical South-East Asia were the correlations between observations and estimates similar with the regional and the sampled global reference (Table 1).

Figure 1. Pearson's correlation coefficients between estimated and observed wood density using the averaging method with either the entire GWD (Zanne et al. 2009) as the reference, rP (GWD), a regional data set, rP (region), or the GWD sampled to the regional data set size, rP (sampled GWD). The panels show the results of the estimation of wood density for 50 random subsets of each of the 16 regional floras (listed in Table 1): scatterplots showing pairwise comparisons (left column) and boxes showing horizontal and vertical interquartile ranges within regions (right column); solid lines extend to extreme values in each region.

Oceania: a hierarchical modelling case study

Posterior estimates of model parameters for wood densities in the Oceania region are given in Table S1 (Supporting information). We expected lower estimated variability at the species level in MΦG compared with MΦF and in MΦF compared with MI. Surprisingly, estimates were similarly scattered at the species level in all three models (σs; Table S1) even though phylogenetic information was used to estimate mean effects at the family or the genus level.

Estimates produced by the three HBM were significantly correlated with observed wood densities (Table 3 and Fig. 2), but the correlations were very similar to those obtained with the AM approach (robs−MI = 0·52, robs−MΦF = 0·53, robs−MΦG = 0·50). Relative errors between observations and estimates were also similar (Table 3). Phylogenetic models (MΦF, MΦG) produced better estimates than AM and MI for wood densities that were estimated by the average of the entire reference data set (Table 3), but this concerned very few observations (n = 3; Fig. 2c).

Table 3.   Pearson correlation coefficients (r) and mean relative errors (ε) in wood density estimates for 92 species from the Oceania region

Level              n
All woody plants   3

Four methods are compared: the averaging method (AM), and hierarchical Bayesian models assuming independence within taxonomic levels (MI), phylogenetic covariance among families (MΦF) and phylogenetic covariance among genera (MΦG). Level refers to the taxonomic level at which wood densities were averaged to produce estimates using AM and n refers to the number of estimated values at the corresponding level.

Figure 2. Relationships between observed and estimated wood densities of species from Oceania. Results from three estimating methods are presented: (a) averaging method, (b) hierarchical Bayesian model based on independence between families (MI) and (c) hierarchical Bayesian model accounting for phylogenetic relationships among families (MΦF). Symbols and colours indicate the averaging level with respect to the reference data set: genus (○), family (□) or whole sample level. Dotted lines represent the 1:1 line. Horizontal lines show the mean wood density in the GWD data set. Ticks along axes represent the distribution of the corresponding variable.

Discussion


The GWD database is a powerful resource for estimating unknown wood densities. The database improves the accuracy of regional wood density estimates because, by virtue of its size, it contains information on genera and families that may be poorly represented at the regional scale. For example, the global reference produced much better estimates for South-East Asia than the regional reference, because the regional data set had only one estimate of wood density for 128 of the 162 genera present, so there were clear advantages to including genus-level data from other regions. As the size of the regional data set increases, the advantages of using the GWD diminish, and the regional data set can even provide better estimates. For example, wood densities of closely related species are often lower in tropical than temperate regions (e.g. Coomes & Bellingham 2010), so using data from temperate regions to estimate tropical wood densities will introduce inaccuracies. Indeed, we found that regional databases of wood densities outperformed global databases of comparable size, indicating that the primary benefit of using the GWD is that it represents a wide range of species within genera.

The three HBM performed similarly to the simpler AM. When independence was assumed across clades, the model (MI) effectively reproduced the averaging approach within a nested model, although it allowed for a different weighting scheme depending on the magnitude of the variance components. Posterior estimates indicated different levels of variability at the different taxonomic levels, but the model did not perform better than AM. We had anticipated that the phylogenetic models (MΦF and MΦG) would produce better estimates than the AM and MI. Indeed, phylogenetic conservatism implies that the mean trait values of closely related clades are closer than those of distant clades. In previous work based on large data sets, wood density exhibited a strong taxonomic signal, and a large part of the total variability in wood density occurred at the genus level (Baker et al. 2004a; Chave et al. 2006) or above: for instance, wood density varies widely in the large Fabaceae family but shows conservatism within subclades (Swenson & Enquist 2007). Thus, information should be gained by making wood density covary among clades according to phylogenetic relatedness.
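The weighting that a nested model with independent effects applies can be illustrated with the textbook normal-normal result: when the variances are treated as known, the posterior mean of a genus-level mean is a precision-weighted average of the genus sample mean and the higher-level mean. The sketch below shows that standard result only; it is not the fitted model, and the function name, arguments and numerical values are hypothetical.

```python
def shrunken_genus_mean(obs, higher_mean, var_species, var_genus):
    """Precision-weighted estimate of a genus mean (variances assumed known).

    obs: observed wood densities of species in the genus
    higher_mean: mean at the level above (family or overall)
    var_species: variance among species within a genus
    var_genus: variance among genus means around higher_mean
    """
    n = len(obs)
    if n == 0:
        # No data in the genus: fall back on the higher-level mean.
        return higher_mean
    data_precision = n / var_species
    prior_precision = 1.0 / var_genus
    sample_mean = sum(obs) / n
    return ((data_precision * sample_mean + prior_precision * higher_mean)
            / (data_precision + prior_precision))

# One observation is pulled halfway towards the higher-level mean here;
# three observations are pulled only a quarter of the way.
one = shrunken_genus_mean([0.8], 0.6, 0.01, 0.01)
three = shrunken_genus_mean([0.8, 0.8, 0.8], 0.6, 0.01, 0.01)
```

By contrast, the plain AM uses the genus sample mean unchanged whenever at least one congeneric species is available, which is the limiting case of a very large among-genus variance.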

One reason why HBMs did not outperform AM is that, despite strong phylogenetic conservatism, there is still considerable variability within families and within genera, in addition to small differences in averages and overlap across clades. These factors make it difficult for phylogenetic models to predict wood density better than the simpler AM. Phylogenetic models performed better when missing data concerned species in families for which no data were available, which is likely for rare families. In such cases, the phylogenetic approach helps estimate wood density based on observations in closely related families, while accounting for the degree of relatedness. However, the number of observations of this type was low in our case study, so further work is needed to clarify this issue. Overall, the similar performance of the three models suggests that the accuracy of wood density estimation presently relies more on the quality and extent of the reference data set than on the estimation method. We recommend using simple AM for estimating wood densities, but suggest that more complex approaches may come into their own in the near future, as more detailed phylogenies become available and the computational costs of Bayesian statistics are reduced.

We believe the proposed methodology using HBM paves the way for other standardized approaches for estimating forest biomass and carbon stocks. Numerous global data sets are becoming available to ecologists and phylogenetic information is increasingly available at various resolutions and taxonomic levels. Meanwhile, Bayesian statistics and computing tools are well developed, such that hierarchical models can be easily specified and calibrated for large quantities of information (Ronquist 2004). Our approach is general and can be applied to estimate other conserved traits. In the case of wood density estimation, our results show that it is often better to estimate missing values using the complete global database as a reference, although regional data sets of sufficient sample size can outperform the GWD. In addition, we showed that the AM performs as well as HBM, but that real improvement could come from species-level phylogenies. We believe that more effort should be put into obtaining supertrees resolved to the species level, as initiated by Sanderson et al. (2008) for instance. We also encourage readers to add data to the global wood density data set (Zanne et al. 2009).


We wish to thank Andrew J. Tanenzap for useful comments on the article, and anonymous referees who helped improve the first version of the article.