Using fractal self‐similarity to increase precision of shrub biomass estimates

Abstract We show that aerial tips are self‐similar fractals of whole shrubs and present a field method that applies this fact to improves accuracy and precision of biomass estimates of tall‐shrubs, defined here as those with diameter at root collar (DRC) ≥ 2.5 cm. Power function allometry of biomass to stem diameter generates a disproportionate prediction error that increases rapidly with diameter. Thus, biomass should be modeled as a single measure of stem diameter only if stem diameter is less than a threshold Dmax. When stem diameter exceeds Dmax, then the stem internode should be treated as a conic frustrum requiring two additional measures: a second, node‐adjacent diameter and a length. If the second diameter is less than Dmax, then the power function allometry can be applied to the aerial tip; otherwise an additional internode is measured. This “two‐component” allometry—internodes as frustra and aerial tips as shrubs—can reduce estimated biomass error propagated to the plot‐level by as much as 50% or more where very large shrubs are present Dmax is any diameter such that the ratio of single‐component to two‐component uncertainty exceeds the ratio of two‐component to single‐component measurement time. Guidelines for estimating Dmax based on pilot field data are provided. Tall shrubs are increasing in abundance and distribution across Arctic, alpine, boreal, and dryland ecosystems. Estimating their biomass is important for both ecological studies and carbon accounting. Reducing field‐sample prediction error increases precision in multi‐stage modeling because additional measures efficiently improve plot‐level biomass precision, reducing uncertainty for shrub biomass estimates.

While not ideal, we'll use a subset of the data used to construct the model above to calculate RMSE and 95%PI of wet-mass. Later, we'll do the same for the two-component model and compare them for accuracy and precision, recognizing that these are data from the training dataset.

Two-component Shrub Biomass Allometry
This is the new, two component approach with one allometric equation for aerial tips based on diameter and a second for internodes based on their volume as approximated by a conic frustrum.

Internode allometry
If each internode is considered as a conic frustrum with volume calculated using the length of the internode (L) and its two end diameters (D i ,) then volume is Previous analysis shows that field mass can be regressed on volume in a linear fashion with a slope very close to the measured density of species-specific alder wood; however, the residuals around the best-fit line are unequal across field-masses. The log-log regression is homoscedastic in residuals and so is preferred. The log-log regression also has the nice feature that its regression coefficient, i.e., the slope p equivalent to the power on volume, is essentially one (see below), indicating that it is a linear relationship between an internode's mass and volume as frustrum; the intercept of the log-log regression a as a power of e is the wood's wet density: density = e a . So the log-log regression of field-mass (f rust) on V is essentially where e a = e −0.09060 = 0.91 kg L −1 is an estimate of the density of the wet alder wood and p = 1.00071 ≈ 1, so that This linear allometric relation has the nice feature that it passes naturally through the origin and offers a high R 2 = 0.97 without forcing through the origin. The residuals of the log-log regression are normally distributed and so lognormal on the arithmetic scale. where tip is field-mass, D diameter, with a = −2.6504 and p = 2.4149 the intercept and regression slope, the latter an allometric scaling constant within 0.8 standard error units of 2.5, a multiple of 0.25 considered a universal scaling multiple (West 2017). Again, the residuals of the log-log regression are normally distributed and so lognormal on the arithmetic scale.

Ben Bolker's Ecological Models and Data in R (page 137) and Wikipedia's Lognormal entry
and mean(y) =ŷ = e µ+(σ 2 )/2 = e µ e σ 2 /2 or ln(ŷ) = µ + σ 2 /2 rearranged as So in terms of a regression involving lognormal residuals, the expected value of field-mass is E[Y ] =ŷ with ln(E[Y ]) = ln(ŷ) meaning the distribution of y in R can be represented as the predicted fit value minus one half the standard error of the residuals.
We know from basic statistical theory (Mendenhall, Sheaffer, and Wackerly 1981) that given any sum of random variables W = m i X i , the mean of the sum is the sum of the means, and the variance of the sum is the sum of the variances Thus, given s = T + F pieces of shrubs with T tips and F internodes in a plot, where each tip has expected mass given by the tip allometry (mass =t ip) and internodes have mass as density times the frustral volume (mass =f rust), then we sum the field-masses to get the total mass in a plot and sum the variances to get the variance. Apparently, the sum of lognormal distributions have no closed-formed distibution (Asmussen and Rojas-Nandayapa, 2009), so we simulate the distribution with Monte Carlo sampling.
The mean is straightforward; The variance is messier than the mean, but conceptually simple. Given one field measurement used to predict field-mass, the uncertainty in that predicted mass arises from both regression uncertainty in the parameters and from the uncertainty around the regression model (residuals). That is, the variance of sum of the masses depends on each model prediction, that in turn equals the sum of the regression variance (se 2 reg , dependent on the tip diameter or internode volume distance from their mean value used to construct the regression) and the residual variance (σ 2 ).
For an individual internode, the prediction variance depends on the volume V and the regression statistics (residuals variance = σ 2 , sample size = n, mean volume =V , variance of volumes = s 2 V ) with var(w pred ) =σ 2 + se 2 reg and se 2 reg =σ

Approach using R
To combine the two terms in the variance estimate for any given sampling situation, the approach here uses the base R function predict.lm() with argument se=TRUE. This provides both σ as residual.scale and se reg as se.fit in the output of the predict function. Thus the variance in the prediction estimate for any single predicted field-mass is the sum of the squared value of residual.scale and the squared value of se.fit. That is, var(w pred ) =σ 2 + se 2 reg = residual.scale 2 + se.fit 2 A 95% confidence interval for log-log regresions such as for tips would be: 95% log (CI) = a + p log(D) ± t n−2 (0.975) residual.scale 2 + se.fit 2 but in arithmetic space the lower limit is from which it's clear how error increases as a power function of diameter D with p ≥ 1. However, rather than use these formulas, which work well for single-component allometry and are conveniently calculated by predict.lm(), we'll use a Monte Carlo approach because we need to sum lognormally distributed random variables, and then find the distributions of these sums (which have no closed-form solution). We need this both for individual shrubs and sample-plots.
The recipes above give individual shrub uncertainities and their sums as plot-level uncertainties. But these first moments still beg the question of what distribution is the sum of all these lognormal distributions as the total weight W ? Even with the mean and the variance in hand, we still are not clear what distribution to sample from when we pass on the plot-level uncertainty to the lidar-assisted estimate of, say, landscape estimates of shrub biomass.

Approximating Individual Shrub Biomass Distributions
Again we'll use the eight alder dataset, but this time get each piece's estimate as expected log(we.mass), sd(residuals), and se(fit), then use a Monte Carlo method to estimate the distribution of each shrub as sum of lognormal distributions of tip and internode pieces.
1. Find the expected mean and se of each shrub piece's biomass recognizing that the field-mass of tip and internode estimates are lognormally distributed. 2. For tips and internodes, sample from lognormal distributions 10,000 times with mean given by the expected log of field-mass and prediction uncertainty given by residual.scale 2 + se.fit 2 . 3. For each of the 10,000 simulations, sum each individual shrub's pieces, find the mean, and the 0.025 and 0.975 quantile to give the point estimate and an estimate of propagated error.

One-component allometry estimates for whole shrubs using Monte Carlo simulation
For fair comparison we'll apply the Monte Carlo method to the single-compoinent allometry as well. First apply the single-component allometry.

Two-component allometry for whole shrubs with propagation of uncertainty in tips and internodes
First we prepare the dat8 file for tips and node allometry.

Simulating Plot-level Shrub Biomass Distributions with Monte Carlo Sampling
Next we'll use a dataset of shrubs measured in southcentral Alaska in 2019.

Error Propagation with Single-component Allometry at Sample-plot Scale
First we'll use the single-measure allometry model to estimate biomass by plot, propagating the errors of each shrub through to the sum. The algorithm uses Monte Carlo sampling of lognormal distributions for each individual shrub, summing these, and then finding the middle 95%CI of biomass estimates.
1. Find the expected mean and se of each shrub's biomass recognizing that the field-mass estimate is lognormally distributed. 2. For each shrub, sample from its lognormally distributed mass estimate 10,000 times with mean given by the expected log of field-mass and prediction uncertainty given by residual.scale 2 + se.fit 2 . 3. For each sample-plot of 10,000 simulations, sum the shrub masses, find the mean, and the 0.025 and 0.975 quantile to give the point estimate and an estimate of propagated error.

Error Propagation with Two-component Allometry at Sample-plot Scale
Here we simulate the lognormal distribution of each shrub piece using the two-component allometry of frustra and tips following these steps: 1. Find the expected mean and se of each shrub piece's biomass recognizing that the field-mass of tip and internode estimates are lognormally distributed. 2. For tips and internodes sample from lognormal distributions 10,000 times with mean given by the expected log of field-mass and prediction uncertainty given by residual.scale 2 + se.fit 2 . In addition, the two-component method is more precise, since the difference in the log of 95%CI range (i.e., uncertainty) is significant.