### Abstract

The Cavalieri method is an unbiased estimator of the total volume of a body from its transectional areas on systematic sections. The coefficient of error (*CE*) of the Cavalieri estimator was predicted by a computer-intensive method. The method is based on polynomial regression of area values on section number and simulation of systematic sectioning. The measurement function is modelled as a quadratic polynomial, with an error term superimposed. The relative influence of the trend and the error component is estimated by techniques of analysis of variance. This predictor was compared with two established short-cut estimators of the *CE* based on transitive theory. First, all predictors were applied to data sets from six deterministic models with analytically known *CE*. For these models, the *CE* was best predicted by the older short-cut estimator and by the computer-intensive approach when the measurement function had finite jumps. The best prediction was provided by the newer short-cut estimator when the measurement function was continuous. The predictors were also applied to published empirical data sets. The first data set consisted of 10 series of areas of systematically sectioned rat hearts with 10–13 items; the second consisted of 13 series of systematically sampled transectional areas of various biological structures with 38–90 items. On the whole, similar mean values for the predicted *CE* were obtained with the older short-cut estimator and the computer-intensive method. These were of the same order of magnitude as resampling estimates of the *CE* from the empirical data sets, which were used as a cross-check. The mean values according to the newer short-cut *CE* estimator were distinctly lower than the resampling estimates. For individual data sets, however, the prediction closest to the cross-check value could come from any of the three methods. This finding is discussed in terms of the statistical variability of the resampling estimate itself.

### (1) Introduction

Systematic sampling designs are of major importance in stereology. Equidistant windows of observation are placed into an object, whereby the location of one window determines the location of all other windows (Cochran, 1977). The classical example of one-dimensional systematic sampling is volume estimation by Cavalieri sampling, where the area contents *A*_{i} of *n* sectional planes through a body *K*, a distance *d* apart, are estimated or measured along an arbitrarily orientated spatial axis (the *x*-axis) orthogonal to the planes. Let us denote the projected height of the body onto the *x*-axis as *h* and its volume as *V*. Provided that the location of one of the planes is selected uniformly at random in the interval [0, *d*), the Cavalieri estimator reads:

$$\hat{V} = d \sum_{i=1}^{n} A_i \qquad (1)$$

For the sake of illustration, the origin of the *x*-axis may be identified, e.g., with the lower tangent point of the orthogonal planes to *K*. The Cavalieri estimator is unbiased for the volume estimation of bodies of arbitrary shape and anisotropy, and it is also highly efficient. Systematic sections usually lead to volume estimates with much less variance than a hypothetical volume estimate obtained, e.g., as the product of *h* (supposed to be known) and the mean value of the *A*_{i} taken from parallel sections through *K* at independent random positions along the *x*-axis. The squared coefficient of error (*CE*) in the latter case, which corresponds to a simple random sample, declines in the order of 1/*n*, whereas under systematic sampling it usually declines in the order of 1/*n*^{2} to 1/*n*^{4}, depending on the geometry of the object. Moreover, systematic sampling is the natural approach and is simpler to realize in applications than simple random sampling. In biomedical imaging, for example, many devices record observations by default at registered equidistant planes perpendicular to a vertical axis (scans from computed axial tomography, nuclear magnetic resonance tomography, confocal laser scanning microscopy, etc.).
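The unbiasedness of the Cavalieri estimator can be illustrated numerically. The following is a minimal sketch, not part of the original study: it assumes a unit sphere as the body, and all function names are hypothetical. It averages the estimate (section spacing times the sum of the section areas) over equidistant start points in [0, *d*) and compares the mean with the true volume.

```python
import math

def cavalieri_volume(areas, d):
    """Cavalieri estimate: section spacing d times the sum of the section areas."""
    return d * sum(areas)

def sphere_area(x, r=1.0):
    """Cross-sectional area of a sphere of radius r at height x (zero outside [0, 2r])."""
    if x < 0.0 or x > 2.0 * r:
        return 0.0
    return math.pi * (r * r - (x - r) ** 2)

def section_areas(start, d, h, area_fn):
    """Areas on the systematic planes start, start + d, ... that lie in [0, h)."""
    areas = []
    k = 0
    while start + k * d < h:
        areas.append(area_fn(start + k * d))
        k += 1
    return areas

d, h = 0.5, 2.0          # section spacing and projected height (unit sphere)
n_starts = 1000          # equidistant start points covering [0, d)
estimates = [
    cavalieri_volume(section_areas((j + 0.5) * d / n_starts, d, h, sphere_area), d)
    for j in range(n_starts)
]
mean_estimate = sum(estimates) / n_starts
true_volume = 4.0 * math.pi / 3.0
print(mean_estimate, true_volume)
```

Individual estimates differ from the true volume, but their mean over the start points agrees with it up to the discretization of the start positions.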

Although the Cavalieri estimator of the absolute volume of single objects is unbiased, the problem of predicting its *CE* from a given data set of areas alone is not solved in general. Various estimators (predictors) have been suggested for this purpose (e.g. Gundersen & Jensen, 1987; Mattfeldt, 1987, 1989; Cruz-Orive, 1997, 1999; Gundersen *et al*., 1999; García-Fiñana & Cruz-Orive, 2000a, 2000b, 2004; García-Fiñana *et al*., 2003). In the present paper, a computer-intensive method for *CE* prediction of the Cavalieri estimator from a set of empirical data is presented. Polynomial regression methods are used to decompose the area series into a deterministic component and an error component. Systematic sectioning is simulated within the computer in a deterministic manner after fitting cubic splines to the data points. Combining the results of both steps yields an estimate of the total *CE* of the Cavalieri estimator that accounts for both the deterministic and the error component. The *CE* estimation by the computer-intensive method was compared with two short-cut estimators, which are based on a quadratic approximation to the estimated covariogram of the data (*CE*_{sh1}: Gundersen & Jensen, 1987; *CE*_{sh2}: Gundersen *et al*., 1999). Both estimators are based on transitive theory (Matheron, 1965, 1971). For the purpose of the comparisons, we use six analytical models of *f*(*x*) as well as previously published data sets of empirical areas of transected planes resulting from one-dimensional systematic sectioning designs (Mattfeldt, 1987: Table 1; Gundersen *et al*., 1999: data sets V1–V13).
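The regression step, which splits an area series into a trend and an error component, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' program: it uses ordinary least squares for a quadratic polynomial in the section number and reports *R*^{2} as the share of variance attributed to the trend. The function names and the area values are hypothetical, and the analysis-of-variance step of the paper may differ in detail.

```python
def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (m[r][3] - s) / m[r][r]
    return x

def quadratic_fit(areas):
    """Least-squares fit A_i ~ b0 + b1*i + b2*i**2 on section numbers 1..n.

    Returns the coefficients [b0, b1, b2] and R^2.
    """
    xs = list(range(1, len(areas) + 1))
    # Normal equations: power sums of x and sums of x**k * A.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum((x ** k) * a for x, a in zip(xs, areas)) for k in range(3)]
    coeffs = solve3(
        [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]], t
    )
    fitted = [coeffs[0] + coeffs[1] * x + coeffs[2] * x * x for x in xs]
    mean = sum(areas) / len(areas)
    ss_res = sum((a - f) ** 2 for a, f in zip(areas, fitted))
    ss_tot = sum((a - mean) ** 2 for a in areas)
    return coeffs, 1.0 - ss_res / ss_tot

# Hypothetical area series (arbitrary units), not data from the paper.
areas = [2.1, 5.3, 7.8, 9.2, 9.9, 9.0, 7.5, 5.1, 2.4]
coeffs, r_squared = quadratic_fit(areas)
print(coeffs, r_squared)
```

An *R*^{2} close to 1 indicates that the quadratic trend dominates the series; 1 − *R*^{2} is the share left to the error component.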

Table 1. Heidelberg rat heart data. Resampling: 1 sample 2 overlapping subsamples.

| No. | *CE*_{sys} | *R*^{2} | *CE*_{ran} | *CE*_{res} | *CE*_{tot} | *CE*_{sh1} | *CE*_{sh2} |
|---|---|---|---|---|---|---|---|
| 1 | 0.05944 | 0.98405 | 0.12110 | 0.05173 | 0.06086 | \*0.04448 | 0.00994 |
| 2 | 0.04509 | 0.97055 | 0.12032 | 0.00663 | 0.04509 | 0.03780 | \*0.00845 |
| 3 | 0.06391 | 0.98010 | 0.13307 | 0.06669 | \*0.06579 | 0.04580 | 0.01024 |
| 4 | 0.06423 | 0.99385 | 0.12515 | 0.05654 | 0.06423 | \*0.04909 | 0.01097 |
| 5 | 0.05294 | 0.99340 | 0.13756 | 0.02736 | 0.05294 | \*0.04402 | 0.00984 |
| 6 | 0.05751 | 0.99520 | 0.12214 | 0.04890 | 0.05801 | \*0.04423 | 0.00989 |
| 7 | 0.04730 | 0.98435 | 0.14278 | 0.03211 | 0.05043 | \*0.03649 | 0.00816 |
| 8 | 0.07298 | 0.99730 | 0.12472 | 0.03653 | 0.07308 | \*0.05498 | 0.01229 |
| 9 | 0.05451 | 0.97865 | 0.13857 | 0.05570 | \*0.05756 | 0.04774 | 0.01067 |
| 10 | 0.04627 | 0.98170 | 0.12146 | 0.00756 | 0.04870 | 0.03639 | \*0.00813 |
| Mean | 0.05640 | 0.98592 | 0.12868 | 0.03897 | 0.05766 | 0.04410 | 0.00985 |

### (4) Discussion

The results may be summarized as follows. The *CE* of the Cavalieri estimator was predicted by a computer-intensive method based on a polynomial regression procedure and simulation of systematic sectioning, and by two established short-cut estimators based on transitive theory (*CE*_{sh1}, *CE*_{sh2}). When applied to synthetic data sets from deterministic models with analytically known *CE*s, *CE*_{sh1} and *CE*_{tot} gave very similar predictions for all models, whereas *CE*_{sh2} was much lower. For models *M*_{1} and *M*_{2}, the estimators *CE*_{sys} and *CE*_{sh1} gave better predictions than *CE*_{sh2} (Fig. 2a,b). For models *M*_{3}, *M*_{4} and *M*_{6}, the estimator *CE*_{sh2} provided the best prediction, whereas *CE*_{sys} and *CE*_{sh1} were too high (Fig. 2c,d,f). The better prediction of *CE*(*m*) by *CE*_{sh1} for models *M*_{1} and *M*_{2} is fully concordant with theoretical expectation, as these models are examples of 0-objects, for which *CE*_{sh1} is most suitable. The better prediction by *CE*_{sh2} for models *M*_{3}, *M*_{4} and *M*_{6} is plausible because these models are 1-objects, for which the predictor *CE*_{sh2} is superior. In the case of *M*_{5}, *CE*_{sh1} was too high and *CE*_{sh2} was too low (Fig. 2e). This result arises because *M*_{5} is a counterexample to classical transitive theory: it is covered only by the new fractional approach to variance estimation, as *m* is not an integer in this case (García-Fiñana & Cruz-Orive, 2000a, 2000b, 2004; García-Fiñana *et al*., 2003; Cruz-Orive & Geiser, 2004). On the whole, introducing the computer-intensive method did not clearly improve on the predictive accuracy of the short-cut estimators. One might expect that this estimator, owing to its assumption-free nature, would always lie closest to the analytical *CE*(*m*), thus resembling *CE*_{sh1} for *M*_{1}–*M*_{2}, resembling *CE*_{sh2} for *M*_{3}, *M*_{4} and *M*_{6}, and outperforming both for *M*_{5}. This was evidently not the case: it always behaved very similarly to *CE*_{sh1}.

The same estimators were then applied to empirical data sets. Here, reduced data sets were generated from the primary samples with increased sampling distance. On the whole, similar estimates were obtained for *CE*_{sh1} and *CE*_{tot} in both data sets, whereas the results by *CE*_{sh2} were distinctly lower. For both data sets, mean *CE*_{sh1} or *CE*_{tot} lay nearest to the resampling estimates, whereas mean *CE*_{sh2} appeared too low. Nevertheless, in some individual area series of both data sets, *CE*_{sh2} happened to provide the best prediction. This was true for the series with the lowest values of *CE*_{res} in both data sets. In this context one should note that the resampling estimate, although it serves as a reference value, is itself an estimator of the *CE* derived from empirical data. The estimates of *CE*_{res} may vary strongly, because they, too, may fall into the hills and valleys of the Zitterbewegung. Hence it is very difficult to obtain a robust prediction of the *CE* for an individual real case. As a short-cut estimator for our empirical data sets, one would recommend *CE*_{sh1} over *CE*_{sh2}. The latter produced the best predictions for some of the deterministic models and also for isolated cases in the empirical data sets, but its mean value was too low. The computer-intensive estimator *CE*_{tot} also gave a higher estimate of the *CE* than *CE*_{sh2} for the empirical data sets. On the other hand, for the empirical data, too, using *CE*_{tot} instead of *CE*_{sh1} did not further increase the accuracy of prediction. The estimator *CE*_{tot} seems rather robust to noisy data, as its computation takes the noise component into account by definition. Comparing *CE*_{sh1} and *CE*_{tot}, one must admit that the latter involves more computational workload. However, once the data set has been entered in electronic form, the procedure is fully automatic, so the human workload is no different. Moreover, computer-intensive procedures do not necessarily imply long execution times: estimating a Cavalieri *CE* on a PC takes only a few seconds for a single data set consisting of 10–20 area values. In addition, because it simply simulates the systematic sectioning process, the computer-intensive approach does not involve advanced mathematics such as transitive theory. All steps of its computation are easy to grasp. Ultimately, it is planned to make the programs described in this paper easily accessible to the PC user, who has only to prepare an ordered list of the cross-sectional areas as an ASCII file and obtains *R*^{2} and a prediction of the *CE* of the volume estimate as output.

In the present study, Cavalieri sampling was simulated in a deterministic manner, whereas in previous studies it was simulated by Monte Carlo methods, with the start point of the systematic sample placed uniformly at random in [0, 1). In the procedure used here, sectioning was simulated by distributing the start points equidistantly in [0, 1), i.e. at 0, 0.01, 0.02, …, 0.99. A uniform distribution, whether deterministic or random, is required in any case to guarantee unbiasedness of the Cavalieri estimator. With the deterministic approach, the points are distributed as homogeneously as possible. When 100 points are selected at random by Monte Carlo methods, clusters of points may by chance concentrate in some regions of [0, 1) while other regions are undersampled. The deterministic homogeneous distribution of points avoids this. Hence, in repeated estimations of the *CE* from a given data set, the deterministic approach reproducibly provides the same value, which is not the case when Monte Carlo methods are used.
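The difference between the two ways of placing start points can be sketched in a few lines. This is an illustration only: the measurement function is a hypothetical parabola rather than a spline fitted to data, the start point is expressed in units of the section spacing, and all names are assumptions of the sketch.

```python
import math
import random

def area(x):
    """Hypothetical measurement function: a parabola on [0, 1], zero elsewhere."""
    return x * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0

def cavalieri(start, n):
    """Cavalieri estimate from n planes with spacing 1/n; start lies in [0, 1)."""
    d = 1.0 / n
    return d * sum(area((start + k) * d) for k in range(n))

def ce(starts, n):
    """Coefficient of error of the estimates over the given start points."""
    est = [cavalieri(u, n) for u in starts]
    mean = sum(est) / len(est)
    var = sum((e - mean) ** 2 for e in est) / len(est)
    return math.sqrt(var) / mean

n = 5                                              # sections per sample
deterministic = [j / 100 for j in range(100)]      # 0, 0.01, ..., 0.99
ce_det_1 = ce(deterministic, n)
ce_det_2 = ce(deterministic, n)                    # same grid, same value

rng_a, rng_b = random.Random(1), random.Random(2)  # two Monte Carlo runs
ce_mc_1 = ce([rng_a.random() for _ in range(100)], n)
ce_mc_2 = ce([rng_b.random() for _ in range(100)], n)
print(ce_det_1, ce_det_2, ce_mc_1, ce_mc_2)
```

The two deterministic runs coincide exactly, whereas the two Monte Carlo runs differ because each draws its own 100 start points.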

Selection of an explicit model for regression of the *A*_{i} on *x*_{i} remains essentially an arbitrary decision. In the present paper, quadratic polynomial regression was applied. This model is guided by experience with the structure of human organs and cells; it should be realistic for a broad class of biological structures. The present approach is conceptually different from estimators of the *CE* in Cavalieri sampling based on transitive theory; nevertheless, there is a fundamental parallel. In both approaches a quadratic approximation is chosen, albeit at a different level. In transitive theory, the approximation is applied to the covariogram of the data, whereas in this paper it is applied to the data series itself. Recently, the application of these two basic approaches (covariogram vs. measurement function) has been the subject of lively discussion (see Glaser, 2005; and reply: Cruz-Orive & García-Fiñana, 2005). It follows from eq. (7) that there is a close correspondence between the measurement function and the covariogram. According to common sense, both approaches appear plausible. At present, the data are not sufficient to decide whether one of the two alternatives is superior to the other. The approaches have seldom been compared. In general, the short-cut estimators based on transitive theory have been applied more often to empirical data, whereas prediction of the *CE* directly from the measurement function has rarely been used. The computational workload of the latter approach may play a role. In the present study, the older estimator based on transitive methods for 0-objects, unmodified for error/nugget variance, and the computer-intensive prediction of the *CE* from the measurement function data yielded generally consonant results. A similar finding was reported earlier using Monte Carlo simulation of systematic sampling as compared to *CE*_{sh1} (Mattfeldt, 1989).

In the present study, systematic sampling was studied in synthetic deterministic data sets without error, and in empirical data sets from the real world with unknown error. The missing link between these approaches would be synthetic data with known error. Hence, as an improved model for real data, simulation studies of models with a deterministic component and a superimposed random error term could be helpful. The model equations as well as the relative strength of the error component would be known for such models, whereas they are never exactly known for empirical data. For synthetic models with error components, it is difficult or impossible to obtain analytical results for the *CE*(*m*), unlike for models *M*_{1}–*M*_{6}. However, it is feasible to produce very long series of simulated *A*_{i} values (e.g. some hundreds or thousands) and to draw shorter subsamples leading to resampling estimates, just as was done with the empirical data. This would be a further challenge for the *CE* estimators, which could be studied in more depth in this setting.
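Such a simulation study could be set up along the following lines. This is a sketch under several assumptions that are not taken from the paper: a parabolic trend, Gaussian error with a known standard deviation, clipping of negative areas to zero (which slightly alters the error distribution near the ends of the series), and a resampling estimate formed from all systematic subsamples that take every `step`-th section. All names are hypothetical.

```python
import math
import random

def synthetic_areas(n, noise_sd, rng):
    """Parabolic trend over n sections plus Gaussian error of known SD.

    Negative values are clipped to zero, since areas cannot be negative.
    """
    out = []
    for i in range(n):
        x = (i + 0.5) / n
        out.append(max(0.0, x * (1.0 - x) + rng.gauss(0.0, noise_sd)))
    return out

def resampling_ce(areas, step):
    """CE over the `step` systematic subsamples that take every step-th section."""
    est = [step * sum(areas[start::step]) for start in range(step)]
    mean = sum(est) / step
    var = sum((e - mean) ** 2 for e in est) / step
    return math.sqrt(var) / mean

rng = random.Random(0)                      # fixed seed for a repeatable run
series = synthetic_areas(1000, noise_sd=0.02, rng=rng)
ce_by_step = {step: resampling_ce(series, step) for step in (50, 100, 200)}
print(ce_by_step)
```

Because both the trend and the error strength are known here, predictors of the *CE* could be compared against these resampling values for different noise levels and sampling distances.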

The present study was restricted to a comparison of a computer-intensive method with the two most popular estimators derived from classical transitive theory. As mentioned above, further estimators of the *CE* for 0-objects and 1-objects are available (Gundersen *et al*., 1999; Kiêu *et al*., 1999). Further improved estimators based on transitive theory have been presented, in which an attempt was made to overcome the dichotomy between 0-objects and 1-objects (García-Fiñana & Cruz-Orive, 2000a, 2000b, 2004; García-Fiñana *et al*., 2003; Cruz-Orive & Geiser, 2004). Essentially, the concept of a fractional trend of the variance in Cavalieri sampling has been considered, which allows a more flexible characterization of real objects. Instead of a binary classification into the classes *m* = 0 or *m* = 1, the new approach characterizes the object in terms of a continuous smoothness constant *q* ∈ [0, 1]. For *q* = *m* = 0 and *q* = *m* = 1, the well-known estimators (eq. 6a, 6b) result, with the constant factors α = 1/12 and α = 1/240, respectively. For continuous *q*-values between 0 and 1, the factor α(*q*) is a nonlinear function of *q*, which can be computed directly or read off the graphical plot of α(*q*) in García-Fiñana *et al*. (2003) and García-Fiñana & Cruz-Orive (2004). The factor α(*q*) lies between 1/12 and 1/240; for example, one finds α(*q*) ≈ 1/45 for *q* = 1/2 (García-Fiñana *et al*., 2003, 2004). Provided that data from at least 5 sections are available, *q* and hence α(*q*) can be estimated from real data sets (García-Fiñana *et al*., 2003; García-Fiñana & Cruz-Orive, 2004). Thus, fine-tuning of the *CE* predictor with adaptation to individual data sets becomes possible. These methods will hopefully further increase the accuracy of variance prediction in Cavalieri sampling. For example, Tables 1 and 2 show several cases where the resampling error lies between *CE*_{sh1} and *CE*_{sh2}. For these cases, it would be worthwhile to explore whether application of the fractional theory could provide a better prediction. This analysis, however, was beyond the scope of the present study, which focused mainly on a comparison of a computer-intensive approach with the established short-cut predictors. In future simulation studies of deterministic models with superimposed error, it is planned to include all transitive short-cut estimators based on *m*-classification as well as the new method based on *q*-estimation.
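The role of the factor α can be made concrete with a short sketch. It assumes that the short-cut predictors take the commonly cited form in which the variance is α times (3*C*_{0} − 4*C*_{1} + *C*_{2}), where *C*_{k} is the sum of products of areas *k* sections apart, and it omits any correction for measurement error (nugget). The function name and the area values are hypothetical. Under this assumption the two predictors differ only by the constant factor √(12/240) ≈ 0.224, which agrees with the ratio of the *CE*_{sh2} and *CE*_{sh1} columns in Table 1 (e.g. 0.00994/0.04448 ≈ 0.2235 for series 1).

```python
import math

def shortcut_ce(areas, alpha):
    """Short-cut CE predictor: sqrt(alpha * (3*C0 - 4*C1 + C2)) / sum(areas).

    C_k is the sum of products of areas k sections apart. No nugget
    (measurement error) correction is applied; that is an assumption here.
    """
    n = len(areas)
    c = [sum(areas[i] * areas[i + k] for i in range(n - k)) for k in range(3)]
    bracket = max(0.0, 3.0 * c[0] - 4.0 * c[1] + c[2])  # guard against rounding
    return math.sqrt(alpha * bracket) / sum(areas)

# Hypothetical area series (arbitrary units), not data from the paper.
areas = [2.0, 5.0, 8.0, 9.0, 7.0, 4.0, 1.0]
ce_sh1 = shortcut_ce(areas, 1.0 / 12.0)    # factor for m = 0
ce_sh2 = shortcut_ce(areas, 1.0 / 240.0)   # factor for m = 1
ratio = ce_sh2 / ce_sh1
print(ce_sh1, ce_sh2, ratio)
```

A fractional value such as α(*q*) ≈ 1/45 for *q* = 1/2 can be passed as `alpha` in the same way, which gives a prediction between the two classical ones.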