The Cavalieri method is an unbiased estimator of the total volume of a body from its transectional areas on systematic sections. The coefficient of error (CE) of the Cavalieri estimator was predicted by a computer-intensive method. The method is based on polynomial regression of the area values on section number and on simulation of systematic sectioning. The measurement function is modelled as a quadratic polynomial with a superimposed error term. The relative influence of the trend and the error component is estimated by techniques of analysis of variance. This predictor was compared with two established short-cut estimators of the CE based on transitive theory. First, all predictors were applied to data sets from six deterministic models with analytically known CE. For these models, the CE was best predicted by the older short-cut estimator and by the computer-intensive approach when the measurement function had finite jumps, and by the newer short-cut estimator when the measurement function was continuous. The predictors were also applied to published empirical data sets. The first data set consisted of 10 series of areas of systematically sectioned rat hearts with 10–13 items each; the second consisted of 13 series of systematically sampled transectional areas of various biological structures with 38–90 items each. On the whole, similar mean values for the predicted CE were obtained with the older short-cut estimator and the computer-intensive method. These were of the same order of magnitude as resampling estimates of the CE from the empirical data sets, which were used as a cross-check. The mean values according to the newer short-cut CE estimator were distinctly lower than the resampling estimates. For individual data sets, however, any of the three methods could provide the prediction closest to the cross-check value.
This finding is discussed in terms of the statistical variability of the resampling estimate itself.
Systematic sampling designs are of major importance in stereology. Equidistant windows of observation are placed into an object, whereby the location of one window determines the location of all other windows (Cochran, 1977). The classical example of one-dimensional systematic sampling is volume estimation by Cavalieri sampling, where the area contents Ai of n sectional planes through a body K, a distance d apart, are estimated or measured along an arbitrarily orientated spatial axis (the x-axis) orthogonal to the planes. Let us denote the projected height of the body onto the x-axis as h and its volume as V. Provided that the location of one of the planes is selected uniformly at random in the interval [0, d), the Cavalieri estimator reads:

V̂ = d · (A1 + A2 + … + An) = d · Σi Ai.
For the sake of illustration, the origin of the x-axis may be identified, e.g. with the lower tangent point of the orthogonal planes to K. The Cavalieri estimator is unbiased for the volume estimation of bodies of arbitrary shape and anisotropy properties, and it is also highly efficient. Systematic sections usually lead to volume estimates with much less variance than a hypothetical volume estimate resulting, e.g. from the product of h (supposed to be known) with the mean value of Ai taken from parallel sections through K with independent random positions along the x-axis. The squared coefficient of error (CE) in the latter case, which corresponds to a simple random sample, declines in the order of 1/n, whereas under systematic sampling it usually declines in the order of 1/n² to 1/n⁴, depending on the geometry of the object. Moreover, systematic sampling is the natural approach and is simpler to realize in applications than simple random sampling. In biomedical imaging techniques, for example, many devices record observations at registered equidistant planes perpendicular to a vertical axis by default (scans from computer axial tomography, nuclear magnetic resonance tomography, confocal laser scanning microscopy, etc.).
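The Cavalieri estimate itself is a one-line computation; a minimal sketch (with purely illustrative area values, not data from this study):

```python
# Sketch of the Cavalieri volume estimator: V_hat = d * sum(A_i), where
# the A_i are the transectional areas on systematic sections a distance
# d apart. The numbers below are illustrative placeholders.

def cavalieri_volume(areas, d):
    """Unbiased Cavalieri estimate of total volume from section areas."""
    return d * sum(areas)

# Example: areas (in mm^2) of equidistant sections with spacing d = 2 mm.
areas = [0.0, 1.2, 2.8, 3.5, 2.9, 1.4, 0.2]
print(cavalieri_volume(areas, 2.0))  # -> 24.0
```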
Although the Cavalieri estimator of the absolute volume of single objects is unbiased, the problem of predicting its CE from a given data set of areas alone is not solved in general. Various estimators (predictors) have been suggested for this purpose (e.g. Gundersen & Jensen, 1987; Mattfeldt, 1987, 1989; Cruz-Orive, 1997, 1999; Gundersen et al., 1999; García-Fiñana & Cruz-Orive, 2000a, 2000b, 2004; García-Fiñana et al., 2003). In the present paper, a computer-intensive method for CE prediction of the Cavalieri estimator from a set of empirical data is presented. Polynomial regression methods are used to decompose the area series into a deterministic component and an error component. Systematic sectioning is simulated within the computer in a deterministic manner after fitting cubic splines to the data points. After taking into account the results of both steps, an estimate of the total CE of the Cavalieri estimator is obtained that considers both the deterministic and the error component. In the present paper, CE estimation by the computer-intensive method was compared with two short-cut estimators, which are based on a quadratic approximation to the estimated covariogram of the data (CEsh1: Gundersen & Jensen, 1987; CEsh2: Gundersen et al., 1999). Both estimators are based on transitive theory (Matheron, 1965, 1971). For the purpose of the comparisons, we use six analytical models of f(x) as well as previously published data sets of empirical areas of transected planes resulting from one-dimensional systematic sectioning designs (Mattfeldt, 1987: Table 1; Gundersen et al., 1999: data sets V1–V13).
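The regression step of the computer-intensive method can be sketched as follows: fit a quadratic polynomial to the area series and split the total variability into a trend component and a residual (noise) component. This is only a minimal sketch under assumed illustrative data; the paper's full procedure additionally fits cubic splines and simulates sectioning.

```python
import numpy as np

# Hedged sketch of the decomposition step: regress the area series on
# section number with a quadratic polynomial, then quantify how much of
# the variability is explained by the trend (R^2). Data are illustrative
# placeholders, not values from the paper.

x = np.arange(1, 11)                       # section numbers 1..10
noise = np.array([0.1, -0.2, 0.15, 0.05, -0.1, 0.2, -0.15, 0.1, -0.05, 0.0])
areas = 5 + 2.0 * x - 0.18 * x**2 + noise  # quadratic trend + small error term

coeffs = np.polyfit(x, areas, deg=2)       # quadratic polynomial regression
trend = np.polyval(coeffs, x)
residuals = areas - trend                  # estimated error component

ss_tot = np.sum((areas - areas.mean())**2)
ss_res = np.sum(residuals**2)
r_squared = 1 - ss_res / ss_tot            # share of variance explained by the trend
print(round(r_squared, 3))
```

A high R² indicates that the deterministic trend dominates, a low R² that the error component does; this ratio steers how the trend and noise contributions enter the total CE estimate.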
Table 1. Heidelberg rat heart data. Resampling: 1 sample, 2 overlapping subsamples.
| 1    | 0.05944 | 0.98405 | 0.12110 | 0.05173 |  0.06086 | *0.04448 |  0.00994 |
| 2    | 0.04509 | 0.97055 | 0.12032 | 0.00663 |  0.04509 |  0.03780 | *0.00845 |
| 3    | 0.06391 | 0.98010 | 0.13307 | 0.06669 | *0.06579 |  0.04580 |  0.01024 |
| 4    | 0.06423 | 0.99385 | 0.12515 | 0.05654 |  0.06423 | *0.04909 |  0.01097 |
| 5    | 0.05294 | 0.99340 | 0.13756 | 0.02736 |  0.05294 | *0.04402 |  0.00984 |
| 6    | 0.05751 | 0.99520 | 0.12214 | 0.04890 |  0.05801 | *0.04423 |  0.00989 |
| 7    | 0.04730 | 0.98435 | 0.14278 | 0.03211 |  0.05043 | *0.03649 |  0.00816 |
| 8    | 0.07298 | 0.99730 | 0.12472 | 0.03653 |  0.07308 | *0.05498 |  0.01229 |
| 9    | 0.05451 | 0.97865 | 0.13857 | 0.05570 | *0.05756 |  0.04774 |  0.01067 |
| 10   | 0.04627 | 0.98170 | 0.12146 | 0.00756 |  0.04870 |  0.03639 | *0.00813 |
| Mean | 0.05640 | 0.98592 | 0.12868 | 0.03897 |  0.05766 |  0.04410 |  0.00985 |
The results may be summarized as follows. The CE of the Cavalieri estimator was predicted by a computer-intensive method based on a polynomial regression procedure and simulation of systematic sectioning, and by two established short-cut estimators based on transitive theory (CEsh1, CEsh2). When applied to synthetic data sets from deterministic models with analytically known CEs, very similar predictions were obtained for all models by CEsh1 and CEtot, whereas CEsh2 ranged much lower. For models M1 and M2, the estimators CEsys and CEsh1 gave better predictions than CEsh2 (Fig. 2a,b). For models M3, M4 and M6, the estimator CEsh2 provided the best prediction, whereas CEsys and CEsh1 were too high (Fig. 2c,d,f). The better prediction of CE(m) by CEsh1 for models M1 and M2 is fully concordant with theoretical expectation, as these models are examples of 0-objects, for which CEsh1 is most suitable. The better prediction by CEsh2 in the case of models M3, M4 and M6 is plausible, because these models are 1-objects, for which the predictor CEsh2 is superior. In the case of M5, CEsh1 was too high and CEsh2 was too low (Fig. 2e). The latter result is due to the fact that M5 is a counterexample to classical transitive theory. This model is only considered by the new fractional approach to variance estimation, as m is not an integer in this case (García-Fiñana & Cruz-Orive, 2000a, 2000b, 2004; García-Fiñana et al., 2003; Cruz-Orive & Geiser, 2004). On the whole, introducing the computer-intensive method did not definitively enhance the predictive accuracy over the short-cut estimators. One might expect that this estimator, owing to its assumption-free nature, would always lie closest to the analytical CE(m): that it would resemble CEsh1 more for M1 and M2, resemble CEsh2 more for M3, M4 and M6, and excel both in the case of M5. This was evidently not the case: it almost always behaved very similarly to CEsh1.
The same estimators were then applied to empirical data sets. Here, reduced data sets were generated from the primary samples with increased sampling distance. On the whole, similar estimates were obtained for CEsh1 and CEtot in both data sets, whereas the results by CEsh2 were distinctly lower. For both data sets, mean CEsh1 and CEtot lay nearest to the resampling estimates, whereas mean CEsh2 appeared too low. Nevertheless, in some individual area series of both data sets, CEsh2 happened to provide the best prediction. This was true for the series with the lowest values of CEres in both data sets. In this context, one should note that the resampling estimate, although used as a reference value, is itself an estimator of the CE derived from empirical data. The estimates of CEres may vary strongly, because they may fall into the hills and valleys of the Zitterbewegung, too. Hence it is very difficult to obtain a robust prediction of the CE for an individual real case. As a recommendation for a short-cut estimator for our empirical data sets, one would prefer CEsh1 over CEsh2. The latter produced the best predictions for some of the deterministic models and also for isolated cases of the empirical data sets, but its mean value was too low. The computer-intensive estimator CEtot also led to an augmented estimate of the CE as compared to CEsh2 for the empirical data sets. On the other hand, it was not possible to increase the accuracy of prediction further by using CEtot instead of CEsh1 in the case of empirical data either. The estimator CEtot seems rather robust to noisy data, as its computation takes the noise component into account by definition. Comparing CEsh1 and CEtot, one must admit that the latter involves a greater computational workload. However, once the data set is entered in electronic form, the procedure is fully automatic, so the human workload does not differ.
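The resampling cross-check described above can be sketched in a few lines: from one primary Cavalieri series, form two interleaved subsamples with doubled sampling distance, compute a volume estimate from each, and take their relative spread as an empirical CE. This is a hedged sketch with illustrative data; the published resampling scheme may differ in detail.

```python
import statistics

# Hedged sketch of a resampling CE estimate from one Cavalieri series:
# the odd- and even-numbered sections form two subsamples with spacing
# 2d, each yielding its own volume estimate. Data are illustrative.

def resampling_ce(areas, d):
    subs = [areas[0::2], areas[1::2]]        # two interleaved subsamples
    vols = [2 * d * sum(s) for s in subs]    # spacing doubles to 2d
    mean_v = statistics.mean(vols)
    return statistics.pstdev(vols) / mean_v  # CV of the subsample estimates

areas = [0.0, 1.2, 2.8, 3.5, 2.9, 1.4, 0.2]
print(resampling_ce(areas, 2.0))
```

With only two subsamples the spread estimate is itself highly variable, which illustrates why CEres can fall into the hills and valleys of the Zitterbewegung.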
Also, computer-intensive procedures do not necessarily imply long execution times: the estimation of a Cavalieri CE on a PC is performed within a few seconds for a single data set consisting of 10–20 area values. Moreover, by simply simulating the systematic sectioning process, the computer-intensive approach does not involve advanced mathematics, such as transitive theory; all steps of its computation are easy to grasp. It is planned to make the programs described in this paper easily accessible to the PC user, who only has to prepare an ordered list of the cross-sectional areas as an ASCII file and obtains R² and a prediction of the CE of the volume estimate as output.
In the present study, Cavalieri sampling was simulated in a deterministic manner, whereas in previous studies it was simulated by Monte Carlo methods, where the start point of the systematic sample was placed uniformly at random into [0, 1). In the procedure used here, the sectioning was simulated by distributing the start points equidistantly in [0, 1), i.e. at (0, 0.01, 0.02, … , 0.99). The uniform distribution, whether deterministic or random, is required anyway to guarantee unbiasedness when applying the Cavalieri estimator. With the deterministic approach, the points are distributed as homogeneously as possible. When 100 points are selected at random by Monte Carlo methods, it may happen that clusters of points concentrate in some regions of [0, 1) while other regions are undersampled; the deterministic homogeneous distribution of points avoids this. Hence, in repeated estimations of the CE on the basis of a given data set, the deterministic approach reproducibly provides the same value, which is not the case when Monte Carlo methods are used.
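The deterministic simulation step can be sketched as follows. Linear interpolation stands in here for the cubic splines used in the paper, and the area values are illustrative placeholders; the structure of the loop (100 equidistant start points, one Cavalieri estimate per start point) is the point of the sketch.

```python
import numpy as np

# Hedged sketch of deterministic simulation of systematic sectioning:
# interpolate the measurement function from the data points, then place
# the systematic grid at the 100 equidistant start points 0, 0.01, ...,
# 0.99 (as fractions of the spacing d) and compute the spread of the
# resulting Cavalieri estimates. Illustrative data, linear interpolation.

x = np.arange(7, dtype=float)               # section positions (unit spacing)
areas = np.array([0.0, 1.2, 2.8, 3.5, 2.9, 1.4, 0.2])

d = 1.0
estimates = []
for start in np.arange(0.0, 1.0, 0.01):     # deterministic equidistant start points
    # sections at start, start + d, ... up to the end of the support
    positions = np.arange(start, x[-1] + 1e-9, d)
    a = np.interp(positions, x, areas)      # interpolated measurement function
    estimates.append(d * a.sum())           # Cavalieri estimate for this start

estimates = np.array(estimates)
ce_sys = estimates.std() / estimates.mean() # CE over the 100 systematic samples
print(round(ce_sys, 4))
```

Because the start points are fixed rather than drawn at random, repeated runs on the same data set return exactly the same CE value.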
Selection of an explicit model for the regression of the Ai on xi remains essentially an arbitrary decision. In the present paper, quadratic polynomial regression was applied. This model is guided by experience with the structure of human organs and cells; it should be realistic for a broad class of biological structures. The present approach is conceptually different from estimators of the CE in Cavalieri sampling based on transitive theory; nevertheless, there is a fundamental parallel. In both approaches a quadratic approximation is chosen, albeit at a different level: using transitive theory, the approximation is applied to the covariogram of the data, whereas in this paper it is applied to the data series itself. Recently, the application of these two basic approaches – covariogram vs. measurement function – has been the subject of lively discussion (see Glaser, 2005; and the reply, Cruz-Orive & García-Fiñana, 2005). It follows from eq. (7) that there is a close correspondence between the measurement function and the covariogram. According to common sense, both approaches appear plausible. At present, the data are not sufficient to decide whether one of the two alternatives is superior to the other; the approaches have seldom been compared. In general, the short-cut estimators based on transitive theory have been applied more often to empirical data, whereas prediction of the CE directly on the basis of the measurement function has only rarely been used. The computational workload of the latter approach may play a role. In the present study, the older estimator based on transitive methods for 0-objects, unmodified for error/nugget variance, and computer-intensive prediction of the CE on the basis of data of the measurement function yielded generally consonant results. A similar finding was reported earlier using Monte Carlo simulation of systematic sampling as compared with CEsh1 (Mattfeldt, 1989).
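The object to which the transitive approach applies its quadratic approximation, the empirical covariogram of the area series, is itself easy to compute; a minimal sketch with illustrative data:

```python
# Hedged sketch of the empirical covariogram underlying the short-cut
# estimators: g(k) = sum_i A_i * A_(i+k) for lag k. The transitive
# predictors fit a quadratic to g near k = 0, whereas the present paper
# fits a quadratic to the area series itself. Data are illustrative.

def covariogram(areas, k):
    """Empirical covariogram of the area series at lag k."""
    return sum(a * b for a, b in zip(areas, areas[k:]))

areas = [0.0, 1.2, 2.8, 3.5, 2.9, 1.4, 0.2]
g = [covariogram(areas, k) for k in range(3)]  # g(0), g(1), g(2)
print(g)
```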
In the present study, systematic sampling was studied in synthetic deterministic data sets without error, and in empirical data sets from the real world with unknown error. The missing link between these approaches would be synthetic data with known error. Hence, as an improved model for real data, simulation studies of models with a deterministic component and a superimposed random error term could be helpful. The model equations as well as the relative strength of the error component would be known for such models, whereas they are never exactly known for empirical data. For the synthetic models with error components, it is difficult or impossible to obtain analytical results for CE(m), as in the models M1–M6. However, it is feasible to produce very long series of simulated Ai values (e.g. some hundreds or thousands), and to produce shorter subsamples leading to resampling estimates, just as was done with the empirical data. This would be a further challenge for the CE estimators, which could be studied in more depth in this setting.
The present study was restricted to a comparison of a computer-intensive method with the two most popular estimators derived from classical transitive theory. As mentioned above, further estimators of the CE for 0-objects and 1-objects are available (Gundersen et al., 1999; Kiêu et al., 1999). Further improved estimators based on transitive theory have been presented, in which it was attempted to overcome the dichotomy between 0-objects and 1-objects (García-Fiñana & Cruz-Orive, 2000a, 2000b, 2004; García-Fiñana et al., 2003; Cruz-Orive & Geiser, 2004). Essentially, the concept of a fractional trend of the variance in Cavalieri sampling has been considered, which allows a more flexible characterization of real objects. Instead of a binary classification into the classes m = 0 or m = 1, the new approach leads to a characterization of the object in terms of a continuous smoothness constant q ∈ [0, 1]. For q = m = 0 and q = m = 1, the well-known estimators (eq. 6a, 6b) result, with the constant factors α = 1/12 and α = 1/240, respectively. For continuous q-values between 0 and 1, the factor α(q) is a nonlinear function of q, which can be computed directly or read off the graphical plot of α(q) in García-Fiñana et al. (2003) and García-Fiñana & Cruz-Orive (2004). The factor α(q) lies between 1/12 and 1/240; for example, one finds α(q) ≈ 1/45 for q = 1/2 (García-Fiñana et al., 2003; García-Fiñana & Cruz-Orive, 2004). Provided that data from at least 5 sections are available, q and hence α(q) can be estimated from real data sets (García-Fiñana et al., 2003; García-Fiñana & Cruz-Orive, 2004). Thus, fine-tuning of the CE predictor with adaptation to individual data sets becomes possible. These methods will hopefully further increase the accuracy of variance prediction in Cavalieri sampling. For example, Tables 1 and 2 show several cases where the resampling error lies between CEsh1 and CEsh2.
For these cases, it would be worthwhile to explore whether application of the fractional theory could provide a better prediction. This analysis, however, was beyond the scope of the present study, which focused mainly on a comparison of a computer-intensive approach with the established short-cut predictors. In future simulation studies of deterministic models with superimposed error, it is planned to include all transitive short-cut estimators based on m-classification as well as the new method based on q-estimation.
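The family of short-cut predictors discussed above can be sketched as a single function of the constant α. A common form of these estimators, which I take as the assumed basis here, uses the lag sums A = Σ Ai², B = Σ Ai·Ai+1, C = Σ Ai·Ai+2, with α = 1/12 for CEsh1 and α = 1/240 for CEsh2; the fractional theory interpolates α(q) between these two values. Data are illustrative.

```python
import math

# Hedged sketch of the short-cut CE predictors based on transitive
# theory, parameterized by the constant alpha: alpha = 1/12 gives the
# older estimator (CEsh1, 0-objects), alpha = 1/240 the newer one
# (CEsh2, 1-objects); fractional theory interpolates, e.g.
# alpha(q) ~ 1/45 for q = 1/2. Area values are illustrative.

def shortcut_ce(areas, alpha):
    A = sum(a * a for a in areas)                       # lag-0 sum
    B = sum(a * b for a, b in zip(areas, areas[1:]))    # lag-1 sum
    C = sum(a * b for a, b in zip(areas, areas[2:]))    # lag-2 sum
    return math.sqrt(alpha * (3 * A - 4 * B + C)) / sum(areas)

areas = [0.0, 1.2, 2.8, 3.5, 2.9, 1.4, 0.2]
print(shortcut_ce(areas, 1 / 12))    # CEsh1
print(shortcut_ce(areas, 1 / 240))   # CEsh2
```

Since both predictors share the factor (3A − 4B + C), CEsh1 always exceeds CEsh2 by the fixed ratio √20, which is why their mean values over a data set diverge so consistently.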